Molecules And Methods For Iterative Polypeptide Analysis And Processing Havranek; James J. ; et al. [Washington University]

Patent Application Summary

U.S. patent application number 17/088898, for molecules and methods for iterative polypeptide analysis and processing, was filed with the patent office on November 4, 2020 and published on March 11, 2021. This patent application is currently assigned to Washington University. The applicant listed for this patent is Washington University. Invention is credited to Benjamin Borgo, James J. Havranek.

Publication Number	20210072252
Application Number	17/088898
Family ID	1000005223221
Publication Date	2021-03-11

United States Patent Application	20210072252
Kind Code	A1
Havranek; James J. ; et al.	March 11, 2021

MOLECULES AND METHODS FOR ITERATIVE POLYPEPTIDE ANALYSIS AND PROCESSING

Abstract

Reagents and methods for the digital analysis of proteins or peptides are provided. Specifically provided herein are proteins for identifying the N-terminal amino acid or N-terminal phosphorylated amino acid of a polypeptide. Also, an enzyme for use in the cleavage step of the Edman degradation reaction and a method for using this enzyme are described.

Inventors:

Havranek; James J.; (Clayton, MO) ; Borgo; Benjamin; (St. Louis, MO)

Applicant:

Name	City	State	Country	Type
Washington University	St. Louis	MO	US

Assignee:

Washington University, St. Louis, MO

Family ID:

1000005223221

Appl. No.:

17/088898

Filed:

November 4, 2020

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number	Child Application
16907813	Jun 22, 2020	10852305	17088898
15255433	Sep 2, 2016		16907813
14211448	Mar 14, 2014	9435810	15255433
61798705	Mar 15, 2013

Current U.S. Class:	1/1
Current CPC Class:	C12Y 601/0101 20130101; C12N 9/93 20130101; C12N 9/641 20130101; G01N 33/6824 20130101; C12Y 304/11018 20130101; C12N 9/485 20130101; C12Y 601/0102 20130101; C12N 9/52 20130101; C12Y 601/01021 20130101
International Class:	G01N 33/68 20060101 G01N033/68; C12N 9/48 20060101 C12N009/48; C12N 9/52 20060101 C12N009/52; C12N 9/64 20060101 C12N009/64; C12N 9/00 20060101 C12N009/00

Government Interests

GOVERNMENT LICENSE RIGHTS

[0002] This invention was made with Government support under grant R01 GM101602 awarded by the National Institutes of Health. The Government has certain rights in the invention.

Claims

1-83. (canceled)

84. An isolated N-terminal amino acid binding protein (NAAB), comprising a modified, non-naturally occurring tRNA synthetase (RS) that selectively binds to an N-terminal amino acid residue of a polypeptide with at least about a 1.5:1 ratio of specific to non-specific binding.

85. The isolated NAAB of claim 84, wherein the modified, non-naturally occurring aminoacyl tRNA synthetase is coupled with or bound to a fluorescent label.

86. The isolated NAAB of claim 85, wherein the fluorescent label is covalently attached to the modified, non-naturally occurring RS.

87. The isolated NAAB of claim 84, wherein the modified, non-naturally occurring RS selectively binds to an N-terminal amino acid residue of a particular type.

88. The isolated NAAB of claim 87, wherein the type of N-terminal amino acid residue is one selected from the group consisting of alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, and valine.

89. The isolated NAAB of claim 84, wherein the modified, non-naturally occurring RS binds to an N-terminal amino acid residue with a post-translational modification.

90. The isolated NAAB of claim 89, wherein the N-terminal amino acid residue with a post-translational modification is a phosphorylated N-terminal amino acid residue.

91. The isolated NAAB of claim 90, wherein the NAAB binds to an N-terminal pTyr residue and is a modified Class I TyrRS from Methanococcus jannaschii or related archaea.

92. The isolated NAAB of claim 91, wherein the modified Class I TyrRS is modified at one or more of the following positions: Y32, L65, F108, Q109, D158, I159, and L162.

93. The isolated NAAB of claim 90, wherein the NAAB binds to an N-terminal pSer residue and is a modified Class II SepRS from Archaeoglobus fulgidus or related methanogenic archaea.

94. The isolated NAAB of claim 90, wherein the modified Class II SepRS is modified at one or more of the following positions: E412, E414, K417, P495, I496 and F529.

95. The isolated NAAB of claim 84, wherein the modified, non-naturally occurring RS selectively binds to methionine and comprises an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of SEQ ID NO: 3 and also containing a serine residue at a position corresponding to position 10 of SEQ ID NO: 3; a leucine residue at a position corresponding to position 257 of SEQ ID NO: 3; a glycine residue at a position corresponding to position 293 of SEQ ID NO: 3; and/or a leucine residue at a position corresponding to position 298 of SEQ ID NO: 3.

96. The isolated NAAB of claim 95, wherein the NAAB comprises the amino acid sequence of SEQ ID NO: 4.

97. The isolated NAAB of claim 84, wherein the modified, non-naturally occurring RS selectively binds to phenylalanine and comprises an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of SEQ ID NO: 6 and also containing an aspartate residue at a position corresponding to position 15 of SEQ ID NO: 6, an asparagine residue at a position corresponding to position 57 of SEQ ID NO: 6, a glycine residue at a position corresponding to position 58 of SEQ ID NO: 6, a valine residue at a position corresponding to position 67 of SEQ ID NO: 6, a glycine residue at a position corresponding to position 68 of SEQ ID NO: 6, a lysine residue at a position corresponding to position 69 of SEQ ID NO: 6, an aspartate residue at a position corresponding to position 80 of SEQ ID NO: 6, an alanine residue at a position corresponding to position 120 of SEQ ID NO: 6, an alanine residue at a position corresponding to position 127 of SEQ ID NO: 6, a valine residue at a position corresponding to position 143 of SEQ ID NO: 6, an asparagine residue at a position corresponding to position 144 of SEQ ID NO: 6, a glutamate residue at a position corresponding to position 145 of SEQ ID NO: 6, a glycine residue at a position corresponding to position 146 of SEQ ID NO: 6, an aspartate residue at a position corresponding to position 147 of SEQ ID NO: 6, a tyrosine residue at a position corresponding to position 149 of SEQ ID NO: 6, a threonine residue at a position corresponding to position 172 of SEQ ID NO: 6, a glycine residue at a position corresponding to position 202 of SEQ ID NO: 6, an asparagine residue at a position corresponding to position 204 of SEQ ID NO: 6, an aspartate residue at a position corresponding to position 218 of SEQ ID NO: 6, an alanine residue at a position corresponding to position 251 of SEQ ID NO: 6, a threonine residue at a position corresponding to position 253 of SEQ ID NO: 6, and/or a glycine residue at a position corresponding to position 255 of SEQ ID NO: 6.

98. The isolated NAAB of claim 97, wherein the NAAB comprises the amino acid sequence of SEQ ID NO: 7.

99. The isolated NAAB of claim 84, wherein the modified, non-naturally occurring RS selectively binds to histidine and comprises an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of SEQ ID NO: 9 and also containing an asparagine residue at a position corresponding to position 121 of SEQ ID NO: 9 and an alanine residue at a position corresponding to position 122 of SEQ ID NO: 9.

100. The isolated NAAB of claim 99, wherein the NAAB comprises the amino acid sequence of SEQ ID NO: 10.

101. The isolated NAAB of claim 84, comprising a modified, non-naturally occurring tRNA synthetase that selectively binds to a PITC-derivatized N-terminal amino acid residue of a polypeptide with at least about a 1.5:1 ratio of specific to non-specific binding.

102. A method for making an N-terminal amino acid binding (NAAB) protein that selectively binds to an N-terminal amino acid residue of a polypeptide, the method comprising: identifying an amino acid binding domain of a tRNA synthetase (RS); introducing one or more mutations into the amino acid binding domain to form a NAAB; and optionally assaying the NAAB for specific binding to the N-terminal amino acid residue of a polypeptide.

103. The method of claim 102, wherein the tRNA synthetase is a first class I tRNA synthetase and the identifying step comprises aligning an amino acid sequence of the first class I tRNA synthetase with an amino acid sequence of a second class I tRNA synthetase having a previously defined amino acid binding domain.

104. The method of claim 103, wherein the identifying step comprises constructing a multiple sequence alignment that aligns the amino acid sequences of the first class I tRNA synthetase, the second class I tRNA synthetase, and at least one additional class I tRNA synthetase.

105. The method of claim 104, wherein the multiple sequence alignment aligns the sequences of at least five class I tRNA synthetases.

106. The method of claim 105, wherein the multiple sequence alignment aligns the amino acid sequence of full-length E. coli MetRS (SEQ ID NO: 5), or a fragment thereof which includes the amino acid binding domain, with the amino acid sequences of at least two other class I tRNA synthetases selected from the group consisting of arginine, cysteine, glutamate, glutamine, isoleucine, leucine, lysine, methionine, tyrosine, tryptophan, and valine tRNA synthetases.

107. The method of claim 102, wherein the tRNA synthetase is a first class II tRNA synthetase and the identifying step comprises aligning an amino acid sequence of the first class II tRNA synthetase with an amino acid sequence of a second class II tRNA synthetase having a previously defined amino acid binding domain.

108. The method of claim 107, wherein the identifying step comprises aligning the amino acid sequence of the monomeric fragment of E. coli HisRS with a corresponding domain of a class II tRNA synthetase selected from the group consisting of AlaRS, ProRS, SerRS, ThrRS, AspRS, AsnRS, LysRS, GlyRS, and PheRS.

109. The method of claim 108, wherein the identifying step comprises constructing a multiple sequence alignment that aligns the amino acid sequences of the first class II tRNA synthetase, the second class II tRNA synthetase, and at least one additional class II tRNA synthetase.

110. The method of claim 109, wherein the multiple sequence alignment aligns the sequences of at least five class II tRNA synthetases.

111. A kit for analyzing or sequencing a polypeptide comprising: one or more N-terminal amino acid binding proteins (NAABs), wherein each of the one or more NAABs selectively binds to an N-terminal amino acid residue of a polypeptide; an Edman degradation enzyme; and instructions for using the NAABs and the Edman degradation enzyme for analyzing or sequencing a polypeptide.

112. The kit of claim 111, wherein at least one of the NAABs comprises a modified, non-naturally occurring tRNA synthetase (RS) that selectively binds to an N-terminal amino acid residue of a polypeptide with at least about a 1.5:1 ratio of specific to non-specific binding.

113. The kit of claim 111, wherein at least one of the NAABs comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 2; SEQ ID NO: 4; SEQ ID NO: 7; SEQ ID NO: 10; SEQ ID NO: 11; SEQ ID NO: 12; SEQ ID NO: 13; SEQ ID NO: 14; SEQ ID NO: 15; SEQ ID NO: 16; SEQ ID NO: 17; SEQ ID NO: 18; SEQ ID NO: 19; SEQ ID NO: 20; SEQ ID NO: 21; SEQ ID NO: 22; SEQ ID NO: 23; SEQ ID NO: 24; SEQ ID NO: 25; SEQ ID NO: 26; SEQ ID NO: 27; and SEQ ID NO: 28.

114. The kit of claim 111, wherein the Edman degradation enzyme comprises an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of SEQ ID NO: 29, and also containing a glycine residue at a position corresponding to position 25 of SEQ ID NO: 29; a serine residue at a position corresponding to position 65 of SEQ ID NO: 29; a cysteine residue at a position corresponding to position 138 of SEQ ID NO: 29; and/or a tryptophan residue at a position corresponding to position 160 of SEQ ID NO: 29.

Description

REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of U.S. patent application Ser. No. 16/907,813, filed Jun. 22, 2020, which is a continuation of U.S. patent application Ser. No. 15/255,433, filed Sep. 2, 2016, now abandoned, which is a division of U.S. patent application Ser. No. 14/211,448, filed Mar. 14, 2014, now U.S. Pat. No. 9,435,810, issued Sep. 5, 2016, which claims the benefit of U.S. Provisional Application No. 61/798,705, filed Mar. 15, 2013, the entire disclosures of which are incorporated herein by reference.

FIELD OF THE INVENTION

[0003] The present invention generally relates to reagents and methods for the digital analysis of proteins or peptides. Specifically provided herein are proteins for identifying the N-terminal amino acid or N-terminal phosphorylated amino acid of a polypeptide. Another aspect of the invention is an enzyme for use in the cleavage step of the Edman degradation reaction and a method for using this enzyme.

BACKGROUND OF THE INVENTION

[0004] Proteins carry out the majority of signaling, metabolic, and regulatory tasks necessary for life. As a result, a quantitative description of the proteomic state of cells, tissues, and fluids is crucial for assessing the functionally relevant differences between diseased and unaffected tissues, between cells of different lineages or developmental states, and between cells executing different regulatory programs. Although powerful high-throughput techniques are available for determining the RNA content of a biological sample, the correlation between mRNA and protein levels is low (1).

[0005] The preferred method for proteomic characterization is currently mass spectrometry. Despite its many successes, mass spectrometry possesses limitations. One limitation is quantification. Because different proteins ionize with different efficiencies, it is difficult to compare relative amounts between two samples without isotopic labeling (2). In 'shotgun' strategies for analyzing complex samples, the uncertainties of peptide assignment further complicate quantification, especially for low-abundance proteins (3). A second limitation of mass spectrometry is its dynamic range. For unbiased samples that have not undergone prefractionation or affinity purification, the dynamic range in analyte concentration is roughly 10^2-10^3, depending upon the instrument (4). This is problematic for complex samples such as blood, where two proteins whose levels are measured in clinical laboratories (albumin and interleukin-6) can differ in abundance by a factor of 10^10 (5). A third limitation is the analysis of phosphopeptides, due to the loss of phosphate in some ionization modes. The power of proteomic approaches would increase dramatically with the introduction of a more quantitative high-throughput assay possessing greater dynamic range.
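A quick order-of-magnitude check, using only the figures quoted in the paragraph above (an instrument dynamic range of 10^2 to 10^3 and an albumin-to-interleukin-6 abundance ratio of 10^10), shows how far a single unfractionated run falls short:

```latex
\log_{10}\!\left(\frac{10^{10}}{10^{3}}\right) = 10 - 3 = 7,
\qquad
\log_{10}\!\left(\frac{10^{10}}{10^{2}}\right) = 10 - 2 = 8
```

That is, the concentration span of such a sample exceeds the quoted instrument range by roughly seven to eight orders of magnitude.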

[0006] One promising technology for the analysis of proteins in a sensitive and quantitative manner was developed by Mitra et al. (7). This technology, referred to as Digital Analysis of Proteins by End Sequencing or DAPES, features a method for single molecule protein analysis. To perform DAPES, a large number (ca. 10^9) of protein molecules are denatured and cleaved into peptides. These peptides are immobilized on a nanogel applied to the surface of a microscope slide, and their amino acid sequences are determined in parallel using a method related to Edman degradation. Phenyl isothiocyanate (PITC) is added to the slide and reacts with the N-terminal amino acid of each peptide to form a stable phenylthiourea derivative. Next, the identity of the N-terminal amino acid derivative is determined by performing, for example, 20 rounds of antibody binding (using antibodies specific for each PITC-derivatized N-terminal amino acid), detection, and stripping. The N-terminal amino acid is then removed by raising the temperature or lowering the pH, and the cycle is repeated to sequence 12-20 amino acids from each peptide on the slide. The absolute concentration of every protein in the original sample can then be calculated by counting the occurrences of each peptide sequence observed.
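The "digital" part of DAPES is counting: each immobilized molecule yields one read, so abundance follows from how often each sequence appears. The sketch below assumes the reads are already decoded into sequences and computes relative abundances only; converting these fractions to absolute concentrations would additionally require knowing how many molecules were loaded, which is not modeled. The function name and the example reads are hypothetical.

```python
from collections import Counter


def peptide_abundances(reads: list[str]) -> dict[str, float]:
    """Fraction of all single-molecule reads assigned to each peptide sequence.

    Assumes one read per immobilized peptide molecule. Absolute concentrations
    would need an extra calibration (total molecules loaded) not modeled here.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical reads from six immobilized peptide molecules.
    reads = ["MKLV", "MKLV", "AGSE", "MKLV", "AGSE", "HDFP"]
    for seq, frac in sorted(peptide_abundances(reads).items()):
        print(f"{seq}: {frac:.3f}")
```

Because every molecule is counted individually, the measurable range in this idealized picture is limited by how many molecules are read rather than by ionization efficiency.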

[0007] The phenyl isothiocyanate chemistry used in DAPES is the same as that used in Edman degradation and is efficient and robust (>99% efficiency). However, cleaving off single amino acids requires either strong anhydrous acid or an aqueous buffer at elevated temperature. Repeatedly cycling through either of these harsh conditions is undesirable over multiple rounds of analysis on the sensitive substrates used for single molecule protein detection (SMD). Thus, there is a need in the art for improved reagents and methods for the parallel analysis of peptides in SMD format.
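Per-cycle efficiency matters because losses compound over many rounds. The sketch below assumes that each cycle succeeds independently with the same probability and uses the quoted ">99%" figure as a lower bound of 0.99. The model ignores cleavage losses and surface damage, which the text does not quantify. The function name is hypothetical.

```python
def cumulative_yield(per_cycle_efficiency: float, cycles: int) -> float:
    """Fraction of peptide molecules still read correctly after `cycles` rounds,
    assuming independent cycles with identical success probability."""
    if not 0.0 <= per_cycle_efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return per_cycle_efficiency ** cycles


if __name__ == "__main__":
    # 12 and 20 are the read lengths mentioned for DAPES above.
    for n in (12, 20):
        print(n, round(cumulative_yield(0.99, n), 3))
```

Under these assumptions, roughly 89% of molecules remain in register after 12 cycles and roughly 82% after 20, so any extra per-cycle loss from harsh cleavage conditions quickly erodes the usable signal.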

SUMMARY OF THE INVENTION

[0008] One aspect of the invention is an improved method for single molecule sequencing of proteins or peptides. Generally, the method for sequencing a polypeptide comprises (a) contacting the polypeptide with one or more fluorescently labeled N-terminal amino acid binding proteins (NAABs); (b) detecting fluorescence of a NAAB bound to an N-terminal amino acid of the polypeptide; (c) identifying the N-terminal amino acid of the polypeptide based on the fluorescence detected; (d) removing the NAAB from the polypeptide; (e) optionally repeating steps (a) through (d); (f) cleaving the N-terminal amino acid of the polypeptide via Edman degradation; and (g) repeating steps (a) through (f) one or more times.
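Steps (a) through (g) form a nested loop: an inner loop over labeled NAABs to identify the current N-terminal residue, and an outer loop that cleaves one residue per cycle. The toy simulation below assumes perfectly specific NAABs, complete stripping between probes, and perfectly efficient cleavage. The panel contents, labels, and function names are hypothetical illustrations, not reagents or software described in the source.

```python
from typing import Optional

# Hypothetical panel: fluorophore label -> N-terminal residue that NAAB binds.
NAAB_PANEL = {"label_M": "M", "label_L": "L", "label_F": "F", "label_H": "H"}


def identify_n_terminus(peptide: str, panel: dict[str, str]) -> Optional[str]:
    """Steps (a)-(e): probe with each labeled NAAB in turn.

    Assumes ideal NAABs: a probe binds (and fluoresces) only when its target
    residue is at the N-terminus, and is fully stripped before the next probe.
    """
    for target in panel.values():
        if peptide[:1] == target:  # (a) contact, (b) fluorescence detected
            return target          # (c) residue identified from the label
        # (d) NAAB removed; (e) repeat with the next NAAB in the panel
    return None


def sequence_peptide(peptide: str, panel: dict[str, str], cycles: int) -> str:
    """Steps (f)-(g): remove the N-terminal residue and repeat identification."""
    read = []
    for _ in range(cycles):
        if not peptide:
            break
        residue = identify_n_terminus(peptide, panel)
        read.append(residue if residue is not None else "X")  # X = not identified
        peptide = peptide[1:]  # (f) idealized Edman cleavage of one residue
    return "".join(read)


if __name__ == "__main__":
    print(sequence_peptide("MLFHAK", NAAB_PANEL, cycles=6))
```

With the four-probe panel above, `sequence_peptide("MLFHAK", NAAB_PANEL, cycles=6)` returns `"MLFHXX"`, because A and K have no matching NAAB in this toy panel. A real run would replace the exact-match test with a fluorescence threshold.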

[0009] The present invention also generally relates to reagents for the digital analysis of proteins or peptides. Specifically provided herein are proteins for identifying the N-terminal amino acid or N-terminal phosphorylated amino acid of a polypeptide.

[0010] Another aspect of the invention relates to an enzyme for use in the cleavage step of the Edman degradation reaction and a method for using this enzyme. Generally, the enzymatic Edman degradation method comprises reacting the N-terminal amino acid of the polypeptide with phenyl isothiocyanate (PITC) to form a PITC-derivatized N-terminal amino acid and cleaving the PITC-derivatized N-terminal amino acid using an Edman degradation enzyme.

[0011] Other objects and features will be in part apparent and in part pointed out hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1 depicts the Digital Analysis of Proteins by End Sequencing Protocol (DAPES) utilizing N-terminal amino acid binding proteins in the identification step and a synthetic enzyme in the cleavage step.

[0013] FIGS. 2A-2B show the binding specificity of wild-type E. coli methionine aminopeptidase (eMAP) and an engineered leucine-specific aminopeptidase (eLAP) of the present invention in a single-molecule detection experiment.

[0014] FIG. 3 shows the binding specificity of an engineered mutant of methionine tRNA synthetase (MetRS) of the present invention that exhibits binding specificity for surface-immobilized peptides with N-terminal methionines.

[0015] FIGS. 4A-4B depict three mutations (indicated by the arrows) introduced into a model of cruzain (PDB code: 1U9Q (27)) to accommodate the phenyl moiety of the Edman reagent phenyl isothiocyanate.

[0016] FIG. 5A depicts a model of a cleavage intermediate for Edman degradation, generated using experimental small-molecule structures of similar compounds and geometrically optimized using quantum chemistry calculations.

[0017] FIG. 5B shows the model of the intermediate fitted into the active-site cleft of the enzyme cruzain. The wild-type catalytic cysteine was removed. The activating residues (the other two components of the 'catalytic triad') were retained. These are a histidine and an asparagine that are intended to activate the sulfur atom of the Edman reagent for nucleophilic attack on the peptide bond.

[0018] FIG. 6 is a graphical representation of kinetic data from cleavage experiments using an Edman degradation enzyme of the present invention and the substrate Ed-Asp-AMC.

[0019] FIG. 7 is a trace plot of biolayer interferometry kinetics data showing the binding affinity of two proteins for peptides with N-terminal histidine residues: (1) engineered His NAAB (open circles); (2) native wild-type protein (solid circles).

[0020] FIG. 8 is a full binding matrix showing the binding affinity of each NAAB (row) for each N-terminal amino acid (column), as measured by biolayer interferometry.

DESCRIPTION OF THE INVENTION

[0021] In one aspect, the present invention is directed to a method and reagents for sequencing a polypeptide. In particular, the present invention provides methods and reagents for the single-molecule, high-throughput sequencing of polypeptides. Recent advances in single-molecule protein detection (SMD) allow for the parallel analysis of large numbers of individual proteins utilizing digital protocols. In accordance with the present invention, reagents capable of specifically binding to N-terminal amino acids for an identification step are provided.

[0022] The present invention also includes methods and reagents for identifying phosphorylated N-terminal amino acids. Quantitatively interrogating peptide sequences in neutral aqueous environments allows for proteomic analyses complementary to those afforded by mass spectrometry. NAABs specific for the phosphorylated forms of amino acids allow for quantitative comparison of proteomic inventories and signal transduction cascades in different samples.

[0023] In another aspect, the present invention is directed to a method and reagents for enzymatic Edman degradation (i.e., for enzymatically cleaving the N-terminal amino acid of a polypeptide). In accordance with this aspect, a synthetic enzyme is provided that catalyzes the cleavage step of the Edman degradation reaction in an aqueous buffer and at neutral pH, thereby providing an alternative to the harsh chemical conditions typically employed in Edman degradation.

[0024] Yet another aspect of the present invention is directed to an integrated high-throughput method for sequencing of polypeptides that includes use of reagents capable of specifically binding to N-terminal amino acids for an identification step and use of an enzymatic Edman degradation to remove N-terminal amino acids.

I. N-terminal Amino Acid Binders (NAABs)

[0025] In accordance with the present invention, reagents capable of specifically binding to N-terminal amino acids are provided. In various aspects of the invention, the N-terminal amino acid binders (NAABs) each selectively bind to a particular amino acid, for example one of the twenty standard naturally occurring amino acids. The standard, naturally-occurring amino acids include Alanine (A or Ala), Cysteine (C or Cys), Aspartic Acid (D or Asp), Glutamic Acid (E or Glu), Phenylalanine (F or Phe), Glycine (G or Gly), Histidine (H or His), Isoleucine (I or Ile), Lysine (K or Lys), Leucine (L or Leu), Methionine (M or Met), Asparagine (N or Asn), Proline (P or Pro), Glutamine (Q or Gln), Arginine (R or Arg), Serine (S or Ser), Threonine (T or Thr), Valine (V or Val), Tryptophan (W or Trp), and Tyrosine (Y or Tyr).

[0026] The NAABs of the present invention can be made by modifying various naturally occurring proteins to introduce one or more mutations in the amino acid sequence to produce engineered proteins that bind to particular N-terminal amino acids. For example, aminopeptidases or tRNA synthetases can be modified to create NAABs that selectively bind to particular N-terminal amino acids.

A. eLAP

[0027] For example, a NAAB that binds specifically to N-terminal leucine residues has been developed by introducing mutations into E. coli methionine aminopeptidase (eMAP). This NAAB (eLAP) has 19 amino acid substitutions as compared to wild-type eMAP. In particular, eLAP has substitutions at the amino acid positions corresponding to positions 42, 46, 56-60, 62, 63, 65-70, 81, 101, 177, and 221 of wild-type eMAP. In eLAP, the aspartate at position 42 of eMAP is replaced with a glutamate, the asparagine at position 46 of eMAP is replaced with a tryptophan, the valine at position 56 of eMAP is replaced with a threonine, the serine at position 57 of eMAP is replaced with an aspartate, the alanine at position 58 of eMAP is replaced with a serine, the cysteine at position 59 of eMAP is replaced with a leucine, the leucine at position 60 of eMAP is replaced with a threonine, the tyrosine at position 62 of eMAP is replaced with a histidine, the histidine at position 63 of eMAP is replaced with an asparagine, the tyrosine at position 65 of eMAP is replaced with an isoleucine, the proline at position 66 of eMAP is replaced with an aspartate, the lysine at position 67 of eMAP is replaced with a glycine, the serine at position 68 of eMAP is replaced with a histidine, the valine at position 69 of eMAP is replaced with a glycine, the cysteine at position 70 of eMAP is replaced with a serine, the isoleucine at position 81 of eMAP is replaced with a valine, the isoleucine at position 101 of eMAP is replaced with an arginine, the phenylalanine at position 177 of eMAP is replaced with a histidine, and the tryptophan at position 221 of eMAP is replaced with a serine. Alternative substitutions could be made at selected positions. For example, the valine at position 56 could instead be replaced by serine, the leucine at position 60 by serine, the tyrosine at position 65 by valine, the cysteine at position 70 by threonine, and the tryptophan at position 221 by threonine.
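The 19 eLAP substitutions listed above can be encoded as (position, wild-type residue, eLAP residue) tuples and applied to a sequence with a check that each wild-type residue is where the numbering says it should be. SEQ ID NO: 1 is not reproduced in this section, so the demonstration uses a hypothetical poly-glycine stand-in that carries only the stated wild-type residues; the helper names are also hypothetical. The alternative substitutions (for example, V56S or W221T) would be expressed by editing the corresponding tuple.

```python
# (position in wild-type eMAP, wild-type residue, eLAP residue), as listed above.
ELAP_SUBSTITUTIONS = [
    (42, "D", "E"), (46, "N", "W"), (56, "V", "T"), (57, "S", "D"),
    (58, "A", "S"), (59, "C", "L"), (60, "L", "T"), (62, "Y", "H"),
    (63, "H", "N"), (65, "Y", "I"), (66, "P", "D"), (67, "K", "G"),
    (68, "S", "H"), (69, "V", "G"), (70, "C", "S"), (81, "I", "V"),
    (101, "I", "R"), (177, "F", "H"), (221, "W", "S"),
]


def apply_substitutions(sequence: str, substitutions: list[tuple[int, str, str]]) -> str:
    """Return a copy of `sequence` with 1-based substitutions applied.

    Raises ValueError if a position is out of range or the residue found there
    is not the expected wild-type residue (catches numbering mistakes).
    """
    residues = list(sequence)
    for pos, wt, mut in substitutions:
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} outside sequence of length {len(residues)}")
        if residues[pos - 1] != wt:
            raise ValueError(f"expected {wt}{pos}, found {residues[pos - 1]}{pos}")
        residues[pos - 1] = mut
    return "".join(residues)


def mutation_labels(substitutions: list[tuple[int, str, str]]) -> list[str]:
    """Standard shorthand such as 'D42E' for each substitution."""
    return [f"{wt}{pos}{mut}" for pos, wt, mut in substitutions]


if __name__ == "__main__":
    # Hypothetical stand-in for SEQ ID NO: 1: poly-Gly carrying the stated
    # wild-type residues at the 19 listed positions.
    toy = ["G"] * 230
    for pos, wt, _ in ELAP_SUBSTITUTIONS:
        toy[pos - 1] = wt
    elap_like = apply_substitutions("".join(toy), ELAP_SUBSTITUTIONS)
    print(len(ELAP_SUBSTITUTIONS), "substitutions:", " ".join(mutation_labels(ELAP_SUBSTITUTIONS)))
    print("residues 56-70 after substitution:", elap_like[55:70])
```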

[0028] Accordingly, one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having a glutamate residue at a position corresponding to position 42 of wild-type E. coli methionine aminopeptidase (eMAP) (SEQ ID NO: 1), a tryptophan residue at a position corresponding to position 46 of wild-type eMAP, a threonine or serine residue at a position corresponding to position 56 of wild-type eMAP, an aspartate residue at a position corresponding to position 57 of wild-type eMAP, a serine residue at a position corresponding to position 58 of wild-type eMAP, a leucine residue at a position corresponding to position 59 of wild-type eMAP, a threonine or serine residue at a position corresponding to position 60 of wild-type eMAP, a histidine residue at a position corresponding to position 62 of wild-type eMAP, an asparagine residue at a position corresponding to position 63 of wild-type eMAP, an isoleucine or valine residue at a position corresponding to position 65 of wild-type eMAP, an aspartate residue at a position corresponding to position 66 of wild-type eMAP, a glycine residue at a position corresponding to position 67 of wild-type eMAP, a histidine residue at a position corresponding to position 68 of wild-type eMAP, a glycine residue at a position corresponding to position 69 of wild-type eMAP, a serine or threonine residue at a position corresponding to position 70 of wild-type eMAP, a valine residue at a position corresponding to position 81 of wild-type eMAP, an arginine residue at a position corresponding to position 101 of wild-type eMAP, a histidine residue at a position corresponding to position 177 of wild-type eMAP, and a serine or threonine residue at a position corresponding to position 221 of wild-type eMAP.

[0029] The remaining amino acid sequence of the NAAB comprises a sequence similar to that of wild-type eMAP, but which may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal leucine residues. For example, the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of wild-type eMAP (SEQ ID NO: 1), or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1.
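The text does not state how percent sequence identity is computed. One common convention is sketched below, assuming the two sequences are already aligned (gaps written as "-"): identical residues divided by the number of alignment columns in which at least one sequence has a residue. Other conventions, such as dividing by the length of the reference sequence, give different numbers, so this is an illustration rather than the definition used for the 80% threshold. Function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Convention assumed here: identical residues / columns where at least one
    sequence has a residue (columns that are gaps in both are skipped).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if identity is at least `threshold` percent (80% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Made-up fragments differing at one of ten positions.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```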

[0030] In some aspects of the present invention, the NAAB comprises the amino acid sequence of SEQ ID NO: 2. For example, the NAAB can consist of the amino acid sequence of SEQ ID NO: 2.

[0031] The NAAB preferably selectively binds to N-terminal leucine residues with at least about a 1.5:1 ratio of specific to non-specific binding, more preferably about a 2:1 ratio of specific to non-specific binding. Non-specific binding refers to background binding, and is the amount of signal that is produced when the amino acid target of the NAAB is not present at the N-terminus of an immobilized peptide.
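The specific-to-non-specific ratio defined above is a simple quotient: signal measured with the target residue at the N-terminus divided by background signal measured when it is absent. The sketch below computes that ratio and compares it with a chosen minimum; the default of 1.5 mirrors the lower bound stated for eLAP, and stricter bounds (such as the 2:1 through 13:1 figures given for the methionine NAAB) can be passed in. The signal values in the example are made up, and the function names are hypothetical.

```python
def specificity_ratio(specific_signal: float, background_signal: float) -> float:
    """Signal with the target residue at the N-terminus divided by background
    (non-specific) signal recorded when the target residue is absent."""
    if background_signal <= 0:
        raise ValueError("background signal must be positive")
    return specific_signal / background_signal


def passes(specific_signal: float, background_signal: float, minimum: float = 1.5) -> bool:
    """True if the specific:non-specific ratio reaches `minimum` (e.g. 1.5 for 1.5:1)."""
    return specificity_ratio(specific_signal, background_signal) >= minimum


if __name__ == "__main__":
    # Made-up signal values for illustration only.
    print(specificity_ratio(300.0, 150.0), passes(300.0, 150.0))
```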

B. tRNA Synthetase-Based NAABs

1. N-Terminal Methionine Binding Protein

[0032] NAABs can also be made by introducing mutations into class I and class II tRNA synthetases (RSs). NAABs for use in the polypeptide sequencing processes described herein should possess high affinity and specificity for amino acids at the N-terminus of peptides. Because tRNA synthetases have intrinsic specificity for free amino acids, they are useful scaffolds for developing NAABs for use in protein sequencing. The inherent specificity of these scaffold proteins is retained, while broadening the binding capabilities of these proteins from free monomers to peptides, and removing unnecessary domains or functions. The Protein Data Bank contains multiple crystal structures for RSs specific for all twenty canonical amino acids. Moreover, unlike other classes of amino acid binding molecules, such as riboswitches, RSs do not envelop the entire amino acid, as the C-terminus must be available for adenylation. The binding pocket in these molecules can be modified to permit the entry of peptides presenting the specifically bound amino acid. This results in a complete set of engineered RS fragments that can bind to their cognate amino acids at the N-termini of peptides.

[0033] The class I RS proteins form a distinct structural family that is identified by sequence homology and has been extensively characterized both biochemically and biophysically. RS proteins possess a modular architecture, and the domains conferring specificity for a particular amino acid are readily identified (18). Several types of mutations can be introduced to improve the performance of an RS amino acid binding domain as a NAAB. First, one or more mutations can be introduced into the binding domain to lock the domain into the bound conformation, eliminating the energetic cost of any induced conformational change (16). Second, one or more mutations can be introduced to widen the binding pocket for the amino acid, making room for entry of a peptide. This approach can be used for each of the RS proteins.
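One way to locate the specificity domain in a new synthetase is to align it with a reference synthetase whose domain boundaries are already known and then transfer those boundaries through the alignment. The sketch below shows the pairwise case with a Needleman-Wunsch global alignment. It assumes toy scoring (match +1, mismatch -1, linear gap -2) in place of a substitution matrix such as BLOSUM62 and the affine gap penalties used by practical tools, and a multiple sequence alignment would need a dedicated program. The sequences and domain boundaries in the example are made up, and the function names are hypothetical.

```python
from typing import Optional


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str]:
    """Global pairwise alignment with a linear gap penalty; returns gapped strings."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def map_domain(ref_aln: str, tgt_aln: str, start: int, end: int) -> Optional[tuple[int, int]]:
    """Map a 1-based inclusive reference domain [start, end] onto target numbering.

    Returns the first and last target positions aligned to reference residues
    inside the domain, or None if no target residue aligns there.
    """
    ref_pos = tgt_pos = 0
    hits = []
    for r, t in zip(ref_aln, tgt_aln):
        if r != "-":
            ref_pos += 1
        if t != "-":
            tgt_pos += 1
        if r != "-" and t != "-" and start <= ref_pos <= end:
            hits.append(tgt_pos)
    return (hits[0], hits[-1]) if hits else None


if __name__ == "__main__":
    # Made-up sequences; the "known" reference domain is residues 4-6 (EFG).
    ref_aln, tgt_aln = needleman_wunsch("ACDEFGHIK", "ACDWEFGHIK")
    print(ref_aln)
    print(tgt_aln)
    print(map_domain(ref_aln, tgt_aln, 4, 6))
```

In this example the one-residue insertion in the target shifts the mapped domain from reference positions 4-6 to target positions 5-7.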

[0034] For example, mutations can be introduced into methionyl-tRNA synthetase (MetRS) from E. coli to create a NAAB that binds specifically to N-terminal methionine residues. This NAAB comprises a truncated version of wild-type E. coli MetRS (residues 4-547; SEQ ID NO: 3) having four substitution mutations as compared to the wild-type sequence (SEQ ID NO: 5). The sequence of this N-terminal methionine-specific NAAB is provided by SEQ ID NO: 4. In particular, in the methionine-specific NAAB, the leucine at position 13 of wild-type E. coli MetRS is replaced with a serine (L13S), the tyrosine at position 260 is replaced with a leucine (Y260L), the aspartic acid at position 296 is replaced with a glycine (D296G), and the histidine at position 301 is replaced with a leucine (H301L).
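Because SEQ ID NO: 3 starts at residue 4 of full-length MetRS, positions quoted in full-length numbering shift by three when expressed relative to the truncated construct. The sketch below performs that conversion, assuming SEQ ID NO: 3 carries no additional N-terminal residues such as a tag. Under that assumption the four mutations map to positions 10, 257, 293, and 298, which matches the positions recited in claim 95. Constant and function names are hypothetical.

```python
TRUNCATION_START = 4    # SEQ ID NO: 3 begins at residue 4 of full-length E. coli MetRS
TRUNCATION_END = 547    # ... and ends at residue 547

# (wild-type residue, full-length position, NAAB residue) from the paragraph above.
MET_NAAB_MUTATIONS = [("L", 13, "S"), ("Y", 260, "L"), ("D", 296, "G"), ("H", 301, "L")]


def to_truncated_numbering(full_position: int) -> int:
    """Convert full-length MetRS numbering to 1-based numbering within SEQ ID NO: 3."""
    if not TRUNCATION_START <= full_position <= TRUNCATION_END:
        raise ValueError(f"residue {full_position} is outside the truncated construct")
    return full_position - TRUNCATION_START + 1


if __name__ == "__main__":
    for wt, pos, mut in MET_NAAB_MUTATIONS:
        print(f"{wt}{pos}{mut} -> {wt}{to_truncated_numbering(pos)}{mut}")
```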

[0035] Accordingly, one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having a serine residue at a position corresponding to position 13 of wild-type E. coli methionyl-tRNA synthetase (MetRS); a leucine residue at a position corresponding to position 260 of wild-type E. coli MetRS; a glycine residue at a position corresponding to position 296 of wild-type E. coli MetRS; and a leucine residue at a position corresponding to position 301 of wild-type E. coli MetRS.

[0036] The remaining amino acid sequence of the NAAB comprises a sequence similar to that of amino acids 4-547 of wild-type MetRS, but may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal methionine residues. For example, the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of SEQ ID NO: 3, or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3.
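Percent identity can be computed in several ways, and the text above does not specify a convention. The sketch below assumes one common choice: identical aligned positions divided by the ungapped length of the reference sequence (standing in here for SEQ ID NO: 3), computed from a pair of pre-aligned strings with '-' as the gap character. The function name and example sequences are illustrative only.

```python
def percent_identity(aligned_query: str, aligned_reference: str) -> float:
    """Identical aligned positions as a percentage of the reference's ungapped length."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    reference_length = sum(1 for residue in aligned_reference if residue != "-")
    if reference_length == 0:
        raise ValueError("reference contains no residues")
    identical = sum(
        1
        for query_residue, reference_residue in zip(aligned_query, aligned_reference)
        if reference_residue != "-" and query_residue == reference_residue
    )
    return 100.0 * identical / reference_length


# Toy example: 5 of the 7 reference residues are matched identically.
identity = percent_identity("MKTA-LS", "MKSAELS")
print(f"{identity:.1f}% identity; meets 80% threshold: {identity >= 80.0}")
```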

[0037] In certain aspects of the invention, the NAAB comprises the amino acid sequence of SEQ ID NO: 4. For example, the NAAB can consist of the amino acid sequence of SEQ ID NO: 4.

[0038] The NAAB preferably selectively binds to N-terminal methionine residues with at least about a 2:1 ratio of specific to non-specific binding, more preferably at least about a 7:1 ratio, at least about a 10:1 ratio, or about a 13:1 ratio of specific to non-specific binding.
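The preferred specificity tiers in [0038] are simple ratios. As an illustrative sketch, assuming the specific and non-specific signals are comparable measurements (for example, counts of fluorescent spots on cognate and non-cognate peptide surfaces, which is an assumption rather than a protocol stated here), the check reduces to the following; the names are hypothetical.

```python
# Preferred specific:non-specific ratios named in [0038] for the Met NAAB.
# "About" is treated as an exact cut-off here, which is a simplification.
MET_NAAB_TIERS = (2.0, 7.0, 10.0, 13.0)


def specificity_ratio(specific_signal: float, nonspecific_signal: float) -> float:
    """Ratio of binding signal on the cognate peptide to signal on a non-cognate control."""
    if nonspecific_signal <= 0:
        raise ValueError("non-specific signal must be positive")
    return specific_signal / nonspecific_signal


def tiers_met(ratio: float, tiers: tuple[float, ...] = MET_NAAB_TIERS) -> list[float]:
    """Return every preferred ratio that the measured ratio reaches."""
    return [tier for tier in tiers if ratio >= tier]


# Hypothetical signals: 150 units on the cognate peptide, 20 on the control.
print(tiers_met(specificity_ratio(150, 20)))
```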

2. N-Terminal Phenylalanine Binding Protein

[0039] The starting point for the phenylalanine NAAB (Phe NAAB) was the phenylalanine-tRNA synthetase (PheRS) from Thermus thermophilus, for which a crystal structure is available. Normally the operational unit is a tetramer with two copies each of two separate proteins. Only one of the proteins has the amino acid binding specificity, so a model was made of one copy of that protein in isolation. The N-terminus of the protein was truncated, which exposed a significant amount of surface area that was previously buried in contacts with other proteins. This surface was hydrophobic, so mutations were made on the surface to make the protein stable and soluble as a monomer. Tighter binding of the mutant to peptides was observed when compared to the wild-type protein.

[0040] For example, mutations can be introduced into PheRS from Thermus thermophilus to create a NAAB that binds specifically to N-terminal phenylalanine residues. This NAAB comprises a truncated version of wild-type Thermus thermophilus PheRS (residues 86-350; SEQ ID NO: 6) having 22 substitution mutations as compared to the wild-type sequence. The sequence of this N-terminal phenylalanine-specific NAAB is provided by SEQ ID NO: 7. In particular, the Phe NAAB has substitutions at the amino acid positions corresponding to positions 100, 142, 143, 152-154, 165, 205, 212, 228-232, 234, 257, 287, 289, 303, 336, 338, and 340 of wild-type PheRS. In the NAAB, the leucine at position 100 of PheRS is replaced with an aspartate, the histidine at position 142 of PheRS is replaced with an asparagine, the histidine at position 143 of PheRS is replaced with a glycine, the phenylalanine at position 152 of PheRS is replaced with a valine, the tryptophan at position 153 of PheRS is replaced with a glycine, the leucine at position 154 of PheRS is replaced with a lysine, the leucine at position 165 of PheRS is replaced with an aspartate, the phenylalanine at position 205 of PheRS is replaced with an alanine, the histidine at position 212 of PheRS is replaced with an alanine, the isoleucine at position 228 of PheRS is replaced with a valine, the alanine at position 229 of PheRS is replaced with an asparagine, the methionine at position 230 of PheRS is replaced with a glutamate, the alanine at position 231 of PheRS is replaced with a glycine, the histidine at position 232 of PheRS is replaced with an aspartate, the lysine at position 234 of PheRS is replaced with a tyrosine, the tyrosine at position 257 of PheRS is replaced with a threonine, the histidine at position 287 of PheRS is replaced with a glycine, the lysine at position 289 of PheRS is replaced with an asparagine, the leucine at position 303 of PheRS is replaced with an aspartate, the phenylalanine at position 336 of PheRS is replaced with an alanine, the glycine at position 338 of PheRS is replaced with a threonine, and the leucine at position 340 of PheRS is replaced with a glycine.
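For reference, the 22 substitutions listed in [0040] can be restated in compact notation. The short Python sketch below encodes the wild-type and NAAB residues exactly as listed above (wild-type T. thermophilus PheRS numbering) and prints the corresponding mutation codes; the variable and function names are illustrative.

```python
# Wild-type residue -> Phe NAAB residue at each PheRS position listed in [0040].
PHE_NAAB_SUBSTITUTIONS = {
    100: ("L", "D"), 142: ("H", "N"), 143: ("H", "G"), 152: ("F", "V"),
    153: ("W", "G"), 154: ("L", "K"), 165: ("L", "D"), 205: ("F", "A"),
    212: ("H", "A"), 228: ("I", "V"), 229: ("A", "N"), 230: ("M", "E"),
    231: ("A", "G"), 232: ("H", "D"), 234: ("K", "Y"), 257: ("Y", "T"),
    287: ("H", "G"), 289: ("K", "N"), 303: ("L", "D"), 336: ("F", "A"),
    338: ("G", "T"), 340: ("L", "G"),
}


def mutation_codes(substitutions: dict[int, tuple[str, str]]) -> list[str]:
    """Format substitutions as codes such as 'L100D', sorted by position."""
    return [f"{wild_type}{position}{new}"
            for position, (wild_type, new) in sorted(substitutions.items())]


codes = mutation_codes(PHE_NAAB_SUBSTITUTIONS)
# Every substituted position lies inside the truncated fragment (residues 86-350).
assert all(86 <= position <= 350 for position in PHE_NAAB_SUBSTITUTIONS)
print(len(codes), " ".join(codes))
```

Running it prints the count (22) followed by the codes from L100D through L340G in position order.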

[0041] Accordingly, one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having an aspartate residue at a position corresponding to position 100 of wild-type PheRS from Thermus thermophilus (SEQ ID NO: 8), an asparagine residue at a position corresponding to position 142 of wild-type PheRS, a glycine residue at a position corresponding to position 143 of wild-type PheRS, a valine residue at a position corresponding to position 152 of wild-type PheRS, a glycine residue at a position corresponding to position 153 of wild-type PheRS, a lysine residue at a position corresponding to position 154 of wild-type PheRS, an aspartate residue at a position corresponding to position 165 of wild-type PheRS, an alanine residue at a position corresponding to position 205 of wild-type PheRS, an alanine residue at a position corresponding to position 212 of wild-type PheRS, a valine residue at a position corresponding to position 228 of wild-type PheRS, an asparagine residue at a position corresponding to position 229 of wild-type PheRS, a glutamate residue at a position corresponding to position 230 of wild-type PheRS, a glycine residue at a position corresponding to position 231 of wild-type PheRS, an aspartate residue at a position corresponding to position 232 of wild-type PheRS, a tyrosine residue at a position corresponding to position 234 of wild-type PheRS, a threonine residue at a position corresponding to position 257 of wild-type PheRS, a glycine residue at a position corresponding to position 287 of wild-type PheRS, an asparagine residue at a position corresponding to position 289 of wild-type PheRS, an aspartate residue at a position corresponding to position 303 of wild-type PheRS, an alanine residue at a position corresponding to position 336 of wild-type PheRS, a threonine residue at a position corresponding to position 338 of wild-type PheRS, and a glycine residue at a position corresponding to position 340 of wild-type PheRS.

[0042] The remaining amino acid sequence of the NAAB comprises a sequence similar to that of wild-type PheRS, but which may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal phenylalanine residues. For example, the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of truncated wild-type PheRS (SEQ ID NO: 6), or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO:6.

[0043] In some aspects of the present invention, the NAAB comprises the amino acid sequence of SEQ ID NO: 7. For example, the NAAB can consist of the amino acid sequence of SEQ ID NO: 7.

[0044] The NAAB preferably selectively binds to N-terminal phenylalanine residues with at least about a 1.5:1 ratio of specific to non-specific binding, more preferably about a 2:1 ratio of specific to non-specific binding.

3. N-Terminal Histidine Binding Protein

[0045] The starting point for the histidine NAAB (His NAAB) was the histidine-tRNA synthetase (HisRS) from E. coli, for which a crystal structure is available. A fragment of wild-type HisRS spanning residues 1-320 was previously shown by others to be monomeric. After inspection of the crystal structure, further residues were truncated from both ends; the initial fragment tested spans residues Lys3 to Ala180. Protein design was then used to replace a long loop near the binding site with a shorter loop, creating a more open pocket and tighter binding to N-terminal histidine residues. This involved the removal of 7 residues (Arg113 to Lys119) and two substitutions: the arginine at position 121 of HisRS is replaced with an asparagine (R121N), and the tyrosine at position 122 of HisRS is replaced with an alanine (Y122A). Thus, this NAAB comprises a truncated version of wild-type E. coli HisRS (residues 3-180; SEQ ID NO: 9) having a seven-residue deletion and two substitution mutations as compared to the wild-type sequence. The sequence of this N-terminal histidine-specific NAAB is provided by SEQ ID NO: 10.
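Because the loop deletion shifts every downstream residue, it can help to keep wild-type numbering throughout the construction. The following minimal sketch assembles the His NAAB from a full-length HisRS sequence as described in [0045]: keep residues 3-180, delete residues 113-119, and apply R121N and Y122A. It checks the residues named in the text (Lys3, Arg113, Lys119, Arg121, Tyr122, Ala180) but does not contain the real HisRS sequence; the input and the function name `build_his_naab` are hypothetical.

```python
# Residues named in [0045], in wild-type E. coli HisRS numbering.
EXPECTED_WILD_TYPE = {3: "K", 113: "R", 119: "K", 121: "R", 122: "Y", 180: "A"}
FRAGMENT = range(3, 181)               # residues 3-180 inclusive
DELETED_LOOP = range(113, 120)         # residues 113-119 inclusive (7 residues)
SUBSTITUTIONS = {121: "N", 122: "A"}   # R121N and Y122A


def build_his_naab(wild_type: str) -> str:
    """Build the His NAAB from a full-length HisRS sequence (residue 1 at index 0)."""
    numbered = dict(enumerate(wild_type, start=1))
    for position, residue in EXPECTED_WILD_TYPE.items():
        if numbered.get(position) != residue:
            raise ValueError(f"expected {residue} at wild-type position {position}")
    # Keep the 3-180 fragment minus the loop, still keyed by wild-type position.
    kept = {p: numbered[p] for p in FRAGMENT if p not in DELETED_LOOP}
    kept.update(SUBSTITUTIONS)
    return "".join(kept[p] for p in sorted(kept))
```

With any input that passes the checks, the result is 171 residues long (the 178-residue fragment minus the seven deleted residues), and the former positions 121 and 122 sit at 0-based indices 111 and 112.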

[0046] Accordingly, one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having an asparagine residue at a position corresponding to position 121 of wild-type HisRS from E. coli (SEQ ID NO: 9) and an alanine residue at a position corresponding to position 122 of wild-type HisRS.

[0047] The remaining amino acid sequence of the NAAB comprises a sequence similar to that of wild-type HisRS, but which may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal histidine residues. For example, the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of truncated wild-type HisRS (SEQ ID NO: 9), or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 9.

[0048] In some aspects of the present invention, the NAAB comprises the amino acid sequence of SEQ ID NO: 10. For example, the NAAB can consist of the amino acid sequence of SEQ ID NO: 10.

[0049] The NAAB preferably selectively binds to N-terminal histidine residues with at least about a 1.5:1 ratio of specific to non-specific binding, more preferably about a 2:1 ratio of specific to non-specific binding.

4. Other NAABs

[0050] Full-length wild-type synthetases from E. coli, or truncated fragments thereof, may be used as NAABs for the remaining amino acids. See Table A for the sequences of each of the NAABs. Accordingly, in some aspects of the present invention, the NAAB comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 11; SEQ ID NO: 12; SEQ ID NO: 13; SEQ ID NO: 14; SEQ ID NO: 15; SEQ ID NO: 16; SEQ ID NO: 17; SEQ ID NO: 18; SEQ ID NO: 19; SEQ ID NO: 20; SEQ ID NO: 21; SEQ ID NO: 22; SEQ ID NO: 23; SEQ ID NO: 24; SEQ ID NO: 25; SEQ ID NO: 26; SEQ ID NO: 27; and SEQ ID NO: 28. In various embodiments, a set of NAABs comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more of the amino acid sequences of SEQ ID NO: 2; SEQ ID NO: 4; SEQ ID NO: 7; SEQ ID NO: 10; SEQ ID NO: 11; SEQ ID NO: 12; SEQ ID NO: 13; SEQ ID NO: 14; SEQ ID NO: 15; SEQ ID NO: 16; SEQ ID NO: 17; SEQ ID NO: 18; SEQ ID NO: 19; SEQ ID NO: 20; SEQ ID NO: 21; SEQ ID NO: 22; SEQ ID NO: 23; SEQ ID NO: 24; SEQ ID NO: 25; SEQ ID NO: 26; SEQ ID NO: 27; and SEQ ID NO: 28. For example, a set of NAABs comprises the amino acid sequences of SEQ ID NO: 2; SEQ ID NO: 4; SEQ ID NO: 7; SEQ ID NO: 10; SEQ ID NO: 11; SEQ ID NO: 12; SEQ ID NO: 13; SEQ ID NO: 14; SEQ ID NO: 15; SEQ ID NO: 16; SEQ ID NO: 17; SEQ ID NO: 18; SEQ ID NO: 19; SEQ ID NO: 20; SEQ ID NO: 21; SEQ ID NO: 22; SEQ ID NO: 23; SEQ ID NO: 24; SEQ ID NO: 25; SEQ ID NO: 26; SEQ ID NO: 27; and SEQ ID NO: 28.

C. NAABs for PITC-Derivatized Lysine

[0051] The phenyl isothiocyanate (PITC) reagent used to activate peptide N-termini for stepwise degradation also reacts with the Nε atom in the lysine side chain. As a result, domains derived from lysine tRNA synthetase (LysRS) proteins cannot be used for specific recognition of modified lysine, and a NAAB that is specific for PITC-derivatized lysine is therefore required. The class II RS for pyrrolysine (PylRS) served as a starting point for development. Pyrrolysine is a lysine derivative that possesses a pyrroline ring attached to the Nε atom by an amide linkage (Structure A). Crystal structures have been determined for PylRS bound to several ligands (23), one of which is one bond longer than pyrrolysine (Structure B) and possesses steric similarity to a model of PITC-derivatized lysine (Structure C).

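[Chemical structures: (A) pyrrolysine; (B) a PylRS ligand one bond longer than pyrrolysine; (C) a model of PITC-derivatized lysine.]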

[0052] Genomic DNA for the archaeon Methanosarcina mazei, the source organism for the crystal structure, will be obtained from the American Type Culture Collection (ATCC). The gene will be cloned and expressed. The relevant substrate for assessing compatibility with the DAPES strategy is a peptide with an N-terminal lysine that has been modified with PITC on its side chain, but not on its amino terminus. It is expected that the side chain will be derivatized during previous cycles, but that the N-terminus will be regenerated by the cleavage step of the preceding cycle. A peptide with the sequence DKGMMGSSC will be obtained. The peptide will be derivatized with PITC, modifying both the N-terminus and the side chain of the lysine at the second position. The modified N-terminal aspartate residue will then be removed with the designed enzyme, which has excellent activity against PITC-modified aspartate. The resulting peptide, with an N-terminal lysine modified only on its side chain, will be purified from the reaction mixture by HPLC. The peptide will then be immobilized on the nanogel surface via its C-terminal cysteine. The liberated PylRS domain will be fluorescently labeled with Cy5 and assayed for binding to the immobilized peptide.

[0053] In the event that the engineered domain exhibits poor binding, a structural model of the NAAB in complex with pyrrolysine will be constructed using the crystal structure as a template. Computational design will be performed with the program RosettaDesign (24) to optimize the shape complementarity between the protein and the amino acid. We will introduce the suggested mutations into the gene for the NAAB, express and purify the protein, and reassess the binding properties of the new mutant NAAB.

D. NAABs for Phosphorylated Amino Acids

[0054] In accordance with various aspects of the present invention, the NAABs may also include reagents capable of specifically binding to phosphorylated N-terminal amino acids (e.g., phosphotyrosine, phosphoserine, and phosphothreonine).

[0055] The proteome is elaborated by post-translational modifications. These marks are reversible and provide a snapshot of the current state of a cell with respect to signaling pathways and other regulatory control. Side chain phosphorylation, which primarily occurs on tyrosine, serine, and threonine residues, is a well-known post-translational modification. However, characterization of phosphorylated amino acids by mass spectrometry is difficult: phosphate groups can be altered or lost during the ionization process, and sample enrichment is typically required to cope with issues of dynamic range (2). Digital protocols (e.g., DAPES) improve the identification of phosphorylated amino acids because of the greater dynamic range and mild buffer conditions afforded by the present invention. Moreover, the ability to distinguish between phosphorylated and unphosphorylated amino acids could have a substantial impact on the characterization of cellular and disease states.

[0056] NAABs that specifically bind to phosphoserine, phosphotyrosine, or phosphothreonine can be made by modifying certain tRNA synthetases to include one or more mutations. For example, methanogenic archaea possess an RS for phosphoserine. In contrast to most organisms, methanogenic archaea lack a CysRS. In these organisms, phosphoserine (Sep) is first ligated to the tRNA for cysteine, and then converted to Cys-tRNA in a subsequent step. A crystal structure of SepRS, a class II synthetase, in complex with Sep is available from the PDB (pdb code: 2DU3 (36)).

[0057] While there are no known phosphotyrosine tRNA synthetases, RSs for several chemically similar analogs have been obtained via directed evolution (37-39). The class I TyrRS from Methanococcus jannaschii is the parental protein for these mutants, and a crystal structure is available for engineering (pdb code: 1U7D (apo), 1J1U (holo)). There are several relevant mutant RSs, most notably for sulfotyrosine (37), p-acetyl-L-phenylalanine (pAF), and p-carboxymethyl-L-phenylalanine (pCMF).

[0058] Given the stereochemical similarity between phosphate and sulfate, and the fact that phosphatases and phosphoryltransferases often accept sulfates and sulfuryl groups as substrates (40), it has been found that the sulfotyrosine RS will recognize phosphotyrosine without further modification. The pAF RS, for which a crystal structure is available (pdb code: 1ZH6), differs from the sulfotyrosine RS at only two residues (38). Thus, if necessary, a template is available for structural modeling and further protein engineering.

[0059] There are no reported pThrRSs or previously engineered RSs that recognize pThr analogs. Consequently, generation of a pThrRS may require more extensive protein engineering. We will approach this task from two directions. First, we will use computational design to widen the binding pocket of SepRS to accommodate the additional methyl group present in pThr. Second, we will use the motif-directed design approach to graft previously observed phosphate-binding interactions into the binding pocket of ThrRS. The PDB contains hundreds of examples of binding interactions involving phosphotyrosine (308 examples), phosphoserine (385), and phosphothreonine (325) that are suitable for building a motif library of protein-phosphate interactions. The same design protocol successfully used to switch the specificity of eMAP to eLAP will be applied to transplant these interaction motifs into E. coli ThrRS. Mutagenesis of SepRS and ThrRS proteins will be performed using the QuikChange protocol. We will purchase a peptide with the sequence pTGMMGSSC for attachment to the nanogel surface and characterization of binding by single-molecule detection.

[0060] It is expected that a NAAB for pThr may also bind to N-terminal pSer. If so, this NAAB can be used for both pThr and pSer, and the specific amino acid can then be inferred by evaluating the surrounding sequence to map the peptide onto a reference proteome library. Alternatively, if de novo, phosphorylation-sensitive sequencing is required, then the efficacy of applying a pSer NAAB, detecting binding, and then applying a pThr NAAB without an intervening wash step will be assessed. Bound pSer termini will remain blocked by the pSer NAAB, so only newly appearing fluorescent spots will be identified as pThr residues.
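The sequential pSer/pThr read-out described above amounts to a set difference between two images. A minimal sketch, assuming each fluorescent spot has already been assigned a stable identifier across both images (the image registration step is not described here and is an assumption), is shown below; the function name and the "lost" diagnostic category are hypothetical.

```python
def classify_phospho_spots(spots_after_pser: set[int],
                           spots_after_pthr: set[int]) -> dict[str, set[int]]:
    """Assign spots from two sequential images taken without an intervening wash.

    spots_after_pser: IDs of spots fluorescent after applying the pSer NAAB.
    spots_after_pthr: IDs of spots fluorescent after then adding the pThr NAAB.
    """
    pser = set(spots_after_pser)
    pthr = set(spots_after_pthr) - pser   # only newly appearing spots are called pThr
    lost = pser - set(spots_after_pthr)   # pSer spots that disappeared (e.g. dissociation)
    return {"pSer": pser, "pThr": pthr, "lost": lost}


# Hypothetical spot IDs: spots 1 and 4 light up first; spot 7 appears only in the second image.
print(classify_phospho_spots({1, 4}, {1, 4, 7}))
```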

E. Fluorophores

[0061] In accordance with various aspects of the present invention, the NAABs are fluorescently labeled such that when a NAAB binds to an amino acid, fluorescence can be detected. Fluorophores useful as fluorescent labels on the NAABs include, but are not limited to, Cy3 and Cy5. The fluorophores are usually coupled nonspecifically to free amine groups (e.g., lysine side chains) of the NAABs.

II. Method of Making NAABs by Introducing Mutations into tRNA Synthetase Proteins

[0062] The present invention also relates to a method for making a NAAB by introducing mutations into the amino acid sequence of a tRNA synthetase (RS) to produce a NAAB that selectively binds to a particular N-terminal amino acid. For example, such methods can involve introducing one or more mutations into a naturally occurring RS (e.g., into a wild-type E. coli RS). Such methods can also involve introducing one or more additional mutations into an RS that already includes one or more amino acid mutations in its sequence as compared to the sequence of a corresponding wild-type RS.

[0063] The methods for making NAABs comprise identifying the amino acid binding domain of a tRNA synthetase, introducing one or more mutations into the amino acid binding domain to create a NAAB, and assaying the NAAB for specific binding to an N-terminal amino acid of a polypeptide.

[0064] Where the tRNA synthetase is a class I tRNA synthetase, identification of the amino acid binding domain can be accomplished, for example, by constructing a sequence alignment that aligns pairwise the amino acid sequences of two or more class I tRNA synthetases with one another, wherein one of the class I tRNA synthetases has a previously defined amino acid binding domain. This allows for identification of corresponding sequence positions between proteins in order to share useful mutations between NAABs. Thus, in certain aspects of these methods, the tRNA synthetase is a first class I tRNA synthetase and the identifying step comprises aligning an amino acid sequence of the first class I tRNA synthetase with an amino acid sequence of a second class I tRNA synthetase having a previously defined amino acid binding domain. For example, the amino acid binding domain of E. coli MetRS is known to be encompassed within amino acids 4 to 547 of the protein. Thus, the amino acid sequence of the second class I tRNA synthetase can comprise the amino acid sequence of full-length E. coli MetRS (SEQ ID NO: 5) or a fragment thereof which includes the amino acid binding domain. In addition, the amino acid sequence of the second class I tRNA synthetase can comprise a wild-type sequence or can comprise a sequence containing one or more mutations, so long as the presence of the mutations does not significantly impair the ability of the sequence to align with other class I tRNA synthetases. For example, the amino acid sequence of the second class I tRNA synthetase can comprise the amino acid sequence of the engineered MetRS fragment described above (SEQ ID NO: 4), which contains four amino acid substitutions as compared to the corresponding fragment of wild-type E. coli MetRS. The identifying step can comprise aligning the amino acid sequence of full-length E. coli MetRS (SEQ ID NO: 5) or a fragment thereof which includes the amino acid binding domain with a class I tRNA synthetase selected from the group consisting of arginine, cysteine, glutamate, glutamine, isoleucine, leucine, lysine, methionine, tyrosine, tryptophan, and valine.

[0065] The method can also involve constructing a multiple sequence alignment that aligns the amino acid sequences of the first class I tRNA synthetase, the second class I tRNA synthetase, and at least one additional class I tRNA synthetase. For example, the multiple sequence alignment can align the sequences of at least five, at least seven, or at least nine class I tRNA synthetases. Thus, the multiple sequence alignment can align the amino acid sequence of full-length E. coli MetRS (SEQ ID NO: 5) or a fragment thereof which includes the amino acid binding domain with the amino acid sequences of at least two other class I tRNA synthetases selected from the group consisting of arginine, cysteine, glutamate, glutamine, isoleucine, leucine, lysine, methionine, tyrosine, tryptophan, and valine.

[0066] Following alignment of an amino acid sequence of a first class I tRNA synthetase with an amino acid sequence of a second class I tRNA synthetase having a previously defined amino acid binding domain, the boundaries of the amino acid binding domain of the first class I tRNA synthetase can be identified using the known boundaries of the amino acid binding domain in the second class I tRNA synthetase as a guide.

[0067] Once the amino acid binding domain of a given class I tRNA synthetase has been identified, mutations homologous to the four substitution mutations present in the engineered MetRS fragment described above are introduced into the amino acid binding domain of the class I tRNA synthetase. Thus, for each class I tRNA synthetase, the residues at the positions corresponding to positions 13, 260, 296, and 301 of wild-type E. coli MetRS are replaced with serine, leucine, glycine, and leucine, respectively, mirroring the L13S, Y260L, D296G, and H301L substitutions of the MetRS-derived NAAB.
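Transferring domain boundaries ([0066]) and the four MetRS substitution positions ([0067]) to another synthetase both reduce to reading corresponding residue numbers off an alignment. The sketch below, built around the hypothetical function `map_positions`, walks a pair of aligned strings and returns, for each reference residue, the aligned residue number in the target, or None where the target has a gap.

```python
from typing import Dict, Optional


def map_positions(aligned_reference: str, aligned_target: str,
                  reference_start: int = 1, target_start: int = 1) -> Dict[int, Optional[int]]:
    """Map each reference residue number to the aligned target residue number.

    Gaps are written as '-'. A reference residue aligned to a gap maps to None.
    The *_start arguments give the residue number of the first residue in each
    aligned string, so fragments keep their full-length numbering.
    """
    if len(aligned_reference) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, Optional[int]] = {}
    reference_position = reference_start - 1
    target_position = target_start - 1
    for reference_residue, target_residue in zip(aligned_reference, aligned_target):
        if target_residue != "-":
            target_position += 1
        if reference_residue != "-":
            reference_position += 1
            mapping[reference_position] = target_position if target_residue != "-" else None
    return mapping


# Toy alignment (not real synthetase sequences).
print(map_positions("MKL-TE", "MR-STE"))
```

For a MetRS-based alignment, `[mapping.get(p) for p in (13, 260, 296, 301)]` would give candidate positions for the homologous substitutions, and `mapping.get(4)` and `mapping.get(547)` the approximate domain boundaries; a None at any of these positions would need to be resolved by inspecting the alignment or the structure.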

[0068] The binding affinity of each NAAB containing these mutations against a panel of N-terminal amino acids can be predicted in silico using a computer modeling program (e.g., the Rosetta modeling program). Any NAAB with significant predicted cross-binding with undesired target peptides can be subjected to computational redesign for specificity using a multi-state strategy (11). For example, the computational redesign may identify one or more additional mutations likely to improve the binding specificity of the NAAB for a particular N-terminal amino acid. In this approach, structural models of the NAAB in complex with both the desired and undesired amino acids are constructed in silico.
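The in silico screen described above produces a predicted binding score for the NAAB against each N-terminal residue. The sketch below shows only the post-processing step of flagging likely cross-binders from such a table; it does not call Rosetta or implement multi-state design, and the energy values, units, and the `margin` cut-off are hypothetical.

```python
def cross_binders(predicted_energy: dict[str, float], cognate: str, margin: float) -> list[str]:
    """List non-cognate N-terminal residues predicted to bind almost as well as the cognate.

    Energies follow the usual convention that more negative means tighter binding.
    A residue is flagged when its energy is less than `margin` above the cognate's
    (residues predicted to bind more tightly than the cognate are always flagged).
    """
    cognate_energy = predicted_energy[cognate]
    return sorted(
        residue
        for residue, energy in predicted_energy.items()
        if residue != cognate and energy - cognate_energy < margin
    )


# Hypothetical predicted energies (arbitrary units) for a Met-directed NAAB.
energies = {"M": -12.0, "L": -11.5, "I": -8.0, "G": -3.0}
print(cross_binders(energies, cognate="M", margin=2.0))
```

Any residue returned here would be a candidate "undesired" state for the specificity redesign.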

[0069] If computational redesign identifies any further mutations as being likely to improve the binding specificity of the NAAB for a particular N-terminal amino acid, such mutations can be introduced into the NAAB.

[0070] Similar methods can be used to identify the amino acid binding domains of the class II RSs and introduce mutations into those domains to produce NAABs that selectively bind to N-terminal amino acids that are activated by class II RSs (Ala, Pro, Ser, Thr, His, Asp, Asn, Lys, Gly, and Phe).

[0071] The catalytic domain of class II RS proteins contains the amino acid specificity for the enzyme, and these domains can be used as a starting point for developing additional NAABs. Although class II RSs function as multimers, the catalytic domain of the HisRS from E. coli can be made monomeric by liberating it from its activation domain (20). The crystal structure of the enzyme in complex with histidyl-adenylate is available (pdb code 1KMM (21)), and can serve as a basis for computational structure-based design. At least one RS crystal structure is available for each of the amino acids activated by class II RSs (Ala, Pro, Ser, Thr, His, Asp, Asn, Lys, Gly, and Phe).

[0072] For example, the amino acid binding domains for each of the class II RSs can be identified using the monomeric fragment of E. coli HisRS (SEQ ID NO: 9) as a guide to identify corresponding domains in other class II RSs. Structural alignments between the monomeric fragment of E. coli HisRS (residues 3-180) and corresponding domains in other class II RSs can be obtained from the Dali web server (22). Multiple sequence alignments for the conserved class II catalytic domain can be obtained from the Pfam database (19). Using these alignments, boundaries for the amino acid binding domains for class II RSs can be identified.

[0073] Thus, in some aspects of the method of making a NAAB, the tRNA synthetase is a first class II tRNA synthetase and the step of identifying the amino acid binding domain comprises aligning an amino acid sequence of the first class II tRNA synthetase with an amino acid sequence of a second class II tRNA synthetase having a previously defined amino acid binding domain. The amino acid sequence of the second class II tRNA synthetase can comprise the amino acid sequence of a monomeric fragment of E. coli HisRS that contains the amino acid binding domain (e.g., SEQ ID NO: 9). The amino acid sequence of the second class II tRNA synthetase can comprise a wild-type sequence or can comprise a sequence containing one or more mutations, so long as the presence of the mutations does not significantly impair the ability of the sequence to align with other class II tRNA synthetases.

[0074] For example, the identifying step can comprise aligning the amino acid sequence of the monomeric fragment of E. coli HisRS with a corresponding domain of a class II tRNA synthetase selected from the group consisting of AlaRS, ProRS, SerRS, ThrRS, AspRS, AsnRS, LysRS, GlyRS, and PheRS.

[0075] The identifying step can also comprise constructing a multiple sequence alignment that aligns the amino acid sequences of the first class II tRNA synthetase, the second class II tRNA synthetase, and at least one additional class II tRNA synthetase. For example, the multiple sequence alignment can align the sequences of at least five, at least seven, or at least nine class II tRNA synthetases. The multiple sequence alignment can align the amino acid sequence of a monomeric fragment of E. coli HisRS that contains the amino acid binding domain with a corresponding domain of at least two other class II tRNA synthetases selected from the group consisting of AlaRS, ProRS, SerRS, ThrRS, AspRS, AsnRS, LysRS, GlyRS, and PheRS. Alternatively, the multiple sequence alignment can align the amino acid sequence of a monomeric fragment of E. coli HisRS that contains the amino acid binding domain with corresponding domains of AlaRS, ProRS, SerRS, ThrRS, AspRS, AsnRS, LysRS, GlyRS, and PheRS.

[0076] Once the amino acid binding domain of a given class II tRNA synthetase has been identified, mutations (e.g., substitution mutations) are introduced into the amino acid binding domain in order to increase the binding affinity of the domain for a particular N-terminal amino acid.

[0077] As with the methods involving class I tRNA synthetases, the methods involving class II tRNA synthetases can also further comprise using a computer modeling program to predict the binding affinity of the NAAB against a panel of N-terminal amino acids. In addition, the NAAB can be subjected to computational redesign to identify one or more additional mutations to improve the binding specificity of the NAAB for a particular N-terminal amino acid. Any additional mutations identified using computational redesign can then be introduced into the NAAB.

[0078] The NAABs designed and made using any of the above methods can be cloned into an expression vector, expressed in a host cell (e.g., in an E. coli host cell), purified, and assayed for specific binding to an N-terminal amino acid of a polypeptide. For example, the binding activity for each NAAB can be assayed against a standard set of polypeptides having different N-terminal residues (e.g., custom synthesized peptides of the form XGMMGSSC, where X is a variable position occupied by each of the twenty amino acids).
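The standard peptide panel described in [0078] can be enumerated directly from the XGMMGSSC template; the C-terminal cysteine is the residue used for surface immobilization in [0052]. A small illustrative generator (the names are hypothetical) follows.

```python
# The twenty canonical amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_panel(template: str = "XGMMGSSC", variable: str = "X") -> dict[str, str]:
    """Return one test peptide per canonical amino acid at the variable position."""
    if template.count(variable) != 1:
        raise ValueError("template must contain exactly one variable position")
    return {residue: template.replace(variable, residue) for residue in AMINO_ACIDS}


panel = peptide_panel()
print(panel["M"], panel["F"], panel["H"])
```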

[0079] For NAABs derived from class II tRNA synthetases, if any of the E. coli protein fragments prove to be insoluble or perform poorly as NAABs, protein design can be used to redesign hydrophobic residues that become exposed upon monomerization. If a crystal structure is unavailable for the E. coli protein, a synthetic gene for an RS with an experimentally determined structure can be obtained. The availability of structures for these proteins allows application of protein surface redesign if the domain truncation results in loss of solubility, binding pocket redesign for enhanced affinity if binding is weak, or multi-state design for enhanced specificity if promiscuous binding is observed (11).

[0080] In any of the above methods, the tRNA synthetase amino acid sequences can be E. coli tRNA synthetase amino acid sequences.

[0081] In addition, in any of the above methods, the sequences can be aligned pairwise by various methods known in the art, for example, using the hidden Markov models available in the Pfam database (19), dynamic programming, and heuristic methods like BLAST.
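Of the alignment methods listed above, dynamic programming is the easiest to show compactly. Below is a textbook Needleman-Wunsch global alignment with a linear gap penalty and simple match/mismatch scores. These scoring parameters are illustrative assumptions; practical protein alignments would normally use a substitution matrix such as BLOSUM62 with affine gap penalties, or an established library implementation.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell along moves that reproduce each score.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            aligned_a.append(a[i - 1]); aligned_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1]); aligned_b.append("-"); i -= 1
        else:
            aligned_a.append("-"); aligned_b.append(b[j - 1]); j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[n][m]


print(needleman_wunsch("MKLTE", "MKTE"))
```

The quadratic score table is small for synthetase-sized sequences (a few hundred residues), which is why a plain list-of-lists is adequate here.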

[0082] Also, in any of the above methods, mutations that favor desired binding and disfavor undesired binding can be introduced into any of the wild-type proteins described above by various methods, for example, using mutagenic primers to introduce mutations via site-directed mutagenesis, PCR-based mutagenesis and Kunkel mutagenesis. Various computer programs can be used to design suitable primers (e.g., the QUICKCHANGE (Agilent Technologies) primer design program).

III. Polypeptide Sequencing Using NAABs

[0083] In accordance with various aspects of the present invention, the NAABs discussed above are used as reagents in a method of polypeptide sequencing. Generally, the method of sequencing a polypeptide comprises the steps of: [0084] (a) contacting the polypeptide with one or more fluorescently labeled N-terminal amino acid binding proteins (NAABs); [0085] (b) detecting fluorescence of a NAAB bound to an N-terminal amino acid of the polypeptide; [0086] (c) identifying the N-terminal amino acid of the polypeptide based on the fluorescence detected; [0087] (d) removing the NAAB from the polypeptide; [0088] (e) optionally repeating steps (a) through (d); [0089] (f) cleaving the N-terminal amino acid of the polypeptide via Edman degradation; and [0090] (g) repeating steps (a) through (f) one or more times.
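The control flow of steps (a) through (g) can be sketched as a simulation. This is a minimal sketch under idealized assumptions: each hypothetical `Naab` binds only its target residue, every binding event is detected, washing is complete, and each Edman cycle removes exactly one residue. None of these assumptions reflects measured performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Naab:
    """Hypothetical labeled NAAB with idealized, perfectly specific binding."""
    residue: str      # one-letter code of the N-terminal residue it binds
    fluorophore: str  # e.g. "Cy3" or "Cy5"


def run_sequencing(peptides: list[str], rounds: list[list[Naab]], cycles: int) -> list[str]:
    """Simulate steps (a)-(g); '?' marks a residue that no round identified."""
    for naabs in rounds:
        # Within one round, each NAAB needs a distinguishable emission spectrum.
        assert len({n.fluorophore for n in naabs}) == len(naabs), "fluorophores must differ"
    reads = ["" for _ in peptides]
    for _ in range(cycles):                                # step (g): repeat (a)-(f)
        calls: list[set[str]] = [set() for _ in peptides]
        for naabs in rounds:                               # step (e): repeat (a)-(d)
            color_to_residue = {n.fluorophore: n.residue for n in naabs}
            for k, pep in enumerate(peptides):
                # (a)+(b): NAABs whose target matches the N-terminus bind and fluoresce
                colors = {n.fluorophore for n in naabs if pep and pep[0] == n.residue}
                # (c): the detected colour identifies the residue
                calls[k].update(color_to_residue[c] for c in colors)
            # (d): a wash removes all NAABs before the next round (nothing to model)
        for k, pep in enumerate(peptides):
            if pep:
                reads[k] += next(iter(calls[k])) if len(calls[k]) == 1 else "?"
        peptides = [pep[1:] for pep in peptides]           # step (f): Edman cleavage
    return reads


rounds = [[Naab("M", "Cy3"), Naab("L", "Cy5")], [Naab("G", "Cy3"), Naab("S", "Cy5")]]
print(run_sequencing(["MLGS", "LGMM", "MAL"], rounds, cycles=4))
```

The third peptide shows how a residue with no matching NAAB (alanine here) is reported as unidentified while sequencing continues past it.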

[0091] In step (a), the polypeptide is contacted with one or more NAABs. In various aspects, the polypeptide is contacted with a single type of NAAB that selectively binds to a single type of N-terminal amino acid residue (e.g., a NAAB that selectively binds to N-terminal alanine residues or a NAAB that selectively binds to N-terminal methionine residues). In other embodiments, the polypeptide is contacted with a mixture of two or more types of NAABs that each selectively binds to different amino acid residues. For example, the mixture may comprise two NAABs such as a NAAB that selectively binds to N-terminal alanine residues and a NAAB that selectively binds to N-terminal cysteine residues. A mixture comprising two or more NAABs that selectively bind to different amino acid residues is especially useful when sequencing several polypeptides simultaneously. Introducing multiple different NAABs also reduces sequencing time because multiple N-terminal amino acid residues can be identified during a single iteration of steps (a) through (d). As such, in various embodiments, the method comprises sequencing a plurality of polypeptides. These embodiments are especially suited for high throughput sequencing methods.

[0092] In various aspects of the invention, the polypeptide may be immobilized on a substrate prior to contact with the one or more NAABs. The peptide may be immobilized on any suitable substrate. For example, nanogel substrates have been developed with low non-specific adsorption of proteins and the ability to visualize single attached molecules on this surface (8, 9). Moreover, a plurality of polypeptides may be immobilized on the substrate for sequencing. Immobilizing a plurality of polypeptides is especially suited for high throughput sequencing methods.

[0093] The NAABs of the present invention are fluorescently labeled with a fluorophore such that, when a NAAB binds to an N-terminal amino acid, fluorescence emitted by the fluorophore can be detected by an appropriate detector. Suitable fluorophores include, but are not limited to, Cy3 and Cy5. Fluorescence can suitably be detected by detectors known in the art. Based on the fluorescence detected, the N-terminal amino acid of the polypeptide can be identified.

[0094] In aspects of the method where the contacting step comprises contacting the polypeptide with a mixture of two or more types of NAABs that each selectively binds to different amino acid residues, each type of NAAB is suitably labeled with different fluorophores having different fluorescence emission spectra. For example, the contacting step can comprise contacting the polypeptide with a first type of NAAB and a second type of NAAB, wherein the first type of NAAB selectively binds to a first type of N-terminal amino acid residue and the second type of NAAB selectively binds to a second type of N-terminal amino acid residue different from the first type of N-terminal amino acid residue. In such methods, the first type of NAAB is suitably coupled to a first fluorophore and the second type of NAAB is suitably coupled to a second fluorophore, wherein the first and second fluorophores have different fluorescence emission spectra.

[0095] In step (d), the one or more NAABs are removed from the polypeptide(s). Removing the one or more NAABs includes removing any excess NAABs present in solution and/or removing any NAABs that are bound to N-terminal amino acids of the polypeptides. Removal of the NAABs is suitably accomplished by washing the polypeptide with a suitable wash buffer in order to cause dissociation of any bound NAABs. In embodiments where the polypeptide is immobilized on a solid substrate, the NAABs may be removed by contacting the substrate with a suitable wash buffer.

[0096] Steps (a)-(d) may be repeated any number of times until the N-terminal amino acid of the polypeptide has been identified. In embodiments where a plurality of polypeptides is being sequenced, steps (a)-(d) may be repeated any number of times until all of the N-terminal amino acids of the polypeptide(s) have been identified. During each repetition, a different NAAB or a set of NAABs may be used in step (a) to probe the N-terminal amino acid of the polypeptide(s). Thus, for example, where step (a) comprises contacting the polypeptide with a single type of NAAB that selectively binds to a single type of N-terminal amino acid residue, it may be necessary to repeat steps (a) through (d) up to 24 or more times in order to probe the polypeptide with a NAAB specific for each of the twenty standard amino acids, for PITC-derivatized lysine, and for each of the three common phosphorylated amino acids. Alternatively, where step (a) comprises contacting the polypeptide with two or more different types of NAABs simultaneously, fewer repetitions of steps (a) through (d) will be necessary to identify the N-terminal amino acid of the polypeptide.
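The repetition count discussed above is simple arithmetic: with 24 targets (the twenty standard amino acids, PITC-derivatized lysine, and three phosphorylated amino acids) and up to k spectrally distinct NAABs applied per round, ceil(24/k) rounds of steps (a) through (d) probe every target once. The number of usable fluorescence channels is an assumption of this sketch, not a value from this disclosure.

```python
import math


def rounds_needed(n_targets: int, channels: int) -> int:
    """Rounds of steps (a)-(d) per cycle when each round applies up to `channels`
    NAABs, each carrying a spectrally distinct fluorophore."""
    if n_targets < 0 or channels < 1:
        raise ValueError("need n_targets >= 0 and channels >= 1")
    return math.ceil(n_targets / channels)


# 20 standard amino acids + PITC-derivatized lysine + 3 phosphorylated amino acids
N_TARGETS = 20 + 1 + 3
for channels in (1, 2, 3, 4):
    print(channels, rounds_needed(N_TARGETS, channels))
```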

[0097] After the N-terminal amino acid has been identified, or after all of the N-terminal amino acids have been identified (when sequencing multiple polypeptides simultaneously), the N-terminal amino acid(s) may be cleaved from the polypeptide(s) via Edman degradation. Generally, the Edman degradation comprises reacting the N-terminal amino acid of the polypeptide with phenyl isothiocyanate (PITC) to form a PITC-derivatized N-terminal amino acid, and cleaving the PITC-derivatized N-terminal amino acid. In various aspects of the invention, the derivatized N-terminal amino acid may be cleaved using an Edman degradation enzyme as described in further detail below. In other embodiments, the derivatized N-terminal amino acid may be cleaved by methods known in the art, including contact with acid or exposure to high temperature. In these embodiments, any substrate bearing the immobilized polypeptide(s) should be compatible with the acidic conditions or high temperatures.

[0098] FIG. 1 provides a diagrammatic representation of the steps of a method of polypeptide sequencing according to the present invention. In step 1 of FIG. 1, multiple polypeptide molecules are immobilized on a substrate. The individual peptide molecules are suitably spatially segregated on the substrate. Analyte proteins may be fragmented into two or more polypeptides prior to immobilization on the substrate.

[0099] In step 2 of FIG. 1, the immobilized polypeptides are contacted with a fluorescently labeled NAAB, and fluorescence of the NAAB bound to the N-terminal amino acid of any of the peptides is detected. An image of the substrate is suitably captured at this stage. Subsequently, the NAAB is washed off the substrate. This cycle of binding, detection, and removal of the NAAB is repeated until the N-terminal amino acids of all of the immobilized polypeptides have been identified (step 3). Next, in step 4, the N-termini of the polypeptides are reacted with phenyl isothiocyanate (PITC) (black ovals in FIG. 1). In step 5, an Edman degradation enzyme ("Edmanase") catalyzes the removal of the PITC-derivatized N-terminal amino acid under mild conditions. In each complete cycle, one amino acid is sequenced from each peptide and a new N-terminus is generated for identification in subsequent cycles (step 6).

[0100] In the polypeptide sequencing methods described herein, some of the NAABs may bind smaller, sterically similar off-target amino acids. For example, the isoleucine-specific NAAB derived from IleRS and the threonine-specific NAAB derived from ThrRS may bind N-terminal valine and serine residues, respectively, in addition to their intended targets. However, this does not hinder the effectiveness of this protein sequencing technique. Although various aspects of the present invention relate to a reagent comprising NAABs for all twenty amino acids, the optimal set size for actual sequencing may be fewer than twenty. Reducing the number of NAABs trades absolute specificity for fewer binding molecules by using a reduced alphabet for protein sequences. It may be more efficient to identify multiple amino acids (such as isoleucine and valine) with a single NAAB and to treat these amino acids as interchangeable when matching against a sequence database. It is also possible to enforce specificity in digital protocols such as DAPES by introducing the NAABs in a step-wise fashion. For example, the valine-specific NAAB derived from ValRS can be added before the isoleucine-specific NAAB derived from IleRS, so that N-terminal valine residues are identified and capped before the isoleucine-specific NAAB, which can also bind valine, is introduced.
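The reduced-alphabet matching described above can be sketched by mapping residues that share a NAAB onto one symbol before searching. Only the isoleucine/valine grouping comes from the text; the `REDUCED` mapping, the toy database, and the exact-substring search are illustrative assumptions, not the matching procedure of the invention.

```python
# Hypothetical reduced alphabet: residues read by one NAAB share a symbol.
# Only the isoleucine/valine grouping comes from the text above.
REDUCED = {"I": "V"}


def reduce_seq(seq: str) -> str:
    """Map each residue to its reduced-alphabet class symbol."""
    return "".join(REDUCED.get(aa, aa) for aa in seq)


def match_read(read: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Return (protein_id, 0-based offset) for every exact reduced-alphabet match."""
    query = reduce_seq(read)
    hits = []
    for protein_id, protein in database.items():
        reduced = reduce_seq(protein)
        start = reduced.find(query)
        while start != -1:
            hits.append((protein_id, start))
            start = reduced.find(query, start + 1)
    return hits


toy_db = {"P1": "MKVLAG", "P2": "MKILAG", "P3": "MKLLAG"}
print(match_read("KIL", toy_db))
```

A read containing isoleucine matches database entries carrying either isoleucine or valine at that position, which is the interchangeability the text describes.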

[0101] Methods of the present invention possess attractive features relative to mass spectrometry. Because detection operates at the single-molecule level, this method will have excellent dynamic range and will be appropriate for extremely small amounts of sample. Furthermore, the digital nature of the detection produces inherently quantitative data. Finally, because all steps can be carried out in neutral aqueous buffer, post-translational modifications (e.g., phosphorylations) remain stable and available for analysis.

IV. Enzymatic Edman Degradation

[0102] In another aspect, the present invention is directed to a method and reagents for enzymatic Edman degradation (i.e., cleaving the N-terminal amino acid of a polypeptide). In accordance with this aspect, one or more enzymes are provided that catalyze the cleavage step of the Edman degradation in aqueous buffer at neutral pH, thereby providing an alternative to the harsh chemical conditions typically employed in conventional Edman degradation. In one aspect, the Edman degradation enzyme is a modified cruzain enzyme. Cruzain is a cysteine protease of the protozoan Trypanosoma cruzi and was found to possess many of the characteristics desired for creating an Edman degradation enzyme.

[0103] In conventional Edman degradation, polypeptides are sequenced by degradation from their N-terminus using the Edman reagent, phenyl isothiocyanate (PITC). The process requires two steps: coupling and cleavage. In the first step (coupling), the N-terminal amino group of a peptide reacts with phenyl isothiocyanate to form a thiourea. In the second step (cleavage), treatment of the thiourea with anhydrous acid (e.g., trifluoroacetic acid) cleaves the peptide bond between the first and second amino acids, releasing the N-terminal amino acid as a thiazolinone derivative. The thiazolinone derivative may be extracted into an organic solvent, dried, and converted to the more stable phenylthiohydantoin (PTH) derivative for analysis. The most convenient method for identifying the PTH-amino acids generated during each sequencing cycle is HPLC with UV detection: each PTH-amino acid is detected by its UV absorbance at 269 nm and identified by its characteristic retention time.

[0104] In digital protocols such as DAPES, the N-terminal amino acid has already been identified before cleavage. Therefore, there is no need to generate or detect a phenylthiohydantoin derivative of the terminal amino acid. However, the strongly acidic conditions typically used in the cleavage step of conventional Edman degradation protocols are incompatible with the substrate surfaces on which polypeptides are immobilized for single molecule protein detection (SMD) (e.g., a nanogel surface). One modification of the conventional Edman degradation dispenses with acidic conditions and instead promotes cleavage at elevated temperature (e.g., 70-75 °C) (25). However, some substrate surfaces used to immobilize peptides include bovine serum albumin (BSA), which has a melting temperature of approximately 60 °C in the absence of stabilizing additives (26). Further, repeated cycles of heating and cooling of the substrate surface (e.g., nanogel) may be undesirable. Thus, the present invention provides a method of performing the Edman degradation that dispenses with both acidic conditions and elevated temperature. Advantageously, an enzyme has been developed that accomplishes the cleavage step in a neutral, aqueous buffer. This enzyme avoids acidic conditions and high temperatures and decreases the cycle time for polypeptide sequencing by reducing or eliminating the need to change buffer and temperature conditions repeatedly.

[0105] The Edman degradation enzyme (or "Edmanase") according to the present invention accomplishes the chemical step of N-terminal degradation by nucleophilic attack of the thiourea sulfur atom on the carbonyl group of the scissile peptide bond. As noted, the enzyme was made by modifying cruzain, a cysteine protease from the protozoan Trypanosoma cruzi (SEQ ID NO: 30). Cruzain prefers hydrophobic amino acids at the S2 position relative to the scissile bond, which corresponds to the phenyl moiety of the Edman reagent. The protease is relatively insensitive to the identity of the amino acid at the S1 position (29), allowing promiscuous cleavage of diverse N-terminal residues. Furthermore, this protein has been the subject of extensive structural characterization (27).

[0106] The Edman degradation enzyme differs from wild-type cruzain at four positions. One mutation (C25G) removes the catalytic cysteine residue, while three mutations (G65S, A138C, L160Y) were selected to create a steric fit with the phenyl moiety of the Edman reagent (PITC). FIGS. 4A-4B depict the latter three mutations (indicated by arrows) introduced into a model of cruzain (pdb code: 1U9Q (27); SEQ ID NO: 30) to accommodate the phenyl moiety of the Edman reagent phenyl isothiocyanate. FIG. 4A depicts a model of the cleavage intermediate for an N-terminal alanine residue in the active-site cleft. In addition to the engineered residues, two wild-type residues (shown as green sticks) contribute to forming a complementary pocket. FIG. 4B depicts a space-filling representation of the packing of the phenyl ring by protein side chains. The methyl group of the ligand (in gray at the top of the panel) corresponds to the side chain of the N-terminal residue to be cleaved and is not involved in the tight packing between enzyme and substrate. The enzyme was expressed and purified.
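Point mutations such as C25G follow the convention of wild-type residue, position, then mutant residue. The hypothetical helper below applies such mutations to a sequence string and checks that the stated wild-type residue is present, which catches numbering offsets between a sequence file and a published numbering scheme. The `offset` parameter is an assumption about how the sequence is numbered; the cruzain sequence itself (SEQ ID NO: 30) is not reproduced here.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")   # e.g. C25G


def apply_mutations(seq: str, mutations: list[str], offset: int = 1) -> str:
    """Apply point mutations written as <wild-type><position><mutant>.

    `offset` is the residue number assigned to seq[0] (hypothetical default: 1)."""
    residues = list(seq)
    for mutation in mutations:
        m = MUTATION.match(mutation)
        if m is None:
            raise ValueError(f"unrecognized mutation: {mutation}")
        wild_type, position, mutant = m.group(1), int(m.group(2)), m.group(3)
        index = position - offset
        # Refuse to mutate if the stated wild-type residue is not where expected.
        if not 0 <= index < len(residues) or residues[index] != wild_type:
            raise ValueError(f"{mutation}: expected {wild_type} at position {position}")
        residues[index] = mutant
    return "".join(residues)


print(apply_mutations("ACDEF", ["C2G", "E4S"]))   # toy sequence, not cruzain
```

Given the cruzain sequence as a string numbered from position 1, the four substitutions would be applied as `apply_mutations(cruzain, ["C25G", "G65S", "A138C", "L160Y"])`, where `cruzain` is a placeholder name.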

[0107] Accordingly, one aspect of the present invention relates to an isolated, synthetic, or recombinant Edman degradation enzyme comprising an amino acid sequence having a glycine residue at a position corresponding to position 25 of wild-type Trypanosoma cruzi cruzain; a serine residue at a position corresponding to position 65; a cysteine residue at a position corresponding to position 138; and a tyrosine residue at a position corresponding to position 160.

[0108] The remaining amino acid sequence of the Edman degradation enzyme comprises a sequence similar to that of wild-type Trypanosoma cruzi cruzain, but may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the Edman degradation enzyme to cleave PITC-derivatized N-terminal amino acids. For example, the remaining amino acid sequence can have at least about 80%, at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence of wild-type Trypanosoma cruzi cruzain.
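Percent identity thresholds like those above depend on how identity is computed. One common convention, assumed in this sketch, counts identical residues over the columns of a pairwise alignment, with gap columns counting as non-identical. For a variant that differs from the reference only by k substitutions over L residues, identity is 100(L - k)/L, so four substitutions keep identity at or above 98% whenever L is at least 200.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over aligned columns ('-' marks a gap).

    Columns where both sequences have a gap are ignored; tools differ in the
    denominator they use, so this is one convention among several."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


def substitution_only_identity(length: int, n_substitutions: int) -> float:
    """Identity of an ungapped variant carrying n substitutions over `length` residues."""
    return 100.0 * (length - n_substitutions) / length


print(percent_identity("ACDEF", "AGDSF"), substitution_only_identity(200, 4))
```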

[0109] In some aspects of the invention, the Edman degradation enzyme comprises the sequence of SEQ ID NO: 29. For example, the Edman degradation enzyme can consist of the sequence of SEQ ID NO: 29.

[0110] In various aspects of the invention, the reagents for enzymatic Edman degradation comprise two or more enzymes. For example, one point of concern is the ability to cleave proline residues. If a single cruzain mutant cannot accomplish this reaction, an additional enzyme would be required. Naturally occurring enzymes that cleave dipeptides of the form Xaa-Pro from the N-terminus of peptides include, for example, quiescent cell proline dipeptidase (QPP) (35) and Xaa-Pro aminopeptidase (pdb code: 30VK). Because a PITC-coupled N-terminal proline is chemically and sterically very similar to a dipeptide, these enzymes are excellent starting points for engineering a proline-specific activity.

[0111] When introducing elements of the present invention or the preferred embodiment(s) thereof, the articles "a", "an", "the" and "said" are intended to mean that there are one or more of the elements. The terms "comprising", "including" and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements.

[0112] As various changes could be made in the above products, compositions and processes without departing from the scope of the invention, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.

EXAMPLES

[0113] The following non-limiting examples are provided to further illustrate the present invention.

Example 1. eLAP: a NAAB that Specifically Binds to N-Terminal Leucine Residues

[0114] In this example, an E. coli methionine aminopeptidase (eMAP) was modified to create a NAAB that binds specifically to N-terminal leucine residues. Two mutually compatible leucine-contacting interactions that could be incorporated into the eMAP structure were identified from the Protein Data Bank (PDB) (15). The surrounding protein residues of eMAP were redesigned around these two interactions. The resulting NAAB for leucine (eLAP) has 19 amino acid mutations relative to eMAP.

[0115] The eMAP and eLAP proteins were expressed and assayed for binding against a panel of peptides with different N-termini. The NAAB for N-terminal leucine amino acids was non-specifically labeled with Cy5 fluorophore on lysine side chains. Synthetic peptides with either N-terminal methionine, leucine, or asparagine amino acids were coupled to a nanogel surface by thiol linkage. An additional experiment was performed with no peptide added. The labeled NAAB was briefly incubated with the immobilized peptide, and unbound protein was removed by washing. Bound protein, which may be bound specifically to peptides or non-specifically to the surface, was imaged by total internal reflection fluorescence (TIRF) microscopy. Spots exceeding a detection threshold were deemed to indicate bound protein and were converted to a number of counts per field-of-view. FIGS. 2A-2B show the binding specificity of wild-type E. coli methionine aminopeptidase (eMAP) and eLAP in a single-molecule detection experiment. In FIG. 2A, fluorescently labeled eMAP and eLAP NAABs were visualized after binding to immobilized peptides with different N-terminal amino acids. FIG. 2B depicts histograms of quantitative binding. Digital analysis of NAAB binding for eMAP and eLAP showed that each NAAB was specific for the expected N-terminal amino acid. Both proteins exhibited roughly a 2:1 ratio of specific to non-specific binding.
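The spot-counting step above can be sketched as thresholding followed by grouping of adjacent bright pixels, so that a single spot spanning several pixels is counted once. The 8-connected grouping, the plain list-of-lists image, and the helper names are assumptions; the actual image-analysis pipeline is not specified here. The ratio helper compares mean counts per field of view on a target peptide against a control, such as the no-peptide surface.

```python
def count_spots(image: list[list[float]], threshold: float) -> int:
    """Count 8-connected groups of pixels brighter than `threshold`."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    spots = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            spots += 1                     # new spot: flood-fill all of its pixels
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and image[ny][nx] > threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return spots


def specificity_ratio(target_counts: list[int], control_counts: list[int]) -> float:
    """Mean spots per field of view on the target over that on a control surface."""
    mean_target = sum(target_counts) / len(target_counts)
    mean_control = sum(control_counts) / len(control_counts)
    return mean_target / mean_control


field = [[0, 0, 0, 0, 0],
         [0, 9, 0, 0, 0],
         [0, 9, 0, 0, 8],
         [0, 0, 0, 0, 8],
         [5, 0, 0, 0, 0]]
print(count_spots(field, threshold=6))
```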

[0116] These results demonstrate that individual N-terminal amino acids can be identified in an SMD format using NAABs that are selective for a particular amino acid.

Example 2. A NAAB that Specifically Binds to N-Terminal Methionine Residues

[0117] In this example, a truncated version of E. coli methionyl-tRNA synthetase (MetRS) was modified to make a NAAB that binds specifically to N-terminal methionine residues. A truncated version of MetRS (residues 1-547) carrying three amino acid mutations (L13S, Y260L, and H301L), which had been shown to pre-organize the binding site toward the methionine-bound conformation, was obtained (16). A crystal structure of this mutant bound to free methionine is available (pdb code: 3h99). An additional mutation (D296G) was introduced to open the binding pocket so that it could accommodate a peptide without steric clashes. This mutation was introduced into MetRS, and the altered protein was expressed in E. coli. The gene encoding MetRS was amplified from genomic DNA and cloned into the pET42a expression vector between the MfeI and XhoI sites, yielding a genetic fusion of a thrombin-cleavable GST tag and MetRS. The mutations were introduced using the QuikChange protocol. The proteins were expressed at 16 °C overnight using the autoinduction protocol of Studier (17). The GST-MetRS fusion was purified from lysates by affinity chromatography using GSTrap columns on a Bio-Rad liquid chromatography system. Following purification, proteins were labeled with Cy5 fluorophore on lysine side chains for single-molecule binding assays.

[0118] Using an SMD assay, we then tested the specificity of our mutant MetRS for peptides with different amino acids at the N-terminus. Peptides of the form XGMMGSSC, where X is methionine, leucine, or asparagine, were purchased commercially. The peptides were immobilized on a nanogel surface via thiol linkages, and the engineered MetRS domain was applied to the surface. Bound MetRS was imaged at the single-molecule level by total internal reflection fluorescence (TIRF) microscopy. The resulting images are shown in FIG. 3. Quantitation of single-molecule binding events gave ratios of specific to non-specific binding of approximately 7:1 and 13:1 relative to the two alternative N-terminal amino acids. The data in FIG. 3 show that the domain binds specifically to N-terminal methionine, indicating that engineered RS fragments are excellent molecular reagents for DAPES and that computational protein design is an efficient method for producing NAABs with specificity for particular N-terminal amino acids.

Example 3. A NAAB that Specifically Binds to N-Terminal Histidine Residues

[0119] In this example, histidyl-tRNA synthetase (HisRS) from E. coli was modified to create a NAAB that binds specifically to N-terminal histidine residues. The fragment of wild-type HisRS comprising residues 1-320 has been shown by others to be monomeric. After inspection of the crystal structure of HisRS, further residues were truncated from both ends; the initial fragment tested spanned Lysine3 to Alanine180. Protein design was then used to replace a long loop near the binding site with a shorter loop that would create a more open pocket and result in tighter binding to N-terminal histidine residues. Specifically, an 11-residue loop (Arginine113 to Lysine123) was replaced with a 4-residue turn consisting of Glycine, Asparagine, Alanine, and Proline. Thus, this NAAB comprises an internally truncated version of wild-type E. coli HisRS (residues 3-180; SEQ ID NO: 10) having seven fewer residues than the corresponding wild-type sequence. The sequence of this N-terminal histidine-specific NAAB is provided by SEQ ID NO: 9.

[0120] FIG. 7 shows that the engineered HisNAAB (SEQ ID NO: 10) exhibits enhanced binding affinity for peptides with N-terminal histidine residues compared with the wild-type fragment. Biolayer interferometry kinetics data show that the engineered HisNAAB (open circles) binds N-terminal histidine with the same off-rate as the wild-type fragment (SEQ ID NO: 9) (solid circles), but with an enhanced on-rate. Because the equilibrium dissociation constant is the ratio of the off-rate to the on-rate, the engineered HisNAAB binds with an approximately 10-fold improvement in binding affinity.

Example 4. Purification of an Edman Degradation Enzyme

[0121] A synthetic gene encoding the Edman degradation enzyme was purchased from GenScript. The gene encoded a modified version of the cruzain enzyme of T. cruzi having the following substitution mutations: C25G, G65S, A138C, and L160Y.

[0122] The gene was inserted between the NdeI and XhoI sites of a pET42a (Novagen) expression vector and transformed into chemically competent E. coli BL21(DE3) cells. Protein was then over-expressed following Studier's auto-induction protocol. Bacterial cells were harvested by centrifugation of the cell culture at 5000 rpm and 4 °C for 10 minutes. Cells were then suspended in 1× PBS with 10% glycerol and 6 M guanidine chloride, pH 7.4, and lysed by sonication (15 seconds at 20% power, 8 times on ice). The cell lysate was centrifuged at 18000 rpm and 4 °C for 20 minutes. The supernatant was then filtered through a 0.2 µm cellulose acetate filter. The filtered lysate was loaded onto a 5 mL HisTrap (Ni-NTA) column and washed with 5 column volumes of binding buffer (50 mM Tris-HCl, 150 mM NaCl, 6 M guanidine chloride, 25 mM imidazole). Bound protein was then eluted in 50 mM Tris-HCl, 150 mM NaCl, 6 M guanidine chloride, 500 mM imidazole. Purified fractions were prepared for SDS-PAGE analysis by mixing 2 parts sample with 1 part 4× loading dye. Samples were analyzed on 16% SDS-PAGE precast gels and visualized by Coomassie staining. The purified protein was then refolded by successive overnight dialyses into 1× PBS containing 5 M, 3 M, 1 M, 0.5 M, and 0 M guanidine chloride. Protein concentration was determined by measuring A280 on an ND-8000 spectrophotometer (Thermo Fisher Scientific) and applying the calculated molar extinction coefficient.
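The concentration measurement above follows the Beer-Lambert law, c = A280/(ε·l). The minimal sketch below assumes a 1 cm path length and the widely used per-residue absorptivities of Pace et al. (1995): tryptophan 5500, tyrosine 1490, and disulfide-bonded cystine 125 M⁻¹ cm⁻¹. The source does not state which method was used to calculate the extinction coefficient, and the sequence and absorbance in the usage line are hypothetical.

```python
# Per-residue molar absorptivities at 280 nm (M^-1 cm^-1), Pace et al. (1995).
EPS_TRP, EPS_TYR, EPS_CYSTINE = 5500, 1490, 125


def extinction_coefficient_280(seq: str, cystines: int = 0) -> int:
    """Estimate the molar extinction coefficient at 280 nm from sequence.

    `cystines` is the number of disulfide bonds; use 0 for fully reduced protein."""
    return EPS_TRP * seq.count("W") + EPS_TYR * seq.count("Y") + EPS_CYSTINE * cystines


def concentration_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l)."""
    return a280 / (epsilon * path_cm)


eps = extinction_coefficient_280("MWKYY")      # toy sequence, not a real construct
print(eps, concentration_molar(0.848, eps))    # hypothetical A280 reading
```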

Example 5. Substrates and Inhibitors for Edman Degradation Enzyme (Edmanase)

[0123] Single amino acid, aminomethylcoumarin (AMC) containing compounds were obtained from Bachem (Bubendorf, Switzerland). These included Arg-AMC, Asn-AMC, Phe-AMC, Met-AMC, Ala-AMC, and Pro-AMC. Phenyl isothiocyanate (PITC) was purchased from Thermo Scientific and coupled to the N-terminus of each substrate by incubation for 10 minutes at room temperature in 100 µL of acetonitrile:pyridine:water (10:5:3) containing 5 µL of PITC. The derivatized substrate was then dried by rotary evaporation and resuspended in 250 µL of 1× phosphate-buffered saline (PBS). The inhibitor compound, 1-(2-anilino-5-methyl-1,3-thiazol-4-yl)-ethanone, was ordered from Sigma-Aldrich (St. Louis, Mo.).

Example 6. Edmanase Activity Measurements

[0124] The ability of the Edman degradation enzyme to perform N-terminal cleavage was characterized on six substrates of the form Ed-X-AMC, where Ed denotes the Edman reagent, X is an amino acid from the set (Ala, Asn, Phe, Met, Pro, Arg), and AMC is the fluorogenic aminomethylcoumarin group. Cleavage of the X-AMC bond was monitored by the appearance of fluorescence (FIG. 6). The engineered protein displayed activity against all six substrates to varying degrees (see the table below).

[0125] All kinetic measurements were performed in a 96-well Corning plate on a BioTek Synergy2 plate reader at 30 °C. Reactions were started by adding 5-20 µL of purified enzyme to 100 µL of 10 mM substrate solution. Final enzyme concentration was between 1 nM and 100 nM, depending on the experiment. Fluorescence of the cleaved product was measured by exciting at 370 nm and monitoring emission at 460 nm, at 30-second intervals for 1-10 hours. A standard curve prepared with AMC from Invitrogen was used to quantitate product formation.
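Converting plate-reader fluorescence into a reaction rate involves two linear fits: the AMC standard curve (fluorescence against AMC concentration) and the early, linear part of each progress curve. The sketch below assumes both are linear over the range used and that the first `n_early` readings lie in the initial-rate regime; the function names, that cut-off, and the numbers in the usage example are hypothetical.

```python
def linear_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def initial_rate_uM_per_s(times_s, rfu, std_conc_uM, std_rfu, n_early=10):
    """Initial velocity (µM AMC released per second) from a fluorescence progress curve."""
    rfu_per_uM, background = linear_fit(std_conc_uM, std_rfu)   # AMC standard curve
    product_uM = [(f - background) / rfu_per_uM for f in rfu]     # RFU -> µM AMC
    v0, _ = linear_fit(times_s[:n_early], product_uM[:n_early])   # early, linear region
    return v0


# Hypothetical data: a standard curve and a progress curve read every 30 s.
std_conc = [0.0, 1.0, 2.0, 4.0]
std_rfu = [100.0, 150.0, 200.0, 300.0]
times = [30.0 * k for k in range(10)]
progress = [100.0 + 0.5 * t for t in times]
v0 = initial_rate_uM_per_s(times, progress, std_conc, std_rfu)
enzyme_uM = 0.010                                   # 10 nM enzyme
print(v0, v0 / enzyme_uM)   # rate in µM/s and turnover per enzyme in s^-1
```

Dividing the initial rate by the enzyme concentration gives a turnover number at that substrate concentration; repeating this across substrate concentrations is what a kcat and KM fit would start from.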

TABLE: Measured kinetic rates for Edmanase

Substrate (X-AMC)   kcat (s⁻¹)   KM (µM)   kcat/KM (M⁻¹ s⁻¹)
Alanine             0.55         21.3      2.6 × 10⁴
Arginine            0.087        167.8     5.2 × 10²
Asparagine          3.6          124.5     2.9 × 10⁴
Methionine          0.54         271.8     2.0 × 10³
Phenylalanine       0.47         122.8     3.8 × 10³
Proline             0.0014       252.0     5.7 × 10¹
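The last column of the kinetics table is the specificity constant, kcat divided by KM, with KM converted from µM to M. The short sketch below recomputes it from the tabulated kcat and KM values and ranks the substrates; it uses only numbers from the table.

```python
# (kcat in s^-1, KM in µM) for each Ed-X-AMC substrate, from the table above.
KINETICS = {
    "Alanine": (0.55, 21.3),
    "Arginine": (0.087, 167.8),
    "Asparagine": (3.6, 124.5),
    "Methionine": (0.54, 271.8),
    "Phenylalanine": (0.47, 122.8),
    "Proline": (0.0014, 252.0),
}


def specificity_constant(kcat_per_s: float, km_uM: float) -> float:
    """kcat/KM in M^-1 s^-1, converting KM from µM to M."""
    return kcat_per_s / (km_uM * 1e-6)


ranked = sorted(KINETICS, key=lambda s: specificity_constant(*KINETICS[s]), reverse=True)
for substrate in ranked:
    print(f"{substrate:14s} {specificity_constant(*KINETICS[substrate]):.1e}")
```

For five of the six substrates, the recomputed ratio matches the tabulated value to two significant figures. For proline, 0.0014 s⁻¹ divided by 252.0 µM gives about 5.6 M⁻¹ s⁻¹ rather than the tabulated 5.7 × 10¹ M⁻¹ s⁻¹, so one of the proline entries may have been mis-transcribed.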

Example 7. Inhibition of the Edman Degradation Enzyme by 1-(2-anilino-5-methyl-1,3-thiazol-4-yl)-ethanone

[0126] Assays were conducted as described above in Example 6, with 5 µM substrate, 100 nM enzyme, and 500 nM to 15 µM 1-(2-anilino-5-methyl-1,3-thiazol-4-yl)-ethanone. Reaction velocity was determined as above, plotted against the inverse of inhibitor concentration, and fit by non-linear least squares to determine the inhibition constant.
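The fitting step above can be sketched under an assumed inhibition model. The sketch below assumes simple hyperbolic inhibition, v = v0/(1 + [I]/Ki,app), and fits v0 and Ki,app by least squares: for a fixed Ki,app the model is linear in v0, so v0 has a closed form, and Ki,app is then found by golden-section search on a log scale. The model, the search bounds, and the helper names are assumptions; the disclosure does not state the inhibition mechanism. If the inhibition were purely competitive, the true Ki would be Ki,app/(1 + [S]/KM).

```python
import math


def fit_ki_app(inhibitor_uM: list[float], rates: list[float],
               lo: float = 1e-3, hi: float = 1e4, iters: int = 100) -> tuple[float, float]:
    """Least-squares fit of v = v0 / (1 + [I]/Ki_app); returns (Ki_app, v0)."""

    def sse_and_v0(ki: float) -> tuple[float, float]:
        g = [1.0 / (1.0 + i / ki) for i in inhibitor_uM]
        # Closed-form best v0 for this Ki_app (linear least squares).
        v0 = sum(v * gi for v, gi in zip(rates, g)) / sum(gi * gi for gi in g)
        return sum((v - v0 * gi) ** 2 for v, gi in zip(rates, g)), v0

    # Golden-section search over log(Ki_app) in [lo, hi].
    a, b = math.log(lo), math.log(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    for _ in range(iters):
        if sse_and_v0(math.exp(c))[0] < sse_and_v0(math.exp(d))[0]:
            b, d = d, c                  # minimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                  # minimum lies in [c, b]
            d = a + inv_phi * (b - a)
    ki_app = math.exp((a + b) / 2.0)
    return ki_app, sse_and_v0(ki_app)[1]


def ki_if_competitive(ki_app: float, substrate_uM: float, km_uM: float) -> float:
    """Convert Ki_app to Ki assuming purely competitive inhibition."""
    return ki_app / (1.0 + substrate_uM / km_uM)


# Hypothetical noise-free rates generated with Ki_app = 3 µM and v0 = 10.
inhib = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 15.0]
rates = [10.0 / (1.0 + i / 3.0) for i in inhib]
print(fit_ki_app(inhib, rates))
```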

Example 8. Cloning of additional N-terminal Amino Acid Binding Proteins (NAABs)

[0127] Primers specific for each NAAB were ordered from Integrated DNA Technologies. Each NAAB was then amplified from isolated E. coli genomic DNA and cloned into a pET42a expression vector at sites that varied depending on the gene sequence. These constructs were transformed into either E. coli BL21(DE3) or E. coli ArcticExpress competent cells for expression.

Example 9. Expression and Purification of N-terminal Amino Acid Binders (NAABs)

[0128] Protein was over-expressed following Studier's auto-induction protocol. Bacterial cells were harvested by centrifugation of the cell culture at 5000 rpm and 4 °C for 10 minutes. Cells were then suspended in 1× PBS with 10% glycerol, pH 7.4, and lysed by sonication (15 seconds at 20% power, 8 times on ice). The cell lysate was centrifuged at 18000 rpm and 4 °C for 20 minutes. The supernatant was then filtered through a 0.2 µm cellulose acetate filter. The filtered lysate was loaded onto a 1 mL GSTrap column and washed with 5 column volumes of binding buffer (1× PBS). Bound protein was then eluted in 50 mM Tris-HCl, 10 mM reduced glutathione. Purified fractions were prepared for SDS-PAGE analysis by mixing 2 parts sample with 1 part 4× loading dye. Samples were analyzed on 16% SDS-PAGE precast gels and visualized by Coomassie staining. Protein concentration was determined by measuring A280 on an ND-8000 spectrophotometer (Thermo Fisher Scientific) and applying the calculated molar extinction coefficient.

Example 10. Binding Assays

[0129] Real-time binding assays between peptides and purified NAABs were performed using biolayer interferometry on a BLItz system (ForteBio, Menlo Park, Calif.). This system monitors interference of light reflected from the surface of a fiber optic sensor to measure the thickness of the layer of molecules bound to the sensor surface. Peptide-coated sensors were allowed to bind the NAABs in 1× PBS at several different protein concentrations. Association rate constants were calculated with the BLItz software package by fitting the observed binding curves to a 1:1 binding model. NAABs were then allowed to dissociate by incubating the sensors in 1× PBS, and the dissociation curves were fit to a 1:1 model to calculate dissociation rate constants. Equilibrium dissociation constants (KD) were calculated as the dissociation rate constant divided by the association rate constant (KD = koff/kon).
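The kinetic analysis above can be illustrated with the classical linearized treatment of a 1:1 interaction, used here as a stand-in for the curve fitting performed by the BLItz software. In the association phase, the observed rate is kobs = kon·C + koff, so kon is the slope of kobs against NAAB concentration; in the dissociation phase, the response decays as R0·exp(-koff·t), so koff is the negative slope of ln R against time. The helper names and the synthetic numbers in the usage example are hypothetical.

```python
import math


def _ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def kon_from_kobs(conc_M: list[float], kobs_per_s: list[float]) -> float:
    """Association phase, 1:1 model: kobs = kon*C + koff, so kon is the slope."""
    kon, _ = _ols(conc_M, kobs_per_s)
    return kon


def koff_from_dissociation(times_s: list[float], response: list[float]) -> float:
    """Dissociation phase, 1:1 model: R(t) = R0*exp(-koff*t), so ln R is linear in t."""
    slope, _ = _ols(times_s, [math.log(r) for r in response])
    return -slope


# Synthetic data for kon = 1e5 M^-1 s^-1 and koff = 1e-2 s^-1.
conc = [1e-7, 2e-7, 4e-7, 8e-7]
kobs = [1e5 * c + 1e-2 for c in conc]
t = [0.0, 10.0, 20.0, 30.0]
resp = [math.exp(-1e-2 * ti) for ti in t]
kon, koff = kon_from_kobs(conc, kobs), koff_from_dissociation(t, resp)
print(kon, koff, koff / kon)   # KD = koff / kon, in M
```

Because KD = koff/kon, a ten-fold faster on-rate at an unchanged off-rate lowers KD ten-fold, which is the relationship behind the HisNAAB improvement reported in Example 3.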

TABLE: Measured Affinity Constants

Amino acid        Affinity constant
Glutamate         2.12 µM
Phenylalanine     3.44 µM
Histidine         98.7 µM
Methionine        1.07 µM
Asparagine        754 nM
Arginine          129 nM
Tryptophan        48.9 nM
Tyrosine          57.6 µM
Phosphoserine     7.72 µM
Phosphotyrosine   1.07 µM
Aspartate         411 nM
Isoleucine        3.01 µM
Leucine           1.88 µM
Glutamine         531 nM
Serine            938 nM
Threonine         1.01 µM
Valine            1.22 µM
Lysine            2.61 µM

[0130] FIG. 8 is a full binding matrix showing how well each engineered NAAB protein binds to each N-terminal amino acid. Each square in the binding matrix represents the binding affinity of a single NAAB for a single N-terminal amino acid, as measured by biolayer interferometry. Each row in the matrix contains all the binding data for a single NAAB, and each column contains the binding data for a single N-terminal amino acid (shown by single-letter code). Darker squares represent tighter binding. The NAABs exhibit cross-binding to chemically similar N-terminal amino acids. However, the pattern of binding across the NAAB set is distinct for each amino acid. Thus, taken as a set, the engineered NAAB proteins are capable of identifying amino acids at the N-terminus of peptides.
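The set-based identification described above can be sketched as nearest-pattern decoding: each amino acid has an expected binding profile across the NAAB set, and an observed profile is assigned to the amino acid(s) whose expected profile is closest. The binary profiles below are hypothetical, built only from the cross-reactivities noted earlier (the IleRS-derived NAAB binding valine and the ThrRS-derived NAAB binding serine), and are not values read from FIG. 8; Hamming distance is likewise an assumed metric.

```python
# Hypothetical NAAB panel and expected binary binding profiles (1 = binds).
NAABS = ("Ile-NAAB", "Val-NAAB", "Thr-NAAB", "Ser-NAAB")
PROFILES = {
    "I": (1, 0, 0, 0),   # isoleucine: bound by the IleRS-derived NAAB only
    "V": (1, 1, 0, 0),   # valine: also bound by the IleRS-derived NAAB
    "T": (0, 0, 1, 0),   # threonine: bound by the ThrRS-derived NAAB only
    "S": (0, 0, 1, 1),   # serine: also bound by the ThrRS-derived NAAB
}


def decode(observed: tuple[int, ...]) -> list[str]:
    """Return the residue(s) whose expected profile is nearest in Hamming distance."""
    if len(observed) != len(NAABS):
        raise ValueError("one observation per NAAB is required")
    distance = {aa: sum(o != e for o, e in zip(observed, profile))
                for aa, profile in PROFILES.items()}
    best = min(distance.values())
    return sorted(aa for aa, d in distance.items() if d == best)


print(decode((1, 1, 0, 0)))   # a clean valine profile
print(decode((1, 0, 1, 0)))   # an ambiguous profile
```

A tie, such as the profile (1, 0, 1, 0) matching isoleucine and threonine equally well, flags a read that this NAAB set cannot resolve.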

[0131] For reference, the abbreviations of the amino acids are as follows:

Amino acid                     Three-letter code   One-letter code
alanine                        ala                 A
arginine                       arg                 R
asparagine                     asn                 N
aspartic acid                  asp                 D
asparagine or aspartic acid    asx                 B
cysteine                       cys                 C
glutamic acid                  glu                 E
glutamine                      gln                 Q
glutamine or glutamic acid     glx                 Z
glycine                        gly                 G
histidine                      his                 H
isoleucine                     ile                 I
leucine                        leu                 L
lysine                         lys                 K
methionine                     met                 M
phenylalanine                  phe                 F
proline                        pro                 P
serine                         ser                 S
threonine                      thr                 T
tryptophan                     trp                 W
tyrosine                       tyr                 Y
valine                         val                 V
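Table A lists sequences in one-letter code, while the sequence listing uses three-letter codes. A minimal Python sketch of the conversion, built from the abbreviation table above and capitalized as in the sequence listing (the function name is hypothetical):

```python
# One-letter to three-letter codes from the abbreviation table
# (B = Asx and Z = Glx are ambiguity codes).
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "B": "Asx",
    "C": "Cys", "E": "Glu", "Q": "Gln", "Z": "Glx", "G": "Gly",
    "H": "His", "I": "Ile", "L": "Leu", "K": "Lys", "M": "Met",
    "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val",
}


def to_three_letter(seq):
    """Convert one-letter code (whitespace ignored) to space-separated three-letter code."""
    residues = [c for c in seq.upper() if not c.isspace()]
    unknown = sorted({c for c in residues if c not in ONE_TO_THREE})
    if unknown:
        raise ValueError(f"unrecognized residue code(s): {unknown}")
    return " ".join(ONE_TO_THREE[c] for c in residues)


# First ten residues of SEQ ID NO: 1 (wild-type eMAP).
print(to_three_letter("MAISIKTPED"))
# Met Ala Ile Ser Ile Lys Thr Pro Glu Asp
```

The output matches the opening residues of SEQ ID NO: 1 in the sequence listing, which is a quick consistency check between the two representations.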

TABLE A: NAAB sequences

SEQ ID NO: 1 | wild-type eMAP
MAISIKTPEDIEKMRVAGRLAAEVLEMIEPYVKPGVSTGELD RICNDYIVNEQHAVSACLGYHGYPKSVCISINEVVCHGIPDD AKLLKDGDIVNIDVTVIKDGFHGDTSKMFIVGKPTIMGERLC RITQESLYLALRMVKPGINLREIGAAIQKFVEAEGFSVVREYC GHGIGRGFHEEPQVLHYDSRETNVVLKPGMTFTIEPMVNAG KKEIRTMKDGWTVKTKDRSLSAQYEHTIVVTDNGCEILTLR KDDTIPAIISHDE

SEQ ID NO: 2 | eLAP
MAISIKTPEDIEKMRVAGRLAAEVLEMIEPYVKPGVSTGELE RICWDYIVNEQHATDSLTGHNGIDGHGSISINEVVCHGVPDD AKLLKDGDIVNIDVTVRKDGFHGDTSKMFIVGKPTIMGERLC RITQESLYLALRMVKPGINLREIGAAIQKFVEAEGFSVVREYC GHGIGRGHHEEPQVLHYDSRETNVVLKPGMTFTIEPMVNAG KKEIRTMKDGSTVKTKDRSLSAQYEHTIVVTDNGCEILTLRK DDTIPAIISHDE

SEQ ID NO: 3 | truncated wild-type MetRS (4-547)
AKKILVTCALPYANGSIHLGHMLEHIQADVWVRYQRMRGH EVNFICADDAHGTPIMLKAQQLGITPEQMIGEMSQEHQTDFA GFNISYDNYHSTHSEENRQLSELIYSRLKENGFIKNRTISQLY DPEKGMFLPDRFVKGTCPKCKSPDQYGDNCEVCGATYSPTE LIEPKSVVSGATPVMRDSEHFFFDLPSFSEMLQAWTRSGALQ EQVANKMQEWFESGLQQWDISRDAPYFGFEIPNAPGKYFYV WLDAPIGYMGSFKNLCDKRGDSVSFDEYWKKDSTAELYHFI GKDIVYFHSLFWPAMLEGSNFRKPSNLFVHGYVTVNGAKMS KSRGTFIKASTWLNHFDADSLRYYYTAKLSSRIDDIDLNLED FVQRVNADIVNKVVNLASRNAGFINKRFDGVLASELADPQL YKTFTDAAEVIGEAWESREFGKAVREIMALADLANRYVDEQ APWVVAKQEGRDADLQAICSMGINLFRVLMTYLKPVLPKLT ERAEAFLNTELTWDGIQQPLLGHKVNPFKALYNRIDMRQVE ALVEASK

SEQ ID NO: 4 | Met NAAB*
AKKILVTCASPYANGSIHLGHMLEHIQADVWVRYQRMRGH EVNFICADDAHGTPIMLKAQQLGITPEQMIGEMSQEHQTDFA GFNISYDNYHSTHSEENRQLSELIYSRLKENGFIKNRTISQLY DPEKGMFLPDRFVKGTCPKCKSPDQYGDNCEVCGATYSPTE LIEPKSVVSGATPVMRDSEHFFFDLPSFSEMLQAWTRSGALQ EQVANKMQEWFESGLQQWDISRDAPYFGFEIPNAPGKYFYV WLDAPIGLMGSFKNLCDKRGDSVSFDEYWKKDSTAELYHFI GKGIVYFLSLFWPAMLEGSNFRKPSNLFVHGYVTVNGAKMS KSRGTFIKASTWLNHFDADSLRYYYTAKLSSRIDDIDLNLED FVQRVNADIVNKVVNLASRNAGFINKRFDGVLASELADPQL YKTFTDAAEVIGEAWESREFGKAVREIMALADLANRYVDEQ APWVVAKQEGRDADLQAICSMGINLFRVLMTYLKPVLPKLT ERAEAFLNTELTWDGIQQPLLGHKVNPFKALYNRIDMRQVE ALVEASK

SEQ ID NO: 5 | wild-type MetRS (full length)
MTQVAKKILVTCALPYANGSIHLGHMLEHIQADVWVRYQR MRGHEVNFICADDAHGTPIMLKAQQLGITPEQMIGEMSQEH QTDFAGFNISYDNYHSTHSEENRQLSELIYSRLKENGFIKNRT ISQLYDPEKGMFLPDRFVKGTCPKCKSPDQYGDNCEVCGAT YSPTELIEPKSVVSGATPVMRDSEHFFFDLPSFSEMLQAWTRS GALQEQVANKMQEWFESGLQQWDISRDAPYFGFEIPNAPGK YFYVWLDAPIGYMGSFKNLCDKRGDSVSFDEYWKKDSTAE LYHFIGKDIVYFHSLFWPAMLEGSNFRKPSNLFVHGYVTVN GAKMSKSRGTFIKASTWLNHFDADSLRYYYTAKLSSRIDDID LNLEDFVQRVNADIVNKVVNLASRNAGFINKRFDGVLASEL ADPQLYKTFTDAAEVIGEAWESREFGKAVREIMALADLANR YVDEQAPWVVAKQEGRDADLQAICSMGINLFRVLMTYLKP VLPKLTERAEAFLNTELTWDGIQQPLLGHKVNPFKALYNRID MRQVEALVEASKEEVKAAAAPVTGPLADDPIQETITFDDFA KVDLRVALIENAEFVEGSDKLLRLTLDLGGEKRNVFSGIRSA YPDPQALIGRHTIMVANLAPRKMRFGISEGMVMAAGPGGKD IFLLSPDAGAKPGHQVK

SEQ ID NO: 6 | truncated wild-type PheRS (86-350)
VDVSLPGASLFSGGLHPITLMERELVEIFRALGYQAVEGPEV ESEFFNFDALNIPEHHPARDMWDTFWLTGEGFRLEGPLGEEV EGRLLLRTHTSPMQVRYMVAHTPPFRIVVPGRVFRFEQTDAT HEAVFHQLEGLVVGEGIAMAHLKGAIYELAQALFGPDSKVR FQPVYFPFVEPGAQFAVWWPEGGKWLELGGAGMVHPKVFQ AVDAYRERLGLPPAYRGVTGFAFGLGVERLAMLRYGIPDIR YFFGGRLKFLEQFKGVL

SEQ ID NO: 7 | PheNAAB (86-350)
VDVSLPGASLFSGGDHPITLMERELVEIFRALGYQAVEGPEV ESEFFNFDALNIPENGPARDMWDTVGKTGEGFRLEGPDGEE VEGRLLLRTHTSPMQVRYMVAHTPPFRIVVPGRVFRAEQTD ATAEAVFHQLEGLVVGEGVNEGDLYGAIYELAQALFGPDSK VRFQPVTFPFVEPGAQFAVWWPEGGKWLELGGAGMVGPNV FQAVDAYRERLGDPPAYRGVTGFAFGLGVERLAMLRYGIPD IRYF

SEQ ID NO: 8 | wild-type PheRS (full length)
MLEEALAAIQNARDLEELKALKARYLGKKGLLTQEMKGLS ALPLEERRKRGQELNAIKAALEAALEAREKALEEAALKEAL ERERVDVSLPGASLFSGGLHPITLMERELVEIFRALGYQAVE GPEVESEFFNFDALNIPEHHPARDMWDTFWLTGEGFRLEGPL GEEVEGRLLLRTHTSPMQVRYMVAHTPPFRIVVPGRVFRFEQ TDATHEAVFHQLEGLVVGEGIAMAHLKGAIYELAQALFGPD SKVRFQPVYFPFVEPGAQFAVWWPEGGKWLELGGAGMVHP KVFQAVDAYRERLGLPPAYRGVTGFAFGLGVERLAMLRYGI PDIRYFFGGRLKFLEQFKGVL

SEQ ID NO: 9 | truncated wild-type HisRS (3-180)
NIQAIRGMNDYLPGETAIWQRIEGTLKNVLGSYGYSEIRLPIV EQTPLFKRAIGEVTDVVEKEMYTFEDRNGDSLTLRPEGTAGC VRAGIEHGLLYNQEQRLWYIGPMFRHERPQKGRYRQFHQLG CEVFGLQGPDIDAELIMLTARWWRALGISEHVTLELNSIGSL EARANYRDA

SEQ ID NO: 10 | HisNAAB (3-180)
KNIQAIRGMNDYLPGETAIWQRIEGTLKNVLGSYGYSEIRLPI VEQTPLFKRAIGEVTDVVEKEMYTFEDRNGDSLTLRPEGTA GCVRAGIEHGLLYNQEQRLWYIGPMFGNAPQFHQLGCEVFG LQGPDIDAELIMLTARWWRALGISEHVTLELNSIGSLEARAN YRDA

SEQ ID NO: 11 | AlaNAAB
SKSTAEIRQAFLDFFHSKGHQVVASSSLVPHNDPTLLFTNAG MNQFKDVFLGLDKRNYSRATTSQRCVRAGGKHNDLENVGY TARHHTFFEMLGNFSFGDYFKHDAIQFAWELLTSEKWF ALPKERLWVTVYESDDEAYEIWEKEVGIPRERIIRIGDNKGA PYASDNFWQMGDTGPCGPCTEIFYDHGDHIWGGPPGSPEED GDRYIEIWNIVFMQFNRQADGTMEPLPKPSVDTGMGL ERIAAVLQHVNSNYDIDL

SEQ ID NO: 12 | ArgNAAB
EKQTIVVDYSAPNVAKEMHVGHLRSTIIGDAAVRTLEFLGH KVIRANHVGDWGTQFGMLIAWLEKQQQENAGEMELADLE GFYRDAKKHYDEDEEFAERARNYVVKLQSGDEYFREMWR KLVDITMTQNQITYDRLNVTLTRDDVMGESLYNPMLPGIVA DLKAKGLAVESEGATVVFLDEFKNKEGEPMGVIIQKKDGGY LYTTTDIACAKYRYESLHADRVLYYIDSRQHQHLMQAWAIV RKAGYVPESVPLEHHMFGMMLGKDGKPFKTRAGGTVKLAD LLDETLERARRLVAEKNPDMPADELEKLANAVGIGAVKYA DLSKNRTTDYIFDWDNMLAFEGNTAPYMQYAYTRVLSVFR KAEINEEQLAAAPVIIREDREAQLAARLLQFEETLTVVAREG TPHVMCAYLYDLAGLFSGFYEHCPILSAENEEVRNSRLKLAQ LTAKTLKLGLDTLGIETVERM

SEQ ID NO: 13 | AsnNAAB
SIEYLREVAHLRPRTNLIGAVARVRHTLAQALHRFFNEQGFF WVSTPLITASDTEGAGEMFRVSTLDLE NLPRNDQGKVDFDKDFFGKESFLTVSGQLNGETYACALSKI YTFGPTFRAENSNTSRHLAEFWMLEPEVAFANLNDIAGLAE AMLKYVFKAVLEERADDMKFFAERVDKDAVSRLERFIEADF AQVDYTDAVTILENCGRKFENPVYWGVDLSSEHERYLAEEH FKAPVVVKNYPKDIKAFYMRLNEDGKTVAAMDVLAPGIGEI IGGSQREERLDVLDERMLEMGLNKEDYWWYRDLRRYGTVP HSGFGLGFERLIAYVTGVQNVRDVIPFPRTP

SEQ ID NO: 14 | AspNAAB
LPLDSNHVNTEEARLKYRYLDLRRPEMAQRLKTRAKITSLV RRFMDDHGFLDIETPMLTKATPEGARDYLVPSRVHKGKFYA LPQSPQLFKQLLMMSGFDRYYQIVKCFRDEDLRADRQPEFT QIDVETSFMTAPQVREVMEALVRHLWLEVKGVDLGDFPVM TFAEAERRYGSDKPDLRNPMELTDVADLLRSVEFAVFAGPA NDPKGRVAALRVPGGASLTRKQIDEYDNFVKIYGAKGLAYI KVNERAKGLEGINSPVAKFLNAEHEAILDRTAAQDGDMIFFG ADNKKIVADAMGALRLKVGKDLGLTDESKWAPLWVIDFPM FEDDGEGGLTAMHHPFTSPKDMTAAELKAAPENAVANAYD MVINGYEVGGGSVRIHNGDMQQTVFGILGINEEEQREKFGFL LDALKYGTPPHAGLAFGLDRLTMLLTGTDNIRDVIAFPK

SEQ ID NO: 15 | CysNAAB
MLKIFNTLTRQKEEFKPIHAGEVGMYVCGITVYDLCHIGHGR TFVAFDVVARYLRFLGYKLKYVRNITDI DDKIIKRANENGESFVAMVDRMIAEMHKDFDALNILRPDME PRATHHIAEIIELTEQLIAKGHAYVADNGDVMFDVPTDPTYG VLSRQDLDQLQAGARVDVVDDKRNPMDFVLWKMSKEGEP SWPSPWGAGRPGWHIECSAMNCKQLGNHFDIHGGGSDLMF PHHENEIAQSTCAHDGQYVNYWMHSGMVMVDREKMSKSL GNFFTVRDVLKYYDAETVRYFLMSGHYRSQLNY

SEQ ID NO: 16 | GlnNAAB
TNFIRQIIDEDLASGKHTTVHTRFPPEPNGYLHIGHAKSICLNF GIAQDYKGQCNLRFDDTNPVKEDIEYVESIKNDVEWLGFHW SGNVRYSSDYFDQLHAYAIELINKGLAYVDELTPEQIREYRG TLTQPGKNSPYRDRSVEENLALFEKMRTGGFEEGKACLRAKI DMASPFIVMRDPVLYRIKFAEHHQTGNKWCIYPMYDFTHCIS DALEGITHSLCTLEFQDNRRLYDWVLDNITIPVHPRQYEFSR

SEQ ID NO: 17 | GluNAAB
IKTRFAPSPTGYLHVGGARTALYSWLFARNHGGEFVLRIEDT DLERSTPEAIEAIMDGMNWLSLEWDEGPYYQTKRFDRYNAV IDQMLEEGTAYKCYCSKERLEALREEQMAKGEKPRYDGRC RHSHEHHADDEPCVVRFANPQEGSVVFDDQIRGPIEFSNQEL DDLIIRRTDGSPTYNFCVVVDDWDMEITHVIRGEDHINNTPR QINILKALNAPVPVYAHVSMINGDDGKKLSKRHGAVSVMQ YRDDGYLPEALLNYLVRLGWSHGDQEIFTREEMIKYFTLNA VSKSASAFNTDKLLWLNHHYI

SEQ ID NO: 18 | IleNAAB
FPMRGDLAKREPGMLARWTDDDLYGIIRAAKKGKKTFILHD GPPYANGSIHIGHSVNKILKDIIIKSKGLSGYDSPYVPGWDCH GLPIELKVEQEYGKPGEKFTAAEFRAKCREYAATQVDGQRK DFIRLGVLGDWSHPYLTMDFKTEANIIRALGKIIGNGHLHKG AKPVHWCVDCRSALAEAEVEYYDKTSPSIVAFQAVDQDAL KTKFGVSNVNGPISLVIWTTTPWTLPANRAISIAPDFDYALVQ IDGQAVILAKDLVESMQRIGVSDYTILGTVKGAELELLRFTH PFMDFDVPAILGDHVTLDAGTGAVHTAPGHGPDDYVIGQKY GLETANPVGPDGTYLPGTYPTLDGVNVFKANDIVVALLQEK GALLHVEKMQHSYPCCWRHKTPIIFRATPQWFVSMDQKGLR AQSLKEIKGVQWIPDWGQARIESMVANRPDWCISRQRTWG VPMSLFVHKDTEELHPRTLELMEEVAKRVEVDGIQAWWDL DAKEILGDEADQYVKVPDTLDVWFDSGSTHSSVVDVRPEFA GHAADMYLEGSDQHRGWFMSSLMISTAMKGKAPYRQVLT HGFTVDGQGRKMSKSIGNTVSPQDVMNKLGADILRLWVAS TDYTGEMAVSDEILKRAADSYRRIRNTARFLLANLNGFDPA KDMVKPEEMVVLDRWAVGCAKAAQEDILKAYEAYDFHEV VQRLMRFCSVEMGSFYLDIIKDRQYTAKADSVARRSCQTAL YHIAEALVRWMAPILSFTADEVWGYLPGERE

SEQ ID NO: 19 | LeuNAAB
IESKVQLHWDEKRTFEVTEDESKEKYYCLSMLPYPSGRLHM GHVRNYTIGDVIARYQRMLGKNVLQPIGWDAFGLPAEGAA VKNNTAPAPWTYDNIAYMKNQLKMLGFGYDWSRELATCTP EYYRWEQKCFTELYKKGLVYKKTSAVNWCPNDQTVLANE QVIDGCCWRCDTKVERKEIPQWFIKITAYADELLNDLDKLD HWPDTVKTMQRNWIGRSEGVEITFNVKDYDNTLTVYTTRPD TFMGCTYLAVAAGHPLAQKAAENNPELAAFIDECRNTKVAE AEMATMEKKGVDTGFKAVHPLTGEEIPVWAANFVLMEYGT GAVMAVPGHDQRDYEFASKYGLNIKPVILAADGSEPDLSQQ ALTEKGVLFNSGEFNGLDHEAAFNAIADKLTEMGVGERKVN YRLRDWGVSRQRYWGAPIPMVTLEDGTVMPTPDDQLPVILP EDVVMDGITSPIKADPEWAKTTVNGMPALRETDTFDTFMES SWYYARYTCPEYKEGMLDSKAANYWLPVDIYIGGIEHAIMH LLYFRFFHKLMRDAGMVNSDEPAKQLLCQGMVLADAFYYV GENGERNWVSPVDAIVERDEKGRIVKAKDAAGHELVYTGM SKMSKSKNNGIDPQVMVERYGADTVRLFMMFASPADMTLE WQESGVEGANRFLKRVWKLVYEHTAKGDVAALNVDALTE DQKALRRDVHKTIAKVTDDIGRRQTFNTAIAAIMELMNKLA KAPTDGEQDRALMQEALLAVVRMLNPFTPHICFTLWQELKG EGDIDNAPWP

SEQ ID NO: 20 | LysNAAB
ANDKSRQTFVVRSKILAAIRQFMVARGFMEVETPMMQVIPG GASARPFITHHNALDLDMYLRIAPELYLKRLVVGGFERVFEI NRNFRNEGISVRHNPEFTMMELYMAYADYHDLIELTESLFRT LAQEVLGTTKVTYGEHVFDFGKPFEKLTMREAIKKYRPETD MADLDNFDAAKALAESIGITVEKSWGLGRIVTEIFDEVAEAH LIQPTFITEYPAEVSPLARRNDVNPEITDRFEFFIGGREIGNGFS ELNDAEDQAERFQEQVNAKAAGDDEAMFYDEDYVTALEY GLPPTAGLGIGIDRMIMLFTNSHTIRDVILFPAMRP

SEQ ID NO: 21 | ProNAAB
MIRKLASGLYTWLPTGVRVLKKVENIVREEMNNAGAIEVLM PVVQPSELWQESGRWEQYGPELLRIADRGDRPFVLGPTHEE VITDLIRNELSSYKQLPLNFYQIQTKFRDEVRPRFGVMRSREF LMKDAYSFHTSQESLQETYDAMYAAYSKIFSRMGLDFRAVQ ADTGSIGGSASHEFQVLAQSGEDDVVFSDTSDYAANIELAEA IAPKEPRAAATQEMTLVDTPNAKTIAELVEQFNLPIEKTVKTL LVKAVEGSSFPLVALLVRGDHELNEVKAEKLPQVASPLTFAT EEEIRAVVKAGPGSLGPVNMPIPVVIDRTVAAMSDFAAGANI DGKHYFGINWDRDVATPEIADIRNVVAGDPSPDGQGTLLIKR GIEVGHIFQLG

SEQ ID NO: 22 | SerNAAB
MLDPNLLRNEPDAVAEKLARRGFKLDVDKLGALEERRKVL QVKTENLQAERNSRSKSIGQAKARGEDIEPLRLEVNKLGEEL DAAKAELDALQAEIRDIALTIPNLPADEVPVGKDENDNVEVS RWGTPREFDFEVRDHVTLGEMYSGLDFAAAVKLTGSRFVV MKGQIARMHRALSQFMLDLHTEQHGYSENYVPYLVNQDTL YGTGQLPKFAGDLFHTRPLEEEADTSNYALIPTAEVPLTNLV RGEIIDEDDLPIKMTAHTPCFRSEAGSYGRDTRGLIRMHQFD KVEMVQIVRPEDSMAALEEMTGHAEKVLQLLGLPYRKIILC TGDMGFGACKTYDLEVWIPAQNTYREISSCSNVWDFQARR MQARCRSKSDKKTRLVHTLNGSGLAVGRTLVAVMENYQQ ADGRIEVPEVLRPYMNGLEYI

SEQ ID NO: 23 | ThreNAAB
RDHRKIGKQLDLYHMQEEAPGMVFWHNDGWTIFRELEVFV RSKLKEYQYQEVKGPFMMDRVLWEKTGHWDNYKDAMFTT S SENREYCIKPMNCPGHVQIFNQGLKSYRDLPLRMAEFGSCH RNEPSGSLHGLGRVRGFTQDDAHIFCTEEQIRDEVNGCIRLV YDMYSTFGFEKIVVKLSTRPEKRIGSDEMWDRAEADLAVAL EENNIPFEYQLGEGAFYGPKIEFTLYDCLDRAAQCGTVQLDF SLPSRLSASYVGEDNERKVPVMIHRAILGSMEVFIGILTEEFA GFFPTWLAPVQVVIMNITDSQSEYVNELTQKLSNAGIRVKAD LRNEKIGFKIREHTLRRVPYMLVCGDKEVESGKVAVRTRRG KDLGSMDVNEVIEKLQQEIRSRSLKQLEE

SEQ ID NO: 24 | TrpNAAB
MTKPIVFSGAQPSGELTIGNYMGALRQWINMQDDYHCIYCI VDQHAITVRQDAQKLRKATLDTLALYLACGIDPEKSTIFVQS HVPEHAQLGWALNCYTYFGELSRMTQFKDKSARYAENINA GLFDYPVLMAADILLYQTNLVPVGEDQKQHLELSRDIAQRF NALYGDIFKVPEPFIPKSGARVMSLLEPTKKMSKSDDNRNNV IGLLEDPKSVVKKIKRAVTDSDEPPVVRYDVQNKAGVSNLL DILSAVTGQSIPELEKQ

SEQ ID NO: 25 | TyrNAAB
MASSNLIKQLQERGLVAQVTDEEALVERLAQGPIALYCGFDP TADSLHLGHLVPLLCLKRFQQAGHKPVALVGGATGLIGDPS FKAAERKLNTEETVQEWVDKIRKQVAPFLDFDCGENSAIAA NNYDWFGNMNVLTFLRDIGKHFSVNQMINKEAVKQRLNRE DQGISFTEFSYNLLQGYDFACLNKQYGVVLQIGGSDQWGNI TSGIDLTRRLHQNQVFGLTVPLITKADGTKFGKTEGGAVWL DPKKTSPYKFYQFWINTADADVYRFLKFFTFMSIEEINALEEE DKNSGKAPRAQYVLAEQVTRLVHGEEGLQAAKRITECLFSG SLSALSEADFEQLAQDGVPMVKMEKGADLMQALVDSELQP SRGQARKTIASNAITINGEKQSDPEYFFKEEDRLFGRFTLLRR GKKNYCLICWK

SEQ ID NO: 26 | ValNAAB
MEKTYNPQDIEQPLYEHWEKQGYFKPNGDESQESFCIMIPPP NVTGSLHMGHAFQQTIMDTMIRYQRMQGKNTLWQVGTDH AGIATQMVVERKIAAEEGKTRHDYGREAFIDKIWEWKAESG GTITRQMRRLGNSVDWERERFTMDEGLSNAVKEVFVRLYK EDLIYRGKRLVNWDPKLRTAISDLEVENRESKGSMWHIRYP LADGAKTADGKDYLVVATTRPETLLGDTGVAVNPEDPRYK DLIGKYVILPLVNRRIPIVGDEHADMEKGTGCVKITPAHDFN DYEVGKRHALPMINILTFDGDIRESAQVFDTKGNESDVYSSEI PAEFQKLERFAARKAVVAAIDALGLLEEIKPHDLTVPYGDRG GVVIEPMLTDQWYVRADVLAKPAVEAVENGDIQFVPKQYE NMYFSWMRDIQDWCISRQLWWGHRIPAWYDEAGNVYVGR NEEEVRKENNLGADVALRQDEDVLDTWFSSALWTFSTLGW PENTDALRQFHPTSVMVSGFDIIFFWIARMIMMTMHFIKDEN GKPQVPFHTVYMTGLIRDDEGQKMSKSKGNVIDPLDMVDGI SLPELLEKRTGNMMQPQLADKIRKRTEKQFPNGIEPHGTDAL RFTLAALASTGRDINWDMKRLEGYRNFCNKLWNASRFVLM NTEGQDCGFNGGEMTLSLADRWILAEFNQTIKAYREALDSF RFDIAAGILYEFTWNQFCDWYLELTKPVMNGGTEAELRGTR HTLVTVLEGLLRLAHPIIPFITETIWQ

SEQ ID NO: 27 | Phosphotyrosine NAAB**
MDEFEMIKRNTSEIISELREVLKKDEKSALIGFEPSGKIHLGH YLQKKMIDLQNAGFDIIIPLADLHAYLNQKGELDEIRKIGDY NKKVFEAMLKAKYVYGSEFQLDKYTLNVYRLALKTTLKAR RSMELIAREDENPVAEVIYPIMQVNGCHYKGVDVAVGGME QRKIMLARELLPKKVVCIHPVLTGLDGEGKMSSSGNFIAVDD SPEEIRAFKKAYCPAGVVEGNPEIAKYFLEYPLTIKPEKFGGD LTVNSYEESLFKNKELHPMDLKAVAEELIKILEPIRK

SEQ ID NO: 28 | Phosphoserine NAAB
MRFDPEKIKKDAKENFDLTWNEGKKMVKTPTLNERYPRTTF RYGKAHPVYDTIQKLREAYLRMGFEEMMNPLIVDEKEVHK QFGSEALAVLDRCFYLAGLPRPNVGISDERIAQINGILGDIGD EGIDKVRKVLHAYKKGKVEGDDLVPEISAALEVSDALVAD MIEKVFPEFKELVAQASTKTLRSHMTSGWFISLGALLERKEP PFHFFSIDRCFRREQQEDASRLMTYYSASCVIMDENVTVDHG KAVAEGLLSQFGFEKFLFRPDEKRSKYYVPDTQTEVFAFHPK LVGSNSKYSDGWIEIATFGIYSPTALAEYDIPCPVMNLGLGVE RLAMILHDAPDIRSLTYPQIPQYSEWEMSDSELAKQVFVDKT PETPEGREIADAVVAQCELHGEEP SPCEFPAWEGEVCGRKVK VSVIEPEENTKLCGPAAFNEVVTYQGDILGIPNTKKWQKAFE NHSAMAGIRFIEAFAAQAAREIEEAAMSGADEHIVRVRIVKV PSEVNIKIGATAQRYITGKNKKIDMRGPIFTSAKAEFE

*Utilizes the base truncation mutant reported in reference (3), with an additional mutation of our own design.
**Truncated version of the sulfotyrosine tRNA synthetase mutant from reference (2). The full-length mutant is the subject of U.S. Pat. No. 8,114,652 B2.
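Several NAABs in Table A are point variants of a wild-type or truncated synthetase of the same length (for example, SEQ ID NO: 3 and SEQ ID NO: 4). A hypothetical Python helper for listing the substitutions between two equal-length sequences is sketched below. It performs no alignment, so it applies only to equal-length variants, and positions are 1-based relative to the first residue supplied unless an offset is given; the correct offset for any construct is an assumption left to the user.

```python
def substitutions(reference, variant, offset=0):
    """List point substitutions between two equal-length one-letter sequences.

    Whitespace (such as the line-wrap spaces in Table A) is ignored. Positions
    are 1-based; `offset` shifts the numbering, e.g. for a truncated construct.
    No alignment is performed, so insertions and deletions are not handled.
    """
    ref = "".join(reference.split())
    var = "".join(variant.split())
    if len(ref) != len(var):
        raise ValueError("sequences differ in length; align them first")
    return [
        f"{a}{i + 1 + offset}{b}"
        for i, (a, b) in enumerate(zip(ref, var))
        if a != b
    ]


# Short fragments copied from SEQ ID NO: 3 (truncated wild-type MetRS)
# and SEQ ID NO: 4 (Met NAAB); positions are relative to each fragment.
print(substitutions("AKKILVTCALPYANGS", "AKKILVTCASPYANGS"))  # ['L10S']
print(substitutions("GKDIVYFHSLFW", "GKGIVYFLSLFW"))          # ['D3G', 'H8L']
```

Passing the complete SEQ ID NO: 3 and SEQ ID NO: 4 strings, with spaces left in place, would list every difference between the two constructs in one call.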

TABLE B: Edmanase sequence

SEQ ID NO: 29
APAAVDWRARGAVTAVKDSGQCGSGWAFAAIGNVECQWFLA GHPLTNLSEQMLVSCDKTDSGCSSGLMDNAFEWIVQENNGA VYTEDSYPYASATGISPPCTTSGHTVGATITGHVELPQDEA QIAAWLAVNGPVAVCVDASSWMTYTGGVMTSCVSESYDHGV LLVGYNDSHKVPYWIIKNSWTTQWGEEGYIRIAKGSNQCLV KEEASSAVVG

REFERENCES

[0132] 1. Ingolia N T, Ghaemmaghami S, Newman J R S, Weissman J S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009, 324: 218.
[0133] 2. Grimsrud P A, Swaney D L, Wenger C D, Beauchene N A, Coon J J. Phosphoproteomics for the masses. ACS Chem Biol. 2010, 5: 105-119.
[0134] 3. Duncan M W, Aebersold R, Caprioli R M. The pros and cons of peptide-centric proteomics. Nat Biotechnol. 2010.
[0135] 4. Gillette M A, Mani D R, Carr S A. Place of Pattern in Proteomic Biomarker Discovery. J Proteome Res. 2005, 4: 1143-1154.
[0136] 5. Anderson N L, Anderson N G. The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics. 2002.
[0137] 6. Edman P. Method for determination of the amino acid sequence in peptides. Acta Chem Scand. 1950, 4: 283-293.
[0138] 7. Mitra R D, Tessler L A. Single Molecule Protein Screening. WO 2010/065531 A1.
[0139] 8. Tessler L A, Donahoe C D, Garcia D J, Jun Y S, Elbert D L, Mitra R D. Nanogel surface coatings for improved single-molecule imaging substrates. J R Soc Interface. 2011.
[0140] 9. Tessler L A, Reifenberger J G, Mitra R D. Protein Quantification in Complex Mixtures by Solid Phase Single Molecule Counting. Anal Chem. 2009, 81: 7141-7148.
[0141] 10. Emmert-Buck M R, Bonner R F, Smith P D, Chuaqui R F, Zhuang Z, Goldstein S R, Weiss R A, Liotta L A. Laser capture microdissection. Science. 1996, 274: 998.
[0142] 11. Havranek J J, Harbury P B. Automated design of specificity in molecular recognition. Nat Struct Biol. 2003, 10: 45-52.
[0143] 12. Ashworth J, Havranek J J, Duarte C M, Sussman D, Monnat R J Jr, Stoddard B L, Baker D. Computational redesign of endonuclease DNA binding and cleavage specificity. Nature. 2006.
[0144] 13. Ashworth J, Taylor G K, Havranek J J, Quadri S A, Stoddard B L, Baker D. Computational reprogramming of homing endonuclease specificity at multiple adjacent base pairs. Nucleic Acids Res. 2010, 38: 5601.
[0145] 14. Havranek J J, Baker D. Motif-directed flexible backbone design of functional interactions. Protein Sci. 2009, 18: 1293-1305.
[0146] 15. Berman H, Henrick K, Nakamura H, Markley J L. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 2006, 35: D301.
[0147] 16. Schmitt E, Tanrikulu I C, Yoo T H, Panvert M, Tirrell D A, Mechulam Y. Switching from an induced-fit to a lock-and-key mechanism in an aminoacyl-tRNA synthetase with modified specificity. J Mol Biol. 2009, 394: 843-851.
[0148] 17. Studier F W. Protein production by auto-induction in high-density shaking cultures. Protein Expr Purif. 2005, 41: 207-234.
[0149] 18. Wolf Y I, Aravind L, Grishin N V, Koonin E V. Evolution of aminoacyl-tRNA synthetases: analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res. 1999, 9: 689.
[0150] 19. Finn R D, Tate J, Mistry J, Coggill P C, Sammut S J, Hotz H R, Ceric G, Forslund K, Eddy S R, Sonnhammer E L, Bateman A. The Pfam protein families database. Nucleic Acids Res. 2008, 36: D281-8.
[0151] 20. Augustine J, Francklyn C. Design of an active fragment of a class II aminoacyl-tRNA synthetase and its significance for synthetase evolution. Biochemistry. 1997, 36: 3473-3482.
[0152] 21. Arnez J G, Augustine J G, Moras D, Francklyn C S. The first step of aminoacylation at the atomic level in histidyl-tRNA synthetase. Proc Natl Acad Sci USA. 1997, 94: 7144.
[0153] 22. Holm L, Rosenstrom P. Dali server: conservation mapping in 3D. Nucleic Acids Res. 2010, 38: W545.
[0154] 23. Kavran J M, Gundllapalli S, O'Donoghue P, Englert M, Soll D, Steitz T A. Structure of pyrrolysyl-tRNA synthetase, an archaeal enzyme for genetic code innovation. Proc Natl Acad Sci USA. 2007, 104: 11268.
[0155] 24. Kuhlman B, Baker D. Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci USA. 2000, 97: 10383-10388.
[0156] 25. Barrett G C, Penglis A J. Edman Stepwise degradation of polypeptides: a new strategy employing mild basic cleavage conditions. Tetrahedron Lett. 1985, 26: 4375-4378.
[0157] 26. Celej M S, Montich G G, Fidelio G D. Protein stability induced by ligand binding correlates with changes in protein flexibility. Protein Sci. 2003, 12: 1496-1506.
[0158] 27. Choe Y, Brinen L S, Price M S, Engel J C, Lange M, Grisostomi C, Weston S G, Pallai P V, Cheng H, Hardy L W. Development of α-keto-based inhibitors of cruzain, a cysteine protease implicated in Chagas disease. Bioorg Med Chem. 2005, 13: 2141-2156.
[0159] 28. Carter P, Wells J A. Engineering enzyme specificity by "substrate-assisted catalysis". Science. 1987, 237: 394.
[0160] 29. McGrath M E. The lysosomal cysteine proteases. Annu Rev Biophys Biomol Struct. 1999, 28: 181-204.
[0161] 30. Jiang L, Althoff E A, Clemente F R, Doyle L, Rothlisberger D, Zanghellini A, Gallaher J L, Betker J L, Tanaka F, Barbas C F 3rd, Hilvert D, Houk K N, Stoddard B L, Baker D. De novo computational design of retro-aldol enzymes. Science. 2008, 319: 1387-1391.
[0162] 31. Rothlisberger D, Khersonsky O, Wollacott A M, Jiang L, DeChancie J, Betker J, Gallaher J L, Althoff E A, Zanghellini A, Dym O, Albeck S, Houk K N, Tawfik D S, Baker D. Kemp elimination catalysts by computational enzyme design. Nature. 2008, 453: 190-195.
[0163] 32. Schmidt M W, Baldridge K K, Boatz J A, Elbert S T, Gordon M S, Jensen J H, Koseki S, Matsunaga N, Nguyen K A, Su S. General atomic and molecular electronic structure system. J Comput Chem. 1993, 14: 1347-1363.
[0164] 33. Dantas G, Corrent C, Reichow S L, Havranek J J, Eletr Z M, Isern N G, Kuhlman B, Varani G, Merritt E A, Baker D. High-resolution structural and thermodynamic analysis of extreme stabilization of human procarboxypeptidase by computational protein design. J Mol Biol. 2007, 366: 1209-1221.
[0165] 34. Dunbrack R L. Backbone-dependent rotamer library for proteins: application to side-chain prediction. J Mol Biol. 1993, 230: 543-574.
[0166] 35. Chiravuri M, Agarraberes F, Mathieu S L, Lee H, Huber B T. Vesicular localization and characterization of a novel post-proline-cleaving aminodipeptidase, quiescent cell proline dipeptidase. J Immunol. 2000, 165: 5695.
[0167] 36. Fukunaga R, Yokoyama S. Structural insights into the first step of RNA-dependent cysteine biosynthesis in archaea. Nat Struct Mol Biol. 2007, 14: 272-279.
[0168] 37. Liu C C, Schultz P G. Recombinant expression of selectively sulfated proteins in Escherichia coli. Nat Biotechnol. 2006, 24: 1436-1440.
[0169] 38. Turner J M, Graziano J, Spraggon G, Schultz P G. Structural characterization of a p-acetylphenylalanyl aminoacyl-tRNA synthetase. J Am Chem Soc. 2005, 127: 14976-14977.
[0170] 39. Xie J, Supekova L, Schultz P G. A genetically encoded metabolically stable analogue of phosphotyrosine in Escherichia coli. ACS Chem Biol. 2007, 2: 474-478.
[0171] 40. O'Brien P J, Herschlag D. Catalytic promiscuity and the evolution of new enzymatic activities. Chem Biol. 1999, 6: R91-R105.

Sequence CWU 1

1

301264PRTEscherichia coli 1Met Ala Ile Ser Ile Lys Thr Pro Glu Asp Ile Glu Lys Met Arg Val1 5 10 15Ala Gly Arg Leu Ala Ala Glu Val Leu Glu Met Ile Glu Pro Tyr Val 20 25 30Lys Pro Gly Val Ser Thr Gly Glu Leu Asp Arg Ile Cys Asn Asp Tyr 35 40 45Ile Val Asn Glu Gln His Ala Val Ser Ala Cys Leu Gly Tyr His Gly 50 55 60Tyr Pro Lys Ser Val Cys Ile Ser Ile Asn Glu Val Val Cys His Gly65 70 75 80Ile Pro Asp Asp Ala Lys Leu Leu Lys Asp Gly Asp Ile Val Asn Ile 85 90 95Asp Val Thr Val Ile Lys Asp Gly Phe His Gly Asp Thr Ser Lys Met 100 105 110Phe Ile Val Gly Lys Pro Thr Ile Met Gly Glu Arg Leu Cys Arg Ile 115 120 125Thr Gln Glu Ser Leu Tyr Leu Ala Leu Arg Met Val Lys Pro Gly Ile 130 135 140Asn Leu Arg Glu Ile Gly Ala Ala Ile Gln Lys Phe Val Glu Ala Glu145 150 155 160Gly Phe Ser Val Val Arg Glu Tyr Cys Gly His Gly Ile Gly Arg Gly 165 170 175Phe His Glu Glu Pro Gln Val Leu His Tyr Asp Ser Arg Glu Thr Asn 180 185 190Val Val Leu Lys Pro Gly Met Thr Phe Thr Ile Glu Pro Met Val Asn 195 200 205Ala Gly Lys Lys Glu Ile Arg Thr Met Lys Asp Gly Trp Thr Val Lys 210 215 220Thr Lys Asp Arg Ser Leu Ser Ala Gln Tyr Glu His Thr Ile Val Val225 230 235 240Thr Asp Asn Gly Cys Glu Ile Leu Thr Leu Arg Lys Asp Asp Thr Ile 245 250 255Pro Ala Ile Ile Ser His Asp Glu 2602264PRTEscherichia coli 2Met Ala Ile Ser Ile Lys Thr Pro Glu Asp Ile Glu Lys Met Arg Val1 5 10 15Ala Gly Arg Leu Ala Ala Glu Val Leu Glu Met Ile Glu Pro Tyr Val 20 25 30Lys Pro Gly Val Ser Thr Gly Glu Leu Glu Arg Ile Cys Trp Asp Tyr 35 40 45Ile Val Asn Glu Gln His Ala Thr Asp Ser Leu Thr Gly His Asn Gly 50 55 60Ile Asp Gly His Gly Ser Ile Ser Ile Asn Glu Val Val Cys His Gly65 70 75 80Val Pro Asp Asp Ala Lys Leu Leu Lys Asp Gly Asp Ile Val Asn Ile 85 90 95Asp Val Thr Val Arg Lys Asp Gly Phe His Gly Asp Thr Ser Lys Met 100 105 110Phe Ile Val Gly Lys Pro Thr Ile Met Gly Glu Arg Leu Cys Arg Ile 115 120 125Thr Gln Glu Ser Leu Tyr Leu Ala Leu Arg Met Val Lys Pro Gly Ile 130 135 140Asn Leu Arg Glu Ile Gly Ala Ala Ile Gln Lys Phe Val Glu Ala 
Glu145 150 155 160Gly Phe Ser Val Val Arg Glu Tyr Cys Gly His Gly Ile Gly Arg Gly 165 170 175His His Glu Glu Pro Gln Val Leu His Tyr Asp Ser Arg Glu Thr Asn 180 185 190Val Val Leu Lys Pro Gly Met Thr Phe Thr Ile Glu Pro Met Val Asn 195 200 205Ala Gly Lys Lys Glu Ile Arg Thr Met Lys Asp Gly Ser Thr Val Lys 210 215 220Thr Lys Asp Arg Ser Leu Ser Ala Gln Tyr Glu His Thr Ile Val Val225 230 235 240Thr Asp Asn Gly Cys Glu Ile Leu Thr Leu Arg Lys Asp Asp Thr Ile 245 250 255Pro Ala Ile Ile Ser His Asp Glu 2603544PRTEscherichia coli 3Ala Lys Lys Ile Leu Val Thr Cys Ala Leu Pro Tyr Ala Asn Gly Ser1 5 10 15Ile His Leu Gly His Met Leu Glu His Ile Gln Ala Asp Val Trp Val 20 25 30Arg Tyr Gln Arg Met Arg Gly His Glu Val Asn Phe Ile Cys Ala Asp 35 40 45Asp Ala His Gly Thr Pro Ile Met Leu Lys Ala Gln Gln Leu Gly Ile 50 55 60Thr Pro Glu Gln Met Ile Gly Glu Met Ser Gln Glu His Gln Thr Asp65 70 75 80Phe Ala Gly Phe Asn Ile Ser Tyr Asp Asn Tyr His Ser Thr His Ser 85 90 95Glu Glu Asn Arg Gln Leu Ser Glu Leu Ile Tyr Ser Arg Leu Lys Glu 100 105 110Asn Gly Phe Ile Lys Asn Arg Thr Ile Ser Gln Leu Tyr Asp Pro Glu 115 120 125Lys Gly Met Phe Leu Pro Asp Arg Phe Val Lys Gly Thr Cys Pro Lys 130 135 140Cys Lys Ser Pro Asp Gln Tyr Gly Asp Asn Cys Glu Val Cys Gly Ala145 150 155 160Thr Tyr Ser Pro Thr Glu Leu Ile Glu Pro Lys Ser Val Val Ser Gly 165 170 175Ala Thr Pro Val Met Arg Asp Ser Glu His Phe Phe Phe Asp Leu Pro 180 185 190Ser Phe Ser Glu Met Leu Gln Ala Trp Thr Arg Ser Gly Ala Leu Gln 195 200 205Glu Gln Val Ala Asn Lys Met Gln Glu Trp Phe Glu Ser Gly Leu Gln 210 215 220Gln Trp Asp Ile Ser Arg Asp Ala Pro Tyr Phe Gly Phe Glu Ile Pro225 230 235 240Asn Ala Pro Gly Lys Tyr Phe Tyr Val Trp Leu Asp Ala Pro Ile Gly 245 250 255Tyr Met Gly Ser Phe Lys Asn Leu Cys Asp Lys Arg Gly Asp Ser Val 260 265 270Ser Phe Asp Glu Tyr Trp Lys Lys Asp Ser Thr Ala Glu Leu Tyr His 275 280 285Phe Ile Gly Lys Asp Ile Val Tyr Phe His Ser Leu Phe Trp Pro Ala 290 295 300Met Leu Glu Gly Ser Asn Phe Arg Lys Pro Ser Asn Leu 
Phe Val His305 310 315 320Gly Tyr Val Thr Val Asn Gly Ala Lys Met Ser Lys Ser Arg Gly Thr 325 330 335Phe Ile Lys Ala Ser Thr Trp Leu Asn His Phe Asp Ala Asp Ser Leu 340 345 350Arg Tyr Tyr Tyr Thr Ala Lys Leu Ser Ser Arg Ile Asp Asp Ile Asp 355 360 365Leu Asn Leu Glu Asp Phe Val Gln Arg Val Asn Ala Asp Ile Val Asn 370 375 380Lys Val Val Asn Leu Ala Ser Arg Asn Ala Gly Phe Ile Asn Lys Arg385 390 395 400Phe Asp Gly Val Leu Ala Ser Glu Leu Ala Asp Pro Gln Leu Tyr Lys 405 410 415Thr Phe Thr Asp Ala Ala Glu Val Ile Gly Glu Ala Trp Glu Ser Arg 420 425 430Glu Phe Gly Lys Ala Val Arg Glu Ile Met Ala Leu Ala Asp Leu Ala 435 440 445Asn Arg Tyr Val Asp Glu Gln Ala Pro Trp Val Val Ala Lys Gln Glu 450 455 460Gly Arg Asp Ala Asp Leu Gln Ala Ile Cys Ser Met Gly Ile Asn Leu465 470 475 480Phe Arg Val Leu Met Thr Tyr Leu Lys Pro Val Leu Pro Lys Leu Thr 485 490 495Glu Arg Ala Glu Ala Phe Leu Asn Thr Glu Leu Thr Trp Asp Gly Ile 500 505 510Gln Gln Pro Leu Leu Gly His Lys Val Asn Pro Phe Lys Ala Leu Tyr 515 520 525Asn Arg Ile Asp Met Arg Gln Val Glu Ala Leu Val Glu Ala Ser Lys 530 535 5404544PRTEscherichia coli 4Ala Lys Lys Ile Leu Val Thr Cys Ala Ser Pro Tyr Ala Asn Gly Ser1 5 10 15Ile His Leu Gly His Met Leu Glu His Ile Gln Ala Asp Val Trp Val 20 25 30Arg Tyr Gln Arg Met Arg Gly His Glu Val Asn Phe Ile Cys Ala Asp 35 40 45Asp Ala His Gly Thr Pro Ile Met Leu Lys Ala Gln Gln Leu Gly Ile 50 55 60Thr Pro Glu Gln Met Ile Gly Glu Met Ser Gln Glu His Gln Thr Asp65 70 75 80Phe Ala Gly Phe Asn Ile Ser Tyr Asp Asn Tyr His Ser Thr His Ser 85 90 95Glu Glu Asn Arg Gln Leu Ser Glu Leu Ile Tyr Ser Arg Leu Lys Glu 100 105 110Asn Gly Phe Ile Lys Asn Arg Thr Ile Ser Gln Leu Tyr Asp Pro Glu 115 120 125Lys Gly Met Phe Leu Pro Asp Arg Phe Val Lys Gly Thr Cys Pro Lys 130 135 140Cys Lys Ser Pro Asp Gln Tyr Gly Asp Asn Cys Glu Val Cys Gly Ala145 150 155 160Thr Tyr Ser Pro Thr Glu Leu Ile Glu Pro Lys Ser Val Val Ser Gly 165 170 175Ala Thr Pro Val Met Arg Asp Ser Glu His Phe Phe Phe Asp Leu Pro 180 185 190Ser 
Phe Ser Glu Met Leu Gln Ala Trp Thr Arg Ser Gly Ala Leu Gln 195 200 205Glu Gln Val Ala Asn Lys Met Gln Glu Trp Phe Glu Ser Gly Leu Gln 210 215 220Gln Trp Asp Ile Ser Arg Asp Ala Pro Tyr Phe Gly Phe Glu Ile Pro225 230 235 240Asn Ala Pro Gly Lys Tyr Phe Tyr Val Trp Leu Asp Ala Pro Ile Gly 245 250 255Leu Met Gly Ser Phe Lys Asn Leu Cys Asp Lys Arg Gly Asp Ser Val 260 265 270Ser Phe Asp Glu Tyr Trp Lys Lys Asp Ser Thr Ala Glu Leu Tyr His 275 280 285Phe Ile Gly Lys Gly Ile Val Tyr Phe Leu Ser Leu Phe Trp Pro Ala 290 295 300Met Leu Glu Gly Ser Asn Phe Arg Lys Pro Ser Asn Leu Phe Val His305 310 315 320Gly Tyr Val Thr Val Asn Gly Ala Lys Met Ser Lys Ser Arg Gly Thr 325 330 335Phe Ile Lys Ala Ser Thr Trp Leu Asn His Phe Asp Ala Asp Ser Leu 340 345 350Arg Tyr Tyr Tyr Thr Ala Lys Leu Ser Ser Arg Ile Asp Asp Ile Asp 355 360 365Leu Asn Leu Glu Asp Phe Val Gln Arg Val Asn Ala Asp Ile Val Asn 370 375 380Lys Val Val Asn Leu Ala Ser Arg Asn Ala Gly Phe Ile Asn Lys Arg385 390 395 400Phe Asp Gly Val Leu Ala Ser Glu Leu Ala Asp Pro Gln Leu Tyr Lys 405 410 415Thr Phe Thr Asp Ala Ala Glu Val Ile Gly Glu Ala Trp Glu Ser Arg 420 425 430Glu Phe Gly Lys Ala Val Arg Glu Ile Met Ala Leu Ala Asp Leu Ala 435 440 445Asn Arg Tyr Val Asp Glu Gln Ala Pro Trp Val Val Ala Lys Gln Glu 450 455 460Gly Arg Asp Ala Asp Leu Gln Ala Ile Cys Ser Met Gly Ile Asn Leu465 470 475 480Phe Arg Val Leu Met Thr Tyr Leu Lys Pro Val Leu Pro Lys Leu Thr 485 490 495Glu Arg Ala Glu Ala Phe Leu Asn Thr Glu Leu Thr Trp Asp Gly Ile 500 505 510Gln Gln Pro Leu Leu Gly His Lys Val Asn Pro Phe Lys Ala Leu Tyr 515 520 525Asn Arg Ile Asp Met Arg Gln Val Glu Ala Leu Val Glu Ala Ser Lys 530 535 5405677PRTEscherichia coli 5Met Thr Gln Val Ala Lys Lys Ile Leu Val Thr Cys Ala Leu Pro Tyr1 5 10 15Ala Asn Gly Ser Ile His Leu Gly His Met Leu Glu His Ile Gln Ala 20 25 30Asp Val Trp Val Arg Tyr Gln Arg Met Arg Gly His Glu Val Asn Phe 35 40 45Ile Cys Ala Asp Asp Ala His Gly Thr Pro Ile Met Leu Lys Ala Gln 50 55 60Gln Leu Gly Ile Thr Pro Glu 
Gln Met Ile Gly Glu Met Ser Gln Glu65 70 75 80His Gln Thr Asp Phe Ala Gly Phe Asn Ile Ser Tyr Asp Asn Tyr His 85 90 95Ser Thr His Ser Glu Glu Asn Arg Gln Leu Ser Glu Leu Ile Tyr Ser 100 105 110Arg Leu Lys Glu Asn Gly Phe Ile Lys Asn Arg Thr Ile Ser Gln Leu 115 120 125Tyr Asp Pro Glu Lys Gly Met Phe Leu Pro Asp Arg Phe Val Lys Gly 130 135 140Thr Cys Pro Lys Cys Lys Ser Pro Asp Gln Tyr Gly Asp Asn Cys Glu145 150 155 160Val Cys Gly Ala Thr Tyr Ser Pro Thr Glu Leu Ile Glu Pro Lys Ser 165 170 175Val Val Ser Gly Ala Thr Pro Val Met Arg Asp Ser Glu His Phe Phe 180 185 190Phe Asp Leu Pro Ser Phe Ser Glu Met Leu Gln Ala Trp Thr Arg Ser 195 200 205Gly Ala Leu Gln Glu Gln Val Ala Asn Lys Met Gln Glu Trp Phe Glu 210 215 220Ser Gly Leu Gln Gln Trp Asp Ile Ser Arg Asp Ala Pro Tyr Phe Gly225 230 235 240Phe Glu Ile Pro Asn Ala Pro Gly Lys Tyr Phe Tyr Val Trp Leu Asp 245 250 255Ala Pro Ile Gly Tyr Met Gly Ser Phe Lys Asn Leu Cys Asp Lys Arg 260 265 270Gly Asp Ser Val Ser Phe Asp Glu Tyr Trp Lys Lys Asp Ser Thr Ala 275 280 285Glu Leu Tyr His Phe Ile Gly Lys Asp Ile Val Tyr Phe His Ser Leu 290 295 300Phe Trp Pro Ala Met Leu Glu Gly Ser Asn Phe Arg Lys Pro Ser Asn305 310 315 320Leu Phe Val His Gly Tyr Val Thr Val Asn Gly Ala Lys Met Ser Lys 325 330 335Ser Arg Gly Thr Phe Ile Lys Ala Ser Thr Trp Leu Asn His Phe Asp 340 345 350Ala Asp Ser Leu Arg Tyr Tyr Tyr Thr Ala Lys Leu Ser Ser Arg Ile 355 360 365Asp Asp Ile Asp Leu Asn Leu Glu Asp Phe Val Gln Arg Val Asn Ala 370 375 380Asp Ile Val Asn Lys Val Val Asn Leu Ala Ser Arg Asn Ala Gly Phe385 390 395 400Ile Asn Lys Arg Phe Asp Gly Val Leu Ala Ser Glu Leu Ala Asp Pro 405 410 415Gln Leu Tyr Lys Thr Phe Thr Asp Ala Ala Glu Val Ile Gly Glu Ala 420 425 430Trp Glu Ser Arg Glu Phe Gly Lys Ala Val Arg Glu Ile Met Ala Leu 435 440 445Ala Asp Leu Ala Asn Arg Tyr Val Asp Glu Gln Ala Pro Trp Val Val 450 455 460Ala Lys Gln Glu Gly Arg Asp Ala Asp Leu Gln Ala Ile Cys Ser Met465 470 475 480Gly Ile Asn Leu Phe Arg Val Leu Met Thr Tyr Leu Lys Pro Val Leu 
485 490 495Pro Lys Leu Thr Glu Arg Ala Glu Ala Phe Leu Asn Thr Glu Leu Thr 500 505 510Trp Asp Gly Ile Gln Gln Pro Leu Leu Gly His Lys Val Asn Pro Phe 515 520 525Lys Ala Leu Tyr Asn Arg Ile Asp Met Arg Gln Val Glu Ala Leu Val 530 535 540Glu Ala Ser Lys Glu Glu Val Lys Ala Ala Ala Ala Pro Val Thr Gly545 550 555 560Pro Leu Ala Asp Asp Pro Ile Gln Glu Thr Ile Thr Phe Asp Asp Phe 565 570 575Ala Lys Val Asp Leu Arg Val Ala Leu Ile Glu Asn Ala Glu Phe Val 580 585 590Glu Gly Ser Asp Lys Leu Leu Arg Leu Thr Leu Asp Leu Gly Gly Glu 595 600 605Lys Arg Asn Val Phe Ser Gly Ile Arg Ser Ala Tyr Pro Asp Pro Gln 610 615 620Ala Leu Ile Gly Arg His Thr Ile Met Val Ala Asn Leu Ala Pro Arg625 630 635 640Lys Met Arg Phe Gly Ile Ser Glu Gly Met Val Met Ala Ala Gly Pro 645 650 655Gly Gly Lys Asp Ile Phe Leu Leu Ser Pro Asp Ala Gly Ala Lys Pro 660 665 670Gly His Gln Val Lys 6756265PRTThermus thermophilus 6Val Asp Val Ser Leu Pro Gly Ala Ser Leu Phe Ser Gly Gly Leu His1 5 10 15Pro Ile Thr Leu Met Glu Arg Glu Leu Val Glu Ile Phe Arg Ala Leu 20 25 30Gly Tyr Gln Ala Val Glu Gly Pro Glu Val Glu Ser Glu Phe Phe Asn 35 40 45Phe Asp Ala Leu Asn Ile Pro Glu His His Pro Ala Arg Asp Met Trp 50 55 60Asp Thr Phe Trp Leu Thr Gly Glu Gly Phe Arg Leu Glu Gly Pro Leu65 70 75 80Gly Glu Glu Val Glu Gly Arg Leu Leu Leu Arg Thr His Thr Ser Pro 85 90 95Met Gln Val Arg Tyr Met Val Ala His Thr Pro Pro Phe Arg Ile Val 100 105 110Val Pro Gly Arg Val Phe Arg Phe Glu Gln Thr Asp Ala Thr His Glu 115 120 125Ala Val Phe His Gln Leu Glu Gly Leu Val Val Gly Glu Gly Ile Ala 130 135 140Met Ala His Leu Lys Gly Ala Ile Tyr Glu Leu Ala Gln Ala Leu Phe145 150 155 160Gly Pro Asp Ser Lys Val Arg Phe Gln Pro Val Tyr Phe Pro Phe Val
165 170 175Glu Pro Gly Ala Gln Phe Ala Val Trp Trp Pro Glu Gly Gly Lys Trp 180 185 190Leu Glu Leu Gly Gly Ala Gly Met Val His Pro Lys Val Phe Gln Ala 195 200 205Val Asp Ala Tyr Arg Glu Arg Leu Gly Leu Pro Pro Ala Tyr Arg Gly 210 215 220Val Thr Gly Phe Ala Phe Gly Leu Gly Val Glu Arg Leu Ala Met Leu225 230 235 240Arg Tyr Gly Ile Pro Asp Ile Arg Tyr Phe Phe Gly Gly Arg Leu Lys 245 250 255Phe Leu Glu Gln Phe Lys Gly Val Leu 260 2657250PRTThermus thermophilus 7Val Asp Val Ser Leu Pro Gly Ala Ser Leu Phe Ser Gly Gly Asp His1 5 10 15Pro Ile Thr Leu Met Glu Arg Glu Leu Val Glu Ile Phe Arg Ala Leu 20 25 30Gly Tyr Gln Ala Val Glu Gly Pro Glu Val Glu Ser Glu Phe Phe Asn 35 40 45Phe Asp Ala Leu Asn Ile Pro Glu Asn Gly Pro Ala Arg Asp Met Trp 50 55 60Asp Thr Val Gly Lys Thr Gly Glu Gly Phe Arg Leu Glu Gly Pro Asp65 70 75 80Gly Glu Glu Val Glu Gly Arg Leu Leu Leu Arg Thr His Thr Ser Pro 85 90 95Met Gln Val Arg Tyr Met Val Ala His Thr Pro Pro Phe Arg Ile Val 100 105 110Val Pro Gly Arg Val Phe Arg Ala Glu Gln Thr Asp Ala Thr Ala Glu 115 120 125Ala Val Phe His Gln Leu Glu Gly Leu Val Val Gly Glu Gly Val Asn 130 135 140Glu Gly Asp Leu Tyr Gly Ala Ile Tyr Glu Leu Ala Gln Ala Leu Phe145 150 155 160Gly Pro Asp Ser Lys Val Arg Phe Gln Pro Val Thr Phe Pro Phe Val 165 170 175Glu Pro Gly Ala Gln Phe Ala Val Trp Trp Pro Glu Gly Gly Lys Trp 180 185 190Leu Glu Leu Gly Gly Ala Gly Met Val Gly Pro Asn Val Phe Gln Ala 195 200 205Val Asp Ala Tyr Arg Glu Arg Leu Gly Asp Pro Pro Ala Tyr Arg Gly 210 215 220Val Thr Gly Phe Ala Phe Gly Leu Gly Val Glu Arg Leu Ala Met Leu225 230 235 240Arg Tyr Gly Ile Pro Asp Ile Arg Tyr Phe 245 2508350PRTThermus thermophilus 8Met Leu Glu Glu Ala Leu Ala Ala Ile Gln Asn Ala Arg Asp Leu Glu1 5 10 15Glu Leu Lys Ala Leu Lys Ala Arg Tyr Leu Gly Lys Lys Gly Leu Leu 20 25 30Thr Gln Glu Met Lys Gly Leu Ser Ala Leu Pro Leu Glu Glu Arg Arg 35 40 45Lys Arg Gly Gln Glu Leu Asn Ala Ile Lys Ala Ala Leu Glu Ala Ala 50 55 60Leu Glu Ala Arg Glu Lys Ala Leu Glu Glu Ala Ala Leu Lys Glu 
Ala65 70 75 80Leu Glu Arg Glu Arg Val Asp Val Ser Leu Pro Gly Ala Ser Leu Phe 85 90 95Ser Gly Gly Leu His Pro Ile Thr Leu Met Glu Arg Glu Leu Val Glu 100 105 110Ile Phe Arg Ala Leu Gly Tyr Gln Ala Val Glu Gly Pro Glu Val Glu 115 120 125Ser Glu Phe Phe Asn Phe Asp Ala Leu Asn Ile Pro Glu His His Pro 130 135 140Ala Arg Asp Met Trp Asp Thr Phe Trp Leu Thr Gly Glu Gly Phe Arg145 150 155 160Leu Glu Gly Pro Leu Gly Glu Glu Val Glu Gly Arg Leu Leu Leu Arg 165 170 175Thr His Thr Ser Pro Met Gln Val Arg Tyr Met Val Ala His Thr Pro 180 185 190Pro Phe Arg Ile Val Val Pro Gly Arg Val Phe Arg Phe Glu Gln Thr 195 200 205Asp Ala Thr His Glu Ala Val Phe His Gln Leu Glu Gly Leu Val Val 210 215 220Gly Glu Gly Ile Ala Met Ala His Leu Lys Gly Ala Ile Tyr Glu Leu225 230 235 240Ala Gln Ala Leu Phe Gly Pro Asp Ser Lys Val Arg Phe Gln Pro Val 245 250 255Tyr Phe Pro Phe Val Glu Pro Gly Ala Gln Phe Ala Val Trp Trp Pro 260 265 270Glu Gly Gly Lys Trp Leu Glu Leu Gly Gly Ala Gly Met Val His Pro 275 280 285Lys Val Phe Gln Ala Val Asp Ala Tyr Arg Glu Arg Leu Gly Leu Pro 290 295 300Pro Ala Tyr Arg Gly Val Thr Gly Phe Ala Phe Gly Leu Gly Val Glu305 310 315 320Arg Leu Ala Met Leu Arg Tyr Gly Ile Pro Asp Ile Arg Tyr Phe Phe 325 330 335Gly Gly Arg Leu Lys Phe Leu Glu Gln Phe Lys Gly Val Leu 340 345 3509177PRTEscherichia coli 9Asn Ile Gln Ala Ile Arg Gly Met Asn Asp Tyr Leu Pro Gly Glu Thr1 5 10 15Ala Ile Trp Gln Arg Ile Glu Gly Thr Leu Lys Asn Val Leu Gly Ser 20 25 30Tyr Gly Tyr Ser Glu Ile Arg Leu Pro Ile Val Glu Gln Thr Pro Leu 35 40 45Phe Lys Arg Ala Ile Gly Glu Val Thr Asp Val Val Glu Lys Glu Met 50 55 60Tyr Thr Phe Glu Asp Arg Asn Gly Asp Ser Leu Thr Leu Arg Pro Glu65 70 75 80Gly Thr Ala Gly Cys Val Arg Ala Gly Ile Glu His Gly Leu Leu Tyr 85 90 95Asn Gln Glu Gln Arg Leu Trp Tyr Ile Gly Pro Met Phe Arg His Glu 100 105 110Arg Pro Gln Lys Gly Arg Tyr Arg Gln Phe His Gln Leu Gly Cys Glu 115 120 125Val Phe Gly Leu Gln Gly Pro Asp Ile Asp Ala Glu Leu Ile Met Leu 130 135 140Thr Ala Arg Trp Trp Arg Ala 
Leu Gly Ile Ser Glu His Val Thr Leu145 150 155 160Glu Leu Asn Ser Ile Gly Ser Leu Glu Ala Arg Ala Asn Tyr Arg Asp 165 170 175Ala10171PRTEscherichia coli 10Lys Asn Ile Gln Ala Ile Arg Gly Met Asn Asp Tyr Leu Pro Gly Glu1 5 10 15Thr Ala Ile Trp Gln Arg Ile Glu Gly Thr Leu Lys Asn Val Leu Gly 20 25 30Ser Tyr Gly Tyr Ser Glu Ile Arg Leu Pro Ile Val Glu Gln Thr Pro 35 40 45Leu Phe Lys Arg Ala Ile Gly Glu Val Thr Asp Val Val Glu Lys Glu 50 55 60Met Tyr Thr Phe Glu Asp Arg Asn Gly Asp Ser Leu Thr Leu Arg Pro65 70 75 80Glu Gly Thr Ala Gly Cys Val Arg Ala Gly Ile Glu His Gly Leu Leu 85 90 95Tyr Asn Gln Glu Gln Arg Leu Trp Tyr Ile Gly Pro Met Phe Gly Asn 100 105 110Ala Pro Gln Phe His Gln Leu Gly Cys Glu Val Phe Gly Leu Gln Gly 115 120 125Pro Asp Ile Asp Ala Glu Leu Ile Met Leu Thr Ala Arg Trp Trp Arg 130 135 140Ala Leu Gly Ile Ser Glu His Val Thr Leu Glu Leu Asn Ser Ile Gly145 150 155 160Ser Leu Glu Ala Arg Ala Asn Tyr Arg Asp Ala 165 17011258PRTEscherichia coli 11Ser Lys Ser Thr Ala Glu Ile Arg Gln Ala Phe Leu Asp Phe Phe His1 5 10 15Ser Lys Gly His Gln Val Val Ala Ser Ser Ser Leu Val Pro His Asn 20 25 30Asp Pro Thr Leu Leu Phe Thr Asn Ala Gly Met Asn Gln Phe Lys Asp 35 40 45Val Phe Leu Gly Leu Asp Lys Arg Asn Tyr Ser Arg Ala Thr Thr Ser 50 55 60Gln Arg Cys Val Arg Ala Gly Gly Lys His Asn Asp Leu Glu Asn Val65 70 75 80Gly Tyr Thr Ala Arg His His Thr Phe Phe Glu Met Leu Gly Asn Phe 85 90 95Ser Phe Gly Asp Tyr Phe Lys His Asp Ala Ile Gln Phe Ala Trp Glu 100 105 110Leu Leu Thr Ser Glu Lys Trp Phe Ala Leu Pro Lys Glu Arg Leu Trp 115 120 125Val Thr Val Tyr Glu Ser Asp Asp Glu Ala Tyr Glu Ile Trp Glu Lys 130 135 140Glu Val Gly Ile Pro Arg Glu Arg Ile Ile Arg Ile Gly Asp Asn Lys145 150 155 160Gly Ala Pro Tyr Ala Ser Asp Asn Phe Trp Gln Met Gly Asp Thr Gly 165 170 175Pro Cys Gly Pro Cys Thr Glu Ile Phe Tyr Asp His Gly Asp His Ile 180 185 190Trp Gly Gly Pro Pro Gly Ser Pro Glu Glu Asp Gly Asp Arg Tyr Ile 195 200 205Glu Ile Trp Asn Ile Val Phe Met Gln Phe Asn Arg Gln Ala Asp Gly 210 
215 220Thr Met Glu Pro Leu Pro Lys Pro Ser Val Asp Thr Gly Met Gly Leu225 230 235 240Glu Arg Ile Ala Ala Val Leu Gln His Val Asn Ser Asn Tyr Asp Ile 245 250 255Asp Leu12467PRTEscherichia coli 12Glu Lys Gln Thr Ile Val Val Asp Tyr Ser Ala Pro Asn Val Ala Lys1 5 10 15Glu Met His Val Gly His Leu Arg Ser Thr Ile Ile Gly Asp Ala Ala 20 25 30Val Arg Thr Leu Glu Phe Leu Gly His Lys Val Ile Arg Ala Asn His 35 40 45Val Gly Asp Trp Gly Thr Gln Phe Gly Met Leu Ile Ala Trp Leu Glu 50 55 60Lys Gln Gln Gln Glu Asn Ala Gly Glu Met Glu Leu Ala Asp Leu Glu65 70 75 80Gly Phe Tyr Arg Asp Ala Lys Lys His Tyr Asp Glu Asp Glu Glu Phe 85 90 95Ala Glu Arg Ala Arg Asn Tyr Val Val Lys Leu Gln Ser Gly Asp Glu 100 105 110Tyr Phe Arg Glu Met Trp Arg Lys Leu Val Asp Ile Thr Met Thr Gln 115 120 125Asn Gln Ile Thr Tyr Asp Arg Leu Asn Val Thr Leu Thr Arg Asp Asp 130 135 140Val Met Gly Glu Ser Leu Tyr Asn Pro Met Leu Pro Gly Ile Val Ala145 150 155 160Asp Leu Lys Ala Lys Gly Leu Ala Val Glu Ser Glu Gly Ala Thr Val 165 170 175Val Phe Leu Asp Glu Phe Lys Asn Lys Glu Gly Glu Pro Met Gly Val 180 185 190Ile Ile Gln Lys Lys Asp Gly Gly Tyr Leu Tyr Thr Thr Thr Asp Ile 195 200 205Ala Cys Ala Lys Tyr Arg Tyr Glu Ser Leu His Ala Asp Arg Val Leu 210 215 220Tyr Tyr Ile Asp Ser Arg Gln His Gln His Leu Met Gln Ala Trp Ala225 230 235 240Ile Val Arg Lys Ala Gly Tyr Val Pro Glu Ser Val Pro Leu Glu His 245 250 255His Met Phe Gly Met Met Leu Gly Lys Asp Gly Lys Pro Phe Lys Thr 260 265 270Arg Ala Gly Gly Thr Val Lys Leu Ala Asp Leu Leu Asp Glu Thr Leu 275 280 285Glu Arg Ala Arg Arg Leu Val Ala Glu Lys Asn Pro Asp Met Pro Ala 290 295 300Asp Glu Leu Glu Lys Leu Ala Asn Ala Val Gly Ile Gly Ala Val Lys305 310 315 320Tyr Ala Asp Leu Ser Lys Asn Arg Thr Thr Asp Tyr Ile Phe Asp Trp 325 330 335Asp Asn Met Leu Ala Phe Glu Gly Asn Thr Ala Pro Tyr Met Gln Tyr 340 345 350Ala Tyr Thr Arg Val Leu Ser Val Phe Arg Lys Ala Glu Ile Asn Glu 355 360 365Glu Gln Leu Ala Ala Ala Pro Val Ile Ile Arg Glu Asp Arg Glu Ala 370 375 380Gln Leu Ala 
Ala Arg Leu Leu Gln Phe Glu Glu Thr Leu Thr Val Val385 390 395 400Ala Arg Glu Gly Thr Pro His Val Met Cys Ala Tyr Leu Tyr Asp Leu 405 410 415Ala Gly Leu Phe Ser Gly Phe Tyr Glu His Cys Pro Ile Leu Ser Ala 420 425 430Glu Asn Glu Glu Val Arg Asn Ser Arg Leu Lys Leu Ala Gln Leu Thr 435 440 445Ala Lys Thr Leu Lys Leu Gly Leu Asp Thr Leu Gly Ile Glu Thr Val 450 455 460Glu Arg Met46513345PRTEscherichia coli 13Ser Ile Glu Tyr Leu Arg Glu Val Ala His Leu Arg Pro Arg Thr Asn1 5 10 15Leu Ile Gly Ala Val Ala Arg Val Arg His Thr Leu Ala Gln Ala Leu 20 25 30His Arg Phe Phe Asn Glu Gln Gly Phe Phe Trp Val Ser Thr Pro Leu 35 40 45Ile Thr Ala Ser Asp Thr Glu Gly Ala Gly Glu Met Phe Arg Val Ser 50 55 60Thr Leu Asp Leu Glu Asn Leu Pro Arg Asn Asp Gln Gly Lys Val Asp65 70 75 80Phe Asp Lys Asp Phe Phe Gly Lys Glu Ser Phe Leu Thr Val Ser Gly 85 90 95Gln Leu Asn Gly Glu Thr Tyr Ala Cys Ala Leu Ser Lys Ile Tyr Thr 100 105 110Phe Gly Pro Thr Phe Arg Ala Glu Asn Ser Asn Thr Ser Arg His Leu 115 120 125Ala Glu Phe Trp Met Leu Glu Pro Glu Val Ala Phe Ala Asn Leu Asn 130 135 140Asp Ile Ala Gly Leu Ala Glu Ala Met Leu Lys Tyr Val Phe Lys Ala145 150 155 160Val Leu Glu Glu Arg Ala Asp Asp Met Lys Phe Phe Ala Glu Arg Val 165 170 175Asp Lys Asp Ala Val Ser Arg Leu Glu Arg Phe Ile Glu Ala Asp Phe 180 185 190Ala Gln Val Asp Tyr Thr Asp Ala Val Thr Ile Leu Glu Asn Cys Gly 195 200 205Arg Lys Phe Glu Asn Pro Val Tyr Trp Gly Val Asp Leu Ser Ser Glu 210 215 220His Glu Arg Tyr Leu Ala Glu Glu His Phe Lys Ala Pro Val Val Val225 230 235 240Lys Asn Tyr Pro Lys Asp Ile Lys Ala Phe Tyr Met Arg Leu Asn Glu 245 250 255Asp Gly Lys Thr Val Ala Ala Met Asp Val Leu Ala Pro Gly Ile Gly 260 265 270Glu Ile Ile Gly Gly Ser Gln Arg Glu Glu Arg Leu Asp Val Leu Asp 275 280 285Glu Arg Met Leu Glu Met Gly Leu Asn Lys Glu Asp Tyr Trp Trp Tyr 290 295 300Arg Asp Leu Arg Arg Tyr Gly Thr Val Pro His Ser Gly Phe Gly Leu305 310 315 320Gly Phe Glu Arg Leu Ile Ala Tyr Val Thr Gly Val Gln Asn Val Arg 325 330 335Asp Val Ile Pro Phe Pro 
Arg Thr Pro 340 34514449PRTEscherichia coli 14Leu Pro Leu Asp Ser Asn His Val Asn Thr Glu Glu Ala Arg Leu Lys1 5 10 15Tyr Arg Tyr Leu Asp Leu Arg Arg Pro Glu Met Ala Gln Arg Leu Lys 20 25 30Thr Arg Ala Lys Ile Thr Ser Leu Val Arg Arg Phe Met Asp Asp His 35 40 45Gly Phe Leu Asp Ile Glu Thr Pro Met Leu Thr Lys Ala Thr Pro Glu 50 55 60Gly Ala Arg Asp Tyr Leu Val Pro Ser Arg Val His Lys Gly Lys Phe65 70 75 80Tyr Ala Leu Pro Gln Ser Pro Gln Leu Phe Lys Gln Leu Leu Met Met 85 90 95Ser Gly Phe Asp Arg Tyr Tyr Gln Ile Val Lys Cys Phe Arg Asp Glu 100 105 110Asp Leu Arg Ala Asp Arg Gln Pro Glu Phe Thr Gln Ile Asp Val Glu 115 120 125Thr Ser Phe Met Thr Ala Pro Gln Val Arg Glu Val Met Glu Ala Leu 130 135 140Val Arg His Leu Trp Leu Glu Val Lys Gly Val Asp Leu Gly Asp Phe145 150 155 160Pro Val Met Thr Phe Ala Glu Ala Glu Arg Arg Tyr Gly Ser Asp Lys 165 170 175Pro Asp Leu Arg Asn Pro Met Glu Leu Thr Asp Val Ala Asp Leu Leu 180 185 190Arg Ser Val Glu Phe Ala Val Phe Ala Gly Pro Ala Asn Asp Pro Lys 195 200 205Gly Arg Val Ala Ala Leu Arg Val Pro Gly Gly Ala Ser Leu Thr Arg 210 215 220Lys Gln Ile Asp Glu Tyr Asp Asn Phe Val Lys Ile Tyr Gly Ala Lys225 230 235 240Gly Leu Ala Tyr Ile Lys Val Asn Glu Arg Ala Lys Gly Leu Glu Gly 245 250 255Ile Asn Ser Pro Val Ala Lys Phe Leu Asn Ala Glu Ile Ile Glu Ala 260 265 270Ile Leu Asp Arg Thr Ala Ala Gln Asp Gly Asp Met Ile Phe Phe Gly 275 280 285Ala Asp Asn Lys Lys Ile Val Ala Asp Ala Met Gly Ala Leu Arg Leu 290 295 300Lys Val Gly Lys Asp Leu Gly Leu Thr Asp Glu Ser Lys Trp Ala Pro305 310 315 320Leu Trp Val Ile Asp Phe Pro Met Phe Glu Asp Asp Gly Glu Gly Gly 325 330
335Leu Thr Ala Met His His Pro Phe Thr Ser Pro Lys Asp Met Thr Ala 340 345 350Ala Glu Leu Lys Ala Ala Pro Glu Asn Ala Val Ala Asn Ala Tyr Asp 355 360 365Met Val Ile Asn Gly Tyr Glu Val Gly Gly Gly Ser Val Arg Ile His 370 375 380Asn Gly Asp Met Gln Gln Thr Val Phe Gly Ile Leu Gly Ile Asn Glu385 390 395 400Glu Glu Gln Arg Glu Lys Phe Gly Phe Leu Leu Asp Ala Leu Lys Tyr 405 410 415Gly Thr Pro Pro His Ala Gly Leu Ala Phe Gly Leu Asp Arg Leu Thr 420 425 430Met Leu Leu Thr Gly Thr Asp Asn Ile Arg Asp Val Ile Ala Phe Pro 435 440 445Lys15304PRTEscherichia coli 15Met Leu Lys Ile Phe Asn Thr Leu Thr Arg Gln Lys Glu Glu Phe Lys1 5 10 15Pro Ile His Ala Gly Glu Val Gly Met Tyr Val Cys Gly Ile Thr Val 20 25 30Tyr Asp Leu Cys His Ile Gly His Gly Arg Thr Phe Val Ala Phe Asp 35 40 45Val Val Ala Arg Tyr Leu Arg Phe Leu Gly Tyr Lys Leu Lys Tyr Val 50 55 60Arg Asn Ile Thr Asp Ile Asp Asp Lys Ile Ile Lys Arg Ala Asn Glu65 70 75 80Asn Gly Glu Ser Phe Val Ala Met Val Asp Arg Met Ile Ala Glu Met 85 90 95His Lys Asp Phe Asp Ala Leu Asn Ile Leu Arg Pro Asp Met Glu Pro 100 105 110Arg Ala Thr His His Ile Ala Glu Ile Ile Glu Leu Thr Glu Gln Leu 115 120 125Ile Ala Lys Gly His Ala Tyr Val Ala Asp Asn Gly Asp Val Met Phe 130 135 140Asp Val Pro Thr Asp Pro Thr Tyr Gly Val Leu Ser Arg Gln Asp Leu145 150 155 160Asp Gln Leu Gln Ala Gly Ala Arg Val Asp Val Val Asp Asp Lys Arg 165 170 175Asn Pro Met Asp Phe Val Leu Trp Lys Met Ser Lys Glu Gly Glu Pro 180 185 190Ser Trp Pro Ser Pro Trp Gly Ala Gly Arg Pro Gly Trp His Ile Glu 195 200 205Cys Ser Ala Met Asn Cys Lys Gln Leu Gly Asn His Phe Asp Ile His 210 215 220Gly Gly Gly Ser Asp Leu Met Phe Pro His His Glu Asn Glu Ile Ala225 230 235 240Gln Ser Thr Cys Ala His Asp Gly Gln Tyr Val Asn Tyr Trp Met His 245 250 255Ser Gly Met Val Met Val Asp Arg Glu Lys Met Ser Lys Ser Leu Gly 260 265 270Asn Phe Phe Thr Val Arg Asp Val Leu Lys Tyr Tyr Asp Ala Glu Thr 275 280 285Val Arg Tyr Phe Leu Met Ser Gly His Tyr Arg Ser Gln Leu Asn Tyr 290 295 30016253PRTEscherichia 
coli 16Thr Asn Phe Ile Arg Gln Ile Ile Asp Glu Asp Leu Ala Ser Gly Lys1 5 10 15His Thr Thr Val His Thr Arg Phe Pro Pro Glu Pro Asn Gly Tyr Leu 20 25 30His Ile Gly His Ala Lys Ser Ile Cys Leu Asn Phe Gly Ile Ala Gln 35 40 45Asp Tyr Lys Gly Gln Cys Asn Leu Arg Phe Asp Asp Thr Asn Pro Val 50 55 60Lys Glu Asp Ile Glu Tyr Val Glu Ser Ile Lys Asn Asp Val Glu Trp65 70 75 80Leu Gly Phe His Trp Ser Gly Asn Val Arg Tyr Ser Ser Asp Tyr Phe 85 90 95Asp Gln Leu His Ala Tyr Ala Ile Glu Leu Ile Asn Lys Gly Leu Ala 100 105 110Tyr Val Asp Glu Leu Thr Pro Glu Gln Ile Arg Glu Tyr Arg Gly Thr 115 120 125Leu Thr Gln Pro Gly Lys Asn Ser Pro Tyr Arg Asp Arg Ser Val Glu 130 135 140Glu Asn Leu Ala Leu Phe Glu Lys Met Arg Thr Gly Gly Phe Glu Glu145 150 155 160Gly Lys Ala Cys Leu Arg Ala Lys Ile Asp Met Ala Ser Pro Phe Ile 165 170 175Val Met Arg Asp Pro Val Leu Tyr Arg Ile Lys Phe Ala Glu His His 180 185 190Gln Thr Gly Asn Lys Trp Cys Ile Tyr Pro Met Tyr Asp Phe Thr His 195 200 205Cys Ile Ser Asp Ala Leu Glu Gly Ile Thr His Ser Leu Cys Thr Leu 210 215 220Glu Phe Gln Asp Asn Arg Arg Leu Tyr Asp Trp Val Leu Asp Asn Ile225 230 235 240Thr Ile Pro Val His Pro Arg Gln Tyr Glu Phe Ser Arg 245 25017309PRTEscherichia coli 17Ile Lys Thr Arg Phe Ala Pro Ser Pro Thr Gly Tyr Leu His Val Gly1 5 10 15Gly Ala Arg Thr Ala Leu Tyr Ser Trp Leu Phe Ala Arg Asn His Gly 20 25 30Gly Glu Phe Val Leu Arg Ile Glu Asp Thr Asp Leu Glu Arg Ser Thr 35 40 45Pro Glu Ala Ile Glu Ala Ile Met Asp Gly Met Asn Trp Leu Ser Leu 50 55 60Glu Trp Asp Glu Gly Pro Tyr Tyr Gln Thr Lys Arg Phe Asp Arg Tyr65 70 75 80Asn Ala Val Ile Asp Gln Met Leu Glu Glu Gly Thr Ala Tyr Lys Cys 85 90 95Tyr Cys Ser Lys Glu Arg Leu Glu Ala Leu Arg Glu Glu Gln Met Ala 100 105 110Lys Gly Glu Lys Pro Arg Tyr Asp Gly Arg Cys Arg His Ser His Glu 115 120 125His His Ala Asp Asp Glu Pro Cys Val Val Arg Phe Ala Asn Pro Gln 130 135 140Glu Gly Ser Val Val Phe Asp Asp Gln Ile Arg Gly Pro Ile Glu Phe145 150 155 160Ser Asn Gln Glu Leu Asp Asp Leu Ile Ile Arg Arg Thr 
Asp Gly Ser 165 170 175Pro Thr Tyr Asn Phe Cys Val Val Val Asp Asp Trp Asp Met Glu Ile 180 185 190Thr His Val Ile Arg Gly Glu Asp His Ile Asn Asn Thr Pro Arg Gln 195 200 205Ile Asn Ile Leu Lys Ala Leu Asn Ala Pro Val Pro Val Tyr Ala His 210 215 220Val Ser Met Ile Asn Gly Asp Asp Gly Lys Lys Leu Ser Lys Arg His225 230 235 240Gly Ala Val Ser Val Met Gln Tyr Arg Asp Asp Gly Tyr Leu Pro Glu 245 250 255Ala Leu Leu Asn Tyr Leu Val Arg Leu Gly Trp Ser His Gly Asp Gln 260 265 270Glu Ile Phe Thr Arg Glu Glu Met Ile Lys Tyr Phe Thr Leu Asn Ala 275 280 285Val Ser Lys Ser Ala Ser Ala Phe Asn Thr Asp Lys Leu Leu Trp Leu 290 295 300Asn His His Tyr Ile30518767PRTEscherichia coli 18Phe Pro Met Arg Gly Asp Leu Ala Lys Arg Glu Pro Gly Met Leu Ala1 5 10 15Arg Trp Thr Asp Asp Asp Leu Tyr Gly Ile Ile Arg Ala Ala Lys Lys 20 25 30Gly Lys Lys Thr Phe Ile Leu His Asp Gly Pro Pro Tyr Ala Asn Gly 35 40 45Ser Ile His Ile Gly His Ser Val Asn Lys Ile Leu Lys Asp Ile Ile 50 55 60Ile Lys Ser Lys Gly Leu Ser Gly Tyr Asp Ser Pro Tyr Val Pro Gly65 70 75 80Trp Asp Cys His Gly Leu Pro Ile Glu Leu Lys Val Glu Gln Glu Tyr 85 90 95Gly Lys Pro Gly Glu Lys Phe Thr Ala Ala Glu Phe Arg Ala Lys Cys 100 105 110Arg Glu Tyr Ala Ala Thr Gln Val Asp Gly Gln Arg Lys Asp Phe Ile 115 120 125Arg Leu Gly Val Leu Gly Asp Trp Ser His Pro Tyr Leu Thr Met Asp 130 135 140Phe Lys Thr Glu Ala Asn Ile Ile Arg Ala Leu Gly Lys Ile Ile Gly145 150 155 160Asn Gly His Leu His Lys Gly Ala Lys Pro Val His Trp Cys Val Asp 165 170 175Cys Arg Ser Ala Leu Ala Glu Ala Glu Val Glu Tyr Tyr Asp Lys Thr 180 185 190Ser Pro Ser Ile Val Ala Phe Gln Ala Val Asp Gln Asp Ala Leu Lys 195 200 205Thr Lys Phe Gly Val Ser Asn Val Asn Gly Pro Ile Ser Leu Val Ile 210 215 220Trp Thr Thr Thr Pro Trp Thr Leu Pro Ala Asn Arg Ala Ile Ser Ile225 230 235 240Ala Pro Asp Phe Asp Tyr Ala Leu Val Gln Ile Asp Gly Gln Ala Val 245 250 255Ile Leu Ala Lys Asp Leu Val Glu Ser Met Gln Arg Ile Gly Val Ser 260 265 270Asp Tyr Thr Ile Leu Gly Thr Val Lys Gly Ala Glu Leu Glu Leu 
Leu 275 280 285Arg Phe Thr His Pro Phe Met Asp Phe Asp Val Pro Ala Ile Leu Gly 290 295 300Asp His Val Thr Leu Asp Ala Gly Thr Gly Ala Val His Thr Ala Pro305 310 315 320Gly His Gly Pro Asp Asp Tyr Val Ile Gly Gln Lys Tyr Gly Leu Glu 325 330 335Thr Ala Asn Pro Val Gly Pro Asp Gly Thr Tyr Leu Pro Gly Thr Tyr 340 345 350Pro Thr Leu Asp Gly Val Asn Val Phe Lys Ala Asn Asp Ile Val Val 355 360 365Ala Leu Leu Gln Glu Lys Gly Ala Leu Leu His Val Glu Lys Met Gln 370 375 380His Ser Tyr Pro Cys Cys Trp Arg His Lys Thr Pro Ile Ile Phe Arg385 390 395 400Ala Thr Pro Gln Trp Phe Val Ser Met Asp Gln Lys Gly Leu Arg Ala 405 410 415Gln Ser Leu Lys Glu Ile Lys Gly Val Gln Trp Ile Pro Asp Trp Gly 420 425 430Gln Ala Arg Ile Glu Ser Met Val Ala Asn Arg Pro Asp Trp Cys Ile 435 440 445Ser Arg Gln Arg Thr Trp Gly Val Pro Met Ser Leu Phe Val His Lys 450 455 460Asp Thr Glu Glu Leu His Pro Arg Thr Leu Glu Leu Met Glu Glu Val465 470 475 480Ala Lys Arg Val Glu Val Asp Gly Ile Gln Ala Trp Trp Asp Leu Asp 485 490 495Ala Lys Glu Ile Leu Gly Asp Glu Ala Asp Gln Tyr Val Lys Val Pro 500 505 510Asp Thr Leu Asp Val Trp Phe Asp Ser Gly Ser Thr His Ser Ser Val 515 520 525Val Asp Val Arg Pro Glu Phe Ala Gly His Ala Ala Asp Met Tyr Leu 530 535 540Glu Gly Ser Asp Gln His Arg Gly Trp Phe Met Ser Ser Leu Met Ile545 550 555 560Ser Thr Ala Met Lys Gly Lys Ala Pro Tyr Arg Gln Val Leu Thr His 565 570 575Gly Phe Thr Val Asp Gly Gln Gly Arg Lys Met Ser Lys Ser Ile Gly 580 585 590Asn Thr Val Ser Pro Gln Asp Val Met Asn Lys Leu Gly Ala Asp Ile 595 600 605Leu Arg Leu Trp Val Ala Ser Thr Asp Tyr Thr Gly Glu Met Ala Val 610 615 620Ser Asp Glu Ile Leu Lys Arg Ala Ala Asp Ser Tyr Arg Arg Ile Arg625 630 635 640Asn Thr Ala Arg Phe Leu Leu Ala Asn Leu Asn Gly Phe Asp Pro Ala 645 650 655Lys Asp Met Val Lys Pro Glu Glu Met Val Val Leu Asp Arg Trp Ala 660 665 670Val Gly Cys Ala Lys Ala Ala Gln Glu Asp Ile Leu Lys Ala Tyr Glu 675 680 685Ala Tyr Asp Phe His Glu Val Val Gln Arg Leu Met Arg Phe Cys Ser 690 695 700Val Glu Met Gly Ser 
Phe Tyr Leu Asp Ile Ile Lys Asp Arg Gln Tyr705 710 715 720Thr Ala Lys Ala Asp Ser Val Ala Arg Arg Ser Cys Gln Thr Ala Leu 725 730 735Tyr His Ile Ala Glu Ala Leu Val Arg Trp Met Ala Pro Ile Leu Ser 740 745 750Phe Thr Ala Asp Glu Val Trp Gly Tyr Leu Pro Gly Glu Arg Glu 755 760 76519779PRTEscherichia coli 19Ile Glu Ser Lys Val Gln Leu His Trp Asp Glu Lys Arg Thr Phe Glu1 5 10 15Val Thr Glu Asp Glu Ser Lys Glu Lys Tyr Tyr Cys Leu Ser Met Leu 20 25 30Pro Tyr Pro Ser Gly Arg Leu His Met Gly His Val Arg Asn Tyr Thr 35 40 45Ile Gly Asp Val Ile Ala Arg Tyr Gln Arg Met Leu Gly Lys Asn Val 50 55 60Leu Gln Pro Ile Gly Trp Asp Ala Phe Gly Leu Pro Ala Glu Gly Ala65 70 75 80Ala Val Lys Asn Asn Thr Ala Pro Ala Pro Trp Thr Tyr Asp Asn Ile 85 90 95Ala Tyr Met Lys Asn Gln Leu Lys Met Leu Gly Phe Gly Tyr Asp Trp 100 105 110Ser Arg Glu Leu Ala Thr Cys Thr Pro Glu Tyr Tyr Arg Trp Glu Gln 115 120 125Lys Cys Phe Thr Glu Leu Tyr Lys Lys Gly Leu Val Tyr Lys Lys Thr 130 135 140Ser Ala Val Asn Trp Cys Pro Asn Asp Gln Thr Val Leu Ala Asn Glu145 150 155 160Gln Val Ile Asp Gly Cys Cys Trp Arg Cys Asp Thr Lys Val Glu Arg 165 170 175Lys Glu Ile Pro Gln Trp Phe Ile Lys Ile Thr Ala Tyr Ala Asp Glu 180 185 190Leu Leu Asn Asp Leu Asp Lys Leu Asp His Trp Pro Asp Thr Val Lys 195 200 205Thr Met Gln Arg Asn Trp Ile Gly Arg Ser Glu Gly Val Glu Ile Thr 210 215 220Phe Asn Val Lys Asp Tyr Asp Asn Thr Leu Thr Val Tyr Thr Thr Arg225 230 235 240Pro Asp Thr Phe Met Gly Cys Thr Tyr Leu Ala Val Ala Ala Gly His 245 250 255Pro Leu Ala Gln Lys Ala Ala Glu Asn Asn Pro Glu Leu Ala Ala Phe 260 265 270Ile Asp Glu Cys Arg Asn Thr Lys Val Ala Glu Ala Glu Met Ala Thr 275 280 285Met Glu Lys Lys Gly Val Asp Thr Gly Phe Lys Ala Val His Pro Leu 290 295 300Thr Gly Glu Glu Ile Pro Val Trp Ala Ala Asn Phe Val Leu Met Glu305 310 315 320Tyr Gly Thr Gly Ala Val Met Ala Val Pro Gly His Asp Gln Arg Asp 325 330 335Tyr Glu Phe Ala Ser Lys Tyr Gly Leu Asn Ile Lys Pro Val Ile Leu 340 345 350Ala Ala Asp Gly Ser Glu Pro Asp Leu Ser Gln Gln 
Ala Leu Thr Glu 355 360 365Lys Gly Val Leu Phe Asn Ser Gly Glu Phe Asn Gly Leu Asp His Glu 370 375 380Ala Ala Phe Asn Ala Ile Ala Asp Lys Leu Thr Glu Met Gly Val Gly385 390 395 400Glu Arg Lys Val Asn Tyr Arg Leu Arg Asp Trp Gly Val Ser Arg Gln 405 410 415Arg Tyr Trp Gly Ala Pro Ile Pro Met Val Thr Leu Glu Asp Gly Thr 420 425 430Val Met Pro Thr Pro Asp Asp Gln Leu Pro Val Ile Leu Pro Glu Asp 435 440 445Val Val Met Asp Gly Ile Thr Ser Pro Ile Lys Ala Asp Pro Glu Trp 450 455 460Ala Lys Thr Thr Val Asn Gly Met Pro Ala Leu Arg Glu Thr Asp Thr465 470 475 480Phe Asp Thr Phe Met Glu Ser Ser Trp Tyr Tyr Ala Arg Tyr Thr Cys 485 490 495Pro Glu Tyr Lys Glu Gly Met Leu Asp Ser Lys Ala Ala Asn Tyr Trp 500 505 510Leu Pro Val Asp Ile Tyr Ile Gly Gly Ile Glu His Ala Ile Met His 515 520 525Leu Leu Tyr Phe Arg Phe Phe His Lys Leu Met Arg Asp Ala Gly Met 530 535 540Val Asn Ser Asp Glu Pro Ala Lys Gln Leu Leu Cys Gln Gly Met Val545 550 555 560Leu Ala Asp Ala Phe Tyr Tyr Val Gly Glu Asn Gly Glu Arg Asn Trp 565 570 575Val Ser Pro Val Asp Ala Ile Val Glu Arg Asp Glu Lys Gly Arg Ile 580 585 590Val Lys Ala Lys Asp Ala Ala Gly His Glu Leu Val Tyr Thr Gly Met 595 600 605Ser Lys Met Ser Lys Ser Lys Asn Asn Gly Ile Asp Pro Gln Val Met 610 615 620Val Glu Arg Tyr Gly Ala Asp Thr Val Arg Leu Phe Met Met Phe Ala625 630 635 640Ser Pro Ala Asp Met Thr Leu Glu Trp Gln Glu Ser Gly Val Glu Gly 645 650 655Ala Asn Arg Phe Leu Lys Arg Val Trp Lys Leu Val Tyr Glu His Thr 660 665 670Ala Lys Gly Asp Val Ala Ala Leu Asn Val Asp Ala Leu Thr Glu Asp 675 680 685Gln Lys Ala Leu Arg Arg Asp Val His Lys Thr Ile Ala Lys Val Thr 690 695 700Asp Asp Ile Gly Arg Arg Gln Thr Phe Asn Thr Ala Ile Ala Ala Ile705 710
715 720Met Glu Leu Met Asn Lys Leu Ala Lys Ala Pro Thr Asp Gly Glu Gln 725 730 735Asp Arg Ala Leu Met Gln Glu Ala Leu Leu Ala Val Val Arg Met Leu 740 745 750Asn Pro Phe Thr Pro His Ile Cys Phe Thr Leu Trp Gln Glu Leu Lys 755 760 765Gly Glu Gly Asp Ile Asp Asn Ala Pro Trp Pro 770 77520328PRTEscherichia coli 20Ala Asn Asp Lys Ser Arg Gln Thr Phe Val Val Arg Ser Lys Ile Leu1 5 10 15Ala Ala Ile Arg Gln Phe Met Val Ala Arg Gly Phe Met Glu Val Glu 20 25 30Thr Pro Met Met Gln Val Ile Pro Gly Gly Ala Ser Ala Arg Pro Phe 35 40 45Ile Thr His His Asn Ala Leu Asp Leu Asp Met Tyr Leu Arg Ile Ala 50 55 60Pro Glu Leu Tyr Leu Lys Arg Leu Val Val Gly Gly Phe Glu Arg Val65 70 75 80Phe Glu Ile Asn Arg Asn Phe Arg Asn Glu Gly Ile Ser Val Arg His 85 90 95Asn Pro Glu Phe Thr Met Met Glu Leu Tyr Met Ala Tyr Ala Asp Tyr 100 105 110His Asp Leu Ile Glu Leu Thr Glu Ser Leu Phe Arg Thr Leu Ala Gln 115 120 125Glu Val Leu Gly Thr Thr Lys Val Thr Tyr Gly Glu His Val Phe Asp 130 135 140Phe Gly Lys Pro Phe Glu Lys Leu Thr Met Arg Glu Ala Ile Lys Lys145 150 155 160Tyr Arg Pro Glu Thr Asp Met Ala Asp Leu Asp Asn Phe Asp Ala Ala 165 170 175Lys Ala Leu Ala Glu Ser Ile Gly Ile Thr Val Glu Lys Ser Trp Gly 180 185 190Leu Gly Arg Ile Val Thr Glu Ile Phe Asp Glu Val Ala Glu Ala His 195 200 205Leu Ile Gln Pro Thr Phe Ile Thr Glu Tyr Pro Ala Glu Val Ser Pro 210 215 220Leu Ala Arg Arg Asn Asp Val Asn Pro Glu Ile Thr Asp Arg Phe Glu225 230 235 240Phe Phe Ile Gly Gly Arg Glu Ile Gly Asn Gly Phe Ser Glu Leu Asn 245 250 255Asp Ala Glu Asp Gln Ala Glu Arg Phe Gln Glu Gln Val Asn Ala Lys 260 265 270Ala Ala Gly Asp Asp Glu Ala Met Phe Tyr Asp Glu Asp Tyr Val Thr 275 280 285Ala Leu Glu Tyr Gly Leu Pro Pro Thr Ala Gly Leu Gly Ile Gly Ile 290 295 300Asp Arg Met Ile Met Leu Phe Thr Asn Ser His Thr Ile Arg Asp Val305 310 315 320Ile Leu Phe Pro Ala Met Arg Pro 32521388PRTEscherichia coli 21Met Ile Arg Lys Leu Ala Ser Gly Leu Tyr Thr Trp Leu Pro Thr Gly1 5 10 15Val Arg Val Leu Lys Lys Val Glu Asn Ile Val Arg Glu Glu Met 
Asn 20 25 30Asn Ala Gly Ala Ile Glu Val Leu Met Pro Val Val Gln Pro Ser Glu 35 40 45Leu Trp Gln Glu Ser Gly Arg Trp Glu Gln Tyr Gly Pro Glu Leu Leu 50 55 60Arg Ile Ala Asp Arg Gly Asp Arg Pro Phe Val Leu Gly Pro Thr His65 70 75 80Glu Glu Val Ile Thr Asp Leu Ile Arg Asn Glu Leu Ser Ser Tyr Lys 85 90 95Gln Leu Pro Leu Asn Phe Tyr Gln Ile Gln Thr Lys Phe Arg Asp Glu 100 105 110Val Arg Pro Arg Phe Gly Val Met Arg Ser Arg Glu Phe Leu Met Lys 115 120 125Asp Ala Tyr Ser Phe His Thr Ser Gln Glu Ser Leu Gln Glu Thr Tyr 130 135 140Asp Ala Met Tyr Ala Ala Tyr Ser Lys Ile Phe Ser Arg Met Gly Leu145 150 155 160Asp Phe Arg Ala Val Gln Ala Asp Thr Gly Ser Ile Gly Gly Ser Ala 165 170 175Ser His Glu Phe Gln Val Leu Ala Gln Ser Gly Glu Asp Asp Val Val 180 185 190Phe Ser Asp Thr Ser Asp Tyr Ala Ala Asn Ile Glu Leu Ala Glu Ala 195 200 205Ile Ala Pro Lys Glu Pro Arg Ala Ala Ala Thr Gln Glu Met Thr Leu 210 215 220Val Asp Thr Pro Asn Ala Lys Thr Ile Ala Glu Leu Val Glu Gln Phe225 230 235 240Asn Leu Pro Ile Glu Lys Thr Val Lys Thr Leu Leu Val Lys Ala Val 245 250 255Glu Gly Ser Ser Phe Pro Leu Val Ala Leu Leu Val Arg Gly Asp His 260 265 270Glu Leu Asn Glu Val Lys Ala Glu Lys Leu Pro Gln Val Ala Ser Pro 275 280 285Leu Thr Phe Ala Thr Glu Glu Glu Ile Arg Ala Val Val Lys Ala Gly 290 295 300Pro Gly Ser Leu Gly Pro Val Asn Met Pro Ile Pro Val Val Ile Asp305 310 315 320Arg Thr Val Ala Ala Met Ser Asp Phe Ala Ala Gly Ala Asn Ile Asp 325 330 335Gly Lys His Tyr Phe Gly Ile Asn Trp Asp Arg Asp Val Ala Thr Pro 340 345 350Glu Ile Ala Asp Ile Arg Asn Val Val Ala Gly Asp Pro Ser Pro Asp 355 360 365Gly Gln Gly Thr Leu Leu Ile Lys Arg Gly Ile Glu Val Gly His Ile 370 375 380Phe Gln Leu Gly38522429PRTEscherichia coli 22Met Leu Asp Pro Asn Leu Leu Arg Asn Glu Pro Asp Ala Val Ala Glu1 5 10 15Lys Leu Ala Arg Arg Gly Phe Lys Leu Asp Val Asp Lys Leu Gly Ala 20 25 30Leu Glu Glu Arg Arg Lys Val Leu Gln Val Lys Thr Glu Asn Leu Gln 35 40 45Ala Glu Arg Asn Ser Arg Ser Lys Ser Ile Gly Gln Ala Lys Ala Arg 50 55 60Gly 
Glu Asp Ile Glu Pro Leu Arg Leu Glu Val Asn Lys Leu Gly Glu65 70 75 80Glu Leu Asp Ala Ala Lys Ala Glu Leu Asp Ala Leu Gln Ala Glu Ile 85 90 95Arg Asp Ile Ala Leu Thr Ile Pro Asn Leu Pro Ala Asp Glu Val Pro 100 105 110Val Gly Lys Asp Glu Asn Asp Asn Val Glu Val Ser Arg Trp Gly Thr 115 120 125Pro Arg Glu Phe Asp Phe Glu Val Arg Asp His Val Thr Leu Gly Glu 130 135 140Met Tyr Ser Gly Leu Asp Phe Ala Ala Ala Val Lys Leu Thr Gly Ser145 150 155 160Arg Phe Val Val Met Lys Gly Gln Ile Ala Arg Met His Arg Ala Leu 165 170 175Ser Gln Phe Met Leu Asp Leu His Thr Glu Gln His Gly Tyr Ser Glu 180 185 190Asn Tyr Val Pro Tyr Leu Val Asn Gln Asp Thr Leu Tyr Gly Thr Gly 195 200 205Gln Leu Pro Lys Phe Ala Gly Asp Leu Phe His Thr Arg Pro Leu Glu 210 215 220Glu Glu Ala Asp Thr Ser Asn Tyr Ala Leu Ile Pro Thr Ala Glu Val225 230 235 240Pro Leu Thr Asn Leu Val Arg Gly Glu Ile Ile Asp Glu Asp Asp Leu 245 250 255Pro Ile Lys Met Thr Ala His Thr Pro Cys Phe Arg Ser Glu Ala Gly 260 265 270Ser Tyr Gly Arg Asp Thr Arg Gly Leu Ile Arg Met His Gln Phe Asp 275 280 285Lys Val Glu Met Val Gln Ile Val Arg Pro Glu Asp Ser Met Ala Ala 290 295 300Leu Glu Glu Met Thr Gly His Ala Glu Lys Val Leu Gln Leu Leu Gly305 310 315 320Leu Pro Tyr Arg Lys Ile Ile Leu Cys Thr Gly Asp Met Gly Phe Gly 325 330 335Ala Cys Lys Thr Tyr Asp Leu Glu Val Trp Ile Pro Ala Gln Asn Thr 340 345 350Tyr Arg Glu Ile Ser Ser Cys Ser Asn Val Trp Asp Phe Gln Ala Arg 355 360 365Arg Met Gln Ala Arg Cys Arg Ser Lys Ser Asp Lys Lys Thr Arg Leu 370 375 380Val His Thr Leu Asn Gly Ser Gly Leu Ala Val Gly Arg Thr Leu Val385 390 395 400Ala Val Met Glu Asn Tyr Gln Gln Ala Asp Gly Arg Ile Glu Val Pro 405 410 415Glu Val Leu Arg Pro Tyr Met Asn Gly Leu Glu Tyr Ile 420 42523401PRTEscherichia coli 23Arg Asp His Arg Lys Ile Gly Lys Gln Leu Asp Leu Tyr His Met Gln1 5 10 15Glu Glu Ala Pro Gly Met Val Phe Trp His Asn Asp Gly Trp Thr Ile 20 25 30Phe Arg Glu Leu Glu Val Phe Val Arg Ser Lys Leu Lys Glu Tyr Gln 35 40 45Tyr Gln Glu Val Lys Gly Pro Phe Met Met Asp 
Arg Val Leu Trp Glu 50 55 60Lys Thr Gly His Trp Asp Asn Tyr Lys Asp Ala Met Phe Thr Thr Ser65 70 75 80Ser Glu Asn Arg Glu Tyr Cys Ile Lys Pro Met Asn Cys Pro Gly His 85 90 95Val Gln Ile Phe Asn Gln Gly Leu Lys Ser Tyr Arg Asp Leu Pro Leu 100 105 110Arg Met Ala Glu Phe Gly Ser Cys His Arg Asn Glu Pro Ser Gly Ser 115 120 125Leu His Gly Leu Gly Arg Val Arg Gly Phe Thr Gln Asp Asp Ala His 130 135 140Ile Phe Cys Thr Glu Glu Gln Ile Arg Asp Glu Val Asn Gly Cys Ile145 150 155 160Arg Leu Val Tyr Asp Met Tyr Ser Thr Phe Gly Phe Glu Lys Ile Val 165 170 175Val Lys Leu Ser Thr Arg Pro Glu Lys Arg Ile Gly Ser Asp Glu Met 180 185 190Trp Asp Arg Ala Glu Ala Asp Leu Ala Val Ala Leu Glu Glu Asn Asn 195 200 205Ile Pro Phe Glu Tyr Gln Leu Gly Glu Gly Ala Phe Tyr Gly Pro Lys 210 215 220Ile Glu Phe Thr Leu Tyr Asp Cys Leu Asp Arg Ala Ala Gln Cys Gly225 230 235 240Thr Val Gln Leu Asp Phe Ser Leu Pro Ser Arg Leu Ser Ala Ser Tyr 245 250 255Val Gly Glu Asp Asn Glu Arg Lys Val Pro Val Met Ile His Arg Ala 260 265 270Ile Leu Gly Ser Met Glu Val Phe Ile Gly Ile Leu Thr Glu Glu Phe 275 280 285Ala Gly Phe Phe Pro Thr Trp Leu Ala Pro Val Gln Val Val Ile Met 290 295 300Asn Ile Thr Asp Ser Gln Ser Glu Tyr Val Asn Glu Leu Thr Gln Lys305 310 315 320Leu Ser Asn Ala Gly Ile Arg Val Lys Ala Asp Leu Arg Asn Glu Lys 325 330 335Ile Gly Phe Lys Ile Arg Glu His Thr Leu Arg Arg Val Pro Tyr Met 340 345 350Leu Val Cys Gly Asp Lys Glu Val Glu Ser Gly Lys Val Ala Val Arg 355 360 365Thr Arg Arg Gly Lys Asp Leu Gly Ser Met Asp Val Asn Glu Val Ile 370 375 380Glu Lys Leu Gln Gln Glu Ile Arg Ser Arg Ser Leu Lys Gln Leu Glu385 390 395 400Glu24264PRTEscherichia coli 24Met Thr Lys Pro Ile Val Phe Ser Gly Ala Gln Pro Ser Gly Glu Leu1 5 10 15Thr Ile Gly Asn Tyr Met Gly Ala Leu Arg Gln Trp Ile Asn Met Gln 20 25 30Asp Asp Tyr His Cys Ile Tyr Cys Ile Val Asp Gln His Ala Ile Thr 35 40 45Val Arg Gln Asp Ala Gln Lys Leu Arg Lys Ala Thr Leu Asp Thr Leu 50 55 60Ala Leu Tyr Leu Ala Cys Gly Ile Asp Pro Glu Lys Ser Thr Ile Phe65 70 
75 80Val Gln Ser His Val Pro Glu His Ala Gln Leu Gly Trp Ala Leu Asn 85 90 95Cys Tyr Thr Tyr Phe Gly Glu Leu Ser Arg Met Thr Gln Phe Lys Asp 100 105 110Lys Ser Ala Arg Tyr Ala Glu Asn Ile Asn Ala Gly Leu Phe Asp Tyr 115 120 125Pro Val Leu Met Ala Ala Asp Ile Leu Leu Tyr Gln Thr Asn Leu Val 130 135 140Pro Val Gly Glu Asp Gln Lys Gln His Leu Glu Leu Ser Arg Asp Ile145 150 155 160Ala Gln Arg Phe Asn Ala Leu Tyr Gly Asp Ile Phe Lys Val Pro Glu 165 170 175Pro Phe Ile Pro Lys Ser Gly Ala Arg Val Met Ser Leu Leu Glu Pro 180 185 190Thr Lys Lys Met Ser Lys Ser Asp Asp Asn Arg Asn Asn Val Ile Gly 195 200 205Leu Leu Glu Asp Pro Lys Ser Val Val Lys Lys Ile Lys Arg Ala Val 210 215 220Thr Asp Ser Asp Glu Pro Pro Val Val Arg Tyr Asp Val Gln Asn Lys225 230 235 240Ala Gly Val Ser Asn Leu Leu Asp Ile Leu Ser Ala Val Thr Gly Gln 245 250 255Ser Ile Pro Glu Leu Glu Lys Gln 26025424PRTEscherichia coli 25Met Ala Ser Ser Asn Leu Ile Lys Gln Leu Gln Glu Arg Gly Leu Val1 5 10 15Ala Gln Val Thr Asp Glu Glu Ala Leu Val Glu Arg Leu Ala Gln Gly 20 25 30Pro Ile Ala Leu Tyr Cys Gly Phe Asp Pro Thr Ala Asp Ser Leu His 35 40 45Leu Gly His Leu Val Pro Leu Leu Cys Leu Lys Arg Phe Gln Gln Ala 50 55 60Gly His Lys Pro Val Ala Leu Val Gly Gly Ala Thr Gly Leu Ile Gly65 70 75 80Asp Pro Ser Phe Lys Ala Ala Glu Arg Lys Leu Asn Thr Glu Glu Thr 85 90 95Val Gln Glu Trp Val Asp Lys Ile Arg Lys Gln Val Ala Pro Phe Leu 100 105 110Asp Phe Asp Cys Gly Glu Asn Ser Ala Ile Ala Ala Asn Asn Tyr Asp 115 120 125Trp Phe Gly Asn Met Asn Val Leu Thr Phe Leu Arg Asp Ile Gly Lys 130 135 140His Phe Ser Val Asn Gln Met Ile Asn Lys Glu Ala Val Lys Gln Arg145 150 155 160Leu Asn Arg Glu Asp Gln Gly Ile Ser Phe Thr Glu Phe Ser Tyr Asn 165 170 175Leu Leu Gln Gly Tyr Asp Phe Ala Cys Leu Asn Lys Gln Tyr Gly Val 180 185 190Val Leu Gln Ile Gly Gly Ser Asp Gln Trp Gly Asn Ile Thr Ser Gly 195 200 205Ile Asp Leu Thr Arg Arg Leu His Gln Asn Gln Val Phe Gly Leu Thr 210 215 220Val Pro Leu Ile Thr Lys Ala Asp Gly Thr Lys Phe Gly Lys Thr Glu225 
230 235 240Gly Gly Ala Val Trp Leu Asp Pro Lys Lys Thr Ser Pro Tyr Lys Phe 245 250 255Tyr Gln Phe Trp Ile Asn Thr Ala Asp Ala Asp Val Tyr Arg Phe Leu 260 265 270Lys Phe Phe Thr Phe Met Ser Ile Glu Glu Ile Asn Ala Leu Glu Glu 275 280 285Glu Asp Lys Asn Ser Gly Lys Ala Pro Arg Ala Gln Tyr Val Leu Ala 290 295 300Glu Gln Val Thr Arg Leu Val His Gly Glu Glu Gly Leu Gln Ala Ala305 310 315 320Lys Arg Ile Thr Glu Cys Leu Phe Ser Gly Ser Leu Ser Ala Leu Ser 325 330 335Glu Ala Asp Phe Glu Gln Leu Ala Gln Asp Gly Val Pro Met Val Lys 340 345 350Met Glu Lys Gly Ala Asp Leu Met Gln Ala Leu Val Asp Ser Glu Leu 355 360 365Gln Pro Ser Arg Gly Gln Ala Arg Lys Thr Ile Ala Ser Asn Ala Ile 370 375 380Thr Ile Asn Gly Glu Lys Gln Ser Asp Pro Glu Tyr Phe Phe Lys Glu385 390 395 400Glu Asp Arg Leu Phe Gly Arg Phe Thr Leu Leu Arg Arg Gly Lys Lys 405 410 415Asn Tyr Cys Leu Ile Cys Trp Lys 42026763PRTEscherichia coli 26Met Glu Lys Thr Tyr Asn Pro Gln Asp Ile Glu Gln Pro Leu Tyr Glu1 5 10 15His Trp Glu Lys Gln Gly Tyr Phe Lys Pro Asn Gly Asp Glu Ser Gln 20 25 30Glu Ser Phe Cys Ile Met Ile Pro Pro Pro Asn Val Thr Gly Ser Leu 35 40 45His Met Gly His Ala Phe Gln Gln Thr Ile Met Asp Thr Met Ile Arg 50 55 60Tyr Gln Arg Met Gln Gly Lys Asn Thr Leu Trp Gln Val Gly Thr Asp65 70 75 80His Ala Gly Ile Ala Thr Gln Met Val Val Glu Arg Lys Ile Ala Ala 85 90 95Glu Glu Gly Lys Thr Arg His Asp Tyr Gly Arg Glu Ala Phe Ile Asp 100 105 110Lys Ile Trp Glu Trp Lys Ala Glu Ser Gly Gly Thr Ile Thr Arg Gln 115 120 125Met Arg Arg Leu Gly Asn Ser Val Asp Trp Glu Arg Glu Arg Phe Thr 130 135 140Met Asp Glu Gly Leu Ser Asn Ala Val Lys Glu Val Phe Val Arg Leu145 150 155 160Tyr Lys
Glu Asp Leu Ile Tyr Arg Gly Lys Arg Leu Val Asn Trp Asp 165 170 175Pro Lys Leu Arg Thr Ala Ile Ser Asp Leu Glu Val Glu Asn Arg Glu 180 185 190Ser Lys Gly Ser Met Trp His Ile Arg Tyr Pro Leu Ala Asp Gly Ala 195 200 205Lys Thr Ala Asp Gly Lys Asp Tyr Leu Val Val Ala Thr Thr Arg Pro 210 215 220Glu Thr Leu Leu Gly Asp Thr Gly Val Ala Val Asn Pro Glu Asp Pro225 230 235 240Arg Tyr Lys Asp Leu Ile Gly Lys Tyr Val Ile Leu Pro Leu Val Asn 245 250 255Arg Arg Ile Pro Ile Val Gly Asp Glu His Ala Asp Met Glu Lys Gly 260 265 270Thr Gly Cys Val Lys Ile Thr Pro Ala His Asp Phe Asn Asp Tyr Glu 275 280 285Val Gly Lys Arg His Ala Leu Pro Met Ile Asn Ile Leu Thr Phe Asp 290 295 300Gly Asp Ile Arg Glu Ser Ala Gln Val Phe Asp Thr Lys Gly Asn Glu305 310 315 320Ser Asp Val Tyr Ser Ser Glu Ile Pro Ala Glu Phe Gln Lys Leu Glu 325 330 335Arg Phe Ala Ala Arg Lys Ala Val Val Ala Ala Ile Asp Ala Leu Gly 340 345 350Leu Leu Glu Glu Ile Lys Pro His Asp Leu Thr Val Pro Tyr Gly Asp 355 360 365Arg Gly Gly Val Val Ile Glu Pro Met Leu Thr Asp Gln Trp Tyr Val 370 375 380Arg Ala Asp Val Leu Ala Lys Pro Ala Val Glu Ala Val Glu Asn Gly385 390 395 400Asp Ile Gln Phe Val Pro Lys Gln Tyr Glu Asn Met Tyr Phe Ser Trp 405 410 415Met Arg Asp Ile Gln Asp Trp Cys Ile Ser Arg Gln Leu Trp Trp Gly 420 425 430His Arg Ile Pro Ala Trp Tyr Asp Glu Ala Gly Asn Val Tyr Val Gly 435 440 445Arg Asn Glu Glu Glu Val Arg Lys Glu Asn Asn Leu Gly Ala Asp Val 450 455 460Ala Leu Arg Gln Asp Glu Asp Val Leu Asp Thr Trp Phe Ser Ser Ala465 470 475 480Leu Trp Thr Phe Ser Thr Leu Gly Trp Pro Glu Asn Thr Asp Ala Leu 485 490 495Arg Gln Phe His Pro Thr Ser Val Met Val Ser Gly Phe Asp Ile Ile 500 505 510Phe Phe Trp Ile Ala Arg Met Ile Met Met Thr Met His Phe Ile Lys 515 520 525Asp Glu Asn Gly Lys Pro Gln Val Pro Phe His Thr Val Tyr Met Thr 530 535 540Gly Leu Ile Arg Asp Asp Glu Gly Gln Lys Met Ser Lys Ser Lys Gly545 550 555 560Asn Val Ile Asp Pro Leu Asp Met Val Asp Gly Ile Ser Leu Pro Glu 565 570 575Leu Leu Glu Lys Arg Thr Gly Asn Met Met 
Gln Pro Gln Leu Ala Asp 580 585 590Lys Ile Arg Lys Arg Thr Glu Lys Gln Phe Pro Asn Gly Ile Glu Pro 595 600 605His Gly Thr Asp Ala Leu Arg Phe Thr Leu Ala Ala Leu Ala Ser Thr 610 615 620Gly Arg Asp Ile Asn Trp Asp Met Lys Arg Leu Glu Gly Tyr Arg Asn625 630 635 640Phe Cys Asn Lys Leu Trp Asn Ala Ser Arg Phe Val Leu Met Asn Thr 645 650 655Glu Gly Gln Asp Cys Gly Phe Asn Gly Gly Glu Met Thr Leu Ser Leu 660 665 670Ala Asp Arg Trp Ile Leu Ala Glu Phe Asn Gln Thr Ile Lys Ala Tyr 675 680 685Arg Glu Ala Leu Asp Ser Phe Arg Phe Asp Ile Ala Ala Gly Ile Leu 690 695 700Tyr Glu Phe Thr Trp Asn Gln Phe Cys Asp Trp Tyr Leu Glu Leu Thr705 710 715 720Lys Pro Val Met Asn Gly Gly Thr Glu Ala Glu Leu Arg Gly Thr Arg 725 730 735His Thr Leu Val Thr Val Leu Glu Gly Leu Leu Arg Leu Ala His Pro 740 745 750Ile Ile Pro Phe Ile Thr Glu Thr Ile Trp Gln 755 76027287PRTMethanococcus jannaschii 27Met Asp Glu Phe Glu Met Ile Lys Arg Asn Thr Ser Glu Ile Ile Ser1 5 10 15Glu Leu Arg Glu Val Leu Lys Lys Asp Glu Lys Ser Ala Leu Ile Gly 20 25 30Phe Glu Pro Ser Gly Lys Ile His Leu Gly His Tyr Leu Gln Lys Lys 35 40 45Met Ile Asp Leu Gln Asn Ala Gly Phe Asp Ile Ile Ile Pro Leu Ala 50 55 60Asp Leu His Ala Tyr Leu Asn Gln Lys Gly Glu Leu Asp Glu Ile Arg65 70 75 80Lys Ile Gly Asp Tyr Asn Lys Lys Val Phe Glu Ala Met Leu Lys Ala 85 90 95Lys Tyr Val Tyr Gly Ser Glu Phe Gln Leu Asp Lys Tyr Thr Leu Asn 100 105 110Val Tyr Arg Leu Ala Leu Lys Thr Thr Leu Lys Ala Arg Arg Ser Met 115 120 125Glu Leu Ile Ala Arg Glu Asp Glu Asn Pro Val Ala Glu Val Ile Tyr 130 135 140Pro Ile Met Gln Val Asn Gly Cys His Tyr Lys Gly Val Asp Val Ala145 150 155 160Val Gly Gly Met Glu Gln Arg Lys Ile Met Leu Ala Arg Glu Leu Leu 165 170 175Pro Lys Lys Val Val Cys Ile His Pro Val Leu Thr Gly Leu Asp Gly 180 185 190Glu Gly Lys Met Ser Ser Ser Gly Asn Phe Ile Ala Val Asp Asp Ser 195 200 205Pro Glu Glu Ile Arg Ala Phe Lys Lys Ala Tyr Cys Pro Ala Gly Val 210 215 220Val Glu Gly Asn Pro Glu Ile Ala Lys Tyr Phe Leu Glu Tyr Pro Leu225 230 235 240Thr Ile 
Lys Pro Glu Lys Phe Gly Gly Asp Leu Thr Val Asn Ser Tyr 245 250 255Glu Glu Ser Leu Phe Lys Asn Lys Glu Leu His Pro Met Asp Leu Lys 260 265 270Ala Val Ala Glu Glu Leu Ile Lys Ile Leu Glu Pro Ile Arg Lys 275 280 28528539PRTMethanococcus jannaschii 28Met Arg Phe Asp Pro Glu Lys Ile Lys Lys Asp Ala Lys Glu Asn Phe1 5 10 15Asp Leu Thr Trp Asn Glu Gly Lys Lys Met Val Lys Thr Pro Thr Leu 20 25 30Asn Glu Arg Tyr Pro Arg Thr Thr Phe Arg Tyr Gly Lys Ala His Pro 35 40 45Val Tyr Asp Thr Ile Gln Lys Leu Arg Glu Ala Tyr Leu Arg Met Gly 50 55 60Phe Glu Glu Met Met Asn Pro Leu Ile Val Asp Glu Lys Glu Val His65 70 75 80Lys Gln Phe Gly Ser Glu Ala Leu Ala Val Leu Asp Arg Cys Phe Tyr 85 90 95Leu Ala Gly Leu Pro Arg Pro Asn Val Gly Ile Ser Asp Glu Arg Ile 100 105 110Ala Gln Ile Asn Gly Ile Leu Gly Asp Ile Gly Asp Glu Gly Ile Asp 115 120 125Lys Val Arg Lys Val Leu His Ala Tyr Lys Lys Gly Lys Val Glu Gly 130 135 140Asp Asp Leu Val Pro Glu Ile Ser Ala Ala Leu Glu Val Ser Asp Ala145 150 155 160Leu Val Ala Asp Met Ile Glu Lys Val Phe Pro Glu Phe Lys Glu Leu 165 170 175Val Ala Gln Ala Ser Thr Lys Thr Leu Arg Ser His Met Thr Ser Gly 180 185 190Trp Phe Ile Ser Leu Gly Ala Leu Leu Glu Arg Lys Glu Pro Pro Phe 195 200 205His Phe Phe Ser Ile Asp Arg Cys Phe Arg Arg Glu Gln Gln Glu Asp 210 215 220Ala Ser Arg Leu Met Thr Tyr Tyr Ser Ala Ser Cys Val Ile Met Asp225 230 235 240Glu Asn Val Thr Val Asp His Gly Lys Ala Val Ala Glu Gly Leu Leu 245 250 255Ser Gln Phe Gly Phe Glu Lys Phe Leu Phe Arg Pro Asp Glu Lys Arg 260 265 270Ser Lys Tyr Tyr Val Pro Asp Thr Gln Thr Glu Val Phe Ala Phe His 275 280 285Pro Lys Leu Val Gly Ser Asn Ser Lys Tyr Ser Asp Gly Trp Ile Glu 290 295 300Ile Ala Thr Phe Gly Ile Tyr Ser Pro Thr Ala Leu Ala Glu Tyr Asp305 310 315 320Ile Pro Cys Pro Val Met Asn Leu Gly Leu Gly Val Glu Arg Leu Ala 325 330 335Met Ile Leu His Asp Ala Pro Asp Ile Arg Ser Leu Thr Tyr Pro Gln 340 345 350Ile Pro Gln Tyr Ser Glu Trp Glu Met Ser Asp Ser Glu Leu Ala Lys 355 360 365Gln Val Phe Val Asp Lys Thr Pro 
Glu Thr Pro Glu Gly Arg Glu Ile 370 375 380Ala Asp Ala Val Val Ala Gln Cys Glu Leu His Gly Glu Glu Pro Ser385 390 395 400Pro Cys Glu Phe Pro Ala Trp Glu Gly Glu Val Cys Gly Arg Lys Val 405 410 415Lys Val Ser Val Ile Glu Pro Glu Glu Asn Thr Lys Leu Cys Gly Pro 420 425 430Ala Ala Phe Asn Glu Val Val Thr Tyr Gln Gly Asp Ile Leu Gly Ile 435 440 445Pro Asn Thr Lys Lys Trp Gln Lys Ala Phe Glu Asn His Ser Ala Met 450 455 460Ala Gly Ile Arg Phe Ile Glu Ala Phe Ala Ala Gln Ala Ala Arg Glu465 470 475 480Ile Glu Glu Ala Ala Met Ser Gly Ala Asp Glu His Ile Val Arg Val 485 490 495Arg Ile Val Lys Val Pro Ser Glu Val Asn Ile Lys Ile Gly Ala Thr 500 505 510Ala Gln Arg Tyr Ile Thr Gly Lys Asn Lys Lys Ile Asp Met Arg Gly 515 520 525Pro Ile Phe Thr Ser Ala Lys Ala Glu Phe Glu 530 53529215PRTTrypanosoma cruzi 29Ala Pro Ala Ala Val Asp Trp Arg Ala Arg Gly Ala Val Thr Ala Val1 5 10 15Lys Asp Ser Gly Gln Cys Gly Ser Gly Trp Ala Phe Ala Ala Ile Gly 20 25 30Asn Val Glu Cys Gln Trp Phe Leu Ala Gly His Pro Leu Thr Asn Leu 35 40 45Ser Glu Gln Met Leu Val Ser Cys Asp Lys Thr Asp Ser Gly Cys Ser 50 55 60Ser Gly Leu Met Asp Asn Ala Phe Glu Trp Ile Val Gln Glu Asn Asn65 70 75 80Gly Ala Val Tyr Thr Glu Asp Ser Tyr Pro Tyr Ala Ser Ala Thr Gly 85 90 95Ile Ser Pro Pro Cys Thr Thr Ser Gly His Thr Val Gly Ala Thr Ile 100 105 110Thr Gly His Val Glu Leu Pro Gln Asp Glu Ala Gln Ile Ala Ala Trp 115 120 125Leu Ala Val Asn Gly Pro Val Ala Val Cys Val Asp Ala Ser Ser Trp 130 135 140Met Thr Tyr Thr Gly Gly Val Met Thr Ser Cys Val Ser Glu Ser Tyr145 150 155 160Asp His Gly Val Leu Leu Val Gly Tyr Asn Asp Ser His Lys Val Pro 165 170 175Tyr Trp Ile Ile Lys Asn Ser Trp Thr Thr Gln Trp Gly Glu Glu Gly 180 185 190Tyr Ile Arg Ile Ala Lys Gly Ser Asn Gln Cys Leu Val Lys Glu Glu 195 200 205Ala Ser Ser Ala Val Val Gly 210 21530215PRTTrypanosoma cruzi 30Ala Pro Ala Ala Val Asp Trp Arg Ala Arg Gly Ala Val Thr Ala Val1 5 10 15Lys Asp Gln Gly Gln Cys Gly Ser Cys Trp Ala Phe Ser Ala Ile Gly 20 25 30Asn Val Glu Cys Gln Trp Phe 
Leu Ala Gly His Pro Leu Thr Asn Leu 35 40 45Ser Glu Gln Met Leu Val Ser Cys Asp Lys Thr Asp Ser Gly Cys Ser 50 55 60Gly Gly Leu Met Asn Asn Ala Phe Glu Trp Ile Val Gln Glu Asn Asn65 70 75 80Gly Ala Val Tyr Thr Glu Asp Ser Tyr Pro Tyr Ala Ser Gly Glu Gly 85 90 95Ile Ser Pro Pro Cys Thr Thr Ser Gly His Thr Val Gly Ala Thr Ile 100 105 110Thr Gly His Val Glu Leu Pro Gln Asp Glu Ala Gln Ile Ala Ala Trp 115 120 125Leu Ala Val Asn Gly Pro Val Ala Val Ala Val Asp Ala Ser Ser Trp 130 135 140Met Thr Tyr Thr Gly Gly Val Met Thr Ser Cys Val Ser Glu Gln Leu145 150 155 160Asp His Gly Val Leu Leu Val Gly Tyr Asn Asp Ser Ala Ala Val Pro 165 170 175Tyr Trp Ile Ile Lys Asn Ser Trp Thr Thr Gln Trp Gly Glu Glu Gly 180 185 190Tyr Ile Arg Ile Ala Lys Gly Ser Asn Gln Cys Leu Val Lys Glu Glu 195 200 205Ala Ser Ser Ala Val Val Gly 210 215
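The listing above gives each residue as a three-letter code and interleaves position labels (1, 5, 10, 15, ...). In this flattened text, each new entry's header also runs into the last position label of the previous sequence. For example, `26025424PRTEscherichia coli 25Met` is read here as label 260, SEQ ID NO: 25, length 424, type PRT, organism *Escherichia coli*, followed by the repeated SEQ ID 25 and then the first residue. The Python sketch below is a hypothetical helper, not part of the application. It assumes that header layout and two-word organism names, and it recovers one-letter sequences and header fields so that lengths can be checked.

```python
import re

# Standard IUPAC three-letter -> one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# A residue token: one capital letter followed by two lowercase letters.
RESIDUE = re.compile(r"[A-Z][a-z]{2}")

# Assumed fused header: <digits>PRT<Genus species> <SEQ ID>
HEADER = re.compile(r"(\d+)PRT([A-Z][a-z]+ [a-z]+) (\d+)")


def to_one_letter(body: str) -> str:
    """Convert a run of three-letter codes to a one-letter string.

    Position labels are digits, so the regex skips them, even when they are
    fused to a residue as in 'Val1' or '15Ala'.
    """
    return "".join(THREE_TO_ONE[code] for code in RESIDUE.findall(body))


def parse_header(text: str) -> tuple[int, int, str]:
    """Heuristically split a fused header into (SEQ ID, length, organism).

    Assumption: the digit run is <previous position label><SEQ ID><length>,
    and the SEQ ID is repeated after the organism name.
    """
    m = HEADER.search(text)
    if m is None:
        raise ValueError("no PRT header found")
    digits, organism, seq_id = m.group(1), m.group(2), m.group(3)
    # Start the search at index 1 because a position label always precedes the SEQ ID.
    idx = digits.find(seq_id, 1)
    if idx == -1:
        raise ValueError("SEQ ID not found in digit run")
    length = int(digits[idx + len(seq_id):])
    return int(seq_id), length, organism
```

For instance, `parse_header("Trp Lys 42026763PRTEscherichia coli 26Met")` should yield `(26, 763, "Escherichia coli")`. Comparing that length with `len(to_one_letter(...))` over the entry's full body is one way to check that a copied sequence is complete. Because the split is a heuristic, an unusual label/ID combination could still be misread.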

* * * * *

US20210072252A1 – US 20210072252 A1