U.S. patent application number 11/147763 was filed with the patent office on 2005-06-07 and published on 2006-04-13 for novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase.
Invention is credited to Michael Reifler, Maithreyan Srinivasan.
Application Number | 20060078909 11/147763 |
Document ID | / |
Family ID | 27382840 |
Filed Date | 2006-04-13 |
United States Patent Application | 20060078909 |
Kind Code | A1 |
Srinivasan; Maithreyan; et al. | April 13, 2006 |
Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase
Abstract
The present invention relates to the field of DNA recombinant
technology. More specifically, this invention relates to fusion
proteins comprising an ATP generating polypeptide joined to a
polypeptide that converts ATP into a detectable entity.
Accordingly, this invention focuses on sulfurylase-luciferase
fusion proteins. This invention also relates to pharmaceutical
compositions containing the fusion proteins and methods for using
them.
Inventors: | Srinivasan; Maithreyan (Branford, CT); Reifler; Michael (Hamden, CT) |
Correspondence Address: | MINTZ, LEVIN, COHN, et al, 24th Floor, 666 Third Avenue, New York, NY 10017, US |
Family ID: | 27382840 |
Appl. No.: | 11/147763 |
Filed: | June 7, 2005 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10154515 | May 23, 2002 | 6902921 |
11147763 | Jun 7, 2005 | |
10122706 | Apr 11, 2002 | 6956114 |
10154515 | May 23, 2002 | |
60335949 | Oct 30, 2001 | |
60349076 | Jan 16, 2002 | |
Current U.S. Class: | 435/6.12; 435/191; 435/6.1 |
Current CPC Class: | C07K 2319/00 20130101; C12N 9/0069 20130101; C12N 9/1241 20130101 |
Class at Publication: | 435/006; 435/191 |
International Class: | C12Q 1/68 20060101 C12Q001/68; C12N 9/06 20060101 C12N009/06 |
Claims
1-221. (canceled)
222. A method of determining the base sequence of a plurality of
single stranded template nucleotides on an array, the method
comprising: (a) providing a planar surface comprising at least
400,000 discrete cavities, wherein each cavity forms a reaction
chamber containing single-stranded nucleic acid templates of a
single species, wherein the reaction chambers have a center to
center spacing of between 5 to 200 .mu.m, wherein each reaction
chamber contains a reaction mixture comprising a template-directed
nucleotide polymerase and said one of said plurality of
single-stranded template nucleotides hybridized to a complementary
oligonucleotide primer strand at least one nucleotide residue
shorter than the single-stranded template nucleotides to form at
least one unpaired nucleotide residue in each template at the
3'-end of the primer strand; (b) adding an activated nucleotide
5'-triphosphate precursor of one known nitrogenous base to the
reaction chambers under conditions which allow incorporation of the
activated nucleoside 5'-triphosphate precursor onto the 3'-end of
the primer strand, provided the nitrogenous base of the activated
nucleoside 5'-triphosphate precursor is complementary to the
nitrogenous base of the unpaired nucleotide residue of the
templates; (c) detecting whether or not the nucleoside
5'-triphosphate precursor was incorporated into the primer strands
in each reaction chamber by detecting a sequencing byproduct with
an ATP generating polypeptide-ATP converting polypeptide fusion
protein or an ATP generating protein and an ATP converting protein,
thus indicating that the unpaired nucleotide residue of the
template has a nitrogenous base composition that is complementary
to that of the incorporated nucleoside 5'-triphosphate precursor in
each reaction chamber; (d) sequentially repeating steps (b) and
(c), wherein each sequential repetition adds, and detects the
incorporation of one type of activated nucleoside 5'-triphosphate
precursor of known nitrogenous base composition; and (e)
determining the base sequence of the unpaired nucleotide residues
of the template in each reaction chamber from the sequence of
incorporation of said nucleoside precursors.
223. The method of claim 222 wherein said sequencing byproduct is
pyrophosphate.
224. The method of claim 222 wherein the ATP generating
polypeptide-ATP converting polypeptide fusion protein comprises an
ATP generating polypeptide portion with an amino acid sequence
which is at least 96% homologous to SEQ ID NO:2.
225. The method of claim 222 wherein the ATP generating
polypeptide-ATP converting polypeptide fusion protein comprises an
ATP generating polypeptide portion with an amino acid sequence
which is SEQ ID NO:6.
226. The method of claim 222 wherein the ATP generating
polypeptide-ATP converting polypeptide fusion protein comprises an
amino acid sequence of SEQ ID NO:4.
227. The method of claim 222 wherein the ATP generating protein
comprises an amino acid sequence which is at least 96% homologous
to SEQ ID NO:2.
228. The method of claim 222 wherein the ATP generating protein
comprises an amino acid sequence of SEQ ID NO:2 or SEQ ID NO:6.
229. The method of claim 222 wherein said ATP generating
polypeptide-ATP converting polypeptide fusion protein comprises an
amino acid sequence encoded by a polynucleotide with an open
reading frame of SEQ ID NO:3.
230. The method of claim 222 wherein said ATP generating
polypeptide comprises an amino acid sequence encoded by a
polynucleotide with an open reading frame which is no more than 11%
different from an open reading frame of SEQ ID NO:1.
231. The method of claim 222 wherein said ATP generating
polypeptide comprises an amino acid sequence encoded by an open
reading frame of SEQ ID NO:1 or SEQ ID NO:5.
232. The method of claim 222 wherein said ATP generating
polypeptide-ATP converting polypeptide fusion protein or said ATP
generating protein further comprises an affinity tag.
233. The method of claim 222 wherein said ATP generating
polypeptide-ATP converting polypeptide fusion protein, said ATP
generating protein, or said ATP converting polypeptide is bound to
a bead.
234. A method of identifying a base at a target position in a
sample nucleic acid sequence, comprising providing a sample nucleic
acid and a primer which hybridizes to the sample nucleic acid
immediately adjacent to the target position, subjecting the sample
nucleic acid and primer to a polymerase reaction in the presence of
a nucleotide whereby the nucleotide will only become incorporated
if it is complementary to the base in the target position, and
detecting said incorporation of the nucleotide by monitoring the
release of inorganic pyrophosphate, whereby detection of
incorporation of said nucleotide is indicative of identification of
a base at a target position that is complementary to said
nucleotide, and wherein the release of inorganic pyrophosphate is
detected using a thermostable sulfurylase-luciferase fusion protein
or a thermostable sulfurylase.
235. The method of claim 234 wherein the thermostable
sulfurylase-luciferase fusion protein or the thermostable
sulfurylase comprises an amino acid sequence of at least 96% homology to SEQ
ID NO:2, SEQ ID NO:4 or SEQ ID NO:6.
236. The method of claim 234 wherein the thermostable
sulfurylase-luciferase fusion protein or the thermostable
sulfurylase is encoded by an open reading frame of SEQ ID NO: 1, 3
or 5.
237. The method of claim 234 wherein the thermostable
sulfurylase-luciferase fusion protein or the thermostable
sulfurylase further comprises an affinity tag.
238. The method of claim 234 wherein the thermostable
sulfurylase-luciferase fusion protein or the thermostable
sulfurylase is bound to a bead.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a Continuation of U.S. patent
application Ser. No. 10/154,515, filed May 23, 2002, which is a
continuation in part of U.S. patent application Ser. No. 10/122,706
filed Apr. 11, 2002 which claims the benefit of priority to U.S.
Patent Application 60/335,949 filed Oct. 30, 2001 and U.S. Patent
Application 60/349,076 filed Jan. 16, 2002. All patents, patent
applications and references cited in this specification are hereby
incorporated by reference.
FIELD OF THE INVENTION
[0002] The invention relates generally to fusion proteins that are
useful as reporter proteins, in particular to fusion proteins of
ATP sulfurylase and luciferase which are utilized to achieve an
efficient conversion of pyrophosphate (PPi) to light. This
invention also relates to a novel thermostable sulfurylase which
can be used in the detection of inorganic pyrophosphate,
particularly in the sequencing of nucleic acid.
BACKGROUND OF THE INVENTION
[0003] ATP sulfurylase has been identified as being involved in
sulfur metabolism. It catalyzes the initial reaction in the
metabolism of inorganic sulfate (SO.sub.4.sup.-2; see, e.g.,
Robbins and Lipmann, 1958. J. Biol. Chem. 233: 686-690; Hawes and
Nicholas, 1973. Biochem. J. 133: 541-550). In this reaction
SO.sub.4.sup.-2 is activated to adenosine 5'-phosphosulfate (APS).
ATP sulfurylase is also commonly used in pyrophosphate sequencing
methods. In order to convert pyrophosphate (PPi) generated from the
addition of dNMP to a growing DNA chain to light, PPi must first be
converted to ATP by ATP sulfurylase.
[0004] ATP produced by an ATP sulfurylase can also be hydrolyzed
using enzymatic reactions to generate light. Light-emitting
chemical reactions (i.e., chemiluminescence) and biological
reactions (i.e., bioluminescence) are widely used in analytical
biochemistry for sensitive measurements of various metabolites. In
bioluminescent reactions, the chemical reaction that leads to the
emission of light is enzyme-catalyzed. For example, the
luciferin-luciferase system allows for specific assay of ATP. Thus,
both ATP generating enzymes, such as ATP sulfurylase, and light
emitting enzymes, such as luciferase, could be useful in a number
of different assays for detecting and/or measuring the concentration of
specific substances in fluids and gases. Since high physical and
chemical stability is sometimes required for enzymes involved in
sequencing reactions, a thermostable enzyme is desirable.
[0005] Because the product of the sulfurylase reaction is consumed
by luciferase, proximity between these two enzymes by covalently
linking the two enzymes in the form of a fusion protein would
provide for a more efficient use of the substrate. Substrate
channeling is a phenomenon in which substrates are efficiently
delivered from enzyme to enzyme without equilibration with other
pools of the same substrates. In effect, this creates local pools
of metabolites at high concentrations relative to those found in
other areas of the cell. Therefore, a fusion of an ATP generating
polypeptide and an ATP converting peptide could benefit from the
phenomenon of substrate channeling and would reduce production
costs and increase the number of enzymatic reactions that occur
during a given time period.
[0006] All patents and publications cited throughout the
specification are hereby incorporated by reference into this
specification in their entirety in order to more fully describe the
state of the art to which this invention pertains.
SUMMARY OF THE INVENTION
[0007] The invention provides a fusion protein comprising an ATP
generating polypeptide bound to a polypeptide which converts ATP
into an entity which is detectable. In one aspect, the invention
provides a fusion protein comprising a sulfurylase polypeptide
bound to a luciferase polypeptide. This invention provides a
nucleic acid that comprises an open reading frame that encodes a
novel thermostable sulfurylase polypeptide. In a further aspect,
the invention provides for a fusion protein comprising a
thermostable sulfurylase joined to at least one affinity tag.
[0008] In another aspect, the invention provides a recombinant
polynucleotide that comprises a coding sequence for a fusion
protein having a sulfurylase polypeptide sequence joined to a
luciferase polypeptide sequence. In a further aspect, the invention
provides an expression vector for expressing a fusion protein. The
expression vector comprises a coding sequence for a fusion protein
having: (i) a regulatory sequence, (ii) a first polypeptide
sequence of an ATP generating polypeptide and (iii) a second
polypeptide sequence that converts ATP to an entity which is
detectable. In an additional embodiment, the fusion protein
comprises a sulfurylase polypeptide and a luciferase polypeptide.
In another aspect, the invention provides a transformed host cell
which comprises the expression vector. In an additional aspect, the
invention provides a fusion protein bound to a mobile support. The
invention also includes a kit comprising a sulfurylase-luciferase
fusion protein expression vector.
[0009] The invention also includes a method for determining the
nucleic acid sequence in a template nucleic acid polymer,
comprising: (a) introducing the template nucleic acid polymer into
a polymerization environment in which the nucleic acid polymer will
act as a template polymer for the synthesis of a complementary
nucleic acid polymer when nucleotides are added; (b) successively
providing to the polymerization environment a series of feedstocks,
each feedstock comprising a nucleotide selected from among the
nucleotides from which the complementary nucleic acid polymer will
be formed, such that if the nucleotide in the feedstock is
complementary to the next nucleotide in the template polymer to be
sequenced said nucleotide will be incorporated into the
complementary polymer and inorganic pyrophosphate will be released;
(c) separately recovering each of the feedstocks from the
polymerization environment; and (d) measuring the amount of PPi
with an ATP generating polypeptide-ATP converting polypeptide
fusion protein in each of the recovered feedstocks to determine the
identity of each nucleotide in the complementary polymer and thus
the sequence of the template polymer. In one embodiment, the amount
of inorganic pyrophosphate is measured by the steps of: (a) adding
adenosine-5'-phosphosulfate to the feedstock; (b) combining the
recovered feedstock containing adenosine-5'-phosphosulfate with an
ATP generating polypeptide-ATP converting polypeptide fusion
protein such that any inorganic pyrophosphate in the recovered
feedstock and the adenosine-5'-phosphosulfate will react to
form ATP and sulfate; (c) combining the ATP and sulfate-containing
feedstock with luciferin in the presence of oxygen such that the
ATP is consumed to produce AMP, inorganic pyrophosphate, carbon
dioxide and light; and (d) measuring the amount of light
produced.
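As an illustrative summary (not part of the specification itself), the two enzymatic steps described in the preceding paragraph can be written as reaction equations. Oxyluciferin is included here as the oxidized luciferin product of the standard firefly luciferase reaction, which the paragraph does not name, and $h\nu$ denotes the emitted light.

```latex
\begin{align*}
\mathrm{PP_i} + \mathrm{APS}
  &\xrightarrow{\text{ATP sulfurylase}} \mathrm{ATP} + \mathrm{SO_4^{2-}} \\
\mathrm{ATP} + \text{luciferin} + \mathrm{O_2}
  &\xrightarrow{\text{luciferase}} \mathrm{AMP} + \mathrm{PP_i}
     + \text{oxyluciferin} + \mathrm{CO_2} + h\nu
\end{align*}
```

Because the ATP formed in the first reaction is the substrate of the second, the light produced tracks the amount of pyrophosphate released during nucleotide incorporation.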
[0010] In another aspect, the invention includes a method wherein
each feedstock comprises adenosine-5'-phosphosulfate and luciferin
in addition to the selected nucleotide base, and the amount of
inorganic pyrophosphate is determined by reacting the inorganic
pyrophosphate feedstock with an ATP generating polypeptide-ATP
converting polypeptide fusion protein thereby producing light in an
amount proportional to the amount of inorganic pyrophosphate, and
measuring the amount of light produced.
[0011] In another aspect, the invention provides a method for
sequencing a nucleic acid, the method comprising: (a) providing one
or more nucleic acid anchor primers; (b) providing a plurality of
single-stranded circular nucleic acid templates disposed within a
plurality of cavities on a planar surface, each cavity forming an
analyte reaction chamber, wherein the reaction chambers have a
center to center spacing of between 5 to 200 .mu.m; (c) annealing
an effective amount of the nucleic acid anchor primer to at least
one of the single-stranded circular templates to yield a primed
anchor primer-circular template complex; (d) combining the primed
anchor primer-circular template complex with a polymerase to form
an extended anchor primer covalently linked to multiple copies of a
nucleic acid complementary to the circular nucleic acid template;
(e) annealing an effective amount of a sequencing primer to one or
more copies of said covalently linked complementary nucleic acid;
(f) extending the sequencing primer with a polymerase and a
predetermined nucleotide triphosphate to yield a sequencing product
and, if the predetermined nucleotide triphosphate is incorporated
onto the 3' end of said sequencing primer, a sequencing reaction
byproduct; and (g) identifying the sequencing reaction byproduct
with the use of an ATP generating polypeptide-ATP converting
polypeptide fusion protein, thereby determining the sequence of the
nucleic acid.
[0012] In one aspect, the invention provides a method for
sequencing a nucleic acid, the method comprising: (a) providing at
least one nucleic acid anchor primer; (b) providing a plurality of
single-stranded circular nucleic acid templates in an array having
at least 400,000 discrete reaction sites; (c) annealing a first
amount of the nucleic acid anchor primer to at least one of the
single-stranded circular templates to yield a primed anchor
primer-circular template complex; (d) combining the primed anchor
primer-circular template complex with a polymerase to form an
extended anchor primer covalently linked to multiple copies of a
nucleic acid complementary to the circular nucleic acid template;
(e) annealing a second amount of a sequencing primer to one or more
copies of the covalently linked complementary nucleic acid; (f)
extending the sequencing primer with a polymerase and a
predetermined nucleotide triphosphate to yield a sequencing product
and, when the predetermined nucleotide triphosphate is incorporated
onto the 3' end of the sequencing primer, to yield a sequencing
reaction byproduct; and (g) identifying the sequencing reaction
byproduct with the use of an ATP generating polypeptide-ATP
converting polypeptide fusion protein, thereby determining the
sequence of the nucleic acid at each reaction site that contains a
nucleic acid template.
[0013] In another aspect, the invention includes a method of
determining the base sequence of a plurality of nucleotides on an
array, the method comprising the steps of: (a) providing a
plurality of sample DNAs, each disposed within a plurality of
cavities on a planar surface, each cavity forming an analyte
reaction chamber, wherein the reaction chambers have a center to
center spacing of between 5 to 200 .mu.m, (b) adding an activated
nucleotide 5'-triphosphate precursor of one known nitrogenous base
to a reaction mixture in each reaction chamber, each reaction
mixture comprising a template-directed nucleotide polymerase and a
single-stranded polynucleotide template hybridized to a
complementary oligonucleotide primer strand at least one nucleotide
residue shorter than the templates to form at least one unpaired
nucleotide residue in each template at the 3'-end of the primer
strand, under reaction conditions which allow incorporation of the
activated nucleoside 5'-triphosphate precursor onto the 3'-end of
the primer strands, provided the nitrogenous base of the activated
nucleoside 5'-triphosphate precursor is complementary to the
nitrogenous base of the unpaired nucleotide residue of the
templates; (c) determining whether or not the nucleoside
5'-triphosphate precursor was incorporated into the primer strands
through detection of a sequencing byproduct with an ATP generating
polypeptide-ATP converting polypeptide fusion protein, thus
indicating that the unpaired nucleotide residue of the template has
a nitrogenous base composition that is complementary to that of the
incorporated nucleoside 5'-triphosphate precursor; and (d)
sequentially repeating steps (b) and (c), wherein each sequential
repetition adds, and detects the incorporation of, one type of
activated nucleoside 5'-triphosphate precursor of known nitrogenous
base composition; and
(e) determining the base sequence of the unpaired nucleotide
residues of the template in each reaction chamber from the sequence
of incorporation of said nucleoside precursors.
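The cycle of steps (b) through (e) above can be sketched as a small simulation for a single reaction chamber. This is a minimal, idealized sketch and not the applicants' implementation: the function names, the flow order "TACG", and the example template are hypothetical, and the signal for each flow is assumed to equal the number of nucleotides incorporated (that is, light exactly proportional to released PPi, with no noise).

```python
# Idealized single-chamber model of sequencing by stepwise nucleotide addition.
# All names and the flow order are illustrative assumptions.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_flows(unpaired_template, flow_order="TACG", max_cycles=100):
    """Return a list of (nucleotide, incorporations) for each nucleotide flow.

    unpaired_template lists the unpaired template bases in the order in
    which the growing 3'-end of the primer strand encounters them.
    """
    position = 0
    flows = []
    for _ in range(max_cycles):
        for nucleotide in flow_order:
            if position >= len(unpaired_template):
                return flows
            count = 0
            # The added nucleotide is incorporated only while it is
            # complementary to the next unpaired template base.
            while (position < len(unpaired_template)
                   and COMPLEMENT[unpaired_template[position]] == nucleotide):
                count += 1
                position += 1
            flows.append((nucleotide, count))
    return flows


def decode_flows(flows):
    """Rebuild the synthesized strand and the template from the flow record."""
    synthesized = "".join(nucleotide * count for nucleotide, count in flows)
    template = "".join(COMPLEMENT[base] for base in synthesized)
    return synthesized, template


flows = simulate_flows("GGATC")
synthesized, template = decode_flows(flows)
print(flows)
print(synthesized, template)
```

A flow with a count of zero corresponds to a nucleotide that was not incorporated and so released no pyrophosphate; the template sequence is read off from the order and size of the non-zero signals.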
[0014] In one aspect, the invention includes a method for
determining the nucleic acid sequence in a template nucleic acid
polymer, comprising: (a) introducing a plurality of template
nucleic acid polymers into a plurality of cavities on a planar
surface, each cavity forming an analyte reaction chamber, wherein
the reaction chambers have a center to center spacing of between 5
to 200 .mu.m, each reaction chamber having a polymerization
environment in which the nucleic acid polymer will act as a
template polymer for the synthesis of a complementary nucleic acid
polymer when nucleotides are added; (b) successively providing to
the polymerization environment a series of feedstocks, each
feedstock comprising a nucleotide selected from among the
nucleotides from which the complementary nucleic acid polymer will
be formed, such that if the nucleotide in the feedstock is
complementary to the next nucleotide in the template polymer to be
sequenced said nucleotide will be incorporated into the
complementary polymer and inorganic pyrophosphate will be released;
(c) detecting the formation of inorganic pyrophosphate with an ATP
generating polypeptide-ATP converting polypeptide fusion protein to
determine the identity of each nucleotide in the complementary
polymer and thus the sequence of the template polymer.
[0015] In one aspect, the invention provides a method of
identifying the base in a target position in a DNA sequence of
sample DNA including the steps comprising: (a) disposing sample DNA
within a plurality of cavities on a planar surface, each cavity
forming an analyte reaction chamber, wherein the reaction chambers
have a center to center spacing of between 5 to 200 .mu.m, said DNA
being rendered single stranded either before or after being
disposed in the reaction chambers, (b) providing an extension
primer which hybridizes to said immobilized single-stranded DNA at
a position immediately adjacent to said target position; (c)
subjecting said immobilized single-stranded DNA to a polymerase
reaction in the presence of a predetermined nucleotide
triphosphate, wherein if the predetermined nucleotide triphosphate
is incorporated onto the 3' end of said sequencing primer then a
sequencing reaction byproduct is formed; and
(d) identifying the sequencing reaction byproduct with an ATP
generating polypeptide-ATP converting polypeptide fusion protein,
thereby determining the nucleotide complementary to the base at
said target position.
[0016] The invention also includes a method of identifying a base
at a target position in a sample DNA sequence comprising: (a)
providing sample DNA disposed within a plurality of cavities on a
planar surface, each cavity forming an analyte reaction chamber,
wherein the reaction chambers have a center to center spacing of
between 5 to 200 .mu.m, said DNA being rendered single stranded
either before or after being disposed in the reaction chambers; (b)
providing an extension primer which hybridizes to the sample DNA
immediately adjacent to the target position; (c) subjecting the
sample DNA sequence and the extension primer to a polymerase
reaction in the presence of a nucleotide triphosphate whereby the
nucleotide triphosphate will only become incorporated and release
pyrophosphate (PPi) if it is complementary to the base in the
target position, said nucleotide triphosphate being added either to
separate aliquots of sample-primer mixture or successively to the
same sample-primer mixture; (d) detecting the release of PPi with
an ATP generating polypeptide-ATP converting polypeptide fusion
protein to indicate which nucleotide is incorporated.
[0017] In one aspect, the invention provides a method of
identifying a base at a target position in a single-stranded sample
DNA sequence, the method comprising: (a) providing an extension
primer which hybridizes to sample DNA immediately adjacent to the
target position, said sample DNA disposed within a plurality of
cavities on a planar surface, each cavity forming an analyte
reaction chamber, wherein the reaction chambers have a center to
center spacing of between 5 to 200 .mu.m, said DNA being rendered
single stranded either before or after being disposed in the
reaction chambers; (b) subjecting the sample DNA and extension
primer to a polymerase reaction in the presence of a predetermined
deoxynucleotide or dideoxynucleotide whereby the deoxynucleotide or
dideoxynucleotide will only become incorporated and release
pyrophosphate (PPi) if it is complementary to the base in the
target position, said predetermined deoxynucleotides or
dideoxynucleotides being added either to separate aliquots of
sample-primer mixture or successively to the same sample-primer
mixture, (c) detecting any release of PPi with an ATP generating
polypeptide-ATP converting polypeptide fusion protein to indicate
which deoxynucleotide or dideoxynucleotide is
incorporated; characterized in that the PPi-detection enzyme(s) are
included in the polymerase reaction step and in that in place of
deoxy- or dideoxy adenosine triphosphate (ATP) a dATP or ddATP
analogue is used which is capable of acting as a substrate for a
polymerase but incapable of acting as a substrate for a said
PPi-detection enzyme.
[0018] In another aspect, the invention includes a method of
determining the base sequence of a plurality of nucleotides on an
array, the method comprising: (a) providing a plurality of sample
DNAs, each disposed within a plurality of cavities on a planar
surface, each cavity forming an analyte reaction chamber, wherein
the reaction chambers have a center to center spacing of between 5
to 200 .mu.m, (b) converting PPi into light with an ATP generating
polypeptide-ATP converting polypeptide fusion protein; (c)
detecting the light level emitted from a plurality of reaction
sites on respective portions of an optically sensitive device; (d)
converting the light impinging upon each of said portions of said
optically sensitive device into an electrical signal which is
distinguishable from the signals from all of said other regions;
(e) determining a light intensity for each of said discrete regions
from the corresponding electrical signal; (f) recording the
variations of said electrical signals with time.
[0019] In one aspect, the invention provides a method for
sequencing a nucleic acid, the method comprising: (a) providing one
or more nucleic acid anchor primers; (b) providing a plurality of
single-stranded circular nucleic acid templates disposed within a
plurality of cavities on a planar surface, each cavity forming an
analyte reaction chamber, wherein the reaction chambers have a
center to center spacing of between 5 to 200 .mu.m; (c) converting
PPi into a detectable entity with the use of an ATP generating
polypeptide-ATP converting polypeptide fusion protein; (d)
detecting the light level emitted from a plurality of reaction
sites on respective portions of an optically sensitive device; (e)
converting the light impinging upon each of said portions of said
optically sensitive device into an electrical signal which is
distinguishable from the signals from all of said other regions;
(f) determining a light intensity for each of said discrete regions
from the corresponding electrical signal; (g) recording the
variations of said electrical signals with time.
[0020] In another aspect, the invention includes a method for
sequencing a nucleic acid, the method comprising: (a) providing at
least one nucleic acid anchor primer; (b) providing a plurality of
single-stranded circular nucleic acid templates in an array having
at least 400,000 discrete reaction sites; (c) converting PPi into a
detectable entity with an ATP generating polypeptide-ATP converting
polypeptide fusion protein; (d) detecting the light level emitted
from a plurality of reaction sites on respective portions of an
optically sensitive device; (e) converting the light impinging upon
each of said portions of said optically sensitive device into an
electrical signal which is distinguishable from the signals from
all of said other regions; (f) determining a light intensity for
each of said discrete regions from the corresponding electrical
signal; (g) recording the variations of said electrical signals
with time.
[0021] In another aspect, the invention includes an isolated
polypeptide comprising an amino acid sequence selected from the
group consisting of: (a) a mature form of an amino acid sequence of
SEQ ID NO: 2; (b) a variant of a mature form of an amino acid
sequence of SEQ ID NO: 2; (c) an amino acid sequence of SEQ ID NO: 2;
(d) a variant of an amino acid sequence of SEQ ID NO: 2, wherein
one or more amino acid residues in said variant differs from the
amino acid sequence of said mature form, provided that said variant
differs in no more than 5% of amino acid residues from said amino
acid sequence; and (e) at least one conservative amino acid
substitution to the amino acid sequences in (a), (b), (c) or (d).
The invention also includes an antibody that binds
immunospecifically to the polypeptide of (a), (b), (c) or (d).
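The "differs in no more than 5% of amino acid residues" limit above can be illustrated with a short calculation. This is a sketch under stated assumptions: the two sequences are taken to be already aligned and of equal length, every mismatched position counts as one difference, and the sequences shown are made-up placeholders rather than SEQ ID NO: 2. The function names are hypothetical.

```python
# Position-by-position comparison of two pre-aligned, equal-length sequences.
# The sequences below are placeholders, not sequences from the specification.


def percent_difference(reference, variant):
    """Return the percentage of positions at which the two sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for r, v in zip(reference, variant) if r != v)
    return 100.0 * mismatches / len(reference)


def within_variant_limit(reference, variant, max_percent=5.0):
    """True if the variant differs from the reference by at most max_percent."""
    return percent_difference(reference, variant) <= max_percent


REFERENCE = "MKTAYIAKQRQISFVKSHFS"    # 20 residues, placeholder
ONE_CHANGE = "MKTGYIAKQRQISFVKSHFS"   # one substitution
TWO_CHANGES = "MKTGYIAKQRQISFVKSHFT"  # two substitutions

print(percent_difference(REFERENCE, ONE_CHANGE), within_variant_limit(REFERENCE, ONE_CHANGE))
print(percent_difference(REFERENCE, TWO_CHANGES), within_variant_limit(REFERENCE, TWO_CHANGES))
```

For a 20-residue sequence, one substitution sits exactly at the 5% limit and two substitutions exceed it. A real comparison would first need a sequence alignment and an agreed treatment of gaps, which this sketch leaves out.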
[0022] In another aspect, the invention includes an isolated
nucleic acid molecule comprising a nucleic acid sequence encoding a
polypeptide comprising an amino acid sequence selected from the
group consisting of: (a) a mature form of an amino acid sequence of
SEQ ID NO: 2; (b) a variant of a mature form of an amino acid
sequence of SEQ ID NO: 2, wherein one or more amino acid residues
in said variant differs from the amino acid sequence of said mature
form, provided that said variant differs in no more than 5% of the
amino acid residues from the amino acid sequence of said mature
form; (c) an amino acid sequence of SEQ ID NO: 2; (d) a variant of
an amino acid sequence of SEQ ID NO: 2, wherein one or more amino
acid residues in said variant differs from the amino acid sequence
of said mature form, provided that said variant differs in no more
than 15% of amino acid residues from said amino acid sequence; (e) a
nucleic acid fragment encoding at least a portion of a polypeptide
comprising an amino acid sequence of SEQ ID NO: 2, or a variant of
said polypeptide, wherein one or more amino acid residues in said
variant differs from the amino acid sequence of said mature form,
provided that said variant differs in no more than 5% of amino acid
residues from said amino acid sequence; and (f) a nucleic acid
molecule comprising the complement of (a), (b), (c), (d) or
(e).
[0023] In a further aspect, the invention provides a nucleic acid
molecule wherein the nucleic acid molecule comprises a nucleotide
sequence selected from the group consisting of: (a) a first
nucleotide sequence comprising a coding sequence differing by one
or more nucleotide sequences from a coding sequence encoding said
amino acid sequence, provided that no more than 20% of the
nucleotides in the coding sequence in said first nucleotide
sequence differ from said coding sequence; (b) an isolated second
polynucleotide that is a complement of the first polynucleotide;
and (c) a nucleic acid fragment of (a) or (b). The invention also
includes a vector comprising the nucleic acid molecule of (a) or
(b). In another aspect, the invention includes a cell comprising
the vector.
[0024] In a further aspect, the invention includes a method for
determining the nucleic acid sequence in a template nucleic acid
polymer, comprising: (a) introducing the template nucleic acid
polymer into a polymerization environment in which the nucleic acid
polymer will act as a template polymer for the synthesis of a
complementary nucleic acid polymer when nucleotides are added; (b)
successively providing to the polymerization environment a series
of feedstocks, each feedstock comprising a nucleotide selected from
among the nucleotides from which the complementary nucleic acid
polymer will be formed, such that if the nucleotide in the
feedstock is complementary to the next nucleotide in the template
polymer to be sequenced said nucleotide will be incorporated into
the complementary polymer and inorganic pyrophosphate will be
released; (c) separately recovering each of the feedstocks from the
polymerization environment; and (d) measuring the amount of PPi
with an ATP sulfurylase and a luciferase in each of the recovered
feedstocks to determine the identity of each nucleotide in the
complementary polymer and thus the sequence of the template
polymer.
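The flow-by-flow logic of this method can be illustrated with a short Python sketch. It is only an idealized model under stated assumptions: each incorporation releases exactly one unit of PPi, the measured signal is proportional to the PPi released in a flow, and the feedstocks are cycled in a fixed order. The names `run_flows`, `decode`, and `flow_order` are hypothetical and do not appear in the disclosure.

```python
# Idealized model of sequencing by successive nucleotide feedstocks.
# Assumption: one PPi unit is released per incorporated nucleotide.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def run_flows(template, flow_order="ACGT", max_flows=100):
    """Return a list of (nucleotide, ppi_units), one entry per feedstock flow.

    `template` lists the template bases in the order the polymerase reads them.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(template) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        ppi = 0
        # The nucleotide is incorporated only while it is complementary to the
        # next template base; a run of identical bases gives several PPi units.
        while pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
            ppi += 1
            pos += 1
        signals.append((nucleotide, ppi))
        flow += 1
    return signals


def decode(signals):
    """Rebuild the complementary strand and the template from the PPi signals."""
    complementary = "".join(nuc * ppi for nuc, ppi in signals)
    template = "".join(COMPLEMENT[base] for base in complementary)
    return complementary, template


signals = run_flows("TTGCA")
print(signals)
print(decode(signals))
```

In this model a flow that yields no PPi simply reports zero, which is what allows the identity of each position to be inferred from the order of the feedstocks.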
[0025] In another aspect, the invention provides a method for
sequencing a nucleic acid, the method comprising: (a) providing one
or more nucleic acid anchor primers; (b) providing a plurality of
single-stranded circular nucleic acid templates disposed within a
plurality of cavities in an array on a planar surface, each cavity
forming an analyte reaction chamber, wherein the reaction chambers
have a center to center spacing of between 5 to 200 µm and at
least 400,000 discrete sites; (c) annealing an effective amount of
the nucleic acid anchor primer to at least one of the
single-stranded circular templates to yield a primed anchor
primer-circular template complex; (d) combining the primed anchor
primer-circular template complex with a polymerase to form an
extended anchor primer covalently linked to multiple copies of a
nucleic acid complementary to the circular nucleic acid template;
(e) annealing an effective amount of a sequencing primer to one or
more copies of said covalently linked complementary nucleic acid;
(f) extending the sequencing primer with a polymerase and a
predetermined nucleotide triphosphate to yield a sequencing product
and, if the predetermined nucleotide triphosphate is incorporated
onto the 3' end of said sequencing primer, a sequencing reaction
byproduct; and (g) identifying the sequencing reaction byproduct
with the use of an ATP sulfurylase and a luciferase, thereby
determining the sequence of the nucleic acid.
[0026] In another aspect, the invention provides a method for
sequencing a nucleic acid, the method comprising: (a) providing at
least one nucleic acid anchor primer; (b) providing a plurality of
single-stranded circular nucleic acid templates in an array having
at least 400,000 discrete reaction sites; (c) annealing a first
amount of the nucleic acid anchor primer to at least one of the
single-stranded circular templates to yield a primed anchor
primer-circular template complex; (d) combining the primed anchor
primer-circular template complex with a polymerase to form an
extended anchor primer covalently linked to multiple copies of a
nucleic acid complementary to the circular nucleic acid template;
(e) annealing a second amount of a sequencing primer to one or more
copies of the covalently linked complementary nucleic acid; (f)
extending the sequencing primer with a polymerase and a
predetermined nucleotide triphosphate to yield a sequencing product
and, when the predetermined nucleotide triphosphate is incorporated
onto the 3' end of the sequencing primer, to yield a sequencing
reaction byproduct; and (g) identifying the sequencing reaction
byproduct with the use of a thermostable sulfurylase and a
luciferase, thereby determining the sequence of the nucleic acid at
each reaction site that contains a nucleic acid template.
[0027] In a further aspect, the invention includes a method of
determining the base sequence of a plurality of nucleotides on an
array, the method comprising: (a) providing a plurality of sample
DNAs, each disposed within a plurality of cavities on a planar
surface, each cavity forming an analyte reaction chamber, wherein
the reaction chambers have a center to center spacing of between 5
to 200 µm, (b) adding an activated nucleotide 5'-triphosphate
precursor of one known nitrogenous base to a reaction mixture in
each reaction chamber, each reaction mixture comprising a
template-directed nucleotide polymerase and a single-stranded
polynucleotide template hybridized to a complementary
oligonucleotide primer strand at least one nucleotide residue
shorter than the templates to form at least one unpaired nucleotide
residue in each template at the 3'-end of the primer strand, under
reaction conditions which allow incorporation of the activated
nucleoside 5'-triphosphate precursor onto the 3'-end of the primer
strands, provided the nitrogenous base of the activated nucleoside
5'-triphosphate precursor is complementary to the nitrogenous base
of the unpaired nucleotide residue of the templates; (c) detecting
whether or not the nucleoside 5'-triphosphate precursor was
incorporated into the primer strands through detection of a
sequencing byproduct with a thermostable sulfurylase and
luciferase, thus indicating that the unpaired nucleotide residue of
the template has a nitrogenous base composition that is
complementary to that of the incorporated nucleoside
5'-triphosphate precursor; and (d) sequentially repeating steps (b)
and (c), wherein each sequential repetition adds and detects the
incorporation of one type of activated nucleoside 5'-triphosphate
precursor of known nitrogenous base composition; and (e)
determining the base sequence of the unpaired nucleotide residues
of the template in each reaction chamber from the sequence of
incorporation of said nucleoside precursors.
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] FIG. 1 is one embodiment for a cloning strategy for
obtaining the luciferase-sulfurylase sequence.
[0029] FIGS. 2A and 2B show the preparative agarose gel of
luciferase and sulfurylase as well as sulfurylase-luciferase fusion
genes.
[0030] FIG. 3 shows the results of experiments to determine the
activity of the luciferase-sulfurylase fusion protein on
NTA-agarose and MPG-SA solid supports.
DETAILED DESCRIPTION OF THE INVENTION
[0031] This invention provides a fusion protein containing an ATP
generating polypeptide bound to a polypeptide which converts ATP
into an entity which is detectable. As used herein, the term
"fusion protein" refers to a chimeric protein containing an
exogenous protein fragment joined to another exogenous protein
fragment. The fusion protein could include an affinity tag to allow
attachment of the protein to a solid support or to allow for
purification of the recombinant fusion protein from the host cell
or culture supernatant, or both.
[0032] In a preferred embodiment, the ATP generating polypeptide
and ATP converting polypeptide are from a eukaryote or a
prokaryote. The eukaryote could be an animal, plant, fungus or
yeast. In some embodiments, the animal is a mammal, rodent, insect,
worm, mollusk, reptile, bird or amphibian. Plant sources of the
polypeptides include but are not limited to Arabidopsis thaliana,
Brassica napus, Allium sativum, Amaranthus caudatus, Hevea
brasiliensis, Hordeum vulgare, Lycopersicon esculentum, Nicotiana
tabacum, Oryza sativa, Pisum sativum, Populus trichocarpa, Solanum
tuberosum, Secale cereale, Sambucus nigra, Ulmus americana or
Triticum aestivum. Examples of fungi include but are not limited to
Penicillium chrysogenum, Stachybotrys chartarum, Aspergillus
fumigatus, Podospora anserina and Trichoderma reesei. Examples of
sources of yeast include but are not limited to Saccharomyces
cerevisiae, Candida tropicalis, Candida lipolytica, Candida utilis,
Kluyveromyces lactis, Schizosaccharomyces pombe, Yarrowia
lipolytica, Candida spp., Pichia spp. and Hansenula spp.
[0033] The prokaryote source could be bacteria or archaea. In some
embodiments, the bacterium is E. coli, B. subtilis, Streptococcus
gordonii, flavobacteria or green sulfur bacteria. In other
embodiments, the archaeon is Sulfolobus, Thermococcus,
Methanobacterium, Halococcus, Halobacterium or Methanococcus
jannaschii.
[0034] The ATP generating polypeptide can be an ATP sulfurylase,
hydrolase or an ATP synthase. In a preferred embodiment, the ATP
generating polypeptide is ATP sulfurylase. In one embodiment, the
ATP sulfurylase is a thermostable sulfurylase cloned from Bacillus
stearothermophilus (Bst) and comprising the nucleotide sequence of
SEQ ID NO:1. This putative gene was cloned using genomic DNA
acquired from ATCC (Cat. No. 12980D). The gene is shown to code for
a functional ATP sulfurylase that can be expressed as a fusion
protein with an affinity tag. The disclosed Bst sulfurylase nucleic
acid (SEQ ID NO:1) includes the 1247 nucleotide sequence. An open
reading frame (ORF) for the mature protein was identified beginning
with an ATG codon at nucleotides 1-3 and ending with a TAA codon at
nucleotides 1159-1161. The start and stop codons of the open
reading frame are highlighted in bold type. The putative
untranslated regions are underlined and found upstream of the
initiation codon and downstream from the termination codon.
TABLE-US-00001 Bst Thermostable Sulfurylase Nucleotide Sequence
(SEQ ID NO: 1)
GTTATGAACATGAGTTTGAGCATTCCGCATGGCGGCACATTGATCAACCGTTGGAATCGG 60
GATTACCCAATGGATGAAGCAACGAAAACGATGGAGGTGTCCAAAGCCGAAGTAAGCGAC 120
CTTGAGCTGATCGGCACAGGCGCCTACAGCCCGCTCACCGGGTTTTTAAGGAAAGCCGAT 180
TACGATGCGGTCGTAGAAACGATGCGCCTCGCTGATGGCACTGTCTGGAGCATTCCGATC 240
ACGCTGGCGGTGACGGAAGAAAAAGCGAGTGAACTCACTGTCGGCGACAAAGCGAAACTC 300
GTTTATGGCGGCGACGTCTAGGGCGTCATTGAAATCGCCGATATTTACCGCCCGGATAAA 360
ACGAAAGAAGCCAAGCTCGTCTATAAAACCGATGAACTCGCTCACCCGGGCGTGGGCAAG 420
CTGTTTGAAAAACCAGATGTGTAGGTCGGCGGAGCGGTTAGGCTCGTCAAACGGAGCGAC 480
AAAGGCCAGTTTGCTCCGTTTTATTTCGATCCGGCCGAAACGCGGAAACGATTTGCCGAA 540
CTCGGCTGGAATACCGTCGTGGGCTTCCAAACACGCAACCCGGTTCACCGGGCCCATGAA 600
TACATTCAAAAATGCGCGCTTGAAATCGTGGACGGCTTGTTTTTAAACCCGCTCGTCGGC 660
GAAACGAAAGCGGACGATATTCCGGCCGACATCCGGATGGAAAGCTATCAAGTGCTGCTG 720
GAAAACTATTATCCGAAAGACCGCGTTTTCTTGGGCGTCTTCCAAGCTGCGATGCGGTAT 780
GCCGGTCCGCGCGAAGCGATTTTCCATGCCATGGTGCGGAAAAACTTCGGCTGCACGCAC 840
TTCATCGTCGGCCGGGACCATGCGGGCGTCGGCAACTATTACGGCACGTATGATGCGCAA 900
AAAATCTTCTCGAACTTTACAGCCGAAGAGCTTGGCATTACACCGCTCTTTTTCGAACAC 960
AGCTTTTATTGCAGGAAATGCGAAGGGATGGCATCGAGGAAAACATGCCCGCACGACGCA 1020
CAATATCACGTTGTCCTTTCTGGCACGAAAGTCCGTGAAATGTTGCGTAACGGCCAAGTG 1080
CCGCCGAGCACATTCAGCCGTCCGGAAGTGGCCGGCGTTTTGATCAAAGGGCTGCAAGAA 1140
CGCGAAACGGTCACCCCGTCGACACGCTAAAGGAGGAGCGAGATGAGCACGAATATCGTT 1200
TGGCATCATACATCGGTGACAAAAGAAGATCGCCGCCAACGCAACGG 1247
[0035] The Bst sulfurylase polypeptide (SEQ ID NO:2) is 386 amino
acid residues in length and is presented using the three letter
amino acid code.
TABLE-US-00002 Bst Sulfurylase Amino Acid Sequence
(SEQ ID NO: 2)
Met Ser Leu Ser Ile Pro His Gly Gly Thr Leu Ile 12
Asn Arg Trp Asn Pro Asp Tyr Pro Ile Asp Glu Ala 24
Thr Lys Thr Ile Glu Leu Ser Lys Ala Glu Leu Ser 36
Asp Leu Glu Leu Ile Gly Thr Gly Ala Tyr Ser Pro 48
Leu Thr Gly Phe Leu Thr Lys Ala Asp Tyr Asp Ala 60
Val Val Glu Thr Met Arg Leu Ala Asp Gly Thr Val 72
Trp Ser Ile Pro Ile Thr Leu Ala Val Thr Glu Glu 84
Lys Ala Ser Glu Leu Thr Val Gly Asp Lys Ala Lys 96
Leu Val Tyr Gly Gly Asp Val Tyr Gly Val Ile Glu 108
Ile Ala Asp Ile Tyr Arg Pro Asp Lys Thr Lys Glu 120
Ala Lys Leu Val Tyr Lys Thr Asp Glu Leu Ala His 132
Pro Gly Val Arg Lys Leu Phe Glu Lys Pro Asp Val 144
Tyr Val Gly Gly Ala Val Thr Leu Val Lys Arg Thr 156
Asp Lys Gly Gln Phe Ala Pro Phe Tyr Phe Asp Pro 168
Ala Glu Thr Arg Lys Arg Phe Ala Glu Leu Gly Trp 180
Asn Thr Val Val Gly Phe Gln Thr Arg Asn Pro Val 192
His Arg Ala His Glu Tyr Ile Gln Lys Cys Ala Leu 204
Glu Ile Val Asp Gly Leu Phe Leu Asn Pro Leu Val 216
Gly Glu Thr Lys Ala Asp Asp Ile Pro Ala Asp Ile 228
Arg Met Glu Ser Tyr Gln Val Leu Leu Glu Asn Tyr 240
Tyr Pro Lys Asp Arg Val Phe Leu Gly Val Phe Gln 252
Ala Ala Met Arg Tyr Ala Gly Pro Arg Glu Ala Ile 264
Phe His Ala Met Val Arg Lys Asn Phe Gly Cys Thr 276
His Phe Ile Val Gly Arg Asp His Ala Gly Val Gly 288
Asn Tyr Tyr Gly Thr Tyr Asp Ala Gln Lys Ile Phe 300
Ser Asn Phe Thr Ala Glu Glu Leu Gly Ile Thr Pro 312
Leu Phe Phe Glu His Ser Phe Tyr Cys Thr Lys Cys 324
Glu Gly Met Ala Ser Thr Lys Thr Cys Pro His Asp 336
Ala Gln Tyr His Val Val Leu Ser Gly Thr Lys Val 348
Arg Glu Met Leu Arg Asn Gly Gln Val Pro Pro Ser 360
Thr Phe Ser Arg Pro Glu Val Ala Ala Val Leu Ile 372
Lys Gly Leu Gln Glu Arg Glu Thr Val Thr Pro Ser 384
Thr Arg 386
[0036] In one embodiment, the thermostable sulfurylase is active at
temperatures above ambient to at least 50° C. This property
is beneficial so that the sulfurylase will not be denatured at the
higher temperatures commonly utilized in polymerase chain reactions
(PCR) or sequencing reactions. In one embodiment, the ATP
sulfurylase is from a thermophile. The thermostable sulfurylase can
come from thermophilic bacteria, including but not limited to,
Bacillus stearothermophilus, Thermus thermophilus, Bacillus
caldolyticus, Bacillus subtilis, Bacillus thermoleovorans,
Pyrococcus furiosus, Sulfolobus acidocaldarius, Rhodothermus
obamensis, Aquifex aeolicus, Archaeoglobus fulgidus, Aeropyrum
pernix, Pyrobaculum aerophilum, Pyrococcus abyssi, Penicillium
chrysogenum, Sulfolobus solfataricus and Thermomonospora fusca.
[0037] The homology of twelve ATP sulfurylases can be shown
graphically in the ClustalW analysis in Table 1. The alignment is
of ATP sulfurylases from the following species: Bacillus
stearothermophilus (Bst), University of Oklahoma--Strain 10 (Univ
of OK), Aquifex aeolicus (Aae), Pyrococcus furiosus (Pfu),
Sulfolobus solfataricus (Sso), Pyrobaculum aerophilum (Pae),
Archaeoglobus fulgidus (Afu), Penicillium chrysogenum (Pch),
Aeropyrum pernix (Ape), Saccharomyces cerevisiae (Sce), and
Thermomonospora fusca (Tfu).
[0038] A thermostable sulfurylase polypeptide is encoded by the
open reading frame ("ORF") of a thermostable sulfurylase nucleic
acid. An ORF corresponds to a nucleotide sequence that could
potentially be translated into a polypeptide. A stretch of nucleic
acids comprising an ORF is uninterrupted by a stop codon. An ORF
that represents the coding sequence for a full protein begins with
an ATG "start" codon and terminates with one of the three "stop"
codons, namely, TAA, TAG, or TGA. For the purposes of this
invention, an ORF may be any part of a coding sequence, with or
without a start codon, a stop codon, or both. For an ORF to be
considered as a good candidate for coding for a bona fide cellular
protein, a minimum size requirement is often set, e.g., a stretch
of DNA that would encode a protein of 50 amino acids or more.
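As an illustration of the definition above, the following Python sketch scans the three forward reading frames of a DNA string for ORFs that begin with ATG, end with TAA, TAG or TGA, and encode at least a minimum number of amino acids. It is a minimal sketch covering only the full-protein case described above; the function name `find_orfs` and the parameter `min_codons` are hypothetical, and the reverse strand is not examined.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=50):
    """Return (start, end) pairs for ORFs on the forward strand.

    Indices are 0-based; `end` is exclusive and includes the stop codon.
    `min_codons` counts the encoded amino acids, excluding the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence: ATG + three GCT codons + TAA, offset by two bases.
toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "G"
print(find_orfs(toy, min_codons=4))
print(find_orfs(toy))  # nothing passes the default 50-codon threshold
```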
[0039] The invention further encompasses nucleic acid molecules
that differ from the nucleotide sequences shown in SEQ ID NO:1 due
to degeneracy of the genetic code and thus encode the same
thermostable sulfurylase proteins as that encoded by the nucleotide
sequences shown in SEQ ID NO:1. In another embodiment, an isolated
nucleic acid molecule of the invention has a nucleotide sequence
encoding a protein having an amino acid sequence shown in SEQ ID
NO:2. In addition to the thermostable sulfurylase nucleotide
sequence shown in SEQ ID NO:1 it will be appreciated by those
skilled in the art that DNA sequence polymorphisms that lead to
changes in the amino acid sequences of the thermostable sulfurylase
polypeptides may exist within a population (e.g., the bacterial
population). Such genetic polymorphism in the thermostable
sulfurylase genes may exist among individuals within a population
due to natural allelic variation. As used herein, the terms "gene"
and "recombinant gene" refer to nucleic acid molecules comprising
an open reading frame encoding a thermostable sulfurylase protein.
Such natural allelic variations can typically result in 1-5%
variance in the nucleotide sequence of the thermostable sulfurylase
genes. Any and all such nucleotide variations and resulting amino
acid polymorphisms in the thermostable sulfurylase polypeptides,
which are the result of natural allelic variation and that do not
alter the functional activity of the thermostable sulfurylase
polypeptides, are intended to be within the scope of the
invention.
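Degeneracy of the genetic code can be made concrete with a small Python sketch that translates two different nucleotide strings into the same peptide. The first string reproduces the six codons at the start of the coding region in the SEQ ID NO:1 listing (Met Ser Leu Ser Ile Pro); the second is a hypothetical synonymous variant constructed for this example only. The helper name `translate` is an assumption, and the standard genetic code is assumed.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


original = "ATGAGTTTGAGCATTCCG"  # Met Ser Leu Ser Ile Pro
variant = "ATGTCCCTCTCGATCCCA"   # hypothetical synonymous variant

print(translate(original), translate(variant))
print(translate(original) == translate(variant))
```

Both strings encode the same six residues even though they differ at many nucleotide positions, which is the situation the paragraph above describes.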
[0040] Moreover, nucleic acid molecules encoding thermostable
sulfurylase proteins from other species, and thus that have a
nucleotide sequence that differs from the sequence SEQ ID NO:1 are
intended to be within the scope of the invention. Nucleic acid
molecules corresponding to natural allelic variants and homologues
of the thermostable sulfurylase cDNAs of the invention can be
isolated based on their homology to the thermostable sulfurylase
nucleic acids disclosed herein using the human cDNAs, or a portion
thereof, as a hybridization probe according to standard
hybridization techniques under stringent hybridization conditions.
The invention further includes the nucleic acid sequence of SEQ ID
NO:1 and mature and variant forms thereof, wherein a first
nucleotide sequence comprising a coding sequence differing by one
or more nucleotide sequences from a coding sequence encoding said
amino acid sequence, provided that no more than 11% of the
nucleotides in the coding sequence differ from the coding
sequence.
[0041] Another aspect of the invention pertains to nucleic acid
molecules encoding a thermostable sulfurylase protein that contains
changes in amino acid residues that are not essential for activity.
Such thermostable sulfurylase proteins differ in amino acid
sequence from SEQ ID NO:2 yet retain biological activity. In
separate embodiments, the isolated nucleic acid molecule comprises
a nucleotide sequence encoding a protein, wherein the protein
comprises an amino acid sequence at least about 96%, 97%, 98% or
99% homologous to the amino acid sequence of SEQ ID NO:2. An
isolated nucleic acid molecule encoding a thermostable sulfurylase
protein homologous to the protein of SEQ ID NO: 2 can be created by
introducing one or more nucleotide substitutions, additions or
deletions into the nucleotide sequence of SEQ ID NO:1 such that one
or more amino acid substitutions, additions or deletions are
introduced into the encoded protein.
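The percentage thresholds above can be checked numerically. The following Python sketch assumes that "homologous" is measured as simple percent identity over an ungapped, equal-length alignment, which is one possible reading and not a definition given in this disclosure; the function names are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions that carry the same residue."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def max_differences(length, min_identity_pct):
    """Largest number of substituted positions that keeps identity at or
    above `min_identity_pct` (an integer percentage)."""
    return (length * (100 - min_identity_pct)) // 100


print(percent_identity("MSLSI", "MSLSV"))
# For a 386-residue protein such as SEQ ID NO:2:
for pct in (96, 97, 98, 99):
    print(pct, max_differences(386, pct))
```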
[0042] Mutations can be introduced into SEQ ID NO:2 by standard
techniques, such as site-directed mutagenesis and PCR-mediated
mutagenesis. Preferably, conservative amino acid substitutions are
made at one or more predicted, non-essential amino acid residues. A
"conservative amino acid substitution" is one in which the amino
acid residue is replaced with an amino acid residue having a
similar side chain. Families of amino acid residues having similar
side chains have been defined within the art. These families
include amino acids with basic side chains (e.g., lysine, arginine,
histidine), acidic side chains (e.g., aspartic acid, glutamic
acid), uncharged polar side chains (e.g., glycine, asparagine,
glutamine, serine, threonine, tyrosine, cysteine), nonpolar side
chains (e.g., alanine, valine, leucine, isoleucine, proline,
phenylalanine, methionine, tryptophan), beta-branched side chains
(e.g., threonine, valine, isoleucine) and aromatic side chains
(e.g., tyrosine, phenylalanine, tryptophan, histidine). Thus, a
predicted non-essential amino acid residue in the thermostable
sulfurylase protein is replaced with another amino acid residue
from the same side chain family. Alternatively, in another
embodiment, mutations can be introduced randomly along all or part
of a thermostable sulfurylase coding sequence, such as by
saturation mutagenesis, and the resultant mutants can be screened
for thermostable sulfurylase biological activity to identify
mutants that retain activity. Following mutagenesis of SEQ ID NO:1,
the encoded protein can be expressed by any recombinant technology
known in the art and the activity of the protein can be
determined.
[0043] The relatedness of amino acid families may also be
determined based on side chain interactions. Substituted amino
acids may be fully conserved "strong" residues or fully conserved
"weak" residues. The "strong" group of conserved amino acid
residues may be any one of the following groups: STA, NEQK, NHQK,
NDEQ, QHRK, MILV, MILF, HY, FYW, wherein the single letter amino
acid codes are grouped by those amino acids that may be substituted
for each other. Likewise, the "weak" group of conserved residues
may be any one of the following: CSA, ATV, SAG, STNK, STPA, SGND,
SNDEQK, NDEQHK, NEQHRK, VLIM, HFY, wherein the letters within each
group represent the single letter amino acid code.
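The "strong" and "weak" groups listed above lend themselves to a simple lookup. The Python sketch below classifies a single substitution using exactly those groups; the function name `classify_substitution` and the returned labels are hypothetical choices made for this illustration.

```python
# Groups copied from the paragraph above (single letter amino acid codes).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "VLIM", "HFY"]


def classify_substitution(original, replacement):
    """Label a substitution as identical, strong, weak or non-conserved."""
    if original == replacement:
        return "identical"
    if any(original in g and replacement in g for g in STRONG_GROUPS):
        return "strong"
    if any(original in g and replacement in g for g in WEAK_GROUPS):
        return "weak"
    return "non-conserved"


for pair in [("I", "V"), ("S", "G"), ("C", "A"), ("W", "K")]:
    print(pair, classify_substitution(*pair))
```

The strong groups are checked first, so a pair that appears in both a strong and a weak group is reported as strong.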
[0044] The thermostable sulfurylase nucleic acid of the invention
includes the nucleic acid whose sequence is provided herein, or
fragments thereof. The invention also includes mutant or variant
nucleic acids any of whose bases may be changed from the
corresponding base shown herein while still encoding a protein that
maintains its sulfurylase-like activities and physiological
functions, or a fragment of such a nucleic acid. The invention
further includes nucleic acids whose sequences are complementary to
those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The
invention additionally includes nucleic acids or nucleic acid
fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of
nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the
chemical stability of the modified nucleic acid, such that they may
be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject.
[0045] A thermostable sulfurylase nucleic acid can encode a mature
thermostable sulfurylase polypeptide. As used herein, a "mature"
form of a polypeptide or protein disclosed in the present invention
is the product of a naturally occurring polypeptide or precursor
form or proprotein. The naturally occurring polypeptide, precursor
or proprotein includes, by way of nonlimiting example, the
full-length gene product, encoded by the corresponding gene.
Alternatively, it may be defined as the polypeptide, precursor or
proprotein encoded by an ORF described herein. The product "mature"
form arises, again by way of nonlimiting example, as a result of
one or more naturally occurring processing steps as they may take
place within the cell, or host cell, in which the gene product
arises. Examples of such processing steps leading to a "mature"
form of a polypeptide or protein include the cleavage of the
N-terminal methionine residue encoded by the initiation codon of an
ORF, or the proteolytic cleavage of a signal peptide or leader
sequence. Thus a mature form arising from a precursor polypeptide
or protein that has residues 1 to N, where residue 1 is the
N-terminal methionine, would have residues 2 through N remaining
after removal of the N-terminal methionine. Alternatively, a mature
form arising from a precursor polypeptide or protein having
residues 1 to N, in which an N-terminal signal sequence from
residue 1 to residue M is cleaved, would have the residues from
residue M+1 to residue N remaining. Further as used herein, a
"mature" form of a polypeptide or protein may arise from a step of
post-translational modification other than a proteolytic cleavage
event. Such additional processes include, by way of non-limiting
example, glycosylation, myristoylation or phosphorylation. In
general, a mature polypeptide or protein may result from the
operation of only one of these processes, or a combination of any
of them.
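The residue arithmetic in the two cleavage examples above can be sketched in Python as follows. The function name `mature_form` and the parameter `signal_length` (the value M in the text) are hypothetical, and residues are held as a list of three letter codes purely for illustration; other post-translational modifications are not modeled.

```python
def mature_form(precursor, signal_length=0):
    """Return the residues remaining after N-terminal processing.

    precursor:     list of residues 1..N as three letter codes.
    signal_length: M, the length of a cleaved N-terminal signal sequence.
                   If 0, only an N-terminal methionine (if present) is removed.
    """
    if signal_length > 0:
        return list(precursor[signal_length:])  # residues M+1 .. N
    if precursor and precursor[0] == "Met":
        return list(precursor[1:])              # residues 2 .. N
    return list(precursor)


precursor = ["Met", "Ser", "Leu", "Ser", "Ile"]
print(mature_form(precursor))
print(mature_form(precursor, signal_length=3))
```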
[0046] The term "isolated" nucleic acid molecule, as utilized
herein, is one, which is separated from other nucleic acid
molecules which are present in the natural source of the nucleic
acid. Preferably, an "isolated" nucleic acid is free of sequences
which naturally flank the nucleic acid (i.e., sequences located at
the 5'- and 3'-termini of the nucleic acid) in the genomic DNA of
the organism from which the nucleic acid is derived. For example,
in various embodiments, the isolated thermostable sulfurylase
nucleic acid molecules can contain less than about 5 kb, 4 kb, 3
kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which
naturally flank the nucleic acid molecule in genomic DNA of the
cell/tissue from which the nucleic acid is derived (e.g., brain,
heart, liver, spleen, etc.). Moreover, an "isolated" nucleic acid
molecule, such as a cDNA molecule, can be substantially free of
other cellular material or culture medium when produced by
recombinant techniques, or of chemical precursors or other
chemicals when chemically synthesized.
[0047] A nucleic acid molecule of the invention, e.g., a nucleic
acid molecule having the nucleotide sequence of SEQ ID NO:1 or a
complement of this aforementioned nucleotide sequence, can be
isolated using standard molecular biology techniques and the
sequence information provided herein. Using all or a portion of the
nucleic acid sequence of SEQ ID NO:1 as a hybridization probe,
thermostable sulfurylase molecules can be isolated using standard
hybridization and cloning techniques (e.g., as described in
Sambrook, et al., (eds.), MOLECULAR CLONING: A LABORATORY MANUAL
2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, N.Y., 1989; and Ausubel, et al., (eds.), CURRENT PROTOCOLS
IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, N.Y.,
1993.)
[0048] A nucleic acid of the invention can be amplified using cDNA,
mRNA or alternatively, genomic DNA, as a template and appropriate
oligonucleotide primers according to standard PCR amplification
techniques. The nucleic acid so amplified can be cloned into an
appropriate vector and characterized by DNA sequence analysis.
Furthermore, oligonucleotides corresponding to thermostable
sulfurylase nucleotide sequences can be prepared by standard
synthetic techniques, e.g., using an automated DNA synthesizer.
[0049] As used herein, the term "complementary" refers to
Watson-Crick or Hoogsteen base pairing between nucleotide units of
a nucleic acid molecule, and the term "binding" means the physical
or chemical interaction between two polypeptides or compounds or
associated polypeptides or compounds or combinations thereof.
Binding includes ionic, non-ionic, van der Waals, hydrophobic
interactions, and the like. A physical interaction can be either
direct or indirect. Indirect interactions may be through or due to
the effects of another polypeptide or compound. Direct binding
refers to interactions that do not take place through, or due to,
the effect of another polypeptide or compound, but instead are
without other substantial chemical intermediates.
[0050] Fragments provided herein are defined as sequences of at
least 6 (contiguous) nucleic acids or at least 4 (contiguous) amino
acids, a length sufficient to allow for specific hybridization in
the case of nucleic acids or for specific recognition of an epitope
in the case of amino acids, respectively, and are at most some
portion less than a full length sequence. Fragments may be derived
from any contiguous portion of a nucleic acid or amino acid
sequence of choice. Derivatives are nucleic acid sequences or amino
acid sequences formed from the native compounds either directly or
by modification or partial substitution. Analogs are nucleic acid
sequences or amino acid sequences that have a structure similar to,
but not identical to, the native compound but differ from it in
respect to certain components or side chains. Analogs may be
synthetic or from a different evolutionary origin and may have a
similar or opposite metabolic activity compared to wild type.
Homologs are nucleic acid sequences or amino acid sequences of a
particular gene that are derived from different species.
[0051] Derivatives and analogs may be full length or other than
full length, if the derivative or analog contains a modified
nucleic acid or amino acid, as described below. Derivatives or
analogs of the nucleic acids or proteins of the invention include,
but are not limited to, molecules comprising regions that are
substantially homologous to the nucleic acids or proteins of the
invention, in various embodiments, by at least about 89% identity
over a nucleic acid or amino acid sequence of identical size or
when compared to an aligned sequence in which the alignment is done
by a computer homology program known in the art, or whose encoding
nucleic acid is capable of hybridizing to the complement of a
sequence encoding the aforementioned proteins under stringent,
moderately stringent, or low stringent conditions. See e.g.
Ausubel, et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley
& Sons, New York, N.Y., 1993, and below.
[0052] A "homologous nucleic acid sequence" or "homologous amino
acid sequence," or variations thereof, refer to sequences
characterized by a homology at the nucleotide level or amino acid
level as discussed above. Homologous nucleotide sequences encode
those sequences coding for isoforms of thermostable sulfurylase
polypeptides. Isoforms can be expressed in different tissues of the
same organism as a result of, for example, alternative splicing of
RNA. Alternatively, isoforms can be encoded by different genes. In
the invention, homologous nucleotide sequences include nucleotide
sequences encoding for a thermostable sulfurylase polypeptide of
species other than humans, including, but not limited to:
vertebrates, and thus can include, e.g., frog, mouse, rat, rabbit,
dog, cat, cow, horse, and other organisms. Homologous nucleotide
sequences also include, but are not limited to, naturally occurring
allelic variations and mutations of the nucleotide sequences set
forth herein. Homologous nucleic acid sequences include those
nucleic acid sequences that encode conservative amino acid
substitutions in SEQ ID NO:1, as well as a polypeptide possessing
thermostable sulfurylase biological activity. Various biological
activities of the thermostable sulfurylase proteins are described
below.
[0053] The thermostable sulfurylase proteins of the invention
include the sulfurylase protein whose sequence is provided herein.
The invention also includes mutant or variant proteins any of whose
residues may be changed from the corresponding residue shown herein
while still encoding a protein that maintains its sulfurylase-like
activities and physiological functions, or a functional fragment
thereof. The invention further encompasses antibodies and antibody
fragments, such as Fab or (Fab)2, that bind
immunospecifically to any of the proteins of the invention. This
invention also includes a variant or a mature form of the amino
acid sequence of SEQ ID NO:2, wherein one or more amino acid
residues in the variant differs in no more than 4% of the amino
acid residues from the amino acid sequence of the mature form.
[0054] Several assays have been developed for detection of the
forward ATP sulfurylase reaction. The colorimetric molybdolysis
assay is based on phosphate detection (see e.g., Wilson and
Bandurski, 1958. J. Biol. Chem. 233: 975-981), whereas the
continuous spectrophotometric molybdolysis assay is based upon the
detection of NADH oxidation (see e.g., Seubert, et al., 1983. Arch.
Biochem. Biophys. 225: 679-691; Seubert, et al., 1985. Arch.
Biochem. Biophys. 240: 509-523). The latter assay requires the
presence of several detection enzymes.
[0055] Suitable enzymes for converting ATP into light include
luciferases, e.g., insect luciferases. Luciferases produce light as
an end-product of catalysis. The best known light-emitting enzyme
is that of the firefly, Photinus pyralis (Coleoptera). The
corresponding gene has been cloned and expressed in bacteria (see
e.g., de Wet, et al., 1985. Proc. Natl. Acad. Sci. USA 80:
7870-7873) and plants (see e.g., Ow, et al., 1986. Science 234:
856-859), as well as in insect (see e.g., Jha, et al., 1990. FEBS
Lett. 274: 24-26) and mammalian cells (see e.g., de Wet, et al.,
1987. Mol. Cell. Biol. 7: 725-737; Keller, et al., 1987. Proc.
Natl. Acad. Sci. USA 82: 3264-3268). In addition, a number of
luciferase genes from the Jamaican click beetle, Pyrophorus
plagiophthalamus (Coleoptera), have recently been cloned and
partially characterized (see e.g., Wood, et al., 1989. J. Biolumin.
Chemilumin. 4: 289-301; Wood, et al., 1989. Science 244: 700-702).
Distinct luciferases can sometimes produce light of different
wavelengths, which may enable simultaneous monitoring of light
emissions at different wavelengths. Accordingly, these
characteristics are unique and add new dimensions to the use of
current reporter systems.
[0056] Firefly luciferase catalyzes bioluminescence in the presence
of luciferin, adenosine 5'-triphosphate (ATP), magnesium ions, and
oxygen, resulting in a quantum yield of 0.88 (see e.g., McElroy and
Selinger, 1960. Arch. Biochem. Biophys. 88: 136-145). The firefly
luciferase bioluminescent reaction can be utilized as an assay for
the detection of ATP with a detection limit of approximately
1×10⁻¹³ M (see e.g., Leach, 1981. J. Appl. Biochem. 3:
473-517). In addition, the overall degree of sensitivity and
convenience of the luciferase-mediated detection systems have
created considerable interest in the development of firefly
luciferase-based biosensors (see e.g., Green and Kricka, 1984.
Talanta 31: 173-176; Blum, et al., 1989. J. Biolumin. Chemilumin.
4: 543-550).
[0057] The development of new reagents has made it possible to
obtain stable light emission proportional to the concentration of
ATP (see e.g., Lundin, 1982. Applications of firefly luciferase. In:
Luminescent Assays (Raven Press, New York)). With such stable light
emission reagents, it is possible to make endpoint assays and to
calibrate each individual assay by addition of a known amount of
ATP. In addition, a stable light-emitting system also allows
continuous monitoring of ATP-converting systems.
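The calibration by addition of a known amount of ATP can be sketched as a simple internal-standard calculation. This example assumes, as stated above, that light emission is proportional to ATP concentration; it further assumes that the added standard does not appreciably change the assay volume and that any background signal is constant. The function and parameter names are hypothetical.

```python
def atp_from_internal_standard(signal_sample: float,
                               signal_after_spike: float,
                               atp_added: float,
                               blank: float = 0.0) -> float:
    """Estimate sample ATP from light readings before and after an ATP spike.

    With light proportional to ATP, the spike raises the signal by
    k * atp_added, so k = (signal_after_spike - signal_sample) / atp_added
    and the sample ATP is (signal_sample - blank) / k.
    """
    increase = signal_after_spike - signal_sample
    if increase <= 0:
        raise ValueError("the ATP spike must increase the light signal")
    return atp_added * (signal_sample - blank) / increase


# Toy readings in arbitrary light units; 1e-9 M ATP added as the standard.
estimate = atp_from_internal_standard(200.0, 700.0, 1e-9)
print(f"{estimate:.2e} M")
```

Because each assay is calibrated against its own spike, differences in reagent activity between assays cancel out under these assumptions.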
[0058] In a preferred embodiment, the ATP generating-ATP converting
fusion protein is attached to an affinity tag. The term "affinity
tag" is used herein to denote a peptide segment that can be
attached to a polypeptide to provide for purification or detection
of the polypeptide or provide sites for attachment of the
polypeptide to a substrate. In principle, any peptide or protein
for which an antibody or other specific binding agent is available
can be used as an affinity tag. Affinity tags include a
poly-histidine tract or a biotin carboxyl carrier protein (BCCP)
domain, protein A (Nilsson et al., EMBO J. 4:1075, 1985; Nilsson et
al., Methods Enzymol. 198:3, 1991), glutathione S transferase
(Smith and Johnson, Gene 67:31, 1988), substance P, Flag™
peptide (Hopp et al., Biotechnology 6:1204-1210, 1988; available
from Eastman Kodak Co., New Haven, Conn.), streptavidin binding
peptide, or other antigenic epitope or binding domain. See, in
general Ford et al., Protein Expression and Purification 2: 95-107,
1991. DNAs encoding affinity tags are available from commercial
suppliers (e.g., Pharmacia Biotech, Piscataway, N.J.).
[0059] As used herein, the term "poly-histidine tag," when used in
reference to a fusion protein refers to the presence of two to ten
histidine residues at either the amino- or carboxy-terminus of a
protein of interest. A poly-histidine tract of six to ten residues
is preferred. The poly-histidine tract is also defined functionally
as being a number of consecutive histidine residues added to the
protein of interest which allows the affinity purification of the
resulting fusion protein on a nickel-chelate or IDA column.
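The sequence-based part of this definition can be illustrated with a short Python sketch that locates runs of consecutive histidine residues in a one-letter protein sequence. The function names are hypothetical, and the example sequence is the first 20 residues of SEQ ID NO: 4 written in one-letter code; the sketch says nothing about whether a given tract actually supports purification on a column.

```python
import re


def histidine_tracts(protein: str, min_len: int = 2):
    """Return (start, length) for each run of >= min_len consecutive His.

    `start` is a 0-based index into the one-letter protein sequence.
    """
    pattern = re.compile(r"H{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(protein.upper())]


def is_preferred_tract(length: int) -> bool:
    """Six to ten residues is the preferred tract length stated in the text."""
    return 6 <= length <= 10


n_terminus = "MRGSHHHHHHGMASMEAPAA"  # first 20 residues of SEQ ID NO: 4
for start, length in histidine_tracts(n_terminus):
    print(start, length, is_preferred_tract(length))
```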
[0060] In some embodiments, the fusion protein has an orientation
such that the sulfurylase polypeptide is N-terminal to the
luciferase polypeptide. In other embodiments, the luciferase
polypeptide is N-terminal to the sulfurylase polypeptide. As used
herein, the term sulfurylase-luciferase fusion protein refers to
either of these orientations. The terms "amino-terminal"
(N-terminal) and "carboxyl-terminal" (C-terminal) are used herein
to denote positions within polypeptides and proteins. Where the
context allows, these terms are used with reference to a particular
sequence or portion of a polypeptide or protein to denote proximity
or relative position. For example, a certain sequence positioned
carboxyl-terminal to a reference sequence within a protein is
located proximal to the carboxyl terminus of the reference
sequence, but is not necessarily at the carboxyl terminus of the
complete protein.
[0061] The fusion protein of this invention can be produced by
standard recombinant DNA techniques. For example, DNA fragments
coding for the different polypeptide sequences are ligated together
in-frame in accordance with conventional techniques, e.g. by
employing blunt-ended or "sticky"-ended termini for ligation,
restriction enzyme digestion to provide for appropriate termini,
filling-in of cohesive ends as appropriate, alkaline phosphatase
treatment to avoid undesirable joining, and enzymatic ligation. In
another embodiment, the fusion gene can be synthesized by
conventional techniques including automated DNA synthesizers.
Alternatively, PCR amplification of gene fragments can be carried
out using anchor primers that give rise to complementary overhangs
between two consecutive gene fragments that can subsequently be
annealed and reamplified to generate a chimeric gene sequence (see,
for example, Ausubel et al. (eds.) CURRENT PROTOCOLS IN MOLECULAR
BIOLOGY, John Wiley & Sons, 1992). The two polypeptides of the
fusion protein can also be joined by a linker, such as a unique
restriction site, which is engineered with specific primers during
the cloning procedure. In one embodiment, the sulfurylase and
luciferase polypeptides are joined by a linker, for example an
ala-ala-ala linker which is encoded by a NotI restriction site.
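The in-frame joining through an ala-ala-ala linker can be sketched in Python as follows. The NotI recognition sequence is GCGGCCGC; read in frame with one additional base it gives three GCN (alanine) codons. The nine-base linker used here, GCGGCCGCT, is taken from the luciferase-sulfurylase junction as printed in SEQ ID NO: 3. The function names and the short toy fragments are hypothetical and serve only to illustrate the frame bookkeeping.

```python
NOTI_SITE = "GCGGCCGC"      # NotI recognition sequence (8 bp)
ALA_LINKER = "GCGGCCGCT"    # NotI site plus one base: GCG GCC GCT
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(dna: str):
    """Split a sequence into consecutive three-base codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def is_ala_linker(linker: str) -> bool:
    """True if the linker is a whole number of GCN (alanine) codons."""
    return bool(linker) and len(linker) % 3 == 0 and all(
        c.startswith("GC") for c in codons(linker))


def join_in_frame(upstream_cds: str, downstream_cds: str,
                  linker: str = ALA_LINKER) -> str:
    """Join two coding sequences through a linker, preserving the frame."""
    if len(upstream_cds) % 3 or len(linker) % 3:
        raise ValueError("upstream sequence and linker must be whole codons")
    if upstream_cds[-3:] in STOP_CODONS:
        upstream_cds = upstream_cds[:-3]  # drop the stop codon before fusing
    return upstream_cds + linker + downstream_cds


# Toy fragments: the end of an upstream ORF (with stop) and the start of a downstream ORF.
fused = join_in_frame("AAATTGTAA", "ATGCCTGCT")
print(fused, NOTI_SITE in fused, is_ala_linker(ALA_LINKER))
```

Removing the upstream stop codon and keeping the linker a multiple of three bases are the two conditions that keep the downstream polypeptide in frame.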
[0062] In one embodiment, the invention includes a recombinant
polynucleotide that comprises a coding sequence for a fusion
protein having an ATP generating polypeptide sequence and an ATP
converting polypeptide sequence. In a preferred embodiment, the
recombinant polynucleotide encodes a sulfurylase-luciferase fusion
protein. The term "recombinant DNA molecule" or "recombinant
polynucleotide" as used herein refers to a DNA molecule which is
comprised of segments of DNA joined together by means of molecular
biological techniques. The term "recombinant protein" or
"recombinant polypeptide" as used herein refers to a protein
molecule which is expressed from a recombinant DNA molecule.
[0063] In one aspect, this invention discloses a
sulfurylase-luciferase fusion protein with an N-terminal
hexahistidine tag and a BCCP tag. The nucleic acid sequence of the
disclosed N-terminal hexahistidine-BCCP luciferase-sulfurylase
(His6-BCCP L-S) gene is shown below:
His6-BCCP L-S Nucleotide Sequence (SEQ ID NO: 3):
ATGCGGGGTTCTCATCATCATCATCATCATGGTATGGCTAGCATGGAAGCGCCAGCAGCA 60
GCGGAAATCAGTGGTCACATCGTACGTTCCCCGATGGTTGGTACTTTCTACCGCACCCCA 120
AGCCCGGACGCAAAAGCGTTCATCGAAGTGGGTCAGAAAGTCAACGTGGGCGATACCCTG 180
TGCATCGTTGAAGCCATGAAAATGATGAACCAGATCGAAGCGGACAAATCCGGTACCGTG 240
AAAGCAATTCTGGTCGAAAGTGGACAACCGGTAGAATTTGACGAGCCGCTGGTCGTCATC 300
GAGGGATCCGAGCTCGAGATCCAAATGGAAGACGCCAAAAACATAAAGAAAGGCCCGGCG 360
CCATTCTATCCTCTAGAGGATGGAACCGCTGGAGAGCAACTGCATAAGGCTATGAAGAGA 420
TACGCCCTGGTTCCTGGAACAATTGCTTTTACAGATGCACATATCGAGGTGAACATCACG 480
TACGCGGAATACTTCGAAATGTCCGTTCGGTTGGCAGAAGCTATGAAACGATATGGGCTG 540
AATACAAATCACAGAATCGTCGTATGCAGTGAAAACTCTCTTCAATTCTTTATGCCGGTG 600
TTGGGCGCGTTATTTATCGGAGTTGCAGTTGCGCCCGCGAACGACATTTATAATGAACGT 660
GAATTGCTCAACAGTATGAACATTTCGCAGCCTACCGTAGTGTTTGTTTCCAAAAAGGGG 720
TTGCAAAAAATTTTGAACGTGCAAAAAAAATTACCAATAATCCAGAAAATTATTATCATG 780
GATTCTAAAACGGATTACCAGGGATTTCAGTCGATGTACACGTTCGTCACATCTCATCTA 840
CCTCCCGGTTTTAATGAATACGATTTTGTACCAGAGTCCTTTGATCGTGACAAAACAATT 900
GCACTGATAATGAATTCCTCTGGATCTACTGGGTTACCTAAGGGTGTGGCCCTTCCGCAT 960
AGAACTGCCTGCGTCAGATTCTCGCATGCCAGAGATCCTATTTTTGGCAATCAAATCATT 1020
CCGGATACTGCGATTTTAAGTGTTGTTCCATTCCATCACGGTTTTGGAATGTTTACTACA 1080
CTCGGATATTTGATATGTGGATTTCGAGTCGTCTTAATGTATAGATTTGAAGAAGAGCTG 1140
TTTTTACGATCCCTTCAGGATTACAAAATTCAAAGTGCGTTGCTAGTACCAACCCTATTT 1200
TCATTCTTCGCCAAAAGCACTCTGATTGACAAATACGATTTATCTAATTTACACGAAATT 1260
GCTTCTGGGGGCGCACCTCTTTCGAAAGAAGTCGGGGAAGCGGTTGCAAAACGCTTCCAT 1320
CTTCCAGGGATACGACAAGGATATGGGCTCACTGAGACTACATCAGCTATTCTGATTACA 1380
CCCGAGGGGGATGATAAACCGGGCGCGGTCGGTAAAGTTGTTCCATTTTTTGAAGCGAAG 1440
GTTGTGGATCTGGATACCGGGAAAACGCTGGGCGTTAATCAGAGAGGCGAATTATGTGTC 1500
AGAGGACCTATGATTATGTCCGGTTATGTAAACAATCCGGAAGCGACCAACGCCTTGATT 1560
GACAAGGATGGATGGCTACATTCTGGAGACATAGCTTACTGGGACGAAGACGAACACTTC 1620
TTCATAGTTGACCGCTTGAAGTCTTTAATTAAATACAAAGGATATCAGGTGGCCCCCGCT 1680
GAATTGGAATCGATATTGTTACAACACCCCAACATCTTCGACGCGGGCGTGGCAGGTCTT 1740
CCCGACGATGACGCCGGTGAACTTCCCGCCGCCGTTGTTGTTTTGGAGCACGGAAAGACG 1800
ATGACGGAAAAAGAGATCGTGGATTACGTCGCCAGTCAAGTAACAACCGCGAAAAAGTTG 1860
CGCGGAGGAGTTGTGTTTGTGGACGAAGTACCGAAAGGTCTTACCGGAAAACTCGACGCA 1920
AGAAAAATCAGAGAGATCCTCATAAAGGCCAAGAAGGGCGGAAAGTCCAAATTGGCGGCC 1980
GCTATGCCTGCTCCTCACGGTGGTATTCTACAAGACTTGATTGCTAGAGATGCGTTAAAG 2040
AAGAATGAATTGTTATCTGAAGCGCAATCTTCGGACATTTTAGTATGGAACTTGACTCCT 2100
AGACAACTATGTGATATTGAATTGATTCTAAATGGTGGGTTTTCTCCTCTGACTGGGTTT 2160
TTGAACGAAAACGATTACTCCTCTGTTGTTACAGATTCGAGATTAGCAGACGGCACATTG 2220
TGGACCATCCCTATTACATTAGATGTTGATGAAGCATTTGCTAACCAAATTAAACCAGAC 2280
ACAAGAATTGCCCTTTTCCAAGATGATGAAATTCCTATTGCTATACTTACTGTCCAGGAT 2340
GTTTACAAGCCAAACAAAACTATCGAAGCCGAAAAAGTCTTCAGAGGTGACCCAGAACAT 2400
CCAGCCATTAGCTATTTATTTAACGTTGCCGGTGATTATTACGTCGGCGGTTCTTTAGAA 2460
GCGATTCAATTACCTCAACATTATGACTATCCAGGTTTGCGTAAGACACCTGCCCAACTA 2520
AGACTTGAATTCCAATCAAGACAATGGGACCGTGTCGTAGCTTTCCAAACTCGTAATCCA 2580
ATGCATAGAGCCCACAGGGAGTTGACTGTGAGAGCCGCCAGAGAAGCTAATGCTAAGGTG 2640
CTGATCCATCCAGTTGTTGGACTAACCAAACCAGGTGATATAGACCATCACACTCGTGTT 2700
CGTGTCTACCAGGAAATTATTAAGCGTTATCCTAATGGTATTGCTTTCTTATCCCTGTTG 2760
CCATTAGCAATGAGAATGAGTGGTGATAGAGAAGCCGTATGGCATGCTATTATTAGAAAG 2820
AATTATGGTGCCTCCCACTTCATTGTTGGTAGAGACCATGCGGGCCCAGGTAAGAACTCC 2880
AAGGGTGTTGATTTCTACGGTCCATACGATGCTCAAGAATTGGTCGAATCCTACAAGCAT 2940
GAACTGGACATTGAAGTTGTTCCATTCAGAATGGTCACTTATTTGCCAGACGAAGACCGT 3000
TATGCTCCAATTGATCAAATTGACACCACAAAGACGAGAACCTTGAACATTTCAGGTACA 3060
GAGTTGAGACGCCGTTTAAGAGTTGGTGGTGAGATTCCTGAATGGTTCTCATATCCTGAA 3120
GTGGTTAAAATCCTAAGAGAATCCAACCCACCAAGACCAAAACAAGGTTTTTCAATTGTT 3180
TTAGGTAATTCATTAACCGTTTCTCGTGAGCAATTATCCATTGCTTTGTTGTCAACATTC 3240
TTGCAATTCGGTGGTGGCAGGTATTACAAGATCTTTGAACACAATAATAAGACAGAGTTA 3300
CTATCTTTGATTCAAGATTTCATTGGTTCTGGTAGTGGACTAATTATTCCAAATCAATGG 3360
GAAGATGACAAGGACTCTGTTGTTGGCAAGCAAAACGTTTACTTATTAGATACCTCAAGC 3420
TCAGCCGATATTCAGCTAGAGTCAGCGGATGAACCTATTTCACATATTGTACAAAAAGTT 3480
GTCCTATTCTTGGAAGACAATGGCTTTTTTGTATTTTAA 3519
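A reader can check the start of the nucleotide sequence above against the corresponding amino acid sequence by translating it with the standard genetic code. The Python sketch below is illustrative only; the function name `translate` is hypothetical, and only the first 60 nucleotides of SEQ ID NO: 3 are used. The full 3519-nucleotide sequence corresponds to 1173 codons, that is, 1172 residues plus the stop codon.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence (one-letter output), stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


first_60 = "ATGCGGGGTTCTCATCATCATCATCATCATGGTATGGCTAGCATGGAAGCGCCAGCAGCA"
print(translate(first_60))  # MRGSHHHHHHGMASMEAPAA
```

The printed residues correspond to Met Arg Gly Ser, six His, Gly Met Ala Ser Met Glu Ala Pro Ala Ala in three-letter code.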
[0064] The amino acid sequence of the disclosed His6-BCCP L-S
polypeptide is presented using the three letter amino acid code
(SEQ ID NO:4).
His6-BCCP L-S Amino Acid Sequence (SEQ ID NO: 4):
Met Arg Gly Ser His His His His His His Gly Met 1 5
10 Ala Ser Met Glu Ala Pro Ala Ala Ala Glu Ile Ser 15 20 Gly His
Ile Val Arg Ser Pro Met Val Gly Thr Phe 25 30 35 Tyr Arg Thr Pro
Ser Pro Asp Ala Lys Ala Phe Ile 40 45 Glu Val Gly Gln Lys Val Asn
Val Gly Asp Thr Leu 50 55 60 Cys Ile Val Glu Ala Met Lys Met Met
Asn Gln Ile 65 70 Glu Ala Asp Lys Ser Gly Thr Val Lys Ala Ile Leu
75 80 Val Glu Ser Gly Gln Pro Val Glu Phe Asp Glu Pro 85 90 95 Leu
Val Val Ile Glu Gly Ser Glu Leu Glu Ile Gln 100 105 Met Glu Asp Ala
Lys Asn Ile Lys Lys Gly Pro Ala 110 115 120 Pro Phe Tyr Pro Leu Glu
Asp Gly Thr Ala Gly Glu 125 130 Gln Leu His Lys Ala Met Lys Arg Tyr
Ala Leu Val 135 140 Pro Gly Thr Ile Ala Phe Thr Asp Ala His Ile Glu
145 150 155 Val Asn Ile Thr Tyr Ala Glu Tyr Phe Glu Met Ser 160 165
Val Arg Leu Ala Glu Ala Met Lys Arg Tyr Gly Leu 170 175 180 Asn Thr
Asn His Arg Ile Val Val Cys Ser Glu Asn 185 190 Ser Leu Gln Phe Phe
Met Pro Val Leu Gly Ala Leu 195 200 Phe Ile Gly Val Ala Val Ala Pro
Ala Asn Asp Ile 205 210 215 Tyr Asn Glu Arg Glu Leu Leu Asn Ser Met
Asn Ile 220 225 Ser Gln Pro Thr Val Val Phe Val Ser Lys Lys Gly 230
235 240 Leu Gln Lys Ile Leu Asn Val Gln Lys Lys Leu Pro 245 250 Ile
Ile Gln Lys Ile Ile Ile Met Asp Ser Lys Thr 255 260 Asp Tyr Gln Gly
Phe Gln Ser Met Tyr Thr Phe Val 265 270 275 Thr Ser His Leu Pro Pro
Gly Phe Asn Glu Tyr Asp 280 285 Phe Val Pro Glu Ser Phe Asp Arg Asp
Lys Thr Ile 290 295 300 Ala Leu Ile Met Asn Ser Ser Gly Ser Thr Gly
Leu 305 310 Pro Lys Gly Val Ala Leu Pro His Arg Thr Ala Cys 315 320
Val Arg Phe Ser His Ala Arg Asp Pro Ile Phe Gly 325 330 335 Asn Gln
Ile Ile Pro Asp Thr Ala Ile Leu Ser Val 340 345 Val Pro Phe His His
Gly Phe Gly Met Phe Thr Thr 350 355 360 Leu Gly Tyr Leu Ile Cys Gly
Phe Arg Val Val Leu 365 370 Met Tyr Arg Phe Glu Glu Glu Leu Phe Leu
Arg Ser 375 380 Leu Gln Asp Tyr Lys Ile Gln Ser Ala Leu Leu Val 385
390 395 Pro Thr Leu Phe Ser Phe Phe Ala Lys Ser Thr Leu 400 405 Ile
Asp Lys Tyr Asp Leu Ser Asn Leu His Glu Ile 410 415 420 Ala Ser Gly
Gly Ala Pro Leu Ser Lys Glu Val Gly 425 430 Glu Ala Val Ala Lys Arg
Phe His Leu Pro Gly Ile 435 440 Arg Gln Gly Tyr Gly Leu Thr Glu Thr
Thr Ser Ala 445 450 455 Ile Leu Ile Thr Pro Glu Gly Asp Asp Lys Pro
Gly 460 465 Ala Val Gly Lys Val Val Pro Phe Phe Glu Ala Lys 470 475
480 Val Val Asp Leu Asp Thr Gly Lys Thr Leu Gly Val 485 490 Asn Gln
Arg Gly Glu Leu Cys Val Arg Gly Pro Met 495 500 Ile Met Ser Gly Tyr
Val Asn Asn Pro Glu Ala Thr 505 510 515 Asn Ala Leu Ile Asp Lys Asp
Gly Trp Leu His Ser 520 525 Gly Asp Ile Ala Tyr Trp Asp Glu Asp Glu
His Phe 530 535 540 Phe Ile Val Asp Arg Leu Lys Ser Leu Ile Lys Tyr
545 550 Lys Gly Tyr Gln Val Ala Pro Ala Glu Leu Glu Ser 555 560 Ile
Leu Leu Gln His Pro Asn Ile Phe Asp Ala Gly 565 570 575 Val Ala Gly
Leu Pro Asp Asp Asp Ala Gly Glu Leu 580 585 Pro Ala Ala Val Val Val
Leu Glu His Gly Lys Thr 590 595 600 Met Thr Glu Lys Glu Ile Val Asp
Tyr Val Ala Ser 605 610 Gln Val Thr Thr Ala Lys Lys Leu Arg Gly Gly
Val 615 620 Val Phe Val Asp Glu Val Pro Lys Gly Leu Thr Gly 625 630
635 Lys Leu Asp Ala Arg Lys Ile Arg Glu Ile Leu Ile 640 645 Lys Ala
Lys Lys Gly Gly Lys Ser Lys Leu Ala Ala 650 655 660 Ala Met Pro Ala
Pro His Gly Gly Ile Leu Gln Asp 665 670 Leu Ile Ala Arg Asp Ala Leu
Lys Lys Asn Glu Leu 675 680 Leu Ser Glu Ala Gln Ser Ser Asp Ile Leu
Val Trp 685 690 695 Asn Leu Thr Pro Arg Gln Leu Cys Asp Ile Glu Leu
700 705 Ile Leu Asn Gly Gly Phe Ser Pro Leu Thr Gly Phe 710 715 Leu
Asn Glu Asn Asp Tyr Ser Ser Val Val Thr Asp 720 725 730 Ser Arg Leu
Ala Asp Gly Thr Leu Trp Thr Ile Pro 735 740 Ile Thr Leu Asp Val Asp
Glu Ala Phe Ala Asn Gln 745 750 755 Ile Lys Pro Asp Thr Arg Ile Ala
Leu Phe Gln Asp 760 765 Asp Glu Ile Pro Ile Ala Ile Leu Thr Val Gln
Asp 770 775 Val Tyr Lys Pro Asn Lys Thr Ile Glu Ala Glu Lys 780 785
790 Val Phe Arg Gly Asp Pro Glu His Pro Ala Ile Ser 795 800 Tyr Leu
Phe Asn Val Ala Gly Asp Tyr Tyr Val Gly 805 810 815 Gly Ser Leu Glu
Ala Ile Gln Leu Pro Gln His Tyr 820 825 Asp Tyr Pro Gly Leu Arg Lys
Thr Pro Ala Gln Leu 830 835 Arg Leu Glu Phe Gln Ser Arg Gln Trp Asp
Arg Val 840 845 850 Val Ala Phe Gln Thr Arg Asn Pro Met His Arg Ala
855 860 His Arg Glu Leu Thr Val Arg Ala Ala Arg Glu Ala 865 870 875
Asn Ala Lys Val Leu Ile His Pro Val Val Gly Leu 880 885 Thr Lys Pro
Gly Asp Ile Asp His His Thr Arg Val 890 895 Arg Val Tyr Gln Glu Ile
Ile Lys Arg Tyr Pro Asn 900 905 910 Gly Ile Ala Phe Leu Ser Leu Leu
Pro Leu Ala Met 915 920 Arg Met Ser Gly Asp Arg Glu Ala Val Trp His
Ala 925 930 935 Ile Ile Arg Lys Asn Tyr Gly Ala Ser His Phe Ile 940
945 Val Gly Arg Asp His Ala Gly Pro Gly Lys Asn Ser 950 955 Lys Gly
Val Asp Phe Tyr Gly Pro Tyr Asp Ala Gln 960 965 970 Glu Leu Val Glu
Ser Tyr Lys His Glu Leu Asp Ile 975 980
Glu Val Val Pro Phe Arg Met Val Thr Tyr Leu Pro 985 990 995 Asp Glu
Asp Arg Tyr Ala Pro Ile Asp Gln Ile Asp 1000 1005 Thr Thr Lys Thr
Arg Thr Leu Asn Ile Ser Gly Thr 1010 1015 Glu Leu Arg Arg Arg Leu
Arg Val Gly Gly Glu Ile 1020 1025 1030 Pro Glu Trp Phe Ser Tyr Pro
Glu Val Val Lys Ile 1035 1040 Leu Arg Glu Ser Asn Pro Pro Arg Pro
Lys Gln Gly 1045 1050 1055 Phe Ser Ile Val Leu Gly Asn Ser Leu Thr
Val Ser 1060 1065 Arg Glu Gln Leu Ser Ile Ala Leu Leu Ser Thr Phe
1070 1075 Leu Gln Phe Gly Gly Gly Arg Tyr Tyr Lys Ile Phe 1080 1085
1090 Glu His Asn Asn Lys Thr Glu Leu Leu Ser Leu Ile 1095 1100 Gln
Asp Phe Ile Gly Ser Gly Ser Gly Leu Ile Ile 1105 1110 1115 Pro Asn
Gln Trp Glu Asp Asp Lys Asp Ser Val Val 1120 1125 Gly Lys Gln Asn
Val Tyr Leu Leu Asp Thr Ser Ser 1130 1135 Ser Ala Asp Ile Gln Leu
Glu Ser Ala Asp Glu Pro 1140 1145 1150 Ile Ser His Ile Val Gln Lys
Val Val Leu Phe Leu 1155 1160 Glu Asp Asn Gly Phe Phe Val Phe 1165
1170
[0065] Accordingly, in one aspect, the invention provides for a
fusion protein comprising a thermostable sulfurylase joined to at
least one affinity tag. The nucleic acid sequence of the disclosed
N-terminal hexahistidine-BCCP Bst ATP Sulfurylase (His6-BCCP Bst
Sulfurylase) gene is shown below:
His6-BCCP Bst Sulfurylase Nucleotide Sequence (SEQ ID NO: 5):
ATGCGGGGTTCTCATCATCATCATCATCATGGTATGGCTAGCATGGAAGCGCCAGCAGCA 60
GCGGAAATCAGTGGTCACATCGTACGTTCCCCGATGGTTGGTACTTTCTACCGCACCCCA 120
AGCCCGGACGCAAAAGCGTTCATCGAAGTGGGTCAGAAAGTCAACGTGGGCGATACCCTG 180
TGCATCGTTGAAGCCATGAAAATGATGAACCAGATCGAAGCGGACAAATCCGGTACCGTG 240
AAAGCAATTCTGGTCGAAAGTGGACAACCGGTAGAATTTGACGAGCCGCTGGTCGTCATC 300
GAGGGATCCGAGCTCGAGATCTGCAGCATGAGCGTAAGCATCCCGCATGGCGGCACATTG 360
ATCAACCGTTGGAATCCGGATTACCCAATCGATGAAGCAACGAAAACGATCGAGCTGTCC 420
AAAGCCGAACTAAGCGACCTTGAGCTGATCGGCACAGGCGCCTACAGCCCGCTCACCGGG 480
TTTTTAACGAAAGCCGATTACGATGCGGTCGTAGAAACGATGCGCCTCGCTGATGGCACT 540
GTCTGGAGCATTCCGATCACGCTGGCGGTGACGGAAGAAAAAGCGAGTGAACTCACTGTC 600
GGCGACAAAGCGAAACTCGTTTATGGCGGCGACGTCTACGGCGTCATTGAAATCGCCGAT 660
ATTTACCGCCCGGATAAAACGAAAGAAGCCAAGCTCGTCTATAAAACCGATGAACTCGCT 720
CACCCGGGCGTGCGCAAGCTGTTTGAAAAACCAGATGTGTACGTCGGCGGAGCGGTTACG 780
CTCGTCAAACGGACCGACAAAGGCCAGTTTGCTCCGTTTTATTTCGATCCGGCCGAAACG 840
CGGAAACGATTTGCCGAACTCGGCTGGAATACCGTCGTCGGCTTCCAAACACGCAACCCG 900
GTTCACCGCGCCCATGAATACATTCAAAAATGCGCGCTTGAAATCGTGGACGGCTTGTTT 960
TTAAACCCGCTCGTCGGCGAAACGAAAGCGGACGATATTCCGGCCGACATCCGGATGGAA 1020
AGCTATCAAGTGCTGCTGGAAAACTATTATCCGAAAGACCGCGTTTTCTTGGGCGTCTTC 1080
CAAGCTGCGATGCGCTATGCCGGTCCGCGCGAAGCGATTTTCCATGCCATGGTGCGGAAA 1140
AACTTCGGCTGCACGCACTTCATCGTCGGCCGCGACCATGCGGGCGTCGGCAACTATTAC 1200
GGCACGTATGATGCGCAAAAAATCTTCTCGAACTTTACAGCCGAAGAGCTTGGCATTACA 1260
CCGCTCTTTTTCGAACACAGCTTTTATTGCACGAAATGCGAAGGCATGGCATCGACGAAA 1320
ACATGCCCGCACGACGCACAATATCACGTTGTCCTTTCTGGCACGAAAGTCCGTGAAATG 1380
TTGCGTAACGGCCAAGTGCCGCCGAGCACATTCAGCCGTCCGGAAGTGGCCGCCGTTTTG 1440
ATCAAAGGGCTGCAAGAACGCGAAACGGTCGCCCCGTCAGCGCGCTAA 1488
[0066] The amino acid sequence of the His6-BCCP Bst Sulfurylase
polypeptide is presented using the three letter amino acid code in
Table 6 (SEQ ID NO:6).
Sequence CWU 1
1
31 1 1247 DNA Bacillus stearothermophilus 1 gttatgaaca tgagtttgag
cattccgcat ggcggcacat tgatcaaccg ttggaatccg 60 gattacccaa
tcgatgaagc aacgaaaacg atcgagctgt ccaaagccga actaagcgac 120
cttgagctga tcggcacagg cgcctacagc ccgctcaccg ggtttttaac gaaagccgat
180 tacgatgcgg tcgtagaaac gatgcgcctc gctgatggca ctgtctggag
cattccgatc 240 acgctggcgg tgacggaaga aaaagcgagt gaactcactg
tcggcgacaa agcgaaactc 300 gtttatggcg gcgacgtcta cggcgtcatt
gaaatcgccg atatttaccg cccggataaa 360 acgaaagaag ccaagctcgt
ctataaaacc gatgaactcg ctcacccggg cgtgcgcaag 420 ctgtttgaaa
aaccagatgt gtacgtcggc ggagcggtta cgctcgtcaa acggaccgac 480
aaaggccagt ttgctccgtt ttatttcgat ccggccgaaa cgcggaaacg atttgccgaa
540 ctcggctgga ataccgtcgt cggcttccaa acacgcaacc cggttcaccg
cgcccatgaa 600 tacattcaaa aatgcgcgct tgaaatcgtg gacggcttgt
ttttaaaccc gctcgtcggc 660 gaaacgaaag cggacgatat tccggccgac
atccggatgg aaagctatca agtgctgctg 720 gaaaactatt atccgaaaga
ccgcgttttc ttgggcgtct tccaagctgc gatgcgctat 780 gccggtccgc
gcgaagcgat tttccatgcc atggtgcgga aaaacttcgg ctgcacgcac 840
ttcatcgtcg gccgcgacca tgcgggcgtc ggcaactatt acggcacgta tgatgcgcaa
900 aaaatcttct cgaactttac agccgaagag cttggcatta caccgctctt
tttcgaacac 960 agcttttatt gcacgaaatg cgaaggcatg gcatcgacga
aaacatgccc gcacgacgca 1020 caatatcacg ttgtcctttc tggcacgaaa
gtccgtgaaa tgttgcgtaa cggccaagtg 1080 ccgccgagca cattcagccg
tccggaagtg gccgccgttt tgatcaaagg gctgcaagaa 1140 cgcgaaacgg
tcaccccgtc gacacgctaa aggaggagcg agatgagcac gaatatcgtt 1200
tggcatcata catcggtgac aaaagaagat cgccgccaac gcaacgg 1247 2 386 PRT
Bacillus stearothermophilus 2 Met Ser Leu Ser Ile Pro His Gly Gly
Thr Leu Ile Asn Arg Trp Asn 1 5 10 15 Pro Asp Tyr Pro Ile Asp Glu
Ala Thr Lys Thr Ile Glu Leu Ser Lys 20 25 30 Ala Glu Leu Ser Asp
Leu Glu Leu Ile Gly Thr Gly Ala Tyr Ser Pro 35 40 45 Leu Thr Gly
Phe Leu Thr Lys Ala Asp Tyr Asp Ala Val Val Glu Thr 50 55 60 Met
Arg Leu Ala Asp Gly Thr Val Trp Ser Ile Pro Ile Thr Leu Ala 65 70
75 80 Val Thr Glu Glu Lys Ala Ser Glu Leu Thr Val Gly Asp Lys Ala
Lys 85 90 95 Leu Val Tyr Gly Gly Asp Val Tyr Gly Val Ile Glu Ile
Ala Asp Ile 100 105 110 Tyr Arg Pro Asp Lys Thr Lys Glu Ala Lys Leu
Val Tyr Lys Thr Asp 115 120 125 Glu Leu Ala His Pro Gly Val Arg Lys
Leu Phe Glu Lys Pro Asp Val 130 135 140 Tyr Val Gly Gly Ala Val Thr
Leu Val Lys Arg Thr Asp Lys Gly Gln 145 150 155 160 Phe Ala Pro Phe
Tyr Phe Asp Pro Ala Glu Thr Arg Lys Arg Phe Ala 165 170 175 Glu Leu
Gly Trp Asn Thr Val Val Gly Phe Gln Thr Arg Asn Pro Val 180 185 190
His Arg Ala His Glu Tyr Ile Gln Lys Cys Ala Leu Glu Ile Val Asp 195
200 205 Gly Leu Phe Leu Asn Pro Leu Val Gly Glu Thr Lys Ala Asp Asp
Ile 210 215 220 Pro Ala Asp Ile Arg Met Glu Ser Tyr Gln Val Leu Leu
Glu Asn Tyr 225 230 235 240 Tyr Pro Lys Asp Arg Val Phe Leu Gly Val
Phe Gln Ala Ala Met Arg 245 250 255 Tyr Ala Gly Pro Arg Glu Ala Ile
Phe His Ala Met Val Arg Lys Asn 260 265 270 Phe Gly Cys Thr His Phe
Ile Val Gly Arg Asp His Ala Gly Val Gly 275 280 285 Asn Tyr Tyr Gly
Thr Tyr Asp Ala Gln Lys Ile Phe Ser Asn Phe Thr 290 295 300 Ala Glu
Glu Leu Gly Ile Thr Pro Leu Phe Phe Glu His Ser Phe Tyr 305 310 315
320 Cys Thr Lys Cys Glu Gly Met Ala Ser Thr Lys Thr Cys Pro His Asp
325 330 335 Ala Gln Tyr His Val Val Leu Ser Gly Thr Lys Val Arg Glu
Met Leu 340 345 350 Arg Asn Gly Gln Val Pro Pro Ser Thr Phe Ser Arg
Pro Glu Val Ala 355 360 365 Ala Val Leu Ile Lys Gly Leu Gln Glu Arg
Glu Thr Val Thr Pro Ser 370 375 380 Thr Arg 385 3 3519 DNA
Escherichia coli 3 atgcggggtt ctcatcatca tcatcatcat ggtatggcta
gcatggaagc gccagcagca 60 gcggaaatca gtggtcacat cgtacgttcc
ccgatggttg gtactttcta ccgcacccca 120 agcccggacg caaaagcgtt
catcgaagtg ggtcagaaag tcaacgtggg cgataccctg 180 tgcatcgttg
aagccatgaa aatgatgaac cagatcgaag cggacaaatc cggtaccgtg 240
aaagcaattc tggtcgaaag tggacaaccg gtagaatttg acgagccgct ggtcgtcatc
300 gagggatccg agctcgagat ccaaatggaa gacgccaaaa acataaagaa
aggcccggcg 360 ccattctatc ctctagagga tggaaccgct ggagagcaac
tgcataaggc tatgaagaga 420 tacgccctgg ttcctggaac aattgctttt
acagatgcac atatcgaggt gaacatcacg 480 tacgcggaat acttcgaaat
gtccgttcgg ttggcagaag ctatgaaacg atatgggctg 540 aatacaaatc
acagaatcgt cgtatgcagt gaaaactctc ttcaattctt tatgccggtg 600
ttgggcgcgt tatttatcgg agttgcagtt gcgcccgcga acgacattta taatgaacgt
660 gaattgctca acagtatgaa catttcgcag cctaccgtag tgtttgtttc
caaaaagggg 720 ttgcaaaaaa ttttgaacgt gcaaaaaaaa ttaccaataa
tccagaaaat tattatcatg 780 gattctaaaa cggattacca gggatttcag
tcgatgtaca cgttcgtcac atctcatcta 840 cctcccggtt ttaatgaata
cgattttgta ccagagtcct ttgatcgtga caaaacaatt 900 gcactgataa
tgaattcctc tggatctact gggttaccta agggtgtggc ccttccgcat 960
agaactgcct gcgtcagatt ctcgcatgcc agagatccta tttttggcaa tcaaatcatt
1020 ccggatactg cgattttaag tgttgttcca ttccatcacg gttttggaat
gtttactaca 1080 ctcggatatt tgatatgtgg atttcgagtc gtcttaatgt
atagatttga agaagagctg 1140 tttttacgat cccttcagga ttacaaaatt
caaagtgcgt tgctagtacc aaccctattt 1200 tcattcttcg ccaaaagcac
tctgattgac aaatacgatt tatctaattt acacgaaatt 1260 gcttctgggg
gcgcacctct ttcgaaagaa gtcggggaag cggttgcaaa acgcttccat 1320
cttccaggga tacgacaagg atatgggctc actgagacta catcagctat tctgattaca
1380 cccgaggggg atgataaacc gggcgcggtc ggtaaagttg ttccattttt
tgaagcgaag 1440 gttgtggatc tggataccgg gaaaacgctg ggcgttaatc
agagaggcga attatgtgtc 1500 agaggaccta tgattatgtc cggttatgta
aacaatccgg aagcgaccaa cgccttgatt 1560 gacaaggatg gatggctaca
ttctggagac atagcttact gggacgaaga cgaacacttc 1620 ttcatagttg
accgcttgaa gtctttaatt aaatacaaag gatatcaggt ggcccccgct 1680
gaattggaat cgatattgtt acaacacccc aacatcttcg acgcgggcgt ggcaggtctt
1740 cccgacgatg acgccggtga acttcccgcc gccgttgttg ttttggagca
cggaaagacg 1800 atgacggaaa aagagatcgt ggattacgtc gccagtcaag
taacaaccgc gaaaaagttg 1860 cgcggaggag ttgtgtttgt ggacgaagta
ccgaaaggtc ttaccggaaa actcgacgca 1920 agaaaaatca gagagatcct
cataaaggcc aagaagggcg gaaagtccaa attggcggcc 1980 gctatgcctg
ctcctcacgg tggtattcta caagacttga ttgctagaga tgcgttaaag 2040
aagaatgaat tgttatctga agcgcaatct tcggacattt tagtatggaa cttgactcct
2100 agacaactat gtgatattga attgattcta aatggtgggt tttctcctct
gactgggttt 2160 ttgaacgaaa acgattactc ctctgttgtt acagattcga
gattagcaga cggcacattg 2220 tggaccatcc ctattacatt agatgttgat
gaagcatttg ctaaccaaat taaaccagac 2280 acaagaattg cccttttcca
agatgatgaa attcctattg ctatacttac tgtccaggat 2340 gtttacaagc
caaacaaaac tatcgaagcc gaaaaagtct tcagaggtga cccagaacat 2400
ccagccatta gctatttatt taacgttgcc ggtgattatt acgtcggcgg ttctttagaa
2460 gcgattcaat tacctcaaca ttatgactat ccaggtttgc gtaagacacc
tgcccaacta 2520 agacttgaat tccaatcaag acaatgggac cgtgtcgtag
ctttccaaac tcgtaatcca 2580 atgcatagag cccacaggga gttgactgtg
agagccgcca gagaagctaa tgctaaggtg 2640 ctgatccatc cagttgttgg
actaaccaaa ccaggtgata tagaccatca cactcgtgtt 2700 cgtgtctacc
aggaaattat taagcgttat cctaatggta ttgctttctt atccctgttg 2760
ccattagcaa tgagaatgag tggtgataga gaagccgtat ggcatgctat tattagaaag
2820 aattatggtg cctcccactt cattgttggt agagaccatg cgggcccagg
taagaactcc 2880 aagggtgttg atttctacgg tccatacgat gctcaagaat
tggtcgaatc ctacaagcat 2940 gaactggaca ttgaagttgt tccattcaga
atggtcactt atttgccaga cgaagaccgt 3000 tatgctccaa ttgatcaaat
tgacaccaca aagacgagaa ccttgaacat ttcaggtaca 3060 gagttgagac
gccgtttaag agttggtggt gagattcctg aatggttctc atatcctgaa 3120
gtggttaaaa tcctaagaga atccaaccca ccaagaccaa aacaaggttt ttcaattgtt
3180 ttaggtaatt cattaaccgt ttctcgtgag caattatcca ttgctttgtt
gtcaacattc 3240 ttgcaattcg gtggtggcag gtattacaag atctttgaac
acaataataa gacagagtta 3300 ctatctttga ttcaagattt cattggttct
ggtagtggac taattattcc aaatcaatgg 3360 gaagatgaca aggactctgt
tgttggcaag caaaacgttt acttattaga tacctcaagc 3420 tcagccgata
ttcagctaga gtcagcggat gaacctattt cacatattgt acaaaaagtt 3480
gtcctattct tggaagacaa tggctttttt gtattttaa 3519 4 1172 PRT
Escherichia coli 4 Met Arg Gly Ser His His His His His His Gly Met
Ala Ser Met Glu 1 5 10 15 Ala Pro Ala Ala Ala Glu Ile Ser Gly His
Ile Val Arg Ser Pro Met 20 25 30 Val Gly Thr Phe Tyr Arg Thr Pro
Ser Pro Asp Ala Lys Ala Phe Ile 35 40 45 Glu Val Gly Gln Lys Val
Asn Val Gly Asp Thr Leu Cys Ile Val Glu 50 55 60 Ala Met Lys Met
Met Asn Gln Ile Glu Ala Asp Lys Ser Gly Thr Val 65 70 75 80 Lys Ala
Ile Leu Val Glu Ser Gly Gln Pro Val Glu Phe Asp Glu Pro 85 90 95
Leu Val Val Ile Glu Gly Ser Glu Leu Glu Ile Gln Met Glu Asp Ala 100
105 110 Lys Asn Ile Lys Lys Gly Pro Ala Pro Phe Tyr Pro Leu Glu Asp
Gly 115 120 125 Thr Ala Gly Glu Gln Leu His Lys Ala Met Lys Arg Tyr
Ala Leu Val 130 135 140 Pro Gly Thr Ile Ala Phe Thr Asp Ala His Ile
Glu Val Asn Ile Thr 145 150 155 160 Tyr Ala Glu Tyr Phe Glu Met Ser
Val Arg Leu Ala Glu Ala Met Lys 165 170 175 Arg Tyr Gly Leu Asn Thr
Asn His Arg Ile Val Val Cys Ser Glu Asn 180 185 190 Ser Leu Gln Phe
Phe Met Pro Val Leu Gly Ala Leu Phe Ile Gly Val 195 200 205 Ala Val
Ala Pro Ala Asn Asp Ile Tyr Asn Glu Arg Glu Leu Leu Asn 210 215 220
Ser Met Asn Ile Ser Gln Pro Thr Val Val Phe Val Ser Lys Lys Gly 225
230 235 240 Leu Gln Lys Ile Leu Asn Val Gln Lys Lys Leu Pro Ile Ile
Gln Lys 245 250 255 Ile Ile Ile Met Asp Ser Lys Thr Asp Tyr Gln Gly
Phe Gln Ser Met 260 265 270 Tyr Thr Phe Val Thr Ser His Leu Pro Pro
Gly Phe Asn Glu Tyr Asp 275 280 285 Phe Val Pro Glu Ser Phe Asp Arg
Asp Lys Thr Ile Ala Leu Ile Met 290 295 300 Asn Ser Ser Gly Ser Thr
Gly Leu Pro Lys Gly Val Ala Leu Pro His 305 310 315 320 Arg Thr Ala
Cys Val Arg Phe Ser His Ala Arg Asp Pro Ile Phe Gly 325 330 335 Asn
Gln Ile Ile Pro Asp Thr Ala Ile Leu Ser Val Val Pro Phe His 340 345
350 His Gly Phe Gly Met Phe Thr Thr Leu Gly Tyr Leu Ile Cys Gly Phe
355 360 365 Arg Val Val Leu Met Tyr Arg Phe Glu Glu Glu Leu Phe Leu
Arg Ser 370 375 380 Leu Gln Asp Tyr Lys Ile Gln Ser Ala Leu Leu Val
Pro Thr Leu Phe 385 390 395 400 Ser Phe Phe Ala Lys Ser Thr Leu Ile
Asp Lys Tyr Asp Leu Ser Asn 405 410 415 Leu His Glu Ile Ala Ser Gly
Gly Ala Pro Leu Ser Lys Glu Val Gly 420 425 430 Glu Ala Val Ala Lys
Arg Phe His Leu Pro Gly Ile Arg Gln Gly Tyr 435 440 445 Gly Leu Thr
Glu Thr Thr Ser Ala Ile Leu Ile Thr Pro Glu Gly Asp 450 455 460 Asp
Lys Pro Gly Ala Val Gly Lys Val Val Pro Phe Phe Glu Ala Lys 465 470
475 480 Val Val Asp Leu Asp Thr Gly Lys Thr Leu Gly Val Asn Gln Arg
Gly 485 490 495 Glu Leu Cys Val Arg Gly Pro Met Ile Met Ser Gly Tyr
Val Asn Asn 500 505 510 Pro Glu Ala Thr Asn Ala Leu Ile Asp Lys Asp
Gly Trp Leu His Ser 515 520 525 Gly Asp Ile Ala Tyr Trp Asp Glu Asp
Glu His Phe Phe Ile Val Asp 530 535 540 Arg Leu Lys Ser Leu Ile Lys
Tyr Lys Gly Tyr Gln Val Ala Pro Ala 545 550 555 560 Glu Leu Glu Ser
Ile Leu Leu Gln His Pro Asn Ile Phe Asp Ala Gly 565 570 575 Val Ala
Gly Leu Pro Asp Asp Asp Ala Gly Glu Leu Pro Ala Ala Val 580 585 590
Val Val Leu Glu His Gly Lys Thr Met Thr Glu Lys Glu Ile Val Asp 595
600 605 Tyr Val Ala Ser Gln Val Thr Thr Ala Lys Lys Leu Arg Gly Gly
Val 610 615 620 Val Phe Val Asp Glu Val Pro Lys Gly Leu Thr Gly Lys
Leu Asp Ala 625 630 635 640 Arg Lys Ile Arg Glu Ile Leu Ile Lys Ala
Lys Lys Gly Gly Lys Ser 645 650 655 Lys Leu Ala Ala Ala Met Pro Ala
Pro His Gly Gly Ile Leu Gln Asp 660 665 670 Leu Ile Ala Arg Asp Ala
Leu Lys Lys Asn Glu Leu Leu Ser Glu Ala 675 680 685 Gln Ser Ser Asp
Ile Leu Val Trp Asn Leu Thr Pro Arg Gln Leu Cys 690 695 700 Asp Ile
Glu Leu Ile Leu Asn Gly Gly Phe Ser Pro Leu Thr Gly Phe 705 710 715
720 Leu Asn Glu Asn Asp Tyr Ser Ser Val Val Thr Asp Ser Arg Leu Ala
725 730 735 Asp Gly Thr Leu Trp Thr Ile Pro Ile Thr Leu Asp Val Asp
Glu Ala 740 745 750 Phe Ala Asn Gln Ile Lys Pro Asp Thr Arg Ile Ala
Leu Phe Gln Asp 755 760 765 Asp Glu Ile Pro Ile Ala Ile Leu Thr Val
Gln Asp Val Tyr Lys Pro 770 775 780 Asn Lys Thr Ile Glu Ala Glu Lys
Val Phe Arg Gly Asp Pro Glu His 785 790 795 800 Pro Ala Ile Ser Tyr
Leu Phe Asn Val Ala Gly Asp Tyr Tyr Val Gly 805 810 815 Gly Ser Leu
Glu Ala Ile Gln Leu Pro Gln His Tyr Asp Tyr Pro Gly 820 825 830 Leu
Arg Lys Thr Pro Ala Gln Leu Arg Leu Glu Phe Gln Ser Arg Gln 835 840
845 Trp Asp Arg Val Val Ala Phe Gln Thr Arg Asn Pro Met His Arg Ala
850 855 860 His Arg Glu Leu Thr Val Arg Ala Ala Arg Glu Ala Asn Ala
Lys Val 865 870 875 880 Leu Ile His Pro Val Val Gly Leu Thr Lys Pro
Gly Asp Ile Asp His 885 890 895 His Thr Arg Val Arg Val Tyr Gln Glu
Ile Ile Lys Arg Tyr Pro Asn 900 905 910 Gly Ile Ala Phe Leu Ser Leu
Leu Pro Leu Ala Met Arg Met Ser Gly 915 920 925 Asp Arg Glu Ala Val
Trp His Ala Ile Ile Arg Lys Asn Tyr Gly Ala 930 935 940 Ser His Phe
Ile Val Gly Arg Asp His Ala Gly Pro Gly Lys Asn Ser 945 950 955 960
Lys Gly Val Asp Phe Tyr Gly Pro Tyr Asp Ala Gln Glu Leu Val Glu 965
970 975 Ser Tyr Lys His Glu Leu Asp Ile Glu Val Val Pro Phe Arg Met
Val 980 985 990 Thr Tyr Leu Pro Asp Glu Asp Arg Tyr Ala Pro Ile Asp
Gln Ile Asp 995 1000 1005 Thr Thr Lys Thr Arg Thr Leu Asn Ile Ser
Gly Thr Glu Leu Arg Arg 1010 1015 1020 Arg Leu Arg Val Gly Gly Glu
Ile Pro Glu Trp Phe Ser Tyr Pro Glu 1025 1030 1035 1040 Val Val Lys
Ile Leu Arg Glu Ser Asn Pro Pro Arg Pro Lys Gln Gly 1045 1050 1055
Phe Ser Ile Val Leu Gly Asn Ser Leu Thr Val Ser Arg Glu Gln Leu
1060 1065 1070 Ser Ile Ala Leu Leu Ser Thr Phe Leu Gln Phe Gly Gly
Gly Arg Tyr 1075 1080 1085 Tyr Lys Ile Phe Glu His Asn Asn Lys Thr
Glu Leu Leu Ser Leu Ile 1090 1095 1100 Gln Asp Phe Ile Gly Ser Gly
Ser Gly Leu Ile Ile Pro Asn Gln Trp 1105 1110 1115 1120 Glu Asp Asp
Lys Asp Ser Val Val Gly Lys Gln Asn Val Tyr Leu Leu 1125 1130 1135
Asp Thr Ser Ser Ser Ala Asp Ile Gln Leu Glu Ser Ala Asp Glu Pro
1140 1145 1150 Ile Ser His Ile Val Gln Lys Val Val Leu Phe Leu Glu
Asp Asn Gly 1155 1160 1165 Phe Phe Val Phe 1170 5 1488 DNA
Escherichia coli 5 atgcggggtt ctcatcatca tcatcatcat ggtatggcta
gcatggaagc gccagcagca 60 gcggaaatca gtggtcacat cgtacgttcc
ccgatggttg gtactttcta ccgcacccca 120 agcccggacg caaaagcgtt
catcgaagtg ggtcagaaag tcaacgtggg cgataccctg 180 tgcatcgttg
aagccatgaa aatgatgaac cagatcgaag cggacaaatc cggtaccgtg 240
aaagcaattc tggtcgaaag tggacaaccg gtagaatttg acgagccgct ggtcgtcatc
300 gagggatccg agctcgagat ctgcagcatg agcgtaagca tcccgcatgg
cggcacattg 360 atcaaccgtt ggaatccgga ttacccaatc gatgaagcaa
cgaaaacgat cgagctgtcc 420 aaagccgaac taagcgacct tgagctgatc
ggcacaggcg cctacagccc gctcaccggg 480 tttttaacga aagccgatta
cgatgcggtc gtagaaacga tgcgcctcgc tgatggcact 540
gtctggagca ttccgatcac gctggcggtg acggaagaaa aagcgagtga actcactgtc
600 ggcgacaaag cgaaactcgt ttatggcggc gacgtctacg gcgtcattga
aatcgccgat 660 atttaccgcc cggataaaac gaaagaagcc aagctcgtct
ataaaaccga tgaactcgct 720 cacccgggcg tgcgcaagct gtttgaaaaa
ccagatgtgt acgtcggcgg agcggttacg 780 ctcgtcaaac ggaccgacaa
aggccagttt gctccgtttt atttcgatcc ggccgaaacg 840 cggaaacgat
ttgccgaact cggctggaat accgtcgtcg gcttccaaac acgcaacccg 900
gttcaccgcg cccatgaata cattcaaaaa tgcgcgcttg aaatcgtgga cggcttgttt
960 ttaaacccgc tcgtcggcga aacgaaagcg gacgatattc cggccgacat
ccggatggaa 1020 agctatcaag tgctgctgga aaactattat ccgaaagacc
gcgttttctt gggcgtcttc 1080 caagctgcga tgcgctatgc cggtccgcgc
gaagcgattt tccatgccat ggtgcggaaa 1140 aacttcggct gcacgcactt
catcgtcggc cgcgaccatg cgggcgtcgg caactattac 1200 ggcacgtatg
atgcgcaaaa aatcttctcg aactttacag ccgaagagct tggcattaca 1260
ccgctctttt tcgaacacag cttttattgc acgaaatgcg aaggcatggc atcgacgaaa
1320 acatgcccgc acgacgcaca atatcacgtt gtcctttctg gcacgaaagt
ccgtgaaatg 1380 ttgcgtaacg gccaagtgcc gccgagcaca ttcagccgtc
cggaagtggc cgccgttttg 1440 atcaaagggc tgcaagaacg cgaaacggtc
gccccgtcag cgcgctaa 1488
6 495 PRT Escherichia coli 6
Met Arg Gly
Ser His His His His His His Gly Met Ala Ser Met Glu 1 5 10 15 Ala
Pro Ala Ala Ala Glu Ile Ser Gly His Ile Val Arg Ser Pro Met 20 25
30 Val Gly Thr Phe Tyr Arg Thr Pro Ser Pro Asp Ala Lys Ala Phe Ile
35 40 45 Glu Val Gly Gln Lys Val Asn Val Gly Asp Thr Leu Cys Ile
Val Glu 50 55 60 Ala Met Lys Met Met Asn Gln Ile Glu Ala Asp Lys
Ser Gly Thr Val 65 70 75 80 Lys Ala Ile Leu Val Glu Ser Gly Gln Pro
Val Glu Phe Asp Glu Pro 85 90 95 Leu Val Val Ile Glu Gly Ser Glu
Leu Glu Ile Cys Ser Met Ser Val 100 105 110 Ser Ile Pro His Gly Gly
Thr Leu Ile Asn Arg Trp Asn Pro Asp Tyr 115 120 125 Pro Ile Asp Glu
Ala Thr Lys Thr Ile Glu Leu Ser Lys Ala Glu Leu 130 135 140 Ser Asp
Leu Glu Leu Ile Gly Thr Gly Ala Tyr Ser Pro Leu Thr Gly 145 150 155
160 Phe Leu Thr Lys Ala Asp Tyr Asp Ala Val Val Glu Thr Met Arg Leu
165 170 175 Ala Asp Gly Thr Val Trp Ser Ile Pro Ile Thr Leu Ala Val
Thr Glu 180 185 190 Glu Lys Ala Ser Glu Leu Thr Val Gly Asp Lys Ala
Lys Leu Val Tyr 195 200 205 Gly Gly Asp Val Tyr Gly Val Ile Glu Ile
Ala Asp Ile Tyr Arg Pro 210 215 220 Asp Lys Thr Lys Glu Ala Lys Leu
Val Tyr Lys Thr Asp Glu Leu Ala 225 230 235 240 His Pro Gly Val Arg
Lys Leu Phe Glu Lys Pro Asp Val Tyr Val Gly 245 250 255 Gly Ala Val
Thr Leu Val Lys Arg Thr Asp Lys Gly Gln Phe Ala Pro 260 265 270 Phe
Tyr Phe Asp Pro Ala Glu Thr Arg Lys Arg Phe Ala Glu Leu Gly 275 280
285 Trp Asn Thr Val Val Gly Phe Gln Thr Arg Asn Pro Val His Arg Ala
290 295 300 His Glu Tyr Ile Gln Lys Cys Ala Leu Glu Ile Val Asp Gly
Leu Phe 305 310 315 320 Leu Asn Pro Leu Val Gly Glu Thr Lys Ala Asp
Asp Ile Pro Ala Asp 325 330 335 Ile Arg Met Glu Ser Tyr Gln Val Leu
Leu Glu Asn Tyr Tyr Pro Lys 340 345 350 Asp Arg Val Phe Leu Gly Val
Phe Gln Ala Ala Met Arg Tyr Ala Gly 355 360 365 Pro Arg Glu Ala Ile
Phe His Ala Met Val Arg Lys Asn Phe Gly Cys 370 375 380 Thr His Phe
Ile Val Gly Arg Asp His Ala Gly Val Gly Asn Tyr Tyr 385 390 395 400
Gly Thr Tyr Asp Ala Gln Lys Ile Phe Ser Asn Phe Thr Ala Glu Glu 405
410 415 Leu Gly Ile Thr Pro Leu Phe Phe Glu His Ser Phe Tyr Cys Thr
Lys 420 425 430 Cys Glu Gly Met Ala Ser Thr Lys Thr Cys Pro His Asp
Ala Gln Tyr 435 440 445 His Val Val Leu Ser Gly Thr Lys Val Arg Glu
Met Leu Arg Asn Gly 450 455 460 Gln Val Pro Pro Ser Thr Phe Ser Arg
Pro Glu Val Ala Ala Val Leu 465 470 475 480 Ile Lys Gly Leu Gln Glu
Arg Glu Thr Val Ala Pro Ser Ala Arg 485 490 495
7 45 DNA Artificial Sequence Description of Artificial Sequence primer 7
cccttctgca gcatgagcgt aagcatcccg catggcggca cattg 45
8 47 DNA Artificial Sequence Description of Artificial Sequence primer 8
cccgtaagct tttagcgcgc tgacggggcg accgtttcgc gttcttg 47
9 48 DNA Artificial Sequence Description of Artificial Sequence primer 9
ccccctcgag atccaaatgg aagacgccaa aaacataaag aaaggccc 48
10 45 DNA Artificial Sequence Description of Artificial Sequence primer 10
ccccctcgag atccaaatgg ctgacaaaaa catcctgtat ggccc 45
11 67 DNA Artificial Sequence Description of Artificial Sequence primer 11
ttgtagaata ccaccgtgag gagcaggcat agcggccgcc aatttggact ttccgccctt 60
cttggcc 67
12 64 DNA Artificial Sequence Description of Artificial Sequence primer 12
ttgtagaata ccaccgtgag gagcaggcat agcggccgca ccgttggtgt gtttctcgaa 60
catc 64
13 37 DNA Artificial Sequence Description of Artificial Sequence primer 13
gcggccgcta tgcctgctcc tcacggtggt attctac 37
14 49 DNA Artificial Sequence Description of Artificial Sequence primer 14
ccccaagctt ttaaaataca aaaaagccat tgtcttccaa gaataggac 49
15 49 DNA Artificial Sequence Description of Artificial Sequence primer 15
ccccggatcc atccaaatgc ctgctcctca cggtggtatt ctacaagac 49
16 62 DNA Artificial Sequence Description of Artificial Sequence primer 16
gggcctttct ttatgttttt ggcgtcttcc atagcggccg caaatacaaa aaagccattg 60
tc 62
17 41 DNA Artificial Sequence Description of Artificial Sequence primer 17
gcggccgcta tggaagacgc caaaaacata aagaaaggcc c 41
18 41 DNA Artificial Sequence Description of Artificial Sequence primer 18
ccccccatgg ttacaatttg gactttccgc ccttcttggc c 41
19 59 DNA Artificial Sequence Description of Artificial Sequence primer 19
gggccataca ggatgttttt gtcagccata gcggccgcaa atacaaaaaa gccattgtc 59
20 38 DNA Artificial Sequence Description of Artificial Sequence primer 20
gcggccgcta tggctgacaa aaacatcctg tatggccc 38
21 44 DNA Artificial Sequence Description of Artificial Sequence primer 21
ccccaagctt ctaaccgttg gtgtgtttct cgaacatctg acgc 44
22 386 PRT Bacillus stearothermophilus 22
Met Ser Val Ser Ile Pro His Gly Gly Thr Leu
Ile Asn Arg Trp Asn 1 5 10 15 Pro Asp Tyr Pro Leu Asp Glu Ala Thr
Lys Thr Ile Glu Leu Ser Lys 20 25 30 Ala Glu Leu Ser Asp Leu Glu
Leu Ile Gly Thr Gly Ala Tyr Ser Pro 35 40 45 Leu Thr Gly Phe Leu
Thr Lys Thr Asp Tyr Asp Ala Val Val Glu Thr 50 55 60 Met Arg Leu
Ser Asp Gly Thr Val Trp Ser Ile Pro Val Thr Leu Ala 65 70 75 80 Val
Thr Glu Glu Lys Ala Lys Glu Leu Ala Val Gly Asp Lys Ala Lys 85 90
95 Leu Val Tyr Arg Gly Asp Val Tyr Gly Val Ile Glu Ile Ala Asp Ile
100 105 110 Tyr Arg Pro Asp Lys Thr Lys Glu Ala Lys Leu Val Tyr Lys
Thr Asp 115 120 125 Glu Leu Ala His Pro Gly Val Arg Lys Leu Phe Glu
Lys Pro Asp Val 130 135 140 Tyr Val Gly Gly Glu Ile Thr Leu Val Lys
Arg Thr Asp Lys Gly Gln 145 150 155 160 Phe Ala Ala Phe Tyr Phe Asp
Pro Ala Glu Thr Arg Lys Lys Phe Ala 165 170 175 Glu Phe Gly Trp Asn
Thr Val Val Gly Phe Gln Thr Arg Asn Pro Val 180 185 190 His Arg Ala
His Glu Tyr Ile Gln Lys Cys Ala Leu Glu Ile Val Asp 195 200 205 Gly
Leu Phe Leu Asn Pro Leu Val Gly Glu Thr Lys Ser Asp Asp Ile 210 215
220 Pro Ala Asp Ile Arg Met Glu Ser Tyr Gln Val Leu Leu Glu Asn Tyr
225 230 235 240 Tyr Pro Lys Asp Arg Val Phe Leu Gly Val Phe Gln Ala
Ala Met Arg 245 250 255 Tyr Ala Gly Pro Arg Glu Ala Ile Phe His Ala
Met Val Arg Lys Asn 260 265 270 Phe Gly Cys Thr His Phe Ile Val Gly
Arg Asp His Ala Gly Val Gly 275 280 285 Asn Tyr Tyr Gly Thr Tyr Asp
Ala Gln Lys Ile Phe Leu Asn Phe Thr 290 295 300 Ala Glu Glu Leu Gly
Ile Thr Pro Leu Phe Phe Glu His Ser Phe Tyr 305 310 315 320 Cys Thr
Lys Cys Glu Gly Met Ala Ser Thr Lys Thr Cys Pro His Asp 325 330 335
Ala Lys Tyr His Val Val Leu Ser Gly Thr Lys Val Arg Glu Met Leu 340
345 350 Arg Asn Gly Gln Val Pro Pro Ser Thr Phe Ser Arg Pro Glu Val
Ala 355 360 365 Ala Val Leu Ile Lys Gly Leu Gln Glu Arg Glu Thr Val
Ala Pro Ser 370 375 380 Ala Arg 385
23 546 PRT Aquifex aeolicus 23
Met Glu Lys Ile Lys Tyr Leu Lys Ser Ile Gln Ile Ser Gln Arg Ser 1 5
10 15 Val Leu Asp Leu Lys Leu Leu Ala Val Gly Ala Phe Thr Pro Leu
Asp 20 25 30 Arg Phe Met Gly Glu Glu Asp Tyr Arg Asn Val Val Glu
Ser Met Arg 35 40 45 Leu Lys Ser Gly Thr Leu Phe Pro Ile Pro Ile
Thr Leu Pro Met Glu 50 55 60 Lys Glu Ile Ala Lys Asp Leu Lys Glu
Gly Glu Trp Ile Val Leu Arg 65 70 75 80 Asp Pro Lys Asn Val Pro Leu
Ala Ile Met Arg Val Glu Glu Val Tyr 85 90 95 Lys Trp Asn Leu Glu
Tyr Glu Ala Lys Asn Val Leu Gly Thr Thr Asp 100 105 110 Pro Arg His
Pro Leu Val Ala Glu Met His Thr Trp Gly Glu Tyr Tyr 115 120 125 Ile
Ser Gly Glu Leu Lys Val Ile Gln Leu Pro Lys Tyr Tyr Asp Phe 130 135
140 Pro Glu Tyr Arg Lys Thr Pro Lys Gln Val Arg Glu Glu Ile Lys Ser
145 150 155 160 Leu Gly Leu Asp Lys Ile Val Ala Phe Gln Thr Arg Asn
Pro Met His 165 170 175 Arg Val His Glu Glu Leu Thr Lys Arg Ala Met
Glu Lys Val Gly Gly 180 185 190 Gly Leu Leu Leu His Pro Val Val Gly
Leu Thr Lys Pro Gly Asp Val 195 200 205 Asp Val Tyr Thr Arg Met Arg
Ile Tyr Lys Val Leu Tyr Glu Lys Tyr 210 215 220 Tyr Asp Lys Lys Lys
Thr Ile Leu Ala Phe Leu Pro Leu Ala Met Arg 225 230 235 240 Met Ala
Gly Pro Arg Glu Ala Leu Trp His Gly Ile Ile Arg Arg Asn 245 250 255
Tyr Gly Ala Thr His Phe Ile Val Gly Arg Asp His Ala Ser Pro Gly 260
265 270 Lys Asp Ser Lys Gly Lys Pro Phe Tyr Asp Pro Tyr Glu Ala Gln
Glu 275 280 285 Leu Phe Lys Lys Tyr Glu Asp Glu Ile Gly Ile Lys Met
Val Pro Phe 290 295 300 Glu Glu Leu Val Tyr Val Pro Glu Leu Asp Gln
Tyr Val Glu Ile Asn 305 310 315 320 Glu Ala Lys Lys Arg Asn Leu Lys
Tyr Ile Asn Ile Ser Gly Thr Glu 325 330 335 Ile Arg Glu Asn Phe Leu
Lys Gln Gly Arg Lys Leu Pro Glu Trp Phe 340 345 350 Thr Arg Pro Glu
Val Ala Glu Ile Leu Ala Glu Thr Tyr Val Pro Lys 355 360 365 His Lys
Gln Gly Phe Cys Val Trp Leu Thr Gly Leu Pro Cys Ala Gly 370 375 380
Lys Ser Thr Ile Ala Glu Ile Leu Ala Thr Met Leu Gln Ala Arg Gly 385
390 395 400 Arg Lys Val Thr Leu Leu Asp Gly Asp Val Val Arg Thr His
Leu Ser 405 410 415 Arg Gly Leu Gly Phe Ser Lys Glu Asp Arg Ile Thr
Asn Ile Leu Arg 420 425 430 Val Gly Phe Val Ala Ser Glu Ile Val Lys
His Asn Gly Val Val Ile 435 440 445 Cys Ala Leu Val Ser Pro Tyr Arg
Ser Ala Arg Asn Gln Val Arg Asn 450 455 460 Met Met Glu Glu Gly Lys
Phe Ile Glu Val Phe Val Asp Ala Pro Val 465 470 475 480 Glu Val Cys
Glu Glu Arg Asp Val Lys Gly Leu Tyr Lys Lys Ala Lys 485 490 495 Glu
Gly Leu Ile Lys Gly Phe Thr Gly Val Asp Asp Pro Tyr Glu Pro 500 505
510 Pro Val Ala Pro Glu Val Arg Val Asp Thr Thr Lys Leu Thr Pro Glu
515 520 525 Glu Ser Ala Leu Lys Ile Leu Glu Phe Leu Lys Lys Glu Gly
Phe Ile 530 535 540 Lys Asp 545
24 259 PRT Pyrococcus furiosus 24
Met Val Ser Lys Pro His Gly Gly Lys Leu Ile Arg Arg Ile Ala Ala 1 5
10 15 Pro Arg Thr Arg Glu Arg Ile Leu Ser Glu Gln His Glu Tyr Pro
Arg 20 25 30 Val Gln Ile Asp His Gly Arg Ala Ile Asp Leu Glu Asn
Ile Ala His 35 40 45 Gly Val Tyr Ser Pro Leu Lys Gly Phe Leu Thr
Arg Glu Asp Phe Glu 50 55 60 Ser Val Leu Asp Tyr Met Arg Leu Ser
Asp Asp Thr Pro Trp Thr Ile 65 70 75 80 Pro Ile Val Leu Asp Val Gly
Glu Pro Thr Phe Glu Gly Gly Asp Ala 85 90 95 Ile Leu Leu Tyr Tyr
Glu Asn Pro Pro Ile Ala Arg Met His Val Glu 100 105 110 Asp Ile Tyr
Thr Tyr Asp Lys Lys Glu Phe Ala Val Lys Val Phe Lys 115 120 125 Thr
Asp Asp Pro Asn His Leu Gly Val Ala Arg Val Tyr Ser Met Gly 130 135
140 Lys Tyr Leu Val Gly Gly Gly Ile Glu Leu Leu Asn Glu Leu Pro Asn
145 150 155 160 Pro Phe Ala Lys Tyr Thr Leu Arg Pro Val Glu Thr Arg
Ile Leu Phe 165 170 175 Lys Glu Arg Gly Trp Lys Thr Ile Val Ala Phe
Gln Thr Arg Asn Val 180 185 190 Pro His Leu Gly His Glu Tyr Val Gln
Lys Ala Ala Leu Thr Phe Val 195 200 205 Asp Gly Leu Phe Ile Asn Pro
Val Leu Gly Arg Lys Lys Lys Gly Asp 210 215 220 Tyr Lys Asp Glu Val
Ile Ile Lys Ala Tyr Tyr Leu Ile Met Lys Tyr 225 230 235 240 Cys Ser
Asn Thr Thr His His Ala Ile Met Arg Lys Thr Ser Thr Ser 245 250 255
Ser Gln Thr
25 406 PRT Sulfolobus solfataricus 25
Met Asn Leu Ile
Gly His Gly Lys Val Glu Ile Val Glu Arg Ile Lys 1 5 10 15 Thr Ile
Ser Asp Phe Lys Glu Leu His Arg Ile Glu Val Lys Arg Gln 20 25 30
Leu Ala His Glu Ile Val Ser Ile Ala Tyr Gly Phe Leu Ser Pro Leu 35
40 45 Lys Gly Phe Met Asn Tyr Glu Glu Val Asp Gly Val Val Glu Asn
Met 50 55 60 Arg Leu Pro Asn Gly Val Leu Trp Pro Ile Pro Leu Val
Phe Asp Tyr 65 70 75 80 Ser Gln Asn Glu Lys Val Lys Glu Gly Asp Thr
Ile Gly Ile Thr Tyr 85 90 95 Leu Gly Lys Pro Leu Ala Ile Met Lys
Val Lys Glu Ile Phe Lys Tyr 100 105 110 Asp Lys Leu Lys Ile Ala Glu
Lys Val Tyr Lys Thr Lys Asp Ile Lys 115 120 125 His Pro Gly Val Lys
Arg Thr Leu Ser Tyr Ala Asp Ala Phe Leu Ala 130 135 140 Gly Asp Val
Trp Leu Val Arg Glu Pro Gln Phe Asn Lys Pro Tyr Ser 145 150 155 160
Glu Phe Trp Leu Thr Pro Arg Met His Arg Thr Val Phe Glu Lys Lys 165
170 175 Gly Trp Lys Arg Val Val Ala Phe Gln Thr Arg Asn Val Pro His
Thr 180 185 190 Gly His Glu Tyr Leu Met Lys Phe Ala Trp Phe Ala Ala
Asn Glu Asn 195 200 205 Gln Lys Val Asp Glu Pro Arg Thr Gly Ile Leu
Val Asn Val Val Ile 210 215 220 Gly Glu Lys Arg Val Gly Asp Tyr Ile
Asp Glu Ala Ile Leu Leu Thr 225 230 235 240 His Asp Ala Leu Ser Lys
Tyr Gly Tyr Ile Ser Pro Lys Val His Leu 245 250 255 Leu Ser Phe Thr
Leu Trp Asp Met Arg Tyr Ala Gly
Pro Arg Glu Ala 260 265 270 Leu Leu His Ala Ile Ile Arg Ser Asn Leu
Gly Cys Thr His His Val 275 280 285 Phe Gly Arg Asp His Ala Gly Val
Gly Asn Tyr Tyr Ser Pro Tyr Glu 290 295 300 Ala His Glu Ile Phe Asp
Ser Ile Asn Glu Glu Asp Leu Leu Ile Lys 305 310 315 320 Pro Ile Phe
Leu Arg Glu Asn Tyr Tyr Cys Pro Arg Cys Gly Ser Ile 325 330 335 Glu
Asn Glu Ile Leu Cys Asp His Lys Asp Glu Lys Gln Glu Phe Ser 340 345
350 Gly Ser Leu Ile Arg Ser Ile Ile Leu Asp Glu Val Lys Pro Thr Lys
355 360 365 Met Val Met Arg Pro Glu Val Tyr Asp Val Leu Met Lys Ala
Ala Glu 370 375 380 Gln Tyr Gly Phe Gly Ser Pro Phe Val Thr Glu Glu
Tyr Leu Glu Lys 385 390 395 400 Arg Gln Ser Ile Leu Gly 405
26 455 PRT Pyrobaculum aerophilum 26
Met Pro Met Pro Ala Pro Leu Glu Pro
His Gly Gly Arg Leu Val Tyr 1 5 10 15 Asn Val Ile Glu Asp Arg Asp
Lys Ala Ala Ala Met Ile Gln Gly Leu 20 25 30 Pro Ser Ile Glu Ile
Glu Pro Thr Leu Gly Pro Asp Gly Ser Pro Ile 35 40 45 Arg Asn Pro
Tyr Arg Glu Ile Met Ser Ile Ala Tyr Gly Phe Phe Ser 50 55 60 Pro
Val Glu Gly Phe Met Thr Arg Asn Glu Val Glu Ser Ile Leu Lys 65 70
75 80 Glu Arg Arg Leu Leu Asn Gly Trp Leu Phe Pro Phe Pro Leu Ile
Tyr 85 90 95 Asp Val Asp Glu Glu Lys Ile Lys Gly Ile Lys Glu Gly
Asp Ser Val 100 105 110 Leu Leu Lys Leu Lys Gly Lys Pro Leu Ala Val
Leu Asn Val Glu Glu 115 120 125 Ile Trp Arg Leu Pro Asp Arg Lys Glu
Leu Ala Asp Ala Val Phe Gly 130 135 140 Thr Pro Glu Arg Asn Lys Glu
Val Val Lys Lys Arg Phe Asp Glu Lys 145 150 155 160 His Pro Gly Trp
Leu Ile Tyr Arg Ser Met Arg Pro Met Ala Leu Ala 165 170 175 Gly Lys
Ile Thr Val Val Asn Pro Pro Arg Phe Lys Glu Pro Tyr Ser 180 185 190
Arg Phe Trp Met Pro Pro Arg Val Ser Arg Glu Tyr Val Glu Lys Lys 195
200 205 Gly Trp Arg Ile Val Val Ala His Gln Thr Arg Asn Val Pro His
Ile 210 215 220 Gly His Glu Met Leu Met Lys Arg Ala Met Phe Val Ala
Gly Gly Glu 225 230 235 240 Arg Pro Gly Asp Ala Val Leu Val Asn Ala
Ile Ile Gly Ala Lys Arg 245 250 255 Pro Gly Asp Tyr Val Asp Glu Ala
Ile Leu Glu Gly His Glu Ala Leu 260 265 270 Asn Lys Ala Gly Tyr Phe
His Pro Asp Arg His Val Val Thr Met Thr 275 280 285 Leu Trp Asp Met
Arg Tyr Gly Asn Pro Leu Glu Ser Leu Leu His Gly 290 295 300 Ile Ile
Arg Gln Asn Met Gly Ala Thr His His Met Phe Gly Arg Asp 305 310 315
320 His Ala Ala Thr Gly Asp Tyr Tyr Asp Pro Tyr Ala Thr Gln Tyr Leu
325 330 335 Trp Thr Arg Gly Leu Pro Ser Tyr Gly Leu Asn Glu Pro Pro
His Met 340 345 350 Thr Asp Lys Gly Leu Arg Ile Lys Pro Val Asn Leu
Gly Glu Phe Ala 355 360 365 Tyr Cys Pro Lys Cys Gly Glu Tyr Thr Tyr
Leu Gly Met Ser Tyr Glu 370 375 380 Gly Tyr Lys Glu Val Ala Leu Cys
Gly His Thr Pro Glu Arg Ile Ser 385 390 395 400 Gly Ser Leu Leu Arg
Gly Ile Ile Ile Glu Gly Leu Arg Pro Pro Lys 405 410 415 Val Val Met
Arg Pro Glu Val Tyr Asp Val Ile Val Lys Trp Trp Arg 420 425 430 Val
Tyr Gly Tyr Pro Tyr Val Thr Asp Lys Tyr Leu Arg Ile Lys Glu 435 440
445 Gln Glu Leu Glu Val Glu Leu 450 455
27 456 PRT Archaeoglobus fulgidus 27
Met Pro Leu Ile Lys Thr Pro Pro Pro His Gly Gly Lys Leu
Val Glu 1 5 10 15 Arg Val Val Lys Lys Arg Asp Ile Ala Glu Lys Met
Ile Ala Gly Cys 20 25 30 Pro Thr Tyr Glu Leu Lys Pro Thr Thr Leu
Pro Asp Gly Thr Pro Ile 35 40 45 Arg His Val Tyr Arg Glu Ile Met
Ser Val Cys Tyr Gly Phe Phe Ser 50 55 60 Pro Val Glu Gly Ser Met
Val Gln Asn Glu Leu Glu Arg Val Leu Asn 65 70 75 80 Glu Arg Arg Leu
Leu Ser Glu Trp Ile Phe Pro Tyr Pro Ile Leu Phe 85 90 95 Asp Ile
Ser Glu Glu Asp Tyr Lys Ala Leu Asp Val Lys Glu Gly Asp 100 105 110
Arg Leu Leu Leu Met Leu Lys Gly Gln Pro Phe Ala Thr Leu Asp Ile 115
120 125 Glu Glu Val Tyr Lys Ile Asp Pro Val Asp Val Ala Thr Arg Thr
Phe 130 135 140 Gly Thr Pro Glu Lys Asn Pro Glu Val Val Arg Glu Pro
Phe Asp Asp 145 150 155 160 Lys His Pro Gly Tyr Val Ile Tyr Lys Met
His Asn Pro Ile Ile Leu 165 170 175 Ala Gly Lys Tyr Thr Ile Val Asn
Glu Pro Lys Phe Lys Glu Pro Tyr 180 185 190 Asp Arg Phe Trp Phe Pro
Pro Ser Lys Cys Arg Glu Val Ile Lys Asn 195 200 205 Glu Lys Lys Trp
Arg Thr Val Ile Ala His Gln Thr Arg Asn Val Pro 210 215 220 His Val
Gly His Glu Met Leu Met Lys Cys Ala Ala Tyr Thr Gly Asp 225 230 235
240 Ile Glu Pro Cys His Gly Ile Leu Val Asn Ala Ile Ile Gly Ala Lys
245 250 255 Arg Arg Gly Asp Tyr Pro Asp Glu Ala Ile Leu Glu Gly His
Glu Ala 260 265 270 Val Asn Lys Tyr Gly Tyr Ile Lys Pro Glu Arg His
Met Val Thr Phe 275 280 285 Thr Leu Trp Asp Met Arg Tyr Gly Asn Pro
Ile Glu Ser Leu Leu His 290 295 300 Gly Val Ile Arg Gln Asn Met Gly
Cys Thr His His Met Phe Gly Arg 305 310 315 320 Asp His Ala Ala Val
Gly Glu Tyr Tyr Asp Met Tyr Ala Thr Gln Ile 325 330 335 Leu Trp Ser
Gln Gly Ile Pro Ser Phe Gly Phe Glu Ala Pro Pro Asn 340 345 350 Glu
Val Asp Tyr Gly Leu Lys Ile Ile Pro Gln Asn Met Ala Glu Phe 355 360
365 Trp Tyr Cys Pro Ile Cys Gln Glu Ile Ala Tyr Ser Glu Asn Cys Gly
370 375 380 His Thr Asp Ala Lys Gln Lys Phe Ser Gly Ser Phe Leu Arg
Gly Met 385 390 395 400 Val Ala Glu Gly Val Phe Pro Pro Arg Val Val
Met Arg Pro Glu Val 405 410 415 Tyr Lys Gln Ile Val Lys Trp Trp Lys
Val Tyr Asn Tyr Pro Phe Val 420 425 430 Asn Arg Lys Tyr Leu Glu Leu
Lys Asn Lys Glu Leu Glu Ile Asp Leu 435 440 445 Pro Ala Met Glu Val
Pro Lys Ala 450 455
28 573 PRT Penicillium chrysogenum 28
Met Ala
Asn Ala Pro His Gly Gly Val Leu Lys Asp Leu Leu Ala Arg 1 5 10 15
Asp Ala Pro Arg Gln Ala Glu Leu Ala Ala Glu Ala Glu Ser Leu Pro 20
25 30 Ala Val Thr Leu Thr Glu Arg Gln Leu Cys Asp Leu Glu Leu Ile
Met 35 40 45 Asn Gly Gly Phe Ser Pro Leu Glu Gly Phe Met Asn Gln
Ala Asp Tyr 50 55 60 Asp Arg Val Cys Glu Asp Asn Arg Leu Ala Asp
Gly Asn Val Phe Ser 65 70 75 80 Met Pro Ile Thr Leu Asp Ala Ser Gln
Glu Val Ile Asp Glu Lys Lys 85 90 95 Leu Gln Ala Ala Ser Arg Ile
Thr Leu Arg Asp Phe Arg Asp Asp Arg 100 105 110 Asn Leu Ala Ile Leu
Thr Ile Asp Asp Ile Tyr Arg Pro Asp Lys Thr 115 120 125 Lys Glu Ala
Lys Leu Val Phe Gly Gly Asp Pro Glu His Pro Ala Ile 130 135 140 Val
Tyr Leu Asn Asn Thr Val Lys Glu Phe Tyr Ile Gly Gly Lys Ile 145 150
155 160 Glu Ala Val Asn Lys Leu Asn His Tyr Asp Tyr Val Ala Leu Arg
Tyr 165 170 175 Thr Pro Ala Glu Leu Arg Val His Phe Asp Lys Leu Gly
Trp Ser Arg 180 185 190 Val Val Ala Phe Gln Thr Arg Asn Pro Met His
Arg Ala His Arg Glu 195 200 205 Leu Thr Val Arg Ala Ala Arg Ser Arg
Gln Ala Asn Val Leu Ile His 210 215 220 Pro Val Val Gly Leu Thr Lys
Pro Gly Asp Ile Asp His Phe Thr Arg 225 230 235 240 Val Arg Ala Tyr
Gln Ala Leu Leu Pro Arg Tyr Pro Asn Gly Met Ala 245 250 255 Val Leu
Gly Leu Leu Gly Leu Ala Met Arg Met Gly Gly Pro Arg Glu 260 265 270
Ala Ile Trp His Ala Ile Ile Arg Lys Asn His Gly Ala Thr His Phe 275
280 285 Ile Val Gly Arg Asp His Ala Gly Pro Gly Ser Asn Ser Lys Gly
Glu 290 295 300 Asp Phe Tyr Gly Pro Tyr Asp Ala Gln His Ala Val Glu
Lys Tyr Lys 305 310 315 320 Asp Glu Leu Gly Ile Glu Val Val Glu Phe
Gln Met Val Thr Tyr Leu 325 330 335 Pro Asp Thr Asp Glu Tyr Arg Pro
Val Asp Gln Val Pro Ala Gly Val 340 345 350 Lys Thr Leu Asn Ile Ser
Gly Thr Glu Leu Arg Arg Arg Leu Arg Ser 355 360 365 Gly Ala His Ile
Pro Glu Trp Phe Ser Tyr Pro Glu Val Val Lys Ile 370 375 380 Leu Arg
Glu Ser Asn Pro Pro Arg Ala Thr Gln Gly Phe Thr Ile Phe 385 390 395
400 Leu Thr Gly Tyr Met Asn Ser Gly Lys Asp Ala Ile Ala Arg Ala Leu
405 410 415 Gln Val Thr Leu Asn Gln Gln Gly Gly Arg Ser Val Ser Leu
Leu Leu 420 425 430 Gly Asp Thr Val Arg His Glu Leu Ser Ser Glu Leu
Gly Phe Thr Arg 435 440 445 Glu Asp Arg His Thr Asn Ile Gln Arg Ile
Ala Phe Val Ala Thr Glu 450 455 460 Leu Thr Arg Ala Gly Ala Ala Val
Ile Ala Ala Pro Ile Ala Pro Tyr 465 470 475 480 Glu Glu Ser Arg Lys
Phe Ala Arg Asp Ala Val Ser Gln Ala Gly Ser 485 490 495 Phe Phe Leu
Val His Val Ala Thr Pro Leu Glu His Cys Glu Gln Ser 500 505 510 Asp
Lys Arg Gly Ile Tyr Ala Ala Ala Arg Arg Gly Glu Ile Lys Gly 515 520
525 Phe Thr Gly Val Asp Asp Pro Tyr Glu Thr Pro Glu Lys Ala Asp Leu
530 535 540 Val Val Asp Phe Ser Lys Gln Ser Val Arg Ser Ile Val His
Glu Ile 545 550 555 560 Ile Leu Val Leu Glu Ser Gln Gly Phe Leu Glu
Arg Gln 565 570
29 389 PRT Aeropyrum pernix 29
Met Gly Cys Ser Val
Gly Leu Val Ser Arg Pro His Gly Gly Arg Leu 1 5 10 15 Val Arg Arg
Val Leu Ser Gly Arg Arg Arg Glu Ile Phe Glu Ser Gln 20 25 30 Tyr
Arg Glu Met Pro Arg Leu Glu Val Pro Leu Glu Arg Ala Ile Asp 35 40
45 Ala Glu Asp Leu Ala Arg Gly Val Phe Ser Pro Leu Glu Gly Phe Met
50 55 60 Val Glu Asp Asp Tyr Leu Ser Val Leu Ser Arg Met Arg Leu
Ser Asn 65 70 75 80 Asp Leu Pro Trp Thr Ile Pro Ile Val Leu Asp Ala
Asn Arg Glu Trp 85 90 95 Val Leu Asn Glu Gly Val Ser Ala Gly Asp
Asp Ile Ile Leu Thr Tyr 100 105 110 His Gly Leu Pro Ile Ala Val Leu
Thr Leu Glu Asp Ile Tyr Ser Trp 115 120 125 Asp Lys Gly Leu His Ala
Glu Lys Val Phe Lys Thr Arg Asp Pro Asn 130 135 140 His Pro Gly Val
Glu Ala Thr Tyr Lys Arg Gly Asp Ile Leu Leu Gly 145 150 155 160 Gly
Arg Leu Glu Leu Ile Gln Gly Pro Pro Asn Pro Leu Glu Arg Tyr 165 170
175 Thr Leu Trp Pro Val Glu Thr Arg Val Leu Phe Lys Glu Lys Gly Trp
180 185 190 Arg Thr Val Ala Ala Phe Gln Thr Arg Asn Val Pro His Leu
Gly His 195 200 205 Glu Tyr Val Gln Lys Ala Ala Leu Thr Phe Val Asp
Gly Leu Leu Val 210 215 220 His Pro Leu Ala Gly Trp Lys Lys Arg Gly
Asp Tyr Arg Asp Glu Val 225 230 235 240 Ile Ile Arg Ala Tyr Glu Ala
Leu Ile Thr His Tyr Tyr Pro Arg Gly 245 250 255 Val Val Val Leu Ser
Val Leu Arg Met Asn Met Asn Tyr Ala Gly Pro 260 265 270 Arg Glu Ala
Val His His Ala Ile Val Arg Lys Asn Phe Gly Ala Thr 275 280 285 His
Phe Ile Val Gly Arg Asp His Ala Gly Val Gly Ser Tyr Tyr Gly 290 295
300 Pro Tyr Glu Ala Trp Glu Ile Phe Arg Glu Phe Pro Asp Leu Gly Ile
305 310 315 320 Thr Pro Leu Phe Val Arg Glu Ala Tyr Tyr Cys Arg Arg
Cys Gly Gly 325 330 335 Met Val Asn Glu Lys Val Cys Pro His Gly Asp
Glu Tyr Arg Val Arg 340 345 350 Ile Ser Gly Thr Arg Leu Arg Glu Met
Leu Gly Arg Gly Glu Arg Pro 355 360 365 Pro Glu Tyr Met Met Arg Pro
Glu Val Ala Asp Ala Ile Ile Ser His 370 375 380 Pro Asp Pro Phe Ile
385
30 511 PRT Saccharomyces cerevisiae 30
Met Pro Ala Pro His Gly
Gly Ile Leu Gln Asp Leu Ile Ala Arg Asp 1 5 10 15 Ala Leu Lys Lys
Asn Glu Leu Leu Ser Glu Ala Gln Ser Ser Asp Ile 20 25 30 Leu Val
Trp Asn Leu Thr Pro Arg Gln Leu Cys Asp Ile Glu Leu Ile 35 40 45
Leu Asn Gly Gly Phe Ser Pro Leu Thr Gly Phe Leu Asn Glu Asn Asp 50
55 60 Tyr Ser Ser Val Val Thr Asp Ser Arg Leu Ala Asp Gly Thr Leu
Trp 65 70 75 80 Thr Ile Pro Ile Thr Leu Asp Val Asp Glu Ala Phe Ala
Asn Gln Ile 85 90 95 Lys Pro Asp Thr Arg Ile Ala Leu Phe Gln Asp
Asp Glu Ile Pro Ile 100 105 110 Ala Ile Leu Thr Val Gln Asp Val Tyr
Lys Pro Asn Lys Thr Ile Glu 115 120 125 Ala Glu Lys Val Phe Arg Gly
Asp Pro Glu His Pro Ala Ile Ser Tyr 130 135 140 Leu Phe Asn Val Ala
Gly Asp Tyr Tyr Val Gly Gly Ser Leu Glu Ala 145 150 155 160 Ile Gln
Leu Pro Gln His Tyr Asp Tyr Pro Gly Leu Arg Lys Thr Pro 165 170 175
Ala Gln Leu Arg Leu Glu Phe Gln Ser Arg Gln Trp Asp Arg Val Val 180
185 190 Ala Phe Gln Thr Arg Asn Pro Met His Arg Ala His Arg Glu Leu
Thr 195 200 205 Val Arg Ala Ala Arg Glu Ala Asn Ala Lys Val Leu Ile
His Pro Val 210 215 220 Val Gly Leu Thr Lys Pro Gly Asp Ile Asp His
His Thr Arg Val Arg 225 230 235 240 Val Tyr Gln Glu Ile Ile Lys Arg
Tyr Pro Asn Gly Ile Ala Phe Leu 245 250 255 Ser Leu Leu Pro Leu Ala
Met Arg Met Ser Gly Asp Arg Glu Ala Val 260 265 270 Trp His Ala Ile
Ile Arg Lys Asn Tyr Gly Ala Ser His Phe Ile Val 275 280 285 Gly Arg
Asp His Ala Gly Pro Gly Lys Asn Ser Lys Gly Val Asp Phe 290 295 300
Tyr Gly Pro Tyr Asp Ala Gln Glu Leu Val Glu Ser Tyr Lys His Glu 305
310 315 320 Leu Asp Ile Glu Val Val Pro Phe Arg Met Val Thr Tyr Leu
Pro Asp 325 330 335 Glu Asp Arg Tyr Ala Pro Ile Asp Gln Ile Asp Thr
Thr Lys Thr Arg 340 345 350 Thr Leu Asn Ile Ser Gly Thr Glu Leu Arg
Arg Arg Leu Arg Val Gly 355 360 365 Gly Glu Ile Pro Glu Trp Phe Ser
Tyr Pro Glu Val Val Lys Ile Leu 370 375 380 Arg Glu Ser Asn Pro Pro
Arg Pro Lys Gln Gly Phe Ser Ile Val Leu 385 390 395 400
Gly Asn Ser Leu Thr Val Ser Arg Glu Gln Leu Ser Ile Ala Leu
Leu 405 410 415 Ser Thr Phe Leu Gln Phe Gly Gly Gly Arg Tyr Tyr Lys
Ile Phe Glu 420 425 430 His Asn Asn Lys Thr Glu Leu Leu Ser Leu Ile
Gln Asp Phe Ile Gly 435 440 445 Ser Gly Ser Gly Leu Ile Ile Pro Asn
Gln Trp Glu Asp Asp Lys Asp 450 455 460 Ser Val Val Gly Lys Gln Asn
Val Tyr Leu Leu Asp Thr Ser Ser Ser 465 470 475 480 Ala Asp Ile Gln
Leu Glu Ser Ala Asp Glu Pro Ile Ser His Ile Val 485 490 495 Gln Lys
Val Val Leu Phe Leu Glu Asp Asn Gly Phe Phe Val Phe 500 505 510
31 309 PRT Thermomonospora fusca 31
Met Ser Gln Val Ser Asp Ala Val
Gly Arg Tyr Gln Leu Ser Gln Leu 1 5 10 15 Asp Phe Leu Glu Ala Glu
Ala Ile Phe Ile Met Arg Glu Val Ala Ala 20 25 30 Glu Phe Glu Arg
Pro Val Leu Leu Phe Ser Gly Gly Lys Asp Ser Val 35 40 45 Val Met
Leu Arg Ile Ala Glu Lys Ala Phe Trp Pro Ala Pro Ile Pro 50 55 60
Phe Pro Val Met His Val Asp Thr Gly His Asn Phe Pro Glu Val Ile 65
70 75 80 Glu Phe Arg Asp Lys Arg Val Ala Glu Leu Gly Val Arg Leu
Ile Val 85 90 95 Ala Ser Val Gln Asp Leu Ile Asp Ala Gly Lys Val
Val Glu Pro Lys 100 105 110 Gly Arg Trp Ala Ser Arg Asn Arg Leu Gln
Thr Ala Ala Leu Leu Glu 115 120 125 Ala Ile Glu Lys Tyr Gly Phe Asp
Ala Ala Phe Gly Gly Ala Arg Arg 130 135 140 Asp Glu Glu Lys Ala Arg
Ala Lys Glu Arg Val Phe Ser Phe Arg Asp 145 150 155 160 Glu Phe Gly
Gln Trp Asp Pro Lys Asn Gln Arg Pro Glu Leu Trp Asn 165 170 175 Leu
Tyr Asn Thr Arg Val His Arg Gly Glu Asn Ile Arg Val Phe Pro 180 185
190 Leu Ser Asn Trp Thr Glu Leu Asp Val Trp His Tyr Ile Arg Arg Glu
195 200 205 Gly Leu Arg Leu Pro Ser Ile Tyr Phe Ala His Arg Arg Arg
Val Phe 210 215 220 Glu Arg Asp Gly Ile Leu Leu Pro Asp Ser Pro Tyr
Val Thr Arg Asp 225 230 235 240 Glu Asp Glu Glu Val Phe Glu Ala Ser
Val Arg Tyr Arg Thr Val Gly 245 250 255 Asp Met Thr Cys Thr Gly Ala
Val Leu Ser Thr Ala Thr Thr Leu Asp 260 265 270 Glu Val Ile Ala Glu
Ile Ala Ala Thr Arg Ile Thr Glu Arg Gly Gln 275 280 285 Thr Arg Ala
Asp Asp Arg Gly Ser Glu Ala Ala Met Glu Glu Arg Lys 290 295 300 Arg
Glu Gly Tyr Phe 305
* * * * *