U.S. patent application number 13/002709 was filed with the patent office on 2011-09-22 for enzyme-pore constructs.
This patent application is currently assigned to OXFORD NANOPORE TECHNOLOGIES LIMITED. Invention is credited to Hagan Bayley, Stephen Cheley, James Clarke, Lakmal Jayasinghe, Brian Mckeown, James White.
Application Number | 20110229877 13/002709 |
Family ID | 41161355 |
Filed Date | 2011-09-22 |
United States Patent Application | 20110229877 |
Kind Code | A1 |
Jayasinghe; Lakmal; et al. | September 22, 2011 |
ENZYME-PORE CONSTRUCTS
Abstract
The invention relates to constructs comprising a transmembrane
protein pore subunit and a nucleic acid handling enzyme. The pore
subunit is covalently attached to the enzyme such that both the
subunit and enzyme retain their activity. The constructs can be
used to generate transmembrane protein pores having a nucleic acid
handling enzyme attached thereto. Such pores are particularly
useful for sequencing nucleic acids. The enzyme handles the nucleic
acid in such a way that the pore can detect its component
nucleotides by stochastic sensing.
Inventors: |
Jayasinghe; Lakmal (Oxford, GB); Bayley; Hagan (Oxford, GB);
Cheley; Stephen (East Lansing, MI); Mckeown; Brian (Chipping Norton, GB);
White; James (Oxford, GB); Clarke; James (Oxford, GB) |
Assignee: |
OXFORD NANOPORE TECHNOLOGIES LIMITED, Kidlington, GB |
Family ID: |
41161355 |
Appl. No.: |
13/002709 |
Filed: |
July 6, 2009 |
PCT Filed: |
July 6, 2009 |
PCT NO: |
PCT/GB09/01679 |
371 Date: |
May 13, 2011 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61078695 | Jul 7, 2008 | |
Current U.S.
Class: |
435/6.1 ;
435/188; 536/23.2 |
Current CPC
Class: |
C12N 9/1247 20130101;
C12N 9/16 20130101; C12N 9/90 20130101; C12Q 1/6869 20130101; C12N
9/52 20130101; C12Q 1/6869 20130101; C07K 14/31 20130101; C12N
9/127 20130101; C12N 9/1276 20130101; C12N 9/96 20130101; C12N 9/22
20130101; C12Q 2565/631 20130101; C12N 9/1252 20130101 |
Class at
Publication: |
435/6.1 ;
435/188; 536/23.2 |
International
Class: |
C12N 9/96 20060101
C12N009/96; C12Q 1/68 20060101 C12Q001/68; C07H 21/04 20060101
C07H021/04 |
Claims
1. A construct comprising a transmembrane protein pore subunit and
a nucleic acid handling enzyme, wherein the subunit is covalently
attached to the enzyme, wherein the subunit retains its ability to
form a pore and wherein the enzyme retains its ability to handle
nucleic acids.
2. A construct according to claim 1, wherein (a) the enzyme is
attached to the subunit at more than one point; (b) the enzyme is
genetically fused to the subunit; (c) the amino acid sequence of
the enzyme is added in frame into the amino acid sequence of the
subunit; or (d) the enzyme is chemically fused to the subunit.
3-5. (canceled)
6. A construct according to claim 1, wherein the enzyme is attached
to the pore by one or more linkers, optionally amino acid
linkers.
7. (canceled)
8. A construct according to claim 1, wherein the subunit is derived
from .alpha.-hemolysin (.alpha.-HL) or the subunit comprises the
sequence shown in SEQ ID NO: 2 or a variant thereof.
9. (canceled)
10. A construct according to claim 1, wherein the nucleic acid
handling enzyme is (a) a nuclease and wherein the nuclease is
optionally a member of any of the Enzyme Classification (EC) groups
3.1.11, 3.1.13, 3.1.14, 3.1.15, 3.1.16, 3.1.21, 3.1.22, 3.1.25,
3.1.26, 3.1.27, 3.1.30 and 3.1.31; (b) an exonuclease and wherein
the enzyme optionally comprises the sequence shown in any one of
SEQ ID NOs: 10, 12, 14 and 16 or a variant thereof; (c) a
polymerase and wherein the polymerase is optionally (i) a member of
any of the Enzyme Classification (EC) groups 2.7.7.6, 2.7.7.7,
2.7.7.19, 2.7.7.48 and 2.7.7.49 or (ii) a DNA-dependent DNA
polymerase, an RNA-dependent DNA polymerase, a DNA-dependent RNA
polymerase or an RNA-dependent RNA polymerase; (d) a helicase
and wherein the helicase is optionally (i) a member of any of the
Enzyme Classification (EC) groups 3.6.1.- and 2.7.7 or (ii) an
ATP-dependent DNA helicase, an ATP-dependent RNA helicase or an
ATP-independent RNA helicase; or (e) a topoisomerase and wherein
the topoisomerase is optionally (i) a member of any of the Enzyme
Classification (EC) groups 5.99.1.2 and 5.99.1.3 or (ii) a
gyrase.
11-13. (canceled)
14. A construct according to claim 1, wherein the construct
comprises the sequence shown in any one of SEQ ID NOs: 18, 20, 22,
24, 26, 28 and 30 or a variant thereof.
15-19. (canceled)
20. A polynucleotide sequence which encodes a construct according
to claim 2, wherein the polynucleotide optionally comprises the
sequence shown in any one of SEQ ID NOs: 17, 19, 21, 23, 25, 27 and
29 or a variant thereof.
21. (canceled)
22. A modified pore for use in sequencing nucleic acids, comprising
at least one construct that comprises a transmembrane protein pore
subunit and a nucleic acid handling enzyme, wherein the subunit is
covalently attached to the enzyme, wherein the subunit retains its
ability to form a pore and wherein the enzyme retains its ability
to handle nucleic acids, wherein the pore optionally comprises a
construct of claim 4 and six subunits comprising the sequence shown
in SEQ ID NO: 2 or a variant thereof.
23. (canceled)
24. A pore according to claim 22, wherein (a) all seven subunits
have a glutamine at position 139 of SEQ ID NO: 2 and one of the
subunits has a cysteine at position 135; (b) all seven subunits
have an arginine at position 113 of SEQ ID NO: 2; and/or (c) the
pore comprises a molecular adaptor that facilitates an interaction
between the pore and one or more nucleotide(s) and wherein the
molecular adaptor is optionally (i) a cyclodextrin or a derivative
thereof or (ii) heptakis-6-amino-.beta.-cyclodextrin
(am.sub.7-.beta.CD), 6-monodeoxy-6-monoamino-.beta.-cyclodextrin
(am.sub.1-.beta. CD) or heptakis-(6-deoxy-6-guanidino)-cyclodextrin
(gu.sub.7-.beta.CD).
25-28. (canceled)
29. A kit for producing a modified pore for use in sequencing
nucleic acids, comprising: (a) at least one construct comprising a
transmembrane protein pore subunit and a nucleic acid handling
enzyme, wherein the subunit is covalently attached to the enzyme,
wherein the subunit retains its ability to form a pore and wherein
the enzyme retains its ability to handle nucleic acids and the
remaining subunits needed to form a pore; or (b) at least one
polynucleotide according to claim 7 and polynucleotide sequences
encoding any remaining subunits needed to form a pore.
30. A kit according to claim 29, wherein the kit comprises: (a) a
construct comprising a transmembrane protein pore subunit and a
nucleic acid handling enzyme, wherein the subunit is covalently
attached to the enzyme, wherein the subunit retains its ability to
form a pore, wherein the enzyme retains its ability to handle
nucleic acids, and wherein the subunit is derived from
.alpha.-hemolysin (.alpha.-HL) or the subunit comprises the
sequence shown in SEQ ID NO: 2 or a variant thereof and six subunits
each comprising the sequence shown in SEQ ID NO: 2 or a variant
thereof.
31-32. (canceled)
33. A method of producing a construct according to claim 1,
comprising: (a) covalently attaching a nucleic acid handling enzyme
to a transmembrane protein pore subunit; and (b) determining
whether or not the resulting construct is capable of forming a pore
and handling nucleic acids, wherein the enzyme is optionally
attached to the subunit before the subunit forms part of a pore
(post expression modification) or after the subunit has formed part
of a pore (post oligomerisation modification) and/or wherein step
(a) optionally comprises: (i) providing a polynucleotide that
encodes the construct, wherein the polynucleotide optionally
comprises the sequence shown in any one of SEQ ID NOs: 17, 19, 21,
23, 25, 27 and 29 or a variant thereof; and (ii) expressing the
polynucleotide sequence.
34-35. (canceled)
36. A method of producing a modified pore according to claim 22,
comprising: (a) covalently attaching a nucleic acid handling enzyme
to a transmembrane protein pore; and (b) determining whether or not
the resulting pore is capable of handling nucleic acids and
detecting nucleotides; or comprising: (a) allowing at least one
construct to form a pore with other suitable subunits, wherein the
construct comprises a transmembrane protein pore subunit and a
nucleic acid handling enzyme, wherein the subunit is covalently
attached to the enzyme, wherein the subunit retains its ability to
form a pore and wherein the enzyme retains its ability to handle
nucleic acids; and (b) determining whether or not the resulting
pore is capable of handling nucleic acids and detecting
nucleotides.
37. (canceled)
38. A method of purifying a transmembrane pore comprising at least
one construct according to claim 1, comprising: (a) providing the
at least one construct and the other subunits required to form the
pore; (b) oligomerising the at least one construct and other
subunits on synthetic lipid vesicles; and (c) contacting the
vesicles with a non-ionic surfactant; and (d) recovering the
oligomerised pore, wherein the synthetic vesicles optionally
comprise 30% cholesterol, 30% phosphatidylcholine (PC), 20%
phosphatidylethanolamine (PE), 10% sphingomyelin (SM) and 10%
phosphatidylserine (PS) and/or wherein the non-ionic surfactant is
optionally an Octyl Glucoside (OG) or DoDecyl Maltoside (DDM)
detergent.
39-40. (canceled)
41. A method of sequencing a target nucleic acid sequence,
comprising: (a) contacting the target sequence with a pore
according to claim 24, which comprises an exonuclease, such that
the exonuclease digests an individual nucleotide from one end of
the target sequence; (b) contacting the nucleotide with the pore so
that the nucleotide interacts with the adaptor; (c) measuring the
current passing through the pore during the interaction and thereby
determining the identity of the nucleotide; and (d) repeating steps
(a) to (c) at the same end of the target sequence and thereby
determining the sequence of the target sequence.
42. A method of sequencing a target nucleic acid sequence,
comprising: (a) contacting the target sequence with a pore
according to claim 22 so that the enzyme pushes or pulls the target
sequence through the pore and a proportion of the nucleotides in
the target sequence interacts with the pore; and (b) measuring the
current passing through the pore during each interaction and
thereby determining the sequence of the target sequence.
43. A kit according to claim 29, wherein the kit comprises a
polynucleotide encoding a construct comprising a transmembrane
protein pore subunit and a nucleic acid handling enzyme, wherein
the subunit is covalently attached to the enzyme, wherein the
subunit retains its ability to form a pore and wherein the enzyme
retains its ability to handle nucleic acids, and six
polynucleotides each encoding a subunit comprising the sequence
shown in SEQ ID NO: 2 or a variant thereof.
Description
FIELD OF THE INVENTION
[0001] The invention relates to constructs comprising a
transmembrane protein pore subunit and a nucleic acid handling
enzyme. The pore subunit is covalently attached to the enzyme such
that both the subunit and enzyme retain their activity. The
constructs can be used to generate transmembrane protein pores
having a nucleic acid handling enzyme attached thereto. Such pores
are particularly useful for sequencing nucleic acids. The enzyme
handles the nucleic acid in such a way that the pore can detect
each of its component nucleotides by stochastic sensing.
BACKGROUND OF THE INVENTION
[0002] Stochastic detection is an approach to sensing that relies
on the observation of individual binding events between analyte
molecules and a receptor. Stochastic sensors can be created by
placing a single pore of nanometer dimensions in an insulating
membrane and measuring voltage-driven ionic transport through the
pore in the presence of analyte molecules. The frequency of
occurrence of fluctuations in the current reveals the concentration
of an analyte that binds within the pore. The identity of an
analyte is revealed through its distinctive current signature,
notably the duration and extent of current block (Braha, O.,
Walker, B., Cheley, S., Kasianowicz, J. J., Song, L., Gouaux, J.
E., and Bayley, H. (1997) Chem. Biol. 4, 497-505; and Bayley, H.,
and Cremer, P. S. (2001) Nature 413, 226-230).
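The two read-outs described in this paragraph, event frequency (which tracks analyte concentration) and the duration and extent of each current block (which identify the analyte), can be illustrated with a minimal sketch. It assumes a digitised current trace in picoamps sampled at a fixed interval; the function names, the threshold of 80% of the open-pore current and the example trace are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BlockEvent:
    start_s: float     # time at which the current first fell below threshold
    dwell_s: float     # duration of the block
    block_frac: float  # mean fractional block relative to the open-pore current


def find_events(current_pa: Sequence[float], dt_s: float, open_pa: float,
                threshold_frac: float = 0.8) -> List[BlockEvent]:
    """Return each contiguous run of samples below threshold_frac * open_pa."""
    threshold = threshold_frac * open_pa
    events: List[BlockEvent] = []
    start = None
    # A trailing open-pore sample acts as a sentinel so a final event is closed.
    for i, sample in enumerate(list(current_pa) + [open_pa]):
        if sample < threshold and start is None:
            start = i
        elif sample >= threshold and start is not None:
            run = current_pa[start:i]
            mean_blocked = sum(run) / len(run)
            events.append(BlockEvent(start * dt_s, (i - start) * dt_s,
                                     1.0 - mean_blocked / open_pa))
            start = None
    return events


def event_rate_hz(events: List[BlockEvent], total_s: float) -> float:
    """Event frequency, the quantity that scales with analyte concentration."""
    return len(events) / total_s


# Hypothetical 20-sample trace at 1 ms per sample with two blocks of different depth.
trace = [100.0] * 5 + [40.0] * 3 + [100.0] * 4 + [70.0] * 2 + [100.0] * 6
for event in find_events(trace, dt_s=0.001, open_pa=100.0):
    print(event)
```

A real recording would need filtering and baseline tracking before thresholding; the fixed threshold here is only the simplest choice that makes the frequency, dwell and amplitude read-outs concrete.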
[0003] Engineered versions of the bacterial pore forming toxin
.alpha.-hemolysin (.alpha.-HL) have been used for stochastic
sensing of many classes of molecules (Bayley, H., and Cremer, P. S.
(2001) Nature 413, 226-230; Shin, S., H., Luchian, T., Cheley, S.,
Braha, O., and Bayley, H. (2002) Angew. Chem. Int. Ed. 41,
3707-3709; and Guan, X., Gu, L.-Q., Cheley, S., Braha, O., and
Bayley, H. (2005) ChemBioChem 6, 1875-1881). In the course of these
studies, it was found that attempts to engineer .alpha.-HL to bind
small organic analytes directly can prove taxing, with rare
examples of success (Guan, X., Gu, L.-Q., Cheley, S., Braha, O.,
and Bayley, H. (2005) ChemBioChem 6, 1875-1881). Fortunately, a
different strategy was discovered, which utilized non-covalently
attached molecular adaptors, notably cyclodextrins (Gu, L.-Q.,
Braha, O., Conlan, S., Cheley, S., and Bayley, H. (1999) Nature
398, 686-690), but also cyclic peptides (Sanchez-Quesada, J.,
Ghadiri, M. R., Bayley, H., and Braha, O. (2000) J. Am. Chem. Soc.
122, 11758-11766) and cucurbiturils (Braha, O., Webb, J., Gu,
L.-Q., Kim, K., and Bayley, H. (2005) ChemPhysChem 6, 889-892).
Cyclodextrins become transiently lodged in the .alpha.-HL pore and
produce a substantial but incomplete channel block. Organic
analytes, which bind within the hydrophobic interiors of
cyclodextrins, augment this block allowing analyte detection (Gu,
L.-Q., Braha, O., Conlan, S., Cheley, S., and Bayley, H. (1999)
Nature 398, 686-690).
[0004] There is currently a need for rapid and cheap DNA or RNA
sequencing technologies across a wide range of applications.
Existing technologies are slow and expensive mainly because they
rely on amplification techniques to produce large volumes of
nucleic acid and require a high quantity of specialist fluorescent
chemicals for signal detection. Stochastic sensing has the
potential to provide rapid and cheap DNA sequencing by reducing the
quantity of nucleotide and reagents required.
SUMMARY OF THE INVENTION
[0005] The inventors have surprisingly demonstrated that covalent
attachment of a transmembrane protein pore subunit to a nucleic
acid handling enzyme results in a construct that is capable of both
forming a pore and handling nucleic acids. The inventors have also
surprisingly demonstrated that the construct can be used to
generate a transmembrane protein pore that is capable of both
handling a nucleic acid and sequencing the nucleic acid via
stochastic sensing. The fixed nature and close proximity of the
enzyme to the pore means that a proportion of the nucleotides in a
target nucleic acid will interact with the pore and affect the
current flowing through the pore in a distinctive manner. As a
result, transmembrane protein pores comprising such constructs are
useful tools for stochastic sensing and especially for sequencing
nucleic acids.
[0006] Accordingly, the invention provides a construct comprising a
transmembrane protein pore subunit and a nucleic acid handling
enzyme, wherein the subunit is covalently attached to the enzyme,
wherein the subunit retains its ability to form a pore and wherein
the enzyme retains its ability to handle nucleic acids. The
invention also provides: [0007] a polynucleotide sequence which
encodes a construct of the invention; [0008] a modified pore for
use in sequencing nucleic acids, comprising at least one construct
of the invention; [0009] a kit for producing a modified pore for
use in sequencing nucleic acids, comprising:
[0010] (a) at least one construct of the invention; and
[0011] (b) any remaining subunits needed to form a pore; [0012] a
kit for producing a modified pore for use in sequencing nucleic
acids, comprising:
[0013] (a) at least one polynucleotide of the invention; and
[0014] (b) polynucleotide sequences encoding any remaining subunits
needed to form a pore; [0015] a method of producing a construct of
the invention, comprising:
[0016] (a) covalently attaching a nucleic acid handling enzyme to a
transmembrane protein pore subunit; and
[0017] (b) determining whether or not the resulting construct is
capable of forming a pore and handling nucleic acids; [0018] a
method of producing a modified pore of the invention,
comprising:
[0019] (a) covalently attaching a nucleic acid handling enzyme to a
transmembrane protein pore; and
[0020] (b) determining whether or not the resulting pore is capable
of handling nucleic acids and detecting nucleotides; [0021] a method
of producing a modified pore of the invention, comprising:
[0022] (a) allowing at least one construct of the invention to form
a pore with other suitable subunits; and
[0023] (b) determining whether or not the resulting pore is capable
of handling nucleic acids and detecting nucleotides; [0024] a
method of purifying a transmembrane pore comprising at least one
construct of the invention, comprising:
[0025] (a) providing the at least one construct and the other
subunits required to form the pore;
[0026] (b) oligomerising the at least one construct and other
subunits on synthetic lipid vesicles; and
[0027] (c) contacting the vesicles with a non-ionic surfactant;
and
[0028] (d) recovering the oligomerised pore; [0029] a method of
sequencing a target nucleic acid sequence, comprising:
[0030] (a) contacting the target sequence with a pore of the
invention, which comprises an exonuclease, such that the
exonuclease digests an individual nucleotide from one end of the
target sequence;
[0031] (b) contacting the nucleotide with the pore so that the
nucleotide interacts with the adaptor;
[0032] (c) measuring the current passing through the pore during
the interaction and thereby determining the identity of the
nucleotide; and
[0033] (d) repeating steps (a) to (c) at the same end of the target
sequence and thereby determining the sequence of the target
sequence; and
[0034] a method of sequencing a target nucleic acid sequence,
comprising:
[0035] (a) contacting the target sequence with a pore of the
invention so that the enzyme pushes or pulls the target sequence
through the pore and a proportion of the nucleotides in the target
sequence interacts with the pore; and
[0036] (b) measuring the current passing through the pore during
each interaction and thereby determining the sequence of the target
sequence.
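The exonuclease method of steps [0030] to [0033] releases and identifies one nucleotide at a time from a single end, so the order of the base calls depends on which end is digested. The sketch below illustrates only that bookkeeping. The `call_base` argument is a hypothetical stand-in for the measurement and identification of steps (b) and (c), and nothing here models the pore, the adaptor or the current.

```python
from typing import Callable, List


def sequence_by_digestion(strand_5to3: str, digest_from: str = "3'",
                          call_base: Callable[[str], str] = lambda nt: nt) -> str:
    """Repeat release-and-call from one end and report the read 5' to 3'."""
    if digest_from not in ("3'", "5'"):
        raise ValueError("digest_from must be \"3'\" or \"5'\"")
    remaining: List[str] = list(strand_5to3)
    calls: List[str] = []
    while remaining:
        # Step (a): the exonuclease removes the terminal nucleotide.
        released = remaining.pop() if digest_from == "3'" else remaining.pop(0)
        # Steps (b) and (c): placeholder for detection and identification.
        calls.append(call_base(released))
    # Calls arrive in order of release; a 3'-to-5' digest yields them reversed.
    if digest_from == "3'":
        calls.reverse()
    return "".join(calls)


print(sequence_by_digestion("GATTACA", digest_from="3'"))
```

The reversal step is the design point: an enzyme working in the 3'-5' direction reports the last base of the strand first, so the raw call order has to be flipped before it is compared with a 5'-to-3' reference.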
DESCRIPTION OF THE FIGURES
[0037] FIG. 1 shows how exonuclease enzymes catalyse the hydrolysis
of phosphodiester bonds. Within the active site of the exonuclease,
a water molecule is enabled to react with the phosphate of the 3'
end of the polynucleotide (DNA). Cleavage of the bond between the
phosphate and the sugar towards the 5' end releases a monophosphate
(deoxy)nucleoside.
[0038] FIG. 2 shows the crystal structures of exonucleases used in
the Example. The N- and C-termini and active sites are shown for each.
i) Adapted form of EcoExoIII; ii) EcoExoI; iii) TthRecJ-cd; and iv)
Lambda exo.
[0039] FIG. 3 shows a cartoon of an exonuclease equipped .alpha.-HL
pore. The exonuclease is genetically fused to one of the seven
monomers of the heptamer, with linker arms sufficiently long to
enable correct protein folding of the exonuclease moiety and the
.alpha.-HL moiety.
[0040] FIG. 4 shows a generic image of the protein construct
generated, with the BspEI insertion point(s) in the .alpha.-HL
gene. Ligation of AfuExoIII, bounded by two stretches of DNA encoding
a (serine/glycine).sub.x5 repeat (shown hatched), generates a
construct from which a 64.5 kDa fusion protein will be expressed under
the transcriptional control of the T7 promoter shown.
[0041] FIG. 5 shows the oligomerisation of .alpha.-HL Loop 1 fusion
constructs with wild-type .alpha.-HL at different protein ratios.
i) HL-wt-EcoExoIII-L1-H6; ii) HL-RQC-EcoExoI-L1-H6; and iii)
HL-RQC-TthRecJ-L1-H6.
[0042] FIG. 6 shows the control of homo and heteroheptamer
generation by different monomer ratios. HL-RQ subunits are shown in
white and fusion subunits in black. Increasing the ratio of fusion
subunits to wild-type subunits increases the generation of 2:5, 1:6
and 0:7 hetero and homo-heptamers. Similarly increasing the
concentration of HL-RQ monomer increases the generation of 6:1 and
5:2 heteroheptamers.
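The dependence of heptamer composition on monomer ratio can be illustrated with a simple binomial model. This is a sketch under a strong assumption that is not stated in the application: fusion and HL-RQ monomers are taken to enter each heptamer independently and with equal efficiency. The function name is hypothetical.

```python
from math import comb
from typing import List


def heptamer_distribution(fusion_fraction: float, n_subunits: int = 7) -> List[float]:
    """P(k fusion subunits per pore) for k = 0..n_subunits under random assembly."""
    p = fusion_fraction
    return [comb(n_subunits, k) * p ** k * (1 - p) ** (n_subunits - k)
            for k in range(n_subunits + 1)]


# Example: one fusion monomer for every five HL-RQ monomers in the mixture.
distribution = heptamer_distribution(1 / 6)
for k, prob in enumerate(distribution):
    print(f"{7 - k}:{k} (HL-RQ:fusion)  P = {prob:.3f}")
```

Under this model, raising the HL-RQ fraction shifts probability towards the 7:0 and 6:1 species, which is the qualitative trend the figure describes. Real assembly may deviate if the fused enzyme hinders oligomerisation.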
[0043] FIG. 7 shows the oligomerisation of HL-RQC-EcoExoIII-L1-H6
fusion proteins that contain a stiff polyproline EcoExoIII
C-terminus linker. IVTT expressed proteins mixed in a 5:1 wild-type
to fusion protein ratio in the presence of purified rabbit red
blood cell membranes. i) HL-RQC-EcoExoIII-L1-{SG}5+{SG}5-H6; ii)
HL-RQC-EcoExoIII-L1-{SG}5+5P-H6; iii)
HL-RQC-EcoExoIII-L1-4SG+5P-H6; and iv) HL monomers.
[0044] FIG. 8 shows the Loop 2 region of a single .alpha.-hemolysin
subunit with the mature heptamer. Subunit 1 shown in white,
subunits 2-7 shown in grey and the loop 2 region of subunit 1 shown
in black.
[0045] FIG. 9 shows the oligomerisation of alternative Loop 2
EcoExoIII fusion proteins. i) HL-(RQ).sub.7; ii)
HL-(RQ).sub.6(RQC-EcoExoIII-L2a-H6).sub.1; iii)
HL-(RQ).sub.6(RQC-EcoExoIII-L2a-8P-H6).sub.1; iv)
HL-(RQ).sub.6(RQC-EcoExoIII-L2-H48.DELTA.-H6).sub.1; v)
HL-(RQ).sub.6(RQC-EcoExoIII-L2-D45.DELTA.-H6).sub.1; vi)
HL-(RQ).sub.6(RQC-EcoExoIII-L2-D45-K46.DELTA.-H6).sub.1; and vii)
HL-(RQ).sub.6(RQC-EcoExoIII-L2-D45-N47.DELTA.-H6).sub.1.
[0046] FIG. 10 shows the oligomerisation of alternative Loop 2
EcoExoIII fusion proteins. i) HL-(RQ).sub.7; ii)
HL-(RQ).sub.6(RQC-EcoExoIII-L2a-H6).sub.1; iii)
HL-(RQ).sub.6(RQC-EcoExoIII-L2-D45-N47.DELTA.-H6).sub.1; iv)
HL-(RQ).sub.6(RQC-EcoExoIII-L2-D46-K56.DELTA.-H6).sub.1; v)
HL-(RQ).sub.6(RQC-EcoExoIII-L2-D46.DELTA.-H6).sub.1; vi)
HL-(RQ).sub.6(RQC-EcoExoIII-L2-D46-N47.DELTA.-H6).sub.1; vii)
HL-(RQ).sub.6(RQC-EcoExoIII-L2-A1-S16.DELTA./D46-N47.DELTA.-H6).sub.1;
viii) HL-(RQ).sub.6(RQC-EcoExoIII-L2-F42-D46.DELTA.-H6).sub.1; and
ix) HL-(RQ).sub.6(RQC-EcoExoIII-L2-I43-D46.DELTA.-H6).sub.1.
[0047] FIG. 11 shows the oligomerisation of EcoExoI C-terminus
fusion proteins. a) denotes both hemolysin and enzyme-fusion
protein monomers are radiolabelled, b) denotes only the fusion
protein monomer is radiolabelled. i)
HL-(RQ).sub.6(RQC-EcoExoI-Cter-{SG}8-H6).sub.1; ii)
HL-(RQ).sub.6(RQC-EcoExoI-Cter-DG{SG}8-H6).sub.1; iii)
HL-(RQ).sub.6(RQC-EcoExoI-Cter-WPV{SG}8-H6).sub.1; iv)
HL-(RQ).sub.6(RQC-EcoExoI-Cter-DGS{P}12-H6).sub.1; and v)
HL-(RQ).sub.6(RQC-EcoExoI-Cter-WPV{P}12-H6).sub.1.
[0048] FIG. 12 shows the effect of different surfactants on
EcoExoIII activity. Left graph--Sodium dodecyl sulphate (SDS): a;
0%, b; 0.1%, c; 0.5%. Right graph--n-Dodecyl-D-maltopyranoside
(DDM): a; 0%, b; 0.1%, c; 0.25%, d; 0.5%.
[0049] FIG. 13 shows the oligomerisation of E. coli BL21 (DE3)
pLysS expressed .alpha.-hemolysin monomers for formation and
purification of preferentially 6:1 heteroheptamers. His-tag
purification is used to select between heteroheptamers and
wild-type homoheptamer to give a large excess of 6:1
heteroheptamer.
[0050] FIG. 14 shows the exonuclease activity of monomer and
heteroheptamer fusion proteins. Left graph--Activity of Wild-type
and fusion monomers: a, 10.sup.-2 dilution HL-RQC-EcoExoIII-L1-H6;
b, 10.sup.-4 dilution HL-RQC-EcoExoIII-L1-H6; c, 10.sup.-6
dilution HL-RQC-EcoExoIII-L1-H6; d, 10.sup.-2 dilution HL-RQ.
Right graph--Activity of HL-(RQ).sub.6(RQC-EcoExoIII-L1-H6).sub.1:
a, DDM crude extract; b, Ni-NTA purified; c, Ni-NTA purified and
buffer exchange.
[0051] FIG. 15 shows base detection by the
HL-(RQ).sub.6(RQC-EcoExoIII-L2-D46-N47.DELTA.-H6).sub.1
heteroheptamer. The top trace was obtained from a heteroheptamer
with a covalently attached am.sub.6-amPDP.sub.1-.beta.CD adapter
molecule. Further blocking events can be seen and ascribed to
individual mono-phosphate nucleosides for base discrimination. The
bottom graph shows the corresponding histograms of dNMP events from
the top trace. Peaks, from left to right, correspond to G, T, A, C
respectively. Data acquired at 400/400 mM KCl, 180 mV and 10 .mu.M
dNMPs.
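The assignment of histogram peaks to bases described for FIG. 15 can be sketched as a nearest-peak classifier. The peak order G, T, A, C comes from the figure description. The sketch assumes that "left to right" means increasing value on the histogram axis, and the numeric peak centres and event levels are hypothetical placeholders, not measured data.

```python
from collections import Counter
from typing import Dict, Iterable, List

PEAK_ORDER = "GTAC"  # left-to-right peak order reported for FIG. 15


def label_peaks(peak_centres: Iterable[float]) -> Dict[str, float]:
    """Assign the four sorted peak centres to G, T, A and C in that order."""
    centres = sorted(peak_centres)
    if len(centres) != len(PEAK_ORDER):
        raise ValueError("expected exactly four peak centres")
    return dict(zip(PEAK_ORDER, centres))


def call_events(event_levels: Iterable[float], peaks: Dict[str, float]) -> List[str]:
    """Call each event as the base whose peak centre is closest to its level."""
    return [min(peaks, key=lambda base: abs(peaks[base] - level))
            for level in event_levels]


# Hypothetical peak centres and event levels in arbitrary current units.
peaks = label_peaks([42.0, 30.0, 38.0, 34.0])
calls = call_events([29.5, 34.2, 41.0, 37.6], peaks)
print(calls, Counter(calls))
```

Nearest-centre calling ignores peak width and overlap; a practical classifier would weight each call by the fitted distributions, which this sketch does not attempt.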
DESCRIPTION OF THE SEQUENCE LISTING
[0052] SEQ ID NO: 1 shows the polynucleotide sequence encoding one
subunit of wild-type .alpha.-hemolysin (.alpha.-HL).
[0053] SEQ ID NO: 2 shows the amino acid sequence of one subunit of
wild-type .alpha.-HL. Amino acids 2 to 6, 73 to 75, 207 to 209, 214
to 216 and 219 to 222 form .alpha.-helices. Amino acids 22 to 30,
35 to 44, 52 to 62, 67 to 71, 76 to 91, 98 to 103, 112 to 123, 137
to 148, 154 to 159, 165 to 172, 229 to 235, 243 to 261, 266 to 271,
285 to 286 and 291 to 293 form .beta.-strands. All the other
non-terminal amino acids, namely 7 to 21, 31 to 34, 45 to 51, 63 to
66, 72, 92 to 97, 104 to 111, 124 to 136, 149 to 153, 160 to 164,
173 to 206, 210 to 213, 217, 218, 223 to 228, 236 to 242, 262 to
265, 272 to 274 and 287 to 290 form loop regions. Amino acids 1 and
294 are terminal amino acids.
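Residue-range annotations of this kind are easy to mistranscribe, so it can help to check that the helix, strand, loop and terminal ranges together cover every residue exactly once. The sketch below does this for the ranges listed above for SEQ ID NO: 2. The helper names are hypothetical and the ranges are copied from this paragraph as written.

```python
from collections import Counter
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) residue numbers

HELICES: List[Range] = [(2, 6), (73, 75), (207, 209), (214, 216), (219, 222)]
STRANDS: List[Range] = [(22, 30), (35, 44), (52, 62), (67, 71), (76, 91), (98, 103),
                        (112, 123), (137, 148), (154, 159), (165, 172), (229, 235),
                        (243, 261), (266, 271), (285, 286), (291, 293)]
LOOPS: List[Range] = [(7, 21), (31, 34), (45, 51), (63, 66), (72, 72), (92, 97),
                      (104, 111), (124, 136), (149, 153), (160, 164), (173, 206),
                      (210, 213), (217, 218), (223, 228), (236, 242), (262, 265),
                      (272, 274), (287, 290)]
TERMINAL: List[Range] = [(1, 1), (294, 294)]


def check_tiling(length: int, *annotations: List[Range]) -> Tuple[List[int], List[int]]:
    """Return (residues in no range, residues in more than one range)."""
    counts = Counter(res for ranges in annotations
                     for first, last in ranges
                     for res in range(first, last + 1))
    unassigned = [res for res in range(1, length + 1) if counts[res] == 0]
    duplicated = sorted(res for res, n in counts.items() if n > 1)
    return unassigned, duplicated


unassigned, duplicated = check_tiling(294, HELICES, STRANDS, LOOPS, TERMINAL)
print("unassigned:", unassigned)
print("assigned more than once:", duplicated)
```

The same function can be reused for the enzyme annotations in the later sequence descriptions by substituting their ranges and lengths.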
[0054] SEQ ID NO: 3 shows the polynucleotide sequence encoding one
subunit of .alpha.-HL M113R/N139Q (HL-RQ).
[0055] SEQ ID NO: 4 shows the amino acid sequence of one subunit of
.alpha.-HL M113R/N139Q (HL-RQ). The same amino acids that form
.alpha.-helices, .beta.-strands and loop regions in wild-type
.alpha.-HL form the corresponding regions in this subunit.
[0056] SEQ ID NO: 5 shows the pT7 .alpha.-HL BspEI knockout
polynucleotide sequence (pT7-SC1_BspEI-KO). The .alpha.-HL encoding
sequence is between nucleotides 2709 and 3593. The BspEI remnant is
at nucleotides 3781 and 3782.
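The coordinates given for the .alpha.-HL encoding sequence can be checked against the 294-residue subunit of SEQ ID NO: 2 with a short calculation. The sketch assumes that the stated range is inclusive at both ends and that it includes a stop codon; neither point is stated in the text, and the function name is hypothetical.

```python
from typing import Tuple


def coding_span(first_nt: int, last_nt: int) -> Tuple[int, int, int]:
    """Return (length in nucleotides, whole codons, leftover nucleotides)."""
    length = last_nt - first_nt + 1  # inclusive coordinates
    codons, remainder = divmod(length, 3)
    return length, codons, remainder


length, codons, remainder = coding_span(2709, 3593)
# Assumption: the final codon is a stop codon, so residues = codons - 1.
print(length, codons, remainder, codons - 1)
```

Under these assumptions the span is a whole number of codons, and the residue count it implies can be compared directly with the subunit length given for SEQ ID NO: 2.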
[0057] SEQ ID NO: 6 shows the polynucleotide sequence encoding one
subunit of wild-type .alpha.-hemolysin containing a BspEI cloning
site at position 1 (L1).
[0058] SEQ ID NO: 7 shows the polynucleotide sequence encoding one
subunit of wild-type .alpha.-hemolysin containing a BspEI cloning
site at position 2 (L2a).
[0059] SEQ ID NO: 8 shows the polynucleotide sequence encoding one
subunit of wild-type .alpha.-hemolysin containing a BspEI cloning
site at position 2 (L2b).
[0060] SEQ ID NO: 9 shows the codon optimized polynucleotide
sequence derived from the xthA gene from E. coli. It encodes the
exonuclease III enzyme from E. coli.
[0061] SEQ ID NO: 10 shows the amino acid sequence of the
exonuclease III enzyme from E. coli. This enzyme performs
distributive digestion of 5' monophosphate nucleosides from one
strand of double stranded DNA (dsDNA) in a 3'-5' direction. Enzyme
initiation on a strand requires a 5' overhang of approximately 4
nucleotides. Amino acids 11 to 13, 15 to 25, 39 to 41, 44 to 49, 85
to 89, 121 to 139, 158 to 160, 165 to 174, 181 to 194, 198 to 202,
219 to 222, 235 to 240 and 248 to 252 form .alpha.-helices. Amino
acids 2 to 7, 29 to 33, 53 to 57, 65 to 70, 75 to 78, 91 to 98, 101
to 109, 146 to 151, 195 to 197, 229 to 234 and 241 to 246 form
.beta.-strands. All the other non-terminal amino acids, 8 to 10, 26
to 28, 34 to 38, 42, 43, 50 to 52, 58 to 64, 71 to 74, 79 to 84,
90, 99, 100, 110 to 120, 140 to 145, 152 to 157, 161 to 164, 175 to
180, 203 to 218, 223 to 228, 247 and 253 to 261, form loops. Amino
acids 1, 267 and 268 are terminal amino acids. The enzyme active
site is formed by loop regions connecting
.beta..sub.1-.alpha..sub.1, .beta..sub.3-.beta..sub.4,
.beta..sub.5-.beta..sub.6, .beta..sub.III-.alpha..sub.I,
.beta..sub.IV-.alpha..sub.II and .beta..sub.V-.beta..sub.VI
(consisting of amino acids 8-10, 58-64, 90, 110-120, 152-164,
175-180, 223-228 and 253-261 respectively). A single divalent metal
ion is bound at residue E34 and aids nucleophilic attack on the
phosphodiester bond by the D229 and H259 histidine-aspartate
catalytic pair.
[0062] SEQ ID NO: 11 shows the codon optimized polynucleotide
sequence derived from the sbcB gene from E. coli. It encodes the
exonuclease I enzyme (EcoExoI) from E. coli.
[0063] SEQ ID NO: 12 shows the amino acid sequence of exonuclease I
enzyme (EcoExoI) from E. coli. This enzyme performs processive
digestion of 5' monophosphate nucleosides from single stranded DNA
(ssDNA) in a 3'-5' direction. Enzyme initiation on a strand
requires at least 12 nucleotides. Amino acids 60 to 68, 70 to 78,
80 to 93, 107 to 119, 124 to 128, 137 to 148, 165 to 172, 182 to
211, 213 to 221, 234 to 241, 268 to 286, 313 to 324, 326 to 352,
362 to 370, 373 to 391, 401 to 454 and 457 to 475 form
.alpha.-helices. Amino acids 10 to 18, 28 to 36, 47 to 50, 97 to
101, 133 to 136, 229 to 232, 243 to 251, 258 to 263, 298 to 302 and
308 to 311 form .beta.-strands. All the other non-terminal amino
acids, 19 to 27, 37 to 46, 51 to 59, 69, 79, 94 to 96, 102 to 106,
120 to 123, 129 to 132, 149 to 164, 173 to 181, 212, 222 to 228,
233, 242, 252 to 257, 264 to 267, 287 to 297, 303 to 307, 312, 325,
353 to 361, 371, 372, 392 to 400, 455 and 456, form loops. Amino
acids 1 to 9 are terminal amino acids. The overall fold of the
enzyme is such that three regions combine to form a molecule with
the appearance of the letter C, although residues 355-358,
disordered in the crystal structure, effectively convert this C
into an O-like shape. The amino terminus (1-206) forms the
exonuclease domain and has homology to the DnaQ superfamily, the
following residues (202-354) form an SH3-like domain and the
carboxyl domain (359-475) extends the exonuclease domain to form
the C-like shape of the molecule. Four acidic residues of EcoExoI
are conserved with the active site residues of the DnaQ superfamily
(corresponding to D15, E17, D108 and D186). It is suggested that a
single metal ion is bound by residues D15 and D108. Hydrolysis of
DNA is likely catalyzed by attack on the scissile phosphate by an
activated water molecule, with H181 being the catalytic residue and
aligning the nucleotide substrate.
[0064] SEQ ID NO: 13 shows the codon optimized polynucleotide
sequence derived from the recJ gene from T. thermophilus. It
encodes the RecJ enzyme from T. thermophilus (TthRecJ-cd).
[0065] SEQ ID NO: 14 shows the amino acid sequence of the RecJ
enzyme from T. thermophilus (TthRecJ-cd). This enzyme performs
processive digestion of 5' monophosphate nucleosides from ssDNA in
a 5'-3' direction. Enzyme initiation on a strand requires at least
4 nucleotides. Amino acids 19 to 33, 44 to 61, 80 to 89, 103 to
111, 136 to 140, 148 to 163, 169 to 183, 189 to 202, 207 to 217,
223 to 240, 242 to 252, 254 to 287, 302 to 318, 338 to 350 and 365
to 382 form .alpha.-helices. Amino acids 36 to 40, 64 to 68, 93 to
96, 116 to 120, 133 to 135, 294 to 297, 321 to 325, 328 to 332, 352
to 355 and 359 to 363 form .beta.-strands. All the other
non-terminal amino acids, 34, 35, 41 to 43, 62, 63, 69 to 79, 90 to
92, 97 to 102, 112 to 115, 121 to 132, 141 to 147, 164 to 168, 184
to 188, 203 to 206, 218 to 222, 241, 253, 288 to 293, 298 to 301,
319, 320, 326, 327, 333 to 337, 351 to 358 and 364, form loops.
Amino acids 1 to 18 and 383 to 425 are terminal amino acids. The
crystal structure has only been resolved for the core domain of
RecJ from Thermus thermophilus (residues 40-463). To ensure
initiation of translation and in vivo expression of the RecJ core
domain, a methionine residue was added at its amino terminus; this
is absent from the crystal structure information. The resolved
structure shows two domains, an amino (2-253) and a carboxyl
(288-463) region, connected by a long .alpha.-helix (254-287). The
catalytic residues (D46, D98, H122, and D183) co-ordinate a single
divalent metal ion for nucleophilic attack on the phosphodiester
bond. D46 and H120 are proposed to be the catalytic pair; however,
mutation of any of these conserved residues in the E. coli RecJ was
shown to abolish activity.
[0066] SEQ ID NO: 15 shows the codon optimized polynucleotide
sequence derived from the bacteriophage lambda exo (redX) gene. It
encodes the bacteriophage lambda exonuclease.
[0067] SEQ ID NO: 16 shows the amino acid sequence of the
bacteriophage lambda exonuclease. The sequence is one of three
identical subunits that assemble into a trimer. The enzyme performs
highly processive digestion of nucleotides from one strand of
dsDNA, in a 3'-5' direction. Enzyme initiation on a strand
preferentially requires a 5' overhang of approximately 4
nucleotides with a 5' phosphate. Amino acids 3 to 10, 14 to 16, 22
to 26, 34 to 40, 52 to 67, 75 to 95, 135 to 149, 152 to 165 and 193
to 216 form .alpha.-helices. Amino acids 100 to 101, 106 to 107,
114 to 116, 120 to 122, 127 to 131, 169 to 175 and 184 to 190 form
.beta.-strands. All the other non-terminal amino acids, 11 to 13,
17 to 21, 27 to 33, 41 to 51, 68 to 74, 96 to 99, 102 to 105, 108
to 113, 117 to 119, 123 to 126, 132 to 134, 150 to 151, 166 to 168,
176 to 183, 191 to 192, 217 to 222, form loops. Amino acids 1, 2
and 226 are terminal amino acids. Lambda exonuclease is a
homo-trimer that forms a toroid with a tapered channel through the
middle, apparently large enough for dsDNA to enter at one end and
only ssDNA to exit at the other. The catalytic residues are
undetermined but a single divalent metal ion appears bound at each
subunit by residues D119, E129 and L130.
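By way of illustration only, the interior loop regions listed for
SEQ ID NO: 16 can be recovered as the gaps between consecutive
.alpha.-helices and .beta.-strands. The following is a minimal
sketch; the function name interior_loops is hypothetical, and the
ranges are copied from the paragraph above (1-based, inclusive). The
final loop (217 to 222) follows the last helix rather than lying
between two structured elements, so it is not produced by this
gap calculation.

```python
# Secondary structure ranges for SEQ ID NO: 16 as listed in the text
# (1-based, inclusive).
HELICES = [(3, 10), (14, 16), (22, 26), (34, 40), (52, 67),
           (75, 95), (135, 149), (152, 165), (193, 216)]
STRANDS = [(100, 101), (106, 107), (114, 116), (120, 122),
           (127, 131), (169, 175), (184, 190)]


def interior_loops(helices, strands):
    """Return the gaps between consecutive structured elements.

    Hypothetical helper: any residues lying between the end of one
    helix or strand and the start of the next are reported as a loop.
    """
    elements = sorted(helices + strands)
    loops = []
    for (_, prev_end), (next_start, _) in zip(elements, elements[1:]):
        if next_start > prev_end + 1:
            loops.append((prev_end + 1, next_start - 1))
    return loops


loops = interior_loops(HELICES, STRANDS)
for start, end in loops:
    print(f"{start} to {end}")
```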
[0068] SEQ ID NO: 17 shows the polynucleotide sequence encoding
HL-wt-EcoExoIII-L1-H6 used in the Example.
[0069] SEQ ID NO: 18 shows the amino acid sequence of one subunit
of HL-wt-EcoExoIII-L1-H6 used in the Example.
[0070] SEQ ID NO: 19 shows the polynucleotide sequence encoding
HL-RQC-EcoExoIII-L1-H6 used in the Example.
[0071] SEQ ID NO: 20 shows the amino acid sequence of one subunit
of HL-RQC-EcoExoIII-L1-H6 used in the Example.
[0072] SEQ ID NO: 21 shows the polynucleotide sequence encoding
HL-RQC-EcoExoI-L1-H6 used in the Example.
[0073] SEQ ID NO: 22 shows the amino acid sequence of one subunit
of HL-RQC-EcoExoI-L1-H6 used in the Example.
[0074] SEQ ID NO: 23 shows the polynucleotide sequence encoding
HL-RQC-TthRecJ-L1-H6 used in the Example.
[0075] SEQ ID NO: 24 shows the amino acid sequence of one subunit
of HL-RQC-TthRecJ-L1-H6 used in the Example.
[0076] SEQ ID NO: 25 shows the polynucleotide sequence encoding
HL-RQC-EcoExoIII-L2-D45-N47.DELTA.-H6 used in the Example.
[0077] SEQ ID NO: 26 shows the amino acid sequence of one subunit
of HL-RQC-EcoExoIII-L2-D45-N47.DELTA.-H6 used in the Example.
[0078] SEQ ID NO: 27 shows the polynucleotide sequence encoding
HL-RQC-EcoExoI-Cter-{SG}8-H6 used in the Example.
[0079] SEQ ID NO: 28 shows the amino acid sequence of one subunit
of HL-RQC-EcoExoI-Cter-{SG}8-H6 used in the Example.
[0080] SEQ ID NO: 29 shows the polynucleotide sequence encoding
HL-RQC-EcoExoI-Cter-DG{SG}8-H6 used in the Example.
[0081] SEQ ID NO: 30 shows the amino acid sequence of one subunit
of HL-RQC-EcoExoI-Cter-DG{SG}8-H6 used in the Example.
[0082] SEQ ID NOs: 31 and 32 show the oligonucleotide sequences
used in the exonuclease assay of the Example.
DETAILED DESCRIPTION OF THE INVENTION
[0083] It is to be understood that different applications of the
disclosed products and methods may be tailored to the specific
needs in the art. It is also to be understood that the terminology
used herein is for the purpose of describing particular embodiments
of the invention only, and is not intended to be limiting.
[0084] In addition, as used in this specification and the appended
claims, the singular forms "a", "an", and "the" include plural
referents unless the content clearly dictates otherwise. Thus, for
example, reference to "a construct" includes "constructs",
reference to "a transmembrane protein pore" includes two or more
such pores, reference to "a molecular adaptor" includes two or more
such adaptors, and the like.
[0085] All publications, patents and patent applications cited
herein, whether supra or infra, are hereby incorporated by
reference in their entirety.
Constructs
[0086] The present invention provides constructs that are useful
for sequencing nucleic acids. The constructs comprise a
transmembrane protein pore subunit and a nucleic acid handling
enzyme. The subunit is covalently attached to the enzyme. The
constructs of the invention are useful tools for forming pores that
are capable of sequencing nucleic acids by stochastic sensing. The
constructs of the invention are particularly useful for generating
transmembrane protein pores that can both handle a target nucleic
acid sequence and discriminate between the different nucleotides in
the target sequence. As described in more detail below, the enzyme
handles a target nucleic acid in such a way that the pore can
identify nucleotides in the target sequence and thereby sequence
the target sequence.
[0087] The subunit retains its ability to form a pore. The ability
of a construct to form a pore can be assayed using any method known
in the art. For instance, the construct may be inserted into a
membrane along with other appropriate subunits and its ability to
oligomerize to form a pore may be determined. Methods are known in
the art for inserting constructs and subunits into membranes, such
as lipid bilayers. For example, constructs and subunits may be
suspended in a purified form in a solution containing a lipid
bilayer such that they diffuse to the lipid bilayer and are inserted
by binding to the lipid bilayer and assembling into a functional
state. Alternatively, constructs and subunits may be directly
inserted into the membrane using the "pick and place" method
described in M. A. Holden, H. Bayley. J. Am. Chem. Soc. 2005, 127,
6502-6503 and International Application No. PCT/GB2006/001057
(published as WO 2006/100484). The ability of a construct to form a
pore is typically assayed as described in the Examples.
[0088] The enzyme retains its ability to handle nucleic acids. This
allows the construct to form a pore that may be used to sequence
nucleic acids as described below. The ability of a construct to
handle nucleic acids can be assayed using any method known in the
art. For instance, constructs or pores formed from the constructs
can be tested for their ability to handle specific sequences of
nucleic acids. The ability of a construct or a pore to handle
nucleic acids is typically assayed as described in the
Examples.
[0089] A construct of the invention may form part of a pore.
Alternatively, a construct may be isolated, substantially isolated,
purified or substantially purified. A construct is isolated or
purified if it is completely free of any other components, such as
lipids or other pore monomers. A construct is substantially
isolated if it is mixed with carriers or diluents which will not
interfere with its intended use. For instance, a construct is
substantially isolated or substantially purified if it is present in a
form that comprises less than 10%, less than 5%, less than 2% or
less than 1% of other components, such as lipids or other pore
monomers. A construct of the invention may be present in a lipid
bilayer.
Attachment
[0090] The subunit is covalently attached to the enzyme. The
subunit may be attached to the enzyme at more than one, such as two
or three, points. Attaching the subunit to the enzyme at more than
one point can be used to constrain the mobility of the enzyme. For
instance, multiple attachments may be used to constrain the freedom
of the enzyme to rotate or its ability to move away from the
subunit.
[0091] The subunit may be in a monomeric form when it is attached
to the enzyme (post expression modification). Alternatively, the
subunit may be part of an oligomeric pore when it is attached to an
enzyme (post oligomerisation modification).
[0092] The subunit can be covalently attached to the enzyme using
any method known in the art. The subunit and enzyme may be produced
separately and then attached together. The two components may be
attached in any configuration. For instance, they may be attached
via their terminal (i.e. amino or carboxy terminal) amino acids.
Suitable configurations include, but are not limited to, the amino
terminus of the enzyme being attached to the carboxy terminus of
the subunit and vice versa. Alternatively, the two components may
be attached via amino acids within their sequences. For instance,
the enzyme may be attached to one or more amino acids in a loop
region of the subunit. In a preferred embodiment, terminal amino
acids of the enzyme are attached to one or more amino acids in the
loop region of a subunit. Terminal amino acids and loop regions are
discussed above.
[0093] In one preferred embodiment, the subunit is genetically
fused to the enzyme. A subunit is genetically fused to an enzyme if
the whole construct is expressed from a single polynucleotide
sequence. The coding sequences of the subunit and enzyme may be
combined in any way to form a single polynucleotide sequence
encoding the construct.
[0094] The subunit and enzyme may be genetically fused in any
configuration. The subunit and enzyme may be fused via their
terminal amino acids. For instance, the amino terminus of the
enzyme may be fused to the carboxy terminus of the subunit and vice
versa. The amino acid sequence of the enzyme is preferably added in
frame into the amino acid sequence of the subunit. In other words,
the enzyme is preferably inserted within the sequence of the
subunit. In such embodiments, the subunit and enzyme are typically
attached at two points, i.e. via the amino and carboxy terminal
amino acids of the enzyme. If the enzyme is inserted within the
sequence of the subunit, it is preferred that the amino and carboxy
terminal amino acids of the enzyme are in close proximity and are
each attached to adjacent amino acids in the sequence of the
subunit or variant thereof. In a preferred embodiment, the enzyme
is inserted into a loop region of the subunit.
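By way of illustration only, an in-frame insertion of an enzyme
between two adjacent amino acids of a subunit may be sketched at the
sequence level as follows. The function insert_enzyme and the toy
sequences are hypothetical placeholders and do not correspond to SEQ
ID NO: 2 or to any enzyme described herein; the optional linkers
flanking the enzyme are likewise assumptions made for the example.

```python
def insert_enzyme(subunit, enzyme, after_pos, linker_n="", linker_c=""):
    """Insert an enzyme sequence after 1-based position `after_pos`.

    The enzyme's amino terminus is joined (via linker_n) to residue
    `after_pos` of the subunit, and its carboxy terminus is joined
    (via linker_c) to residue `after_pos + 1`, giving two attachment
    points on adjacent subunit residues.
    """
    if not 1 <= after_pos < len(subunit):
        raise ValueError("insertion site must lie inside the subunit")
    return (subunit[:after_pos] + linker_n + enzyme + linker_c
            + subunit[after_pos:])


# Toy placeholder sequences (NOT real subunit or enzyme sequences).
toy_subunit = "ACDEFGHIKLMNPQRSTVWY"
toy_enzyme = "MMMM"

# Insert between residues 18 and 19 with a short SG linker each side.
construct = insert_enzyme(toy_subunit, toy_enzyme, 18, "SG", "SG")
print(construct)
```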
[0095] In another preferred embodiment, the subunit is chemically
fused to the enzyme. A subunit is chemically fused to an enzyme if
the two parts are chemically attached, for instance via a linker
molecule.
[0096] The subunit may be transiently attached to the enzyme by a
hex-his tag or Ni-NTA. The subunit and enzyme may also be modified
such that they transiently attach to each other.
[0097] The construct retains the pore forming ability of the
subunit. The pore forming ability of the subunit is typically
provided by its .alpha.-helices and .beta.-strands. .beta.-barrel
pores comprise a barrel or channel that is formed from
.beta.-strands, whereas .alpha.-helix bundle pores comprise a
barrel or channel that is formed from .alpha.-helices. The
.alpha.-helices and .beta.-strands are typically connected by loop
regions. In order to avoid affecting the pore forming ability of
the subunit, the enzyme is preferably genetically fused to a loop
region of the subunit or inserted into a loop region of the
subunit. The loop regions of specific subunits are discussed in
more detail below.
[0098] Similarly, the construct retains the nucleic acid handling
ability of the enzyme, which is also typically provided by its
secondary structural elements (.alpha.-helices and .beta.-strands)
and tertiary structural elements. In order to avoid affecting the
nucleic acid handling ability of the enzyme, the enzyme is
preferably genetically fused to the subunit or inserted into the
subunit via residues or regions that do not affect its secondary
or tertiary structure.
[0099] The subunit may be attached directly to the enzyme. The
subunit is preferably attached to the enzyme using one or more,
such as two or three, linkers. The one or more linkers may be
designed to constrain the mobility of the enzyme. The linkers may
be attached to one or more reactive cysteine residues, reactive
lysine residues or non-natural amino acids in the subunit and/or
enzyme. Suitable linkers are well-known in the art. Suitable
linkers include, but are not limited to, chemical crosslinkers and
peptide linkers. Preferred linkers are amino acid sequences (i.e.
peptide linkers). The length, flexibility and hydrophilicity of the
peptide linker are typically designed such that it does not
disturb the functions of the subunit and enzyme. Preferred flexible
peptide linkers are stretches of 2 to 20, such as 4, 6, 8, 10 or
16, serine and/or glycine amino acids. More preferred flexible
linkers include (SG).sub.1, (SG).sub.2, (SG).sub.3, (SG).sub.4,
(SG).sub.5 and (SG).sub.8 wherein S is serine and G is glycine.
Preferred rigid linkers are stretches of 2 to 30, such as 4, 6, 8,
16 or 24, proline amino acids. More preferred rigid linkers include
(P).sub.12 wherein P is proline.
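By way of illustration only, the preferred peptide linkers named
above can be written out as one-letter amino acid strings. The
helper names flexible_linker and rigid_linker are hypothetical.

```python
def flexible_linker(repeats):
    """Return (SG)n as a one-letter string, S = serine, G = glycine."""
    return "SG" * repeats


def rigid_linker(length):
    """Return a polyproline stretch of the given length."""
    return "P" * length


# The more preferred flexible linkers (SG)1 to (SG)5 and (SG)8.
preferred_flexible = {n: flexible_linker(n) for n in (1, 2, 3, 4, 5, 8)}
# The more preferred rigid linker (P)12.
preferred_rigid = rigid_linker(12)

for n, seq in preferred_flexible.items():
    print(f"(SG){n}: {seq} ({len(seq)} amino acids)")
print(f"(P)12: {preferred_rigid}")
```

Each of these flexible linkers falls within the stated range of 2 to
20 amino acids, and (P).sub.12 falls within the stated range of 2 to
30 prolines.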
[0100] Linkers may be attached to the subunit first and then the
enzyme, the enzyme first and then the subunit or the enzyme and
subunit at the same time. When the linker is attached to the
subunit, it may be a monomeric subunit, part of an oligomer of two
or more monomers or part of a complete oligomeric pore. It is
preferred that the linker is reacted before any purification step
to remove any unbound linker.
[0101] A preferred method of attaching the subunit to the enzyme is
via cysteine linkage. This can be mediated by a bi-functional
chemical linker or by a polypeptide linker with a terminal
presented cysteine residue. .alpha.-HL (SEQ ID NO: 2) lacks native
cysteine residues so the introduction of a cysteine into the
sequence of SEQ ID NO: 2 enables the controlled covalent attachment
of the enzyme to the subunit. Cysteines can be introduced at
various positions, such as position K8, T9 or N17 of SEQ ID NO: 2
or at the carboxy terminus of SEQ ID NO: 2. The length, reactivity,
specificity, rigidity and solubility of any bi-functional linker
may be designed to ensure that the enzyme is positioned correctly in
relation to the subunit and the function of both the subunit and
enzyme is retained. Suitable linkers include bismaleimide
crosslinkers, such as 1,4-bis(maleimido)butane (BMB) or
bis(maleimido)hexane. One drawback of bi-functional linkers is the
requirement of the enzyme to contain no further surface accessible
cysteine residues, as binding of the bi-functional linker to these
cannot be controlled and may affect substrate binding or activity.
If the enzyme does contain several accessible cysteine residues,
modification of the enzyme may be required to remove them while
ensuring the modifications do not affect the folding or activity of
the enzyme. In a preferred embodiment, a reactive cysteine is
presented on a peptide linker that is genetically attached to the
enzyme. This means that additional modifications will not
necessarily be needed to remove other accessible cysteine residues
from the enzyme. The reactivity of cysteine residues may be
enhanced by modification of the adjacent residues, for example on a
peptide linker. For instance, the basic groups of flanking
arginine, histidine or lysine residues will change the pKa of the
cysteine's thiol group to that of the more reactive S.sup.- group.
The reactivity of cysteine residues may be protected by thiol
protective groups such as DTNB. These may be reacted with one or
more cysteine residues of the enzyme or subunit, either as a
monomer or part of an oligomer, before a linker is attached.
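By way of illustration only, the effect of a lowered thiol pKa on
the proportion of cysteine present as the more reactive S.sup.-
(thiolate) form can be estimated with the Henderson-Hasselbalch
relationship. The pKa and pH values used below are illustrative
assumptions only, not measured values for any construct described
herein, and the function name thiolate_fraction is hypothetical.

```python
def thiolate_fraction(pka, ph):
    """Fraction of a thiol present as thiolate (S-) at a given pH.

    Uses the Henderson-Hasselbalch relationship:
    fraction = 1 / (1 + 10 ** (pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Illustrative, assumed values: an unperturbed thiol pKa of 8.5 and
# a pKa lowered to 7.5 by flanking basic residues, both at pH 7.5.
for pka in (8.5, 7.5):
    print(f"pKa {pka}: {thiolate_fraction(pka, 7.5):.3f} as thiolate")
```

Under these assumed values, lowering the pKa by one unit raises the
thiolate fraction from about 9% to 50%.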
[0102] Cross-linkage of subunits or enzymes to themselves may be
prevented by keeping the concentration of linker in a vast excess
of the subunit and/or enzyme. Alternatively, a "lock and key"
arrangement may be used in which two linkers are used. Only one end
of each linker may react together to form a longer linker and the
other ends of the linker each react with a different part of the
construct (i.e. subunit or enzyme).
[0103] The site of covalent attachment is selected such that, when
the construct is used to form a pore, the enzyme handles a target
nucleic acid sequence in such a way that a proportion of the
nucleotides in the target sequence interacts with the pore.
Nucleotides are then distinguished on the basis of the different
ways in which they affect the current flowing through the pore
during the interaction.
[0104] There are a number of ways that pores can be used to
sequence nucleic acid molecules. One way involves the use of an
exonuclease enzyme, such as a deoxyribonuclease. In this approach,
the exonuclease enzyme is used to sequentially detach the
nucleotides from a target nucleic strand. The nucleotides are then
detected and discriminated by the pore in order of their release,
thus reading the sequence of the original strand. For such an
embodiment, the exonuclease enzyme is preferably attached to the
subunit such that a proportion of the nucleotides released from the
target nucleic acid is capable of entering and interacting with the
barrel or channel of a pore comprising the construct. The
exonuclease is preferably attached to the subunit at a site in
close proximity to the part of the subunit that forms the opening
of the barrel or channel of the pore. The exonuclease enzyme is
more preferably attached to the subunit such that its nucleotide
exit trajectory site is orientated towards the part of the subunit
that forms part of the opening of the pore.
[0105] Another way of sequencing nucleic acids involves the use of
an enzyme that pushes or pulls the target nucleic acid strand
through the pore. In this approach, the ionic current fluctuates as
a nucleotide in the target strand passes through the pore. The
fluctuations in the current are indicative of the sequence of the
strand. For such an embodiment, the enzyme is preferably attached
to the subunit such that it is capable of pushing or pulling the
target nucleic acid through the barrel or channel of a pore
comprising the construct and does not interfere with the flow of
ionic current through the pore. The enzyme is preferably attached
to the subunit at a site in close proximity to the part of the
subunit that forms part of the opening of the barrel or channel of
the pore. The enzyme is more preferably attached to the subunit
such that its active site is orientated towards the part of the
subunit that forms part of the opening of the pore.
[0106] A third way of sequencing a nucleic acid strand is to detect
the by-products of a polymerase in close proximity to a pore
detector. In this approach, nucleoside phosphates (nucleotides) are
labelled so that a phosphate labelled species is released when a
polymerase adds a nucleotide to the nucleic acid strand, and the
phosphate labelled species is detected by the pore. The phosphate
species contains a specific label for each nucleotide. As nucleotides
are sequentially added to the nucleic acid strand, the by-products of
the base addition are detected. The order that the phosphate
labelled species are detected can be used to determine the sequence
of the nucleic acid strand.
[0107] The enzyme is preferably attached to a part of the subunit
that forms part of the cis side of a pore comprising the construct.
In electrophysiology, the cis side is the grounded side. If a
hemolysin pore is inserted correctly into an electrophysiology
apparatus, the Cap region is on the cis side. It is well known
that, under a positive potential, nucleotides will migrate from the
cis to the trans side of pores used for stochastic sensing.
Positioning the enzyme at the cis side of a pore allows it to
handle the target nucleic acid such that a proportion of the
nucleotides in the sequence enters the barrel or channel of the
pore and interacts with it. Preferably, at least 20%, at least 40%,
at least 50%, at least 80% or at least 90% of the nucleotides in
the sequence enters the barrel or channel of the pore and interacts
with it.
[0108] The site and method of covalent attachment is preferably
selected such that mobility of the enzyme is constrained. This
helps to ensure that the enzyme handles the target nucleic acid
sequence in such a way that a proportion of the nucleotides in the
target sequence interacts with the pore. For instance, constraining
the ability of the enzyme to move means that its active site can be
permanently orientated towards the part of the subunit that forms
part of the opening of the barrel or channel of the pore. The
mobility of the enzyme may be constrained by increasing the number
of points at which the enzyme is attached to the subunit and/or the
use of specific linkers.
Subunit
[0109] The constructs of the invention comprise a subunit from a
transmembrane protein pore. A transmembrane protein pore is a
polypeptide or a collection of polypeptides that permits ions
driven by an applied potential to flow from one side of a membrane
to the other.
The pore preferably permits nucleotides to flow from one side of a
membrane to the other along the applied potential. The pore
preferably allows a nucleic acid, such as DNA or RNA, to be pushed
or pulled through the pore.
[0110] The subunit is part of a pore. The pore may be a monomer or
an oligomer. The subunit preferably forms part of a pore made up of
several repeating subunits, such as 6, 7 or 8 subunits. The subunit
more preferably forms part of a heptameric pore. The subunit
typically forms part of a barrel or channel through which the ions
may flow. The subunits of the pore typically surround a central
axis and contribute strands to a transmembrane .beta.-barrel or channel
or a transmembrane .alpha.-helix bundle or channel. When part of a
construct of the invention, the subunit may be a monomer or part of
an oligomeric pore.
[0111] The subunit typically forms part of a pore whose barrel or
channel comprises amino acids that facilitate interaction with
nucleotides or nucleic acids. These amino acids are preferably
located near the constriction of the barrel or channel. The subunit
typically comprises one or more positively charged amino acids,
such as arginine, lysine or histidine. These amino acids typically
facilitate the interaction between the pore and nucleotides or
nucleic acids by interacting with the phosphate groups in the
nucleotides or nucleic acids or by .pi.-cation interaction with the
bases in the nucleotides or nucleic acids. The nucleotide detection
can be facilitated with an adaptor.
[0112] Subunits for use in accordance with the invention can be
derived from .beta.-barrel pores or .alpha.-helix bundle pores.
.beta.-barrel pores comprise a barrel or channel that is formed
from .beta.-strands. Suitable .beta.-barrel pores include, but are
not limited to, .beta.-toxins, such as .alpha.-hemolysin and
leukocidins, and outer membrane proteins/porins of bacteria, such
as outer membrane porin F (OmpF), outer membrane porin G (OmpG),
outer membrane phospholipase A and Neisseria autotransporter
lipoprotein (NalP). .alpha.-helix bundle pores comprise a barrel or
channel that is formed from .alpha.-helices. Suitable .alpha.-helix
bundle pores include, but are not limited to, inner membrane
proteins and .alpha. outer membrane proteins, such as WZA.
[0113] The subunit is preferably derived from .alpha.-hemolysin
(.alpha.-HL). The wild-type .alpha.-HL pore is formed of seven
identical monomers or subunits (i.e. it is heptameric). The
sequence of one wild-type monomer or subunit of .alpha.-hemolysin
is shown in SEQ ID NO: 2. The subunit in the constructs of the
invention preferably comprises the sequence shown in SEQ ID NO: 2
or a variant thereof. Amino acids 1, 7 to 21, 31 to 34, 45 to 51,
63 to 66, 72, 92 to 97, 104 to 111, 124 to 136, 149 to 153, 160 to
164, 173 to 206, 210 to 213, 217, 218, 223 to 228, 236 to 242, 262
to 265, 272 to 274, 287 to 290 and 294 of SEQ ID NO: 2 form loop
regions. The enzyme is preferably attached to one or more of amino
acids 8, 9, 17, 18, 19, 44, 45, 50 and 51 of SEQ ID NO: 2. The
enzyme is more preferably inserted between amino acids 18 and 19,
44 and 45 or 50 and 51 of SEQ ID NO: 2.
[0114] A variant of SEQ ID NO: 2 is a subunit that has an amino
acid sequence which varies from that of SEQ ID NO: 2 and which
retains its pore forming ability. The ability of the variant to
form pores can be assayed as described above. The variant may
include modifications that facilitate covalent attachment to or
interaction with the nucleic acid handling enzyme. The variant
preferably comprises one or more reactive cysteine residues that
facilitate attachment to the enzyme. For instance, the variant may
include a cysteine at one or more of positions 8, 9, 17, 18, 19,
44, 45, 50 and 51 and/or on the amino or carboxy terminus of SEQ ID
NO: 2. Preferred variants comprise a substitution of the residue at
position 8, 9 or 17 of SEQ ID NO: 2 with cysteine (K8C, T9C or
N17C).
[0115] The variant may be modified to facilitate genetic fusion of
the enzyme. For instance, one or more residues adjacent to the
insertion site may be modified, such as deleted, to facilitate
insertion of the enzyme and/or linkers. If the enzyme is inserted
into loop 2 of SEQ ID NO: 2, one or more of residues D45, K46, N47,
H48, N49 and K50 of SEQ ID NO: 2 may be deleted. A preferred
construct containing such a deletion comprises the sequence shown
in SEQ ID NO: 26 or a variant thereof.
[0116] The variant may also include modifications that facilitate
any interaction with nucleotides or facilitate orientation of a
molecular adaptor as discussed below. The variant may also contain
modifications that facilitate covalent attachment of a molecular
adaptor.
[0117] The subunit may be any of the variants of SEQ ID NO: 2
described in a co-pending International application claiming
priority from U.S. Application No. 61/078,687 and being filed
simultaneously with this application [J A Kemp & Co Ref:
N.104403A; Oxford Nanolabs Ref: ONL IP 004]. All the teachings of
that application may be applied equally to the present invention.
In particular, the variant preferably has a glutamine at position
139 of SEQ ID NO: 2. The variant preferably has an arginine at
position 113 of SEQ ID NO: 2. The variant preferably has a cysteine
at position 119, 121 or 135 of SEQ ID NO: 2. Any of the variants of
SEQ ID NO: 2 shown in SEQ ID NOs: 4, 6, 8, 10, 12 and 14 of the
co-pending application may be used to form a construct of the
invention.
[0118] The subunit may be a naturally occurring variant which is
expressed by an organism, for instance by a Staphylococcus
bacterium. Variants also include non-naturally occurring variants
produced by recombinant technology. Over the entire length of the
amino acid sequence of SEQ ID NO: 2, a variant will preferably be
at least 50% homologous to that sequence based on amino acid
identity. More preferably, the subunit polypeptide may be at least
55%, at least 60%, at least 65%, at least 70%, at least 75%, at
least 80%, at least 85%, at least 90% and more preferably at least
95%, 97% or 99% homologous based on amino acid identity to the
amino acid sequence of SEQ ID NO: 2 over the entire sequence. There
may be at least 80%, for example at least 85%, 90% or 95%, amino
acid identity over a stretch of 200 or more, for example 230, 250,
270 or 280 or more, contiguous amino acids ("hard homology").
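By way of illustration only, percentage amino acid identity over an
entire alignment, and over the best contiguous stretch of a given
length, may be computed as sketched below. The sketch assumes the two
sequences have already been aligned to equal length (with "-" for
gaps) and uses the alignment length as the denominator; both are
assumptions made for the example, and the function names and toy
sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned positions carrying identical amino acids.

    Gap characters ("-") never count as matches.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, equal-length "
                         "and non-empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def best_stretch_identity(aligned_a, aligned_b, stretch):
    """Highest percent identity over any contiguous window."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be equal-length")
    if not 1 <= stretch <= len(aligned_a):
        raise ValueError("stretch must fit within the alignment")
    return max(
        percent_identity(aligned_a[i:i + stretch],
                         aligned_b[i:i + stretch])
        for i in range(len(aligned_a) - stretch + 1)
    )


# Toy placeholder sequences (NOT SEQ ID NO: 2 or a real variant).
reference = "ACDEFGHIKL"
variant = "ACDEYGHIKV"
print(percent_identity(reference, variant))
print(best_stretch_identity(reference, variant, 4))
```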
[0119] Amino acid substitutions may be made to the amino acid
sequence of SEQ ID NO: 2 in addition to those discussed above, for
example up to 1, 2, 3, 4, 5, 10, 20 or 30 substitutions.
Conservative substitutions may be made, for example, according to
Table 1 below.
TABLE-US-00001 TABLE 1 Conservative substitutions
Amino acids in the same block in the second column and preferably in
the same line in the third column may be substituted for each other.
NON-AROMATIC   Non-polar            G A P I L V
               Polar - uncharged    C S T M N Q
               Polar - charged      D E H K R
AROMATIC                            H F W Y
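By way of illustration only, the block-level rule of Table 1 can be
expressed as a simple lookup. The sketch below groups amino acids by
block only; it does not model the finer "same line in the third
column" preference, and it treats the aromatic residues as a block
of their own, which is an assumption made for the example. The name
is_conservative is hypothetical.

```python
# Blocks taken from Table 1 (one-letter amino acid codes).
BLOCKS = {
    "non-polar": set("GAPILV"),
    "polar-uncharged": set("CSTMNQ"),
    "polar-charged": set("DEHKR"),
    "aromatic": set("HFWY"),
}


def is_conservative(original, replacement):
    """True if both residues share at least one Table 1 block."""
    return any(original in block and replacement in block
               for block in BLOCKS.values())


print(is_conservative("D", "E"))  # both polar-charged
print(is_conservative("G", "W"))  # non-polar vs aromatic
print(is_conservative("H", "F"))  # H also appears in the aromatic block
```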
[0120] One or more amino acid residues of the amino acid sequence
of SEQ ID NO: 2 may additionally be deleted from the polypeptides
described above. Up to 1, 2, 3, 4, 5, 10, 20 or 30 residues may be
deleted, or more.
[0121] Variants may be fragments of SEQ ID NO: 2. Such fragments
retain pore forming activity. Fragments may be at least 50, 100,
200 or 250 amino acids in length. A fragment preferably comprises
the pore forming domain of SEQ ID NO: 2. Fragments typically
include residues 119, 121, 135, 113 and 139 of SEQ ID NO: 2.
[0122] One or more amino acids may be alternatively or additionally
added to the polypeptides described above. An extension may be
provided at the amino terminus or carboxy terminus of the amino
acid sequence of SEQ ID NO: 2 or a variant or fragment thereof. The
extension may be quite short, for example from 1 to 10 amino acids
in length. Alternatively, the extension may be longer, for example
up to 50 or 100 amino acids. A carrier protein may be fused to a
subunit or variant.
[0123] As discussed above, a variant of SEQ ID NO: 2 is a subunit
that has an amino acid sequence which varies from that of SEQ ID
NO: 2 and which retains its ability to form a pore. A variant
typically contains the regions of SEQ ID NO: 2 that are responsible
for pore formation. The pore forming ability of .alpha.-HL, which
contains a .beta.-barrel, is provided by .beta.-strands in each
subunit. A variant of SEQ ID NO: 2 typically comprises the regions
in SEQ ID NO: 2 that form .beta.-strands. The amino acids of SEQ ID
NO: 2 that form .beta.-strands are discussed above. One or more
modifications can be made to the regions of SEQ ID NO: 2 that form
.beta.-strands as long as the resulting variant retains its ability
to form a pore. Specific modifications that can be made to the
.beta.-strand regions of SEQ ID NO: 2 are discussed above.
[0124] A variant of SEQ ID NO: 2 preferably includes one or more
modifications, such as substitutions, additions or deletions,
within its .alpha.-helices and/or loop regions. Amino acids that
form .alpha.-helices and loops are discussed above.
[0125] Standard methods in the art may be used to determine
homology. For example the UWGCG Package provides the BESTFIT
program which can be used to calculate homology, for example used
on its default settings (Devereux et al (1984) Nucleic Acids
Research 12, p387-395). The PILEUP and BLAST algorithms can be used
to calculate homology or line up sequences (such as identifying
equivalent residues or corresponding sequences (typically on their
default settings)), for example as described in Altschul S. F.
(1993) J Mol Evol 36:290-300; Altschul, S. F et al (1990) J Mol
Biol 215:403-10.
[0126] Software for performing BLAST analyses is publicly available
through the National Center for Biotechnology Information
(http://www.ncbi.nlm.nih.gov/). This algorithm involves first
identifying high scoring sequence pairs (HSPs) by identifying short
words of length W in the query sequence that either match or
satisfy some positive-valued threshold score T when aligned with a
word of the same length in a database sequence. T is referred to as
the neighbourhood word score threshold (Altschul et al, supra).
These initial neighbourhood word hits act as seeds for initiating
searches to find HSPs containing them. The word hits are extended
in both directions along each sequence for as far as the cumulative
alignment score can be increased. Extensions for the word hits in
each direction are halted when: the cumulative alignment score
falls off by the quantity X from its maximum achieved value; the
cumulative score goes to zero or below, due to the accumulation of
one or more negative-scoring residue alignments; or the end of
either sequence is reached. The BLAST algorithm parameters W, T and
X determine the sensitivity and speed of the alignment. The BLAST
program uses as defaults a word length (W) of 11, the BLOSUM62
scoring matrix (see Henikoff and Henikoff (1992) Proc. Natl. Acad.
Sci. USA 89: 10915-10919), alignments (B) of 50, expectation (E) of
10, M=5, N=4, and a comparison of both strands.
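By way of illustration only, the three halting conditions for
extending a word hit can be sketched as a simplified, ungapped,
one-directional extension. This is not the BLAST implementation
itself: the function name extend_right is hypothetical, scoring uses
a simple match reward and mismatch penalty (assumed here to be +5
and -4) rather than a scoring matrix, the value of X is chosen
arbitrarily, and the cumulative score starts from zero at the first
position of the hit.

```python
def extend_right(query, subject, q_start, s_start,
                 match=5, mismatch=-4, x_drop=20):
    """Extend an ungapped alignment rightwards from a seed position.

    Extension halts when (a) the cumulative score falls by x_drop or
    more from its maximum, (b) the cumulative score reaches zero or
    below, or (c) the end of either sequence is reached.
    Returns (best_score, length_of_best_extension).
    """
    score = 0
    best_score = 0
    best_len = 0
    i = 0
    while q_start + i < len(query) and s_start + i < len(subject):
        same = query[q_start + i] == subject[s_start + i]
        score += match if same else mismatch
        i += 1
        if score > best_score:
            best_score, best_len = score, i
        elif best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_len


# Toy nucleotide sequences sharing their first eight positions.
query = "ACGTACGTTTTT"
subject = "ACGTACGTAAAA"
print(extend_right(query, subject, 0, 0, x_drop=8))
```

In this toy example the score reaches 40 after eight matches, and
the extension halts once two subsequent mismatches have lowered it
by the assumed X of 8, so the eight-position extension is retained.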
[0127] The BLAST algorithm performs a statistical analysis of the
similarity between two sequences; see e.g., Karlin and Altschul
(1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877. One measure of
similarity provided by the BLAST algorithm is the smallest sum
probability (P(N)), which provides an indication of the probability
by which a match between two amino acid sequences would occur by
chance. For example, a sequence is considered similar to another
sequence if the smallest sum probability in comparison of the first
sequence to the second sequence is less than about 1, preferably
less than about 0.1, more preferably less than about 0.01, and most
preferably less than about 0.001.
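As a small illustration, the tiers above can be written as a lookup. This sketch treats each "about" threshold as a strict cutoff, which is an assumption; the function name `similarity_tier` and the tier labels are hypothetical.

```python
def similarity_tier(p_n):
    """Map a smallest sum probability P(N) onto the preference tiers in the text."""
    if p_n < 0:
        raise ValueError("a probability cannot be negative")
    if p_n < 0.001:
        return "most preferred"
    if p_n < 0.01:
        return "more preferred"
    if p_n < 0.1:
        return "preferred"
    if p_n < 1:
        return "similar"
    return "not similar"
```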
[0128] The variant may be modified for example by the addition of
histidine or aspartic acid residues to assist its identification or
purification or by the addition of a signal sequence to promote
its secretion from a cell where the polypeptide does not
naturally contain such a sequence.
[0129] The subunit may be labelled with a revealing label. The
revealing label may be any suitable label which allows the pore to
be detected. Suitable labels include, but are not limited to,
fluorescent molecules, radioisotopes, e.g. ¹²⁵I, ³⁵S,
enzymes, antibodies, antigens, polynucleotides and ligands such as
biotin.
[0130] The subunit may be isolated from a pore producing organism,
such as Staphylococcus aureus, or made synthetically or by
recombinant means. For example, the subunit may be synthesized by
in vitro translation and transcription. The amino acid sequence of
the subunit may be modified to include non-naturally occurring
amino acids or to increase the stability of the subunit. When the
subunit is produced by synthetic means, such amino acids may be
introduced during production. The subunit may also be altered
following either synthetic or recombinant production.
[0131] The subunit may also be produced using D-amino acids. For
instance, the pores may comprise a mixture of L-amino acids and
D-amino acids. This is conventional in the art for producing such
proteins or peptides.
[0132] The subunit may also contain other non-specific chemical
modifications as long as they do not interfere with its ability to
form a pore. A number of non-specific side chain modifications are
known in the art and may be made to the side chains of the pores.
Such modifications include, for example, reductive alkylation of
amino acids by reaction with an aldehyde followed by reduction with
NaBH₄, amidination with methylacetimidate or acylation with
acetic anhydride. The modifications to the subunit can be made
after expression of the subunit or construct or after the subunit
has been used to form a pore.
[0133] The subunit can be produced using standard methods known in
the art. Polynucleotide sequences encoding a subunit may be
isolated and replicated using standard methods in the art. Such
sequences are discussed in more detail below. Polynucleotide
sequences encoding a subunit may be expressed in a bacterial host
cell using standard techniques in the art. The subunit may be
produced in a cell by in situ expression of the polypeptide from a
recombinant expression vector. The expression vector optionally
carries an inducible promoter to control the expression of the
polypeptide.
[0134] A subunit may be produced in large scale following
purification by any protein liquid chromatography system from pore
producing organisms or after recombinant expression as described
below. Typical protein liquid chromatography systems include FPLC,
AKTA systems, the Bio-Cad system, the Bio-Rad BioLogic system and
the Gilson HPLC system.
Nucleic Acid Handling Enzyme
[0135] The constructs of the invention comprise a nucleic acid
handling enzyme. A nucleic acid handling enzyme is a polypeptide
that is capable of interacting with and modifying at least one
property of a nucleic acid. The enzyme may modify the nucleic acid
by cleaving it to form individual nucleotides or shorter chains of
nucleotides, such as di- or trinucleotides. The enzyme may modify
the nucleic acid by orienting it or moving it to a specific
position.
[0136] A nucleic acid is a macromolecule comprising two or more
nucleotides. The nucleic acid handled by the enzyme may comprise
any combination of any nucleotides. The nucleotides can be
naturally occurring or artificial. A nucleotide typically contains
a nucleobase, a sugar and at least one phosphate group. The
nucleobase is typically heterocyclic. Nucleobases include, but are
not limited to, purines and pyrimidines and more specifically
adenine, guanine, thymine, uracil and cytosine. The sugar is
typically a pentose sugar. Nucleotide sugars include, but are not
limited to, ribose and deoxyribose. The nucleotide is typically a
ribonucleotide or deoxyribonucleotide. The nucleotide typically
contains a monophosphate, diphosphate or triphosphate. Phosphates
may be attached on the 5' or 3' side of a nucleotide.
[0137] Nucleotides include, but are not limited to, adenosine
monophosphate (AMP), adenosine diphosphate (ADP), adenosine
triphosphate (ATP), guanosine monophosphate (GMP), guanosine
diphosphate (GDP), guanosine triphosphate (GTP), thymidine
monophosphate (TMP), thymidine diphosphate (TDP), thymidine
triphosphate (TTP), uridine monophosphate (UMP), uridine
diphosphate (UDP), uridine triphosphate (UTP), cytidine
monophosphate (CMP), cytidine diphosphate (CDP), cytidine
triphosphate (CTP), cyclic adenosine monophosphate (cAMP), cyclic
guanosine monophosphate (cGMP), deoxyadenosine monophosphate
(dAMP), deoxyadenosine diphosphate (dADP), deoxyadenosine
triphosphate (dATP), deoxyguanosine monophosphate (dGMP),
deoxyguanosine diphosphate (dGDP), deoxyguanosine triphosphate
(dGTP), deoxythymidine monophosphate (dTMP), deoxythymidine
diphosphate (dTDP), deoxythymidine triphosphate (dTTP),
deoxyuridine monophosphate (dUMP), deoxyuridine diphosphate (dUDP),
deoxyuridine triphosphate (dUTP), deoxycytidine monophosphate
(dCMP), deoxycytidine diphosphate (dCDP) and deoxycytidine
triphosphate (dCTP). The nucleotides are preferably selected from
AMP, TMP, GMP, UMP, dAMP, dTMP, dGMP or dCMP.
[0138] The nucleic acid handled by the enzyme is preferably double
stranded, such as DNA. The nucleic acid handled by the enzyme may
be single stranded, such as cDNA or RNA. Enzymes that handle single
stranded nucleic acids may be used to sequence double stranded DNA
as long as the double stranded DNA is chemically or thermally
dissociated into a single strand before it is handled by the
enzyme.
[0139] It is preferred that the tertiary structure of the nucleic
acid handling enzyme is known. Knowledge of the three dimensional
structure of the enzyme allows modifications to be made to the
enzyme to facilitate its function in the construct or pore of the
invention.
[0140] The enzyme may be any size and have any structure. For
instance, the enzyme may be an oligomer, such as a dimer or trimer.
The enzyme is preferably a small, globular polypeptide formed from
one monomer. Such enzymes are easy to handle and are less likely to
interfere with the pore forming ability of the subunit,
particularly if fused to or inserted into the sequence of the
subunit.
[0141] The amino and carboxy termini of the enzyme are preferably
in close proximity. The amino and carboxy termini of the enzyme
are more preferably presented on the same face of the enzyme. Such
embodiments facilitate insertion of the enzyme into the sequence of
the subunit. For instance, if the amino and carboxy termini of the
enzyme are in close proximity, each can be attached by genetic
fusion to adjacent amino acids in the sequence of the subunit.
[0142] It is also preferred that the location and function of the
active site of the enzyme is known. This prevents modifications
being made to the active site that abolish the activity of the
enzyme. It also allows the enzyme to be attached to the subunit so
that the enzyme handles the target nucleic acid sequence in such a
way that a proportion of the nucleotides in the target sequence
interacts with the pore. It is beneficial to position the active
site of the enzyme as close as possible to the part of the subunit
that forms part of the opening of the barrel or channel of the
pore, without the enzyme itself presenting a block to the flow of
current. Knowledge of the way in which an enzyme may orient nucleic
acids also allows an effective construct to be designed.
[0143] As discussed in more detail below, it may be necessary to
purify the construct of the invention. It is preferred that the
enzyme is capable of withstanding the conditions used to purify the
construct.
[0144] The constructs of the invention are useful for forming
pores. Such pores may be used to sequence nucleic acids. In order
that most of the nucleotides in the target nucleic acid are
correctly identified by stochastic sensing, the enzyme must handle
the nucleic acid in a buffer background which is compatible with
discrimination of the nucleotides. The enzyme preferably has at
least residual activity in a salt concentration well above the
normal physiological level, such as from 100 mM to 500 mM. The
enzyme is more preferably modified to increase its activity at high
salt concentrations. The enzyme may also be modified to improve its
processivity, stability and shelf life.
[0145] Suitable modifications can be determined from the
characterisation of nucleic acid handling enzymes from
extremophiles such as halophilic, moderately halophilic bacteria,
thermophilic and moderately thermophilic organisms, as well as
directed evolution approaches to altering the salt tolerance,
stability and temperature dependence of mesophilic or thermophilic
exonucleases.
[0146] The enzyme also preferably retains at least partial activity
at room temperature. This allows pores formed from the construct to
sequence nucleic acids at room temperature.
[0147] The nucleic acid handling enzyme is preferably a nucleolytic
enzyme. The nucleic acid handling enzyme is more preferably a member
of any of the Enzyme Classification (EC) groups 3.1.11, 3.1.13,
3.1.14, 3.1.15, 3.1.16, 3.1.21, 3.1.22, 3.1.25, 3.1.26, 3.1.27,
3.1.30 and 3.1.31. The nucleic acid handling enzyme is more
preferably any one of the following enzymes:
[0148] 3.1.11.- Exodeoxyribonucleases producing 5'-phosphomonoesters.
[0149] 3.1.11.1 Exodeoxyribonuclease I.
[0150] 3.1.11.2 Exodeoxyribonuclease III.
[0151] 3.1.11.3 Exodeoxyribonuclease (lambda-induced).
[0152] 3.1.11.4 Exodeoxyribonuclease (phage SP3-induced).
[0153] 3.1.11.5 Exodeoxyribonuclease V.
[0154] 3.1.11.6 Exodeoxyribonuclease VII.
[0155] 3.1.13.- Exoribonucleases producing 5'-phosphomonoesters.
[0156] 3.1.13.1 Exoribonuclease II.
[0157] 3.1.13.2 Exoribonuclease H.
[0158] 3.1.13.3 Oligonucleotidase.
[0159] 3.1.13.4 Poly(A)-specific ribonuclease.
[0160] 3.1.13.5 Ribonuclease D.
[0161] 3.1.14.- Exoribonucleases producing 3'-phosphomonoesters.
[0162] 3.1.14.1 Yeast ribonuclease.
[0163] 3.1.15.- Exonucleases active with either ribo- or deoxyribonucleic acid producing 5' phosphomonoesters
[0164] 3.1.15.1 Venom exonuclease.
[0165] 3.1.16.- Exonucleases active with either ribo- or deoxyribonucleic acid producing 3' phosphomonoesters
[0166] 3.1.16.1 Spleen exonuclease.
[0167] 3.1.21.- Endodeoxyribonucleases producing 5'-phosphomonoesters.
[0168] 3.1.21.1 Deoxyribonuclease I.
[0169] 3.1.21.2 Deoxyribonuclease IV (phage-T(4)-induced).
[0170] 3.1.21.3 Type I site-specific deoxyribonuclease.
[0171] 3.1.21.4 Type II site-specific deoxyribonuclease.
[0172] 3.1.21.5 Type III site-specific deoxyribonuclease.
[0173] 3.1.21.6 CC-preferring endodeoxyribonuclease.
[0174] 3.1.21.7 Deoxyribonuclease V.
[0175] 3.1.22.- Endodeoxyribonucleases producing other than 5'-phosphomonoesters.
[0176] 3.1.22.1 Deoxyribonuclease II.
[0177] 3.1.22.2 Aspergillus deoxyribonuclease K(1).
[0178] 3.1.22.3 Transferred entry: 3.1.21.7.
[0179] 3.1.22.4 Crossover junction endodeoxyribonuclease.
[0180] 3.1.22.5 Deoxyribonuclease X.
[0181] 3.1.25.- Site-specific endodeoxyribonucleases specific for altered bases.
[0182] 3.1.25.1 Deoxyribonuclease (pyrimidine dimer).
[0183] 3.1.25.2 Transferred entry: 4.2.99.18.
[0184] 3.1.26.- Endoribonucleases producing 5'-phosphomonoesters.
[0185] 3.1.26.1 Physarum polycephalum ribonuclease.
[0186] 3.1.26.2 Ribonuclease alpha.
[0187] 3.1.26.3 Ribonuclease III.
[0188] 3.1.26.4 Ribonuclease H.
[0189] 3.1.26.5 Ribonuclease P.
[0190] 3.1.26.6 Ribonuclease IV.
[0191] 3.1.26.7 Ribonuclease P4.
[0192] 3.1.26.8 Ribonuclease M5.
[0193] 3.1.26.9 Ribonuclease (poly-(U)-specific).
[0194] 3.1.26.10 Ribonuclease IX.
[0195] 3.1.26.11 Ribonuclease Z.
[0196] 3.1.27.- Endoribonucleases producing other than 5'-phosphomonoesters.
[0197] 3.1.27.1 Ribonuclease T(2).
[0198] 3.1.27.2 Bacillus subtilis ribonuclease.
[0199] 3.1.27.3 Ribonuclease T(1).
[0200] 3.1.27.4 Ribonuclease U(2).
[0201] 3.1.27.5 Pancreatic ribonuclease.
[0202] 3.1.27.6 Enterobacter ribonuclease.
[0203] 3.1.27.7 Ribonuclease F.
[0204] 3.1.27.8 Ribonuclease V.
[0205] 3.1.27.9 tRNA-intron endonuclease.
[0206] 3.1.27.10 rRNA endonuclease.
[0207] 3.1.30.- Endoribonucleases active with either ribo- or deoxyribonucleic producing 5' phosphomonoesters
[0208] 3.1.30.1 Aspergillus nuclease S(1).
[0209] 3.1.30.2 Serratia marcescens nuclease.
[0210] 3.1.31.- Endoribonucleases active with either ribo- or deoxyribonucleic producing 3' phosphomonoesters
[0211] 3.1.31.1 Micrococcal nuclease.
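For illustration, membership of a four-part EC number in one of the nucleolytic groups listed above can be checked with a short lookup. This is only a sketch; the names `PREFERRED_NUCLEASE_GROUPS`, `ec_group` and `is_preferred_nuclease` are hypothetical, and the set simply restates the groups named in the text.

```python
# EC groups named in the text as preferred for the nucleic acid handling enzyme.
PREFERRED_NUCLEASE_GROUPS = {
    "3.1.11", "3.1.13", "3.1.14", "3.1.15", "3.1.16", "3.1.21",
    "3.1.22", "3.1.25", "3.1.26", "3.1.27", "3.1.30", "3.1.31",
}


def ec_group(ec_number):
    """Return the first three fields of a four-field EC number, e.g. '3.1.11.2' -> '3.1.11'."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected an EC number with four fields: %r" % ec_number)
    return ".".join(parts[:3])


def is_preferred_nuclease(ec_number):
    """True if the EC number falls in one of the listed nucleolytic groups."""
    return ec_group(ec_number) in PREFERRED_NUCLEASE_GROUPS
```

For example, `is_preferred_nuclease("3.1.11.2")` (exodeoxyribonuclease III) evaluates to true, whereas a polymerase entry such as `"2.7.7.7"` does not.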
[0212] The enzyme is most preferably an exonuclease, such as a
deoxyribonuclease, which cleaves nucleic acids to form individual
nucleotides. The advantages of exodeoxyribonucleases are that they
are active on both single stranded and double stranded DNA and
hydrolyse bases in either the 5'-3' or 3'-5' direction.
[0213] An individual nucleotide is a single nucleotide. An
individual nucleotide is one which is not bound to another
nucleotide or nucleic acid by a nucleotide bond. A nucleotide bond
involves one of the phosphate groups of a nucleotide being bound to
the sugar group of another nucleotide. An individual nucleotide is
typically one which is not bound by a nucleotide bond to another
nucleic acid sequence of at least 5, at least 10, at least 20, at
least 50, at least 100, at least 200, at least 500, at least 1000
or at least 5000 nucleotides.
[0214] Preferred enzymes for use in the method include exonuclease
III enzyme from E. coli (SEQ ID NO: 10), exonuclease I from E. coli
(SEQ ID NO: 12), RecJ from T. thermophilus (SEQ ID NO: 14) and
bacteriophage lambda exonuclease (SEQ ID NO: 16) and variants
thereof. The exonuclease enzyme preferably comprises any of the
sequences shown in SEQ ID NOs: 10, 12, 14 and 16 or a variant
thereof. Three identical subunits of SEQ ID NO: 16 interact to form
a trimer exonuclease. A variant of SEQ ID NO: 10, 12, 14 or 16 is
an enzyme that has an amino acid sequence which varies from that of
SEQ ID NO: 10, 12, 14 or 16 and which retains nucleic acid handling
ability. The enzyme may include modifications that facilitate
handling of the nucleic acid and/or facilitate its activity at high
salt concentrations and/or room temperature. The enzyme may include
modifications that facilitate covalent attachment to or its
interaction with the subunit. As discussed above, accessible
cysteines may be removed from the enzyme to avoid non-specific
reactions with a linker. Alternatively, one or more reactive
cysteines may be introduced into the enzyme, for instance as part of
a genetically-fused peptide linker, to facilitate attachment to the
subunit.
[0215] Variants may differ from SEQ ID NO: 10, 12, 14 and 16 to the
same extent as variants of SEQ ID NO: 2 differ from SEQ ID NO: 2 as
discussed above.
[0216] A variant of SEQ ID NO: 10, 12, 14 or 16 retains its nucleic
acid handling activity. A variant typically contains the regions of
SEQ ID NO: 10, 12, 14 or 16 that are responsible for nucleic acid
handling activity. The catalytic domains of SEQ ID NOs: 10, 12, 14
and 16 are discussed above. A variant of SEQ ID NO: 10, 12, 14 or
16 preferably comprises the relevant catalytic domain. A variant of
SEQ ID NO: 10, 12, 14 or 16 typically includes one or more
modifications, such as substitutions, additions or deletions,
outside the relevant catalytic domain.
[0217] Preferred enzymes that are capable of pushing or pulling the
target nucleic acid sequence through the pore include polymerases,
exonucleases, helicases and topoisomerases, such as gyrases. The
polymerase is preferably a member of any of the Enzyme
Classification (EC) groups 2.7.7.6, 2.7.7.7, 2.7.7.19, 2.7.7.48 and
2.7.7.49. The polymerase is preferably a DNA-dependent DNA
polymerase, an RNA-dependent DNA polymerase, a DNA-dependent RNA
polymerase or an RNA-dependent RNA polymerase. The helicase is
preferably a member of any of the Enzyme Classification (EC) groups
3.6.1.- and 2.7.7.-. The helicase is preferably an ATP-dependent
DNA helicase (EC group 3.6.1.8), an ATP-dependent RNA helicase (EC
group 3.6.1.8) or an ATP-independent RNA helicase. The
topoisomerase is preferably a member of any of the Enzyme
Classification (EC) groups 5.99.1.2 and 5.99.1.3.
[0218] The enzyme may be labelled with a revealing label. The
revealing label may be any of those described above.
[0219] The enzyme may be isolated from an enzyme-producing
organism, such as E. coli, T. thermophilus or bacteriophage, or
made synthetically or by recombinant means. For example, the enzyme
may be synthesized by in vitro translation and transcription as
described above and below. The enzyme may be produced in large
scale following purification as described above.
Preferred Constructs
[0220] Preferred constructs of the invention comprise the sequence
shown in any one of SEQ ID NOs: 18, 20, 22, 24, 26, 28 and 30 or a
variant thereof. Variants of SEQ ID NO: 18, 20, 22, 24, 26, 28 or
30 must retain their pore forming ability and nucleic acid handling
ability. Variants may differ from SEQ ID NOs: 18, 20, 22, 24, 26,
28 and 30 to the same extent and in the same way as discussed above
for variants of SEQ ID NO: 2 and variants of SEQ ID NO: 10, 12, 14
or 16.
Polynucleotide Sequences
[0221] The present invention also provides polynucleotide sequences
which encode a construct in which the enzyme is genetically fused
to the subunit or is inserted into the sequence of the subunit. It
is straightforward to generate such polynucleotide sequences using
standard techniques. A polynucleotide sequence encoding the enzyme
is either fused to or inserted into a polynucleotide sequence
encoding the subunit. The fusion or insertion is typically in
frame. If a polynucleotide sequence encoding the enzyme is inserted
into a polynucleotide sequence encoding the subunit, the sequence
encoding the enzyme is typically flanked at both ends by
restriction endonuclease sites, such as those recognized by BspE1.
It may also be flanked at both ends by polynucleotide sequences
encoding linkers, such as 5 to 10 codons each encoding serine or
glycine.
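The in-frame insertion described above can be sketched as follows. The BspEI recognition sequence TCCGGA is used for the flanking sites; the alternating TCT (serine) and GGT (glycine) linker codons, the function names and the toy sequences in the usage lines are assumptions made for illustration only. The enzyme coding sequence is assumed to carry no stop codon.

```python
BSPEI_SITE = "TCCGGA"  # BspEI recognition sequence; reads as Ser-Gly when kept in frame


def build_insert(enzyme_cds, linker_codons=5):
    """Flank an enzyme coding sequence with Ser/Gly linker codons and BspEI sites."""
    if len(enzyme_cds) % 3 != 0:
        raise ValueError("enzyme coding sequence must be a whole number of codons")
    # Alternate serine (TCT) and glycine (GGT) codons; an illustrative choice.
    linker = "".join("TCT" if k % 2 == 0 else "GGT" for k in range(linker_codons))
    return BSPEI_SITE + linker + enzyme_cds + linker + BSPEI_SITE


def insert_in_frame(subunit_cds, insert, codon_index):
    """Insert a sequence after the given number of complete subunit codons."""
    if len(subunit_cds) % 3 != 0 or len(insert) % 3 != 0:
        raise ValueError("both sequences must be whole numbers of codons")
    cut = 3 * codon_index
    if not 0 <= cut <= len(subunit_cds):
        raise ValueError("codon_index is outside the subunit coding sequence")
    return subunit_cds[:cut] + insert + subunit_cds[cut:]


# Toy sequences, not taken from the sequence listing.
subunit = "ATGGCAGATTCTTAA"
enzyme = "AAACCC"
fused = insert_in_frame(subunit, build_insert(enzyme), codon_index=2)
```

Because every added element is a multiple of three nucleotides long, the codons of the subunit downstream of the insertion point stay in their original reading frame.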
[0222] The polynucleotide sequence preferably encodes a construct
comprising SEQ ID NO: 10, 12, 14 or 16 or a variant thereof
genetically fused to or inserted into SEQ ID NO: 2 or a variant
thereof. The variants of SEQ ID NO: 2, 10, 12, 14 or 16 may be any
of those discussed above. SEQ ID NO: 10, 12, 14 or 16 or a variant
thereof may be genetically fused to or inserted into SEQ ID NO: 2
or a variant thereof as described above.
[0223] The polynucleotide sequence preferably comprises SEQ ID NO:
9, 11, 13 or 15 or a variant thereof genetically fused to or
inserted into SEQ ID NO: 1 or a variant thereof. SEQ ID NO: 9, 11,
13 or 15 or a variant thereof is preferably inserted into SEQ ID
NO: 1 or a variant thereof between nucleotides 2765 and 2766, 2843
and 2844 or 2861 and 2862 of SEQ ID NO: 1. The polynucleotide
sequence more preferably comprises the sequence shown in SEQ ID NO:
17, 19, 21, 23, 25, 27 or 29 or a variant thereof.
[0224] Variants of SEQ ID NOs: 1, 9, 11, 13, 15, 17, 19, 21, 23,
25, 27 or 29 are sequences that are at least 50%, 60%, 70%, 80%,
90% or 95% homologous based on nucleotide identity to the sequence of
SEQ ID NO: 1, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 or 29 over the
entire sequence. There may be at least 80%, for example at least
85%, 90% or 95% nucleotide identity over a stretch of 600 or more,
for example 700, 750, 850 or 900 or more, contiguous nucleotides
("hard homology"). Homology may be calculated as described above.
The polynucleotide sequence may comprise a sequence that differs
from SEQ ID NO: 1, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 or 29 on
the basis of the degeneracy of the genetic code.
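A minimal sketch of the two identity measures mentioned above (identity over the entire sequence and identity over a contiguous stretch) is given below. It assumes the two sequences have already been aligned to equal length, with gaps written as "-"; in practice the alignment would come from a tool such as those described earlier. The function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent of aligned positions that are identical, over the entire alignment."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def best_window_identity(a, b, window):
    """Highest percent identity found over any contiguous stretch of the given length."""
    if len(a) != len(b) or not 0 < window <= len(a):
        raise ValueError("window must fit within two equal-length aligned sequences")
    flags = [1 if x == y and x != "-" else 0 for x, y in zip(a, b)]
    current = best = sum(flags[:window])
    # Slide the window one position at a time, updating the match count.
    for k in range(window, len(flags)):
        current += flags[k] - flags[k - window]
        best = max(best, current)
    return 100.0 * best / window
```

With a window of 600, `best_window_identity` corresponds to the "stretch of 600 or more contiguous nucleotides" criterion, and `percent_identity` to the whole-sequence criterion.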
[0225] Polynucleotide sequences may be isolated and replicated
using standard methods in the art. Chromosomal DNA may be extracted
from a pore producing organism, such as Staphylococcus aureus,
and/or an enzyme producing organism, such as E. coli, T.
thermophilus or bacteriophage. The gene encoding the subunit and
enzyme may be amplified using PCR involving specific primers. The
amplified sequences may then be incorporated into a recombinant
replicable vector such as a cloning vector. The vector may be used
to replicate the polynucleotide in a compatible host cell. Thus
polynucleotide sequences encoding a subunit and/or enzyme may be
made by introducing a polynucleotide encoding a subunit and/or
enzyme into a replicable vector, introducing the vector into a
compatible host cell, and growing the host cell under conditions
which bring about replication of the vector. The vector may be
recovered from the host cell. Suitable host cells for cloning of
polynucleotides are known in the art and described in more detail
below.
[0226] The polynucleotide sequence may be cloned into a suitable
expression vector. In an expression vector, the polynucleotide
sequence encoding a construct is typically operably linked to a
control sequence which is capable of providing for the expression
of the coding sequence by the host cell. Such expression vectors
can be used to express a construct.
[0227] The term "operably linked" refers to a juxtaposition wherein
the components described are in a relationship permitting them to
function in their intended manner. A control sequence "operably
linked" to a coding sequence is ligated in such a way that
expression of the coding sequence is achieved under conditions
compatible with the control sequences. Multiple copies of the same
or different polynucleotide may be introduced into the vector.
[0228] The expression vector may then be introduced into a suitable
host cell. Thus, a construct can be produced by inserting a
polynucleotide sequence encoding a construct into an expression
vector, introducing the vector into a compatible bacterial host
cell, and growing the host cell under conditions which bring about
expression of the polynucleotide sequence. The
recombinantly-expressed construct may self-assemble into a pore in
the host cell membrane. Alternatively, the recombinant construct
produced in this manner may be isolated from the host cell and
inserted into another membrane. When producing an oligomeric pore
comprising a construct of the invention and at least one different
subunit, the construct and different subunits may be expressed
separately in different host cells as described above, removed from
the host cells and assembled into a pore in a separate membrane,
such as a rabbit cell membrane.
[0229] The vectors may be, for example, plasmid, virus or phage
vectors provided with an origin of replication, optionally a
promoter for the expression of the said polynucleotide sequence and
optionally a regulator of the promoter. The vectors may contain one
or more selectable marker genes, for example an ampicillin
resistance gene. Promoters and other expression regulation signals
may be selected to be compatible with the host cell for which the
expression vector is designed. A T7, trc, lac, ara or
λ_L promoter is typically used.
[0230] The host cell typically expresses the construct at a high
level. Host cells transformed with a polynucleotide sequence
encoding a construct will be chosen to be compatible with the
expression vector used to transform the cell. The host cell is
typically bacterial and preferably E. coli. Any cell with a λ
DE3 lysogen, for example C41 (DE3), BL21 (DE3), JM109 (DE3), B834
(DE3), TUNER, Origami and Origami B, can express a vector
comprising the T7 promoter.
Modified Pores
[0231] The present invention also provides modified pores for use
in sequencing nucleic acids. The pores comprise at least one
construct of the invention. The pores may comprise more than one,
such as 2, 3 or 4, constructs of the invention.
[0232] A pore of the invention may be isolated, substantially
isolated, purified or substantially purified. A pore of the
invention is isolated or purified if it is completely free of any
other components, such as lipids or other pores. A pore is
substantially isolated if it is mixed with carriers or diluents
which will not interfere with its intended use. For instance, a
pore is substantially isolated or substantially purified if it
is present in a form that comprises less than 10%, less than 5%, less
than 2% or less than 1% of other components, such as lipids or
other pores. Alternatively, a pore of the invention may be present
in a lipid bilayer or in a surfactant micelle.
[0233] The enzyme attached to the construct handles a target
nucleic acid sequence in such a way that a proportion of the
nucleotides in the target sequence interacts with the pore,
preferably the barrel or channel of the pore. Nucleotides are then
distinguished on the basis of the different ways in which they
affect the current flowing through the pore during the
interaction.
[0234] The fixed nature of the enzyme means that a target nucleic
acid sequence is handled by the pore in a specific manner. For
instance, each nucleotide may be digested from one end of the target
sequence in a processive manner or the target sequence may be
pushed or pulled through the pore. This ensures that a proportion
of the nucleotides in the target nucleic acid sequence interacts
with the pore and is identified. The lack of any interruption in
the signal is important when sequencing nucleic acids. In addition,
the fixed nature of the enzyme and the pore means they can be
stored together, thereby allowing the production of a ready-to-use
sensor.
[0235] In a preferred embodiment, an exonuclease enzyme, such as a
deoxyribonuclease, is attached to the pore such that a proportion
of the nucleotides is released from the target nucleic acid and
interacts with the barrel or channel of the pore. In another
preferred embodiment, an enzyme that is capable of pushing or
pulling the target nucleic acid sequence through the pore is
attached to the pore such that the target nucleic acid sequence is
pushed or pulled through the barrel or channel of the pore and a
proportion of the nucleotides in the target sequence interacts with
the barrel or channel. In this embodiment, the nucleotides may
interact with the pore in blocks or groups of more than one, such
as 2, 3 or 4. Suitable enzymes include, but are not limited to,
polymerases, exonucleases, helicases and topoisomerases, such as
gyrases. In each embodiment, the enzyme is preferably attached to
the pore at a site in close proximity to the opening of the barrel
or channel of the pore. The enzyme is more preferably attached to
the pore such that its active site is orientated towards the
opening of the barrel or channel of the pore. This means that a
proportion of the nucleotides of the target nucleic acid sequence
is fed into the barrel or channel. The enzyme is preferably attached
to the cis side of the pore.
[0236] The modified pore may be based on any of the transmembrane
protein pores discussed above, including the β-barrel pores
and α-helix bundle pores.
[0237] For constructs comprising the sequence shown in SEQ ID NO: 2
or a variant thereof, the pore typically comprises an appropriate
number of additional subunits comprising the sequence shown in SEQ
ID NO: 2 or a variant thereof. A preferred pore of the invention
comprises one construct comprising the sequence shown in SEQ ID NO:
2 or a variant thereof and six subunits comprising the sequence
shown in SEQ ID NO: 2 or a variant thereof. The pore may comprise
one or more subunits comprising the sequence shown in SEQ ID NO: 4
or a variant thereof. SEQ ID NO: 4 shows the sequence of SEQ ID NO:
2 except that it has an arginine at position 113 (M113R) and a
glutamine at position 139 (N139Q). A variant of SEQ ID NO: 4 may
differ from SEQ ID NO: 4 in the same way and to the same extent as
discussed for SEQ ID NO: 2 above. A preferred pore of the invention
comprises one construct comprising the sequence shown in SEQ ID NO:
2 or a variant thereof and six subunits comprising the sequence
shown in SEQ ID NO: 4 or a variant thereof.
[0238] The pores may comprise a molecular adaptor that facilitates
the interaction between the pore and the nucleotides or the target
nucleic acid sequence. The presence of the adaptor improves the
host-guest chemistry of the pore and nucleotides released from or
present in the target nucleic acid sequence. The principles of
host-guest chemistry are well-known in the art. The adaptor has an
effect on the physical or chemical properties of the pore that
improves its interaction with nucleotides. The adaptor typically
alters the charge of the barrel or channel of the pore or
specifically interacts with or binds to nucleotides thereby
facilitating their interaction with the pore.
[0239] The adaptor mediates the interaction between nucleotides
released from or present in the target nucleic acid sequence and
the pore. The nucleotides preferably reversibly bind to the pore
via or in conjunction with the adaptor. The nucleotides most
preferably reversibly bind to the pore via or in conjunction with
the adaptor as they pass through the pore across the membrane. The
nucleotides can also reversibly bind to the barrel or channel of
the pore via or in conjunction with the adaptor as they pass
through the pore across the membrane. The adaptor preferably
constricts the barrel or channel so that it may interact with the
nucleotides.
[0240] The adaptor is typically cyclic. The adaptor preferably has
the same symmetry as the pore. An adaptor having seven-fold
symmetry is typically used if the pore is heptameric (e.g. has
seven subunits around a central axis that contribute 14 strands to
a transmembrane β barrel). Likewise, an adaptor having
six-fold symmetry is typically used if the pore is hexameric (e.g.
has six subunits around a central axis that contribute 12 strands
to a transmembrane β barrel, or is a 12-stranded β
barrel). Any adaptor that facilitates the interaction between
the pore and the nucleotide can be used. Suitable adaptors include,
but are not limited to, cyclodextrins, cyclic peptides and
cucurbiturils. The adaptor is preferably a cyclodextrin or a
derivative thereof. The adaptor is more preferably
heptakis-6-amino-β-cyclodextrin (am₇-βCD),
6-monodeoxy-6-monoamino-β-cyclodextrin (am₁-βCD) or
heptakis-(6-deoxy-6-guanidino)-cyclodextrin (gu₇-βCD).
Table 2 below shows preferred combinations of pores and
adaptors.
TABLE 2: Suitable combinations of pores and adaptors

Pore | Number of strands in the transmembrane β-barrel | Adaptor
Leukocidin | 16 | γ-cyclodextrin (γ-CD)
OmpF | 16 | γ-cyclodextrin (γ-CD)
α-hemolysin (or a variant thereof discussed above) | 14 | β-cyclodextrin (β-CD); 6-monodeoxy-6-monoamino-β-cyclodextrin (am₁β-CD); heptakis-6-amino-β-cyclodextrin (am₇-β-CD); heptakis-(6-deoxy-6-guanidino)-cyclodextrin (gu₇-β-CD)
OmpG | 14 | β-cyclodextrin (β-CD); 6-monodeoxy-6-monoamino-β-cyclodextrin (am₁β-CD); heptakis-6-amino-β-cyclodextrin (am₇-β-CD); heptakis-(6-deoxy-6-guanidino)-cyclodextrin (gu₇-β-CD)
NalP | 12 | α-cyclodextrin (α-CD)
OMPLA | 12 | α-cyclodextrin (α-CD)
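The pairings in Table 2 follow a simple pattern: the parent cyclodextrin has half as many glucose units as the barrel has strands (alpha, beta and gamma cyclodextrins have six, seven and eight glucose units respectively). The sketch below encodes that pattern only; it is an illustration rather than a selection rule stated in the text, and the names `CYCLODEXTRIN_BY_UNITS` and `suggest_cyclodextrin` are hypothetical.

```python
# Parent cyclodextrins keyed by their number of glucose units.
CYCLODEXTRIN_BY_UNITS = {
    6: "alpha-cyclodextrin",
    7: "beta-cyclodextrin",
    8: "gamma-cyclodextrin",
}


def suggest_cyclodextrin(barrel_strands):
    """Suggest a parent cyclodextrin for a transmembrane beta-barrel with this many strands."""
    if barrel_strands % 2 != 0:
        raise ValueError("expected an even number of strands")
    units = barrel_strands // 2
    if units not in CYCLODEXTRIN_BY_UNITS:
        raise ValueError("no cyclodextrin pairing known for %d strands" % barrel_strands)
    return CYCLODEXTRIN_BY_UNITS[units]
```

For a 14-stranded barrel such as that of α-hemolysin this returns the beta-cyclodextrin family, from which the amino and guanidino derivatives listed in the table are drawn.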
[0241] The adaptor is preferably covalently attached to the pore.
The adaptor can be covalently attached to the pore using any method
known in the art. The adaptor may be attached directly to the pore.
The adaptor is preferably attached to the pore using a bifunctional
crosslinker. Suitable crosslinkers are well-known in the art.
Preferred crosslinkers include 2,5-dioxopyrrolidin-1-yl
3-(pyridin-2-yldisulfanyl)propanoate, 2,5-dioxopyrrolidin-1-yl
4-(pyridin-2-yldisulfanyl)butanoate and 2,5-dioxopyrrolidin-1-yl
8-(pyridin-2-yldisulfanyl)octanoate. The most preferred
crosslinker is succinimidyl 3-(2-pyridyldithio)propionate (SPDP).
Typically, the adaptor is covalently attached to the bifunctional
crosslinker before the adaptor/crosslinker complex is covalently
attached to the pore but it is also possible to covalently attach
the bifunctional crosslinker to the pore before the bifunctional
crosslinker/pore complex is attached to the adaptor.
[0242] The site of covalent attachment is selected such that the
adaptor facilitates interaction of nucleotides released from or
present in the target nucleic acid sequence with the pore and
thereby allows detection of nucleotides. This can be done as
explained in the co-pending International application claiming
priority from U.S. Application No. 61/078,687 and being filed
simultaneously with this application [J A Kemp & Co Ref:
N.104403A; Oxford Nanolabs Ref: ONL IP 004].
[0243] For pores based on .alpha.-HL, the correct orientation of
the adaptor within the barrel or channel of the pore and the
covalent attachment of adaptor to the pore can be facilitated as
described in the co-pending International application claiming
priority from U.S. Application No. 61/078,687 and being filed
simultaneously with this application [J A Kemp & Co Ref:
N.104403A; Oxford Nanolabs Ref: ONL IP 004]. Any of the specific
modifications to SEQ ID NO: 2 disclosed in the co-pending
application are equally applicable to the pores of this invention.
In particular, every subunit of the pore, including the
construct(s), preferably has a glutamine at position 139 of SEQ ID
NO: 2. One or more of the subunits of the pore, including the
construct(s), may have an arginine at position 113 of SEQ ID NO: 2.
One or more of the subunits of the pore, including the
construct(s), may have a cysteine at position 119, 121 or 135 of
SEQ ID NO: 2. Any of the variants of SEQ ID NO: 2 shown in SEQ ID
NOs: 4, 6, 8, 10, 12 and 14 of the co-pending application may be
used to form a modified pore of the invention.
[0244] Preferred modified pores of the invention comprise:
[0245] (a) a construct comprising the sequence shown in SEQ ID NO:
18, 20, 22, 24, 26, 28 or 30 or a variant thereof and six subunits
of .alpha.-HL M113R/N139Q shown in SEQ ID NO: 4;
[0246] (b) a construct of the invention comprising the sequence
shown in SEQ ID NO: 2 or a variant thereof, five subunits of
.alpha.-HL M113R/N139Q shown in SEQ ID NO: 4 or a variant thereof
and one subunit of .alpha.-HL M113R/N139Q/G119C-D8 shown in SEQ ID
NO: 10 of the co-pending application;
[0247] (c) a construct of the invention comprising the sequence
shown in SEQ ID NO: 2 or a variant thereof, five subunits of
.alpha.-HL M113R/N139Q shown in SEQ ID NO: 4 or a variant thereof
and one subunit of .alpha.-HL M113R/N139Q/N121C-D8 shown in SEQ ID
NO: 12 of the co-pending application; or
[0248] (d) a construct of the invention comprising the sequence
shown in SEQ ID NO: 2 or a variant thereof, five subunits of
.alpha.-HL M113R/N139Q shown in SEQ ID NO: 4 or a variant thereof
and one subunit of .alpha.-HL M113R/N139Q/L135C-D8 shown in SEQ ID
NO: 14 of the co-pending application.
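As a bookkeeping aid, the subunit stoichiometry of the preferred pores (a) to (d) can be checked against the seven subunits of a heptameric .alpha.-HL pore. The sketch below is illustrative only; the dictionary keys are shorthand labels chosen here, not identifiers from the application.

```python
# Illustrative sketch; labels are shorthand, not sequence identifiers.
HEPTAMER_SIZE = 7

PREFERRED_PORES = {
    "a": {"construct": 1, "M113R/N139Q": 6},
    "b": {"construct": 1, "M113R/N139Q": 5, "M113R/N139Q/G119C-D8": 1},
    "c": {"construct": 1, "M113R/N139Q": 5, "M113R/N139Q/N121C-D8": 1},
    "d": {"construct": 1, "M113R/N139Q": 5, "M113R/N139Q/L135C-D8": 1},
}


def is_complete_heptamer(composition):
    """True if the listed subunits add up to exactly seven."""
    return sum(composition.values()) == HEPTAMER_SIZE


if __name__ == "__main__":
    for label, composition in PREFERRED_PORES.items():
        print(label, is_complete_heptamer(composition))
```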
Methods of Producing Constructs of the Invention
[0249] The invention also provides methods of producing a construct
of the invention. The methods comprise covalently attaching a
nucleic acid handling enzyme to a transmembrane protein pore
subunit. Any of the subunits and enzymes discussed above can be
used in the methods. The site of and method of covalent attachment
are selected as discussed above.
[0250] The methods also comprise determining whether or not the
construct is capable of forming a pore and handling nucleic acids.
Assays for doing this are described above. If a pore can be formed
and nucleic acids can be handled, the subunit and enzyme have been
attached correctly and a construct of the invention has been
produced. If a pore cannot be formed or nucleic acids cannot be
handled, a construct of the invention has not been produced.
Methods of Producing Modified Pores
[0251] The present invention also provides methods of producing
modified pores of the invention. The modified pore may be formed by
allowing at least one construct of the invention to form a pore
with other suitable subunits or by covalently attaching an enzyme
to a subunit in an oligomeric pore. Any of the constructs,
subunits, enzymes or pores discussed above can be used in the
methods. The site of and method of covalent attachment are selected
as discussed above.
[0252] The methods also comprise determining whether or not the
pore is capable of handling nucleic acids and detecting
nucleotides. The pore may be assessed for its ability to detect
individual nucleotides or short chains of nucleotides, such as di-
or trinucleotides. Assays for doing this are described above and
below. If the pore is capable of handling nucleic acids and
detecting nucleotides, the subunit and enzyme have been attached
correctly and a pore of the invention has been produced. If a pore
cannot handle nucleic acids and detect nucleotides, a pore of
the invention has not been produced.
[0253] In a preferred embodiment, a heteroheptamer of seven
subunits comprising the sequence shown in SEQ ID NO: 2 or a variant
thereof and containing one cysteine in an appropriate place is
reacted with a bifunctional cross-linker. The pore may be reacted
with the linker before or after it has been purified, typically by
SDS PAGE. The pore/linker construct is then reacted with an enzyme
containing at least one reactive cysteine, for instance on a
genetically-fused peptide linker. After the coupling reaction, the
modified pore of the invention is removed from any unreacted enzyme
or pore/linker construct.
Method of Purifying Pores
[0254] The present invention also provides methods of purifying
modified pores of the invention. The methods allow the purification
of pores comprising at least one construct of the invention. The
methods do not involve the use of anionic surfactants, such as
sodium dodecyl sulphate (SDS), and therefore avoid any detrimental
effects on the enzyme part of the construct. The methods are
particularly good for purifying pores comprising a construct of the
invention in which the subunit and enzyme have been genetically
fused.
[0255] The methods involve providing at least one construct of the
invention and any remaining subunits required to form a pore of the
invention. Any of the constructs and subunits discussed above can
be used. The construct(s) and remaining subunits are inserted into
synthetic lipid vesicles and allowed to oligomerise. Methods for
inserting the construct(s) and remaining subunits into synthetic
vesicles are well known in the art.
[0256] The synthetic vesicles should have similar properties to
rabbit cell membranes, but should lack the rabbit cell membrane
proteins. The vesicles may comprise any components and are
typically made of a blend of lipids. Suitable lipids are well-known
in the art. The synthetic vesicles preferably comprise 30%
cholesterol, 30% phosphatidylcholine (PC), 20%
phosphatidylethanolamine (PE), 10% sphingomyelin (SM) and 10%
phosphatidylserine (PS).
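The preferred blend can be turned into per-lipid amounts for a chosen batch size, as in the sketch below. The text does not say whether the percentages are by weight or by mole, so the function works in arbitrary units and leaves that choice to the user; the names `BLEND_PERCENT` and `lipid_amounts` are hypothetical.

```python
# Preferred vesicle blend from the paragraph above, in percent.
# Assumption: the basis (weight or mole) is not stated in the text,
# so amounts are returned in the same arbitrary unit as `total`.
BLEND_PERCENT = {
    "cholesterol": 30,
    "phosphatidylcholine (PC)": 30,
    "phosphatidylethanolamine (PE)": 20,
    "sphingomyelin (SM)": 10,
    "phosphatidylserine (PS)": 10,
}


def lipid_amounts(total):
    """Split a total lipid amount according to the preferred blend."""
    if sum(BLEND_PERCENT.values()) != 100:
        raise ValueError("blend percentages must sum to 100")
    return {name: total * pct / 100 for name, pct in BLEND_PERCENT.items()}


if __name__ == "__main__":
    for lipid, amount in lipid_amounts(10.0).items():
        print(f"{lipid}: {amount:.1f}")
```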
[0257] The vesicles are then contacted with a non-ionic surfactant
or a blend of non-ionic surfactants. The non-ionic surfactant is
preferably an Octyl Glucoside (OG) or DoDecyl Maltoside (DDM)
detergent. The oligomerised pores are then purified, for example by
using affinity purification based on his-tag or Ni-NTA.
Methods of Sequencing Nucleic Acids
[0258] The present invention also provides methods of sequencing a
target nucleic acid sequence. In one embodiment, the method
comprises (a) contacting the target sequence with a pore of the
invention, which comprises an exonuclease, such that the
exonuclease digests an individual nucleotide from one end of the
target sequence; (b) contacting the nucleotide with the pore so
that the nucleotide interacts with the adaptor; (c) measuring the
current passing through the pore during the interaction and thereby
determining the identity of the nucleotide; and (d) repeating steps
(a) to (c) at the same end of the target sequence and thereby
determining the sequence of the target sequence. Hence, the method
involves stochastic sensing of a proportion of the nucleotides in a
target nucleic acid sequence in a successive manner in order to
sequence the target sequence. Individual nucleotides are described
above.
[0259] In another embodiment, the method comprises (a) contacting
the target sequence with a pore of the invention so that the target
sequence is pushed or pulled through the pore and a proportion of
the nucleotides in the target sequence interacts with the pore and
(b) measuring the current passing through the pore during each
interaction and thereby determining the sequence of the target
sequence. Hence, the method involves stochastic sensing of a
proportion of the nucleotides in a target nucleic acid sequence as
the nucleotides pass through the barrel or channel in a successive
manner in order to sequence the target sequence.
[0260] Pores comprising a construct of the invention are
particularly suited to these methods. In order to effectively
sequence the nucleic acid, it is important to ensure that a
proportion of the nucleotides in the nucleic acid is identified in
a successive manner. The fixed nature of the enzyme means that a
proportion of the nucleotides in the target sequence affects the
current flowing through the pore.
[0261] The whole or only part of the target nucleic acid sequence
may be sequenced using this method. The nucleic acid sequence can
be any length. For example, the nucleic acid sequence can be at
least 10, at least 50, at least 100, at least 150, at least 200, at
least 250, at least 300, at least 400 or at least 500 nucleotides
in length. The nucleic acid sequence can be naturally occurring or
artificial. For instance, the method may be used to verify the
sequence of a manufactured oligonucleotide. The methods are
typically carried out in vitro.
[0262] The methods may be carried out using any suitable
membrane/pore system in which a pore comprising a construct of the
invention is inserted into a membrane. The methods are typically
carried out using (i) an artificial membrane comprising a pore
comprising a construct of the invention, (ii) an isolated,
naturally occurring membrane comprising a pore comprising a
construct of the invention, or (iii) a cell expressing a pore
comprising a construct of the invention. The methods are preferably
carried out using an artificial membrane. The membrane may comprise
other transmembrane and/or intramembrane proteins as well as other
molecules in addition to the pore of the invention.
[0263] The membrane forms a barrier to the flow of ions,
nucleotides and nucleic acids. The membrane is preferably a lipid
bilayer. Lipid bilayers suitable for use in accordance with the
invention can be made using methods known in the art. For example,
lipid bilayer membranes can be formed using the method of Montal
and Mueller (1972). Lipid bilayers can also be formed using the
method described in International Application No.
PCT/GB08/000563.
[0264] The methods of the invention may be carried out using lipid
bilayers formed from any membrane lipid including, but not limited
to, phospholipids, glycolipids, cholesterol and mixtures thereof.
Any of the lipids described in International Application No.
PCT/GB08/000563 may be used.
[0265] Methods are known in the art for inserting pores into
membranes, such as lipid bilayers. Some of those methods are
discussed above.
Interaction Between the Pore and Nucleotides
[0266] The nucleotide or nucleic acid may be contacted with, or
introduced to, the pore on either side of the membrane. The
nucleotide or nucleic acid is typically contacted with the side of
the membrane on which the enzyme is attached to the pore. This
allows the enzyme to handle the nucleic acid during the method.
[0267] A proportion of the nucleotides of the target nucleic acid
sequence interacts with the pore and/or adaptor as it passes across
the membrane through the barrel or channel of the pore.
Alternatively, if the target sequence is digested by an
exonuclease, the nucleotide may interact with the pore via or in
conjunction with the adaptor, dissociate from the pore and remain
on the same side of the membrane. The methods may involve the use
of pores in which the orientation of the adaptor is fixed. In such
embodiments, the nucleotide is preferably contacted with the end of
the pore towards which the adaptor is oriented. Most preferably,
the nucleotide is contacted with the end of the pore towards which
the portion of the adaptor that interacts with the nucleotide is
orientated.
[0268] The nucleotides may interact with the pore in any manner and
at any site. As discussed above, the nucleotides preferably
reversibly bind to the pore via or in conjunction with the adaptor.
The nucleotides most preferably reversibly bind to the pore via or
in conjunction with the adaptor as they pass through the pore
across the membrane. The nucleotides can also reversibly bind to
the barrel or channel of the pore via or in conjunction with the
adaptor as they pass through the pore across the membrane.
[0269] During the interaction between a nucleotide and the pore,
the nucleotide affects the current flowing through the pore in a
manner specific for that nucleotide. For example, a particular
nucleotide will reduce the current flowing through the pore for a
particular mean time period and to a particular extent. In other
words, the current flowing through the pore is distinctive for a
particular nucleotide. Control experiments may be carried out to
determine the effect a particular nucleotide has on the current
flowing through the pore. Results from carrying out the method of
the invention on a test sample can then be compared with those
derived from such a control experiment in order to identify a
particular nucleotide.
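The comparison against control experiments can be sketched as a nearest-reference lookup on the two quantities named above, the extent of current reduction and the mean time period. Everything numeric below is a hypothetical placeholder: the reference values, the nucleotide labels and the scale factors are assumptions made for illustration and are not taken from the application.

```python
import math

# Hypothetical control data: (mean residual current in pA, mean dwell time in ms).
# These numbers are placeholders, not measurements from the application.
CONTROL = {
    "dAMP": (18.0, 12.0),
    "dCMP": (20.5, 8.0),
    "dGMP": (15.0, 20.0),
    "dTMP": (23.0, 6.0),
}

# Hypothetical scale factors so that both features contribute comparably.
SCALE = (1.0, 5.0)


def identify(current_pa, dwell_ms):
    """Return the control nucleotide whose signature is closest to the event."""

    def distance(ref):
        return math.hypot(
            (current_pa - ref[0]) / SCALE[0],
            (dwell_ms - ref[1]) / SCALE[1],
        )

    return min(CONTROL, key=lambda name: distance(CONTROL[name]))


if __name__ == "__main__":
    print(identify(18.2, 11.0))
```

Because individual binding events vary in duration, a practical analysis would more plausibly compare averages over many events than classify each event from a single dwell time.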
Apparatus
[0270] The methods may be carried out using any apparatus that is
suitable for investigating a membrane/pore system in which a pore
comprising a construct of the invention is inserted into a
membrane. The methods may be carried out using any apparatus that
is suitable for stochastic sensing. For example, the apparatus
comprises a chamber comprising an aqueous solution and a barrier
that separates the chamber into two sections. The barrier has an
aperture in which the membrane containing the pore is formed. The
nucleotide or nucleic acid may be contacted with the pore by
introducing the nucleic acid into the chamber. The nucleic acid may
be introduced into either of the two sections of the chamber, but
is preferably introduced into the section of the chamber containing
the enzyme.
[0271] The methods may be carried out using the apparatus described
in International Application No. PCT/GB08/000562.
[0272] The methods involve measuring the current passing through
the pore during interaction with the nucleotides. Therefore the
apparatus also comprises an electrical circuit capable of applying
a potential and measuring an electrical signal across the membrane
and pore. The methods may be carried out using a patch clamp or a
voltage clamp. The methods preferably involve the use of a voltage
clamp.
Conditions
[0273] The methods of the invention involve the measuring of a
current passing through the pore during interaction with
nucleotides in a target nucleic acid sequence. Suitable conditions
for measuring ionic currents through transmembrane protein pores
are known in the art and disclosed in the Examples. The method is
carried out with a voltage applied across the membrane and pore.
The voltage used is typically from -400 mV to +400 mV. The voltage
used is preferably in a range having a lower limit selected from
-400 mV, -300 mV, -200 mV, -150 mV, -100 mV, -50 mV, -20 mV and 0
mV and an upper limit independently selected from +10 mV, +20 mV,
+50 mV, +100 mV, +150 mV, +200 mV, +300 mV and +400 mV. The voltage
used is more preferably in the range 120 mV to 170 mV. It is
possible to increase discrimination between different nucleotides
by a pore of the invention by using an increased applied
potential.
[0274] The methods are carried out in the presence of any alkali
metal chloride salt. In the exemplary apparatus discussed above,
the salt is present in the aqueous solution in the chamber.
Potassium chloride (KCl), sodium chloride (NaCl) or caesium
chloride (CsCl) is typically used. KCl is preferred. The salt
concentration is typically from 0.1 to 2.5M, from 0.3 to 1.9M, from
0.5 to 1.8M, from 0.7 to 1.7M, from 0.9 to 1.6M or from 1M to 1.4M.
High salt concentrations provide a high signal to noise ratio and
allow for currents indicative of the presence of a nucleotide to be
identified against the background of normal current fluctuations.
However, lower salt concentrations are preferably used so that the
enzyme is capable of functioning. The salt concentration is
preferably from 150 to 500 mM. Good nucleotide discrimination at
these low salt concentrations can be achieved by carrying out the
method at temperatures above room temperature, such as from
30.degree. C. to 40.degree. C.
[0275] The methods are typically carried out in the presence of a
buffer. In the exemplary apparatus discussed above, the buffer is
present in the aqueous solution in the chamber. Any buffer may be
used in the methods. One suitable buffer is Tris-HCl buffer. The
methods are typically carried out at a pH of from 4.0 to 10.0, from
4.5 to 9.5, from 5.0 to 9.0, from 5.5 to 8.8, from 6.0 to 8.7 or
from 7.0 to 8.8 or 7.5 to 8.5. The pH used is preferably about
7.5.
[0276] The methods are typically carried out at from 0.degree. C.
to 100.degree. C., from 15.degree. C. to 95.degree. C., from
16.degree. C. to 90.degree. C., from 17.degree. C. to 85.degree.
C., from 18.degree. C. to 80.degree. C., 19.degree. C. to
70.degree. C., or from 20.degree. C. to 60.degree. C. The methods
may be carried out at room temperature. The methods are preferably
carried out at a temperature that supports enzyme function, such as
about 37.degree. C. Good nucleotide discrimination can be achieved
at low salt concentrations if the temperature is increased.
[0277] In addition to increasing the solution temperature, there
are a number of other strategies that can be employed to increase
the conductance of the solution, while maintaining conditions that
are suitable for enzyme activity. One such strategy is to use the
lipid bilayer to divide two different concentrations of salt
solution, a low concentration of salt on the enzyme side and a
higher concentration on the opposite side. One example of this
approach is to use 200 mM of KCl on the cis side of the membrane
and 500 mM KCl in the trans chamber. At these conditions, the
conductance through the pore is expected to be roughly equivalent
to 400 mM KCl under normal conditions, and the enzyme only
experiences 200 mM if placed on the cis side. Another possible
benefit of using asymmetric salt conditions is the osmotic gradient
induced across the pore. This net flow of water could be used to
pull nucleotides into the pore for detection. A similar effect can
be achieved using a neutral osmolyte, such as sucrose, glycerol or
PEG. Another possibility is to use a solution with relatively low
levels of KCl and rely on an additional charge carrying species
that is less disruptive to enzyme activity.
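The size of the osmotic gradient created by asymmetric salt can be estimated with the idealised van 't Hoff relation, pressure = i x (concentration difference) x R x T. The sketch below applies it to the 200 mM / 500 mM KCl example; it assumes complete dissociation (i = 2) and ideal-solution behaviour, neither of which is stated in the text, and the function name is hypothetical.

```python
GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def osmotic_pressure_pa(cis_mM, trans_mM, temp_c=37.0, ions_per_formula=2):
    """Idealised van 't Hoff osmotic pressure difference across the membrane.

    Assumptions: ideal solution, complete dissociation of the salt.
    1 mM equals 1 mol m^-3, so the result is in pascals.
    """
    delta_osmol_per_m3 = ions_per_formula * abs(trans_mM - cis_mM)
    return delta_osmol_per_m3 * GAS_CONSTANT * (temp_c + 273.15)


if __name__ == "__main__":
    pressure = osmotic_pressure_pa(200.0, 500.0)
    print(f"{pressure / 1e6:.2f} MPa")
```

Under these assumptions the 300 mM difference corresponds to roughly 1.5 MPa at 37.degree. C.; real solutions deviate from ideality, so the figure is an order-of-magnitude guide only.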
Exonuclease-Based Methods
[0278] In one embodiment, the method of sequencing a target nucleic
acid sequence involves contacting the target sequence with a pore
having an exonuclease enzyme, such as deoxyribonuclease, attached
thereto. The constructs needed to make such pores are discussed
above. Any of the exonuclease enzymes discussed above may be used
in the method. The exonuclease releases individual nucleotides from
one end of the target sequence. Exonucleases are enzymes that
typically latch onto one end of a nucleic acid sequence and digest
the sequence one nucleotide at a time from that end. The
exonuclease can digest the nucleic acid in the 5' to 3' direction
or 3' to 5' direction. The end of the nucleic acid to which the
exonuclease binds is typically determined through the choice of
enzyme used and/or using methods known in the art. Hydroxyl groups
or cap structures at either end of the nucleic acid sequence may
typically be used to prevent or facilitate the binding of the
exonuclease to a particular end of the nucleic acid sequence.
[0279] The method involves contacting the nucleic acid sequence
with the exonuclease so that the nucleotides are digested from the
end of the nucleic acid at a rate that allows identification of a
proportion of nucleotides as discussed above. Methods for doing
this are well known in the art. For example, Edman degradation is
used to successively digest single amino acids from the end of a
polypeptide such that they may be identified using High Performance
Liquid Chromatography (HPLC). A homologous method may be used in
the present invention.
[0280] The rate at which the exonuclease functions is typically
slower than the optimal rate of a wild-type exonuclease. A suitable
rate of activity of the exonuclease in the method of sequencing
involves digestion of from 0.5 to 1000 nucleotides per second, from
0.6 to 500 nucleotides per second, 0.7 to 200 nucleotides per
second, from 0.8 to 100 nucleotides per second, from 0.9 to 50
nucleotides per second or 1 to 20 or 10 nucleotides per second. The
rate is preferably 1, 10, 100, 500 or 1000 nucleotides per second.
A suitable rate of exonuclease activity can be achieved in various
ways. For example, variant exonucleases with a reduced optimal rate
of activity may be used in accordance with the invention.
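To give a feel for the rates quoted above, the sketch below computes how long a constant-rate exonuclease would take to digest a target of a given length, and the mean interval between released nucleotides. A constant rate is a simplifying assumption, and the function names are hypothetical.

```python
def digestion_time_s(length_nt, rate_nt_per_s):
    """Time in seconds to digest `length_nt` nucleotides at a constant rate."""
    if rate_nt_per_s <= 0:
        raise ValueError("rate must be positive")
    return length_nt / rate_nt_per_s


def mean_interval_ms(rate_nt_per_s):
    """Mean time in milliseconds between successive released nucleotides."""
    if rate_nt_per_s <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / rate_nt_per_s


if __name__ == "__main__":
    for rate in (1, 10, 100, 500, 1000):
        print(rate, digestion_time_s(500, rate), mean_interval_ms(rate))
```

For example, a 500-nucleotide target digested at 10 nucleotides per second takes 50 seconds, with 100 ms on average available to detect each nucleotide.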
Pushing or Pulling DNA Through the Pore
[0281] Strand sequencing involves the controlled and stepwise
translocation of nucleic acid polymers through a pore. The majority
of DNA handling enzymes are suitable for use in this application
provided they hydrolyse, polymerise or process single stranded DNA
or RNA. Preferred enzymes are polymerases, exonucleases, helicases
and topoisomerases, such as gyrases. The enzyme moiety is not
required to be in as close a proximity to the pore lumen as for
individual nucleotide sequencing as there is no potential for
disorder in the series in which nucleotides reach the sensing
moiety of the pore.
[0282] The two strategies for single strand DNA sequencing are the
translocation of the DNA through the nanopore, both cis to trans
and trans to cis, either with or against an applied potential. The
most advantageous mechanism for strand sequencing is the controlled
translocation of single strand DNA through the nanopore with an
applied potential. Exonucleases that act progressively or
processively on double stranded DNA can be used on the cis side of
the pore to feed the remaining single strand through under an
applied potential or the trans side under a reverse potential.
Likewise, a helicase that unwinds the double stranded DNA can also
be used in a similar manner. There are also possibilities for
sequencing applications that require strand translocation against
an applied potential, but the DNA must be first "caught" by the
enzyme under a reverse or no potential. With the potential then
switched back following binding the strand will pass cis to trans
through the pore and be held in an extended conformation by the
current flow. The single strand DNA exonucleases or single strand
DNA dependent polymerases can act as molecular motors to pull the
recently translocated single strand back through the pore in a
controlled stepwise manner, trans to cis, against the applied
potential.
Kits
[0283] The present invention also provides kits for producing a
modified pore for use in sequencing nucleic acids. In one
embodiment, the kits comprise at least one construct of the
invention and any remaining subunits needed to form a pore. The kits
may comprise enough constructs of the invention to form a complete
pore (i.e. a homo-oligomer). The kits may comprise any of the
constructs and subunits discussed above. A preferred kit comprises
(i) a construct comprising a subunit comprising the sequence shown
in SEQ ID NO: 2 or a variant thereof and (ii) six subunits
comprising the sequence shown in SEQ ID NO: 2 or a variant thereof.
A more preferred kit comprises (i) a construct comprising the
sequence shown in SEQ ID NO: 18, 20, 22, 24, 26, 28 or 30 or a
variant thereof and (ii) six subunits comprising the sequence shown
in SEQ ID NO: 2 or a variant thereof.
[0284] In another embodiment, the kits comprise at least one
polynucleotide sequence of the invention and polynucleotide
sequences encoding any remaining subunits needed to form a pore.
The kit may comprise enough polynucleotides of the invention to
encode a complete pore (i.e. a homo-oligomer). The kits may
comprise any of the polynucleotides described above. A preferred
kit comprises (i) a polynucleotide sequence encoding a construct,
which comprises a subunit comprising the sequence shown in SEQ ID
NO: 2 or a variant thereof and (ii) six polynucleotide sequences
each encoding a subunit comprising the sequence shown in SEQ ID NO:
2 or a variant thereof. A more preferred kit comprises (i) a
polynucleotide sequence encoding a construct comprising the
sequence shown in SEQ ID NO: 18, 20, 22, 24, 26, 28 or 30 or a
variant thereof and (ii) six polynucleotide sequences each encoding
a subunit comprising the sequence shown in SEQ ID NO: 2 or a
variant thereof.
[0285] The kits of the invention may additionally comprise one or
more other reagents or instruments which enable any of the
embodiments mentioned above to be carried out. Such reagents or
instruments include one or more of the following: suitable
buffer(s) (aqueous solutions), means to obtain a sample from a
subject (such as a vessel or an instrument comprising a needle),
means to amplify and/or express polynucleotide sequences, a
membrane as defined above or voltage or patch clamp apparatus.
Reagents may be present in the kit in a dry state such that a fluid
sample resuspends the reagents. The kit may also, optionally,
comprise instructions to enable the kit to be used in the method of
the invention or details regarding which patients the method may be
used for. The kit may, optionally, comprise nucleotides.
[0286] The following Example illustrates the invention:
Example
1 Materials and Methods
1.1 Bacterial Strains and Growth Conditions
[0287] The bacterial strains used in this work were E. coli strains
XL-10 Gold and BL21 DE3 pLysS (Stratagene). E. coli strains were
grown at 37.degree. C. either in Luria-Bertani Broth (LB), Terrific
Broth at 225 rpm, Luria-Bertani agar (LA) or tryptone-yeast extract
agar (TY) (Bertani, G. (1951). Studies on lysogenesis. I. The mode
of phage liberation by lysogenic Escherichia coli. Journal of
Bacteriology. 62, 293-300; Beringer, J. (1974). R factor transfer
in Rhizobium leguminosarum. Journal of General Microbiology. 84,
188-98; and Tartoff, K. and Hobbs, C. (1987). Improved media for
growing plasmid and cosmid clones. Bethesda Research Labs Focus. 9,
12). Antibiotics were used at the following concentrations:
Ampicillin 100 .mu.g ml.sup.-1; chloramphenicol 30 .mu.g
ml.sup.-1.
1.2 Genetic Manipulations
[0288] All general DNA cloning was performed using methods adapted
from those previously described (Sambrook, J. and Russell, D. (2001).
Molecular Cloning: A Laboratory Manual, 3rd Edition. Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, N.Y.). DNA
polymerases, restriction endonucleases, exonuclease, ligases and
phosphatases were all obtained from New England Biolabs.
Exonuclease genes were manufactured by GenScript Corporation and
received as fragments cloned into pT7-SC1, by BspEI or
NdeI/HindIII. All mutations and fusion constructs were assembled in
the expression vector pT7-SC1 (Cheley, S., Malghani, M., Song, L.,
Hobaugh, M., Gouaux, E., Yang, J. and Bayley, H. (1997).
Spontaneous oligomerization of a staphylococcal alpha-hemolysin
conformationally constrained by removal of residues that form the
transmembrane beta-barrel. Protein Engineering. 10, 1433-43) and
verified by sequencing using either the T7 forward or reverse
primers, EcoExoIII_seq and EcoExoI_seq.
[0289] Site directed mutagenesis of the .alpha.HL gene was
performed by in vivo homologous recombination of PCR products
(Jones, D. (1995) PCR mutagenesis and recombination in vivo. In PCR
primer: a laboratory manual. In: Dveksler, C. (ed). Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, N.Y.). Amplification
of two halves of the target plasmid with complementary primer pairs
generates two PCR products with complementary sequences at both the
5' and 3' ends. Transformation of both products into chemically
competent E. coli allows in vivo homologous recombination. For all
mutagenesis SC46 was used as the antisense primer for amplification
of product 1 and SC47 as the sense primer for amplification of
product 2. These complementary primer binding sites are within the
.beta.-lactamase gene of pT7-SC1. Colonies recovered on LA 100 ng
.mu.l.sup.-1 ampicillin therefore indicated successful homologous
recombination.
[0290] PCR was conducted in 50 .mu.l reactions using 1 unit
Phusion.TM. DNA polymerase, 0.2 mM dNTPs, 1 .mu.M primers and 4 ng
BamHI/HindIII or NdeI/EcoNI digested plasmid DNA. Reactions were
cycled as follows: 1 cycle of 98.degree. C. for 2 min; 30 cycles of
98.degree. C. for 15 s, 57.degree. C. for 30 s and 72.degree. C.
for 45 s; and a final extension of 72.degree. C. for 5 min. 2.5
.mu.l of each pair of PCR products were mixed and used to transform
chemically competent E. coli (XL-10 Gold).
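The cycling programme above can be written down as data and its total hold time summed, as in this sketch. The structure and names (`PROGRAM`, `hold_time_s`) are hypothetical, and the total ignores the ramping time between temperatures, which depends on the instrument.

```python
# Each entry: (number of cycles, [(step name, temperature in C, seconds), ...]).
PROGRAM = [
    (1, [("initial denaturation", 98, 120)]),
    (30, [("denaturation", 98, 15), ("annealing", 57, 30), ("extension", 72, 45)]),
    (1, [("final extension", 72, 300)]),
]


def hold_time_s(program):
    """Sum of all hold times in seconds, excluding temperature ramping."""
    return sum(
        cycles * sum(seconds for _, _, seconds in steps)
        for cycles, steps in program
    )


if __name__ == "__main__":
    total = hold_time_s(PROGRAM)
    print(f"{total} s ({total / 60:.0f} min)")
```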
1.3 Rapid In Vitro Transcription Translation
[0291] [.sup.35S]L-methionine labelled proteins were generated by
coupled in vitro transcription and translation (IVTT) using an E.
coli T7-S30 extract system for circular DNA (Promega). The complete
amino acid mixture (1 mM) minus cysteine and the complete amino
acid mixture (1 mM) minus methionine, supplied in the kit, were
mixed in equal volumes to obtain the working amino acid solution
required to generate high concentrations of the protein. Reactions
were scaled up or down based on the following, for a 50 .mu.l
reaction volume: 20 .mu.l S30 Premix solution; 5 .mu.l amino acid
mix; 1 .mu.l [.sup.35S]L-methionine (MP Biomedicals, 1175 Ci
mmol.sup.-1, 10 mCi ml.sup.-1), 1 .mu.l rifampicin (0.8 mg
ml.sup.-1), 8 .mu.l plasmid DNA (400 ng .mu.l.sup.-1) and 15 .mu.l
T7 S30 extract. Synthesis was carried out for 1.5 hours at
37.degree. C. to produce 50 .mu.l of radiolabelled IVTT protein.
Different proteins were also co-expressed in one reaction as for
coupled transcription, translation and oligomerisation. The
reaction components remained the same except the DNA concentration
was divided accordingly for each plasmid encoding each protein.
Protein samples were centrifuged at 14,000 rpm for 10 minutes to
separate insoluble debris of IVTT reactions.
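Scaling the 50 .mu.l recipe up or down, and splitting the DNA volume between plasmids for co-expression, can be sketched as follows. The equal split between plasmids is an assumption (the text says only that the DNA was "divided accordingly"), and the function and key names are hypothetical.

```python
BASE_VOLUME_UL = 50.0
DNA_KEY = "plasmid DNA (400 ng/ul)"

# Component volumes in microlitres for a 50 ul reaction, as listed above.
BASE_RECIPE_UL = {
    "S30 Premix solution": 20.0,
    "amino acid mix": 5.0,
    "[35S]L-methionine": 1.0,
    "rifampicin (0.8 mg/ml)": 1.0,
    DNA_KEY: 8.0,
    "T7 S30 extract": 15.0,
}


def scale_recipe(volume_ul, n_plasmids=1):
    """Scale the recipe to `volume_ul`; split DNA equally among plasmids.

    Assumption: co-expressed plasmids receive equal shares of the DNA volume.
    """
    if volume_ul <= 0 or n_plasmids < 1:
        raise ValueError("volume must be positive and n_plasmids at least 1")
    factor = volume_ul / BASE_VOLUME_UL
    recipe = {name: vol * factor for name, vol in BASE_RECIPE_UL.items()}
    dna_total = recipe.pop(DNA_KEY)
    for index in range(1, n_plasmids + 1):
        recipe[f"plasmid {index} DNA (400 ng/ul)"] = dna_total / n_plasmids
    return recipe


if __name__ == "__main__":
    for name, vol in scale_recipe(25.0, n_plasmids=2).items():
        print(f"{name}: {vol:.2f} ul")
```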
1.4 In Vivo Protein Expression
[0292] Wild-type .alpha.-hemolysin and fusion constructs were
cloned into the expression vector pT7-SC1, under the control of the
inducible T7 promoter, and expressed in E. coli (BL21 DE3 pLysS,
Stratagene) as soluble proteins. Cultures were grown to a high
OD.sub.600 (approximately 1.5-2) at 37.degree. C. and 240 rpm in
Terrific broth medium (100 .mu.g ml.sup.-1 ampicillin and 30
.mu.g ml.sup.-1 chloramphenicol). The temperature was reduced to
18.degree. C. and cultures left for 30 minutes to equilibrate. Over
expression of the target protein was induced by addition of IPTG to
the medium (0.2 mM). After 18 hours cells were pelleted at 10,000
rpm for 30 minutes at 4.degree. C. Cells were resuspended and lysed
by the addition of BugBuster (Novagen) supplemented with benzonase,
EDTA-free proteinase inhibitors (Roche) and MgCl.sub.2 to 50 mM.
Cell debris was pelleted by centrifugation at
10,000 rpm for 30 minutes at 4.degree. C. and polyethyleneimine
(PEI) added to the supernatant. The recovered supernatant was
incubated for 30 mins at 4.degree. C. after which precipitate was
removed by centrifugation at 10,000 rpm for 30 minutes at 4.degree.
C. Clarified lysate was filtered and adjusted to pH 8.0, 500 mM
NaCl, 10 mM Imidazole.
[0293] His-tagged proteins were purified as standard practice by
Ni-NTA affinity chromatography and gel filtration. Non-tagged
.alpha.-hemolysin subunits were purified as standard practice by
cation exchange followed by gel filtration.
1.4.1 Affinity Purification (His-tag)
[0294] Clarified lysate was filtered and adjusted to pH 8.0, 500 mM
NaCl, 10 mM Imidazole before loading onto a His-Trap crude column
(GE Healthcare) and eluted with 300 mM Imidazole. Fractions
containing the protein of interest were combined and applied to a
gel filtration column equilibrated with 10 mM TRIS pH 8.0, 100 mM
NaCl, 1 mM DTT. Eluted protein was evaluated by SDS-PAGE.
1.4.2 Ion Exchange
[0295] Clarified lysate was filtered and adjusted to 10 mM MES pH
6.0 before loading onto a cation exchange column (GE Healthcare)
and eluting with 0-500 mM NaCl. Fractions containing the protein of
interest were combined and applied to a gel filtration column.
Eluted protein was evaluated by SDS-PAGE.
[0296] To maintain the reactivity of engineered cysteine residues
in .alpha.-Hemolysin derivatives, required as sites for chemical
modification, proteins were purified using the same buffers but
supplemented to 1 mM DTT. Exonucleases or exonuclease fusion
proteins were purified using the same buffers supplemented to 1 mM
MgCl.sub.2.
1.5 Oligomerisation on Red Blood Cell Membranes
[0297] .alpha.-Hemolysin monomers were mixed in various molar
ratios and allowed to oligomerise on rabbit erythrocyte membranes
(2.5 mg protein ml.sup.-1) for 1 hour at either room temperature,
30.degree. C., 37.degree. C. or 42.degree. C. After the incubation,
reaction mixture was centrifuged at 14,000 rpm for 10 minutes and
supernatant discarded. Membrane pellet was washed by resuspension
in 200 .mu.l MBSA (10 mM MOPS, 150 mM NaCl, pH 7.4 containing 1 mg
ml.sup.-1 bovine serum albumin) and centrifuging again at 14,000
rpm for 10 minutes. After discarding the supernatant, membrane
pellet was dissolved in 75 .mu.l of 1.times. Laemmli sample buffer,
with the addition of .beta.-mercaptoethanol. The entire sample was
loaded into a single well of a 5% SDS-polyacrylamide gel and
electrophoresed for .about.18 hours at 50 V, with 0.01 mM sodium
thioglycolate included in the running buffer. Gel was vacuum-dried
onto Whatman 3MM filter paper at 50.degree. C. for about three
hours and exposed to an X-ray film overnight (Kodak). The oligomer
band was excised from the gel, using the autoradiogram as template,
and the gel slice rehydrated in 300 .mu.l TE buffer (10 mM Tris, 1
mM EDTA, pH 8.0) containing 2 mM DTT. After removing the Whatman
filter paper slice, gel piece was crushed using a sterile pestle.
Oligomer protein was separated from gel debris by centrifuging
through 0.2 .mu.m cellulose acetate microfilterfuge tubes (Rainin) at
14,000 rpm for 30 min. Filtrate was stored in aliquots at
-80.degree. C.
1.6 Oligomerisation on Synthetic Lipid Vesicles
[0298] Synthetic lipid vesicles composed of: 30% cholesterol; 30%
phosphatidylcholine (PC); 20% phosphatidylethanolamine (PE); 10%
sphingomyelin (SM); 10% phosphatidylserine (PS); were prepared by
bath sonication for 15 minutes at room temperature. Organic solvent
was evaporated under a gentle stream of nitrogen until a dry film was
produced. Deionised water was added to give the required concentration
of 2.5 mg ml.sup.-1 and the mixture was bath sonicated again for 15 minutes.
Wild-type .alpha.-hemolysin and fusion monomers were mixed in
various molar ratios and allowed to oligomerise on synthetic lipid
vesicles (2.5 mg ml.sup.-1 for every 1 mg .alpha.-hemolysin
monomer) for 1 hour at either room temperature, 30.degree. C.,
37.degree. C. or 42.degree. C. and 350 rpm. To pellet
lipid-associated proteins, samples were centrifuged at 14,000 rpm for 10
minutes. Pellet was washed once in MBSA (10 mM MOPS, 150 mM NaCl,
pH 7.4 containing 1 mg ml.sup.-1 bovine serum albumin) and lipids
were dissolved by addition of 0.1-1% n-Dodecyl-D-maltopyranoside
(DDM), for 1 hour at either 4.degree. C. or room temperature. To
purify the fusion homo and heteroheptamers away from wild-type
homoheptamer 300 .mu.l of Ni-NTA agarose (Qiagen) was added and
left overnight at 4.degree. C. and 350 rpm. Affinity bound heptamer
was pelleted with Ni-NTA agarose by centrifugation at 14,000 rpm for
10 minutes. The Ni-NTA agarose beads were washed twice in 500 .mu.l
wash buffer (10 mM Tris, 10 mM Imidazole, 500 mM NaCl, pH 8.0) for
10 minutes and recovered by centrifugation. Purified heteroheptamer
was eluted in 500 .mu.l elution buffer (10 mM Tris, 250 mM
Imidazole, pH 8.0) for 1 hour at 4.degree. C. The Ni-NTA agarose
was removed by centrifugation and the supernatant containing the
eluted purified fusion heptamers removed. Eluted heptamers were
de-salted by passage through a buffer exchange column (NAP-5, GE
Healthcare), equilibrated with 10 mM Tris pH 8.0.
1.7 Exonuclease Fluorescence Assay
[0299] Recombinant E. coli Exonuclease III was purchased from New
England Biolabs (100 units .mu.l.sup.-1). Double stranded DNA
template labelled with a 5' fluorophore (5HEX) on the sense strand
and a 3' black hole quencher (BHQ-2a-Q) on the antisense strand was
obtained from Operon.
[0300] The oligo sequences are given below along with the
respective fluorophore and quencher pair:
TABLE-US-00003
(SEQ ID NO: 31) 5'[5HEX]GCAACAGAGCTGATGGATCAAATGCATTAGGTAAACATGTTACGTCGTAA 3'
(SEQ ID NO: 32) 5'CGATCTTACGACGTAACATGTTTACCTAATGCATTTGATCCATCAGCTCTGTTGC[BHQ2a] 3'
The substrate dsDNA has a 5 bp overhang at the 5' end of the
antisense strand, enabling initiation of exonuclease III on the 3'
end of the sense strand.
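The stated overhang can be checked directly from the two oligo sequences. The sketch below is an illustrative check only, and it assumes that the spaces inside the printed sequences are line-break artifacts rather than part of the oligos. The helper names are not from the original.

```python
SENSE = "GCAACAGAGCTGATGGATCAAATGCATTAGGTAAACATGTTACGTCGTAA"           # SEQ ID NO: 31, 5'->3'
ANTISENSE = "CGATCTTACGACGTAACATGTTTACCTAATGCATTTGATCCATCAGCTCTGTTGC"  # SEQ ID NO: 32, 5'->3'

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def five_prime_overhang(sense, antisense):
    """Return the unpaired 5' bases of the antisense strand.

    Raises ValueError if the remainder of the antisense strand is not
    fully complementary to the sense strand.
    """
    paired = reverse_complement(sense)
    if not antisense.endswith(paired):
        raise ValueError("strands are not fully complementary")
    return antisense[: len(antisense) - len(paired)]


if __name__ == "__main__":
    overhang = five_prime_overhang(SENSE, ANTISENSE)
    print(len(SENSE), len(ANTISENSE), overhang)
```

The unpaired bases at the 5' end of the antisense strand leave the 3' end of the sense strand recessed, which is the configuration the text describes as enabling exonuclease III initiation.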
[0301] Fluorescence measurements were taken using a Cary Eclipse
(Varian) with an excitation and emission wavelength of 535 and 554
nm respectively and an excitation and emission slit of 5 nm.
Measurements were taken every 4 seconds for 60 minutes. 40 .mu.l
reactions were performed at 37.degree. C. and consisted of: 200 nM
substrate dsDNA; 25 mM Tris pH 7.5; 1 mM MgCl.sub.2; 100 mM KCl;
0.001 units Exo III; unless otherwise stated.
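As a worked example of the quantities implied by these assay conditions, the following sketch computes the amount of substrate per reaction and the number of sampling intervals. It takes the substrate concentration as 200 nM in a 40 .mu.l reaction. The function name and the returned keys are illustrative assumptions.

```python
def assay_quantities(volume_ul=40.0, substrate_nM=200.0,
                     interval_s=4.0, duration_min=60.0):
    """Return substrate amount (pmol) and number of sampling intervals."""
    # nM x ul gives femtomoles: 1e-9 mol/l x 1e-6 l = 1e-15 mol.
    substrate_fmol = substrate_nM * volume_ul
    return {
        "substrate_pmol": substrate_fmol / 1000.0,
        "sampling_intervals": round(duration_min * 60.0 / interval_s),
    }


if __name__ == "__main__":
    print(assay_quantities())
```

Under these conditions each reaction contains 8 pmol of duplex substrate, and a 60 minute run sampled every 4 seconds spans 900 intervals.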
1.8 Planar Bilayer Recordings
[0302] All bilayers were formed by apposition of two monolayers of
1,2-diphytanoyl-sn-glycero-3-phosphocholine (Avanti Polar Lipids)
across a 60-150 .mu.m diameter aperture in Teflon film (25 .mu.m
thickness from Goodfellow, Malvern, Pa.), which divided a chamber
into two buffer compartments (cis and trans) each with a volume of
1 ml. Bilayers were formed across the aperture by consecutively
raising the buffer level in each compartment until a high
resistance seal was observed (.gtoreq.10 G.OMEGA.). Unless
otherwise stated, fusion heptamers and DNA or dNMPs were added to
the cis compartment, which was connected to ground. The adapter
molecule am7.beta.CD or am6-amPDP1-.beta.CD was added to the trans
compartment if required, which was connected to the head-stage of
the amplifier. Unless stated otherwise, experiments were carried
out in 25 mM Tris.HCl, 400 mM KCl pH 8.0, at 22.degree. C.
1.9 Exonucleases
[0303] Exonucleases, such as deoxyribonucleases, are a subgroup of
the EC 3.1 enzymes. They catalyse the hydrolysis of the
phosphodiester bond between adjacent bases in a DNA strand to
release individual nucleoside 5' mono-phosphates (FIG. 1).
Attractive activities catalyse the cleavage of this bond (through
nucleophilic attack of an activated water molecule upon the
phosphorus) as shown.
[0304] There are a limited number of distinct enzymatic activities
that degrade nucleic acids into their component parts, although
numerous homologues will exist in different organisms (for example,
Exonuclease III). From a detailed literature search, the two most
processive exonuclease enzymes are Exonuclease I, encoded by the
sbcB gene of E. coli, and .lamda.-exonuclease, encoded by the exo
gene of bacteriophage .lamda. (Thomas, K. and Olivera, B. (1978)
Processivity of DNA exonucleases. Journal of Biological Chemistry.
253, 424-429; and Zagursky, R. and Hays, J. (1983). Expression of
the phage lambda recombination genes exo and bet under lacPO
control on a multi-copy plasmid. Gene. 23, 277-292). In addition,
activity of Exonuclease I has been demonstrated in high salt
concentrations (Hornblower, B., Coombs, A., Whitaker, R.,
Kolomeisky, A., Picone, S., Meller, A. and Akeson, M. (2007).
Single-molecule analysis of DNA-protein complexes using nanopores.
Nature Methods. 4, 315-317). As .lamda. exonuclease is a trimer, the
attachment of a functional exonuclease is more challenging, so the
monomeric enzyme Exonuclease III was also included: despite its
lower processivity, it also degrades one strand of dsDNA to
yield nucleoside 5' monophosphates. Whilst Exo I degrades ssDNA in
a 3'-5' direction, RecJ acts 5'-3' and so was also included in this
work (Lovett, S. and Kolodner, R. (1989). Identification and
purification of a single-stranded-DNA-specific exonuclease encoded
by the recJ gene of Escherichia coli. Proceedings of the National
Academy of Sciences of the United States of America. 86,
2627-2631). Both ssDNA exonucleases have been demonstrated to
interact and act cooperatively with single stranded binding protein
(Genschel, J., Curth, U. and Urbanke, C. (2000) Interaction of E.
coli single-stranded DNA binding protein (SSB) with exonuclease I.
The carboxy terminus of SSB is the recognition site for the
nuclease. Biological Chemistry. 381, 183-192; and Han, E., Cooper,
D., Persky, N., Sutera, V., Whitaker, R., Montello, M. and Lovett,
S. (2006). RecJ exonuclease: substrates, products and interaction
with SSB. Nucleic Acids Research. 34, 1084-1091). The use of these
proteins may be required to prevent secondary structure formation
of the ssDNA substrate that may impair enzyme initiation or processivity
in high salt concentrations.
[0305] Four exonucleases are used in this Example:
1. Exo III from E. coli, Monomeric, dsDNA, 3'-5' (SEQ ID NOs: 9 and 10)
2. Exo I from E. coli, Monomeric, ssDNA, 3'-5' (SEQ ID NOs: 11 and 12)
3. RecJ from T. thermophilus, Monomeric, ssDNA, 5'-3' (SEQ ID NOs: 13 and 14)
4. .lamda. Exo from .lamda. bacteriophage, Trimeric, dsDNA, 5'-3' (the sequence of one monomer is shown in SEQ ID NOs: 15 and 16)
[0306] High resolution crystal structures are available for all
these enzymes (Mol, C., Kuo, C., Thayer, M., Cunningham, R. and
Tainer, J. (1995) Structure and function of the multifunctional
DNA-repair enzyme exonuclease III. Nature. 374, 381-386; Kovall, R.
and Matthews, B. (1997). Toroidal structure of lambda-exonuclease.
Science. 277, 1824-1827; and Busam, R. (2008). Structure of
Escherichia coli exonuclease I in complex with thymidine
5'-monophosphate. Acta Crystallographica. 64, 206-210) and are
shown in FIG. 2. The TthRecJ is the enzyme's core domain as
identified by Yamagata et al. (Yamagata, A., Masui, R., Kakuta, Y.,
Kuramitsu, S. and Fukuyama, K. (2001).
1.10 Genetic Attachment
[0307] Taking the characteristics of the exonuclease as detailed
above, the work described here was guided by the generation of a
hypothetical model in which just one of the seven subunits of the
.alpha.HL heptamer is modified to carry the exonuclease activity.
FIG. 3 is a representation of the fusion construct assembled into a
heteroheptamer with the exonuclease attached to a loop on the cis
side of the protein. This model satisfies other additional
desirable characteristics. An exonuclease fused on the cis side of
the .alpha.HL heptamer under positive potential should release
monophosphate nucleosides or ssDNA that will migrate from the cis
to the trans side of the pore. This direction of migration is
standard in much of the published literature of nanopore sensing.
The genetic attachment of an exonuclease within a loop region also
invariably means that the N and C terminal linkers can be designed
to limit and constrain the mobility of the exonuclease in relation
to the lumen of the pore.
[0308] In order to create a genetic fusion of the .alpha.-HL and
the exonuclease proteins, genetic manipulation of the pre-existing
expression plasmid pT7-SC1 carrying the wild-type .alpha.-HL gene
was made (SEQ ID NO: 3). This plasmid carries the gene encoding the
wild-type .alpha.-HL (SEQ ID NO: 1) without the benefit of any
mutations that have been demonstrated to enhance the capacity of
the pore to detect and discriminate monophosphate nucleosides.
Unique BspEI restriction endonuclease sites were engineered into
the .alpha.-HL gene at three specific locations, to enable
insertion of the exonuclease gene, detailed below. Three plasmids
are thus generated, with each one carrying just a single BspEI site
for exonuclease gene insertion.
[0309] The first insertion site, L1, is located between residues
T18 and T19 of the first loop region (N6-V20) of the
.alpha.-hemolysin protein (SEQ ID NO: 6). The second insertion
site, L2, is located between residues D44 and D45 of the start of
the second loop region (D44-K50) of the .alpha.-hemolysin protein
(SEQ ID NO: 7). The third insertion site, L2b, is located between
residues K50 and K51 of the end of the second loop region (D44-K50)
of the .alpha.-hemolysin protein (SEQ ID NO: 8).
[0310] Exonuclease genes were codon optimised for expression in E.
coli and synthesised by GenScript Corporation (SEQ ID NOs: 10, 12,
15 and 16). Genes were flanked by regions encoding 10 residues of
repeating serine-glycine. Such a protein sequence is believed to be
substantially devoid of a defined secondary or tertiary structure.
The terminal ends of the linkers were also defined by recognition
sequences for the restriction endonuclease BspEI, as this sequence
also encodes a serine and glycine that form part of the linker. The
recognition site of this enzyme (TCCGGA) was similarly engineered
into the three specific locations within the .alpha.HL gene to
provide a means of inserting the exonuclease genes in frame at
these defined locations.
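The statement that the BspEI recognition sequence itself encodes a serine and a glycine can be illustrated with a short translation check. This is a sketch only. The linker DNA below is hypothetical and is not the sequence used in the constructs: it simply shows one way a 10-residue serine-glycine linker could begin with a single BspEI site while using synonymous codons elsewhere.

```python
# Standard genetic code entries for the four codons used in this example.
CODON_TABLE = {"TCC": "S", "AGC": "S", "GGA": "G", "GGC": "G"}

BSPEI_SITE = "TCCGGA"  # recognition sequence quoted in the text


def translate(dna):
    """Translate an in-frame DNA string using CODON_TABLE."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical linker: the BspEI site supplies the first Ser-Gly pair and the
# synonymous codons AGC/GGC supply the rest without adding further BspEI sites.
HYPOTHETICAL_LINKER_DNA = BSPEI_SITE + "AGCGGC" * 4

if __name__ == "__main__":
    print(translate(BSPEI_SITE))               # Ser-Gly encoded by the site itself
    print(translate(HYPOTHETICAL_LINKER_DNA))  # 10-residue serine-glycine repeat
```

Keeping the restriction site at the terminal end of the linker means the cloning junction contributes linker residues rather than extra amino acids, provided the insertion is made in frame.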
[0311] The recombinant gene encodes a fusion protein consisting of:
a portion of .alpha.HL; a 10 serine-glycine linker region; an
exonuclease; a 10 serine-glycine linker region; and the remaining
portion of .alpha.HL. Once made, the chimeric gene construct was
sequenced and verified to be as shown in FIG. 4.
[0312] Both the N and C-termini of .alpha.-hemolysin are suitable
for genetic fusion to an enzyme. It has been shown that the 17
N-terminal residues, which constitute the amino latch, are
dispensable for heptamer formation. Whilst it is not possible to
delete more than 3 residues from the C-terminus without affecting
oligomerisation, it is already readily presented as a possible
attachment point at the back of the cap domain (Walker, B. and
Bayley, H. (1995). Key residues for membrane binding,
oligomerization and pore-forming activity of Staphylococcal
.alpha.-hemolysin identified by cysteine scanning mutagenesis and
targeted chemical modification. The Journal of Biological
Chemistry. 270, 23065-23071).
[0313] The attachment of enzymes at the N and C-terminus of
.alpha.-hemolysin was carried out in a similar manner to that
described above. The enzyme and .alpha.-hemolysin domains were
again joined by serine-glycine rich linkers to ensure the
physical and spatial separation necessary for correct folding
of each protein domain. The exact details of attachment
are, however, given in a later section.
[0314] The hemolysin monomers were initially used in a wild-type
(wt) background; however, we have shown that a HL-M113R/N139Q monomer
shows improved base discrimination, and the baseline was changed to
this background. Further work showed that the best base resolution
was achieved when an adapter molecule was attached at the L135C
position, so this mutation was added to the hemolysin-exonuclease
fusion in later constructs.
[0315] In the construct nomenclature, the monomer HL-M113R/N139Q is
abbreviated to HL-RQ and the HL-M113R/N139Q/L135C monomer is
abbreviated to HL-RQC. Therefore the fusion construct
HL-(M113R/N139Q).sub.6(M113R/N139Q/L135C-EcoExoIII-L1-H6).sub.1 is
shortened to HL-(RQ).sub.6(RQC-EcoExoIII-L1-H6).sub.1.
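The abbreviation rule can be expressed as a simple string substitution. The sketch below is illustrative only (the function name is an assumption). The longer mutation set is replaced first so that M113R/N139Q/L135C is not partially rewritten as RQ/L135C.

```python
# Longest pattern first, so the triple mutant is not partially matched.
ABBREVIATIONS = [
    ("M113R/N139Q/L135C", "RQC"),
    ("M113R/N139Q", "RQ"),
]


def abbreviate(construct_name):
    """Apply the RQ / RQC shorthand to a full construct name."""
    for full, short in ABBREVIATIONS:
        construct_name = construct_name.replace(full, short)
    return construct_name


if __name__ == "__main__":
    full_name = "HL-(M113R/N139Q).sub.6(M113R/N139Q/L135C-EcoExoIII-L1-H6).sub.1"
    print(abbreviate(full_name))
```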
2 Results
2.1 Oligomerisation of Loop 1 Fusion Proteins
[0316] Water soluble .alpha.-hemolysin monomers can bind to and
self-assemble on a lipid membrane to form a transmembrane pore of
defined structure, via an intermediate heptameric prepore (Walker,
B. and Bayley, H. (1995). Key residues for membrane binding,
oligomerization and pore-forming activity of Staphylococcal
.alpha.-hemolysin identified by cysteine scanning mutagenesis and
targeted chemical modification. The Journal of Biological
Chemistry. 270, 23065-23071). Fully assembled pores can then be
isolated and recovered through SDS PAGE, for biophysical
characterisation. Radiolabelled .alpha.-hemolysin monomers produced
through in vitro transcription translation (IVTT) and oligomerised
on purified rabbit red blood cell membranes, enable heptamers to be
recovered from the gel using the autoradiograph as template.
Modified monomers can also be incorporated into the heptamer in any
number and at any of the subunit positions (1-7). The modified
subunit also typically carries a poly-aspartate tail to allow the
differential migration of homo or heteroheptamers on SDS PAGE for
ease of purification for each variant (Braha, O., Walker, B.,
Cheley, S., Kasianowicz, J., Song, L., Gouaux, J. and Bayley, H.
(1997). Designed protein pores as components for biosensors.
Chemistry and Biology. 4, 497-505). Due to the size of the
exonuclease proteins it was not expected that a poly-aspartate tail
would be required on the fusion monomers, as the exonuclease alone
should cause a significant shift in electrophoretic mobility to
enable identification of individual heteroheptamers away from
wild-type homoheptamer.
[0317] To determine if a mixture of HL-RQ and fusion monomers was
able to form heteroheptamers, [.sup.35S]L-methionine labelled HL-RQ
and fusion proteins (HL-wt-EcoExoIII-L1-H6 (SEQ ID NO: 18),
HL-RQC-EcoExoIII-L1-H6 (SEQ ID NO: 20), HL-RQC-EcoExoI-L1-H6 (SEQ
ID NO: 22) and HL-RQC-TthRecJ-L1-H6 (SEQ ID NO: 24)) were expressed
by IVTT and oligomerised on purified rabbit red blood cell
membranes. The autoradiograph of the gel identified several
putative heptamer bands of differing size for all enzyme fusions
(FIG. 5).
[0318] To characterise these heptamer bands and to identify the
ratio of subunits within each, proteins were excised from the gel.
Heating heptamer at 95.degree. C. for 10 minutes breaks the protein
into its constitutive monomers, which can then be visualised on SDS
PAGE for densitometry to determine the heptamer subunit
composition. The different characteristic heptamer bands can then
be identified as homo or heteroheptamers that consist of different
ratios of wild-type and fusion .alpha.-HL monomers. This
characterisation was performed for putative heptamer bands
generated using both the HL-wt-EcoExoIII-L1-H6 and
HL-RQC-EcoExoI-L1-H6 fusion proteins.
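One way the densitometry step could be turned into a subunit count is sketched below. This is an assumed calculation, not the procedure stated in the text: it supposes that each band's [.sup.35S] signal is proportional to the number of subunit copies multiplied by the number of methionine residues per subunit. The methionine counts are inputs the user must supply and the values in the example are hypothetical.

```python
def fusion_subunits_per_heptamer(fusion_signal, wt_signal,
                                 met_per_fusion, met_per_wt, subunits=7):
    """Estimate how many of the heptamer's subunits are fusion monomers.

    Signals are band intensities from the autoradiograph of the heat-
    dissociated heptamer. Dividing by the methionine count per monomer
    (an assumed normalisation) converts signal to relative molar amount.
    """
    fusion_moles = fusion_signal / met_per_fusion
    wt_moles = wt_signal / met_per_wt
    fraction_fusion = fusion_moles / (fusion_moles + wt_moles)
    return round(fraction_fusion * subunits)


if __name__ == "__main__":
    # Hypothetical example: fusion monomer with 10 Met, unmodified monomer with 5 Met.
    print(fusion_subunits_per_heptamer(10.0, 30.0, met_per_fusion=10, met_per_wt=5))
```

Rounding to the nearest integer reflects the fact that a heptamer can only contain a whole number of fusion subunits between 0 and 7.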
[0319] An important requirement for a sequencing application is that
there preferentially be only one exonuclease moiety, ensuring bases are
released only from a single DNA strand being processed at any one
time. Electrophoretic migration of a 6:1 HL-monomer:HL-Exonuclease
species away from other oligomers is therefore desired for ease of
purification. Surprisingly, the
HL-(RQ).sub.6(wt-EcoExoIII-L1-H6).sub.1 heptamer migrates to a
position slightly lower down the gel than HL-(RQ).sub.7, despite
the presence of a .about.36 kDa exonuclease on one of
the subunits. This band also has a "doublet" appearance, possibly
caused by incorrect incorporation of the fusion subunit's amino
latch due to the downstream insertion of the exonuclease in loop 1
or translation initiating at two points (the start of the fusion
protein at hemolysin M1 and also at the first methionine of ExoIII)
giving a mixed pool of fusion proteins. The EcoExoIII fusion
protein gives formation of all theoretical heteroheptamer varieties
and the wild-type and fusion protein homoheptamers. As a
significantly smaller protein, .about.36 kDa, and with its N and C
terminus co-localised it is perhaps unsurprising that EcoExoIII
performs better than EcoExoI or TthRecJ as an exonuclease suitable
for inserting into loop regions to give good heteroheptamer
formation. Both the EcoExoI and TthRecJ fusion proteins still
show formation of heteroheptamers, although with a limited number
of fusion monomer subunits, but in contrast to the 6:1 heteroheptamer
of EcoExoIII these 6:1 heteroheptamers migrate to a position
identical to HL-(RQ).sub.7.
[0320] It is an important consideration that by varying the ratio
of wild-type to fusion monomer different bands corresponding to the
different homo and heteroheptamers were observed. This allows the
control of homo or heteroheptamer formation based on the molar
ratio of different monomer subunits, which is important for the
preferential generation of HL-(RQ).sub.6 (RQ-Exonuclease-H6).sub.1
(FIG. 6).
[0321] The conditions for the
HL-(RQ).sub.6(wt-EcoExoIII-L1-H6).sub.1 heteroheptamer formation
were optimised by varying the ratios of monomer proteins. A
preferred ratio of 100:1 gives predominantly formation of one type
of heteroheptamer, HL-(RQ).sub.6(wt-EcoExoIII-L1-H6).sub.1, as well
as wild-type homoheptamer, HL-(RQ).sub.7. Affinity purification by
the hexa-His tag of the fusion subunit then allows separation of
heteroheptamer from HL-RQ homoheptamer.
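The dependence of heptamer composition on monomer ratio can be illustrated with a simple binomial model. This is an assumption made for illustration, not a model stated in the text: it supposes that the two monomer types are incorporated into heptamers randomly and with equal efficiency, which may not hold for bulky fusion subunits.

```python
from math import comb


def heptamer_distribution(unmodified_to_fusion_ratio, subunits=7):
    """Return P(k fusion subunits) for k = 0..subunits under random assembly."""
    p_fusion = 1.0 / (unmodified_to_fusion_ratio + 1.0)
    return [
        comb(subunits, k) * p_fusion ** k * (1.0 - p_fusion) ** (subunits - k)
        for k in range(subunits + 1)
    ]


if __name__ == "__main__":
    dist = heptamer_distribution(100.0)
    for k, prob in enumerate(dist):
        print(f"{k} fusion subunit(s): {prob:.4f}")
    # Share of fusion-containing heptamers that carry exactly one fusion subunit.
    print(f"6:1 share of tagged heptamers: {dist[1] / (1.0 - dist[0]):.3f}")
```

Under this model a 100:1 ratio yields mostly homoheptamer without fusion subunits, and nearly all heptamers that do carry a His-tagged fusion subunit carry exactly one. This is qualitatively consistent with the observation that the 6:1 heteroheptamer predominates and can then be separated from untagged homoheptamer by affinity purification.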
[0322] The HL-(wt-EcoExoIII-L1-H6).sub.7 homoheptamer and the
HL-(RQ).sub.6(wt-EcoExoIII-L1-H6).sub.1 heteroheptamer bands were
excised from the gel and the protein pores recovered by
re-hydration and maceration of the gel slice. These isolated
heptamers were both able to insert into planar lipid bilayers to
give single channel recordings. The single channel trace for the
HL-(wt-EcoExoIII-L1-H6).sub.7 homoheptamer, however, exhibited
numerous blocking events at .gtoreq.80 mV. This could be attributed
to the presence of seven denatured exonuclease peptide chains
surrounding the cap domain, as these events were significantly less
pronounced with the HL-(RQ).sub.6(wt-EcoExoIII-L1-H6).sub.1
heteroheptamer. The HL-(RQ).sub.6(wt-EcoExoIII-L1-H6).sub.1
heteroheptamer gave an open pore current of .about.160 pA and a
heteroheptamer containing the mutations necessary for base
discrimination HL-(RQ).sub.6(RQC-EcoExoIII-L1-H6).sub.1 showed
covalent attachment of the .beta.-cyclodextrin adapter molecule,
which is characterised by a persistent current block to .about.90
pA.
[0323] The construction of a fusion protein involves the linking of
two proteins or domains of proteins by a peptide linker. Linker
sequence with regard to length, flexibility and hydrophilicity is
important so as not to disturb the functions of the domains. The
linker regions of loop 1 fusion constructs were initially designed
to be of sufficient length to allow the correct folding of both the
exonuclease and .alpha.-hemolysin domains of the fusion protein.
However, of importance to the release of monophosphate nucleosides
in a proximity to the pore lumen is the length and conformation of
the linker regions. At some point, however, the linkers will become
too short to connect the subunits in their native conformation
without strain, which may be particularly detrimental to
exonuclease activity and probably oligomerisation. The length of
the linkers was therefore reduced to (SG).sub.4, (SG).sub.2 and
(SG).sub.1 to determine the effect on oligomerisation efficiency.
For oligomerisation the shortened (SG).sub.4 and (SG).sub.2 linkers
had no adverse effect on the efficiency of heteroheptamer
formation. The effect of these shortened linkers on the enzyme
activity was not determined but the (SG).sub.4 fusion protein
showed increased expression of soluble protein, which is an
indicator of correctly folded proteins.
[0324] The conformational flexibility of these linkers will also
have an effect on the exonuclease position in relation to the pore
lumen at any given time. While conformational flexibility may be
required at the N and C-terminus linker juncture, too much
flexibility in the rest of the linker may be detrimental to the
co-localisation of the exonuclease active site to the pore lumen.
The absence of a .beta.-carbon in glycine permits the polypeptide
backbone to access dihedral angles that other amino acids cannot.
Proline, as a cyclic imino acid, has no amide hydrogen to donate in
hydrogen bonding so cannot fit into either .alpha.-helix or
.beta.-strand secondary structure. Poly-proline regions are
therefore stiff with the absence of secondary structure. By in vivo
homologous recombination of PCR products the 10 serine-glycine
linker was replaced with 5 proline residues. The use of rigid
polyproline "molecular rulers" was then determined for loop 1
EcoExoIII constructs as the linker between the C-terminus of the
exonuclease and the N-terminus of .alpha.-hemolysin (FIG. 7).
[0325] Heteroheptamer formation was not abolished demonstrating the
potential use of polyproline as a linker between the C-terminus of
EcoExoIII and .alpha.-hemolysin T19 for the fusion protein.
Although both fusion proteins showed a lower yield of
heteroheptamers where the fusion protein is predominant, the
formation in particular of HL-(RQ).sub.6(RQC-EcoExoIII-L1-H6).sub.1
was unaffected.
[0326] The use of different length flexible linkers and alternative
rigid linkers for optimising the position and conformational
freedom of the exonuclease in relation to the pore lumen, as well
as a method for optimising the formation of preferentially 6:1
heteroheptamers, has been demonstrated.
2.2 Mutagenesis and Oligomerisation of Loop 2 Fusion Proteins
[0327] The high yield of heteroheptamers generated by IVTT proteins
for the EcoExoIII in loop 1 gave confidence for insertion of
EcoExoIII into other loop regions, in particular both positions
within loop 2 (FIG. 8). As this loop region connects two integral
beta strands, it is likely that any enzymes that do not have a
co-localised N and C-terminus will be too disruptive to the
.alpha.-hemolysin domain, abolishing the ability of this protomer
to oligomerize. Only very long linker regions may enable genetic
attachment of EcoExoI or TthRecJ at these positions, due to their N
and C-terminus localising to domains at distal ends of the
respective enzymes.
[0328] The oligomerisation of the HL-RQC-EcoExoIII-L2a-H6 and
HL-RQC-EcoExoIII-L2b-H6 fusion proteins was poor and only heptamers
with an electrophoretic mobility similar to HL-(RQ).sub.7 and
HL-(RQ).sub.6(RQC-EcoExoIII-L1-H6).sub.1 were observed. As
oligomerisation of HL-RQC-EcoExoIII-L2a-H6 was slightly improved
over the HL-RQC-EcoExoIII-L2b-H6 fusion protein, modification was
carried out to improve the formation of heteroheptamer. Deletions
of residues around the insertion site were made in an attempt to
accommodate the terminal linker residues. In addition certain
residues in loop 2 may be important for heptamer self-assembly.
Sequence alignment of the .alpha.-hemolysin monomer with other
.beta.-pore forming toxin monomers, LukS and LukF, indicates loop 2
is a highly conserved region and in particular residue D45, which
is the residue immediately after the exonuclease linker
juncture.
[0329] The crystal structure of the .alpha.-hemolysin heptamer also
indicates that H48 is important to binding the amino latch of the
adjoining subunit, at position T22 and D24 (Song, L., Hobaugh, M.,
Shustak, C., Cheley, S., Bayley, H. and Gouaux, E. (1996).
Structure of Staphylococcal .alpha.-hemolysin, a heptameric
transmembrane pore. Science. 274, 1859-1865). Attempts to modify
the insertion point to accommodate and characterise these
potentially important interactions were therefore made.
[0330] Around the loop 2a EcoExoIII insertion site (D44-D45)
residues D45, K46 and N47 were sequentially deleted by in vivo
homologous recombination of PCR products. To determine the
importance of H48 the site of insertion was also changed to lie
between N47-N49, deleting H48 entirely. As previously stated, linker
flexibility can have an important effect on the interaction of domains
within a fusion protein. Therefore the flexible 10 serine glycine
linkers were replaced with rigid 8 proline linkers in an attempt to
confer greater domain separation. Each loop 2 fusion construct was
expressed via IVTT and mixed in a 2.5:1 ratio with wild-type in the
presence of purified rabbit red blood cell membranes. Any
improvement in oligomerisation was determined by densitometry of
the autoradiograph (FIG. 9).
[0331] Oligomerisation of the L2 fusion protein was abolished when
the flexibility of the linker was changed to a more rigid
polyproline linker. In addition deletion of H48 and positioning of
the exonuclease insertion between N47 and N49 abolished
heteroheptamer formation. It appeared that only deletion of
residues from around the D44-D45 insertion site improved
oligomerisation of the fusion protein. To determine if this could
further be improved residue D45 was added back to the loop 2
deletion fusion proteins in a position adjacent to D44, before the
EcoExoIII insertion site (FIG. 10).
[0332] Heteroheptamer formation was not affected by the position of
residue D45 and indeed adding back this residue to all fusion
proteins was detrimental to oligomerisation, possibly as it reduced
the number of residues deleted to accommodate the exonuclease by
one as a consequence. Accommodating the exonuclease is therefore
the key to improving the oligomerisation of the loop 2 fusion
protein (as in SEQ ID NO: 26). The insertion site was varied
further in an attempt to determine how close to the .beta..sub.2
strand the insertion site could be. The position within the loop
region could be important for the relative positioning of the
EcoExoIII active site in relation to the pore lumen and it is
predicted the closer to .beta..sub.2 the better the presentation of
cleaved monophosphate nucleosides. In each fusion construct the
insertion site was not only varied but the following three residues
of .alpha.-hemolysin at the C-terminus of EcoExoIII were deleted in
order to accommodate the exonuclease. Oligomerisation of the
alternative loop 2 fusion proteins
HL-(RQ).sub.6(RQC-EcoExoIII-L2-D45-N47.DELTA.-H6).sub.1,
HL-(RQ).sub.6(RQC-EcoExoIII-L2-F42-D46.DELTA.-H6).sub.1 and
HL-(RQ).sub.6(RQC-EcoExoIII-L2-I43-D46.DELTA.-H6).sub.1 determined
that the insertion point can lie anywhere within the loop region
but as soon as it breaks a region of secondary structure all
oligomerisation is abolished (FIG. 10).
[0333] Whilst the linkers in the loop 2 fusion protein require some
degree of flexibility, as determined by the fact that rigid
polyproline linkers could not substitute, the length can be
reduced. The linker regions were shortened as for the loop 1
EcoExoIII fusion protein to (SG).sub.4, (SG).sub.3, (SG).sub.2 and
(SG).sub.1 to determine the effect on oligomerisation efficiency.
For oligomerisation the shortened (SG).sub.4, (SG).sub.3 and
(SG).sub.2 linkers had no adverse effect on the efficiency of
heteroheptamer formation. The effect of these shortened linkers on
the enzyme activity was not, however, determined.
2.3 Genetic Attachment at the N and C-Terminus of
.alpha.-Hemolysin
[0334] Genetic attachment of two proteins, typically an enzyme to
an antibody, has previously focused on the fusion of one protein's
C-terminus to another protein's N-terminus, mediated by a peptide
linker. As previously mentioned, strategies for the attachment of a
DNA handling enzyme to the C or N-terminus of .alpha.-hemolysin were
considered, in particular the attachment of EcoExoI and the Klenow
fragment. Attachment of EcoExoI at the C-terminus was mediated by
five different linkers in order to determine the optimum fusion
protein for oligomerisation. As the C-terminus is at the back of
the .alpha.-hemolysin cap domain a turn of approximately
180.degree. was desired. In order to initiate this turn either a
Gly-Asp or Trp-Pro-Val motif was added at the start of the linker
peptide. Two linker peptides were also used, either a flexible 16
residue serine-glycine linker or a 12 residue polyproline linker. As
early results from the EcoExoI loop 1 fusion protein indicated that
the 6:1 heteroheptamer had the same electrophoretic mobility as the
wild-type homoheptamer, a mixture of radiolabelled and
non-radiolabelled IVTT monomers was used for oligomerisation.
Monomers were mixed in a
1:1 ratio and oligomerised on purified rabbit red blood cell
membranes (FIG. 11).
[0335] Although the predominant fusion protein produced is the 6:1
heteroheptamer, this migrates to the same position as the
HL-(RQ).sub.7 homoheptamer. Therefore the proteins corresponding to
HL-(RQ).sub.5(RQC-EcoExoI-Cter-{SG}8-H6).sub.2,
HL-(RQ).sub.5(RQC-EcoExoI-Cter-DG{SG}8-H6).sub.2 as well as the
HL-(RQ).sub.5(RQC-EcoExoI-L1-H6).sub.2 heteroheptamer from an
earlier experiment were purified from SDS and the ability to insert
into planar lipid bilayers determined. All heteroheptamers were
capable of inserting into the lipid bilayer to give single channel
recordings.
[0336] The successful fusion of EcoExoI at the C-terminus of
.alpha.-hemolysin, mediated by an (SG).sub.8 or DG(SG).sub.8
peptide linker, provides a method for the later attachment of
other DNA handling enzymes via genetic fusion, such as the Klenow
fragment (SEQ ID NOs: 28 and 30). The advantages of the Klenow
fragment are that it provides a molecular motor for strand
sequencing and also shows some resistance to SDS PAGE (Akeson,
Personal Communication).
2.4 Non-SDS PAGE Purification of Heptamers
[0337] Sodium dodecyl sulphate (SDS) is an anionic surfactant that
is highly denaturing to proteins, due to its ability to disrupt
non-covalent bonds and bind to the peptide chain. As existing
heptamer purification techniques rely on the use of SDS PAGE, the
effect of this detergent on EcoExoIII was determined by a
fluorescence based activity assay (FIG. 12, left panel).
[0338] Even a low concentration of SDS abolished EcoExoIII activity
for the native enzyme, making the classical SDS PAGE purification
of heptamers denaturing with regard to the exonuclease moiety of a
fusion protein heteroheptamer. An alternative purification method
was therefore developed using a different detergent,
n-dodecyl-D-maltopyranoside (DDM). The effect of this surfactant on
EcoExoIII was determined and it was found to be non-denaturing to
the native enzyme (FIG. 12, right panel). Following oligomerisation
on rabbit red blood cell membranes, instead of purifying heptamers
via SDS PAGE, the lipid membranes were dissolved by addition of 0.1% DDM
for 15 minutes. Heteroheptamers were then purified away from the
wild-type homoheptamer by affinity purification to the hexa-His tag
on the C-terminus of the fusion protein. A buffer exchange further
removed any surfactant and heptamers were then used for single
channel recordings. This method does not distinguish entirely
between heteroheptamers of different stoichiometry, so the formation
of the 5:2 heteroheptamer was limited by optimising the ratios of
monomers mixed.
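The dependence of heteroheptamer stoichiometry on the monomer mixing ratio can be illustrated with a simple binomial model. The sketch below is an idealised baseline only: it assumes that each of the seven subunit positions is filled independently and that wild-type and fusion monomers incorporate with equal efficiency, which the experiments described here do not establish. The function names and the example ratios are hypothetical and chosen for illustration.

```python
from math import comb


def heptamer_distribution(fusion_fraction, subunits=7):
    """Return [P(0 fusion subunits), ..., P(7 fusion subunits)].

    Assumption (illustrative only): every subunit position is filled
    independently, with probability `fusion_fraction` of being a fusion
    monomer, i.e. a plain binomial model.
    """
    p = fusion_fraction
    return [
        comb(subunits, k) * p**k * (1 - p) ** (subunits - k)
        for k in range(subunits + 1)
    ]


def ratio_5to2_over_6to1(fusion_fraction):
    """Expected abundance of 5:2 heteroheptamer relative to 6:1."""
    dist = heptamer_distribution(fusion_fraction)
    return dist[2] / dist[1]


if __name__ == "__main__":
    # Hypothetical wild-type:fusion mixing ratios.
    for wild_type, fusion in [(1, 1), (10, 1), (100, 1)]:
        fraction = fusion / (wild_type + fusion)
        dist = heptamer_distribution(fraction)
        print(
            f"{wild_type}:{fusion}  P(7:0)={dist[0]:.4f}  "
            f"P(6:1)={dist[1]:.4f}  P(5:2)={dist[2]:.4f}  "
            f"5:2 / 6:1 = {ratio_5to2_over_6to1(fraction):.3f}"
        )
```

Under this model the 5:2 to 6:1 ratio equals three times the fusion to wild-type ratio, so a large excess of wild-type monomer suppresses the 5:2 species at the cost of producing mostly homoheptamer, which the hexa-His affinity step then removes.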
[0339] Purification via DDM extraction produced heptamers that
showed an increased number of blocking events and surfactant
behaviour on the lipid bilayer in single channel recordings. Whilst
the cause of this instability remains undetermined, it is likely to
be a result of other membrane proteins released from the rabbit red
blood cell membranes, either affecting the lipid bilayer directly
or else increasing the protein associated surfactant carryover.
Oligomerisation of .alpha.-hemolysin monomers is classically
facilitated either on purified rabbit red blood cell membranes or
deoxycholate micelles. The yield of heptamer from deoxycholate is
too poor in this instance to be of use and as previously mentioned
the use of purified rabbit red blood cell membranes led to lipid
bilayer instability. As an alternative, synthetic lipid vesicles
were developed based on the lipid composition of rabbit red blood
cell membranes. These vesicles lack the other membrane proteins of
rabbit red blood cell membranes and are composed of 30% cholesterol,
30% phosphatidylcholine (PC), 20% phosphatidylethanolamine (PE),
10% sphingomyelin (SM) and 10% phosphatidylserine (PS). The
synthetic lipid vesicles developed here give approximately the same
efficiency of heptamerisation as observed for rabbit red blood cell
membranes. Heptamers purified from these synthetic lipid vesicles
by DDM extraction also showed a dramatic decrease in the
occurrences of lipid bilayer instability.
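The stated vesicle composition can be turned into per-lipid amounts for a preparation of a given total size. The following sketch is a convenience calculation only: the text does not say whether the percentages are molar or by weight, so the result is in whatever unit the total is supplied in, and the function name is hypothetical.

```python
# Composition of the synthetic lipid vesicles as stated in the text.
# Whether these are mol% or wt% is not specified; treat the unit as
# an assumption made by the caller.
LIPID_PERCENT = {
    "cholesterol": 30,
    "phosphatidylcholine (PC)": 30,
    "phosphatidylethanolamine (PE)": 20,
    "sphingomyelin (SM)": 10,
    "phosphatidylserine (PS)": 10,
}


def lipid_amounts(total, percent=LIPID_PERCENT):
    """Split `total` (any unit) across the lipids by percentage."""
    if sum(percent.values()) != 100:
        raise ValueError("lipid percentages must sum to 100")
    return {name: total * pct / 100 for name, pct in percent.items()}


if __name__ == "__main__":
    # Hypothetical batch of 10 units of total lipid.
    for name, amount in lipid_amounts(10.0).items():
        print(f"{name}: {amount:.1f}")
```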
[0340] Oligomerisation and DDM purification of heptamers were also
assessed for E. coli expressed proteins. Expression of wild-type
and fusion monomers in E. coli gives a concentration sufficient for
large scale production of enzyme pores, typically 3 mg ml.sup.-1
and 1 mg ml.sup.-1 respectively. Monomers were oligomerised on
synthetic lipid vesicles at a ratio of 100:1 (wild-type:fusion) and
purified as detailed previously (FIG. 13).
[0341] High level E. coli expression of monomers that can be
oligomerised on synthetic lipid vesicles was achieved. Purification
of the 6:1 heteroheptamer was also achieved in conditions that are
non-denaturing to enzymes, ensuring activity of the pore's
exonuclease moiety.
2.5 Enzymatic Activity of Fusion Protein Heptamers
[0342] As the terminal ends of the enzyme are conformationally
constrained within loop regions of the .alpha.-hemolysin monomer,
the dynamic movements of the exonuclease domains necessary for
activity could be impacted. The native enzyme (Exonuclease III,
NEB) was able to cleave nucleotides from the dsDNA substrate to a
point where the sense strand was no longer of sufficient length to
hybridise to the antisense strand (.about.8 bp). On dissociation of
the DNA strands the fluorophore, at the 5' end of the sense strand,
was sufficiently spatially separated from its quencher pair, at the
3' end of the antisense strand, giving a fluorescence increase
relative to the enzyme activity. The activity of the native enzyme
was also determined in a range of salt concentrations (0-1M KCl).
Activity of the native enzyme was demonstrated in concentrations
.ltoreq.300 mM KCl, which is within the experimental conditions
required for single channel recordings and base discrimination. To
determine if exonuclease activity of the EcoExoIII moiety on the
fusion proteins was maintained after genetic attachment and
oligomerisation, its activity was determined in this same
fluorescence based DNA degradation assay (FIG. 14).
[0343] The EcoExoIII fusion proteins demonstrated retained
exonuclease activity, but as yet this is a qualitative rather than
quantitative indication as the amount of fusion protein was not
determined. Therefore the effect of genetic fusion of the EcoExoIII
to an .alpha.-hemolysin monomer on the rate of exonuclease activity
cannot be determined as yet.
[0344] The fusion protein was checked at all stages of purification
and found to retain exonuclease activity.
Following oligomerisation and DDM purification the activity of
fully formed pores was also checked and found to show some
exonuclease activity. This demonstrates the ability to genetically
couple an enzyme to a protein pore and still retain activity of the
enzyme after expression and oligomerisation to a fully assembled
pore.
2.6 Pore Forming Activity of Fusion Protein Heptamers
[0345] As previously mentioned in the text the ability of a variety
of different enzyme pore constructs to insert into lipid bilayers
for single channel recordings has been shown. We have demonstrated
that changes to the .beta.-barrel of the .alpha.-hemolysin protein
can enable covalent linkage and stabilisation of an adapter
molecule for continuous base detection. For this the pore
preferentially requires 6 subunits with mutations M113R/N139Q and 1
subunit with mutations M113R/N139Q/L135C. To determine if the
exonuclease domain of the fusion protein within loop regions
affected the ability of the pore to discriminate bases the
M113R/N139Q/L135C mutations were made in the fusion constructs. As
base discrimination preferentially requires a heteroheptamer with
only one subunit carrying the L135C mutation and the enzyme pore
preferentially one subunit being a fusion protein, the L135C
mutation was made in the fusion protein. The wild-type M113R and
N139Q construct from previous work was used for the other subunits.
E. coli expressed HL-RQ and HL-RQC-EcoExoIII-L2-D46-N47.DELTA.-H6
were oligomerised on synthetic lipid vesicles (at a ratio of 100:1)
and purified by DDM extraction. The exonuclease activity of the
fully formed pore was determined and indicated correct folding of
the exonuclease moiety. The protein was also used for
electrophysiology to determine firstly pore functionality and
secondly if base discrimination was possible (FIG. 19).
[0346] The 6:1 heteroheptamer can be inserted into a lipid bilayer
and a stable transmembrane current established. This current can be
modulated by the introduction of .beta.-cyclodextrin, and is
further reduced by the addition of monophosphate nucleosides. The
presence of the exonuclease domain appears to have no detrimental
effect on current flow or the base discrimination by the pore.
Although the work shown is for a heteroheptamer incorporating a
fusion protein with the insertion of EcoExoIII at the loop 2
position, similar data was acquired for the loop 1
heteroheptamers.
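The nucleotide entries of the sequence listing can be related to the amino acid entries, and to the M113R and N139Q mutations discussed above, using the standard genetic code. The sketch below is a minimal illustration: it assumes, which is not stated in this section, that SEQ ID NO: 2 is the translation of SEQ ID NO: 1 with the initiator methionine omitted. The helper name `translate` is hypothetical, and only the first 33 nucleotides of SEQ ID NO: 1 are used.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon ('*' marks a stop codon).

    Whitespace is ignored so that sequence-listing blocks can be pasted
    directly; trailing bases that do not fill a codon are dropped.
    """
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    # First 33 nucleotides of SEQ ID NO: 1 as printed in the listing.
    start_of_seq1 = "ATGGCAGATT CTGATATTAA TATTAAAACC GGT"
    protein = translate(start_of_seq1)
    print(protein)      # includes the initiator Met
    print(protein[1:])  # compare with the start of SEQ ID NO: 2
```

With the same helper, the codons ATG and AAT found in SEQ ID NO: 1 translate to M and N, while the codons AGG and CAA that replace them in SEQ ID NO: 3 translate to R and Q, consistent with the M113R and N139Q substitutions.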
Sequence listing SEQ ID NO: 1 1 ATGGCAGATT
CTGATATTAA TATTAAAACC GGTACTACAG ATATTGGAAG CAATACTACA GTAAAAACAG
71 GTGATTTAGT CACTTATGAT AAAGAAAATG GCATGCACAA AAAAGTATTT
TATAGTTTTA TCGATGATAA 141 AAATCACAAT AAAAAACTGC TAGTTATTAG
AACAAAAGGT ACCATTGCTG GTCAATATAG AGTTTATAGC 211 GAAGAAGGTG
CTAACAAAAG TGGTTTAGCC TGGCCTTCAG CCTTTAAGGT ACAGTTGCAA CTACCTGATA
281 ATGAAGTAGC TCAAATATCT GATTACTATC CAAGAAATTC GATTGATACA
AAAGAGTATA TGAGTACTTT 351 AACTTATGGA TTCAACGGTA ATGTTACTGG
TGATGATACA GGAAAAATTG GCGGCCTTAT TGGTGCAAAT 421 GTTTCGATTG
GTCATACACT GAAATATGTT CAACCTGATT TCAAAACAAT TTTAGAGAGC CCAACTGATA
491 AAAAAGTAGG CTGGAAAGTG ATATTTAACA ATATGGTGAA TCAAAATTGG
GGACCATACG ATCGAGATTC 561 TTGGAACCCG GTATATGGCA ATCAACTTTT
CATGAAAACT AGAAATGGTT CTATGAAAGC AGCAGATAAC 631 TTCCTTGATC
CTAACAAAGC AAGTTCTCTA TTATCTTCAG GGTTTTCACC AGACTTCGCT ACAGTTATTA
701 CTATGGATAG AAAAGCATCC AAACAACAAA CAAATATAGA TGTAATATAC
GAACGAGTTC GTGATGATTA 771 CCAATTGCAT TGGACTTCAA CAAATTGGAA
AGGTACCAAT ACTAAAGATA AATGGACAGA TCGTTCTTCA 841 GAAAGATATA
AAATCGATTG GGAAAAAGAA GAAATGACAA AT SEQ ID NO: 2 1 ADSDINIKTG
TTDIGSNTTV KTGDLVTYDK ENGMHKKVFY SFIDDKNHNK KLLVIRTKGT IAGQYRVYSE
71 EGANKSGLAW PSAFKVQLQL PDNEVAQISD YYPRNSIDTK EYMSTLTYGF
NGNVTGDDTG KIGGLIGANV 141 SIGHTLKYVQ PDFKTILESP TDKKVGWKVI
FNNMVNQNWG PYDRDSWNPV YGNQLFMKTR NGSMKAADNF 211 LDPNKASSLL
SSGFSPDFAT VITMDRKASK QQTNIDVIYE RVRDDYQLHW TSTNWKGTNT KDKWTDRSSE
281 RYKIDWEKEE MTN SEQ ID NO: 3 1 ATGGCAGATT CTGATATTAA TATTAAAACC
GGTACTACAG ATATTGGAAG CAATACTACA GTAAAAACAG 71 GTGATTTAGT
CACTTATGAT AAAGAAAATG GCATGCACAA AAAAGTATTT TATAGTTTTA TCGATGATAA
141 AAATCACAAT AAAAAACTGC TAGTTATTAG AACAAAAGGT ACCATTGCTG
GTCAATATAG AGTTTATAGC 211 GAAGAAGGTG CTAACAAAAG TGGTTTAGCC
TGGCCTTCAG CCTTTAAGGT ACAGTTGCAA CTACCTGATA 281 ATGAAGTAGC
TCAAATATCT GATTACTATC CAAGAAATTC GATTGATACA AAAGAGTATA GGAGTACTTT
351 AACTTATGGA TTCAACGGTA ATGTTACTGG TGATGATACA GGAAAAATTG
GCGGCCTTAT TGGTGCACAA 421 GTTTCGATTG GTCATACACT GAAATATGTT
CAACCTGATT TCAAAACAAT TTTAGAGAGC CCAACTGATA 491 AAAAAGTAGG
CTGGAAAGTG ATATTTAACA ATATGGTGAA TCAAAATTGG GGACCATACG ATCGAGATTC
561 TTGGAACCCG GTATATGGCA ATCAACTTTT CATGAAAACT AGAAATGGTT
CTATGAAAGC AGCAGATAAC 631 TTCCTTGATC CTAACAAAGC AAGTTCTCTA
TTATCTTCAG GGTTTTCACC AGACTTCGCT ACAGTTATTA 701 CTATGGATAG
AAAAGCATCC AAACAACAAA CAAATATAGA TGTAATATAC GAACGAGTTC GTGATGATTA
771 CCAATTGCAT TGGACTTCAA CAAATTGGAA AGGTACCAAT ACTAAAGATA
AATGGACAGA TCGTTCTTCA 841 GAAAGATATA AAATCGATTG GGAAAAAGAA
GAAATGACAA AT SEQ ID NO: 4 1 ADSDINIKTG TTDIGSNTTV KTGDLVTYDK
ENGMHKKVFY SFIDDKNHNK KLLVIRTKGT IAGQYRVYSE 71 EGANKSGLAW
PSAFKVQLQL PDNEVAQISD YYPRNSIDTK EYRSTLTYGF NGNVTGDDTG KIGGLIGAQV
141 SIGHTLKYVQ PDFKTILESP TDKKVGWKVI FNNMVNQNWG PYDRDSWNPV
YGNQLFMKTR NGSMKAADNF 211 LDPNKASSLL SSGFSPDFAT VITMDRKASK
QQTNIDVIYE RVRDDYQLHW TSTNWKGTNT KDKWTDRSSE 281 RYKIDWEKEE MTN SEQ
ID NO: 5 1 TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG
TCATGATAAT AATGGTTTCT 71 TAGACGTCAG GTGGCACTTT TCGGGGAAAT
GTGCGCGGAA CCCCTATTTG TTTATTTTTC TAAATACATT 141 CAAATATGTA
TCCGCTCATG AGACAATAAC CCTGATAAAT GCTTCAATAA TATTGAAAAA GGAAGAGTAT
211 GAGTATTCAA CATTTCCGTG TCGCCCTTAT TCCCTTTTTT GCGGCATTTT
GCCTTCCTGT TTTTGCTCAC 281 CCAGAAACGC TGGTGAAAGT AAAAGATGCT
GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG 351 ATCTCAACAG
CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA
421 AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC
AACTCGGTCG CCGCATACAC 491 TATTCTCAGA ATGACTTGGT TGAGTACTCA
CCAGTCACAG AAAAGCATCT TACGGATGGC ATGACAGTAA 561 GAGAATTATG
CAGTGCTGCC ATAACCATGA GTGATAACAC TGCGGCCAAC TTACTTCTGA CAACGATCGG
631 AGGACCGAAG GAGCTAACCG CTTTTTTGCA CAACATGGGG GATCATGTAA
CTCGCCTTGA TCGTTGGGAA 701 CCGGAGCTGA ATGAAGCCAT ACCAAACGAC
GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT 771 TGCGCAAACT
ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC
841 GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT
TTATTGCTGA TAAATCTGGA 911 GCCGGTGAGC GTGGGTCTCG CGGTATCATT
GCAGCACTGG GGCCAGATGG TAAGCCCTCC CGTATCGTAG 981 TTATCTACAC
GACGGGGAGT CAGGCAACTA TGGATGAACG AAATAGACAG ATCGCTGAGA TAGGTGCCTC
1051 ACTGATTAAG CATTGGTAAC TGTCAGACCA AGTTTACTCA TATATACTTT
AGATTGATTT AAAACTTCAT 1121 TTTTAATTTA AAAGGATCTA GGTGAAGATC
CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT 1191 TTTCGTTCCA
CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG
1261 CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT
GTTTGCCGGA TCAAGAGCTA 1331 CCAACTCTTT TTCCGAAGGT AACTGGCTTC
AGCAGAGCGC AGATACCAAA TACTGTCCTT CTAGTGTAGC 1401 CGTAGTTAGG
CCACCACTTC AAGAACTCTG TAGCACCGCC TACATACCTC GCTCTGCTAA TCCTGTTACC
1471 AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG TCTTACCGGG TTGGACTCAA
GACGATAGTT ACCGGATAAG 1541 GCGCAGCGGT CGGGCTGAAC GGGGGGTTCG
TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC 1611 TGAGATACCT
ACAGCGTGAG CTATGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC
1681 GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG
GAAACGCCTG GTATCTTTAT 1751 AGTCCTGTCG GGTTTCGCCA CCTCTGACTT
GAGCGTCGAT TTTTGTGATG CTCGTCAGGG GGGCGGAGCC 1821 TATGGAAAAA
CGCCAGCAAC GCGTCCTTTT TACGGTTCCT GGCCTTTTGC TGGCCTTTTG CTCACATGTT
1891 CTTTCCTGCG TTATCCCCTG ATTCTGTGGA TAACCGTATT ACCGCCTTTG
AGTGAGCTGA TACCGCTCGC 1961 CGCAGCCGAA CGACCGAGCG CAGCGAGTCA
GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC 2031 TCCTTACGCA
TCTGTGCGGT ATTTCACACC GCATATATGG TGCACTCTCA GTACAATCTG CTCTGATGCC
2101 GCATAGTTAA GCCAGTATAC ACTCCGCTAT CGCTACGTGA CTGGGTCATG
GCTGCGCCCC GACACCCGCC 2171 AACACCCGCT GACGCGCCCT GACGGGCTTG
TCTGCTCCCG GCATCCGCTT ACAGACAAGC TGTGACCGTC 2241 TCCGGGAGCT
GCATGTGTCA GAGGTTTTCA CCGTCATCAC CGAAACGCGC GAGGCAGCGC TCTCCCTTAT
2311 GCGACTCCTG CATTAGGAAG CAGCCCAGTA GTAGGTTGAG GCCGTTGAGC
ACCGCCGCCG CAAGGAATGG 2381 TGCATGCAAG GAGATGGCGC CCAACAGTCC
CCCGGCCACG GGGCCTGCCA CCATACCCAC GCCGAAACAA 2451 GCGCTCATGA
GCCCGAAGTG GCGAGCCCGA TCTTCCCCAT CGGTGATGTC GGCGATATAG GCGCCAGCAA
2521 CCGCACCTGT GGCGCCGGTG ATGCCGGCCA CGATGCGTCC GGCGTAGAGG
ATCGAGATCT AGCCCGCCTA 2591 ATGAGCGGGC TTTTTTTTAG ATCTCGATCC
CGCGAAATTA ATACGACTCA CTATAGGGAG ACCACAACGG 2661 TTTCCCTCTA
GAAATAATTT TGTTTAACTT TAAGAAGGAG ATATACATAT GGCAGATTCT GATATTAATA
2731 TTAAAACCGG TACTACAGAT ATTGGAAGCA ATACTACAGT AAAAACAGGT
GATTTAGTCA CTTATGATAA 2801 AGAAAATGGC ATGCACAAAA AAGTATTTTA
TAGTTTTATC GATGATAAAA ATCACAATAA AAAACTGCTA 2871 GTTATTAGAA
CAAAAGGTAC CATTGCTGGT CAATATAGAG TTTATAGCGA AGAAGGTGCT AACAAAAGTG
2941 GTTTAGCCTG GCCTTCAGCC TTTAAGGTAC AGTTGCAACT ACCTGATAAT
GAAGTAGCTC AAATATCTGA 3011 TTACTATCCA AGAAATTCGA TTGATACAAA
AGAGTATATG AGTACTTTAA CTTATGGATT CAACGGTAAT 3081 GTTACTGGTG
ATGATACAGG AAAAATTGGC GGCCTTATTG GTGCAAATGT TTCGATTGGT CATACACTGA
3151 AATATGTTCA ACCTGATTTC AAAACAATTT TAGAGAGCCC AACTGATAAA
AAAGTAGGCT GGAAAGTGAT 3221 ATTTAACAAT ATGGTGAATC AAAATTGGGG
ACCATACGAT CGAGATTCTT GGAACCCGGT ATATGGCAAT
3291 CAACTTTTCA TGAAAACTAG AAATGGTTCT ATGAAAGCAG CAGATAACTT
CCTTGATCCT AACAAAGCAA 3361 GTTCTCTATT ATCTTCAGGG TTTTCACCAG
ACTTCGCTAC AGTTATTACT ATGGATAGAA AAGCATCCAA 3431 ACAACAAACA
AATATAGATG TAATATACGA ACGAGTTCGT GATGATTACC AATTGCATTG GACTTCAACA
3501 AATTGGAAAG GTACCAATAC TAAAGATAAA TGGACAGATC GTTCTTCAGA
AAGATATAAA ATCGATTGGG 3571 AAAAAGAAGA AATGACAAAT TAATGTAAAT
TATTTGTACA TGTACAAATA AATATAATTT ATAACTTTAG 3641 CCGAAAGCTT
GGATCCGGCT GCTAACAAAG CCCGAAAGGA AGCTGAGTTG GCTGCTGCCA CCGCTGAGCA
3711 ATAACTAGCA TAACCCCTTG GGGCCTCTAA ACGGGTCTTG AGGGGTTTTT
TGCTGAAAGG AGGAACTATA 3781 TATAATTCGA GCTCGGTACC CACCCCGGTT
GATAATCAGA AAAGCCCCAA AAACAGGAAG ATTGTATAAG 3851 CAAATATTTA
AATTGTAAAC GTTAATATTT TGTTAAAATT CGCGTTAAAT TTTTGTTAAA TCAGCTCATT
3921 TTTTAACCAA TAGGCCGAAA TCGGCAAAAT CCCTTATAAA TCAAAAGAAT
AGACCGAGAT AGGGTTGAGT 3991 GTTGTTCCAG TTTGGAACAA GAGTCCAGTA
TTAAAGAACG TGGACTCCAA CGTCAAAGGG CGAAAAACCG 4061 TCTATCAGGG
CGATGGCCCA CTACGTGAAC CATCACCCTA ATCAAGTTTT TTGGGGTCGA GGTGCCGTAA
4131 AGCACTAAAT CGGAACCCTA AAGGGATGCC CCGATTTAGA GCTTGACGGG
GAAAGCCGGC GAACGTGGCG 4201 AGAAAGGAAG GGAAGAAAGC GAAAGGAGCG
GGCGCTAGGG CGCTGGCAAG TGTAGCGGTC ACGCTGCGCG 4271 TAACCACCAC
ACCCGCCGCG CTTAATGCGC CGCTACAGGG CGCGTGGGGA TCCTCTAGAG TCGACCTGCA
4341 GGCATGCAAG CTATCCCGCA AGAGGCCCGG CAGTACCGGC ATAACCAAGC
CTATGCCTAC AGCATCCAGG 4411 GTGACGGTGC CGAGGATGAC GATGAGCGCA
TTGTTAGATT TCATACACGG TGCCTGACTG CGTTAGCAAT 4481 TTAACTGTGA
TAAACTACCG CATTAAAGCT AGCTTATCGA TGATAAGCTG TCAAACATGA GAA SEQ ID
NO: 6 1 ATGGCAGATT CTGATATTAA TATTAAAACC GGTACTACAG ATATTGGAAG
CAATACTTCC GGAACAGTAA 71 AAACAGGTGA TTTAGTCACT TATGATAAAG
AAAATGGCAT GCACAAAAAA GTATTTTATA GTTTTATCGA 141 TGATAAAAAT
CACAATAAAA AACTGCTAGT TATTAGAACA AAAGGTACCA TTGCTGGTCA ATATAGAGTT
211 TATAGCGAAG AAGGTGCTAA CAAAAGTGGT TTAGCCTGGC CTTCAGCCTT
TAAGGTACAG TTGCAACTAC 281 CTGATAATGA AGTAGCTCAA ATATCTGATT
ACTATCCAAG AAATTCGATT GATACAAAAG AGTATATGAG 351 TACTTTAACT
TATGGATTCA ACGGTAATGT TACTGGTGAT GATACAGGAA AAATTGGCGG CCTTATTGGT
421 GCAAATGTTT CGATTGGTCA TACACTGAAA TATGTTCAAC CTGATTTCAA
AACAATTTTA GAGAGCCCAA 491 CTGATAAAAA AGTAGGCTGG AAAGTGATAT
TTAACAATAT GGTGAATCAA AATTGGGGAC CATACGATCG 561 AGATTCTTGG
AACCCGGTAT ATGGCAATCA ACTTTTCATG AAAACTAGAA ATGGTTCTAT GAAAGCAGCA
631 GATAACTTCC TTGATCCTAA CAAAGCAAGT TCTCTATTAT CTTCAGGGTT
TTCACCAGAC TTCGCTACAG 701 TTATTACTAT GGATAGAAAA GCATCCAAAC
AACAAACAAA TATAGATGTA ATATACGAAC GAGTTCGTGA 771 TGATTACCAA
TTGCATTGGA CTTCAACAAA TTGGAAAGGT ACCAATACTA AAGATAAATG GACAGATCGT
841 TCTTCAGAAA GATATAAAAT CGATTGGGAA AAAGAAGAAA TGACAAAT SEQ ID NO:
7 1 ATGGCAGATT CTGATATTAA TATTAAAACC GGTACTACAG ATATTGGAAG
CAATACTACA GTAAAAACAG 71 GTGATTTAGT CACTTATGAT AAAGAAAATG
GCATGCACAA AAAAGTATTT TATAGTTTTA TCGATTCCGG 141 AGATAAAAAT
CACAATAAAA AACTGCTAGT TATTAGAACA AAAGGTACCA TTGCTGGTCA ATATAGAGTT
211 TATAGCGAAG AAGGTGCTAA CAAAAGTGGT TTAGCCTGGC CTTCAGCCTT
TAAGGTACAG TTGCAACTAC 281 CTGATAATGA AGTAGCTCAA ATATCTGATT
ACTATCCAAG AAATTCGATT GATACAAAAG AGTATATGAG 351 TACTTTAACT
TATGGATTCA ACGGTAATGT TACTGGTGAT GATACAGGAA AAATTGGCGG CCTTATTGGT
421 GCAAATGTTT CGATTGGTCA TACACTGAAA TATGTTCAAC CTGATTTCAA
AACAATTTTA GAGAGCCCAA 491 CTGATAAAAA AGTAGGCTGG AAAGTGATAT
TTAACAATAT GGTGAATCAA AATTGGGGAC CATACGATCG 561 AGATTCTTGG
AACCCGGTAT ATGGCAATCA ACTTTTCATG AAAACTAGAA ATGGTTCTAT GAAAGCAGCA
631 GATAACTTCC TTGATCCTAA CAAAGCAAGT TCTCTATTAT CTTCAGGGTT
TTCACCAGAC TTCGCTACAG 701 TTATTACTAT GGATAGAAAA GCATCCAAAC
AACAAACAAA TATAGATGTA ATATACGAAC GAGTTCGTGA 771 TGATTACCAA
TTGCATTGGA CTTCAACAAA TTGGAAAGGT ACCAATACTA AAGATAAATG GACAGATCGT
841 TCTTCAGAAA GATATAAAAT CGATTGGGAA AAAGAAGAAA TGACAAAT SEQ ID NO:
8 1 ATGGCAGATT CTGATATTAA TATTAAAACC GGTACTACAG ATATTGGAAG
CAATACTACA GTAAAAACAG 71 GTGATTTAGT CACTTATGAT AAAGAAAATG
GCATGCACAA AAAAGTATTT TATAGTTTTA TCGATGATAA 141 AAATCACAAT
AAATCCGGAA AACTGCTAGT TATTAGAACA AAAGGTACCA TTGCTGGTCA ATATAGAGTT
211 TATAGCGAAG AAGGTGCTAA CAAAAGTGGT TTAGCCTGGC CTTCAGGGTT
TAAGGTACAG TTGCAACTAC 281 CTGATAATGA AGTAGCTCAA ATATCTGATT
ACTATCCAAG AAATTCGATT GATACAAAAG AGTATATGAG 351 TACTTTAACT
TATGGATTCA ACGGTAATGT TACTGGTGAT GATACAGGAA AAATTGGCGG CCTTATTGGT
421 GCAAATGTTT CGATTGGTCA TACACTGAAA TATGTTCAAC CTGATTTCAA
AACAATTTTA GAGAGCCCAA 491 CTGATAAAAA AGTAGGCTGG AAAGTGATAT
TTAACAATAT GGTGAATCAA AATTGGGGAC CATACGATCG 561 AGATTCTTGG
AACCCGGTAT ATGGCAATCA ACTTTTCATG AAAACTAGAA ATGGTTCTAT GAAAGCAGCA
631 GATAACTTCC TTGATCCTAA CAAAGCAAGT TCTCTATTAT CTTCAGGGTT
TTCACCAGAC TTCGCTACAG 701 TTATTACTAT GGATAGAAAA GCATCCAAAC
AACAAACAAA TATAGATGTA ATATACGAAC GAGTTCGTGA 771 TGATTACCAA
TTGCATTGGA CTTCAACAAA TTGGAAAGGT ACCAATACTA AAGATAAATG GACAGATCGT
841 TCTTCAGAAA GATATAAAAT CGATTGGGAA AAAGAAGAAA TGACAAAT SEQ ID NO:
9 1 ATGAAATTTG TCTCTTTTAA TATCAACGGC CTGCGCGCCA GACCTCACCA
GCTTGAAGCC ATCGTCGAAA 71 AGCACCAACC GGATGTGATT GGCCTGCAGG
AGACAAAAGT TCATGACGAT ATGTTTCCGC TCGAAGAGGT 141 GGCGAAGCTC
GGCTACAACG TGTTTTATCA CGGGCAGAAA GGCCATTATG GCGTGGCGCT GCTGACCAAA
211 GAGACGCCGA TTGCCGTGCG TCGCGGCTTT CCCGGTGACG ACGAAGAGGC
GCAGCGGCGG ATTATTATGG 281 CGGAAATCCC CTCACTGCTG GGTAATGTCA
CCGTGATCAA CGGTTACTTC CCGCAGGGTG AAAGCCGCGA 351 CCATCCGATA
AAATTGGCGG CAAAAGCGCA GTTTTATCAG AATCTGCAAA ACTACCTGGA AACCGAACTC
421 AAACGTGATA ATCCGGTACT GATTATGGGC GATATGAATA TCAGCCCTAC
AGATCTGGAT ATCGGCATTG 491 GCGAAGAAAA CCGTAAGCGC TGGCTGCGTA
CCGGTAAATG CTCTTTCCTG CCGGAAGAGC GCGAATGGAT 561 GGACAGGCTG
ATGAGCTGGG GGTTGGTCGA TACCTTCCGC CATGCGAATC CGCAAACAGC AGATCGTTTC
631 TCATGGTTTG ATTACCGCTC AAAAGGTTTT GACGATAACC GTGGTCTGCG
CATCGACCTG CTGCTCGCCA 701 GCCAACCGCT GGCAGAATGT TGCGTAGAAA
CCGGCATCGA CTATGAAATC CGCAGCATGG AAAAACCGTC 771 CGATCACGCC
CCCGTCTGGG CGACCTTCCG CCGC SEQ ID NO: 10 1 MKFVSFNING LRARPHQLEA
IVEKHQPDVI GLQETKVHDD MFPLEEVAKL GYNVFYHGQK GHYGVALLTK 71
ETPIAVRRGF PGDDEEAQRR IIMAEIPSLL GNVTVINGYF PQGESRDHPI KFPAKAQFYQ
NLQNYLETEL 141 KRDNPVLIMG DMNISPTDLD IGIGEENRKR WLRTGKCSFL
PEEREWMDRL MSWGLVDTFR HANPQTADRF 211 SWFDYRSKGF DDNRGLRIDL
LLASQPLAEC CVETGIDYEI RSMEKPSDHA PVWATFRR SEQ ID NO: 11 1
ATGATGAATG ACGGTAAGCA ACAATCTACC TTTTTGTTTC ACGATTACGA AACCTTTGGC
ACGCACCCCG 71 CGTTAGATCG CCCTGCACAG TTCGCAGCCA TTCGCACCGA
TAGCGAATTC AATGTCATCG GCGAACCCGA 141 AGTCTTTTAC TGCAAGCCCG
CTGATGACTA TTTACCCCAG CCAGGAGCCG TATTAATTAC CGGTATTACC 211
CCGCAGGAAG CACGGGCGAA AGGAGAAAAC GAAGCCGCGT TTGCCGCCCG TATTCACTCG
CTTTTTACCG 281 TACCGAAGAC CTGTATTCTG GGCTACAACA ATGTGCGTTT
CGACGACGAA GTCACACGCA ACATTTTTTA 351 TCGTAATTTC TACGATCCTT
ACGCCTGGAG CTGGCAGCAT GATAACTCGC GCTGGGATTT ACTGGATGTT 421
ATGCGTGCCT GTTATGCCCT GCGCCCGGAA GGAATAAACT GGCCTGAAAA TGATGACGGT
CTACCGAGCT 491 TTCGCCTTGA GCATTTAACC AAAGCGAATG GTATTGAACA
TAGCAACGCC CACGATGCGA TGGCTGATGT 561 GTACGCCACT ATTGCGATGG
CAAAGCTGGT AAAAACGCGT CAGCCACGCC TGTTTGATTA TCTCTTTACC 631
CATCGTAATA AACACAAACT GATGGCGTTG ATTGATGTTC CGCAGATGAA ACCCCTGGTG
CACGTTTCCG
701 GAATGTTTGG AGCATGGCGC GGCAATACCA GCTGGGTGGC ACCGCTGGCG
TGGCATCCTG AAAATCGCAA 771 TGCCGTAATT ATGGTGGATT TGGCAGGAGA
CATTTCGCCA TTACTGGAAC TGGATAGCGA CACATTGCGC 841 GAGCGTTTAT
ATACCGCAAA AACCGATCTT GGCGATAACG CCGCCGTTCC GGTTAAGCTG GTGCATATCA
911 ATAAATGTCC GGTGCTGGCC CAGGCGAATA CGCTACGCCC GGAAGATGCC
GACCGACTGG GAATTAATCG 981 TCAGCATTGC CTCGATAACC TGAAAATTCT
GCGTGAAAAT CCGCAAGTGC GCGAAAAAGT GGTGGCGATA 1051 TTCGCGGAAG
CCGAACCGTT TACGCCTTCA GATAACGTGG ATGCACAGCT TTATAACGGC TTTTTCAGTG
1121 ACGCAGATCG TGCAGCAATG AAAATTGTGC TGGAAACCGA GCCGCGTAAT
TTACCGGCAC TGGATATCAC 1191 TTTTGTTGAT AAACGGATTG AAAAGCTGTT
GTTCAATTAT CGGGCACGCA ACTTCCCGGG GACGCTGGAT 1261 TATGCCGAGC
AGCAACGCTG GCTGGAGCAC CGTCGCCAGG TCTTCACGCC AGAGTTTTTG CAGGGTTATG
1331 CTGATGAATT GCAGATGCTG GTACAACAAT ATGCCGATGA CAAAGAGAAA
GTGGCGCTGT TAAAAGCACT 1401 TTGGCAGTAC GCGGAAGAGA TTGTC SEQ ID NO:
12 1 MMNDGKQQST FLFHDYETFG THPALDRPAQ FAAIRTDSEF NVIGEPEVFY
CKPADDYLPQ PGAVLITGIT 71 PQEARAKGEN EAAFAARIHS LFTVPKTCIL
GYNNVRFDDE VTRNIFYRNF YDPYAWSWQH DNSRWDLLDV 141 MRACYALRPE
GINWPENDDG LPSFRLEHLT KANGIEHSNA HDAMADVYAT IAMAKLVKTR QPRLFDYLFT
211 HRNKHKLMAL IDVPQMKPLV HVSGMFGAWR GNTSWVAPLA WHPENRNAVI
MVDLAGDISP LLELDSDTLR 281 ERLYTAKTDL GDNAAVPVKL VHINKCPVLA
QANTLRPEDA DRLGINRQHC LDNLKILREN PQVREKVVAI 351 FAEAEPFTPS
DNVDAQLYNG FFSDADRAAM KIVLETEPRN LPALDITFVD KRIEKLLFNY RARNFPGTLD
421 YAEQQRWLEH RRQVFTPEFL QGYADELQML VQQYADDKEK VALLKALWQY AEEIV
SEQ ID NO: 13 1 ATGTTTCGTC GTAAAGAAGA TCTGGATCCG CCGCTGGCAC
TGCTGCCGCT GAAAGGCCTG CGCGAAGCCG 71 CCGCACTGCT GGAAGAAGCG
CTGCGTCAAG GTAAACGCAT TCGTGTTCAC GGCGACTATG ATGCGGATGG 141
CCTGACCGGC ACCGCGATCC TGGTTCGTGG TCTGGCCGCC CTGGGTGCGG ATGTTCATCC
GTTTATCCCG 211 CACCGCCTGG AAGAAGGCTA TGGTGTCCTG ATGGAACGCG
TCCCGGAACA TCTGGAAGCC TCGGACCTGT 281 TTCTGACCGT TGACTGCGGC
ATTACCAACC ATGCGGAACT GCGCGAACTG CTGGAAAATG GCGTGGAAGT 351
CATTGTTACC GATCATCATA CGCCGGGCAA AACGCCGCCG CCGGGTCTGG TCGTGCATCC
GGCGCTGACG 421 CCGGATCTGA AAGAAAAACC GACCGGCGCA GGCGTGGCGT
TTCTGCTGCT GTGGGCACTG CATGAACGCC 491 TGGGCCTGCC GCCGCCGCTG
GAATACGCGG ACCTGGCAGC CGTTGGCACC ATTGCCGACG TTGCCCCGCT 561
GTGGGGTTGG AATCGTGCAC TGGTGAAAGA AGGTCTGGCA CGCATCCCGG CTTCATCTTG
GGTGGGCCTG 631 CGTCTGCTGG CTGAAGCCGT GGGCTATACC GGCAAAGCGG
TCGAAGTCGC TTTCCGCATC GCGCCGCGCA 701 TCAATGCGGC TTCCCGCCTG
GGCGAAGCGG AAAAAGCCCT GCGCCTGCTG CTGACGGATG ATGCGGCAGA 771
AGCTCAGGCG CTGGTCGGCG AACTGCACCG TCTGAACGCC CGTCGTCAGA CCCTGGAAGA
AGCGATGCTG 841 CGCAAACTGC TGCCGCAGGC CGACCCGGAA GCGAAAGCCA
TCGTTCTGCT GGACCCGGAA GGCCATCCGG 911 GTGTTATGGG TATTGTGGCC
TCTCGCATCC TGGAAGCGAC CCTGCGCCCG GTCTTTCTGG TGGCCCAGGG 981
CAAAGGCACC GTGCGTTCGC TGGCTCCGAT TTCCGCCGTC GAAGCACTGC GCAGCGCGGA
AGATCTGCTG 1051 CTGCGTTATG GTGGTCATAA AGAAGCGGCG GGTTTCGCAA
TGGATGAAGC GCTGTTTCCG GCGTTCAAAG 1121 CACGCGTTGA AGCGTATGCC
GCACGTTTCC CGGATCCGGT TCGTGAAGTG GCACTGCTGG ATCTGCTGCC 1191
GGAACCGGGC CTGCTGCCGC AGGTGTTCCG TGAACTGGCA CTGCTGGAAC CGTATGGTGA
AGGTAACCCG 1261 GAACCGCTGT TCCTG SEQ ID NO: 14 1 MFRRKEDLDP
PLALLPLKGL REAAALLEEA LRQGKRIRVH GDYDADGLTG TAILVRGLAA LGADVHPFIP
71 HRLEEGYGVL MERVPEHLEA SDLFLTVDCG ITNHAELREL LENGVEVIVT
DHHTPGKTPP PGLVVHPALT 141 PDLKEKPTGA GVAFLLLWAL HERLGLPPPL
EYADLAAVGT IADVAPLWGW NRALVKEGLA RIPASSWVGL 211 RLLAEAVGYT
GKAVEVAFRI APRINAASRL GEAEKALRLL LTDDAAEAQA LVGELHRLNA RRQTLEEAML
281 RKLLPQADPE AKAIVLLDPE GHPGVMGIVA SRILEATLRP VFLVAQGKGT
VRSLAPISAV EALRSAEDLL 351 LRYGGHKEAA GFAMDEALFP AFKARVEAYA
ARFPDPVREV ALLDLLPEPG LLPQVFRELA LLEPYGEGNP 421 EPLFL SEQ ID NO: 15
1 TCCGGAAGCG GCTCTGGTAG TGGTTCTGGC ATGACACCGG ACATTATCCT GCAGCGTACC
GGGATCGATG 71 TGAGAGCTGT CGAACAGGGG GATGATGCGT GGCACAAATT
ACGGCTCGGC GTCATCACCG CTTCAGAAGT 141 TCACAACGTG ATAGCAAAAC
CCCGCTCCGG AAAGAAGTGG CCTGACATGA AAATGTCCTA CTTCCACACC 211
CTGCTTGCTG AGGTTTGCAC CGGTGTGGCT CCGGAAGTTA ACGCTAAAGC ACTGGCCTGG
GGAAAACAGT 281 ACGAGAACGA CGCCAGAACC CTGTTTGAAT TCACTTCCGG
CGTGAATGTT ACTGAATCCC CGATCATCTA 351 TCGCGACGAA AGTATGCGTA
CCGCCTGCTC TCCCGATGGT TTATGCAGTG ACGGCAACGG CCTTGAACTG 421
AAATGCCCGT TTACCTCCCG GGATTTCATG AAGTTCCGGC TCGGTGGTTT CGAGGCCATA
AAGTCAGCTT 491 ACATGGCCCA GGTGCAGTAC AGCATGTGGG TGACGCGAAA
AAATGCCTGG TACTTTGCCA ACTATGACCC 561 GCGTATGAAG CGTGAAGGCC
TGCATTATGT CGTGATTGAG CGGGATGAAA AGTACATGGC GAGTTTTGAC 631
GAGATCGTGC CGGAGTTCAT CGAAAAAATG GACGAGGCAC TGGCTGAAAT TGGTTTTGTA
TTTGGGGAGC 701 AATGGCGATC TGGCTCTGGT TCCGGCAGCG GTTCCGGA SEQ ID NO:
16 1 MTPDIILQRT GIDVRAVEQG DDAWHKLRLG VITASEVHNV IAKPRSGKKW
PDMKMSYFHT LLAEVCTGVA 71 PEVNAKALAW GKQYENDART LFEFTSGVNV
TESPIIYRDE SMRTACSPDG LCSDGNGLEL KCPFTSRDFM 141 KFRLGGFEAI
KSAYMAQVQY SMWVTRKNAW YFANYDPRMK REGLHYVVIE RDEKYMASFD EIVPEFIEKM
211 DEALAEIGFV FGEQWR SEQ ID NO: 17 1 ATGGCAGATT CTGATATTAA
TATTAAAACC GGTACTACAG ATATTGGAAG CAATACTTCC GGAAGCGGCT 71
CTGGTAGTGG TTCTGGCATG AAATTTGTTA GCTTCAATAT CAACGGCCTG CGCGCGCGCC
CGCATCAGCT 141 GGAAGCGATT GTGGAAAAAC ATCAGCCGGA TGTTATTGGT
CTGCAGGAAA CCAAAGTTCA CGATGATATG 211 TTTCCGCTGG AAGAAGTGGC
GAAACTGGGC TATAACGTGT TTTATCATGG CCAGAAAGGT CATTATGGCG 281
TGGCCCTGCT GACCAAAGAA ACCCCGATCG CGGTTCGTCG TGGTTTTCCG GGTGATGATG
AAGAAGCGCA 351 GCGTCGTATT ATTATGGCGG AAATTCCGAG CCTGCTGGGC
AATGTGACCG TTATTAACGG CTATTTTCCG 421 CAGGGCGAAA GCCGTGATCA
TCCGATTAAA TTTCCGGCCA AAGCGCAGTT CTATCAGAAC CTGCAGAACT 491
ATCTGGAAAC CGAACTGAAA CGTGATAATC CGGTGCTGAT CATGGGCGAT ATGAACATTA
GCCCGACCGA 561 TCTGGATATT GGCATTGGCG AAGAAAACCG TAAACGCTGG
CTGCGTACCG GTAAATGCAG CTTTCTGCCG 631 GAAGAACGTG AATGGATGGA
TCGCCTGATG AGCTGGGGCC TGGTGGATAC CTTTCGTCAT GCGAACCCGC 701
AGACCGCCGA TCGCTTTAGC TGGTTTGATT ATCGCAGCAA AGGTTTTGAT GATAACCGTG
GCCTGCGCAT 771 TGATCTGCTG CTGGCGAGCC AGCCGCTGGC GGAATGCTGC
GTTGAAACCG GTATTGATTA TGAAATTCGC 841 AGCATGGAAA AACCGAGCGA
TCACGCCCCG GTGTGGGCGA CCTTTCGCCG CTCTGGCTCT GGTTCCGGCA 911
GCGGTTCCGG AACAGTAAAA ACAGGTGATT TAGTCACTTA TGATAAAGAA AATGGCATGC
ACAAAAAAGT 981 ATTTTATAGT TTTATCGATG ATAAAAATCA CAATAAAAAA
CTGCTAGTTA TTAGAACAAA AGGTACCATT 1051 GCTGGTCAAT ATAGAGTTTA
TAGCGAAGAA GGTGCTAACA AAAGTGGTTT AGCCTGGCCT TCAGCCTTTA 1121
AGGTACAGTT GCAACTACCT GATAATGAAG TAGCTCAAAT ATCTGATTAC TATCCAAGAA
ATTCGATTGA 1191 TACAAAAGAG TATATGAGTA CTTTAACTTA TGGATTCAAC
GGTAATGTTA CTGGTGATGA TACAGGAAAA 1261 ATTGGCGGCC TTATTGGTGC
AAATGTTTCG ATTGGTCATA CACTGAAATA TGTTCAACCT GATTTCAAAA 1331
CAATTTTAGA GAGCCCAACT GATAAAAAAG TAGGCTGGAA AGTGATATTT AACAATATGG
TGAATCAAAA 1401 TTGGGGACCA TACGATCGAG ATTCTTGGAA CCCGGTATAT
GGCAATCAAC TTTTCATGAA AACTAGAAAT 1471 GGTTCTATGA AAGCAGCAGA
TAACTTCCTT GATCCTAACA AAGCAAGTTC TCTATTATCT TCAGGGTTTT 1541
CACCAGACTT CGCTACAGTT ATTACTATGG ATAGAAAAGC ATCCAAACAA CAAACAAATA
TAGATGTAAT 1611 ATACGAACGA GTTCGTGATG ATTACCAATT GCATTGGACT
TCAACAAATT GGAAAGGTAC CAATACTAAA 1681 GATAAATGGA CAGATCGTTC
TTCAGAAAGA TATAAAATCG ATTGGGAAAA AGAAGAAATG
ACAAATGGTG 1751 GTTCGGGCTC ATCTGGTGGC TCGAGTCACC ATCATCATCA CCAC
SEQ ID NO: 18 1 ADSDINIKTG TTDIGSNTSG SGSGSGSGMK FVSFNINGLR
ARPHQLEAIV EKHQPDVIGL QETKVHDDMF 71 PLEEVAKLGY NVFYHGQKGH
YGVALLTKET PIAVRRGFPG DDEEAQRRII MAEIPSLLGN VTVINGYFPQ 141
GESRDHPIKF PAKAQFYQNL QNYLETELKR DNPVLIMGDM NISPTDLDIG IGEENRKRWL
RTGKCSFLPE 211 EREWMDRLMS WGLVDTFRHA NPQTADRFSW FDYRSKGFDD
NRGLRIDLLL ASQPLAECCV ETGIDYEIRS 281 MEKPSDHAPV WATFRRSGSG
SGSGSGTVKT GDLVTYDKEN GMHKKVFYSF IDDKNHNKKL LVIRTKGTIA 351
GQYRVYSEEG ANKSGLAWPS AFKVQLQLPD NEVAQISDYY PRNSIDTKEY MSTLTYGFNG
NVTGDDTGKI 421 GGLIGANVSI GHTLKYVQPD FKTILESPTD KKVGWKVIFN
NMVNQNWGPY DRDSWNPVYG NQLFMKTRNG 491 SMKAADNFLD PNKASSLLSS
GFSPDFATVI TMDRKASKQQ TNIDVIYERV RDDYQLHWTS TNWKGTNTKD 561
KWTDRSSERY KIDWEKEEMT NGGSGSSGGS SHHHHHH SEQ ID NO: 19 1 ATGGCAGATT
CTGATATTAA TATTAAAACC GGTACTACAG ATATTGGAAG CAATACTTCC GGAAGCGGCT
71 CTGGTAGTGG TTCTGGCATG AAATTTGTTA GCTTCAATAT CAACGGCCTG
CGCGCGCGCC CGCATCAGCT 141 GGAAGCGATT GTGGAAAAAC ATCAGCCGGA
TGTTATTGGT CTGCAGGAAA CCAAAGTTCA CGATGATATG 211 TTTCCGCTGG
AAGAAGTGGC GAAACTGGGC TATAACGTGT TTTATCATGG CCAGAAAGGT CATTATGGCG
281 TGGCCCTGCT GACCAAAGAA ACCCCGATCG CGGTTCGTCG TGGTTTTCCG
GGTGATGATG AAGAAGCGCA 351 GCGTCGTATT ATTATGGCGG AAATTCCGAG
CCTGCTGGGC AATGTGACCG TTATTAACGG CTATTTTCCG 421 CAGGGCGAAA
GCCGTGATCA TCCGATTAAA TTTCCGGCCA AAGCGCAGTT CTATCAGAAC CTGCAGAACT
491 ATCTGGAAAC CGAACTGAAA CGTGATAATC CGGTGCTGAT CATGGGCGAT
ATGAACATTA GCCCGACCGA 561 TCTGGATATT GGCATTGGCG AAGAAAACCG
TAAACGCTGG CTGCGTACCG GTAAATGCAG CTTTCTGCCG 631 GAAGAACGTG
AATGGATGGA TCGCCTGATG AGCTGGGGCC TGGTGGATAC CTTTCGTCAT GCGAACCCGC
701 AGACCGCCGA TCGCTTTAGC TGGTTTGATT ATCGCAGCAA AGGTTTTGAT
GATAACCGTG GCCTGCGCAT 771 TGATCTGCTG CTGGCGAGCC AGCCGCTGGC
GGAATGCTGC GTTGAAACCG GTATTGATTA TGAAATTCGC 841 AGCATGGAAA
AACCGAGCGA TCACGCCCCG GTGTGGGCGA CCTTTCGCCG CTCTGGCTCT GGTTCCGGCA
911 GCGGTTCCGG AACAGTAAAA ACAGGTGATT TAGTCACTTA TGATAAAGAA
AATGGCATGC ACAAAAAAGT 981 ATTTTATAGT TTTATCGATG ATAAAAATCA
CAATAAAAAA CTGCTAGTTA TTAGAACAAA AGGTACCATT 1051 GCTGGTCAAT
ATAGAGTTTA TAGCGAAGAA GGTGCTAACA AAAGTGGTTT AGCCTGGCCT TCAGCCTTTA
1121 AGGTACAGTT GCAACTACCT GATAATGAAG TAGCTCAAAT ATCTGATTAC
TATCCAAGAA ATTCGATTGA 1191 TACAAAAGAG TATAGGAGTA CTTTAACTTA
TGGATTCAAC GGTAATGTTA CTGGTGATGA TACAGGAAAA 1261 ATTGGCGGCT
GTATTGGTGC ACAAGTTTCG ATTGGTCATA CACTGAAATA TGTTCAACCT GATTTCAAAA
1331 CAATTTTAGA GAGCCCAACT GATAAAAAAG TAGGCTGGAA AGTGATATTT
AACAATATGG TGAATCAAAA 1401 TTGGGGACCA TACGATCGAG ATTCTTGGAA
CCCGGTATAT GGCAATCAAC TTTTCATGAA AACTAGAAAT 1471 GGTTCTATGA
AAGCAGCAGA TAACTTCCTT GATCCTAACA AAGCAAGTTC TCTATTATCT TCAGGGTTTT
1541 CACCAGACTT CGCTACAGTT ATTACTATGG ATAGAAAAGC ATCCAAACAA
CAAACAAATA TAGATGTAAT 1611 ATACGAACGA GTTCGTGATG ATTACCAATT
GCATTGGACT TCAACAAATT GGAAAGGTAC CAATACTAAA 1681 GATAAATGGA
CAGATCGTTC TTCAGAAAGA TATAAAATCG ATTGGGAAAA AGAAGAAATG ACAAATGGTG
1751 GTTCGGGCTC ATCTGGTGGC TCGAGTCACC ATCATCATCA CCAC SEQ ID NO: 20
1 ADSDINIKTG TTDIGSNTSG SGSGSGSGMK FVSFNINGLR ARPHQLEAIV EKHQPDVIGL
QETKVHDDMF 71 PLEEVAKLGY NVFYHGQKGH YGVALLTKET PIAVRRGFPG
DDEEAQRRII MAEIPSLLGN VTVINGYFPQ 141 GESRDHPIKF PAKAQFYQNL
QNYLETELKR DNPVLIMGDM NISPTDLDIG IGEENRKRWL RTGKCSFLPE 211
EREWMDRLMS WGLVDTFRHA NPQTADRFSW FDYRSKGFDD NRGLRIDLLL ASQPLAECCV
ETGIDYEIRS 281 MEKPSDHAPV WATFRRSGSG SGSGSGTVKT GDLVTYDKEN
GMHKKVFYSF IDDKNHNKKL LVIRTKGTIA 351 GQYRVYSEEG ANKSGLAWPS
AFKVQLQLPD NEVAQISDYY PRNSIDTKEY RSTLTYGFNG NVTGDDTGKI 421
GGCIGAQVSI GHTLKYVQPD FKTILESPTD KKVGWKVIFN NMVNQNWGPY DRDSWNPVYG
NQLFMKTRNG 491 SMKAADNFLD PNKASSLLSS GFSPDFATVI TMDRKASKQQ
TNIDVIYERV RDDYQLHWTS TNWKGTNTKD 561 KWTDRSSERY KIDWEKEEMT
NGGSGSSGGS SHHHHHH SEQ ID NO: 21 1 ATGGCAGATT CTGATATTAA TATTAAAACC
GGTACTACAG ATATTGGAAG CAATACTTCC GGAAGCGGCT 71 CTGGTAGTGG
TTCTGGCATG ATGAACGATG GCAAACAGCA GAGCACCTTC CTGTTTCATG ATTATGAAAC
141 CTTCGGTACC CATCCGGCCC TGGATCGTCC GGCGCAGTTT GCGGCCATTC
GCACCGATAG CGAATTCAAT 211 GTGATTGGCG AACCGGAAGT GTTTTATTGC
AAACCGGCCG ATGATTATCT GCCGCAGCCG GGTGCGGTGC 281 TGATTACCGG
TATTACCCCG CAGGAAGCGC GCGCGAAAGG TGAAAACGAA GCGGCGTTTG CCGCGCGCAT
351 TCATAGCCTG TTTACCGTGC CGAAAACCTG CATTCTGGGC TATAACAATG
TGCGCTTCGA TGATGAAGTT 421 ACCCGTAATA TCTTTTATCG TAACTTTTAT
GATCCGTATG CGTGGAGCTG GCAGCATGAT AACAGCCGTT 491 GGGATCTGCT
GGATGTGATG CGCGCGTGCT ATGCGCTGCG CCCGGAAGGC ATTAATTGGC CGGAAAACGA
561 TGATGGCCTG CCGAGCTTTC GTCTGGAACA TCTGACCAAA GCCAACGGCA
TTGAACATAG CAATGCCCAT 631 GATGCGATGG CCGATGTTTA TGCGACCATT
GCGATGGCGA AACTGGTTAA AACCCGTCAG CCGCGCCTGT 701 TTGATTATCT
GTTTACCCAC CGTAACAAAC ACAAACTGAT GGCGCTGATT GATGTTCCGC AGATGAAACC
771 GCTGGTGCAT GTGAGCGGCA TGTTTGGCGC CTGGCGCGGC AACACCAGCT
GGGTGGCCCC GCTGGCCTGG 841 CACCCGGAAA ATCGTAACGC CGTGATTATG
GTTGATCTGG CCGGTGATAT TAGCCCGCTG CTGGAACTGG 911 ATAGCGATAC
CCTGCGTGAA CGCCTGTATA CCGCCAAAAC CGATCTGGGC GATAATGCCG CCGTGCCGGT
981 GAAACTGGTT CACATTAACA AATGCCCGGT GCTGGCCCAG GCGAACACCC
TGCGCCCGGA AGATGCGGAT 1051 CGTCTGGGTA TTAATCGCCA GCATTGTCTG
GATAATCTGA AAATCCTGCG TGAAAACCCG CAGGTGCGTG 1121 AAAAAGTGGT
GGCGATCTTC GCGGAAGCGG AACCGTTCAC CCCGAGCGAT AACGTGGATG CGCAGCTGTA
1191 TAACGGCTTC TTTAGCGATG CCGATCGCGC GGCGATGAAA ATCGTTCTGG
AAACCGAACC GCGCAATCTG 1261 CCGGCGCTGG ATATTACCTT TGTTGATAAA
CGTATTGAAA AACTGCTGTT TAATTATCGT GCGCGCAATT 1331 TTCCGGGTAC
CCTGGATTAT GCCGAACAGC AGCGTTGGCT GGAACATCGT CGTCAGGTTT TCACCCCGGA
1401 ATTTCTGCAG GGTTATGCGG ATGAACTGCA GATGCTGGTT CAGCAGTATG
CCGATGATAA AGAAAAAGTG 1471 GCGCTGCTGA AAGCGCTGTG GCAGTATGCG
GAAGAAATCG TTTCTGGCTC TGGTTCCGGC AGCGGTTCCG 1541 GAACAGTAAA
AACAGGTGAT TTAGTCACTT ATGATAAAGA AAATGGCATG CACAAAAAAG TATTTTATAG
1611 TTTTATCGAT GATAAAAATC ACAATAAAAA ACTGCTAGTT ATTAGAACAA
AAGGTACCAT TGCTGGTCAA 1681 TATAGAGTTT ATAGCGAAGA AGGTGCTAAC
AAAAGTGGTT TAGCCTGGCC TTCAGCCTTT AAGGTACAGT 1751 TGCAACTACC
TGATAATGAA GTAGCTCAAA TATCTGATTA CTATCCAAGA AATTCGATTG ATACAAAAGA
1821 GTATAGGAGT ACTTTAACTT ATGGATTCAA CGGTAATGTT ACTGGTGATG
ATACAGGAAA AATTGGCGGC 1891 TGTATTGGTG CACAAGTTTC GATTGGTCAT
ACACTGAAAT ATGTTCAACC TGATTTCAAA ACAATTTTAG 1961 AGAGCCCAAC
TGATAAAAAA GTAGGCTGGA AAGTGATATT TAACAATATG GTGAATCAAA ATTGGGGACC
2031 ATACGATCGA GATTCTTGGA ACCCGGTATA TGGCAATCAA CTTTTCATGA
AAACTAGAAA TGGTTCTATG 2101 AAAGCAGCAG ATAACTTCCT TGATCCTAAC
AAAGCAAGTT CTCTATTATC TTCAGGGTTT TCACCAGACT 2171 TCGCTACAGT
TATTACTATG GATAGAAAAG CATCCAAACA ACAAACAAAT ATAGATGTAA TATACGAACG
2241 AGTTCGTGAT GATTACCAAT TGCATTGGAC TTCAACAAAT TGGAAAGGTA
CCAATACTAA AGATAAATGG 2311 ACAGATCGTT CTTCAGAAAG ATATAAAATC
GATTGGGAAA AAGAAGAAAT GACAAATGGT GGTTCGGGCT 2381 CATCTGGTGG
CTCGAGTCAC CATCATCATC ACCAC SEQ ID NO: 22 1 ADSDINIKTG TTDIGSNTSG
SGSGSGSGMM NDGKQQSTFL FHDYETFGTH PALDRPAQFA AIRTDSEFNV 71
IGEPEVFYCK PADDYLPQPG AVLITGITPQ EARAKGENEA AFAARIHSLF TVPKTCILGY
NNVRFDDEVT 141 RNIFYRNFYD PYAWSWQHDN SRWDLLDVMR ACYALRPEGI
NWPENDDGLP SFRLEHLTKA NGIEHSNAHD
211 AMADVYATIA MAKLVKTRQP RLFDYLFTHR NKHKLMALID VPQMKPLVHV
SGMFGAWRGN TSWVAPLAWH 281 PENRNAVIMV DLAGDISPLL ELDSDTLRER
LYTAKTDLGD NAAVPVKLVH INKCPVLAQA NTLRPEDADR 351 LGINRQHCLD
NLKILRENPQ VREKVVAIFA EAEPFTPSDN VDAQLYNGFF SDADRAAMKI VLETEPRNLP
421 ALDITFVDKR IEKLLFNYRA RNFPGTLDYA EQQRWLEHRR QVFTPEFLQG
YADELQMLVQ QYADDKEKVA 491 LLKALWQYAE EIVSGSGSGS GSGTVKTGDL
VTYDKENGMH KKVFYSFIDD KNHNKKLLVI RTKGTIAGQY 561 RVYSEEGANK
SGLAWPSAFK VQLQLPDNEV AQISDYYPRN SIDTKEYRST LTYGFNGNVT GDDTGKIGGC
631 IGAQVSIGHT LKYVQPDFKT ILESPTDKKV GWKVIFNNMV NQNWGPYDRD
SWNPVYGNQL FMKTRNGSMK 701 AADNFLDPNK ASSLLSSGFS PDFATVITMD
RKASKQQTNI DVIYERVRDD YQLHWTSTNW KGTNTKDKWT 771 DRSSERYKID
WEKEEMTNGG SGSSGGSSHH HHHH SEQ ID NO: 23 1 ATGGCAGATT CTGATATTAA
TATTAAAACC GGTACTACAG ATATTGGAAG CAATACTTCC GGAAGCGGCT 71
CTGGTAGTGG TTCTGGCATG TTTCGTCGTA AAGAAGATCT GGATCCGCCG CTGGCACTGC
TGCCGCTGAA 141 AGGCCTGCGC GAAGCCGCCG CACTGCTGGA AGAAGCGCTG
CGTCAAGGTA AACGCATTCG TGTTCACGGC 211 GACTATGATG CGGATGGCCT
GACCGGCACC GCGATCCTGG TTCGTGGTCT GGCCGCCCTG GGTGCGGATG 281
TTCATCCGTT TATCCCGCAC CGCCTGGAAG AAGGCTATGG TGTCCTGATG GAACGCGTCC
CGGAACATCT 351 GGAAGCCTCG GACCTGTTTC TGACCGTTGA CTGCGGCATT
ACCAACCATG CGGAACTGCG CGAACTGCTG 421 GAAAATGGCG TGGAAGTCAT
TGTTACCGAT CATCATACGC CGGGCAAAAC GCCGCCGCCG GGTCTGGTCG 491
TGCATCCGGC GCTGACGCCG GATCTGAAAG AAAAACCGAC CGGCGCAGGC GTGGCGTTTC
TGCTGCTGTG 561 GGCACTGCAT GAACGCCTGG GCCTGCCGCC GCCGCTGGAA
TACGCGGACC TGGCAGCCGT TGGCACCATT 631 GCCGACGTTG CCCCGCTGTG
GGGTTGGAAT CGTGCACTGG TGAAAGAAGG TCTGGCACGC ATCCCGGCTT 701
CATCTTGGGT GGGCCTGCGT CTGCTGGCTG AAGCCGTGGG CTATACCGGC AAAGCGGTCG
AAGTCGCTTT 771 CCGCATCGCG CCGCGCATCA ATGCGGCTTC CCGCCTGGGC
GAAGCGGAAA AAGCCCTGCG CCTGCTGCTG 841 ACGGATGATG CGGCAGAAGC
TCAGGCGCTG GTCGGCGAAC TGCACCGTCT GAACGCCCGT CGTCAGACCC 911
TGGAAGAAGC GATGCTGCGC AAACTGCTGC CGCAGGCCGA CCCGGAAGCG AAAGCCATCG
TTCTGCTGGA 981 CCCGGAAGGC CATCCGGGTG TTATGGGTAT TGTGGCCTCT
CGCATCCTGG AAGCGACCCT GCGCCCGGTC 1051 TTTCTGGTGG CCCAGGGCAA
AGGCACCGTG CGTTCGCTGG CTCCGATTTC CGCCGTCGAA GCACTGCGCA 1121
GCGCGGAAGA TCTGCTGCTG CGTTATGGTG GTCATAAAGA AGCGGCGGGT TTCGCAATGG
ATGAAGCGCT 1191 GTTTCCGGCG TTCAAAGCAC GCGTTGAAGC GTATGCCGCA
CGTTTCCCGG ATCCGGTTCG TGAAGTGGCA 1261 CTGCTGGATC TGCTGCCGGA
ACCGGGCCTG CTGCCGCAGG TGTTCCGTGA ACTGGCACTG CTGGAACCGT 1331
ATGGTGAAGG TAACCCGGAA CCGCTGTTCC TGTCTGGCTC TGGTTCCGGC AGCGGTTCCG
GAACAGTAAA 1401 AACAGGTGAT TTAGTCACTT ATGATAAAGA AAATGGCATG
CACAAAAAAG TATTTTATAG TTTTATCGAT 1471 GATAAAAATC ACAATAAAAA
ACTGCTAGTT ATTAGAACAA AAGGTACCAT TGCTGGTCAA TATAGAGTTT 1541
ATAGCGAAGA AGGTGCTAAC AAAAGTGGTT TAGCCTGGCC TTCAGCCTTT AAGGTACAGT
TGCAACTACC 1611 TGATAATGAA GTAGCTCAAA TATCTGATTA CTATCCAAGA
AATTCGATTG ATACAAAAGA GTATAGGAGT 1681 ACTTTAACTT ATGGATTCAA
CGGTAATGTT ACTGGTGATG ATACAGGAAA AATTGGCGGC TGTATTGGTG 1751
CACAAGTTTC GATTGGTCAT ACACTGAAAT ATGTTCAACC TGATTTCAAA ACAATTTTAG
AGAGCCCAAC 1821 TGATAAAAAA GTAGGCTGGA AAGTGATATT TAACAATATG
GTGAATCAAA ATTGGGGACC ATACGATCGA 1891 GATTCTTGGA ACCCGGTATA
TGGCAATCAA CTTTTCATGA AAACTAGAAA TGGTTCTATG AAAGCAGCAG 1961
ATAACTTCCT TGATCCTAAC AAAGCAAGTT CTCTATTATC TTCAGGGTTT TCACCAGACT
TCGCTACAGT 2031 TATTACTATG GATAGAAAAG CATCCAAACA ACAAACAAAT
ATAGATGTAA TATACGAACG AGTTCGTGAT 2101 GATTACCAAT TGCATTGGAC
TTCAACAAAT TGGAAAGGTA CCAATACTAA AGATAAATGG ACAGATCGTT 2171
CTTCAGAAAG ATATAAAATC GATTGGGAAA AAGAAGAAAT GACAAATGGT GGTTCGGGCT
CATCTGGTGG 2241 CTCGAGTCAC CATCATCATC ACCAC SEQ ID NO: 24 1
ADSDINIKTG TTDIGSNTSG SGSGSGSGMF RRKEDLDPPL ALLPLKGLRE AAALLEEALR
QGKRIRVHGD 71 YDADGLTGTA ILVRGLAALG ADVHPFIPHR LEEGYGVLME
RVPEHLEASD LFLTVDCGIT NHAELRELLE 141 NGVEVIVTDH HTPGKTPPPG
LVVHPALTPD LKEKPTGAGV AFLLLWALHE RLGLPPPLEY ADLAAVGTIA 211
DVAPLWGWNR ALVKEGLARI PASSWVGLRL LAEAVGYTGK AVEVAFRIAP RINAASRLGE
AEKALRLLLT 281 DDAAEAQALV GELHRLNARR QTLEEAMLRK LLPQADPEAK
AIVLLDPEGH PGVMGIVASR ILEATLRPVF 351 LVAQGKGTVR SLAPISAVEA
LRSAEDLLLR YGGHKEAAGF AMDEALFPAF KARVEAYAAR FPDPVREVAL 421
LDLLPEPGLL PQVFRELALL EPYGEGNPEP LFLSGSGSGS GSGTVKTGDL VTYDKENGMH
KKVFYSFIDD 491 KNHNKKLLVI RTKGTIAGQY RVYSEEGANK SGLAWPSAFK
VQLQLPDNEV AQISDYYPRN SIDTKEYRST 561 LTYGFNGNVT GDDTGKIGGC
IGAQVSIGHT LKYVQPDFKT ILESPTDKKV GWKVIFNNMV NQNWGPYDRD 631
SWNPVYGNQL FMKTRNGSMK AADNFLDPNK ASSLLSSGFS PDFATVITMD RKASKQQTNI
DVIYERVRDD 701 YQLHWTSTNW KGTNTKDKWT DRSSERYKID WEKEEMTNGG
SGSSGGSSHH HHHH SEQ ID NO: 25 1 ATGGCAGATT CTGATATTAA TATTAAAACC
GGTACTACAG ATATTGGAAG CAATACTACA GTAAAAACAG 71 GTGATTTAGT
CACTTATGAT AAAGAAAATG GCATGCACAA AAAAGTATTT TATAGTTTTA TCGATTCCGG
141 AAGCGGCTCT GGTAGTGGTT CTGGCATGAA ATTTGTTAGC TTCAATATCA
ACGGCCTGCG CGCGCGCCCG 211 CATCAGCTGG AAGCGATTGT GGAAAAACAT
CAGCCGGATG TTATTGGTCT GCAGGAAACC AAAGTTCACG 281 ATGATATGTT
TCCGCTGGAA GAAGTGGCGA AACTGGGCTA TAACGTGTTT TATCATGGCC AGAAAGGTCA
351 TTATGGCGTG GCCCTGCTGA CCAAAGAAAC CCCGATCGCG GTTCGTCGTG
GTTTTCCGGG TGATGATGAA 421 GAAGCGCAGC GTCGTATTAT TATGGCGGAA
ATTCCGAGCC TGCTGGGCAA TGTGACCGTT ATTAACGGCT 491 ATTTTCCGCA
GGGCGAAAGC CGTGATCATC CGATTAAATT TCCGGCCAAA GCGCAGTTCT ATCAGAACCT
561 GCAGAACTAT CTGGAAACCG AACTGAAACG TGATAATCCG GTGCTGATCA
TGGGCGATAT GAACATTAGC 631 CCGACCGATC TGGATATTGG CATTGGCGAA
GAAAACCGTA AACGCTGGCT GCGTACCGGT AAATGCAGCT 701 TTCTGCCGGA
AGAACGTGAA TGGATGGATC GCCTGATGAG CTGGGGCCTG GTGGATACCT TTCGTCATGC
771 GAACCCGCAG ACCGCCGATC GCTTTAGCTG GTTTGATTAT CGCAGCAAAG
GTTTTGATGA TAACCGTGGC 841 CTGCGCATTG ATCTGCTGCT GGCGAGCCAG
CCGCTGGCGG AATGCTGCGT TGAAACCGGT ATTGATTATG 911 AAATTCGCAG
CATGGAAAAA CCGAGCGATC ACGCCCCGGT GTGGGCGACC TTTCGCCGCT CTGGCTCTGG
981 TTCCGGCAGC GGTTCCGGAC ACAATAAAAA ACTGCTAGTT ATTAGAACAA
AAGGTACCAT TGCTGGTCAA 1051 TATAGAGTTT ATAGCGAAGA AGGTGCTAAC
AAAAGTGGTT TAGCCTGGCC TTCAGCCTTT AAGGTACAGT 1121 TGCAACTACC
TGATAATGAA GTAGCTCAAA TATCTGATTA CTATCCAAGA AATTCGATTG ATACAAAAGA
1191 GTATAGGAGT ACTTTAACTT ATGGATTCAA CGGTAATGTT ACTGGTGATG
ATACAGGAAA AATTGGCGGC 1261 TGTATTGGTG CACAAGTTTC GATTGGTCAT
ACACTGAAAT ATGTTCAACC TGATTTCAAA ACAATTTTAG 1331 AGAGCCCAAC
TGATAAAAAA GTAGGCTGGA AAGTGATATT TAACAATATG GTGAATCAAA ATTGGGGACC
1401 ATACGATCGA GATTCTTGGA ACCCGGTATA TGGCAATCAA CTTTTCATGA
AAACTAGAAA TGGTTCTATG 1471 AAAGCAGCAG ATAACTTCCT TGATCCTAAC
AAAGCAAGTT CTCTATTATC TTCAGGGTTT TCACCAGACT 1541 TCGCTACAGT
TATTACTATG GATAGAAAAG CATCCAAACA ACAAACAAAT ATAGATGTAA TATACGAACG
1611 AGTTCGTGAT GATTACCAAT TGCATTGGAC TTCAACAAAT TGGAAAGGTA
CCAATACTAA AGATAAATGG 1681 ACAGATCGTT CTTCAGAAAG ATATAAAATC
GATTGGGAAA AAGAAGAAAT GACAAATGGT GGTTCGGGCT 1751 CATCTGGTGG
CTCGAGTCAC CATCATCATC ACCAC SEQ ID NO: 26 1 ADSDINIKTG TTDIGSNTTV
KTGDLVTYDK ENGMHKKVFY SFIDSGSGSG SGSGMKFVSF NINGLRARPH 71
QLEAIVEKHQ PDVIGLQETK VHDDMFPLEE VAKLGYNVFY HGQKGHYGVA LLTKETPIAV
RRGFPGDDEE 141 AQRRIIMAEI PSLLGNVTVI NGYFPQGESR DHPIKFPAKA
QFYQNLQNYL ETELKRDNPV LIMGDMNISP 211 TDLDIGIGEE NRKRWLRTGK
CSFLPEEREW MDRLMSWGLV DTFRHANPQT ADRFSWFDYR SKGFDDNRGL 281
RIDLLLASQP LAECCVETGI DYEIRSMEKP SDHAPVWATF RRSGSGSGSG SGHNKKLLVI
RTKGTIAGQY
351 RVYSEEGANK SGLAWPSAFK VQLQLPDNEV AQISDYYPRN SIDTKEYRST
LTYGFNGNVT GDDTGKIGGC 421 IGAQVSIGHT LKYVQPDFKT ILESPTDKKV
GWKVIFNNMV NQNWGPYDRD SWNPVYGNQL FMKTRNGSMK 491 AADNFLDPNK
ASSLLSSGFS PDFATVITMD RKASKQQTNI DVIYERVRDD YQLHWTSTNW KGTNTKDKWT
561 DRSSERYKID WEKEEMTNGG SGSSGGSSHH HHHH SEQ ID NO: 27 1
ATGGCAGATT CTGATATTAA TATTAAAACC GGTACTACAG ATATTGGAAG CAATACTACA
GTAAAAACAG 71 GTGATTTAGT CACTTATGAT AAAGAAAATG GCATGCACAA
AAAAGTATTT TATAGTTTTA TCGATGATAA 141 AAATCACAAT AAAAAACTGC
TAGTTATTAG AACAAAAGGT ACCATTGCTG GTCAATATAG AGTTTATAGC 211
GAAGAAGGTG CTAACAAAAG TGGTTTAGCC TGGCCTTCAG CCTTTAAGGT ACAGTTGCAA
CTACCTGATA 281 ATGAAGTAGC TCAAATATCT GATTACTATC CAAGAAATTC
GATTGATACA AAAGAGTATA GGAGTACTTT 351 AACTTATGGA TTCAACGGTA
ATGTTACTGG TGATGATACA GGAAAAATTG GCGGCTGTAT TGGTGCACAA 421
GTTTCGATTG GTCATACACT GAAATATGTT CAACCTGATT TCAAAACAAT TTTAGAGAGC
CCAACTGATA 491 AAAAAGTAGG CTGGAAAGTG ATATTTAACA ATATGGTGAA
TCAAAATTGG GGACCATACG ATCGAGATTC 561 TTGGAACCCG GTATATGGCA
ATCAACTTTT CATGAAAACT AGAAATGGTT CTATGAAAGC AGCAGATAAC 631
TTCCTTGATC CTAACAAAGC AAGTTCTCTA TTATCTTCAG GGTTTTCACC AGACTTCGCT
ACAGTTATTA 701 CTATGGATAG AAAAGCATCC AAACAACAAA CAAATATAGA
TGTAATATAC GAACGAGTTC GTGATGATTA 771 CCAATTGCAT TGGACTTCAA
CAAATTGGAA AGGTACCAAT ACTAAAGATA AATGGACAGA TCGTTCTTCA 841
GAAAGATATA AAATCGATTG GGAAAAAGAA GAAATGACAA ATTCCGGTAG CGGCTCTGGT
TCTGGCTCTG 911 GTTCCGGCAG CGGTTCCGGA CAGAGCACCT TCCTGTTTCA
TGATTATGAA ACCTTCGGTA CCCATCCGGC 981 CCTGGATCGT CCGGCGCAGT
TTGCGGCCAT TCGCACCGAT AGCGAATTCA ATGTGATTGG CGAACCGGAA 1051
GTGTTTTATT GCAAACCGGC CGATGATTAT CTGCCGCAGC CGGGTGCGGT GCTGATTACC
GGTATTACCC 1121 CGCAGGAAGC GCGCGCGAAA GGTGAAAACG AAGCGGCGTT
TGCCGCGCGC ATTCATAGCC TGTTTACCGT 1191 GCCGAAAACC TGCATTCTGG
GCTATAACAA TGTGCGCTTC GATGATGAAG TTACCCGTAA TATCTTTTAT 1261
CGTAACTTTT ATGATCCGTA TGCGTGGAGC TGGCAGCATG ATAACAGCCG TTGGGATCTG
CTGGATGTGA 1331 TGCGCGCGTG CTATGCGCTG CGCCCGGAAG GCATTAATTG
GCCGGAAAAC GATGATGGCC TGCCGAGCTT 1401 TCGTCTGGAA CATCTGACCA
AAGCCAACGG CATTGAACAT AGCAATGCCC ATGATGCGAT GGCCGATGTT 1471
TATGCGACCA TTGCGATGGC GAAACTGGTT AAAACCCGTC AGCCGCGCCT GTTTGATTAT
CTGTTTACCC 1541 ACCGTAACAA ACACAAACTG ATGGCGCTGA TTGATGTTCC
GCAGATGAAA CCGCTGGTGC ATGTGAGCGG 1611 CATGTTTGGC GCCTGGCGCG
GCAACACCAG CTGGGTGGCC CCGCTGGCCT GGCACCCGGA AAATCGTAAC 1681
GCCGTGATTA TGGTTGATCT GGCCGGTGAT ATTAGCCCGC TGCTGGAACT GGATAGCGAT
ACCCTGCGTG 1751 AACGCCTGTA TACCGCCAAA ACCGATCTGG GCGATAATGC
CGCCGTGCCG GTGAAACTGG TTCACATTAA 1821 CAAATGCCCG GTGCTGGCCC
AGGCGAACAC CCTGCGCCCG GAAGATGCGG ATCGTCTGGG TATTAATCGC 1891
CAGCATTGTC TGGATAATCT GAAAATCCTG CGTGAAAACC CGCAGGTGCG TGAAAAAGTG
GTGGCGATCT 1961 TCGCGGAAGC GGAACCGTTC ACCCCGAGCG ATAACGTGGA
TGCGCAGCTG TATAACGGCT TCTTTAGCGA 2031 TGCCGATCGC GCGGCGATGA
AAATCGTTCT GGAAACCGAA CCGCGCAATC TGCCGGCGCT GGATATTACC 2101
TTTGTTGATA AACGTATTGA AAAACTGCTG TTTAATTATC GTGCGCGCAA TTTTCCGGGT
ACCCTGGATT 2171 ATGCCGAACA GCAGCGTTGG CTGGAACATC GTCGTCAGGT
TTTCACCCCG GAATTTCTGC AGGGTTATGC 2241 GGATGAACTG CAGATGCTGG
TTCAGCAGTA TGCCGATGAT AAAGAAAAAG TGGCGCTGCT GAAAGCGCTG 2311
TGGCAGTATG CGGAAGAAAT CGTTTCTGGC TCTGGTCACC ATCATCATCA CCAC SEQ ID
NO: 28 1 ADSDINIKTG TTDIGSNTTV KTGDLVTYDK ENGMHKKVFY SFIDDKNHNK
KLLVIRTKGT IAGQYRVYSE 71 EGANKSGLAW PSAFKVQLQL PDNEVAQISD
YYPRNSIDTK EYRSTLTYGF NGNVTGDDTG KIGGCIGAQV 141 SIGHTLKYVQ
PDFKTILESP TDKKVGWKVI FNNMVNQNWG PYDRDSWNPV YGNQLFMKTR NGSMKAADNF
211 LDPNKASSLL SSGFSPDFAT VITMDRKASK QQTNIDVIYE RVRDDYQLHW
TSTNWKGTNT KDKWTDRSSE 281 RYKIDWEKEE MTNSGSGSGS GSGSGSGSGQ
STFLFHDYET FGTHPALDRP AQFAAIRTDS EFNVIGEPEV 351 FYCKPADDYL
PQPGAVLITG ITPQEARAKG ENEAAFAARI HSLFTVPKTC ILGYNNVRFD DEVTRNIFYR
421 NFYDPYAWSW QHDNSRWDLL DVMRACYALR PEGINWPEND DGLPSFRLEH
LTKANGIEHS NAHDAMADVY 491 ATIAMAKLVK TRQPRLFDYL FTHRNKHKLM
ALIDVPQMKP LVHVSGMFGA WRGNTSWVAP LAWHPENRNA 561 VIMVDLAGDI
SPLLELDSDT LRERLYTAKT DLGDNAAVPV KLVHINKCPV LAQANTLRPE DADRLGINRQ
631 HCLDNLKILR ENPQVREKVV AIFAEAEPFT PSDNVDAQLY NGFFSDADRA
AMKIVLETEP RNLPALDITF 701 VDKRIEKLLF NYRARNFPGT LDYAEQQRWL
EHRRQVFTPE FLQGYADELQ MLVQQYADDK EKVALLKALW 771 QYAEEIVSGS GHHHHHH
SEQ ID NO: 29 1 ATGGCAGATT CTGATATTAA TATTAAAACC GGTACTACAG
ATATTGGAAG CAATACTACA GTAAAAACAG 71 GTGATTTAGT CACTTATGAT
AAAGAAAATG GCATGCACAA AAAAGTATTT TATAGTTTTA TCGATGATAA 141
AAATCACAAT AAAAAACTGC TAGTTATTAG AACAAAAGGT ACCATTGCTG GTCAATATAG
AGTTTATAGC 211 GAAGAAGGTG CTAACAAAAG TGGTTTAGCC TGGCCTTCAG
CCTTTAAGGT ACAGTTGCAA CTACCTGATA 281 ATGAAGTAGC TCAAATATCT
GATTACTATC CAAGAAATTC GATTGATACA AAAGAGTATA GGAGTACTTT 351
AACTTATGGA TTCAACGGTA ATGTTACTGG TGATGATACA GGAAAAATTG GCGGCTGTAT
TGGTGCACAA 421 GTTTCGATTG GTCATACACT GAAATATGTT CAACCTGATT
TCAAAACAAT TTTAGAGAGC CCAACTGATA 491 AAAAAGTAGG CTGGAAAGTG
ATATTTAACA ATATGGTGAA TCAAAATTGG GGACCATACG ATCGAGATTC 561
TTGGAACCCG GTATATGGCA ATCAACTTTT CATGAAAACT AGAAATGGTT CTATGAAAGC
AGCAGATAAC 631 TTCCTTGATC CTAACAAAGC AAGTTCTCTA TTATCTTCAG
GGTTTTCACC AGACTTCGCT ACAGTTATTA 701 CTATGGATAG AAAAGCATCC
AAACAACAAA CAAATATAGA TGTAATATAC GAACGAGTTC GTGATGATTA 771
CCAATTGCAT TGGACTTCAA CAAATTGGAA AGGTACCAAT ACTAAAGATA AATGGACAGA
TCGTTCTTCA 841 GAAAGATATA AAATCGATTG GGAAAAAGAA GAAATGACAA
ATGATGGCTC CGGTAGCGGC TCTGGTTCTG 911 GCTCTGGTTC CGGCAGCGGT
TCCGGACAGA GCACCTTCCT GTTTCATGAT TATGAAACCT TCGGTACCCA 981
TCCGGCCCTG GATCGTCCGG CGCAGTTTGC GGCCATTCGC ACCGATAGCG AATTCAATGT
GATTGGCGAA 1051 CCGGAAGTGT TTTATTGCAA ACCGGCCGAT GATTATCTGC
CGCAGCCGGG TGCGGTGCTG ATTACCGGTA 1121 TTACCCCGCA GGAAGCGCGC
GCGAAAGGTG AAAACGAAGC GGCGTTTGCC GCGCGCATTC ATAGCCTGTT 1191
TACCGTGCCG AAAACCTGCA TTCTGGGCTA TAACAATGTG CGCTTCGATG ATGAAGTTAC
CCGTAATATC 1261 TTTTATCGTA ACTTTTATGA TCCGTATGCG TGGAGCTGGC
AGCATGATAA CAGCCGTTGG GATCTGCTGG 1331 ATGTGATGCG CGCGTGCTAT
GCGCTGCGCC CGGAAGGCAT TAATTGGCCG GAAAACGATG ATGGCCTGCC 1401
GAGCTTTCGT CTGGAACATC TGACCAAAGC CAACGGCATT GAACATAGCA ATGCCCATGA
TGCGATGGCC 1471 GATGTTTATG CGACCATTGC GATGGCGAAA CTGGTTAAAA
CCCGTCAGCC GCGCCTGTTT GATTATCTGT 1541 TTACCCACCG TAACAAACAC
AAACTGATGG CGCTGATTGA TGTTCCGCAG ATGAAACCGC TGGTGCATGT 1611
GAGCGGCATG TTTGGCGCCT GGCGCGGCAA CACCAGCTGG GTGGCCCCGC TGGCCTGGCA
CCCGGAAAAT 1681 CGTAACGCCG TGATTATGGT TGATCTGGCC GGTGATATTA
GCCCGCTGCT GGAACTGGAT AGCGATACCC 1751 TGCGTGAACG CCTGTATACC
GCCAAAACCG ATCTGGGCGA TAATGCCGCC GTGCCGGTGA AACTGGTTCA 1821
CATTAACAAA TGCCCGGTGC TGGCCCAGGC GAACACCCTG CGCCCGGAAG ATGCGGATCG
TCTGGGTATT 1891 AATCGCCAGC ATTGTCTGGA TAATCTGAAA ATCCTGCGTG
AAAACCCGCA GGTGCGTGAA AAAGTGGTGG 1961 CGATCTTCGC GGAAGCGGAA
CCGTTCACCC CGAGCGATAA CGTGGATGCG CAGCTGTATA ACGGCTTCTT 2031
TAGCGATGCC GATCGCGCGG CGATGAAAAT CGTTCTGGAA ACCGAACCGC GCAATCTGCC
GGCGCTGGAT 2101 ATTACCTTTG TTGATAAACG TATTGAAAAA CTGCTGTTTA
ATTATCGTGC GCGCAATTTT CCGGGTACCC 2171 TGGATTATGC CGAACAGCAG
CGTTGGCTGG AACATCGTCG TCAGGTTTTC ACCCCGGAAT TTCTGCAGGG 2241
TTATGCGGAT GAACTGCAGA TGCTGGTTCA GCAGTATGCC GATGATAAAG AAAAAGTGGC
GCTGCTGAAA 2311 GCGCTGTGGC AGTATGCGGA AGAAATCGTT TCTGGCTCTG
GTCACCATCA TCATCACCAC
SEQ ID NO: 30 1 ADSDINIKTG TTDIGSNTTV KTGDLVTYDK ENGMHKKVFY
SFIDDKNHNK KLLVIRTKGT IAGQYRVYSE 71 EGANKSGLAW PSAFKVQLQL
PDNEVAQISD YYPRNSIDTK EYRSTLTYGF NGNVTGDDTG KIGGCIGAQV 141
SIGHTLKYVQ PDFKTILESP TDKKVGWKVI FNNMVNQNWG PYDRDSWNPV YGNQLFMKTR
NGSMKAADNF 211 LDPNKASSLL SSGFSPDFAT VITMDRKASK QQTNIDVIYE
RVRDDYQLHW TSTNWKGTNT KDKWTDRSSE 281 RYKIDWEKEE MTNDGSGSGS
GSGSGSGSGS GQSTFLFHDY ETFGTHPALD RPAQFAAIRT DSEFNVIGEP 351
EVFYCKPADD YLPQPGAVLI TGITPQEARA KGENEAAFAA RIHSLFTVPK TCILGYNNVR
FDDEVTRNIF 421 YRNFYDPYAW SWQHDNSRWD LLDVMRACYA LRPEGINWPE
NDDGLPSFRL EHLTKANGIE HSNAHDAMAD 491 VYATIAMAKL VKTRQPRLFD
YLFTHRNKHK LMALIDVPQM KPLVHVSGMF GAWRGNTSWV APLAWHPENR 561
NAVIMVDLAG DISPLLELDS DTLRERLYTA KTDLGDNAAV PVKLVHINKC PVLAQANTLR
PEDADRLGIN 631 RQHCLDNLKI LRENPQVREK VVAIFAEAEP FTPSDNVDAQ
LYNGFFSDAD RAAMKIVLET EPRNLPALDI 701 TFVDKRIEKL LFNYRARNFP
GTLDYAEQQR WLEHRRQVFT PEFLQGYADE LQMLVQQYAD DKEKVALLKA 771
LWQYAEEIVS GSGHHHHHH
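The nucleotide and amino acid listings above come in pairs, so each protein entry can be cross-checked against its coding sequence. The following is a minimal sketch of such a check, assuming the standard genetic code; the names `translate` and `tail_29` are illustrative and are not part of the application. It translates the final 60 nucleotides of SEQ ID NO: 29 (positions 2311 onward, which are in frame with the ATG at position 1) for comparison with the C-terminal end of SEQ ID NO: 30.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; whitespace between blocks is ignored."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Last six 10-nucleotide blocks of SEQ ID NO: 29 as printed in the listing.
tail_29 = "GCGCTGTGGC AGTATGCGGA AGAAATCGTT TCTGGCTCTG GTCACCATCA TCATCACCAC"

print(translate(tail_29))
```

The protein listings begin at the residue after the initiating methionine (ADSDIN...), so a translation of a complete nucleotide entry is expected to carry one extra leading M relative to the printed amino acid sequence.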
Sequence CWU 1
1
Number of sequences: 32
SEQ ID NO: 1 (882 nt, DNA, Staphylococcus aureus; CDS (4)..(882))
atg gca gat tct gat
att aat att aaa acc ggt act aca gat att gga 48 Ala Asp Ser Asp Ile
Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly 1 5 10 15agc aat act aca
gta aaa aca ggt gat tta gtc act tat gat aaa gaa 96Ser Asn Thr Thr
Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys Glu 20 25 30aat ggc atg
cac aaa aaa gta ttt tat agt ttt atc gat gat aaa aat 144Asn Gly Met
His Lys Lys Val Phe Tyr Ser Phe Ile Asp Asp Lys Asn 35 40 45cac aat
aaa aaa ctg cta gtt att aga aca aaa ggt acc att gct ggt 192His Asn
Lys Lys Leu Leu Val Ile Arg Thr Lys Gly Thr Ile Ala Gly 50 55 60caa
tat aga gtt tat agc gaa gaa ggt gct aac aaa agt ggt tta gcc 240Gln
Tyr Arg Val Tyr Ser Glu Glu Gly Ala Asn Lys Ser Gly Leu Ala 65 70
75tgg cct tca gcc ttt aag gta cag ttg caa cta cct gat aat gaa gta
288Trp Pro Ser Ala Phe Lys Val Gln Leu Gln Leu Pro Asp Asn Glu
Val80 85 90 95gct caa ata tct gat tac tat cca aga aat tcg att gat
aca aaa gag 336Ala Gln Ile Ser Asp Tyr Tyr Pro Arg Asn Ser Ile Asp
Thr Lys Glu 100 105 110tat atg agt act tta act tat gga ttc aac ggt
aat gtt act ggt gat 384Tyr Met Ser Thr Leu Thr Tyr Gly Phe Asn Gly
Asn Val Thr Gly Asp 115 120 125gat aca gga aaa att ggc ggc ctt att
ggt gca aat gtt tcg att ggt 432Asp Thr Gly Lys Ile Gly Gly Leu Ile
Gly Ala Asn Val Ser Ile Gly 130 135 140cat aca ctg aaa tat gtt caa
cct gat ttc aaa aca att tta gag agc 480His Thr Leu Lys Tyr Val Gln
Pro Asp Phe Lys Thr Ile Leu Glu Ser 145 150 155cca act gat aaa aaa
gta ggc tgg aaa gtg ata ttt aac aat atg gtg 528Pro Thr Asp Lys Lys
Val Gly Trp Lys Val Ile Phe Asn Asn Met Val160 165 170 175aat caa
aat tgg gga cca tac gat cga gat tct tgg aac ccg gta tat 576Asn Gln
Asn Trp Gly Pro Tyr Asp Arg Asp Ser Trp Asn Pro Val Tyr 180 185
190ggc aat caa ctt ttc atg aaa act aga aat ggt tct atg aaa gca gca
624Gly Asn Gln Leu Phe Met Lys Thr Arg Asn Gly Ser Met Lys Ala Ala
195 200 205gat aac ttc ctt gat cct aac aaa gca agt tct cta tta tct
tca ggg 672Asp Asn Phe Leu Asp Pro Asn Lys Ala Ser Ser Leu Leu Ser
Ser Gly 210 215 220ttt tca cca gac ttc gct aca gtt att act atg gat
aga aaa gca tcc 720Phe Ser Pro Asp Phe Ala Thr Val Ile Thr Met Asp
Arg Lys Ala Ser 225 230 235aaa caa caa aca aat ata gat gta ata tac
gaa cga gtt cgt gat gat 768Lys Gln Gln Thr Asn Ile Asp Val Ile Tyr
Glu Arg Val Arg Asp Asp240 245 250 255tac caa ttg cat tgg act tca
aca aat tgg aaa ggt acc aat act aaa 816Tyr Gln Leu His Trp Thr Ser
Thr Asn Trp Lys Gly Thr Asn Thr Lys 260 265 270gat aaa tgg aca gat
cgt tct tca gaa aga tat aaa atc gat tgg gaa 864Asp Lys Trp Thr Asp
Arg Ser Ser Glu Arg Tyr Lys Ile Asp Trp Glu 275 280 285aaa gaa gaa
atg aca aat 882 Lys Glu Glu Met Thr Asn 290
SEQ ID NO: 2 (293 aa, PRT, Staphylococcus aureus)
Ala Asp Ser Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly
Ser1 5 10 15Asn Thr Thr Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys
Glu Asn 20 25 30Gly Met His Lys Lys Val Phe Tyr Ser Phe Ile Asp Asp
Lys Asn His 35 40 45Asn Lys Lys Leu Leu Val Ile Arg Thr Lys Gly Thr
Ile Ala Gly Gln 50 55 60Tyr Arg Val Tyr Ser Glu Glu Gly Ala Asn Lys
Ser Gly Leu Ala Trp65 70 75 80Pro Ser Ala Phe Lys Val Gln Leu Gln
Leu Pro Asp Asn Glu Val Ala 85 90 95Gln Ile Ser Asp Tyr Tyr Pro Arg
Asn Ser Ile Asp Thr Lys Glu Tyr 100 105 110Met Ser Thr Leu Thr Tyr
Gly Phe Asn Gly Asn Val Thr Gly Asp Asp 115 120 125Thr Gly Lys Ile
Gly Gly Leu Ile Gly Ala Asn Val Ser Ile Gly His 130 135 140Thr Leu
Lys Tyr Val Gln Pro Asp Phe Lys Thr Ile Leu Glu Ser Pro145 150 155
160Thr Asp Lys Lys Val Gly Trp Lys Val Ile Phe Asn Asn Met Val Asn
165 170 175Gln Asn Trp Gly Pro Tyr Asp Arg Asp Ser Trp Asn Pro Val
Tyr Gly 180 185 190Asn Gln Leu Phe Met Lys Thr Arg Asn Gly Ser Met
Lys Ala Ala Asp 195 200 205Asn Phe Leu Asp Pro Asn Lys Ala Ser Ser
Leu Leu Ser Ser Gly Phe 210 215 220Ser Pro Asp Phe Ala Thr Val Ile
Thr Met Asp Arg Lys Ala Ser Lys225 230 235 240Gln Gln Thr Asn Ile
Asp Val Ile Tyr Glu Arg Val Arg Asp Asp Tyr 245 250 255Gln Leu His
Trp Thr Ser Thr Asn Trp Lys Gly Thr Asn Thr Lys Asp 260 265 270Lys
Trp Thr Asp Arg Ser Ser Glu Arg Tyr Lys Ile Asp Trp Glu Lys 275 280
285 Glu Glu Met Thr Asn 290
SEQ ID NO: 3 (882 nt, DNA, artificial sequence; alpha-HL M113R/N139Q)
atg gca gat tct gat att aat att aaa acc ggt act aca
gat att gga 48 Ala Asp Ser Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp
Ile Gly 1 5 10 15agc aat act aca gta aaa aca ggt gat tta gtc act
tat gat aaa gaa 96Ser Asn Thr Thr Val Lys Thr Gly Asp Leu Val Thr
Tyr Asp Lys Glu 20 25 30aat ggc atg cac aaa aaa gta ttt tat agt ttt
atc gat gat aaa aat 144Asn Gly Met His Lys Lys Val Phe Tyr Ser Phe
Ile Asp Asp Lys Asn 35 40 45cac aat aaa aaa ctg cta gtt att aga aca
aaa ggt acc att gct ggt 192His Asn Lys Lys Leu Leu Val Ile Arg Thr
Lys Gly Thr Ile Ala Gly 50 55 60caa tat aga gtt tat agc gaa gaa ggt
gct aac aaa agt ggt tta gcc 240Gln Tyr Arg Val Tyr Ser Glu Glu Gly
Ala Asn Lys Ser Gly Leu Ala 65 70 75tgg cct tca gcc ttt aag gta cag
ttg caa cta cct gat aat gaa gta 288Trp Pro Ser Ala Phe Lys Val Gln
Leu Gln Leu Pro Asp Asn Glu Val80 85 90 95gct caa ata tct gat tac
tat cca aga aat tcg att gat aca aaa gag 336Ala Gln Ile Ser Asp Tyr
Tyr Pro Arg Asn Ser Ile Asp Thr Lys Glu 100 105 110tat agg agt act
tta act tat gga ttc aac ggt aat gtt act ggt gat 384Tyr Arg Ser Thr
Leu Thr Tyr Gly Phe Asn Gly Asn Val Thr Gly Asp 115 120 125gat aca
gga aaa att ggc ggc ctt att ggt gca caa gtt tcg att ggt 432Asp Thr
Gly Lys Ile Gly Gly Leu Ile Gly Ala Gln Val Ser Ile Gly 130 135
140cat aca ctg aaa tat gtt caa cct gat ttc aaa aca att tta gag agc
480His Thr Leu Lys Tyr Val Gln Pro Asp Phe Lys Thr Ile Leu Glu Ser
145 150 155cca act gat aaa aaa gta ggc tgg aaa gtg ata ttt aac aat
atg gtg 528Pro Thr Asp Lys Lys Val Gly Trp Lys Val Ile Phe Asn Asn
Met Val160 165 170 175aat caa aat tgg gga cca tac gat cga gat tct
tgg aac ccg gta tat 576Asn Gln Asn Trp Gly Pro Tyr Asp Arg Asp Ser
Trp Asn Pro Val Tyr 180 185 190ggc aat caa ctt ttc atg aaa act aga
aat ggt tct atg aaa gca gca 624Gly Asn Gln Leu Phe Met Lys Thr Arg
Asn Gly Ser Met Lys Ala Ala 195 200 205gat aac ttc ctt gat cct aac
aaa gca agt tct cta tta tct tca ggg 672Asp Asn Phe Leu Asp Pro Asn
Lys Ala Ser Ser Leu Leu Ser Ser Gly 210 215 220ttt tca cca gac ttc
gct aca gtt att act atg gat aga aaa gca tcc 720Phe Ser Pro Asp Phe
Ala Thr Val Ile Thr Met Asp Arg Lys Ala Ser 225 230 235aaa caa caa
aca aat ata gat gta ata tac gaa cga gtt cgt gat gat 768Lys Gln Gln
Thr Asn Ile Asp Val Ile Tyr Glu Arg Val Arg Asp Asp240 245 250
255tac caa ttg cat tgg act tca aca aat tgg aaa ggt acc aat act aaa
816Tyr Gln Leu His Trp Thr Ser Thr Asn Trp Lys Gly Thr Asn Thr Lys
260 265 270gat aaa tgg aca gat cgt tct tca gaa aga tat aaa atc gat
tgg gaa 864Asp Lys Trp Thr Asp Arg Ser Ser Glu Arg Tyr Lys Ile Asp
Trp Glu 275 280 285aaa gaa gaa atg aca aat 882Lys Glu Glu Met Thr
Asn 290
SEQ ID NO: 4 (293 aa, PRT, artificial sequence; alpha-HL M113R/N139Q)
Ala Asp Ser
Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly Ser1 5 10 15Asn Thr
Thr Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys Glu Asn 20 25 30Gly
Met His Lys Lys Val Phe Tyr Ser Phe Ile Asp Asp Lys Asn His 35 40
45Asn Lys Lys Leu Leu Val Ile Arg Thr Lys Gly Thr Ile Ala Gly Gln
50 55 60Tyr Arg Val Tyr Ser Glu Glu Gly Ala Asn Lys Ser Gly Leu Ala
Trp65 70 75 80Pro Ser Ala Phe Lys Val Gln Leu Gln Leu Pro Asp Asn
Glu Val Ala 85 90 95Gln Ile Ser Asp Tyr Tyr Pro Arg Asn Ser Ile Asp
Thr Lys Glu Tyr 100 105 110Arg Ser Thr Leu Thr Tyr Gly Phe Asn Gly
Asn Val Thr Gly Asp Asp 115 120 125Thr Gly Lys Ile Gly Gly Leu Ile
Gly Ala Gln Val Ser Ile Gly His 130 135 140Thr Leu Lys Tyr Val Gln
Pro Asp Phe Lys Thr Ile Leu Glu Ser Pro145 150 155 160Thr Asp Lys
Lys Val Gly Trp Lys Val Ile Phe Asn Asn Met Val Asn 165 170 175Gln
Asn Trp Gly Pro Tyr Asp Arg Asp Ser Trp Asn Pro Val Tyr Gly 180 185
190Asn Gln Leu Phe Met Lys Thr Arg Asn Gly Ser Met Lys Ala Ala Asp
195 200 205Asn Phe Leu Asp Pro Asn Lys Ala Ser Ser Leu Leu Ser Ser
Gly Phe 210 215 220Ser Pro Asp Phe Ala Thr Val Ile Thr Met Asp Arg
Lys Ala Ser Lys225 230 235 240Gln Gln Thr Asn Ile Asp Val Ile Tyr
Glu Arg Val Arg Asp Asp Tyr 245 250 255Gln Leu His Trp Thr Ser Thr
Asn Trp Lys Gly Thr Asn Thr Lys Asp 260 265 270Lys Trp Thr Asp Arg
Ser Ser Glu Arg Tyr Lys Ile Asp Trp Glu Lys 275 280 285Glu Glu Met
Thr Asn 290
SEQ ID NO: 5 (4543 nt, DNA, artificial sequence; pT7-SC1_BspEI-KO)
ttcttgaaga
cgaaagggcc tcgtgatacg cctattttta taggttaatg tcatgataat 60aatggtttct
tagacgtcag gtggcacttt tcggggaaat gtgcgcggaa cccctatttg
120tttatttttc taaatacatt caaatatgta tccgctcatg agacaataac
cctgataaat 180gcttcaataa tattgaaaaa ggaagagtat gagtattcaa
catttccgtg tcgcccttat 240tccctttttt gcggcatttt gccttcctgt
ttttgctcac ccagaaacgc tggtgaaagt 300aaaagatgct gaagatcagt
tgggtgcacg agtgggttac atcgaactgg atctcaacag 360cggtaagatc
cttgagagtt ttcgccccga agaacgtttt ccaatgatga gcacttttaa
420agttctgcta tgtggcgcgg tattatcccg tgttgacgcc gggcaagagc
aactcggtcg 480ccgcatacac tattctcaga atgacttggt tgagtactca
ccagtcacag aaaagcatct 540tacggatggc atgacagtaa gagaattatg
cagtgctgcc ataaccatga gtgataacac 600tgcggccaac ttacttctga
caacgatcgg aggaccgaag gagctaaccg cttttttgca 660caacatgggg
gatcatgtaa ctcgccttga tcgttgggaa ccggagctga atgaagccat
720accaaacgac gagcgtgaca ccacgatgcc tgcagcaatg gcaacaacgt
tgcgcaaact 780attaactggc gaactactta ctctagcttc ccggcaacaa
ttaatagact ggatggaggc 840ggataaagtt gcaggaccac ttctgcgctc
ggcccttccg gctggctggt ttattgctga 900taaatctgga gccggtgagc
gtgggtctcg cggtatcatt gcagcactgg ggccagatgg 960taagccctcc
cgtatcgtag ttatctacac gacggggagt caggcaacta tggatgaacg
1020aaatagacag atcgctgaga taggtgcctc actgattaag cattggtaac
tgtcagacca 1080agtttactca tatatacttt agattgattt aaaacttcat
ttttaattta aaaggatcta 1140ggtgaagatc ctttttgata atctcatgac
caaaatccct taacgtgagt tttcgttcca 1200ctgagcgtca gaccccgtag
aaaagatcaa aggatcttct tgagatcctt tttttctgcg 1260cgtaatctgc
tgcttgcaaa caaaaaaacc accgctacca gcggtggttt gtttgccgga
1320tcaagagcta ccaactcttt ttccgaaggt aactggcttc agcagagcgc
agataccaaa 1380tactgtcctt ctagtgtagc cgtagttagg ccaccacttc
aagaactctg tagcaccgcc 1440tacatacctc gctctgctaa tcctgttacc
agtggctgct gccagtggcg ataagtcgtg 1500tcttaccggg ttggactcaa
gacgatagtt accggataag gcgcagcggt cgggctgaac 1560ggggggttcg
tgcacacagc ccagcttgga gcgaacgacc tacaccgaac tgagatacct
1620acagcgtgag ctatgagaaa gcgccacgct tcccgaaggg agaaaggcgg
acaggtatcc 1680ggtaagcggc agggtcggaa caggagagcg cacgagggag
cttccagggg gaaacgcctg 1740gtatctttat agtcctgtcg ggtttcgcca
cctctgactt gagcgtcgat ttttgtgatg 1800ctcgtcaggg gggcggagcc
tatggaaaaa cgccagcaac gcggcctttt tacggttcct 1860ggccttttgc
tggccttttg ctcacatgtt ctttcctgcg ttatcccctg attctgtgga
1920taaccgtatt accgcctttg agtgagctga taccgctcgc cgcagccgaa
cgaccgagcg 1980cagcgagtca gtgagcgagg aagcggaaga gcgcctgatg
cggtattttc tccttacgca 2040tctgtgcggt atttcacacc gcatatatgg
tgcactctca gtacaatctg ctctgatgcc 2100gcatagttaa gccagtatac
actccgctat cgctacgtga ctgggtcatg gctgcgcccc 2160gacacccgcc
aacacccgct gacgcgccct gacgggcttg tctgctcccg gcatccgctt
2220acagacaagc tgtgaccgtc tccgggagct gcatgtgtca gaggttttca
ccgtcatcac 2280cgaaacgcgc gaggcagcgc tctcccttat gcgactcctg
cattaggaag cagcccagta 2340gtaggttgag gccgttgagc accgccgccg
caaggaatgg tgcatgcaag gagatggcgc 2400ccaacagtcc cccggccacg
gggcctgcca ccatacccac gccgaaacaa gcgctcatga 2460gcccgaagtg
gcgagcccga tcttccccat cggtgatgtc ggcgatatag gcgccagcaa
2520ccgcacctgt ggcgccggtg atgccggcca cgatgcgtcc ggcgtagagg
atcgagatct 2580agcccgccta atgagcgggc ttttttttag atctcgatcc
cgcgaaatta atacgactca 2640ctatagggag accacaacgg tttccctcta
gaaataattt tgtttaactt taagaaggag 2700atatacatat ggcagattct
gatattaata ttaaaaccgg tactacagat attggaagca 2760atactacagt
aaaaacaggt gatttagtca cttatgataa agaaaatggc atgcacaaaa
2820aagtatttta tagttttatc gatgataaaa atcacaataa aaaactgcta
gttattagaa 2880caaaaggtac cattgctggt caatatagag tttatagcga
agaaggtgct aacaaaagtg 2940gtttagcctg gccttcagcc tttaaggtac
agttgcaact acctgataat gaagtagctc 3000aaatatctga ttactatcca
agaaattcga ttgatacaaa agagtatatg agtactttaa 3060cttatggatt
caacggtaat gttactggtg atgatacagg aaaaattggc ggccttattg
3120gtgcaaatgt ttcgattggt catacactga aatatgttca acctgatttc
aaaacaattt 3180tagagagccc aactgataaa aaagtaggct ggaaagtgat
atttaacaat atggtgaatc 3240aaaattgggg accatacgat cgagattctt
ggaacccggt atatggcaat caacttttca 3300tgaaaactag aaatggttct
atgaaagcag cagataactt ccttgatcct aacaaagcaa 3360gttctctatt
atcttcaggg ttttcaccag acttcgctac agttattact atggatagaa
3420aagcatccaa acaacaaaca aatatagatg taatatacga acgagttcgt
gatgattacc 3480aattgcattg gacttcaaca aattggaaag gtaccaatac
taaagataaa tggacagatc 3540gttcttcaga aagatataaa atcgattggg
aaaaagaaga aatgacaaat taatgtaaat 3600tatttgtaca tgtacaaata
aatataattt ataactttag ccgaaagctt ggatccggct 3660gctaacaaag
cccgaaagga agctgagttg gctgctgcca ccgctgagca ataactagca
3720taaccccttg gggcctctaa acgggtcttg aggggttttt tgctgaaagg
aggaactata 3780tataattcga gctcggtacc caccccggtt gataatcaga
aaagccccaa aaacaggaag 3840attgtataag caaatattta aattgtaaac
gttaatattt tgttaaaatt cgcgttaaat 3900ttttgttaaa tcagctcatt
ttttaaccaa taggccgaaa tcggcaaaat cccttataaa 3960tcaaaagaat
agaccgagat agggttgagt gttgttccag tttggaacaa gagtccagta
4020ttaaagaacg tggactccaa cgtcaaaggg cgaaaaaccg tctatcaggg
cgatggccca 4080ctacgtgaac catcacccta atcaagtttt ttggggtcga
ggtgccgtaa agcactaaat 4140cggaacccta aagggatgcc ccgatttaga
gcttgacggg gaaagccggc gaacgtggcg 4200agaaaggaag ggaagaaagc
gaaaggagcg ggcgctaggg cgctggcaag tgtagcggtc 4260acgctgcgcg
taaccaccac acccgccgcg cttaatgcgc cgctacaggg cgcgtgggga
4320tcctctagag tcgacctgca ggcatgcaag ctatcccgca agaggcccgg
cagtaccggc 4380ataaccaagc ctatgcctac agcatccagg gtgacggtgc
cgaggatgac gatgagcgca 4440ttgttagatt tcatacacgg tgcctgactg
cgttagcaat ttaactgtga taaactaccg 4500cattaaagct agcttatcga
tgataagctg tcaaacatga gaa 4543
6 888 DNA Artificial sequence wild-type
alpha-hemolysin containing a BspEI cloning site at position 1 (L1)
6 atggcagatt ctgatattaa tattaaaacc ggtactacag atattggaag caatacttcc
60ggaacagtaa aaacaggtga tttagtcact tatgataaag aaaatggcat gcacaaaaaa
120gtattttata gttttatcga tgataaaaat cacaataaaa aactgctagt
tattagaaca 180aaaggtacca ttgctggtca atatagagtt tatagcgaag
aaggtgctaa caaaagtggt 240ttagcctggc cttcagcctt taaggtacag
ttgcaactac ctgataatga agtagctcaa 300atatctgatt actatccaag
aaattcgatt gatacaaaag agtatatgag tactttaact 360tatggattca
acggtaatgt tactggtgat gatacaggaa aaattggcgg ccttattggt
420gcaaatgttt cgattggtca tacactgaaa tatgttcaac ctgatttcaa
aacaatttta 480gagagcccaa ctgataaaaa agtaggctgg aaagtgatat
ttaacaatat ggtgaatcaa 540aattggggac catacgatcg agattcttgg
aacccggtat atggcaatca acttttcatg 600aaaactagaa atggttctat
gaaagcagca gataacttcc ttgatcctaa caaagcaagt 660tctctattat
cttcagggtt ttcaccagac ttcgctacag ttattactat ggatagaaaa
720gcatccaaac aacaaacaaa tatagatgta atatacgaac gagttcgtga
tgattaccaa 780ttgcattgga cttcaacaaa ttggaaaggt accaatacta
aagataaatg gacagatcgt 840tcttcagaaa gatataaaat
cgattgggaa aaagaagaaa tgacaaat 888
7 888 DNA Artificial sequence wild-type
alpha-hemolysin containing a BspEI cloning site at position 2 (L2a)
7 atggcagatt ctgatattaa tattaaaacc ggtactacag
atattggaag caatactaca 60gtaaaaacag gtgatttagt cacttatgat aaagaaaatg
gcatgcacaa aaaagtattt 120tatagtttta tcgattccgg agataaaaat
cacaataaaa aactgctagt tattagaaca 180aaaggtacca ttgctggtca
atatagagtt tatagcgaag aaggtgctaa caaaagtggt 240ttagcctggc
cttcagcctt taaggtacag ttgcaactac ctgataatga agtagctcaa
300atatctgatt actatccaag aaattcgatt gatacaaaag agtatatgag
tactttaact 360tatggattca acggtaatgt tactggtgat gatacaggaa
aaattggcgg ccttattggt 420gcaaatgttt cgattggtca tacactgaaa
tatgttcaac ctgatttcaa aacaatttta 480gagagcccaa ctgataaaaa
agtaggctgg aaagtgatat ttaacaatat ggtgaatcaa 540aattggggac
catacgatcg agattcttgg aacccggtat atggcaatca acttttcatg
600aaaactagaa atggttctat gaaagcagca gataacttcc ttgatcctaa
caaagcaagt 660tctctattat cttcagggtt ttcaccagac ttcgctacag
ttattactat ggatagaaaa 720gcatccaaac aacaaacaaa tatagatgta
atatacgaac gagttcgtga tgattaccaa 780ttgcattgga cttcaacaaa
ttggaaaggt accaatacta aagataaatg gacagatcgt 840tcttcagaaa
gatataaaat cgattgggaa aaagaagaaa tgacaaat 888
8 888 DNA Artificial sequence wild-type
alpha-hemolysin containing a BspEI cloning site at position 2 (L2b)
8 atggcagatt ctgatattaa tattaaaacc ggtactacag
atattggaag caatactaca 60gtaaaaacag gtgatttagt cacttatgat aaagaaaatg
gcatgcacaa aaaagtattt 120tatagtttta tcgatgataa aaatcacaat
aaatccggaa aactgctagt tattagaaca 180aaaggtacca ttgctggtca
atatagagtt tatagcgaag aaggtgctaa caaaagtggt 240ttagcctggc
cttcagcctt taaggtacag ttgcaactac ctgataatga agtagctcaa
300atatctgatt actatccaag aaattcgatt gatacaaaag agtatatgag
tactttaact 360tatggattca acggtaatgt tactggtgat gatacaggaa
aaattggcgg ccttattggt 420gcaaatgttt cgattggtca tacactgaaa
tatgttcaac ctgatttcaa aacaatttta 480gagagcccaa ctgataaaaa
agtaggctgg aaagtgatat ttaacaatat ggtgaatcaa 540aattggggac
catacgatcg agattcttgg aacccggtat atggcaatca acttttcatg
600aaaactagaa atggttctat gaaagcagca gataacttcc ttgatcctaa
caaagcaagt 660tctctattat cttcagggtt ttcaccagac ttcgctacag
ttattactat ggatagaaaa 720gcatccaaac aacaaacaaa tatagatgta
atatacgaac gagttcgtga tgattaccaa 780ttgcattgga cttcaacaaa
ttggaaaggt accaatacta aagataaatg gacagatcgt 840tcttcagaaa
gatataaaat cgattgggaa aaagaagaaa tgacaaat 888
9 804 DNA E. coli CDS (1)..(804)
9 atg aaa ttt gtc tct ttt aat atc aac ggc ctg cgc
gcc aga cct cac 48Met Lys Phe Val Ser Phe Asn Ile Asn Gly Leu Arg
Ala Arg Pro His1 5 10 15cag ctt gaa gcc atc gtc gaa aag cac caa ccg
gat gtg att ggc ctg 96Gln Leu Glu Ala Ile Val Glu Lys His Gln Pro
Asp Val Ile Gly Leu 20 25 30cag gag aca aaa gtt cat gac gat atg ttt
ccg ctc gaa gag gtg gcg 144Gln Glu Thr Lys Val His Asp Asp Met Phe
Pro Leu Glu Glu Val Ala 35 40 45aag ctc ggc tac aac gtg ttt tat cac
ggg cag aaa ggc cat tat ggc 192Lys Leu Gly Tyr Asn Val Phe Tyr His
Gly Gln Lys Gly His Tyr Gly 50 55 60gtg gcg ctg ctg acc aaa gag acg
ccg att gcc gtg cgt cgc ggc ttt 240Val Ala Leu Leu Thr Lys Glu Thr
Pro Ile Ala Val Arg Arg Gly Phe65 70 75 80ccc ggt gac gac gaa gag
gcg cag cgg cgg att att atg gcg gaa atc 288Pro Gly Asp Asp Glu Glu
Ala Gln Arg Arg Ile Ile Met Ala Glu Ile 85 90 95ccc tca ctg ctg ggt
aat gtc acc gtg atc aac ggt tac ttc ccg cag 336Pro Ser Leu Leu Gly
Asn Val Thr Val Ile Asn Gly Tyr Phe Pro Gln 100 105 110ggt gaa agc
cgc gac cat ccg ata aaa ttc ccg gca aaa gcg cag ttt 384Gly Glu Ser
Arg Asp His Pro Ile Lys Phe Pro Ala Lys Ala Gln Phe 115 120 125tat
cag aat ctg caa aac tac ctg gaa acc gaa ctc aaa cgt gat aat 432Tyr
Gln Asn Leu Gln Asn Tyr Leu Glu Thr Glu Leu Lys Arg Asp Asn 130 135
140ccg gta ctg att atg ggc gat atg aat atc agc cct aca gat ctg gat
480Pro Val Leu Ile Met Gly Asp Met Asn Ile Ser Pro Thr Asp Leu
Asp145 150 155 160atc ggc att ggc gaa gaa aac cgt aag cgc tgg ctg
cgt acc ggt aaa 528Ile Gly Ile Gly Glu Glu Asn Arg Lys Arg Trp Leu
Arg Thr Gly Lys 165 170 175tgc tct ttc ctg ccg gaa gag cgc gaa tgg
atg gac agg ctg atg agc 576Cys Ser Phe Leu Pro Glu Glu Arg Glu Trp
Met Asp Arg Leu Met Ser 180 185 190tgg ggg ttg gtc gat acc ttc cgc
cat gcg aat ccg caa aca gca gat 624Trp Gly Leu Val Asp Thr Phe Arg
His Ala Asn Pro Gln Thr Ala Asp 195 200 205cgt ttc tca tgg ttt gat
tac cgc tca aaa ggt ttt gac gat aac cgt 672Arg Phe Ser Trp Phe Asp
Tyr Arg Ser Lys Gly Phe Asp Asp Asn Arg 210 215 220ggt ctg cgc atc
gac ctg ctg ctc gcc agc caa ccg ctg gca gaa tgt 720Gly Leu Arg Ile
Asp Leu Leu Leu Ala Ser Gln Pro Leu Ala Glu Cys225 230 235 240tgc
gta gaa acc ggc atc gac tat gaa atc cgc agc atg gaa aaa ccg 768Cys
Val Glu Thr Gly Ile Asp Tyr Glu Ile Arg Ser Met Glu Lys Pro 245 250
255tcc gat cac gcc ccc gtc tgg gcg acc ttc cgc cgc 804Ser Asp His
Ala Pro Val Trp Ala Thr Phe Arg Arg 260 265
10 268 PRT E. coli
10 Met
Lys Phe Val Ser Phe Asn Ile Asn Gly Leu Arg Ala Arg Pro His1 5 10
15Gln Leu Glu Ala Ile Val Glu Lys His Gln Pro Asp Val Ile Gly Leu
20 25 30Gln Glu Thr Lys Val His Asp Asp Met Phe Pro Leu Glu Glu Val
Ala 35 40 45Lys Leu Gly Tyr Asn Val Phe Tyr His Gly Gln Lys Gly His
Tyr Gly 50 55 60Val Ala Leu Leu Thr Lys Glu Thr Pro Ile Ala Val Arg
Arg Gly Phe65 70 75 80Pro Gly Asp Asp Glu Glu Ala Gln Arg Arg Ile
Ile Met Ala Glu Ile 85 90 95Pro Ser Leu Leu Gly Asn Val Thr Val Ile
Asn Gly Tyr Phe Pro Gln 100 105 110Gly Glu Ser Arg Asp His Pro Ile
Lys Phe Pro Ala Lys Ala Gln Phe 115 120 125Tyr Gln Asn Leu Gln Asn
Tyr Leu Glu Thr Glu Leu Lys Arg Asp Asn 130 135 140Pro Val Leu Ile
Met Gly Asp Met Asn Ile Ser Pro Thr Asp Leu Asp145 150 155 160Ile
Gly Ile Gly Glu Glu Asn Arg Lys Arg Trp Leu Arg Thr Gly Lys 165 170
175Cys Ser Phe Leu Pro Glu Glu Arg Glu Trp Met Asp Arg Leu Met Ser
180 185 190Trp Gly Leu Val Asp Thr Phe Arg His Ala Asn Pro Gln Thr
Ala Asp 195 200 205Arg Phe Ser Trp Phe Asp Tyr Arg Ser Lys Gly Phe
Asp Asp Asn Arg 210 215 220Gly Leu Arg Ile Asp Leu Leu Leu Ala Ser
Gln Pro Leu Ala Glu Cys225 230 235 240Cys Val Glu Thr Gly Ile Asp
Tyr Glu Ile Arg Ser Met Glu Lys Pro 245 250 255Ser Asp His Ala Pro
Val Trp Ala Thr Phe Arg Arg 260 265
11 1425 DNA E. coli CDS (1)..(1425)
11 atg atg aat gac ggt aag caa caa tct acc ttt ttg ttt cac gat tac
48Met Met Asn Asp Gly Lys Gln Gln Ser Thr Phe Leu Phe His Asp Tyr1
5 10 15gaa acc ttt ggc acg cac ccc gcg tta gat cgc cct gca cag ttc
gca 96Glu Thr Phe Gly Thr His Pro Ala Leu Asp Arg Pro Ala Gln Phe
Ala 20 25 30gcc att cgc acc gat agc gaa ttc aat gtc atc ggc gaa ccc
gaa gtc 144Ala Ile Arg Thr Asp Ser Glu Phe Asn Val Ile Gly Glu Pro
Glu Val 35 40 45ttt tac tgc aag ccc gct gat gac tat tta ccc cag cca
gga gcc gta 192Phe Tyr Cys Lys Pro Ala Asp Asp Tyr Leu Pro Gln Pro
Gly Ala Val 50 55 60tta att acc ggt att acc ccg cag gaa gca cgg gcg
aaa gga gaa aac 240Leu Ile Thr Gly Ile Thr Pro Gln Glu Ala Arg Ala
Lys Gly Glu Asn65 70 75 80gaa gcc gcg ttt gcc gcc cgt att cac tcg
ctt ttt acc gta ccg aag 288Glu Ala Ala Phe Ala Ala Arg Ile His Ser
Leu Phe Thr Val Pro Lys 85 90 95acc tgt att ctg ggc tac aac aat gtg
cgt ttc gac gac gaa gtc aca 336Thr Cys Ile Leu Gly Tyr Asn Asn Val
Arg Phe Asp Asp Glu Val Thr 100 105 110cgc aac att ttt tat cgt aat
ttc tac gat cct tac gcc tgg agc tgg 384Arg Asn Ile Phe Tyr Arg Asn
Phe Tyr Asp Pro Tyr Ala Trp Ser Trp 115 120 125cag cat gat aac tcg
cgc tgg gat tta ctg gat gtt atg cgt gcc tgt 432Gln His Asp Asn Ser
Arg Trp Asp Leu Leu Asp Val Met Arg Ala Cys 130 135 140tat gcc ctg
cgc ccg gaa gga ata aac tgg cct gaa aat gat gac ggt 480Tyr Ala Leu
Arg Pro Glu Gly Ile Asn Trp Pro Glu Asn Asp Asp Gly145 150 155
160cta ccg agc ttt cgc ctt gag cat tta acc aaa gcg aat ggt att gaa
528Leu Pro Ser Phe Arg Leu Glu His Leu Thr Lys Ala Asn Gly Ile Glu
165 170 175cat agc aac gcc cac gat gcg atg gct gat gtg tac gcc act
att gcg 576His Ser Asn Ala His Asp Ala Met Ala Asp Val Tyr Ala Thr
Ile Ala 180 185 190atg gca aag ctg gta aaa acg cgt cag cca cgc ctg
ttt gat tat ctc 624Met Ala Lys Leu Val Lys Thr Arg Gln Pro Arg Leu
Phe Asp Tyr Leu 195 200 205ttt acc cat cgt aat aaa cac aaa ctg atg
gcg ttg att gat gtt ccg 672Phe Thr His Arg Asn Lys His Lys Leu Met
Ala Leu Ile Asp Val Pro 210 215 220cag atg aaa ccc ctg gtg cac gtt
tcc gga atg ttt gga gca tgg cgc 720Gln Met Lys Pro Leu Val His Val
Ser Gly Met Phe Gly Ala Trp Arg225 230 235 240ggc aat acc agc tgg
gtg gca ccg ctg gcg tgg cat cct gaa aat cgc 768Gly Asn Thr Ser Trp
Val Ala Pro Leu Ala Trp His Pro Glu Asn Arg 245 250 255aat gcc gta
att atg gtg gat ttg gca gga gac att tcg cca tta ctg 816Asn Ala Val
Ile Met Val Asp Leu Ala Gly Asp Ile Ser Pro Leu Leu 260 265 270gaa
ctg gat agc gac aca ttg cgc gag cgt tta tat acc gca aaa acc 864Glu
Leu Asp Ser Asp Thr Leu Arg Glu Arg Leu Tyr Thr Ala Lys Thr 275 280
285gat ctt ggc gat aac gcc gcc gtt ccg gtt aag ctg gtg cat atc aat
912Asp Leu Gly Asp Asn Ala Ala Val Pro Val Lys Leu Val His Ile Asn
290 295 300aaa tgt ccg gtg ctg gcc cag gcg aat acg cta cgc ccg gaa
gat gcc 960Lys Cys Pro Val Leu Ala Gln Ala Asn Thr Leu Arg Pro Glu
Asp Ala305 310 315 320gac cga ctg gga att aat cgt cag cat tgc ctc
gat aac ctg aaa att 1008Asp Arg Leu Gly Ile Asn Arg Gln His Cys Leu
Asp Asn Leu Lys Ile 325 330 335ctg cgt gaa aat ccg caa gtg cgc gaa
aaa gtg gtg gcg ata ttc gcg 1056Leu Arg Glu Asn Pro Gln Val Arg Glu
Lys Val Val Ala Ile Phe Ala 340 345 350gaa gcc gaa ccg ttt acg cct
tca gat aac gtg gat gca cag ctt tat 1104Glu Ala Glu Pro Phe Thr Pro
Ser Asp Asn Val Asp Ala Gln Leu Tyr 355 360 365aac ggc ttt ttc agt
gac gca gat cgt gca gca atg aaa att gtg ctg 1152Asn Gly Phe Phe Ser
Asp Ala Asp Arg Ala Ala Met Lys Ile Val Leu 370 375 380gaa acc gag
ccg cgt aat tta ccg gca ctg gat atc act ttt gtt gat 1200Glu Thr Glu
Pro Arg Asn Leu Pro Ala Leu Asp Ile Thr Phe Val Asp385 390 395
400aaa cgg att gaa aag ctg ttg ttc aat tat cgg gca cgc aac ttc ccg
1248Lys Arg Ile Glu Lys Leu Leu Phe Asn Tyr Arg Ala Arg Asn Phe Pro
405 410 415ggg acg ctg gat tat gcc gag cag caa cgc tgg ctg gag cac
cgt cgc 1296Gly Thr Leu Asp Tyr Ala Glu Gln Gln Arg Trp Leu Glu His
Arg Arg 420 425 430cag gtc ttc acg cca gag ttt ttg cag ggt tat gct
gat gaa ttg cag 1344Gln Val Phe Thr Pro Glu Phe Leu Gln Gly Tyr Ala
Asp Glu Leu Gln 435 440 445atg ctg gta caa caa tat gcc gat gac aaa
gag aaa gtg gcg ctg tta 1392Met Leu Val Gln Gln Tyr Ala Asp Asp Lys
Glu Lys Val Ala Leu Leu 450 455 460aaa gca ctt tgg cag tac gcg gaa
gag att gtc 1425Lys Ala Leu Trp Gln Tyr Ala Glu Glu Ile Val465 470
475
12 475 PRT E. coli
12 Met Met Asn Asp Gly Lys Gln Gln Ser Thr Phe
Leu Phe His Asp Tyr1 5 10 15Glu Thr Phe Gly Thr His Pro Ala Leu Asp
Arg Pro Ala Gln Phe Ala 20 25 30Ala Ile Arg Thr Asp Ser Glu Phe Asn
Val Ile Gly Glu Pro Glu Val 35 40 45Phe Tyr Cys Lys Pro Ala Asp Asp
Tyr Leu Pro Gln Pro Gly Ala Val 50 55 60Leu Ile Thr Gly Ile Thr Pro
Gln Glu Ala Arg Ala Lys Gly Glu Asn65 70 75 80Glu Ala Ala Phe Ala
Ala Arg Ile His Ser Leu Phe Thr Val Pro Lys 85 90 95Thr Cys Ile Leu
Gly Tyr Asn Asn Val Arg Phe Asp Asp Glu Val Thr 100 105 110Arg Asn
Ile Phe Tyr Arg Asn Phe Tyr Asp Pro Tyr Ala Trp Ser Trp 115 120
125Gln His Asp Asn Ser Arg Trp Asp Leu Leu Asp Val Met Arg Ala Cys
130 135 140Tyr Ala Leu Arg Pro Glu Gly Ile Asn Trp Pro Glu Asn Asp
Asp Gly145 150 155 160Leu Pro Ser Phe Arg Leu Glu His Leu Thr Lys
Ala Asn Gly Ile Glu 165 170 175His Ser Asn Ala His Asp Ala Met Ala
Asp Val Tyr Ala Thr Ile Ala 180 185 190Met Ala Lys Leu Val Lys Thr
Arg Gln Pro Arg Leu Phe Asp Tyr Leu 195 200 205Phe Thr His Arg Asn
Lys His Lys Leu Met Ala Leu Ile Asp Val Pro 210 215 220Gln Met Lys
Pro Leu Val His Val Ser Gly Met Phe Gly Ala Trp Arg225 230 235
240Gly Asn Thr Ser Trp Val Ala Pro Leu Ala Trp His Pro Glu Asn Arg
245 250 255Asn Ala Val Ile Met Val Asp Leu Ala Gly Asp Ile Ser Pro
Leu Leu 260 265 270Glu Leu Asp Ser Asp Thr Leu Arg Glu Arg Leu Tyr
Thr Ala Lys Thr 275 280 285Asp Leu Gly Asp Asn Ala Ala Val Pro Val
Lys Leu Val His Ile Asn 290 295 300Lys Cys Pro Val Leu Ala Gln Ala
Asn Thr Leu Arg Pro Glu Asp Ala305 310 315 320Asp Arg Leu Gly Ile
Asn Arg Gln His Cys Leu Asp Asn Leu Lys Ile 325 330 335Leu Arg Glu
Asn Pro Gln Val Arg Glu Lys Val Val Ala Ile Phe Ala 340 345 350Glu
Ala Glu Pro Phe Thr Pro Ser Asp Asn Val Asp Ala Gln Leu Tyr 355 360
365Asn Gly Phe Phe Ser Asp Ala Asp Arg Ala Ala Met Lys Ile Val Leu
370 375 380Glu Thr Glu Pro Arg Asn Leu Pro Ala Leu Asp Ile Thr Phe
Val Asp385 390 395 400Lys Arg Ile Glu Lys Leu Leu Phe Asn Tyr Arg
Ala Arg Asn Phe Pro 405 410 415Gly Thr Leu Asp Tyr Ala Glu Gln Gln
Arg Trp Leu Glu His Arg Arg 420 425 430Gln Val Phe Thr Pro Glu Phe
Leu Gln Gly Tyr Ala Asp Glu Leu Gln 435 440 445Met Leu Val Gln Gln
Tyr Ala Asp Asp Lys Glu Lys Val Ala Leu Leu 450 455 460Lys Ala Leu
Trp Gln Tyr Ala Glu Glu Ile Val 465 470 475
13 1275 DNA T. thermophilus CDS (1)..(1275)
13 atg ttt cgt cgt aaa gaa gat ctg gat
ccg ccg ctg gca ctg ctg ccg 48Met Phe Arg Arg Lys Glu Asp Leu Asp
Pro Pro Leu Ala Leu Leu Pro1 5 10 15ctg aaa ggc ctg cgc gaa gcc gcc
gca ctg ctg gaa gaa gcg ctg cgt 96Leu Lys Gly Leu Arg Glu Ala Ala
Ala Leu Leu Glu Glu Ala Leu Arg 20 25 30caa ggt aaa cgc att cgt gtt
cac ggc gac tat gat gcg gat ggc ctg 144Gln Gly Lys Arg Ile Arg Val
His Gly Asp Tyr Asp Ala Asp Gly Leu 35 40 45acc ggc acc gcg atc ctg
gtt cgt ggt ctg gcc gcc ctg ggt gcg gat 192Thr Gly Thr Ala Ile Leu
Val Arg Gly Leu Ala Ala Leu Gly Ala Asp 50 55 60gtt cat ccg ttt atc
ccg cac cgc ctg gaa gaa ggc tat ggt gtc ctg 240Val His Pro Phe Ile
Pro His Arg Leu Glu Glu Gly Tyr Gly Val Leu65 70 75 80atg gaa cgc
gtc ccg gaa cat ctg gaa gcc tcg gac ctg ttt ctg acc 288Met Glu Arg
Val Pro Glu His Leu Glu Ala Ser Asp Leu Phe Leu Thr 85 90 95gtt gac
tgc ggc att acc aac cat gcg gaa ctg cgc gaa ctg ctg gaa 336Val Asp
Cys Gly Ile Thr Asn His Ala Glu Leu Arg Glu Leu Leu Glu 100 105
110aat ggc gtg gaa gtc att
gtt acc gat cat cat acg ccg ggc aaa acg 384Asn Gly Val Glu Val Ile
Val Thr Asp His His Thr Pro Gly Lys Thr 115 120 125ccg ccg ccg ggt
ctg gtc gtg cat ccg gcg ctg acg ccg gat ctg aaa 432Pro Pro Pro Gly
Leu Val Val His Pro Ala Leu Thr Pro Asp Leu Lys 130 135 140gaa aaa
ccg acc ggc gca ggc gtg gcg ttt ctg ctg ctg tgg gca ctg 480Glu Lys
Pro Thr Gly Ala Gly Val Ala Phe Leu Leu Leu Trp Ala Leu145 150 155
160cat gaa cgc ctg ggc ctg ccg ccg ccg ctg gaa tac gcg gac ctg gca
528His Glu Arg Leu Gly Leu Pro Pro Pro Leu Glu Tyr Ala Asp Leu Ala
165 170 175gcc gtt ggc acc att gcc gac gtt gcc ccg ctg tgg ggt tgg
aat cgt 576Ala Val Gly Thr Ile Ala Asp Val Ala Pro Leu Trp Gly Trp
Asn Arg 180 185 190gca ctg gtg aaa gaa ggt ctg gca cgc atc ccg gct
tca tct tgg gtg 624Ala Leu Val Lys Glu Gly Leu Ala Arg Ile Pro Ala
Ser Ser Trp Val 195 200 205ggc ctg cgt ctg ctg gct gaa gcc gtg ggc
tat acc ggc aaa gcg gtc 672Gly Leu Arg Leu Leu Ala Glu Ala Val Gly
Tyr Thr Gly Lys Ala Val 210 215 220gaa gtc gct ttc cgc atc gcg ccg
cgc atc aat gcg gct tcc cgc ctg 720Glu Val Ala Phe Arg Ile Ala Pro
Arg Ile Asn Ala Ala Ser Arg Leu225 230 235 240ggc gaa gcg gaa aaa
gcc ctg cgc ctg ctg ctg acg gat gat gcg gca 768Gly Glu Ala Glu Lys
Ala Leu Arg Leu Leu Leu Thr Asp Asp Ala Ala 245 250 255gaa gct cag
gcg ctg gtc ggc gaa ctg cac cgt ctg aac gcc cgt cgt 816Glu Ala Gln
Ala Leu Val Gly Glu Leu His Arg Leu Asn Ala Arg Arg 260 265 270cag
acc ctg gaa gaa gcg atg ctg cgc aaa ctg ctg ccg cag gcc gac 864Gln
Thr Leu Glu Glu Ala Met Leu Arg Lys Leu Leu Pro Gln Ala Asp 275 280
285ccg gaa gcg aaa gcc atc gtt ctg ctg gac ccg gaa ggc cat ccg ggt
912Pro Glu Ala Lys Ala Ile Val Leu Leu Asp Pro Glu Gly His Pro Gly
290 295 300gtt atg ggt att gtg gcc tct cgc atc ctg gaa gcg acc ctg
cgc ccg 960Val Met Gly Ile Val Ala Ser Arg Ile Leu Glu Ala Thr Leu
Arg Pro305 310 315 320gtc ttt ctg gtg gcc cag ggc aaa ggc acc gtg
cgt tcg ctg gct ccg 1008Val Phe Leu Val Ala Gln Gly Lys Gly Thr Val
Arg Ser Leu Ala Pro 325 330 335att tcc gcc gtc gaa gca ctg cgc agc
gcg gaa gat ctg ctg ctg cgt 1056Ile Ser Ala Val Glu Ala Leu Arg Ser
Ala Glu Asp Leu Leu Leu Arg 340 345 350tat ggt ggt cat aaa gaa gcg
gcg ggt ttc gca atg gat gaa gcg ctg 1104Tyr Gly Gly His Lys Glu Ala
Ala Gly Phe Ala Met Asp Glu Ala Leu 355 360 365ttt ccg gcg ttc aaa
gca cgc gtt gaa gcg tat gcc gca cgt ttc ccg 1152Phe Pro Ala Phe Lys
Ala Arg Val Glu Ala Tyr Ala Ala Arg Phe Pro 370 375 380gat ccg gtt
cgt gaa gtg gca ctg ctg gat ctg ctg ccg gaa ccg ggc 1200Asp Pro Val
Arg Glu Val Ala Leu Leu Asp Leu Leu Pro Glu Pro Gly385 390 395
400ctg ctg ccg cag gtg ttc cgt gaa ctg gca ctg ctg gaa ccg tat ggt
1248Leu Leu Pro Gln Val Phe Arg Glu Leu Ala Leu Leu Glu Pro Tyr Gly
405 410 415gaa ggt aac ccg gaa ccg ctg ttc ctg 1275Glu Gly Asn Pro
Glu Pro Leu Phe Leu 420 425
14 425 PRT T. thermophilus
14 Met Phe Arg
Arg Lys Glu Asp Leu Asp Pro Pro Leu Ala Leu Leu Pro1 5 10 15Leu Lys
Gly Leu Arg Glu Ala Ala Ala Leu Leu Glu Glu Ala Leu Arg 20 25 30Gln
Gly Lys Arg Ile Arg Val His Gly Asp Tyr Asp Ala Asp Gly Leu 35 40
45Thr Gly Thr Ala Ile Leu Val Arg Gly Leu Ala Ala Leu Gly Ala Asp
50 55 60Val His Pro Phe Ile Pro His Arg Leu Glu Glu Gly Tyr Gly Val
Leu65 70 75 80Met Glu Arg Val Pro Glu His Leu Glu Ala Ser Asp Leu
Phe Leu Thr 85 90 95Val Asp Cys Gly Ile Thr Asn His Ala Glu Leu Arg
Glu Leu Leu Glu 100 105 110Asn Gly Val Glu Val Ile Val Thr Asp His
His Thr Pro Gly Lys Thr 115 120 125Pro Pro Pro Gly Leu Val Val His
Pro Ala Leu Thr Pro Asp Leu Lys 130 135 140Glu Lys Pro Thr Gly Ala
Gly Val Ala Phe Leu Leu Leu Trp Ala Leu145 150 155 160His Glu Arg
Leu Gly Leu Pro Pro Pro Leu Glu Tyr Ala Asp Leu Ala 165 170 175Ala
Val Gly Thr Ile Ala Asp Val Ala Pro Leu Trp Gly Trp Asn Arg 180 185
190Ala Leu Val Lys Glu Gly Leu Ala Arg Ile Pro Ala Ser Ser Trp Val
195 200 205Gly Leu Arg Leu Leu Ala Glu Ala Val Gly Tyr Thr Gly Lys
Ala Val 210 215 220Glu Val Ala Phe Arg Ile Ala Pro Arg Ile Asn Ala
Ala Ser Arg Leu225 230 235 240Gly Glu Ala Glu Lys Ala Leu Arg Leu
Leu Leu Thr Asp Asp Ala Ala 245 250 255Glu Ala Gln Ala Leu Val Gly
Glu Leu His Arg Leu Asn Ala Arg Arg 260 265 270Gln Thr Leu Glu Glu
Ala Met Leu Arg Lys Leu Leu Pro Gln Ala Asp 275 280 285Pro Glu Ala
Lys Ala Ile Val Leu Leu Asp Pro Glu Gly His Pro Gly 290 295 300Val
Met Gly Ile Val Ala Ser Arg Ile Leu Glu Ala Thr Leu Arg Pro305 310
315 320Val Phe Leu Val Ala Gln Gly Lys Gly Thr Val Arg Ser Leu Ala
Pro 325 330 335Ile Ser Ala Val Glu Ala Leu Arg Ser Ala Glu Asp Leu
Leu Leu Arg 340 345 350Tyr Gly Gly His Lys Glu Ala Ala Gly Phe Ala
Met Asp Glu Ala Leu 355 360 365Phe Pro Ala Phe Lys Ala Arg Val Glu
Ala Tyr Ala Ala Arg Phe Pro 370 375 380Asp Pro Val Arg Glu Val Ala
Leu Leu Asp Leu Leu Pro Glu Pro Gly385 390 395 400Leu Leu Pro Gln
Val Phe Arg Glu Leu Ala Leu Leu Glu Pro Tyr Gly 405 410 415Glu Gly
Asn Pro Glu Pro Leu Phe Leu 420
425
15 738 DNA Bacteriophage CDS (31)..(708)
15 tccggaagcg gctctggtag
tggttctggc atg aca ccg gac att atc ctg cag 54 Met Thr Pro Asp Ile
Ile Leu Gln 1 5cgt acc ggg atc gat gtg aga gct gtc gaa cag ggg gat
gat gcg tgg 102Arg Thr Gly Ile Asp Val Arg Ala Val Glu Gln Gly Asp
Asp Ala Trp 10 15 20cac aaa tta cgg ctc ggc gtc atc acc gct tca gaa
gtt cac aac gtg 150His Lys Leu Arg Leu Gly Val Ile Thr Ala Ser Glu
Val His Asn Val25 30 35 40ata gca aaa ccc cgc tcc gga aag aag tgg
cct gac atg aaa atg tcc 198Ile Ala Lys Pro Arg Ser Gly Lys Lys Trp
Pro Asp Met Lys Met Ser 45 50 55tac ttc cac acc ctg ctt gct gag gtt
tgc acc ggt gtg gct ccg gaa 246Tyr Phe His Thr Leu Leu Ala Glu Val
Cys Thr Gly Val Ala Pro Glu 60 65 70gtt aac gct aaa gca ctg gcc tgg
gga aaa cag tac gag aac gac gcc 294Val Asn Ala Lys Ala Leu Ala Trp
Gly Lys Gln Tyr Glu Asn Asp Ala 75 80 85aga acc ctg ttt gaa ttc act
tcc ggc gtg aat gtt act gaa tcc ccg 342Arg Thr Leu Phe Glu Phe Thr
Ser Gly Val Asn Val Thr Glu Ser Pro 90 95 100atc atc tat cgc gac
gaa agt atg cgt acc gcc tgc tct ccc gat ggt 390Ile Ile Tyr Arg Asp
Glu Ser Met Arg Thr Ala Cys Ser Pro Asp Gly105 110 115 120tta tgc
agt gac ggc aac ggc ctt gaa ctg aaa tgc ccg ttt acc tcc 438Leu Cys
Ser Asp Gly Asn Gly Leu Glu Leu Lys Cys Pro Phe Thr Ser 125 130
135cgg gat ttc atg aag ttc cgg ctc ggt ggt ttc gag gcc ata aag tca
486Arg Asp Phe Met Lys Phe Arg Leu Gly Gly Phe Glu Ala Ile Lys Ser
140 145 150gct tac atg gcc cag gtg cag tac agc atg tgg gtg acg cga
aaa aat 534Ala Tyr Met Ala Gln Val Gln Tyr Ser Met Trp Val Thr Arg
Lys Asn 155 160 165gcc tgg tac ttt gcc aac tat gac ccg cgt atg aag
cgt gaa ggc ctg 582Ala Trp Tyr Phe Ala Asn Tyr Asp Pro Arg Met Lys
Arg Glu Gly Leu 170 175 180cat tat gtc gtg att gag cgg gat gaa aag
tac atg gcg agt ttt gac 630His Tyr Val Val Ile Glu Arg Asp Glu Lys
Tyr Met Ala Ser Phe Asp185 190 195 200gag atc gtg ccg gag ttc atc
gaa aaa atg gac gag gca ctg gct gaa 678Glu Ile Val Pro Glu Phe Ile
Glu Lys Met Asp Glu Ala Leu Ala Glu 205 210 215att ggt ttt gta ttt
ggg gag caa tgg cga tctggctctg gttccggcag 728Ile Gly Phe Val Phe
Gly Glu Gln Trp Arg 220 225 cggttccgga 738
16 226 PRT Bacteriophage
16 Met Thr Pro Asp Ile Ile Leu Gln Arg Thr Gly Ile Asp Val Arg Ala1
5 10 15Val Glu Gln Gly Asp Asp Ala Trp His Lys Leu Arg Leu Gly Val
Ile 20 25 30Thr Ala Ser Glu Val His Asn Val Ile Ala Lys Pro Arg Ser
Gly Lys 35 40 45Lys Trp Pro Asp Met Lys Met Ser Tyr Phe His Thr Leu
Leu Ala Glu 50 55 60Val Cys Thr Gly Val Ala Pro Glu Val Asn Ala Lys
Ala Leu Ala Trp65 70 75 80Gly Lys Gln Tyr Glu Asn Asp Ala Arg Thr
Leu Phe Glu Phe Thr Ser 85 90 95Gly Val Asn Val Thr Glu Ser Pro Ile
Ile Tyr Arg Asp Glu Ser Met 100 105 110Arg Thr Ala Cys Ser Pro Asp
Gly Leu Cys Ser Asp Gly Asn Gly Leu 115 120 125Glu Leu Lys Cys Pro
Phe Thr Ser Arg Asp Phe Met Lys Phe Arg Leu 130 135 140Gly Gly Phe
Glu Ala Ile Lys Ser Ala Tyr Met Ala Gln Val Gln Tyr145 150 155
160Ser Met Trp Val Thr Arg Lys Asn Ala Trp Tyr Phe Ala Asn Tyr Asp
165 170 175Pro Arg Met Lys Arg Glu Gly Leu His Tyr Val Val Ile Glu
Arg Asp 180 185 190Glu Lys Tyr Met Ala Ser Phe Asp Glu Ile Val Pro
Glu Phe Ile Glu 195 200 205Lys Met Asp Glu Ala Leu Ala Glu Ile Gly
Phe Val Phe Gly Glu Gln 210 215 220 Trp Arg 225
17 1794 DNA Artificial sequence HL-wt-EcoExoIII-L1-H6
17 atg gca gat tct gat att aat att aaa
acc ggt act aca gat att gga 48 Ala Asp Ser Asp Ile Asn Ile Lys Thr
Gly Thr Thr Asp Ile Gly 1 5 10 15agc aat act tcc gga agc ggc tct
ggt agt ggt tct ggc atg aaa ttt 96Ser Asn Thr Ser Gly Ser Gly Ser
Gly Ser Gly Ser Gly Met Lys Phe 20 25 30gtt agc ttc aat atc aac ggc
ctg cgc gcg cgc ccg cat cag ctg gaa 144Val Ser Phe Asn Ile Asn Gly
Leu Arg Ala Arg Pro His Gln Leu Glu 35 40 45gcg att gtg gaa aaa cat
cag ccg gat gtt att ggt ctg cag gaa acc 192Ala Ile Val Glu Lys His
Gln Pro Asp Val Ile Gly Leu Gln Glu Thr 50 55 60aaa gtt cac gat gat
atg ttt ccg ctg gaa gaa gtg gcg aaa ctg ggc 240Lys Val His Asp Asp
Met Phe Pro Leu Glu Glu Val Ala Lys Leu Gly 65 70 75tat aac gtg ttt
tat cat ggc cag aaa ggt cat tat ggc gtg gcc ctg 288Tyr Asn Val Phe
Tyr His Gly Gln Lys Gly His Tyr Gly Val Ala Leu80 85 90 95ctg acc
aaa gaa acc ccg atc gcg gtt cgt cgt ggt ttt ccg ggt gat 336Leu Thr
Lys Glu Thr Pro Ile Ala Val Arg Arg Gly Phe Pro Gly Asp 100 105
110gat gaa gaa gcg cag cgt cgt att att atg gcg gaa att ccg agc ctg
384Asp Glu Glu Ala Gln Arg Arg Ile Ile Met Ala Glu Ile Pro Ser Leu
115 120 125ctg ggc aat gtg acc gtt att aac ggc tat ttt ccg cag ggc
gaa agc 432Leu Gly Asn Val Thr Val Ile Asn Gly Tyr Phe Pro Gln Gly
Glu Ser 130 135 140cgt gat cat ccg att aaa ttt ccg gcc aaa gcg cag
ttc tat cag aac 480Arg Asp His Pro Ile Lys Phe Pro Ala Lys Ala Gln
Phe Tyr Gln Asn 145 150 155ctg cag aac tat ctg gaa acc gaa ctg aaa
cgt gat aat ccg gtg ctg 528Leu Gln Asn Tyr Leu Glu Thr Glu Leu Lys
Arg Asp Asn Pro Val Leu160 165 170 175atc atg ggc gat atg aac att
agc ccg acc gat ctg gat att ggc att 576Ile Met Gly Asp Met Asn Ile
Ser Pro Thr Asp Leu Asp Ile Gly Ile 180 185 190ggc gaa gaa aac cgt
aaa cgc tgg ctg cgt acc ggt aaa tgc agc ttt 624Gly Glu Glu Asn Arg
Lys Arg Trp Leu Arg Thr Gly Lys Cys Ser Phe 195 200 205ctg ccg gaa
gaa cgt gaa tgg atg gat cgc ctg atg agc tgg ggc ctg 672Leu Pro Glu
Glu Arg Glu Trp Met Asp Arg Leu Met Ser Trp Gly Leu 210 215 220gtg
gat acc ttt cgt cat gcg aac ccg cag acc gcc gat cgc ttt agc 720Val
Asp Thr Phe Arg His Ala Asn Pro Gln Thr Ala Asp Arg Phe Ser 225 230
235tgg ttt gat tat cgc agc aaa ggt ttt gat gat aac cgt ggc ctg cgc
768Trp Phe Asp Tyr Arg Ser Lys Gly Phe Asp Asp Asn Arg Gly Leu
Arg240 245 250 255att gat ctg ctg ctg gcg agc cag ccg ctg gcg gaa
tgc tgc gtt gaa 816Ile Asp Leu Leu Leu Ala Ser Gln Pro Leu Ala Glu
Cys Cys Val Glu 260 265 270acc ggt att gat tat gaa att cgc agc atg
gaa aaa ccg agc gat cac 864Thr Gly Ile Asp Tyr Glu Ile Arg Ser Met
Glu Lys Pro Ser Asp His 275 280 285gcc ccg gtg tgg gcg acc ttt cgc
cgc tct ggc tct ggt tcc ggc agc 912Ala Pro Val Trp Ala Thr Phe Arg
Arg Ser Gly Ser Gly Ser Gly Ser 290 295 300ggt tcc gga aca gta aaa
aca ggt gat tta gtc act tat gat aaa gaa 960Gly Ser Gly Thr Val Lys
Thr Gly Asp Leu Val Thr Tyr Asp Lys Glu 305 310 315aat ggc atg cac
aaa aaa gta ttt tat agt ttt atc gat gat aaa aat 1008Asn Gly Met His
Lys Lys Val Phe Tyr Ser Phe Ile Asp Asp Lys Asn320 325 330 335cac
aat aaa aaa ctg cta gtt att aga aca aaa ggt acc att gct ggt 1056His
Asn Lys Lys Leu Leu Val Ile Arg Thr Lys Gly Thr Ile Ala Gly 340 345
350caa tat aga gtt tat agc gaa gaa ggt gct aac aaa agt ggt tta gcc
1104Gln Tyr Arg Val Tyr Ser Glu Glu Gly Ala Asn Lys Ser Gly Leu Ala
355 360 365tgg cct tca gcc ttt aag gta cag ttg caa cta cct gat aat
gaa gta 1152Trp Pro Ser Ala Phe Lys Val Gln Leu Gln Leu Pro Asp Asn
Glu Val 370 375 380gct caa ata tct gat tac tat cca aga aat tcg att
gat aca aaa gag 1200Ala Gln Ile Ser Asp Tyr Tyr Pro Arg Asn Ser Ile
Asp Thr Lys Glu 385 390 395tat atg agt act tta act tat gga ttc aac
ggt aat gtt act ggt gat 1248Tyr Met Ser Thr Leu Thr Tyr Gly Phe Asn
Gly Asn Val Thr Gly Asp400 405 410 415gat aca gga aaa att ggc ggc
ctt att ggt gca aat gtt tcg att ggt 1296Asp Thr Gly Lys Ile Gly Gly
Leu Ile Gly Ala Asn Val Ser Ile Gly 420 425 430cat aca ctg aaa tat
gtt caa cct gat ttc aaa aca att tta gag agc 1344His Thr Leu Lys Tyr
Val Gln Pro Asp Phe Lys Thr Ile Leu Glu Ser 435 440 445cca act gat
aaa aaa gta ggc tgg aaa gtg ata ttt aac aat atg gtg 1392Pro Thr Asp
Lys Lys Val Gly Trp Lys Val Ile Phe Asn Asn Met Val 450 455 460aat
caa aat tgg gga cca tac gat cga gat tct tgg aac ccg gta tat 1440Asn
Gln Asn Trp Gly Pro Tyr Asp Arg Asp Ser Trp Asn Pro Val Tyr 465 470
475ggc aat caa ctt ttc atg aaa act aga aat ggt tct atg aaa gca gca
1488Gly Asn Gln Leu Phe Met Lys Thr Arg Asn Gly Ser Met Lys Ala
Ala480 485 490 495gat aac ttc ctt gat cct aac aaa gca agt tct cta
tta tct tca ggg 1536Asp Asn Phe Leu Asp Pro Asn Lys Ala Ser Ser Leu
Leu Ser Ser Gly 500 505 510ttt tca cca gac ttc gct aca gtt att act
atg gat aga aaa gca tcc 1584Phe Ser Pro Asp Phe Ala Thr Val Ile Thr
Met Asp Arg Lys Ala Ser 515 520 525aaa caa caa aca aat ata gat gta
ata tac gaa cga gtt cgt gat gat 1632Lys Gln Gln Thr Asn Ile Asp Val
Ile Tyr Glu Arg Val Arg Asp Asp 530 535 540tac caa ttg cat tgg act
tca aca aat tgg aaa ggt acc aat act aaa 1680Tyr Gln Leu His Trp Thr
Ser Thr Asn Trp Lys Gly Thr Asn Thr Lys 545 550 555gat aaa tgg aca
gat cgt tct tca gaa aga tat aaa atc gat tgg gaa
1728Asp Lys Trp Thr Asp Arg Ser Ser Glu Arg Tyr Lys Ile Asp Trp
Glu560 565 570 575aaa gaa gaa atg aca aat ggt ggt tcg ggc tca tct
ggt ggc tcg agt 1776Lys Glu Glu Met Thr Asn Gly Gly Ser Gly Ser Ser
Gly Gly Ser Ser 580 585 590cac cat cat cat cac cac 1794His His His
His His His 595
18 597 PRT Artificial sequence HL-wt-EcoExoIII-L1-H6
18 Ala Asp Ser Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly Ser1
5 10 15Asn Thr Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Met Lys Phe
Val 20 25 30Ser Phe Asn Ile Asn Gly Leu Arg Ala Arg Pro His Gln Leu
Glu Ala 35 40 45Ile Val Glu Lys His Gln Pro Asp Val Ile Gly Leu Gln
Glu Thr Lys 50 55 60Val His Asp Asp Met Phe Pro Leu Glu Glu Val Ala
Lys Leu Gly Tyr65 70 75 80Asn Val Phe Tyr His Gly Gln Lys Gly His
Tyr Gly Val Ala Leu Leu 85 90 95Thr Lys Glu Thr Pro Ile Ala Val Arg
Arg Gly Phe Pro Gly Asp Asp 100 105 110Glu Glu Ala Gln Arg Arg Ile
Ile Met Ala Glu Ile Pro Ser Leu Leu 115 120 125Gly Asn Val Thr Val
Ile Asn Gly Tyr Phe Pro Gln Gly Glu Ser Arg 130 135 140Asp His Pro
Ile Lys Phe Pro Ala Lys Ala Gln Phe Tyr Gln Asn Leu145 150 155
160Gln Asn Tyr Leu Glu Thr Glu Leu Lys Arg Asp Asn Pro Val Leu Ile
165 170 175Met Gly Asp Met Asn Ile Ser Pro Thr Asp Leu Asp Ile Gly
Ile Gly 180 185 190Glu Glu Asn Arg Lys Arg Trp Leu Arg Thr Gly Lys
Cys Ser Phe Leu 195 200 205Pro Glu Glu Arg Glu Trp Met Asp Arg Leu
Met Ser Trp Gly Leu Val 210 215 220Asp Thr Phe Arg His Ala Asn Pro
Gln Thr Ala Asp Arg Phe Ser Trp225 230 235 240Phe Asp Tyr Arg Ser
Lys Gly Phe Asp Asp Asn Arg Gly Leu Arg Ile 245 250 255Asp Leu Leu
Leu Ala Ser Gln Pro Leu Ala Glu Cys Cys Val Glu Thr 260 265 270Gly
Ile Asp Tyr Glu Ile Arg Ser Met Glu Lys Pro Ser Asp His Ala 275 280
285Pro Val Trp Ala Thr Phe Arg Arg Ser Gly Ser Gly Ser Gly Ser Gly
290 295 300Ser Gly Thr Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys
Glu Asn305 310 315 320Gly Met His Lys Lys Val Phe Tyr Ser Phe Ile
Asp Asp Lys Asn His 325 330 335Asn Lys Lys Leu Leu Val Ile Arg Thr
Lys Gly Thr Ile Ala Gly Gln 340 345 350Tyr Arg Val Tyr Ser Glu Glu
Gly Ala Asn Lys Ser Gly Leu Ala Trp 355 360 365Pro Ser Ala Phe Lys
Val Gln Leu Gln Leu Pro Asp Asn Glu Val Ala 370 375 380Gln Ile Ser
Asp Tyr Tyr Pro Arg Asn Ser Ile Asp Thr Lys Glu Tyr385 390 395
400Met Ser Thr Leu Thr Tyr Gly Phe Asn Gly Asn Val Thr Gly Asp Asp
405 410 415Thr Gly Lys Ile Gly Gly Leu Ile Gly Ala Asn Val Ser Ile
Gly His 420 425 430Thr Leu Lys Tyr Val Gln Pro Asp Phe Lys Thr Ile
Leu Glu Ser Pro 435 440 445Thr Asp Lys Lys Val Gly Trp Lys Val Ile
Phe Asn Asn Met Val Asn 450 455 460Gln Asn Trp Gly Pro Tyr Asp Arg
Asp Ser Trp Asn Pro Val Tyr Gly465 470 475 480Asn Gln Leu Phe Met
Lys Thr Arg Asn Gly Ser Met Lys Ala Ala Asp 485 490 495Asn Phe Leu
Asp Pro Asn Lys Ala Ser Ser Leu Leu Ser Ser Gly Phe 500 505 510Ser
Pro Asp Phe Ala Thr Val Ile Thr Met Asp Arg Lys Ala Ser Lys 515 520
525Gln Gln Thr Asn Ile Asp Val Ile Tyr Glu Arg Val Arg Asp Asp Tyr
530 535 540Gln Leu His Trp Thr Ser Thr Asn Trp Lys Gly Thr Asn Thr
Lys Asp545 550 555 560Lys Trp Thr Asp Arg Ser Ser Glu Arg Tyr Lys
Ile Asp Trp Glu Lys 565 570 575Glu Glu Met Thr Asn Gly Gly Ser Gly
Ser Ser Gly Gly Ser Ser His 580 585 590His His His His His
595
19 1794 DNA Artificial sequence HL-RQC-EcoExoIII-L1-H6
19 atg gca gat
tct gat att aat att aaa acc ggt act aca gat att gga 48 Ala Asp Ser
Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly 1 5 10 15agc aat
act tcc gga agc ggc tct ggt agt ggt tct ggc atg aaa ttt 96Ser Asn
Thr Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Met Lys Phe 20 25 30gtt
agc ttc aat atc aac ggc ctg cgc gcg cgc ccg cat cag ctg gaa 144Val
Ser Phe Asn Ile Asn Gly Leu Arg Ala Arg Pro His Gln Leu Glu 35 40
45gcg att gtg gaa aaa cat cag ccg gat gtt att ggt ctg cag gaa acc
192Ala Ile Val Glu Lys His Gln Pro Asp Val Ile Gly Leu Gln Glu Thr
50 55 60aaa gtt cac gat gat atg ttt ccg ctg gaa gaa gtg gcg aaa ctg
ggc 240Lys Val His Asp Asp Met Phe Pro Leu Glu Glu Val Ala Lys Leu
Gly 65 70 75tat aac gtg ttt tat cat ggc cag aaa ggt cat tat ggc gtg
gcc ctg 288Tyr Asn Val Phe Tyr His Gly Gln Lys Gly His Tyr Gly Val
Ala Leu80 85 90 95ctg acc aaa gaa acc ccg atc gcg gtt cgt cgt ggt
ttt ccg ggt gat 336Leu Thr Lys Glu Thr Pro Ile Ala Val Arg Arg Gly
Phe Pro Gly Asp 100 105 110gat gaa gaa gcg cag cgt cgt att att atg
gcg gaa att ccg agc ctg 384Asp Glu Glu Ala Gln Arg Arg Ile Ile Met
Ala Glu Ile Pro Ser Leu 115 120 125ctg ggc aat gtg acc gtt att aac
ggc tat ttt ccg cag ggc gaa agc 432Leu Gly Asn Val Thr Val Ile Asn
Gly Tyr Phe Pro Gln Gly Glu Ser 130 135 140cgt gat cat ccg att aaa
ttt ccg gcc aaa gcg cag ttc tat cag aac 480Arg Asp His Pro Ile Lys
Phe Pro Ala Lys Ala Gln Phe Tyr Gln Asn 145 150 155ctg cag aac tat
ctg gaa acc gaa ctg aaa cgt gat aat ccg gtg ctg 528Leu Gln Asn Tyr
Leu Glu Thr Glu Leu Lys Arg Asp Asn Pro Val Leu160 165 170 175atc
atg ggc gat atg aac att agc ccg acc gat ctg gat att ggc att 576Ile
Met Gly Asp Met Asn Ile Ser Pro Thr Asp Leu Asp Ile Gly Ile 180 185
190ggc gaa gaa aac cgt aaa cgc tgg ctg cgt acc ggt aaa tgc agc ttt
624Gly Glu Glu Asn Arg Lys Arg Trp Leu Arg Thr Gly Lys Cys Ser Phe
195 200 205ctg ccg gaa gaa cgt gaa tgg atg gat cgc ctg atg agc tgg
ggc ctg 672Leu Pro Glu Glu Arg Glu Trp Met Asp Arg Leu Met Ser Trp
Gly Leu 210 215 220gtg gat acc ttt cgt cat gcg aac ccg cag acc gcc
gat cgc ttt agc 720Val Asp Thr Phe Arg His Ala Asn Pro Gln Thr Ala
Asp Arg Phe Ser 225 230 235tgg ttt gat tat cgc agc aaa ggt ttt gat
gat aac cgt ggc ctg cgc 768Trp Phe Asp Tyr Arg Ser Lys Gly Phe Asp
Asp Asn Arg Gly Leu Arg240 245 250 255att gat ctg ctg ctg gcg agc
cag ccg ctg gcg gaa tgc tgc gtt gaa 816Ile Asp Leu Leu Leu Ala Ser
Gln Pro Leu Ala Glu Cys Cys Val Glu 260 265 270acc ggt att gat tat
gaa att cgc agc atg gaa aaa ccg agc gat cac 864Thr Gly Ile Asp Tyr
Glu Ile Arg Ser Met Glu Lys Pro Ser Asp His 275 280 285gcc ccg gtg
tgg gcg acc ttt cgc cgc tct ggc tct ggt tcc ggc agc 912Ala Pro Val
Trp Ala Thr Phe Arg Arg Ser Gly Ser Gly Ser Gly Ser 290 295 300ggt
tcc gga aca gta aaa aca ggt gat tta gtc act tat gat aaa gaa 960Gly
Ser Gly Thr Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys Glu 305 310
315aat ggc atg cac aaa aaa gta ttt tat agt ttt atc gat gat aaa aat
1008Asn Gly Met His Lys Lys Val Phe Tyr Ser Phe Ile Asp Asp Lys
Asn320 325 330 335cac aat aaa aaa ctg cta gtt att aga aca aaa ggt
acc att gct ggt 1056His Asn Lys Lys Leu Leu Val Ile Arg Thr Lys Gly
Thr Ile Ala Gly 340 345 350caa tat aga gtt tat agc gaa gaa ggt gct
aac aaa agt ggt tta gcc 1104Gln Tyr Arg Val Tyr Ser Glu Glu Gly Ala
Asn Lys Ser Gly Leu Ala 355 360 365tgg cct tca gcc ttt aag gta cag
ttg caa cta cct gat aat gaa gta 1152Trp Pro Ser Ala Phe Lys Val Gln
Leu Gln Leu Pro Asp Asn Glu Val 370 375 380gct caa ata tct gat tac
tat cca aga aat tcg att gat aca aaa gag 1200Ala Gln Ile Ser Asp Tyr
Tyr Pro Arg Asn Ser Ile Asp Thr Lys Glu 385 390 395tat agg agt act
tta act tat gga ttc aac ggt aat gtt act ggt gat 1248Tyr Arg Ser Thr
Leu Thr Tyr Gly Phe Asn Gly Asn Val Thr Gly Asp400 405 410 415gat
aca gga aaa att ggc ggc tgt att ggt gca caa gtt tcg att ggt 1296Asp
Thr Gly Lys Ile Gly Gly Cys Ile Gly Ala Gln Val Ser Ile Gly 420 425
430cat aca ctg aaa tat gtt caa cct gat ttc aaa aca att tta gag agc
1344His Thr Leu Lys Tyr Val Gln Pro Asp Phe Lys Thr Ile Leu Glu Ser
435 440 445cca act gat aaa aaa gta ggc tgg aaa gtg ata ttt aac aat
atg gtg 1392Pro Thr Asp Lys Lys Val Gly Trp Lys Val Ile Phe Asn Asn
Met Val 450 455 460aat caa aat tgg gga cca tac gat cga gat tct tgg
aac ccg gta tat 1440Asn Gln Asn Trp Gly Pro Tyr Asp Arg Asp Ser Trp
Asn Pro Val Tyr 465 470 475ggc aat caa ctt ttc atg aaa act aga aat
ggt tct atg aaa gca gca 1488Gly Asn Gln Leu Phe Met Lys Thr Arg Asn
Gly Ser Met Lys Ala Ala480 485 490 495gat aac ttc ctt gat cct aac
aaa gca agt tct cta tta tct tca ggg 1536Asp Asn Phe Leu Asp Pro Asn
Lys Ala Ser Ser Leu Leu Ser Ser Gly 500 505 510ttt tca cca gac ttc
gct aca gtt att act atg gat aga aaa gca tcc 1584Phe Ser Pro Asp Phe
Ala Thr Val Ile Thr Met Asp Arg Lys Ala Ser 515 520 525aaa caa caa
aca aat ata gat gta ata tac gaa cga gtt cgt gat gat 1632Lys Gln Gln
Thr Asn Ile Asp Val Ile Tyr Glu Arg Val Arg Asp Asp 530 535 540tac
caa ttg cat tgg act tca aca aat tgg aaa ggt acc aat act aaa 1680Tyr
Gln Leu His Trp Thr Ser Thr Asn Trp Lys Gly Thr Asn Thr Lys 545 550
555gat aaa tgg aca gat cgt tct tca gaa aga tat aaa atc gat tgg gaa
1728Asp Lys Trp Thr Asp Arg Ser Ser Glu Arg Tyr Lys Ile Asp Trp
Glu560 565 570 575aaa gaa gaa atg aca aat ggt ggt tcg ggc tca tct
ggt ggc tcg agt 1776Lys Glu Glu Met Thr Asn Gly Gly Ser Gly Ser Ser
Gly Gly Ser Ser 580 585 590cac cat cat cat cac cac 1794His His His
His His His 595
20 597 PRT Artificial sequence HL-RQC-EcoExoIII-L1-H6 20
Ala Asp Ser Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly Ser1
5 10 15Asn Thr Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Met Lys Phe
Val 20 25 30Ser Phe Asn Ile Asn Gly Leu Arg Ala Arg Pro His Gln Leu
Glu Ala 35 40 45Ile Val Glu Lys His Gln Pro Asp Val Ile Gly Leu Gln
Glu Thr Lys 50 55 60Val His Asp Asp Met Phe Pro Leu Glu Glu Val Ala
Lys Leu Gly Tyr65 70 75 80Asn Val Phe Tyr His Gly Gln Lys Gly His
Tyr Gly Val Ala Leu Leu 85 90 95Thr Lys Glu Thr Pro Ile Ala Val Arg
Arg Gly Phe Pro Gly Asp Asp 100 105 110Glu Glu Ala Gln Arg Arg Ile
Ile Met Ala Glu Ile Pro Ser Leu Leu 115 120 125Gly Asn Val Thr Val
Ile Asn Gly Tyr Phe Pro Gln Gly Glu Ser Arg 130 135 140Asp His Pro
Ile Lys Phe Pro Ala Lys Ala Gln Phe Tyr Gln Asn Leu145 150 155
160Gln Asn Tyr Leu Glu Thr Glu Leu Lys Arg Asp Asn Pro Val Leu Ile
165 170 175Met Gly Asp Met Asn Ile Ser Pro Thr Asp Leu Asp Ile Gly
Ile Gly 180 185 190Glu Glu Asn Arg Lys Arg Trp Leu Arg Thr Gly Lys
Cys Ser Phe Leu 195 200 205Pro Glu Glu Arg Glu Trp Met Asp Arg Leu
Met Ser Trp Gly Leu Val 210 215 220Asp Thr Phe Arg His Ala Asn Pro
Gln Thr Ala Asp Arg Phe Ser Trp225 230 235 240Phe Asp Tyr Arg Ser
Lys Gly Phe Asp Asp Asn Arg Gly Leu Arg Ile 245 250 255Asp Leu Leu
Leu Ala Ser Gln Pro Leu Ala Glu Cys Cys Val Glu Thr 260 265 270Gly
Ile Asp Tyr Glu Ile Arg Ser Met Glu Lys Pro Ser Asp His Ala 275 280
285Pro Val Trp Ala Thr Phe Arg Arg Ser Gly Ser Gly Ser Gly Ser Gly
290 295 300Ser Gly Thr Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys
Glu Asn305 310 315 320Gly Met His Lys Lys Val Phe Tyr Ser Phe Ile
Asp Asp Lys Asn His 325 330 335Asn Lys Lys Leu Leu Val Ile Arg Thr
Lys Gly Thr Ile Ala Gly Gln 340 345 350Tyr Arg Val Tyr Ser Glu Glu
Gly Ala Asn Lys Ser Gly Leu Ala Trp 355 360 365Pro Ser Ala Phe Lys
Val Gln Leu Gln Leu Pro Asp Asn Glu Val Ala 370 375 380Gln Ile Ser
Asp Tyr Tyr Pro Arg Asn Ser Ile Asp Thr Lys Glu Tyr385 390 395
400Arg Ser Thr Leu Thr Tyr Gly Phe Asn Gly Asn Val Thr Gly Asp Asp
405 410 415Thr Gly Lys Ile Gly Gly Cys Ile Gly Ala Gln Val Ser Ile
Gly His 420 425 430Thr Leu Lys Tyr Val Gln Pro Asp Phe Lys Thr Ile
Leu Glu Ser Pro 435 440 445Thr Asp Lys Lys Val Gly Trp Lys Val Ile
Phe Asn Asn Met Val Asn 450 455 460Gln Asn Trp Gly Pro Tyr Asp Arg
Asp Ser Trp Asn Pro Val Tyr Gly465 470 475 480Asn Gln Leu Phe Met
Lys Thr Arg Asn Gly Ser Met Lys Ala Ala Asp 485 490 495Asn Phe Leu
Asp Pro Asn Lys Ala Ser Ser Leu Leu Ser Ser Gly Phe 500 505 510Ser
Pro Asp Phe Ala Thr Val Ile Thr Met Asp Arg Lys Ala Ser Lys 515 520
525Gln Gln Thr Asn Ile Asp Val Ile Tyr Glu Arg Val Arg Asp Asp Tyr
530 535 540Gln Leu His Trp Thr Ser Thr Asn Trp Lys Gly Thr Asn Thr
Lys Asp545 550 555 560Lys Trp Thr Asp Arg Ser Ser Glu Arg Tyr Lys
Ile Asp Trp Glu Lys 565 570 575Glu Glu Met Thr Asn Gly Gly Ser Gly
Ser Ser Gly Gly Ser Ser His 580 585 590His His His His His
595
21 2415 DNA Artificial sequence HL-RQC-EcoExoI-L1-H6 21
atg gca gat
tct gat att aat att aaa acc ggt act aca gat att gga 48 Ala Asp Ser
Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly 1 5 10 15agc aat
act tcc gga agc ggc tct ggt agt ggt tct ggc atg atg aac 96Ser Asn
Thr Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Met Met Asn 20 25 30gat
ggc aaa cag cag agc acc ttc ctg ttt cat gat tat gaa acc ttc 144Asp
Gly Lys Gln Gln Ser Thr Phe Leu Phe His Asp Tyr Glu Thr Phe 35 40
45ggt acc cat ccg gcc ctg gat cgt ccg gcg cag ttt gcg gcc att cgc
192Gly Thr His Pro Ala Leu Asp Arg Pro Ala Gln Phe Ala Ala Ile Arg
50 55 60acc gat agc gaa ttc aat gtg att ggc gaa ccg gaa gtg ttt tat
tgc 240Thr Asp Ser Glu Phe Asn Val Ile Gly Glu Pro Glu Val Phe Tyr
Cys 65 70 75aaa ccg gcc gat gat tat ctg ccg cag ccg ggt gcg gtg ctg
att acc 288Lys Pro Ala Asp Asp Tyr Leu Pro Gln Pro Gly Ala Val Leu
Ile Thr80 85 90 95ggt att acc ccg cag gaa gcg cgc gcg aaa ggt gaa
aac gaa gcg gcg 336Gly Ile Thr Pro Gln Glu Ala Arg Ala Lys Gly Glu
Asn Glu Ala Ala 100 105 110ttt gcc gcg cgc att cat agc ctg ttt acc
gtg ccg aaa acc tgc att 384Phe Ala Ala Arg Ile His Ser Leu Phe Thr
Val Pro Lys Thr Cys Ile 115 120 125ctg ggc tat aac aat gtg cgc ttc
gat gat gaa gtt acc cgt aat atc 432Leu Gly Tyr Asn Asn Val Arg Phe
Asp Asp Glu Val Thr Arg Asn Ile 130
135 140ttt tat cgt aac ttt tat gat ccg tat gcg tgg agc tgg cag cat
gat 480Phe Tyr Arg Asn Phe Tyr Asp Pro Tyr Ala Trp Ser Trp Gln His
Asp 145 150 155aac agc cgt tgg gat ctg ctg gat gtg atg cgc gcg tgc
tat gcg ctg 528Asn Ser Arg Trp Asp Leu Leu Asp Val Met Arg Ala Cys
Tyr Ala Leu160 165 170 175cgc ccg gaa ggc att aat tgg ccg gaa aac
gat gat ggc ctg ccg agc 576Arg Pro Glu Gly Ile Asn Trp Pro Glu Asn
Asp Asp Gly Leu Pro Ser 180 185 190ttt cgt ctg gaa cat ctg acc aaa
gcc aac ggc att gaa cat agc aat 624Phe Arg Leu Glu His Leu Thr Lys
Ala Asn Gly Ile Glu His Ser Asn 195 200 205gcc cat gat gcg atg gcc
gat gtt tat gcg acc att gcg atg gcg aaa 672Ala His Asp Ala Met Ala
Asp Val Tyr Ala Thr Ile Ala Met Ala Lys 210 215 220ctg gtt aaa acc
cgt cag ccg cgc ctg ttt gat tat ctg ttt acc cac 720Leu Val Lys Thr
Arg Gln Pro Arg Leu Phe Asp Tyr Leu Phe Thr His 225 230 235cgt aac
aaa cac aaa ctg atg gcg ctg att gat gtt ccg cag atg aaa 768Arg Asn
Lys His Lys Leu Met Ala Leu Ile Asp Val Pro Gln Met Lys240 245 250
255ccg ctg gtg cat gtg agc ggc atg ttt ggc gcc tgg cgc ggc aac acc
816Pro Leu Val His Val Ser Gly Met Phe Gly Ala Trp Arg Gly Asn Thr
260 265 270agc tgg gtg gcc ccg ctg gcc tgg cac ccg gaa aat cgt aac
gcc gtg 864Ser Trp Val Ala Pro Leu Ala Trp His Pro Glu Asn Arg Asn
Ala Val 275 280 285att atg gtt gat ctg gcc ggt gat att agc ccg ctg
ctg gaa ctg gat 912Ile Met Val Asp Leu Ala Gly Asp Ile Ser Pro Leu
Leu Glu Leu Asp 290 295 300agc gat acc ctg cgt gaa cgc ctg tat acc
gcc aaa acc gat ctg ggc 960Ser Asp Thr Leu Arg Glu Arg Leu Tyr Thr
Ala Lys Thr Asp Leu Gly 305 310 315gat aat gcc gcc gtg ccg gtg aaa
ctg gtt cac att aac aaa tgc ccg 1008Asp Asn Ala Ala Val Pro Val Lys
Leu Val His Ile Asn Lys Cys Pro320 325 330 335gtg ctg gcc cag gcg
aac acc ctg cgc ccg gaa gat gcg gat cgt ctg 1056Val Leu Ala Gln Ala
Asn Thr Leu Arg Pro Glu Asp Ala Asp Arg Leu 340 345 350ggt att aat
cgc cag cat tgt ctg gat aat ctg aaa atc ctg cgt gaa 1104Gly Ile Asn
Arg Gln His Cys Leu Asp Asn Leu Lys Ile Leu Arg Glu 355 360 365aac
ccg cag gtg cgt gaa aaa gtg gtg gcg atc ttc gcg gaa gcg gaa 1152Asn
Pro Gln Val Arg Glu Lys Val Val Ala Ile Phe Ala Glu Ala Glu 370 375
380ccg ttc acc ccg agc gat aac gtg gat gcg cag ctg tat aac ggc ttc
1200Pro Phe Thr Pro Ser Asp Asn Val Asp Ala Gln Leu Tyr Asn Gly Phe
385 390 395ttt agc gat gcc gat cgc gcg gcg atg aaa atc gtt ctg gaa
acc gaa 1248Phe Ser Asp Ala Asp Arg Ala Ala Met Lys Ile Val Leu Glu
Thr Glu400 405 410 415ccg cgc aat ctg ccg gcg ctg gat att acc ttt
gtt gat aaa cgt att 1296Pro Arg Asn Leu Pro Ala Leu Asp Ile Thr Phe
Val Asp Lys Arg Ile 420 425 430gaa aaa ctg ctg ttt aat tat cgt gcg
cgc aat ttt ccg ggt acc ctg 1344Glu Lys Leu Leu Phe Asn Tyr Arg Ala
Arg Asn Phe Pro Gly Thr Leu 435 440 445gat tat gcc gaa cag cag cgt
tgg ctg gaa cat cgt cgt cag gtt ttc 1392Asp Tyr Ala Glu Gln Gln Arg
Trp Leu Glu His Arg Arg Gln Val Phe 450 455 460acc ccg gaa ttt ctg
cag ggt tat gcg gat gaa ctg cag atg ctg gtt 1440Thr Pro Glu Phe Leu
Gln Gly Tyr Ala Asp Glu Leu Gln Met Leu Val 465 470 475cag cag tat
gcc gat gat aaa gaa aaa gtg gcg ctg ctg aaa gcg ctg 1488Gln Gln Tyr
Ala Asp Asp Lys Glu Lys Val Ala Leu Leu Lys Ala Leu480 485 490
495tgg cag tat gcg gaa gaa atc gtt tct ggc tct ggt tcc ggc agc ggt
1536Trp Gln Tyr Ala Glu Glu Ile Val Ser Gly Ser Gly Ser Gly Ser Gly
500 505 510tcc gga aca gta aaa aca ggt gat tta gtc act tat gat aaa
gaa aat 1584Ser Gly Thr Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys
Glu Asn 515 520 525ggc atg cac aaa aaa gta ttt tat agt ttt atc gat
gat aaa aat cac 1632Gly Met His Lys Lys Val Phe Tyr Ser Phe Ile Asp
Asp Lys Asn His 530 535 540aat aaa aaa ctg cta gtt att aga aca aaa
ggt acc att gct ggt caa 1680Asn Lys Lys Leu Leu Val Ile Arg Thr Lys
Gly Thr Ile Ala Gly Gln 545 550 555tat aga gtt tat agc gaa gaa ggt
gct aac aaa agt ggt tta gcc tgg 1728Tyr Arg Val Tyr Ser Glu Glu Gly
Ala Asn Lys Ser Gly Leu Ala Trp560 565 570 575cct tca gcc ttt aag
gta cag ttg caa cta cct gat aat gaa gta gct 1776Pro Ser Ala Phe Lys
Val Gln Leu Gln Leu Pro Asp Asn Glu Val Ala 580 585 590caa ata tct
gat tac tat cca aga aat tcg att gat aca aaa gag tat 1824Gln Ile Ser
Asp Tyr Tyr Pro Arg Asn Ser Ile Asp Thr Lys Glu Tyr 595 600 605agg
agt act tta act tat gga ttc aac ggt aat gtt act ggt gat gat 1872Arg
Ser Thr Leu Thr Tyr Gly Phe Asn Gly Asn Val Thr Gly Asp Asp 610 615
620aca gga aaa att ggc ggc tgt att ggt gca caa gtt tcg att ggt cat
1920Thr Gly Lys Ile Gly Gly Cys Ile Gly Ala Gln Val Ser Ile Gly His
625 630 635aca ctg aaa tat gtt caa cct gat ttc aaa aca att tta gag
agc cca 1968Thr Leu Lys Tyr Val Gln Pro Asp Phe Lys Thr Ile Leu Glu
Ser Pro640 645 650 655act gat aaa aaa gta ggc tgg aaa gtg ata ttt
aac aat atg gtg aat 2016Thr Asp Lys Lys Val Gly Trp Lys Val Ile Phe
Asn Asn Met Val Asn 660 665 670caa aat tgg gga cca tac gat cga gat
tct tgg aac ccg gta tat ggc 2064Gln Asn Trp Gly Pro Tyr Asp Arg Asp
Ser Trp Asn Pro Val Tyr Gly 675 680 685aat caa ctt ttc atg aaa act
aga aat ggt tct atg aaa gca gca gat 2112Asn Gln Leu Phe Met Lys Thr
Arg Asn Gly Ser Met Lys Ala Ala Asp 690 695 700aac ttc ctt gat cct
aac aaa gca agt tct cta tta tct tca ggg ttt 2160Asn Phe Leu Asp Pro
Asn Lys Ala Ser Ser Leu Leu Ser Ser Gly Phe 705 710 715tca cca gac
ttc gct aca gtt att act atg gat aga aaa gca tcc aaa 2208Ser Pro Asp
Phe Ala Thr Val Ile Thr Met Asp Arg Lys Ala Ser Lys720 725 730
735caa caa aca aat ata gat gta ata tac gaa cga gtt cgt gat gat tac
2256Gln Gln Thr Asn Ile Asp Val Ile Tyr Glu Arg Val Arg Asp Asp Tyr
740 745 750caa ttg cat tgg act tca aca aat tgg aaa ggt acc aat act
aaa gat 2304Gln Leu His Trp Thr Ser Thr Asn Trp Lys Gly Thr Asn Thr
Lys Asp 755 760 765aaa tgg aca gat cgt tct tca gaa aga tat aaa atc
gat tgg gaa aaa 2352Lys Trp Thr Asp Arg Ser Ser Glu Arg Tyr Lys Ile
Asp Trp Glu Lys 770 775 780gaa gaa atg aca aat ggt ggt tcg ggc tca
tct ggt ggc tcg agt cac 2400Glu Glu Met Thr Asn Gly Gly Ser Gly Ser
Ser Gly Gly Ser Ser His 785 790 795cat cat cat cac cac 2415His His
His His His 800
22 804 PRT Artificial sequence HL-RQC-EcoExoI-L1-H6 22
Ala
Asp Ser Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly Ser1 5 10
15Asn Thr Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Met Met Asn Asp
20 25 30Gly Lys Gln Gln Ser Thr Phe Leu Phe His Asp Tyr Glu Thr Phe
Gly 35 40 45Thr His Pro Ala Leu Asp Arg Pro Ala Gln Phe Ala Ala Ile
Arg Thr 50 55 60Asp Ser Glu Phe Asn Val Ile Gly Glu Pro Glu Val Phe
Tyr Cys Lys65 70 75 80Pro Ala Asp Asp Tyr Leu Pro Gln Pro Gly Ala
Val Leu Ile Thr Gly 85 90 95Ile Thr Pro Gln Glu Ala Arg Ala Lys Gly
Glu Asn Glu Ala Ala Phe 100 105 110Ala Ala Arg Ile His Ser Leu Phe
Thr Val Pro Lys Thr Cys Ile Leu 115 120 125Gly Tyr Asn Asn Val Arg
Phe Asp Asp Glu Val Thr Arg Asn Ile Phe 130 135 140Tyr Arg Asn Phe
Tyr Asp Pro Tyr Ala Trp Ser Trp Gln His Asp Asn145 150 155 160Ser
Arg Trp Asp Leu Leu Asp Val Met Arg Ala Cys Tyr Ala Leu Arg 165 170
175Pro Glu Gly Ile Asn Trp Pro Glu Asn Asp Asp Gly Leu Pro Ser Phe
180 185 190Arg Leu Glu His Leu Thr Lys Ala Asn Gly Ile Glu His Ser
Asn Ala 195 200 205His Asp Ala Met Ala Asp Val Tyr Ala Thr Ile Ala
Met Ala Lys Leu 210 215 220Val Lys Thr Arg Gln Pro Arg Leu Phe Asp
Tyr Leu Phe Thr His Arg225 230 235 240Asn Lys His Lys Leu Met Ala
Leu Ile Asp Val Pro Gln Met Lys Pro 245 250 255Leu Val His Val Ser
Gly Met Phe Gly Ala Trp Arg Gly Asn Thr Ser 260 265 270Trp Val Ala
Pro Leu Ala Trp His Pro Glu Asn Arg Asn Ala Val Ile 275 280 285Met
Val Asp Leu Ala Gly Asp Ile Ser Pro Leu Leu Glu Leu Asp Ser 290 295
300Asp Thr Leu Arg Glu Arg Leu Tyr Thr Ala Lys Thr Asp Leu Gly
Asp305 310 315 320Asn Ala Ala Val Pro Val Lys Leu Val His Ile Asn
Lys Cys Pro Val 325 330 335Leu Ala Gln Ala Asn Thr Leu Arg Pro Glu
Asp Ala Asp Arg Leu Gly 340 345 350Ile Asn Arg Gln His Cys Leu Asp
Asn Leu Lys Ile Leu Arg Glu Asn 355 360 365Pro Gln Val Arg Glu Lys
Val Val Ala Ile Phe Ala Glu Ala Glu Pro 370 375 380Phe Thr Pro Ser
Asp Asn Val Asp Ala Gln Leu Tyr Asn Gly Phe Phe385 390 395 400Ser
Asp Ala Asp Arg Ala Ala Met Lys Ile Val Leu Glu Thr Glu Pro 405 410
415Arg Asn Leu Pro Ala Leu Asp Ile Thr Phe Val Asp Lys Arg Ile Glu
420 425 430Lys Leu Leu Phe Asn Tyr Arg Ala Arg Asn Phe Pro Gly Thr
Leu Asp 435 440 445Tyr Ala Glu Gln Gln Arg Trp Leu Glu His Arg Arg
Gln Val Phe Thr 450 455 460Pro Glu Phe Leu Gln Gly Tyr Ala Asp Glu
Leu Gln Met Leu Val Gln465 470 475 480Gln Tyr Ala Asp Asp Lys Glu
Lys Val Ala Leu Leu Lys Ala Leu Trp 485 490 495Gln Tyr Ala Glu Glu
Ile Val Ser Gly Ser Gly Ser Gly Ser Gly Ser 500 505 510Gly Thr Val
Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys Glu Asn Gly 515 520 525Met
His Lys Lys Val Phe Tyr Ser Phe Ile Asp Asp Lys Asn His Asn 530 535
540Lys Lys Leu Leu Val Ile Arg Thr Lys Gly Thr Ile Ala Gly Gln
Tyr545 550 555 560Arg Val Tyr Ser Glu Glu Gly Ala Asn Lys Ser Gly
Leu Ala Trp Pro 565 570 575Ser Ala Phe Lys Val Gln Leu Gln Leu Pro
Asp Asn Glu Val Ala Gln 580 585 590Ile Ser Asp Tyr Tyr Pro Arg Asn
Ser Ile Asp Thr Lys Glu Tyr Arg 595 600 605Ser Thr Leu Thr Tyr Gly
Phe Asn Gly Asn Val Thr Gly Asp Asp Thr 610 615 620Gly Lys Ile Gly
Gly Cys Ile Gly Ala Gln Val Ser Ile Gly His Thr625 630 635 640Leu
Lys Tyr Val Gln Pro Asp Phe Lys Thr Ile Leu Glu Ser Pro Thr 645 650
655Asp Lys Lys Val Gly Trp Lys Val Ile Phe Asn Asn Met Val Asn Gln
660 665 670Asn Trp Gly Pro Tyr Asp Arg Asp Ser Trp Asn Pro Val Tyr
Gly Asn 675 680 685Gln Leu Phe Met Lys Thr Arg Asn Gly Ser Met Lys
Ala Ala Asp Asn 690 695 700Phe Leu Asp Pro Asn Lys Ala Ser Ser Leu
Leu Ser Ser Gly Phe Ser705 710 715 720Pro Asp Phe Ala Thr Val Ile
Thr Met Asp Arg Lys Ala Ser Lys Gln 725 730 735Gln Thr Asn Ile Asp
Val Ile Tyr Glu Arg Val Arg Asp Asp Tyr Gln 740 745 750Leu His Trp
Thr Ser Thr Asn Trp Lys Gly Thr Asn Thr Lys Asp Lys 755 760 765Trp
Thr Asp Arg Ser Ser Glu Arg Tyr Lys Ile Asp Trp Glu Lys Glu 770 775
780Glu Met Thr Asn Gly Gly Ser Gly Ser Ser Gly Gly Ser Ser His
His785 790 795 800His His His His
23 2265 DNA Artificial sequence HL-RQC-TthRecJ-L1-H6 23
atg gca gat tct gat att aat att aaa
acc ggt act aca gat att gga 48 Ala Asp Ser Asp Ile Asn Ile Lys Thr
Gly Thr Thr Asp Ile Gly 1 5 10 15agc aat act tcc gga agc ggc tct
ggt agt ggt tct ggc atg ttt cgt 96Ser Asn Thr Ser Gly Ser Gly Ser
Gly Ser Gly Ser Gly Met Phe Arg 20 25 30cgt aaa gaa gat ctg gat ccg
ccg ctg gca ctg ctg ccg ctg aaa ggc 144Arg Lys Glu Asp Leu Asp Pro
Pro Leu Ala Leu Leu Pro Leu Lys Gly 35 40 45ctg cgc gaa gcc gcc gca
ctg ctg gaa gaa gcg ctg cgt caa ggt aaa 192Leu Arg Glu Ala Ala Ala
Leu Leu Glu Glu Ala Leu Arg Gln Gly Lys 50 55 60cgc att cgt gtt cac
ggc gac tat gat gcg gat ggc ctg acc ggc acc 240Arg Ile Arg Val His
Gly Asp Tyr Asp Ala Asp Gly Leu Thr Gly Thr 65 70 75gcg atc ctg gtt
cgt ggt ctg gcc gcc ctg ggt gcg gat gtt cat ccg 288Ala Ile Leu Val
Arg Gly Leu Ala Ala Leu Gly Ala Asp Val His Pro80 85 90 95ttt atc
ccg cac cgc ctg gaa gaa ggc tat ggt gtc ctg atg gaa cgc 336Phe Ile
Pro His Arg Leu Glu Glu Gly Tyr Gly Val Leu Met Glu Arg 100 105
110gtc ccg gaa cat ctg gaa gcc tcg gac ctg ttt ctg acc gtt gac tgc
384Val Pro Glu His Leu Glu Ala Ser Asp Leu Phe Leu Thr Val Asp Cys
115 120 125ggc att acc aac cat gcg gaa ctg cgc gaa ctg ctg gaa aat
ggc gtg 432Gly Ile Thr Asn His Ala Glu Leu Arg Glu Leu Leu Glu Asn
Gly Val 130 135 140gaa gtc att gtt acc gat cat cat acg ccg ggc aaa
acg ccg ccg ccg 480Glu Val Ile Val Thr Asp His His Thr Pro Gly Lys
Thr Pro Pro Pro 145 150 155ggt ctg gtc gtg cat ccg gcg ctg acg ccg
gat ctg aaa gaa aaa ccg 528Gly Leu Val Val His Pro Ala Leu Thr Pro
Asp Leu Lys Glu Lys Pro160 165 170 175acc ggc gca ggc gtg gcg ttt
ctg ctg ctg tgg gca ctg cat gaa cgc 576Thr Gly Ala Gly Val Ala Phe
Leu Leu Leu Trp Ala Leu His Glu Arg 180 185 190ctg ggc ctg ccg ccg
ccg ctg gaa tac gcg gac ctg gca gcc gtt ggc 624Leu Gly Leu Pro Pro
Pro Leu Glu Tyr Ala Asp Leu Ala Ala Val Gly 195 200 205acc att gcc
gac gtt gcc ccg ctg tgg ggt tgg aat cgt gca ctg gtg 672Thr Ile Ala
Asp Val Ala Pro Leu Trp Gly Trp Asn Arg Ala Leu Val 210 215 220aaa
gaa ggt ctg gca cgc atc ccg gct tca tct tgg gtg ggc ctg cgt 720Lys
Glu Gly Leu Ala Arg Ile Pro Ala Ser Ser Trp Val Gly Leu Arg 225 230
235ctg ctg gct gaa gcc gtg ggc tat acc ggc aaa gcg gtc gaa gtc gct
768Leu Leu Ala Glu Ala Val Gly Tyr Thr Gly Lys Ala Val Glu Val
Ala240 245 250 255ttc cgc atc gcg ccg cgc atc aat gcg gct tcc cgc
ctg ggc gaa gcg 816Phe Arg Ile Ala Pro Arg Ile Asn Ala Ala Ser Arg
Leu Gly Glu Ala 260 265 270gaa aaa gcc ctg cgc ctg ctg ctg acg gat
gat gcg gca gaa gct cag 864Glu Lys Ala Leu Arg Leu Leu Leu Thr Asp
Asp Ala Ala Glu Ala Gln 275 280 285gcg ctg gtc ggc gaa ctg cac cgt
ctg aac gcc cgt cgt cag acc ctg 912Ala Leu Val Gly Glu Leu His Arg
Leu Asn Ala Arg Arg Gln Thr Leu 290 295 300gaa gaa gcg atg ctg cgc
aaa ctg ctg ccg cag gcc gac ccg gaa gcg 960Glu Glu Ala Met Leu Arg
Lys Leu Leu Pro Gln Ala Asp Pro Glu Ala 305 310 315aaa gcc atc gtt
ctg ctg gac ccg gaa ggc cat ccg ggt gtt atg ggt 1008Lys Ala Ile Val
Leu Leu Asp Pro Glu Gly His Pro Gly Val Met Gly320 325 330 335att
gtg gcc tct cgc atc ctg gaa gcg acc ctg cgc ccg gtc ttt ctg 1056Ile
Val Ala Ser Arg Ile Leu Glu Ala Thr Leu Arg Pro Val Phe Leu 340 345
350gtg gcc cag ggc aaa ggc acc gtg cgt tcg ctg gct ccg att tcc gcc
1104Val Ala Gln Gly Lys Gly Thr Val Arg Ser Leu Ala Pro Ile Ser
Ala
355 360 365gtc gaa gca ctg cgc agc gcg gaa gat ctg ctg ctg cgt tat
ggt ggt 1152Val Glu Ala Leu Arg Ser Ala Glu Asp Leu Leu Leu Arg Tyr
Gly Gly 370 375 380cat aaa gaa gcg gcg ggt ttc gca atg gat gaa gcg
ctg ttt ccg gcg 1200His Lys Glu Ala Ala Gly Phe Ala Met Asp Glu Ala
Leu Phe Pro Ala 385 390 395ttc aaa gca cgc gtt gaa gcg tat gcc gca
cgt ttc ccg gat ccg gtt 1248Phe Lys Ala Arg Val Glu Ala Tyr Ala Ala
Arg Phe Pro Asp Pro Val400 405 410 415cgt gaa gtg gca ctg ctg gat
ctg ctg ccg gaa ccg ggc ctg ctg ccg 1296Arg Glu Val Ala Leu Leu Asp
Leu Leu Pro Glu Pro Gly Leu Leu Pro 420 425 430cag gtg ttc cgt gaa
ctg gca ctg ctg gaa ccg tat ggt gaa ggt aac 1344Gln Val Phe Arg Glu
Leu Ala Leu Leu Glu Pro Tyr Gly Glu Gly Asn 435 440 445ccg gaa ccg
ctg ttc ctg tct ggc tct ggt tcc ggc agc ggt tcc gga 1392Pro Glu Pro
Leu Phe Leu Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly 450 455 460aca
gta aaa aca ggt gat tta gtc act tat gat aaa gaa aat ggc atg 1440Thr
Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys Glu Asn Gly Met 465 470
475cac aaa aaa gta ttt tat agt ttt atc gat gat aaa aat cac aat aaa
1488His Lys Lys Val Phe Tyr Ser Phe Ile Asp Asp Lys Asn His Asn
Lys480 485 490 495aaa ctg cta gtt att aga aca aaa ggt acc att gct
ggt caa tat aga 1536Lys Leu Leu Val Ile Arg Thr Lys Gly Thr Ile Ala
Gly Gln Tyr Arg 500 505 510gtt tat agc gaa gaa ggt gct aac aaa agt
ggt tta gcc tgg cct tca 1584Val Tyr Ser Glu Glu Gly Ala Asn Lys Ser
Gly Leu Ala Trp Pro Ser 515 520 525gcc ttt aag gta cag ttg caa cta
cct gat aat gaa gta gct caa ata 1632Ala Phe Lys Val Gln Leu Gln Leu
Pro Asp Asn Glu Val Ala Gln Ile 530 535 540tct gat tac tat cca aga
aat tcg att gat aca aaa gag tat agg agt 1680Ser Asp Tyr Tyr Pro Arg
Asn Ser Ile Asp Thr Lys Glu Tyr Arg Ser 545 550 555act tta act tat
gga ttc aac ggt aat gtt act ggt gat gat aca gga 1728Thr Leu Thr Tyr
Gly Phe Asn Gly Asn Val Thr Gly Asp Asp Thr Gly560 565 570 575aaa
att ggc ggc tgt att ggt gca caa gtt tcg att ggt cat aca ctg 1776Lys
Ile Gly Gly Cys Ile Gly Ala Gln Val Ser Ile Gly His Thr Leu 580 585
590aaa tat gtt caa cct gat ttc aaa aca att tta gag agc cca act gat
1824Lys Tyr Val Gln Pro Asp Phe Lys Thr Ile Leu Glu Ser Pro Thr Asp
595 600 605aaa aaa gta ggc tgg aaa gtg ata ttt aac aat atg gtg aat
caa aat 1872Lys Lys Val Gly Trp Lys Val Ile Phe Asn Asn Met Val Asn
Gln Asn 610 615 620tgg gga cca tac gat cga gat tct tgg aac ccg gta
tat ggc aat caa 1920Trp Gly Pro Tyr Asp Arg Asp Ser Trp Asn Pro Val
Tyr Gly Asn Gln 625 630 635ctt ttc atg aaa act aga aat ggt tct atg
aaa gca gca gat aac ttc 1968Leu Phe Met Lys Thr Arg Asn Gly Ser Met
Lys Ala Ala Asp Asn Phe640 645 650 655ctt gat cct aac aaa gca agt
tct cta tta tct tca ggg ttt tca cca 2016Leu Asp Pro Asn Lys Ala Ser
Ser Leu Leu Ser Ser Gly Phe Ser Pro 660 665 670gac ttc gct aca gtt
att act atg gat aga aaa gca tcc aaa caa caa 2064Asp Phe Ala Thr Val
Ile Thr Met Asp Arg Lys Ala Ser Lys Gln Gln 675 680 685aca aat ata
gat gta ata tac gaa cga gtt cgt gat gat tac caa ttg 2112Thr Asn Ile
Asp Val Ile Tyr Glu Arg Val Arg Asp Asp Tyr Gln Leu 690 695 700cat
tgg act tca aca aat tgg aaa ggt acc aat act aaa gat aaa tgg 2160His
Trp Thr Ser Thr Asn Trp Lys Gly Thr Asn Thr Lys Asp Lys Trp 705 710
715aca gat cgt tct tca gaa aga tat aaa atc gat tgg gaa aaa gaa gaa
2208Thr Asp Arg Ser Ser Glu Arg Tyr Lys Ile Asp Trp Glu Lys Glu
Glu720 725 730 735atg aca aat ggt ggt tcg ggc tca tct ggt ggc tcg
agt cac cat cat 2256Met Thr Asn Gly Gly Ser Gly Ser Ser Gly Gly Ser
Ser His His His 740 745 750cat cac cac 2265His His
His
24 754 PRT Artificial sequence HL-RQC-TthRecJ-L1-H6 24
Ala Asp Ser
Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly Ser1 5 10 15Asn Thr
Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Met Phe Arg Arg 20 25 30Lys
Glu Asp Leu Asp Pro Pro Leu Ala Leu Leu Pro Leu Lys Gly Leu 35 40
45Arg Glu Ala Ala Ala Leu Leu Glu Glu Ala Leu Arg Gln Gly Lys Arg
50 55 60Ile Arg Val His Gly Asp Tyr Asp Ala Asp Gly Leu Thr Gly Thr
Ala65 70 75 80Ile Leu Val Arg Gly Leu Ala Ala Leu Gly Ala Asp Val
His Pro Phe 85 90 95Ile Pro His Arg Leu Glu Glu Gly Tyr Gly Val Leu
Met Glu Arg Val 100 105 110Pro Glu His Leu Glu Ala Ser Asp Leu Phe
Leu Thr Val Asp Cys Gly 115 120 125Ile Thr Asn His Ala Glu Leu Arg
Glu Leu Leu Glu Asn Gly Val Glu 130 135 140Val Ile Val Thr Asp His
His Thr Pro Gly Lys Thr Pro Pro Pro Gly145 150 155 160Leu Val Val
His Pro Ala Leu Thr Pro Asp Leu Lys Glu Lys Pro Thr 165 170 175Gly
Ala Gly Val Ala Phe Leu Leu Leu Trp Ala Leu His Glu Arg Leu 180 185
190Gly Leu Pro Pro Pro Leu Glu Tyr Ala Asp Leu Ala Ala Val Gly Thr
195 200 205Ile Ala Asp Val Ala Pro Leu Trp Gly Trp Asn Arg Ala Leu
Val Lys 210 215 220Glu Gly Leu Ala Arg Ile Pro Ala Ser Ser Trp Val
Gly Leu Arg Leu225 230 235 240Leu Ala Glu Ala Val Gly Tyr Thr Gly
Lys Ala Val Glu Val Ala Phe 245 250 255Arg Ile Ala Pro Arg Ile Asn
Ala Ala Ser Arg Leu Gly Glu Ala Glu 260 265 270Lys Ala Leu Arg Leu
Leu Leu Thr Asp Asp Ala Ala Glu Ala Gln Ala 275 280 285Leu Val Gly
Glu Leu His Arg Leu Asn Ala Arg Arg Gln Thr Leu Glu 290 295 300Glu
Ala Met Leu Arg Lys Leu Leu Pro Gln Ala Asp Pro Glu Ala Lys305 310
315 320Ala Ile Val Leu Leu Asp Pro Glu Gly His Pro Gly Val Met Gly
Ile 325 330 335Val Ala Ser Arg Ile Leu Glu Ala Thr Leu Arg Pro Val
Phe Leu Val 340 345 350Ala Gln Gly Lys Gly Thr Val Arg Ser Leu Ala
Pro Ile Ser Ala Val 355 360 365Glu Ala Leu Arg Ser Ala Glu Asp Leu
Leu Leu Arg Tyr Gly Gly His 370 375 380Lys Glu Ala Ala Gly Phe Ala
Met Asp Glu Ala Leu Phe Pro Ala Phe385 390 395 400Lys Ala Arg Val
Glu Ala Tyr Ala Ala Arg Phe Pro Asp Pro Val Arg 405 410 415Glu Val
Ala Leu Leu Asp Leu Leu Pro Glu Pro Gly Leu Leu Pro Gln 420 425
430Val Phe Arg Glu Leu Ala Leu Leu Glu Pro Tyr Gly Glu Gly Asn Pro
435 440 445Glu Pro Leu Phe Leu Ser Gly Ser Gly Ser Gly Ser Gly Ser
Gly Thr 450 455 460Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys Glu
Asn Gly Met His465 470 475 480Lys Lys Val Phe Tyr Ser Phe Ile Asp
Asp Lys Asn His Asn Lys Lys 485 490 495Leu Leu Val Ile Arg Thr Lys
Gly Thr Ile Ala Gly Gln Tyr Arg Val 500 505 510Tyr Ser Glu Glu Gly
Ala Asn Lys Ser Gly Leu Ala Trp Pro Ser Ala 515 520 525Phe Lys Val
Gln Leu Gln Leu Pro Asp Asn Glu Val Ala Gln Ile Ser 530 535 540Asp
Tyr Tyr Pro Arg Asn Ser Ile Asp Thr Lys Glu Tyr Arg Ser Thr545 550
555 560Leu Thr Tyr Gly Phe Asn Gly Asn Val Thr Gly Asp Asp Thr Gly
Lys 565 570 575Ile Gly Gly Cys Ile Gly Ala Gln Val Ser Ile Gly His
Thr Leu Lys 580 585 590Tyr Val Gln Pro Asp Phe Lys Thr Ile Leu Glu
Ser Pro Thr Asp Lys 595 600 605Lys Val Gly Trp Lys Val Ile Phe Asn
Asn Met Val Asn Gln Asn Trp 610 615 620Gly Pro Tyr Asp Arg Asp Ser
Trp Asn Pro Val Tyr Gly Asn Gln Leu625 630 635 640Phe Met Lys Thr
Arg Asn Gly Ser Met Lys Ala Ala Asp Asn Phe Leu 645 650 655Asp Pro
Asn Lys Ala Ser Ser Leu Leu Ser Ser Gly Phe Ser Pro Asp 660 665
670Phe Ala Thr Val Ile Thr Met Asp Arg Lys Ala Ser Lys Gln Gln Thr
675 680 685Asn Ile Asp Val Ile Tyr Glu Arg Val Arg Asp Asp Tyr Gln
Leu His 690 695 700Trp Thr Ser Thr Asn Trp Lys Gly Thr Asn Thr Lys
Asp Lys Trp Thr705 710 715 720Asp Arg Ser Ser Glu Arg Tyr Lys Ile
Asp Trp Glu Lys Glu Glu Met 725 730 735Thr Asn Gly Gly Ser Gly Ser
Ser Gly Gly Ser Ser His His His His 740 745 750His
His
25 1785 DNA Artificial sequence HL-RQC-EcoExoIII-L2-D45-N47delta-H6 25
atg gca gat tct gat att aat att aaa acc ggt act aca gat att gga
48 Ala Asp Ser Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly 1 5
10 15agc aat act aca gta aaa aca ggt gat tta gtc act tat gat aaa
gaa 96Ser Asn Thr Thr Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys
Glu 20 25 30aat ggc atg cac aaa aaa gta ttt tat agt ttt atc gat tcc
gga agc 144Asn Gly Met His Lys Lys Val Phe Tyr Ser Phe Ile Asp Ser
Gly Ser 35 40 45ggc tct ggt agt ggt tct ggc atg aaa ttt gtt agc ttc
aat atc aac 192Gly Ser Gly Ser Gly Ser Gly Met Lys Phe Val Ser Phe
Asn Ile Asn 50 55 60ggc ctg cgc gcg cgc ccg cat cag ctg gaa gcg att
gtg gaa aaa cat 240Gly Leu Arg Ala Arg Pro His Gln Leu Glu Ala Ile
Val Glu Lys His 65 70 75cag ccg gat gtt att ggt ctg cag gaa acc aaa
gtt cac gat gat atg 288Gln Pro Asp Val Ile Gly Leu Gln Glu Thr Lys
Val His Asp Asp Met80 85 90 95ttt ccg ctg gaa gaa gtg gcg aaa ctg
ggc tat aac gtg ttt tat cat 336Phe Pro Leu Glu Glu Val Ala Lys Leu
Gly Tyr Asn Val Phe Tyr His 100 105 110ggc cag aaa ggt cat tat ggc
gtg gcc ctg ctg acc aaa gaa acc ccg 384Gly Gln Lys Gly His Tyr Gly
Val Ala Leu Leu Thr Lys Glu Thr Pro 115 120 125atc gcg gtt cgt cgt
ggt ttt ccg ggt gat gat gaa gaa gcg cag cgt 432Ile Ala Val Arg Arg
Gly Phe Pro Gly Asp Asp Glu Glu Ala Gln Arg 130 135 140cgt att att
atg gcg gaa att ccg agc ctg ctg ggc aat gtg acc gtt 480Arg Ile Ile
Met Ala Glu Ile Pro Ser Leu Leu Gly Asn Val Thr Val 145 150 155att
aac ggc tat ttt ccg cag ggc gaa agc cgt gat cat ccg att aaa 528Ile
Asn Gly Tyr Phe Pro Gln Gly Glu Ser Arg Asp His Pro Ile Lys160 165
170 175ttt ccg gcc aaa gcg cag ttc tat cag aac ctg cag aac tat ctg
gaa 576Phe Pro Ala Lys Ala Gln Phe Tyr Gln Asn Leu Gln Asn Tyr Leu
Glu 180 185 190acc gaa ctg aaa cgt gat aat ccg gtg ctg atc atg ggc
gat atg aac 624Thr Glu Leu Lys Arg Asp Asn Pro Val Leu Ile Met Gly
Asp Met Asn 195 200 205att agc ccg acc gat ctg gat att ggc att ggc
gaa gaa aac cgt aaa 672Ile Ser Pro Thr Asp Leu Asp Ile Gly Ile Gly
Glu Glu Asn Arg Lys 210 215 220cgc tgg ctg cgt acc ggt aaa tgc agc
ttt ctg ccg gaa gaa cgt gaa 720Arg Trp Leu Arg Thr Gly Lys Cys Ser
Phe Leu Pro Glu Glu Arg Glu 225 230 235tgg atg gat cgc ctg atg agc
tgg ggc ctg gtg gat acc ttt cgt cat 768Trp Met Asp Arg Leu Met Ser
Trp Gly Leu Val Asp Thr Phe Arg His240 245 250 255gcg aac ccg cag
acc gcc gat cgc ttt agc tgg ttt gat tat cgc agc 816Ala Asn Pro Gln
Thr Ala Asp Arg Phe Ser Trp Phe Asp Tyr Arg Ser 260 265 270aaa ggt
ttt gat gat aac cgt ggc ctg cgc att gat ctg ctg ctg gcg 864Lys Gly
Phe Asp Asp Asn Arg Gly Leu Arg Ile Asp Leu Leu Leu Ala 275 280
285agc cag ccg ctg gcg gaa tgc tgc gtt gaa acc ggt att gat tat gaa
912Ser Gln Pro Leu Ala Glu Cys Cys Val Glu Thr Gly Ile Asp Tyr Glu
290 295 300att cgc agc atg gaa aaa ccg agc gat cac gcc ccg gtg tgg
gcg acc 960Ile Arg Ser Met Glu Lys Pro Ser Asp His Ala Pro Val Trp
Ala Thr 305 310 315ttt cgc cgc tct ggc tct ggt tcc ggc agc ggt tcc
gga cac aat aaa 1008Phe Arg Arg Ser Gly Ser Gly Ser Gly Ser Gly Ser
Gly His Asn Lys320 325 330 335aaa ctg cta gtt att aga aca aaa ggt
acc att gct ggt caa tat aga 1056Lys Leu Leu Val Ile Arg Thr Lys Gly
Thr Ile Ala Gly Gln Tyr Arg 340 345 350gtt tat agc gaa gaa ggt gct
aac aaa agt ggt tta gcc tgg cct tca 1104Val Tyr Ser Glu Glu Gly Ala
Asn Lys Ser Gly Leu Ala Trp Pro Ser 355 360 365gcc ttt aag gta cag
ttg caa cta cct gat aat gaa gta gct caa ata 1152Ala Phe Lys Val Gln
Leu Gln Leu Pro Asp Asn Glu Val Ala Gln Ile 370 375 380tct gat tac
tat cca aga aat tcg att gat aca aaa gag tat agg agt 1200Ser Asp Tyr
Tyr Pro Arg Asn Ser Ile Asp Thr Lys Glu Tyr Arg Ser 385 390 395act
tta act tat gga ttc aac ggt aat gtt act ggt gat gat aca gga 1248Thr
Leu Thr Tyr Gly Phe Asn Gly Asn Val Thr Gly Asp Asp Thr Gly400 405
410 415aaa att ggc ggc tgt att ggt gca caa gtt tcg att ggt cat aca
ctg 1296Lys Ile Gly Gly Cys Ile Gly Ala Gln Val Ser Ile Gly His Thr
Leu 420 425 430aaa tat gtt caa cct gat ttc aaa aca att tta gag agc
cca act gat 1344Lys Tyr Val Gln Pro Asp Phe Lys Thr Ile Leu Glu Ser
Pro Thr Asp 435 440 445aaa aaa gta ggc tgg aaa gtg ata ttt aac aat
atg gtg aat caa aat 1392Lys Lys Val Gly Trp Lys Val Ile Phe Asn Asn
Met Val Asn Gln Asn 450 455 460tgg gga cca tac gat cga gat tct tgg
aac ccg gta tat ggc aat caa 1440Trp Gly Pro Tyr Asp Arg Asp Ser Trp
Asn Pro Val Tyr Gly Asn Gln 465 470 475ctt ttc atg aaa act aga aat
ggt tct atg aaa gca gca gat aac ttc 1488Leu Phe Met Lys Thr Arg Asn
Gly Ser Met Lys Ala Ala Asp Asn Phe480 485 490 495ctt gat cct aac
aaa gca agt tct cta tta tct tca ggg ttt tca cca 1536Leu Asp Pro Asn
Lys Ala Ser Ser Leu Leu Ser Ser Gly Phe Ser Pro 500 505 510gac ttc
gct aca gtt att act atg gat aga aaa gca tcc aaa caa caa 1584Asp Phe
Ala Thr Val Ile Thr Met Asp Arg Lys Ala Ser Lys Gln Gln 515 520
525aca aat ata gat gta ata tac gaa cga gtt cgt gat gat tac caa ttg
1632Thr Asn Ile Asp Val Ile Tyr Glu Arg Val Arg Asp Asp Tyr Gln Leu
530 535 540cat tgg act tca aca aat tgg aaa ggt acc aat act aaa gat
aaa tgg 1680His Trp Thr Ser Thr Asn Trp Lys Gly Thr Asn Thr Lys Asp
Lys Trp 545 550 555aca gat cgt tct tca gaa aga tat aaa atc gat tgg
gaa aaa gaa gaa 1728Thr Asp Arg Ser Ser Glu Arg Tyr Lys Ile Asp Trp
Glu Lys Glu Glu560 565 570 575atg aca aat ggt ggt tcg ggc tca tct
ggt ggc tcg agt cac cat cat 1776Met Thr Asn Gly Gly Ser Gly Ser Ser
Gly Gly Ser Ser His His His 580 585 590
cat cac cac 1785
His His His

<210> 26
<211> 594
<212> PRT
<213> Artificial sequence
<223> HL-RQC-EcoExoIII-L2-D45-N47delta-H6
<400> 26
Ala Asp Ser Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly Ser 1
5 10 15Asn Thr Thr Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys Glu
Asn 20 25 30Gly Met His Lys Lys Val Phe Tyr Ser Phe Ile Asp Ser Gly
Ser Gly 35 40 45Ser Gly Ser Gly Ser Gly Met Lys Phe Val Ser Phe Asn
Ile Asn Gly 50 55 60Leu Arg Ala Arg Pro His Gln Leu Glu Ala Ile Val
Glu Lys His Gln65 70 75 80Pro Asp Val Ile Gly Leu Gln Glu Thr Lys
Val His Asp Asp Met Phe 85
90 95Pro Leu Glu Glu Val Ala Lys Leu Gly Tyr Asn Val Phe Tyr His
Gly 100 105 110Gln Lys Gly His Tyr Gly Val Ala Leu Leu Thr Lys Glu
Thr Pro Ile 115 120 125Ala Val Arg Arg Gly Phe Pro Gly Asp Asp Glu
Glu Ala Gln Arg Arg 130 135 140Ile Ile Met Ala Glu Ile Pro Ser Leu
Leu Gly Asn Val Thr Val Ile145 150 155 160Asn Gly Tyr Phe Pro Gln
Gly Glu Ser Arg Asp His Pro Ile Lys Phe 165 170 175Pro Ala Lys Ala
Gln Phe Tyr Gln Asn Leu Gln Asn Tyr Leu Glu Thr 180 185 190Glu Leu
Lys Arg Asp Asn Pro Val Leu Ile Met Gly Asp Met Asn Ile 195 200
205Ser Pro Thr Asp Leu Asp Ile Gly Ile Gly Glu Glu Asn Arg Lys Arg
210 215 220Trp Leu Arg Thr Gly Lys Cys Ser Phe Leu Pro Glu Glu Arg
Glu Trp225 230 235 240Met Asp Arg Leu Met Ser Trp Gly Leu Val Asp
Thr Phe Arg His Ala 245 250 255Asn Pro Gln Thr Ala Asp Arg Phe Ser
Trp Phe Asp Tyr Arg Ser Lys 260 265 270Gly Phe Asp Asp Asn Arg Gly
Leu Arg Ile Asp Leu Leu Leu Ala Ser 275 280 285Gln Pro Leu Ala Glu
Cys Cys Val Glu Thr Gly Ile Asp Tyr Glu Ile 290 295 300Arg Ser Met
Glu Lys Pro Ser Asp His Ala Pro Val Trp Ala Thr Phe305 310 315
320Arg Arg Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly His Asn Lys Lys
325 330 335Leu Leu Val Ile Arg Thr Lys Gly Thr Ile Ala Gly Gln Tyr
Arg Val 340 345 350Tyr Ser Glu Glu Gly Ala Asn Lys Ser Gly Leu Ala
Trp Pro Ser Ala 355 360 365Phe Lys Val Gln Leu Gln Leu Pro Asp Asn
Glu Val Ala Gln Ile Ser 370 375 380Asp Tyr Tyr Pro Arg Asn Ser Ile
Asp Thr Lys Glu Tyr Arg Ser Thr385 390 395 400Leu Thr Tyr Gly Phe
Asn Gly Asn Val Thr Gly Asp Asp Thr Gly Lys 405 410 415Ile Gly Gly
Cys Ile Gly Ala Gln Val Ser Ile Gly His Thr Leu Lys 420 425 430Tyr
Val Gln Pro Asp Phe Lys Thr Ile Leu Glu Ser Pro Thr Asp Lys 435 440
445Lys Val Gly Trp Lys Val Ile Phe Asn Asn Met Val Asn Gln Asn Trp
450 455 460Gly Pro Tyr Asp Arg Asp Ser Trp Asn Pro Val Tyr Gly Asn
Gln Leu465 470 475 480Phe Met Lys Thr Arg Asn Gly Ser Met Lys Ala
Ala Asp Asn Phe Leu 485 490 495Asp Pro Asn Lys Ala Ser Ser Leu Leu
Ser Ser Gly Phe Ser Pro Asp 500 505 510Phe Ala Thr Val Ile Thr Met
Asp Arg Lys Ala Ser Lys Gln Gln Thr 515 520 525Asn Ile Asp Val Ile
Tyr Glu Arg Val Arg Asp Asp Tyr Gln Leu His 530 535 540Trp Thr Ser
Thr Asn Trp Lys Gly Thr Asn Thr Lys Asp Lys Trp Thr545 550 555
560Asp Arg Ser Ser Glu Arg Tyr Lys Ile Asp Trp Glu Lys Glu Glu Met
565 570 575Thr Asn Gly Gly Ser Gly Ser Ser Gly Gly Ser Ser His His
His His 580 585 590
His His

<210> 27
<211> 2364
<212> DNA
<213> Artificial sequence
<223> HL-RQC-EcoExoI-Cter-{SG}8-H6
<400> 27
atg gca gat tct gat att aat
att aaa acc ggt act aca gat att gga 48 Ala Asp Ser Asp Ile Asn Ile
Lys Thr Gly Thr Thr Asp Ile Gly 1 5 10 15agc aat act aca gta aaa
aca ggt gat tta gtc act tat gat aaa gaa 96Ser Asn Thr Thr Val Lys
Thr Gly Asp Leu Val Thr Tyr Asp Lys Glu 20 25 30aat ggc atg cac aaa
aaa gta ttt tat agt ttt atc gat gat aaa aat 144Asn Gly Met His Lys
Lys Val Phe Tyr Ser Phe Ile Asp Asp Lys Asn 35 40 45cac aat aaa aaa
ctg cta gtt att aga aca aaa ggt acc att gct ggt 192His Asn Lys Lys
Leu Leu Val Ile Arg Thr Lys Gly Thr Ile Ala Gly 50 55 60caa tat aga
gtt tat agc gaa gaa ggt gct aac aaa agt ggt tta gcc 240Gln Tyr Arg
Val Tyr Ser Glu Glu Gly Ala Asn Lys Ser Gly Leu Ala 65 70 75tgg cct
tca gcc ttt aag gta cag ttg caa cta cct gat aat gaa gta 288Trp Pro
Ser Ala Phe Lys Val Gln Leu Gln Leu Pro Asp Asn Glu Val80 85 90
95gct caa ata tct gat tac tat cca aga aat tcg att gat aca aaa gag
336Ala Gln Ile Ser Asp Tyr Tyr Pro Arg Asn Ser Ile Asp Thr Lys Glu
100 105 110tat agg agt act tta act tat gga ttc aac ggt aat gtt act
ggt gat 384Tyr Arg Ser Thr Leu Thr Tyr Gly Phe Asn Gly Asn Val Thr
Gly Asp 115 120 125gat aca gga aaa att ggc ggc tgt att ggt gca caa
gtt tcg att ggt 432Asp Thr Gly Lys Ile Gly Gly Cys Ile Gly Ala Gln
Val Ser Ile Gly 130 135 140cat aca ctg aaa tat gtt caa cct gat ttc
aaa aca att tta gag agc 480His Thr Leu Lys Tyr Val Gln Pro Asp Phe
Lys Thr Ile Leu Glu Ser 145 150 155cca act gat aaa aaa gta ggc tgg
aaa gtg ata ttt aac aat atg gtg 528Pro Thr Asp Lys Lys Val Gly Trp
Lys Val Ile Phe Asn Asn Met Val160 165 170 175aat caa aat tgg gga
cca tac gat cga gat tct tgg aac ccg gta tat 576Asn Gln Asn Trp Gly
Pro Tyr Asp Arg Asp Ser Trp Asn Pro Val Tyr 180 185 190ggc aat caa
ctt ttc atg aaa act aga aat ggt tct atg aaa gca gca 624Gly Asn Gln
Leu Phe Met Lys Thr Arg Asn Gly Ser Met Lys Ala Ala 195 200 205gat
aac ttc ctt gat cct aac aaa gca agt tct cta tta tct tca ggg 672Asp
Asn Phe Leu Asp Pro Asn Lys Ala Ser Ser Leu Leu Ser Ser Gly 210 215
220ttt tca cca gac ttc gct aca gtt att act atg gat aga aaa gca tcc
720Phe Ser Pro Asp Phe Ala Thr Val Ile Thr Met Asp Arg Lys Ala Ser
225 230 235aaa caa caa aca aat ata gat gta ata tac gaa cga gtt cgt
gat gat 768Lys Gln Gln Thr Asn Ile Asp Val Ile Tyr Glu Arg Val Arg
Asp Asp240 245 250 255tac caa ttg cat tgg act tca aca aat tgg aaa
ggt acc aat act aaa 816Tyr Gln Leu His Trp Thr Ser Thr Asn Trp Lys
Gly Thr Asn Thr Lys 260 265 270gat aaa tgg aca gat cgt tct tca gaa
aga tat aaa atc gat tgg gaa 864Asp Lys Trp Thr Asp Arg Ser Ser Glu
Arg Tyr Lys Ile Asp Trp Glu 275 280 285aaa gaa gaa atg aca aat tcc
ggt agc ggc tct ggt tct ggc tct ggt 912Lys Glu Glu Met Thr Asn Ser
Gly Ser Gly Ser Gly Ser Gly Ser Gly 290 295 300tcc ggc agc ggt tcc
gga cag agc acc ttc ctg ttt cat gat tat gaa 960Ser Gly Ser Gly Ser
Gly Gln Ser Thr Phe Leu Phe His Asp Tyr Glu 305 310 315acc ttc ggt
acc cat ccg gcc ctg gat cgt ccg gcg cag ttt gcg gcc 1008Thr Phe Gly
Thr His Pro Ala Leu Asp Arg Pro Ala Gln Phe Ala Ala320 325 330
335att cgc acc gat agc gaa ttc aat gtg att ggc gaa ccg gaa gtg ttt
1056Ile Arg Thr Asp Ser Glu Phe Asn Val Ile Gly Glu Pro Glu Val Phe
340 345 350tat tgc aaa ccg gcc gat gat tat ctg ccg cag ccg ggt gcg
gtg ctg 1104Tyr Cys Lys Pro Ala Asp Asp Tyr Leu Pro Gln Pro Gly Ala
Val Leu 355 360 365att acc ggt att acc ccg cag gaa gcg cgc gcg aaa
ggt gaa aac gaa 1152Ile Thr Gly Ile Thr Pro Gln Glu Ala Arg Ala Lys
Gly Glu Asn Glu 370 375 380gcg gcg ttt gcc gcg cgc att cat agc ctg
ttt acc gtg ccg aaa acc 1200Ala Ala Phe Ala Ala Arg Ile His Ser Leu
Phe Thr Val Pro Lys Thr 385 390 395tgc att ctg ggc tat aac aat gtg
cgc ttc gat gat gaa gtt acc cgt 1248Cys Ile Leu Gly Tyr Asn Asn Val
Arg Phe Asp Asp Glu Val Thr Arg400 405 410 415aat atc ttt tat cgt
aac ttt tat gat ccg tat gcg tgg agc tgg cag 1296Asn Ile Phe Tyr Arg
Asn Phe Tyr Asp Pro Tyr Ala Trp Ser Trp Gln 420 425 430cat gat aac
agc cgt tgg gat ctg ctg gat gtg atg cgc gcg tgc tat 1344His Asp Asn
Ser Arg Trp Asp Leu Leu Asp Val Met Arg Ala Cys Tyr 435 440 445gcg
ctg cgc ccg gaa ggc att aat tgg ccg gaa aac gat gat ggc ctg 1392Ala
Leu Arg Pro Glu Gly Ile Asn Trp Pro Glu Asn Asp Asp Gly Leu 450 455
460ccg agc ttt cgt ctg gaa cat ctg acc aaa gcc aac ggc att gaa cat
1440Pro Ser Phe Arg Leu Glu His Leu Thr Lys Ala Asn Gly Ile Glu His
465 470 475agc aat gcc cat gat gcg atg gcc gat gtt tat gcg acc att
gcg atg 1488Ser Asn Ala His Asp Ala Met Ala Asp Val Tyr Ala Thr Ile
Ala Met480 485 490 495gcg aaa ctg gtt aaa acc cgt cag ccg cgc ctg
ttt gat tat ctg ttt 1536Ala Lys Leu Val Lys Thr Arg Gln Pro Arg Leu
Phe Asp Tyr Leu Phe 500 505 510acc cac cgt aac aaa cac aaa ctg atg
gcg ctg att gat gtt ccg cag 1584Thr His Arg Asn Lys His Lys Leu Met
Ala Leu Ile Asp Val Pro Gln 515 520 525atg aaa ccg ctg gtg cat gtg
agc ggc atg ttt ggc gcc tgg cgc ggc 1632Met Lys Pro Leu Val His Val
Ser Gly Met Phe Gly Ala Trp Arg Gly 530 535 540aac acc agc tgg gtg
gcc ccg ctg gcc tgg cac ccg gaa aat cgt aac 1680Asn Thr Ser Trp Val
Ala Pro Leu Ala Trp His Pro Glu Asn Arg Asn 545 550 555gcc gtg att
atg gtt gat ctg gcc ggt gat att agc ccg ctg ctg gaa 1728Ala Val Ile
Met Val Asp Leu Ala Gly Asp Ile Ser Pro Leu Leu Glu560 565 570
575ctg gat agc gat acc ctg cgt gaa cgc ctg tat acc gcc aaa acc gat
1776Leu Asp Ser Asp Thr Leu Arg Glu Arg Leu Tyr Thr Ala Lys Thr Asp
580 585 590ctg ggc gat aat gcc gcc gtg ccg gtg aaa ctg gtt cac att
aac aaa 1824Leu Gly Asp Asn Ala Ala Val Pro Val Lys Leu Val His Ile
Asn Lys 595 600 605tgc ccg gtg ctg gcc cag gcg aac acc ctg cgc ccg
gaa gat gcg gat 1872Cys Pro Val Leu Ala Gln Ala Asn Thr Leu Arg Pro
Glu Asp Ala Asp 610 615 620cgt ctg ggt att aat cgc cag cat tgt ctg
gat aat ctg aaa atc ctg 1920Arg Leu Gly Ile Asn Arg Gln His Cys Leu
Asp Asn Leu Lys Ile Leu 625 630 635cgt gaa aac ccg cag gtg cgt gaa
aaa gtg gtg gcg atc ttc gcg gaa 1968Arg Glu Asn Pro Gln Val Arg Glu
Lys Val Val Ala Ile Phe Ala Glu640 645 650 655gcg gaa ccg ttc acc
ccg agc gat aac gtg gat gcg cag ctg tat aac 2016Ala Glu Pro Phe Thr
Pro Ser Asp Asn Val Asp Ala Gln Leu Tyr Asn 660 665 670ggc ttc ttt
agc gat gcc gat cgc gcg gcg atg aaa atc gtt ctg gaa 2064Gly Phe Phe
Ser Asp Ala Asp Arg Ala Ala Met Lys Ile Val Leu Glu 675 680 685acc
gaa ccg cgc aat ctg ccg gcg ctg gat att acc ttt gtt gat aaa 2112Thr
Glu Pro Arg Asn Leu Pro Ala Leu Asp Ile Thr Phe Val Asp Lys 690 695
700cgt att gaa aaa ctg ctg ttt aat tat cgt gcg cgc aat ttt ccg ggt
2160Arg Ile Glu Lys Leu Leu Phe Asn Tyr Arg Ala Arg Asn Phe Pro Gly
705 710 715acc ctg gat tat gcc gaa cag cag cgt tgg ctg gaa cat cgt
cgt cag 2208Thr Leu Asp Tyr Ala Glu Gln Gln Arg Trp Leu Glu His Arg
Arg Gln720 725 730 735gtt ttc acc ccg gaa ttt ctg cag ggt tat gcg
gat gaa ctg cag atg 2256Val Phe Thr Pro Glu Phe Leu Gln Gly Tyr Ala
Asp Glu Leu Gln Met 740 745 750ctg gtt cag cag tat gcc gat gat aaa
gaa aaa gtg gcg ctg ctg aaa 2304Leu Val Gln Gln Tyr Ala Asp Asp Lys
Glu Lys Val Ala Leu Leu Lys 755 760 765gcg ctg tgg cag tat gcg gaa
gaa atc gtt tct ggc tct ggt cac cat 2352Ala Leu Trp Gln Tyr Ala Glu
Glu Ile Val Ser Gly Ser Gly His His 770 775 780
cat cat cac cac 2364
His His His His 785

<210> 28
<211> 787
<212> PRT
<213> Artificial sequence
<223> HL-RQC-EcoExoI-Cter-{SG}8-H6
<400> 28
Ala Asp Ser Asp Ile Asn Ile
Lys Thr Gly Thr Thr Asp Ile Gly Ser1 5 10 15Asn Thr Thr Val Lys Thr
Gly Asp Leu Val Thr Tyr Asp Lys Glu Asn 20 25 30Gly Met His Lys Lys
Val Phe Tyr Ser Phe Ile Asp Asp Lys Asn His 35 40 45Asn Lys Lys Leu
Leu Val Ile Arg Thr Lys Gly Thr Ile Ala Gly Gln 50 55 60Tyr Arg Val
Tyr Ser Glu Glu Gly Ala Asn Lys Ser Gly Leu Ala Trp65 70 75 80Pro
Ser Ala Phe Lys Val Gln Leu Gln Leu Pro Asp Asn Glu Val Ala 85 90
95Gln Ile Ser Asp Tyr Tyr Pro Arg Asn Ser Ile Asp Thr Lys Glu Tyr
100 105 110Arg Ser Thr Leu Thr Tyr Gly Phe Asn Gly Asn Val Thr Gly
Asp Asp 115 120 125Thr Gly Lys Ile Gly Gly Cys Ile Gly Ala Gln Val
Ser Ile Gly His 130 135 140Thr Leu Lys Tyr Val Gln Pro Asp Phe Lys
Thr Ile Leu Glu Ser Pro145 150 155 160Thr Asp Lys Lys Val Gly Trp
Lys Val Ile Phe Asn Asn Met Val Asn 165 170 175Gln Asn Trp Gly Pro
Tyr Asp Arg Asp Ser Trp Asn Pro Val Tyr Gly 180 185 190Asn Gln Leu
Phe Met Lys Thr Arg Asn Gly Ser Met Lys Ala Ala Asp 195 200 205Asn
Phe Leu Asp Pro Asn Lys Ala Ser Ser Leu Leu Ser Ser Gly Phe 210 215
220Ser Pro Asp Phe Ala Thr Val Ile Thr Met Asp Arg Lys Ala Ser
Lys225 230 235 240Gln Gln Thr Asn Ile Asp Val Ile Tyr Glu Arg Val
Arg Asp Asp Tyr 245 250 255Gln Leu His Trp Thr Ser Thr Asn Trp Lys
Gly Thr Asn Thr Lys Asp 260 265 270Lys Trp Thr Asp Arg Ser Ser Glu
Arg Tyr Lys Ile Asp Trp Glu Lys 275 280 285Glu Glu Met Thr Asn Ser
Gly Ser Gly Ser Gly Ser Gly Ser Gly Ser 290 295 300Gly Ser Gly Ser
Gly Gln Ser Thr Phe Leu Phe His Asp Tyr Glu Thr305 310 315 320Phe
Gly Thr His Pro Ala Leu Asp Arg Pro Ala Gln Phe Ala Ala Ile 325 330
335Arg Thr Asp Ser Glu Phe Asn Val Ile Gly Glu Pro Glu Val Phe Tyr
340 345 350Cys Lys Pro Ala Asp Asp Tyr Leu Pro Gln Pro Gly Ala Val
Leu Ile 355 360 365Thr Gly Ile Thr Pro Gln Glu Ala Arg Ala Lys Gly
Glu Asn Glu Ala 370 375 380Ala Phe Ala Ala Arg Ile His Ser Leu Phe
Thr Val Pro Lys Thr Cys385 390 395 400Ile Leu Gly Tyr Asn Asn Val
Arg Phe Asp Asp Glu Val Thr Arg Asn 405 410 415Ile Phe Tyr Arg Asn
Phe Tyr Asp Pro Tyr Ala Trp Ser Trp Gln His 420 425 430Asp Asn Ser
Arg Trp Asp Leu Leu Asp Val Met Arg Ala Cys Tyr Ala 435 440 445Leu
Arg Pro Glu Gly Ile Asn Trp Pro Glu Asn Asp Asp Gly Leu Pro 450 455
460Ser Phe Arg Leu Glu His Leu Thr Lys Ala Asn Gly Ile Glu His
Ser465 470 475 480Asn Ala His Asp Ala Met Ala Asp Val Tyr Ala Thr
Ile Ala Met Ala 485 490 495Lys Leu Val Lys Thr Arg Gln Pro Arg Leu
Phe Asp Tyr Leu Phe Thr 500 505 510His Arg Asn Lys His Lys Leu Met
Ala Leu Ile Asp Val Pro Gln Met 515 520 525Lys Pro Leu Val His Val
Ser Gly Met Phe Gly Ala Trp Arg Gly Asn 530 535 540Thr Ser Trp Val
Ala Pro Leu Ala Trp His Pro Glu Asn Arg Asn Ala545 550 555 560Val
Ile Met Val Asp Leu Ala Gly Asp Ile Ser Pro Leu Leu Glu Leu 565 570
575Asp Ser Asp Thr Leu Arg Glu Arg Leu Tyr Thr Ala Lys Thr Asp Leu
580 585 590Gly Asp Asn Ala Ala Val Pro Val Lys Leu Val His Ile Asn
Lys Cys 595 600 605Pro Val Leu Ala Gln Ala Asn Thr Leu Arg Pro Glu
Asp Ala Asp Arg 610 615 620Leu Gly Ile Asn Arg Gln His Cys Leu Asp
Asn Leu Lys Ile Leu Arg625 630 635 640Glu Asn Pro Gln Val Arg Glu
Lys Val Val Ala Ile Phe Ala Glu Ala 645 650 655Glu Pro Phe Thr Pro
Ser Asp Asn Val Asp Ala Gln Leu Tyr Asn Gly 660 665 670Phe Phe Ser
Asp Ala Asp Arg Ala Ala Met Lys Ile Val Leu Glu Thr 675 680
685Glu Pro Arg Asn Leu Pro Ala Leu Asp Ile Thr Phe Val Asp Lys Arg
690 695 700Ile Glu Lys Leu Leu Phe Asn Tyr Arg Ala Arg Asn Phe Pro
Gly Thr705 710 715 720Leu Asp Tyr Ala Glu Gln Gln Arg Trp Leu Glu
His Arg Arg Gln Val 725 730 735Phe Thr Pro Glu Phe Leu Gln Gly Tyr
Ala Asp Glu Leu Gln Met Leu 740 745 750Val Gln Gln Tyr Ala Asp Asp
Lys Glu Lys Val Ala Leu Leu Lys Ala 755 760 765Leu Trp Gln Tyr Ala
Glu Glu Ile Val Ser Gly Ser Gly His His His 770 775 780
His His His 785

<210> 29
<211> 2370
<212> DNA
<213> Artificial sequence
<223> HL-RQC-EcoExoI-Cter-DG{SG}8-H6
<400> 29
atg gca gat tct gat att aat att aaa acc ggt act aca gat att gga
48 Ala Asp Ser Asp Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly 1 5
10 15agc aat act aca gta aaa aca ggt gat tta gtc act tat gat aaa
gaa 96Ser Asn Thr Thr Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys
Glu 20 25 30aat ggc atg cac aaa aaa gta ttt tat agt ttt atc gat gat
aaa aat 144Asn Gly Met His Lys Lys Val Phe Tyr Ser Phe Ile Asp Asp
Lys Asn 35 40 45cac aat aaa aaa ctg cta gtt att aga aca aaa ggt acc
att gct ggt 192His Asn Lys Lys Leu Leu Val Ile Arg Thr Lys Gly Thr
Ile Ala Gly 50 55 60caa tat aga gtt tat agc gaa gaa ggt gct aac aaa
agt ggt tta gcc 240Gln Tyr Arg Val Tyr Ser Glu Glu Gly Ala Asn Lys
Ser Gly Leu Ala 65 70 75tgg cct tca gcc ttt aag gta cag ttg caa cta
cct gat aat gaa gta 288Trp Pro Ser Ala Phe Lys Val Gln Leu Gln Leu
Pro Asp Asn Glu Val80 85 90 95gct caa ata tct gat tac tat cca aga
aat tcg att gat aca aaa gag 336Ala Gln Ile Ser Asp Tyr Tyr Pro Arg
Asn Ser Ile Asp Thr Lys Glu 100 105 110tat agg agt act tta act tat
gga ttc aac ggt aat gtt act ggt gat 384Tyr Arg Ser Thr Leu Thr Tyr
Gly Phe Asn Gly Asn Val Thr Gly Asp 115 120 125gat aca gga aaa att
ggc ggc tgt att ggt gca caa gtt tcg att ggt 432Asp Thr Gly Lys Ile
Gly Gly Cys Ile Gly Ala Gln Val Ser Ile Gly 130 135 140cat aca ctg
aaa tat gtt caa cct gat ttc aaa aca att tta gag agc 480His Thr Leu
Lys Tyr Val Gln Pro Asp Phe Lys Thr Ile Leu Glu Ser 145 150 155cca
act gat aaa aaa gta ggc tgg aaa gtg ata ttt aac aat atg gtg 528Pro
Thr Asp Lys Lys Val Gly Trp Lys Val Ile Phe Asn Asn Met Val160 165
170 175aat caa aat tgg gga cca tac gat cga gat tct tgg aac ccg gta
tat 576Asn Gln Asn Trp Gly Pro Tyr Asp Arg Asp Ser Trp Asn Pro Val
Tyr 180 185 190ggc aat caa ctt ttc atg aaa act aga aat ggt tct atg
aaa gca gca 624Gly Asn Gln Leu Phe Met Lys Thr Arg Asn Gly Ser Met
Lys Ala Ala 195 200 205gat aac ttc ctt gat cct aac aaa gca agt tct
cta tta tct tca ggg 672Asp Asn Phe Leu Asp Pro Asn Lys Ala Ser Ser
Leu Leu Ser Ser Gly 210 215 220ttt tca cca gac ttc gct aca gtt att
act atg gat aga aaa gca tcc 720Phe Ser Pro Asp Phe Ala Thr Val Ile
Thr Met Asp Arg Lys Ala Ser 225 230 235aaa caa caa aca aat ata gat
gta ata tac gaa cga gtt cgt gat gat 768Lys Gln Gln Thr Asn Ile Asp
Val Ile Tyr Glu Arg Val Arg Asp Asp240 245 250 255tac caa ttg cat
tgg act tca aca aat tgg aaa ggt acc aat act aaa 816Tyr Gln Leu His
Trp Thr Ser Thr Asn Trp Lys Gly Thr Asn Thr Lys 260 265 270gat aaa
tgg aca gat cgt tct tca gaa aga tat aaa atc gat tgg gaa 864Asp Lys
Trp Thr Asp Arg Ser Ser Glu Arg Tyr Lys Ile Asp Trp Glu 275 280
285aaa gaa gaa atg aca aat gat ggc tcc ggt agc ggc tct ggt tct ggc
912Lys Glu Glu Met Thr Asn Asp Gly Ser Gly Ser Gly Ser Gly Ser Gly
290 295 300tct ggt tcc ggc agc ggt tcc gga cag agc acc ttc ctg ttt
cat gat 960Ser Gly Ser Gly Ser Gly Ser Gly Gln Ser Thr Phe Leu Phe
His Asp 305 310 315tat gaa acc ttc ggt acc cat ccg gcc ctg gat cgt
ccg gcg cag ttt 1008Tyr Glu Thr Phe Gly Thr His Pro Ala Leu Asp Arg
Pro Ala Gln Phe320 325 330 335gcg gcc att cgc acc gat agc gaa ttc
aat gtg att ggc gaa ccg gaa 1056Ala Ala Ile Arg Thr Asp Ser Glu Phe
Asn Val Ile Gly Glu Pro Glu 340 345 350gtg ttt tat tgc aaa ccg gcc
gat gat tat ctg ccg cag ccg ggt gcg 1104Val Phe Tyr Cys Lys Pro Ala
Asp Asp Tyr Leu Pro Gln Pro Gly Ala 355 360 365gtg ctg att acc ggt
att acc ccg cag gaa gcg cgc gcg aaa ggt gaa 1152Val Leu Ile Thr Gly
Ile Thr Pro Gln Glu Ala Arg Ala Lys Gly Glu 370 375 380aac gaa gcg
gcg ttt gcc gcg cgc att cat agc ctg ttt acc gtg ccg 1200Asn Glu Ala
Ala Phe Ala Ala Arg Ile His Ser Leu Phe Thr Val Pro 385 390 395aaa
acc tgc att ctg ggc tat aac aat gtg cgc ttc gat gat gaa gtt 1248Lys
Thr Cys Ile Leu Gly Tyr Asn Asn Val Arg Phe Asp Asp Glu Val400 405
410 415acc cgt aat atc ttt tat cgt aac ttt tat gat ccg tat gcg tgg
agc 1296Thr Arg Asn Ile Phe Tyr Arg Asn Phe Tyr Asp Pro Tyr Ala Trp
Ser 420 425 430tgg cag cat gat aac agc cgt tgg gat ctg ctg gat gtg
atg cgc gcg 1344Trp Gln His Asp Asn Ser Arg Trp Asp Leu Leu Asp Val
Met Arg Ala 435 440 445tgc tat gcg ctg cgc ccg gaa ggc att aat tgg
ccg gaa aac gat gat 1392Cys Tyr Ala Leu Arg Pro Glu Gly Ile Asn Trp
Pro Glu Asn Asp Asp 450 455 460ggc ctg ccg agc ttt cgt ctg gaa cat
ctg acc aaa gcc aac ggc att 1440Gly Leu Pro Ser Phe Arg Leu Glu His
Leu Thr Lys Ala Asn Gly Ile 465 470 475gaa cat agc aat gcc cat gat
gcg atg gcc gat gtt tat gcg acc att 1488Glu His Ser Asn Ala His Asp
Ala Met Ala Asp Val Tyr Ala Thr Ile480 485 490 495gcg atg gcg aaa
ctg gtt aaa acc cgt cag ccg cgc ctg ttt gat tat 1536Ala Met Ala Lys
Leu Val Lys Thr Arg Gln Pro Arg Leu Phe Asp Tyr 500 505 510ctg ttt
acc cac cgt aac aaa cac aaa ctg atg gcg ctg att gat gtt 1584Leu Phe
Thr His Arg Asn Lys His Lys Leu Met Ala Leu Ile Asp Val 515 520
525ccg cag atg aaa ccg ctg gtg cat gtg agc ggc atg ttt ggc gcc tgg
1632Pro Gln Met Lys Pro Leu Val His Val Ser Gly Met Phe Gly Ala Trp
530 535 540cgc ggc aac acc agc tgg gtg gcc ccg ctg gcc tgg cac ccg
gaa aat 1680Arg Gly Asn Thr Ser Trp Val Ala Pro Leu Ala Trp His Pro
Glu Asn 545 550 555cgt aac gcc gtg att atg gtt gat ctg gcc ggt gat
att agc ccg ctg 1728Arg Asn Ala Val Ile Met Val Asp Leu Ala Gly Asp
Ile Ser Pro Leu560 565 570 575ctg gaa ctg gat agc gat acc ctg cgt
gaa cgc ctg tat acc gcc aaa 1776Leu Glu Leu Asp Ser Asp Thr Leu Arg
Glu Arg Leu Tyr Thr Ala Lys 580 585 590acc gat ctg ggc gat aat gcc
gcc gtg ccg gtg aaa ctg gtt cac att 1824Thr Asp Leu Gly Asp Asn Ala
Ala Val Pro Val Lys Leu Val His Ile 595 600 605aac aaa tgc ccg gtg
ctg gcc cag gcg aac acc ctg cgc ccg gaa gat 1872Asn Lys Cys Pro Val
Leu Ala Gln Ala Asn Thr Leu Arg Pro Glu Asp 610 615 620gcg gat cgt
ctg ggt att aat cgc cag cat tgt ctg gat aat ctg aaa 1920Ala Asp Arg
Leu Gly Ile Asn Arg Gln His Cys Leu Asp Asn Leu Lys 625 630 635atc
ctg cgt gaa aac ccg cag gtg cgt gaa aaa gtg gtg gcg atc ttc 1968Ile
Leu Arg Glu Asn Pro Gln Val Arg Glu Lys Val Val Ala Ile Phe640 645
650 655gcg gaa gcg gaa ccg ttc acc ccg agc gat aac gtg gat gcg cag
ctg 2016Ala Glu Ala Glu Pro Phe Thr Pro Ser Asp Asn Val Asp Ala Gln
Leu 660 665 670tat aac ggc ttc ttt agc gat gcc gat cgc gcg gcg atg
aaa atc gtt 2064Tyr Asn Gly Phe Phe Ser Asp Ala Asp Arg Ala Ala Met
Lys Ile Val 675 680 685ctg gaa acc gaa ccg cgc aat ctg ccg gcg ctg
gat att acc ttt gtt 2112Leu Glu Thr Glu Pro Arg Asn Leu Pro Ala Leu
Asp Ile Thr Phe Val 690 695 700gat aaa cgt att gaa aaa ctg ctg ttt
aat tat cgt gcg cgc aat ttt 2160Asp Lys Arg Ile Glu Lys Leu Leu Phe
Asn Tyr Arg Ala Arg Asn Phe 705 710 715ccg ggt acc ctg gat tat gcc
gaa cag cag cgt tgg ctg gaa cat cgt 2208Pro Gly Thr Leu Asp Tyr Ala
Glu Gln Gln Arg Trp Leu Glu His Arg720 725 730 735cgt cag gtt ttc
acc ccg gaa ttt ctg cag ggt tat gcg gat gaa ctg 2256Arg Gln Val Phe
Thr Pro Glu Phe Leu Gln Gly Tyr Ala Asp Glu Leu 740 745 750cag atg
ctg gtt cag cag tat gcc gat gat aaa gaa aaa gtg gcg ctg 2304Gln Met
Leu Val Gln Gln Tyr Ala Asp Asp Lys Glu Lys Val Ala Leu 755 760
765ctg aaa gcg ctg tgg cag tat gcg gaa gaa atc gtt tct ggc tct ggt
2352Leu Lys Ala Leu Trp Gln Tyr Ala Glu Glu Ile Val Ser Gly Ser Gly
770 775 780
cac cat cat cat cac cac 2370
His His His His His His 785

<210> 30
<211> 789
<212> PRT
<213> Artificial sequence
<223> Synthetic Construct
<400> 30
Ala Asp Ser Asp
Ile Asn Ile Lys Thr Gly Thr Thr Asp Ile Gly Ser1 5 10 15Asn Thr Thr
Val Lys Thr Gly Asp Leu Val Thr Tyr Asp Lys Glu Asn 20 25 30Gly Met
His Lys Lys Val Phe Tyr Ser Phe Ile Asp Asp Lys Asn His 35 40 45Asn
Lys Lys Leu Leu Val Ile Arg Thr Lys Gly Thr Ile Ala Gly Gln 50 55
60Tyr Arg Val Tyr Ser Glu Glu Gly Ala Asn Lys Ser Gly Leu Ala Trp65
70 75 80Pro Ser Ala Phe Lys Val Gln Leu Gln Leu Pro Asp Asn Glu Val
Ala 85 90 95Gln Ile Ser Asp Tyr Tyr Pro Arg Asn Ser Ile Asp Thr Lys
Glu Tyr 100 105 110Arg Ser Thr Leu Thr Tyr Gly Phe Asn Gly Asn Val
Thr Gly Asp Asp 115 120 125Thr Gly Lys Ile Gly Gly Cys Ile Gly Ala
Gln Val Ser Ile Gly His 130 135 140Thr Leu Lys Tyr Val Gln Pro Asp
Phe Lys Thr Ile Leu Glu Ser Pro145 150 155 160Thr Asp Lys Lys Val
Gly Trp Lys Val Ile Phe Asn Asn Met Val Asn 165 170 175Gln Asn Trp
Gly Pro Tyr Asp Arg Asp Ser Trp Asn Pro Val Tyr Gly 180 185 190Asn
Gln Leu Phe Met Lys Thr Arg Asn Gly Ser Met Lys Ala Ala Asp 195 200
205Asn Phe Leu Asp Pro Asn Lys Ala Ser Ser Leu Leu Ser Ser Gly Phe
210 215 220Ser Pro Asp Phe Ala Thr Val Ile Thr Met Asp Arg Lys Ala
Ser Lys225 230 235 240Gln Gln Thr Asn Ile Asp Val Ile Tyr Glu Arg
Val Arg Asp Asp Tyr 245 250 255Gln Leu His Trp Thr Ser Thr Asn Trp
Lys Gly Thr Asn Thr Lys Asp 260 265 270Lys Trp Thr Asp Arg Ser Ser
Glu Arg Tyr Lys Ile Asp Trp Glu Lys 275 280 285Glu Glu Met Thr Asn
Asp Gly Ser Gly Ser Gly Ser Gly Ser Gly Ser 290 295 300Gly Ser Gly
Ser Gly Ser Gly Gln Ser Thr Phe Leu Phe His Asp Tyr305 310 315
320Glu Thr Phe Gly Thr His Pro Ala Leu Asp Arg Pro Ala Gln Phe Ala
325 330 335Ala Ile Arg Thr Asp Ser Glu Phe Asn Val Ile Gly Glu Pro
Glu Val 340 345 350Phe Tyr Cys Lys Pro Ala Asp Asp Tyr Leu Pro Gln
Pro Gly Ala Val 355 360 365Leu Ile Thr Gly Ile Thr Pro Gln Glu Ala
Arg Ala Lys Gly Glu Asn 370 375 380Glu Ala Ala Phe Ala Ala Arg Ile
His Ser Leu Phe Thr Val Pro Lys385 390 395 400Thr Cys Ile Leu Gly
Tyr Asn Asn Val Arg Phe Asp Asp Glu Val Thr 405 410 415Arg Asn Ile
Phe Tyr Arg Asn Phe Tyr Asp Pro Tyr Ala Trp Ser Trp 420 425 430Gln
His Asp Asn Ser Arg Trp Asp Leu Leu Asp Val Met Arg Ala Cys 435 440
445Tyr Ala Leu Arg Pro Glu Gly Ile Asn Trp Pro Glu Asn Asp Asp Gly
450 455 460Leu Pro Ser Phe Arg Leu Glu His Leu Thr Lys Ala Asn Gly
Ile Glu465 470 475 480His Ser Asn Ala His Asp Ala Met Ala Asp Val
Tyr Ala Thr Ile Ala 485 490 495Met Ala Lys Leu Val Lys Thr Arg Gln
Pro Arg Leu Phe Asp Tyr Leu 500 505 510Phe Thr His Arg Asn Lys His
Lys Leu Met Ala Leu Ile Asp Val Pro 515 520 525Gln Met Lys Pro Leu
Val His Val Ser Gly Met Phe Gly Ala Trp Arg 530 535 540Gly Asn Thr
Ser Trp Val Ala Pro Leu Ala Trp His Pro Glu Asn Arg545 550 555
560Asn Ala Val Ile Met Val Asp Leu Ala Gly Asp Ile Ser Pro Leu Leu
565 570 575Glu Leu Asp Ser Asp Thr Leu Arg Glu Arg Leu Tyr Thr Ala
Lys Thr 580 585 590Asp Leu Gly Asp Asn Ala Ala Val Pro Val Lys Leu
Val His Ile Asn 595 600 605Lys Cys Pro Val Leu Ala Gln Ala Asn Thr
Leu Arg Pro Glu Asp Ala 610 615 620Asp Arg Leu Gly Ile Asn Arg Gln
His Cys Leu Asp Asn Leu Lys Ile625 630 635 640Leu Arg Glu Asn Pro
Gln Val Arg Glu Lys Val Val Ala Ile Phe Ala 645 650 655Glu Ala Glu
Pro Phe Thr Pro Ser Asp Asn Val Asp Ala Gln Leu Tyr 660 665 670Asn
Gly Phe Phe Ser Asp Ala Asp Arg Ala Ala Met Lys Ile Val Leu 675 680
685Glu Thr Glu Pro Arg Asn Leu Pro Ala Leu Asp Ile Thr Phe Val Asp
690 695 700Lys Arg Ile Glu Lys Leu Leu Phe Asn Tyr Arg Ala Arg Asn
Phe Pro705 710 715 720Gly Thr Leu Asp Tyr Ala Glu Gln Gln Arg Trp
Leu Glu His Arg Arg 725 730 735Gln Val Phe Thr Pro Glu Phe Leu Gln
Gly Tyr Ala Asp Glu Leu Gln 740 745 750Met Leu Val Gln Gln Tyr Ala
Asp Asp Lys Glu Lys Val Ala Leu Leu 755 760 765Lys Ala Leu Trp Gln
Tyr Ala Glu Glu Ile Val Ser Gly Ser Gly His 770 775 780His His His
His His 785

<210> 31
<211> 50
<212> DNA
<213> Artificial sequence
<223> oligonucleotide for exonuclease assay
<400> 31
gcaacagagc tgatggatca aatgcattag gtaaacatgt tacgtcgtaa 50

<210> 32
<211> 55
<212> DNA
<213> Artificial sequence
<223> oligonucleotide for exonuclease assay
<400> 32
cgatcttacg acgtaacatg tttacctaat gcatttgatc catcagctct gttgc 55
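
The two oligonucleotides listed for the exonuclease assay (SEQ ID NO: 31, 50 nt, and SEQ ID NO: 32, 55 nt) appear to be complementary: SEQ ID NO: 32 reads as the reverse complement of SEQ ID NO: 31 preceded by five extra nucleotides at its 5' end, which would be consistent with the pair annealing into a duplex with a short 5' overhang. The listing itself does not state this relationship, so the following is only a minimal sketch for checking it. The sequences are copied from the listing, while the function names `reverse_complement` and `five_prime_overhang` are illustrative and not part of the application.

```python
# Sequences copied from SEQ ID NO: 31 and SEQ ID NO: 32, spaces removed.
SEQ_31 = "gcaacagagctgatggatcaaatgcattaggtaaacatgttacgtcgtaa"
SEQ_32 = "cgatcttacgacgtaacatgtttacctaatgcatttgatccatcagctctgttgc"

# Translation table for lower-case DNA bases (a<->t, c<->g).
_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lower-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def five_prime_overhang(longer: str, shorter: str):
    """Return the 5' extension of `longer` beyond the region that pairs
    with `shorter`, or None if `longer` does not end with the reverse
    complement of `shorter`. (Hypothetical helper for illustration.)"""
    rc = reverse_complement(shorter)
    if longer.endswith(rc):
        return longer[: len(longer) - len(rc)]
    return None


if __name__ == "__main__":
    print(len(SEQ_31), len(SEQ_32))             # prints: 50 55
    print(five_prime_overhang(SEQ_32, SEQ_31))  # prints: cgatc
```

The check only compares the strings as printed in the listing; it says nothing about annealing conditions or how the duplex is used in the assay.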
* * * * *
References