U.S. patent application number 16/624341, for novel protein pores, was published by the patent office on 2022-01-27.
This patent application is currently assigned to Oxford Nanopore Technologies Limited. The applicant listed for this patent is Oxford Nanopore Technologies Limited, VIB VZW, Vrije Universiteit Brussel. Invention is credited to Richard George Hambley, Lakmal Nishantha Jayasinghe, Michael Robert Jordan, John Joseph Kilgour, Han Remaut, Pratik Raj Singh, Sander Egbert Van Der Verren, Nani Van Gerven, Elizabeth Jayne Wallace.
Application Number | 16/624341 |
Family ID | 1000006075710 |
Publication Date | 2022-01-27 |
United States Patent Application | 20220024985 |
Kind Code | A9 |
Remaut; Han; et al. |
January 27, 2022 |
NOVEL PROTEIN PORES
Abstract
The present invention relates to novel protein pores and their
uses in analyte detection and characterisation. The invention
particularly relates to an isolated pore complex formed by a
CsgG-like pore and a modified CsgF peptide, or a homologue or
mutant thereof, thereby incorporating an additional channel
constriction or reader head in the nanopore. The invention further
relates to a transmembrane pore complex and methods for production
of the pore complex and for use in molecular sensing and nucleic
acid sequencing applications.
Inventors:
Remaut; Han; (Gent, BE)
Van Der Verren; Sander Egbert; (Serskamp, BE)
Van Gerven; Nani; (Huizingen, BE)
Jayasinghe; Lakmal Nishantha; (Oxford, GB)
Wallace; Elizabeth Jayne; (Oxford, GB)
Singh; Pratik Raj; (Oxford, GB)
Hambley; Richard George; (Oxford, GB)
Jordan; Michael Robert; (Oxford, GB)
Kilgour; John Joseph; (Oxford, GB)
Applicant:
Name | City | State | Country | Type
VIB VZW | Gent | | BE |
Vrije Universiteit Brussel | Brussels | | BE |
Oxford Nanopore Technologies Limited | Oxford | | GB |
Assignee:
Oxford Nanopore Technologies Limited | Oxford | GB
VIB VZW | Gent | BE
Vrije Universiteit Brussel | Brussels | BE
Prior Publication:
Document Identifier | Publication Date
US 20210147486 A1 | May 20, 2021
Appl. No.: | 16/624341 |
Filed: | July 2, 2018 |
PCT Filed: | July 2, 2018 |
PCT NO: | PCT/GB2018/051858 |
371 Date: | December 19, 2019 |
Current U.S. Class: | 1/1 |
Current CPC Class: | B82Y 15/00 20130101; C07K 14/00 20130101 |
International Class: | C07K 14/00 20060101 C07K014/00; B82Y 15/00 20060101 B82Y015/00 |
Foreign Application Data:
Date | Code | Application Number
Jun 30, 2017 | EP | 17179099.1
Claims
1.-61. (canceled)
62. A pore comprising a CsgG pore and a modified CsgF peptide,
wherein the modified CsgF peptide is bound to CsgG and forms a
constriction in the pore.
63. The pore according to claim 62, wherein the CsgF peptide is a
truncated CsgF peptide lacking the C-terminal head domain of CsgF
and at least part of the neck domain of CsgF.
64. The pore according to claim 62, wherein the CsgF peptide: (i)
has a length of from 25 to 50 amino acids; (ii) comprises the amino
acid sequence of SEQ ID NO: 6 from residue 1 up to any one of
residues 28 to 45 of SEQ ID NO: 6, or the corresponding residues
from a homologue of SEQ ID NO: 6; (iii) comprises SEQ ID NO: 39
(residues 1 to 29 of SEQ ID NO: 6); (iv) comprises SEQ ID NO: 15
(residues 1 to 34 of SEQ ID NO: 6); (v) comprises SEQ ID NO: 40
(residues 1 to 45 of SEQ ID NO: 6); (vi) comprises SEQ ID NO: 54
(residues 1 to 30 of SEQ ID NO: 6); (vii) comprises SEQ ID NO: 55
(residues 1 to 35 of SEQ ID NO: 6), optionally wherein one or more
of the residues in SEQ ID NO: 15, SEQ ID NO: 39, SEQ ID NO: 40, SEQ
ID NO: 54, or SEQ ID NO: 55 are modified.
65. The pore according to claim 62, wherein the CsgF peptide
comprises a modification at one or more of the following positions:
G1, T4, F5, R8, N9, N11, F12, A26, Q29, N15, N17, A20, N24, A28 and
D34, optionally wherein the modification is the introduction of a
cysteine, a hydrophobic amino acid, a charged amino acid, a
non-native reactive amino acid, or photoreactive amino acid.
66. The pore according to claim 62, wherein the CsgF peptide
comprises one or more of the substitutions:
N15S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C, N17S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C,
A20S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C, N24S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C,
A28S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C, D34F/Y/W/R/K/N/Q/C, G1C, T4C,
N17S, and D34Y or D34N, optionally wherein the CsgF peptide further
comprises all or part of an enzyme cleavage site at the C-terminal
end.
67. The pore according to claim 62, wherein the CsgG pore comprises
from 6 to 10 CsgG monomers, optionally wherein the ratio of CsgG
monomers to truncated CsgF peptides in the pore is 1:1.
68. The pore according to claim 62, wherein the CsgF peptide and
the CsgG pore are covalently coupled and the covalent coupling is
via: (i) a cysteine residue at a position corresponding to 132,
133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183,
185, 187, 189, 191, 201, 203, 205, 207 or 209 of SEQ ID NO: 3, (ii)
a non-native reactive or photoreactive amino acid at a position
corresponding to 132, 133, 136, 138, 140, 142, 144, 145, 147, 149,
151, 153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 or 209
of SEQ ID NO: 3, (iii) residues at positions corresponding to one
or more of the following pairs of positions of SEQ ID NO: 6 and SEQ
ID NO: 3, respectively: 1 and 153, 4 and 133, 5 and 136, 8 and 187,
8 and 203, 9 and 203, 11 and 142, 11 and 201, 12 and 149, 12 and
203, 26 and 191, and 29 and 144, or (iv) a disulphide bond or click
chemistry.
69. The pore according to claim 62, wherein the interaction between
the CsgF peptide and the CsgG pore is stabilised by hydrophobic,
electrostatic or covalent interactions at a position
corresponding to one or more of the following pairs of positions of
SEQ ID NO: 6 and SEQ ID NO: 3, respectively: 1 and 153, 4 and 133,
5 and 136, 8 and 187, 8 and 203, 9 and 203, 11 and 142, 11 and 201,
12 and 149, 12 and 203, 26 and 191, and 29 and 144.
70. The pore according to claim 62, wherein the CsgG pore comprises
at least one monomer comprising one or more modifications
corresponding to the following modifications in SEQ ID NO:3: (i) a
modification at one or more positions Y51, N55 and F56; (ii) at
least one substitution selected from R97W or R97Y and R93W or R93Y;
(iii) deletion of V105, A106 and I107 of SEQ ID NO:3; (iv) deletion
of one or more of positions R192, F193, I194, D195, Y196, Q197,
R198, L199 and E201 of SEQ ID NO:3; (v) at least one substitution
selected from K94N/Q/R/F/Y/W/L/S, D43S, E44S,
F48S/N/Q/Y/W/I/V/H/R/K, Q87N/R/K, N91K/R, R97F/Y/W/V/I/K/S/Q/H,
E101I/L/A/H, N102K/Q/L/I/V/S/H, R110F/G/N, Q114R/K, R142Q/S,
T150Y/A/V/L/S/Q/N; (vi) a modification at one or more of positions
I41, R93, A98, Q100, G103, T104, A106, I107, N108, L113, S115,
T117, Y130, K135, E170, S208, D233, D238, E244, Q42, E44, L90, N91,
I95, A99, E101 and Q114; (vii) at least one substitution selected
from Y51A/I/V/S/T, N55A/I/V/S/T and F56A/I/V/S/T/Q; (viii) the
substitution R97W; (ix) deletion of F193, I194, D195, Y196, Q197,
R198 and L199 or deletion of D195, Y196, Q197, R198 and L199; (x)
deletion of V105, A106 and I107; (xi) a substitution selected from
K94Q and K94N; (xii) at least one substitution selected from Q42K
or Q42R; E44N or E44Q; L90R or L90K; N91R or N91K; I95R or I95K;
A99R or A99K; E101H, E101K, E101N, E101Q or E101T; and/or Q114K;
(xiii) the substitution N55V; and/or (xiv) R or K at a position
corresponding to R192.
71. The pore according to claim 62, which is a double pore
comprising two CsgG pores, wherein the CsgF peptide is inserted
into the lumen of at least one of the CsgG pores.
72. A method for producing a pore according to claim 62, the method
comprising (i) co-expressing one or more CsgG monomers and a CsgF
peptide in a host cell, thereby allowing transmembrane pore complex
formation in the cell or (ii) contacting one or more purified CsgG
monomers with a modified CsgF peptide, thereby allowing in vitro
formation of the pore, optionally wherein the modified CsgF peptide
comprises SEQ ID NO:15 or SEQ ID NO:16.
73. The method according to claim 72, wherein
the method comprises expressing a modified CsgF peptide comprising
an enzyme cleavage site positioned such that cleavage of the CsgF
peptide produces a truncated CsgF peptide comprising a CsgG-binding
region and a region that forms a constriction in the pore, and
wherein the method comprises cleaving the CsgF peptide.
74. A method for determining the presence, absence or one or more
characteristics of a target analyte, comprising the steps of: (i)
contacting the target analyte with a pore according to claim 62,
such that the target analyte moves into the pore complex; and (ii)
taking one or more measurements as the analyte moves through the
pore complex and thereby determining the presence, absence or one
or more characteristics of the analyte.
75. A CsgF peptide that is a modified CsgF peptide comprising a
CsgG-binding region, and a region that forms a constriction in the
pore.
76. The CsgF peptide according to claim 75, which is a truncated
CsgF peptide with a length of from 25 to 50 amino acids and which
lacks (i) the C-terminal head domain of CsgF and/or (ii) at least
part of the neck domain of CsgF; or (iii) the C-terminal head
domain of CsgF and the neck domain of CsgF.
77. The CsgF peptide according to claim 75, which comprises: (i) the
amino acid sequence from residue 1 of SEQ ID NO: 6 up to any one of
residues 28 to 45 of SEQ ID NO: 6; (ii) SEQ ID NO: 39 (residues 1
to 29 of SEQ ID NO: 6); (iii) SEQ ID NO: 15 (residues 1 to 34 of
SEQ ID NO: 6); (iv) SEQ ID NO: 54 (residues 1 to 30 of SEQ ID NO:
6); (v) SEQ ID NO: 40 (residues 1 to 45 of SEQ ID NO: 6); (vi) SEQ
ID NO: 55 (residues 1 to 35 of SEQ ID NO: 6); optionally wherein
one or more residues in the region between residue 1 and any one of
residues 28 to 45 of SEQ ID NO: 6, or in SEQ ID NO: 15, SEQ ID NO: 39,
SEQ ID NO: 40, SEQ ID NO: 54 or SEQ ID NO: 55, are
modified.
78. The CsgF peptide according to claim 75, which comprises a
modification at one or more of the following positions: G1, T4, F5,
R8, N9, N11, F12, A26, Q29, N15, N17, A20, N24, A28 and D34,
optionally wherein the modification is the introduction of a
cysteine, a hydrophobic amino acid, a charged amino acid, a
non-native reactive amino acid, or photoreactive amino acid.
79. The CsgF peptide according to claim 75, which comprises one or
more of the substitutions: N15S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C,
N17S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C, A20S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C,
N24S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C, A28S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C,
D34F/Y/W/R/K/N/Q/C, G1C, T4C, N17S, and D34Y or D34N and which
optionally further comprises all or part of an enzyme cleavage site
at the C-terminal end.
80. A pore complex comprising: (i) a CsgG pore comprising a first
opening, a mid-section comprising a beta barrel, a second opening,
and a lumen extending from the first opening through the
mid-section to the second opening, wherein a luminal surface of the
mid-section defines a CsgG constriction; and (ii) a plurality of
modified CsgF peptides, each having a CsgF constriction region and
a CsgG binding region, wherein the modified CsgF peptides form a
CsgF constriction within the beta barrel of the CsgG pore and
wherein the CsgG constriction and the CsgF constriction are
co-axially spaced apart within the beta barrel of the CsgG
pore.
81. The pore complex of claim 80, wherein the luminal surface
comprises one or more loop regions of CsgG monomers that define the
CsgG constriction, optionally wherein the CsgF constriction region
and the CsgG binding region correspond to an N-terminal portion of a
CsgF mature peptide.
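The claims above use a compact notation for substitutions, e.g. N15S/A/T for the asparagine at position 15 replaced by serine, alanine or threonine, and 1-based inclusive residue ranges such as "residues 1 to 34 of SEQ ID NO: 6". The Python sketch below shows one way such notation might be expanded and applied to a sequence string. It assumes one-letter amino acid codes and 1-based numbering; the function names and the toy sequence are hypothetical placeholders (not SEQ ID NO: 6 or any other sequence of this application), and deletions are not handled.

```python
import re

# One-letter codes for the 20 standard amino acids.
AA = "ACDEFGHIKLMNPQRSTVWY"
# Wild-type residue, 1-based position, then slash-separated replacements, e.g. "N15S/A/T".
SUBSTITUTION = re.compile(rf"^([{AA}])(\d+)([{AA}](?:/[{AA}])*)$")


def expand(notation: str) -> list[tuple[str, int, str]]:
    """Expand compact notation into (wild_type, position, replacement) triples."""
    match = SUBSTITUTION.match(notation)
    if match is None:
        raise ValueError(f"unrecognised substitution: {notation!r}")
    wild_type, position = match.group(1), int(match.group(2))
    return [(wild_type, position, alt) for alt in match.group(3).split("/")]


def truncate(sequence: str, first: int, last: int) -> str:
    """Return residues first..last inclusive, using 1-based numbering."""
    return sequence[first - 1:last]


def substitute(sequence: str, wild_type: str, position: int, replacement: str) -> str:
    """Apply one substitution after checking the expected wild-type residue."""
    found = sequence[position - 1]
    if found != wild_type:
        raise ValueError(f"expected {wild_type}{position}, found {found}")
    return sequence[:position - 1] + replacement + sequence[position:]


if __name__ == "__main__":
    toy = "GVNTFASR"  # hypothetical toy sequence, not from this application
    print(expand("N15S/A/T"))            # [('N', 15, 'S'), ('N', 15, 'A'), ('N', 15, 'T')]
    print(substitute(toy, "G", 1, "C"))  # CVNTFASR
    print(truncate(toy, 1, 4))           # GVNT
```

Checking the wild-type residue before substituting guards against off-by-one errors between mature-protein and precursor numbering.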
Description
FIELD
[0001] The present invention relates to novel protein pores and
their uses in analyte detection and characterisation. The invention
further relates to a transmembrane pore complex and methods for
production of the pore complex and for use in molecular sensing and
nucleic acid sequencing applications.
BACKGROUND
[0002] Nanopore sensing is an approach to analyte detection and
characterization that relies on the observation of individual
binding or interaction events between the analyte molecules and an
ion conducting channel. Nanopore sensors can be created by placing
a single pore of nanometer dimensions in an electrically insulating
membrane and measuring voltage-driven ion currents through the pore
in the presence of analyte molecules. The presence of an analyte
inside or near the nanopore will alter the ionic flow through the
pore, resulting in altered ionic or electric currents being
measured over the channel. The identity of an analyte is revealed
through its distinctive current signature, notably the duration and
extent of current blocks and the variance of current levels during
its interaction time with the pore. Analytes can be organic and
inorganic small molecules as well as various biological or
synthetic macromolecules and polymers including polynucleotides,
polypeptides and polysaccharides. Nanopore sensing can reveal the
identity and perform single molecule counting of the sensed
analytes, but can also provide information on the analyte
composition such as nucleotide, amino acid or glycan sequence, as
well as the presence of base, amino acid or glycan modifications
such as methylation and acylation, phosphorylation, hydroxylation,
oxidation, reduction, glycosylation, decarboxylation, deamination
and more. Nanopore sensing has the potential to allow rapid and
cheap polynucleotide sequencing, providing single-molecule sequence
reads of polynucleotides from tens to tens of thousands of bases in
length.
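Paragraph [0002] describes identifying an analyte from the duration and extent of current blocks. The Python sketch below illustrates that idea in its simplest form, assuming a uniformly sampled current trace, a known open-pore current and a fixed fractional threshold. `detect_blocks` and `BlockEvent` are hypothetical names, and practical nanopore pipelines use considerably more elaborate event detection.

```python
from dataclasses import dataclass


@dataclass
class BlockEvent:
    start: int       # index of the first blocked sample
    end: int         # index one past the last blocked sample
    dwell_s: float   # duration of the block in seconds
    fraction: float  # mean blocked current / open-pore current


def detect_blocks(trace, open_current, sample_rate_hz, threshold=0.8):
    """Find contiguous runs where the current falls below threshold * open_current."""
    samples = list(trace)
    limit = threshold * open_current
    events, start = [], None
    # An infinite sentinel closes any block still open at the end of the trace.
    for i, value in enumerate(samples + [float("inf")]):
        if value < limit and start is None:
            start = i
        elif value >= limit and start is not None:
            segment = samples[start:i]
            mean = sum(segment) / len(segment)
            events.append(BlockEvent(start, i, (i - start) / sample_rate_hz, mean / open_current))
            start = None
    return events


if __name__ == "__main__":
    # Synthetic trace in arbitrary units, sampled at 1 kHz; not measured data.
    trace = [100, 100, 40, 42, 38, 100, 100, 60, 100]
    for event in detect_blocks(trace, open_current=100, sample_rate_hz=1000):
        print(event)
```

Each event carries the two quantities named in [0002]: how long the block lasts (`dwell_s`) and how deep it is (`fraction`).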
[0003] Two of the essential components of polymer characterization
using nanopore sensing are (1) the control of polymer movement
through the pore and (2) the discrimination of the constituent
building blocks as the polymer is moved through the pore. During
nanopore sensing, the narrowest part of the pore forms the reader
head, the most discriminating part of the nanopore with respect to
the current signatures as a function of the passing analyte. CsgG
was identified as an ungated, non-selective protein secretion
channel from Escherichia coli (Goyal et al., 2014) and has been
used as a nanopore for detecting and characterising analytes.
Mutations to the wild-type CsgG pore that improve the properties of
the pore in this context have also been disclosed (WO2016/034591,
WO2017/149316, WO2017/149317 and WO2017/149318, PCT/GB2018/051191,
all incorporated by reference herein).
[0004] Where the analyte is a polynucleotide, nucleotide
discrimination is achieved via passage through such a mutant pore,
but current signatures have been shown to be sequence dependent,
with multiple nucleotides contributing to the observed current, so
that the height of the channel constriction and the extent of the
interaction surface with the analyte affect the relationship
between observed current and polynucleotide sequence. While the
current range for nucleotide discrimination has been improved
through mutation of the CsgG pore, a sequencing system would have
higher performance if the current differences between nucleotides
could be improved further. Accordingly, there is a need to identify
novel ways to improve nanopore sensing features.
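Paragraph [0004] notes that several nucleotides contribute to the current at any one time. A common simplified way to picture this, assumed here rather than taken from this application, is a k-mer model in which each step of the strand yields a current level determined by the k nucleotides inside the reader head. The sketch below uses a toy table of arbitrary levels (not measured values) and reports the smallest gap between levels as a crude proxy for how well k-mers can be told apart; the function names are hypothetical.

```python
from itertools import product


def expected_levels(sequence: str, level_table: dict[str, float], k: int) -> list[float]:
    """One expected current level for each k-mer window as the strand steps through."""
    return [level_table[sequence[i:i + k]] for i in range(len(sequence) - k + 1)]


def min_level_gap(level_table: dict[str, float]) -> float:
    """Smallest separation between any two k-mer levels (larger is easier to resolve)."""
    levels = sorted(level_table.values())
    return min(b - a for a, b in zip(levels, levels[1:]))


if __name__ == "__main__":
    k = 2
    # Arbitrary illustrative levels, evenly spaced by 3 units; not measured data.
    toy_table = {"".join(kmer): 10.0 + 3.0 * i for i, kmer in enumerate(product("ACGT", repeat=k))}
    print(expected_levels("ACGT", toy_table, k))  # [13.0, 28.0, 43.0]
    print(min_level_gap(toy_table))               # 3.0
```

In this picture, "improving the current differences between nucleotides" corresponds to widening the spread of the level table so that `min_level_gap` grows relative to measurement noise.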
SUMMARY
[0005] The disclosure relates to modified CsgF peptides, in
particular truncated CsgF fragments, that bind the CsgG pore and
thereby introduce an additional channel or pore constriction
within the CsgG pore. Other aspects of the invention also relate to
an isolated transmembrane pore complex, and the use of said
CsgG:CsgF complexes and modified CsgF peptides or fragments in a
nanopore sensing platform with two consecutive reader heads.
[0006] The first aspect of the invention relates to a pore
comprising a CsgG pore and a CsgF peptide. In one aspect, the CsgF
peptide comprises a CsgG-binding region, and a region that forms a
constriction in the pore. In one aspect the CsgF peptide is a
truncated CsgF peptide lacking the C-terminal head domain of CsgF.
In another aspect the CsgF peptide is a truncated CsgF peptide
lacking the C-terminal head and a portion of the neck domain of
CsgF. In another aspect the CsgF peptide is a truncated CsgF
peptide lacking the C-terminal head and neck domains of CsgF. The
pore is also referred to herein as a pore complex and as an
isolated pore complex. The isolated pore complex comprises a CsgG
pore, or a homologue or mutant thereof, and a modified CsgF
peptide, or a homologue or mutant thereof, in particular truncated
CsgF fragments, or homologues or mutants thereof. In one
embodiment, said modified CsgF peptide, or homologue or mutant,
is located in the lumen of the CsgG pore, or homologue or mutant
thereof. In another embodiment, said isolated pore complex has two
or more channel constrictions: one provided by the CsgG pore and
formed by its constriction loop, and an additional channel
constriction or reader head introduced by the modified CsgF
peptide or its homologues or mutants. In one embodiment, said
CsgG pore or CsgG-like pore is not a wild-type pore but a
mutant CsgG pore, with, in particular embodiments, mutations
present, for example, in said channel constriction loop. In another
embodiment, the isolated pore complex, comprising the modified CsgF
peptide, or a homologue or mutant thereof, has a CsgF channel
constriction with a diameter in the range from 0.5 nm to 2.0 nm. In
one embodiment, the pore complex comprises: (i) a CsgG pore
comprising a first opening, a mid-section comprising a beta barrel,
a second opening, and a lumen extending from the first opening
through the mid-section to the second opening, wherein a luminal
surface of the mid-section defines a CsgG constriction; and (ii) a
plurality of modified CsgF peptides, each having a CsgF
constriction region and a CsgF binding region (also referred to
herein as a CsgG-binding domain or region of CsgF), wherein the
modified CsgF peptides form a CsgF constriction within the beta
barrel of the CsgG pore and wherein the CsgG constriction and the
CsgF constriction are co-axially spaced apart within the beta
barrel of the CsgG pore. The luminal surface of the CsgG pore may
comprise one or more loop regions of CsgG monomers that define the
CsgG constriction. The CsgF constriction region and the CsgF
binding region typically correspond to an N-terminal portion of a
CsgF mature peptide. In one embodiment, the pore complex excludes
CsgA, CsgB and CsgE.
[0007] In a second aspect, the invention relates to a modified CsgF
peptide, or a homologue or mutant thereof, wherein the protein or
peptide is modified via a truncation or deletion of part of the
protein, resulting in a CsgF fragment of SEQ ID NO:6 or of a
homologue or mutant thereof. One embodiment relates to a modified
or truncated CsgF peptide, or a modified peptide of a CsgF
homologue or mutant, said modified peptide comprising SEQ ID NO:39,
or SEQ ID NO:40, or a homologue or mutant thereof, alternatively,
said modified peptide comprises SEQ ID NO:15, or a homologue or
mutant thereof, alternatively SEQ ID NO: 54, or SEQ ID NO: 55 or a
homologue or mutant thereof. Another embodiment discloses a
modified CsgF peptide wherein one or more positions in the region
comprising SEQ ID NO:15 are modified, and wherein the modified
peptide retains at least 35% amino acid identity to SEQ ID NO:15
in the peptide fragment corresponding to the region comprising
SEQ ID NO:15.
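The 35% threshold above depends on how percent identity is computed. One common convention, assumed here because this section does not define it, counts identical residues over aligned columns in which neither sequence has a gap. A minimal Python sketch for two sequences that are already aligned (gaps written as "-"); the function names are hypothetical and the example sequences are placeholders, not SEQ ID NO:15.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical residues as a percentage of aligned columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, minimum: float = 35.0) -> bool:
    """True if the pair reaches the minimum percent identity (35% by default)."""
    return percent_identity(aligned_a, aligned_b) >= minimum


if __name__ == "__main__":
    # Hypothetical aligned fragments; not sequences from this application.
    print(percent_identity("ACDEFGHIKL", "ACDQFGHWKL"))  # 80.0
    print(percent_identity("ACD-FG", "ACDEFG"))          # 100.0
```

Producing the alignment itself (for example with a global or local pairwise aligner) is a separate step not shown here, and other identity conventions, such as dividing by the full alignment length, give different percentages for gapped alignments.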
[0008] One embodiment relates to a pore comprising a CsgG pore and
a modified CsgF peptide, wherein the modified CsgF peptide is bound
to CsgG and forms a constriction in the pore.
[0009] One embodiment relates to a polynucleotide encoding said
modified CsgF peptide, or homologue or mutant thereof, according to
the second aspect of the invention. In another embodiment, the
isolated pore complex comprising a CsgG pore and a modified CsgF
peptide, or homologues or mutants thereof, is characterized in
that said modified CsgF peptide is one of the peptides
disclosed in the second aspect of the invention.
[0010] Another embodiment relates to the isolated pore complex
wherein the modified CsgF peptide and the CsgG pore or a monomer of
said pore, or homologues or mutants thereof, are covalently
coupled. More particularly, said coupling is made via a
cysteine residue or via a non-native reactive or photo-reactive
amino acid in a CsgG monomer at a position corresponding to 132,
133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183,
185, 187, 189, 191, 201, 203, 205, 207 or 209 of SEQ ID NO: 3, or
of a homologue thereof.
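The candidate CsgG coupling positions listed above, together with the CsgF/CsgG position pairs listed in claims 68 and 69, can be kept as a small lookup table when planning, for example, paired cysteine substitutions for disulphide coupling. The Python sketch below only transcribes those lists and queries them; the names are hypothetical, and it says nothing about which pairs actually crosslink efficiently.

```python
# CsgG positions (numbered as in SEQ ID NO: 3) listed as candidate coupling sites.
CSGG_COUPLING_SITES = frozenset({
    132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153,
    155, 183, 185, 187, 189, 191, 201, 203, 205, 207, 209,
})

# (CsgF position in SEQ ID NO: 6, CsgG position in SEQ ID NO: 3) pairs from claims 68 and 69.
CONTACT_PAIRS = (
    (1, 153), (4, 133), (5, 136), (8, 187), (8, 203), (9, 203),
    (11, 142), (11, 201), (12, 149), (12, 203), (26, 191), (29, 144),
)


def csgg_partners(csgf_position: int) -> list[int]:
    """CsgG positions listed as partners of a given CsgF position."""
    return sorted(g for f, g in CONTACT_PAIRS if f == csgf_position)


def cysteine_pair_plan(csgf_position: int) -> list[str]:
    """Hypothetical labels for paired Cys substitutions, e.g. 'CsgF 8C + CsgG 187C'."""
    return [f"CsgF {csgf_position}C + CsgG {g}C" for g in csgg_partners(csgf_position)]


# Consistency check: every CsgG partner in the pair list is also a listed coupling site.
assert all(g in CSGG_COUPLING_SITES for _, g in CONTACT_PAIRS)

if __name__ == "__main__":
    print(csgg_partners(8))       # [187, 203]
    print(cysteine_pair_plan(1))  # ['CsgF 1C + CsgG 153C']
```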
[0011] A preferred embodiment relates to an isolated transmembrane
pore complex, or a membranous composition, which comprises the
isolated pore complex of the invention, and the components of a
membrane. Particularly, said transmembrane pore complex or
membranous composition consists of the isolated pore complex of the
invention, and the components of a membrane or an insulating
layer.
[0012] One embodiment relates to a method for producing a pore as
disclosed herein, the method comprising co-expressing one or more
CsgG monomers as disclosed herein and a CsgF peptide as disclosed
herein in a host cell, thereby allowing transmembrane pore complex
formation in the cell. The CsgF peptide may be produced in the cell
by cleaving a modified CsgF peptide or protein comprising an enzyme
cleavage site at a suitable position in the amino acid
sequence.
[0013] One embodiment relates to a method for producing a pore as
disclosed herein, the method comprising contacting one or more
purified CsgG monomers with one or more purified modified CsgF
peptides, thereby allowing in vitro formation of the pore. The
modified CsgF peptide may be a peptide comprising an enzyme
cleavage site at a suitable position in the amino acid sequence,
that is cleaved before or after formation of the pore.
[0014] A third aspect of the invention relates to a method for
producing said transmembrane pore complex, wherein the pore is an
isolated complex formed by a CsgG pore, or a homologue or mutant
thereof, and a modified CsgF peptide, or a homologue or mutant
thereof, the method comprising the steps of co-expressing CsgG of SEQ
ID NO:2, or a homologue or mutant thereof, and modified or
truncated CsgF, comprising a fragment of SEQ ID NO:5, or a
homologue or mutant thereof, in a suitable host cell, thereby
allowing in vivo pore complex formation. In specific embodiments,
said modified CsgF peptide, or homologue or mutant thereof,
comprises SEQ ID NO: 12 or SEQ ID NO:14, or a homologue or mutant
thereof. Alternatively, a method for producing an isolated pore
complex comprises the steps of contacting CsgG monomers of SEQ ID
NO:3, or a homologue or mutant thereof, with modified CsgF
peptide(s), or a homologue or mutant thereof, for in vitro
reconstitution of the pore complex. In particular embodiments,
modified CsgF peptides of said method comprise SEQ ID NO:15 or SEQ
ID NO:16, or homologues or mutants thereof.
[0015] Another aspect of the invention relates to a method for
determining the presence, absence or one or more characteristics of
a target analyte, comprising the steps of:
[0016] (i) contacting the target analyte with said isolated pore
complex or transmembrane pore complex, such that the target analyte
moves into the pore channel; and
[0017] (ii) taking one or more measurements as the analyte moves
through the pore channel and thereby determining the presence,
absence or one or more characteristics of the analyte.
[0018] In one embodiment, said analyte is a polynucleotide. In
particular, said method using a polynucleotide as an analyte
optionally comprises determining one or more characteristics
selected from (i) the length of the polynucleotide, (ii) the
identity of the polynucleotide, (iii) the sequence of the
polynucleotide, (iv) the secondary structure of the polynucleotide
and (v) whether or not the polynucleotide is modified.
[0019] In another embodiment, the analyte is a protein, or peptide,
and in further embodiments, said analyte is a polysaccharide, or a
small organic or inorganic compound, such as, but not
limited to, pharmacologically active compounds, toxic compounds and
pollutants.
[0020] In another embodiment, a method for characterising a
polynucleotide or a (poly)peptide using an isolated transmembrane
pore complex is described, wherein the pore complex is an isolated
complex comprising a CsgG pore, or a homologue or mutant thereof,
and a modified CsgF peptide, or a homologue or mutant thereof. In
particular, said CsgG pore, or homologue or mutant thereof,
comprises six to ten CsgG monomers forming the CsgG pore
channel.
[0021] A further aspect of the invention discloses the use of said
isolated pore complex or transmembrane pore complex according to
the previous aspects of the invention to determine the presence,
absence or one or more characteristics of a target analyte.
Furthermore, the invention also relates to a kit for characterising
a target analyte comprising (a) said isolated pore complex and (b)
the components of a membrane.
DESCRIPTION OF THE FIGURES
[0022] The drawings described are only schematic and are
non-limiting. In the drawings, the size of some of the elements may
be exaggerated and not drawn to scale for illustrative
purposes.
[0023] FIG. 1. Structure of CsgG pore and the interface for complex
formation with CsgF. Cross-sectional (A), side (B) and top (C)
views of CsgG oligomers (e.g., nonamers) (gold) in surface (A) and
ribbon (B, C) representation, with a single CsgG protomer colored
light blue (D) (based on the CsgG X-ray structure PDB entry: 4uv3).
The CsgG constriction loop (CL loop) spans residues 46 to 61
according to SEQ ID NO:3, is indicated in dark grey in all panels,
and corresponds to the loop shown in the bottom left of (E).
CsgG residues for which the side chain faces the inner lumen of the
CsgG beta-barrel are colored mid-grey as indicated and labelled in
the β strands in (E) and (D). These residues represent sites
that can be substituted with natural or non-natural amino
acids, e.g., to make them amenable to attachment (e.g., covalent
crosslinking) of a pore-resident peptide (including, e.g., a modified CsgF
peptide, or a homologue thereof) to a CsgG pore or monomer. In some
embodiments, crosslinking residues include Cys and reactive and
photo-reactive amino acids such as azidohomoalanine,
homopropargylglycine, homoallylglycine, p-acetyl-Phe, p-azido-Phe,
p-propargyloxy-Phe and p-benzoyl-Phe (Wang et al. 2012; Chin et al.
2002) and can be substituted into positions 132, 133, 136, 138,
140, 142, 144, 145, 147, 149, 151, 153, 155, 183, 185, 187, 189,
191, 201, 203, 205, 207 or 209 according to SEQ ID NO:3. (E) shows
a zoom of the CL loop and the transmembrane beta-strands of a CsgG
monomer. The CsgG constriction loop (colored dark blue) forms the
orifice or narrowest passage in the CsgG pore (panel A). In some
embodiments, three positions in the CL loop, 56, 55 and 51
according to SEQ ID NO:3, are of particular importance to the
diameter and chemical and physical properties of the CsgG channel
orifice or "reader head". These represent preferred positions to
alter the nanopore sensing properties of CsgG pores and
homologues.
[0024] FIG. 2. CsgG:CsgF complex protein co-expression and complex
purification. (A) Schematic representation of the purification
protocol of the CsgG:CsgF complex starting from an E. coli culture
co-expressing CsgG (SEQ ID NO:2 + C-terminal StrepII tag) and CsgF
(SEQ ID NO:4 + C-terminal 6×His tag). The protocol involves
disrupting resuspended cells and performing a 1% DDM extraction of
the membrane-bound proteins. The CsgG:CsgF complex and excess CsgF
undergo a first enrichment by affinity purification on a nickel
IMAC column, followed by a second affinity-based enrichment of the
CsgG:CsgF complex on a Streptavidin column. (B) Coomassie-stained
SDS-PAGE of the IMAC (left) and Streptavidin (right) purification
steps. Protein bands corresponding to CsgG and CsgF are
labelled.
[0025] Notably, the IMAC eluate contains an N-terminally truncated
CsgF fragment (labelled *) that was not retained in the affinity
pulldown using the CsgG-bound Strep tag, indicating that the CsgF
N-terminus is required for complex formation with CsgG.
[0026] FIG. 3. CsgG:CsgF complex protein purification upon in vitro
reconstitution. (A) Superimposed chromatograms of Size Exclusion
Chromatography (SEC) runs (using BioRad Enrich 650 10/300 column)
of CsgG (light grey) and CsgG supplemented with an excess of CsgF
(dark grey). The chromatograms show elution peaks corresponding to
a CsgG 9-mer (a) and CsgG 18-mer (b) for the CsgG run, and to
excess free CsgF (c), as well as a 9-mer CsgG:CsgF complex (d) and
18-mer CsgG:CsgF complex (e) that elute at higher hydrodynamic
radius (molecular mass) due to the incorporation of CsgF in the
complex. (B) Native PAGE analysis of the representative species
labelled in panel (A), confirming the shift to higher molecular
mass due to incorporation of CsgF into the CsgG 9-mer and CsgG
18-mer complexes. These experiments demonstrate that the CsgG:CsgF
complex can be reconstituted in vitro starting from the purified
components. (C) Ribbon representation of the CsgG 9-mer and CsgG
18-mer as previously reported in Goyal et al. 2014 (PDB entry
4uv3). The CsgG 18-mer is formed of a dimer of CsgG 9-mers. The SEC
and native PAGE analyses shown in panels A and B demonstrate that
the CsgG 18-mer is amenable to complex formation with CsgF.
[0027] FIG. 4. CsgG:CsgF structure as determined in cryo-EM. (A) A
cryo electron micrograph of the CsgG:CsgF complex shows the
presence of 9-mer and 18-mer CsgG:CsgF complexes, with a number of
single particles of the 9- and 18-mer forms highlighted by full and
dashed circles, respectively. (B) Two representative class averages
of the CsgG:CsgF 9-mer complex, viewed from the side. Class
averages include 6020 and 4159 individual particles, respectively.
The class averages reveal the presence of additional density on top
of the CsgG particle, corresponding to an oligomeric complex of
CsgF. Three distinct regions can be seen in the CsgF oligomer: a
"head" and "neck" region, as well as a region that resides inside
the lumen of the CsgG beta-barrel and forms a constriction or narrow
passage (labelled F) that is stacked on top of the constriction
formed by the CsgG CL loop (labelled G). This latter CsgF region is
referred to as CsgF Constriction Peptide (FCP).
[0028] FIG. 5. Three-dimensional structural model of a CsgG:CsgF
complex. Cross-sectional views of the 3D cryoEM electron density of
the CsgG:CsgF 9-mer complex calculated from 20,000 particles
assigned to 21 class averages. The right picture shows a
superimposition with the CsgG 9-mer X-ray structure (PDB entry:
4uv3) docked into the cryoEM density. The regions corresponding to
CsgG, CsgF and the CsgF head, neck and FCP domains are indicated.
The cross-sections show that the CsgF FCP region forms an additional
constriction (labelled F) in the CsgG channel, approximately 2 nm
above the CsgG constriction loop (labelled G).
[0029] FIG. 6: Schematic representation of the CsgG:CsgF pore
complex based on the cryo-EM structure. (A) Schematic
representation of the CsgG nanopore in cross-sectional view,
harbouring a single constriction, labelled (1). CsgG-based
nanopores form 3.5-4 nm wide channels that contain a 0.5-1.5 nm
orifice made by the CsgG constriction loop (residues 46 to 61
according to SEQ ID NO:3). When in complex with CsgF, a second
constriction or orifice is introduced into the CsgG channel,
labelled (2)/F, and the channel exit becomes occluded by the CsgF
head domain (see FIG. 5). When using modified CsgF peptides, e.g.,
corresponding to a CsgF constriction peptide (FCP), which lacks the
neck and head regions, a CsgG:CsgF pore complex is formed with two
consecutive channel constrictions or orifices ((1) and (2)), as
shown in the cross-sectional view of the CsgG:CsgF cryo-EM density
in panel (B) and in schematic representation in panel (C). Removal
of the neck and head regions in the modified CsgF peptide
alleviates their blockage of the channel exit.
[0030] FIG. 7. Schematic representation of the use of CsgG:CsgF
pore complex for nanopore sensing applications of a (bio)polymer
(A) or single molecule analytes (B). When used in polymer sensing,
the second channel constriction introduced by the modified CsgF
peptide increases the contact region with the analyte and forms a
second interaction site and reader head. When used in single
molecule nanopore sensing, the second channel constriction
introduced by the modified CsgF peptide creates a second
independent analyte interaction site. (C) Schematic representation
of theoretical channel conductance profiles of small molecules
(indicated by hexagons or triangles) passing through and interacting
with the consecutive CsgG (1) and CsgF (2) constrictions or reader
heads.
[0031] FIG. 8. Multiple sequence alignments of exemplary CsgF
homologues. Aligned sequences are shown as mature proteins (i.e.
lacking their N-terminal signal peptide (SP)). The boxed sequences
indicate a CsgF region of sequence conservation (between 35 and
100% pairwise sequence identity; see FIG. 10) that corresponds to a
CsgF constriction peptide (FCP) in some embodiments. CsgF
homologues included in the multiple sequence alignment are Q88H88;
A0A143HJA0; Q5E245; Q084E5; F0LZU2; A0A136HQR0; A0A0W1SRL3; B0UH01;
Q6NAU5; G8PUY5; A0A0S2ETP7; E3I1Z1; F3Z094; A0A176T7M2; D2QPP8;
N21YT1; W7QHV5; D4ZLW2; D2QT92; A0A167UJA2. The FCP regions of E.
coli CsgF (SEQ ID No:15) and the shown CsgF homologues correspond
to SEQ ID Nos 18-36.
[0032] FIG. 9. Experimental evaluation of the E. coli CsgF region
forming the CsgG-interaction sequence and CsgF constriction peptide
(FCP). Panel (A) shows the mature sequences (i.e. after removal of
the CsgF signal peptide, corresponding to residues 1-19 of SEQ ID
NO: 5) of the four N-terminal CsgF fragments (SEQ ID NO: 8, CsgF
residues 1-27; SEQ ID NO: 10; SEQ ID NO: 12 and SEQ ID NO: 14) that
were co-expressed with E. coli CsgG (SEQ ID NO: 2). (B) Anti-Strep
(left) and anti-His (right) Western blot analysis of SDS-PAGE runs
of crude cell lysates of CsgG and CsgF co-expression experiments.
Anti-Strep analysis demonstrates the expression of CsgG in all
co-expression experiments, whereas anti-His Western blot analysis
shows detectable levels of CsgF fragments only for the truncation
mutant CsgF 1-64 (SEQ ID NO: 14). A His-tagged nanobody (Nb) was
used as positive control. (C) Anti-His dot blot analysis of the
presence of CsgF fragments in CsgG:CsgF co-expression experiments.
Top row shows whole cell lysates, middle and bottom rows show the
eluate and flowthrough of a Strep affinity pulldown experiment.
These data demonstrate that CsgF fragment 1-64, and to a much
lesser extent CsgF 1-48, is specifically pulled down as a complex
with Strep-tagged CsgG. CsgF fragments 1-27 and 1-38 do not result
in detectable levels of the corresponding CsgF fragments and show
no sign of complex formation with CsgG.
[0033] FIG. 10. Multiple sequence alignment of the CsgF region
forming the CsgG-interaction sequence and CsgF constriction peptide
(FCP). The figure shows a multiple sequence alignment and consensus
sequence of the CsgF peptides and known homologues thereof in the
region that corresponds to the CsgG-interaction sequence. CsgF
homologues are defined by the Pfam domain PF03783. These peptides bind
CsgG and localize to the lumen of the CsgG β-barrel, where they form
an additional constriction in the CsgG channel. These peptides and
homologues thereof are examples of CsgF Constriction Peptides or
FCPs. The pairwise sequence identity in the shown FCPs ranges
between 35 and 98%.
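Pairwise sequence identity of the kind quoted above can be computed from two rows of a multiple sequence alignment. The sketch below is illustrative only: it assumes identity is counted over alignment columns in which neither sequence has a gap, which is one of several conventions in use (the denominator used for FIG. 10 is not stated), and the function name `pairwise_identity` is hypothetical.

```python
def pairwise_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two equal-length rows of an alignment.

    Assumption: only columns where neither row has a gap are compared;
    other conventions (e.g. dividing by the shorter ungapped length) give
    different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either row
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy rows (not real CsgF sequences): 3 of the 4 gap-free columns match.
print(pairwise_identity("AC-DE", "ACGDQ"))  # 75.0
```

Choosing a different denominator changes the reported identity, so any comparison with the 35-98% range above would need the same convention as the original analysis.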
[0034] FIG. 11. The high resolution cryoEM structure of the
CsgG:CsgF complex. CsgG is shown in light grey and CsgF is shown in
dark grey. A. Final electron density map of the CsgG:CsgF complex
at 3.4 Å resolution. Side view. B. Top view of the cryoEM
structure, showing that CsgG:CsgF has a 9:9 stoichiometry with C9
symmetry. C. Internal architecture of the CsgG:CsgF complex. GC:
CsgG constriction; FC: CsgF constriction. D. Interactions between
CsgG and CsgF proteins. CsgG and the CsgG constriction are coloured
light grey and grey respectively. CsgF is coloured dark grey.
Residues in CsgG and CsgF are labelled in light grey and black
respectively.
[0035] FIG. 12. Two reader heads of the CsgG:CsgF complex. CsgG is
shown in light grey and the reader head of the CsgG pore is shown in
dark grey. CsgF is shown in black and the reader head of CsgF is
labelled.
[0036] FIG. 13. Co-expression of CsgG with CsgF WT in vivo. The gene
encoding the C-terminal Strep-tagged CsgG polypeptide in a pT7 vector
with ampicillin resistance and the gene encoding the C-terminal
His-tagged CsgF polypeptide in a pRham vector with kanamycin
resistance were co-transformed into E. coli BL21(DE3) cells in the
presence of both ampicillin and kanamycin. Proteins were expressed
overnight at 18 °C at 250 rpm, and the CsgG-CsgF complex was
purified by Strep-tag purification followed by His-tag
purification. A. Protein sample (in duplicate) before Strep
purification. B. Protein sample (three elution fractions) after His
purification. Proteins were run on a 4-20% Tris gel.
[0037] FIG. 14. Co-expression of CsgG and CsgF in vitro and the
heat stability of the CsgG-CsgF complex. CsgG and CsgF DNA in
different vectors are co-expressed in an in vitro transcription and
translation reaction. Proteins are radiolabelled with S-35 methionine
and exposed to X-ray film. Stability of the complex is assessed
by incubating the reaction mixture for 10 minutes at different
temperatures.
[0038] FIG. 15. Making CsgG:CsgF complexes using protease cleavage
sites. A. A TEV, C3 or any other protease cleavage site can be
incorporated into the CsgF peptide at the required sites (e.g.
between residues 30 and 31, 35 and 36, 40 and 41, or 45 and 46 of
SEQ ID NO: 6). CsgG is shown in gold and CsgF domains in red.
Residues 1-35 of one CsgF subunit are coloured in green for clarity
and residues 36-45 are shown in purple. The 10-histidine tag is
shown in pink and the Strep tag on CsgG is shown in blue. B.
SDS-PAGE (4-20% TGX) for protease cleavage of the full length
CsgG:CsgF complex where a TEV protease cleavage site is inserted
between residues 35 and 36 of SEQ ID NO: 6. M: molecular weight
marker, Lane 1: post-Strep purification of the CsgG:CsgF full
length complex, Lane 2: post-Strep, concentrated, Lane 3: post gel
filtration, Lane 4: cleaved with TEV protease to generate the
CsgG:CsgF complex, Lane 5: flow-through of CsgG:CsgF after Strep
purification, Lane 6: CsgG:CsgF heated at 60 °C for 10 minutes,
Lane 7: eluted CsgG:CsgF complex from the Strep column, Lane 8:
CsgG pore as the control, Lane 9: TEV protease as the control.
[0039] FIG. 16. Heat stability of CsgG:CsgF complexes. M: Molecular
weight marker, Lane 1: CsgG pore, Lane 2: CsgG:CsgF complex at room
temperature, Lanes 3-9: CsgG:CsgF sample heated at different
temperatures (40, 50, 60, 70, 80, 90 and 100 °C respectively) for 10
minutes.
A. Y51A/F56Q/N55V/N91R/K94Q/R97W-del(V105-I107):CsgF-(1-45).
B. Y51A/F56Q/N55V/N91R/K94Q/R97W-del(V105-I107):CsgF-(1-35).
C. Y51A/F56Q/N55V/N91R/K94Q/R97W-del(V105-I107):CsgF-(1-30).
[0040] Samples were subjected to SDS-PAGE on a 7.5% TGX gel.
CsgG:CsgF complexes with both CsgF-(1-45) and CsgF-(1-35) show a
shift relative to the CsgG pore band in lane 1, indicating that
both complexes are heat stable up to 90 °C. The complex and the
pore break down to CsgG monomers at 100 °C (lane 9). Although the
same heat stability pattern is seen with the CsgG:CsgF complex with
CsgF-(1-30), it is difficult to see the shift between the protein
bands of the CsgG pore (lane 1) and the CsgG-CsgF complexes (lanes
2-8).
[0041] FIG. 17. CsgG:CsgF formation via in vitro reconstitution
using synthetic CsgF peptides. Native PAGE showing CsgG:CsgF
formation via in vitro reconstitution using wildtype CsgG or a CsgG
mutant with altered constriction
Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107). An Alexa 594-labelled
CsgF peptide corresponding to the first 34 residues of mature CsgF
(SEQ ID NO: 6) was added to purified Strep-tagged CsgG or
Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107) in 50 mM Tris, 100 mM
NaCl, 1 mM EDTA, 5 mM LDAO/C8D4 at a 2:1 molar ratio for 15
minutes at room temperature to allow reconstitution. After
pull-down of CsgG-Strep on StrepTactin beads, the sample was analysed on
native-PAGE. Both WT and Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)
CsgG bind the CsgF N-terminal peptide as visualised by the
fluorescence tag.
[0042] FIG. 18. Stabilising CsgG:CsgF or CsgG:FCP complexes. A.
Identified pairs of amino acid positions in CsgG (SEQ ID NO: 3) and
CsgF (SEQ ID NO: 6) where S--S bonds can be made. B. Schematic
representation to show the S--S bond between CsgG-Q153C and
CsgF-G1C.
[0043] FIG. 19. Cysteine cross linking of the CsgG:CsgF complex. A.
Y51A/F56Q/N91R/K94Q/R97W/Q153C-del(V105-I107) and CsgF-G1C proteins
were purified separately and incubated together at 4 °C for
1 hour or overnight to form the complex and allow S-S formation. No
oxidising agents were added to promote S-S formation. The control
CsgG pore (Y51A/F56Q/N91R/K94Q/R97W/Q153C-del(V105-I107)) and the
complex (with and without DTT) were heated at 100 °C for 10 minutes
to break down the complex into CsgG monomer (CsgGₘ, 30 kDa) and
CsgF monomer (CsgFₘ, 15 kDa). A CsgGₘ-CsgFₘ dimer (45 kDa) can be
seen in the absence of the reducing agent, confirming S-S bond
formation. Increased dimer formation can be seen after overnight
incubation compared with one-hour incubation. B. Mass spectrometry
analysis was carried out on the gel-purified CsgGₘ-CsgFₘ band from
the overnight incubation. The protein was proteolytically cleaved to
generate tryptic peptides. LC-MS/MS sequencing was performed,
resulting in the identification of the precursor ion above,
corresponding to the linked peptides shown. This precursor ion was
fragmented to give the fragment ions observed. These include ions
for each of the peptides, as well as fragments incorporating the
intact disulphide bond. These data provide strong evidence for the
presence of a disulphide bond between C1 of CsgF and C153 of CsgG.
[0044] FIG. 20. Improving the efficiency of Cysteine cross linking
of the CsgG:CsgF complex. Lane 1:
Y51A/F56Q/N91R/K94Q/R97W/N133C-del(V105-I107) and CsgF-T4C proteins
were co-expressed and the CsgG:CsgF complex was purified. Lane 2: The
complex was heated in the presence of DTT to break it down into its
constituent monomers (CsgGₘ and CsgFₘ). DTT reduces any S-S bonds
formed between CsgG-N133C and CsgF-T4C. Lane 3: The complex was
incubated with the oxidising agent copper-orthophenanthroline to
promote S-S bond formation. Lane 4: The oxidised sample was heated
at 100 °C in the absence of DTT to break down the complex. A new
band of 45 kDa corresponding to CsgGₘ-CsgFₘ appears, confirming S-S
bond formation.
[0045] FIG. 21. Current signature when the DNA strand is passing
through the CsgG:CsgF complex. The complexes were made by
co-expressing the CsgG pore
(Y51A/F56Q/N91R/K94Q/R97W-del(V105-I107)) containing a C-terminal
Strep tag with full-length CsgF proteins containing a C-terminal
His tag and a TEV protease cleavage site between residues 35 and 36
of SEQ ID NO: 6. Purified complexes were then cleaved with TEV
protease to make the given CsgG:CsgF complexes. Note that TEV
cleavage leaves the ENLYFQ sequence at the cleavage site. A. No
mutation at position 17 of CsgF. B. N17S mutation in CsgF.
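As a sketch of the cleavage scheme described for FIGS. 15 and 21, the helpers below insert a TEV recognition site after a chosen residue and return the N-terminal fragment left after cleavage. They assume the canonical TEV site ENLYFQ followed by G or S, with cleavage between Q and G/S. The exact linker sequence used in the examples is not given here, and the function names and toy sequence are hypothetical.

```python
import re

# Assumed canonical TEV protease site; the protease cuts between Q and G.
TEV_SITE = "ENLYFQG"


def insert_after(sequence: str, position: int, insert: str) -> str:
    """Insert `insert` after residue `position` (1-based numbering)."""
    if not 0 <= position <= len(sequence):
        raise ValueError("position outside the sequence")
    return sequence[:position] + insert + sequence[position:]


def tev_n_terminal_fragment(sequence: str) -> str:
    """Return the N-terminal product of cutting at the first ENLYFQ(G/S) site.

    The fragment keeps ENLYFQ at its C terminus, consistent with the note
    that TEV cleavage leaves ENLYFQ at the cleavage site.
    """
    match = re.search(r"ENLYFQ(?=[GS])", sequence)
    if match is None:
        raise ValueError("no TEV site found")
    return sequence[: match.end()]


toy = "MKTAYIAKQR"                       # hypothetical 10-residue sequence
tagged = insert_after(toy, 5, TEV_SITE)  # site between residues 5 and 6
print(tagged)                            # MKTAYENLYFQGIAKQR
print(tev_n_terminal_fragment(tagged))   # MKTAYENLYFQ
```

For a site between residues 35 and 36 of a CsgF sequence, the call would be `insert_after(csgf, 35, TEV_SITE)`, where `csgf` stands for the hypothetical sequence string.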
[0046] FIG. 22. Current signature when the DNA strand is passing
through the CsgG:CsgF complex. The complexes were made by
incubating Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107) pore
containing the C terminal strep tag with CsgF-(1-35) mutants. A.
CsgF-N17S-(1-35). B. CsgF-N17V-(1-35).
[0047] FIG. 23. Current signature when the DNA strand is passing
through the CsgG:CsgF complex. The complexes were made by
incubating different CsgG pores containing the C terminal strep tag
with CsgF-N17S-(1-35). A. CsgG pore is
Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107). B. CsgG pore is
Y51T/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107). C. CsgG pore is
Y51A/N55I/F56Q/N91R/K94Q/R97W-del(V105-I107). D. CsgG pore is
Y51A/F56A/N91R/K94Q/R97W-del(V105-I107). E. CsgG pore is
Y51A/F56I/N91R/K94Q/R97W-del(V105-I107). F. CsgG pore is
Y51S/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107).
[0048] FIG. 24. Current signature when the DNA strand is passing
through the CsgG:CsgF complex. Complexes were made by incubating
the E. coli purified Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107)
pore containing a C-terminal Strep tag with CsgF of three different
lengths. A. CsgF-(1-29), B. CsgF-(1-35), C. CsgF-(1-45). The arrow
indicates the range of the signal. Surprisingly, the complex with
CsgF-(1-29) produces the signal with the largest range.
[0049] FIG. 25. Signal:noise of the current signature when the DNA
strand is passing through the CsgG:CsgF complex. Different
CsgG:CsgF complexes were made by incubating different CsgG pores
(1: Y51A/F56Q/N91R/K94Q/R97W-del(V105-I107);
2: Y51A/N55I/F56Q/N91R/K94Q/R97W-del(V105-I107);
3: Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107);
4: Y51A/F56A/N91R/K94Q/R97W-del(V105-I107);
5: Y51A/F56I/N91R/K94Q/R97W-del(V105-I107);
6: Y51A/F56V/N91R/K94Q/R97W-del(V105-I107);
7: Y51S/N55A/F56Q/N91R/K94Q/R97W-del(V105-I107);
8: Y51S/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107);
9: Y51T/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107)) with the same CsgF
peptide, CsgF-(1-35). Different squiggle patterns were observed in
DNA translocation experiments and their signal:noise was measured.
Higher accuracies can be obtained with larger signal:noise
ratios.
[0050] FIG. 26. Sequencing errors with narrow reader heads. A.
Representation of DNA base interaction with the reader head of the
CsgG pore. Approximately 5 bases dominate the current signal at
any given time when the DNA strand is translocating through the
pore. B. Mapping plots of the signal. Event-detected signal for
multiple reads mapped to modelled signal using a custom HMM, for a
mixed sequence lacking homopolymer runs, and for a sequence
containing three homopolymer runs of 10 T.
[0051] FIG. 27. Mapping the reader heads of the CsgG:CsgF complex.
A. Reader head discrimination plot for the CsgG:CsgF complex: the
average variation in modelled current when the base at each read
head position is varied. To calculate the read head discrimination
at position i for a model of length k with an alphabet of length n,
we define the discrimination at read-head position i as the median
of the standard deviations in current level for each of the
n^(k-1) groups of size n where position i is varied while the other
positions are held constant. B. Static DNA strands to map the reader
heads: a set of polyA DNA strands (SS20 to SS38) is created in which
one base is missing from the DNA backbone (iSpc3). In each strand,
the position of iSpc3 moves from the 3' end towards the 5' end. Based
on previous experiments with the CsgG pore, the 7th position of the
DNA is expected to be located within the CsgG constriction; SS26,
which corresponds to this position, is highlighted. Based on the
model from (A), 4-5 bases are expected to separate the CsgG and CsgF
reader heads. Therefore, positions 12 and 13 are expected to be
approximately within the CsgF constriction, and the SS31 and SS32
DNA strands corresponding to those positions are highlighted. C and
D. Mapping the two reader heads: a biotin modification at the 3' end
of each strand is complexed with monovalent streptavidin and the
current blockage generated by each strand is recorded in a MinION
set-up. When the iSpc3 position lies above or below the constriction
within the pore, no deflection is expected. However, when the iSpc3
is located within the constriction, a higher current level is
expected, because the extra space created by the missing base lets
more ions pass through. Therefore, by plotting the current passing
through with each DNA strand, the locations of the two reader heads
can be mapped. As expected, the highest deflection in the current is
seen when position 7 of the DNA strand is occupied by iSpc3 (C).
iSpc3 at positions 6 and 8 also produces a higher deflection over
the average polyA current level. Therefore, positions 6, 7 and 8 of
the DNA strand represent the first reader head, the CsgG reader
head. As expected, when positions 12 and 13 are occupied by iSpc3,
another deviation from baseline polyA is observed (D). This
indicates the second reader head of the pore, the CsgF reader head.
The results also confirm that the two reader heads are approximately
4-5 bases apart.
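The read-head discrimination defined above can be computed directly from a k-mer current model. The sketch assumes the model is a Python dict mapping every k-mer over the alphabet to a single modelled current level, and it uses the population standard deviation within each group; which standard deviation estimator the original analysis used is not stated, and `read_head_discrimination` is a hypothetical name.

```python
import itertools
import statistics
from collections import defaultdict


def read_head_discrimination(model: dict[str, float], k: int,
                             alphabet: str = "ACGT") -> list[float]:
    """Discrimination at each read-head position of a k-mer current model.

    For position i, the n**k k-mers fall into n**(k-1) groups that share every
    base except the one at i, so each group has n members. The discrimination
    at i is the median, over those groups, of the standard deviation of the
    modelled current within a group.
    """
    n = len(alphabet)
    if len(model) != n ** k or any(len(kmer) != k for kmer in model):
        raise ValueError("model must list every k-mer exactly once")
    discrimination = []
    for i in range(k):
        groups = defaultdict(list)
        for kmer, level in model.items():
            context = kmer[:i] + kmer[i + 1:]  # every position except i
            groups[context].append(level)
        # Assumption: population standard deviation within each group.
        spreads = [statistics.pstdev(levels) for levels in groups.values()]
        discrimination.append(statistics.median(spreads))
    return discrimination


# Hypothetical toy 3-mer model in which only the middle base changes the level.
base_level = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
toy_model = {"".join(p): base_level[p[1]]
             for p in itertools.product("ACGT", repeat=3)}
print([round(d, 3) for d in read_head_discrimination(toy_model, 3)])
# [0.0, 1.118, 0.0]
```

In the toy model only the middle position affects the current, so only that position shows non-zero discrimination. This mirrors how peaks in the discrimination plot indicate which positions lie within a reader head.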
[0052] FIG. 28. Reader head discrimination and base contribution.
Left hand panel demonstrates the read-head discrimination of each
mutant pore: the average variation in modelled current when the
base at each read head position is varied. To calculate the read
head discrimination at position i for a model of length k with an
alphabet of length n, we define the discrimination at read-head
position i as the median of the standard deviations in current
level for each of the n^(k-1) groups of size n where position i
is varied while other positions are held constant. Right hand panel
demonstrates the base contribution plot: Median current over all
sequence contexts with base b (A, T, G or C) at position i of the
reader head.
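The base contribution plot can be sketched the same way. The example assumes a k-mer model stored as a dict from every k-mer to its modelled current level and takes, for each base b and position i, the median level over all k-mers with b at i. The function name `base_contribution` and the toy model are hypothetical.

```python
import itertools
import statistics


def base_contribution(model: dict[str, float], k: int,
                      alphabet: str = "ACGT") -> dict[str, list[float]]:
    """Median modelled current over all k-mers with base b at position i."""
    table = {}
    for base in alphabet:
        table[base] = [
            statistics.median(
                [level for kmer, level in model.items() if kmer[i] == base])
            for i in range(k)
        ]
    return table


# Hypothetical toy 3-mer model in which only the middle base changes the level.
base_level = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
toy_model = {"".join(p): base_level[p[1]]
             for p in itertools.product("ACGT", repeat=3)}
contrib = base_contribution(toy_model, 3)
print(contrib["T"])  # [1.5, 3.0, 1.5]
```

Positions outside the reader head give the same median for every base, while positions inside it separate the bases.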
[0053] FIG. 29. Error profiles of the double reader head pore. A.
Schematic representation of the CsgG:CsgF complex and the
interaction of bases of the DNA with the two reader heads. Red:
strong interactions, orange: weak interactions, grey: no
interactions. B. Comparison of deletion errors. Reads from
Y51A/F56Q/N91R/K94Q/R97W/R192D-del(V105-I107) and
Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107): CsgF-N17S-(1-35)
pores were base called from the same region of E. coli DNA. Reads
were aligned to the reference genome using Minimap2
(https://arxiv.org/abs/1708.01492), and the resultant alignments
were visualised in Savant Genome Browser
(http://www.ncbi.nlm.nih.gov/pubmed/20562449). The majority of
Y51A/F56Q/N91R/K94Q/R97W/R192D-del(V105-I107) reads contain a
single base deletion (black boxes) in the T homopolymer, which is
not present in the majority of CsgG:CsgF reads. C. Comparison of
the consensus accuracy from unpolished data generated from
Y51A/F56Q/N91R/K94Q/R97W/R192D-del(V105-I107) (blue) and
Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107):CsgF-N17S-(1-35) pores
(green) against the length of homopolymers.
[0054] FIG. 30. Homopolymer calling of CsgG:CsgF complex. DNA with
the sequence shown in (A) is translocated through the
Y51A/F56Q/N91R/K94Q/R97W/R192D-del(V105-I107) pore (B) and the
Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107):CsgF-N17S-(1-35) pore
(C), and their signals were analysed for the first polyT section
shown in red in (A). When the polyT section passes through the CsgG
pore, which contains a single reader head (the model is based on 5
bases located in the reader head), it generates a flat line in the
signal. It is therefore difficult to determine the exact number of
bases in this region, which usually causes deletion errors. When the
DNA passes through the CsgG:CsgF complex, which contains two reader
heads (the model is based on 9 bases located within and between the
two reader heads), the polyT section shows multiple steps instead of
a flat line. The information in these steps can be used to correctly
identify the number of bases in the homopolymeric region. This
additional information significantly reduces deletion errors and
improves overall consensus accuracy.
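The homopolymer argument can be illustrated with an idealised count. The sketch assumes that the current level is a deterministic, noise-free function of the k bases spanned by the model, with k = 5 for the single CsgG reader head and k = 9 for the span covering both reader heads, as stated above. A level change is then only visible when the k-mer under the pore changes. The function counts the longest run of one-base steps during which the k-mer stays the same. Such steps are invisible in the signal and leave only dwell time to indicate the homopolymer length. This is a simplification, not the basecalling model itself, and the function name is hypothetical.

```python
def longest_flat_stretch(sequence: str, k: int) -> int:
    """Longest run of consecutive one-base steps with an unchanged k-mer.

    Idealised: assumes the current depends only on the k bases spanned by
    the model, so identical consecutive k-mers give identical levels.
    """
    kmers = [sequence[j:j + k] for j in range(len(sequence) - k + 1)]
    longest = current = 0
    for previous, following in zip(kmers, kmers[1:]):
        current = current + 1 if previous == following else 0
        longest = max(longest, current)
    return longest


# Hypothetical strand: a 10-base polyT run flanked by mixed bases.
strand = "ACG" + "T" * 10 + "GCA"
for k in (5, 9):
    print(k, longest_flat_stretch(strand, k))
# 5 5
# 9 1
```

With a 5-base model, five consecutive steps inside the 10 T run produce no level change. With a 9-base span, only one step is hidden, which is consistent with the stepped rather than flat polyT signal described for the CsgG:CsgF complex.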
[0055] FIG. 31. Characterisation of the CsgG pore
(Y51A/F56Q/N91R/K94Q/R97W-del(V105-I107)). A. Reader head
discrimination of the CsgG pore: the average variation in modelled
current when the base at each read head position is varied. To
calculate the read head discrimination at position i for a model of
length k with an alphabet of length n, we define the discrimination
at read-head position i as the median of the standard deviations in
current level for each of the n^(k-1) groups of size n where
position i is varied while other positions are held constant. B.
Base contribution plot of the CsgG pore: median current over all
k-mers with base b (A, T, G or C) at position i of the reader head.
C. Current signature when the DNA strand is passing through the
CsgG pore.
DETAILED DESCRIPTION
[0056] The present invention will be described with respect to
particular embodiments and with reference to certain drawings but
the invention is not limited thereto but only by the claims. Any
reference signs in the claims shall not be construed as limiting
the scope. Of course, it is to be understood that not necessarily
all aspects or advantages may be achieved in accordance with any
particular embodiment of the invention. Thus, for example those
skilled in the art will recognize that the invention may be
embodied or carried out in a manner that achieves or optimizes one
advantage or group of advantages as taught herein without
necessarily achieving other aspects or advantages as may be taught
or suggested herein. The invention, both as to organization and
method of operation, together with features and advantages thereof,
may best be understood by reference to the following detailed
description when read in conjunction with the accompanying
drawings. The aspects and advantages of the invention will be
apparent from and elucidated with reference to the embodiment(s)
described hereinafter. Reference throughout this specification to
"one embodiment" or "an embodiment" means that a particular
feature, structure or characteristic described in connection with
the embodiment is included in at least one embodiment of the
present invention. Thus, appearances of the phrases "in one
embodiment" or "in an embodiment" in various places throughout this
specification are not necessarily all referring to the same
embodiment, but may. Similarly, it should be appreciated that in
the description of exemplary embodiments of the invention, various
features of the invention are sometimes grouped together in a
single embodiment, figure, or description thereof for the purpose
of streamlining the disclosure and aiding in the understanding of
one or more of the various inventive aspects. This method of
disclosure, however, is not to be interpreted as reflecting an
intention that the claimed invention requires more features than
are expressly recited in each claim. Rather, as the following
claims reflect, inventive aspects lie in less than all features of
a single foregoing disclosed embodiment.
[0057] In addition, as used in this specification and the appended
claims, the singular forms "a", "an", and "the" include plural
referents unless the content clearly dictates otherwise. Thus, for
example, reference to "a polynucleotide" includes two or more
polynucleotides, reference to "a polynucleotide binding protein"
includes two or more such proteins, reference to "a helicase"
includes two or more helicases, reference to "a monomer" includes
two or more monomers, reference to "a pore" includes two or more
pores and the like.
[0058] In all of the discussion herein, the standard one letter
codes for amino acids are used. These are as follows: alanine (A),
arginine (R), asparagine (N), aspartic acid (D), cysteine (C),
glutamic acid (E), glutamine (Q), glycine (G), histidine (H),
isoleucine (I), leucine (L), lysine (K), methionine (M),
phenylalanine (F), proline (P), serine (S), threonine (T),
tryptophan (W), tyrosine (Y) and valine (V). Standard substitution
notation is also used, i.e. Q42R means that Q at position 42 is
replaced with R.
[0059] In the paragraphs herein where different amino acids at a
specific position are separated by the / symbol, the / symbol means
"or". For instance, Q87R/K means Q87R or Q87K.
[0060] In the paragraphs herein where different positions are
separated by the / symbol, the / symbol means "and" such that
Y51/N55 is Y51 and N55.
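The notation above can be handled programmatically. The following sketch parses mutant names of the form used in the figure descriptions (e.g. Y51A/F56Q/N91R/K94Q/R97W-del(V105-I107)). It treats "/" between positions as "and" and applies the substitutions and an optional deletion to a sequence numbered from 1. It does not handle the "or" form (e.g. Q87R/K). The function names and toy sequence are hypothetical; in practice positions would refer to SEQ ID NO: 3.

```python
import re

# "Y51A": wild-type residue, 1-based position, replacement residue.
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")
# "-del(V105-I107)": deletion of the inclusive range V105 to I107.
_DELETION = re.compile(r"-del\(([A-Z])(\d+)-([A-Z])(\d+)\)$", re.IGNORECASE)


def parse_mutant(name: str):
    """Split a mutant name into substitutions and an optional deletion range."""
    deletion = None
    match = _DELETION.search(name)
    if match:
        deletion = (match.group(1).upper(), int(match.group(2)),
                    match.group(3).upper(), int(match.group(4)))
        name = name[: match.start()]
    substitutions = []
    for token in name.split("/"):  # "/" between positions means "and"
        sub = _SUBSTITUTION.fullmatch(token.strip())
        if sub is None:
            raise ValueError(f"unrecognised substitution: {token!r}")
        substitutions.append((sub.group(1), int(sub.group(2)), sub.group(3)))
    return substitutions, deletion


def apply_mutant(sequence: str, name: str) -> str:
    """Apply substitutions, then the deletion, using the original numbering."""
    substitutions, deletion = parse_mutant(name)
    residues = list(sequence)
    for wild_type, position, new in substitutions:
        if not 1 <= position <= len(sequence) or sequence[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = new
    if deletion:
        first_wt, start, last_wt, end = deletion
        if (not 1 <= start <= end <= len(sequence)
                or sequence[start - 1] != first_wt
                or sequence[end - 1] != last_wt):
            raise ValueError("deletion boundaries do not match the sequence")
        del residues[start - 1:end]
    return "".join(residues)


toy = "ACDEFGHIKL"  # hypothetical sequence, positions 1-10
print(parse_mutant("C2S/F5A-del(G6-H7)"))
# ([('C', 2, 'S'), ('F', 5, 'A')], ('G', 6, 'H', 7))
print(apply_mutant(toy, "C2S/F5A-del(G6-H7)"))  # ASDEAIKL
```

Substitutions are applied before the deletion so that every position keeps the numbering of the unmodified sequence. This matches how the mutant names above combine substitutions with del(V105-I107).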
[0061] All amino-acid substitutions, deletions and/or additions
disclosed herein are with reference to a mutant CsgG monomer
comprising a variant of the sequence shown in SEQ ID NO: 3, unless
stated to the contrary.
[0062] Reference to a mutant CsgG monomer comprising a variant of
the sequence shown in SEQ ID NO: 3 encompasses mutant CsgG monomers
comprising variants of sequences as set out in the further SEQ ID
NOS as disclosed below. Amino-acid substitutions, deletions and/or
additions may be made to CsgG monomers comprising a variant of the
sequence other than that shown in SEQ ID NO: 3 that are equivalent to
those substitutions, deletions and/or additions disclosed herein
with reference to a mutant CsgG monomer comprising a variant of the
sequence shown in SEQ ID NO: 3.
[0063] All publications, patents and patent applications cited
herein, whether supra or infra, are hereby incorporated by
reference in their entirety.
Definitions
[0064] Where an indefinite or definite article is used when
referring to a singular noun, e.g. "a", "an" or "the", this includes
a plural of that noun unless something else is specifically stated.
Where the term "comprising" is used in the present description and
claims, it does not exclude other elements or steps. Furthermore,
the terms first, second, third and the like in the description and
in the claims, are used for distinguishing between similar elements
and not necessarily for describing a sequential or chronological
order. It is to be understood that the terms so used are
interchangeable under appropriate circumstances and that the
embodiments of the invention described herein are capable of
operation in other sequences than described or illustrated herein.
The following terms or definitions are provided solely to aid in
the understanding of the invention. Unless specifically defined
herein, all terms used herein have the same meaning as they would
to one skilled in the art of the present invention. Practitioners
are particularly directed to Sambrook et al., Molecular Cloning: A
Laboratory Manual, 4.sup.th ed., Cold Spring Harbor Press,
Plainview, N.Y. (2012); and Ausubel et al., Current Protocols in
Molecular Biology (Supplement 114), John Wiley & Sons, New York
(2016), for definitions and terms of the art. The definitions
provided herein should not be construed to have a scope less than
understood by a person of ordinary skill in the art.
[0065] "About" as used herein when referring to a measurable value
such as an amount, a temporal duration, and the like, is meant to
encompass variations of ±20% or ±10%, more preferably ±5%,
even more preferably ±1%, and still more preferably ±0.1%
from the specified value, as such variations are appropriate to
perform the disclosed methods.
[0066] "Nucleotide sequence", "DNA sequence" or "nucleic acid
molecule(s)" as used herein refers to a polymeric form of
nucleotides of any length, either ribonucleotides or
deoxyribonucleotides. This term refers only to the primary
structure of the molecule. Thus, this term includes double- and
single-stranded DNA, and RNA. The term "nucleic acid" as used
herein, is a single or double stranded covalently-linked sequence
of nucleotides in which the 3' and 5' ends on each nucleotide are
joined by phosphodiester bonds. The polynucleotide may be made up
of deoxyribonucleotide bases or ribonucleotide bases. Nucleic acids
may be manufactured synthetically in vitro or isolated from natural
sources. Nucleic acids may further include modified DNA or RNA, for
example DNA or RNA that has been methylated, or RNA that has been
subjected to post-transcriptional modification, for example 5'-capping
with 7-methylguanosine, 3'-processing such as cleavage and
polyadenylation, and splicing. Nucleic acids may also include
synthetic nucleic acids (XNA), such as hexitol nucleic acid (HNA),
cyclohexene nucleic acid (CeNA), threose nucleic acid (TNA),
glycerol nucleic acid (GNA), locked nucleic acid (LNA) and peptide
nucleic acid (PNA). Sizes of nucleic acids, also referred to herein
as "polynucleotides", are typically expressed as the number of base
pairs (bp) for double stranded polynucleotides or, in the case of
single stranded polynucleotides, as the number of nucleotides (nt).
One thousand bp or nt equals a kilobase (kb). Polynucleotides of
less than around 40 nucleotides in length are typically called
"oligonucleotides" and may comprise primers for use in manipulation
of DNA such as via polymerase chain reaction (PCR).
[0067] "Gene" as used herein includes both the promoter region of the
gene as well as the coding sequence. It refers both to the genomic
sequence (including possible introns) as well as to the cDNA
derived from the spliced messenger, operably linked to a promoter
sequence.
[0068] "Coding sequence" is a nucleotide sequence, which is
transcribed into mRNA and/or translated into a polypeptide when
placed under the control of appropriate regulatory sequences. The
boundaries of the coding sequence are determined by a translation
start codon at the 5'-terminus and a translation stop codon at the
3'-terminus. A coding sequence can include, but is not limited to,
mRNA, cDNA, recombinant nucleotide sequences or genomic DNA, while
introns may be present as well under certain circumstances.
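The boundary rule above can be illustrated with a minimal open-reading-frame search on one strand of a DNA sequence. This is a simplified sketch, not a gene-prediction method. It assumes the standard genetic code (start codon ATG; stop codons TAA, TAG and TGA), scans only the given strand, and ignores introns and regulatory context. `find_coding_sequence` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_coding_sequence(dna: str, start_codon: str = "ATG"):
    """Return 0-based half-open (start, end) bounds of the first ORF, or None.

    The ORF runs from a start codon to the first in-frame stop codon
    (inclusive); if a start codon has no in-frame stop, the next one is tried.
    """
    dna = dna.upper()
    start = dna.find(start_codon)
    while start != -1:
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = dna.find(start_codon, start + 1)
    return None


seq = "GGATGAAATTTTAGCC"
bounds = find_coding_sequence(seq)
print(bounds, seq[bounds[0]:bounds[1]])  # (2, 14) ATGAAATTTTAG
```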
[0069] The term "amino acid" in the context of the present
disclosure is used in its broadest sense and is meant to include
organic compounds containing amine (NH.sub.2) and carboxyl (COOH)
functional groups, along with a side chain (e.g., an R group)
specific to each amino acid. In some embodiments, the amino acids
refer to naturally occurring L-α-amino acids or residues. The
commonly used one and three letter abbreviations for naturally
occurring amino acids are used herein: A=Ala; C=Cys; D=Asp; E=Glu;
F=Phe; G=Gly; H=His; I=Ile; K=Lys; L=Leu; M=Met; N=Asn; P=Pro;
Q=Gln; R=Arg; S=Ser; T=Thr; V=Val; W=Trp; and Y=Tyr (Lehninger, A.
L., (1975) Biochemistry, 2d ed., pp. 71-92, Worth Publishers, New
York). The general term "amino acid" further includes D-amino
acids, retro-inverso amino acids as well as chemically modified
amino acids such as amino acid analogues, naturally occurring amino
acids that are not usually incorporated into proteins such as
norleucine, and chemically synthesised compounds having properties
known in the art to be characteristic of an amino acid, such as
β-amino acids. For example, analogues or mimetics of
phenylalanine or proline, which allow the same conformational
restriction of the peptide compounds as do natural Phe or Pro, are
included within the definition of amino acid. Such analogues and
mimetics are referred to herein as "functional equivalents" of the
respective amino acid. Other examples of amino acids are listed by
Roberts and Vellaccio, The Peptides: Analysis, Synthesis, Biology,
Gross and Meienhofer, eds., Vol. 5, p. 341, Academic Press, Inc.,
N.Y. 1983, which is incorporated herein by reference.
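Using the one- and three-letter abbreviations listed above, a substitution written in one-letter notation (e.g. Q42R) can be rewritten in three-letter form. The sketch below is small and illustrative: the function name is hypothetical, and only the 20 standard residues listed above are supported.

```python
import re

# One-letter to three-letter codes for the 20 standard amino acids listed above.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(substitution: str) -> str:
    """Rewrite a one-letter substitution such as 'Q42R' as 'Gln42Arg'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"not a one-letter substitution: {substitution!r}")
    wild_type, position, new = match.groups()
    if wild_type not in ONE_TO_THREE or new not in ONE_TO_THREE:
        raise ValueError("non-standard residue code")
    return f"{ONE_TO_THREE[wild_type]}{position}{ONE_TO_THREE[new]}"


print(to_three_letter("Q42R"))  # Gln42Arg
print(to_three_letter("N17S"))  # Asn17Ser
```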
[0070] The terms "protein", "polypeptide", and "peptide" are
interchangeably used further herein to refer to a polymer of amino
acid residues and to variants and synthetic analogues of the same.
Thus, these terms apply to amino acid polymers in which one or more
amino acid residues is a synthetic non-naturally occurring amino
acid, such as a chemical analogue of a corresponding naturally
occurring amino acid, as well as to naturally-occurring amino acid
polymers. Polypeptides can also undergo maturation or
post-translational modification processes that may include, but are
not limited to: glycosylation, proteolytic cleavage, lipidization,
signal peptide cleavage, propeptide cleavage, phosphorylation, and
such like. By "recombinant polypeptide" is meant a polypeptide made
using recombinant techniques, e.g., through the expression of a
recombinant or synthetic polynucleotide. When the chimeric
polypeptide or biologically active portion thereof is recombinantly
produced, it is also preferably substantially free of culture
medium, e.g., culture medium represents less than about 20%, more
preferably less than about 10%, and most preferably less than about
5% of the volume of the protein preparation. By "isolated" is meant
material that is substantially or essentially free from components
that normally accompany it in its native state. For example, an
"isolated polypeptide", as used herein, refers to a polypeptide,
which has been purified from the molecules which flank it in a
naturally-occurring state, e.g., a protein complex or CsgF peptide
which has been removed from the molecules present in the production
host that are adjacent to said polypeptide. An isolated CsgF
peptide (optionally a truncated CsgF peptide) can be generated by
amino acid chemical synthesis or can be generated by recombinant
production. An isolated complex can be generated by in vitro
reconstitution after purification of the components of the complex,
e.g. a CsgG pore and the CsgF peptide(s), or can be generated by
recombinant co-expression.
[0071] "Orthologues" and "paralogues" encompass evolutionary
concepts used to describe the ancestral relationships of genes.
Paralogues are genes within the same species that have originated
through duplication of an ancestral gene; orthologues are genes
from different organisms that have originated through speciation,
and are also derived from a common ancestral gene.
[0072] "Homologue", "Homologues" of a protein encompass peptides,
oligopeptides, polypeptides, proteins and enzymes having amino acid
substitutions, deletions and/or insertions relative to the
unmodified or wild-type protein in question and having similar
biological and functional activity as the unmodified protein from
which they are derived. The term "amino acid identity" as used
herein refers to the extent that sequences are identical on an
amino acid-by-amino acid basis over a window of comparison. Thus, a
"percentage of sequence identity" is calculated by comparing two
optimally aligned sequences over the window of comparison,
determining the number of positions at which the identical amino
acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe,
Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in
both sequences to yield the number of matched positions, dividing
the number of matched positions by the total number of positions in
the window of comparison (i.e., the window size), and multiplying
the result by 100 to yield the percentage of sequence identity.
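The percentage-of-identity calculation described above can be sketched in Python. This is a minimal sketch, assuming the two sequences have already been optimally aligned elsewhere (the alignment step itself is not shown) and that gaps are written as '-'; a gap position counts toward the window size but never as a match. The function name `percent_identity` is illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over a window of comparison.

    Assumes both inputs are rows of an optimal alignment computed
    elsewhere, of equal length, with gaps written as '-'.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("the window of comparison is empty")
    # Matched positions: the same residue occurs in both sequences.
    matched = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    # Divide by the window size (every aligned position) and scale to 100.
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # Toy alignment: 4 matched positions in a 6-position window.
    print(round(percent_identity("MKV-LA", "MKVQLG"), 1))  # 66.7
```

Counting gap positions in the denominator follows the text's definition of the window size as the total number of positions in the window of comparison.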
[0073] The term "CsgG pore" defines a pore comprising multiple CsgG
monomers. Each CsgG monomer may be a wild-type monomer from E. coli
(SEQ ID NO: 3), a wild-type homologue of E. coli CsgG, such as, for
example, a monomer having any one of the amino acid sequences shown
in SEQ ID NOs: 68 to 88, or a variant of any thereof (e.g. a
variant of any one of SEQ ID NOs: 3 and 68 to 88). The variant CsgG
monomer may also be referred to as a modified CsgG monomer or a
mutant CsgG monomer. The modifications, or mutations, in the
variant include but are not limited to any one or more of the
modifications disclosed herein, or combinations of said
modifications.
[0074] For all aspects and embodiments of the present invention, a
CsgG homologue is referred to as a polypeptide that has at least
50%, 60%, 70%, 80%, 90%, 95% or 99% complete sequence identity to
wild-type E. coli CsgG as shown in SEQ ID NO: 3. A CsgG homologue
is also referred to as a polypeptide that contains the PFAM domain
PF03783, which is characteristic for CsgG-like proteins. A list of
presently known CsgG homologues and CsgG architectures can be found
at http://pfam.xfam.org//family/PF03783. Likewise, a CsgG
homologous polynucleotide can comprise a polynucleotide that has at
least 50%, 60%, 70%, 80%, 90%, 95% or 99% complete sequence
identity to wild-type E. coli CsgG as shown in SEQ ID NO: 1.
Examples of homologues of CsgG shown in SEQ ID NO:3 have the
sequences shown in SEQ ID NOS: 68 to 88.
[0075] The term "modified CsgF peptide" or "CsgF peptide" defines a
CsgF peptide that has been truncated from its C-terminal end (e.g.
is an N-terminal fragment) and/or is modified to include a cleavage
site. The CsgF peptide may be a fragment of wild-type E. coli CsgF
(SEQ ID NO: 5 or SEQ ID NO: 6), or of a wild-type homologue of E.
coli CsgF, such as, for example, a peptide comprising any one of the
amino acid sequences shown in SEQ ID NOs: 17 to 36, or a variant
(e.g. one modified to include a cleavage site) of any thereof.
[0076] For all aspects and embodiments of the present invention, a
CsgF homologue is referred to as a polypeptide that has at least
50%, 60%, 70%, 80%, 90%, 95% or 99% complete sequence identity to
wild-type E. coli CsgF as shown in SEQ ID NO: 6. In some
embodiments, a CsgF homologue is also referred to as a polypeptide
that contains the PFAM domain PF10614, which is characteristic for
CsgF-like proteins. A list of presently known CsgF homologues and
CsgF architectures can be found at
http://pfam.xfam.org//family/PF10614. Likewise, a CsgF homologous
polynucleotide can comprise a polynucleotide that has at least 50%,
60%, 70%, 80%, 90%, 95% or 99% complete sequence identity to
wild-type E. coli CsgF as shown in SEQ ID NO: 4. Examples of
truncated regions of homologues of CsgF shown in SEQ ID NO:6 have
the sequences shown in SEQ ID NOs:17 to 36.
[0077] The term "N-terminal portion of a CsgF mature peptide"
refers to a peptide having an amino acid sequence that corresponds
to the first 60, 50, or 40 amino acid residues starting from the
N-terminus of a CsgF mature peptide (without a signal sequence).
The CsgF mature peptide can be a wild-type or mutant (e.g., with
one or more mutations).
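A minimal sketch of how an N-terminal portion, as defined above, could be cut from a preprotein sequence. The signal-sequence length is passed in as a parameter because it is not stated in this passage, the example sequence is an arbitrary placeholder (not SEQ ID NO: 5 or 6), and the function name `n_terminal_portion` is hypothetical.

```python
def n_terminal_portion(preprotein: str, signal_len: int, n_residues: int = 40) -> str:
    """Return the first n_residues (40, 50 or 60) of the mature peptide.

    The mature peptide is modelled as the preprotein minus its first
    signal_len residues; signal_len must be supplied by the caller.
    """
    if n_residues not in (40, 50, 60):
        raise ValueError("expected an N-terminal portion of 40, 50 or 60 residues")
    if not 0 <= signal_len < len(preprotein):
        raise ValueError("signal sequence length is out of range")
    mature = preprotein[signal_len:]  # strip the signal sequence
    if len(mature) < n_residues:
        raise ValueError("mature peptide is shorter than the requested portion")
    return mature[:n_residues]


if __name__ == "__main__":
    # Placeholder sequence: a 5-residue "signal" followed by 70 residues.
    preprotein = "MKKAL" + "ACDEFGHIKL" * 7
    print(n_terminal_portion(preprotein, signal_len=5, n_residues=40))
```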
[0078] Sequence identity can also be to a fragment or portion of
the full length polynucleotide or polypeptide. Hence, a sequence
may have only 50% overall sequence identity with a full length
reference sequence, but a sequence of a particular region, domain
or subunit could share 80%, 90%, or as much as 99% sequence
identity with the reference sequence. Homology to the nucleic acid
sequence of SEQ ID NO: 1 for CsgG homologues or SEQ ID NO:4 for
CsgF homologues, respectively, is not limited simply to sequence
identity. Many nucleic acid sequences can demonstrate biologically
significant homology to each other despite having apparently low
sequence identity. Homologous nucleic acid sequences are considered
to be those that will hybridise to each other under conditions of
low stringency (M. R. Green, J. Sambrook, 2012, Molecular Cloning:
A Laboratory Manual, Fourth Edition, Books 1-3, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y.).
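The point that a region can share much higher identity than the full-length sequence can be illustrated with a sliding window over an existing alignment. This is a sketch assuming pre-aligned, equal-length sequences with '-' for gaps; the window length and the helper names `identity` and `best_window_identity` are illustrative choices, not values or methods taken from the text.

```python
from typing import Tuple


def identity(a: str, b: str) -> float:
    """Percent identity of two equal-length aligned segments ('-' = gap)."""
    matched = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matched / len(a)


def best_window_identity(aligned_a: str, aligned_b: str, window: int) -> Tuple[int, float]:
    """Return (start, identity) of the most identical window of a given length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not 0 < window <= len(aligned_a):
        raise ValueError("window must be between 1 and the alignment length")
    best_start, best_score = 0, -1.0
    # Slide the window one position at a time and keep the first maximum.
    for start in range(len(aligned_a) - window + 1):
        end = start + window
        score = identity(aligned_a[start:end], aligned_b[start:end])
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


if __name__ == "__main__":
    # Toy alignment: first half unrelated, second half identical.
    a = "AAAAAAAAAA" + "CDEFGHIKLM"
    b = "WWWWWWWWWW" + "CDEFGHIKLM"
    print(identity(a, b))                  # 50.0
    print(best_window_identity(a, b, 10))  # (10, 100.0)
```

This only addresses identity; the hybridisation-based notion of homology mentioned above is not modelled.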
[0079] The term "wild-type" refers to a gene or gene product
isolated from a naturally occurring source. A wild-type gene is
that which is most frequently observed in a population and is thus
arbitrarily designated the "normal" or "wild-type" form of the gene.
In contrast, the term "modified", "mutant" or "variant" refers to a
gene or gene product that displays modifications in sequence (e.g.,
substitutions, truncations, or insertions), post-translational
modifications and/or functional properties (e.g., altered
characteristics) when compared to the wild-type gene or gene
product. It is noted that naturally occurring mutants can be
isolated; these are identified by the fact that they have altered
characteristics when compared to the wild-type gene or gene
product. Methods for introducing or substituting
naturally-occurring amino acids are well known in the art. For
instance, methionine (M) may be substituted with arginine (R) by
replacing the codon for methionine (ATG) with a codon for arginine
(CGT) at the relevant position in a polynucleotide encoding the
mutant monomer. Methods for introducing or substituting
non-naturally-occurring amino acids are also well known in the art.
For instance, non-naturally-occurring amino acids may be introduced
by including synthetic aminoacyl-tRNAs in the IVTT system used to
express the mutant monomer. Alternatively, they may be introduced
by expressing the mutant monomer in E. coli that are auxotrophic
for specific amino acids in the presence of synthetic (i.e.
non-naturally-occurring) analogues of those specific amino acids.
They may also be produced by naked ligation if the mutant monomer
is produced using partial peptide synthesis. Conservative
substitutions replace amino acids with other amino acids of similar
chemical structure, similar chemical properties or similar
side-chain volume. The amino acids introduced may have similar
polarity, hydrophilicity, hydrophobicity, basicity, acidity,
neutrality or charge to the amino acids they replace.
Alternatively, the conservative substitution may introduce another
amino acid that is aromatic or aliphatic in the place of a
pre-existing aromatic or aliphatic amino acid. Conservative amino
acid changes are well-known in the art and may be selected in
accordance with the properties of the 20 main amino acids as
defined in Table 1 below. Where amino acids have similar polarity,
this can also be determined by reference to the hydropathy scale
for amino acid side chains in Table 2.
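The methionine-to-arginine example above (ATG replaced by CGT) can be sketched as a codon replacement in an in-frame coding sequence, with a translation step to confirm the result. The codon table is the standard genetic code; the function names and the toy sequence are hypothetical, and the rest of a mutagenesis workflow (primer design, cloning, expression) is not shown.

```python
BASES = "TCAG"
# Standard genetic code, ordered TCAG at each codon position ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence codon by codon."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def substitute_codon(cds: str, residue: int, new_codon: str, expected_wt: str = "") -> str:
    """Replace the codon encoding 1-based `residue` with `new_codon`."""
    if new_codon not in CODON_TABLE:
        raise ValueError(f"not a valid codon: {new_codon}")
    start = 3 * (residue - 1)
    if residue < 1 or start + 3 > len(cds):
        raise ValueError("residue position is outside the coding sequence")
    # Optional safety check that the wild-type residue is the one expected.
    if expected_wt and CODON_TABLE[cds[start:start + 3]] != expected_wt:
        raise ValueError(f"residue {residue} is not {expected_wt}")
    return cds[:start] + new_codon + cds[start + 3:]


if __name__ == "__main__":
    cds = "ATGGCTATGGGT"  # toy sequence encoding M-A-M-G
    mutant = substitute_codon(cds, residue=3, new_codon="CGT", expected_wt="M")
    print(mutant, translate(cds), translate(mutant))  # ATGGCTCGTGGT MAMG MARG
```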
TABLE 1 Chemical properties of amino acids
Ala  aliphatic, hydrophobic, neutral
Cys  polar, hydrophobic, neutral
Asp  polar, hydrophilic, charged (-)
Glu  polar, hydrophilic, charged (-)
Phe  aromatic, hydrophobic, neutral
Gly  aliphatic, neutral
His  aromatic, polar, hydrophilic, charged (+)
Ile  aliphatic, hydrophobic, neutral
Lys  polar, hydrophilic, charged (+)
Leu  aliphatic, hydrophobic, neutral
Met  hydrophobic, neutral
Asn  polar, hydrophilic, neutral
Pro  hydrophobic, neutral
Gln  polar, hydrophilic, neutral
Arg  polar, hydrophilic, charged (+)
Ser  polar, hydrophilic, neutral
Thr  polar, hydrophilic, neutral
Val  aliphatic, hydrophobic, neutral
Trp  aromatic, hydrophobic, neutral
Tyr  aromatic, polar, hydrophobic
TABLE 2 Hydropathy scale
Side chain  Hydropathy
Ile   4.5
Val   4.2
Leu   3.8
Phe   2.8
Cys   2.5
Met   1.9
Ala   1.8
Gly  -0.4
Thr  -0.7
Ser  -0.8
Trp  -0.9
Tyr  -1.3
Pro  -1.6
His  -3.2
Glu  -3.5
Gln  -3.5
Asp  -3.5
Asn  -3.5
Lys  -3.9
Arg  -4.5
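As a rough illustration, Tables 1 and 2 can be encoded as lookup tables and used to screen a proposed substitution. The decision rule is an assumption made for this sketch: aromatic-for-aromatic and aliphatic-for-aliphatic swaps are accepted, as the text describes; otherwise both residues must share the same charge class from Table 1 and differ in hydropathy by no more than a chosen threshold (1.0 here, an arbitrary value). Side-chain volume appears in neither table and is ignored. The name `looks_conservative` is hypothetical.

```python
# Table 1: property classes and charge ("+", "-" or "0" for neutral).
# Tyr has no charge entry in Table 1; it is treated as neutral here.
PROPERTIES = {
    "A": ({"aliphatic", "hydrophobic"}, "0"),
    "C": ({"polar", "hydrophobic"}, "0"),
    "D": ({"polar", "hydrophilic"}, "-"),
    "E": ({"polar", "hydrophilic"}, "-"),
    "F": ({"aromatic", "hydrophobic"}, "0"),
    "G": ({"aliphatic"}, "0"),
    "H": ({"aromatic", "polar", "hydrophilic"}, "+"),
    "I": ({"aliphatic", "hydrophobic"}, "0"),
    "K": ({"polar", "hydrophilic"}, "+"),
    "L": ({"aliphatic", "hydrophobic"}, "0"),
    "M": ({"hydrophobic"}, "0"),
    "N": ({"polar", "hydrophilic"}, "0"),
    "P": ({"hydrophobic"}, "0"),
    "Q": ({"polar", "hydrophilic"}, "0"),
    "R": ({"polar", "hydrophilic"}, "+"),
    "S": ({"polar", "hydrophilic"}, "0"),
    "T": ({"polar", "hydrophilic"}, "0"),
    "V": ({"aliphatic", "hydrophobic"}, "0"),
    "W": ({"aromatic", "hydrophobic"}, "0"),
    "Y": ({"aromatic", "polar", "hydrophobic"}, "0"),
}

# Table 2: side-chain hydropathy.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def looks_conservative(original: str, replacement: str, max_hydropathy_delta: float = 1.0) -> bool:
    """Heuristic screen for a conservative substitution (one-letter codes)."""
    classes_o, charge_o = PROPERTIES[original]
    classes_r, charge_r = PROPERTIES[replacement]
    # Aromatic-for-aromatic or aliphatic-for-aliphatic swaps are accepted.
    for group in ("aromatic", "aliphatic"):
        if group in classes_o and group in classes_r:
            return True
    # Otherwise require the same charge class and similar hydropathy.
    delta = abs(HYDROPATHY[original] - HYDROPATHY[replacement])
    return charge_o == charge_r and delta <= max_hydropathy_delta


if __name__ == "__main__":
    for pair in [("I", "V"), ("D", "E"), ("K", "R"), ("D", "K"), ("L", "D")]:
        print(pair, looks_conservative(*pair))
```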
[0080] A mutant or modified protein, monomer or peptide can also be
chemically modified in any way and at any site. A mutant or
modified monomer or peptide is preferably chemically modified by
attachment of a molecule to one or more cysteines (cysteine
linkage), attachment of a molecule to one or more lysines,
attachment of a molecule to one or more non-natural amino acids,
enzyme modification of an epitope or modification of a terminus.
Suitable methods for carrying out such modifications are well-known
in the art. The mutant or modified protein, monomer or peptide may
be chemically modified by the attachment of any molecule. For
instance, the mutant or modified protein, monomer or peptide may be
chemically modified by attachment of a dye or a fluorophore. In
some embodiments, the mutant or modified monomer or peptide is
chemically modified with a molecular adaptor that facilitates the
interaction between a pore comprising the monomer or peptide and a
target nucleotide or target polynucleotide sequence. The molecular
adaptor is preferably a cyclic molecule, a cyclodextrin, a species
that is capable of hybridization, a DNA binder or intercalator, a
peptide or peptide analogue, a synthetic polymer, an aromatic
planar molecule, a small positively-charged molecule or a small
molecule capable of hydrogen-bonding.
[0081] The presence of the adaptor improves the host-guest
chemistry of the pore and the nucleotide or polynucleotide sequence
and thereby improves the sequencing ability of pores formed from
the mutant monomer. The principles of host-guest chemistry are
well-known in the art. The adaptor has an effect on the physical or
chemical properties of the pore that improves its interaction with
the nucleotide or polynucleotide sequence. The adaptor may alter
the charge of the barrel or channel of the pore or specifically
interact with or bind to the nucleotide or polynucleotide sequence
thereby facilitating its interaction with the pore. Hence a
modified CsgF peptide, as provided in the disclosure, may be
coupled to enzymes or proteins providing better proximity of said
proteins or enzymes to the pore, which may facilitate certain
applications of the pore complex comprising the modified CsgF
peptide.
[0082] In this context, proteins can also be fusion proteins,
referring in particular to genetic fusion, made e.g., by
recombinant DNA technology. Proteins can also be conjugated; the
term "conjugated to", as used herein, refers in particular to
chemical and/or enzymatic conjugation resulting in a stable
covalent link.
[0083] Proteins may form a protein complex when several
polypeptides or protein monomers bind to or interact with each
other. "Binding" means any interaction, be it direct or indirect. A
direct interaction implies a contact between the binding partners,
for instance through a covalent link or coupling. An indirect
interaction means any interaction whereby the interaction partners
interact in a complex of more than two compounds. The interaction
can be completely indirect, with the help of one or more bridging
molecules, or partly indirect, where there is still a direct
contact between the partners, which is stabilized by the additional
interaction of one or more compounds. The "complex" as referred to
in this disclosure is defined as a group of two or more associated
proteins, which might have different functions. The association
between the different polypeptides of the protein complex might be
via non-covalent interactions, such as hydrophobic or ionic forces,
or may as well be a covalent binding or coupling, such as
disulphide bridges or peptide bonds. Covalent "binding" or
"coupling" are used interchangeably herein, and may also involve
"cysteine coupling" or "reactive or photoreactive amino acid
coupling", referring to a bioconjugation between cysteines or
between (photo)reactive amino acids, respectively, which is a
chemical covalent link to form a stable complex. Examples of
reactive and photoreactive amino acids include azidohomoalanine,
homopropargylglycine, homoallylglycine, p-acetyl-Phe, p-azido-Phe,
p-propargyloxy-Phe and p-benzoyl-Phe (Wang et al. 2012, in Protein
Engineering, DOI: 10.5772/28719; Chin et al. 2002, Proc. Nat. Acad.
Sci. USA 99(17); 11020-24).
[0084] A "biological pore" is a transmembrane protein structure
defining a channel or hole that allows the translocation of
molecules and ions from one side of the membrane to the other. The
translocation of ionic species through the pore may be driven by an
electrical potential difference applied to either side of the pore.
A "nanopore" is a biological pore in which the minimum diameter of
the channel through which molecules or ions pass is on the order of
nanometres (10^-9 m). In some embodiments, the biological
pore can be a transmembrane protein pore. The transmembrane protein
structure of a biological pore may be monomeric or oligomeric in
nature. Typically, the pore comprises a plurality of polypeptide
subunits arranged around a central axis thereby forming a
protein-lined channel that extends substantially perpendicular to
the membrane in which the nanopore resides. The number of
polypeptide subunits is not limited. Typically, the number of
subunits is from 5 to 30; suitably, the number of subunits is
from 6 to 10. Alternatively, the number of subunits is not fixed,
as in the case of perfringolysin or related large membrane pores.
The portions of the protein subunits within the nanopore that form
the protein-lined channel typically comprise secondary structural
motifs that may include one or more transmembrane β-barrel
and/or α-helix sections.
[0085] The terms "pore", "pore complex" and "complex pore", as used
interchangeably herein, refer to an oligomeric pore, wherein for
instance at least a CsgG monomer (including, e.g., one or more CsgG
monomers such as two or more CsgG monomers, three or more CsgG
monomers) or a CsgG pore (comprised of CsgG monomers), and a CsgF
peptide (e.g., a modified or truncated CsgF peptide) are associated
in the complex and together form a pore or a nanopore. The pore
complex of the disclosure has the features of a biological pore,
i.e. it has a typical transmembrane protein structure. When the
pore complex is provided in an environment having membrane
components, membranes, cells, or an insulating layer, the pore
complex will insert in the membrane or the insulating layer, and
form a "transmembrane pore complex".
[0086] The pore complex or transmembrane pore complex of the
disclosure is suited for analyte characterization. In some
embodiments, the pore complex or transmembrane complex described
herein can be used for sequencing polynucleotide sequences e.g.,
because it can discriminate between different nucleotides with a
high degree of sensitivity. The pore complex of the disclosure may
be an isolated pore complex, substantially isolated, purified or
substantially purified. A pore complex of the disclosure is
"isolated" or purified if it is completely free of any other
components, such as lipids or other pores, or other proteins with
which it is normally associated in its native state (e.g., CsgE,
CsgA or CsgB), or if it is sufficiently enriched from a membranous
compartment. A pore complex is substantially isolated if it is
mixed with carriers or diluents which will not interfere with its
intended use. For instance, a pore complex is substantially
isolated or substantially purified if it is present in a form that
comprises less than 10%, less than 5%, less than 2% or less than 1%
of other components, such as triblock copolymers, lipids or other
pores. Alternatively, a pore complex of the disclosure may be a
transmembrane pore complex, when present in a membrane. The
disclosure provides isolated pore complexes comprising a
homo-oligomeric pore derived from CsgG comprising identical mutant
monomers, which may also contain a mutant form of the CsgG monomer,
as a homologue thereof. Alternatively, an isolated pore complex
comprising a hetero-oligomeric CsgG pore is provided, which can be
CsgG pore consisting of mutant and wild-type CsgG monomers, or of
different forms of CsgG variants, mutants or homologues. The
isolated pore complex typically comprises at least 7, at least 8,
at least 9 or at least 10 CsgG monomers, and 1 or more (modified)
CsgF peptides, such as 2, 3, 4, 5, 6, 7, 8, 9, 10 CsgF peptides.
The pore complex may comprise any ratio of CsgG monomer:CsgF
peptide. In one embodiment, the ratio of CsgG monomer:CsgF peptide
is 1:1.
[0087] The "constriction", "orifice", "constriction region",
"channel constriction", or "constriction site", as used
interchangeably herein, refers to an aperture defined by a luminal
surface of a pore or pore complex, which acts to allow the passage
of ions and target molecules (e.g., but not limited to
polynucleotides or individual nucleotides) but not other non-target
molecules through the pore complex channel. In some embodiments,
the constriction(s) are the narrowest aperture(s) within a pore or
pore complex. In this embodiment, the constriction(s) may serve to
limit the passage of molecules through the pore. The size of the
constriction is typically a key factor in determining suitability
of a nanopore for nucleic acid sequencing applications. If the
constriction is too small, the molecule to be sequenced will not be
able to pass through. However, to achieve a maximal effect on ion
flow through the channel, the constriction should not be too large.
For example, the constriction should not be wider than the
solvent-accessible transverse diameter of a target analyte.
Ideally, any constriction should be as close as possible in
diameter to the transverse diameter of the analyte passing through.
For sequencing of nucleic acids and nucleic acid bases, suitable
constriction diameters are in the nanometre range (10^-9 m).
Suitably, the diameter should be in the region of 0.5 to
2.0 nm; typically, the diameter is in the region of 0.7 to 1.2 nm.
The constriction in wild-type E. coli CsgG has a diameter of
approximately 9 Å (0.9 nm). The CsgF constriction formed in the
pore complex comprising the CsgG-like pore and the modified CsgF
peptide, or homologues or mutants thereof, has a diameter in the
range of 0.5 to 2 nm or in the range of 0.7 to 1.2 nm and is hence
suitable for nucleic acid sequencing.
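The diameter ranges given above can be expressed as a simple screen. This sketch encodes only the numbers stated in this paragraph (0.5 to 2.0 nm as the suitable range, 0.7 to 1.2 nm as the typical range, and 1 Å = 0.1 nm); the function name and the returned labels are hypothetical, and the further criterion that a constriction should not be wider than the transverse diameter of the specific analyte is not modelled.

```python
ANGSTROM_TO_NM = 0.1  # 1 Å = 0.1 nm


def classify_constriction(diameter_nm: float) -> str:
    """Place a constriction diameter within the ranges stated in the text."""
    if diameter_nm <= 0:
        raise ValueError("diameter must be positive")
    if 0.7 <= diameter_nm <= 1.2:
        return "typical"   # 0.7 to 1.2 nm
    if 0.5 <= diameter_nm <= 2.0:
        return "suitable"  # 0.5 to 2.0 nm
    return "outside stated range"


if __name__ == "__main__":
    # Wild-type E. coli CsgG constriction, approximately 9 Å.
    print(classify_constriction(9 * ANGSTROM_TO_NM))  # typical
```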
[0088] When two or more constrictions are present and spaced apart,
each constriction may interact with, or "read", separate nucleotides
within the nucleic acid strand at the same time. In this situation,
the reduction in ion flow through the channel will be the result of
the combined restriction in flow of all the constrictions
containing nucleotides. Hence, in some instances a double
constriction may lead to a composite current signal. In certain
circumstances, the current read-out for one constriction, or
"reading head", may not be able to be determined individually when
two such reading heads are present. The constriction of wildtype E.
coli CsgG (SEQ ID NO:3) is composed of two annular rings formed by
juxtaposition of tyrosine residues at position 51 (Tyr 51) in the
adjacent protein monomers, and also the phenylalanine and
asparagine residues at positions 56 and 55 respectively (Phe 56 and
Asn 55) (FIG. 1). In most cases, the wild-type pore structure of
CsgG is re-engineered via recombinant genetic techniques to
widen, alter, or remove one of the two annular rings that make up
the CsgG constriction (mentioned as "CsgG channel constriction"
herein), to leave a single well-defined reading head. The
constriction motif in the CsgG oligomeric pore is located at amino
acid residues at position 38 to 63 in the wild type monomeric E.
coli CsgG polypeptide, depicted in SEQ ID NO: 3. In considering
this region, mutations at any of the amino acid residue positions
50 to 53, 54 to 56 and 58 to 59, together with the key positioning of
the side chains of Tyr51, Asn55 and Phe56 within the channel of the
wild-type CsgG structure, were shown to be advantageous in order to
modify or alter the characteristics of the reading head. The
present disclosure relating to a pore complex comprising a
CsgG-pore and a modified CsgF peptide, or homologues or mutants
thereof, surprisingly added another constriction (mentioned as
"CsgF channel constriction" herein) to the CsgG-containing pore
complex, forming a suitable additional, second reader head in the
pore, via complex formation with the modified CsgF peptide. Said
additional CsgF channel constriction or reader head is positioned
adjacent to the constriction loop of the CsgG pore, or of the
mutated CsgG pore. Said additional CsgF channel constriction or
reader head is positioned approximately 10 nm or less, such as 5 nm
or less, such as 1, 2, 3, 4, 5, 6, 7, 8 or 9 nm, from the constriction
loop of the CsgG pore, or of the mutated CsgG pore. The pore
complex or transmembrane pore complex of the disclosure includes
pore complexes with two reader heads, meaning channel
constrictions positioned in such a way as to provide a suitable
separate reader head without interfering with the accuracy of other
constriction channel reader heads. Said pore complexes therefore
may include CsgG mutant pores (see incorporated references
WO2016/034591, WO2017/149316, WO2017/149317, WO2017/149318 and
International patent application no. PCT/GB2018/051191 each of
which lists mutations to the wild-type CsgG pore that improve the
properties of the pore) as well as wild-type CsgG pores, or
homologues thereof, together with a modified CsgF peptide, or
homologue or mutant thereof, wherein said CsgF peptide has another
constriction channel forming a reader head.
[0089] Pores
[0090] The invention relates to CsgG pores complexed with an
extracellularly located CsgF peptide that surprisingly introduces
an additional channel constriction or reader head in the pore
complex. Moreover, the disclosure provides positional information
for the constriction made by the CsgF peptide within the pore
complex, the peptide being inserted in the lumen of the CsgG pore,
and the constriction site being in the N-terminal part of the CsgF
protein. Furthermore, modified or truncated CsgF peptides of the
disclosure were shown to be sufficient for pore complex formation,
and provide means and methods for biosensing applications. The
disclosure comprises wildtype and mutant CsgG pores (as disclosed
in e.g., WO2016/034591, WO2017/149316, WO2017/149317, WO2017/149318
and International patent application no. PCT/GB2018/051191), or
homologues or mutants thereof, combined with modified or truncated
CsgF peptides and their mutants or homologues, all together improving
the ability of the CsgG-like pore complex to interact with an
analyte, such as a polynucleotide. The additional constriction
introduced in the CsgG-like nanopore channel by complex formation
with (modified or truncated) CsgF peptides expands the contact
surface with passing analytes and can act as a second reader head
for analyte detection and characterization. Pores comprising mutant
CsgG monomers combined with novel mutant or modified forms of CsgF
can improve the characterisation of analytes, such as
polynucleotides, by providing a more discriminating direct
relationship between the observed current as the polynucleotide
moves through the pore and the polynucleotide sequence. In particular, by having two stacked reader
heads spaced at a defined distance, the CsgG:CsgF pore complex may
facilitate characterization of polynucleotides that contain at
least one homopolymeric stretch, e.g., several consecutive copies
of the same nucleotide that otherwise exceed the interaction length
of the single CsgG reader head. Additionally, by having two stacked
constrictions at a defined distance, small molecule analytes
including organic or inorganic drugs and pollutants passing through
the CsgG:CsgF complex pore will consecutively pass two independent
reader heads. The chemical nature of either reader head can be
independently modified, each giving unique interaction properties
with the analyte, thus providing additional discriminating power
during analyte detection.
[0091] In a first aspect, the invention relates to an isolated pore
complex, comprising a CsgG pore, or a homologue or mutant thereof,
or a CsgG-like pore, and a modified CsgF peptide, or a homologue or
mutant thereof. In other words, the disclosure relates to a modified CsgG
biological pore comprising a modified CsgF peptide, which can be a
truncated, mutant and/or variant form thereof. In one embodiment, the
interaction region between said modified CsgF peptide, or homologue
or mutant thereof, and the CsgG pore is located in the lumen of the
CsgG pore, or of its homologues or mutants. In another embodiment,
the pore complex has two or more constriction sites or reader heads,
at least one provided by a constriction of the CsgG pore and at
least one introduced by the CsgF peptide forming a complex with the
CsgG pore. N-terminal CsgF fragments including positions in the
range of amino acid residues 39-64 of SEQ ID NO:5, or more
particularly of amino acid residues 49-64 of SEQ ID NO:5, were
shown to allow detectable amounts of a stable CsgG:CsgF complex. In
one embodiment, the CsgF constriction produced by a modified CsgF
peptide (e.g., the ones described herein) is adjacent to, or
head-to-head with, the first constriction in the CsgG pore of the pore
complex. For CsgG or CsgG-like protein pores, the constriction site
has been determined to be formed by a loop region of a beta strand
(see FIG. 1).
[0092] In one embodiment, the modified CsgF peptide is a peptide
wherein said modification in particular refers to a truncated CsgF
protein or fragment, comprising an N-terminal CsgF peptide fragment
defined by the limitation to contain the constriction region and to
bind CsgG monomers, or homologues or mutants thereof. Said modified
CsgF peptide may additionally comprise mutations or homologous
sequences, which may facilitate certain properties of the pore
complex. In a particular embodiment, modified CsgF peptides
comprise CsgF protein truncations as compared to the wild-type
preprotein (SEQ ID NO:5) or mature protein (SEQ ID NO:6) sequence,
or homologues thereof. These modified peptides are intended to
function as a pore complex component introducing an additional
constriction site or reader head, within the CsgG-like pore formed
by CsgG and the modified or truncated CsgF peptide. Examples of
truncated modified peptides are described below.
[0093] Examples of homologues of the modified CsgF peptides are for
instance determined in Example 3, and reveal CsgF-like proteins or
CsgF peptides comprising a homologous or similar constriction
region in different bacterial strains, which may be useful for
forming similar pore complexes. The structural properties and
CsgG-binding elements in the CsgF peptides derived from various
CsgF homologues are conserved, such that CsgF peptides can be used
in combination with different wildtype or mutant CsgG pores. This
includes complexes of CsgG pores with non-cognate CsgF, meaning
that the CsgG pore and the parental CsgF homologue from which the
CsgF is derived do not need to originate from the same operon,
bacterial species or strain.
[0094] In alternative embodiments, the CsgG pore within the pore
complex is not a wild-type pore, but comprises mutations or
modifications that further improve pore properties. The isolated
pore complex of the disclosure, formed by the CsgG pore, or a
homologue thereof, and the modified CsgF peptide, or a homologue
thereof, may comprise the wild-type form of the CsgG pore or a
CsgG pore that is further modified, such as by directed
mutagenesis of particular amino acid residues, to further enhance
the desired properties of the CsgG pore for use within the pore
complex. For example, in embodiments of the present invention
mutations are contemplated to alter the number, size, shape,
placement or orientation of the constriction within the channel.
The pore complex comprising a modified mutant CsgG pore may be
prepared by known genetic engineering techniques that result in the
insertion, substitution and/or deletion of specific targeted amino
acid residues in the polypeptide sequence. In the case of the
oligomeric CsgG pore, the mutations may be made in each monomeric
polypeptide subunit, or any one of the monomers, or all of the
monomers. Suitably, in one embodiment of the invention the
mutations described are made to all monomeric polypeptides within
the oligomeric protein structure. A mutant CsgG monomer is a
monomer whose sequence varies from that of a wild-type CsgG monomer
and which retains the ability to form a pore. Methods for
confirming the ability of mutant monomers to form pores are
well-known in the art. The disclosure comprises wild type and
mutant CsgG pores (e.g., as disclosed in WO2016/034591,
WO2017/149316, WO2017/149317, WO2017/149318 and International
patent application no. PCT/GB2018/051191), or homologues thereof,
combined with modified or truncated CsgF peptides and their mutants
or homologues, all together improving the ability of the CsgG-like
pore complex to interact with an analyte, such as a polynucleotide.
Mutant CsgG pores may comprise one or more mutant monomers. The
CsgG pore may be a homopolymer comprising identical monomers, or a
heteropolymer comprising two or more different monomers. The
monomers may have one or more of the mutations described below in
any combination.
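The distinction between homo-oligomeric and hetero-oligomeric pores can be sketched by applying point substitutions to monomer sequences. This sketch assumes the common "Y51A"-style notation (wild-type residue, 1-based position, replacement residue), which is not defined in this passage; the toy sequence, the nine-subunit count and the helper names `apply_mutations` and `is_homo_oligomer` are placeholders, not the CsgG sequence of SEQ ID NO: 3 or a method from the disclosure.

```python
import re
from typing import List, Sequence

# Assumed notation: wild-type residue, 1-based position, replacement (e.g. "Y51A").
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence: str, mutations: Sequence[str]) -> str:
    """Apply point substitutions to one monomer sequence."""
    residues: List[str] = list(sequence)
    for mutation in mutations:
        match = MUTATION.match(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {mutation}")
        # Check the wild-type residue before substituting.
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)


def is_homo_oligomer(subunits: Sequence[str]) -> bool:
    """True if every subunit sequence in the pore is identical."""
    return len(set(subunits)) == 1


if __name__ == "__main__":
    wild_type = "MKVYLA"  # placeholder, not SEQ ID NO: 3
    mutant = apply_mutations(wild_type, ["Y4F"])
    homo = [mutant] * 9                  # same mutation in every monomer
    hetero = [mutant] * 8 + [wild_type]  # mixed mutant and wild-type monomers
    print(is_homo_oligomer(homo), is_homo_oligomer(hetero))  # True False
```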
[0095] In certain embodiments, the modified CsgF peptide in the
nanopore complex differs from the wild-type CsgF protein depicted in
SEQ ID NO:6 in that it comprises only N-terminal fragments or
truncates of the wild-type CsgF protein. The modified CsgF peptide
may, however, additionally or alternatively be a mutated CsgF
peptide, in which amino acid substitutions are made to provide a
better second constriction site in the pore formed by the complex
comprising the CsgG pore and the modified CsgF peptide. The mutant
peptides may therefore have improved polynucleotide reading
properties when said complex is used in nucleotide sequencing, i.e.
display improved polynucleotide capture and nucleotide
discrimination, in addition to the improved feature of the complex
comprising two reader heads. In particular, pores constructed from the mutant peptides
capture nucleotides and polynucleotides more easily than the wild
type. In addition, pores constructed from the mutant peptides may
display an increased current range, which makes it easier to
discriminate between different nucleotides, and a reduced variance
of states, which increases the signal-to-noise ratio. In addition,
the number of nucleotides contributing to the current as the
polynucleotide moves through pores constructed from the mutants may
be decreased. This makes it easier to identify a direct
relationship between the observed current as the polynucleotide
moves through the pore and the polynucleotide sequence. In
addition, pores constructed from the mutant peptides may display an
increased throughput, e.g., are more likely to interact with an
analyte, such as a polynucleotide. This makes it easier to
characterise analytes using the pores. Pores constructed from the
mutant peptides may insert into a membrane more easily, or may
provide an easier way to retain additional proteins in close
vicinity of the pore complex.
[0096] In an alternative embodiment, the CsgF constriction site
provided in the pore complex of the invention has a diameter in the
range of 0.5 nm to 2.0 nm, thereby providing a pore complex
suitable for nucleic acid sequencing, as described above.
[0097] The pore may be stabilised by covalent attachment of the
CsgF peptide to the CsgG pore. The covalent linkage may for example
be a disulphide bond or a linkage formed by click chemistry. The CsgF peptide and CsgG
pore may, for example, be covalently linked via residues at a
position corresponding to one or more of the following pairs of
positions of SEQ ID NO: 6 and SEQ ID NO: 3, respectively: 1 and
153, 4 and 133, 5 and 136, 8 and 187, 8 and 203, 9 and 203, 11 and
142, 11 and 201, 12 and 149, 12 and 203, 26 and 191, and 29 and
144.
[0098] In the pore, the interaction between the CsgF peptide and
the CsgG pore may, for example, be stabilised by hydrophobic
interactions or electrostatic interactions at a position
corresponding to one or more of the following pairs of positions of
SEQ ID NO: 6 and SEQ ID NO: 3, respectively: 1 and 153, 4 and 133,
5 and 136, 8 and 187, 8 and 203, 9 and 203, 11 and 142, 11 and 201,
12 and 149, 12 and 203, 26 and 191, and 29 and 144.
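The residue pairs listed in paragraphs [0097] and [0098] can be kept as a small lookup table when planning cysteine or (photo)reactive substitutions. The Python sketch below is illustrative only: it assumes that residues 1 to 45 of SEQ ID NO: 6 are those of the CsgF-(1-45) entry (SEQ ID NO: 40) in Table 3, the names `CONTACT_PAIRS`, `label_pairs` and `partners_by_csgf_position` are hypothetical, and CsgG residue identities are not shown because SEQ ID NO: 3 is not reproduced in this section.

```python
# Residues 1-45 of mature CsgF (SEQ ID NO: 6), taken from the CsgF-(1-45)
# entry (SEQ ID NO: 40) of Table 3. Assumption for illustration.
CSGF_1_45 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIET"

# (CsgF position in SEQ ID NO: 6, CsgG position in SEQ ID NO: 3), 1-based,
# as listed in paragraphs [0097] and [0098].
CONTACT_PAIRS = [
    (1, 153), (4, 133), (5, 136), (8, 187), (8, 203), (9, 203),
    (11, 142), (11, 201), (12, 149), (12, 203), (26, 191), (29, 144),
]


def label_pairs(csgf_seq, pairs):
    """Return labels such as 'G1-153': CsgF wild-type residue and position,
    followed by the CsgG partner position (CsgG residue not available)."""
    labels = []
    for f_pos, g_pos in pairs:
        residue = csgf_seq[f_pos - 1]  # 1-based position -> 0-based index
        labels.append(f"{residue}{f_pos}-{g_pos}")
    return labels


def partners_by_csgf_position(pairs):
    """Group the CsgG partner positions under each CsgF position."""
    grouped = {}
    for f_pos, g_pos in pairs:
        grouped.setdefault(f_pos, []).append(g_pos)
    return grouped


for label in label_pairs(CSGF_1_45, CONTACT_PAIRS):
    print(label)
```

With this excerpt, the CsgF side of every pair resolves to G1, T4, F5, R8, N9, N11, F12, A26 or Q29, the same positions listed in paragraphs [0114] and [0115].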
[0099] The residues in CsgF and/or CsgG at one or more of the
positions listed above may be modified in order to enhance the
interaction between CsgG and CsgF in the pore.
[0100] In one embodiment, the pore of the invention may be
isolated, substantially isolated, purified or substantially
purified. A pore of the invention is isolated or purified if it is
completely free of any other components, such as lipids or other
pores. A pore is substantially isolated if it is mixed with
carriers or diluents which will not interfere with its intended
use. For instance, a pore is substantially isolated or
substantially purified if it is present in a form that comprises
less than 10%, less than 5%, less than 2% or less than 1% of other
components, such as triblock copolymers, lipids or other pores.
Alternatively, a pore of the invention may be present in a
membrane. Suitable membranes are discussed below.
[0101] A pore of the invention may be present as an individual or
single pore. Alternatively, a pore of the invention may be present
in a homologous or heterologous population of two or more
pores.
CsgF Peptide
[0102] A second aspect of the invention relates to novel modified
CsgF monomers (peptides), or truncated CsgF proteins, or a modified
or truncated peptide of a CsgF homologue or mutant. These novel
modified CsgF peptides may be used in a pore complex to integrate a
second or additional reader head. Said modification or truncation
preferably results in a fragment of the wild-type CsgF protein, or
of a mutant or homologue CsgF protein, more preferably an N-terminal
fragment.
[0103] Mature CsgF (shown in SEQ ID NO:6) can be divided into three
main regions: a "CsgF constriction peptide" (FCP), a "neck" region
and a "head" region (as shown in FIGS. 4 and 5). The "head" region
of the CsgF peptide is distinct from a reader head of a pore as
described herein. The "head" region of the CsgF peptide may also be
referred to as the "C-terminal head domain".
[0104] The FCP forms the contact region with the CsgG
β-barrel, where it gives rise to an additional constriction.
The neck region protrudes out of the β-barrel. In the
CsgG:CsgF oligomer it forms a thin-walled hollow tube that connects
the FCP to a globular head region.
[0105] Based on multiple sequence alignments (FIG. 8),
co-purification experiments (FIG. 9) and the cryoEM reconstruction
of the CsgG:CsgF complex at 3.4 Å resolution (FIG. 11), the CsgF
constriction peptide, neck and head regions can be defined as three
consecutive residue stretches in mature CsgF.
[0106] The FCP spans from approximately residues 1 to 35 of the
mature CsgF (SEQ ID NO:6). The FCP forms the most conserved region
of the protein when comparing different CsgF orthologues (FIGS. 8
and 10). CryoEM 3D reconstruction shows that the FCP forms a
well-defined structure that binds inside the CsgG β-barrel
through non-covalent contacts with the CsgG transmembrane hairpins
TM1 (residues 134 to 154 of SEQ ID NO:3) and TM2 (residues 184 to
208 of SEQ ID NO:3) (FIGS. 1E and 11) (TM1 and TM2 are defined in
Goyal P, et al., 2014). In the reconstruction, nine copies of the
FCP bind the CsgG oligomer (comprising 9 monomers) and together
give rise to an additional constriction approximately 2 nm above
the top of the CsgG constriction, which is formed by a continuous
loop spanning residues 46 to 61 of mature CsgG (SEQ ID NO:3; FIG.
1E; FIG. 11).
[0107] The cryoEM 3D reconstruction also shows that the CsgF
N-terminal residue binds near the base or the top (depending on the
orientation) of the CsgG β-barrel and that CsgF exits the β-barrel
at residue 32. This is in good agreement with MD simulations of the
average contact time of residue pairs in the CsgG:CsgF binding
interface (Table 4). The cryoEM structure and MD simulations show
that residues 33-34 reside outside the CsgG β-barrel, where a well,
though not strictly, conserved Pro residue (Pro 35 in SEQ ID NO:6)
makes the transition to the CsgF neck region. The CsgF neck is not
resolved in atomic detail in the CsgG:CsgF 3D reconstruction,
pointing to its conformational flexibility. The CsgF neck is
predicted to span from residue 36 to
approximately residue 50 (SEQ ID NO:6) based on multiple sequence
alignment and secondary structure prediction. The CsgF head region
forms the C-terminal part of CsgF and is predicted to span from
approximately residue 51 to the CsgF C-terminus. In the CsgG:CsgF
complex, this region oligomerizes to give rise to a globular
structure that appears to cap the CsgG:CsgF channel (FIGS. 4, 5).
Multiple sequence alignment of CsgF orthologues shows the CsgF neck
to be the least conserved region and suggests it may vary in length
from one orthologue to another (FIG. 8).
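The approximate domain boundaries described above can be expressed as a lookup that reports which regions an N-terminal truncation keeps. This is a minimal Python sketch, assuming the boundaries stated in paragraphs [0106] and [0107] (FCP 1 to 35, neck 36 to about 50, head from about 51 to the C-terminus). Because the neck and head limits are given only approximately and the C-terminus is not reproduced here, the exact cut-offs and the helper names are illustrative assumptions.

```python
# Approximate domains of mature CsgF (SEQ ID NO: 6), 1-based and inclusive.
# The head runs to the C-terminus, which is not given here (end = None).
DOMAINS = [
    ("FCP", 1, 35),
    ("neck", 36, 50),
    ("head", 51, None),
]


def domain_of(position):
    """Return the domain containing a 1-based residue position."""
    for name, start, end in DOMAINS:
        if position >= start and (end is None or position <= end):
            return name
    raise ValueError(f"position {position} is outside mature CsgF")


def coverage_of_truncation(last_residue):
    """Describe which domains an N-terminal fragment 1..last_residue keeps.

    Returns a dict mapping domain name to 'full' or 'partial'; domains
    that are lost entirely are omitted.
    """
    kept = {}
    for name, start, end in DOMAINS:
        if last_residue < start:
            continue  # the fragment stops before this domain begins
        if end is not None and last_residue >= end:
            kept[name] = "full"
        else:
            kept[name] = "partial"
    return kept


print(coverage_of_truncation(45))
```

For example, a 1-45 fragment keeps the whole FCP and part of the neck, i.e. it lacks the head and part of the neck domain.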
[0108] The CsgF peptide which forms part of the invention is a
truncated CsgF peptide lacking the C-terminal head; lacking the
C-terminal head and a part of the neck domain of CsgF (e.g., the
truncated CsgF peptide may comprise only a portion of the neck
domain of CsgF); or lacking the C-terminal head and neck domains of
CsgF. The CsgF peptide may lack part of the CsgF neck domain, i.e.
the CsgF peptide may comprise only a portion of the neck domain,
starting from amino acid residue 36 at the N-terminal end of the
neck domain (see SEQ ID NO:6), e.g. residues 36-40, 36-41, 36-42,
36-43, 36-45 or 36-46, up to residues 36-50 or 36-60 of SEQ ID
NO: 6. The CsgF peptide preferably comprises a CsgG-binding region
and a region that forms a constriction in the pore. The
CsgG-binding region typically comprises residues 1 to 8 and/or 29
to 32 of the CsgF protein (SEQ ID NO: 6 or a homologue from another
species) and may include one or more modifications. The region that
forms a constriction in the pore typically comprises residues 9 to
28 of the CsgF protein (SEQ ID NO: 6 or a homologue from another
species) and may include one or more modifications. Residues 9 to
17 comprise the conserved motif N₉PXFGGXXX₁₇ and form a
turn region. Residues 9 to 28 form an alpha-helix. X₁₇ (N17 in
SEQ ID NO: 6) forms the apex of the constriction region,
corresponding to the narrowest part of the CsgF constriction in the
pore. The CsgF constriction region also makes stabilising contacts
with the CsgG beta-barrel, primarily at residues 9, 11, 12, 18, 21
and 22 of SEQ ID NO: 6.
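The sub-regions and the conserved turn motif (residues 9 to 17) described in paragraph [0108] can be located programmatically, for example to confirm that a designed variant still carries the motif and to see which residue sits at the constriction apex. In this minimal Python sketch, the template is the CsgF-(1-35) sequence (SEQ ID NO: 55) from Table 3, each X of the motif is treated as any residue, and the function names are hypothetical.

```python
import re

# Residues 1-35 of mature CsgF (SEQ ID NO: 55 in Table 3).
CSGF_1_35 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDP"

# Turn motif N9-P-X-F-G-G-X-X-X17 as a regular expression; '.' = any residue.
TURN_MOTIF = re.compile(r"NP.FGG...")


def residues(seq, start, end):
    """Return residues start..end (1-based, inclusive)."""
    return seq[start - 1:end]


def check_fcp(seq):
    """Locate the turn motif and report the sub-regions named in [0108]."""
    match = TURN_MOTIF.search(seq)
    if match is None:
        raise ValueError("turn motif not found")
    motif_start = match.start() + 1   # back to 1-based numbering
    apex = motif_start + 8            # X17 is the ninth residue of the motif
    return {
        "motif_start": motif_start,
        "apex": f"{seq[apex - 1]}{apex}",
        "csgg_binding_1_8": residues(seq, 1, 8),
        "constriction_9_28": residues(seq, 9, 28),
        "csgg_binding_29_32": residues(seq, 29, 32),
    }


print(check_fcp(CSGF_1_35))
```

Run on the N17S variant of Table 3 (SEQ ID NO: 92), the same check still finds the motif at residue 9 but reports S17 at the apex.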
[0109] The CsgF peptide typically has a length of from 28 to 50
amino acids, such as 29 to 49, 30 to 45 or 32 to 40 amino acids.
Preferably the CsgF peptide comprises from 29 to 35 amino acids, or
29 to 45 amino acids. The CsgF peptide comprises all or part of the
FCP, which corresponds to residues 1 to 35 of SEQ ID NO: 6. Where
the CsgF peptide is shorter than the FCP, the truncation is
preferably made at the C-terminal end.
[0110] The CsgF fragment of SEQ ID NO:6 or of a homologue or mutant
thereof may have a length of 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54 or 55 amino acids.
[0111] The CsgF peptide may comprise the amino acid sequence of SEQ
ID NO: 6 from residue 1 up to any one of residues 25 to 60, such as
27 to 50, for example, 28 to 45 of SEQ ID NO: 6, or the
corresponding residues from a homologue of SEQ ID NO: 6, or variant
of either thereof. More specifically, the CsgF peptide may comprise
SEQ ID NO: 39 (residues 1 to 29 of SEQ ID NO: 6), or a homologue or
variant thereof.
[0112] Examples of such CsgF peptides comprise, consist
essentially of or consist of SEQ ID NO: 15 (residues 1 to 34 of SEQ
ID NO: 6), SEQ ID NO: 54 (residues 1 to 30 of SEQ ID NO: 6), SEQ ID
NO: 40 (residues 1 to 45 of SEQ ID NO: 6) or SEQ ID NO: 55
(residues 1 to 35 of SEQ ID NO: 6), and homologues or variants of
any thereof. Other examples of CsgF peptides comprise, consist
essentially of or consist of SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO:
9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ
ID NO: 14 or SEQ ID NO: 16.
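Because the peptides named in paragraphs [0111] and [0112] are defined as residues 1 to n of SEQ ID NO: 6, they can all be generated by slicing one template. This Python sketch assumes the 45-residue template of the CsgF-(1-45) entry in Table 3 (SEQ ID NO: 40); the SEQ ID NO to end-residue mapping is copied from the text, and the helper name is hypothetical. Fragments longer than 45 residues cannot be built from this excerpt.

```python
# Residues 1-45 of mature CsgF (SEQ ID NO: 6), from the CsgF-(1-45) entry
# (SEQ ID NO: 40) of Table 3. The rest of SEQ ID NO: 6 is not reproduced.
CSGF_1_45 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIET"

# SEQ ID NO -> last residue of SEQ ID NO: 6 retained, per [0111]-[0112].
N_TERMINAL_FRAGMENTS = {39: 29, 54: 30, 15: 34, 55: 35, 40: 45}


def n_terminal_fragment(last_residue, template=CSGF_1_45):
    """Return residues 1..last_residue (1-based, inclusive)."""
    if not 1 <= last_residue <= len(template):
        raise ValueError(f"cannot build 1-{last_residue} from a "
                         f"{len(template)}-residue template")
    return template[:last_residue]


for seq_id, last in sorted(N_TERMINAL_FRAGMENTS.items()):
    fragment = n_terminal_fragment(last)
    print(f"SEQ ID NO: {seq_id}  (1-{last}, {len(fragment)} aa)  {fragment}")
```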
[0113] In the CsgF peptide, one or more residues e.g., in SEQ ID
NO: 15, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 54, or SEQ ID NO:
55 may be modified.
[0114] For example, the CsgF peptide may comprise a modification at
a position corresponding to one or more of the following positions
in SEQ ID NO: 6: G1, T4, F5, R8, N9, N11, F12, A26 and Q29.
[0115] The CsgF peptide may be modified to introduce a cysteine, a
hydrophobic amino acid, a charged amino acid, a non-native reactive
amino acid, or a photoreactive amino acid, for example at a position
corresponding to one or more of the following positions in SEQ ID
NO: 6: G1, T4, F5, R8, N9, N11, F12, A26 and Q29.
[0116] For example, the CsgF peptide may comprise a modification at
a position corresponding to one or more of the following positions
in SEQ ID NO: 6: N15, N17, A20, N24 and A28. The CsgF peptide may
comprise a modification at a position corresponding to D34 to
stabilise the CsgG-CsgF complex. In particular embodiments, the
CsgF peptide comprises one or more of the substitutions:
N15S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C, N17S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C,
A20S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C, N24S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C,
A28S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C and D34F/Y/W/R/K/N/Q/C. The CsgF
peptide may, for example, comprise one or more of the following
substitutions: G1C, T4C, N17S, and D34Y or D34N.
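The substitution names used above and in Table 3 follow the usual convention of wild-type residue, position, new residue (e.g. N17S), with combined substitutions joined by "/" in the Table 3 labels (e.g. G1C/N17S/D34Y). In lists such as "N15S/A/T/...", by contrast, the slashes separate alternative replacement residues; the sketch below handles only the combined form. It is an illustrative Python helper with a hypothetical name that uses the 35-residue template of SEQ ID NO: 55 from Table 3 and checks each stated wild-type residue before substituting, so that a mis-numbered mutation fails loudly.

```python
import re

# Residues 1-35 of mature CsgF (SEQ ID NO: 55 in Table 3).
CSGF_1_35 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDP"

# One substitution: wild-type residue, 1-based position, new residue.
SUBSTITUTION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_substitutions(seq, spec):
    """Apply combined substitutions such as 'G1C/N17S/D34Y' to seq."""
    residues = list(seq)
    for token in spec.split("/"):
        match = SUBSTITUTION.match(token)
        if match is None:
            raise ValueError(f"unrecognised substitution: {token!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{token}: position outside template")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{token}: template has {residues[position - 1]} at {position}")
        residues[position - 1] = new
    return "".join(residues)


print(apply_substitutions(CSGF_1_35, "G1C/N17S/D34Y"))
```

Applied to "G1C/N17S/D34Y", the helper returns the sequence listed in Table 3 for SEQ ID NO: 95.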
[0117] The CsgF peptide may be produced by cleavage of a longer
protein, such as full-length CsgF, using an enzyme. Cleavage at a
particular site may be directed by modifying the longer protein,
such as full-length CsgF, to include an enzyme cleavage site at an
appropriate position. Examples of CsgF amino acid sequences that
have been modified to include such enzyme cleavage sites are shown
in SEQ ID NOs: 56 to 67. Following cleavage, all or part of the
added enzyme cleavage site may be present in the CsgF peptide that
associates with CsgG to form a pore. Thus the CsgF peptide may
further comprise all or part of an enzyme cleavage site at its
C-terminal end.
[0118] Some examples of suitable CsgF peptides are shown in Table 3
below:
TABLE 3
CsgF peptides
Technical description | Protein sequence | SEQ ID NO:
CsgF-(1-45) | GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIET | 40
CsgF-(1-29) | GTMTFQFRNPNFGGNPNNGAFLLNSAQAQ | 39
CsgF-(1-35) | GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDP | 55
CsgF-G1C-(1-45) | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIET | 89
CsgF-G1C-(1-30) | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQ | 90
CsgF-G1C-(1-35) | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDP | 91
CsgF-N17S-(1-35) | GTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDP | 92
CsgF-G1C/D34Y-(1-35) | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKYP | 93
CsgF-N17S-(1-45) | GTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDPSYNDDFGIET | 94
CsgF-G1C/N17S/D34Y-(1-35) | CTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKYP | 95
CsgF-G1C-(1-30)-H10 | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQHHHHHHHHHH | 96
CsgF-G1C-(1-35)-H10 | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPHHHHHHHHHH | 97
CsgF-N17S/D34Y-(1-35) | GTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKYP | 98
CsgF-N17S/D34N-(1-35) | GTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKNP | 99
CsgF-T4C/N17S-(1-35) | GTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDP | 100
CsgF-T4C/N17S/D34Y-(1-35) | GTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKYP | 101
CsgF-T4C/N17S/D34N-(1-35) | GTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKNP | 102
CsgF-G1C/T4C/N17S-(1-35) | CTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDP | 103
CsgF-G1C/T4C/N17S/D34N-(1-35) | CTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKNP | 104
CsgF-G1C/T4C/N17S/D34Y-(1-35) | CTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKYP | 105
CsgF-N17V-(1-35) | GTMTFQFRNPNFGGNPVNGAFLLNSAQAQNSYKDP | 106
CsgF-N17A-(1-35) | GTMTFQFRNPNFGGNPANGAFLLNSAQAQNSYKDP | 107
CsgF-N17F-(1-35) | GTMTFQFRNPNFGGNPFNGAFLLNSAQAQNSYKDP | 108
CsgF-N17S-(1-30) | GTMTFQFRNPNFGGNPSNGAFLLNSAQAQN | 109
CsgF-N17V-(1-30) | GTMTFQFRNPNFGGNPVNGAFLLNSAQAQN | 110
CsgF-N17A-(1-30) | GTMTFQFRNPNFGGNPANGAFLLNSAQAQN | 111
CsgF-N17F-(1-30) | GTMTFQFRNPNFGGNPFNGAFLLNSAQAQN | 112
CsgF-N17V/D34Y-(1-35) | GTMTFQFRNPNFGGNPVNGAFLLNSAQAQNSYKYP | 113
CsgF-N17A/D34Y-(1-35) | GTMTFQFRNPNFGGNPANGAFLLNSAQAQNSYKYP | 114
CsgF-N17F/D34Y-(1-35) | GTMTFQFRNPNFGGNPFNGAFLLNSAQAQNSYKYP | 115
CsgF-N17S/A20Q-(1-35) | GTMTFQFRNPNFGGNPSNGQFLLNSAQAQNSYKDP | 116
[0119] In particular embodiments, said CsgF fragment comprises the
amino acid sequence SEQ ID NO:39, or a mutant or homologue thereof.
SEQ ID NO:39 comprises the first 29 amino acids of the mature CsgF
peptide (SEQ ID NO:6). In another embodiment, the modified CsgF
peptide of the invention is a truncated peptide comprising SEQ ID
NO:40, which comprises the first 45 amino acids of the mature CsgF
peptide (SEQ ID NO:6). The CsgF constriction site and the CsgG
binding site are located within the N-terminal CsgF peptide region.
Amino acids 39 to 64 of SEQ ID NO:5 (present in SEQ ID NO:39 and
SEQ ID NO:40), and in particular amino acids 49 to 64 of SEQ ID
NO:5 (present in SEQ ID NO:40 but not in SEQ ID NO:39; the shorter
fragment of SEQ ID NO:39 shows a weaker interaction with CsgG (see
Examples)), confer a higher stability on the complex. Hence, the
disclosure provides a modification of the CsgF protein by
truncating the protein to said peptides, or to peptides comprising
said N-terminal fragments or constriction site region, to allow
complex formation with the CsgG pore, or homologues or mutants
thereof, in vivo. A further, narrower embodiment relates to a
modified CsgF peptide comprising SEQ ID NO:37 or SEQ ID NO:38.
Finally, the identification of CsgF homologous peptides, especially
those aligned within the constriction region (FCP peptides), also
provides modified CsgF peptide homologues that may form part of
said isolated complex (see, e.g., FIGS. 8 and 10).
[0120] A further embodiment relates to the modified or truncated
CsgF peptides comprising SEQ ID NO:15, wherein said SEQ ID NO:15
contains the region of the CsgF protein including several residues
from the region of the CsgG binding and/or constriction site,
sufficient for in vitro reconstitution of the complex pore
comprising CsgG or a homologue thereof, and a modified CsgF
peptide, to result in an isolated pore complex comprising a CsgF
channel constriction. Another embodiment describes said modified
CsgF peptide comprising SEQ ID NO:16, which contains an N-terminal
fragment of the CsgF protein, and two additional amino acids (KD),
which will increase the solubility and stability of the (synthetic)
peptide, as well as allowing in vitro reconstitution of said
complex pore. Further embodiments are provided wherein said
modified CsgF peptide comprises SEQ ID NO:15, SEQ ID NO:16 or a
homologue or mutant thereof, wherein said modified CsgF peptide is
further mutated but still retains at least 35%, such as at least
40%, 45%, 50%, 60%, 70%, 80%, 85% or 90%, amino acid identity to
SEQ ID NO:15 or SEQ ID NO:16, respectively, within the region of
the modified CsgF peptide corresponding to said SEQ ID NO:15 or 16.
Those mutated regions are intended to alter and/or improve the
characteristics of the CsgF constriction site, as discussed above,
so that, for instance, a more accurate analysis of the target can
be obtained. Another embodiment discloses modified CsgF peptides
wherein one or more positions in the regions comprising SEQ ID
NO:39, SEQ ID NO:40, SEQ ID NO:54 or SEQ ID NO:55 are modified, and
wherein the mutated peptide retains at least 35% amino acid
identity, or at least 40%, 50%, 60%, 70%, 80%, 85%, 90% or 95%
amino acid identity, to SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:54 or
SEQ ID NO:55 in the peptide fragment corresponding to the region
comprising SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:54 or SEQ ID
NO:55.
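The identity thresholds in paragraph [0120] apply to the region of the modified peptide that corresponds to the reference fragment. For same-length peptides that differ only by substitutions, identity can be computed position by position, as in the minimal Python sketch below. This ungapped comparison is an assumption that holds for the Table 3 variants but not for peptides with insertions or deletions, which would need an alignment tool such as those named in paragraph [0139]. The variant shown is the G1C/T4C/N17S/D34Y peptide of Table 3 (SEQ ID NO: 105); the function name is hypothetical.

```python
def percent_identity_ungapped(a, b):
    """Percentage of identical residues between two equal-length peptides
    compared position by position (no gaps allowed)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


SEQ_55 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDP"        # residues 1-35
VARIANT_105 = "CTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKYP"   # G1C/T4C/N17S/D34Y

identity = percent_identity_ungapped(SEQ_55, VARIANT_105)
print(f"{identity:.1f}% identity")
for threshold in (35, 50, 80, 85, 90, 95):
    print(f">= {threshold}%: {identity >= threshold}")
```

With four substitutions over 35 residues the identity is about 88.6%, so this variant would satisfy an 85% threshold but not a 90% one.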
[0121] Further embodiments of the invention thus involve the isolated
pore complex comprising a CsgG pore, or homologue or mutant
thereof, and modified CsgF peptide(s), or homologues or mutants
thereof, wherein said modified CsgF peptide(s) are defined as
described in the second aspect of the invention.
[0122] Additional embodiments relate to an isolated pore complex,
wherein said CsgG pore, at least via one monomer, and the modified
CsgF peptide are coupled via covalent binding. Said covalent link
or binding may, in one instance, be formed via a cysteine linkage,
wherein the sulfhydryl side group of cysteine covalently links with
another amino acid residue or moiety. In a second possibility, the
covalent linkage is obtained via an interaction between non-native
(photo)reactive amino acids. (Photo-)reactive amino acids are
artificial analogs of natural amino acids that can be used for
crosslinking of protein complexes, and may be incorporated into
proteins and peptides in vivo or in vitro. (Photo-)reactive amino
acid analogs in common use include photoreactive diazirine analogs
of leucine and methionine, and para-benzoyl-phenylalanine, as well
as azidohomoalanine, homopropargylglycine, homoallylglycine,
p-acetyl-Phe, p-azido-Phe, p-propargyloxy-Phe and p-benzoyl-Phe
(Wang et al. 2012; Chin et al. 2002). Upon exposure to ultraviolet
light, the photo-reactive analogs are activated and covalently bind
to interacting proteins that are within a few angstroms of the
photo-reactive amino acid analog. However, the positions in the
CsgG monomer at which such covalent linkages may be formed depend
on their exposure to the modified CsgF peptide. As shown in FIG. 1,
several amino acids are in a position to provide the covalent
linkage, namely positions 132, 133, 136, 138, 140, 142, 144, 145,
147, 149, 151, 153, 155, 183, 185, 187, 189, 191, 201, 203, 205,
207 or 209 of SEQ ID NO: 3, or of homologues thereof.
[0123] Another aspect of the invention relates to constructs
comprising said modified CsgF peptide, wherein said peptide is
covalently attached to one or more other monomers. A "construct"
comprises two or more covalently attached monomers derived from
modified CsgF and/or CsgG, or a homologue thereof. In other words,
a construct may contain more than one monomer. In another aspect,
the invention also provides a pore complex comprising at least one
construct of the invention. The pore complex contains sufficient
constructs and, if necessary, monomers to form the pore. For
instance, an octameric pore may comprise (a) four constructs each
comprising two monomers, (b) two constructs each comprising four
monomers, (c) one construct comprising two monomers and six
monomers that do not form part of a construct, (d) one construct
comprising one or two CsgF monomers and one construct comprising
six to seven CsgG monomers, or even (e) a construct comprising CsgF
and CsgG monomers in addition to another construct comprising only
CsgG monomers. The same and additional possibilities apply, for
instance, to a nonameric pore. Other
combinations of constructs and monomers can be envisaged by the
skilled person. One or more constructs of the invention may be used
to form a pore complex for characterising, such as sequencing,
polynucleotides. The construct may comprise at least 2, at least 3,
at least 4, at least 5, at least 6, at least 7, at least 8, at
least 9 or at least 10 monomers. The construct preferably comprises
two monomers. The two or more monomers may be the same or
different, may be CsgF, CsgG, CsgG/CsgF fusion monomers or
homologues thereof, or any combination thereof.
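The construct layouts in paragraph [0123] reduce to simple bookkeeping: the subunits contributed by all constructs plus any free monomers must add up to the oligomer size. The Python sketch below models layouts (a) to (c) for an octameric pore. For illustration only, it assumes that every subunit in these layouts is a CsgG monomer (the text does not say so), and it does not model layouts (d) and (e), whose stoichiometry is not fully specified. All names are hypothetical.

```python
from collections import Counter


def total_subunits(constructs, free_monomers):
    """Count subunits in a pore built from constructs plus free monomers.

    constructs: list of constructs, each a list of subunit names
    free_monomers: list of subunit names not part of any construct
    """
    counts = Counter(free_monomers)
    for construct in constructs:
        counts.update(construct)
    return counts


def is_oligomer_of_size(constructs, free_monomers, size):
    """True if the layout contains exactly `size` subunits in total."""
    return sum(total_subunits(constructs, free_monomers).values()) == size


# Layouts (a)-(c) from [0123], assuming all subunits are CsgG monomers.
OCTAMER_LAYOUTS = {
    "a": ([["CsgG"] * 2 for _ in range(4)], []),   # 4 constructs x 2
    "b": ([["CsgG"] * 4 for _ in range(2)], []),   # 2 constructs x 4
    "c": ([["CsgG"] * 2], ["CsgG"] * 6),           # 1 construct x 2 + 6 free
}

for name, (constructs, free) in OCTAMER_LAYOUTS.items():
    print(name, is_oligomer_of_size(constructs, free, 8))
```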
[0124] Another embodiment relates to the polynucleotide or nucleic
acid molecule encoding said modified CsgF peptides of the
invention, or homologues or mutants thereof, or polynucleotides
encoding a construct as described above.
[0125] Certain embodiments relate to an isolated transmembrane pore
complex comprising the isolated pore complex according to the first
and second aspects of the invention, and the components of a
membrane. Said isolated transmembrane pore complex is directly
applicable for use in molecular sensing, such as nucleic acid
sequencing. Alternatively, a membranous composition is provided,
comprising a modified CsgG/CsgF biological pore as described
herein, according to the isolated pore complex of the invention,
and a membrane, membrane components, or an insulating layer. One
embodiment relates to an isolated transmembrane pore complex
consisting of the isolated pore complex according to the invention,
and the components of a membrane.
[0126] Although the CsgG:CsgF complex is very stable, when CsgF is
truncated the stability of CsgG:CsgF complexes decreases compared
to a complex comprising full-length CsgF. Therefore, disulphide
bonds can be made between CsgG and CsgF to make the complex more
stable, for example following introduction of cysteine residues at
the positions identified herein. The pore complex can be made by
any of the previously mentioned methods, and disulphide bond
formation can be induced by using oxidising agents (e.g.,
copper-orthophenanthroline). Other interactions (e.g., hydrophobic
interactions or charge-charge/electrostatic interactions) can also
be used at those positions instead of cysteine interactions.
[0127] In another embodiment, unnatural amino acids can also be
incorporated in those positions. In this embodiment, covalent bonds
may be made via click chemistry. For example, unnatural amino
acids with an azide or alkyne group, or with a dibenzocyclooctyne (DBCO)
group and/or a bicyclo[6.1.0]nonyne (BCN) group may be introduced
at one or more of these positions.
[0128] Such stabilising mutations can be combined with any other
modifications to CsgG and/or CsgF, for example the modifications
disclosed herein.
[0129] The CsgG pore may comprise at least one, such as 2, 3, 4, 5,
6, 7, 8, 9 or 10, CsgG monomers that is/are modified to facilitate
attachment to the CsgF peptide. For example a cysteine residue may
be introduced at one or more of the positions corresponding to
positions 132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151,
153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 and 209 of
SEQ ID NO: 3, and/or at any one of the positions identified in
Table 4 as being predicted to make contact with CsgF, to facilitate
covalent attachment to CsgF. As an alternative or addition to
covalent attachment via cysteine residues, the pore may be
stabilised by hydrophobic interactions or electrostatic
interactions. To facilitate such interactions, a non-native
reactive or photoreactive amino acid may be introduced at a
position corresponding to one or more of positions 132, 133, 136,
138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183, 185, 187,
189, 191, 201, 203, 205, 207 and 209 of SEQ ID NO: 3, and/or at any
one of the positions identified in Table 4 as being predicted to
make contact with CsgF.
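The CsgG positions paired with CsgF in paragraphs [0097] and [0098] can be cross-checked against the candidate positions listed above, and each candidate can be placed relative to the TM1 (residues 134 to 154) and TM2 (residues 184 to 208) hairpin boundaries given in paragraph [0106]. The Python sketch below is only a consistency check on the numbers in the text, not a structural analysis; the names are hypothetical.

```python
# CsgG positions (SEQ ID NO: 3) listed in [0122] and [0129] as candidates
# for a cysteine or a non-native (photo)reactive amino acid.
CSGG_CANDIDATES = {
    132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155,
    183, 185, 187, 189, 191, 201, 203, 205, 207, 209,
}

# CsgG partners in the CsgF:CsgG position pairs of [0097]-[0098].
PAIRED_CSGG_POSITIONS = {153, 133, 136, 187, 203, 142, 201, 149, 191, 144}

# Transmembrane hairpins as numbered in [0106] (1-based, inclusive ends).
TM1 = range(134, 155)   # residues 134-154
TM2 = range(184, 209)   # residues 184-208


def hairpin_of(position):
    """Name the hairpin range a CsgG position falls in, if any."""
    if position in TM1:
        return "TM1"
    if position in TM2:
        return "TM2"
    return "outside TM1/TM2"


unlisted = PAIRED_CSGG_POSITIONS - CSGG_CANDIDATES
print("paired positions missing from candidate list:", sorted(unlisted))
for pos in sorted(CSGG_CANDIDATES):
    print(pos, hairpin_of(pos))
```

All ten paired CsgG positions appear in the candidate list, while positions 132, 133, 155, 183 and 209 fall just outside the TM1 and TM2 ranges as numbered in paragraph [0106].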
[0130] The CsgF peptide may be modified to facilitate attachment to
the CsgG pore. For example a cysteine residue may be introduced at
one or more of the positions corresponding to positions 1, 4, 5, 8,
9, 11, 12, 26 or 29 of SEQ ID NO: 6, and/or at any one of the
positions identified in Table 4 as being predicted to make contact
with CsgG, to facilitate covalent attachment to CsgG. As an
alternative or addition to covalent attachment via cysteine
residues, the pore may be stabilised by hydrophobic interactions or
electrostatic interactions. To facilitate such interactions, a
non-native reactive or photoreactive amino acid may be introduced
at a position corresponding to one or more of positions 1, 4, 5, 8,
9, 11, 12, 26 or 29 of SEQ ID NO: 6, and/or at any one of the
positions identified in Table 4 as being predicted to make contact
with CsgG.
[0131] Preferred exemplary CsgF peptides comprise the
following mutations relative to SEQ ID NO: 6:
N15X₁/N17X₂/A20X₃/N24X₄/A28X₅/D34X₆,
wherein X₁ is N/S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C, X₂ is
N/S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C, X₃ is
A/S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C, X₄ is
N/S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C, X₅ is
A/S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C and X₆ is D/F/Y/W/R/K/N/Q/C.
The mutations at positions N15, N17, A20, N24 and A28 are
constriction mutations, and the mutation at position D34 affects
the interaction of CsgF with the bottom of the CsgG pore to
stabilise the interaction.
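Paragraph [0131] defines a combinatorial family of CsgF variants. Counting the listed options, with the wild-type residue included at each position, gives 16 choices at each of positions 15, 17, 20, 24 and 28 and 9 choices at position 34, i.e. 16^5 x 9 = 9,437,184 combinations. The Python sketch below reproduces this count and enumerates the first few labels; the option strings are copied from the text with the wild-type residue placed first, and the names are hypothetical.

```python
from itertools import product
from math import prod

# Allowed residues at each position, as listed in [0131] (wild type first).
ALLOWED = {
    "N15": "NSATQGLVIFYWRKDC",
    "N17": "NSATQGLVIFYWRKDC",
    "A20": "ASTQNGLVIFYWRKDC",
    "N24": "NSTQAGLVIFYWRKDC",
    "A28": "ASTQNGLVIFYWRKDC",
    "D34": "DFYWRKNQC",
}


def variant_space_size(allowed):
    """Number of distinct combinations, including the all-wild-type one."""
    return prod(len(options) for options in allowed.values())


def variant_labels(allowed, limit=5):
    """Yield up to `limit` labels such as 'D34F', omitting positions that
    keep their wild-type residue."""
    positions = list(allowed)
    for count, choice in enumerate(product(*allowed.values())):
        if count >= limit:
            break
        changes = [f"{pos}{aa}" for pos, aa in zip(positions, choice) if aa != pos[0]]
        yield "/".join(changes) if changes else "wild type"


print(variant_space_size(ALLOWED))
print(list(variant_labels(ALLOWED)))
```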
CsgG Pore
[0132] The CsgG pore may be a homo-oligomeric pore comprising
identical mutant monomers of the invention. The CsgG pore may be a
hetero-oligomeric pore derived from CsgG, for example comprising at
least one mutant monomer as disclosed herein.
[0133] The CsgG pore may contain any number of mutant monomers. The
pore typically comprises at least 7, at least 8, at least 9 or at
least 10 identical mutant monomers, such as 7, 8, 9 or 10 mutant
monomers. The CsgG pore preferably comprises eight or nine
identical mutant monomers.
[0134] In a preferred embodiment, all of the monomers in the
hetero-oligomeric CsgG pore (such as 10, 9, 8 or 7 of the monomers)
are mutant monomers as disclosed herein, wherein at least one of
them differs from the others. They may all differ from one
another.
[0135] The mutant monomers in the CsgG pore are preferably all
approximately the same length or are the same length. The barrels
of the mutant monomers of the invention in the pore are preferably
approximately the same length or are the same length. Length may be
measured in number of amino acids and/or units of length.
[0136] A mutant monomer may be a variant of SEQ ID NO: 3. Over the
entire length of the amino acid sequence of SEQ ID NO: 3, a variant
will preferably be at least 50% homologous to that sequence based
on amino acid identity. More preferably, the variant may be at
least 55%, at least 60%, at least 65%, at least 70%, at least 75%,
at least 80%, at least 85%, at least 90% and more preferably at
least 95%, 97% or 99% homologous based on amino acid identity to
the amino acid sequence of SEQ ID NO: 3 over the entire sequence.
There may be at least 80%, for example at least 85%, 90% or 95%,
amino acid identity over a stretch of 100 or more, for example 125,
150, 175 or 200 or more, contiguous amino acids ("hard
homology").
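The "hard homology" criterion above asks for high identity over a stretch of contiguous residues rather than over the whole sequence. A minimal Python sketch, assuming the two sequences have already been aligned to equal length without gaps (so that positions correspond one to one), slides a fixed window along the alignment and keeps the best identity. The function name is hypothetical, and the example strings are toy sequences, not CsgG sequences.

```python
def best_window_identity(a, b, window):
    """Highest percent identity over any run of `window` contiguous
    positions in two pre-aligned, equal-length sequences (no gaps)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not 0 < window <= len(a):
        raise ValueError("window must be between 1 and the sequence length")
    matches = [1 if x == y else 0 for x, y in zip(a, b)]
    current = sum(matches[:window])
    best = current
    for end in range(window, len(a)):
        # Slide the window one position to the right.
        current += matches[end] - matches[end - window]
        best = max(best, current)
    return 100.0 * best / window


reference = "A" * 120
variant = "C" + "A" * 59 + "C" + "A" * 59   # differs at positions 1 and 61
print(best_window_identity(reference, variant, 100))
```

Here every 100-residue window contains the difference at position 61, so the best window identity is 99%, whereas a window that avoided both differences would score 100%.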
[0137] CsgG monomers are highly conserved (as can be readily
appreciated from FIGS. 45 to 47 of WO2017/149317). Furthermore,
from knowledge of the mutations in relation to SEQ ID NO: 3 it is
possible to determine the equivalent positions for mutations of
CsgG monomers other than that of SEQ ID NO: 3.
[0138] Thus reference to a mutant CsgG monomer comprising a variant
of the sequence as shown in SEQ ID NO: 3 and specific amino-acid
mutations thereof as set out in the claims and elsewhere in the
specification also encompasses a mutant CsgG monomer comprising a
variant of the sequence as shown in SEQ ID NOs: 68 to 88 and
corresponding amino-acid mutations thereof. Likewise reference to a
construct, pore or method involving the use of a pore relating to a
mutant CsgG monomer comprising a variant of the sequence as shown
in SEQ ID NO: 3 and specific amino-acid mutations thereof as set
out in the claims and elsewhere in the specification also
encompasses a construct, pore or method relating to a mutant CsgG
monomer comprising a variant of the sequence according to the above
disclosed SEQ ID NOs and corresponding amino-acid mutations
thereof. It will further be appreciated that the invention extends
to other variant CsgG monomers not expressly identified in the
specification that show highly conserved regions.
[0139] Standard methods in the art may be used to determine
homology. For example the UWGCG Package provides the BESTFIT
program which can be used to calculate homology, for example used
on its default settings (Devereux et al (1984) Nucleic Acids
Research 12, p 387-395). The PILEUP and BLAST algorithms can be
used, typically on their default settings, to calculate homology or
to line up sequences (such as identifying equivalent residues or
corresponding sequences), for example as described in Altschul
S. F. (1993) J Mol Evol 36:290-300; Altschul, S. F. et al
(1990) J Mol Biol 215:403-10. Software for performing BLAST
analyses is publicly available through the National Center for
Biotechnology Information (http://www.ncbi.nlm.nih.gov/).
[0140] SEQ ID NO: 3 is the wild-type CsgG monomer from Escherichia
coli Str. K-12 substr. MC4100. A variant of SEQ ID NO: 3 may
comprise any of the substitutions present in another CsgG
homologue. Preferred CsgG homologues are shown in SEQ ID NOs: 68 to
88. The variant may comprise combinations of one or more of the
substitutions present in SEQ ID NOs: 68 to 88 compared with SEQ ID
NO: 3. For example, mutations may be made at any one or more of the
positions in SEQ ID NO: 3 that differ between SEQ ID NO: 3 and any
one of SEQ ID NOs: 68 to 88. Such a mutation may be a substitution
of an amino acid in SEQ ID NO: 3 with an amino acid from the
corresponding position in any one of SEQ ID NOs: 68 to 88.
Alternatively, the mutation at any one of these positions may be a
substitution with any amino acid, or may be a deletion or insertion
mutation, such as deletion or insertion of 1 to 10 amino acids,
such as of 2 to 8 or 3 to 6 amino acids. Other than the mutations
disclosed herein, the amino acids that are conserved between SEQ ID
NO: 3 and all of SEQ ID NOs: 66 to 88 are preferably present in a
variant of the invention. However, conservative mutations may be
made at any one or more of these positions that are conserved
between SEQ ID NO: 3 and all of SEQ ID NOs: 66 to 88.
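The position-by-position comparison described in [0140] can be sketched as follows. This is a minimal illustration, assuming the homologues have already been aligned to SEQ ID NO: 3 with gaps written as "-"; the function names and the short toy sequences are hypothetical and are not SEQ ID NO: 3 or SEQ ID NOs: 68 to 88.

```python
def differing_positions(ref_aln: str, hom_aln: str) -> list[str]:
    """List substitutions (e.g. 'K2R') where an aligned homologue differs
    from the reference, numbered by reference position. Columns where the
    reference has a gap (insertions in the homologue) get no number, and
    columns where the homologue has a gap (deletions) are skipped."""
    if len(ref_aln) != len(hom_aln):
        raise ValueError("aligned sequences must have equal length")
    subs = []
    ref_pos = 0
    for r, h in zip(ref_aln, hom_aln):
        if r == "-":
            continue
        ref_pos += 1
        if h != "-" and h != r:
            subs.append(f"{r}{ref_pos}{h}")
    return subs


def conserved_positions(ref_aln: str, hom_alns: list[str]) -> list[int]:
    """Reference positions whose residue is identical in every homologue."""
    for h in hom_alns:
        if len(h) != len(ref_aln):
            raise ValueError("aligned sequences must have equal length")
    conserved = []
    ref_pos = 0
    for i, r in enumerate(ref_aln):
        if r == "-":
            continue
        ref_pos += 1
        if all(h[i] == r for h in hom_alns):
            conserved.append(ref_pos)
    return conserved


# Hypothetical aligned toy sequences (not real CsgG sequences).
ref = "MKLYA-E"
homologues = ["MRLYAGD", "MKLFA-E"]
print(differing_positions(ref, homologues[0]))  # ['K2R', 'E6D']
print(conserved_positions(ref, homologues))     # [1, 3, 5]
```

Substitutions returned by the first function are candidate mutations "from the corresponding position" of a homologue; positions returned by the second are those the text says are preferably kept, or changed only conservatively.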
[0141] The invention provides a pore-forming CsgG mutant monomer
that comprises any one or more of the amino acids described herein
as being substituted into a specific position of SEQ ID NO: 3 at a
position in the structure of the CsgG monomer that corresponds to
the specific position in SEQ ID NO: 3.
[0142] Corresponding positions may be determined by standard
techniques in the art. For example, the PILEUP and BLAST algorithms
mentioned above can be used to align the sequence of a CsgG monomer
with SEQ ID NO: 3 and hence to identify corresponding residues.
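A toy version of this alignment step is shown below: a plain Needleman-Wunsch global alignment with unit scores and a linear gap penalty. It is an assumed stand-in for illustration only; BLAST, PILEUP and BESTFIT use substitution matrices and different gap models that are not reproduced here, and the function name and scores are hypothetical.

```python
def corresponding_positions(query: str, reference: str,
                            match: int = 1, mismatch: int = -1,
                            gap: int = -1) -> dict[int, int]:
    """Map 1-based query positions to the reference positions they align
    with in a simple global alignment. Query residues aligned to a gap
    are left out of the map."""
    n, m = len(query), len(reference)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == reference[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = match if query[i - 1] == reference[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # query residue aligned to a gap
        else:
            j -= 1  # reference residue aligned to a gap
    return dict(reversed(pairs))


# Hypothetical toy sequences: the query carries an extra G after position 3.
mapping = corresponding_positions("MKLGYAE", "MKLYAE")
print(mapping)  # {1: 1, 2: 2, 3: 3, 5: 4, 6: 5, 7: 6}
```

In this example query residue 5 corresponds to reference position 4, so a substitution described at "position 4" of the reference would be made at position 5 of the query.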
[0143] The pore-forming mutant monomer typically retains the
ability to form the same 3D structure as the wild-type CsgG
monomer, such as the same 3D structure as a CsgG monomer having the
sequence of SEQ ID NO: 3. The 3D structure of CsgG is known in the
art and is disclosed, for example, in Goyal et al (2014) Nature
516(7530):250-3. Any number of mutations may be made in the
wild-type CsgG sequence in addition to the mutations described
herein provided that the CsgG mutant monomer retains the improved
properties imparted on it by the mutations of the present
invention.
[0144] Typically the CsgG monomer will retain the ability to form a
structure comprising three alpha-helices and five beta-sheets.
Mutations may be made at least in the region of CsgG which is
N-terminal to the first alpha helix (which starts at S63 in SEQ ID
NO:3), in the second alpha helix (from G85 to A99 of SEQ ID NO: 3),
in the loop between the second alpha helix and the first beta sheet
(from Q100 to N120 of SEQ ID NO: 3), in the fourth and fifth beta
sheets (S173 to R192 and R198 to T207 of SEQ ID NO: 3,
respectively) and in the loop between the fourth and fifth beta
sheets (F193 to Q197 of SEQ ID NO: 3) without affecting the ability
of the CsgG monomer to form a transmembrane pore, which
transmembrane pore is capable of translocating polynucleotides.
Therefore, it is envisaged that further mutations may be made in
any of these regions in any CsgG monomer without affecting the
ability of the monomer to form a pore that can translocate
polynucleotides. It is also expected that mutations may be made in
other regions, such as in any of the alpha helices (S63 to R76,
G85 to A99 or V211 to L236 of SEQ ID NO: 3) or in any of the beta
sheets (I121 to N133, K135 to R142, I146 to R162, S173 to R192 or
R198 to T207 of SEQ ID NO: 3) without affecting the ability of the
monomer to form a pore that can translocate polynucleotides. It is
also expected that deletions of one or more amino acids can be made
in any of the loop regions linking the alpha helices and beta
sheets and/or in the N-terminal and/or C-terminal regions of the
CsgG monomer without affecting the ability of the monomer to form a
pore that can translocate polynucleotides.
[0145] Amino acid substitutions may be made to the amino acid
sequence of SEQ ID NO: 3 in addition to those discussed above, for
example up to 1, 2, 3, 4, 5, 10, 20 or 30 substitutions.
Conservative substitutions replace amino acids with other amino
acids of similar chemical structure, similar chemical properties or
similar side-chain volume. The amino acids introduced may have
similar polarity, hydrophilicity, hydrophobicity, basicity,
acidity, neutrality or charge to the amino acids they replace.
Alternatively, the conservative substitution may introduce another
amino acid that is aromatic or aliphatic in the place of a
pre-existing aromatic or aliphatic amino acid. Conservative amino
acid changes are well-known in the art and may be selected in
accordance with the properties of the 20 main amino acids as
defined in Table 1 above. Where amino acids have similar polarity,
this can also be determined by reference to the hydropathy scale
for amino acid side chains in Table 2.
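The notion of a conservative substitution can be made concrete with a property-group lookup. Table 1 is not reproduced in this section, so the grouping below is an assumption (one common textbook classification), and the names `GROUPS` and `is_conservative` are hypothetical.

```python
import re

# One common textbook grouping of the 20 amino acids; an assumption standing
# in for Table 1 of the specification, which may group residues differently.
GROUPS = {
    "aliphatic": set("GAVLI"),
    "aromatic": set("FWY"),
    "hydroxyl": set("ST"),
    "amide": set("NQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
    "sulfur": set("CM"),
    "imino": set("P"),
}
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def group_of(residue: str) -> str:
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue {residue!r}")


def is_conservative(substitution: str) -> bool:
    """True if a point substitution such as 'R97K' keeps the residue in the
    same property group under the grouping above."""
    m = SUBSTITUTION.match(substitution)
    if m is None:
        raise ValueError(f"not a point substitution: {substitution!r}")
    old, _, new = m.groups()
    return group_of(old) == group_of(new)


print(is_conservative("R97K"))  # True  (basic -> basic)
print(is_conservative("R97W"))  # False (basic -> aromatic)
print(is_conservative("F56Y"))  # True  (aromatic -> aromatic)
```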
[0146] One or more amino acid residues of the amino acid sequence
of SEQ ID NO: 3 may additionally be deleted from the polypeptides
described above. Up to 1, 2, 3, 4, 5, 10, 20 or 30 or more residues
may be deleted.
[0147] Variants may include fragments of SEQ ID NO: 3. Such
fragments retain pore forming activity. Fragments may be at least
50, at least 100, at least 150, at least 200 or at least 250 amino
acids in length. Such fragments may be used to produce the pores. A
fragment preferably comprises the membrane spanning domain of SEQ
ID NO: 3, namely K135-Q153 and S183-S208.
[0148] One or more amino acids may be alternatively or additionally
added to the polypeptides described above. An extension may be
provided at the amino terminal or carboxy terminal of the amino
acid sequence of SEQ ID NO: 3 or polypeptide variant or fragment
thereof. The extension may be quite short, for example from 1 to 10
amino acids in length. Alternatively, the extension may be longer,
for example up to 50 or 100 amino acids. A carrier protein may be
fused to an amino acid sequence according to the invention. Other
fusion proteins are discussed in more detail below.
[0149] A CsgG pore as described herein includes a wild type CsgG
pore, or a homologue or a mutant/variant thereof. A variant is a
polypeptide that has an amino acid sequence which varies from that
of SEQ ID NO: 3 and which retains its ability to form a pore. A
variant typically contains the regions of SEQ ID NO: 3 that are
responsible for pore formation. The pore-forming ability of CsgG,
which contains a β-barrel, is provided by β-sheets in
each subunit. A variant of SEQ ID NO: 3 typically comprises the
regions in SEQ ID NO: 3 that form β-sheets, namely K134-Q154
and S183-S208. One or more modifications can be made to the regions
of SEQ ID NO: 3 that form β-sheets as long as the resulting
variant retains its ability to form a pore. A variant of SEQ ID NO:
3 preferably includes one or more modifications, such as
substitutions, additions or deletions, within its α-helices
and/or loop regions. A mutant CsgG monomer is a monomer whose
sequence varies from that of a wild-type CsgG monomer and which
retains the ability to form a pore. A mutant monomer may also be
referred to herein as a variant.
Methods for confirming the ability of mutant monomers to form pores
are well-known in the art and are discussed in more detail
below.
[0150] Particular pore-forming CsgG mutant monomers that may be
included in the CsgG pore may comprise any one or more of the
following modifications:
[0151] a W at the position corresponding to R97 in SEQ ID NO: 3;
[0152] a W at the position corresponding to R93 in SEQ ID NO: 3;
[0153] a Y at the position corresponding to R97 in SEQ ID NO: 3;
[0154] a Y at the position corresponding to R93 in SEQ ID NO: 3;
[0155] a Y at each of the positions corresponding to R93 and R97 in SEQ ID NO: 3;
[0156] a D at the position corresponding to R192 in SEQ ID NO: 3;
[0157] deletion of the residues at the positions corresponding to V105-I107 in SEQ ID NO: 3;
[0158] deletion of the residues at one or more of the positions corresponding to F193 to L199 in SEQ ID NO: 3;
[0159] deletion of the residues at the positions corresponding to D195 to L199 in SEQ ID NO: 3;
[0160] deletion of the residues at the positions corresponding to F193 to L199 in SEQ ID NO: 3;
[0161] a T at the position corresponding to F191 in SEQ ID NO: 3;
[0162] a Q at the position corresponding to K94 in SEQ ID NO: 3;
[0163] an N at the position corresponding to K94 in SEQ ID NO: 3;
[0164] a K at the position corresponding to Q42 in SEQ ID NO: 3;
[0165] a Q at the position corresponding to E44 in SEQ ID NO: 3;
[0166] an N at the position corresponding to E44 in SEQ ID NO: 3;
[0167] an R at the position corresponding to L90 in SEQ ID NO: 3;
[0168] an R at the position corresponding to N91 in SEQ ID NO: 3;
[0169] an R at the position corresponding to I95 in SEQ ID NO: 3;
[0170] an R at the position corresponding to A99 in SEQ ID NO: 3;
[0171] an H at the position corresponding to E101 in SEQ ID NO: 3;
[0172] a K at the position corresponding to E101 in SEQ ID NO: 3;
[0173] an N at the position corresponding to E101 in SEQ ID NO: 3;
[0174] a Q at the position corresponding to E101 in SEQ ID NO: 3;
[0175] a T at the position corresponding to E101 in SEQ ID NO: 3;
[0176] a K at the position corresponding to Q114 in SEQ ID NO: 3.
[0177] The CsgG pore-forming monomer preferably further comprises
an A at the position corresponding to Y51 in SEQ ID NO: 3 and/or a
Q at the position corresponding to F56 in SEQ ID NO: 3.
[0178] Pores constructed from CsgG monomers comprising an R to W
substitution at the position corresponding to position 97 of SEQ ID
NO: 3 display increased accuracy when characterizing (or sequencing)
target polynucleotides, compared with otherwise identical pores
without the modification at 97. Increased accuracy is also seen
when, instead of R97W, the CsgG monomers comprise the modification
R to W at a position corresponding to position 93 of SEQ ID NO: 3,
or the modifications R to Y at positions corresponding to positions
93 and 97 of SEQ ID NO: 3. Accordingly, pores may be constructed
from one or more mutant CsgG monomers that comprise a modification
at positions corresponding to R97 or R93 of SEQ ID NO: 3 such that
the modification increases the hydrophobicity of the amino acid.
For example, such a modification may be a substitution with any
amino acid containing a hydrophobic side chain, including, but not
limited to, W and Y.
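Whether a substitution "increases the hydrophobicity" can be checked numerically against a hydropathy scale. Table 2 is not reproduced in this section, so the sketch assumes the Kyte-Doolittle scale; the helper name is hypothetical. On this scale W and Y still score below zero, so the effect at R93 and R97 is best read as a relative increase over the strongly hydrophilic R.

```python
import re

# Kyte-Doolittle hydropathy values; assumed here as the scale, since Table 2
# of the specification is not reproduced in this section.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def hydropathy_change(substitution: str) -> float:
    """Change in hydropathy (new minus old); positive means more hydrophobic."""
    m = SUBSTITUTION.match(substitution)
    if m is None:
        raise ValueError(f"not a point substitution: {substitution!r}")
    old, _, new = m.groups()
    return KYTE_DOOLITTLE[new] - KYTE_DOOLITTLE[old]


for sub in ("R97W", "R93W", "R97Y", "R93Y"):
    print(sub, round(hydropathy_change(sub), 1))
# R97W 3.6 / R93W 3.6 / R97Y 3.2 / R93Y 3.2
```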
[0179] CsgG monomers that comprise a R to D, Q, F, S or T mutation
at a position corresponding to position 192 of SEQ ID NO: 3 are
easier to express than monomers which do not have a substitution at
position 192, which may be due to the reduction of positive charge.
Accordingly position 192 may be substituted with an amino acid
which reduces the positive charge. Monomers that comprise
R192D/Q/F/S/T may also comprise additional modifications which
improve the ability of mutant pores formed from the monomers to
interact with and characterise analytes, such as polynucleotides.
However, in one embodiment it is preferred that the residue at the
position corresponding to position 193 of SEQ ID NO: 3 is R or K,
more preferably R.
[0180] Pores comprising CsgG monomers that comprise a deletion of
V105, A106 and I107, a deletion of F193, I194, D195, Y196, Q197,
R198 and L199 or a deletion of D195, Y196, Q197, R198 and L199,
and/or F191T display an increased accuracy when characterizing (or
sequencing) target polynucleotides. The amino acids at positions
105 to 107 correspond to the cis-loops in the cap of the nanopore
and the amino acids at positions 193 to 199 correspond to the
trans-loops at the other end of the pore. Without wishing to be
bound by theory, it is thought that deletion of the cis-loops
improves the interaction of the enzyme with the pore and that
removal of the trans-loops decreases any unwanted interaction
between the pore and DNA on the trans side of the pore.
[0181] Pores comprising CsgG monomers that comprise a K to Q or K
to N mutation at a position corresponding to K94 of SEQ ID NO: 3
show a reduction in the number of noisy pores (namely those pores
that give rise to increased noise, i.e. a reduced signal:noise ratio) as compared to
identical pores without the mutation at 94 when characterizing (or
sequencing) target polynucleotides. Position 94 is found within the
vestibule of the pore and was found to be a particularly sensitive
position in relation to the noise of the current signal.
[0182] Pores comprising the CsgG monomers that comprise T104K or
T104R, N91R, E101K/N/Q/T/H, E44N/Q, Q114K, A99R, I95R, L90R
and/or Q42K, or corresponding mutations, all demonstrate an
improved ability to capture target polynucleotides when used to
characterize (or sequence) target polynucleotides as compared to
identical pores without substitutions at these positions.
[0183] In one embodiment, the CsgG pore comprises one or more
monomers that are variants of SEQ ID NO: 3 comprising (a) one or
more mutations at the following positions (i.e. mutations at one or
more of the following positions) I41, R93, A98, Q100, G103, T104,
A106, I107, N108, L113, S115, T117, Y130, K135, E170, S208, D233,
D238 and E244 and/or (b) one or more of D43S, E44S,
F48S/N/Q/Y/W/I/V/H/R/K, Q87N/R/K, N91K/R, K94R/F/Y/W/L/S/N,
R97F/Y/W/V/I/K/S/Q/H, E101I/L/A/H, N102K/Q/L/I/V/S/H, R110F/G/N,
Q114R/K, R142Q/S, T150Y/A/V/L/S/Q/N, R192D/Q/F/S/T and
D248S/N/Q/K/R. The variant may comprise (a); (b); or (a) and (b).
In some embodiments, the variant comprises R97W. In some
embodiments, the variant comprises R192D/Q/F/S/T, such as R192D/Q.
In (a), the variant may comprise modifications at any number and
combination of the positions, such as 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18 or 19 of the positions.
[0184] In (a), the variant preferably comprises one or more of
I41N, R93F/Y/W/L/I/V/N/Q/S, A98K/R, Q100K/R, G103F/W/S/N/K/R,
T104R/K, A106R/K, I107R/K/W/F/Y/L/V, N108R/K, L113K/R, S115R/K,
T117R/K, Y130W/F/H/Q/N, K135L/V/N/Q/S, E170S/N/Q/K/R,
S208V/I/F/W/Y/L/T, D233S/N/Q/K/R, D238S/N/Q/K/R and
E244S/N/Q/K/R.
[0185] In (a), the variant preferably comprises one or more
modifications which provide more consistent movement of a target
polynucleotide with respect to, such as through, a transmembrane
pore comprising the monomer. In particular, in (a), the variant
preferably comprises one or more mutations at the following
positions (i.e. mutations at one or more of the following
positions) R93, G103 and I107. The variant may comprise R93; G103;
I107; R93 and G103; R93 and I107; G103 and I107; or R93, G103 and
I107. The variant preferably comprises one or more of
R93F/Y/W/L/I/V/N/Q/S, G103F/W/S/N/K/R and I107R/K/W/F/Y/L/V. These
may be present in any combination shown for the positions R93, G103
and I107.
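The phrase "any combination shown for the positions R93, G103 and I107" describes a product over the listed options. The sketch below enumerates it, counting "left unchanged" as one extra option per position and discarding the all-unchanged case; the data layout and function name are illustrative assumptions.

```python
from itertools import product

# Substitution options per position, as listed for (a) in the text.
OPTIONS = {
    "R93": "FYWLIVNQS",
    "G103": "FWSNKR",
    "I107": "RKWFYLV",
}


def variant_combinations(options: dict[str, str]) -> list[tuple[str, ...]]:
    """Every combination with at least one position substituted, each given
    as a tuple of substitutions such as ('R93F', 'I107K')."""
    # None stands for "position left as wild type".
    choices = [[None] + [f"{pos}{aa}" for aa in aas]
               for pos, aas in options.items()]
    combos = []
    for pick in product(*choices):
        subs = tuple(s for s in pick if s is not None)
        if subs:  # skip the all-unchanged (wild-type) case
            combos.append(subs)
    return combos


combos = variant_combinations(OPTIONS)
print(len(combos))  # 559 = (9 + 1) * (6 + 1) * (7 + 1) - 1
```

The same count explains the seven position patterns listed in the text: ignoring which residue is introduced, three positions give 2^3 - 1 = 7 non-empty subsets.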
[0186] In (a), the variant preferably comprises one or more
modifications which allow pores constructed from the mutant
monomers to capture nucleotides and polynucleotides more
easily. In particular, in (a), the variant preferably comprises one
or more mutations at the following positions (i.e. mutations at one
or more of the following positions) I41, T104, A106, N108, L113,
S115, T117, E170, D233, D238 and E244. The variant may comprise
modifications at any number and combination of the positions, such
as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11 of the positions. The
variant preferably comprises one or more of I41N, T104R/K, A106R/K,
N108R/K, L113K/R, S115R/K, T117R/K, E170S/N/Q/K/R, D233S/N/Q/K/R,
D238S/N/Q/K/R and E244S/N/Q/K/R. Additionally or alternatively the
variant may comprise (c) Q42K/R, E44N/Q, L90R/K, N91R/K, I95R/K,
A99R/K, E101H/K/N/Q/T and/or Q114K/R.
[0187] In (a), the variant preferably comprises one or more
modifications which provide more consistent movement and increase
capture. In particular, in (a), the variant preferably comprises
one or more mutations at the following positions (i.e. mutations at
one or more of the following positions) (i) A98, (ii) Q100, (iii)
G103 and (iv) I107. The variant preferably comprises one or more of
(i) A98R/K, (ii) Q100K/R, (iii) G103K/R and (iv) I107R/K.
[0188] Particularly preferred mutant monomers which provide for
increased capture of analytes, such as polynucleotides, include a
mutation at one or more of positions Q42, E44, L90, N91, I95,
A99, E101 and Q114, which mutation removes the negative charge
and/or increases the positive charge at the mutated positions. In
particular, the following mutations may be included in a mutant
monomer of the invention to produce a CsgG pore that has an
improved ability to capture an analyte, preferably a
polynucleotide: Q42K, E44N, E44Q, L90R, N91R, I95R, A99R, E101H,
E101K, E101N, E101Q, E101T and Q114K. Examples of particular mutant
monomers which comprise one of these mutations in combination with
other beneficial mutations are:
[0189] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-Q42K
[0190] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E44N
[0191] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E44Q
[0192] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-L90R
[0193] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-N91R
[0194] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-I95R
[0195] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-A99R
[0196] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E101H
[0197] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E101K
[0198] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E101N
[0199] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E101Q
[0200] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E101T
[0201] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-Q114K.
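The charge rationale in [0188] can be checked with simple bookkeeping. This sketch uses a deliberately simplified charge model (an assumption: K and R count as +1, D and E as -1, everything else including H as 0, ignoring pKa shifts in the pore environment); the function name is hypothetical.

```python
import re

# Simplified side-chain charges near neutral pH (an assumption): K and R
# count as +1, D and E as -1, and everything else, including H, as 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def charge_change(substitution: str) -> int:
    """Net side-chain charge change caused by a point substitution."""
    m = SUBSTITUTION.match(substitution)
    if m is None:
        raise ValueError(f"not a point substitution: {substitution!r}")
    old, _, new = m.groups()
    return CHARGE.get(new, 0) - CHARGE.get(old, 0)


for sub in ("Q42K", "E44N", "L90R", "E101K", "Q114K"):
    print(sub, charge_change(sub))
# Q42K 1 / E44N 1 / L90R 1 / E101K 2 / Q114K 1

# Background substitutions shared by the constructs listed above:
background = ["Y51A", "F56Q", "K94Q", "R97W", "R192D"]
print(sum(charge_change(s) for s in background))  # -4 per monomer
```

Under this model every capture mutation listed makes the monomer more positive, partly offsetting the positive charge removed by the K94Q, R97W and R192D background substitutions.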
[0202] In (a), the variant preferably comprises one or more
modifications which provide increased characterisation accuracy. In
particular, in (a), the variant preferably comprises one or more
mutations at the following positions (i.e. mutations at one or more
of the following positions) Y130, K135 and S208, such as Y130;
K135; S208; Y130 and K135; Y130 and S208; K135 and S208; or Y130,
K135 and S208. The variant preferably comprises one or more of
Y130W/F/H/Q/N, K135L/V/N/Q/S and R142Q/S. These substitutions may
be present in any number and combination as set out for Y130, K135
and S208.
[0203] In (b), the variant may comprise any number and combination
of the substitutions, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or
12 of the substitutions. In (b), the variant preferably comprises
one or more modifications which provide more consistent movement of
a target polynucleotide with respect to, such as through, a
transmembrane pore comprising the monomer. In particular, in (b),
the variant preferably comprises one or more of (i)
Q87N/R/K, (ii) K94R/F/Y/W/L/S/N, (iii) R97F/Y/W/V/I/K/S/Q/H, (iv)
N102K/Q/L/I/V/S/H and (v) R110F/G/N. More preferably, the variant
comprises K94D or K94Q and/or R97W or R97Y. Other preferred
variants that are modified to provide more consistent movement of a
target polynucleotide with respect to, such as through, a
transmembrane pore comprising the monomer include (vi) R93W and
R93Y. A preferred variant may comprise R93W and R97W, R93Y and
R97W, R93W and R97Y, or more preferably R93Y and R97Y.
[0204] In (b), the variant preferably comprises one or more
modifications which allow pores constructed from the mutant
monomers to capture nucleotides and polynucleotides more
easily. In particular, in (b), the variant preferably comprises one
or more of (i) D43S, (ii) E44S, (iii) N91K/R, (iv) Q114R/K and (v)
D248S/N/Q/K/R.
[0205] In (b), the variant preferably comprises one or more
modifications which provide more consistent movement and increase
capture. In particular, in (b), the variant preferably comprises
one or more of Q87R/K, E101I/L/A/H and N102K, such as Q87R/K;
E101I/L/A/H; N102K; Q87R/K and E101I/L/A/H; Q87R/K and N102K;
E101I/L/A/H and N102K; or Q87R/K, E101I/L/A/H and N102K.
[0206] In (b), the variant preferably comprises one or more
modifications which provide increased characterisation accuracy. In
particular, in (b), the variant preferably comprises
F48S/N/Q/Y/W/I/V. In (b), the variant preferably comprises one or
more modifications which provide increased characterisation
accuracy and increased capture. In particular, in (b), the variant
preferably comprises F48H/R/K.
[0207] The variant may comprise modifications in both (a) and (b)
which provide more consistent movement. The variant may comprise
modifications in both (a) and (b) which provide increased
capture.
[0208] The invention provides variants of SEQ ID NO: 3 which
provide an increased throughput of an assay for characterising an
analyte, such as a polynucleotide, using a pore comprising the
variant. Such variants may comprise a mutation at K94, preferably
K94Q or K94N, more preferably K94Q. Examples of particular mutant
monomers which comprise a K94Q or K94N mutation in combination with
other beneficial mutations are:
[0209] CsgG-(WT-Y51A/F56Q/R97W/R192D-StrepII)9-K94N
[0210] CsgG-(WT-Y51A/F56Q/R97W/R192D-StrepII)9-K94Q.
[0211] Using monomers that are variants of SEQ ID NO: 3 to form the
CsgG pore can provide increased characterisation accuracy in an
assay for characterising an analyte, such as a polynucleotide. Such
variants include variants that comprise: a mutation at F191,
preferably F191T; deletion of V105-I107; deletion of F193-L199 or
of D195-L199; and/or a mutation at R93 and/or R97, preferably R93Y,
R97Y, or more preferably R97W, R93W or both R93Y and R97Y.
Examples of particular mutant monomers which comprise one or more
of these mutations in combination with other beneficial mutations
are:
[0212] CsgG-(WT-Y51A/F56Q/R97W/R192D-StrepII)9-del(D195-L199)
[0213] CsgG-(WT-Y51A/F56Q/R97W/R192D-StrepII)9-del(F193-L199)
[0214] CsgG-(WT-Y51A/F56Q/R97W/R192D-StrepII)9-F191T
[0215] CsgG-(WT-Y51A/F56Q/R97W/R192D-del(V105-I107)-StrepII)9
[0216] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)
[0217] CsgG-(WT-Y51A/F56Q/R192D-StrepII)9-R93W
[0218] CsgG-(WT-Y51A/F56Q/R192D-StrepII)9-R93W-del(D195-L199)
[0219] CsgG-(WT-Y51A/F56Q/R192D-StrepII)9-R93Y/R97Y.
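Building a variant sequence from a construct description such as those above can be sketched as below. This is a minimal illustration, assuming every substitution and deletion label uses wild-type numbering and that each label's wild-type residue should be checked against the input sequence; the function name and the 9-residue toy sequence are hypothetical, not SEQ ID NO: 3.

```python
import re

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_variant(seq: str, substitutions=(), deleted_ranges=()) -> str:
    """Apply point substitutions and inclusive deletion ranges, all given in
    wild-type numbering, to a 1-based protein sequence."""
    residues = list(seq)
    deleted = set()
    for start, end in deleted_ranges:
        deleted.update(range(start, end + 1))
    for sub in substitutions:
        m = SUBSTITUTION.match(sub)
        if m is None:
            raise ValueError(f"not a point substitution: {sub!r}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{sub}: position out of range")
        if residues[pos - 1] != wt:
            raise ValueError(f"{sub}: residue {pos} is {residues[pos - 1]}")
        if pos in deleted:
            raise ValueError(f"{sub}: position {pos} is also deleted")
        residues[pos - 1] = new
    # Deletions are applied last so that every label uses wild-type numbering.
    return "".join(r for i, r in enumerate(residues, start=1)
                   if i not in deleted)


# Hypothetical toy sequence, not SEQ ID NO: 3.
toy = "MKLYAEDRG"
print(apply_variant(toy, ["K2Q", "R8W"], [(5, 6)]))  # MQLYDWG
```

Checking the wild-type residue for each label catches numbering mistakes early, for example a label written against a sequence that has already been shortened by a deletion.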
[0220] In another embodiment, the variant of SEQ ID NO: 3 comprises
(A) deletion of one or more positions R192, F193, I194, D195, Y196,
Q197, R198, L199, L200 and E201 and/or (B) deletion of one or more
of V139/G140/D149/T150/V186/Q187/V204/G205 (called band 1 herein),
G137/G138/Q151/Y152/Y184/E185/Y206/T207 (called band 2 herein) and
A141/R142/G147/A148/A188/G189/G202/E203 (called band 3 herein).
[0221] In (A), the variant may comprise deletion of any number and
combination of the positions, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or
10 of the positions. In (A), the variant preferably comprises
deletion of:
[0222] D195, Y196, Q197, R198 and L199;
[0223] R192, F193, I194, D195, Y196, Q197, R198, L199 and L200;
[0224] Q197, R198, L199 and L200;
[0225] I194, D195, Y196, Q197, R198 and L199;
[0226] D195, Y196, Q197, R198, L199 and L200;
[0227] Y196, Q197, R198, L199, L200 and E201;
[0228] Q197, R198, L199, L200 and E201;
[0229] Q197, R198 and L199; or
[0230] F193, I194, D195, Y196, Q197, R198 and L199.
[0231] More preferably, the variant comprises deletion of D195,
Y196, Q197, R198 and L199 or F193, I194, D195, Y196, Q197, R198 and
L199. In (B), any number and combination of bands 1 to 3 may be
deleted, such as band 1; band 2; band 3; bands 1 and 2; bands 1 and
3; bands 2 and 3; or bands 1, 2 and 3. The variant may comprise
deletions according to (A); (B); or (A) and (B).
[0232] The variants comprising deletion of one or more positions
according to (A) and/or (B) above may further comprise any of the
modifications or substitutions discussed above and below. If the
modifications or substitutions are made at one or more positions
which appear after the deletion positions in SEQ ID NO: 3, the
numbering of the one or more positions of the modifications or
substitutions must be adjusted accordingly. For instance, if L199
is deleted, E244 becomes E243. Similarly, if band 1 is deleted,
R192 becomes R186.
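The renumbering rule is simple arithmetic: a position moves down by one for every deleted position that precedes it. The minimal sketch below reproduces the two examples in the text; the function names are illustrative assumptions.

```python
import re

# Band 1 positions as listed in [0220].
BAND_1 = {139, 140, 149, 150, 186, 187, 204, 205}
LABEL = re.compile(r"^([A-Z])(\d+)([A-Z]?)$")


def renumber(position: int, deleted: set[int]) -> int:
    """New number of a wild-type position after the given deletions."""
    if position in deleted:
        raise ValueError(f"position {position} has been deleted")
    return position - sum(1 for d in deleted if d < position)


def renumber_label(label: str, deleted: set[int]) -> str:
    """Renumber a label such as 'E244' or 'E244K'."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognised label: {label!r}")
    old, pos, new = m.group(1), int(m.group(2)), m.group(3)
    return f"{old}{renumber(pos, deleted)}{new}"


print(renumber_label("E244", {199}))   # E243
print(renumber_label("R192", BAND_1))  # R186
```

For band 1, six of the eight deleted positions (V139, G140, D149, T150, V186 and Q187) precede R192, which is why R192 becomes R186.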
[0233] In another embodiment, the variant of SEQ ID NO: 3 comprises
(C) deletion of one or more positions V105, A106 and I107. The
deletions in accordance with (C) may be made in addition to
deletions according to (A) and/or (B).
[0234] The above-described deletions typically reduce the noise
associated with the movement of the target polynucleotide with
respect to, such as through, a transmembrane pore comprising the
monomer.
[0235] As a result the target polynucleotide can be characterised
more accurately.
[0236] In the paragraphs above where different amino acids at a
specific position are separated by the / symbol, the / symbol means
"or". For instance, Q87R/K means Q87R or Q87K.
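The "or" reading of the / symbol can be expanded mechanically. A minimal sketch follows; the pattern and function name are assumptions for illustration.

```python
import re

# e.g. 'Q87R/K' or 'R192D/Q/F/S/T': one wild-type residue and position,
# followed by alternative replacement residues separated by '/'.
OR_NOTATION = re.compile(r"^([A-Z])(\d+)([A-Z](?:/[A-Z])*)$")


def expand_alternatives(label: str) -> list[str]:
    """Expand 'Q87R/K' into ['Q87R', 'Q87K'] ('/' read as 'or')."""
    m = OR_NOTATION.match(label)
    if m is None:
        raise ValueError(f"unrecognised label: {label!r}")
    wt, pos, alts = m.groups()
    return [f"{wt}{pos}{aa}" for aa in alts.split("/")]


print(expand_alternatives("Q87R/K"))         # ['Q87R', 'Q87K']
print(expand_alternatives("R192D/Q/F/S/T"))  # five alternatives at R192
```

Labels in which each element after a / carries its own position number, such as Y51/N55, do not match this pattern and are rejected, since that form uses / in a different sense.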
[0237] The variants of SEQ ID NO: 3 which provide increased capture
of an analyte, such as a polynucleotide may comprise a mutation at
T104, preferably T104R or T104K, a mutation at N91, preferably
N91R, a mutation at E101, preferably E101K/N/Q/T/H, a mutation at
position E44, preferably E44N or E44Q and/or a mutation at position
Q42, preferably Q42K.
[0238] The mutations at different positions in SEQ ID NO: 3 may be
combined in any possible way. In particular, a monomer in the CsgG
pore may comprise one or more mutations that improve accuracy, one
or more mutations that reduce noise and/or one or more mutations
that enhance capture of an analyte.
[0239] The variant of SEQ ID NO: 3 preferably comprises one or more
of the following (i) one or more mutations at the following
positions (i.e. mutations at one or more of the following
positions) N40, D43, E44, S54, S57, Q62, R97, E101, E124, E131,
R142, T150 and R192, such as one or more mutations at the following
positions (i.e. mutations at one or more of the following
positions) N40, D43, E44, S54, S57, Q62, E101, E131 and T150 or
N40, D43, E44, E101 and E131; (ii) mutations at Y51/N55, Y51/F56,
N55/F56 or Y51/N55/F56; (iii) Q42R or Q42K; (iv) K49R; (v) N102R,
N102F, N102Y or N102W; (vi) D149N, D149Q or D149R; (vii) E185N,
E185Q or E185R; (viii) D195N, D195Q or D195R; (ix) E201N, E201Q or
E201R; (x) E203N, E203Q or E203R; and (xi) deletion of one or more
of the following positions F48, K49, P50, Y51, P52, A53, S54, N55,
F56 and S57. The variant may comprise any combination of (i) to
(xi).
[0240] If the variant comprises any one of (i) and (iii) to (xi),
it may further comprise a mutation at one or more of Y51, N55 and
F56, such as at Y51, N55, F56, Y51/N55, Y51/F56, N55/F56 or
Y51/N55/F56.
[0241] In (i), the variant may comprise mutations at any number
and combination of N40, D43, E44, S54, S57, Q62, R97, E101, E124,
E131, R142, T150 and R192. In (i), the variant preferably comprises
one or more mutations at the following positions (i.e. mutations at
one or more of the following positions) N40, D43, E44, S54, S57,
Q62, E101, E131 and T150. In (i), the variant preferably comprises
one or more mutations at the following positions (i.e. mutations at
one or more of the following positions) N40, D43, E44, E101 and
E131. In (i), the variant preferably comprises a mutation at S54
and/or S57. In (i), the variant more preferably comprises a
mutation at (a) S54 and/or S57 and (b) one or more of Y51, N55 and
F56, such as at Y51, N55, F56, Y51/N55, Y51/F56, N55/F56 or
Y51/N55/F56. If S54 and/or S57 are deleted in (xi), it/they cannot
be mutated in (i) and vice versa. In (i), the variant preferably
comprises a mutation at T150, such as T150I. Alternatively the
variant preferably comprises a mutation at (a) T150 and (b) one or
more of Y51, N55 and F56, such as at Y51, N55, F56, Y51/N55,
Y51/F56, N55/F56 or Y51/N55/F56. In (i), the variant preferably
comprises a mutation at Q62, such as Q62R or Q62K. Alternatively
the variant preferably comprises a mutation at (a) Q62 and (b) one
or more of Y51, N55 and F56, such as at Y51, N55, F56, Y51/N55,
Y51/F56, N55/F56 or Y51/N55/F56. The variant may comprise a
mutation at D43, E44, Q62 or any combination thereof, such as D43,
E44, Q62, D43/E44, D43/Q62, E44/Q62 or D43/E44/Q62. Alternatively
the variant preferably comprises a mutation at (a) D43, E44, Q62,
D43/E44, D43/Q62, E44/Q62 or D43/E44/Q62 and (b) one or more of
Y51, N55 and F56, such as at Y51, N55, F56, Y51/N55, Y51/F56,
N55/F56 or Y51/N55/F56.
[0242] In (ii) and elsewhere in this application where different
positions are separated by the / symbol, the / symbol means "and"
such that Y51/N55 is Y51 and N55. In (ii), the variant preferably
comprises mutations at Y51/N55. It has been proposed that the
constriction in CsgG is composed of three stacked concentric rings
formed by the side chains of residues Y51, N55 and F56 (Goyal et
al, 2014, Nature, 516, 250-253). Mutation of these residues in (ii)
may therefore decrease the number of nucleotides contributing to
the current as the polynucleotide moves through the pore and
thereby make it easier to identify a direct relationship between
the observed current (as the polynucleotide moves through the pore)
and the polynucleotide. F56 may be mutated in any of the ways
discussed below with reference to variants and pores useful in the
method of the invention.
[0243] In (v), the variant may comprise N102R, N102F, N102Y or
N102W. The variant preferably comprises (a) N102R, N102F, N102Y or
N102W and (b) a mutation at one or more of Y51, N55 and F56, such
as at Y51, N55, F56, Y51/N55, Y51/F56, N55/F56 or Y51/N55/F56.
[0244] In (xi), any number and combination of K49, P50, Y51, P52,
A53, S54, N55, F56 and S57 may be deleted. Preferably one or more
of K49, P50, Y51, P52, A53, S54, N55 and S57 may be deleted. If any
of Y51, N55 and F56 are deleted in (xi), it/they cannot be mutated
in (ii) and vice versa.
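The rule that a position deleted in (xi) cannot also be mutated in (ii) amounts to an empty set intersection. The sketch below parses the positional form in which / means "and" (as in Y51/N55) and reports any overlap; function names are hypothetical.

```python
import re

POSITION = re.compile(r"^([A-Z])(\d+)$")


def parse_position_set(label: str) -> set[int]:
    """Parse 'Y51/N55/F56' ('/' read as 'and') into {51, 55, 56}."""
    positions = set()
    for part in label.split("/"):
        m = POSITION.match(part)
        if m is None:
            raise ValueError(f"unrecognised position: {part!r}")
        positions.add(int(m.group(2)))
    return positions


def conflicts(mutated: str, deleted: set[int]) -> set[int]:
    """Positions that a variant would both mutate and delete; under the rule
    above this set must be empty."""
    return parse_position_set(mutated) & deleted


print(sorted(conflicts("Y51/N55", {49, 50, 51})))  # [51]
print(sorted(conflicts("N55/F56", {49, 50})))      # []
```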
[0245] In (i), the variant preferably comprises one or more of the
following substitutions N40R, N40K, D43N, D43Q, D43R, D43K, E44N,
E44Q, E44R, E44K, S54P, S57P, Q62R, Q62K, R97N, R97G, R97L, E101N,
E101Q, E101R, E101K, E101F, E101Y, E101W, E124N, E124Q, E124R,
E124K, E124F, E124Y, E124W, E131D, R142E, R142N, T150I, R192E and
R192N, such as one or more of N40R, N40K, D43N, D43Q, D43R, D43K,
E44N, E44Q, E44R, E44K, S54P, S57P, Q62R, Q62K, E101N, E101Q,
E101R, E101K, E101F, E101Y, E101W, E131D and T150I, or one or more
of N40R, N40K, D43N, D43Q, D43R, D43K, E44N, E44Q, E44R, E44K,
E101N, E101Q, E101R, E101K, E101F, E101Y, E101W and E131D. The
variant may comprise any number and combination of these
substitutions. In (i), the variant preferably comprises S54P and/or
S57P. In (i), the variant preferably comprises (a) S54P and/or S57P
and (b) a mutation at one or more of Y51, N55 and F56, such as at
Y51, N55, F56, Y51/N55, Y51/F56, N55/F56 or Y51/N55/F56. The
mutations at one or more of Y51, N55 and F56 may be any of those
discussed below. In (i), the variant preferably comprises F56A/S57P
or S54P/F56A. The variant preferably comprises T150I. Alternatively
the variant preferably comprises (a) T150I and (b) a mutation at
one or more of Y51, N55 and F56, such as at Y51, N55, F56, Y51/N55,
Y51/F56, N55/F56 or Y51/N55/F56.
[0246] In (i), the variant preferably comprises Q62R or Q62K.
Alternatively the variant preferably comprises (a) Q62R or Q62K and
(b) a mutation at one or more of Y51, N55 and F56, such as at Y51,
N55, F56, Y51/N55, Y51/F56, N55/F56 or Y51/N55/F56. The variant may
comprise D43N, E44N, Q62R or Q62K or any combination thereof, such
as D43N, E44N, Q62R, Q62K, D43N/E44N, D43N/Q62R, D43N/Q62K,
E44N/Q62R, E44N/Q62K, D43N/E44N/Q62R or D43N/E44N/Q62K.
Alternatively the variant preferably comprises (a) D43N, E44N,
Q62R, Q62K, D43N/E44N, D43N/Q62R, D43N/Q62K, E44N/Q62R, E44N/Q62K,
D43N/E44N/Q62R or D43N/E44N/Q62K and (b) a mutation at one or more
of Y51, N55 and F56, such as at Y51, N55, F56, Y51/N55, Y51/F56,
N55/F56 or Y51/N55/F56.
[0247] In (i), the variant preferably comprises D43N.
[0248] In (i), the variant preferably comprises E101R, E101S, E101F
or E101N.
[0249] In (i), the variant preferably comprises E124N, E124Q,
E124R, E124K, E124F, E124Y, E124W or E124D, such as E124N.
[0250] In (i), the variant preferably comprises R142E or
R142N.
[0251] In (i), the variant preferably comprises R97N, R97G or
R97L.
[0252] In (i), the variant preferably comprises R192E or
R192N.
[0253] In (ii), the variant preferably comprises F56N/N55Q,
F56N/N55R, F56N/N55K, F56N/N55S, F56N/N55G, F56N/N55A, F56N/N55T,
F56Q/N55Q, F56Q/N55R, F56Q/N55K, F56Q/N55S, F56Q/N55G, F56Q/N55A,
F56Q/N55T, F56R/N55Q, F56R/N55R, F56R/N55K, F56R/N55S, F56R/N55G,
F56R/N55A, F56R/N55T, F56S/N55Q, F56S/N55R, F56S/N55K, F56S/N55S,
F56S/N55G, F56S/N55A, F56S/N55T, F56G/N55Q, F56G/N55R, F56G/N55K,
F56G/N55S, F56G/N55G, F56G/N55A, F56G/N55T, F56A/N55Q, F56A/N55R,
F56A/N55K, F56A/N55S, F56A/N55G, F56A/N55A, F56A/N55T, F56K/N55Q,
F56K/N55R, F56K/N55K, F56K/N55S, F56K/N55G, F56K/N55A, F56K/N55T,
F56N/Y51L, F56N/Y51V, F56N/Y51A, F56N/Y51N, F56N/Y51Q, F56N/Y51S,
F56N/Y51G, F56Q/Y51L, F56Q/Y51V, F56Q/Y51A, F56Q/Y51N, F56Q/Y51Q,
F56Q/Y51S, F56Q/Y51G, F56R/Y51L, F56R/Y51V, F56R/Y51A, F56R/Y51N,
F56R/Y51Q, F56R/Y51S, F56R/Y51G, F56S/Y51L, F56S/Y51V, F56S/Y51A,
F56S/Y51N, F56S/Y51Q, F56S/Y51S, F56S/Y51G, F56G/Y51L, F56G/Y51V,
F56G/Y51A, F56G/Y51N, F56G/Y51Q, F56G/Y51S, F56G/Y51G, F56A/Y51L,
F56A/Y51V, F56A/Y51A, F56A/Y51N, F56A/Y51Q, F56A/Y51S, F56A/Y51G,
F56K/Y51L, F56K/Y51V, F56K/Y51A, F56K/Y51N, F56K/Y51Q, F56K/Y51S,
F56K/Y51G, N55Q/Y51L, N55Q/Y51V, N55Q/Y51A, N55Q/Y51N, N55Q/Y51Q,
N55Q/Y51S, N55Q/Y51G, N55R/Y51L, N55R/Y51V, N55R/Y51A, N55R/Y51N,
N55R/Y51Q, N55R/Y51S, N55R/Y51G, N55K/Y51L, N55K/Y51V, N55K/Y51A,
N55K/Y51N, N55K/Y51Q, N55K/Y51S, N55K/Y51G, N55S/Y51L, N55S/Y51V,
N55S/Y51A, N55S/Y51N, N55S/Y51Q, N55S/Y51S, N55S/Y51G, N55G/Y51L,
N55G/Y51V, N55G/Y51A, N55G/Y51N, N55G/Y51Q, N55G/Y51S, N55G/Y51G,
N55A/Y51L, N55A/Y51V, N55A/Y51A, N55A/Y51N, N55A/Y51Q, N55A/Y51S,
N55A/Y51G, N55T/Y51L, N55T/Y51V, N55T/Y51A, N55T/Y51N, N55T/Y51Q,
N55T/Y51S, N55T/Y51G, F56N/N55Q/Y51L, F56N/N55Q/Y51V,
F56N/N55Q/Y51A, F56N/N55Q/Y51N, F56N/N55Q/Y51Q, F56N/N55Q/Y51S,
F56N/N55Q/Y51G, F56N/N55R/Y51L, F56N/N55R/Y51V, F56N/N55R/Y51A,
F56N/N55R/Y51N, F56N/N55R/Y51Q, F56N/N55R/Y51S, F56N/N55R/Y51G,
F56N/N55K/Y51L, F56N/N55K/Y51V, F56N/N55K/Y51A, F56N/N55K/Y51N,
F56N/N55K/Y51Q, F56N/N55K/Y51S, F56N/N55K/Y51G, F56N/N55S/Y51L,
F56N/N55S/Y51V, F56N/N55S/Y51A, F56N/N55S/Y51N, F56N/N55S/Y51Q,
F56N/N55S/Y51S, F56N/N55S/Y51G, F56N/N55G/Y51L, F56N/N55G/Y51V,
F56N/N55G/Y51A, F56N/N55G/Y51N, F56N/N55G/Y51Q, F56N/N55G/Y51S,
F56N/N55G/Y51G, F56N/N55A/Y51L, F56N/N55A/Y51V, F56N/N55A/Y51A,
F56N/N55A/Y51N, F56N/N55A/Y51Q, F56N/N55A/Y51S, F56N/N55A/Y51G,
F56N/N55T/Y51L, F56N/N55T/Y51V, F56N/N55T/Y51A, F56N/N55T/Y51N,
F56N/N55T/Y51Q, F56N/N55T/Y51S, F56N/N55T/Y51G, F56Q/N55Q/Y51L,
F56Q/N55Q/Y51V, F56Q/N55Q/Y51A, F56Q/N55Q/Y51N, F56Q/N55Q/Y51Q,
F56Q/N55Q/Y51S, F56Q/N55Q/Y51G, F56Q/N55R/Y51L, F56Q/N55R/Y51V,
F56Q/N55R/Y51A, F56Q/N55R/Y51N, F56Q/N55R/Y51Q, F56Q/N55R/Y51S,
F56Q/N55R/Y51G, F56Q/N55K/Y51L, F56Q/N55K/Y51V, F56Q/N55K/Y51A,
F56Q/N55K/Y51N, F56Q/N55K/Y51Q, F56Q/N55K/Y51S, F56Q/N55K/Y51G,
F56Q/N55S/Y51L, F56Q/N55S/Y51V, F56Q/N55S/Y51A, F56Q/N55S/Y51N,
F56Q/N55S/Y51Q, F56Q/N55S/Y51S, F56Q/N55S/Y51G, F56Q/N55G/Y51L,
F56Q/N55G/Y51V, F56Q/N55G/Y51A, F56Q/N55G/Y51N, F56Q/N55G/Y51Q,
F56Q/N55G/Y51S, F56Q/N55G/Y51G, F56Q/N55A/Y51L, F56Q/N55A/Y51V,
F56Q/N55A/Y51A, F56Q/N55A/Y51N, F56Q/N55A/Y51Q, F56Q/N55A/Y51S,
F56Q/N55A/Y51G, F56Q/N55T/Y51L, F56Q/N55T/Y51V, F56Q/N55T/Y51A,
F56Q/N55T/Y51N, F56Q/N55T/Y51Q, F56Q/N55T/Y51S, F56Q/N55T/Y51G,
F56R/N55Q/Y51L, F56R/N55Q/Y51V, F56R/N55Q/Y51A, F56R/N55Q/Y51N,
F56R/N55Q/Y51Q, F56R/N55Q/Y51S, F56R/N55Q/Y51G, F56R/N55R/Y51L,
F56R/N55R/Y51V, F56R/N55R/Y51A, F56R/N55R/Y51N, F56R/N55R/Y51Q,
F56R/N55R/Y51S, F56R/N55R/Y51G, F56R/N55K/Y51L, F56R/N55K/Y51V,
F56R/N55K/Y51A, F56R/N55K/Y51N, F56R/N55K/Y51Q, F56R/N55K/Y51S,
F56R/N55K/Y51G, F56R/N55S/Y51L, F56R/N55S/Y51V, F56R/N55S/Y51A,
F56R/N55S/Y51N, F56R/N55S/Y51Q, F56R/N55S/Y51S, F56R/N55S/Y51G,
F56R/N55G/Y51L, F56R/N55G/Y51V, F56R/N55G/Y51A, F56R/N55G/Y51N,
F56R/N55G/Y51Q, F56R/N55G/Y51S, F56R/N55G/Y51G, F56R/N55A/Y51L,
F56R/N55A/Y51V, F56R/N55A/Y51A, F56R/N55A/Y51N, F56R/N55A/Y51Q,
F56R/N55A/Y51S, F56R/N55A/Y51G, F56R/N55T/Y51L, F56R/N55T/Y51V,
F56R/N55T/Y51A, F56R/N55T/Y51N, F56R/N55T/Y51Q, F56R/N55T/Y51S,
F56R/N55T/Y51G, F56S/N55Q/Y51L, F56S/N55Q/Y51V, F56S/N55Q/Y51A,
F56S/N55Q/Y51N, F56S/N55Q/Y51Q, F56S/N55Q/Y51S, F56S/N55Q/Y51G,
F56S/N55R/Y51L, F56S/N55R/Y51V, F56S/N55R/Y51A, F56S/N55R/Y51N,
F56S/N55R/Y51Q, F56S/N55R/Y51S, F56S/N55R/Y51G, F56S/N55K/Y51L,
F56S/N55K/Y51V, F56S/N55K/Y51A, F56S/N55K/Y51N, F56S/N55K/Y51Q,
F56S/N55K/Y51S, F56S/N55K/Y51G, F56S/N55S/Y51L, F56S/N55S/Y51V,
F56S/N55S/Y51A, F56S/N55S/Y51N, F56S/N55S/Y51Q, F56S/N55S/Y51S,
F56S/N55S/Y51G, F56S/N55G/Y51L, F56S/N55G/Y51V, F56S/N55G/Y51A,
F56S/N55G/Y51N, F56S/N55G/Y51Q, F56S/N55G/Y51S, F56S/N55G/Y51G,
F56S/N55A/Y51L, F56S/N55A/Y51V, F56S/N55A/Y51A, F56S/N55A/Y51N,
F56S/N55A/Y51Q, F56S/N55A/Y51S, F56S/N55A/Y51G, F56S/N55T/Y51L,
F56S/N55T/Y51V, F56S/N55T/Y51A, F56S/N55T/Y51N, F56S/N55T/Y51Q,
F56S/N55T/Y51S, F56S/N55T/Y51G, F56G/N55Q/Y51L, F56G/N55Q/Y51V,
F56G/N55Q/Y51A, F56G/N55Q/Y51N, F56G/N55Q/Y51Q, F56G/N55Q/Y51S,
F56G/N55Q/Y51G, F56G/N55R/Y51L, F56G/N55R/Y51V, F56G/N55R/Y51A,
F56G/N55R/Y51N, F56G/N55R/Y51Q, F56G/N55R/Y51S, F56G/N55R/Y51G,
F56G/N55K/Y51L, F56G/N55K/Y51V, F56G/N55K/Y51A, F56G/N55K/Y51N,
F56G/N55K/Y51Q, F56G/N55K/Y51S, F56G/N55K/Y51G, F56G/N55S/Y51L,
F56G/N55S/Y51V, F56G/N55S/Y51A, F56G/N55S/Y51N, F56G/N55S/Y51Q,
F56G/N55S/Y51S, F56G/N55S/Y51G, F56G/N55G/Y51L, F56G/N55G/Y51V,
F56G/N55G/Y51A, F56G/N55G/Y51N, F56G/N55G/Y51Q, F56G/N55G/Y51S,
F56G/N55G/Y51G, F56G/N55A/Y51L, F56G/N55A/Y51V, F56G/N55A/Y51A,
F56G/N55A/Y51N, F56G/N55A/Y51Q, F56G/N55A/Y51S, F56G/N55A/Y51G,
F56G/N55T/Y51L, F56G/N55T/Y51V, F56G/N55T/Y51A, F56G/N55T/Y51N,
F56G/N55T/Y51Q, F56G/N55T/Y51S, F56G/N55T/Y51G, F56A/N55Q/Y51L,
F56A/N55Q/Y51V, F56A/N55Q/Y51A, F56A/N55Q/Y51N, F56A/N55Q/Y51Q,
F56A/N55Q/Y51S, F56A/N55Q/Y51G, F56A/N55R/Y51L, F56A/N55R/Y51V,
F56A/N55R/Y51A, F56A/N55R/Y51N, F56A/N55R/Y51Q, F56A/N55R/Y51S,
F56A/N55R/Y51G, F56A/N55K/Y51L, F56A/N55K/Y51V, F56A/N55K/Y51A,
F56A/N55K/Y51N, F56A/N55K/Y51Q, F56A/N55K/Y51S, F56A/N55K/Y51G,
F56A/N55S/Y51L, F56A/N55S/Y51V, F56A/N55S/Y51A, F56A/N55S/Y51N,
F56A/N55S/Y51Q, F56A/N55S/Y51S, F56A/N55S/Y51G, F56A/N55G/Y51L,
F56A/N55G/Y51V, F56A/N55G/Y51A, F56A/N55G/Y51N, F56A/N55G/Y51Q,
F56A/N55G/Y51S, F56A/N55G/Y51G, F56A/N55A/Y51L, F56A/N55A/Y51V,
F56A/N55A/Y51A, F56A/N55A/Y51N, F56A/N55A/Y51Q, F56A/N55A/Y51S,
F56A/N55A/Y51G, F56A/N55T/Y51L, F56A/N55T/Y51V, F56A/N55T/Y51A,
F56A/N55T/Y51N, F56A/N55T/Y51Q, F56A/N55T/Y51S, F56A/N55T/Y51G,
F56K/N55Q/Y51L, F56K/N55Q/Y51V, F56K/N55Q/Y51A, F56K/N55Q/Y51N,
F56K/N55Q/Y51Q, F56K/N55Q/Y51S, F56K/N55Q/Y51G, F56K/N55R/Y51L,
F56K/N55R/Y51V, F56K/N55R/Y51A, F56K/N55R/Y51N, F56K/N55R/Y51Q,
F56K/N55R/Y51S, F56K/N55R/Y51G, F56K/N55K/Y51L, F56K/N55K/Y51V,
F56K/N55K/Y51A, F56K/N55K/Y51N, F56K/N55K/Y51Q, F56K/N55K/Y51S,
F56K/N55K/Y51G, F56K/N55S/Y51L, F56K/N55S/Y51V, F56K/N55S/Y51A,
F56K/N55S/Y51N, F56K/N55S/Y51Q, F56K/N55S/Y51S, F56K/N55S/Y51G,
F56K/N55G/Y51L, F56K/N55G/Y51V, F56K/N55G/Y51A, F56K/N55G/Y51N,
F56K/N55G/Y51Q, F56K/N55G/Y51S, F56K/N55G/Y51G, F56K/N55A/Y51L,
F56K/N55A/Y51V, F56K/N55A/Y51A, F56K/N55A/Y51N, F56K/N55A/Y51Q,
F56K/N55A/Y51S, F56K/N55A/Y51G, F56K/N55T/Y51L, F56K/N55T/Y51V,
F56K/N55T/Y51A, F56K/N55T/Y51N, F56K/N55T/Y51Q, F56K/N55T/Y51S,
F56K/N55T/Y51G, F56E/N55R, F56E/N55K, F56D/N55R, F56D/N55K,
F56R/N55E, F56R/N55D, F56K/N55E or F56K/N55D.
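The enumeration in [0253] is systematic. The Python sketch below regenerates it on the assumption, inferred by inspecting the list rather than stated in the text, that the double and triple combinations are the full cross-product of seven replacement residues at each of F56, N55 and Y51, followed by eight oppositely charged F56/N55 pairs. A helper of this kind may be useful for checking whether a candidate variant falls within the listed set.

```python
from itertools import product

# Replacement residues read off the list in [0253], in the order they appear.
# Assumption: the list is the complete cross-product of these sets.
F56_OPTIONS = "NQRSGAK"
N55_OPTIONS = "QRKSGAT"
Y51_OPTIONS = "LVANQSG"

# Oppositely charged F56/N55 pairs listed at the end of [0253].
CHARGE_PAIRS = [("E", "R"), ("E", "K"), ("D", "R"), ("D", "K"),
                ("R", "E"), ("R", "D"), ("K", "E"), ("K", "D")]


def listed_combinations():
    """Return the [0253] combinations in the order they are written."""
    combos = [f"F56{f}/N55{n}" for f, n in product(F56_OPTIONS, N55_OPTIONS)]
    combos += [f"F56{f}/Y51{y}" for f, y in product(F56_OPTIONS, Y51_OPTIONS)]
    combos += [f"N55{n}/Y51{y}" for n, y in product(N55_OPTIONS, Y51_OPTIONS)]
    combos += [f"F56{f}/N55{n}/Y51{y}"
               for f, n, y in product(F56_OPTIONS, N55_OPTIONS, Y51_OPTIONS)]
    combos += [f"F56{f}/N55{n}" for f, n in CHARGE_PAIRS]
    return combos


combos = listed_combinations()
print(len(combos))                 # 498
print("F56Q/N55S/Y51S" in combos)  # True
```

Three sets of 7 x 7 pairs, one set of 7 x 7 x 7 triples and the 8 charge pairs give 147 + 343 + 8 = 498 entries.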
[0254] In (ii), the variant preferably comprises Y51R/F56Q,
Y51N/F56N, Y51M/F56Q, Y51L/F56Q, Y51I/F56Q, Y51V/F56Q, Y51A/F56Q,
Y51P/F56Q, Y51G/F56Q, Y51C/F56Q, Y51Q/F56Q, Y51N/F56Q, Y51S/F56Q,
Y51E/F56Q, Y51D/F56Q, Y51K/F56Q or Y51H/F56Q.
[0255] In (ii), the variant preferably comprises Y51T/F56Q,
Y51Q/F56Q or Y51A/F56Q.
[0256] In (ii), the variant preferably comprises Y51T/F56F,
Y51T/F56M, Y51T/F56L, Y51T/F56I, Y51T/F56V, Y51T/F56A, Y51T/F56P,
Y51T/F56G, Y51T/F56C, Y51T/F56Q, Y51T/F56N, Y51T/F56T, Y51T/F56S,
Y51T/F56E, Y51T/F56D, Y51T/F56K, Y51T/F56H or Y51T/F56R.
[0257] In (ii), the variant preferably comprises Y51T/N55Q,
Y51T/N55S or Y51T/N55A.
[0258] In (ii), the variant preferably comprises Y51A/F56F,
Y51A/F56L, Y51A/F56I, Y51A/F56V, Y51A/F56A, Y51A/F56P, Y51A/F56G,
Y51A/F56C, Y51A/F56Q, Y51A/F56N, Y51A/F56T, Y51A/F56S, Y51A/F56E,
Y51A/F56D, Y51A/F56K, Y51A/F56H or Y51A/F56R.
[0259] In (ii), the variant preferably comprises Y51C/F56A,
Y51E/F56A, Y51D/F56A, Y51K/F56A, Y51H/F56A, Y51Q/F56A, Y51N/F56A,
Y51S/F56A, Y51P/F56A or Y51V/F56A.
[0260] In (xi), the variant preferably comprises deletion of
Y51/P52, Y51/P52/A53, P50 to P52, P50 to A53, K49 to Y51, K49 to
A53 and replacement with a single proline (P), K49 to S54 and
replacement with a single P, Y51 to A53, Y51 to S54, N55/F56, N55
to S57, N55/F56 and replacement with a single P, N55/F56 and
replacement with a single glycine (G), N55/F56 and replacement with
a single alanine (A), N55/F56 and replacement with a single P and
Y51N, N55/F56 and replacement with a single P and Y51Q, N55/F56 and
replacement with a single P and Y51S, N55/F56 and replacement with
a single G and Y51N, N55/F56 and replacement with a single G and
Y51Q, N55/F56 and replacement with a single G and Y51S, N55/F56 and
replacement with a single A and Y51N, N55/F56 and replacement with
a single A and Y51Q or N55/F56 and replacement with a single A and
Y51S.
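The deletions in [0260] can be made concrete on the short stretch of SEQ ID NO: 3 that the text names residue by residue (K49, P50, Y51, P52, A53, S54, N55, F56, S57). The Python sketch below uses hypothetical helper names and is not a method taken from the application. It works on that fragment only, because the full SEQ ID NO: 3 sequence is not reproduced here. Point substitutions are applied before deletions so that positions still refer to the reference numbering.

```python
# Residues 49-57 of SEQ ID NO: 3, as named individually in the text.
FRAGMENT_START = 49
FRAGMENT = "KPYPASNFS"


def substitute(seq, start, pos, wt, new):
    """Point substitution using reference numbering (e.g. Y51N)."""
    i = pos - start
    if not 0 <= i < len(seq) or seq[i] != wt:
        raise ValueError(f"expected {wt}{pos} in the fragment")
    return seq[:i] + new + seq[i + 1:]


def delete_and_replace(seq, start, first, last, replacement=""):
    """Delete residues first..last (inclusive) and insert `replacement`."""
    i, j = first - start, last - start + 1
    if not 0 <= i < j <= len(seq):
        raise ValueError("deletion extends outside the fragment")
    return seq[:i] + replacement + seq[j:]


# "N55/F56 and replacement with a single P and Y51N"
variant = substitute(FRAGMENT, FRAGMENT_START, 51, "Y", "N")
variant = delete_and_replace(variant, FRAGMENT_START, 55, 56, "P")
print(variant)  # KPNPASPS

# "K49 to A53 and replacement with a single proline (P)"
print(delete_and_replace(FRAGMENT, FRAGMENT_START, 49, 53, "P"))  # PSNFS
```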
[0261] The variant more preferably comprises D195N/E203N,
D195Q/E203N, D195N/E203Q, D195Q/E203Q, E201N/E203N, E201Q/E203N,
E201N/E203Q, E201Q/E203Q, E185N/E203Q, E185Q/E203Q, E185N/E203N,
E185Q/E203N, D195N/E201N/E203N, D195Q/E201N/E203N,
D195N/E201Q/E203N, D195N/E201N/E203Q, D195Q/E201Q/E203N,
D195Q/E201N/E203Q, D195N/E201Q/E203Q, D195Q/E201Q/E203Q,
D149N/E201N, D149Q/E201N, D149N/E201Q, D149Q/E201Q,
D149N/E201N/D195N, D149Q/E201N/D195N, D149N/E201Q/D195N,
D149N/E201N/D195Q, D149Q/E201Q/D195N, D149Q/E201N/D195Q,
D149N/E201Q/D195Q, D149Q/E201Q/D195Q, D149N/E203N, D149Q/E203N,
D149N/E203Q, D149Q/E203Q, D149N/E185N/E201N, D149Q/E185N/E201N,
D149N/E185Q/E201N, D149N/E185N/E201Q, D149Q/E185Q/E201N,
D149Q/E185N/E201Q, D149N/E185Q/E201Q, D149Q/E185Q/E201Q,
D149N/E185N/E203N, D149Q/E185N/E203N, D149N/E185Q/E203N,
D149N/E185N/E203Q, D149Q/E185Q/E203N, D149Q/E185N/E203Q,
D149N/E185Q/E203Q, D149Q/E185Q/E203Q, D149N/E185N/E201N/E203N,
D149Q/E185N/E201N/E203N, D149N/E185Q/E201N/E203N,
D149N/E185N/E201Q/E203N, D149N/E185N/E201N/E203Q,
D149Q/E185Q/E201N/E203N, D149Q/E185N/E201Q/E203N,
D149Q/E185N/E201N/E203Q, D149N/E185Q/E201Q/E203N,
D149N/E185Q/E201N/E203Q, D149N/E185N/E201Q/E203Q,
D149Q/E185Q/E201Q/E203Q, D149Q/E185Q/E201N/E203Q,
D149Q/E185N/E201Q/E203Q, D149N/E185Q/E201Q/E203Q,
D149Q/E185Q/E201Q/E203N, D149N/E185N/D195N/E201N/E203N,
D149Q/E185N/D195N/E201N/E203N, D149N/E185Q/D195N/E201N/E203N,
D149N/E185N/D195Q/E201N/E203N, D149N/E185N/D195N/E201Q/E203N,
D149N/E185N/D195N/E201N/E203Q, D149Q/E185Q/D195N/E201N/E203N,
D149Q/E185N/D195Q/E201N/E203N, D149Q/E185N/D195N/E201Q/E203N,
D149Q/E185N/D195N/E201N/E203Q, D149N/E185Q/D195Q/E201N/E203N,
D149N/E185Q/D195N/E201Q/E203N, D149N/E185Q/D195N/E201N/E203Q,
D149N/E185N/D195Q/E201Q/E203N, D149N/E185N/D195Q/E201N/E203Q,
D149N/E185N/D195N/E201Q/E203Q, D149Q/E185Q/D195Q/E201N/E203N,
D149Q/E185Q/D195N/E201Q/E203N, D149Q/E185Q/D195N/E201N/E203Q,
D149Q/E185N/D195Q/E201Q/E203N, D149Q/E185N/D195Q/E201N/E203Q,
D149Q/E185N/D195N/E201Q/E203Q, D149N/E185Q/D195Q/E201Q/E203N,
D149N/E185Q/D195Q/E201N/E203Q, D149N/E185Q/D195N/E201Q/E203Q,
D149N/E185N/D195Q/E201Q/E203Q, D149Q/E185Q/D195Q/E201Q/E203N,
D149Q/E185Q/D195Q/E201N/E203Q, D149Q/E185Q/D195N/E201Q/E203Q,
D149Q/E185N/D195Q/E201Q/E203Q, D149N/E185Q/D195Q/E201Q/E203Q,
D149Q/E185Q/D195Q/E201Q/E203Q, D149N/E185R/E201N/E203N,
D149Q/E185R/E201N/E203N, D149N/E185R/E201Q/E203N,
D149N/E185R/E201N/E203Q, D149Q/E185R/E201Q/E203N,
D149Q/E185R/E201N/E203Q, D149N/E185R/E201Q/E203Q,
D149Q/E185R/E201Q/E203Q, D149R/E185N/E201N/E203N,
D149R/E185Q/E201N/E203N, D149R/E185N/E201Q/E203N,
D149R/E185N/E201N/E203Q, D149R/E185Q/E201Q/E203N,
D149R/E185Q/E201N/E203Q, D149R/E185N/E201Q/E203Q,
D149R/E185Q/E201Q/E203Q, D149R/E185N/D195N/E201N/E203N,
D149R/E185Q/D195N/E201N/E203N, D149R/E185N/D195Q/E201N/E203N,
D149R/E185N/D195N/E201Q/E203N, D149R/E185Q/D195N/E201N/E203Q,
D149R/E185Q/D195Q/E201N/E203N, D149R/E185Q/D195N/E201Q/E203N,
D149R/E185Q/D195N/E201N/E203Q, D149R/E185N/D195Q/E201Q/E203N,
D149R/E185N/D195Q/E201N/E203Q, D149R/E185N/D195N/E201Q/E203Q,
D149R/E185Q/D195Q/E201Q/E203N, D149R/E185Q/D195Q/E201N/E203Q,
D149R/E185Q/D195N/E201Q/E203Q, D149R/E185N/D195Q/E201Q/E203Q,
D149R/E185Q/D195Q/E201Q/E203Q, D149N/E185R/D195N/E201N/E203N,
D149Q/E185R/D195N/E201N/E203N, D149N/E185R/D195Q/E201N/E203N,
D149N/E185R/D195N/E201Q/E203N, D149N/E185R/D195N/E201N/E203Q,
D149Q/E185R/D195Q/E201N/E203N, D149Q/E185R/D195N/E201Q/E203N,
D149Q/E185R/D195N/E201N/E203Q, D149N/E185R/D195Q/E201Q/E203N,
D149N/E185R/D195Q/E201N/E203Q, D149N/E185R/D195N/E201Q/E203Q,
D149Q/E185R/D195Q/E201Q/E203N, D149Q/E185R/D195Q/E201N/E203Q,
D149Q/E185R/D195N/E201Q/E203Q, D149N/E185R/D195Q/E201Q/E203Q,
D149Q/E185R/D195Q/E201Q/E203Q, D149N/E185R/D195N/E201R/E203N,
D149Q/E185R/D195N/E201R/E203N, D149N/E185R/D195Q/E201R/E203N,
D149N/E185R/D195N/E201R/E203Q, D149Q/E185R/D195Q/E201R/E203N,
D149Q/E185R/D195N/E201R/E203Q, D149N/E185R/D195Q/E201R/E203Q,
D149Q/E185R/D195Q/E201R/E203Q, E131D/K49R, E101N/N102F,
E101N/N102Y, E101N/N102W, E101F/N102F, E101F/N102Y, E101F/N102W,
E101Y/N102F, E101Y/N102Y, E101Y/N102W, E101W/N102F, E101W/N102Y,
E101W/N102W, E101N/N102R, E101F/N102R, E101Y/N102R or
E101W/N102R.
[0262] Preferred variants of the invention which form pores in
which fewer nucleotides contribute to the current as the
polynucleotide moves through the pore comprise Y51A/F56A,
Y51A/F56N, Y51I/F56A, Y51L/F56A, Y51T/F56A, Y51I/F56N, Y51L/F56N
or Y51T/F56N or more preferably Y51I/F56A, Y51L/F56A or Y51T/F56A.
As discussed above, this makes it easier to identify a direct
relationship between the observed current (as the polynucleotide
moves through the pore) and the polynucleotide.
[0263] Preferred variants which form pores displaying an increased
range comprise mutations at the following positions:
[0264] Y51, F56, D149, E185, E201 and E203;
[0265] N55 and F56;
[0266] Y51 and F56;
[0267] Y51, N55 and F56; or
[0268] F56 and N102.
[0269] Preferred variants which form pores displaying an increased
range comprise:
[0270] Y51N, F56A, D149N, E185R, E201N and E203N;
[0271] N55S and F56Q;
[0272] Y51A and F56A;
[0273] Y51A and F56N;
[0274] Y51I and F56A;
[0275] Y51L and F56A;
[0276] Y51T and F56A;
[0277] Y51I and F56N;
[0278] Y51L and F56N;
[0279] Y51T and F56N;
[0280] Y51T and F56Q;
[0281] Y51A, N55S and F56A;
[0282] Y51A, N55S and F56N;
[0283] Y51T, N55S and F56Q; or
[0284] F56Q and N102R.
[0285] Preferred variants which form pores in which fewer
nucleotides contribute to the current as the polynucleotide moves
through the pore comprise mutations at the following positions:
[0286] N55 and F56, such as N55X and F56Q, wherein X is any amino
acid; or
[0287] Y51 and F56, such as Y51X and F56Q, wherein X is any amino
acid.
[0288] Particularly preferred variants comprise Y51A and F56Q.
[0289] Preferred variants which form pores displaying an increased
throughput comprise mutations at the following positions:
[0290] D149, E185 and E203;
[0291] D149, E185, E201 and E203; or
[0292] D149, E185, D195, E201 and E203.
[0293] Preferred variants which form pores displaying an increased
throughput comprise:
[0294] D149N, E185N and E203N;
[0295] D149N, E185N, E201N and E203N;
[0296] D149N, E185R, D195N, E201N and E203N; or
[0297] D149N, E185R, D195N, E201R and E203N.
[0298] Preferred variants which form pores in which capture of the
polynucleotide is increased comprise the following mutations:
[0299] D43N/Y51T/F56Q;
[0300] E44N/Y51T/F56Q;
[0301] D43N/E44N/Y51T/F56Q;
[0302] Y51T/F56Q/Q62R;
[0303] D43N/Y51T/F56Q/Q62R;
[0304] E44N/Y51T/F56Q/Q62R; or
[0305] D43N/E44N/Y51T/F56Q/Q62R.
[0306] Preferred variants comprise the following mutations:
[0307] D149R/E185R/E201R/E203R or
Y51T/F56Q/D149R/E185R/E201R/E203R;
[0308] D149N/E185N/E201N/E203N or
Y51T/F56Q/D149N/E185N/E201N/E203N;
[0309] E201R/E203R or Y51T/F56Q/E201R/E203R;
[0310] E201N/E203R or Y51T/F56Q/E201N/E203R;
[0311] E203R or Y51T/F56Q/E203R;
[0312] E203N or Y51T/F56Q/E203N;
[0313] E201R or Y51T/F56Q/E201R;
[0314] E201N or Y51T/F56Q/E201N;
[0315] E185R or Y51T/F56Q/E185R;
[0316] E185N or Y51T/F56Q/E185N;
[0317] D149R or Y51T/F56Q/D149R;
[0318] D149N or Y51T/F56Q/D149N;
[0319] R142E or Y51T/F56Q/R142E;
[0320] R142N or Y51T/F56Q/R142N;
[0321] R192E or Y51T/F56Q/R192E; or
[0322] R192N or Y51T/F56Q/R192N.
[0323] Preferred variants comprise the following mutations:
[0324] Y51A/F56Q/E101N/N102R;
[0325] Y51A/F56Q/R97N/N102G;
[0326] Y51A/F56Q/R97N/N102R;
[0327] Y51A/F56Q/R97N;
[0328] Y51A/F56Q/R97G;
[0329] Y51A/F56Q/R97L;
[0330] Y51A/F56Q/N102R;
[0331] Y51A/F56Q/N102F;
[0332] Y51A/F56Q/N102G;
[0333] Y51A/F56Q/E101R;
[0334] Y51A/F56Q/E101F;
[0335] Y51A/F56Q/E101N; or
[0336] Y51A/F56Q/E101G.
[0337] The variant preferably further comprises a mutation at T150.
A preferred variant which forms a pore displaying an increased
insertion comprises T150I. A mutation at T150, such as T150I, may
be combined with any of the mutations or combinations of mutations
discussed above.
[0338] A preferred variant of SEQ ID NO: 3 comprises (a) R97W and
(b) a mutation at Y51 and/or F56. A preferred variant of SEQ ID NO:
3 comprises (a) R97W and (b) Y51R/H/K/D/E/S/T/N/Q/C/G/P/A/V/I/L/M
and/or F56R/H/K/D/E/S/T/N/Q/C/G/P/A/V/I/L/M. A preferred variant of
SEQ ID NO: 3 comprises (a) R97W and (b) Y51L/V/A/N/Q/S/G and/or
F56A/Q/N. A preferred variant of SEQ ID NO: 3 comprises (a) R97W
and (b) Y51A and/or F56Q. A preferred variant of SEQ ID NO: 3
comprises R97W, Y51A and F56Q.
[0339] The variant of SEQ ID NO: 3 preferably comprises a mutation
at R192. The variant preferably comprises R192D/Q/F/S/T/N/E,
R192D/Q/F/S/T or R192D/Q. A preferred variant of SEQ ID NO: 3
comprises (a) R97W, (b) a mutation at Y51 and/or F56 and (c) a
mutation at R192, such as R192D/Q/F/S/T/N/E, R192D/Q/F/S/T or
R192D/Q. A preferred variant of SEQ ID NO: 3 comprises (a) R97W,
(b) Y51R/H/K/D/E/S/T/N/Q/C/G/P/A/V/I/L/M and/or
F56R/H/K/D/E/S/T/N/Q/C/G/P/A/V/I/L/M and (c) a mutation at R192,
such as R192D/Q/F/S/T/N/E, R192D/Q/F/S/T or R192D/Q. A preferred
variant of SEQ ID NO: 3 comprises (a) R97W, (b) Y51L/V/A/N/Q/S/G
and/or F56A/Q/N and (c) a mutation at R192, such as
R192D/Q/F/S/T/N/E, R192D/Q/F/S/T or R192D/Q. A preferred variant of
SEQ ID NO: 3 comprises (a) R97W, (b) Y51A and/or F56Q and (c) a
mutation at R192, such as R192D/Q/F/S/T/N/E, R192D/Q/F/S/T or
R192D/Q. A preferred variant of SEQ ID NO: 3 comprises R97W, Y51A,
F56Q and R192D/Q/F/S/T or R192D/Q. A preferred variant of SEQ ID
NO: 3 comprises R97W, Y51A, F56Q and R192D. A preferred variant of
SEQ ID NO: 3 comprises R97W, Y51A, F56Q and R192Q. In the
paragraphs above, where different amino acids at a specific position
are separated by the / symbol, the / symbol means "or". For
instance, R192D/Q means R192D or R192Q.
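In this text the / symbol joins substitutions (as in Y51T/F56Q) and, under the convention just stated, also lists alternative residues at one position (as in R192D/Q). Shorthand such as R97W/Y51A/F56Q/R192D/Q can therefore be expanded mechanically. The Python sketch below, with hypothetical function names, assumes that a bare one-letter code after / is an alternative at the preceding position, while a full token such as Y51A starts a new substitution. It handles only the "and" case, so "and/or" phrasing, deletions and garbled entries are out of scope.

```python
import re
from itertools import product

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse(spec):
    """Split a mutation string into groups of alternatives.

    A full token such as 'Y51A' starts a new substitution (joined by "and");
    a bare one-letter code such as 'Q' is an alternative ("or") at the
    position of the preceding substitution.
    """
    groups = []
    for token in spec.replace(" ", "").split("/"):
        if SUBSTITUTION.fullmatch(token):
            groups.append([token])
        elif re.fullmatch(r"[A-Z]", token) and groups:
            position = groups[-1][0][:-1]  # e.g. 'R192' from 'R192D'
            groups[-1].append(position + token)
        else:
            raise ValueError(f"cannot parse {token!r} in {spec!r}")
    return groups


def expand(spec):
    """List every concrete combination covered by a shorthand string."""
    return ["/".join(choice) for choice in product(*parse(spec))]


print(expand("R192D/Q"))  # ['R192D', 'R192Q']
print(expand("R97W/Y51A/F56Q/R192D/Q"))
```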
[0340] Any of the preferred variants of SEQ ID NO: 3 described
above may further comprise a mutation at R93. A preferred variant
of SEQ ID NO: 3 comprises (a) R93W and (b) a mutation at Y51
and/or F56, preferably Y51A and F56Q.
[0341] Any of the above preferred variants of SEQ ID NO: 3 may
comprise a K94N/Q mutation. Any of the above preferred variants of
SEQ ID NO: 3 may comprise a F191T mutation.
[0342] The CsgG monomer may be modified to facilitate attachment to
the CsgF peptide. For example a cysteine residue may be introduced
at one or more of the positions corresponding to positions 132,
133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183,
185, 187, 189, 191, 201, 203, 205, 207 and 209 of SEQ ID NO: 3,
and/or at any one of the positions identified in Table 4 as being
predicted to make contact with CsgF, to facilitate covalent
attachment to CsgF. As an alternative or addition to covalent
attachment via cysteine residues, the pore may be stabilised by
hydrophobic interactions or electrostatic interactions. To
facilitate such interactions, a non-native reactive or
photoreactive amino acid may be introduced at a position
corresponding to one or more of positions 132, 133, 136, 138, 140,
142, 144, 145, 147, 149, 151, 153, 155, 183, 185, 187, 189, 191,
201, 203, 205, 207 and 209 of SEQ ID NO: 3, and/or at any one of
the positions identified in Table 4 as being predicted to make
contact with CsgF.
[0343] Preferred exemplary pores include at least one CsgG monomer
having the following mutations relative to SEQ ID NO: 3:
Y51X1/N55X2/F56X3/N91R/K94Q/R97W/R192D-del(V105-I107),
wherein X1 is I/V/S/T, X2 is N/IN/S/T and/or X3 is
Q/I/V/S/T.
[0344] Methods for introducing or substituting naturally-occurring
amino acids are well known in the art. For instance, methionine (M)
may be substituted with arginine (R) by replacing the codon for
methionine (ATG) with a codon for arginine (CGT) at the relevant
position in a polynucleotide encoding the mutant monomer. The
polynucleotide can then be expressed as discussed below.
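The codon-replacement step described above can be sketched in Python as follows. The helper is hypothetical. It assumes residue 1 is encoded by the first three bases of the coding sequence (no offset for tags or leader sequences), and the short coding sequence in the example is made up for illustration. Only the two codons named in the text are included (ATG for M and CGT, which the text gives as one codon for R).

```python
# Codons named in the text; a full table would list all synonymous codons.
CODONS = {"M": "ATG", "R": "CGT"}


def substitute_codon(cds, residue_pos, wt, new):
    """Return a copy of `cds` in which the codon for residue `residue_pos`
    (1-based) is changed from the codon for `wt` to the codon for `new`."""
    start = (residue_pos - 1) * 3
    current = cds[start:start + 3]
    if current != CODONS[wt]:
        raise ValueError(f"codon {residue_pos} is {current!r}, not {CODONS[wt]!r}")
    return cds[:start] + CODONS[new] + cds[start + 3:]


# Hypothetical coding sequence: M A M stop (ATG GCT ATG TAA).
cds = "ATGGCTATGTAA"
print(substitute_codon(cds, 3, "M", "R"))  # ATGGCTCGTTAA
```

Checking the existing codon before replacing it guards against an off-by-one error in the residue numbering.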
Double Pores
[0345] The CsgG/CsgF pore may be a double pore comprising a first
pore and a second pore. At least the first pore is a CsgG/CsgF pore
disclosed herein. The second pore may be a CsgG pore or a CsgG/CsgF
pore. In one embodiment both the first pore and the second pore are
CsgG/CsgF pores as disclosed herein. The first and second pores may
be the same or different. In addition to any of the mutations
disclosed herein, in a double pore, the CsgG monomer may comprise
one or more of the additional mutations described below.
[0346] In the double pore, the first pore may be attached to the
second CsgG pore by hydrophobic interactions and/or by one or more
disulphide bonds. One or more, such as 2, 3, 4, 5, 6, 8, 9, for
example all, of the monomers in the first pore and/or the second
pore may be modified to enhance such interactions. This may be
achieved in any suitable way.
[0347] At least one cysteine residue in the amino acid sequence of
the first pore at the interface between the first and second pores
may be disulphide bonded to at least one cysteine residue in the
amino acid sequence of the second pore at the interface between the
first and second pores. The cysteine residue in the first pore
and/or the cysteine residue in the second pore may be a cysteine
residue that is not present in the wild type CsgG monomer. Multiple
disulphide bonds, such as from 2, 3, 4, 5, 6, 7, 8 or 9 to 16, 18,
24, 27, 32, 36, 40, 45, 48, 54, 56 or 63, may form between the two
pores in the double pore. One or both of the first and second pores may
comprise at least one monomer, such as up to 8, 9 or 10 monomers,
that comprises a cysteine residue at the interface between the
first and second pores at a position corresponding to R97, I107,
R110, Q100, E101, N102 and/or L113 of SEQ ID NO: 3.
[0348] At least one monomer in the first pore and/or at least one
monomer in the second pore may comprise at least one residue at the
interface between the first and second pores, which residue is more
hydrophobic than the residue present at the corresponding position
in the wild type CsgG monomer. For example, from 2 to 10, such as
3, 4, 5, 6, 7, 8 or 9, residues in the first pore and/or the second
pore may be more hydrophobic than the residues at the same
positions in the corresponding wild type CsgG monomer. Such
hydrophobic residues strengthen the interaction between the two
pores in the double pore. The at least one residue at the interface
between the first and second pores may be at a position
corresponding to R97, I107, R110, Q100, E101, N102 and/or L113 of
SEQ ID NO: 3. Where the residue at the interface in the wild type
CsgG monomer is R, Q, N or E, the hydrophobic residue is typically
I, L, V, M, F, W or Y. Where the residue at the interface in the
wild type CsgG monomer is I, the hydrophobic residue is typically
L, V, M, F, W or Y. Where the residue at the interface in the wild
type CsgG monomer is L, the hydrophobic residue is typically I, V,
M, F, W or Y.
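The rule in [0348] is effectively a lookup from the wild-type residue at an interface position to a set of typical, more hydrophobic replacements. A Python sketch of that lookup follows. The function name is hypothetical, and the replacement sets are copied from the paragraph rather than derived from any hydrophobicity scale.

```python
import re

# Wild-type residues at the interface positions listed in [0348] (SEQ ID NO: 3).
INTERFACE_WT = {97: "R", 100: "Q", 101: "E", 102: "N", 107: "I", 110: "R", 113: "L"}

# Typical more hydrophobic replacements, keyed by the wild-type residue.
TYPICAL_HYDROPHOBIC = {
    **{wt: set("ILVMFWY") for wt in "RQNE"},
    "I": set("LVMFWY"),
    "L": set("IVMFWY"),
}


def is_typical_interface_swap(mutation):
    """True if e.g. 'R97W' replaces an interface residue with one of the
    typical hydrophobic residues given in [0348]."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"not a substitution: {mutation!r}")
    wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
    if INTERFACE_WT.get(pos) != wt:
        return False  # not an interface position, or wrong wild-type residue
    return new in TYPICAL_HYDROPHOBIC[wt]


for m in ["R97W", "I107L", "L113I", "E101K", "Y51A"]:
    print(m, is_typical_interface_swap(m))
```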
[0349] The double pore may comprise one or more monomers that
comprise one or more cysteine residues at the interface between the
pores and one or more monomers that comprise one or more introduced
hydrophobic residues at the interface between the pores, or may
comprise one or more monomers that comprise both such cysteine
residues and such hydrophobic residues. For example, one or more,
such as any 2, 3 or 4, of the positions in the monomer corresponding
to the positions at R97, I107, R110, Q100, E101, N102 and/or L113 of
SEQ ID NO: 3 may comprise a cysteine (C) residue and one or more,
such as any 2, 3 or 4, of the positions in the monomer
corresponding to the positions at R97, I107, R110, Q100, E101, N102
and/or L113 of SEQ ID NO: 3 may comprise a hydrophobic residue,
such as I, L, V, M, F, W or Y.
[0350] The double pore may contain bulky residues at one
or more, such as 2, 3, 4, 5, 6 or 7, positions in the tail region,
which residues are typically at the interface between the first and
second pores and are bulkier than the residues present at the
corresponding positions in the wild type CsgG monomer. The bulk of
these residues prevents holes from forming in the walls of the pore
at the interface between the first and second pore in the double
pore. The at least one bulky residue at the interface between the
first and second pores is typically at a position corresponding to
A98, A99, T104, V105, L113, Q114 or S115 of SEQ ID NO: 3. Where the
residue at the interface in the wild type CsgG monomer is A, the
bulky residue is typically I, L, V, M, F, W, Y, N, Q, S or T. Where
the residue present at the interface in the wild type CsgG monomer
is T, the bulky residue is typically L, M, F, W, Y, N, Q, R, D or
E. Where the residue present at the interface in the wild type CsgG
monomer is V, the bulky residue is typically I, L, M, F, W, Y, N
or Q. Where the residue present at the interface in the wild type CsgG
monomer is L, the bulky residue is typically M, F, W, Y, N, Q, R, D
or E. Where the residue present at the interface in the wild type
CsgG monomer is Q, the bulky residue is typically F, W or Y. Where
the residue present at the interface in the wild type CsgG monomer
is S, the bulky residue is typically M, F, W, Y, N, Q, E or R.
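Paragraph [0350] defines candidate bulky substitutions by wild-type residue. The Python sketch below, with hypothetical names, turns that definition into an explicit list of single substitutions for each tail position. Paragraph [0359] gives slightly different sets for some of the same positions (for example T104 and L113), so the table would need editing to follow that paragraph instead.

```python
# Tail-region interface positions from [0350] with their wild-type residues.
TAIL_WT = {98: "A", 99: "A", 104: "T", 105: "V", 113: "L", 114: "Q", 115: "S"}

# Typical bulkier replacements keyed by wild-type residue, as listed in [0350].
TYPICAL_BULKY = {
    "A": "ILVMFWYNQST",
    "T": "LMFWYNQRDE",
    "V": "ILMFWYNQ",
    "L": "MFWYNQRDE",
    "Q": "FWY",
    "S": "MFWYNQER",
}


def bulky_candidates():
    """Map each position (e.g. 'Q114') to its candidate single substitutions."""
    return {
        f"{wt}{pos}": [f"{wt}{pos}{new}" for new in TYPICAL_BULKY[wt]]
        for pos, wt in sorted(TAIL_WT.items())
    }


candidates = bulky_candidates()
print(candidates["Q114"])                        # ['Q114F', 'Q114W', 'Q114Y']
print(sum(len(v) for v in candidates.values()))  # 60
```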
[0351] Particularly where the second pore is located outside the
membrane, the second pore, and optionally the first pore,
preferably comprises residues in the barrel region of the pore that
reduce the negative charge inside the barrel compared to the charge
in the barrel of the wild type CsgG pore. These mutations make the
barrel more hydrophilic. At least one monomer in the first pore
and/or at least one monomer in the second pore of the double pore
may comprise at least one residue in the barrel region of the pore,
which residue has less negative charge than the residue present at
the corresponding position in the wild type CsgG monomer. The
charge inside the barrel is sufficiently neutral or positive such
that negatively charged analytes, such as polynucleotides, are not
repelled from entering the pore by electrostatic charges. At least
one residue, such as 2, 3, 4 or 5 residues, in the barrel region of
the pore at a position corresponding to D149, E185, D195, E210
and/or E203 of SEQ ID NO: 3 may be a neutral or positively charged
amino acid. At least one residue, such as 2, 3, 4 or 5 residues, in
the barrel region of the pore at a position corresponding to D149,
E185, D195, E210 and/or E203 of SEQ ID NO: 3 is preferably N, Q, R
or K.
[0352] Particular examples of charge-removing mutations in SEQ ID
NO: 3 include the following: E185N/E203N;
D149N/E185R/D195N/E201R/E203N, D149N/E185R/D195N/E201N/E203N,
D149R/E185N/D195N/E201N/E203N, D149R/E185N/E201N/E203N,
D149N/E185N/D195/E201N/E203N, D149N/E185N/E201N/E203N,
D149N/E185N/E203N, D149N/E185N/E201N, D149N/E203N,
D149N/E201N/D195N, D149N/E201N, D195N/E201N/E203N, E201N/E203N,
D195N/E203, E203R, E203N, E201R, E201N, D195R, D195N, E185R, E185N,
D149R and D149N.
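One rough way to compare the charge-removing mutations listed above is to count the change in side-chain charge that each introduces. The Python sketch below does this under simple assumptions that are not taken from the text: D and E count as -1, K and R as +1, every other residue as 0, and the pore contains nine CsgG monomers. It ignores pKa shifts and is no substitute for electrostatic modelling. Entries in the list that lack a replacement residue (such as D195 on its own) are rejected rather than guessed.

```python
import re

# Approximate side-chain charges near neutral pH (assumption; histidine and
# termini are ignored, and local pKa shifts in the pore are not modelled).
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def charge_change(spec):
    """Net change in side-chain charge per monomer for e.g. 'E185N/E203N'."""
    total = 0
    for token in spec.split("/"):
        match = re.fullmatch(r"([A-Z])\d+([A-Z])", token)
        if match is None:
            raise ValueError(f"not a substitution: {token!r}")
        wt, new = match.groups()
        total += SIDE_CHAIN_CHARGE.get(new, 0) - SIDE_CHAIN_CHARGE.get(wt, 0)
    return total


SUBUNITS = 9  # assumption: a nonameric CsgG pore

for spec in ["E185N/E203N", "D149N/E185R/D195N/E201R/E203N", "D149R"]:
    per_monomer = charge_change(spec)
    print(spec, per_monomer, per_monomer * SUBUNITS)
```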
[0353] At least one CsgG monomer in the first pore may comprise at
least one residue in the constriction of the barrel region of the
first pore, which residue decreases, maintains or increases the
length of the constriction compared to the wild type CsgG pore
and/or at least one monomer in the second CsgG pore may comprise at
least one residue in the constriction of the barrel region of the
second pore, which residue decreases, maintains or increases the
length of the constriction compared to the wild type CsgG pore.
Preferably, the length of the constriction in the first pore and/or
the length of the constriction in the second pore is at least as
long as in the wild-type pore and more preferably longer.
[0354] The length of the pore may be increased by inserting
residues into the region corresponding to the region between
positions K49 and F56 of SEQ ID NO: 3. From 1 to 5, such as 2, 3,
or 4 amino acid residues may be inserted at any one or more of the
following positions defined by reference to SEQ ID NO: 3: K49 and
P50, P50 and Y51, Y51 and P52, P52 and A53, A53 and S54, S54 and
N55 and/or N55 and F56. Preferably from 1 to 10, such as 2 to 8, or
3 to 5 amino acid residues in total are inserted into the sequence
of a monomer. Preferably, all of the monomers in the first pore
and/or all of the monomers in the second pore have the same number
of insertions in this region. The inserted residues may increase
the length of the loop between the residues corresponding to Y51
and N55 of SEQ ID NO: 3. The inserted residues may be any
combination of A, S, G or T to maintain flexibility; P to add a
kink to the loop; and/or S, T, N, Q, M, F, W, Y, V and/or I to
contribute to the signal produced when an analyte interacts with the
barrel of the pore under an applied potential difference. The
inserted amino acids may be any combination of S, G, SG, SGG, SGS,
GS, GSS and/or GSG.
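The insertions in [0354] can be illustrated on the stretch of SEQ ID NO: 3 that the text names residue by residue (K49 to S57). In the hypothetical Python helper below, the per-site limit of 1 to 5 residues and the preferred total of 1 to 10 residues are enforced as hard checks, which is stricter than the text. The example inserts GS and SG, two of the flexible motifs listed above.

```python
# Residues 49-57 of SEQ ID NO: 3, as named individually in the text.
FRAGMENT_START = 49
FRAGMENT = "KPYPASNFS"

# Insertion sites in [0354], named by the residue before the gap:
# K49|P50, P50|Y51, ..., N55|F56.
SITES = range(49, 56)


def insert_residues(seq, start, insertions):
    """Insert residues after the given reference positions.

    `insertions` maps a position (e.g. 51 for the Y51|P52 gap) to the residues
    to insert there. Sites are processed from the C-terminal end so that
    earlier indices are not shifted by later insertions.
    """
    total = sum(len(r) for r in insertions.values())
    if not 1 <= total <= 10:  # preferred total in [0354], enforced here
        raise ValueError("total insertion length outside 1-10")
    for pos in sorted(insertions, reverse=True):
        residues = insertions[pos]
        if pos not in SITES or not 1 <= len(residues) <= 5:
            raise ValueError(f"invalid insertion {residues!r} after {pos}")
        i = pos - start + 1
        seq = seq[:i] + residues + seq[i:]
    return seq


print(insert_residues(FRAGMENT, FRAGMENT_START, {51: "GS", 54: "SG"}))
# KPYGSPASSGNFS
```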
[0355] In the double pore, the constriction in the barrel of the
first pore and/or the second pore may comprise at least one
residue, such as 2, 3, 4 or 5 residues, which influences the
properties of the pore when used to detect or characterise an
analyte compared to when a first pore or a second pore with a
wild-type constriction is used, wherein the at least one residue in
the constriction of the barrel region of the pore is at a position
corresponding to F56, N55, Y51, P52 and/or A53 of SEQ ID NO: 3. The
at least one residue may be Q or V at a position corresponding to
F56 of SEQ ID NO: 3; A or Q at a position corresponding to Y51 of
SEQ ID NO: 3; and/or V at a position corresponding to N55 of SEQ ID
NO: 3. The double pore may comprise at least one monomer in the
first CsgG pore and/or at least one monomer in the second CsgG
pore, which monomer comprises two or more of the mutations defined
above.
[0356] A CsgG monomer in the double pore may comprise a cysteine
residue at a position corresponding to R97, I107, R110, Q100, E101,
N102 and/or L113 of SEQ ID NO: 3.
[0357] A CsgG monomer in the double pore may comprise a residue at
a position corresponding to any one or more of R97, Q100, I107,
R110, E101, N102 and L113 of SEQ ID NO: 3, which residue is more
hydrophobic than the residue present at the corresponding position
of SEQ ID NO: 3, such as the corresponding position of any one of
SEQ ID NOs: 68 to 88, wherein the residue at the position
corresponding to R97 and/or I107 is M, the residue at the position
corresponding to R110 is I, L, V, M, W or Y, and/or the residue at
the position corresponding to E101 or N102 is V or M. The residue
at a position corresponding to Q100 is typically I, L, V, M, F, W
or Y; and/or the residue at a position corresponding to L113 is
typically I, V, M, F, W or Y.
[0358] Particular monomers may have the sequence shown in SEQ ID NO:
3 comprising Y51A, F56Q substitutions and R97I/V/L/M/F/W/Y,
I107L/V/M/F/W/Y, R110I/V/L/M/F/W/Y, Q100I/V/L/M/F/W/Y,
E101I/V/L/M/F/W/Y, N102I/V/L/M/F/W/Y and L113Cl/V/L/M/F/W/Y in
combination, R97I/V/L/M/F/W/Y and N102I/V/L/M/F/W/Y in combination
and/or R97I/V/L/M/F/W/Y and E101I/V/L/M/F/W/Y in combination. I107
may already form hydrophobic interactions between two pores.
[0359] The CsgG monomer in at least one of the pores in the double
pore may comprise a residue at a position corresponding to any one
or more of A98, A99, T104, V105, L113, Q114 and S115 of SEQ ID NO:
3 which is bulkier than the residue present at the corresponding
position of SEQ ID NO: 3, such as the corresponding position of any
one of SEQ ID NOs: 68 to 88, wherein the residue at the position
corresponding to T104 is L, M, F, W, Y, N, Q, D or E, the residue
at the position corresponding to L113 is M, F, W, Y, N, Q, D or E
and/or the residue at the position corresponding to S115 is M, F,
W, Y, N, Q or E. The residue at a position corresponding to A98 or
A99 is typically I, L, V, M, F, W, Y, N, Q, S or T. The residue at
a position corresponding to V105 is I, L, M, F, W, Y, N or Q. The
residue at a position corresponding to Q114 is F, W or Y. The
residue at a position corresponding to E210 is N, Q, R or K.
Particular monomers may have the sequence shown in SEQ ID NO: 3
comprising Y51A, F56Q substitutions and 1, 2, 3, 4, 5, 6 or all of
the following substitutions: A98I/L/V/M/F/W/Y/N/Q/S/T;
A99I/L/V/M/F/W/Y/N/Q/S/T; T104N/Q/L/R/D/E/M/F/W/Y;
V105I/L/M/F/W/Y/N/Q; L113M/F/W/Y/N/Q/D/E/L/R; Q114Y/F/W; and
S115N/Q/M/F/W/Y/E/R.
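Paragraph [0359] combines slash lists (for example A98I/L/V/M/F/W/Y/N/Q/S/T) with "1, 2, 3, 4, 5, 6 or all" of seven positions, which defines a large but countable design space. The Python sketch below expands the lists and counts the combinations, assuming that each position carries either its wild-type residue or exactly one listed option, and that a listed option identical to the wild-type residue (such as the L in the L113 list) is not a substitution and is dropped. The helper names are hypothetical.

```python
from math import prod

# Substitution lists exactly as written in [0359].
SLASH_LISTS = [
    "A98I/L/V/M/F/W/Y/N/Q/S/T",
    "A99I/L/V/M/F/W/Y/N/Q/S/T",
    "T104N/Q/L/R/D/E/M/F/W/Y",
    "V105I/L/M/F/W/Y/N/Q",
    "L113M/F/W/Y/N/Q/D/E/L/R",
    "Q114Y/F/W",
    "S115N/Q/M/F/W/Y/E/R",
]


def expand_options(spec: str) -> list[str]:
    """Expand e.g. 'Q114Y/F/W' into ['Q114Y', 'Q114F', 'Q114W'].

    Options identical to the wild-type residue (e.g. the 'L' in the L113
    list) are dropped because they are not substitutions.
    """
    head, *rest = spec.split("/")
    site, first = head[:-1], head[-1]  # e.g. 'Q114', 'Y'
    wild_type = site[0]
    return [site + option for option in [first, *rest] if option != wild_type]


def count_combinations(specs: list[str]) -> int:
    """Count variants carrying at least one substitution, at most one per site."""
    # Each site is either left wild type or takes one of its options;
    # subtracting 1 removes the fully unmodified case.
    return prod(1 + len(expand_options(spec)) for spec in specs) - 1


if __name__ == "__main__":
    print(expand_options("Q114Y/F/W"))
    print(count_combinations(SLASH_LISTS))
```

Under these assumptions the seven lists give (1 + 11)(1 + 11)(1 + 10)(1 + 8)(1 + 9)(1 + 3)(1 + 8) - 1 = 5,132,159 distinct combinations.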
[0360] The CsgG monomer in at least one of the pores in the double
pore may comprise a residue in the barrel region of the pore at a
position corresponding to any one or more of D149, E185, D195,
E210 and E203 that carries less negative charge than the residue present at the
corresponding position of SEQ ID NO: 3, such as the corresponding
position of any one of SEQ ID NOs: 68 to 88, wherein the residue at
the position corresponding to D149, E185, D195 and/or E203 is
K.
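The charge argument in [0360] can be tallied roughly. The Python sketch below assumes a simplified model (D and E count as -1, K and R as +1, every other residue as 0; histidine, termini and environment-dependent pKa shifts are ignored) and a homo-oligomeric pore of 8 or 9 identical subunits, as stated in [0377]. The function `charge_change` is hypothetical.

```python
# Simplified formal side-chain charges near neutral pH (assumption: histidine,
# termini and environment-dependent pKa shifts are ignored).
FORMAL_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def charge_change(substitutions: list[str], subunits: int) -> tuple[int, int]:
    """Return (change per monomer, change per pore) in elementary charges.

    Substitutions use the 'D149K' style; only the first and last letters matter.
    """
    per_monomer = sum(
        FORMAL_CHARGE.get(sub[-1], 0) - FORMAL_CHARGE.get(sub[0], 0)
        for sub in substitutions
    )
    # Every subunit of a homo-oligomeric pore carries the same substitutions.
    return per_monomer, per_monomer * subunits


if __name__ == "__main__":
    print(charge_change(["D149K", "E185K", "D195K", "E203K"], subunits=9))
```

Each D or E to K swap moves the formal charge by +2, so under this model the four substitutions change one monomer by +8 and a nonameric pore by +72.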
[0361] The CsgG monomer in at least one of the pores in the double
pore may comprise at least one residue in the constriction of the
barrel region of the pore, which residue increases the length of
the constriction compared to the wild type CsgG pore. The at least
one residue is additional to the residues present in the
constriction of the wild type CsgG pore.
[0362] The length of the pore may be increased by inserting
residues into the region corresponding to the region between
positions K49 and F56 of SEQ ID NO: 3. From 1 to 5, such as 2, 3,
or 4 amino acid residues may be inserted at any one or more of the
following positions defined by reference to SEQ ID NO: 3: K49 and
P50, P50 and Y51, Y51 and P52, P52 and A53, A53 and S54, S54 and
N55 and/or N55 and F56. Preferably from 1 to 10, such as 2 to 8, or
3 to 5 amino acid residues in total are inserted into the sequence
of the monomer. The inserted residues may increase the length of
the loop between the residues corresponding to Y51 and N55 of SEQ
ID NO: 3. The inserted residues may be any combination of A, S, G
or T to maintain flexibility; P to add a kink to the loop; and/or
S, T, N, Q, M, F, W, Y, V and/or I to contribute to the signal
produced when an analyte interacts with the barrel of the pore under
an applied potential difference. The inserted amino acids may be
any combination of S, G, SG, SGG, SGS, GS, GSS and/or GSG.
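The insertion scheme in [0362] is essentially sequence bookkeeping. The Python sketch below assumes that positions refer to the unmodified reference numbering, that an insertion "after 51" lies between Y51 and P52, and it enforces the stated 1 to 5 residues per site and the preferred total of 1 to 10. The placeholder reference is hypothetical apart from residues 49 to 56, and `insert_many` is a hypothetical helper.

```python
# Hypothetical placeholder, NOT SEQ ID NO: 3: residues 49-56 follow the text
# (K49 P50 Y51 P52 A53 S54 N55 F56); the rest is glycine filler.
PLACEHOLDER_REFERENCE = "G" * 48 + "KPYPASNF" + "G" * 10

ALLOWED_SITES = range(49, 56)  # after K49 (K49/P50) ... after N55 (N55/F56)


def insert_many(reference: str, insertions: dict[int, str]) -> str:
    """Insert motifs after the given 1-based reference positions.

    Positions use the original numbering, so insertions are applied from the
    highest position down; earlier positions therefore stay valid.
    """
    total = sum(len(motif) for motif in insertions.values())
    if not 1 <= total <= 10:
        raise ValueError(f"{total} inserted residues; the preferred total is 1 to 10")
    sequence = reference
    for position in sorted(insertions, reverse=True):
        motif = insertions[position]
        if position not in ALLOWED_SITES:
            raise ValueError(f"position {position} is outside K49..F56")
        if not 1 <= len(motif) <= 5:
            raise ValueError("1 to 5 residues may be inserted at each site")
        sequence = sequence[:position] + motif + sequence[position:]
    return sequence


if __name__ == "__main__":
    # SG between Y51 and P52, G between S54 and N55.
    variant = insert_many(PLACEHOLDER_REFERENCE, {51: "SG", 54: "G"})
    print(variant[48:59])  # KPYSGPASGNF
```

Applying the highest position first avoids having to renumber after each insertion.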
[0363] The CsgG monomer in at least one of the pores in the double
pore may comprise at least one residue in the constriction of the
barrel region of the pore at a position corresponding to N55, P52
and/or A53 of SEQ ID NO: 3 that is different from the residue
present in the corresponding wild type monomer, wherein the residue
at a position corresponding to N55 is V.
[0364] Any two or more of the above described residues may be
present in the same monomer.
[0365] In particular the monomer may comprise at least one said
cysteine residue, at least one said hydrophobic residue, at least
one said bulky residue, at least one said neutral or positively
charged residue and/or at least one said residue that increases the
length of the constriction.
[0366] A CsgG monomer in the double pore may additionally comprise
one or more, such as 2, 3, 4 or 5 residues, which influence the
properties of the pore when used to detect or characterise an
analyte compared to when a first pore or a second pore with a
wild-type constriction is used, wherein the at least one residue in
the constriction of the barrel region of the pore is at a position
corresponding to Y51, N55, F56, P52 and/or A53 of SEQ ID NO: 3. The
at least one residue may be Q or V at a position corresponding to
F56 of SEQ ID NO: 3; A or Q at a position corresponding to Y51 of
SEQ ID NO: 3; and/or V at a position corresponding to N55 of SEQ ID
NO: 3.
Method for Making Modified Proteins
[0367] Methods for introducing or substituting
non-naturally-occurring amino acids are also well known in the art.
For instance, non-naturally-occurring amino acids may be introduced
by including synthetic aminoacyl-tRNAs in the IVTT system used to
express the mutant monomer. Alternatively, they may be introduced
by expressing the mutant monomer in E. coli that are auxotrophic
for specific amino acids in the presence of synthetic (i.e.
non-naturally-occurring) analogues of those specific amino acids.
They may also be produced by naked ligation if the mutant monomer
is produced using partial peptide synthesis.
[0368] The monomers derived from CsgG may be modified to assist
their identification or purification, for example by the addition
of a streptavidin tag or by the addition of a signal sequence to
promote their secretion from a cell where the monomer does not
naturally contain such a sequence. Other suitable tags are
discussed in more detail below. The monomer may be labelled with a
revealing label. The revealing label may be any suitable label
which allows the monomer to be detected. Suitable labels are
described below.
[0369] The monomer derived from CsgG may also be produced using
D-amino acids. For instance, the monomer derived from CsgG may
comprise a mixture of L-amino acids and D-amino acids. This is
conventional in the art for producing such proteins or
peptides.
[0370] The monomer derived from CsgG contains one or more specific
modifications to facilitate nucleotide discrimination. The monomer
derived from CsgG may also contain other non-specific modifications
as long as they do not interfere with pore formation. A number of
non-specific side chain modifications are known in the art and may
be made to the side chains of the monomer derived from CsgG. Such
modifications include, for example, reductive alkylation of amino
acids by reaction with an aldehyde followed by reduction with
NaBH₄, amidination with methylacetimidate or acylation with
acetic anhydride.
[0371] The monomer derived from CsgG can be produced using standard
methods known in the art. The monomer derived from CsgG may be made
synthetically or by recombinant means. For example, the monomer may
be synthesised by in vitro translation and transcription (IVTT).
Suitable methods for producing pores and monomers are discussed in
the International applications WO 2010/004273, WO 2010/004265 or WO
2010/086603. Methods for inserting pores into membranes are
known.
[0372] Two or more CsgG monomers in the pore may be covalently
attached to one another. For example, at least 2, at least 3, at
least 4, at least 5, at least 6, at least 7, at least 8, at least 9
or at least 10 monomers may be covalently attached. The covalently
attached monomers may be the same or different.
[0373] The monomers may be genetically fused, optionally via a
linker, or chemically fused, for instance via a chemical
crosslinker. Methods for covalently attaching monomers are
disclosed in WO2017/149316, WO2017/149317 and WO2017/149318.
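When monomers are genetically fused into a single chain, optionally via a linker, a position "corresponding to" a residue of SEQ ID NO: 3 in the second or later monomer is shifted in the fused chain. The Python sketch below records where each monomer starts and ends and maps a per-monomer position to its fused position. The helpers `genetic_fusion` and `fused_position` and the toy sequences are hypothetical; the sketch does not model the specific constructs of the cited applications.

```python
def genetic_fusion(monomers: list[str], linker: str = "") -> tuple[str, list[tuple[int, int]]]:
    """Join monomers N- to C-terminus with an optional linker between them.

    Returns the fused sequence and the 1-based (start, end) span of each monomer.
    """
    if len(monomers) < 2:
        raise ValueError("a fusion needs at least two monomers")
    pieces: list[str] = []
    spans: list[tuple[int, int]] = []
    cursor = 0  # residues emitted so far
    for index, monomer in enumerate(monomers):
        if index > 0:
            pieces.append(linker)
            cursor += len(linker)
        spans.append((cursor + 1, cursor + len(monomer)))
        pieces.append(monomer)
        cursor += len(monomer)
    return "".join(pieces), spans


def fused_position(spans: list[tuple[int, int]], monomer_index: int, position: int) -> int:
    """Map a 1-based position within one monomer to its position in the fusion."""
    start, end = spans[monomer_index]
    fused = start + position - 1
    if position < 1 or fused > end:
        raise ValueError("position lies outside that monomer")
    return fused


if __name__ == "__main__":
    fused, spans = genetic_fusion(["MAAA", "MCCC"], linker="SGSG")  # toy sequences
    print(fused, spans)                 # MAAASGSGMCCC [(1, 4), (9, 12)]
    print(fused_position(spans, 1, 2))  # 10
```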
[0374] In some embodiments, the mutant monomer is chemically
modified. The mutant monomer can be chemically modified in any way
and at any site. The mutant monomer is preferably chemically
modified by attachment of a molecule to one or more cysteines
(cysteine linkage), attachment of a molecule to one or more
lysines, attachment of a molecule to one or more non-natural amino
acids, enzyme modification of an epitope or modification of a
terminus. Suitable methods for carrying out such modifications are
well-known in the art. The mutant monomer may be chemically
modified by the attachment of any molecule. For instance, the
mutant monomer may be chemically modified by attachment of a dye or
a fluorophore.
[0375] In some embodiments, the mutant monomer is chemically
modified with a molecular adaptor that facilitates the interaction
between a pore comprising the monomer and a target nucleotide or
target polynucleotide sequence. The presence of the adaptor
improves the host-guest chemistry of the pore and the nucleotide or
polynucleotide sequence and thereby improves the sequencing ability
of pores formed from the mutant monomer. The principles of
host-guest chemistry are well-known in the art. The adaptor has an
effect on the physical or chemical properties of the pore that
improves its interaction with the nucleotide or polynucleotide
sequence. The adaptor may alter the charge of the barrel or channel
of the pore or specifically interact with or bind to the nucleotide
or polynucleotide sequence thereby facilitating its interaction
with the pore.
[0376] The molecular adaptor is preferably a cyclic molecule, a
cyclodextrin, a species that is capable of hybridization, a DNA
binder or intercalator, a peptide or peptide analogue, a synthetic
polymer, an aromatic planar molecule, a small positively-charged
molecule or a small molecule capable of hydrogen-bonding.
[0377] The adaptor may be cyclic. A cyclic adaptor preferably has
the same symmetry as the pore. The adaptor preferably has
eight-fold or nine-fold symmetry since CsgG typically has eight or
nine subunits around a central axis. This is discussed in more
detail below.
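For cyclodextrin adaptors the symmetry rule above reduces to a simple comparison, because the rotational order of the ring equals its number of glucose units. The values used below (α = 6, β = 7, γ = 8, δ = 9) are standard chemistry rather than figures taken from this document. The Python sketch, with hypothetical function names, lists the parent cyclodextrin that matches a given number of subunits. It ignores chemical modifications and linkers.

```python
# Standard glucose-unit counts of the parent cyclodextrins (general chemistry,
# not taken from this document); the ring symmetry equals this count.
GLUCOSE_UNITS = {"alpha": 6, "beta": 7, "gamma": 8, "delta": 9}


def symmetry_matches(cyclodextrin: str, pore_subunits: int) -> bool:
    """True if the unmodified cyclodextrin ring has the pore's rotational order."""
    return GLUCOSE_UNITS[cyclodextrin] == pore_subunits


def matching_cyclodextrins(pore_subunits: int) -> list[str]:
    """List parent cyclodextrins whose symmetry equals the subunit count."""
    return [name for name, units in GLUCOSE_UNITS.items() if units == pore_subunits]


if __name__ == "__main__":
    for subunits in (8, 9):  # CsgG typically has eight or nine subunits
        print(subunits, matching_cyclodextrins(subunits))
```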
[0378] The adaptor typically interacts with the nucleotide or
polynucleotide sequence via host-guest chemistry. The adaptor is
typically capable of interacting with the nucleotide or
polynucleotide sequence. The adaptor comprises one or more chemical
groups that are capable of interacting with the nucleotide or
polynucleotide sequence. The one or more chemical groups preferably
interact with the nucleotide or polynucleotide sequence by
non-covalent interactions, such as hydrophobic interactions,
hydrogen bonding, van der Waals forces, π-cation interactions
and/or electrostatic forces. The one or more chemical groups that
are capable of interacting with the nucleotide or polynucleotide
sequence are preferably positively charged. The one or more
chemical groups that are capable of interacting with the nucleotide
or polynucleotide sequence more preferably comprise amino groups.
The amino groups can be attached to primary, secondary or tertiary
carbon atoms. The adaptor even more preferably comprises a ring of
amino groups, such as a ring of 6, 7 or 8 amino groups. The adaptor
most preferably comprises a ring of eight amino groups. A ring of
protonated amino groups may interact with negatively charged
phosphate groups in the nucleotide or polynucleotide sequence.
[0379] The correct positioning of the adaptor within the pore can
be facilitated by host-guest chemistry between the adaptor and the
pore comprising the mutant monomer. The adaptor preferably
comprises one or more chemical groups that are capable of
interacting with one or more amino acids in the pore.
[0380] The adaptor more preferably comprises one or more chemical
groups that are capable of interacting with one or more amino acids
in the pore via non-covalent interactions, such as hydrophobic
interactions, hydrogen bonding, van der Waals forces, π-cation
interactions and/or electrostatic forces. The chemical groups that
are capable of interacting with one or more amino acids in the pore
are typically hydroxyls or amines. The hydroxyl groups can be
attached to primary, secondary or tertiary carbon atoms. The
hydroxyl groups may form hydrogen bonds with uncharged amino acids
in the pore. Any adaptor that facilitates the interaction between
the pore and the nucleotide or polynucleotide sequence can be
used.
[0381] Suitable adaptors include, but are not limited to,
cyclodextrins, cyclic peptides and cucurbiturils. The adaptor is
preferably a cyclodextrin or a derivative thereof. The cyclodextrin
or derivative thereof may be any of those disclosed in Eliseev, A.
V., and Schneider, H-J. (1994) J. Am. Chem. Soc. 116, 6081-6088.
The adaptor is more preferably heptakis-6-amino-β-cyclodextrin
(am₇-βCD), 6-monodeoxy-6-monoamino-β-cyclodextrin
(am₁-βCD) or heptakis-(6-deoxy-6-guanidino)-cyclodextrin
(gu₇-βCD). The guanidino group in gu₇-βCD has a
much higher pKa than the primary amines in am₇-βCD and so
it is more positively charged. This gu₇-βCD adaptor may
be used to increase the dwell time of the nucleotide in the pore,
to increase the accuracy of the residual current measured, as well
as to increase the base detection rate at high temperatures or low
data acquisition rates.
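The pKa argument for gu₇-βCD can be made quantitative with the Henderson-Hasselbalch relation: the fraction of a basic group that is protonated is 1 / (1 + 10^(pH - pKa)). In the Python sketch below the pKa values are illustrative assumptions of roughly textbook magnitude, not measured values for these adaptors, and the groups in the ring are treated as independent (in a real ring neighbouring charges suppress each other's protonation somewhat). The function names are hypothetical.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a basic group carrying its proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def expected_ring_charge(n_groups: int, pka: float, ph: float) -> float:
    """Average positive charge of a ring of identical basic groups.

    Assumes the groups protonate independently of one another.
    """
    return n_groups * fraction_protonated(pka, ph)


if __name__ == "__main__":
    # Illustrative pKa values only (assumed, not measured for these adaptors).
    AMINE_PKA, GUANIDINO_PKA, PH = 9.0, 12.5, 8.0
    print(round(expected_ring_charge(7, AMINE_PKA, PH), 2))      # 6.36
    print(round(expected_ring_charge(7, GUANIDINO_PKA, PH), 2))  # 7.0
```

With these assumed values at pH 8.0, a ring of seven primary amines carries about 6.4 positive charges on average, whereas a ring of seven guanidino groups is essentially fully charged.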
[0382] If a succinimidyl 3-(2-pyridyldithio)propionate (SPDP)
crosslinker is used as discussed in more detail below, the adaptor
is preferably
heptakis(6-deoxy-6-amino)-6-N-mono(2-pyridyl)dithiopropanoyl-β-cyclodextrin
(am₆amPDP₁-βCD).
[0383] More suitable adaptors include γ-cyclodextrins, which
comprise 8 sugar units (and therefore have eight-fold symmetry). The
γ-cyclodextrin may contain a linker molecule or may be
modified to comprise one or more of the modified sugar units used
in the β-cyclodextrin examples discussed above. The molecular
adaptor may be covalently attached to the mutant monomer. The
adaptor can be covalently attached to the pore using any method
known in the art. The adaptor is typically attached via chemical
linkage. If the molecular adaptor is attached via cysteine linkage,
the one or more cysteines have preferably been introduced to the
mutant, for instance in the barrel, by substitution. The mutant
monomer may be chemically modified by attachment of a molecular
adaptor to one or more cysteines in the mutant monomer. The one or
more cysteines may be naturally-occurring, i.e. at positions 1
and/or 215 in SEQ ID NO: 3. Alternatively, the mutant monomer may
be chemically modified by attachment of a molecule to one or more
cysteines introduced at other positions. The cysteine at position
215 may be removed, for instance by substitution, to ensure that
the molecular adaptor attaches to the cysteine at position 1 or to a
cysteine introduced at another position rather than to the cysteine
at position 215.
[0384] The reactivity of cysteine residues may be enhanced by
modification of the adjacent residues. For instance, the basic
groups of flanking arginine, histidine or lysine residues will
shift the pKa of the cysteine thiol group so that more of it is in the
more reactive S⁻ (thiolate) form. Cysteine residues may be
protected by thiol protective groups such as DTNB. These may be
reacted with one or more cysteine residues of the mutant monomer
before a linker is attached.
[0385] The molecule may be attached directly to the mutant monomer.
The molecule is preferably attached to the mutant monomer using a
linker, such as a chemical crosslinker or a peptide linker.
[0386] Suitable chemical crosslinkers are well-known in the art.
Preferred crosslinkers include 2,5-dioxopyrrolidin-1-yl
3-(pyridin-2-yldisulfanyl)propanoate, 2,5-dioxopyrrolidin-1-yl
4-(pyridin-2-yldisulfanyl)butanoate and 2,5-dioxopyrrolidin-1-yl
8-(pyridin-2-yldisulfanyl)octanoate. The most preferred
crosslinker is succinimidyl 3-(2-pyridyldithio)propionate (SPDP).
Typically, the molecule is covalently attached to the bifunctional
crosslinker before the molecule/crosslinker complex is covalently
attached to the mutant monomer but it is also possible to
covalently attach the bifunctional crosslinker to the monomer
before the bifunctional crosslinker/monomer complex is attached to
the molecule.
[0387] The linker is preferably resistant to dithiothreitol (DTT).
Suitable linkers include, but are not limited to,
iodoacetamide-based and maleimide-based linkers.
[0388] In other embodiments, the monomer may be attached to a
polynucleotide binding protein. This forms a modular sequencing
system that may be used in the methods of sequencing of the
invention. Polynucleotide binding proteins are discussed below.
[0389] The polynucleotide binding protein is preferably covalently
attached to the mutant monomer. The protein can be covalently
attached to the monomer using any method known in the art. The
monomer and protein may be chemically fused or genetically fused.
The monomer and protein are genetically fused if the whole
construct is expressed from a single polynucleotide sequence.
Genetic fusion of a monomer to a polynucleotide binding protein is
discussed in WO 2010/004265.
[0390] If the polynucleotide binding protein is attached via
cysteine linkage, the one or more cysteines have preferably been
introduced to the mutant by substitution. The one or more cysteines
are preferably introduced into loop regions which have low
conservation amongst homologues indicating that mutations or
insertions may be tolerated. They are therefore suitable for
attaching a polynucleotide binding protein. In such embodiments,
the naturally-occurring cysteine at position 251 may be
removed.
[0391] The reactivity of cysteine residues may be enhanced by
modification as described above.
[0392] The polynucleotide binding protein may be attached directly
to the mutant monomer or via one or more linkers. The molecule may
be attached to the mutant monomer using the hybridization linkers
described in WO 2010/086602. Alternatively, peptide linkers may
be used. Peptide linkers are amino acid sequences. The length,
flexibility and hydrophilicity of the peptide linker are typically
designed such that it does not disturb the functions of the
monomer and molecule. Preferred flexible peptide linkers are
stretches of 2 to 20, such as 4, 6, 8, 10 or 16, serine and/or
glycine amino acids. More preferred flexible linkers include
(SG)₁, (SG)₂, (SG)₃, (SG)₄, (SG)₅ and
(SG)₆ wherein S is serine and G is glycine. Preferred rigid
linkers are stretches of 2 to 30, such as 4, 6, 8, 16 or 24,
proline amino acids. More preferred rigid linkers include
(P)₁₂ wherein P is proline.
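Paragraph [0392] gives length ranges for flexible serine/glycine linkers (2 to 20 residues) and rigid proline linkers (2 to 30 residues). The Python sketch below builds (SG)n and (P)n linkers and classifies an arbitrary linker against those stated ranges. The helper names are hypothetical, and the classification checks only composition and length, not any structural property.

```python
def flexible_linker(repeats: int) -> str:
    """(SG)n linker, e.g. repeats=3 -> 'SGSGSG'."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return "SG" * repeats


def rigid_linker(length: int) -> str:
    """(P)n polyproline linker, e.g. length=12 -> (P)12."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return "P" * length


def classify_linker(linker: str) -> str:
    """Classify against the ranges in [0392]: composition and length only."""
    residues = set(linker)
    if residues and residues <= {"S", "G"} and 2 <= len(linker) <= 20:
        return "flexible"
    if residues == {"P"} and 2 <= len(linker) <= 30:
        return "rigid"
    return "other"


if __name__ == "__main__":
    for linker in (flexible_linker(6), rigid_linker(12), "SG" * 11, "GGSA"):
        print(linker, classify_linker(linker))
```

(SG)₆ (12 residues) is classed as flexible and (P)₁₂ as rigid, while a 22-residue (SG)₁₁ falls outside the stated flexible range.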
Chemical Modification
[0393] The mutant CsgG monomer or CsgF peptide may be chemically
modified with a molecular adaptor and a polynucleotide binding
protein.
[0394] The molecule (with which the monomer or peptide is
chemically modified) may be attached directly to the monomer or
peptide or attached via a linker as disclosed in WO 2010/004273, WO
2010/004265 or WO 2010/086603.
[0395] Any of the proteins described herein, such as the CsgG
monomers and/or CsgF peptides, may be modified to assist their
identification or purification, for example by the addition of
histidine residues (a his tag), aspartic acid residues (an asp
tag), a streptavidin tag, a flag tag, a SUMO tag, a GST tag or a
MBP tag, or by the addition of a signal sequence to promote their
secretion from a cell where the polypeptide does not naturally
contain such a sequence. An alternative to introducing a genetic
tag is to chemically react a tag onto a native or engineered
position on the protein. An example of this would be to react a
gel-shift reagent to a cysteine engineered on the outside of the
protein. This has been demonstrated as a method for separating
hemolysin hetero-oligomers (Chem Biol. 1997 July;
4(7):497-505).
[0396] Any of the proteins described herein, such as the CsgG
monomers and/or CsgF peptides, may be labelled with a revealing
label. The revealing label may be any suitable label which allows
the protein to be detected. Suitable labels include, but are not
limited to, fluorescent molecules, radioisotopes, e.g. ¹²⁵I,
³⁵S, enzymes, antibodies, antigens, polynucleotides and
ligands such as biotin.
[0397] Any of the proteins described herein, such as the CsgG
monomers and/or CsgF peptides, may be made synthetically or by
recombinant means. For example, the protein may be synthesised by
in vitro translation and transcription (IVTT). The amino acid
sequence of the protein may be modified to include non-naturally
occurring amino acids or to increase the stability of the protein.
When a protein is produced by synthetic means, such amino acids may
be introduced during production. The protein may also be altered
following either synthetic or recombinant production.
[0398] Proteins may also be produced using D-amino acids. For
instance, the protein may comprise a mixture of L-amino acids and
D-amino acids. This is conventional in the art for producing such
proteins or peptides.
[0399] The protein may also contain other non-specific
modifications as long as they do not interfere with the function of
the protein. A number of non-specific side chain modifications are
known in the art and may be made to the side chains of the
protein(s). Such modifications include, for example, reductive
alkylation of amino acids by reaction with an aldehyde followed by
reduction with NaBH₄, amidination with methylacetimidate or
acylation with acetic anhydride.
[0400] Any of the proteins described herein, such as the CsgG
monomers and/or CsgF peptides, can be produced using standard
methods known in the art. Polynucleotide sequences encoding a
protein may be derived and replicated using standard methods in the
art. Polynucleotide sequences encoding a protein may be expressed
in a bacterial host cell using standard techniques in the art. The
protein may be produced in a cell by in situ expression of the
polypeptide from a recombinant expression vector. The expression
vector optionally carries an inducible promoter to control the
expression of the polypeptide. These methods are described in
Sambrook, J. and Russell, D. (2001). Molecular Cloning: A
Laboratory Manual, 3rd Edition. Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, N.Y.
[0401] Proteins may be produced on a large scale and purified
by any protein liquid chromatography system, either from
protein-producing organisms or after recombinant expression.
Typical protein liquid chromatography systems include FPLC, AKTA
systems, the Bio-Cad system, the Bio-Rad BioLogic system and the
Gilson HPLC system.
Method of Producing Pores
[0402] In a third aspect, the invention provides methods for producing,
in vivo and in vitro, a CsgG:modified CsgF pore complex having two
or more constriction sites. One embodiment provides a method for
producing a transmembrane pore complex, comprising a CsgG pore, or
homologue or mutant form thereof, and the modified CsgF peptide, or
its homologue or mutant, via co-expression. Said method comprises
the steps of expressing CsgG monomers (expressed as preprotein
provided in SEQ ID NO: 2, or a homologue or mutant thereof), and
expressing modified or truncated CsgF monomers, both in a suitable
host cell, allowing in vivo complex pore formation. Said complex
comprises modified CsgF peptides, in complex with the CsgG pore, to
provide the pore with an additional reader head. The resulting pore
complex produced by said method using modified CsgF peptides
provides a structure that is sufficient for a use of the pore
complex in characterization of target analytes such as nucleic acid
sequencing, as it allows passage of the analytes, in particular
polynucleotide strands, and comprises two or more reader heads for
improved reading of said polynucleotide sequence, when used in the
appropriate settings for said application.
[0403] More particularly, the modified CsgF peptide expressed in
said method comprises the preprotein depicted in SEQ ID NO: 8, 10,
12, or 14, or homologues thereof. These sequences limit the
method to those CsgF fragments that are capable of introducing a
constriction site in the pore complex and of binding to the CsgG
protein pore, so as to obtain a biological pore.
[0404] Another method for producing an isolated pore complex formed
by the CsgG and CsgF proteins or the like relates to in vitro
reconstitution of said monomers to obtain a functional pore. Said
method comprises the steps of contacting the mature CsgG monomer as
depicted in SEQ ID NO:3 or a homologue or mutant thereof, with
modified CsgF peptides, or homologues or mutants thereof, in a
suitable system to allow complex formation. Said system may be an
"in vitro system", which refers to a system comprising at least the
necessary components and environment to execute said method, and
makes use of biological molecules, organisms, a cell (or part of a
cell) outside of their normal naturally-occurring environment,
permitting a more detailed, more convenient, or more efficient
analysis than can be done with whole organisms. An in vitro system
may also comprise a suitable buffer composition provided in a test
tube, wherein said protein components to form the complex have been
added. A person skilled in the art is aware of the options to
provide said system. Said modified CsgF peptides or the like
applied in said method for in vitro reconstitution in particular
embodiments are peptides comprising SEQ ID NO:15 or SEQ ID NO:16,
or mutant or homologues thereof, which can be synthesized or
recombinantly produced. Alternatively, modified CsgF peptides
comprising SEQ ID NO:40, 39, 38, or 37, 15, 54, 55 or homologues or
mutants thereof, are provided in said method for contacting with
CsgG or CsgG-like pores to produce the pore complex.
[0405] The CsgG/CsgF pore can be made by any suitable method.
Examples of suitable methods are described below.
[0406] In one embodiment, the CsgG/CsgF pore may be produced by
co-expression. In this embodiment, at least one gene encoding a
CsgG monomer polypeptide (which may be a mutant polypeptide) in one
vector and a gene encoding at least one full length or truncated
CsgF polypeptide (which may be a mutant polypeptide) in a second
vector may be transformed together to express the proteins and make
the complex within transformed cells. This could be in vivo or in
vitro. Alternatively, the two genes encoding the polypeptides of
CsgG and CsgF can be placed in one vector under the control of a
single promoter or under the control of two separate promoters,
which may be the same or different.
[0407] In another embodiment, the CsgG/CsgF pore is produced by
expressing the CsgG monomer(s) separately from the CsgF peptide.
CsgG monomers or a CsgG pore may be purified from the cells
transformed with a vector encoding at least one CsgG monomer, or
with more than one vector each expressing a CsgG monomer. The CsgF
peptide may be purified from the cells transformed with a vector
encoding at least one CsgF peptide. The purified CsgG
monomer(s)/pore may then be incubated together with the CsgF
peptide(s) to make the pore complex.
[0408] In another embodiment, the CsgG monomer(s) and/or the CsgF
peptide are produced separately by in vitro translation and
transcription (IVTT). The CsgG monomer(s) may then be incubated
together with the CsgF peptide(s) to make the pore complex. Use of
this method is illustrated in FIG. 14.
[0409] The above embodiments may be combined, such that for
example, (i) CsgG is produced in vivo and CsgF in vivo; (ii) CsgG
is produced in vitro and CsgF in vivo; (iii) CsgG is produced in
vivo and CsgF in vitro; or (iv) CsgG is produced in vitro and CsgF
in vitro.
[0410] One or both of the CsgG monomer and the CsgF peptide may be
tagged to facilitate purification. Purification can also be
performed when the CsgG monomer and/or CsgF peptide are untagged.
Methods known in the art (e.g. ion exchange, gel filtration,
hydrophobic interaction column chromatography etc.) can be used
alone or in different combinations to purify the components of the
pore.
[0411] Any known tag can be used on either of the two proteins. In
one embodiment, two-tag purification can be used to purify the
CsgG:CsgF complex from CsgG pore and CsgF. For example, a Strep tag
can be used in CsgG and His tag can be used in CsgF or vice versa.
This is exemplified in FIG. 13. A similar end result can be
obtained when the two proteins are purified individually and mixed
together followed by another round of Strep and His
purification.
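The two-tag strategy of [0411] can be sketched as two sequential affinity steps, each retaining only species that carry the corresponding tag. This is a deliberately idealised Python model that assumes complete binding, no non-specific carry-over and no dissociation of the complex; the `Species` class and `affinity_step` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """A species in the mixture and the affinity tags it carries."""
    name: str
    tags: frozenset


def affinity_step(mixture: list[Species], tag: str) -> list[Species]:
    """Idealised affinity step: only species carrying `tag` are bound and eluted."""
    return [species for species in mixture if tag in species.tags]


# Tag placement as in [0411]: Strep on CsgG, His on CsgF (or vice versa).
MIXTURE = [
    Species("CsgG pore", frozenset({"Strep"})),
    Species("CsgF peptide", frozenset({"His"})),
    Species("CsgG:CsgF complex", frozenset({"Strep", "His"})),
]

if __name__ == "__main__":
    eluate = affinity_step(affinity_step(MIXTURE, "Strep"), "His")
    print([species.name for species in eluate])  # ['CsgG:CsgF complex']
```

Because only the complex carries both tags, either order of the two steps leaves the complex alone, which is consistent with the statement that the tags may be swapped between the two proteins.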
[0412] When the full length CsgF protein forms the complex with
CsgG, the neck and head domains of CsgF (FIG. 4B) (shown in the red
box) protrude from the beta barrel of the CsgG pore.
[0413] Therefore, if a pore comprising a CsgG pore and the full
length CsgF were used in single channel recording experiments, the
head domain may hinder or prevent pore insertion into a membrane.
The neck and head domains may also block the channel and prevent
analytes from passing through. Thus, when inserting a pore into a
membrane, it is better to reduce the number of flexible polypeptides
dangling down from the beta-barrel. Truncated versions
of the CsgF protein that mimic the FCP region resolved in the cryo-EM
structure of the complex and that maintain structural integrity
are provided herein.
[0414] The CsgG/CsgF pore can be made prior to insertion into a
membrane or after insertion of the CsgG pore into a membrane. When
the pore complex is made prior to insertion into a membrane, a
truncation mutant is preferably used. However, the CsgG pore may be
inserted into a membrane and the CsgF peptide may be added
afterwards so that the CsgG and CsgF complex can form in situ. For
example, in one embodiment, in a system where the trans side of the
membrane is accessible (for example in a chip or chamber for
electrophysiology measurements), the CsgG pore may be inserted into
the membrane, and then a CsgF peptide may be added from the trans
side of the membrane, so that the complex can be formed in situ. In
any embodiment in which the CsgG:CsgF complex is formed in situ, a larger
CsgF peptide may be used. For example, the CsgF peptide may
comprise all or part of the neck domain of CsgF (from about residue
36 of SEQ ID NO: 6). In some embodiments the CsgF may comprise all
of the neck domain and part of the head domain (residues 36 to XX
of SEQ ID NO: 6). Depending on the methods of making the complex and
the stability of the complex with a particular truncation,
CsgG:CsgF and CsgG:FCP complexes can be made by different
methods.
[0415] In one embodiment, the truncated version of the CsgF
polypeptide at the required length is used directly.
[0416] Another embodiment uses the full length polypeptide of CsgF
or a longer than required truncation (enough to keep the complex
stable) in which a protease cleavage site is inserted (e.g. TEV,
HRV 3C or any other protease cleavage site) so that cleavage by the
protease produces a CsgF peptide of the required length. In this
embodiment, once the CsgG/CsgF complex is formed, the protease is
used to cleave the CsgF at the required site. Alternatively, the
protease may be used to produce the CsgF peptide prior to complex
assembly.
[0417] Some protease sites will leave an additional tag behind
after cleavage. For example, the TEV protease cleavage sequence is
ENLYFQS. TEV protease cleaves the protein between Q and S leaving
ENLYFQ intact at the C-terminus of the CsgF peptide. FIG. 15 shows
an example using a modified CsgF comprising a TEV cleavage site
which is cleaved using TEV protease after complex formation.
[0418] By way of another example, the HRV 3C cleavage site is
LEVLFQGP and the enzyme cleaves between Q and G leaving LEVLFQ
intact at the C-terminus of the CsgF peptide.
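The cleavage rules in [0417] and [0418] (TEV: site ENLYFQS, cut between Q and S; HRV 3C: site LEVLFQGP, cut between Q and G) determine which residues remain on the CsgF peptide. The Python sketch below implements that string logic, assuming exact matches to the recognition sequences given in the text (real proteases tolerate some variation at certain positions). The `cleave` function and the example construct are hypothetical.

```python
# Recognition sequences from [0417]-[0418] and how many of their residues stay
# on the N-terminal (CsgF) fragment: TEV cuts Q|S, HRV 3C cuts Q|G.
PROTEASE_SITES = {
    "TEV": ("ENLYFQS", 6),
    "HRV 3C": ("LEVLFQGP", 6),
}


def cleave(sequence: str, protease: str) -> list[str]:
    """Cut `sequence` at every exact, non-overlapping match of the protease site."""
    site, keep = PROTEASE_SITES[protease]
    fragments: list[str] = []
    start = 0
    index = sequence.find(site)
    while index != -1:
        fragments.append(sequence[start:index + keep])  # fragment ends in ...FQ
        start = index + keep
        index = sequence.find(site, index + len(site))
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    # Hypothetical toy construct: a short peptide, a TEV site, then a His tag.
    print(cleave("MKTAAENLYFQSHHHHHH", "TEV"))  # ['MKTAAENLYFQ', 'SHHHHHH']
```

The N-terminal fragment ends in ENLYFQ for TEV or LEVLFQ for HRV 3C, matching the residual C-terminal residues described above.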
Methods of Characterising an Analyte
[0419] In a further aspect, the invention provides a method of
determining the presence, absence or one or more characteristics of
a target analyte. The method involves contacting the target analyte
with an isolated pore complex, or transmembrane pore, such as a
pore of the invention, such that the target analyte moves with
respect to, such as into or through, the pore channel and taking
one or more measurements as the analyte moves with respect to the
pore and thereby determining the presence, absence or one or more
characteristics of the analyte. The target analyte may also be
called the template analyte or the analyte of interest. The
isolated pore complex typically comprises at least 7, at least 8,
at least 9 or at least 10 monomers, such as 7, 8, 9 or 10 CsgG
monomers. The isolated pore complex preferably comprises eight or
nine identical CsgG monomers. One or more, such as 2, 3, 4, 5, 6,
7, 8, 9 or 10, of the CsgG monomers is preferably chemically
modified, or the CsgF peptide is chemically modified. The isolated
pore complex monomers, such as the CsgG monomers, or homologues or
mutants thereof, and the modified CsgF monomers, or homologues or
mutants thereof, may be derived from any organism. The analyte may
pass through the CsgG constriction, followed by the CsgF
constriction. In an alternative embodiment the analyte may pass
through the CsgF constriction, followed by the CsgG constriction,
depending on the orientation of the CsgG/CsgF complex in the
membrane.
[0420] The method is for determining the presence, absence or one
or more characteristics of a target analyte. The method may be for
determining the presence, absence or one or more characteristics of
at least one analyte. The method may concern determining the
presence, absence or one or more characteristics of two or more
analytes. The method may comprise determining the presence, absence
or one or more characteristics of any number of analytes, such as
2, 5, 10, 15, 20, 30, 40, 50, 100 or more analytes. Any number of
characteristics of the one or more analytes may be determined, such
as 1, 2, 3, 4, 5, 10 or more characteristics.
[0421] The binding of a molecule in the channel of the pore
complex, or in the vicinity of either opening of the channel will
have an effect on the open-channel ion flow through the pore, which
is the essence of "molecular sensing" of pore channels. In a
similar manner to the nucleic acid sequencing application,
variation in the open-channel ion flow can be measured using
suitable measurement techniques by the change in electrical current
(for example, WO 2000/28312 and D. Stoddart et al., Proc. Natl.
Acad. Sci., 2009, 106, 7702-7 or WO 2009/077734). The degree of
reduction in ion flow, as measured by the reduction in electrical
current, is related to the size of the obstruction within, or in
the vicinity of, the pore. Binding of a molecule of interest, also
referred to as an "analyte", in or near the pore therefore provides
a detectable and measurable event, thereby forming the basis of a
"biological sensor". Suitable molecules for nanopore sensing
include nucleic acids; proteins; peptides; polysaccharides and
small molecules (here, low molecular weight organic or inorganic
compounds, e.g. <900 Da or <500 Da) such as
pharmaceuticals, toxins, cytokines, and pollutants. Detecting the
presence of biological molecules finds application in personalised
drug development, medicine, diagnostics, life science research,
environmental monitoring and in the security and/or the defence
industry.
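The sensing principle in [0421], where a binding or translocation event registers as a reduction in open-channel current, can be sketched as a simple threshold detector. This Python example is a hypothetical illustration, not the recording or analysis method of the cited documents. The function name `detect_blockades`, the 80% threshold, the minimum event length and the toy trace are all assumptions.

```python
from statistics import fmean


def detect_blockades(current, open_level, threshold=0.8, min_samples=3):
    """Find stretches where the current falls below threshold * open_level.

    Returns a list of (start, end, residual) tuples: `end` is exclusive and
    `residual` is the mean blocked current as a fraction of the open-channel
    level (smaller residual = larger obstruction).
    """
    samples = list(current)
    cutoff = threshold * open_level
    events = []
    start = None
    # A sentinel at the open-channel level closes any event still open at the end.
    for i, value in enumerate(samples + [open_level]):
        if value < cutoff and start is None:
            start = i
        elif value >= cutoff and start is not None:
            if i - start >= min_samples:  # ignore very short dips
                events.append((start, i, fmean(samples[start:i]) / open_level))
            start = None
    return events


# Toy trace (arbitrary units): one 4-sample blockade and one 2-sample dip.
trace = [100.0] * 5 + [40.0] * 4 + [100.0] * 5 + [60.0] * 2 + [100.0] * 3
print(detect_blockades(trace, open_level=100.0))  # [(5, 9, 0.4)]
```

The two-sample dip is rejected by `min_samples`; a real analysis would also need to handle baseline drift and noise, which this sketch ignores.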
[0422] In another aspect, the isolated pore complex, or the
transmembrane pore complex containing a wild type or modified E.
coli CsgG nanopore, or homologue or mutant thereof, and a modified
CsgF peptide providing a channel constriction to the pore within
the complex, may serve as a molecular or biological sensor. In some
embodiments, the CsgG nanopore can be derived or isolated from
bacterial proteins (e.g., E. coli, Salmonella typhi). In some
embodiments, the CsgG nanopore can be recombinantly produced.
Procedures for analyte detection are described in Howorka et al.
Nature Biotechnology (2012) Jun. 7; 30(6):506-7. The analyte
molecule that is to be detected may bind to either face of the
channel, or within the lumen of the channel itself. The position of
binding may be determined by the size of the molecule to be
sensed.
[0423] The target analyte is preferably a metal ion, an inorganic
salt, a polymer, an amino acid, a peptide, a polypeptide, a
protein, a nucleotide, an oligonucleotide, a polynucleotide, a
polysaccharide, a dye, a bleach, a pharmaceutical, a diagnostic
agent, a recreational drug, an explosive, a toxic compound, or an
environmental pollutant. The method may concern determining the
presence, absence or one or more characteristics of two or more
analytes of the same type, such as two or more proteins, two or
more nucleotides or two or more pharmaceuticals. Alternatively, the
method may concern determining the presence, absence or one or more
characteristics of two or more analytes of different types, such as
one or more proteins, one or more nucleotides and one or more
pharmaceuticals.
[0424] The target analyte can be secreted from cells.
Alternatively, the target analyte can be an analyte that is present
inside cells such that the analyte must be extracted from the cells
before the method can be carried out.
[0425] A wild-type pore may act as a sensor, but is often modified
via recombinant or chemical methods to increase the strength or
specificity of binding of the molecule to be sensed, or to alter
its position of binding. Typical modifications include addition
of a specific binding moiety complementary to the structure of the
molecule to be sensed. Where the analyte molecule comprises a
nucleic acid, this binding moiety may comprise a cyclodextrin or an
oligonucleotide; for small molecules this may be a known
complementary binding region, for example the antigen binding
portion of an antibody or of a non-antibody molecule, including a
single chain variable fragment (scFv) region or an antigen
recognition domain from a T-cell receptor (TCR); or for proteins,
it may be a known ligand of the target protein. In this way the
wild type or modified E. coli CsgG nanopore, or homologue thereof,
may be rendered capable of acting as a molecular sensor for
detecting presence in a sample of suitable antigens (including
epitopes) that may include cell surface antigens, including
receptors, markers of solid tumours or haematologic cancer cells
(e.g. lymphoma or leukaemia), viral antigens, bacterial antigens,
protozoal antigens, allergens, allergy related molecules, albumin
(e.g. human, rodent, or bovine), fluorescent molecules (including
fluorescein), blood group antigens, small molecules, drugs,
enzymes, catalytic sites of enzymes or enzyme substrates, and
transition state analogues of enzyme substrates. As described
above, modifications may be achieved using known genetic
engineering and recombinant DNA techniques. The positioning of any
adaptation would be dependent on the nature of the molecule to be
sensed, for example, the size, three-dimensional structure, and its
biochemical nature. The choice of adapted structure may make use of
computational structural design. Determination and optimization of
protein-protein interactions or protein-small molecule interactions
can be investigated using technologies such as BIAcore®, which
detects molecular interactions using surface plasmon resonance
(BIAcore, Inc., Piscataway, N.J.; see also www.biacore.com).
[0426] In one embodiment, the analyte is an amino acid, a peptide,
a polypeptide or a protein. The amino acid, peptide, polypeptide or
protein can be naturally-occurring or non-naturally-occurring. The
polypeptide or protein can include synthetic or
modified amino acids. Several different types of modification to
amino acids are known in the art. Suitable amino acids and
modifications thereof are described above. It is to be understood that the
target analyte can be modified by any method available in the
art.
[0427] In another embodiment, the analyte is a polynucleotide, such
as a nucleic acid, which is defined as a macromolecule comprising
two or more nucleotides. Nucleic acids are particularly suitable
for nanopore sequencing. The naturally-occurring nucleic acid bases
in DNA and RNA may be distinguished by their physical size. As a
nucleic acid molecule, or individual base, passes through the
channel of a nanopore, the size differential between the bases
causes a directly correlated reduction in the ion flow through the
channel. The variation in ion flow may be recorded. Suitable
electrical measurement techniques for recording ion flow variations
are described in, for example, WO 2000/28312 and D. Stoddart et
al., Proc. Natl. Acad. Sci., 2009, 106, pp 7702-7 (single channel
recording equipment); and, for example, in WO 2009/077734
(multi-channel recording techniques). Through suitable calibration,
the characteristic reduction in ion flow can be used to identify
the particular nucleotide and associated base traversing the
channel in real-time. In typical nanopore nucleic acid sequencing,
the open-channel ion flow is reduced as the individual nucleotides
of the nucleic acid sequence of interest sequentially pass through the
channel of the nanopore due to the partial blockage of the channel
by the nucleotide. It is this reduction in ion flow that is
measured using the suitable recording techniques described above.
The reduction in ion flow may be calibrated to the reduction in
measured ion flow for known nucleotides through the channel
resulting in a means for determining which nucleotide is passing
through the channel, and therefore, when done sequentially, a way
of determining the nucleotide sequence of the nucleic acid passing
through the nanopore. Accurate determination of individual
nucleotides has typically required the reduction in ion
flow through the channel to be directly correlated to the size of
the individual nucleotide passing through the constriction (or
"reading head"). It will be appreciated that sequencing may be
performed upon an intact nucleic acid polymer that is "threaded"
through the pore via the action of an associated polymerase, for
example. Alternatively, sequences may be determined by passage of
nucleotide triphosphate bases that have been sequentially removed
from a target nucleic acid in proximity to the pore (see for
example WO 2014/187924).
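The calibration step described in [0427], which maps a measured reduction in ion flow to the nucleotide that produced it, can be sketched as a nearest-reference lookup. The Python sketch below follows the simplified one-nucleotide-per-level picture in that paragraph. In practice several bases contribute to each level (see [0433]). The reference values in `REFERENCE_LEVELS` are made-up placeholders, not measured CsgG or CsgG:CsgF data, and the function names are hypothetical.

```python
# Hypothetical calibration table: fractional residual current (blocked / open)
# recorded for each known nucleotide. Placeholder numbers only.
REFERENCE_LEVELS = {"A": 0.55, "C": 0.48, "G": 0.62, "T": 0.40}


def call_base(level, references=REFERENCE_LEVELS):
    """Return the nucleotide whose calibrated level is closest to `level`."""
    return min(references, key=lambda base: abs(references[base] - level))


def call_sequence(levels, references=REFERENCE_LEVELS):
    """Assign a nucleotide to each successive measured level, in order."""
    return "".join(call_base(level, references) for level in levels)


print(call_sequence([0.41, 0.56, 0.61, 0.47]))  # TAGC
```

A nearest-neighbour rule is the simplest possible classifier. It is chosen here only to make the calibration idea concrete.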
[0428] The polynucleotide or nucleic acid may comprise any
combination of any nucleotides. The nucleotides can be naturally
occurring or artificial. One or more nucleotides in the
polynucleotide can be oxidized or methylated. One or more
nucleotides in the polynucleotide may be damaged. For instance, the
polynucleotide may comprise a pyrimidine dimer. Such dimers are
typically associated with damage by ultraviolet light and are the
primary cause of skin melanomas. One or more nucleotides in the
polynucleotide may be modified, for instance with a label or a tag,
for which suitable examples are known by a skilled person. The
polynucleotide may comprise one or more spacers. A nucleotide
typically contains a nucleobase, a sugar and at least one phosphate
group. The nucleobase and sugar form a nucleoside. The nucleobase
is typically heterocyclic. Nucleobases include, but are not limited
to, purines and pyrimidines and more specifically adenine (A),
guanine (G), thymine (T), uracil (U) and cytosine (C). The sugar is
typically a pentose sugar. Nucleotide sugars include, but are not
limited to, ribose and deoxyribose. The sugar is preferably a
deoxyribose. The polynucleotide preferably comprises the following
nucleosides: deoxyadenosine (dA), deoxyuridine (dU) and/or
thymidine (dT), deoxyguanosine (dG) and deoxycytidine (dC). The
nucleotide is typically a ribonucleotide or deoxyribonucleotide.
The nucleotide typically contains a monophosphate, diphosphate or
triphosphate. The nucleotide may comprise more than three
phosphates, such as 4 or 5 phosphates. Phosphates may be attached
on the 5' or 3' side of a nucleotide. The nucleotides in the
polynucleotide may be attached to each other in any manner. The
nucleotides are typically attached by their sugar and phosphate
groups as in nucleic acids. The nucleotides may be connected via
their nucleobases as in pyrimidine dimers. The polynucleotide may
be single stranded or double stranded. At least a portion of the
polynucleotide is preferably double stranded. The polynucleotide is
most preferably ribonucleic acid (RNA) or deoxyribonucleic
acid (DNA). In particular, when the analyte is a polynucleotide, the
method may comprise determining one or more
characteristics selected from (i) the length of the polynucleotide,
(ii) the identity of the polynucleotide, (iii) the sequence of the
polynucleotide, (iv) the secondary structure of the polynucleotide
and (v) whether or not the polynucleotide is modified.
[0429] The polynucleotide can be any length (i). For example, the
polynucleotide can be at least 10, at least 50, at least 100, at
least 150, at least 200, at least 250, at least 300, at least 400
or at least 500 nucleotides or nucleotide pairs in length. The
polynucleotide can be 1000 or more nucleotides or nucleotide pairs,
5000 or more nucleotides or nucleotide pairs in length or 100000 or
more nucleotides or nucleotide pairs in length. Any number of
polynucleotides can be investigated. For instance, the method may
concern characterising 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100
or more polynucleotides. If two or more polynucleotides are
characterised, they may be different polynucleotides or two
instances of the same polynucleotide. The polynucleotide can be
naturally occurring or artificial. For instance, the method may be
used to verify the sequence of a manufactured oligonucleotide. The
method is typically carried out in vitro.
[0430] Nucleotides can have any identity (ii), and include, but are
not limited to, adenosine monophosphate (AMP), guanosine
monophosphate (GMP), thymidine monophosphate (TMP), uridine
monophosphate (UMP), 5-methylcytidine monophosphate,
5-hydroxymethylcytidine monophosphate, cytidine monophosphate
(CMP), cyclic adenosine monophosphate (cAMP), cyclic guanosine
monophosphate (cGMP), deoxyadenosine monophosphate (dAMP),
deoxyguanosine monophosphate (dGMP), deoxythymidine monophosphate
(dTMP), deoxyuridine monophosphate (dUMP), deoxycytidine
monophosphate (dCMP) and deoxymethylcytidine monophosphate. The
nucleotides are preferably selected from AMP, TMP, GMP, CMP, UMP,
dAMP, dTMP, dGMP, dCMP and dUMP. A nucleotide may be abasic (i.e.
lack a nucleobase). A nucleotide may also lack a nucleobase and a
sugar (i.e. be a C3 spacer). The sequence of the nucleotides (iii)
is the order of the consecutive nucleotides attached to each other
along the polynucleotide strand, read in the 5' to 3' direction.
[0431] The pores comprising a CsgG pore and CsgF peptides are
particularly useful in analysing homopolymers. For example, the
pores may be used to determine the sequence of a polynucleotide
comprising two or more, such as at least 3, 4, 5, 6, 7, 8, 9 or 10,
consecutive nucleotides that are identical.
[0432] For example, the pores may be used to sequence a
polynucleotide comprising a polyA, polyT, polyG and/or polyC
region.
[0433] The CsgG pore constriction is formed by the residues at
positions 51, 55 and 56 of SEQ ID NO: 3. The reader heads of CsgG
and its constriction mutants are generally sharp. When DNA is
passing through the constriction, interactions of approximately 5
bases of DNA with the reader head of the pore at any given time
dominate the current signal. Although these sharper reader heads
are very good at reading mixed sequence regions of DNA (when A, T,
G and C are mixed), the signal becomes flat and lacks information
when there is a homopolymeric region within the DNA (e.g. polyT,
polyG, polyA, polyC). Because 5 bases dominate the signal of
CsgG and its constriction mutants, it is difficult to discriminate
homopolymers longer than 5 bases without using additional dwell time
information. However, if DNA passes through a second reader
head, more DNA bases will interact with the combined reader heads,
increasing the length of the homopolymers that can be
discriminated. The Examples and Figures show that such an increase
in homopolymer sequencing accuracy is achieved using the pore
comprising a CsgG pore and a CsgF peptide.
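The argument in [0433] can be made concrete by counting reader-head states. If a reader head spans k bases, a homopolymer run of L ≥ k bases yields L - k + 1 consecutive identical k-mers, so the current stays flat and the run length is only recoverable from dwell time. In the Python sketch below, k = 5 reflects the approximately 5 bases stated above, while k = 9 for a combined CsgG plus CsgF reader span is a purely hypothetical value chosen for illustration. The function names and the test sequence are also hypothetical.

```python
def kmer_states(sequence, k):
    """Successive k-base windows seen by a reader head of span k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def longest_flat_stretch(sequence, k):
    """Length of the longest run of consecutive identical k-mer states.

    Identical consecutive states give, to a first approximation, an
    unchanging current level, i.e. a 'flat' signal.
    """
    states = kmer_states(sequence, k)
    if not states:
        return 0
    longest = current = 1
    for previous, following in zip(states, states[1:]):
        current = current + 1 if following == previous else 1
        longest = max(longest, current)
    return longest


read = "ACG" + "T" * 8 + "CA"  # toy sequence with an 8-base polyT run
print(longest_flat_stretch(read, 5))  # 4: identical states, flat signal
print(longest_flat_stretch(read, 9))  # 1: every state differs
```

The model ignores everything except which bases occupy the reader span, but it shows why a longer effective span can resolve longer homopolymers.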
Kit
[0434] In a further aspect, the present invention also provides a
kit for characterising a target polynucleotide. The kit comprises
an isolated pore complex according to the invention, and the
components of a membrane or an insulating layer. The membrane is
preferably formed from the components. The isolated pore complex is
preferably present in the membrane or in the insulating layer,
together forming a transmembrane pore complex channel. The kit may
comprise components of any type of membranes, such as an
amphiphilic layer or a triblock copolymer membrane. The kit may
further comprise a polynucleotide binding protein. The kit may
further comprise one or more anchors for coupling the
polynucleotide to the membrane. The kit may additionally comprise
one or more other reagents or instruments which enable any of the
embodiments mentioned above to be carried out. Such reagents or
instruments include one or more of the following: suitable
buffer(s) (aqueous solutions), means to obtain a sample from a
subject (such as a vessel or an instrument comprising a needle),
means to amplify and/or express polynucleotides, or a voltage or patch
clamp apparatus. Reagents may be present in the kit in a dry state
such that a fluid sample resuspends the reagents. The kit may also,
optionally, comprise instructions to enable the kit to be used in
the method of the invention or details regarding the organisms for which
the method may be used. Finally, the kit may also comprise
additional components useful in peptide characterization.
[0435] In one embodiment, the isolated pore complex or
transmembrane pore complex, as provided by the invention, is used
for nucleic acid sequencing. For said use, the Phi29 DNA polymerase
(DNAP) may be used as a molecular motor with a CsgG:CsgF nanopore
complex located within a membrane to allow controlled movement of
an oligomeric probe DNA strand through the pore. A voltage may be
applied across the pore and a current generated from the movement
of ions in a salt solution on either side of the nanopore. As the
probe DNA moves through the pore, the ionic flow through the pore
changes depending on the DNA within the pore. This information has been shown to
be sequence dependent and allows for the sequence of the probe to
be read with accuracy from current measurements.
[0436] It is to be understood that although particular embodiments,
specific configurations as well as materials and/or molecules, have
been discussed herein for engineered cells and methods according to
the present invention, various changes or modifications in form and
detail may be made without departing from the scope and spirit of
this invention. The following examples are provided to better
illustrate particular embodiments, and they should not be
considered as limiting the application. The application is limited
only by the claims.
EXAMPLES
Introduction
[0437] The CsgG pore is part of the multi-component type VIII
secretion system, also known as the curli biosynthesis system,
which is responsible for the formation of aggregative fibres known
in E. coli as curli. Curli are extracellular proteinaceous fibres
primarily involved in bacterial biofilm formation and attachment to
non-biotic surfaces. Curli biosynthesis is directed by two operons,
in E. coli called csgBAC and csgDEFG (curli-specific genes) (Hammar
et al., 1995). Secretion of curli subunits CsgA and CsgB depends on
CsgG, a dedicated lipoprotein found to form an oligomeric secretion
channel in the outer membrane. For transport, CsgG works in
coordination with the periplasmic and extracellular accessory
proteins CsgE and CsgF. CsgE forms a specificity factor for
CsgG-mediated transport, whilst CsgF appears to couple CsgA
secretion to the CsgB-templated aggregation into extracellular
fibers.
[0438] A crystal structure of the CsgG secretion channel
demonstrated CsgG to form a nonameric transport complex 120 Å
in diameter and 85 Å in height that traverses the outer membrane (OM)
through a 36-stranded β-barrel of 40 Å inner diameter (Goyal et al.,
2014; FIG. 1). A periplasmic domain of the channel, separated from
the transmembrane β-barrel by an iris-like diaphragm, forms a
large solvent-accessible cavity of 24,000 Å³. This
diaphragm is formed by a conserved 12-residue `constriction loop`
(CL) found in each subunit, and whose concentric organization in
the CsgG oligomer forms an orifice with solvent-excluded diameter
of ~0.6 nm and a height of ~1.5 nm (FIG. 1). When
acting as a protein secretion channel, this orifice or constriction
in the CsgG channel forms the primary site of interaction with the
translocating polypeptide. When CsgG is used as a nanopore sensing
platform, this orifice serves the role of primary reader head for
analytes residing in or passing the channel. The diameter of the
orifice as well as its physical and chemical properties can be
modified by amino acid substitutions, deletions or insertions in a
region of the protein corresponding to residues 46 to 61 of SEQ ID
NO:3 (FIG. 1D). In particular, mutation of positions 51, 55, and 56
according to SEQ ID NO:3, together or individually, has a
beneficial influence on the conductance characteristics of the
nanopore and its interaction with analytes including
polynucleotides.
[0439] The assembly factor CsgF is a component of the curli
secretion apparatus. The CsgF preprotein reaches the periplasm via
the Sec pathway, after which mature CsgF (12.9 kDa) is found as a
surface-exposed protein in a CsgG-dependent manner. In the presence of
CsgG, CsgF fractionates with the OM, and co-immunoprecipitation
experiments suggested the two proteins are in direct contact.
[0440] Available data demonstrate that CsgF is non-essential for
productive subunit secretion, but rather suggest the protein forms
a coupling factor between CsgA secretion and extracellular
polymerization into curli fibers by coordinating or chaperoning the
nucleating function of the CsgB subunit.
Example 1: CsgG:CsgF Complex Protein Production (Co-Expression, In
Vitro Reconstitution, Coupled In-Vitro Transcription and
Translation and Reconstitution of CsgG with CsgF Synthetic
Peptides)
[0441] To produce the CsgG:CsgF complex, both proteins can be
co-expressed in a suitable Gram-negative host such as E. coli, and
extracted and purified as a complex from the outer membrane. The in
vivo formation of the CsgG pore and the CsgG:CsgF complex requires
targeting of the proteins to the outer membrane. To do so, CsgG is
expressed as a prepro-protein with a lipoprotein signal peptide
(Juncker et al. 2003, Protein Sci. 12(8): 1652-62) and a Cys residue
at the N-terminal position of the mature protein (SEQ ID No:3). An
example of such a lipoprotein signal peptide is residues 1-15 of full
length E. coli CsgG as shown in SEQ ID No:2. Processing of prepro
CsgG results in cleavage of the signal peptide and lipidation of
mature CsgG, followed by transfer of the mature lipoprotein to the
outer membrane, where it inserts as an oligomeric pore (Goyal et
al. 2014, Nature 516(7530):250-3). To form the CsgG:CsgF complex,
CsgF can be co-expressed with CsgG and targeted to the periplasm by
means of a leader sequence such as the native signal peptide
corresponding to residues 1-19 of SEQ ID No:5. CsgG:CsgF
combination pores can then be extracted from the outer membrane
using detergents, and purified to a homogeneous complex by
chromatography (FIG. 2).
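Residue ranges in the text, such as residues 1-15 of SEQ ID No:2 (the CsgG signal peptide) or residues 1-19 of SEQ ID No:5 (the CsgF signal peptide), are 1-based and inclusive, which is easy to get wrong when sequences are handled programmatically. The Python sketch below shows the conversion to 0-based slicing. The helper names and the toy preprotein are hypothetical, and no real CsgG or CsgF sequence is used.

```python
def residues(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("residue range outside sequence")
    return sequence[start - 1:end]


def mature_protein(preprotein, signal_peptide_length):
    """Remove an N-terminal signal peptide spanning residues 1..n."""
    return preprotein[signal_peptide_length:]


# Toy preprotein (hypothetical): a 15-residue signal peptide followed by a
# mature chain that begins with the Cys that would carry the lipid anchor.
toy = "MKKLLLAVAVLLSGA" + "CSTQEAK"
print(residues(toy, 1, 15))             # MKKLLLAVAVLLSGA
mature = mature_protein(toy, 15)
print(mature, mature.startswith("C"))   # CSTQEAK True
```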
[0442] Alternatively, the CsgG:CsgF pore complex can be produced by
in vitro reconstitution using the CsgG pore and CsgF (see below and
FIG. 3).
[0443] For in vivo CsgG:CsgF complex formation in the example shown
in FIG. 2, E. coli CsgF (SEQ ID NO:5) and CsgG (SEQ ID NO:2) were
co-expressed using their native signal peptides to ensure
periplasmic targeting of both proteins, as well as N-terminal
lipidation of CsgG. Additionally, for ease of purification, CsgF
was modified by introduction of a C-terminal 6x histidine tag and
CsgG was fused C-terminally to a Strep-II tag. Co-expression and
complex purification were performed as described in the Methods.
SDS-PAGE analysis of the His affinity purification eluate revealed
the enrichment of CsgF-His, as well as the co-purification of
CsgG-Strep, suggesting the latter was in a complex with CsgF (FIG.
2B). Additionally, the SDS-PAGE revealed that a significant
fraction of the eluted CsgF ran at lower molecular mass due to the
loss of an N-terminal fragment of the protein (FIG. 2B, indicated
with an asterisk). SDS-PAGE analysis of the eluate of the second
(Strep) affinity purification, performed on the pooled His-trap elution
fractions, revealed the presence of CsgG and CsgF at apparently equimolar concentrations,
as well as the loss of the CsgF truncation fragment seen in the
His-trap eluate (FIG. 2B). Co-elution of CsgF in the Strep-affinity
purification indicated that the protein is present as a
non-covalent complex with CsgG. Strikingly, the N-terminal
truncation fragment of CsgF was lost in the Strep-affinity
purification, suggesting that the CsgF N-terminus is required to
bind CsgG (FIG. 2B).
[0444] FIG. 13 shows another example of CsgG:CsgF complex formation
by in vivo co-expression. In this example, CsgG protein is modified
with a C-terminal Strep-II tag and the CsgF full length protein is
modified with a C-terminal 10x histidine tag. The co-expressed
CsgG:CsgF complex has been purified away from its constituent
components by Strep tag purification followed by histidine tag
purification, as described in the materials and methods section for
characterisation of analytes. Due to the differences in molecular
weights, the CsgG:CsgF complex can be clearly differentiated from the
CsgG pore in SDS-PAGE analysis (FIG. 13A). As shown in FIG. 13B, the
two-tag purification method can be successfully applied to purify
the CsgG:CsgF complex away from its constituent components.
[0445] To produce the CsgG:CsgF complex by in vitro reconstitution,
CsgG and CsgF were expressed in separate E. coli cultures
transformed with pPG1 and pNA101, respectively, and purified,
followed by in vitro reconstitution of the CsgG:CsgF complex (see
Methods). For comparison, purified CsgG was run over the
Superose 6 column in the same way as the complex. The CsgG Superose 6 run showed
the existence of two discrete populations, corresponding to
nonameric CsgG pores (FIG. 3A (a) and 3C) as well as dimers of
nonameric CsgG pores (FIG. 3A (b) and 3C), as previously described
in Goyal et al. (2014). The Superose 6 run of the CsgG:CsgF
reconstitution revealed the existence of three discrete populations
corresponding to excess CsgF (FIG. 3A (c)), nonameric CsgG:CsgF
complex (FIG. 3A (d)) and dimers of nonameric CsgG:CsgF (FIG. 3A
(e)). To provide independent confirmation of the formation of
CsgG:CsgF complexes, the various Superose 6 elution peaks were
analysed on native PAGE (FIG. 3B).
[0446] Surprisingly, CsgG:CsgF complex can also be made by coupled
in vitro transcription and translation (IVTT) method as described
in the materials and methods section for characterisation of
analytes. The complex can be made either by expressing CsgG and
CsgF proteins in the same IVTT reaction or reconstituting
separately made CsgG and CsgF in two different IVTT reactions. In
the example shown in FIG. 14, E. coli T7-S30 extract system for
circular DNA (Promega) has been used to make the CsgG:CsgF complex
in one reaction mixture and proteins were analysed on SDS-PAGE.
Since protein expression in IVTT does not use the native cellular
machinery for protein expression and targeting, the DNA used to
express proteins in IVTT lacks the sequence encoding the signal peptide
region. When the DNA of CsgG is expressed in IVTT in the absence of
DNA of CsgF, only the monomers of CsgG can be produced.
Surprisingly, these expressed monomers can be assembled into CsgG
oligomeric pores in situ by using cell extract membranes present in
the IVTT reaction mixture (FIG. 14, lane 1). Although the
oligomer of CsgG is SDS-stable, it breaks down into its constituent
monomers when the sample is heated to 100 °C (FIG. 14,
lane 2). When the DNA of CsgF is expressed in IVTT in the absence
of DNA of CsgG, only CsgF monomers can be seen (FIG. 14, lane
3). When DNA of CsgG and CsgF are mixed in a 1:1 ratio and expressed
simultaneously in the same IVTT reaction mixture, the CsgF proteins
generated interact with the assembled CsgG pore with high
efficiency to make the CsgG:CsgF complex (FIG. 14, lane 5). This SDS-stable
complex made in IVTT is heat stable up to at least
70 °C (FIG. 14, lanes 6-12).
[0447] CsgG:CsgF complexes with truncated CsgF can also be made by
any of the methods shown above by using DNA encoding truncated CsgF
instead of the full length version. However, stability of the
complex may be compromised when CsgF is truncated below the FCP
domain. In addition, CsgG:CsgF complexes with truncated CsgF can be
made by cleaving the full length CsgF in appropriate positions once
the full length CsgG:CsgF complex is formed. Truncations can be
made by modifying the DNA encoding the CsgF protein to incorporate
protease cleavage sites at the positions where cleavage is needed (FIG.
15A). SEQ ID NOs: 56-67 show TEV or HRV 3C protease sites
incorporated at various positions of CsgF to generate CsgG:CsgF
complexes with truncated CsgF. SDS-PAGE analysis of TEV cleavage of
the CsgG:CsgF complex made with SEQ ID NO: 61 is shown (FIG.
15B). When the CsgG:CsgF complex (with full length CsgF) is
treated with TEV protease as described in the materials and
methods section for characterisation of analytes, CsgF is
truncated at position 35 (FIG. 15B, lanes 3 and 4). However, TEV
cleavage leaves an extra 6 amino acids (ENLYFQ) at the C terminus of the
truncated CsgF. Therefore, the truncated CsgF remaining in
complex with the CsgG pore is 42 amino acids long. The molecular weight
difference between this complex and the CsgG pore (without CsgF) is
still visible in SDS-PAGE (FIG. 15B, lanes 7 and 8).
[0448] Surprisingly, CsgG:CsgF complexes with truncated CsgF can
also be made by reconstituting purified CsgG pore (produced in vivo
or in vitro) with synthetic peptides of appropriate length. Since
the reconstitution takes place in vitro, the signal peptide of CsgF is
not required to make the CsgG:CsgF complex. Further, this method
does not leave extra amino acids at the C terminus of the CsgF.
Mutations and modifications can also be easily incorporated into
synthetic CsgF peptides. Therefore, this method is a very
convenient way to reconstitute different CsgG pores or mutants or
homologues thereof with different CsgF peptides or mutants or
homologues thereof to generate different CsgG:CsgF complex
variants. Stability of the complex may be compromised when the CsgF
is truncated beyond the FCP domain. Examples of truncated CsgF and
FCP peptides used to generate CsgG:CsgF complex variants are shown
in Table 3. Surprisingly, SDS-PAGE analysis of the heat stability
of CsgG:CsgF complexes made by this method with CsgF-(1-45) (FIG.
16A), CsgF-(1-35) (FIG. 16B) and CsgF-(1-30) (FIG. 16C) shows that at
least the CsgF-(1-45) and CsgF-(1-35) peptides make complexes with CsgG
that are heat stable up to at least 90 °C. Since the CsgG pore
breaks down to its constituent monomers at 90 °C, it is
difficult to assess the stability of the complex beyond 90 °C.
Due to the minimal difference between the CsgG pore band and the
CsgG:CsgF-(1-30) complex band in SDS-PAGE, this method is not
sufficient to analyse the heat stability of the CsgG:CsgF-(1-30)
complex (FIG. 16C). However, CsgG:CsgF complexes have been
observed in electrophysiological experiments in all three cases, and
even with CsgF-(1-29), indicating that even the CsgF-(1-29)
peptide produces at least some CsgG:CsgF complexes (FIG.
24).
Example 2: CsgG:CsgF Structural Analysis Via Cryo-EM
[0449] To gain structural insight into the CsgG:CsgF complex,
co-purified or in vitro reconstituted CsgG:CsgF particles were
analysed by transmission electron microscopy. In preparation for
cryo-EM analysis, 500 µL of the peak fraction of the
double-affinity purified CsgG:CsgF complex was injected onto a
Superose 6 10/30 column (GE Healthcare) equilibrated with Buffer D
(25 mM Tris pH 8, 200 mM NaCl and 0.03% DDM), and run at 0.5 mL/min.
Protein concentration was determined based on calculated absorbance
at 280 nm and assuming 1:1 stoichiometry. Samples for electron
cryomicroscopy were analysed as described in the Methods. A cryo-EM
micrograph of the CsgG:CsgF complex as well as two selected class
averages from the picked CsgG:CsgF particles are shown in FIG. 4.
The micrograph shows the presence of nonameric pore as well as
dimer of nonameric pore complexes. For image reconstruction,
nonameric CsgG:CsgF particles were picked and aligned using RELION.
Class averages of the CsgG:CsgF complex as side views, as well as
the 3D reconstructed electron density show the presence of an
additional density corresponding to CsgF, seen as a protrusion from
the CsgG particle, located at the side of the CsgG R-barrel (FIG.
4B, 5). The additional density reveals three distinct regions,
encompassing a globular head domain, a hollow neck domain and a
domain that interacts with the CsgG R-barrel. The latter CsgF
region, referred to as CsgF constriction peptide or FCP, inserts
into the lumen of the CsgG R-barrel and can be seen to form an
additional constriction (labeled F in FIG. 4B, 5) of the CsgG pore,
located approximately 2 nm above the constriction formed by the
CsgG constriction loop (labeled G in FIG. 4B, 5).
Example 3: Identification of the CsgF Interaction and Constriction
Peptide by Truncation of CsgF
[0450] The presence of a second constriction in the CsgG:CsgF pore
complex, compared with the CsgG-only pore, creates opportunities
for nanopore sensing applications: it provides a second orifice in
the nanopore that can be used as a second reader head or as an
extension of the primary reader head formed by the CsgG
constriction loop (FIG. 6, 7). However, when CsgG is in complex with
full-length CsgF, the exit side of the CsgG:CsgF combination pore is
blocked by the CsgF neck and head domains. We therefore sought to
determine the CsgF region required to interact with and insert into
the CsgG β-barrel. Our Strep-Tactin affinity purification
experiments suggested that the N-terminal region of CsgF was required
for CsgG interaction, since an N-terminal truncation fragment of
CsgF present in the His-trap affinity purification was lost and did
not co-purify with CsgG (FIG. 2B). CsgF homologues are
characterised by the presence of PFAM domain PF03783. A
multiple sequence alignment (MSA) of CsgF homologues
found in Gram-negative bacteria (FIG. 8 shows an MSA of selected
CsgF homologues) revealed a region of sequence conservation (between
35 and 100% pairwise sequence identity) corresponding to the
first ~30-35 amino acids of mature CsgF (SEQ ID NO:6). Based
on the combined data, this N-terminal region of CsgF was
hypothesised to form the CsgG interaction peptide or FCP. A
multiple sequence alignment of the FCPs in known CsgF homologues is
shown in FIG. 10.
[0451] To test the hypothesis that the CsgF N-terminus corresponds
to the CsgG binding region and forms the CsgF constriction peptide
residing in the CsgG β-barrel lumen, Strep-tagged CsgG and
His-tagged CsgF truncates were co-overexpressed in E. coli (see
Methods). pNA97, pNA98, pNA99 and pNA100 encode N-terminal CsgF
fragments corresponding to residues 1-27, 1-38, 1-48 and 1-64 of
CsgF (SEQ ID NO:5). These fragments include the CsgF signal peptide
(residues 1-19 of SEQ ID NO: 5) and thus
produce periplasmic peptides corresponding to the first 8, 19, 29
and 45 residues of mature CsgF (SEQ ID NO:6; FIG. 9A), each
carrying a C-terminal 6×His tag. SDS-PAGE analysis of whole
cell lysates revealed the presence of CsgG in all samples, as well
as the CsgF fragment corresponding to the first 45
residues of mature CsgF (SEQ ID NO:6; FIG. 9B). For the shorter
N-terminal CsgF fragments, no detectable expression of the peptides
was seen in the whole cell lysates. After two freeze/thaw cycles of
the cell mass, the various CsgG:CsgF fragment complexes were further
enriched by purification. Whole cell lysates as well as the eluted
fractions of the Strep affinity purification were spotted onto a
nitrocellulose membrane for dot blot analysis using an anti-His
antibody to detect the His-tagged CsgF fragments (FIG.
9C). The dot blot shows that the CsgF 20:64 peptide co-purifies with
CsgG, demonstrating that this CsgF fragment is sufficient to form a
stable non-covalent complex with CsgG. For the CsgF 20:48 fragment,
a small amount of peptide can be seen to co-purify with CsgG,
whilst no detectable levels are seen for CsgF 20:27 or CsgF 20:38
in either the whole cell lysate or the Strep affinity purification
(FIG. 9C), suggesting that the latter peptides are not stably
expressed in E. coli and/or do not form a stable complex with
CsgG.
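This paragraph switches between precursor numbering (SEQ ID NO: 5, where residues 1-19 are the signal peptide) and mature numbering (SEQ ID NO: 6). As a minimal sketch, assuming only that the 19-residue signal peptide is cleaved as stated, the Python below converts between the two and reproduces the 8, 19, 29 and 45-residue mature peptides and the "CsgF 20:64"-style labels. The function and constant names are illustrative, not part of the original work.

```python
SIGNAL_PEPTIDE_END = 19  # CsgF signal peptide = residues 1-19 of SEQ ID NO: 5


def to_mature_position(precursor_pos: int) -> int:
    """Convert a SEQ ID NO: 5 (precursor) position to mature SEQ ID NO: 6 numbering."""
    if precursor_pos <= SIGNAL_PEPTIDE_END:
        raise ValueError("position lies inside the signal peptide")
    return precursor_pos - SIGNAL_PEPTIDE_END


def mature_length(precursor_end: int) -> int:
    """Number of mature residues left from a precursor fragment 1..precursor_end."""
    return to_mature_position(precursor_end)


def fragment_label(precursor_end: int) -> str:
    """Label in the 'CsgF 20:64' style used for the periplasmic peptides."""
    return f"CsgF {SIGNAL_PEPTIDE_END + 1}:{precursor_end}"


# Precursor end points encoded by each truncation plasmid, as stated above.
CONSTRUCTS = {"pNA97": 27, "pNA98": 38, "pNA99": 48, "pNA100": 64}

for plasmid, end in CONSTRUCTS.items():
    print(plasmid, fragment_label(end), mature_length(end))
```

The same conversion links the two numberings used in Example 4, where residues 20-54 of SEQ ID NO: 5 are the first 35 residues of mature CsgF (`to_mature_position(54)` gives 35).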
Example 4: Description of the CsgG:CsgF Interaction at Atomic
Resolution
[0452] To gain atomic-level detail on the CsgG:CsgF interaction,
we determined a high-resolution cryo-EM structure of the CsgG:CsgF
complex. For this purpose, CsgG and CsgF were co-expressed
recombinantly in E. coli, and the CsgG:CsgF complex was isolated
from E. coli outer membranes by detergent extraction and purified
using tandem affinity purification. Samples for electron
cryo-microscopy were prepared by spotting 3 µL sample on R2/1
Holey grids (Quantifoil) coated with graphene oxide, and data were
collected on a 300 kV TITAN Krios with a Gatan K2 direct electron
detector in counting mode. 62,000 single CsgG:CsgF particles were
used to calculate a final electron density map at 3.4 Å
resolution (FIG. 11A). The map allowed unambiguous docking and
local rebuilding of the CsgG crystal structure, as well as de
novo building of the N-terminal 35 residues of mature CsgF (i.e.
residues 20-54 of SEQ ID NO: 5), which encompass the FCP that binds
CsgG and forms a second constriction at the height of the CsgG
transmembrane β-barrel (FIG. 11C, D). The cryo-EM structure
shows that CsgG:CsgF has a 9:9 stoichiometry with C9 symmetry
(FIG. 11B). The FCP binds the inside of the CsgG β-barrel,
with the C-terminus of CsgF pointing out of the CsgG
β-barrel and the CsgF N-terminus located near the CsgG
constriction. The structure shows that P35 in mature CsgF lies
outside the CsgG β-barrel and forms the connection between the
CsgF FCP and neck regions. The CsgF neck and head regions are not
resolved in the high-resolution cryo-EM maps because of their
flexibility relative to the main body of the CsgG:CsgF complex. Three
regions in the CsgG β-barrel stabilize the CsgG:CsgF interaction:
(IR1) residues Y130, D155, S183, N209 and T207 in mature CsgG (SEQ
ID NO: 3) form an interaction network with the N-terminal amine and
residues 1 to 4 of mature CsgF (SEQ ID NO: 6), comprising four
H-bonds and an electrostatic interaction; (IR2) residues Q187, D149
and E203 in mature CsgG (SEQ ID NO: 3) form an interaction network
with R8 and N9 in mature CsgF (SEQ ID NO: 6), encompassing three
H-bonds and two electrostatic interactions; and (IR3) residues F144,
F191, F193 and L199 in mature CsgG (SEQ ID NO: 3) form a
hydrophobic interaction surface with residues F21, L22 and A26 in
mature CsgF (SEQ ID NO: 6). The latter are located in an
α-helix (helix 1) formed by residues 19-30 of mature CsgF.
The conserved sequence N-P-X-F-G-G (residues 9-14 in SEQ ID NO: 6)
forms an inward turn that connects the loop region formed by
residues 15-19 with CsgF helix 1. Together, these elements give
rise to a constriction in the CsgG:CsgF complex, of which residue
17 (N17 in mature E. coli CsgF, SEQ ID NO: 6) forms the narrowest
point, resulting in an orifice 15 Å in diameter (FIG. 11C).
The second constriction (F-constriction or FC) lies approximately
15 Å and 30 Å above the top and bottom, respectively, of
the constriction formed by CsgG residues 46 to 59 (G-constriction
or GC).
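The residue positions named above can be cross-checked against the CsgF sequence given later in the Materials and Methods. A small Python sketch, assuming that the pNA100-derived periplasmic peptide (SEQ ID NO: 40, shown there with a 6×His tag) is exactly residues 1-45 of mature CsgF, looks up the named residues, the N-P-X-F-G-G motif and the helix 1 segment. The helper names are hypothetical.

```python
# Residues 1-45 of mature E. coli CsgF: the pNA100-derived periplasmic peptide
# (SEQ ID NO: 40) listed in the Materials and Methods, with its His tag removed.
MATURE_CSGF_1_45 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIET"


def residue(seq: str, pos: int) -> str:
    """Residue at 1-based position pos in one-letter form, e.g. 'N17'."""
    return f"{seq[pos - 1]}{pos}"


def segment(seq: str, start: int, end: int) -> str:
    """Residues start..end inclusive, using the 1-based numbering of the text."""
    return seq[start - 1:end]


def matches_motif(seg: str, motif: str) -> bool:
    """True if seg matches motif position by position; 'X' matches any residue."""
    return len(seg) == len(motif) and all(m in ("X", s) for s, m in zip(seg, motif))


# Residues named for IR1-IR3, the narrowest point (N17) and the FCP/neck link (P35).
print([residue(MATURE_CSGF_1_45, p) for p in (1, 8, 9, 17, 21, 22, 26, 35)])

turn = segment(MATURE_CSGF_1_45, 9, 14)
print(turn, matches_motif(turn, "NPXFGG"))  # conserved inward turn
print(segment(MATURE_CSGF_1_45, 19, 30))    # helix 1
```

Every position named in this paragraph (G1, R8, N9, N17, F21, L22, A26, P35) agrees with that sequence, and residues 9-14 read NPNFGG, consistent with the N-P-X-F-G-G motif.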
Example 5: Simulations to Improve CsgG-CsgF Complex Stability
[0453] Molecular dynamics simulations were performed to establish
which residues in CsgG and CsgF come into close proximity. This
information was used to design CsgG and CsgF mutants that could
increase the stability of the complex.
[0454] Simulations were performed using the GROMACS package version
4.6.5, with the GROMOS 53a6 force field and the SPC water model. The
cryo-EM structure of the CsgG-CsgF complex was used in the
simulations. The complex was solvated and then energy minimised
using the steepest descents algorithm. Throughout the simulation,
restraints were applied to the backbone of the complex, whereas
the residue side chains were free to move. The system was simulated
in the NPT ensemble for 20 ns at 300 K, using the Berendsen
thermostat and Berendsen barostat.
[0455] Contacts between CsgG and CsgF were analysed using both
GROMACS analysis software and locally written code. Two
residues were defined as being in contact if they came within
3 Å of each other. The results are shown in Table 4 below.
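The locally written analysis code is not described. As a hypothetical illustration of the stated contact definition, the Python sketch below computes a "% time spent in contact" value for one residue pair. It assumes (i) that coordinates have already been extracted from the trajectory as one list of (x, y, z) tuples in Angstrom per residue per frame, and (ii) that "within 3 Å" means any atom of one residue lies closer than 3 Å to any atom of the other. Both the data layout and the any-atom interpretation are assumptions, and the function names are invented.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
CONTACT_CUTOFF_A = 3.0  # contact distance stated in the text (Angstrom)


def in_contact(atoms_a: Sequence[Point], atoms_b: Sequence[Point],
               cutoff: float = CONTACT_CUTOFF_A) -> bool:
    """Assumed definition: any atom of residue A closer than cutoff to any atom of B."""
    return any(math.dist(a, b) < cutoff for a in atoms_a for b in atoms_b)


def percent_time_in_contact(frames_a: List[Sequence[Point]],
                            frames_b: List[Sequence[Point]],
                            cutoff: float = CONTACT_CUTOFF_A) -> float:
    """Percentage of trajectory frames in which the two residues are in contact."""
    if not frames_a or len(frames_a) != len(frames_b):
        raise ValueError("need the same, non-zero number of frames for both residues")
    hits = sum(in_contact(a, b, cutoff) for a, b in zip(frames_a, frames_b))
    return 100.0 * hits / len(frames_a)


# Toy four-frame trajectory with one atom per residue (invented coordinates).
frames_a = [[(0.0, 0.0, 0.0)]] * 4
frames_b = [[(2.5, 0.0, 0.0)], [(3.5, 0.0, 0.0)], [(1.0, 1.0, 1.0)], [(10.0, 0.0, 0.0)]]
print(percent_time_in_contact(frames_a, frames_b))
```

In the toy data the residues are closer than 3 Å in two of four frames, so the function reports 50.0. A real analysis would read the GROMACS trajectory with a dedicated tool rather than hand-built lists.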
TABLE 4
Predicted contact frequencies of residue pairs in the CsgG/CsgF complex
CsgG residue    CsgF residue    % time spent in contact
GLU 203         ARG 8           88.8
GLU 201         ASN 11          87.4
GLU 201         PHE 12          84.3
GLU 203         ASN 9           83.6
ASP 155         GLY 1           81.2
GLU 203         PHE 7           81.0
GLU 201         ASN 9           77.2
SER 183         GLY 1           76.1
ASN 209         MET 3           70.8
THR 207         PHE 5           70.1
ASP 149         ARG 8           68.5
GLN 187         ARG 8           66.1
ARG 142         PHE 12          65.4
GLU 185         ARG 8           64.4
ASP 149         PHE 12          64.2
GLN 187         GLN 6           63.3
GLY 205         PHE 5           54.0
GLN 197         ASN 30          52.5
GLN 197         SER 31          51.4
LYS 49          THR 2           50.8
PHE 144         GLN 29          50.6
GLU 201         PHE 21          48.0
GLN 151         PHE 5           47.0
PHE 191         ASN 9           46.9
ARG 142         ASN 11          46.4
GLN 151         PHE 7           45.6
TYR 196         TYR 32          45.4
PHE 191         PHE 21          45.3
PHE 193         ALA 26          45.1
GLU 201         SER 25          44.9
LEU 199         GLN 29          44.7
ARG 141         PHE 12          43.1
GLY 138         PHE 7           43.0
GLN 187         PHE 5           43.0
GLY 145         GLN 29          42.4
GLN 153         GLY 1           42.1
GLY 140         PHE 7           40.5
PHE 193         TYR 32          39.9
GLU 203         PHE 12          39.7
ASN 133         PHE 5           35.9
GLN 151         MET 3           32.0
PHE 193         ASN 30          31.9
SER 136         PHE 5           31.7
PHE 144         SER 31          30.3
TYR 130         GLY 1           30.0
GLN 187         PHE 7           29.9
PHE 192         ASN 30          28.9
GLY 138         PHE 5           28.3
ILE 194         TYR 32          26.7
ASN 209         GLY 1           26.1
PHE 192         GLN 29          25.8
PHE 193         GLN 29          25.4
PHE 193         GLN 27          23.9
ASP 149         GLY 13          22.7
TYR 196         ASN 30          22.6
PHE 192         SER 31          22.2
ASP 148         PHE 12          22.0
GLY 140         PHE 12          21.7
TYR 196         ASP 34          21.6
ARG 198         SER 31          19.9
VAL 139         PHE 7           19.5
PHE 191         ALA 26          18.3
ASN 132         GLY 1           18.1
TYR 195         TYR 32          17.9
GLN 197         ALA 28          17.6
GLN 151         ARG 8           16.9
PHE 191         LEU 22          16.5
PHE 191         GLN 29          15.6
THR 206         PHE 5           14.7
GLN 153         MET 3           14.3
PHE 192         TYR 32          13.8
GLU 201         GLN 29          13.3
ARG 142         SER 25          13.3
PHE 144         ASN 30          12.6
ARG 142         ARG 8           12.6
PHE 191         ASN 11          12.3
GLU 131         THR 2           12.2
ASN 133         GLY 1           11.3
GLY 205         PHE 7           11.2
GLN 151         PHE 12          10.4
ASN 132         PHE 5           10.3
GLU 202         PHE 12          10.2
ASP 149         PHE 7           10.2
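Grouping Table 4 by CsgF residue makes its overlap with the interaction regions of Example 4 easier to see. The Python sketch below groups CsgG partners per CsgF residue above a cut-off. It includes every Table 4 row at or above 60%, plus two lower-frequency rows for contrast; the 60% threshold and the function name are illustrative choices only. The threshold is not the criterion used to choose the cysteine pairs of Example 6, since Q153/G1 is at 42.1%.

```python
from collections import defaultdict

# Table 4 rows as (CsgG residue, CsgF residue, % time spent in contact).
# All rows at or above 60% are included, plus Q153/G1 and N133/G1 for contrast;
# the remaining lower-frequency rows are omitted for brevity.
TABLE4_ROWS = [
    ("GLU 203", "ARG 8", 88.8), ("GLU 201", "ASN 11", 87.4),
    ("GLU 201", "PHE 12", 84.3), ("GLU 203", "ASN 9", 83.6),
    ("ASP 155", "GLY 1", 81.2), ("GLU 203", "PHE 7", 81.0),
    ("GLU 201", "ASN 9", 77.2), ("SER 183", "GLY 1", 76.1),
    ("ASN 209", "MET 3", 70.8), ("THR 207", "PHE 5", 70.1),
    ("ASP 149", "ARG 8", 68.5), ("GLN 187", "ARG 8", 66.1),
    ("ARG 142", "PHE 12", 65.4), ("GLU 185", "ARG 8", 64.4),
    ("ASP 149", "PHE 12", 64.2), ("GLN 187", "GLN 6", 63.3),
    ("GLN 153", "GLY 1", 42.1), ("ASN 133", "GLY 1", 11.3),
]


def partners_by_csgf_residue(rows, min_percent=0.0):
    """Group CsgG partners by CsgF residue, keeping pairs at or above min_percent.

    Each residue's partners are sorted from most to least frequent contact.
    """
    grouped = defaultdict(list)
    for csgg_res, csgf_res, percent in rows:
        if percent >= min_percent:
            grouped[csgf_res].append((csgg_res, percent))
    return {res: sorted(pairs, key=lambda p: p[1], reverse=True)
            for res, pairs in grouped.items()}


frequent = partners_by_csgf_residue(TABLE4_ROWS, min_percent=60.0)
print(frequent["ARG 8"])
print(frequent["GLY 1"])
```

At this cut-off the most frequent partners of CsgF R8 are E203, D149, Q187 and E185, which include the three CsgG residues assigned to IR2 in Example 4. All contacts at or above 60% involve CsgF residues 1-12.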
Materials and Methods for Structural Determination of the CsgG:CsgF Complex:
Cloning
[0456] For the expression of E. coli CsgG as outer membrane
localized pore, the coding sequence of E. coli CsgG (SEQ ID NO:1)
was cloned into pASK-IBA12, resulting in plasmid pPG1 (Goyal et al.
2013).
[0457] For the expression of C-terminally 6x-His tagged CsgF in the
E. coli cytoplasm, the coding sequence for mature E. coli CsgF (SEQ
ID NO:6; i.e. CsgF without its signal sequence) was cloned into
pET22b via the NdeI and EcoRI sites, using a PCR product generated
using the primers "CsgF-His_pET22b_FW" (SEQ ID NO:46) and
"CsgF-His_pET22b_Rev" (SEQ ID NO:47), resulting in the CsgF-His
expression plasmid pNA101.
[0458] The pNA62 plasmid, a pTrc99a-based vector expressing
csgF-His and csgG-strep, was created based on pGV5403 (pTrc99a with
the pDEST14 Gateway® cassette integrated). The pGV5403
ampicillin resistance cassette was replaced by a
streptomycin/spectinomycin resistance cassette. A PCR fragment
encompassing part of the E. coli MC4100 csgDEFG operon,
corresponding to the coding sequences of csgE, csgF and csgG, was
generated with primers csgEFG_pDONR221_FW (SEQ ID NO:48) and
csgEFG_pDONR221_Rev (SEQ ID NO:49) and inserted into pDONR221
(ThermoFisher Scientific) via BP Gateway® recombination. Next,
this recombinant csgEFG operon was transferred from the pDONR221
donor plasmid via LR Gateway® recombination into pGV5403 carrying the
streptomycin/spectinomycin resistance cassette. A 6×His tag was then
added to the CsgF C-terminus by PCR using primers
Mut_csgF_His_FW (SEQ ID NO:50) and Mut_csgF_His_Rev (SEQ ID NO:51).
Finally, csgE was removed by outward PCR (primers DelCsgE_FW (SEQ
ID NO:52) and DelCsgE_Rev (SEQ ID NO:53)) to obtain pNA62.
[0459] Constructs for the periplasmic expression of C-terminally
His-tagged CsgF fragments corresponding to the putative
constriction peptides (FIG. 9A) were created by outward PCR on
pNA62, a pTrc99a-based vector expressing CsgF-His and CsgG-strep.
The primer combinations were as follows: pNa62_CsgF_histag_Fw (SEQ ID
NO:45) as the forward primer, with CsgF_d27_end (SEQ ID NO:41),
CsgF_d38_end (SEQ ID NO:42), CsgF_d48_end (SEQ ID NO:43) or
CsgF_d64_end (SEQ ID NO:44) as the reverse primer to create pNA97,
pNA98, pNA99 and pNA100, respectively.
[0460] In pNA97, csgF is truncated to SEQ ID NO:7, encoding a CsgF
fragment including residues 1-27 (SEQ ID NO:8); in pNA98, csgF is
truncated to SEQ ID NO:9, encoding a CsgF fragment including
residues 1-38 (SEQ ID NO:10); in pNA99, csgF is truncated to SEQ ID
NO:11, encoding a CsgF fragment including residues 1-48 (SEQ ID
NO:12); and in pNA100, csgF is truncated to SEQ ID NO:13, encoding a
CsgF fragment including residues 1-64 (SEQ ID NO:14). Expression of
pNA97, pNA98, pNA99 and pNA100 in E. coli results in production
of the CsgG pore (SEQ ID NO:3) in the outer membrane, as well as
periplasmic targeting of CsgF-derived peptides with the sequences:
[0461] "GTMTFQFRHHHHHH" (SEQ ID NO:37 + 6×His),
"GTMTFQFRNPNFGGNPNNGHHHHHH" (SEQ ID NO:38 + 6×His),
"GTMTFQFRNPNFGGNPNNGAFLLNSAQAQHHHHHH" (SEQ ID NO:39 + 6×His),
and "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETHHHHHH" (SEQ ID
NO:40 + 6×His), respectively.
Strains
[0462] E. coli Top10 (F⁻ mcrA Δ(mrr-hsdRMS-mcrBC)
Φ80lacZΔM15 ΔlacX74 recA1 araD139 Δ(ara-leu)7697
galU galK rpsL (StrR) endA1 nupG) was used for all cloning
procedures. E. coli C43(DE3) (F⁻ ompT hsdSB (rB⁻
mB⁻) gal dcm (DE3)) and Top10 were used for protein
production.
Recombinant CsgG:CsgF Complex Production Via Co-Expression
[0463] For co-expression of E. coli CsgF (SEQ ID NO:5) and CsgG
(SEQ ID NO:2), both recombinant genes, including their native
Shine-Dalgarno sequences, were placed under control of the inducible
trc promoter in a pTrc99a-derived plasmid to form plasmid pNA62. CsgG
and CsgF were overexpressed in E. coli C43(DE3) cells transformed
with plasmid pNA62 and grown at 37 °C in Terrific Broth
medium. When the cell culture reached an optical density (OD) at
600 nm of 0.7, recombinant protein expression was induced with 0.5
mM IPTG, and the cells were left to grow for 15 hours at 28 °C
before being harvested by centrifugation at 5500 g.
Recombinant CsgG:CsgF Complex Production Via In Vitro
Reconstitution
[0464] Full-length E. coli CsgG (SEQ ID NO:2) modified with a
C-terminal StrepII-tag was overexpressed in E. coli BL21(DE3)
cells transformed with plasmid pPG1 (Goyal et al. 2013). The cells
were grown at 37 °C to an OD600 of 0.6 in Terrific
Broth medium. Recombinant protein production was induced with
0.0002% anhydrotetracycline (Sigma), and the cells were grown at
25 °C for a further 16 h before being harvested by
centrifugation at 5500 g.
[0465] E. coli CsgF (SEQ ID NO:6; i.e. lacking the CsgF signal
sequence) fused to a C-terminal 6×His tag was
overexpressed in the cytoplasm of E. coli BL21(DE3) cells
transformed with plasmid pNA101. Cells were grown at 37 °C
to an OD of 600 nm, followed by induction with 1 mM IPTG, and left
to express protein for 15 h at 37 °C before being harvested by
centrifugation at 5500 g.
Recombinant Protein Purification of the CsgG:CsgF Complex, CsgG,
and CsgF
[0466] E. coli cells transformed with pNA62 and co-expressing
CsgG-Strep and CsgF-His were resuspended in 50 mM Tris-HCl pH 8.0,
200 mM NaCl, 1 mM EDTA, 5 mM MgCl₂, 0.4 mM AEBSF, 1 µg/mL
Leupeptin, 0.5 mg/mL DNase I and 0.1 mg/mL lysozyme. The cells were
disrupted at 20 kpsi using a TS series cell disruptor (Constant
Systems Ltd), and the lysed cell suspension was incubated for 30 min
with 1% n-dodecyl-β-D-maltopyranoside (DDM; Inalco) for further cell
lysis and extraction of outer membrane components. Next, remaining
cell debris and membranes were spun down by ultracentrifugation at
100,000 g for 40 min. The supernatant was loaded onto a 5 mL HisTrap
column equilibrated in buffer A (25 mM Tris pH 8, 200 mM NaCl, 10 mM
imidazole, 10% sucrose and 0.06% DDM). The column was washed with
>10 column volumes (CVs) of 5% buffer B (25 mM Tris pH 8, 200 mM
NaCl, 500 mM imidazole, 10% sucrose and 0.06% DDM) in buffer A and
eluted with a gradient of 5-100% buffer B over 60 mL.
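The imidazole concentration seen by the column follows directly from the two buffer compositions. A minimal Python sketch, assuming the gradient is linear in volume (the text gives only its end points and the 60 mL length), computes the wash and gradient concentrations. The function names are illustrative.

```python
# Imidazole concentrations of the two HisTrap buffers described above (mM).
IMIDAZOLE_BUFFER_A_MM = 10.0
IMIDAZOLE_BUFFER_B_MM = 500.0


def imidazole_mm(fraction_b: float) -> float:
    """Imidazole (mM) in a mixture containing fraction_b (0-1) of buffer B."""
    if not 0.0 <= fraction_b <= 1.0:
        raise ValueError("fraction_b must be between 0 and 1")
    return (1.0 - fraction_b) * IMIDAZOLE_BUFFER_A_MM + fraction_b * IMIDAZOLE_BUFFER_B_MM


def gradient_fraction_b(volume_ml: float, start: float = 0.05,
                        end: float = 1.0, gradient_ml: float = 60.0) -> float:
    """Fraction of buffer B after volume_ml of the gradient (assumed linear)."""
    v = min(max(volume_ml, 0.0), gradient_ml)  # clamp to the gradient volume
    return start + (end - start) * v / gradient_ml


print(round(imidazole_mm(0.05), 2))                       # 34.5 (5% B wash)
print(round(imidazole_mm(gradient_fraction_b(30.0)), 2))  # 267.25 (gradient midpoint)
```

The 5% B wash therefore contains about 34.5 mM imidazole, and for the 5 mL column, ">10 CVs" corresponds to more than 50 mL of wash buffer.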
[0467] The eluate was diluted 2-fold before being loaded overnight
onto a 5 mL Strep-Tactin column (IBA GmbH) equilibrated with buffer C
(25 mM Tris pH 8, 200 mM NaCl, 10% sucrose and 0.06% DDM). The column
was washed with >10 CVs of buffer C, and protein was eluted by the
addition of 2.5 mM desthiobiotin. Next, 500 µL of the peak
fraction of the double-affinity purified complex was injected onto a
Superose 6 10/30 column (GE Healthcare) equilibrated with Buffer D
(25 mM Tris pH 8, 200 mM NaCl and 0.03% DDM) and run at 0.5 mL/min to
prepare samples for electron microscopy. Protein concentration was
determined from the calculated absorbance at 280 nm, assuming
1:1 stoichiometry.
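Determining the concentration of the complex "from the calculated absorbance at 280 nm, assuming 1:1 stoichiometry" amounts to applying the Beer-Lambert law with the summed extinction coefficients of one CsgG and one CsgF chain. The Python sketch below shows that arithmetic. The extinction coefficients are placeholders, not the real values for CsgG or CsgF, and the conversion to pores assumes the 9:9 stoichiometry reported for the cryo-EM structure.

```python
def molar_concentration(a280: float, epsilon_m_cm: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    if epsilon_m_cm <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon_m_cm * path_cm)


def csgg_csgf_concentrations_um(a280: float, eps_csgg: float, eps_csgf: float,
                                path_cm: float = 1.0, subunits_per_pore: int = 9):
    """Return (CsgG:CsgF protomer pairs, nonameric pores) in micromolar.

    Assumes one CsgF per CsgG (1:1 stoichiometry), so the extinction
    coefficients of the two chains add; subunits_per_pore=9 follows the
    9:9 stoichiometry of the cryo-EM structure.
    """
    pair_um = molar_concentration(a280, eps_csgg + eps_csgf, path_cm) * 1e6
    return pair_um, pair_um / subunits_per_pore


# Placeholder extinction coefficients (M^-1 cm^-1), NOT the real CsgG/CsgF values.
pairs_um, pores_um = csgg_csgf_concentrations_um(1.0, eps_csgg=40000.0, eps_csgf=10000.0)
print(round(pairs_um, 2), round(pores_um, 2))
```

In practice the two coefficients would be calculated from the CsgG and CsgF sequences actually used, including their tags.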
[0468] CsgG-strep purification for in vitro reconstitution is
identical to the protocol for CsgG:CsgF, except that sucrose is
omitted from the buffers and the IMAC and size exclusion steps are
bypassed.
[0469] CsgF-His purification for in vitro reconstitution was
performed by resuspending the cell mass in 50 mM Tris-HCl pH
8.0, 200 mM NaCl, 1 mM EDTA, 5 mM MgCl₂, 0.4 mM AEBSF, 1
µg/mL Leupeptin, 0.5 mg/mL DNase I and 0.1 mg/mL lysozyme. The
cells were disrupted at 20 kpsi using a TS series cell disruptor
(Constant Systems Ltd), and the lysed cell suspension was
centrifuged at 10,000 g for 30 min to remove intact cells and cell
debris. The supernatant was added to 5 mL Ni-IMAC beads (Workbeads 40
IDA, Bio-Works Technologies AB) equilibrated with buffer A (25 mM
Tris pH 8, 200 mM NaCl, 10 mM imidazole) and incubated for 1
hour at 4 °C. The Ni-IMAC beads were pooled in a gravity flow
column and washed with 100 mL of 5% buffer B (25 mM Tris pH 8, 200
mM NaCl, 500 mM imidazole) diluted in buffer A. Bound protein was
eluted by a stepwise increase of buffer B (10% steps of 5
mL each).
In Vitro Reconstitution of the CsgG:CsgF Complex
[0470] Purified CsgG and CsgF were pooled and used to reconstitute
the complex in vitro. To this end, CsgG and CsgF were mixed at a
molar ratio of 1 CsgG:2 CsgF to saturate the CsgG barrel with CsgF.
Next, the reconstituted mixture was injected onto a Superose 6 10/30
column (GE Healthcare) equilibrated with Buffer D (25 mM Tris pH 8,
200 mM NaCl and 0.03% DDM) and run at 0.5 mL/min to prepare samples
for electron microscopy (FIG. 3). Protein concentration was
determined from the calculated absorbance at 280 nm, assuming 1:1
stoichiometry.
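Setting up the 1 CsgG:2 CsgF mixture is a simple molar calculation. The text does not say whether the ratio refers to CsgG monomers or nonameric pores. The Python sketch below assumes monomer concentrations, so that each CsgG binding site sees a twofold excess of CsgF. All concentrations and volumes in the example are hypothetical.

```python
def csgf_volume_ul(csgg_um: float, csgg_volume_ul: float, csgf_stock_um: float,
                   csgf_per_csgg: float = 2.0) -> float:
    """Volume of CsgF stock (uL) giving csgf_per_csgg CsgF molecules per CsgG.

    Uses uM x uL = pmol, so the CsgG amount in pmol is csgg_um * csgg_volume_ul.
    Concentrations are assumed to be per monomer (not per nonameric pore).
    """
    if csgf_stock_um <= 0:
        raise ValueError("CsgF stock concentration must be positive")
    csgg_pmol = csgg_um * csgg_volume_ul
    return csgg_pmol * csgf_per_csgg / csgf_stock_um


# Hypothetical example: 100 uL of 10 uM CsgG monomer and a 50 uM CsgF stock.
print(csgf_volume_ul(10.0, 100.0, 50.0))  # 40.0
```

If concentrations were instead expressed per nonameric pore, the CsgG term would need to be multiplied by nine before applying the ratio.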
Structural Analysis Using Electron Microscopy
[0471] Sample behavior of the size exclusion fraction was probed
using negative stain electron microscopy. Samples were stained with
1% uranyl formate and imaged using an in-house 120 kV JEM 1400
(JEOL) microscope equipped with a LaB6 filament. Samples for
electron cryomicroscopy were prepared by spotting 2 µL sample
onto R2/1 grids (Quantifoil) coated with continuous carbon (2 nm),
manually blotted and plunged into liquid ethane using an in-house
plunging device. Sample quality was screened on the in-house JEOL
JEM 1400 before collecting a dataset on a 200 kV TALOS ARCTICA
(FEI) microscope equipped with a Falcon-3 direct electron detection
camera. Images were motion corrected with MotionCor2.1 (Zheng et
al. 2017), defocus values were determined using ctffind4 (Rohou and
Grigorieff, 2015) and data were further analysed using a combination
of RELION (Scheres, 2012) and EMAN2 (Ludtke, 2016). C9 symmetry was
imposed during 3D model generation and refinement on selected 2D
class averages featuring additional density for a head group.
[0472] For high-resolution cryo-EM analysis, CsgG:CsgF samples were
prepared for electron cryo-microscopy by spotting 3 µL sample on
R2/1 Holey grids (Quantifoil) coated with graphene oxide (Sigma
Aldrich), manually blotted and plunged into liquid ethane using a CP3
plunger (Gatan). Sample quality was screened on the in-house JEOL
JEM 1400 before collecting a dataset on a 300 kV TITAN KRIOS (FEI,
Thermo Scientific) microscope equipped with a K2 Summit direct
electron detector (Gatan). The detector was used in counting mode with
a cumulative electron dose of 56 electrons per Å² spread
over 50 frames. 2045 images were collected with a pixel size of
1.07 Å. Images were motion-corrected with MotionCor2.1 (Zheng
et al. 2017) and defocus values were determined using ctffind4
(Rohou and Grigorieff, 2015). Particles were picked automatically
using Gautomatch (Dr. Kai Zhang), and data were further analysed
using a combination of RELION2.0 (Kimanius et al. 2016, eLife 5:
e18722) and EMAN2 (Ludtke, 2016). C9 symmetry was imposed
during 3D model generation and refinement on selected 2D class
averages featuring additional density for the head group
corresponding to CsgF. 62,000 particles were used to calculate the
final map at 3.4 Å resolution. De novo model building of CsgF
was done with COOT (Brown et al. 2015, Acta Crystallogr D Biol
Crystallogr 71(Pt 1):136-53), and iterative cycles of model building
and refinement of the full complex were done with PHENIX (Afonine et
al. 2018, Acta Crystallogr D Struct Biol 74(Pt 6):531-544) real-space
refinement in combination with COOT.
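Two standard quantities follow from the acquisition parameters above: the dose per movie frame and the Nyquist limit set by the pixel size. The Python sketch below computes both. It assumes the dose was spread evenly over the 50 frames, which the text implies but does not state explicitly. The function names are illustrative.

```python
def dose_per_frame(total_dose_e_per_a2: float, n_frames: int) -> float:
    """Electron dose per movie frame (e-/A^2), assuming an even spread over frames."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return total_dose_e_per_a2 / n_frames


def nyquist_limit_a(pixel_size_a: float) -> float:
    """Finest resolution (Angstrom) that can be sampled: twice the pixel size."""
    return 2.0 * pixel_size_a


print(dose_per_frame(56.0, 50))  # 1.12 e-/A^2 per frame
print(nyquist_limit_a(1.07))     # 2.14 A
```

The reported 3.4 Å map is therefore well within the 2.14 Å sampling limit of the 1.07 Å pixel size.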
Protein Expression and Purification of CsgG:CsgF Fragments
[0473] CsgF fragments and CsgG were co-expressed, with the CsgF
fragments C-terminally His-tagged and CsgG fused C-terminally
to a Strep tag. The CsgG:CsgF-fragment complexes were over-expressed
in E. coli Top10 cells transformed with plasmid pNA97, pNA98,
pNA99 or pNA100. Plates were grown at 37 °C overnight, and a
colony was resuspended in LB medium supplemented with
streptomycin/spectinomycin. When the cell cultures reached an optical
density (OD) at 600 nm of 0.7, recombinant protein expression was
induced with 0.5 mM IPTG, and the cells were left to grow for 15
hours at 28 °C before being harvested by centrifugation at 5500 g.
Pellets were frozen at -20 °C.
[0474] Cell mass for the various CsgG:CsgF fragment co-expressions
was resuspended in 200 mL of 50 mM Tris-HCl pH 8.0, 200 mM NaCl, 1 mM
EDTA, 5 mM MgCl₂, 0.4 mM AEBSF, 1 µg/mL Leupeptin, 0.5
mg/mL DNase I and 0.1 mg/mL lysozyme, sonicated and incubated with
1% n-dodecyl-β-D-maltopyranoside (DDM; Inalco) for further
cell lysis and extraction of outer membrane components. Next,
remaining cell debris and membranes were spun down by
centrifugation at 15,000 g for 40 min. The supernatant was incubated
with 100 µL Strep-Tactin beads at RT for 30 min. The Strep beads
were washed with buffer (25 mM Tris pH 8, 200 mM NaCl and 1% DDM)
by centrifugation, and bound proteins were eluted by the addition of
2.5 mM desthiobiotin in 25 mM Tris pH 8, 200 mM NaCl, 0.01% DDM.
[0475] Production of CsgG:FCP by in vitro reconstitution.
[0476] A synthetic peptide corresponding to the N-terminal 34
residues of mature CsgF (SEQ ID NO: 6) was diluted to 1 mg/mL in
0.1 M MES, 0.5 M NaCl, 0.4 mg/mL EDC
(1-ethyl-3-(3-dimethylaminopropyl)carbodiimide), 0.6 mg/mL NHS
(N-hydroxysuccinimide) and incubated for 15 min at room temperature
to allow activation of the peptide C-terminus. Next, 1 mg/mL
cadaverine-Alexa594 in PBS was added, and the mixture was incubated
for 2 h at room temperature to allow covalent coupling. The reaction
was quenched by buffer exchange into 50 mM Tris, NaCl, 1 mM EDTA,
0.1% DDM using Zeba Spin filters.
[0477] Labelled peptide was added to strep-affinity purified CsgG
in 50 mM Tris, 100 mM NaCl, 1 mM EDTA, 5 mM LDAO/C8D4 at a 2:1
molar ratio for 15 minutes at room temperature to allow
reconstitution of the CsgG:FCP complex. After pull-down of
CsgG-strep on StrepTactin beads, the sample was analysed on
native-PAGE.
Example 6: Further Stabilization of CsgG:CsgF Complex by Covalent
Cross Linking
[0478] Although full-length CsgF and some of the truncated versions
of CsgF form stable CsgG:CsgF complexes with the CsgG pore, CsgF can
still be dislodged from the barrel region of the CsgG pore under
certain conditions. It is therefore desirable to form a covalent
link between the CsgG and CsgF subunits. Based on molecular
simulation studies, positions of CsgG and CsgF that are in close
proximity to each other have been identified (Example 5 and Table
4). Some of these identified positions have been modified to
incorporate a cysteine in both CsgG and CsgF. FIG. 19 shows an
example of disulfide (S-S) bond formation between position Q153 of
CsgG and position G1 of CsgF. A CsgG pore containing the Q153C
mutation was reconstituted with CsgF containing the G1C mutation and
incubated for 1 hour to enable S-S bond formation. When the complex
is heated to 100 °C in the absence of DTT, a 45 kDa band
corresponding to a dimer of a CsgG monomer and a CsgF monomer
(CsgGm-CsgFm) can be seen, indicating S-S bond formation between the
two monomers (CsgGm is 30 kDa and CsgFm is 15 kDa) (FIG. 19.A). This
band disappears when the heating is done in the presence of DTT,
which reduces the S-S bond. When the CsgG:CsgF complex is incubated
overnight instead of for 1 hour, the extent of CsgGm-CsgFm dimer
formation increases (FIG. 19.A). Mass spectrometry was used to
further identify the dimer band: gel-purified protein was
proteolytically cleaved to generate tryptic peptides, and LC-MS/MS
sequencing identified an S-S bond between position Q153 of CsgG and
position G1 of CsgF (FIG. 19.B). Oxidising agents such as
copper-orthophenanthroline can be used to enhance S-S bond
formation. When a CsgG pore containing the N133C modification is
reconstituted with CsgF containing the T4C modification in the
presence of copper-orthophenanthroline as described in the methods
section, and then broken down into its constituent monomers by
heating to 100 °C in the absence of DTT, a strong dimer band
corresponding to CsgGm-CsgFm can be observed on SDS-PAGE (FIG. 20,
lanes 3 and 4). When the heating is carried out in the presence of
DTT, the dimer breaks down into its constituent monomers (FIG. 20,
lanes 1 and 2).
Example 7: Electrophysiological Characterisation of CsgG:CsgF
Complexes
[0479] The signal observed when a DNA strand translocates through
CsgG is well characterised when the pore is inserted in a
copolymer membrane and experiments are carried out using the MinION
of Oxford Nanopore Technologies (FIG. 31). Y51, N55 and F56 of each
subunit of CsgG form the constriction of the CsgG pore (FIG. 12).
This sharp constriction serves as the reader head of the CsgG pore
(FIG. 31A) and is able to accurately discriminate a mixed sequence
of A, C, G and T as it passes through the pore, because the
measured signal contains characteristic current deflections from
which the identity of the sequence can be derived. However, in
homopolymeric regions of DNA, the measured signal may not show
current deflections of sufficient magnitude to allow single-base
identification, so that the length of a homopolymer cannot be
accurately determined from the magnitude of the measured
signal alone (FIGS. 26B and C). The loss of accuracy of the
CsgG reader head correlates with the length of the homopolymeric
region (FIG. 29C). When CsgF interacts with the CsgG pore to form
the CsgG:CsgF complex, CsgF introduces a second reader head within
the CsgG barrel. This second reader head consists primarily of
position N17 of SEQ ID NO: 6. A static strand experiment as
described in the methods section and FIG. 27 was carried out to map
the two reader heads of the CsgG:CsgF complex experimentally, and the
results indicate the presence of two reader heads separated from
each other by approximately 5-6 bases (FIGS. 27B,
C and D). The reader head discrimination plot for the CsgG:CsgF
complex shows that the second reader head introduced by CsgF
contributes less to base discrimination than the CsgG reader head
(FIG. 27A). Surprisingly, when a second reader head is introduced
by CsgF within the CsgG barrel, the homopolymeric region that was
previously flat shows a stepwise signal (FIGS. 30B and C). These
steps contain information that can be used to identify the sequence
accurately, resulting in fewer errors. The accuracy of the DNA
signal of the CsgG:CsgF complex remains relatively constant over
longer homopolymer lengths compared with the accuracy profile of the
CsgG pore alone (FIG. 29C).
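Why a second reader head can turn a flat homopolymer segment into a stepwise one can be illustrated with a deliberately simplified toy model. It is not the analysis used for FIGS. 27-30. The sketch assumes:

- each head senses a single base;
- the two contributions simply add, with the CsgF head weighted lower, as FIG. 27A indicates;
- the second head reads the base a fixed number of positions earlier, which is an arbitrary choice of translocation direction.

The per-base levels, the weight and the test sequence are invented; only the roughly 5-base spacing comes from the text.

```python
from itertools import groupby

# Hypothetical per-base contributions (arbitrary units), chosen only so that
# different bases give different levels; these are NOT measured values.
LEVELS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def toy_signal(seq: str, offset: int, weight: float) -> list:
    """Toy current trace with one value per translocation step.

    The primary (CsgG) head reads seq[i]; an assumed second (CsgF) head reads
    the base `offset` positions earlier, scaled by `weight` (weight=0 mimics a
    CsgG-only pore). Values are rounded so that equal levels compare equal.
    """
    return [round(LEVELS[seq[i]] + weight * LEVELS[seq[i - offset]], 6)
            for i in range(offset, len(seq))]


def run_lengths(values: list) -> list:
    """Lengths of consecutive runs of identical values (flat segments)."""
    return [len(list(group)) for _, group in groupby(values)]


PREFIX, HOMOPOLYMER, SUFFIX = "GCTGC", "A" * 8, "CGTCG"
SEQ = PREFIX + HOMOPOLYMER + SUFFIX
OFFSET = 5  # the text reports the two reader heads ~5-6 bases apart


def homopolymer_runs(weight: float) -> list:
    """Flat segments seen while the primary head is inside the homopolymer."""
    signal = toy_signal(SEQ, OFFSET, weight)
    first = len(PREFIX) - OFFSET  # signal[j] is primary-head position j + OFFSET
    return run_lengths(signal[first:first + len(HOMOPOLYMER)])


print(homopolymer_runs(0.0))  # single head: [8], one flat level
print(homopolymer_runs(0.3))  # two heads: [1, 1, 1, 1, 1, 3]
```

In this toy model the single-head trace stays on one level for all 8 homopolymer bases. With the second head, the same stretch shows several distinct levels. The trailing run of 3 corresponds to the steps where both heads sit inside the homopolymer, so a homopolymer longer than the head spacing still yields a shorter flat segment.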
[0480] CsgG:CsgF complexes made by any of the methods described in
the methods section can be characterised in DNA
sequencing experiments. Signals of a lambda DNA strand passing
through various CsgG:CsgF complexes, made by different methods and
consisting of different CsgG mutant pores and CsgF peptides of
different lengths, are shown in FIGS. 21-24. The reader
head discrimination of these pore complexes and their base
contribution profiles are shown in FIG. 28 (A-H). Surprisingly,
different modifications at the constrictions of both the CsgG pore
and the CsgF peptide can alter the signal of the CsgG:CsgF pore
complex significantly. For example, when CsgG:CsgF complexes are
made with the same CsgG pore but with two different CsgF peptides of
the same length containing either Asn or Ser at position 17 (of SEQ
ID NO: 6) (made by the same method of co-expression of the
full-length CsgF protein followed by TEV protease cleavage of CsgF
between positions 35 and 36), the signals generated differ
from each other (FIG. 21). The CsgG:CsgF complex with Ser at
position 17 of the CsgF peptide shows lower noise and a higher
signal-to-noise ratio than the CsgG:CsgF complex with Asn at
position 17. Similarly, when the same CsgG pore
was reconstituted with two different CsgF peptides of the same
length (1-35 of SEQ ID NO: 6) but with either Ser or Val at position
17, the complex with Val at position 17 of CsgF shows a noisier
signal than the complex with Ser at position 17 (FIG. 22). When the
same CsgF peptide was reconstituted with different CsgG pores
containing different mutations at the CsgG reader head (positions
51, 55 and 56), the resulting CsgG:CsgF complexes showed very
different signals (FIG. 23, A-F) with different signal-to-noise
ratios (FIG. 25). Surprisingly, when CsgF peptides of different
lengths containing the same constriction region were
reconstituted with the same CsgG pore, the resulting CsgG:CsgF
complexes gave signals with different ranges (FIG. 24). The CsgG:CsgF
complex containing the shortest CsgF peptide (1-29 of SEQ ID
NO: 6) showed the largest range, and the CsgG:CsgF complex
containing the longest CsgF peptide (1-45 of SEQ ID NO: 6) showed
the smallest range (FIG. 24).
[0481] Materials and Methods for characterisation of analytes: The
proteins produced by the methods described below can be used
interchangeably with those produced by the methods described above
with respect to structural determination.
[0482] Methods: Expression of the CsgG:CsgF or CsgG:FCP complex by
co-expression. Genes encoding the CsgG protein and its mutants are
constructed in the pT7 vector, which contains an ampicillin resistance
gene. Genes encoding the CsgF or FCP proteins and their mutants are
constructed in the pRham vector, which contains a kanamycin resistance
gene. 1 µL of each plasmid is mixed with 50 µL of
Lemo(DE3)ΔCsgEFG cells for 10 minutes on ice. The sample is then
heated at 42 °C for 45 seconds before being returned to ice
for another 5 minutes. 150 µL of NEB SOC outgrowth medium is added
and the sample is incubated at 37 °C with shaking at 250
rpm for 1 hour. The entire volume is spread onto an agar plate
containing kanamycin (40 µg/mL), ampicillin (100 µg/mL) and
chloramphenicol (34 µg/mL) and incubated overnight at 37 °C.
A single colony is taken from the plate and inoculated into 100 mL of
LB medium containing kanamycin (40 µg/mL), ampicillin (100 µg/mL)
and chloramphenicol (34 µg/mL), and incubated overnight at
37 °C with shaking at 250 rpm. 25 mL of the starter culture
is added to 500 mL of LB medium containing 3 mM ATP, 15 mM
MgSO4, kanamycin (40 µg/mL), ampicillin (100 µg/mL) and
chloramphenicol (34 µg/mL) and incubated at 37 °C.
The culture was allowed to grow for 7 hours, at which point the
OD600 was greater than 3.0. Lactose (1.0% final
concentration), glucose (0.2% final concentration) and rhamnose (2
mM final concentration) were added and the temperature was dropped to
18 °C whilst shaking was maintained at 250 rpm for 16 hours.
The culture was centrifuged at 6000 rpm for 20 min at 4 °C. The
supernatant was discarded and the pellet kept. Cells were stored at
-80 °C until purification.
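The antibiotic additions in this protocol follow C1V1 = C2V2. As a minimal sketch, the Python below computes how much of each antibiotic stock to add to a culture of a given volume. The stock concentrations in `ASSUMED_STOCK_MG_PER_ML` are hypothetical values that the protocol does not state, so substitute the stocks actually in use. Setting `include_kanamycin=False` corresponds to the single-vector pT7 variant described next, in which kanamycin is omitted.

```python
# Working concentrations (µg/mL) taken from the co-expression protocol.
WORKING_UG_PER_ML = {"kanamycin": 40.0, "ampicillin": 100.0, "chloramphenicol": 34.0}

# ASSUMPTION: hypothetical stock concentrations (mg/mL); not given in the protocol.
ASSUMED_STOCK_MG_PER_ML = {"kanamycin": 50.0, "ampicillin": 100.0, "chloramphenicol": 34.0}


def antibiotic_stock_volumes_ul(culture_ml, stocks_mg_per_ml=None, include_kanamycin=True):
    """Return the µL of each antibiotic stock needed for `culture_ml` mL of medium.

    C1*V1 = C2*V2: (µg/mL) * mL gives µg, and 1 mg/mL equals 1 µg/µL,
    so dividing µg by the stock in mg/mL gives µL directly.
    """
    stocks = stocks_mg_per_ml or ASSUMED_STOCK_MG_PER_ML
    volumes = {}
    for name, working in WORKING_UG_PER_ML.items():
        if name == "kanamycin" and not include_kanamycin:
            continue  # single pT7 plasmid: no kanamycin selection
        volumes[name] = working * culture_ml / stocks[name]
    return volumes


if __name__ == "__main__":
    # Main 500 mL expression culture.
    for name, ul in antibiotic_stock_volumes_ul(500.0).items():
        print(f"{name}: {ul:.0f} uL per 500 mL")
```

With these assumed stocks, 500 mL of medium needs 400 µL of kanamycin stock and 500 µL each of ampicillin and chloramphenicol stock.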
Expression of the CsgG Pore with or without a C-Term Strep Tag and
CsgF with or without a C-Term Strep or His Tag
[0483] All genes encoding the CsgG proteins and the CsgF or FCP
proteins are constructed in the pT7 vector, which contains an
ampicillin resistance gene. The expression procedure is the same as above,
except that kanamycin is omitted from all media and
buffers.
Cell Lysis (Co-Expressed Complex or Individual CsgG/CsgF/FCP
Proteins)
[0484] The lysis buffer is made of 50 mM Tris, pH 8.0, 150 mM NaCl,
0.1% DDM, 1× BugBuster Protein Extraction Reagent (Merck),
2.5 µL Benzonase Nuclease (stock 250 units/µL) per 100 mL of lysis
buffer and 1 tablet of Sigma protease inhibitor cocktail per 100 mL of
lysis buffer. A 5× volume of lysis buffer is used to lyse
a 1× weight of harvested cells. Cells are resuspended and left to
spin at room temperature for 4 hours until a homogeneous lysate is
produced. The lysate is spun at 20,000 rpm for 35 minutes at 4 °C.
The supernatant is carefully extracted and filtered through a
0.2 µm Acrodisc syringe filter.
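The lysis recipe is specified per 100 mL of buffer and per unit weight of cells, so it has to be scaled for each pellet. The sketch below assumes that "5× volume per 1× weight" means 5 mL of buffer per 1 g of wet cells. That reading, and the helper name `lysis_buffer_plan`, are illustrative assumptions rather than part of the protocol.

```python
def lysis_buffer_plan(cell_weight_g):
    """Scale the lysis-buffer additives to a given wet cell weight.

    ASSUMPTION: '5x volume per 1x weight' is read as 5 mL of buffer per 1 g of cells.
    """
    buffer_ml = 5.0 * cell_weight_g
    per_100_ml = buffer_ml / 100.0        # recipe additives are quoted per 100 mL
    benzonase_ul = 2.5 * per_100_ml       # 2.5 µL Benzonase per 100 mL
    return {
        "buffer_ml": buffer_ml,
        "benzonase_ul": benzonase_ul,
        "benzonase_units": benzonase_ul * 250.0,  # stock is 250 units/µL
        "inhibitor_tablets": per_100_ml,          # 1 tablet per 100 mL (may be fractional)
    }


if __name__ == "__main__":
    for key, value in lysis_buffer_plan(4.0).items():
        print(f"{key}: {value:g}")
```

For a 4 g pellet this gives 20 mL of buffer, 0.5 µL (125 units) of Benzonase and 0.2 of a protease-inhibitor tablet. A fractional tablet simply indicates that the additive is dosed per 100 mL batch.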
Strep Purification of the CsgG or CsgF/FCP Proteins or Co-Expressed
Complex if the CsgG Contains a C-Term Strep Tag and CsgF or FCP
Contains a C-Term His Tag
[0485] The filtered sample was then loaded onto a 5 mL StrepTrap
column with the following parameters: loading speed: 0.8 mL/min;
complete sample loading: 10 mL; wash out unbound: 10 CV (5 mL/min);
extra wash: 10 CV (5 mL/min); elution: 3 CV (5 mL/min). Affinity
buffer: 50 mM Tris, pH 8.0, 150 mM NaCl, 0.1% DDM; wash buffer: 50
mM Tris, pH 8.0, 2 M NaCl, 0.1% DDM; elution buffer: 50 mM Tris,
pH 8.0, 150 mM NaCl, 0.1% DDM, 10 mM desthiobiotin. The eluted sample is
collected.
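Because the column steps are given in column volumes (CV) and flow rates, the run time of each step follows directly. The sketch below covers the 5 mL column and assumes no system dead volume and no hold steps. The step list and function names are illustrative. The HisTrap purification reuses the same parameters with different buffers, so the same calculation applies there.

```python
COLUMN_VOLUME_ML = 5.0  # 5 mL StrepTrap (the HisTrap step uses the same parameters)


def strep_steps(sample_ml=10.0, cv_ml=COLUMN_VOLUME_ML):
    """(step name, volume in mL, flow rate in mL/min) for the parameters listed above."""
    return [
        ("load sample", sample_ml, 0.8),
        ("wash out unbound", 10 * cv_ml, 5.0),
        ("extra wash", 10 * cv_ml, 5.0),
        ("elution", 3 * cv_ml, 5.0),
    ]


def run_summary(steps):
    """Return per-step durations (min) and total run time, ignoring dead volume."""
    durations = {name: volume / flow for name, volume, flow in steps}
    return durations, sum(durations.values())


if __name__ == "__main__":
    durations, total = run_summary(strep_steps())
    for name, minutes in durations.items():
        print(f"{name}: {minutes:.1f} min")
    print(f"total: {total:.1f} min")
```

With the default 10 mL sample, loading takes 12.5 min, each 50 mL wash takes 10 min, and the 15 mL elution takes 3 min, for about 35.5 min in total.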
His Purification of the CsgG or CsgF/FCP Proteins or Co-Expressed
Complex if the CsgG Contains a C-Term Strep Tag and CsgF or FCP
Contains a C-Term His Tag
[0486] The filtered sample, or the pooled eluted peaks from Strep
purification (in the case of the complex), is loaded onto a 5 mL HisTrap
column using the same parameters as above, except with the
following buffers: affinity and wash buffer: 50 mM Tris, pH 8.0,
150 mM NaCl, 0.1% DDM, 25 mM imidazole; elution buffer: 50 mM Tris, pH
8.0, 150 mM NaCl, 0.1% DDM, 350 mM imidazole. The eluted peak is
concentrated in a 30 kDa MWCO Merck Millipore centrifugal unit to a
volume of 500 µL.
Formation of the Complex In Vitro with In Vivo Purified
Components.
[0487] The CsgG and CsgF/FCP proteins, expressed and
purified separately, are mixed in various ratios, always with CsgF in
excess, to identify the correct ratio. The
complex was then incubated overnight at 25 °C. To remove the
excess CsgF and the DTT from the buffer, the mixture was again
injected onto a Superdex 200 Increase 10/300 column equilibrated in 50
mM Tris, pH 8.0, 150 mM NaCl, 0.1% DDM. The complex usually elutes
between 9 and 10 mL on this column.
Polishing Step with Gel Filtration for the Complex (Co-Expressed or
Made In Vitro)
[0488] If necessary, Strep-purified, His-purified, or His- followed
by Strep-purified CsgG:CsgF or CsgG:FCP can be subjected to a
further polishing step by gel filtration. 500 µL of the sample was
injected into a 1 mL sample loop and onto the Superdex 200 Increase
10/300 column equilibrated in 50 mM Tris, pH 8.0, 150 mM NaCl, 0.1% DDM.
The peak associated with the complex usually elutes between 9 and 10
mL on this column when run at 1 mL/min. The sample was heated at
60 °C for 15 minutes and centrifuged at 21,000 rcf for 10
min. The supernatant was taken for testing. Samples were subjected to
SDS-PAGE to confirm and identify fractions eluted with the
complex.
Cleavage of CsgF or FCP at the TEV Protease Site
[0489] If the CsgF or FCP contains a TEV cleavage site,
TEV protease with a C-terminal histidine tag is added to the sample
(the amount added is based on the rough concentration of the
protein complex) together with 2 mM DTT. The sample is incubated overnight at
4 °C on a roller mixer at 25 rpm. The mixture is then run
back through a 5 mL HisTrap column and the flow-through is
collected. Uncleaved protein (and the His-tagged TEV protease) remains
bound to the column, while the cleaved protein is recovered in the
flow-through. The same buffers, parameters and final heating step are
used as in the His purification described above.
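To show which part of a protease-site construct ends up in the HisTrap flow-through, the sketch below inserts the TEV site ENLYFQS between mature CsgF residues 35 and 36 and then simulates complete cleavage. TEV cleaves after the Q of ENLYFQ/S(G), and HRV 3C cleaves LEVLFQ/GP. The construct is loosely modelled on SEQ ID NO: 61 but omits its signal sequence, and the helper names are hypothetical. This is a string-level illustration, not a model of cleavage efficiency.

```python
import re

# Mature E. coli CsgF (SEQ ID NO: 6), mature numbering 1-119.
CSGF_MATURE = (
    "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSALDNFTQAIQSQILGGLLSNIN"
    "TGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTDF"
)

# Each pattern matches up to the scissile bond; the lookahead checks the P1' residues
# without consuming them, so they stay on the C-terminal fragment.
PROTEASE_SITES = {
    "TEV": re.compile(r"ENLYFQ(?=[SG])"),   # ENLYFQ/S or ENLYFQ/G
    "HRV 3C": re.compile(r"LEVLFQ(?=GP)"),  # LEVLFQ/GP
}


def insert_site(seq, after_residue, site):
    """Insert `site` after 1-based residue `after_residue` (35 -> between 35 and 36)."""
    return seq[:after_residue] + site + seq[after_residue:]


def cleave(seq, protease="TEV"):
    """Return the fragments produced by complete cleavage at every recognition site."""
    fragments, start = [], 0
    for match in PROTEASE_SITES[protease].finditer(seq):
        fragments.append(seq[start:match.end()])
        start = match.end()
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    # Hypothetical construct: TEV site after mature residue 35 plus a C-terminal His10 tag.
    construct = insert_site(CSGF_MATURE, 35, "ENLYFQS") + "H" * 10
    retained, released = cleave(construct)
    print("stays with the pore (flow-through):", retained)
    print("His-tagged fragment (binds HisTrap):", released[:10], "...")
```

The N-terminal fragment (residues 1-35 plus ENLYFQ) carries no His tag, which is why it passes through the column while the released His10-tagged tail is retained.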
[0490] Purifying the CsgG:FCP complex with in vivo purified CsgG
pore and synthetic FCP: Lyophilised FCP peptides were received from
GenScript and LifeTein. 1 mg of peptide was dissolved in 1 mL of
nuclease-free ddH2O to obtain a 1 mg/mL sample. The sample was vortexed
until no peptide remained visible. Due to differences in expression
levels of CsgG pores and mutants, it is difficult to measure their
concentration accurately. The intensity of protein bands on SDS-PAGE
against known markers can be used to get a rough estimate of the
concentration. CsgG and FCP are then mixed in an approximately 1:50 molar
ratio and incubated at 25 °C overnight at 700 rpm. Samples
were heated at 60 °C for 15 minutes and centrifuged at
21,000 rcf for 10 min. The supernatant was taken for testing. If
needed, the complex can be purified as detailed above for
co-expression.
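Setting up the ~1:50 CsgG:FCP mixture requires converting the 1 mg/mL peptide stock into moles. The sketch below estimates the peptide mass from its sequence using standard average amino-acid residue masses plus one water, and then returns the stock volume to add. Two points are assumptions not stated in the protocol. First, the ratio is taken per CsgG monomer. Second, the CsgG amount (`csgg_pmol`) is the rough SDS-PAGE estimate. Terminal modifications of the synthetic peptide are ignored.

```python
# Standard average residue masses (Da) for the 20 amino acids, plus water for the termini.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def peptide_mass_da(seq):
    """Average mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def fcp_volume_ul(csgg_pmol, peptide_seq, ratio=50.0, stock_mg_per_ml=1.0):
    """µL of peptide stock giving `ratio` mol FCP per mol CsgG.

    ASSUMPTION: the ratio is per CsgG monomer; csgg_pmol is a rough SDS-PAGE estimate.
    """
    fcp_pmol = csgg_pmol * ratio
    mass_ug = fcp_pmol * peptide_mass_da(peptide_seq) * 1e-6  # pmol * g/mol = pg; 1 pg = 1e-6 µg
    return mass_ug / stock_mg_per_ml  # 1 mg/mL equals 1 µg/µL


if __name__ == "__main__":
    # Illustrative peptide: residues 1-30 of mature CsgF (SEQ ID NO: 54).
    fcp = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN"
    print(f"{peptide_mass_da(fcp):.1f} Da")
    print(f"{fcp_volume_ul(20.0, fcp):.2f} uL of 1 mg/mL stock for 20 pmol CsgG")
```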
Purifying CsgG:CsgF or CsgG:FCP Containing Cysteine Mutants
[0491] The same procedure as above can be used to purify the CsgG:CsgF
or CsgG:FCP complexes (made by route I, II or III below) if either or both
components contain cysteines, except for the composition of the
affinity, wash and elution buffers in the His and Strep purifications
and the buffer used in gel filtration. To purify cysteine mutants,
all these buffers should contain 2 mM DTT. 2 mM DTT was also
added when synthetic peptides containing cysteines were dissolved in
ddH2O.
[0492] I. co-expression of CsgG and CsgF or FCP
[0493] II. Making the CsgG:CsgF or CsgG:FCP complexes in vitro with
in vivo purified individual components
[0494] III. Making the CsgG:CsgF or CsgG:FCP complexes in vitro
with in vivo purified CsgG and synthetic FCP
Determination of Cys-Bond Formation
[0495] Two 50 µL aliquots of the final elution were taken. To one
tube, 2 mM DTT was added as a reducing agent, and to the other tube
100 µM Cu(II):1,10-phenanthroline (33 mM:100 mM) was added as an
oxidizing agent.
Samples were mixed 1:1 with Laemmli buffer containing 4% SDS. Half of
each sample was heat-treated at 100 °C for 10 min (denaturing
condition) and the other half was left untreated, before running on
a 4-20% TGX gel (Bio-Rad Criterion) in TGS buffer.
Coupled In Vitro Transcription and Translation (IVTT)
[0496] All proteins were generated by coupled in vitro
transcription and translation (IVTT) using an E. coli T7-S30
extract system for circular DNA (Promega). The complete 1 mM amino
acid mixture minus cysteine and the complete 1 mM amino acid
mixture minus methionine were mixed in equal volumes to obtain the
working amino acid solution required to generate high
concentrations of the proteins. The amino acids (10 µL) were mixed
with premix solution (40 µL), [35S]L-methionine (2 µL, 1175
Ci/mmol, 10 mCi/mL), plasmid DNA (16 µL, 400 ng/µL), T7 S30
extract (30 µL) and rifampicin (2 µL, 20 mg/mL) to generate a 100
µL IVTT reaction. Synthesis was carried out for 4 hours
at 30 °C followed by overnight incubation at room
temperature. If the CsgG:CsgF or CsgG:FCP complexes were made by
co-expression, plasmid DNAs encoding each component were mixed in
equal amounts, and a portion of the mixture (16 µL) was used for
IVTT. After incubation, the tube was centrifuged for 10 minutes at
22,000 g and the supernatant was discarded. The resulting
pellet was resuspended and washed in MBSA (10 mM MOPS, 1 mg/mL BSA,
pH 7.4) and centrifuged again under the same conditions. The protein
present in the pellet was resuspended in 1× Laemmli sample
buffer and run on a 4-20% TGX gel at 300 V for 25 min. The gel was
then dried and exposed to Carestream® Kodak BioMax® MR film
overnight. The film was then processed and the protein in the gel
visualized.
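The IVTT components listed above add up to exactly 100 µL, so the reaction can be scaled linearly. The sketch below encodes that recipe, scales it to another total volume, and splits the 16 µL DNA slot for co-expression. It assumes that all plasmid stocks are at the same concentration, so that equal volumes correspond to equal amounts. The dictionary keys and function names are illustrative.

```python
# Per-reaction component volumes (µL) from the IVTT protocol above.
IVTT_100_UL = {
    "amino acid working mix": 10.0,
    "premix solution": 40.0,
    "[35S]L-methionine": 2.0,
    "plasmid DNA (400 ng/uL)": 16.0,
    "T7 S30 extract": 30.0,
    "rifampicin (20 mg/mL)": 2.0,
}
DNA_STOCK_NG_PER_UL = 400.0


def scale_ivtt(total_ul):
    """Scale every component linearly to a new total reaction volume."""
    base = sum(IVTT_100_UL.values())
    return {name: vol * total_ul / base for name, vol in IVTT_100_UL.items()}


def plasmid_mix_ul(n_plasmids, dna_slot_ul=16.0):
    """Per-plasmid share of the DNA slot for co-expression.

    ASSUMPTION: all plasmid stocks share one concentration, so equal volume == equal amount.
    """
    return [dna_slot_ul / n_plasmids] * n_plasmids


if __name__ == "__main__":
    dna_ng = IVTT_100_UL["plasmid DNA (400 ng/uL)"] * DNA_STOCK_NG_PER_UL
    print(f"total volume: {sum(IVTT_100_UL.values()):.0f} uL, DNA per reaction: {dna_ng:.0f} ng")
    print("CsgG + CsgF co-expression split:", plasmid_mix_ul(2))
    print("25 uL reaction:", scale_ivtt(25.0))
```

Each 100 µL reaction therefore contains 6.4 µg of plasmid DNA. For two-plasmid co-expression under the equal-concentration assumption, this corresponds to 8 µL of each plasmid.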
Samples for Testing in MinIONs
[0497] All samples prior to testing are incubated with Brij58
(final concentration of 0.1%) for 10 minutes at room temperature
before making up subsequent pore dilutions necessary for pore
insertion.
Method for Preparing and Running Static Strands
[0498] A set of polyA DNA strands (SS20 to SS38 of FIG. 27) in
which one base is missing from the DNA backbone (iSpc3) was obtained
from Integrated DNA Technologies (IDT). The 3' end of each of these
strands also carries a biotin modification. The static strands are
incubated with monovalent streptavidin at room temperature for 20
minutes, allowing the biotin to bind to the streptavidin. The
streptavidin-static strand complex was diluted to 500 nM (B, FIG.
27) and 2 µM (C, FIG. 27) in 25 mM HEPES, 430 mM KCl, 30 mM ATP, 30
mM MgCl2, 2.15 mM EDTA, pH 8 (known as RBFM). The residual current
generated by each static strand is recorded in a MinION setup.
MinION flow cells were flushed as per standard running protocols,
and then the sequencing protocol was started with 1 minute static
flicks. Initially, 10 minutes of open-pore recording was acquired
before 150 µL of the first streptavidin-static strand complex was
added. After 10 minutes, 800 µL of RBFM was flushed through the
flow cell before the next streptavidin-static strand complex was
added. This process was repeated for all streptavidin-static
strand complexes. Once the final streptavidin-static strand complex had been
incubated on the flow cell, 800 µL of RBFM was flushed through the
flow cell and 10 minutes of open-pore recording was acquired
before finishing the experiment.
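The run above is a fixed schedule: open-pore recording, then an add, record and flush cycle for each strand, then a final open-pore recording. The sketch below builds that timeline for SS20 to SS38 and totals the run time and the RBFM flush volume. It treats additions and flushes as instantaneous, which is an assumption, and `static_strand_schedule` is a hypothetical helper rather than part of any MinION software.

```python
# Static strands SS20 to SS38 (19 strands), as listed for FIG. 27.
STATIC_STRANDS = [f"SS{i}" for i in range(20, 39)]


def static_strand_schedule(strands, record_min=10.0, load_ul=150.0, flush_ul=800.0):
    """Return (timeline, total minutes, total RBFM flush volume in µL).

    ASSUMPTION: adding a strand and flushing are treated as instantaneous.
    """
    t = 0.0
    events = [(t, "record open pore")]
    t += record_min
    for strand in strands:
        events.append((t, f"add {load_ul:.0f} uL {strand}"))
        t += record_min  # record this strand for 10 min
        events.append((t, f"flush {flush_ul:.0f} uL RBFM"))
    events.append((t, "record open pore"))
    t += record_min
    events.append((t, "finish"))
    return events, t, flush_ul * len(strands)


if __name__ == "__main__":
    timeline, total_min, rbfm_ul = static_strand_schedule(STATIC_STRANDS)
    for start, action in timeline[:4]:
        print(f"{start:6.1f} min  {action}")
    print(f"total: {total_min:.0f} min, RBFM flushed: {rbfm_ul / 1000:.1f} mL")
```

For 19 strands this comes to roughly 210 min of recording and 15.2 mL of RBFM, not counting the time taken by the flushes themselves.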
Method for Making Discrimination Profile Plots
[0499] The reader head discrimination profiles show the average
variation in modelled current when the base at each reader head
position is varied. For a model of length k with an alphabet of size n,
we defined the discrimination at reader head position i as the median
of the standard deviations in current level across the
n^(k-1) groups of size n in which position i is varied while all other
positions are held constant.
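The definition above translates directly into code. For each position i, k-mers that differ only at i form a group of n levels, giving n^(k-1) groups. Each group's standard deviation is computed, and the median across groups is taken. The sketch below assumes the model is a dict mapping every k-mer to its modelled current. It uses the population standard deviation because the text does not say which estimator was used. The two-base toy model is hypothetical.

```python
from itertools import product
from statistics import median, pstdev


def discrimination_profile(model, alphabet):
    """Reader-head discrimination for each position of a k-mer current model.

    model: dict mapping every k-mer over `alphabet` to its modelled current level.
    ASSUMPTION: population standard deviation within each group of n k-mers.
    """
    n = len(alphabet)
    k = len(next(iter(model)))
    if len(model) != n ** k:
        raise ValueError("model must contain every k-mer over the alphabet")
    profile = []
    for i in range(k):
        groups = {}
        for kmer, level in model.items():
            context = kmer[:i] + kmer[i + 1:]  # all positions except i held constant
            groups.setdefault(context, []).append(level)
        if len(groups) != n ** (k - 1) or any(len(g) != n for g in groups.values()):
            raise ValueError("each context must occur once per base at position i")
        profile.append(median([pstdev(levels) for levels in groups.values()]))
    return profile


if __name__ == "__main__":
    # Hypothetical 2-mer model: position 0 shifts current by 10, position 1 by 1.
    bases = "AC"
    toy_model = {a + b: 10.0 * (a == "C") + 1.0 * (b == "C") for a, b in product(bases, repeat=2)}
    print(discrimination_profile(toy_model, bases))  # -> [5.0, 0.5]
```

In the toy model, position 0 dominates the signal (discrimination 5.0 versus 0.5). A real reader head profile shows the same pattern across the positions of the nanopore's k-mer model.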
Aspects of the Disclosure
[0500] 1. An isolated pore complex comprising a CsgG pore, or a
homologue or mutant thereof, and a modified CsgF peptide, or a
homologue or mutant thereof. 2. The isolated pore complex according
to 1, wherein the modified CsgF peptide, or a homologue or mutant
thereof, is inserted into the lumen of the CsgG pore, or a
homologue or mutant thereof. 3. The isolated pore complex according
to 2, wherein the pore complex has two or more channel
constrictions, comprising a CsgG channel constriction and a CsgF
channel constriction. 4. The isolated pore complex according to any
one of 1 to 3, wherein the CsgG pore, or homologue or mutant
thereof, is a mutant CsgG pore. 5. The isolated pore complex
according to 3 or 4, wherein the CsgF channel constriction has a
diameter in the range from 0.5 nm to 2.0 nm. 6. A modified CsgF
peptide, or a modified peptide of a CsgF homologue or mutant,
wherein the modification comprises a truncation of the CsgF protein
SEQ ID NO:6 or of a homologue or mutant thereof. 7. A modified CsgF
peptide, or a modified peptide of a CsgF homologue or mutant,
according to 6, wherein said modified CsgF peptide comprises SEQ ID
NO:39, or SEQ ID NO:40, or a homologue or mutant thereof. 8. A
modified CsgF peptide, or a modified peptide of a CsgF homologue or
mutant, according to 6, wherein said modified CsgF peptide
comprises SEQ ID NO:15, or a homologue or mutant thereof. 9. A
modified CsgF peptide according to 8, or a modified peptide of a
CsgF homologue or mutant, wherein one or more positions in the
region comprising SEQ ID NO:15 are mutated, with a minimum of 35
amino acid identity to SEQ ID NO:15. 10. A polynucleotide which
encodes a modified CsgF peptide according to any one of 6 to 9. 11.
The isolated pore complex according to any one of 1 to 5, wherein
said modified CsgF peptide, or a homologue or mutant thereof, is a
peptide according to any one of 6 to 9. 12. The isolated pore
complex according to 11, wherein the modified CsgF peptide and the
CsgG pore, or homologues or mutants thereof, are covalently
coupled. 13. The isolated pore complex according to 12, wherein the
covalent coupling is via: (i) a cysteine residue at a position
corresponding to 132, 133, 136, 138, 140, 142, 144, 145, 147, 149,
151, 153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 or 209
of SEQ ID NO: 3, or of a homologue thereof; (ii) a non-native
reactive or photoreactive amino acid at a position corresponding to
132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155,
183, 185, 187, 189, 191, 201, 203, 205, 207 or 209 of SEQ ID NO: 3,
or of a homologue thereof. 14. An isolated transmembrane pore
complex comprising the isolated pore complex according to any one
of 1 to 5 or 11 to 13, and the components of a membrane. 15. A
method for producing a transmembrane pore complex, wherein the pore
complex is formed by a CsgG pore and a modified CsgF peptide, or
homologues or mutants thereof, comprising co-expressing CsgG as
depicted in SEQ ID NO:2, or a homologue or mutant thereof, and a
modified CsgF peptide, or a homologue or mutant thereof, in a
suitable host cell, thereby allowing in vivo transmembrane pore
complex formation. 16. The method according to 15, wherein the
modified CsgF peptide, or homologue or mutant thereof, comprises
SEQ ID NO:12 or SEQ ID NO:14, or a homologue or mutant thereof. 17.
A method for producing an isolated pore complex, wherein the
isolated pore is formed by a CsgG pore, or a homologue or mutant
thereof, and a modified CsgF peptide, or a homologue or mutant
thereof, comprising contacting the CsgG monomers of SEQ ID NO:3, or
a homologue or mutant thereof, with modified CsgF peptide, or a
homologue or mutant thereof, thereby allowing in vitro
reconstitution of the isolated pore complex. 18. The method
according to 17, wherein the modified CsgF peptide, or a homologue
or mutant thereof, comprises SEQ ID NO:15 or SEQ ID NO:16, or a
homologue or mutant thereof. 19. A method for determining the
presence, absence or one or more characteristics of a target
analyte, comprising the steps of: (i) contacting the target analyte
with a pore complex according to any one of 1 to 5 or 11 to 13 or
with a transmembrane pore complex according to 14, such that the
target analyte moves into the pore complex; and (ii) taking one or
more measurements as the analyte moves through the pore complex and
thereby determining the presence, absence or one or more
characteristics of the analyte. 20. A method according to 19,
wherein the analyte is a polynucleotide. 21. A method according to
19, wherein the analyte is a (poly)peptide. 22. A method according
to 19, wherein the analyte is a polysaccharide. 23. A method
according to 19, wherein the analyte is a small organic or
inorganic compound, such as pharmacologically active compounds,
toxic compounds and pollutants. 24. A method according to 20,
comprising determining one or more characteristics selected from
(i) the length of the polynucleotide, (ii) the identity of the
polynucleotide, (iii) the sequence of the polynucleotide, (iv) the
secondary structure of the polynucleotide and (v) whether or not
the polynucleotide is modified. 25. A method of characterising a
polynucleotide or a (poly)peptide using an isolated transmembrane
pore complex, wherein the pore complex is a complex comprising a
CsgG pore, or a homologue or mutant thereof, and a modified CsgF
peptide, or a homologue or mutant thereof. 26. A method according
to 25, wherein the CsgG pore, or homologue or mutant thereof,
comprises six to ten monomers. 27. Use of the isolated pore complex
according to any one of 1 to 5 or 11 to 13 or of the transmembrane
pore complex according to 14 to determine the presence, absence or
one or more characteristics of a target analyte. 28. A kit for
characterising a target analyte comprising (a) an isolated pore
complex according to any one of 1 to 5 or 11 to 13 and (b) the
components of a membrane.
Sequences
Description of the Sequences:
[0501] SEQ ID NO:1 shows polynucleotide sequence of wild-type E.
coli CsgG from strain K12, including signal sequence (Gene ID:
945619). SEQ ID NO:2 shows amino acid sequence of wild-type E. coli
CsgG including signal sequence (Uniprot accession number P0AEA2).
SEQ ID NO:3 shows amino acid sequence of wild-type E. coli CsgG as
mature protein (Uniprot accession number P0AEA2). SEQ ID NO:4 shows
polynucleotide sequence of wild-type E. coli CsgF from strain K12,
including signal sequence (Gene ID: 945622). SEQ ID NO:5 shows
amino acid sequence of wild-type E. coli CsgF including signal
sequence (Uniprot accession number P0AE98). SEQ ID NO:6 shows amino
acid sequence of wild-type E. coli CsgF as mature protein (Uniprot
accession number P0AE98). SEQ ID NO:7 shows polynucleotide sequence
of a fragment of wild-type E. coli CsgF encoding amino acids 1 to
27 and a C-terminal 6 His tag. SEQ ID NO:8 shows amino acid
sequence of a fragment of wild-type E. coli CsgF encompassing amino
acids 1 to 27 and a C-terminal 6 His tag. SEQ ID NO:9 shows
polynucleotide sequence of a fragment of wild-type E. coli CsgF
encoding amino acids 1 to 38 and a C-terminal 6 His tag. SEQ ID
NO:10 shows amino acid sequence of a fragment of wild-type E. coli
CsgF encompassing amino acids 1 to 38 and a C-terminal 6 His tag.
SEQ ID NO:11 shows polynucleotide sequence of a fragment of
wild-type E. coli CsgF encoding amino acids 1 to 48 and a
C-terminal 6 His tag. SEQ ID NO:12 shows amino acid sequence of a
fragment of wild-type E. coli CsgF encompassing amino acids 1 to 48
and a C-terminal 6 His tag. SEQ ID NO:13 shows polynucleotide
sequence of a fragment of wild-type E. coli CsgF encoding amino
acids 1 to 64 and a C-terminal 6 His tag. SEQ ID NO:14 shows amino
acid sequence of a fragment of wild-type E. coli CsgF encompassing
amino acids 1 to 64 and a C-terminal 6 His tag. SEQ ID NO:15 shows
amino acid sequence of a peptide corresponding to residues 20 to 53
of E. coli CsgF. SEQ ID NO:16 shows amino acid sequence of a peptide
corresponding to residues 20 to 42 of E. coli CsgF, including KD at
its C-terminus. SEQ ID NO:17 shows amino acid sequence of a peptide
corresponding to residues 23 to 55 of CsgF homologue Q88H88. SEQ ID
NO:18 shows amino acid sequence of a peptide corresponding to
residues 25 to 57 of CsgF homologue A0A143HJA0. SEQ ID NO:19 shows
amino acid sequence of a peptide corresponding to residues 21 to 53
of CsgF homologue Q5E245. SEQ ID NO:20 shows amino acid sequence of
a peptide corresponding to residues 19 to 51 of CsgF homologue
Q084E5. SEQ ID NO:21 shows amino acid sequence of a peptide
corresponding to residues 15 to 47 of CsgF homologue F0LZU2. SEQ ID
NO:22 shows amino acid sequence of a peptide corresponding to
residues 26 to 58 of CsgF homologue A0A136HQR0. SEQ ID NO:23 shows
amino acid sequence of a peptide corresponding to residues 21 to 53
of CsgF homologue A0A0W1SRL3. SEQ ID NO:24 shows amino acid sequence
of a peptide corresponding to residues 26 to 59 of CsgF homologue
B0UH01. SEQ ID NO:25 shows amino acid sequence of a peptide
corresponding to residues 22 to 53 of CsgF homologue Q6NAU5. SEQ ID
NO:26 shows amino acid sequence of a peptide corresponding to
residues 7 to 38 of CsgF homologue G8PUY5. SEQ ID NO:27 shows amino
acid sequence of a peptide corresponding to residues 25 to 57 of
CsgF homologue A0A0S2ETP7. SEQ ID NO:28 shows amino acid sequence of
a peptide corresponding to residues 19 to 51 of CsgF homologue
E311Z1. SEQ ID NO:29 shows amino acid sequence of a peptide
corresponding to residues 24 to 55 of CsgF homologue F3Z094. SEQ ID
NO:30 shows amino acid sequence of a peptide corresponding to
residues 21 to 53 of CsgF homologue A0A176T7M2. SEQ ID NO:31 shows
amino acid sequence of a peptide corresponding to residues 14 to 45
of CsgF homologue D2QPP8. SEQ ID NO:32 shows amino acid sequence of
a peptide corresponding to residues 28 to 58 of CsgF homologue
N2IYT1. SEQ ID NO:33 shows amino acid sequence of a peptide
corresponding to residues 26 to 58 of CsgF homologue W7QHV5. SEQ ID
NO:34 shows amino acid sequence of a peptide corresponding to
residues 23 to 55 of CsgF homologue D4ZLW2. SEQ ID NO:35 shows amino
acid sequence of a peptide corresponding to residues 21 to 53 of
CsgF homologue D2QT92. SEQ ID NO:36 shows amino acid sequence of a
peptide corresponding to residues 20 to 51 of CsgF homologue
A0A167UJA2. SEQ ID NO:37 shows amino acid sequence of a fragment of
wild-type E. coli CsgF encompassing amino acids 20 to 27. SEQ ID
NO:38 shows amino acid sequence of a fragment of wild-type E. coli
CsgF encompassing amino acids 20 to 38. SEQ ID NO:39 shows amino
acid sequence of a fragment of wild-type E. coli CsgF encompassing
amino acids 20 to 48. SEQ ID NO:40 shows amino acid sequence of a
fragment of wild-type E. coli CsgF encompassing amino acids 20 to
64. SEQ ID NO:41 shows the nucleotide sequence of primer
CsgF_d27_end. SEQ ID NO:42 shows the nucleotide sequence of primer
CsgF_d38_end. SEQ ID NO:43 shows the nucleotide sequence of primer
CsgF_d48_end. SEQ ID NO:44 shows the nucleotide sequence of primer
CsgF_d64_end. SEQ ID NO:45 shows the nucleotide sequence of primer
pNa62_CsgF_histag_Fw. SEQ ID NO:46 shows the nucleotide sequence of
primer CsgF-His_pET22b_FW. SEQ ID NO:47 shows the nucleotide
sequence of primer CsgF-His_pET22b_Rev. SEQ ID NO:48 shows the
nucleotide sequence of primer csgEFG_pDONR221_FW. SEQ ID NO:49 shows
the nucleotide sequence of primer csgEFG_pDONR221_Rev. SEQ ID NO:50
shows the nucleotide sequence of primer Mut_csgF_His_FW. SEQ ID
NO:51 shows the nucleotide sequence of primer Mut_csgF_His_Rev. SEQ
ID NO:52 shows the nucleotide sequence of primer DeICsgE_Rev. SEQ ID
NO:53 shows the nucleotide sequence of primer DeICsgE FW. SEQ ID NO:
54 shows the amino acid sequence of residues 1 to 30 of mature E.
coli CsgF. SEQ ID NO: 55 shows the amino acid sequence of residues 1
to 35 of mature E. coli CsgF. SEQ ID NO: 56 shows the amino acid
sequence of a mutated (T4C/N17S) CsgF sequence with a signal
sequence, and a TEV protease cleavage site (ENLYFQS) inserted
between residues 35 and 36 of sequence of the mature protein. SEQ
ID NO: 57 shows the amino acid sequence of a mutated (N17S-Del)
CsgF sequence with a signal sequence, and a TEV protease cleavage
site (ENLYFQS) inserted between residues 35 and 36 of sequence of
the mature protein. SEQ ID NO: 58 shows the amino acid sequence of
a mutated (G1C/N17S) CsgF sequence with a signal sequence, and a
TEV protease cleavage site (ENLYFQS) inserted between residues 35
and 36 of sequence of the mature protein. SEQ ID NO: 59 shows the
amino acid sequence of a mutated (G1C) CsgF sequence with a signal
sequence, and a TEV protease cleavage site (ENLYFQS) inserted
between residues 35 and 36 of sequence of the mature protein. SEQ
ID NO: 60 shows the amino acid sequence of a CsgF sequence with a
signal sequence, a TEV protease cleavage site (ENLYFQS) inserted
between residues 45 and 46 of sequence of the mature protein, and a
His10 tag at the C-terminus. SEQ ID NO: 61 shows the amino
acid sequence of a CsgF sequence with a signal sequence, a TEV
protease cleavage site (ENLYFQS) inserted between residues 35 and
36 of sequence of the mature protein, and a His10 tag at the
C-terminus. SEQ ID NO: 62 shows the amino acid sequence of a CsgF
sequence with a signal sequence, a TEV protease cleavage site
(ENLYFQS) inserted between residues 30 and 31 of sequence of the
mature protein, and a His10 tag at the C-terminus. SEQ ID NO:
63 shows the amino acid sequence of a CsgF sequence with a signal
sequence, a TEV protease cleavage site (ENLYFQS) inserted between
residues 45 and 51 of sequence of the mature protein, and a
His10 tag at the C-terminus. SEQ ID NO: 64 shows the amino
acid sequence of a CsgF sequence with a signal sequence, a TEV
protease cleavage site (ENLYFQS) inserted between residues 30 and
37 of sequence of the mature protein, and a His10 tag at the
C-terminus. SEQ ID NO: 65 shows the amino acid sequence of a CsgF
sequence with a signal sequence, an HRV 3C protease cleavage site
(LEVLFQGP) inserted between residues 34 and 36 of sequence of the
mature protein, and a His10 tag at the C-terminus. SEQ ID NO:
66 shows the amino acid sequence of a CsgF sequence with a signal
sequence, an HRV 3C protease cleavage site (LEVLFQGP) inserted
between residues 42 and 43 of sequence of the mature protein, and a
His10 tag at the C-terminus. SEQ ID NO: 67 shows the amino
acid sequence of a CsgF sequence with a signal sequence, an HRV 3C
protease cleavage site (LEVLFQGP) inserted between residues 38 and
47 of sequence of the mature protein, and a His10 tag at the
C-terminus. SEQ ID NO: 68 shows the amino acid sequence of
YP_001453594.1: 1-248 of hypothetical protein CKO_02032
[Citrobacter koseri ATCC BAA-895], which is 99% identical to SEQ ID
NO: 3. SEQ ID NO: 69 shows the amino acid sequence of
WP_001787128.1: 16-238 of curli production assembly/transport
component CsgG, partial [Salmonella enterica], which is 98% identical to SEQ
ID NO: 3. SEQ ID NO: 70 shows the amino acid sequence of
KEY44978.1: 16-277 of curli production assembly/transport protein
CsgG [Citrobacter amalonaticus], which is 98% identical to SEQ ID
NO: 3. SEQ ID NO: 71 shows the amino acid sequence of
YP_003364699.1: 16-277 of curli production assembly/transport
component [Citrobacter rodentium ICC168], which is 97% identical to
SEQ ID NO: 3. SEQ ID NO: 72 shows the amino acid sequence of
YP_004828099.1: 16-277 of curli production assembly/transport
component CsgG [Enterobacter asburiae LF7a], which is 94% identical
to SEQ ID NO: 3. SEQ ID NO: 73 shows the amino acid sequence of
WP_006819418.1: 19-280 of transporter [Yokenella regensburgei],
which is 91% identical to SEQ ID NO: 3. SEQ ID NO: 74 shows the
amino acid sequence of WP_024556654.1: 16-277 of curli production
assembly/transport protein CsgG [Cronobacter pulveris], which is
89% identical to SEQ ID NO: 3. SEQ ID NO: 75 shows the amino acid
sequence of YP_005400916.1:16-277 of curli production
assembly/transport protein CsgG [Rahnella aquatilis HX2], which is
84% identical to SEQ ID NO: 3. SEQ ID NO: 76 shows the amino acid
sequence of KFC99297.1: 20-278 of CsgG family curli production
assembly/transport component [Kluyvera ascorbata ATCC 33433], which
is 82% identical to SEQ ID NO: 3. SEQ ID NO: 77 shows the amino
acid sequence of KFC86716.1 1:16-274 of CsgG family curli
production assembly/transport component [Hafnia alvei ATCC 13337],
which is 81% identical to SEQ ID NO: 3. SEQ ID NO: 78 shows the
amino acid sequence of YP_007340845.1: 16-270 of uncharacterised
protein involved in formation of curli polymers [Enterobacteriaceae
bacterium strain FGI 57], which is 76% identical to SEQ ID NO: 3.
SEQ ID NO: 79 shows the amino acid sequence of WP_010861740.1:
17-274 of curli production assembly/transport protein CsgG
[Plesiomonas shigelloides], which is 70% identical to SEQ ID NO: 3.
SEQ ID NO: 80 shows the amino acid sequence of YP_205788.1: 23-270
of curli production assembly/transport outer membrane lipoprotein
component CsgG [Vibrio fischeri ES114], which is 60% identical to
SEQ ID NO: 3. SEQ ID NO: 81 shows the amino acid sequence of
WP_017023479.1: 23-270 of curli production assembly protein CsgG
[Aliivibrio logei], which is 59% identical to SEQ ID NO: 3. SEQ ID
NO: 82 shows the amino acid sequence of WP_007470398.1: 22-275 of
Curli production assembly/transport component CsgG [Photobacterium
sp. AK15], which is 57% identical to SEQ ID NO: 3. SEQ ID NO: 83
shows the amino acid sequence of WP_021231638.1: 17-277 of curli
production assembly protein CsgG [Aeromonas veronii], which is 56%
identical to SEQ ID NO: 3. SEQ ID NO: 84 shows the amino acid
sequence of WP_033538267.1: 27-265 of curli production
assembly/transport protein CsgG [Shewanella sp. ECSMB14101], which
is 56% identical to SEQ ID NO: 3. SEQ ID NO: 85 shows the amino
acid sequence of WP_003247972.1: 30-262 of curli production
assembly protein CsgG [Pseudomonas putida], which is 54% identical
to SEQ ID NO: 3. SEQ ID NO: 86 shows the amino acid sequence of
YP_003557438.1: 1-234 of curli production assembly/transport
component CsgG [Shewanella violacea D5512], which is 53% identical
to SEQ ID NO: 3. SEQ ID NO: 87 shows the amino acid sequence of
WP_027859066.1: 36-280 of curli production assembly/transport
protein CsgG [Marinobacterium jannaschii], which is 53% identical
to SEQ ID NO: 3. SEQ ID NO: 88 shows the amino acid sequence of
CEJ70222.1: 29-262 of Curli production assembly/transport component
CsgG [Chryseobacterium oranimense G311], which is 50% identical to
SEQ ID NO: 3.
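The CsgF constructs above use two numbering schemes. The SEQ ID NO descriptions count residues in the pre-protein (for example, 20 to 53 for SEQ ID NO: 15), whereas the experimental examples count residues in the mature protein (for example, 1-35 of SEQ ID NO: 6). Mature residue m corresponds to pre-protein residue m + 19. The sketch below rebuilds the His6-tagged pre-protein fragments from the signal and mature sequences listed below. The helper names are hypothetical, and the blank separators that appear inside some listed sequences are omitted.

```python
CSGF_SIGNAL = "MRVKHAVVLLMLISPLSWA"  # pre-CsgF residues 1-19 (signal sequence)
CSGF_MATURE = (
    "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSALDNFTQAIQSQILGGLLSNIN"
    "TGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTDF"
)  # SEQ ID NO: 6 = pre-CsgF residues 20-138
HIS6 = "HHHHHH"


def pre_csgf():
    """Full pre-protein (SEQ ID NO: 5)."""
    return CSGF_SIGNAL + CSGF_MATURE


def pre_fragment(first, last):
    """Pre-protein residues first..last (1-based, inclusive), e.g. 20-53 for SEQ ID NO: 15."""
    return pre_csgf()[first - 1:last]


def his6_preprotein(last):
    """Pre-CsgF residues 1..last plus a C-terminal His6 tag (cf. SEQ ID NOs: 8, 10, 12, 14)."""
    return pre_fragment(1, last) + HIS6


def mature_to_pre(residue):
    """Convert mature numbering (SEQ ID NO: 6) to pre-protein numbering."""
    return residue + len(CSGF_SIGNAL)


if __name__ == "__main__":
    for last in (27, 38, 48, 64):
        print(last, his6_preprotein(last))
    print("mature residue 35 is pre-protein residue", mature_to_pre(35))
```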
SEQ ID NO: 1 (>P0AEA2; coding sequence for WT CsgG from E. coli K12)
ATGCAGCGCTTATTTCTTTTGGTTGCCGTCATGTTACTGAGCGGATGCTTAACCGCCCCGCCTAAA
GAAGCCGCCAGACCGACATTAATGCCTCGTGCTCAGAGCTACAAAGATTTGACCCATCTGCCAGCG
CCGACGGGTAAAATCTTTGTTTCGGTATACAACATTCAGGACGAAACCGGGCAATTTAAACCCTACC
CGGCAAGTAACTTCTCCACTGCTGTTCCGCAAAGCGCCACGGCAATGCTGGTCACGGCACTGAAA
GATTCTCGCTGGTTTATACCGCTGGAGCGCCAGGGCTTACAAAACCTGCTTAACGAGCGCAAGATT
ATTCGTGCGGCACAAGAAAACGGCACGGTTGCCATTAATAACCGAATCCCGCTGCAATCTTTAACG
GCGGCAAATATCATGGTTGAAGGTTCGATTATCGGTTATGAAAGCAACGTCAAATCTGGCGGGGTT
GGGGCAAGATATTTTGGCATCGGTGCCGACACGCAATACCAGCTCGATCAGATTGCCGTGAACCT
GCGCGTCGTCAATGTGAGTACCGGCGAGATCCTTTCTTCGGTGAACACCAGTAAGACGATACTTTC
CTATGAAGTTCAGGCCGGGGTTTTCCGCTTTATTGACTACCAGCGCTTGCTTGAAGGGGAAGTGGG
TTACACCTCGAACGAACCTGTTATGCTGTGCCTGATGTCGGCTATCGAAACAGGGGTCATTTTCCT
GATTAATGATGGTATCGACCGTGGTCTGTGGGATTTGCAAAATAAAGCAGAACGGCAGAATGACAT
TCTGGTGAAATACCGCCATATGTCGGTTCCACCGGAATCCTGA SEQ ID NO: 2
(>P0AEA2 (1:277); WT prepro CsgG from E. coli K12)
MQRLFLLVAVMLLSGCLTAPPKEAARPTLMPRAQSYKDLTHLPAPTGKIFVSVYNIQDETGQFKPYPAS
NFSTAVPQSATAMLVTALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIMVEG
SIIGYESNVKSGGVGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDY
QRLLEGEVGYTSNEPVMLCLMSAIETGVIFLINDGIDRGLWDLQNKAERQNDILVKYRHMSVPPES
SEQ ID NO: 3 (>P0AEA2 (16:277); mature CsgG from E. coli K12)
CLTAPPKEAARPTLMPRAQSYKDLTHLPAPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLVT
ALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIMVEGSIIGYESNVKSGGVGA
RYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEVGYTSNEPV
MLCLMSAIETGVIFLINDGIDRGLWDLQNKAERQNDILVKYRHMSVPPES
SEQ ID NO: 4 (>P0AE98; coding sequence for WT CsgF from E. coli K12)
ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCATGA
CTTTCCAGTTCCGTAATCCAAACTTTGGTGGTAACCCAAATAATGGCGCTTTTTTATTAAATAGCGCT
CAGGCCCAAAACTCTTATAAAGATCCGAGCTATAACGATGACTTTGGTATTGAAACACCCTCAGCGT
TAGATAACTTTACTCAGGCCATCCAGTCACAAATTTTAGGTGGGCTACTGTCGAATATTAATACCGG
TAAACCGGGCCGCATGGTGACCAACGATTATATTGTCGATATTGCCAACCGCGATGGTCAATTGCA
GTTGAACGTGACAGATCGTAAAACCGGACAAACCTCGACCATCCAGGTTTCGGGTTTACAAAATAA
CTCAACCGATTTT SEQ ID NO: 5 (>P0AE98 (1:138); WT pre CsgF from E.
coli K12)
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSAL
DNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTDF
SEQ ID NO: 6 (>P0AE98 (20:138); WT mature CsgF from E. coli K12)
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSALDNFTQAIQSQILGGLLSNIN
TGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTDF SEQ ID NO: 7
(>P0AE98; coding sequence for CsgF 1:27_6His)
ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCATGA
CTTTCCAGTTCCGTCATCACCATCACCATCACTAAGCCC SEQ ID NO: 8 (>P0AE98
(1:28); preprotein of CsgF 20:27_6His) MRVKHAVVLLMLISPLSWA GTMTFQFR
HHHHHH SEQ ID NO: 9 (>P0AE98; coding sequence for CsgF
1:38_6His)
ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCATGA
CTTTCCAGTTCCGTAATCCAAACTTTGGTGGTAACCCAAATAATGGCCATCACCATCACCATCACTA
AGCCC SEQ ID NO: 10 (>P0AE98 (1:39); preprotein of CsgF
20:38_6His) MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNG HHHHHH SEQ ID
NO: 11 (>P0AE98; coding sequence for CsgF 1:48_6His)
ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCATGA
CTTTCCAGTTCCGTAATCCAAACTTTGGTGGTAACCCAAATAATGGCGCTTTTTTATTAAATAGCGCT
CAGGCCCAACATCACCATCACCATCACTAAGCCC SEQ ID NO: 12 (>P0AE98
(1:49); preprotein of CsgF 20:48_6His)
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQ HHHHHH SEQ ID NO:
13 (>P0AE98; coding sequence for CsgF 1:64_6His)
ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCATGA
CTTTCCAGTTCCGTAATCCAAACTTTGGTGGTAACCCAAATAATGGCGCTTTTTTATTAAATAGCGCT
CAGGCCCAAAACTCTTATAAAGATCCGAGCTATAACGATGACTTTGGTATTGAAACA
CATCACCATCACCATCACTAAGCCC SEQ ID NO: 14 (>P0AE98 (1:65);
preprotein of CsgF 20:64_6His)
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETHHHH
HH SEQ ID NO: 15 (>P0AE98 (20:53); mature peptide of CsgF 20:53)
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKD SEQ ID NO: 16 (>P0AE98
(20:42); mature peptide of CsgF 20:42 + KD)
GTMTFQFRNPNFGGNPNNGAFLLKD SEQ ID NO: 17 (>Q88H88_PSEPK (23:55))
TELVYTPVNPAFGGNPLNGTWLLNNAQAQNDY SEQ ID NO: 18
(>A0A143HJA0_9GAMM (25:57)) TELIYEPVNPNFGGNPLNGSYLLNNAQAQDRH SEQ
ID NO: 19 (>Q5E245_VIBF1 (21:53))
SELVYTPVNPNFGGNPLNTSHLFGGANAINDY SEQ ID NO: 20 (>Q084E5_SHEFN
(19:51)) TQLVYTPVNPAFGGSYLNGSYLLANASAQNEH SEQ ID NO: 21
(>F0LZU2_VIBFN (15:47)) SSLVYEPVNPTFGGNPLNTTHLFSRAEAINDY SEQ ID
NO: 22 (>A0A136HQR0_9ALTE (26:58))
TELVYEPINPSFGGNPLNGSFLLSKANSQNAH SEQ ID NO: 23
(>A0A0W1SRL3_9GAMM (21:53)) TEIVYQPINPSFGGNPMNGSFLLQKAQSQNAH SEQ
ID NO: 24 (>B0UH01_METS4 (26:59))
SSLVYQPVNPAFGGPQLNGSWLQAEANAQNIPQ SEQ ID NO: 25 (>Q6NAU5_RHOPA
(22:53)) GSLVYTPTNPAFGGSPLNGSWQMQQATAGNH SEQ ID NO: 26
(>G8PUY5_PSEUV (7:38)) QQLIYQPTNPSFGGYAANTTHLFATANAQKTA SEQ ID
NO: 27 (>A0A0S2ETP7_9RHIZ (25:57))
GDLVYTPVNPSFGGSPLNSAHLLSIAGAQKNA SEQ ID NO: 28 (>E3I1Z1_RHOVT
(19:51)) AELGYTPVNPSFGGSPLNGSTLLSEASAQKPN SEQ ID NO: 29
(>F3Z094_DESAF (24:55)) TELVFSFTNPSFGGDPMIGNFLLNKADSQKR SEQ ID
NO: 30 (>A0A176T7M2_9FLAO (21:53))
QQLVYKSINPFFGGGDSFAYQQLLASANAQND SEQ ID NO: 31 (>D2QPP8_SPILD
(14:45)) QALVYHPNNPAFGGNTFNYQWMLSSAQAQDR SEQ ID NO: 32
(>N2IYT1_9PSED (26:58)) TELVYTPKNPAFGGSPLNGSYLLGNAQAQNDY SEQ ID
NO: 33 (>W7QHV5_9GAMM (26:58)) GQLIYQPINPSFGGDPLLGNHLLNKAQAQDTK
SEQ ID NO: 34 (>D4ZLW2_SHEVD (23:55))
TQLIYTPVNPNFGGSYLNGSYLLANASVQNDH SEQ ID NO: 35 (>D2QT92_SPILD
(21:53)) QAFVYHPNNPNFGGNTFNYSWMLSSAQAQDRT SEQ ID NO: 36
(>A0A167UJA2_9FLAO (20:51)) QGLIYKPKNPAFGGDTFNYQWLASSAESQNK SEQ
ID NO: 37 (>P0AE98 (20:28); mature peptide of CsgF 20:27)
GTMTFQFR SEQ ID NO: 38 (>P0AE98 (20:39); mature peptide of CsgF
20:38) GTMTFQFRNPNFGGNPNNG SEQ ID NO: 39 (>P0AE98 (20:49);
mature peptide of CsgF 20:48) GTMTFQFRNPNFGGNPNNGAFLLNSAQAQ SEQ ID
NO: 40 (>P0AE98 (20:65); mature peptide of CsgF 20:64)
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIET SEQ ID NO: 41
(CsgF_d27_end) ACGGAACTGGAAAGTCATGGTTCC SEQ ID NO: 42
(CsgF_d38_end) GCCATTATTTGGGTTACCACCAAAGTTTGG SEQ ID NO: 43
(CsgF_d48_end) TTGGGCCTGAGCGCTATTTAATAAAAAAGC SEQ ID NO: 44
(CsgF_d64_end) TGTTTCAATACCAAAGTCATCGTTATAGCTCGG SEQ ID NO: 45
(pNa62_CsgF_histag_Fw) CATCACCATCACCATCACTAAGCCC SEQ ID NO: 46
(CsgF-His_pET22b_FW) CCCCCATATGGGAACCATGACTTTCCAGTTCC SEQ ID NO:
47: (CsgF-His_pET22b_Rev)
CCCCGAATTCCTAATGGTGATGGTGATGGTGGTAAAAATCGGTTGAGTTATTTTG SEQ ID NO:
48: (csgEFG_pDONR221_FW)
GGGGACAAGTTTGTACAAAAAAGCAGGCTACCTCAGGCGATAAAGCCATGAAACGTTA SEQ ID
NO: 49: (csgEFG_pDONR221_Rev)
GGGGACCACTTTGTACAAGAAAGCTGGGTGTTTAAACTCATTTTTCGAACTGCGGGTGGCTCCAAG
CGCTGG SEQ ID NO: 50: (Mut_csgF_His_FW)
CAAAATAACTCAACCGATTTTCATCACCATCACCATCACTAAGCCCCAGCTTCATAAGG SEQ ID
NO: 51: (Mut_csgF_His_Rev)
CCTTATGAAGCTGGGGCTTAGTGATGGTGATGGTGATGAAAATCGGTTGAGTTATTTTG SEQ ID
NO: 52: (DelCsgE_Rev) AGCCTGCTTTTTTGTACAAAC SEQ ID NO: 53: (DelCsgE
FW) ATAAAAAATTGTTCGGAGGCTGC SEQ ID NO: 54 (>P0AE98 (20:50);
mature peptide of CsgF 1:30) GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN SEQ ID
NO: 55 (>P0AE98 (20:54); mature peptide of CsgF 1:35)
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDP
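The coding sequences and primers in the table above can be cross-checked against the protein entries with two small helpers. The sketch below makes these assumptions:

- It uses the standard genetic code (NCBI translation table 1).
- It expects uppercase A/C/G/T input.
- `translate` and `reverse_complement` are illustrative names, not a particular library's API.

The example translates the CsgF 1:27_6His coding sequence (SEQ ID NO: 7). It then checks that the reverse complement of primer CsgF_d27_end (SEQ ID NO: 41) occurs in that sequence and encodes mature CsgF residues GTMTFQFR.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*" and to_stop:
            break  # stop at the first stop codon, like the ** marks in the listing
        protein.append(residue)
    return "".join(protein)


def reverse_complement(dna: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# SEQ ID NO: 7, coding sequence for CsgF 1:27_6His.
SEQ_7 = (
    "ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCATGA"
    "CTTTCCAGTTCCGTCATCACCATCACCATCACTAAGCCC"
)

if __name__ == "__main__":
    print(translate(SEQ_7))  # MRVKHAVVLLMLISPLSWAGTMTFQFRHHHHHH (SEQ ID NO: 8)
    rc = reverse_complement("ACGGAACTGGAAAGTCATGGTTCC")  # primer CsgF_d27_end
    print(rc in SEQ_7, translate(rc))  # True GTMTFQFR
```

The same helpers show that primer pNa62_CsgF_histag_Fw (SEQ ID NO: 45) supplies the six His codons followed by a TAA stop.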
Examples of CsgF sequences with protease cleavage sites that were
expressed as proteins. The signal peptide is shown in bold, the TEV
protease cleavage site in bold and underline, and the HRV 3C protease
cleavage site in underline. StrepII indicates the Strep tag at the C
terminus, H10 indicates the 10× histidine tag at the C terminus, and **
indicates stop codons.
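The cleavage sites in these constructs can be located programmatically. The sketch below assumes exact matches to the motifs as they appear in the sequences that follow:

- TEV site: ENLYFQ followed by S.
- 3C site: LEVLFQ followed by GP.

In both cases cleavage is taken to occur after the Gln. The proteases themselves tolerate some variation at these positions, so exact string matching is a simplification. `cleave` and `SITES` are hypothetical names.

```python
from typing import Optional, Tuple

# Recognition motif, plus the residue(s) immediately after the scissile bond.
# Assumption: exact matches only, as written in SEQ ID NOs: 56-67.
SITES = {
    "TEV": ("ENLYFQ", "S"),
    "3C": ("LEVLFQ", "GP"),
}


def cleave(protein: str, site: str) -> Optional[Tuple[str, str]]:
    """Split at the first occurrence of the site, after the P1 Gln; None if absent."""
    motif, after = SITES[site]
    start = protein.find(motif + after)
    if start == -1:
        return None
    cut = start + len(motif)
    return protein[:cut], protein[cut:]


if __name__ == "__main__":
    # Fragments around the inserted sites in SEQ ID NO: 61 (TEV) and SEQ ID NO: 65 (3C).
    print(cleave("NSYKDPENLYFQSSYNDDF", "TEV"))  # ('NSYKDPENLYFQ', 'SSYNDDF')
    print(cleave("NSYKDLEVLFQGPSYNDDF", "3C"))   # ('NSYKDLEVLFQ', 'GPSYNDDF')
```

Because only the first match is used, a construct carrying more than one copy of a site would need `str.find` to be called in a loop.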
TABLE-US-00006 Pro-CsgF-Eco-(WT-T4C/N17S/P35-TEV-S36)-StrepII SEQ
ID NO: 56
MRVKHAVVLLMLISPLSWAGTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDPENLYFQSSYNDDF
GIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGL
QNNSTDFSAWSHPQFEK**
Pro-CsgF-Eco-(WT-N17S-Del(P35-[TEV]-S36)-StrepII SEQ ID NO: 57
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDPENLYFQSSYNDDF
GIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGL
QNNSTDFSAWSHPQFEK**
Pro-CsgF-Eco-(WT-G1C/N17S/P35-[TEV]-S36)-StrepII SEQ ID NO: 58
MRVKHAVVLLMLISPLSWACTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDPENLYFQSSYNDDF
GIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGL
QNNSTDFSAWSHPQFEK** Pro-CsgF-Eco-(WT-G1C/P35-[TEV]-S36)-StrepII SEQ
ID NO: 59
MRVKHAVVLLMLISPLSWACTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPENLYFQSSYNDDF
GIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGL
QNNSTDFSAWSHPQFEK** Pro-CsgF-Eco-(WT-T45-TEV-P46)-H10 SEQ ID NO: 60
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETENL
YFQSPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSG
LQNNSTDFHHHHHHHHHH** Pro-CsgF-Eco-(WT-P35-TEV-S36)-H10 SEQ ID NO:
61
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPENLYFQSSYNDDF
GIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGL
QNNSTDFHHHHHHHHHH** Pro-CsgF-Eco-(WT-N30-TEV-S31)-H10 SEQ ID NO: 62
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNENLYFQSSYKDPSYNDDF
GIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGL
QNNSTDFHHHHHHHHHH** Pro-CsgF-Eco-(WT-T45-TEV-F51)-H10 SEQ ID NO: 63
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETENL
YFQSFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNST
DFHHHHHHHHHH** Pro-CsgF-Eco-(WT-N30-TEV-Y37)-H10 SEQ ID NO: 64
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNENLYFQSYNDDFGIETPS
ALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNST
DFHHHHHHHHHH** Pro-CsgF-Eco-(WT-D34-[C3]-S36) SEQ ID NO: 65
MRVKHAVVLLMLISPLSWACTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDLEVLFQGPSYNDDF
GIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGL
QNNSTDFSAWSHPQFEK** Pro-CsgF-Eco-(WT-I42-[C3]-E43) SEQ ID NO: 66
MRVKHAVVLLMLISPLSWACTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGILEVLFQ
GPETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSG
LQNNSTDFSAWSHPQFEK** Pro-CsgF-Eco-(WT-N38-[C3]-S47) SEQ ID NO: 67
MRVKHAVVLLMLISPLSWACTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNLEVLFQGPSA
LDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTD
FSAWSHPQFEK** SEQ ID NO: 68
MPRAQSYKDLTHLPMPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLVTALKDSRWFIPLER
QGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIMVEGSIIGYESNVKSGGVGARYFGIGADTQY
QLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEIGYTSNEPVMLCLMSAIE
TGVIFLINDGIDRGLWDLQNKAERQNDILVKYRHMSVPPES SEQ ID NO: 69
CLTAPPKQAAKPTLMPRAQSYKDLTHLPAPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLV
TALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAMNNRIPLQSLTAANIMVEGSIIGYESNVKSGG
VGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEIGYT
SNEPVMLCLMSAIETG SEQ ID NO: 70
CLTAPPKEAAKPILMPRAQSYKDLTHLPIPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLV
TALKDSRWFVPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIMVEGSIIGYESNVKSGG
VGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVERFIDYQRLLEGEIGYT
SNEPVMLCLMSAIETGVIFLINDGIDRGLWDLQNKADRQNDILVKYRHMSVPPES SEQ ID NO:
71
CLTIPPKEAAKPILMPRAQSYKDLTHLPVPIGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLV
TALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLPSLTAANIMVEGSIIGYESNVKSGG
AGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVERFIDYQRLLEGEIGYT
SNEPVMLCLMSAIETGVIFLINDGIDRGLWDLQNKADRQNDILVKYRQMSVPPES SEQ ID NO:
72
CLTAPPKEAAKPILMPRAQSYRDLTHLPAPIGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLV
TALKDSHWEIPLERQGLQNLLNERKIIRAAQENGTVANNNRMPLQSLAAANVMIEGSIIGYESNVKSGG
VGARYFGIGADTQYQLDQIAVNLRVVNVSTGEVLSSVNTSKTILSYEVQAGVERFIDYQRLLEGEIGYT
SNEPVMMCLMSAIETGVIFLINDGIDRGLWDLQNKADAQNPVLVKYRDMSVPPES SEQ ID NO:
73
CLTAPPKEAAKPTLMPRAQSYRDLTHLPLPSGKVFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLVTALKDSR
WFVPLERQGLQNLLNERKIIRAAQENGTVADNNRIPLQSLTAANVMIEGSIIGYESNVKSGGVGARYFGIGADTQY
QLDQIAVNLRVVNVSTGEVLSSVNTSKTILSYEVQAGVFRFVDYQRLLEGEIGYTSNEPVMLCLMSAIETGVIYLI
NDGIERGLWDLQQKADVDNPILARYRNMSAPPES SEQ ID NO: 74
CLTAPPKEAAKPTLMPRAQSYRDLTNLPDPKGKLFVSVYNIQDETGQFKPYPASNFSTAVPQSATSMLVTALKDSR
WFIPLERQGLQNLLNERKIIRAAQENGTVAENNRMPLQSLVAANVMIEGSIIGYESNVKSGGVGARYFGIGGDTQY
QLDQIAVNLRVVNVSTGEVLSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEIGYTANEPVMLCLMSAIETGVIHLI
NDGINRGLWELKNKGDAKNTILAKYRSMAVPPES SEQ ID NO: 75
CLTAAPKEAARPTLLPRAPSYTDLTHLPSPQGRIFVSVYNIQDETGQFKPYPACNFSTAVPQSATAMLVSALKDSK
WFIPLERQGLQNLLNERKIIRAAQENGSVAINNQRPLSSLVAANILIEGSIIGYESNVKSGGVGARYFGIGASTQY
QLDQIAVNLRAVDVNTGEVLSSVNTSKTILSYEVQAGVFRFIDYQRLLEGELGYTTNEPVMLCLMSAIESGVIYLV
NDGIERNLWQLQNPSEINSPILQRYKNNIVPAES SEQ ID NO: 76
CITSPPKQAAKPTLLPRSQSYQDLTHLPEPQGRLFVSVYNISDETGQFKPYPASNFSTSVPQSATAMLVSALKDSN
WFIPLERQGLQNLLNERKIIRAAQENGTVAVNNRTQLPSLVAANILIEGSIIGYESNVKSGGAGARYFGIGASTQY
QLDQIAVNLRVVNVSTGEVLSSVNTSKTILSYEFQAGVFRYIDYQRLLEGEVGYTVNEPVMLCLMSAIETGVIYLV
NDGISRNLWQLKNASDINSPVLEKYKSIIVP SEQ ID NO: 77
CLTAPPKQAAKPTLMPRAQSYQDLTHLPEPAGKLFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLVSALKDSG
WFIPLERQGLQNLLNERKIIRAAQENGTAAVNNQHQLSSLVAANVLVEGSIIGYESNVKSGGAGARFFGIGASTQY
QLDQIAVNLRVVDVNTGQVLSSVNTSKTILSYEVQAGVFRYIDYQRLLEGEIGYTTNEPVMLCVMSAIETGVIYLV
NDGINRNLWTLKNPQDAKSSVLERYKSTIVP SEQ ID NO: 78
CITTPPQEAAKPILLPRDATYKDLVSLPQPRGKIYVAVYNIQDETGQFQPYPASNESTSVPQSATAMLV
SSLKDSRWFVPLERQGLNNLLNERKIIRAAQQNGTVGDNNASPLPSLYSANVIVEGSIIGYASNVKTGG
FGARYFGIGGSTQYQLDQVAVNLRIVNVHIGEVLSSVNTSKTILSYEIQAGVFRFIDYQRLLEGEAGFT
TNEPVMTCLMSAIEEGVIHLINDGINKKLWALSNAADINSEVLTRYRK SEQ ID NO: 79
ITEVPKEAAKPTLMPRASTYKDLVALPKPNGKIIVSVYSVQDETGQFKPLPASNFSTAVPQSGNAMLTSALKDSGW
FVPLEREGLQNLLNERKIIRAAQENGTVAANNQQPLPSLLSANVVIEGAIIGYDSDIKTGGAGARYFGIGADGKYR
VDQVAVNLRAVDVRTGEVLLSVNTSKTILSSELSAGVFRFIEYQRLLELEAGYTTNEPVMMCMMSALEAGVAHLIV
EGIRQNLWSLQNPSDINNPIIQRYMKEDVP SEQ ID NO: 80
PETSESPTLMQRGANYIDLISLPKPQGKIFVSVYDFRDQTGQYKPQPNSNFSTAVPQGGTALLTMALLDSEWFYPL
ERQGLQNLLTERKIIRAAQKKQESISNHGSTLPSLLSANVMIEGGIVAYDSNIKTGGAGARYLGIGGSGQYRADQV
TVNIRAVDVRSGKILTSVTTSKTILSYEVSAGAFRFVDYKELLEVELGYTNNEPVNIALMSAIDSAVIHLIVKGVQ
QGLWRPANLDTRNNPIFKKY SEQ ID NO: 81
PDASESPTLMQRGATYLDLISLPKPQGKIYVSVYDFRDQTGQYKPQPNSNFSTAVPQGGTALLTMALLDSEWFYPL
ERQGLQNLLTERKIIRAAQKKQESISNHGSTLPSLLSANVMIEGGIVAYDSNIKTGGAGARYLGIGGSGQYRADQV
TVNIRAVDVRSGKILTSVTTSKTILSYELSAGAFRFVDYKELLEVELGYTNNEPVNIALMSAIDSAVIHLIVKGIE
EGLWRPENQNGKENPIFRKY SEQ ID NO: 82
PETSKEPTLMARGTAYQDLVSLPLPKGKVYVSVYDFRDQTGQYKPQPNSNFSTAVPQGGAALLTTALLDSRWFMPL
EREGLQNLLTERKIIRAAQKKDEIPTNHGVHLPSLASANIMVEGGIVAYDTNIQTGGAGARYLGVGASGQYRTDQV
TVNIRAVDVRTGRILLSVTTSKTILSKELQTGVFKFVDYKDLLEAELGYTTNEPVNLAVMSAIDAAVVHVIVDGIK
TGLWEPLRGEDLQHPIIQEYMNRSKP SEQ ID NO: 83
CATHIGSPVADEKATLMPRSVSYKELISLPKPKGKIVAAVYDFRDQTGQYLPAPASNFSTAVTQGGVAMLSTALWD
SQWFVPLEREGLQNLLTERKIVRAAQNKPNVPGNNANQLPSLVAANILIEGGIVAYDSNVRTGGAGAKYFGIGASG
EYRVDQVTVNLRAVDIRSGRILNSVTTSKTVMSQQVQAGVFRFVEYKRLLEAEAGFSTNEPVQMCVMSAIESGVIR
LIANGVRDNLWQLADQRDIDNPILQEYLQDNAP SEQ ID NO: 84
ASSSLMPKGESYYDLINLPAPQGVMLAAVYDFRDQTGQYKPIPSSNFSTAVPQSGTAFLAQALNDSSWFIPVEREG
LQNLLTERKIVRAGLKGDANKLPQLNSAQILMEGGIVAYDTNVRTGGAGARYLGIGAATQFRVDTVTVNLRAVDIR
TGRLLSSVTTTKSILSKEITAGVFKFIDAQELLESELGYTSNEPVSLCVASAIESAVVHMIADGIWKGAWNLADQA
SGLRSPVLQKY SEQ ID NO: 85
QDSETPTLTPRASTYYDLINMPRPKGRLMAVVYGFRDQTGQYKPTPASSFSTSVTQGAASMLMDALSASGWFVVLE
REGLQNLLTERKIIRASQKKPDVAENIMGELPPLQAANLMLEGGIIAYDTNVRSGGEGARYLGIDISREYRVDQVT
VNLRAVDVRTGQVLANVMTSKTIYSVGRSAGVFKFIEFKKLLEAEVGYTTNEPAQLCVLSAIESAVGHLLAQGIEQ
RLWQV SEQ ID NO: 86
MPKSDTYYDLIGLPHPQGSMLAAVYDFRDQTGQYKAIPSSNFSTAVPQSGTAFLAQALNDSSWFVPVER
EGLQNLLTERKIVRAGLKGEANQLPQLSSAQILMEGGIVAYDINIKTGGAGARYLGIGVNSKFRVDTVT
VNLRAVDIRTGRLLSSVITIKSILSKEVSAGVFKFIDAQDLLESELGYISNEPVSLCVAQAIESAVVHM
IADGIWKRAWNLADTASGLNNPVLQKY SEQ ID NO: 87
LTRRMSTYQDLIDMPAPRGKIVTAVYSFRDQSGQYKPAPSSSFSTAVTQGAAAMLVNVLNDSGWFIPLEREGLQNI
LTERKIIRAALKKDNVPVNNSAGLPSLLAANIMLEGGIVGYDSNIHTGGAGARYFGIGASEKYRVDEVTVNLRAID
IRTGRILHSVLTSKKILSREIRSDVYRFIEFKHLLEMEAGITTNDPAQLCVLSAIESAVAHLIVDGVIKKSWSLAD
PNELNSPVIQAYQQQRI SEQ ID NO: 88
PSDPERSTMGELTPSTAELRNLPLPNEKIVIGVYKFRDQTGQYKPSENGNNWSTAVPQGTTTILIKALEDSRWFIP
IERENIANLLNERQIIRSTRQEYMKDADKNSQSLPPLLYAGILLEGGVISYDSNTMTGGFGARYFGIGASTQYRQD
RITIYLRAVSTLNGEILKTVYTSKTILSTSVNGSFFRYIDTERLLEAEVGLTQNEPVQLAVTEAIEKAVRSLIIEG
TRDKIW
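Several construct names in the table above, for example T4C, N17S and P35-TEV-S36, appear to number residues from the start of mature CsgF. On that reading, residue 1 corresponds to G20 of the preprotein, since SEQ ID NO: 6 covers residues 20:138 of P0AE98.

The sketch below encodes that assumption. It removes the 19-residue signal peptide from SEQ ID NO: 5 and checks a handful of wild-type residue labels against the resulting mature sequence. It checks only the labels listed in the example. The helper names are hypothetical.

```python
# SEQ ID NO: 5, WT pre-CsgF from E. coli K12 (P0AE98, residues 1-138).
PRE_CSGF = (
    "MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSAL"
    "DNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTDF"
)
SIGNAL_PEPTIDE_LEN = 19  # mature CsgF starts at preprotein residue 20 (SEQ ID NO: 6)
MATURE_CSGF = PRE_CSGF[SIGNAL_PEPTIDE_LEN:]


def to_preprotein_position(mature_pos: int) -> int:
    """Convert a 1-based mature CsgF position to 1-based preprotein numbering."""
    return mature_pos + SIGNAL_PEPTIDE_LEN


def label_matches(label: str, mature: str = MATURE_CSGF) -> bool:
    """Check a wild-type label such as 'N17' (residue letter + 1-based mature position)."""
    residue, pos = label[0], int(label[1:])
    return 1 <= pos <= len(mature) and mature[pos - 1] == residue


if __name__ == "__main__":
    print(len(MATURE_CSGF))  # 119, matching the 119-residue SEQ ID NO: 6
    # Wild-type residues named in construct labels such as T4C, N17S, P35-TEV-S36
    print([label_matches(x) for x in ("G1", "T4", "N17", "P35", "S36", "T45", "P46")])  # all True
    print(to_preprotein_position(17))  # 36
```

The same offset of 19 links the truncated mature peptides (for example SEQ ID NO: 39, residues 20:48 of the preprotein) to the first 29 residues of mature CsgF.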
REFERENCES
[0502] Chin J W, Martin A B, King D S, Wang L, Schultz P G. (2002) Addition of a photocrosslinking amino acid to the genetic code of Escherichia coli. Proc Natl Acad Sci USA 99(17):11020-11024.
[0503] Goyal P, Van Gerven N, Jonckheere W, Remaut H. (2013) Crystallization and preliminary X-ray crystallographic analysis of the curli transporter CsgG. Acta Crystallogr Sect F Struct Biol Cryst Commun 69(Pt 12):1349-53.
[0504] Goyal P, Krasteva P V, Van Gerven N, Gubellini F, Van den Broeck I, Troupiotis-Tsailaki A, Jonckheere W, Pehau-Arnaudet G, Pinkner J S, Chapman M R, Hultgren S J, Howorka S, Fronzes R, Remaut H. (2014) Structural and mechanistic insights into the bacterial amyloid secretion channel CsgG. Nature 516(7530):250-3.
[0505] Hammar M, Arnqvist A, Bian Z, Olsen A, Normark S. (1995) Expression of two csg operons is required for production of fibronectin- and congo red-binding curli polymers in Escherichia coli K-12. Mol Microbiol 18(4):661-70.
[0506] Juncker A S, Willenbrock H, Von Heijne G, Brunak S, Nielsen H, Krogh A. (2003) Prediction of lipoprotein signal peptides in Gram-negative bacteria. Protein Sci 12(8):1652-62.
[0507] Ludtke S J. (2016) Single-particle refinement and variability analysis in EMAN2.1. Methods Enzymol 579:159-89.
[0508] Rohou A, Grigorieff N. (2015) CTFFIND4: Fast and accurate defocus estimation from electron micrographs. J Struct Biol 192(2):216-21.
[0509] Robinson L S, Ashman E M, Hultgren S J, Chapman M R. (2006) Secretion of curli fibre subunits is mediated by the outer membrane-localized CsgG protein. Mol Microbiol 59:870-881.
[0510] Scheres. (2012) RELION: implementation of a Bayesian approach to cryo-EM structure determination. J Struct Biol 180(3):519-30.
[0511] Wang A, Winblade Nairn N, Marelli M, Grabstein K. (2012) Protein Engineering with Non-Natural Amino Acids. In: Protein Engineering, Prof. Pravin Kaumaya (Ed.), InTech. DOI: 10.5772/28719.
[0512] Zheng S Q, Palovcak E, Armache J-P, Verba K A, Cheng Y, Agard D A. (2017) MotionCor2: anisotropic correction of beam-induced
Sequence CWU 1
1
1201834DNAEscherichia coli 1atgcagcgct tatttctttt ggttgccgtc
atgttactga gcggatgctt aaccgccccg 60cctaaagaag ccgccagacc gacattaatg
cctcgtgctc agagctacaa agatttgacc 120catctgccag cgccgacggg
taaaatcttt gtttcggtat acaacattca ggacgaaacc 180gggcaattta
aaccctaccc ggcaagtaac ttctccactg ctgttccgca aagcgccacg
240gcaatgctgg tcacggcact gaaagattct cgctggttta taccgctgga
gcgccagggc 300ttacaaaacc tgcttaacga gcgcaagatt attcgtgcgg
cacaagaaaa cggcacggtt 360gccattaata accgaatccc gctgcaatct
ttaacggcgg caaatatcat ggttgaaggt 420tcgattatcg gttatgaaag
caacgtcaaa tctggcgggg ttggggcaag atattttggc 480atcggtgccg
acacgcaata ccagctcgat cagattgccg tgaacctgcg cgtcgtcaat
540gtgagtaccg gcgagatcct ttcttcggtg aacaccagta agacgatact
ttcctatgaa 600gttcaggccg gggttttccg ctttattgac taccagcgct
tgcttgaagg ggaagtgggt 660tacacctcga acgaacctgt tatgctgtgc
ctgatgtcgg ctatcgaaac aggggtcatt 720ttcctgatta atgatggtat
cgaccgtggt ctgtgggatt tgcaaaataa agcagaacgg 780cagaatgaca
ttctggtgaa ataccgccat atgtcggttc caccggaatc ctga
8342277PRTEscherichia coli 2Met Gln Arg Leu Phe Leu Leu Val Ala Val
Met Leu Leu Ser Gly Cys1 5 10 15Leu Thr Ala Pro Pro Lys Glu Ala Ala
Arg Pro Thr Leu Met Pro Arg 20 25 30Ala Gln Ser Tyr Lys Asp Leu Thr
His Leu Pro Ala Pro Thr Gly Lys 35 40 45Ile Phe Val Ser Val Tyr Asn
Ile Gln Asp Glu Thr Gly Gln Phe Lys 50 55 60Pro Tyr Pro Ala Ser Asn
Phe Ser Thr Ala Val Pro Gln Ser Ala Thr65 70 75 80Ala Met Leu Val
Thr Ala Leu Lys Asp Ser Arg Trp Phe Ile Pro Leu 85 90 95Glu Arg Gln
Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile Arg 100 105 110Ala
Ala Gln Glu Asn Gly Thr Val Ala Ile Asn Asn Arg Ile Pro Leu 115 120
125Gln Ser Leu Thr Ala Ala Asn Ile Met Val Glu Gly Ser Ile Ile Gly
130 135 140Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg Tyr
Phe Gly145 150 155 160Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln
Ile Ala Val Asn Leu 165 170 175Arg Val Val Asn Val Ser Thr Gly Glu
Ile Leu Ser Ser Val Asn Thr 180 185 190Ser Lys Thr Ile Leu Ser Tyr
Glu Val Gln Ala Gly Val Phe Arg Phe 195 200 205Ile Asp Tyr Gln Arg
Leu Leu Glu Gly Glu Val Gly Tyr Thr Ser Asn 210 215 220Glu Pro Val
Met Leu Cys Leu Met Ser Ala Ile Glu Thr Gly Val Ile225 230 235
240Phe Leu Ile Asn Asp Gly Ile Asp Arg Gly Leu Trp Asp Leu Gln Asn
245 250 255Lys Ala Glu Arg Gln Asn Asp Ile Leu Val Lys Tyr Arg His
Met Ser 260 265 270Val Pro Pro Glu Ser 2753262PRTEscherichia coli
3Cys Leu Thr Ala Pro Pro Lys Glu Ala Ala Arg Pro Thr Leu Met Pro1 5
10 15Arg Ala Gln Ser Tyr Lys Asp Leu Thr His Leu Pro Ala Pro Thr
Gly 20 25 30Lys Ile Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly
Gln Phe 35 40 45Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro
Gln Ser Ala 50 55 60Thr Ala Met Leu Val Thr Ala Leu Lys Asp Ser Arg
Trp Phe Ile Pro65 70 75 80Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu
Asn Glu Arg Lys Ile Ile 85 90 95Arg Ala Ala Gln Glu Asn Gly Thr Val
Ala Ile Asn Asn Arg Ile Pro 100 105 110Leu Gln Ser Leu Thr Ala Ala
Asn Ile Met Val Glu Gly Ser Ile Ile 115 120 125Gly Tyr Glu Ser Asn
Val Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe 130 135 140Gly Ile Gly
Ala Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn145 150 155
160Leu Arg Val Val Asn Val Ser Thr Gly Glu Ile Leu Ser Ser Val Asn
165 170 175Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val
Phe Arg 180 185 190Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Val
Gly Tyr Thr Ser 195 200 205Asn Glu Pro Val Met Leu Cys Leu Met Ser
Ala Ile Glu Thr Gly Val 210 215 220Ile Phe Leu Ile Asn Asp Gly Ile
Asp Arg Gly Leu Trp Asp Leu Gln225 230 235 240Asn Lys Ala Glu Arg
Gln Asn Asp Ile Leu Val Lys Tyr Arg His Met 245 250 255Ser Val Pro
Pro Glu Ser 2604414PRTEscherichia coli 4Ala Thr Gly Cys Gly Thr Gly
Thr Cys Ala Ala Ala Cys Ala Thr Gly1 5 10 15Cys Ala Gly Thr Ala Gly
Thr Thr Cys Thr Ala Cys Thr Cys Ala Thr 20 25 30Gly Cys Thr Thr Ala
Thr Thr Thr Cys Gly Cys Cys Ala Thr Thr Ala 35 40 45Ala Gly Thr Thr
Gly Gly Gly Cys Thr Gly Gly Ala Ala Cys Cys Ala 50 55 60Thr Gly Ala
Cys Thr Thr Thr Cys Cys Ala Gly Thr Thr Cys Cys Gly65 70 75 80Thr
Ala Ala Thr Cys Cys Ala Ala Ala Cys Thr Thr Thr Gly Gly Thr 85 90
95Gly Gly Thr Ala Ala Cys Cys Cys Ala Ala Ala Thr Ala Ala Thr Gly
100 105 110Gly Cys Gly Cys Thr Thr Thr Thr Thr Thr Ala Thr Thr Ala
Ala Ala 115 120 125Thr Ala Gly Cys Gly Cys Thr Cys Ala Gly Gly Cys
Cys Cys Ala Ala 130 135 140Ala Ala Cys Thr Cys Thr Thr Ala Thr Ala
Ala Ala Gly Ala Thr Cys145 150 155 160Cys Gly Ala Gly Cys Thr Ala
Thr Ala Ala Cys Gly Ala Thr Gly Ala 165 170 175Cys Thr Thr Thr Gly
Gly Thr Ala Thr Thr Gly Ala Ala Ala Cys Ala 180 185 190Cys Cys Cys
Thr Cys Ala Gly Cys Gly Thr Thr Ala Gly Ala Thr Ala 195 200 205Ala
Cys Thr Thr Thr Ala Cys Thr Cys Ala Gly Gly Cys Cys Ala Thr 210 215
220Cys Cys Ala Gly Thr Cys Ala Cys Ala Ala Ala Thr Thr Thr Thr
Ala225 230 235 240Gly Gly Thr Gly Gly Gly Cys Thr Ala Cys Thr Gly
Thr Cys Gly Ala 245 250 255Ala Thr Ala Thr Thr Ala Ala Thr Ala Cys
Cys Gly Gly Thr Ala Ala 260 265 270Ala Cys Cys Gly Gly Gly Cys Cys
Gly Cys Ala Thr Gly Gly Thr Gly 275 280 285Ala Cys Cys Ala Ala Cys
Gly Ala Thr Thr Ala Thr Ala Thr Thr Gly 290 295 300Thr Cys Gly Ala
Thr Ala Thr Thr Gly Cys Cys Ala Ala Cys Cys Gly305 310 315 320Cys
Gly Ala Thr Gly Gly Thr Cys Ala Ala Thr Thr Gly Cys Ala Gly 325 330
335Thr Thr Gly Ala Ala Cys Gly Thr Gly Ala Cys Ala Gly Ala Thr Cys
340 345 350Gly Thr Ala Ala Ala Ala Cys Cys Gly Gly Ala Cys Ala Ala
Ala Cys 355 360 365Cys Thr Cys Gly Ala Cys Cys Ala Thr Cys Cys Ala
Gly Gly Thr Thr 370 375 380Thr Cys Gly Gly Gly Thr Thr Thr Ala Cys
Ala Ala Ala Ala Thr Ala385 390 395 400Ala Cys Thr Cys Ala Ala Cys
Cys Gly Ala Thr Thr Thr Thr 405 4105138PRTEscherichia coli 5Met Arg
Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu1 5 10 15Ser
Trp Ala Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly 20 25
30Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
35 40 45Asn Ser Tyr Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile Glu
Thr 50 55 60Pro Ser Ala Leu Asp Asn Phe Thr Gln Ala Ile Gln Ser Gln
Ile Leu65 70 75 80Gly Gly Leu Leu Ser Asn Ile Asn Thr Gly Lys Pro
Gly Arg Met Val 85 90 95Thr Asn Asp Tyr Ile Val Asp Ile Ala Asn Arg
Asp Gly Gln Leu Gln 100 105 110Leu Asn Val Thr Asp Arg Lys Thr Gly
Gln Thr Ser Thr Ile Gln Val 115 120 125Ser Gly Leu Gln Asn Asn Ser
Thr Asp Phe 130 1356119PRTEscherichia coli 6Gly Thr Met Thr Phe Gln
Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly Ala Phe
Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asp Pro Ser
Tyr Asn Asp Asp Phe Gly Ile Glu Thr Pro Ser Ala 35 40 45Leu Asp Asn
Phe Thr Gln Ala Ile Gln Ser Gln Ile Leu Gly Gly Leu 50 55 60Leu Ser
Asn Ile Asn Thr Gly Lys Pro Gly Arg Met Val Thr Asn Asp65 70 75
80Tyr Ile Val Asp Ile Ala Asn Arg Asp Gly Gln Leu Gln Leu Asn Val
85 90 95Thr Asp Arg Lys Thr Gly Gln Thr Ser Thr Ile Gln Val Ser Gly
Leu 100 105 110Gln Asn Asn Ser Thr Asp Phe 1157106DNAArtificial
Sequencefragment of wild-type E. coli CsgF encoding amino acids 1
to 27 and a C-terminal 6 His tag 7atgcgtgtca aacatgcagt agttctactc
atgcttattt cgccattaag ttgggctgga 60accatgactt tccagttccg tcatcaccat
caccatcact aagccc 106833PRTArtificial Sequencefragment of wild-type
E. coli CsgF encompassing amino acids 1 to 27 and a C-terminal 6
His tag 8Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser
Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met Thr Phe Gln Phe Arg His His
His His His 20 25 30His9139DNAArtificial Sequencefragment of
wild-type E. coli CsgF encoding amino acids 1 to 38 and a
C-terminal 6 His tag 9atgcgtgtca aacatgcagt agttctactc atgcttattt
cgccattaag ttgggctgga 60accatgactt tccagttccg taatccaaac tttggtggta
acccaaataa tggccatcac 120catcaccatc actaagccc 1391044PRTArtificial
Sequencefragment of wild-type E. coli CsgF encompassing amino acids
1 to 38 and a C-terminal 6 His tag 10Met Arg Val Lys His Ala Val
Val Leu Leu Met Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met
Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Asn Asn
Gly His His His His His His 35 4011169DNAArtificial
Sequencefragment of wild-type E. coli CsgF encoding amino acids 1
to 48 and a C-terminal 6 His tag 11atgcgtgtca aacatgcagt agttctactc
atgcttattt cgccattaag ttgggctgga 60accatgactt tccagttccg taatccaaac
tttggtggta acccaaataa tggcgctttt 120ttattaaata gcgctcaggc
ccaacatcac catcaccatc actaagccc 1691254PRTArtificial
Sequencefragment of wild-type E. coli CsgF encompassing amino acids
1 to 48 and a C-terminal 6 His tag 12Met Arg Val Lys His Ala Val
Val Leu Leu Met Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met
Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Asn Asn
Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln 35 40 45His His His His
His His 5013217DNAArtificial Sequencefragment of wild-type E. coli
CsgF encoding amino acids 1 to 64 and a C-terminal 6 His tag
13atgcgtgtca aacatgcagt agttctactc atgcttattt cgccattaag ttgggctgga
60accatgactt tccagttccg taatccaaac tttggtggta acccaaataa tggcgctttt
120ttattaaata gcgctcaggc ccaaaactct tataaagatc cgagctataa
cgatgacttt 180ggtattgaaa cacatcacca tcaccatcac taagccc
2171470PRTArtificial Sequencefragment of wild-type E. coli CsgF
encompassing amino acids 1 to 64 and a C-terminal 6 His tag 14Met
Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu1 5 10
15Ser Trp Ala Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
20 25 30Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala
Gln 35 40 45Asn Ser Tyr Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile
Glu Thr 50 55 60His His His His His His65 701534PRTArtificial
Sequenceresidues 20 to 53 of E. coli CsgF 15Gly Thr Met Thr Phe Gln
Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly Ala Phe
Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys
Asp1625PRTArtificial Sequenceresidues 20 to 42 of E. coli CsgF,
including KD at its C-terminus 16Gly Thr Met Thr Phe Gln Phe Arg
Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly Ala Phe Leu Leu
Lys Asp 20 251732PRTArtificial Sequenceresidues 23 to 55 of CsgF
homologue Q88H88 17Thr Glu Leu Val Tyr Thr Pro Val Asn Pro Ala Phe
Gly Gly Asn Pro1 5 10 15Leu Asn Gly Thr Trp Leu Leu Asn Asn Ala Gln
Ala Gln Asn Asp Tyr 20 25 301832PRTArtificial Sequenceresidues 25
to 57 of CsgF homologue A0A143HJA0 18Thr Glu Leu Ile Tyr Glu Pro
Val Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Leu Asn Gly Ser Tyr Leu
Leu Asn Asn Ala Gln Ala Gln Asp Arg His 20 25 301932PRTArtificial
Sequenceresidues 21 to 53 of CsgF homologue Q5E245 19Ser Glu Leu
Val Tyr Thr Pro Val Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Leu Asn
Thr Ser His Leu Phe Gly Gly Ala Asn Ala Ile Asn Asp Tyr 20 25
302032PRTArtificial Sequenceresidues 19 to 51 of CsgF homologue
Q084E5 20Thr Gln Leu Val Tyr Thr Pro Val Asn Pro Ala Phe Gly Gly
Ser Tyr1 5 10 15Leu Asn Gly Ser Tyr Leu Leu Ala Asn Ala Ser Ala Gln
Asn Glu His 20 25 302132PRTArtificial Sequenceresidues 15 to 47 of
CsgF homologue F0LZU2 21Ser Ser Leu Val Tyr Glu Pro Val Asn Pro Thr
Phe Gly Gly Asn Pro1 5 10 15Leu Asn Thr Thr His Leu Phe Ser Arg Ala
Glu Ala Ile Asn Asp Tyr 20 25 302232PRTArtificial Sequenceresidues
26 to 58 of CsgF homologue A0A136HQR0 22Thr Glu Leu Val Tyr Glu Pro
Ile Asn Pro Ser Phe Gly Gly Asn Pro1 5 10 15Leu Asn Gly Ser Phe Leu
Leu Ser Lys Ala Asn Ser Gln Asn Ala His 20 25 302332PRTArtificial
Sequenceresidues 21 to 53 of CsgF homologue A0A0W1SRL3 23Thr Glu
Ile Val Tyr Gln Pro Ile Asn Pro Ser Phe Gly Gly Asn Pro1 5 10 15Met
Asn Gly Ser Phe Leu Leu Gln Lys Ala Gln Ser Gln Asn Ala His 20 25
302433PRTArtificial Sequenceresidues 26 to 59 of CsgF homologue
B0UH01 24Ser Ser Leu Val Tyr Gln Pro Val Asn Pro Ala Phe Gly Gly
Pro Gln1 5 10 15Leu Asn Gly Ser Trp Leu Gln Ala Glu Ala Asn Ala Gln
Asn Ile Pro 20 25 30Gln2531PRTArtificial Sequenceresidues 22 to 53
of CsgF homologue Q6NAU5 25Gly Ser Leu Val Tyr Thr Pro Thr Asn Pro
Ala Phe Gly Gly Ser Pro1 5 10 15Leu Asn Gly Ser Trp Gln Met Gln Gln
Ala Thr Ala Gly Asn His 20 25 302632PRTArtificial Sequenceresidues
7 to 38 of CsgF homologue G8PUY5 26Gln Gln Leu Ile Tyr Gln Pro Thr
Asn Pro Ser Phe Gly Gly Tyr Ala1 5 10 15Ala Asn Thr Thr His Leu Phe
Ala Thr Ala Asn Ala Gln Lys Thr Ala 20 25 302732PRTArtificial
Sequenceresidues 25 to 57 of CsgF homologue A0A0S2ETP7 27Gly Asp
Leu Val Tyr Thr Pro Val Asn Pro Ser Phe Gly Gly Ser Pro1 5 10 15Leu
Asn Ser Ala His Leu Leu Ser Ile Ala Gly Ala Gln Lys Asn Ala 20 25
302832PRTArtificial Sequenceresidues 19 to 51 of CsgF homologue
E3I1Z1 28Ala Glu Leu Gly Tyr Thr Pro Val Asn Pro Ser Phe Gly Gly
Ser Pro1 5 10 15Leu Asn Gly Ser Thr Leu Leu Ser Glu Ala Ser Ala Gln
Lys Pro Asn 20 25 302931PRTArtificial Sequenceresidues 24 to 55 of
CsgF homologue F3Z094 29Thr Glu Leu Val Phe Ser Phe Thr Asn Pro Ser
Phe Gly Gly Asp Pro1 5 10 15Met Ile Gly Asn Phe Leu Leu Asn Lys Ala
Asp Ser Gln Lys Arg 20 25 303032PRTArtificial Sequenceresidues 21
to 53 of CsgF homologue A0A176T7M2 30Gln Gln Leu Val Tyr Lys Ser
Ile Asn Pro Phe Phe
Gly Gly Gly Asp1 5 10 15Ser Phe Ala Tyr Gln Gln Leu Leu Ala Ser Ala
Asn Ala Gln Asn Asp 20 25 303131PRTArtificial Sequenceresidues 14
to 45 of CsgF homologue D2QPP8 31Gln Ala Leu Val Tyr His Pro Asn
Asn Pro Ala Phe Gly Gly Asn Thr1 5 10 15Phe Asn Tyr Gln Trp Met Leu
Ser Ser Ala Gln Ala Gln Asp Arg 20 25 303232PRTArtificial
Sequenceresidues 28 to 58 of CsgF homologue N2IYT1 32Thr Glu Leu
Val Tyr Thr Pro Lys Asn Pro Ala Phe Gly Gly Ser Pro1 5 10 15Leu Asn
Gly Ser Tyr Leu Leu Gly Asn Ala Gln Ala Gln Asn Asp Tyr 20 25
303332PRTArtificial Sequenceresidues 26 to 58 of CsgF homologue
W7QHV5 33Gly Gln Leu Ile Tyr Gln Pro Ile Asn Pro Ser Phe Gly Gly
Asp Pro1 5 10 15Leu Leu Gly Asn His Leu Leu Asn Lys Ala Gln Ala Gln
Asp Thr Lys 20 25 303432PRTArtificial Sequenceresidues 23 to 55 of
CsgF homologue D4ZLW2 34Thr Gln Leu Ile Tyr Thr Pro Val Asn Pro Asn
Phe Gly Gly Ser Tyr1 5 10 15Leu Asn Gly Ser Tyr Leu Leu Ala Asn Ala
Ser Val Gln Asn Asp His 20 25 303532PRTArtificial Sequenceresidues
21 to 53 of CsgF homologue D2QT92 35Gln Ala Phe Val Tyr His Pro Asn
Asn Pro Asn Phe Gly Gly Asn Thr1 5 10 15Phe Asn Tyr Ser Trp Met Leu
Ser Ser Ala Gln Ala Gln Asp Arg Thr 20 25 303631PRTArtificial
Sequenceresidues 20 to 51 of CsgF homologue A0A167UJA2 36Gln Gly
Leu Ile Tyr Lys Pro Lys Asn Pro Ala Phe Gly Gly Asp Thr1 5 10 15Phe
Asn Tyr Gln Trp Leu Ala Ser Ser Ala Glu Ser Gln Asn Lys 20 25
30378PRTArtificial Sequenceresidues 20 to 27 of wild-type E. coli
CsgF 37Gly Thr Met Thr Phe Gln Phe Arg1 53819PRTArtificial
Sequenceresidues 20 to 38 of wild-type E. coli CsgF 38Gly Thr Met
Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn
Gly3929PRTArtificial Sequenceresidues 20 to 48 of wild-type E. coli
CsgF 39Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn
Pro1 5 10 15Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln 20
254045PRTArtificial Sequenceresidues 20 to 64 of wild-type E. coli
CsgF 40Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn
Pro1 5 10 15Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn
Ser Tyr 20 25 30Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile Glu Thr
35 40 454124DNAArtificial Sequenceprimer CsgF_d27_end 41acggaactgg
aaagtcatgg ttcc 244230DNAArtificial Sequenceprimer CsgF_d38_end
42gccattattt gggttaccac caaagtttgg 304330DNAArtificial
Sequenceprimer CsgF_d48_end 43ttgggcctga gcgctattta ataaaaaagc
304433DNAArtificial Sequenceprimer CsgF_d64_end 44tgtttcaata
ccaaagtcat cgttatagct cgg 334525DNAArtificial Sequenceprimer
pNa62_CsgF_histag_Fw 45catcaccatc accatcacta agccc
254632DNAArtificial Sequenceprimer CsgF-His_pET22b_FW 46cccccatatg
ggaaccatga ctttccagtt cc 324755DNAArtificial Sequenceprimer
CsgF-His_pET22b_Rev 47ccccgaattc ctaatggtga tggtgatggt ggtaaaaatc
ggttgagtta ttttg 554858DNAArtificial Sequenceprimer
csgEFG_pDONR221_FW 48ggggacaagt ttgtacaaaa aagcaggcta cctcaggcga
taaagccatg aaacgtta 584972DNAArtificial Sequenceprimer
csgEFG_pDONR221_Rev 49ggggaccact ttgtacaaga aagctgggtg tttaaactca
tttttcgaac tgcgggtggc 60tccaagcgct gg 725059DNAArtificial
Sequenceprimer Mut_csgF_His_FW 50caaaataact caaccgattt tcatcaccat
caccatcact aagccccagc ttcataagg 595159DNAArtificial Sequenceprimer
Mut_csgF_His_Rev 51ccttatgaag ctggggctta gtgatggtga tggtgatgaa
aatcggttga gttattttg 595221DNAArtificial Sequenceprimer DelCsgE_Rev
52agcctgcttt tttgtacaaa c 215323DNAArtificial Sequenceprimer
DelCsgE FW 53ataaaaaatt gttcggaggc tgc 235430PRTArtificial
Sequenceresidues 1 to 29 of mature E. coli CsgF 54Gly Thr Met Thr
Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly
Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn 20 25

SEQ ID NO: 55 (length 35, PRT, Artificial Sequence): residues 1 to 45 of mature E. coli CsgF
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro

SEQ ID NO: 56 (length 155, PRT, Artificial Sequence): a mutated (T4C/N17S) CsgF sequence with a signal sequence, and a TEV protease cleavage site (ENLYFQS) inserted between residues 35 and 36 of sequence of the mature protein
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Gly Thr Met Cys Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Ser Tyr Lys Asp Pro Glu Asn Leu Tyr Phe Gln Ser Ser Tyr Asn
Asp Asp Phe Gly Ile Glu Thr Pro Ser Ala Leu Asp Asn Phe Thr Gln
Ala Ile Gln Ser Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr
Gly Lys Pro Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala
Asn Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly
Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
Phe Ser Ala Trp Ser His Pro Gln Phe Glu Lys

SEQ ID NO: 57 (length 155, PRT, Artificial Sequence): a mutated (N17S-Del) CsgF sequence with a signal sequence, and a TEV protease cleavage site (ENLYFQS) inserted between residues 35 and 36 of sequence of the mature protein
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Ser Tyr Lys Asp Pro Glu Asn Leu Tyr Phe Gln Ser Ser Tyr Asn
Asp Asp Phe Gly Ile Glu Thr Pro Ser Ala Leu Asp Asn Phe Thr Gln
Ala Ile Gln Ser Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr
Gly Lys Pro Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala
Asn Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly
Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
Phe Ser Ala Trp Ser His Pro Gln Phe Glu Lys

SEQ ID NO: 58 (length 155, PRT, Artificial Sequence): a mutated (G1C/N17S) CsgF sequence with a signal sequence, and a TEV protease cleavage site (ENLYFQS) inserted between residues 35 and 36 of sequence of the mature protein
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Ser Tyr Lys Asp Pro Glu Asn Leu Tyr Phe Gln Ser Ser Tyr Asn
Asp Asp Phe Gly Ile Glu Thr Pro Ser Ala Leu Asp Asn Phe Thr Gln
Ala Ile Gln Ser Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr
Gly Lys Pro Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala
Asn Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly
Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
Phe Ser Ala Trp Ser His Pro Gln Phe Glu Lys

SEQ ID NO: 59 (length 155, PRT, Artificial Sequence): a mutated (G1C) CsgF sequence with a signal sequence, and a TEV protease cleavage site (ENLYFQS) inserted between residues 35 and 36 of sequence of the mature protein
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Ser Tyr Lys Asp Pro Glu Asn Leu Tyr Phe Gln Ser Ser Tyr Asn
Asp Asp Phe Gly Ile Glu Thr Pro Ser Ala Leu Asp Asn Phe Thr Gln
Ala Ile Gln Ser Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr
Gly Lys Pro Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala
Asn Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly
Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
Phe Ser Ala Trp Ser His Pro Gln Phe Glu Lys

SEQ ID NO: 60 (length 155, PRT, Artificial Sequence): a CsgF sequence with a signal sequence, a TEV protease cleavage site (ENLYFQS) inserted between residues 45 and 46 of sequence of the mature protein, and a His10 tag at the C-terminus
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Ser Tyr Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile Glu Thr
Glu Asn Leu Tyr Phe Gln Ser Pro Ser Ala Leu Asp Asn Phe Thr Gln
Ala Ile Gln Ser Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr
Gly Lys Pro Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala
Asn Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly
Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
Phe His His His His His His His His His His

SEQ ID NO: 61 (length 155, PRT, Artificial Sequence): a CsgF sequence with a signal sequence, a TEV protease cleavage site (ENLYFQS) inserted between residues 35 and 36 of sequence of the mature protein, and a His10 tag at the C-terminus
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Ser Tyr Lys Asp Pro Glu Asn Leu Tyr Phe Gln Ser Ser Tyr Asn
Asp Asp Phe Gly Ile Glu Thr Pro Ser Ala Leu Asp Asn Phe Thr Gln
Ala Ile Gln Ser Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr
Gly Lys Pro Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala
Asn Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly
Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
Phe His His His His His His His His His His

SEQ ID NO: 62 (length 155, PRT, Artificial Sequence): a CsgF sequence with a signal sequence, a TEV protease cleavage site (ENLYFQS) inserted between residues 30 and 31 of sequence of the mature protein, and a His10 tag at the C-terminus
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Glu Asn Leu Tyr Phe Gln Ser Ser Tyr Lys Asp Pro Ser Tyr Asn
Asp Asp Phe Gly Ile Glu Thr Pro Ser Ala Leu Asp Asn Phe Thr Gln
Ala Ile Gln Ser Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr
Gly Lys Pro Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala
Asn Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly
Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
Phe His His His His His His His His His His

SEQ ID NO: 63 (length 149, PRT, Artificial Sequence): a CsgF sequence with a signal sequence, a TEV protease cleavage site (ENLYFQS) inserted between residues 45 and 51 of sequence of the mature protein, and a His10 tag at the C-terminus
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Ser Tyr Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile Glu Thr
Glu Asn Leu Tyr Phe Gln Ser Phe Thr Gln Ala Ile Gln Ser Gln Ile
Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr Gly Lys Pro Gly Arg Met
Val Thr Asn Asp Tyr Ile Val Asp Ile Ala Asn Arg Asp Gly Gln Leu
Gln Leu Asn Val Thr Asp Arg Lys Thr Gly Gln Thr Ser Thr Ile Gln
Val Ser Gly Leu Gln Asn Asn Ser Thr Asp Phe His His His His His
His His His His His

SEQ ID NO: 64 (length 149, PRT, Artificial Sequence): a CsgF sequence with a signal sequence, a TEV protease cleavage site (ENLYFQS) inserted between residues 30 and 37 of sequence of the mature protein, and a His10 tag at the C-terminus
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Glu Asn Leu Tyr Phe Gln Ser Tyr Asn Asp Asp Phe Gly Ile Glu
Thr Pro Ser Ala Leu Asp Asn Phe Thr Gln Ala Ile Gln Ser Gln Ile
Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr Gly Lys Pro Gly Arg Met
Val Thr Asn Asp Tyr Ile Val Asp Ile Ala Asn Arg Asp Gly Gln Leu
Gln Leu Asn Val Thr Asp Arg Lys Thr Gly Gln Thr Ser Thr Ile Gln
Val Ser Gly Leu Gln Asn Asn Ser Thr Asp Phe His His His His His
His His His His His

SEQ ID NO: 65 (length 155, PRT, Artificial Sequence): a CsgF sequence with a signal sequence, a HCV C3 protease cleavage site (LEVLFQGP) inserted between residues 34 and 36 of sequence of the mature protein, and a His10 tag at the C-terminus
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Ser Tyr Lys Asp Leu Glu Val Leu Phe Gln Gly Pro Ser Tyr Asn
Asp Asp Phe Gly Ile Glu Thr Pro Ser Ala Leu Asp Asn Phe Thr Gln
Ala Ile Gln Ser Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr
Gly Lys Pro Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala
Asn Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly
Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
Phe Ser Ala Trp Ser His Pro Gln Phe Glu Lys

SEQ ID NO: 66 (length 156, PRT, Artificial Sequence): a CsgF sequence with a signal sequence, a HCV C3 protease cleavage site (LEVLFQGP) inserted between residues 42 and 43 of sequence of the mature protein, and a His10 tag at the C-terminus
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Ser Tyr Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile Leu Glu
Val Leu Phe Gln Gly Pro Glu Thr Pro Ser Ala Leu Asp Asn Phe Thr
Gln Ala Ile Gln Ser Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn
Thr Gly Lys Pro Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile
Ala Asn Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr
Gly Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr
Asp Phe Ser Ala Trp Ser His Pro Gln Phe Glu Lys

SEQ ID NO: 67 (length 148, PRT, Artificial Sequence): a CsgF sequence with a signal sequence, a HCV C3 protease cleavage site (LEVLFQGP) inserted between residues 38 and 47 of sequence of the mature protein, and a His10 tag at the C-terminus
Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu
Ser Trp Ala Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
Asn Ser Tyr Lys Asp Pro Ser Tyr Asn Leu Glu Val Leu Phe Gln Gly
Pro Ser Ala Leu Asp Asn Phe Thr Gln Ala Ile Gln Ser Gln Ile Leu
Gly Gly Leu Leu Ser Asn Ile Asn Thr Gly Lys Pro Gly Arg Met Val
Thr Asn Asp Tyr Ile Val Asp Ile Ala Asn Arg Asp Gly Gln Leu Gln
Leu Asn Val Thr Asp Arg Lys Thr Gly Gln Thr Ser Thr Ile Gln Val
Ser Gly Leu Gln Asn Asn Ser Thr Asp Phe Ser Ala Trp Ser His Pro
Gln Phe Glu Lys

SEQ ID NO: 68 (length 248, PRT, Citrobacter koseri)
Met Pro Arg Ala Gln Ser Tyr Lys Asp Leu Thr His Leu Pro Met Pro
Thr Gly Lys Ile Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly
Gln Phe Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro Gln
Ser Ala Thr Ala Met Leu Val Thr Ala Leu Lys Asp Ser Arg Trp Phe
Ile Pro Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys
Ile Ile Arg Ala Ala Gln Glu Asn Gly Thr Val Ala Ile Asn Asn Arg
Ile Pro Leu Gln Ser Leu Thr Ala Ala Asn Ile Met Val Glu Gly Ser
Ile Ile Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg
Tyr Phe Gly Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala
Val Asn Leu Arg Val Val Asn Val Ser Thr Gly Glu Ile Leu Ser Ser
Val Asn Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val
Phe Arg Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ile Gly Tyr
Thr Ser Asn Glu Pro Val Met Leu Cys Leu Met Ser Ala Ile Glu Thr
Gly Val Ile Phe Leu Ile Asn Asp Gly Ile Asp Arg Gly Leu Trp Asp
Leu Gln Asn Lys Ala Glu Arg Gln Asn Asp Ile Leu Val Lys Tyr Arg
His Met Ser Val Pro Pro Glu Ser

SEQ ID NO: 69 (length 223, PRT, Salmonella enterica)
Cys Leu Thr Ala Pro Pro Lys Gln Ala Ala Lys Pro Thr Leu Met Pro
Arg Ala Gln Ser Tyr Lys Asp Leu Thr His Leu Pro Ala Pro Thr Gly
Lys Ile Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe
Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro Gln Ser Ala
Thr Ala Met Leu Val Thr Ala Leu Lys Asp Ser Arg Trp Phe Ile Pro
Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile
Arg Ala Ala Gln Glu Asn Gly Thr Val Ala Met Asn Asn Arg Ile Pro
Leu Gln Ser Leu Thr Ala Ala Asn Ile Met Val Glu Gly Ser Ile Ile
Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe
Gly Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn
Leu Arg Val Val Asn Val Ser Thr Gly Glu Ile Leu Ser Ser Val Asn
Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val Phe Arg
Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ile Gly Tyr Thr Ser
Asn Glu Pro Val Met Leu Cys Leu Met Ser Ala Ile Glu Thr Gly

SEQ ID NO: 70 (length 262, PRT, Citrobacter amalonaticus)
Cys Leu Thr Ala Pro Pro Lys Glu Ala Ala Lys Pro Thr Leu Met Pro
Arg Ala Gln Ser Tyr Lys Asp Leu Thr His Leu Pro Ile Pro Thr Gly
Lys Ile Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe
Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro Gln Ser Ala
Thr Ala Met Leu Val Thr Ala Leu Lys Asp Ser Arg Trp Phe Val Pro
Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile
Arg Ala Ala Gln Glu Asn Gly Thr Val Ala Ile Asn Asn Arg Ile Pro
Leu Gln Ser Leu Thr Ala Ala Asn Ile Met Val Glu Gly Ser Ile Ile
Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe
Gly Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn
Leu Arg Val Val Asn Val Ser Thr Gly Glu Ile Leu Ser Ser Val Asn
Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val Phe Arg
Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ile Gly Tyr Thr Ser
Asn Glu Pro Val Met Leu Cys Leu Met Ser Ala Ile Glu Thr Gly Val
Ile Phe Leu Ile Asn Asp Gly Ile Asp Arg Gly Leu Trp Asp Leu Gln
Asn Lys Ala Asp Arg Gln Asn Asp Ile Leu Val Lys Tyr Arg His Met
Ser Val Pro Pro Glu Ser

SEQ ID NO: 71 (length 262, PRT, Citrobacter rodentium)
Cys Leu Thr Thr Pro Pro Lys Glu Ala Ala Lys Pro Thr Leu Met Pro
Arg Ala Gln Ser Tyr Lys Asp Leu Thr His Leu Pro Val Pro Thr Gly
Lys Ile Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe
Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro Gln Ser Ala
Thr Ala Met Leu Val Thr Ala Leu Lys Asp Ser Arg Trp Phe Ile Pro
Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile
Arg Ala Ala Gln Glu Asn Gly Thr Val Ala Ile Asn Asn Arg Ile Pro
Leu Pro Ser Leu Thr Ala Ala Asn Ile Met Val Glu Gly Ser Ile Ile
Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Ala Gly Ala Arg Tyr Phe
Gly Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn
Leu Arg Val Val Asn Val Ser Thr Gly Glu Ile Leu Ser Ser Val Asn
Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val Phe Arg
Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ile Gly Tyr Thr Ser
Asn Glu Pro Val Met Leu Cys Leu Met Ser Ala Ile Glu Thr Gly Val
Ile Phe Leu Ile Asn Asp Gly Ile Asp Arg Gly Leu Trp Asp Leu Gln
Asn Lys Ala Asp Arg Gln Asn Asp Ile Leu Val Lys Tyr Arg Gln Met
Ser Val Pro Pro Glu Ser

SEQ ID NO: 72 (length 262, PRT, Enterobacter asburiae)
Cys Leu Thr Ala Pro Pro Lys Glu Ala Ala Lys Pro Thr Leu Met Pro
Arg Ala Gln Ser Tyr Arg Asp Leu Thr His Leu Pro Ala Pro Thr Gly
Lys Ile Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe
Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro Gln Ser Ala
Thr Ala Met Leu Val Thr Ala Leu Lys Asp Ser His Trp Phe Ile Pro
Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile
Arg Ala Ala Gln Glu Asn Gly Thr Val Ala Asn Asn Asn Arg Met Pro
Leu Gln Ser Leu Ala Ala Ala Asn Val Met Ile Glu Gly Ser Ile Ile
Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe
Gly Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn
Leu Arg Val Val Asn Val Ser Thr Gly Glu Val Leu Ser Ser Val Asn
Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val Phe Arg
Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ile Gly Tyr Thr Ser
Asn Glu Pro Val Met Met Cys Leu Met Ser Ala Ile Glu Thr Gly Val
Ile Phe Leu Ile Asn Asp Gly Ile Asp Arg Gly Leu Trp Asp Leu Gln
Asn Lys Ala Asp Ala Gln Asn Pro Val Leu Val Lys Tyr Arg Asp Met
Ser Val Pro Pro Glu Ser

SEQ ID NO: 73 (length 262, PRT, Yokenella regensburgei)
Cys Leu Thr Ala Pro Pro Lys Glu Ala Ala Lys Pro Thr Leu Met Pro
Arg Ala Gln Ser Tyr Arg Asp Leu Thr His Leu Pro Leu Pro Ser Gly
Lys Val Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe
Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro Gln Ser Ala
Thr Ala Met Leu Val Thr Ala Leu Lys Asp Ser Arg Trp Phe Val Pro
Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile
Arg Ala Ala Gln Glu Asn Gly Thr Val Ala Asp Asn Asn Arg Ile Pro
Leu Gln Ser Leu Thr Ala Ala Asn Val Met Ile Glu Gly Ser Ile Ile
Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe
Gly Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn
Leu Arg Val Val Asn Val Ser Thr Gly Glu Val Leu Ser Ser Val Asn
Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val Phe Arg
Phe Val Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ile Gly Tyr Thr Ser
Asn Glu Pro Val Met Leu Cys Leu Met Ser Ala Ile Glu Thr Gly Val
Ile Tyr Leu Ile Asn Asp Gly Ile Glu Arg Gly Leu Trp Asp Leu Gln
Gln Lys Ala Asp Val Asp Asn Pro Ile Leu Ala Arg Tyr Arg Asn Met
Ser Ala Pro Pro Glu Ser

SEQ ID NO: 74 (length 262, PRT, Cronobacter pulveris)
Cys Leu Thr Ala Pro Pro Lys Glu Ala Ala Lys Pro Thr Leu Met Pro
Arg Ala Gln Ser Tyr Arg Asp Leu Thr Asn Leu Pro Asp Pro Lys Gly
Lys Leu Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe
Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro Gln Ser Ala
Thr Ser Met Leu Val Thr Ala Leu Lys Asp Ser Arg Trp Phe Ile Pro
Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile
Arg Ala Ala Gln Glu Asn Gly Thr Val Ala Glu Asn Asn Arg Met Pro
Leu Gln Ser Leu Val Ala Ala Asn Val Met Ile Glu Gly Ser Ile Ile
Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe
Gly Ile Gly Gly Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn
Leu Arg Val Val Asn Val Ser Thr Gly Glu Val Leu Ser Ser Val Asn
Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val Phe Arg
Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ile Gly Tyr Thr Ala
Asn Glu Pro Val Met Leu Cys Leu Met Ser Ala Ile Glu Thr Gly Val
Ile His Leu Ile Asn Asp Gly Ile Asn Arg Gly Leu Trp Glu Leu Lys
Asn Lys Gly Asp Ala Lys Asn Thr Ile Leu Ala Lys Tyr Arg Ser Met
Ala Val Pro Pro Glu Ser

SEQ ID NO: 75 (length 262, PRT, Rahnella aquatilis)
Cys Leu Thr Ala Ala Pro Lys Glu Ala Ala Arg Pro Thr Leu Leu Pro
Arg Ala Pro Ser Tyr Thr Asp Leu Thr His Leu Pro Ser Pro Gln Gly
Arg Ile Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe
Lys Pro Tyr Pro Ala Cys Asn Phe Ser Thr Ala Val Pro Gln Ser Ala
Thr Ala Met Leu Val Ser Ala Leu Lys Asp Ser Lys Trp Phe Ile Pro
Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile
Arg Ala Ala Gln Glu Asn Gly Ser Val Ala Ile Asn Asn Gln Arg Pro
Leu Ser Ser Leu Val Ala Ala Asn Ile Leu Ile Glu Gly Ser Ile Ile
Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe
Gly Ile Gly Ala Ser Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn
Leu Arg Ala Val Asp Val Asn Thr Gly Glu Val Leu Ser Ser Val Asn
Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val Phe Arg
Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Leu Gly Tyr Thr Thr
Asn Glu Pro Val Met Leu Cys Leu Met Ser Ala Ile Glu Ser Gly Val
Ile Tyr Leu Val Asn Asp Gly Ile Glu Arg Asn Leu Trp Gln Leu Gln
Asn Pro Ser Glu Ile Asn Ser Pro Ile Leu Gln Arg Tyr Lys Asn Asn
Ile Val Pro Ala Glu Ser

SEQ ID NO: 76 (length 259, PRT, Kluyvera ascorbata)
Cys Ile Thr Ser Pro Pro Lys Gln Ala Ala Lys Pro Thr Leu Leu Pro
Arg Ser Gln Ser Tyr Gln Asp Leu Thr His Leu Pro Glu Pro Gln Gly
Arg Leu Phe Val Ser Val Tyr Asn Ile Ser Asp Glu Thr Gly Gln Phe
Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ser Val Pro Gln Ser Ala
Thr Ala Met Leu Val Ser Ala Leu Lys Asp Ser Asn Trp Phe Ile Pro
Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile
Arg Ala Ala Gln Glu Asn Gly Thr Val Ala Val Asn Asn Arg Thr Gln
Leu Pro Ser Leu Val Ala Ala Asn Ile Leu Ile Glu Gly Ser Ile Ile
Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Ala Gly Ala Arg Tyr Phe
Gly Ile Gly Ala Ser Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn
Leu Arg Val Val Asn Val Ser Thr Gly Glu Val Leu Ser Ser Val Asn
Thr Ser Lys Thr Ile Leu Ser Tyr Glu Phe Gln Ala Gly Val Phe Arg
Tyr Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Val Gly Tyr Thr Val
Asn Glu Pro Val Met Leu Cys Leu Met Ser Ala Ile Glu Thr Gly Val
Ile Tyr Leu Val Asn Asp Gly Ile Ser Arg Asn Leu Trp Gln Leu Lys
Asn Ala Ser Asp Ile Asn Ser Pro Val Leu Glu Lys Tyr Lys Ser Ile
Ile Val Pro

SEQ ID NO: 77 (length 259, PRT, Hafnia alvei)
Cys Leu Thr Ala Pro Pro Lys Gln Ala Ala Lys Pro Thr Leu Met Pro
Arg Ala Gln Ser Tyr Gln Asp Leu Thr His Leu Pro Glu Pro Ala Gly
Lys Leu Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe
Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro Gln Ser Ala
Thr Ala Met Leu Val Ser Ala Leu Lys Asp Ser Gly Trp Phe Ile Pro
Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile
Arg Ala Ala Gln Glu Asn Gly Thr Ala Ala Val Asn Asn Gln His Gln
Leu Ser Ser Leu Val Ala Ala Asn Val Leu Val Glu Gly Ser Ile Ile
Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Ala Gly Ala Arg Phe Phe
Gly Ile Gly Ala Ser Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn
Leu Arg Val Val Asp Val Asn Thr Gly Gln Val Leu Ser Ser Val Asn
Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val Phe Arg
Tyr Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ile Gly Tyr Thr Thr
Asn Glu Pro Val Met Leu Cys Val Met Ser Ala Ile Glu Thr Gly Val
Ile Tyr Leu Val Asn Asp Gly Ile Asn Arg Asn Leu Trp Thr Leu Lys
Asn Pro Gln Asp Ala Lys Ser Ser Val Leu Glu Arg Tyr Lys Ser Thr
Ile Val Pro

SEQ ID NO: 78 (length 255, PRT, Enterobacteriaceae bacterium)
Cys Ile Thr Thr Pro Pro Gln Glu Ala Ala Lys Pro Thr Leu Leu Pro
Arg Asp Ala Thr Tyr Lys Asp Leu Val Ser Leu Pro Gln Pro Arg Gly
Lys Ile Tyr Val Ala Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe
Gln Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ser Val Pro Gln Ser Ala
Thr Ala Met Leu Val Ser Ser Leu Lys Asp Ser Arg Trp Phe Val Pro
Leu Glu Arg Gln Gly Leu Asn Asn Leu Leu Asn Glu Arg Lys Ile Ile
Arg Ala Ala Gln Gln Asn Gly Thr Val Gly Asp Asn Asn Ala Ser Pro
Leu Pro Ser Leu Tyr Ser Ala Asn Val Ile Val Glu Gly Ser Ile Ile
Gly Tyr Ala Ser Asn Val Lys Thr Gly Gly Phe Gly Ala Arg Tyr Phe
Gly Ile Gly Gly Ser Thr Gln Tyr Gln Leu Asp Gln Val Ala Val Asn
Leu Arg Ile Val Asn Val His Thr Gly Glu Val Leu Ser Ser Val Asn
Thr Ser Lys Thr Ile Leu Ser Tyr Glu Ile Gln Ala Gly Val Phe Arg
Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ala Gly Phe Thr Thr
Asn Glu Pro Val Met Thr Cys Leu Met Ser Ala Ile Glu Glu Gly Val
Ile His Leu Ile Asn Asp Gly Ile Asn Lys Lys Leu Trp Ala Leu Ser
Asn Ala Ala Asp Ile Asn Ser Glu Val Leu Thr Arg Tyr Arg Lys

SEQ ID NO: 79 (length 258, PRT, Plesiomonas shigelloides)
Ile Thr Glu Val Pro Lys Glu Ala Ala Lys Pro Thr Leu Met Pro Arg
Ala Ser Thr Tyr Lys Asp Leu Val Ala Leu Pro Lys Pro Asn Gly Lys
Ile Ile Val Ser Val Tyr Ser Val Gln Asp Glu Thr Gly Gln Phe Lys
Pro Leu Pro Ala Ser Asn Phe Ser Thr Ala Val Pro Gln Ser Gly Asn
Ala Met Leu Thr Ser Ala Leu Lys Asp Ser Gly Trp Phe Val Pro Leu
Glu Arg Glu Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile Arg
Ala Ala Gln Glu Asn Gly Thr Val Ala Ala Asn Asn Gln Gln Pro Leu
Pro Ser Leu Leu Ser Ala Asn Val Val Ile Glu Gly Ala Ile Ile Gly
Tyr Asp Ser Asp Ile Lys Thr Gly Gly Ala Gly Ala Arg Tyr Phe Gly
Ile Gly Ala Asp Gly Lys Tyr Arg Val Asp Gln Val Ala Val Asn Leu
Arg Ala Val Asp Val Arg Thr Gly Glu Val Leu Leu Ser Val Asn Thr
Ser Lys Thr Ile Leu Ser Ser Glu Leu Ser Ala Gly Val Phe Arg Phe
Ile Glu Tyr Gln Arg Leu Leu Glu Leu Glu Ala Gly Tyr Thr Thr Asn
Glu Pro Val Met Met Cys Met Met Ser Ala Leu Glu Ala Gly Val Ala
His Leu Ile Val Glu Gly Ile Arg Gln Asn Leu Trp Ser Leu Gln Asn
Pro Ser Asp Ile Asn Asn Pro Ile Ile Gln Arg Tyr Met Lys Glu Asp
Val Pro

SEQ ID NO: 80 (length 248, PRT, Vibrio fischeri)
Pro Glu Thr Ser Glu Ser Pro Thr Leu Met Gln Arg Gly Ala Asn Tyr
Ile Asp Leu Ile Ser Leu Pro Lys Pro Gln Gly Lys Ile Phe Val Ser
Val Tyr Asp Phe Arg Asp Gln Thr Gly Gln Tyr Lys Pro Gln Pro Asn
Ser Asn Phe Ser Thr Ala Val Pro Gln Gly Gly Thr Ala Leu Leu Thr
Met Ala Leu Leu Asp Ser Glu Trp Phe Tyr Pro Leu Glu Arg Gln Gly
Leu Gln Asn Leu Leu Thr Glu Arg Lys Ile Ile Arg Ala Ala Gln Lys
Lys Gln Glu Ser Ile Ser Asn His Gly Ser Thr Leu Pro Ser Leu Leu
Ser Ala Asn Val Met Ile Glu Gly Gly Ile Val Ala Tyr Asp Ser Asn
Ile Lys Thr Gly Gly Ala Gly Ala Arg Tyr Leu Gly Ile Gly Gly Ser
Gly Gln Tyr Arg Ala Asp Gln Val Thr Val Asn Ile Arg Ala Val Asp
Val Arg Ser Gly Lys Ile Leu Thr Ser Val Thr Thr Ser Lys Thr Ile
Leu Ser Tyr Glu Val Ser Ala Gly Ala Phe Arg Phe Val Asp Tyr Lys
Glu Leu Leu Glu Val Glu Leu Gly Tyr Thr Asn Asn Glu Pro Val Asn
Ile Ala Leu Met Ser Ala Ile Asp Ser Ala Val Ile His Leu Ile Val
Lys Gly Val Gln Gln Gly Leu Trp Arg Pro Ala Asn Leu Asp Thr Arg
Asn Asn Pro Ile Phe Lys Lys Tyr

SEQ ID NO: 81 (length 248, PRT, Aliivibrio logei)
Pro Asp Ala Ser Glu Ser Pro Thr Leu Met Gln Arg Gly Ala Thr Tyr
Leu Asp Leu Ile Ser Leu Pro Lys Pro Gln Gly Lys Ile Tyr Val Ser
Val Tyr Asp Phe Arg Asp Gln Thr Gly Gln Tyr Lys Pro Gln Pro Asn
Ser Asn Phe Ser Thr Ala Val Pro Gln Gly Gly Thr Ala Leu Leu Thr
Met Ala Leu Leu Asp Ser Glu Trp Phe Tyr Pro Leu Glu Arg Gln Gly
Leu Gln Asn Leu Leu Thr Glu Arg Lys Ile Ile Arg Ala Ala Gln Lys
Lys Gln Glu Ser Ile Ser Asn His Gly Ser Thr Leu Pro Ser Leu Leu
Ser Ala Asn Val Met Ile Glu Gly Gly Ile Val Ala Tyr Asp Ser Asn
Ile Lys Thr Gly Gly Ala Gly Ala Arg Tyr Leu Gly Ile Gly Gly Ser
Gly Gln Tyr Arg Ala Asp Gln Val Thr Val Asn Ile Arg Ala Val Asp
Val Arg Ser Gly Lys Ile Leu Thr Ser Val Thr Thr Ser Lys Thr Ile
Leu Ser Tyr Glu Leu Ser Ala Gly Ala Phe Arg Phe Val Asp Tyr Lys
Glu Leu Leu Glu Val Glu Leu Gly Tyr Thr Asn Asn Glu Pro Val Asn
Ile Ala Leu Met Ser Ala Ile Asp Ser Ala Val Ile His Leu Ile Val
Lys Gly Ile Glu Glu Gly Leu Trp Arg Pro Glu Asn Gln Asn Gly Lys
Glu Asn Pro Ile Phe Arg Lys Tyr

SEQ ID NO: 82 (length 254, PRT, Photobacterium sp.)
Pro Glu Thr Ser Lys Glu Pro Thr Leu Met Ala Arg Gly Thr Ala Tyr
Gln Asp Leu Val Ser Leu Pro Leu Pro Lys Gly Lys Val Tyr Val Ser
Val Tyr Asp Phe Arg Asp Gln Thr Gly Gln Tyr Lys Pro Gln Pro Asn
Ser Asn Phe Ser Thr Ala Val Pro Gln Gly Gly Ala Ala Leu Leu Thr
Thr Ala Leu Leu Asp Ser Arg Trp Phe Met Pro Leu Glu Arg Glu Gly
Leu Gln Asn Leu Leu Thr Glu Arg Lys Ile Ile Arg Ala Ala Gln Lys
Lys Asp Glu Ile Pro Thr Asn His Gly Val His Leu Pro Ser Leu Ala
Ser Ala Asn Ile Met Val Glu Gly Gly Ile Val Ala Tyr Asp Thr Asn
Ile Gln Thr Gly Gly Ala Gly Ala Arg Tyr Leu Gly Val Gly Ala Ser
Gly Gln Tyr Arg Thr Asp Gln Val Thr Val Asn Ile Arg Ala Val Asp
Val Arg Thr Gly Arg Ile Leu Leu Ser Val Thr Thr Ser Lys Thr Ile
Leu Ser Lys Glu Leu Gln Thr Gly Val Phe Lys Phe Val Asp Tyr Lys
Asp Leu Leu Glu Ala Glu Leu Gly Tyr Thr Thr Asn Glu Pro Val Asn
Leu Ala Val Met Ser Ala Ile Asp Ala Ala Val Val His Val Ile Val
Asp Gly Ile Lys Thr Gly Leu Trp Glu Pro Leu Arg Gly Glu Asp Leu
Gln His Pro Ile Ile Gln Glu Tyr Met Asn Arg Ser Lys Pro

SEQ ID NO: 83 (length 261, PRT, Aeromonas veronii)
Cys Ala Thr His Ile Gly Ser Pro Val Ala Asp Glu Lys Ala Thr Leu
Met Pro Arg Ser Val Ser Tyr Lys Glu Leu Ile Ser Leu Pro Lys Pro
Lys Gly Lys Ile Val Ala Ala Val Tyr Asp Phe Arg Asp Gln Thr Gly
Gln Tyr Leu Pro Ala Pro Ala Ser Asn Phe Ser Thr Ala Val Thr Gln
Gly Gly Val Ala Met Leu Ser Thr Ala Leu Trp Asp Ser Gln Trp Phe
Val Pro Leu Glu Arg Glu Gly Leu Gln Asn Leu Leu Thr Glu Arg Lys
Ile Val Arg Ala Ala Gln Asn Lys Pro Asn Val Pro Gly Asn Asn Ala
Asn Gln Leu Pro Ser Leu Val Ala Ala Asn Ile Leu Ile Glu Gly Gly
Ile Val Ala Tyr Asp Ser Asn Val Arg Thr Gly Gly Ala Gly Ala Lys
Tyr Phe Gly Ile Gly Ala Ser Gly Glu Tyr Arg Val Asp Gln Val Thr
Val Asn Leu Arg Ala Val Asp Ile Arg Ser Gly Arg Ile Leu Asn Ser
Val Thr Thr Ser Lys Thr Val Met Ser Gln Gln Val Gln Ala Gly Val
Phe Arg Phe Val Glu Tyr Lys Arg Leu Leu Glu Ala Glu Ala Gly Phe
Ser Thr Asn Glu Pro Val Gln Met Cys Val Met Ser Ala Ile Glu Ser
Gly Val Ile Arg Leu Ile Ala Asn Gly Val Arg Asp Asn Leu Trp Gln
Leu Ala Asp Gln Arg Asp Ile Asp Asn Pro Ile Leu Gln Glu Tyr Leu
Gln Asp Asn Ala Pro

SEQ ID NO: 84 (length 239, PRT, Shewanella sp.)
Ala Ser Ser Ser Leu Met Pro Lys Gly Glu Ser Tyr Tyr Asp Leu Ile
Asn Leu Pro Ala Pro Gln Gly Val Met Leu Ala Ala Val Tyr Asp Phe
Arg Asp Gln Thr Gly Gln Tyr Lys Pro Ile Pro Ser Ser Asn Phe Ser
Thr Ala Val Pro Gln Ser Gly Thr Ala Phe Leu Ala Gln Ala Leu Asn
Asp Ser Ser Trp Phe Ile Pro Val Glu Arg Glu Gly Leu Gln Asn Leu
Leu Thr Glu Arg Lys Ile Val Arg Ala Gly Leu Lys Gly Asp Ala Asn
Lys Leu Pro Gln Leu Asn Ser Ala Gln Ile Leu Met Glu Gly Gly Ile
Val Ala Tyr Asp Thr Asn Val Arg Thr Gly Gly Ala Gly Ala Arg Tyr
Leu Gly Ile Gly Ala Ala Thr Gln Phe Arg Val Asp Thr Val Thr Val
Asn Leu Arg Ala Val Asp Ile Arg Thr Gly Arg Leu Leu Ser Ser Val
Thr Thr Thr Lys Ser Ile Leu Ser Lys Glu Ile Thr Ala Gly Val Phe
Lys Phe Ile Asp Ala Gln Glu Leu Leu Glu Ser Glu Leu Gly Tyr Thr
Ser Asn Glu Pro Val Ser Leu Cys Val Ala Ser Ala Ile Glu Ser Ala
Val Val His Met Ile Ala Asp Gly Ile Trp Lys Gly Ala Trp Asn Leu
Ala Asp Gln Ala Ser Gly Leu Arg Ser Pro Val Leu Gln Lys Tyr

SEQ ID NO: 85 (length 233, PRT, Pseudomonas putida)
Gln Asp Ser Glu Thr Pro Thr Leu Thr Pro Arg Ala Ser Thr Tyr Tyr
Asp Leu Ile Asn Met Pro Arg Pro Lys Gly Arg Leu Met Ala Val Val
Tyr Gly Phe Arg Asp Gln Thr Gly Gln Tyr Lys Pro Thr Pro Ala Ser
Ser Phe Ser Thr Ser Val Thr Gln Gly Ala Ala Ser Met Leu Met Asp
Ala Leu Ser Ala Ser Gly Trp Phe Val Val Leu Glu Arg Glu Gly Leu
Gln Asn Leu Leu Thr Glu Arg Lys Ile Ile Arg Ala Ser Gln Lys Lys
Pro Asp Val Ala Glu Asn Ile Met Gly Glu Leu Pro Pro Leu Gln Ala
Ala Asn Leu Met Leu Glu Gly Gly Ile Ile Ala Tyr Asp Thr Asn Val
Arg Ser Gly Gly Glu Gly Ala Arg Tyr Leu Gly Ile Asp Ile Ser Arg
Glu Tyr Arg Val Asp Gln Val Thr Val Asn Leu Arg Ala Val Asp Val
Arg Thr Gly Gln Val Leu Ala Asn Val Met Thr Ser Lys Thr Ile Tyr
Ser Val Gly Arg Ser Ala Gly Val Phe Lys Phe Ile Glu Phe Lys Lys
Leu Leu Glu Ala Glu Val Gly Tyr Thr Thr Asn Glu Pro Ala Gln Leu
Cys Val Leu Ser Ala Ile Glu Ser Ala Val Gly His Leu Leu Ala Gln
Gly Ile Glu Gln Arg Leu Trp Gln Val

SEQ ID NO: 86 (length 234, PRT, Shewanella violacea)
Met Pro Lys Ser Asp Thr Tyr Tyr Asp Leu Ile Gly Leu Pro His Pro
Gln Gly Ser Met Leu Ala Ala Val Tyr Asp Phe Arg Asp Gln Thr Gly
Gln Tyr Lys Ala Ile Pro Ser Ser Asn Phe Ser Thr Ala Val Pro Gln
Ser Gly Thr Ala Phe Leu Ala Gln Ala Leu Asn Asp Ser Ser Trp Phe
Val Pro Val Glu Arg Glu Gly Leu Gln Asn Leu Leu Thr Glu Arg Lys
Ile Val Arg Ala Gly Leu Lys Gly Glu Ala Asn Gln Leu Pro Gln Leu
Ser Ser Ala Gln Ile Leu Met Glu Gly Gly Ile Val Ala Tyr Asp Thr
Asn Ile Lys Thr Gly Gly Ala Gly Ala Arg Tyr Leu Gly Ile Gly Val
Asn Ser Lys Phe Arg Val Asp Thr Val Thr Val Asn Leu Arg Ala Val
Asp Ile Arg Thr Gly Arg Leu Leu Ser Ser Val Thr Thr Thr Lys Ser
Ile Leu Ser Lys Glu Val Ser Ala Gly Val Phe Lys Phe Ile Asp Ala
Gln Asp Leu Leu Glu Ser Glu Leu Gly Tyr Thr Ser Asn Glu Pro Val
Ser Leu Cys Val Ala Gln Ala Ile Glu Ser Ala Val Val His Met Ile
Ala Asp Gly Ile Trp Lys Arg Ala Trp Asn Leu Ala Asp Thr Ala Ser
Gly Leu Asn Asn Pro Val Leu Gln Lys Tyr

SEQ ID NO: 87 (length 245, PRT, Marinobacterium jannaschii)
Leu Thr Arg Arg Met
Ser Thr Tyr Gln Asp Leu Ile Asp Met Pro Ala1 5 10 15Pro Arg Gly Lys
Ile Val Thr Ala Val Tyr Ser Phe Arg Asp Gln Ser 20 25 30Gly Gln Tyr
Lys Pro Ala Pro Ser Ser Ser Phe Ser Thr Ala Val Thr 35 40 45Gln Gly
Ala Ala Ala Met Leu Val Asn Val Leu Asn Asp Ser Gly Trp 50 55 60Phe
Ile Pro Leu Glu Arg Glu Gly Leu Gln Asn Ile Leu Thr Glu Arg65 70 75
80Lys Ile Ile Arg Ala Ala Leu Lys Lys Asp Asn Val Pro Val Asn Asn
85 90 95Ser Ala Gly Leu Pro Ser Leu Leu Ala Ala Asn Ile Met Leu Glu
Gly 100 105 110Gly Ile Val Gly Tyr Asp Ser Asn Ile His Thr Gly Gly
Ala Gly Ala 115 120 125Arg Tyr Phe Gly Ile Gly Ala Ser Glu Lys Tyr
Arg Val Asp Glu Val 130 135 140Thr Val Asn Leu Arg Ala Ile Asp Ile
Arg Thr Gly Arg Ile Leu His145 150 155 160Ser Val Leu Thr Ser Lys
Lys Ile Leu Ser Arg Glu Ile Arg Ser Asp 165 170 175Val Tyr Arg Phe
Ile Glu Phe Lys His Leu Leu Glu Met Glu Ala Gly 180 185 190Ile Thr
Thr Asn Asp Pro Ala Gln Leu Cys Val Leu Ser Ala Ile Glu 195 200
205Ser Ala Val Ala His Leu Ile Val Asp Gly Val Ile Lys Lys Ser Trp
210 215 220Ser Leu Ala Asp Pro Asn Glu Leu Asn Ser Pro Val Ile Gln
Ala Tyr225 230 235 240Gln Gln Gln Arg Ile
24588234PRTChryseobacterium oranimense 88Pro Ser Asp Pro Glu Arg
Ser Thr Met Gly Glu Leu Thr Pro Ser Thr1 5 10 15Ala Glu Leu Arg Asn
Leu Pro Leu Pro Asn Glu Lys Ile Val Ile Gly 20 25 30Val Tyr Lys Phe
Arg Asp Gln Thr Gly Gln Tyr Lys Pro Ser Glu Asn 35 40 45Gly Asn Asn
Trp Ser Thr Ala Val Pro Gln Gly Thr Thr Thr Ile Leu 50 55 60Ile Lys
Ala Leu Glu Asp Ser Arg Trp Phe Ile Pro Ile Glu Arg Glu65 70 75
80Asn Ile Ala Asn Leu Leu Asn Glu Arg Gln Ile Ile Arg Ser Thr Arg
85 90 95Gln Glu Tyr Met Lys Asp Ala Asp Lys Asn Ser Gln Ser Leu Pro
Pro 100 105 110Leu Leu Tyr Ala Gly Ile Leu Leu Glu Gly Gly Val Ile
Ser Tyr Asp 115 120 125Ser Asn Thr Met Thr Gly Gly Phe Gly Ala Arg
Tyr Phe Gly Ile Gly 130 135 140Ala Ser Thr Gln Tyr Arg Gln Asp Arg
Ile Thr Ile Tyr Leu Arg Ala145 150 155 160Val Ser Thr Leu Asn Gly
Glu Ile Leu Lys Thr Val Tyr Thr Ser Lys 165 170 175Thr Ile Leu Ser
Thr Ser Val Asn Gly Ser Phe Phe Arg Tyr Ile Asp 180 185 190Thr Glu
Arg Leu Leu Glu Ala Glu Val Gly Leu Thr Gln Asn Glu Pro 195 200
205Val Gln Leu Ala Val Thr Glu Ala Ile Glu Lys Ala Val Arg Ser Leu
210 215 220Ile Ile Glu Gly Thr Arg Asp Lys Ile Trp225
2308945PRTArtificial SequenceCsgF peptide 89Cys Thr Met Thr Phe Gln
Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly Ala Phe
Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asp Pro Ser
Tyr Asn Asp Asp Phe Gly Ile Glu Thr 35 40 459029PRTArtificial
SequenceCsgF peptide 90Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn
Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala
Gln Ala Gln 20 259135PRTArtificial SequenceCsgF peptide 91Cys Thr
Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn
Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25
30Lys Asp Pro 359235PRTArtificial SequenceCsgF peptide 92Gly Thr
Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser
Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25
30Lys Asp Pro 359335PRTArtificial SequenceCsgF peptide 93Cys Thr
Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn
Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25
30Lys Tyr Pro 359445PRTArtificial SequenceCsgF peptide 94Gly Thr
Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser
Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25
30Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile Glu Thr 35 40
459535PRTArtificial SequenceCsgF peptide 95Cys Thr Met Thr Phe Gln
Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser Asn Gly Ala Phe
Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Tyr Pro
359639PRTArtificial SequenceCsgF peptide 96Cys Thr Met Thr Phe Gln
Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly Ala Phe
Leu Leu Asn Ser Ala Gln Ala Gln His His His 20 25 30His His His His
His His His 359745PRTArtificial SequenceCsgF peptide 97Cys Thr Met
Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn
Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys
Asp Pro His His His His His His His His His His 35 40
459835PRTArtificial SequenceCsgF peptide 98Gly Thr Met Thr Phe Gln
Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser Asn Gly Ala Phe
Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Tyr Pro
359935PRTArtificial SequenceCsgF peptide 99Gly Thr Met Thr Phe Gln
Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser Asn Gly Ala Phe
Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asn Pro
3510035PRTArtificial SequenceCsgF peptide 100Gly Thr Met Cys Phe
Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asp Pro
3510135PRTArtificial SequenceCsgF peptide 101Gly Thr Met Cys Phe
Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Tyr Pro
3510235PRTArtificial SequenceCsgF peptide 102Gly Thr Met Cys Phe
Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asn Pro
3510335PRTArtificial SequenceCsgF peptide 103Cys Thr Met Cys Phe
Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asp Pro
3510435PRTArtificial SequenceCsgF peptide 104Cys Thr Met Cys Phe
Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asn Pro
3510535PRTArtificial SequenceCsgF peptide 105Cys Thr Met Cys Phe
Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Tyr Pro
3510635PRTArtificial SequenceCsgF peptide 106Gly Thr Met Thr Phe
Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Val Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asp Pro
3510735PRTArtificial SequenceCsgF peptide 107Gly Thr Met Thr Phe
Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ala Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asp Pro
3510835PRTArtificial SequenceCsgF peptide 108Gly Thr Met Thr Phe
Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Phe Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asp Pro
3510930PRTArtificial SequenceCsgF peptide 109Gly Thr Met Thr Phe
Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Ser Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn 20 25 3011030PRTArtificial
SequenceCsgF peptide 110Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn
Phe Gly Gly Asn Pro1 5 10 15Val Asn Gly Ala Phe Leu Leu Asn Ser Ala
Gln Ala Gln Asn 20 25 3011130PRTArtificial SequenceCsgF peptide
111Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1
5 10 15Ala Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn 20
25 3011230PRTArtificial SequenceCsgF peptide 112Gly Thr Met Thr Phe
Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Phe Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn 20 25 3011335PRTArtificial
SequenceCsgF peptide 113Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn
Phe Gly Gly Asn Pro1 5 10 15Val Asn Gly Ala Phe Leu Leu Asn Ser Ala
Gln Ala Gln Asn Ser Tyr 20 25 30Lys Tyr Pro 3511435PRTArtificial
SequenceCsgF peptide 114Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn
Phe Gly Gly Asn Pro1 5 10 15Ala Asn Gly Ala Phe Leu Leu Asn Ser Ala
Gln Ala Gln Asn Ser Tyr 20 25 30Lys Tyr Pro 3511535PRTArtificial
SequenceCsgF peptide 115Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn
Phe Gly Gly Asn Pro1 5 10 15Phe Asn Gly Ala Phe Leu Leu Asn Ser Ala
Gln Ala Gln Asn Ser Tyr 20 25 30Lys Tyr Pro 3511635PRTArtificial
SequenceCsgF peptide 116Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn
Phe Gly Gly Asn Pro1 5 10 15Ser Asn Gly Gln Phe Leu Leu Asn Ser Ala
Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asp Pro 3511714PRTArtificial
SequenceCsgF peptide 117Gly Thr Met Thr Phe Gln Phe Arg His His His
His His His1 5 1011825PRTArtificial SequenceCsgF peptide 118Gly Thr
Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn
Asn Gly His His His His His His 20 2511935PRTArtificial
SequenceCsgF peptide 119Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn
Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala
Gln Ala Gln His His His 20 25 30His His His 3512051PRTArtificial
SequenceCsgF peptide 120Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn
Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala
Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asp Pro Ser Tyr Asn Asp Asp Phe
Gly Ile Glu Thr His His His 35 40 45His His His 50
* * * * *
References