U.S. patent application number 17/205054 was filed with the patent office on 2021-03-18 and published on 2022-05-26 for novel protein pores.
This patent application is currently assigned to Oxford Nanopore Technologies Limited. The applicant listed for this patent is Oxford Nanopore Technologies Limited, VIB VZW, Vrije Universiteit Brussel. Invention is credited to Richard George Hambley, Lakmal Jayasinghe, Michael Jordan, John Joseph Kilgour, Han Remaut, Pratik Raj Singh, Sander Van Der Verren, Nani Van Gerven, Elizabeth Jayne Wallace.
Application Number | 20220162264 17/205054 |
Document ID | / |
Family ID | 1000006330521 |
Filed Date | 2021-03-18 |
United States Patent
Application |
20220162264 |
Kind Code |
A9 |
Remaut; Han ; et
al. |
May 26, 2022 |
NOVEL PROTEIN PORES
Abstract
The present invention relates to novel protein pores and their
uses in analyte detection and characterisation. The invention
particularly relates to an isolated pore complex formed by a
CsgG-like pore and a modified CsgF peptide, or a homologue or
mutant thereof, thereby incorporating an additional channel
constriction or reader head in the nanopore. The invention further
relates to a transmembrane pore complex and methods for production
of the pore complex and for use in molecular sensing and nucleic
acid sequencing applications.
Inventors: |
Remaut; Han; (Roosbeek,
BE) ; Van Der Verren; Sander; (Oxford, GB) ;
Van Gerven; Nani; (Oxford, GB) ; Jayasinghe;
Lakmal; (Oxford, GB) ; Wallace; Elizabeth Jayne;
(Oxford, GB) ; Singh; Pratik Raj; (Oxford, GB)
; Hambley; Richard George; (Oxford, GB) ; Jordan;
Michael; (Oxford, GB) ; Kilgour; John Joseph;
(Oxford, GB) |
|
Applicant: |
Name | City | State | Country | Type |
Oxford Nanopore Technologies Limited | Oxford | | GB | |
VIB VZW | Gent | | BE | |
Vrije Universiteit Brussel | Brussels | | BE | |
Assignee: |
Oxford Nanopore Technologies Limited | Oxford | GB |
VIB VZW | Gent | BE |
Vrije Universiteit Brussel | Brussels | BE |
Prior Publication: |
Document Identifier | Publication Date |
US 20210284696 A1 | September 16, 2021 |
Family ID: |
1000006330521 |
Appl. No.: |
17/205054 |
Filed: |
March 18, 2021 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
16624341 | Dec 19, 2019 | |
PCT/GB2018/051858 | Jul 2, 2018 | |
17205054 | | |
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
C07K 14/00 20130101;
B82Y 15/00 20130101 |
International
Class: |
C07K 14/00 20060101
C07K014/00; B82Y 15/00 20060101 B82Y015/00 |
Foreign Application Data
Date | Code | Application Number |
Jun 30, 2017 | EP | 17179099.1 |
Claims
1-61. (canceled)
62. An apparatus for characterising a target analyte, the apparatus
comprising: (a) an isolated pore complex comprising: (i) a modified
CsgG pore, wherein the CsgG constriction loop of the CsgG pore
comprises one or more deletions; and (ii) a modified CsgF peptide,
wherein the modified CsgF peptide is bound to and forms a
constriction in the modified CsgG pore; and (b) an in vitro
membrane; wherein the isolated pore complex is inserted into the in
vitro membrane.
63. An apparatus according to claim 62, wherein the CsgG
constriction loop spans residues 46 to 61 of SEQ ID NO: 3.
64. An apparatus according to claim 62, wherein the CsgG
constriction loop comprises a deletion at one or more of the
following positions F48, K49, P50, Y51, P52, A53, S54, N55, F56 and
S57.
65. An isolated pore complex comprising: (i) a modified CsgG pore,
wherein the CsgG constriction loop of the CsgG pore comprises one
or more deletions; and (ii) a modified CsgF peptide, wherein the
modified CsgF peptide is bound to and forms a constriction in the
modified CsgG pore.
66. An isolated pore complex according to claim 65, wherein the
CsgF peptide is truncated and lacks the C-terminal head domain of
CsgF and at least part of the neck domain of CsgF.
67. An isolated pore complex according to claim 65, wherein the
CsgG constriction loop spans residues 46 to 61 of SEQ ID NO:3.
68. An isolated pore complex according to claim 65, wherein the
CsgG constriction loop comprises a deletion at one or more of the
following positions F48, K49, P50, Y51, P52, A53, S54, N55, F56 and
S57.
69. A method for determining the presence, absence or one or more
characteristics of a target analyte, comprising the steps of: (i)
contacting the target analyte with an apparatus according to claim
62, such that the target analyte moves into the pore complex; and
(ii) taking one or more measurements as the analyte moves through
the pore complex and thereby determining the presence, absence or
one or more characteristics of the analyte.
70. A method according to claim 69, wherein the analyte is a
peptide, polypeptide, polysaccharide, small organic compound, or
small inorganic compound.
71. A method according to claim 69, wherein the analyte is a
pharmacologically active compound, toxic compound, or
pollutant.
72. A method according to claim 69, wherein the analyte is a
polynucleotide.
73. A method according to claim 72, wherein the polynucleotide
comprises at least one homopolymeric region.
Description
RELATED APPLICATIONS
[0001] This application is a Continuation of U.S. application Ser.
No. 16/624,341, filed Dec. 19, 2019, entitled "NOVEL PROTEIN
PORES", which is a national stage filing under 35 U.S.C. 371 of
International Patent Application Serial No. PCT/GB2018/051858,
filed Jul. 2, 2018, entitled "NOVEL PROTEIN PORES". Foreign
priority benefits are claimed under 35 U.S.C. § 119(a)-(d) or
35 U.S.C. § 365(b) of European application number 17179099.1,
filed Jun. 30, 2017. The entire contents of these applications are
incorporated herein by reference in their entirety.
FIELD
[0002] The present invention relates to novel protein pores and
their uses in analyte detection and characterisation. The invention
further relates to a transmembrane pore complex and methods for
production of the pore complex and for use in molecular sensing and
nucleic acid sequencing applications.
BACKGROUND
[0003] Nanopore sensing is an approach to analyte detection and
characterization that relies on the observation of individual
binding or interaction events between the analyte molecules and an
ion conducting channel. Nanopore sensors can be created by placing
a single pore of nanometer dimensions in an electrically insulating
membrane and measuring voltage-driven ion currents through the pore
in the presence of analyte molecules. The presence of an analyte
inside or near the nanopore will alter the ionic flow through the
pore, resulting in altered ionic or electric currents being
measured over the channel. The identity of an analyte is revealed
through its distinctive current signature, notably the duration and
extent of current blocks and the variance of current levels during
its interaction time with the pore. Analytes can be organic and
inorganic small molecules as well as various biological or
synthetic macromolecules and polymers including polynucleotides,
polypeptides and polysaccharides. Nanopore sensing can reveal the
identity and perform single molecule counting of the sensed
analytes, but can also provide information on the analyte
composition such as nucleotide, amino acid or glycan sequence, as
well as the presence of base, amino acid or glycan modifications
such as methylation and acylation, phosphorylation, hydroxylation,
oxidation, reduction, glycosylation, decarboxylation, deamination
and more. Nanopore sensing has the potential to allow rapid and
cheap polynucleotide sequencing, providing single molecule sequence
reads of polynucleotides of tens to tens of thousands of bases in
length.
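The current-signature idea in the paragraph above (duration and extent of current blocks) can be illustrated with a minimal Python sketch. It is not part of the disclosed method: the function name `find_blockades`, the fixed fractional threshold, the sample rate and the toy trace are all hypothetical choices made for illustration, and real nanopore signal processing is considerably more involved.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BlockadeEvent:
    start: int         # index of the first blocked sample
    dwell_s: float     # duration of the block, in seconds
    mean_block: float  # mean fractional blockade, 1 - I / I_open


def find_blockades(current_pa: Sequence[float], open_pore_pa: float,
                   sample_rate_hz: float,
                   threshold: float = 0.2) -> List[BlockadeEvent]:
    """Return contiguous runs of samples whose fractional blockade
    is at or above `threshold` (an assumed, illustrative cut-off)."""
    events: List[BlockadeEvent] = []
    start = None
    # A trailing open-pore sentinel closes an event that runs to the end.
    padded = list(current_pa) + [open_pore_pa]
    for i, value in enumerate(padded):
        blocked = (1.0 - value / open_pore_pa) >= threshold
        if blocked and start is None:
            start = i
        elif not blocked and start is not None:
            segment = padded[start:i]
            depth = 1.0 - (sum(segment) / len(segment)) / open_pore_pa
            events.append(
                BlockadeEvent(start, len(segment) / sample_rate_hz, depth))
            start = None
    return events


# Hypothetical trace in pA: open pore at 100 pA, two blockade events.
trace = [100, 100, 60, 60, 60, 100, 100, 30, 30, 100]
events = find_blockades(trace, open_pore_pa=100.0, sample_rate_hz=1000.0)
```

Each returned event carries a dwell time and a mean blockade depth, the two quantities the paragraph names as distinguishing one analyte from another.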
[0004] Two of the essential components of polymer characterization
using nanopore sensing are (1) the control of polymer movement
through the pore and (2) the discrimination of the composing
building blocks as the polymer is moved through the pore. During
nanopore sensing, the narrowest part of the pore forms the reader
head, the most discriminating part of the nanopore with respect to
the current signatures as a function of the passing analyte. CsgG
was identified as an ungated, non-selective protein secretion
channel from Escherichia coli (Goyal et al., 2014) and has been
used as a nanopore for detecting and characterising analytes.
Mutations to the wild-type CsgG pore that improve the properties of
the pore in this context have also been disclosed (WO2016/034591,
WO2017/149316, WO2017/149317 and WO2017/149318, PCT/GB2018/051191,
all incorporated by reference herein).
[0005] For analytes being polynucleotides, nucleotide
discrimination is achieved via passage through such a mutant pore,
but current signatures have been shown to be sequence dependent,
and multiple nucleotides contributed to the observed current, so
that the height of the channel constriction and extent of the
interaction surface with the analyte affect the relationship
between observed current and polynucleotide sequence. While the
current range for nucleotide discrimination has been improved
through mutation of the CsgG pore, a sequencing system would have
higher performance if the current differences between nucleotides
could be improved further. Accordingly, there is a need to identify
novel ways to improve nanopore sensing features.
SUMMARY
[0006] The disclosure relates to modified CsgF peptides, in
particular truncated CsgF fragments, binding the CsgG pore and
thereby introducing an additional channel or pore constriction
within the CsgG pore. Other aspects of the invention also relate to
an isolated transmembrane pore complex, and the use of said
CsgG:CsgF complexes and modified CsgF peptides or fragments in a
nanopore sensing platform with two consecutive reader heads.
[0007] The first aspect of the invention relates to a pore
comprising a CsgG pore and a CsgF peptide. In one aspect, the CsgF
peptide comprises a CsgG-binding region, and a region that forms a
constriction in the pore. In one aspect the CsgF peptide is a
truncated CsgF peptide lacking the C-terminal head domain of CsgF.
In another aspect the CsgF peptide is a truncated CsgF peptide
lacking the C-terminal head and a portion of the neck domain of
CsgF. In another aspect the CsgF peptide is a truncated CsgF
peptide lacking the C-terminal head and neck domains of CsgF. The
pore is also referred to herein as a pore complex and as an
isolated pore complex. The isolated pore complex comprises a CsgG
pore, or a homologue or mutant thereof, and a modified CsgF
peptide, or a homologue or mutant thereof, in particular truncated
CsgF fragments, or homologues or mutants thereof. In one
embodiment, said modified CsgF peptide, or homologues or mutants,
is located in the lumen of the CsgG pore, or homologues or mutants
thereof. In another embodiment, said isolated pore complex has two
or more channel constrictions, one located or provided by the CsgG
pore, formed by its constriction loop, and another additional
channel constriction or reader head, introduced by the modified
CsgF peptide or its homologues or mutants. In one embodiment, said
CsgG pore or CsgG-like pore is not a wild-type pore but a mutant
CsgG pore; in particular embodiments, mutations are present, for
example, in said channel constriction loop. In another
embodiment, the isolated pore complex, comprising the modified CsgF
peptide, or a homologue or mutant thereof, has a CsgF channel
constriction with a diameter in the range from 0.5 nm to 2.0 nm. In
one embodiment, the pore complex comprises: (i) a CsgG pore
comprising a first opening, a mid-section comprising a beta barrel,
a second opening, and a lumen extending from the first opening
through the mid-section to the second opening, wherein a luminal
surface of the mid-section defines a CsgG constriction; and (ii) a
plurality of modified CsgF peptides, each having a CsgF
constriction region and a CsgF binding region (also referred to
herein as a CsgG-binding domain or region of CsgF), wherein the
modified CsgF peptides form a CsgF constriction within the beta
barrel of the CsgG pore and wherein the CsgG constriction and the
CsgF constriction are co-axially spaced apart within the beta
barrel of the CsgG pore. The luminal surface of the CsgG pore may
comprise one or more loop regions of CsgG monomers that define the
CsgG constriction. The CsgF constriction region and the CsgF
binding region typically correspond to a N-terminal portion of a
CsgF mature peptide. In one embodiment, the pore complex excludes
CsgA, CsgB and CsgE.
[0008] In a second aspect, the invention relates to a modified CsgF
peptide, or a homologue or mutant thereof, wherein the protein or
peptide is modified via a truncation or deletion of part of the
protein, resulting in a CsgF fragment of SEQ ID NO:6 or of a
homologue or mutant thereof. One embodiment relates to a modified
or truncated CsgF peptide, or a modified peptide of a CsgF
homologue or mutant, said modified peptide comprising SEQ ID NO:39,
or SEQ ID NO:40, or a homologue or mutant thereof, alternatively,
said modified peptide comprises SEQ ID NO:15, or a homologue or
mutant thereof, alternatively SEQ ID NO: 54, or SEQ ID NO: 55 or a
homologue or mutant thereof. Another embodiment discloses a
modified CsgF peptide wherein one or more positions in the region
comprising SEQ ID NO:15 are modified, and wherein said mutation(s)
are required to retain a minimum of 35% amino acid identity to SEQ ID
NO:15, in the peptide fragment corresponding to the region
comprising SEQ ID NO:15.
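The 35% identity criterion above can be made concrete with a short Python sketch. It rests on assumptions not stated in the text: the two sequences are taken to be already aligned to equal length with "-" as the gap character, identity is counted over aligned columns that are not gaps in both sequences, and the toy sequences are hypothetical placeholders rather than SEQ ID NO:15 or any sequence from the disclosure. Other identity conventions exist and would give different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumed convention: columns where both sequences have a gap are
    ignored; a gap facing a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             minimum: float = 35.0) -> bool:
    """True if the pair retains at least `minimum` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= minimum


# Hypothetical toy peptides, for illustration only.
reference = "ACDEFGHIKL"
variant = "ACDEYGHIKV"
identity = percent_identity(reference, variant)
```

Under this convention, a candidate modified peptide would be compared against the reference region and accepted only if `meets_identity_threshold` returns true.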
[0009] One embodiment relates to a pore comprising a CsgG pore and
a modified CsgF peptide, wherein the modified CsgF peptide is bound
to CsgG and forms a constriction in the pore.
[0010] One embodiment relates to a polynucleotide encoding said
modified CsgF peptide, or homologue or mutant thereof, according to
the second aspect of the invention. In another embodiment, the
isolated pore complex comprising a CsgG pore and a modified CsgF
peptide, or homologues or mutants thereof, is characterized in
that said modified CsgF peptide is a peptide provided by the
peptides disclosed in the second aspect of the invention.
[0011] Another embodiment relates to the isolated pore complex
wherein the modified CsgF peptide and the CsgG pore or a monomer of
said pore, or homologues or mutants thereof, are covalently
coupled. Even more particularly, said coupling is made via a
cysteine residue or via a non-native reactive or photo-reactive
amino acid in a CsgG monomer at a position corresponding to 132,
133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183,
185, 187, 189, 191, 201, 203, 205, 207 or 209 of SEQ ID NO: 3, or
of a homologue thereof.
[0012] A preferred embodiment relates to an isolated transmembrane
pore complex, or a membranous composition, which comprises the
isolated pore complex of the invention, and the components of a
membrane. Particularly, said transmembrane pore complex or
membranous composition consists of the isolated pore complex of the
invention, and the components of a membrane or an insulating
layer.
[0013] One embodiment relates to a method for producing a pore as
disclosed herein, the method comprising co-expressing one or more
CsgG monomers as disclosed herein and a CsgF peptide as disclosed
herein in a host cell, thereby allowing transmembrane pore complex
formation in the cell. The CsgF peptide may be produced in the cell
by cleaving a modified CsgF peptide or protein comprising an enzyme
cleavage site at a suitable position in the amino acid
sequence.
[0014] One embodiment relates to a method for producing a pore as
disclosed herein, the method comprising contacting one or more
purified CsgG monomers with one or more purified modified CsgF
peptides, thereby allowing in vitro formation of the pore. The
modified CsgF peptide may be a peptide comprising an enzyme
cleavage site at a suitable position in the amino acid sequence,
that is cleaved before or after formation of the pore.
[0015] A third aspect of the invention relates to a method for
producing said transmembrane pore complex, wherein the pore is an
isolated complex formed by a CsgG pore, or a homologue or mutant
thereof, and a modified CsgF peptide, or a homologue or mutant
thereof, the method comprising the steps of co-expressing CsgG SEQ
ID NO:2, or a homologue or mutant thereof, and modified or
truncated CsgF, comprising a fragment of SEQ ID NO:5, or a
homologue or mutant thereof, in a suitable host cell, thereby
allowing in vivo pore complex formation. In specific embodiments,
said modified CsgF peptide, or homologue or mutant thereof,
comprises SEQ ID NO: 12 or SEQ ID NO:14, or a homologue or mutant
thereof. Alternatively, a method for producing an isolated pore
complex comprises the steps of contacting CsgG monomers of SEQ ID
NO:3, or a homologue or mutant thereof, with modified CsgF
peptide(s), or a homologue or mutant thereof, for in vitro
reconstitution of the pore complex. In particular embodiments,
modified CsgF peptides of said method comprise SEQ ID NO:15 or SEQ
ID NO:16, or homologues or mutants thereof.
[0016] Another aspect of the invention relates to a method for
determining the presence, absence or one or more characteristics of
a target analyte, comprising the steps of:
[0017] (i) contacting the target analyte with said isolated pore
complex or transmembrane pore complex, such that the target analyte
moves into the pore channel; and
[0018] (ii) taking one or more measurements as the analyte moves
through the pore channel and thereby determining the presence,
absence or one or more characteristics of the analyte.
[0019] In one embodiment, said analyte is a polynucleotide. In
particular, said method using a polynucleotide as an analyte
alternatively comprises determining one or more characteristics
selected from (i) the length of the polynucleotide, (ii) the
identity of the polynucleotide, (iii) the sequence of the
polynucleotide, (iv) the secondary structure of the polynucleotide
and (v) whether or not the polynucleotide is modified.
[0020] In another embodiment, the analyte is a protein, or peptide,
and in further embodiments, said analyte is a polysaccharide, or a
small organic or inorganic compound, such as for instance but not
limited to pharmacologically active compounds, toxic compounds and
pollutants.
[0021] In another embodiment, a method for characterising a
polynucleotide or a (poly)peptide using an isolated transmembrane
pore complex is described, wherein the pore complex is an isolated
complex comprising a CsgG pore, or a homologue or mutant thereof,
and a modified CsgF peptide, or a homologue or mutant thereof. In
particular, said CsgG pore, or homologue or mutant thereof,
comprises six to ten CsgG monomers forming the CsgG pore
channel.
[0022] A further aspect of the invention discloses the use of said
isolated pore complex or transmembrane pore complex according to
the previous aspects of the invention to determine the presence,
absence or one or more characteristics of a target analyte.
Furthermore, the invention also relates to a kit for characterising
a target analyte comprising (a) said isolated pore complex and (b)
the components of a membrane.
DESCRIPTION OF THE FIGURES
[0023] The drawings described are only schematic and are
non-limiting. In the drawings, the size of some of the elements may
be exaggerated and not drawn to scale for illustrative
purposes.
[0024] FIGS. 1A-1E. Structure of CsgG pore and the interface for
complex formation with CsgF. Cross-sectional (FIG. 1A), side (FIG.
1B) and top (FIG. 1C) views of CsgG oligomers (e.g., nonamers)
(gold) in surface (FIG. 1A) and ribbon (FIG. 1B, FIG. 1C)
representation, with a single CsgG protomer colored light blue
(FIG. 1D) (based on the CsgG X-ray structure PDB entry: 4uv3). The
CsgG constriction loop (CL loop) spans residues 46 to 61 according
to SEQ ID NO:3, and is indicated in dark grey in all panels, and
corresponds to the loop provided in the bottom left of FIG. 1E.
CsgG residues for which the side chain faces the inner lumen of the
CsgG beta-barrel are colored mid-grey as indicated and labelled in
the β strands in FIG. 1E and FIG. 1D. These residues represent
sites that can be used for substitution to natural or non-natural
amino acids, e.g., amenable for attachment (e.g., covalent
crosslinking) of a pore-resident peptide, (including e.g., a
modified CsgF peptide, or a homologue thereof) to a CsgG pore or
monomer. In some embodiments crosslinking residues include Cys and
reactive and photo-reactive amino acids such as
azidohomoalanine, homopropargylglycine, homoallylglycine,
p-acetyl-Phe, p-azido-Phe, p-propargyloxy-Phe and p-benzoyl-Phe
(Wang et al. 2012; Chin et al. 2002) and can be substituted into
positions 132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151,
153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 or 209
according to SEQ ID NO:3. FIG. 1E shows a zoom of the CL loop and
the transmembrane beta-strands of a CsgG monomer. The CsgG
constriction loop (colored dark blue) forms the orifice or
narrowest passage in the CsgG pore (FIG. 1A). In some embodiments,
three positions in the CL loop, 56, 55 and 51 according to SEQ ID
NO:3, are of particular importance to the diameter and chemical and
physical properties of the CsgG channel orifice or "reader head".
These represent preferred positions to alter the nanopore sensing
properties of CsgG pores and homologues.
[0025] FIGS. 2A-2B. CsgG:CsgF complex protein co-expression and
complex purification. FIG. 2A. Schematic representation of the
purification protocol of the CsgG:CsgF complex starting from an E.
coli culture co-expressing CsgG (SEQ ID NO:2+C-terminal StrepII
tag) and CsgF (SEQ ID NO:4+C-terminal 6×His tag). The
protocol involves disrupting resuspended cells and performing a 1%
DDM extraction of the membrane-bound proteins. The CsgG:CsgF
complex and excess CsgF undergo a first enrichment by affinity
purification on a nickel IMAC column, followed by a second
affinity-based enrichment of the CsgG:CsgF complex on a
Streptavidin column. FIG. 2B. Coomassie stained SDS-PAGE of the
IMAC (left) and Streptavidin (right) purification steps. Protein
bands corresponding to CsgG and CsgF are labelled. Notably, the
IMAC eluate contains a N-terminally truncated CsgF fragment
(labelled *) that was not retained in the affinity pulldown using
the CsgG-bound Strep tag, indicating that the CsgF N-terminus is
required for complex formation with CsgG.
[0026] FIGS. 3A-3C. CsgG:CsgF complex protein purification upon in
vitro reconstitution. FIG. 3A. Superimposed chromatograms of Size
Exclusion Chromatography (SEC) runs (using BioRad Enrich 650 10/300
column) of CsgG (light grey) and CsgG supplemented with an excess
CsgF (dark grey). The chromatograms show elution peaks
corresponding to a CsgG 9-mer (a) and CsgG 18-mer (b) for the CsgG
run; and an excess free CsgF (c), as well as a 9-mer CsgG:CsgF
complex (d) and 18-mer CsgG:CsgF complex (e) that elute at higher
hydrodynamic radius (molecular mass) due to the incorporation of CsgF
in the complex. FIG. 3B. Native PAGE analysis of the representative
species labelled in panel (A), confirming the shift to higher
molecular mass due to incorporation of CsgF into the CsgG 9-mer and
CsgG 18-mer complexes. These experiments demonstrate that CsgG:CsgF
complex can be reconstituted in vitro starting from the purified
components. FIG. 3C. Ribbon representation of the CsgG 9-mer and
CsgG 18-mer as previously reported in Goyal et al. 2014 (PDB entry
4uv3). The CsgG 18-mer is formed of a dimer of CsgG 9-mers. The SEC
and native PAGE analyses shown in panels A and B demonstrate that
the CsgG 18-mer is amenable to complex formation with CsgF.
[0027] FIGS. 4A-4B. CsgG:CsgF structure as determined in cryo-EM.
FIG. 4A. A cryo electron micrograph of the CsgG:CsgF complex shows
the presence of 9-mer and 18-mer CsgG:CsgF complexes, with a number
of single particles of the 9- and 18-mer forms highlighted by full
and dashed circles, respectively. FIG. 4B. Two representative class
averages of the CsgG:CsgF 9-mer complex, viewed from the side.
Class averages include 6020 and 4159 individual particles,
respectively. The class averages reveal the presence of additional
density on top of the CsgG particle, corresponding to an oligomeric
complex of CsgF. Three distinct regions can be seen in the CsgF
oligomer: a "head" and "neck" region, as well as a region that
resides inside the lumen of the CsgG beta-barrel and forms a
constriction or narrow passage (labelled F) that is stacked on top
of the constriction formed by the CsgG CL loop (labelled G). This
latter CsgF region is referred to as CsgF Constriction Peptide
(FCP).
[0028] FIG. 5. Three-dimensional structural model of a CsgG:CsgF
complex. Cross-sectional views of the 3D cryoEM electron density of
the CsgG:CsgF 9-mer complex calculated from 20,000 particles
assigned to 21 class averages. The right picture shows a
superimposition with the CsgG 9-mer X-ray structure (PDB entry:
4uv3) docked into the cryoEM density. The regions corresponding to
CsgG, CsgF and the CsgF head, neck and FCP domains are indicated.
The cross-sections show that the CsgF FCP region forms an additional
constriction (labelled F) in the CsgG channel, approximately 2 nm
above the CsgG constriction loop (labelled G).
[0029] FIGS. 6A-6C. Schematic representation of the CsgG:CsgF pore
complex based on the cryo-EM structure. FIG. 6A. Schematic
representation of the CsgG nanopore in cross-sectional view,
harbouring a single constriction, labelled (1). CsgG-based
nanopores form 3.5-4 nm wide channels that contain a 0.5-1.5 nm
orifice made by the CsgG constriction loop (residues 46 to 61
according to SEQ ID NO:3). When in complex with CsgF, a second
constriction or orifice is introduced into the CsgG channel,
labelled (2)/F, and the channel exit becomes occluded by the CsgF
head domain (see FIG. 5). When using modified CsgF peptides, e.g.,
corresponding to a CsgF constriction peptide (FCP), which lacks the
neck and head regions, a CsgG:CsgF pore complex is formed with two
consecutive channel constrictions or orifices ((1) and (2)), as
shown in the cross-sectional view of the CsgG:CsgF cryo-EM density
in FIG. 6B and in schematic representation in FIG. 6C. Removal of
the neck and head regions in the modified CsgF peptide alleviates
their blockage of the channel exit.
[0030] FIGS. 7A-7C. Schematic representation of the use of
CsgG:CsgF pore complex for nanopore sensing applications of a
(bio)polymer (FIG. 7A) or single molecule analytes (FIG. 7B). When
used in polymer sensing, the second channel constriction introduced
by the modified CsgF peptide increases the contact region with the
analyte and forms a second interaction site and reader head. When
used in single molecule nanopore sensing, the second channel
constriction introduced by the modified CsgF peptide, creates a
second independent analyte interaction site. FIG. 7C. Schematic
representation of theoretical channel conductance profiles of small
molecules (indicated by hexagons or triangles) passing and
interacting with the consecutive CsgG (1) and CsgF (2)
constrictions or reader heads.
[0031] FIG. 8. Multiple sequence alignments of exemplary CsgF
homologues. Aligned sequences are shown as mature proteins (i.e.
lacking their N-terminal signal peptide (SP)). The boxed sequences
indicate a CsgF region of sequence conservation (between 35 and
100% pairwise sequence identity--see FIG. 10) that corresponds to a
CsgF constriction peptide (FCP) in some embodiments. CsgF
homologues included in the multiple sequence alignment are Q88H88;
A0A143HJA0; Q5E245; Q084E5; F0LZU2; A0A136HQR0; A0A0W1SRL3; B0UH01;
Q6NAU5; G8PUY5; A0A0S2ETP7; E3I1Z1; F3Z094; A0A176T7M2; D2QPP8;
N2IYT1; W7QHV5; D4ZLW2; D2QT92; A0A167UJA2. The FCP regions of E.
coli CsgF (SEQ ID No:15) and the shown CsgF homologues correspond
to SEQ ID Nos 18-36.
[0032] FIGS. 9A-9C. Experimental evaluation of the E. coli CsgF
region forming the CsgG-interaction sequence and CsgF constriction
peptide (FCP). FIG. 9A shows the mature sequences (i.e. after
removal of the CsgF signal peptide, corresponding to residues 1-19
of SEQ ID NO:5) of the four N-terminal CsgF fragments (SEQ ID
NO:8_CsgF residues 1-27; SEQ ID NO: 10; SEQ ID NO: 12 and SEQ ID
NO: 14) that were co-expressed with E. coli CsgG (SEQ ID NO:2).
FIG. 9B. Anti-Strep (left) and anti-His (right) Western blot
analysis of SDS-PAGE runs of crude cell lysates of CsgG and CsgF
co-expression experiments. Anti-Strep analysis demonstrates the
expression of CsgG in all co-expression experiments, whereas
anti-His Western blot analysis shows detectable levels of CsgF
fragments only for the truncation mutant CsgF 1-64 (SEQ ID NO: 14).
A His-tagged nanobody (Nb) was used as positive control. FIG. 9C.
Anti-His dot blot analysis of the presence of CsgF fragments in
CsgG:CsgF co-expression experiments. Top row shows whole cell
lysates, middle and bottom rows show the eluate and flowthrough of
a Strep affinity pulldown experiment. These data demonstrate that
CsgF fragment 1-64, and to a much lesser extent CsgF 1-48, is
specifically pulled down as a complex with Strep-tagged CsgG. CsgF
fragments 1-27 and 1-38 do not result in detectable levels of the
corresponding CsgF fragments and show no sign of complex formation
with CsgG.
[0033] FIG. 10. Multiple sequence alignment of the CsgF region
forming the CsgG-interaction sequence and CsgF constriction peptide
(FCP). The figure shows a multiple sequence alignment and consensus
sequence of the CsgF peptides and known homologues thereof in the
region that corresponds to the CsgG-interaction. CsgF homologues
are defined by the PFAM domain PF03783. These peptides bind CsgG,
and localize to the lumen of the CsgG β-barrel where they form
an additional constriction in the CsgG channel. These peptides and
homologues thereof are examples of CsgF Constriction Peptides or
FCPs. The pairwise sequence identity in the shown FCPs ranges
between 35 and 98%.
[0034] FIGS. 11A-11D. The high resolution cryoEM structure of the
CsgG:CsgF complex. CsgG is shown in light grey and CsgF is shown in
dark grey. FIG. 11A. Final electron density map of the CsgG:CsgF
complex at 3.4 Å resolution. Side view. FIG. 11B. Top view of
the cryoEM structure to show CsgG:CsgF comprises a 9:9
stoichiometry, with C9 symmetry. FIG. 11C. Internal architecture of
the CsgG:CsgF complex. GC, CsgG constriction, FC, CsgF
constriction. FIG. 11D. Interactions between CsgG and CsgF
proteins. CsgG and the CsgG constriction are coloured light grey
and grey respectively. CsgF is coloured dark grey. Residues in CsgG
and CsgF are labelled in light grey and black respectively.
[0035] FIG. 12. Two reader heads of the CsgG:CsgF complex. CsgG is
shown in light grey and reader head of the CsgG pore is shown in
dark grey. CsgF is shown in black and the reader head of the CsgF
is labelled.
[0036] FIG. 13. Co-expression of CsgG with CsgF WT in vivo. The gene
encoding the C-terminal Strep-tagged CsgG polypeptide in a pT7 vector
with ampicillin resistance and the gene encoding the C-terminal
His-tagged CsgF polypeptide in a pRham vector with kanamycin resistance
were transformed together into E. coli BL21 DE3 cells in the
presence of both ampicillin and kanamycin. Proteins were expressed
overnight at 18° C. at 250 rpm and the CsgG-CsgF complex
was purified using the Strep tag purification followed by His tag
purification. (Gel A) Protein sample (in duplicate) before strep
purification. (Gel B) Protein sample (three elution fractions)
after His purification. Proteins were run on a 4-20% Tris gel.
[0037] FIG. 14. Co-expression of CsgG and CsgF in vitro and the
heat stability of the CsgG-CsgF complex. CsgG and CsgF DNA in
different vectors are co-expressed in an in vitro transcription and
translation reaction. Proteins are radiolabeled with S-35
methionine and exposed onto an X-ray film. Stability of the complex
is assessed by incubating the reaction mixture for 10 minutes at
different temperatures.
[0038] FIGS. 15A-15B. Making CsgG:CsgF complexes using protease
cleavage sites. FIG. 15A. TEV or C3 or any other protease cleavage
site can be incorporated into the CsgF peptide at required sites
(e.g. between 30 and 31, 35 and 36, 40 and 41, or 45 and 46 of seq ID
no. 6). CsgG is shown in gold and CsgF domains in red. 1-35 of one
CsgF subunit is coloured in green for clarity. 36-45 is shown in
purple. 10 Histidine tag is shown in pink and the strep tag on CsgG
is shown in blue. FIG. 15B. SDS-PAGE (4-20% TGX) for protease
cleavage of the full length CsgG:CsgF complex where TEV protease
cleavage site is inserted between 35-36 of seq ID 6. M:molecular
weight marker, Lane 1: post strep purification of CsgG:CsgF full
length complex, Lane 2: post strep concentrated, Lane 3: post gel
filtrated, Lane 4: cleaved with TEV protease to generate CsgG:CsgF
complex, Lane 5: flow through of CsgG:CsgF after strep
purification, Lane 6: CsgG:CsgF heated at 60 °C for 10
minutes. Lane 7: Eluted CsgG:CsgF complex from strep column, Lane
8: CsgG pore as the control, Lane 9: TEV protease as the
control.
[0039] FIGS. 16A-16C. Heat stability of CsgG:CsgF complexes. M:
Molecular weight marker, Lane 1: CsgG pore, Lane 2: CsgG:CsgF
complex at room temperature, Lanes 3-9: CsgG:CsgF sample was heated
at different temperatures (40, 50, 60, 70, 80, 90 and 100 °C
respectively) for 10 minutes. Lane 1:
FIG. 16A.
Y51A/F56Q/N55V/N91R/K94Q/R97W-del(V105-I107):CsgF-(1-45).
FIG. 16B.
Y51A/F56Q/N55V/N91R/K94Q/R97W-del(V105-I107):CsgF-(1-35).
FIG. 16C.
Y51A/F56Q/N55V/N91R/K94Q/R97W-del(V105-I107):CsgF-(1-30).
[0040] Samples were subjected to SDS-PAGE on a 7.5% TGX gel.
CsgG:CsgF complexes with both CsgF-(1-45) and CsgF-(1-35) show a
shift from the CsgG pore band in lanes 1. Therefore, it is clear
that both of those complexes are heat stable up to 90 °C. The
complex and the pore break down to CsgG monomers at 100 °C (lanes
9). Although the same heat stability pattern is seen with the
CsgG:CsgF complex with CsgF-(1-30), it is difficult to see the shift
between the protein bands of the CsgG pore (lane 1) and CsgG-CsgF
complexes (lanes 2-8).
[0041] FIG. 17. CsgG:CsgF formation via in vitro reconstitution
using synthetic CsgF peptides. Native PAGE showing CsgG:CsgF
formation via in vitro reconstitution using wildtype CsgG or a CsgG
mutant with altered constriction
Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107). An Alexa 594-labelled
CsgF peptide corresponding to the first 34 residues of mature CsgF
(Seq ID No 6) was added to purified Strep-tagged CsgG or
Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107) in 50 mM Tris, 100 mM
NaCl, 1 mM EDTA, 5 mM LDAO/C8D4 in a 2:1 molar ratio for 15
minutes at room temperature to allow reconstitution. After pull
down of CsgG-strep on StrepTactin beads, the sample was analysed on
native-PAGE. Both WT and Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)
CsgG bind the CsgF N-terminal peptide as visualised by the
fluorescence tag.
[0042] FIGS. 18A-18B. Stabilising CsgG:CsgF or CsgG:FCP complexes.
FIG. 18A. Identified amino acid positions of CsgG (SEQ ID NO: 3) and
CsgF (SEQ ID NO: 6) pairs where S--S bonds can be made. FIG. 18B.
Schematic representation to show the S--S bond between CsgG-Q153C
and CsgF-G1C.
[0043] FIGS. 19A-19B. Cysteine cross linking of the CsgG:CsgF
complex. FIG. 19A. Y51A/F56Q/N91R/K94Q/R97W/Q153C-del(V105-I107)
and CsgF-G1C proteins were purified separately and incubated
together at 4 °C for 1 hour or overnight to form the
complex and allow S--S formation. No oxidising agents were added to
promote S--S formation. Control CsgG pore
(Y51A/F56Q/N91R/K94Q/R97W/Q153C-DEL(V105-I107)) and complex (with
and without DTT) were heated at 100 °C for 10 minutes to
break down the complex into CsgG monomer (CsgG.sub.m, 30 kDa) and
CsgF monomer (CsgF.sub.m, 15 kDa). A dimer between the CsgG.sub.m
and CsgF.sub.m (CsgG.sub.m-CsgF.sub.m, 45 kDa) can be seen in the
absence of the reducing agents confirming the S--S bond formation.
Increased dimer formation can be seen in overnight incubation
compared to one hour incubation. FIG. 19B. Mass spectrometry
analysis was carried out on the gel purified CsgG.sub.m-CsgF.sub.m
band from overnight incubation. Protein was proteolytically cleaved
to generate tryptic peptides. LC-MS/MS sequencing methods were
performed, resulting in the identification of the precursor ion
above, corresponding to the linked peptides shown. This precursor
ion was fragmented to give the fragment ions observed. These
include ions for each of the peptides, as well as fragments
incorporating the intact disulphide bond. This data provides strong
evidence for the presence of a disulphide bond between C1 of CsgF
and C153 of CsgG.
[0044] FIG. 20. Improving the efficiency of Cysteine cross linking
of the CsgG:CsgF complex. Lane 1:
Y51A/F56Q/N91R/K94Q/R97W/N133C-del(V105-I107) and CsgF-T4C proteins
were co-expressed and the CsgG:CsgF complex was purified. Lane 2: The
complex was heated in the presence of DTT to break down the complex
into constituent monomers (CsgG.sub.m and CsgF.sub.m). DTT will
break down any S--S bonds between CsgG-N133C and CsgF-T4C if
formed. Lane 3: The complex is incubated with the oxidising agent
copper-orthophenanthroline to promote S--S bond formation. Lane 4:
Oxidised sample was heated at 100 °C in the absence of DTT
to break down the complex. A new band of 45 kDa corresponding to the
CsgG.sub.m-CsgF.sub.m appears confirming the S--S bond
formation.
[0045] FIGS. 21A-21B. Current signature when the DNA strand is
passing through the CsgG:CsgF complex. The complexes were made by
co-expressing the CsgG pore
(Y51A/F56Q/N91R/K94Q/R97W-del(V105-I107)) containing the C terminal
strep tag with the full length CsgF proteins containing C terminal
His tag and TEV protease cleavage site between 35 and 36 of seq ID
no. 6. Purified complexes were then cleaved by TEV protease to make
the given CsgG:CsgF complexes. Note that TEV cleavage leaves ENLYFQ
sequence at the cleavage site. FIG. 21A. No mutations at 17
position of CsgF. FIG. 21B. N17S mutation in CsgF.
[0046] FIGS. 22A-22B. Current signature when the DNA strand is
passing through the CsgG:CsgF complex. The complexes were made by
incubating Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107) pore
containing the C terminal strep tag with CsgF-(1-35) mutants. FIG.
22A. CsgF-N17S-(1-35). FIG. 22B. CsgF-N17V-(1-35).
[0047] FIGS. 23A-23F. Current signature when the DNA strand is
passing through the CsgG:CsgF complex. The complexes were made by
incubating different CsgG pores containing the C terminal strep tag
with CsgF-N17S-(1-35). FIG. 23A. CsgG pore is
Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107). FIG. 23B. CsgG pore
is Y51T/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107). FIG. 23C. CsgG
pore is Y51A/N55I/F56Q/N91R/K94Q/R97W-del(V105-I107). FIG. 23D.
CsgG pore is Y51A/F56A/N91R/K94Q/R97W-del(V105-I107). FIG. 23E.
CsgG pore is Y51A/F56I/N91R/K94Q/R97W-del(V105-I107). FIG. 23F.
CsgG pore is Y51S/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107).
[0048] FIGS. 24A-24C. Current signature when the DNA strand is
passing through the CsgG:CsgF complex. Complexes were made by
incubating the E. coli purified
Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107) pore containing the C
terminal strep with CsgF of three different lengths. FIG. 24A.
CsgF-(1-29), FIG. 24B. CsgF-(1-35), FIG. 24C. CsgF-(1-45). The
arrow indicates the range of the signal. Surprisingly, the complex with
CsgF-(1-29) produces the signal with the largest range.
[0049] FIG. 25. Signal:noise of the current signature when the DNA
strand is passing through the CsgG:CsgF complex. Different
CsgG:CsgF complexes were made by incubating different CsgG pores
(1-Y51A/F56Q/N91R/K94Q/R97W-del(V105-I107)
2-Y51A/N55I/F56Q/N91R/K94Q/R97W-del(V105-I107)
3-Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107)
4-Y51A/F56A/N91R/K94Q/R97W-del(V105-I107)
5-Y51A/F56I/N91R/K94Q/R97W-del(V105-I107)
6-Y51A/F56V/N91R/K94Q/R97W-del(V105-I107)
7-Y51S/N55A/F56Q/N91R/K94Q/R97W-del(V105-I107)
8-Y51S/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107)
9-Y51T/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107)) with the same CsgF
peptide CsgF-(1-35). Different squiggle patterns were observed in
DNA translocation experiments and their signal:noise was measured.
Higher accuracies can be obtained with larger signal:noise
ratios.
[0050] FIGS. 26A-26B. Sequencing errors with narrow reader-heads.
FIG. 26A. Representation of DNA base interaction with the reader
head of the CsgG pore. Approximately 5 bases dominate the current
signal at any given time when the DNA strand is translocating
through the pore. FIG. 26B. Mapping plots of the signal.
Event-detected signal for multiple reads mapped to modelled signal
using a custom HMM, for a mixed sequence lacking homopolymer runs,
and for a sequence containing three homopolymer runs of 10 T.
[0051] FIGS. 27A-27D. Mapping the reader heads of the CsgG:CsgF
complex. FIG. 27A. Reader head discrimination plot for the
CsgG:CsgF complex. The average variation in modelled current when
the base at each read head position is varied. To calculate the
read head discrimination at position i for a model of length k with
alphabet of length n, we define the discrimination at read-head
position i as the median of the standard deviations in current level
for each of the n.sup.k-1 groups of size n where position i is varied
while other positions are held constant. FIG. 27B. Static DNA
strands to map the reader head: A set of polyA DNA strands (SS20 to
SS38) in which one base is missing from the DNA backbone (iSpc3) is
created. In each strand, the position of iSpc3 moves from 3' end
towards the 5' end. Based on previous experiments with the CsgG
pore, the 7th position of the DNA is expected to be located within
the CsgG constriction. SS26, which corresponds to this position, is highlighted.
Based on the model from FIG. 27A, 4-5 bases are expected to
separate CsgG and CsgF reader heads. Therefore, approximately,
position 12 and 13 are expected to be within the CsgF constriction.
SS31 and SS32 DNA strands corresponding to those positions are
highlighted. FIGS. 27C-27D. Mapping the two reader heads: Biotin
modification at the 3' end of each strand is complexed with
monovalent streptavidin and the current blockage generated from
each strand is recorded in a MinION set up. When the iSpc3 position
is present above or below the constriction within the pore, no
deflection is expected. However, when the iSpc3 is located within
the constriction, a higher current level is expected to pass
through the pore, because the extra space created by the missing base lets
more ions pass through. Therefore, by plotting the current
passing through with each DNA strand, the locations of the two
reader heads can be mapped. As expected, the highest deflection in
the current is seen when the position 7 of the DNA strand is
occupied by iSpc3 (FIG. 27C). iSpc3 at positions 6 and 8 also
produce a higher deflection over the average polyA current level.
Therefore, positions 6, 7 and 8 of the DNA strand represent the
first reader head, the CsgG reader head. As expected, when positions
12 and 13 are occupied by iSpc3, another deviation
from baseline polyA is observed (FIG. 27D). This indicates the
second reader head of the pore, the CsgF reader head. Results also
confirm that the two reader heads are apart by approximately 4-5
bases.
[0052] FIGS. 28A-28B. Reader head discrimination and base
contribution. Left hand panel demonstrates the read-head
discrimination of each mutant pore: the average variation in
modelled current when the base at each read head position is
varied. To calculate the read head discrimination at position i for
a model of length k with alphabet of length n, we define the
discrimination at read-head position i as the median of the
standard deviations in current level for each of the n.sup.k-1
groups of size n where position i is varied while other positions
are held constant. Right hand panel demonstrates the base
contribution plot: Median current over all sequence contexts with
base b (A, T, G or C) at position i of the reader head.
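By way of illustration only, the read-head discrimination and base contribution defined above can be sketched in Python as follows. The function names, the dictionary representation of the k-mer model and the toy current levels are assumptions made for this sketch and are not taken from the specification; the population standard deviation is likewise an assumed choice, since the text does not state which estimator is used.

```python
from itertools import product
from statistics import median, pstdev

ALPHABET = "ACGT"  # alphabet of length n = 4


def read_head_discrimination(model, k, alphabet=ALPHABET):
    """Return one discrimination value per read-head position i (0-based).

    `model` maps every k-mer string to a modelled current level.
    """
    result = []
    for i in range(k):
        spreads = []
        # Each choice of the other k-1 bases defines one group of n k-mers,
        # giving n^(k-1) groups in which only position i is varied.
        for rest in product(alphabet, repeat=k - 1):
            levels = [
                model["".join(rest[:i]) + b + "".join(rest[i:])]
                for b in alphabet
            ]
            spreads.append(pstdev(levels))  # assumed: population std. dev.
        result.append(median(spreads))
    return result


def base_contribution(model, k, alphabet=ALPHABET):
    """Median current over all sequence contexts with base b at position i."""
    return {
        b: [median(v for kmer, v in model.items() if kmer[i] == b)
            for i in range(k)]
        for b in alphabet
    }


# Hypothetical toy model: a 3-mer whose current depends only on the middle base.
levels = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
toy = {"".join(p): levels[p[1]] for p in product(ALPHABET, repeat=3)}

print(read_head_discrimination(toy, 3))
print(base_contribution(toy, 3)["G"])
```

In this toy model only the middle position shows non-zero discrimination, which is the pattern a single narrow reader head would be expected to give; a pore with two reader heads would be expected to show two separated groups of positions with elevated discrimination.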
[0053] FIGS. 29A-29C. Error profiles of the double reader head
pore. FIG. 29A. Schematic representation of the CsgG:CsgF complex
and the interaction of bases of the DNA with the two reader heads.
Red: strong interactions, orange: weak interactions, grey: no
interactions. FIG. 29B. Comparison of errors in deletions. Reads
from Y51A/F56Q/N91R/K94Q/R97W/R192D-del(V105-I107) and
Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107): CsgF-N17S-(1-35)
pores were base called from the same region of E. coli DNA. Reads
were aligned to the reference genome using Minimap2
(arxiv.org/abs/1708.01492), and the resultant alignments were
visualised in Savant Genome Browser
(www.ncbi.nlm.nih.gov/pubmed/20562449). The majority of
Y51A/F56Q/N91R/K94Q/R97W/R192D-del(V105-I107) reads contain a
single base deletion (black boxes) in the T homopolymer, which is
not present in the majority of CsgG:CsgF reads. FIG. 29C.
Comparison of the consensus accuracy from unpolished data generated
from Y51A/F56Q/N91R/K94Q/R97W/R192D-del(V105-I107) (blue) and
Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107):CsgF-N17S-(1-35) pores
(green) against the length of homopolymers.
[0054] FIGS. 30A-30H. Homopolymer calling of CsgG:CsgF complex.
FIG. 30A. DNA with the sequence shown in (A) is translocated
through the Y51A/F56Q/N91R/K94Q/R97W/R192D-del(V105-I107) pore (B)
and the
Y51A/N55V/F56Q/N91R/K94Q/R97W-del(V105-I107):CsgF-N17S-(1-35) pore
(C) and their signal was analysed for the first polyT section shown
in red in (A). FIGS. 30B-30H. When the polyT section is passing
through the CsgG pore which contains a single reader head (model is
based on 5 bases located in the reader head), it generates a flat
line in the signal. Therefore, it is difficult to determine the
exact number of bases in this region which usually causes deletion
errors. When the DNA is passing through the CsgG:CsgF complex which
contains two reader heads (model is based on 9 bases located within
and in between the two reader heads), polyT section shows multiple
steps instead of a flat line. Information in these steps can be
used to correctly identify the number of bases in the homopolymeric
region. This additional information significantly reduces deletion
errors and improves overall consensus accuracy.
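The effect described above can be illustrated with a deliberately simplified sketch, which assumes that the modelled current depends only on the k bases currently sensed by the pore (5 for the single reader head model, 9 for the two reader head model). Under that assumption, consecutive identical k-mers give identical current levels, i.e. a flat line. The function name and the example sequence are hypothetical, and real signals depend on more than k-mer identity.

```python
def longest_flat_run(sequence, k):
    """Length of the longest run of consecutive identical k-mers.

    Under the simplifying assumption that the current depends only on the
    k-mer being sensed, this is the number of steps that look identical.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        return 0
    longest = run = 1
    for prev, cur in zip(kmers, kmers[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


# Hypothetical strand containing a homopolymer run of 10 T.
seq = "ACGA" + "T" * 10 + "GCAC"
print(longest_flat_run(seq, 5))  # single reader head model (5 bases)
print(longest_flat_run(seq, 9))  # two reader head model (9 bases)
```

For a homopolymer of length L flanked by other bases, the number of indistinguishable steps in this simplified picture is L - k + 1, so a longer effective window leaves fewer steps in which the signal carries no information about the run length.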
[0055] FIGS. 31A-31C. Characterisation of the CsgG pore
(Y51A/F56Q/N91R/K94Q/R97W-del(V105-I107)). FIG. 31A. Reader head
discrimination of the CsgG pore. The average variation in modelled
current when the base at each read head position is varied. To
calculate the read head discrimination at position i for a model of
length k with alphabet of length n, we define the discrimination at
read-head position i as the median of the standard deviations in
current level for each of the n.sup.k-1 groups of size n where
position i is varied while other positions are held constant. FIG.
31B. Base contribution plot of the CsgG pore. Median current over
all k-mers with base b (A, T, G or C) at position i of the reader
head. FIG. 31C. Current signature when the DNA strand is passing
through the CsgG pore.
DETAILED DESCRIPTION
[0056] The present invention will be described with respect to
particular embodiments and with reference to certain drawings but
the invention is not limited thereto but only by the claims. Any
reference signs in the claims shall not be construed as limiting
the scope. Of course, it is to be understood that not necessarily
all aspects or advantages may be achieved in accordance with any
particular embodiment of the invention. Thus, for example those
skilled in the art will recognize that the invention may be
embodied or carried out in a manner that achieves or optimizes one
advantage or group of advantages as taught herein without
necessarily achieving other aspects or advantages as may be taught
or suggested herein.
[0057] The invention, both as to organization and method of
operation, together with features and advantages thereof, may best
be understood by reference to the following detailed description
when read in conjunction with the accompanying drawings. The
aspects and advantages of the invention will be apparent from and
elucidated with reference to the embodiment(s) described
hereinafter. Reference throughout this specification to "one
embodiment" or "an embodiment" means that a particular feature,
structure or characteristic described in connection with the
embodiment is included in at least one embodiment of the present
invention. Thus, appearances of the phrases "in one embodiment" or
"in an embodiment" in various places throughout this specification
are not necessarily all referring to the same embodiment, but may.
Similarly, it should be appreciated that in the description of
exemplary embodiments of the invention, various features of the
invention are sometimes grouped together in a single embodiment,
figure, or description thereof for the purpose of streamlining the
disclosure and aiding in the understanding of one or more of the
various inventive aspects. This method of disclosure, however, is
not to be interpreted as reflecting an intention that the claimed
invention requires more features than are expressly recited in each
claim. Rather, as the following claims reflect, inventive aspects
lie in less than all features of a single foregoing disclosed
embodiment.
[0058] In addition, as used in this specification and the appended
claims, the singular forms "a", "an", and "the" include plural
referents unless the content clearly dictates otherwise. Thus, for
example, reference to "a polynucleotide" includes two or more
polynucleotides, reference to "a polynucleotide binding protein"
includes two or more such proteins, reference to "a helicase"
includes two or more helicases, reference to "a monomer" includes
two or more monomers, reference to "a pore" includes two or more
pores and the like.
[0059] In all of the discussion herein, the standard one letter
codes for amino acids are used. These are as follows: alanine (A),
arginine (R), asparagine (N), aspartic acid (D), cysteine (C),
glutamic acid (E), glutamine (Q), glycine (G), histidine (H),
isoleucine (I), leucine (L), lysine (K), methionine (M),
phenylalanine (F), proline (P), serine (S), threonine (T),
tryptophan (W), tyrosine (Y) and valine (V). Standard substitution
notation is also used, i.e. Q42R means that Q at position 42 is
replaced with R.
[0060] In the paragraphs herein where different amino acids at a
specific position are separated by the / symbol, the / symbol means
"or". For instance, Q87R/K means Q87R or Q87K.
[0061] In the paragraphs herein where different positions are
separated by the / symbol, the / symbol means "and" such that
Y51/N55 is Y51 and N55.
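As a non-limiting illustration of the notation conventions set out above, the following Python sketch parses a variant name such as those used in the figure descriptions into its substitutions and optional deletion. The function name, the regular expressions and the returned data layout are assumptions made for this sketch only; in particular, treating a lone letter after a "/" as an alternative replacement at the preceding position is one possible reading of the "or" convention.

```python
import re

# One substitution, e.g. Q42R: wild-type residue, position, replacement.
SUB = re.compile(r"^([A-Z])(\d+)([A-Z])$")
# Trailing deletion, e.g. -del(V105-I107); case-insensitive to accept -DEL(...).
DEL = re.compile(r"-del\(([A-Z])(\d+)-([A-Z])(\d+)\)$", re.IGNORECASE)


def parse_variant(text):
    """Parse e.g. 'Y51A/F56Q-del(V105-I107)' into (substitutions, deletion)."""
    deletion = None
    m = DEL.search(text)
    if m:
        deletion = (int(m.group(2)), int(m.group(4)))  # first and last position
        text = text[: m.start()]
    subs = []  # entries: (wild-type residue, position, [allowed replacements])
    for token in text.split("/"):
        if not token:
            continue
        m = SUB.match(token)
        if m:
            # '/' between complete substitutions means "and".
            subs.append((m.group(1), int(m.group(2)), [m.group(3)]))
        elif len(token) == 1 and token.isalpha() and subs:
            # '/' between amino acids at one position means "or" (Q87R/K).
            subs[-1][2].append(token)
        else:
            raise ValueError(f"unrecognised token: {token!r}")
    return subs, deletion


print(parse_variant("Y51A/F56Q/N91R/K94Q/R97W-del(V105-I107)"))
print(parse_variant("Q87R/K"))
```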
[0062] All amino-acid substitutions, deletions and/or additions
disclosed herein are with reference to a mutant CsgG monomer
comprising a variant of the sequence shown in SEQ ID NO: 3, unless
stated to the contrary.
[0063] Reference to a mutant CsgG monomer comprising a variant of
the sequence shown in SEQ ID NO: 3 encompasses mutant CsgG monomers
comprising variants of sequences as set out in the further SEQ ID
NOS as disclosed below. Amino-acid substitutions, deletions and/or
additions may be made to CsgG monomers comprising a variant of the
sequence other than shown in SEQ ID NO: 3 that are equivalent to
those substitutions, deletions and/or additions disclosed herein
with reference to a mutant CsgG monomer comprising a variant of the
sequence shown in SEQ ID NO: 3.
[0064] All publications, patents and patent applications cited
herein, whether supra or infra, are hereby incorporated by
reference in their entirety.
Definitions
[0065] Where an indefinite or definite article is used when
referring to a singular noun e.g. "a" or "an", "the", this includes
a plural of that noun unless something else is specifically stated.
Where the term "comprising" is used in the present description and
claims, it does not exclude other elements or steps. Furthermore,
the terms first, second, third and the like in the description and
in the claims, are used for distinguishing between similar elements
and not necessarily for describing a sequential or chronological
order. It is to be understood that the terms so used are
interchangeable under appropriate circumstances and that the
embodiments of the invention described herein are capable of
operation in other sequences than described or illustrated herein.
The following terms or definitions are provided solely to aid in
the understanding of the invention. Unless specifically defined
herein, all terms used herein have the same meaning as they would
to one skilled in the art of the present invention. Practitioners
are particularly directed to Sambrook et al., Molecular Cloning: A
Laboratory Manual, 4th ed., Cold Spring Harbor Press,
Plainview, N.Y. (2012); and Ausubel et al., Current Protocols in
Molecular Biology (Supplement 114), John Wiley & Sons, New York
(2016), for definitions and terms of the art. The definitions
provided herein should not be construed to have a scope less than
understood by a person of ordinary skill in the art.
[0066] "About" as used herein when referring to a measurable value
such as an amount, a temporal duration, and the like, is meant to
encompass variations of ±20% or ±10%, more preferably ±5%,
even more preferably ±1%, and still more preferably ±0.1%
from the specified value, as such variations are appropriate to
perform the disclosed methods.
[0067] "Nucleotide sequence", "DNA sequence" or "nucleic acid
molecule(s)" as used herein refers to a polymeric form of
nucleotides of any length, either ribonucleotides or
deoxyribonucleotides. This term refers only to the primary
structure of the molecule. Thus, this term includes double- and
single-stranded DNA, and RNA. The term "nucleic acid" as used
herein, is a single or double stranded covalently-linked sequence
of nucleotides in which the 3' and 5' ends on each nucleotide are
joined by phosphodiester bonds. The polynucleotide may be made up
of deoxyribonucleotide bases or ribonucleotide bases. Nucleic acids
may be manufactured synthetically in vitro or isolated from natural
sources. Nucleic acids may further include modified DNA or RNA, for
example DNA or RNA that has been methylated, or RNA that has been
subject to post-transcriptional modification, for example 5'-capping
with 7-methylguanosine, 3'-processing such as cleavage and
polyadenylation, and splicing. Nucleic acids may also include
synthetic nucleic acids (XNA), such as hexitol nucleic acid (HNA),
cyclohexene nucleic acid (CeNA), threose nucleic acid (TNA),
glycerol nucleic acid (GNA), locked nucleic acid (LNA) and peptide
nucleic acid (PNA). Sizes of nucleic acids, also referred to herein
as "polynucleotides" are typically expressed as the number of base
pairs (bp) for double stranded polynucleotides, or in the case of
single stranded polynucleotides as the number of nucleotides (nt).
One thousand bp or nt equal a kilobase (kb). Polynucleotides of
less than around 40 nucleotides in length are typically called
"oligonucleotides" and may comprise primers for use in manipulation
of DNA such as via polymerase chain reaction (PCR).
[0068] "Gene" as used here includes both the promoter region of the
gene as well as the coding sequence. It refers both to the genomic
sequence (including possible introns) as well as to the cDNA
derived from the spliced messenger, operably linked to a promoter
sequence.
[0069] "Coding sequence" is a nucleotide sequence, which is
transcribed into mRNA and/or translated into a polypeptide when
placed under the control of appropriate regulatory sequences. The
boundaries of the coding sequence are determined by a translation
start codon at the 5'-terminus and a translation stop codon at the
3'-terminus. A coding sequence can include, but is not limited to
mRNA, cDNA, recombinant nucleotide sequences or genomic DNA, while
introns may be present as well under certain circumstances.
[0070] The term "amino acid" in the context of the present
disclosure is used in its broadest sense and is meant to include
organic compounds containing amine (NH2) and carboxyl (COOH)
functional groups, along with a side chain (e.g., an R group)
specific to each amino acid. In some embodiments, the amino acids
refer to naturally occurring L-α-amino acids or residues. The
commonly used one and three letter abbreviations for naturally
occurring amino acids are used herein: A=Ala; C=Cys; D=Asp; E=Glu;
F=Phe; G=Gly; H=His; I=Ile; K=Lys; L=Leu; M=Met; N=Asn; P=Pro;
Q=Gln; R=Arg; S=Ser; T=Thr; V=Val; W=Trp; and Y=Tyr (Lehninger, A.
L., (1975) Biochemistry, 2d ed., pp. 71-92, Worth Publishers, New
York). The general term "amino acid" further includes D-amino
acids, retro-inverso amino acids as well as chemically modified
amino acids such as amino acid analogues, naturally occurring amino
acids that are not usually incorporated into proteins such as
norleucine, and chemically synthesised compounds having properties
known in the art to be characteristic of an amino acid, such as
β-amino acids. For example, analogues or mimetics of
phenylalanine or proline, which allow the same conformational
restriction of the peptide compounds as do natural Phe or Pro, are
included within the definition of amino acid. Such analogues and
mimetics are referred to herein as "functional equivalents" of the
respective amino acid. Other examples of amino acids are listed by
Roberts and Vellaccio, The Peptides: Analysis, Synthesis, Biology,
Gross and Meienhofer, eds., Vol. 5 p. 341, Academic Press, Inc.,
N.Y. 1983, which is incorporated herein by reference.
[0071] The terms "protein", "polypeptide", and "peptide" are
interchangeably used further herein to refer to a polymer of amino
acid residues and to variants and synthetic analogues of the same.
Thus, these terms apply to amino acid polymers in which one or more
amino acid residues is a synthetic non-naturally occurring amino
acid, such as a chemical analogue of a corresponding naturally
occurring amino acid, as well as to naturally-occurring amino acid
polymers. Polypeptides can also undergo maturation or
post-translational modification processes that may include, but are
not limited to: glycosylation, proteolytic cleavage, lipidization,
signal peptide cleavage, propeptide cleavage, phosphorylation, and
such like. By "recombinant polypeptide" is meant a polypeptide made
using recombinant techniques, e.g., through the expression of a
recombinant or synthetic polynucleotide. When the chimeric
polypeptide or biologically active portion thereof is recombinantly
produced, it is also preferably substantially free of culture
medium, e.g., culture medium represents less than about 20%, more
preferably less than about 10%, and most preferably less than about
5% of the volume of the protein preparation. By "isolated" is meant
material that is substantially or essentially free from components
that normally accompany it in its native state. For example, an
"isolated polypeptide", as used herein, refers to a polypeptide,
which has been purified from the molecules which flank it in a
naturally-occurring state, e.g., a protein complex or CsgF peptide
which has been removed from the molecules present in the production
host that are adjacent to said polypeptide. An isolated CsgF
peptide (optionally a truncated CsgF peptide) can be generated by
amino acid chemical synthesis or can be generated by recombinant
production. An isolated complex can be generated by in vitro
reconstitution after purification of the components of the complex,
e.g. a CsgG pore and the CsgF peptide(s), or can be generated by
recombinant co-expression.
[0072] "Orthologues" and "paralogues" encompass evolutionary
concepts used to describe the ancestral relationships of genes.
Paralogues are genes within the same species that have originated
through duplication of an ancestral gene; orthologues are genes
from different organisms that have originated through speciation,
and are also derived from a common ancestral gene.
[0073] "Homologue", "Homologues" of a protein encompass peptides,
oligopeptides, polypeptides, proteins and enzymes having amino acid
substitutions, deletions and/or insertions relative to the
unmodified or wild-type protein in question and having similar
biological and functional activity as the unmodified protein from
which they are derived. The term "amino acid identity" as used
herein refers to the extent that sequences are identical on an
amino acid-by-amino acid basis over a window of comparison. Thus, a
"percentage of sequence identity" is calculated by comparing two
optimally aligned sequences over the window of comparison,
determining the number of positions at which the identical amino
acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe,
Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in
both sequences to yield the number of matched positions, dividing
the number of matched positions by the total number of positions in
the window of comparison (i.e., the window size), and multiplying
the result by 100 to yield the percentage of sequence identity.
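The calculation described above can be sketched as follows. This is a minimal illustration which assumes that the two sequences have already been optimally aligned (with "-" marking gaps) and that the window of comparison is the full aligned length; the function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of sequence identity over an already-aligned window.

    Both inputs must have the same length; '-' marks an alignment gap and
    never counts as a matched position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("window of comparison is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    # matched positions / window size, multiplied by 100
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACDEFGHIKL", "ACDQFGHIRL"))  # 8 of 10 positions match
print(percent_identity("AC-EF", "ACDEF"))            # the gap counts as a mismatch
```

Restricting both inputs to the same sub-range before calling the function gives the identity for a particular region or domain rather than for the full-length sequence.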
[0074] The term "CsgG pore" defines a pore comprising multiple CsgG
monomers. Each CsgG monomer may be a wild-type monomer from E. coli
(SEQ ID NO: 3), wild-type homologues of E. coli CsgG, such as for
example, monomers having any one of the amino acid sequences shown
in SEQ ID NOS: 68 to 88, or a variant of any thereof (e.g. a
variant of any one of SEQ ID NOs: 3 and 68 to 88). The variant CsgG
monomer may also be referred to as a modified CsgG monomer or a
mutant CsgG monomer. The modifications, or mutations, in the
variant include but are not limited to any one or more of the
modifications disclosed herein, or combinations of said
modifications.
[0075] For all aspects and embodiments of the present invention, a
CsgG homologue is referred to as a polypeptide that has at least
50%, 60%, 70%, 80%, 90%, 95% or 99% complete sequence identity to
wild-type E. coli CsgG as shown in SEQ ID NO: 3. A CsgG homologue
is also referred to as a polypeptide that contains the PFAM domain
PF03783, which is characteristic for CsgG-like proteins. A list of
presently known CsgG homologues and CsgG architectures can be found
at pfam.xfam.org//family/PF03783. Likewise, a CsgG homologous
polynucleotide can comprise a polynucleotide that has at least 50%,
60%, 70%, 80%, 90%, 95% or 99% complete sequence identity to
wild-type E. coli CsgG as shown in SEQ ID NO: 1. Examples of
homologues of CsgG shown in SEQ ID NO:3 have the sequences shown in
SEQ ID NOS: 68 to 88.
[0076] The term "modified CsgF peptide" or "CsgF peptide" defines
a CsgF peptide that has been truncated from its C-terminal end (e.g.
is an N-terminal fragment) and/or is modified to include a cleavage
site. The CsgF peptide may be a fragment of wild-type E. coli CsgF
(SEQ ID NO: 5 or SEQ ID NO: 6), or of a wild-type homologue of E.
coli CsgF, such as for example, a peptide comprising any one of the
amino acid sequences shown in SEQ ID NOS: 17 to 36, or a variant
(e.g. one modified to include a cleavage site) of any thereof.
[0077] For all aspects and embodiments of the present invention, a
CsgF homologue is referred to as a polypeptide that has at least
50%, 60%, 70%, 80%, 90%, 95% or 99% complete sequence identity to
wild-type E. coli CsgF as shown in SEQ ID NO: 6. In some
embodiments, a CsgF homologue is also referred to as a polypeptide
that contains the PFAM domain PF10614, which is characteristic for
CsgF-like proteins. A list of presently known CsgF homologues and
CsgF architectures can be found at pfam.xfam.org//family/PF10614.
Likewise, a CsgF homologous polynucleotide can comprise a
polynucleotide that has at least 50%, 60%, 70%, 80%, 90%, 95% or
99% complete sequence identity to wild-type E. coli CsgF as shown
in SEQ ID NO: 4. Examples of truncated regions of homologues of
CsgF shown in SEQ ID NO:6 have the sequences shown in SEQ ID NOs:17
to 36.
[0078] The term "N-terminal portion of a CsgF mature peptide"
refers to a peptide having an amino acid sequence that corresponds
to the first 60, 50, or 40 amino acid residues starting from the
N-terminus of a CsgF mature peptide (without a signal sequence).
The CsgF mature peptide can be a wild-type or mutant (e.g., with
one or more mutations).
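The definition above can be illustrated with a minimal Python sketch, assuming the mature peptide is held as a one-letter amino acid string with the signal sequence already removed. The function name `n_terminal_portion` and the placeholder sequence are hypothetical and are used for illustration only; the placeholder is not the sequence of SEQ ID NO: 6.

```python
def n_terminal_portion(mature_seq: str, length: int) -> str:
    """Return the first `length` residues of a mature (signal-free) peptide."""
    # The definition above names portions of 60, 50 or 40 residues.
    if length not in (60, 50, 40):
        raise ValueError("the definition above names 60, 50 or 40 residues")
    if len(mature_seq) < length:
        raise ValueError("sequence is shorter than the requested portion")
    # Residue 1 of the mature peptide is index 0 of the string.
    return mature_seq[:length]


# Placeholder residues only; this is NOT the sequence of SEQ ID NO: 6.
placeholder_mature = "ACDEFGHIKLMNPQRSTVWY" * 4  # 80 residues

portion_40 = n_terminal_portion(placeholder_mature, 40)
```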
Sequence identity can also be to a fragment or portion of the full
length polynucleotide or polypeptide. Hence, a sequence may have
only 50% overall sequence identity with a full length reference
sequence, but a sequence of a particular region, domain or subunit
could share 80%, 90%, or as much as 99% sequence identity with the
reference sequence. Homology to the nucleic acid sequence of SEQ ID
NO: 1 for CsgG homologues or SEQ ID NO:4 for CsgF homologues,
respectively, is not limited simply to sequence identity. Many
nucleic acid sequences can demonstrate biologically significant
homology to each other despite having apparently low sequence
identity. Homologous nucleic acid sequences are considered to be
those that will hybridise to each other under conditions of low
stringency (M. R. Green, J. Sambrook, 2012, Molecular Cloning: A
Laboratory Manual, Fourth Edition, Books 1-3, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y.).
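The distinction drawn above between overall and regional sequence identity can be sketched as follows. This is a minimal illustration, assuming two sequences that have already been aligned to equal length with "-" as the gap character, and assuming identity is counted over all aligned columns (other denominators are in common use). The function name and the toy sequences are hypothetical and do not correspond to any SEQ ID NO of this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns in which both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy aligned sequences: the first ten columns match, the last ten do not.
reference = "ACDEFGHIKL" + "MNPQRSTVWY"
candidate = "ACDEFGHIKL" + "YWVTSRQPNM"

overall = percent_identity(reference, candidate)            # whole alignment
region = percent_identity(reference[:10], candidate[:10])   # first region only
```

In this toy case the overall identity is 50% while the first region is 100% identical, which mirrors the situation described above in which a particular region, domain or subunit shares a higher identity with the reference than the full-length sequence does.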
[0079] The term "wild-type" refers to a gene or gene product
isolated from a naturally occurring source. A wild-type gene is
that which is most frequently observed in a population and is thus
arbitrarily designated the "normal" or "wild-type" form of the gene.
In contrast, the term "modified", "mutant" or "variant" refers to a
gene or gene product that displays modifications in sequence (e.g.,
substitutions, truncations, or insertions), post-translational
modifications and/or functional properties (e.g., altered
characteristics) when compared to the wild-type gene or gene
product. It is noted that naturally occurring mutants can be
isolated; these are identified by the fact that they have altered
characteristics when compared to the wild-type gene or gene
product. Methods for introducing or substituting
naturally-occurring amino acids are well known in the art. For
instance, methionine (M) may be substituted with arginine (R) by
replacing the codon for methionine (ATG) with a codon for arginine
(CGT) at the relevant position in a polynucleotide encoding the
mutant monomer. Methods for introducing or substituting
non-naturally-occurring amino acids are also well known in the art.
For instance, non-naturally-occurring amino acids may be introduced
by including synthetic aminoacyl-tRNAs in the IVTT system used to
express the mutant monomer. Alternatively, they may be introduced
by expressing the mutant monomer in E. coli that are auxotrophic
for specific amino acids in the presence of synthetic (i.e.
non-naturally-occurring) analogues of those specific amino acids.
They may also be produced by naked ligation if the mutant monomer
is produced using partial peptide synthesis. Conservative
substitutions replace amino acids with other amino acids of similar
chemical structure, similar chemical properties or similar
side-chain volume. The amino acids introduced may have similar
polarity, hydrophilicity, hydrophobicity, basicity, acidity,
neutrality or charge to the amino acids they replace.
Alternatively, the conservative substitution may introduce another
amino acid that is aromatic or aliphatic in the place of a
pre-existing aromatic or aliphatic amino acid. Conservative amino
acid changes are well-known in the art and may be selected in
accordance with the properties of the 20 main amino acids as
defined in Table 1 below. Where amino acids have similar polarity,
this can also be determined by reference to the hydropathy scale
for amino acid side chains in Table 2.
TABLE-US-00001 TABLE 1 Chemical properties of amino acids

Ala  aliphatic, hydrophobic, neutral            Met  hydrophobic, neutral
Cys  polar, hydrophobic, neutral                Asn  polar, hydrophilic, neutral
Asp  polar, hydrophilic, charged (-)            Pro  hydrophobic, neutral
Glu  polar, hydrophilic, charged (-)            Gln  polar, hydrophilic, neutral
Phe  aromatic, hydrophobic, neutral             Arg  polar, hydrophilic, charged (+)
Gly  aliphatic, neutral                         Ser  polar, hydrophilic, neutral
His  aromatic, polar, hydrophilic, charged (+)  Thr  polar, hydrophilic, neutral
Ile  aliphatic, hydrophobic, neutral            Val  aliphatic, hydrophobic, neutral
Lys  polar, hydrophilic, charged (+)            Trp  aromatic, hydrophobic, neutral
Leu  aliphatic, hydrophobic, neutral            Tyr  aromatic, polar, hydrophobic
TABLE-US-00002 TABLE 2 Hydropathy scale

Side Chain  Hydropathy
Ile          4.5
Val          4.2
Leu          3.8
Phe          2.8
Cys          2.5
Met          1.9
Ala          1.8
Gly         -0.4
Thr         -0.7
Ser         -0.8
Trp         -0.9
Tyr         -1.3
Pro         -1.6
His         -3.2
Glu         -3.5
Gln         -3.5
Asp         -3.5
Asn         -3.5
Lys         -3.9
Arg         -4.5
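The codon replacement and the hydropathy comparison described above can be sketched in Python as follows. This is an illustrative sketch only: the hydropathy values are copied from Table 2, whereas the function names, the 1-based residue numbering convention and the nine-nucleotide toy coding sequence are assumptions introduced for the example and are not taken from any sequence of this disclosure.

```python
# Hydropathy values copied from Table 2 (three-letter amino acid codes).
HYDROPATHY = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}


def substitute_codon(coding_seq: str, residue_number: int, new_codon: str) -> str:
    """Replace the codon encoding the given 1-based residue with `new_codon`."""
    start = 3 * (residue_number - 1)
    if residue_number < 1 or len(new_codon) != 3 or start + 3 > len(coding_seq):
        raise ValueError("residue number or codon is out of range")
    return coding_seq[:start] + new_codon + coding_seq[start + 3:]


def hydropathy_difference(residue_a: str, residue_b: str) -> float:
    """Absolute difference between two side chains on the Table 2 scale."""
    return abs(HYDROPATHY[residue_a] - HYDROPATHY[residue_b])


# Hypothetical toy coding sequence: GCT (Ala) ATG (Met) GGT (Gly).
toy_coding_seq = "GCTATGGGT"
# Replace the methionine codon (ATG) with an arginine codon (CGT).
mutant_coding_seq = substitute_codon(toy_coding_seq, 2, "CGT")

# A small difference suggests similar polarity; a large one does not.
ile_val = hydropathy_difference("Ile", "Val")
met_arg = hydropathy_difference("Met", "Arg")
```

On this scale Ile and Val differ by about 0.3, consistent with a conservative substitution, whereas Met and Arg differ by about 6.4.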
[0080] A mutant or modified protein, monomer or peptide can also be
chemically modified in any way and at any site. A mutant or
modified monomer or peptide is preferably chemically modified by
attachment of a molecule to one or more cysteines (cysteine
linkage), attachment of a molecule to one or more lysines,
attachment of a molecule to one or more non-natural amino acids,
enzyme modification of an epitope or modification of a terminus.
Suitable methods for carrying out such modifications are well-known
in the art. The mutant or modified protein, monomer or peptide may
be chemically modified by the attachment of any molecule. For
instance, the mutant or modified protein, monomer or peptide may be
chemically modified by attachment of a dye or a fluorophore. In
some embodiments, the mutant or modified monomer or peptide is
chemically modified with a molecular adaptor that facilitates the
interaction between a pore comprising the monomer or peptide and a
target nucleotide or target polynucleotide sequence. The molecular
adaptor is preferably a cyclic molecule, a cyclodextrin, a species
that is capable of hybridization, a DNA binder or intercalator, a
peptide or peptide analogue, a synthetic polymer, an aromatic
planar molecule, a small positively-charged molecule or a small
molecule capable of hydrogen-bonding.
[0081] The presence of the adaptor improves the host-guest
chemistry of the pore and the nucleotide or polynucleotide sequence
and thereby improves the sequencing ability of pores formed from
the mutant monomer. The principles of host-guest chemistry are
well-known in the art. The adaptor has an effect on the physical or
chemical properties of the pore that improves its interaction with
the nucleotide or polynucleotide sequence. The adaptor may alter
the charge of the barrel or channel of the pore or specifically
interact with or bind to the nucleotide or polynucleotide sequence
thereby facilitating its interaction with the pore. Hence a
modified CsgF peptide, as provided in the disclosure, may be
coupled to enzymes or proteins providing better proximity of said
proteins or enzymes to the pore, which may facilitate certain
applications of the pore complex comprising the modified CsgF
peptide.
[0082] In this context, proteins can also be fusion proteins,
referring in particular to genetic fusion, made e.g., by
recombinant DNA technology. Proteins can also be conjugated, or
"conjugated to", as used herein, which refers, in particular, to
chemical and/or enzymatic conjugation resulting in a stable
covalent link.
[0083] Proteins may form a protein complex when several
polypeptides or protein monomers bind to or interact with each
other. "Binding" means any interaction, be it direct or indirect. A
direct interaction implies a contact between the binding partners,
for instance through a covalent link or coupling. An indirect
interaction means any interaction whereby the interaction partners
interact in a complex of more than two compounds. The interaction
can be completely indirect, with the help of one or more bridging
molecules, or partly indirect, where there is still a direct
contact between the partners, which is stabilized by the additional
interaction of one or more compounds. The "complex" as referred to
in this disclosure is defined as a group of two or more associated
proteins, which might have different functions. The association
between the different polypeptides of the protein complex might be
via non-covalent interactions, such as hydrophobic or ionic forces,
or may as well be a covalent binding or coupling, such as
disulphide bridges, or peptide bonds. Covalent "binding" or
"coupling" are used interchangeably herein, and may also involve
"cysteine coupling" or "reactive or photoreactive amino acid
coupling", referring to a bioconjugation between cysteines or
between (photo)reactive amino acids, respectively, which is a
chemical covalent link to form a stable complex. Examples of
photoreactive amino acids include azidohomoalanine,
homopropargylglycine, homoallylglycine, p-acetyl-Phe, p-azido-Phe,
p-propargyloxy-Phe and p-benzoyl-Phe (Wang et al. 2012, in Protein
Engineering, DOI: 10.5772/28719; Chin et al. 2002, Proc. Nat. Acad.
Sci. USA 99(17): 11020-24).
[0084] A "biological pore" is a transmembrane protein structure
defining a channel or hole that allows the translocation of
molecules and ions from one side of the membrane to the other. The
translocation of ionic species through the pore may be driven by an
electrical potential difference applied to either side of the pore.
A "nanopore" is a biological pore in which the minimum diameter of
the channel through which molecules or ions pass is in the order of
nanometres (10^-9 metres). In some embodiments, the biological
pore can be a transmembrane protein pore. The transmembrane protein
structure of a biological pore may be monomeric or oligomeric in
nature. Typically, the pore comprises a plurality of polypeptide
subunits arranged around a central axis thereby forming a
protein-lined channel that extends substantially perpendicular to
the membrane in which the nanopore resides. The number of
polypeptide subunits is not limited. Typically, the number of
subunits is from 5 up to 30; suitably, the number of subunits is
from 6 to 10. Alternatively, the number of subunits is not defined
as in the case of perfringolysin or related large membrane pores.
The portions of the protein subunits within the nanopore that form
the protein-lined channel typically comprise secondary structural
motifs that may include one or more trans-membrane β-barrel
and/or α-helix sections.
[0085] The terms "pore", "pore complex", or "complex pore", as used
interchangeably herein, refer to an oligomeric pore, wherein for
instance at least a CsgG monomer (including, e.g., one or more CsgG
monomers such as two or more CsgG monomers, three or more CsgG
monomers) or a CsgG pore (comprised of CsgG monomers), and a CsgF
peptide (e.g., a modified or truncated CsgF peptide) are associated
in the complex and together form a pore or a nanopore. The pore
complex of the disclosure has the features of a biological pore,
i.e. it has a typical transmembrane protein structure. When the
pore complex is provided in an environment having membrane
components, membranes, cells, or an insulating layer, the pore
complex will insert in the membrane or the insulating layer, and
form a "transmembrane pore complex".
[0086] The pore complex or transmembrane pore complex of the
disclosure is suited for analyte characterization. In some
embodiments, the pore complex or transmembrane complex described
herein can be used for sequencing polynucleotide sequences e.g.,
because it can discriminate between different nucleotides with a
high degree of sensitivity. The pore complex of the disclosure may
be an isolated pore complex, substantially isolated, purified or
substantially purified. A pore complex of the disclosure is
"isolated" or purified if it is completely free of any other
components, such as lipids or other pores, or other proteins with
which it is normally associated in its native state e.g., CsgE,
CsgA, CsgB, or if it is sufficiently enriched from a membranous
compartment. A pore complex is substantially isolated if it is
mixed with carriers or diluents which will not interfere with its
intended use. For instance, a pore complex is substantially
isolated or substantially purified if it is present in a form that
comprises less than 10%, less than 5%, less than 2% or less than 1%
of other components, such as triblock copolymers, lipids or other
pores. Alternatively, a pore complex of the disclosure may be a
transmembrane pore complex, when present in a membrane. The
disclosure provides isolated pore complexes comprising a
homo-oligomeric pore derived from CsgG comprising identical mutant
monomers, which may also contain a mutant form of the CsgG monomer,
as a homologue thereof. Alternatively, an isolated pore complex
comprising a hetero-oligomeric CsgG pore is provided, which can be
a CsgG pore consisting of mutant and wild-type CsgG monomers, or of
different forms of CsgG variants, mutants or homologues. The
isolated pore complex typically comprises at least 7, at least 8,
at least 9 or at least 10 CsgG monomers, and 1 or more (modified)
CsgF peptides, such as 2, 3, 4, 5, 6, 7, 8, 9, 10 CsgF peptides.
The pore complex may comprise any ratio of CsgG monomer:CsgF
peptide. In one embodiment, the ratio of CsgG monomer:CsgF peptide
is 1:1.
[0087] The "constriction", "orifice", "constriction region",
"channel constriction", or "constriction site", as used
interchangeably herein, refers to an aperture defined by a luminal
surface of a pore or pore complex, which acts to allow the passage
of ions and target molecules (e.g., but not limited to
polynucleotides or individual nucleotides) but not other non-target
molecules through the pore complex channel. In some embodiments,
the constriction(s) are the narrowest aperture(s) within a pore or
pore complex. In these embodiments, the constriction(s) may serve to
limit the passage of molecules through the pore. The size of the
constriction is typically a key factor in determining suitability
of a nanopore for nucleic acid sequencing applications. If the
constriction is too small, the molecule to be sequenced will not be
able to pass through. However, to achieve a maximal effect on ion
flow through the channel, the constriction should not be too large.
For example, the constriction should not be wider than the
solvent-accessible transverse diameter of a target analyte.
Ideally, any constriction should be as close as possible in
diameter to the transverse diameter of the analyte passing through.
For sequencing of nucleic acids and nucleic acid bases, suitable
constriction diameters are in the nanometre range (10^-9 metre
range). Suitably, the diameter should be in the region of 0.5 to
2.0 nm; typically, the diameter is in the region of 0.7 to 1.2 nm.
The constriction in wild type E. coli CsgG has a diameter of
approximately 9 Å (0.9 nm). The CsgF constriction formed in the
pore complex comprising the CsgG-like pore and the modified CsgF
peptide, or homologues or mutants thereof, has a diameter in the
range of 0.5 to 2 nm or in the range of 0.7 to 1.2 nm and is hence
suitable for nucleic acid sequencing.
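The diameter ranges stated above can be expressed as a simple check. The following Python sketch assumes that the stated ranges (0.5 to 2.0 nm and 0.7 to 1.2 nm) are inclusive of their end points; the function names and the returned labels are hypothetical and serve only to illustrate the comparison.

```python
def angstrom_to_nm(diameter_angstrom: float) -> float:
    """Convert a length in angstrom to nanometres (1 nm = 10 angstrom)."""
    return diameter_angstrom / 10.0


def classify_constriction(diameter_nm: float) -> str:
    """Compare a constriction diameter with the ranges stated in the text."""
    # Narrower range, described above as the typical diameter.
    if 0.7 <= diameter_nm <= 1.2:
        return "typical"
    # Broader range, described above as suitable for nucleic acid sequencing.
    if 0.5 <= diameter_nm <= 2.0:
        return "suitable"
    return "outside stated ranges"


# Wild-type E. coli CsgG constriction: approximately 9 angstrom (0.9 nm).
wild_type_csgg = classify_constriction(angstrom_to_nm(9))
```

Under these assumptions, the approximately 0.9 nm constriction of wild-type E. coli CsgG falls within the narrower 0.7 to 1.2 nm range.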
[0088] When two or more constrictions are present and spaced apart
each constriction may interact or "read" separate nucleotides
within the nucleic acid strand at the same time. In this situation,
the reduction in ion flow through the channel will be the result of
the combined restriction in flow of all the constrictions
containing nucleotides. Hence, in some instances a double
constriction may lead to a composite current signal. In certain
circumstances, the current read-out for one constriction, or
"reading head", may not be able to be determined individually when
two such reading heads are present. The constriction of wildtype E.
coli CsgG (SEQ ID NO:3) is composed of two annular rings formed by
juxtaposition of tyrosine residues at position 51 (Tyr 51) in the
adjacent protein monomers, and also the phenylalanine and
asparagine residues at positions 56 and 55 respectively (Phe 56 and
Asn 55) (FIG. 1). The wild-type pore structure of CsgG is in most
cases re-engineered via recombinant genetic techniques to
widen, alter, or remove one of the two annular rings that make up
the CsgG constriction (mentioned as "CsgG channel constriction"
herein), to leave a single well-defined reading head. The
constriction motif in the CsgG oligomeric pore is located at amino
acid residues at position 38 to 63 in the wild type monomeric E.
coli CsgG polypeptide, depicted in SEQ ID NO: 3. In considering
this region, mutations at any of the amino acid residue positions
50 to 53, 54 to 56 and 58 to 59, as well as the key positioning of
the side chains of Tyr51, Asn55, and Phe56 within the channel of the
wild-type CsgG structure, were shown to be advantageous in order to
modify or alter the characteristics of the reading head. The
present disclosure relating to a pore complex comprising a
CsgG-pore and a modified CsgF peptide, or homologues or mutants
thereof, surprisingly added another constriction (mentioned as
"CsgF channel constriction" herein) to the CsgG-containing pore
complex, forming a suitable additional, second reader head in the
pore, via complex formation with the modified CsgF peptide. Said
additional CsgF channel constriction or reader head is positioned
adjacent to the constriction loop of the CsgG pore, or of the
mutated CsgG pore. Said additional CsgF channel constriction or
reader head is positioned approximately 10 nm or less, such as 5 nm
or less, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 nm from the constriction
loop of the CsgG pore, or of the mutated CsgG pore. The pore
complex or transmembrane pore complex of the disclosure includes
pore complexes with two reader heads, meaning, channel
constrictions positioned in such a way to provide a suitable
separate reader head without interfering with the accuracy of other
constriction channel reader heads. Said pore complexes therefore
may include CsgG mutant pores (see incorporated references
WO2016/034591, WO2017/149316, WO2017/149317, WO2017/149318 and
International patent application no. PCT/GB2018/051191 each of
which lists mutations to the wild-type CsgG pore that improve the
properties of the pore) as well as wild-type CsgG pores, or
homologues thereof, together with a modified CsgF peptide, or
homologue or mutant thereof, wherein said CsgF peptide has another
constriction channel forming a reader head.
Pores
[0089] The invention relates to CsgG pores complexed with an
extracellularly located CsgF peptide that surprisingly introduces
an additional channel constriction or reader head in the pore
complex. Moreover, the disclosure provides positional information
for the constriction made by the CsgF peptide within the pore
complex, the peptide being inserted in the lumen of the CsgG pore,
and the constriction site being in the N-terminal part of the CsgF
protein. Furthermore, modified or truncated CsgF peptides of the
disclosure were shown to be sufficient for pore complex formation,
and provide means and methods for biosensing applications. The
disclosure comprises wildtype and mutant CsgG pores (as disclosed
in e.g., WO2016/034591, WO2017/149316, WO2017/149317, WO2017/149318
and International patent application no. PCT/GB2018/051191), or
homologues or mutants thereof, combined with modified or truncated
CsgF peptides and their mutants or homologues, all together improving
the ability of the CsgG-like pore complex to interact with an
analyte, such as a polynucleotide. The additional constriction
introduced in the CsgG-like nanopore channel by complex formation
with (modified or truncated) CsgF peptides expands the contact
surface with passing analytes and can act as a second reader head
for analyte detection and characterization. Pores comprising mutant
CsgG monomers combined with novel mutant or modified forms of CsgF
can improve the characterisation of analytes, such as
polynucleotides, providing a more discriminating direct
relationship between the observed current as the polynucleotide
moves through the pore and the polynucleotide sequence. In
particular, by having two stacked reader heads spaced at a defined
distance, the CsgG:CsgF pore complex may
facilitate characterization of polynucleotides that contain at
least one homopolymeric stretch, e.g., several consecutive copies
of the same nucleotide that otherwise exceed the interaction length
of the single CsgG reader head. Additionally, by having two stacked
constrictions at a defined distance, small molecule analytes
including organic or inorganic drugs and pollutants passing through
the CsgG:CsgF complex pore will consecutively pass two independent
reader heads. The chemical nature of either reader head can be
independently modified, each giving unique interaction properties
with the analyte, thus providing additional discriminating power
during analyte detection.
[0090] In a first aspect, the invention relates to an isolated pore
complex, comprising a CsgG pore, or a homologue or mutant thereof,
or a CsgG-like pore, and a modified CsgF peptide, or a homologue or
mutant thereof. In fact, the disclosure relates to a modified CsgG
biological pore, comprising a modified CsgF peptide, which can be a
truncated, mutant and/or variant thereof. In one embodiment, the
interaction region between said modified CsgF peptide or homologue
or mutant thereof, is located at the lumen of the CsgG pore, or its
homologues or mutants. In another embodiment, the pore complex has
two or more constriction sites or reader heads, provided by at
least one constriction of the CsgG pore, and by at least one being
introduced by the CsgF peptide, forming a complex with the CsgG
pore. N-terminal CsgF positions with the inclusion of positions in
the range of amino acid residues 39-64 of SEQ ID NO:5, or more
particularly of amino acid residues 49-64 of SEQ ID NO:5, were
shown to allow detectable amounts of a stable CsgG:CsgF complex. In
one embodiment, the CsgF constriction produced by a modified CsgF
peptide (e.g., the ones described herein) is adjacent or
head-to-head of the first constriction in the CsgG pore of the pore
complex. For CsgG or CsgG-like protein pores, the constriction site
has been determined to be formed by a loop region of a beta strand
(see FIG. 1).
[0091] In one embodiment, the modified CsgF peptide is a peptide
wherein said modification in particular refers to a truncated CsgF
protein or fragment, comprising an N-terminal CsgF peptide fragment
defined by the limitation to contain the constriction region and to
bind CsgG monomers, or homologues or mutants thereof. Said modified
CsgF peptide may additionally comprise mutations or homologous
sequences, which may facilitate certain properties of the pore
complex. In a particular embodiment, modified CsgF peptides
comprise CsgF protein truncations as compared to the wild-type
preprotein (SEQ ID NO:5) or mature protein (SEQ ID NO:6) sequence,
or homologues thereof. These modified peptides are intended to
function as a pore complex component introducing an additional
constriction site or reader head, within the CsgG-like pore formed
by CsgG and the modified or truncated CsgF peptide. Examples of
truncated modified peptides are described below.
[0092] Examples of homologues of the modified CsgF peptides are for
instance determined in Example 3, and reveal CsgF-like proteins or
CsgF peptides comprising a homologous or similar constriction
region in different bacterial strains, which may be useful in the
use of similar pore complexes. The structural properties and
CsgG-binding elements in the CsgF peptides derived from various
CsgF homologues are conserved, such that CsgF peptides can be used
in combination with different wildtype or mutant CsgG pores. This
includes complexes of CsgG pores with non-cognate CsgF, meaning
that the CsgG pore and the parental CsgF homologue from which the
CsgF is derived do not need to originate from the same operon,
bacterial species or strain.
[0093] In alternative embodiments, the CsgG pore within the pore
complex is not a wild-type pore, but comprises mutations or
modifications to improve pore properties as well. The isolated
pore complex of the disclosure, formed by the CsgG pore, or a
homologue thereof, and the modified CsgF peptide, or a homologue
thereof, may be formed by the wild-type form of the CsgG pore or
may be further modified in the CsgG pore, such as by directed
mutagenesis of particular amino acid residues, to further enhance
the desired properties of the CsgG pore for use within the pore
complex. For example, in embodiments of the present invention
mutations are contemplated to alter the number, size, shape,
placement or orientation of the constriction within the channel.
The pore complex comprising a modified mutant CsgG pore may be
prepared by known genetic engineering techniques that result in the
insertion, substitution and/or deletion of specific targeted amino
acid residues in the polypeptide sequence. In the case of the
oligomeric CsgG pore, the mutations may be made in each monomeric
polypeptide subunit, or any one of the monomers, or all of the
monomers. Suitably, in one embodiment of the invention the
mutations described are made to all monomeric polypeptides within
the oligomeric protein structure. A mutant CsgG monomer is a
monomer whose sequence varies from that of a wild-type CsgG monomer
and which retains the ability to form a pore. Methods for
confirming the ability of mutant monomers to form pores are
well-known in the art. The disclosure comprises wild type and
mutant CsgG pores (e.g., as disclosed in WO2016/034591,
WO2017/149316, WO2017/149317, WO2017/149318 and International
patent application no. PCT/GB2018/051191), or homologues thereof,
combined with modified or truncated CsgF peptides and their mutants
or homologues, all together improving the ability of the CsgG-like
pore complex to interact with an analyte, such as a polynucleotide.
Mutant CsgG pores may comprise one or more mutant monomers. The
CsgG pore may be a homopolymer comprising identical monomers, or a
heteropolymer comprising two or more different monomers. The
monomers may have one or more of the mutations described below in
any combination.
[0094] The nanopore complex comprising a modified CsgF peptide
differs as compared to the wild-type CsgF protein depicted in SEQ
ID NO:6 since the modified CsgF peptide only comprises N-terminal
fragments or truncates of the wild-type CsgF protein in certain
embodiments. The modified CsgF peptide may, however, additionally
or alternatively be a mutated CsgF peptide, in the sense that
mutations such as amino acid substitutions are made to allow for a
better second constriction site in the pore formed by the complex
comprising the CsgG pore and the modified CsgF peptide. The mutant
monomers might
as such have improved polynucleotide reading properties when said
complex is used in nucleotide sequencing i.e. display improved
polynucleotide capture and nucleotide discrimination, in addition
to the improved feature of the complex to comprise two reader
heads. In particular, pores constructed from the mutant peptides
capture nucleotides and polynucleotides more easily than the wild
type. In addition, pores constructed from the mutant peptides may
display an increased current range, which makes it easier to
discriminate between different nucleotides, and a reduced variance
of states, which increases the signal-to-noise ratio. In addition,
the number of nucleotides contributing to the current as the
polynucleotide moves through pores constructed from the mutants may
be decreased. This makes it easier to identify a direct
relationship between the observed current as the polynucleotide
moves through the pore and the polynucleotide sequence. In
addition, pores constructed from the mutant peptides may display an
increased throughput, e.g., are more likely to interact with an
analyte, such as a polynucleotide. This makes it easier to
characterise analytes using the pores. Pores constructed from the
mutant peptides may insert into a membrane more easily, or may
provide an easier way to retain additional proteins in close vicinity
of the pore complex.
[0095] In an alternative embodiment, the CsgF constriction site
provided in the pore complex of the invention has a diameter in the
range of 0.5 nm to 2.0 nm, thereby providing a pore complex
suitable for nucleic acid sequencing, as described above.
[0096] The pore may be stabilised by covalent attachment of the
CsgF peptide to the CsgG pore. The covalent linkage may for example
be a disulphide bond, or a linkage formed by click chemistry. The
CsgF peptide and CsgG pore may, for example, be covalently linked
via residues at a
position corresponding to one or more of the following pairs of
positions of SEQ ID NO: 6 and SEQ ID NO: 3, respectively: 1 and
153, 4 and 133, 5 and 136, 8 and 187, 8 and 203, 9 and 203, 11 and
142, 11 and 201, 12 and 149, 12 and 203, 26 and 191, and 29 and
144.
[0097] In the pore, the interaction between the CsgF peptide and
the CsgG pore may, for example, be stabilised by hydrophobic
interactions or electrostatic interactions at a position
corresponding to one or more of the following pairs of positions of
SEQ ID NO: 6 and SEQ ID NO: 3, respectively: 1 and 153, 4 and 133,
5 and 136, 8 and 187, 8 and 203, 9 and 203, 11 and 142, 11 and 201,
12 and 149, 12 and 203, 26 and 191, and 29 and 144.
[0098] The residues in CsgF and/or CsgG at one or more of the
positions listed above may be modified in order to enhance the
interaction between CsgG and CsgF in the pore.
[0099] In one embodiment, the pore of the invention may be
isolated, substantially isolated, purified or substantially
purified. A pore of the invention is isolated or purified if it is
completely free of any other components, such as lipids or other
pores. A pore is substantially isolated if it is mixed with
carriers or diluents which will not interfere with its intended
use. For instance, a pore is substantially isolated or
substantially purified if it is present in a form that comprises
less than 10%, less than 5%, less than 2% or less than 1% of other
components, such as triblock copolymers, lipids or other pores.
Alternatively, a pore of the invention may be present in a
membrane. Suitable membranes are discussed below.
[0100] A pore of the invention may be present as an individual or
single pore. Alternatively, a pore of the invention may be present
in a homologous or heterologous population of two or more
pores.
CsgF Peptide
[0101] A second aspect of the invention relates to novel modified
CsgF monomers (peptides), or truncated CsgF proteins, or a modified
or truncated peptide of a CsgF homologue or mutant. Those novel
modified CsgF peptides may be used in a pore complex to integrate a
second or additional reader head. Said modification or truncation
preferably results in a fragment of the wild-type CsgF, or of a
mutant or homologue CsgF protein, more preferably an N-terminal
fragment.
[0102] Mature CsgF (shown in SEQ ID NO:6) can be divided into three
main regions: a "CsgF constriction peptide" (FCP), a "neck" region
and a "head" region (as shown in FIGS. 4 and 5). The "head" region
of the CsgF peptide is distinct from a reader head of a pore as
described herein. The "head" region of the CsgF peptide may also be
referred to as the "C-terminal head domain".
[0103] The FCP forms the contact region with the CsgG
β-barrel, where it gives rise to an additional constriction.
The neck region protrudes out of the β-barrel. In the CsgG:CsgF
oligomer it forms a thin-walled hollow tube that connects the FCP
to a globular head region.
[0104] Based on multiple sequence alignments (FIG. 8),
co-purification experiments (FIG. 9) and the cryoEM reconstruction
of the CsgG:CsgF complex at 3.4 Å resolution (FIG. 11), the CsgF
constriction peptide, neck and head regions can be defined as three
consecutive residue stretches in mature CsgF.
[0105] The FCP spans from approximately residues 1 to 35 of the
mature CsgF (Seq ID NO:6). The FCP forms the most conserved region
of the protein when comparing different CsgF orthologues (FIG. 8,
FIG. 10). CryoEM 3D reconstruction shows the FCP forms a
well-defined structure that binds the inside of the CsgG .beta.-barrel
through non-covalent contacts with the CsgG transmembrane hairpins
TM1 (residues 134 to 154 of Seq ID NO:3) and TM2 (residues 184 to
208 of Seq ID NO:3) (FIG. 1E; 11) (TM1 and TM2 are defined in Goyal
P, et al., 2014). In the reconstruction, nine copies of the FCP
bind the CsgG oligomer (comprising 9 monomers) and together give
rise to an additional constriction approximately 2 nm above the top
of the CsgG constriction formed by a continuous loop spanning residues
46 to 61 of mature CsgG (Seq ID NO:3; FIG. 1E; FIG. 11).
[0106] The cryoEM 3D reconstruction also shows the CsgF N-terminal
residue binds near the base or the top (depending on the
orientation) of the CsgG .beta.-barrel and exits the .beta.-barrel at
residue 32. This is in good agreement with MD simulations that show
the average contact time of residue pairs in the CsgG:CsgF binding
interface (Table 4). The cryoEM structure and MD simulations show
residues 33-34 reside outside the CsgG .beta.-barrel, where a
well-conserved, though not strictly conserved, Pro residue (Pro 35 in
Seq ID NO:6) makes the transition to the CsgF neck region. The CsgF
neck is not resolved in atomic detail in the CsgG:CsgF 3D
reconstruction, pointing to its conformational flexibility. The CsgF neck is
predicted to span from residue 36 to approximately residue 50 (SEQ
ID NO:6) based on multiple sequence alignment and secondary
structure prediction. The CsgF head region forms the C-terminal
part of CsgF and is predicted to span from approximately residue 51
to the CsgF C-terminus. In the CsgG:CsgF complex, this region
oligomerizes to give rise to a globular structure that appears to
cap the CsgG:CsgF channel (FIGS. 4, 5). Multiple sequence alignment
of CsgF orthologues shows the CsgF neck to be the least conserved
region and suggests it may vary in length from one orthologue to
another (FIG. 8).
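The approximate region boundaries described above can be sketched as a small lookup. This is an illustrative sketch only: the function and constant names are hypothetical, the boundaries are the approximate values stated in the text for SEQ ID NO: 6 numbering, and the neck/head boundary is a prediction that may differ between orthologues.

```python
# Approximate region boundaries in mature CsgF (SEQ ID NO: 6 numbering),
# taken from the ranges stated in the text. These are assumptions for
# illustration; the neck/head boundary is only a prediction.
FCP_END = 35         # FCP: approximately residues 1 to 35
NECK_END = 50        # neck: residue 36 to approximately residue 50
BARREL_EXIT = 32     # the peptide is described as exiting the barrel at residue 32


def csgf_region(residue: int) -> str:
    """Return the approximate region name for a 1-based residue number."""
    if residue < 1:
        raise ValueError("residue numbering is 1-based")
    if residue <= FCP_END:
        return "FCP"
    if residue <= NECK_END:
        return "neck"
    return "head"


def inside_barrel(residue: int) -> bool:
    """True if the residue lies within the CsgG beta-barrel per the description."""
    return 1 <= residue <= BARREL_EXIT
```

For example, `csgf_region(17)` returns `"FCP"` and `inside_barrel(33)` returns `False`, consistent with residues 33-34 being described as lying outside the barrel.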
[0107] The CsgF peptide which forms part of the invention is a
truncated CsgF peptide lacking the C-terminal head; lacking the
C-terminal head and a part of the neck domain of CsgF (e.g., the
truncated CsgF peptide may comprise only a portion of the neck
domain of CsgF); or lacking the C-terminal head and neck domains of
CsgF. The CsgF peptide may lack part of the CsgF neck domain, e.g.
the CsgF peptide may comprise a portion of the neck domain, such as
for example, from amino acid residue 36 at the N-terminal end of
the neck domain (see SEQ ID NO: 6) (e.g. residues 36-40, 36-41,
36-42, 36-43, 36-45, 36-46 up to residues 36-50 or 36-60 of SEQ ID
NO: 6). The CsgF peptide preferably comprises a CsgG-binding region
and a region that forms a constriction in the pore. The
CsgG-binding region typically comprises residues 1 to 8 and/or 29
to 32 of the CsgF protein (SEQ ID NO: 6 or a homologue from another
species) and may include one or more modifications. The region that
forms a constriction in the pore typically comprises residues 9 to
28 of the CsgF protein (SEQ ID NO: 6 or a homologue from another
species) and may include one or more modifications. Residues 9 to
17 comprise the conserved motif N.sub.9PXFGGXXX.sub.17 and form a
turn region. Residues 9 to 28 form an alpha-helix. X.sub.17 (N17 in
SEQ ID NO: 6) forms the apex of the constriction region,
corresponding to the narrowest part of the CsgF constriction in the
pore. The CsgF constriction region also makes stabilising contacts
with the CsgG beta-barrel, primarily at residues 9, 11, 12, 18, 21
and 22 of SEQ ID NO: 6.
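As an illustration of the conserved motif described above, the following sketch checks whether residues 9 to 17 of a peptide match N-P-X-F-G-G-X-X-X and reports the residue at position 17. The sequence used is CsgF-(1-29) as listed in Table 3 (SEQ ID NO: 39); the function names are hypothetical, and the check assumes the peptide is numbered from residue 1 of mature CsgF with no insertions or deletions before position 17.

```python
import re

# CsgF-(1-29) as listed in Table 3 (SEQ ID NO: 39).
CSGF_1_29 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQ"

# N-P-X-F-G-G-X-X-X, where X stands for any residue.
MOTIF = re.compile(r"NP.FGG...")


def motif_at_residues_9_to_17(peptide: str) -> bool:
    """Check the motif on residues 9..17 (0-based slice 8:17)."""
    turn = peptide[8:17]
    return MOTIF.fullmatch(turn) is not None


def constriction_apex(peptide: str) -> str:
    """Return the residue at position 17, described as the constriction apex."""
    return peptide[16]
```

Because position 17 is an X in the motif, a substitution such as N17S changes the apex residue without breaking the motif match.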
[0108] The CsgF peptide typically has a length of from 28 to 50
amino acids, such as 29 to 49, 30 to 45 or 32 to 40 amino acids.
Preferably the CsgF peptide comprises from 29 to 35 amino acids, or
29 to 45 amino acids. The CsgF peptide comprises all or part of the
FCP, which corresponds to residues 1 to 35 of SEQ ID NO: 6. Where
the CsgF peptide is shorter than the FCP, the truncation is
preferably made at the C-terminal end.
[0109] The CsgF fragment of SEQ ID NO:6 or of a homologue or mutant
thereof may have a length of 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54 or 55 amino acids.
[0110] The CsgF peptide may comprise the amino acid sequence of SEQ
ID NO: 6 from residue 1 up to any one of residues 25 to 60, such as
27 to 50, for example, 28 to 45 of SEQ ID NO: 6, or the
corresponding residues from a homologue of SEQ ID NO: 6, or variant
of either thereof. More specifically, the CsgF peptide may comprise
SEQ ID NO: 39 (residues 1 to 29 of SEQ ID NO: 6), or a homologue or
variant thereof.
[0111] Examples of such CsgF peptides comprise, consist
essentially of or consist of SEQ ID NO: 15 (residues 1 to 34 of SEQ
ID NO: 6), SEQ ID NO: 54 (residues 1 to 30 of SEQ ID NO: 6), SEQ ID
NO: 40 (residues 1 to 45 of SEQ ID NO: 6), or SEQ ID NO: 55
(residues 1 to 35 of SEQ ID NO: 6) and homologues or variants of
any thereof. Other examples of CsgF peptides comprise, consist
essentially of or consist of SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO:
9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ
ID NO: 14, SEQ ID NO: 16.
[0112] In the CsgF peptide, one or more residues e.g., in SEQ ID
NO: 15, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 54, or SEQ ID NO:
55 may be modified.
[0113] For example, the CsgF peptide may comprise a modification at
a position corresponding to one or more of the following positions
in SEQ ID NO: 6: G1, T4, F5, R8, N9, N11, F12, A26 and Q29.
[0114] The CsgF peptide may be modified to introduce a cysteine, a
hydrophobic amino acid, a charged amino acid, a non-native reactive
amino acid, or photoreactive amino acid, for example at a position
corresponding to one or more of the following positions in SEQ ID
NO: 6: G1, T4, F5, R8, N9, N11, F12, A26 and Q29.
[0115] For example, the CsgF peptide may comprise a modification at
a position corresponding to one or more of the following positions
in SEQ ID NO: 6: N15, N17, A20, N24 and A28. The CsgF peptide may
comprise a modification at a position corresponding to D34 to
stabilise the CsgG-CsgF complex. In particular embodiments, the
CsgF peptide comprises one or more of the substitutions:
N15S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C, N17S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C,
A20S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C, N24S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C,
A28S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C and D34F/Y/W/R/K/N/Q/C. The CsgF
peptide may, for example, comprise one or more of the following
substitutions: G1C, T4C, N17S, and D34Y or D34N.
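The truncation and substitution nomenclature used here (for example G1C/N17S/D34Y on residues 1 to 35) can be illustrated with a short sketch. The helper names `truncate` and `substitute` are hypothetical; the starting sequence is residues 1 to 45 of mature CsgF as listed for SEQ ID NO: 40 in Table 3. The sketch handles single-residue substitutions only and checks that the stated wild-type residue is actually present at each position.

```python
import re

# Residues 1 to 45 of mature CsgF as listed in Table 3 (SEQ ID NO: 40).
CSGF_1_45 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIET"

# One substitution token, e.g. "N17S": wild-type residue, position, new residue.
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def truncate(sequence: str, last_residue: int) -> str:
    """Keep residues 1..last_residue (1-based, inclusive)."""
    if not 1 <= last_residue <= len(sequence):
        raise ValueError("last_residue is outside the sequence")
    return sequence[:last_residue]


def substitute(sequence: str, spec: str) -> str:
    """Apply substitutions written as e.g. 'G1C/N17S/D34Y'."""
    residues = list(sequence)
    for token in spec.split("/"):
        match = _SUBSTITUTION.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse substitution {token!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if position < 1 or position > len(residues):
            raise ValueError(f"position {position} is outside the peptide")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


peptide = substitute(truncate(CSGF_1_45, 35), "G1C/N17S/D34Y")
```

With these inputs, `peptide` is `CTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKYP`, which matches the sequence listed for CsgF-G1C/N17S/D34Y-(1-35) in Table 3.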
[0116] The CsgF peptide may be produced by cleavage of a longer
protein, such as full-length CsgF using an enzyme. Cleavage at a
particular site may be directed by modifying the longer protein,
such as full-length CsgF, to include an enzyme cleavage site at an
appropriate position. Examples of CsgF amino acid sequences that
have been modified to include such enzyme cleavage sites are shown
in SEQ ID NOs: 56 to 67. Following cleavage all or part of the
added enzyme cleavage site may be present in the CsgF peptide that
associates with CsgG to form a pore. Thus the CsgF peptide may
further comprise all or part of an enzyme cleavage site at its
C-terminal end.
[0117] Some examples of suitable CsgF peptides are shown in Table 3
below:
TABLE 3 CsgF peptides
Technical description | Protein sequence | SEQ ID NO:
CsgF-(1-45) | GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIET | 40
CsgF-(1-29) | GTMTFQFRNPNFGGNPNNGAFLLNSAQAQ | 39
CsgF-(1-35) | GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDP | 55
CsgF-G1C-(1-45) | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIET | 89
CsgF-G1C-(1-30) | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQ | 90
CsgF-G1C-(1-35) | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDP | 91
CsgF-N17S-(1-35) | GTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDP | 92
CsgF-G1C/D34Y-(1-35) | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKYP | 93
CsgF-N17S-(1-45) | GTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDPSYNDDFGIET | 94
CsgF-G1C/N17S/D34Y-(1-35) | CTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKYP | 95
CsgF-G1C-(1-30)-H10 | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQHHHHHHHHHH | 96
CsgF-G1C-(1-35)-H10 | CTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPHHHHHHHHHH | 97
CsgF-N17S/D34Y-(1-35) | GTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKYP | 98
CsgF-N17S/D34N-(1-35) | GTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKNP | 99
CsgF-T4C/N17S-(1-35) | GTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDP | 100
CsgF-T4C/N17S/D34Y-(1-35) | GTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKYP | 101
CsgF-T4C/N17S/D34N-(1-35) | GTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKNP | 102
CsgF-G1C/T4C/N17S-(1-35) | CTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDP | 103
CsgF-G1C/T4C/N17S/D34N-(1-35) | CTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKNP | 104
CsgF-G1C/T4C/N17S/D34Y-(1-35) | CTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKYP | 105
CsgF-N17V-(1-35) | GTMTFQFRNPNFGGNPVNGAFLLNSAQAQNSYKDP | 106
CsgF-N17A-(1-35) | GTMTFQFRNPNFGGNPANGAFLLNSAQAQNSYKDP | 107
CsgF-N17F-(1-35) | GTMTFQFRNPNFGGNPFNGAFLLNSAQAQNSYKDP | 108
CsgF-N17S-(1-30) | GTMTFQFRNPNFGGNPSNGAFLLNSAQAQN | 109
CsgF-N17V-(1-30) | GTMTFQFRNPNFGGNPVNGAFLLNSAQAQN | 110
CsgF-N17A-(1-30) | GTMTFQFRNPNFGGNPANGAFLLNSAQAQN | 111
CsgF-N17F-(1-30) | GTMTFQFRNPNFGGNPFNGAFLLNSAQAQN | 112
CsgF-N17V/D34Y-(1-35) | GTMTFQFRNPNFGGNPVNGAFLLNSAQAQNSYKYP | 113
CsgF-N17A/D34Y-(1-35) | GTMTFQFRNPNFGGNPANGAFLLNSAQAQNSYKYP | 114
CsgF-N17F/D34Y-(1-35) | GTMTFQFRNPNFGGNPFNGAFLLNSAQAQNSYKYP | 115
CsgF-N17S/A20Q-(1-35) | GTMTFQFRNPNFGGNPSNGQFLLNSAQAQNSYKDP | 116
[0118] In particular embodiments, said CsgF fragment comprises the
amino acid sequence SEQ ID NO:39, or mutant or homologue thereof.
In particular, SEQ ID NO:39 comprises the first 29 amino acids of
the mature CsgF peptide (SEQ ID NO:6). In another embodiment, the
modified CsgF peptide of the invention is a truncated peptide
comprising SEQ ID NO:40. In particular, SEQ ID NO:40 comprises the
first 45 amino acids of the mature CsgF peptide (SEQ ID NO:6). In
particular, the CsgF constriction site and binding site to the CsgG
are located within the N-terminal CsgF peptide region, further
characterised in that amino acid 39 to 64 of SEQ ID NO:5 (present
in SEQ ID NO:39 and SEQ ID NO:40), or in particular amino acid 49
to 64 of SEQ ID NO:5 (present in SEQ ID NO:40, but not in SEQ ID
NO:39, the latter fragment encoded by SEQ ID NO:39 showing a weaker
interaction with CsgG (see Examples)), confer a higher stability to
the complex. Hence, the disclosure provides a modification of the
CsgF protein by truncating the protein to said peptides or peptides
comprising said N-terminal fragments or constriction site region to
allow complex formation with the CsgG pore, or homologues or
mutants thereof, in vivo. Further limitation is provided in one
embodiment relating to a modified CsgF peptide comprising SEQ ID
NO:37 or SEQ ID NO:38. Finally, identification of CsgF homologous
peptides, especially aligned within the constriction region (FCP
peptides), also provides modified CsgF peptide homologues that may
form a part of said isolated complex (e.g., see FIGS. 8 and
10).
[0119] A further embodiment relates to the modified or truncated
CsgF peptides comprising SEQ ID NO:15, wherein said SEQ ID NO:15
contains the region of the CsgF protein including several residues
from the region of the CsgG binding and/or constriction site,
sufficient for in vitro reconstitution of the complex pore
comprising CsgG or a homologue thereof, and a modified CsgF
peptide, to result in an isolated pore complex comprising a CsgF
channel constriction. Another embodiment describes said modified
CsgF peptide comprising SEQ ID NO:16, which contains an N-terminal
fragment of the CsgF protein, and two additional amino acids (KD),
which will increase solubility and stability of the (synthetic)
peptide, as well as allow in vitro reconstitution of said complex
pore. Further embodiments are provided wherein said modified CsgF
peptide comprises SEQ ID NO:15, SEQ ID NO:16 or a homologue or
mutant thereof, wherein said modified CsgF peptide is further
mutated, but still retains a minimum of 35% amino acid identity to
SEQ ID NO:15, or SEQ ID NO:16, respectively, within the region of
the modified CsgF peptide corresponding to said SEQ ID NO:15 or 16,
e.g., 40%, 50%, 60%, 70%, 80%, 85%, 90% amino acid identity. Further
embodiments are provided wherein said modified CsgF peptide
comprises SEQ ID NO:15, SEQ ID NO:16 or a homologue or mutant
thereof, wherein said modified CsgF peptide is further mutated, but
still retains a minimum of 40%, 45%, 50%, 60%, 70%, 80%, 85% or 90%
amino acid identity to SEQ ID NO:15, or SEQ ID NO:16, respectively,
within the region of the modified CsgF peptide corresponding to
said SEQ ID NO:15 or 16. Those mutated regions are intended to
alter and/or improve the characteristics of the CsgF constriction
site, as discussed above, so for instance a more accurate target
analysis can be obtained. Another embodiment discloses modified
CsgF peptides wherein one or more positions in the regions
comprising SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:54 or SEQ ID NO:55
are modified, and wherein said mutation(s) retain a minimum of 35%
amino acid identity, or 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%
amino acid identity to SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:54 or
SEQ ID NO:55 in the peptide fragment corresponding to the region
comprising SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:54 or SEQ ID
NO:55.
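The identity thresholds above can be illustrated with a minimal sketch. It assumes the two peptides are the same length and already aligned position by position without gaps, which is a simplification; in practice identity would be computed after an alignment with a standard tool. The function names are hypothetical. The example sequences are CsgF-(1-35) (SEQ ID NO: 55) and CsgF-G1C/N17S/D34Y-(1-35) (SEQ ID NO: 95) as listed in Table 3.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Percentage of identical positions between two equal-length peptides.

    Assumes the peptides are pre-aligned without gaps (an assumption made
    for this sketch, not a requirement stated in the text).
    """
    if not reference or len(reference) != len(candidate):
        raise ValueError("peptides must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(reference, candidate))
    return 100.0 * matches / len(reference)


def meets_minimum(reference: str, candidate: str, minimum: float = 35.0) -> bool:
    """True if the candidate retains at least `minimum` percent identity."""
    return percent_identity(reference, candidate) >= minimum


SEQ_55 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDP"  # CsgF-(1-35), Table 3
SEQ_95 = "CTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKYP"  # CsgF-G1C/N17S/D34Y-(1-35), Table 3
identity = percent_identity(SEQ_55, SEQ_95)
```

Here three of 35 positions differ, so `identity` is 32/35 of 100%, roughly 91.4%, which exceeds the 35% to 90% thresholds recited above.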
[0120] So further embodiments of the invention involve the isolated
pore complex comprising a CsgG pore, or homologue or mutant
thereof, and modified CsgF peptide(s), or homologues or mutants
thereof, wherein said modified CsgF peptide(s) are defined as
described in the second aspect of the invention.
[0121] Additional embodiments relate to an isolated pore complex,
wherein said CsgG pore, at least via one monomer, and the modified
CsgF peptide, are coupled via covalent binding. Said covalent link
or binding is in one instance possible via cysteine linkage,
wherein the sulfhydryl side group of cysteine covalently links with
another amino acid residue or moiety. In a second possibility, the
covalent linkage is obtained via an interaction between non-native
(photo)reactive amino acids. (Photo-)reactive amino acids refer
to artificial analogs of natural amino acids that can be
used for crosslinking of protein complexes, and may be incorporated
into proteins and peptides in vivo or in vitro. Photo-reactive
amino acid analogs in common use are photoreactive diazirine
analogs to leucine and methionine, and para-benzoyl-phenyl-alanine,
as well as azidohomoalanine, homopropargylglycine,
homoallylglycine, p-acetyl-Phe, p-azido-Phe, p-propargyloxy-Phe and
p-benzoyl-Phe (Wang et al. 2012; Chin et al. 2002). Upon exposure
to ultraviolet light, they are activated and covalently bind to
interacting proteins that are within a few angstroms of the
photo-reactive amino acid analog. However, the positions in the
CsgG monomer where said covalent linkages may take place are
dependent on the exposure to the modified CsgF peptide. As shown in
FIG. 1, several amino acids are in the position to provide the
covalent linkage, namely positions 132, 133, 136, 138, 140, 142,
144, 145, 147, 149, 151, 153, 155, 183, 185, 187, 189, 191, 201,
203, 205, 207 or 209 of SEQ ID NO: 3, or of homologues thereof.
[0122] Another aspect of the invention relates to constructs
comprising said modified CsgF peptide, wherein said peptide is
covalently attached. A "construct" comprises two or more covalently
attached monomers derived from modified CsgF and/or CsgG, or a
homologue thereof. In other words, a construct may contain more
than one monomer. In another aspect, the invention also provides a
pore complex comprising at least one construct of the invention.
The pore complex contains sufficient constructs and, if necessary,
monomers to form the pore. For instance, an octameric pore may
comprise (a) four constructs each comprising two monomers, (b) two
constructs each comprising four monomers, (c) one construct
comprising two monomers and six monomers that do not form part of a
construct, or (d) one or two CsgF monomers in one construct, and
one construct with six to seven CsgG monomers or even (e) a
construct with CsgF and CsgG monomer in addition to another
construct solely comprising CsgG monomers. The same and additional
possibilities are provided for a nonameric pore for instance. Other
combinations of constructs and monomers can be envisaged by the
skilled person. One or more constructs of the invention may be used
to form a pore complex for characterising, such as sequencing,
polynucleotides. The construct may comprise at least 2, at least 3,
at least 4, at least 5, at least 6, at least 7, at least 8, at
least 9 or at least 10 monomers. The construct preferably comprises
two monomers. The two or more monomers may be the same or
different, may be CsgF, CsgG, CsgG/CsgF fusion monomers or
homologues thereof, or any combination thereof.
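The ways of assembling a pore from constructs and free monomers, as in options (a) to (c) above, can be enumerated with a short sketch. This is a simplification for illustration: it counts subunits only and ignores whether each monomer is CsgG, CsgF or a fusion, so it does not distinguish options (d) and (e). The function names are hypothetical. Each result is a tuple of part sizes, where a part of size 1 is a free monomer and a part of size 2 or more is a construct.

```python
from typing import Iterator, List, Optional, Tuple


def partitions(total: int, largest: Optional[int] = None) -> Iterator[Tuple[int, ...]]:
    """Yield integer partitions of `total` as non-increasing tuples."""
    if largest is None:
        largest = total
    if total == 0:
        yield ()
        return
    for part in range(min(total, largest), 0, -1):
        for rest in partitions(total - part, part):
            yield (part,) + rest


def describe(partition: Tuple[int, ...]) -> Tuple[List[int], int]:
    """Split a partition into construct sizes and a count of free monomers."""
    constructs = [size for size in partition if size >= 2]
    free_monomers = sum(1 for size in partition if size == 1)
    return constructs, free_monomers


octamer_options = list(partitions(8))
```

For an octameric pore, `octamer_options` includes `(2, 2, 2, 2)`, `(4, 4)` and `(2, 1, 1, 1, 1, 1, 1)`, corresponding to options (a), (b) and (c); calling `partitions(9)` gives the analogous list for a nonameric pore.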
[0123] Another embodiment relates to the polynucleotide or nucleic
acid molecule encoding said modified CsgF peptides of the
invention, or homologues or mutants thereof, or polynucleotides
encoding a construct as described above.
[0124] Certain embodiments relate to an isolated transmembrane pore
complex comprising the isolated pore complex according to the first
and second aspect of the invention, and the components of a
membrane. Said isolated transmembrane pore complex is directly
applicable for use in molecular sensing, such as nucleic acid
sequencing. Alternatively, a membranous composition is provided,
comprising a modified CsgG/CsgF biological pore as described
herein, according to the isolated pore complex of the invention,
and a membrane, membrane components, or an insulating layer. One
embodiment relates to an isolated transmembrane pore complex
consisting of the isolated pore complex according to the invention,
and the components of a membrane.
[0125] Although the CsgG:CsgF complex is very stable, when CsgF is
truncated, the stability of CsgG:CsgF complexes decreases compared
to a complex comprising full length CsgF. Therefore, disulphide
bonds can be made between CsgG and CsgF to make the complex more
stable, for example following introduction of cysteine residues at
the positions identified herein. The pore complex can be made in
any of the previously mentioned methods and disulphide bond
formation can be induced by using oxidising agents (e.g.
copper-orthophenanthroline). Other interactions (e.g. hydrophobic
interactions, charge-charge interactions/electrostatic
interactions) can also be used in those positions instead of
cysteine interactions.
[0126] In another embodiment, unnatural amino acids can also be
incorporated in those positions. In this embodiment, covalent bonds
may be made via click chemistry. For example, unnatural amino
acids with azide or alkyne or with a dibenzocyclooctyne (DBCO)
group and/or a bicyclo[6.1.0]nonyne (BCN) group may be introduced
at one or more of these positions.
[0127] Such stabilising mutations can be combined with any other
modifications to CsgG and/or CsgF, for example the modifications
disclosed herein.
[0128] The CsgG pore may comprise at least one, such as 2, 3, 4, 5,
6, 7, 8, 9 or 10, CsgG monomers that is/are modified to facilitate
attachment to the CsgF peptide. For example a cysteine residue may
be introduced at one or more of the positions corresponding to
positions 132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151,
153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 and 209 of
SEQ ID NO: 3, and/or at any one of the positions identified in
Table 4 as being predicted to make contact with CsgF, to facilitate
covalent attachment to CsgF. As an alternative or addition to
covalent attachment via cysteine residues, the pore may be
stabilised by hydrophobic interactions or electrostatic
interactions. To facilitate such interactions, a non-native
reactive or photoreactive amino acid may be introduced at a position
corresponding to
one or more of positions 132, 133, 136, 138, 140, 142, 144, 145,
147, 149, 151, 153, 155, 183, 185, 187, 189, 191, 201, 203, 205,
207 and 209 of SEQ ID NO: 3, and/or at any one of the positions
identified in Table 4 as being predicted to make contact with
CsgF.
[0129] The CsgF peptide may be modified to facilitate attachment to
the CsgG pore. For example a cysteine residue may be introduced at
one or more of the positions corresponding to positions 1, 4, 5, 8,
9, 11, 12, 26 or 29 of SEQ ID NO: 6, and/or at any one of the
positions identified in Table 4 as being predicted to make contact
with CsgG, to facilitate covalent attachment to CsgG. As an
alternative or addition to covalent attachment via cysteine
residues, the pore may be stabilised by hydrophobic interactions or
electrostatic interactions. To facilitate such interactions, a
non-native reactive or photoreactive amino acid may be introduced at
a position corresponding to one or more of positions 1, 4, 5, 8, 9,
11, 12, 26 or 29 of SEQ ID NO: 6, and/or at any one of the positions
identified in Table 4 as being predicted to make contact with
CsgG.
[0130] Preferred exemplary CsgF peptides comprise the
following mutations relative to SEQ ID NO: 6:
N15X.sub.1/N17X.sub.2/A20X.sub.3/N24X.sub.4/A28X.sub.5/D34X.sub.6,
wherein X.sub.1 is N/S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C, X.sub.2 is
N/S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C, X.sub.3 is
A/S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C, X.sub.4 is
N/S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C, X.sub.5 is
A/S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C and X.sub.6 is D/F/Y/W/R/K/N/Q/C.
The mutations at positions N15, N17, A20, N24 and A28 are
constriction mutations and the mutation at position 34 affects the
interaction of CsgF with the bottom of the CsgG pore to stabilise
the interaction.
CsgG Pore
[0131] The CsgG pore may be a homo-oligomeric pore comprising
identical mutant monomers of the invention. The CsgG pore may be a
hetero-oligomeric pore derived from CsgG, for example comprising at
least one mutant monomer as disclosed herein.
[0132] The CsgG pore may contain any number of mutant monomers. The
pore typically comprises at least 7, at least 8, at least 9 or at
least 10 identical mutant monomers, such as 7, 8, 9 or 10 mutant
monomers. The CsgG pore preferably comprises eight or nine
identical mutant monomers.
[0133] In a preferred embodiment, all of the monomers in the
hetero-oligomeric CsgG pore (such as 10, 9, 8 or 7 of the monomers)
are mutant monomers as disclosed herein, wherein at least one of
them differs from the others. They may all differ from one
another.
[0134] The mutant monomers in the CsgG pore are preferably all
approximately the same length or are the same length. The barrels
of the mutant monomers of the invention in the pore are preferably
approximately the same length or are the same length. Length may be
measured in number of amino acids and/or units of length.
[0135] A mutant monomer may be a variant of SEQ ID NO: 3. Over the
entire length of the amino acid sequence of SEQ ID NO: 3, a variant
will preferably be at least 50% homologous to that sequence based
on amino acid identity. More preferably, the variant may be at
least 55%, at least 60%, at least 65%, at least 70%, at least 75%,
at least 80%, at least 85%, at least 90% and more preferably at
least 95%, 97% or 99% homologous based on amino acid identity to
the amino acid sequence of SEQ ID NO: 3 over the entire sequence.
There may be at least 80%, for example at least 85%, 90% or 95%,
amino acid identity over a stretch of 100 or more, for example 125,
150, 175 or 200 or more, contiguous amino acids ("hard
homology").
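The "hard homology" criterion above, a minimum identity over a contiguous stretch, can be sketched as a sliding-window check. This sketch assumes the variant and SEQ ID NO: 3 are supplied as equal-length strings already aligned position by position without gaps; that is a simplifying assumption, and the function name is hypothetical.

```python
def has_identity_stretch(reference: str, candidate: str,
                         window: int = 100, minimum: float = 80.0) -> bool:
    """True if any contiguous `window`-residue stretch reaches `minimum` % identity.

    Assumes equal-length, gap-free, pre-aligned sequences (an assumption
    made for this sketch).
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must be of equal length")
    if window < 1:
        raise ValueError("window must be positive")
    if len(reference) < window:
        return False
    matches = [a == b for a, b in zip(reference, candidate)]
    count = sum(matches[:window])          # identical positions in the first window
    if 100.0 * count / window >= minimum:
        return True
    for end in range(window, len(matches)):
        # Slide the window by one: add the new position, drop the oldest.
        count += matches[end] - matches[end - window]
        if 100.0 * count / window >= minimum:
            return True
    return False
```

The running count avoids recomputing each window from scratch, so the check is linear in sequence length.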
[0136] CsgG monomers are highly conserved (as can be readily
appreciated from FIGS. 45 to 47 of WO2017/149317). Furthermore,
from knowledge of the mutations in relation to SEQ ID NO: 3 it is
possible to determine the equivalent positions for mutations of
CsgG monomers other than that of SEQ ID NO: 3.
[0137] Thus reference to a mutant CsgG monomer comprising a variant
of the sequence as shown in SEQ ID NO: 3 and specific amino-acid
mutations thereof as set out in the claims and elsewhere in the
specification also encompasses a mutant CsgG monomer comprising a
variant of the sequence as shown in SEQ ID NOs: 68 to 88 and
corresponding amino-acid mutations thereof. Likewise reference to a
construct, pore or method involving the use of a pore relating to a
mutant CsgG monomer comprising a variant of the sequence as shown
in SEQ ID NO: 3 and specific amino-acid mutations thereof as set
out in the claims and elsewhere in the specification also
encompasses a construct, pore or method relating to a mutant CsgG
monomer comprising a variant of the sequence according to the
above-disclosed SEQ ID NOs and corresponding amino-acid mutations
thereof. It will further be appreciated that the invention extends
to other variant CsgG monomers not expressly identified in the
specification that show highly conserved regions.
[0138] Standard methods in the art may be used to determine
homology. For example the UWGCG Package provides the BESTFIT
program which can be used to calculate homology, for example used
on its default settings (Devereux et al (1984) Nucleic Acids
Research 12, p 387-395). The PILEUP and BLAST algorithms can be
used to calculate homology or line up sequences (such as
identifying equivalent residues or corresponding sequences
(typically on their default settings)), for example as described in
Altschul S. F. (1993) J Mol Evol 36:290-300; Altschul, S. F et al
(1990) J Mol Biol 215:403-10. Software for performing BLAST
analyses is publicly available through the National Center for
Biotechnology Information (www.ncbi.nlm.nih.gov/).
[0139] SEQ ID NO: 3 is the wild-type CsgG monomer from Escherichia
coli Str. K-12 substr. MC4100. A variant of SEQ ID NO: 3 may
comprise any of the substitutions present in another CsgG
homologue. Preferred CsgG homologues are shown in SEQ ID NOs: 68 to
88. The variant may comprise combinations of one or more of the
substitutions present in SEQ ID NOs: 68 to 88 compared with SEQ ID
NO: 3. For example, mutations may be made at any one or more of the
positions in SEQ ID NO: 3 that differ between SEQ ID NO: 3 and any
one of SEQ ID NOs: 68 to 88. Such a mutation may be a substitution
of an amino acid in SEQ ID NO: 3 with an amino acid from the
corresponding position in any one of SEQ ID NOs: 68 to 88.
Alternatively, the mutation at any one of these positions may be a
substitution with any amino acid, or may be a deletion or insertion
mutation, such as deletion or insertion of 1 to 10 amino acids,
such as of 2 to 8 or 3 to 6 amino acids. Other than the mutations
disclosed herein, the amino acids that are conserved between SEQ ID
NO: 3 and all of SEQ ID NOs: 68 to 88 are preferably present in a
variant of the invention. However, conservative mutations may be
made at any one or more of these positions that are conserved
between SEQ ID NO: 3 and all of SEQ ID NOs: 68 to 88.
[0140] The invention provides a pore-forming CsgG mutant monomer
that comprises any one or more of the amino acids described herein
as being substituted into a specific position of SEQ ID NO: 3 at a
position in the structure of the CsgG monomer that corresponds to
the specific position in SEQ ID NO: 3. Corresponding positions may
be determined by standard techniques in the art. For example, the
PILEUP and BLAST algorithms mentioned above can be used to align
the sequence of a CsgG monomer with SEQ ID NO: 3 and hence to
identify corresponding residues.
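Once a pairwise alignment has been produced by a standard tool, corresponding positions can be read off by walking the two gapped strings together. The sketch below assumes the alignment is supplied as two equal-length strings using "-" for gaps; the function name and the toy sequences are hypothetical and are not CsgG sequences.

```python
from typing import Dict


def corresponding_positions(aligned_reference: str, aligned_other: str) -> Dict[int, int]:
    """Map 1-based reference residue numbers to 1-based numbers in the other sequence.

    Reference residues aligned to a gap have no corresponding position and
    are omitted from the mapping.
    """
    if len(aligned_reference) != len(aligned_other):
        raise ValueError("aligned sequences must be of equal length")
    mapping: Dict[int, int] = {}
    ref_pos = 0
    other_pos = 0
    for ref_char, other_char in zip(aligned_reference, aligned_other):
        if ref_char != "-":
            ref_pos += 1
        if other_char != "-":
            other_pos += 1
        if ref_char != "-" and other_char != "-":
            mapping[ref_pos] = other_pos
    return mapping


# Toy alignment (hypothetical sequences): the other sequence lacks
# reference residues 2 and 3.
toy_mapping = corresponding_positions("MKTAYR", "M--AYR")
```

In this toy case reference residue 4 corresponds to residue 2 of the other sequence, so a mutation described at position 4 of the reference would be placed at position 2 of the homologue.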
[0141] The pore-forming mutant monomer typically retains the
ability to form the same 3D structure as the wild-type CsgG
monomer, such as the same 3D structure as a CsgG monomer having the
sequence of SEQ ID NO: 3. The 3D structure of CsgG is known in the
art and is disclosed, for example, in Goyal et al (2014) Nature
516(7530):250-3. Any number of mutations may be made in the
wild-type CsgG sequence in addition to the mutations described
herein provided that the CsgG mutant monomer retains the improved
properties imparted on it by the mutations of the present
invention.
[0142] Typically the CsgG monomer will retain the ability to form a
structure comprising three alpha-helices and five beta-sheets.
Mutations may be made at least in the region of CsgG which is
N-terminal to the first alpha helix (which starts at S63 in SEQ ID
NO:3), in the second alpha helix (from G85 to A99 of SEQ ID NO: 3),
in the loop between the second alpha helix and the first beta sheet
(from Q100 to N120 of SEQ ID NO: 3), in the fourth and fifth beta
sheets (S173 to R192 and R198 to T107 of SEQ ID NO: 3,
respectively) and in the loop between the fourth and fifth beta
sheets (F193 to Q197 of SEQ ID NO: 3) without affecting the ability
of the CsgG monomer to form a transmembrane pore, which
transmembrane pore is capable of translocating polypeptides.
Therefore, it is envisaged that further mutations may be made in
any of these regions in any CsgG monomer without affecting the
ability of the monomer to form a pore that can translocate
polynucleotides. It is also expected that mutations may be made in
other regions, such as in any of the alpha helices (S63 to R76,
G85 to A99 or V211 to L236 of SEQ ID NO: 3) or in any of the beta
sheets (I121 to N133, K135 to R142, I146 to R162, S173 to R192 or
R198 to T107 of SEQ ID NO: 3) without affecting the ability of the
monomer to form a pore that can translocate polynucleotides. It is
also expected that deletions of one or more amino acids can be made
in any of the loop regions linking the alpha helices and beta
sheets and/or in the N-terminal and/or C-terminal regions of the
CsgG monomer without affecting the ability of the monomer to form a
pore that can translocate polynucleotides.
[0143] Amino acid substitutions may be made to the amino acid
sequence of SEQ ID NO: 3 in addition to those discussed above, for
example up to 1, 2, 3, 4, 5, 10, 20 or 30 substitutions.
Conservative substitutions replace amino acids with other amino
acids of similar chemical structure, similar chemical properties or
similar side-chain volume. The amino acids introduced may have
similar polarity, hydrophilicity, hydrophobicity, basicity,
acidity, neutrality or charge to the amino acids they replace.
Alternatively, the conservative substitution may introduce another
amino acid that is aromatic or aliphatic in the place of a
pre-existing aromatic or aliphatic amino acid. Conservative amino
acid changes are well-known in the art and may be selected in
accordance with the properties of the 20 main amino acids as
defined in Table 1 above. Where amino acids have similar polarity,
this can also be determined by reference to the hydropathy scale
for amino acid side chains in Table 2.
[0144] One or more amino acid residues of the amino acid sequence
of SEQ ID NO: 3 may additionally be deleted from the polypeptides
described above. Up to 1, 2, 3, 4, 5, 10, 20 or 30 or more residues
may be deleted.
[0145] Variants may include fragments of SEQ ID NO: 3. Such
fragments retain pore forming activity. Fragments may be at least
50, at least 100, at least 150, at least 200 or at least 250 amino
acids in length. Such fragments may be used to produce the pores. A
fragment preferably comprises the membrane spanning domain of SEQ
ID NO: 3, namely K135-Q153 and S183-S208.
[0146] One or more amino acids may be alternatively or additionally
added to the polypeptides described above. An extension may be
provided at the amino terminal or carboxy terminal of the amino
acid sequence of SEQ ID NO: 3 or polypeptide variant or fragment
thereof. The extension may be quite short, for example from 1 to 10
amino acids in length. Alternatively, the extension may be longer,
for example up to 50 or 100 amino acids. A carrier protein may be
fused to an amino acid sequence according to the invention. Other
fusion proteins are discussed in more detail below.
[0147] A CsgG pore as described herein includes a wild type CsgG
pore, or a homologue or a mutant/variant thereof. A variant is a
polypeptide that has an amino acid sequence which varies from that
of SEQ ID NO: 3 and which retains its ability to form a pore. A
variant typically contains the regions of SEQ ID NO: 3 that are
responsible for pore formation. The pore forming ability of CsgG,
which contains a β-barrel, is provided by β-sheets in each subunit.
A variant of SEQ ID NO: 3 typically comprises the regions in SEQ ID
NO: 3 that form β-sheets, namely K134-Q154 and S183-S208. One or
more modifications can be made to the regions of SEQ ID NO: 3 that
form β-sheets as long as the resulting variant retains its ability
to form a pore. A variant of SEQ ID NO: 3 preferably includes one
or more modifications, such as substitutions, additions or
deletions, within its α-helices and/or loop regions.
[0148] The CsgG monomers may be mutant CsgG monomers. A mutant CsgG
monomer is a monomer whose sequence varies from that of a wild-type
CsgG monomer and which retains the ability to form a pore. A mutant
monomer may also be referred to herein as a variant. Methods for
confirming the ability of mutant monomers to form pores are
well-known in the art and are discussed in more detail below.
[0149] Particular pore-forming CsgG mutant monomers that may be
included in the CsgG pore may comprise any one or more of the
following modifications: [0150] a W at a position corresponding to
R97 in SEQ ID NO:3; [0151] a W at a position corresponding to R93
in SEQ ID NO:3; [0152] a Y at a position corresponding to R97 in
SEQ ID NO: 3; [0153] a Y at a position corresponding to R93 in SEQ
ID NO: 3; [0154] a Y at each of the positions corresponding to R93
and R97 in SEQ ID NO: 3; [0155] a D at the position corresponding
to R192 in SEQ ID NO:3; [0156] deletion of the residues at the
positions corresponding to V105-I107 in SEQ ID NO:3; [0157]
deletion of the residues at one or more of the positions
corresponding to F193 to L199 in SEQ ID NO: 3; [0158] deletion of
the residues at the positions corresponding to D195 to L199 in SEQ ID
NO: 3; [0159] deletion of the residues at the positions corresponding
to F193 to L199 in SEQ ID NO: 3; [0160] a T at the position
corresponding to F191 in SEQ ID NO: 3; [0161] a Q at the position
corresponding to K49 in SEQ ID NO: 3; [0162] a N at the position
corresponding to K49 in SEQ ID NO: 3; [0163] a Q at the position
corresponding to K42 in SEQ ID NO: 3; [0164] a Q at the position
corresponding to E44 in SEQ ID NO: 3; [0165] a N at the position
corresponding to E44 in SEQ ID NO: 3; [0166] a R at the position
corresponding to L90 in SEQ ID NO: 3; [0167] a R at the position
corresponding to L91 in SEQ ID NO: 3; [0168] a R at the position
corresponding to I95 in SEQ ID NO: 3; [0169] a R at the position
corresponding to A99 in SEQ ID NO: 3; [0170] a H at the position
corresponding to E101 in SEQ ID NO: 3; [0171] a K at the position
corresponding to E101 in SEQ ID NO: 3; [0172] a N at the position
corresponding to E101 in SEQ ID NO: 3; [0173] a Q at the position
corresponding to E101 in SEQ ID NO: 3; [0174] a T at the position
corresponding to E101 in SEQ ID NO: 3; [0175] a K at the position
corresponding to Q114 in SEQ ID NO: 3.
[0176] The CsgG pore-forming monomer preferably further comprises
an A at the position corresponding to Y51 in SEQ ID NO: 3 and/or a
Q at the position corresponding to F56 in SEQ ID NO: 3.
[0177] Pores constructed from the CsgG monomers comprising a R to W
substitution at the position corresponding to position 97 of SEQ ID
NO: 3 display an increased accuracy as compared to otherwise
identical pores without the modification at 97 when characterizing
(or sequencing) target polynucleotides. An increased accuracy is
also seen when instead of R97W the CsgG monomers comprise the
modification R to W at a position corresponding to position 93 of
SEQ ID NO: 3 or the modifications R to Y at positions corresponding
to positions 93 and 97 of SEQ ID NO: 3. Accordingly, pores may be
constructed from one or more mutant CsgG monomers that comprise a
modification at positions corresponding to R97 or R93 of SEQ ID NO:
3 such that the modification increases the hydrophobicity of the
amino acid. For example, such modification may include an amino
acid substitution with any amino acid containing a hydrophobic side
chain, including, e.g., but not limited to W and Y.
[0178] CsgG monomers that comprise a R to D, Q, F, S or T mutation
at a position corresponding to position 192 of SEQ ID NO: 3 are
easier to express than monomers which do not have a substitution at
position 192 which may be due to the reduction of positive charge.
Accordingly position 192 may be substituted with an amino-acid
which reduces the positive charge. Monomers that comprise
R192D/Q/F/S/T may also comprise additional modifications which
improve the ability of mutant pores formed from the monomers to
interact with and characterise analytes, such as polynucleotides.
However, in one embodiment it is preferred that the residue at the
position corresponding to position 193 of SEQ ID NO: 3 is R or K,
more preferably R.
[0179] Pores comprising CsgG monomers that comprise a deletion of
V105, A106 and I107, a deletion of F193, I194, D195, Y196, Q197,
R198 and L199 or a deletion of D195, Y196, Q197, R198 and L199,
and/or F191T display an increased accuracy when characterizing (or
sequencing) target polynucleotides. The amino-acids at positions
105 to 107 correspond to the cis-loops in the cap of the nanopore
and the amino-acids at positions 193 to 199 correspond to the
trans-loops at the other end of the pore. Without wishing to be
bound by theory it is thought that deletion of the cis-loops
improves the interaction of the enzyme with the pore and removal of
the trans-loops decreases any unwanted interaction between DNA on
the trans side of the pore.
[0180] Pores comprising CsgG monomers that comprise a K to Q or K
to N mutation at a position corresponding to K94 of SEQ ID NO: 3
show a reduction in the number of noisy pores (namely those pores
that give rise to a decreased signal:noise ratio) as compared to
identical pores without the mutation at 94 when characterizing (or
sequencing) target polynucleotides. Position 94 is found within the
vestibule of the pore and was found to be a particularly sensitive
position in relation to the noise of the current signal.
[0181] Pores comprising the CsgG monomers that comprise T104K or
T104R, N91R, E101K/N/Q/T/H, E44N/Q, Q114K, A99R, I95R, L90R
and/or Q42K, or corresponding mutations, all demonstrate an
improved ability to capture target polynucleotides when used to
characterize (or sequence) target polynucleotides as compared to
identical pores without substitutions at these positions.
[0182] In one embodiment, the CsgG pore comprises one or more
monomers that are variants of SEQ ID NO: 3 comprising (a) one or
more mutations at the following positions (i.e. mutations at one or
more of the following positions) I41, R93, A98, Q100, G103, T104,
A106, I107, N108, L113, S115, T117, Y130, K135, E170, S208, D233,
D238 and E244 and/or (b) one or more of D43S, E44S,
F48S/N/Q/Y/W/I/V/H/R/K, Q87N/R/K, N91K/R, K94R/F/Y/W/L/S/N,
R97F/Y/W/V/I/K/S/Q/H, E101I/L/A/H, N102K/Q/L/I/V/S/H, R110F/G/N,
Q114R/K, R142Q/S, T150Y/A/V/L/S/Q/N, R192D/Q/F/S/T and
D248S/N/Q/K/R. The variant may comprise (a); (b); or (a) and (b).
In some embodiments, the variant comprises R97W. In some
embodiments, the variant comprises R192D/Q/F/S/T, such as R192D/Q.
In (a), the variant may comprise modifications at any number and
combination of the positions, such as 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18 or 19 of the positions.
[0183] In (a), the variant preferably comprises one or more of
I41N, R93F/Y/W/L/I/V/N/Q/S, A98K/R, Q100K/R, G103F/W/S/N/K/R,
T104R/K, A106R/K, I107R/K/W/F/Y/L/V, N108R/K, L113K/R, S115R/K,
T117R/K, Y130W/F/H/Q/N, K135L/V/N/Q/S, E170S/N/Q/K/R,
S208V/I/F/W/Y/L/T, D233S/N/Q/K/R, D238S/N/Q/K/R and
E244S/N/Q/K/R.
[0184] In (a), the variant preferably comprises one or more
modifications which provide more consistent movement of a target
polynucleotide with respect to, such as through, a transmembrane
pore comprising the monomer. In particular, in (a), the variant
preferably comprises one or more mutations at the following
positions (i.e. mutations at one or more of the following
positions) R93, G103 and I107. The variant may comprise R93; G103;
I107; R93 and G103; R93 and I107; G103 and I107; or R93, G103 and
I107. The variant preferably comprises one or more of
R93F/Y/W/L/I/V/N/Q/S, G103F/W/S/N/K/R and I107R/K/W/F/Y/L/V. These
may be present in any combination shown for the positions R93, G103
and I107.
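The combinations listed above for R93, G103 and I107 are simply every
non-empty subset of the three positions. As an illustrative sketch only
(the function name position_combinations is hypothetical and is not
part of the application), the same enumeration can be written as:

```python
from itertools import combinations


def position_combinations(positions):
    """Return every non-empty combination of the given positions."""
    combos = []
    for size in range(1, len(positions) + 1):
        # combinations() preserves the input order within each tuple
        combos.extend(combinations(positions, size))
    return combos


combos = position_combinations(["R93", "G103", "I107"])
for combo in combos:
    print(" and ".join(combo))
```

For three positions this yields seven combinations, matching the seven
alternatives recited in the paragraph above.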
[0185] In (a), the variant preferably comprises one or more
modifications which allow pores constructed from the mutant
monomers to capture nucleotides and polynucleotides more
easily. In particular, in (a), the variant preferably comprises one
or more mutations at the following positions (i.e. mutations at one
or more of the following positions) I41, T104, A106, N108, L113,
S115, T117, E170, D233, D238 and E244. The variant may comprise
modifications at any number and combination of the positions, such
as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11 of the positions. The
variant preferably comprises one or more of I41N, T104R/K, A106R/K,
N108R/K, L113K/R, S115R/K, T117R/K, E170S/N/Q/K/R, D233S/N/Q/K/R,
D238S/N/Q/K/R and E244S/N/Q/K/R. Additionally or alternatively the
variant may comprise (c) Q42K/R, E44N/Q, L90R/K, N91R/K, I95R/K,
A99R/K, E101H/K/N/Q/T and/or Q114K/R.
[0186] In (a), the variant preferably comprises one or more
modifications which provide more consistent movement and increase
capture. In particular, in (a), the variant preferably comprises
one or more mutations at the following positions (i.e. mutations at
one or more of the following positions) (i) A98, (ii) Q100, (iii)
G103 and (iv) I107. The variant preferably comprises one or more of
(i) A98R/K, (ii) Q100K/R, (iii) G103K/R and (iv) I107R/K.
[0187] Particularly preferred mutant monomers which provide for
increased capture of analytes, such as polynucleotides, include a
mutation at one or more of positions Q42, E44, L90, N91, I95,
A99, E101 and Q114, which mutation removes the negative charge
and/or increases the positive charge at the mutated positions. In
particular, the following mutations may be included in a mutant
monomer of the invention to produce a CsgG pore that has an
improved ability to capture an analyte, preferably a
polynucleotide: Q42K, E44N, E44Q, L90R, N91R, I95R, A99R, E101H,
E101K, E101N, E101Q, E101T and Q114K. Examples of particular mutant
monomers which comprise one of these mutations in combination with
other beneficial mutations are:
[0188] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-Q42K
[0189] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E44N
[0190] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E44Q
[0191] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-L90R
[0192] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-N91R
[0193] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-I95R
[0194] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-A99R
[0195] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E101H
[0196] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E101K
[0197] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E101N
[0198] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E101Q
[0199] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-E101T
[0200] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)-Q114K.
[0201] In (a), the variant preferably comprises one or more
modifications which provide increased characterisation accuracy. In
particular, in (a), the variant preferably comprises one or more
mutations at the following positions (i.e. mutations at one or more
of the following positions) Y130, K135 and S208, such as Y130;
K135; S208; Y130 and K135; Y130 and S208; K135 and S208; or Y130,
K135 and S208. The variant preferably comprises one or more of
Y130W/F/H/Q/N, K135L/V/N/Q/S and R142Q/S. These substitutions may
be present in any number and combination as set out for Y130, K135
and S208.
[0202] In (b), the variant may comprise any number and combination
of the substitutions, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or
12 of the substitutions. In (b), the variant preferably comprises
one or more modifications which provide more consistent movement of
a target polynucleotide with respect to, such as through, a
transmembrane pore comprising the monomer. In particular, in (b),
the variant preferably comprises one or more of (i)
Q87N/R/K, (ii) K94R/F/Y/W/L/S/N, (iii) R97F/Y/W/V/I/K/S/Q/H, (iv)
N102K/Q/L/I/V/S/H and (v) R110F/G/N. More preferably, the variant
comprises K94D or K94Q and/or R97W or R97Y. Other preferred
variants that are modified to provide more consistent movement of a
target polynucleotide with respect to, such as through, a
transmembrane pore comprising the monomer include (vi) R93W and
R93Y. A preferred variant may comprise R93W and R97W, R93Y and
R97W, or more preferably R93Y and R97Y.
[0203] In (b), the variant preferably comprises one or more
modifications which allow pores constructed from the mutant
monomers to capture nucleotides and polynucleotides more
easily. In particular, in (b), the variant preferably comprises one
or more of (i) D43S, (ii) E44S, (iii) N91K/R, (iv) Q114R/K and (v)
D248S/N/Q/K/R.
[0204] In (b), the variant preferably comprises one or more
modifications which provide more consistent movement and increase
capture. In particular, in (b), the variant preferably comprises
one or more of Q87R/K, E101I/L/A/H and N102K, such as Q87R/K;
E101I/L/A/H; N102K; Q87R/K and E101I/L/A/H; Q87R/K and N102K;
E101I/L/A/H and N102K; or Q87R/K, E101I/L/A/H and N102K.
[0205] In (b), the variant preferably comprises one or more
modifications which provide increased characterisation accuracy. In
particular, in (b), the variant preferably comprises
F48S/N/Q/Y/W/I/V.
[0206] In (b), the variant preferably comprises one or more
modifications which provide increased characterisation accuracy and
increased capture. In particular, in (b), the variant preferably
comprises F48H/R/K.
[0207] The variant may comprise modifications in both (a) and (b)
which provide more consistent movement. The variant may comprise
modifications in both (a) and (b) which provide increased
capture.
[0208] The invention provides variants of SEQ ID NO: 3 which
provide an increased throughput of an assay for characterising an
analyte, such as a polynucleotide, using a pore comprising the
variant. Such variants may comprise a mutation at K94, preferably
K94Q or K94N, more preferably K94Q. Examples of particular mutant
monomers which comprise a K94Q or K94N mutation in combination with
other beneficial mutations are:
[0209] CsgG-(WT-Y51A/F56Q/R97W/R192D-StrepII)9-K94N
[0210] CsgG-(WT-Y51A/F56Q/R97W/R192D-StrepII)9-K94Q.
[0211] Using monomers that are variants of SEQ ID NO: 3 to form the
CsgG pore can provide increased characterisation accuracy in an
assay for characterising an analyte, such as a polynucleotide. Such
variants include variants that comprise: a mutation at F191,
preferably F191T; deletion of V105-I107; deletion of F193-L199 or
of D195-L199; and/or a mutation at R93 and/or R97, preferably R93Y,
R97Y, or more preferably, R97W, R93W or both R93Y and R97Y.
Examples of particular mutant monomers which comprise one or more
of these mutations in combination with other beneficial mutations
are:
[0212] CsgG-(WT-Y51A/F56Q/R97W/R192D-StrepII)9-del(D195-L199)
[0213] CsgG-(WT-Y51A/F56Q/R97W/R192D-StrepII)9-del(F193-L199)
[0214] CsgG-(WT-Y51A/F56Q/R97W/R192D-StrepII)9-F191T
[0215] CsgG-(WT-Y51A/F56Q/R97W/R192D-del(V105-I107)-StrepII)9
[0216] CsgG-(WT-Y51A/F56Q/K94Q/R97W/R192D-del(V105-I107)
[0217] CsgG-(WT-Y51A/F56Q/R192D-StrepII)9-R93W
[0218] CsgG-(WT-Y51A/F56Q/R192D-StrepII)9-R93W-del(D195-L199)
[0219] CsgG-(WT-Y51A/F56Q/R192D-StrepII)9-R93Y/R97Y.
[0220] In another embodiment, the variant of SEQ ID NO: 3 comprises
(A) deletion of one or more positions R192, F193, I194, D195, Y196,
Q197, R198, L199, L200 and E201 and/or (B) deletion of one or more
of V139/G140/D149/T150/V186/Q187/V204/G205 (called band 1 herein),
G137/G138/Q151/Y152/Y184/E185/Y206/T207 (called band 2 herein) and
A141/R142/G147/A148/A188/G189/G202/E203 (called band 3 herein).
[0221] In (A), the variant may comprise deletion of any number and
combination of the positions, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or
10 of the positions. In (A), the variant preferably comprises
deletion of [0222] D195, Y196, Q197, R198 and L199; [0223] R192,
F193, I194, D195, Y196, Q197, R198, L199 and L200; [0224] Q197,
R198, L199 and L200; [0225] I194, D195, Y196, Q197, R198 and L199;
[0226] D195, Y196, Q197, R198, L199 and L200; [0227] Y196, Q197,
R198, L199, L200 and E201; [0228] Q197, R198, L199, L200 and E201;
[0229] Q197, R198, L199; or [0230] F193, I194, D195, Y196, Q197,
R198 and L199.
[0231] More preferably, the variant comprises deletion of D195,
Y196, Q197, R198 and L199 or F193, I194, D195, Y196, Q197, R198 and
L199. In (B), any number and combination of bands 1 to 3 may be
deleted, such as band 1; band 2; band 3; bands 1 and 2; bands 1 and
3; bands 2 and 3; or bands 1, 2 and 3. The variant may comprise
deletions according to (A); (B); or (A) and (B).
[0232] The variants comprising deletion of one or more positions
according to (A) and/or (B) above may further comprise any of the
modifications or substitutions discussed above and below. If the
modifications or substitutions are made at one or more positions
which appear after the deletion positions in SEQ ID NO: 3, the
numbering of the one or more positions of the modifications or
substitutions must be adjusted accordingly. For instance, if L199
is deleted, E244 becomes E243. Similarly, if band 1 is deleted,
R192 becomes R186.
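The renumbering rule described above amounts to subtracting, from
each position, the number of deleted positions that precede it in SEQ
ID NO: 3. A minimal sketch, assuming single-letter residue labels such
as E244 (the names renumber and BAND_1 are hypothetical and used only
for illustration), reproduces both examples given in the paragraph:

```python
import re


def renumber(label, deleted_positions):
    """Shift a residue label such as 'E244' to account for upstream deletions.

    `label` and `deleted_positions` both use SEQ ID NO: 3 numbering.
    """
    match = re.fullmatch(r"([A-Z])(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognised residue label: {label!r}")
    residue, position = match.group(1), int(match.group(2))
    if position in deleted_positions:
        raise ValueError(f"{label} is itself deleted")
    # Only deletions that come before the position shift its number.
    shift = sum(1 for p in deleted_positions if p < position)
    return f"{residue}{position - shift}"


# Band 1 as defined in the text: V139/G140/D149/T150/V186/Q187/V204/G205
BAND_1 = {139, 140, 149, 150, 186, 187, 204, 205}

print(renumber("E244", {199}))   # E243
print(renumber("R192", BAND_1))  # R186
```

Six of the eight band 1 positions lie before position 192, which is
why R192 shifts by six rather than eight.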
[0233] In another embodiment, the variant of SEQ ID NO: 3 comprises
(C) deletion of one or more positions V105, A106 and I107. The
deletions in accordance with (C) may be made in addition to
deletions according to (A) and/or (B).
[0234] The above-described deletions typically reduce the noise
associated with the movement of the target polynucleotide with
respect to, such as through, a transmembrane pore comprising the
monomer. As a result the target polynucleotide can be characterised
more accurately.
[0235] In the paragraphs above where different amino acids at a
specific position are separated by the / symbol, the / symbol means
"or". For instance, Q87R/K means Q87R or Q87K.
[0236] The variants of SEQ ID NO: 3 which provide increased capture
of an analyte, such as a polynucleotide, may comprise a mutation
at T104, preferably T104R or T104K, a mutation at N91, preferably
N91R, a mutation at E101, preferably E101K/N/Q/T/H, a mutation at
position E44, preferably E44N or E44Q and/or a mutation at position
Q42, preferably Q42K.
[0237] The mutations at different positions in SEQ ID NO: 3 may be
combined in any possible way. In particular, a monomer in the CsgG
pore may comprise one or more mutations that improve accuracy, one
or more mutations that reduce noise and/or one or more mutations
that enhance capture of an analyte.
[0238] The variant of SEQ ID NO: 3 preferably comprises one or more
of the following (i) one or more mutations at the following
positions (i.e. mutations at one or more of the following
positions) N40, D43, E44, S54, S57, Q62, R97, E101, E124, E131,
R142, T150 and R192, such as one or more mutations at the following
positions (i.e. mutations at one or more of the following
positions) N40, D43, E44, S54, S57, Q62, E101, E131 and T150 or
N40, D43, E44, E101 and E131; (ii) mutations at Y51/N55, Y51/F56,
N55/F56 or Y51/N55/F56; (iii) Q42R or Q42K; (iv) K49R; (v) N102R,
N102F, N102Y or N102W; (vi) D149N, D149Q or D149R; (vii) E185N,
E185Q or E185R; (viii) D195N, D195Q or D195R; (ix) E201N, E201Q or
E201R; (x) E203N, E203Q or E203R; and (xi) deletion of one or more
of the following positions F48, K49, P50, Y51, P52, A53, S54, N55,
F56 and S57. The variant may comprise any combination of (i) to
(xi).
[0239] If the variant comprises any one of (i) and (iii) to (xi),
it may further comprise a mutation at one or more of Y51, N55 and
F56, such as at Y51, N55, F56, Y51/N55, Y51/F56, N55/F56 or
Y51/N55/F56.
[0240] In (i), the variant may comprise mutations at any number
and combination of N40, D43, E44, S54, S57, Q62, R97, E101, E124,
E131, R142, T150 and R192. In (i), the variant preferably comprises
one or more mutations at the following positions (i.e. mutations at
one or more of the following positions) N40, D43, E44, S54, S57,
Q62, E101, E131 and T150. In (i), the variant preferably comprises
one or more mutations at the following positions (i.e. mutations at
one or more of the following positions) N40, D43, E44, E101 and
E131. In (i), the variant preferably comprises a mutation at S54
and/or S57. In (i), the variant more preferably comprises a
mutation at (a) S54 and/or S57 and (b) one or more of Y51, N55 and
F56, such as at Y51, N55, F56, Y51/N55, Y51/F56, N55/F56 or
Y51/N55/F56. If S54 and/or S57 are deleted in (xi), it/they cannot
be mutated in (i) and vice versa. In (i), the variant preferably
comprises a mutation at T150, such as T150I. Alternatively the
variant preferably comprises a mutation at (a) T150 and (b) one or
more of Y51, N55 and F56, such as at Y51, N55, F56, Y51/N55,
Y51/F56, N55/F56 or Y51/N55/F56. In (i), the variant preferably
comprises a mutation at Q62, such as Q62R or Q62K. Alternatively
the variant preferably comprises a mutation at (a) Q62 and (b) one
or more of Y51, N55 and F56, such as at Y51, N55, F56, Y51/N55,
Y51/F56, N55/F56 or Y51/N55/F56. The variant may comprise a
mutation at D43, E44, Q62 or any combination thereof, such as D43,
E44, Q62, D43/E44, D43/Q62, E44/Q62 or D43/E44/Q62. Alternatively
the variant preferably comprises a mutation at (a) D43, E44, Q62,
D43/E44, D43/Q62, E44/Q62 or D43/E44/Q62 and (b) one or more of
Y51, N55 and F56, such as at Y51, N55, F56, Y51/N55, Y51/F56,
N55/F56 or Y51/N55/F56.
[0241] In (ii) and elsewhere in this application where different
positions are separated by the / symbol, the / symbol means "and"
such that Y51/N55 is Y51 and N55. In (ii), the variant preferably
comprises mutations at Y51/N55. It has been proposed that the
constriction in CsgG is composed of three stacked concentric rings
formed by the side chains of residues Y51, N55 and F56 (Goyal et
al, 2014, Nature, 516, 250-253). Mutation of these residues in (ii)
may therefore decrease the number of nucleotides contributing to
the current as the polynucleotide moves through the pore and
thereby make it easier to identify a direct relationship between
the observed current (as the polynucleotide moves through the pore)
and the polynucleotide. F56 may be mutated in any of the ways
discussed below with reference to variants and pores useful in the
method of the invention.
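The / symbol is therefore read in two ways in this application: as
"or" between alternative amino acids at one position (Q87R/K) and as
"and" between different positions (Y51/N55, Y51A/F56Q). The following
sketch shows one way the two readings could be told apart when the
shorthand is processed by software. The function name parse_shorthand
and the returned tuple layout are assumptions made for illustration
and are not defined by the application.

```python
import re

# Wild-type residue, position, and an optional replacement residue.
SITE = re.compile(r"([A-Z])(\d+)([A-Z]?)")


def parse_shorthand(text):
    """Parse mutation shorthand into (wild_type, position, options) tuples.

    A token carrying a position starts a new site ("and"); a bare letter
    adds an alternative residue to the preceding site ("or").
    """
    sites = []
    for token in text.split("/"):
        match = SITE.fullmatch(token)
        if match:
            wild_type, position, new = match.groups()
            sites.append((wild_type, int(position), [new] if new else []))
        elif re.fullmatch(r"[A-Z]", token) and sites and sites[-1][2]:
            sites[-1][2].append(token)
        else:
            raise ValueError(f"cannot parse token {token!r} in {text!r}")
    return sites


print(parse_shorthand("Q87R/K"))     # one site, two alternatives
print(parse_shorthand("Y51/N55"))    # two sites, no residues named
print(parse_shorthand("Y51A/F56Q"))  # two sites, one residue each
```

A bare letter is only accepted after a site that already names a
replacement residue, so a malformed string such as Y51/N is rejected
rather than silently misread.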
[0242] In (v), the variant may comprise N102R, N102F, N102Y or
N102W. The variant preferably comprises (a) N102R, N102F, N102Y or
N102W and (b) a mutation at one or more of Y51, N55 and F56, such
as at Y51, N55, F56, Y51/N55, Y51/F56, N55/F56 or Y51/N55/F56.
[0243] In (xi), any number and combination of K49, P50, Y51, P52,
A53, S54, N55, F56 and S57 may be deleted. Preferably one or more
of K49, P50, Y51, P52, A53, S54, N55 and S57 may be deleted. If any
of Y51, N55 and F56 are deleted in (xi), it/they cannot be mutated
in (ii) and vice versa.
[0244] In (i), the variant preferably comprises one or more of the
following substitutions N40R, N40K, D43N, D43Q, D43R, D43K, E44N,
E44Q, E44R, E44K, S54P, S57P, Q62R, Q62K, R97N, R97G, R97L, E101N,
E101Q, E101R, E101K, E101F, E101Y, E101W, E124N, E124Q, E124R,
E124K, E124F, E124Y, E124W, E131D, R142E, R142N, T150I, R192E and
R192N, such as one or more of N40R, N40K, D43N, D43Q, D43R, D43K,
E44N, E44Q, E44R, E44K, S54P, S57P, Q62R, Q62K, E101N, E101Q,
E101R, E101K, E101F, E101Y, E101W, E131D and T150I, or one or more
of N40R, N40K, D43N, D43Q, D43R, D43K, E44N, E44Q, E44R, E44K,
E101N, E101Q, E101R, E101K, E101F, E101Y, E101W and E131D. The
variant may comprise any number and combination of these
substitutions. In (i), the variant preferably comprises S54P and/or
S57P. In (i), the variant preferably comprises (a) S54P and/or S57P
and (b) a mutation at one or more of Y51, N55 and F56, such as at
Y51, N55, F56, Y51/N55, Y51/F56, N55/F56 or Y51/N55/F56. The
mutations at one or more of Y51, N55 and F56 may be any of those
discussed below. In (i), the variant preferably comprises F56A/S57P
or S54P/F56A. The variant preferably comprises T150I. Alternatively
the variant preferably comprises a mutation at (a) T150I and (b)
one or more of Y51, N55 and F56, such as at Y51, N55, F56, Y51/N55,
Y51/F56, N55/F56 or Y51/N55/F56.
[0245] In (i), the variant preferably comprises Q62R or Q62K.
Alternatively the variant preferably comprises (a) Q62R or Q62K and
(b) a mutation at one or more of Y51, N55 and F56, such as at Y51,
N55, F56, Y51/N55, Y51/F56, N55/F56 or Y51/N55/F56. The variant may
comprise D43N, E44N, Q62R or Q62K or any combination thereof, such
as D43N, E44N, Q62R, Q62K, D43N/E44N, D43N/Q62R, D43N/Q62K,
E44N/Q62R, E44N/Q62K, D43N/E44N/Q62R or D43N/E44N/Q62K.
Alternatively the variant preferably comprises (a) D43N, E44N,
Q62R, Q62K, D43N/E44N, D43N/Q62R, D43N/Q62K, E44N/Q62R, E44N/Q62K,
D43N/E44N/Q62R or D43N/E44N/Q62K and (b) a mutation at one or more
of Y51, N55 and F56, such as at Y51, N55, F56, Y51/N55, Y51/F56,
N55/F56 or Y51/N55/F56.
[0246] In (i), the variant preferably comprises D43N.
[0247] In (i), the variant preferably comprises E101R, E101S,
E101F or E101N.
[0248] In (i), the variant preferably comprises E124N, E124Q,
E124R, E124K, E124F, E124Y, E124W or E124D, such as E124N.
[0249] In (i), the variant preferably comprises R142E or
R142N.
[0250] In (i), the variant preferably comprises R97N, R97G or
R97L.
[0251] In (i), the variant preferably comprises R192E or
R192N.
[0252] In (ii), the variant preferably comprises F56N/N55Q,
F56N/N55R, F56N/N55K, F56N/N55S, F56N/N55G, F56N/N55A, F56N/N55T,
F56Q/N55Q, F56Q/N55R, F56Q/N55K, F56Q/N55S, F56Q/N55G, F56Q/N55A,
F56Q/N55T, F56R/N55Q, F56R/N55R, F56R/N55K, F56R/N55S, F56R/N55G,
F56R/N55A, F56R/N55T, F56S/N55Q, F56S/N55R, F56S/N55K, F56S/N55S,
F56S/N55G, F56S/N55A, F56S/N55T, F56G/N55Q, F56G/N55R, F56G/N55K,
F56G/N55S, F56G/N55G, F56G/N55A, F56G/N55T, F56A/N55Q, F56A/N55R,
F56A/N55K, F56A/N55S, F56A/N55G, F56A/N55A, F56A/N55T, F56K/N55Q,
F56K/N55R, F56K/N55K, F56K/N55S, F56K/N55G, F56K/N55A, F56K/N55T,
F56N/Y51L, F56N/Y51V, F56N/Y51A, F56N/Y51N, F56N/Y51Q, F56N/Y51S,
F56N/Y51G, F56Q/Y51L, F56Q/Y51V, F56Q/Y51A, F56Q/Y51N, F56Q/Y51Q,
F56Q/Y51S, F56Q/Y51G, F56R/Y51L, F56R/Y51V, F56R/Y51A, F56R/Y51N,
F56R/Y51Q, F56R/Y51S, F56R/Y51G, F56S/Y51L, F56S/Y51V, F56S/Y51A,
F56S/Y51N, F56S/Y51Q, F56S/Y51S, F56S/Y51G, F56G/Y51L, F56G/Y51V,
F56G/Y51A, F56G/Y51N, F56G/Y51Q, F56G/Y51S, F56G/Y51G, F56A/Y51L,
F56A/Y51V, F56A/Y51A, F56A/Y51N, F56A/Y51Q, F56A/Y51S, F56A/Y51G,
F56K/Y51L, F56K/Y51V, F56K/Y51A, F56K/Y51N, F56K/Y51Q, F56K/Y51S,
F56K/Y51G, N55Q/Y51L, N55Q/Y51V, N55Q/Y51A, N55Q/Y51N, N55Q/Y51Q,
N55Q/Y51S, N55Q/Y51G, N55R/Y51L, N55R/Y51V, N55R/Y51A, N55R/Y51N,
N55R/Y51Q, N55R/Y51S, N55R/Y51G, N55K/Y51L, N55K/Y51V, N55K/Y51A,
N55K/Y51N, N55K/Y51Q, N55K/Y51S, N55K/Y51G, N55S/Y51L, N55S/Y51V,
N55S/Y51A, N55S/Y51N, N55S/Y51Q, N55S/Y51S, N55S/Y51G, N55G/Y51L,
N55G/Y51V, N55G/Y51A, N55G/Y51N, N55G/Y51Q, N55G/Y51S, N55G/Y51G,
N55A/Y51L, N55A/Y51V, N55A/Y51A, N55A/Y51N, N55A/Y51Q, N55A/Y51S,
N55A/Y51G, N55T/Y51L, N55T/Y51V, N55T/Y51A, N55T/Y51N, N55T/Y51Q,
N55T/Y51S, N55T/Y51G, F56N/N55Q/Y51L, F56N/N55Q/Y51V,
F56N/N55Q/Y51A, F56N/N55Q/Y51N, F56N/N55Q/Y51Q, F56N/N55Q/Y51S,
F56N/N55Q/Y51G, F56N/N55R/Y51L, F56N/N55R/Y51V, F56N/N55R/Y51A,
F56N/N55R/Y51N, F56N/N55R/Y51Q, F56N/N55R/Y51S, F56N/N55R/Y51G,
F56N/N55K/Y51L, F56N/N55K/Y51V, F56N/N55K/Y51A, F56N/N55K/Y51N,
F56N/N55K/Y51Q, F56N/N55K/Y51S, F56N/N55K/Y51G, F56N/N55S/Y51L,
F56N/N55S/Y51V, F56N/N55S/Y51A, F56N/N55S/Y51N, F56N/N55S/Y51Q,
F56N/N55S/Y51S, F56N/N55S/Y51G, F56N/N55G/Y51L, F56N/N55G/Y51V,
F56N/N55G/Y51A, F56N/N55G/Y51N, F56N/N55G/Y51Q, F56N/N55G/Y51S,
F56N/N55G/Y51G, F56N/N55A/Y51L, F56N/N55A/Y51V, F56N/N55A/Y51A,
F56N/N55A/Y51N, F56N/N55A/Y51Q, F56N/N55A/Y51S, F56N/N55A/Y51G,
F56N/N55T/Y51L, F56N/N55T/Y51V, F56N/N55T/Y51A, F56N/N55T/Y51N,
F56N/N55T/Y51Q, F56N/N55T/Y51S, F56N/N55T/Y51G, F56Q/N55Q/Y51L,
F56Q/N55Q/Y51V, F56Q/N55Q/Y51A, F56Q/N55Q/Y51N, F56Q/N55Q/Y51Q,
F56Q/N55Q/Y51S, F56Q/N55Q/Y51G, F56Q/N55R/Y51L, F56Q/N55R/Y51V,
F56Q/N55R/Y51A, F56Q/N55R/Y51N, F56Q/N55R/Y51Q, F56Q/N55R/Y51S,
F56Q/N55R/Y51G, F56Q/N55K/Y51L, F56Q/N55K/Y51V, F56Q/N55K/Y51A,
F56Q/N55K/Y51N, F56Q/N55K/Y51Q, F56Q/N55K/Y51S, F56Q/N55K/Y51G,
F56Q/N55S/Y51L, F56Q/N55S/Y51V, F56Q/N55S/Y51A, F56Q/N55S/Y51N,
F56Q/N55S/Y51Q, F56Q/N55S/Y51S, F56Q/N55S/Y51G, F56Q/N55G/Y51L,
F56Q/N55G/Y51V, F56Q/N55G/Y51A, F56Q/N55G/Y51N, F56Q/N55G/Y51Q,
F56Q/N55G/Y51S, F56Q/N55G/Y51G, F56Q/N55A/Y51L, F56Q/N55A/Y51V,
F56Q/N55A/Y51A, F56Q/N55A/Y51N, F56Q/N55A/Y51Q, F56Q/N55A/Y51S,
F56Q/N55A/Y51G, F56Q/N55T/Y51L, F56Q/N55T/Y51V, F56Q/N55T/Y51A,
F56Q/N55T/Y51N, F56Q/N55T/Y51Q, F56Q/N55T/Y51S, F56Q/N55T/Y51G,
F56R/N55Q/Y51L, F56R/N55Q/Y51V, F56R/N55Q/Y51A, F56R/N55Q/Y51N,
F56R/N55Q/Y51Q, F56R/N55Q/Y51S, F56R/N55Q/Y51G, F56R/N55R/Y51L,
F56R/N55R/Y51V, F56R/N55R/Y51A, F56R/N55R/Y51N, F56R/N55R/Y51Q,
F56R/N55R/Y51S, F56R/N55R/Y51G, F56R/N55K/Y51L, F56R/N55K/Y51V,
F56R/N55K/Y51A, F56R/N55K/Y51N, F56R/N55K/Y51Q, F56R/N55K/Y51S,
F56R/N55K/Y51G, F56R/N55S/Y51L, F56R/N55S/Y51V, F56R/N55S/Y51A,
F56R/N55S/Y51N, F56R/N55S/Y51Q, F56R/N55S/Y51S, F56R/N55S/Y51G,
F56R/N55G/Y51L, F56R/N55G/Y51V, F56R/N55G/Y51A, F56R/N55G/Y51N,
F56R/N55G/Y51Q, F56R/N55G/Y51S, F56R/N55G/Y51G, F56R/N55A/Y51L,
F56R/N55A/Y51V, F56R/N55A/Y51A, F56R/N55A/Y51N, F56R/N55A/Y51Q,
F56R/N55A/Y51S, F56R/N55A/Y51G, F56R/N55T/Y51L, F56R/N55T/Y51V,
F56R/N55T/Y51A, F56R/N55T/Y51N, F56R/N55T/Y51Q, F56R/N55T/Y51S,
F56R/N55T/Y51G, F56S/N55Q/Y51L, F56S/N55Q/Y51V, F56S/N55Q/Y51A,
F56S/N55Q/Y51N, F56S/N55Q/Y51Q, F56S/N55Q/Y51S, F56S/N55Q/Y51G,
F56S/N55R/Y51L, F56S/N55R/Y51V, F56S/N55R/Y51A, F56S/N55R/Y51N,
F56S/N55R/Y51Q, F56S/N55R/Y51S, F56S/N55R/Y51G, F56S/N55K/Y51L,
F56S/N55K/Y51V, F56S/N55K/Y51A, F56S/N55K/Y51N, F56S/N55K/Y51Q,
F56S/N55K/Y51S, F56S/N55K/Y51G, F56S/N55S/Y51L, F56S/N55S/Y51V,
F56S/N55S/Y51A, F56S/N55S/Y51N, F56S/N55S/Y51Q, F56S/N55S/Y51S,
F56S/N55S/Y51G, F56S/N55G/Y51L, F56S/N55G/Y51V, F56S/N55G/Y51A,
F56S/N55G/Y51N, F56S/N55G/Y51Q, F56S/N55G/Y51S, F56S/N55G/Y51G,
F56S/N55A/Y51L, F56S/N55A/Y51V, F56S/N55A/Y51A, F56S/N55A/Y51N,
F56S/N55A/Y51Q, F56S/N55A/Y51S, F56S/N55A/Y51G, F56S/N55T/Y51L,
F56S/N55T/Y51V, F56S/N55T/Y51A, F56S/N55T/Y51N, F56S/N55T/Y51Q,
F56S/N55T/Y51S, F56S/N55T/Y51G, F56G/N55Q/Y51L, F56G/N55Q/Y51V,
F56G/N55Q/Y51A, F56G/N55Q/Y51N, F56G/N55Q/Y51Q, F56G/N55Q/Y51S,
F56G/N55Q/Y51G, F56G/N55R/Y51L, F56G/N55R/Y51V, F56G/N55R/Y51A,
F56G/N55R/Y51N, F56G/N55R/Y51Q, F56G/N55R/Y51S, F56G/N55R/Y51G,
F56G/N55K/Y51L, F56G/N55K/Y51V, F56G/N55K/Y51A, F56G/N55K/Y51N,
F56G/N55K/Y51Q, F56G/N55K/Y51S, F56G/N55K/Y51G, F56G/N55S/Y51L,
F56G/N55S/Y51V, F56G/N55S/Y51A, F56G/N55S/Y51N, F56G/N55S/Y51Q,
F56G/N55S/Y51S, F56G/N55S/Y51G, F56G/N55G/Y51L, F56G/N55G/Y51V,
F56G/N55G/Y51A, F56G/N55G/Y51N, F56G/N55G/Y51Q, F56G/N55G/Y51S,
F56G/N55G/Y51G, F56G/N55A/Y51L, F56G/N55A/Y51V, F56G/N55A/Y51A,
F56G/N55A/Y51N, F56G/N55A/Y51Q, F56G/N55A/Y51S, F56G/N55A/Y51G,
F56G/N55T/Y51L, F56G/N55T/Y51V, F56G/N55T/Y51A, F56G/N55T/Y51N,
F56G/N55T/Y51Q, F56G/N55T/Y51S, F56G/N55T/Y51G, F56A/N55Q/Y51L,
F56A/N55Q/Y51V, F56A/N55Q/Y51A, F56A/N55Q/Y51N, F56A/N55Q/Y51Q,
F56A/N55Q/Y51S, F56A/N55Q/Y51G, F56A/N55R/Y51L, F56A/N55R/Y51V,
F56A/N55R/Y51A, F56A/N55R/Y51N, F56A/N55R/Y51Q, F56A/N55R/Y51S,
F56A/N55R/Y51G, F56A/N55K/Y51L, F56A/N55K/Y51V, F56A/N55K/Y51A,
F56A/N55K/Y51N, F56A/N55K/Y51Q, F56A/N55K/Y51S, F56A/N55K/Y51G,
F56A/N55S/Y51L, F56A/N55S/Y51V, F56A/N55S/Y51A, F56A/N55S/Y51N,
F56A/N55S/Y51Q, F56A/N55S/Y51S, F56A/N55S/Y51G, F56A/N55G/Y51L,
F56A/N55G/Y51V, F56A/N55G/Y51A, F56A/N55G/Y51N, F56A/N55G/Y51Q,
F56A/N55G/Y51S, F56A/N55G/Y51G, F56A/N55A/Y51L, F56A/N55A/Y51V,
F56A/N55A/Y51A, F56A/N55A/Y51N, F56A/N55A/Y51Q, F56A/N55A/Y51S,
F56A/N55A/Y51G, F56A/N55T/Y51L, F56A/N55T/Y51V, F56A/N55T/Y51A,
F56A/N55T/Y51N, F56A/N55T/Y51Q, F56A/N55T/Y51S, F56A/N55T/Y51G,
F56K/N55Q/Y51L, F56K/N55Q/Y51V, F56K/N55Q/Y51A, F56K/N55Q/Y51N,
F56K/N55Q/Y51Q, F56K/N55Q/Y51S, F56K/N55Q/Y51G, F56K/N55R/Y51L,
F56K/N55R/Y51V, F56K/N55R/Y51A, F56K/N55R/Y51N, F56K/N55R/Y51Q,
F56K/N55R/Y51S, F56K/N55R/Y51G, F56K/N55K/Y51L, F56K/N55K/Y51V,
F56K/N55K/Y51A, F56K/N55K/Y51N, F56K/N55K/Y51Q, F56K/N55K/Y51S,
F56K/N55K/Y51G, F56K/N55S/Y51L, F56K/N55S/Y51V, F56K/N55S/Y51A,
F56K/N55S/Y51N, F56K/N55S/Y51Q, F56K/N55S/Y51S, F56K/N55S/Y51G,
F56K/N55G/Y51L, F56K/N55G/Y51V, F56K/N55G/Y51A, F56K/N55G/Y51N,
F56K/N55G/Y51Q, F56K/N55G/Y51S, F56K/N55G/Y51G, F56K/N55A/Y51L,
F56K/N55A/Y51V, F56K/N55A/Y51A, F56K/N55A/Y51N, F56K/N55A/Y51Q,
F56K/N55A/Y51S, F56K/N55A/Y51G, F56K/N55T/Y51L, F56K/N55T/Y51V,
F56K/N55T/Y51A, F56K/N55T/Y51N, F56K/N55T/Y51Q, F56K/N55T/Y51S,
F56K/N55T/Y51G, F56E/N55R, F56E/N55K, F56D/N55R, F56D/N55K,
F56R/N55E, F56R/N55D, F56K/N55E or F56K/N55D.
[0253] In (ii), the variant preferably comprises Y51R/F56Q,
Y51N/F56N, Y51M/F56Q, Y51L/F56Q, Y51I/F56Q, Y51V/F56Q, Y51A/F56Q,
Y51P/F56Q, Y51G/F56Q, Y51C/F56Q, Y51Q/F56Q, Y51N/F56Q, Y51S/F56Q,
Y51E/F56Q, Y51D/F56Q, Y51K/F56Q or Y51H/F56Q.
[0254] In (ii), the variant preferably comprises Y51T/F56Q,
Y51Q/F56Q or Y51A/F56Q.
[0255] In (ii), the variant preferably comprises Y51T/F56F,
Y51T/F56M, Y51T/F56L, Y51T/F56I, Y51T/F56V, Y51T/F56A, Y51T/F56P,
Y51T/F56G, Y51T/F56C, Y51T/F56Q, Y51T/F56N, Y51T/F56T, Y51T/F56S,
Y51T/F56E, Y51T/F56D, Y51T/F56K, Y51T/F56H or Y51T/F56R.
[0256] In (ii), the variant preferably comprises Y51T/N55Q,
Y51T/N55S or Y51T/N55A.
[0257] In (ii), the variant preferably comprises Y51A/F56F,
Y51A/F56L, Y51A/F56I, Y51A/F56V, Y51A/F56A, Y51A/F56P, Y51A/F56G,
Y51A/F56C, Y51A/F56Q, Y51A/F56N, Y51A/F56T, Y51A/F56S, Y51A/F56E,
Y51A/F56D, Y51A/F56K, Y51A/F56H or Y51A/F56R.
[0258] In (ii), the variant preferably comprises Y51C/F56A,
Y51E/F56A, Y51D/F56A, Y51K/F56A, Y51H/F56A, Y51Q/F56A, Y51N/F56A,
Y51S/F56A, Y51P/F56A or Y51V/F56A.
[0259] In (xi), the variant preferably comprises deletion of
Y51/P52, Y51/P52/A53, P50 to P52, P50 to A53, K49 to Y51, K49 to
A53 and replacement with a single proline (P), K49 to S54 and
replacement with a single P, Y51 to A53, Y51 to S54, N55/F56, N55
to S57, N55/F56 and replacement with a single P, N55/F56 and
replacement with a single glycine (G), N55/F56 and replacement with
a single alanine (A), N55/F56 and replacement with a single P and
Y51N, N55/F56 and replacement with a single P and Y51Q, N55/F56 and
replacement with a single P and Y51S, N55/F56 and replacement with
a single G and Y51N, N55/F56 and replacement with a single G and
Y51Q, N55/F56 and replacement with a single G and Y51S, N55/F56 and
replacement with a single A and Y51N, N55/F56 and replacement with
a single A and Y51Q or N55/F56 and replacement with a single A and
Y51S.
[0260] The variant more preferably comprises D195N/E203N,
D195Q/E203N, D195N/E203Q, D195Q/E203Q, E201N/E203N, E201Q/E203N,
E201N/E203Q, E201Q/E203Q, E185N/E203Q, E185Q/E203Q, E185N/E203N,
E185Q/E203N, D195N/E201N/E203N, D195Q/E201N/E203N,
D195N/E201Q/E203N, D195N/E201N/E203Q, D195Q/E201Q/E203N,
D195Q/E201N/E203Q, D195N/E201Q/E203Q, D195Q/E201Q/E203Q,
D149N/E201N, D149Q/E201N, D149N/E201Q, D149Q/E201Q,
D149N/E201N/D195N, D149Q/E201N/D195N, D149N/E201Q/D195N,
D149N/E201N/D195Q, D149Q/E201Q/D195N, D149Q/E201N/D195Q,
D149N/E201Q/D195Q, D149Q/E201Q/D195Q, D149N/E203N, D149Q/E203N,
D149N/E203Q, D149Q/E203Q, D149N/E185N/E201N, D149Q/E185N/E201N,
D149N/E185Q/E201N, D149N/E185N/E201Q, D149Q/E185Q/E201N,
D149Q/E185N/E201Q, D149N/E185Q/E201Q, D149Q/E185Q/E201Q,
D149N/E185N/E203N, D149Q/E185N/E203N, D149N/E185Q/E203N,
D149N/E185N/E203Q, D149Q/E185Q/E203N, D149Q/E185N/E203Q,
D149N/E185Q/E203Q, D149Q/E185Q/E203Q, D149N/E185N/E201N/E203N,
D149Q/E185N/E201N/E203N, D149N/E185Q/E201N/E203N,
D149N/E185N/E201Q/E203N, D149N/E185N/E201N/E203Q,
D149Q/E185Q/E201N/E203N, D149Q/E185N/E201Q/E203N,
D149Q/E185N/E201N/E203Q, D149N/E185Q/E201Q/E203N,
D149N/E185Q/E201N/E203Q, D149N/E185N/E201Q/E203Q,
D149Q/E185Q/E201Q/E203Q, D149Q/E185Q/E201N/E203Q,
D149Q/E185N/E201Q/E203Q, D149N/E185Q/E201Q/E203Q,
D149Q/E185Q/E201Q/E203N, D149N/E185N/D195N/E201N/E203N,
D149Q/E185N/D195N/E201N/E203N, D149N/E185Q/D195N/E201N/E203N,
D149N/E185N/D195Q/E201N/E203N, D149N/E185N/D195N/E201Q/E203N,
D149N/E185N/D195N/E201N/E203Q, D149Q/E185Q/D195N/E201N/E203N,
D149Q/E185N/D195Q/E201N/E203N, D149Q/E185N/D195N/E201Q/E203N,
D149Q/E185N/D195N/E201N/E203Q, D149N/E185Q/D195Q/E201N/E203N,
D149N/E185Q/D195N/E201Q/E203N, D149N/E185Q/D195N/E201N/E203Q,
D149N/E185N/D195Q/E201Q/E203N, D149N/E185N/D195Q/E201N/E203Q,
D149N/E185N/D195N/E201Q/E203Q, D149Q/E185Q/D195Q/E201N/E203N,
D149Q/E185Q/D195N/E201Q/E203N, D149Q/E185Q/D195N/E201N/E203Q,
D149Q/E185N/D195Q/E201Q/E203N, D149Q/E185N/D195Q/E201N/E203Q,
D149Q/E185N/D195N/E201Q/E203Q, D149N/E185Q/D195Q/E201Q/E203N,
D149N/E185Q/D195Q/E201N/E203Q, D149N/E185Q/D195N/E201Q/E203Q,
D149N/E185N/D195Q/E201Q/E203Q, D149Q/E185Q/D195Q/E201Q/E203N,
D149Q/E185Q/D195Q/E201N/E203Q, D149Q/E185Q/D195N/E201Q/E203Q,
D149Q/E185N/D195Q/E201Q/E203Q, D149N/E185Q/D195Q/E201Q/E203Q,
D149Q/E185Q/D195Q/E201Q/E203Q, D149N/E185R/E201N/E203N,
D149Q/E185R/E201N/E203N, D149N/E185R/E201Q/E203N,
D149N/E185R/E201N/E203Q, D149Q/E185R/E201Q/E203N,
D149Q/E185R/E201N/E203Q, D149N/E185R/E201Q/E203Q,
D149Q/E185R/E201Q/E203Q, D149R/E185N/E201N/E203N,
D149R/E185Q/E201N/E203N, D149R/E185N/E201Q/E203N,
D149R/E185N/E201N/E203Q, D149R/E185Q/E201Q/E203N,
D149R/E185Q/E201N/E203Q, D149R/E185N/E201Q/E203Q,
D149R/E185Q/E201Q/E203Q, D149R/E185N/D195N/E201N/E203N,
D149R/E185Q/D195N/E201N/E203N, D149R/E185N/D195Q/E201N/E203N,
D149R/E185N/D195N/E201Q/E203N, D149R/E185Q/D195N/E201N/E203Q,
D149R/E185Q/D195Q/E201N/E203N, D149R/E185Q/D195N/E201Q/E203N,
D149R/E185Q/D195N/E201N/E203Q, D149R/E185N/D195Q/E201Q/E203N,
D149R/E185N/D195Q/E201N/E203Q, D149R/E185N/D195N/E201Q/E203Q,
D149R/E185Q/D195Q/E201Q/E203N, D149R/E185Q/D195Q/E201N/E203Q,
D149R/E185Q/D195N/E201Q/E203Q, D149R/E185N/D195Q/E201Q/E203Q,
D149R/E185Q/D195Q/E201Q/E203Q, D149N/E185R/D195N/E201N/E203N,
D149Q/E185R/D195N/E201N/E203N, D149N/E185R/D195Q/E201N/E203N,
D149N/E185R/D195N/E201Q/E203N, D149N/E185R/D195N/E201N/E203Q,
D149Q/E185R/D195Q/E201N/E203N, D149Q/E185R/D195N/E201Q/E203N,
D149Q/E185R/D195N/E201N/E203Q, D149N/E185R/D195Q/E201Q/E203N,
D149N/E185R/D195Q/E201N/E203Q, D149N/E185R/D195N/E201Q/E203Q,
D149Q/E185R/D195Q/E201Q/E203N, D149Q/E185R/D195Q/E201N/E203Q,
D149Q/E185R/D195N/E201Q/E203Q, D149N/E185R/D195Q/E201Q/E203Q,
D149Q/E185R/D195Q/E201Q/E203Q, D149N/E185R/D195N/E201R/E203N,
D149Q/E185R/D195N/E201R/E203N, D149N/E185R/D195Q/E201R/E203N,
D149N/E185R/D195N/E201R/E203Q, D149Q/E185R/D195Q/E201R/E203N,
D149Q/E185R/D195N/E201R/E203Q, D149N/E185R/D195Q/E201R/E203Q,
D149Q/E185R/D195Q/E201R/E203Q, E131D/K49R, E101N/N102F,
E101N/N102Y, E101N/N102W, E101F/N102F, E101F/N102Y, E101F/N102W,
E101Y/N102F, E101Y/N102Y, E101Y/N102W, E101W/N102F, E101W/N102Y,
E101W/N102W, E101N/N102R, E101F/N102R, E101Y/N102R or
E101W/N102F.
[0261] Preferred variants of the invention which form pores in
which fewer nucleotides contribute to the current as the
polynucleotide moves through the pore comprise Y51A/F56A,
Y51A/F56N, Y51I/F56A, Y51L/F56A, Y51T/F56A, Y51I/F56N, Y51L/F56N or
Y51T/F56N or more preferably Y51I/F56A, Y51L/F56A or
Y51T/F56A. As discussed above, this makes it easier to identify a
direct relationship between the observed current (as the
polynucleotide moves through the pore) and the polynucleotide.
[0262] Preferred variants which form pores displaying an increased
range comprise mutations at the following positions:
[0263] Y51, F56, D149, E185, E201 and E203;
[0264] N55 and F56;
[0265] Y51 and F56;
[0266] Y51, N55 and F56; or
[0267] F56 and N102.
[0268] Preferred variants which form pores displaying an increased
range comprise:
[0269] Y51N, F56A, D149N, E185R, E201N and E203N;
[0270] N55S and F56Q;
[0271] Y51A and F56A;
[0272] Y51A and F56N;
[0273] Y51I and F56A;
[0274] Y51L and F56A;
[0275] Y51T and F56A;
[0276] Y51I and F56N;
[0277] Y51L and F56N;
[0278] Y51T and F56N;
[0279] Y51T and F56Q;
[0280] Y51A, N55S and F56A;
[0281] Y51A, N55S and F56N;
[0282] Y51T, N55S and F56Q; or
[0283] F56Q and N102R.
[0284] Preferred variants which form pores in which fewer
nucleotides contribute to the current as the polynucleotide moves
through the pore comprise mutations at the following positions:
[0285] N55 and F56, such as N55X and F56Q, wherein X is any amino
acid; or
[0286] Y51 and F56, such as Y51X and F56Q, wherein X is any amino
acid.
[0287] Particularly preferred variants comprise Y51A and F56Q.
[0288] Preferred variants which form pores displaying an increased
throughput comprise mutations at the following positions:
[0289] D149, E185 and E203;
[0290] D149, E185, E201 and E203; or
[0291] D149, E185, D195, E201 and E203.
[0292] Preferred variants which form pores displaying an increased
throughput comprise:
[0293] D149N, E185N and E203N;
[0294] D149N, E185N, E201N and E203N;
[0295] D149N, E185R, D195N, E201N and E203N; or
[0296] D149N, E185R, D195N, E201R and E203N.
[0297] Preferred variants which form pores in which capture of the
polynucleotide is increased comprise the following mutations:
[0298] D43N/Y51T/F56Q;
[0299] E44N/Y51T/F56Q;
[0300] D43N/E44N/Y51T/F56Q;
[0301] Y51T/F56Q/Q62R;
[0302] D43N/Y51T/F56Q/Q62R;
[0303] E44N/Y51T/F56Q/Q62R; or
[0304] D43N/E44N/Y51T/F56Q/Q62R.
[0305] Preferred variants comprise the following mutations:
[0306] D149R/E185R/E201R/E203R or
Y51T/F56Q/D149R/E185R/E201R/E203R;
[0307] D149N/E185N/E201N/E203N or
Y51T/F56Q/D149N/E185N/E201N/E203N;
[0308] E201R/E203R or Y51T/F56Q/E201R/E203R;
[0309] E201N/E203R or Y51T/F56Q/E201N/E203R;
[0310] E203R or Y51T/F56Q/E203R;
[0311] E203N or Y51T/F56Q/E203N;
[0312] E201R or Y51T/F56Q/E201R;
[0313] E201N or Y51T/F56Q/E201N;
[0314] E185R or Y51T/F56Q/E185R;
[0315] E185N or Y51T/F56Q/E185N;
[0316] D149R or Y51T/F56Q/D149R;
[0317] D149N or Y51T/F56Q/D149N;
[0318] R142E or Y51T/F56Q/R142E;
[0319] R142N or Y51T/F56Q/R142N;
[0320] R192E or Y51T/F56Q/R192E; or
[0321] R192N or Y51T/F56Q/R192N.
[0322] Preferred variants comprise the following mutations:
[0323] Y51A/F56Q/E101N/N102R;
[0324] Y51A/F56Q/R97N/N102G;
[0325] Y51A/F56Q/R97N/N102R;
[0326] Y51A/F56Q/R97N;
[0327] Y51A/F56Q/R97G;
[0328] Y51A/F56Q/R97L;
[0329] Y51A/F56Q/N102R;
[0330] Y51A/F56Q/N102F;
[0331] Y51A/F56Q/N102G;
[0332] Y51A/F56Q/E101R;
[0333] Y51A/F56Q/E101F;
[0334] Y51A/F56Q/E101N; or
[0335] Y51A/F56Q/E101G.
[0336] The variant preferably further comprises a mutation at T150.
A preferred variant which forms a pore displaying an increased
insertion comprises T150I. A mutation at T150, such as T150I, may
be combined with any of the mutations or combinations of mutations
discussed above.
[0337] A preferred variant of SEQ ID NO: 3 comprises (a) R97W and
(b) a mutation at Y51 and/or F56. A preferred variant of SEQ ID NO:
3 comprises (a) R97W and (b) Y51R/H/K/D/E/S/T/N/Q/C/G/P/A/V/I/L/M
and/or F56R/H/K/D/E/S/T/N/Q/C/G/P/A/V/I/L/M. A preferred variant
of SEQ ID NO: 3 comprises (a) R97W and (b) Y51L/A/N/Q/S/G and/or
F56A/Q/N. A preferred variant of SEQ ID NO: 3 comprises (a) R97W
and (b) Y51A and/or F56Q. A preferred variant of SEQ ID NO: 3
comprises R97W, Y51A and F56Q.
[0338] The variant of SEQ ID NO: 3 preferably comprises a mutation
at R192. The variant preferably comprises R192D/Q/F/S/T/N/E,
R192D/Q/F/S/T or R192D/Q. A preferred variant of SEQ ID NO: 3
comprises (a) R97W, (b) a mutation at Y51 and/or F56 and (c) a
mutation at R192, such as R192D/Q/F/S/T/N/E, R192D/Q/F/S/T or
R192D/Q. A preferred variant of SEQ ID NO: 3 comprises (a) R97W,
(b) Y51R/H/K/D/E/S/T/N/Q/C/G/P/A/V/I/L/M and/or
F56R/H/K/D/E/S/T/N/Q/C/G/P/A/V/I/L/M and (c) a mutation at R192, such
as R192D/Q/F/S/T/N/E, R192D/Q/F/S/T or R192D/Q. A preferred variant
of SEQ ID NO: 3 comprises (a) R97W, (b) Y51L/V/A/N/Q/S/G and/or
F56A/Q/N and (c) a mutation at R192, such as R192D/Q/F/S/T/N/E,
R192D/Q/F/S/T or R192D/Q. A preferred variant of SEQ ID NO: 3
comprises (a) R97W, (b) Y51A and/or F56Q and (c) a mutation at
R192, such as R192D/Q/F/S/T/N/E, R192D/Q/F/S/T or R192D/Q. A
preferred variant of SEQ ID NO: 3 comprises R97W, Y51A, F56Q and
R192D/Q/F/S/T or R192D/Q. A preferred variant of SEQ ID NO: 3
comprises R97W, Y51A, F56Q and R192D. A preferred variant of SEQ ID
NO: 3 comprises R97W, Y51A, F56Q and R192Q. In the paragraphs above
where different amino acids at a specific position are separated by
the / symbol, the / symbol means "or". For instance, R192D/Q means
R192D or R192Q.
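As an illustration only, the shorthand described above can be
expanded mechanically. The following Python sketch is not part of
the disclosure: the function name expand_alternatives is
hypothetical, and it assumes single-letter amino acid codes and that
the / symbol separates alternatives at one position. It deliberately
rejects combination strings such as Y51A/F56Q, in which the / symbol
separates substitutions at different positions.

```python
import re

# Wild-type letter, position number, then one or more alternatives
# separated by "/" (an optional space after the number is tolerated).
_TOKEN = re.compile(r"([A-Z])(\d+)\s*([A-Z](?:/[A-Z])*)")


def expand_alternatives(token):
    """Expand e.g. 'R192D/Q' into ['R192D', 'R192Q']."""
    match = _TOKEN.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not a single-position shorthand: {token!r}")
    wild_type, position, alternatives = match.groups()
    return [f"{wild_type}{position}{aa}" for aa in alternatives.split("/")]


print(expand_alternatives("R192D/Q"))  # ['R192D', 'R192Q']
```

Applied to R192D/Q/F/S/T/N/E, the same function would return seven
individual substitutions, one per listed amino acid.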
[0339] Any of the above preferred variants of SEQ ID NO: 3 may
further comprise a mutation at R93. A preferred variant of SEQ ID
NO: 3 comprises (a) R93W and (b) a mutation at Y51 and/or F56,
preferably Y51A and F56Q.
[0340] Any of the above preferred variants of SEQ ID NO: 3 may
comprise a K94N/Q mutation. Any of the above preferred variants of
SEQ ID NO: 3 may comprise a F191T mutation.
[0341] The CsgG monomer may be modified to facilitate attachment to
the CsgF peptide. For example a cysteine residue may be introduced
at one or more of the positions corresponding to positions 132,
133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183,
185, 187, 189, 191, 201, 203, 205, 207 and 209 of SEQ ID NO: 3,
and/or at any one of the positions identified in Table 4 as being
predicted to make contact with CsgF, to facilitate covalent
attachment to CsgG. As an alternative or addition to covalent
attachment via cysteine residues, the pore may be stabilised by
hydrophobic interactions or electrostatic interactions. To
facilitate such interactions, a non-native reactive or
photoreactive amino acid may be introduced at a position
corresponding to one or more
of positions 132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151,
153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 and 209 of
SEQ ID NO: 3, and/or at any one of the positions identified in
Table 4 as being predicted to make contact with CsgF.
[0342] Preferred exemplary pores include at least one CsgG monomer
having the following mutations relative to SEQ ID NO: 3:
Y51X.sub.1/N55X.sub.2/F56X.sub.3/N91R/K94Q/R97W/R192D-del(V105-I107),
wherein X.sub.1 is I/V/S/T, X.sub.2 is N/I/V/S/T and/or X.sub.3 is
Q/I/V/S/T.
[0343] Methods for introducing or substituting naturally-occurring
amino acids are well known in the art. For instance, methionine (M)
may be substituted with arginine (R) by replacing the codon for
methionine (ATG) with a codon for arginine (CGT) at the relevant
position in a polynucleotide encoding the mutant monomer. The
polynucleotide can then be expressed as discussed below.
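The codon replacement described above can be sketched in a few lines
of Python. This is an illustrative example only: the function name
replace_codon and the short toy open reading frame are hypothetical
and are not taken from any sequence of the invention. Only the
ATG to CGT exchange itself comes from the paragraph above.

```python
def replace_codon(coding_sequence, residue_number, old_codon, new_codon):
    """Swap the codon of a 1-based residue, checking the existing codon first."""
    if residue_number < 1:
        raise ValueError("residue numbering is 1-based")
    if len(new_codon) != 3:
        raise ValueError("a codon has exactly three nucleotides")
    start = 3 * (residue_number - 1)
    current = coding_sequence[start:start + 3]
    if current != old_codon:
        raise ValueError(f"expected {old_codon} at residue {residue_number}, found {current!r}")
    return coding_sequence[:start] + new_codon + coding_sequence[start + 3:]


# Hypothetical toy reading frame: ATG GCT ATG AAA TAA (M A M K stop).
toy_orf = "ATGGCTATGAAATAA"
# Replace the methionine codon at residue 3 with an arginine codon.
print(replace_codon(toy_orf, 3, "ATG", "CGT"))  # ATGGCTCGTAAATAA
```

Checking the existing codon before replacing it guards against an
off-by-one error in residue numbering.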
Double Pores
[0344] The CsgG/CsgF pore may be a double pore comprising a first
pore and a second pore. At least the first pore is a CsgG/CsgF pore
disclosed herein. The second pore may be a CsgG pore or a CsgG/CsgF
pore. In one embodiment both the first pore and the second pore are
CsgG/CsgF pores as disclosed herein. The first and second pores may
be the same or different. In addition to any of the mutations
disclosed herein, in a double pore, the CsgG monomer may comprise
one or more of the additional mutations described below.
[0345] In the double pore, the first pore may be attached to the
second CsgG pore by hydrophobic interactions and/or by one or more
disulphide bonds. One or more, such as 2, 3, 4, 5, 6, 8, 9, for
example all, of the monomers in the first pore and/or the second
pore may be modified to enhance such interactions. This may be
achieved in any suitable way.
[0346] At least one cysteine residue in the amino acid sequence of
the first pore at the interface between the first and second pores
may be disulphide bonded to at least one cysteine residue in the
amino acid sequence of the second pore at the interface between the
first and second pores. The cysteine residue in the first pore
and/or the cysteine residue in the second pore may be a cysteine
residue that is not present in the wild type CsgG monomer. Multiple
disulphide bonds, such as from 2, 3, 4, 5, 6, 7, 8 or 9 to 16, 18,
24, 27, 32, 36, 40, 45, 48, 54, 56 or 63, may form between the two
pores in the double pore. One or both of the first and second pores may
comprise at least one monomer, such as up to 8, 9 or 10 monomers,
that comprises a cysteine residue at the interface between the
first and second pores at a position corresponding to R97, I107,
R110, Q100, E101, N102 and/or L113 of SEQ ID NO: 3.
[0347] At least one monomer in the first pore and/or at least one
monomer in the second pore may comprise at least one residue at the
interface between the first and second pores, which residue is more
hydrophobic than the residue present at the corresponding position
in the wild type CsgG monomer. For example, from 2 to 10, such as
3, 4, 5, 6, 7, 8 or 9, residues in the first pore and/or the second
pore may be more hydrophobic than the residues at the same
positions in the corresponding wild type CsgG monomer. Such
hydrophobic residues strengthen the interaction between the two
pores in the double pore. The at least one residue at the interface
between the first and second pores may be at a position
corresponding to R97, I107, R110, Q100, E101, N102 and/or L113 of
SEQ ID NO: 3. Where the residue at the interface in the wild type
CsgG monomer is R, Q, N or E, the hydrophobic residue is typically
I, L, V, M, F, W or Y. Where the residue at the interface in the
wild type CsgG monomer is I, the hydrophobic residue is typically
L, V, M, F, W or Y. Where the residue at the interface in the wild
type CsgG monomer is L, the hydrophobic residue is typically I, V,
M, F, W or Y.
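By way of illustration only, the rules in the preceding paragraph
can be written as a lookup table. The Python sketch below is not
part of the disclosure: the names INTERFACE_WILD_TYPE,
TYPICAL_HYDROPHOBIC and is_typical_interface_substitution are
hypothetical. The positions and residue letters are those recited
above for SEQ ID NO: 3.

```python
# Wild-type residue at each interface position recited for SEQ ID NO: 3.
INTERFACE_WILD_TYPE = {97: "R", 100: "Q", 101: "E", 102: "N",
                       107: "I", 110: "R", 113: "L"}

# Typical hydrophobic replacements, keyed by the wild-type residue.
TYPICAL_HYDROPHOBIC = {
    "R": "ILVMFWY", "Q": "ILVMFWY", "N": "ILVMFWY", "E": "ILVMFWY",
    "I": "LVMFWY",
    "L": "IVMFWY",
}


def is_typical_interface_substitution(mutation):
    """Return True if e.g. 'R97W' follows the typical rules recited above."""
    wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    if INTERFACE_WILD_TYPE.get(position) != wild_type:
        return False  # not one of the recited interface positions
    return new in TYPICAL_HYDROPHOBIC[wild_type]


print(is_typical_interface_substitution("R97W"))   # True
print(is_typical_interface_substitution("N102K"))  # False
```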
[0348] The double pore may comprise one or more monomers that
comprise one or more cysteine residues at the interface between the
pores and one or more monomers that comprise one or more introduced
hydrophobic residues at the interface between the pores, or may
comprise one or more monomers that comprise such cysteine residues
and such hydrophobic residues. For example, one or more, such as
any 2, 3, or 4, of the positions in the monomer corresponding to
the positions at R97, I107, R110, Q100, E101, N102 and/or L113 of
SEQ ID NO: 3 may comprise a cysteine (C) residue and one or more,
such as any 2, 3 or 4, of the positions in the monomer
corresponding to the positions at R97, I107, R110, Q100, E101, N102
and/or L113 of SEQ ID NO: 3 may comprise a hydrophobic residue,
such as I, L, V, M, F, W or Y.
[0349] The double pore may contain bulky residues at one
or more, such as 2, 3, 4, 5, 6 or 7, positions in the tail region,
which residues are typically at the interface between the first and
second pores and are bulkier than the residues present at the
corresponding positions in the wild type CsgG monomer. The bulk of
these residues prevents holes from forming in the walls of the pore
at the interface between the first and second pore in the double
pore. The at least one bulky residue at the interface between the
first and second pores is typically at a position corresponding to
A98, A99, T104, V105, L113, Q114 or S115 of SEQ ID NO: 3. Where the
residue at the interface in the wild type CsgG monomer is A, the
bulky residue is typically I, L, V, M, F, W, Y, N, Q, S or T. Where
the residue present at the interface in the wild type CsgG monomer
is T, the bulky residue is typically L, M, F, W, Y, N, Q, R, D or
E. Where the residue present at the interface in the wild type CsgG
monomer is V, the bulky residue is typically I, L, M, F, W, Y, N or
Q. Where the residue present at the interface in the wild type CsgG
monomer is L, the bulky residue is typically M, F, W, Y, N, Q, R, D
or E. Where the residue present at the interface in the wild type
CsgG monomer is Q, the bulky residue is typically F, W or Y. Where
the residue present at the interface in the wild type CsgG monomer
is S, the bulky residue is typically M, F, W, Y, N, Q, E or R.
[0350] Particularly where the second pore is located outside the
membrane, the second pore, and optionally the first pore,
preferably comprises residues in the barrel region of the pore that
reduce the negative charge inside the barrel compared to the charge
in the barrel of the wild type CsgG pore. These mutations make the
barrel more hydrophilic. At least one monomer in the first pore
and/or at least one monomer in the second pore of the double pore
may comprise at least one residue in the barrel region of the pore,
which residue has less negative charge than the residue present at
the corresponding position in the wild type CsgG monomer. The
charge inside the barrel is sufficiently neutral or positive such
that negatively charged analytes, such as polynucleotides, are not
repelled from entering the pore by electrostatic charges. At least
one residue, such as 2, 3, 4 or 5 residues, in the barrel region of
the pore at a position corresponding to D149, E185, D195, E210
and/or E203 of SEQ ID NO: 3 may be a neutral or positively charged
amino acid. At least one residue, such as 2, 3, 4 or 5 residues, in
the barrel region of the pore at a position corresponding to D149,
E185, D195, E210 and/or E203 of SEQ ID NO: 3 is preferably N, Q, R
or K.
[0351] Particular examples of charge-removing mutations in SEQ ID
NO: 3 include the following: E185N/E203N;
D149N/E185R/D195N/E201R/E203N, D149N/E185R/D195N/E201N/E203N,
D149R/E185N/D195N/E201N/E203N, D149R/E185N/E201N/E203N,
D149N/E185N/D195N/E201N/E203N, D149N/E185N/E201N/E203N,
D149N/E185N/E203N, D149N/E185N/E201N, D149N/E203N,
D149N/E201N/D195N, D149N/E201N, D195N/E201N/E203N, E201N/E203N,
D195N/E203, E203R, E203N, E201R, E201N, D195R, D195N, E185R, E185N,
D149R and D149N.
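The effect of such mutations on the charge of the barrel can be
estimated with simple arithmetic. The following Python sketch is
illustrative only and is not part of the disclosure: the function
name charge_change is hypothetical, and it assumes formal side-chain
charges near neutral pH (D and E as -1, K and R as +1, all other
residues, including histidine, as 0). It returns the change per
monomer.

```python
# Assumed formal side-chain charges near neutral pH.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def charge_change(variant):
    """Net formal charge change per monomer for e.g. 'D149N/E185R'."""
    total = 0
    for substitution in variant.split("/"):
        wild_type, new = substitution[0], substitution[-1]
        total += SIDE_CHAIN_CHARGE.get(new, 0) - SIDE_CHAIN_CHARGE.get(wild_type, 0)
    return total


print(charge_change("D149N/E185N/E203N"))              # 3
print(charge_change("D149N/E185R/D195N/E201R/E203N"))  # 7
```

Under these assumptions, replacing an acidic residue with N removes
one negative charge, whereas replacing it with R removes one
negative charge and adds one positive charge.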
[0352] At least one CsgG monomer in the first pore may comprise at
least one residue in the constriction of the barrel region of the
first pore, which residue decreases, maintains or increases the
length of the constriction compared to the wild type CsgG pore
and/or at least one monomer in the second CsgG pore may comprise at
least one residue in the constriction of the barrel region of the
second pore, which residue decreases, maintains or increases the
length of the constriction compared to the wild type CsgG pore.
Preferably, the length of the constriction in the first pore and/or
the length of the constriction in the second pore is at least as
long as in the wild-type pore and more preferably longer.
[0353] The length of the pore may be increased by inserting
residues into the region corresponding to the region between
positions K49 and F56 of SEQ ID NO: 3. From 1 to 5, such as 2, 3,
or 4 amino acid residues may be inserted at any one or more of the
following positions defined by reference to SEQ ID NO: 3: K49 and
P50, P50 and Y51, Y51 and P52, P52 and A53, A53 and S54, S54 and
N55 and/or N55 and F56. Preferably from 1 to 10, such as 2 to 8, or
3 to 5 amino acid residues in total are inserted into the sequence
of a monomer. Preferably, all of the monomers in the first pore
and/or all of the monomers in the second pore have the same number
of insertions in this region. The inserted residues may increase
the length of the loop between the residues corresponding to Y51
and N55 of SEQ ID NO: 3. The inserted residues may be any
combination of A, S, G or T to maintain flexibility; P to add a
kink to the loop; and/or S, T, N, Q, M, F, W, Y, V and/or I to
contribute to the signal produced when an analyte interacts with the
barrel of the pore under an applied potential difference. The
inserted amino acids may be any combination of S, G, SG, SGG, SGS,
GS, GSS and/or GSG.
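The numerical limits in the preceding paragraph can be checked
programmatically. The Python sketch below is an illustration under
stated assumptions and is not part of the disclosure: the junction
labels (such as "Y51-P52") and the function name
check_insertion_plan are hypothetical. The limits of 1 to 5
residues per insertion site and the preferred total of 1 to 10
residues per monomer are taken from the text above.

```python
# Hypothetical labels for the seven insertion sites between K49 and F56.
JUNCTIONS = ("K49-P50", "P50-Y51", "Y51-P52", "P52-A53",
             "A53-S54", "S54-N55", "N55-F56")


def check_insertion_plan(plan):
    """Validate a {junction: inserted residues} plan and return the total length."""
    for junction, inserted in plan.items():
        if junction not in JUNCTIONS:
            raise ValueError(f"unknown insertion site: {junction}")
        if not 1 <= len(inserted) <= 5:
            raise ValueError(f"{junction}: insert 1 to 5 residues per site")
    total = sum(len(inserted) for inserted in plan.values())
    if not 1 <= total <= 10:
        raise ValueError("preferred total is 1 to 10 inserted residues")
    return total


print(check_insertion_plan({"Y51-P52": "SG", "S54-N55": "GSG"}))  # 5
```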
[0354] In the double pore, the constriction in the barrel of the
first pore and/or the second pore may comprise at least one
residue, such as 2, 3, 4 or 5 residues, which influences the
properties of the pore when used to detect or characterise an
analyte compared to when a first pore or a second pore with a
wild-type constriction is used, wherein the at least one residue in
the constriction of the barrel region of the pore is at a position
corresponding to F56, N55, Y51, P52 and/or A53 of SEQ ID NO: 3. The
at least one residue may be Q or V at a position corresponding to
F56 of SEQ ID NO: 3; A or Q at a position corresponding to Y51 of
SEQ ID NO: 3; and/or V at a position corresponding to N55 of SEQ ID
NO: 3.
[0355] The double pore may comprise at least one monomer in the
first CsgG pore and/or at least one monomer in the second CsgG
pore, which monomer comprises two or more of the mutations defined
above.
[0356] A CsgG monomer in the double pore may comprise a cysteine
residue at a position corresponding to R97, I107, R110, Q100, E101,
N102 and/or L113 of SEQ ID NO: 3.
[0357] A CsgG monomer in the double pore may comprise a residue at
a position corresponding to any one or more of R97, Q100, I107,
R110, E101, N102 and L113 of SEQ ID NO: 3, which residue is more
hydrophobic than the residue present at the corresponding position
of SEQ ID NO: 3, such as the corresponding position of any one of
SEQ ID NOs: 68 to 88, wherein the residue at the position
corresponding to R97 and/or I107 is M, the residue at the position
corresponding to R110 is I, L, V, M, W or Y, and/or the residue at
the position corresponding to E101 or N102 is V or M. The residue
at a position corresponding to Q100 is typically I, L, V, M, F, W
or Y; and/or the residue at a position corresponding to L113 is
typically I, V, M, F, W or Y.
[0358] Particular monomers may have the sequence shown in SEQ ID NO:
3 comprising Y51A, F56Q substitutions and R97I/V/L/M/F/W/Y,
I107L/V/M/F/W/Y, R110I/V/L/M/F/W/Y, Q100I/V/L/M/F/W/Y,
E101I/V/L/M/F/W/Y, N102I/V/L/M/F/W/Y and L113CI/V/L/M/F/W/Y in
combination, R97I/V/L/M/F/W/Y and N102I/V/L/M/F/W/Y in combination
and/or R97I/V/L/M/F/W/Y and E101I/V/L/M/F/W/Y in combination. I107
may already form hydrophobic interactions between two pores.
[0359] The CsgG monomer in at least one of the pores in the double
pore may comprise a residue at a position corresponding to any one
or more of A98, A99, T104, V105, L113, Q114 and S115 of SEQ ID NO:
3 which is bulkier than the residue present at the corresponding
position of SEQ ID NO: 3, such as the corresponding position of any
one of SEQ ID NOs: 68 to 88, wherein the residue at the position
corresponding to T104 is L, M, F, W, Y, N, Q, D or E, the residue
at the position corresponding to L113 is M, F, W, Y, N, G, D or E
and/or the residue at the position corresponding to S115 is M, F,
W, Y, N, Q or E. The residue at a position corresponding to A98 or
A99 is typically I, L, V, M, F, W, Y, N, Q, S or T. The residue at
a position corresponding to V105 is I, L, M, F, W, Y, N or Q. The
residue at a position corresponding to Q114 is F, W or Y. The
residue at a position corresponding to E210 is N, Q, R or K.
[0360] Particular monomers may have the sequence shown in SEQ ID NO:
3 comprising Y51A, F56Q substitutions and 1, 2, 3, 4, 5, 6 or all
of the following substitutions: A98I/L/V/M/F/W/Y/N/Q/S/T;
A99I/L/V/M/F/W/Y/N/Q/S/T; T104N/Q/L/R/D/E/M/F/W/Y;
V105I/L/M/F/W/Y/N/Q; L113M/F/W/Y/N/Q/D/E/L/R; Q114Y/F/W; and
S115N/Q/M/F/W/Y/E/R.
[0361] The CsgG monomer in at least one of the pores in the double
pore may comprise a residue in the barrel region of the pore at a
position corresponding to any one or more of D149, E185, D195,
E210 and E203 which has less negative charge than the residue
present at the corresponding position of SEQ ID NO: 3, such as the
corresponding
position of any one of SEQ ID NOs: 68 to 88, wherein the residue at
the position corresponding to D149, E185, D195 and/or E203 is
K.
[0362] The CsgG monomer in at least one of the pores in the double
pore may comprise at least one residue in the constriction of the
barrel region of the pore, which residue increases the length of
the constriction compared to the wild type CsgG pore. The at least
one residue is additional to the residues present in the
constriction of the wild type CsgG pore.
[0363] The length of the pore may be increased by inserting
residues into the region corresponding to the region between
positions K49 and F56 of SEQ ID NO: 3. From 1 to 5, such as 2, 3,
or 4 amino acid residues may be inserted at any one or more of the
following positions defined by reference to SEQ ID NO: 3: K49 and
P50, P50 and Y51, Y51 and P52, P52 and A53, A53 and S54, S54 and
N55 and/or N55 and F56. Preferably from 1 to 10, such as 2 to 8, or
3 to 5 amino acid residues in total are inserted into the sequence
of the monomer. The inserted residues may increase the length of
the loop between the residues corresponding to Y51 and N55 of SEQ
ID NO: 3. The inserted residues may be any combination of A, S, G
or T to maintain flexibility; P to add a kink to the loop; and/or
S, T, N, Q, M, F, W, Y, V and/or I to contribute to the signal
produced when an analyte interacts with the barrel of the pore under
an applied potential difference. The inserted amino acids may be
any combination of S, G, SG, SGG, SGS, GS, GSS and/or GSG.
[0364] The CsgG monomer in at least one of the pores in the double
pore may comprise at least one residue in the constriction of the
barrel region of the pore at a position corresponding to N55, P52
and/or A53 of SEQ ID NO: 3 that is different from the residue
present in the corresponding wild type monomer, wherein the residue
at a position corresponding to N55 is V.
[0365] Any two or more of the above described residues may be
present in the same monomer. In particular the monomer may comprise
at least one said cysteine residue, at least one said hydrophobic
residue, at least one said bulky residue, at least one said neutral
or positively charged residue and/or at least one said residue that
increases the length of the constriction.
[0366] A CsgG monomer in the double pore may additionally comprise
one or more, such as 2, 3, 4 or 5 residues, which influence the
properties of the pore when used to detect or characterise an
analyte compared to when a first pore or a second pore with a
wild-type constriction is used, wherein the at least one residue in
the constriction of the barrel region of the pore is at a position
corresponding to F56, N55, Y51, P52 and/or A53 of SEQ ID NO: 3. The
at least one residue may be Q or V at a position corresponding to
F56 of SEQ ID NO: 3; A or Q at a position corresponding to Y51 of
SEQ ID NO: 3; and/or V at a position corresponding to N55 of SEQ ID
NO: 3.
Method for Making Modified Proteins
[0367] Methods for introducing or substituting
non-naturally-occurring amino acids are also well known in the art.
For instance, non-naturally-occurring amino acids may be introduced
by including synthetic aminoacyl-tRNAs in the IVTT system used to
express the mutant monomer. Alternatively, they may be introduced
by expressing the mutant monomer in E. coli that are auxotrophic
for specific amino acids in the presence of synthetic (i.e.
non-naturally-occurring) analogues of those specific amino acids.
They may also be produced by naked ligation if the mutant monomer
is produced using partial peptide synthesis.
[0368] The monomers derived from CsgG may be modified to assist
their identification or purification, for example by the addition
of a streptavidin tag or by the addition of a signal sequence to
promote their secretion from a cell where the monomer does not
naturally contain such a sequence. Other suitable tags are
discussed in more detail below. The monomer may be labelled with a
revealing label. The revealing label may be any suitable label
which allows the monomer to be detected. Suitable labels are
described below.
[0369] The monomer derived from CsgG may also be produced using
D-amino acids. For instance, the monomer derived from CsgG may
comprise a mixture of L-amino acids and D-amino acids. This is
conventional in the art for producing such proteins or
peptides.
[0370] The monomer derived from CsgG contains one or more specific
modifications to facilitate nucleotide discrimination. The monomer
derived from CsgG may also contain other non-specific modifications
as long as they do not interfere with pore formation. A number of
non-specific side chain modifications are known in the art and may
be made to the side chains of the monomer derived from CsgG. Such
modifications include, for example, reductive alkylation of amino
acids by reaction with an aldehyde followed by reduction with
NaBH.sub.4, amidination with methylacetimidate or acylation with
acetic anhydride.
[0371] The monomer derived from CsgG can be produced using standard
methods known in the art. The monomer derived from CsgG may be made
synthetically or by recombinant means. For example, the monomer may
be synthesised by in vitro translation and transcription (IVTT).
Suitable methods for producing pores and monomers are discussed in
the International applications WO 2010/004273, WO 2010/004265 or WO
2010/086603. Methods for inserting pores into membranes are
known.
[0372] Two or more CsgG monomers in the pore may be covalently
attached to one another. For example, at least 2, at least 3, at
least 4, at least 5, at least 6, at least 7, at least 8, at least 9
or at least 10 monomers may be covalently attached. The covalently
attached monomers may be the same or different.
[0373] The monomers may be genetically fused, optionally via a
linker, or chemically fused, for instance via a chemical
crosslinker. Methods for covalently attaching monomers are
disclosed in WO2017/149316, WO2017/149317 and WO2017/149318.
[0374] In some embodiments, the mutant monomer is chemically
modified. The mutant monomer can be chemically modified in any way
and at any site. The mutant monomer is preferably chemically
modified by attachment of a molecule to one or more cysteines
(cysteine linkage), attachment of a molecule to one or more
lysines, attachment of a molecule to one or more non-natural amino
acids, enzyme modification of an epitope or modification of a
terminus. Suitable methods for carrying out such modifications are
well-known in the art. The mutant monomer may be chemically
modified by the attachment of any molecule. For instance, the
mutant monomer may be chemically modified by attachment of a dye or
a fluorophore.
[0375] In some embodiments, the mutant monomer is chemically
modified with a molecular adaptor that facilitates the interaction
between a pore comprising the monomer and a target nucleotide or
target polynucleotide sequence. The presence of the adaptor
improves the host-guest chemistry of the pore and the nucleotide or
polynucleotide sequence and thereby improves the sequencing ability
of pores formed from the mutant monomer. The principles of
host-guest chemistry are well-known in the art. The adaptor has an
effect on the physical or chemical properties of the pore that
improves its interaction with the nucleotide or polynucleotide
sequence. The adaptor may alter the charge of the barrel or channel
of the pore or specifically interact with or bind to the nucleotide
or polynucleotide sequence thereby facilitating its interaction
with the pore.
[0376] The molecular adaptor is preferably a cyclic molecule, a
cyclodextrin, a species that is capable of hybridization, a DNA
binder or intercalator, a peptide or peptide analogue, a synthetic
polymer, an aromatic planar molecule, a small positively-charged
molecule or a small molecule capable of hydrogen-bonding.
[0377] The adaptor may be cyclic. A cyclic adaptor preferably has
the same symmetry as the pore. The adaptor preferably has
eight-fold or nine-fold symmetry since CsgG typically has eight or
nine subunits around a central axis. This is discussed in more
detail below.
[0378] The adaptor typically interacts with the nucleotide or
polynucleotide sequence via host-guest chemistry. The adaptor is
typically capable of interacting with the nucleotide or
polynucleotide sequence. The adaptor comprises one or more chemical
groups that are capable of interacting with the nucleotide or
polynucleotide sequence. The one or more chemical groups preferably
interact with the nucleotide or polynucleotide sequence by
non-covalent interactions, such as hydrophobic interactions,
hydrogen bonding, van der Waals forces, .pi.-cation interactions
and/or electrostatic forces. The one or more chemical groups that
are capable of interacting with the nucleotide or polynucleotide
sequence are preferably positively charged. The one or more
chemical groups that are capable of interacting with the nucleotide
or polynucleotide sequence more preferably comprise amino groups.
The amino groups can be attached to primary, secondary or tertiary
carbon atoms. The adaptor even more preferably comprises a ring of
amino groups, such as a ring of 6, 7 or 8 amino groups. The adaptor
most preferably comprises a ring of eight amino groups. A ring of
protonated amino groups may interact with negatively charged
phosphate groups in the nucleotide or polynucleotide sequence.
[0379] The correct positioning of the adaptor within the pore can
be facilitated by host-guest chemistry between the adaptor and the
pore comprising the mutant monomer. The adaptor preferably
comprises one or more chemical groups that are capable of
interacting with one or more amino acids in the pore. The adaptor
more preferably comprises one or more chemical groups that are
capable of interacting with one or more amino acids in the pore via
non-covalent interactions, such as hydrophobic interactions,
hydrogen bonding, van der Waals forces, .pi.-cation interactions
and/or electrostatic forces. The chemical groups that are capable
of interacting with one or more amino acids in the pore are
typically hydroxyls or amines. The hydroxyl groups can be attached
to primary, secondary or tertiary carbon atoms. The hydroxyl groups
may form hydrogen bonds with uncharged amino acids in the pore. Any
adaptor that facilitates the interaction between the pore and the
nucleotide or polynucleotide sequence can be used.
[0380] Suitable adaptors include, but are not limited to,
cyclodextrins, cyclic peptides and cucurbiturils. The adaptor is
preferably a cyclodextrin or a derivative thereof. The cyclodextrin
or derivative thereof may be any of those disclosed in Eliseev, A.
V., and Schneider, H-J. (1994) J. Am. Chem. Soc. 116, 6081-6088.
The adaptor is more preferably heptakis-6-amino-.beta.-cyclodextrin
(am.sub.7-.beta.CD), 6-monodeoxy-6-monoamino-.beta.-cyclodextrin
(am.sub.1-.beta.CD) or
heptakis-(6-deoxy-6-guanidino)-cyclodextrin (gu.sub.7-.beta.CD).
The guanidino group in gu.sub.7-.beta.CD has a much higher pKa than
the primary amines in am.sub.7-.beta.CD and so it is more
positively charged. This gu.sub.7-.beta.CD adaptor may be used to
increase the dwell time of the nucleotide in the pore, to increase
the accuracy of the residual current measured, as well as to
increase the base detection rate at high temperatures or low data
acquisition rates.
[0381] If a succinimidyl 3-(2-pyridyldithio)propionate (SPDP)
crosslinker is used as discussed in more detail below, the adaptor
is preferably
heptakis(6-deoxy-6-amino)-6-N-mono(2-pyridyl)dithiopropanoyl-.beta.-cyclo-
dextrin (am.sub.6amPDP.sub.1-.beta.CD).
[0382] Further suitable adaptors include .gamma.-cyclodextrins, which
comprise 8 sugar units (and therefore have eight-fold symmetry). The
.gamma.-cyclodextrin may contain a linker molecule or may be
modified to comprise all or more of the modified sugar units used
in the .beta.-cyclodextrin examples discussed above.
[0383] The molecular adaptor may be covalently attached to the
mutant monomer. The adaptor can be covalently attached to the pore
using any method known in the art. The adaptor is typically
attached via chemical linkage. If the molecular adaptor is attached
via cysteine linkage, the one or more cysteines have preferably
been introduced to the mutant, for instance in the barrel, by
substitution. The mutant monomer may be chemically modified by
attachment of a molecular adaptor to one or more cysteines in the
mutant monomer. The one or more cysteines may be
naturally-occurring, i.e. at positions 1 and/or 215 in SEQ ID NO:
3. Alternatively, the mutant monomer may be chemically modified by
attachment of a molecule to one or more cysteines introduced at
other positions. The cysteine at position 215 may be removed, for
instance by substitution, to ensure that the molecular adaptor does
not attach to that position rather than the cysteine at position 1
or a cysteine introduced at another position.
[0384] The reactivity of cysteine residues may be enhanced by
modification of the adjacent residues. For instance, the basic
groups of flanking arginine, histidine or lysine residues will
change the pKa of the cysteine thiol group to that of the more
reactive S.sup.- group. The reactivity of cysteine residues may be
protected by thiol protective groups such as DTNB. These may be
reacted with one or more cysteine residues of the mutant monomer
before a linker is attached.
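By way of illustration only, the effect of lowering the thiol pKa on the proportion of reactive thiolate (S.sup.-) can be estimated with the Henderson-Hasselbalch relationship. The following Python sketch is not part of the disclosed methods. The function name is hypothetical, and the pKa values used (8.5 for an unperturbed cysteine and 7.0 for a cysteine flanked by basic residues) are assumed, illustrative figures rather than values measured for CsgG.

```python
def thiolate_fraction(pka: float, ph: float) -> float:
    """Fraction of a thiol present as thiolate (S-) at a given pH.

    Henderson-Hasselbalch: [S-] / ([SH] + [S-]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Assumed, illustrative pKa values (not measured for CsgG).
unperturbed = thiolate_fraction(pka=8.5, ph=7.5)
shifted = thiolate_fraction(pka=7.0, ph=7.5)
print(f"unperturbed: {unperturbed:.2f}, shifted: {shifted:.2f}")
```

Under these assumed values, a lower pKa leaves a substantially larger share of the cysteine in the thiolate form at the same pH, which is the sense in which flanking basic residues may enhance reactivity.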
[0385] The molecule may be attached directly to the mutant monomer.
The molecule is preferably attached to the mutant monomer using a
linker, such as a chemical crosslinker or a peptide linker.
[0386] Suitable chemical crosslinkers are well-known in the art.
Preferred crosslinkers include 2,5-dioxopyrrolidin-1-yl
3-(pyridin-2-yldisulfanyl)propanoate, 2,5-dioxopyrrolidin-1-yl
4-(pyridin-2-yldisulfanyl)butanoate and 2,5-dioxopyrrolidin-1-yl
8-(pyridin-2-yldisulfanyl)octanoate. The most preferred
crosslinker is succinimidyl 3-(2-pyridyldithio)propionate (SPDP).
Typically, the molecule is covalently attached to the bifunctional
crosslinker before the molecule/crosslinker complex is covalently
attached to the mutant monomer but it is also possible to
covalently attach the bifunctional crosslinker to the monomer
before the bifunctional crosslinker/monomer complex is attached to
the molecule.
[0387] The linker is preferably resistant to dithiothreitol (DTT).
Suitable linkers include, but are not limited to,
iodoacetamide-based and maleimide-based linkers.
[0388] In another embodiment, the monomer may be attached to a
polynucleotide binding protein. This forms a modular sequencing
system that may be used in the methods of sequencing of the
invention. Polynucleotide binding proteins are discussed below.
[0389] The polynucleotide binding protein is preferably covalently
attached to the mutant monomer. The protein can be covalently
attached to the monomer using any method known in the art. The
monomer and protein may be chemically fused or genetically fused.
The monomer and protein are genetically fused if the whole
construct is expressed from a single polynucleotide sequence.
Genetic fusion of a monomer to a polynucleotide binding protein is
discussed in WO 2010/004265.
[0390] If the polynucleotide binding protein is attached via
cysteine linkage, the one or more cysteines have preferably been
introduced to the mutant by substitution. The one or more cysteines
are preferably introduced into loop regions which have low
conservation amongst homologues indicating that mutations or
insertions may be tolerated. They are therefore suitable for
attaching a polynucleotide binding protein. In such embodiments,
the naturally-occurring cysteine at position 215 may be removed.
The reactivity of cysteine residues may be enhanced by modification
as described above.
[0391] The polynucleotide binding protein may be attached directly
to the mutant monomer or via one or more linkers. The molecule may
be attached to the mutant monomer using the hybridization linkers
described in WO 2010/086602. Alternatively, peptide linkers may
be used. Peptide linkers are amino acid sequences. The length,
flexibility and hydrophilicity of the peptide linker are typically
designed such that it does not disturb the functions of the
monomer and molecule. Preferred flexible peptide linkers are
stretches of 2 to 20, such as 4, 6, 8, 10 or 16, serine and/or
glycine amino acids. More preferred flexible linkers include
(SG).sub.1, (SG).sub.2, (SG).sub.3, (SG).sub.4, (SG).sub.5 and
(SG).sub.8 wherein S is serine and G is glycine. Preferred rigid
linkers are stretches of 2 to 30, such as 4, 6, 8, 16 or 24,
proline amino acids. More preferred rigid linkers include
(P).sub.12 wherein P is proline.
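The repeat linkers named above can be written out as one-letter amino acid sequences. The following Python sketch is illustrative only. The helper name repeat_linker is hypothetical, and the sketch assumes that each linker is a simple tandem repeat of the stated unit.

```python
def repeat_linker(unit: str, repeats: int) -> str:
    """Return the one-letter amino acid sequence of a tandem-repeat linker."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


# Flexible serine/glycine linkers (SG)n for the repeat counts named in the text.
flexible = {f"(SG){n}": repeat_linker("SG", n) for n in (1, 2, 3, 4, 5, 8)}

# Rigid proline linker (P)12.
rigid = repeat_linker("P", 12)

for name, seq in flexible.items():
    print(name, seq, len(seq))
print("(P)12", rigid, len(rigid))
```

The resulting flexible linkers are 2 to 16 residues long, which falls within the stated range of 2 to 20 residues, and the proline linker is 12 residues long, within the stated range of 2 to 30.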
Chemical Modification
[0392] The mutant CsgG monomer or CsgF peptide may be chemically
modified with a molecular adaptor and a polynucleotide binding
protein.
[0393] The molecule (with which the monomer or peptide is
chemically modified) may be attached directly to the monomer or
peptide or attached via a linker as disclosed in WO 2010/004273, WO
2010/004265 or WO 2010/086603.
[0394] Any of the proteins described herein, such as the CsgG
monomers and/or CsgF peptides, may be modified to assist their
identification or purification, for example by the addition of
histidine residues (a his tag), aspartic acid residues (an asp
tag), a streptavidin tag, a flag tag, a SUMO tag, a GST tag or a
MBP tag, or by the addition of a signal sequence to promote their
secretion from a cell where the polypeptide does not naturally
contain such a sequence. An alternative to introducing a genetic
tag is to chemically react a tag onto a native or engineered
position on the protein. An example of this would be to react a
gel-shift reagent to a cysteine engineered on the outside of the
protein. This has been demonstrated as a method for separating
hemolysin hetero-oligomers (Chem Biol. 1997 July;
4(7):497-505).
[0395] Any of the proteins described herein, such as the CsgG
monomers and/or CsgF peptides, may be labelled with a revealing
label. The revealing label may be any suitable label which allows
the protein to be detected. Suitable labels include, but are not
limited to, fluorescent molecules, radioisotopes, e.g. .sup.125I,
.sup.35S, enzymes, antibodies, antigens, polynucleotides and
ligands such as biotin.
[0396] Any of the proteins described herein, such as the CsgG
monomers and/or CsgF peptides, may be made synthetically or by
recombinant means. For example, the protein may be synthesised by
in vitro translation and transcription (IVTT). The amino acid
sequence of the protein may be modified to include non-naturally
occurring amino acids or to increase the stability of the protein.
When a protein is produced by synthetic means, such amino acids may
be introduced during production. The protein may also be altered
following either synthetic or recombinant production.
[0397] Proteins may also be produced using D-amino acids. For
instance, the protein may comprise a mixture of L-amino acids and
D-amino acids. This is conventional in the art for producing such
proteins or peptides.
[0398] The protein may also contain other non-specific
modifications as long as they do not interfere with the function of
the protein. A number of non-specific side chain modifications are
known in the art and may be made to the side chains of the
protein(s). Such modifications include, for example, reductive
alkylation of amino acids by reaction with an aldehyde followed by
reduction with NaBH.sub.4, amidination with methylacetimidate or
acylation with acetic anhydride.
[0399] Any of the proteins described herein, such as the CsgG
monomers and/or CsgF peptides, can be produced using standard
methods known in the art. Polynucleotide sequences encoding a
protein may be derived and replicated using standard methods in the
art. Polynucleotide sequences encoding a protein may be expressed
in a bacterial host cell using standard techniques in the art. The
protein may be produced in a cell by in situ expression of the
polypeptide from a recombinant expression vector. The expression
vector optionally carries an inducible promoter to control the
expression of the polypeptide. These methods are described in
Sambrook, J. and Russell, D. (2001). Molecular Cloning: A
Laboratory Manual, 3rd Edition. Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, N.Y.
[0400] Proteins may be produced in large scale following
purification by any protein liquid chromatography system from
protein producing organisms or after recombinant expression.
Typical protein liquid chromatography systems include FPLC, AKTA
systems, the Bio-Cad system, the Bio-Rad BioLogic system and the
Gilson HPLC system.
Method of Producing Pores
[0401] In a third aspect, the invention provides methods for producing,
in vivo and in vitro, a CsgG:modified CsgF pore complex having two
or more constriction sites. One embodiment provides a method for
producing a transmembrane pore complex, comprising a CsgG pore, or
homologue or mutant form thereof, and the modified CsgF peptide, or
its homologue or mutant, via co-expression. Said method comprises
the steps of expressing CsgG monomers (expressed as preprotein
provided in SEQ ID NO: 2, or a homologue or mutant thereof), and
expressing modified or truncated CsgF monomers, both in a suitable
host cell, allowing in vivo complex pore formation. Said complex
comprises modified CsgF peptides, in complex with the CsgG pore, to
provide the pore with an additional reader head. The resulting pore
complex produced by said method using modified CsgF peptides
provides a structure that is sufficient for a use of the pore
complex in characterization of target analytes such as nucleic acid
sequencing, as it allows passage of the analytes, in particular
polynucleotide strands, and comprises two or more reader heads for
improved reading of said polynucleotide sequence, when used in the
appropriate settings for said application.
More particularly, the modified CsgF peptide expressed in said
method comprises the preprotein depicted in SEQ ID NO: 8, 10, 12,
or 14, or homologues thereof. These sequences limit the method
to those CsgF fragments capable of introducing a constriction site
in the pore complex and of binding to the CsgG protein pore, to
obtain a biological pore.
[0402] Another method for producing an isolated pore complex formed
by the CsgG and CsgF proteins or the like, relates to in vitro
reconstitution of said monomers to obtain a functional pore. Said
method comprises the steps of contacting the mature CsgG monomer as
depicted in SEQ ID NO:3 or a homologue or mutant thereof, with
modified CsgF peptides, or homologues or mutants thereof, in a
suitable system to allow complex formation. Said system may be an
"in vitro system", which refers to a system comprising at least the
necessary components and environment to execute said method, and
makes use of biological molecules, organisms, a cell (or part of a
cell) outside of their normal naturally-occurring environment,
permitting a more detailed, more convenient, or more efficient
analysis than can be done with whole organisms. An in vitro system
may also comprise a suitable buffer composition provided in a test
tube, wherein said protein components to form the complex have been
added. A person skilled in the art is aware of the options to
provide said system. Said modified CsgF peptides or the like
applied in said method for in vitro reconstitution in particular
embodiments are peptides comprising SEQ ID NO:15 or SEQ ID NO:16,
or mutants or homologues thereof, which can be synthesized or
recombinantly produced. Alternatively, modified CsgF peptides
comprising SEQ ID NO: 15, 37, 38, 39, 40, 54 or 55, or homologues or
mutants thereof, are provided in said method for contacting with
CsgG or CsgG-like pores to produce the pore complex.
[0403] The CsgG/CsgF pore can be made by any suitable method.
Examples of such suitable methods are described.
[0404] In one embodiment, the CsgG/CsgF pore may be produced by
co-expression. In this embodiment, at least one gene encoding a
CsgG monomer polypeptide (which may be a mutant polypeptide) in one
vector and a gene encoding at least one full length or truncated
CsgF polypeptide (which may be a mutant polypeptide) in a second
vector may be transformed together to express the proteins and make
the complex within transformed cells. This could be in vivo or in
vitro. Alternatively, the two genes encoding the polypeptides of
CsgG and CsgF can be placed in one vector under the control of a
single promoter or under the control of two separate promoters,
which may be the same or different.
[0405] In another embodiment, the CsgG/CsgF pore is produced by
expressing the CsgG monomer(s) separately from the CsgF peptide.
CsgG monomers or a CsgG pore may be purified from the cells
transformed with a vector encoding at least one CsgG monomer, or
with more than one vector each expressing a CsgG monomer. The CsgF
peptide may be purified from the cells transformed with a vector
encoding at least one CsgF peptide. The purified CsgG
monomer(s)/pore may then be incubated together with the CsgF
peptide(s) to make the pore complex.
[0406] In another embodiment, the CsgG monomer(s) and/or the CsgF
peptide are produced separately by in vitro translation and
transcription (IVTT). The CsgG monomer(s) may then be incubated
together with the CsgF peptide(s) to make the pore complex. Use of
this method is illustrated in FIG. 14.
[0407] The above embodiments may be combined, such that for
example, (i) CsgG is produced in vivo and CsgF in vivo; (ii) CsgG
is produced in vitro and CsgF in vivo; (iii) CsgG is produced in
vivo and CsgF in vitro; or (iv) CsgG is produced in vitro and CsgF
in vitro.
[0408] One or both of the CsgG monomer and the CsgF peptide may be
tagged to facilitate purification. Purification can also be
performed when the CsgG monomer and/or CsgF peptide are untagged.
Methods known in the art (e.g. ion exchange, gel filtration,
hydrophobic interaction column chromatography etc.) can be used
alone or in different combinations to purify the components of the
pore.
[0409] Any known tag can be used on either of the two proteins. In
one embodiment, two-tag purification can be used to separate the
CsgG:CsgF complex from the CsgG pore alone and from free CsgF. For
example, a Strep tag can be used on CsgG and a His tag on CsgF, or
vice versa.
This is exemplified in FIG. 13. A similar end result can be
obtained when the two proteins are purified individually and mixed
together followed by another round of Strep and His
purification.
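The logic of two-tag purification can be sketched as two successive selections. The following Python sketch is an idealised illustration only. It assumes a Strep tag on CsgG and a His tag on CsgF, complete capture at each affinity step and no carry-over of untagged material. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    name: str
    tags: frozenset


def affinity_step(sample, tag):
    """Keep only the species carrying the tag captured by this affinity step."""
    return [s for s in sample if tag in s.tags]


# Idealised mixture after complex formation (assumed tag placement).
sample = [
    Species("CsgG:CsgF complex", frozenset({"Strep", "His"})),
    Species("CsgG pore alone", frozenset({"Strep"})),
    Species("free CsgF", frozenset({"His"})),
]

after_strep = affinity_step(sample, "Strep")   # removes free CsgF
after_both = affinity_step(after_strep, "His")  # removes CsgG pore alone
print([s.name for s in after_both])
```

In this idealised model only the species carrying both tags survives both steps, and the order of the two steps does not affect the outcome.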
[0410] When the full length CsgF protein forms the complex with
CsgG, the neck and head domains of CsgF (shown in the red box in
FIG. 4B) protrude out from the beta barrel of the CsgG pore.
[0411] Therefore, if a pore comprising a CsgG pore and the full
length CsgF were used in single channel recording experiments, the
head domain may hinder or prevent pore insertion into a membrane.
It may also block the channel of the pore and so obstruct the
passage of analytes. Thus, when inserting a pore into a
membrane, it is better if the number of flexible polypeptides
dangling down from the beta-barrel is reduced. Truncated versions
of the CsgF protein that mimic the FCP region resolved in the cryo
EM structure of the complex and that maintain structural integrity
are provided herein.
[0412] The CsgG/CsgF pore can be made prior to insertion into a
membrane or after insertion of the CsgG pore into a membrane. When
the pore complex is made prior to insertion into a membrane, a
truncation mutant is preferably used. However, the CsgG pore may be
inserted into a membrane and the CsgF peptide may be added
afterwards so that the CsgG and CsgF complex can form in situ. For
example, in one embodiment using a system in which the trans side of
the membrane is accessible (for example a chip or chamber for
electrophysiology measurements), the CsgG pore may be inserted into
the membrane, and then a CsgF peptide may be added from the trans
side of the membrane, so that the complex can be formed in situ. In
any embodiment in which the CsgG/CsgF pore is formed in situ, a
larger CsgF peptide may be used. For example, the CsgF peptide may
comprise all or part of the neck domain of CsgF (from about residue
36 of SEQ ID NO: 6). In some embodiments the CsgF may comprise all
of the neck domain and part of the head domain (residues 36 to XX
of SEQ ID NO: 6).
[0413] Depending on the methods of making the complex and the
stability of the complex with a particular truncation, CsgG:CsgF
and CsgG:FCP complexes can be made by different methods.
[0414] In one embodiment, the truncated version of the CsgF
polypeptide at the required length is used directly.
[0415] Another embodiment uses the full length polypeptide of CsgF
or a longer than required truncation (enough to keep the complex
stable) in which a protease cleavage site is inserted (e.g. a TEV,
HRV 3C or any other protease cleavage site) so that cleavage by the
protease produces a CsgF peptide of the required length. In this
embodiment, once the CsgG/CsgF complex is formed, the protease is
used to cleave the CsgF at the required site. Alternatively, the
protease may be used to produce the CsgF peptide prior to complex
assembly.
[0416] Some protease sites will leave an additional tag behind
after cleavage. For example, the TEV protease cleavage sequence is
ENLYFQS. TEV protease cleaves the protein between Q and S leaving
ENLYFQ intact at the C-terminus of the CsgF peptide. FIG. 15 shows
an example using a modified CsgF comprising a TEV cleavage site
which is cleaved using TEV protease after complex formation. By way
of another example, the HRV 3C cleavage site is LEVLFQGP and the
enzyme cleaves between Q and G leaving LEVLFQ intact at the
C-terminus of the CsgF peptide.
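The residues left behind after cleavage can be worked out directly from the recognition sequence and the position of the cleaved bond. The following Python sketch is illustrative only. It uses the two recognition sequences and cleavage positions stated above. The function name and the flanking residues in the example construct are hypothetical placeholders and do not represent a real CsgF sequence.

```python
from typing import Tuple

# Recognition sequence and number of site residues retained N-terminal to the
# cleaved bond, as stated in the text.
CLEAVAGE_SITES = {
    "TEV": ("ENLYFQS", 6),      # cleaves between Q and S
    "HRV 3C": ("LEVLFQGP", 6),  # cleaves between Q and G
}


def cleave(sequence: str, protease: str) -> Tuple[str, str]:
    """Split a sequence at the first site for the given protease.

    Returns (N-terminal fragment, C-terminal fragment).
    """
    site, offset = CLEAVAGE_SITES[protease]
    start = sequence.find(site)
    if start == -1:
        raise ValueError(f"no {protease} site found")
    cut = start + offset
    return sequence[:cut], sequence[cut:]


# Hypothetical placeholder construct: peptide + TEV site + remainder.
construct = "AAAA" + "ENLYFQS" + "GGGG"
retained, released = cleave(construct, "TEV")
print(retained, released)
```

In this sketch the N-terminal fragment ends in ENLYFQ for TEV or LEVLFQ for HRV 3C, corresponding to the additional residues that remain at the C-terminus of the CsgF peptide.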
Methods of Characterising an Analyte
[0417] In a further aspect, the invention provides a method of
determining the presence, absence or one or more characteristics of
a target analyte. The method involves contacting the target analyte
with an isolated pore complex, or transmembrane pore, such as a
pore of the invention, such that the target analyte moves with
respect to, such as into or through, the pore channel and taking
one or more measurements as the analyte moves with respect to the
pore and thereby determining the presence, absence or one or more
characteristics of the analyte. The target analyte may also be
called the template analyte or the analyte of interest. The
isolated pore complex typically comprises at least 7, at least 8,
at least 9 or at least 10 monomers, such as 7, 8, 9 or 10 CsgG
monomers. The isolated pore complex preferably comprises eight or
nine identical CsgG monomers. One or more, such as 2, 3, 4, 5, 6,
7, 8, 9 or 10, of the CsgG monomers is preferably chemically
modified, or the CsgF peptide is chemically modified. The isolated
pore complex monomers, such as the CsgG monomers, or homologues or
mutants thereof, and the modified CsgF monomers, or homologues or
mutants thereof, may be derived from any organism. The analyte may
pass through the CsgG constriction, followed by the CsgF
constriction. In an alternative embodiment the analyte may pass
through the CsgF constriction, followed by the CsgG constriction,
depending on the orientation of the CsgG/CsgF complex in the
membrane.
[0418] The method is for determining the presence, absence or one
or more characteristics of a target analyte. The method may be for
determining the presence, absence or one or more characteristics of
at least one analyte. The method may concern determining the
presence, absence or one or more characteristics of two or more
analytes. The method may comprise determining the presence, absence
or one or more characteristics of any number of analytes, such as
2, 5, 10, 15, 20, 30, 40, 50, 100 or more analytes. Any number of
characteristics of the one or more analytes may be determined, such
as 1, 2, 3, 4, 5, 10 or more characteristics.
[0419] The binding of a molecule in the channel of the pore
complex, or in the vicinity of either opening of the channel will
have an effect on the open-channel ion flow through the pore, which
is the essence of "molecular sensing" of pore channels. In a
similar manner to the nucleic acid sequencing application,
variation in the open-channel ion flow can be measured using
suitable measurement techniques by the change in electrical current
(for example, WO 2000/28312 and D. Stoddart et al., Proc. Natl.
Acad. Sci., 2009, 106, 7702-7 or WO 2009/077734). The degree of
reduction in ion flow, as measured by the reduction in electrical
current, is related to the size of the obstruction within, or in
the vicinity of, the pore. Binding of a molecule of interest, also
referred to as an "analyte", in or near the pore therefore provides
a detectable and measurable event, thereby forming the basis of a
"biological sensor". Suitable molecules for nanopore sensing
include nucleic acids; proteins; peptides; polysaccharides and
small molecules (refers here to a low molecular weight (e.g.,
<900 Da or <500 Da) organic or inorganic compound) such as
pharmaceuticals, toxins, cytokines, and pollutants. Detecting the
presence of biological molecules finds application in personalised
drug development, medicine, diagnostics, life science research,
environmental monitoring and in the security and/or the defence
industry.
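The relationship between the open-channel current and a binding event can be sketched numerically. The following Python sketch is illustrative only and is not the measurement method of any cited reference. The current trace is synthetic, the units (pA) and the 20% detection threshold are assumptions, and the function names are hypothetical.

```python
from statistics import mean


def fractional_blockade(open_current_pa: float, current_pa: float) -> float:
    """Fractional reduction in current relative to the open-channel level."""
    if open_current_pa <= 0:
        raise ValueError("open-channel current must be positive")
    return (open_current_pa - current_pa) / open_current_pa


def find_events(trace_pa, open_current_pa, threshold=0.2):
    """Return (start, end) index pairs, end exclusive, for runs of samples
    whose fractional blockade exceeds the threshold."""
    events, start = [], None
    for i, current in enumerate(trace_pa):
        blocked = fractional_blockade(open_current_pa, current) > threshold
        if blocked and start is None:
            start = i
        elif not blocked and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(trace_pa)))
    return events


# Synthetic trace (assumed values in pA): two blockades of different depth.
trace = [100, 99, 101, 60, 58, 61, 100, 102, 30, 31, 99]
events = find_events(trace, open_current_pa=100.0)
depths = [
    mean([fractional_blockade(100.0, c) for c in trace[s:e]])
    for s, e in events
]
print(events, depths)
```

In this synthetic trace the second event reduces the current more than the first, which in the terms used above would correspond to a larger obstruction within, or in the vicinity of, the pore.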
[0420] In another aspect, the isolated pore complex, or the
transmembrane pore complex containing a wild type or modified E.
coli CsgG nanopore, or homologue or mutant thereof, and a modified
CsgF peptide providing a channel constriction to the pore within
the complex, may serve as a molecular or biological sensor. In some
embodiments, the CsgG nanopore can be derived or isolated from
bacterial proteins (e.g., E. coli, Salmonella typhi). In some
embodiments, the CsgG nanopore can be recombinantly produced.
Procedures for analyte detection are described in Howorka et al.
Nature Biotechnology (2012) Jun. 7; 30(6):506-7. The analyte
molecule that is to be detected may bind to either face of the
channel, or within the lumen of the channel itself. The position of
binding may be determined by the size of the molecule to be
sensed.
[0421] The target analyte is preferably a metal ion, an inorganic
salt, a polymer, an amino acid, a peptide, a polypeptide, a
protein, a nucleotide, an oligonucleotide, a polynucleotide, a
polysaccharide, a dye, a bleach, a pharmaceutical, a diagnostic
agent, a recreational drug, an explosive, a toxic compound, or an
environmental pollutant. The method may concern determining the
presence, absence or one or more characteristics of two or more
analytes of the same type, such as two or more proteins, two or
more nucleotides or two or more pharmaceuticals. Alternatively, the
method may concern determining the presence, absence or one or more
characteristics of two or more analytes of different types, such as
one or more proteins, one or more nucleotides and one or more
pharmaceuticals.
[0422] The target analyte can be secreted from cells.
Alternatively, the target analyte can be an analyte that is present
inside cells such that the analyte must be extracted from the cells
before the method can be carried out.
[0423] A wild-type pore may act as sensor, but is often modified
via recombinant or chemical methods to increase the strength of
binding, the position of binding, or the specificity of binding of
the molecule to be sensed. Typical modifications include addition
of a specific binding moiety complementary to the structure of the
molecule to be sensed. Where the analyte molecule comprises a
nucleic acid, this binding moiety may comprise a cyclodextrin or an
oligonucleotide; for small molecules this may be a known
complementary binding region, for example the antigen binding
portion of an antibody or of a non-antibody molecule, including a
single chain variable fragment (scFv) region or an antigen
recognition domain from a T-cell receptor (TCR); or for proteins,
it may be a known ligand of the target protein. In this way the
wild type or modified E. coli CsgG nanopore, or homologue thereof,
may be rendered capable of acting as a molecular sensor for
detecting presence in a sample of suitable antigens (including
epitopes) that may include cell surface antigens, including
receptors, markers of solid tumours or haematologic cancer cells
(e.g. lymphoma or leukaemia), viral antigens, bacterial antigens,
protozoal antigens, allergens, allergy related molecules, albumin
(e.g. human, rodent, or bovine), fluorescent molecules (including
fluorescein), blood group antigens, small molecules, drugs,
enzymes, catalytic sites of enzymes or enzyme substrates, and
transition state analogues of enzyme substrates. As described
above, modifications may be achieved using known genetic
engineering and recombinant DNA techniques. The positioning of any
adaptation would be dependent on the nature of the molecule to be
sensed, for example, the size, three-dimensional structure, and its
biochemical nature. The choice of adapted structure may make use of
computational structural design. Determination and optimization of
protein-protein interactions or protein-small molecule interactions
can be investigated using technologies such as a BIAcore.RTM. which
detects molecular interactions using surface plasmon resonance
(BIAcore, Inc., Piscataway, N.J.; see also www.biacore.com).
[0424] In one embodiment, the analyte is an amino acid, a peptide,
a polypeptide or a protein. The amino acid, peptide, polypeptide or
protein can be naturally-occurring or non-naturally-occurring. The
polypeptide or protein can include synthetic or
modified amino acids. Several different types of modification to
amino acids are known in the art. Suitable amino acids and
modifications thereof are described above. It is to be understood that the
target analyte can be modified by any method available in the
art.
[0425] In another embodiment, the analyte is a polynucleotide, such
as a nucleic acid, which is defined as a macromolecule comprising
two or more nucleotides. Nucleic acids are particularly suitable
for nanopore sequencing. The naturally-occurring nucleic acid bases
in DNA and RNA may be distinguished by their physical size. As a
nucleic acid molecule, or individual base, passes through the
channel of a nanopore, the size differential between the bases
causes a directly correlated reduction in the ion flow through the
channel. The variation in ion flow may be recorded. Suitable
electrical measurement techniques for recording ion flow variations
are described in, for example, WO 2000/28312 and D. Stoddart et
al., Proc. Natl. Acad. Sci., 2010, 106, pp 7702-7 (single channel
recording equipment); and, for example, in WO 2009/077734
(multi-channel recording techniques). Through suitable calibration,
the characteristic reduction in ion flow can be used to identify
the particular nucleotide and associated base traversing the
channel in real-time. In typical nanopore nucleic acid sequencing,
the open-channel ion flow is reduced as the individual nucleotides
of the nucleic sequence of interest sequentially pass through the
channel of the nanopore due to the partial blockage of the channel
by the nucleotide. It is this reduction in ion flow that is
measured using the suitable recording techniques described above.
The reduction in ion flow may be calibrated to the reduction in
measured ion flow for known nucleotides through the channel
resulting in a means for determining which nucleotide is passing
through the channel, and therefore, when done sequentially, a way
of determining the nucleotide sequence of the nucleic acid passing
through the nanopore. For the accurate determination of individual
nucleotides, it has typically been required that the reduction in ion
flow through the channel be directly correlated to the size of
the individual nucleotide passing through the constriction (or
"reading head"). It will be appreciated that sequencing may be
performed upon an intact nucleic acid polymer that is `threaded`
through the pore via the action of an associated polymerase, for
example. Alternatively, sequences may be determined by passage of
nucleotide triphosphate bases that have been sequentially removed
from a target nucleic acid in proximity to the pore (see for
example WO 2014/187924).
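The calibration step described above can be illustrated with a
minimal sketch. It assumes, purely for illustration, that each
nucleotide gives one characteristic reduction in ion flow and that an
observed reduction is assigned to the nearest calibrated level. The
current values and the function names are hypothetical and are not
taken from the application.

```python
from statistics import mean


def calibrate(known_reads):
    """Map each known nucleotide to its mean measured current reduction."""
    return {base: mean(values) for base, values in known_reads.items()}


def call_base(reduction, levels):
    """Return the nucleotide whose calibrated level is closest to the reading."""
    return min(levels, key=lambda base: abs(levels[base] - reduction))


def call_sequence(reductions, levels):
    """Call one nucleotide per sequential current reduction."""
    return "".join(call_base(r, levels) for r in reductions)


# Hypothetical calibration measurements (pA) for known nucleotides.
known = {
    "A": [52.0, 54.0],
    "C": [41.0, 43.0],
    "G": [60.0, 62.0],
    "T": [33.0, 35.0],
}
levels = calibrate(known)

# Hypothetical sequential reductions recorded for an unknown strand.
print(call_sequence([34.5, 60.2, 42.8, 52.9], levels))
```

In practice several bases contribute to the signal at once, so a
nearest-level lookup of this kind is only a simplified picture of the
calibration principle.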
[0426] The polynucleotide or nucleic acid may comprise any
combination of any nucleotides. The nucleotides can be naturally
occurring or artificial. One or more nucleotides in the
polynucleotide can be oxidized or methylated. One or more
nucleotides in the polynucleotide may be damaged. For instance, the
polynucleotide may comprise a pyrimidine dimer. Such dimers are
typically associated with damage by ultraviolet light and are the
primary cause of skin melanomas. One or more nucleotides in the
polynucleotide may be modified, for instance with a label or a tag,
for which suitable examples are known by a skilled person. The
polynucleotide may comprise one or more spacers. A nucleotide
typically contains a nucleobase, a sugar and at least one phosphate
group. The nucleobase and sugar form a nucleoside. The nucleobase
is typically heterocyclic. Nucleobases include, but are not limited
to, purines and pyrimidines and more specifically adenine (A),
guanine (G), thymine (T), uracil (U) and cytosine (C). The sugar is
typically a pentose sugar. Nucleotide sugars include, but are not
limited to, ribose and deoxyribose. The sugar is preferably a
deoxyribose. The polynucleotide preferably comprises the following
nucleosides: deoxyadenosine (dA), deoxyuridine (dU) and/or
thymidine (dT), deoxyguanosine (dG) and deoxycytidine (dC). The
nucleotide is typically a ribonucleotide or deoxyribonucleotide.
The nucleotide typically contains a monophosphate, diphosphate or
triphosphate. The nucleotide may comprise more than three
phosphates, such as 4 or 5 phosphates. Phosphates may be attached
on the 5' or 3' side of a nucleotide. The nucleotides in the
polynucleotide may be attached to each other in any manner. The
nucleotides are typically attached by their sugar and phosphate
groups as in nucleic acids. The nucleotides may be connected via
their nucleobases as in pyrimidine dimers. The polynucleotide may
be single stranded or double stranded. At least a portion of the
polynucleotide is preferably double stranded. The polynucleotide is
most preferably ribonucleic acid (RNA) or deoxyribonucleic
acid (DNA). In particular, said method using a polynucleotide as an
analyte alternatively comprises determining one or more
characteristics selected from (i) the length of the polynucleotide,
(ii) the identity of the polynucleotide, (iii) the sequence of the
polynucleotide, (iv) the secondary structure of the polynucleotide
and (v) whether or not the polynucleotide is modified.
[0427] The polynucleotide can be any length (i). For example, the
polynucleotide can be at least 10, at least 50, at least 100, at
least 150, at least 200, at least 250, at least 300, at least 400
or at least 500 nucleotides or nucleotide pairs in length. The
polynucleotide can be 1000 or more nucleotides or nucleotide pairs,
5000 or more nucleotides or nucleotide pairs in length or 100000 or
more nucleotides or nucleotide pairs in length. Any number of
polynucleotides can be investigated. For instance, the method may
concern characterising 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100
or more polynucleotides. If two or more polynucleotides are
characterised, they may be different polynucleotides or two
instances of the same polynucleotide. The polynucleotide can be
naturally occurring or artificial. For instance, the method may be
used to verify the sequence of a manufactured oligonucleotide. The
method is typically carried out in vitro.
[0428] Nucleotides can have any identity (ii), and include, but are
not limited to, adenosine monophosphate (AMP), guanosine
monophosphate (GMP), thymidine monophosphate (TMP), uridine
monophosphate (UMP), 5-methylcytidine monophosphate,
5-hydroxymethylcytidine monophosphate, cytidine monophosphate
(CMP), cyclic adenosine monophosphate (cAMP), cyclic guanosine
monophosphate (cGMP), deoxyadenosine monophosphate (dAMP),
deoxyguanosine monophosphate (dGMP), deoxythymidine monophosphate
(dTMP), deoxyuridine monophosphate (dUMP), deoxycytidine
monophosphate (dCMP) and deoxymethylcytidine monophosphate. The
nucleotides are preferably selected from AMP, TMP, GMP, CMP, UMP,
dAMP, dTMP, dGMP, dCMP and dUMP. A nucleotide may be abasic (i.e.
lack a nucleobase). A nucleotide may also lack a nucleobase and a
sugar (i.e. is a C3 spacer). The sequence of the nucleotides (iii)
is determined by the consecutive identity of the successive nucleotides
attached to each other throughout the polynucleotide strand, in the
5' to 3' direction of the strand.
[0429] The pores comprising a CsgG pore and CsgF peptides are
particularly useful in analysing homopolymers. For example, the
pores may be used to determine the sequence of a polynucleotide
comprising two or more, such as at least 3, 4, 5, 6, 7, 8, 9 or 10,
consecutive nucleotides that are identical. For example, the pores
may be used to sequence a polynucleotide comprising a polyA, polyT,
polyG and/or polyC region.
[0430] The CsgG pore constriction is made of the residues at the
51, 55 and 56 positions of SEQ ID NO: 3. The reader head of CsgG
and its constriction mutants are generally sharp. When DNA is
passing through the constriction, interactions of approximately 5
bases of DNA with the reader head of the pore at any given time
dominate the current signal. Although these sharper reader heads
are very good in reading mixed sequence regions of DNA (when A, T,
G and C are mixed), the signal becomes flat and lacks information
when there is a homopolymeric region within the DNA (e.g. polyT,
polyG, polyA, polyC). Because 5 bases dominate the signal of
CsgG and its constriction mutants, it is difficult to discriminate
homopolymers longer than 5 bases without using additional dwell time
information. However, if DNA is passing through a second reader
head, more DNA bases will interact with the combined reader heads,
increasing the length of the homopolymers that can be
discriminated. The Examples and Figures show that such an increase
in homopolymer sequencing accuracy is achieved using the pore
comprising a CsgG pore and a CsgF peptide.
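The reasoning above can be made concrete with an idealised sketch.
It assumes that the current level depends only on the bases inside a
reader-head window, that consecutive identical windows give no
visible change in signal, and that dwell time is ignored. The window
of 5 bases follows the text; the window of 9 bases for a combined
reader head, the flanking sequences and all function names are
hypothetical choices made for illustration only.

```python
def kmer_levels(sequence, window):
    """List the successive base windows occupying a reader head of given width."""
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


def collapse_repeats(kmers):
    """Merge consecutive identical windows, which give no change in level."""
    collapsed = []
    for kmer in kmers:
        if not collapsed or collapsed[-1] != kmer:
            collapsed.append(kmer)
    return collapsed


def signal_signature(sequence, window):
    """Idealised sequence of distinct levels seen while the strand passes."""
    return collapse_repeats(kmer_levels(sequence, window))


def with_poly_t(run_length, flank5="ACGCAGTC", flank3="GACTGCGA"):
    """Build a test strand with a polyT run between two hypothetical flanks."""
    return flank5 + "T" * run_length + flank3


def distinguishable(len_a, len_b, window):
    """True if two polyT run lengths give different idealised signatures."""
    return (signal_signature(with_poly_t(len_a), window)
            != signal_signature(with_poly_t(len_b), window))


print(distinguishable(6, 7, window=5))  # single 5-base reader head
print(distinguishable(6, 7, window=9))  # hypothetical wider combined window
```

Under these assumptions, runs at least as long as the window collapse
to the same signature, so widening the effective window extends the
range of homopolymer lengths that can be told apart.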
Kit
[0431] In a further aspect, the present invention also provides a
kit for characterising a target polynucleotide. The kit comprises
an isolated pore complex according to the invention, and the
components of a membrane or an insulating layer. The membrane is
preferably formed from the components. The isolated pore complex is
preferably present in the membrane or in the insulating layer,
together forming a transmembrane pore complex channel. The kit may
comprise components of any type of membranes, such as an
amphiphilic layer or a triblock copolymer membrane. The kit may
further comprise a polynucleotide binding protein. The kit may
further comprise one or more anchors for coupling the
polynucleotide to the membrane. The kit may additionally comprise
one or more other reagents or instruments which enable any of the
embodiments mentioned above to be carried out. Such reagents or
instruments include one or more of the following: suitable
buffer(s) (aqueous solutions), means to obtain a sample from a
subject (such as a vessel or an instrument comprising a needle),
means to amplify and/or express polynucleotides or voltage or patch
clamp apparatus. Reagents may be present in the kit in a dry state
such that a fluid sample resuspends the reagents. The kit may also,
optionally, comprise instructions to enable the kit to be used in
the method of the invention or details regarding for which organism
the method may be used. Finally, the kit may also comprise
additional components useful in peptide characterization.
[0432] In one embodiment, the isolated pore complex or
transmembrane pore complex, as provided by the invention, is used
for nucleic acid sequencing. For said use, the Phi29 DNA polymerase
(DNAP) may be used as a molecular motor with a CsgG:CsgF Nanopore
complex located within a membrane to allow controlled movement of
an oligomeric probe DNA strand through the pore. A voltage may be
applied across the pore and a current generated from the movement
of ions in a salt solution on either side of the nanopore. As the
probe DNA moves through the pore, the ionic flow through the pore
changes with respect to the DNA. This information has been shown to
be sequence dependent and allows for the sequence of the probe to
be read with accuracy from current measurements.
[0433] It is to be understood that although particular embodiments,
specific configurations as well as materials and/or molecules, have
been discussed herein for engineered cells and methods according to
the present invention, various changes or modifications in form and
detail may be made without departing from the scope and spirit of
this invention. The following examples are provided to better
illustrate particular embodiments, and they should not be
considered as limiting the application. The application is limited
only by the claims.
Examples
Introduction
[0434] The CsgG pore is part of the multi-component type VIII
secretion system, also known as the curli biosynthesis system,
which is responsible for the formation of aggregative fibres known
in E. coli as curli. Curli are extracellular proteinaceous fibres
primarily involved in bacterial biofilm formation and attachment to
non-biotic surfaces. Curli biosynthesis is directed by two operons,
in E. coli called csgBAC and csgDEFG (curli-specific genes) (Hammar
et al., 1995). Secretion of curli subunits CsgA and CsgB depends on
CsgG, a dedicated lipoprotein found to form an oligomeric secretion
channel in the outer membrane. For transport, CsgG works in
coordination with the periplasmic and extracellular accessory
proteins CsgE and CsgF. CsgE forms a specificity factor for
CsgG-mediated transport, whilst CsgF appears to couple CsgA
secretion to the CsgB-templated aggregation into extracellular
fibers.
[0435] A crystal structure of the CsgG secretion channel
demonstrated CsgG to form a nonameric transport complex 120 .ANG.
in diameter and 85 .ANG. in height that traverses the OM through a
36-stranded .beta.-barrel of 40 .ANG. inner diameter (Goyal et al.,
2014; FIG. 1). A periplasmic domain of the channel, separated from
the transmembrane .beta.-barrel by an iris-like diaphragm, forms a large
solvent-accessible cavity of 24,000 .ANG..sup.3. This diaphragm is
formed by a conserved 12-residue `constriction loop` (CL) found in
each subunit, and whose concentric organization in the CsgG
oligomer forms an orifice with solvent-excluded diameter of
.about.0.6 nm and a height of .about.1.5 nm (FIG. 1). When acting
as a protein secretion channel, this orifice or constriction in the
CsgG channel forms the primary site of interaction with the
translocating polypeptide. When CsgG is used as a nanopore sensing
platform, this orifice serves the role of primary reader head for
analytes residing in or passing the channel. The diameter of the
orifice as well as its physical and chemical properties can be
modified by amino acid substitutions, deletions or insertions in a
region of the protein corresponding to residues 46 to 61 of SEQ ID
NO:3 (FIG. 1D). In particular, mutation of positions 51, 55, and 56
according to SEQ ID NO:3, together or individually, has a
beneficial influence on the conductance characteristics of the
nanopore and its interaction with analytes including
polynucleotides.
[0436] The assembly factor CsgF represents a component of the curli
secretion apparatus. The CsgF preprotein reaches the periplasm via
the SEC pathway, after which mature CsgF (12.9 kDa) is found as a
surface-exposed protein in a CsgG-dependent manner. In the presence of
CsgG, CsgF fractionates with the OM, and co-immunoprecipitation
experiments suggested the two proteins are in direct contact.
Available data demonstrate that CsgF is non-essential for
productive subunit secretion, but rather suggest the protein forms
a coupling factor between CsgA secretion and extracellular
polymerization into curli fibers by coordinating or chaperoning the
nucleating function of the CsgB subunit.
Example 1: CsgG:CsgF Complex Protein Production (Co-Expression, In
Vitro Reconstitution, Coupled In-Vitro Transcription and
Translation and Reconstitution of CsgG with CsgF Synthetic
Peptides)
[0437] To produce the CsgG:CsgF complex, both proteins can be
co-expressed in a suitable Gram-negative host such as E. coli, and
extracted and purified as a complex from the outer membrane. The in
vivo formation of the CsgG pore and the CsgG:CsgF complex requires
targeting of the proteins to the outer membrane. To do so, CsgG is
expressed as a prepro-protein with a lipoprotein signal peptide
(Juncker et al. 2003, Protein Sci. 12(8): 1652-62) and Cys residue
at the N-terminal position of the mature protein (SEQ ID No:3). An
example of such lipoprotein signal peptide is residues 1-15 of full
length E. coli CsgG as shown in SEQ ID No:2. Processing of prepro
CsgG results in cleavage of the signal peptide and lipidation of
mature CsgG, followed by transfer of the mature lipoprotein to the
outer membrane, where it inserts as an oligomeric pore (Goyal et
al. 2014, Nature 516(7530):250-3). To form the CsgG:CsgF complex,
CsgF can be co-expressed with CsgG and targeted to the periplasm by
means of a leader sequence such as the native signal peptide
corresponding to residues 1-19 of SEQ ID No:5. CsgG:CsgF
combination pores can then be extracted from the outer membrane
using detergents, and purified to a homogeneous complex by
chromatography (FIG. 2).
[0438] Alternatively, the CsgG:CsgF pore complex can be produced by
in vitro reconstitution using the CsgG pore and CsgF--see below and
FIG. 3.
[0439] For in vivo CsgG:CsgF complex formation in the example shown
in FIG. 2, E. coli CsgF (SEQ ID NO:5) and CsgG (SEQ ID NO:2) were
co-expressed using their native signal peptides to ensure
periplasmic targeting of both proteins, as well as N-terminal
lipidation of CsgG. Additionally, for ease of purification, CsgF
was modified by introduction of a C-terminal 6.times. histidine
tag and CsgG was fused C-terminally to a Strep-II tag.
Co-expression and complex purification was performed as described
in the Methods. SDS-PAGE analysis of the His affinity purification
eluate revealed the enrichment of CsgF-His, as well as the
co-purification of CsgG-Strep, suggesting the latter was in a
complex with CsgF (FIG. 2B). Additionally, the SDS-PAGE revealed
that a significant fraction of the eluted CsgF ran at lower
molecular mass due to the loss of a N-terminal fragment of the
protein (FIG. 2B, indicated with asterisk). SDS-PAGE analysis of
the pooled fractions of the His-trap elution of the second affinity
purification revealed the presence of CsgG and CsgF in apparently
equimolar concentrations, as well as the loss of the CsgF
truncation fragment seen in the His-trap eluate (FIG. 2B).
Co-elution of CsgF in the Strep-affinity purification indicated
that the protein is present as a non-covalent complex with CsgG.
Strikingly, the N-terminal truncation fragment of CsgF was lost in
the Strep-affinity purification, suggesting that the CsgF
N-terminus is required to bind CsgG (FIG. 2B).
[0440] FIG. 13 shows another example of CsgG:CsgF complex formation
by in vivo co-expression. In this example, CsgG protein is modified
with a C terminal Strep-II tag and the CsgF full length protein is
modified with a C terminal 10.times. Histidine tag. Co-expressed
CsgG:CsgF complex has been purified away from its constituent
components by Strep tag purification followed by a Histidine tag
purification as shown in the materials and methods section for
characterisation of analytes. Due to the differences in molecular
weights, CsgG:CsgF complex can be clearly differentiated from the
CsgG pore in SDS-PAGE analysis (FIG. 13A). As shown in FIG. 13.B,
the two-tag purification method can be successfully applied to purify
the CsgG:CsgF complex away from its constituent components.
[0441] To produce the CsgG:CsgF complex by in vitro reconstitution,
CsgG and CsgF were expressed in separate E. coli cultures
transformed with pPG1 and pNA101, respectively, and purified,
followed by in vitro reconstitution of the CsgG:CsgF complex (see
Methods). For comparison, purified CsgG was similarly run over the
Superose 6 column as the complex. The CsgG Superose 6 run showed
the existence of two discrete populations, corresponding to
nonameric CsgG pores (FIGS. 3A (a) and 3C) as well as dimers of
nonameric CsgG pores (FIGS. 3A (b) and 3C), as previously described
in Goyal et al. (2014). The Superose 6 run of the CsgG:CsgF
reconstitution revealed the existence of three discrete populations
corresponding to excess CsgF (FIG. 3A (c)), nonameric CsgG:CsgF
complex (FIG. 3A (d)) and dimers of nonameric CsgG:CsgF (FIG. 3A
(e)). To provide independent confirmation of the formation of
CsgG:CsgF complexes, the various Superose 6 elution peaks were
analysed on native PAGE (FIG. 3B).
[0442] Surprisingly, the CsgG:CsgF complex can also be made by a coupled
in vitro transcription and translation (IVTT) method as described
in the materials and methods section for characterisation of
analytes. The complex can be made either by expressing CsgG and
CsgF proteins in the same IVTT reaction or reconstituting
separately made CsgG and CsgF in two different IVTT reactions. In
the example shown in FIG. 14, E. coli T7-S30 extract system for
circular DNA (Promega) has been used to make the CsgG:CsgF complex
in one reaction mixture and proteins were analysed on SDS-PAGE.
Since the protein expression in IVTT does not use the natural
molecular machinery of protein expression, the DNA constructs used to
express proteins in IVTT lack the DNA encoding the signal peptide
region. When the DNA of CsgG is expressed in IVTT in the absence of
DNA of CsgF, only the monomers of CsgG can be produced.
Surprisingly, these expressed monomers can be assembled into CsgG
oligomeric pores in situ by using cell extract membranes present in
the IVTT reaction mixture (FIG. 14, lane 1). Although the
oligomer of CsgG is SDS stable, it breaks down into its constituent
monomers when the sample is heated to 100.degree. C. (FIG. 14,
lane 2). When the DNA of CsgF is expressed in IVTT in the absence
of DNA of CsgG, only CsgF monomers can be seen (FIG. 14, lane
3). When DNA of CsgG and CsgF are mixed in a 1:1 ratio and expressed
simultaneously in the same IVTT reaction mixture, the CsgF proteins
generated interact with the assembled CsgG pore with high
efficiency to make the CsgG:CsgF complex (FIG. 14, lane 5). This SDS
stable complex made in IVTT is heat stable at least up to
70.degree. C. (FIG. 14, lanes 6-12).
[0443] CsgG:CsgF complexes with truncated CsgF can also be made by
any of the methods shown above by using DNA encoding truncated CsgF
instead of the full length version. However, stability of the
complex may be compromised when CsgF is truncated below the FCP
domain. In addition, CsgG:CsgF complexes with truncated CsgF can be
made by cleaving the full length CsgF in appropriate positions once
the full length CsgG:CsgF complex is formed. Truncations can be
done by modifying the DNA that encode CsgF protein by incorporating
protease cleavage sites at positions where cleavage is needed (FIG.
15.A). Seq ID No. 56-67 show TEV or HCV C3 protease sites
incorporated in various positions of CsgF to generate CsgG:CsgF
complexes with truncated CsgF. SDS-PAGE analysis of TEV cleavage of
the CsgG:CsgF complex made with the Seq ID No. 61 is shown (FIG.
15.B). When the CsgG:CsgF complex (with full length CsgF) is
treated with TEV protease enzyme as described in the materials and
methods section for characterisation of analytes, CsgF is
truncated at position 35 (FIG. 15.B, lanes 3 and 4). However, TEV
cleavage leaves an extra 6 amino acids at the C terminal of the
cleavage site. Therefore, remaining CsgF truncated protein in
complex with the CsgG pore is 42 amino acids long. Molecular weight
difference of this complex and the CsgG pore (without the CsgF) is
still visible in SDS-PAGE (FIG. 15.B, lanes 7 and 8).
[0444] Surprisingly, CsgG:CsgF complexes with truncated CsgF can
also be made by reconstituting purified CsgG pore (made in vivo
or in vitro) with synthetic peptides of appropriate length. Since
the reconstitution takes place in vitro, the signal peptide of CsgF is
not required to make the CsgG:CsgF complex. Further, this method
does not leave extra amino acids at the C terminus of the CsgF.
Mutations and modifications can also be easily incorporated into
synthetic CsgF peptides. Therefore, this method is a very
convenient way to reconstitute different CsgG pores or mutants or
homologues thereof with different CsgF peptides or mutants or
homologues thereof to generate different CsgG:CsgF complex
variants. Stability of the complex may be compromised when the CsgF
is truncated beyond the FCP domain. Examples of truncated CsgF and
FCP peptides used to generate CsgG:CsgF complex variants are shown
in Table 3. Surprisingly, SDS-PAGE analysis of the heat stability
of CsgG:CsgF complexes made by this method with CsgF-(1-45) (FIG.
16.A), CsgF-(1-35) (FIG. 16.B) and CsgF-(1-30) (FIG. 16.C) shows at
least CsgF-(1-45) and CsgF-(1-35) peptides make complexes with CsgG
that are heat stable at least to 90.degree. C. Since the CsgG pore
breaks down to its constituent monomers at 90.degree. C., it is
difficult to assess the stability of the complex beyond 90.degree.
C. Due to the minimal difference between the CsgG pore band and the
CsgG:CsgF-(1-30) complex band in SDS-PAGE, this method is not
sufficient to analyse the heat stability of the CsgG:CsgF-(1-30)
complex (FIG. 16.C). However, CsgG:CsgF complexes have been
observed in all three cases and even with CsgG:CsgF-(1-29) in
electrophysiological experiments indicating that even CsgF-(1-29)
peptide is producing at least some CsgG:CsgF complexes (FIG.
24).
Example 2: CsgG:CsgF Structural Analysis Via Cryo-EM
[0445] To gain structural insight into the CsgG:CsgF complex,
co-purified or in vitro reconstituted CsgG:CsgF particles were
analysed by transmission electron microscopy. In preparation of
cryo-EM analysis, 500 .mu.L of the peak fraction of the
double-affinity purified CsgG:CsgF complex was injected onto a
Superose 6 10/30 column (GE Healthcare) equilibrated with Buffer D
(25 mM Tris pH8, 200 mM NaCl and 0.03% DDM), and run at 0.5 mL/min.
Protein concentration was determined based on calculated absorbance
at 280 nm and assuming 1:1 stoichiometry. Samples for electron
cryomicroscopy were analysed as described in the Methods. A cryo-EM
micrograph of the CsgG:CsgF complex as well as two selected class
averages from the picked CsgG:CsgF particles are shown in FIG. 4.
The micrograph shows the presence of nonameric pores as well as
dimers of nonameric pore complexes. For image reconstruction,
nonameric CsgG:CsgF particles were picked and aligned using RELION.
Class averages of the CsgG:CsgF complex as side views, as well as
the 3D reconstructed electron density show the presence of an
additional density corresponding to CsgF, seen as a protrusion from
the CsgG particle, located at the side of the CsgG .beta.-barrel
(FIG. 4B, 5). The additional density reveals three distinct
regions, encompassing a globular head domain, a hollow neck domain
and a domain that interacts with the CsgG .beta.-barrel. The latter
CsgF region, referred to as CsgF constriction peptide or FCP,
inserts into the lumen of the CsgG .beta.-barrel and can be seen to
form an additional constriction (labeled F in FIG. 4B, 5) of the
CsgG pore, located approximately 2 nm above the constriction formed
by the CsgG constriction loop (labeled G in FIG. 4B, 5).
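The concentration estimate mentioned above, based on absorbance at
280 nm and an assumed 1:1 CsgG:CsgF stoichiometry, can be sketched
with the Beer-Lambert law. This is only an illustrative sketch: the
extinction coefficients below are placeholder numbers, not the
calculated values for CsgG or CsgF, and the function names are
hypothetical.

```python
def complex_epsilon(eps_csgg, eps_csgf, n_csgg=1, n_csgf=1):
    """Extinction coefficient (M^-1 cm^-1) of one CsgG:CsgF unit.

    Assumes the coefficients of the two chains are additive and that the
    chains are present in the stated ratio (1:1 by default).
    """
    return n_csgg * eps_csgg + n_csgf * eps_csgf


def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    return a280 / (epsilon * path_cm)


# Placeholder extinction coefficients, for illustration only.
eps_unit = complex_epsilon(eps_csgg=30000.0, eps_csgf=10000.0)
print(molar_concentration(a280=0.4, epsilon=eps_unit))
```

The result is the concentration of 1:1 CsgG:CsgF units; dividing by
nine would give the concentration of nonameric pore complexes.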
Example 3: Identification of the CsgF Interaction and Constriction
Peptide by Truncation of CsgF
[0446] The presence of a second constriction in the CsgG:CsgF pore
complex as compared to the CsgG only pore provides opportunities
for nanopore sensing applications, providing a second orifice in
the nanopore that can be used as a second reader head or as an
extension of the primary reader head provided by the CsgG
constriction loop (FIG. 6, 7). However, when in complex with the
full length CsgF, the exit side of CsgG:CsgF combination pore is
blocked by the CsgF neck and head domains. Therefore, we sought to
determine the CsgF region required to interact with and insert into
the CsgG .beta.-barrel. Our Strep-tactin affinity purification
experiments hinted that the N-terminal region of CsgF was required
for CsgG interaction, since an N-terminal truncation fragment of
CsgF present in the His-trap affinity purification was lost and did
not co-purify with CsgG (FIG. 2B). CsgF homologues are
characterised by the presence of PFAM domain PF03783. When
performing a multiple sequence alignment (MSA) of CsgF homologues
found in Gram-negative bacteria (FIG. 8 shows an MSA of selected
CsgF homologues), a region of sequence conservation (between 35 and
100% pairwise sequence identity) was seen corresponding to the
first .about.30-35 amino acids of mature CsgF (SEQ ID NO:6). Based
on the combined data, this N-terminal region of CsgF was
hypothesised to form the CsgG interaction peptide or FCP. A
multiple sequence alignment of the FCPs in known CsgF homologues is
shown in FIG. 10.
[0447] To test the hypothesis that the CsgF N-terminus corresponds
to the CsgG binding region and forms the CsgF constriction peptide
residing in the CsgG .beta.-barrel lumen, Strep-tagged CsgG and
His-tagged CsgF truncates were co-overexpressed in E. coli (see
Methods). pNA97, pNA98, pNA99 and pNA100 encode N-terminal CsgF
fragments corresponding to residues 1-27, 1-38, 1-48 and 1-64 of
CsgF (SEQ ID NO:5). These peptides include the CsgF signal peptide
corresponding to residues 1-19 of SEQ ID NO: 5, and thus will
produce periplasmic peptides corresponding to the first 8, 19, 29
and 45 residues of mature CsgF (SEQ ID NO:6; FIG. 9A), each
including a C-terminal 6.times. His tag. SDS-PAGE analysis of whole
cell lysates revealed the presence of CsgG in all samples, as well
as the presence of CsgF fragment corresponding to the first 45
residues of mature CsgF (SEQ ID NO:6; FIG. 9B). For the shorter
N-terminal CsgF fragments, no detectable expression of the peptides
was seen in the whole cell lysates. After two freeze/thaw cycles,
cell mass of the various CsgG:CsgF fragments were further enriched
by purification. Whole cell lysates as well as the eluted fractions
of the Strep affinity purification were spotted onto a
nitrocellulose membrane for dot blot analysis using an anti-His
antibody for the detection of the His-tagged CsgF fragments (FIG.
9C). The dot blot shows the CsgF 20:64 peptide co-purifies with
CsgG, demonstrating this CsgF fragment is sufficient to form a
stable non-covalent complex with CsgG. For the CsgF 20:48 fragment
a small amount of peptide can be seen to co-purify with CsgG,
whilst no detectable levels are seen for CsgF 20:27 or CsgF 20:38
in either the whole cell lysate or the Strep affinity purification
(FIG. 9C), suggesting that the latter peptides are not stably
expressed in E. coli, and/or do not form a stable complex with
CsgG.
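The residue numbering used in this Example can be checked with a
short sketch. It uses the 19-residue signal peptide and the
preprotein end points given above for pNA97 to pNA100; the function
names are hypothetical.

```python
SIGNAL_PEPTIDE_LENGTH = 19  # residues 1-19 of SEQ ID NO:5, as stated in the text


def mature_length(preprotein_end, signal_len=SIGNAL_PEPTIDE_LENGTH):
    """Number of mature CsgF residues left after signal peptide removal."""
    if preprotein_end <= signal_len:
        raise ValueError("fragment does not extend past the signal peptide")
    return preprotein_end - signal_len


def preprotein_range(mature_len, signal_len=SIGNAL_PEPTIDE_LENGTH):
    """Preprotein residue range (start, end) covering the first mature residues."""
    return signal_len + 1, signal_len + mature_len


# End residue of each N-terminal CsgF fragment in SEQ ID NO:5 numbering.
constructs = {"pNA97": 27, "pNA98": 38, "pNA99": 48, "pNA100": 64}

for plasmid, end in constructs.items():
    n = mature_length(end)
    start, stop = preprotein_range(n)
    print(f"{plasmid}: CsgF {start}:{stop} = first {n} residues of mature CsgF")
```

This reproduces the 8, 19, 29 and 45 residue mature fragments and the
20:27, 20:38, 20:48 and 20:64 notation used for the dot blot.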
Example 4: Description of the CsgG:CsgF Interaction at Atomic
Resolution
[0448] To gain atomic-level detail on the CsgG:CsgF interaction,
we determined the high resolution cryoEM structure of the CsgG:CsgF
complex. For this purpose, CsgG and CsgF were co-expressed
recombinantly in E. coli and the CsgG:CsgF complex was isolated
from E. coli outer membranes by detergent extraction and purified
using tandem affinity purification. Samples for electron
cryo-microscopy were prepared by spotting 3 .mu.l sample on R2/1
Holey grids (Quantifoil), coated with graphene oxide, and data was
collected on a 300 kV TITAN Krios with Gatan K2 direct electron
detector in counting mode. 62,000 single CsgG:CsgF particles were
used to calculate a final electron density map at 3.4 .ANG.
resolution (FIG. 11A). The map allowed unambiguous docking and
local rebuilding of the CsgG crystal structure, as well as the de
novo building of the N-terminal 35 residues of mature CsgF (i.e.
residues 20:54 of Seq ID No. 5), which encompass the FCP that binds
CsgG and forms a second constriction at the height of the CsgG
transmembrane .beta.-barrel (FIG. 11C, D). The cryoEM structure
shows CsgG:CsgF comprises a 9:9 stoichiometry, with C9 symmetry
(FIG. 11B). The FCP binds the inside of the CsgG .beta.-barrel,
with the C-terminus of the CsgF pointing out of the CsgG
.beta.-barrel, and the CsgF N-terminus located near the CsgG
constriction. The structure shows that P35 in mature CsgF lies
outside the CsgG .beta.-barrel and forms the connection between the
CsgF FCP and neck regions. The CsgF neck and head regions are not
resolved in the high resolution cryoEM maps due to flexibility
relative to the main body of the CsgG:CsgF complex. Three regions
in the CsgG .beta.-barrel stabilize the CsgG:CsgF interaction:
(IR1) residues Y130, D155, S183, N209 and T207 in mature CsgG (SEQ
ID NO: 3) form an interaction network with the N-terminal amine and
residues 1 to 4 of mature CsgF (SEQ ID NO: 6), comprising four
H-bonds and an electrostatic interaction; (IR2) residues Q187, D149
and E203 in mature CsgG (SEQ ID NO: 3) form an interaction network
with R8 and N9 in mature CsgF (SEQ ID NO: 6), encompassing three
H-bonds and two electrostatic interactions; and (IR3) residues F144,
F191, F193 and L199 in mature CsgG (SEQ ID NO: 3) form a
hydrophobic interaction surface with residues F21, L22 and A26 in
mature CsgF (SEQ ID NO: 6). The latter are located in an
.alpha.-helix (helix 1) formed by residues 19-30 of mature CsgF.
The conserved sequence N-P-X-F-G-G (residues 9-14 in SEQ ID NO: 6)
forms an inward turn that connects the loop region formed by
residues 15-19 with the CsgF helix 1. Together, these elements give
rise to a constriction in the CsgG:CsgF complex, of which residue
17 (N17 in mature E. coli CsgF, SEQ ID NO: 6) forms the narrowest
point, resulting in an orifice with 15 .ANG. diameter (FIG. 11C).
The second constriction (F-constriction or FC) lies approximately
15 .ANG. and 30 .ANG. above the top and bottom, respectively, of
the constriction formed by CsgG residues 46 to 59 (G-constriction
or GC).
Example 5: Simulations to Improve CsgG-CsgF Complex Stability
[0449] Molecular dynamics simulations were performed to establish
which residues in CsgG and CsgF come into close proximity. This
information was used to design CsgG and CsgF mutants that could
increase the stability of the complex.
[0450] Simulations were performed using the GROMACS package version
4.6.5, with the GROMOS 53a6 forcefield and the SPC water model. The
cryo-EM structure of the CsgG-CsgF complex was used in the
simulations. The complex was solvated and then energy minimised
using the steepest descents algorithm. Throughout the simulation,
restraints were applied to the backbone of the complex; however,
the residue sidechains were free to move. The system was simulated
in the NPT ensemble for 20 ns at 300 K, using the Berendsen
thermostat and Berendsen barostat.
[0451] Contacts between CsgG and CsgF were analysed using both
GROMACS analysis software and also locally written code. Two
residues were defined as having made a contact if they came within
3 Angstroms. The results are shown in Table 4 below.
TABLE-US-00004
TABLE 4: Predicted contact frequencies of residue pairs in the
CsgG/CsgF complex

CsgG residue   CsgF residue   % Time spent in contact
GLU 203        ARG 8          88.8
GLU 201        ASN 11         87.4
GLU 201        PHE 12         84.3
GLU 203        ASN 9          83.6
ASP 155        GLY 1          81.2
GLU 203        PHE 7          81
GLU 201        ASN 9          77.2
SER 183        GLY 1          76.1
ASN 209        MET 3          70.8
THR 207        PHE 5          70.1
ASP 149        ARG 8          68.5
GLN 187        ARG 8          66.1
ARG 142        PHE 12         65.4
GLU 185        ARG 8          64.4
ASP 149        PHE 12         64.2
GLN 187        GLN 6          63.3
GLY 205        PHE 5          54
GLN 197        ASN 30         52.5
GLN 197        SER 31         51.4
LYS 49         THR 2          50.8
PHE 144        GLN 29         50.6
GLU 201        PHE 21         48
GLN 151        PHE 5          47
PHE 191        ASN 9          46.9
ARG 142        ASN 11         46.4
GLN 151        PHE 7          45.6
TYR 196        TYR 32         45.4
PHE 191        PHE 21         45.3
PHE 193        ALA 26         45.1
GLU 201        SER 25         44.9
LEU 199        GLN 29         44.7
ARG 141        PHE 12         43.1
GLY 138        PHE 7          43
GLN 187        PHE 5          43
GLY 145        GLN 29         42.4
GLN 153        GLY 1          42.1
GLY 140        PHE 7          40.5
PHE 193        TYR 32         39.9
GLU 203        PHE 12         39.7
ASN 133        PHE 5          35.9
GLN 151        MET 3          32
PHE 193        ASN 30         31.9
SER 136        PHE 5          31.7
PHE 144        SER 31         30.3
TYR 130        GLY 1          30
GLN 187        PHE 7          29.9
PHE 192        ASN 30         28.9
GLY 138        PHE 5          28.3
ILE 194        TYR 32         26.7
ASN 209        GLY 1          26.1
PHE 192        GLN 29         25.8
PHE 193        GLN 29         25.4
PHE 193        GLN 27         23.9
ASP 149        GLY 13         22.7
TYR 196        ASN 30         22.6
PHE 192        SER 31         22.2
ASP 148        PHE 12         22
GLY 140        PHE 12         21.7
TYR 196        ASP 34         21.6
ARG 198        SER 31         19.9
VAL 139        PHE 7          19.5
PHE 191        ALA 26         18.3
ASN 132        GLY 1          18.1
TYR 195        TYR 32         17.9
GLN 197        ALA 28         17.6
GLN 151        ARG 8          16.9
PHE 191        LEU 22         16.5
PHE 191        GLN 29         15.6
THR 206        PHE 5          14.7
GLN 153        MET 3          14.3
PHE 192        TYR 32         13.8
GLU 201        GLN 29         13.3
ARG 142        SER 25         13.3
PHE 144        ASN 30         12.6
ARG 142        ARG 8          12.6
PHE 191        ASN 11         12.3
GLU 131        THR 2          12.2
ASN 133        GLY 1          11.3
GLY 205        PHE 7          11.2
GLN 151        PHE 12         10.4
ASN 132        PHE 5          10.3
GLU 202        PHE 12         10.2
ASP 149        PHE 7          10.2
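The contact criterion used for Table 4 (two residues count as being in contact when they come within 3 Angstroms) can be illustrated with a minimal sketch. This is not the locally written analysis code referred to above; the function name, the use of a per-frame minimum distance as input, the inclusive comparison at exactly 3 Angstroms, and the example distances are all assumptions made for illustration.

```python
CONTACT_CUTOFF_ANGSTROM = 3.0  # cutoff stated in the text


def percent_time_in_contact(min_distances, cutoff=CONTACT_CUTOFF_ANGSTROM):
    """Return the percentage of frames in which a residue pair is in contact.

    `min_distances` is assumed to hold, for each trajectory frame, the
    minimum atom-atom distance (in Angstrom) between the two residues.
    """
    frames = list(min_distances)
    if not frames:
        raise ValueError("no frames supplied")
    frames_in_contact = sum(1 for d in frames if d <= cutoff)
    return 100.0 * frames_in_contact / len(frames)


# Hypothetical per-frame distances for one residue pair (not real data).
example_distances = [2.8, 2.9, 3.4, 2.7, 5.1, 2.95, 3.0, 4.2]
print(percent_time_in_contact(example_distances))
```

Applied to every CsgG/CsgF residue pair and sorted in descending order, a calculation of this kind would yield a ranking of the same form as Table 4.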
Materials and Methods for Structural Determination of the CsgG:CsgF
Complex:
Cloning
[0452] For the expression of E. coli CsgG as outer membrane
localized pore, the coding sequence of E. coli CsgG (SEQ ID NO:1)
was cloned into pASK-Iba12, resulting in plasmid pPG1 (Goyal et al.
2013).
[0453] For the expression of C-terminally 6.times.-His tagged CsgF
in the E. coli cytoplasm, the coding sequence for mature E. coli
CsgF (SEQ ID NO:6; i.e. CsgF without its signal sequence) was
cloned into pET22b via the NdeI and EcoRI sites, using a PCR
product generated using the primers "CsgF-His_pET22b_FW" (SEQ ID
NO:46) and "CsgF-His_pET22b_Rev" (SEQ ID NO:47), resulting in the
CsgF-His expression plasmid pNA101.
[0454] The pNA62 plasmid, a pTrc99a based vector expressing
csgF-His and csgG-strep, was created based on pGV5403 (pTrc99a with
the pDEST14 Gateway.RTM. cassette integrated). The pGV5403
ampicillin resistance cassette was replaced by a
streptomycin/spectinomycin resistance cassette. A PCR fragment
encompassing part of the E. coli MC4100 csgDEFG operon
corresponding to the coding sequences of csgE, csgF and csgG was
generated with primers csgEFG_pDONR221_FW (SEQ ID NO:48) and
csgEFG_pDONR221_Rev (SEQ ID NO:49), and inserted in pDONR221
(ThermoFisher Scientific) via BP Gateway.RTM. recombination. Next,
this recombinant csgEFG operon from the pDONR221 donor plasmid was
inserted via LR Gateway.RTM. recombination into pGV5403 with
streptomycin/spectinomycin resistance cassette. Via PCR, a
6.times.His-tag was added to the CsgF C-terminus using primers
Mut_csgF_His_FW (SEQ ID NO:50) and Mut_csgF_His_Rev (SEQ ID NO:51).
Finally, csgE was removed by outwards PCR (primers DelCsgE_FW (SEQ
ID NO:52) and DelCsgE_Rev (SEQ ID NO:53)) to obtain pNA62.
[0455] Constructs for the periplasmic expression of C-terminally
His-tagged CsgF fragments corresponding to the putative
constriction peptides (FIG. 9A) were created by outwards PCR on
pNA62, a pTrc99a based vector expressing CsgF-his and CsgG-strep.
Primer combinations were as follows: pNa62_CsgF_histag_Fw (SEQ ID
NO:45) as forward primer, with CsgF_d27_end (SEQ ID NO:41),
CsgF_d38_end (SEQ ID NO:42), CsgF_d48_end (SEQ ID NO:43) or
CsgF_d64_end (SEQ ID NO:44) as reverse primers to create pNA97,
pNA98, pNA99 and pNA100 respectively.
[0456] In pNA97 csgF is truncated to SEQ ID NO:7, encoding a CsgF
fragment including residues 1-27 (SEQ ID NO:8); in pNA98 csgF is
truncated to SEQ ID NO:9, encoding a CsgF fragment including
residues 1-38 (SEQ ID NO:10); in pNA99 csgF is truncated to SEQ ID
NO:11, encoding a CsgF fragment including residues 1-48 (SEQ ID
NO:12); and in pNA100 csgF is truncated to SEQ ID NO:13, encoding a
CsgF fragment including residues 1-64 (SEQ ID NO:14). Expression of
pNA97, pNA98, pNA99 and pNA100 in E. coli results in production
of the CsgG pore (SEQ ID NO:3) in the outer membrane, as well as
periplasmic targeting of CsgF-derived peptides with sequences:
TABLE-US-00005
(SEQ ID NO: 37 + 6.times. His) "GTMTFQFRHHHHHH",
(SEQ ID NO: 38 + 6.times. His) "GTMTFQFRNPNFGGNPNNGHHHHHH",
(SEQ ID NO: 39 + 6.times. His) "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQHHHHHH", and
(SEQ ID NO: 40 + 6.times. His)
"GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETHHHHHH",
respectively.
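The relationship between the fragment names (CsgF 20:27, 20:38, 20:48 and 20:64, numbered on the pre-protein) and the peptide sequences listed above can be checked with a short sketch. It assumes a 19-residue signal sequence, which is inferred from the statement in Example 4 that residues 20:54 of the pre-protein correspond to the N-terminal 35 residues of mature CsgF. The constant and function names are illustrative only, and the 45-residue sequence is written without internal whitespace.

```python
HIS_TAG = "H" * 6

# Assumption: pre-protein residue 20 is residue 1 of mature CsgF,
# i.e. the signal sequence is 19 residues long.
SIGNAL_PEPTIDE_LENGTH = 19

# First 45 residues of mature CsgF, taken from the longest peptide
# listed in the text (His-tag removed, whitespace removed).
MATURE_CSGF_1_45 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIET"


def tagged_fragment(last_preprotein_residue):
    """Return the His-tagged peptide for a CsgF 20:N fragment."""
    mature_length = last_preprotein_residue - SIGNAL_PEPTIDE_LENGTH
    if not 1 <= mature_length <= len(MATURE_CSGF_1_45):
        raise ValueError("fragment end outside the available sequence")
    return MATURE_CSGF_1_45[:mature_length] + HIS_TAG


for end in (27, 38, 48, 64):
    print(f"CsgF 20:{end} -> {tagged_fragment(end)}")
```

Under this numbering the four constructs carry 8, 19, 29 and 45 residues of mature CsgF, respectively, ahead of the His-tag.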
Strains
[0457] E. coli Top10 (F.sup.- mcrA .DELTA.(mrr-hsdRMS-mcrBC)
.PHI.80lacZ.DELTA.M15 .DELTA.lacX74 recA1 araD139 .DELTA.(ara-leu)7697
galU galK rpsL (StrR) endA1 nupG) was used for all cloning
procedures. E. coli C43(DE3) (F.sup.- ompT hsdSB (rB.sup.-
mB.sup.-) gal dcm (DE3)) and Top10 were used for protein
production.
Recombinant CsgG:CsgF Complex Production Via Co-Expression
[0458] For co-expression of E. coli CsgF (SEQ ID NO:5) and CsgG
(SEQ ID NO:2), both recombinant genes including their native
Shine-Dalgarno sequences were placed under control of the inducible trc
promoter in a pTrc99a-derived plasmid to form plasmid pNA62. CsgG
and CsgF were overexpressed in E. coli C43(DE3) cells transformed
with plasmid pNA62 and grown at 37.degree. C. in Terrific Broth
medium. When the cell culture reached an optical density (OD) at
600 nm of 0.7, recombinant protein expression was induced with 0.5
mM IPTG and left to grow for 15 hours at 28.degree. C., before
being harvested by centrifugation at 5500 g.
Recombinant CsgG:CsgF Complex Production Via In Vitro
Reconstitution
[0459] Full-length E. coli CsgG (SEQ ID NO:2) modified with a
C-terminal StrepII-tag was overexpressed in E. coli BL21 (DE3)
cells transformed with plasmid pPG1 (Goyal et al. 2013). The cells
were grown at 37.degree. C. to an OD 600 nm of 0.6 in Terrific
Broth medium. Recombinant protein production was induced with
0.0002% anhydrotetracycline (Sigma) and the cells were grown at
25.degree. C. for a further 16 h before being harvested by
centrifugation at 5500 g.
[0460] E. coli CsgF (SEQ ID NO:6; i.e. lacking the CsgF signal
sequence) in a C-terminal fusion with a 6.times. His-tag was
overexpressed in the cytoplasm of E. coli BL21(DE3) cells
transformed with plasmid pNA101. Cells were grown at 37.degree. C.
to an OD of 600 nm followed by induction with 1 mM IPTG and left to
express protein for 15 h at 37.degree. C. before being harvested by
centrifugation at 5500 g.
Recombinant Protein Purification of the CsgG:CsgF Complex, CsgG,
and CsgF
[0461] E. coli cells transformed with pNA62 and co-expressing
CsgG-Strep and CsgF-His were resuspended in 50 mM Tris-HCl pH 8.0,
200 mM NaCl, 1 mM EDTA, 5 mM MgCl.sub.2, 0.4 mM AEBSF, 1 .mu.g/mL
Leupeptin, 0.5 mg/mL DNase I and 0.1 mg/mL lysozyme. The cells were
disrupted at 20 kPsi using a TS series cell disruptor (Constant
Systems Ltd) and the lysed cell suspension was incubated for 30 min with 1%
n-dodecyl-.beta.-d-maltopyranoside (DDM; Inalco) for further cell
lysis and extraction of outer membrane components. Next, remaining
cell debris and membranes were spun down by ultracentrifugation at
100,000 g for 40 min. The supernatant was loaded onto a 5 mL HisTrap
column equilibrated in buffer A (25 mM Tris pH8, 200 mM NaCl, 10 mM
imidazole, 10% sucrose and 0.06% DDM). The column was washed with
more than 10 CVs of 5% buffer B (25 mM Tris pH8, 200 mM NaCl, 500 mM
imidazole, 10% sucrose and 0.06% DDM) in buffer A and eluted with
a gradient of 5-100% buffer B over 60 mL.
[0462] The eluate was diluted 2-fold before loading overnight on a 5 mL
Strep-tactin column (IBA GmbH) equilibrated with buffer C (25 mM
Tris pH8, 200 mM NaCl, 10% sucrose and 0.06% DDM). The column was
washed with more than 10 CVs of buffer C and protein was eluted by the
addition of 2.5 mM desthiobiotin. Next, 500 .mu.L of the peak
fraction of the double-affinity purified complex was injected on a
Superose 6 10/30 (GE Healthcare) equilibrated with Buffer D (25 mM
Tris pH8, 200 mM NaCl and 0.03% DDM), and run at 0.5 mL/min to
prepare samples for electron microscopy. Protein concentration was
determined based on calculated absorbance at 280 nm and assuming
1/1 stoichiometry.
[0463] CsgG-strep purification for in vitro reconstitution is
identical to the protocol for CsgG:CsgF when omitting sucrose in
the buffers and bypassing the IMAC and size exclusion steps.
[0464] CsgF-His purification for in vitro reconstitution was
performed by resuspension of the cell mass in 50 mM Tris-HCl pH
8.0, 200 mM NaCl, 1 mM EDTA, 5 mM MgCl2, 0.4 mM AEBSF, 1 .mu.g/mL
Leupeptin, 0.5 mg/mL DNase I and 0.1 mg/mL lysozyme. The cells were
disrupted at 20 kPsi using a TS series cell disruptor (Constant
Systems Ltd) and the lysed cell suspension was centrifuged at
10,000 g for 30 min to remove intact cells and cell debris.
The supernatant was added to 5 mL Ni-IMAC beads (Workbeads 40 IDA,
Bio-Works Technologies AB) equilibrated with buffer A (25 mM Tris
pH8, 200 mM NaCl, 10 mM imidazole) and left incubating for 1 hour
at 4.degree. C. The Ni-IMAC beads were pooled in a gravity flow column
and washed with 100 mL of 5% buffer B (25 mM Tris pH8, 200 mM NaCl,
500 mM imidazole) diluted in buffer A. Bound protein was eluted by
stepwise increase of buffer B (10% steps of 5 mL each).
In Vitro Reconstitution of the CsgG:CsgF Complex
[0465] Purified CsgG and CsgF were pooled and used to reconstitute
the complex in vitro. To this end, CsgG and CsgF were mixed at a molar
ratio of 1:2 (CsgG:CsgF) to saturate the CsgG barrel with CsgF. Next, the
reconstituted mixture was injected on a Superose 6 10/30 column (GE
Healthcare) equilibrated with Buffer D (25 mM Tris pH8, 200 mM NaCl
and 0.03% DDM), and run at 0.5 mL/min to prepare samples for
electron microscopy (FIG. 3). Protein concentration was determined
based on calculated absorbance at 280 nm and assuming 1/1
stoichiometry.
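The volume of CsgF needed to reach a 1:2 molar ratio of CsgG to CsgF can be estimated as in the sketch below. The monomer masses of roughly 30 kDa for CsgG and 15 kDa for CsgF are the approximate values quoted in Example 6; the stock concentrations and volumes are hypothetical, and the function names are illustrative rather than part of the described method.

```python
def micromolar(conc_mg_per_ml, mw_kda):
    """Convert a mass concentration (mg/mL) to micromolar for a given mass (kDa)."""
    return conc_mg_per_ml / mw_kda * 1000.0


def partner_volume_ul(vol_a_ul, conc_a_mg_per_ml, mw_a_kda,
                      conc_b_mg_per_ml, mw_b_kda, b_per_a):
    """Volume of component B (uL) giving `b_per_a` moles of B per mole of A."""
    pmol_a = micromolar(conc_a_mg_per_ml, mw_a_kda) * vol_a_ul  # uM x uL = pmol
    pmol_b_needed = b_per_a * pmol_a
    return pmol_b_needed / micromolar(conc_b_mg_per_ml, mw_b_kda)


# Hypothetical stocks: 100 uL CsgG at 1.5 mg/mL, CsgF at 1.5 mg/mL.
csgf_volume = partner_volume_ul(
    vol_a_ul=100.0, conc_a_mg_per_ml=1.5, mw_a_kda=30.0,
    conc_b_mg_per_ml=1.5, mw_b_kda=15.0, b_per_a=2.0,
)
print(round(csgf_volume, 1))
```

With these example numbers the CsgG stock is 50 uM and the CsgF stock is 100 uM, so equal volumes give the 1:2 ratio.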
Structural Analysis Using Electron Microscopy
[0466] Sample behavior of the size exclusion fraction was probed
using negative stain electron microscopy. Samples were stained with
1% uranyl formate and imaged using an in-house 120 kV JEM 1400
(JEOL) microscope equipped with a LaB6 filament. Samples for
electron cryomicroscopy were prepared by spotting 2 .mu.L sample
onto R2/1 continuous carbon (2 nm) coated grids (Quantifoil),
manually blotted and plunged in liquid ethane using an in house
plunging device. Sample quality was screened on the in-house JEOL
JEM 1400 before collecting a dataset on a 200 kV TALOS ARCTICA
(FEI) microscope equipped with a Falcon-3 direct electron detection
camera. Images were motion corrected with MotionCor2.1 (Zheng et
al. 2017), defocus values were determined using ctffind4 (Rohou and
Grigorieff, 2015) and data was further analysed using a combination
of RELION (Scheres, 2012) and EMAN2 (Ludtke, 2016). C9 Symmetry was
imposed during 3D model generation and refinement on selected 2D
class averages featuring additional density for a head group.
[0467] For high resolution cryoEM analysis, CsgG:CsgF samples were
prepared for electron cryo-microscopy by spotting 3 .mu.l sample on
R2/1 Holey grids (Quantifoil), coated with graphene oxide (Sigma
Aldrich), manually blotted and plunged in liquid ethane using CP3
plunger (Gatan). Sample quality was screened on the in-house JEOL
JEM 1400 before collecting a dataset on a 300 kV TITAN KRIOS (FEI,
Thermo-Scientific) microscope equipped with a K2 Summit direct electron
detector (Gatan). The detector was used in counting mode with a
cumulative electron dose of 56 electrons per .ANG..sup.2 spread
over 50 frames. 2045 images were collected with a pixel size of
1.07 .ANG.. Images were motion-corrected with MotionCor2.1 (Zheng
et al. 2017) and defocus values were determined using ctffind4
(Rohou and Grigorieff, 2015). Particles were picked automatically
using Gautomatch (Dr. Kai Zhang) and data was further analysed
using a combination of RELION2.0 (Kimanius et al. 2016, Elife 5.
pii: e18722) and EMAN2 (Ludtke, 2016). C9 Symmetry was imposed
during 3D model generation and refinement on selected 2D class
averages featuring additional density for the head group
corresponding to CsgF. 62,000 particles were used to calculate the
final map at 3.4 .ANG. resolution. De novo model building of CsgF
was done with COOT (Brown et al. 2015 Acta Crystallogr D Biol
Crystallogr 71 (Pt 1):136-53) and iterative cycles of model
building and refinement of the full complex was done with PHENIX
(Afonine 2018, Acta Crystallogr D Struct Biol 74(Pt 6):531-544)
real-space refinement in combination with COOT.
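A few derived quantities follow directly from the acquisition parameters stated above, and the sketch below works them out. All input numbers are taken from the text; the variable and function names are illustrative, and the Nyquist limit (twice the pixel size) is the general sampling bound rather than a figure reported in the source.

```python
TOTAL_DOSE_E_PER_A2 = 56.0   # cumulative dose, electrons per square Angstrom
FRAMES_PER_MOVIE = 50
PIXEL_SIZE_A = 1.07          # Angstrom per pixel
MICROGRAPHS = 2045
FINAL_PARTICLES = 62000
FINAL_RESOLUTION_A = 3.4


def dose_per_frame(total_dose, frames):
    """Average electron dose delivered in each movie frame."""
    return total_dose / frames


def nyquist_limit(pixel_size):
    """Finest resolvable spacing for a given pixel size (2 x pixel size)."""
    return 2.0 * pixel_size


def particles_per_micrograph(particles, micrographs):
    """Average number of particles retained in the final map per image."""
    return particles / micrographs


print(round(dose_per_frame(TOTAL_DOSE_E_PER_A2, FRAMES_PER_MOVIE), 2))
print(round(nyquist_limit(PIXEL_SIZE_A), 2))
print(round(particles_per_micrograph(FINAL_PARTICLES, MICROGRAPHS), 1))
```

The reported 3.4 Angstrom resolution is coarser than the 2.14 Angstrom Nyquist limit, so the pixel size does not restrict the map at this resolution.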
Protein Expression and Purification of CsgG:CsgF Fragments
[0468] CsgF fragments and CsgG were co-expressed, with CsgF
fragments being C-terminally His-tagged and CsgG fused C-terminally
to a Strep tag. The CsgG:CsgF fragment complexes were over-expressed
in E. coli Top10 cells, transformed with plasmid pNA97, pNA98,
pNA99 or pNA100. Plates were grown at 37.degree. C. overnight, and a
colony was resuspended in LB medium supplemented with
streptomycin/spectinomycin. When the cell cultures reached an optical
density (OD) at 600 nm of 0.7, recombinant protein expression was
induced with 0.5 mM IPTG and left to grow for 15 hours at
28.degree. C., before being harvested by centrifugation at 5500 g.
Pellets were frozen at -20.degree. C.
[0469] Cell mass for the various CsgG:CsgF fragment co-expressions
was resuspended in 200 mL 50 mM Tris-HCl pH 8.0, 200 mM NaCl, 1 mM
EDTA, 5 mM MgCl2, 0.4 mM AEBSF, 1 .mu.g/mL Leupeptin, 0.5 mg/mL
DNase I and 0.1 mg/mL lysozyme, sonicated and incubated with 1%
n-dodecyl-.beta.-d-maltopyranoside (DDM; Inalco) for further cell
lysis and extraction of outer membrane components. Next, remaining
cell debris and membranes were spun down by centrifugation at
15,000 g for 40 min. The supernatant was incubated with 100 .mu.L
Strep-tactin beads at RT for 30 min. Strep beads were washed with
buffer (25 mM Tris pH8, 200 mM NaCl, and 1% DDM) by centrifugation
and bound proteins were eluted by the addition of 2.5 mM
desthiobiotin in 25 mM Tris pH8, 200 mM NaCl, 0.01% DDM.
Production of CsgG:FCP by In Vitro Reconstitution.
[0470] A synthetic peptide corresponding to the N-terminal 34
residues of mature CsgF (SEQ ID NO: 6) was diluted to 1 mg/ml in
buffer 0.1 M MES, 0.5 M NaCl, 0.4 mg/ml EDC
(1-ethyl-3-(3-dimethylaminopropyl)carbodiimide), 0.6 mg/ml NHS
(N-hydroxysuccinimide) and incubated for 15 min at room temperature
to allow activation of the peptide carboxyterminus. Next, 1 mg/ml
Cadaverine-Alexa594 in PBS was added during a 2 h incubation to allow
covalent coupling at room temperature. The reaction was quenched
via buffer exchange to 50 mM Tris, NaCl, 1 mM EDTA, 0.1% DDM using
Zeba Spin filters.
[0471] Labelled peptide was added to strep-affinity purified CsgG
in 50 mM Tris, 100 mM NaCl, 1 mM EDTA, 5 mM LDAO/C8D4 in a 2:1
molar ratio for 15 minutes at room temperature to allow
reconstitution of the CsgG:FCP complex. After pull down of
CsgG-strep on StrepTactin beads, the sample was analysed on
native-PAGE.
Example 6: Further Stabilization of CsgG:CsgF Complex by Covalent
Cross Linking
[0472] Although full length and some of the truncated versions of
CsgF make stable CsgG:CsgF complexes with the CsgG pore, CsgF can
still be dislodged from the barrel region of CsgG pore under
certain conditions. Therefore, it is desirable to make a covalent
link between the CsgG and CsgF subunits. Based on molecular
simulation studies, positions of CsgG and CsgF that are in close
proximity to each other have been identified (Example 5 and Table
4). Some of these identified positions have been modified to
incorporate a Cysteine in both CsgG and CsgF. FIG. 19 shows an
example of thiol-thiol bond formation between Q153 position of CsgG
and G1 position of CsgF. CsgG pore containing Q153C mutation was
reconstituted with CsgF containing G1C mutation and incubated for 1
hour enabling S--S bond formation. When the complex is heated to
100.degree. C. in the absence of DTT, a 45 kDa band corresponding
to dimer between CsgG monomer and CsgF monomer (CsgGm-CsgFm) can be
seen indicating the S--S bond formation between the two monomers
(CsgGm is 30 kDa and CsgFm is 15 kDa) (FIG. 19.A). This band
disappears when the heating is done in the presence of DTT, which
reduces the S--S bond. When the CsgG:CsgF complex is incubated
overnight instead of 1 hour, the extent of CsgGm-CsgFm dimer
formation increases (FIG. 19.A). Mass spectrometry methods have
been carried out to further identify the dimer band. Gel purified
protein was proteolytically cleaved to generate tryptic peptides.
LC-MS/MS sequencing methods were performed, resulting in the
identification of S--S bond between the Q153 position of CsgG and
G1 position of CsgF (FIG. 19.B). Oxidising agents such as
copper-orthophenanthroline can be used to enhance the S--S bond
formation. When CsgG pore containing N133C modification is
reconstituted with CsgF containing T4C modification in the presence
of copper-orthophenanthroline as described in methods section and
then broken down to its constituent monomers by heating to
100.degree. C. in the absence of DTT, a strong dimer band
corresponding to CsgGm-CsgFm can be observed on SDS-PAGE (FIG. 20,
lanes 3 and 4). When the heating was carried out in the presence of
DTT, the dimer breaks down to its constituent monomers (FIG. 20,
lanes 1 and 2).
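The band pattern described in this example can be summarised in a small sketch. It uses the approximate monomer masses given above (CsgGm 30 kDa, CsgFm 15 kDa); the function name and the `crosslink_complete` flag are illustrative assumptions, and real gels may show additional species.

```python
CSGG_MONOMER_KDA = 30  # approximate mass quoted in the text
CSGF_MONOMER_KDA = 15  # approximate mass quoted in the text


def expected_bands_kda(reducing, crosslink_complete=False):
    """Bands expected on SDS-PAGE after heating a Cys-linked CsgG:CsgF complex.

    With DTT (reducing) the S--S bond is broken and only monomers remain.
    Without DTT the CsgGm-CsgFm dimer is retained, alongside any monomers
    that did not cross-link.
    """
    monomers = [CSGF_MONOMER_KDA, CSGG_MONOMER_KDA]
    if reducing:
        return monomers
    dimer = CSGG_MONOMER_KDA + CSGF_MONOMER_KDA
    if crosslink_complete:
        return [dimer]
    return sorted(monomers + [dimer])


print(expected_bands_kda(reducing=False))
print(expected_bands_kda(reducing=True))
```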
Example 7: Electrophysiological Characterisation of CsgG:CsgF
Complexes
[0473] The signal observed when a DNA strand translocates through
CsgG is well characterised when the pore is inserted in the
copolymer membrane and experiments are carried out using the MinION
of Oxford Nanopore Technologies (FIG. 31). Y51, N55 and F56 of each
subunit of CsgG form the constriction of the CsgG pore (FIG. 12).
This sharp constriction serves as the reader head of the CsgG pore
(FIG. 31A) and is able to accurately discriminate a mixed sequence
of A, C, G and T as it passes through the pore. This is because the
measured signal contains characteristic current deflections from
which the identity of the sequence can be derived. However, in
homopolymeric regions of DNA, the measured signal may not show
current deflections of sufficient magnitude to allow single base
identification, such that an accurate determination of the length
of a homopolymer cannot be made from the magnitude of the measured
signal alone (FIGS. 26 B and C). The reduction in accuracy of the
CsgG reader head is correlated to the length of the homopolymeric
region (FIG. 29 C). When CsgF interacts with the CsgG pore to make
the CsgG:CsgF complex, CsgF introduces a second reader head within
the CsgG barrel. This second reader head primarily consists of the
N17 position of Seq. ID No. 6. A static strand experiment as
described in the methods section and FIG. 27 was carried out to map
the two reader heads of the CsgG:CsgF complex experimentally, and
results indicate the presence of the two reader heads that are
separated from each other by approximately 5-6 bases (FIG. 27, B, C
and D). The reader head discrimination plot for the CsgG:CsgF complex
shows that the second reader head introduced by CsgF contributes
less to the base discrimination than the CsgG reader head
(FIG. 27A). Surprisingly, when a second reader head is introduced
by CsgF within the CsgG barrel, the homopolymeric region which was
flat previously shows a step wise signal (FIGS. 30 B and C). These
steps contain information that can be used to identify the sequence
accurately resulting in a decrease in errors. Accuracy of the DNA
signal of the CsgG:CsgF complex remains relatively constant over a
longer homopolymeric length compared to the accuracy profile of the
CsgG pore by itself (FIG. 29 C).
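Why a second reader head can turn a flat homopolymer signal into a stepwise one can be illustrated with a deliberately simplified toy model. It is not the signal model used for the pore: the per-base current levels, the additive combination of the two heads, the 5-base separation (chosen from the approximately 5-6 bases reported above) and the 0.3 weight for the weaker second head are all hypothetical.

```python
# Hypothetical per-base current contributions (arbitrary units, not measured).
BASE_LEVEL = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def toy_signal(sequence, head_offsets, head_weights):
    """Return a toy current trace with one value per strand position.

    Each reader head sits `offset` bases away from the first head and
    contributes `weight` times the level of the base it currently holds.
    """
    span = max(head_offsets)
    trace = []
    for i in range(len(sequence) - span):
        level = sum(w * BASE_LEVEL[sequence[i + o]]
                    for o, w in zip(head_offsets, head_weights))
        trace.append(round(level, 3))
    return trace


strand = "ACGT" + "AAAA" + "TGCA"  # 4-base homopolymer with flanking sequence
one_head = toy_signal(strand, head_offsets=(0,), head_weights=(1.0,))
two_heads = toy_signal(strand, head_offsets=(0, 5), head_weights=(1.0, 0.3))

# Positions where the first reader head holds a homopolymer base.
print(one_head[4:8])
print(two_heads[4:7])
```

In this toy setting the single-head trace is constant across the homopolymer, whereas the two-head trace changes at each step because the second head is reading the flanking bases. For homopolymers longer than the head separation, the model still predicts a flat stretch in the middle.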
[0474] CsgG:CsgF complexes made by any of the methods described in
the methods section can be used to characterise the complex in DNA
sequencing experiments. Signals of a lambda DNA strand passing
through various CsgG:CsgF complexes made by different methods
consisting of different CsgG mutant pores and different CsgF
peptides with different lengths are shown in FIGS. 21-24. Reader
head discrimination of those pore complexes and their base
contribution profiles are shown in FIG. 28 (A-H). Surprisingly,
different modifications at constrictions of both CsgG pore and the
CsgF peptide can alter the signal of the CsgG:CsgF pore complex
significantly. For example, when the CsgG:CsgF complexes are made
with the same CsgG pore, but with two different CsgF peptides of
the same length containing either Asn or Ser at position 17 (of Seq
ID No. 6) (made by the same method of co-expression of the full
length CsgF protein followed by TEV protease cleavage of CsgF
between positions 35 and 36), the signals generated are different
from each other (FIG. 21). The CsgG:CsgF complex with Ser at
position 17 of the CsgF peptide shows lower noise and higher
signal:noise ratio compared to the CsgG:CsgF complex with Asn at
position 17 of the CsgF peptide. Similarly, when the same CsgG pore
was reconstituted with two different peptides of CsgF of the same
length (1-35 of Seq ID No. 6) but with either Ser or Val at position
17 to make the CsgG:CsgF complexes, the complex with Val at
position 17 of CsgF shows a noisier signal than the complex with
Ser at position 17 of CsgF (FIG. 22). When the same CsgF peptide of
the same length was reconstituted with different CsgG pores
containing different mutations at the CsgG reader head (positions
51, 55 and 56), the resulting CsgG:CsgF complexes showed very
different signals (FIG. 23, A-F) with different signal to noise
ratios (FIG. 25). Surprisingly, when different lengths of CsgF
peptides that contained the same constriction region were
reconstituted with the same CsgG pore to make CsgG:CsgF complexes,
they gave signals with a different range (FIG. 24). The CsgG:CsgF
complex which contains the shortest CsgF peptide (1-29 of Seq ID
No. 6) showed the largest range and the CsgG:CsgF complex which
contains the longest CsgF peptide (1-45 of Seq ID No. 6) showed the
smallest range (FIG. 24).
Materials and Methods for Characterisation of Analytes:
[0475] The proteins produced by the methods described below can be
used interchangeably with those produced by the methods described
above with respect to structural determination.
Methods
Expression of the CsgG:CsgF or CsgG:FCP Complex by
Co-Expression
[0476] Genes encoding the CsgG protein and its mutants are
constructed in the pT7 vector, which contains an ampicillin resistance
gene. Genes encoding the CsgF or FCP proteins and their mutants are
constructed in the pRham vector, which contains a kanamycin resistance
gene. 1 uL of each plasmid is mixed with 50 uL of
Lemo(DE3).DELTA.CsgEFG for 10 minutes on ice. The sample is then
heated at 42.degree. C. for 45 seconds before being returned to ice
for another 5 minutes. 150 uL of NEB SOC outgrowth medium is added
and the sample is incubated at 37.degree. C. with shaking at 250
rpm for 1 hour. The entire volume is spread onto an agar plate
containing kanamycin (40 ug/mL), ampicillin (100 ug/mL) and
chloramphenicol (34 ug/ml) and incubated overnight at 37.degree. C.
A single colony is taken from the plate and inoculated into 100 mL of
LB media containing kanamycin (40 ug/mL), ampicillin (100 ug/mL)
and chloramphenicol (34 ug/ml) and incubated overnight at
37.degree. C. with shaking at 250 rpm. 25 mL of the starter culture
is added to 500 mL of LB media containing 3 mM ATP, 15 mM
MgSO.sub.4, kanamycin (40 ug/mL), ampicillin (100 ug/mL) and
chloramphenicol (34 ug/ml) and incubated overnight at 37.degree. C.
The culture was allowed to grow for 7 hours, at which point the
OD.sub.600 was greater than 3.0. Lactose (1.0% final
concentration), glucose (0.2% final concentration) and rhamnose (2
mM final concentration) were added and the temperature dropped to
18.degree. C. whilst shaking was maintained at 250 rpm for 16 hours.
The culture was centrifuged at 6000 rpm for 20 mins at 4.degree. C. The
supernatant was discarded and the pellet kept. Cells were stored at
-80.degree. C. until purification.
Expression of the CsgG Pore with or without a C-Term Strep Tag and
CsgF with or without a C Terminal Strep or His Tag
[0477] All genes encoding the CsgG proteins and the CsgF or FCP
proteins are constructed in the pT7 vector, which contains an
ampicillin resistance gene. The expression procedure is the same as
above, except that kanamycin is omitted from all media and
buffers.
Cell lysis (co expressed complex or individual CsgG/CsgF/FCP
proteins)
[0478] The lysis buffer is made of 50 mM Tris, pH 8.0, 150 mM NaCl,
0.1% DDM, 1.times. Bugbuster Protein Extraction Reagent (Merck),
2.5 uL Benzonase Nuclease (stock .gtoreq.250 units/.mu.L)/100 mL of
lysis buffer and 1 tablet Sigma Protease inhibitor cocktail/100 mL
of lysis buffer. 5.times. volume of lysis buffer is used to lyse
1.times. weight of harvested cells. Cells are resuspended and left to
spin at room temperature for 4 hours until a homogenous lysate is
produced. Lysate is spun at 20,000 rpm for 35 minutes at 4.degree.
C. The supernatant is carefully extracted and filtered through a
0.2 um Acrodisc syringe filter.
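Scaling the lysis buffer additives to the amount of harvested cells can be sketched as follows. The per-100 mL amounts of Benzonase and protease inhibitor are those stated above; reading "5.times. volume of lysis buffer per 1.times. weight of cells" as 5 mL of buffer per gram of wet cell pellet is an assumption, as are the function and key names.

```python
BENZONASE_UL_PER_100_ML = 2.5      # stated in the text
INHIBITOR_TABLETS_PER_100_ML = 1   # stated in the text


def lysis_buffer_plan(cell_pellet_g, ml_per_g=5.0):
    """Return the buffer volume and additive amounts for a given cell pellet.

    Assumes 5 mL of lysis buffer per gram of wet cell pellet.
    """
    volume_ml = cell_pellet_g * ml_per_g
    return {
        "lysis_buffer_ml": volume_ml,
        "benzonase_ul": BENZONASE_UL_PER_100_ML * volume_ml / 100.0,
        "protease_inhibitor_tablets":
            INHIBITOR_TABLETS_PER_100_ML * volume_ml / 100.0,
    }


# Hypothetical 20 g pellet.
print(lysis_buffer_plan(20.0))
```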
Strep Purification of the CsgG or CsgF/FCP Proteins or Co-Expressed
Complex if the CsgG Contains a C-Term Strep Tag and CsgF or FCP
Contains a C-Term His Tag
[0479] The filtered sample was then loaded onto a 5 mL StrepTrap
column with the following parameters: Loading speed: 0.8 mL/min,
Complete sample loading: 10 mL, Wash out unbound: 10 CV (5 mL/min),
Extra wash: 10 CV (5 mL/min), Elution: 3 CV (5 mL/min). Affinity
buffer: 50 mM Tris, pH 8.0, 150 mM NaCl, 0.1% DDM; Wash buffer: 50
mM Tris, pH 8.0, 2 M NaCl, 0.1% DDM; Elution buffer: 50 mM Tris,
pH 8.0, 150 mM NaCl, 0.1% DDM, 10 mM desthiobiotin. Eluted sample is
collected.
His Purification of the CsgG or CsgF/FCP Proteins or Co-Expressed
Complex if the CsgG Contains a C-Term Strep Tag and CsgF or FCP
Contains a C-Term His Tag
[0480] Filtered sample or pooled eluted peaks from Strep
purification (in case of the complex) is loaded onto a 5 mL HisTrap
column using the same parameters as above, except with the
following buffers: Affinity and wash buffer: 50 mM Tris, pH 8.0,
150 mM NaCl, 0.1% DDM, 25 mM imidazole; Elution: 50 mM Tris, pH
8.0, 150 mM NaCl, 0.1% DDM, 350 mM imidazole. The peak is eluted and
concentrated in a 30 kDa MWCO Merck Millipore centrifugal unit to a
volume of 500 uL.
Formation of the Complex In Vitro with In Vivo Purified
Components.
[0481] Both the CsgG and the CsgF/FCP proteins expressed and
purified separately are mixed in various ratios to identify the
correct ratio, but always with CsgF in excess. The
complex was then incubated overnight at 25.degree. C. To remove the
excess CsgF and remove DTT from the buffer, the mixture was again
injected onto the Superdex Increase 200 10/300 equilibrated in 50
mM Tris, pH 8.0, 150 mM NaCl, 0.1% DDM. The complex usually elutes
between 9 to 10 mL on this column.
Polishing Step with Gel Filtration for the Complex (Co-Expressed or
Made In Vitro)
If necessary, Strep purified, His purified, or His followed by Strep
purified CsgG:CsgF or CsgG:FCP can be subjected
to a further polishing step by gel filtration. 500 uL of the sample
was injected into a 1 mL sample loop and onto the Superdex Increase
200 10/300 equilibrated in 50 mM Tris, pH 8.0, 150 mM NaCl, 0.1%
DDM. The peak associated with the complex usually elutes between 9
and 10 mL on this column when run at 1 mL/min. Sample was heated at
60.degree. C. for 15 minutes and centrifuged at 21,000 rcf for 10
mins. Supernatant was taken for testing. Samples were subjected to
SDS-PAGE to confirm and identify fractions eluted with the
complex.
Cleavage of CsgF or FCP at the TEV Protease Site
[0482] If the CsgF or FCP contains a TEV cleavage site,
TEV-protease with a C-term Histidine tag is added to the sample
(amount added is identified based on the rough concentration of the
protein complex) with 2 mM DTT. The sample is incubated overnight at
4.degree. C. on the roller mixer at 25 rpm. The mixture is then run
back through a 5 mL HisTrap column and the flow through is
collected. Anything uncleaved will remain bound to the column and
the cleaved protein will elute. The same buffers, parameters and
final heating step are used as in the His purification described
above.
Purifying the CsgG:FCP Complex with In Vivo Purified CsgG Pore and
Synthetic FCP
[0483] Lyophilised FCP peptides were received from Genscript and
Lifetein. 1 mg of peptide was dissolved in 1 mL of nuclease-free
ddH.sub.2O to obtain a 1 mg/mL sample. The sample was vortexed
until no peptide remained visible. Due to differences in the
expression levels of CsgG pores and mutants, it is difficult to
measure the concentration accurately. The intensity of protein
bands on SDS-PAGE against known markers can be used to get a rough
estimate of the sample concentration. CsgG and FCP were then mixed
in an approximately 1:50 molar ratio and incubated at 25.degree. C.
overnight at 700 rpm. Samples were heated at 60.degree. C. for 15
minutes and centrifuged at 21,000 rcf for 10 minutes. The
supernatant was taken for testing. If needed, the complex can be
purified as detailed above for co-expression.
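The 1:50 molar ratio above can be turned into a pipetting volume once a rough CsgG concentration is available. The following is a minimal sketch only: the CsgG concentration, the sample volume and the peptide molecular weight used in the example call are hypothetical placeholders, not values from this disclosure, and the function name is introduced here purely for illustration.

```python
def peptide_volume_ul(csgg_um, csgg_ul, peptide_mg_per_ml,
                      peptide_mw_g_per_mol, molar_excess=50.0):
    """Volume (uL) of peptide stock giving `molar_excess` mol peptide per mol CsgG."""
    # 1 mg/mL equals 1 g/L; g/L divided by g/mol gives mol/L, scaled here to uM.
    peptide_um = peptide_mg_per_ml / peptide_mw_g_per_mol * 1e6
    # uM multiplied by uL gives pmol.
    csgg_pmol = csgg_um * csgg_ul
    peptide_pmol = molar_excess * csgg_pmol
    return peptide_pmol / peptide_um


# Hypothetical inputs: 100 uL of CsgG estimated at 1 uM (from SDS-PAGE band
# intensity), a 1 mg/mL peptide stock, and an assumed peptide MW of 3500 g/mol.
volume = peptide_volume_ul(csgg_um=1.0, csgg_ul=100.0,
                           peptide_mg_per_ml=1.0,
                           peptide_mw_g_per_mol=3500.0)
print(f"{volume:.1f} uL of peptide stock")
```

Because the CsgG concentration is only a gel-based estimate, the result should be read as approximate, in line with the "approximately 1:50" wording above.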
Purifying CsgG:CsgF or CsgG:FCP Containing Cysteine Mutants
[0484] The same procedure as above can be used to purify the
CsgG:CsgF or CsgG:FCP complexes (made by I, II or III below) if
either or both components contain cysteines, except for the
composition of the affinity, wash and elution buffers in the His
and Strep purifications and the buffer used in gel filtration. To
purify cysteine mutants, all these buffers should contain 2 mM DTT.
2 mM DTT was also added when synthetic peptides containing
cysteines were dissolved in ddH2O.
[0485] I. co-expression of CsgG and CsgF or FCP
[0486] II. Making the CsgG:CsgF or CsgG:FCP complexes in vitro with
in vivo purified individual components
[0487] III. Making the CsgG:CsgF or CsgG:FCP complexes in vitro
with in vivo purified CsgG and synthetic FCP
Determination of Cys-Bond Formation
[0488] Two 50 uL aliquots of the final elution were taken. To one
tube, 2 mM DTT was added as a reducing agent; to the other, 100
.mu.M Cu(II):1,10-phenanthroline (33 mM:100 mM) was added as an
oxidizing agent. Samples were mixed 1:1 with Laemmli buffer
containing 4% SDS. Half of the samples were heated to 100.degree.
C. for 10 min (denaturing condition) and half were left untreated,
before running on a 4-20% TGX gel (Bio-Rad Criterion) in TGS
buffer.
Coupled In Vitro Transcription and Translation (IVTT)
[0489] All proteins were generated by coupled in vitro
transcription and translation (IVTT) using an E. coli T7-S30
extract system for circular DNA (Promega). The complete 1 mM amino
acid mixture minus cysteine and the complete 1 mM amino acid
mixture minus methionine were mixed in equal volumes to obtain the
working amino acid solution required to generate high
concentrations of the proteins. The amino acids (10 uL) were mixed
with premix solution (40 uL), [35S]L-methionine (2 uL, 1175
Ci/mmol, 10 mCi/mL), plasmid DNA (16 uL, 400 ng/uL), T7 S30
extract (30 uL) and rifampicin (2 uL, 20 mg/mL) to generate a 100
uL reaction of IVTT proteins. Synthesis was carried out for 4 hours
at 30.degree. C. followed by overnight incubation at room
temperature. If the CsgG:CsgF or CsgG:FCP complexes were made by
co-expression, plasmid DNAs encoding each component were mixed in
equal amounts, and a portion of the mixture (16 uL) was used for
IVTT. After incubation, the tube was centrifuged for 10 minutes at
22,000 g and the supernatant was discarded. The resulting pellet
was resuspended and washed in MBSA (10 mM MOPS, 1 mg/mL BSA, pH
7.4) and centrifuged again under the same conditions. The protein
present in the pellet was resuspended in 1.times. Laemmli sample
buffer and run on a 4-20% TGX gel at 300 V for 25 min. The gel was
then dried and exposed to Carestream.RTM. Kodak.RTM. BioMax.RTM. MR
film overnight. The film was then processed and the protein in the
gel visualized.
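As a quick consistency check, the component volumes listed above can be tabulated to confirm that they add up to the stated 100 uL reaction and to derive the amounts they imply. This is a minimal sketch; the dictionary keys and the `scale_reaction` helper are names introduced here for illustration, and scaling to other reaction volumes is an assumption not described in the text.

```python
# Component volumes (uL) of the 100 uL IVTT reaction as listed above.
COMPONENTS_UL = {
    "amino acid mix": 10,
    "premix solution": 40,
    "[35S]L-methionine": 2,
    "plasmid DNA": 16,
    "T7 S30 extract": 30,
    "rifampicin": 2,
}

total_ul = sum(COMPONENTS_UL.values())

# Derived quantities from the stated stock concentrations.
plasmid_ng = COMPONENTS_UL["plasmid DNA"] * 400                       # 400 ng/uL stock
rifampicin_mg_per_ml = COMPONENTS_UL["rifampicin"] * 20 / total_ul    # 20 mg/mL stock
met_uci = COMPONENTS_UL["[35S]L-methionine"] * 10                     # 10 mCi/mL equals 10 uCi/uL


def scale_reaction(target_total_ul):
    """Scale every component proportionally (hypothetical convenience helper)."""
    factor = target_total_ul / total_ul
    return {name: vol * factor for name, vol in COMPONENTS_UL.items()}


print(f"total volume: {total_ul} uL")
print(f"plasmid DNA: {plasmid_ng} ng")
print(f"rifampicin (final): {rifampicin_mg_per_ml} mg/mL")
print(f"[35S]L-methionine: {met_uci} uCi")
```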
Samples for Testing in MinIONs
[0490] All samples prior to testing are incubated with Brij58
(final concentration of 0.1%) for 10 minutes at room temperature
before making up subsequent pore dilutions necessary for pore
insertion.
Method for Preparing and Running Static Strands
[0491] A set of polyA DNA strands (SS20 to SS38 of FIG. 27), in
which one base is missing from the DNA backbone (iSpc3), was
obtained from Integrated DNA Technologies (IDT). The 3' end of each
of these strands also comprises a biotin modification. The static
strands were incubated with monovalent streptavidin at room
temperature for 20 minutes, resulting in the biotin binding to the
streptavidin. The streptavidin-static strand complex was diluted to
500 nM (B, FIG. 27) and 2 uM (C, FIG. 27) in 25 mM HEPES, 430 mM
KCl, 30 mM ATP, 30 mM MgCl2, 2.15 mM EDTA, pH 8 (known as RBFM).
The residual current generated by each static strand was recorded
in a MinION setup. MinION flow cells were flushed as per standard
running protocols, and the sequencing protocol was then started
with 1 minute static flicks. Initially, 10 minutes of open pore
recording was generated before 150 uL of the first
streptavidin-static strand complex was added. After 10 minutes, 300
uL of RBFM was flushed through the flow cell before the next
streptavidin-static strand complex was added. This process was
repeated for all streptavidin-static strand complexes. Once the
final streptavidin-static strand complex had been incubated on the
flow cell, 800 uL of RBFM was flushed through the flow cell and 10
minutes of open pore recording was generated before finishing the
experiment.
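The running order described above can be laid out as a simple schedule to estimate the total recording time and buffer consumption. The sketch below rests on several assumptions that are not stated in the text: that SS20 to SS38 denotes 19 consecutively numbered strands, that 150 uL of complex is added for every strand (the text gives this volume only for the first), and that flush steps take negligible time. The function and variable names are illustrative only.

```python
STRANDS = [f"SS{n}" for n in range(20, 39)]  # assumed: SS20..SS38, 19 strands


def build_schedule(strands):
    """Return a list of (step, minutes, complex_ul, rbfm_ul) tuples."""
    steps = [("open pore recording", 10, 0, 0)]
    for index, name in enumerate(strands):
        steps.append((f"add {name} complex and record", 10, 150, 0))
        is_last = index == len(strands) - 1
        # 300 uL RBFM between strands, 800 uL after the final strand.
        steps.append((f"flush after {name}", 0, 0, 800 if is_last else 300))
    steps.append(("open pore recording", 10, 0, 0))
    return steps


schedule = build_schedule(STRANDS)
total_minutes = sum(step[1] for step in schedule)
total_complex_ul = sum(step[2] for step in schedule)
total_rbfm_ul = sum(step[3] for step in schedule)

print(f"{len(STRANDS)} strands, {total_minutes} min of recording")
print(f"{total_complex_ul} uL complex, {total_rbfm_ul} uL RBFM flush")
```

Under these assumptions one full pass takes about three and a half hours of recording per concentration series, excluding flush handling time.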
Method for Making Discrimination Profile Plots
[0492] The reader head discrimination profiles show the average
variation in modelled current when the base at each reader head
position is varied. To calculate the reader head discrimination at
position i for a model of length k with an alphabet of length n, we
define the discrimination at reader head position i as the median
of the standard deviations in current level over the n.sup.k-1
groups of size n in which position i is varied while the other
positions are held constant.
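The definition above can be sketched in a few lines of code. This is an illustrative implementation under stated assumptions: the k-mer model is represented as a dictionary mapping each k-mer string to a modelled current level, the population standard deviation is used (the text does not say whether the population or sample form was intended), and the toy model at the end is invented purely to exercise the function. For a fixed position i, the n^k k-mers partition into n^(k-1) groups of n.

```python
from itertools import product
from statistics import median, pstdev


def discrimination_profile(model, k, alphabet="ACGT"):
    """Median over groups of the std dev in current when position i is varied."""
    profile = []
    for i in range(k):
        group_stdevs = []
        # Enumerate every assignment of the other k-1 positions.
        for rest in product(alphabet, repeat=k - 1):
            prefix, suffix = "".join(rest[:i]), "".join(rest[i:])
            # One group: the n k-mers differing only at position i.
            levels = [model[prefix + base + suffix] for base in alphabet]
            group_stdevs.append(pstdev(levels))
        profile.append(median(group_stdevs))
    return profile


# Toy 3-mer model (hypothetical): current depends only on the middle base.
BASE_LEVEL = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
toy_model = {"".join(kmer): BASE_LEVEL[kmer[1]]
             for kmer in product("ACGT", repeat=3)}

profile = discrimination_profile(toy_model, k=3)
print(profile)
```

In the toy model only the middle position carries signal, so the profile is zero at the outer positions and non-zero in the middle, which is the shape one would expect for a pore with a single sharp reader head.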
Aspects of the Disclosure
[0493] 1. An isolated pore complex comprising a CsgG pore, or a
homologue or mutant thereof, and a modified CsgF peptide, or a
homologue or mutant thereof. 2. The isolated pore complex according
to 1, wherein the modified CsgF peptide, or a homologue or mutant
thereof, is inserted into the lumen of the CsgG pore, or a
homologue or mutant thereof. 3. The isolated pore complex according
to 2, wherein the pore complex has two or more channel
constrictions, comprising a CsgG channel constriction and a CsgF
channel constriction. 4. The isolated pore complex according to any
one of 1 to 3, wherein the CsgG pore, or homologue or mutant
thereof, is a mutant CsgG pore. 5. The isolated pore complex
according to 3 or 4, wherein the CsgF channel constriction has a
diameter in the range from 0.5 nm to 2.0 nm. 6. A modified CsgF
peptide, or a modified peptide of a CsgF homologue or mutant,
wherein the modification comprises a truncation of the CsgF protein
SEQ ID NO:6 or of a homologue or mutant thereof. 7. A modified CsgF
peptide, or a modified peptide of a CsgF homologue or mutant,
according to 6, wherein said modified CsgF peptide comprises SEQ ID
NO:39, or SEQ ID NO:40, or a homologue or mutant thereof. 8. A
modified CsgF peptide, or a modified peptide of a CsgF homologue or
mutant, according to 6, wherein said modified CsgF peptide
comprises SEQ ID NO:15, or a homologue or mutant thereof. 9. A
modified CsgF peptide according to 8, or a modified peptide of a
CsgF homologue or mutant, wherein one or more positions in the
region comprising SEQ ID NO:15 are mutated, with a minimum of 35%
amino acid identity to SEQ ID NO:15. 10. A polynucleotide which
encodes a modified CsgF peptide according to any one of 6 to 9. 11.
The isolated pore complex according to any one of 1 to 5, wherein
said modified CsgF peptide, or a homologue or mutant thereof, is a
peptide according to any one of 6 to 9. 12. The isolated pore
complex according to 11, wherein the modified CsgF peptide and the
CsgG pore, or homologues or mutants thereof, are covalently
coupled. 13. The isolated pore complex according to 12, wherein the
covalent coupling is via: (i) a cysteine residue at a position
corresponding to 132, 133, 136, 138, 140, 142, 144, 145, 147, 149,
151, 153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 or 209
of SEQ ID NO: 3, or of a homologue thereof; (ii) a non-native
reactive or photoreactive amino acid at a position corresponding to
132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155,
183, 185, 187, 189, 191, 201, 203, 205, 207 or 209 of SEQ ID NO: 3,
or of a homologue thereof. 14. An isolated transmembrane pore
complex comprising the isolated pore complex according to any one
of 1 to 5 or 11 to 13, and the components of a membrane. 15. A
method for producing a transmembrane pore complex, wherein the pore
complex is formed by a CsgG pore and a modified CsgF peptide, or
homologues or mutants thereof, comprising co-expressing CsgG as
depicted in SEQ ID NO:2, or a homologue or mutant thereof, and a
modified CsgF peptide, or a homologue or mutant thereof, in a
suitable host cell, thereby allowing in vivo transmembrane pore
complex formation. 16. The method according to 15, wherein the
modified CsgF peptide, or homologue or mutant thereof, comprises
SEQ ID NO:12 or SEQ ID NO:14, or a homologue or mutant thereof. 17.
A method for producing an isolated pore complex, wherein the
isolated pore is formed by a CsgG pore, or a homologue or mutant
thereof, and a modified CsgF peptide, or a homologue or mutant
thereof, comprising contacting the CsgG monomers of SEQ ID NO:3, or
a homologue or mutant thereof, with modified CsgF peptide, or a
homologue or mutant thereof, thereby allowing in vitro
reconstitution of the isolated pore complex. 18. The method
according to 17, wherein the modified CsgF peptide, or a homologue
or mutant thereof, comprises SEQ ID NO:15 or SEQ ID NO:16, or a
homologue or mutant thereof. 19. A method for determining the
presence, absence or one or more characteristics of a target
analyte, comprising the steps of: (i) contacting the target analyte
with a pore complex according to any one of 1 to 5 or 11 to 13 or
with a transmembrane pore complex according to 14, such that the
target analyte moves into the pore complex; and (ii) taking one or
more measurements as the analyte moves through the pore complex and
thereby determining the presence, absence or one or more
characteristics of the analyte. 20. A method according to 19,
wherein the analyte is a polynucleotide. 21. A method according to
19, wherein the analyte is a (poly)peptide. 22. A method according
to 19, wherein the analyte is a polysaccharide. 23. A method
according to 19, wherein the analyte is a small organic or
inorganic compound, such as pharmacologically active compounds,
toxic compounds and pollutants. 24. A method according to 20,
comprising determining one or more characteristics selected from
(i) the length of the polynucleotide, (ii) the identity of the
polynucleotide, (iii) the sequence of the polynucleotide, (iv) the
secondary structure of the polynucleotide and (v) whether or not
the polynucleotide is modified. 25. A method of characterising a
polynucleotide or a (poly)peptide using an isolated transmembrane
pore complex, wherein the pore complex is a complex comprising a
CsgG pore, or a homologue or mutant thereof, and a modified CsgF
peptide, or a homologue or mutant thereof. 26. A method according
to 25, wherein the CsgG pore, or homologue or mutant thereof,
comprises six to ten monomers. 27. Use of the isolated pore complex
according to any one of 1 to 5 or 11 to 13 or of the transmembrane
pore complex according to 14 to determine the presence, absence or
one or more characteristics of a target analyte. 28. A kit for
characterising a target analyte comprising (a) an isolated pore
complex according to any one of 1 to 5 or 11 to 13 and (b) the
components of a membrane.
Sequences
Description of the Sequences:
[0494] SEQ ID NO:1 shows polynucleotide sequence of wild-type E.
coli CsgG from strain K12, including signal sequence (Gene ID:
945619). SEQ ID NO:2 shows amino acid sequence of wild-type E. coli
CsgG including signal sequence (Uniprot accession number P0AEA2).
SEQ ID NO:3 shows amino acid sequence of wild-type E. coli CsgG as
mature protein (Uniprot accession number P0AEA2). SEQ ID NO:4 shows
polynucleotide sequence of wild-type E. coli CsgF from strain K12,
including signal sequence (Gene ID: 945622). SEQ ID NO:5 shows
amino acid sequence of wild-type E. coli CsgF including signal
sequence (Uniprot accession number P0AE98). SEQ ID NO:6 shows amino
acid sequence of wild-type E. coli CsgF as mature protein (Uniprot
accession number P0AE98). SEQ ID NO:7 shows polynucleotide sequence
of a fragment of wild-type E. coli CsgF encoding amino acids 1 to
27 and a C-terminal 6 His tag. SEQ ID NO:8 shows amino acid
sequence of a fragment of wild-type E. coli CsgF encompassing amino
acids 1 to 27 and a C-terminal 6 His tag. SEQ ID NO:9 shows
polynucleotide sequence of a fragment of wild-type E. coli CsgF
encoding amino acids 1 to 38 and a C-terminal 6 His tag. SEQ ID
NO:10 shows amino acid sequence of a fragment of wild-type E. coli
CsgF encompassing amino acids 1 to 38 and a C-terminal 6 His tag.
SEQ ID NO:11 shows polynucleotide sequence of a fragment of
wild-type E. coli CsgF encoding amino acids 1 to 48 and a
C-terminal 6 His tag. SEQ ID NO:12 shows amino acid sequence of a
fragment of wild-type E. coli CsgF encompassing amino acids 1 to 48
and a C-terminal 6 His tag. SEQ ID NO:13 shows polynucleotide
sequence of a fragment of wild-type E. coli CsgF encoding amino
acids 1 to 64 and a C-terminal 6 His tag. SEQ ID NO:14 shows amino
acid sequence of a fragment of wild-type E. coli CsgF encompassing
amino acids 1 to 64 and a C-terminal 6 His tag. SEQ ID NO:15 shows
amino acid sequence of a peptide corresponding to residues 20 to 53
of E. coli CsgF. SEQ ID NO:16 shows amino acid sequence of a peptide
corresponding to residues 20 to 42 of E. coli CsgF, including KD at
its C-terminus. SEQ ID NO:17 shows amino acid sequence of a peptide
corresponding to residues 23 to 55 of CsgF homologue Q88H88. SEQ ID
NO:18 shows amino acid sequence of a peptide corresponding to
residues 25 to 57 of CsgF homologue A0A143HJA0. SEQ ID NO:19 shows
amino acid sequence of a peptide corresponding to residues 21 to 53
of CsgF homologue Q5E245. SEQ ID NO:20 shows amino acid sequence of
a peptide corresponding to residues 19 to 51 of CsgF homologue
Q084E5. SEQ ID NO:21 shows amino acid sequence of a peptide
corresponding to residues 15 to 47 of CsgF homologue F0LZU2. SEQ ID
NO:22 shows amino acid sequence of a peptide corresponding to
residues 26 to 58 of CsgF homologue A0A136HQR0. SEQ ID NO:23 shows
amino acid sequence of a peptide corresponding to residues 21 to 53
of CsgF homologue A0A0W1SRL3. SEQ ID NO:24 shows amino acid
sequence of a peptide corresponding to residues 26 to 59 of CsgF
homologue B0UH01. SEQ ID NO:25 shows amino acid sequence of a
peptide corresponding to residues 22 to 53 of CsgF homologue Q6NAU5.
SEQ ID NO:26 shows amino acid sequence of a peptide corresponding
to residues 7 to 38 of CsgF homologue G8PUY5. SEQ ID NO:27 shows
amino acid sequence of a peptide corresponding to residues 25 to 57
of CsgF homologue A0A0S2ETP7. SEQ ID NO:28 shows amino acid sequence
of a peptide corresponding to residues 19 to 51 of CsgF homologue
E3I1Z1. SEQ ID NO:29 shows amino acid sequence of a peptide
corresponding to residues 24 to 55 of CsgF homologue F3Z094. SEQ ID
NO:30 shows amino acid sequence of a peptide corresponding to
residues 21 to 53 of CsgF homologue A0A176T7M2. SEQ ID NO:31 shows
amino acid sequence of a peptide corresponding to residues 14 to 45
of CsgF homologue D2QPP8. SEQ ID NO:32 shows amino acid sequence of
a peptide corresponding to residues 28 to 58 of CsgF homologue
N2IYT1. SEQ ID NO:33 shows amino acid sequence of a peptide
corresponding to residues 26 to 58 of CsgF homologue W7QHV5. SEQ ID
NO:34 shows amino acid sequence of a peptide corresponding to
residues 23 to 55 of CsgF homologue D4ZLW2. SEQ ID NO:35 shows amino
acid sequence of a peptide corresponding to residues 21 to 53 of
CsgF homologue D2QT92. SEQ ID NO:36 shows amino acid sequence of a
peptide corresponding to residues 20 to 51 of CsgF homologue
A0A167UJA2. SEQ ID NO:37 shows amino acid sequence of a fragment of
wild-type E. coli CsgF encompassing amino acids 20 to 27. SEQ ID
NO:38 shows amino acid sequence of a fragment of wild-type E. coli
CsgF encompassing amino acids 20 to 38. SEQ ID NO:39 shows amino
acid sequence of a fragment of wild-type E. coli CsgF encompassing
amino acids 20 to 48. SEQ ID NO:40 shows amino acid sequence of a
fragment of wild-type E. coli CsgF encompassing amino acids 20 to
64. SEQ ID NO:41 shows the nucleotide sequence of primer
CsgF_d27_end. SEQ ID NO:42 shows the nucleotide sequence of primer
CsgF_d38_end. SEQ ID NO:43 shows the nucleotide sequence of primer
CsgF_d48_end. SEQ ID NO:44 shows the nucleotide sequence of primer
CsgF_d64_end. SEQ ID NO:45 shows the nucleotide sequence of primer
pNa62_CsgF_histag_Fw. SEQ ID NO:46 shows the nucleotide sequence of
primer CsgF-His_pET22b_FW. SEQ ID NO:47 shows the nucleotide
sequence of primer CsgF-His_pET22b_Rev. SEQ ID NO:48 shows the
nucleotide sequence of primer csgEFG_pDONR221_FW. SEQ ID NO:49 shows
the nucleotide sequence of primer csgEFG_pDONR221_Rev. SEQ ID NO:50
shows the nucleotide sequence of primer Mut_csgF_His_FW. SEQ ID
NO:51 shows the nucleotide sequence of primer Mut_csgF_His_Rev. SEQ
ID NO:52 shows the nucleotide sequence of primer DelCsgE_Rev. SEQ ID
NO:53 shows the nucleotide sequence of primer DelCsgE FW. SEQ ID NO:
54 shows the amino acid sequence of residues 1 to 30 of mature E.
coli CsgF. SEQ ID NO: 55 shows the amino acid sequence of residues 1
to 35 of mature E. coli CsgF. SEQ ID NO: 56 shows the amino acid
sequence of a mutated (T4C/N17S) CsgF sequence with a signal
sequence, and a TEV protease cleavage site (ENLYFQS) inserted
between residues 35 and 36 of sequence of the mature protein. SEQ
ID NO: 57 shows the amino acid sequence of a mutated (N17S-Del)
CsgF sequence with a signal sequence, and a TEV protease cleavage
site (ENLYFQS) inserted between residues 35 and 36 of sequence of
the mature protein. SEQ ID NO: 58 shows the amino acid sequence of
a mutated (G1C/N17S) CsgF sequence with a signal sequence, and a
TEV protease cleavage site (ENLYFQS) inserted between residues 35
and 36 of sequence of the mature protein. SEQ ID NO: 59 shows the
amino acid sequence of a mutated (G1C) CsgF sequence with a signal
sequence, and a TEV protease cleavage site (ENLYFQS) inserted
between residues 35 and 36 of sequence of the mature protein. SEQ
ID NO: 60 shows the amino acid sequence of a CsgF sequence with a
signal sequence, a TEV protease cleavage site (ENLYFQS) inserted
between residues 45 and 46 of sequence of the mature protein, and a
His.sub.10 tag at the C-terminus. SEQ ID NO: 61 shows the amino
acid sequence of a CsgF sequence with a signal sequence, a TEV
protease cleavage site (ENLYFQS) inserted between residues 35 and
36 of sequence of the mature protein, and a His.sub.10 tag at the
C-terminus. SEQ ID NO: 62 shows the amino acid sequence of a CsgF
sequence with a signal sequence, a TEV protease cleavage site
(ENLYFQS) inserted between residues 30 and 31 of sequence of the
mature protein, and a His.sub.10 tag at the C-terminus. SEQ ID NO:
63 shows the amino acid sequence of a CsgF sequence with a signal
sequence, a TEV protease cleavage site (ENLYFQS) inserted between
residues 45 and 51 of sequence of the mature protein, and a
His.sub.10 tag at the C-terminus. SEQ ID NO: 64 shows the amino
acid sequence of a CsgF sequence with a signal sequence, a TEV
protease cleavage site (ENLYFQS) inserted between residues 30 and
37 of sequence of the mature protein, and a His.sub.10 tag at the
C-terminus. SEQ ID NO: 65 shows the amino acid sequence of a CsgF
sequence with a signal sequence, a HCV C3 protease cleavage site
(LEVLFQGP) inserted between residues 34 and 36 of sequence of the
mature protein, and a His.sub.10 tag at the C-terminus. SEQ ID NO:
66 shows the amino acid sequence of a CsgF sequence with a signal
sequence, a HCV C3 protease cleavage site (LEVLFQGP) inserted
between residues 42 and 43 of sequence of the mature protein, and a
His.sub.10 tag at the C-terminus. SEQ ID NO: 67 shows the amino
acid sequence of a CsgF sequence with a signal sequence, a HCV C3
protease cleavage site (LEVLFQGP) inserted between residues 38 and
47 of sequence of the mature protein, and a His.sub.10 tag at the
C-terminus. SEQ ID NO: 68 shows the amino acid sequence of
YP_001453594.1: 1-248 of hypothetical protein CKO_02032
[Citrobacter koseri ATCC BAA-895], which is 99% identical to SEQ ID
NO: 3. SEQ ID NO: 69 shows the amino acid sequence of
WP_001787128.1: 16-238 of curli production assembly/transport
component CsgG, partial [Salmonella enterica], which is 98%
identical to SEQ ID NO: 3. SEQ ID NO: 70 shows the amino acid
sequence of KEY44978.1: 16-277 of curli production
assembly/transport protein
CsgG [Citrobacter amalonaticus], which is 98% identical to SEQ ID
NO: 3. SEQ ID NO: 71 shows the amino acid sequence of
YP_003364699.1: 16-277 of curli production assembly/transport
component [Citrobacter rodentium ICC 68], which is 97% identical to
SEQ ID NO: 3. SEQ ID NO: 72 shows the amino acid sequence of
YP_004828099.1: 16-277 of curli production assembly/transport
component CsgG [Enterobacter asburiae LF7a], which is 94% identical
to SEQ ID NO: 3. SEQ ID NO: 73 shows the amino acid sequence of
WP_006819418.1: 19-280 of transporter [Yokenella regensburgei],
which is 91% identical to SEQ ID NO: 3. SEQ ID NO: 74 shows the
amino acid sequence of WP_024556654.1: 16-277 of curli production
assembly/transport protein CsgG [Cronobacter pulveris], which is
89% identical to SEQ ID NO: 3. SEQ ID NO: 75 shows the amino acid
sequence of YP_005400916.1:16-277 of curli production
assembly/transport protein CsgG [Rahnella aquatilis HX2], which is
84% identical to SEQ ID NO: 3. SEQ ID NO: 76 shows the amino acid
sequence of KFC99297.1: 20-278 of CsgG family curli production
assembly/transport component [Kluyvera ascorbata ATCC 33433], which
is 82% identical to SEQ ID NO: 3. SEQ ID NO: 77 shows the amino
acid sequence of KFC86716.1: 16-274 of CsgG family curli production
assembly/transport component [Hafnia alvei ATCC 13337], which is
81% identical to SEQ ID NO: 3. SEQ ID NO: 78 shows the amino acid
sequence of YP_007340845.1: 16-270 of uncharacterised protein
involved in formation of curli polymers [Enterobacteriaceae
bacterium strain FGI 57], which is 76% identical to SEQ ID NO: 3.
SEQ ID NO: 79 shows the amino acid sequence of WP_010861740.1:
17-274 of curli production assembly/transport protein CsgG
[Plesiomonas shigelloides], which is 70% identical to SEQ ID NO: 3.
SEQ ID NO: 80 shows the amino acid sequence of YP_205788.1: 23-270
of curli production assembly/transport outer membrane lipoprotein
component CsgG [Vibrio fischeri ES114], which is 60% identical to
SEQ ID NO: 3. SEQ ID NO: 81 shows the amino acid sequence of
WP_017023479.1: 23-270 of curli production assembly protein CsgG
[Aliivibrio logei], which is 59% identical to SEQ ID NO: 3. SEQ ID
NO: 82 shows the amino acid sequence of WP_007470398.1: 22-275 of
Curli production assembly/transport component CsgG [Photobacterium
sp. AK 5], which is 57% identical to SEQ ID NO: 3. SEQ ID NO: 83
shows the amino acid sequence of WP_021231638.1: 17-277 of curli
production assembly protein CsgG [Aeromonas veronii], which is 56%
identical to SEQ ID NO: 3. SEQ ID NO: 84 shows the amino acid
sequence of WP_033538267.1: 27-265 of curli production
assembly/transport protein CsgG [Shewanella sp. ECSMB14101], which
is 56% identical to SEQ ID NO: 3. SEQ ID NO: 85 shows the amino
acid sequence of WP_003247972.1: 30-262 of curli production
assembly protein CsgG [Pseudomonas putida], which is 54% identical
to SEQ ID NO: 3. SEQ ID NO: 86 shows the amino acid sequence of
YP_003557438.1: 1-234 of curli production assembly/transport
component CsgG [Shewanella violacea DSS12], which is 53% identical
to SEQ ID NO: 3. SEQ ID NO: 87 shows the amino acid sequence of
WP_027859066.1: 36-280 of curli production assembly/transport
protein CsgG [Marinobacterium jannaschii], which is 53% identical
to SEQ ID NO: 3. SEQ ID NO: 88 shows the amino acid sequence of
CEJ70222.1: 29-262 of Curli production assembly/transport component
CsgG [Chryseobacterium oranimense G311], which is 50% identical to
SEQ ID NO: 3.
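The relationship between the polynucleotide entries and the amino acid entries described above (for example SEQ ID NO:7 and SEQ ID NO:8) can be checked by translating the coding sequence with the standard genetic code. The following is a minimal sketch: the `translate` function and the compact codon-table construction are introduced here for illustration, and the nucleotide string is copied from the SEQ ID NO:7 entry of the sequence listing.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base up to (not including) the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Coding sequence for CsgF 1:27_6His (SEQ ID NO:7 in the listing).
SEQ_ID_NO_7 = (
    "ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCAT"
    "GACTTTCCAGTTCCGTCATCACCATCACCATCACTAAGCCC"
)

protein = translate(SEQ_ID_NO_7)
print(protein)
```

The translated product is the 19-residue signal peptide followed by residues 20 to 27 of CsgF and six histidines, which agrees with the SEQ ID NO:8 entry.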
TABLE-US-00006 (>P0AEA2; coding sequence for WT CsgG from E.
coli K12) SEQ ID NO: 1
ATGCAGCGCTTATTTCTTTTGGTTGCCGTCATGTTACTGAGCGGATGCTTAACCGCCCCGCCTAA
AGAAGCCGCCAGACCGACATTAATGCCTCGTGCTCAGAGCTACAAAGATTTGACCCATCTGCCA
GCGCCGACGGGTAAAATCTTTGTTTCGGTATACAACATTCAGGACGAAACCGGGCAATTTAAACC
CTACCCGGCAAGTAACTTCTCCACTGCTGTTCCGCAAAGCGCCACGGCAATGCTGGTCACGGCA
CTGAAAGATTCTCGCTGGTTTATACCGCTGGAGCGCCAGGGCTTACAAAACCTGCTTAACGAGC
GCAAGATTATTCGTGCGGCACAAGAAAACGGCACGGTTGCCATTAATAACCGAATCCCGCTGCA
ATCTTTAACGGCGGCAAATATCATGGTTGAAGGTTCGATTATCGGTTATGAAAGCAACGTCAAAT
CTGGCGGGGTTGGGGCAAGATATTTTGGCATCGGTGCCGACACGCAATACCAGCTCGATCAGAT
TGCCGTGAACCTGCGCGTCGTCAATGTGAGTACCGGCGAGATCCTTTCTTCGGTGAACACCAGT
AAGACGATACTTTCCTATGAAGTTCAGGCCGGGGTTTTCCGCTTTATTGACTACCAGCGCTTGCT
TGAAGGGGAAGTGGGTTACACCTCGAACGAACCTGTTATGCTGTGCCTGATGTCGGCTATCGAA
ACAGGGGTCATTTTCCTGATTAATGATGGTATCGACCGTGGTCTGTGGGATTTGCAAAATAAAGC
AGAACGGCAGAATGACATTCTGGTGAAATACCGCCATATGTCGGTTCCACCGGAATCCTGA
(>P0AEA2 (1:277); WT prepro CsgG from E. coli K12) SEQ ID NO: 2
MQRLFLLVAVMLLSGCLTAPPKEAARPTLMPRAQSYKDLTHLPAPTGKIFVSVYNIQDETGQFKPYPA
SNFSTAVPQSATAMLVTALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIM
VEGSIIGYESNVKSGGVGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVF
RFIDYQRLLEGEVGYTSNEPVMLCLMSAIETGVIFLINDGIDRGLWDLQNKAERQNDILVKYRHMSVPP
ES (>P0AEA2 (16:277); mature CsgG from E. coli K12) SEQ ID NO: 3
CLTAPPKEAARPTLMPRAQSYKDLTHLPAPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAML
VTALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIMVEGSIIGYESNVKSGG
VGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEVGYT
SNEPVMLCLMSAIETGVIFLINDGIDRGLWDLQNKAERQNDILVKYRHMSVPPES
(>P0AE98; coding sequence for WT CsgF from E. coli K12) SEQ ID
NO: 4
ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCAT
GACTTTCCAGTTCCGTAATCCAAACTTTGGTGGTAACCCAAATAATGGCGCTTTTTTATTAAATAG
CGCTCAGGCCCAAAACTCTTATAAAGATCCGAGCTATAACGATGACTTTGGTATTGAAACACCCT
CAGCGTTAGATAACTTTACTCAGGCCATCCAGTCACAAATTTTAGGTGGGCTACTGTCGAATATTA
ATACCGGTAAACCGGGCCGCATGGTGACCAACGATTATATTGTCGATATTGCCAACCGCGATGG
TCAATTGCAGTTGAACGTGACAGATCGTAAAACCGGACAAACCTCGACCATCCAGGTTTCGGGTT
TACAAAATAACTCAACCGATTTT (>P0AE98 (1:138); WT pre CsgF from E.
coli K12) SEQ ID NO: 5
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPS
ALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNN
STDF (>P0AE98 (20:138); WT mature CsgF from E. coli K12) SEQ ID
NO: 6
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSALDNFTQAIQSQILGGLLSNI
NTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTDF (>P0AE98;
coding sequence for CsgF 1:27_6His) SEQ ID NO: 7
ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCAT
GACTTTCCAGTTCCGTCATCACCATCACCATCACTAAGCCC (>P0AE98 (1:28);
preprotein of CsgF 20:27_6His) SEQ ID NO: 8 MRVKHAVVLLMLISPLSWA
GTMTFQFRHHHHHH (>P0AE98; coding sequence for CsgF 1:38_6His) SEQ
ID NO: 9
ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCAT
GACTTTCCAGTTCCGTAATCCAAACTTTGGTGGTAACCCAAATAATGGCCATCACCATCACCATC
ACTAAGCCC (>P0AE98 (1:39); preprotein of CsgF 20:38_6His) SEQ ID
NO: 10 MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNG HHHHHH (>P0AE98;
coding sequence for CsgF 1:48_6His) SEQ ID NO: 11
ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCAT
GACTTTCCAGTTCCGTAATCCAAACTTTGGTGGTAACCCAAATAATGGCGCTTTTTTATTAAATAG
CGCTCAGGCCCAACATCACCATCACCATCACTAAGCCC (>P0AE98 (1:49);
preprotein of CsgF 20:48_6His) SEQ ID NO: 12
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQ HHHHHH
(>P0AE98; coding sequence for CsgF 1:64_6His) SEQ ID NO: 13
ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCAT
GACTTTCCAGTTCCGTAATCCAAACTTTGGTGGTAACCCAAATAATGGCGCTTTTTTATTAAATAG
CGCTCAGGCCCAAAACTCTTATAAAGATCCGAGCTATAACGATGACTTTGGTATTGAAACA
CATCACCATCACCATCACTAAGCCC (>P0AE98 (1:65); preprotein of CsgF
20:64_6His) SEQ ID NO: 14
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETHH
HHHH (>P0AE98 (20:53); mature peptide of CsgF 20:53) SEQ ID NO:
15 GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKD (>P0AE98 (20:42); mature
peptide of CsgF 20:42 + KD) SEQ ID NO: 16 GTMTFQFRNPNFGGNPNNGAFLLKD
(>Q88H88_PSEPK (23:55)) SEQ ID NO: 17
TELVYTPVNPAFGGNPLNGTWLLNNAQAQNDY (>A0A143HJA0_9GAMM (25:57)) SEQ
ID NO: 18 TELIYEPVNPNFGGNPLNGSYLLNNAQAQDRH (>Q5E245_VIBF1
(21:53)) SEQ ID NO: 19 SELVYTPVNPNFGGNPLNTSHLFGGANAINDY
(>Q084E5_SHEFN (19:51)) SEQ ID NO: 20
TQLVYTPVNPAFGGSYLNGSYLLANASAQNEH (>F0LZU2_VIBFN (15:47)) SEQ ID
NO: 21 SSLVYEPVNPTFGGNPLNTTHLFSRAEAINDY (>A0A136HQR0_9ALTE
(26:58)) SEQ ID NO: 22 TELVYEPINPSFGGNPLNGSFLLSKANSQNAH
(>A0A0W1SRL3_9GAMM (21:53)) SEQ ID NO: 23
TEIVYQPINPSFGGNPMNGSFLLQKAQSQNAH (>B0UH01_METS4 (26:59)) SEQ ID
NO: 24 SSLVYQPVNPAFGGPQLNGSWLQAEANAQNIPQ (>Q6NAU5_RHOPA (22:53))
SEQ ID NO: 25 GSLVYTPTNPAFGGSPLNGSWQMQQATAGNH (>G8PUY5_PSEUV
(7:38)) SEQ ID NO: 26 QQLIYQPTNPSFGGYAANTTHLFATANAQKTA
(>A0A0S2ETP7_9RHIZ (25:57)) SEQ ID NO: 27
GDLVYTPVNPSFGGSPLNSAHLLSIAGAQKNA (>E3I1Z1_RHOVT (19:51)) SEQ ID
NO: 28 AELGYTPVNPSFGGSPLNGSTLLSEASAQKPN (>F3Z094_DESAF (24:55))
SEQ ID NO: 29 TELVFSFTNPSFGGDPMIGNFLLNKADSQKR (>A0A176T7M2_9FLAO
(21:53)) SEQ ID NO: 30 QQLVYKSINPFFGGGDSFAYQQLLASANAQND
(>D2QPP8_SPILD (14:45)) SEQ ID NO: 31
QALVYHPNNPAFGGNTFNYQWMLSSAQAQDR (>N2IYT1_9PSED (26:58)) SEQ ID
NO: 32 TELVYTPKNPAFGGSPLNGSYLLGNAQAQNDY (>W7QHV5_9GAMM (26:58))
SEQ ID NO: 33 GQLIYQPINPSFGGDPLLGNHLLNKAQAQDTK (>D4ZLW2_SHEVD
(23:55)) SEQ ID NO: 34 TQLIYTPVNPNFGGSYLNGSYLLANASVQNDH
(>D2QT92_SPILD (21:53)) SEQ ID NO: 35
QAFVYHPNNPNFGGNTFNYSWMLSSAQAQDRT (>A0A167UJA2_9FLAO (20:51)) SEQ
ID NO: 36 QGLIYKPKNPAFGGDTFNYQWLASSAESQNK (>P0AE98 (20:28);
mature peptide of CsgF 20:27) SEQ ID NO: 37 GTMTFQFR (>P0AE98
(20:39); mature peptide of CsgF 20:38) SEQ ID NO: 38
GTMTFQFRNPNFGGNPNNG (>P0AE98 (20:49); mature peptide of CsgF
20:48) SEQ ID NO: 39 GTMTFQFRNPNFGGNPNNGAFLLNSAQAQ (>P0AE98
(20:65); mature peptide of CsgF 20:64) SEQ ID NO: 40
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIET (CsgF_d27_end) SEQ ID
NO: 41 ACGGAACTGGAAAGTCATGGTTCC (CsgF_d38_end) SEQ ID NO: 42
GCCATTATTTGGGTTACCACCAAAGTTTGG (CsgF_d48_end) SEQ ID NO: 43
TTGGGCCTGAGCGCTATTTAATAAAAAAGC (CsgF_d64_end) SEQ ID NO: 44
TGTTTCAATACCAAAGTCATCGTTATAGCTCGG
(pNa62_CsgF_histag_Fw) SEQ ID NO: 45 CATCACCATCACCATCACTAAGCCC
(CsgF-His_pET22b_FW) SEQ ID NO: 46 CCCCCATATGGGAACCATGACTTTCCAGTTCC
SEQ ID NO: 47: (CsgF-His_pET22b_Rev)
CCCCGAATTCCTAATGGTGATGGTGATGGTGGTAAAAATCGGTTGAGTTATTTTG SEQ ID NO:
48: (csgEFG_pDONR221_FW)
GGGGACAAGTTTGTACAAAAAAGCAGGCTACCTCAGGCGATAAAGCCATGAAACGTTA SEQ ID
NO: 49: (csgEFG_pDONR221_Rev)
GGGGACCACTTTGTACAAGAAAGCTGGGTGTTTAAACTCATTTTTCGAACTGCGGGTGGCTCCAA
GCGCTGG SEQ ID NO: 50: (Mut_csgF_His_FW)
CAAAATAACTCAACCGATTTTCATCACCATCACCATCACTAAGCCCCAGCTTCATAAGG SEQ ID
NO: 51: (Mut_csgF_His_Rev)
CCTTATGAAGCTGGGGCTTAGTGATGGTGATGGTGATGAAAATCGGTTGAGTTATTTTG SEQ ID
NO: 52: (DelCsgE_Rev) AGCCTGCTTTTTTGTACAAAC SEQ ID NO: 53: (DelCsgE
FW) ATAAAAAATTGTTCGGAGGCTGC (>P0AE98 (20:50); mature peptide of
CsgF 1:30) SEQ ID NO: 54 GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN (>P0AE98
(20:54); mature peptide of CsgF 1:35) SEQ ID NO: 55
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDP
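The listing above mixes two numbering schemes: fragment headers such as "CsgF 20:48" count from the first residue of the preprotein, whereas the cleavage-site constructs in the sequence descriptions (for example "between residues 35 and 36 of the mature protein") count from the first residue after the 19-residue signal peptide. The sketch below shows both, using the SEQ ID NO:5 sequence copied from the listing; the helper names are illustrative and not part of the disclosure.

```python
# WT pre-CsgF from E. coli K12 (SEQ ID NO:5 in the listing).
PREPRO_CSGF = (
    "MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPS"
    "ALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNN"
    "STDF"
)
SIGNAL_LENGTH = 19  # residues 1-19 form the signal peptide
TEV_SITE = "ENLYFQS"


def residues(seq, first, last):
    """Return residues first..last using 1-based, inclusive numbering."""
    return seq[first - 1:last]


def insert_site(seq, after_residue, site):
    """Insert `site` between residue `after_residue` and the next residue."""
    return seq[:after_residue] + site + seq[after_residue:]


# Preprotein numbering: mature protein and the 20:48 fragment.
mature = residues(PREPRO_CSGF, SIGNAL_LENGTH + 1, len(PREPRO_CSGF))
fragment_20_48 = residues(PREPRO_CSGF, 20, 48)

# Mature numbering: TEV site between P35 and S36, signal peptide re-attached.
tev_construct = PREPRO_CSGF[:SIGNAL_LENGTH] + insert_site(mature, 35, TEV_SITE)

print(fragment_20_48)
print(tev_construct[:65])
```

The 20:48 slice reproduces the SEQ ID NO:39 entry, and the junction produced by the insertion (KDP, then ENLYFQS, then SYND) is the one seen in the P35-TEV-S36 constructs. Tags such as StrepII or H10 are deliberately left out of the sketch.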
Examples of CsgF sequences with protease cleavage sites made into
proteins. The signal peptide is shown in bold, the TEV protease
cleavage site in bold and underline, and the HCV C3 protease
cleavage site in underline. StrepII indicates the Strep tag at the
C terminus, H10 indicates the 10.times.Histidine tag at the C
terminus and ** indicates STOP codons.
TABLE-US-00007 Pro-CsgF-Eco-(WT-T4C/N17S/P35-TEV-S36)-StrepII SEQ
ID NO: 56
MRVKHAVVLLMLISPLSWAGTMCFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDPENLYFQSSYND
DFGIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQ
VSGLQNNSTDFSAWSHPQFEK**
Pro-CsgF-Eco-(WT-N17S-Del(P35-[TEV]-S36)-StrepII SEQ ID NO: 57
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDPENLYFQSSYND
DFGIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQ
VSGLQNNSTDFSAWSHPQFEK**
Pro-CsgF-Eco-(WT-G1C/N17S/P35-[TEV]-S36)-StrepII SEQ ID NO: 58
MRVKHAVVLLMLISPLSWACTMTFQFRNPNFGGNPSNGAFLLNSAQAQNSYKDPENLYFQSSYND
DFGIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQ
VSGLQNNSTDFSAWSHPQFEK** Pro-CsgF-Eco-(WT-G1C/P35-[TEV]-S36)-StrepII
SEQ ID NO: 59
MRVKHAVVLLMLISPLSWACTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPENLYFQSSYND
DFGIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQ
VSGLQNNSTDFSAWSHPQFEK** Pro-CsgF-Eco-(WT-T45-TEV-P46)-H10 SEQ ID
NO: 60
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETEN
LYFQSPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQV
SGLQNNSTDFHHHHHHHHHH** Pro-CsgF-Eco-(WT-P35-TEV-S36)-H10 SEQ ID NO:
61
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPENLYFQSSYND
DFGIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQ
VSGLQNNSTDFHHHHHHHHHH** Pro-CsgF-Eco-(WT-N30-TEV-S31)-H10 SEQ ID
NO: 62
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNENLYFQSSYKDPSYND
DFGIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQ
VSGLQNNSTDFHHHHHHHHHH** Pro-CsgF-Eco-(WT-T45-TEV-F51)-H10 SEQ ID
NO: 63
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETEN
LYFQSFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQN
NSTDFHHHHHHHHHH** Pro-CsgF-Eco-(WT-N30-TEV-Y37)-H10 SEQ ID NO: 64
MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNENLYFQSYNDDFGIETP
SALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQN
NSTDFHHHHHHHHHH** Pro-CsgF-Eco-(WT-D34-[C3]-S36) SEQ ID NO: 65
MRVKHAVVLLMLISPLSWACTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDLEVLFQGPSYND
DFGIETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQ
VSGLQNNSTDFSAWSHPQFEK** Pro-CsgF-Eco-(WT-I42-[C3]-E43) SEQ ID NO:
66
MRVKHAVVLLMLISPLSWACTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGILEVL
FQGPETPSALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQ
VSGLQNNSTDFSAWSHPQFEK** Pro-CsgF-Eco-(WT-N38-[C3]-S47) SEQ ID NO:
67
MRVKHAVVLLMLISPLSWACTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNLEVLFQGPS
ALDNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNN
STDFSAWSHPQFEK** SEQ ID NO: 68
MPRAQSYKDLTHLPMPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLVTALKDSRWFIPLE
RQGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIMVEGSIIGYESNVKSGGVGARYFGIGADT
QYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEIGYTSNEPVMLCLMS
AIETGVIFLINDGIDRGLWDLQNKAERQNDILVKYRHMSVPPES SEQ ID NO: 69
CLTAPPKQAAKPTLMPRAQSYKDLTHLPAPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAML
VTALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAMNNRIPLQSLTAANIMVEGSIIGYESNVKS
GGVGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEI
GYTSNEPVMLCLMSAIETG SEQ ID NO: 70
CLTAPPKEAAKPTLMPRAQSYKDLTHLPIPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAML
VTALKDSRWFVPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIMVEGSIIGYESNVKS
GGVGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEI
GYTSNEPVMLCLMSAIETGVIFLINDGIDRGLWDLQNKADRQNDILVKYRHMSVPPES SEQ ID
NO: 71
CLTTPPKEAAKPTLMPRAQSYKDLTHLPVPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAML
VTALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLPSLTAANIMVEGSIIGYESNVKS
GGAGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEI
GYTSNEPVMLCLMSAIETGVIFLINDGIDRGLWDLQNKADRQNDILVKYRQMSVPPES SEQ ID
NO: 72
CLTAPPKEAAKPTLMPRAQSYRDLTHLPAPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAML
VTALKDSHWFIPLERQGLQNLLNERKIIRAAQENGTVANNNRMPLQSLAAANVMIEGSIIGYESNVKS
GGVGARYFGIGADTQYQLDQIAVNLRVVNVSTGEVLSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEI
GYTSNEPVMMCLMSAIETGVIFLINDGIDRGLWDLQNKADAQNPVLVKYRDMSVPPES SEQ ID
NO: 73
CLTAPPKEAAKPTLMPRAQSYRDLTHLPLPSGKVFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLVTALKDS
RWFVPLERQGLQNLLNERKIIRAAQENGTVADNNRIPLQSLTAANVMIEGSIIGYESNVKSGGVGARYFGIGADT
QYQLDQIAVNLRVVNVSTGEVLSSVNTSKTILSYEVQAGVFRFVDYQRLLEGEIGYTSNEPVMLCLMSAIETGVI
YLINDGIERGLWDLQQKADVDNPILARYRNMSAPPES SEQ ID NO: 74
CLTAPPKEAAKPTLMPRAQSYRDLTNLPDPKGKLFVSVYNIQDETGQFKPYPASNFSTAVPQSATSMLVTALKDS
RWFIPLERQGLQNLLNERKIIRAAQENGTVAENNRMPLQSLVAANVMIEGSIIGYESNVKSGGVGARYFGIGGDT
QYQLDQIAVNLRVVNVSTGEVLSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEIGYTANEPVMLCLMSAIETGVI
HLINDGINRGLWELKNKGDAKNTILAKYRSMAVPPES SEQ ID NO: 75
CLTAAPKEAARPTLLPRAPSYTDLTHLPSPQGRIFVSVYNIQDETGQFKPYPACNFSTAVPQSATAMLVSALKDS
KWFIPLERQGLQNLLNERKIIRAAQENGSVAINNQRPLSSLVAANILIEGSIIGYESNVKSGGVGARYFGIGAST
QYQLDQIAVNLRAVDVNTGEVLSSVNTSKTILSYEVQAGVFRFIDYQRLLEGELGYTTNEPVMLCLMSAIESGVI
YLVNDGIERNLWQLQNPSEINSPILQRYKNNIVPAES SEQ ID NO: 76
CITSPPKQAAKPTLLPRSQSYQDLTHLPEPQGRLFVSVYNISDETGQFKPYPASNFSTSVPQSATAMLVSALKDS
NWFIPLERQGLQNLLNERKIIRAAQENGTVAVNNRTQLPSLVAANILIEGSIIGYESNVKSGGAGARYFGIGAST
QYQLDQIAVNLRVVNVSTGEVLSSVNTSKTILSYEFQAGVFRYIDYQRLLEGEVGYTVNEPVMLCLMSAIETGVI
YLVNDGISRNLWQLKNASDINSPVLEKYKSIIVP SEQ ID NO: 77
CLTAPPKQAAKPTLMPRAQSYQDLTHLPEPAGKLFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLVSALKDS
GWFIPLERQGLQNLLNERKIIRAAQENGTAAVNNQHQLSSLVAANVLVEGSIIGYESNVKSGGAGARFFGIGAST
QYQLDQIAVNLRVVDVNTGQVLSSVNTSKTILSYEVQAGVFRYIDYQRLLEGEIGYTTNEPVMLCVMSAIETGVI
YLVNDGINRNLWTLKNPQDAKSSVLERYKSTIVP SEQ ID NO: 78
CITTPPQEAAKPTLLPRDATYKDLVSLPQPRGKIYVAVYNIQDETGQFQPYPASNFSTSVPQSATAML
VSSLKDSRWFVPLERQGLNNLLNERKIIRAAQQNGTVGDNNASPLPSLYSANVIVEGSIIGYASNVKT
GGFGARYFGIGGSTQYQLDQVAVNLRIVNVHTGEVLSSVNTSKTILSYEIQAGVFRFIDYQRLLEGEA
GFTTNEPVMTCLMSAIEEGVIHLINDGINKKLWALSNAADINSEVLTRYRK SEQ ID NO: 79
ITEVPKEAAKPTLMPRASTYKDLVALPKPNGKIIVSVYSVQDETGQFKPLPASNFSTAVPQSGNAMLTSALKDSG
WFVPLEREGLQNLLNERKIIRAAQENGTVAANNQQPLPSLLSANVVIEGAIIGYDSDIKTGGAGARYFGIGADGK
YRVDQVAVNLRAVDVRTGEVLLSVNTSKTILSSELSAGVFRFIEYQRLLELEAGYTTNEPVMMCMMSALEAGVAH
LIVEGIRQNLWSLQNPSDINNPIIQRYMKEDVP SEQ ID NO: 80
PETSESPTLMQRGANYIDLISLPKPQGKIFVSVYDFRDQTGQYKPQPNSNFSTAVPQGGTALLTMALLDSEWFYP
LERQGLQNLLTERKIIRAAQKKQESISNHGSTLPSLLSANVMIEGGIVAYDSNIKTGGAGARYLGIGGSGQYRAD
QVTVNIRAVDVRSGKILTSVTTSKTILSYEVSAGAFRFVDYKELLEVELGYTNNEPVNIALMSAIDSAVIHLIVK
GVQQGLWRPANLDTRNNPIFKKY SEQ ID NO: 81
PDASESPTLMQRGATYLDLISLPKPQGKIYVSVYDFRDQTGQYKPQPNSNFSTAVPQGGTALLTMALLDSEWFYP
LERQGLQNLLTERKIIRAAQKKQESISNHGSTLPSLLSANVMIEGGIVAYDSNIKTGGAGARYLGIGGSGQYRAD
QVTVNIRAVDVRSGKILTSVTTSKTILSYELSAGAFRFVDYKELLEVELGYTNNEPVNIALMSAIDSAVIHLIVK
GIEEGLWRPENQNGKENPIFRKY SEQ ID NO: 82
PETSKEPTLMARGTAYQDLVSLPLPKGKVYVSVYDFRDQTGQYKPQPNSNFSTAVPQGGAALLTTALLDSRWFMP
LEREGLQNLLTERKIIRAAQKKDEIPTNHGVHLPSLASANIMVEGGIVAYDTNIQTGGAGARYLGVGASGQYRTD
QVTVNIRAVDVRTGRILLSVTTSKTILSKELQTGVFKFVDYKDLLEAELGYTTNEPVNLAVMSAIDAAVVHVIVD
GIKTGLWEPLRGEDLQHPIIQEYMNRSKP SEQ ID NO: 83
CATHIGSPVADEKATLMPRSVSYKELISLPKPKGKIVAAVYDFRDQTGQYLPAPASNFSTAVTQGGVAMLSTALW
DSQWFVPLEREGLQNLLTERKIVRAAQNKPNVPGNNANQLPSLVAANILIEGGIVAYDSNVRTGGAGAKYFGIGA
SGEYRVDQVTVNLRAVDIRSGRILNSVTTSKTVMSQQVQAGVFRFVEYKRLLEAEAGFSTNEPVQMCVMSAIESG
VIRLIANGVRDNLWQLADQRDIDNPILQEYLQDNAP SEQ ID NO: 84
ASSSLMPKGESYYDLINLPAPQGVMLAAVYDFRDQTGQYKPIPSSNFSTAVPQSGTAFLAQALNDSSWFIPVERE
GLQNLLTERKIVRAGLKGDANKLPQLNSAQILMEGGIVAYDTNVRTGGAGARYLGIGAATQFRVDTVTVNLRAVD
IRTGRLLSSVTTTKSILSKEITAGVFKFIDAQELLESELGYTSNEPVSLCVASAIESAVVHMIADGIWKGAWNLA
DQASGLRSPVLQKY SEQ ID NO: 85
QDSETPTLTPRASTYYDLINMPRPKGRLMAVVYGFRDQTGQYKPTPASSFSTSVTQGAASMLMDALSASGWFVVL
EREGLQNLLTERKIIRASQKKPDVAENIMGELPPLQAANLMLEGGIIAYDTNVRSGGEGARYLGIDISREYRVDQ
VTVNLRAVDVRTGQVLANVMTSKTIYSVGRSAGVFKFIEFKKLLEAEVGYTTNEPAQLCVLSAIESAVGHLLAQG
IEQRLWQV SEQ ID NO: 86
MPKSDTYYDLIGLPHPQGSMLAAVYDFRDQTGQYKAIPSSNFSTAVPQSGTAFLAQALNDSSWFVPVE
REGLQNLLTERKIVRAGLKGEANQLPQLSSAQILMEGGIVAYDTNIKTGGAGARYLGIGVNSKFRVDT
VTVNLRAVDIRTGRLLSSVTTTKSILSKEVSAGVFKFIDAQDLLESELGYTSNEPVSLCVAQAIESAV
VHMIADGIWKRAWNLADTASGLNNPVLQKY SEQ ID NO: 87
LTRRMSTYQDLIDMPAPRGKIVTAVYSFRDQSGQYKPAPSSSFSTAVTQGAAAMLVNVLNDSGWFIPLEREGLQN
ILTERKIIRAALKKDNVPVNNSAGLPSLLAANIMLEGGIVGYDSNIHTGGAGARYFGIGASEKYRVDEVTVNLRA
IDIRTGRILHSVLTSKKILSREIRSDVYRFIEFKHLLEMEAGITTNDPAQLCVLSAIESAVAHLIVDGVIKKSWS
LADPNELNSPVIQAYQQQRI SEQ ID NO: 88
PSDPERSTMGELTPSTAELRNLPLPNEKIVIGVYKFRDQTGQYKPSENGNNWSTAVPQGTTTILIKALEDSRWFI
PIERENIANLLNERQIIRSTRQEYMKDADKNSQSLPPLLYAGILLEGGVISYDSNTMTGGFGARYFGIGASTQYR
QDRITIYLRAVSTLNGEILKTVYTSKTILSTSVNGSFFRYIDTERLLEAEVGLTQNEPVQLAVTEAIEKAVRSLI
IEGTRDKIW
REFERENCES
[0495] Chin J W., Martin A B., King D S., Wang L., Schultz P G.
(2002) Addition of a photocrosslinking amino acid to the genetic
code of Escherichia coli. Proc Natl Acad Sci USA 99(17):
11020-11024.
[0496] Goyal P, Van Gerven N, Jonckheere W, Remaut H. (2013)
Crystallization and preliminary X-ray crystallographic analysis of
the curli transporter CsgG. Acta Crystallogr Sect F Struct Biol
Cryst Commun. 69(Pt 12):1349-53.
[0497] Goyal P, Krasteva P V, Van Gerven N, Gubellini F, Van den
Broeck I, Troupiotis-Tsailaki A, Jonckheere W, Pehau-Arnaudet G,
Pinkner J S, Chapman M R, Hultgren S J, Howorka S, Fronzes R,
Remaut H. (2014) Structural and mechanistic insights into the
bacterial amyloid secretion channel CsgG. Nature 516(7530):250-3.
[0498] Hammar M, Arnqvist A, Bian Z, Olsén A, Normark S. (1995)
Expression of two csg operons is required for production of
fibronectin- and congo red-binding curli polymers in Escherichia
coli K-12. Mol Microbiol. 18(4):661-70.
[0499] Juncker A S, Willenbrock H, Von Heijne G, Brunak S, Nielsen
H, Krogh A. (2003) Prediction of lipoprotein signal peptides in
Gram-negative bacteria. Protein Sci. 12(8):1652-62.
[0500] Ludtke S J. (2016) Single-particle refinement and
variability analysis in EMAN2.1. Methods Enzymol. 579:159-89.
[0501] Rohou A, Grigorieff N. (2015) CTFFIND4: Fast and accurate
defocus estimation from electron micrographs. J Struct Biol.
192(2):216-21.
[0502] Robinson L S, Ashman E M, Hultgren S J, Chapman M R. (2006)
Secretion of curli fibre subunits is mediated by the outer
membrane-localized CsgG protein. Molecular Microbiology 59,
870-881.
[0503] Scheres (2012) RELION: implementation of a Bayesian
approach to cryo-EM structure determination. J. Struct. Biol.
180(3):519-30.
[0504] Wang A., Winblade Nairn N., Marelli M., Grabstein K. (2012)
Protein Engineering with Non-Natural Amino Acids. Protein
Engineering, Prof. Pravin Kaumaya (Ed.), InTech, DOI:
10.5772/28719.
[0505] Zheng S Q., Palovcak E., Armache J-P., Verba K A., Cheng Y.,
Agard D A. (2017) MotionCor2: anisotropic correction of
beam-induced
Sequence CWU 1
1
1201834DNAEscherichia coli 1atgcagcgct tatttctttt ggttgccgtc
atgttactga gcggatgctt aaccgccccg 60cctaaagaag ccgccagacc gacattaatg
cctcgtgctc agagctacaa agatttgacc 120catctgccag cgccgacggg
taaaatcttt gtttcggtat acaacattca ggacgaaacc 180gggcaattta
aaccctaccc ggcaagtaac ttctccactg ctgttccgca aagcgccacg
240gcaatgctgg tcacggcact gaaagattct cgctggttta taccgctgga
gcgccagggc 300ttacaaaacc tgcttaacga gcgcaagatt attcgtgcgg
cacaagaaaa cggcacggtt 360gccattaata accgaatccc gctgcaatct
ttaacggcgg caaatatcat ggttgaaggt 420tcgattatcg gttatgaaag
caacgtcaaa tctggcgggg ttggggcaag atattttggc 480atcggtgccg
acacgcaata ccagctcgat cagattgccg tgaacctgcg cgtcgtcaat
540gtgagtaccg gcgagatcct ttcttcggtg aacaccagta agacgatact
ttcctatgaa 600gttcaggccg gggttttccg ctttattgac taccagcgct
tgcttgaagg ggaagtgggt 660tacacctcga acgaacctgt tatgctgtgc
ctgatgtcgg ctatcgaaac aggggtcatt 720ttcctgatta atgatggtat
cgaccgtggt ctgtgggatt tgcaaaataa agcagaacgg 780cagaatgaca
ttctggtgaa ataccgccat atgtcggttc caccggaatc ctga
8342277PRTEscherichia coli 2Met Gln Arg Leu Phe Leu Leu Val Ala Val
Met Leu Leu Ser Gly Cys1 5 10 15Leu Thr Ala Pro Pro Lys Glu Ala Ala
Arg Pro Thr Leu Met Pro Arg 20 25 30Ala Gln Ser Tyr Lys Asp Leu Thr
His Leu Pro Ala Pro Thr Gly Lys 35 40 45Ile Phe Val Ser Val Tyr Asn
Ile Gln Asp Glu Thr Gly Gln Phe Lys 50 55 60Pro Tyr Pro Ala Ser Asn
Phe Ser Thr Ala Val Pro Gln Ser Ala Thr65 70 75 80Ala Met Leu Val
Thr Ala Leu Lys Asp Ser Arg Trp Phe Ile Pro Leu 85 90 95Glu Arg Gln
Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile Arg 100 105 110Ala
Ala Gln Glu Asn Gly Thr Val Ala Ile Asn Asn Arg Ile Pro Leu 115 120
125Gln Ser Leu Thr Ala Ala Asn Ile Met Val Glu Gly Ser Ile Ile Gly
130 135 140Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg Tyr
Phe Gly145 150 155 160Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln
Ile Ala Val Asn Leu 165 170 175Arg Val Val Asn Val Ser Thr Gly Glu
Ile Leu Ser Ser Val Asn Thr 180 185 190Ser Lys Thr Ile Leu Ser Tyr
Glu Val Gln Ala Gly Val Phe Arg Phe 195 200 205Ile Asp Tyr Gln Arg
Leu Leu Glu Gly Glu Val Gly Tyr Thr Ser Asn 210 215 220Glu Pro Val
Met Leu Cys Leu Met Ser Ala Ile Glu Thr Gly Val Ile225 230 235
240Phe Leu Ile Asn Asp Gly Ile Asp Arg Gly Leu Trp Asp Leu Gln Asn
245 250 255Lys Ala Glu Arg Gln Asn Asp Ile Leu Val Lys Tyr Arg His
Met Ser 260 265 270Val Pro Pro Glu Ser 2753262PRTEscherichia coli
3Cys Leu Thr Ala Pro Pro Lys Glu Ala Ala Arg Pro Thr Leu Met Pro1 5
10 15Arg Ala Gln Ser Tyr Lys Asp Leu Thr His Leu Pro Ala Pro Thr
Gly 20 25 30Lys Ile Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly
Gln Phe 35 40 45Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro
Gln Ser Ala 50 55 60Thr Ala Met Leu Val Thr Ala Leu Lys Asp Ser Arg
Trp Phe Ile Pro65 70 75 80Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu
Asn Glu Arg Lys Ile Ile 85 90 95Arg Ala Ala Gln Glu Asn Gly Thr Val
Ala Ile Asn Asn Arg Ile Pro 100 105 110Leu Gln Ser Leu Thr Ala Ala
Asn Ile Met Val Glu Gly Ser Ile Ile 115 120 125Gly Tyr Glu Ser Asn
Val Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe 130 135 140Gly Ile Gly
Ala Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn145 150 155
160Leu Arg Val Val Asn Val Ser Thr Gly Glu Ile Leu Ser Ser Val Asn
165 170 175Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val
Phe Arg 180 185 190Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Val
Gly Tyr Thr Ser 195 200 205Asn Glu Pro Val Met Leu Cys Leu Met Ser
Ala Ile Glu Thr Gly Val 210 215 220Ile Phe Leu Ile Asn Asp Gly Ile
Asp Arg Gly Leu Trp Asp Leu Gln225 230 235 240Asn Lys Ala Glu Arg
Gln Asn Asp Ile Leu Val Lys Tyr Arg His Met 245 250 255Ser Val Pro
Pro Glu Ser 2604414PRTEscherichia coli 4Ala Thr Gly Cys Gly Thr Gly
Thr Cys Ala Ala Ala Cys Ala Thr Gly1 5 10 15Cys Ala Gly Thr Ala Gly
Thr Thr Cys Thr Ala Cys Thr Cys Ala Thr 20 25 30Gly Cys Thr Thr Ala
Thr Thr Thr Cys Gly Cys Cys Ala Thr Thr Ala 35 40 45Ala Gly Thr Thr
Gly Gly Gly Cys Thr Gly Gly Ala Ala Cys Cys Ala 50 55 60Thr Gly Ala
Cys Thr Thr Thr Cys Cys Ala Gly Thr Thr Cys Cys Gly65 70 75 80Thr
Ala Ala Thr Cys Cys Ala Ala Ala Cys Thr Thr Thr Gly Gly Thr 85 90
95Gly Gly Thr Ala Ala Cys Cys Cys Ala Ala Ala Thr Ala Ala Thr Gly
100 105 110Gly Cys Gly Cys Thr Thr Thr Thr Thr Thr Ala Thr Thr Ala
Ala Ala 115 120 125Thr Ala Gly Cys Gly Cys Thr Cys Ala Gly Gly Cys
Cys Cys Ala Ala 130 135 140Ala Ala Cys Thr Cys Thr Thr Ala Thr Ala
Ala Ala Gly Ala Thr Cys145 150 155 160Cys Gly Ala Gly Cys Thr Ala
Thr Ala Ala Cys Gly Ala Thr Gly Ala 165 170 175Cys Thr Thr Thr Gly
Gly Thr Ala Thr Thr Gly Ala Ala Ala Cys Ala 180 185 190Cys Cys Cys
Thr Cys Ala Gly Cys Gly Thr Thr Ala Gly Ala Thr Ala 195 200 205Ala
Cys Thr Thr Thr Ala Cys Thr Cys Ala Gly Gly Cys Cys Ala Thr 210 215
220Cys Cys Ala Gly Thr Cys Ala Cys Ala Ala Ala Thr Thr Thr Thr
Ala225 230 235 240Gly Gly Thr Gly Gly Gly Cys Thr Ala Cys Thr Gly
Thr Cys Gly Ala 245 250 255Ala Thr Ala Thr Thr Ala Ala Thr Ala Cys
Cys Gly Gly Thr Ala Ala 260 265 270Ala Cys Cys Gly Gly Gly Cys Cys
Gly Cys Ala Thr Gly Gly Thr Gly 275 280 285Ala Cys Cys Ala Ala Cys
Gly Ala Thr Thr Ala Thr Ala Thr Thr Gly 290 295 300Thr Cys Gly Ala
Thr Ala Thr Thr Gly Cys Cys Ala Ala Cys Cys Gly305 310 315 320Cys
Gly Ala Thr Gly Gly Thr Cys Ala Ala Thr Thr Gly Cys Ala Gly 325 330
335Thr Thr Gly Ala Ala Cys Gly Thr Gly Ala Cys Ala Gly Ala Thr Cys
340 345 350Gly Thr Ala Ala Ala Ala Cys Cys Gly Gly Ala Cys Ala Ala
Ala Cys 355 360 365Cys Thr Cys Gly Ala Cys Cys Ala Thr Cys Cys Ala
Gly Gly Thr Thr 370 375 380Thr Cys Gly Gly Gly Thr Thr Thr Ala Cys
Ala Ala Ala Ala Thr Ala385 390 395 400Ala Cys Thr Cys Ala Ala Cys
Cys Gly Ala Thr Thr Thr Thr 405 4105138PRTEscherichia coli 5Met Arg
Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu1 5 10 15Ser
Trp Ala Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly 20 25
30Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln
35 40 45Asn Ser Tyr Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile Glu
Thr 50 55 60Pro Ser Ala Leu Asp Asn Phe Thr Gln Ala Ile Gln Ser Gln
Ile Leu65 70 75 80Gly Gly Leu Leu Ser Asn Ile Asn Thr Gly Lys Pro
Gly Arg Met Val 85 90 95Thr Asn Asp Tyr Ile Val Asp Ile Ala Asn Arg
Asp Gly Gln Leu Gln 100 105 110Leu Asn Val Thr Asp Arg Lys Thr Gly
Gln Thr Ser Thr Ile Gln Val 115 120 125Ser Gly Leu Gln Asn Asn Ser
Thr Asp Phe 130 1356119PRTEscherichia coli 6Gly Thr Met Thr Phe Gln
Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly Ala Phe
Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys Asp Pro Ser
Tyr Asn Asp Asp Phe Gly Ile Glu Thr Pro Ser Ala 35 40 45Leu Asp Asn
Phe Thr Gln Ala Ile Gln Ser Gln Ile Leu Gly Gly Leu 50 55 60Leu Ser
Asn Ile Asn Thr Gly Lys Pro Gly Arg Met Val Thr Asn Asp65 70 75
80Tyr Ile Val Asp Ile Ala Asn Arg Asp Gly Gln Leu Gln Leu Asn Val
85 90 95Thr Asp Arg Lys Thr Gly Gln Thr Ser Thr Ile Gln Val Ser Gly
Leu 100 105 110Gln Asn Asn Ser Thr Asp Phe 1157106DNAArtificial
Sequencefragment of wild-type E. coli CsgF encoding amino acids 1
to 27 and a C-terminal 6 His tag 7atgcgtgtca aacatgcagt agttctactc
atgcttattt cgccattaag ttgggctgga 60accatgactt tccagttccg tcatcaccat
caccatcact aagccc 106833PRTArtificial Sequencefragment of wild-type
E. coli CsgF encompassing amino acids 1 to 27 and a C-terminal 6
His tag 8Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser
Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met Thr Phe Gln Phe Arg His His
His His His 20 25 30His9139DNAArtificial Sequencefragment of
wild-type E. coli CsgF encoding amino acids 1 to 38 and a
C-terminal 6 His tag 9atgcgtgtca aacatgcagt agttctactc atgcttattt
cgccattaag ttgggctgga 60accatgactt tccagttccg taatccaaac tttggtggta
acccaaataa tggccatcac 120catcaccatc actaagccc 1391044PRTArtificial
Sequencefragment of wild-type E. coli CsgF encompassing amino acids
1 to 38 and a C-terminal 6 His tag 10Met Arg Val Lys His Ala Val
Val Leu Leu Met Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met
Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Asn Asn
Gly His His His His His His 35 4011169DNAArtificial
Sequencefragment of wild-type E. coli CsgF encoding amino acids 1
to 48 and a C-terminal 6 His tag 11atgcgtgtca aacatgcagt agttctactc
atgcttattt cgccattaag ttgggctgga 60accatgactt tccagttccg taatccaaac
tttggtggta acccaaataa tggcgctttt 120ttattaaata gcgctcaggc
ccaacatcac catcaccatc actaagccc 1691254PRTArtificial
Sequencefragment of wild-type E. coli CsgF encompassing amino acids
1 to 48 and a C-terminal 6 His tag 12Met Arg Val Lys His Ala Val
Val Leu Leu Met Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met
Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Asn Asn
Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln 35 40 45His His His His
His His 5013217DNAArtificial Sequencefragment of wild-type E. coli
CsgF encoding amino acids 1 to 64 and a C-terminal 6 His tag
13atgcgtgtca aacatgcagt agttctactc atgcttattt cgccattaag ttgggctgga
60accatgactt tccagttccg taatccaaac tttggtggta acccaaataa tggcgctttt
120ttattaaata gcgctcaggc ccaaaactct tataaagatc cgagctataa
cgatgacttt 180ggtattgaaa cacatcacca tcaccatcac taagccc
2171470PRTArtificial Sequencefragment of wild-type E. coli CsgF
encompassing amino acids 1 to 64 and a C-terminal 6 His tag 14Met
Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu1 5 10
15Ser Trp Ala Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly
20 25 30Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala
Gln 35 40 45Asn Ser Tyr Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile
Glu Thr 50 55 60His His His His His His65 701534PRTArtificial
Sequenceresidues 20 to 53 of E. coli CsgF 15Gly Thr Met Thr Phe Gln
Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly Ala Phe
Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr 20 25 30Lys
Asp1625PRTArtificial Sequenceresidues 20 to 42 of E. coli CsgF,
including KD at its C-terminus 16Gly Thr Met Thr Phe Gln Phe Arg
Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly Ala Phe Leu Leu
Lys Asp 20 251732PRTArtificial Sequenceresidues 23 to 55 of CsgF
homologue Q88H88 17Thr Glu Leu Val Tyr Thr Pro Val Asn Pro Ala Phe
Gly Gly Asn Pro1 5 10 15Leu Asn Gly Thr Trp Leu Leu Asn Asn Ala Gln
Ala Gln Asn Asp Tyr 20 25 301832PRTArtificial Sequenceresidues 25
to 57 of CsgF homologue A0A143HJA0 18Thr Glu Leu Ile Tyr Glu Pro
Val Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Leu Asn Gly Ser Tyr Leu
Leu Asn Asn Ala Gln Ala Gln Asp Arg His 20 25 301932PRTArtificial
Sequenceresidues 21 to 53 of CsgF homologue Q5E245 19Ser Glu Leu
Val Tyr Thr Pro Val Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Leu Asn
Thr Ser His Leu Phe Gly Gly Ala Asn Ala Ile Asn Asp Tyr 20 25
302032PRTArtificial Sequenceresidues 19 to 51 of CsgF homologue
Q084E5 20Thr Gln Leu Val Tyr Thr Pro Val Asn Pro Ala Phe Gly Gly
Ser Tyr1 5 10 15Leu Asn Gly Ser Tyr Leu Leu Ala Asn Ala Ser Ala Gln
Asn Glu His 20 25 302132PRTArtificial Sequenceresidues 15 to 47 of
CsgF homologue F0LZU2 21Ser Ser Leu Val Tyr Glu Pro Val Asn Pro Thr
Phe Gly Gly Asn Pro1 5 10 15Leu Asn Thr Thr His Leu Phe Ser Arg Ala
Glu Ala Ile Asn Asp Tyr 20 25 302232PRTArtificial Sequenceresidues
26 to 58 of CsgF homologue A0A136HQR0 22Thr Glu Leu Val Tyr Glu Pro
Ile Asn Pro Ser Phe Gly Gly Asn Pro1 5 10 15Leu Asn Gly Ser Phe Leu
Leu Ser Lys Ala Asn Ser Gln Asn Ala His 20 25 302332PRTArtificial
Sequenceresidues 21 to 53 of CsgF homologue A0A0W1SRL3 23Thr Glu
Ile Val Tyr Gln Pro Ile Asn Pro Ser Phe Gly Gly Asn Pro1 5 10 15Met
Asn Gly Ser Phe Leu Leu Gln Lys Ala Gln Ser Gln Asn Ala His 20 25
302433PRTArtificial Sequenceresidues 26 to 59 of CsgF homologue
B0UH01 24Ser Ser Leu Val Tyr Gln Pro Val Asn Pro Ala Phe Gly Gly
Pro Gln1 5 10 15Leu Asn Gly Ser Trp Leu Gln Ala Glu Ala Asn Ala Gln
Asn Ile Pro 20 25 30Gln2531PRTArtificial Sequenceresidues 22 to 53
of CsgF homologue Q6NAU5 25Gly Ser Leu Val Tyr Thr Pro Thr Asn Pro
Ala Phe Gly Gly Ser Pro1 5 10 15Leu Asn Gly Ser Trp Gln Met Gln Gln
Ala Thr Ala Gly Asn His 20 25 302632PRTArtificial Sequenceresidues
7 to 38 of CsgF homologue G8PUY5 26Gln Gln Leu Ile Tyr Gln Pro Thr
Asn Pro Ser Phe Gly Gly Tyr Ala1 5 10 15Ala Asn Thr Thr His Leu Phe
Ala Thr Ala Asn Ala Gln Lys Thr Ala 20 25 302732PRTArtificial
Sequenceresidues 25 to 57 of CsgF homologue A0A0S2ETP7 27Gly Asp
Leu Val Tyr Thr Pro Val Asn Pro Ser Phe Gly Gly Ser Pro1 5 10 15Leu
Asn Ser Ala His Leu Leu Ser Ile Ala Gly Ala Gln Lys Asn Ala 20 25
302832PRTArtificial Sequenceresidues 19 to 51 of CsgF homologue
E3I1Z1 28Ala Glu Leu Gly Tyr Thr Pro Val Asn Pro Ser Phe Gly Gly
Ser Pro1 5 10 15Leu Asn Gly Ser Thr Leu Leu Ser Glu Ala Ser Ala Gln
Lys Pro Asn 20 25 302931PRTArtificial Sequenceresidues 24 to 55 of
CsgF homologue F3Z094 29Thr Glu Leu Val Phe Ser Phe Thr Asn Pro Ser
Phe Gly Gly Asp Pro1 5 10 15Met Ile Gly Asn Phe Leu Leu Asn Lys Ala
Asp Ser Gln Lys Arg 20 25 303032PRTArtificial Sequenceresidues 21
to 53 of CsgF homologue A0A176T7M2 30Gln Gln Leu Val Tyr Lys Ser
Ile Asn Pro Phe Phe
Gly Gly Gly Asp1 5 10 15Ser Phe Ala Tyr Gln Gln Leu Leu Ala Ser Ala
Asn Ala Gln Asn Asp 20 25 303131PRTArtificial Sequenceresidues 14
to 45 of CsgF homologue D2QPP8 31Gln Ala Leu Val Tyr His Pro Asn
Asn Pro Ala Phe Gly Gly Asn Thr1 5 10 15Phe Asn Tyr Gln Trp Met Leu
Ser Ser Ala Gln Ala Gln Asp Arg 20 25 303232PRTArtificial
Sequenceresidues 28 to 58 of CsgF homologue N2IYT1 32Thr Glu Leu
Val Tyr Thr Pro Lys Asn Pro Ala Phe Gly Gly Ser Pro1 5 10 15Leu Asn
Gly Ser Tyr Leu Leu Gly Asn Ala Gln Ala Gln Asn Asp Tyr 20 25
303332PRTArtificial Sequenceresidues 26 to 58 of CsgF homologue
W7QHV5 33Gly Gln Leu Ile Tyr Gln Pro Ile Asn Pro Ser Phe Gly Gly
Asp Pro1 5 10 15Leu Leu Gly Asn His Leu Leu Asn Lys Ala Gln Ala Gln
Asp Thr Lys 20 25 303432PRTArtificial Sequenceresidues 23 to 55 of
CsgF homologue D4ZLW2 34Thr Gln Leu Ile Tyr Thr Pro Val Asn Pro Asn
Phe Gly Gly Ser Tyr1 5 10 15Leu Asn Gly Ser Tyr Leu Leu Ala Asn Ala
Ser Val Gln Asn Asp His 20 25 303532PRTArtificial Sequenceresidues
21 to 53 of CsgF homologue D2QT92 35Gln Ala Phe Val Tyr His Pro Asn
Asn Pro Asn Phe Gly Gly Asn Thr1 5 10 15Phe Asn Tyr Ser Trp Met Leu
Ser Ser Ala Gln Ala Gln Asp Arg Thr 20 25 303631PRTArtificial
Sequenceresidues 20 to 51 of CsgF homologue A0A167UJA2 36Gln Gly
Leu Ile Tyr Lys Pro Lys Asn Pro Ala Phe Gly Gly Asp Thr1 5 10 15Phe
Asn Tyr Gln Trp Leu Ala Ser Ser Ala Glu Ser Gln Asn Lys 20 25
30378PRTArtificial Sequenceresidues 20 to 27 of wild-type E. coli
CsgF 37Gly Thr Met Thr Phe Gln Phe Arg1 53819PRTArtificial
Sequenceresidues 20 to 38 of wild-type E. coli CsgF 38Gly Thr Met
Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn
Gly3929PRTArtificial Sequenceresidues 20 to 48 of wild-type E. coli
CsgF 39Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn
Pro1 5 10 15Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln 20
254045PRTArtificial Sequenceresidues 20 to 64 of wild-type E. coli
CsgF 40Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn
Pro1 5 10 15Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn
Ser Tyr 20 25 30Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile Glu Thr
35 40 454124DNAArtificial Sequenceprimer CsgF_d27_end 41acggaactgg
aaagtcatgg ttcc 244230DNAArtificial Sequenceprimer CsgF_d38_end
42gccattattt gggttaccac caaagtttgg 304330DNAArtificial
Sequenceprimer CsgF_d48_end 43ttgggcctga gcgctattta ataaaaaagc
304433DNAArtificial Sequenceprimer CsgF_d64_end 44tgtttcaata
ccaaagtcat cgttatagct cgg 334525DNAArtificial Sequenceprimer
pNa62_CsgF_histag_Fw 45catcaccatc accatcacta agccc
254632DNAArtificial Sequenceprimer CsgF-His_pET22b_FW 46cccccatatg
ggaaccatga ctttccagtt cc 324755DNAArtificial Sequenceprimer
CsgF-His_pET22b_Rev 47ccccgaattc ctaatggtga tggtgatggt ggtaaaaatc
ggttgagtta ttttg 554858DNAArtificial Sequenceprimer
csgEFG_pDONR221_FW 48ggggacaagt ttgtacaaaa aagcaggcta cctcaggcga
taaagccatg aaacgtta 584972DNAArtificial Sequenceprimer
csgEFG_pDONR221_Rev 49ggggaccact ttgtacaaga aagctgggtg tttaaactca
tttttcgaac tgcgggtggc 60tccaagcgct gg 725059DNAArtificial
Sequenceprimer Mut_csgF_His_FW 50caaaataact caaccgattt tcatcaccat
caccatcact aagccccagc ttcataagg 595159DNAArtificial Sequenceprimer
Mut_csgF_His_Rev 51ccttatgaag ctggggctta gtgatggtga tggtgatgaa
aatcggttga gttattttg 595221DNAArtificial Sequenceprimer DelCsgE_Rev
52agcctgcttt tttgtacaaa c 215323DNAArtificial Sequenceprimer
DelCsgE FW 53ataaaaaatt gttcggaggc tgc 235430PRTArtificial
Sequenceresidues 1 to 29 of mature E. coli CsgF 54Gly Thr Met Thr
Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1 5 10 15Asn Asn Gly
Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn 20 25
305535PRTArtificial Sequenceresidues 1 to 45 of mature E. coli CsgF
55Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro1
5 10 15Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser
Tyr 20 25 30Lys Asp Pro 3556155PRTArtificial Sequencea mutated
(T4C/N17S) CsgF sequence with a signal sequence, and a TEV protease
cleavage site (ENLYFQS) inserted between residues 35 and 36 of
sequence of the mature protein 56Met Arg Val Lys His Ala Val Val
Leu Leu Met Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met Cys
Phe Gln Phe Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Ser Asn Gly
Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln 35 40 45Asn Ser Tyr Lys Asp
Pro Glu Asn Leu Tyr Phe Gln Ser Ser Tyr Asn 50 55 60Asp Asp Phe Gly
Ile Glu Thr Pro Ser Ala Leu Asp Asn Phe Thr Gln65 70 75 80Ala Ile
Gln Ser Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr 85 90 95Gly
Lys Pro Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala 100 105
110Asn Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly
115 120 125Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser
Thr Asp 130 135 140Phe Ser Ala Trp Ser His Pro Gln Phe Glu Lys145
150 15557155PRTArtificial Sequencea mutated (N17S-Del) CsgF
sequence with a signal sequence, and a TEV protease cleavage site
(ENLYFQS) inserted between residues 35 and 36 of sequence of the
mature protein 57Met Arg Val Lys His Ala Val Val Leu Leu Met Leu
Ile Ser Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met Thr Phe Gln Phe Arg
Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Ser Asn Gly Ala Phe Leu Leu
Asn Ser Ala Gln Ala Gln 35 40 45Asn Ser Tyr Lys Asp Pro Glu Asn Leu
Tyr Phe Gln Ser Ser Tyr Asn 50 55 60Asp Asp Phe Gly Ile Glu Thr Pro
Ser Ala Leu Asp Asn Phe Thr Gln65 70 75 80Ala Ile Gln Ser Gln Ile
Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr 85 90 95Gly Lys Pro Gly Arg
Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala 100 105 110Asn Arg Asp
Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly 115 120 125Gln
Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp 130 135
140Phe Ser Ala Trp Ser His Pro Gln Phe Glu Lys145 150
15558155PRTArtificial Sequencea mutated (G1C/N17S) CsgF sequence
with a signal sequence, and a TEV protease cleavage site (ENLYFQS)
inserted between residues 35 and 36 of sequence of the mature
protein 58Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser
Pro Leu1 5 10 15Ser Trp Ala Cys Thr Met Thr Phe Gln Phe Arg Asn Pro
Asn Phe Gly 20 25 30Gly Asn Pro Ser Asn Gly Ala Phe Leu Leu Asn Ser
Ala Gln Ala Gln 35 40 45Asn Ser Tyr Lys Asp Pro Glu Asn Leu Tyr Phe
Gln Ser Ser Tyr Asn 50 55 60Asp Asp Phe Gly Ile Glu Thr Pro Ser Ala
Leu Asp Asn Phe Thr Gln65 70 75 80Ala Ile Gln Ser Gln Ile Leu Gly
Gly Leu Leu Ser Asn Ile Asn Thr 85 90 95Gly Lys Pro Gly Arg Met Val
Thr Asn Asp Tyr Ile Val Asp Ile Ala 100 105 110Asn Arg Asp Gly Gln
Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly 115 120 125Gln Thr Ser
Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp 130 135 140Phe
Ser Ala Trp Ser His Pro Gln Phe Glu Lys145 150
15559155PRTArtificial Sequencea mutated (G1C) CsgF sequence with a
signal sequence, and a TEV protease cleavage site (ENLYFQS)
inserted between residues 35 and 36 of sequence of the mature
protein 59Met Arg Val Lys His Ala Val Val Leu Leu Met Leu Ile Ser
Pro Leu1 5 10 15Ser Trp Ala Cys Thr Met Thr Phe Gln Phe Arg Asn Pro
Asn Phe Gly 20 25 30Gly Asn Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser
Ala Gln Ala Gln 35 40 45Asn Ser Tyr Lys Asp Pro Glu Asn Leu Tyr Phe
Gln Ser Ser Tyr Asn 50 55 60Asp Asp Phe Gly Ile Glu Thr Pro Ser Ala
Leu Asp Asn Phe Thr Gln65 70 75 80Ala Ile Gln Ser Gln Ile Leu Gly
Gly Leu Leu Ser Asn Ile Asn Thr 85 90 95Gly Lys Pro Gly Arg Met Val
Thr Asn Asp Tyr Ile Val Asp Ile Ala 100 105 110Asn Arg Asp Gly Gln
Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly 115 120 125Gln Thr Ser
Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp 130 135 140Phe
Ser Ala Trp Ser His Pro Gln Phe Glu Lys145 150
15560155PRTArtificial Sequencea CsgF sequence with a signal
sequence, a TEV protease cleavage site (ENLYFQS) inserted between
residues 45 and 46 of sequence of the mature protein, and a His10
tag at the C-terminus 60Met Arg Val Lys His Ala Val Val Leu Leu Met
Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met Thr Phe Gln Phe
Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Asn Asn Gly Ala Phe Leu
Leu Asn Ser Ala Gln Ala Gln 35 40 45Asn Ser Tyr Lys Asp Pro Ser Tyr
Asn Asp Asp Phe Gly Ile Glu Thr 50 55 60Glu Asn Leu Tyr Phe Gln Ser
Pro Ser Ala Leu Asp Asn Phe Thr Gln65 70 75 80Ala Ile Gln Ser Gln
Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr 85 90 95Gly Lys Pro Gly
Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala 100 105 110Asn Arg
Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly 115 120
125Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
130 135 140Phe His His His His His His His His His His145 150
15561155PRTArtificial Sequencea CsgF sequence with a signal
sequence, a TEV protease cleavage site (ENLYFQS) inserted between
residues 35 and 36 of sequence of the mature protein, and a His10
tag at the C-terminus 61Met Arg Val Lys His Ala Val Val Leu Leu Met
Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met Thr Phe Gln Phe
Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Asn Asn Gly Ala Phe Leu
Leu Asn Ser Ala Gln Ala Gln 35 40 45Asn Ser Tyr Lys Asp Pro Glu Asn
Leu Tyr Phe Gln Ser Ser Tyr Asn 50 55 60Asp Asp Phe Gly Ile Glu Thr
Pro Ser Ala Leu Asp Asn Phe Thr Gln65 70 75 80Ala Ile Gln Ser Gln
Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr 85 90 95Gly Lys Pro Gly
Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala 100 105 110Asn Arg
Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly 115 120
125Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
130 135 140Phe His His His His His His His His His His145 150
15562155PRTArtificial Sequencea CsgF sequence with a signal
sequence, a TEV protease cleavage site (ENLYFQS) inserted between
residues 30 and 31 of sequence of the mature protein, and a His10
tag at the C-terminus. 62Met Arg Val Lys His Ala Val Val Leu Leu
Met Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met Thr Phe Gln
Phe Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Asn Asn Gly Ala Phe
Leu Leu Asn Ser Ala Gln Ala Gln 35 40 45Asn Glu Asn Leu Tyr Phe Gln
Ser Ser Tyr Lys Asp Pro Ser Tyr Asn 50 55 60Asp Asp Phe Gly Ile Glu
Thr Pro Ser Ala Leu Asp Asn Phe Thr Gln65 70 75 80Ala Ile Gln Ser
Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr 85 90 95Gly Lys Pro
Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala 100 105 110Asn
Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly 115 120
125Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
130 135 140Phe His His His His His His His His His His145 150
15563149PRTArtificial Sequencea CsgF sequence with a signal
sequence, a TEV protease cleavage site (ENLYFQS) inserted between
residues 45 and 51 of sequence of the mature protein, and a His10
tag at the C-terminus 63Met Arg Val Lys His Ala Val Val Leu Leu Met
Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Gly Thr Met Thr Phe Gln Phe
Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Asn Asn Gly Ala Phe Leu
Leu Asn Ser Ala Gln Ala Gln 35 40 45Asn Ser Tyr Lys Asp Pro Ser Tyr
Asn Asp Asp Phe Gly Ile Glu Thr 50 55 60Glu Asn Leu Tyr Phe Gln Ser
Phe Thr Gln Ala Ile Gln Ser Gln Ile65 70 75 80Leu Gly Gly Leu Leu
Ser Asn Ile Asn Thr Gly Lys Pro Gly Arg Met 85 90 95Val Thr Asn Asp
Tyr Ile Val Asp Ile Ala Asn Arg Asp Gly Gln Leu 100 105 110Gln Leu
Asn Val Thr Asp Arg Lys Thr Gly Gln Thr Ser Thr Ile Gln 115 120
125Val Ser Gly Leu Gln Asn Asn Ser Thr Asp Phe His His His His His
130 135 140His His His His His14564149PRTArtificial Sequencea CsgF
sequence with a signal sequence, a TEV protease cleavage site
(ENLYFQS) inserted between residues 30 and 37 of sequence of the
mature protein, and a His10 tag at the C-terminus 64Met Arg Val Lys
His Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn
Pro Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln 35 40 45Asn
Glu Asn Leu Tyr Phe Gln Ser Tyr Asn Asp Asp Phe Gly Ile Glu 50 55
60Thr Pro Ser Ala Leu Asp Asn Phe Thr Gln Ala Ile Gln Ser Gln Ile65
70 75 80Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr Gly Lys Pro Gly Arg
Met 85 90 95Val Thr Asn Asp Tyr Ile Val Asp Ile Ala Asn Arg Asp Gly
Gln Leu 100 105 110Gln Leu Asn Val Thr Asp Arg Lys Thr Gly Gln Thr
Ser Thr Ile Gln 115 120 125Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
Phe His His His His His 130 135 140His His His His
His14565155PRTArtificial Sequencea CsgF sequence with a signal
sequence, an HRV 3C protease cleavage site (LEVLFQGP) inserted
between residues 34 and 36 of sequence of the mature protein, and a
His10 tag at the C-terminus 65Met Arg Val Lys His Ala Val Val Leu
Leu Met Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Cys Thr Met Thr Phe
Gln Phe Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Asn Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln 35 40 45Asn Ser Tyr Lys Asp Leu Glu Val
Leu Phe Gln Gly Pro Ser Tyr Asn 50 55 60Asp Asp Phe Gly Ile Glu Thr
Pro Ser Ala Leu Asp Asn Phe Thr Gln65 70 75 80Ala Ile Gln Ser Gln
Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn Thr 85 90 95Gly Lys Pro Gly
Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile Ala 100 105 110Asn Arg
Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr Gly 115 120
125Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn Ser Thr Asp
130 135 140Phe Ser Ala Trp Ser His Pro Gln Phe Glu Lys145 150
15566156PRTArtificial Sequencea CsgF sequence with a signal
sequence, an HRV 3C protease cleavage site (LEVLFQGP) inserted
between residues 42 and 43 of sequence of the mature protein, and a
His10 tag at the C-terminus 66Met Arg Val Lys His Ala Val Val Leu
Leu Met Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Cys Thr Met Thr Phe
Gln Phe Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro Asn Asn Gly Ala
Phe Leu Leu Asn Ser Ala Gln Ala Gln 35 40 45Asn Ser Tyr Lys Asp Pro
Ser Tyr Asn Asp Asp Phe Gly Ile Leu Glu 50 55 60Val Leu Phe Gln Gly
Pro Glu Thr Pro Ser Ala Leu Asp Asn Phe Thr65 70 75 80Gln Ala Ile
Gln Ser Gln Ile Leu Gly Gly Leu Leu Ser Asn Ile Asn 85 90 95Thr Gly
Lys Pro Gly Arg Met Val Thr Asn Asp Tyr Ile Val Asp Ile 100 105
110Ala Asn Arg Asp Gly Gln Leu Gln Leu Asn Val Thr Asp Arg Lys Thr
115 120 125Gly Gln Thr Ser Thr Ile Gln Val Ser Gly Leu Gln Asn Asn
Ser Thr 130 135 140Asp Phe Ser Ala Trp Ser His Pro Gln Phe Glu
Lys145 150 15567148PRTArtificial Sequencea CsgF sequence with a
signal sequence, an HRV 3C protease cleavage site (LEVLFQGP)
inserted between residues 38 and 47 of sequence of the mature
protein, and a His10 tag at the C-terminus 67Met Arg Val Lys His
Ala Val Val Leu Leu Met Leu Ile Ser Pro Leu1 5 10 15Ser Trp Ala Cys
Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly 20 25 30Gly Asn Pro
Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln 35 40 45Asn Ser
Tyr Lys Asp Pro Ser Tyr Asn Leu Glu Val Leu Phe Gln Gly 50 55 60Pro
Ser Ala Leu Asp Asn Phe Thr Gln Ala Ile Gln Ser Gln Ile Leu65 70 75
80Gly Gly Leu Leu Ser Asn Ile Asn Thr Gly Lys Pro Gly Arg Met Val
85 90 95Thr Asn Asp Tyr Ile Val Asp Ile Ala Asn Arg Asp Gly Gln Leu
Gln 100 105 110Leu Asn Val Thr Asp Arg Lys Thr Gly Gln Thr Ser Thr
Ile Gln Val 115 120 125Ser Gly Leu Gln Asn Asn Ser Thr Asp Phe Ser
Ala Trp Ser His Pro 130 135 140Gln Phe Glu
Lys14568248PRTCitrobacter koseri 68Met Pro Arg Ala Gln Ser Tyr Lys
Asp Leu Thr His Leu Pro Met Pro1 5 10 15Thr Gly Lys Ile Phe Val Ser
Val Tyr Asn Ile Gln Asp Glu Thr Gly 20 25 30Gln Phe Lys Pro Tyr Pro
Ala Ser Asn Phe Ser Thr Ala Val Pro Gln 35 40 45Ser Ala Thr Ala Met
Leu Val Thr Ala Leu Lys Asp Ser Arg Trp Phe 50 55 60Ile Pro Leu Glu
Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys65 70 75 80Ile Ile
Arg Ala Ala Gln Glu Asn Gly Thr Val Ala Ile Asn Asn Arg 85 90 95Ile
Pro Leu Gln Ser Leu Thr Ala Ala Asn Ile Met Val Glu Gly Ser 100 105
110Ile Ile Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg
115 120 125Tyr Phe Gly Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln
Ile Ala 130 135 140Val Asn Leu Arg Val Val Asn Val Ser Thr Gly Glu
Ile Leu Ser Ser145 150 155 160Val Asn Thr Ser Lys Thr Ile Leu Ser
Tyr Glu Val Gln Ala Gly Val 165 170 175Phe Arg Phe Ile Asp Tyr Gln
Arg Leu Leu Glu Gly Glu Ile Gly Tyr 180 185 190Thr Ser Asn Glu Pro
Val Met Leu Cys Leu Met Ser Ala Ile Glu Thr 195 200 205Gly Val Ile
Phe Leu Ile Asn Asp Gly Ile Asp Arg Gly Leu Trp Asp 210 215 220Leu
Gln Asn Lys Ala Glu Arg Gln Asn Asp Ile Leu Val Lys Tyr Arg225 230
235 240His Met Ser Val Pro Pro Glu Ser 24569223PRTSalmonella
enterica 69Cys Leu Thr Ala Pro Pro Lys Gln Ala Ala Lys Pro Thr Leu
Met Pro1 5 10 15Arg Ala Gln Ser Tyr Lys Asp Leu Thr His Leu Pro Ala
Pro Thr Gly 20 25 30Lys Ile Phe Val Ser Val Tyr Asn Ile Gln Asp Glu
Thr Gly Gln Phe 35 40 45Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala
Val Pro Gln Ser Ala 50 55 60Thr Ala Met Leu Val Thr Ala Leu Lys Asp
Ser Arg Trp Phe Ile Pro65 70 75 80Leu Glu Arg Gln Gly Leu Gln Asn
Leu Leu Asn Glu Arg Lys Ile Ile 85 90 95Arg Ala Ala Gln Glu Asn Gly
Thr Val Ala Met Asn Asn Arg Ile Pro 100 105 110Leu Gln Ser Leu Thr
Ala Ala Asn Ile Met Val Glu Gly Ser Ile Ile 115 120 125Gly Tyr Glu
Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe 130 135 140Gly
Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn145 150
155 160Leu Arg Val Val Asn Val Ser Thr Gly Glu Ile Leu Ser Ser Val
Asn 165 170 175Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly
Val Phe Arg 180 185 190Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu
Ile Gly Tyr Thr Ser 195 200 205Asn Glu Pro Val Met Leu Cys Leu Met
Ser Ala Ile Glu Thr Gly 210 215 22070262PRTCitrobacter amalonaticus
70Cys Leu Thr Ala Pro Pro Lys Glu Ala Ala Lys Pro Thr Leu Met Pro1
5 10 15Arg Ala Gln Ser Tyr Lys Asp Leu Thr His Leu Pro Ile Pro Thr
Gly 20 25 30Lys Ile Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly
Gln Phe 35 40 45Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro
Gln Ser Ala 50 55 60Thr Ala Met Leu Val Thr Ala Leu Lys Asp Ser Arg
Trp Phe Val Pro65 70 75 80Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu
Asn Glu Arg Lys Ile Ile 85 90 95Arg Ala Ala Gln Glu Asn Gly Thr Val
Ala Ile Asn Asn Arg Ile Pro 100 105 110Leu Gln Ser Leu Thr Ala Ala
Asn Ile Met Val Glu Gly Ser Ile Ile 115 120 125Gly Tyr Glu Ser Asn
Val Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe 130 135 140Gly Ile Gly
Ala Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn145 150 155
160Leu Arg Val Val Asn Val Ser Thr Gly Glu Ile Leu Ser Ser Val Asn
165 170 175Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val
Phe Arg 180 185 190Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ile
Gly Tyr Thr Ser 195 200 205Asn Glu Pro Val Met Leu Cys Leu Met Ser
Ala Ile Glu Thr Gly Val 210 215 220Ile Phe Leu Ile Asn Asp Gly Ile
Asp Arg Gly Leu Trp Asp Leu Gln225 230 235 240Asn Lys Ala Asp Arg
Gln Asn Asp Ile Leu Val Lys Tyr Arg His Met 245 250 255Ser Val Pro
Pro Glu Ser 26071262PRTCitrobacter rodentium 71Cys Leu Thr Thr Pro
Pro Lys Glu Ala Ala Lys Pro Thr Leu Met Pro1 5 10 15Arg Ala Gln Ser
Tyr Lys Asp Leu Thr His Leu Pro Val Pro Thr Gly 20 25 30Lys Ile Phe
Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe 35 40 45Lys Pro
Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro Gln Ser Ala 50 55 60Thr
Ala Met Leu Val Thr Ala Leu Lys Asp Ser Arg Trp Phe Ile Pro65 70 75
80Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile
85 90 95Arg Ala Ala Gln Glu Asn Gly Thr Val Ala Ile Asn Asn Arg Ile
Pro 100 105 110Leu Pro Ser Leu Thr Ala Ala Asn Ile Met Val Glu Gly
Ser Ile Ile 115 120 125Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Ala
Gly Ala Arg Tyr Phe 130 135 140Gly Ile Gly Ala Asp Thr Gln Tyr Gln
Leu Asp Gln Ile Ala Val Asn145 150 155 160Leu Arg Val Val Asn Val
Ser Thr Gly Glu Ile Leu Ser Ser Val Asn 165 170 175Thr Ser Lys Thr
Ile Leu Ser Tyr Glu Val Gln Ala Gly Val Phe Arg 180 185 190Phe Ile
Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ile Gly Tyr Thr Ser 195 200
205Asn Glu Pro Val Met Leu Cys Leu Met Ser Ala Ile Glu Thr Gly Val
210 215 220Ile Phe Leu Ile Asn Asp Gly Ile Asp Arg Gly Leu Trp Asp
Leu Gln225 230 235 240Asn Lys Ala Asp Arg Gln Asn Asp Ile Leu Val
Lys Tyr Arg Gln Met 245 250 255Ser Val Pro Pro Glu Ser
26072262PRTEnterobacter asburiae 72Cys Leu Thr Ala Pro Pro Lys Glu
Ala Ala Lys Pro Thr Leu Met Pro1 5 10 15Arg Ala Gln Ser Tyr Arg Asp
Leu Thr His Leu Pro Ala Pro Thr Gly 20 25 30Lys Ile Phe Val Ser Val
Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe 35 40 45Lys Pro Tyr Pro Ala
Ser Asn Phe Ser Thr Ala Val Pro Gln Ser Ala 50 55 60Thr Ala Met Leu
Val Thr Ala Leu Lys Asp Ser His Trp Phe Ile Pro65 70 75 80Leu Glu
Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile 85 90 95Arg
Ala Ala Gln Glu Asn Gly Thr Val Ala Asn Asn Asn Arg Met Pro 100 105
110Leu Gln Ser Leu Ala Ala Ala Asn Val Met Ile Glu Gly Ser Ile Ile
115 120 125Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg
Tyr Phe 130 135 140Gly Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln
Ile Ala Val Asn145 150 155 160Leu Arg Val Val Asn Val Ser Thr Gly
Glu Val Leu Ser Ser Val Asn 165 170 175Thr Ser Lys Thr Ile Leu Ser
Tyr Glu Val Gln Ala Gly Val Phe Arg 180 185 190Phe Ile Asp Tyr Gln
Arg Leu Leu Glu Gly Glu Ile Gly Tyr Thr Ser 195 200 205Asn Glu Pro
Val Met Met Cys Leu Met Ser Ala Ile Glu Thr Gly Val 210 215 220Ile
Phe Leu Ile Asn Asp Gly Ile Asp Arg Gly Leu Trp Asp Leu Gln225 230
235 240Asn Lys Ala Asp Ala Gln Asn Pro Val Leu Val Lys Tyr Arg Asp
Met 245 250 255Ser Val Pro Pro Glu Ser 26073262PRTYokenella
regensburgei 73Cys Leu Thr Ala Pro Pro Lys Glu Ala Ala Lys Pro Thr
Leu Met Pro1 5 10 15Arg Ala Gln Ser Tyr Arg Asp Leu Thr His Leu Pro
Leu Pro Ser Gly 20 25 30Lys Val Phe Val Ser Val Tyr Asn Ile Gln Asp
Glu Thr Gly Gln Phe 35 40 45Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr
Ala Val Pro Gln Ser Ala 50 55 60Thr Ala Met Leu Val Thr Ala Leu Lys
Asp Ser Arg Trp Phe Val Pro65 70 75 80Leu Glu Arg Gln Gly Leu Gln
Asn Leu Leu Asn Glu Arg Lys Ile Ile 85 90 95Arg Ala Ala Gln Glu Asn
Gly Thr Val Ala Asp Asn Asn Arg Ile Pro 100 105 110Leu Gln Ser Leu
Thr Ala Ala Asn Val Met Ile Glu Gly Ser Ile Ile 115 120 125Gly Tyr
Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe 130 135
140Gly Ile Gly Ala Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val
Asn145 150 155 160Leu Arg Val Val Asn Val Ser Thr Gly Glu Val Leu
Ser Ser Val Asn 165 170 175Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val
Gln Ala Gly Val Phe Arg 180 185 190Phe Val Asp Tyr Gln Arg Leu Leu
Glu Gly Glu Ile Gly Tyr Thr Ser 195 200 205Asn Glu Pro Val Met Leu
Cys Leu Met Ser Ala Ile Glu Thr Gly Val 210 215 220Ile Tyr Leu Ile
Asn Asp Gly Ile Glu Arg Gly Leu Trp Asp Leu Gln225 230 235 240Gln
Lys Ala Asp Val Asp Asn Pro Ile Leu Ala Arg Tyr Arg Asn Met 245 250
255Ser Ala Pro Pro Glu Ser 26074262PRTCronobacter pulveris 74Cys
Leu Thr Ala Pro Pro Lys Glu Ala Ala Lys Pro Thr Leu Met Pro1 5 10
15Arg Ala Gln Ser Tyr Arg Asp Leu Thr Asn Leu Pro Asp Pro Lys Gly
20 25 30Lys Leu Phe Val Ser Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln
Phe 35 40 45Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ala Val Pro Gln
Ser Ala 50 55 60Thr Ser Met Leu Val Thr Ala Leu Lys Asp Ser Arg Trp
Phe Ile Pro65 70 75 80Leu Glu Arg Gln Gly Leu Gln Asn Leu Leu Asn
Glu Arg Lys Ile Ile 85 90 95Arg Ala Ala Gln Glu Asn Gly Thr Val Ala
Glu Asn Asn Arg Met Pro 100 105 110Leu Gln Ser Leu Val Ala Ala Asn
Val Met Ile Glu Gly Ser Ile Ile 115 120 125Gly Tyr Glu Ser Asn Val
Lys Ser Gly Gly Val Gly Ala Arg Tyr Phe 130 135 140Gly Ile Gly Gly
Asp Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn145 150 155 160Leu
Arg Val Val Asn Val Ser Thr Gly Glu Val Leu Ser Ser Val Asn 165 170
175Thr Ser Lys Thr Ile Leu Ser Tyr Glu Val Gln Ala Gly Val Phe Arg
180 185 190Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ile Gly Tyr
Thr Ala 195 200 205Asn Glu Pro Val Met Leu Cys Leu Met Ser Ala Ile
Glu Thr Gly Val 210 215 220Ile His Leu Ile Asn Asp Gly Ile Asn Arg
Gly Leu Trp Glu Leu Lys225 230 235 240Asn Lys Gly Asp Ala Lys Asn
Thr Ile Leu Ala Lys Tyr Arg Ser Met 245 250 255Ala Val Pro Pro Glu
Ser 26075262PRTRahnella aquatilis 75Cys Leu Thr Ala Ala Pro Lys Glu
Ala Ala Arg Pro Thr Leu Leu Pro1 5 10 15Arg Ala Pro Ser Tyr Thr Asp
Leu Thr His Leu Pro Ser Pro Gln Gly 20 25 30Arg Ile Phe Val Ser Val
Tyr Asn Ile Gln Asp Glu Thr Gly Gln Phe 35 40 45Lys Pro Tyr Pro Ala
Cys Asn Phe Ser Thr Ala Val Pro Gln Ser Ala 50 55 60Thr Ala Met Leu
Val Ser Ala Leu Lys Asp Ser Lys Trp Phe Ile Pro65 70 75 80Leu Glu
Arg Gln Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile 85 90 95Arg
Ala Ala Gln Glu Asn Gly Ser Val Ala Ile Asn Asn Gln Arg Pro 100 105
110Leu Ser Ser Leu Val Ala Ala Asn Ile Leu Ile Glu Gly Ser Ile Ile
115 120 125Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Val Gly Ala Arg
Tyr Phe 130 135 140Gly Ile Gly Ala Ser Thr Gln Tyr Gln Leu Asp Gln
Ile Ala Val Asn145 150 155 160Leu Arg Ala Val Asp Val Asn Thr Gly
Glu Val Leu Ser Ser Val Asn 165 170 175Thr Ser Lys Thr Ile Leu
Ser Tyr Glu Val Gln Ala Gly Val Phe Arg 180 185 190Phe Ile Asp Tyr Gln
Arg Leu Leu Glu Gly Glu Leu Gly Tyr Thr Thr 195 200 205Asn Glu Pro
Val Met Leu Cys Leu Met Ser Ala Ile Glu Ser Gly Val 210 215 220Ile
Tyr Leu Val Asn Asp Gly Ile Glu Arg Asn Leu Trp Gln Leu Gln225 230
235 240Asn Pro Ser Glu Ile Asn Ser Pro Ile Leu Gln Arg Tyr Lys Asn
Asn 245 250 255Ile Val Pro Ala Glu Ser 26076259PRTKluyvera
ascorbata 76Cys Ile Thr Ser Pro Pro Lys Gln Ala Ala Lys Pro Thr Leu
Leu Pro1 5 10 15Arg Ser Gln Ser Tyr Gln Asp Leu Thr His Leu Pro Glu
Pro Gln Gly 20 25 30Arg Leu Phe Val Ser Val Tyr Asn Ile Ser Asp Glu
Thr Gly Gln Phe 35 40 45Lys Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ser
Val Pro Gln Ser Ala 50 55 60Thr Ala Met Leu Val Ser Ala Leu Lys Asp
Ser Asn Trp Phe Ile Pro65 70 75 80Leu Glu Arg Gln Gly Leu Gln Asn
Leu Leu Asn Glu Arg Lys Ile Ile 85 90 95Arg Ala Ala Gln Glu Asn Gly
Thr Val Ala Val Asn Asn Arg Thr Gln 100 105 110Leu Pro Ser Leu Val
Ala Ala Asn Ile Leu Ile Glu Gly Ser Ile Ile 115 120 125Gly Tyr Glu
Ser Asn Val Lys Ser Gly Gly Ala Gly Ala Arg Tyr Phe 130 135 140Gly
Ile Gly Ala Ser Thr Gln Tyr Gln Leu Asp Gln Ile Ala Val Asn145 150
155 160Leu Arg Val Val Asn Val Ser Thr Gly Glu Val Leu Ser Ser Val
Asn 165 170 175Thr Ser Lys Thr Ile Leu Ser Tyr Glu Phe Gln Ala Gly
Val Phe Arg 180 185 190Tyr Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu
Val Gly Tyr Thr Val 195 200 205Asn Glu Pro Val Met Leu Cys Leu Met
Ser Ala Ile Glu Thr Gly Val 210 215 220Ile Tyr Leu Val Asn Asp Gly
Ile Ser Arg Asn Leu Trp Gln Leu Lys225 230 235 240Asn Ala Ser Asp
Ile Asn Ser Pro Val Leu Glu Lys Tyr Lys Ser Ile 245 250 255Ile Val
Pro77259PRTHafnia alvei 77Cys Leu Thr Ala Pro Pro Lys Gln Ala Ala
Lys Pro Thr Leu Met Pro1 5 10 15Arg Ala Gln Ser Tyr Gln Asp Leu Thr
His Leu Pro Glu Pro Ala Gly 20 25 30Lys Leu Phe Val Ser Val Tyr Asn
Ile Gln Asp Glu Thr Gly Gln Phe 35 40 45Lys Pro Tyr Pro Ala Ser Asn
Phe Ser Thr Ala Val Pro Gln Ser Ala 50 55 60Thr Ala Met Leu Val Ser
Ala Leu Lys Asp Ser Gly Trp Phe Ile Pro65 70 75 80Leu Glu Arg Gln
Gly Leu Gln Asn Leu Leu Asn Glu Arg Lys Ile Ile 85 90 95Arg Ala Ala
Gln Glu Asn Gly Thr Ala Ala Val Asn Asn Gln His Gln 100 105 110Leu
Ser Ser Leu Val Ala Ala Asn Val Leu Val Glu Gly Ser Ile Ile 115 120
125Gly Tyr Glu Ser Asn Val Lys Ser Gly Gly Ala Gly Ala Arg Phe Phe
130 135 140Gly Ile Gly Ala Ser Thr Gln Tyr Gln Leu Asp Gln Ile Ala
Val Asn145 150 155 160Leu Arg Val Val Asp Val Asn Thr Gly Gln Val
Leu Ser Ser Val Asn 165 170 175Thr Ser Lys Thr Ile Leu Ser Tyr Glu
Val Gln Ala Gly Val Phe Arg 180 185 190Tyr Ile Asp Tyr Gln Arg Leu
Leu Glu Gly Glu Ile Gly Tyr Thr Thr 195 200 205Asn Glu Pro Val Met
Leu Cys Val Met Ser Ala Ile Glu Thr Gly Val 210 215 220Ile Tyr Leu
Val Asn Asp Gly Ile Asn Arg Asn Leu Trp Thr Leu Lys225 230 235
240Asn Pro Gln Asp Ala Lys Ser Ser Val Leu Glu Arg Tyr Lys Ser Thr
245 250 255Ile Val Pro78255PRTEnterobacteriaceae bacterium 78Cys
Ile Thr Thr Pro Pro Gln Glu Ala Ala Lys Pro Thr Leu Leu Pro1 5 10
15Arg Asp Ala Thr Tyr Lys Asp Leu Val Ser Leu Pro Gln Pro Arg Gly
20 25 30Lys Ile Tyr Val Ala Val Tyr Asn Ile Gln Asp Glu Thr Gly Gln
Phe 35 40 45Gln Pro Tyr Pro Ala Ser Asn Phe Ser Thr Ser Val Pro Gln
Ser Ala 50 55 60Thr Ala Met Leu Val Ser Ser Leu Lys Asp Ser Arg Trp
Phe Val Pro65 70 75 80Leu Glu Arg Gln Gly Leu Asn Asn Leu Leu Asn
Glu Arg Lys Ile Ile 85 90 95Arg Ala Ala Gln Gln Asn Gly Thr Val Gly
Asp Asn Asn Ala Ser Pro 100 105 110Leu Pro Ser Leu Tyr Ser Ala Asn
Val Ile Val Glu Gly Ser Ile Ile 115 120 125Gly Tyr Ala Ser Asn Val
Lys Thr Gly Gly Phe Gly Ala Arg Tyr Phe 130 135 140Gly Ile Gly Gly
Ser Thr Gln Tyr Gln Leu Asp Gln Val Ala Val Asn145 150 155 160Leu
Arg Ile Val Asn Val His Thr Gly Glu Val Leu Ser Ser Val Asn 165 170
175Thr Ser Lys Thr Ile Leu Ser Tyr Glu Ile Gln Ala Gly Val Phe Arg
180 185 190Phe Ile Asp Tyr Gln Arg Leu Leu Glu Gly Glu Ala Gly Phe
Thr Thr 195 200 205Asn Glu Pro Val Met Thr Cys Leu Met Ser Ala Ile
Glu Glu Gly Val 210 215 220Ile His Leu Ile Asn Asp Gly Ile Asn Lys
Lys Leu Trp Ala Leu Ser225 230 235 240Asn Ala Ala Asp Ile Asn Ser
Glu Val Leu Thr Arg Tyr Arg Lys 245 250 25579258PRTPlesiomonas
shigelloides 79Ile Thr Glu Val Pro Lys Glu Ala Ala Lys Pro Thr Leu
Met Pro Arg1 5 10 15Ala Ser Thr Tyr Lys Asp Leu Val Ala Leu Pro Lys
Pro Asn Gly Lys 20 25 30Ile Ile Val Ser Val Tyr Ser Val Gln Asp Glu
Thr Gly Gln Phe Lys 35 40 45Pro Leu Pro Ala Ser Asn Phe Ser Thr Ala
Val Pro Gln Ser Gly Asn 50 55 60Ala Met Leu Thr Ser Ala Leu Lys Asp
Ser Gly Trp Phe Val Pro Leu65 70 75 80Glu Arg Glu Gly Leu Gln Asn
Leu Leu Asn Glu Arg Lys Ile Ile Arg 85 90 95Ala Ala Gln Glu Asn Gly
Thr Val Ala Ala Asn Asn Gln Gln Pro Leu 100 105 110Pro Ser Leu Leu
Ser Ala Asn Val Val Ile Glu Gly Ala Ile Ile Gly 115 120 125Tyr Asp
Ser Asp Ile Lys Thr Gly Gly Ala Gly Ala Arg Tyr Phe Gly 130 135
140Ile Gly Ala Asp Gly Lys Tyr Arg Val Asp Gln Val Ala Val Asn
Leu145 150 155 160Arg Ala Val Asp Val Arg Thr Gly Glu Val Leu Leu
Ser Val Asn Thr 165 170 175Ser Lys Thr Ile Leu Ser Ser Glu Leu Ser
Ala Gly Val Phe Arg Phe 180 185 190Ile Glu Tyr Gln Arg Leu Leu Glu
Leu Glu Ala Gly Tyr Thr Thr Asn 195 200 205Glu Pro Val Met Met Cys
Met Met Ser Ala Leu Glu Ala Gly Val Ala 210 215 220His Leu Ile Val
Glu Gly Ile Arg Gln Asn Leu Trp Ser Leu Gln Asn225 230 235 240Pro
Ser Asp Ile Asn Asn Pro Ile Ile Gln Arg Tyr Met Lys Glu Asp 245 250
255Val Pro80248PRTVibrio fischeri 80Pro Glu Thr Ser Glu Ser Pro Thr
Leu Met Gln Arg Gly Ala Asn Tyr1 5 10 15Ile Asp Leu Ile Ser Leu Pro
Lys Pro Gln Gly Lys Ile Phe Val Ser 20 25 30Val Tyr Asp Phe Arg Asp
Gln Thr Gly Gln Tyr Lys Pro Gln Pro Asn 35 40 45Ser Asn Phe Ser Thr
Ala Val Pro Gln Gly Gly Thr Ala Leu Leu Thr 50 55 60Met Ala Leu Leu
Asp Ser Glu Trp Phe Tyr Pro Leu Glu Arg Gln Gly65 70 75 80Leu Gln
Asn Leu Leu Thr Glu Arg Lys Ile Ile Arg Ala Ala Gln Lys 85 90 95Lys
Gln Glu Ser Ile Ser Asn His Gly Ser Thr Leu Pro Ser Leu Leu 100 105
110Ser Ala Asn Val Met Ile Glu Gly Gly Ile Val Ala Tyr Asp Ser Asn
115 120 125Ile Lys Thr Gly Gly Ala Gly Ala Arg Tyr Leu Gly Ile Gly
Gly Ser 130 135 140Gly Gln Tyr Arg Ala Asp Gln Val Thr Val Asn Ile
Arg Ala Val Asp145 150 155 160Val Arg Ser Gly Lys Ile Leu Thr Ser
Val Thr Thr Ser Lys Thr Ile 165 170 175Leu Ser Tyr Glu Val Ser Ala
Gly Ala Phe Arg Phe Val Asp Tyr Lys 180 185 190Glu Leu Leu Glu Val
Glu Leu Gly Tyr Thr Asn Asn Glu Pro Val Asn 195 200 205Ile Ala Leu
Met Ser Ala Ile Asp Ser Ala Val Ile His Leu Ile Val 210 215 220Lys
Gly Val Gln Gln Gly Leu Trp Arg Pro Ala Asn Leu Asp Thr Arg225 230
235 240Asn Asn Pro Ile Phe Lys Lys Tyr 24581248PRTAliivibrio logei
81Pro Asp Ala Ser Glu Ser Pro Thr Leu Met Gln Arg Gly Ala Thr Tyr1
5 10 15Leu Asp Leu Ile Ser Leu Pro Lys Pro Gln Gly Lys Ile Tyr Val
Ser 20 25 30Val Tyr Asp Phe Arg Asp Gln Thr Gly Gln Tyr Lys Pro Gln
Pro Asn 35 40 45Ser Asn Phe Ser Thr Ala Val Pro Gln Gly Gly Thr Ala
Leu Leu Thr 50 55 60Met Ala Leu Leu Asp Ser Glu Trp Phe Tyr Pro Leu
Glu Arg Gln Gly65 70 75 80Leu Gln Asn Leu Leu Thr Glu Arg Lys Ile
Ile Arg Ala Ala Gln Lys 85 90 95Lys Gln Glu Ser Ile Ser Asn His Gly
Ser Thr Leu Pro Ser Leu Leu 100 105 110Ser Ala Asn Val Met Ile Glu
Gly Gly Ile Val Ala Tyr Asp Ser Asn 115 120 125Ile Lys Thr Gly Gly
Ala Gly Ala Arg Tyr Leu Gly Ile Gly Gly Ser 130 135 140Gly Gln Tyr
Arg Ala Asp Gln Val Thr Val Asn Ile Arg Ala Val Asp145 150 155
160Val Arg Ser Gly Lys Ile Leu Thr Ser Val Thr Thr Ser Lys Thr Ile
165 170 175Leu Ser Tyr Glu Leu Ser Ala Gly Ala Phe Arg Phe Val Asp
Tyr Lys 180 185 190Glu Leu Leu Glu Val Glu Leu Gly Tyr Thr Asn Asn
Glu Pro Val Asn 195 200 205Ile Ala Leu Met Ser Ala Ile Asp Ser Ala
Val Ile His Leu Ile Val 210 215 220Lys Gly Ile Glu Glu Gly Leu Trp
Arg Pro Glu Asn Gln Asn Gly Lys225 230 235 240Glu Asn Pro Ile Phe
Arg Lys Tyr 24582254PRTPhotobacterium sp. 82Pro Glu Thr Ser Lys Glu
Pro Thr Leu Met Ala Arg Gly Thr Ala Tyr1 5 10 15Gln Asp Leu Val Ser
Leu Pro Leu Pro Lys Gly Lys Val Tyr Val Ser 20 25 30Val Tyr Asp Phe
Arg Asp Gln Thr Gly Gln Tyr Lys Pro Gln Pro Asn 35 40 45Ser Asn Phe
Ser Thr Ala Val Pro Gln Gly Gly Ala Ala Leu Leu Thr 50 55 60Thr Ala
Leu Leu Asp Ser Arg Trp Phe Met Pro Leu Glu Arg Glu Gly65 70 75
80Leu Gln Asn Leu Leu Thr Glu Arg Lys Ile Ile Arg Ala Ala Gln Lys
85 90 95Lys Asp Glu Ile Pro Thr Asn His Gly Val His Leu Pro Ser Leu
Ala 100 105 110Ser Ala Asn Ile Met Val Glu Gly Gly Ile Val Ala Tyr
Asp Thr Asn 115 120 125Ile Gln Thr Gly Gly Ala Gly Ala Arg Tyr Leu
Gly Val Gly Ala Ser 130 135 140Gly Gln Tyr Arg Thr Asp Gln Val Thr
Val Asn Ile Arg Ala Val Asp145 150 155 160Val Arg Thr Gly Arg Ile
Leu Leu Ser Val Thr Thr Ser Lys Thr Ile 165 170 175Leu Ser Lys Glu
Leu Gln Thr Gly Val Phe Lys Phe Val Asp Tyr Lys 180 185 190Asp Leu
Leu Glu Ala Glu Leu Gly Tyr Thr Thr Asn Glu Pro Val Asn 195 200
205Leu Ala Val Met Ser Ala Ile Asp Ala Ala Val Val His Val Ile Val
210 215 220Asp Gly Ile Lys Thr Gly Leu Trp Glu Pro Leu Arg Gly Glu
Asp Leu225 230 235 240Gln His Pro Ile Ile Gln Glu Tyr Met Asn Arg
Ser Lys Pro 245 25083261PRTAeromonas veronii 83Cys Ala Thr His Ile
Gly Ser Pro Val Ala Asp Glu Lys Ala Thr Leu1 5 10 15Met Pro Arg Ser
Val Ser Tyr Lys Glu Leu Ile Ser Leu Pro Lys Pro 20 25 30Lys Gly Lys
Ile Val Ala Ala Val Tyr Asp Phe Arg Asp Gln Thr Gly 35 40 45Gln Tyr
Leu Pro Ala Pro Ala Ser Asn Phe Ser Thr Ala Val Thr Gln 50 55 60Gly
Gly Val Ala Met Leu Ser Thr Ala Leu Trp Asp Ser Gln Trp Phe65 70 75
80Val Pro Leu Glu Arg Glu Gly Leu Gln Asn Leu Leu Thr Glu Arg Lys
85 90 95Ile Val Arg Ala Ala Gln Asn Lys Pro Asn Val Pro Gly Asn Asn
Ala 100 105 110Asn Gln Leu Pro Ser Leu Val Ala Ala Asn Ile Leu Ile
Glu Gly Gly 115 120 125Ile Val Ala Tyr Asp Ser Asn Val Arg Thr Gly
Gly Ala Gly Ala Lys 130 135 140Tyr Phe Gly Ile Gly Ala Ser Gly Glu
Tyr Arg Val Asp Gln Val Thr145 150 155 160Val Asn Leu Arg Ala Val
Asp Ile Arg Ser Gly Arg Ile Leu Asn Ser 165 170 175Val Thr Thr Ser
Lys Thr Val Met Ser Gln Gln Val Gln Ala Gly Val 180 185 190Phe Arg
Phe Val Glu Tyr Lys Arg Leu Leu Glu Ala Glu Ala Gly Phe 195 200
205Ser Thr Asn Glu Pro Val Gln Met Cys Val Met Ser Ala Ile Glu Ser
210 215 220Gly Val Ile Arg Leu Ile Ala Asn Gly Val Arg Asp Asn Leu
Trp Gln225 230 235 240Leu Ala Asp Gln Arg Asp Ile Asp Asn Pro Ile
Leu Gln Glu Tyr Leu 245 250 255Gln Asp Asn Ala Pro
26084239PRTShewanella sp. 84Ala Ser Ser Ser Leu Met Pro Lys Gly Glu
Ser Tyr Tyr Asp Leu Ile1 5 10 15Asn Leu Pro Ala Pro Gln Gly Val Met
Leu Ala Ala Val Tyr Asp Phe 20 25 30Arg Asp Gln Thr Gly Gln Tyr Lys
Pro Ile Pro Ser Ser Asn Phe Ser 35 40 45Thr Ala Val Pro Gln Ser Gly
Thr Ala Phe Leu Ala Gln Ala Leu Asn 50 55 60Asp Ser Ser Trp Phe Ile
Pro Val Glu Arg Glu Gly Leu Gln Asn Leu65 70 75 80Leu Thr Glu Arg
Lys Ile Val Arg Ala Gly Leu Lys Gly Asp Ala Asn 85 90 95Lys Leu Pro
Gln Leu Asn Ser Ala Gln Ile Leu Met Glu Gly Gly Ile 100 105 110Val
Ala Tyr Asp Thr Asn Val Arg Thr Gly Gly Ala Gly Ala Arg Tyr 115 120
125Leu Gly Ile Gly Ala Ala Thr Gln Phe Arg Val Asp Thr Val Thr Val
130 135 140Asn Leu Arg Ala Val Asp Ile Arg Thr Gly Arg Leu Leu Ser
Ser Val145 150 155 160Thr Thr Thr Lys Ser Ile Leu Ser Lys Glu Ile
Thr Ala Gly Val Phe 165 170 175Lys Phe Ile Asp Ala Gln Glu Leu Leu
Glu Ser Glu Leu Gly Tyr Thr 180 185 190Ser Asn Glu Pro Val Ser Leu
Cys Val Ala Ser Ala Ile Glu Ser Ala 195 200 205Val Val His Met Ile
Ala Asp Gly Ile Trp Lys Gly Ala Trp Asn Leu 210 215 220Ala Asp Gln
Ala Ser Gly Leu Arg Ser Pro Val Leu Gln Lys Tyr225 230
23585233PRTPseudomonas putida 85Gln Asp Ser Glu Thr Pro Thr Leu Thr
Pro Arg Ala Ser Thr Tyr Tyr1 5 10 15Asp Leu Ile Asn Met Pro Arg Pro
Lys Gly Arg Leu Met Ala Val Val 20 25 30Tyr Gly Phe Arg Asp Gln Thr
Gly Gln Tyr Lys Pro Thr Pro Ala Ser 35 40 45Ser Phe Ser Thr Ser Val
Thr Gln Gly Ala Ala Ser Met Leu Met Asp 50 55 60Ala Leu Ser Ala Ser
Gly Trp Phe Val Val Leu Glu Arg Glu Gly Leu65 70 75 80Gln Asn Leu
Leu Thr Glu Arg Lys Ile Ile Arg Ala Ser Gln Lys Lys 85 90 95Pro Asp Val
Ala Glu Asn Ile Met Gly Glu Leu Pro Pro Leu Gln Ala 100 105 110Ala
Asn Leu Met Leu Glu Gly Gly Ile Ile Ala Tyr Asp Thr Asn Val 115 120
125Arg Ser Gly Gly Glu Gly Ala Arg Tyr Leu Gly Ile Asp Ile Ser Arg
130 135 140Glu Tyr Arg Val Asp Gln Val Thr Val Asn Leu Arg Ala Val
Asp Val145 150 155 160Arg Thr Gly Gln Val Leu Ala Asn Val Met Thr
Ser Lys Thr Ile Tyr 165 170 175Ser Val Gly Arg Ser Ala Gly Val Phe
Lys Phe Ile Glu Phe Lys Lys 180 185 190Leu Leu Glu Ala Glu Val Gly
Tyr Thr Thr Asn Glu Pro Ala Gln Leu 195 200 205Cys Val Leu Ser Ala
Ile Glu Ser Ala Val Gly His Leu Leu Ala Gln 210 215 220Gly Ile Glu
Gln Arg Leu Trp Gln Val225 23086234PRTShewanella violacea 86Met Pro
Lys Ser Asp Thr Tyr Tyr Asp Leu Ile Gly Leu Pro His Pro1 5 10 15Gln
Gly Ser Met Leu Ala Ala Val Tyr Asp Phe Arg Asp Gln Thr Gly 20 25
30Gln Tyr Lys Ala Ile Pro Ser Ser Asn Phe Ser Thr Ala Val Pro Gln
35 40 45Ser Gly Thr Ala Phe Leu Ala Gln Ala Leu Asn Asp Ser Ser Trp
Phe 50 55 60Val Pro Val Glu Arg Glu Gly Leu Gln Asn Leu Leu Thr Glu
Arg Lys65 70 75 80Ile Val Arg Ala Gly Leu Lys Gly Glu Ala Asn Gln
Leu Pro Gln Leu 85 90 95Ser Ser Ala Gln Ile Leu Met Glu Gly Gly Ile
Val Ala Tyr Asp Thr 100 105 110Asn Ile Lys Thr Gly Gly Ala Gly Ala
Arg Tyr Leu Gly Ile Gly Val 115 120 125Asn Ser Lys Phe Arg Val Asp
Thr Val Thr Val Asn Leu Arg Ala Val 130 135 140Asp Ile Arg Thr Gly
Arg Leu Leu Ser Ser Val Thr Thr Thr Lys Ser145 150 155 160Ile Leu
Ser Lys Glu Val Ser Ala Gly Val Phe Lys Phe Ile Asp Ala 165 170
175Gln Asp Leu Leu Glu Ser Glu Leu Gly Tyr Thr Ser Asn Glu Pro Val
180 185 190Ser Leu Cys Val Ala Gln Ala Ile Glu Ser Ala Val Val His
Met Ile 195 200 205Ala Asp Gly Ile Trp Lys Arg Ala Trp Asn Leu Ala
Asp Thr Ala Ser 210 215 220Gly Leu Asn Asn Pro Val Leu Gln Lys
Tyr225 23087245PRTMarinobacterium jannaschii 87Leu Thr Arg Arg Met
Ser Thr Tyr Gln Asp Leu Ile Asp Met Pro Ala1 5 10 15Pro Arg Gly Lys
Ile Val Thr Ala Val Tyr Ser Phe Arg Asp Gln Ser 20 25 30Gly Gln Tyr
Lys Pro Ala Pro Ser Ser Ser Phe Ser Thr Ala Val Thr 35 40 45Gln Gly
Ala Ala Ala Met Leu Val Asn Val Leu Asn Asp Ser Gly Trp 50 55 60Phe
Ile Pro Leu Glu Arg Glu Gly Leu Gln Asn Ile Leu Thr Glu Arg65 70 75
80Lys Ile Ile Arg Ala Ala Leu Lys Lys Asp Asn Val Pro Val Asn Asn
85 90 95Ser Ala Gly Leu Pro Ser Leu Leu Ala Ala Asn Ile Met Leu Glu
Gly 100 105 110Gly Ile Val Gly Tyr Asp Ser Asn Ile His Thr Gly Gly
Ala Gly Ala 115 120 125Arg Tyr Phe Gly Ile Gly Ala Ser Glu Lys Tyr
Arg Val Asp Glu Val 130 135 140Thr Val Asn Leu Arg Ala Ile Asp Ile
Arg Thr Gly Arg Ile Leu His145 150 155 160Ser Val Leu Thr Ser Lys
Lys Ile Leu Ser Arg Glu Ile Arg Ser Asp 165 170 175Val Tyr Arg Phe
Ile Glu Phe Lys His Leu Leu Glu Met Glu Ala Gly 180 185 190Ile Thr
Thr Asn Asp Pro Ala Gln Leu Cys Val Leu Ser Ala Ile Glu 195 200
205Ser Ala Val Ala His Leu Ile Val Asp Gly Val Ile Lys Lys Ser Trp
210 215 220Ser Leu Ala Asp Pro Asn Glu Leu Asn Ser Pro Val Ile Gln
Ala Tyr225 230 235 240Gln Gln Gln Arg Ile
24588234PRTChryseobacterium oranimense 88Pro Ser Asp Pro Glu Arg
Ser Thr Met Gly Glu Leu Thr Pro Ser Thr1 5 10 15Ala Glu Leu Arg Asn
Leu Pro Leu Pro Asn Glu Lys Ile Val Ile Gly 20 25 30Val Tyr Lys Phe
Arg Asp Gln Thr Gly Gln Tyr Lys Pro Ser Glu Asn 35 40 45Gly Asn Asn
Trp Ser Thr Ala Val Pro Gln Gly Thr Thr Thr Ile Leu 50 55 60Ile Lys
Ala Leu Glu Asp Ser Arg Trp Phe Ile Pro Ile Glu Arg Glu65 70 75
80Asn Ile Ala Asn Leu Leu Asn Glu Arg Gln Ile Ile Arg Ser Thr Arg
85 90 95Gln Glu Tyr Met Lys Asp Ala Asp Lys Asn Ser Gln Ser Leu Pro
Pro 100 105 110Leu Leu Tyr Ala Gly Ile Leu Leu Glu Gly Gly Val Ile
Ser Tyr Asp 115 120 125Ser Asn Thr Met Thr Gly Gly Phe Gly Ala Arg
Tyr Phe Gly Ile Gly 130 135 140Ala Ser Thr Gln Tyr Arg Gln Asp Arg
Ile Thr Ile Tyr Leu Arg Ala145 150 155 160Val Ser Thr Leu Asn Gly
Glu Ile Leu Lys Thr Val Tyr Thr Ser Lys 165 170 175Thr Ile Leu Ser
Thr Ser Val Asn Gly Ser Phe Phe Arg Tyr Ile Asp 180 185 190Thr Glu
Arg Leu Leu Glu Ala Glu Val Gly Leu Thr Gln Asn Glu Pro 195 200
205Val Gln Leu Ala Val Thr Glu Ala Ile Glu Lys Ala Val Arg Ser Leu
210 215 220Ile Ile Glu Gly Thr Arg Asp Lys Ile Trp225
230

<210> 89 <211> 45 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 89
Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile Glu Thr

<210> 90 <211> 29 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 90
Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln

<210> 91 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 91
Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro

<210> 92 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 92
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro

<210> 93 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 93
Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Tyr Pro

<210> 94 <211> 45 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 94
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile Glu Thr

<210> 95 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 95
Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Tyr Pro

<210> 96 <211> 39 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 96
Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln His His His
His His His His His His His

<210> 97 <211> 45 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 97
Cys Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro His His His His His His His His His His

<210> 98 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 98
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Tyr Pro

<210> 99 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 99
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asn Pro

<210> 100 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 100
Gly Thr Met Cys Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro

<210> 101 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 101
Gly Thr Met Cys Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Tyr Pro

<210> 102 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 102
Gly Thr Met Cys Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asn Pro

<210> 103 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 103
Cys Thr Met Cys Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro

<210> 104 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 104
Cys Thr Met Cys Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asn Pro

<210> 105 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 105
Cys Thr Met Cys Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Tyr Pro

<210> 106 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 106
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Val Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro

<210> 107 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 107
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ala Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro

<210> 108 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 108
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Phe Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro

<210> 109 <211> 30 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 109
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn

<210> 110 <211> 30 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 110
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Val Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn

<210> 111 <211> 30 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 111
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ala Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn

<210> 112 <211> 30 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 112
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Phe Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn

<210> 113 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 113
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Val Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Tyr Pro

<210> 114 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 114
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ala Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Tyr Pro

<210> 115 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 115
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Phe Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Tyr Pro

<210> 116 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 116
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Ser Asn Gly Gln Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro

<210> 117 <211> 14 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 117
Gly Thr Met Thr Phe Gln Phe Arg His His His His His His

<210> 118 <211> 25 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 118
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Asn Asn Gly His His His His His His

<210> 119 <211> 35 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 119
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln His His His
His His His

<210> 120 <211> 51 <212> PRT <213> Artificial Sequence <223> CsgF peptide <400> 120
Gly Thr Met Thr Phe Gln Phe Arg Asn Pro Asn Phe Gly Gly Asn Pro
Asn Asn Gly Ala Phe Leu Leu Asn Ser Ala Gln Ala Gln Asn Ser Tyr
Lys Asp Pro Ser Tyr Asn Asp Asp Phe Gly Ile Glu Thr His His His
His His His
* * * * *
References