U.S. patent application number 13/294072, for GB1 peptidic libraries and methods of screening the same, was published by the patent office on 2012-05-24.
The invention is credited to SACHDEV S. SIDHU and MARUTI UPPALAPATI.
Application Number: 13/294072
Publication Number: 20120129715
Family ID: 46064897
Publication Date: 2012-05-24
United States Patent Application: 20120129715
Kind Code: A1
SIDHU; SACHDEV S.; et al.
May 24, 2012
GB1 PEPTIDIC LIBRARIES AND METHODS OF SCREENING THE SAME
Abstract
GB1 peptidic libraries and methods of screening the same for
specific binding to a target protein are provided. Libraries of
polynucleotides that encode GB1 peptidic compounds are provided.
These libraries find use in a variety of applications in which
specific binding to target molecules, e.g., target proteins, is
desired. Also provided are methods of screening the libraries for
binding to a target.
Inventors: SIDHU; SACHDEV S. (TORONTO, CA); UPPALAPATI; MARUTI (TORONTO, CA)
Family ID: 46064897
Appl. No.: 13/294072
Filed: November 10, 2011
Related U.S. Patent Documents
Application Number | Filing Date
61413318 | Nov 12, 2010
61413331 | Nov 12, 2010
61413316 | Nov 12, 2010
Current U.S. Class: 506/9; 506/17; 506/18
Current CPC Class: C07K 1/00 20130101; G01N 33/6845 20130101; C40B 30/04 20130101; C07K 14/315 20130101
Class at Publication: 506/9; 506/18; 506/17
International Class: C40B 30/04 20060101 C40B030/04; C40B 40/08 20060101 C40B040/08; C40B 40/10 20060101 C40B040/10
Claims
1. A library comprising 50 or more distinct GB1 peptidic compounds;
wherein each compound of the library comprises a β1-β2
region and has three or more different non-core mutations in a
region outside of the β1-β2 region.
2. The library according to claim 1, wherein the library comprises
1×10⁴ or more distinct compounds.
3-4. (canceled)
5. The library according to claim 1, wherein each compound of the
library comprises six or more different non-core mutations in a
region outside of the β1-β2 region.
6. The library according to claim 1, wherein each compound of the
library comprises ten or more different mutations.
7-13. (canceled)
14. The library according to claim 6, wherein the ten or more
different mutations are located at positions selected from the
group consisting of positions 21-24, 26, 27, 30, 31, 34, 35 and
37-41.
15. The library according to claim 6, wherein the ten or more
different mutations are located at positions selected from the
group consisting of positions 18-24, 26-28, 30-32, 34 and 35.
16. The library according to claim 6, wherein the ten or more
different mutations are located at positions selected from the
group consisting of positions 1, 18-24 and 45-49.
17. The library according to claim 6, wherein the ten or more
different mutations are located at positions selected from the
group consisting of positions 7-12, 36-41, 54 and 55.
18. The library according to claim 6, wherein the ten or more
different mutations are located at positions selected from the
group consisting of positions 3, 5, 7-14, 16, 52, 54 and 55.
19. The library according to claim 6, wherein the ten or more
different mutations are located at positions selected from the
group consisting of positions 1, 3, 5, 7, 41, 43, 45-50, 52 and
54.
20. The library according to claim 6, wherein each compound of the
library comprises five or more different mutations in the α1
region.
21-23. (canceled)
24. The library according to claim 6, wherein each compound of the
library comprises three or more different mutations in the
β3-β4 region.
25-30. (canceled)
31. The library according to claim 6, wherein each compound of the
library comprises two or more different mutations in the region
between the α1 and β3 regions.
32. (canceled)
33. The library according to claim 6, wherein each compound of the
library comprises ten or more different mutations in the
β1-β2 region.
34-35. (canceled)
36. The library according to claim 33, wherein P1 is
β1-β2 and P2 is β3-β4 such that the compound is
described by the formula (II):
β1-β2-α1-β3-β4 (II)
wherein β1, β2, β3 and β4 are independently beta-strand
domains; and β1, β2, α1, β3 and β4 are
connected independently by linking sequences of between 1 and 10
residues in length.
37-38. (canceled)
39. The library according to claim 36, wherein each compound of the
library is described by a formula independently selected from the
group consisting of: F1-V1-F2 (III); F3-V2-F4 (IV);
V3-F5-V4-F6-V5-F7 (V); F8-V6-F9-V7-F10-V8 (VI); V9-F11-V10 (VII);
and V11-F12-V12 (VIII) wherein F1, F2, F3, F4, F5, F6, F7, F8, F9,
F10, F11 and F12 are fixed regions and V1, V2, V3, V4, V5, V6, V7,
V8, V9, V10, V11 and V12 are variable regions; wherein each fixed
region is common to all compounds of the same formula and each
compound of the library has a distinct variable region.
40. The library according to claim 39, wherein each compound of the
library is described by formula (III), wherein: F1 comprises a
sequence having 75% or more amino acid sequence identity to the
amino acid sequence set forth in SEQ ID NO: 2; F2 comprises a
sequence having 75% or more amino acid sequence identity to an
amino acid sequence set forth in SEQ ID NO: 3; and V1 comprises a
sequence that comprises 10 or more mutations compared to a parent
amino acid sequence set forth in SEQ ID NO: 4.
41. The library according to claim 40, wherein V1 comprises a
sequence of the formula: VXXXXAXXVFXXYAXXNXXXXXW
(SEQ ID NO: 5)
wherein each X is independently a mutation that comprises
substitution with a variant amino acid, wherein the mutation at
position 19 of V1 comprises insertion of 0, 1 or 2 additional
variant amino acids.
42. (canceled)
43. The library according to claim 39, wherein each compound of the
library is described by formula (IV), wherein: F3 comprises a
sequence having 75% or more amino acid sequence identity to the
amino acid sequence set forth in SEQ ID NO: 7; F4 comprises a
sequence having 75% or more amino acid sequence identity to an
amino acid sequence set forth in SEQ ID NO: 8; and V2 comprises a
sequence that comprises 10 or more mutations compared to a parent
amino acid sequence set forth in SEQ ID NO: 9.
44. The library according to claim 39, wherein V2 comprises a
sequence of the formula: TXXXXXXXAXXXFXXXAXXN
(SEQ ID NO: 10)
wherein each X is independently a mutation that comprises
substitution with a variant amino acid, wherein the mutation at
position 3 of V2 comprises insertion of 0, 1 or 2 additional
variant amino acids.
45. (canceled)
46. The library according to claim 39, wherein each compound of the
library is described by formula (V), wherein: F5 comprises a
sequence having 75% or more amino acid sequence identity to the
amino acid sequence set forth in SEQ ID NO: 12; F6 comprises a
sequence having 75% or more amino acid sequence identity to an
amino acid sequence set forth in SEQ ID NO: 13; F7 comprises a
sequence having 75% or more amino acid sequence identity to an
amino acid sequence set forth in SEQ ID NO: 14; V3 comprises a
sequence that comprises one or more mutations compared to a parent
amino acid sequence that is TY; and V4 comprises a sequence that
comprises 7 or more mutations compared to a parent amino acid
sequence set forth in SEQ ID NO: 15; and V5 comprises a sequence
that comprises 5 or more mutations compared to a parent amino acid
sequence set forth in SEQ ID NO: 16.
47. The library according to claim 46, wherein: V3 comprises a
sequence of the formula XY; V4 comprises a sequence of the formula
TXXXXXXXA (SEQ ID NO: 17); and V5 comprises a sequence of the
formula YXXXXXT (SEQ ID NO: 18); wherein each X is independently a
mutation that comprises substitution with a variant amino acid,
wherein the mutation at position 1 of V3 comprises insertion of 2
additional variant amino acids and the mutations at positions 3 and
4 of V4 and V5 each independently comprise insertion of 0, 1 or 2
additional variant amino acids.
48. (canceled)
49. The library according to claim 39, wherein each compound of the
library is described by formula (VI), wherein: F8 comprises a
sequence having 75% or more amino acid sequence identity to the
amino acid sequence set forth in SEQ ID NO: 21; F9 comprises a
sequence having 75% or more amino acid sequence identity to the
amino acid sequence set forth in SEQ ID NO: 22; F10 comprises a
sequence having 75% or more amino acid sequence identity to the
amino acid sequence set forth in SEQ ID NO: 23; V6 comprises a
sequence that comprises 6 or more mutations compared to a parent
amino acid sequence set forth in SEQ ID NO: 24; V7 comprises a
sequence that comprises 6 or more mutations compared to a parent
amino acid sequence set forth in SEQ ID NO: 25; and V8 comprises a
sequence that comprises 2 or more mutations compared to a parent
amino acid sequence that is VTE.
50. The library according to claim 49, wherein: V6 comprises a
sequence of the formula LXXXXXXG (SEQ ID NO: 26); V7 comprises a
sequence of the formula DXXXXXXW (SEQ ID NO: 27); and V8 comprises
a sequence of the formula VXX; wherein each X is independently a
mutation that comprises substitution with a variant amino acid,
wherein the mutations at position 4 of V6 and V7 each independently
comprise insertion of 0, 1 or 2 additional variant amino acids and
the mutation at position 3 of V8 comprises insertion of 1
additional variant amino acid.
51. (canceled)
52. The library according to claim 39, wherein each compound of the
library is described by formula (VII), wherein: F11 comprises a
sequence having 75% or more amino acid sequence identity to an
amino acid sequence set forth in SEQ ID NO: 30; V9 comprises a
sequence that comprises at least 11 mutations compared to a parent
amino acid sequence set forth in SEQ ID NO: 31; and V10 comprises a
sequence that comprises 3 or more mutations compared to a parent
amino acid sequence set forth in SEQ ID NO: 32.
53. The library according to claim 52, wherein: V9 comprises a
sequence of the formula TYXLXLXXXXXXXXTXT (SEQ ID NO: 33); and V10
comprises a sequence of the formula FXVXX (SEQ ID NO: 34); wherein
each X is independently a mutation that comprises substitution with
a variant amino acid, wherein the mutation at position 9 of V9
comprises insertion of 0, 1 or 2 additional variant amino acids and
the mutation at position 5 of V10 comprises insertion of 1
additional variant amino acid.
54. (canceled)
55. The library according to claim 39, wherein each compound of the
library is described by formula (VIII), wherein: F12 comprises a
sequence having 75% or more amino acid sequence identity to an
amino acid sequence set forth in SEQ ID NO: 37; V11 comprises a
sequence that comprises 4 or more mutations compared to a parent
amino acid sequence set forth in SEQ ID NO: 38; V12 comprises a
sequence that comprises 10 or more mutations compared to a parent
amino acid sequence set forth in SEQ ID NO: 39.
56. The library according to claim 55, wherein: V11 comprises a
sequence of the formula XYXLXLXG (SEQ ID NO: 40); and V12 comprises
a sequence of the formula GXWXYXXXXXXFXVXE (SEQ ID NO: 41); wherein
each X is independently a mutation that comprises substitution with
a variant amino acid, wherein the mutation at position 8 of V12
comprises insertion of 0, 1 or 2 additional variant amino acids and
the mutation at position 1 of V11 comprises insertion of 1
additional variant amino acid.
57-65. (canceled)
66. A library of polynucleotides that encodes 50 or more distinct
compounds, wherein each polynucleotide encodes a GB1 peptidic
compound that comprises a β1-β2 region and has three or
more different non-core mutations at positions in a region outside
of the β1-β2 region.
67-72. (canceled)
73. The library according to claim 66, wherein each polynucleotide
encodes a GB1 peptidic compound comprising ten or more variant
amino acids at non-core positions, wherein each variant amino acid
is encoded by a random codon.
74. (canceled)
75. A method comprising: contacting a target protein with a library
comprising: 50 or more distinct GB1 peptidic compounds, wherein
each compound of the library comprises a β1-β2 region and
has three or more different non-core mutations in a region outside
of the β1-β2 region; and identifying a compound of the
library that specifically binds to the target protein.
76-79. (canceled)
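The variable-region formulas in claims 41, 44, 47, 50, 53 and 56 mark each variant residue with X and permit extra residues at named positions. A minimal Python sketch, assuming hypothetical helper names that do not come from the application, shows how such a formula could be compiled into a regular expression for checking whether a sequence fits it, and how an upper bound on sequence diversity could be computed under an assumed alphabet of six amino acids per variant position (the size of the KHT codon set described for FIGS. 5-10; the claims themselves do not limit X in this way).

```python
import re
from itertools import product
from typing import Dict, Optional

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_variant_positions(template: str) -> int:
    """Number of X (variant) positions in a variable-region formula."""
    return template.count("X")


def template_to_regex(template: str, insertions: Optional[Dict[int, int]] = None):
    """Compile a formula such as 'VXXXXAXXVFXXYAXXNXXXXXW' into a regex.

    `insertions` maps a 1-based X position to the maximum number of extra
    variant residues allowed there, e.g. {19: 2} for "0, 1 or 2".
    """
    insertions = insertions or {}
    parts = []
    for pos, residue in enumerate(template, start=1):
        if residue == "X":
            extra = insertions.get(pos, 0)
            # one variant residue plus up to `extra` inserted residues
            parts.append(f"[{AMINO_ACIDS}]{{1,{1 + extra}}}")
        elif pos in insertions:
            raise ValueError(f"insertion declared at fixed position {pos}")
        else:
            parts.append(re.escape(residue))
    return re.compile("".join(parts))


def diversity_upper_bound(template: str, insertions: Optional[Dict[int, int]] = None,
                          alphabet_size: int = 6) -> int:
    """Upper bound on distinct sequences if each variant residue is drawn
    from `alphabet_size` amino acids (6 is an assumption, see lead-in)."""
    insertions = insertions or {}
    n_variant = count_variant_positions(template)
    total = 0
    # every combination of insertion lengths, e.g. (0,), (1,), (2,) for {19: 2}
    for extras in product(*(range(m + 1) for m in insertions.values())):
        total += alphabet_size ** (n_variant + sum(extras))
    return total


V1 = "VXXXXAXXVFXXYAXXNXXXXXW"  # claim 41, SEQ ID NO: 5
pattern = template_to_regex(V1, {19: 2})
print(count_variant_positions(V1))                         # 15
print(bool(pattern.fullmatch("VAAAAAAAVFAAYAAANAAAAAW")))  # True
print(f"{diversity_upper_bound(V1, {19: 2}):.2e}")         # 2.02e+13
```

The regular expression accepts any of the 20 standard residues at each X, so it does not check that a variant residue actually differs from the parent residue, and the diversity figure bounds the sequence space rather than counting clones in a physical library.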
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] Pursuant to 35 U.S.C. §119(e), this application claims
priority to the filing date of U.S. provisional application Ser.
No. 61/413,318, filed Nov. 12, 2010, the disclosure of which is
herein incorporated by reference.
[0002] This application is related to copending U.S. application
entitled "GB1 peptidic compounds and methods for making and using
the same" filed on Nov. 10, 2011 to Sidhu et al. (attorney
reference number RFLX-003) and accorded Ser. No. ______, and U.S.
provisional application Ser. No. 61/413,331 filed Nov. 12, 2010,
which are entirely incorporated herein by reference.
[0003] This application is related to copending U.S. application
entitled "Methods and compositions for identifying D-peptidic
compounds that specifically bind target proteins" filed on Nov. 10,
2011 to Ault-Riche et al. (attorney reference number RFLX-002) and
accorded Ser. No. ______, and U.S. provisional application Ser. No.
61/413,316 filed Nov. 12, 2010, which are entirely incorporated
herein by reference.
INTRODUCTION
[0004] Essentially all biological processes depend on molecular
recognition mediated by proteins. The ability to manipulate the
interactions of such proteins is of interest for both basic
biological research and for the development of therapeutics and
diagnostics.
[0005] Libraries of polypeptides can be prepared, e.g., by
manipulating the immune system or via chemical synthesis, from
which specificity of binding to target molecules can be selected.
Molecular diversity from which specificity can be selected is large
for polypeptides, because of the numerous possible sequence
combinations of amino acids. In addition, proteins can form large
binding surfaces that make multiple contacts with a target molecule,
leading to highly specific and high affinity binding events. For example, antibodies
are a class of protein that has yielded highly specific and tight
binding ligands for various target antigens.
[0006] Because of the diversity of target molecules of interest and
the binding properties of proteins, the screening of peptidic
libraries to identify molecules with useful functions is of
interest.
SUMMARY
[0007] GB1 peptidic libraries and methods of screening the same for
specific binding to a target protein are provided. Libraries of
polynucleotides that encode GB1 peptidic compounds are provided.
These libraries find use in a variety of applications in which
specific binding to target molecules, e.g., target proteins, is
desired. Also provided are methods of screening the libraries for
binding to a target.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] FIG. 1 depicts a ribbon structure of a GB1 protein that
illustrates a 4β-1α motif (Mayo et al., Nature
Structural Biology, 5(6), 1998, p. 470-475).
[0009] FIGS. 2A and 2B depict six different libraries that include
a GB1 scaffold, both in a ribbon representation (top) and a space
filling representation (bottom). Amino acids at several positions
of the GB1 scaffold that are selected for mutation are highlighted
in darker shade (top). The space filling representations of Library
1 to Library 6 (bottom) illustrate six different potential binding
surfaces (shown in darker shade) on the GB1 scaffold.
[0010] FIG. 3 illustrates the underlying sequence of the GB1
scaffold domain (SEQ ID NO: 1) of FIGS. 2A-2B and the positions of
the variant amino acids (shown in the grey blocks) in Libraries 1
to 6. The asterisks indicate positions at which mutations may
include insertion of amino acids.
[0011] FIG. 4A depicts phage display of a GB1 peptidic compound
fused to coat protein p3 in a hinge and dimerization format.
FIG. 4B illustrates display levels of various formats of the GB1
peptidic compound fusion on the phage particles.
[0012] FIG. 5 illustrates the design of phage display Library 1
(SEQ ID NOs: 225, 226 and 197-199).
[0013] FIG. 6 illustrates the design of phage display Library 2
(SEQ ID NOs: 225, 226 and 200-202).
[0014] FIG. 7 illustrates the design of phage display Library 3
(SEQ ID NOs: 225, 226 and 203-209).
[0015] FIG. 8 illustrates the design of phage display Library 4
(SEQ ID NOs: 225, 226 and 210-216).
[0016] FIG. 9 illustrates the design of phage display Library 5
(SEQ ID NOs: 225, 226 and 217-220).
[0017] FIG. 10 illustrates the design of phage display Library 6
(SEQ ID NOs: 225, 226 and 221-224).
[0018] FIGS. 5 to 10 illustrate the design of phage display
libraries based on Libraries 1 to 6 illustrated in FIGS. 2A-2B.
Ribbon (left) and space filling (right) structural representations
depict the variant amino acid positions in dark. Oligonucleotide
and amino acid sequences (SEQ ID NOs: 225 and 226) show the GB1
scaffold in the context of the fusion protein with GGS linkers at
the N- and C-termini of the scaffold. Also shown are the
oligonucleotide sequences synthesized for use in preparation of the
libraries by Kunkel mutagenesis that include KHT codons at variant
amino acid positions to encode variable regions of GB1 peptidic
compounds.
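The oligonucleotides described in [0018] carry KHT codons at variant positions. Assuming the standard IUPAC degenerate-base codes (K = G or T; H = A, C or T) and the standard genetic code, the Python sketch below expands a degenerate codon into its concrete codons and lists the residues they encode; the function names are illustrative only and are not taken from the application.

```python
from itertools import product

# Standard IUPAC nucleotide codes; KHT uses K = G/T and H = A/C/T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
BASES = "TCAG"
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AA_BY_CODON[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_codon(degenerate: str) -> list[str]:
    """All concrete codons represented by a degenerate codon such as 'KHT'."""
    return ["".join(bases) for bases in product(*(IUPAC[n] for n in degenerate.upper()))]


def encoded_residues(degenerate: str) -> set[str]:
    """One-letter amino acids (and '*' for stop) encoded by a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand_codon(degenerate)}


print(expand_codon("KHT"))               # ['GAT', 'GCT', 'GTT', 'TAT', 'TCT', 'TTT']
print(sorted(encoded_residues("KHT")))  # ['A', 'D', 'F', 'S', 'V', 'Y']
```

By this calculation a KHT codon covers six codons encoding A, D, F, S, V and Y, with no stop codon.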
[0019] FIG. 11 illustrates binding results from four rounds of
phage display screening of Libraries 1 to 6 against L-VEGF and
D-VEGF.
[0020] FIG. 12 illustrates binding assay results of individual
clones identified from phage display screening of subject libraries
against VEGF proteins. 10 nM or 100 nM VEGF protein was added to
binding solutions in a competition binding assay.
[0021] FIG. 13 illustrates exemplary bifunctional libraries having
two potential binding surfaces. A: Solvent exposed residues of
surface 1 (S1) and surface 5 (S5) are shown in dark. B: Solvent
exposed residues of surface 4 (S4) and surface 3 (S3) are shown in
dark. C: Solvent exposed residues of surface 2 (S2) and surface 6
(S6) are shown in dark.
DEFINITIONS
[0022] As used herein, the term "peptidic" refers to a moiety that
is composed of amino acid residues. The term "peptidic" includes
compounds or libraries in which the conventional backbone has been
replaced with non-naturally occurring or synthetic backbones, and
peptides in which one or more naturally occurring amino acids have
been replaced with one or more non-naturally occurring or synthetic
amino acids, or a D-amino acid. Any of the depictions of sequences
found herein (e.g., using one-letter or three-letter codes) may
represent an L-amino acid or a D-amino acid version of the sequence.
Unless noted otherwise, capital and lowercase letter codes are not
used to distinguish L- from D-amino acid residues.
[0023] As used herein, the terms "polypeptide" and "protein" are
used interchangeably. The term "polypeptide" also includes post-translationally
modified polypeptides or proteins. The term
"polypeptide" includes polypeptides in which the conventional
backbone has been replaced with non-naturally occurring or
synthetic backbones, and peptides in which one or more of the
conventional amino acids have been replaced with one or more
non-naturally occurring or synthetic amino acids. In some
instances, polypeptides may be of any length, e.g., 2 or more amino
acids, 4 or more amino acids, 10 or more amino acids, 20 or more
amino acids, 30 or more amino acids, 40 or more amino acids, 50 or
more amino acids, 60 or more amino acids, 100 or more amino acids,
300 or more amino acids, 500 or more amino acids, or 1000 or more
amino acids.
[0024] As used herein, the term "scaffold" or "scaffold domain"
refers to a peptidic framework from which a library of compounds
arose, and against which the compounds are able to be compared.
When a compound of a library arises from amino acid mutations at
various positions within a scaffold, the amino acids at those
positions are referred to as "variant amino acids." Such variant
amino acids may confer on the resulting peptidic compounds
different functions, such as specific binding to a target
protein.
[0025] As used herein, the term "mutation" refers to a deletion,
insertion, or substitution of one or more amino acid or nucleotide
residues relative to a reference sequence or motif, such as a
scaffold sequence or motif.
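The claims count mutations "compared to a parent amino acid sequence." For the simplest case, substitutions between two sequences of equal length, that count can be obtained by a position-by-position comparison. The sketch below illustrates only that case; the function name and toy sequences are hypothetical, and insertions or deletions would first require aligning the variant to the parent.

```python
def substitutions(parent: str, variant: str) -> list[tuple[int, str, str]]:
    """1-based (position, parent residue, variant residue) for each substitution.

    Only equal-length sequences are handled; insertions and deletions would
    first require aligning the variant to the parent.
    """
    if len(parent) != len(variant):
        raise ValueError("sequences differ in length; align them first")
    return [
        (pos, p, v)
        for pos, (p, v) in enumerate(zip(parent, variant), start=1)
        if p != v
    ]


# Toy sequences, not taken from the sequence listing.
print(substitutions("ACDEFG", "ACKEFW"))  # [(3, 'D', 'K'), (6, 'G', 'W')]
```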
[0026] As used herein, the terms "GB1 scaffold domain" and "GB1
scaffold" refer to a scaffold that has a structural motif similar
to the B1 domain of Protein G (GB1), where the structural motif is
characterized by a motif including a four-stranded β-sheet
packed against a helix (also referred to as a 4β-1α
motif). The arrangement of four β-strands and one
α-helix may form a hairpin-helix-hairpin motif. An exemplary
GB1 scaffold domain is depicted in FIG. 1. GB1 scaffold domains
include members of the family of IgG binding B domains, e.g.,
Protein L B1 domain. Amino acid sequences of exemplary B domains
that may be employed herein as GB1 scaffold domains are found in
the Wellcome Trust Sanger Institute Pfam database (The Pfam protein
families database: Finn et al., Nucleic Acids Research (2010)
Database Issue 38:D211-222), see, e.g., Family: IgG_binding_B
(PF01378) (pfam.sanger.ac.uk/family/PF01378.10#tabview=tab0) or in
NCBI's protein database. Exemplary GB1 scaffold domain sequences
include those described by SEQ ID NOs: 227-261. A GB1 scaffold
domain may be a native sequence of a member of the B domain protein
family, a B domain sequence with pre-existing amino acid sequence
modifications (such as additions, deletions and/or substitutions),
or a fragment or analogue thereof. A GB1 scaffold domain may be
L-peptidic, D-peptidic or a combination thereof. In some cases, a
"GB1 scaffold domain" may also be referred to as a "parent amino
acid sequence."
[0027] As used herein, the term "GB1 peptidic compound" refers to a
compound composed of peptidic residues that has a parent GB1
scaffold domain.
[0028] As used herein, the terms "parent amino acid sequence" and
"parent polypeptide" refer to a polypeptide comprising an amino
acid sequence from which a variant GB1 peptidic compound arose and
against which the variant GB1 peptidic compound is being compared.
In some cases, the parent polypeptide lacks one or more of the
modifications disclosed herein and differs in function compared to
a variant GB1 peptidic compound as disclosed herein. The parent
polypeptide may comprise a native GB1 sequence or GB1 scaffold
sequence with pre-existing amino acid sequence modifications (such
as additions, deletions and/or substitutions).
[0029] As used herein, the term "variable region" refers to a
continuous sequence of residues that includes one or more variant
amino acids. A variable region may also include one or more
conserved amino acids at fixed positions. As used herein, the term
"fixed region" refers to a continuous sequence of residues that
does not include any mutations or variant amino acids, and is
conserved across a library of compounds.
[0030] As used herein, the term "variable domain" refers to a
domain that includes all of the variant amino acids of a GB1
scaffold. The variable domain may include one or more variable
regions, and may encompass a continuous or a discontinuous sequence
of residues. The variable domain may be part of the scaffold
domain.
[0031] As used herein, the term "discontinuous sequence of
residues" refers to a sequence of residues that is not continuous
with respect to the primary sequence of a peptidic compound. A
peptidic compound may fold to form a secondary or tertiary
structure, e.g., a 4β-1α motif, where the amino acids of
a discontinuous sequence of residues are adjacent to each other in
space, i.e., contiguous. As used herein, the term "continuous
sequence of residues" refers to a sequence of residues that is
continuous in terms of the primary sequence of a peptidic
compound.
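Whether a set of variant positions forms one continuous sequence or several can be read directly from the primary-sequence numbering. The hypothetical Python helper below groups 1-based positions into continuous segments; applied to the positions recited in claim 16 (1, 18-24 and 45-49) it returns three segments. It says nothing about whether those segments are adjacent in space, which depends on the folded structure.

```python
def to_segments(positions):
    """Group 1-based residue positions into continuous (start, end) segments."""
    segments = []
    for pos in sorted(set(positions)):
        if segments and pos == segments[-1][1] + 1:
            # extends the current continuous run
            segments[-1] = (segments[-1][0], pos)
        else:
            # gap in the primary sequence: start a new segment
            segments.append((pos, pos))
    return segments


print(to_segments([1, *range(18, 25), *range(45, 50)]))  # [(1, 1), (18, 24), (45, 49)]
```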
[0032] As used herein, the term "non-core mutation" refers to an
amino acid mutation of a GB1 peptidic compound that is located at a
position in the 4β-1α structure that is not part of the
hydrophobic core of the structure. Amino acid residues in the
hydrophobic core of a GB1 peptidic compound are not significantly
solvent exposed but rather tend to form intramolecular hydrophobic
contacts. Unless explicitly defined otherwise, the hydrophobic core
residues, or core positions, of a GB1 scaffold domain that is
described by SEQ ID NO: 1 are positions 2, 4, 6, 19, 25, 29, 33, 38,
42, 51 and 53 of the GB1 scaffold. The methodology used to specify hydrophobic core residues
in GB1 is described by Dahiyat et al., ("Probing the role of
packing specificity in protein design," Proc. Natl. Acad. Sci. USA,
1997, 94, 10172-10177) where a PDB structure was used to calculate
which side chains expose less than 10% of their surface area to
solvent. Such methods can be modified for use with the GB1 scaffold
domain.
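The core positions listed above can be used directly to separate core from non-core positions in the SEQ ID NO: 1 numbering. The Python sketch below does this and also expresses the exposure cutoff attributed to Dahiyat et al. (side chains exposing less than 10% of their surface area to solvent). The helper names are hypothetical, and the per-residue exposure values are assumed to come from a separate solvent-accessibility calculation on a structure, which is not shown.

```python
# Core positions of the GB1 scaffold of SEQ ID NO: 1, as listed in [0032].
CORE_POSITIONS = frozenset({2, 4, 6, 19, 25, 29, 33, 38, 42, 51, 53})


def split_core(positions):
    """Split 1-based positions (SEQ ID NO: 1 numbering) into core and non-core lists."""
    ordered = sorted(set(positions))
    core = [p for p in ordered if p in CORE_POSITIONS]
    non_core = [p for p in ordered if p not in CORE_POSITIONS]
    return core, non_core


def core_by_exposure(relative_exposure, cutoff=0.10):
    """Positions whose side chain exposes less than `cutoff` (a fraction) of
    its surface area to solvent; `relative_exposure` maps position -> fraction
    and is assumed to come from a separate solvent-accessibility calculation."""
    return sorted(pos for pos, frac in relative_exposure.items() if frac < cutoff)


claim_16 = [1, *range(18, 25), *range(45, 50)]
print(split_core(claim_16))
# ([19], [1, 18, 20, 21, 22, 23, 24, 45, 46, 47, 48, 49])
print(core_by_exposure({2: 0.03, 3: 0.45, 4: 0.08}))  # hypothetical values -> [2, 4]
```

For example, of the positions recited in claim 16, only position 19 is a core position.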
[0033] As used herein, the term "surface mutation" refers to an
amino acid mutation in a GB1 scaffold that is located at a position
in the 4β-1α structure that is solvent exposed. Such
variant amino acid residues at surface positions of a GB1 peptidic
compound are capable of interacting directly with a target
molecule, whether or not such an interaction occurs.
[0034] As used herein, the term "boundary mutation" refers to an
amino acid mutation of a GB1 scaffold that is located at a position
in the 4β-1α structure that is at the boundary between
the hydrophobic core and the solvent exposed surface. Such variant
amino acid residues at boundary positions of a GB1 peptidic
compound may be in part contacting hydrophobic core residues and/or
in part solvent exposed and capable of some interaction with a
target molecule, whether or not such an interaction occurs. One
set of criteria for describing core, surface and boundary residues of a
GB1 peptidic structure is described by Mayo et al. Nature
Structural Biology, 5(6), 1998, 470-475. Such methods and criteria
can be modified for use with the GB1 scaffold domain.
[0035] As used herein, the term "linking sequence" refers to a
continuous sequence of amino acid residues, or analogs thereof,
that connect two peptidic motifs. In certain embodiments, a linking
sequence is the loop connecting two β-strands in a β-hairpin
motif.
[0036] As used herein, the term "phage display" refers to a
technique by which variant peptidic compounds are displayed as
fusion proteins to a coat protein on the surface of phage, e.g.,
filamentous phage particles. The term "phagemid" refers to a
plasmid vector having a bacterial origin of replication, e.g.,
ColE1, and a copy of an intergenic region of a bacteriophage. The
phagemid may be based on any known bacteriophage, including
filamentous bacteriophage. In some instances, the plasmid will also
contain a selectable marker for antibiotic resistance. Segments of
DNA cloned into these vectors can be propagated as plasmids. When
cells harboring these vectors are provided with all genes necessary
for the production of phage particles, the mode of replication of
the plasmid changes to rolling circle replication to generate
copies of one strand of the plasmid DNA and package phage
particles. The phagemid may form infectious or non-infectious phage
particles. This term includes phagemids which contain a phage coat
protein gene or fragment thereof linked to a heterologous
polypeptide gene as a gene fusion such that the heterologous
polypeptide is displayed on the surface of the phage particle.
[0037] As used herein, the term "phage vector" refers to a double
stranded replicative form of a bacteriophage that contains a
heterologous gene and is capable of replication. The phage vector
has a phage origin of replication allowing phage replication and
phage particle formation. In some cases, the phage is a filamentous
bacteriophage, such as an M13, f1, fd, Pf3 phage or a derivative
thereof, a lambdoid phage, such as lambda, 21, phi80, phi81, 82,
424, 434, etc., or a derivative thereof, a Baculovirus or a
derivative thereof, a T4 phage or a derivative thereof, a T7 phage
virus or a derivative thereof.
[0038] As used herein, the term "stable" refers to a compound that
is able to maintain a folded state under physiological conditions
at a certain temperature, such that it retains at least one of its
normal functional activities, for example binding to a target
protein. The stability of the compound can be determined using
standard methods. For example, the "thermostability" of a compound
can be determined by measuring its melting temperature ("Tm"), the
temperature in degrees Celsius at which half of the compounds are
unfolded. In some instances, the higher the Tm, the more stable the
compound.
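Because Tm is the temperature at which half of the compounds are unfolded, it can be read off a melting curve by finding where the unfolded fraction crosses 0.5. A minimal Python sketch, assuming the fraction unfolded has already been computed at each temperature (for example from a spectroscopic thermal melt, which the source does not specify) and using linear interpolation between neighboring points; the function name is hypothetical and the example values are made up for illustration.

```python
def estimate_tm(temperatures, fraction_unfolded):
    """Return the first temperature at which the unfolded fraction reaches 0.5,
    linearly interpolating between measured points."""
    points = list(zip(temperatures, fraction_unfolded))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 == 0.5:
            return t0
        if (f0 - 0.5) * (f1 - 0.5) < 0:  # the curve crosses 0.5 in this interval
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    if points and points[-1][1] == 0.5:
        return points[-1][0]
    raise ValueError("fraction unfolded never reaches 0.5")


# Made-up melting curve: temperatures in degrees Celsius, fraction unfolded.
print(round(estimate_tm([60, 70, 80], [0.2, 0.4, 0.8]), 2))  # 72.5
```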
[0039] The compounds of the subject libraries may contain one or
more asymmetric centers and may thus give rise to enantiomers,
diastereomers, and other stereoisomeric forms that may be defined,
in terms of absolute stereochemistry, as (R)- or (S)- or, as (D)-
or (L)- for amino acids and polypeptides. The present invention is
meant to include all such possible isomers, as well as their
racemic and optically pure forms. When the compounds described
herein contain olefinic double bonds or other centers of geometric
asymmetry, and unless specified otherwise, it is intended that the
compounds include both E and Z geometric isomers. Likewise, all
tautomeric forms are also intended to be included.
[0040] As used herein, the term "a target protein" refers to all
members of the target family, and fragments and enantiomers
thereof, and protein mimics thereof. The target proteins of
interest that are described herein are intended to include all
members of the target family, and fragments and enantiomers
thereof, and protein mimics thereof, unless explicitly described
otherwise. The target protein may be any protein of interest, such
as a therapeutic or diagnostic target, including but not limited
to: hormones, growth factors, receptors, enzymes, cytokines,
osteoinductive factors, colony stimulating factors and
immunoglobulins. The term "target protein" is intended to include
recombinant and synthetic molecules, which can be prepared using
any convenient recombinant expression methods or using any
convenient synthetic methods, or purchased commercially, as well as
fusion proteins containing a target molecule, as well as synthetic
L- or D-proteins.
[0041] As used herein, the term "protein mimic" refers to a
peptidic compound that mimics a binding property of a protein of
interest, e.g., a target protein. In general terms, the target
protein mimic includes an essential part of the original target
protein (e.g., an epitope or essential residues thereof) that is
necessary for forming a potential binding surface, such that the
target protein mimic and the original target protein are each
capable of binding specifically to a binding moiety of interest,
e.g., an antibody or a D-peptidic compound. In some embodiments,
the part(s) of the original target protein that is essential for
binding is displayed on a scaffold such that the potential binding
surface of the original target protein is mimicked. Any suitable
scaffold for displaying the minimal essential part of the target
protein may be used, including but not limited to antibody
scaffolds, scFv, anticalins, non-antibody scaffolds, mimetics of
protein secondary and tertiary structures. In some embodiments, a
target protein mimic includes residues or fragments of the original
target protein that are incorporated into a protein scaffold, where
the scaffold mimics a structural motif of the target protein. For
example, by incorporating residues of the target protein at
desirable positions of a convenient scaffold, the protein mimic may
present a potential binding surface that mimics that of the
original target protein. In some embodiments, the native structure
of the fragments of the original target protein is retained using
methods of conformational constraint. Any convenient methods of
conformationally constraining a peptidic compound may be used, such
as but not limited to, bioconjugation, dimerization (e.g., via a
linker), multimerization, or cyclization.
DETAILED DESCRIPTION
[0042] GB1 peptidic libraries and methods of screening the same for
the identification of compounds that specifically bind to target
proteins are provided. The subject libraries include a plurality of
GB1 peptidic compounds, where each GB1 peptidic compound has a
scaffold domain of the same structural motif as the B1 domain of
Protein G (GB1), where the structural motif of GB1 is characterized
by a motif that includes an arrangement of four .beta.-strands and one
.alpha.-helix around a hydrophobic core (also referred to as a
4.beta.-1.alpha. motif). The GB1 peptidic compounds of the subject
libraries include mutations at non-core positions, e.g., variant
amino acids at positions within a GB1 scaffold domain that are not
part of the hydrophobic core of the structure. A 4.beta.-1.alpha.
motif is depicted in FIG. 1.
[0043] A variety of libraries of GB1 peptidic compounds are
provided. For library diversity, both the positions of the
mutations and the nature of the mutation at each variable position
of the scaffold may be varied. In some instances, the mutations are
included at non-core positions, although mutations at core
positions may also be included. The mutations may confer different
functions on the resulting GB1 peptidic compounds, such as specific
binding to a target molecule. The mutations may be selected at
positions of a GB1 scaffold domain that are solvent exposed such
that the variant amino acids at these positions can form part of a
potential target molecule binding surface, although mutations at
selected core and/or boundary positions may also be included. In a
subject library, the mutations may be concentrated in a variable
domain that defines one of several distinct potential binding
surfaces of the GB1 scaffold domain. Libraries of GB1 peptidic
compounds are provided that include distinct arrangements of
mutations concentrated at various surfaces of the 4.beta.-1.alpha.
motif, for example, as depicted in FIGS. 2A-2B. The subject
libraries may include compounds that specifically bind to a target
molecule via one of the several potential binding sites of the GB1
scaffold domain. Mutations may be included at the potential binding
surface to provide for specific binding to a target molecule
without significantly disrupting the GB1 peptidic structure.
[0044] In the subject methods, a GB1 peptidic library is contacted
with a target molecule to screen for a compound of the library that
specifically binds to the target with high affinity. The subject
methods and libraries find use in a variety of applications,
including screening applications.
[0045] Before certain embodiments are described in greater detail,
it is to be understood that this invention is not limited to
certain embodiments described, as such may, of course, vary. It is
also to be understood that the terminology used herein is for the
purpose of describing certain embodiments only, and is not intended
to be limiting, since the scope of the present invention will be
limited only by the appended claims.
[0046] Where a range of values is provided, it is understood that
each intervening value, to the tenth of the unit of the lower limit
unless the context clearly dictates otherwise, between the upper
and lower limit of that range and any other stated or intervening
value in that stated range, is encompassed within the invention.
The upper and lower limits of these smaller ranges may
independently be included in the smaller ranges and are also
encompassed within the invention, subject to any specifically
excluded limit in the stated range. Where the stated range includes
one or both of the limits, ranges excluding either or both of those
included limits are also included in the invention.
[0047] Unless defined otherwise, all technical and scientific terms
used herein have the same meaning as commonly understood by one of
ordinary skill in the art to which this invention belongs. Although
any methods and materials similar or equivalent to those described
herein can also be used in the practice or testing of the present
invention, representative illustrative methods and materials are
now described.
[0048] All publications and patents cited in this specification are
herein incorporated by reference as if each individual publication
or patent were specifically and individually indicated to be
incorporated by reference and are incorporated herein by reference
to disclose and describe the methods and/or materials in connection
with which the publications are cited. The citation of any
publication is for its disclosure prior to the filing date and
should not be construed as an admission that the present invention
is not entitled to antedate such publication by virtue of prior
invention. Further, the dates of publication provided may be
different from the actual publication dates which may need to be
independently confirmed.
[0049] It is noted that, as used herein and in the appended claims,
the singular forms "a", "an", and "the" include plural referents
unless the context clearly dictates otherwise. It is further noted
that the claims may be drafted to exclude any optional element. As
such, this statement is intended to serve as antecedent basis for
use of such exclusive terminology as "solely," "only" and the like
in connection with the recitation of claim elements, or use of a
"negative" limitation.
[0050] Each of the individual embodiments described and illustrated
herein has discrete components and features which may be readily
separated from or combined with the features of any of the other
several embodiments without departing from the scope or spirit of
the present invention. Any recited method can be carried out in the
order of events recited or in any other order which is logically
possible.
[0051] In further describing the various aspects of the invention,
the structures and sequences of members of the various libraries
are described first in greater detail, followed by a description of
methods of screening and applications in which the libraries find
use.
Libraries
[0052] As summarized above, aspects of the invention include
libraries of GB1 peptidic compounds where each GB1 peptidic
compound has a scaffold domain of the same structural motif as the
B1 domain of Protein G (GB1), where the structural motif of GB1 is
characterized by a motif that includes an arrangement of four
.beta.-strands and one .alpha.-helix (also referred to as a
4.beta.-1.alpha. motif) around a hydrophobic core. The GB1 peptidic
compounds of the subject libraries include mutations at various
non-core positions of the 4.beta.-1.alpha. motif, e.g., variant
amino acids at non-core positions within a GB1 scaffold domain. In
many embodiments, the four .beta.-strands and one .alpha.-helix
motifs of the structure are arranged in a hairpin-helix-hairpin
motif, e.g., .beta.1-.beta.2-.alpha.1-.beta.3-.beta.4 where
.beta.1-.beta.4 are .beta.-strand motifs and .alpha.1 is a helix
motif. A GB1 peptidic hairpin-helix-hairpin motif is depicted in
FIG. 1.
[0053] A GB1 scaffold domain may be any polypeptide, or fragment
thereof, that includes the 4.beta.-1.alpha. motif, whether naturally
occurring or synthetic. The GB1 scaffold domain may be a native
sequence of a member of the IgG binding B domain protein family, an
IgG binding B domain sequence with pre-existing amino acid sequence
modifications (such as additions, deletions and/or substitutions),
or a fragment or analogue thereof. GB1 scaffold domains include
those described in the following references: Gronenborn et al., FEBS
Letters 398 (1996) 312-316; Kotz et al., Eur. J. Biochem. 271 (2004)
1623-1629; Malakauskas et al., Nature Structural Biology 5(6)
(1998) 470-475; Minor Jr. et al., Nature 367 (1994) 660-663;
Nauli et al., Nature Structural Biology 8(7) (2001) 602-605;
Smith et al., Biochemistry 33 (1994) 5510-5517;
Wunderlich et al., J. Mol. Biol. 363 (2006) 545-557; and analogs or
fragments thereof; and those scaffolds described in the definitions
section above. In certain embodiments, a GB1 scaffold domain has an
amino acid sequence as set forth in one of SEQ ID NOs: 1 and
227-261. In certain embodiments, a GB1 scaffold domain includes a
sequence having 60% or more amino acid sequence identity, such as
70% or more, 80% or more, 90% or more, 95% or more or 98% or more
amino acid sequence identity to an amino acid sequence set forth in
one of SEQ ID NOs: 1 and 227-261. A GB1 scaffold domain sequence may
include 1 or more, such as 2 or more, 3 or more, 4 or more, 5 or
more, 10 or more, 15 or more, or even 20 or more additional
peptidic residues compared to a native IgG binding B domain
sequence. Alternatively, a GB1 scaffold domain sequence may include
fewer peptidic residues compared to a native IgG binding B domain
sequence, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, or even fewer
residues.
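The "percent amino acid sequence identity" thresholds above (60% or more, 70% or more, and so on) can be illustrated with a minimal Python sketch. It is an illustration, not a definition taken from this specification: it assumes the two sequences are already aligned to the same length (gaps written as "-" or "."), skips columns where the reference has a gap, and counts a query gap as a mismatch. In practice identity is usually reported by an alignment program, and the denominator convention used here is an assumption. The substitutions in the usage example are hypothetical.

```python
GAP_CHARS = frozenset("-.")


def percent_identity(reference: str, query: str) -> float:
    """Percent identity of an aligned query relative to an aligned reference.

    Both strings must come from the same alignment (equal length). Columns in
    which the reference has a gap are skipped; a gap in the query at a
    reference residue counts as a mismatch.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for ref_res, qry_res in zip(reference.upper(), query.upper()):
        if ref_res in GAP_CHARS:
            continue
        compared += 1
        identical += ref_res == qry_res  # bool adds as 0 or 1
    if compared == 0:
        raise ValueError("reference contains no residues")
    return 100.0 * identical / compared


SEQ_ID_NO_1 = "TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"

# Hypothetical variant: substitutions at 1-based positions 22, 23 and 30.
residues = list(SEQ_ID_NO_1)
for position, residue in ((22, "S"), (23, "Y"), (30, "F")):
    residues[position - 1] = residue
variant = "".join(residues)

identity = percent_identity(SEQ_ID_NO_1, variant)
print(round(identity, 1))  # 94.5
print(identity >= 90.0)    # True
```

Three mismatches over 55 aligned residues give 52/55, so this hypothetical variant would fall within the "90% or more" embodiment but not the "95% or more" one.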
[0054] Exemplary GB1 scaffold domain sequences from the Wellcome
Trust Sanger Institute Pfam database are shown in the following
sequence alignments:
B4U242_STREM/244-298 (SEQ ID NO: 227)
...S.YKLVIKGATFSGETATKAVDAAVAEQ.TFRDYANKNGVDGVWAYDAATKTFTVTE...
B4U242_STREM/316-370 (SEQ ID NO: 228)
....TYRLVIKGVTFSGETATKAVDAATAEQ.TFRQYANDNGITGEWAYDTATKTFTVTE...
C0MA37_STRE4/228-282 (SEQ ID NO: 229)
...S.YKLVIKGATFSGETATKAVDAAVAEQ.TFRDYANKNGVDGVWAYDAATKTFTVTE...
C0MA37_STRE4/300-354 (SEQ ID NO: 230)
....TYRLVIKGVTFSGETATKAVDAATAEQ.TFRQYANDNGVTGEWAYDAATKTFTVTE...
C0MCK9_STRS7/228-282 (SEQ ID NO: 231)
...S.YKLVIKGATFSGETATKAVDAAVAEQ.TFRDYANKNGVDGVWAYDAATKTFTVTE...
C0MCK9_STRS7/300-354 (SEQ ID NO: 232)
....TYRLVIKGVTFSGETSTKAVDAATAEQ.TFRQYANDNGVTGEWAYDAATKTFTVTE...
Q1JGB6_STRPD/117-137 (SEQ ID NO: 233)
ANIP........................AEK.AFRQYANDNGVDGV.................
Q53291_PEPMA/330-384 (SEQ ID NO: 234)
....TYKLILNGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...
Q53291_PEPMA/400-454 (SEQ ID NO: 235)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANDNGVDGVWTYDDATKTFTVTE...
Q53337_9STRE/3-57 (SEQ ID NO: 236)
....TYKLVINGKTLKGETTTKTVDAETAEK.AFKQYANDNGVDGVWTYDDATKTFTVTE...
Q53974_STRDY/258-312 (SEQ ID NO: 237)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANENGVDGVWTYDDATKTFTVTE...
Q53975_STRDY/224-278 (SEQ ID NO: 238)
....TYKLVVKGNTFSGETTTKAIDTATAEK.EFKQYATANNVDGEWSYDDATKTFTVTE...
Q53975_STRDY/294-348 (SEQ ID NO: 239)
....TYKLIVKGNTFSGETTTKAVDAETAEK.AFKQYATANNVDGEWSYDDATKTFTVTE...
Q53975_STRDY/364-418 (SEQ ID NO: 240)
....TYKLIVKGNTFSGETTTKAIDAATAEK.EFKQYATANGVDGEWSYDDATKTFTVTE...
Q53975_STRDY/434-488 (SEQ ID NO: 241)
....TYKLIVKGNTFSGETTTKAVDAETAEK.AFKQYANENGVYGEWSYDDATKTFTVTE...
Q53975_STRDY/504-558 (SEQ ID NO: 242)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANENGVDGVWTYDDATKTFTVTE...
Q54181_STRSG/1-45 (SEQ ID NO: 243)
..............MKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...
Q54181_STRSG/131-185 (SEQ ID NO: 244)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANDNGVDGVWTYDDATKTFTVTE...
Q54181_STRSG/61-115 (SEQ ID NO: 245)
....TYKLVINGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...
Q56192_STAXY/238-290 (SEQ ID NO: 246)
....TYKLILNGKTLKGETTTEAVDAATARSFNFPILENSSSVPGDPLESTCMH......VEH
Q56193_STAXY/238-293 (SEQ ID NO: 247)
....TYKLILNGKTLKGETTTEAVDAATARSFNFPILENSSSVPGDPLESTCRHASFAQA...
Q56212_STRSZ/228-282 (SEQ ID NO: 248)
...S.YKLVIKGATFSGETATKAVDAAVAEQ.TFRDYANKNGVDGVWAYDAATKTFTVTE...
Q56212_STRSZ/300-354 (SEQ ID NO: 249)
....TYRLVIKGVTFSGETATKAVDAATAEQ.AFRQYANDNGVTGEWAYDAATKTFTVTE...
Q76K19_STRSZ/232-286 (SEQ ID NO: 250)
...S.YKLVIKGATFSGETATKAVDAAVAEQ.TFRDYANKNGVDGVWAYDAATKTFTVTE...
Q76K19_STRSZ/304-358 (SEQ ID NO: 251)
....TYRLVIKGVTFSGETATKAVDAATAEQ.TFRQYANDNGITGEWAYDTATKTFTVTE...
Q93EM8_STRDY/224-278 (SEQ ID NO: 252)
....TYKLVVKGNTFSGETTTKAIDTATAEK.EFKQYATANNVDGEWSYDDATKTFTVTE...
Q93EM8_STRDY/294-348 (SEQ ID NO: 253)
....TYKLIVKGNTFSGETTTKAIDAATAEK.EFKQYATANNVDGEWSYDYATKTFTVTE...
Q93EM8_STRDY/364-418 (SEQ ID NO: 254)
....TYKLIVKGNTFSGETTTKAIDAATAEK.EFKQYATANNVDGEWSYDDATKTFTVTE...
Q93EM8_STRDY/434-488 (SEQ ID NO: 255)
....TYKLIVKGNTFSGETTTKAVDAETAEK.AFKQYATANNVDGEWSYDDATKTFTVTE...
Q93EM8_STRDY/504-558 (SEQ ID NO: 256)
....TYKLVINGKTLKGETTTKAVDVETAEK.AFKQYANENGVDGVWTYDDATKTFTVTE...
SPG1_STRSG/228-282 (SEQ ID NO: 257)
....TYKLILNGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...
SPG1_STRSG/298-352 (SEQ ID NO: 258)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANDNGVDGVWTYDDATKTFTVTE...
SPG2_STRSG/303-357 (SEQ ID NO: 259)
....TYKLILNGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...
SPG2_STRSG/373-427 (SEQ ID NO: 260)
....TYKLVINGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...
SPG2_STRSG/443-497 (SEQ ID NO: 261)
....TYKLVINGKTLKGETTTKAVDAETAEK.AFKQYANDNGVDGVWTYDDATKTFTVTE...
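In the alignment above, "." marks an alignment gap, and each entry name ends with the residue range of the domain in its source protein. The Python sketch below strips the gaps to recover each domain sequence and checks that the ungapped length matches the stated range. Its parsing rules are assumptions based only on the layout shown here, and the SEQ ID NO annotations are omitted from the name lines. Applied to three entries copied from the table, it confirms that SPG1_STRSG/228-282 is identical to SEQ ID NO: 1. The function name `parse_alignment` is hypothetical.

```python
import re

# Three entries copied from the alignment above (name line, aligned sequence).
ALIGNMENT = """\
SPG1_STRSG/228-282
....TYKLILNGKTLKGETTTEAVDAATAEK.VFKQYANDNGVDGEWTYDDATKTFTVTE...
Q56192_STAXY/238-290
....TYKLILNGKTLKGETTTEAVDAATARSFNFPILENSSSVPGDPLESTCMH......VEH
Q1JGB6_STRPD/117-137
ANIP........................AEK.AFRQYANDNGVDGV.................
"""

# Entry names look like ACCESSION_SPECIES/start-end.
NAME_RE = re.compile(r"^(?P<entry>\S+)/(?P<start>\d+)-(?P<end>\d+)$")


def parse_alignment(text: str) -> dict[str, str]:
    """Return {name: ungapped sequence}, checking lengths against the ranges."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    sequences = {}
    for name, aligned in zip(lines[0::2], lines[1::2]):
        match = NAME_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected entry name: {name!r}")
        ungapped = aligned.replace(".", "")  # drop alignment gaps
        expected = int(match["end"]) - int(match["start"]) + 1
        if len(ungapped) != expected:
            raise ValueError(
                f"{name}: {len(ungapped)} residues, range implies {expected}"
            )
        sequences[name] = ungapped
    return sequences


SEQ_ID_NO_1 = "TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"

sequences = parse_alignment(ALIGNMENT)
print(sequences["SPG1_STRSG/228-282"] == SEQ_ID_NO_1)  # True
```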
[0055] In some embodiments, the GB1 scaffold domain is described by
the following sequence:
(T/S)Y(K/R)L(Z1)(Z1)(N/K)G(K/N/V/A)T(L/F)(K/S)GET(T/A/S)T(K/E)(A/T)(V/I)D(A/T/V)(A/E)(T/V)AE(K/Q)(A/E/T/V)F(K/R)(Q/D)YA(N/T)(A/D/E/K)N(G/N)(Z3)(D/T)G(E/V)W(A/T/S)YD(D/A/Y/T)ATKT(Z1)T(Z1)TE (SEQ ID NO:262), where
each Z1 is independently a hydrophobic residue and Z3 is a
non-aromatic hydrophobic residue. In some
embodiments, the GB1 scaffold domain is described by the following
sequence:
(T/S)Y(K/R)L(I/V)(L/I/V)(N/K)G(K/N/V/A)T(L/F)(K/S)GET(T/A/S)T(K/E)(A/T)(V/I)D(A/T/V)(A/E)(T/V)AE(K/Q)(A/E/T/V)F(K/R)(Q/D)YA(N/T)(A/D/E/K)N(G/N)(V/I)(D/T)G(E/V)W(A/T/S)YD(D/A/Y/T)ATKTFTVTE (SEQ ID NO:263). In
certain embodiments, the GB1 scaffold domain is described by the
following sequence:
TYKL(I/V)(L/I/V)(N/K)G(K/N)T(L/F)(K/S)GET(T/A)T(K/E)AVD(A/T/V)(A/E)TAE(K/Q)(A/E/T/V)F(K/R)QYA(N/T)(A/D/E/K)N(G/N)VDG(E/V)W(A/T/S)YD(D/A)ATKTFTVTE
(SEQ ID NO:264). A mutation in a scaffold domain may include a
deletion, insertion, or substitution of an amino acid residue at
any convenient position to produce a sequence that is distinct from
the reference scaffold domain sequence.
[0056] In some embodiments, the GB1 scaffold domain is described by
the following sequence:
T(Z2)K(Z1)(Z1)(Z1)(N/V)(G/L/I)(K/G)(Q/T/D)(L/A/R)(K/V)(G/E/V)(E/V)(A/T/R/I/P/V)(T/I)(R/W/L/K/V/T/I)E(A/L/I)VDA(A/G)(T/E)(A/V/F)EK(V/I/Y)(F/L/W/I/A)K(L/Q)(Z1)(Z3)N(A/D)(K/N)(T/G)(V/I)(E/D)G(V/E)(W/F)TY(D/K)D(E/A)(T/I)KT(Z1)T(Z1)TE
(SEQ ID NO:265), where each Z1 is independently a hydrophobic
residue, Z2 is an aromatic hydrophobic residue, and Z3 is a
non-aromatic hydrophobic residue.
[0057] In some embodiments, the GB1 scaffold domain is described by
the following sequence:
(SEQ ID NO: 266)
T(Y/F/W/A)K(L/V/I/M/F/Y/A)(L/V/I/F/M)(L/V/I/F/M/A/Y/S)(N/V)(G/L/I)(K/G)(Q/T/D)(L/A/R)(K/V)(G/E/V)(E/V)(A/T/R/I/P/V)(T/I)(R/W/L/K/V/T/I)E(A/L/I)VDA(A/G)(T/E)(A/V/F)EK(V/I/Y)(F/L/W/I/A)K(L/Q)(W/F/L/M/Y/I)(L/V/I/A)N(A/D)(K/N)(T/G)(V/I)(E/D)G(V/E)(W/F)TY(D/K)D(E/A)(T/I)KT(L/V/I/F/M/W)T(L/V/I/F/M)TE.
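The degenerate notation used in SEQ ID NOs: 262-266, where "(A/B/C)" means that any one of the listed residues may occupy that position, maps directly onto regular-expression character classes. The Python sketch below is an illustration rather than part of the specification. It converts the fully specified patterns of SEQ ID NO: 264 and SEQ ID NO: 266 and checks that SEQ ID NO: 1 satisfies both. It rejects placeholders such as Z1, Z2 and Z3, because their residue sets are defined only in words. The function name `pattern_to_regex` is hypothetical.

```python
import re

# Degenerate scaffold patterns copied from SEQ ID NO: 264 and SEQ ID NO: 266.
SEQ_ID_264 = (
    "TYKL(I/V)(L/I/V)(N/K)G(K/N)T(L/F)(K/S)GET(T/A)T(K/E)AVD(A/T/V)(A/E)TAE(K/Q)"
    "(A/E/T/V)F(K/R)QYA(N/T)(A/D/E/K)N(G/N)VDG(E/V)W(A/T/S)YD(D/A)ATKTFTVTE"
)
SEQ_ID_266 = (
    "T(Y/F/W/A)K(L/V/I/M/F/Y/A)(L/V/I/F/M)(L/V/I/F/M/A/Y/S)(N/V)(G/L/I)(K/G)(Q/T/D)"
    "(L/A/R)(K/V)(G/E/V)(E/V)(A/T/R/I/P/V)(T/I)(R/W/L/K/V/T/I)E(A/L/I)VDA(A/G)(T/E)"
    "(A/V/F)EK(V/I/Y)(F/L/W/I/A)K(L/Q)(W/F/L/M/Y/I)(L/V/I/A)N(A/D)(K/N)(T/G)(V/I)(E/D)"
    "G(V/E)(W/F)TY(D/K)D(E/A)(T/I)KT(L/V/I/F/M/W)T(L/V/I/F/M)TE"
)

# "(A/B/C)" -> captures "A/B/C"
GROUP_RE = re.compile(r"\(([A-Z](?:/[A-Z])*)\)")


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate '(A/B/C)' groups into '[ABC]'; other letters stay literal."""
    if re.search(r"Z\d", pattern):
        raise ValueError("placeholders such as Z1 need an explicit residue set")
    body = GROUP_RE.sub(lambda m: "[" + m.group(1).replace("/", "") + "]", pattern)
    if not re.fullmatch(r"[A-Z\[\]]+", body):
        raise ValueError(f"unparsed characters remain in {pattern!r}")
    return re.compile(body)


SEQ_ID_NO_1 = "TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"

for name, pattern in (("SEQ ID NO: 264", SEQ_ID_264), ("SEQ ID NO: 266", SEQ_ID_266)):
    print(name, bool(pattern_to_regex(pattern).fullmatch(SEQ_ID_NO_1)))  # True
```

Using `fullmatch` makes each pattern act as a fixed-length, 55-position template. A deletion or insertion relative to the scaffold would therefore fail to match even if every remaining residue is allowed.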
[0058] The subject libraries are designed to maximize diversity
while minimizing structural perturbations of the
GB1 scaffold domain. The positions to be mutated are selected to
ensure that the GB1 peptidic compounds of the subject libraries can
maintain a folded state under physiological conditions. Another
aspect of generating diversity in the subject libraries is the
selection of amino acid positions to be mutated such that the amino
acids can form a potential binding surface in the GB1 scaffold
domain, whether or not the residues actually contact a target
protein. One way of determining whether an amino acid position is
part of a potential binding surface involves examining the three
dimensional structure of the GB1 scaffold domain, using a computer
program such as the UCSF Chimera program. Other ways include
crystallographic and genetic mutational analysis. Any convenient
method may be used to determine whether an amino acid position is
part of a potential binding surface.
[0059] The mutations may be found at positions in the GB1 scaffold
domain where the amino acid residue is at least in part solvent
exposed. Solvent exposed positions can be determined using software
suitable for protein modeling and three-dimensional structural
information obtained from a crystal structure. For example, solvent
exposed residues may be determined using the Protein Data Bank
(PDB) structure 3GB1 and estimating the solvent accessible surface
area (SASA) for each residue using the GETarea tool (Fraczkiewicz
& Braun, "Exact and efficient analytical calculation of the
accessible surface areas and their gradients for macromolecules,"
J. Comput. Chem. 1998, 19, 319-333). This tool calculates, for each
residue, the ratio of its SASA in the structure to its SASA in a
random coil. A ratio cutoff of 0.4 was used in selecting the following solvent accessible residues
(shown in bold):
TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE (SEQ ID
NO:1).
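The selection rule of [0059] reduces to a threshold on each residue's ratio of solvent accessible surface area in the folded structure to that in a random coil. The Python sketch below makes several assumptions. The ratios must already have been computed (for example, with GETarea) and are supplied as a mapping from residue number to ratio. A ratio exactly equal to 0.4 is treated as exposed. Upper versus lower case stands in for the bold highlighting, which plain text cannot show. The ratio values in the usage example are hypothetical placeholders, not GETarea output for the 3GB1 structure, and both function names are illustrative.

```python
def solvent_exposed_positions(ratios: dict[int, float], cutoff: float = 0.4) -> list[int]:
    """Return residue numbers whose structure/random-coil SASA ratio meets the cutoff."""
    for position, ratio in ratios.items():
        if ratio < 0:
            raise ValueError(f"negative SASA ratio at position {position}")
    return sorted(pos for pos, ratio in ratios.items() if ratio >= cutoff)


def mark_positions(sequence: str, positions: list[int]) -> str:
    """Upper-case the selected 1-based positions and lower-case all others."""
    selected = set(positions)
    return "".join(
        res.upper() if i in selected else res.lower()
        for i, res in enumerate(sequence, start=1)
    )


# Hypothetical ratios for the first six residues of SEQ ID NO: 1 (placeholders only).
example_ratios = {1: 0.62, 2: 0.18, 3: 0.55, 4: 0.05, 5: 0.41, 6: 0.02}
exposed = solvent_exposed_positions(example_ratios)
print(exposed)                            # [1, 3, 5]
print(mark_positions("TYKLIL", exposed))  # TyKlIl
```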
[0060] The mutations of the GB1 scaffold domain may be concentrated
at one of several different potential binding surfaces of the
scaffold domain. Several distinct arrangements of mutations of the
GB1 scaffold domain at non-core positions of the
hairpin-helix-hairpin scaffold domain are provided. In some
instances, the majority of the mutations are at non-core positions
of the GB1 scaffold domain (e.g., solvent exposed or boundary
positions); however, in some cases one or more mutations may be
located at hydrophobic core positions. In certain embodiments,
mutations at hydrophobic core positions may be tolerated without
significantly disrupting the GB1 scaffold structure, such as when
those core mutations are located in a loop region. In such cases,
the loop region may form a structure or conformation that is
different from that of the parent scaffold.
[0061] In certain embodiments, the GB1 scaffold may have loop
regions that are independently selected from any one of the loop
sequences set forth in Table 1 (SEQ ID NOs: 67-196 and 267-272).
Any of the loop sequences 1-4 of Table 1 may be incorporated at the
positions indicated in Table 1 into any convenient GB1 scaffold
domain (e.g., SEQ ID NO: 1) to produce another GB1 scaffold
domain.
[0062] In certain embodiments, mutations at boundary positions may
also be tolerated without significantly disrupting the GB1 scaffold
structure. Mutations at such positions may confer desirable
properties upon the resulting GB1 compound variants, such as
stability, a certain structural property, or specific binding to a
target molecule.
[0063] The positions of the mutations in the GB1 scaffold domain
may be described herein either by reference to a structural motif
or region, or by reference to a position number in the primary
sequence of the scaffold domain. FIG. 3 illustrates the alignment
of the position numbering scheme for a GB1 scaffold domain relative
to its .beta.1, .beta.2, .alpha.1, .beta.3 and .beta.4 motifs, and
relative to the mutations of certain libraries of the invention.
Positions marked with an asterisk indicate exemplary positions at
which mutations that include the insertion of one or more amino
acids may be included. Any GB1 scaffold domain sequence may be
substituted for the scaffold sequence depicted in FIG. 3, and the
positions of the mutations that define a subject library may be
transferred from one scaffold to another by any convenient method.
For example, a sequence alignment method may be used to place any
GB1 scaffold domain sequence within the framework of the position
numbering scheme illustrated in FIG. 3. Alignment methods based on
structural motifs such as .beta.-strands and .alpha.-helices may
also be used to place a GB1 scaffold domain sequence within the
framework of the position numbering scheme illustrated in FIG.
3.
[0064] In some cases, a first GB1 scaffold domain sequence may be
aligned with a second GB1 scaffold domain sequence that is one or
more amino acids longer or shorter. For example, the second GB1
scaffold domain may have one or more additional amino acids at the
N-terminus or C-terminus relative to the first GB1 scaffold, or may
have one or more additional amino acids in one of the loop regions
of the structure. In such cases, a numbering scheme such as is
described below for insertion mutations may be used to relate two
scaffold domain sequences.
[0065] Another aspect of the diversity of the subject libraries is
the size of the library, i.e., the number of distinct compounds of
the library. In some embodiments, a subject library includes 50 or
more distinct compounds, such as 100 or more, 300 or more,
1.times.10.sup.3 or more, 1.times.10.sup.4 or more,
1.times.10.sup.5 or more, 1.times.10.sup.6 or more,
1.times.10.sup.7 or more, 1.times.10.sup.8 or more,
1.times.10.sup.9 or more, 1.times.10.sup.10 or more,
1.times.10.sup.11 or more, or 1.times.10.sup.12 or more, distinct
compounds.
[0066] A subject library may include GB1 peptidic compounds each
having a hairpin-helix-hairpin scaffold domain described by formula
(I):
P1-.alpha.1-P2 (I)
[0067] where P1 and P2 are independently beta-hairpin domains and
.alpha.1 is a helix domain and P1, .alpha.1 and P2 are connected
independently by linking sequences of between 1 and 10 residues in
length. In some embodiments, in formula (I), P1 is .beta.1-.beta.2
and P2 is .beta.3-.beta.4 such that the compounds are described by
formula (II):
.beta.1-.beta.2-.alpha.1-.beta.3-.beta.4 (II)
[0068] where .beta.1, .beta.2, .beta.3 and .beta.4 are
independently beta-strand domains and .alpha.1 is a helix domain,
and .beta.1, .beta.2, .alpha.1, .beta.3 and .beta.4 are connected
independently by linking sequences of between 1 and 10 residues in
length, such as, between 2 and 8 residues, or between 3 and 6
residues in length. In certain embodiments, each linking sequence
is independently of 3, 4, 5, 6, 7 or 8 residues in length, such as
4 or 5 residues in length.
[0069] In certain embodiments, the linking sequences may form a
loop or a turn structure. For example, the two antiparallel
.beta.-strands of a hairpin motif may be connected via a loop.
Mutations in a linking sequence that includes insertion or deletion
of one or more amino acid residues may be tolerated without
significantly disrupting the GB1 scaffold structure. In some
embodiments, in formulas (I) and (II), each compound of the subject
library includes mutations in one or more linking sequences. In
certain embodiments, 80% or more, 90% or more, 95% or more, or even
100% of the mutations are at positions within the regions of the
linking sequences. In certain embodiments, in formulas (I) and
(II), at least one of the linking sequences is one or more (e.g.,
2 or more) residues longer in length than the corresponding
linking sequence of the GB1 scaffold. In certain embodiments, in
formulas (I) and (II), at least one of the linking sequences is one
or more residues shorter in length than the corresponding linking
sequence of the GB1 scaffold.
[0070] In some embodiments, one or more positions in the scaffold
may be selected as positions at which to include insertion
mutations, e.g., mutations that include the insertion of 1 or 2
amino acid residues in addition to the amino acid
residue being substituted. In certain embodiments, the insertion
mutations are selected for inclusion in one or more loop regions,
or at the N-terminus or C-terminus of the scaffold. The positions
of the variant amino acids that are inserted may be referred to
using a letter designation with respect to the numbered position of
the mutation, e.g., an insertion mutation of 2 amino acids at
position 38 may be referred to as positions 38a and 38b.
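The lettering convention described above, in which a two-residue insertion at position 38 yields positions 38a and 38b, can be generated mechanically. The Python sketch below rests on two assumptions. The 55-residue scaffold length follows the numbering of SEQ ID NO: 1, and inserted residues are placed directly after the substituted residue they are lettered from. The function name `position_labels` is hypothetical.

```python
def position_labels(scaffold_length: int, insertions: dict[int, int]) -> list[str]:
    """Label residue positions, lettering inserted residues after their parent position.

    `insertions` maps a 1-based scaffold position to the number of residues
    inserted after the substituted residue at that position, e.g. {38: 2}
    yields ..., "38", "38a", "38b", "39", ...
    """
    letters = "abcdefghijklmnopqrstuvwxyz"
    for position in insertions:
        if not 1 <= position <= scaffold_length:
            raise ValueError(f"insertion position {position} is outside the scaffold")
    labels = []
    for position in range(1, scaffold_length + 1):
        labels.append(str(position))
        extra = insertions.get(position, 0)
        if not 0 <= extra <= len(letters):
            raise ValueError(f"unsupported insertion size {extra} at position {position}")
        labels.extend(f"{position}{letters[i]}" for i in range(extra))
    return labels


labels = position_labels(55, {38: 2})
print(len(labels))    # 57
print(labels[36:41])  # ['37', '38', '38a', '38b', '39']
```

The same function covers the combinations listed in [0071], e.g. `position_labels(55, {9: 2, 38: 1, 55: 1})` for one member of a library with insertions at positions 9, 38 and 55.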
[0071] In certain embodiments, the subject library includes a
mutation at position 38 that includes insertion of 0, 1 or 2
variant amino acids. In certain embodiments, the subject library
includes a mutation at position 19 that includes insertion of 0, 1
or 2 variant amino acids. In certain embodiments, the subject
library includes a mutation at position 1 that includes insertion
of 2 variant amino acids, and at positions 19 and 47 that each
include insertion of 0, 1 or 2 variant amino acids. In certain
embodiments, the subject library includes mutations at positions 9
and 38 that each include insertion of 0, 1 or 2 variant amino
acids, and at position 55 that includes insertion of 1 variant
amino acid. In certain embodiments, the subject library includes a
mutation at position 9 that includes insertion of 0, 1 or 2 variant
amino acids, and at position 55 that includes insertion of 1
variant amino acid. In certain embodiments, the subject library
includes a mutation at position 1 that includes insertion of 1
variant amino acid and at position 47 that includes insertion of 0,
1 or 2 variant amino acids.
[0072] In some cases, when an insertion mutation (e.g., insertion
of one or more additional variant amino acids) is made in a GB1
scaffold, the resulting GB1 compound variants may be aligned with
the parent GB1 scaffold in different ways. For example, an
insertion mutation including 2 additional variant amino acids at
position 38 of the GB1 scaffold may lead to GB1 compound variants
where the loop regions between the .alpha.1 and .beta.3 regions can be
aligned with the GB1 scaffold domain in two or more distinct ways.
In other words, the resulting GB1 compounds may encompass various
distinct loop sequences and/or structures that align differently
with the parent GB1 scaffold domain. In some cases, the various
distinct loop sequences are produced when the insertion mutation is
in a variable loop region (e.g. where most of the loop region is
being mutated).
[0073] In some embodiments, each compound of a subject library
includes 4 or more, such as, 5 or more, 6 or more, 7 or more, 8 or
more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14
or more, or 15 or more mutations at different positions of a
hairpin-helix-hairpin scaffold domain. The mutations may involve
the deletion, insertion, or substitution of the amino acid residue
at the position of the scaffold being mutated. The mutations may
include substitution with any naturally or non-naturally occurring
amino acid, or an analog thereof.
[0074] In some embodiments, each compound of a subject library
includes 3 or more different non-core mutations, such as, 4 or
more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or
more, 11 or more, or 12 or more different non-core mutations in a
region outside of the .beta.1-.beta.2 region.
[0075] In some embodiments, each compound of a subject library
includes 3 or more different non-core mutations, such as, 4 or
more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or
more or 11 or more different non-core mutations in the .alpha.1
region.
[0076] In some embodiments, each compound of a subject library
includes 3 or more different non-core mutations, such as 4 or more,
5 or more, 6 or more, 7 or more, 8 or more, 9 or more or 10 or more
different non-core mutations in the .beta.3-.beta.4 region.
[0077] In some embodiments, each compound of a subject library
includes 5 or more different non-core mutations, such as 6 or more,
7 or more, 8 or more, 9 or more, 10 or more, 11 or more, or 12 or
more different non-core mutations in the .alpha.1-.beta.3
region.
[0078] In certain embodiments, each compound of a subject library
includes ten or more different mutations, where the ten or more
different mutations are located at positions selected from the
group consisting of positions 21-24, 26, 27, 30, 31, 34, 35,
37-41.
[0079] In certain embodiments, each compound of a subject library
includes ten or more different mutations, where the ten or more
different mutations are located at positions selected from the
group consisting of positions 18-24, 26-28, 30-32, 34 and 35.
[0080] In certain embodiments, each compound of a subject library
includes ten or more different mutations, where the ten or more
different mutations are located at positions selected from the
group consisting of positions 1, 18-24 and 45-49. In certain
embodiments, each compound of a subject library includes ten or
more different mutations, where the ten or more different mutations
are located at positions selected from the group consisting of
positions 7-12, 36-41, 54 and 55.
[0081] In certain embodiments, each compound of a subject library
includes ten or more different mutations, where the ten or more
different mutations are located at positions selected from the
group consisting of positions 3, 5, 7-14, 16, 52, 54 and 55.
[0082] In certain embodiments, each compound of a subject library
includes ten or more different mutations, where the ten or more
different mutations are located at positions selected from the
group consisting of positions 1, 3, 5, 7, 41, 43, 45-50, 52 and
54.
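Whether a given compound satisfies one of the position-based embodiments of [0078]-[0082] can be checked mechanically. The Python sketch below handles substitutions only: it assumes the variant and the reference have equal length, so insertion mutations are out of scope. It uses SEQ ID NO: 1 as the reference numbering. The ten serine substitutions in the usage example are hypothetical, and the function names are illustrative rather than part of the specification.

```python
def parse_positions(spec: str) -> set[int]:
    """Expand a position list such as '21-24, 26, 37-41' or '52 and 54' into integers."""
    positions = set()
    for token in spec.replace(" and ", ",").split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            start, end = (int(part) for part in token.split("-"))
            positions.update(range(start, end + 1))  # ranges are inclusive
        else:
            positions.add(int(token))
    return positions


def mutated_positions(reference: str, variant: str) -> set[int]:
    """1-based positions at which an equal-length variant differs from the reference."""
    if len(reference) != len(variant):
        raise ValueError("substitution-only comparison needs equal lengths")
    return {i for i, (a, b) in enumerate(zip(reference, variant), start=1) if a != b}


def meets_embodiment(reference: str, variant: str, spec: str, minimum: int = 10) -> bool:
    """True if at least `minimum` mutated positions fall within the listed positions."""
    hits = mutated_positions(reference, variant) & parse_positions(spec)
    return len(hits) >= minimum


SEQ_ID_NO_1 = "TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"
SPEC_0078 = "21-24, 26, 27, 30, 31, 34, 35, 37-41"
SPEC_0081 = "3, 5, 7-14, 16, 52, 54 and 55"

# Hypothetical variant: serine at ten positions, none of which is S in SEQ ID NO: 1.
residues = list(SEQ_ID_NO_1)
for pos in (21, 22, 23, 24, 26, 27, 30, 31, 34, 35):
    residues[pos - 1] = "S"
variant = "".join(residues)

print(meets_embodiment(SEQ_ID_NO_1, variant, SPEC_0078))  # True
print(meets_embodiment(SEQ_ID_NO_1, variant, SPEC_0081))  # False
```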
[0083] In certain embodiments, each compound of a subject library
includes five or more different mutations in the .alpha.1 region.
In certain embodiments, five or more different mutations are
located at positions selected from the group consisting of
positions 22-24, 26, 27, 30, 31, 34 and 35.
[0084] In certain embodiments, each compound of a subject library
includes ten or more different mutations in the .alpha.1 region. In
certain embodiments, the ten or more different mutations are
located at positions selected from the group consisting of
positions 22-24, 26, 27, 28, 30, 31, 32, 34 and 35.
[0085] In certain embodiments, each compound of a subject library
includes three or more different mutations in the .beta.3-.beta.4
region. In certain embodiments, the three or more different
mutations are located at positions selected from the group
consisting of positions 41, 54 and 55. In certain embodiments, the
three or more different mutations are located at positions selected
from the group consisting of positions 52, 54 and 55.
[0086] In certain embodiments, each compound of a subject library
includes five or more different mutations in the .beta.3-.beta.4
region. In certain embodiments, the five or more different
mutations are located at positions selected from the group
consisting of positions 45-49. In certain embodiments, each
compound of a subject library includes nine or more different
mutations in the .beta.3-.beta.4 region. In certain embodiments,
the nine or more different mutations are located at positions
selected from the group consisting of positions 41, 43, 45-50, 52
and 54.
[0087] In certain embodiments, each compound of a subject library
includes two or more different mutations in the region between the
.alpha.1 and .beta.3 regions, e.g., mutations in the linking sequence
between .alpha.1 and .beta.3. In certain embodiments, the two or more
different mutations are located at positions selected from the
group consisting of positions 37-40.
[0088] In certain embodiments, each compound of a subject library
includes three or more, four or more, five or more, six or more, or
ten or more different mutations in the .beta.1-.beta.2 region. In
certain embodiments, the ten or more different mutations in the
.beta.1-.beta.2 region are located at positions selected from the
group consisting of positions 3, 5, 7-14 and 16.
[0089] In some embodiments, each compound of a subject library is
described by a formula independently selected from the group
consisting of:
F1-V1-F2 (III);
F3-V2-F4 (IV);
V3-F5-V4-F6-V5-F7 (V);
F8-V6-F9-V7-F10-V8 (VI);
V9-F11-V10 (VII); and
V11-F12-V12 (VIII)
[0090] where F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11 and F12
are fixed regions and V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11
and V12 are variable regions;
[0091] where each fixed region is common to all compounds of the
same formula and each compound of the library has a distinct
variable region.
[0092] In certain embodiments, each compound of a subject library
is described by formula (III), where:
[0093] F1 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to the amino acid sequence TYKLILNGKTLKGETTTEA (SEQ ID NO:
2);
[0094] F2 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence TYDDATKTFTVTE (SEQ ID NO: 3);
and
[0095] V1 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to the amino acid sequence VDAATAEKVFKQYANDNGVDGEW (SEQ ID
NO: 4), where each compound of the library comprises 10 or more
mutations (e.g., 11, 12, 13, 14 or 15 or more mutations) in the V1
variable region.
[0096] In certain embodiments, in formula (III), V1 comprises a
sequence of the following formula: VXXXXAXXVFXXYAXXNXXXXXW (SEQ ID
NO: 5), where each X is a variant amino acid.
[0097] In certain embodiments, in formula (III), F1 comprises the
sequence TYKLILNGKTLKGETTTEA (SEQ ID NO: 2), F2 comprises the
sequence TYDDATKTFTVTE (SEQ ID NO: 3), and V1 comprises a sequence
of the following formula: VXXXXAXXVFXXYAXXNXXXXXW (SEQ ID NO: 6)
where each X is independently selected from the group consisting of
A, D, F, S, V and Y.
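One way to make the formula (III) template concrete is to compile it into a regular expression and test candidate V1 sequences against it. The Python sketch below is illustrative only: the function and constant names are hypothetical, and it assumes that the optional insertion at position 19 of V1 (paragraph [0098]) places the variant residue plus up to two additional variant residues at that site, all drawn from the A, D, F, S, V and Y set of paragraph [0097].

```python
import re

VARIANT_ALPHABET = "ADFSVY"  # restricted variant set named in paragraph [0097]


def template_to_regex(template, insertion_sites=(), alphabet=VARIANT_ALPHABET):
    """Compile a template such as 'VXXXXAXXVFXXYAXXNXXXXXW' into a regex.

    'X' matches one residue from `alphabet`; any other letter must match exactly.
    Positions listed in `insertion_sites` (1-based) are assumed to accept the
    variant residue plus 0, 1 or 2 inserted variant residues (1 to 3 in total).
    """
    cls = f"[{alphabet}]"
    parts = []
    for pos, ch in enumerate(template, start=1):
        if ch == "X":
            parts.append(cls + "{1,3}" if pos in insertion_sites else cls)
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


V1_TEMPLATE = "VXXXXAXXVFXXYAXXNXXXXXW"
V1_PATTERN = template_to_regex(V1_TEMPLATE, insertion_sites={19})


def is_formula_iii_member(v1_region):
    """True if a candidate V1 region fits the restricted formula (III) template."""
    return V1_PATTERN.fullmatch(v1_region) is not None
```

Note that the wild-type V1 sequence VDAATAEKVFKQYANDNGVDGEW does not match, because it carries residues such as T and K at X positions that lie outside the restricted set.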
[0098] In certain embodiments, in formula (III), the mutation at
position 19 of V1 includes insertion of 0, 1 or 2 variant amino
acids.
[0099] In certain embodiments, each compound of a subject library
is described by formula (IV), where:
[0100] F3 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to the amino acid sequence TYKLILNGKTLKGETT (SEQ ID NO:
7);
[0101] F4 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence GVDGEWTYDDATKTFTVTE (SEQ ID NO:
8); and
[0102] V2 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to the amino acid sequence TEAVDAATAEKVFKQYANDN (SEQ ID
NO: 9), where each compound of the library comprises 10 or more
mutations (e.g., 11, 12, 13, 14 or 15 or more mutations) in the V2
variable region.
[0103] In certain embodiments, in formula (IV), V2 comprises a
sequence of the formula: TXXXXXXXAXXXFXXXAXXN (SEQ ID NO: 10),
where each X is a variant amino acid.
[0104] In certain embodiments, in formula (IV), F3 comprises the
sequence TYKLILNGKTLKGETT (SEQ ID NO: 7), F4 comprises the sequence
GVDGEWTYDDATKTFTVTE (SEQ ID NO: 8), and V2 comprises a sequence of
the formula: TXXXXXXXAXXXFXXXAXXN (SEQ ID NO: 11) where each X is
independently selected from the group consisting of A, D, F, S, V
and Y.
[0105] In certain embodiments, in formula (IV), the mutation at
position 3 of V2 includes insertion of 0, 1 or 2 variant amino
acids.
[0106] In certain embodiments, each compound of a subject library
is described by formula (V), where:
[0107] F5 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to the amino acid sequence KLILNGKTLKGETT (SEQ ID NO:
12);
[0108] F6 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence EKVFKQYANDNGVDGEWT (SEQ ID NO:
13);
[0109] F7 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence FTVTE (SEQ ID NO: 14);
[0110] V3 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence TY;
[0111] V4 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence TEAVDAATA (SEQ ID NO: 15);
and
[0112] V5 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence YDDATKT (SEQ ID NO: 16);
[0113] where each compound of the library comprises one or more
mutations in the V3 variable region, 3 or more mutations (e.g., 4,
5, 6 or 7 or more mutations) in the V4 variable region, and 3 or
more mutations (e.g., 4 or 5 or more mutations) in the V5 variable
region.
[0114] In certain embodiments, in formula (V), V3 comprises a
sequence of the formula XY, V4 comprises a sequence of the formula
TXXXXXXXA (SEQ ID NO: 17), and V5 comprises a sequence of the
formula YXXXXXT (SEQ ID NO: 18) where each X is a variant amino
acid.
[0115] In certain embodiments, in formula (V), F5 comprises the
sequence KLILNGKTLKGETT (SEQ ID NO: 12), F6 comprises the sequence
EKVFKQYANDNGVDGEWT (SEQ ID NO: 13), F7 comprises the sequence FTVTE
(SEQ ID NO: 14), V3 comprises a sequence of the formula XY, V4
comprises a sequence of the formula TXXXXXXXA (SEQ ID NO: 19), and
V5 comprises a sequence of the formula YXXXXXT (SEQ ID NO: 20)
where each X is independently selected from the group consisting of
A, D, F, S, V and Y.
[0116] In certain embodiments, in formula (V), the mutation at
position 1 of V3 includes insertion of 2 variant amino acids, and
the mutations at positions 3 and 4 of V4 and V5, respectively, each
include insertion of 0, 1 or 2 variant amino acids.
[0117] In certain embodiments, each compound of a subject library
is described by formula (VI), where:
[0118] F8 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to the amino acid sequence TYKLI (SEQ ID NO: 21);
[0119] F9 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to the amino acid sequence ETTTEAVDAATAEKVFKQYAN (SEQ ID
NO: 22);
[0120] F10 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to the amino acid sequence TYDDATKTFT (SEQ ID NO: 23);
[0121] V6 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence LNGKTLKG (SEQ ID NO: 24);
[0122] V7 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence DNGVDGEW (SEQ ID NO: 25); and
[0123] V8 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence VTE;
[0124] where each compound of the library comprises 3 or more
mutations (e.g., 4, 5 or 6 or more mutations) in the V6 variable
region, 3 or more mutations (e.g., 4, 5 or 6 or more mutations) in
the V7 variable region, and one or more mutations (e.g., 2 or more
mutations) in the V8 variable region.
[0125] In certain embodiments, in formula (VI), V6 comprises a
sequence of the formula LXXXXXXG (SEQ ID NO: 26), V7 comprises a
sequence of the formula DXXXXXXW (SEQ ID NO: 27), and V8 comprises
a sequence of the formula VXX where each X is a variant amino
acid.
[0126] In certain embodiments, in formula (VI), F8 comprises the
sequence TYKLI (SEQ ID NO: 21), F9 comprises the sequence
ETTTEAVDAATAEKVFKQYAN (SEQ ID NO: 22), F10 comprises the sequence
TYDDATKTFT (SEQ ID NO: 23), V6 comprises a sequence of the formula
LXXXXXXG (SEQ ID NO: 28), V7 comprises a sequence of the formula
DXXXXXXW (SEQ ID NO: 29), and V8 comprises a sequence of the
formula VXX where each X is independently selected from the group
consisting of A, D, F, S, V and Y.
[0127] In certain embodiments, in formula (VI), the mutations at
position 4 of V6 and V7 each include insertion of 0, 1 or 2 variant
amino acids, and the mutation at position 3 of V8 includes
insertion of 1 variant amino acid.
[0128] In certain embodiments, each compound of a subject library
is described by formula (VII), where:
[0129] F11 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence
EAVDAATAEKVFKQYANDNGVDGEWTYDDATKT (SEQ ID NO: 30);
[0130] V9 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence TYKLILNGKTLKGETTT (SEQ ID NO:
31); and
[0131] V10 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to an amino acid sequence FTVTE (SEQ ID NO: 32);
[0132] where each compound of the library comprises 6 or more
mutations (e.g., 7, 8, 9, 10 or 11 or more mutations) in the V9
variable region, and 2 or more mutations (e.g., 3 or more
mutations) in the V10 variable region.
[0133] In certain embodiments, in formula (VII), V9 comprises a
sequence of the formula TYXLXLXXXXXXXXTXT (SEQ ID NO: 33), and V10
comprises a sequence of the formula FXVXX (SEQ ID NO: 34), where
each X is a variant amino acid.
[0134] In certain embodiments, in formula (VII), F11 comprises the
sequence EAVDAATAEKVFKQYANDNGVDGEWTYDDATKT (SEQ ID NO: 30), V9
comprises a sequence of the formula TYXLXLXXXXXXXXTXT (SEQ ID NO:
35), and V10 comprises a sequence of the formula FXVXX (SEQ ID NO:
36), where each X is independently selected from the group
consisting of A, D, F, S, V and Y.
[0135] In certain embodiments, in formula (VII), the mutation at
position 9 of V9 includes insertion of 0, 1 or 2 variant amino
acids, and the mutation at position 5 of V10 includes insertion of
1 variant amino acid.
[0136] In certain embodiments, each compound of a subject library
is described by formula (VIII), where:
[0137] F12 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to the amino acid sequence KTLKGETTTEAVDAATAEKVFKQYANDNGVD
(SEQ ID NO: 37);
[0138] V11 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to the amino acid sequence TYKLILNG (SEQ ID NO: 38); and
[0139] V12 comprises a sequence having 60% or more (e.g., 70% or
more, 80% or more, 90% or more, or 95% or more) amino acid sequence
identity to the amino acid sequence GEWTYDDATKTFTVTE (SEQ ID NO:
39);
[0140] where each compound of the library comprises 3 or more
mutations (e.g., 4 or more mutations) in the V11 variable region,
and 5 or more mutations (e.g., 6, 7, 8, 9 or 10 or more mutations)
in the V12 variable region.
[0141] In certain embodiments, in formula (VIII), V11 comprises a
sequence of the formula XYXLXLXG (SEQ ID NO: 40), and V12 comprises
a sequence of the formula GXWXYXXXXXXFXVXE (SEQ ID NO: 41), where
each X is a variant amino acid.
[0142] In certain embodiments, in formula (VIII), F12 comprises the
sequence KTLKGETTTEAVDAATAEKVFKQYANDNGVD (SEQ ID NO: 37), V11
comprises a sequence of the formula XYXLXLXG (SEQ ID NO: 42), and
V12 comprises a sequence of the formula GXWXYXXXXXXFXVXE (SEQ ID
NO: 43), where each X is independently selected from the group
consisting of A, D, F, S, V and Y.
[0143] In certain embodiments, in formula (VIII), the mutation at
position 8 of V12 includes insertion of 0, 1 or 2 variant amino
acids, and the mutation at position 1 of V11 includes insertion of
2 variant amino acids.
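Formulas (III) to (VIII) partition the same parent sequence in different ways. The sketch below, with hypothetical names, transcribes the wild-type segment sequences and variable-region templates quoted in paragraphs [0093] to [0142] and checks two things: that each formula's segments concatenate to one 55-residue sequence (taken here, as an assumption, to be the concatenation of F1, V1 and F2 of formula (III)), and that each template keeps the wild-type residue at every non-X position. It also counts the X positions per formula, ignoring the optional insertions.

```python
# Parent sequence: assumed to be F1 + V1 + F2 of formula (III).
PARENT = "TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"

# Wild-type segments of each formula, N- to C-terminal. Labels starting with
# "V" are variable regions; labels starting with "F" are fixed regions.
FORMULAS = {
    "III": [("F1", "TYKLILNGKTLKGETTTEA"), ("V1", "VDAATAEKVFKQYANDNGVDGEW"),
            ("F2", "TYDDATKTFTVTE")],
    "IV": [("F3", "TYKLILNGKTLKGETT"), ("V2", "TEAVDAATAEKVFKQYANDN"),
           ("F4", "GVDGEWTYDDATKTFTVTE")],
    "V": [("V3", "TY"), ("F5", "KLILNGKTLKGETT"), ("V4", "TEAVDAATA"),
          ("F6", "EKVFKQYANDNGVDGEWT"), ("V5", "YDDATKT"), ("F7", "FTVTE")],
    "VI": [("F8", "TYKLI"), ("V6", "LNGKTLKG"), ("F9", "ETTTEAVDAATAEKVFKQYAN"),
           ("V7", "DNGVDGEW"), ("F10", "TYDDATKTFT"), ("V8", "VTE")],
    "VII": [("V9", "TYKLILNGKTLKGETTT"),
            ("F11", "EAVDAATAEKVFKQYANDNGVDGEWTYDDATKT"), ("V10", "FTVTE")],
    "VIII": [("V11", "TYKLILNG"), ("F12", "KTLKGETTTEAVDAATAEKVFKQYANDNGVD"),
             ("V12", "GEWTYDDATKTFTVTE")],
}

# Variable-region templates; 'X' marks a variant position.
TEMPLATES = {
    "V1": "VXXXXAXXVFXXYAXXNXXXXXW", "V2": "TXXXXXXXAXXXFXXXAXXN",
    "V3": "XY", "V4": "TXXXXXXXA", "V5": "YXXXXXT",
    "V6": "LXXXXXXG", "V7": "DXXXXXXW", "V8": "VXX",
    "V9": "TYXLXLXXXXXXXXTXT", "V10": "FXVXX",
    "V11": "XYXLXLXG", "V12": "GXWXYXXXXXXFXVXE",
}


def template_fits(template, wild_type):
    """True if the template has the wild-type length and keeps every non-X residue."""
    return len(template) == len(wild_type) and all(
        t == "X" or t == w for t, w in zip(template, wild_type))


def check_formulas():
    """Return the number of X positions per formula after consistency checks."""
    counts = {}
    for name, segments in FORMULAS.items():
        if "".join(seq for _, seq in segments) != PARENT:
            raise ValueError(f"formula ({name}) does not rebuild the parent")
        counts[name] = 0
        for label, seq in segments:
            if label.startswith("V"):
                if not template_fits(TEMPLATES[label], seq):
                    raise ValueError(f"template for {label} does not fit {seq}")
                counts[name] += TEMPLATES[label].count("X")
    return counts
```

On these transcriptions, check_formulas() finds 15 variant positions for formulas (III) and (IV), 13 for formula (V), and 14 for each of formulas (VI) to (VIII).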
[0144] In some embodiments, each compound of the subject library
includes a peptidic sequence of between 30 and 80 residues, such as
between 40 and 70, between 45 and 60 residues, or between 52 and 58
residues. In certain embodiments, each compound of the subject
library includes a peptidic sequence of 52, 53, 54, 55, 56, 57 or
58 residues. In certain embodiments, the peptidic sequence is of
55, 56, or 57 residues, such as 56 residues.
[0145] In certain embodiments, each compound of the subject library
includes a GB1 scaffold domain and a variable domain. The variable
domain may be a part of the GB1 scaffold domain and may be either a
continuous or a discontinuous sequence of residues. A variable
domain that is defined by a discontinuous sequence of residues may
include variant amino acids that are spatially contiguous, i.e., at
positions that are arranged close in space relative to each other in the structure of
the compound. The variable domain may form a potential binding
interface of the compounds. The variable domain may define a
binding surface area of a suitable size for forming protein-protein
interactions. The variable domain may include a surface area of
between 600 and 1800 .ANG..sup.2, such as between 800 and 1600
.ANG..sup.2, between 1000 and 1400 .ANG..sup.2, between 1100 and
1300 .ANG..sup.2, or about 1200 .ANG..sup.2.
[0146] The individual sequences of the members of any one of the
subject libraries can be determined as follows. Any GB1 scaffold as
defined herein may be selected as a scaffold for a subject library.
The positions of the mutations in the GB1 scaffold domain may be
selected as described herein, e.g., as depicted in FIG. 3 for
Libraries 1 to 6, where the GB1 scaffold domain may be aligned with
the framework of FIG. 3 as described above. The nature of the
mutation at each variant amino acid position may be selected, e.g.,
substitution with any naturally occurring amino acid, or
substitution with a limited number of representative amino acids
that provide a reasonable diversity of physicochemical properties
(e.g., hydrophobicity, hydrophilicity, size, solubility). Certain
variant amino acid positions may be selected as positions where
mutations can include the insertion or deletion of amino acids,
e.g., the insertion of 1 or 2 amino acids where the variant amino
acid position occurs in a loop or turn region of the scaffold. In
certain embodiments, the mutations can include the insertion or deletion of
amino acids at one or more positions selected from positions 1, 9,
19, 38, 47 and 55. After selection of the GB1 scaffold, selection
of the positions of variant amino acids, and selection of the
nature of the mutations at each position, the individual sequences
of the members of the library can be determined.
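Once the scaffold, the variant positions and the allowed residues are chosen, the theoretical diversity of a library follows from simple counting. A minimal sketch, assuming each X independently takes one of six residues (the A, D, F, S, V and Y set used in several embodiments above) and that an insertion site carries 0, 1 or 2 extra residues drawn from the same set; the function name is hypothetical.

```python
def theoretical_diversity(template, alphabet_size=6, insertion_sites=()):
    """Count the sequences a variable-region template can encode.

    Assumptions (illustrative, not taken verbatim from the text):
    - every 'X' independently takes one of `alphabet_size` residues;
    - each 1-based site in `insertion_sites` is an 'X' position that may carry
      0, 1 or 2 extra residues drawn from the same alphabet.
    The count is exact for at most one insertion site; with several sites,
    different insertion patterns could in principle yield the same string,
    so the result is then an upper bound.
    """
    diversity = alphabet_size ** template.count("X")
    for site in insertion_sites:
        if template[site - 1] != "X":
            raise ValueError(f"position {site} is not a variant position")
        # 0, 1 or 2 extra residues multiply the choices here by 1 + a + a**2
        diversity *= 1 + alphabet_size + alphabet_size ** 2
    return diversity
```

For example, the 15 X positions of the formula (III) template VXXXXAXXVFXXYAXXNXXXXXW give 6**15 = 470,184,984,576 sequences (about 4.7 x 10^11), and passing insertion_sites=(19,) multiplies that by 1 + 6 + 36 = 43.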
[0147] In some embodiments, two or more of the subject libraries
may be combined to produce a larger library. The combination
library may include members that have any one of two or more
distinct arrangements of mutations that define two or more
potential binding surfaces of the GB1 scaffold. In some
embodiments, each compound of the library is described by one of
formulas (III) to (VIII), as defined above, and the library
includes at least one member described by formula (III), at least
one member described by formula (IV), at least one member described
by formula (V), at least one member described by formula (VI), at
least one member described by formula (VII), and at least one
member described by formula (VIII).
[0148] In certain embodiments, each compound of the library is
described by one of formulas (III) to (VIII), where only two of the
formulas (III) to (VIII) are represented by the members of the
library. In certain embodiments, each compound of the library is
described by one of formulas (III) to (VIII), where 5 or less, such
as 4 or less or 3 or less of the formulas (III) to (VIII) are
represented by the members of the library.
[0149] In some embodiments, the subject library includes a
combination of libraries 1 to 6 depicted in FIG. 3, e.g., a
combination of 2 or more, such as 3 or more, 4 or more, or 5 or
more of libraries 1 to 6. In some embodiments, the subject library
includes a combination of any 2 of the libraries 1 to 6 depicted in
FIG. 3, e.g., a combination of libraries 1 and 2, a combination of
libraries 2 and 3, a combination of libraries 1 and 3, a
combination of libraries 4 and 5, a combination of libraries 5 and
6, a combination of libraries 4 and 6, or a combination of any one of
libraries 1-3 and any one of libraries 4-6. In some embodiments,
the subject library includes a combination of any 3 of the
libraries 1 to 6 depicted in FIG. 3, e.g., a combination of
libraries 1-3, a combination of libraries 4-6, a combination of any
2 libraries of 1-3 and any one library of 4-6, or a combination of
any one library of 1-3 and any 2 libraries of 4-6. In some
embodiments, the subject library includes a combination of all of
libraries 1 to 6 depicted in FIG. 3.
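The combinations listed above can be enumerated mechanically. This short sketch (hypothetical names) counts them: there are 15 pairs and 20 triples of libraries 1 to 6, nine pairs that take one library from 1-3 and one from 4-6, and 57 combinations of two or more libraries in total.

```python
from itertools import combinations

LIBRARIES = (1, 2, 3, 4, 5, 6)  # libraries 1 to 6 of FIG. 3
FIRST_GROUP = {1, 2, 3}         # libraries 1-3; the rest are libraries 4-6


def library_combinations(size):
    """All combinations of `size` distinct libraries."""
    return list(combinations(LIBRARIES, size))


def split_count(n_first, n_second):
    """Number of combinations with n_first libraries from 1-3 and n_second from 4-6."""
    return sum(
        1
        for combo in combinations(LIBRARIES, n_first + n_second)
        if len(FIRST_GROUP.intersection(combo)) == n_first
    )


two_or_more = sum(len(library_combinations(k)) for k in range(2, 7))
```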
[0150] In some embodiments, the subject library is bifunctional in
the sense that the GB1 compounds of the library have two potential
binding surfaces. Such libraries can be screened to identify
compounds having specific binding properties for two target
molecules. In certain embodiments, the compounds may include a
first potential binding surface for a first target molecule and a
second potential binding surface for a second target molecule. In
certain embodiments, the first target molecule is a therapeutic
target protein and the second target molecule is an endogenous
protein or receptor (e.g., an IgG, FcRn, or serum albumin protein)
that is capable of modulating the pharmacokinetic properties (e.g.,
in vivo half-life) of a GB1 compound upon recruitment. In some
embodiments, any convenient endogenous protein target may be
selected as one of the targets to be screened. In certain
embodiments, the compounds of the library include two potential
binding surfaces for the same target molecule, where the overall
binding affinity of the compound may be modulated via an avidity
effect.
[0151] GB1 has binding affinity for human IgG fragments, e.g., hFc
binds to the .alpha.1 helix motif and hFab binds to the second
beta-strand (.beta.2) motif. In some embodiments, the IgG-binding
properties of the GB1 scaffold are utilized to provide one
potential binding surface of the subject bifunctional libraries. In
certain embodiments, the bifunctional library has an IgG binding
surface that includes the .alpha.1 helix motif and a target binding
surface, such as surface 5 or 6.
[0152] Any suitable combinations of potential binding surfaces may
be utilized to produce the subject bifunctional libraries. In some
cases, the two potential binding surfaces of a bifunctional library
are selected to minimize any potential steric interactions between
the first and second target molecules, e.g., by binding the targets
on opposite sides of the scaffold. In some embodiments, a pair of
potential binding surfaces of the subject bifunctional library are
selected from surfaces 1 and 5, surfaces 3 and 4, surfaces 2 and 6,
surfaces 1 and 6, surfaces 2 and 5, and surfaces 2 and 4, where the
individual surfaces 1 to 6 are shown in FIGS. 2A and 2B.
FIG. 13 illustrates exemplary pairs of potential
binding surfaces for use in the subject bifunctional libraries.
[0153] The subject bifunctional library may include one or more
variable domains on each of the potential binding surfaces of the
library. Any convenient variable domains as described herein for
surfaces 1-6 may be employed in the subject bifunctional libraries.
In some embodiments, the subject bifunctional library includes 3 or
more mutations, such as 4 or more, 5 or more, 6 or more, 7 or more,
8 or more, 10 or more, 12 or more or 14 or more mutations in the
variable domain of a first surface, and 3 or more mutations, such
as 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 10 or
more, 12 or more or 14 or more mutations in the variable domain of
a second surface. Any suitable mutations in the variable domains
may be selected, as described above for the mutations of surfaces 1-6
(see e.g., FIG. 3).
[0154] The subject bifunctional library may be screened for
specific binding to first and second target molecules using a
variety of strategies. For example, the libraries can be screened
for binding to first and second target molecules using simultaneous
screening, consecutive screening or convergent screening
strategies. In some embodiments, the bifunctional library is
screened for simultaneous binding of first and second targets to
first and second surfaces, respectively. In some embodiments, a
first library is screened for binding of a first target to a first
surface to produce a second generation library based on a scaffold
that binds the first target. In certain embodiments, such binding
of a first target protein to a first surface is inherent in the
scaffold, and does not require screening, although affinity
maturation optimization of the binding of the first target may be
performed. The second generation library based on the scaffold that
binds the first target is then screened for binding to a second
target at a second surface. In some embodiments, a convergent
screening strategy is utilized where a first library is screened
for binding to a first target and a second library is screened for
binding to a second target. Utilizing the results of these screens,
first and second binding surfaces are then incorporated into the
same GB1 scaffold to produce bifunctional GB1 compounds. Such
bifunctional compounds and libraries can be optimized by affinity
maturation.
[0155] Also provided are affinity maturation libraries, e.g.,
second generation GB1 peptidic libraries based on a parent GB1
peptidic compound that binds to a certain target molecule, where
the libraries can be screened to optimize for binding affinity and
specificity, or any desirable property, such as protein folding,
protease stability, thermostability, compatibility with a
pharmaceutical formulation, etc.
[0156] In some embodiments, the affinity maturation library is a
GB1 peptidic library as described above, except that a fraction of
the variant amino acid positions are held as fixed positions while
the remaining variant amino acid positions define the new library.
The mutations of these variant amino acids that define the affinity
maturation library may include substitution with all 20 naturally
occurring amino acids. The variant amino acids that are held as
fixed become part of a new scaffold domain. In certain embodiments,
the affinity maturation library is a GB1 peptidic library described
herein, where 70% or more of the variant amino acids, such as 75%
or more, 80% or more, or 85% or more are held fixed. In certain
embodiments, the affinity maturation library is a GB1 peptidic
library described herein, where 8 or more of the variant amino
acids, such as 9 or more, 10 or more, 11 or more, or 12 or more
are held fixed. In some cases, the affinity maturation library
includes 6 or less, such as 5 or less, 4 or less, or 3 or less
variant amino acids. In certain embodiments, the affinity
maturation library includes 4 remaining variant amino acids. In
certain embodiments, the remaining variant amino acids are
contiguous. In certain embodiments, the remaining variant amino
acids form a continuous sequence of residues in the GB1 scaffold
domain. In certain embodiments, the affinity maturation library is
based on one of the GB1 peptidic libraries 1 to 6 as described in
FIGS. 2-3, where a fraction of the variant amino acid positions are
held as fixed positions while the remaining variant amino acid
positions define the new library. In some cases, one or more of the
variant amino acids that are held fixed may be different from the
amino acids of the GB1 scaffold shown in FIG. 3. Further, any GB1
scaffold domain may be substituted for the scaffold domain shown in
FIG. 3. The scaffold domain of an affinity maturation library may
be selected based on an initial selection for binding to a target
molecule.
[0157] In some instances, a GB1 peptidic compound that is
identified after initial screening of a subject library for binding to
a certain target molecule may be selected as a scaffold for an
affinity maturation library. Any convenient methods of affinity
maturation may be used. In some cases, a number of affinity
maturation libraries are prepared that include mutations at limited
subsets of possible variant positions (e.g., mutations at 4 of 15
variable positions), while the rest of the variant positions are
held as fixed positions. The positions of the mutations may be
tiled through the scaffold sequence to produce a series of
libraries such that mutations at every variant position are
represented and a diverse range of amino acids are substituted at
every position (e.g., all 20 naturally occurring amino acids).
Mutations that include deletion or insertion of one or more amino
acids may also be included at variant positions of the affinity
maturation libraries. An affinity maturation library may be
prepared and screened using any convenient method, e.g., phage
display library screening, to identify members of the library
having an improved property, e.g., increased binding affinity for a
target molecule, protein folding, protease stability,
thermostability, compatibility with a pharmaceutical formulation,
etc.
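The tiling approach can be sketched as follows, assuming non-overlapping windows of four variant positions taken in sequence order. The window size, the function names, and the use of the formula (III) variant positions as input are illustrative choices, not requirements of the text. With 15 variant positions, each four-position sub-library holds 11 of 15 positions (about 73%) fixed, in line with the 70% or more described for affinity maturation libraries, and all 20 naturally occurring amino acids at four positions give 20**4 = 160,000 protein variants per sub-library.

```python
def variant_positions(template):
    """1-based positions of 'X' in a variable-region template."""
    return [i for i, ch in enumerate(template, start=1) if ch == "X"]


def tile_positions(positions, window=4):
    """Split positions into consecutive, non-overlapping windows.

    Every position lands in exactly one window, so each variant position is
    mutated in exactly one sub-library; the last window may be shorter.
    """
    return [positions[i:i + window] for i in range(0, len(positions), window)]


def fraction_fixed(n_total, n_varied):
    """Fraction of the original variant positions held fixed in one sub-library."""
    return (n_total - n_varied) / n_total


V1_POSITIONS = variant_positions("VXXXXAXXVFXXYAXXNXXXXXW")
TILES = tile_positions(V1_POSITIONS)
```

Here TILES is [[2, 3, 4, 5], [7, 8, 11, 12], [15, 16, 18, 19], [20, 21, 22]]. The second window is contiguous among the variant positions but not in the sequence itself; an embodiment requiring the remaining variant amino acids to form a continuous stretch of residues would need a different windowing rule.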
[0158] In some embodiments, in an affinity maturation library, most
or all of the variant amino acid positions in the variable regions
of the parent GB1 compound are held as fixed positions, and
contiguous mutations are introduced at positions adjacent to these
variable regions. Such mutations may be introduced at positions in
the parent GB1 compound that were previously considered fixed
positions in the original GB1 scaffold. Such mutations may be used
to optimize the GB1 compound variants for any desirable property,
such as protein folding, protease stability, thermostability,
compatibility with a pharmaceutical formulation, etc.
[0159] Fusion polypeptides including GB1 peptidic compounds can be
displayed on the surface of a cell or virus in a variety of formats
and multivalent forms. In one embodiment, a bivalent moiety (for
example, a hinge and dimerization sequence from a Fab template, such as
an anti-MBP (maltose binding protein) Fab scaffold) is used for
displaying GB1 peptidic compound variants on the surface of a phage
particle. Optionally, other sequences encoding polypeptide tags
useful for purification or detection such as a FLAG tag, can be
fused at the 3' end of the nucleic acid sequence encoding the GB1
peptidic compound.
Polynucleotide Libraries
[0160] Also provided is a library of polynucleotides that encodes a
library of GB1 peptidic compounds as described above. In some
embodiments, each polynucleotide of the library encodes a distinct
GB1 peptidic compound that includes three or more, such as four or
more or five or more mutations at non-core positions in a region
outside of the .beta.1-.beta.2 region.
[0161] In some embodiments, each polynucleotide of the library
encodes a GB1 peptidic compound that includes 30 or more, 40 or
more, or 50 or more amino acids. In some embodiments, each
polynucleotide of the library encodes a GB1 peptidic compound where
the compound includes three or more variant amino acids at non-core
positions, and where each variant amino acid is encoded by a random
codon. In certain embodiments, the random codon is selected from
the group consisting of NNK (where N=A, G, C and T, and K=G and T)
and KHT (where K=G and T, and H=A, C and T).
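The residues reachable with each degenerate codon can be checked by expanding it with the IUPAC nucleotide codes given above and translating each resulting codon with the standard genetic code. The sketch below uses hypothetical names and assumes the standard code table. It shows that NNK gives 32 codons covering all 20 amino acids plus the amber stop codon TAG, whereas KHT gives six codons encoding exactly A, D, F, S, V and Y, the restricted set used in several embodiments above.

```python
from itertools import product

# Standard genetic code, codons ordered by first, second, third base in "TCAG".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# IUPAC codes needed here: N = A/C/G/T, K = G/T, H = A/C/T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "H": "ACT"}


def translate_codon(codon):
    """Translate one DNA codon; '*' marks a stop codon."""
    i, j, k = (BASES.index(b) for b in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def expand(degenerate):
    """All concrete codons represented by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]


def encoded_residues(degenerate):
    """Sorted one-letter codes (including '*' for stop) encoded by a degenerate codon."""
    return sorted({translate_codon(c) for c in expand(degenerate)})
```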
[0162] In certain embodiments, the subject library of
polynucleotides is a library of replicable expression vectors that
includes a nucleic acid sequence encoding a gene fusion, where the
gene fusion encodes a fusion protein including the GB1 peptidic
compound fused to all or a portion of a viral coat protein. Also
included is a library of diverse replicable expression vectors
comprising a plurality of gene fusions encoding a plurality of
different fusion proteins including a plurality of the GB1
peptidic compounds generated with diverse sequences as described
above. The vectors can include a variety of components and can be
constructed to allow for movement of the GB1 domain between
different vectors and/or to provide for display of the fusion
proteins in different formats. Examples of vectors include phage
vectors and ribosome display vectors. The phage vector has a phage
origin of replication allowing phage replication and phage particle
formation. In certain embodiments, the phage is a filamentous
bacteriophage, such as an M13, f1, fd, Pf3 phage or a derivative
thereof, or a lambdoid phage, such as lambda, 21, phi80, phi81, 82,
424, 434, etc., or a derivative thereof.
[0163] Any convenient display methods may be used to display GB1
peptidic compounds encoded by the subject library of
polynucleotides, such as cell-based display techniques and
cell-free display techniques. In certain embodiments, cell-based
display techniques include phage display, bacterial display, yeast
display and mammalian cell display. In certain embodiments,
cell-free display techniques include mRNA display and ribosome
display.
[0164] In certain embodiments, the library of polynucleotides is a
library that encodes 50 or more distinct compounds, such as 100 or
more, 300 or more, 1.times.10.sup.3 or more, 1.times.10.sup.4 or
more, 1.times.10.sup.5 or more, 1.times.10.sup.6 or more,
1.times.10.sup.7 or more, 1.times.10.sup.8 or more,
1.times.10.sup.9 or more, 1.times.10.sup.10 or more,
1.times.10.sup.11 or more, or 1.times.10.sup.12 or more, distinct
compounds, where each polynucleotide of the library encodes a GB1
peptidic compound that comprises three or more, such as four or
more or five or more different non-core mutations at positions in a
region outside of the .beta.1-.beta.2 region. In certain
embodiments, the library of polynucleotides is a library of
replicable expression vectors.
[0165] In some embodiments, each polynucleotide of the library
encodes a GB1 peptidic compound comprising ten or more variant
amino acids at non-core positions, wherein each variant amino acid
is encoded by a random codon. In certain embodiments, the random
codon is selected from the group consisting of NNK and KHT.
Phage Display Libraries
[0166] The subject libraries may be prepared using any convenient
methods, such as methods that find use in the preparation of
libraries of peptidic compounds, for example, phage display
methods.
[0167] In some embodiments, the subject library is a phage display
library. A utility of phage display is that large libraries of
randomized protein variants can be rapidly and efficiently sorted
for those sequences that bind to a target protein. Display of
polypeptide libraries on phage may be used for screening for
polypeptides with specific binding properties. Polyvalent phage
display methods may be used for displaying polypeptides through
fusions to either gene III or gene VIII of filamentous phage. See
Wells and Lowman (1992) Curr. Opin. Struct. Biol. 3:355-362 and references
cited therein. In monovalent phage display, a polypeptide library
is fused to a gene III or a portion thereof and expressed at low
levels in the presence of wild type gene III protein so that phage
particles display one copy or none of the fusion proteins. Avidity
effects are reduced relative to polyvalent phage so that sorting is
on the basis of intrinsic ligand affinity, and phagemid vectors are
used, which simplify DNA manipulations. Lowman and Wells (1991)
Methods: A companion to Methods in Enzymology 3:205-216. In phage
display, the phenotype of the phage particle, including the
displayed polypeptide, corresponds to the genotype inside the phage
particle, the DNA enclosed by the phage coat proteins.
[0168] In some embodiments, each GB1 peptidic compound of a subject
library is fused to at least a portion of a viral coat protein.
Examples of viral coat proteins include infectivity protein PIII,
major coat protein PVIII, p3, Soc, Hoc, gpD (of bacteriophage
lambda), minor bacteriophage coat protein 6 (pVI) (filamentous
phage; J. Immunol. Methods, 1999, 231(1-2):39-51), and variants of the
M13 bacteriophage major coat protein (P8) (Protein Sci 2000 April;
9(4):647-54). The fusion protein can be displayed on the surface of
a phage, and suitable phage systems include M13KO7 helper phage,
M13R408, M13-VCS, Phi X 174, the pJuFo phage system (J. Virol. 2001
August; 75(15):7107-13), and hyperphage (Nat. Biotechnol. 2001 January;
19(1):75-8). In certain embodiments, the helper phage is M13KO7,
and the coat protein is the M13 Phage gene III coat protein. In
certain embodiments, the host is E. coli or protease deficient
strains of E. coli. Vectors, such as the fth1 vector (Nucleic Acids
Res. 2001 May 15; 29(10):E50-0) can be useful for the expression of
the fusion protein.
Display of Fusion Polypeptides
[0169] Any convenient methods for displaying fusion polypeptides
including GB1 peptidic compounds on the surface of bacteriophage
may be used, for example, methods as described in patent publication
nos. WO 92/01047, WO 92/20791, WO 93/06213, WO 93/11236 and WO
93/19172.
[0170] The expression vector also can have a secretory signal
sequence fused to the DNA encoding each GB1 peptidic compound. This
sequence may be located immediately 5' to the gene encoding the
fusion protein, and will thus be expressed at the amino terminus
of the fusion protein. However, in certain cases, the signal
sequence has been demonstrated to be located at positions other
than 5' to the gene encoding the protein to be secreted. This
sequence targets the protein to which it is attached across the
inner membrane of the bacterial cell. The DNA encoding the signal
sequence may be obtained as a restriction endonuclease fragment
from any gene encoding a protein that has a signal sequence.
Suitable prokaryotic signal sequences may be obtained from genes
encoding, for example, LamB or OmpF (Wong et al., Gene, 68:1931
(1983)), MalE, PhoA and other genes. Prokaryotic signal sequences
for practicing this invention include the E. coli heat-stable
enterotoxin II (STII) signal sequence, as described by Chang et al.,
Gene 55:189 (1987), and the malE signal sequence.
[0171] The vector may also include a promoter to drive expression
of the fusion protein. Promoters most commonly used in prokaryotic
vectors include the lacZ promoter system, the alkaline phosphatase
phoA promoter, the bacteriophage λ PL promoter (a
temperature-sensitive promoter), the tac promoter (a hybrid trp-lac
promoter that is regulated by the lac repressor), the tryptophan
promoter, and the bacteriophage T7 promoter. While these are the
most commonly used promoters, other suitable microbial promoters
may be used as well.
[0172] The vector can also include other nucleic acid sequences,
for example, sequences encoding gD tags, c-Myc epitopes, FLAG tags,
poly-histidine tags, fluorescence proteins (e.g., GFP), or
beta-galactosidase protein, which can be useful for detection or
purification of the fusion protein expressed on the surface of the
phage or cell. Nucleic acid sequences encoding, for example, a gD
tag, also provide for positive or negative selection of cells or
virus expressing the fusion protein. In some embodiments, the gD
tag is fused to a GB1 peptidic compound which is not fused to the
viral coat protein. Nucleic acid sequences encoding, for example, a
polyhistidine tag, are useful for identifying fusion proteins
including GB1 peptidic compounds that bind to a specific target
using immunohistochemistry. Tags useful for detection of target
binding can be fused to either a GB1 peptidic compound not fused to
a viral coat protein or a GB1 peptidic compound fused to a viral
coat protein.
[0173] Other useful components of the vectors used to practice
this invention are phenotypic selection genes, i.e., genes encoding
proteins that confer antibiotic resistance upon the host cell. By way
of illustration, the ampicillin resistance gene (amp^r) and the
tetracycline resistance gene (tet^r) are readily employed for this
purpose.
[0174] The vector can also include nucleic acid sequences
containing unique restriction sites and suppressible stop codons.
The unique restriction sites are useful for moving GB1 peptidic
compound domains between different vectors and expression systems.
The suppressible stop codons are useful to control the level of
expression of the fusion protein and to facilitate purification of
GB1 peptidic compounds. For example, an amber stop codon can be
read as Gln in a supE host to enable phage display, while in a
non-supE host it is read as a stop codon to produce soluble GB1
peptidic compounds without fusion to phage coat proteins. These
synthetic sequences can be fused to GB1 peptidic compounds in the
vector.
[0175] In some cases, vector systems that allow the nucleic acid
encoding a GB1 peptidic compound of interest to be easily removed
from the vector system and placed into another vector system may
be used. For example, appropriate restriction sites can be
engineered in a vector system to facilitate the removal of the
nucleic acid sequence encoding the GB1 peptidic compounds. The
restriction sequences are usually chosen to be unique in the
vectors to facilitate efficient excision and ligation into new
vectors. GB1 peptidic compound domains can then be expressed from
vectors without extraneous fusion sequences, such as viral coat
proteins or other sequence tags.
[0176] Between the nucleic acid encoding GB1 peptidic compounds (gene
1) and the viral coat protein (gene 2), DNA encoding a termination
codon may be inserted; such termination codons include UAG
(amber), UAA (ochre) and UGA (opal) (Microbiology, Davis et al.,
Harper & Row, New York, 1980, pp. 237, 245-47 and 374). When
expressed in a wild-type (non-suppressor) host cell, the termination
codon results in the synthesis of the gene 1 protein product without
the gene 2 protein attached. However, growth in a suppressor host
cell results in the synthesis of detectable quantities of fused
protein. Such suppressor host cells are well known and described,
for example, the E. coli suppressor strain of Bullock et al.,
BioTechniques 5:376-379 (1987). Any acceptable method may be used
to place such a termination codon into the mRNA encoding the fusion
polypeptide.
[0177] The suppressible codon may be inserted between the first
gene encoding the GB1 peptidic compounds, and a second gene
encoding at least a portion of a phage coat protein. Alternatively,
the suppressible termination codon may be inserted adjacent to the
fusion site by replacing the last amino acid triplet of the GB1
peptidic compound or the first amino acid in the phage coat
protein. When the plasmid containing the suppressible codon is
grown in a suppressor host cell, it results in the detectable
production of a fusion polypeptide containing the polypeptide and
the coat protein. When the plasmid is grown in a non-suppressor
host cell, the GB1 peptidic compound domain is synthesized
substantially without fusion to the phage coat protein due to
termination at the inserted suppressible triplet UAG, UAA, or UGA.
In the non-suppressor cell, the GB1 peptidic compound domain is
synthesized and secreted from the host cell due to the absence of
the fused phage coat protein, which would otherwise anchor it to the
host membrane.
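The effect of an amber codon placed between gene 1 and gene 2 can be sketched as a translation rule. This is an idealized illustration, not the applicants' protocol: it assumes complete suppression in a supE host (real suppression is partial) and uses a hypothetical toy construct rather than the GB1-pIII sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, amber_suppressor: bool = False) -> str:
    """Translate an in-frame DNA coding sequence.

    Stop codons end translation, except that an amber codon (TAG) is read
    as Gln when amber_suppressor is True (idealized supE behaviour).
    """
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3].upper()
        aa = CODON_TABLE[codon]
        if aa == "*":
            if codon == "TAG" and amber_suppressor:
                aa = "Q"  # supE inserts Gln at amber codons
            else:
                break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy construct: "gene 1" (MAK) - amber codon - "gene 2" (GSE).
construct = "ATGGCTAAA" + "TAG" + "GGTTCTGAA"
print(translate(construct))                         # MAK (non-suppressor host)
print(translate(construct, amber_suppressor=True))  # MAKQGSE (supE host)
```

Only TAG is suppressed in this model; an ochre (TAA) or opal (TGA) codon would terminate translation in both hosts.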
Methods of Screening
[0178] Also provided are methods of screening libraries of the
compounds, e.g., as described above, for binding to a target
protein. In addition, the libraries may be selected for improved
binding affinity to a certain target protein, e.g., as described
above, for the preparation and screening of affinity maturation
libraries. The target proteins may include any type of protein of
interest in research or therapeutic applications. Aspects of these
screening methods may include determining whether a compound of the
subject libraries specifically binds to a target protein of
interest. Screening methods may include screening for inhibition of
a biological activity. Such methods may include: (i) contacting a
sample containing a target protein with a library of the invention;
and (ii) determining whether a compound of the library specifically
binds to the target protein.
[0179] The determining step may be carried out by any one or more
of a variety of protocols for characterizing the specific binding or
the inhibition of binding.
[0180] For example, screening may be a cell-based assay, an enzyme
assay, an ELISA or other related biological assay for assessing
specific binding or the inhibition of binding, and the determining
or assessment steps suitable for application in such assays are
well known and involve routine protocols.
[0181] Screening may also include in silico methods, in which one
or more physical and/or chemical attributes of compounds of the
library of interest are expressed in a computer-readable format and
evaluated by any one or more of a variety of molecular modeling
and/or analysis programs and algorithms suitable for this purpose.
In some embodiments, the in silico method includes inputting one or
more parameters related to the D-target protein, such as but not
limited to, the three-dimensional coordinates of a known X-ray
crystal structure of the D-target protein. In some embodiments, the
in silico method includes inputting one or more parameters related
to the compounds of the L-peptidic library, such as but not limited
to, the three-dimensional coordinates of a known X-ray crystal
structure of a parent scaffold domain of the library. In some
instances, the in silico method includes generating one or more
parameters for each compound in a peptidic library in a computer
readable format, and evaluating the capabilities of the compounds
to specifically bind to the target protein. The in silico methods
include, but are not limited to, molecular modelling studies,
biomolecular docking experiments, and virtual representations of
molecular structures and/or processes, such as molecular
interactions. The in silico methods may be performed as a
pre-screen (e.g., prior to preparing an L-peptidic library and
performing in vitro screening), or as a validation of binding
compounds identified after in vitro screening.
[0182] Thus, the screening methods of the invention can be carried
out in vitro or in vivo. For example, when the compound is in a
cell, the cell may be in vitro or in vivo, and the determining of
whether the compound is capable of specifically binding to a target
protein in the cell includes: (i) contacting the cell with a
library of the invention; and (ii) assessing whether a compound of
the library specifically binds to the target protein.
[0183] As such, determining whether a GB1 peptidic compound of a
subject library is capable of specifically binding a target protein
may be carried out by any number of methods, as well as
combinations thereof.
[0184] In some embodiments, the subject method includes:
[0185] (a) contacting a target protein with a library including 50
or more distinct GB1 peptidic compounds, where each compound
includes a .beta.1-.beta.2 region and three or more, such as four
or more or five or more mutations at non-core positions in a region
outside of the .beta.1-.beta.2 region; and
[0186] (b) identifying a compound of the library that specifically
binds to the target protein.
[0187] In some embodiments, in the subject method, the target
protein is a D-protein. In some embodiments, in the subject method,
the target protein is an L-protein.
Phage Display Screening Methods
[0188] Screening for the ability of a fusion polypeptide including
a GB1 peptidic compound of a subject library to bind a target
molecule can also be performed in solution phase. For example, a
target protein can be labeled with a detectable moiety, such as
biotin. Phage that bind to the target molecule in solution can be
separated from unbound phage by a molecule that binds to the
detectable moiety, such as streptavidin-coated beads where biotin
is the detectable moiety. Affinity of binders (GB1 peptidic
compound fusions that bind to target protein) can be determined
based on concentration of the target protein used, using any
convenient formulas and criteria.
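One convenient formula, offered here only as an illustrative assumption, is the single-site equilibrium f = [T]/([T] + Kd), which holds approximately when the target is in large excess over displayed binder. The sketch below fits Kd from the fraction of phage captured at several target concentrations using the linearized form 1/f - 1 = Kd/[T]. The function names and the noise-free example data are hypothetical.

```python
def fraction_bound(target_nM: float, kd_nM: float) -> float:
    """Single-site equilibrium with target in large excess over displayed binder."""
    return target_nM / (target_nM + kd_nM)


def estimate_kd(target_nM: list, frac_bound: list) -> float:
    """Least-squares fit of the linearized model 1/f - 1 = Kd * (1/[T]).

    The line is constrained through the origin, so Kd = sum(x*y) / sum(x*x)
    with x = 1/[T] and y = 1/f - 1.
    """
    xs = [1.0 / t for t in target_nM]
    ys = [1.0 / f - 1.0 for f in frac_bound]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Illustrative, noise-free data generated with Kd = 25 nM (not measured values).
concentrations = [5.0, 25.0, 100.0, 400.0]
observed = [fraction_bound(t, 25.0) for t in concentrations]
print(f"estimated Kd = {estimate_kd(concentrations, observed):.1f} nM")
```

The linearization keeps the sketch dependency-free; with noisy data a direct nonlinear fit of f against [T] would usually weight the points more sensibly.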
[0189] In some embodiments, the target protein may be attached to a
suitable matrix such as agarose beads, acrylamide beads, glass
beads, cellulose, various acrylic copolymers, hydroxyalkyl
methacrylate gels, polyacrylic and polymethacrylic copolymers,
nylon, neutral and ionic carriers, and the like. Attachment of the
target protein to the matrix may be accomplished by any convenient
methods, e.g., methods as described in Methods in Enzymology, 44
(1976). After attachment of the target protein to the matrix, the
immobilized target is contacted with the library expressing the GB1
peptidic compound containing fusion polypeptides under conditions
suitable for binding of at least a portion of the phage particles
with the immobilized target. In some instances, the conditions,
including pH, ionic strength, temperature and the like will mimic
physiological conditions. Bound particles ("binders") to the
immobilized target are separated from those particles that do not
bind to the target by washing. Wash conditions can be adjusted to
result in removal of all but the higher affinity binders. Binders
may be dissociated from the immobilized target by a variety of
methods. These methods include competitive dissociation using the
wild-type ligand, altering pH and/or ionic strength, and other methods
known in the art. Selection of binders may involve elution from an
affinity matrix with a ligand. Elution with increasing
concentrations of ligand should elute displayed binding GB1
peptidic compounds of increasing affinity.
[0190] The binders can be isolated and then reamplified or
expressed in a host cell and subjected to another round of
selection for binding of target molecules. Any number of rounds of
selection or sorting can be utilized. One of the selection or
sorting procedures can involve isolating binders that bind to an
antibody to a polypeptide tag such as antibodies to the gD protein,
FLAG or polyhistidine tags. Another selection or sorting procedure
can involve multiple rounds of sorting for stability, such as
binding to a target protein that specifically binds to folded GB1
peptidic compound containing polypeptide and does not bind to
unfolded polypeptide followed by selecting or sorting the stable
binders for binding to a target protein.
[0191] In some cases, suitable host cells are infected with the
binders and helper phage, and the host cells are cultured under
conditions suitable for amplification of the phagemid particles.
The phagemid particles are then collected and the selection process
is repeated one or more times until binders having the desired
affinity for the target molecule are selected. In certain
embodiments, two or more rounds of selection are conducted.
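Why repeated rounds are useful can be seen from a simple enrichment model. This is a hypothetical sketch, not data from the application: it assumes specific binders are recovered with a fixed probability, non-binders carry over at a fixed background rate, and amplification in the host is unbiased, so each round multiplies the binder-to-nonbinder ratio by their recovery ratio.

```python
def binder_fraction_by_round(initial_fraction: float,
                             binder_recovery: float,
                             background_recovery: float,
                             rounds: int) -> list:
    """Fraction of specific binders in the pool after each selection round.

    Toy model (assumptions): binders are recovered from the target with
    probability binder_recovery, non-binders stick nonspecifically with
    probability background_recovery, and amplification in the host is
    unbiased, so only the ratio of the two populations changes.
    """
    fractions = [initial_fraction]
    f = initial_fraction
    for _ in range(rounds):
        bound = f * binder_recovery
        background = (1.0 - f) * background_recovery
        f = bound / (bound + background)
        fractions.append(f)
    return fractions


# Hypothetical numbers: 1 binder per 10^6 clones, 1000-fold enrichment per round.
for rnd, frac in enumerate(binder_fraction_by_round(1e-6, 0.1, 1e-4, 3)):
    print(f"round {rnd}: binder fraction = {frac:.3g}")
```

With these assumed values, a binder present at one in a million clones dominates the pool after about three rounds, consistent with conducting two or more rounds of selection.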
[0192] After binders are identified by binding to the target
protein, the nucleic acid can be extracted. Extracted DNA can then
be used directly to transform E. coli host cells or alternatively,
the encoding sequences can be amplified, for example using PCR with
suitable primers, and then inserted into a vector for
expression.
[0193] Any convenient strategy may be used to select for high
affinity binders to a target protein. In certain embodiments, the
process of screening is carried out by automated systems to allow
for high-throughput screening of library candidates.
[0194] In certain embodiments, compounds of the subject peptidic
library specifically bind to a target protein with high affinity,
e.g., as determined by an SPR binding assay or an ELISA assay. The
compounds of the subject peptidic library may exhibit an affinity
for a target protein of 1 μM or less, such as 300 nM or less, 100
nM or less, 30 nM or less, 10 nM or less, 5 nM or less, 2 nM or
less, 1 nM or less, 300 pM or less, or even less. The compounds of
the subject peptidic libraries may exhibit a specificity for a
target protein, e.g., as determined by comparing the affinity of
the compound for the target protein with that for a reference
protein (e.g., an albumin protein), that is 5:1 or more, 10:1 or
more, such as 30:1 or more, 100:1 or more, 300:1 or more, 1000:1 or
more, or even more.
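The specificity comparison can be made concrete: taking affinity as 1/Kd, the ratio of affinity for the target to affinity for the reference protein reduces to Kd(reference)/Kd(target). The sketch below checks a binder against the loosest thresholds named above (1 μM and 5:1); the function names and example Kd values are hypothetical.

```python
def specificity_ratio(kd_target_nM: float, kd_reference_nM: float) -> float:
    """Affinity for target divided by affinity for reference.

    Affinity is taken as 1/Kd, so the ratio reduces to Kd_ref / Kd_target.
    """
    return kd_reference_nM / kd_target_nM


def meets_criteria(kd_target_nM, kd_reference_nM, max_kd_nM=1000.0, min_ratio=5.0):
    """Check the loosest thresholds named in the text: Kd <= 1 uM, ratio >= 5:1."""
    return (kd_target_nM <= max_kd_nM
            and specificity_ratio(kd_target_nM, kd_reference_nM) >= min_ratio)


# Hypothetical binder: 10 nM for the target, 10 uM for an albumin reference.
print(specificity_ratio(10.0, 10_000.0))   # 1000.0
print(meets_criteria(10.0, 10_000.0))      # True
```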
Target Molecules
[0195] Once the subject libraries are prepared they can be selected
and/or screened for binding to one or more target molecules. In
addition, the libraries may be selected for improved binding
affinity to a certain target molecule. The target molecules may be
any type of protein-binding or antigenic molecule, such as
proteins, nucleic acids, carbohydrates or small molecules. In
certain embodiments, the target molecule is a therapeutic target
molecule or a diagnostic target molecule, or a fragment thereof, or
a mimic thereof.
[0196] In certain embodiments, the target molecule is a hormone, a
growth factor, a receptor, an enzyme, a cytokine, an osteoinductive
factor, a colony stimulating factor or an immunoglobulin.
[0197] In certain embodiments, the target molecule may be one or
more of the following: growth hormone, bovine growth hormone,
insulin like growth factors, human growth hormone including
n-methionyl human growth hormone, parathyroid hormone, thyroxine,
insulin, proinsulin, amylin, relaxin, prorelaxin, glycoprotein
hormones such as follicle stimulating hormone (FSH), luteinizing
hormone (LH), hematopoietic growth factor, Her-2, fibroblast growth
factor, prolactin, placental lactogen, tumor necrosis factors,
mullerian inhibiting substance, mouse gonadotropin-associated
polypeptide, inhibin, activin, vascular endothelial growth factors,
integrin, nerve growth factors such as NGF-beta, insulin-like
growth factor-I and II, erythropoietin, osteoinductive factors,
interferons, colony stimulating factors, interleukins (e.g., an
IL-4 or an IL-8 protein), bone morphogenetic proteins, LIF, SCF,
FLT-3 ligand, kit-ligand, SH3 domain, apoptosis protein, hepatocyte
growth factor, hepatocyte growth factor receptor, neutravidin,
maltose binding protein, angiostatin, aFGF, bFGF, TGF-alpha,
TGF-beta, HGF, TNF-alpha, angiogenin, IL-8, thrombospondin, the
16-kilodalton N-terminal fragment of prolactin and endostatin.
[0198] In certain embodiments, the target molecule may be a
therapeutic target protein for which structural information is
known, such as, but not limited to: Raf kinase (a target for the
treatment of melanoma), Rho kinase (a target in the prevention of
pathogenesis of cardiovascular disease), nuclear factor kappaB
(NF-.kappa.B, a target for the treatment of multiple myeloma),
vascular endothelial growth factor (VEGF) receptor kinase (a target
for action of anti-angiogenic drugs), Janus kinase 3 (JAK-3, a
target for the treatment of rheumatoid arthritis), cyclin dependent
kinase (CDK) 2 (CDK2, a target for prevention of stroke), FMS-like
tyrosine kinase (FLT) 3 (FLT-3; a target for the treatment of acute
myelogenous leukemia (AML)), epidermal growth factor receptor
(EGFR) kinase (a target for the treatment of cancer), protein
kinase A (PKA, a therapeutic target in the prevention of
cardiovascular disease), p21-activated kinase (a target for the
treatment of breast cancer), mitogen-activated protein kinase
(MAPK, a target for the treatment of cancer and arthritis), c-Jun
NH.sub.2-terminal kinase (JNK, a target for treatment of diabetes),
AMP-activated kinase (AMPK, a target for prevention and treatment
of insulin resistance), lck kinase (a target for
immuno-suppression), phosphodiesterase PDE4 (a target in treatment
of inflammatory diseases such as rheumatoid arthritis and asthma),
Abl kinase (a target in treatment of chronic myeloid leukemia
(CML)), phosphodiesterase PDE5 (a target in treatment of erectile
dysfunction), a disintegrin and metalloproteinase 33 (ADAM33, a
target for the treatment of asthma), human immunodeficiency virus
(HIV)-1 protease and HIV integrase (targets for the treatment of
HIV infection), respiratory syncytial virus (RSV) integrase (a
target for the treatment of infection with RSV), X-linked inhibitor
of apoptosis (XIAP, a target for the treatment of neurodegenerative
disease and ischemic injury), thrombin (a therapeutic target in the
treatment and prevention of thromboembolic disorders), tissue type
plasminogen activator (a target in prevention of neuronal death
after injury of central nervous system), matrix metalloproteinases
(targets of anti-cancer agents preventing angiogenesis), beta
secretase (a target for the treatment of Alzheimer's disease), src
kinase (a target for the treatment of cancer), fyn kinase, lyn
kinase, zeta-chain associated protein 70 (ZAP-70) protein tyrosine
kinase, extracellular signal-regulated kinase 1 (ERK-1), p38 MAPK,
CDK4, CDK5, glycogen synthase kinase 3 (GSK-3), KIT tyrosine
kinase, FLT-1, FLT-4, kinase insert domain-containing receptor
(KDR) kinase, and cancer osaka thyroid (COT) kinase.
[0199] In certain embodiments, the target molecule is a target
protein that is selected from the group consisting of a VEGF
protein, a RANKL protein, a NGF protein, a TNF-alpha protein, a SH2
domain containing protein, a SH3 domain containing protein, an IgE
protein, a BLyS protein (Oren et al., "Structural basis of BLyS
receptor recognition", Nature Structural Biology 9, 288-292, 2002),
a PCSK9 protein (Ni et al., "A proprotein convertase
subtilisin-like/kexin type 9 (PCSK9) C-terminal domain antibody
antigen-binding fragment inhibits PCSK9 internalization and
restores low density lipoprotein uptake", J. Biol. Chem. 2010 Apr.
23; 285(17):12882-91), a DLL4 protein (Garber, "Targeting Vessel
Abnormalization in Cancer", JNCI Journal of the National Cancer
Institute 2007 99(17):1284-1285), an Ang2 (Angiopoietin-2) protein,
a Clostridium difficile Toxin A or B protein (e.g., Ho et al.,
"Crystal structure of receptor-binding C-terminal repeats from
Clostridium difficile toxin A", (2005) Proc. Natl. Acad. Sci. USA
102: 18373-18378), a CTLA4 protein (Cytotoxic T-Lymphocyte Antigen
4), and fragments thereof. In certain embodiments, the target
protein is a VEGF protein. In certain embodiments, the target
protein is a SH2 domain containing protein (e.g., a 3BP2 protein)
or a SH3 domain containing protein (e.g., an ABL or a Src
protein).
Utility
[0200] The libraries of the invention, e.g., as described above,
find use in a variety of applications. Applications of interest
include, but are not limited to, screening applications and
research applications.
[0201] The screening methods, e.g., as described above, find use in
a variety of applications, including selection and/or screening of
the subject libraries in a wide range of research and therapeutic
applications, such as therapeutic lead identification and affinity
maturation, identification of diagnostic reagents, development of
high throughput screening assays, development of drug delivery
systems for the delivery of toxins or other therapeutic moieties.
The subject screening methods may be exploited in multiple
settings.
[0202] In some cases, the subject libraries may find use as
research tools to analyze the roles of proteins of interest in
modulating various biological processes, e.g., angiogenesis,
inflammation, cellular growth, metabolism, regulation of
transcription and regulation of phosphorylation. For example,
antibody libraries have been useful tools in many such areas of
biological research and have led to the development of effective
therapeutic agents; see Sidhu and Fellouse, "Synthetic therapeutic
antibodies," Nature Chemical Biology, 2006, 2(12), 682-688.
[0203] The subject libraries may be exploited as research tools in
the development of clinical diagnostics, e.g., in vitro diagnostics
(e.g., for targeting various biomarkers), or in vivo tumor imaging
agents. The screening of libraries of binding molecules (e.g.,
aptamers and antibodies) has found use in the development of such
clinical diagnostics, see for example, Jayasena, "Aptamers: An
Emerging Class of Molecules That Rival Antibodies in Diagnostics,"
Clinical Chemistry. 1999; 45:1628-1650.
[0204] The following examples are offered by way of illustration
and not by way of limitation.
EXPERIMENTAL
1. Phage Display of GB1 Peptidic Libraries
1.1 Cloning
[0205] The wild-type sequence of the Protein G B1 domain
(Gronenborn et al., Science 253, 657-61, 1991) was prepared
(Genscript USA Inc.) with an N-terminal FLAG tag and a C-terminal
10×His tag, each separated from the GB1 domain by a Gly-Gly-Ser
linker, as shown below:
DYKDDDDK-GGS-TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE-GGS-HHHHHHHHHH-amber stop (SEQ ID NO: 44)
[0206] This sequence was synthesized with NcoI and XbaI restriction
sites at the 5' and 3' ends, respectively, and cloned into a display
vector as an N-terminal fusion to truncated protein 3 of M13
filamentous phage. The features of the vector include a ptac
promoter and StII secretion leader sequence (MKKNIAFLLASMFVFSIATNAYA;
SEQ ID NO: 45). This design allows GB1 to be displayed in amber
suppressor bacterial strains and to be expressed as a protein in
non-suppressor strains.
1.2 Optimization of Phage Display Levels
[0207] The presence of the His-tag and amber stop at the C-terminus
of the protein allows the purification of proteins/mutants without
additional mutagenesis. In addition, to optimize display of GB1
peptidic compounds, two additional constructs were tested for GB1
display levels: (i) without the His-tag and amber stop, and (ii)
with a hinge and dimerization sequence derived from a Fab template
(DKTHTCGRP; SEQ ID NO: 46) for dimeric display.
[0208] The following oligonucleotides were prepared (Integrated DNA
Technologies Inc.) for site-directed mutagenesis:
TABLE-US-00003
i) 5'-GTT ACC GAA GGC GGT TCT TCT AGA AGT GGT TCC GGT-3' (SEQ ID NO: 47)
   encoded sequence: V T E G G S S R S G S G (SEQ ID NO: 48)
[0209] Oligonucleotide i) is for removal of the 10×His tag and amber stop.
TABLE-US-00004
ii) 5'-TT ACC GAA GGC GGT TCT GAC AAA ACT CAC ACA TGC GGC CGG CCC AGT GGT TCC GGT GAT T-3' (SEQ ID NO: 49)
   encoded sequence: V T E G G S D K T H T C G R P S G S G D F (SEQ ID NO: 50)
[0210] Oligonucleotide ii) is for insertion of the Fab-dimerization
sequence to replace the His-tag and amber stop.
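The encoded peptide listed under each oligonucleotide can be checked by translating the codons in frame. This sketch uses the standard genetic code; `translate_frame` is a hypothetical helper. Oligonucleotide ii) begins and ends mid-codon, so only its complete codons are translated (reading frame offset 2), which recovers the listed sequence without the flanking V and F contributed by the template.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_frame(dna: str, offset: int = 0) -> str:
    """Translate complete codons starting at offset; ignore partial codons."""
    seq = dna.replace(" ", "").upper()[offset:]
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(0, len(seq) - len(seq) % 3, 3))


oligo_i = "GTT ACC GAA GGC GGT TCT TCT AGA AGT GGT TCC GGT"   # SEQ ID NO: 47
oligo_ii = ("TT ACC GAA GGC GGT TCT GAC AAA ACT CAC ACA TGC GGC CGG "
            "CCC AGT GGT TCC GGT GAT T")                       # SEQ ID NO: 49

print(translate_frame(oligo_i))              # VTEGGSSRSGSG
print(translate_frame(oligo_ii, offset=2))   # TEGGSDKTHTCGRPSGSGD
```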
[0211] Site-directed mutagenesis was performed by methods described
by Kunkel et al. (Methods Enzymol., 1987, 154, 367-82) and the
sequence was confirmed by DNA sequencing. For comparing display
levels, phage for each construct was harvested from a 25 mL
overnight culture using methods described previously (Fellouse
& Sidhu, "Making antibodies in bacteria," in Making and Using
Antibodies, Howard & Kaser, Eds., CRC Press, Boca Raton, Fla.,
2007). The phage concentrations were estimated using a
spectrophotometer (OD268 = 1 for 5×10^12 phage/ml) and
normalized to the lowest concentration. Three-fold serial dilutions
of phage for each construct were prepared and added to NUNC
Maxisorp plates previously coated with anti-FLAG antibody (5
μg/ml) and blocked with BSA (0.2% BSA in PBS). The plates were
washed and assayed with anti-M13-HRP to detect binding. The HRP
signal was plotted as a function of phage concentration.
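The normalization arithmetic above can be sketched as follows, using the stated conversion (OD268 = 1 for 5×10^12 phage/ml) and assuming the relationship is linear over the measured range. The construct labels and OD readings are hypothetical examples, not values from the experiment.

```python
PHAGE_PER_ML_AT_OD268_1 = 5e12  # conversion stated in the text


def phage_titer(od268: float) -> float:
    """Phage/ml from an OD268 reading, assuming a linear relationship."""
    return od268 * PHAGE_PER_ML_AT_OD268_1


def dilution_factors(od_readings: dict) -> dict:
    """Fold-dilution that brings each stock down to the lowest titer."""
    titers = {name: phage_titer(od) for name, od in od_readings.items()}
    lowest = min(titers.values())
    return {name: titer / lowest for name, titer in titers.items()}


def threefold_series(start_titer: float, steps: int = 8) -> list:
    """Titers of a three-fold serial dilution series."""
    return [start_titer / 3**i for i in range(steps)]


# Hypothetical OD268 readings for the three display constructs.
readings = {"His-tag/amber": 0.40, "no His-tag": 0.80, "Fab hinge": 0.50}
factors = dilution_factors(readings)
for name, fold in factors.items():
    print(f"{name}: dilute {fold:.2f}-fold")
print([f"{t:.2e}" for t in threefold_series(phage_titer(0.40), steps=4)])
```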
2. Preparation of GB1 Loop Libraries
[0212] The mutational and insertion tolerance of GB1 loops was
tested by randomizing the loops and beta-turns and selecting for
stably folded proteins. The loop lengths were varied from 4 to 6
residues and randomized with NNK codons. The beta-turns and loop
residues of GB1 are shown as underlined below:
TABLE-US-00005 (SEQ ID NO: 1)
TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE
(underlined regions, N- to C-terminal: B1, L1, L2, B2)
[0213] Regions B1 and L2 are contiguous and regions L1 and B2 are
contiguous. These loop/turn regions were randomized together to
produce libraries for screening. Site-directed mutagenesis (Kunkel
1987) was used to introduce triple stop codons into the loop pairs,
generating stop templates, because the wild-type protein is more
stable and would otherwise have a selective advantage over the rest
of the library. The following oligonucleotides were used to make the
stop templates (Integrated DNA Technologies, Inc.):
TABLE-US-00006
B1-L2 stop template:
SEQ ID NO: 51 (for B1): 5'-TAC AAA CTG ATT CTG AAC TAA TAA TAA AAA GGT GAA ACC ACG AC-3'
SEQ ID NO: 52 (for L2): 5'-G TAC GCC AAC GAT AAT TAA TAA TAA GAA TGG ACC TAC GAT G-3'
L1-B2 stop template:
SEQ ID NO: 53 (for L1): 5'-GGT GAA ACC ACG ACC TAA TAA TAA GCA GCA ACG GCA GAA AAA-3'
SEQ ID NO: 54 (for B2): 5'-GT GAA TGG ACC TAC GAT TAA TAA TAA ACC TTC ACG GTT ACC G-3'
[0214] These stop templates were mutated to construct the Loop
libraries using methods described in previous protocols (Kunkel
1987). The following oligonucleotides were used for randomization
(Integrated DNA Technologies, Inc.):
TABLE-US-00007
Library B1-L2:
SEQ ID NO: 55: 5'-TAC AAA CTG ATT CTG AAC NNK NNK NNK NNK AAA GGT GAA ACC ACG AC-3'
SEQ ID NO: 56: 5'-TAC AAA CTG ATT CTG AAC NNK NNK NNK NNK NNK AAA GGT GAA ACC ACG AC-3'
SEQ ID NO: 57: 5'-TAC AAA CTG ATT CTG AAC NNK NNK NNK NNK NNK NNK AAA GGT GAA ACC ACG AC-3'
SEQ ID NO: 58: 5'-G TAC GCC AAC GAT AAT NNK NNK NNK NNK GAA TGG ACC TAC GAT G-3'
SEQ ID NO: 59: 5'-G TAC GCC AAC GAT AAT NNK NNK NNK NNK NNK GAA TGG ACC TAC GAT G-3'
SEQ ID NO: 60: 5'-G TAC GCC AAC GAT AAT NNK NNK NNK NNK NNK NNK GAA TGG ACC TAC GAT G-3'
Library L1-B2:
SEQ ID NO: 61: 5'-GGT GAA ACC ACG ACC NNK NNK NNK NNK GCA GCA ACG GCA GAA AAA-3'
SEQ ID NO: 62: 5'-GGT GAA ACC ACG ACC NNK NNK NNK NNK NNK GCA GCA ACG GCA GAA AAA-3'
SEQ ID NO: 63: 5'-GGT GAA ACC ACG ACC NNK NNK NNK NNK NNK NNK GCA GCA ACG GCA GAA AAA-3'
SEQ ID NO: 64: 5'-GT GAA TGG ACC TAC GAT NNK NNK NNK NNK ACC TTC ACG GTT ACC G-3'
SEQ ID NO: 65: 5'-GT GAA TGG ACC TAC GAT NNK NNK NNK NNK NNK ACC TTC ACG GTT ACC G-3'
SEQ ID NO: 66: 5'-GT GAA TGG ACC TAC GAT NNK NNK NNK NNK NNK NNK ACC TTC ACG GTT ACC G-3'
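The NNK codon used in these oligonucleotides (N = A/C/G/T, K = G/T) can be expanded to show what it encodes. This sketch uses the standard IUPAC ambiguity codes and the standard genetic code; `expand_codon` is a hypothetical helper. It confirms that NNK covers 32 codons encoding all 20 amino acids with a single stop codon, TAG (amber).

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_codon(degenerate: str) -> list:
    """All concrete codons represented by a degenerate codon such as NNK."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate.upper()))]


nnk = expand_codon("NNK")
encoded = [CODON_TABLE[c] for c in nnk]
amino_acids = sorted(set(encoded) - {"*"})
stops = [c for c in nnk if CODON_TABLE[c] == "*"]
print(len(nnk), len(amino_acids), stops)   # 32 20 ['TAG']
```

Because the only NNK stop codon is amber, it is read as Gln in a supE display host.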
[0215] The number of transformants was 1×10^9 for Library
B1-L2 and 1×10^10 for Library L1-B2. The selections were
performed using the methods described below, except that the library
was added directly to selection wells coated with anti-FLAG
antibody (5 μg/ml diluted in PBT) and there was no preincubation
step. Selections on anti-FLAG were performed to identify folded
variants (misfolded proteins are cleaved, thereby losing the
N-terminal FLAG tag). Three rounds of selection (8 washes/round)
were performed; good enrichment was observed by pool ELISA at
Rounds 2 and 3.
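The transformant numbers can be compared with the theoretical DNA diversity of an NNK library (32 codons per randomized position). The sketch below assumes, for illustration only, that each of the two loops in Library B1-L2 independently carries 4, 5 or 6 NNK codons, and that clones are sampled uniformly, so the expected fraction of sequences seen at least once is 1 - exp(-N/D). Function names are hypothetical.

```python
from math import exp

CODONS_PER_NNK = 32


def dna_diversity(*loop_lengths: int) -> int:
    """Number of distinct DNA sequences for loops randomized with NNK codons."""
    return CODONS_PER_NNK ** sum(loop_lengths)


def expected_coverage(transformants: float, diversity: float) -> float:
    """Expected fraction of sequences sampled at least once (uniform Poisson sampling)."""
    return 1.0 - exp(-transformants / diversity)


# Library B1-L2: each of the two loops assumed to carry 4, 5 or 6 NNK codons.
for n1 in (4, 5, 6):
    for n2 in (4, 5, 6):
        d = dna_diversity(n1, n2)
        print(f"{n1}+{n2} NNK: diversity {d:.2e}, "
              f"coverage at 1e9 transformants {expected_coverage(1e9, d):.2e}")
```

Even the shortest loop pair (32^8, about 1.1×10^12 sequences) exceeds 10^9 transformants, so such a library samples sequence space sparsely; selection then enriches folded variants from that sample.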
[0216] The results of anti-FLAG selections of the loop libraries
showed that all loops tolerated mutations, including insertion
mutations, while maintaining the structure of the scaffold. The
following exemplary loop sequences were identified following
anti-FLAG selection:
TABLE-US-00008 TABLE 1 Anti-FLAG selection of loop libraries B1-L2 and L1-B2.
Loop 1 (positions 8-11, a, b) | SEQ ID NO | Loop 3 (positions 37-40, a, b) | SEQ ID NO | Loop 2 (positions 18-21, a, b) | SEQ ID NO | Loop 4 (positions 46-49, a, b) | SEQ ID NO
WPCGV | 67 | QVGS | 104 | GRRT | 135 | LIPNCY | 166
EVGGV | 68 | GVWSQG | 105 | FECGWG | 136 | SSALKR | 167
SSAWR | 69 | WGCR | 106 | DRGS | 137 | ELGG | 168
CRGT | 70 | STLGG | 107 | TCTP | 138 | CARRHC | 169
WGEE | 71 | FVLAHS | 108 | VEGG | 139 | CWPSG | 170
GSKTG | 72 | RHAM | 109 | SLDER | 140 | GASINC | 171
ASTG | 73 | TKFC | 110 | GGAE | 141 | GCGR | 172
GGRWR | 74 | FCGSRG | 111 | AFEAE | 142 | YKCTDD | 173
RGGE | 75 | MFTE | 112 | PESIMR | 143 | CRGPR | 174
SDHS | 76 | GVGG | 113 | GEVT | 144 | SSVG | 175
SDGM | 77 | LRGL | 114 | SSVDG | 145 | ACLGG | 176
NAHR | 78 | RRIQCG | 115 | VGGA | 146 | QNCEM | 177
CGEPE | 79 | QNLV | 116 | GWCAPR | 147 | KERGAG | 178
THGA | 80 | YTDALS | 117 | GECWG | 148 | PDEMV | 179
TGLVR | 81 | KAVSVR | 118 | HHGCRA | 149 | NSDQQ | 180
GACVR | 82 | HGRTAG | 119 | CDDR | 150 | GAGG | 181
GQQH | 83 | RGVV | 120 | DWGR | 151 | QGCGE | 182
GTSRE | 84 | VWLG | 121 | TRGN | 152 | CPSR | 183
CATTW | 85 | GEDA | 122 | DSSA | 153 | SDGC | 184
GVAG | 86 | SVWEC | 123 | LSCQ | 154 | AGSSP | 185
CARYG | 87 | SKYVLG | 124 | CVETR | 155 | APQVG | 186
LDFLC | 88 | APLRMQ | 125 | VVGE | 156 | GCSA | 187
CNTR | 89 | YGWKH | 126 | RPTSDM | 157 | GCRGES | 188
LPSR | 90 | GCGSRL | 127 | WEDTCV | 158 | PRPDA | 189
RDIY | 91 | DAMCKG | 128 | SCLG | 159 | SGNLGG | 190
GWGGAW | 92 | RGKY | 129 | KEVKQ | 160 | RGMA | 191
LCVPIN | 93 | EGGG | 130 | DSSV | 161 | EGGG | 192
WEKED | 94 | DSSCG | 131 | CTLK | 162 | RRDDE | 193
WGSQ | 95 | GIGVA | 132 | PSGH | 163 | LPYP | 194
GDHAFS | 96 | MCSSG | 133 | WSQC | 164 | GRAG | 195
WGGGAC | 97 | CPTR | 134 | QCNN | 165 | YRLGR | 196
GCVK | 98 | SIIL | 267 | | | |
EGHSA | 99 | QRYD | 268 | | | |
GYGGR | 100 | KEYYNM | 269 | | | |
CCGL | 101 | GGHS | 270 | | | |
KDGG | 102 | EFFS | 271 | | | |
TSNGV | 103 | GVVLK | 272 | | | |
3. Preparation of GB1 Peptidic Libraries
[0217] The solvent-accessible surface area (SASA) of each residue
in Protein Data Bank (PDB) structure 3GB1 was estimated using
the GETarea tool (Fraczkiewicz & Braun, "Exact and efficient
analytical calculation of the accessible surface areas and their
gradients for macromolecules," J. Comput. Chem. 1998, 19, 319-333).
This tool also calculates the ratio of each residue's SASA in the
structure to its SASA in a random coil. A ratio of 0.4 was used as the
threshold for selecting solvent-accessible residues (shown in bold):
TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE (SEQ ID NO: 1).
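The selection criterion above amounts to a per-residue threshold on the ratio of SASA in the structure to SASA in a random coil. The Python sketch below assumes the two per-residue area lists have already been obtained (for example, exported from a GETarea run); the function name `select_exposed`, the toy numbers and the reading that a ratio at or above 0.4 counts as solvent accessible are assumptions for illustration, not values taken from 3GB1.

```python
def select_exposed(sasa, coil_sasa, cutoff=0.4):
    """Return 1-based positions whose SASA / random-coil SASA ratio meets the cutoff.

    sasa, coil_sasa: equal-length sequences of per-residue areas in the same units.
    Assumption: ratio >= cutoff is treated as solvent accessible.
    """
    if len(sasa) != len(coil_sasa):
        raise ValueError("sasa and coil_sasa must have the same length")
    exposed = []
    for pos, (observed, reference) in enumerate(zip(sasa, coil_sasa), start=1):
        if observed / reference >= cutoff:
            exposed.append(pos)
    return exposed


# Hypothetical values for a 6-residue stretch (illustrative only, not GB1 data).
observed = [10.0, 80.0, 45.0, 5.0, 120.0, 60.0]
random_coil = [100.0, 150.0, 100.0, 140.0, 200.0, 160.0]
print(select_exposed(observed, random_coil))  # ratios 0.10, 0.53, 0.45, 0.04, 0.60, 0.375
```

With these toy numbers only positions 2, 3 and 5 pass the 0.4 cutoff.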
[0218] Various contiguous stretches of solvent-accessible residues
were selected for randomization (shaded dark in FIGS. 5 to 10),
taking into account the oligonucleotide length and homology
requirements for Kunkel mutagenesis. The parent sequence, with the
numbering scheme and the loop and beta-turn regions defined, is also
shown in FIG. 3.
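Picking contiguous stretches from a set of selected positions is a run-length grouping of consecutive residue numbers. A minimal sketch follows; `contiguous_stretches` and its `min_length` parameter are hypothetical, and the sketch does not model the oligonucleotide-length and homology constraints that also shaped the final choice.

```python
def contiguous_stretches(positions, min_length=1):
    """Group residue positions into runs of consecutive integers.

    Returns (first, last) tuples for runs of at least min_length residues.
    """
    runs = []
    for pos in sorted(set(positions)):
        if runs and pos == runs[-1][-1] + 1:
            runs[-1].append(pos)  # extends the current run
        else:
            runs.append([pos])    # starts a new run
    return [(run[0], run[-1]) for run in runs if len(run) >= min_length]


# Hypothetical selected positions, keeping only stretches of two or more residues.
print(contiguous_stretches([2, 3, 5, 9, 10, 11, 12], min_length=2))
```

For this input the isolated position 5 is dropped, leaving the stretches 2-3 and 9-12.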
[0219] In addition, positions in the loops were selected for
mutations that combine substitution with insertion of 0, 1 or 2
additional amino acid residues:
Library 1: +0-2 insertions at position 38;
Library 2: +0-2 insertions at position 19;
Library 3: +2 insertions at position 1, +0-2 insertions at positions 19 and 47;
Library 4: +0-2 insertions at positions 9 and 38, +1 insertion at position 55;
Library 5: +0-2 insertions at position 9, +1 insertion at position 55;
Library 6: +1 insertion at position 1, +0-2 insertions at position 47.
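Each library therefore contains several loop-length combinations. Assuming the insertion choice at each position is made independently (separate oligonucleotides per region), the combinations are the Cartesian product of the per-position options. The dictionary below is transcribed from the list above; the helper name `length_variants` is illustrative.

```python
from itertools import product

# Insertion options per scaffold position, transcribed from the library list above.
LIBRARY_INSERTIONS = {
    "Library 1": {38: (0, 1, 2)},
    "Library 2": {19: (0, 1, 2)},
    "Library 3": {1: (2,), 19: (0, 1, 2), 47: (0, 1, 2)},
    "Library 4": {9: (0, 1, 2), 38: (0, 1, 2), 55: (1,)},
    "Library 5": {9: (0, 1, 2), 55: (1,)},
    "Library 6": {1: (1,), 47: (0, 1, 2)},
}


def length_variants(insertions):
    """Enumerate each combination of insertion sizes as {position: residues inserted}."""
    positions = sorted(insertions)
    options = [insertions[pos] for pos in positions]
    return [dict(zip(positions, combo)) for combo in product(*options)]


for name, insertions in LIBRARY_INSERTIONS.items():
    print(name, len(length_variants(insertions)))
```

Libraries 3 and 4 each have 3 × 3 = 9 length combinations, while the other four libraries have 3 each.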
[0220] The following oligonucleotides (Integrated DNA
Technologies) were used to construct the libraries by the Kunkel
mutagenesis method:
Library 1:
[0221] (SEQ ID NO: 197) 5'-ACGACCGAAGCAGTG KHT KHT KHT KHT GCA KHT KHT GTT TTC KHT KHT TAC GCC KHT KHT AAT KHT KHT KHT KHT KHT TGGACCTACGATGAT-3'
(SEQ ID NO: 198) 5'-ACGACCGAAGCAGTG KHT KHT KHT KHT GCA KHT KHT GTT TTC KHT KHT TAC GCC KHT KHT AAT KHT KHT KHT KHT KHT KHT TGGACCTACGATGAT-3'
(SEQ ID NO: 199) 5'-ACGACCGAAGCAGTG KHT KHT KHT KHT GCA KHT KHT GTT TTC KHT KHT TAC GCC KHT KHT AAT KHT KHT KHT KHT KHT KHT KHT TGGACCTACGATGAT-3'
These oligonucleotides include the variable regions where each
variant amino acid position is encoded by a KHT codon. SEQ ID NOs:
197-199 include insertion mutations of +0, 1 or 2 additional
variant amino acids, respectively, at the position equivalent to
position 38 of the scaffold.
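In the KHT codon, K (G or T) and H (A, C or T) are standard IUPAC nucleotide ambiguity codes, so each KHT position stands for a defined set of codons. The sketch below expands a degenerate codon and translates it with the standard genetic code; the helper names are illustrative. For comparison it also expands NNK, the codon used in the loop-library oligonucleotides of SEQ ID NOs: 55-66.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expand(degenerate_codon):
    """List every concrete codon represented by a degenerate IUPAC codon."""
    choices = [IUPAC[base] for base in degenerate_codon.upper()]
    return ["".join(codon) for codon in product(*choices)]


def encoded_residues(degenerate_codon):
    """Sorted set of amino acids ('*' = stop) encoded by a degenerate codon."""
    return sorted({CODON_TABLE[codon] for codon in expand(degenerate_codon)})


print(expand("KHT"))
print(encoded_residues("KHT"))
print(len(expand("NNK")), encoded_residues("NNK"))
```

KHT expands to six codons (GAT, GCT, GTT, TAT, TCT, TTT) that encode six different residues (A, D, F, S, V, Y) with no stop codon, whereas NNK expands to 32 codons covering all 20 amino acids plus the TAG stop.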
Library 2:
[0222] (SEQ ID NO: 200) 5'-GGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT GCA KHT KHT KHT TTC KHT KHT KHT GCC KHT KHT AATGGCGTGGATGGT-3'
(SEQ ID NO: 201) 5'-GGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT KHT GCA KHT KHT KHT TTC KHT KHT KHT GCC KHT KHT AATGGCGTGGATGGT-3'
(SEQ ID NO: 202) 5'-GGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT KHT KHT GCA KHT KHT KHT TTC KHT KHT KHT GCC KHT KHT AATGGCGTGGATGGT-3'
These oligonucleotides include the variable regions where each
variant amino acid position is encoded by a KHT codon. SEQ ID NOs:
200-202 include insertion mutations of +0, 1 or 2 additional
variant amino acids, respectively, at the position equivalent to
position 19 of the scaffold.
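Because the six KHT codons translate to six different amino acids (no synonymous codons), the theoretical diversity of one oligonucleotide is 6^n, where n is its number of KHT codons. A minimal sketch, assuming the codon-spaced notation used above; the function names are illustrative, and the estimate ignores codon bias, mutagenesis efficiency and carry-over of unmutated template.

```python
def count_kht(oligo):
    """Count KHT codons in an oligonucleotide written with space-separated codons."""
    return oligo.upper().split().count("KHT")


def kht_diversity(oligo):
    """Theoretical variant count: each KHT codon encodes one of 6 amino acids."""
    return 6 ** count_kht(oligo)


# SEQ ID NO: 200, copied as written above.
SEQ_200 = ("5'-GGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT GCA KHT KHT KHT TTC "
           "KHT KHT KHT GCC KHT KHT AATGGCGTGGATGGT-3'")

print(count_kht(SEQ_200), kht_diversity(SEQ_200))
```

SEQ ID NO: 200 carries 15 KHT codons, giving 6^15 = 470,184,984,576 (about 4.7 × 10¹¹) possible variants. For scale, the pooled libraries in [0227] total 3.5 × 10¹⁰ transformants, so even this single +0 oligonucleotide cannot be sampled exhaustively.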
Library 3:
[0223] (SEQ ID NO: 203) 5'-GATGATAAAGGCGGTAGC KHT KHT KHT TACAAACTGATTCTGAAC-3'
(SEQ ID NO: 204) 5'-AAAGGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT GCAGAAAAAGTTTTCAAA-3'
(SEQ ID NO: 205) 5'-AAAGGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT KHT GCAGAAAAAGTTTTCAAA-3'
(SEQ ID NO: 206) 5'-AAAGGTGAAACCACGACC KHT KHT KHT KHT KHT KHT KHT KHT KHT GCAGAAAAAGTTTTCAAA-3'
(SEQ ID NO: 207) 5'-GATGGTGAATGGACCTAC KHT KHT KHT KHT KHT ACCTTCACGGTTACCGAA-3'
(SEQ ID NO: 208) 5'-GATGGTGAATGGACCTAC KHT KHT KHT KHT KHT KHT ACCTTCACGGTTACCGAA-3'
(SEQ ID NO: 209) 5'-GATGGTGAATGGACCTAC KHT KHT KHT KHT KHT KHT KHT ACCTTCACGGTTACCGAA-3'
These oligonucleotides include the variable regions where each
variant amino acid position is encoded by a KHT codon. SEQ ID NO:
203 includes an insertion mutation of +2 variant amino acids at the
position equivalent to position 1 of the scaffold. SEQ ID NOs:
204-206 include mutations of +0, 1 or 2 additional variant amino
acids, respectively, at the position equivalent to position 19 of
the scaffold. SEQ ID NOs: 207-209 include mutations of +0, 1 or 2
additional variant amino acids, respectively, at the position
equivalent to position 47 of the scaffold.
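The scaffold positions covered by an oligonucleotide can be checked by translating its fixed homology arms and locating them in SEQ ID NO: 1. The sketch below handles oligonucleotides with a single variable block whose two arms both fall within SEQ ID NO: 1 (for example SEQ ID NOs: 204-209); arms that extend into the FLAG tag or linker (such as SEQ ID NO: 203) are reported as non-matching. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Parent GB1 scaffold, SEQ ID NO: 1 (the text numbers positions from 1).
SCAFFOLD = "TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"


def translate(dna):
    """Translate an in-frame DNA string with the standard genetic code."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def map_variable_block(oligo):
    """Return (first, last, n_codons, n_inserted) for one variable block.

    first/last: 1-based scaffold positions replaced by the variable codons.
    """
    tokens = oligo.replace("5'-", "").replace("-3'", "").split()
    left, right = translate(tokens[0]), translate(tokens[-1])
    left_at, right_at = SCAFFOLD.find(left), SCAFFOLD.find(right)
    if left_at < 0 or right_at < 0:
        raise ValueError("homology arm does not match the scaffold")
    first = left_at + len(left) + 1  # 1-based position just after the 5' arm
    last = right_at                  # 1-based position just before the 3' arm
    n_codons = len(tokens) - 2       # every middle token is one variable codon
    return first, last, n_codons, n_codons - (last - first + 1)


print(map_variable_block("5'-GATGGTGAATGGACCTAC KHT KHT KHT KHT KHT ACCTTCACGGTTACCGAA-3'"))
print(map_variable_block("5'-GATGGTGAATGGACCTAC KHT KHT KHT KHT KHT KHT KHT ACCTTCACGGTTACCGAA-3'"))
```

For SEQ ID NOs: 207 and 209 the arms translate to DGEWTY and TFTVTE, so the variable block replaces positions 45-49, with 0 and 2 inserted residues respectively.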
Library 4:
[0224] (SEQ ID NO: 210) 5'-ACGTACAAACTGATTCTG KHT KHT KHT KHT KHT KHT GGTGAAACCACGACCGAA-3'
(SEQ ID NO: 211) 5'-ACGTACAAACTGATTCTG KHT KHT KHT KHT KHT KHT KHT GGTGAAACCACGACCGAA-3'
(SEQ ID NO: 212) 5'-ACGTACAAACTGATTCTG KHT KHT KHT KHT KHT KHT KHT KHT GGTGAAACCACGACCGAA-3'
(SEQ ID NO: 213) 5'-AAACAGTACGCCAACGAT KHT KHT KHT KHT KHT KHT TGGACCTACGATGATGCG-3'
(SEQ ID NO: 214) 5'-AAACAGTACGCCAACGAT KHT KHT KHT KHT KHT KHT KHT TGGACCTACGATGATGCG-3'
(SEQ ID NO: 215) 5'-AAACAGTACGCCAACGAT KHT KHT KHT KHT KHT KHT KHT KHT TGGACCTACGATGATGCG-3'
(SEQ ID NO: 216) 5'-ACGAAAACCTTCACGGTT KHT KHT KHT GGCGGTTCTGACAAAACT-3'
These oligonucleotides include the variable regions where each
variant amino acid position is encoded by a KHT codon. SEQ ID NOs:
210-212 include mutations of +0, 1 or 2 additional variant amino
acids, respectively, at the position equivalent to position 9 of
the scaffold. SEQ ID NOs: 213-215 include mutations of +0, 1 or 2
additional variant amino acids, respectively, at the position
equivalent to position 38 of the scaffold. SEQ ID NO: 216 includes
an insertion mutation of +1 variant amino acid at the position
equivalent to position 55 of the scaffold.
Library 5:
[0225] (SEQ ID NO: 217) 5'-AAAGGCGGTAGCACGTAC KHT CTG KHT CTG KHT KHT KHT KHT KHT KHT KHT KHT ACC KHT ACCGAAGCAGTGGATGCA-3'
(SEQ ID NO: 218) 5'-AAAGGCGGTAGCACGTAC KHT CTG KHT CTG KHT KHT KHT KHT KHT KHT KHT KHT KHT ACC KHT ACCGAAGCAGTGGATGCA-3'
(SEQ ID NO: 219) 5'-AAAGGCGGTAGCACGTAC KHT CTG KHT CTG KHT KHT KHT KHT KHT KHT KHT KHT KHT KHT ACC KHT ACCGAAGCAGTGGATGCA-3'
(SEQ ID NO: 220) 5'-GATGCGACGAAAACCTTC KHT GTT KHT KHT KHT GGCGGTTCTGACAAAACT-3'
These oligonucleotides include the variable regions where each
variant amino acid position is encoded by a KHT codon. SEQ ID NOs:
217-219 include mutations of +0, 1 or 2 additional variant amino
acids, respectively, at the position equivalent to position 9 of
the scaffold. SEQ ID NO: 220 includes an insertion mutation of +1
variant amino acid at the position equivalent to position 55 of
the scaffold.
Library 6:
[0226] (SEQ ID NO: 221) 5'-GATGATAAAGGCGGTAGC KHT KHT TAC KHT CTG KHT CTG KHT GGCAAAACCCTGAAAGGT-3'
(SEQ ID NO: 222) 5'-GATAATGGCGTGGATGGT KHT TGG KHT TAC KHT KHT KHT KHT KHT KHT TTC KHT GTT KHT GAAGGCGGTTCTGACAAA-3'
(SEQ ID NO: 223) 5'-GATAATGGCGTGGATGGT KHT TGG KHT TAC KHT KHT KHT KHT KHT KHT KHT TTC KHT GTT KHT GAAGGCGGTTCTGACAAA-3'
(SEQ ID NO: 224) 5'-GATAATGGCGTGGATGGT KHT TGG KHT TAC KHT KHT KHT KHT KHT KHT KHT KHT TTC KHT GTT KHT GAAGGCGGTTCTGACAAA-3'
These oligonucleotides include the variable regions where each
variant amino acid position is encoded by a KHT codon. SEQ ID NO:
221 includes an insertion mutation of +1 variant amino acid at the
position equivalent to position 1 of the scaffold. SEQ ID NOs:
222-224 include mutations of +0, 1 or 2 additional variant amino
acids, respectively, at the position equivalent to position 47 of
the scaffold.
[0227] The libraries were prepared using the same method described
above for the GB1 template with the Fab dimerization sequence (Fellouse
& Sidhu, 2007). Oligonucleotides with 0, 1 or 2 insertions share the
same homology regions and therefore compete for binding to the
template, so they were pooled in an equimolar ratio and treated as a
single oligonucleotide for mutagenesis. The constructed libraries were
pooled to give a total diversity of 3.5 × 10¹⁰ transformants. Selections
were performed against L-VEGF and D-VEGF as described below, except
that 10 selection wells were used for Round 1.
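Pooling oligonucleotides in an equimolar ratio reduces to giving each stock an equal share of the total moles, using 1 µM = 1 pmol/µl. A minimal sketch; the stock concentrations and the 600 pmol total are hypothetical values, not amounts from this protocol.

```python
def equimolar_volumes(stocks_uM, total_pmol):
    """Volume (µl) of each oligonucleotide stock giving an equal molar share.

    stocks_uM: {name: stock concentration in µM, i.e. pmol/µl}
    total_pmol: total amount of pooled oligonucleotide wanted, in pmol.
    """
    share = total_pmol / len(stocks_uM)
    return {name: share / conc for name, conc in stocks_uM.items()}


# Hypothetical stock concentrations for the +0/+1/+2 set of Library 1.
stocks = {"SEQ ID NO: 197": 100.0, "SEQ ID NO: 198": 80.0, "SEQ ID NO: 199": 50.0}
for name, vol in equimolar_volumes(stocks, total_pmol=600.0).items():
    print(f"{name}: {vol:.2f} µl")
```

Each oligonucleotide contributes 200 pmol, so the more dilute stocks simply need proportionally larger volumes (2.00, 2.50 and 4.00 µl here).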
[0228] Selections were also performed against 3BP2-SH2, ABL-SH3 and
v-Src-SH3 proteins using similar methods to those described
below.
[0229] Individual clones were analyzed by direct-binding ELISA as
described below and by single-point competitive ELISA (Fellouse
& Sidhu, 2007).
4. Methods of Screening of Phage Display Libraries
[0230] 4.1 Library Selections Against VEGF Protein and Negative
Selection with BSA
[0231] The selection procedure is essentially the same as described
in previous protocols (Fellouse & Sidhu, 2007) with some minor
changes. Although the method below is described for L-VEGF, the
method can be adapted to screen for binding to any target. The
media and buffer recipes are the same as in that protocol.
1. Coat NUNC Maxisorp plate wells with 100 µl of L-VEGF (5 µg/ml in PBS) for 2 h at room temperature. Coat 5 wells for selection and 1 well for phage pool ELISA.
2. Remove the coating solution and block for 1 h with 200 µl of PBS, 0.2% BSA. At the same time, block an uncoated well as a negative control for pool ELISA. Also block 7 wells on a separate plate for pre-incubation of the library.
3. Remove the block solution from the pre-incubation plate and wash four times with PT buffer.
4. Add 100 µl of library phage solution (precipitated and resuspended in PBT buffer) to each blocked well. Incubate at room temperature for 1 h with gentle shaking.
5. Remove the block solution from the selection plate and wash four times with PT buffer.
6. Transfer the library phage solution from the pre-incubation plate to the selection plate (5 selection wells + 2 controls for pool ELISA).
7. Remove the phage solution and wash 8-10 times with PT buffer (the number of washes is increased based on the pool ELISA signal from the previous round).
8. To elute bound phage from the selection wells, add 100 µl of 100 mM HCl and incubate for 5 min at room temperature. Transfer the HCl solution to a 1.5-ml microfuge tube and adjust to neutral pH with 11 µl of 1.0 M Tris-HCl, pH 11.0.
9. Meanwhile, add 100 µl of anti-M13-HRP conjugate (1:5000 dilution in PBT buffer) to the control wells and incubate for 30 min.
10. Wash the control wells four times with PT buffer. Add 100 µl of freshly prepared TMB substrate and allow color to develop for 5-10 min.
11. Stop the reaction with 100 µl of 1.0 M H₃PO₄ and read absorbance at 450 nm in a microtiter plate reader. The enrichment ratio can be calculated as the ratio of signal from the coated vs. uncoated well.
12. Add 250 µl of eluted phage solution to 2.5 ml of actively growing E. coli XL1-Blue (OD₆₀₀ < 0.8) in 2YT/tet medium. Incubate for 20 min at 37 °C with shaking at 200 rpm.
13. Add M13KO7 helper phage to a final concentration of 10¹⁰ phage/ml. Incubate for 45 min at 37 °C with shaking at 200 rpm.
14. Transfer the culture (infected with phage eluted from the antigen-coated wells) to 25 volumes of 2YT/carb/kan medium and incubate overnight at 37 °C with shaking at 200 rpm.
15. Isolate phage by precipitation with PEG/NaCl solution and resuspend in 1.0 ml of PBT buffer.
16. Repeat the selection cycle for 4 rounds.
4.2. Negative Selection with GST Tagged Protein
[0232] A more stringent negative selection procedure is as follows.
The selection process is essentially the same as described above
except that:
i) For Rounds 1 and 2, the libraries were pre-incubated on GST-coated (10 µg/ml in PBS) and blocked wells.
ii) For Rounds 3 and 4, the libraries were pre-incubated with 0.2 mg/ml GST in solution for 1 h before transfer to the selection wells.
iii) The control wells for pool ELISA were coated with GST (5 µg/ml in PBS).
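The pool ELISA readout in step 11 of Section 4.1 is a simple ratio of the A450 signal on the target-coated well to that on the control well; in this GST variant the control well is GST-coated rather than uncoated. A minimal sketch with hypothetical absorbance readings (the function name and numbers are illustrative):

```python
def enrichment_ratio(a450_target, a450_control):
    """Pool ELISA enrichment: target-coated well signal / control well signal."""
    if a450_control <= 0:
        raise ValueError("control absorbance must be positive")
    return a450_target / a450_control


# Hypothetical A450 readings (target well, control well) for successive rounds.
readings = {1: (0.12, 0.10), 2: (0.45, 0.11), 3: (1.80, 0.12), 4: (2.40, 0.15)}
for rnd, (target, control) in readings.items():
    print(f"Round {rnd}: {enrichment_ratio(target, control):.1f}")
```

A ratio that rises from round to round, as in these made-up values, indicates that target-binding clones are being enriched relative to background binders.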
4.3. Selections of Libraries Against Anti-FLAG
[0233] Misfolded proteins are degraded in the periplasm and are
not displayed on phage (Missiakas & Raina, "Protein
misfolding in the cell envelope of Escherichia coli: new signaling
pathways," Trends in Biochemical Sciences, 1997, 22, 59-63). Stably
folded variants can therefore be enriched by selecting for display of
the N-terminal FLAG tag.
[0234] The selections were performed on the GB1 loop libraries by a
method similar to the one described above, except that the library
was added directly to selection wells coated with anti-FLAG
antibody (5 µg/ml diluted in PBT) and there was no pre-incubation
step. Only three rounds of selection were performed because clear
enrichment was observed in the pool ELISA at Rounds 2 and 3.
5. Analysis of Single-Clones by Direct Binding ELISA
[0235] The following protocol is an adapted version of previous
protocols (Fellouse & Sidhu 2007; Tonikian et al., "Identifying
specificity profiles for peptide recognition modules from
phage-displayed peptide libraries," Nat. Protoc., 2007, 2,
1368-86):
1. Inoculate 450 µl aliquots of 2YT/carb/KO7 medium in 96-well microtubes with single colonies harboring phagemids and grow for 21 h at 37 °C with shaking at 200 rpm.
2. Centrifuge at 4,000 rpm for 10 min and transfer the phage supernatants to fresh tubes.
3. For each clone, coat 3 wells of a 384-well NUNC Maxisorp plate with 2 µg/ml of L-VEGF, NeutrAvidin and Erbin-GST, respectively, and leave a fourth well uncoated. Incubate for 2 h at room temperature, then block all 4 wells.
4. Wash the plate four times with PT buffer.
5. Transfer 30 µl of phage supernatant to each well and incubate for 2 h at room temperature with gentle shaking.
6. Wash four times with PT buffer.
7. Add 30 µl of anti-M13-HRP conjugate (diluted 1:5000 in PBT buffer). Incubate for 30 min with gentle shaking.
8. Wash four times with PT buffer.
9. Add 30 µl of freshly prepared TMB substrate. Allow color to develop for 5-10 min.
10. Stop the reaction with 100 µl of 1.0 M H₃PO₄ and read absorbance at 450 nm in a microtiter plate reader.
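The four wells per clone allow a simple specificity call: a clone is of interest when its target-well signal clearly exceeds the signal on every control well. The sketch below assumes a hypothetical 3-fold cutoff and made-up A450 readings; neither the cutoff nor the function name comes from this protocol.

```python
def is_specific(wells, target="L-VEGF", min_fold=3.0):
    """True if the target-well A450 is at least min_fold times every other well.

    min_fold is a hypothetical cutoff chosen for illustration.
    """
    target_signal = wells[target]
    controls = [value for name, value in wells.items() if name != target]
    return all(target_signal >= min_fold * value for value in controls)


# Hypothetical A450 readings for two clones across the four wells described above.
clones = {
    "clone_A": {"L-VEGF": 1.9, "NeutrAvidin": 0.08, "Erbin-GST": 0.10, "uncoated": 0.07},
    "clone_B": {"L-VEGF": 1.2, "NeutrAvidin": 0.95, "Erbin-GST": 1.10, "uncoated": 0.09},
}
for name, wells in clones.items():
    print(name, is_specific(wells))
```

Here clone_A would be called specific, while clone_B binds the unrelated NeutrAvidin and Erbin-GST wells almost as strongly as L-VEGF and would not.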
[0236] Although the particular embodiments have been described in
some detail by way of illustration and example for purposes of
clarity of understanding, it is readily apparent in light of the
teachings of this invention that certain changes and modifications
may be made thereto without departing from the spirit or scope of
the appended claims.
[0237] Accordingly, the preceding merely illustrates the principles
of the invention. Various arrangements may be devised which,
although not explicitly described or shown herein, embody the
principles of the invention and are included within its spirit and
scope. Furthermore, all examples and conditional language recited
herein are principally intended to aid the reader in understanding
the principles of the invention and the concepts contributed by the
inventors to furthering the art, and are to be construed as being
without limitation to such specifically recited examples and
conditions. Moreover, all statements herein reciting principles,
aspects, and embodiments of the invention as well as specific
examples thereof, are intended to encompass both structural and
functional equivalents thereof. Additionally, it is intended that
such equivalents include both currently known equivalents and
equivalents developed in the future, i.e., any elements developed
that perform the same function, regardless of structure. The scope
of the present invention, therefore, is not intended to be limited
to the exemplary embodiments shown and described herein. Rather,
the scope and spirit of the present invention is embodied by the
appended claims.
Sequence Listing (272 SEQ ID NOs; every entry is an Artificial Sequence, either a synthetic polypeptide (PRT, one-letter code, Xaa shown as X) or a synthetic polynucleotide (DNA))
SEQ ID NO | Type | Length | Sequence
1 | PRT | 55 | TYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE
2 | PRT | 19 | TYKLILNGKTLKGETTTEA
3 | PRT | 13 | TYDDATKTFTVTE
4 | PRT | 23 | VDAATAEKVFKQYANDNGVDGEW
5 | PRT | 23 | VXXXXAXXVFXXYAXXNXXXXXW
6 | PRT | 23 | VXXXXAXXVFXXYAXXNXXXXXW
7 | PRT | 16 | TYKLILNGKTLKGETT
8 | PRT | 19 | GVDGEWTYDDATKTFTVTE
9 | PRT | 20 | TEAVDAATAEKVFKQYANDN
10 | PRT | 20 | TXXXXXXXAXXXFXXXAXXN
11 | PRT | 20 | TXXXXXXXAXXXFXXXAXXN
12 | PRT | 14 | KLILNGKTLKGETT
13 | PRT | 18 | EKVFKQYANDNGVDGEWT
14 | PRT | 5 | FTVTE
15 | PRT | 9 | TEAVDAATA
16 | PRT | 7 | YDDATKT
17 | PRT | 9 | TXXXXXXXA
18 | PRT | 7 | YXXXXXT
19 | PRT | 9 | TXXXXXXXA
20 | PRT | 7 | YXXXXXT
21 | PRT | 5 | TYKLI
22 | PRT | 21 | ETTTEAVDAATAEKVFKQYAN
23 | PRT | 10 | TYDDATKTFT
24 | PRT | 8 | LNGKTLKG
25 | PRT | 8 | DNGVDGEW
26 | PRT | 8 | LXXXXXXG
27 | PRT | 8 | DXXXXXXW
28 | PRT | 8 | LXXXXXXG
29 | PRT | 8 | DXXXXXXW
30 | PRT | 33 | EAVDAATAEKVFKQYANDNGVDGEWTYDDATKT
31 | PRT | 17 | TYKLILNGKTLKGETTT
32 | PRT | 5 | FTVTE
33 | PRT | 17 | TYXLXLXXXXXXXXTXT
34 | PRT | 5 | FXVXX
35 | PRT | 17 | TYXLXLXXXXXXXXTXT
36 | PRT | 5 | FXVXX
37 | PRT | 31 | KTLKGETTTEAVDAATAEKVFKQYANDNGVD
38 | PRT | 8 | TYKLILNG
39 | PRT | 16 | GEWTYDDATKTFTVTE
40 | PRT | 8 | XYXLXLXG
41 | PRT | 16 | GXWXYXXXXXXFXVXE
42 | PRT | 8 | XYXLXLXG
43 | PRT | 16 | GXWXYXXXXXXFXVXE
44 | PRT | 79 | DYKDDDDKGGSTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEGGSHHHHHHHHHH
45 | PRT | 23 | MKKNIAFLLASMFVFSIATNAYA
46 | PRT | 9 | DKTHTCGRP
47 | DNA | 36 | gttaccgaag gcggttcttc tagaagtggt tccggt
48 | PRT | 12 | VTEGGSSRSGSG
49 | DNA | 60 | ttaccgaagg cggttctgac aaaactcaca catgcggccg gcccagtggt tccggtgatt
50 | PRT | 21 | VTEGGSDKTHTCGRPSGSGDF
51 | DNA | 44 | tacaaactga ttctgaacta ataataaaaa ggtgaaacca cgac
52 | DNA | 41 | gtacgccaac gataattaat aataagaatg gacctacgat g
53 | DNA | 42 | ggtgaaacca cgacctaata ataagcagca acggcagaaa aa
54 | DNA | 42 | gtgaatggac ctacgattaa taataaacct tcacggttac cg
55 | DNA | 47 | tacaaactga ttctgaacnn knnknnknnk aaaggtgaaa ccacgac
56 | DNA | 50 | tacaaactga ttctgaacnn knnknnknnk nnkaaaggtg aaaccacgac
57 | DNA | 53 | tacaaactga ttctgaacnn knnknnknnk nnknnkaaag gtgaaaccac gac
58 | DNA | 44 | gtacgccaac gataatnnkn nknnknnkga atggacctac gatg
59 | DNA | 47 | gtacgccaac gataatnnkn nknnknnknn kgaatggacc tacgatg
60 | DNA | 50 | gtacgccaac gataatnnkn nknnknnknn knnkgaatgg acctacgatg
61 | DNA | 45 | ggtgaaacca cgaccnnknn knnknnkgca gcaacggcag aaaaa
62 | DNA | 48 | ggtgaaacca cgaccnnknn knnknnknnk gcagcaacgg cagaaaaa
63 | DNA | 51 | ggtgaaacca cgaccnnknn knnknnknnk nnkgcagcaa cggcagaaaa a
64 | DNA | 45 | gtgaatggac ctacgatnnk nnknnknnka ccttcacggt taccg
65 | DNA | 48 | gtgaatggac ctacgatnnk nnknnknnkn nkaccttcac ggttaccg
66 | DNA | 51 | gtgaatggac ctacgatnnk nnknnknnkn nknnkacctt cacggttacc g
67 | PRT | 5 | WPCGV
68 | PRT | 5 | EVGGV
69 | PRT | 5 | SSAWR
70 | PRT | 4 | CRGT
71 | PRT | 4 | WGEE
72 | PRT | 5 | GSKTG
73 | PRT | 4 | ASTG
74 | PRT | 5 | GGRWR
75 | PRT | 4 | RGGE
76 | PRT | 4 | SDHS
77 | PRT | 4 | SDGM
78 | PRT | 4 | NAHR
79 | PRT | 5 | CGEPE
80 | PRT | 4 | THGA
81 | PRT | 5 | TGLVR
82 | PRT | 5 | GACVR
83 | PRT | 4 | GQQH
84 | PRT | 5 | GTSRE
85 | PRT | 5 | CATTW
86 | PRT | 4 | GVAG
87 | PRT | 5 | CARYG
88 | PRT | 5 | LDFLC
89 | PRT | 4 | CNTR
90 | PRT | 4 | LPSR
91 | PRT | 4 | RDIY
92 | PRT | 6 | GWGGAW
93 | PRT | 6 | LCVPIN
94 | PRT | 5 | WEKED
95 | PRT | 4 | WGSQ
96 | PRT | 6 | GDHAFS
97 | PRT | 6 | WGGGAC
98 | PRT | 4 | GCVK
99 | PRT | 5 | EGHSA
100 | PRT | 5 | GYGGR
101 | PRT | 4 | CCGL
102 | PRT | 4 | KDGG
103 | PRT | 5 | TSNGV
104 | PRT | 4 | QVGS
105 | PRT | 6 | GVWSQG
106 | PRT | 4 | WGCR
107 | PRT | 5 | STLGG
108 | PRT | 6 | FVLAHS
109 | PRT | 4 | RHAM
110 | PRT | 4 | TKFC
111 | PRT | 6 | FCGSRG
112 | PRT | 4 | MFTE
113 | PRT | 4 | GVGG
114 | PRT | 4 | LRGL
115 | PRT | 6 | RRIQCG
116 | PRT | 4 | QNLV
117 | PRT | 6 | YTDALS
118 | PRT | 6 | KAVSVR
119 | PRT | 6 | HGRTAG
120 | PRT | 4 | RGVV
121 | PRT | 4 | VWLG
122 | PRT | 4 | GEDA
123 | PRT | 5 | SVWEC
124 | PRT | 6 | SKYVLG
125 | PRT | 6 | APLRMQ
126 | PRT | 5 | YGWKH
127 | PRT | 6 | GCGSRL
128 | PRT | 6 | DAMCKG
129 | PRT | 4 | RGKY
130 | PRT | 4 | EGGG
131 | PRT | 5 | DSSCG
132 | PRT | 5 | GIGVA
133 | PRT | 5 | MCSSG
134 | PRT | 4 | CPTR
135 | PRT | 4 | GRRT
136 | PRT | 6 | FECGWG
137 | PRT | 4 | DRGS
138 | PRT | 4 | TCTP
139 | PRT | 4 | VEGG
140 | PRT | 5 | SLDER
141 | PRT | 4 | GGAE
142 | PRT | 5 | AFEAE
143 | PRT | 6 | PESIMR
144 | PRT | 4 | GEVT
145 | PRT | 5 | SSVDG
146 | PRT | 4 | VGGA
147 | PRT | 6 | GWCAPR
148 | PRT | 5 | GECWG
149 | PRT | 6 | HHGCRA
150 | PRT | 4 | CDDR
151 | PRT | 4 | DWGR
152 | PRT | 4 | TRGN
153 | PRT | 4 | DSSA
154 | PRT | 4 | LSCQ
155 | PRT | 5 | CVETR
156 | PRT | 4 | VVGE
157 | PRT | 6 | RPTSDM
158 | PRT | 6 | WEDTCV
159 | PRT | 4 | SCLG
160 | PRT | 5 | KEVKQ
161 | PRT | 4 | DSSV
162 | PRT | 4 | CTLK
163 | PRT | 4 | PSGH
164 | PRT | 4 | WSQC
165 | PRT | 4 | QCNN
166 | PRT | 6 | LIPNCY
167 | PRT | 6 | SSALKR
168 | PRT | 4 | ELGG
169 | PRT | 6 | CARRHC
170 | PRT | 5 | CWPSG
171 | PRT | 6 | GASINC
172 | PRT | 4 | GCGR
173 | PRT | 6 | YKCTDD
174 | PRT | 5 | CRGPR
175 | PRT | 4 | SSVG
176 | PRT | 5 | ACLGG
177 | PRT | 5 | QNCEM
178 | PRT | 6 | KERGAG
179 | PRT | 5 | PDEMV
180 | PRT | 5 | NSDQQ
181 | PRT | 4 | GAGG
182 | PRT | 5 | QGCGE
183 | PRT | 4 | CPSR
184 | PRT | 4 | SDGC
185 | PRT | 5 | AGSSP
186 | PRT | 5 | APQVG
187 | PRT | 4 | GCSA
188 | PRT | 6 | GCRGES
189 | PRT | 5 | PRPDA
190 | PRT | 6 | SGNLGG
191 | PRT | 4 | RGMA
192 | PRT | 4 | EGGG
193 | PRT | 5 | RRDDE
194 | PRT | 4 | LPYP
195 | PRT | 4 | GRAG
196 | PRT | 5 | YRLGR
197 | DNA | 93 | acgaccgaag cagtgkhtkh tkhtkhtgca khtkhtgttt tckhtkhtta cgcckhtkht aatkhtkhtk htkhtkhttg gacctacgat gat
198 | DNA | 96 | acgaccgaag cagtgkhtkh tkhtkhtgca khtkhtgttt tckhtkhtta cgcckhtkht aatkhtkhtk htkhtkhtkh ttggacctac gatgat
199 | DNA | 99 | acgaccgaag cagtgkhtkh tkhtkhtgca khtkhtgttt tckhtkhtta cgcckhtkht aatkhtkhtk htkhtkhtkh tkhttggacc tacgatgat
200 | DNA | 84 | ggtgaaacca cgacckhtkh tkhtkhtkht khtkhtgcak htkhtkhttt ckhtkhtkht gcckhtkhta atggcgtgga tggt
201 | DNA | 87 | ggtgaaacca cgacckhtkh tkhtkhtkht khtkhtkhtg cakhtkhtkh tttckhtkht khtgcckhtk htaatggcgt ggatggt
202 | DNA | 90 | ggtgaaacca cgacckhtkh tkhtkhtkht khtkhtkhtk htgcakhtkh tkhtttckht khtkhtgcck htkhtaatgg cgtggatggt
203 | DNA | 45 | gatgataaag gcggtagckh tkhtkhttac aaactgattc tgaac
204 | DNA | 57 | aaaggtgaaa ccacgacckh tkhtkhtkht khtkhtkhtg cagaaaaagt tttcaaa
205 | DNA | 60 | aaaggtgaaa ccacgacckh tkhtkhtkht khtkhtkhtk htgcagaaaa agttttcaaa
206 | DNA | 63 | aaaggtgaaa ccacgacckh tkhtkhtkht khtkhtkhtk htkhtgcaga aaaagttttc aaa
207 | DNA | 51 | gatggtgaat ggacctackh tkhtkhtkht khtaccttca cggttaccga a
208 | DNA | 54 | gatggtgaat ggacctackh tkhtkhtkht khtkhtacct tcacggttac cgaa
209 | DNA | 57 | gatggtgaat ggacctackh tkhtkhtkht khtkhtkhta ccttcacggt taccgaa
210 | DNA | 54 |
SequenceSynthetic polynucleotide 210acgtacaaac tgattctgkh
tkhtkhtkht khtkhtggtg aaaccacgac cgaa 5421157DNAArtificial
SequenceSynthetic polynucleotide 211acgtacaaac tgattctgkh
tkhtkhtkht khtkhtkhtg gtgaaaccac gaccgaa 5721260DNAArtificial
SequenceSynthetic polynucleotide 212acgtacaaac tgattctgkh
tkhtkhtkht khtkhtkhtk htggtgaaac cacgaccgaa 6021354DNAArtificial
SequenceSynthetic polynucleotide 213aaacagtacg ccaacgatkh
tkhtkhtkht khtkhttgga cctacgatga tgcg 5421457DNAArtificial
SequenceSynthetic polynucleotide 214aaacagtacg ccaacgatkh
tkhtkhtkht khtkhtkhtt ggacctacga tgatgcg 5721560DNAArtificial
SequenceSynthetic polynucleotide 215aaacagtacg ccaacgatkh
tkhtkhtkht khtkhtkhtk httggaccta cgatgatgcg 6021645DNAArtificial
SequenceSynthetic polynucleotide 216acgaaaacct tcacggttkh
tkhtkhtggc ggttctgaca aaact 4521778DNAArtificial SequenceSynthetic
polynucleotide 217aaaggcggta gcacgtackh tctgkhtctg khtkhtkhtk
htkhtkhtkh tkhtacckht 60accgaagcag tggatgca 7821881DNAArtificial
SequenceSynthetic polynucleotide 218aaaggcggta gcacgtackh
tctgkhtctg khtkhtkhtk htkhtkhtkh tkhtkhtacc 60khtaccgaag cagtggatgc
a 8121984DNAArtificial SequenceSynthetic polynucleotide
219aaaggcggta gcacgtackh tctgkhtctg khtkhtkhtk htkhtkhtkh
tkhtkhtkht 60acckhtaccg aagcagtgga tgca 8422051DNAArtificial
SequenceSynthetic polynucleotide 220gatgcgacga aaaccttckh
tgttkhtkht khtggcggtt ctgacaaaac t 5122160DNAArtificial
SequenceSynthetic polynucleotide 221gatgataaag gcggtagckh
tkhttackht ctgkhtctgk htggcaaaac cctgaaaggt 6022278DNAArtificial
SequenceSynthetic polynucleotide 222gataatggcg tggatggtkh
ttggkhttac khtkhtkhtk htkhtkhttt ckhtgttkht 60gaaggcggtt ctgacaaa
7822381DNAArtificial SequenceSynthetic polynucleotide 223gataatggcg
tggatggtkh ttggkhttac khtkhtkhtk htkhtkhtkh tttckhtgtt 60khtgaaggcg
gttctgacaa a 8122484DNAArtificial SequenceSynthetic polynucleotide
224gataatggcg tggatggtkh ttggkhttac khtkhtkhtk htkhtkhtkh
tkhtttckht 60gttkhtgaag gcggttctga caaa 84225201DNAArtificial
SequenceSynthetic polynucleotide 225gatgataaag gcggtagcac
gtacaaactg attctgaacg gcaaaaccct gaaaggtgaa 60accacgaccg aagcagtgga
tgcagcaacg gcagaaaaag ttttcaaaca gtacgccaac 120gataatggcg
tggatggtga atggacctac gatgatgcga cgaaaacctt cacggttacc
180gaaggcggtt ctgacaaaac t 20122667PRTArtificial SequenceSynthetic
polypeptide 226Asp Asp Lys Gly Gly Ser Thr Tyr Lys Leu Ile Leu Asn
Gly Lys Thr1 5 10 15Leu Lys Gly Glu Thr Thr Thr Glu Ala Val Asp Ala
Ala Thr Ala Glu 20 25 30Lys Val Phe Lys Gln Tyr Ala Asn Asp Asn Gly
Val Asp Gly Glu Trp 35 40 45Thr Tyr Asp Asp Ala Thr Lys Thr Phe Thr
Val Thr Glu Gly Gly Ser 50 55 60Asp Lys Thr6522755PRTArtificial
SequenceSynthetic polypeptide 227Ser Tyr Lys Leu Val Ile Lys Gly
Ala Thr Phe Ser Gly Glu Thr Ala1 5 10 15Thr Lys Ala Val Asp Ala Ala
Val Ala Glu Gln Thr Phe Arg Asp Tyr 20 25 30Ala Asn Lys Asn Gly Val
Asp Gly Val Trp Ala Tyr Asp Ala Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5522855PRTArtificial SequenceSynthetic polypeptide
228Thr Tyr Arg Leu Val Ile Lys Gly Val Thr Phe Ser Gly Glu Thr Ala1
5 10 15Thr Lys Ala Val Asp Ala Ala Thr Ala Glu Gln Thr Phe Arg Gln
Tyr 20 25 30Ala Asn Asp Asn Gly Ile Thr Gly Glu Trp Ala Tyr Asp Thr
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5522955PRTArtificial
SequenceSynthetic polypeptide 229Ser Tyr Lys Leu Val Ile Lys Gly
Ala Thr Phe Ser Gly Glu Thr Ala1 5 10 15Thr Lys Ala Val Asp Ala Ala
Val Ala Glu Gln Thr Phe Arg Asp Tyr 20 25 30Ala Asn Lys Asn Gly Val
Asp Gly Val Trp Ala Tyr Asp Ala Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5523055PRTArtificial SequenceSynthetic polypeptide
230Thr Tyr Arg Leu Val Ile Lys Gly Val Thr Phe Ser Gly Glu Thr Ala1
5 10 15Thr Lys Ala Val Asp Ala Ala Thr Ala Glu Gln Thr Phe Arg Gln
Tyr 20 25 30Ala Asn Asp Asn Gly Val Thr Gly Glu Trp Ala Tyr Asp Ala
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5523155PRTArtificial
SequenceSynthetic polypeptide 231Ser Tyr Lys Leu Val Ile Lys Gly
Ala Thr Phe Ser Gly Glu Thr Ala1 5 10 15Thr Lys Ala Val Asp Ala Ala
Val Ala Glu Gln Thr Phe Arg Asp Tyr 20 25 30Ala Asn Lys Asn Gly Val
Asp Gly Val Trp Ala Tyr Asp Ala Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5523255PRTArtificial SequenceSynthetic polypeptide
232Thr Tyr Arg Leu Val Ile Lys Gly Val Thr Phe Ser Gly Glu Thr Ser1
5 10 15Thr Lys Ala Val Asp Ala Ala Thr Ala Glu Gln Thr Phe Arg Gln
Tyr 20 25 30Ala Asn Asp Asn Gly Val Thr Gly Glu Trp Ala Tyr Asp Ala
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5523321PRTArtificial
SequenceSynthetic polypeptide 233Ala Asn Ile Pro Ala Glu Lys Ala
Phe Arg Gln Tyr Ala Asn Asp Asn1 5 10 15Gly Val Asp Gly Val
2023455PRTArtificial SequenceSynthetic polypeptide 234Thr Tyr Lys
Leu Ile Leu Asn Gly Lys Thr Leu Lys Gly Glu Thr Thr1 5 10 15Thr Glu
Ala Val Asp Ala Ala Thr Ala Glu Lys Val Phe Lys Gln Tyr 20 25 30Ala
Asn Asp Asn Gly Val Asp Gly Glu Trp Thr Tyr Asp Asp Ala Thr 35 40
45Lys Thr Phe Thr Val Thr Glu 50 5523555PRTArtificial
SequenceSynthetic polypeptide 235Thr Tyr Lys Leu Val Ile Asn Gly
Lys Thr Leu Lys Gly Glu Thr Thr1 5 10 15Thr Lys Ala Val Asp Ala Glu
Thr Ala Glu Lys Ala Phe Lys Gln Tyr 20 25 30Ala Asn Asp Asn Gly Val
Asp Gly Val Trp Thr Tyr Asp Asp Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5523655PRTArtificial SequenceSynthetic polypeptide
236Thr Tyr Lys Leu Val Ile Asn Gly Lys Thr Leu Lys Gly Glu Thr Thr1
5 10 15Thr Lys Thr Val Asp Ala Glu Thr Ala Glu Lys Ala Phe Lys Gln
Tyr 20 25 30Ala Asn Asp Asn Gly Val Asp Gly Val Trp Thr Tyr Asp Asp
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5523755PRTArtificial
SequenceSynthetic polypeptide 237Thr Tyr Lys Leu Val Ile Asn Gly
Lys Thr Leu Lys Gly Glu Thr Thr1 5 10 15Thr Lys Ala Val Asp Ala Glu
Thr Ala Glu Lys Ala Phe Lys Gln Tyr 20 25 30Ala Asn Glu Asn Gly Val
Asp Gly Val Trp Thr Tyr Asp Asp Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5523855PRTArtificial SequenceSynthetic polypeptide
238Thr Tyr Lys Leu Val Val Lys Gly Asn Thr Phe Ser Gly Glu Thr Thr1
5 10 15Thr Lys Ala Ile Asp Thr Ala Thr Ala Glu Lys Glu Phe Lys Gln
Tyr 20 25 30Ala Thr Ala Asn Asn Val Asp Gly Glu Trp Ser Tyr Asp Asp
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5523955PRTArtificial
SequenceSynthetic polypeptide 239Thr Tyr Lys Leu Ile Val Lys Gly
Asn Thr Phe Ser Gly Glu Thr Thr1 5 10 15Thr Lys Ala Val Asp Ala Glu
Thr Ala Glu Lys Ala Phe Lys Gln Tyr 20 25 30Ala Thr Ala Asn Asn Val
Asp Gly Glu Trp Ser Tyr Asp Asp Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5524055PRTArtificial SequenceSynthetic polypeptide
240Thr Tyr Lys Leu Ile Val Lys Gly Asn Thr Phe Ser Gly Glu Thr Thr1
5 10 15Thr Lys Ala Ile Asp Ala Ala Thr Ala Glu Lys Glu Phe Lys Gln
Tyr 20 25 30Ala Thr Ala Asn Gly Val Asp Gly Glu Trp Ser Tyr Asp Asp
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5524155PRTArtificial
SequenceSynthetic polypeptide 241Thr Tyr Lys Leu Ile Val Lys Gly
Asn Thr Phe Ser Gly Glu Thr Thr1 5 10 15Thr Lys Ala Val Asp Ala Glu
Thr Ala Glu Lys Ala Phe Lys Gln Tyr 20 25 30Ala Asn Glu Asn Gly Val
Tyr Gly Glu Trp Ser Tyr Asp Asp Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5524255PRTArtificial SequenceSynthetic polypeptide
242Thr Tyr Lys Leu Val Ile Asn Gly Lys Thr Leu Lys Gly Glu Thr Thr1
5 10 15Thr Lys Ala Val Asp Ala Glu Thr Ala Glu Lys Ala Phe Lys Gln
Tyr 20 25 30Ala Asn Glu Asn Gly Val Asp Gly Val Trp Thr Tyr Asp Asp
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5524345PRTArtificial
SequenceSynthetic polypeptide 243Met Lys Gly Glu Thr Thr Thr Glu
Ala Val Asp Ala Ala Thr Ala Glu1 5 10 15Lys Val Phe Lys Gln Tyr Ala
Asn Asp Asn Gly Val Asp Gly Glu Trp 20 25 30Thr Tyr Asp Asp Ala Thr
Lys Thr Phe Thr Val Thr Glu 35 40 4524455PRTArtificial
SequenceSynthetic polypeptide 244Thr Tyr Lys Leu Val Ile Asn Gly
Lys Thr Leu Lys Gly Glu Thr Thr1 5 10 15Thr Lys Ala Val Asp Ala Glu
Thr Ala Glu Lys Ala Phe Lys Gln Tyr 20 25 30Ala Asn Asp Asn Gly Val
Asp Gly Val Trp Thr Tyr Asp Asp Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5524555PRTArtificial SequenceSynthetic polypeptide
245Thr Tyr Lys Leu Val Ile Asn Gly Lys Thr Leu Lys Gly Glu Thr Thr1
5 10 15Thr Glu Ala Val Asp Ala Ala Thr Ala Glu Lys Val Phe Lys Gln
Tyr 20 25 30Ala Asn Asp Asn Gly Val Asp Gly Glu Trp Thr Tyr Asp Asp
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5524653PRTArtificial
SequenceSynthetic polypeptide 246Thr Tyr Lys Leu Ile Leu Asn Gly
Lys Thr Leu Lys Gly Glu Thr Thr1 5 10 15Thr Glu Ala Val Asp Ala Ala
Thr Ala Arg Ser Phe Asn Phe Pro Ile 20 25 30Leu Glu Asn Ser Ser Ser
Val Pro Gly Asp Pro Leu Glu Ser Thr Cys 35 40 45Met His Val Glu His
5024756PRTArtificial SequenceSynthetic polypeptide 247Thr Tyr Lys
Leu Ile Leu Asn Gly Lys Thr Leu Lys Gly Glu Thr Thr1 5 10 15Thr Glu
Ala Val Asp Ala Ala Thr Ala Arg Ser Phe Asn Phe Pro Ile 20 25 30Leu
Glu Asn Ser Ser Ser Val Pro Gly Asp Pro Leu Glu Ser Thr Cys 35 40
45Arg His Ala Ser Phe
Ala Gln Ala 50 5524855PRTArtificial SequenceSynthetic polypeptide
248Ser Tyr Lys Leu Val Ile Lys Gly Ala Thr Phe Ser Gly Glu Thr Ala1
5 10 15Thr Lys Ala Val Asp Ala Ala Val Ala Glu Gln Thr Phe Arg Asp
Tyr 20 25 30Ala Asn Lys Asn Gly Val Asp Gly Val Trp Ala Tyr Asp Ala
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5524955PRTArtificial
SequenceSynthetic polypeptide 249Thr Tyr Arg Leu Val Ile Lys Gly
Val Thr Phe Ser Gly Glu Thr Ala1 5 10 15Thr Lys Ala Val Asp Ala Ala
Thr Ala Glu Gln Ala Phe Arg Gln Tyr 20 25 30Ala Asn Asp Asn Gly Val
Thr Gly Glu Trp Ala Tyr Asp Ala Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5525055PRTArtificial SequenceSynthetic polypeptide
250Ser Tyr Lys Leu Val Ile Lys Gly Ala Thr Phe Ser Gly Glu Thr Ala1
5 10 15Thr Lys Ala Val Asp Ala Ala Val Ala Glu Gln Thr Phe Arg Asp
Tyr 20 25 30Ala Asn Lys Asn Gly Val Asp Gly Val Trp Ala Tyr Asp Ala
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5525155PRTArtificial
SequenceSynthetic polypeptide 251Thr Tyr Arg Leu Val Ile Lys Gly
Val Thr Phe Ser Gly Glu Thr Ala1 5 10 15Thr Lys Ala Val Asp Ala Ala
Thr Ala Glu Gln Thr Phe Arg Gln Tyr 20 25 30Ala Asn Asp Asn Gly Ile
Thr Gly Glu Trp Ala Tyr Asp Thr Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5525255PRTArtificial SequenceSynthetic polypeptide
252Thr Tyr Lys Leu Val Val Lys Gly Asn Thr Phe Ser Gly Glu Thr Thr1
5 10 15Thr Lys Ala Ile Asp Thr Ala Thr Ala Glu Lys Glu Phe Lys Gln
Tyr 20 25 30Ala Thr Ala Asn Asn Val Asp Gly Glu Trp Ser Tyr Asp Asp
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5525355PRTArtificial
SequenceSynthetic polypeptide 253Thr Tyr Lys Leu Ile Val Lys Gly
Asn Thr Phe Ser Gly Glu Thr Thr1 5 10 15Thr Lys Ala Ile Asp Ala Ala
Thr Ala Glu Lys Glu Phe Lys Gln Tyr 20 25 30Ala Thr Ala Asn Asn Val
Asp Gly Glu Trp Ser Tyr Asp Tyr Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5525455PRTArtificial SequenceSynthetic polypeptide
254Thr Tyr Lys Leu Ile Val Lys Gly Asn Thr Phe Ser Gly Glu Thr Thr1
5 10 15Thr Lys Ala Ile Asp Ala Ala Thr Ala Glu Lys Glu Phe Lys Gln
Tyr 20 25 30Ala Thr Ala Asn Asn Val Asp Gly Glu Trp Ser Tyr Asp Asp
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5525555PRTArtificial
SequenceSynthetic polypeptide 255Thr Tyr Lys Leu Ile Val Lys Gly
Asn Thr Phe Ser Gly Glu Thr Thr1 5 10 15Thr Lys Ala Val Asp Ala Glu
Thr Ala Glu Lys Ala Phe Lys Gln Tyr 20 25 30Ala Thr Ala Asn Asn Val
Asp Gly Glu Trp Ser Tyr Asp Asp Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5525655PRTArtificial SequenceSynthetic polypeptide
256Thr Tyr Lys Leu Val Ile Asn Gly Lys Thr Leu Lys Gly Glu Thr Thr1
5 10 15Thr Lys Ala Val Asp Val Glu Thr Ala Glu Lys Ala Phe Lys Gln
Tyr 20 25 30Ala Asn Glu Asn Gly Val Asp Gly Val Trp Thr Tyr Asp Asp
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5525755PRTArtificial
SequenceSynthetic polypeptide 257Thr Tyr Lys Leu Ile Leu Asn Gly
Lys Thr Leu Lys Gly Glu Thr Thr1 5 10 15Thr Glu Ala Val Asp Ala Ala
Thr Ala Glu Lys Val Phe Lys Gln Tyr 20 25 30Ala Asn Asp Asn Gly Val
Asp Gly Glu Trp Thr Tyr Asp Asp Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5525855PRTArtificial SequenceSynthetic polypeptide
258Thr Tyr Lys Leu Val Ile Asn Gly Lys Thr Leu Lys Gly Glu Thr Thr1
5 10 15Thr Lys Ala Val Asp Ala Glu Thr Ala Glu Lys Ala Phe Lys Gln
Tyr 20 25 30Ala Asn Asp Asn Gly Val Asp Gly Val Trp Thr Tyr Asp Asp
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5525955PRTArtificial
SequenceSynthetic polypeptide 259Thr Tyr Lys Leu Ile Leu Asn Gly
Lys Thr Leu Lys Gly Glu Thr Thr1 5 10 15Thr Glu Ala Val Asp Ala Ala
Thr Ala Glu Lys Val Phe Lys Gln Tyr 20 25 30Ala Asn Asp Asn Gly Val
Asp Gly Glu Trp Thr Tyr Asp Asp Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5526055PRTArtificial SequenceSynthetic polypeptide
260Thr Tyr Lys Leu Val Ile Asn Gly Lys Thr Leu Lys Gly Glu Thr Thr1
5 10 15Thr Glu Ala Val Asp Ala Ala Thr Ala Glu Lys Val Phe Lys Gln
Tyr 20 25 30Ala Asn Asp Asn Gly Val Asp Gly Glu Trp Thr Tyr Asp Asp
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5526155PRTArtificial
SequenceSynthetic polypeptide 261Thr Tyr Lys Leu Val Ile Asn Gly
Lys Thr Leu Lys Gly Glu Thr Thr1 5 10 15Thr Lys Ala Val Asp Ala Glu
Thr Ala Glu Lys Ala Phe Lys Gln Tyr 20 25 30Ala Asn Asp Asn Gly Val
Asp Gly Val Trp Thr Tyr Asp Asp Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5526255PRTArtificial SequenceSynthetic polypeptide
262Xaa Tyr Xaa Leu Xaa Xaa Xaa Gly Xaa Thr Xaa Xaa Gly Glu Thr Xaa1
5 10 15Thr Xaa Xaa Xaa Asp Xaa Xaa Xaa Ala Glu Xaa Xaa Phe Xaa Xaa
Tyr 20 25 30Ala Xaa Xaa Asn Xaa Xaa Xaa Gly Xaa Trp Xaa Tyr Asp Xaa
Ala Thr 35 40 45Lys Thr Xaa Thr Xaa Thr Glu 50 5526355PRTArtificial
SequenceSynthetic polypeptide 263Xaa Tyr Xaa Leu Xaa Xaa Xaa Gly
Xaa Thr Xaa Gly Glu Thr Xaa Thr1 5 10 15Xaa Xaa Xaa Xaa Asp Xaa Xaa
Xaa Ala Glu Xaa Xaa Phe Xaa Xaa Tyr 20 25 30Ala Xaa Xaa Asn Xaa Xaa
Xaa Gly Xaa Trp Xaa Tyr Asp Xaa Ala Thr 35 40 45Lys Thr Phe Thr Val
Thr Glu 50 5526455PRTArtificial SequenceSynthetic polypeptide
264Thr Tyr Lys Leu Xaa Xaa Xaa Gly Xaa Thr Xaa Xaa Gly Glu Thr Xaa1
5 10 15Thr Xaa Ala Val Asp Xaa Xaa Thr Ala Glu Xaa Xaa Phe Xaa Gln
Tyr 20 25 30Ala Xaa Xaa Asn Xaa Val Asp Gly Xaa Trp Xaa Tyr Asp Xaa
Ala Thr 35 40 45Lys Thr Phe Thr Val Thr Glu 50 5526555PRTArtificial
SequenceSynthetic polypeptide 265Thr Xaa Lys Xaa Xaa Xaa Xaa Xaa
Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa1 5 10 15Xaa Glu Xaa Val Asp Ala Xaa
Xaa Xaa Glu Lys Xaa Xaa Lys Xaa Xaa 20 25 30Xaa Asn Xaa Xaa Xaa Xaa
Xaa Gly Xaa Xaa Thr Tyr Xaa Asp Xaa Xaa 35 40 45Lys Thr Xaa Thr Xaa
Thr Glu 50 5526655PRTArtificial SequenceSynthetic polypeptide
266Thr Xaa Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa1
5 10 15Xaa Glu Xaa Val Asp Ala Xaa Xaa Xaa Glu Lys Xaa Xaa Lys Xaa
Xaa 20 25 30Xaa Asn Xaa Xaa Xaa Xaa Xaa Gly Xaa Xaa Thr Tyr Xaa Asp
Xaa Xaa 35 40 45Lys Thr Xaa Thr Xaa Thr Glu 50 552674PRTArtificial
SequenceSynthetic polypeptide 267Ser Ile Ile Leu12684PRTArtificial
SequenceSynthetic polypeptide 268Gln Arg Tyr Asp12696PRTArtificial
SequenceSynthetic polypeptide 269Lys Glu Tyr Tyr Asn Met1
52704PRTArtificial SequenceSynthetic polypeptide 270Gly Gly His
Ser12714PRTArtificial SequenceSynthetic polypeptide 271Glu Phe Phe
Ser12725PRTArtificial SequenceSynthetic polypeptide 272Gly Val Val
Leu Lys1 5
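
The listing does not annotate its degenerate positions. As a reader's aid only, the following Python sketch assumes the standard genetic code and the IUPAC nucleotide codes K (G or T) and H (A, C or T). It expands the KHT codon that recurs in the library oligonucleotides SEQ ID NOs: 197 to 224, and checks that SEQ ID NO: 225, read in frame 1, encodes SEQ ID NO: 226. The helper names (`codon_amino_acids`, `translate`, `three_to_one`) are hypothetical illustrations and are not part of the application.

```python
from itertools import product

# Standard genetic code (assumption). Codons are enumerated in TCAG order,
# first base varying slowest, matching the conventional 64-letter string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

# IUPAC nucleotide codes; only the ones appearing in this listing are included.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT", "H": "ACT"}

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def codon_amino_acids(codon: str) -> list[str]:
    """Sorted one-letter amino acids encoded by a (possibly degenerate) codon."""
    options = (IUPAC[base] for base in codon.upper())
    return sorted({CODON_TABLE["".join(c)] for c in product(*options)})


def translate(dna: str) -> str:
    """Translate a non-degenerate DNA sequence in frame 1; spaces are ignored."""
    seq = dna.replace(" ", "").upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def three_to_one(protein: str) -> str:
    """Convert a space-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[res] for res in protein.split())


# SEQ ID NO: 225 and SEQ ID NO: 226, copied from the listing above.
SEQ_225 = (
    "gatgataaag gcggtagcac gtacaaactg attctgaacg gcaaaaccct gaaaggtgaa "
    "accacgaccg aagcagtgga tgcagcaacg gcagaaaaag ttttcaaaca gtacgccaac "
    "gataatggcg tggatggtga atggacctac gatgatgcga cgaaaacctt cacggttacc "
    "gaaggcggtt ctgacaaaac t"
)
SEQ_226 = (
    "Asp Asp Lys Gly Gly Ser Thr Tyr Lys Leu Ile Leu Asn Gly Lys Thr "
    "Leu Lys Gly Glu Thr Thr Thr Glu Ala Val Asp Ala Ala Thr Ala Glu "
    "Lys Val Phe Lys Gln Tyr Ala Asn Asp Asn Gly Val Asp Gly Glu Trp "
    "Thr Tyr Asp Asp Ala Thr Lys Thr Phe Thr Val Thr Glu Gly Gly Ser "
    "Asp Lys Thr"
)

if __name__ == "__main__":
    print("KHT encodes:", codon_amino_acids("kht"))
    print("SEQ ID NO: 225 encodes SEQ ID NO: 226:",
          translate(SEQ_225) == three_to_one(SEQ_226))
```

Under these assumptions each KHT position can encode six residues (Ala, Asp, Phe, Ser, Val, Tyr) and no stop codon, which is one reading of why such a codon might be used for the randomized positions; the listing itself does not state the design rationale.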