U.S. patent application number 14/280384 was filed with the patent office on 2014-05-16 and published on 2014-10-09 for GPCR fusion protein containing an N-terminal autonomously folding stable domain, and crystals of the same.
The applicant listed for this patent is The Board of Trustees of the Leland Stanford Junior University. Invention is credited to Brian K. Kobilka, Yaozhong Zou.
Application Number | 20140303345 14/280384 |
Family ID | 46828780 |
Publication Date | 2014-10-09 |
United States Patent Application | 20140303345 |
Kind Code | A1 |
Kobilka; Brian K.; et al. | October 9, 2014 |
GPCR Fusion Protein Containing an N-Terminal Autonomously Folding
Stable Domain, and Crystals of the Same
Abstract
Certain embodiments provide a GPCR fusion protein. In particular
embodiments, the GPCR fusion protein comprises: a) a G-protein
coupled receptor (GPCR); and b) an autonomously folding stable
domain, where the autonomously folding stable domain is N-terminal
to the GPCR and is heterologous to the GPCR. The GPCR fusion
protein is characterized in that it is crystallizable under lipidic
cubic phase crystallization conditions. In certain embodiments, the
GPCR fusion protein may be crystallizable in a complex with a
G-protein or in a complex with an antibody that binds to the IC3
loop of the GPCR.
Inventors: | Kobilka; Brian K. (Palo Alto, CA); Zou; Yaozhong (Sunnyvale, CA) |
Applicant: | Name | City | State | Country | Type |
The Board of Trustees of the Leland Stanford Junior University | Palo Alto | CA | US | |
Family ID: | 46828780 |
Appl. No.: | 14/280384 |
Filed: | May 16, 2014 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
13420329 | Mar 14, 2012 | 8765414 |
14280384 | | |
61453020 | Mar 15, 2011 | |
61507425 | Jul 13, 2011 | |
Current U.S. Class: | 530/350; 435/69.7; 536/23.4; 703/11 |
Current CPC Class: | C07K 14/723 20130101; C07K 2319/00 20130101; C07K 14/705 20130101; C07K 2319/50 20130101; C07K 2319/43 20130101; G16B 15/00 20190201; C07K 14/70571 20130101; C12N 9/2462 20130101 |
Class at Publication: | 530/350; 536/23.4; 435/69.7; 703/11 |
International Class: | C07K 14/705 20060101 C07K014/705; G06F 19/16 20060101 G06F019/16 |
Government Interests
GOVERNMENT RIGHTS
[0002] This invention was made with Government support under
contract GM083118 awarded by the National Institutes of Health. The
Government has certain rights in this invention.
Claims
1. A GPCR fusion protein comprising, a) a G-protein coupled
receptor (GPCR); and, b) an autonomously folding stable domain;
wherein said autonomously folding stable domain is N-terminal to
said GPCR and is heterologous to said GPCR; and wherein said GPCR
fusion protein is characterized in that it is crystallizable under
lipidic cubic phase crystallization conditions.
2. The GPCR fusion protein of claim 1, further comprising an
epitope tag N-terminal to said autonomously folding stable
domain.
3. The GPCR fusion protein of claim 2, further comprising a
protease cleavage site between said epitope tag and said
autonomously folding stable domain.
4. The GPCR fusion protein of claim 1, wherein said autonomously
folding stable domain comprises the amino acid sequence of
lysozyme.
5. The GPCR fusion protein of claim 1, wherein said GPCR comprises
a second autonomously folding stable domain between the TM5 and TM6
regions of said GPCR.
6. The GPCR fusion protein of claim 1, wherein said GPCR is
active.
7. The GPCR fusion protein of claim 1, wherein said GPCR is
naturally occurring.
8. The GPCR fusion protein of claim 1, wherein said GPCR is
non-naturally occurring.
9. A composition of matter comprising: a) a GPCR fusion protein of
claim 1; and b) a moiety complexed with said GPCR fusion
protein.
10. The composition of claim 9, wherein said moiety complexed with
said GPCR fusion protein is a G-protein.
11. The composition of claim 9, wherein said moiety is an antibody
that is bound to the IC3 loop of said GPCR.
12. The composition of claim 9, wherein said moiety is a ligand for
said GPCR.
13. A nucleic acid encoding the fusion protein of claim 1.
14. The nucleic acid of claim 13, wherein said nucleic acid
encodes, from 5' to 3': a) a signal sequence; b) an epitope tag; c)
a protease cleavage site; d) an autonomously folding stable domain;
and e) a GPCR.
15. A crystal comprising the GPCR fusion protein of claim 1.
16. The crystal of claim 15, further comprising a G protein
complexed with said GPCR.
17-19. (canceled)
20. A method comprising: culturing the cell of claim 13 to produce
said GPCR fusion protein; and isolating said fusion protein from
said cell.
21. The method of claim 20, further comprising: crystallizing said
GPCR fusion protein to make crystals.
22. The method of claim 20, further comprising: obtaining atomic
coordinates of said fusion protein from said crystal.
23. A method for analyzing the three dimensional structure of a
GPCR on a computer system, comprising: a) accessing a file
containing atomic coordinates of a GPCR using a computer system
that comprises a modeling program, wherein said atomic coordinates
are produced by subjecting crystals of a GPCR fusion protein of
claim 1 to X-ray diffraction analysis; b) modeling said atomic
coordinates on said computer system using said modeling program to
produce a model of the three dimensional structure of at least the
ligand binding site of the GPCR; c) displaying on the computer
system a model of said ligand binding site.
Description
CROSS-REFERENCING
[0001] This application claims the benefit of U.S. provisional
application Ser. Nos. 61/453,020, filed Mar. 15, 2011 and
61/507,425, filed Jul. 13, 2011, which are incorporated by
reference in their entirety.
BACKGROUND
[0003] G protein-coupled receptor (GPCR) signaling plays a vital
role in a number of physiological contexts including, but not
limited to, metabolism, inflammation, neuronal function, and
cardiovascular function. For instance, GPCRs include receptors for
biogenic amines, e.g., dopamine, epinephrine, histamine, glutamate,
acetylcholine, and serotonin; for purines such as ADP and ATP; for
the vitamin niacin; for lipid mediators of inflammation such as
prostaglandins, lipoxins, platelet activating factor, and
leukotrienes; for peptide hormones such as calcitonin, follicle
stimulating hormone, gonadotropin releasing hormone, ghrelin,
motilin, neurokinin, and oxytocin; for non-hormone peptides such as
beta-endorphin, dynorphin A, Leu-enkephalin, and Met-enkephalin;
for the non-peptide hormone melatonin; for polypeptides such as C5a
anaphylatoxin and chemokines; for proteases such as thrombin,
trypsin, and factor Xa; and for sensory signal mediators, e.g.,
retinal photopigments and olfactory stimulatory molecules. GPCRs
are of immense interest for drug development.
SUMMARY
[0004] A GPCR fusion protein is provided. In certain embodiments,
the GPCR fusion protein comprises: a) a G-protein coupled receptor
(GPCR); and b) an autonomously folding stable domain, where the
autonomously folding stable domain is N-terminal to the GPCR and is
heterologous to the GPCR. The GPCR fusion protein is characterized
in that it is crystallizable under lipidic cubic phase crystallization
conditions. In certain embodiments, the GPCR fusion protein may be
crystallizable in a complex with a G-protein or in a complex with
an antibody that binds to the IC3 loop of the GPCR.
[0005] In particular embodiments, the GPCR fusion protein may
further comprise an epitope tag N-terminal to the autonomously
folding stable domain. In some cases, the GPCR fusion protein may
further comprise a protease cleavage site between the epitope tag
and the autonomously folding stable domain, thereby allowing the
epitope tag to be cleaved off.
[0006] In particular embodiments, the autonomously folding stable
domain may comprise the amino acid sequence of lysozyme. In some
cases, the GPCR fusion protein may also comprise a second
autonomously folding stable domain between the TM5 and TM6 regions
of the GPCR (i.e., in the IC3 loop of the GPCR).
[0007] In certain embodiments, the GPCR of the fusion protein may
be active. The GPCR of the fusion protein may be naturally
occurring or non-naturally occurring.
[0008] Also provided is a composition of matter comprising: a) a
subject GPCR fusion protein; and b) a moiety complexed with the
GPCR fusion protein. The moiety complexed with the GPCR fusion
protein may be, for example, a G-protein or an antibody that is
bound to the IC3 loop of the GPCR. The moiety may also be a ligand
for the GPCR.
[0009] A nucleic acid encoding the subject GPCR fusion protein is
also provided. In particular embodiments, the nucleic acid may
encode, from 5' to 3': a) a signal sequence; b) an epitope tag; c)
a protease cleavage site; d) an autonomously folding stable domain;
and e) a GPCR. Also provided is a cell containing the nucleic acid.
In particular cases, the fusion protein may be expressed in the
cell, and disposed on the plasma membrane of the cell.
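The element order described above can be sketched in code. The following is a minimal illustration at the protein level, not the applicants' construct: the `Segment` class, the `ORDER` tuple, and both helper functions are hypothetical names introduced here. The FLAG tag (DYKDDDDK) and TEV recognition site (ENLYFQG) are the commonly cited sequences for those elements, while the signal sequence, stable domain, and GPCR entries are placeholder strings of `X` residues rather than real sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One element of the encoded fusion protein (hypothetical helper type)."""
    name: str
    sequence: str  # one-letter amino acid codes


# N- to C-terminal order, mirroring the 5'-to-3' order given in the text.
ORDER = ("signal_sequence", "epitope_tag", "protease_site", "stable_domain", "gpcr")


def assemble(parts):
    """Concatenate the elements in the required order."""
    missing = [name for name in ORDER if name not in parts]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    return "".join(parts[name].sequence for name in ORDER)


def element_spans(parts):
    """Return the half-open (start, end) residue span of each element."""
    spans = {}
    start = 0
    for name in ORDER:
        end = start + len(parts[name].sequence)
        spans[name] = (start, end)
        start = end
    return spans


# Illustrative input only: "X" runs are placeholders, not real sequences.
EXAMPLE_PARTS = {
    "signal_sequence": Segment("signal sequence (placeholder)", "M" + "X" * 15),
    "epitope_tag": Segment("FLAG tag", "DYKDDDDK"),
    "protease_site": Segment("TEV recognition site", "ENLYFQG"),
    "stable_domain": Segment("stable domain (placeholder)", "X" * 30),
    "gpcr": Segment("GPCR (placeholder)", "X" * 60),
}

if __name__ == "__main__":
    print(assemble(EXAMPLE_PARTS))
    print(element_spans(EXAMPLE_PARTS))
```

Keeping the order in a single tuple means the stable domain always lands immediately N-terminal to the GPCR, and the protease site always sits between the tag and the stable domain.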
[0010] Also provided is a crystal comprising a crystalline form of
the subject GPCR fusion protein. The crystal may further contain,
for example, a G protein complexed with the GPCR fusion protein, a
ligand for the GPCR, or an antibody that is bound to the IC3 loop
of the GPCR. In particular embodiments, the crystallized GPCR
fusion protein may comprise a second autonomously folding stable
domain between the TM5 and TM6 regions of the GPCR.
[0011] Also provided is a method for producing the subject fusion
protein. In some embodiments, this method may involve culturing the
above-described cell to produce the GPCR fusion protein; and
isolating the GPCR fusion protein from the cell. The method may further
comprise crystallizing the GPCR fusion protein to make crystals,
e.g., using a bicelle crystallization method or a lipidic cubic
phase crystallization method. Prior to crystallization, the
isolated GPCR fusion protein may be combined with a moiety to which
it complexes, e.g., the G protein to which it couples, a ligand or
an antibody, for example, to produce a complex. This method may
further comprise obtaining atomic coordinates of the GPCR fusion
protein from said crystal.
[0012] A method of determining a crystal structure is also
provided. In certain cases this method comprises: receiving a
subject GPCR fusion protein, crystallizing the fusion protein to
produce a crystal; and obtaining atomic coordinates of the fusion
protein from the crystal. Other embodiments include forwarding a
subject GPCR fusion protein to a remote location, and receiving
atomic coordinates for said GPCR fusion protein.
[0013] In particular embodiments, a composition comprising a fusion
protein in crystalline form is provided in which the fusion protein
comprises: a) a G-protein coupled receptor (GPCR); and b) a
lysozyme domain, where the lysozyme domain is N-terminal to the
GPCR.
[0014] In particular embodiments, the GPCR may comprise the amino
acid sequence of a naturally occurring GPCR. In other embodiments,
the GPCR may comprise the amino acid sequence of a non-naturally
occurring GPCR.
[0015] The domain, in certain cases, may comprise an amino acid
sequence having at least 80% identity to the amino acid sequence of
a wild-type lysozyme. For example, in certain cases, the domain may
comprise an amino acid sequence that is at least 95% identical to
the amino acid sequence of T4 lysozyme.
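As a rough illustration of the identity thresholds mentioned above, the sketch below computes percent identity for two sequences that have already been aligned. It assumes one common convention (identical residues divided by aligned, gap-free columns); this passage does not say which convention or alignment tool applies, and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over gap-free columns of a pairwise alignment.

    Both inputs must be pre-aligned strings of equal length, with "-"
    marking gaps. This is one convention among several in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, threshold):
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two aligned ten-residue sequences that differ at one position are 90% identical under this convention, which satisfies an 80% threshold but not a 95% threshold.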
[0016] In particular embodiments, the GPCR may be a family A GPCR,
a family B GPCR or a family C GPCR. In particular embodiments, the
GPCR may be a receptor for a biogenic amine, a dopamine receptor, a
serotonin receptor, an adrenergic receptor, a β2-adrenergic
receptor, a melanocortin receptor subtype 4, a ghrelin receptor, a
metabotropic glutamate receptor or a chemokine receptor. The
crystallized GPCR fusion protein may comprise a second autonomously
folding stable domain (e.g., another lysozyme domain) between the
TM5 and TM6 regions of the GPCR.
[0017] In some embodiments, the fusion protein is bound to a ligand
for the GPCR. In particular embodiments, the fusion protein may be
co-crystallized with a G protein to which the GPCR couples (which
may be composed of the Gα, β and γ subunits) or an
antibody that binds the IC3 loop of the GPCR, for example.
[0018] In particular cases, a GPCR-G-protein complex may be
crystallized in conjunction with an antibody that stabilizes the
G-protein in the same way as the nanobody described below. Such an
antibody may be from any species and, in certain cases, may be a
single chain antibody.
BRIEF DESCRIPTION OF THE FIGURES
[0019] FIG. 1 is a schematic illustration of a GPCR, showing the
canonical transmembrane regions (TM1, TM2, TM3, TM4, TM5, TM6, and
TM7), intracellular regions (IC1, IC2, and IC3), and extracellular
regions (EC1, EC2, and EC3).
[0020] FIG. 2 is a schematic illustration of a subject fusion
protein, showing an autonomously folding stable domain that is
N-terminal to a GPCR.
[0021] FIG. 3 is a schematic illustration of the fusion protein
encoded by a subject nucleic acid. The encoded fusion protein
contains an autonomously folding stable domain that is N-terminal
to a GPCR. The protein further contains a signal sequence, an
epitope tag and a protease cleavage site.
[0022] FIG. 4 shows exemplary sequences that may be employed in
place of the lysozyme sequences of FIG. 5. From top to bottom, SEQ
ID NOS: 2-6.
[0023] FIG. 5 shows the amino acid sequence of an exemplary fusion
protein (SEQ ID NO:1). The HA signal peptide is shown in unbolded
italic letters; the FLAG epitope tag is shown in underlined
letters; the TEV recognition sequence is marked with non-underlined
bold letters and the cleavage site is marked with an asterisk. The full
length T4L is shown by bold underlined letters and the
β2AR sequence from Asp29 to Gly365 is shown by bold,
underlined, italicized letters.
[0024] FIG. 6 shows the amino acid sequences of further exemplary
fusion proteins. SEQ ID NOS: 7-13. The HA signal peptide is shown
in unbolded italic letters; the FLAG epitope tag is shown in
underlined letters; the TEV recognition sequence is marked with
non-underlined bold letters and the cleavage site is marked with an
asterisk. The full length T4L is shown by bold underlined letters
and the GPCR sequence is shown by bold, underlined, italicized
letters.
[0025] FIG. 7. G protein cycle for the β2AR-Gs complex.
a, Extracellular agonist binding to the β2AR leads to
conformational rearrangements of the cytoplasmic ends of
transmembrane segments that enable the Gs heterotrimer
(α, β, and γ) to bind the receptor. GDP is
released from the α subunit upon formation of the R:G complex.
GTP then binds to the nucleotide-free α subunit, resulting in
dissociation of the α and βγ subunits from the
receptor. The subunits regulate their respective effector proteins
adenylyl cyclase (AC) and Ca2+ channels. The Gs
heterotrimer reassembles from α and βγ subunits
following hydrolysis of GTP to GDP in the α subunit. b, The
purified nucleotide-free β2AR-Gs protein complex
maintained in detergent micelles. The Gsα subunit consists of
two domains, the Ras domain (αRas) and the α-helical
domain (αAH). Both are involved in nucleotide binding. In the
nucleotide-free state, the αAH domain has a variable position
relative to the αRas domain.
[0026] FIG. 8. Overall structure of the β2AR-Gs complex.
a, Lattice packing of the complex shows alternating layers of
receptor and G protein within the crystal. Abundant contacts are
formed among proteins within the aqueous layers. b, The overall
structure of the asymmetric unit contents shows the β2AR
bound to an agonist (spheres) and engaged in extensive interactions
with Gsα. Gαs together with Gβ and Gγ
constitute the heterotrimeric G protein Gs. A Gs binding nanobody
binds the G protein between the α and β subunits. The
nanobody (Nb35) facilitates crystallization, as does T4 lysozyme
fused to the amino terminus of the β2AR. c, The
biological complex omitting crystallization aids, showing its
location and orientation within a cell membrane.
[0027] FIG. 9. Comparison of active and inactive β2AR
structures. a, Side and cytoplasmic views of the β2AR-Gs
structure compared to the inactive carazolol-bound β2AR
structure (blue). Significant structural changes are seen for the
intracellular domain of TM5 and TM6. TM5 is extended by two helical
turns while TM6 is moved outward by 14 Å as measured at the
α-carbons of Glu268 in the two structures. b,
β2AR-Gs compared with the nanobody-stabilized active
state β2AR-Nb80 structure (ref. 12). c and d, The positions
of residues in the E/DRY and NPxxY motifs and other key residues of
the β2AR-Gs and β2AR-Nb80 structures as seen
from the cytoplasmic side. All residues occupy very similar
positions except Arg131 which in the β2AR-Nb80 structure
interacts with the nanobody.
[0028] FIG. 10. Receptor-G protein interactions. a, b, The
α5-helix of Gαs docks into a cavity formed on the
intracellular side of the receptor by the opening of transmembrane
helices 5 and 6. a, Within the transmembrane core, the interactions
are primarily non-polar. An exception involves packing of Tyr391 of
the α5-helix against Arg131 of the conserved DRY sequence in
TM3 (see also FIG. 15). Arg131 also packs against Tyr of the
conserved NPxxY sequence in TM7. b, As the α5-helix exits the
receptor it forms a network of polar interactions with TM5 and TM3.
c, Receptor residues Thr68 and Asp130 interact with the IL2 helix
of the β2AR via Tyr141, positioning the helix so that
Phe139 of the receptor docks into a hydrophobic pocket on the G
protein surface, thereby structurally linking receptor-G protein
interactions with the highly conserved DRY motif of the
β2AR.
[0029] FIG. 11. Conformational changes in Gαs. a, A
comparison of Gs in the β2AR-Gs complex with the
GTPγS-bound Gαs (PDB ID: 1AZT). GTPγS is shown as
spheres. The helical domain of Gαs (GαsAH) exhibits a
dramatic displacement relative to its position in the
GTPγS-bound state. b, The α5-helix of Gαs is
rotated and displaced toward the β2AR, perturbing the
β6-α5 loop which otherwise forms part of the GTPγS
binding pocket. c, The β1-α1 loop (P-loop) and
β6-α5 loop of Gαs interact with the phosphates and
purine ring, respectively, of GTPγS in the GTPγS-Gs
structure. d, The β1-α1 and β6-α5 loops are
rearranged in the nucleotide-free β2AR-Gs structure.
[0030] FIG. 12. Proposed model for structural changes causing GDP
release from the R:G complex. a, Alignment of the TM segments of
β2AR in the β2AR-Gs structure and metarhodopsin
II (ref. 24; PDB ID: 3PQR) (purple) bound with the C-terminal peptide
of transducin (blue). b, The C-terminal end of the GsRas domain from
the GTPγS-bound Gs structure (ref. 22; PDB ID: 1AZT) is
aligned with the C-terminal peptide of transducin. The C-terminal
end of the α5-helix was moved away from the rest of the GsRas
domain to avoid clashes with the β2AR. c, Cartoon of the
β2AR-Gs peptide fusion construct used in the binding
experiments (d). d, Competition binding experiments between
[3H]-DHA and full agonist isoproterenol. Top panel shows
binding data (reproduced from Rasmussen et al., 2011) on
β2AR reconstituted in HDL particles with and without Gs
heterotrimer. The fraction of β2AR in the Ki high
state for the β2AR with Gs is 0.55. Bottom panel shows
binding to β2AR and a β2AR-Gαs peptide
fusion expressed in Sf9 cell membranes. The fraction of
β2AR in the Ki high state for the
β2AR-Gαs peptide fusion is 0.68. e, Same view as
(b) but with metarhodopsin II structure and the C-terminal peptide
removed. f, Comparison of GsRas domains of the transducin peptide
aligned GTPγS-bound Gs structure and the nucleotide-free Gs
heterotrimer of the β2AR-Gs complex.
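The "fraction in the Ki high state" reported for panel d can be read against a standard two-site competition binding model. The sketch below is illustrative only and is not the fitting procedure used for the figure: the function name is hypothetical, the two Ki values are arbitrary placeholders, and the formula assumes a radioligand concentration far below its Kd, so that each site's IC50 approximates its Ki. Only the value 0.55 is taken from the text.

```python
def fraction_bound(competitor_conc, ki_high, ki_low, fraction_high):
    """Normalized radioligand binding under a two-site competition model.

    competitor_conc, ki_high and ki_low share the same concentration unit.
    fraction_high is the fraction of receptor in the high-affinity state.
    Assumes radioligand concentration << its Kd (IC50 approximately Ki).
    """
    if not 0.0 <= fraction_high <= 1.0:
        raise ValueError("fraction_high must lie between 0 and 1")
    if ki_high <= 0 or ki_low <= 0:
        raise ValueError("Ki values must be positive")
    high = fraction_high / (1.0 + competitor_conc / ki_high)
    low = (1.0 - fraction_high) / (1.0 + competitor_conc / ki_low)
    return high + low


# Placeholder Ki values (molar), chosen only to make the two phases visible.
KI_HIGH = 1e-9
KI_LOW = 1e-7
FRACTION_HIGH = 0.55  # value reported in the text for beta2AR with Gs

if __name__ == "__main__":
    for conc in (1e-11, 1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5):
        bound = fraction_bound(conc, KI_HIGH, KI_LOW, FRACTION_HIGH)
        print(f"{conc:.0e} M competitor -> {bound:.3f} of control binding")
```

A larger `fraction_high` shifts more of the displacement curve toward low agonist concentrations, which is how a value of 0.68 for the peptide fusion differs from 0.55 for receptor with Gs under this model.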
[0031] FIG. 13. Effect of nucleotide analogs, pH, and nanobodies on
the stability of the R:G complex. a) Analytical gel filtration
showing that nucleotides GDP and GTPγS (0.1 mM) cause
dissociation of the R:G complex. b) The phosphates pyrophosphate
and foscarnet (used at 5 mM) resemble the nucleotide phosphate
groups, but do not cause disruption of the complex. When used as
additives they improved crystal growth of the T4L-β2AR:Gs
complex (without nanobodies), T4L-β2AR:Gs:Nb37, and
T4L-β2AR:Gs:Nb35. c) The pH limit was determined to guide the
preparation of crystallization screens. For the same purpose the
effect of ionic strength (data not shown) was determined using NaCl
at various concentrations. The complex is stable in 20, 100, and
500 mM but dissociates at 2.5 M NaCl. d) Nanobody 35 (Nb35, broken
line) binds to the R:G complex (solid line) to form the R:G:Nb35
complex (red solid line) which is insensitive to GTPγS
treatment (solid line) in contrast to the treated R:G complex alone
(broken line). Nb35 and Nb37 bind separate epitopes on the Gs
heterotrimer to form an R:G:Nb35:Nb37 complex (solid line). Nb37
binding also prevents GTPγS from dissociating the R:G
complex (data not shown).
[0032] FIG. 14. Crystals of the T4L-β2AR:Gs:Nb35 complex in
sponge-like mesophase.
[0033] FIG. 15. Views of electron density for residues in the R:G
interface. a) The E/DRY motif at the cytoplasmic end of TM3. b)
Packing interaction between Arg131 of the E/DRY motif and Tyr391 of
C-terminal Gsα. c) The NPxxY motif in the cytoplasmic end of TM7.
d) Interactions of Thr68 and Tyr141 with Asp130 of the E/DRY motif.
Phe139 of IL2 is buried in a hydrophobic pocket in Gsα. e)
The β1-α1 loop (P-loop) of Gsα involved in
nucleotide binding. Electron density maps are 2Fo-Fc maps contoured
at 1σ.
[0034] FIG. 16. Flow-chart of the purification procedures for
preparing R:G complex with Nb35
[0035] FIG. 17. Purity and homogeneity of the R:G complex. a)
Analytical SDS-PAGE/Coomassie blue stain of samples obtained at
various stages of receptor-G protein purification. BI167107 agonist
bound, dephosphorylated, and deglycosylated receptor is used in
excess of Gs heterotrimer for optimal coupling efficiency with the
functional fraction of the G protein. Functional purification of Gs
is achieved through its interaction with the immobilized receptor
on the M1 resin while non-functional/non-binding Gs is not
retained. b) A representative elution profile of one of four
consecutive preparative size exclusion chromatography (SEC) runs
with fractionation indicated in red. SEC fractions containing the
R:G complex (within the indicated dashed lines) were pooled, spin
concentrated, and analyzed for purity and homogeneity by
SDS-PAGE/Coomassie blue (a, lane 6), gel filtration (c), and by
anion exchange chromatography (d). d) Upper panel shows elution profile
from an analytical ion exchange chromatography (IEC) run of
β2AR-365:Gs complex that was treated with λ phosphatase
prior to complex formation. Lower panel shows IEC of complex which
was not dephosphorylated resulting in a heterogeneous preparation.
Off-peak fractions from the preparative SEC (b) were used for
analytical gel filtration experiments shown in FIGS. 13 and 21.
[0036] FIG. 18. Purification of Nb35 and determination of R:G:Nb
mixing ratios. a) Preparative ion exchange chromatography following
nickel affinity chromatography purification of Nb35. The nanobody
eluted in two populations (shown in red) as a minor peak and a
major homogeneous peak which was collected, spin concentrated, and
used for crystallography following determination of proper mixing
ratio with the R:G complex as shown in (b). b) The R:G complex was
mixed with slight excess of Nb35 (1 to 1.2 molar ratio of R:G
complex to Nb35) on the basis of their protein concentrations and
verified by analytical gel filtration.
[0037] FIG. 19. Formation of a stable R:G complex. A stable complex
was achieved by the combined effects of: 1) binding a high affinity
agonist to the receptor with an extremely slow dissociation rate
(as described in Rasmussen et al., 2011); 2) formation of a
nucleotide free complex in the presence of apyrase that hydrolyses
released GDP preventing it from rebinding and causing a less stable
R:G interaction; and 3) detergent exchange of DDM for MNG-3 that
stabilizes the complex.
[0038] FIG. 20. Stabilizing effect of MNG-3 on the R:G complexes. a)
Analytical gel filtration of R:G complexes purified in DDM (in
black), MNG-3 (in blue), or two MNG-3 analogs (in red and green)
following incubation for 48 hrs at 4 °C. In contrast to DDM,
the R:G complexes are stable in the MNG detergents. b) Effect of
diluting unliganded purified β2AR in either DDM or MNG-3 below
the critical micelle concentration (CMC) of the detergent.
Functional activity of the receptor was determined by 3H-dihydroalprenolol
(3H-DHA) saturation binding. Diluting β2AR maintained in
DDM by 1000-fold below the CMC caused a loss in 3H-DHA binding (black
data points) after 20 sec. In contrast, β2AR in MNG-3 diluted
1000-fold below the CMC maintained full ability to bind 3H-DHA
after 24 hrs.
[0039] FIG. 21. Effect of alkylating and reducing agents on the
stability and aggregation of the R:G complex. a) Disulfide-mediated
aggregation of the R:G complex was observed by size exclusion
chromatography (SEC) following incubation at 0 °C for 7
days in buffer containing 0.1 mM tris(2-carboxyethyl)phosphine
(TCEP). b) Treatment of the complex with iodoacetamide (5 mM for 20
hrs at 20 °C) led to dissociation of the complex.
Alkylating free cysteines with iodoacetic acid and cadmium chloride
also led to dissociation. c) Disulfide-mediated aggregation of the
complex could be prevented by higher concentrations of reducing
agents. Shown are the effects of 0.1, 1, and 10 mM TCEP for 1 hr at
20 °C, or 10 mM β-mercaptoethanol (β-ME, 1 hr at
20 °C). Crystallization setups were performed using 1 to 5
mM TCEP, which was essential for optimal crystal growth.
[0040] FIG. 22. a. shows a schematic diagram of the
T4L-β2AR-ΔICL3 fusion protein used for
crystallography, including the β2AR residues, the wild
type β2AR sequence, the HA signal peptide, the FLAG tag,
the TEV recognition site, the M96T, M98T mutations, the cysteines
involved in disulfide bonds, disulfide bond linkages, the N187E
mutation, and the 2-Ala linker. b. shows a schematic diagram of all
of the T4L-β2AR-ΔICL3 constructs that were
generated and evaluated for expression of functional receptor
protein in insect cells. SEQ ID NOS: 18-29.
[0041] FIG. 23. a, b. Packing interactions mediated by T4L. Each
T4L packs against three adjacent T4L-β2AR-ΔICL3
molecules and is involved in 4 packing interactions. The T4L and
β2AR-ΔICL3 from the reference molecule are shown.
The T4L and β2AR-ΔICL3 from the three adjacent
molecules are shown. c-f. Close-up views of packing interactions 1-4.
The residues involved in interactions are shown as spheres. c. In
interaction 1 the reference T4L packs against ECL2 of its fused
β2AR-ΔICL3. d. In interaction 2 the reference T4L
packs against T4L of an adjacent T4L-β2AR-ΔICL3.
e. In interaction 3 the reference T4L packs against T4L, ECL2 and
ECL3 of a second adjacent T4L-β2AR-ΔICL3. f. In
interaction 4 the reference T4L packs against ICL3 and helix 8 of a
third T4L-β2AR-ΔICL3.
[0042] FIG. 24. a. The crystal structure of the β2AR-Gs
complex. The T4L, β2AR and the G-protein heterotrimer are
shown in grey, as is the stabilizing nanobody. There is no packing
interaction between the T4L and its fused β2AR. b. The
crystal structure of T4L-β2AR-ΔICL3. The T4L is
shown in red and its fused β2AR-ΔICL3 is shown. In
contrast to the β2AR-Gs complex structure, there are
packing interactions between the T4L and its fused receptor
β2AR-ΔICL3.
[0043] FIG. 25. a. Saturation binding curves for antagonist
dihydroalprenolol (DHA) binding to T4L-β2AR-ΔICL3
and the wild type β2AR365. b. Competition binding curves
for agonist isoproterenol binding to T4L-β2AR-ΔICL3
and the wild type β2AR365.
[0044] FIG. 26. 2Fo-Fc map around the 2-Ala linker between T4L and
the β2AR. The main chain of the fusion junction is shown
in sticks. The electron density is shown in green mesh (1σ).
The T4L and β2AR-ΔICL3 are shown in grey, as is the
2-Ala linker.
[0045] FIG. 27. a. The superposed structures of the
T4L-β2AR-ΔICL3 and the β2AR-T4L (PDB
2RH1). The T4L-β2AR-ΔICL3 and the
β2AR-T4L are shown in grey. b. The extracellular side
view of the superposed structures. c. The intracellular side view
of the superposed structures. d. ICL2 in the β2AR-Fab5
structure (PDB 2R4R). e. ICL2 in the β2AR-T4L structure
(PDB 2RH1). f. ICL2 in the T4L-β2AR-ΔICL3
structure. g. ICL2 in the structure of β2AR stabilized by
Nb80 (PDB 3P0G) and h. ICL2 in the β2AR-Gs structure (PDB
3SN6).
[0046] FIG. 28 shows a model of β2AR bound to salmeterol, a
partial agonist that is used to treat asthma. The partially active
state is stabilized by nanobody 71.
[0047] Certain of the figures described above are shown in color in
U.S. provisional application Ser. Nos. 61/453,020, filed Mar. 15,
2011 and 61/507,425, filed Jul. 13, 2011. Those color figures, the
brief description of those figures, and all references to color
figures in those applications are incorporated by reference
herein.
DEFINITIONS
[0048] Unless defined otherwise herein, all technical and
scientific terms used herein have the same meaning as commonly
understood by one of ordinary skill in the art to which this
invention belongs. Singleton, et al., DICTIONARY OF MICROBIOLOGY
AND MOLECULAR BIOLOGY, 2D ED., John Wiley and Sons, New York
(1994), and Hale & Marham, THE HARPER COLLINS DICTIONARY OF
BIOLOGY, Harper Perennial, N.Y. (1991) provide one of skill with
general dictionaries of many of the terms used in this disclosure.
Although any methods and materials similar or equivalent to those
described herein can be used in the practice or testing of the
present invention, the preferred methods and materials are
described.
[0049] All patents and publications, including all sequences
disclosed within such patents and publications, referred to herein
are expressly incorporated by reference.
[0050] Numeric ranges are inclusive of the numbers defining the
range. Unless otherwise indicated, nucleic acids are written left
to right in 5' to 3' orientation; amino acid sequences are written
left to right in amino to carboxy orientation, respectively.
[0051] The headings provided herein are not limitations of the
various aspects or embodiments of the invention which can be had by
reference to the specification as a whole. Accordingly, the terms
defined immediately below are more fully defined by reference to
the specification as a whole.
[0052] "G-protein coupled receptors", or "GPCRs" are polypeptides
that share a common structural motif, having seven regions of
between 22 and 24 hydrophobic amino acids that form seven alpha
helices, each of which spans a membrane. As illustrated in FIG. 1,
each span is identified by number, i.e., transmembrane-1 (TM1),
transmembrane-2 (TM2), etc. The transmembrane helices are joined by
regions of amino acids between transmembrane-2 and transmembrane-3,
transmembrane-4 and transmembrane-5, and transmembrane-6 and
transmembrane-7 on the exterior, or "extracellular" side, of the
cell membrane, referred to as "extracellular" regions 1, 2 and 3
(EC1, EC2 and EC3), respectively. The transmembrane helices are
also joined by regions of amino acids between transmembrane-1 and
transmembrane-2, transmembrane-3 and transmembrane-4, and
transmembrane-5 and transmembrane-6 on the interior, or
"intracellular" side, of the cell membrane, referred to as
"intracellular" regions 1, 2 and 3 (IC1, IC2 and IC3),
respectively. The "carboxy" ("C") terminus of the receptor lies in
the intracellular space within the cell, and the "amino" ("N")
terminus of the receptor lies in the extracellular space outside of
the cell. GPCR structure and classification are generally well known
in the art, and further discussion of GPCRs may be found in Probst,
DNA Cell Biol. 1992 11:1-20; Marchese et al Genomics 23: 609-618,
1994; and the following books: Jurgen Wess (Ed) Structure-Function
Analysis of G Protein-Coupled Receptors published by Wiley-Liss
(1st edition; Oct. 15, 1999); Kevin R. Lynch (Ed) Identification
and Expression of G Protein-Coupled Receptors published by John
Wiley & Sons (March 1998) and Tatsuya Haga (Ed), G
Protein-Coupled Receptors, published by CRC Press (Sep. 24, 1999);
and Steve Watson (Ed) G-Protein Linked Receptor Factsbook,
published by Academic Press (1st edition; 1994). A schematic
representation of a typical GPCR is shown in FIG. 1.
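By way of illustration only, the loop nomenclature set out above can be expressed as a short Python sketch. Given seven transmembrane spans, it labels the N-terminus, the helices TM1 to TM7, the alternating intracellular and extracellular regions, and the C-terminus. The function name `label_gpcr_regions` and the 1-based inclusive coordinate convention are assumptions made for the example and are not part of the disclosure.

```python
from typing import List, Tuple

# Loops alternate intracellular/extracellular, starting with IC1
# (between TM1 and TM2) and ending with EC3 (between TM6 and TM7).
LOOP_NAMES = ["IC1", "EC1", "IC2", "EC2", "IC3", "EC3"]


def label_gpcr_regions(
    tm_spans: List[Tuple[int, int]], length: int
) -> List[Tuple[str, int, int]]:
    """Label the regions of a prototypical GPCR.

    tm_spans: 1-based inclusive (start, end) coordinates of TM1..TM7.
    length:   total number of residues in the receptor.
    Returns (name, start, end) tuples in N- to C-terminal order.
    """
    if len(tm_spans) != 7:
        raise ValueError("a prototypical GPCR has exactly seven TM spans")
    regions = []
    first_tm_start = tm_spans[0][0]
    if first_tm_start > 1:
        regions.append(("N-terminus (extracellular)", 1, first_tm_start - 1))
    for i, (start, end) in enumerate(tm_spans):
        regions.append((f"TM{i + 1}", start, end))
        if i < 6:
            # The loop runs from the residue after this helix to the
            # residue before the next helix.
            regions.append((LOOP_NAMES[i], end + 1, tm_spans[i + 1][0] - 1))
    last_tm_end = tm_spans[6][1]
    if last_tm_end < length:
        regions.append(("C-terminus (intracellular)", last_tm_end + 1, length))
    return regions
```

The spans supplied to such a function would ordinarily come from a transmembrane prediction program or from a solved structure; any coordinates used to exercise it are hypothetical.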
[0053] The term "naturally-occurring" in reference to a GPCR means
a GPCR that is naturally produced (for example and not limitation,
by a mammal or by a human). Such GPCRs are found in nature. The
term "non-naturally occurring" in reference to a GPCR means a GPCR
that is not naturally-occurring. Wild-type GPCRs that have been
made constitutively active through mutation, and variants of
naturally-occurring GPCRs, e.g., epitope-tagged GPCRs and GPCRs
lacking their native N-terminus, are examples of non-naturally
occurring GPCRs. Non-naturally occurring versions of a naturally
occurring GPCR are activated by the same ligand as the
naturally-occurring GPCR.
[0054] The term "ligand" means a molecule that specifically binds
to a GPCR. A ligand may be, for example, a polypeptide, a lipid, a
small molecule, or an antibody. A "native ligand" is a ligand that is
an endogenous, natural ligand for a native GPCR. A ligand may be a
GPCR "antagonist", "agonist", "partial agonist" or "inverse
agonist", or the like.
[0055] A "modulator" is a ligand that increases or decreases a GPCR
intracellular response when it is in contact with, e.g., binds to,
a GPCR that is expressed in a cell. This term includes agonists,
including partial agonists and inverse agonists, and
antagonists.
[0056] A "deletion" is defined as a change in either amino acid or
nucleotide sequence in which one or more amino acid or nucleotide
residues, respectively, are absent as compared to an amino acid
sequence or nucleotide sequence of a parental GPCR polypeptide or
nucleic acid. In the context of a GPCR or a fragment thereof, a
deletion can involve deletion of about 2, about 5, about 10, up to
about 20, up to about 30 or up to about 50 or more amino acids. A
GPCR or a fragment thereof may contain more than one deletion.
[0057] An "insertion" or "addition" is that change in an amino acid
or nucleotide sequence which has resulted in the addition of one or
more amino acid or nucleotide residues, respectively, as compared
to an amino acid sequence or nucleotide sequence of a parental
GPCR. "Insertion" generally refers to addition of one or more amino
acid residues within an amino acid sequence of a polypeptide, while
"addition" can be an insertion or refer to amino acid residues
added at an N- or C-terminus, or both termini. In the context of a
GPCR or fragment thereof, an insertion or addition is usually of
about 1, about 3, about 5, about 10, up to about 20, up to about 30
or up to about 50 or more amino acids. A GPCR or fragment thereof
may contain more than one insertion. Reference to a particular GPCR
or group of GPCRs by name, e.g., reference to the serotonin or
histamine receptor, is intended to refer to the wild type receptor
as well as active variants of that receptor that can bind to the
same ligand as the wild type receptor and/or transduce a signal in
the same way as the wild type receptor.
[0058] A "substitution" results from the replacement of one or more
amino acids or nucleotides by different amino acids or nucleotides,
respectively, as compared to an amino acid sequence or nucleotide
sequence of a parental GPCR or a fragment thereof. It is understood
that a GPCR or a fragment thereof may have conservative amino acid
substitutions which have substantially no effect on GPCR activity.
By conservative substitutions is intended combinations such as gly,
ala; val, ile, leu; asp, glu; asn, gln; ser, thr; lys, arg; and
phe, tyr.
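The conservative substitution groups listed above lend themselves to a simple lookup. The following Python sketch, offered as an illustration rather than as a definitive test of GPCR activity, classifies each difference between two equal-length sequences as conservative or non-conservative according to those groups only. The function names are hypothetical, and the one-letter codes are a transcription of the three-letter codes given in the text.

```python
# One-letter equivalents of the groups recited above:
# gly, ala; val, ile, leu; asp, glu; asn, gln; ser, thr; lys, arg; phe, tyr.
CONSERVATIVE_GROUPS = [
    {"G", "A"},
    {"V", "I", "L"},
    {"D", "E"},
    {"N", "Q"},
    {"S", "T"},
    {"K", "R"},
    {"F", "Y"},
]


def is_conservative(a: str, b: str) -> bool:
    """True if a -> b is a substitution within one of the listed groups."""
    a, b = a.upper(), b.upper()
    return a != b and any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(parent: str, variant: str):
    """List (position, parent_residue, variant_residue, conservative?) tuples.

    Positions are 1-based. The sequences must have equal length, i.e. the
    variant differs from the parent by substitutions only.
    """
    if len(parent) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (i + 1, p, v, is_conservative(p, v))
        for i, (p, v) in enumerate(zip(parent.upper(), variant.upper()))
        if p != v
    ]
```

Whether a given conservative substitution in fact leaves GPCR activity substantially unaffected remains an experimental question; the sketch only applies the grouping.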
[0059] The term "biologically active", with respect to a GPCR,
refers to a GPCR having a biochemical function (e.g., a binding
function, a signal transduction function, or an ability to change
conformation as a result of ligand binding) of a naturally
occurring GPCR.
[0060] As used herein, the terms "determining," "measuring,"
"assessing," and "assaying" are used interchangeably and include
both quantitative and qualitative determinations. Reference to an
"amount" of a GPCR in these contexts is not intended to require
quantitative assessment, and may be either qualitative or
quantitative, unless specifically indicated otherwise.
[0061] The terms "polypeptide" and "protein", used interchangeably
herein, refer to a polymeric form of amino acids of any length,
which can include coded and non-coded amino acids, chemically or
biochemically modified or derivatized amino acids, and polypeptides
having modified peptide backbones.
[0062] By the term "fusion protein" or grammatical equivalents thereof
is meant a protein composed of a plurality of polypeptide
components, that while typically unjoined in their native state,
are joined by their respective amino and carboxyl termini through a
peptide linkage to form a single continuous polypeptide. Fusion
proteins may be a combination of two, three or even four or more
different proteins. The term polypeptide includes fusion proteins,
including, but not limited to, fusion proteins with a heterologous
amino acid sequence, fusions with heterologous and homologous
leader sequences, with or without N-terminal methionine residues;
immunologically tagged proteins; fusion proteins with detectable
fusion partners, e.g., fusion proteins including as a fusion
partner a fluorescent protein, β-galactosidase, luciferase,
etc.; and the like.
[0063] The terms "nucleic acid molecule" and "polynucleotide" are
used interchangeably and refer to a polymeric form of nucleotides
of any length, either deoxyribonucleotides or ribonucleotides, or
analogs thereof. Polynucleotides may have any three-dimensional
structure, and may perform any function, known or unknown.
Non-limiting examples of polynucleotides include a gene, a gene
fragment, exons, introns, messenger RNA (mRNA), transfer RNA,
ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides,
branched polynucleotides, plasmids, vectors, isolated DNA of any
sequence, control regions, isolated RNA of any sequence, nucleic
acid probes, and primers. The nucleic acid molecule may be linear
or circular.
[0064] The terms "antibodies" and "immunoglobulin" include
antibodies or immunoglobulins of any isotype, fragments of
antibodies which retain specific binding to antigen, including, but
not limited to, Fab, Fv, scFv, and Fd fragments, chimeric
antibodies, humanized antibodies, single-chain antibodies, and
fusion proteins comprising an antigen-binding portion of an
antibody and a non-antibody protein. The antibodies may be
detectably labeled, e.g., with a radioisotope, an enzyme which
generates a detectable product, a fluorescent protein, and the
like. The antibodies may be further conjugated to other moieties,
such as members of specific binding pairs, e.g., biotin (member of
biotin-avidin specific binding pair), and the like. The antibodies
may also be bound to a solid support, including, but not limited
to, polystyrene plates or beads, and the like. Also encompassed by
the terms are Fab', Fv, F(ab')2, and/or other antibody
fragments that retain specific binding to antigen.
[0065] Antibodies may exist in a variety of other forms including,
for example, Fv, Fab, and F(ab')2, as well as bi-functional
(i.e. bi-specific) hybrid antibodies (e.g., Lanzavecchia et al.,
Eur. J. Immunol. 17, 105 (1987)) and in single chains (e.g., Huston
et al., Proc. Natl. Acad. Sci. U.S.A., 85, 5879-5883 (1988) and
Bird et al., Science, 242, 423-426 (1988), which are incorporated
herein by reference). (See, generally, Hood et al., "Immunology",
Benjamin, N.Y., 2nd ed. (1984), and Hunkapiller and Hood, Nature,
323, 15-16 (1986)). This term also encompasses so-called "phage
display" antibodies.
[0066] A "monovalent" antibody is an antibody that has a single
antigen binding region. Fab fragments, scFv antibodies, and phage
display antibodies are types of monovalent antibodies, although
others are known. A "Fab" fragment of an antibody has a single
binding region, and may be made by papain digestion of a full
length monoclonal antibody. A single chain variable (or "scFv")
fragment of an antibody is an antibody fragment containing the
variable regions of the heavy and light chains of immunoglobulins,
linked together with a short flexible linker.
[0067] As used herein the term "isolated," when used in the context
of an isolated compound, refers to a compound of interest that is
in an environment different from that in which the compound
naturally occurs. "Isolated" is meant to include compounds that are
within samples that are substantially enriched for the compound of
interest and/or in which the compound of interest is partially or
substantially purified.
[0068] As used herein, the term "substantially pure" refers to a
compound that is removed from its natural environment and is at
least 60% free, at least 75% free, or at least 90% free from other
components with which it is naturally associated.
[0069] A "coding sequence" or a sequence that "encodes" a selected
polypeptide, is a nucleic acid molecule which can be transcribed
(in the case of DNA) and translated (in the case of mRNA) into a
polypeptide, for example, in a host cell when placed under the
control of appropriate regulatory sequences (or "control
elements"). The boundaries of the coding sequence are typically
determined by a start codon at the 5' (amino) terminus and a
translation stop codon at the 3' (carboxy) terminus. A coding
sequence can include, but is not limited to, cDNA from viral,
prokaryotic or eukaryotic mRNA, genomic DNA sequences from viral or
prokaryotic DNA, and synthetic DNA sequences. A transcription
termination sequence may be located 3' to the coding sequence.
Other "control elements" may also be associated with a coding
sequence. A DNA sequence encoding a polypeptide can be optimized
for expression in a selected cell by using the codons preferred by
the selected cell to represent the DNA copy of the desired
polypeptide coding sequence.
[0070] "Operably linked" refers to an arrangement of elements
wherein the components so described are configured so as to perform
their usual function. In the case of a promoter, a promoter that is
operably linked to a coding sequence will effect the expression of
a coding sequence. The promoter or other control elements need not
be contiguous with the coding sequence, so long as they function to
direct the expression thereof. For example, intervening
untranslated yet transcribed sequences can be present between the
promoter sequence and the coding sequence and the promoter sequence
can still be considered "operably linked" to the coding
sequence.
[0071] By "nucleic acid construct" it is meant a nucleic acid
sequence that has been constructed to comprise one or more
functional units not found together in nature. Examples include
circular, linear, double-stranded, extrachromosomal DNA molecules
(plasmids), cosmids (plasmids containing COS sequences from lambda
phage), viral genomes comprising non-native nucleic acid sequences,
and the like.
[0072] A "vector" is capable of transferring gene sequences to a
host cell. Typically, "vector construct," "expression vector," and
"gene transfer vector," mean any nucleic acid construct capable of
directing the expression of a gene of interest and which can
transfer gene sequences to host cells, which can be accomplished by
genomic integration of all or a portion of the vector, or transient
or inheritable maintenance of the vector as an extrachromosomal
element. Thus, the term includes cloning, and expression vehicles,
as well as integrating vectors.
[0073] An "expression cassette" comprises any nucleic acid
construct capable of directing the expression of a gene/coding
sequence of interest, which is operably linked to a promoter of the
expression cassette. Such cassettes can be constructed into a
"vector," "vector construct," "expression vector," or "gene
transfer vector," in order to transfer the expression cassette into
a host cell. Thus, the term includes cloning and expression
vehicles, as well as viral vectors.
[0074] A first polynucleotide is "derived from" or "corresponds to"
a second polynucleotide if it has the same or substantially the
same nucleotide sequence as a region of the second polynucleotide,
its cDNA, complements thereof, or if it displays sequence identity
as described above.
[0075] A first polypeptide is "derived from" or "corresponds to" a
second polypeptide if it is (i) encoded by a first polynucleotide
derived from a second polynucleotide, or (ii) displays sequence
identity to the second polypeptides as described above.
[0076] The term "autonomously folding stable domain" is intended to
exclude the amino acid sequence of a reporter protein, e.g., an
optically detectable protein such as a fluorescent protein (e.g.,
GFP, CFP or YFP) or luciferase, and also excludes amino acid
sequences that are at least 90% identical to the extracellular region of a
naturally occurring GPCR.
[0077] The term "active form" or "native state" of a protein is a
protein that is folded in a way so as to be active. A GPCR is in
its active form if it can bind ligand, alter conformation in
response to ligand binding, and/or transduce a signal which may or
may not be induced by ligand binding. An active or native protein
is not denatured.
[0078] The term "stable domain" is a polypeptide domain that, when
folded in its active form, is stable, i.e., does not readily become
inactive or denatured.
[0079] The term "folds autonomously" indicates a protein that folds
into its active form in a cell, without biochemical denaturation
and renaturation of the protein, and without chaperones.
[0080] The term "naturally-occurring" refers to an object that is
found in nature.
[0081] The term "non-naturally-occurring" refers to an object that
is not found in nature.
[0082] The term "heterologous", in the context of two things that
are heterologous to one another, refers to two things that do not
exist in the same arrangement in nature.
[0083] The term "signal sequence" or "signal peptide" refers to a
sequence of amino acids at the N-terminal portion of a protein,
which facilitates the secretion of the mature form of the protein
through the plasma membrane. The mature form of the protein lacks
the signal sequence which is cleaved off during the secretion
process.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
[0084] In the following description, the fusion protein is
described first, followed by a discussion of the crystallization
method in which the fusion protein may be employed.
Fusion Proteins
[0085] As noted above, a subject fusion protein comprises: a) a GPCR;
and b) an autonomously folding stable domain, where the
autonomously folding stable domain is N-terminal to the GPCR and is
heterologous to the GPCR. The autonomously folding stable domain is
believed to provide a polar surface for crystal lattice contacts on
the extracellular surface of the protein, thereby allowing the
fusion protein to be crystallized. In particular embodiments, the
protein is characterized in that it is crystallizable under lipidic
cubic phase crystallization conditions, although other
crystallization conditions may be employed. A polar surface for
crystal lattice contacts on the extracellular surface of the
protein provides several options for crystallizing the fusion
protein. In one embodiment, the fusion protein may be crystallized
as a complex with the G-protein to which the GPCR couples. In
another embodiment, the protein may be crystallized as a complex
with a monovalent antibody that binds to the IC3 loop of the GPCR,
as described in published US patent application US20090148510 and
by Rasmussen et al (Nature 2007 450: 383-388), which publications
are incorporated by reference for disclosure of those methods. In
another embodiment, the third intracellular loop of the GPCR may
contain another autonomously folding stable domain (which may be
the same as or different to the autonomously folding stable domain
at the N-terminal end of the protein) as described in Rosenbaum et
al (Science 2007 318: 1266-73) and published U.S. patent
application US20090118474, which publications are incorporated by
reference for disclosure of those methods.
[0086] In very general terms, such a fusion protein may be made by
substituting the N-terminal extracellular region of a GPCR with an
autonomously folding stable protein that is globular and readily
crystallizable, e.g., lysozyme, chitinase, glucose isomerase,
xylanase, trypsin inhibitor, crambin or ribonuclease, for example.
During crystallization, the autonomously folding stable domain is
thought to provide a polar surface for crystal lattice contacts on
the extracellular surface of the protein, thereby facilitating
crystallization of the protein.
[0087] As will be described in greater detail below, the GPCR
fusion protein may be produced using a nucleic acid encoding a
longer protein that, in order from N- to C-terminus, contains a
signal peptide, an epitope tag, a protease cleavage site and the
GPCR fusion protein. The longer protein is produced in the cell.
During secretion, the signal peptide is cleaved from the protein
and the resulting protein can be purified using the epitope tag.
The epitope tag can be cleaved from the GPCR fusion protein prior
to use. Various signal peptides, epitope tags and protease cleavage
sites and methods for their use are known in the art.
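The order of elements in the longer precursor protein, and the two processing steps described above, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the text does not specify which signal peptide, epitope tag or protease is used, so the FLAG epitope and a TEV protease recognition site appear here only as familiar examples, and the signal peptide string is an arbitrary placeholder. All function names are hypothetical.

```python
# Illustrative components only; none is specified by the text above.
SIGNAL_PEPTIDE = "MKTIIALSYIFCLVFA"  # arbitrary placeholder signal peptide
EPITOPE_TAG = "DYKDDDDK"             # FLAG epitope, used as an example tag
PROTEASE_SITE = "ENLYFQG"            # TEV site; cleavage occurs between Q and G


def build_precursor(fusion_protein: str) -> str:
    """Assemble, N- to C-terminus: signal peptide, tag, protease site, fusion."""
    return SIGNAL_PEPTIDE + EPITOPE_TAG + PROTEASE_SITE + fusion_protein


def remove_signal_peptide(precursor: str) -> str:
    """Model cleavage of the signal peptide during secretion."""
    if not precursor.startswith(SIGNAL_PEPTIDE):
        raise ValueError("precursor does not begin with the signal peptide")
    return precursor[len(SIGNAL_PEPTIDE):]


def cleave_tag(tagged_protein: str) -> str:
    """Model protease removal of the epitope tag after purification.

    The cut is placed after the glutamine of the site, so the final
    glycine of the site remains on the released fusion protein.
    """
    site_at = tagged_protein.find(PROTEASE_SITE)
    if site_at < 0:
        raise ValueError("no protease cleavage site found")
    cut = site_at + len(PROTEASE_SITE) - 1
    return tagged_protein[cut:]
```

With these example components, the protein released after tag removal carries one extra N-terminal residue from the protease site; a different protease or site design would change that detail.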
[0088] GPCRs
[0089] Any known GPCR is suitable for use in the subject method. A
disclosure of the sequences and phylogenetic relationships between
277 GPCRs is provided in Joost et al. (Genome Biol. 2002
3:RESEARCH0063, the entire contents of which are incorporated by
reference) and, as such, at least 277 GPCRs are suitable for the
subject methods. A more recent disclosure of the sequences and
phylogenetic relationships between 367 human and 392 mouse GPCRs is
provided in Vassilatis et al. (Proc Natl Acad Sci 2003 100:4903-8
and www.primalinc.com, each of which is hereby incorporated by
reference in its entirety) and, as such, at least 367 human and at
least 392 mouse GPCRs are suitable for the subject methods. GPCR
families are also described in Fredriksson et al (Mol. Pharmacol.
2003 63, 1256-72).
[0090] The methods may be used, by way of exemplification, for
purinergic receptors, vitamin receptors, lipid receptors, peptide
hormone receptors, non-hormone peptide receptors, non-peptide
hormone receptors, polypeptide receptors, protease receptors,
receptors for sensory signal mediators, and biogenic amine receptors
not including the β2-adrenergic receptor. In certain embodiments,
said biogenic amine receptor does not include an adrenoreceptor.
α-type adrenoreceptors (e.g. α1A, α1B
or α1C adrenoreceptors), and β-type adrenoreceptors
(e.g. β1, β2, or β3 adrenoreceptors)
are discussed in Singh et al., J. Cell Phys. 189:257-265, 2001.
[0091] It is recognized that both native (naturally occurring) and
altered native (non-naturally occurring) GPCRs may be used in the
subject methods. In certain embodiments, therefore, an altered
native GPCR (e.g. a native GPCR that is altered by an amino acid
substitution, deletion and/or insertion) may be employed, such that
it binds the same ligand as a corresponding native GPCR, and/or
couples to a G-protein as a result of the binding. In certain cases,
a GPCR employed herein may have an amino acid sequence that is at
least 80% identical, e.g., at least 85% identical, at least 90%
identical, at least 95% identical, or at least 98% identical, to at
least the heptahelical domain of a
naturally occurring GPCR. A GPCR employed herein may optionally
contain the C-terminal domain of a GPCR. In certain embodiments, a
native GPCR may be "trimmed back" from its N-terminus and/or its
C-terminus to leave its heptahelical domain, prior to use.
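As a worked illustration of the identity thresholds recited above, the following Python sketch computes percent identity from two sequences that have already been aligned (for example, with one of the alignment programs mentioned elsewhere in this disclosure). The text does not fix a particular formula, so the definition used here is an assumption: identical columns divided by the number of columns in which neither sequence has a gap. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-').

    Both inputs must come from the same pairwise alignment and therefore
    have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded under this definition
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other conventions, such as dividing by the length of the shorter sequence or by the full alignment length, give different values for gapped alignments, so the convention should be stated whenever a threshold is applied.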
[0092] As such, the following GPCRs (native or altered) find
particular use as parental GPCRs in the subject methods:
cholinergic receptor, muscarinic 3; melanin-concentrating hormone
receptor 2; cholinergic receptor, muscarinic 4; niacin receptor;
histamine 4 receptor; ghrelin receptor; CXCR3 chemokine receptor;
motilin receptor; 5-hydroxytryptamine (serotonin) receptor 2A;
5-hydroxytryptamine (serotonin) receptor 2B; 5-hydroxytryptamine
(serotonin) receptor 2C; dopamine receptor D3; dopamine receptor
D4; dopamine receptor D1; histamine receptor H2; histamine receptor
H3; galanin receptor 1; neuropeptide Y receptor Y1; angiotensin II
receptor 1; neurotensin receptor 1; melanocortin 4 receptor;
glucagon-like peptide 1 receptor; adenosine A1 receptor;
cannabinoid receptor 1; and melanin-concentrating hormone receptor
1.
[0093] In particular embodiments, the GPCR may belong to one of the
following GPCR families: amine, peptide, glycoprotein hormone,
opsin, olfactory, prostanoid, nucleotide-like, cannabinoid,
platelet activating factor, gonadotropin-releasing hormone,
thyrotropin-releasing hormone or melatonin families, as defined by
Lapinsh et al (Classification of G-protein coupled receptors by
alignment-independent extraction of principal chemical properties
of primary amino acid sequences. Prot. Sci. 2002 11:795-805). The
subject GPCR may be a family A GPCR (rhodopsin-like), family B GPCR
(secretin-like, which includes the PTH and glucagon receptors), or
a family C GPCR (glutamate receptor-like, which includes the GABA and
glutamate receptors), or an "other" family GPCR (which includes
adhesion, frizzled, taste type-2, and unclassified family
members).
[0094] In the subject methods, the N-terminal extracellular region
N-terminal to the TM1 region of a GPCR is usually identified, and
replaced with an autonomously folding stable domain to produce a
fusion protein. A schematic representation of the prototypical
structure of a GPCR is provided in FIG. 1, where these regions, in
the context of the entire structure of a GPCR, may be seen. A
schematic representation of a subject fusion protein is shown in
FIG. 2.
[0095] The N-terminal extracellular region is readily discernable
by one of skill in the art using, for example, a program for
identifying transmembrane regions: once transmembrane region TM1 is
identified, the N-terminal extracellular region will be apparent.
The N-terminal extracellular region may also be identified using
such methods as pairwise or multiple sequence alignment (e.g. using
the GAP or BESTFIT programs of the University of Wisconsin's GCG package, or
CLUSTAL alignment programs, Higgins et al., Gene. 1988 73:237-44),
using a target GPCR and, for example, GPCRs of known structure.
[0096] Suitable programs for identifying transmembrane regions
include those described by Moller et al. (Bioinformatics,
17:646-653, 2001). A particularly suitable program is called
"TMHMM" (Krogh et al., Journal of Molecular Biology, 305:567-580,
2001). To use these programs via a user interface, a sequence
corresponding to a GPCR or a fragment thereof is entered into the
user interface and the program run. Such programs are currently
available over the world wide web, for example at the website of
the Center for Biological Sequence Analysis at
cbs.dtu.dk/services/. The output of these programs may be variable
in terms of its format; however, it usually indicates the transmembrane
regions of a GPCR using amino acid coordinates of the GPCR.
[0097] When TM regions of a GPCR polypeptide are determined using
TMHMM, the prototypical GPCR profile is usually obtained: an
N-terminus that is extracellular, followed by a segment comprising
seven TM regions, and further followed by a C-terminus that is
intracellular. TM numbering for this prototypical GPCR profile
begins with the most N-terminally disposed TM region (TM1) and
concludes with the most C-terminally disposed TM region (TM7).
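For readers who wish to see how hydrophobic stretches can be located from sequence alone, the following Python sketch implements a sliding-window hydropathy scan using the Kyte-Doolittle scale. This is not TMHMM, which is based on a hidden Markov model; it is a much simpler, older technique shown only to illustrate the principle. The window of 19 residues and the threshold of 1.6 are conventional Kyte-Doolittle settings adopted here as assumptions, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19):
    """Mean hydropathy of every full window; entry i covers residues i..i+window-1 (0-based)."""
    scores = [KD_SCALE[aa] for aa in seq.upper()]  # KeyError on non-standard residues
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(seq: str, window: int = 19, threshold: float = 1.6):
    """Return 1-based inclusive (start, end) spans covered by windows whose
    mean hydropathy is at least `threshold`, merging overlapping or adjacent windows."""
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            # Extend the previous segment instead of starting a new one.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments
```

Because every window that passes the threshold is reported in full, the spans returned by this sketch tend to overshoot the true helix boundaries by several residues; dedicated programs such as TMHMM give better-defined coordinates and also predict which side of the membrane each loop faces.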
[0098] In certain cases, once the N-terminal extracellular region
is identified in a GPCR, a suitable region of amino acids is chosen
for substitution with the amino acid sequence of the autonomously
folding stable domain. In certain embodiments, the C-terminus of
the autonomously folding stable domain is linked to the amino acid
that is within 50 residues (e.g., 1-5, 1-10, 1-20, 1-30,
1-40, etc. residues) N-terminal to the N-terminal amino acid of the
TM 1 region of the GPCR, although linkages outside of this region
are envisioned. In one exemplary embodiment, amino acids that are
at the N-terminal end of the TM1 region (i.e., within what would be
referred to as the TM1 region) may be replaced in addition to the
amino acids that are N-terminal to the TM1 region. In particular
embodiments, this junction may be optimized to provide for maximal
expression and receptor activity.
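The junction described above can be sketched in Python as a simple string operation. In this illustration, which is a sketch under stated assumptions rather than a prescribed construction method, `tm1_start` is the 1-based position of the first residue of TM1 (as obtained from a transmembrane prediction or a known structure), and `offset` is the number of native residues N-terminal to TM1 that are retained between the stable domain and TM1. A negative `offset` models the case in which residues at the N-terminal end of TM1 are also replaced. The function and parameter names are hypothetical.

```python
def build_n_terminal_fusion(gpcr_seq: str, tm1_start: int, domain_seq: str, offset: int = 0) -> str:
    """Replace the N-terminal extracellular region of a GPCR with domain_seq.

    gpcr_seq:   full-length GPCR amino acid sequence
    tm1_start:  1-based position of the first residue of TM1
    domain_seq: sequence of the autonomously folding stable domain
    offset:     native residues N-terminal to TM1 that are retained
                (negative values remove residues from the start of TM1)
    """
    # 0-based index of the first GPCR residue kept in the fusion protein.
    junction = tm1_start - 1 - offset
    if not 0 <= junction < len(gpcr_seq):
        raise ValueError("junction falls outside the GPCR sequence")
    return domain_seq + gpcr_seq[junction:]


def candidate_fusions(gpcr_seq: str, tm1_start: int, domain_seq: str, offsets):
    """Build one candidate fusion per offset, e.g. for junction optimization."""
    return {
        k: build_n_terminal_fusion(gpcr_seq, tm1_start, domain_seq, k)
        for k in offsets
    }
```

A series of candidates generated in this way, for example with offsets spanning the range of up to 50 residues mentioned above, could then be compared experimentally for expression and receptor activity.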
[0099] In addition to substituting the N-terminal extracellular region
of a GPCR with an autonomously folding stable domain, as described
above, in certain cases, the intracellular C-terminal region of the
GPCR (which may be C-terminal to the cysteine palmitoylation site that
is approximately 10 to 25 amino acid residues downstream of a
conserved NPXXY motif), may be deleted. In certain cases, the 20-30
amino acids immediately C-terminal to the cysteine palmitoylation
site are not deleted. In particular embodiments, this position may
be optimized to provide for maximal expression and receptor
activity.
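The C-terminal truncation described above can likewise be sketched in Python. The sketch locates the NPXXY motif, looks for a cysteine 10 to 25 residues downstream of it, and retains a chosen number of residues after that cysteine. Several details are assumptions made for the example: the distance is counted from the tyrosine of the motif, the most C-terminal motif match is used, the first cysteine in the window is taken as the palmitoylation site, and the default of 25 retained residues is simply a value within the 20-30 range mentioned above. The function names are hypothetical.

```python
import re


def find_palmitoylation_cys(seq: str, min_gap: int = 10, max_gap: int = 25):
    """Return the 0-based index of the first cysteine lying min_gap..max_gap
    residues downstream of the tyrosine of the most C-terminal NPXXY motif,
    or None if no such motif or cysteine is found."""
    seq = seq.upper()
    motifs = list(re.finditer(r"NP..Y", seq))
    if not motifs:
        return None
    tyr = motifs[-1].end() - 1  # index of the Y in the last motif match
    last = min(tyr + max_gap, len(seq) - 1)
    for idx in range(tyr + min_gap, last + 1):
        if seq[idx] == "C":
            return idx
    return None


def truncate_c_terminus(seq: str, keep_after_cys: int = 25) -> str:
    """Delete the C-terminal region beyond keep_after_cys residues
    following the putative palmitoylation cysteine."""
    cys = find_palmitoylation_cys(seq)
    if cys is None:
        raise ValueError("no NPXXY motif with a downstream cysteine was found")
    return seq[: cys + 1 + keep_after_cys]
```

A sequence-pattern search of this kind cannot confirm that a cysteine is actually palmitoylated; it only proposes a truncation point that would then be optimized for expression and receptor activity as described.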
[0100] Autonomously Folding Stable Domains
[0101] In particular embodiments, the autonomously folding stable
domain is a polypeptide that can fold autonomously in a variety of
cellular expression hosts, and is resistant to chemical and thermal
denaturation. In particular embodiments, the autonomously folding
stable domains may be derived from a protein that is known to be
highly crystallizable in a variety of space groups and crystal
packing arrangements. In certain cases, the stable, folded protein
insertion may also shield the fusion protein from proteolysis, and
may itself be protease resistant. Lysozyme is one such polypeptide;
however, many others are known.
[0102] In certain embodiments, an autonomously folding stable domain
of a subject fusion protein may be a soluble, stable protein (e.g.,
a protein displaying resistance to thermal and chemical
denaturation) that folds autonomously of the GPCR portion of the
fusion protein, in a cell. In certain cases, the stable,
autonomously folding stable domain may have no cysteine residues
(or may be engineered to have no cysteine residues) in order to
avoid potential disulphide bonds between the autonomously folding
stable domain and a GPCR portion of the fusion protein, or internal
disulphide bonds. Autonomously folding stable domains are
conformationally restrained, and are resistant to protease
cleavage.
[0103] In certain cases, the autonomously folding stable domain may
contain most or all of the amino acid sequence of a polypeptide
that is readily crystallized. Such proteins may be characterized by
a large number of deposits in the protein data bank (www.rcsb.org)
in a variety of space groups and crystal packing arrangements.
While examples that employ lysozyme as the stable, folded protein
insertion are discussed below, the general principles may be used
to employ any of a number of polypeptides that have the
characteristics discussed above. Autonomously folding stable domain
candidates include those containing the amino acid sequence of
proteins that are readily crystallized including, but not limited
to: lysozyme, chitinase, glucose isomerase, xylanase, trypsin
inhibitor, crambin and ribonuclease. Other suitable polypeptides may
be found at the BMCD database (Gilliland et al 1994. The Biological
Macromolecule Crystallization Database, Version 3.0: New Features,
Data, and the NASA Archive for Protein Crystal Growth Data. Acta
Crystallogr. D50 408-413), as published to the world wide web.
[0104] In certain embodiments, the autonomously folding stable
domain used may be at least 80% identical (e.g., at least 85%
identical, at least 90% identical, at least 95% identical or at
least 98% identical) to a wild type protein. Many suitable wild type
proteins, including non-naturally occurring variants thereof, are
readily crystallizable.
[0105] In one embodiment, the autonomously folding stable domain
may be a member of the lysozyme superfamily, members of which share a common structure
and are readily crystallized. Such proteins are described in, e.g.,
Wohlkonig et al (Structural Relationships in the Lysozyme
Superfamily: Significant Evidence for Glycoside Hydrolase Signature
Motifs. PLoS ONE 2010 5: e15388).
[0106] As noted above, one such autonomously folding stable domain
that may be employed in a subject fusion protein is lysozyme.
Lysozyme is a highly crystallizable protein (see, e.g., Strynadka
et al Lysozyme: a model enzyme in protein crystallography EXS 1996
75: 185-222) and at present over 200 sets of atomic coordinates for various
lysozymes, including many wild-type lysozymes and variants thereof,
including lysozymes from phage T4, human, swan, rainbow trout,
guinea fowl, soft-shelled turtle, Tapes japonica, nurse shark,
mouse sperm, dog and phage P1, as well as man-made variants
thereof, have been deposited in NCBI's structure database. A
subject fusion protein may contain any of a wide variety of
lysozyme sequences. See, e.g., Strynadka et al (Lysozyme: a model
enzyme in protein crystallography. EXS. 1996; 75:185-222), Evrard
et al (Crystal structure of the lysozyme from bacteriophage lambda
and its relationship with V and C-type lysozymes. J. Mol. Biol.
1998 276:151-64), Forsythe et al (Crystallization of chicken
egg-white lysozyme from ammonium sulfate. Acta Crystallogr D Biol
Crystallogr. 1997 53:795-7), Remington et al (Structure of the
Lysozyme from Bacteriophage T4: An Electron Density Map at 2.4 Å
Resolution), Lyne et al (Preliminary crystallographic examination
of a novel fungal lysozyme from Chalaropsis. J Biol Chem. 1990
265:6928-30), Marana et al (Crystallization, data collection and
phasing of two digestive lysozymes from Musca domestica. Acta
Crystallogr Sect F Struct Biol Cryst Commun. 2006 62:750-2), Harada
et al (Preliminary X-ray crystallographic study of lysozyme
produced by Streptomyces globisporus. J Mol Biol. 1989 207:851-2)
and Yao et al (Crystallization and preliminary X-ray structure
analysis of pigeon egg-white lysozyme. J. Biochem. 1992
111:1-3).
[0107] The length of the autonomously folding stable domain may be
in the range of 50-500 amino acids, e.g., 80-200 amino acids in
length, although autonomously folding stable domains having lengths
outside of this range are also envisioned.
[0108] As noted above, the autonomously folding stable domain is
not fluorescent or light-emitting. As such, the autonomously
folding stable domain is not CFP, GFP, YFP, luciferase, or other
light emitting, fluorescent variants thereof. In certain cases, an
autonomously folding stable domain does not contain a flexible
linker (e.g., a flexible polyglycine linker) or other such
conformationally unrestrained regions. In certain cases, the
autonomously folding stable domain contains a sequence of amino
acids from a protein that has a crystal structure that has been
solved. In certain cases, the stable, folded protein insertion
should not have highly flexible loop regions characterized by high
crystallographic temperature factors (i.e., high B-factors).
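The B-factor criterion can be illustrated with a short Python sketch
that flags residues whose mean atomic B-factor exceeds a cutoff. The
function name, the input layout and the 60 square-angstrom cutoff are
assumptions made for illustration only; no numerical threshold is
specified in this disclosure.

```python
from collections import defaultdict
from statistics import mean


def flexible_residues(atom_bfactors, cutoff=60.0):
    """Return residue numbers whose mean atomic B-factor exceeds `cutoff`.

    atom_bfactors: iterable of (residue_number, b_factor) pairs, one per atom.
    cutoff: hypothetical threshold in square angstroms (an assumption).
    """
    per_residue = defaultdict(list)
    for resnum, b in atom_bfactors:
        per_residue[resnum].append(b)
    return sorted(r for r, bs in per_residue.items() if mean(bs) > cutoff)


# Toy data: residue 11 has a mean B-factor of 85 and is flagged.
atoms = [(10, 25.0), (10, 30.0), (11, 80.0), (11, 90.0), (12, 55.0)]
print(flexible_residues(atoms))
```

A candidate domain with many flagged residues in a single loop might
be deprioritized, although the appropriate cutoff would depend on the
resolution and refinement of the reference structure.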
[0109] An exemplary amino acid sequence for an exemplary lysozyme
fusion protein is set forth in FIG. 5, and the amino acid sequences
of exemplary alternative additions (which may be substituted into
any of the sequences of FIG. 5 in place of the lysozyme sequence)
are shown in FIG. 4. These sequences include the sequences of
trypsin inhibitor, calbindin, barnase, xylanase, glucokinase or a
cytochrome, e.g., cytochrome a, b or c, although other sequences
can be readily used. In particular embodiments, the autonomously
folding stable domain may be any of the proteins listed in table 1
of Papandreou et al (Eur. J. Biochem. 271, 4762-4768 (2004)) or any
of the 674 globular proteins listed by Wang and Yuan (Proteins 2000
38, 165-175) (which publications are incorporated by reference for
disclosure of individual proteins), including orthologs from other
species and variant proteins that are at least 80% identical to the
listed proteins. Exemplary sequences include those of
apolipophorin-III, staphylococcal nuclease, RNAse sa, uteroglobin,
xylanase II, glutaredoxin, myohemerythrin, bacillus 1-3,
1-4-.beta.-glucanase,
orotate phosphoribosyltransferase, cytochrome b562, serine
esterase, fructose permease, subunit IIb, fibritin, legume lectin,
chloramphenicol acetyltransferase, cytochrome c oxidase, adenovirus
fibre, flavodoxin, phospholipase a2, stnv coat protein, signal
transduction protein, lysin, pseudoazurin, cutinase, retinoid-x
receptor .alpha., transthyretin, dihydropteridine reductase,
cytochrome c3, picornavirus, ch-p21 ras, interleukin-10, cellular
retinoic-acid-binding protein, retroviral integrase, catalytic
domain, oncomodulin, 2 (hiv-2) protease, glutamate receptor ligand
binding core, calcium-binding protein, histidine-containing
phosphocarrier, cellulase e2, parvalbumin, ubiquitin,
triosephosphate isomerase, myoglobin, 2fe-2s ferredoxin,
endonuclease, glycera globin, lysozyme, goose, uracil-dna
glycosylase, lamprey globin, lysozyme, chicken, lumazine synthase,
hemoglobin (horse), profilin, hypothetical protein ybea, hemoglobin
(human), ribosomal protein, d-tyr trnatyr deacylase,
erythrocruorin, integrase, coagulation factor x, leukemia
inhibitory factor, glycosylasparaginase, carboxypeptidase
inhibitor, mitochondrial cytochrome c, astacin, mhc class II p41
invariant chain fragment, cytochrome c2, diphtheria toxin,
methylamine dehydrogenase, phospholipase, nadh oxidase, ovomucoid
iii domain, dna-binding protein, signal transduction protein, ldl
receptor, pheromone, ferredoxin ii, peptostreptococcus,
anti-platelet protein, phosphatidylinositol 3-kinase, ferredoxin
ii, desulfovibrio gigas, crambin, .alpha.-spectrin, sh3 domain,
1c0ba ribonuclease a, heat-stable enterotoxin b, signal
transduction protein, c-src tyrosine kinase, tgf-.beta.3, seed
storage protein 7S vicilin, prion protein domain, rubredoxin,
clostridium pasteurianum, immunoglobulin, abrin a-chain,
rubredoxin, archaeon pyrococcus furiosus, cd2, first domain,
platelet factor 4, fasciculin, macromycin, chemokine (growth
factor), plasminogen, cohesin-2 domain, (pro)cathepsin b,
ectothiorhodospira vacuolata, glucose-specific factor iii,
actinidin, hipip, allochromatium vinosum, staphylococcal nuclease,
chymotrypsin inhibitor CI-2, collagen type VI, dna-binding protein,
fk-506 binding, and factor IX.
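The "at least 80% identical" criterion for variant proteins can be
checked computationally once two sequences have been aligned. The
Python sketch below is a minimal illustration that assumes the
sequences are already aligned with "-" as the gap character and that
identity is counted over ungapped columns only; other conventions
exist, and this disclosure does not prescribe one. The function names
and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def is_variant(aligned_a, aligned_b, cutoff=80.0):
    """True if the two aligned sequences meet the identity cutoff."""
    return percent_identity(aligned_a, aligned_b) >= cutoff


# Toy example: one mismatch in ten aligned positions.
print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))
print(is_variant("MKTAYIAKQR", "MKTAYLAKQR"))
```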
[0110] The amino acid sequences of a variety of exemplary GPCR
fusion proteins that can be employed herein are set forth in FIG.
6. Given these sequences, suitable fusion proteins could be
designed using other GPCRs.
Nucleic Acids
[0111] A nucleic acid comprising a nucleotide sequence encoding a
subject fusion protein is also provided. A subject nucleic acid may
be produced by any method. Since the genetic code and recombinant
techniques for manipulating nucleic acid are known, the design and
production of nucleic acids encoding a subject fusion protein is
well within the skill of an artisan. In certain embodiments,
standard recombinant DNA technology (Ausubel, et al, Short
Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995;
Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second
Edition, (1989) Cold Spring Harbor, N.Y.) methods are used.
[0112] For example, site directed mutagenesis and subcloning may be
used to introduce/delete/substitute nucleic acid residues in a
polynucleotide encoding a GPCR. In other embodiments, PCR may be
used. Nucleic acids encoding a polypeptide of interest may also be
made by chemical synthesis entirely from oligonucleotides (e.g.,
Cello et al., Science (2002) 297:1016-8).
[0113] In certain embodiments, the codons of the nucleic acids
encoding polypeptides of interest are optimized for expression in
cells of a particular species, particularly a mammalian, e.g.,
human, species. Vectors comprising a subject nucleic acid are also
provided. A vector may contain a subject nucleic acid, operably
linked to a promoter.
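Codon optimization of this kind can be sketched as a back-translation
that picks one preferred codon per amino acid. The Python example
below is illustrative only: the codon table is an assumed set of
codons commonly regarded as frequent in human genes and is not taken
from this disclosure, and practical codon optimization typically also
considers GC content, repeats and restriction sites.

```python
# Assumed preferred-codon table (one codon per amino acid); illustrative only.
PREFERRED_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGA",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def back_translate(protein, table=PREFERRED_CODON, add_stop=True):
    """Encode a protein sequence using one preferred codon per residue."""
    try:
        codons = [table[aa] for aa in protein.upper()]
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None
    if add_stop:
        codons.append("TGA")  # one of the three standard stop codons
    return "".join(codons)


# Toy peptide, not a sequence from the figures.
print(back_translate("MDYK"))
```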
[0114] A host cell (e.g., a host bacterial, mammalian, insect,
plant or yeast cell) comprising a subject nucleic acid is also
provided, as well as a culture of subject cells. The culture of cells
may contain growth medium, as well as a population of the cells.
The cells may be employed to make the subject fusion protein in a
method that includes culturing the cells to provide for production
of the fusion protein. In many embodiments, the fusion protein is
directed to the plasma membrane of the cell, and is folded into its
active form by the cell.
[0115] The native form of a subject fusion protein may be isolated
from a subject cell by conventional technology, e.g., by
precipitation, centrifugation, affinity, filtration or any other
method known in the art. For example, affinity chromatography
(Tilbeurgh et al., (1984) FEBS Lett. 16:215); ion-exchange
chromatographic methods (Goyal et al., (1991) Biores. Technol.
36:37; Fliess et al., (1983) Eur. J. Appl. Microbiol. Biotechnol.
17:314; Bhikhabhai et al., (1984) J. Appl. Biochem. 6:336; and
Ellouz et al., (1987) Chromatography 396:307), including
ion-exchange using materials with high resolution power (Medve et
al., (1998) J. Chromatography A 808:153); hydrophobic interaction
chromatography (Tomaz and Queiroz, (1999) J. Chromatography A
865:123); two-phase partitioning (Brumbauer, et al., (1999)
Bioseparation 7:287); ethanol precipitation; reverse phase HPLC;
chromatography on silica or on an anion-exchange resin such as
DEAE; chromatofocusing; SDS-PAGE; ammonium sulfate precipitation;
or size exclusion chromatography using, e.g., Sephadex G-75, may be
employed.
[0116] In particular embodiments, the GPCR, e.g., the N- or
C-terminus of the GPCR or an external loop of the GPCR, may be
tagged with an affinity moiety, e.g., a his tag, GST, MBP, flag
tag, or other antibody binding site, in order to facilitate
purification of the GPCR fusion protein by affinity methods. Before
crystallization, a subject fusion protein may be assayed to
determine if the fusion protein is active, e.g., can bind ligand
and change in conformation upon ligand binding, and if the fusion
protein is resistant to protease cleavage. Such assays are well
known in the art.
[0117] In particular embodiments, and as illustrated in FIG. 3, the
protein encoded by the nucleic acid contains, from N-terminus to
C-terminus: a) a signal sequence; b) an affinity, e.g., epitope,
tag; c) a protease cleavage site; d) an autonomously folding stable
domain; and e) a GPCR. During secretion, the signal peptide is
cleaved from the protein and the resulting protein can be purified
using the affinity tag. The affinity tag can be cleaved from the
GPCR fusion protein prior to use.
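The N-terminus to C-terminus arrangement and the two processing steps
can be sketched in Python as follows. The FLAG (DYKDDDDK) and TEV
(ENLYFQG, cleaved between Q and G) sequences are the canonical ones
and are assumed here; the signal, domain and GPCR strings are
artificial placeholders rather than sequences from the figures, and
all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    sequence: str


def assemble(parts):
    """Concatenate parts in N-terminus to C-terminus order."""
    return "".join(p.sequence for p in parts)


def remove_signal(protein, signal):
    """Model signal peptide removal during secretion."""
    if not protein.startswith(signal):
        raise ValueError("protein does not start with the signal sequence")
    return protein[len(signal):]


def cleave_tev(protein, site="ENLYFQG"):
    """Cleave between Q and G of the site; return (tag fragment, product)."""
    idx = protein.find(site)
    if idx < 0:
        raise ValueError("no protease site found")
    cut = idx + len(site) - 1
    return protein[:cut], protein[cut:]


parts = [
    Part("signal sequence", "MLLLLLLLLA"),  # placeholder
    Part("affinity tag (FLAG)", "DYKDDDDK"),
    Part("protease site (TEV)", "ENLYFQG"),
    Part("stable domain", "TTTTTTTTTT"),    # placeholder
    Part("GPCR", "WWWWWWWWWW"),             # placeholder
]
full = assemble(parts)
secreted = remove_signal(full, parts[0].sequence)
tag_fragment, product = cleave_tev(secreted)
print(tag_fragment, product)
```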
Crystallization Methods
[0118] Prior to crystallization, the isolated fusion protein may
optionally be combined with a variety of moieties (e.g., an
antibody (see, e.g., US20090148510, Rasmussen et al Nature 2007
450: 383-388 and Day et al Nature Methods 2007 4:927-9), a
modulator (such as an agonist, an antagonist, a native ligand,
etc., as described in, e.g., Rosenbaum Science. 2007 318:1266-73
etc), another GPCR, the G protein to which the GPCR couples or
another protein, e.g., Gs, Gi, or Gq), that bind to the GPCR, to
produce a complex. The complex is then crystallized and the atomic
coordinates of the complex can be obtained.
[0119] A subject fusion protein may be crystallized using any of a
variety of crystallization methods, many of which are reviewed in
Caffrey Membrane protein crystallization. J Struct. Biol. 2003
142:108-32, including those that employ detergent micelles,
bicelles and lipidic cubic phase (LCP). In general terms, the
methods are lipid-based methods that include adding lipid to the
fusion protein prior to crystallization. Such methods have
previously been used to crystallize other membrane proteins. Many
of these methods, including the lipidic cubic phase crystallization
method and the bicelle crystallization method, exploit the
spontaneous self-assembling properties of lipids and detergent as
vesicles (vesicle-fusion method), discoidal micelles (bicelle
method), and liquid crystals or mesophases (in meso or cubic-phase
method). Lipidic cubic phase crystallization methods are described
in, for example: Landau et al, Lipidic cubic phases: a novel
concept for the crystallization of membrane proteins. Proc. Natl.
Acad. Sci. 1996 93:14532-5; Gouaux, It's not just a phase:
crystallization and X-ray structure determination of
bacteriorhodopsin in lipidic cubic phases. Structure. 1998 6:5-10;
Rummel et al, Lipidic Cubic Phases: New Matrices for the
Three-Dimensional Crystallization of Membrane Proteins. J. Struct.
Biol. 1998 121:82-91; and Nollert et al Lipidic cubic phases as
matrices for membrane protein crystallization Methods. 2004
34:348-53, which publications are incorporated by reference for
disclosure of those methods. Bicelle crystallization methods are
described in, for example: Faham et al Crystallization of
bacteriorhodopsin from bicelle formulations at room temperature.
Protein Sci. 2005 14:836-40 and Faham et al, Bicelle
crystallization: a new method for crystallizing membrane proteins
yields a monomeric bacteriorhodopsin structure. J Mol Biol. 2002
Feb. 8; 316(1):1-6, which publications are incorporated by
reference for disclosure of those methods.
[0120] Computer Models and Computer Systems
[0121] In certain embodiments, the above-described computer
readable medium may further comprise programming for displaying a
molecular model of a GPCR or a complex of the same crystallized by
the instant method, programming for identifying a compound that
binds to the GPCR and/or a database of structures of known test
compounds, for example. A computer system comprising the
computer-readable medium is also provided. The model may be
displayed to a user via a display, e.g., a computer monitor, for
example.
[0122] The atomic coordinates may be employed in conjunction with a
modeling program to provide a model of a GPCR or a complex of
the same. As used herein, the term "model" refers to a
representation in a tangible medium of the three dimensional
structure of a GPCR or a complex of the same. For example, a
model can be a representation of the three dimensional structure in
an electronic file, on a display, e.g., a computer screen, on a
piece of paper (i.e., on a two dimensional medium), and/or as a
ball-and-stick figure. Physical three-dimensional models are
tangible and include, but are not limited to, stick models and
space-filling models. The phrase "imaging the model on a computer
screen" refers to the ability to express (or represent) and
manipulate the model on a computer screen using appropriate
computer hardware and software technology known to those skilled in
the art. Such technology is available from a variety of sources
including, for example, Evans and Sutherland, Salt Lake City, Utah,
and Biosym Technologies, San Diego, Calif. The phrase "providing a
picture of the model" refers to the ability to generate a "hard
copy" of the model. Hard copies include both motion and still
pictures. Computer screen images and pictures of the model can be
visualized in a number of formats including space-filling
representations, backbone traces, ribbon diagrams, and electron
density maps. Exemplary modeling programs include, but are not
limited to PYMOL, GRASP, or O software, for example.
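As a minimal illustration of how a program might read atomic
coordinates before building a model, the Python sketch below parses
ATOM and HETATM records laid out in the fixed-column PDB format. It
is an assumption that the coordinates are stored in that format; the
Atom type and both function names are hypothetical, and the demo
record is generated by a simplified helper rather than taken from a
real structure.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str
    resname: str
    chain: str
    resseq: int
    x: float
    y: float
    z: float
    bfactor: float


def format_atom_line(serial, name, resname, chain, resseq, x, y, z, occ, b):
    """Write one simplified fixed-column ATOM record (helper for the demo)."""
    return (f"{'ATOM':<6}{serial:>5} {name:<4} {resname:>3} {chain}"
            f"{resseq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


def parse_pdb_atoms(text: str) -> List[Atom]:
    """Extract atoms from ATOM/HETATM records using PDB column positions."""
    atoms = []
    for line in text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),
                resname=line[17:20].strip(),
                chain=line[21],
                resseq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
                bfactor=float(line[60:66]),
            ))
    return atoms


# Toy record with arbitrary coordinates.
demo = format_atom_line(1, "CA", "ASP", "A", 29, 1.5, -2.25, 3.0, 1.0, 42.5)
print(parse_pdb_atoms(demo + "\nEND"))
```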
[0123] In another embodiment, the invention provides a computer
system having a memory comprising the above-described atomic
coordinates; and a processor in communication with the memory,
wherein the processor generates a molecular model having a three
dimensional structure representative of a GPCR or a complex of the
same. The processor can be adapted for identifying a candidate
compound having a structure that is capable of binding to the
GPCR or a complex of the same, for example.
[0124] In the present disclosure, the processor may execute a
modeling program which accesses data representative of the GPCR
structure. In addition, the processor also can execute another
program, a compound modeling program, which uses the
three-dimensional model of the GPCR or a complex of the same to
identify compounds having a chemical structure that binds to the
GPCR or a complex of the same. In one embodiment the compound
identification program and the structure modeling program are the
same program. In another embodiment, the compound identification
program and the structure modeling program are different programs,
which programs may be stored on the same or different storage
medium.
[0125] A number of exemplary public and commercial sources of
libraries of compound structures are available, for example the
Cambridge Structural Database (CSD), the Available Chemicals Directory
(ACD) from the company MDL (US), ZINC (Irwin and Shoichet, J. Chem. Inf.
Model. (2005) 45:177-82) as well as various electronic catalogues
of publicly available compounds such as the National Cancer
Institute (NCI, US) catalogue, ComGenex catalogue (Budapest,
Hungary), and Asinex (Moscow, Russia). Such libraries may be used
to allow computer-based docking of many compounds in order to
identify those with potential to interact with the GPCR using the
atomic coordinates described herein.
[0126] In certain cases, the method may further comprise testing
a compound to determine if it binds and/or modulates the GPCR or a
complex of the same, using the atomic coordinates provided herein.
In some embodiments, the method may further comprise obtaining the
compound (e.g., purchasing or synthesizing the compound) and
testing the compound to determine if it modulates (e.g., activates
or inhibits) the GPCR, e.g., acts as an agonist, antagonist or
inverse agonist of the GPCR.
[0127] In some embodiments, the method employs a docking program
that computationally tests known compounds for binding to the GPCR
or complex of the same. Structural databases of known compounds are
known in the art. In certain cases, compounds that are known to
bind and modulate the GPCR or complex of the same may be
computationally tested for binding to GPCR or complex of the same,
e.g., in order to identify a binding site and/or facilitate the
identification of active variants of an existing compound. Such
compounds include compounds that are known to be agonists of the
GPCR. In other cases, the method may include designing a compound
that binds to the GPCR, either de novo, or by modifying an existing
compound that is known to bind to the GPCR.
[0128] A method that comprises receiving a set of atomic
coordinates for the GPCR or complex of the same; and identifying a
compound that binds to said GPCR or complex of the same using the
coordinates is also provided, as is a method comprising: forwarding
to a remote location a set of atomic coordinates for the GPCR or
complex of the same; and receiving the identity of a compound that
binds to the GPCR or complex of the same.
[0129] In certain embodiments, a computer system comprising a
memory comprising the atomic coordinates of a GPCR or complex of
the same is provided. The atomic coordinates are useful as models
for rationally identifying compounds that bind to the GPCR or
complex of the same. Such compounds may be designed either de novo,
or by modification of a known compound, for example. In other
cases, binding compounds may be identified by testing known
compounds to determine if they "dock" with a molecular model of the
GPCR. Such docking methods are generally well known in the art.
[0130] The structure data provided can be used in conjunction with
computer-modeling techniques to develop models of ligand-binding
sites on the GPCR or complex of the same selected by analysis of
the crystal structure data. The site models characterize the
three-dimensional topography of the site surface, as well as factors
including van der Waals contacts, electrostatic interactions, and
hydrogen-bonding opportunities. Computer simulation techniques are
then used to map interaction positions for functional groups
including but not limited to protons, hydroxyl groups, amine
groups, divalent cations, aromatic and aliphatic functional groups,
amide groups, alcohol groups, etc. that are designed to interact
with the model site. These groups may be designed into a candidate
compound with the expectation that the candidate compound will
specifically bind to the site.
[0131] The ability of a candidate compound to bind to a GPCR can be
analyzed prior to actual synthesis using computer modeling
techniques. Only those candidates that are indicated by computer
modeling to bind the target with sufficient binding energy (i.e.,
binding energy corresponding to a dissociation constant with the
target on the order of 10.sup.-2 M or tighter) may be synthesized
and tested for their ability to bind to and modulate the GPCR. Such
assays are known to those of skill in the art. The computational
evaluation step thus avoids the unnecessary synthesis of compounds
that are unlikely to bind the GPCR with adequate affinity.
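The correspondence between a binding energy and a dissociation
constant follows the standard relation of binding free energy to Kd
(free energy = RT ln Kd, with Kd expressed relative to a 1 M standard
state). The Python sketch below applies that relation as a filter;
the 298.15 K temperature, the function names and the example energies
are assumptions for illustration, and the predicted energies would in
practice come from whichever modeling program is used.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) for a dissociation constant."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def passes_threshold(predicted_dg, kd_cutoff_molar=1e-2, temperature_k=298.15):
    """True if the predicted energy is at least as favorable as the cutoff."""
    return predicted_dg <= binding_free_energy(kd_cutoff_molar, temperature_k)


# A Kd of 10^-2 M corresponds to roughly -2.7 kcal/mol at 298.15 K.
print(round(binding_free_energy(1e-2), 2))
print(passes_threshold(-8.0), passes_threshold(-1.0))
```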
[0132] A candidate compound may be computationally identified by
means of a series of steps in which chemical entities or fragments
are screened and selected for their ability to associate with
individual binding target sites on the GPCR. One skilled in the art
may use one of several methods to screen chemical entities or
fragments for their ability to associate with the GPCR, and more
particularly with target sites on the GPCR. The process may begin
by visual inspection of, for example, a target site on a computer
screen, based on the coordinates, or a subset of those coordinates.
Selected fragments or chemical entities may then be positioned in a
variety of orientations or "docked" within a target site of the
GPCR as defined from analysis of the crystal structure data.
Docking may be accomplished using software such as Quanta
(Molecular Simulations, Inc., San Diego, Calif.) and Sybyl (Tripos,
Inc. St. Louis, Mo.) followed by energy minimization and molecular
dynamics with standard molecular mechanics forcefields such as
CHARMM (Molecular Simulations, Inc., San Diego, Calif.) and AMBER
(University of California at San Francisco).
[0133] Specialized computer programs may also assist in the process
of selecting fragments or chemical entities. These include but are
not limited to: GRID (Goodford, P. J., "A Computational Procedure
for Determining Energetically Favorable Binding Sites on
Biologically Important Macromolecules," J. Med. Chem., 28, pp.
849-857 (1985)); GRID is available from Oxford University, Oxford,
UK; MCSS (Miranker, A. and M. Karplus, "Functionality Maps of
Binding Sites: A Multiple Copy Simultaneous Search Method,"
Proteins: Structure, Function and Genetics, 11, pp. 29-34 (1991));
MCSS is available from Molecular Simulations, Inc., San Diego,
Calif.; AUTODOCK (Goodsell, D. S. and A. J. Olson, "Automated
Docking of Substrates to Proteins by Simulated Annealing,"
Proteins: Structure, Function, and Genetics, 8, pp. 195-202
(1990)); AUTODOCK is available from Scripps Research Institute, La
Jolla, Calif.; DOCK (Kuntz, I. D., et al. "A Geometric Approach to
Macromolecule-Ligand Interactions," J. Mol. Biol., 161, pp. 269-288
(1982)); DOCK is available from University of California, San
Francisco, Calif.; CERIUS II (available from Molecular Simulations,
Inc., San Diego, Calif.); and FlexX (Rarey, et al. J. Mol. Biol.
261, pp. 470-489 (1996)).
[0134] Also provided is a method of determining a crystal
structure. This method may comprise receiving an above described
fusion protein, crystallizing the fusion protein to produce a
crystal; and obtaining atomic coordinates of the fusion protein
from the crystal. The fusion protein may be received from a remote
location (e.g., a different laboratory in the same building or
campus, or from a different campus or city), and, in certain
embodiments, the method may also comprise transmitting the atomic
coordinates, e.g., by mail, e-mail or using the internet, to the
remote location or to a third party.
[0135] In other embodiments, the method may comprise forwarding a
fusion protein to a remote location where the protein may be
crystallized and analyzed, and receiving the atomic coordinates of
the fusion protein.
[0136] In some embodiments a method for displaying the three
dimensional structure of a GPCR on a computer system is provided.
This method may comprise: a) accessing a file containing atomic
coordinates of a GPCR using a computer system that comprises a
modeling program, wherein the atomic coordinates are produced by
subjecting crystals of a GPCR fusion protein to X-ray diffraction
analysis, wherein the GPCR fusion protein is described above, b)
modeling the atomic coordinates on the computer system using the
modeling program to produce a model of the three dimensional
structure of at least a portion of the GPCR; and c) displaying
the model of the three dimensional structure on the computer
system. In certain cases, the crystals also contain a ligand for the
GPCR, and the method further comprises identifying the binding site
for the ligand in the GPCR using the model. This method may further
comprise identifying the amino acids in the binding site. This
method may further comprise determining whether a test compound
docks with the binding site using the model. This method may
further comprise analyzing the packing between the test compound
and surrounding amino acids in said binding site. In some
embodiments, the analyzing may comprise calculating polar contacts
between the ligand and the model.
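Identifying the amino acids in the binding site can be approximated
by collecting every residue that has at least one atom within a
distance cutoff of any ligand atom. The Python sketch below assumes
coordinates in angstroms and a 4.0 angstrom cutoff; the cutoff, the
function name and the toy residue labels and coordinates are
illustrative assumptions, not values from this disclosure.

```python
import math


def binding_site_residues(protein_atoms, ligand_coords, cutoff=4.0):
    """Return sorted labels of residues with an atom within `cutoff` of the ligand.

    protein_atoms: iterable of (residue_label, (x, y, z)) pairs.
    ligand_coords: list of (x, y, z) tuples for the ligand atoms.
    """
    site = set()
    for label, pcoord in protein_atoms:
        if any(math.dist(pcoord, lcoord) <= cutoff for lcoord in ligand_coords):
            site.add(label)
    return sorted(site)


# Toy coordinates: two residues lie near the ligand atom, one is far away.
protein = [
    ("ASP113", (0.0, 0.0, 0.0)),
    ("SER203", (3.0, 0.0, 0.0)),
    ("LEU300", (20.0, 0.0, 0.0)),
]
ligand = [(1.0, 0.0, 0.0)]
print(binding_site_residues(protein, ligand))
```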
[0137] In particular embodiments, a method for analyzing the three
dimensional structure of a GPCR on a computer system is provided.
This method may involve: a) accessing a file containing atomic
coordinates of a GPCR using a computer system that comprises a
modeling program, wherein the atomic coordinates are produced by
subjecting crystals of a GPCR fusion protein to X-ray diffraction
analysis, wherein the GPCR fusion protein is described above, b)
modeling the atomic coordinates on the computer system using the
modeling program to produce a model of the three dimensional
structure of at least a portion of the GPCR; and c) displaying the
model of the three dimensional structure on the computer system. In
certain cases, the crystals contain a ligand for the GPCR (e.g., a
known inhibitor, natural ligand or agonist, etc.), and the method
further comprises identifying the binding site for the ligand in
the GPCR using the model. The analyzing step may comprise
identifying amino acids that form polar contacts between the ligand
and amino acids in the binding site, using the model. This method
may further comprise determining whether a test compound, e.g., a
candidate pharmaceutical, docks with the binding site using the
model. The method may comprise analyzing the packing of the test
compound and amino acids in the binding site, using the model. This
method may further comprise making the modulator and testing it on
the GPCR in the presence of a ligand for the GPCR.
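One simple way to list candidate polar contacts is to pair nitrogen
and oxygen atoms of the protein with nitrogen and oxygen atoms of the
ligand that lie within a distance limit. The Python sketch below uses
a 3.5 angstrom limit, which is a commonly used heuristic and an
assumption here; it ignores hydrogen positions and angles, and the
atom labels and coordinates are toy values.

```python
import math

POLAR_ELEMENTS = {"N", "O"}


def polar_contacts(protein_atoms, ligand_atoms, max_dist=3.5):
    """List (protein_label, ligand_label, distance) for close N/O atom pairs.

    Each atom is a (label, element, (x, y, z)) tuple; distances in angstroms.
    """
    contacts = []
    for p_label, p_elem, p_xyz in protein_atoms:
        if p_elem not in POLAR_ELEMENTS:
            continue
        for l_label, l_elem, l_xyz in ligand_atoms:
            if l_elem not in POLAR_ELEMENTS:
                continue
            d = math.dist(p_xyz, l_xyz)
            if d <= max_dist:
                contacts.append((p_label, l_label, round(d, 2)))
    return contacts


protein_atoms = [
    ("ASP113:OD1", "O", (0.0, 0.0, 0.0)),
    ("VAL114:CB", "C", (1.0, 1.0, 0.0)),
    ("SER203:OG", "O", (10.0, 0.0, 0.0)),
]
ligand_atoms = [("LIG:N1", "N", (2.8, 0.0, 0.0)), ("LIG:C2", "C", (0.5, 0.5, 0.0))]
print(polar_contacts(protein_atoms, ligand_atoms))
```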
[0138] In order to further illustrate the present invention, the
following specific examples are given with the understanding that
they are being offered to illustrate the present invention and
should not be construed in any way as limiting its scope.
Materials, Methods and Results I
[0139] Molecular Biology for the Generation of N-T4L Fused
.beta.2AR Construct FLAAT
[0140] The previously generated construct .beta..sub.2AR365 was
used as the template for further modification to generate the N-T4L
fused .beta..sub.2AR construct FLAAT. In this .beta..sub.2AR365
template construct, the coding sequence of human .beta..sub.2AR
encompassing Gly2 to Gly365 was cloned into the pFastbac1 Sf9
expression vector (Invitrogen). The HA signal peptide followed by a
FLAG epitope tag and a tobacco etch virus (TEV) protease recognition
sequence was added directly to the N-terminus of the receptor for
expression and purification purposes. A point mutation, N187E, was
also introduced into the construct to disrupt an unwanted
glycosylation site.
[0141] The DNA cassette encoding the full-length T4 lysozyme (T4L; WT*,
C54T, C97A) with 2 additional alanines attached at the C-terminus
was made and amplified by PCR using previously described construct
.beta..sub.2AR-T4L (Rasmussen et al. Crystal structure of the human
beta2 adrenergic G-protein-coupled receptor. Nature. 2007 450:383
and Cherezov et al High-resolution crystal structure of an
engineered human beta2-adrenergic G protein-coupled receptor.
Science. 2007 318:1258-65) as the template and synthetic
oligonucleotides as primers. This cassette was inserted into the
.beta..sub.2AR365 construct between the end of the TEV protease
recognition sequence and Asp29 of the receptor by using the
QuikChange multi protocol (Stratagene). Two point mutations, M96T and
M98T, were also introduced into the construct based on the
QuikChange multi protocol using synthetic oligonucleotides as
mutation primers. The protein sequence of the entire fusion FLAAT
is shown in FIG. 5.
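Point mutations written in the form used above (wild-type residue,
position, replacement residue, e.g., C54T) can be applied to a
sequence programmatically, with a check that the stated wild-type
residue is actually present. The Python sketch below is illustrative:
it assumes 1-based positions that index directly into the supplied
sequence, whereas the numbering in this example follows the receptor
or lysozyme numbering, and the toy sequence is not FLAAT.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutations(sequence, mutations):
    """Apply mutations such as 'C3T' to a sequence (1-based positions)."""
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION.match(mutation)
        if not match:
            raise ValueError(f"cannot parse mutation: {mutation}")
        wild_type, new = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {mutation}")
        if residues[position - 1] != wild_type:
            raise ValueError(f"wild-type residue mismatch: {mutation}")
        residues[position - 1] = new
    return "".join(residues)


# Toy sequence with a cysteine at position 3 and a methionine at position 5.
print(apply_point_mutations("MACDM", ["C3T", "M5T"]))
```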
[0142] The entire FLAAT gene described above was further cloned
into the Best-Bac Sf9 expression vector pVL1393 (Expression Systems)
using the restriction enzyme digestion sites XbaI and EcoRI. The
final construct was confirmed by DNA sequencing.
[0143] Expression and Purification of FLAAT from
Baculovirus-Infected Sf9 Cells
[0144] Recombinant baculovirus was made from pVL1393-FLAAT using the
Best-Bac expression system, as described by the system protocol
(Expression Systems). FLAAT was expressed in Sf9 cells that were
infected with this baculovirus at a 1:50 dilution at a cell density
of 4 million/ml. 1 .mu.M of the receptor antagonist alprenolol was
included to enhance the receptor stability and yield. The infected
cells were harvested after 48 hours of incubation at 27.degree. C.
[0145] The harvested cells were lysed by vigorous stirring in 10
volumes of lysis buffer (10 mM TRIS-Cl pH 7.5, 2 mM EDTA)
complemented with the protease inhibitors Leupeptin (2.5 .mu.g/ml
final concentration, Sigma) and Benzamidine (160 .mu.g/ml final
concentration, Sigma) for 15 minutes. The FLAAT protein was
extracted from the cell membrane by thorough homogenization using
solubilization buffer (100 mM NaCl, 20 mM TRIS-Cl, pH 7.5, 1%
Dodecylmaltoside) complemented with Leupeptin and Benzamidine (2.5
.mu.g/ml and 160 .mu.g/ml final concentration, respectively). 10 ml
of solubilization buffer was used for each gram of cell pellet. The
Dodecylmaltoside (DDM)-solubilized FLAAT bearing the FLAG epitope
was then purified by M1 antibody affinity chromatography (Sigma).
Extensive washing using HLS buffer (100 mM NaCl, 20 mM HEPES pH
7.5, 0.1% DDM) was performed to get rid of alprenolol. The protein
was then eluted with HLS buffer complemented with 5 mM EDTA, 200
.mu.g free FLAG peptide and saturating concentration of cholesterol
hemisuccinate.
[0146] The eluted FLAAT was further purified by affinity
chromatography using Sepharose attached with Alprenolol as
previously described (Cherezov et al High-resolution crystal
structure of an engineered human beta2-adrenergic G protein-coupled
receptor. Science 2007 318:1258-65) in order to selectively isolate
functional FLAAT from non-functional protein. HHS buffer (350 mM
NaCl, 20 mM HEPES pH 7.5, 0.1% DDM) complemented with 300 .mu.M
alprenolol and saturating concentration of cholesterol
hemisuccinate was used to elute the protein. The eluted FLAAT bound
with Alprenolol was then re-applied to M1 resin, allowing either
washing off Alprenolol or exchanging Alprenolol with different
ligand (for example, full agonist BI167107). Unliganded FLAAT or
FLAAT bound with BI167107 was then eluted from M1 resin with HLS
buffer complemented with 5 mM EDTA, 200 mg/ml free FLAG peptide and
saturating concentration of cholesterol hemisuccinate. The FLAG
epitope tag of FLAAT was removed by treatment with tobacco etch
virus (TEV) protease (Invitrogen) for 3 hours at room temperature or
overnight at 4.degree. C. The purity of the final FLAAT is more
than 90% according to the result of SDS-PAGE electrophoresis.
[0147] Crystallization of the FLAAT-BI167107-NB80 Ternary
Complex
[0148] Nanobody80 (NB80) was expressed and purified as previously
described (Rasmussen et al, Structure of a nanobody-stabilized active
state of the .beta.(2) adrenoceptor. Nature. 2011 469:175-80.). The
untagged FLAAT bound with high affinity agonist BI167107 was
purified as described above. The purified FLAAT-BI167107 and NB80
were mixed at a 1:2 molar ratio. The FLAAT-BI167107-NB80 ternary
complex was then isolated from free NB80 by size exclusion
chromatography (SEC) using a Sephacryl S-200 column (GE Healthcare
Life Sciences) equilibrated in 100 mM NaCl, 10 mM HEPES pH 7.5,
0.1% DDM and 10 .mu.M BI167107. The same buffer was used as the
running buffer for SEC.
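The mass of nanobody needed to reach a 1:2 receptor-to-nanobody molar
ratio follows from the two molecular weights. The Python sketch below
shows the arithmetic; the molecular weights in the example (50,000
and 15,000 g/mol) are placeholder values chosen for illustration, not
measured values for FLAAT or NB80, and the function name is
hypothetical.

```python
def partner_mass_mg(receptor_mass_mg, receptor_mw, partner_mw, molar_ratio=2.0):
    """Mass of binding partner (mg) giving `molar_ratio` moles per mole of receptor.

    receptor_mw and partner_mw are molecular weights in g/mol.
    """
    if receptor_mw <= 0 or partner_mw <= 0:
        raise ValueError("molecular weights must be positive")
    receptor_mmol = receptor_mass_mg / receptor_mw  # mg / (g/mol) = mmol
    return receptor_mmol * molar_ratio * partner_mw


# Placeholder molecular weights; substitute the values for the actual proteins.
print(partner_mass_mg(1.0, receptor_mw=50_000, partner_mw=15_000))
```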
[0149] The FLAAT-BI167107-NB80 complex after SEC was concentrated
to a final concentration of 60 mg/ml using a Vivaspin concentrator
(Sartorius-Stedim). The complex was crystallized using the lipidic
cubic phase (LCP) method as previously described (Rosenbaum et al, GPCR
engineering yields high-resolution structural insights into
beta2-adrenergic receptor function. Science. 2007 318: 1266-73.).
The protein complex was first mixed with the lipid monoolein at a
1:1.5 mass ratio at room temperature. A 0.1 .mu.l drop of the
protein-lipid mixture was placed in each well of a 24-well glass
sandwich plate. The drop was then overlaid with 0.8 .mu.l of precipitant
and the well was sealed with a glass coverslip. By using this method,
the FLAAT-BI167107-NB80 ternary complex was crystallized in 31%-35%
PEG400 (v/v) and 0.1M Tris-Cl, pH8.0 after 4 days of incubation at
20.degree. C.
Materials and Methods II
[0150] Expression and Purification of .beta.2AR, Gs Heterotrimer,
and Nanobody-35
[0151] An N-terminally fused T4 lysozyme-.beta.2AR construct
truncated at position 365 (T4L-.beta.2AR, described in
detail below) was expressed in Sf9 insect cell cultures infected
with recombinant baculovirus (BestBac, Expression Systems), and
solubilized in n-Dodecyl-.beta.-D-maltoside (DDM) according to
methods described previously (Kobilka, Amino and carboxyl terminal
modifications to facilitate the production and purification of a G
protein-coupled receptor. Anal Biochem 1995 231, 269-271; see FIG.
16 for purification overview). A .beta.2AR construct truncated
after residue 365 (.beta.2AR-365) was used for the majority of the
analytical experiments and for deuterium exchange experiments. M1
Flag affinity chromatography (Sigma) served as the initial
purification step followed by alprenolol-Sepharose chromatography
for selection of functional receptor. A subsequent M1 Flag affinity
chromatography step was used to exchange receptor-bound alprenolol
for high-affinity agonist BI-167107. The agonist-bound receptor was
eluted, dialyzed against buffer (20 mM HEPES, pH 7.5, 100 mM NaCl,
0.1% DDM and 10 .mu.M BI-167107), treated with lambda phosphatase
(New England Biolabs), and concentrated to approximately 50 mg
ml.sup.-1 with a 50 kDa molecular weight cut off (MWCO) Millipore
concentrator. Prior to spin concentration, the .beta.2AR-365
construct, but not T4L-.beta.2AR, was treated with PNGaseF (New
England Biolabs) to remove amino-terminal N-linked glycosylation.
The purified receptor was routinely analyzed by SDS-PAGE/Coomassie
brilliant blue staining (see FIG. 17a).
[0152] Bovine G.alpha.s short, His6-bovine G.beta.1, and bovine
G.gamma.2 were expressed in HighFive insect cells (Invitrogen)
grown in Insect Xpress serum-free media (Lonza). Cultures were
grown to a density of 1.5 million cells per ml and then infected
with three separate Autographa californica nuclear polyhedrosis
viruses, each containing the gene for one of the G protein subunits, at
a 1:1 multiplicity of infection (the viruses were a generous gift
from Dr. Alfred Gilman). After 40-48 hours of incubation the
infected cells were harvested by centrifugation and resuspended in
75 ml lysis buffer (50 mM HEPES, pH 8.0, 65 mM NaCl, 1.1 mM
MgCl.sub.2, 1 mM EDTA, 1.times.PTT (35 .mu.g/ml
phenylmethanesulfonyl fluoride, 32 .mu.g/ml tosyl phenylalanyl
chloromethyl ketone, 32 .mu.g/ml tosyl lysyl chloromethyl ketone),
1.times. LS (3.2 .mu.g/ml leupeptin and 3.2 .mu.g/ml soybean
trypsin inhibitor), 5 mM .beta.-mercaptoethanol (.beta.-ME), and 10
.mu.M GDP) per liter of culture volume. The suspension was
pressurized with 600 psi N.sub.2 for 40 minutes in a nitrogen
cavitation bomb (Parr Instrument Company). After depressurization,
the lysate was centrifuged to remove nuclei and unlysed cells, and
then ultracentrifuged at 180,000.times.g for 40 minutes. The
pelleted membranes were resuspended in 30 ml wash buffer (50 mM
HEPES, pH 8.0, 50 mM NaCl, 100 .mu.M MgCl.sub.2, 1.times.PTT,
1.times. LS, 5 mM .beta.-ME, 10 .mu.M GDP) per liter culture volume
using a Dounce homogenizer and centrifuged again at 180,000.times.g
for 40 minutes. The washed pellet was resuspended in a minimal
volume of wash buffer and flash frozen with liquid nitrogen.
[0153] The frozen membranes were thawed and diluted to a total
protein concentration of 5 mg/ml with fresh wash buffer. Sodium
cholate detergent was added to the suspension at a final
concentration of 1.0%, MgCl.sub.2 was added to a final
concentration of 5 mM, and 0.05 mg of purified protein phosphatase
5 (prepared in house) was added per liter of culture volume. The
sample was stirred on ice for 40 minutes, and then centrifuged at
180,000.times.g for 40 minutes to remove insoluble debris. The
supernatant was diluted 5-fold with Ni-NTA load buffer (20 mM
HEPES, pH 8.0, 363 mM NaCl, 1.25 mM MgCl.sub.2, 6.25 mM imidazole,
0.2% Anzergent 3-12, 1.times.PTT, 1.times. LS, 5 mM .beta.-ME, 10
.mu.M GDP), taking care to add the buffer slowly to avoid dropping
the cholate concentration below its critical micelle concentration
too quickly. 3 ml of Ni-NTA resin (Qiagen) pre-equilibrated in
Ni-NTA wash buffer 1 (20 mM HEPES, pH 8.0, 300 mM NaCl, 2 mM
MgCl.sub.2, 5 mM imidazole, 0.2% Cholate, 0.15% Anzergent 3-12,
1.times.PTT, 1.times.LS, 5 mM .beta.-ME, 10 .mu.M GDP) per liter
culture volume was added and the sample was stirred on ice for 20
minutes. The resin was collected into a gravity column and washed
with 4.times. column volumes of Ni-NTA wash buffer 1, Ni-NTA wash
buffer 2 (20 mM HEPES, pH 8.0, 50 mM NaCl, 1 mM MgCl.sub.2, 10 mM
imidazole, 0.15% Anzergent 3-12, 0.1% DDM, 1.times.PTT, 1.times.
LS, 5 mM .beta.-ME, 10 .mu.M GDP), and Ni-NTA wash buffer 3 (20 mM
HEPES, pH 8.0, 50 mM NaCl, 1 mM MgCl.sub.2, 5 mM imidazole, 0.1%
DDM, 1.times.PTT, 1.times. LS, 5 mM .beta.-ME, 10 .mu.M GDP). The
protein was eluted with Ni-NTA elution buffer (20 mM HEPES, pH 8.0,
40 mM NaCl, 1 mM MgCl.sub.2, 200 mM imidazole, 0.1% DDM, 1.times.PTT,
1.times. LS, 5 mM .beta.-ME, 10 .mu.M GDP). Protein-containing
fractions were pooled and MnCl.sub.2 was added to a final
concentration of 100 .mu.M. Fifty .mu.g of purified lambda protein
phosphatase (prepared in house) was added per liter of culture
volume and the eluate was incubated on ice with stirring for 30
minutes. The eluate was passed through a 0.22 .mu.m filter and
loaded directly onto a MonoQ HR 16/10 column (GE Healthcare)
equilibrated in MonoQ buffer A (20 mM HEPES, pH 8.0, 50 mM NaCl,
100 .mu.M MgCl.sub.2, 0.1% DDM, 5 mM .beta.-ME, 1.times.PTT). The
column was washed with 150 ml buffer A at 5 ml/min and bound
proteins were eluted over 350 ml with a linear gradient up to 28%
MonoQ buffer B (same as buffer A except with 1 M NaCl). Fractions
were collected in tubes spotted with enough GDP to make a final
concentration of 10 .mu.M. The Gs containing fractions were
concentrated to 2 ml using a stirred ultrafiltration cell (Amicon)
with a 10 kDa NMWL regenerated cellulose membrane (Millipore). The
concentrated sample was run on a Superdex 200 prep grade XK 16/70
column (GE Healthcare) equilibrated in S200 buffer (20 mM HEPES, pH
8.0, 100 mM NaCl, 1.1 mM MgCl.sub.2, 1 mM EDTA, 0.012% DDM, 100
.mu.M TCEP, 2 .mu.M GDP). The fractions containing pure Gs were
pooled, glycerol was added to 10% final concentration, and then the
protein was concentrated to at least 10 mg/ml using a 30 kDa MWCO
centrifugal ultrafiltration device (Millipore). The concentrated
sample was then aliquoted, flash frozen, and stored at -80.degree. C.
A typical yield of final, purified Gs heterotrimer from 8 liters of
cell culture volume was 6 mg.
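By way of illustration only, the salt concentration reached during the MonoQ elution described above can be estimated from the stated buffer compositions (buffer A at 50 mM NaCl, buffer B at 1 M NaCl) and the stated gradient (up to 28% buffer B over 350 ml). The following is a minimal sketch, not part of the original protocol; the function names are hypothetical, and it assumes that the gradient starts at 0% buffer B and that the two buffers mix ideally.

```python
def nacl_mM(percent_b, nacl_a_mM=50.0, nacl_b_mM=1000.0):
    """NaCl concentration (mM) of an A/B blend containing percent_b % buffer B."""
    frac_b = percent_b / 100.0
    return (1.0 - frac_b) * nacl_a_mM + frac_b * nacl_b_mM


def percent_b_at(volume_ml, gradient_ml=350.0, final_percent_b=28.0):
    """Percent buffer B after volume_ml of a linear gradient.

    Assumption: the gradient starts at 0 % B (the text gives only the end point).
    """
    if not 0.0 <= volume_ml <= gradient_ml:
        raise ValueError("volume lies outside the gradient")
    return final_percent_b * volume_ml / gradient_ml


if __name__ == "__main__":
    # Print the estimated NaCl concentration at the start, midpoint and end.
    for volume in (0.0, 175.0, 350.0):
        print(volume, round(nacl_mM(percent_b_at(volume)), 1))
```

Under these assumptions the gradient ends at roughly 316 mM NaCl, so a fraction's elution volume gives an approximate ionic strength at which the corresponding species left the column.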
[0154] Nanobody-35 (Nb35) was expressed in the periplasm of E. coli
strain WK6, extracted, and purified by nickel affinity
chromatography according to previously described methods
(Rasmussen, S. G. et al. Structure of a nanobody-stabilized active
state of the beta(2) adrenoceptor. Nature 2011 469, 175-180)
followed by ion-exchange chromatography (FIG. 18a) using a Mono S
10/100 GL column (GE Healthcare). Selected Nb35 fractions were
dialyzed against buffer (10 mM HEPES, pH 7.5, 100 mM NaCl) and
concentrated to approximately 65 mg ml.sup.-1 with a 10 kDa MWCO
Millipore concentrator.
[0155] Complex Formation, Stabilization and Purification
[0156] Formation of a stable complex (see FIG. 19) was accomplished
by mixing Gs heterotrimer at approximately 100 .mu.M concentration
with BI-167107 bound T4L-.beta..sub.2AR (or .beta.2AR-365) in molar
excess (approximately 130 .mu.M) in 2 ml buffer (10 mM HEPES, pH
7.5, 100 mM NaCl, 0.1% DDM, 1 mM EDTA, 3 mM MgCl.sub.2, 10 .mu.M
BI-167107) and incubating for 3 hrs at room temperature. BI-167107,
which was identified from screening and characterizing
approximately 50 different .beta..sub.2AR agonists, has a
dissociation half-time of approximately 30 hrs providing higher
degree of stabilization to the active G protein-bound receptor than
other full agonists such as isoproterenol (Rasmussen, S. G. et al.
Structure of a nanobody-stabilized active state of the beta(2)
adrenoceptor. Nature 2011 469, 175-180). To maintain the
high-affinity nucleotide-free state of the complex, apyrase (25
mU/ml, NEB) was added after 90 min to hydrolyze residual GDP
released from Gs upon binding to the receptor. GMP resulting from
hydrolysis of GDP by apyrase has very poor affinity for the G
protein in the complex. Rebinding of GDP can cause dissociation of
the R:G complex (FIG. 13A).
[0157] The R:G complex in DDM shows significant dissociation after
48 hours at 4.degree. C. (FIG. 20A). Over 50 amphiphiles were
screened, and MNG-3 (Rasmussen, S. G. et al. Structure of
a nanobody-stabilized active state of the beta(2) adrenoceptor.
Nature 2011 469, 175-180; Chae, P. S. et al. Maltose-neopentyl
glycol (MNG) amphiphiles for solubilization, stabilization and
crystallization of membrane proteins. Nat Methods 7, 1003-1008;
NG-310, Affymetrix-Anatrace) and its closely related analogs were
identified as detergents that substantially stabilize the complex
(FIGS. 20A and B). The complex was exchanged into MNG-3 by adding the R:G mixture
(2 ml) to 8 ml buffer (20 mM HEPES, pH 7.5, 100 mM NaCl, 10 .mu.M
BI-167107) containing 1% MNG-3 for 1 hr at room temperature.
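As a rough check on the detergent exchange described above, the concentrations obtained on adding the 2 ml R:G mixture (0.1% DDM, 10 mM HEPES) to 8 ml of exchange buffer (1% MNG-3, 20 mM HEPES) can be estimated by simple volume-weighted averaging. The sketch below is illustrative only; the helper name `mix` is hypothetical, volumes are assumed to be additive, and the exchange buffer is assumed to contain no DDM and the R:G mixture no MNG-3.

```python
def mix(volume_a_ml, conc_a, volume_b_ml, conc_b):
    """Concentration after combining two solutions, assuming additive volumes.

    conc_a and conc_b must share the same unit; the result is in that unit.
    """
    total_ml = volume_a_ml + volume_b_ml
    return (volume_a_ml * conc_a + volume_b_ml * conc_b) / total_ml


# 2 ml R:G mixture added to 8 ml exchange buffer.
ddm_percent = mix(2.0, 0.1, 8.0, 0.0)    # DDM assumed absent from exchange buffer
mng3_percent = mix(2.0, 0.0, 8.0, 1.0)   # MNG-3 assumed absent from R:G mixture
hepes_mM = mix(2.0, 10.0, 8.0, 20.0)

if __name__ == "__main__":
    print(ddm_percent, mng3_percent, hepes_mM)
```

On these assumptions the exchange leaves about 0.02% DDM against about 0.8% MNG-3, that is, a 40-fold excess of the new detergent before the M1 Flag wash step.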
[0158] At this stage the mixture contains the R:G complex,
non-functional Gs, and an excess of .beta..sub.2AR. To separate
functional R:G complex from non-functional Gs, and to complete the
detergent exchange, the R:G complex was immobilized on M1 Flag
resin and washed in buffer (20 mM HEPES, pH 7.5, 100 mM NaCl, 10
.mu.M BI-167107, and 3 mM CaCl.sub.2) containing 0.2% MNG-3. To
prevent cysteine bridge-mediated aggregation of R:G complexes, 100
.mu.M TCEP was added to the eluted protein prior to concentrating
it with a 50 kDa MWCO Millipore concentrator. Of note, it was
discovered later that crystal growth improved at even higher TCEP
concentrations (above 1 mM) compared to 100 .mu.M TCEP, and that
the integrity of the R:G complex in MNG-3 was stable to 10 mM TCEP
as measured by gel filtration analysis (FIG. 21C). In contrast,
DDM-solubilized .beta..sub.2AR loses its ability to bind the
high-affinity antagonist .sup.3H-dihydroalprenolol in 10 mM TCEP
(data not shown), probably due to disruption of extracellular
disulfide bonds. Iodoacetamide could not be used to block reactive
cysteines on G.sub.s alpha and beta subunits as it caused
dissociation of the R:G complex (fig. S9b). The final size
exclusion chromatography procedure to separate excess free receptor
from the R:G complex (FIG. 17b) was performed on a Superdex 200
10/300 GL column (GE Healthcare) equilibrated with buffer
containing 0.02% MNG-3, 10 mM HEPES pH 7.5, 100 mM NaCl, 10 .mu.M
BI-167107, and 100 .mu.M TCEP. Peak fractions were pooled (FIG.
17b) and concentrated to approximately 90 mg ml.sup.-1 with a 100
kDa MWCO Viva-spin concentrator and analyzed by SDS-PAGE/Coomassie
brilliant blue staining (FIG. 17a) and gel filtration (FIG. 17c).
To confirm a pure, homogeneous, and dephosphorylated preparation,
the R:G complex was routinely analyzed by ion exchange
chromatography (FIG. 17d).
[0159] Protein Engineering
[0160] To increase the probability of obtaining crystals of the R:G
complex two strategies were used to increase the polar surface area
on the extracellular side of the receptor. The first approach, to
generate extracellular binding antibodies, was not successful. The
second approach was to replace the flexible and presumably
unstructured N-terminus with the globular protein T4 lysozyme (T4L)
used previously to crystallize and solve the carazolol-bound
receptor (Rosenbaum, D. M. et al. GPCR engineering yields
high-resolution structural insights into beta2-adrenergic receptor
function. Science 2007 318, 1266-1273). The construct used here
(T4L-.beta..sub.2AR) contained the cleavable signal sequence
followed by the M1 Flag epitope (DYKDDDDA; SEQ ID NO: 14), the TEV
protease recognition sequence (ENLYFQG; SEQ ID NO: 15),
bacteriophage T4 lysozyme from N2 through Y161 including C54T and
C97A mutations, and a two residue alanine linker fused to the human
.beta..sub.2AR sequence D29 through G365. The PNGaseF-inaccessible
glycosylation site of the .beta..sub.2AR at N187 was mutated to
Glu. M96 and M98 in the first extracellular loop were each replaced
by Thr to increase the otherwise low expression level of
T4L-.beta..sub.2AR. The threonine mutations did not affect ligand
binding affinity for .sup.3H-dihydro-alprenolol, but caused a
small, approximately two-fold decrease in affinity for
isoproterenol.
[0161] The .beta..sub.2AR-Gs peptide fusion construct used for
[.sup.3H]-DHA competition binding with isoproterenol was
constructed from the receptor truncated at position 365 and fused
to the last 21 amino acids of the G.alpha.s subunit (amino acids
374-394, except for C379A). A Gly-Ser is inserted between the
receptor and the peptide. Also an extended TEV protease site
(SENLYFQGS; SEQ ID NO: 16) was introduced in the .beta..sub.2AR
between G360 and G361.
[0162] Stabilization of Gs with Nanobodies
[0163] From negative stain EM imaging, we observed that the alpha
helical domain of G.alpha.s was flexible and therefore possibly
responsible for poor crystal quality. Targeted stabilization of
this domain was addressed by immunizing two llamas (Llama glama)
with the bis(sulfosuccinimidyl)glutarate (BS2G, Pierce)
cross-linked .beta..sub.2AR-Gs-BI-167107 ternary complex.
Peripheral blood lymphocytes were isolated from the immunized
animals to extract total RNA, prepare cDNA and construct a Nanobody
phage display library according to published methods. Nb35 and Nb37
were enriched by two rounds of biopanning on the
.beta..sub.2AR-Gs-BI-167107 ternary complex embedded in
biotinylated high-density lipoprotein particles (Whorton, et al.
Proc Natl Acad Sci USA 2007 104, 7682-7687). Nb35 and Nb37 were
selected for further characterization because they bind the
.beta..sub.2AR-Gs-BI-167107 ternary complex but not the free
receptor in an ELISA assay. Nanobody binding to the R:G complex was
confirmed by size exclusion chromatography (FIG. 13d), and it was
noted that both nanobodies protected the complex from dissociation
by GTP.gamma.S, suggestive of a stabilizing Gs:Nb interaction (FIG.
13d).
[0164] Crystallization
[0165] BI-167107 bound T4L-.beta..sub.2AR:Gs complex and Nb35 were
mixed in 1:1.2 molar ratio. The small molar excess of Nb35 was
verified by analytical gel filtration (see FIG. 15b). The mixture
was incubated for 1 hr at room temperature prior to mixing with 7.7 MAG
containing 10% cholesterol (C8667, Sigma) in 1:1 protein to lipid
ratio (w/w) using the twin-syringe mixing method reported
previously. The concentration of R:G:Nb complex in 7.7 MAG was
approximately 25 mg ml.sup.-1. The detergent MNG-3 may stabilize
the T4L-.beta..sub.2AR-Gs complex during its incorporation into the
lipid cubic phase. This may be due to the high affinity of MNG-3
for the receptor. The .beta..sub.2AR in MNG-3 maintains its
structural integrity even when diluted below the CMC of the
detergent, in contrast to .beta..sub.2AR in DDM, which rapidly
loses binding activity (FIG. 20b). Moreover, MNG-3 improved crystal
size and quality, as previously reported. The protein:lipid mixture
was delivered through an LCP dispensing robot (Gryphon, Art Robbins
Instruments) in 40 nl drops to either 24-well or 96-well glass
sandwich plates and overlaid en-bloc with 0.8 .mu.l precipitant
solution. Multiple crystallization leads were initially identified
using in-house screens partly based on reagents from the
StockOptions Salt kit (Hampton Research). Crystals for data
collection were grown in 18 to 22% PEG 400, 100 mM MES pH 6.5 (FIG.
13c), 350 to 450 mM potassium nitrate, 10 mM foscarnet (FIG. 13b),
1 mM TCEP (FIG. 21c), and 10 .mu.M BI-167107. Crystals reached full
size within 3-4 days at 20.degree. C. and were picked from a
sponge-like mesophase and flash-frozen in liquid nitrogen without
additional cryo-protectant.
[0166] Microcrystallography Data Collection and Processing.
[0167] Diffraction data were measured at the Advanced Photon Source
beamline 23 ID-B. Hundreds of crystals were screened, and a final
dataset was compiled using diffraction wedges of typically 10
degrees from 20 strongly diffracting crystals. All data reduction
was performed using HKL2000 (Otwinowski, Z. & Minor, W. Processing
of x-ray diffraction data collected in oscillation mode. Methods
Enzymol. 1997 276, 307-326). Although in many cases diffraction to
beyond 3 .ANG. was seen in initial frames, radiation damage and
anisotropic diffraction resulted in low completeness in higher
resolution shells. Analysis of the final dataset by the UCLA
diffraction anisotropy server .sup.31 indicated that diffraction
along the a* axis was superior to that in other directions. On the
basis of an F/.sigma. (F) cutoff of 3 along each reciprocal space
axis, reflections were subjected to an anisotropic truncation with
resolution limits of 2.9, 3.2, and 3.2 Angstroms along a*, b*, and
c* prior to use in refinement. The structure is reported to an
overall resolution of 3.2 .ANG.. Despite the low completeness in
the highest resolution shells (Table 3) inclusion of these
reflections gave substantial improvements in map quality and lower
Rfree during refinement.
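For illustration, one common way to apply direction-dependent resolution limits such as those stated above (2.9, 3.2, and 3.2 Angstroms along a*, b*, and c*) is an ellipsoidal cutoff in reciprocal space. The sketch below is not the code of the anisotropy server or of any refinement package; the function name is hypothetical, and it assumes for simplicity that the three limits lie along mutually orthogonal directions, which is not strictly the case for a* and c* in a monoclinic cell.

```python
def keep_reflection(s_a, s_b, s_c, limits_angstrom=(2.9, 3.2, 3.2)):
    """Return True if a reflection lies inside the truncation ellipsoid.

    s_a, s_b, s_c: components of the scattering vector (1/Angstrom) along
    three mutually orthogonal principal directions (a simplifying assumption).
    limits_angstrom: resolution limits (Angstrom) along those directions.
    """
    d_a, d_b, d_c = limits_angstrom
    # The ellipsoid has semi-axes 1/d_a, 1/d_b and 1/d_c in reciprocal space,
    # so a reflection is kept when its scaled squared length is at most 1.
    return (s_a * d_a) ** 2 + (s_b * d_b) ** 2 + (s_c * d_c) ** 2 <= 1.0


if __name__ == "__main__":
    print(keep_reflection(1 / 3.0, 0.0, 0.0))  # 3.0 Angstrom along the first axis
    print(keep_reflection(0.0, 1 / 3.0, 0.0))  # 3.0 Angstrom along the second axis
```

With the stated limits, a 3.0 Angstrom reflection along the best-diffracting direction is retained while one at the same resolution along either weaker direction is discarded.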
[0168] Structure Solution and Refinement
[0169] The structure was solved by molecular replacement using
Phaser. In order, the search models used were: the .beta. and
.gamma. subunits from a Gi heterotrimer (PDB ID: 1GP2), the Gs
alpha ras-like domain (PDB ID: 1AZT), the active-state .beta.2
adrenergic receptor (PDB ID: 3P0G), a .beta..sub.2AR binding
nanobody (PDB ID: 3P0G), T4 lysozyme (PDB ID: 2RH1), and the Gs
alpha helical domain (PDB ID: 1AZT). Following the determination of
the initial structure by molecular replacement, rigid body
refinement and simulated annealing were performed in Phenix and
BUSTER, followed by restrained refinement and manual rebuilding in
Coot. After iterative refinement and manual adjustments, the
structure was refined in CNS using the DEN method. Although the
resolution of this structure exceeds that for which DEN is
typically most useful, the presence of several poorly resolved
regions indicated that the incorporation of additional information
to guide refinement could provide better results. The DEN reference
models used were those used for molecular replacement, with the
exception of Nb35, which was well ordered and for which no higher
resolution structure is available. Side chains were omitted from 52
residues for which there was no electron density past C.beta. below
a low contour level of 0.7.sigma. in a 2Fo-Fc map. Figures were
prepared using PyMOL (The PyMOL Molecular Graphics System, Version
1.3, Schrodinger, LLC.). MolProbity was used to determine
Ramachandran statistics.
[0170] Competition Binding
[0171] Membranes expressing the .beta..sub.2AR or the
.beta..sub.2AR-Gs peptide fusion were prepared from
baculovirus-infected Sf9 cells and [.sup.3H]-dihydroalprenolol
([.sup.3H]-DHA) binding performed as previously described
(Swaminath et al Mol Pharmacol 2002 61, 65-72). For competition
binding, membranes were incubated with [.sup.3H]-DHA (1.1 nM final)
and increasing concentrations of (-)-isoproterenol (ISO) for 1 hr
before harvesting onto GF/B filters. Competition data were fitted
to a two-site binding model, and the high- and low-affinity Ki values
for ISO and their fractions were calculated using GraphPad Prism.
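The two-site binding model referred to above can be sketched as follows. This is an illustrative form of the model only and does not reproduce the internals of the fitting software; the function names are hypothetical. It assumes that each site follows a simple one-site competition curve, that IC50 and Ki are related by the Cheng-Prusoff equation, and that the dissociation constant of the radioligand (`kd_nM`) is known from a separate saturation experiment, since no value is given in the text.

```python
def ic50_from_ki(ki_nM, radioligand_nM, kd_nM):
    """Cheng-Prusoff relation: IC50 = Ki * (1 + [L] / Kd)."""
    return ki_nM * (1.0 + radioligand_nM / kd_nM)


def two_site_fraction_bound(competitor_nM, ki_high_nM, ki_low_nM,
                            frac_high, radioligand_nM, kd_nM):
    """Fraction of specific radioligand binding remaining at a competitor dose.

    frac_high is the fraction of sites in the high-affinity state; the
    remaining sites (1 - frac_high) are in the low-affinity state.
    """
    ic50_high = ic50_from_ki(ki_high_nM, radioligand_nM, kd_nM)
    ic50_low = ic50_from_ki(ki_low_nM, radioligand_nM, kd_nM)
    high = frac_high / (1.0 + competitor_nM / ic50_high)
    low = (1.0 - frac_high) / (1.0 + competitor_nM / ic50_low)
    return high + low


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the biphasic shape.
    for dose in (0.0, 1.0, 100.0, 10000.0):
        print(dose, two_site_fraction_bound(dose, 1.0, 300.0, 0.4, 1.1, 1.1))
```

In a fit, `ki_high_nM`, `ki_low_nM`, and `frac_high` would be the adjustable parameters, with the radioligand concentration fixed at the 1.1 nM stated above.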
Results II
Crystallization of the .beta.2AR-Gs Complex
[0172] One challenge for crystallogenesis was to prepare a stable
.beta..sub.2AR-Gs complex in detergent solution. The .beta..sub.2AR
and Gs couple efficiently in lipid bilayers, but not in detergents
used to solubilize and purify these proteins. We found that a
relatively stable .beta..sub.2AR-Gs complex could be prepared by
mixing purified GDP-Gs (approximately 100 .mu.M final
concentration) with a molar excess of purified .beta..sub.2AR bound
to a high affinity agonist (BI-167107, Boehringer Ingelheim) in
dodecylmaltoside solution. Apyrase, a non-selective purine
pyrophosphatase, was added to hydrolyze GDP released from Gs on
forming a complex with the .beta..sub.2AR. The complex was
subsequently purified by sequential antibody affinity
chromatography and size exclusion chromatography. The stability of
the complex was enhanced by exchanging it into a recently developed
maltose neopentyl glycol detergent (NG-310, Anatrace). The complex
could be incubated at room temperature for 24 hrs without any
noticeable degradation; however, initial efforts to crystallize the
complex using sparse matrix screens in detergent micelles, bicelles
and lipidic cubic phase (LCP) failed.
[0173] To further assess the quality of the complex, the protein
was analyzed by single particle electron microscopy (EM). The
results confirmed that the complex was monodisperse, and revealed
two potential problems for obtaining diffraction-quality
crystals. First, the detergent used to stabilize the complex formed
a large micelle, leaving little polar surface on the extracellular
side of the .beta..sub.2AR-Gs complex for the formation of crystal
lattice contacts. The initial approach to this problem, which was
to generate antibodies to the extracellular surface, was not
successful. As an alternative approach, we replaced the amino
terminus of the .beta..sub.2AR with T4 lysozyme (T4L). Several
different amino-terminal fusion proteins were prepared and single
particle EM was used to identify a fusion with a relatively fixed
orientation of T4L in relation to the .beta..sub.2AR.
[0174] The second problem revealed by single particle EM analysis
was increased variability in the positioning of the .alpha.-helical
component of the G.alpha.s subunit. G.alpha.s consists of two
domains, the ras-like GTPase domain (G.alpha.sRas), which interacts
with the .beta..sub.2AR and the G.beta. subunit, and the
.alpha.-helical domain (G.alpha.sAH). The interface of the two
G.alpha.s subdomains forms the nucleotide-binding pocket (FIG. 7),
and EM 2D averages and 3D reconstructions show that in the absence
of guanine nucleotide, G.alpha.sAH has a variable position relative
to the complex of T4L-.beta..sub.2AR-G.alpha.sRAS-G.beta..gamma.
(FIG. 7b).
[0175] The variable position of G.alpha.sAH was attributed to the
empty nucleotide-binding pocket. However, both GDP and
nonhydrolyzable GTP analogs disrupt the .beta..sub.2AR-Gs complex
(FIG. 13). The addition of pyrophosphate and its analog
phosphonoformate (foscarnet) led to a significant increase in
stabilization of G.alpha.sAH as determined by EM analysis of the
detergent solubilized complex. Crystallization trials were carried
out in Lipidic Cubic Phase (LCP) using a modified monoolein designed
to accommodate the large hydrophilic component of the
T4L-.beta.2AR-Gs complex (Misquitta, L. V. et al. Membrane protein
crystallization in lipidic mesophases with tailored bilayers.
Structure 2004 12, 2113-2124). Although we were able to obtain
small crystals that diffracted to 7 .ANG., we were unable to
improve their quality through the use of additives and other
modifications.
[0176] In an effort to generate an antibody that would further
stabilize the complex and facilitate crystallogenesis, .beta.2AR
and the Gs heterotrimer were crosslinked with a small,
homobifunctional amine-reactive crosslinker, and this
stabilized complex was used to immunize llamas. Llamas and other camelids
produce antibodies devoid of light chains. The single domain
antigen binding fragments of these heavy chain only antibodies,
known as nanobodies, are small (15 kDa), rigid and are easily
cloned and expressed in E. coli. A nanobody (Nb35) was obtained
that binds to the complex and prevents dissociation of the complex
by GTP.gamma.S (FIG. 13). The T4L-.beta.2AR-Gs-Nb35 complex was
used to obtain crystals that grew to 250 microns (FIG. 14) in LCP
(monoolein 7.7) and diffracted to 2.9 .ANG.. A 3.2 .ANG. data set
was obtained from 20 crystals and the structure was determined by
molecular replacement.
[0177] The .beta..sub.2AR-Gs complex crystallized in space group
P2.sub.1, with a single complex in each asymmetric unit. FIG. 8a
shows the crystallographic packing interactions. Complexes are
arrayed in alternating aqueous and lipidic layers with lattice
contacts formed almost exclusively between soluble components of
the complex, leaving receptor molecules suspended between G protein
layers and widely separated from one another in the plane of the
membrane. Extensive lattice contacts were formed among all the
soluble proteins, likely accounting for the strong overall
diffraction and remarkably clear electron density for the G
protein. Nb35 and T4L facilitated crystal formation. Nb35 packs at
the interface of G.beta. and G.alpha. subunits with complementarity
determining region (CDR) 1 interacting primarily with G.beta. and a
long CDR3 loop interacting with both G.beta. and G.alpha. subunits.
The framework regions of Nb35 from one complex also interact with
G.alpha. subunits from two adjacent complexes. T4L forms relatively
sparse interactions with the amino terminus of the receptor, but
packs against the amino terminus of the G.beta. subunit of one
complex, the carboxyl terminus of the G.beta. subunit of another
complex, and the G.beta. subunit of yet another complex. FIG. 8b
shows the structure of the complete complex including T4L and Nb35,
and FIG. 8c shows the .beta..sub.2AR-Gs complex alone.
Structure of the Active-State .beta.2AR
[0178] The .beta..sub.2AR-Gs structure provides the first
high-resolution insight into the mechanism of signal transduction
across the plasma membrane by a GPCR, and the structural basis for
the functional properties of the ternary complex. FIG. 9a compares
the structures of the agonist-bound receptor in the
.beta..sub.2AR-Gs complex and the inactive carazolol-bound
.beta..sub.2AR. The largest difference between the inactive and
active structures is a 14 .ANG. outward movement of TM6 when
measured at the C.alpha. carbon of E268. There is a smaller outward
movement and extension of the cytoplasmic end of the TM5 helix by 7
residues. A stretch of 26 amino acids in the third intracellular
loop (ICL3) is disordered. Another notable difference between
inactive and active structures is the second intracellular loop
(ICL2), which forms an extended loop in the inactive .beta..sub.2AR
structure and an .alpha.-helix in the .beta..sub.2AR-Gs complex.
This helix is also observed in the .beta..sub.2AR-Nb80 structure
(FIG. 9b); however, it may not be a feature that is unique to the
active state, since it is also observed in the inactive structure
of the highly homologous avian .beta..sub.1AR.
[0179] The quality of the electron density maps for the
.beta..sub.2AR is highest at the .beta..sub.2AR-G.alpha.sRas
interface, and much weaker for the extracellular half, possibly due
to the lack of crystal lattice contacts with the extracellular
surface (FIG. 8a). As a result, we cannot confidently model the
high-affinity agonist (BI-167107) in the ligand-binding pocket.
However, the overall structure of the .beta..sub.2AR in the
T4L-.beta..sub.2AR-Gs complex is very similar to our recent
active-state structure of .beta..sub.2AR stabilized by a G protein
mimetic nanobody (Nb80). These structures deviate primarily at the
cytoplasmic ends of TMs 5 and 6 (FIG. 9b), possibly due to the
presence of T4L that replaces ICL3 in the .beta..sub.2AR-Nb80
structure. Nonetheless, the .beta..sub.2AR-Nb80 complex exhibits
the same high affinity for the agonist isoproterenol as does the
.beta..sub.2AR-Gs complex, consistent with high structural homology
around the ligand binding pocket. The electron density maps for the
.beta..sub.2AR-Nb80 crystals provide a more reliable view of the
conformational rearrangements of amino acids around the
ligand-binding pocket and between the ligand-binding pocket and the
Gs-coupling interface.
[0180] FIG. 9c shows the position of the highly conserved sequence
motifs including D/ERY and NPxxY in the .beta..sub.2AR-Gs complex
compared with the .beta..sub.2AR-Nb80 complex (see also Fig. S3).
These conserved sequences have been proposed to be important for
activation or for maintaining the receptor in the inactive state.
The positions of these amino acids are essentially identical in
these two structures demonstrating that Nb80 is a very good G
protein surrogate. Only Arg131 differs between these two
structures. In the .beta..sub.2AR-Nb80 structure Arg131 interacts
with Nb80, whereas in the .beta..sub.2AR-Gs structure Arg131 packs
against Tyr391 of G.alpha.s (FIG. 15).
[0181] The active state of the .beta..sub.2AR is stabilized by
extensive interactions with G.alpha.sRas (FIG. 10). There are no
direct interactions with G.beta. or G.gamma. subunits. The total
buried surface of the .beta..sub.2AR-GsRas interface is 2576
.ANG..sup.2 (1300 .ANG..sup.2 for GsRas and 1276 .ANG..sup.2 for
the .beta..sub.2AR). This interface is formed by ICL2, TM5 and TM6
of the .beta..sub.2AR, and by .alpha.5-helix, the .alpha.N-.beta.1
junction, the top of the .beta.3-strand, and the .alpha.4-helix of
G.alpha.sRas (see Table 1 below for specific interactions). The
.beta..sub.2AR sequences involved in this interaction have been
shown to play a role in G protein coupling; however, there is no
clear consensus sequence for Gs-coupling specificity when these
segments are aligned with other GPCRs. Perhaps this is not
surprising considering that the .beta..sub.2AR also couples to Gi
and that many GPCRs couple to more than one G protein isoform. The
structural basis for G protein coupling specificity must therefore
involve more subtle features of the secondary and tertiary
structure. Nevertheless, a noteworthy interaction involves Phe139,
which is located at the beginning of the ICL2 helix and sits in a
hydrophobic pocket formed by G.alpha.s His41 at the beginning of
the .beta.1-strand, Val213 at the start of the .beta.3-strand and
Phe376, Arg380 and Ile383 in the .alpha.5-helix (FIG. 4c). The
.beta..sub.2AR mutant F139A displays severely impaired coupling to
Gs. The residue corresponding to Phe139 is a Phe or Leu on almost
all Gs coupled receptors, but is more variable in GPCRs known to
couple to other G proteins. Of interest, the ICL2 helix is
stabilized by an interaction between Asp130 of the conserved DRY
sequence and Tyr141 in the middle of the ICL2 helix (FIG. 10c).
Tyr141 has been shown to be a substrate for the insulin receptor
tyrosine kinase; however, the functional significance of this
phosphorylation is currently unknown.
Structure of Activated Gs
[0182] One surprising observation in the .beta..sub.2AR-Gs complex
is the large displacement of the G.alpha.sAH relative to
G.alpha.sRas (an approximately 127.degree. rotation about the
junction between the domains) (FIG. 11a). In the crystal structure
of G.alpha.s, the nucleotide-binding pocket is formed by the
interface between G.alpha.sRas and G.alpha.sAH. Guanine nucleotide
binding stabilizes the interaction between these two domains. The
loss of this stabilizing effect of guanine nucleotide binding is
consistent with the high flexibility observed for G.alpha.sAH in
single particle EM analysis of the detergent solubilized complex.
It is also in agreement with the increase in deuterium exchange at
the interface between these two domains upon formation of the
complex. Recently Hamm, Hubbell and colleagues, using double
electron-electron resonance (DEER) spectroscopy, documented large
(up to 20 .ANG.) changes in distance between nitroxide probes
positioned on the Ras and .alpha.-helical domains of Gi upon
formation of a complex with light-activated rhodopsin. Therefore,
it is perhaps not surprising that G.alpha.sAH is displaced relative to
G.alpha.sRas; however, its location in this crystal structure most
likely reflects only one of an ensemble of conformations that it
can adopt under physiological conditions, one that has been stabilized
by crystal packing interactions.
[0183] The conformational links between the .beta..sub.2AR and the
nucleotide-binding pocket primarily involve the amino and carboxyl
terminal helices of G.alpha.s (FIG. 10). FIG. 11b focuses on the
region of G.alpha.sRas that undergoes the largest conformational
change when comparing the structure of G.alpha.sRas from the
Gs-.beta..sub.2AR complex with that from the G.alpha.s-GTP.gamma.S
complex. The largest difference is observed for the .alpha.5-helix,
which is displaced 6 .ANG. towards the receptor and rotated as the
carboxyl terminal end projects into the transmembrane core of the
.beta..sub.2AR. Associated with this movement, the .beta.6-.alpha.5
loop, which interacts with the guanine ring in the
G.alpha.s-GTP.gamma.S structure, is displaced outward, away from
the nucleotide-binding pocket (FIG. 11b-d). The movement of
.alpha.5-helix is also associated with changes in interactions
between this helix and the .beta.6-strand, the .alpha.N-.beta.1
loop, and the .alpha.1-helix. The .beta.1-strand forms another link
between the .beta..sub.2AR and the nucleotide-binding pocket. The
C-terminal end of this strand changes conformation around Gly47,
and there are further changes in the .beta.1-.alpha.1 loop (P-loop)
that coordinates the .gamma.-phosphate in the GTP-bound form (FIG.
11b-d). The observations in the crystal structure are in agreement
with deuterium exchange experiments where there is enhanced
deuterium exchange in the .beta.1-strand and the amino terminal end
of the .alpha.5-helix upon formation of the nucleotide-free
.beta..sub.2AR-Gs complex. The DXMS studies provide additional
insights into the dynamic nature of these conformational changes in
Gs upon complex formation.
[0184] The structure of a GDP-bound Gs heterotrimer has not been
determined in this study, so it is not possible to directly compare
the G.alpha.s-G.beta..gamma. interface before and after formation
of the .beta..sub.2AR-Gs complex. Based on the structure of the
GDP-bound Gi heterotrimer, large changes in interactions between
G.alpha.sRas and G.beta..gamma. upon formation of the complex with
.beta..sub.2AR are not observed. This is also consistent with
deuterium exchange studies. It should be noted that Nb35 binds at
the interface between G.alpha.sRas and G.beta. (FIG. 8b).
Therefore, we cannot exclude the possibility that Nb35 may
influence the relative orientation of the
G.alpha.sRas-G.beta..gamma. interface in the crystal structure.
However, single particle EM studies provide evidence that Nb35 does
not disrupt interactions between G.alpha.sAH and G.alpha.sRas.
Assembly of the .beta..sub.2AR-Gs Complex
[0185] Clues to the initial stages of complex formation may come
from the recent active state structures of rhodopsin. FIGS. 12a and
b compare the active-state structure of .beta..sub.2AR in the
.beta..sub.2AR-Gs complex with the recent structure of
metarhodopsin II bound to the transducin peptide. The
conformational changes in TM5 and TM6 are smaller in metarhodopsin
II, and the position of the carboxyl terminal alpha helix of
transducin is tilted by approximately 30.degree. relative to the
position of the homologous region of Gs. These may represent
fundamental differences in the receptor-G protein interactions
between these two proteins, but given the strong conservation of
the G-protein binding pocket, the changes more likely reflect the
extensive contacts formed with the intact G protein. The position
of the transducin peptide in metarhodopsin II may represent the
initial interaction between a GDP-bound G protein and a GPCR. We
have attempted to reproduce a similar complex between the
.beta..sub.2AR and a synthetic peptide representing the carboxyl
terminal 20 amino acids of Gs, but did not observe any effect of
this peptide on receptor function, possibly due to the solubility
and behavior of the peptide in solution. However, when the carboxyl
terminal 20 amino acids of Gs are fused to the carboxyl terminus of
the .beta..sub.2AR (FIG. 12c), we observe a 27-fold increase in
agonist affinity (FIG. 12d). This effect is only 3.5-fold smaller
than the effect we observe on agonist binding affinity in the
.beta..sub.2AR-Gs complex, and demonstrates that there is a
functional interaction between the peptide and receptor that may
represent an initial stage in .beta..sub.2AR-Gs complex formation.
FIGS. 12e and 12f present a possible sequence of interactions of
.beta..sub.2AR and Gs when forming the nucleotide free complex. The
initial interaction of the .beta..sub.2AR with Gs would require an
outward movement of the carboxyl terminus of the .alpha.5-helix
away from the .beta.6-strand to permit interactions with the
.beta..sub.2AR similar to those observed in metarhodopsin II. The
dynamic character of the carboxyl terminal end of .alpha.5 is
supported by deuterium exchange studies and the relatively loose
packing of .alpha.5 with the rest of G.alpha.sRas in the structure
of G.alpha.s alone. The subsequent formation of more extensive
interactions between the .beta..sub.2AR ICL 2 and the amino
terminus of G.alpha.s requires a rotation of G.alpha.sRas relative
to the receptor and would be associated with further conformational
changes in both .beta..sub.2AR and G.alpha.sRas (FIG. 12f). This
binding model is in agreement with deuterium exchange
experiments.
[0186] The coordinates and structure factors for the
.beta..sub.2AR-Gs complex are deposited in the Protein Data Bank as
accession number 3SN6, which is incorporated by reference
herein.
TABLE-US-00001 TABLE 1 Potential intermolecular interaction within
the R:G interface ##STR00001##
TABLE-US-00002 TABLE 2 Data collection and refinement statistics

Data collection*
  Number of crystals                        20
  Space group                               P 2.sub.1
  Cell dimensions
    a, b, c (.ANG.)                         119.3, 64.6, 131.2
    .alpha., .beta., .gamma. (.degree.)     90.0, 91.7, 90.0
  Resolution (.ANG.)                        41-3.2 (3.26-3.20)
  R.sub.merge (%)                           15.6 (55.3)
  <I>/<.sigma.I>                            10.8 (1.8)
  Completeness (%)                          91.2 (53.9)
  Redundancy                                6.5 (5.0)

Refinement
  Resolution (.ANG.)                        41-3.2
  No. reflections                           31075 (1557 in test set)
  R.sub.work/R.sub.free (%)                 22.5/27.7
  No. atoms                                 10277
  No. protein residues                      1318
  Anisotropic B tensor                      B.sub.11 = -7.0/B.sub.22 = 4.7/B.sub.33 = 2.3/B.sub.13 = 2.1

Unmodelled sequences.sup.a
  .beta..sub.2 adrenergic receptor          29.sup.b, 176-178, 240-264, 342-365
  G.sub.s.alpha., ras domain                1-8, 60-88, 203-204, 256-262
  G.sub.s.gamma.                            1-4, 63-68
  T4 lysozyme                               161.sup.c

Average B-factors (.ANG..sup.2)
  .beta..sub.2 adrenergic receptor          133.5
  G.sub.s.alpha., ras domain                82.8
  G.sub.s.alpha., helical domain            123.0
  G.sub.s.beta.                             64.2
  G.sub.s.gamma.                            85.2
  Nanobody 35                               60.7
  T4 lysozyme                               113.7

R.m.s. deviation from ideality
  Bond length (.ANG.)                       0.007
  Bond angles (.degree.)                    0.72

Ramachandran statistics.sup.d
  Favored regions (%)                       95.8
  Allowed regions (%)                       4.2
  Outliers (%)                              0

*Highest shell statistics are in parentheses.
.sup.aThese regions were omitted from the model due to poorly
resolved electron density. Unmodelled purification tags are not
included in these residue ranges.
.sup.bResidues 1-28 of the .beta..sub.2AR were omitted from the
construct and T4L was fused to the amino terminus of transmembrane
helix 1 to facilitate crystallization.
.sup.cResidue 1 of T4L was omitted from the construct.
.sup.dAs defined by MolProbity.sup.3B.
TABLE-US-00003 TABLE 3 Data collection statistics by resolution shell

Resolution Shell (.ANG.)   <I>/<.sigma.I>   R.sub.merge (%)   Completeness (%)
41-8.67                    18.8             6.6               97.1
8.67-6.89                  16.9             9.2               99.5
6.89-6.02                  14.4             13.0              99.7
6.02-5.47                  12.8             16.7              99.9
5.47-5.08                  13.4             15.9              99.9
5.08-4.78                  13.4             16.9              99.8
4.78-4.54                  12.2             18.2              99.6
4.54-4.34                  11.6             20.1              99.8
4.34-4.18                  9.5              22.9              99.4
4.18-4.03                  7.7              26.2              99.1
4.03-3.91                  6.6              27.9              98.7
3.91-3.79                  5.3              30.2              98.7
3.79-3.69                  3.8              36.6              96.7
3.69-3.60                  4.6              36.9              94.6
3.60-3.52                  2.3              45.7              90.3
3.52-3.45                  2.2              47.9              86.3
3.45-3.38                  2.4              45.6              80.5
3.38-3.31                  2.1              47.3              69
3.31-3.26                  2.2              49.8              59.4
3.26-3.20                  1.8              55.3              53.9
Overall                    10.8             15.6              91.2
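The shell statistics of Table 3 can be scanned programmatically, for example to see where signal-to-noise and completeness fall off toward high resolution. The following Python sketch is illustrative only: the numbers are transcribed from Table 3, while the function name and the cutoff values used in the example calls are assumptions chosen for illustration and are not criteria stated in this disclosure.

```python
# Shell statistics transcribed from Table 3, ordered from low to high resolution:
# (d_max in A, d_min in A, mean I/sigma(I), R_merge in %, completeness in %)
SHELLS = [
    (41.00, 8.67, 18.8, 6.6, 97.1),
    (8.67, 6.89, 16.9, 9.2, 99.5),
    (6.89, 6.02, 14.4, 13.0, 99.7),
    (6.02, 5.47, 12.8, 16.7, 99.9),
    (5.47, 5.08, 13.4, 15.9, 99.9),
    (5.08, 4.78, 13.4, 16.9, 99.8),
    (4.78, 4.54, 12.2, 18.2, 99.6),
    (4.54, 4.34, 11.6, 20.1, 99.8),
    (4.34, 4.18, 9.5, 22.9, 99.4),
    (4.18, 4.03, 7.7, 26.2, 99.1),
    (4.03, 3.91, 6.6, 27.9, 98.7),
    (3.91, 3.79, 5.3, 30.2, 98.7),
    (3.79, 3.69, 3.8, 36.6, 96.7),
    (3.69, 3.60, 4.6, 36.9, 94.6),
    (3.60, 3.52, 2.3, 45.7, 90.3),
    (3.52, 3.45, 2.2, 47.9, 86.3),
    (3.45, 3.38, 2.4, 45.6, 80.5),
    (3.38, 3.31, 2.1, 47.3, 69.0),
    (3.31, 3.26, 2.2, 49.8, 59.4),
    (3.26, 3.20, 1.8, 55.3, 53.9),
]


def last_shell_meeting(shells, min_i_over_sigma, min_completeness):
    """Return the highest-resolution shell that meets both cutoffs, or None.

    The cutoffs are hypothetical inputs supplied by the caller.
    """
    best = None
    for shell in shells:
        _d_max, _d_min, i_over_sigma, _r_merge, completeness = shell
        if i_over_sigma >= min_i_over_sigma and completeness >= min_completeness:
            best = shell  # later entries are at higher resolution
    return best


# Example calls with assumed cutoffs.
strict = last_shell_meeting(SHELLS, min_i_over_sigma=2.0, min_completeness=90.0)
loose = last_shell_meeting(SHELLS, min_i_over_sigma=2.0, min_completeness=0.0)
print("I/sigma >= 2.0 and completeness >= 90%: d_min =", strict[1])
print("I/sigma >= 2.0 only: d_min =", loose[1])
```

With these assumed cutoffs the sketch simply reports the last shell in the table that satisfies them; it does not replace the resolution limit reported in Table 2.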
Materials and Methods III
Generation of N-T4L Fused .beta..sub.2AR Constructs
[0187] The human .beta..sub.2AR in the pFastbac1 Sf9 expression
vector truncated at amino acid 365 in the cytoplasmic tail
(.beta..sub.2AR365) was used as the starting template for
generating the N-T4L fused .beta..sub.2AR constructs. The HA signal
peptide followed by FLAG epitope tag and tobacco etch virus (TEV)
protease recognition sequence were added to the N-terminus of the
receptor to facilitate expression and purification. A point
mutation of N187E was also introduced in the second extracellular
loop to remove a glycosylation site (FIG. 22).
[0188] DNA cassettes encoding two different versions of T4
lysozyme (full length or with a truncated C-terminus) with different
numbers of additional alanines attached to the C-terminus were
generated and amplified by PCR using the original
.beta..sub.2AR-T4L .sup.3 as the template and synthetic
oligonucleotides as primers. These different cassettes were
inserted into the .beta..sub.2AR365 construct between the end of
the TEV protease recognition sequence and Asp29, Glu30 or Val31 of
the receptor, as shown in FIG. 22, by using the QuikChange multi
protocol (Stratagene). Two point mutations, M96T and M98T, were also
introduced into the .beta..sub.2AR sequence. Residues from Ser235
to Lys263 in the third intracellular loop were deleted with the
QuikChange multi protocol using synthetic oligonucleotides as
mutation primers. All the constructs were confirmed by DNA
sequencing. The protein sequence of T4L-.beta..sub.2AR-.DELTA.-ICL3
is shown below:
TABLE-US-00004 (SEQ ID NO: 17) ##STR00002##
DTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQ
DVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLR
MLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYAADEVWVV
GMGIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGL
AVVPFGAAHILTKTWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFA
ITSPFKYQSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRATHQEAINC
YAEETCCDFFTNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKRQLQKID
KFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNLIRKEVYILLN
WIGYVNSGFNPLIYCRSPDFRIAFQELLCLRRSSLKAYGNGYSSNGNTGE QSG.
[0189] The HA signal peptide is shown in italic letters; the FLAG
epitope tag is shown in letters with underscore; the TEV
recognition sequence is marked with a box and the cleavage site is
shown with an asterisk; the full length T4L is shown in bold; the
.beta..sub.2AR sequence from Asp29 to Gly365 excluding Ser235 to
Lys263 is shown in bold underline; the 2-Ala linker is
underlined.
[0190] The entire T4L-.beta..sub.2AR-.DELTA.-ICL3 gene described
above was further cloned into the Best-Bac Sf9 expression vector
pVL1393 (Expression Systems) using the restriction enzyme digestion
sites XbaI and EcoRI. This version of
T4L-.beta..sub.2AR-.DELTA.-ICL3 construct was also confirmed by DNA
sequencing.
[0191] Whole Cell Binding to Assess the Expression Yield of Each
Construct.
[0192] Recombinant baculovirus was made from the pFastbac1 Sf9
expression vector for each of the constructs illustrated in FIG. 22
using the Invitrogen protocol. Sf9 cells at a density of 4
million/ml were infected with second passage virus at different
ratios of virus stock to cell culture (1:20, 1:50, and 1:100).
After 48 hours, 5 .mu.l of the infected cells were incubated with
10 nM of [.sup.3H]-dihydroalprenolol (DHA) in 500 .mu.l of binding
buffer (75 mM Tris, 12.5 mM MgCl.sub.2, 1 mM EDTA, pH 7.4, supplemented
with 5 mg/ml BSA). Cells were harvested and washed with cold
binding buffer using a Brandel harvester. Bound [.sup.3H]DHA was
measured with a scintillation counter (Beckman). Non-specific binding
of [.sup.3H]DHA was assessed by including 10 .mu.M of alprenolol
(Sigma) in the same binding reaction. The expression level of each
construct was determined using the specific activity of the bound
[.sup.3H]DHA. Each experiment was performed in triplicate.
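The calculation implied by the whole cell binding protocol above (specific binding taken as total binding minus the binding measured in the presence of excess alprenolol, then converted to an amount of receptor using the specific activity of the radioligand) can be sketched as follows. This is a minimal illustration under stated assumptions: the counts and the specific activity in the example are hypothetical, the function names are not from this disclosure, and counts are assumed to be already corrected to disintegrations per minute (dpm).

```python
from statistics import mean

DPM_PER_CURIE = 2.22e12  # disintegrations per minute in one curie


def specific_binding_dpm(total_dpm, nonspecific_dpm):
    """Mean total binding minus mean non-specific binding (replicate lists)."""
    return mean(total_dpm) - mean(nonspecific_dpm)


def dpm_to_fmol(dpm, specific_activity_ci_per_mmol):
    """Convert bound radioactivity to femtomoles of ligand."""
    mmol = dpm / (DPM_PER_CURIE * specific_activity_ci_per_mmol)
    return mmol * 1e12  # 1 mmol = 1e12 fmol


# Hypothetical triplicate counts for one construct (not data from this study).
total = [5200, 5000, 5100]        # 10 nM [3H]DHA alone
nonspecific = [300, 320, 280]     # plus 10 uM alprenolol
specific = specific_binding_dpm(total, nonspecific)

# Hypothetical specific activity; use the value supplied with the radioligand lot.
print(specific, "dpm specific binding")
print(round(dpm_to_fmol(specific, 100.0), 2), "fmol receptor-bound [3H]DHA")
```

Because the constructs are compared at a single near-saturating radioligand concentration, the converted value serves as a relative measure of expression between constructs.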
[0193] Saturation and Competition Binding Assays.
[0194] Membranes from Sf9 cells expressing either wild-type
.beta..sub.2AR or T4L-.beta..sub.2AR-.DELTA.-ICL3 were prepared based on
a previously described protocol.sup.12. In each reaction for the
saturation binding assay, membranes containing approximately 0.2
pmol receptor were incubated with concentrations of [.sup.3H]DHA
ranging from 5 pM to 10 nM in 500 .mu.l of buffer (75 mM Tris, 12.5
mM MgCl.sub.2, 1 mM EDTA, pH 7.4, supplemented with 0.5 mg/ml BSA) at
room temperature with shaking at 230 rpm for 1 hour. Membranes were
isolated from free [.sup.3H]DHA using a Brandel harvester and
washed three times with cold buffer. The amount of receptor bound
[.sup.3H]DHA was measured using a scintillation counter (Beckman).
Non-specific binding of the [.sup.3H]DHA in each reaction was
assessed by including 1 .mu.M alprenolol (Sigma) in the same
reaction. In each reaction for the competition binding assay,
membranes containing approximately 0.2 pmol receptor were incubated
with 1 nM [.sup.3H]DHA and different concentrations of
(-)-isoproterenol (Sigma) ranging from 1 nM to 1 mM. Membranes were
harvested and washed three times with cold buffer. The bound
[.sup.3H]DHA was counted as described above. Non-specific
[.sup.3H]DHA binding was assessed by replacing (-)-isoproterenol
with 1 .mu.M alprenolol. All binding data were analyzed by
non-linear regression using GraphPad Prism. Each experiment was
performed in triplicate.
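The text states only that the binding data were analyzed by non-linear regression; the particular models are not given. As a hedged illustration, the Python sketch below assumes the standard one-site saturation model, B = B.sub.max[L]/(K.sub.d + [L]), and the Cheng-Prusoff relation, K.sub.i = IC.sub.50/(1 + [L]/K.sub.d), for the competition assay. The function names, the simple grid-search fit, and all numeric values other than the 1 nM radioligand concentration and the 5 pM to 10 nM range are assumptions for illustration, not the analysis actually performed.

```python
def one_site_binding(ligand_nM, bmax, kd_nM):
    """Specific binding predicted by a one-site saturation model."""
    return bmax * ligand_nM / (kd_nM + ligand_nM)


def fit_one_site(ligand_nM, bound, kd_grid):
    """Least-squares fit of (Bmax, Kd) by scanning a grid of candidate Kd values.

    For each fixed Kd the model is linear in Bmax, so the best Bmax has a
    closed form; the Kd with the smallest residual sum of squares is kept.
    """
    best = None
    for kd in kd_grid:
        frac = [conc / (kd + conc) for conc in ligand_nM]
        bmax = sum(y * f for y, f in zip(bound, frac)) / sum(f * f for f in frac)
        sse = sum((y - bmax * f) ** 2 for y, f in zip(bound, frac))
        if best is None or sse < best[0]:
            best = (sse, bmax, kd)
    return best[1], best[2]


def cheng_prusoff_ki(ic50_nM, radioligand_nM, kd_nM):
    """Convert a competition IC50 to Ki for a competitive inhibitor."""
    return ic50_nM / (1.0 + radioligand_nM / kd_nM)


# Synthetic, noise-free saturation data (hypothetical Bmax = 200, Kd = 0.4 nM)
# over the 5 pM to 10 nM radioligand range described in the text.
conc = [0.005, 0.02, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
data = [one_site_binding(c, 200.0, 0.4) for c in conc]
grid = [0.05 * k for k in range(1, 41)]  # candidate Kd values, 0.05 to 2.0 nM
bmax_fit, kd_fit = fit_one_site(conc, data, grid)
print("fitted Bmax and Kd:", round(bmax_fit, 3), round(kd_fit, 3))

# Hypothetical IC50 of 300 nM measured against 1 nM radioligand, Kd 0.5 nM.
print("Ki (nM):", round(cheng_prusoff_ki(300.0, 1.0, 0.5), 3))
```

A dedicated curve-fitting package would normally also report confidence intervals; the grid search here is only meant to make the model explicit.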
[0195] Expression and Purification of
T4L-.beta..sub.2AR-.DELTA.-ICL3 from Baculovirus-Infected Sf9
Cells
[0196] Recombinant baculovirus was made from
pVL1393-T4L-.beta..sub.2AR-.DELTA.-ICL3 using the Best-Bac expression
system, as described in the system protocol (Expression Systems).
T4L-.beta..sub.2AR-.DELTA.-ICL3 was expressed by infecting Sf9
cells at a density of 4 million/ml with a second passage
baculovirus stock at a virus to cell ratio of 1:50. 1 .mu.M of the
antagonist alprenolol was included to enhance the receptor
stability and yield. The infected cells were harvested after 48 hours
of incubation at 27.degree. C.
[0197] Cell pellets were lysed by vigorous stirring in lysis buffer
(10 mM TRIS-Cl pH 7.5, 2 mM EDTA, 10 ml of buffer per gram of cell
pellet) supplemented with the protease inhibitors leupeptin (2.5
.mu.g/ml final concentration, Sigma) and benzamidine (160 .mu.g/ml
final concentration, Sigma) for 15 minutes. The
T4L-.beta..sub.2AR-.DELTA.-ICL3 protein was extracted from the cell
membrane by dounce homogenization in solubilization buffer (100 mM
NaCl, 20 mM TRIS-Cl, pH 7.5, 1% dodecylmaltoside) supplemented with
leupeptin and benzamidine (2.5 .mu.g/ml and 160 .mu.g/ml final
concentration, respectively). 10 ml of solubilization buffer was
used for each gram of cell pellet. The dodecylmaltoside
(DDM)-solubilized T4L-.beta..sub.2AR-.DELTA.-ICL3 bearing the FLAG
epitope was then purified by M1 antibody affinity chromatography
(Sigma). Extensive washing using HLS buffer (100 mM NaCl, 20 mM
HEPES pH 7.5, 0.1% DDM) was performed to remove alprenolol. The
protein was then eluted with HLS buffer supplemented with 5 mM
EDTA, 200 .mu.g/ml free FLAG peptide and a saturating concentration
of cholesterol hemisuccinate.
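The scaling rule in the lysis and solubilization steps above (10 ml of buffer per gram of cell pellet, with protease inhibitors at fixed final concentrations) can be expressed as a short calculation. The sketch below is illustrative: the ratio and concentrations are those stated in the protocol, whereas the function name and the 25 g pellet in the example are hypothetical.

```python
# Values taken from the protocol text above.
BUFFER_ML_PER_G = 10.0          # ml of buffer per gram of cell pellet
LEUPEPTIN_UG_PER_ML = 2.5       # final concentration
BENZAMIDINE_UG_PER_ML = 160.0   # final concentration


def inhibitor_amounts(pellet_g):
    """Buffer volume and inhibitor masses needed for a pellet of the given mass."""
    volume_ml = pellet_g * BUFFER_ML_PER_G
    return {
        "buffer_ml": volume_ml,
        "leupeptin_ug": volume_ml * LEUPEPTIN_UG_PER_ML,
        "benzamidine_mg": volume_ml * BENZAMIDINE_UG_PER_ML / 1000.0,
    }


# Hypothetical 25 g pellet.
print(inhibitor_amounts(25.0))
```

The same amounts apply to each of the two buffers, since both are used at 10 ml per gram with the same inhibitor concentrations.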
[0198] The eluted T4L-.beta..sub.2AR-.DELTA.-ICL3 was further
purified by affinity chromatography using alprenolol-Sepharose as
previously described .sup.3 in order to isolate functional
T4L-.beta..sub.2AR-.DELTA.-ICL3 from non-functional protein. HHS
buffer (350 mM NaCl, 20 mM HEPES pH 7.5, 0.1% DDM) supplemented
with 300 .mu.M alprenolol and a saturating concentration of
cholesterol hemisuccinate was used to elute the protein. The eluted
T4L-.beta..sub.2AR-.DELTA.-ICL3 bound with alprenolol was then
re-applied to M1 resin, allowing exchange of alprenolol for
carazolol in HHS buffer supplemented with 30 nM carazolol.
T4L-.beta..sub.2AR-.DELTA.-ICL3 bound with carazolol was then
eluted from M1 resin with HHS buffer supplemented with 5 mM EDTA,
200 .mu.g/ml free FLAG peptide and a saturating concentration of
cholesterol hemisuccinate. The FLAG epitope tag of
T4L-.beta..sub.2AR-.DELTA.-ICL3 was removed by treatment with
tobacco etch virus (TEV) protease (Invitrogen) for 3 hours at room
temperature or overnight at 4.degree. C. The untagged
T4L-.beta..sub.2AR-.DELTA.-ICL3-carazolol complex was then further
purified by size exclusion chromatography (SEC) using an S200 column
(GE Healthcare) equilibrated in 100 mM NaCl, 10 mM HEPES pH 7.5,
0.1% DDM and 1 nM carazolol. The same buffer was used as the running
buffer for SEC. The purity of the final
T4L-.beta..sub.2AR-.DELTA.-ICL3 was greater than 90% as judged by
SDS-PAGE.
[0199] Crystallization of the
T4L-.beta..sub.2AR-.DELTA.-ICL3-Carazolol Complex
[0200] The purified T4L-.beta..sub.2AR-.DELTA.-ICL3-carazolol
complex was concentrated to a final concentration of 60 mg/ml using
a Vivaspin centrifugal concentrator (GE Healthcare). The complex was
crystallized using the lipidic cubic phase (LCP) method as previously
described.sup.3. The protein complex was mixed with the lipid
monoolein at a 1:1.5 mass ratio at room temperature. 0.030 of the
protein-lipid mixture drop was deposited in each well of a 96-well
glass sandwich plate (Molecular Dimensions). The drop was then
overlaid with 0.65 .mu.l of precipitant and the well was sealed with
a glass coverslip. By using this method, the
T4L-.beta..sub.2AR-.DELTA.-ICL3-carazolol complex was crystallized
in 37% PEG300 (v/v), 0.1 M Bis-Tris propane, pH 6.5, 0.1 M ammonium
phosphate after 2 days of incubation at 20.degree. C.
[0201] Data Collection and Structure Determination
[0202] The crystals were harvested and frozen in liquid nitrogen
directly without using additional cryo-protectant. Diffraction data
from 15 different crystals were collected using the GM/CA-CAT
minibeam at 23-ID-D, Advanced Photon Source, Argonne National
Laboratory. The data were processed with HKL2000 and the structure
was solved by molecular replacement using Molrep. Further model
rebuilding was performed using Coot and the structure was refined
with Phenix. The final structural model was validated using
MolProbity. Data processing and refinement statistics are shown in
Table 4.
Results III
[0203] T4 lysozyme was fused to the N-terminus of the .beta..sub.2
adrenergic receptor (.beta..sub.2AR), a G-protein coupled receptor
(GPCR) for catecholamines. The N-terminally fused T4L is
sufficiently rigid relative to the receptor to facilitate
crystallogenesis without thermostabilizing mutations or the use of
a stabilizing antibody, G protein, or protein fused to the 3rd
intracellular loop. This approach adds to the protein engineering
strategies that enable crystallographic studies of GPCRs alone or
in complex with a signaling partner.
[0204] The N terminus of the .beta..sub.2AR was replaced with T4
lysozyme to produce a T4L-GPCR fusion. To have a T4L-.beta..sub.2AR
construct suitable for crystallization, the link between T4L and
the receptor should be relatively short and rigid, yet not
interfere with receptor function. Several different constructs were
generated and examined for expression levels and binding properties
(FIG. 22). In an effort to generate a rigid interaction between T4L
and the .beta..sub.2AR, we removed the relatively flexible
C-terminus of the T4L and attempted to fuse the remaining C
terminal helix of T4L with the extracellular end of TM1 of the
.beta..sub.2AR. None of these constructs gave sufficient amounts of
functional receptor.
[0205] In the second approach, we fused the carboxyl terminus of
T4L to D29, the first amino acid of the extracellular helical
extension of TM 1. Four constructs were generated and examined:
direct fusion of T4L to D29, and the inclusion of 1-3 Ala residues
between T4L and the .beta..sub.2AR (FIG. 22). The highest level of
expression was obtained from the fusion with a 2-Ala linker. The
fusion protein had normal pharmacology and G protein coupling. To
improve expression, two additional point mutations M96T and M98T
were made in the .beta..sub.2AR component of the fusion protein. We
have previously observed that mutation of these residues, which are
located in the first extracellular loop and face away from the
protein, had no effect on receptor function, but enhanced
expression by up to two-fold. We were able to produce 1.5 mg of
pure, functional protein from 1 liter of Sf9 cells.
[0206] This version of T4L-.beta..sub.2AR was recently used to
obtain the crystal structure of the .beta..sub.2AR-Gs complex.
However, most of the lattice contacts in this crystal are
mediated by Gs, and the N-terminally fused T4L does not
pack against the extracellular surface of its fused .beta..sub.2AR
(FIG. 24). The lack of interactions between T4L and the
extracellular surface of the .beta..sub.2AR in the
.beta..sub.2AR-Gs complex suggested that T4L fused to the N
terminus of the .beta..sub.2AR might not be sufficiently
constrained to facilitate crystallogenesis in the absence of the
cytoplasmic G protein. Nevertheless, the amino terminal T4L
facilitated crystallogenesis in the absence of a soluble protein
bound or fused to the third intracellular loop. Additional
modifications were made
to minimize unstructured sequence in the third intracellular loop
and carboxyl terminus (FIG. 22). The C-terminus was truncated after
amino acid 365. The 3.sup.rd intracellular loop (ICL3) of
.beta..sub.2AR is another flexible region and it is subject to
proteolysis. This loop was truncated in the fusion protein by
removing residues 235 to 263. The final construct
T4L-.beta..sub.2AR-.DELTA.-ICL3 is illustrated in FIG. 22.
[0207] To determine the functional integrity of
T4L-.beta..sub.2AR-.DELTA.-ICL3, agonist and antagonist binding
affinities were determined. The ligand binding pocket is formed by
amino acids from four transmembrane domains and is therefore very
sensitive to any perturbation of the receptor structure.
T4L-.beta..sub.2AR-.DELTA.-ICL3 exhibits ligand binding affinities
for the antagonist [.sup.3H]-dihydroalprenolol and the agonist
isoproterenol that are comparable to those of the wild type
receptor (FIG. 25).
[0208] Purified T4L-.beta..sub.2AR-.DELTA.-ICL3 bound to the
inverse agonist carazolol crystallized as small rods in lipid cubic
phase (37% PEG300 (v/v), 0.1M Bis-Tris propane, pH 6.5, 0.1 M
ammonium phosphate). Crystals diffracted to a resolution of 3.3
.ANG.; however, due to radiation damage, our dataset was limited to
4.0 .ANG. (Table 4). Nevertheless, the dataset allowed us to solve the
structure by molecular replacement. The interaction between the
.beta..sub.2AR and T4L is sufficiently rigid to detect electron
density for the 2 Ala link between these two proteins (FIG. 26).
This link was not detectable in the electron density map of the
.beta..sub.2AR-Gs structure (FIG. 24). In the
T4L-.beta..sub.2AR-.DELTA.-ICL3 crystal lattice, the packing
interactions are primarily mediated by T4L and there are no
contacts between adjacent receptors (FIG. 23), indicating the
important role of the T4L in facilitating GPCR crystallization.
Each T4L has four packing interactions: 1-against ECL1 and ECL2 of
its fused .beta..sub.2AR-.DELTA.-ICL3, 2-against T4L of one
adjacent T4L-.beta..sub.2AR-.DELTA.-ICL3, 3-against T4L, ECL2 and
ECL3 of a second T4L-.beta..sub.2AR-.DELTA.-ICL3, and 4-against
ICL3 and Helix 8 of a third T4L-.beta..sub.2AR-.DELTA.-ICL3 (FIG.
23).
[0209] The structures of the .beta..sub.2AR in
T4L-.beta..sub.2AR-.DELTA.-ICL3 and .beta..sub.2AR-T4L (PDB 2RH1) are
very similar to each other (FIG. 27), with an overall root mean
square deviation of 0.48 .ANG.. Only minor differences can be
observed in these two structures, presumably due to different
crystal packing patterns. The similarity of the structures
determined independently through different strategies further
validates the fusion protein approach, demonstrating that
structural distortions due to protein engineering or crystal
packing are unlikely.
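For reference, the root mean square deviation quoted above is the square root of the mean squared distance between equivalent atoms in two structures. The Python sketch below shows that calculation for two coordinate sets that are assumed to be already superposed and listed in matching order; it does not perform the superposition itself. The disclosure does not state which atoms or which software were used for the 0.48 .ANG. value, and the coordinates in the example are toy values, not coordinates from either structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z).

    Assumes the two structures have already been superposed and that the
    atoms are listed in the same order in both lists.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms (hypothetical).
structure_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_2 = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
print(round(rmsd(structure_1, structure_2), 3))
```

In practice the value depends on the atom selection (for example, C.alpha. atoms only versus all atoms) and on the residues included in the alignment.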
[0210] Of interest, ICL2 in the two inactive structures of
.beta..sub.2AR-Fab5 and .beta..sub.2AR-T4L is in an extended loop
while it is an alpha helix in both active structures: the
.beta..sub.2AR-Gs complex and the .beta..sub.2AR stabilized by
Nb80. In both of the inactive structures (.beta..sub.2AR-Fab5 and
.beta..sub.2AR-T4L), ICL2 participates in lattice contacts that may
influence its conformation. However, in the
T4L-.beta..sub.2AR-.DELTA.-ICL3 structure ICL2 is not involved in
packing interactions, yet it is an extended loop nearly identical
to that observed in the other inactive state .beta..sub.2AR
structures (FIG. 27). Thus, this extended loop structure may
reflect an inactive state.
[0211] In conclusion, fusion of T4L to the amino terminus of a GPCR
can facilitate crystallogenesis. This approach can also facilitate
the formation of crystals of a GPCR in complex with a cytoplasmic
signaling protein.
[0212] FIG. 28 shows the structure of the T4L-.beta..sub.2AR
fusion bound to salmeterol, a partial agonist used to treat asthma.
In this structure, the partially active state is stabilized by a
nanobody (nanobody 71). This structure was obtained using similar
methods to those described above.
TABLE-US-00005 TABLE 4

Data collection
  Space group                               P2.sub.12.sub.12.sub.1
  Unit cell dimensions a, b, c (.ANG.)      51.4, 71.4, 161.4
  Resolution (.ANG.)                        50-4.0 (4.07-4.00)*
  R.sub.merge                               0.199 (0.799)
  <I/.sigma.I>                              8.4 (1.5)
  Completeness (%)                          84.3 (71.2)
  Multiplicity                              4.7 (3.7)

Refinement
  Resolution (.ANG.)                        30-3.99
  No. reflections work/free                 4547/691
  R.sub.work/R.sub.free                     0.267/0.293
  No. atoms                                 3623
  Average B values (.ANG..sup.2)
    Receptor                                197
    T4L                                     177
    Carazolol                               160
  Overall anisotropic B (.ANG..sup.2)
    B11/B22/B33                             -21.2/59.3/-38.0
  R.m.s deviations
    Bond lengths (.ANG.)                    0.004
    Bond angles (.degree.)                  0.6764
  Ramachandran plot**
    % favored                               96.4
    allowed                                 3.6
    generously allowed                      0.0
    disallowed                              0.0

*High resolution shell in parenthesis.
**As defined by MolProbity.
$R_{merge} = \sum_{hkl} \sum_{i} |I_i - \langle I \rangle| / \sum_{hkl} \sum_{i} I_i$
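The R.sub.merge definition given with Table 4 can be illustrated with a short calculation. The Python sketch below follows the formula literally, summing over every reflection and every measurement of it; data processing programs may apply additional conventions (for example, excluding reflections measured only once) that are not described here. The function name and the intensities in the example are hypothetical.

```python
def r_merge(observations):
    """R_merge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i.

    `observations` maps each reflection index (h, k, l) to the list of its
    repeated intensity measurements; <I> is the mean for that reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Hypothetical measurements for two reflections.
example = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 40.0, 60.0],
}
print(round(r_merge(example), 4))
```

Applied per resolution shell, the same calculation yields the shell-by-shell values of the kind reported in parentheses in Table 4.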
Sequence CWU 1
1
291530PRTArtificial Sequencesynthetic fusion protein 1Met Lys Thr
Ile Ile Ala Leu Ser Tyr Ile Phe Cys Leu Val Phe Ala 1 5 10 15 Asp
Tyr Lys Asp Asp Asp Asp Ala Glu Asn Leu Tyr Phe Gln Gly Asn 20 25
30 Ile Phe Glu Met Leu Arg Ile Asp Glu Gly Leu Arg Leu Lys Ile Tyr
35 40 45 Lys Asp Thr Glu Gly Tyr Tyr Thr Ile Gly Ile Gly His Leu
Leu Thr 50 55 60 Lys Ser Pro Ser Leu Asn Ala Ala Lys Ser Glu Leu
Asp Lys Ala Ile 65 70 75 80 Gly Arg Asn Thr Asn Gly Val Ile Thr Lys
Asp Glu Ala Glu Lys Leu 85 90 95 Phe Asn Gln Asp Val Asp Ala Ala
Val Arg Gly Ile Leu Arg Asn Ala 100 105 110 Lys Leu Lys Pro Val Tyr
Asp Ser Leu Asp Ala Val Arg Arg Ala Ala 115 120 125 Leu Ile Asn Met
Val Phe Gln Met Gly Glu Thr Gly Val Ala Gly Phe 130 135 140 Thr Asn
Ser Leu Arg Met Leu Gln Gln Lys Arg Trp Asp Glu Ala Ala 145 150 155
160 Val Asn Leu Ala Lys Ser Arg Trp Tyr Asn Gln Thr Pro Asn Arg Ala
165 170 175 Lys Arg Val Ile Thr Thr Phe Arg Thr Gly Thr Trp Asp Ala
Tyr Ala 180 185 190 Ala Asp Glu Val Trp Val Val Gly Met Gly Ile Val
Met Ser Leu Ile 195 200 205 Val Leu Ala Ile Val Phe Gly Asn Val Leu
Val Ile Thr Ala Ile Ala 210 215 220 Lys Phe Glu Arg Leu Gln Thr Val
Thr Asn Tyr Phe Ile Thr Ser Leu 225 230 235 240 Ala Cys Ala Asp Leu
Val Met Gly Leu Ala Val Val Pro Phe Gly Ala 245 250 255 Ala His Ile
Leu Thr Lys Thr Trp Thr Phe Gly Asn Phe Trp Cys Glu 260 265 270 Phe
Trp Thr Ser Ile Asp Val Leu Cys Val Thr Ala Ser Ile Glu Thr 275 280
285 Leu Cys Val Ile Ala Val Asp Arg Tyr Phe Ala Ile Thr Ser Pro Phe
290 295 300 Lys Tyr Gln Ser Leu Leu Thr Lys Asn Lys Ala Arg Val Ile
Ile Leu 305 310 315 320 Met Val Trp Ile Val Ser Gly Leu Thr Ser Phe
Leu Pro Ile Gln Met 325 330 335 His Trp Tyr Arg Ala Thr His Gln Glu
Ala Ile Asn Cys Tyr Ala Glu 340 345 350 Glu Thr Cys Cys Asp Phe Phe
Thr Asn Gln Ala Tyr Ala Ile Ala Ser 355 360 365 Ser Ile Val Ser Phe
Tyr Val Pro Leu Val Ile Met Val Phe Val Tyr 370 375 380 Ser Arg Val
Phe Gln Glu Ala Lys Arg Gln Leu Gln Lys Ile Asp Lys 385 390 395 400
Ser Glu Gly Arg Phe His Val Gln Asn Leu Ser Gln Val Glu Gln Asp 405
410 415 Gly Arg Thr Gly His Gly Leu Arg Arg Ser Ser Lys Phe Cys Leu
Lys 420 425 430 Glu His Lys Ala Leu Lys Thr Leu Gly Ile Ile Met Gly
Thr Phe Thr 435 440 445 Leu Cys Trp Leu Pro Phe Phe Ile Val Asn Ile
Val His Val Ile Gln 450 455 460 Asp Asn Leu Ile Arg Lys Glu Val Tyr
Ile Leu Leu Asn Trp Ile Gly 465 470 475 480 Tyr Val Asn Ser Gly Phe
Asn Pro Leu Ile Tyr Cys Arg Ser Pro Asp 485 490 495 Phe Arg Ile Ala
Phe Gln Glu Leu Leu Cys Leu Arg Arg Ser Ser Leu 500 505 510 Lys Ala
Tyr Gly Asn Gly Tyr Ser Ser Asn Gly Asn Thr Gly Glu Gln 515 520 525
Ser Gly 530 258PRTBos taurus 2Arg Pro Asp Phe Cys Leu Glu Pro Pro
Tyr Thr Gly Pro Cys Lys Ala 1 5 10 15 Arg Ile Ile Arg Tyr Phe Tyr
Asn Ala Lys Ala Gly Leu Cys Gln Thr 20 25 30 Phe Val Tyr Gly Gly
Cys Arg Ala Lys Arg Asn Asn Phe Lys Ser Ala 35 40 45 Glu Asp Cys
Met Arg Thr Cys Gly Gly Ala 50 55 376PRTBos taurus 3Met Lys Ser Pro
Glu Glu Leu Lys Gly Ile Phe Glu Lys Tyr Ala Ala 1 5 10 15 Lys Glu
Gly Asp Pro Asn Gln Leu Ser Lys Glu Glu Leu Lys Leu Leu 20 25 30
Leu Gln Thr Glu Phe Pro Ser Leu Leu Lys Gly Pro Ser Thr Leu Asp 35
40 45 Glu Leu Phe Glu Glu Leu Asp Lys Asn Gly Asp Gly Glu Val Ser
Phe 50 55 60 Glu Glu Phe Gln Val Leu Val Lys Lys Ile Ser Gln 65 70
75 4111PRTBacillus amyloliquefaciens 4Met Ala Gln Val Ile Asn Thr
Phe Asp Gly Val Ala Asp Tyr Leu Gln 1 5 10 15 Thr Tyr His Lys Leu
Pro Asp Asn Tyr Ile Thr Lys Ser Glu Ala Gln 20 25 30 Ala Leu Gly
Trp Val Ala Ser Lys Gly Asn Leu Ala Asp Val Ala Pro 35 40 45 Gly
Lys Ser Ile Gly Gly Asp Ile Phe Ser Asn Arg Glu Gly Lys Leu 50 55
60 Pro Gly Lys Ser Gly Arg Thr Trp Arg Glu Ala Asp Ile Asn Tyr Thr
65 70 75 80 Ser Gly Phe Arg Asn Ser Asp Arg Ile Leu Tyr Ser Ser Asp
Trp Leu 85 90 95 Ile Tyr Lys Thr Thr Asp His Tyr Gln Thr Phe Thr
Lys Ile Arg 100 105 110 5190PRTTrichoderma reesei 5Glu Thr Ile Gln
Pro Gly Thr Gly Tyr Asn Asn Gly Tyr Phe Tyr Ser 1 5 10 15 Tyr Trp
Asn Asp Gly His Gly Gly Val Thr Tyr Thr Asn Gly Pro Gly 20 25 30
Gly Gln Phe Ser Val Asn Trp Ser Asn Ser Gly Asn Phe Val Gly Gly 35
40 45 Lys Gly Trp Gln Pro Gly Thr Lys Asn Lys Val Ile Asn Phe Ser
Gly 50 55 60 Ser Tyr Asn Pro Asn Gly Asn Ser Tyr Leu Ser Val Tyr
Gly Trp Ser 65 70 75 80 Arg Asn Pro Leu Ile Glu Tyr Tyr Ile Val Glu
Asn Phe Gly Thr Tyr 85 90 95 Asn Pro Ser Thr Gly Ala Thr Lys Leu
Gly Glu Val Thr Ser Asp Gly 100 105 110 Ser Val Tyr Asp Ile Tyr Arg
Thr Gln Arg Val Asn Gln Pro Ser Ile 115 120 125 Ile Gly Thr Ala Thr
Phe Tyr Gln Tyr Trp Ser Val Arg Arg Asn His 130 135 140 Arg Ser Ser
Gly Ser Val Asn Thr Ala Asn His Phe Asn Ala Trp Ala 145 150 155 160
Gln Gln Gly Leu Thr Leu Gly Thr Met Asp Tyr Gln Ile Val Ala Val 165
170 175 Glu Gly Tyr Phe Ser Ser Gly Ser Ala Ser Ile Thr Val Ser 180
185 190 6455PRTPyrococcus furiosus 6Met Pro Thr Trp Glu Glu Leu Tyr
Lys Asn Ala Ile Glu Lys Ala Ile 1 5 10 15 Lys Ser Val Pro Lys Val
Lys Gly Val Leu Leu Gly Tyr Asn Thr Asn 20 25 30 Ile Asp Ala Ile
Lys Tyr Leu Asp Ser Lys Asp Leu Glu Glu Arg Ile 35 40 45 Ile Lys
Ala Gly Lys Glu Glu Val Ile Lys Tyr Ser Glu Glu Leu Pro 50 55 60
Asp Lys Ile Asn Thr Val Ser Gln Leu Leu Gly Ser Ile Leu Trp Ser 65
70 75 80 Ile Arg Arg Gly Lys Ala Ala Glu Leu Phe Val Glu Ser Cys
Pro Val 85 90 95 Arg Phe Tyr Met Lys Arg Trp Gly Trp Asn Glu Leu
Arg Met Gly Gly 100 105 110 Gln Ala Gly Ile Met Ala Asn Leu Leu Gly
Gly Val Tyr Gly Val Pro 115 120 125 Val Ile Val His Val Pro Gln Leu
Ser Arg Leu Gln Ala Asn Leu Phe 130 135 140 Leu Asp Gly Pro Ile Tyr
Val Pro Thr Leu Glu Asn Gly Glu Val Lys 145 150 155 160 Leu Ile His
Pro Lys Glu Phe Ser Gly Asp Glu Glu Asn Cys Ile His 165 170 175 Tyr
Ile Tyr Glu Phe Pro Arg Gly Phe Arg Val Phe Glu Phe Glu Ala 180 185
190 Pro Arg Glu Asn Arg Phe Ile Gly Ser Ala Asp Asp Tyr Asn Thr Thr
195 200 205 Leu Phe Ile Arg Glu Glu Phe Arg Glu Ser Phe Ser Glu Val
Ile Lys 210 215 220 Asn Val Gln Leu Ala Ile Leu Ser Gly Leu Gln Ala
Leu Thr Lys Glu 225 230 235 240 Asn Tyr Lys Glu Pro Phe Glu Ile Val
Lys Ser Asn Leu Glu Val Leu 245 250 255 Asn Glu Arg Glu Ile Pro Val
His Leu Glu Phe Ala Phe Thr Pro Asp 260 265 270 Glu Lys Val Arg Glu
Glu Ile Leu Asn Val Leu Gly Met Phe Tyr Ser 275 280 285 Val Gly Leu
Asn Glu Val Glu Leu Ala Ser Ile Met Glu Ile Leu Gly 290 295 300 Glu
Lys Lys Leu Ala Lys Glu Leu Leu Ala His Asp Pro Val Asp Pro 305 310
315 320 Ile Ala Val Thr Glu Ala Met Leu Lys Leu Ala Lys Lys Thr Gly
Val 325 330 335 Lys Arg Ile His Phe His Thr Tyr Gly Tyr Tyr Leu Ala
Leu Thr Glu 340 345 350 Tyr Lys Gly Glu His Val Arg Asp Ala Leu Leu
Phe Ala Ala Leu Ala 355 360 365 Ala Ala Ala Lys Ala Met Lys Gly Asn
Ile Thr Ser Leu Glu Glu Ile 370 375 380 Arg Glu Ala Thr Ser Val Pro
Val Asn Glu Lys Ala Thr Gln Val Glu 385 390 395 400 Glu Lys Leu Arg
Ala Glu Tyr Gly Ile Lys Glu Gly Ile Gly Glu Val 405 410 415 Glu Gly
Tyr Gln Ile Ala Phe Ile Pro Thr Lys Ile Val Ala Lys Pro 420 425 430
Lys Ser Thr Val Gly Ile Gly Asp Thr Ile Ser Ser Ser Ala Phe Ile 435
440 445 Gly Glu Phe Ser Phe Thr Leu 450 455
7 576 PRT Artificial Sequence synthetic fusion protein
7 Met Lys Thr Ile Ile Ala Leu Ser
Tyr Ile Phe Cys Leu Val Phe Ala 1 5 10 15 Asp Tyr Lys Asp Asp Asp
Asp Ala Glu Asn Leu Tyr Phe Gln Gly Asn 20 25 30 Ile Phe Glu Met
Leu Arg Ile Asp Glu Gly Leu Arg Leu Lys Ile Tyr 35 40 45 Lys Asp
Thr Glu Gly Tyr Tyr Thr Ile Gly Ile Gly His Leu Leu Thr 50 55 60
Lys Ser Pro Ser Leu Asn Ala Ala Lys Ser Glu Leu Asp Lys Ala Ile 65
70 75 80 Gly Arg Asn Thr Asn Gly Val Ile Thr Lys Asp Glu Ala Glu
Lys Leu 85 90 95 Phe Asn Gln Asp Val Asp Ala Ala Val Arg Gly Ile
Leu Arg Asn Ala 100 105 110 Lys Leu Lys Pro Val Tyr Asp Ser Leu Asp
Ala Val Arg Arg Ala Ala 115 120 125 Leu Ile Asn Met Val Phe Gln Met
Gly Glu Thr Gly Val Ala Gly Phe 130 135 140 Thr Asn Ser Leu Arg Met
Leu Gln Gln Lys Arg Trp Asp Glu Ala Ala 145 150 155 160 Val Asn Leu
Ala Lys Ser Arg Trp Tyr Asn Gln Thr Pro Asn Arg Ala 165 170 175 Lys
Arg Val Ile Thr Thr Phe Arg Thr Gly Thr Trp Asp Ala Tyr Ala 180 185
190 Ala Thr Ala Cys Lys Ile Thr Ile Thr Val Val Leu Ala Val Leu Ile
195 200 205 Leu Ile Thr Val Ala Gly Asn Val Val Val Cys Leu Ala Val
Gly Leu 210 215 220 Asn Arg Arg Leu Arg Asn Leu Thr Asn Cys Phe Ile
Val Ser Leu Ala 225 230 235 240 Ile Thr Asp Leu Leu Leu Gly Leu Leu
Val Leu Pro Phe Ser Ala Ile 245 250 255 Tyr Gln Leu Ser Cys Lys Trp
Ser Phe Gly Lys Val Phe Cys Asn Ile 260 265 270 Tyr Thr Ser Leu Asp
Val Met Leu Cys Thr Ala Ser Ile Leu Asn Leu 275 280 285 Phe Met Ile
Ser Leu Asp Arg Tyr Cys Ala Val Met Asp Pro Leu Arg 290 295 300 Tyr
Pro Val Leu Val Thr Pro Val Arg Val Ala Ile Ser Leu Val Leu 305 310
315 320 Ile Trp Val Ile Ser Ile Thr Leu Ser Phe Leu Ser Ile His Leu
Gly 325 330 335 Trp Asn Ser Arg Asn Glu Thr Ser Lys Gly Asn His Thr
Thr Ser Lys 340 345 350 Cys Lys Val Gln Val Asn Glu Val Tyr Gly Leu
Val Asp Gly Leu Val 355 360 365 Thr Phe Tyr Leu Pro Leu Leu Ile Met
Cys Ile Thr Tyr Tyr Arg Ile 370 375 380 Phe Lys Val Ala Arg Asp Gln
Ala Lys Arg Ile Asn His Ile Ser Ser 385 390 395 400 Trp Lys Ala Ala
Thr Ile Arg Glu His Lys Ala Thr Val Thr Leu Ala 405 410 415 Ala Val
Met Gly Ala Phe Ile Ile Cys Trp Phe Pro Tyr Phe Thr Ala 420 425 430
Phe Val Tyr Arg Gly Leu Arg Gly Asp Asp Ala Ile Asn Glu Val Leu 435
440 445 Glu Ala Ile Val Leu Trp Leu Gly Tyr Ala Asn Ser Ala Leu Asn
Pro 450 455 460 Ile Leu Tyr Ala Ala Leu Asn Arg Asp Phe Arg Thr Gly
Tyr Gln Gln 465 470 475 480 Leu Phe Cys Cys Arg Leu Ala Asn Arg Asn
Ser His Lys Thr Ser Leu 485 490 495 Arg Ser Asn Ala Ser Gln Leu Ser
Arg Thr Gln Ser Arg Glu Pro Arg 500 505 510 Gln Gln Glu Glu Lys Pro
Leu Lys Leu Gln Val Trp Ser Gly Thr Glu 515 520 525 Val Thr Ala Pro
Gln Gly Ala Thr Asp Arg Pro Trp Leu Cys Leu Pro 530 535 540 Glu Cys
Trp Ser Val Glu Leu Thr His Ser Phe Ile His Leu Phe Ile 545 550 555
560 His Ser Phe Ala Asn Ile His Pro Ile Pro Thr Thr Cys Gln Glu Leu
565 570 575
8 594 PRT Artificial Sequence synthetic fusion protein
8 Met Lys Thr Ile Ile Ala Leu Ser Tyr Ile Phe Cys Leu Val Phe Ala 1 5 10
15 Asp Tyr Lys Asp Asp Asp Asp Ala Glu Asn Leu Tyr Phe Gln Gly Asn
20 25 30 Ile Phe Glu Met Leu Arg Ile Asp Glu Gly Leu Arg Leu Lys
Ile Tyr 35 40 45 Lys Asp Thr Glu Gly Tyr Tyr Thr Ile Gly Ile Gly
His Leu Leu Thr 50 55 60 Lys Ser Pro Ser Leu Asn Ala Ala Lys Ser
Glu Leu Asp Lys Ala Ile 65 70 75 80 Gly Arg Asn Thr Asn Gly Val Ile
Thr Lys Asp Glu Ala Glu Lys Leu 85 90 95 Phe Asn Gln Asp Val Asp
Ala Ala Val Arg Gly Ile Leu Arg Asn Ala 100 105 110 Lys Leu Lys Pro
Val Tyr Asp Ser Leu Asp Ala Val Arg Arg Ala Ala 115 120 125 Leu Ile
Asn Met Val Phe Gln Met Gly Glu Thr Gly Val Ala Gly Phe 130 135 140
Thr Asn Ser Leu Arg Met Leu Gln Gln Lys Arg Trp Asp Glu Ala Ala 145
150 155 160 Val Asn Leu Ala Lys Ser Arg Trp Tyr Asn Gln Thr Pro Asn
Arg Ala 165 170 175 Lys Arg Val Ile Thr Thr Phe Arg Thr Gly Thr Trp
Asp Ala Tyr Ala 180 185 190 Ala Leu Gln Glu Lys Asn Trp Ser Ala Leu
Leu Thr Ala Val Val Ile 195 200 205 Ile Leu Thr Ile Ala Gly Asn Ile
Leu Val Ile Met Ala Val Ser Leu 210 215 220 Glu Lys Lys Leu Gln Asn
Ala Thr Asn Tyr Phe Leu Met Ser Leu Ala 225 230 235 240 Ile Ala Asp
Met Leu Leu Gly Phe Leu Val Met Pro Val Ser Met Leu 245 250 255 Thr
Ile Leu Tyr Gly Tyr Arg Trp Pro Leu Pro Ser Lys Leu Cys Ala 260 265
270 Val Trp Ile Tyr Leu Asp Val Leu Phe Ser Thr Ala Ser Ile Met His
275 280 285 Leu Cys Ala Ile Ser Leu
Asp Arg Tyr Val Ala Ile Gln Asn Pro Ile 290 295 300 His His Ser Arg
Phe Asn Ser Arg Thr Lys Ala Phe Leu Lys Ile Ile 305 310 315 320 Ala
Val Trp Thr Ile Ser Val Gly Ile Ser Met Pro Ile Pro Val Phe 325 330
335 Gly Leu Gln Asp Asp Ser Lys Val Phe Lys Glu Gly Ser Cys Leu Leu
340 345 350 Ala Asp Asp Asn Phe Val Leu Ile Gly Ser Phe Val Ser Phe
Phe Ile 355 360 365 Pro Leu Thr Ile Met Val Ile Thr Tyr Phe Leu Thr
Ile Lys Ser Leu 370 375 380 Gln Lys Glu Ala Thr Leu Cys Val Ser Asp
Leu Gly Thr Arg Ala Lys 385 390 395 400 Leu Ala Ser Phe Ser Phe Leu
Pro Gln Ser Ser Leu Ser Ser Glu Lys 405 410 415 Leu Phe Gln Arg Ser
Ile His Arg Glu Pro Gly Ser Tyr Thr Gly Arg 420 425 430 Arg Thr Met
Gln Ser Ile Ser Asn Glu Gln Lys Ala Cys Lys Val Leu 435 440 445 Gly
Ile Val Phe Phe Leu Phe Val Val Met Trp Cys Pro Phe Phe Ile 450 455
460 Thr Asn Ile Met Ala Val Ile Cys Lys Glu Ser Cys Asn Glu Asp Val
465 470 475 480 Ile Gly Ala Leu Leu Asn Val Phe Val Trp Ile Gly Tyr
Leu Ser Ser 485 490 495 Ala Val Asn Pro Leu Val Tyr Thr Leu Phe Asn
Lys Thr Tyr Arg Ser 500 505 510 Ala Phe Ser Arg Tyr Ile Gln Cys Gln
Tyr Lys Glu Asn Lys Lys Pro 515 520 525 Leu Gln Leu Ile Leu Val Asn
Thr Ile Pro Ala Leu Ala Tyr Lys Ser 530 535 540 Ser Gln Leu Gln Met
Gly Gln Lys Lys Asn Ser Lys Gln Asp Ala Lys 545 550 555 560 Thr Thr
Asp Asn Asp Cys Ser Met Val Ala Leu Gly Lys Gln His Ser 565 570 575
Glu Glu Ala Ser Lys Asp Asn Ser Asp Gly Val Asn Glu Lys Val Ser 580
585 590 Cys Val
9 530 PRT Artificial Sequence synthetic fusion protein
9 Met Lys Thr Ile Ile Ala Leu Ser Tyr Ile Phe Cys Leu Val Phe Ala 1
5 10 15 Asp Tyr Lys Asp Asp Asp Asp Ala Glu Asn Leu Tyr Phe Gln Gly
Asn 20 25 30 Ile Phe Glu Met Leu Arg Ile Asp Glu Gly Leu Arg Leu
Lys Ile Tyr 35 40 45 Lys Asp Thr Glu Gly Tyr Tyr Thr Ile Gly Ile
Gly His Leu Leu Thr 50 55 60 Lys Ser Pro Ser Leu Asn Ala Ala Lys
Ser Glu Leu Asp Lys Ala Ile 65 70 75 80 Gly Arg Asn Thr Asn Gly Val
Ile Thr Lys Asp Glu Ala Glu Lys Leu 85 90 95 Phe Asn Gln Asp Val
Asp Ala Ala Val Arg Gly Ile Leu Arg Asn Ala 100 105 110 Lys Leu Lys
Pro Val Tyr Asp Ser Leu Asp Ala Val Arg Arg Ala Ala 115 120 125 Leu
Ile Asn Met Val Phe Gln Met Gly Glu Thr Gly Val Ala Gly Phe 130 135
140 Thr Asn Ser Leu Arg Met Leu Gln Gln Lys Arg Trp Asp Glu Ala Ala
145 150 155 160 Val Asn Leu Ala Lys Ser Arg Trp Tyr Asn Gln Thr Pro
Asn Arg Ala 165 170 175 Lys Arg Val Ile Thr Thr Phe Arg Thr Gly Thr
Trp Asp Ala Tyr Ala 180 185 190 Ala Arg His Asn Tyr Ile Phe Val Met
Ile Pro Thr Leu Tyr Ser Ile 195 200 205 Ile Phe Val Val Gly Ile Phe
Gly Asn Ser Leu Val Val Ile Val Ile 210 215 220 Tyr Phe Tyr Met Lys
Leu Lys Thr Val Ala Ser Val Phe Leu Leu Asn 225 230 235 240 Leu Ala
Leu Ala Asp Leu Cys Phe Leu Leu Thr Leu Pro Leu Trp Ala 245 250 255
Val Tyr Thr Ala Met Glu Tyr Arg Trp Pro Phe Gly Asn Tyr Leu Cys 260
265 270 Lys Ile Ala Ser Ala Ser Val Ser Phe Asn Leu Tyr Ala Ser Val
Phe 275 280 285 Leu Leu Thr Cys Leu Ser Ile Asp Arg Tyr Leu Ala Ile
Val His Pro 290 295 300 Met Lys Ser Arg Leu Arg Arg Thr Met Leu Val
Ala Lys Val Thr Cys 305 310 315 320 Ile Ile Ile Trp Leu Leu Ala Gly
Leu Ala Ser Leu Pro Ala Ile Ile 325 330 335 His Arg Asn Val Phe Phe
Ile Glu Asn Thr Asn Ile Thr Val Cys Ala 340 345 350 Phe His Tyr Glu
Ser Gln Asn Ser Thr Leu Pro Ile Gly Leu Gly Leu 355 360 365 Thr Lys
Asn Ile Leu Gly Phe Leu Phe Pro Phe Leu Ile Ile Leu Thr 370 375 380
Ser Tyr Thr Leu Ile Trp Lys Ala Leu Lys Lys Ala Tyr Glu Ile Gln 385
390 395 400 Lys Asn Lys Pro Arg Asn Asp Asp Ile Phe Lys Ile Ile Met
Ala Ile 405 410 415 Val Leu Phe Phe Phe Phe Ser Trp Ile Pro His Gln
Ile Phe Thr Phe 420 425 430 Leu Asp Val Leu Ile Gln Leu Gly Ile Ile
Arg Asp Cys Arg Ile Ala 435 440 445 Asp Ile Val Asp Thr Ala Met Pro
Ile Thr Ile Cys Ile Ala Tyr Phe 450 455 460 Asn Asn Cys Leu Asn Pro
Leu Phe Tyr Gly Phe Leu Gly Lys Lys Phe 465 470 475 480 Lys Arg Tyr
Phe Leu Gln Leu Leu Lys Tyr Ile Pro Pro Lys Ala Lys 485 490 495 Ser
His Ser Asn Leu Ser Thr Lys Met Ser Thr Leu Ser Tyr Arg Pro 500 505
510 Ser Asp Asn Val Ser Ser Ser Thr Lys Lys Pro Ala Pro Cys Phe Glu
515 520 525 Val Glu 530
10 528 PRT Artificial Sequence synthetic fusion protein
10 Met Lys Thr Ile Ile Ala Leu Ser Tyr Ile Phe Cys Leu Val
Phe Ala 1 5 10 15 Asp Tyr Lys Asp Asp Asp Asp Ala Glu Asn Leu Tyr
Phe Gln Gly Asn 20 25 30 Ile Phe Glu Met Leu Arg Ile Asp Glu Gly
Leu Arg Leu Lys Ile Tyr 35 40 45 Lys Asp Thr Glu Gly Tyr Tyr Thr
Ile Gly Ile Gly His Leu Leu Thr 50 55 60 Lys Ser Pro Ser Leu Asn
Ala Ala Lys Ser Glu Leu Asp Lys Ala Ile 65 70 75 80 Gly Arg Asn Thr
Asn Gly Val Ile Thr Lys Asp Glu Ala Glu Lys Leu 85 90 95 Phe Asn
Gln Asp Val Asp Ala Ala Val Arg Gly Ile Leu Arg Asn Ala 100 105 110
Lys Leu Lys Pro Val Tyr Asp Ser Leu Asp Ala Val Arg Arg Ala Ala 115
120 125 Leu Ile Asn Met Val Phe Gln Met Gly Glu Thr Gly Val Ala Gly
Phe 130 135 140 Thr Asn Ser Leu Arg Met Leu Gln Gln Lys Arg Trp Asp
Glu Ala Ala 145 150 155 160 Val Asn Leu Ala Lys Ser Arg Trp Tyr Asn
Gln Thr Pro Asn Arg Ala 165 170 175 Lys Arg Val Ile Thr Thr Phe Arg
Thr Gly Thr Trp Asp Ala Tyr Ala 180 185 190 Ala Ser Met Ile Thr Ala
Thr Thr Ile Met Ala Leu Tyr Ser Ile Val 195 200 205 Cys Val Val Gly
Leu Phe Gly Asn Phe Leu Val Met Tyr Val Ile Val 210 215 220 Arg Tyr
Thr Lys Met Lys Thr Ala Thr Asn Ile Tyr Ile Phe Asn Leu 225 230 235
240 Ala Leu Ala Asp Ala Leu Ala Thr Ser Thr Leu Pro Phe Gln Ser Val
245 250 255 Asn Tyr Leu Met Gly Thr Trp Pro Phe Gly Thr Ile Leu Cys
Lys Ile 260 265 270 Val Ile Ser Ile Asp Tyr Tyr Asn Met Phe Thr Ser
Ile Phe Thr Leu 275 280 285 Cys Thr Met Ser Val Asp Arg Tyr Ile Ala
Val Cys His Pro Val Lys 290 295 300 Ala Leu Asp Phe Arg Thr Pro Arg
Asn Ala Lys Ile Ile Asn Val Cys 305 310 315 320 Asn Trp Ile Leu Ser
Ser Ala Ile Gly Leu Pro Val Met Phe Met Ala 325 330 335 Thr Thr Lys
Tyr Arg Gln Gly Ser Ile Asp Cys Thr Leu Thr Phe Ser 340 345 350 His
Pro Thr Trp Tyr Trp Glu Asn Leu Leu Lys Ile Cys Val Phe Ile 355 360
365 Phe Ala Phe Ile Met Pro Val Leu Ile Ile Thr Val Cys Tyr Gly Leu
370 375 380 Met Ile Leu Arg Leu Lys Ser Val Arg Met Leu Ser Gly Ser
Lys Glu 385 390 395 400 Lys Asp Arg Asn Leu Arg Arg Ile Thr Arg Met
Val Leu Val Val Val 405 410 415 Ala Val Phe Ile Val Cys Trp Thr Pro
Ile His Ile Tyr Val Ile Ile 420 425 430 Lys Ala Leu Val Thr Ile Pro
Glu Thr Thr Phe Gln Thr Val Ser Trp 435 440 445 His Phe Cys Ile Ala
Leu Gly Tyr Thr Asn Ser Cys Leu Asn Pro Val 450 455 460 Leu Tyr Ala
Phe Leu Asp Glu Asn Phe Lys Arg Cys Phe Arg Glu Phe 465 470 475 480
Cys Ile Pro Thr Ser Ser Asn Ile Glu Gln Gln Asn Ser Thr Arg Ile 485
490 495 Arg Gln Asn Thr Arg Asp His Pro Ser Thr Ala Asn Thr Val Asp
Arg 500 505 510 Thr Asn His Gln Leu Glu Asn Leu Glu Ala Glu Thr Ala
Pro Leu Pro 515 520 525
11 543 PRT Artificial Sequence synthetic fusion protein
11 Met Lys Thr Ile Ile Ala Leu Ser Tyr Ile Phe Cys Leu Val
Phe Ala 1 5 10 15 Asp Tyr Lys Asp Asp Asp Asp Ala Glu Asn Leu Tyr
Phe Gln Gly Asn 20 25 30 Ile Phe Glu Met Leu Arg Ile Asp Glu Gly
Leu Arg Leu Lys Ile Tyr 35 40 45 Lys Asp Thr Glu Gly Tyr Tyr Thr
Ile Gly Ile Gly His Leu Leu Thr 50 55 60 Lys Ser Pro Ser Leu Asn
Ala Ala Lys Ser Glu Leu Asp Lys Ala Ile 65 70 75 80 Gly Arg Asn Thr
Asn Gly Val Ile Thr Lys Asp Glu Ala Glu Lys Leu 85 90 95 Phe Asn
Gln Asp Val Asp Ala Ala Val Arg Gly Ile Leu Arg Asn Ala 100 105 110
Lys Leu Lys Pro Val Tyr Asp Ser Leu Asp Ala Val Arg Arg Ala Ala 115
120 125 Leu Ile Asn Met Val Phe Gln Met Gly Glu Thr Gly Val Ala Gly
Phe 130 135 140 Thr Asn Ser Leu Arg Met Leu Gln Gln Lys Arg Trp Asp
Glu Ala Ala 145 150 155 160 Val Asn Leu Ala Lys Ser Arg Trp Tyr Asn
Gln Thr Pro Asn Arg Ala 165 170 175 Lys Arg Val Ile Thr Thr Phe Arg
Thr Gly Thr Trp Asp Ala Tyr Ala 180 185 190 Ala Leu Pro Leu Ala Met
Ile Phe Thr Leu Ala Leu Ala Tyr Gly Ala 195 200 205 Val Ile Ile Leu
Gly Val Ser Gly Asn Leu Ala Leu Ile Ile Ile Ile 210 215 220 Leu Lys
Gln Lys Glu Met Arg Asn Val Thr Asn Ile Leu Ile Val Asn 225 230 235
240 Leu Ser Phe Ser Asp Leu Leu Val Ala Ile Met Cys Leu Pro Leu Thr
245 250 255 Phe Val Tyr Thr Leu Met Asp His Trp Val Phe Gly Glu Ala
Met Cys 260 265 270 Lys Leu Asn Pro Phe Val Gln Cys Val Ser Ile Thr
Val Ser Ile Phe 275 280 285 Ser Leu Val Leu Ile Ala Val Glu Arg His
Gln Leu Ile Ile Asn Pro 290 295 300 Arg Gly Trp Arg Pro Asn Asn Arg
His Ala Tyr Val Gly Ile Ala Val 305 310 315 320 Ile Trp Val Leu Ala
Val Ala Ser Ser Leu Pro Phe Leu Ile Tyr Gln 325 330 335 Val Met Thr
Asp Glu Pro Phe Gln Asn Val Thr Leu Asp Ala Tyr Lys 340 345 350 Asp
Lys Tyr Val Cys Phe Asp Gln Phe Pro Ser Asp Ser His Arg Leu 355 360
365 Ser Tyr Thr Thr Leu Leu Leu Val Leu Gln Tyr Phe Gly Pro Leu Cys
370 375 380 Phe Ile Phe Ile Cys Tyr Phe Lys Ile Tyr Ile Arg Leu Lys
Arg Arg 385 390 395 400 Asn Asn Met Met Asp Lys Met Arg Asp Asn Lys
Tyr Arg Ser Ser Glu 405 410 415 Thr Lys Arg Ile Asn Ile Met Leu Leu
Ser Ile Val Val Ala Phe Ala 420 425 430 Val Cys Trp Leu Pro Leu Thr
Ile Phe Asn Thr Val Phe Asp Trp Asn 435 440 445 His Gln Ile Ile Ala
Thr Cys Asn His Asn Leu Leu Phe Leu Leu Cys 450 455 460 His Leu Thr
Ala Met Ile Ser Thr Cys Val Asn Pro Ile Phe Tyr Gly 465 470 475 480
Phe Leu Asn Lys Asn Phe Gln Arg Asp Leu Gln Phe Phe Phe Asn Phe 485
490 495 Cys Asp Phe Arg Ser Arg Asp Asp Asp Tyr Glu Thr Ile Ala Met
Ser 500 505 510 Thr Met His Thr Asp Val Ser Lys Thr Ser Leu Lys Gln
Ala Ser Pro 515 520 525 Val Ala Phe Lys Lys Ile Asn Asn Asn Asp Asp
Asn Glu Lys Ile 530 535 540
12 482 PRT Artificial Sequence synthetic fusion protein
12 Met Lys Thr Ile Ile Ala Leu Ser Tyr Ile Phe Cys
Leu Val Phe Ala 1 5 10 15 Asp Tyr Lys Asp Asp Asp Asp Ala Glu Asn
Leu Tyr Phe Gln Gly Asn 20 25 30 Ile Phe Glu Met Leu Arg Ile Asp
Glu Gly Leu Arg Leu Lys Ile Tyr 35 40 45 Lys Asp Thr Glu Gly Tyr
Tyr Thr Ile Gly Ile Gly His Leu Leu Thr 50 55 60 Lys Ser Pro Ser
Leu Asn Ala Ala Lys Ser Glu Leu Asp Lys Ala Ile 65 70 75 80 Gly Arg
Asn Thr Asn Gly Val Ile Thr Lys Asp Glu Ala Glu Lys Leu 85 90 95
Phe Asn Gln Asp Val Asp Ala Ala Val Arg Gly Ile Leu Arg Asn Ala 100
105 110 Lys Leu Lys Pro Val Tyr Asp Ser Leu Asp Ala Val Arg Arg Ala
Ala 115 120 125 Leu Ile Asn Met Val Phe Gln Met Gly Glu Thr Gly Val
Ala Gly Phe 130 135 140 Thr Asn Ser Leu Arg Met Leu Gln Gln Lys Arg
Trp Asp Glu Ala Ala 145 150 155 160 Val Asn Leu Ala Lys Ser Arg Trp
Tyr Asn Gln Thr Pro Asn Arg Ala 165 170 175 Lys Arg Val Ile Thr Thr
Phe Arg Thr Gly Thr Trp Asp Ala Tyr Ala 180 185 190 Ala Trp Pro His
Leu Glu Val Val Ile Phe Val Val Val Leu Ile Phe 195 200 205 Tyr Leu
Met Thr Leu Ile Gly Asn Leu Phe Ile Ile Ile Leu Ser Tyr 210 215 220
Leu Asp Ser His Leu His Thr Pro Met Tyr Phe Phe Leu Ser Asn Leu 225
230 235 240 Ser Phe Leu Asp Leu Cys Tyr Thr Thr Ser Ser Ile Pro Gln
Leu Leu 245 250 255 Val Asn Leu Trp Gly Pro Glu Lys Thr Ile Ser Tyr
Ala Gly Cys Met 260 265 270 Ile Gln Leu Tyr Phe Val Leu Ala Leu Gly
Thr Ala Glu Cys Val Leu 275 280 285 Leu Val Val Met Ser Tyr Asp Arg
Tyr Ala Ala Val Cys Arg Pro Leu 290 295 300 His Tyr Thr Val Leu Met
His Pro Arg Phe Cys His Leu Leu Ala Val 305 310 315 320 Ala Ser Trp
Val Ser Gly Phe Thr Asn Ser Ala Leu His Ser Ser Phe 325 330 335 Thr
Phe Trp Val Pro Leu Cys Gly His Arg Gln Val Asp His Phe Phe 340 345
350 Cys Glu Val Pro Ala Leu Leu Arg Leu Ser Cys Val Asp Thr His Val
355 360 365 Asn Glu Leu Thr Leu Met Ile Thr Ser Ser Ile Phe Val Leu Ile
Pro 370 375 380 Leu Ile Leu Ile Leu Thr Ser Tyr Gly Ala Ile Val Gln
Ala Val Leu 385 390 395 400 Arg Met Gln Ser Thr Thr Gly Leu Gln Lys
Val Phe Gly Thr Cys Gly 405 410 415 Ala His Leu Met Ala Val Ser Leu
Phe Phe Ile Pro Ala Met Cys Ile 420 425 430 Tyr Leu Gln Pro Pro Ser
Gly Asn Ser Gln Asp Gln Gly Lys Phe Ile 435 440 445 Ala Leu Phe Tyr
Thr Val Val Thr Pro Ser Leu Asn Pro Leu Ile Tyr 450 455 460 Thr Leu
Arg Asn Lys Val Val Arg Gly Ala Val Lys Arg Leu Met Gly 465 470 475
480 Trp Glu
13 570 PRT Artificial Sequence synthetic fusion protein
13 Met Lys Thr Ile Ile Ala Leu Ser Tyr Ile Phe Cys Leu Val Phe Ala 1
5 10 15 Asp Tyr Lys Asp Asp Asp Asp Ala Glu Asn Leu Tyr Phe Gln Gly
Asn 20 25 30 Ile Phe Glu Met Leu Arg Ile Asp Glu Gly Leu Arg Leu
Lys Ile Tyr 35 40 45 Lys Asp Thr Glu Gly Tyr Tyr Thr Ile Gly Ile
Gly His Leu Leu Thr 50 55 60 Lys Ser Pro Ser Leu Asn Ala Ala Lys
Ser Glu Leu Asp Lys Ala Ile 65 70 75 80 Gly Arg Asn Thr Asn Gly Val
Ile Thr Lys Asp Glu Ala Glu Lys Leu 85 90 95 Phe Asn Gln Asp Val
Asp Ala Ala Val Arg Gly Ile Leu Arg Asn Ala 100 105 110 Lys Leu Lys
Pro Val Tyr Asp Ser Leu Asp Ala Val Arg Arg Ala Ala 115 120 125 Leu
Ile Asn Met Val Phe Gln Met Gly Glu Thr Gly Val Ala Gly Phe 130 135
140 Thr Asn Ser Leu Arg Met Leu Gln Gln Lys Arg Trp Asp Glu Ala Ala
145 150 155 160 Val Asn Leu Ala Lys Ser Arg Trp Tyr Asn Gln Thr Pro
Asn Arg Ala 165 170 175 Lys Arg Val Ile Thr Thr Phe Arg Thr Gly Thr
Trp Asp Ala Tyr Ala 180 185 190 Ala Leu Glu Tyr Gln Val Val Thr Ile
Leu Leu Val Leu Ile Ile Cys 195 200 205 Gly Leu Gly Ile Val Gly Asn
Ile Met Val Val Leu Val Val Met Arg 210 215 220 Thr Lys His Met Arg
Thr Pro Thr Asn Cys Tyr Leu Val Ser Leu Ala 225 230 235 240 Val Ala
Asp Leu Met Val Leu Val Ala Ala Gly Leu Pro Asn Ile Thr 245 250 255
Asp Ser Ile Tyr Gly Ser Trp Val Tyr Gly Tyr Val Gly Cys Leu Cys 260
265 270 Ile Thr Tyr Leu Gln Tyr Leu Gly Ile Asn Ala Ser Ser Cys Ser
Ile 275 280 285 Thr Ala Phe Thr Ile Glu Arg Tyr Ile Ala Ile Cys His
Pro Ile Lys 290 295 300 Ala Gln Phe Leu Cys Thr Phe Ser Arg Ala Lys
Lys Ile Ile Ile Phe 305 310 315 320 Val Trp Ala Phe Thr Ser Leu Tyr
Cys Met Leu Trp Phe Phe Leu Leu 325 330 335 Asp Leu Asn Ile Ser Thr
Tyr Lys Asp Ala Ile Val Ile Ser Cys Gly 340 345 350 Tyr Lys Ile Ser
Arg Asn Tyr Tyr Ser Pro Ile Tyr Leu Met Asp Phe 355 360 365 Gly Val
Phe Tyr Val Val Pro Met Ile Leu Ala Thr Val Leu Tyr Gly 370 375 380
Phe Ile Ala Arg Ile Leu Phe Leu Asn Pro Ile Pro Ser Asp Pro Lys 385
390 395 400 Glu Asn Ser Lys Thr Trp Lys Asn Asp Ser Thr His Gln Asn
Thr Asn 405 410 415 Leu Asn Val Asn Thr Ser Asn Arg Cys Phe Asn Ser
Thr Val Ser Ser 420 425 430 Arg Lys Gln Val Thr Lys Met Leu Ala Val
Val Val Ile Leu Phe Ala 435 440 445 Leu Leu Trp Met Pro Tyr Arg Thr
Leu Val Val Val Asn Ser Phe Leu 450 455 460 Ser Ser Pro Phe Gln Glu
Asn Trp Phe Leu Leu Phe Cys Arg Ile Cys 465 470 475 480 Ile Tyr Leu
Asn Ser Ala Ile Asn Pro Val Ile Tyr Asn Leu Met Ser 485 490 495 Gln
Lys Phe Arg Ala Ala Phe Arg Lys Leu Cys Asn Cys Lys Gln Lys 500 505
510 Pro Thr Glu Lys Pro Ala Asn Tyr Ser Val Ala Leu Asn Tyr Ser Val
515 520 525 Ile Lys Glu Ser Asp His Phe Ser Thr Glu Leu Asp Asp Ile
Thr Val 530 535 540 Thr Asp Thr Tyr Leu Ser Ala Thr Lys Val Ser Phe
Asp Asp Thr Cys 545 550 555 560 Leu Ala Ser Glu Val Ser Phe Ser Gln
Ser 565 570
14 8 PRT Artificial Sequence synthetic peptide
14 Asp Tyr Lys Asp Asp Asp Asp Ala 1 5
15 7 PRT Artificial Sequence synthetic peptide
15 Glu Asn Leu Tyr Phe Gln Gly 1 5
16 9 PRT Artificial Sequence synthetic peptide
16 Ser Glu Asn Leu Tyr Phe Gln Gly Ser 1 5
17 502 PRT Artificial Sequence synthetic fusion protein
17 Met Lys Thr
Ile Ile Ala Leu Ser Tyr Ile Phe Cys Leu Val Phe Ala 1 5 10 15 Asp
Tyr Lys Asp Asp Asp Asp Ala Glu Asn Leu Tyr Phe Gln Gly Asn 20 25
30 Ile Phe Glu Met Leu Arg Ile Asp Glu Gly Leu Arg Leu Lys Ile Tyr
35 40 45 Lys Asp Thr Glu Gly Tyr Tyr Thr Ile Gly Ile Gly His Leu
Leu Thr 50 55 60 Lys Ser Pro Ser Leu Asn Ala Ala Lys Ser Glu Leu
Asp Lys Ala Ile 65 70 75 80 Gly Arg Asn Thr Asn Gly Val Ile Thr Lys
Asp Glu Ala Glu Lys Leu 85 90 95 Phe Asn Gln Asp Val Asp Ala Ala
Val Arg Gly Ile Leu Arg Asn Ala 100 105 110 Lys Leu Lys Pro Val Tyr
Asp Ser Leu Asp Ala Val Arg Arg Ala Ala 115 120 125 Leu Ile Asn Met
Val Phe Gln Met Gly Glu Thr Gly Val Ala Gly Phe 130 135 140 Thr Asn
Ser Leu Arg Met Leu Gln Gln Lys Arg Trp Asp Glu Ala Ala 145 150 155
160 Val Asn Leu Ala Lys Ser Arg Trp Tyr Asn Gln Thr Pro Asn Arg Ala
165 170 175 Lys Arg Val Ile Thr Thr Phe Arg Thr Gly Thr Trp Asp Ala
Tyr Ala 180 185 190 Ala Asp Glu Val Trp Val Val Gly Met Gly Ile Val
Met Ser Leu Ile 195 200 205 Val Leu Ala Ile Val Phe Gly Asn Val Leu
Val Ile Thr Ala Ile Ala 210 215 220 Lys Phe Glu Arg Leu Gln Thr Val
Thr Asn Tyr Phe Ile Thr Ser Leu 225 230 235 240 Ala Cys Ala Asp Leu
Val Met Gly Leu Ala Val Val Pro Phe Gly Ala 245 250 255 Ala His Ile
Leu Thr Lys Thr Trp Thr Phe Gly Asn Phe Trp Cys Glu 260 265 270 Phe
Trp Thr Ser Ile Asp Val Leu Cys Val Thr Ala Ser Ile Glu Thr 275 280
285 Leu Cys Val Ile Ala Val Asp Arg Tyr Phe Ala Ile Thr Ser Pro Phe
290 295 300 Lys Tyr Gln Ser Leu Leu Thr Lys Asn Lys Ala Arg Val Ile
Ile Leu 305 310 315 320 Met Val Trp Ile Val Ser Gly Leu Thr Ser Phe
Leu Pro Ile Gln Met 325 330 335 His Trp Tyr Arg Ala Thr His Gln Glu
Ala Ile Asn Cys Tyr Ala Glu 340 345 350 Glu Thr Cys Cys Asp Phe Phe
Thr Asn Gln Ala Tyr Ala Ile Ala Ser 355 360 365 Ser Ile Val Ser Phe
Tyr Val Pro Leu Val Ile Met Val Phe Val Tyr 370 375 380 Ser Arg Val
Phe Gln Glu Ala Lys Arg Gln Leu Gln Lys Ile Asp Lys 385 390 395 400
Phe Cys Leu Lys Glu His Lys Ala Leu Lys Thr Leu Gly Ile Ile Met 405
410 415 Gly Thr Phe Thr Leu Cys Trp Leu Pro Phe Phe Ile Val Asn Ile
Val 420 425 430 His Val Ile Gln Asp Asn Leu Ile Arg Lys Glu Val Tyr
Ile Leu Leu 435 440 445 Asn Trp Ile Gly Tyr Val Asn Ser Gly Phe Asn
Pro Leu Ile Tyr Cys 450 455 460 Arg Ser Pro Asp Phe Arg Ile Ala Phe
Gln Glu Leu Leu Cys Leu Arg 465 470 475 480 Arg Ser Ser Leu Lys Ala
Tyr Gly Asn Gly Tyr Ser Ser Asn Gly Asn 485 490 495 Thr Gly Glu Gln
Ser Gly 500
18 31 PRT Artificial Sequence synthetic peptide
18 Met Lys Thr Ile Ile Ala Leu Ser Tyr Ile Phe Cys Leu Val Phe Ala 1 5 10 15
Asp Tyr Lys Asp Asp Asp Asp Ala Glu Asn Leu Tyr Phe Gln Gly 20 25 30
19 413 PRT Artificial Sequence synthetic fusion protein
19 Met Gly
Gln Pro Gly Asn Gly Ser Ala Phe Leu Leu Ala Pro Asn Arg 1 5 10 15
Ser His Ala Pro Asp His Asp Val Thr Gln Gln Arg Asp Glu Val Trp 20
25 30 Val Val Gly Met Gly Ile Val Met Ser Leu Ile Val Leu Ala Ile
Val 35 40 45 Phe Gly Asn Val Leu Val Ile Thr Ala Ile Ala Lys Phe
Glu Arg Leu 50 55 60 Gln Thr Val Thr Asn Tyr Phe Ile Thr Ser Leu
Ala Cys Ala Asp Leu 65 70 75 80 Val Met Gly Leu Ala Val Val Pro Phe
Gly Ala Ala His Ile Leu Thr 85 90 95 Lys Thr Trp Thr Phe Gly Asn
Phe Trp Cys Glu Phe Trp Thr Ser Ile 100 105 110 Asp Val Leu Cys Val
Thr Ala Ser Ile Glu Thr Leu Cys Val Ile Ala 115 120 125 Val Asp Arg
Tyr Phe Ala Ile Thr Ser Pro Phe Lys Tyr Gln Ser Leu 130 135 140 Leu
Thr Lys Asn Lys Ala Arg Val Ile Ile Leu Met Val Trp Ile Val 145 150
155 160 Ser Gly Leu Thr Ser Phe Leu Pro Ile Gln Met His Trp Tyr Arg
Ala 165 170 175 Thr His Gln Glu Ala Ile Asn Cys Tyr Ala Glu Glu Thr
Cys Cys Asp 180 185 190 Phe Phe Thr Asn Gln Ala Tyr Ala Ile Ala Ser
Ser Ile Val Ser Phe 195 200 205 Tyr Val Pro Leu Val Ile Met Val Phe
Val Tyr Ser Arg Val Phe Gln 210 215 220 Glu Ala Lys Arg Gln Leu Gln
Lys Ile Asp Lys Ser Glu Gly Arg Phe 225 230 235 240 His Val Gln Asn
Leu Ser Gln Val Glu Gln Asp Gly Arg Thr Gly His 245 250 255 Gly Leu
Arg Arg Ser Ser Lys Phe Cys Leu Lys Glu His Lys Ala Leu 260 265 270
Lys Thr Leu Gly Ile Ile Met Gly Thr Phe Thr Leu Cys Trp Leu Pro 275
280 285 Phe Phe Ile Val Asn Ile Val His Val Ile Gln Asp Asn Leu Ile
Arg 290 295 300 Lys Glu Val Tyr Ile Leu Leu Asn Trp Ile Gly Tyr Val
Asn Ser Gly 305 310 315 320 Phe Asn Pro Leu Ile Tyr Cys Arg Ser Pro
Asp Phe Arg Ile Ala Phe 325 330 335 Gln Glu Leu Leu Cys Leu Arg Arg
Ser Ser Leu Lys Ala Tyr Gly Asn 340 345 350 Gly Tyr Ser Ser Asn Gly
Asn Thr Gly Glu Gln Ser Gly Tyr His Val 355 360 365 Glu Gln Glu Lys
Glu Asn Lys Leu Leu Cys Glu Asp Leu Pro Gly Thr 370 375 380 Glu Asp
Phe Val Gly His Gln Gly Thr Val Pro Ser Asp Asn Ile Asp 385 390 395
400 Ser Gln Gly Arg Asn Cys Ser Thr Asn Asp Ser Leu Leu 405 410
20 4 PRT Artificial Sequence synthetic peptide
20 Arg Thr Val Trp 1
21 5 PRT Artificial Sequence synthetic peptide
21 Arg Thr Glu Val Trp 1 5
22 6 PRT Artificial Sequence synthetic peptide
22 Arg Thr Asp Glu Val Trp 1 5
23 7 PRT Artificial Sequence synthetic peptide
23 Arg Thr Ala Asp Glu Val Trp 1 5
24 8 PRT Artificial Sequence synthetic peptide
24 Arg Thr Ala Ala Asp Glu Val Trp 1 5
25 9 PRT Artificial Sequence synthetic peptide
25 Arg Thr Ala Ala Ala Asp Glu Val Trp 1 5
26 12 PRT Artificial Sequence synthetic peptide
26 Arg Thr Gly Thr Trp Asp Ala Tyr Asp Glu Val Trp 1 5 10
27 13 PRT Artificial Sequence synthetic peptide
27 Arg Thr Gly Thr Trp Asp Ala Tyr Ala Asp Glu Val Trp 1 5 10
28 14 PRT Artificial Sequence synthetic peptide
28 Arg Thr Gly Thr Trp Asp Ala Tyr Ala Ala Asp Glu Val Trp 1 5 10
29 15 PRT Artificial Sequence synthetic peptide
29 Arg Thr Gly Thr Trp Asp Ala Tyr Ala Ala Ala Asp Glu Val Trp 1 5 10 15
* * * * *
References