U.S. patent application number 15/843233 was filed with the patent office on 2017-12-15 and published on 2018-07-05 for polyvalent vaccine.
The applicant listed for this patent is BETH ISRAEL DEACONESS MEDICAL CENTER, DUKE UNIVERSITY, LOS ALAMOS NATIONAL SECURITY, LLC, THE UAB RESEARCH FOUNDATION. Invention is credited to William Fischer, Beatrice H. Hahn, Barton F. Haynes, Bette T. Korber, Norman Letvin, Hua-Xin Liao.
Application Number | 20180185471 15/843233 |
Family ID | 41669535 |
Filed Date | 2017-12-15 |
United States Patent
Application |
20180185471 |
Kind Code |
A1 |
Korber; Bette T.; et al. |
July 5, 2018 |
POLYVALENT VACCINE
Abstract
The present invention relates, in general, to an immunogenic
composition (e.g., a vaccine) and, in particular, to a polyvalent
immunogenic composition, such as a polyvalent HIV vaccine, and to
methods of using same. The invention further relates to methods
that use a genetic algorithm to create sets of polyvalent antigens
suitable for use, for example, in vaccination strategies.
Inventors: |
Korber; Bette T.; (Los Alamos, NM)
Fischer; William; (Los Alamos, NM)
Letvin; Norman; (Boston, MA)
Liao; Hua-Xin; (Durham, NC)
Haynes; Barton F.; (Durham, NC)
Hahn; Beatrice H.; (Birmingham, AL) |
Applicant: |
Name | City | State | Country | Type |
DUKE UNIVERSITY | Durham | NC | US | |
BETH ISRAEL DEACONESS MEDICAL CENTER | Boston | MA | US | |
LOS ALAMOS NATIONAL SECURITY, LLC | Los Alamos | NM | US | |
THE UAB RESEARCH FOUNDATION | Birmingham | AL | US | |
Family ID: |
41669535 |
Appl. No.: |
15/843233 |
Filed: |
December 15, 2017 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
14726373 | May 29, 2015 | 9844590 |
15843233 | | |
12737761 | Feb 1, 2012 | 9044445 |
PCT/US2009/004664 | Aug 14, 2009 | |
14726373 | | |
12192015 | Aug 14, 2008 | 7951377 |
12737761 | | |
11990222 | Apr 20, 2009 | 8119140 |
PCT/US2006/032907 | Aug 23, 2006 | |
12192015 | | |
60710154 | Aug 23, 2005 | |
60739413 | Nov 25, 2005 | |
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
C12N 7/00 20130101; C12N
2740/16334 20130101; C12N 2710/24143 20130101; C12N 2740/16122
20130101; C07K 14/005 20130101; A61P 37/00 20180101; A61K 39/12
20130101; A61P 31/18 20180101; A61K 39/21 20130101; C12N 2740/16022
20130101; A61K 2039/53 20130101; C12N 2740/16134 20130101; A61P
37/04 20180101; A61P 31/12 20180101; C12N 2740/16234 20130101; A61K
2039/70 20130101; C12N 2740/16034 20130101 |
International
Class: |
A61K 39/21 20060101
A61K039/21 |
GOVERNMENT INTERESTS
[0002] This invention was made with Government support under
Contract No. DE-AC52-06NA25396 awarded by the U.S. Department of
Energy. The Government has certain rights in the invention.
Claims
1. A polypeptide or protein comprising at least one sequence of
amino acids set forth in FIG. 21 or FIG. 22.
2. A nucleic acid encoding the polypeptide or protein according to
claim 1.
3. A nucleic acid comprising at least one sequence of nucleotides
set forth in FIG. 22.
4. A vector comprising the nucleic acid according to claim 2.
5. The vector according to claim 3 wherein said vector is a viral
vector.
6. A composition comprising at least one polypeptide or protein
according to claim 1 and a carrier.
7. A composition comprising at least one nucleic acid according to
claim 2 and a carrier.
8. A method of inducing an immune response in a mammal comprising
administering to said mammal an amount of at least one polypeptide
or protein according to claim 1 sufficient to effect said
induction.
9. A method of inducing an immune response in a mammal comprising
administering to said mammal an amount of at least one nucleic acid
according to claim 2 sufficient to effect said induction.
Description
[0001] This application is a continuation of U.S. application Ser.
No. 14/726,373, filed on May 29, 2015, which is a divisional of
U.S. application Ser. No. 12/737,761, filed Feb. 1, 2012, now U.S.
Pat. No. 9,044,445, which is the U.S. national phase of
International Application No. PCT/US2009/004664, filed 14 Aug.
2009, which designated the U.S. and is a continuation of U.S.
application Ser. No. 12/192,015, filed 14 Aug. 2008, now U.S. Pat.
No. 7,951,377, and a continuation-in-part of U.S. application Ser. No.
11/990,222, filed Apr. 20, 2009, now U.S. Pat. No. 8,119,140, which
is the U.S. national phase of International Application No.
PCT/US2006/032907, filed Aug. 23, 2006, which designated the U.S.
and claims the benefit of U.S. Provisional Application No.
60/710,154, filed Aug. 23, 2005, and U.S. Provisional Application
No. 60/739,413, filed Nov. 25, 2005. The entire contents of each of
the above-identified applications are hereby incorporated herein by
reference.
SEQUENCE LISTING
[0003] The instant application contains a "lengthy" Sequence
Listing which has been submitted as an ASCII text file via EFS-Web
in lieu of a paper copy, and is hereby incorporated by reference in
its entirety. The ASCII text file, created on Mar. 9, 2018, is
1,075,982 bytes in size and is named 2933311-029-US15_SL.txt.
TECHNICAL FIELD
[0004] The present invention relates, in general, to an immunogenic
composition (e.g., a vaccine) and, in particular, to a polyvalent
immunogenic composition, such as a polyvalent HIV vaccine, and to
methods of using same. The invention further relates to methods
that use a genetic algorithm to create sets of polyvalent antigens
suitable for use, for example, in vaccination strategies.
BACKGROUND
[0005] Designing an effective HIV vaccine is a many-faceted
challenge. The vaccine preferably elicits an immune response
capable of either preventing infection or, minimally, controlling
viral replication if infection occurs, despite the failure of
immune responses to natural infection to eliminate the virus
(Nabel, Vaccine 20:1945-1947 (2002)) or to protect from
superinfection (Altfeld et al, Nature 420:434-439 (2002)). Potent
vaccines are needed, with optimized vectors, immunization
protocols, and adjuvants (Nabel, Vaccine 20:1945-1947 (2002)),
combined with antigens that can stimulate cross-reactive responses
against the diverse spectrum of circulating viruses (Gaschen et al,
Science 296:2354-2360 (2002), Korber et al, Br. Med. Bull. 58:19-42
(2001)). The problems that influenza vaccinologists have confronted
for decades highlight the challenge posed by HIV-1: human influenza
strains undergoing antigenic drift diverge from one another by
around 1-2% per year, yet vaccine antigens often fail to elicit
cross-reactive B-cell responses from one year to the next,
requiring that contemporary strains be continuously monitored and
vaccines be updated every few years (Korber et al, Br. Med. Bull.
58:19-42 (2001)). In contrast, co-circulating individual HIV-1
strains can differ from one another by 20% or more in relatively
conserved proteins, and up to 35% in the Envelope protein (Gaschen
et al, Science 296:2354-2360 (2002), Korber et al, Br. Med. Bull.
58:19-42 (2001)).
[0006] Different degrees of viral diversity in regional HIV-1
epidemics provide a potentially useful hierarchy for vaccine design
strategies. Some geographic regions recapitulate global diversity,
with a majority of known HIV-1 subtypes, or clades, co-circulating
(e.g., the Democratic Republic of the Congo (Mokili & Korber,
J. Neurovirol 11(Suppl. 1):66-75 (2005))); others are dominated by
two subtypes and their recombinants (e.g., Uganda (Barugahare et
al, J. Virol. 79:4132-4139 (2005))), and others by a single subtype
(e.g., South Africa (Williamson et al, AIDS Res. Hum. Retroviruses
19:133-144 (2003))). Even areas with predominantly single-subtype
epidemics must address extensive within-clade diversity (Williamson
et al, AIDS Res. Hum. Retroviruses 19:133-44 (2003)) but, since
international travel can be expected to further blur geographic
distinctions, all nations would benefit from a global vaccine.
[0007] Presented herein is the design of polyvalent vaccine antigen
sets focusing on T lymphocyte responses, optimized for either the
common B and C subtypes, or all HIV-1 variants in global
circulation [the HIV-1 Main (M) group]. Cytotoxic T-lymphocytes
(CTL) directly kill infected, virus-producing host cells,
recognizing them via viral protein fragments (epitopes) presented
on infected cell surfaces by human leukocyte antigen (HLA)
molecules. Helper T-cell responses control varied aspects of the
immune response through the release of cytokines. Both are likely
to be crucial for an HIV-1 vaccine: CTL responses have been
implicated in slowing disease progression (Oxenius et al, J.
Infect. Dis. 189:1199-208 (2004)); vaccine-elicited cellular immune
responses in nonhuman primates help control pathogenic SIV or SHIV,
reducing the likelihood of disease after challenge (Barouch et al,
Science 290:486-92 (2000)); and experimental depletion of CD8+
T-cells results in increased viremia in SIV infected rhesus
macaques (Schmitz et al, Science 283:857-60 (1999)). Furthermore,
CTL escape mutations are associated with disease progression
(Barouch et al, J. Virol. 77:7367-75 (2003)), thus
vaccine-stimulated memory responses that block potential escape
routes may be valuable.
[0008] The highly variable Env protein is the primary target for
neutralizing antibodies against HIV; since immune protection will
likely require both B-cell and T-cell responses (Moore and Burton,
Nat. Med. 10:769-71 (2004)), Env vaccine antigens will also need to
be optimized separately to elicit antibody responses.
T-cell-directed vaccine components, in contrast, can target the
more conserved proteins, but even the most conserved HIV-1 proteins
are diverse enough that variation is an issue. Artificial
central-sequence vaccine approaches (e.g., consensus sequences, in
which every amino acid is found in a plurality of sequences, or
maximum likelihood reconstructions of ancestral sequences (Gaschen
et al, Science 296:2354-60 (2002), Gao et al, J. Virol. 79:1154-63
(2005), Doria-Rose et al, J. Virol. 79:11214-24 (2005), Weaver et
al, J. Virol., in press)) are promising; nevertheless, even
centralized strains provide limited coverage of HIV-1 variants, and
consensus-based reagents fail to detect many autologous T-cell
responses (Altfeld et al, J. Virol. 77:7330-40 (2003)).
[0009] Single amino acid changes can allow an epitope to escape
T-cell surveillance; since many T-cell epitopes differ between
HIV-1 strains at one or more positions, potential responses to any
single vaccine antigen are limited. Whether a particular mutation
results in escape depends upon the specific epitope/T-cell
combination, although some changes broadly affect between-subtype
cross-reactivity (Norris et al, AIDS Res. Hum. Retroviruses
20:315-25 (2004)). Including multiple variants in a polyvalent
vaccine could enable responses to a broader range of circulating
variants, and could also prime the immune system against common
escape mutants (Jones et al, J. Exp. Med. 200:1243-56 (2004)).
Escape from one T-cell receptor may create a variant that is
susceptible to another (Allen et al, J. Virol. 79:12952-60 (2005),
Feeney et al, J. Immunol. 174:7524-30 (2005)), so stimulating
polyclonal responses to epitope variants may be beneficial (Killian
et al, Aids 19:887-96 (2005)). Escape mutations that inhibit
processing (Milicic et al, J. Immunol. 175:4618-26 (2005)) or HLA
binding (Ammaranond et al, AIDS Res. Hum. Retroviruses 21:395-7
(2005)) cannot be directly countered by a T-cell with a different
specificity, but responses to overlapping epitopes may block even
some of these escape routes.
[0010] The present invention relates to a polyvalent vaccine
comprising several "mosaic" proteins (or genes encoding these
proteins). The candidate vaccine antigens can be cocktails of k
composite proteins (k being the number of sequence variants in the
cocktail), optimized to include the maximum number of potential
T-cell epitopes in an input set of viral proteins. The mosaics are
generated from natural sequences: they resemble natural proteins
and include the most common forms of potential epitopes. Since CD8+
epitopes are contiguous and typically nine amino-acids long, sets
of mosaics can be scored by "coverage" of nonamers (9-mers) in the
natural sequences (fragments of similar lengths are also well
represented). 9-Mers not found at least three times can be
excluded. This strategy provides the level of diversity coverage
achieved by a massively polyvalent multiple-peptide vaccine but
with important advantages: it allows vaccine delivery as intact
proteins or genes, excludes low-frequency or unnatural epitopes
that are not relevant to circulating strains, and its intact
protein antigens are more likely to be processed as in a natural
infection.
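The 9-mer coverage score described above can be illustrated with a minimal sketch. The function names (`kmers`, `common_kmers`, `coverage`) are hypothetical and do not come from this disclosure, and counting total occurrences across the input set to apply the "at least three times" threshold is an assumption about how that threshold is tallied.

```python
from collections import Counter


def kmers(seq, k=9):
    """Return the contiguous k-mers of a protein sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def common_kmers(natural_seqs, k=9, min_count=3):
    """k-mers observed at least min_count times across the natural set.

    Assumption: occurrences are pooled over all input sequences.
    """
    counts = Counter(km for s in natural_seqs for km in kmers(s, k))
    return {km for km, n in counts.items() if n >= min_count}


def coverage(cocktail, natural_seqs, k=9):
    """Mean, over natural sequences, of the fraction of each sequence's
    k-mers that appear somewhere in the cocktail."""
    covered = {km for s in cocktail for km in kmers(s, k)}
    fractions = []
    for s in natural_seqs:
        ks = kmers(s, k)
        if ks:  # skip sequences shorter than k
            fractions.append(sum(km in covered for km in ks) / len(ks))
    return sum(fractions) / len(fractions) if fractions else 0.0
```

With two toy sequences that differ only in their last residue, a one-sequence cocktail matching the first covers all of its 9-mers and half of the second's, so the mean coverage is 0.75.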
SUMMARY OF THE INVENTION
[0011] In general, the present invention relates to an immunogenic
composition. More specifically, the invention relates to a
polyvalent immunogenic composition (e.g., an HIV vaccine), and to
methods of using same. The invention further relates to methods
that involve the use of a genetic algorithm to design sets of
polyvalent antigens suitable for use as vaccines.
[0012] Objects and advantages of the present invention will be
clear from the description that follows.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIGS. 1A-1F. The upper bound of potential epitope coverage
of the HIV-1 M group. The upper bound for population coverage of
9-mers for increasing numbers of variants is shown, for k=1-8
variants. A sliding window of length nine was applied across
aligned sequences, moving down by one position. Different colors
denote results for different numbers of sequences. At each window,
the coverage given by the k most common 9-mers is plotted for Gag
(FIGS. 1A and 1B), Nef (FIGS. 1C and 1D) and Env gp120 (FIGS. 1E
and 1F). Gaps inserted to maintain the alignment are treated as
characters. The diminishing returns of adding more variants are
evident, since, as k increases, increasingly rare forms are added.
In FIGS. 1A, 1C and 1E, the scores for each consecutive 9-mer are
plotted in their natural order to show how diversity varies in
different protein regions; both p24 in the center of Gag and the
central region of Nef are particularly highly conserved. In FIGS.
1B, 1D and 1F, the scores for each 9-mer are reordered by coverage
(a strategy also used in FIG. 4), to provide a sense of the overall
population coverage of a given protein. Coverage of gp120, even
with 8 variant 9-mers, is particularly poor (FIGS. 1E and 1F).
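The sliding-window upper bound described for FIGS. 1A-1F can be sketched as follows. This is an illustrative reading of the caption, not the code used to produce the figures: `window_upper_bound` is a hypothetical name, the input is assumed to be a list of equal-length aligned strings, and gap characters are treated as ordinary characters, as the caption states.

```python
from collections import Counter


def window_upper_bound(aligned_seqs, k_variants, width=9):
    """For each alignment window of the given width, return the fraction
    of sequences matched by the k_variants most common strings seen in
    that window. Gap characters are counted like any other character."""
    length = len(aligned_seqs[0])
    n = len(aligned_seqs)
    bounds = []
    for start in range(length - width + 1):
        counts = Counter(s[start:start + width] for s in aligned_seqs)
        top = counts.most_common(k_variants)
        bounds.append(sum(c for _, c in top) / n)
    return bounds
```

Calling the function for k_variants = 1 through 8 and plotting each result, either in alignment order or sorted by value, would give curves of the kind the caption describes; the diminishing returns arise because each additional variant is, by construction, no more common than the previous one.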
[0014] FIGS. 2A-2C. Mosaic initialization, scoring, and
optimization.
[0015] FIG. 2A) A set of k populations is generated by random
2-point recombination of natural sequences (1-6 populations of
50-500 sequences each have been tested). One sequence from each
population is chosen (initially at random) for the mosaic cocktail,
which is subsequently optimized. The cocktail sequences are scored
by computing coverage (defined as the mean fraction of
natural-sequence 9-mers included in the cocktail, averaged over all
natural sequences in the input data set). Any new sequence that
covers more epitopes will increase the score of the whole cocktail.
FIG. 2B) The fitness score of any individual sequence is the
coverage of a cocktail containing that sequence plus the current
representatives from other populations. FIG. 2C) Optimization: 1)
two "parents" are chosen: the higher-scoring of a randomly chosen
pair of recombined sequences, and either (with 50% probability) the
higher-scoring sequence of a second random pair, or a randomly
chosen natural sequence. 2) Two-point recombination between the two
parents is used to generate a "child" sequence. If the child
contains unnatural or rare 9-mers, it is immediately rejected;
otherwise it is scored (Gaschen et al, Science 296:2354-2360
(2002)). 3) If the score is higher than that of any of four
randomly-selected population members, the child is inserted in the
population in place of the weakest of the four, thus evolving an
improved population. 4) If its score is a new high score, the new
child replaces the current cocktail member from its population. Ten
cycles of child generation are repeated for each population in
turn, and the process iterates until improvement stalls.
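One child-generation cycle of the optimization in FIG. 2C can be sketched as below. This is a simplified illustration under stated assumptions, not the implementation used for the invention: all names are hypothetical, sequences are assumed to be of equal length (each at least nine residues) so that breakpoints can be chosen by position, and the population is assumed to hold at least four members. Updating the cocktail member when a new high score is reached (step 4) is left to the caller.

```python
import random


def kmers(seq, k=9):
    """Set of contiguous k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coverage(cocktail, natural_seqs, k=9):
    """Mean fraction of each natural sequence's distinct k-mers that
    are present in the cocktail."""
    covered = set().union(*(kmers(s, k) for s in cocktail))
    return sum(len(kmers(s, k) & covered) / len(kmers(s, k))
               for s in natural_seqs) / len(natural_seqs)


def two_point_recombine(a, b, rng):
    """Replace a segment of a with the same-position segment of b.
    Assumption: a and b have equal length."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]


def evolve_step(population, others, natural_seqs, allowed, rng, k=9):
    """Run one cycle for a single population, modifying it in place.

    others  -- current cocktail members from the other populations
    allowed -- set of k-mers regarded as natural and non-rare
    Returns the child if it entered the population, else None.
    """
    def fitness(seq):
        return coverage(others + [seq], natural_seqs, k)

    def tournament():
        # Higher-scoring member of a randomly chosen pair.
        return max(rng.sample(population, 2), key=fitness)

    parent1 = tournament()
    parent2 = tournament() if rng.random() < 0.5 else rng.choice(natural_seqs)
    child = two_point_recombine(parent1, parent2, rng)
    if not kmers(child, k) <= allowed:
        return None  # child contains an unnatural or rare k-mer
    rivals = rng.sample(range(len(population)), 4)
    weakest = min(rivals, key=lambda idx: fitness(population[idx]))
    if fitness(child) > fitness(population[weakest]):
        population[weakest] = child
        return child
    return None
```

Because a child only ever displaces the weakest of four sampled rivals, and only when it scores higher, the best score in a population cannot decrease from one cycle to the next.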
[0016] FIG. 3. Mosaic strain coverage for all HIV proteins. The
level of 9-mer coverage achieved by sets of four mosaic proteins
for each HIV protein is shown, with mosaics optimized using either
the M group or the C subtype. The fraction of C subtype sequence
9-mers covered by mosaics optimized on the C subtype (within-clade
optimization) is shown in gray. Coverage of 9-mers found in non-C
subtype M-group sequences by subtype-C-optimized mosaics
(between-clade coverage) is shown in white. Coverage of subtype C
sequences by M-group optimized mosaics is shown in black. B clade
comparisons gave comparable results (data not shown).
[0017] FIGS. 4A-4F. Coverage of M group sequences by different
vaccine candidates, nine-mer by nine-mer. Each plot presents
site-by-site coverage (i.e., for each nine-mer) of an M-group
natural-sequence alignment by a single tri-valent vaccine
candidate. Bars along the x-axis represent the proportion of
sequences matched by the vaccine candidate for a given alignment
position: 9/9 matches (in red), 8/9 (yellow), 7/9 (blue). Aligned
9-mers are sorted along the x-axis by exact-match coverage value.
656 positions include both the complete Gag and the central region
of Nef. For each alignment position, the maximum possible matching
value (i.e. the proportion of aligned sequences without gaps in
that nine-mer) is shown in gray. FIG. 4A) Non-optimal natural
sequences selected from among strains being used in vaccine studies
(Kong et al, J. Virol. 77:12764-72 (2003)), including individual
clade A, B, and C viral sequences (Gag: GenBank accession numbers
AF004885, K03455, and U52953; Nef core: AF069670, K02083, and
U52953). FIG. 4B) Optimum set of natural sequences [isolates US2
(subtype B, USA), 70177 (subtype C, India), and 99TH.R2399 (subtype
CRF15_01B, Thailand); accession numbers AY173953, AF533131,
and AF530576], selected by choosing the single sequence with maximum
coverage, followed by the sequence that had the best coverage when
combined with the first (i.e. the best complement), and so on,
selected for M group coverage. FIG. 4C) Consensus sequence cocktail
(M group, B- and C-subtypes). FIG. 4D) 3 mosaic sequences, FIG. 4E)
4 mosaic sequences, FIG. 4F) 6 mosaic sequences. FIGS. 4D-4F were
all optimized for M group coverage.
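The "best complement" selection of natural sequences described for FIG. 4B amounts to a greedy procedure, which can be sketched as follows. The names are hypothetical and the coverage measure is the mean per-sequence 9-mer fraction defined for FIG. 2A; how ties were broken in the original analysis is not stated, so this sketch simply keeps the first of any tied candidates.

```python
def kmers(seq, k=9):
    """Contiguous k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def coverage(cocktail, natural_seqs, k=9):
    """Mean fraction of each natural sequence's k-mers found in the cocktail."""
    covered = {km for s in cocktail for km in kmers(s, k)}
    fractions = [sum(km in covered for km in kmers(s, k)) / len(kmers(s, k))
                 for s in natural_seqs if len(s) >= k]
    return sum(fractions) / len(fractions) if fractions else 0.0


def greedy_natural_set(natural_seqs, n_select, k=9):
    """Pick the single best-covering natural sequence, then repeatedly
    add the sequence that most improves coverage of the chosen set."""
    chosen = []
    remaining = list(natural_seqs)
    for _ in range(min(n_select, len(remaining))):
        best = max(remaining,
                   key=lambda s: coverage(chosen + [s], natural_seqs, k))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

A greedy choice of this kind is not guaranteed to find the best possible set of natural sequences, but it gives a reasonable natural-sequence baseline against which mosaic sets can be compared.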
[0018] FIGS. 5A and 5B. Overall coverage of vaccine candidates:
coverage of 9-mers in C clade sequences using different input data
sets for mosaic optimization, allowing different numbers of
antigens, and comparing to different candidate vaccines. Exact
(blue), 8/9 (one-off; red), and 7/9 (two-off; yellow) coverage was
computed for mono- and polyvalent vaccine candidates for Gag (FIG.
5A) and Nef (core) (FIG. 5B) for four test situations: within-clade
(C-clade-optimized candidates scored for C-clade coverage),
between-clade (B-clade-optimized candidates scored for C-clade
coverage), global-against-single-subtype (M-group-optimized
candidates scored for C-clade coverage), global-against-global
(M-group-optimized candidates scored for global coverage). Within
each set of results, vaccine candidates are grouped by number of
sequences in the cocktail (1-6); mosaic sequences are plotted with
darker colors. "Non-opt" refers to one set of sequences moving into
vaccine trials (Kong et al, J. Virol. 77:12764-72 (2003)); "mosaic"
denotes sequences generated by the genetic algorithm; "opt.
natural" denotes intact natural sequences selected for maximum
9-mer coverage; "MBC consensus" denotes a cocktail of 3 consensus
sequences, for M-group, B-subtype, and C-subtype. For ease of
comparison, a dashed line marks the coverage of a 4-sequence set of
M-group mosaics (73.7-75.6%). Over 150 combinations of
mosaic-number, virus subset, protein region, and optimization and
test sets were tested. The C clade/B clade/M group comparisons
illustrated in this figure are generally representative of
within-clade, between-clade, and M group coverage. In particular,
levels of mosaic coverage for B and C clade were very similar,
despite there being many more C clade sequences in the Gag
collection, and many more B clade sequences in the Nef collection
(see FIG. 6 for a full B and C clade comparison). There were
relatively few A and G clade sequences in the alignments (24 Gag,
75 Nef), and while 9-mer coverage by M-group optimized mosaics was
not as high as for the B and C clades (4-mosaic coverage
for A and G subtypes was 63% for Gag, 74% for Nef), it was much
better than a non-optimal cocktail (52% Gag, 52% for Nef).
[0019] FIGS. 6A and 6B. Overall coverage of vaccine candidates:
coverage of 9-mers in B-clade, C-clade, and M-group sequences using
different input data sets for mosaic optimization, allowing
different numbers of antigens, and comparing to different candidate
vaccines. Exact (blue), 8/9 (one-off; red), and 7/9 (two-off;
yellow) coverage was computed for mono- and polyvalent vaccine
candidates for Gag (FIG. 6A) and Nef (core) (FIG. 6B) for seven
test situations: within-clade (B- or C-clade-optimized candidates
scored against the same clade), between-clade (B- or
C-clade-optimized candidates scored against the other clade),
global vaccine against single subtype (M-group-optimized candidates
scored against B- or C-clade), global vaccine against global
viruses (M-group-optimized candidates scored against all M-group
sequences). Within each set of results, vaccine candidates are
grouped by number of sequences in the cocktail (1-6); mosaic
sequences are plotted with darker colors. "Non-opt" refers to a
particular set of natural sequences previously proposed for a
vaccine (Kong, W. P. et al. J Virol 77, 12764-72 (2003)); "mosaic"
denotes sequences generated by the genetic algorithm; "opt.
natural" denotes intact natural sequences selected for maximum
9-mer coverage; "MBC consensus" denotes a cocktail of 3 consensus
sequences, for M-group, B-subtype, and C-subtype. A dashed line is
shown at the level of exact-match M-group coverage for a 4-valent
mosaic set optimized on the M-group.
[0020] FIGS. 7A and 7B. The distribution of 9-mers by frequency of
occurrence in natural, consensus, and mosaic sequences. Occurrence
counts (y-axis) for different 9-mer frequencies (x-axis) for
vaccine cocktails produced by several methods. FIG. 7A: frequencies
from 0-60% (for 9-mer frequencies >60%, the distributions are
equivalent for all methods). FIG. 7B: Details of low-frequency
9-mers. Natural sequences have large numbers of rare or
unique-to-isolate 9-mers (bottom right, FIGS. 7A and 7B); these are
unlikely to induce useful vaccine responses. Selecting optimal
natural sequences does select for more common 9-mers, but rare and
unique 9-mers are still included (top right, FIGS. 7A and 7B).
Consensus cocktails, in contrast, under-represent uncommon 9-mers,
especially below 20% frequency (bottom left, FIGS. 7A and 7B). For
mosaic sequences, the number of lower-frequency 9-mers
monotonically increases with the number of sequences (top left,
each panel), but unique-to-isolate 9-mers are completely excluded
(top left of right panel: * marks the absence of 9-mers with
frequencies <0.005).
[0021] FIGS. 8A-8D. HLA binding potential of vaccine candidates.
FIGS. 8A and 8B) HLA binding motif counts. FIGS. 8C and 8D) number
of unfavorable amino acids. In all graphs: natural sequences are
marked with black circles; consensus sequences with blue
triangles; inferred ancestral sequences with green
squares; and mosaic sequences with red diamonds. Left panel
(FIGS. 8A and 8C) shows HLA-binding-motif counts (FIG. 8A) and
counts of unfavorable amino acids (FIG. 8C) calculated for
individual sequences; Right panel (FIGS. 8B and 8D) shows HLA
binding motifs counts (FIG. 8B) and counts of unfavorable amino
acids (FIG. 8D) calculated for sequence cocktails. The top portion
of each graph (box-and-whiskers graph) shows the distribution of
respective counts (motif counts or counts of unfavorable amino
acids) based either on alignment of M group sequences (for
individual sequences, FIGS. 8A and 8C) or on 100 randomly composed
cocktails of three sequences, one from each of the A, B and C subtypes
(for sequence cocktails, FIGS. 8B and 8D). The alignment was
downloaded from the Los Alamos HIV database. The box extends from
the 25th percentile to the 75th percentile, with the line at the
median. The whiskers extending outside the box show the highest and
lowest values. Amino acids that are very rarely found as C-terminal
anchor residues are G, S, T, P, N, Q, D, E, and H, and tend to be
small, polar, or negatively charged (Yusim et al, J. Virol.
76:8757-8768 (2002)). Results are shown for Gag, but the same
qualitative results hold for Nef core and complete Nef. The same
procedure was done for supertype motifs with results qualitatively
similar to the results for HLA binding motifs (data not shown).
[0022] FIG. 9. Mosaic protein sets limited to 4 sequences (k=4),
spanning Gag and the central region of Nef, optimized for subtype
B, subtype C, and the M group. Figure discloses SEQ ID NOS: 1-84,
respectively, in order of appearance.
[0023] FIG. 10. Mosaic sets for Env and Pol. Figure discloses SEQ
ID NOS 85-168, respectively, in order of appearance.
[0024] FIG. 11. This plot is alignment independent, based on
splintering all M group proteins (database and CHAVI, one sequence
per person) into all possible 9-mers, attending to their
frequencies, and then looking for matches and near matches in each
vaccine antigen or cocktail with the database.
[0025] FIG. 12. Additional summaries of coverage.
[0026] FIG. 13. 9-mer coverage by position (Mos. 3 vaccine
cocktail).
[0027] FIGS. 14A-14D. Plots resorted by frequency of 9-mer matches
for each vaccine proposed for use.
[0028] FIGS. 15A-15D. Plots mapping every amino acid in every
sequence in the full database alignment.
[0029] FIG. 16. 3 Mosaic, M group Optimizations.
[0030] FIG. 17. Coverage of the HIV database plus CHAVI sequences
(N=2020).
[0031] FIG. 18. Differences in acute infection patient sequences
compared to patient consensus.
[0032] FIG. 19. The compromise and benefit in terms of coverage for
Env M group versus subtype-specific design.
[0033] FIG. 20. Proposed vaccine mosaic coverage of Gag and
Env.
[0034] FIG. 21. Gag, Nef and Env sequences. Figure discloses SEQ ID
NOS 169-179, respectively, in order of appearance.
[0035] FIG. 22. Mosaic gag and nef genes and M consensus gag and
nef genes. Figure discloses SEQ ID NOS 180-187, 183, 188, 184,
189-191, 183, 188, 184, 192-194, 183-184, 195-197, 183-184,
198-200, 183-184, 201-204, 183-184, 205-207, 183-184, 208-211,
183-184, 212-217, 183-184, 208 and 218, respectively, in order of
appearance.
DETAILED DESCRIPTION OF THE INVENTION
[0036] The present invention results from the realization that a
polyvalent set of antigens comprising synthetic viral proteins, the
sequences of which provide maximum coverage of non-rare short
stretches of circulating viral sequences, constitutes a good
vaccine candidate. The invention provides a "genetic algorithm"
strategy to create such sets of polyvalent antigens as mosaic
blends of fragments of an arbitrary set of natural protein
sequences provided as inputs. In the context of HIV, the proteins
Gag and Nef are ideal candidates for such antigens. To expand
coverage, Pol and/or Env can also be used. The invention further
provides optimized sets for these proteins.
[0037] The genetic algorithm strategy of the invention uses
unaligned protein sequences from the general population as an input
data set, and thus has the virtue of being "alignment independent".
It creates artificial mosaic proteins that resemble proteins found
in nature--the success of the consensus antigens in small animal
models suggests this works well. 9-Mers are the focus of the studies
described herein; however, different length peptides can be
selected depending on the intended target. In accordance with the
present approach, 9 mers (for example) that do not exist in nature
or that are very rare can be excluded--this is an improvement
relative to consensus sequences since the latter can contain some 9
mers (for example) that have not been found in nature, and relative
to natural strains that almost invariably contain some 9 mers (for
example) that are unique to that strain. The definition of fitness
used for the genetic algorithm is that the most "fit" polyvalent
cocktail is the combination of mosaic strains that gives the best
coverage (highest fraction of perfect matches) of all of the 9 mers
in the population and is subject to the constraint that no 9 mer is
absent or rare in the population.
[0038] The mosaic protein sets of the invention can be optimized
with respect to different input data sets--this allows use of
current data to assess the virtues of subtype- or region-specific
vaccines from a T cell perspective. By way of example, options that
have been compared include:
[0039] 1) Optimal polyvalent mosaic sets based on M group, B clade
and C clade. The question presented was how much better is
intra-clade coverage than inter-clade or global.
[0040] 2) Different numbers of antigens: 1, 3, 4, 6.
[0041] 3) Natural strains currently in use for vaccine protocols,
just to exemplify "typical" strains (Merck, VRC).
[0042] 4) Natural strains selected to give the best coverage of
9-mers in a population.
[0043] 5) Sets of consensus: A+B+C.
[0044] 6) Optimized cocktails that include one "given" strain in a
polyvalent antigen, one ancestral+3 mosaic strains, one consensus+3
mosaic strains.
[0045] 7) Coverage of 9 mers that were perfectly matched was
compared with those that match 8/9, 7/9, and 6/9 or less. This is a
computationally difficult problem, as the best set to cover one
9-mer may not be the best set to cover overlapping 9-mers.
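The comparison of perfect matches with 8/9, 7/9, and lower matches listed above can be sketched as follows. The function names and the result keys are hypothetical, and this sketch pools all natural 9-mers into a single tally rather than averaging per sequence, which is an assumption made for brevity.

```python
def kmers(seq, k=9):
    """Contiguous k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def best_match(kmer, vaccine_kmers):
    """Largest number of identical positions between kmer and any
    k-mer present in the vaccine cocktail."""
    return max((sum(a == b for a, b in zip(kmer, v)) for v in vaccine_kmers),
               default=0)


def match_profile(cocktail, natural_seqs, k=9):
    """Fraction of natural k-mers whose best vaccine match is exact (k/k),
    one-off (k-1), two-off (k-2), or worse."""
    vaccine_kmers = {km for s in cocktail for km in kmers(s, k)}
    tally = {"exact": 0, "one_off": 0, "two_off": 0, "worse": 0}
    total = 0
    for s in natural_seqs:
        for km in kmers(s, k):
            m = best_match(km, vaccine_kmers)
            label = {k: "exact", k - 1: "one_off", k - 2: "two_off"}.get(m, "worse")
            tally[label] += 1
            total += 1
    return {name: n / total for name, n in tally.items()} if total else tally
```

Scoring near matches in this way is considerably slower than exact-match scoring, since each natural 9-mer is compared against every vaccine 9-mer; for large input sets an index over vaccine 9-mers would likely be needed.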
[0046] It will be appreciated from a reading of this disclosure
that the approach described herein can be used to design peptide
reagents to test HIV immune responses, and be applied to other
variable pathogens as well. For example, the present approach can
be adapted to the highly variable virus Hepatitis C.
[0047] The proteins/polypeptides/peptides ("immunogens") of the
invention can be formulated into compositions with a
pharmaceutically acceptable carrier and/or adjuvant using
techniques well known in the art. Suitable routes of administration
include systemic (e.g. intramuscular or subcutaneous), oral,
intravaginal, intrarectal and intranasal.
[0048] The immunogens of the invention can be chemically
synthesized and purified using methods which are well known to the
ordinarily skilled artisan. The immunogens can also be synthesized
by well-known recombinant DNA techniques.
[0049] Nucleic acids encoding the immunogens of the invention can
be used as components of, for example, a DNA vaccine wherein the
encoding sequence is administered as naked DNA or, for example, a
minigene encoding the immunogen can be present in a viral vector.
The encoding sequences can be expressed, for example, in
mycobacterium, in a recombinant chimeric adenovirus, or in a
recombinant attenuated vesicular stomatitis virus. The encoding
sequence can also be present, for example, in a replicating or
non-replicating adenoviral vector, an adeno-associated virus
vector, an attenuated Mycobacterium tuberculosis vector, a Bacillus
Calmette Guerin (BCG) vector, a vaccinia or Modified Vaccinia
Ankara (MVA) vector, another pox virus vector, recombinant polio
and other enteric virus vector, Salmonella species bacterial
vector, Shigella species bacterial vector, Venezuelan Equine
Encephalitis Virus (VEE) vector, a Semliki Forest Virus vector, or
a Tobacco Mosaic Virus vector. The encoding sequence can also be
expressed as a DNA plasmid with, for example, an active promoter
such as a CMV promoter. Other live vectors can also be used to
express the sequences of the invention. Expression of the immunogen
of the invention can be induced in a patient's own cells, by
introduction into those cells of nucleic acids that encode the
immunogen, preferably using codons and promoters that optimize
expression in human cells. Examples of methods of making and using
DNA vaccines are disclosed in U.S. Pat. Nos. 5,580,859, 5,589,466,
and 5,703,055. Examples of methods of codon optimization are
described in Haas et al, Current Biology 6:315-324 (1996) and in
Andre et al, J. Virol. 72(2):1497-1503 (1998).
[0050] It will be appreciated that adjuvants can be included in the
compositions of the invention (or otherwise administered to enhance
the immunogenic effect). Examples of suitable adjuvants include
TLR-9 agonists, TLR-4 agonists, and TLR-7, 8 and 9 agonist
combinations (as well as alum). Adjuvants can take the form of oil
and water emulsions. Squalene adjuvants can also be used.
[0051] The composition of the invention comprises an
immunologically effective amount of the immunogen of this
invention, or nucleic acid sequence encoding same, in a
pharmaceutically acceptable delivery system. The compositions can
be used for prevention and/or treatment of virus infection (e.g.
HIV infection). As indicated above, the compositions of the
invention can be formulated using adjuvants, emulsifiers,
pharmaceutically-acceptable carriers or other ingredients routinely
provided in vaccine compositions. Optimum formulations can be
readily designed by one of ordinary skill in the art and can
include formulations for immediate release and/or for sustained
release, and for induction of systemic immunity and/or induction of
localized mucosal immunity (e.g., the formulation can be designed
for intranasal, intravaginal or intrarectal administration). As
noted above, the present compositions can be administered by any
convenient route including subcutaneous, intranasal, oral,
intramuscular, or other parenteral or enteral route. The immunogens
can be administered as a single dose or multiple doses. Optimum
immunization schedules can be readily determined by the ordinarily
skilled artisan and can vary with the patient, the composition and
the effect sought.
[0052] The invention contemplates the direct use of both the
immunogen of the invention and/or nucleic acids encoding same
and/or the immunogen expressed as indicated above. For example, a
minigene encoding the immunogen can be used as a prime and/or
boost.
[0053] The invention includes any and all amino acid sequences
disclosed herein, as well as nucleic acid sequences encoding same
(and nucleic acids complementary to such encoding sequences).
[0054] Specifically disclosed herein are vaccine antigen sets
optimized for single B or C subtypes, targeting regional epidemics,
as well as for all HIV-1 variants in global circulation [the HIV-1
Main (M) group]. In the study described in Example 1 that follows,
the focus is on designing polyvalent vaccines specifically for
T-cell responses. HIV-1 specific T-cells are likely to be crucial
to an HIV-1-specific vaccine response: CTL responses are correlated
with slow disease progression in humans (Oxenius et al, J. Infect.
Dis. 189:1199-1208 (2004)), and the importance of CTL responses in
non-human primate vaccination models is well-established. Vaccine
elicited cellular immune responses help control pathogenic SIV or
SHIV, and reduce the likelihood of disease after challenge with
pathogenic virus (Barouch et al, Science 290:486-492 (2000)).
Temporary depletion of CD8+ T cells results in increased viremia in
SIV-infected rhesus macaques (Schmitz et al, Science 283:857-860
(1999)). Furthermore, the evolution of escape mutations has been
associated with disease progression, indicating that CTL responses
help constrain viral replication in vivo (Barouch et al, J. Virol.
77:7367-7375 (2003)), and so vaccine-stimulated memory responses
that could block potential escape routes may be of value. While the
highly variable Envelope (Env) is the primary target for
neutralizing antibodies against HIV, and vaccine antigens will also
need to be tailored to elicit these antibody responses (Moore &
Burton, Nat. Med. 10:769-771 (2004)), T-cell vaccine components can
target more conserved proteins to trigger responses that are more
likely to cross-react. But even the most conserved HIV-1 proteins
are diverse enough that variation will be an issue. Artificial
central-sequence vaccine approaches, consensus and ancestral
sequences (Gaschen et al, Science 296:2354-2360 (2002), Gao et al,
J. Virol. 79:1154-1163 (2005), Doria-Rose et al, J. Virol.
79:11214-11224 (2005)), which essentially "split the differences"
between strains, show promise, stimulating responses with enhanced
cross-reactivity compared to natural strain vaccines (Gao et al, J.
Virol. 79:1154-1163 (2005)) (Liao et al. and Weaver et al.,
submitted). Nevertheless, even central strains cover the spectrum
of HIV diversity to a very limited extent, and consensus-based
peptide reagents fail to detect many autologous CD8+ T-cell
responses (Altfeld et al, J. Virol. 77:7330-7340 (2003)).
[0055] A single amino acid substitution can mediate T-cell escape,
and as one or more amino acids in many T-cell epitopes differ
between HIV-1 strains, the potential effectiveness of responses to
any one vaccine antigen is limited. Whether a particular mutation
will diminish T-cell cross-reactivity is epitope- and
T-cell-specific, although some changes can broadly affect
between-clade cross-reactivity (Norris et al, AIDS Res. Hum.
Retroviruses 20:315-325 (2004)). Including more variants in a
polyvalent vaccine could enable responses to a broader range of
circulating variants. It could also prime the immune system against
common escape variants (Jones et al, J. Exp. Med. 200:1243-1256
(2004)); escape from one T-cell receptor might create a variant
that is susceptible to another (Lee et al, J. Exp. Med.
200:1455-1466 (2004)), so stimulating polyclonal responses to
epitope variants may be beneficial (Killian et al, AIDS 19:887-896
(2005)). Immune escape involving avenues that inhibit processing
(Milicic et al, J. Immunol. 175:4618-4626 (2005)) or HLA binding
(Ammaranond et al, AIDS Res. Hum. Retroviruses 21:395-397 (2005))
prevents epitope presentation, and in such cases the escape variant
could not be countered by a T-cell with a different specificity.
However, it is possible that the presence of T-cells that recognize
overlapping epitopes may in some cases block even these escape
routes.
[0056] Certain aspects of the invention can be described in greater
detail in the non-limiting Examples that follow.
Example 1
Experimental Details
[0057] HIV-1 Sequence Data.
[0058] The reference alignments from the 2005 HIV sequence database
(http://hiv.lanl.gov), which contain one sequence per person, were
used, supplemented by additional recently available C subtype Gag
and Nef sequences from Durban, South Africa (GenBank accession
numbers AY856956-AY857186) (Kiepiela et al, Nature 432:769-75
(2004)). This set contained 551 Gag and 1,131 Nef M group sequences
from throughout the globe; recombinant sequences were included as
well as pure subtype sequences for exploring M group diversity. The
subsets of these alignments that contained 18 A, 102 B, 228 C, and
6 G subtype sequences (Gag), and 62 A, 454 B, 284 C, and 13 G
subtype sequences (Nef), were used for within- and
between-single-clade optimizations and comparisons.
[0059] The Genetic Algorithm.
[0060] GAs are computational analogues of biological processes
(evolution, populations, selection, recombination) used to find
solutions to problems that are difficult to solve analytically
(Holland, Adaptation in Natural and Artificial Systems: An
Introductory Analysis with Applications to Biology, Control, and
Artificial Intelligence, (M.I.T. Press, Cambridge, Mass. (1992))).
Solutions for a given input are "evolved" through a process of
random modification and selection according to a "fitness"
(optimality) criterion. GAs come in many flavors; a "steady-state
co-evolutionary multi-population" GA was implemented.
"Steady-state" refers to generating one new candidate solution at a
time, rather than a whole new population at once; and
"co-evolutionary" refers to simultaneously evolving several
distinct populations that work together to form a complete
solution. The input is an unaligned set of natural sequences; a
candidate solution is a set of k pseudo-natural "mosaic" sequences,
each of which is formed by concatenating sections of natural
sequences. The fitness criterion is population coverage, defined as
the proportion of all 9-amino-acid sequence fragments (potential
epitopes) in the input sequences that are found in the
cocktail.
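As an illustrative sketch only (not the code used in the study), the coverage criterion described above could be computed along the following lines. The function names kmers and coverage are hypothetical, and the sketch assumes that every occurrence of a 9-mer in the input sequences counts once toward the total.

```python
from collections import Counter

K = 9  # epitope length used in the text


def kmers(seq, k=K):
    """Return every overlapping k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def coverage(cocktail, natural_seqs, k=K):
    """Fraction of k-mer occurrences in natural_seqs that appear in the cocktail.

    Assumption: each occurrence in each input sequence is counted once,
    so common k-mers weigh more than rare ones.
    """
    in_cocktail = {m for seq in cocktail for m in kmers(seq, k)}
    counts = Counter(m for seq in natural_seqs for m in kmers(seq, k))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hit = sum(n for m, n in counts.items() if m in in_cocktail)
    return hit / total
```

Weighting by occurrence means that a cocktail is rewarded more for matching a 9-mer carried by many input sequences than for matching one carried by a single sequence.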
[0061] To initialize the GA (FIG. 2), k populations of n initial
candidate sequences are generated by 2-point recombination between
randomly selected natural sequences. Because the input natural
sequences are not aligned, "homologous" crossover is used:
crossover points in each sequence are selected by searching for
short matching strings in both sequences; strings of length c-1=8
were used, where c=9 is a typical epitope length. This ensures that the
recombined sequences resemble natural proteins: the boundaries
between sections of sequence derived from different strains are
seamless, the local sequences spanning the boundaries are always
found in nature, and the mosaics are prevented from acquiring large
insertions/deletions or unnatural combinations of amino acids.
Mosaic sequence lengths fall within the distribution of natural
sequence lengths as a consequence of mosaic construction:
recombination is only allowed at identical regions, reinforced by
an explicit software prohibition against excessive lengths to
prevent reduplication of repeat regions. (Such "in frame" insertion
of reduplicated epitopes could provide another way of increasing
coverage without generating unnatural 9-mers, but their inclusion
would create "unnatural" proteins.) Initially, the cocktail
contains one randomly chosen "winner" from each population. The
fitness score for any individual sequence in a population is the
coverage value for the cocktail consisting of that sequence plus
the current winners from the other populations. The individual
fitness of any sequence in a population therefore depends
dynamically upon the best sequences found in the other
populations.
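A minimal sketch of 2-point "homologous" crossover between two unaligned sequences is given below, for illustration only. The names shared_anchors and homologous_crossover are hypothetical, and choosing both crossover points uniformly at random among the shared 8-mers is an assumption; the length check against reduplicated repeat regions mentioned above is omitted.

```python
import random


def shared_anchors(a, b, w=8):
    """List (i, j) pairs where a[i:i+w] == b[j:j+w]."""
    index_b = {}
    for j in range(len(b) - w + 1):
        index_b.setdefault(b[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(a) - w + 1)
            for j in index_b.get(a[i:i + w], [])]


def homologous_crossover(a, b, w=8, rng=random):
    """Two-point crossover at shared w-mers.

    Returns None when no two suitably ordered crossover points exist.
    """
    anchors = shared_anchors(a, b, w)
    if not anchors:
        return None
    i1, j1 = rng.choice(anchors)
    # The second point must lie further along in both parents.
    later = [(i, j) for i, j in anchors if i > i1 and j > j1]
    if not later:
        return None
    i2, j2 = rng.choice(later)
    # Head of a, middle of b, tail of a; both joins fall on a shared w-mer.
    return a[:i1] + b[j1:j2] + a[i2:]
```

Because each join sits at the start of a string of length 8 shared by both parents, every 9-mer that spans a join already occurs in one of the parents, which is the "seamless boundary" property described in the text.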
[0062] Optimization proceeds one population at a time. For each
iteration, two "parent" sequences are chosen. The first parent is
chosen using "2-tournament" selection: two sequences are picked at
random from the current population, scored, and the better one is
chosen. This selects parents with a probability inversely
proportional to their fitness rank within the population, without
the need to actually compute the fitness of all individuals. The
second parent is chosen in the same way (50% of the time), or is
selected at random from the set of natural sequences. 2-point
homologous crossover between the parents is then used to generate a
"child" sequence. Any child containing a 9-mer that was very rare
in the natural population (found fewer than 3 times) is rejected
immediately. Otherwise, the new sequence is scored, and its fitness
is compared with the fitnesses of four randomly chosen sequences
from the same population. If any of the four randomly chosen
sequences has a score lower than that of the new sequence, it is
replaced in the population by the new sequence. Whenever a sequence
is encountered that yields a better score than the current
population "winner", that sequence becomes the winner for the
current population and so is subsequently used in the cocktail to
evaluate sequences in other populations. A few such optimization
cycles (typically 10) are applied to each population in turn, and
this process continues cycling through the populations until
evolution stalls (i.e., no improvement has been made for a defined
number of generations). At this point, the entire procedure is
restarted using newly generated random starting populations, and
the restarts are continued until no further improvement is seen.
The GA was run on each data set with n=50 or 500; each run was
continued until no further improvement occurred for 12-24 hours on
a 2 GHz Pentium processor. Cocktails were generated having k=1, 3,
4, or 6 mosaic sequences.
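One steady-state iteration of the kind described above might be sketched as follows. This is a hedged illustration, not the implementation used in the study: the names tournament and ga_step are hypothetical, the scoring, crossover and rare-9-mer checks are passed in as caller-supplied functions, and replacing the lowest-scoring of the four randomly chosen rivals is an assumption (the text does not say which of them is replaced).

```python
import random


def tournament(population, score, rng, size=2):
    """Pick `size` members at random and return the best-scoring one."""
    return max(rng.sample(population, size), key=score)


def ga_step(population, naturals, score, crossover, has_rare_kmer, rng,
            n_rivals=4):
    """Run one steady-state iteration on `population` (modified in place).

    Returns the child if it entered the population, otherwise None.
    """
    parent1 = tournament(population, score, rng)
    # Second parent: another tournament half of the time, else a natural sequence.
    if rng.random() < 0.5:
        parent2 = tournament(population, score, rng)
    else:
        parent2 = rng.choice(naturals)
    child = crossover(parent1, parent2, rng)
    if child is None or has_rare_kmer(child):
        return None  # rejected immediately, without scoring
    child_score = score(child)
    rivals = rng.sample(range(len(population)), min(n_rivals, len(population)))
    worst = min(rivals, key=lambda i: score(population[i]))
    if score(population[worst]) < child_score:
        population[worst] = child
        return child
    return None
```

Because a member is only ever replaced by a higher-scoring child, the best score in a population cannot decrease from one iteration to the next.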
[0063] The GA also enables optional inclusion of one or more fixed
sequences of interest (for example, a consensus) in the cocktail
and will evolve the other elements of the cocktail in order to
optimally complement that fixed strain. As these solutions were
suboptimal, they are not included here. An additional program
selects from the input file the k best natural strains that in
combination provide the best population coverage.
[0064] Comparison with Other Polyvalent Vaccine Candidates.
[0065] Population coverage scores were computed for other potential
mono- or polyvalent vaccines to make direct comparisons with the
mosaic-sequence vaccines, tracking identities with population
9-mers, as well as similarities of 8/9 and 7/9 amino acids.
Potential vaccine candidates based on natural strains include
single strains (for example, a single C strain for a vaccine for
southern Africa (Williamson et al, AIDS Res. Hum. Retroviruses
19:133-44 (2003))) or combinations of natural strains (for example,
one each of subtype A, B, and C (Kong et al, J. Virol. 77:12764-72
(2003))). To date, natural-strain vaccine candidates have not been
systematically selected to maximize potential T-cell epitope
coverage; vaccine candidates were picked from the literature to be
representative of what could be expected from unselected vaccine
candidates. An upper bound for coverage was also determined using
only intact natural strains: optimal natural-sequence cocktails
were generated by selecting the single sequence with the best
coverage of the dataset, and then successively adding the most
complementary sequences up to a given k. The comparisons included
optimal natural-sequence cocktails of various sizes, as well as
consensus sequences, alone or in combination (Gaschen et al,
Science 296:2354-60 (2002)), to represent the concept of central,
synthetic vaccines. Finally, using the fixed-sequence option in the
GA, consensus-plus-mosaic combinations were included in the
comparisons; these scores were essentially equivalent to all-mosaic
combinations for a given k (data not shown). The code used for
performing these analyses is available at:
ftp://ftp-t10/pub/btk/mosaics.
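The successive selection of natural strains described above (best single sequence first, then the most complementary additions) can be illustrated with a short greedy sketch. The function names are hypothetical, ties are broken by input order as an assumption, and this is not the program referred to in the text.

```python
from collections import Counter


def kmer_set(seq, k=9):
    """Distinct k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_natural_cocktail(natural_seqs, n_pick, k=9):
    """Return indices of up to n_pick sequences chosen greedily for coverage."""
    # How often each k-mer occurs across the whole input set.
    counts = Counter()
    for s in natural_seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1

    covered = set()
    chosen = []
    remaining = list(range(len(natural_seqs)))
    for _ in range(min(n_pick, len(remaining))):
        # Gain = occurrences of k-mers this sequence would newly cover.
        def gain(idx):
            return sum(counts[m] for m in kmer_set(natural_seqs[idx], k) - covered)

        best = max(remaining, key=gain)
        chosen.append(best)
        covered |= kmer_set(natural_seqs[best], k)
        remaining.remove(best)
    return chosen
```

A greedy pass of this kind is simple and fast, but it is a heuristic and is not guaranteed to find the best possible combination of k intact natural strains.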
Results
[0066] Protein Variation.
[0067] In conserved HIV-1 proteins, most positions are essentially
invariant, and most variable positions have only two to three amino
acids that occur at appreciable frequencies, and variable positions
are generally well dispersed between conserved positions.
Therefore, within the boundaries of a CD8+ T-cell epitope (8-12
amino acids, typically nine), most of the population diversity can
be covered with very few variants. FIG. 1 shows an upper bound for
population coverage of 9-mers (stretches of nine contiguous amino
acids) comparing Gag, Nef, and Env for increasing numbers of
variants, sequentially adding variants that provide the best
coverage. In conserved regions, a high degree of population
coverage is achieved with 2-4 variants. By contrast, in variable
regions like Env, limited population coverage is possible even with
eight variants. Since each new addition is rarer, the relative
benefits of each addition diminish as the number of variants
increases.
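The idealized upper bound discussed above can be approximated, under simplifying assumptions, by treating each 9-residue window of an alignment independently and keeping only its most frequent variants. The sketch below assumes equal-length aligned sequences, ignores gap handling, and uses the hypothetical name upper_bound_coverage; it is not the calculation used to produce FIG. 1.

```python
from collections import Counter


def upper_bound_coverage(aligned, n_variants, k=9):
    """Mean fraction of sequences matched per window by its top n_variants k-mers.

    Assumptions: all sequences in `aligned` have equal length, and each
    window is optimized independently of its overlapping neighbours.
    """
    if not aligned:
        return 0.0
    width = len(aligned[0])
    covered = 0
    total = 0
    for start in range(width - k + 1):
        window = Counter(s[start:start + k] for s in aligned)
        total += len(aligned)
        covered += sum(c for _, c in window.most_common(n_variants))
    return covered / total if total else 0.0
```

Since each window is optimized without regard to its neighbours, the result is an upper bound: real antigens cannot always combine the best variant of every overlapping window.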
[0068] Vaccine Design Optimization Strategies.
[0069] FIG. 1 shows an idealized level of 9-mer coverage. In
reality, high-frequency 9-mers often conflict: because of local
co-variation, the optimal amino acid for one 9-mer may differ from
that for an overlapping 9-mer. To design mosaic protein sets that
optimize population coverage, the relative benefits of each amino
acid must be evaluated in combination with nearby variants. For
example, Alanine (Ala) and Glutamate (Glu) might each frequently
occur in adjacent positions, but if the Ala-Glu combination is
never observed in nature, it should be excluded from the vaccine.
Several optimization strategies were investigated: a greedy
algorithm, a semi-automated compatible-9-mer assembly strategy, an
alignment-based genetic algorithm (GA), and an
alignment-independent GA.
[0070] The alignment-independent GA generated mosaics with the best
population coverage. This GA generates a user-specified number of
mosaic sequences from a set of unaligned protein sequences,
explicitly excluding rare or unnatural epitope-length fragments
(potentially introduced at recombination breakpoints) that could
induce non-protective vaccine-antigen-specific responses. These
candidate vaccine sequences resemble natural proteins, but are
assembled from frequency-weighted fragments of database sequences
recombined at homologous breakpoints (FIG. 2); they approach
maximal coverage of 9-mers for the input population.
[0071] Selecting HIV Protein Regions for an Initial Mosaic
Vaccine.
[0072] The initial design focused on protein regions meeting
specific criteria: i) relatively low variability, ii) high levels
of recognition in natural infection, iii) a high density of known
epitopes and iv) either early responses upon infection or CD8+
T-cell responses associated with good outcomes in infected
patients. First, an assessment was made of the level of 9-mer
coverage achieved by mosaics for different HIV proteins (FIG. 3).
For each protein, a set of four mosaics was generated using either
the M group or the B- and C-subtypes alone; coverage was scored on
the C subtype. Several results are notable: i) within-subtype
optimization provides the best within-subtype coverage, but
substantially poorer between-subtype coverage--nevertheless,
B-subtype-optimized mosaics provide better C-subtype coverage than
a single natural B subtype protein (Kong et al, J. Virol.
77:12764-72 (2003)); ii) Pol and Gag have the most potential to
elicit broadly cross-reactive responses, whereas Rev, Tat, and Vpu
have even fewer conserved 9-mers than the highly variable Env
protein; and iii) within-subtype coverage of M-group-optimized mosaic
sets approached coverage of within-subtype optimized sets,
particularly for more conserved proteins.
[0073] Gag and the central region of Nef meet the four criteria
listed above. Nef is the HIV protein most frequently recognized by
T-cells (Frahm et al, J. Virol. 78:2187-200 (2004)) and the target
for the earliest response in natural infection (Lichterfeld et al,
Aids 18:1383-92 (2004)). While overall it is variable (FIG. 3), its
central region is as conserved as Gag (FIG. 1). It is not yet clear
what optimum proteins for inclusion in a vaccine might be, and
mosaics could be designed to maximize the potential coverage of
even the most variable proteins (FIG. 3), but the prospects for
global coverage are better for conserved proteins. Improved vaccine
protection in macaques has been demonstrated by adding Rev, Tat,
and Nef to a vaccine containing Gag, Pol, and Env (Hel et al, J.
Immunol. 176:85-96 (2006)), but this was in the context of
homologous challenge, where variability was not an issue. The
extreme variability of regulatory proteins in circulating virus
populations may preclude cross-reactive responses; in terms of
conservation, Pol, Gag (particularly p24) and the central region of
Nef (HXB2 positions 65-149) are promising potential immunogens
(FIGS. 1,3). Pol, however, is infrequently recognized during
natural infection (Frahm et al, J. Virol. 78:2187-200 (2004)), so
it was not included in the initial immunogen design. The conserved
portion of Nef that was included contains the most highly
recognized peptides in HIV-1 (Frahm et al, J. Virol. 78:2187-200
(2004)), but as a protein fragment, would not allow Nef's immune
inhibitory functions (e.g. HLA class I down-regulation
(Blagoveshchenskaya, Cell 111:853-66 (2002))). Both Gag and Nef are
densely packed with overlapping well-characterized CD8+ and CD4+
T-cell epitopes, presented by many different HLA molecules
(http://www.hiv.lanl.gov//content/immunology/maps/maps.html), and
Gag-specific CD8+ (Masemola et al, J. Virol. 78:3233-43 (2004)) and
CD4+ (Oxenius et al, J. Infect. Dis. 189:1199-208 (2004)) T-cell
responses have been associated with low viral set points in
infected individuals (Masemola et al, J. Virol. 78:3233-43
(2004)).
[0074] To examine the potential impact of geographic variation and
input sample size, a limited test was done using published subtype
C sequences. The subtype C Gag data were divided into three sets of
comparable size--two South African sets (Kiepiela et al, Nature
432:769-75 (2004)), and one non-South-African subtype C set.
Mosaics were optimized independently on each of the sets, and the
resulting mosaics were tested against all three sets. The coverage
of 9-mers was slightly better for identical training and test sets
(77-79% 9/9 coverage), but essentially equivalent when the training
and test sets were the two different South African data sets
(73-75%), or either of the South African sets and the non-South
African C subtype sequences (74-76%). Thus between- and
within-country coverage approximated within-clade coverage, and in
this case no advantage to a country-specific C subtype mosaic
design was found.
[0075] Designing Mosaics for Gag and Nef and Comparing Vaccine
Strategies.
[0076] To evaluate within- and between-subtype cross-reactivity for
various vaccine design strategies, a calculation was made of the
coverage they provided for natural M-Group sequences. The fraction
of all 9-mers in the natural sequences that were perfectly matched
by 9-mers in the vaccine antigens was computed, as well as the
fractions having 8/9 or 7/9 matching amino acids, since single (and sometimes
double) substitutions within epitopes may retain cross-reactivity.
FIG. 4 shows M group coverage per 9-mer in Gag and the central
region of Nef for cocktails designed by various strategies: a)
three non-optimal natural strains from the A, B, and C subtypes
that have been used as vaccine antigens (Kong et al, J. Virol.
77:12764-72 (2003)); b) three natural strains that were
computationally selected to give the best M group coverage; c) M
group, B subtype, and C subtype consensus sequences; and, d, e, f)
three, four and six mosaic proteins. For cocktails of multiple
strains, sets of k=3, k=4, and k=6, the mosaics clearly perform the
best, and coverage approaches the upper bound for k strains. They
are followed by optimally selected natural strains, the consensus
protein cocktail, and finally, non-optimal natural strains.
Allowing more antigens provides greater coverage, but gains for
each addition are reduced as k increases (FIGS. 1 and 4).
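The 9/9, 8/9 and 7/9 match fractions described above could be tallied as in the following sketch, offered as an illustration under stated assumptions rather than as the code used in the study. The names are hypothetical, each natural 9-mer is compared with every vaccine 9-mer regardless of position, and the three returned values are cumulative (an exact match also counts as an 8/9 and a 7/9 match).

```python
from collections import Counter


def kmer_list(seq, k=9):
    """All overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def match_coverage(cocktail, natural_seqs, k=9, max_off=2):
    """Return [frac with 0 mismatches, frac with <=1, ..., frac with <=max_off]."""
    vaccine = {m for s in cocktail for m in kmer_list(s, k)}
    counts = Counter(m for s in natural_seqs for m in kmer_list(s, k))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * (max_off + 1)
    hits = [0] * (max_off + 1)
    for m, n in counts.items():
        if m in vaccine:
            best = 0
        else:
            best = min((mismatches(m, v) for v in vaccine), default=k)
        # Credit this k-mer at every tolerance level it satisfies.
        for d in range(best, max_off + 1):
            hits[d] += n
    return [h / total for h in hits]
```

The pairwise comparison is quadratic in the number of distinct 9-mers, which is acceptable for a sketch but would likely need indexing for large alignments.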
[0077] FIG. 5 summarizes total coverage for the different vaccine
design strategies, from single proteins through combinations of
mosaic proteins, and compares within-subtype optimization to M
group optimization. The performance of a single mosaic is
comparable to the best single natural strain or a consensus
sequence. Although a single consensus sequence out-performs a
single best natural strain, the optimized natural-sequence cocktail
does better than the consensus cocktail: the consensus sequences
are more similar to each other than are natural strains, and are
therefore somewhat redundant. Including even just two mosaic
variants, however, markedly increases coverage, and four and six
mosaic proteins give progressively better coverage than polyvalent
cocktails of natural or consensus strains. Within-subtype optimized
mosaics perform best--with four mosaic antigens 80-85% of the
9-mers are perfectly matched--but between-subtype coverage of these
sets falls off dramatically, to 50-60%. In contrast, mosaic
proteins optimized using the full M group give coverage of
approximately 75-80% for individual subtypes, comparable to the
coverage of the M group as a whole (FIGS. 5 and 6). If imperfect
8/9 matches are allowed, both M group optimized and within-subtype
optimized mosaics approach 90% coverage.
[0078] Since coverage is increased by adding progressively rarer
9-mers, and rare epitopes may be problematic (e.g., by inducing
vaccine-specific immunodominant responses), an investigation was
made of the frequency distribution of 9-mers in the vaccine
constructs relative to the natural sequences from which they were
generated. Most additional epitopes in a k=6 cocktail compared to a
k=4 cocktail are low-frequency (<0.1, FIG. 7). Despite enhancing
coverage, these epitopes are relatively rare, and thus responses
they induce might draw away from vaccine responses to more common,
thus more useful, epitopes. Natural-sequence cocktails actually
have fewer occurrences of moderately low-frequency epitopes than
mosaics, which accrue some lower frequency 9-mers as coverage is
optimized. On the other hand, the mosaics exclude unique or very
rare 9-mers, while natural strains generally contain 9-mers present
in no other sequence. For example, natural M group Gag sequences
had a median of 35 (range 0-148) unique 9-mers per sequence.
Retention of HLA-anchor motifs was also explored, and anchor motif
frequencies were found to be comparable between four mosaics and
three natural strains. Natural antigens did exhibit an increase in
number of motifs per antigen, possibly due to inclusion of
strain-specific motifs (FIG. 8).
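The count of unique 9-mers per natural sequence mentioned above can be illustrated with the short sketch below. The function name unique_kmer_counts is hypothetical, and "unique" is taken here to mean a 9-mer found in exactly one sequence of the input set.

```python
from collections import Counter
from statistics import median


def unique_kmer_counts(seqs, k=9):
    """For each sequence, count its k-mers that occur in no other sequence."""
    carriers = Counter()  # number of sequences containing each k-mer
    per_seq = []
    for s in seqs:
        ks = {s[i:i + k] for i in range(len(s) - k + 1)}
        per_seq.append(ks)
        carriers.update(ks)
    return [sum(1 for m in ks if carriers[m] == 1) for ks in per_seq]


def summarize(seqs, k=9):
    """Return (median, minimum, maximum) of unique k-mers per sequence."""
    counts = unique_kmer_counts(seqs, k)
    return median(counts), min(counts), max(counts)
```

Applied to a candidate antigen together with the natural set, the same count gives a quick check of whether the candidate carries 9-mers that are absent from all natural sequences.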
[0079] The increase in ever-rarer epitopes with increasing k,
coupled with concerns about vaccination-point dilution and reagent
development costs, resulted in the initial production of mosaic
protein sets limited to 4 sequences (k=4), spanning Gag and the
central region of Nef, optimized for subtype B, subtype C, and the
M group (these sequences are included in FIG. 9; mosaic sets for
Env and Pol are set forth in FIG. 10). Synthesis of various
four-sequence Gag-Nef mosaics and initial antigenicity studies are
underway. The initial mosaic vaccine targets just Gag and the
central region of the Nef protein, which are conserved enough to
provide excellent global population coverage, and have the
desirable properties described above in terms of natural responses
(Bansal et al, Aids 19:241-50 (2005)). Additionally, including B
subtype p24 variants in Elispot peptide mixtures to detect natural
CTL responses to infection significantly enhanced both the number
and the magnitude of responses detected, supporting the idea that
including variants of even the most conserved proteins will be
useful. Finally, cocktails of proteins in a polyvalent HIV-1
vaccine given to rhesus macaques did not interfere with the
development of robust responses to each antigen (Seaman et al, J.
Virol. 79:2956-63 (2005)), and antigen cocktails did not produce
antagonistic responses in murine models (Singh et al, J. Immunol.
169:6779-86 (2002)), indicating that antigenic mixtures are
appropriate for T-cell vaccines.
[0080] Even with mosaics, variable proteins like Env have limited
coverage of 9-mers, although mosaics improve coverage relative to
natural strains. For example, three M group natural proteins, one
each selected from the A, B, and C clades, and currently under
study for vaccine design (Seaman et al, J. Virol. 79:2956-63
(2005)) perfectly match only 39% of the 9-mers in M group proteins,
and 65% have at least 8/9 matches. In contrast, three M group Env
mosaics match 47% of 9-mers perfectly, and 70% have at least an 8/9
match. The code written to design polyvalent mosaic antigens is
available, and could readily be applied to any input set of
variable proteins, optimized for any desired number of antigens.
The code also allows selection of optimal combinations of k natural
strains, enabling rational selection of natural antigens for
polyvalent vaccines. Included in Table 1 are the best natural
strains for Gag and Nef population coverage of current database
alignments.
TABLE 1
Natural sequence cocktails having the best available 9-mer coverage
for different genes, subtype sets, and numbers of sequences

Gag, B-subtype, 1 natural sequence
  B.US.86.AD87_AF004394
Gag, B-subtype, 3 natural sequences
  B.US.86.AD87_AF004394
  B.US.97.Ac_06_AY247251
  B.US.88.WR27_AF286365
Gag, B-subtype, 4 natural sequences
  B.US.86.AD87_AF004394
  B.US.97.Ac_06_AY247251
  B.US._.R3_PDC1_AY206652
  B.US.88.WR27_AF286365
Gag, B-subtype, 6 natural sequences
  B.CN._.CNHN24_AY180905
  B.US.86.AD87_AF004394
  B.US.97.Ac_06_AY247251
  B.US._.P2_AY206654
  B.US._.R3_PDC1_AY206652
  B.US.88.WR27_AF286365
Gag, C-subtype, 1 natural sequence
  C.IN._.70177_AF533131
Gag, C-subtype, 3 natural sequences
  C.ZA.97.97ZA012
  C.ZA.x.04ZASK161B1
  C.IN._.70177_AF533131
Gag, C-subtype, 4 natural sequences
  C.ZA.97.97ZA012
  C.ZA.x.04ZASK142B1
  C.ZA.x.04ZASK161B1
  C.IN._.70177_AF533131
Gag, C-subtype, 6 natural sequences
  C.ZA.97.97ZA012
  C.ZA.x.04ZASK142B1
  C.ZA.x.04ZASK161B1
  C.BW.99.99BWMC168_AF443087
  C.IN._.70177_AF533131
  C.IN_MYA1_AF533139
Gag, M-group, 1 natural sequence
  C.IN._.70177_AF533131
Gag, M-group, 3 natural sequences
  B.US.90.US2_AY173953
  C.IN._.70177_AF533131
  15_01B.TH.99.99TH_R2399_AF530576
Gag, M-group, 4 natural sequences
  B.US.90.US2_AY173953
  C.IN._.70177_AF533131
  C.IN.93.93IN999_AF067154
  15_01B.TH.99.99TH_R2399_AF530576
Gag, M-group, 6 natural sequences
  C.ZA.x.04ZASK138B1
  B.US.90.US2_AY173953
  B.US._.WT1_PDC1_AY206656
  C.IN._.70177_AF533131
  C.IN.93.93IN999_AF067154
  15_01B.TH.99.99TH_R2399_AF530576
Nef (central region), B-subtype, 1 natural sequence
  B.GB.94.028jh_94_1_NP_AF129346
Nef (central region), B-subtype, 3 natural sequences
  B.GB.94.028jh_94_1_NP_AF129346
  B.KR.96.96KCS4_AY121471
  B.FR.83.HXB2_K03455
Nef (central region), B-subtype, 4 natural sequences
  B.GB.94.028jh_94_1_NP_AF129346
  B.KR.96.96KCS4_AY121471
  B.US.90.E90NEF_U43108
  B.FR.83.HXB2_K03455
Nef (central region), B-subtype, 6 natural sequences
  B.GB.94.028jh_94_1_NP_AF129346
  B.KR.02.02HYJ3_AY7121454
  B.KR.96.96KCS4_AY121471
  B.CN._.RL42_U71182
  B.US.90.E90NEF_U43108
  B.FR.83.HXB2_K03455
Nef (central region), C-subtype, 1 natural sequence
  C.ZA.04.04ZASK139B1
Nef (central region), C-subtype, 3 natural sequences
  C.ZA.04.04ZASK180B1
  C.ZA.04.04ZASK139B1
  C.ZA._.ZASW15_AF397568
Nef (central region), C-subtype, 4 natural sequences
  C.ZA.97.ZA97004_AF529682
  C.ZA.04.04ZASK180B1
  C.ZA.04.04ZASK139B1
  C.ZA._.ZASW15_AF397568
Nef (central region), C-subtype, 6 natural sequences
  C.ZA.97.ZA97004_AF529682
  C.ZA.00.1192M3M
  C.ZA.04.04ZASK180B1
  C.ZA.04.04ZASK139B1
  C.04ZASK184B1
  C.ZA._.ZASW15_AF397568
Nef (central region), M-group, 1 natural sequence
  B.GB.94.028jh_94_1_NP_AF129346
Nef (central region), M-group, 3 natural sequences
  02_AG.CM._.98CM1390_AY265107
  C.ZA.03.03ZASK020B2
  B.GB.94.028jh_94_1_NP_AF129346
Nef (central region), M-group, 4 natural sequences
  02_AG.CM._.98CM1390_AY265107
  01A1.MM.99.mCSW105_AB097872
  C.ZA.03.03ZASK020B2
  B.GB.94.028jh_94_1_NP_AF129346
Nef (central region), M-group, 6 natural sequences
  02_AG.CM._.98CM1390_AY265107
  01A1.MM.99.mCSW105_AB097872
  C.ZA.03.03ZASK020B2
  C.03ZASK111B1
  B.GB.94.028jh_94_1_NP_AF129346
  B.KR.01.01CWS2_AF462757
[0081] Summarizing, the above-described study focuses on the design
of T-cell vaccine components to counter HIV diversity at the moment
of infection, and to block viral escape routes and thereby
minimize disease progression in infected individuals. The
polyvalent mosaic protein strategy developed here for HIV-1 vaccine
design could be applied to any variable protein, to other
pathogens, and to other immunological problems. For example,
incorporating a minimal number of variant peptides into T-cell
response assays could markedly increase sensitivity without
excessive cost: a set of k mosaic proteins provides the maximum
coverage possible for k antigens.
[0082] A centralized (consensus or ancestral) gene and protein
strategy has been proposed previously to address HIV diversity
(Gaschen et al, Science 296:2354-2360 (2002)). Proof-of-concept for
the use of artificial genes as immunogens has been demonstrated by
the induction of both T and B cell responses to wild-type HIV-1
strains by group M consensus immunogens (Gaschen et al, Science
296:2354-2360 (2002), Gao et al, J. Virol. 79:1154-63 (2005),
Doria-Rose et al, J. Virol. 79:11214-24 (2005), Weaver et al, J.
Virol., in press). The mosaic protein design improves on consensus
or natural immunogen design by co-optimizing reagents for a
polyclonal vaccine, excluding rare CD8+ T-cell epitopes, and
incorporating variants that, by virtue of their frequency at the
population level, are likely to be involved in escape pathways.
[0083] The mosaic antigens maximize the number of epitope-length
variants that are present in a small, practical number of vaccine
antigens. The decision was made to use multiple antigens that
resemble native proteins, rather than linking sets of concatenated
epitopes in a poly-epitope pseudo-protein (Hanke et al, Vaccine
16:426-35 (1998)), reasoning that in vivo processing of native-like
vaccine antigens will more closely resemble processing in natural
infection, and will also allow expanded coverage of overlapping
epitopes. T-cell mosaic antigens would be best employed in the
context of a strong polyvalent immune response; improvements in
other areas of vaccine design and a combination of the best
strategies, incorporating mosaic antigens to cover diversity, may
ultimately enable an effective cross-reactive vaccine-induced
immune response against HIV-1.
Example 2
[0084] Group M consensus envelope and trivalent mosaic envelopes
(both of which were designed by in silico modeling and are
predicted to be superior to wildtype envelopes) will be compared
to a monovalent wild-type envelope and trivalent wild-type
transmitted envelopes in a 4 arm immunogenicity clinical trial. The
mosaic antigens have been designed based on the current Los Alamos
database, a set that includes more full length envelopes sampled
globally from more than 2000 individuals with a large set of
sequences of transmitted viruses primarily from the CHAVI
database.
[0085] The selection of the natural strains to be used for the
comparison is based on the following criteria: For the monovalent
natural antigen, use will be made of the single transmitted virus
that is the best choice in terms of providing coverage of potential
T cell epitopes in the global database. The database is biased
towards B clade envelopes, so the single best acute Env is a B
clade representative. One A, one B and one C subtype transmitted
virus sequence is proposed for inclusion in the trivalent set, to
compensate for the biases in sampling inherent in the global
sequence collection, and to better reflect the circulating pandemic
strains. The A and C natural sequences are those that optimally
complement the best B clade sequence to provide potential epitope
coverage of the database. Vaccine antigens have been selected from
among available SGA sequenced acute samples, each representing a
transmitted virus. Therefore, this study, although primarily a T
cell study, will also provide important additional data regarding
the ability of transmitted envelope vaccines to elicit neutralizing
antibodies.
[0086] For a mosaic/consensus human trial, the following 4 arm
trial is proposed, 20 people per group, with a negative control:
[0087] 1) Con S (a well studied consensus of the consensus of each
clade, based on the 2002 database; Con S has been extensively
tested in animal models, and has theoretical coverage roughly
comparable to a single mosaic.)
[0088] 2) A 3 mosaic M group antigen set designed to, in
combination, provide optimal global coverage of 9 amino acid long
stretches in the database. Such 9-mers represent potential epitope
coverage of the database. Unnatural 9-mers are excluded in mosaics,
and rare variants minimized.
[0089] 3) The optimal single best natural protein selected from
sequences sampled from acutely infected patients with SGA sequences
available; these sequences should correspond to viable, transmitted
sequences. As in (2), this sequence will be selected to be the one
that provides optimal 9-mer coverage of the database. The B clade
currently dominates sampling for the sequence database, so the
sequence with the best database coverage will be a B clade
sequence.
[0090] 4) The best natural strains from acute infection SGA
sequences that in combination provide the best global coverage.
(Note: the B and C dominate the M group sampling hence the code
naturally selects one of each as the two best. Thus, the third
complementary sequence was forced to be selected from an acute SGA
A clade set, to counter this bias and better reflect the global
epidemic).
[0091] 5) Negative control buffer/saline
[0092] The current M group alignment in the HIV database was
combined with all of the newer CHAVI sequences--this includes a
total of 2020 sequences:
[0093] 728 B clade
[0094] 599 C clade
[0095] 693 that are all other clades, circulating recombinant
forms, and unique recombinants. This was used for the M group
vaccine design.
[0096] This sampling is obviously skewed toward the B and C clade.
As will be shown subsequently, the coverage of "potential epitopes"
(9-mers) in other clades is still excellent.
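The notion of 9-mer "potential epitope" coverage used throughout this example can be illustrated with a short sketch. It is not the vaccine design program itself. It assumes ungapped protein strings and assumes that coverage is the pooled fraction of database 9-mers with an exact match in the cocktail; the function names kmers and coverage are illustrative only.

```python
from collections import Counter

def kmers(seq, k=9):
    """Return every contiguous k-length window of an ungapped sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def coverage(vaccine_seqs, database_seqs, k=9):
    """Fraction of database k-mers (counted with multiplicity) that have
    an exact match somewhere in the vaccine cocktail.
    Assumption: coverage is pooled over all database sequences."""
    vaccine_kmers = {km for s in vaccine_seqs for km in kmers(s, k)}
    counts = Counter(km for s in database_seqs for km in kmers(s, k))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for km, n in counts.items() if km in vaccine_kmers)
    return covered / total
```

Calling coverage with one, two or three antigens against the same database gives the kind of comparison summarized for Con S, Nat.1, Nat.3 and Mos.3.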
The Sequences
[0097] M consensus
TABLE-US-00002 >ConS (SEQ ID NO: 219)
MRVRGIQRNCQHLWRWGTLILGMLMICSAAENLWVTVYYGVPVWKEANTT
LFCASDAKAYDTEVHNVWATHACVPTDPNPQEIVLENVTENFNMWKNNMV
EQMHEDIISLWDQSLKPCVKLTPLCVTLNCTNVNVTNTTNNTEEKGEIKN
CSFNITTEIRDKKQKVYALFYRLDVVPIDDNNNNSSNYRLINCNTSAITQ
ACPKVSFEPIPIHYCAPAGFAILKCNDKKFNGTGPCKNVSTVQCTHGIKP
VVSTQLLLNGSLAEEEIIIRSENITNNAKTIIVQLNESVEINCTRPNNNT
RKSIRIGPGQAFYATGDIIGDIRQAHCNISGTKWNKTLQQVAKKLREHFN
NKTIIFKPSSGGDLEITTHSFNCRGEFFYCNTSGLFNSTWIGNGTKNNNN
TNDTITLPCRIKQIINMWQGVGQAMYAPPIEGKITCKSNITGLLLTRDGG
NNNTNETEIFRPGGGDMRDNWRSELYKYKVVKIEPLGVAPTKAKRRVVER
EKRAVGIGAVFLGFLGAAGSTMGAASITLTVQARQLLSGIVQQQSNLLRA
IEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTTV
PWNSSWSNKSQDEIWDNMTWMEWEREINNYTDIIYSLIEESQNQQEKNEQ
ELLALDKWASLWNWFDITNWLWYIKIFIMIVGGLIGLRIVFAVLSIVNRV
RQGYSPLSFQTLIPNPRGPDRPEGIEEEGGEQDRDRSIRLVNGFLALAWD
DLRSLCLFSYHRLRDFILIAARTVELLGRKGLRRGWEALKYLWNLLQYWG
QELKNSAISLLDTTAIAVAEGTDRVIEVVQRACRAILNIPRRIRQGLERA LL
3 mosaics
TABLE-US-00003 >M_mos_3_1 (SEQ ID NO: 177)
MRVKGIRKNYQHLWRWGTMLLGMLMICSAAEQLWVTVYYGVPVWRDAETT
LFCASDAKAYEREVHNVWATHACVPTDPNPQEIVLENVTEEFNMWKNNMV
DQMHEDIISLWDESLKPCVKLTPLCVTLNCTDVNVTKTNSTSWGMMEKGE
IKNCSFNMTTELRDKKQKVYALFYKLDIVPLEENDTISNSTYRLINCNTS
AITQACPKVTFEPIPIHYCTPAGFAILKCNDKKFNGTGPCKNVSTVQCTH
GIRPVVTTQLLLNGSLAEEEIIIRSENLTNNAKTIIVQLNESVVINCTRP
NNNTRKSIRIGPGQTFYATGDIIGNIRQAHCNISREKWINTTRDVRKKLQ
EHFNKTIIFNSSSGGDLEITTHSFNCRGEFFYCNTSKLFNSVWGNSSNVT
KVNGTKVKETITLPCKIKQIINMWQEVGRAMYAPPIAGNITCKSNITGLL
LVRDGGNVTNNTEIFRPGGGNMKDNWRSELYKYKVVEIKPLGIAPTKAKR
RVVEREKRAVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQS
NLLRAIEAQQHMLQLTVWGIKQLQARILAVERYLRDQQLLGIWGCSGKLI
CTTNVPWNSSWSNKSLDEIWNNMTWMQWEKEIDNYTSLIYTLIEESQNQQ
EKNEQDLLALDKWANLWNWFDISNWLWYIRIFIMIVGGLIGLRIVFAVLS
IVNRVRKGYSPLSFQTLTPNPRGPDRLGRIEEEGGEQDKDRSIRLVNGFL
ALAWDDLRNLCLFSYHRLRDLLLIVTRIVELLGRRGWEALKYLWNLLQYW
IQELKNSAVSLLNATAIAVAEGTDRVIEVVQRACRAILHIPRRIRQGLER ALL
>M_mos_3_2 (SEQ ID NO: 220)
MRVKETQMNWPNLWKWGTLILGLVIICSASDNLWVTVYYGVPVWKEATTT
LFCASDAKAYDTEVHNVWATYACVPTDPNPQEVVLGNVTENFNMWKNNMV
EQMHEDIISLWDQSLKPCVRLTPLCVTLNCSNANTTNTNSTEEIKNCSFN
ITTSIRDKVQKEYALFYKLDVVPIDNDNTSYRLISCNTSVITQACPKVSF
EPIPIHYCAPAGFAILKCKDKKFNGTGPCTNVSTVQCTHGIRPVVSTQLL
LNGSLAEEEVVIRSENFTNNAKTIIVHLNKSVEINCTRPNNNTRKSIHIG
PGRAFYATGEIIGDIRQAHCNISRAKWNNTLKQIVKKLKEQFNKTIIFNQ
SSGGDPEITTHSFNCGGEFFYCNTSGLFNSTWNSTATQESNNTELNGNIT
LPCRIKQIVNMWQEVGKAMYAPPIRGQIRCSSNITGLILTRDGGNNNSTN
ETFRPGGGDMRDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGT
IGAMFLGFLGAAGSTMGAASLTLTVQARLLLSGIVQQQNNLLRAIEAQQH
LLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTTVPWNTSW
SNKSLNEIWDNMTWMEWEREIDNYTGLIYTLLEESQNQQEKNEQELLELD
KWASLWNWFDITKWLWYIKIFIMIVGGLVGLRIVFTVLSIVNRVRQGYSP
LSFQTHLPAPRGPDRPEGIEEEGGERDRDRSGRLVDGFLAIIWVDLRSLC
LFSYHQLRDFILIAARTVELLGHSSLKGLRRGWEALKYWWNLLQYWSQEL
KNSAISLLNTTAIVVAEGTDRIIEVLQRAGRAILHIPTRIRQGLERLLL
>M_mos_3_3 (SEQ ID NO: 179)
MRVRGIQRNWPQWWIWGILGFWMLMICNVVGNLWVTVYYGVPVWKEAKTT
LFCASDAKAYEKEVHNVWATHACVPTDPSPQEVVLENVTENFNMWKNDMV
DQMHEDVISLWDQSLKPCVKLTHLCVTLNCTNATNTNYNNSTNVTSSMIG
EMKNCSFNITTEIRDKSRKEYALFYRLDIVPLNEQNSSEYRLINCNTSTI
TQACPKVSFDPIPIHYCAPAGYAILKCNNKTFNGTGPCNNVSTVQCTHGI
KPVVSTQLLLNGSLAEGEIIIRSENLTDNAKTIIVHLNESVEIVCTRPNN
NTRKSVRIGPGQAFYATGDIIGDIRQAHCNLSRTQWNNTLKQIVTKLREQ
FGNKTIVFNQSSGGDPEIVMHSFNCGGEFFYCNTTQLFNSTWENSNITQP
LTLNRTKGPNDTITLPCRIKQIINMWQGVGRAMYAPPIEGLIKCSSNITG
LLLTRDGGNNSETKTTETFRPGGGNMRDNWRNELYKYKVVQIEPLGVAPT
RAKRRVVEREKRAVGIGAVFLGFLGTAGSTMGAASITLTVQARQVLSGIV
QQQSNLLKAIEAQQHLLKLTVWGIKQLQTRVLAIERYLKDQQLLGLWGCS
GKLICTTAVPWNSSWSNKSQTDIWDNMTWMQWDREISNYTDTIYRLLEDS
QNQQEKNEKDLLALDSWKNLWNWFDITNWLWYIKIFIIIVGGLIGLRIIF
AVLSIVNRCRQGYSPLSLQTLIPNPRGPDRLGGIEEEGGEQDRDRSIRLV
SGFLALAWDDLRSLCLFSYHRLRDFILIVARAVELLGRSSLRGLQRGWEA
LKYLGSLVQYWGLELKKSAISLLDTIAIAVAEGTDRIIEVIQRICRAIRN
IPRRIRQGFEAALL
Single optimal natural sequence selected from available acute SGA
sequences:
TABLE-US-00004 >B.acute.Con.1059 (SEQ ID NO: 221)
MRVTEIRKNYLWRWGIMLLGMLMICSAAEQLWVTVYYGVPVWKEATTTLF
CASDAKAYTAEAHNVWATHACVPTDPNPQEVVLENVTENFNMWKNNMVEQ
MHEDIISLWDQSLKPCVKLTPLCVTLNCTDLANNTNLANNTNSSISSWEK
MEKGEIKNCSFNITTVIKDKIQKNYALFNRLDIVPIDDDDTNVTNNASYR
LISCNTSVITQACPKISFEPIPIHYCAPAGFAILKCNDKKFNGTGPCTNV
STVQCTHGIKPVVSTQLLLNGSLAEEEVVIRSENFTDNVKTIIVQLNESV
IINCTRPNNNTRKSITFGPGRAFYTTGDIIGDIRKAYCNISSTQWNNTLR
QIARRLREQFKDKTIVFNSSSGGDPEIVMHSFNCGGEFFYCNTTQLFNST
WNGNDTGEFNNTGKNITYITLPCRIKQIINMWQEVGKAMYAPPIAGQIRC
SSNITGILLTRDGGNSSEDKEIFRPEGGNMRDNWRSELYKYKVVKIEPLG
VAPTKAKRRVVQREKRAVGIGAVFLGFLGAAGSTMGAASMTLTVQARLLL
SGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGI
WGCSGKLICTTAVPWNASWSNRSLDNIWNNMTWMEWDREINNYTNLIYNL
IEESQNQQEKNEQELLELDKWASLWNWFDITKWLWYIKIFIMIVGGLVGL
RIVFVILSIVNRVRQGYSPLSFQTHLPTPRGLDRHEGTEEEGGERDRDRS
GRLVDGFLTLIWIDLRSLCLFSYHRLRDLLLIVTRIVELLGRRGWEILKY
WWNLLQYWSQELKNSAVSLLNATAIAVAEGTDRIIEIVQRIFRAILHIPT RIRQGLERALL
3 optimal natural selected from available acute samples, SGA
sequences:
TABLE-US-00005 >B.acute.Con.1059 (SEQ ID NO: 221)
MRVTEIRKNYLWRWGIMLLGMLMICSAAEQLWVTVYYGVPVWKEATTTLF
CASDAKAYTAEAHNVWATHACVPTDPNPQEVVLENVTENFNMWKNNMVEQ
MHEDIISLWDQSLKPCVKLTPLCVTLNCTDLANNTNLANNTNSSISSWEK
MEKGEIKNCSFNITTVIKDKIQKNYALFNRLDIVPIDDDDTNVTNNASYR
LISCNTSVITQACPKISFEPIPIHYCAPAGFAILKCNDKKFNGTGPCTNV
STVQCTHGIKPVVSTQLLLNGSLAEEEVVIRSENFTDNVKTIIVQLNESV
IINCTRPNNNTRKSITFGPGRAFYTTGDIIGDIRKAYCNISSTQWNNTLR
QIARRLREQFKDKTIVFNSSSGGDPEIVMHSFNCGGEFFYCNTTQLFNST
WNGNDTGEFNNTGKNITYITLPCRIKQIINMWQEVGKAMYAPPIAGQIRC
SSNITGILLTRDGGNSSEDKEIFRPEGGNMRDNWRSELYKYKVVKIEPLG
VAPTKAKRRVVQREKRAVGIGAVFLGFLGAAGSTMGAASMTLTVQARLLL
SGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGI
WGCSGKLICTTAVPWNASWSNRSLDNIWNNMTWMEWDREINNYTNLIYNL
IEESQNQQEKNEQELLELDKWASLWNWFDITKWLWYIKIFIMIVGGLVGL
RIVFVILSIVNRVRQGYSPLSFQTHLPTPRGLDRHEGTEEEGGERDRDRS
GRLVDGFLTLIWIDLRSLCLFSYHRLRDLLLIVTRIVELLGRRGWEILKY
WWNLLQYWSQELKNSAVSLLNATAIAVAEGTDRIIEIVQRIFRAILHIPT RIRQGLERALL
>C.acute.Con.0393 (SEQ ID NO: 222)
MRVRGILRNYQQWWIWGILGFWMLMICSVGGNLWVTVYYGVPVWREAKTT
LFCASDAKAYEREVHNVWATHACVPTDPNPQELFLENVTENFNMWKNDMV
DQMHEDIISLWDQSLKPCVKLTPLCVTLNCSNANITRNSTDGNTTRNSTA
TPSDTINGEIKNCSFNITTELKDKKKKEYALFYRLDIVPLNEENSNFNEY
RLINCNTSAVTQACPKVSFDPIPIHYCAPAGYAILKCNNKTFNGTGPCNN
VSTVQCTHGIKPVVSTQLLLNGSLAEEEIIIRSENLTNNAKTIIVHLKEP
VEIVCTRPNNNTRKSMRIGPGQTFYATDIIGDIRQASCNIDEKTWNNTLN
KVGEKLQEHFPNKTLNFAPSSGGDLEITTHSFNCRGEFFYCNTSKLFYKT
EFNSTTNSTITLQCRIKQIINMWQGVGRAMYAPPIEGNITCKSNITGLLL
TRDGGTNDSMTETFRPGGGDMRDNWRSELYKYKVVEIKPLGVAPTEAKRR
VVEREKRALTLGALFLGFLGTAGSTMGAASITLTVQARQLLSGIVQQQSN
LLKAIEAQQHLLQLTVWGIKQLQTRVLAIERYLQDQQLLGLWGCSGKLIC
TTAVPWNSSWSNKSQGEIWGNMTWMQWDREISNYTNTIYRLLEDSQIQQE
KNEKDLLALDSWKNLWSWFSITNWLWYIKIFIMIVGGLIGLRIIFAVLSI
VNRVRQGYSPLPFQTLIPNPRGPDRLGRIEEEGGEQDRDRSIRLVNGFLA
IAWDDLRSLCLFSYHRLRDFILIAARAAELLGRSSLRGLQRGWEALKYLG
SLVQYWGLELKKSAISLLDTVAITVAEGTDRIIEVVQRICRAICNIPRRI RQGFEAALQ
Coverage Comparison of the Four Vaccine Antigens.
[0098] Mosaics and naturals are optimized for the first red bar on
the left for each vaccine (the total). The "total" represents all
sequences, database+ CHAVI. The "B" is the subset that are B clade,
"C" the subset that are C clade, and "N" the remaining M group
sequences that are not B or C (all other clades and recombinants).
As B is most common, the single best natural is of course a B, and
B thus has the best coverage for Nat.1. Con S, as expected,
provides much more even coverage for all clades, and provides
better coverage for all the groups except B clade. (Note: in a Con
S Macaque study, the natural B was not selected to be optimal, and
Con S had better coverage even within B clade than the B vaccine
strain that had been used; this was reflected in the number of
detected responses to heterogeneous B's. A difference here is that
the natural B was selected to be the natural B clade sequence from
acute infection that provides optimal coverage). Nat.3 gives good
broad coverage, Mos.3 better. (See FIG. 11.)
[0099] The mosaics will minimize rare 9-mers, but in Env they cannot
be excluded entirely, or it would not be possible to span certain
highly variable regions to make intact proteins. For all other HIV
proteins tested, it was possible to exclude 9-mers that were found
3 times or fewer. Still, the 3 best natural Envs contain more than
twice the
number of rare 9-mer variants relative to the 3 Env mosaics.
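A minimal sketch of how rare 9-mers in a cocktail might be tallied is given here for illustration. It assumes ungapped sequences and uses the threshold of 3 occurrences mentioned above; the function name rare_kmers and its parameters are hypothetical and are not taken from the design program.

```python
from collections import Counter

def rare_kmers(vaccine_seqs, database_seqs, k=9, max_count=3):
    """Return the set of vaccine k-mers seen max_count times or fewer in
    the database. k-mers absent from the database count as 0 and are
    therefore also reported (these would be 'unnatural' k-mers)."""
    db_counts = Counter(
        s[i:i + k] for s in database_seqs for i in range(len(s) - k + 1)
    )
    vaccine = {
        s[i:i + k] for s in vaccine_seqs for i in range(len(s) - k + 1)
    }
    return {km for km in vaccine if db_counts[km] <= max_count}
```

Comparing the size of the returned set for the 3 natural Envs and for the 3 mosaics would give the kind of ratio quoted in the paragraph above.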
[0100] FIG. 12 includes additional summaries of coverage; ConS
gp160 contains quite a few conserved 9-mers that are missed in
gp140DCFI, as one would expect. ConS provides slightly less
coverage than a single mosaic, but it is already known that ConS
works very well in macaques so serves as a good positive control.
1, 2, and 3 mosaics give increasingly better coverage, and Nat.3 is
not as good as Mos.3.
[0101] FIG. 13 is alignment dependent, and based on the database
alignment (the two plots above this are alignment independent).
Each position represents the 9-mer it initiates as one moves across
the protein. The upper bound (black dashed line) is the sum of the
frequencies of the three most common 9-mers starting from each
position; it represents the maximal limit that could be achieved
for coverage with 3 proteins, and this is not quite achievable in
practice because there can be conflicts in a given position for
overlapping 9-mers, although the 3 mosaic combination very nearly
achieves it. The reason the "total 9-mers" shown in grey varies is
because of insertions and deletions in the alignment.
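The per-position upper bound can be sketched as follows. This is an illustration under stated assumptions, not the plotting code behind FIG. 13: the alignment is a list of equal-length strings with "-" as the gap character, the 9-mer starting at a column is taken to be the next 9 non-gap residues, and the names kmer_at and upper_bound are hypothetical.

```python
from collections import Counter

def kmer_at(aligned, col, k=9, gap="-"):
    """k-mer starting at alignment column col, skipping gaps.
    Returns None if the column is a gap or fewer than k residues remain."""
    if aligned[col] == gap:
        return None
    residues = [c for c in aligned[col:] if c != gap][:k]
    return "".join(residues) if len(residues) == k else None

def upper_bound(alignment, n_antigens=3, k=9):
    """For each column, return (total k-mers starting there, summed
    frequency of the n_antigens most common k-mers)."""
    bounds = []
    for col in range(len(alignment[0])):
        found = (kmer_at(s, col, k) for s in alignment)
        counts = Counter(km for km in found if km is not None)
        total = sum(counts.values())
        top = sum(n for _, n in counts.most_common(n_antigens))
        bounds.append((total, top / total if total else 0.0))
    return bounds
```

The first element of each pair varies from column to column because sequences with a gap at that column contribute no 9-mer, which is the effect of insertions and deletions noted above.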
[0102] Only the Mos.3 vaccine cocktail is shown in FIG. 13.
However, all four vaccines, re-sorted by coverage, are shown in FIG.
14, where those positions that start the 9-mers that are best
covered by the vaccine are moved to the left. The exact match line
is left in all four plots for a reference point. Not only does
Mos.3 (red) approach the maximum, but the orange and yellow
near-matches that have potential for cross-reactivity are also
improved in this vaccine cocktail as compared to the others.
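Exact matches and near-matches can be tallied together, as in the following hedged sketch. It assumes ungapped sequences, defines a near-match as a database 9-mer that differs from its closest vaccine 9-mer at one or two positions, and pools counts over the whole database; match_profile and min_mismatches are illustrative names only.

```python
def min_mismatches(kmer, vaccine_kmers):
    """Smallest number of mismatched positions between kmer and any
    vaccine k-mer of the same length."""
    return min(sum(a != b for a, b in zip(kmer, v)) for v in vaccine_kmers)

def match_profile(vaccine_seqs, database_seqs, k=9, max_off=2):
    """Return [exact, off-by-1, ..., off-by-max_off] fractions of
    database k-mers, pooled over all database sequences."""
    vaccine_kmers = {
        s[i:i + k] for s in vaccine_seqs for i in range(len(s) - k + 1)
    }
    tally = [0] * (max_off + 1)
    total = 0
    cache = {}  # distance per distinct database k-mer
    for s in database_seqs:
        for i in range(len(s) - k + 1):
            km = s[i:i + k]
            total += 1
            if not vaccine_kmers:
                continue
            if km not in cache:
                cache[km] = min_mismatches(km, vaccine_kmers)
            if cache[km] <= max_off:
                tally[cache[km]] += 1
    return [t / total for t in tally] if total else [0.0] * (max_off + 1)
```

The brute-force distance search is adequate for a handful of antigens; a larger candidate pool would need an indexed lookup.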
[0103] The plots shown in FIG. 15 map every amino acid in every
sequence in the full database alignment. A row of pixels is a
sequence, a column is an alignment position. White patches are
insertions to maintain the alignment. All 9-mers that encompass an
amino acid are considered. If every 9-mer that spans the amino acid
has a perfect match in the vaccine cocktail, the pixel is yellow,
so yellow is good. If one is off, light orange, two off, darker
orange . . . through no spanning 9-mer matches represented by
black. Note: lots of yellow for 3 mosaics, relative to the other
vaccines. There is a big patch of the most yellow for the B clade
in Nat.1 as the single best natural is a B clade. Note, all those
dark bits: in these regions the sequences in the database are
different than any 9-mer in the vaccine, so cross-reactivity would
be severely limited.
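The per-residue scoring behind such a plot can be sketched as follows, assuming an ungapped database sequence and exact matching only. For each amino acid the sketch reports how many of the 9-mers spanning it lack a perfect match in the cocktail, together with how many 9-mers span it. The function name pixel_scores is hypothetical and the mapping from counts to colors is left out.

```python
def pixel_scores(seq, vaccine_seqs, k=9):
    """For each residue of seq, return (missed, spanning): the number of
    k-mers spanning that residue with no exact match in the vaccine, and
    the number of k-mers spanning it. (0, n) corresponds to 'yellow';
    (n, n) corresponds to 'black'."""
    vaccine_kmers = {
        s[i:i + k] for s in vaccine_seqs for i in range(len(s) - k + 1)
    }
    n_windows = len(seq) - k + 1
    hit = [seq[i:i + k] in vaccine_kmers for i in range(max(n_windows, 0))]
    scores = []
    for pos in range(len(seq)):
        first = max(0, pos - k + 1)       # first window that spans pos
        last = min(pos, n_windows - 1)    # last window that spans pos
        spanning = range(first, last + 1)
        missed = sum(not hit[i] for i in spanning)
        scores.append((missed, len(spanning)))
    return scores
```

Residues near the ends of a sequence are spanned by fewer than 9 windows, which is why the spanning count is returned alongside the miss count.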
Optimization Using 9-Mers.
[0104] 9-mers were selected because that is the most common size of
an optimal CD8+ T cell epitope. Such epitopes range from 8 to 12
amino acids, and optimal CD4+ T cell epitopes can be even larger or
smaller. As it turns out, coverage of 9-mers is best when optimized
for 9-mer coverage, but optimization on a different size yields very
little decrease in coverage for 9-mers. The same goes for all
lengths, 8-12, the
peak coverage is for the size selected but the coverage is
excellent for other lengths, as the solutions are related. 9-versus
12-mers are shown in FIG. 16, 12 being the most extreme value one
might reasonably consider. The coverage is nearly identical for
9-mers optimized for 9 or 12, or for 12-mers optimized for 9 or 12;
it is 1-2% higher for the length selected for optimization.
Naturally, 12-mers have fewer identities than 9-mers in general,
because they are longer so it is harder to find a perfect match. A
more comprehensive study was made of this for HIV proteins showing
that the loss was consistently larger for 12-mers when optimized on
9 rather than vice versa, and that, in other proteins, this
difference could be up to 4-5%. Thus, for Env the selection of
9-mers is less of a problem. Given all of the above, 9-mers were
selected since this is the most common optimal CTL epitope length,
and since optimal coverage of 9-mers provides near-optimal
coverage of other lengths.
Options for the 3 Best Natural Strains: Acute Transmission Cases,
SGA Sequences.
[0105] Use of all database sequences as a source for natural
strains for vaccine cocktails was first explored, and then a
comparison was made of that with selecting from a restricted group
of just acute SGA sequences, essentially transmitted viruses.
Essentially comparable coverage of the full database could be
achieved by restricting to acute infection sequences. As these have
other obvious advantages, they will be used for the natural
sequences.
[0106] First, the exploration of coverage using the full database
as a source for a natural cocktail. As noted above, the current M
group Env one-seq-per-person data set is dominated by B clade
infections, closely followed by C clade. Thus, the single best
optimal natural selected by the vaccine design program to cover
9-mers in the (database+CHAVI) data set is a B. If one picks from
among any sequence in the database, YU-2 comes up as the best
single sequence. To get better representation of other clades, the
best B was fixed, and then the next best sequence was added to
complement YU-2, which is (logically) a C clade sequence, DU467.
Those two were then fixed, and the third complement of the antigen
was selected. (If the first two are not fixed, and the program is
allowed to choose the third, it logically finds a B/C recombinant;
it has to be forced to select an A. It is believed that forcing the
ABC set would improve global coverage, and partly counteract the B
& C clade sampling bias among sequences.)
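The stepwise selection described in this paragraph (fix the best single sequence, add the best complement, then force the third pick to come from a restricted set) can be sketched as a greedy procedure. This is an assumption-laden illustration and not the vaccine design program referred to above: it uses ungapped sequences and pooled exact 9-mer counts, and greedy_cocktail and its picks argument are hypothetical.

```python
from collections import Counter

def kmer_set(seq, k):
    """Set of distinct k-mers in an ungapped sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def greedy_cocktail(candidates, database_seqs, picks, k=9):
    """candidates: dict of name -> sequence.
    picks: one entry per antigen to choose; None allows any candidate,
    a set of names restricts that step (e.g. to A clade sequences).
    Each step adds the candidate covering the most database k-mers not
    already covered. Ties go to the first candidate in dict order."""
    db_counts = Counter(
        s[i:i + k] for s in database_seqs for i in range(len(s) - k + 1)
    )
    chosen, covered = [], set()
    for allowed in picks:
        pool = [
            name for name in candidates
            if name not in chosen and (allowed is None or name in allowed)
        ]
        if not pool:
            raise ValueError("no candidate satisfies this pick")
        best = max(
            pool,
            key=lambda name: sum(
                db_counts[km] for km in kmer_set(candidates[name], k) - covered
            ),
        )
        chosen.append(best)
        covered |= kmer_set(candidates[best], k)
    return chosen
```

With picks=[None, None, a_clade_names], the first two choices follow the sampling bias of the database and the third is forced into the A clade set, mirroring the B, then C, then A order described above.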
[0107] The optimal naturals from the database tend to harken back
to older sequences; this is not surprising, as the older sequences
tend to be more central in phylogenetic trees, and thus more
similar to other circulating strains. For this study, however, it is
preferred to use more contemporary Envelope proteins sampled during
acute infection and sequenced using SGA, as these sequences
accurately reflect the transmitted virus. Given that constraint, it
is still desired to optimize for 9-mer coverage, so that the
cocktail of natural sequences is given the best chance for success
in the comparison with mosaics. It turns out when this was done
there was an extremely minor loss of coverage when comparing the
trivalent cocktail selected from among acute SGA sequences to the
trivalent antigen selected from the entire database (in both cases
optimizing for coverage of the full database). Thus, by restricting
the antigen cocktails to transmitted virus, coverage is not
compromised. This alternative has several advantages. Most
importantly, it enables a determination of the cross-reactive
potential of antibodies generated from acute infection viruses used
for the natural cocktail relative to consensus or mosaics as a
secondary endpoint of interest, without compromising the primary
endpoint focusing on a comparison of T-cell response breadth of
coverage. A large set of B (113) and C (40) clade acute samples
sequenced from the CHAVI study is available, giving a large dataset
from which to select an optimum combination. For the selection of
the complementary sequence from the A clade, to complete the B and
C in the trivalent vaccine, several acute sequences were
available.
[0108] Analysis of gp160 was undertaken that included the 8 subtype
A gp160s, and also a subregion analysis was done with all 15 in
V1-V4, to get an indication of whether or not more sequencing was
required. Fortunately, one of the available full length sequences
made an excellent complement to the B and C acutes, essentially as
good as any of the others. This comparison indicated there was no
particular need to do more sequencing at this time. It is believed
that this is appropriate, even with such a limited A baseline to
select from, because the A sequence only needs to complement the
choice of B and C clade strains, and many Bs and Cs were available
from which to choose. Two of the patients from which the Nat.3
cocktail is derived are described below. Nat.1 is just the first one.
B Patient 1059
Patient sex=M
RiskFactor=PPD
[0109] Sample country=USA
Sample city=Long Beach, Calif.
Patient cohort=CA-UCSF
Patient health status=Acute
Viral Load=2,800,000
[0110] Infection country=USA
Sample date=Mar. 26, 1998
C Patient 0393
Fiebig Stage=4
[0111] Infection country=Malawi
Sample date=17 Jul. 2003
Viral Load=12,048,485
[0112] Patient sex=F
CD4count=618 (measured 13 days after sequenced sample)
Patient age=23
STD=GUD,PID
[0113] FIGS. 17 and 18 illustrate the minimal loss of coverage in
selecting from acute SGA sequences, and a highlighter plot of each
of the 3 patients' env sequences, that shows that the consensus of
each patient is equivalent to the most common strains, and thus an
excellent estimate of the actual transmitted virus.
Why M Group and not Clade Specific Coverage?
[0114] It is believed that it is important to strive for a global
HIV vaccine, if at all possible, with exploratory methods such as
these since many nations have multiclade epidemics, and people
travel. While intra-clade coverage can definitely be gained by a
within-clade optimized vaccine, the result of such a strategy would
be dramatic loss of inter-clade coverage. The hope is that a
multivalent mosaic could provide enough breadth to counter viruses
of virtually any clade or recombinant. The compromise and benefit
in terms of coverage for Env M group versus subtype-specific design
is shown in FIG. 19.
Why Env?
[0115] This proof of concept study is well positioned to see
differences in breadth of responses using Env as the test antigen.
This is partly because of the theoretical considerations described
herein (Env has twice as many conserved 9-mers in the mosaics relative
to the best natural strain, and only half as many rare variants)
and partly because of the prior animal studies. Env studies with a
consensus versus natural in macaques showed a highly significant
increase in breadth of responses: 3-4 fold more epitopes per Env
protein were recognized (Santra et al, in press, PNAS). Env mosaics
have shown an even more profound advantage in a mouse study (up to
10-fold over comparable numbers of natural antigens, manuscript in
preparation in collaboration with the VRC). Based on this prior
work, it makes sense to start with a small human trial testing the
breadth of responses to Env. Ultimately, the hope is to apply the
proof of concept gained with Env to a more conserved protein such as
Gag, where it may be possible to confer the broadest protection. Gag
gives outstanding coverage of the full M group. Tests of Gag and
Nef are ongoing in macaque, using a 4 mosaic vaccine cocktail
approach (see Example 3). A coverage comparison of macaque 4 mosaic
Gag vaccine and proposed human Env 3 mosaic vaccine against the
current database is in FIG. 20. There is more theoretical potential
for cross-reactivity with the Gag vaccine, but more progress has
been made with Env in the animal models to date, so Env has the
best foundation to justify moving forward. The three mosaic Env
sequences described above and the sequences used in Example 3 are
shown in FIG. 21.
DNA
[0116] The DNAs to be used will be in the form of the full gp160
Env. The gp160 would be in the PCMVR plasmid (Gary Nabel) and will
be the identical plasmid used in all VRC DNA immunization trials.
Dose is anticipated to be 4 mg. The following DNA constructs will
be used:
[0117] DNA optimal Wildtype Env transmitted/founder env (WT Env)
[0118] DNA group M consensus Env (ConS Env)
[0119] DNA Trivalent optimal wildtype transmitted/founder Env (WT
Tri Env)
[0120] DNA Trivalent Mosaic Env
NYVAC
[0121] NYVAC (vP866) is a recombinant poxvirus vector which has an
18 gene deletion versus wild-type virus. The NYVAC vector will be
licensed from Sanofi-Pasteur and manufactured by a third party
contractor and will be propagated on a CEF cell substrate. The Env
construct expressed in NYVAC will be gp140C (entire Env with
transmembrane and cytoplasmic domain deleted and gp41/gp120
cleavage site mutated) or will be a full gp160. The choice of
construct design will depend on the ability to make the NYVAC with
gp160 forms vs gp140. The dose of NYVAC is anticipated to be
approximately 1×10^7 TCID50. The following NYVAC constructs will be
used:
[0122] NYVAC WT Env
[0123] NYVAC ConS Env
[0124] NYVAC Trivalent Native Env
[0125] NYVAC Trivalent Mosaic Env
[0126] Vaccinations will be given by intramuscular injection.
TABLE-US-00006 TABLE Protocol Schema
Injection schedule in weeks (dose given at each week)
Group | Number | Week 0 | Week 4 | Week 20 | Week 24 |
1 | 20 | DNA WT Env | DNA WT Env | NYVAC WT EnvA | NYVAC WT EnvA |
 | 4 | Placebo | Placebo | Placebo | Placebo |
2 | 20 | DNA ConS Env | DNA ConS Env | NYVAC ConS | NYVAC ConS |
 | 4 | Placebo | Placebo | Placebo | Placebo |
3 | 20 | DNA Trivalent Native Env | DNA Trivalent Native Env | NYVAC Trivalent Native Env | NYVAC Trivalent Native Env |
 | 4 | Placebo | Placebo | Placebo | Placebo |
4 | 20 | DNA Trivalent Mosaic Env | DNA Trivalent Mosaic Env | NYVAC Trivalent Mosaic Env | NYVAC Trivalent Mosaic Env |
 | 4 | Placebo | Placebo | Placebo | Placebo |
Total | 96 (80/16) |
SEQUENCE LISTING
The patent application contains a lengthy "Sequence Listing"
section. A copy of the
"Sequence Listing" is available in electronic form from the USPTO
web site
(http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US20180185471A1).
An electronic copy of the "Sequence Listing" will also be available
from the USPTO upon request and payment of the fee set forth in 37
CFR 1.19(b)(3).
* * * * *
References