U.S. patent application number 15/902263 was filed with the patent office on 2018-02-22 and published on 2018-07-19 for methods and compositions for RNA-guided treatment of HIV infection.
This patent application is currently assigned to Temple University of the Commonwealth System of Higher Education. The applicant listed for this patent is Temple University of the Commonwealth System of Higher Education. Invention is credited to Wenhui HU, Kamel KHALILI.
Application Number | 20180200343 15/902263 |
Family ID | 52587370 |
Filed Date | 2018-02-22 |
United States Patent Application | 20180200343 |
Kind Code | A1 |
KHALILI; Kamel; et al. | July 19, 2018 |
METHODS AND COMPOSITIONS FOR RNA-GUIDED TREATMENT OF HIV
INFECTION
Abstract
A method of treating a subject having or at risk for having a
virus infection, by administering a therapeutically effective
amount of a composition comprising a vector encoding a
CRISPR-associated endonuclease and at least two guide RNAs that are
complementary to two target sequences spanning from the 5'- to
3'-LTRs of the sequence in the virus, and completely excising a
fragment of greater than 9000-bp of integrated proviral DNA that
spanned from its 5'- to 3'-LTRs. A method of treating a subject
having or at risk for having a genetic caused disease, by
administering a therapeutically effective amount of a composition
comprising a vector encoding a CRISPR-associated endonuclease and
at least two guide RNAs that are complementary to two target
sequences spanning from the sequence of the subject's DNA greater
than 9000-bp that is chromosomally integrated and causes the
genetic caused disease, and excising the chromosomally integrated
sequence.
Inventors: | KHALILI; Kamel (Bala Cynwyd, PA); HU; Wenhui (Cherry Hill, NJ) |
Applicant: |
Name | City | State | Country | Type |
Temple University of the Commonwealth System of Higher Education | Philadelphia | PA | US | |
Assignee: | Temple University of the Commonwealth System of Higher Education, Philadelphia, PA |
Family ID: | 52587370 |
Appl. No.: | 15/902263 |
Filed: | February 22, 2018 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
15148261 | May 6, 2016 | 9981020 |
15902263 | | |
14838057 | Dec 11, 2015 | 9925248 |
15148261 | | |
PCT/US14/53441 | Aug 29, 2014 | |
14838057 | | |
61871626 | Aug 29, 2013 | |
62018441 | Jun 27, 2014 | |
62026103 | Jul 18, 2014 | |
Current U.S. Class: | 1/1 |
Current CPC Class: |
A61K 9/0034 20130101;
A61K 38/465 20130101; A61P 31/00 20180101; C12N 9/22 20130101; A61P
31/12 20180101; C12N 2740/16063 20130101; C12Y 301/21 20130101;
A61P 31/18 20180101; A61K 48/005 20130101; A61K 35/12 20130101;
C12N 7/00 20130101; A61K 45/06 20130101; C12N 2320/30 20130101;
C12N 2310/20 20170501; C12N 15/111 20130101; A61K 48/00
20130101 |
International Class: |
A61K 38/46 20060101
A61K038/46; C12N 7/00 20060101 C12N007/00; A61K 9/00 20060101
A61K009/00; A61K 35/12 20150101 A61K035/12; A61K 45/06 20060101
A61K045/06; C12N 15/11 20060101 C12N015/11; C12N 9/22 20060101
C12N009/22; A61K 48/00 20060101 A61K048/00 |
Government Interests
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
[0001] This invention was made with US government support under
grant numbers R01MH093271, R01NS087971, and P30MH092177 awarded by
the National Institutes of Health. The US government may have
certain rights in the invention.
Claims
1. A method of treating a subject having or at risk for having an
HIV-1 virus infection, including the steps of: administering to the
subject a therapeutically effective amount of a composition
comprising a Clustered Regularly Interspaced Short Palindromic
Repeat (CRISPR)-associated endonuclease, and two or more different
multiplex guide RNAs (gRNAs), wherein each of the at least two
gRNAs is complementary to a different target nucleic acid sequence
in a long terminal repeat (LTR) of proviral DNA of the virus that
is unique from the genome of the host cell; cleaving a double
strand of the proviral DNA at a first target protospacer sequence
with the CRISPR-associated endonuclease; cleaving a double strand
of the proviral DNA at a second target protospacer sequence with
the CRISPR-associated endonuclease; completely excising a fragment
of greater than 9000-bp of integrated HIV-1 proviral DNA that
spanned from its 5'- to 3'-LTRs; and eradicating the HIV-1 proviral
DNA from the host cell.
2. The method of claim 1, wherein said administering step further
includes the steps of: exposing a host cell to a composition
including an isolated nucleic acid encoding the CRISPR-associated
endonuclease; an isolated nucleic acid sequence encoding a first
gRNA having a first spacer sequence that is complementary to a
first target protospacer sequence in a proviral DNA; and an
isolated nucleic acid encoding a second gRNA having a second spacer
sequence that is complementary to a second target protospacer
sequence in the proviral DNA; expressing in the host cell the
CRISPR-associated endonuclease, the first gRNA, and the second
gRNA; assembling, in the host cell, a first gene editing complex
including the CRISPR-associated endonuclease and the first gRNA;
and a second gene editing complex including the CRISPR-associated
endonuclease and the second gRNA; directing the first gene editing
complex to the first target protospacer sequence by complementary
base pairing between the first spacer sequence and the first target
protospacer sequence; and directing the second gene editing complex
to the second target protospacer sequence by complementary base
pairing between the second spacer sequence and the second target
protospacer sequence.
3. The method of claim 2, wherein at least one of the first target
protospacer sequence and the second target protospacer sequence is
situated within the U3 region of the LTR.
4. The method of claim 3, wherein the first spacer sequence and the
second spacer sequence each include a sequence complementary to a
target protospacer sequence selected from the group consisting of
SEQ ID NO: 96, SEQ ID NO: 121, SEQ ID NO: 87, and SEQ ID NO:
110.
5. The method of claim 3, wherein the first spacer sequence and the
second spacer sequence include, respectively, a sequence
complementary to the target protospacer sequences SEQ ID NO: 96 and
SEQ ID NO: 121.
6. The method of claim 3, wherein the first spacer sequence and the
second spacer sequence each include, respectively, a sequence
complementary to the target protospacer sequences SEQ ID NO: 87 and
SEQ ID NO: 110.
7. The method of claim 1, wherein the CRISPR-associated
endonuclease is Cas9 or a human-optimized Cas9.
8. The method of claim 1, wherein the composition is encoded in a
vector selected from the group consisting of a plasmid vector, a
lentiviral vector, an adenoviral vector, and an adeno-associated
virus vector.
9. The method of claim 1, wherein at least one of the gRNAs
comprises a CRISPR RNA (crRNA) and a trans-activated small RNA
(tracrRNA), which are expressed as separate nucleic acids.
10. The method of claim 1, wherein at least one of the gRNAs is
engineered as an artificial fusion small guide RNA (sgRNA)
comprised of a crRNA and a tracrRNA.
11. The method of claim 2, wherein said step of expressing in the
host cell the CRISPR-associated endonuclease, the first gRNA, and
the second gRNA, is further defined as stably expressing in the
host cell the CRISPR-associated endonuclease, the first gRNA, and
the second gRNA, and the method additionally includes the step of
immunizing the host cell against new retroviral infection.
12. The method of claim 2, wherein the host cell is chosen from the
group consisting of a CD4+ T cell, a macrophage, a monocyte, a gut
associated lymphoid cell, a microglial cell, and an astrocyte.
13. A method of treating a subject having or at risk for having a
genetic caused disease, including the steps of: administering to
the subject a therapeutically effective amount of a composition
comprising a Clustered Regularly Interspaced Short Palindromic
Repeat (CRISPR)-associated endonuclease, and two or more different
multiplex guide RNAs (gRNAs), wherein each of the at least two
gRNAs is complementary to a different target nucleic acid sequence
in a long terminal repeat (LTR) of the proviral DNA that is unique
from the genome of the host cell, and wherein the gRNAs are
complementary to two target sequences spanning from the sequence of
the subject's DNA greater than 9000-bp that is chromosomally
integrated and causes the genetic caused disease; cleaving a double
strand of the DNA at a first target protospacer sequence with the
CRISPR-associated endonuclease; cleaving a double strand of the DNA
at a second target protospacer sequence with the CRISPR-associated
endonuclease; excising the entire chromosomally integrated
sequence; and eradicating the chromosomally integrated sequence
from the host cell.
14. The method of claim 13, wherein said administering step further
includes the steps of: exposing a host cell to a composition
including an isolated nucleic acid encoding the CRISPR-associated
endonuclease; an isolated nucleic acid sequence encoding a first
gRNA having a first spacer sequence that is complementary to a
first target protospacer sequence in the DNA; and an isolated
nucleic acid encoding a second gRNA having a second spacer sequence
that is complementary to a second target protospacer sequence in
the DNA; expressing in the host cell the CRISPR-associated
endonuclease, the first gRNA, and the second gRNA; assembling, in
the host cell, a first gene editing complex including the
CRISPR-associated endonuclease and the first gRNA; and a second
gene editing complex including the CRISPR-associated endonuclease
and the second gRNA; directing the first gene editing complex to
the first target protospacer sequence by complementary base pairing
between the first spacer sequence and the first target protospacer
sequence; and directing the second gene editing complex to the
second target protospacer sequence by complementary base pairing
between the second spacer sequence and the second target
protospacer sequence.
15. The method of claim 13, wherein at least one of the first
target protospacer sequence and the second target protospacer
sequence is situated within the U3 region of the LTR.
16. The method of claim 13, wherein the CRISPR-associated
endonuclease is Cas9 or a human-optimized Cas9.
17. The method of claim 13, wherein the composition is encoded in a
vector selected from the group consisting of a plasmid vector, a
lentiviral vector, an adenoviral vector, and an adeno-associated
virus vector.
18. The method of claim 13, wherein at least one of the gRNAs
comprises a CRISPR RNA (crRNA) and a trans-activated small RNA
(tracrRNA), which are expressed as separate nucleic acids.
19. The method of claim 13, wherein at least one of the gRNAs is
engineered as an artificial fusion small guide RNA (sgRNA)
comprised of a crRNA and a tracrRNA.
20. The method of claim 13, wherein the host cell is chosen from
the group consisting of a CD4+ T cell, a macrophage, a monocyte, a
gut associated lymphoid cell, a microglial cell, and an
astrocyte.
21. The method of claim 13, wherein said method is performed
prenatally.
Description
BACKGROUND OF THE INVENTION
1. Technical Field
[0002] The present invention relates to compositions that
specifically cleave target sequences in retroviruses, for example
human immunodeficiency virus (HIV). Such compositions, which can
include nucleic acids encoding a Clustered Regularly Interspaced
Short Palindromic Repeat (CRISPR) associated endonuclease and a
guide RNA sequence complementary to a target sequence in a human
immunodeficiency virus, can be administered to a subject having or
at risk for contracting an HIV infection.
2. Background Art
[0003] More than three decades after the discovery of HIV-1,
AIDS remains a major public health problem affecting more than
35.3 million people worldwide. AIDS remains incurable because of the
permanent integration of HIV-1 into the host genome. Current
therapy (highly active antiretroviral therapy, or HAART) for
controlling HIV-1 infection and impeding AIDS development
profoundly reduces viral replication in cells that support HIV-1
infection and reduces plasma viremia to a minimal level. However,
HAART fails to suppress low-level viral genome expression and
replication in tissues and fails to target the latently infected
cells (for example, resting memory T cells, brain macrophages,
microglia, astrocytes, and gut-associated lymphoid cells) that
serve as a reservoir for HIV-1. Persistent HIV-1 infection is also
linked to co-morbidities including heart and renal diseases,
osteopenia, and neurological disorders. There is a continuing need
for curative therapeutic strategies that target persistent viral
reservoirs.
SUMMARY OF THE INVENTION
[0004] The present invention provides for a method of treating a
subject having or at risk for having an HIV-1 virus infection, by
administering to the subject a therapeutically effective amount of
a composition comprising a Clustered Regularly Interspaced Short
Palindromic Repeat (CRISPR)-associated endonuclease, and two or
more different multiplex guide RNAs (gRNAs), wherein each of the at
least two gRNAs is complementary to a different target nucleic acid
sequence in a long terminal repeat (LTR) of proviral DNA of the
virus that is unique from the genome of the host cell, cleaving a
double strand of the proviral DNA at a first target protospacer
sequence with the CRISPR-associated endonuclease, cleaving a double
strand of the proviral DNA at a second target protospacer sequence
with the CRISPR-associated endonuclease, completely excising a
fragment of greater than 9000-bp of integrated HIV-1 proviral DNA
that spanned from its 5'- to 3'-LTRs, and eradicating the HIV-1
proviral DNA from the host cell.
[0005] The present invention also provides for a method of treating
a subject having or at risk for having a genetic caused disease, by
administering to the subject a therapeutically effective amount of
a composition comprising a Clustered Regularly Interspaced Short
Palindromic Repeat (CRISPR)-associated endonuclease, and two or
more different multiplex guide RNAs (gRNAs), wherein each of the at
least two gRNAs is complementary to a different target nucleic acid
sequence in a long terminal repeat (LTR) of the proviral DNA that
is unique from the genome of the host cell, and wherein the gRNAs
are complementary to two target sequences spanning from the
sequence of the subject's DNA greater than 9000-bp that is
chromosomally integrated and causes the genetic caused disease,
cleaving a double strand of the DNA at a first target protospacer
sequence with the CRISPR-associated endonuclease, cleaving a double
strand of the DNA at a second target protospacer sequence with the
CRISPR-associated endonuclease, excising the entire chromosomally
integrated sequence, and eradicating the chromosomally integrated
sequence from the host cell.
[0006] The details of one or more embodiments of the invention are
set forth in the accompanying drawings and the description below.
Other features, objects, and advantages of the invention will be
apparent from the description and drawings, and from the
claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] FIG. 1A, FIG. 1B, FIG. 1C, FIG. 1D, FIG. 1E, FIG. 1F, FIG.
1G, and FIG. 1H show that Cas9/LTR-gRNA suppresses HIV-1 reporter
virus production in CHME5 microglial cells latently infected with
HIV-1. FIG. 1A shows a representative gating diagram of EGFP flow
cytometry revealing a dramatic reduction in TSA-induced reactivation
of latent pNL4-3-ΔGag-d2EGFP reporter virus by stably expressed
Cas9 plus LTR-A or -B, vs. empty U6-driven gRNA expression vector
(U6-CAG). FIG. 1B shows a SURVEYOR Cel-I nuclease assay of the PCR
product (-453 to +43 within the LTR) from selected LTR-A- or
-B-expressing stable clones, revealing dramatic indel mutation
patterns (arrows). FIG. 1C shows a PCR fragment analysis revealing a
precise deletion of the 190-bp region between the LTR-A and LTR-B
cutting sites (arrowhead and arrow in FIG. 1D), leaving a 306-bp
fragment (arrow in FIG. 1C) validated by TA-cloning and sequencing
results. FIG. 1D discloses SEQ ID NOS 1-3, respectively, in order of
appearance. FIG. 1E is a graph showing that subcloning of LTR-A/B
stable clones reveals complete loss of reporter reactivation as
determined by EGFP flow cytometry. FIG. 1F and FIG. 1G show
elimination of the pNL4-3-ΔGag-d2EGFP proviral genome detected by
standard (FIG. 1F) and real-time (FIG. 1G) PCR amplification of
genomic DNA for EGFP and the HIV-1 Rev response element (RRE);
β-actin is a DNA purification and loading control. FIG. 1H shows PCR
genotyping of LTR-A/B subclones (#8, 13) using primers that amplify a
DNA fragment covering the HIV-1 LTR U3/R/U5 regions (-411 to +129),
revealing indels (a, deletion; c, insertion) and "intact" or combined
LTR (b).
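For illustration only, the fragment sizes quoted for FIG. 1B and FIG. 1C can be checked with a short calculation. The sketch below is not part of the disclosed methods. It assumes that LTR coordinates are numbered relative to the +1 transcriptional start site with no position 0, and the function names are hypothetical.

```python
def amplicon_length(start, end):
    """Length in bp of a PCR product spanning start..end (inclusive).

    Assumption: coordinates are relative to the +1 transcriptional start
    site and there is no position 0 (-1 is directly followed by +1).
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # correct for the missing position 0
    return length


def length_after_deletion(length, deleted):
    """Length in bp of the product remaining after a clean deletion."""
    if deleted > length:
        raise ValueError("cannot delete more than the product length")
    return length - deleted


full = amplicon_length(-453, 43)          # 496 bp
print(full, length_after_deletion(full, 190))  # 496 306
```

Under that numbering assumption, the -453 to +43 product is 496 bp, and removing the 190-bp region between the two cutting sites gives the 306-bp fragment described for FIG. 1C.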
[0008] FIG. 2A, FIG. 2B, and FIG. 2C show that Cas9/LTR-gRNA
efficiently eradicates latent HIV-1 virus from U1 monocytic cells.
FIG. 2A is a diagram showing excision of the entire HIV-1 genome in
chromosome Xp11.4. HIV-1 integration sites were identified using a
Genome-Walker link PCR kit. Left, analysis of PCR amplicon lengths
using a primer pair (P1/P2) targeting chromosome X integration
site-flanking sequence reveals elimination of the entire HIV-1
genome (9709-bp), leaving two fragments (833- and 670-bp). FIG. 2B
shows TA cloning and sequencing of the LTR fragment (833-bp)
showing the host genomic sequence (small letters, 226-bp) and the
partial sequences (634-27=607 bp) of 5'-LTR (underlined using
dashes) and 3'-LTR (first underlined section) with a 27-bp deletion
around the LTR-A targeting site (second underlined section).
Bottom, two indel alleles identified from 15 sequenced clonal
amplicons. The 670-bp fragment consists of a host sequence (226-bp)
and the remaining LTR sequence (634-190=444 bp) after 190-bp
excision by simultaneous cutting at LTR-A and B target sites. The
underlined and highlighted sequences indicate the gRNA LTR-A target
site and PAM. FIG. 2B discloses SEQ ID NOS 4-13, respectively, in
order of appearance. FIG. 2C shows a functional analysis of
LTR-A/B-induced eradication of HIV-1 genome, showing substantial
blockade of TSA/PMA reactivation-induced p24 virion release. U1
cells were transfected with pX260-LTRs-A, -B, or -A/B. After 2-week
puromycin selection, cells were treated with TSA (250 nM)/PMA for 2
days before p24 Gag ELISA was performed.
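The 833-bp and 670-bp figures given for FIG. 2A and FIG. 2B follow from simple bookkeeping on the integrated provirus. The sketch below is one hedged way to arrive at the same numbers, not the procedure used to obtain the data: it models a cut in the 5'-LTR and a cut in the 3'-LTR followed by rejoining. The cut offsets within the LTR (100 and 290) are hypothetical placeholders chosen only so that the two sites lie 190 bp apart; the lengths 634, 9709, 226, 27, and 190 are taken from the text.

```python
LTR_LEN = 634         # LTR length stated for FIG. 2B
PROVIRUS_LEN = 9709   # entire integrated HIV-1 genome, both LTRs included
HOST_FLANK = 226      # host sequence contained in the P1/P2 amplicon


def excise(cut_5p, cut_3p, indel=0):
    """Model cutting the 5'-LTR at offset cut_5p and the 3'-LTR at offset
    cut_3p (offsets measured from the start of each LTR), then rejoining.

    Returns (excised_bp, residual_ltr_bp). excised_bp counts only the span
    between the two cuts; indel (negative for a deletion) is applied to the
    residual sequence at the junction.
    """
    for cut in (cut_5p, cut_3p):
        if not 0 <= cut <= LTR_LEN:
            raise ValueError("cut offset must lie within the LTR")
    cut_3p_abs = PROVIRUS_LEN - LTR_LEN + cut_3p  # position in the provirus
    excised = cut_3p_abs - cut_5p
    residual = PROVIRUS_LEN - excised + indel
    return excised, residual


# Same site cut in both LTRs, with a 27-bp deletion at the junction.
excised_same, residual_same = excise(100, 100, indel=-27)
# Upstream site cut in the 5'-LTR, site 190 bp downstream cut in the 3'-LTR.
excised_diff, residual_diff = excise(100, 290)

print(excised_same, HOST_FLANK + residual_same)  # 9075 833
print(excised_diff, HOST_FLANK + residual_diff)  # 9265 670
```

In this model the residual LTR length does not depend on where the first site sits within the LTR, only on the spacing between the two sites and any junctional indel, and in both cases the excised span exceeds 9000 bp.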
[0009] FIG. 3A, FIG. 3B, FIG. 3C, and FIG. 3D show that stable
expression of Cas9 plus LTR-A/B vaccinates TZM-bl cells against new
HIV-1 virus infection. FIG. 3A shows immunocytochemistry (ICC) and
Western blot (WB) analyses with anti-Flag antibody confirming the
expression of Flag-Cas9 in TZM-bl stable clones selected with
puromycin (2 μg/ml) for 2 weeks. FIG. 3B shows PCR genotyping of
Cas9/LTR-A/B stable clones (c1-c7), revealing a close correlation of
LTR excision with repression of LTR luciferase reporter activation.
Fold changes represent TSA/PMA-induced levels over corresponding
non-induction levels. FIG. 3C shows Cas9/LTR-A/B-expressing cells
(c4) infected with pseudotyped-pNL4-3-Nef-EGFP lentivirus at the
indicated multiplicity of infection (MOI), with infection efficiency
measured by EGFP flow cytometry 2 d post-infection. FIG. 3D shows
phase-contrast/fluorescence micrographs indicating that LTR-A/B
stable cells, but not control (U6-CAG; black) cells, are resistant to
new infection (right panel) by pNL4-3-ΔE-EGFP HIV-1 reporter
virus (gray).
[0010] FIG. 4A, FIG. 4B, FIG. 4C, and FIG. 4D illustrate the
off-target effects of Cas9/LTR-A/B on the human genome. FIG. 4A is
a SURVEYOR assay that shows no indel mutations in
predicted/potential off-target regions in human TZM-bl and U1
cells. LTR-A on-target region (A) was used as a positive control
and empty U6-CAG vector (U6) as a negative control. FIG. 4B shows
whole-genome sequencing of an LTR-A/B stable TZM-bl subclone, showing
the numbers of called indels in the U6-CAG control and LTR-A/B
samples; FIG. 4C shows detailed information on 10 called indels
near gRNA target sites in both samples; and FIG. 4D shows the
distribution of off-target called indels. FIG. 4C discloses SEQ ID
NOS 14-15, respectively, in order of appearance.
[0011] FIG. 5 shows the LTR U3 sequence of the integrated
lentiviral LTR-firefly luciferase reporter identified by TA-cloning
and sequencing of PCR product (-411 to -10) from the genomic DNA of
human TZM-bl cells. The protospacer and PAM (NGG) sequences of 4
gRNAs (LTR-A to D) and the predicted binding sites of indicated
transcription factors are highlighted. The precise cleavage sites
are marked with scissors. +1 indicates the transcriptional start
site. FIG. 5 discloses SEQ ID NO: 16.
[0012] FIG. 6A, FIG. 6B, and FIG. 6C show that LTR-C and LTR-D
remarkably suppress TSA-induced reactivation of latent
pNL4-3-ΔGag-d2EGFP virus in CHME5 microglial cells. FIG. 6A is
a diagram schematically showing the pNL4-3-ΔGag-d2EGFP vector
containing Tat, Rev, Env, Vpu, and Nef with the reporter gene
d2EGFP. FIG. 6B shows a SURVEYOR assay showing indel mutations in
the on-target LTR genome of Cas9/LTR-D but not Cas9/LTR-C
transfected cells. FIG. 6C shows a representative gating diagram of
EGFP flow cytometry showing a dramatic reduction in TSA-induced
reactivation of latent pNL4-3-ΔGag-d2EGFP reporter viruses by
stable expression of Cas9/LTR-C or LTR-D as compared with empty
U6-driven gRNA expression vector (U6-CAG).
[0013] FIG. 7A, FIG. 7B, FIG. 7C, FIG. 7D, FIG. 7E, and FIG. 7F
show that both LTR-C and LTR-D induced indel mutations and
significantly decreased constitutive and TSA/PMA-induced luciferase
activity in TZM-bl cells stably incorporated with HIV-1 LTR-firefly
luciferase reporter gene. FIG. 7A shows a functional luciferase
reporter assay revealing a significant reduction of LTR
reactivation by LTR-C, LTR-D or both. FIG. 7B shows a SURVEYOR
assay showing indel mutation in LTR DNA (-453 to +43) induced by
LTR-C and LTR-D (upper arrow). A combination of LTR-C and LTR-D
generates a 194 bp fragment (lower arrow) resulting from the
deletion of 302 bp region between LTR-C and LTR-D. FIG. 7C and FIG.
7D show Sanger sequencing of 30 clones validating the indel
efficiency at 23% for LTR-C and 13% for LTR-D and example
chromatograms showing insertion/deletion. FIG. 7C discloses SEQ ID
NOS 17-25, respectively, in order of appearance. FIG. 7D discloses
SEQ ID NOS 26-30, respectively, in order of appearance. FIG. 7E
shows PCR-restriction fragment length polymorphism (RFLP) analysis
using BsaJ I, which cuts at 5 sites (96, 102, 372, 386, 482) of the
PCR product covering -453 to +43 of the LTR, showing two major bands
(96 bp and 270 bp) in the U6-CAG control sample, but an additional
372 bp band (upper arrow) after LTR-C-induced indel mutation at the
96/102 sites, a 290 bp band (middle arrow) after LTR-D-induced
mutations at the 372 site, and a 180 bp fragment (lower arrow) after
LTR-C/D-induced excision. FIG. 7F shows chromatograms showing the
deletion of a 302 bp fragment between LTR-C and LTR-D (top) and an
additional 17 bp deletion (bottom). Red arrows indicate the
junction sites. *P<0.05 indicates a significant decrease in
LTR-C or LTR-D-mediated luciferase activation compared to U6-CAG
control. FIG. 7F discloses SEQ ID NOS 31-32, respectively, in order
of appearance.
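The band pattern described for the control lane and the LTR-C lane of FIG. 7E can be reproduced with a minimal in-silico digest. This is an illustrative sketch under stated assumptions, not the analysis used to generate the figure: it treats the five BsaJ I site numbers as cut positions along a 496-bp product (453 + 43 bp), ignores the small length change introduced by an indel itself, and uses a hypothetical function name.

```python
def digest(length, cut_sites):
    """Return fragment lengths (5' to 3') after cutting a linear product
    of the given length at each position in cut_sites."""
    sites = sorted({s for s in cut_sites if 0 < s < length})
    bounds = [0] + sites + [length]
    return [right - left for left, right in zip(bounds, bounds[1:])]


PRODUCT_LEN = 496                      # assumed length of the -453 to +43 product
BSAJ1_SITES = [96, 102, 372, 386, 482]

# Control: every site is cut; the two 96-bp fragments co-migrate.
control = digest(PRODUCT_LEN, BSAJ1_SITES)
# LTR-C indel: the 96 and 102 sites are lost, so the 5' end runs to 372.
ltr_c = digest(PRODUCT_LEN, [s for s in BSAJ1_SITES if s not in (96, 102)])

print(control)  # [96, 6, 270, 14, 96, 14]
print(ltr_c)    # [372, 14, 96, 14]
```

The control digest yields only two fragment sizes large enough to appear as major bands (96 bp and 270 bp), and loss of the 96/102 sites produces the additional 372-bp band, in line with the description of FIG. 7E.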
[0014] FIG. 8A, FIG. 8B, and FIG. 8C illustrate the TA cloning and
Sanger sequencing of PCR products from CHME5 subclones of LTR-A/B
and empty U6-CAG control using primers covering HIV-1 LTR U3/R/U5
regions (-411 to +129). FIG. 8A shows possible combinations of LTR-A
and LTR-B cuts on both 5'- and 3'-LTRs generating potential
fragments a-c as indicated. FIG. 8B shows a BLAST alignment of
fragment a (351 bp) showing a 190 bp deletion between the LTR-A and
LTR-B cut sites. FIG. 8C shows a BLAST alignment of fragment c
(682 bp) showing a 175 bp insertion at the LTR-A cleavage site and a
27 bp deletion at the LTR-B cleavage site. FIG. 8C discloses SEQ ID
NOS 33-34,
respectively, in order of appearance.
[0015] FIG. 9A, FIG. 9B, FIG. 9C, and FIG. 9D demonstrate that
Cas9/LTR-gRNA efficiently eradicates latent HIV-1 virus from U1
monocytic cells. FIG. 9A shows Sanger sequencing of a 1.1 kb
fragment from long-range PCR using a primer pair (T492/T493)
targeting a chromosome 2 integration site-flanking sequence (small
letters, 467-bp), revealing elimination of the entire HIV-1 genome
(9709-bp) and leaving a combined 5'-LTR (underlined using dashes) and
3'-LTR with a 6-bp insertion (boxed) precisely at the third
nucleotide from the PAM (TGG) of the LTR-A targeting site
(underlined) and a 4-bp deletion (nnnn). FIG. 9A discloses SEQ ID
NO: 35. FIG. 9B is a representative DNA gel picture that shows
specific eradication of the HIV-1 genome. NS, non-specific band.
FIG. 9C and FIG. 9D are graphs showing quantitative PCR analysis
using the primer pair targeting the Gag gene (T457/T458), indicating
85% efficiency of entire HIV-1 genome eradication in
Cas9/LTR-A/B-expressing U1 cells. U1 cells were transfected with
pX260 empty vector (U6-CAG) or LTRs-A/B-encoding vectors. After
2-week puromycin selection, the cellular genomic DNAs were used for
absolute quantitative qPCR analysis using spiked pNL4-3-ΔE-EGFP
human genomic DNA as a standard. **P<0.01 indicates a significant
decrease compared to the U6-CAG control.
[0016] FIG. 10A, FIG. 10B, and FIG. 10C show that Cas9/LTR gRNAs
effectively eradicate HIV-1 provirus in J-Lat latently infected T
cells. FIG. 10A shows functional analysis by EGFP flow cytometry
revealing an approximately 50% reduction of PMA- and TNFα-induced
reactivation of EGFP reporter viruses. FIG. 10B is a SURVEYOR assay
that shows indel mutations (arrow) in the on-target LTR genome of
Cas9/LTR-A/B transfected cells. J-Lat cells were transfected with
pX260 empty vector or LTRs-A and -B. After 2-week puromycin
selection, cells were treated with PMA or TNFα for 24 h. The
genomic DNAs were subjected to PCR using primers covering the HIV-1
LTR U3/R/U5 regions (-411 to +129) and the SURVEYOR assay was
performed. **P<0.01 indicates a significant decrease compared to
the U6-CAG control. FIG. 10C shows a PCR fragment analysis using
primers covering the HIV-1 LTR (-374 to +43), revealing a precise
deletion of the 190-bp region between the LTR-A and LTR-B cutting
sites, leaving a 227-bp fragment (arrow). The housekeeping gene
β-actin serves as a DNA purification and loading control.
[0017] FIG. 11A, FIG. 11B, FIG. 11C, and FIG. 11D show that genome
editing efficiency depends upon the presence of Cas9 and gRNAs.
FIG. 11A shows PCR genotyping revealing the absence of a U6-driven
LTR-A or LTR-B expression cassette, and FIG. 11B shows
absence/reduction of CMV-driven Cas9 DNA, in puromycin-selected
TZM-bl subclones without any indication of genomic editing. Genomic
DNAs from the indicated subclones were subjected to conventional
(FIG. 11A) or real-time (FIG. 11B) PCR analyses using a primer pair
covering the U6 promoter (T351) and LTR-A (T354) or -B (T356), and
targeting Cas9 (T477/T491). FIG. 11C and FIG. 11D show that Cas9
protein expression is absent in ineffective TZM-bl subclones. FIG. 11C
shows that the Flag-tagged Cas9 fusion protein was detected by
Western blot (WB) and immunocytochemistry (ICC) with anti-Flag
monoclonal antibody. HEK293T cell line stably expressing Flag-Cas9
was used as a positive control for WB. GAPDH serves as a protein
loading control. Clone c6 contains Cas9 DNA but no Cas9 protein
expression, suggesting a potential mechanism of epigenetic
repression after puromycin selection. Clones c5 and c3 may represent
a truncated Flag-Cas9 (tCas9). FIG. 11D shows that the nucleus was
stained with Hoechst 33258.
[0018] FIG. 12A, FIG. 12B, FIG. 12C, and FIG. 12D demonstrate that
stable expression of Cas9/LTR-A/B gRNAs in TZM-bl cells vaccinates
against pseudotyped or native HIV-1 viruses. FIG. 12A shows
flow cytometry revealing a significant reduction of native
pNL4-3-ΔE-EGFP reporter virus infection efficiency in
Cas9/LTR-A/B expressing TZM-bl subclones. Real-time PCR analysis
reveals suppression or elimination of viral RNA as shown in FIG.
12B and DNA as shown in FIG. 12C by Cas9/LTR-A/B gRNAs. FIG. 12D
shows that the firefly-luciferase luminescent assay demonstrates
dramatic inhibition of virus infection-stimulated LTR promoter
activity by Cas9/LTR-A/B gRNAs. The stable Cas9/LTR-A/B
gRNA-expressing TZM-bl cells were infected for 2 hours with
indicated native HIV-1 viruses, and washed twice with PBS. At 2
days post-infection, cells were collected, fixed and analyzed by
flow cytometry for EGFP expression (in FIG. 12A), or lysed for
total RNA extraction and RT-qPCR (in FIG. 12B), genomic DNA
purification for qPCR (in FIG. 12C) and luminescence measurement
(in FIG. 12D). *P<0.05 and **P<0.01 indicate significant
decreases compared to the U6-CAG control.
[0019] FIG. 13 shows the predicted LTR gRNAs and their off-target
numbers (100% match). The 5'-LTR sense and antisense sequences (SEQ
ID NOS 79-111 and 112-141, respectively) (634 bp) of pHR'-CMV-LacZ
lentiviral vector (AF105229) were utilized to search for Cas9/gRNA
target sites containing a 20-bp guide sequence (protospacer) plus
the protospacer adjacent motif sequence (NGG) using Jack Lin's
CRISPR/Cas9 gRNA finder tool
(http://spot.colorado.edu/~slin/cas9.html). Each gRNA plus
NGG (AGG, TGG, GGG, CGG) was blasted against available human
genomic and transcript sequences with 1000 aligned sequences being
displayed. After pressing Control+F, the target sequence
(1-23 through 9-23 nucleotides) was copied and pasted to find the
number of genomic targets with 100% match. The number of off-targets
for each search was divided by 3 because of the repeated genome
library. The number shown indicates the sum of 4 searches (NGG). The
top number (for example, for gRNA sequence (sense): 20, 19, 19, 17,
16, 15, 14, 13, 12) indicates the gRNA target sequences farthest
from NGG. The sequence and off-target numbers for the selected
LTR-A/B and LTR-C/D are highlighted red and green, respectively.
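The first step described for FIG. 13, scanning the sense and antisense strands for a 20-nt protospacer followed by an NGG motif, can be sketched as follows. This is a minimal illustration and not the gRNA finder tool referenced in the text. The function names and the 23-nt input sequence are hypothetical, and the reported cut offset relies on the general property that Cas9 cleaves 3 bp upstream of the PAM. The subsequent BLAST-based off-target counting is not modeled.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_target_sites(seq, guide_len=20):
    """List candidate protospacer + NGG sites on both strands.

    'start' is the 0-based index of the protospacer on the strand that was
    scanned (for '-' this is an index into the reverse complement).
    'cut_offset' is the number of protospacer bases 5' of the expected cut.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, text in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(text):
            sites.append({
                "strand": strand,
                "start": match.start(),
                "protospacer": match.group(1),
                "pam": match.group(2),
                "cut_offset": guide_len - 3,
            })
    return sites


# Hypothetical 23-nt input: one 20-nt protospacer followed by an AGG PAM.
toy = "GATTACAGATTACAGATTACAGG"
for site in find_target_sites(toy):
    print(site["strand"], site["start"], site["protospacer"], site["pam"])
```

Each candidate returned by such a scan would then be searched against the human genome, as the paragraph describes, before a guide is selected.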
[0020] FIG. 14 depicts the oligonucleotides for gRNA targeting
sites and primers (SEQ ID NOS 36-78, respectively, in order of
appearance) used for PCR and sequencing.
[0021] FIG. 15 shows the locations of predicted gRNA targeting
sites of LTR-A and LTR-B and discloses "query Seq" sequences as SEQ
ID NOS 142-252, and "ref Seq" sequences as SEQ ID NOS 253-363, all
respectively, in order of appearance.
[0022] FIG. 16A, FIG. 16B, FIG. 16C, FIG. 16D, FIG. 16E, FIG. 16F,
FIG. 16G, and FIG. 16H show that both LTR-C and LTR-D decreased
constitutive and TSA/PMA-induced luciferase activity in TZMBI cells
stably incorporated with HIV-1 LTR firefly luciferase reporter gene
and combination induced precise genome excision. FIG. 16A shows
that six gRNA targets were designed for the promoter region of
HIV-LTR. FIG. 16A discloses SEQ ID NO: 16. TZMBI cells were
cotransfected with Cas9-EGFP and chimera gRNA expression cassette
(PCR products) by lipofectamine 2000. FIG. 16B is a graph showing
that after 3 d, EGFP-positive cells were sorted through FACS and
2000 cells per group were collected for luciferase assay. FIG. 16B
discloses SEQ ID: 31. FIG. 16C is a graph showing the population
sorted cells were cultured for 2 d and treated with TSA/PMA for 1 d
before luciferase assay. The single cells were sorted into 96-well
plate and cultured till confluence for luciferase assay in the
absence (shown in the graph of FIG. 16D) of TSA/PMA for 1 d or
presence (shown in the graph of FIG. 1E) of TSA/PMA for 1 d. FIG.
16F and FIG. 16G show the PCR product from the population sorted
cells were analyzed with Surveyor Cel-I nuclease assay and
restriction fragment length polymorphism with BsajI (FIG. 16G)
showing mutation (FIG. 16F) or uncut (FIG. 16G) band (red arrow). A
200 bp fragment (FIG. 16F, FIG. 16G, black arrow) resulting from
the deletion of 321 bp region between LTR-C and LTR-D as predicted
(FIG. 16A, red arrowhead) was validated by TA-cloning and
sequencing showing precise genomic excision (FIG. 16H). Sanger
sequencing of PCR products from individual LTR-C and -D identified
% and % indel mutation efficiency respectively. * p<0.05
indicates a statistically significant reduction using a Student's t
test compared to the corresponding U6-CAG control. Protospace(E),
Protospace(C), Protospace(A), Protospace(B), Protospace(D), and
Protospace(F) correspond to SEQ ID NOS 365, 367, 369, 371, 373, and
375, respectively, in order of appearance.
[0023] FIG. 17A, FIG. 17B, FIG. 17C, FIG. 17D, FIG. 17E, FIG. 17F,
FIG. 17G, and FIG. 17H show that Cas9/LTR-gRNA inhibited
constitutive and inducible production of HIV-1 virus measured by
EGFP flow cytometry in HIV-1 latently infected CHME5 microglia cell
line. The pHR' lentiviral vector containing Tat, Rev, Env, Vpu, and
Nef with the reporter gene d2EGFP was transduced into the human fetal
microglia cell line CHME5, and a 400 bp deletion in the U3 region of the
3'-LTR is illustrated (shown in FIG. 17A). FIG. 17B is a graph
showing transient transfection of Cas9/gRNA, Human HIV-1 LTR-A, B
alone or combination decreased the intensity but not percentage of
EGFP due to suppression of LTR promoter activity. FIG. 17C is a
graph showing transient transfection of Cas9/gRNA, Human HIV-1
LTR-C, D alone or combination decreased the intensity but not
percentage of EGFP due to suppression of LTR promoter activity.
FIG. 17D and FIG. 17E are graphs showing that after antibiotic
selection for 1-2 weeks, the percentage of EGFP cells was also
reduced. FIG. 17F and FIG. 17G show that the PCR product from the stably
selected clones was analyzed with the Surveyor Cel-I nuclease assay,
showing strong indel mutation in LTR-A and LTR-B but weak indel mutation
in the combination of LTR-A/B (red arrow). A 331 bp fragment (shown
in FIG. 17F and FIG. 17G, black arrow) resulting from the deletion
of 190 bp region between LTR-A and LTR-B as predicted (FIG. 17H,
red arrowhead) was validated by TA-cloning and sequencing showing
precise genomic excision (FIG. 17H). FIG. 17H discloses SEQ ID NOS
1-3, respectively, in order of appearance.
[0024] FIG. 18 shows LTR of a representative HIV-1 sequence (SEQ ID
NO: 376). The U3 region extends from nucleotide 1 to nucleotide 432
(SEQ ID NO: 377), the R region extends from nucleotide 432 to
nucleotide 559 (SEQ ID NO: 378), and the U5 region extends from
nucleotide 560 to nucleotide 634 (SEQ ID NO: 379).
[0025] FIG. 19 shows LTR of a representative SIV sequence (SEQ ID
NO: 380). The U3 region extends from nucleotide 1 to nucleotide 517
(SEQ ID NO: 381), the R region extends from nucleotide 518 to
nucleotide 693 (SEQ ID NO: 382), and the U5 region extends from
nucleotide 694 to nucleotide 818 (SEQ ID NO: 383).
DETAILED DESCRIPTION OF THE INVENTION
[0026] The present invention is based, in part, on our discovery
that we could eliminate the integrated HIV-1 genome from HIV-1
infected cells by using the RNA-guided Clustered Regularly
Interspaced Short Palindromic Repeat (CRISPR)-Cas9 nuclease system
(Cas9/gRNA) in single and multiplex configurations. We identified
highly specific targets within the HIV-1 LTR U3 region that were
efficiently edited by Cas9/gRNA, inactivating viral gene expression
and replication in latently-infected microglial, promonocytic and T
cells. Cas9/gRNAs caused neither genotoxicity nor off-target
editing to the host cells, and completely excised a 9709-bp
fragment of integrated proviral DNA that spanned from its 5'- to
3'-LTRs. Furthermore, the presence of multiplex gRNAs within
Cas9-expressing cells prevented HIV-1 infection. Our results
suggest that Cas9/gRNA can be engineered to provide a specific,
efficacious prophylactic and therapeutic approach against AIDS.
[0027] Accordingly, the invention features compositions comprising
a nucleic acid encoding a CRISPR-associated endonuclease and a
guide RNA that is complementary to a target sequence in a
retrovirus, e.g., HIV, as well as pharmaceutical formulations
comprising a nucleic acid encoding a CRISPR-associated endonuclease
and a guide RNA that is complementary to a target sequence in HIV.
Also featured are compositions comprising a CRISPR-associated
endonuclease polypeptide and a guide RNA that is complementary to a
target sequence in HIV, as well as pharmaceutical formulations
comprising a CRISPR-associated endonuclease polypeptide and a guide
RNA that is complementary to a target sequence in HIV.
[0028] Also featured are methods of administering the compositions
to treat a retroviral infection, e.g., HIV infection, methods of
eliminating viral replication, and methods of preventing HIV
infection. The therapeutic methods described herein can be carried
out in connection with other antiretroviral therapies (e.g.,
HAART).
[0029] The clinical course of HIV infection can vary according to a
number of factors, including the subject's genetic background, age,
general health, nutrition, treatment received, and the HIV subtype.
In general, most individuals develop flu-like symptoms within a few
weeks or months of infection. The symptoms can include fever,
headache, muscle aches, rash, chills, sore throat, mouth or genital
ulcers, swollen lymph glands, joint pain, night sweats, and
diarrhea. The intensity of the symptoms can vary from mild to
severe depending upon the individual. During the acute phase, the
HIV viral particles are attracted to and enter cells expressing the
appropriate CD4 receptor molecules. Once the virus has entered the
host cell, the HIV encoded reverse transcriptase generates a
proviral DNA copy of the HIV RNA and the pro-viral DNA becomes
integrated into the host cell genomic DNA. It is this HIV provirus
that is replicated by the host cell, resulting in the release of
new HIV virions which can then infect other cells. The methods and
compositions of the invention are generally and variously useful
for excision of integrated HIV proviral DNA, although the invention
is not so limited, and the compositions may be administered to a
subject at any stage of infection or to an uninfected subject who
is at risk for HIV infection.
[0030] The primary HIV infection subsides within a few weeks to a
few months, and is typically followed by a long clinical "latent"
period which may last for up to 10 years. The latent period is also
referred to as asymptomatic HIV infection or chronic HIV infection.
The subject's CD4 lymphocyte numbers rebound, but not to
pre-infection levels and most subjects undergo seroconversion, that
is, they have detectable levels of anti-HIV antibody in their
blood, within 2 to 4 weeks of infection. During this latent period,
there can be no detectable viral replication in peripheral blood
mononuclear cells and little or no culturable virus in peripheral
blood. During the latent period, also referred to as the clinical
latency stage, people who are infected with HIV may experience no
HIV-related symptoms, or only mild ones. However, HIV
continues to reproduce at very low levels. In subjects who have been
treated with anti-retroviral therapies, this latent period may
extend for several decades or more. However, subjects at this stage
are still able to transmit HIV to others even if they are receiving
antiretroviral therapy, although anti-retroviral therapy reduces
the risk of transmission. As noted above, anti-retroviral therapy
does not suppress low levels of viral genome expression nor does it
efficiently target latently infected cells such as resting memory T
cells, brain macrophages, microglia, astrocytes and gut associated
lymphoid cells.
[0031] Clinical signs and symptoms of AIDS (acquired
immunodeficiency syndrome) appear as CD4 lymphocyte numbers
decrease, resulting in irreversible damage to the immune system.
Many patients also present with AIDS-related complications,
including, for example, opportunistic infections such as
tuberculosis, salmonellosis, cytomegalovirus, candidiasis,
cryptococcal meningitis, toxoplasmosis, and cryptosporidiosis, as
well as certain kinds of cancers, including for example, Kaposi's
sarcoma, and lymphomas, as well as wasting syndrome, neurological
complications, and HIV-associated nephropathy.
[0032] Compositions
[0033] The compositions of the invention include nucleic acids
encoding a CRISPR-associated endonuclease, e.g., Cas9, and a guide
RNA that is complementary to a target sequence in a retrovirus,
e.g., HIV. In bacteria the CRISPR/Cas loci encode RNA-guided
adaptive immune systems against mobile genetic elements (viruses,
transposable elements and conjugative plasmids). Three types
(I-III) of CRISPR systems have been identified. CRISPR clusters
contain spacers, the sequences complementary to antecedent mobile
elements. CRISPR clusters are transcribed and processed into mature
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)
RNA (crRNA). The CRISPR-associated endonuclease, Cas9, belongs to
the type II CRISPR/Cas system and has strong endonuclease activity
to cut target DNA. Cas9 is guided by a mature crRNA that contains
about 20 base pairs (bp) of unique target sequence (called spacer)
and a trans-activating small RNA (tracrRNA) that serves as a guide
for ribonuclease III-aided processing of pre-crRNA. The
crRNA:tracrRNA duplex directs Cas9 to target DNA via complementary
base pairing between the spacer on the crRNA and the complementary
sequence (called protospacer) on the target DNA. Cas9 recognizes a
trinucleotide (NGG) protospacer adjacent motif (PAM) to specify the
cut site (the 3rd nucleotide from PAM). The crRNA and tracrRNA can
be expressed separately or engineered into an artificial fusion
small guide RNA (sgRNA) via a synthetic stem loop (AGAAAU) to mimic
the natural crRNA/tracrRNA duplex. Such sgRNA, like shRNA, can be
synthesized or in vitro transcribed for direct RNA transfection or
expressed from U6 or H1-promoted RNA expression vector, although
cleavage efficiencies of the artificial sgRNA are lower than those
for systems with the crRNA and tracrRNA expressed separately.
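By way of illustration only, the PAM rule described above can be sketched in a few lines of Python. The function name find_ngg_targets is hypothetical and is not part of the disclosed compositions or methods. The sketch scans a single strand, assumes a 20-nucleotide protospacer lying immediately 5' of an NGG PAM, and places the predicted cut 3 nucleotides upstream of the PAM. The example input is the LTR A target sequence recited elsewhere in this document (SEQ ID NO: 96).

```python
import re

def find_ngg_targets(dna, spacer_len=20):
    """Return (protospacer, pam, cut_index) for each NGG PAM on the given strand.

    cut_index is the 0-based index of the first base 3' of the predicted cut,
    i.e. the cut falls between dna[cut_index - 1] and dna[cut_index].
    """
    dna = dna.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs (e.g. within GGGG) are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough sequence 5' of the PAM for a full protospacer
        protospacer = dna[pam_start - spacer_len:pam_start]
        hits.append((protospacer, match.group(1), pam_start - 3))
    return hits

# LTR A target (SEQ ID NO: 96): 20-nt protospacer followed by a TGG PAM.
ltr_a = "ATCAGATATCCACTGACCTTTGG"
for protospacer, pam, cut in find_ngg_targets(ltr_a):
    print(protospacer, pam, cut)
```

A practical search would also scan the reverse-complement strand and would apply whatever PAM the chosen endonuclease requires; both are omitted here for brevity.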
[0034] The compositions of the invention can include a nucleic acid
encoding a CRISPR-associated endonuclease. In some embodiments, the
CRISPR-associated endonuclease can be a Cas9 nuclease. The Cas9
nuclease can have a nucleotide sequence identical to the wild type
Streptococcus pyogenes sequence. In some embodiments, the
CRISPR-associated endonuclease can be a sequence from other
species, for example other Streptococcus species, such as
thermophilus; Pseudomonas aeruginosa, Escherichia coli, or other
sequenced bacteria genomes and archaea, or other prokaryotic
microorganisms. Alternatively, the wild type Streptococcus
pyogenes Cas9 sequence can be modified. The nucleic acid sequence
can be codon optimized for efficient expression in mammalian cells,
i.e., "humanized." A humanized Cas9 nuclease sequence can be for
example, the Cas9 nuclease sequence encoded by any of the
expression vectors listed in Genbank accession numbers KM099231.1
GI:669193757; KM099232.1 GI:669193761; or KM099233.1 GI:669193765.
Alternatively, the Cas9 nuclease sequence can be for example, the
sequence contained within a commercially available vector such as
PX330 or PX260 from Addgene (Cambridge, Mass.). In some
embodiments, the Cas9 endonuclease can have an amino acid sequence
that is a variant or a fragment of any of the Cas9 endonuclease
sequences of Genbank accession numbers KM099231.1 GI:669193757;
KM099232.1 GI:669193761; or KM099233.1 GI:669193765 or Cas9 amino
acid sequence of PX330 or PX260 (Addgene, Cambridge, Mass.). The
Cas9 nucleotide sequence can be modified to encode biologically
active variants of Cas9, and these variants can have or can
include, for example, an amino acid sequence that differs from a
wild type Cas9 by virtue of containing one or more mutations (e.g.,
an addition, deletion, or substitution mutation or a combination of
such mutations). One or more of the mutations can be a
substitution (e.g., a conservative amino acid substitution). For
example, a biologically active variant of a Cas9 polypeptide can
have an amino acid sequence with at least or about 50% sequence
identity (e.g., at least or about 50%, 55%, 60%, 65%, 70%, 75%,
80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity) to a wild
type Cas9 polypeptide. Conservative amino acid substitutions
typically include substitutions within the following groups:
glycine and alanine; valine, isoleucine, and leucine; aspartic acid
and glutamic acid; asparagine, glutamine, serine and threonine;
lysine, histidine and arginine; and phenylalanine and tyrosine. The
amino acid residues in the Cas9 amino acid sequence can be
non-naturally occurring amino acid residues. Naturally occurring
amino acid residues include those naturally encoded by the genetic
code as well as non-standard amino acids (e.g., amino acids having
the D-configuration instead of the L-configuration). The present
peptides can also include amino acid residues that are modified
versions of standard residues (e.g. pyrrolysine can be used in
place of lysine and selenocysteine can be used in place of
cysteine). Non-naturally occurring amino acid residues are those
that have not been found in nature, but that conform to the basic
formula of an amino acid and can be incorporated into a peptide.
These include D-alloisoleucine ((2R,3S)-2-amino-3-methylpentanoic
acid) and L-cyclopentyl glycine ((S)-2-amino-2-cyclopentyl acetic
acid). For other examples, one can consult textbooks or the
worldwide web (a site is currently maintained by the California
Institute of Technology and displays structures of non-natural
amino acids that have been successfully incorporated into
functional proteins).
[0035] The Cas9 nuclease sequence can be a mutated sequence. For
example the Cas9 nuclease can be mutated in the conserved HNH and
RuvC domains, which are involved in strand specific cleavage. For
example, an aspartate-to-alanine (D10A) mutation in the RuvC
catalytic domain allows the Cas9 nickase mutant (Cas9n) to nick
rather than cleave DNA to yield single-stranded breaks, and the
subsequent preferential repair through HDR can potentially decrease
the frequency of unwanted indel mutations from off-target
double-stranded breaks.
[0036] In some embodiments, compositions of the invention can
include a CRISPR-associated endonuclease polypeptide encoded by any
of the nucleic acid sequences described above. The terms "peptide,"
"polypeptide," and "protein" are used interchangeably herein,
although typically they refer to peptide sequences of varying
sizes. We may refer to the amino acid-based compositions of the
invention as "polypeptides" to convey that they are linear polymers
of amino acid residues, and to help distinguish them from
full-length proteins. A polypeptide of the invention can
"constitute" or "include" a fragment of a CRISPR-associated
endonuclease, and the invention encompasses polypeptides that
constitute or include biologically active variants of a
CRISPR-associated endonuclease. It will be understood that the
polypeptides can therefore include only a fragment of a
CRISPR-associated endonuclease (or a biologically active variant
thereof) but may include additional residues as well. Biologically
active variants will retain sufficient activity to cleave target
DNA.
[0037] The bonds between the amino acid residues can be
conventional peptide bonds or another covalent bond (such as an
ester or ether bond), and the polypeptides can be modified by
amidation, phosphorylation or glycosylation. A modification can
affect the polypeptide backbone and/or one or more side chains.
Chemical modifications can be naturally occurring modifications
made in vivo following translation of an mRNA encoding the
polypeptide (e.g., glycosylation in a bacterial host) or synthetic
modifications made in vitro. A biologically active variant of a
CRISPR-associated endonuclease can include one or more structural
modifications resulting from any combination of naturally occurring
(i.e., made naturally in vivo) and synthetic modifications (i.e.,
naturally occurring or non-naturally occurring modifications made
in vitro). Examples of modifications include, but are not limited
to, amidation (e.g., replacement of the free carboxyl group at the
C-terminus by an amino group); biotinylation (e.g., acylation of
lysine or other reactive amino acid residues with a biotin
molecule); glycosylation (e.g., addition of a glycosyl group to
either asparagines, hydroxylysine, serine or threonine residues to
generate a glycoprotein or glycopeptide); acetylation (e.g., the
addition of an acetyl group, typically at the N-terminus of a
polypeptide); alkylation (e.g., the addition of an alkyl group);
isoprenylation (e.g., the addition of an isoprenoid group);
lipoylation (e.g. attachment of a lipoate moiety); and
phosphorylation (e.g., addition of a phosphate group to serine,
tyrosine, threonine or histidine).
[0038] One or more of the amino acid residues in a biologically
active variant may be a non-naturally occurring amino acid residue.
Naturally occurring amino acid residues include those naturally
encoded by the genetic code as well as non-standard amino acids
(e.g., amino acids having the D-configuration instead of the
L-configuration). The present peptides can also include amino acid
residues that are modified versions of standard residues (e.g.
pyrrolysine can be used in place of lysine and selenocysteine can
be used in place of cysteine). Non-naturally occurring amino acid
residues are those that have not been found in nature, but that
conform to the basic formula of an amino acid and can be
incorporated into a peptide. These include
D-alloisoleucine ((2R,3S)-2-amino-3-methylpentanoic acid) and
L-cyclopentyl glycine ((S)-2-amino-2-cyclopentyl acetic acid). For
other examples, one can consult textbooks or the worldwide web (a
site is currently maintained by the California Institute of
Technology and displays structures of non-natural amino acids that
have been successfully incorporated into functional proteins).
[0039] Alternatively, or in addition, one or more of the amino acid
residues in a biologically active variant can be a naturally
occurring residue that differs from the naturally occurring residue
found in the corresponding position in a wildtype sequence. In
other words, biologically active variants can include one or more
amino acid substitutions. We may refer to a substitution, addition,
or deletion of amino acid residues as a mutation of the wildtype
sequence. As noted, the substitution can replace a naturally
occurring amino acid residue with a non-naturally occurring residue
or just a different naturally occurring residue. Further the
substitution can constitute a conservative or non-conservative
substitution. Conservative amino acid substitutions typically
include substitutions within the following groups: glycine and
alanine; valine, isoleucine, and leucine; aspartic acid and
glutamic acid; asparagine, glutamine, serine and threonine; lysine,
histidine and arginine; and phenylalanine and tyrosine.
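As a non-limiting sketch, the conservative substitution groups listed above can be encoded directly and used to label each difference between a wild-type and a variant sequence. The names CONSERVATIVE_GROUPS, is_conservative and classify_substitutions are hypothetical, the toy four-residue sequences are invented for illustration, and the sketch assumes two pre-aligned sequences of equal length written in one-letter amino acid code.

```python
# Groups exactly as listed in the text, in one-letter amino acid code.
CONSERVATIVE_GROUPS = [
    set("GA"),    # glycine, alanine
    set("VIL"),   # valine, isoleucine, leucine
    set("DE"),    # aspartic acid, glutamic acid
    set("NQST"),  # asparagine, glutamine, serine, threonine
    set("KHR"),   # lysine, histidine, arginine
    set("FY"),    # phenylalanine, tyrosine
]

def is_conservative(wild_type, variant):
    """True if the two different residues fall within the same listed group."""
    if wild_type == variant:
        return False  # identical residues are not a substitution
    return any(wild_type in g and variant in g for g in CONSERVATIVE_GROUPS)

def classify_substitutions(wt_seq, var_seq):
    """List (position, wild_type, variant, label) for each differing position (1-based)."""
    if len(wt_seq) != len(var_seq):
        raise ValueError("sequences must be pre-aligned and of equal length")
    result = []
    for pos, (a, b) in enumerate(zip(wt_seq, var_seq), start=1):
        if a != b:
            label = "conservative" if is_conservative(a, b) else "non-conservative"
            result.append((pos, a, b, label))
    return result

# Toy sequences (hypothetical): D->E is within a group, V->A is not.
print(classify_substitutions("MDKV", "MEKA"))
```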
[0040] The polypeptides that are biologically active variants of a
CRISPR-associated endonuclease can be characterized in terms of the
extent to which their sequence is similar to or identical to the
corresponding wild-type polypeptide. For example, the sequence of a
biologically active variant can be at least or about 80% identical
to corresponding residues in the wild-type polypeptide. For
example, a biologically active variant of a CRISPR-associated
endonuclease can have an amino acid sequence with at least or about
80% sequence identity (e.g., at least or about 85%, 90%, 95%, 97%,
98%, or 99% sequence identity) to a CRISPR-associated endonuclease
or to a homolog or ortholog thereof.
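For illustration, one simple way to compute percent sequence identity and compare it with the 80% threshold is sketched below. This is an assumption-laden example and not a definition adopted by this document: it takes two pre-aligned sequences of equal length (gaps written as "-"), counts identical non-gap positions, and divides by the alignment length. Other conventions for the denominator exist and give different values. The function names and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned columns holding the same residue in both sequences.

    Assumes the inputs are already aligned, equal in length, and use '-' for gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned and of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)

def meets_threshold(seq_a, seq_b, threshold=80.0):
    """True if the identity is at least the stated threshold (80% by default)."""
    return percent_identity(seq_a, seq_b) >= threshold

# Toy 10-residue sequences differing at one position.
print(percent_identity("MDKKYSIGLD", "MDKKYSIGLA"))
print(meets_threshold("MDKKYSIGLD", "MDKKYSIGLA"))
```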
[0041] A biologically active variant of a CRISPR-associated
endonuclease polypeptide will retain sufficient biological activity
to be useful in the present methods. The biologically active
variants will retain sufficient activity to function in targeted
DNA cleavage. The biological activity can be assessed in ways known
to one of ordinary skill in the art and includes, without
limitation, in vitro cleavage assays or functional assays.
[0042] Polypeptides can be generated by a variety of methods
including, for example, recombinant techniques or chemical
synthesis. Once generated, polypeptides can be isolated and
purified to any desired extent by means well known in the art. For
example, one can use lyophilization following, for example,
reversed phase (preferably) or normal phase HPLC, or size exclusion
or partition chromatography on polysaccharide gel media such as
Sephadex G-25. The composition of the final polypeptide may be
confirmed by amino acid analysis after degradation of the peptide
by standard means, by amino acid sequencing, or by FAB-MS
techniques. Salts, including acid salts, esters, amides, and N-acyl
derivatives of an amino group of a polypeptide may be prepared
using methods known in the art, and such peptides are useful in the
context of the present invention.
[0043] The compositions of the invention include sequence encoding
a guide RNA (gRNA) comprising a sequence that is complementary to a
target sequence in a retrovirus. The retrovirus can be a
lentivirus, for example, a human immunodeficiency virus, a simian
immunodeficiency virus, a feline immunodeficiency virus or a bovine
immunodeficiency virus. The human immunodeficiency virus can be
HIV-1 or HIV-2. The target sequence can include a sequence from any
HIV, for example, HIV-1 and HIV-2, and any circulating recombinant
form thereof. The genetic variability of HIV is reflected in the
multiple groups and subtypes that have been described. A collection
of HIV sequences is compiled in the Los Alamos HIV databases and
compendiums. The methods and compositions of the invention can be
applied to HIV from any of those various groups, subtypes, and
circulating recombinant forms. These include, for example, the HIV-1
major group (often referred to as Group M) and the minor groups,
Groups N, O, and P, as well as, but not limited to, any of the
following subtypes: A, B, C, D, F, G, H, J and K, or any group (for
example, but not limited to, any of Groups N, O and P) of HIV. The
methods and compositions can also be applied to
HIV-2 and any of the A, B, C, F or G clades (also referred to as
"subtypes" or "groups"), as well as any circulating recombinant
form of HIV-2.
[0044] The guide RNA can be a sequence complementary to a coding or
a non-coding sequence. For example, the guide RNA can be an HIV
sequence, such as a long terminal repeat (LTR) sequence, a protein
coding sequence, or a regulatory sequence. In some embodiments, the
guide RNA comprises a sequence that is complementary to an HIV long
terminal repeat (LTR) region. The HIV-1 LTR is approximately 640 bp
in length. An exemplary HIV-1 LTR is the sequence of SEQ ID NO:
376. An exemplary SIV LTR is the sequence of SEQ ID NO: 380. HIV-1
long terminal repeats (LTRs) are divided into U3, R and U5 regions.
Exemplary HIV-1 LTR U3, R and U5 regions are SEQ ID NOs: 377, 378
and 379, respectively. Exemplary SIV LTR U3, R and U5 regions are
SEQ ID NOs: 381, 382, and 383, respectively. The configuration of
the U3, R, U5 regions for exemplary HIV-1 and SIV sequences are
shown in FIG. 18 and FIG. 19, respectively. LTRs contain all of the
required signals for gene expression and are involved in the
integration of a provirus into the genome of a host cell. For
example, the basal or core promoter, a core enhancer and a
modulatory region are found within U3 while the transactivation
response element is found within R. In HIV-1, the U5 region
includes several sub-regions, for example, TAR or trans-acting
responsive element, which is involved in transcriptional
activation; Poly A, which is involved in dimerization and genome
packaging; PBS or primer binding site; Psi or the packaging signal;
and DIS or dimer initiation site.
[0045] Useful guide sequences are complementary to the U3, R, or U5
region of the LTR. Exemplary guide RNA sequences that target the U3
region of HIV-1 are shown in FIG. 13. A guide RNA sequence can
comprise, for example, a sequence complementary to the target
protospacer sequence of:
TABLE-US-00001
LTR A: ATCAGATATCCACTGACCTTTGG (SEQ ID NO: 96),
LTR B: CAGCAGTTCTTGAAGTACTCCGG (SEQ ID NO: 121),
LTR C: GATTGGCAGAACTACACACCAGG (SEQ ID NO: 87), or
LTR D: GCGTGGCCTGGGCGGGACTGGGG (SEQ ID NO: 110).
[0046] The locations of LTR A (SEQ ID NO: 96), LTR B (SEQ ID NO:
121), LTR C (SEQ ID NO: 87) and LTR D (SEQ ID NO: 110) within the
U3 (SEQ ID NO: 16) region are shown in FIG. 5. Additional exemplary
guide RNA sequences that target the U3 region are listed in the
table shown in FIG. 13 and can have the sequence of any of SEQ ID
NOs: 79-111 and SEQ ID NOs: 111-141. In some embodiments, the guide
sequence can comprise a sequence having 95% identity to any of SEQ
ID NOs: 79-111 and SEQ ID NOs: 111-141. Thus, a guide RNA sequence
can comprise, for example, a sequence having 95% identity to a
sequence complementary to the target protospacer sequence of:
TABLE-US-00002
LTR A: ATCAGATATCCACTGACCTTTGG (SEQ ID NO: 96),
LTR B: CAGCAGTTCTTGAAGTACTCCGG (SEQ ID NO: 121),
LTR C: GATTGGCAGAACTACACACCAGG (SEQ ID NO: 87), or
LTR D: GCGTGGCCTGGGCGGGACTGGGG (SEQ ID NO: 110).
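As an illustrative sketch only, the four target sequences recited above can be split into a 20-nucleotide protospacer and a 3-nucleotide PAM, and a complementary sequence can be derived for each. The helper names split_target and reverse_complement are hypothetical. The sketch assumes that the last three nucleotides of each recited sequence are the PAM and writes the complement as DNA; it does not assert which strand a given guide RNA embodiment pairs with.

```python
# Target protospacer sequences as recited in this document (protospacer + PAM).
TARGETS = {
    "LTR A": "ATCAGATATCCACTGACCTTTGG",  # SEQ ID NO: 96
    "LTR B": "CAGCAGTTCTTGAAGTACTCCGG",  # SEQ ID NO: 121
    "LTR C": "GATTGGCAGAACTACACACCAGG",  # SEQ ID NO: 87
    "LTR D": "GCGTGGCCTGGGCGGGACTGGGG",  # SEQ ID NO: 110
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def split_target(target):
    """Split a target into (protospacer, pam), assuming a 3-nt PAM at the 3' end."""
    return target[:-3], target[-3:]

def reverse_complement(dna):
    """Return the reverse complement of a DNA string, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

for name, target in TARGETS.items():
    protospacer, pam = split_target(target)
    assert pam[1:] == "GG", "each recited target is expected to end in an NGG PAM"
    print(name, protospacer, pam, reverse_complement(protospacer))
```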
[0047] We may also refer to the guide RNA sequence as a spacer,
e.g., spacer (A), spacer (B), spacer (C), and spacer (D).
[0048] The guide RNA sequence can be complementary to a sequence
found within an HIV-1 U3, R, or U5 region reference sequence or
consensus sequence. The invention is not so limited, however, and
the guide RNA sequences can be selected to target any variant or
mutant HIV sequence. In some embodiments, more than one guide RNA
sequence is employed, for example a first guide RNA sequence and a
second guide RNA sequence, with the first and second guide RNA
sequences being complementary to target sequences in any of the
above mentioned retroviral regions. In some embodiments, the guide
RNA can include a variant sequence or quasi-species sequence. In
some embodiments, the guide RNA can be a sequence corresponding to
a sequence in the genome of the virus harbored by the subject
undergoing treatment. Thus for example, the sequence of the
particular U3, R, or U5 region in the HIV virus harbored by the
subject can be obtained and guide RNAs complementary to the
patient's particular sequences can be used.
[0049] In some embodiments, the guide RNA can be a sequence
complementary to a protein coding sequence, for example, a sequence
encoding one or more viral structural proteins, (e.g., gag, pol,
env and tat). Thus, the sequence can be complementary to sequence
within the gag polyprotein, e.g., MA (matrix protein, p17); CA
(capsid protein, p24); SP1 (spacer peptide 1, p2); NC (nucleocapsid
protein, p7); SP2 (spacer peptide 2, p1) and P6 protein; pol, e.g.,
reverse transcriptase (RT) and RNase H, integrase (IN), and HIV
protease (PR); env, e.g., gp160, or a cleavage product of gp160,
e.g., gp120 or SU, and gp41 or TM; or tat, e.g., the 72-amino acid
one-exon Tat or the 86-101 amino-acid two-exon Tat. In some
embodiments, the guide RNA can be a sequence complementary to a
sequence encoding an accessory protein, including, for example,
vif, nef (negative factor), vpu (Virus protein U), and tev.
[0050] In some embodiments, the sequence can be a sequence
complementary to a structural or regulatory element, for example,
an LTR, as described above; TAR (Target sequence for viral
transactivation), the binding site for Tat protein and for cellular
proteins, which consists of approximately the first 45 nucleotides of
the viral mRNAs in HIV-1 (or the first 100 nucleotides in HIV-2) and
forms a hairpin stem-loop structure; RRE (Rev responsive element), an RNA
element encoded within the env region of HIV-1, consisting of
approximately 200 nucleotides (positions 7710 to 8061 from the
start of transcription in HIV-1, spanning the border of gp120 and
gp41); PE (Psi element), a set of 4 stem-loop structures preceding
and overlapping the Gag start codon; SLIP, a TTTTTT "slippery
site", followed by a stem-loop structure; CRS (Cis-acting
repressive sequences); and INS (Inhibitory/Instability RNA sequences)
found for example, at nucleotides 414 to 631 in the gag region of
HIV-1.
[0051] The guide RNA sequence can be a sense or anti-sense
sequence. The guide RNA sequence generally includes a proto-spacer
adjacent motif (PAM). The sequence of the PAM can vary depending
upon the specificity requirements of the CRISPR endonuclease used.
In the CRISPR-Cas system derived from S. pyogenes, the target DNA
typically immediately precedes a 5'-NGG proto-spacer adjacent motif
(PAM). Thus, for the S. pyogenes Cas9, the PAM sequence can be AGG,
TGG, CGG or GGG. Other Cas9 orthologs may have different PAM
specificities. For example, Cas9 from S. thermophilus requires
5'-NNAGAA (for CRISPR1) or 5'-NGGNG (for CRISPR3), and Neisseria
meningitidis requires 5'-NNNNGATT. The specific sequence of the
guide RNA may vary, but, regardless of the sequence, useful guide
RNA sequences will be those that minimize off-target effects while
achieving high efficiency and complete ablation of the genomically
integrated HIV-1 provirus. The length of the guide RNA sequence can
vary from about 20 to about 60 or more nucleotides, for example
about 20, about 21, about 22, about 23, about 24, about 25, about
26, about 27, about 28, about 29, about 30, about 31, about 32,
about 33, about 34, about 35, about 36, about 37, about 38, about
39, about 40, about 45, about 50, about 55, about 60 or more
nucleotides. Useful selection methods identify regions having
extremely low homology between the foreign viral genome and the host
cellular genome, including endogenous retroviral DNA. Such methods
include: bioinformatic screening using 12-bp+NGG target-selection
criteria to exclude off-target human transcriptome or (even rarely)
untranslated-genomic sites; avoiding transcription factor binding
sites within the HIV-1 LTR promoter (potentially conserved in the
host genome); selection of LTR-A- and -B-directed, 30-bp gRNAs and
also a pre-crRNA system reflecting the original bacterial immune
mechanism to enhance specificity/efficiency vs. a 20-bp gRNA-,
chimeric crRNA-tracrRNA-based system; and whole-genome sequencing
(WGS), Sanger sequencing and SURVEYOR assay to identify and exclude
potential off-target effects.
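The 12-bp+NGG screening criterion mentioned above can be sketched, under stated assumptions, as a simple string search. The sketch below is not the screening pipeline used by the inventors. It assumes that "12-bp" refers to the 12 PAM-proximal nucleotides of the protospacer, requires an exact match followed by NGG, and searches both strands of a host sequence. The function name seed_hits and the toy host sequence are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def seed_hits(protospacer, host, seed_len=12):
    """Count host sites (both strands) with the PAM-proximal seed followed by NGG."""
    seed = protospacer.upper()[-seed_len:]
    # Lookahead so that overlapping candidate sites are each counted once.
    pattern = re.compile("(?=" + re.escape(seed) + "[ACGT]GG)")
    host = host.upper()
    return sum(
        1
        for strand in (host, reverse_complement(host))
        for _ in pattern.finditer(strand)
    )

# LTR A protospacer (first 20 nt of SEQ ID NO: 96) against a toy host fragment
# that was constructed to contain the 12-nt seed followed by an AGG PAM.
ltr_a_protospacer = "ATCAGATATCCACTGACCTT"
toy_host = "GGCA" + "TCCACTGACCTT" + "AGG" + "TTAC"
print(seed_hits(ltr_a_protospacer, toy_host))
```

A candidate guide with zero seed hits in the host sequence would pass this particular filter; a realistic screen would run against a full genome or transcriptome index and would usually tolerate mismatches.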
[0052] The guide RNA sequence can be configured as a single
sequence or as a combination of one or more different sequences,
e.g., a multiplex configuration. Multiplex configurations can
include combinations of two, three, four, five, six, seven, eight,
nine, ten, or more different guide RNAs, for example any
combination of sequences in U3, R, or U5. In some embodiments,
combinations of LTR A, LTR B, LTR C and LTR D can be used. In some
embodiments, combinations of any of the sequences LTR A (SEQ ID NO:
96), LTR B (SEQ ID NO: 121), LTR C (SEQ ID NO: 87), and LTR D (SEQ
ID NO: 110), can be used. In some embodiments, any combinations of
the sequences having the sequence of SEQ ID NOs: 79-111 and SEQ ID
NOs: 111-141 can be used. When the compositions are administered in
an expression vector, the guide RNAs can be encoded by a single
vector. Alternatively, multiple vectors can be engineered to each
include two or more different guide RNAs. Useful configurations
will result in the excision of viral sequences between cleavage
sites resulting in the ablation of HIV genome or HIV protein
expression. Thus, the use of two or more different guide RNAs
promotes excision of the viral sequences between the cleavage sites
recognized by the CRISPR endonuclease. The excised region can vary
in size from a single nucleotide to several thousand nucleotides.
Exemplary excised regions are described in the examples.
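By way of a simplified illustration, excision between two cleavage sites can be modeled as removing the sequence between the two predicted cut positions and joining the flanks. The sketch assumes precise re-ligation with no insertions or deletions at the junction, places each cut 3 nucleotides 5' of the PAM, and uses a short toy sequence (hypothetical flanks and filler around the recited LTR A and LTR B targets) rather than an actual proviral sequence. The function names cut_index and excise are hypothetical.

```python
def cut_index(seq, target):
    """0-based index of the predicted cut for a protospacer+PAM target in seq.

    The PAM is taken to be the last 3 nt of the target, and the cut is placed
    3 nt 5' of the PAM, between seq[index - 1] and seq[index].
    """
    start = seq.find(target)
    if start < 0:
        raise ValueError("target site not found")
    pam_start = start + len(target) - 3
    return pam_start - 3

def excise(seq, target_1, target_2):
    """Return (remaining, excised) after removing the segment between the two cuts."""
    left, right = sorted((cut_index(seq, target_1), cut_index(seq, target_2)))
    return seq[:left] + seq[right:], seq[left:right]

LTR_A = "ATCAGATATCCACTGACCTTTGG"  # SEQ ID NO: 96
LTR_B = "CAGCAGTTCTTGAAGTACTCCGG"  # SEQ ID NO: 121

# Toy construct: hypothetical flanks and filler around the two recited targets.
toy = "GGAA" + LTR_A + "ACGT" * 5 + LTR_B + "TTCC"
remaining, excised = excise(toy, LTR_A, LTR_B)
print(len(toy), len(excised), len(remaining))
```

In this model the size of the excised region is simply the distance between the two cut positions, which is why the choice of guide RNA pair determines the length of the fragment removed.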
[0053] When the compositions are administered as a nucleic acid or
are contained within an expression vector, the CRISPR endonuclease
can be encoded by the same nucleic acid or vector as the guide RNA
sequences. Alternatively or in addition, the CRISPR endonuclease
can be encoded in a physically separate nucleic acid from the guide
RNA sequences or in a separate vector.
[0054] In some embodiments, the RNA molecules (e.g., crRNA,
tracrRNA, gRNA) are engineered to comprise one or more modified
nucleobases. Known modifications of RNA molecules can be found, for
example, in Genes VI, Chapter 9 ("Interpreting the Genetic Code"),
Lewin, ed. (1997, Oxford University Press, New York), and
Modification and Editing of RNA, Grosjean and Benne, eds. (1998,
ASM Press, Washington D.C.). Modified RNA components include the
following: 2'-O-methylcytidine; N.sup.4-methylcytidine;
N.sup.4-2'-O-dimethylcytidine; N.sup.4-acetylcytidine;
5-methylcytidine; 5,2'-O-dimethylcytidine; 5-hydroxymethylcytidine;
5-formylcytidine; 2'-O-methyl-5-formylcytidine; 3-methylcytidine;
2-thiocytidine; lysidine; 2'-O-methyluridine; 2-thiouridine;
2-thio-2'-O-methyluridine; 3,2'-O-dimethyluridine;
3-(3-amino-3-carboxypropyl)uridine; 4-thiouridine; ribosylthymine;
5,2'-O-dimethyluridine; 5-methyl-2-thiouridine; 5-hydroxyuridine;
5-methoxyuridine; uridine 5-oxyacetic acid; uridine 5-oxyacetic
acid methyl ester; 5-carboxymethyluridine;
5-methoxycarbonylmethyluridine;
5-methoxycarbonylmethyl-2'-O-methyluridine;
5-methoxycarbonylmethyl-2-thiouridine; 5-carbamoylmethyluridine;
5-carbamoylmethyl-2'-O-methyluridine;
5-(carboxyhydroxymethyl)uridine; 5-(carboxyhydroxymethyl)uridine
methyl ester; 5-aminomethyl-2-thiouridine;
5-methylaminomethyluridine; 5-methylaminomethyl-2-thiouridine;
5-methylaminomethyl-2-selenouridine;
5-carboxymethylaminomethyluridine;
5-carboxymethylaminomethyl-2'-O-methyl-uridine;
5-carboxymethylaminomethyl-2-thiouridine; dihydrouridine;
dihydroribosylthymine; 2'-O-methyladenosine; 2-methyladenosine;
N.sup.6-methyladenosine; N.sup.6,N.sup.6-dimethyladenosine;
N.sup.6,2'-O-trimethyladenosine;
2-methylthio-N.sup.6-isopentenyladenosine;
N.sup.6-(cis-hydroxyisopentenyl)-adenosine;
2-methylthio-N.sup.6-(cis-hydroxyisopentenyl)-adenosine;
N.sup.6-glycinylcarbamoyladenosine; N.sup.6-threonylcarbamoyl
adenosine; N.sup.6-methyl-N.sup.6-threonylcarbamoyl adenosine;
2-methylthio-N.sup.6-methyl-N.sup.6-threonylcarbamoyl adenosine;
N.sup.6-hydroxynorvalylcarbamoyl adenosine;
2-methylthio-N.sup.6-hydroxynorvalylcarbamoyl adenosine;
2'-O-ribosyladenosine (phosphate); inosine; 2'-O-methyl inosine;
1-methyl inosine; 1,2'-O-dimethyl inosine; 2'-O-methyl guanosine;
1-methyl guanosine; N.sup.2-methyl guanosine;
N.sup.2,N.sup.2-dimethyl guanosine; N.sup.2, 2'-O-dimethyl
guanosine; N.sup.2, N.sup.2, 2'-O-trimethyl guanosine; 2'-O-ribosyl
guanosine (phosphate); 7-methyl guanosine; N.sup.2,7-dimethyl
guanosine; N.sup.2,N.sup.2,7-trimethyl guanosine; wyosine;
methylwyosine; under-modified hydroxywybutosine; wybutosine;
hydroxywybutosine; peroxywybutosine; queuosine; epoxyqueuosine;
galactosyl-queuosine; mannosyl-queuosine; 7-cyano-7-deazaguanosine;
archaeosine [also called 7-formamido-7-deazaguanosine]; and
7-aminomethyl-7-deazaguanosine.
[0055] We may use the terms "nucleic acid" and "polynucleotide"
interchangeably to refer to both RNA and DNA, including cDNA,
genomic DNA, synthetic DNA, and DNA (or RNA) containing nucleic
acid analogs, any of which may encode a polypeptide of the
invention and all of which are encompassed by the invention.
Polynucleotides can have essentially any three-dimensional
structure. A nucleic acid can be double-stranded or single-stranded
(i.e., a sense strand or an antisense strand). Non-limiting
examples of polynucleotides include genes, gene fragments, exons,
introns, messenger RNA (mRNA) and portions thereof, transfer RNA,
ribosomal RNA, siRNA, micro-RNA, ribozymes, cDNA, recombinant
polynucleotides, branched polynucleotides, plasmids, vectors,
isolated DNA of any sequence, isolated RNA of any sequence, nucleic
acid probes, and primers, as well as nucleic acid analogs. In the
context of the present invention, nucleic acids can encode a
fragment of a naturally occurring Cas9 or a biologically active
variant thereof and a guide RNA wherein the guide RNA is
complementary to a sequence in HIV.
[0056] An "isolated" nucleic acid can be, for example, a
naturally-occurring DNA molecule or a fragment thereof, provided
that at least one of the nucleic acid sequences normally found
immediately flanking that DNA molecule in a naturally-occurring
genome is removed or absent. Thus, an isolated nucleic acid
includes, without limitation, a DNA molecule that exists as a
separate molecule, independent of other sequences (e.g., a
chemically synthesized nucleic acid, or a cDNA or genomic DNA
fragment produced by the polymerase chain reaction (PCR) or
restriction endonuclease treatment). An isolated nucleic acid also
refers to a DNA molecule that is incorporated into a vector, an
autonomously replicating plasmid, a virus, or into the genomic DNA
of a prokaryote or eukaryote. In addition, an isolated nucleic acid
can include an engineered nucleic acid such as a DNA molecule that
is part of a hybrid or fusion nucleic acid. A nucleic acid existing
among many (e.g., dozens, or hundreds to millions) of other nucleic
acids within, for example, cDNA libraries or genomic libraries, or
gel slices containing a genomic DNA restriction digest, is not an
isolated nucleic acid.
[0057] Isolated nucleic acid molecules can be produced by standard
techniques. For example, polymerase chain reaction (PCR) techniques
can be used to obtain an isolated nucleic acid containing a
nucleotide sequence described herein, including nucleotide
sequences encoding a polypeptide described herein. PCR can be used
to amplify specific sequences from DNA as well as RNA, including
sequences from total genomic DNA or total cellular RNA. Various PCR
methods are described in, for example, PCR Primer: A Laboratory
Manual, Dieffenbach and Dveksler, eds., Cold Spring Harbor
Laboratory Press, 1995. Generally, sequence information from the
ends of the region of interest or beyond is employed to design
oligonucleotide primers that are identical or similar in sequence
to opposite strands of the template to be amplified. Various PCR
strategies also are available by which site-specific nucleotide
sequence modifications can be introduced into a template nucleic
acid.
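The primer design principle described above (primers matching
opposite strands at the two ends of the region of interest) can be
sketched as follows. This is a minimal illustration only: the
function names, the fixed primer length, and the toy template are
assumptions, and practical primer design would also consider
melting temperature, secondary structure, and specificity.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(template: str, length: int = 20) -> Tuple[str, str]:
    """Return (forward, reverse) primers, each written 5' to 3'.

    `template` is the top strand of the region to amplify, 5' to 3'.
    The forward primer is identical to the start of the top strand;
    the reverse primer is identical to the start of the bottom strand,
    i.e. the reverse complement of the end of the top strand.
    """
    if length <= 0 or 2 * length > len(template):
        raise ValueError("primer length does not fit the template")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse


# Hypothetical 16-nt template, used only to show the orientation.
print(design_primers("ATGCCGTTAGGCTAAC", length=6))
```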
[0058] Isolated nucleic acids also can be chemically synthesized,
either as a single nucleic acid molecule (e.g., using automated DNA
synthesis in the 3' to 5' direction using phosphoramidite
technology) or as a series of oligonucleotides. For example, one or
more pairs of long oligonucleotides (e.g., >50-100 nucleotides)
can be synthesized that contain the desired sequence, with each
pair containing a short segment of complementarity (e.g., about 15
nucleotides) such that a duplex is formed when the oligonucleotide
pair is annealed. DNA polymerase is used to extend the
oligonucleotides, resulting in a single, double-stranded nucleic
acid molecule per oligonucleotide pair, which then can be ligated
into a vector. Isolated nucleic acids of the invention also can be
obtained by mutagenesis of, e.g., a naturally occurring portion of
a Cas9-encoding DNA (in accordance with, for example, the formula
above).
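The oligonucleotide-pair approach described above can be sketched in
Python as follows. The snippet splits a desired sequence into a
sense oligonucleotide and an antisense oligonucleotide that share a
short complementary segment, then simulates annealing and
polymerase extension to confirm that the full-length sequence is
recovered. The function names, the 15-nucleotide overlap default,
and the 120-nt target are hypothetical choices for illustration.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_into_pair(target: str, overlap: int = 15) -> Tuple[str, str]:
    """Split `target` (top strand, 5' to 3') into a sense oligo and an
    antisense oligo whose 3' ends are complementary over `overlap` nt."""
    start = len(target) // 2 - overlap // 2
    end = start + overlap
    if overlap <= 0 or start < 0 or end > len(target):
        raise ValueError("overlap does not fit the target")
    sense = target[:end]                             # top strand, 5' part
    antisense = reverse_complement(target[start:])   # bottom strand, 3' part
    return sense, antisense


def extend_pair(sense: str, antisense: str, overlap: int = 15) -> str:
    """Simulate annealing plus polymerase fill-in; return the top strand."""
    bottom_as_top = reverse_complement(antisense)
    if sense[-overlap:] != bottom_as_top[:overlap]:
        raise ValueError("oligos do not anneal over the expected overlap")
    return sense + bottom_as_top[overlap:]


# Hypothetical 120-nt target sequence.
target = "GATTACAGGCTTAACGTCCA" * 6
sense, antisense = split_into_pair(target)
assert extend_pair(sense, antisense) == target
print(len(sense), len(antisense))
```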
[0059] Two nucleic acids or the polypeptides they encode may be
described as having a certain degree of identity to one another.
For example, a Cas9 protein and a biologically active variant
thereof may be described as exhibiting a certain degree of
identity. Alignments may be assembled by locating short Cas9
sequences in the Protein Information Resource (PIR) site, followed
by analysis with the "short nearly identical sequences" Basic
Local Alignment Search Tool (BLAST) algorithm on the NCBI
website.
[0060] As used herein, the term "percent sequence identity" refers
to the degree of identity between any given query sequence and a
subject sequence. For example, a naturally occurring Cas9 can be
the query sequence and a fragment of a Cas9 protein can be the
subject sequence. Similarly, a fragment of a Cas9 protein can be
the query sequence and a biologically active variant thereof can be
the subject sequence.
[0061] To determine sequence identity, a query nucleic acid or
amino acid sequence can be aligned to one or more subject nucleic
acid or amino acid sequences, respectively, using the computer
program ClustalW (version 1.83, default parameters), which allows
alignments of nucleic acid or protein sequences to be carried out
across their entire length (global alignment). See Chenna et al.,
Nucleic Acids Res. 31:3497-3500, 2003.
[0062] ClustalW calculates the best match between a query and one
or more subject sequences and aligns them so that identities,
similarities and differences can be determined. Gaps of one or more
residues can be inserted into a query sequence, a subject sequence,
or both, to maximize sequence alignments. For fast pairwise
alignment of nucleic acid sequences, the following default
parameters are used: word size: 2; window size: 4; scoring method:
percentage; number of top diagonals: 4; and gap penalty: 5. For
multiple alignments of nucleic acid sequences, the following
parameters are used: gap opening penalty: 10.0; gap extension
penalty: 5.0; and weight transitions: yes. For fast pairwise
alignment of protein sequences, the following parameters are used:
word size: 1; window size: 5; scoring method: percentage; number of
top diagonals: 5; gap penalty: 3. For multiple alignment of protein
sequences, the following parameters are used: weight matrix:
blosum; gap opening penalty: 10.0; gap extension penalty: 0.05;
hydrophilic gaps: on; hydrophilic residues: Gly, Pro, Ser, Asn,
Asp, Gln, Glu, Arg, and Lys; residue-specific gap penalties: on.
The output is a sequence alignment that reflects the relationship
between sequences. ClustalW can be run, for example, at the Baylor
College of Medicine Search Launcher site
(searchlauncher.bcm.tmc.edu/multi-align/multi-align.html) and at
the European Bioinformatics Institute site on the World Wide Web
(ebi.ac.uk/clustalw).
[0063] To determine a percent identity between a query sequence and
a subject sequence, ClustalW divides the number of identities in
the best alignment by the number of residues compared (gap
positions are excluded), and multiplies the result by 100. The
output is the percent identity of the subject sequence with respect
to the query sequence. It is noted that the percent identity value
can be rounded to the nearest tenth. For example, 78.11, 78.12,
78.13, and 78.14 are rounded down to 78.1, while 78.15, 78.16,
78.17, 78.18, and 78.19 are rounded up to 78.2.
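The arithmetic described above can be sketched in Python as follows.
This is not ClustalW's own code; it is a minimal illustration that
assumes the two sequences have already been aligned to equal length
with "-" marking gaps. The function names and the example alignment
are hypothetical. Decimal arithmetic with half-up rounding is used
so that a value such as 78.15 rounds up to 78.2 as stated.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_tenth(value) -> Decimal:
    """Round to the nearest tenth, with halves rounded up."""
    return Decimal(str(value)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def percent_identity(aligned_query: str, aligned_subject: str,
                     gap: str = "-") -> Decimal:
    """Percent identity over compared positions of a pairwise alignment.

    Positions where either sequence has a gap are excluded from the
    count of residues compared.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == gap or s == gap:
            continue  # gap positions are excluded
        compared += 1
        if q == s:
            identical += 1
    if compared == 0:
        raise ValueError("no residues were compared")
    return round_to_tenth(Decimal(identical) * 100 / Decimal(compared))


# Hypothetical alignment: 9 compared positions, 8 identities.
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))
```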
[0064] The nucleic acids and polypeptides described herein may be
referred to as "exogenous". The term "exogenous" indicates that the
nucleic acid or polypeptide is part of, or encoded by, a
recombinant nucleic acid construct, or is not in its natural
environment. For example, an exogenous nucleic acid can be a
sequence from one species introduced into another species, i.e., a
heterologous nucleic acid. Typically, such an exogenous nucleic
acid is introduced into the other species via a recombinant nucleic
acid construct. An exogenous nucleic acid can also be a sequence
that is native to an organism and that has been reintroduced into
cells of that organism. An exogenous nucleic acid that includes a
native sequence can often be distinguished from the naturally
occurring sequence by the presence of non-natural sequences linked
to the exogenous nucleic acid, e.g., non-native regulatory
sequences flanking a native sequence in a recombinant nucleic acid
construct. In addition, stably transformed exogenous nucleic acids
typically are integrated at positions other than the position where
the native sequence is found.
[0065] Recombinant constructs are also provided herein and can be
used to transform cells in order to express Cas9 and/or a guide RNA
complementary to a target sequence in HIV. A recombinant nucleic
acid construct comprises a nucleic acid encoding a Cas9 and/or a
guide RNA complementary to a target sequence in HIV as described
herein, operably linked to a regulatory region suitable for
expressing the Cas9 and/or a guide RNA complementary to a target
sequence in HIV in the cell. It will be appreciated that a number
of nucleic acids can encode a polypeptide having a particular amino
acid sequence. The degeneracy of the genetic code is well known in
the art. For many amino acids, there is more than one nucleotide
triplet that serves as the codon for the amino acid. For example,
codons in the coding sequence for Cas9 can be modified such that
optimal expression in a particular organism is obtained, using
appropriate codon bias tables for that organism.
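The degeneracy described above can be illustrated with a short
Python sketch in which two different nucleotide sequences encode
the same peptide. The codon assignments shown are a subset of the
standard genetic code; the "preferred codon" table, the function
names, and the five-residue example are hypothetical and are not an
actual codon bias table for any organism or an actual Cas9
sequence.

```python
# Subset of the standard genetic code (only the residues used below).
SYNONYMOUS_CODONS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "D": ["GAT", "GAC"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMOUS_CODONS.items() for c in codons}

# Hypothetical preference table: one chosen codon per amino acid.
PREFERRED = {"M": "ATG", "K": "AAG", "D": "GAC", "L": "CTG", "S": "AGC"}


def translate(dna: str) -> str:
    """Translate a coding sequence using the codon subset above."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna: str, preferred: dict) -> str:
    """Replace each codon with the preferred synonymous codon."""
    return "".join(preferred[aa] for aa in translate(dna))


original = "ATGGATAAATTATCT"            # encodes M-D-K-L-S
recoded = recode(original, PREFERRED)   # different DNA, same peptide
assert translate(original) == translate(recoded)
print(original, recoded, translate(recoded))
```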
[0066] Vectors containing nucleic acids such as those described
herein also are provided. A "vector" is a replicon, such as a
plasmid, phage, or cosmid, into which another DNA segment may be
inserted so as to bring about the replication of the inserted
segment. Generally, a vector is capable of replication when
associated with the proper control elements. Suitable vector
backbones include, for example, those routinely used in the art
such as plasmids, viruses, artificial chromosomes, BACs, YACs, or
PACs. The term "vector" includes cloning and expression vectors, as
well as viral vectors and integrating vectors. An "expression
vector" is a vector that includes a regulatory region. A wide
variety of host/expression vector combinations may be used to
express the nucleic acid sequences described herein. Suitable
expression vectors include, without limitation, plasmids and viral
vectors derived from, for example, bacteriophage, baculoviruses,
and retroviruses. Numerous vectors and expression systems are
commercially available from such corporations as Novagen (Madison,
Wis.), Clontech (Palo Alto, Calif.), Stratagene (La Jolla, Calif.),
and Invitrogen/Life Technologies (Carlsbad, Calif.).
[0067] The vectors provided herein also can include, for example,
origins of replication, scaffold attachment regions (SARs), and/or
markers. A marker gene can confer a selectable phenotype on a host
cell. For example, a marker can confer biocide resistance, such as
resistance to an antibiotic (e.g., kanamycin, G418, bleomycin, or
hygromycin). As noted above, an expression vector can include a tag
sequence designed to facilitate manipulation or detection (e.g.,
purification or localization) of the expressed polypeptide. Tag
sequences, such as green fluorescent protein (GFP), glutathione
S-transferase (GST), polyhistidine, c-myc, hemagglutinin, or
Flag.TM. tag (Kodak, New Haven, Conn.) sequences typically are
expressed as a fusion with the encoded polypeptide. Such tags can
be inserted anywhere within the polypeptide, including at either
the carboxyl or amino terminus.
[0068] Additional expression vectors also can include, for example,
segments of chromosomal, non-chromosomal and synthetic DNA
sequences. Suitable vectors include derivatives of SV40 and known
bacterial plasmids, e.g., E. coli plasmids col E1, pCR1, pBR322,
pMal-C2, pET, pGEX, pMB9 and their derivatives, plasmids such as
RP4; phage DNAs, e.g., the numerous derivatives of phage lambda, e.g.,
NM989, and other phage DNA, e.g., M13 and filamentous single
stranded phage DNA; yeast plasmids such as the 2.mu. plasmid or
derivatives thereof; vectors useful in eukaryotic cells, such as
vectors useful in insect or mammalian cells; vectors derived from
combinations of plasmids and phage DNAs, such as plasmids that have
been modified to employ phage DNA or other expression control
sequences.
[0069] Yeast expression systems can also be used. For example, the
non-fusion pYES2 vector (XbaI, SphI, XhoI, NotI, BstXI, EcoRI,
BstXI, BamHI, SacI, KpnI, and HindIII cloning sites; Invitrogen) or
the fusion pYESHisA, B, C (XbaI, SphI, XhoI, NotI, BstXI, EcoRI,
BamHI, SacI, KpnI, and HindIII cloning sites, N-terminal peptide
purified with ProBond resin and cleaved with enterokinase;
Invitrogen), to mention just two, can be employed according to the
invention. A yeast two-hybrid expression system can also be
prepared in accordance with the invention.
[0070] The vector can also include a regulatory region. The term
"regulatory region" refers to nucleotide sequences that influence
transcription or translation initiation and rate, and stability
and/or mobility of a transcription or translation product.
Regulatory regions include, without limitation, promoter sequences,
enhancer sequences, response elements, protein recognition sites,
inducible elements, protein binding sequences, 5' and 3'
untranslated regions (UTRs), transcriptional start sites,
termination sequences, polyadenylation sequences, nuclear
localization signals, and introns.
[0071] As used herein, the term "operably linked" refers to
positioning of a regulatory region and a sequence to be transcribed
in a nucleic acid so as to influence transcription or translation
of such a sequence. For example, to bring a coding sequence under
the control of a promoter, the translation initiation site of the
translational reading frame of the polypeptide is typically
positioned between one and about fifty nucleotides downstream of
the promoter. A promoter can, however, be positioned as much as
about 5,000 nucleotides upstream of the translation initiation site
or about 2,000 nucleotides upstream of the transcription start
site. A promoter typically comprises at least a core (basal)
promoter. A promoter also may include at least one control element,
such as an enhancer sequence, an upstream element or an upstream
activation region (UAR). The choice of promoters to be included
depends upon several factors, including, but not limited to,
efficiency, selectability, inducibility, desired expression level,
and cell- or tissue-preferential expression. It is a routine matter
for one of skill in the art to modulate the expression of a coding
sequence by appropriately selecting and positioning promoters and
other regulatory regions relative to the coding sequence.
[0072] Vectors include, for example, viral vectors (such as
adenoviruses ("Ad"), adeno-associated viruses (AAV), and vesicular
stomatitis virus (VSV) and retroviruses), liposomes and other
lipid-containing complexes, and other macromolecular complexes
capable of mediating delivery of a polynucleotide to a host cell.
Vectors can also comprise other components or functionalities that
further modulate gene delivery and/or gene expression, or that
otherwise provide beneficial properties to the targeted cells. As
described and illustrated in more detail below, such other
components include, for example, components that influence binding
or targeting to cells (including components that mediate cell-type
or tissue-specific binding); components that influence uptake of
the vector nucleic acid by the cell; components that influence
localization of the polynucleotide within the cell after uptake
(such as agents mediating nuclear localization); and components
that influence expression of the polynucleotide. Such components
also might include markers, such as detectable and/or selectable
markers that can be used to detect or select for cells that have
taken up and are expressing the nucleic acid delivered by the
vector. Such components can be provided as a natural feature of the
vector (such as the use of certain viral vectors which have
components or functionalities mediating binding and uptake), or
vectors can be modified to provide such functionalities. Other
vectors include those described by Chen et al., BioTechniques, 34:
167-171 (2003). A large variety of such vectors are known in the
art and are generally available.
[0073] A "recombinant viral vector" refers to a viral vector
comprising one or more heterologous gene products or sequences.
Since many viral vectors exhibit size-constraints associated with
packaging, the heterologous gene products or sequences are
typically introduced by replacing one or more portions of the viral
genome. Such viruses may become replication-defective, requiring
the deleted function(s) to be provided in trans during viral
replication and encapsidation (by using, e.g., a helper virus or a
packaging cell line carrying gene products necessary for
replication and/or encapsidation). Modified viral vectors in which
a polynucleotide to be delivered is carried on the outside of the
viral particle have also been described (see, e.g., Curiel, D T, et
al. PNAS 88: 8850-8854, 1991).
[0074] Suitable nucleic acid delivery systems include a recombinant
viral vector, typically with sequence from at least one of an
adenovirus, adeno-associated virus (AAV), helper-dependent
adenovirus, retrovirus, or hemagglutinating virus of Japan-liposome
(HVJ) complex. In such cases, the viral vector comprises a strong
eukaryotic promoter operably linked to the polynucleotide, e.g., a
cytomegalovirus (CMV) promoter. The recombinant viral vector can
include one or more of the polynucleotides therein, preferably
about one polynucleotide. In some embodiments, the viral vector
used in the invention methods has a titer of from about
10.sup.8 to about 5.times.10.sup.10 plaque forming units (pfu). In
embodiments in which the polynucleotide is to be administered with
a non-viral vector, use of from about 0.1 nanograms to about 4000
micrograms will often be useful, e.g., about 1 nanogram to about 100
micrograms.
[0075] Additional vectors include viral vectors, fusion proteins
and chemical conjugates. Retroviral vectors include Moloney murine
leukemia viruses and HIV-based viruses. One HIV-based viral vector
comprises at least two vectors wherein the gag and pol genes are
from an HIV genome and the env gene is from another virus. DNA
viral vectors include pox vectors such as orthopox or avipox
vectors, herpesvirus vectors such as a herpes simplex I virus (HSV)
vector [Geller, A. I. et al., J. Neurochem, 64: 487 (1995); Lim,
F., et al., in DNA Cloning: Mammalian Systems, D. Glover, Ed.
(Oxford Univ. Press, Oxford England) (1995); Geller, A. I. et al.,
Proc Natl. Acad. Sci.: USA.:90 7603 (1993); Geller, A. I., et al.,
Proc Natl. Acad. Sci USA: 87:1149 (1990)], Adenovirus Vectors
[LeGal LaSalle et al., Science, 259:988 (1993); Davidson, et al.,
Nat. Genet. 3: 219 (1993); Yang, et al., J. Virol. 69: 2004 (1995)]
and Adeno-associated Virus Vectors [Kaplitt, M. G., et al., Nat.
Genet. 8:148 (1994)].
[0076] Pox viral vectors introduce the gene into the cell's
cytoplasm. Avipox virus vectors result in only a short term
expression of the nucleic acid. Adenovirus vectors,
adeno-associated virus vectors and herpes simplex virus (HSV)
vectors may be indicated for some embodiments of the invention. The
adenovirus vector results in a shorter term expression (e.g., less
than about a month) than adeno-associated virus, which, in some
embodiments, may exhibit much longer expression. The particular
vector chosen will depend upon the target cell and the condition
being treated. The selection of appropriate promoters can readily
be accomplished. An example of a suitable promoter is the
763-base-pair cytomegalovirus (CMV) promoter. Other suitable
promoters which may be used for gene expression include, but are
not limited to, the Rous sarcoma virus (RSV) (Davis, et al., Hum
Gene Ther 4:151 (1993)), the SV40 early promoter region, the herpes
thymidine kinase promoter, the regulatory sequences of the
metallothionein (MMT) gene, prokaryotic expression vectors such as
the .beta.-lactamase promoter, the tac promoter, promoter elements
from yeast or other fungi such as the Gal 4 promoter, the ADC
(alcohol dehydrogenase) promoter, PGK (phosphoglycerol kinase)
promoter, alkaline phosphatase promoter; and the animal
transcriptional control regions, which exhibit tissue specificity
and have been utilized in transgenic animals: elastase I gene
control region which is active in pancreatic acinar cells, insulin
gene control region which is active in pancreatic beta cells,
immunoglobulin gene control region which is active in lymphoid
cells, mouse mammary tumor virus control region which is active in
testicular, breast, lymphoid and mast cells, albumin gene control
region which is active in liver, alpha-fetoprotein gene control
region which is active in liver, alpha 1-antitrypsin gene control
region which is active in the liver, beta-globin gene control
region which is active in myeloid cells, myelin basic protein gene
control region which is active in oligodendrocyte cells in the
brain, myosin light chain-2 gene control region which is active in
skeletal muscle, and gonadotropic releasing hormone gene control
region which is active in the hypothalamus. Certain proteins can
be expressed using their native promoter. Other elements that can
enhance expression can also be included such as an enhancer or a
system that results in high levels of expression such as a tat gene
and tar element. This cassette can then be inserted into a vector,
e.g., a plasmid vector such as pUC19, pUC118, pBR322, or other
known plasmid vectors, that includes, for example, an E. coli
origin of replication. See, Sambrook, et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Laboratory Press, (1989). The
plasmid vector may also include a selectable marker such as the
.beta.-lactamase gene for ampicillin resistance, provided that the
marker polypeptide does not adversely affect the metabolism of the
organism being treated. The cassette can also be bound to a nucleic
acid binding moiety in a synthetic delivery system, such as the
system disclosed in WO 95/22618.
[0077] If desired, the polynucleotides of the invention may also be
used with a microdelivery vehicle such as cationic liposomes and
adenoviral vectors. For a review of the procedures for liposome
preparation, targeting and delivery of contents, see Mannino and
Gould-Fogerite, BioTechniques, 6:682 (1988). See also, Felgner and
Holm, Bethesda Res. Lab. Focus, 11(2):21 (1989) and Maurer, R. A.,
Bethesda Res. Lab. Focus, 11(2):25 (1989).
[0078] Replication-defective recombinant adenoviral vectors can be
produced in accordance with known techniques. See, Quantin, et al.,
Proc. Natl. Acad. Sci. USA, 89:2581-2584 (1992);
Stratford-Perricaudet, et al., J. Clin. Invest., 90:626-630 (1992);
and Rosenfeld, et al., Cell, 68:143-155 (1992).
[0079] Another delivery method is to use single stranded DNA
producing vectors which can produce the expressed products
intracellularly. See for example, Chen et al., BioTechniques, 34:
167-171 (2003), which is incorporated herein, by reference, in its
entirety.
[0080] Pharmaceutical Compositions
[0081] As described above, the compositions of the present
invention can be prepared in a variety of ways known to one of
ordinary skill in the art. Regardless of their original source or
the manner in which they are obtained, the compositions of the
invention can be formulated in accordance with their use. For
example, the nucleic acids and vectors described above can be
formulated within compositions for application to cells in tissue
culture or for administration to a patient or subject. Any of the
pharmaceutical compositions of the invention can be formulated for
use in the preparation of a medicament, and particular uses are
indicated below in the context of treatment, e.g., the treatment of
a subject having an HIV infection or at risk for contracting an
HIV infection. When employed as pharmaceuticals, any of the nucleic
acids and vectors can be administered in the form of pharmaceutical
compositions. These compositions can be prepared in a manner well
known in the pharmaceutical art, and can be administered by a
variety of routes, depending upon whether local or systemic
treatment is desired and upon the area to be treated.
Administration may be topical (including ophthalmic and to mucous
membranes including intranasal, vaginal and rectal delivery),
pulmonary (e.g., by inhalation or insufflation of powders or
aerosols, including by nebulizer; intratracheal, intranasal,
epidermal and transdermal), ocular, oral or parenteral. Methods for
ocular delivery can include topical administration (eye drops),
subconjunctival, periocular or intravitreal injection or
introduction by balloon catheter or ophthalmic inserts surgically
placed in the conjunctival sac. Parenteral administration includes
intravenous, intra-arterial, subcutaneous, intraperitoneal or
intramuscular injection or infusion; or intracranial, e.g.,
intrathecal or intraventricular administration. Parenteral
administration can be in the form of a single bolus dose, or may
be, for example, by a continuous perfusion pump. Pharmaceutical
compositions and formulations for topical administration may
include transdermal patches, ointments, lotions, creams, gels,
drops, suppositories, sprays, liquids, powders, and the like.
Conventional pharmaceutical carriers, aqueous, powder or oily
bases, thickeners and the like may be necessary or desirable.
[0082] This invention also includes pharmaceutical compositions
which contain, as the active ingredient, nucleic acids and vectors
described herein in combination with one or more pharmaceutically
acceptable carriers. We use the terms "pharmaceutically acceptable"
(or "pharmacologically acceptable") to refer to molecular entities
and compositions that do not produce an adverse, allergic or other
untoward reaction when administered to an animal or a human, as
appropriate. The term "pharmaceutically acceptable carrier," as
used herein, includes any and all solvents, dispersion media,
coatings, antibacterial, isotonic and absorption delaying agents,
buffers, excipients, binders, lubricants, gels, surfactants and the
like, that may be used as media for a pharmaceutically acceptable
substance. In making the compositions of the invention, the active
ingredient is typically mixed with an excipient, diluted by an
excipient or enclosed within such a carrier in the form of, for
example, a capsule, tablet, sachet, paper, or other container. When
the excipient serves as a diluent, it can be a solid, semisolid, or
liquid material (e.g., normal saline), which acts as a vehicle,
carrier or medium for the active ingredient. Thus, the compositions
can be in the form of tablets, pills, powders, lozenges, sachets,
cachets, elixirs, suspensions, emulsions, solutions, syrups,
aerosols (as a solid or in a liquid medium), lotions, creams,
ointments, gels, soft and hard gelatin capsules, suppositories,
sterile injectable solutions, and sterile packaged powders. As is
known in the art, the type of diluent can vary depending upon the
intended route of administration. The resulting compositions can
include additional agents, such as preservatives. In some
embodiments, the carrier can be, or can include, a lipid-based or
polymer-based colloid. In some embodiments, the carrier material
can be a colloid formulated as a liposome, a hydrogel, a
microparticle, a nanoparticle, or a block copolymer micelle. As
noted, the carrier material can form a capsule, and that material
may be a polymer-based colloid.
[0083] The nucleic acid sequences of the invention can be delivered
to an appropriate cell of a subject. This can be achieved by, for
example, the use of a polymeric, biodegradable microparticle or
microcapsule delivery vehicle, sized to optimize phagocytosis by
phagocytic cells such as macrophages. For example, PLGA
(poly-lacto-co-glycolide) microparticles approximately 1-10 .mu.m
in diameter can be used. The polynucleotide is encapsulated in
these microparticles, which are taken up by macrophages and
gradually biodegraded within the cell, thereby releasing the
polynucleotide. Once released, the DNA is expressed within the
cell. A second type of microparticle is intended not to be taken up
directly by cells, but rather to serve primarily as a slow-release
reservoir of nucleic acid that is taken up by cells only upon
release from the microparticle through biodegradation. These
polymeric particles should therefore be large enough to preclude
phagocytosis (i.e., larger than 5 μm and preferably larger than
20 μm). Another way to achieve uptake of the nucleic acid is
using liposomes, prepared by standard methods. The nucleic acids
can be incorporated alone into these delivery vehicles or
co-incorporated with tissue-specific antibodies, for example
antibodies that target cell types that are commonly latently
infected reservoirs of HIV infection, for example, brain
macrophages, microglia, astrocytes, and gut-associated lymphoid
cells. Alternatively, one can prepare a molecular complex composed
of a plasmid or other vector attached to poly-L-lysine by
electrostatic or covalent forces. Poly-L-lysine binds to a ligand
that can bind to a receptor on target cells. Delivery of "naked
DNA" (i.e., without a delivery vehicle) to an intramuscular,
intradermal, or subcutaneous site, is another means to achieve in
vivo expression. In the relevant polynucleotides (e.g., expression
vectors), the isolated nucleic acid sequence encoding a
CRISPR-associated endonuclease and a guide RNA is operatively
linked to a promoter or enhancer-promoter combination. Promoters
and enhancers are
described above.
[0084] In some embodiments, the compositions of the invention can
be formulated as a nanoparticle, for example, nanoparticles
comprised of a core of high molecular weight linear
polyethylenimine (LPEI) complexed with DNA and surrounded by a
shell of polyethyleneglycol-modified (PEGylated) low molecular
weight LPEI.
[0085] The nucleic acids and vectors may also be applied to a
surface of a device (e.g., a catheter) or contained within a pump,
patch, or other drug delivery device. The nucleic acids and vectors
of the invention can be administered alone, or in a mixture, in the
presence of a pharmaceutically acceptable excipient or carrier
(e.g., physiological saline). The excipient or carrier is selected
on the basis of the mode and route of administration. Suitable
pharmaceutical carriers, as well as pharmaceutical necessities for
use in pharmaceutical formulations, are described in Remington's
Pharmaceutical Sciences (E. W. Martin), a well-known reference text
in this field, and in the USP/NF (United States Pharmacopeia and
the National Formulary).
[0086] In some embodiments, the compositions may be formulated as a
topical gel for blocking sexual transmission of HIV. The topical
gel can be applied directly to the skin or mucous membranes of the
male or female genital region prior to sexual activity.
Alternatively or in addition, the topical gel can be applied to
the surface of, or contained within, a male or female condom or
diaphragm.
[0087] In some embodiments, the compositions can be formulated as a
nanoparticle encapsulating a nucleic acid encoding Cas9 or a
variant Cas9 and a guide RNA sequence complementary to a target
HIV, or a vector comprising a nucleic acid encoding Cas9 and a
guide RNA sequence complementary to a target HIV. Alternatively,
the compositions can be formulated as a nanoparticle encapsulating
a CRISPR-associated endonuclease polypeptide, e.g., Cas9 or a
variant Cas9, and a guide RNA sequence complementary to a target.
[0088] The present formulations can encompass a vector encoding
Cas9 and a guide RNA sequence complementary to a target HIV. The
guide RNA sequence can include a sequence complementary to a single
region, e.g. LTR A, B, C, or D or it can include any combination of
sequences complementary to LTR A, B, C, and D. Alternatively the
sequence encoding Cas9 and the sequence encoding the guide RNA
sequence can be on separate vectors.
[0089] Methods of Treatment
[0090] The compositions disclosed herein are generally and
variously useful for treatment of a subject having a retroviral
infection, e.g., an HIV infection. We may refer to a subject,
patient, or individual interchangeably. The methods are useful for
targeting any HIV, for example, HIV-1, HIV-2, and any circulating
recombinant form thereof. A subject is effectively treated whenever
a clinically beneficial result ensues. This may mean, for example,
a complete resolution of the symptoms of a disease, a decrease in
the severity of the symptoms of the disease, or a slowing of the
disease's progression. These methods can further include the steps
of a) identifying a subject (e.g., a patient and, more
specifically, a human patient) who has an HIV infection; and b)
providing to the subject a composition comprising a nucleic acid
encoding a CRISPR-associated nuclease, e.g., Cas9, and a guide RNA
complementary to an HIV target sequence, e.g. an HIV LTR. A subject
can be identified using standard clinical tests, for example,
immunoassays to detect the presence of HIV antibodies or the HIV
polypeptide p24 in the subject's serum, or through HIV nucleic acid
amplification assays. An amount of such a composition provided to
the subject that results in a complete resolution of the symptoms
of the infection, a decrease in the severity of the symptoms of the
infection, or a slowing of the infection's progression is
considered a therapeutically effective amount. The present methods
may also include a monitoring step to help optimize dosing and
scheduling as well as predict outcome. In some methods of the
present invention, one can first determine whether a patient has a
latent HIV-1 infection, and then make a determination as to whether
or not to treat the patient with one or more of the compositions
described herein. Monitoring can also be used to detect the onset
of drug resistance and to rapidly distinguish responsive patients
from nonresponsive patients. In some embodiments, the methods can
further include the step of determining the nucleic acid sequence
of the particular HIV harbored by the patient and then designing
the guide RNA to be complementary to those particular sequences.
For example, one can determine the nucleic acid sequence of a
subject's LTR U3, R or U5 region and then design one or more guide
RNAs to be precisely complementary to the patient's sequences.
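As an illustrative sketch only, and not part of the methods described herein, the design step above can be approximated computationally by scanning a sequenced LTR region for candidate protospacers. The sketch assumes a 20-nucleotide spacer and an NGG protospacer-adjacent motif (the convention for Streptococcus pyogenes Cas9); the function names and the toy sequence are hypothetical, and only the forward strand is scanned.

```python
import re


def find_protospacers(seq, spacer_len=20):
    """List (start, protospacer, pam) for forward-strand sites followed by NGG.

    Assumptions (not taken from the text above): an NGG PAM, as used by
    S. pyogenes Cas9, and a 20-nt spacer by default.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def spacer_rna(protospacer):
    """Write the guide spacer as RNA: same sequence as the protospacer, U for T."""
    return protospacer.upper().replace("T", "U")


if __name__ == "__main__":
    # Hypothetical toy sequence, not an HIV LTR.
    toy_ltr = "A" * 20 + "TGG" + "CCC"
    for start, proto, pam in find_protospacers(toy_ltr):
        print(start, spacer_rna(proto), pam)
```

In practice, candidate sites would also be screened on the reverse strand and against the host genome to limit off-target cleavage; that screening is outside the scope of this sketch.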
[0091] The compositions are also useful for the treatment, for
example, as a prophylactic treatment, of a subject at risk for
having a retroviral infection, e.g., an HIV infection. These
methods can further include the steps of a) identifying a subject
at risk for having an HIV infection; and b) providing to the subject a
composition comprising a nucleic acid encoding a CRISPR-associated
nuclease, e.g., Cas9, and a guide RNA complementary to an HIV
target sequence, e.g. an HIV LTR. A subject at risk for having an
HIV infection can be, for example, any sexually active individual
engaging in unprotected sex, i.e., engaging in sexual activity
without the use of a condom; a sexually active individual having
another sexually transmitted infection; an intravenous drug user;
or an uncircumcised man. A subject at risk for having an HIV
infection can be, for example, an individual whose occupation may
bring him or her into contact with HIV-infected populations, e.g.,
healthcare workers or first responders. A subject at risk for
having an HIV infection can be, for example, an inmate in a
correctional setting or a sex worker, that is, an individual who
uses sexual activity for income, employment, or nonmonetary items
such as food, drugs, or shelter.
[0092] The compositions can also be administered to a pregnant or
lactating woman having an HIV infection in order to reduce the
likelihood of transmission of HIV from the mother to her offspring.
A pregnant woman infected with HIV can pass the virus to her
offspring transplacentally in utero, at the time of delivery
through the birth canal, or following delivery through breast milk.
The compositions disclosed herein can be administered to the HIV
infected mother either prenatally, perinatally or postnatally
during the breast-feeding period, or any combination of prenatal,
perinatal, and postnatal administration. Compositions can be
administered to the mother along with standard antiretroviral
therapies as described below. In some embodiments, the compositions
of the invention are also administered to the infant immediately
following delivery and, in some embodiments, at intervals
thereafter. The infant also can receive standard antiretroviral
therapy.
[0093] The methods and compositions disclosed herein are useful for
the treatment of retroviral infections. Exemplary retroviruses
include human immunodeficiency viruses, e.g. HIV-1, HIV-2; simian
immunodeficiency virus (SIV); feline immunodeficiency virus (FIV);
bovine immunodeficiency virus (BIV); equine infectious anemia virus
(EIAV); and caprine arthritis/encephalitis virus (CAEV). The
methods disclosed herein can be applied to a wide range of species,
e.g., humans, non-human primates (e.g., monkeys), horses or other
livestock, dogs, cats, ferrets or other mammals kept as pets, rats,
mice, or other laboratory animals.
[0094] The methods of the invention can be expressed in terms of
the preparation of a medicament. Accordingly, the invention
encompasses the use of the agents and compositions described herein
in the preparation of a medicament. The compounds described herein
are useful in therapeutic compositions and regimens or for the
manufacture of a medicament for use in treatment of diseases or
conditions as described herein.
[0095] Any composition described herein can be administered to any
part of the host's body for subsequent delivery to a target cell. A
composition can be delivered to, without limitation, the brain, the
cerebrospinal fluid, joints, nasal mucosa, blood, lungs,
intestines, muscle tissues, skin, or the peritoneal cavity of a
mammal. In terms of routes of delivery, a composition can be
administered by intravenous, intracranial, intraperitoneal,
intramuscular, subcutaneous, intrarectal,
intravaginal, intrathecal, intratracheal, intradermal, or
transdermal injection, by oral or nasal administration, or by
gradual perfusion over time. In a further example, an aerosol
preparation of a composition can be given to a host by
inhalation.
[0096] The dosage required will depend on the route of
administration, the nature of the formulation, the nature of the
patient's illness, the patient's size, weight, surface area, age,
and sex, other drugs being administered, and the judgment of the
attending clinicians. Wide variations in the needed dosage are to
be expected in view of the variety of cellular targets and the
differing efficiencies of various routes of administration.
Variations in these dosage levels can be adjusted using standard
empirical routines for optimization, as is well understood in the
art. Administrations can be single or multiple (e.g., 2- or 3-, 4-,
6-, 8-, 10-, 20-, 50-, 100-, 150-, or more fold). Encapsulation of
the compounds in a suitable delivery vehicle (e.g., polymeric
microparticles or implantable devices) may increase the efficiency
of delivery.
[0097] The duration of treatment with any composition provided
herein can be any length of time from as short as one day to as
long as the life span of the host (e.g., many years). For example,
a compound can be administered once a week (for, for example, 4
weeks to many months or years); once a month (for, for example,
three to twelve months or for many years); or once a year for a
period of 5 years, ten years, or longer. It is also noted that the
frequency of treatment can be variable. For example, the present
compounds can be administered once (or twice, three times, etc.)
daily, weekly, monthly, or yearly.
[0098] An effective amount of any composition provided herein can
be administered to an individual in need of treatment. The term
"effective" as used herein refers to any amount that induces a
desired response while not inducing significant toxicity in the
patient. Such an amount can be determined by assessing a patient's
response after administration of a known amount of a particular
composition. In addition, the level of toxicity, if any, can be
determined by assessing a patient's clinical symptoms before and
after administering a known amount of a particular composition. It
is noted that the effective amount of a particular composition
administered to a patient can be adjusted according to a desired
outcome as well as the patient's response and level of toxicity.
Significant toxicity can vary for each particular patient and
depends on multiple factors including, without limitation, the
patient's disease state, age, and tolerance to side effects.
[0099] Any method known to those in the art can be used to
determine if a particular response is induced. Clinical methods
that can assess the degree of a particular disease state can be
used to determine if a response is induced. The particular methods
used to evaluate a response will depend upon the nature of the
patient's disorder, the patient's age, and sex, other drugs being
administered, and the judgment of the attending clinician.
[0100] The compositions may also be administered with another
therapeutic agent, for example, an anti-retroviral agent used in
HAART. Exemplary antiretroviral agents include reverse
transcriptase inhibitors (e.g., nucleoside/nucleotide reverse
transcriptase inhibitors such as zidovudine, emtricitabine,
lamivudine, and tenofovir; and non-nucleoside reverse transcriptase
inhibitors such as efavirenz, nevirapine, and rilpivirine); protease
inhibitors, e.g., tipranavir, darunavir, indinavir; entry
inhibitors, e.g., maraviroc; fusion inhibitors, e.g., enfuvirtide;
or integrase inhibitors, e.g., raltegravir, dolutegravir. Exemplary
antiretroviral agents can also include multi-class combination
agents, for example, combinations of emtricitabine, efavirenz, and
tenofovir; combinations of emtricitabine, rilpivirine, and
tenofovir; or combinations of elvitegravir, cobicistat,
emtricitabine, and tenofovir.
[0101] Concurrent administration of two or more therapeutic agents
does not require that the agents be administered at the same time
or by the same route, as long as there is an overlap in the time
period during which the agents are exerting their therapeutic
effect. Simultaneous or sequential administration is contemplated,
as is administration on different days or weeks. The therapeutic
agents may be administered under a metronomic regimen, e.g.,
continuous low-doses of a therapeutic agent.
[0102] Dosage, toxicity and therapeutic efficacy of such
compositions can be determined by standard pharmaceutical
procedures in cell cultures or experimental animals, e.g., for
determining the LD50 (the dose lethal to 50% of the
population) and the ED50 (the dose therapeutically effective
in 50% of the population). The dose ratio between toxic and
therapeutic effects is the therapeutic index and it can be
expressed as the ratio LD50/ED50.
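By way of a minimal worked sketch, the therapeutic index defined above is the median lethal dose divided by the median effective dose. The function name and the dose values below are hypothetical and are chosen only to show the arithmetic; they are not data from any study.

```python
def therapeutic_index(ld50, ed50):
    """Return the ratio of median lethal dose to median effective dose.

    Both doses must be positive and expressed in the same units.
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


if __name__ == "__main__":
    # Hypothetical doses in mg/kg, used only to illustrate the calculation.
    print(therapeutic_index(500.0, 25.0))
```

A larger ratio indicates a wider margin between the dose that is effective and the dose that is lethal in half of the tested population.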
[0103] The data obtained from the cell culture assays and animal
studies can be used in formulating a range of dosage for use in
humans. The dosage of such compositions lies preferably within a
range of circulating concentrations that include the ED50 with
little or no toxicity. The dosage may vary within this range
depending upon the dosage form employed and the route of
administration utilized. For any composition used in the method of
the invention, the therapeutically effective dose can be estimated
initially from cell culture assays. A dose may be formulated in
animal models to achieve a circulating plasma concentration range
that includes the IC50 (i.e., the concentration of the test
compound which achieves a half-maximal inhibition of symptoms) as
determined in cell culture. Such information can be used to more
accurately determine useful doses in humans. Levels in plasma may
be measured, for example, by high performance liquid
chromatography.
[0104] As described, a therapeutically effective amount of a
composition (i.e., an effective dosage) means an amount sufficient
to produce a therapeutically (e.g., clinically) desirable result.
The compositions can be administered from one or more times per
day to one or more times per week; including once every other day.
The skilled artisan will appreciate that certain factors can
influence the dosage and timing required to effectively treat a
subject, including but not limited to the severity of the disease
or disorder, previous treatments, the general health and/or age of
the subject, and other diseases present. Moreover, treatment of a
subject with a therapeutically effective amount of the compositions
of the invention can include a single treatment or a series of
treatments.
[0105] The compositions described herein are suitable for use in a
variety of drug delivery systems described above. Additionally, in
order to enhance the in vivo serum half-life of the administered
compound, the compositions may be encapsulated, introduced into the
lumen of liposomes, prepared as a colloid, or other conventional
techniques may be employed which provide an extended serum
half-life of the compositions. A variety of methods are available
for preparing liposomes, as described in, e.g., Szoka, et al., U.S.
Pat. Nos. 4,235,871, 4,501,728 and 4,837,028 each of which is
incorporated herein by reference. Furthermore, one may administer
the drug in a targeted drug delivery system, for example, in a
liposome coated with a tissue-specific antibody. The liposomes will
be targeted to and taken up selectively by the organ.
[0106] Also provided are methods of inactivating a retrovirus, for
example, a lentivirus such as a human immunodeficiency virus, a
simian immunodeficiency virus, a feline immunodeficiency virus, or
a bovine immunodeficiency virus, in a mammalian cell. The human
immunodeficiency virus can be HIV-1 or HIV-2. The human
immunodeficiency virus can be a chromosomally integrated provirus.
The mammalian cell can be any cell type infected by HIV, including,
but not limited to CD4+ lymphocytes, macrophages, fibroblasts,
monocytes, T lymphocytes, B lymphocytes, natural killer cells,
dendritic cells such as Langerhans cells and follicular dendritic
cells, hematopoietic stem cells, endothelial cells, brain
microglial cells, and gastrointestinal epithelial cells. Such cell
types include those cell types that are typically infected during a
primary infection, for example, a CD4+ lymphocyte, a macrophage, or
a Langerhans cell, as well as those cell types that make up latent
HIV reservoirs, i.e., a latently infected cell.
[0107] The methods can include exposing the cell to a composition
comprising an isolated nucleic acid encoding a gene editing complex
comprising a CRISPR-associated endonuclease and one or more guide
RNAs wherein the guide RNA is complementary to a target nucleic
acid sequence in the retrovirus. In a preferred embodiment, as
previously described, the method of inactivating a proviral DNA
integrated into the genome of a host cell latently infected with a
retrovirus includes the steps of treating the host cell with a
composition comprising a CRISPR-associated endonuclease, and two or
more different guide RNAs (gRNAs), wherein each of the at least two
gRNAs is complementary to a different target nucleic acid sequence
in the proviral DNA; and inactivating the proviral DNA. The at
least two gRNAs can be configured as a single sequence or as a
combination of one or more different sequences, e.g., a multiplex
configuration. Multiplex configurations can include combinations of
two, three, four, five, six, seven, eight, nine, ten, or more
different gRNAs, for example any combination of sequences in U3, R,
or U5. In some embodiments, combinations of LTR A, LTR B, LTR C and
LTR D can be used. In some embodiments, combinations of any of the
sequences LTR A (SEQ ID NO: 96), LTR B (SEQ ID NO: 121), LTR C (SEQ
ID NO: 87), and LTR D (SEQ ID NO: 110), can be used. In experiments
described in the Examples, the use of two different gRNAs caused
the excision of the viral sequences between the cleavage sites
recognized by the CRISPR endonuclease. The excised region can
include the entire HIV-1 genome. The treating step can take place
in vivo, that is, the compositions can be administered directly to
a subject having HIV infection. The methods are not so limited
however, and the treating step can take place ex vivo. For example,
a cell or plurality of cells, or a tissue explant, can be removed
from a subject having an HIV infection and placed in culture, and
then treated with a composition comprising a CRISPR-associated
endonuclease and a guide RNA wherein the guide RNA is complementary
to the nucleic acid sequence in the human immunodeficiency virus.
As described above, the composition can be a nucleic acid encoding
a CRISPR-associated endonuclease and a guide RNA wherein the guide
RNA is complementary to the nucleic acid sequence in the human
immunodeficiency virus; an expression vector comprising the nucleic
acid sequence; or a pharmaceutical composition comprising a nucleic
acid encoding a CRISPR-associated endonuclease and a guide RNA
wherein the guide RNA is complementary to the nucleic acid sequence
in the human immunodeficiency virus; or an expression vector
comprising the nucleic acid sequence. In some embodiments, the gene
editing complex can comprise a CRISPR-associated endonuclease
polypeptide and a guide RNA wherein the guide RNA is complementary
to the nucleic acid sequence in the human immunodeficiency
virus.
[0108] Regardless of whether compositions are administered as
nucleic acids or polypeptides, they are formulated in such a way as
to promote uptake by the mammalian cell. Useful vector systems and
formulations are described above. In some embodiments the vector
can deliver the compositions to a specific cell type. The invention
is not so limited, however, and other methods of DNA delivery, such
as chemical transfection using, for example, calcium phosphate,
DEAE-dextran, liposomes, lipoplexes, surfactants, and
perfluorochemical liquids, are also contemplated, as are physical
delivery methods, such as electroporation, microinjection,
ballistic particles, and "gene gun" systems.
[0109] Standard methods, for example, immunoassays to detect the
CRISPR-associated endonuclease, or nucleic acid-based assays such
as PCR to detect the gRNA, can be used to confirm that the complex
has been taken up and expressed by the cell into which it has been
introduced. The engineered cells can then be reintroduced into the
subject from whom they were derived as described below.
[0110] The gene editing complex comprises a CRISPR-associated
nuclease, e.g., Cas9, and a guide RNA complementary to the
retroviral target sequence, for example, an HIV target sequence.
The gene editing complex can introduce various mutations into the
proviral DNA. The mechanism by which such mutations inactivate the
virus can vary, for example the mutation can affect proviral
replication, viral gene expression or proviral excision. The
mutations may be located in regulatory sequences or structural gene
sequences and result in defective production of HIV. The mutation
can comprise a deletion. The size of the deletion can vary from a
single nucleotide base pair to about 10,000 base pairs. In some
embodiments, the deletion can include all or substantially all of
the proviral sequence. In some embodiments the deletion can include
the entire proviral sequence. The mutation can comprise an
insertion, that is, the addition of one or more nucleotide base
pairs to the proviral sequence. The size of the inserted sequence
also may vary, for example from about one base pair to about 300
nucleotide base pairs. The mutation can comprise a point mutation,
that is, the replacement of a single nucleotide with another
nucleotide. Useful point mutations are those that have functional
consequences, for example, mutations that result in the conversion
of an amino acid codon into a termination codon or that result in
the production of a nonfunctional protein.
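The functional consequence of a point mutation described above, the conversion of an amino acid codon into a termination codon, can be illustrated with a short sketch. The helper names and the nine-base open reading frame are hypothetical; the three stop codons are those of the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def point_mutate(seq, pos, new_base):
    """Replace the single base at 0-based position pos with new_base."""
    seq = seq.upper()
    return seq[:pos] + new_base.upper() + seq[pos + 1:]


def first_stop_codon(orf):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    orf = orf.upper()
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


if __name__ == "__main__":
    toy_orf = "ATGTGGAAA"  # hypothetical frame: Met-Trp-Lys
    mutated = point_mutate(toy_orf, 5, "A")  # TGG (Trp) becomes TGA (stop)
    print(first_stop_codon(toy_orf), first_stop_codon(mutated))
```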
[0111] In exemplary multiplex methods for inactivating proviral DNA
integrated into the genome of a host cell, as demonstrated in
Examples 2-5, two different gRNA sequences are deployed, with each
gRNA sequence targeting a different site in the proviral DNA. That
is, the methods include the steps of exposing the host cell to a
composition including an isolated nucleic acid encoding a
CRISPR-associated endonuclease; an isolated nucleic acid sequence
encoding a first gRNA having a first spacer sequence that is
complementary to a first target protospacer sequence in a proviral
DNA; and an isolated nucleic acid encoding a second gRNA having a
second spacer sequence that is complementary to a second target
protospacer sequence in the proviral DNA; expressing in the host
cell the CRISPR-associated endonuclease, the first gRNA, and the
second gRNA; assembling, in the host cell, a first gene editing
complex including the CRISPR-associated endonuclease and the first
gRNA; and a second gene editing complex including the
CRISPR-associated endonuclease and the second gRNA; directing the
first gene editing complex to the first target protospacer sequence
by complementary base pairing between the first spacer sequence and
the first target protospacer sequence; directing the second gene
editing complex to the second target protospacer sequence by
complementary base pairing between the second spacer sequence and
the second target protospacer sequence; cleaving the proviral DNA
at the first target protospacer sequence with the CRISPR-associated
endonuclease; cleaving the proviral DNA at the second target
protospacer sequence with the CRISPR-associated endonuclease; and
inducing at least one mutation in the proviral DNA. The same
multiplex method is readily incorporated into methods for treating
a subject having a human immunodeficiency virus, and for reducing
the risk of a human immunodeficiency virus infection. It will be
understood that the term "composition" can include not only a
mixture of components, but also separate components that are not
necessarily administered simultaneously. As a non-limiting example,
a composition according to the present invention can include
separate component preparations of nucleic acid sequences encoding
a Cas9 nuclease, a first gRNA, and a second gRNA, with each
component being administered sequentially in an infusion, during a
time frame that results in a host cell being exposed to all three
components.
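The two-gRNA cleavage and excision steps recited above can be modeled, purely for illustration, on a plain DNA string. The sketch assumes that each cut falls 3 base pairs upstream of the protospacer-adjacent motif (the behavior reported for S. pyogenes Cas9) and that the cut ends are rejoined without further insertions or deletions; the function names and toy sequences are hypothetical and do not correspond to any HIV sequence.

```python
def cut_position(seq, protospacer, pam_offset=3, search_from_end=False):
    """Return the index of the modeled blunt cut within seq.

    Assumption: the cut lies pam_offset bases upstream of the 3' end of the
    protospacer, i.e. 3 bp upstream of the PAM for S. pyogenes Cas9.
    """
    idx = seq.rfind(protospacer) if search_from_end else seq.find(protospacer)
    if idx < 0:
        raise ValueError("protospacer not found in sequence")
    return idx + len(protospacer) - pam_offset


def excise(seq, first_protospacer, second_protospacer):
    """Cut at both sites and rejoin; return (rejoined_sequence, excised_fragment)."""
    left = cut_position(seq, first_protospacer)
    right = cut_position(seq, second_protospacer, search_from_end=True)
    if left >= right:
        raise ValueError("second cut must lie downstream of the first cut")
    return seq[:left] + seq[right:], seq[left:right]


if __name__ == "__main__":
    # Toy layout: host DNA, site A + PAM, intervening sequence, site B + PAM, host DNA.
    site_a, site_b = "ACGTACGTAC", "GATTACAGAT"
    toy = "TTTT" + site_a + "TGG" + "CCCCCCCC" + site_b + "AGG" + "AAAA"
    rejoined, fragment = excise(toy, site_a, site_b)
    print(rejoined, fragment, len(fragment))
```

Real repair outcomes are more varied than this model suggests, since end joining frequently introduces small insertions or deletions at the junction.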
[0112] In other embodiments, the compositions comprise a cell which
has been transformed or transfected with one or more Cas/gRNA
vectors. In some embodiments, the methods of the invention can be
applied ex vivo. That is, a subject's cells can be removed from the
body and treated with the compositions in culture to excise HIV
sequences and the treated cells returned to the subject's body. The
cells can be the subject's own cells, or they can be
haplotype-matched cells or a cell line. The cells can be irradiated
to prevent replication. In some embodiments, the cells are human
leukocyte antigen (HLA)-matched, autologous, cell lines, or
combinations thereof. In other embodiments, the cells can be stem
cells, for example, embryonic stem cells (ES cells) or artificial
pluripotent stem cells (induced pluripotent stem cells, iPS cells).
ES cells and iPS cells have been established from many animal
species, including humans. These types of pluripotent stem cells
would be the most useful source of cells for regenerative medicine
because they can differentiate into almost all of the organs upon
appropriate induction of their differentiation, while retaining
their ability to divide actively and maintaining their
pluripotency. iPS cells, in particular, can be established from
self-derived somatic cells, and therefore are not likely to cause
ethical and social issues, in comparison with ES cells, which are
produced by destruction of embryos. Further, iPS cells, which are
self-derived cells, make it possible to avoid rejection reactions,
which are the biggest obstacle to regenerative medicine or
transplantation therapy.
[0113] The gRNA expression cassette can be easily delivered to a
subject by methods known in the art, for example, methods which
deliver siRNA. In some aspects, the Cas may be a fragment wherein
the active domains of the Cas molecule are included, thereby
cutting down on the size of the molecule. Thus, the Cas9/gRNA
molecules can be used clinically, similar to the approaches taken
by current gene therapy. In particular, stem cells or iPS cells
stably expressing Cas9 and multiplex gRNAs will be developed for
cell transplantation therapy, as well as HIV-1 vaccination, in
subjects.
[0114] Transduced cells are prepared for reinfusion according to
established methods. After a period of about 2-4 weeks in culture,
the cells may number between 1×10^6 and
1×10^10. In this regard, the growth characteristics of
cells vary from patient to patient and from cell type to cell type.
About 72 hours prior to reinfusion of the transduced cells, an
aliquot is taken for analysis of phenotype, and percentage of cells
expressing the therapeutic agent. For administration, cells of the
present invention can be administered at a rate determined by the
LD50 of the cell type, and the side effects of the cell type
at various concentrations, as applied to the mass and overall
health of the patient. Administration can be accomplished via
single or divided doses. Adult stem cells may also be mobilized
using exogenously administered factors that stimulate their
production and egress from tissues or spaces that may include, but
are not restricted to, bone marrow or adipose tissues.
[0115] Articles of Manufacture
[0116] The compositions described herein can be packaged in
suitable containers labeled, for example, for use as a therapy to
treat a subject having a retroviral infection, for example, an HIV
infection or a subject at risk for contracting a retroviral infection,
for example, an HIV infection. The containers can include a
composition comprising a nucleic acid sequence encoding a
CRISPR-associated endonuclease, for example, a Cas9 endonuclease,
and a guide RNA complementary to a target sequence in a human
immunodeficiency virus, or a vector encoding that nucleic acid, and
one or more of a suitable stabilizer, carrier molecule, flavoring,
and/or the like, as appropriate for the intended use. Accordingly,
packaged products (e.g., sterile containers containing one or more
of the compositions described herein and packaged for storage,
shipment, or sale at concentrated or ready-to-use concentrations)
and kits, including at least one composition of the invention,
e.g., a nucleic acid sequence encoding a CRISPR-associated
endonuclease, for example, a Cas9 endonuclease, and a guide RNA
complementary to a target sequence in a human immunodeficiency
virus, or a vector encoding that nucleic acid and instructions for
use, are also within the scope of the invention. A product can
include a container (e.g., a vial, jar, bottle, bag, or the like)
containing one or more compositions of the invention. In addition,
an article of manufacture further may include, for example,
packaging materials, instructions for use, syringes, delivery
devices, buffers or other control reagents for treating or
monitoring the condition for which prophylaxis or treatment is
required.
[0117] In some embodiments, the kits can include one or more
additional antiretroviral agents, for example, a reverse
transcriptase inhibitor, a protease inhibitor or an entry
inhibitor. The additional agents can be packaged together in the
same container as a nucleic acid sequence encoding a
CRISPR-associated endonuclease, for example, a Cas9 endonuclease,
and a guide RNA complementary to a target sequence in a human
immunodeficiency virus, or a vector encoding that nucleic acid, or
they can be packaged separately. The nucleic acid sequence encoding
a CRISPR-associated endonuclease, for example, a Cas9 endonuclease,
and a guide RNA complementary to a target sequence in a human
immunodeficiency virus, or a vector encoding that nucleic acid, and
the additional agent may be combined just before use or
administered separately.
[0118] The product may also include a legend (e.g., a printed label
or insert or other medium describing the product's use (e.g., an
audio- or videotape)). The legend can be associated with the
container (e.g., affixed to the container) and can describe the
manner in which the compositions therein should be administered
(e.g., the frequency and route of administration), indications
therefor, and other uses. The compositions can be ready for
administration (e.g., present in dose-appropriate units), and may
include one or more additional pharmaceutically acceptable
adjuvants, carriers or other diluents and/or an additional
therapeutic agent. Alternatively, the compositions can be provided
in a concentrated form with a diluent and instructions for
dilution.
Example 1: Materials and Methods
[0119] Plasmid preparation: Vectors containing human Cas9 and gRNA
expression cassette, pX260, and pX330 (Addgene) were utilized to
create various constructs, LTR-A, B, C, and D.
[0120] Cell culture and stable cell lines: TZM-bl reporter and U1
cell lines were obtained from the NIH AIDS Reagent Program and
CHME5 microglial cells are known in the art.
[0121] Immunocytochemistry and Western Blot: Standard methods for
immunocytochemical observation of the cells and evaluation of
protein expression by Western blot were utilized.
[0122] Firefly-luciferase assay: Cells were lysed 24 h
post-treatment using Passive Lysis Buffer (Promega) and assayed
with a Luciferase Reporter Gene Assay kit (Promega) according to
the manufacturer's protocol. Luciferase activity was normalized to
the number of cells determined by a parallel MTT assay (Vybrant,
Invitrogen).
[0123] p24 ELISA: After infection or reactivation, the levels of
HIV-1 viral load in the supernatants were quantified by p24 Gag
ELISA (Advanced BioScience Laboratories, Inc) following the
manufacturer's protocol. To assess cell viability upon treatments,
MTT assay was performed in parallel according to the manufacturer's
manual (Vybrant, Invitrogen).
[0124] EGFP Flow cytometry: Cells were trypsinized, washed with PBS
and fixed in 2% paraformaldehyde for 10 min at room temperature,
then washed twice with PBS and analyzed using a Guava EasyCyte Mini
flow cytometer (Guava Technologies).
[0125] HIV-1 reporter virus preparation and infections: HEK293T
cells were transfected using Lipofectamine 2000 reagent
(Invitrogen) with pNL4-3-.DELTA.E-EGFP (NIH AIDS Research and
Reference Reagent Program). After 48 h, the supernatant was
collected, 0.45 .mu.m filtered and titered in HeLa cells using
EGFP as an infection marker. For viral infection, stable Cas9/gRNA
TZM-bl cells were incubated 2 h with diluted viral stock, and then
washed twice with PBS. At 2 and 4 d post-infection, cells were
collected, fixed and analyzed by flow cytometry for EGFP
expression, or genomic DNA purification was performed for PCR and
whole genome sequencing.
[0126] Genomic DNA amplification, PCR, TA-cloning, and Sanger
sequencing, GenomeWalker link PCR: Standard methods for DNA
manipulation for cloning and sequencing were utilized. For
identification of the integration sites of HIV-1, a Lenti-X.TM.
Integration Site Analysis kit was used.
[0127] Surveyor assay: The presence of mutations in PCR products
was examined using a SURVEYOR Mutation Detection Kit (Transgenomic)
according to the protocol from the manufacturer. Briefly,
heterogeneous PCR product was denatured for 10 min at 95.degree. C.
and hybridized by gradual cooling using a thermocycler. Next, 300
ng of hybridized DNA (9 .mu.l) was subjected to digestion with 0.25
.mu.l of SURVEYOR Nuclease in the presence of 0.25 .mu.l SURVEYOR
Enhancer S and 15 mM MgCl.sub.2 for 4 h at 42.degree. C. Then Stop
Solution was added and samples were resolved in 2% agarose gel
together with equal amounts of undigested PCR product controls.
[0128] Some PCR products were used for restriction fragment length
polymorphism analysis. Equal amounts of the PCR products were
digested with BsaJI. Digested DNA was separated on an ethidium
bromide-containing agarose gel (2%). For sequencing, PCR products
were cloned using a TA Cloning.RTM. Kit Dual Promoter with pCR.TM.
II vector (Invitrogen). The insert was confirmed by digestion with
EcoRI and positive clones were sent to Genewiz for Sanger
sequencing.
[0129] Selection of LTR target sites, whole genome sequencing and
bioinformatics and statistical analysis. We utilized Jack Lin's
CRISPR/Cas9 gRNA finder tool for initial identification of
potential target sites within the LTR.
[0130] Plasmid preparation. DNA segment expressing LTR-A or LTR-B
for pre-crRNA was cloned into the pX260 vector that contains the
puromycin selection gene (Addgene, plasmid #42229). DNA segments
expressing LTR-C or LTR-D for the chimeric crRNA-tracrRNA were
cloned into the pX330 vector (Addgene, plasmid #42230). Both
vectors contain a humanized Cas9 coding sequence driven by a CAG
promoter and a gRNA expression cassette driven by a human U6
promoter. The vectors were digested with BbsI and treated with
Antarctic Phosphatase, and the linearized vector was purified with
a Quick nucleotide removal kit (Qiagen). A pair of oligonucleotides
for each targeting site (FIG. 14, AlphaDNA) was annealed,
phosphorylated, and ligated to the linearized vector. The gRNA
expression cassette was sequenced with U6 sequencing primer (FIG.
14) at Genewiz. For pX330 vectors, we designed a pair of universal
PCR primers with overhang digestion sites (FIG. 14) that can tease
out the gRNA expression cassette (U6-gRNA-crRNA-stem-tracrRNA) for
direct transfection or subcloning to other vectors.
[0131] Cell culture. TZM-bl reporter cell line from Dr. John C.
Kappes, Dr. Xiaoyun Wu and Tranzyme Inc., U1/HIV-1 cell line from Dr.
Thomas Folks and J-Lat full length clone from Dr. Eric Verdin were
obtained through the NIH AIDS Reagent Program, Division of AIDS,
NIAID, NIH. The CHME5/HIV fetal microglia cell line was generated as
previously described. TZM-bl and CHME5 cells were cultured in
Dulbecco's minimal essential medium high glucose supplemented with
10% heat-inactivated fetal bovine serum (FBS) and 1%
penicillin/streptomycin. U1 and J-Lat cells were cultured in RPMI
1640 containing 2.0 mM L-glutamine, 10% FBS and 1%
penicillin/streptomycin.
[0132] Stable cell lines and subcloning. TZM-bl or CHME5/HIV cells
were seeded in 6-well plates at 1.5.times.10.sup.5 cells/well and
transfected using Lipofectamine 2000 reagent (Invitrogen) with 1
.mu.g of pX260 (for LTR-A and B) or 1 .mu.g/0.1 .mu.g of
pX330/pX260 (for LTR-C and D) plasmids. The next day, cells were
transferred into 100-mm dishes and incubated with growth medium
containing 1 .mu.g/ml of puromycin (Sigma). Two weeks later,
surviving cell colonies were isolated using cloning cylinders
(Corning). U1 cells (1.5.times.10.sup.5) were electroporated with 1
.mu.g of DNA using a 10 .mu.l tip and 3.times.10 ms 1400 V pulses on
the Neon.TM. Transfection System (Invitrogen). Cells were selected
with 0.5 .mu.g/ml of puromycin for two weeks. The stable clones
were subcultured using a limiting dilution method in 96-well plates
and single cell-derived subclones were maintained for further
studies.
[0133] Immunocytochemistry and western blot. The Cas9/gRNA stable
expression TZM-bl cells were cultured in 8-well chamber slides for
2 days and fixed for 10 min in 4% paraformaldehyde/PBS. After three
rinses, the cells were treated with 0.5% Triton X-100/PBS for 20
min and blocked in 10% donkey serum for 1 h. Cells were incubated
overnight at 4.degree. C. with mouse anti-Flag M2 primary antibody
(1:500, Sigma). After rinsing three times, cells were incubated for
1 h with donkey anti-mouse Alexa-Fluor-594 secondary antibodies,
and incubated with Hoechst 33258 for 5 min. After three rinses with
PBS, the cells were coverslipped with anti-fading aqueous mounting
media (Biomeda) and analyzed under a Leica DMI6000B fluorescence
microscope.
[0134] TZM-bl cells cultured in 6-well plate were solubilized in
200 .mu.l of Triton X-100-based lysis buffer containing 20 mM
Tris-HCl (pH 7.4), 1% Triton X-100, 5 mM ethylenediaminetetraacetic
acid, 5 mM dithiothreitol, 150 mM NaCl, 1 mM phenylmethylsulfonyl
fluoride, 1.times. nuclear extraction proteinase inhibitor cocktail
(Cayman Chemical, Ann Arbor, Mich.), 1 mM sodium orthovanadate and
30 mM NaF. Cell lysates were rotated at 4.degree. C. for 30 min.
Nuclear and cellular debris was cleared by centrifugation at 20,000
g for 20 min at 4.degree. C. Equal amounts of lysate proteins (20
.mu.g) were denatured by boiling for 5 min in sodium dodecyl
sulphate (SDS) sample buffer, fractionated by SDS-polyacrylamide
gel electrophoresis in tris-glycine buffer, and transferred to
nitrocellulose membrane (BioRad). The SeeBlue prestained standards
(Invitrogen) were used as a molecular weight reference. Blots were
blocked in 5% BSA/tris-buffered saline (pH 7.6) plus 0.1% Tween-20
(TBS-T) for 1 h and then incubated overnight at 4.degree. C. with
mouse anti-Flag M2 monoclonal antibody (1:1000, Sigma) or mouse
anti-GAPDH monoclonal antibody (1:3000, Santa Cruz Biotechnology).
After washing with TBS-T, the blots were incubated with IRDye
680LT-conjugated anti-mouse antibody for 1 h at room temperature.
Membranes were scanned and analyzed using an Odyssey Infrared
Imaging System (LI-COR Biosciences).
[0135] Firefly-luciferase assay. Cells were lysed 24 h
post-treatment using Passive Lysis Buffer (Promega) and assayed
with a Luciferase Reporter Gene Assay kit (Promega) according to
the protocol of the manufacturer. Luciferase activity was
normalized to the number of cells determined by parallel MTT assay
(Vybrant, Invitrogen).
[0136] p24 ELISA. After infection or reactivation, the HIV-1 viral
load levels in the supernatants were quantified by p24 Gag ELISA
(Advanced BioScience Laboratories, Inc) following the
manufacturer's protocol. To assess the cell viability upon
treatments, MTT assay was performed in parallel according to the
manufacturer's protocol (Vybrant, Invitrogen).
[0137] EGFP Flow cytometry. Cells were trypsinized, washed with PBS
and fixed in 2% paraformaldehyde for 10 min at room temperature,
then washed twice with PBS and analyzed using a Guava EasyCyte Mini
flow cytometer (Guava Technologies).
[0138] HIV-1 reporter virus preparation and infections. HEK293T
cells were transfected using Lipofectamine 2000 reagent
(Invitrogen) with pNL4-3-.DELTA.E-EGFP, SF162 and JRFL (NIH AIDS
Research and Reference Reagent Program). For pseudotyped
pNL4-3-.DELTA.E-EGFP, the VSVG vector was cotransfected. After 48
h, the supernatant was collected, 0.45 .mu.m filtered and titered
in HeLa cells using expressed EGFP as an infection marker. For
viral infection, stable Cas9/gRNA TZM-bl cells were incubated 2 h
with a diluted viral stock, and washed twice with PBS. At 2 and 4
days post-infection, cells were collected, fixed and analyzed by
flow cytometry for EGFP expression, or genomic DNA purification was
performed for PCR and whole genome sequencing.
[0139] Genomic DNA purification, PCR, TA-cloning and Sanger
sequencing. Genomic DNA was isolated from cells using an
ArchivePure DNA cell/tissue purification kit (5PRIME) according to
the protocol recommended by the manufacturer. One hundred ng of
extracted DNA was subjected to PCR with a high-fidelity FailSafe
PCR kit (Epicentre) using the primers listed in FIG. 14. Standard
three-step PCR was carried out for 30 cycles with 55.degree. C.
annealing and 72.degree. C. extension. The products were resolved
in 2% agarose gel. The bands of interest were gel-purified and
cloned into pCRII T-A vector (Invitrogen), and the nucleotide
sequence of individual clones was determined by sequencing at
Genewiz using universal T7 and/or SP6 primers.
[0140] Conventional and real-time reverse transcription (RT)-PCR.
For total RNA extraction, cells were processed with an RNeasy Mini
kit (Qiagen) as per manufacturer's instructions. The potentially
residual genomic DNA was removed through on-column DNase digestion
with an RNase-Free DNase Set (Qiagen). One .mu.g of RNA for each
sample was reverse transcribed into cDNA using random
hexanucleotide primers with a High Capacity cDNA Reverse
Transcription Kit (Invitrogen, Grand Island, N.Y.). Conventional
PCR was performed using a standard protocol. Quantitative PCR
(qPCR) analyses were carried out in a LightCycler480 (Roche) using
a SYBR.RTM. Green PCR Master Mix Kit (Applied Biosystems). The RT
reactions were diluted to 5 ng of total RNA per microliter of
reaction, and 2 .mu.l was used in a 20-.mu.l PCR reaction. For qPCR
analysis of HIV-1 proviruses, 50 ng of genomic DNA was used. The
primers were synthesized by AlphaDNA and are shown in FIG. 14. The
primers for human housekeeping genes GAPDH and RPL13A were obtained
from RealTimePrimers (Elkins Park, Pa.). Each sample was tested in
triplicate. Cycle threshold (Ct) values were obtained graphically
for the target genes and house-keeping genes. The difference in Ct
values between the housekeeping gene and target gene was
represented as .DELTA.Ct values. The .DELTA..DELTA.Ct values were
obtained by subtracting the .DELTA.Ct values of control samples
from those of experimental samples. Relative fold or percentage
change was calculated as 2.sup.-.DELTA..DELTA.Ct. In some cases,
absolute quantification was performed using the
pNL4-3-.DELTA.E-EGFP plasmid spiked in human genomic DNA as a
standard. The number of HIV-1 viral copies was calculated from the
standard curve after normalization to the housekeeping gene.
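The relative quantification described in this paragraph can be
illustrated with a minimal Python sketch. It assumes that .DELTA.Ct
is the target-gene Ct minus the housekeeping-gene Ct (the text does
not state the direction of subtraction) and that triplicate Ct
values are averaged before subtraction. The function names and the
Ct values are hypothetical and are not data from the experiments.

```python
from statistics import mean

def delta_ct(target_cts, housekeeping_cts):
    """Delta-Ct for one sample: mean target Ct minus mean housekeeping Ct.

    The direction of subtraction is an assumption; the text only says
    'the difference in Ct values'.
    """
    return mean(target_cts) - mean(housekeeping_cts)

def relative_fold_change(exp_target, exp_hk, ctrl_target, ctrl_hk):
    """Relative fold change computed as 2 ** (-delta-delta-Ct)."""
    ddct = delta_ct(exp_target, exp_hk) - delta_ct(ctrl_target, ctrl_hk)
    return 2.0 ** (-ddct)

# Hypothetical triplicate Ct values, for illustration only.
fold = relative_fold_change(
    exp_target=[27.0, 27.0, 27.0], exp_hk=[18.0, 18.0, 18.0],
    ctrl_target=[24.0, 24.0, 24.0], ctrl_hk=[18.0, 18.0, 18.0],
)
print(fold)  # 0.125
```

In this made-up case the target amplifies three cycles later in the
experimental sample than in the control, which corresponds to
one-eighth of the control level.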
[0141] GenomeWalker link PCR and long-range PCR. The integration
sites of HIV-1 in host cells were identified using a Lenti-X.TM.
Integration Site Analysis kit (Clontech) following the
manufacturer's instruction. Briefly, high quality genomic DNAs were
extracted from U1 cells using a NucleoSpin Tissue kit (Clontech).
To construct the viral integration libraries, each genomic DNA
sample was digested with blunt-end-generating digestion enzymes
DraI, SspI or HpaI separately overnight at 37.degree. C. The
digestion efficiency was verified by electrophoresis on 0.6%
agarose. The digested DNA was purified using a NucleoSpin Gel and
PCR Clean-Up kit followed by ligation of the digested genomic DNA
fragments to GenomeWalker.TM. Adaptor at 16.degree. C. overnight.
The ligation reaction was stopped by incubation at 70.degree. C.
for 5 min and diluted 5 times with TE buffer. The primary PCR was
performed on the DNA segments with adaptor primer 1 (AP1) and
LTR-specific primer 1 (LSP1) using Advantage 2 Polymerase Mix
followed by a secondary (nested) PCR using AP2 and LSP2 primers
(FIG. 14). The secondary PCR products were separated on 1.5%
ethidium bromide-containing agarose gel. The major bands were
gel-purified and cloned into pCRII T-A vector (Invitrogen), and the
nucleotide sequence of individual clones was determined by
sequencing at Genewiz using universal T7 and SP6 primers. The
sequence reads were analyzed by NCBI BLAST searching. Two
integration sites of HIV-1 in U1 cells were identified in
chromosomes X and 2. A pair of primers covering each integration
site (FIG. 14) was synthesized by AlphaDNA. Long-range PCR using
the U1 genomic DNA was performed with a Phusion High-Fidelity PCR
kit (New England Biolabs) following the manufacturer's protocol.
The PCR products were visualized on 1% agarose gel and validated by
Sanger sequencing.
[0142] Surveyor assay. The presence of mutations in PCR products
was tested using a SURVEYOR Mutation Detection Kit (Transgenomic)
according to the protocol of the manufacturer. Briefly,
heterogeneous PCR products were denatured for 10 min at 95.degree.
C. and hybridized by gradual cooling using a thermocycler. Next, 300
ng of hybridized DNA (9 .mu.l) was subjected to digestion with 0.25
.mu.l of SURVEYOR Nuclease in the presence of 0.25 .mu.l SURVEYOR
Enhancer S and 15 mM MgCl.sub.2 for 4 h at 42.degree. C. Then Stop
Solution was added and samples were resolved in 2% agarose gel
together with equal amounts of undigested PCR products.
[0143] Some PCR products were used for restriction fragment length
polymorphism analysis. Equal amounts of PCR products were digested
with BsaJI. Digested DNA was separated on an ethidium
bromide-containing agarose gel (2%). For sequencing, PCR products
were cloned using a TA Cloning.RTM. Kit Dual Promoter with pCR.TM.
II vector (Invitrogen). The insert was confirmed by digestion with
EcoRI and positive clones were sent to Genewiz for Sanger
sequencing.
[0144] Selection of LTR target sites and prediction of potential
off-target sites. For initial studies, we obtained the LTR promoter
sequence (-411 to -10) of the integrated lentiviral LTR-luciferase
reporter by TA-cloning sequencing of PCR products from the genome
of human TZM-bl cells because of potential mutation of LTR during
passaging. This promoter sequence has 100% match to the 5'-LTR of
pHR'-CMV-LacZ lentiviral vector (AF105229). Thus, sense and
antisense sequences of the full-length pHR' 5'-LTR (634 bp) were
utilized to search for Cas9/gRNA target sites containing 20 bp gRNA
targeting sequence plus the PAM sequence (NRG) using Jack Lin's
CRISPR/Cas9 gRNA finder tool. The number of potential off-targets
with exact match was predicted by blasting each gRNA targeting
sequence plus NRG (AGG, TGG, GGG and CGG; AAG, TAG, GAG, CAG)
against all available human genomic and transcript sequences using
the NCBI/blastn suite with E-value cutoff 1,000 and word size 7.
The BLAST output was then searched (Control+F) with the target
sequence (nucleotides 1-23 through 9-23) to count the genomic
targets with 100% match to the target sequence. The number of
off-targets for each search was divided by 3 because of the
repeated genome library.
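The target-site search described in this paragraph (a 20-bp
targeting sequence followed by an NRG PAM, scanned on both sense and
antisense strands) can be sketched in Python as follows. This is an
illustrative stand-in, not the gRNA finder tool named in the text;
the function names and the example sequence are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_target_sites(seq, protospacer_len=20):
    """List candidate sites as (strand, start, protospacer, pam).

    A site is a protospacer of the given length immediately followed
    by an NRG PAM (N = any base, R = A or G). A lookahead is used so
    overlapping sites are all reported. For the '-' strand, start is
    the offset within the reverse-complemented sequence.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"(?=([ACGT]{%d})([ACGT][AG]G))" % protospacer_len
    )
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites

# Hypothetical 23-nt input: one 20-bp protospacer plus an AGG PAM.
example = "GATTACAGATTACAGATTAC" + "AGG"
for site in find_target_sites(example):
    print(site)  # ('+', 0, 'GATTACAGATTACAGATTAC', 'AGG')
```

Candidates found this way would still need the off-target screening
described in the text before being used.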
[0145] Whole genome sequencing and bioinformatics analysis. The
control subclone C1 and experimental subclone AB7 of TZM-bl cells
were validated for target cut efficiency and functional suppression
of the LTR-luciferase reporter. The genomic DNA was isolated with
NucleoSpin Tissue kit (Clontech). The DNA samples were submitted to
the NextGen sequencing facility at Temple University Fox Chase
Cancer Center. Duplicated genomic DNA libraries were prepared from
each subclone using a NEBNext Ultra DNA Library Prep Kit for
Illumina (New England Biolabs) following the manufacturer's
instructions. All libraries were sequenced with paired-end 141-bp
reads in two Illumina Rapid Run flowcells on HiSeq 2500 instrument
(Illumina). Demultiplexed read data from the sequenced libraries
were sent to AccuraScience, LLC for professional bioinformatics
analysis. Briefly, the raw reads were mapped against human genome
(hg19) and HIV-1 genome by using Bowtie2. The Genome Analysis
Toolkit (GATK, version 2.8.1) was used for the duplicated read
removal, local alignment, base quality recalibration and indel
calling. The confidence scores 10 and 30 were the thresholds for
low quality (LowQual) and high confidence calling (PASS). The
potential off-target sites of LTR-A and LTR-B with various
mismatches were predicted by NCBI/blastn suite as described above
and by a CRISPR Design Tool. All the potential gRNA target sites
(FIG. 15) were used to map the .+-.300 bp regions around each indel
identified by GATK. The locations of the overlapped regions in the
human genome and HIV-1 genome were compared between the control C1
and experimental AB7.
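The final comparison step, in which the .+-.300 bp region around
each called indel is checked against predicted gRNA target sites and
the overlaps are compared between control and experimental samples,
can be sketched as follows. This is a simplified illustration under
stated assumptions: indels are reduced to (chromosome, position)
pairs, sites to (chromosome, start, end) intervals, and all
coordinates and names are hypothetical rather than taken from the
sequencing data or from the bioinformatics pipeline actually used.

```python
def indels_near_sites(indels, sites, flank=300):
    """Return the set of indels whose +/- flank window overlaps a site.

    indels: iterable of (chrom, pos)
    sites:  iterable of (chrom, start, end), inclusive coordinates
    """
    sites_by_chrom = {}
    for chrom, start, end in sites:
        sites_by_chrom.setdefault(chrom, []).append((start, end))
    hits = set()
    for chrom, pos in indels:
        lo, hi = pos - flank, pos + flank
        for start, end in sites_by_chrom.get(chrom, ()):
            if start <= hi and end >= lo:  # interval overlap test
                hits.add((chrom, pos))
                break
    return hits

def compare_samples(control_hits, experimental_hits):
    """Partition overlapping indels by the sample(s) they occur in."""
    return {
        "both": control_hits & experimental_hits,
        "control_only": control_hits - experimental_hits,
        "experimental_only": experimental_hits - control_hits,
    }

# Hypothetical predicted sites and indel calls.
sites = [("chr1", 1000, 1022), ("HIV-1", 100, 129)]
control = indels_near_sites([("chr1", 1300), ("chr1", 5000)], sites)
edited = indels_near_sites([("chr1", 1300), ("HIV-1", 120)], sites)
result = compare_samples(control, edited)
print(result["experimental_only"])  # {('HIV-1', 120)}
```

Under this scheme, an indel that overlaps a predicted site only in
the experimental sample is a candidate gRNA-induced event, whereas
one present in both samples is not attributable to editing.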
[0146] Statistical analysis. The quantitative data represent
mean.+-.standard deviation from 3-5 independent experiments, and
were evaluated by Student's t-test or ANOVA and Newman-Keuls
multiple comparison test. A p value of <0.05 or <0.01 was
considered statistically significant.
Example 2: Cas9/LTR-gRNA Suppresses HIV-1 Reporter Virus Production
in CHME5 Microglial Cells Latently Infected with HIV-1
[0147] We assessed the ability of HIV-1-directed guide RNAs (gRNAs)
to abrogate LTR transcriptional activity and eradicate proviral DNA
from the genomes of latently-infected myeloid cells that serve as
HIV-1 reservoirs in the brain, a particularly intractable target
population. Our strategy was focused on targeting the HIV-1 LTR
promoter U3 region. By bioinformatic screening and
efficiency/off-target prediction, we identified four gRNA targets
(protospacers; LTRs A-D) that avoid conserved transcription factor
binding sites, minimizing the likelihood of altering host gene
expression (FIG. 5 and FIG. 13). We inserted DNA fragments
complementary to gRNAs A-D into a humanized Cas9 expression vector
(A/B in pX260; C/D in pX330) and tested their individual and
combined abilities to alter the integrated HIV-1 genome activity.
We first utilized the microglial cell line CHME5, which harbors
integrated copies of a single round HIV-1 vector that includes the
5' and 3' LTRs, and a gene encoding an enhanced green fluorescent
protein (EGFP) reporter replacing Gag (pNL4-3-.DELTA.Gag-d2EGFP).
Treating CHME5 cells with trichostatin A (TSA), a histone
deacetylase inhibitor, reactivates transcription from the majority
of the integrated proviruses and leads to expression of EGFP and
the remaining HIV-1 proteome. Expression of gRNAs plus Cas9
markedly decreased the fraction of TSA-induced EGFP-positive CHME5
cells (FIG. 1A and FIG. 6). We detected insertion/deletion gene
mutations (indels) for LTRs A-D (FIG. 1B and FIG. 6B) using a Cel I
nuclease-based heteroduplex-specific SURVEYOR assay. Similarly,
expressing gRNAs targeting LTRs C and D in HeLa-derived TZM-bl
cells, which contain stably incorporated HIV-1 LTR copies driving a
firefly-luciferase reporter gene, suppressed viral promoter
activity (FIG. 7A), and elicited indels within the LTR U3 region
(FIG. 7B, FIG. 7C, and FIG. 7D), as demonstrated by SURVEYOR and Sanger
sequencing. Moreover, the combined expression of LTR C/D-targeting
gRNAs in these cells caused excision of the predicted 302-bp viral
DNA sequence, and emergence of the residual 194-bp fragment (FIG.
7E and FIG. 7F).
[0148] Multiplex expression of LTR-A/B gRNAs in mixed clonal CHME5
cells caused deletion of a 190-bp fragment between A and B target
sites and led to indels to various extents (FIG. 1C and FIG. 1D).
Among >20 puromycin-selected stable subclones, we found cell
populations with complete blockade of TSA-induced HIV-1 proviral
reactivation determined by flow cytometry for EGFP (FIG. 1E).
PCR-based analysis for EGFP and HIV-1 Rev response element (RRE) in
the proviral genome validated the eradication of HIV-1 genome (FIG.
1F, FIG. 1G). Furthermore, sequencing of the PCR products revealed
the entire 5'-3' LTR-spanning viral genome was deleted, yielding a
351-bp fragment via a 190-bp excision between cleavage sites A and
B (FIG. 1G and FIG. 8B), and a 682-bp fragment with a 175-bp
insertion and a 27-bp deletion at the LTR-A and -B sites
respectively (FIG. 8C). The residual HIV-1 genome (FIG. 1F, FIG.
1G, and FIG. 1H) may reflect the presence of trace
Cas9/gRNA-negative cells. These results indicate that LTR-targeting
Cas9/gRNAs A/B eradicate the HIV-1 genome and block its
reactivation in latently infected microglial cells.
Example 3: Cas9/LTR-gRNA Efficiently Eradicates Latent HIV-1 Virus
from U1 Monocytic Cells
[0149] The promonocytic U-937 cell subclone U1, an HIV-1 latency
model for infected perivascular macrophages and monocytes, is
chronically HIV-1-infected and exhibits low level constitutive
viral gene expression and replication. GenomeWalker mapping
detected two integrated proviral DNA copies at chromosomes Xp11.4
(FIG. 2A) and 2p21 (FIG. 9A) in U1 cells. A 9935-bp DNA fragment
representing the entire 9709-bp proviral HIV-1 DNA plus a flanking
226-bp X-chromosome-derived sequence (FIG. 2A), and a 10176-bp
fragment containing 9709-bp HIV-1 genome plus its flanking
2-chromosome-derived 467-bp (FIG. 9A, FIG. 9B) were identified by
the long-range PCR analysis of the parental control or empty-vector
(U6-CAG) U1 cells. The 226-bp and 467-bp fragments represent the
predicted segments from the other copies of chromosomes X and 2,
respectively, which lacked the integrated proviral DNA. In U1 cells
expressing LTR-A/B gRNAs and Cas9, we found two additional DNA
fragments of 833 and 670 bp in chromosome X and one additional
1102-bp fragment in chromosome 2. Thus, gRNAs A/B enabled Cas9 to
excise the HIV-1 5'-3' LTR-spanning viral genome segment in both
chromosomes. The 833-bp fragment includes the expected 226-bp from
the host genome and a 607-bp viral LTR sequence with a 27-bp
deletion around the LTR-A site (FIG. 2A and FIG. 2B). The 670-bp
fragment encompassed a 226-bp host sequence and residual 444-bp
viral LTR sequence after 190-bp fragment excision (FIG. 1D), caused
by gRNAs-A/B-guided cleavage at both LTRs (FIG. 2A). The additional
fragments did not emerge via circular LTR integration, because they
were absent in the parental U1 cells, and such a circular LTR viral
genome configuration occurs immediately after HIV-1 infection but
is short lived and intolerant to repeated passaging. These cells
exhibited substantially decreased HIV-1 viral load, shown by the
functional p24 ELISA replication assay (FIG. 2C) and real-time PCR
analysis (FIG. 9C, FIG. 9D). The detectable but low residual viral
load and reactivation may result from cell population heterogeneity
and/or incomplete genome editing. We also validated the ablation of
HIV-1 genome by Cas9/LTR-A/B gRNAs in latently infected J-Lat T
cells harboring integrated HIV-R7/E-/EGFP using flow cytometry
analysis, SURVEYOR assay and PCR genotyping (FIG. 10), supporting
the results of previous reports on HIV-1 proviral deletion in
Jurkat T cells by Cas9/gRNA and ZFN. Taken together, our results
suggest that the multiplex LTR-gRNAs/Cas9 system efficiently
suppresses HIV-1 replication and reactivation in latently
HIV-1-infected "reservoir" (microglial, monocytic and T) cells
typical of human latent HIV-1 infection, and in TZM-bl cells highly
sensitive for detecting HIV-1 transcription and reactivation.
Single or multiplex gRNAs targeting 5'- and 3'-LTRs effectively
eradicated the entire HIV-1 genome.
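The fragment sizes reported in this Example are mutually consistent
under one simple reading: once the provirus between the two LTR cut
sites is excised, the amplicon retains the host flanking sequence
plus a single LTR-equivalent carrying the local deletion. The Python
sketch below checks that arithmetic. It assumes a 634-bp LTR, the
length given in Example 1 for the pHR' 5'-LTR, which also equals 607
+ 27 and 444 + 190 from the text; the function and constant names are
hypothetical. The 1102-bp chromosome 2 fragment is not itemized in
the text and is therefore left out.

```python
PROVIRUS_BP = 9709               # proviral HIV-1 DNA, from the text
LTR_BP = 634                     # assumed LTR length (see lead-in)
FLANK_BP = {"X": 226, "2": 467}  # host flanking sequence, from the text

def full_amplicon(chrom):
    """Long-range PCR product spanning an intact provirus."""
    return PROVIRUS_BP + FLANK_BP[chrom]

def edited_amplicon(chrom, deleted_bp):
    """Product after 5'-3' LTR-spanning excision: flank plus one LTR
    minus whatever was lost at the repaired cut site(s)."""
    return FLANK_BP[chrom] + LTR_BP - deleted_bp

print(full_amplicon("X"))         # 9935
print(full_amplicon("2"))         # 10176
print(edited_amplicon("X", 27))   # 833
print(edited_amplicon("X", 190))  # 670
```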
Example 4: Stable Expression of Cas9 Plus LTR-A/B Vaccinates TZM-bl
Cells Against New HIV-1 Virus Infection
[0150] We next tested whether combined Cas9/LTR gRNAs can immunize
cells against HIV-1 infection using stable Cas9/gRNAs-A and
-B-expressing TZM-bl-based clones (FIG. 3A). Two of 7
puromycin-selected subclones exhibited efficient excision of the
190-bp LTR-A/B site-spanning DNA fragment (FIG. 3B). However, the
remaining 5 subclones exhibited no excision (FIG. 3B) and no indel
mutations as verified by Sanger sequencing. PCR genotyping using
primers targeting Cas9 and U6-LTR showed that none of these
ineffective subclones retained the integrated copies of
Cas9/LTR-A/B gRNA expression cassettes (FIG. 11A, FIG. 11B). As a
result, no expression of full-length Cas9 was detected (FIG. 11C,
FIG. 11D). The long-term expression of Cas9/LTR-A/B gRNAs did not
adversely affect cell growth or viability, suggesting a low
occurrence of off-target interference with the host genome or
Cas9-induced toxicity in this model. We assessed de novo HIV-1
replication by infecting cells with the VSVG-pseudotyped
pNL4-3-.DELTA.E-EGFP reporter virus, with EGFP-positivity by flow
cytometry indicating HIV-1 replication. Unlike the control U6-CAG
cells, the cells stably expressing Cas9/gRNAs LTRs-A/B failed to
support HIV-1 replication at 2 d post infection, indicating that
they were immunized effectively against new HIV-1 infection (FIG.
3C and FIG. 3D). A similar immunity against HIV-1 was observed in
Cas9/LTR-A/B gRNA-expressing cells infected with native T-tropic X4
strain pNL4-3-.DELTA.E-EGFP reporter virus (FIG. 12A) or native
M-tropic R5 strains such as SF162 and JRFL (FIG. 12B, FIG. 12C, and
FIG. 12D).
Example 5: Off-Target Effects of Cas9/LTR-A/B on Human Genome
[0151] The appeal of Cas9/gRNA as an interventional approach rests
on its highly specific on-target indel-producing cleavage, but
multiplex gRNAs could potentially cause host genome mutagenesis and
chromosomal disorders, cytotoxicity, genotoxicity, or oncogenesis.
Fairly low viral-human genome homology reduces this risk, but the
human genome contains numerous endogenous retroviral genomes that
are potentially susceptible to HIV-1-directed gRNAs. Therefore, we
assessed off-target effects of selected HIV-1 LTR gRNAs on the
human genome. Because the 12-14-bp seed sequence nearest the
protospacer-adjacent motif (PAM) region (NGG) is critical for
cleavage specificity, we searched >14-bp seed+NGG, and found no
off-target candidate sites for LTR gRNAs A-D (FIG. 13). It is not
surprising that progressively shorter gRNA segments yielded
increasing numbers of off-target sites 100% matched to corresponding
on-target sequences (i.e., NGG+13 bp yielded 6, 0, 2 and 9
off-target sites, respectively, whereas NGG+12 bp yielded 16, 5, 16
and 29; FIG. 13). From human genomic DNA we obtained a 500-800-bp
sequence covering one of the predicted off-target sites using
high-fidelity PCR, and analyzed the potential mutations by SURVEYOR
and Sanger sequencing. We found no mutations (see representative
off-target sites #1, 5 and 6 in TZM-bl and U1 cells; FIG. 4A).
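The relationship between seed length and the number of exact
off-target matches can be illustrated with a short Python sketch
that counts occurrences of the PAM-proximal seed followed by NGG on
both strands of a sequence. The protospacer and the toy "genome"
below are invented for illustration; the counts in the text came
from searches against the human genome, and this sketch does not
reproduce that search.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def count_seed_matches(genome, protospacer, seed_len):
    """Count exact matches of the last seed_len bases of the
    protospacer followed by an NGG PAM, on both strands.
    Overlapping matches are counted via a lookahead."""
    seed = protospacer.upper()[-seed_len:]
    pattern = re.compile("(?=" + re.escape(seed) + "[ACGT]GG)")
    genome = genome.upper()
    total = 0
    for strand in (genome, reverse_complement(genome)):
        total += sum(1 for _ in pattern.finditer(strand))
    return total

# Hypothetical protospacer and toy sequence.
protospacer = "GATTACAGATTACAGATTAC"
genome = "CCCC" + "ATTACAGATTAC" + "TGG" + "CCCC" + "TACAGATTAC" + "AGG"
print(count_seed_matches(genome, protospacer, 12))  # 1
print(count_seed_matches(genome, protospacer, 10))  # 2
```

Shortening the seed from 12 to 10 bases picks up a second site in
the toy sequence, mirroring the trend reported for FIG. 13.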
[0152] To assess risk of off-target effects comprehensively, we
performed whole genome sequencing (WGS) using the stable Cas9/gRNA
A/B-expressing and control U6-CAG TZM-bl cells (FIG. 4B, FIG. 4C,
and FIG. 4D). We identified 676,105 indels, using a genome analysis
toolkit (GATK, v.2.8.1) with human (hg19) and HIV-1 genomes as
reference sequences. Among the indels, 24% occurred in the U6-CAG
control, 26% in LTR-A/B subclone, and 50% in both (FIG. 4B). Such a
substantial inter-sample indel-calling discrepancy could suggest
off-target effects, but it most likely results from limited
indel-calling confidence, limited WGS coverage (15-30.times.), and
cellular heterogeneity. GATK reported only confidently-identified
indels: some found in the U6-CAG control but not in the LTR-A/B
subclone, and others in the LTR-A/B but not in the U6-CAG. We
expected abundant missing indel calls for both samples due to the
limited WGS coverage. Such limited indel-calling confidence also
implies the possibility of false negatives: missed indels occurring
in LTR-A/B but not U6-CAG controls. Cellular heterogeneity may
reflect variability of Cas9/gRNA editing efficiency and effects of
passaging. Therefore, we tested whether each indel was LTR-A/B
gRNA-induced, by analyzing .+-.300 bp flanking each indel against
LTRs-A/-B-targeted sites of the HIV-1 genome and
predicted/potential gRNA off-target sites of the host genome (FIG.
15). For sequences 100% matched to one containing the seed (12-bp)
plus NRG, we identified only 8 overlapped regions of 92 potential
off-target sites against 676,105 indels: 6 indels occurring in both
samples, and 2 only in the U6-CAG control (FIG. 4C, FIG. 4D). We
also identified 2 indels on HIV-1 LTR that occurred only in the
LTR-A/B subclone but, as expected, not in the U6-CAG control (FIG.
4C). The results suggest that LTR-A/B gRNAs induce the indicated
on-target indels, but no off-target indels, consistent with prior
findings using deep sequencing of PCR products covering
predicted/potential off-target sites.
[0153] Our combined approaches minimized off-target effects while
achieving high efficiency and complete ablation of the genomically
integrated HIV-1 provirus. In addition to an extremely low homology
between the foreign viral genome and host cellular genome including
endogenous retroviral DNA, the key design attributes in our study
included: bioinformatic screening using the strictest 12-bp+NGG
target-selection criteria to exclude off-target human transcriptome
or (even rarely) untranslated-genomic sites; avoiding transcription
factor binding sites within the HIV-1 LTR promoter (potentially
conserved in the host genome); selection of LTR-A- and -B-directed,
30-bp gRNAs and a pre-crRNA system reflecting the original
bacterial immune mechanism to enhance specificity/efficiency vs. a
20-bp gRNA, chimeric crRNA-tracrRNA-based system; and WGS, Sanger
sequencing and SURVEYOR assay, to identify and exclude potential
off-target effects. Indeed, the use of newly developed Cas9
double-nicking and RNA-guided FokI nuclease may further assist
identification of new targets within the various conserved regions
of HIV-1 with reduced off-target effects.
[0154] Our results show that the HIV-1 Cas9/gRNA system has the
ability to target more than one copy of the LTR, which are
positioned on different chromosomes, suggesting that this genome
editing system can alter the DNA sequence of HIV-1 in the cells of
latently infected patients harboring multiple proviral DNAs. To
further ensure high editing efficacy and consistency of our
technology, one may consider the most stable region of the HIV-1 genome
as a target to eradicate HIV-1 in patient samples, which may
harbor more than one strain of HIV-1. Alternatively, one may develop
personalized treatment modalities based on the data from deep
sequencing of the patient-derived viral genome prior to engineering
therapeutic Cas9/gRNA molecules.
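One way to picture the selection of a stable target region from patient-derived sequences is the following Python sketch. It is offered under stated assumptions and is not the method of this disclosure: the variants are assumed to be already aligned, gap-free and of equal length, the function name conserved_windows is hypothetical, and the default width of 23 nt (a 20-nt protospacer plus a 3-nt PAM) is chosen only to mirror the 23-nt entries of the sequence listing.

```python
def conserved_windows(aligned_variants, width=23):
    """Return 0-based starts of windows that are identical in every variant.

    Assumes equal-length, gap-free, pre-aligned sequences (an assumption
    of this sketch; real deep-sequencing data would need alignment first).
    """
    if not aligned_variants:
        return []
    seqs = [s.upper() for s in aligned_variants]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("variants must be aligned to the same length")

    # conserved[i] is True when all variants share the same base at column i.
    conserved = [len({s[i] for s in seqs}) == 1 for i in range(length)]

    starts = []
    run = 0  # length of the current run of conserved columns
    for i, ok in enumerate(conserved):
        run = run + 1 if ok else 0
        if run >= width:
            starts.append(i - width + 1)
    return starts
```

Windows returned by such a scan would still need to be filtered for a usable PAM and screened against the host genome before being considered as candidate targets.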
[0155] Our results also demonstrate that Cas9/gRNA genome editing
can be used to immunize cells against HIV-1 infection. The
preventative vaccination is independent of HIV-1 strain diversity
because the system targets genomic sequences regardless of how the
viruses enter the infected cells. The preexistence of the Cas9/gRNA
system in cells led to rapid elimination of the new HIV-1 before
it integrated into the host genome. One may explore various systems
for delivery of Cas9/LTR-gRNA for immunizing high-risk subjects,
e.g., gene therapies (viral vector and nanoparticle) and
transplantation of autologous Cas9/gRNA-modified bone marrow
stem/progenitor cells or induced pluripotent stem cells for
eradicating HIV-1 infection.
[0156] Here, we demonstrated the high specificity of Cas9/gRNAs in
editing the HIV-1 target genome. Results from subclone data revealed
the strict dependence of genome editing on the presence of both
Cas9 and gRNA. Moreover, a single-nucleotide mismatch in the
designed gRNA target is sufficient to disable editing. In addition,
all 4 of our designed LTR gRNAs worked well in different cell
lines, indicating that editing is more efficient in the HIV-1
genome than in the host cellular genome, where not all designed
gRNAs are functional, possibly because of differences in epigenetic
regulation, variable genome accessibility, or other reasons. Given
the ease and rapidity of Cas9/gRNA development, even if HIV-1
mutations confer resistance to one Cas9/gRNA-based therapy, as
described above, HIV-1 variants can be genotyped to enable another
personalized therapy for individual patients.
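The genotype-then-redesign idea can be illustrated with a minimal Python sketch. It simply compares a designed target with the corresponding site genotyped from a variant and flags any mismatch, following the observation above that a single mismatch can disable editing. The function names and the zero-mismatch threshold are assumptions for illustration, and the example site is the 23-nt sequence gattggcagaactacacaccagg from the sequence listing.

```python
def mismatch_positions(designed_target: str, variant_site: str):
    """Return 0-based positions where the variant differs from the design."""
    if len(designed_target) != len(variant_site):
        raise ValueError("sequences must have the same length")
    pairs = zip(designed_target.upper(), variant_site.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


def needs_redesign(designed_target: str, variant_site: str) -> bool:
    """Flag a variant for gRNA redesign if it carries any mismatch.

    The any-mismatch threshold is an assumption of this sketch.
    """
    return len(mismatch_positions(designed_target, variant_site)) > 0


designed = "GATTGGCAGAACTACACACCAGG"
variant = "GATTGGCAGAGCTACACACCAGG"  # hypothetical variant, A-to-G at index 10
print(mismatch_positions(designed, variant))  # [10]
print(needs_redesign(designed, variant))      # True
```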
[0157] A number of embodiments of the invention have been
described. Nevertheless, it will be understood that various
modifications may be made without departing from the spirit and
scope of the invention. Accordingly, other embodiments are within
the scope of the following claims.
Sequence CWU 1
1
389130DNAHuman immunodeficiency virus 1 1gccagggatc agatatccac
tgacctttgg 30234DNAHuman immunodeficiency virus 1 2tccggagtac
ttcaagaact gctgacatcg agct 34319DNAArtificial SequenceDescription
of Artificial Sequence Syntheticoligonucleotide 3ccactgacta
cttcaagaa 194859DNAArtificial SequenceDescription of Artificial
Sequence Syntheticpolynucleotidemodified_base(289)..(313)a, c, t,
g, unknown or othermisc_feature(289)..(313)n is a, c, g, or t
4ctaggtgatt aggatattct acaatccaaa ttcttaccag tttgggatta ttcaaattgg
60gcaccttggc agatatgttt tgaaaactgc taggcaaagc attctggaag aatagacaaa
120gaagtaataa aatataacaa aaagcagtgg aagttacaaa aaaaaatgtt
tctcttttgg 180aagggctaat ttggtcccaa agaagacaag atatccttga
tctgtggatc taccacacac 240aaggctactt ccctgattgg cagaactaca
acaccagggc cagggatcnn nnnnnnnnnn 300nnnnnnnnnn nnnttcaagt
tagtaccagt tgagccaggg caggtagaag aggccaatga 360aggagagaac
aacaccttgt tacaccctat gagcctgcat gggatggagg acccggaggg
420agaagtatta gtgtggaagt ttgacagcct cctagcattt cgtcacatgg
cccgagagct 480gcatccggag tactacaaag actgctgaca tcgagttttc
tacaagggac tttccgctgg 540ggactttcca gggaggtgtg gcctgggcgg
gactggggag tggcgagccc tcagatgctg 600catataagca gctgcttttt
gcctgtactg ggtctctctg gttagaccag atctgagcct 660gggagctctc
tggctagcta gggaacccac tgcttaagcc tcaataaagc ttgccttgag
720tgctacaagt agtgtgtgcc cgtctgttgt gtgactctgg taactagaga
tccctcagac 780ccttttagtc agtgtggaaa atctctagca tctttaaagt
acagaatgcc aaaacaggaa 840ggattgataa gatagtcgt 859510DNAHuman
immunodeficiency virus 1 5tcttttggaa 10676DNAHuman immunodeficiency
virus 1 6gattggcaga actacacacc agggccaggg atcagatatc cactgacctt
tggatggtgc 60ttcaagttag taccag 76710DNAHuman immunodeficiency virus
1 7tctttaaagt 10810DNAArtificial SequenceDescription of Artificial
Sequence Syntheticoligonucleotide 8tcttttggaa 10963DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
9gattggcaga actacaacac cagggccagg gatcagatgg atggtgcttc aagttagtac
60cag 631010DNAArtificial SequenceDescription of Artificial
Sequence Syntheticoligonucleotide 10tctttaaagt 101110DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
11tcttttggaa 101250DNAArtificial SequenceDescription of Artificial
Sequence Syntheticoligonucleotide 12gattggcaga actacaacac
cagggccagg gatcttcaag ttagtaccag 501310DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
13tctttaaagt 101424DNAArtificial SequenceDescription of Artificial
Sequence Syntheticoligonucleotide 14gagatcctgt ctcaaaaaaa agtt
241517DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 15atctatccat gagggcg 1716402DNAHuman
immunodeficiency virus 1 16gatctgtgga tctaccacac acaaggctac
ttccctgatt ggcagaacta cacaccaggg 60ccagggatca gatatccact gacctttgga
tggtgctaca agctagtacc agttgagcaa 120gagaaggtag aagaagccaa
tgaaggagag aacacccgct tgttacaccc tgtgagcctg 180catgggatgg
atgacccgga gagagaagta ttagagtgga ggtttgacag ccgcctagca
240tttcatcaca tggcccgaga gctgcatccg gagtacttca agaactgctg
acatcgagct 300tgctacaagg gactttccgc tggggacttt ccagggaggc
gtggcctggg cgggactggg 360gagtggcgag ccctcagatg ctgcatataa
gcagctgctt tt 4021731DNAHuman immunodeficiency virus 1 17ccctgattgg
cagaactaca caccagggcc a 311832DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 18ccctgattgg
cagaactaca acaccagggc ca 321932DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 19ccctgattgg
cagaactaca acaccagggc ca 322032DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 20ccctgattgg
cagaactaca acaccagggc ca 322130DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 21ccctgattgg
cagaactaca accagggcca 302229DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 22ccctgattgg
cagaactaca ccagggcca 292329DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 23ccctgattgg
cagaactaca ccagggcca 292426DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 24ccctgattgg
cagaactaca gggcca 262529DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 25ccctgattgg
cagaactaca gggccaggg 292686DNAHuman immunodeficiency virus 1
26gactttccag ggaggcgtgg cctgggcggg actggggagt ggcgagccct cagatgctgc
60atataagcag cggtgaagcc gaattc 862786DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
27gactttccag ggaggcgtgg cctgggcggg actggggggt ggcgagccct cagatgctgc
60atataagcag cggtgaagcc gaattc 862888DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
28gactttccag ggaggcgtgg cctgggcggg tatctgggga gtggcgagcc ctcagatgct
60gcatataagc agcggtgaag ccgaattc 882985DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
29gactttccag gggggcgtgg cctgggcggg actggggagt ggcgagccct cagatgctgc
60ataaagcagc ggtgaagccg aattc 853023DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
30gactttccag ggaagccgaa ttc 233125DNAArtificial SequenceDescription
of Artificial Sequence Syntheticoligonucleotide 31gattggcaga
actacactgg ggagt 253226DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 32gattggcaga
actacacctc agatgc 263328DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 33catcacatgg
cccgctgctg acatcgag 283455DNAHuman immunodeficiency virus 1
34catcacgtgg cccgagagct gcatccggag tacttcaaga actgctgaca tcgag
55351106DNAArtificial SequenceDescription of Artificial Sequence
Syntheticpolynucleotidemodified_base(152)..(155)a, c, t, g, unknown
or othermisc_feature(152)..(155)n is a, c, g, or t 35gctattgtat
ctgatcacaa gctgttaaaa gcggtcatgc cacttcttga atgctttgca 60gctggaaggg
ctaatttggt cccaaagaag acaagatatc cttgatctgt ggatctacca
120cacacaaggc tacttccctg attggcagaa cnnnncacca gggccaggga
tcagatatcc 180actgaccatc cactttggat ggtgcttcaa gttagtacca
gttgagccag ggcaggtaga 240agaggccaat gaaggagaga acaacacctt
gttacaccct atgagcctgc atgggatgga 300ggacccggag ggagaagtat
tagtgtggaa gtttgacagc ctcctagcat ttcgtcacat 360ggcccgagag
ctgcatccgg agtactacaa agactgctga catcgagttt tctacaaggg
420actttccgct ggggactttc cagggaggtg tggcctgggc gggactgggg
agtggcgagc 480cctcagatgc tgcatataag cagctgcttt ttgcctgtac
tgggtctctc tggttagacc 540agatctgagc ctgggagctc tctggctagc
tagggaaccc actgcttaag cctcaataaa 600gcttgccttg agtgctacaa
gtagtgtgtg cccgtctgtt gtgtgactct ggtaactaga 660gatccctcag
acccttttag tcagtgtgga aaatctctag cagcagctta gaaatttttt
720ccaccagagg ccgggcgtgg tggctcacgc ctgtaatccc agcactttgg
gaggccgagg 780tgggcggatc acctgaagtc aggagttcga gaccagcctc
aacatggaga aaccccatct 840ctactaaaaa tacaaaatta gctgggcgtg
gtggtgcatg cctgtaatcc cagctacttg 900ggaggctgag acaggataat
tgcttgaacc tggaaggcag aggttgcggt gagccgagat 960tgcgccattg
cattccagcc tgggcaacag gagcgaaact tcgtctcaaa aaaaaaaaaa
1020aaagacattt tttccaccag ataccctaga tcatgactgt taagtctggc
cttccacgaa 1080gccctaggac ctggacacac aatcaa 11063636DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
36aaacagggcc agggatcaga tatccactga ccttgt 363735DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
37taaacaaggt cagtggatat ctgatccctg gccct 353836DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
38aaacagctcg atgtcagcag ttcttgaagt actcgt 363935DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
39taaacgagta cttcaagaac tgctgacatc gagct 354024DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
40caccgattgg cagaactaca cacc 244124DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
41aaacggtgtg tagttctgcc aatc 244224DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
42caccgcgtgg cctgggcggg actg 244324DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
43aaaccagtcc cgcccaggcc acgc 244424DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
44tggaagggct aattcactcc caac 244524DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
45ccgagagctc ccaggctcag atct 244627DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
46caccgatctg tggatctacc acacaca 274724DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
47aaacgagtca cacaacagac gggc 244837DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
48cgcctcgagg atccgagggc ctatttccca tgattcc 374935DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
49tgtgaattca ggcgggccat ttaccgtaag ttatg 355025DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
50acgactatct tatcaatcct tcctg 255126DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
51ctaggtgatt aggatattct acaatc 265224DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
52gctattgtat ctgatcacaa gctg 245324DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
53ttgattgtgt gtccaggtcc tagg 245423DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
54gcaagggcga ggagctgttc acc 235524DNAArtificial SequenceDescription
of Artificial Sequence Syntheticprimer 55ttgtagttgc cgtcgtcctt gaag
245623DNAArtificial SequenceDescription of Artificial Sequence
Syntheticprimer 56aatggtacat caggccatat cac 235723DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
57cccactgtgt ttagcatggt att 235823DNAArtificial SequenceDescription
of Artificial Sequence Syntheticprimer 58cacagcatca agaagaacct gat
235924DNAArtificial SequenceDescription of Artificial Sequence
Syntheticprimer 59tcttccgtct ggtgtatctt cttc 246028DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
60cgccaagctt gaataggagc tttgttcc 286130DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer
61ctaggatcca ggagctgttg atcctttagg 306223DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
62gtggactttg gatggtgaga tag 236323DNAArtificial SequenceDescription
of Artificial Sequence Syntheticoligonucleotide 63gcctggcaag
agtgaactga gtc 236423DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 64aagataatga
gttgtggcag agc 236524DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 65tctacctggt
aatccagcat ctgg 246623DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 66ataggaggaa
ggcaccaaga ggg 236723DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 67aatgatgctt
tggtcctact cct 236824DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 68tgctcttgct
actctggcat gtac 246923DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 69aatctacctc
tgagagctgc agg 237023DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 70tcagacacag
ctgaagcaga ggc 237123DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 71atgccagtgt
cagtagatgt cag 237224DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 72tcaagatcag
ccagagtgca catg 247323DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 73tgctcttccg
agcctctctg gag 237422DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 74atggactatc
atatgcttac cg 227528DNAArtificial SequenceDescription of Artificial
Sequence Syntheticoligonucleotide 75gcttcagcaa gccgagtcct gcgtcgag
287628DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 76gctcctctgg tttccctttc gctttcaa
287722DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 77gtaatacgac tcactatagg gc
227819DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 78actatagggc acgcgtggt 197923DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
79tcagaccctt ttagtcagtg tgg 238023DNAArtificial SequenceDescription
of Artificial Sequence Syntheticoligonucleotide 80ttgcttgtac
tgggtctctc tgg 238123DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 81cagctgcttt
ttgcttgtac tgg 238223DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 82ctgacatcga
gcttgctaca agg 238323DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 83ccgcctagca
tttcatcaca tgg 238423DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 84cggagagaga
agtattagag tgg 238523DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 85agtaccagtt
gagcaagaga agg 238623DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 86gatatccact
gacctttgga tgg 238723DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 87gattggcaga
actacacacc agg 238823DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 88cacaaggcta
cttccctgat tgg 238923DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 89ctgtggatct
accacacaca agg 239023DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 90tgggagctct
ctggctaact agg 239123DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 91ggttagacca
gatctgagcc tgg 239223DNAArtificial SequenceDescription of
Artificial Sequence
Syntheticoligonucleotide 92tgctacaagg gactttccgc tgg
239323DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 93agagagaagt attagagtgg agg
239423DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 94ttacaccctg tgagcctgca tgg
239523DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 95aaggtagaag aagccaatga agg
239623DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 96atcagatatc cactgacctt tgg
239723DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 97gacaagatat ccttgatctg tgg
239823DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 98gcccgtctgt tgtgtgactc tgg
239923DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 99atctgagcct gggagctctc tgg
2310023DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 100ctttccgctg gggactttcc agg
2310123DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 101cagaactaca caccagggcc agg
2310223DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 102cctgcatggg atggatgacc cgg
2310323DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 103ccctgtgagc ctgcatggga tgg
2310423DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 104ctttccaggg aggcgtggcc tgg
2310523DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 105ggggactttc cagggaggcg tgg
2310623DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 106ccgctgggga ctttccaggg agg
2310723DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 107catggcccga gagctgcatc cgg
2310823DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 108gcctgggcgg gactggggag tgg
2310923DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 109aggcgtggcc tgggcgggac tgg
2311023DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 110gcgtggcctg ggcgggactg ggg
2311123DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 111ccagggaggc gtggcctggg cgg
2311223DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 112tgtggtagat ccacagatca agg
2311323DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 113ggtgtgtagt tctgccaatc agg
2311423DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 114gtcagtggat atctgatccc tgg
2311523DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 115tagcaccatc caaaggtcag tgg
2311623DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 116tagcttgtag caccatccaa agg
2311723DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 117tctaccttct cttgctcaac tgg
2311823DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 118cactctaata cttctctctc cgg
2311923DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 119ccatgtgatg aaatgctagg cgg
2312023DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 120gggccatgtg atgaaatgct agg
2312123DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 121cagcagttct tgaagtactc cgg
2312223DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 122ctgcttatat gcagcatctg agg
2312323DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 123cacactactt gaagcactca agg
2312423DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 124taccagagtc acacaacaga cgg
2312523DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 125acactgacta aaagggtctg agg
2312623DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 126caaggatatc ttgtcttcgt tgg
2312723DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 127cagggaagta gccttgtgtg tgg
2312823DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 128gcgggtgttc tctccttcat tgg
2312923DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 129tagttagcca gagagctccc agg
2313023DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 130ctttattgag gcttaagcag tgg
2313123DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 131actcaaggca agctttattg agg
2313223DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 132ggatatctga tccctggccc tgg
2313323DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 133ggctcacagg gtgtaacaag cgg
2313423DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 134tccatcccat gcaggctcac agg
2313523DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 135agtactccgg atgcagctct cgg
2313623DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 136agagctccca ggctcagatc tgg
2313723DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 137gattttccac actgactaaa agg
2313823DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 138ccgggtcatc catcccatgc agg
2313923DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 139cctccctgga aagtccccag cgg
2314023DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 140gccactcccc agtcccgccc agg
2314123DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 141ccgcccaggc cacgcctccc tgg
2314223DNAHuman immunodeficiency virus 1 142atcagatatc cactgacctt
tgg 2314322DNAHuman immunodeficiency virus 1 143tcagatatcc
actgaccttt gg 2214422DNAHuman immunodeficiency virus 1
144tcagatatcc actgaccttt gg 2214521DNAHuman immunodeficiency virus
1 145cagatatcca ctgacctttg g 2114621DNAHuman immunodeficiency virus
1 146cagatatcca ctgacctttg g 2114720DNAHuman immunodeficiency virus
1 147agatatccac tgacctttgg 2014820DNAHuman immunodeficiency virus 1
148agatatccac tgacctttgg 2014919DNAHuman immunodeficiency virus 1
149gatatccact gacctttgg 1915019DNAHuman immunodeficiency virus 1
150gatatccact gacctttgg 1915118DNAHuman immunodeficiency virus 1
151atatccactg acctttgg 1815218DNAHuman immunodeficiency virus 1
152atatccactg acctttgg 1815317DNAHuman immunodeficiency virus 1
153tatccactga ccttggg 1715417DNAHuman immunodeficiency virus 1
154tatccactga cctttgg 1715517DNAHuman immunodeficiency virus 1
155tatccactga cctttgg 1715617DNAHuman immunodeficiency virus 1
156tatccactga ccttaag 1715717DNAHuman immunodeficiency virus 1
157tatccactga ccttgag 1715816DNAHuman immunodeficiency virus 1
158atccactgac cttagg 1615916DNAHuman immunodeficiency virus 1
159atccactgac cttagg 1616016DNAHuman immunodeficiency virus 1
160atccactgac cttggg 1616116DNAHuman immunodeficiency virus 1
161atccactgac cttggg 1616216DNAHuman immunodeficiency virus 1
162atccactgac cttggg 1616316DNAHuman immunodeficiency virus 1
163atccactgac cttggg 1616416DNAHuman immunodeficiency virus 1
164atccactgac ctttgg 1616516DNAHuman immunodeficiency virus 1
165atccactgac ctttgg 1616616DNAHuman immunodeficiency virus 1
166atccactgac ctttgg 1616716DNAHuman immunodeficiency virus 1
167atccactgac cttaag 1616816DNAHuman immunodeficiency virus 1
168atccactgac cttaag 1616916DNAHuman immunodeficiency virus 1
169atccactgac cttcag 1617016DNAHuman immunodeficiency virus 1
170atccactgac cttcag 1617116DNAHuman immunodeficiency virus 1
171atccactgac cttgag 1617216DNAHuman immunodeficiency virus 1
172atccactgac cttgag 1617315DNAHuman immunodeficiency virus 1
173tccactgacc ttagg 1517415DNAHuman immunodeficiency virus 1
174tccactgacc ttagg 1517515DNAHuman immunodeficiency virus 1
175tccactgacc ttagg 1517615DNAHuman immunodeficiency virus 1
176tccactgacc ttagg 1517715DNAHuman immunodeficiency virus 1
177tccactgacc ttagg 1517815DNAHuman immunodeficiency virus 1
178tccactgacc ttagg 1517915DNAHuman immunodeficiency virus 1
179tccactgacc ttggg 1518015DNAHuman immunodeficiency virus 1
180tccactgacc ttggg 1518115DNAHuman immunodeficiency virus 1
181tccactgacc ttggg 1518215DNAHuman immunodeficiency virus 1
182tccactgacc ttggg 1518315DNAHuman immunodeficiency virus 1
183tccactgacc ttggg 1518415DNAHuman immunodeficiency virus 1
184tccactgacc ttggg 1518515DNAHuman immunodeficiency virus 1
185tccactgacc ttggg 1518615DNAHuman immunodeficiency virus 1
186tccactgacc ttggg 1518715DNAHuman immunodeficiency virus 1
187tccactgacc tttgg 1518815DNAHuman immunodeficiency virus 1
188tccactgacc tttgg 1518915DNAHuman immunodeficiency virus 1
189tccactgacc tttgg 1519015DNAHuman immunodeficiency virus 1
190tccactgacc tttgg 1519115DNAHuman immunodeficiency virus 1
191tccactgacc tttgg 1519215DNAHuman immunodeficiency virus 1
192tccactgacc tttgg 1519315DNAHuman immunodeficiency virus 1
193tccactgacc tttgg 1519415DNAHuman immunodeficiency virus 1
194tccactgacc tttgg 1519515DNAHuman immunodeficiency virus 1
195tccactgacc tttgg 1519615DNAHuman immunodeficiency virus 1
196tccactgacc ttaag 1519715DNAHuman immunodeficiency virus 1
197tccactgacc ttaag 1519815DNAHuman immunodeficiency virus 1
198tccactgacc ttaag 1519915DNAHuman immunodeficiency virus 1
199tccactgacc ttaag 1520015DNAHuman immunodeficiency virus 1
200tccactgacc ttaag 1520115DNAHuman immunodeficiency virus 1
201tccactgacc ttcag 1520215DNAHuman immunodeficiency virus 1
202tccactgacc ttcag 1520315DNAHuman immunodeficiency virus 1
203tccactgacc ttcag 1520415DNAHuman immunodeficiency virus 1
204tccactgacc ttcag 1520515DNAHuman immunodeficiency virus 1
205tccactgacc ttcag 1520615DNAHuman immunodeficiency virus 1
206tccactgacc ttcag 1520715DNAHuman immunodeficiency virus 1
207tccactgacc ttcag 1520815DNAHuman immunodeficiency virus 1
208tccactgacc ttcag 1520915DNAHuman immunodeficiency virus 1
209tccactgacc ttcag 1521015DNAHuman immunodeficiency virus 1
210tccactgacc ttcag 1521115DNAHuman immunodeficiency virus 1
211tccactgacc ttcag 1521215DNAHuman immunodeficiency virus 1
212tccactgacc ttcag 1521315DNAHuman immunodeficiency virus 1
213tccactgacc ttgag 1521415DNAHuman immunodeficiency virus 1
214tccactgacc ttgag 1521515DNAHuman immunodeficiency virus 1
215tccactgacc ttgag 1521615DNAHuman immunodeficiency virus 1
216tccactgacc ttgag 1521715DNAHuman immunodeficiency virus 1
217tccactgacc ttgag 1521815DNAHuman immunodeficiency virus 1
218tccactgacc ttgag 1521915DNAHuman immunodeficiency virus 1
219tccactgacc ttgag 1522015DNAHuman immunodeficiency virus 1
220tccactgacc ttgag 1522115DNAHuman immunodeficiency virus 1
221tccactgacc ttgag 1522215DNAHuman immunodeficiency virus 1
222tccactgacc tttag 1522315DNAHuman immunodeficiency virus 1
223tccactgacc tttag 1522415DNAHuman immunodeficiency virus 1
224tccactgacc tttag 1522515DNAHuman immunodeficiency virus 1
225tccactgacc tttag 1522615DNAHuman immunodeficiency virus 1
226tccactgacc tttag
1522723DNAHuman immunodeficiency virus 1 227cagcagttct tgaagtactc
cgg 2322822DNAHuman immunodeficiency virus 1 228agcagttctt
gaagtactcc gg 2222921DNAHuman immunodeficiency virus 1
229gcagttcttg aagtactccg g 2123020DNAHuman immunodeficiency virus 1
230cagttcttga agtactccgg 2023119DNAHuman immunodeficiency virus 1
231agttcttgaa gtactccgg 1923218DNAHuman immunodeficiency virus 1
232gttcttgaag tactccgg 1823317DNAHuman immunodeficiency virus 1
233ttcttgaagt actccgg 1723416DNAHuman immunodeficiency virus 1
234tcttgaagta ctccgg 1623516DNAHuman immunodeficiency virus 1
235tcttgaagta ctctag 1623615DNAHuman immunodeficiency virus 1
236cttgaagtac tcagg 1523715DNAHuman immunodeficiency virus 1
237cttgaagtac tcagg 1523815DNAHuman immunodeficiency virus 1
238cttgaagtac tcagg 1523915DNAHuman immunodeficiency virus 1
239cttgaagtac tcagg 1524015DNAHuman immunodeficiency virus 1
240cttgaagtac tccgg 1524115DNAHuman immunodeficiency virus 1
241cttgaagtac tctgg 1524215DNAHuman immunodeficiency virus 1
242cttgaagtac tcaag 1524315DNAHuman immunodeficiency virus 1
243cttgaagtac tcaag 1524415DNAHuman immunodeficiency virus 1
244cttgaagtac tcaag 1524515DNAHuman immunodeficiency virus 1
245cttgaagtac tcaag 1524615DNAHuman immunodeficiency virus 1
246cttgaagtac tcaag 1524715DNAHuman immunodeficiency virus 1
247cttgaagtac tccag 1524815DNAHuman immunodeficiency virus 1
248cttgaagtac tccag 1524915DNAHuman immunodeficiency virus 1
249cttgaagtac tccag 1525015DNAHuman immunodeficiency virus 1
250cttgaagtac tccag 1525115DNAHuman immunodeficiency virus 1
251cttgaagtac tctag 1525215DNAHuman immunodeficiency virus 1
252cttgaagtac tctag 1525323DNAHuman immunodeficiency virus 1
253atcagatatc cactgacctt tgg 2325422DNAHuman immunodeficiency virus
1 254tcagatatcc actgaccttt gg 2225522DNAHuman immunodeficiency
virus 1 255tcagatatcc actgaccttt gg 2225621DNAHuman
immunodeficiency virus 1 256cagatatcca ctgacctttg g 2125721DNAHuman
immunodeficiency virus 1 257cagatatcca ctgacctttg g 2125820DNAHuman
immunodeficiency virus 1 258agatatccac tgacctttgg 2025920DNAHuman
immunodeficiency virus 1 259agatatccac tgacctttgg 2026019DNAHuman
immunodeficiency virus 1 260gatatccact gacctttgg 1926119DNAHuman
immunodeficiency virus 1 261gatatccact gacctttgg 1926218DNAHuman
immunodeficiency virus 1 262atatccactg acctttgg 1826318DNAHuman
immunodeficiency virus 1 263atatccactg acctttgg 1826417DNAHuman
immunodeficiency virus 1 264tatccactga ccttggg 1726517DNAHuman
immunodeficiency virus 1 265tatccactga cctttgg 1726617DNAHuman
immunodeficiency virus 1 266tatccactga cctttgg 1726717DNAHuman
immunodeficiency virus 1 267tatccactga ccttaag 1726817DNAHuman
immunodeficiency virus 1 268tatccactga ccttgag 1726916DNAHuman
immunodeficiency virus 1 269atccactgac cttagg 1627016DNAHuman
immunodeficiency virus 1 270atccactgac cttagg 1627116DNAHuman
immunodeficiency virus 1 271atccactgac cttggg 1627216DNAHuman
immunodeficiency virus 1 272atccactgac cttggg 1627316DNAHuman
immunodeficiency virus 1 273atccactgac cttggg 1627416DNAHuman
immunodeficiency virus 1 274atccactgac cttggg 1627516DNAHuman
immunodeficiency virus 1 275atccactgac ctttgg 1627616DNAHuman
immunodeficiency virus 1 276atccactgac ctttgg 1627716DNAHuman
immunodeficiency virus 1 277atccactgac ctttgg 1627816DNAHuman
immunodeficiency virus 1 278atccactgac cttaag 1627916DNAHuman
immunodeficiency virus 1 279atccactgac cttaag 1628016DNAHuman
immunodeficiency virus 1 280atccactgac cttcag 1628116DNAHuman
immunodeficiency virus 1 281atccactgac cttcag 1628216DNAHuman
immunodeficiency virus 1 282atccactgac cttgag 1628316DNAHuman
immunodeficiency virus 1 283atccactgac cttgag 1628415DNAHuman
immunodeficiency virus 1 284tccactgacc ttagg 1528515DNAHuman
immunodeficiency virus 1 285tccactgacc ttagg 1528615DNAHuman
immunodeficiency virus 1 286tccactgacc ttagg 1528715DNAHuman
immunodeficiency virus 1 287tccactgacc ttagg 1528815DNAHuman
immunodeficiency virus 1 288tccactgacc ttagg 1528915DNAHuman
immunodeficiency virus 1 289tccactgacc ttagg 1529015DNAHuman
immunodeficiency virus 1 290tccactgacc ttggg 1529115DNAHuman
immunodeficiency virus 1 291tccactgacc ttggg 1529215DNAHuman
immunodeficiency virus 1 292tccactgacc ttggg 1529315DNAHuman
immunodeficiency virus 1 293tccactgacc ttggg 1529415DNAHuman
immunodeficiency virus 1 294tccactgacc ttggg 1529515DNAHuman
immunodeficiency virus 1 295tccactgacc ttggg 1529615DNAHuman
immunodeficiency virus 1 296tccactgacc ttggg 1529715DNAHuman
immunodeficiency virus 1 297tccactgacc ttggg 1529815DNAHuman
immunodeficiency virus 1 298tccactgacc tttgg 1529915DNAHuman
immunodeficiency virus 1 299tccactgacc tttgg 1530015DNAHuman
immunodeficiency virus 1 300tccactgacc tttgg 1530115DNAHuman
immunodeficiency virus 1 301tccactgacc tttgg 1530215DNAHuman
immunodeficiency virus 1 302tccactgacc tttgg 1530315DNAHuman
immunodeficiency virus 1 303tccactgacc tttgg 1530415DNAHuman
immunodeficiency virus 1 304tccactgacc tttgg 1530515DNAHuman
immunodeficiency virus 1 305tccactgacc tttgg 1530615DNAHuman
immunodeficiency virus 1 306tccactgacc tttgg 1530715DNAHuman
immunodeficiency virus 1 307tccactgacc ttaag 1530815DNAHuman
immunodeficiency virus 1 308tccactgacc ttaag 1530915DNAHuman
immunodeficiency virus 1 309tccactgacc ttaag 1531015DNAHuman
immunodeficiency virus 1 310tccactgacc ttaag 1531115DNAHuman
immunodeficiency virus 1 311tccactgacc ttaag 1531215DNAHuman
immunodeficiency virus 1 312tccactgacc ttcag 1531315DNAHuman
immunodeficiency virus 1 313tccactgacc ttcag 1531415DNAHuman
immunodeficiency virus 1 314tccactgacc ttcag 1531515DNAHuman
immunodeficiency virus 1 315tccactgacc ttcag 1531615DNAHuman
immunodeficiency virus 1 316tccactgacc ttcag 1531715DNAHuman
immunodeficiency virus 1 317tccactgacc ttcag 1531815DNAHuman
immunodeficiency virus 1 318tccactgacc ttcag 1531915DNAHuman
immunodeficiency virus 1 319tccactgacc ttcag 1532015DNAHuman
immunodeficiency virus 1 320tccactgacc ttcag 1532115DNAHuman
immunodeficiency virus 1 321tccactgacc ttcag 1532215DNAHuman
immunodeficiency virus 1 322tccactgacc ttcag 1532315DNAHuman
immunodeficiency virus 1 323tccactgacc ttcag 1532415DNAHuman
immunodeficiency virus 1 324tccactgacc ttgag 1532515DNAHuman
immunodeficiency virus 1 325tccactgacc ttgag 1532615DNAHuman
immunodeficiency virus 1 326tccactgacc ttgag 1532715DNAHuman
immunodeficiency virus 1 327tccactgacc ttgag 1532815DNAHuman
immunodeficiency virus 1 328tccactgacc ttgag 1532915DNAHuman
immunodeficiency virus 1 329tccactgacc ttgag 1533015DNAHuman
immunodeficiency virus 1 330tccactgacc ttgag 1533115DNAHuman
immunodeficiency virus 1 331tccactgacc ttgag 1533215DNAHuman
immunodeficiency virus 1 332tccactgacc ttgag 1533315DNAHuman
immunodeficiency virus 1 333tccactgacc tttag 1533415DNAHuman
immunodeficiency virus 1 334tccactgacc tttag 1533515DNAHuman
immunodeficiency virus 1 335tccactgacc tttag 1533615DNAHuman
immunodeficiency virus 1 336tccactgacc tttag 1533715DNAHuman
immunodeficiency virus 1 337tccactgacc tttag 1533823DNAHuman
immunodeficiency virus 1 338cagcagttct tgaagtactc cgg
2333922DNAHuman immunodeficiency virus 1 339agcagttctt gaagtactcc
gg 2234021DNAHuman immunodeficiency virus 1 340gcagttcttg
aagtactccg g 2134120DNAHuman immunodeficiency virus 1 341cagttcttga
agtactccgg 2034219DNAHuman immunodeficiency virus 1 342agttcttgaa
gtactccgg 1934318DNAHuman immunodeficiency virus 1 343gttcttgaag
tactccgg 1834417DNAHuman immunodeficiency virus 1 344ttcttgaagt
actccgg 1734516DNAHuman immunodeficiency virus 1 345tcttgaagta
ctccgg 1634616DNAHuman immunodeficiency virus 1 346tcttgaagta
ctctag 1634715DNAHuman immunodeficiency virus 1 347cttgaagtac tcagg
1534815DNAHuman immunodeficiency virus 1 348cttgaagtac tcagg
1534915DNAHuman immunodeficiency virus 1 349cttgaagtac tcagg
1535015DNAHuman immunodeficiency virus 1 350cttgaagtac tcagg
1535115DNAHuman immunodeficiency virus 1 351cttgaagtac tccgg
1535215DNAHuman immunodeficiency virus 1 352cttgaagtac tctgg
1535315DNAHuman immunodeficiency virus 1 353cttgaagtac tcaag
1535415DNAHuman immunodeficiency virus 1 354cttgaagtac tcaag
1535515DNAHuman immunodeficiency virus 1 355cttgaagtac tcaag
1535615DNAHuman immunodeficiency virus 1 356cttgaagtac tcaag
1535715DNAHuman immunodeficiency virus 1 357cttgaagtac tcaag
1535815DNAHuman immunodeficiency virus 1 358cttgaagtac tccag
1535915DNAHuman immunodeficiency virus 1 359cttgaagtac tccag
1536015DNAHuman immunodeficiency virus 1 360cttgaagtac tccag
1536115DNAHuman immunodeficiency virus 1 361cttgaagtac tccag
1536215DNAHuman immunodeficiency virus 1 362cttgaagtac tctag
1536315DNAHuman immunodeficiency virus 1 363cttgaagtac tctag
1536423DNAHuman immunodeficiency virus 1 364gatctgtgga tctaccacac
aca 2336526DNAHuman immunodeficiency virus 1 365gatctgtgga
tctaccacac acaagg 2636620DNAHuman immunodeficiency virus 1
366gattggcaga actacacacc 2036723DNAHuman immunodeficiency virus 1
367gattggcaga actacacacc agg 2336827DNAHuman immunodeficiency virus
1 368gccagggatc agatatccac tgacctt 2736930DNAHuman immunodeficiency
virus 1 369gccagggatc agatatccac tgacctttgg 3037030DNAHuman
immunodeficiency virus 1 370gagtacttca agaactgctg acatcgagct
3037133DNAHuman immunodeficiency virus 1 371ccggagtact tcaagaactg
ctgacatcga gct 3337220DNAHuman immunodeficiency virus 1
372gcgtggcctg ggcgggactg 2037323DNAHuman immunodeficiency virus 1
373gcgtggcctg ggcgggactg ggg 2337422DNAHuman immunodeficiency virus
1 374tcagatgctg catataagca gc 2237525DNAHuman immunodeficiency
virus 1 375ccctcagatg ctgcatataa gcagc 25376634DNAArtificial
Sequence Description of Artificial Sequence Synthetic polynucleotide
376tggaagggct aattcactcc caacgaagac aagatatcct tgatctgtgg
atctaccaca 60cacaaggcta cttccctgat tggcagaact acacaccagg gccagggatc
agatatccac 120tgacctttgg atggtgctac aagctagtac cagttgagca
agagaaggta gaagaagcca 180atgaaggaga gaacacccgc ttgttacacc
ctgtgagcct gcatgggatg gatgacccgg 240agagagaagt attagagtgg
aggtttgaca gccgcctagc atttcatcac atggcccgag 300agctgcatcc
ggagtacttc aagaactgct gacatcgagc ttgctacaag ggactttccg
360ctggggactt tccagggagg cgtggcctgg gcgggactgg ggagtggcga
gccctcagat 420gctgcatata agcagctgct ttttgcttgt actgggtctc
tctggttaga ccagatctga 480gcctgggagc tctctggcta actagggaac
ccactgctta agcctcaata aagcttgcct 540tgagtgcttc aagtagtgtg
tgcccgtctg ttgtgtgact ctggtaacta gagatccctc 600agaccctttt
agtcagtgtg
gaaaatctct agca 634377453DNAArtificial Sequence Description of
Artificial Sequence Synthetic polynucleotide 377tggaagggct
aattcactcc caacgaagac aagatatcct tgatctgtgg atctaccaca 60cacaaggcta
cttccctgat tggcagaact acacaccagg gccagggatc agatatccac
120tgacctttgg atggtgctac aagctagtac cagttgagca agagaaggta
gaagaagcca 180atgaaggaga gaacacccgc ttgttacacc ctgtgagcct
gcatgggatg gatgacccgg 240agagagaagt attagagtgg aggtttgaca
gccgcctagc atttcatcac atggcccgag 300agctgcatcc ggagtacttc
aagaactgct gacatcgagc ttgctacaag ggactttccg 360ctggggactt
tccagggagg cgtggcctgg gcgggactgg ggagtggcga gccctcagat
420gctgcatata agcagctgct ttttgcttgt act 45337897DNAArtificial
Sequence Description of Artificial Sequence Synthetic oligonucleotide
378gggtctctct ggttagacca gatctgagcc tgggagctct ctggctaact
agggaaccca 60ctgcttaagc ctcaataaag cttgccttga gtgcttc
9737984DNAArtificial Sequence Description of Artificial Sequence
Synthetic oligonucleotide 379aagtagtgtg tgcccgtctg ttgtgtgact
ctggtaacta gagatccctc agaccctttt 60agtcagtgtg gaaaatctct agca
84380818DNAArtificial Sequence Description of Artificial Sequence
Synthetic polynucleotide 380tggaagggat ttattacagt gcaagaagac
atagaatctt agacatatac ttagaaaagg 60aagaaggcat cataccagat tggcaggatt
acacctcagg accaggaatt agatacccaa 120agacatttgg ctggctatgg
aaattagtcc ctgtaaatgt atcagatgag gcacaggagg 180atgaggagca
ttatttaatg catccagctc aaacttccca gtgggatgac ccttggggag
240aggttctagc atggaagttt gatccaactc tggcctacac ttatgaggca
tatgttagat 300acccagaaga gtttggaagc aagtcaggcc tgtcagagga
agaggttaga agaaggctaa 360ccgcaagagg ccttcttaac atggctgaca
agaaggaaac tcgctgaaac agcagggact 420ttccacaagg ggatgttacg
gggaggtact ggggaggagc cggtcgggaa cgcccacttt 480cttgatgtat
aaatatcact gcatttcgct ctgtattcag tcgctctgcg gagaggctgg
540cagattgagc cctgggaggt tctctccagc actagcaggt agagcctggg
tgttccctgc 600tagactctca ccagcacttg gccggtgctg ggcagagtga
ctccacgctt gcttgcttaa 660agccctcttc aataaagctg ccattttaga
agtaagctag tgtgtgttcc catctctcct 720agccgccgcc tggtcaactc
ggtactcaat aataagaaga ccctggtctg ttaggaccct 780ttctgctttg
ggaaaccgaa gcaggaaaat ccctagca 818381517DNAArtificial
Sequence Description of Artificial Sequence Synthetic polynucleotide
381tggaagggat ttattacagt gcaagaagac atagaatctt agacatatac
ttagaaaagg 60aagaaggcat cataccagat tggcaggatt acacctcagg accaggaatt
agatacccaa 120agacatttgg ctggctatgg aaattagtcc ctgtaaatgt
atcagatgag gcacaggagg 180atgaggagca ttatttaatg catccagctc
aaacttccca gtgggatgac ccttggggag 240aggttctagc atggaagttt
gatccaactc tggcctacac ttatgaggca tatgttagat 300acccagaaga
gtttggaagc aagtcaggcc tgtcagagga agaggttaga agaaggctaa
360ccgcaagagg ccttcttaac atggctgaca agaaggaaac tcgctgaaac
agcagggact 420ttccacaagg ggatgttacg gggaggtact ggggaggagc
cggtcgggaa cgcccacttt 480cttgatgtat aaatatcact gcatttcgct ctgtatt
517382176DNAArtificial Sequence Description of Artificial Sequence
Synthetic polynucleotide 382cagtcgctct gcggagaggc tggcagattg
agccctggga ggttctctcc agcactagca 60ggtagagcct gggtgttccc tgctagactc
tcaccagcac ttggccggtg ctgggcagag 120tgactccacg cttgcttgct
taaagccctc ttcaataaag ctgccatttt agaagt 176383125DNAArtificial
Sequence Description of Artificial Sequence Synthetic polynucleotide
383aagctagtgt gtgttcccat ctctcctagc cgccgcctgg tcaactcggt
actcaataat 60aagaagaccc tggtctgtta ggaccctttc tgctttggga aaccgaagca
ggaaaatccc 120tagca 12538414825DNAHuman immunodeficiency virus 1
384tggaagggct aatttggtcc caaaaaagac aagagatcct tgatctgtgg
atctaccaca 60cacaaggcta cttccctgat tggcagaact acacaccagg gccagggatc
agatatccac 120tgacctttgg atggtgcttc aagttagtac cagttgaacc
agagcaagta gaagaggcca 180atgaaggaga gaacaacagc ttgttacacc
ctatgagcca gcatgggatg gaggacccgg 240agggagaagt attagtgtgg
aagtttgaca gcctcctagc atttcgtcac atggcccgag 300agctgcatcc
ggagtactac aaagactgct gacatcgagc tttctacaag ggactttccg
360ctggggactt tccagggagg tgtggcctgg gcgggactgg ggagtggcga
gccctcagat 420gctacatata agcagctgct ttttgcctgt actgggtctc
tctggttaga ccagatctga 480gcctgggagc tctctggcta actagggaac
ccactgctta agcctcaata aagcttgcct 540tgagtgctca aagtagtgtg
tgcccgtctg ttgtgtgact ctggtaacta gagatccctc 600agaccctttt
agtcagtgtg gaaaatctct agcagtggcg cccgaacagg gacttgaaag
660cgaaagtaaa gccagaggag atctctcgac gcaggactcg gcttgctgaa
gcgcgcacgg 720caagaggcga ggggcggcga ctggtgagta cgccaaaaat
tttgactagc ggaggctaga 780aggagagaga tgggtgcgag agcgtcggta
ttaagcgggg gagaattaga taaatgggaa 840aaaattcggt taaggccagg
gggaaagaaa caatataaac taaaacatat agtatgggca 900agcagggagc
tagaacgatt cgcagttaat cctggccttt tagagacatc agaaggctgt
960agacaaatac tgggacagct acaaccatcc cttcagacag gatcagaaga
acttagatca 1020ttatataata caatagcagt cctctattgt gtgcatcaaa
ggatagatgt aaaagacacc 1080aaggaagcct tagataagat agaggaagag
caaaacaaaa gtaagaaaaa ggcacagcaa 1140gcagcagctg acacaggaaa
caacagccag gtcagccaaa attaccctat agtgcagaac 1200ctccaggggc
aaatggtaca tcaggccata tcacctagaa ctttaaatgc atgggtaaaa
1260gtagtagaag agaaggcttt cagcccagaa gtaataccca tgttttcagc
attatcagaa 1320ggagccaccc cacaagattt aaataccatg ctaaacacag
tggggggaca tcaagcagcc 1380atgcaaatgt taaaagagac catcaatgag
gaagctgcag aatgggatag attgcatcca 1440gtgcatgcag ggcctattgc
accaggccag atgagagaac caaggggaag tgacatagca 1500ggaactacta
gtacccttca ggaacaaata ggatggatga cacataatcc acctatccca
1560gtaggagaaa tctataaaag atggataatc ctgggattaa ataaaatagt
aagaatgtat 1620agccctacca gcattctgga cataagacaa ggaccaaagg
aaccctttag agactatgta 1680gaccgattct ataaaactct aagagccgag
caagcttcac aagaggtaaa aaattggatg 1740acagaaacct tgttggtcca
aaatgcgaac ccagattgta agactatttt aaaagcattg 1800ggaccaggag
cgacactaga agaaatgatg acagcatgtc agggagtggg gggacccggc
1860cataaagcaa gagttttggc tgaagcaatg agccaagtaa caaatccagc
taccataatg 1920atacagaaag gcaattttag gaaccaaaga aagactgtta
agtgtttcaa ttgtggcaaa 1980gaagggcaca tagccaaaaa ttgcagggcc
cctaggaaaa agggctgttg gaaatgtgga 2040aaggaaggac accaaatgaa
agattgtact gagagacagg ctaatttttt agggaagatc 2100tggccttccc
acaagggaag gccagggaat tttcttcaga gcagaccaga gccaacagcc
2160ccaccagaag agagcttcag gtttggggaa gagacaacaa ctccctctca
gaagcaggag 2220ccgatagaca aggaactgta tcctttagct tccctcagat
cactctttgg cagcgacccc 2280tcgtcacaat aaagataggg gggcaattaa
aggaagctct attagataca ggagcagatg 2340atacagtatt agaagaaatg
aatttgccag gaagatggaa accaaaaatg atagggggaa 2400ttggaggttt
tatcaaagta agacagtatg atcagatact catagaaatc tgcggacata
2460aagctatagg tacagtatta gtaggaccta cacctgtcaa cataattgga
agaaatctgt 2520tgactcagat tggctgcact ttaaattttc ccattagtcc
tattgagact gtaccagtaa 2580aattaaagcc aggaatggat ggcccaaaag
ttaaacaatg gccattgaca gaagaaaaaa 2640taaaagcatt agtagaaatt
tgtacagaaa tggaaaagga aggaaaaatt tcaaaaattg 2700ggcctgaaaa
tccatacaat actccagtat ttgccataaa gaaaaaagac agtactaaat
2760ggagaaaatt agtagatttc agagaactta ataagagaac tcaagatttc
tgggaagttc 2820aattaggaat accacatcct gcagggttaa aacagaaaaa
atcagtaaca gtactggatg 2880tgggcgatgc atatttttca gttcccttag
ataaagactt caggaagtat actgcattta 2940ccatacctag tataaacaat
gagacaccag ggattagata tcagtacaat gtgcttccac 3000agggatggaa
aggatcacca gcaatattcc agtgtagcat gacaaaaatc ttagagcctt
3060ttagaaaaca aaatccagac atagtcatct atcaatacat ggatgatttg
tatgtaggat 3120ctgacttaga aatagggcag catagaacaa aaatagagga
actgagacaa catctgttga 3180ggtggggatt taccacacca gacaaaaaac
atcagaaaga acctccattc ctttggatgg 3240gttatgaact ccatcctgat
aaatggacag tacagcctat agtgctgcca gaaaaggaca 3300gctggactgt
caatgacata cagaaattag tgggaaaatt gaattgggca agtcagattt
3360atgcagggat taaagtaagg caattatgta aacttcttag gggaaccaaa
gcactaacag 3420aagtagtacc actaacagaa gaagcagagc tagaactggc
agaaaacagg gagattctaa 3480aagaaccggt acatggagtg tattatgacc
catcaaaaga cttaatagca gaaatacaga 3540agcaggggca aggccaatgg
acatatcaaa tttatcaaga gccatttaaa aatctgaaaa 3600caggaaagta
tgcaagaatg aagggtgccc acactaatga tgtgaaacaa ttaacagagg
3660cagtacaaaa aatagccaca gaaagcatag taatatgggg aaagactcct
aaatttaaat 3720tacccataca aaaggaaaca tgggaagcat ggtggacaga
gtattggcaa gccacctgga 3780ttcctgagtg ggagtttgtc aatacccctc
ccttagtgaa gttatggtac cagttagaga 3840aagaacccat aataggagca
gaaactttct atgtagatgg ggcagccaat agggaaacta 3900aattaggaaa
agcaggatat gtaactgaca gaggaagaca aaaagttgtc cccctaacgg
3960acacaacaaa tcagaagact gagttacaag caattcatct agctttgcag
gattcgggat 4020tagaagtaaa catagtgaca gactcacaat atgcattggg
aatcattcaa gcacaaccag 4080ataagagtga atcagagtta gtcagtcaaa
taatagagca gttaataaaa aaggaaaaag 4140tctacctggc atgggtacca
gcacacaaag gaattggagg aaatgaacaa gtagataaat 4200tggtcagtgc
tggaatcagg aaagtactat ttttagatgg aatagataag gcccaagaag
4260aacatgagaa atatcacagt aattggagag caatggctag tgattttaac
ctaccacctg 4320tagtagcaaa agaaatagta gccagctgtg ataaatgtca
gctaaaaggg gaagccatgc 4380atggacaagt agactgtagc ccaggaatat
ggcagctaga ttgtacacat ttagaaggaa 4440aagttatctt ggtagcagtt
catgtagcca gtggatatat agaagcagaa gtaattccag 4500cagagacagg
gcaagaaaca gcatacttcc tcttaaaatt agcaggaaga tggccagtaa
4560aaacagtaca tacagacaat ggcagcaatt tcaccagtac tacagttaag
gccgcctgtt 4620ggtgggcggg gatcaagcag gaatttggca ttccctacaa
tccccaaagt caaggagtaa 4680tagaatctat gaataaagaa ttaaagaaaa
ttataggaca ggtaagagat caggctgaac 4740atcttaagac agcagtacaa
atggcagtat tcatccacaa ttttaaaaga aaagggggga 4800ttggggggta
cagtgcaggg gaaagaatag tagacataat agcaacagac atacaaacta
4860aagaattaca aaaacaaatt acaaaaattc aaaattttcg ggtttattac
agggacagca 4920gagatccagt ttggaaagga ccagcaaagc tcctctggaa
aggtgaaggg gcagtagtaa 4980tacaagataa tagtgacata aaagtagtgc
caagaagaaa agcaaagatc atcagggatt 5040atggaaaaca gatggcaggt
gatgattgtg tggcaagtag acaggatgag gattaacaca 5100tggaaaagat
tagtaaaaca ccatatgtat atttcaagga aagctaagga ctggttttat
5160agacatcact atgaaagtac taatccaaaa ataagttcag aagtacacat
cccactaggg 5220gatgctaaat tagtaataac aacatattgg ggtctgcata
caggagaaag agactggcat 5280ttgggtcagg gagtctccat agaatggagg
aaaaagagat atagcacaca agtagaccct 5340gacctagcag accaactaat
tcatctgcac tattttgatt gtttttcaga atctgctata 5400agaaatacca
tattaggacg tatagttagt cctaggtgtg aatatcaagc aggacataac
5460aaggtaggat ctctacagta cttggcacta gcagcattaa taaaaccaaa
acagataaag 5520ccacctttgc ctagtgttag gaaactgaca gaggacagat
ggaacaagcc ccagaagacc 5580aagggccaca gagggagcca tacaatgaat
ggacactaga gcttttagag gaacttaaga 5640gtgaagctgt tagacatttt
cctaggatat ggctccataa cttaggacaa catatctatg 5700aaacttacgg
ggatacttgg gcaggagtgg aagccataat aagaattctg caacaactgc
5760tgtttatcca tttcagaatt gggtgtcgac atagcagaat aggcgttact
cgacagagga 5820gagcaagaaa tggagccagt agatcctaga ctagagccct
ggaagcatcc aggaagtcag 5880cctaaaactg cttgtaccaa ttgctattgt
aaaaagtgtt gctttcattg ccaagtttgt 5940ttcatgacaa aagccttagg
catctcctat ggcaggaaga agcggagaca gcgacgaaga 6000gctcatcaga
acagtcagac tcatcaagct tctctatcaa agcagtaagt agtacatgta
6060atgcaaccta taatagtagc aatagtagca ttagtagtag caataataat
agcaatagtt 6120gtgtggtcca tagtaatcat agaatatagg aaaatattaa
gacaaagaaa aatagacagg 6180ttaattgata gactaataga aagagcagaa
gacagtggca atgagagtga aggagaagta 6240tcagcacttg tggagatggg
ggtggaaatg gggcaccatg ctccttggga tattgatgat 6300ctgtagtgct
acagaaaaat tgtgggtcac agtctattat ggggtacctg tgtggaagga
6360agcaaccacc actctatttt gtgcatcaga tgctaaagca tatgatacag
aggtacataa 6420tgtttgggcc acacatgcct gtgtacccac agaccccaac
ccacaagaag tagtattggt 6480aaatgtgaca gaaaatttta acatgtggaa
aaatgacatg gtagaacaga tgcatgagga 6540tataatcagt ttatgggatc
aaagcctaaa gccatgtgta aaattaaccc cactctgtgt 6600tagtttaaag
tgcactgatt tgaagaatga tactaatacc aatagtagta gcgggagaat
6660gataatggag aaaggagaga taaaaaactg ctctttcaat atcagcacaa
gcataagaga 6720taaggtgcag aaagaatatg cattctttta taaacttgat
atagtaccaa tagataatac 6780cagctatagg ttgataagtt gtaacacctc
agtcattaca caggcctgtc caaaggtatc 6840ctttgagcca attcccatac
attattgtgc cccggctggt tttgcgattc taaaatgtaa 6900taataagacg
ttcaatggaa caggaccatg tacaaatgtc agcacagtac aatgtacaca
6960tggaatcagg ccagtagtat caactcaact gctgttaaat ggcagtctag
cagaagaaga 7020tgtagtaatt agatctgcca atttcacaga caatgctaaa
accataatag tacagctgaa 7080cacatctgta gaaattaatt gtacaagacc
caacaacaat acaagaaaaa gtatccgtat 7140ccagagggga ccagggagag
catttgttac aataggaaaa ataggaaata tgagacaagc 7200acattgtaac
attagtagag caaaatggaa tgccacttta aaacagatag ctagcaaatt
7260aagagaacaa tttggaaata ataaaacaat aatctttaag caatcctcag
gaggggaccc 7320agaaattgta acgcacagtt ttaattgtgg aggggaattt
ttctactgta attcaacaca 7380actgtttaat agtacttggt ttaatagtac
ttggagtact gaagggtcaa ataacactga 7440aggaagtgac acaatcacac
tcccatgcag aataaaacaa tttataaaca tgtggcagga 7500agtaggaaaa
gcaatgtatg cccctcccat cagtggacaa attagatgtt catcaaatat
7560tactgggctg ctattaacaa gagatggtgg taataacaac aatgggtccg
agatcttcag 7620acctggagga ggcgatatga gggacaattg gagaagtgaa
ttatataaat ataaagtagt 7680aaaaattgaa ccattaggag tagcacccac
caaggcaaag agaagagtgg tgcagagaga 7740aaaaagagca gtgggaatag
gagctttgtt ccttgggttc ttgggagcag caggaagcac 7800tatgggcgca
gcgtcaatga cgctgacggt acaggccaga caattattgt ctgatatagt
7860gcagcagcag aacaatttgc tgagggctat tgaggcgcaa cagcatctgt
tgcaactcac 7920agtctggggc atcaaacagc tccaggcaag aatcctggct
gtggaaagat acctaaagga 7980tcaacagctc ctggggattt ggggttgctc
tggaaaactc atttgcacca ctgctgtgcc 8040ttggaatgct agttggagta
ataaatctct ggaacagatt tggaataaca tgacctggat 8100ggagtgggac
agagaaatta acaattacac aagcttaata cactccttaa ttgaagaatc
8160gcaaaaccag caagaaaaga atgaacaaga attattggaa ttagataaat
gggcaagttt 8220gtggaattgg tttaacataa caaattggct gtggtatata
aaattattca taatgatagt 8280aggaggcttg gtaggtttaa gaatagtttt
tgctgtactt tctatagtga atagagttag 8340gcagggatat tcaccattat
cgtttcagac ccacctccca atcccgaggg gacccgacag 8400gcccgaagga
atagaagaag aaggtggaga gagagacaga gacagatcca ttcgattagt
8460gaacggatcc ttagcactta tctgggacga tctgcggagc ctgtgcctct
tcagctacca 8520ccgcttgaga gacttactct tgattgtaac gaggattgtg
gaacttctgg gacgcagggg 8580gtgggaagcc ctcaaatatt ggtggaatct
cctacagtat tggagtcagg aactaaagaa 8640tagtgctgtt aacttgctca
atgccacagc catagcagta gctgagggga cagatagggt 8700tatagaagta
ttacaagcag cttatagagc tattcgccac atacctagaa gaataagaca
8760gggcttggaa aggattttgc tataagatgg gtggcaagtg gtcaaaaagt
agtgtgattg 8820gatggcctgc tgtaagggaa agaatgagac gagctgagcc
agcagcagat ggggtgggag 8880cagtatctcg agacctagaa aaacatggag
caatcacaag tagcaataca gcagctaaca 8940atgctgcttg tgcctggcta
gaagcacaag aggaggaaga ggtgggtttt ccagtcacac 9000ctcaggtacc
tttaagacca atgacttaca aggcagctgt agatcttagc cactttttaa
9060aagaaaaggg gggactggaa gggctaattc actcccaaag aagacaagat
atccttgatc 9120tgtggatcta ccacacacaa ggctacttcc ctgattggca
gaactacaca ccagggccag 9180gggtcagata tccactgacc tttggatggt
gctacaagct agtaccagtt gagccagata 9240aggtagaaga ggccaataaa
ggagagaaca ccagcttgtt acaccctgtg agcctgcatg 9300gaatggatga
ccctgagaga gaagtgttag agtggaggtt tgacagccgc ctagcatttc
9360atcacgtggc ccgagagctg catccggagt acttcaagaa ctgctgacat
cgagcttgct 9420acaagggact ttccgctggg gactttccag ggaggcgtgg
cctgggcggg actggggagt 9480ggcgagccct cagatgctgc atataagcag
ctgctttttg cctgtactgg gtctctctgg 9540ttagaccaga tctgagcctg
ggagctctct ggctaactag ggaacccact gcttaagcct 9600caataaagct
tgccttgagt gcttcaagta gtgtgtgccc gtctgttgtg tgactctggt
9660aactagagat ccctcagacc cttttagtca gtgtggaaaa tctctagcac
ccaggaggta 9720gaggttgcag tgagccaaga tcgcgccact gcattccagc
ctgggcaaga aaacaagact 9780gtctaaaata ataataataa gttaagggta
ttaaatatat ttatacatgg aggtcataaa 9840aatatatata tttgggctgg
gcgcagtggc tcacacctgc gcccggccct ttgggaggcc 9900gaggcaggtg
gatcacctga gtttgggagt tccagaccag cctgaccaac atggagaaac
9960cccttctctg tgtattttta gtagatttta ttttatgtgt attttattca
caggtatttc 10020tggaaaactg aaactgtttt tcctctactc tgataccaca
agaatcatca gcacagagga 10080agacttctgt gatcaaatgt ggtgggagag
ggaggttttc accagcacat gagcagtcag 10140ttctgccgca gactcggcgg
gtgtccttcg gttcagttcc aacaccgcct gcctggagag 10200aggtcagacc
acagggtgag ggctcagtcc ccaagacata aacacccaag acataaacac
10260ccaacaggtc caccccgcct gctgcccagg cagagccgat tcaccaagac
gggaattagg 10320atagagaaag agtaagtcac acagagccgg ctgtgcggga
gaacggagtt ctattatgac 10380tcaaatcagt ctccccaagc attcggggat
cagagttttt aaggataact tagtgtgtag 10440ggggccagtg agttggagat
gaaagcgtag ggagtcgaag gtgtcctttt gcgccgagtc 10500agttcctggg
tgggggccac aagatcggat gagccagttt atcaatccgg gggtgccagc
10560tgatccatgg agtgcagggt ctgcaaaata tctcaagcac tgattgatct
taggttttac 10620aatagtgatg ttaccccagg aacaatttgg ggaaggtcag
aatcttgtag cctgtagctg 10680catgactcct aaaccataat ttcttttttg
tttttttttt tttatttttg agacagggtc 10740tcactctgtc acctaggctg
gagtgcagtg gtgcaatcac agctcactgc agcctcaacg 10800tcgtaagctc
aagcgatcct cccacctcag cctgcctggt agctgagact acaagcgacg
10860ccccagttaa tttttgtatt tttggtagag gcagcgtttt gccgtgtggc
cctggctggt 10920ctcgaactcc tgggctcaag tgatccagcc tcagcctccc
aaagtgctgg gacaaccggg 10980gccagtcact gcacctggcc ctaaaccata
atttctaatc ttttggctaa tttgttagtc 11040ctacaaaggc agtctagtcc
ccaggcaaaa agggggtttg tttcgggaaa gggctgttac 11100tgtctttgtt
tcaaactata aactaagttc ctcctaaact tagttcggcc tacacccagg
11160aatgaacaag gagagcttgg aggttagaag cacgatggaa ttggttaggt
cagatctctt 11220tcactgtctg agttataatt ttgcaatggt ggttcaaaga
ctgcccgctt ctgacaccag 11280tcgctgcatt aatgaatcgg ccaacgcgcg
gggagaggcg gtttgcgtat tgggcgctct 11340tccgcttcct cgctcactga
ctcgctgcgc tcggtcgttc ggctgcggcg agcggtatca 11400gctcactcaa
aggcggtaat acggttatcc acagaatcag gggataacgc aggaaagaac
11460atgtgagcaa aaggccagca aaaggccagg aaccgtaaaa aggccgcgtt
gctggcgttt 11520ttccataggc tccgcccccc tgacgagcat cacaaaaatc
gacgctcaag tcagaggtgg 11580cgaaacccga caggactata aagataccag
gcgtttcccc ctggaagctc cctcgtgcgc 11640tctcctgttc cgaccctgcc
gcttaccgga tacctgtccg cctttctccc ttcgggaagc 11700gtggcgcttt
ctcatagctc acgctgtagg tatctcagtt cggtgtaggt cgttcgctcc
11760aagctgggct gtgtgcacga accccccgtt cagcccgacc gctgcgcctt
atccggtaac 11820tatcgtcttg agtccaaccc ggtaagacac gacttatcgc
cactggcagc agccactggt 11880aacaggatta gcagagcgag gtatgtaggc
ggtgctacag agttcttgaa gtggtggcct 11940aactacggct acactagaag
aacagtattt ggtatctgcg ctctgctgaa gccagttacc 12000ttcggaaaaa
gagttggtag ctcttgatcc ggcaaacaaa ccaccgctgg tagcggtggt
12060ttttttgttt gcaagcagca gattacgcgc agaaaaaaag gatctcaaga
agatcctttg 12120atcttttcta cggggtctga cgctcagtgg aacgaaaact
cacgttaagg gattttggtc 12180atgagattat caaaaaggat cttcacctag
atccttttaa attaaaaatg aagttttaaa 12240tcaatctaaa gtatatatga
gtaaacttgg tctgacagtt accaatgctt aatcagtgag 12300gcacctatct
cagcgatctg tctatttcgt tcatccatag ttgcctgact ccccgtcgtg
12360tagataacta cgatacggga gggcttacca tctggcccca gtgctgcaat
gataccgcga 12420gacccacgct caccggctcc agatttatca gcaataaacc
agccagccgg aagggccgag 12480cgcagaagtg gtcctgcaac tttatccgcc
tccatccagt ctattaattg ttgccgggaa 12540gctagagtaa gtagttcgcc
agttaatagt ttgcgcaacg ttgttgccat tgctacaggc 12600atcgtggtgt
cacgctcgtc gtttggtatg gcttcattca gctccggttc ccaacgatca
12660aggcgagtta catgatcccc catgttgtgc aaaaaagcgg ttagctcctt
cggtcctccg 12720atcgttgtca gaagtaagtt ggccgcagtg ttatcactca
tggttatggc agcactgcat 12780aattctctta ctgtcatgcc atccgtaaga
tgcttttctg tgactggtga gtactcaacc 12840aagtcattct gagaatagtg
tatgcggcga ccgagttgct cttgcccggc gtcaatacgg 12900gataataccg
cgccacatag cagaacttta aaagtgctca tcattggaaa acgttcttcg
12960gggcgaaaac tctcaaggat cttaccgctg ttgagatcca gttcgatgta
acccactcgt 13020gcacccaact gatcttcagc atcttttact ttcaccagcg
tttctgggtg agcaaaaaca 13080ggaaggcaaa atgccgcaaa aaagggaata
agggcgacac ggaaatgttg aatactcata 13140ctcttccttt ttcaatatta
ttgaagcatt tatcagggtt attgtctcat gagcggatac 13200atatttgaat
gtatttagaa aaataaacaa ataggggttc cgcgcacatt tccccgaaaa
13260gtgccacctg acgtctaaga aaccattatt atcatgacat taacctataa
aaataggcgt 13320atcacgaggc cctttcgtct cgcgcgtttc ggtgatgacg
gtgaaaacct ctgacacatg 13380cagctcccgg agacggtcac agcttgtctg
taagcggatg ccgggagcag acaagcccgt 13440cagggcgcgt cagcgggtgt
tggcgggtgt cggggctggc ttaactatgc ggcatcagag 13500cagattgtac
tgagagtgca ccatatgcgg tgtgaaatac cgcacagatg cgtaaggaga
13560aaataccgca tcaggcgcca ttcgccattc aggctgcgca actgttggga
agggcgatcg 13620gtgcgggcct cttcgctatt acgccagggg aggcagagat
tgcagtaagc tgagatcgca 13680gcactgcact ccagcctggg cgacagagta
agactctgtc tcaaaaataa aataaataaa 13740tcaatcagat attccaatct
tttcctttat ttatttattt attttctatt ttggaaacac 13800agtccttcct
tattccagaa ttacacatat attctatttt tctttatatg ctccagtttt
13860ttttagacct tcacctgaaa tgtgtgtata caaaatctag gccagtccag
cagagcctaa 13920aggtaaaaaa taaaataata aaaaataaat aaaatctagc
tcactccttc acatcaaaat 13980ggagatacag ctgttagcat taaataccaa
ataacccatc ttgtcctcaa taattttaag 14040cgcctctctc caccacatct
aactcctgtc aaaggcatgt gccccttccg ggcgctctgc 14100tgtgctgcca
accaactggc atgtggactc tgcagggtcc ctaactgcca agccccacag
14160tgtgccctga ggctgcccct tccttctagc ggctgccccc actcggcttt
gctttcccta 14220gtttcagtta cttgcgttca gccaaggtct gaaactaggt
gcgcacagag cggtaagact 14280gcgagagaaa gagaccagct ttacaggggg
tttatcacag tgcaccctga cagtcgtcag 14340cctcacaggg ggtttatcac
attgcaccct gacagtcgtc agcctcacag ggggtttatc 14400acagtgcacc
cttacaatca ttccatttga ttcacaattt ttttagtctc tactgtgcct
14460aacttgtaag ttaaatttga tcagaggtgt gttcccagag gggaaaacag
tatatacagg 14520gttcagtact atcgcatttc aggcctccac ctgggtcttg
gaatgtgtcc cccgaggggt 14580gatgactacc tcagttggat ctccacaggt
cacagtgaca caagataacc aagacacctc 14640ccaaggctac cacaatgggc
cgccctccac gtgcacatgg ccggaggaac tgccatgtcg 14700gaggtgcaag
cacacctgcg catcagagtc cttggtgtgg agggagggac cagcgcagct
14760tccagccatc cacctgatga acagaaccta gggaaagccc cagttctact
tacaccagga 14820aaggc 1482538510535DNASimian immunodeficiency virus
385gcatgcacat tttaaaggct tttgctaaat atagccaaaa gtccttctac
aaattttcta 60agagttctga ttcaaagcag taacaggcct tgtctcatca tgaactttgg
catttcatct 120acagctaagt ttatatcata aatagttctt tacaggcagc
accaacttat acccttatag 180catactttac tgtgtgaaaa ttgcatcttt
cattaagctt actgtaaatt tactggctgt 240cttccttgca ggtttctgga
agggatttat tacagtgcaa gaagacatag aatcttagac 300atatacttag
aaaaggaaga aggcatcata ccagattggc aggattacac ctcaggacca
360ggaattagat acccaaagac atttggctgg ctatggaaat tagtccctgt
aaatgtatca 420gatgaggcac aggaggatga ggagcattat ttaatgcatc
cagctcaaac ttcccagtgg 480gatgaccctt ggggagaggt tctagcatgg
aagtttgatc caactctggc ctacacttat 540gaggcatatg ttagataccc
agaagagttt ggaagcaagt caggcctgtc agaggaagag 600gttagaagaa
ggctaaccgc aagaggcctt cttaacatgg ctgacaagaa ggaaactcgc
660tgaaacagca gggactttcc acaaggggat gttacgggga ggtactgggg
aggagccggt 720cgggaacgcc cactttcttg atgtataaat atcactgcat
ttcgctctgt attcagtcgc 780tctgcggaga ggctggcaga ttgagccctg
ggaggttctc tccagcacta gcaggtagag 840cctgggtgtt ccctgctaga
ctctcaccag cacttggccg gtgctgggca gagtgactcc 900acgcttgctt
gcttaaagcc ctcttcaata aagctgccat tttagaagta agctagtgtg
960tgttcccatc tctcctagcc gccgcctggt caactcggta ctcaataata
agaagaccct 1020ggtctgttag gaccctttct gctttgggaa accgaagcag
gaaaatccct agcagattgg 1080cgcctgaaca gggacttgaa ggagagtgag
agactcctga gtacggctga gtgaaggcag 1140taagggcggc aggaaccaac
cacgacggag tgctcctata aaggcgcggg tcggtaccag 1200acggcgtgag
gagcgggaga ggaagaggcc tccggttgca ggtaagtgca acacaaaaaa
1260gaaatagctg tcttttatcc aggaaggggt aataagatag agtgggagat
gggcgtgaga 1320aactccgtct tgtcagggaa gaaagcagat gaattagaaa
aaattaggct acgacccaac 1380ggaaagaaaa agtacatgtt gaagcatgta
gtatgggcag caaatgaatt agatagattt 1440ggattagcag aaagcctgtt
ggagaacaaa gaaggatgtc aaaaaatact ttcggtctta 1500gctccattag
tgccaacagg ctcagaaaat ttaaaaagcc tttataatac tgtctgcgtc
1560atctggtgca ttcacgcaga agagaaagtg aaacacactg aggaagcaaa
acagatagtg 1620cagagacacc tagtggtgga aacaggaaca acagaaacta
tgccaaaaac aagtagacca 1680acagcaccat ctagcggcag aggaggaaat
tacccagtac aacaaatagg tggtaactat 1740gtccacctgc cattaagccc
gagaacatta aatgcctggg taaaattgat agaggaaaag 1800aaatttggag
cagaagtagt gccaggattt caggcactgt cagaaggttg caccccctat
1860gacattaatc agatgttaaa ttgtgtggga gaccatcaag cggctatgca
gattatcaga 1920gatattataa acgaggaggc tgcagattgg gacttgcagc
acccacaacc agctccacaa 1980caaggacaac ttagggagcc gtcaggatca
gatattgcag gaacaactag ttcagtagat 2040gaacaaatcc agtggatgta
cagacaacag aaccccatac cagtaggcaa catttacagg 2100agatggatcc
aactggggtt gcaaaaatgt gtcagaatgt ataacccaac aaacattcta
2160gatgtaaaac aagggccaaa agagccattt cagagctatg tagacaggtt
ctacaaaagt 2220ttaagagcag aacagacaga tgcagcagta aagaattgga
tgactcaaac actgctgatt 2280caaaatgcta acccagattg caagctagtg
ctgaaggggc tgggtgtgaa tcccacccta 2340gaagaaatgc tgacggcttg
tcaaggagta ggggggccgg gacagaaggc tagattaatg 2400gcagaagccc
tgaaagaggc cctcgcacca gtgccaatcc cttttgcagc agcccaacag
2460aggggaccaa gaaagccaat taagtgttgg aattgtggga aagagggaca
ctctgcaagg 2520caatgcagag ccccaagaag acagggatgc tggaaatgtg
gaaaaatgga ccatgttatg 2580gccaaatgcc cagacagaca ggcgggtttt
ttaggccttg gtccatgggg aaagaagccc 2640cgcaatttcc ccatggctca
agtgcatcag gggctgatgc caactgctcc cccagaggac 2700ccagctgtgg
atctgctaaa gaactacatg cagttgggca agcagcagag agaaaagcag
2760agagaaagca gagagaagcc ttacaaggag gtgacagagg atttgctgca
cctcaattct 2820ctctttggag gagaccagta gtcactgctc atattgaagg
acagcctgta gaagtattac 2880tggatacagg ggctgatgat tctattgtaa
caggaataga gttaggtcca cattataccc 2940caaaaatagt aggaggaata
ggaggtttta ttaatactaa agaatacaaa aatgtagaaa 3000tagaagtttt
aggcaaaagg attaaaggga caatcatgac aggggacacc ccgattaaca
3060tttttggtag aaatttgcta acagctctgg ggatgtctct aaattttccc
atagctaaag 3120tagagcctgt aaaagtcgcc ttaaagccag gaaaggatgg
accaaaattg aagcagtggc 3180cattatcaaa agaaaagata gttgcattaa
gagaaatctg tgaaaagatg gaaaaggatg 3240gtcagttgga ggaagctccc
ccgaccaatc catacaacac ccccacattt gctataaaga 3300aaaaggataa
gaacaaatgg agaatgctga tagattttag ggaactaaat agggtcactc
3360aggactttac ggaagtccaa ttaggaatac cacaccctgc aggactagca
aaaaggaaaa 3420gaattacagt actggatata ggtgatgcat atttctccat
acctctagat gaagaattta 3480ggcagtacac tgcctttact ttaccatcag
taaataatgc agagccagga aaacgataca 3540tttataaggt tctgcctcag
ggatggaagg ggtcaccagc catcttccaa tacactatga 3600gacatgtgct
agaacccttc aggaaggcaa atccagatgt gaccttagtc cagtatatgg
3660atgacatctt aatagctagt gacaggacag acctggaaca tgacagggta
gttttacagt 3720caaaggaact cttgaatagc atagggtttt ctaccccaga
agagaaattc caaaaagatc 3780ccccatttca atggatgggg tacgaattgt
ggccaacaaa atggaagttg caaaagatag 3840agttgccaca aagagagacc
tggacagtga atgatataca gaagttagta ggagtattaa 3900attgggcagc
tcaaatttat ccaggtataa aaaccaaaca tctctgtagg ttaattagag
3960gaaaaatgac tctaacagag gaagttcagt ggactgagat ggcagaagca
gaatatgagg 4020aaaataaaat aattctcagt caggaacaag aaggatgtta
ttaccaagaa ggcaagccat 4080tagaagccac ggtaataaag agtcaggaca
atcagtggtc ttataaaatt caccaagaag 4140acaaaatact gaaagtagga
aaatttgcaa agataaagaa tacacatacc aatggagtga 4200gactattagc
acatgtaata cagaaaatag gaaaggaagc aatagtgatc tggggacagg
4260tcccaaaatt ccacttacca gttgagaagg atgtatggga acagtggtgg
acagactatt 4320ggcaggtaac ctggataccg gaatgggatt ttatctcaac
accaccgcta gtaagattag 4380tcttcaatct agtgaaggac cctatagagg
gagaagaaac ctattataca gatggatcat 4440gtaataaaca gtcaaaagaa
gggaaagcag gatatatcac agataggggc aaagacaaag 4500taaaagtgtt
agaacagact actaatcaac aagcagaatt ggaagcattt ctcatggcat
4560tgacagactc agggccaaag gcaaatatta tagtagattc acaatatgtt
atgggaataa 4620taacaggatg ccctacagaa tcagagagca ggctagttaa
tcaaataata gaagaaatga 4680ttaaaaagtc agaaatttat gtagcatggg
taccagcaca caaaggtata ggaggaaacc 4740aagaaataga ccacctagtt
agtcaaggga ttagacaagt tctcttcttg gaaaagatag 4800agccagcaca
agaagaacat gataaatacc atagtaatgt aaaagaattg gtattcaaat
4860ttggattacc cagaatagtg gccagacaga tagtagacac ctgtgataaa
tgtcatcaga 4920aaggagaggc tatacatggg caggcaaatt cagatctagg
gacttggcaa atggattgta 4980cccatctaga gggaaaaata atcatagttg
cagtacatgt agctagtgga ttcatagaag 5040cagaggtaat tccacaagag
acaggaagac agacagcact atttctgtta aaattggcag 5100gcagatggcc
tattacacat ctacacacag ataatggtgc taactttgct tcgcaagaag
5160taaagatggt tgcatggtgg gcagggatag agcacacctt tggggtacca
tacaatccac 5220agagtcaggg agtagtggaa gcaatgaatc accacctgaa
aaatcaaata gatagaatca 5280gggaacaagc aaattcagta gaaaccatag
tattaatggc agttcattgc atgaatttta 5340aaagaagggg aggaataggg
gatatgactc cagcagaaag attaattaac atgatcacta 5400cagaacaaga
gatacaattt caacaatcaa aaaactcaaa atttaaaaat tttcgggtct
5460attacagaga aggcagagat caactgtgga agggacccgg tgagctattg
tggaaagggg 5520aaggagcagt catcttaaag gtagggacag acattaaggt
agtacccaga agaaaggcta 5580aaattatcaa agattatgga ggaggaaaag
aggtggatag cagttcccac atggaggata 5640ccggagaggc tagagaggtg
gcatagcctc ataaaatatc tgaaatataa aactaaagat 5700ctacaaaagg
tttgctatgt gccccatttt aaggtcggat gggcatggtg gacctgcagc
5760agagtaatct tcccactaca ggaaggaagc catttagaag tacaagggta
ttggcatttg 5820acaccagaaa aagggtggct cagtacttat gcagtgagga
taacctggta ctcaaagaac 5880ttttggacag atgtaacacc aaactatgca
gacattttac tgcatagcac ttatttccct 5940tgctttacag cgggagaagt
gagaagggcc atcaggggag aacaactgct gtcttgctgc 6000aggttcccga
gagctcataa gtaccaggta ccaagcctac agtacttagc actgaaagta
6060gtaagcgatg tcagatccca gggagagaat cccacctgga aacagtggag
aagagacaat 6120aggagaggcc ttcgaatggc taaacagaac agtagaggag
ataaacagag aggcggtaaa 6180ccacctacca agggagctaa ttttccaggt
ttggcaaagg tcttgggaat actggcatga 6240tgaacaaggg atgtcaccaa
gctatgtaaa atacagatac ttgtgtttaa tacaaaaggc 6300tttatttatg
cattgcaaga aaggctgtag atgtctaggg gaaggacatg gggcaggggg
6360atggagacca ggacctcctc ctcctccccc tccaggacta gcataaatgg
aagaaagacc 6420tccagaaaat gaaggaccac aaagggaacc atgggatgaa
tgggtagtgg aggttctgga 6480agaactgaaa gaagaagctt taaaacattt
tgatcctcgc ttgctaactg cacttggtaa 6540tcatatctat aatagacatg
gagacaccct tgagggagca ggagaactca ttagaatcct 6600ccaacgagcg
ctcttcatgc atttcagagg cggatgcatc cactccagaa tcggccaacc
6660tgggggagga aatcctctct cagctatacc gccctctaga agcatgctat
aacacatgct 6720attgtaaaaa gtgttgctac cattgccagt tttgttttct
taaaaaaggc ttggggatat 6780gttatgagca atcacgaaag agaagaagaa
ctccgaaaaa ggctaaggct aatacatctt 6840ctgcatcaaa caagtaagta
tgggatgtct tgggaatcag ctgcttatcg ccatcttgct 6900tttaagtgtc
tatgggatct attgtactct atatgtcaca gtcttttatg gtgtaccagc
6960ttggaggaat gcgacaattc ccctcttttg tgcaaccaag aatagggata
cttggggaac 7020aactcagtgc ctaccagata atggtgatta ttcagaagtg
gcccttaatg ttacagaaag 7080ctttgatgcc tggaataata cagtcacaga
acaggcaata gaggatgtat ggcaactctt 7140tgagacctca ataaagcctt
gtgtaaaatt atccccatta tgcattacta tgagatgcaa 7200taaaagtgag
acagatagat ggggattgac aaaatcaata acaacaacag catcaacaac
7260atcaacgaca gcatcagcaa aagtagacat ggtcaatgag actagttctt
gtatagccca 7320ggataattgc acaggcttgg aacaagagca aatgataagc
tgtaaattca acatgacagg 7380gttaaaaaga gacaagaaaa aagagtacaa
tgaaacttgg tactctgcag atttggtatg 7440tgaacaaggg aataacactg
gtaatgaaag tagatgttac atgaaccact gtaacacttc 7500tgttatccaa
gagtcttgtg acaaacatta ttgggatgct attagattta ggtattgtgc
7560acctccaggt tatgctttgc ttagatgtaa tgacacaaat tattcaggct
ttatgcctaa 7620atgttctaag gtggtggtct cttcatgcac aaggatgatg
gagacacaga cttctacttg 7680gtttggcttt aatggaacta gagcagaaaa
tagaacttat atttactggc atggtaggga 7740taataggact ataattagtt
taaataagta ttataatcta acaatgaaat gtagaagacc 7800aggaaataag
acagttttac cagtcaccat tatgtctgga ttggttttcc actcacaacc
7860aatcaatgat aggccaaagc aggcatggtg ttggtttgga ggaaaatgga
aggatgcaat 7920aaaagaggtg aagcagacca ttgtcaaaca tcccaggtat
actggaacta acaatactga 7980taaaatcaat ttgacggctc ctggaggagg
agatccggaa gttaccttca tgtggacaaa 8040ttgcagagga gagttcctct
actgtaaaat gaattggttt ctaaattggg tagaagatag 8100gaatacagct
aaccagaagc caaaggaaca gcataaaagg aattacgtgc catgtcatat
8160tagacaaata atcaacactt ggcataaagt aggcaaaaat gtttatttgc
ctccaagaga 8220gggagacctc acgtgtaact ccacagtgac cagtctcata
gcaaacatag attggattga 8280tggaaaccaa actaatatca ccatgagtgc
agaggtggca gaactgtatc gattggaatt 8340gggagattat aaattagtag
agatcactcc aattggcttg gcccccacag atgtgaagag 8400gtacactact
ggtggcacct caagaaataa aagaggggtc tttgtgctag ggttcttggg
8460ttttctcgca acggcaggtt ctgcaatggg cgcggcgtcg ttgacgctga
ccgctcagtc 8520ccgaacttta ttggctggga tagtgcagca acagcaacag
ctgttggacg tggtcaagag 8580acaacaagaa ttgttgcgac tgaccgtctg
gggaacaaag aacctccaga ctagggtcac 8640tgccatcgag aagtacttaa
aggaccaggc gcagctgaat gcttggggat gtgcgtttag 8700acaagtctgc
cacactactg taccatggcc aaatgcaagt ctaacaccaa agtggaacaa
8760tgagacttgg caagagtggg agcgaaaggt tgacttcttg gaagaaaata
taacagccct 8820cctagaggag gcacaaattc aacaagagaa gaacatgtat
gaattacaaa agttgaatag 8880ctgggatgtg tttggcaatt ggtttgacct
tgcttcttgg ataaagtata tacaatatgg 8940agtttatata gttgtaggag
taatactgtt aagaatagtg atctatatag tacaaatgct 9000agctaagtta
aggcaggggt ataggccagt gttctcttcc ccaccctctt atttccagca
9060gacccatatc caacaggacc cggcactgcc aaccagagaa ggcaaagaaa
gagacggtgg 9120agaaggcggt ggcaacagct cctggccttg gcagatagaa
tatattcatt tcctgatccg 9180ccaactgata cgcctcttga cttggctatt
cagcaactgc agaaccttgc tatcgagagt 9240ataccagatc ctccaaccaa
tactccagag gctctctgcg accctacaga ggattcgaga 9300agtcctcagg
actgaactga cctacctaca atatgggtgg agctatttcc atgaggcggt
9360ccaggccgtc tggagatctg cgacagagac tcttgcgggc gcgtggggag
acttatggga 9420gactcttagg agaggtggaa gatggatact cgcaatcccc
aggaggatta gacaagggct 9480tgagctcact ctcttgtgag ggacagaaat
acaatcaggg acagtatatg aatactccat 9540ggagaaaccc agctgaagag
agagaaaaat tagcatacag aaaacaaaat atggatgata 9600tagatgagta
agatgatgac ttggtagggg tatcagtgag gccaaaagtt cccctaagaa
9660caatgagtta caaattggca atagacatgt ctcattttat aaaagaaaag
gggggactgg 9720aagggattta ttacagtgca agaagacata gaatcttaga
catatactta gaaaaggaag 9780aaggcatcat accagattgg caggattaca
cctcaggacc aggaattaga tacccaaaga 9840catttggctg gctatggaaa
ttagtccctg taaatgtatc agatgaggca caggaggatg 9900aggagcatta
tttaatgcat ccagctcaaa cttcccagtg ggatgaccct tggggagagg
9960ttctagcatg gaagtttgat ccaactctgg cctacactta tgaggcatat
gttagatacc 10020cagaagagtt tggaagcaag tcaggcctgt cagaggaaga
ggttagaaga aggctaaccg 10080caagaggcct tcttaacatg gctgacaaga
aggaaactcg ctgaaacagc agggactttc 10140cacaagggga tgttacgggg
aggtactggg gaggagccgg tcgggaacgc ccactttctt 10200gatgtataaa
tatcactgca tttcgctctg tattcagtcg ctctgcggag aggctggcag
10260attgagccct gggaggttct ctccagcact agcaggtaga gcctgggtgt
tccctgctag 10320actctcacca gcacttggcc ggtgctgggc agagtgactc
cacgcttgct tgcttaaagc 10380cctcttcaat aaagctgcca ttttagaagt
aagctagtgt gtgttcccat ctctcctagc 10440cgccgcctgg tcaactcggt
actcaataat aagaagaccc tggtctgtta ggaccctttc 10500tgctttggga
aaccgaagca ggaaaatccc tagca 10535
386 9713 DNA Human immunodeficiency virus 2
386
agtcgctctg cggagaggct ggcagattga gccctgggag gttctctcca
gcactagcag 60gtagagcctg ggtgttccct gctagactct caccggtgct tggccggcac
tgggcagacg 120gctccacgct tgcttgctta aaagacctct taataaagct
gccagttaga agcaagttaa 180gtgtgtgttc ccatctctcc tagtcgccgc
ctggtcattc ggtgttcatc tgaataacaa 240gaccctggtc tgttaggacc
ctttctgctt tgggaaacca aagcaggaaa atccctagca 300ggttggcgcc
cgaacaggga cttagagaag actgaaaagc cttggaacac ggctgagtga
360aggcagtaag ggcggcagga acaaaccacg acggagtgct cctagaaagg
cgcaggccaa 420ggtaccaaag gcggcgtgtg gagcgggagt aaagaggcct
ccgggtgaag gtaagtacct 480acaccaaaaa attgtagcca ggaagggctt
gttatcctac ctttagacag gtagaagatt 540gtgggagatg ggcgcgagaa
actccgtctt gaaagggaaa aaagcagacg aattagaaac 600aattaggtta
cggcccggcg gaaagaaaaa atacaggcta aagcatattg tgtgggcagc
660gaatgaattg gacagattcg gattagcaga gagcctgttg gagtcaaaag
aaggttgcca 720aagaattctt acagttttag gtccattagt accgacaggt
tcagaaaatt taaaaagcct 780ttttaatact gtctgcgtca tttggtgcat
acacgcagaa gagaaagtga aagatactga 840aggagcaaaa caaatagtac
agagacatct agcggcagaa acaggaactg cagagaaaat 900gccaaataca
agtagaccaa cagcaccacc tagcgggaag ggaggaaact tccccgtaca
960acaagtaggc ggcaattata cccatgtgcc gctgagtcct cgaaccctaa
atgcttgggt 1020aaaattagta gaggaaaaga agttcggggc agaggtagtg
ccaggatttc aggcactctc 1080agaaggctgc acgccctatg atatcaacca
aatgcttaat tgtgtgggcg accatcaagc 1140agctatgcaa ataatcaggg
agatcgttaa tgaagaagca gcagattggg atgtgcaaca 1200tccaatacca
ggtcccttac cagcggggca gcttagagaa ccaagagggt ctgacatagc
1260agggacaaca agcacagtag atgaacagat ccagtggatg tttaggccac
aaaatcccgt 1320accagtggga aacatctata ggagatggat ccagatagga
ctgcagaagt gcgtcaggat 1380gtacaacccg accaacatcc tagacataaa
acaaggacca aaggaaccat tccaaagtta 1440tgtagataga ttctacaaaa
gcttgagggc agaacaaaca gatccagcag tgaagaattg
1500gatgacccag acactactag tacagaatgc caacccagac tgtaaattag
tactaaaagg 1560actagggatg aatcctacct tagaagagat gctaaccgcc
tgccaagggg taggtgggcc 1620aggccagaaa gctagactaa tggcagaagc
cttaaaagag gccttgacac cagcccctat 1680cccatttgca gcagcccagc
agaaaaggac aattaaatgc tggaattgtg gaaaggaagg 1740acactcggca
agacaatgcc gagcacctag aagacagggc tgctggaagt gtggtaaacc
1800aggacatgtc atagcaaatt gcccagatag acaggtgggt tttttaggga
tgggcccccg 1860gggaaagaag ccccgcaact tccccgtggc ccaagtcccg
caggggctaa caccaacagc 1920acccccagta gatccagcag tggacctact
ggagaattat atgcagcaag gaaaaagaca 1980aagagaacag agagagagac
catacaaaga agtgacagag gacttactgc acctcgagca 2040gggagaggca
ccatgcagag agacgacaga ggacttgctg cacctcaatt ctctcttttg
2100aaaagaccag tagtcacggc atacgtcgag ggccagccag tagaagttct
gctagacacg 2160ggggctgacg actcaatagt agcagggata gagttaggga
gcaattatag tccaaagata 2220gtaggaggaa tagggggatt cataaatacc
aaggaatata aaaatgtaaa aatagaagtt 2280ttaggtaaaa aggtaagggc
caccataatg acaggtgaca ccccaatcaa catttttggc 2340agaaatattc
tgacagcctt aggcatgtca ttaaatttac cagtcgccaa aatagaacca
2400ataaaaataa tgttaaagcc aggaaaagat ggaccaaaac tgaggcaatg
gcccttaaca 2460aaagaaaaaa tagaggcact aaaagaaatc tgtgaaaaaa
tggaaagaga aggccagcta 2520gaggaagcgc ctccaactaa tccttataac
acccccacat ttgcaatcaa gaaaaaggac 2580aaaaataaat ggaggatgct
aatagatttt agagaactaa acaaggtaac tcaagatttc 2640acagaaattc
agttaggaat tccacaccca gcaggattgg ccaagaaaaa aagaattact
2700gtactagata taggggatgc ttacttttcc ataccactac atgaagactt
tagacagtat 2760actgcattta ctttaccatc aataaacaat gcagaaccag
gaaaaagata tatatataag 2820gtcctgcctc agggatggaa ggggtcacca
gcaatttttc aatacacaat gaggcaggtc 2880ttagaaccat tcagaaaagc
aaacctagat gtcattatca ttcagtacat ggatgatatc 2940ctaatagcta
gtgacaggac agatctagaa catgacaagg tggtcctgca gctaaaggaa
3000cttctaaata acctaggatt ttctacccca gatgagaagt tccaaaagga
ccctccatac 3060cactggatgg gctatgaact gtggccaact aagtggaagc
tgcagaagat acagttgccc 3120caaaaagatg tatggacagt aaatgacatc
caaaagttag tgggtgtctt aaactgggca 3180gcacaaatct acccagggat
aaaaaccaga cacttatgta agctaattag aggaaaaatg 3240acactcacag
aagaagtaca gtggacagaa ctagcagagg cggagttaga agagaacaag
3300attatcttaa gccaggagca agagggacac tattaccaag aagaaaaaga
gttagaagca 3360acagtccaaa aggatcaaga caatcagtgg acatataaag
tacaccaggg agagaaaatt 3420ctaaaagtag ggaaatatgc aaagataaaa
aatacccata ccaatggggt cagattgtta 3480gcacaagtag ttcaaaagat
aggaaaagaa gcactaatca tttggggacg aataccaaaa 3540tttcacctac
cagtagaaag agagacatgg gaacagtggt gggatgacta ctggcaggtg
3600acatggatcc ctgactggga cttcgtatct accccgccgc tggtcagact
agcatttaac 3660ctggtaaaag atcctatacc aagaacagag actttctaca
cagatggatc ctgcaatagg 3720caatcaaagg aaggaaaagc aggatatgta
acagatagag ggagagacaa ggtaaggatg 3780ctagaacaaa ctaccaatca
gcaagcagaa ttagaagcct ttgcaatggc actaacagac 3840tcaggtccaa
aagccaatat tatagtagac tcacagtatg taatggggat agtagcaggc
3900cagccaacag aatcagagag tagaatagta aatcaaatca tagaggagat
gataaaaaag 3960gaagcaatct atgttgcatg ggtcccagcc cataaaggca
taggagggaa tcaggaggta 4020gatcagttag taagtcaggg catcagacaa
gtgttgttcc tggaaaaaat agagcccgct 4080caggaagaac atgagaaata
ccatagcaat gtaaaagaac tatcccataa atttggattg 4140cccaaattag
tagcaagaca aatagtaaac acatgtgccc aatgtcaaca gaaaggggag
4200gctatacatg ggcaagtaga tgcagaatta ggcacttggc aaatggactg
cacacactta 4260gaaggaaaga tcattatagt agcagtacat gttgcaagtg
gattcataga agcagaagtc 4320atcccacagg aatcaggaag gcagacagca
ctcttcctat taaaactggc cagtaggtgg 4380ccaataacac acttgcacac
agataatggt gccaacttca cttcacagga agtaaaaatg 4440gtagcatggt
gggtaggtat agaacaatct ttcggagtac cttacaatcc acaaagccaa
4500ggagtagtag aagcaatgaa tcaccaccta aaaaatcaga taagtagaat
tagagaacag 4560gcaaatacag tagaaacaat agtactgatg gcaacacact
gcatgaattt taaaagaagg 4620ggaggaatag gggatatgac cccagcagaa
agactaatca atatgatcac cacagaacaa 4680gaaatacaat tcctccacgc
caaaaattca aaattaaaaa attttcgggt ctatttcaga 4740gaaggcagag
atcagctgtg gaaaggaccc ggggaactac tgtggaaggg agacggagca
4800gtcatagtca aggtagggac agacataaaa gtagtaccaa ggaggaaagc
caagatcatc 4860aaagactatg gaggaaggca agaactggat agtggttccc
acttggaggg tgccagggag 4920gatggagaaa tggcatagcc ttgtcaaata
tctaaaatac agaacaaaag atctagaaga 4980cgtgtgctat gttccccacc
ataaagtagg atgggcatgg tggacttgca gcagggtaat 5040attcccatta
aagggaaaca gtcatctaga aatacaggca tattggaacc taacgccaga
5100aaaaggatgg ctctcctctt attcagtaag aatgacttgg tatacggaaa
ggttctggac 5160agatgttacc ccagactgtg cagactccct aatacatagc
acttatttct cttgctttac 5220agcaggtgaa gtaagaagag ccatcagagg
ggaaaagtta ttgtcctgct gcaattatcc 5280ccaagcccat agagcccagg
taccgtcact ccaatttttg gccttagtgg tagtgcagca 5340aaatgacaga
ccccagagaa acggtacccc caggaaacag tggcgaagag actatcgaag
5400aggccttcaa ttggctagac aggacggtag aagccataaa cagagaggca
gtgaatcacc 5460tgccccgaga gcttattttc caggtgtggc agaggtcctg
gagatactgg catgatgaac 5520aagggatgtc acaaagttac acaaagtata
gatatttgtg cttaatacag aaggctatgt 5580tcacacattg taagagaggg
tgcacttgcc tggggggagg acatgggcca ggagggtgga 5640gaccaggacc
tccccctcct ccccctccag gtctagtcta atgactgaag caccaacaga
5700gtttcccccg gaggatggga ccccaccgag ggaaccaggg gatgagtgga
taatagaaat 5760cctgagaaaa ataaagaaag aagctttaaa gcattttgac
cctcgcttgc taactgctct 5820tggcaactat atccatacta gacatggaga
cacccttgaa ggcgccagag agctcattaa 5880tgtcctacaa cgagccctct
tcatgcactt cagagcggga tgtaggctct caagaattgg 5940ccaaacaggg
ggaagaactc ctttcccagc tacatcgacc cctagaacca tgcaataaca
6000aatgctattg taaaggatgc tgcttccact gccagctgtg ttttttaaac
aaggggctcg 6060ggatatgtta tgaccggaag ggcagacgaa gaagaactcc
gaagaaaact aaggctcatt 6120catcttctgc atcagacaag tgagtatgat
gggtggtaga aatcagctgc ttgttgccat 6180tttgctaact agtacttgct
tgatatattg caccaattat gtgactgttt tctatggcat 6240acccgcgtgg
agaaatgcat ccattcccct cttttgtgca accaagaata gggatacttg
6300gggaaccata cagtgcttgc cagacaatga tgattatcag gagataactt
tgaatgtgac 6360agaggctttc gatgcatggg ataatacagt aacagaacaa
gcaatagaag atgtctggaa 6420tctatttgag acatcaataa aaccatgtgt
caaattaacg cctttatgtg tagcaatgag 6480atgtaacaac acagatgcaa
ggaacacaac cacacccaca acagcatccc cgcgtacaat 6540aaaacccgtg
acagagataa gtgagaattc ctcatgcata cgcgcaaaca actgctcagg
6600attgggagaa gaagaggtgg tcaattgtca attcaatatg acaggattag
agagagataa 6660gaaaaagcaa tatagtgaga catggtactc gaaggatgta
gtttgtgaag gaaatggcac 6720cacagataca tgttacatga accattgcaa
cacatcggtc atcacagagt catgtgacaa 6780gcactattgg gatgctatga
ggtttagata ctgtgcacca ccaggttttg ccctactaag 6840atgcaatgat
accaattatt caggctttgc gcccaattgc tctaaggtag tagctgctac
6900atgcaccaga atgatggaaa cgcaaacttc tacatggttt ggctttaatg
gcactagagc 6960agaaaataga acatttatct attggcatgg tagggataac
agaactatca tcagcttaaa 7020caaatattat aatctcacta tacattgtaa
gaggccagga aataagacag tggtaccaat 7080aacacttatg tcagggttaa
ggtttcactc ccagccggtc atcaataaaa gacccagaca 7140agcatggtgt
tggttcaaag gtgaatggaa gggagccatg caggaggtga aggaaaccct
7200tgcaaaacat cccaggtata aaggaaccaa tgaaacaaag aatattaact
ttacagcacc 7260aggaaagggc tcagacccag aggtggcata catgtggact
aactgcagag gagaatttct 7320ctactgcaac atgacttggt tcctcaattg
gatagaaaat aagacacacc gcaattatgt 7380accgtgccat ataagacaaa
taattaacac ctggcataag gtagggaaaa atgtatattt 7440gcctcccagg
gaaggggagt tgacctgcaa ctcaacagta actagcataa ttgctaacat
7500tgatgcaaat ggaaataata caaatattac ctttagtgca gaggtggcag
aactataccg 7560attagagttg ggagattata aattggtaga aataacacca
attggcttcg cacctacagc 7620agaaaaaaga tactcctcta ctccaatgag
gaacaagaga ggtgtgttcg tgctagggtt 7680cttgggtttt ctcgcaacag
caggctctgc aatgggcgcg gcgtccttaa cgctgtcggc 7740tcagtctcgg
actttactgg ccgggatagt gcagcaacag caacagctgt tggacgtggt
7800caagagacaa caggaaatgt tgcgactgac cgtctgggga acaaaaaatc
tccaggcaag 7860agtcactgct atcgagaagt acttaaagga ccaggcgcaa
ctaaattcat ggggatgtgc 7920atttagacaa gtctgccaca ctactgtacc
atgggtaaat gataccttaa cgcctgagtg 7980gaacaatatg acgtggcaag
aatgggaagg caaaatccgc gacctggagg caaatatcag 8040tcaacaatta
gaacaagcac aaattcagca agagaagaat atgtatgaac tacaaaagtt
8100aaatagctgg gatgtttttg gtaactggtt tgacttaacc tcctggatca
agtatattca 8160atatggagtt tatataataa taggaatagt agttcttaga
atagtaatat atatagtaca 8220gatgttaagt agacttagaa agggctatag
gcctgttttc tcttcccccc ccggttacct 8280ccaacagatc catatccaca
aggactggga acagccagcc agagaagaaa cagaagaaga 8340cgttggaaac
aacgttggag acagctcgtg gccttggccg ataagatata tacatttcct
8400gatccaccag ctgattcgcc tcttggccgg actatacaac atctgcagga
acttactatc 8460caggatctcc ctgaccctcc gaccagtttt ccagagtctt
cagagggcac tgacagcaat 8520cagagactgg ctaagaactg acgcagccta
cttgcagtat gggtgcgagt ggatccaagg 8580agcgttccag gccttcgcaa
gggctacgag agagactctt gcgggcacgt ggagagactt 8640gtggggggca
ctgcagcgga tcgggagggg aatacttgca gtcccaagaa gaatcaggca
8700gggagcagag atcgccctcc tatgagggac agcggtatca gcagggagac
tttatgaata 8760ccccatggag aaccccagca aaagaagggg agaaagaatt
gtacaagcaa caaaatagag 8820atgatgtaga ttcggatgat gatgacctag
taggggtctc tgtcacacca agagtaccac 8880taagagaatt gacacataga
ttagcaatag atgtgtcaca ttttataaaa gaaaaagggg 8940gactggaagg
gatgtattac agtgagagaa gacatagaat cttagacata taccttgaaa
9000aggaagaagg gataattgca gattggcaga actatactca tgggccagga
ataagatacc 9060caatgttctt tgggtggcta tggaagctag taccagtaga
tgtcacacga caggaggagg 9120acgatgggac tcactgttta ctacacccag
cacaaacaag caggtttgat gacccgcatg 9180gggaaacact gatatggaag
tttgacccca cgctggctca tgattacaag gcttttatcc 9240tgcacccaga
ggaatttggg cataagtcag gcctgccaga agaagactgg aaggcaagac
9300tgaaagcaag agggatacca tttagttaga gacaggaaca gctatatttg
gccagggcag 9360gaaataacta ctgaaaacag ctgagactgc agggactttc
cgaaggggct gtaaccaggg 9420gagggacatg ggaggagccg gtggggaacg
ccctcatact ttctgtataa agatacccgc 9480tgcttgcatt gtacttcagt
cgctctgcgg agaggctggc agattgagcc ctgggaggtt 9540ctctccagca
ctagcaggta gagcctgggt gttccctgct agactctcac cggtgcttgg
9600ccggcactgg gcagacggct ccacgcttgc ttgcttaaaa gacctcttaa
taaagctgcc 9660agttagaagc aagttaagtg tgtgttccca tctctcctag
tcgccgcctg gtc 9713
387 11878 DNA Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
387
gcctcactga
ttaagcattg gtaactgtca gaccaagttt actcatatat actttagatt 60gatttaaaac
ttcattttta atttaaaagg atctaggtga agatcctttt tgataatctc
120atgaccaaaa tcccttaacg tgagttttcg ttccactgag cgtcagaccc
cgtagaaaag 180atcaaaggat cttcttgaga tccttttttt ctgcgcgtaa
tctgctgctt gcaaacaaaa 240aaaccaccgc taccagcggt ggtttgtttg
ccggatcaag agctaccaac tctttttccg 300aaggtaactg gcttcagcag
agcgcagata ccaaatactg ttcttctagt gtagccgtag 360ttaggccacc
acttcaagaa ctctgtagca ccgcctacat acctcgctct gctaatcctg
420ttaccagtgg ctgctgccag tggcgataag tcgtgtctta ccgggttgga
ctcaagacga 480tagttaccgg ataaggcgca gcggtcgggc tgaacggggg
gttcgtgcac acagcccagc 540ttggagcgaa cgacctacac cgaactgaga
tacctacagc gtgagctatg agaaagcgcc 600acgcttcccg aagggagaaa
ggcggacagg tatccggtaa gcggcagggt cggaacagga 660gagcgcacga
gggagcttcc agggggaaac gcctggtatc tttatagtcc tgtcgggttt
720cgccacctct gacttgagcg tcgatttttg tgatgctcgt caggggggcg
gagcctatgg 780aaaaacgcca gcaacgcggc ctttttacgg ttcctggcct
tttgctggcc ttttgctcac 840atgttctttc ctgcgttatc ccctgattct
gtggataacc gtattaccgc ctttgagtga 900gctgataccg ctcgccgcag
ccgaacgacc gagcgcagcg agtcagtgag cgaggaagcg 960gaagagcgcc
caatacgcaa accgcctctc cccgcgcgtt ggccgattca ttaatgcagc
1020tggcacgaca ggtttcccga ctggaaagcg ggcagtgagc gcaacgcaat
taatgtgagt 1080tagctcactc attaggcacc ccaggcttta cactttatgc
ttccggctcg tatgttgtgt 1140ggaattgtga gcggataaca atttcacaca
ggaaacagct atgaccatga ttacgccaag 1200ctatttaggt gacactatag
aatactcaag cttgggggga tcctctagag tcgacctgca 1260ggcatgctat
ttgatgaatt aactacactt aaaataatac aattattatt aaattttttt
1320ttgatttatt tattaatttt taaacttaat catttgtatt tgggaggaat
tatatatatc 1380tttataatta ttttattttt ttttattttt ttattttttt
attattatta ttttttttta 1440tttttttttt ttactgtatc aaagaaaaac
ctttaaaaaa aaaattataa tttccccatc 1500ttactatatt tttaatacat
acgttttaag gaattaaatt agacaaaagc tatattatgc 1560tttacatata
attagaattt ataaacgttt ggttattaga tatttcatgt ctcagtaaag
1620tctttcaata catatgtaaa aaaatatata tgaatacaca taagttgtta
atatatttta 1680tatgcataaa tgtataaata tatatatata tatatatata
tgtatgtatg tatatgtgtg 1740tatatgaaat tatttcaatg tttaattttt
taaattttaa tttttttttt tttttttttt 1800tttattatgt atattgatct
ttattattta aatattactt ttttcgtttt ttcttctttt 1860tattattttt
tttttttttt atattttata caaatggtaa ttcaaataaa aggtataaat
1920ttatatttaa ttttctttta tggataaata aaagaaaaat ataaatatat
aaaaatataa 1980aaatatatat atgtatattg gggtgatgat aaaatgaaag
ataatatata tatatatata 2040tctttatttt tttttttttg tagaccccat
tgtgagtaca taaatatatt atataactcg 2100ggagcatcag tcatggaatt
cttatttctt tttctttttt gcctggccgg cctttttcgt 2160ggccgccggc
cttttgtcgc ctcccagctg agacaggtcg atccgtgtct cgtacaggcc
2220ggtgatgctc tggtggatca gggtggcgtc cagcacctct ttggtgctgg
tgtacctctt 2280ccggtcgatg gtggtgtcaa agtacttgaa ggcggcaggg
gctcccagat tggtcagggt 2340aaacaggtgg atgatattct cggcctgctc
tctgatgggc ttatcccggt gcttgttgta 2400ggcggacagc actttgtcca
gattagcgtc ggccaggatc actctcttgg agaactcgct 2460gatctgctcg
atgatctcgt ccaggtagtg cttgtgctgt tccacaaaca gctgtttctg
2520ctcattatcc tcgggggagc ccttcagctt ctcatagtgg ctggccaggt
acaggaagtt 2580cacatatttg gagggcaggg ccagttcgtt tcccttctgc
agttcgccgg cagaggccag 2640cattctcttc cggccgtttt ccagctcgaa
cagggagtac ttaggcagct tgatgatcag 2700gtcctttttc acttctttgt
agcccttggc ttccagaaag tcgatgggat tcttctcgaa 2760gctgcttctt
tccatgatgg tgatccccag cagctctttc acactcttca gtttcttgga
2820cttgcccttt tccactttgg ccaccaccag cacagaatag gccacggtgg
ggctgtcgaa 2880gccgccgtac ttcttagggt cccagtcctt ctttctggcg
atcagcttat cgctgttcct 2940cttgggcagg atagactctt tgctgaagcc
gcctgtctgc acctcggtct ttttcacgat 3000attcacttgg ggcatgctca
gcactttccg cacggtggca aaatcccggc ccttatccca 3060cacgatctcc
ccggtttcgc cgtttgtctc gatcagaggc cgcttccgga tctcgccgtt
3120ggccagggta atctcggtct tgaaaaagtt catgatgttg ctgtagaaga
agtacttggc 3180ggtagccttg ccgatttcct gctcgctctt ggcgatcatc
ttccgcacgt cgtacacctt 3240gtagtcgccg tacacgaact cgctttccag
cttagggtac tttttgatca gggcggttcc 3300cacgacggcg ttcaggtagg
cgtcgtgggc gtggtggtag ttgttgatct cgcgcacttt 3360gtaaaactgg
aaatccttcc ggaaatcgga caccagcttg gacttcaggg tgatcacttt
3420cacttcccgg atcagcttgt cattctcgtc gtacttagtg ttcatccggg
agtccaggat 3480ctgtgccacg tgctttgtga tctgccgggt ttccaccagc
tgtctcttga tgaagccggc 3540cttatccagt tcgctcaggc cgcctctctc
ggccttggtc agattgtcga actttctctg 3600ggtaatcagc ttggcgttca
gcagctgccg ccagtagttc ttcatcttct tcacgacctc 3660ttcggagggc
acgttgtcgc tcttgccccg gttcttgtcg cttctggtca gcaccttgtt
3720gtcgatggag tcgtccttca gaaagctctg aggcacgata tggtccacat
cgtagtcgga 3780cagccggttg atgtccagtt cctggtccac gtacatatcc
cgcccattct gcaggtagta 3840caggtacagc ttctcgttct gcagctgggt
gttttccacg gggtgttctt tcaggatctg 3900gctgcccagc tctttgatgc
cctcttcgat ccgcttcatt ctctcgcggc tgttcttctg 3960tcccttctgg
gtggtctggt tctctctggc catttcgatc acgatgttct cgggcttgtg
4020ccggcccatc actttcacga gctcgtccac caccttcact gtctgcagga
tgcccttctt 4080aatggcgggg ctgccggcca gattggcaat gtgctcgtgc
aggctatcgc cctggccgga 4140cacctgggct ttctggatgt cctctttaaa
ggtcaggctg tcgtcgtgga tcagctgcat 4200gaagtttctg ttggcgaagc
cgtcggactt caggaaatcc aggattgtct tgccggactg 4260cttgtcccgg
atgccgttga tcagcttccg gctcagcctg ccccagccgg tgtatctccg
4320ccgcttcagc tgcttcatca ctttgtcgtc gaacaggtgg gcataggttt
tcagccgttc 4380ctcgatcatc tctctgtcct caaacagtgt cagggtcagc
acgatatctt ccagaatgtc 4440ctcgttttcc tcattgtcca ggaagtcctt
gtccttgata attttcagca gatcgtggta 4500tgtgcccagg gaggcgttga
accgatcttc cacgccggag atttccacgg agtcgaagca 4560ctcgattttc
ttgaagtagt cctctttcag ctgcttcacg gtcactttcc ggttggtctt
4620gaacagcagg tccacgatgg cctttttctg ctcgccgctc aggaaggcgg
gctttctcat 4680tccctcggtc acgtatttca ctttggtcag ctcgttatac
acggtgaagt actcgtacag 4740caggctgtgc ttgggcagca ccttctcgtt
gggcaggttc ttatcgaagt tggtcatccg 4800ctcgatgaag ctctgggcgg
aagcgccctt gtccaccact tcctcgaagt tccagggggt 4860gatggtttcc
tcgctctttc tggtcatcca ggcgaatctg ctgtttcccc tggccagagg
4920gcccacgtag taggggatgc ggaaggtcag gatcttctcg atcttttccc
ggttgtcctt 4980caggaatggg taaaaatctt cctgccgccg cagaatggcg
tgcagctctc ccaggtggat 5040ctggtggggg atgctgccgt tgtcgaaggt
ccgctgcttc cgcagcaggt cctctctgtt 5100cagcttcacg agcagttcct
cggtgccgtc catcttttcc aggatgggct tgatgaactt 5160gtagaactct
tcctggctgg ctccgccgtc aatgtagccg gcgtagccgt tcttgctctg
5220gtcgaagaaa atctctttgt acttctcagg cagctgctgc cgcacgagag
ctttcagcag 5280ggtcaggtcc tggtggtgct cgtcgtatct cttgatcata
gaggcgctca ggggggcctt 5340ggtgatctcg gtgttcactc tcaggatgtc
gctcagcagg atggcgtcgg acaggttctt 5400ggcggccaga aacaggtcgg
cgtactggtc gccgatctgg gccagcaggt tgtccaggtc 5460gtcgtcgtag
gtgtccttgc tcagctgcag tttggcatcc tcggccaggt cgaagttgct
5520cttgaagttg ggggtcaggc ccaggctcag ggcaatcagg tttccgaaca
ggccattctt 5580cttctcgccg ggcagctggg cgatcagatt ttccagccgt
ctgctcttgc tcagtctggc 5640agacaggatg gccttggcgt ccacgccgct
ggcgttgatg gggttttcct cgaacagctg 5700gttgtaggtc tgcaccagct
ggatgaacag cttgtccacg tcgctgttgt cggggttcag 5760gtcgccctcg
atcaggaagt ggccccggaa cttgatcatg tgggccaggg ccagatagat
5820cagccgcagg tcggccttgt cggtgctgtc caccagtttc tttctcaggt
ggtagatggt 5880ggggtacttc tcgtggtagg ccacctcgtc cacgatgttg
ccgaagatgg ggtgccgctc 5940gtgcttctta tcctcttcca ccaggaagga
ctcttccagt ctgtggaaga agctgtcgtc 6000caccttggcc atctcgttgc
tgaagatctc ttgcagatag cagatccggt tcttccgtct 6060ggtgtatctt
cttctggcgg ttctcttcag ccgggtggcc tcggctgttt cgccgctgtc
6120gaacagcagg gctccgatca ggttcttctt gatgctgtgc cggtcggtgt
tgcccagcac 6180cttgaatttc ttgctgggca ccttgtactc gtcggtgatc
acggcccagc ccacagagtt 6240ggtgccgatg tccaggccga tgctgtactt
cttgtcggct gctgggactc cgtggatacc 6300gaccttccgc ttcttctttg
gggccatctt atcgtcatcg tctttgtaat caatatcatg 6360atccttgtag
tctccgtcgt ggtccttata gtccattttt ctcgagggat cctgatatat
6420ttctattagg tatttattat tataaaatat aaatcttgaa tgataataaa
taaaatatta 6480gttattcctt ttctagttta aaatatacat attataaata
tatatatata tatatatatt 6540tttattgtga caagaatata taattataaa
ttatattatt tatttttgta tttttttttt 6600tttttttttt tttttctttt
tttgttttat ttttcttttt ttttataaat attatttttt 6660tcttttatca
tgcacattgg aataatacat taatatatat atatatatta tattatacat
6720atattgaata atgtttataa aaaatgcata acttatatga
atataatttt ttttaaatat 6780gacaaaaaga aaaaaaaaaa aaaccaaaaa
aaattaaaat tgaaatgaaa tatataaata 6840tattatttat atatattata
cattgtttaa tactactaca tgtatatata tatattatat 6900atatatatat
atatcaattt tttcaaaaat aaattaatat aaaaagaggg gaaaaaaaaa
6960aaaaaaaaaa aaaaaagata attaagtaag catttaaaaa tatataaatt
gataatatat 7020aaaattaatc acatataaaa gcttataaac actaggttag
ctaattcgct tgtaagaggt 7080actctcgttt atgcaaaact atttgatata
gcattttaac aagtacacat atatatatgt 7140aatatatata ctatatatat
ctattgcatg tgtactaagc atgtgcatgg catccccttt 7200ttctcgtgtt
taaaacagtt tgtatgataa aatataaagg atttgaaaaa gagaaaaaaa
7260tatatgatct catcctatat agcgccataa tttttatttg ggttgaataa
aattttctac 7320taaatttagg tgtaagtaaa ataatggaat atatataagt
acaataaaaa agtgcataaa 7380ttaaaaaatt tttataataa atattttttt
taaaaaagtc aataataata ttaaatatat 7440ataacacagg attatatatg
ttcactacaa ttttttatat tataatataa attcttttca 7500attttcattt
tattttacat acactttcct tttttgtcac tatattttaa tattcacata
7560tttagtttaa atactggcta tttctttcta catttgctag taacaattgt
gtagtgctta 7620aatatataca cacacctaaa acttacaaag tatcctagga
ccatggccaa gcctttgtct 7680caagaagaat ccaccctcat tgaaagagca
acggctacaa tcaacagcat ccccatctct 7740gaagactaca gcgtcgccag
cgcagctctc tctagcgacg gccgcatctt cactggtgtc 7800aatgtatatc
attttactgg gggaccttgt gcagaactcg tggtgctggg cactgctgct
7860gctgcggcag ctggcaacct gacttgtatc gtcgcgatcg gaaatgagaa
caggggcatc 7920ttgagcccct gcggacggtg ccgacaggtg cttctcgatc
tgcatcctgg gatcaaagcc 7980atagtgaagg acagtgatgg acagccgacg
gcagttggga ttcgtgaatt gctgccctct 8040ggttatgtgt gggagggcta
accgcgggta ccccattaaa tttatttaat aatagattaa 8100aaatattata
aaaataaaaa cataaacaca gaaattacaa aaaaaataca tatgaatttt
8160ttttttgtaa tcttccttat aaatatagaa taatgaatca tataaaacat
atcattattc 8220atttatttac atttaaaatt attgtttcag tatctttaat
ttattatgta tatataaaaa 8280taacttacaa ttttattaat aaacaatata
tgtttattaa ttcatgtttt gtaatttatg 8340ggatagcgat tttttttact
gtctgtattt tcttttttaa ttatgtttta attgtattta 8400ttttattttt
attattgttc tttttatagt attattttaa aacaaaatgt attttctaag
8460aacttataat aataataata taaattttaa taaaaattat atttatcttt
tacaatatga 8520acataaagta caacattaat atatagcttt taatattttt
attcctaatc atgtaaatct 8580taaatttttc tttttaaaca tatgttaaat
atttatttct cattatatat aagaacatat 8640ttattacatc tagaggtacc
gagctcgttt tcgacactgg atggcggcgt tagtatcgaa 8700tcgacagcag
tatagcgacc agcattcaca tacgattgac gcatgatatt actttctgcg
8760cacttaactt cgcatctggg cagatgatgt cgaggcgaaa aaaaatataa
atcacgctaa 8820catttgatta aaatagaaca actacaatat aaaaaaacta
tacaaatgac aagttcttga 8880aaacaagaat ctttttattg tcagtactga
ttagaaaaac tcatcgagca tcaaatgaaa 8940ctgcaattta ttcatatcag
gattatcaat accatatttt tgaaaaagcc gtttctgtaa 9000tgaaggagaa
aactcaccga ggcagttcca taggatggca agatcctggt atcggtctgc
9060gattccgact cgtccaacat caatacaacc tattaatttc ccctcgtcaa
aaataaggtt 9120atcaagtgag aaatcaccat gagtgacgac tgaatccggt
gagaatggca aaagcttatg 9180catttctttc cagacttgtt caacaggcca
gccattacgc tcgtcatcaa aatcactcgc 9240atcaaccaaa ccgttattca
ttcgtgattg cgcctgagcg agacgaaata cgcgatcgct 9300gttaaaagga
caattacaaa caggaatcga atgcaaccgg cgcaggaaca ctgccagcgc
9360atcaacaata ttttcacctg aatcaggata ttcttctaat acctggaatg
ctgttttgcc 9420ggggatcgca gtggtgagta accatgcatc atcaggagta
cggataaaat gcttgatggt 9480cggaagaggc ataaattccg tcagccagtt
tagtctgacc atctcatctg taacatcatt 9540ggcaacgcta cctttgccat
gtttcagaaa caactctggc gcatcgggct tcccatacaa 9600tcgatagatt
gtcgcacctg attgcccgac attatcgcga gcccatttat acccatataa
9660atcagcatcc atgttggaat ttaatcgcgg cctcgaaacg tgagtctttt
ccttacccat 9720ggttgtttat gttcggatgt gatgtgagaa ctgtatccta
gcaagatttt aaaaggaagt 9780atatgaaaga agaacctcag tggcaaatcc
taacctttta tatttctcta caggggcgcg 9840gcgtggggac aattcaacgc
gtctgtgagg ggagcgtttc cctgctcgca ggtctgcagc 9900gaggagccgt
aatttttgct tcgcgccgtg cggccatcaa aatgtatgga tgcaaatgat
9960tatacatggg gatgtatggg ctaaatgtac gggcgacagt cacatcatgc
ccctgagctg 10020cgcacgtcaa gactgtcaag gagggtattc tgggcctcca
tgtcgctggc ctaacattag 10080taatgtaggt ctgactttca ctcatataag
tcttatggta actaaactaa ggtcttacct 10140ttactgatat atgtcttact
ttcactaact taggtattac ttttactaac ttaggtctta 10200aattcagtaa
ctaaggtcat acttcgacta actaaggtct tacattcact gatataggtc
10260ttatgattac taacttaggt cctaatttga ctaacataag tcctaacatt
agtaatgtag 10320gtcttaactt aactaactta ggtcttacct tcactaatat
aggtcttaat attactgact 10380taagtaatta aggtactaac ttaggtcgta
aggtaactaa tatataggtc ttaaggtaac 10440taatttaggt cttgacttaa
taaatatagg tcctaacata aatagtatag gtcctaatat 10500aagtactata
ggccttaact taaccaacat aggtcctaac ataagttata taggtcttaa
10560cgtaactaac ataagtcatt aaggtactaa gtttggtctt aatttaacaa
taacatgtcg 10620ctggcctaac attagtaatg taggtctgac tttcactcat
ataagtctta tggtaactaa 10680actaaggtct tacctttact gatatatgtc
ttactttcac taacttaggt attactttta 10740ctaacttagg tcttaaattc
agtaactaag gtcatacttc gactaactaa ggtcttacat 10800tcactgatat
aggtcttatg attactaact taggtcctaa tttgactaac ataagtccta
10860acattagtaa tgtaggtctt aacttaacta acttaggtct taccttcact
aatataggtc 10920ttaatattac tgacttaagt aattaaggta ctaacttagg
tcgtaaggta actaatatat 10980aggtcttaag gtaactaatt taggtcttga
cttaataaat ataggtccta acataaatag 11040tataggtcct aatataagta
ctataggcct taacttaacc aacataggtc ctaacataag 11100ttatataggt
cttaacgtaa ctaacataag tcattaaggt actaagtttg gtcttaattt
11160aacaataacc atgtcgctgg ccgggtggtc ttaatttaac aaatatagac
catgtcgctg 11220gccgggtgac ccggcgggga cgaggcaagc taaacagatc
ctcgtgatac gcctattttt 11280ataggttaat gtcatgataa taatggtttc
ttaggacgga tcgcttgcct gtaacttaca 11340cgcgcctcgt atcttttaat
gatggaataa tttgggaatt tactctgtgt ttatttattt 11400ttatgttttg
tatttggatt ttagaaagta aataaagaag gtagaagagt tacggaatga
11460agaaaaaaaa ataaacaaag gtttaaaaaa tttcaacaaa aagcgtactt
tacatatata 11520tttattagac aagaaaagca gattaaatag atatacattc
gattaacgat aagtaaaatg 11580taaaatcaca ggattttcgt gtgtggtctt
ctacacagac aagatgaaac aattcggcat 11640taatacctga gagcaggaag
agcaagataa aaggtagtat ttgttggcga tccccctaga 11700gtcttttaca
tcttcggaaa acaaaaacta ttttttcttt aatttctttt tttactttct
11760atttttaatt tatatattta tattaaaaaa tttaaattat aattattttt
atagcacgtg 11820atgaaaagga cccaggtggc acttttcggg gaaatctcga
cctgcagcgt acgaagct 11878
388 12044 DNA Artificial Sequence
Description of Artificial Sequence Synthetic polynucleotide
388
gcctcactga
ttaagcattg gtaactgtca gaccaagttt actcatatat actttagatt 60gatttaaaac
ttcattttta atttaaaagg atctaggtga agatcctttt tgataatctc
120atgaccaaaa tcccttaacg tgagttttcg ttccactgag cgtcagaccc
cgtagaaaag 180atcaaaggat cttcttgaga tccttttttt ctgcgcgtaa
tctgctgctt gcaaacaaaa 240aaaccaccgc taccagcggt ggtttgtttg
ccggatcaag agctaccaac tctttttccg 300aaggtaactg gcttcagcag
agcgcagata ccaaatactg ttcttctagt gtagccgtag 360ttaggccacc
acttcaagaa ctctgtagca ccgcctacat acctcgctct gctaatcctg
420ttaccagtgg ctgctgccag tggcgataag tcgtgtctta ccgggttgga
ctcaagacga 480tagttaccgg ataaggcgca gcggtcgggc tgaacggggg
gttcgtgcac acagcccagc 540ttggagcgaa cgacctacac cgaactgaga
tacctacagc gtgagctatg agaaagcgcc 600acgcttcccg aagggagaaa
ggcggacagg tatccggtaa gcggcagggt cggaacagga 660gagcgcacga
gggagcttcc agggggaaac gcctggtatc tttatagtcc tgtcgggttt
720cgccacctct gacttgagcg tcgatttttg tgatgctcgt caggggggcg
gagcctatgg 780aaaaacgcca gcaacgcggc ctttttacgg ttcctggcct
tttgctggcc ttttgctcac 840atgttctttc ctgcgttatc ccctgattct
gtggataacc gtattaccgc ctttgagtga 900gctgataccg ctcgccgcag
ccgaacgacc gagcgcagcg agtcagtgag cgaggaagcg 960gaagagcgcc
caatacgcaa accgcctctc cccgcgcgtt ggccgattca ttaatgcagc
1020tggcacgaca ggtttcccga ctggaaagcg ggcagtgagc gcaacgcaat
taatgtgagt 1080tagctcactc attaggcacc ccaggcttta cactttatgc
ttccggctcg tatgttgtgt 1140ggaattgtga gcggataaca atttcacaca
ggaaacagct atgaccatga ttacgccaag 1200ctatttaggt gacactatag
aatactcaag cttgggggga tcctctagag tcgactaata 1260cgactcacta
taggaacata atctatagcg gcgttttaga gctagaaata gcaagttaaa
1320ataaggctag tccgttatca acttgaaaaa gtggcaccga gtcggtgcta
gcataacccc 1380ttggggcctc taaacgggtc ttgaggggtt ttttggtcga
cctgcaggca tgctatttga 1440tgaattaact acacttaaaa taatacaatt
attattaaat ttttttttga tttatttatt 1500aatttttaaa cttaatcatt
tgtatttggg aggaattata tatatcttta taattatttt 1560attttttttt
atttttttat ttttttatta ttattatttt tttttatttt ttttttttac
1620tgtatcaaag aaaaaccttt aaaaaaaaaa ttataatttc cccatcttac
tatattttta 1680atacatacgt tttaaggaat taaattagac aaaagctata
ttatgcttta catataatta 1740gaatttataa acgtttggtt attagatatt
tcatgtctca gtaaagtctt tcaatacata 1800tgtaaaaaaa tatatatgaa
tacacataag ttgttaatat attttatatg cataaatgta 1860taaatatata
tatatatata tatatatgta tgtatgtata tgtgtgtata tgaaattatt
1920tcaatgttta attttttaaa ttttaatttt tttttttttt ttttttttta
ttatgtatat 1980tgatctttat tatttaaata ttactttttt cgttttttct
tctttttatt attttttttt 2040ttttttatat tttatacaaa tggtaattca
aataaaaggt ataaatttat atttaatttt 2100cttttatgga taaataaaag
aaaaatataa atatataaaa atataaaaat atatatatgt 2160atattggggt
gatgataaaa tgaaagataa tatatatata tatatatctt tatttttttt
2220tttttgtaga ccccattgtg agtacataaa tatattatat aactcgggag
catcagtcat 2280ggaattctta tttctttttc ttttttgcct ggccggcctt
tttcgtggcc gccggccttt 2340tgtcgcctcc cagctgagac aggtcgatcc
gtgtctcgta caggccggtg atgctctggt 2400ggatcagggt ggcgtccagc
acctctttgg tgctggtgta cctcttccgg tcgatggtgg 2460tgtcaaagta
cttgaaggcg gcaggggctc ccagattggt cagggtaaac aggtggatga
2520tattctcggc ctgctctctg atgggcttat cccggtgctt gttgtaggcg
gacagcactt 2580tgtccagatt agcgtcggcc aggatcactc tcttggagaa
ctcgctgatc tgctcgatga 2640tctcgtccag gtagtgcttg tgctgttcca
caaacagctg tttctgctca ttatcctcgg 2700gggagccctt cagcttctca
tagtggctgg ccaggtacag gaagttcaca tatttggagg 2760gcagggccag
ttcgtttccc ttctgcagtt cgccggcaga ggccagcatt ctcttccggc
2820cgttttccag ctcgaacagg gagtacttag gcagcttgat gatcaggtcc
tttttcactt 2880ctttgtagcc cttggcttcc agaaagtcga tgggattctt
ctcgaagctg cttctttcca 2940tgatggtgat ccccagcagc tctttcacac
tcttcagttt cttggacttg cccttttcca 3000ctttggccac caccagcaca
gaataggcca cggtggggct gtcgaagccg ccgtacttct 3060tagggtccca
gtccttcttt ctggcgatca gcttatcgct gttcctcttg ggcaggatag
3120actctttgct gaagccgcct gtctgcacct cggtcttttt cacgatattc
acttggggca 3180tgctcagcac tttccgcacg gtggcaaaat cccggccctt
atcccacacg atctccccgg 3240tttcgccgtt tgtctcgatc agaggccgct
tccggatctc gccgttggcc agggtaatct 3300cggtcttgaa aaagttcatg
atgttgctgt agaagaagta cttggcggta gccttgccga 3360tttcctgctc
gctcttggcg atcatcttcc gcacgtcgta caccttgtag tcgccgtaca
3420cgaactcgct ttccagctta gggtactttt tgatcagggc ggttcccacg
acggcgttca 3480ggtaggcgtc gtgggcgtgg tggtagttgt tgatctcgcg
cactttgtaa aactggaaat 3540ccttccggaa atcggacacc agcttggact
tcagggtgat cactttcact tcccggatca 3600gcttgtcatt ctcgtcgtac
ttagtgttca tccgggagtc caggatctgt gccacgtgct 3660ttgtgatctg
ccgggtttcc accagctgtc tcttgatgaa gccggcctta tccagttcgc
3720tcaggccgcc tctctcggcc ttggtcagat tgtcgaactt tctctgggta
atcagcttgg 3780cgttcagcag ctgccgccag tagttcttca tcttcttcac
gacctcttcg gagggcacgt 3840tgtcgctctt gccccggttc ttgtcgcttc
tggtcagcac cttgttgtcg atggagtcgt 3900ccttcagaaa gctctgaggc
acgatatggt ccacatcgta gtcggacagc cggttgatgt 3960ccagttcctg
gtccacgtac atatcccgcc cattctgcag gtagtacagg tacagcttct
4020cgttctgcag ctgggtgttt tccacggggt gttctttcag gatctggctg
cccagctctt 4080tgatgccctc ttcgatccgc ttcattctct cgcggctgtt
cttctgtccc ttctgggtgg 4140tctggttctc tctggccatt tcgatcacga
tgttctcggg cttgtgccgg cccatcactt 4200tcacgagctc gtccaccacc
ttcactgtct gcaggatgcc cttcttaatg gcggggctgc 4260cggccagatt
ggcaatgtgc tcgtgcaggc tatcgccctg gccggacacc tgggctttct
4320ggatgtcctc tttaaaggtc aggctgtcgt cgtggatcag ctgcatgaag
tttctgttgg 4380cgaagccgtc ggacttcagg aaatccagga ttgtcttgcc
ggactgcttg tcccggatgc 4440cgttgatcag cttccggctc agcctgcccc
agccggtgta tctccgccgc ttcagctgct 4500tcatcacttt gtcgtcgaac
aggtgggcat aggttttcag ccgttcctcg atcatctctc 4560tgtcctcaaa
cagtgtcagg gtcagcacga tatcttccag aatgtcctcg ttttcctcat
4620tgtccaggaa gtccttgtcc ttgataattt tcagcagatc gtggtatgtg
cccagggagg 4680cgttgaaccg atcttccacg ccggagattt ccacggagtc
gaagcactcg attttcttga 4740agtagtcctc tttcagctgc ttcacggtca
ctttccggtt ggtcttgaac agcaggtcca 4800cgatggcctt tttctgctcg
ccgctcagga aggcgggctt tctcattccc tcggtcacgt 4860atttcacttt
ggtcagctcg ttatacacgg tgaagtactc gtacagcagg ctgtgcttgg
4920gcagcacctt ctcgttgggc aggttcttat cgaagttggt catccgctcg
atgaagctct 4980gggcggaagc gcccttgtcc accacttcct cgaagttcca
gggggtgatg gtttcctcgc 5040tctttctggt catccaggcg aatctgctgt
ttcccctggc cagagggccc acgtagtagg 5100ggatgcggaa ggtcaggatc
ttctcgatct tttcccggtt gtccttcagg aatgggtaaa 5160aatcttcctg
ccgccgcaga atggcgtgca gctctcccag gtggatctgg tgggggatgc
5220tgccgttgtc gaaggtccgc tgcttccgca gcaggtcctc tctgttcagc
ttcacgagca 5280gttcctcggt gccgtccatc ttttccagga tgggcttgat
gaacttgtag aactcttcct 5340ggctggctcc gccgtcaatg tagccggcgt
agccgttctt gctctggtcg aagaaaatct 5400ctttgtactt ctcaggcagc
tgctgccgca cgagagcttt cagcagggtc aggtcctggt 5460ggtgctcgtc
gtatctcttg atcatagagg cgctcagggg ggccttggtg atctcggtgt
5520tcactctcag gatgtcgctc agcaggatgg cgtcggacag gttcttggcg
gccagaaaca 5580ggtcggcgta ctggtcgccg atctgggcca gcaggttgtc
caggtcgtcg tcgtaggtgt 5640ccttgctcag ctgcagtttg gcatcctcgg
ccaggtcgaa gttgctcttg aagttggggg 5700tcaggcccag gctcagggca
atcaggtttc cgaacaggcc attcttcttc tcgccgggca 5760gctgggcgat
cagattttcc agccgtctgc tcttgctcag tctggcagac aggatggcct
5820tggcgtccac gccgctggcg ttgatggggt tttcctcgaa cagctggttg
taggtctgca 5880ccagctggat gaacagcttg tccacgtcgc tgttgtcggg
gttcaggtcg ccctcgatca 5940ggaagtggcc ccggaacttg atcatgtggg
ccagggccag atagatcagc cgcaggtcgg 6000ccttgtcggt gctgtccacc
agtttctttc tcaggtggta gatggtgggg tacttctcgt 6060ggtaggccac
ctcgtccacg atgttgccga agatggggtg ccgctcgtgc ttcttatcct
6120cttccaccag gaaggactct tccagtctgt ggaagaagct gtcgtccacc
ttggccatct 6180cgttgctgaa gatctcttgc agatagcaga tccggttctt
ccgtctggtg tatcttcttc 6240tggcggttct cttcagccgg gtggcctcgg
ctgtttcgcc gctgtcgaac agcagggctc 6300cgatcaggtt cttcttgatg
ctgtgccggt cggtgttgcc cagcaccttg aatttcttgc 6360tgggcacctt
gtactcgtcg gtgatcacgg cccagcccac agagttggtg ccgatgtcca
6420ggccgatgct gtacttcttg tcggctgctg ggactccgtg gataccgacc
ttccgcttct 6480tctttggggc catcttatcg tcatcgtctt tgtaatcaat
atcatgatcc ttgtagtctc 6540cgtcgtggtc cttatagtcc atttttctcg
agggatcctg atatatttct attaggtatt 6600tattattata aaatataaat
cttgaatgat aataaataaa atattagtta ttccttttct 6660agtttaaaat
atacatatta taaatatata tatatatata tatattttta ttgtgacaag
6720aatatataat tataaattat attatttatt tttgtatttt tttttttttt
tttttttttt 6780tctttttttg ttttattttt cttttttttt ataaatatta
tttttttctt ttatcatgca 6840cattggaata atacattaat atatatatat
atattatatt atacatatat tgaataatgt 6900ttataaaaaa tgcataactt
atatgaatat aatttttttt aaatatgaca aaaagaaaaa 6960aaaaaaaaac
caaaaaaaat taaaattgaa atgaaatata taaatatatt atttatatat
7020attatacatt gtttaatact actacatgta tatatatata ttatatatat
atatatatat 7080caattttttc aaaaataaat taatataaaa agaggggaaa
aaaaaaaaaa aaaaaaaaaa 7140aagataatta agtaagcatt taaaaatata
taaattgata atatataaaa ttaatcacat 7200ataaaagctt ataaacacta
ggttagctaa ttcgcttgta agaggtactc tcgtttatgc 7260aaaactattt
gatatagcat tttaacaagt acacatatat atatgtaata tatatactat
7320atatatctat tgcatgtgta ctaagcatgt gcatggcatc ccctttttct
cgtgtttaaa 7380acagtttgta tgataaaata taaaggattt gaaaaagaga
aaaaaatata tgatctcatc 7440ctatatagcg ccataatttt tatttgggtt
gaataaaatt ttctactaaa tttaggtgta 7500agtaaaataa tggaatatat
ataagtacaa taaaaaagtg cataaattaa aaaattttta 7560taataaatat
tttttttaaa aaagtcaata ataatattaa atatatataa cacaggatta
7620tatatgttca ctacaatttt ttatattata atataaattc ttttcaattt
tcattttatt 7680ttacatacac tttccttttt tgtcactata ttttaatatt
cacatattta gtttaaatac 7740tggctatttc tttctacatt tgctagtaac
aattgtgtag tgcttaaata tatacacaca 7800cctaaaactt acaaagtatc
ctaggaccat ggccaagcct ttgtctcaag aagaatccac 7860cctcattgaa
agagcaacgg ctacaatcaa cagcatcccc atctctgaag actacagcgt
7920cgccagcgca gctctctcta gcgacggccg catcttcact ggtgtcaatg
tatatcattt 7980tactggggga ccttgtgcag aactcgtggt gctgggcact
gctgctgctg cggcagctgg 8040caacctgact tgtatcgtcg cgatcggaaa
tgagaacagg ggcatcttga gcccctgcgg 8100acggtgccga caggtgcttc
tcgatctgca tcctgggatc aaagccatag tgaaggacag 8160tgatggacag
ccgacggcag ttgggattcg tgaattgctg ccctctggtt atgtgtggga
8220gggctaaccg cgggtacccc attaaattta tttaataata gattaaaaat
attataaaaa 8280taaaaacata aacacagaaa ttacaaaaaa aatacatatg
aatttttttt ttgtaatctt 8340ccttataaat atagaataat gaatcatata
aaacatatca ttattcattt atttacattt 8400aaaattattg tttcagtatc
tttaatttat tatgtatata taaaaataac ttacaatttt 8460attaataaac
aatatatgtt tattaattca tgttttgtaa tttatgggat agcgattttt
8520tttactgtct gtattttctt ttttaattat gttttaattg tatttatttt
atttttatta 8580ttgttctttt tatagtatta ttttaaaaca aaatgtattt
tctaagaact tataataata 8640ataatataaa ttttaataaa aattatattt
atcttttaca atatgaacat aaagtacaac 8700attaatatat agcttttaat
atttttattc ctaatcatgt aaatcttaaa tttttctttt 8760taaacatatg
ttaaatattt atttctcatt atatataaga acatatttat tacatctaga
8820ggtaccgagc tcgttttcga cactggatgg cggcgttagt atcgaatcga
cagcagtata 8880gcgaccagca ttcacatacg attgacgcat gatattactt
tctgcgcact taacttcgca 8940tctgggcaga tgatgtcgag gcgaaaaaaa
atataaatca cgctaacatt tgattaaaat 9000agaacaacta caatataaaa
aaactataca aatgacaagt tcttgaaaac aagaatcttt 9060ttattgtcag
tactgattag aaaaactcat cgagcatcaa atgaaactgc aatttattca
9120tatcaggatt atcaatacca tatttttgaa aaagccgttt ctgtaatgaa
ggagaaaact 9180caccgaggca gttccatagg atggcaagat cctggtatcg
gtctgcgatt ccgactcgtc 9240caacatcaat acaacctatt aatttcccct
cgtcaaaaat aaggttatca agtgagaaat 9300caccatgagt gacgactgaa
tccggtgaga atggcaaaag cttatgcatt tctttccaga 9360cttgttcaac
aggccagcca ttacgctcgt catcaaaatc actcgcatca accaaaccgt
9420tattcattcg tgattgcgcc tgagcgagac gaaatacgcg atcgctgtta
aaaggacaat 9480tacaaacagg aatcgaatgc aaccggcgca ggaacactgc
cagcgcatca acaatatttt 9540cacctgaatc aggatattct tctaatacct
ggaatgctgt tttgccgggg atcgcagtgg 9600tgagtaacca tgcatcatca
ggagtacgga taaaatgctt gatggtcgga agaggcataa 9660attccgtcag
ccagtttagt ctgaccatct catctgtaac atcattggca acgctacctt
9720tgccatgttt cagaaacaac tctggcgcat cgggcttccc atacaatcga
tagattgtcg 9780cacctgattg cccgacatta tcgcgagccc atttataccc
atataaatca gcatccatgt 9840tggaatttaa tcgcggcctc
gaaacgtgag tcttttcctt acccatggtt gtttatgttc 9900ggatgtgatg
tgagaactgt atcctagcaa gattttaaaa ggaagtatat gaaagaagaa
9960cctcagtggc aaatcctaac cttttatatt tctctacagg ggcgcggcgt
ggggacaatt 10020caacgcgtct gtgaggggag cgtttccctg ctcgcaggtc
tgcagcgagg agccgtaatt 10080tttgcttcgc gccgtgcggc catcaaaatg
tatggatgca aatgattata catggggatg 10140tatgggctaa atgtacgggc
gacagtcaca tcatgcccct gagctgcgca cgtcaagact 10200gtcaaggagg
gtattctggg cctccatgtc gctggcctaa cattagtaat gtaggtctga
10260ctttcactca tataagtctt atggtaacta aactaaggtc ttacctttac
tgatatatgt 10320cttactttca ctaacttagg tattactttt actaacttag
gtcttaaatt cagtaactaa 10380ggtcatactt cgactaacta aggtcttaca
ttcactgata taggtcttat gattactaac 10440ttaggtccta atttgactaa
cataagtcct aacattagta atgtaggtct taacttaact 10500aacttaggtc
ttaccttcac taatataggt cttaatatta ctgacttaag taattaaggt
10560actaacttag gtcgtaaggt aactaatata taggtcttaa ggtaactaat
ttaggtcttg 10620acttaataaa tataggtcct aacataaata gtataggtcc
taatataagt actataggcc 10680ttaacttaac caacataggt cctaacataa
gttatatagg tcttaacgta actaacataa 10740gtcattaagg tactaagttt
ggtcttaatt taacaataac atgtcgctgg cctaacatta 10800gtaatgtagg
tctgactttc actcatataa gtcttatggt aactaaacta aggtcttacc
10860tttactgata tatgtcttac tttcactaac ttaggtatta cttttactaa
cttaggtctt 10920aaattcagta actaaggtca tacttcgact aactaaggtc
ttacattcac tgatataggt 10980cttatgatta ctaacttagg tcctaatttg
actaacataa gtcctaacat tagtaatgta 11040ggtcttaact taactaactt
aggtcttacc ttcactaata taggtcttaa tattactgac 11100ttaagtaatt
aaggtactaa cttaggtcgt aaggtaacta atatataggt cttaaggtaa
11160ctaatttagg tcttgactta ataaatatag gtcctaacat aaatagtata
ggtcctaata 11220taagtactat aggccttaac ttaaccaaca taggtcctaa
cataagttat ataggtctta 11280acgtaactaa cataagtcat taaggtacta
agtttggtct taatttaaca ataaccatgt 11340cgctggccgg gtggtcttaa
tttaacaaat atagaccatg tcgctggccg ggtgacccgg 11400cggggacgag
gcaagctaaa cagatcctcg tgatacgcct atttttatag gttaatgtca
11460tgataataat ggtttcttag gacggatcgc ttgcctgtaa cttacacgcg
cctcgtatct 11520tttaatgatg gaataatttg ggaatttact ctgtgtttat
ttatttttat gttttgtatt 11580tggattttag aaagtaaata aagaaggtag
aagagttacg gaatgaagaa aaaaaaataa 11640acaaaggttt aaaaaatttc
aacaaaaagc gtactttaca tatatattta ttagacaaga 11700aaagcagatt
aaatagatat acattcgatt aacgataagt aaaatgtaaa atcacaggat
11760tttcgtgtgt ggtcttctac acagacaaga tgaaacaatt cggcattaat
acctgagagc 11820aggaagagca agataaaagg tagtatttgt tggcgatccc
cctagagtct tttacatctt 11880cggaaaacaa aaactatttt ttctttaatt
tcttttttta ctttctattt ttaatttata 11940tatttatatt aaaaaattta
aattataatt atttttatag cacgtgatga aaaggaccca 12000ggtggcactt
ttcggggaaa tctcgacctg cagcgtacga agct 12044
389 12044 DNA Artificial Sequence
Description of Artificial Sequence Synthetic polynucleotide
389
gcctcactga ttaagcattg gtaactgtca gaccaagttt actcatatat
actttagatt 60gatttaaaac ttcattttta atttaaaagg atctaggtga agatcctttt
tgataatctc 120atgaccaaaa tcccttaacg tgagttttcg ttccactgag
cgtcagaccc cgtagaaaag 180atcaaaggat cttcttgaga tccttttttt
ctgcgcgtaa tctgctgctt gcaaacaaaa 240aaaccaccgc taccagcggt
ggtttgtttg ccggatcaag agctaccaac tctttttccg 300aaggtaactg
gcttcagcag agcgcagata ccaaatactg ttcttctagt gtagccgtag
360ttaggccacc acttcaagaa ctctgtagca ccgcctacat acctcgctct
gctaatcctg 420ttaccagtgg ctgctgccag tggcgataag tcgtgtctta
ccgggttgga ctcaagacga 480tagttaccgg ataaggcgca gcggtcgggc
tgaacggggg gttcgtgcac acagcccagc 540ttggagcgaa cgacctacac
cgaactgaga tacctacagc gtgagctatg agaaagcgcc 600acgcttcccg
aagggagaaa ggcggacagg tatccggtaa gcggcagggt cggaacagga
660gagcgcacga gggagcttcc agggggaaac gcctggtatc tttatagtcc
tgtcgggttt 720cgccacctct gacttgagcg tcgatttttg tgatgctcgt
caggggggcg gagcctatgg 780aaaaacgcca gcaacgcggc ctttttacgg
ttcctggcct tttgctggcc ttttgctcac 840atgttctttc ctgcgttatc
ccctgattct gtggataacc gtattaccgc ctttgagtga 900gctgataccg
ctcgccgcag ccgaacgacc gagcgcagcg agtcagtgag cgaggaagcg
960gaagagcgcc caatacgcaa accgcctctc cccgcgcgtt ggccgattca
ttaatgcagc 1020tggcacgaca ggtttcccga ctggaaagcg ggcagtgagc
gcaacgcaat taatgtgagt 1080tagctcactc attaggcacc ccaggcttta
cactttatgc ttccggctcg tatgttgtgt 1140ggaattgtga gcggataaca
atttcacaca ggaaacagct atgaccatga ttacgccaag 1200ctatttaggt
gacactatag aatactcaag cttgggggga tcctctagag tcgactaata
1260cgactcacta taggaaatga tatggatttt gggttttaga gctagaaata
gcaagttaaa 1320ataaggctag tccgttatca acttgaaaaa gtggcaccga
gtcggtgcta gcataacccc 1380ttggggcctc taaacgggtc ttgaggggtt
ttttggtcga cctgcaggca tgctatttga 1440tgaattaact acacttaaaa
taatacaatt attattaaat ttttttttga tttatttatt 1500aatttttaaa
cttaatcatt tgtatttggg aggaattata tatatcttta taattatttt
1560attttttttt atttttttat ttttttatta ttattatttt tttttatttt
ttttttttac 1620tgtatcaaag aaaaaccttt aaaaaaaaaa ttataatttc
cccatcttac tatattttta 1680atacatacgt tttaaggaat taaattagac
aaaagctata ttatgcttta catataatta 1740gaatttataa acgtttggtt
attagatatt tcatgtctca gtaaagtctt tcaatacata 1800tgtaaaaaaa
tatatatgaa tacacataag ttgttaatat attttatatg cataaatgta
1860taaatatata tatatatata tatatatgta tgtatgtata tgtgtgtata
tgaaattatt 1920tcaatgttta attttttaaa ttttaatttt tttttttttt
ttttttttta ttatgtatat 1980tgatctttat tatttaaata ttactttttt
cgttttttct tctttttatt attttttttt 2040ttttttatat tttatacaaa
tggtaattca aataaaaggt ataaatttat atttaatttt 2100cttttatgga
taaataaaag aaaaatataa atatataaaa atataaaaat atatatatgt
2160atattggggt gatgataaaa tgaaagataa tatatatata tatatatctt
tatttttttt 2220tttttgtaga ccccattgtg agtacataaa tatattatat
aactcgggag catcagtcat 2280ggaattctta tttctttttc ttttttgcct
ggccggcctt tttcgtggcc gccggccttt 2340tgtcgcctcc cagctgagac
aggtcgatcc gtgtctcgta caggccggtg atgctctggt 2400ggatcagggt
ggcgtccagc acctctttgg tgctggtgta cctcttccgg tcgatggtgg
2460tgtcaaagta cttgaaggcg gcaggggctc ccagattggt cagggtaaac
aggtggatga 2520tattctcggc ctgctctctg atgggcttat cccggtgctt
gttgtaggcg gacagcactt 2580tgtccagatt agcgtcggcc aggatcactc
tcttggagaa ctcgctgatc tgctcgatga 2640tctcgtccag gtagtgcttg
tgctgttcca caaacagctg tttctgctca ttatcctcgg 2700gggagccctt
cagcttctca tagtggctgg ccaggtacag gaagttcaca tatttggagg
2760gcagggccag ttcgtttccc ttctgcagtt cgccggcaga ggccagcatt
ctcttccggc 2820cgttttccag ctcgaacagg gagtacttag gcagcttgat
gatcaggtcc tttttcactt 2880ctttgtagcc cttggcttcc agaaagtcga
tgggattctt ctcgaagctg cttctttcca 2940tgatggtgat ccccagcagc
tctttcacac tcttcagttt cttggacttg cccttttcca 3000ctttggccac
caccagcaca gaataggcca cggtggggct gtcgaagccg ccgtacttct
3060tagggtccca gtccttcttt ctggcgatca gcttatcgct gttcctcttg
ggcaggatag 3120actctttgct gaagccgcct gtctgcacct cggtcttttt
cacgatattc acttggggca 3180tgctcagcac tttccgcacg gtggcaaaat
cccggccctt atcccacacg atctccccgg 3240tttcgccgtt tgtctcgatc
agaggccgct tccggatctc gccgttggcc agggtaatct 3300cggtcttgaa
aaagttcatg atgttgctgt agaagaagta cttggcggta gccttgccga
3360tttcctgctc gctcttggcg atcatcttcc gcacgtcgta caccttgtag
tcgccgtaca 3420cgaactcgct ttccagctta gggtactttt tgatcagggc
ggttcccacg acggcgttca 3480ggtaggcgtc gtgggcgtgg tggtagttgt
tgatctcgcg cactttgtaa aactggaaat 3540ccttccggaa atcggacacc
agcttggact tcagggtgat cactttcact tcccggatca 3600gcttgtcatt
ctcgtcgtac ttagtgttca tccgggagtc caggatctgt gccacgtgct
3660ttgtgatctg ccgggtttcc accagctgtc tcttgatgaa gccggcctta
tccagttcgc 3720tcaggccgcc tctctcggcc ttggtcagat tgtcgaactt
tctctgggta atcagcttgg 3780cgttcagcag ctgccgccag tagttcttca
tcttcttcac gacctcttcg gagggcacgt 3840tgtcgctctt gccccggttc
ttgtcgcttc tggtcagcac cttgttgtcg atggagtcgt 3900ccttcagaaa
gctctgaggc acgatatggt ccacatcgta gtcggacagc cggttgatgt
3960ccagttcctg gtccacgtac atatcccgcc cattctgcag gtagtacagg
tacagcttct 4020cgttctgcag ctgggtgttt tccacggggt gttctttcag
gatctggctg cccagctctt 4080tgatgccctc ttcgatccgc ttcattctct
cgcggctgtt cttctgtccc ttctgggtgg 4140tctggttctc tctggccatt
tcgatcacga tgttctcggg cttgtgccgg cccatcactt 4200tcacgagctc
gtccaccacc ttcactgtct gcaggatgcc cttcttaatg gcggggctgc
4260cggccagatt ggcaatgtgc tcgtgcaggc tatcgccctg gccggacacc
tgggctttct 4320ggatgtcctc tttaaaggtc aggctgtcgt cgtggatcag
ctgcatgaag tttctgttgg 4380cgaagccgtc ggacttcagg aaatccagga
ttgtcttgcc ggactgcttg tcccggatgc 4440cgttgatcag cttccggctc
agcctgcccc agccggtgta tctccgccgc ttcagctgct 4500tcatcacttt
gtcgtcgaac aggtgggcat aggttttcag ccgttcctcg atcatctctc
4560tgtcctcaaa cagtgtcagg gtcagcacga tatcttccag aatgtcctcg
ttttcctcat 4620tgtccaggaa gtccttgtcc ttgataattt tcagcagatc
gtggtatgtg cccagggagg 4680cgttgaaccg atcttccacg ccggagattt
ccacggagtc gaagcactcg attttcttga 4740agtagtcctc tttcagctgc
ttcacggtca ctttccggtt ggtcttgaac agcaggtcca 4800cgatggcctt
tttctgctcg ccgctcagga aggcgggctt tctcattccc tcggtcacgt
4860atttcacttt ggtcagctcg ttatacacgg tgaagtactc gtacagcagg
ctgtgcttgg 4920gcagcacctt ctcgttgggc aggttcttat cgaagttggt
catccgctcg atgaagctct 4980gggcggaagc gcccttgtcc accacttcct
cgaagttcca gggggtgatg gtttcctcgc 5040tctttctggt catccaggcg
aatctgctgt ttcccctggc cagagggccc acgtagtagg 5100ggatgcggaa
ggtcaggatc ttctcgatct tttcccggtt gtccttcagg aatgggtaaa
5160aatcttcctg ccgccgcaga atggcgtgca gctctcccag gtggatctgg
tgggggatgc 5220tgccgttgtc gaaggtccgc tgcttccgca gcaggtcctc
tctgttcagc ttcacgagca 5280gttcctcggt gccgtccatc ttttccagga
tgggcttgat gaacttgtag aactcttcct 5340ggctggctcc gccgtcaatg
tagccggcgt agccgttctt gctctggtcg aagaaaatct 5400ctttgtactt
ctcaggcagc tgctgccgca cgagagcttt cagcagggtc aggtcctggt
5460ggtgctcgtc gtatctcttg atcatagagg cgctcagggg ggccttggtg
atctcggtgt 5520tcactctcag gatgtcgctc agcaggatgg cgtcggacag
gttcttggcg gccagaaaca 5580ggtcggcgta ctggtcgccg atctgggcca
gcaggttgtc caggtcgtcg tcgtaggtgt 5640ccttgctcag ctgcagtttg
gcatcctcgg ccaggtcgaa gttgctcttg aagttggggg 5700tcaggcccag
gctcagggca atcaggtttc cgaacaggcc attcttcttc tcgccgggca
5760gctgggcgat cagattttcc agccgtctgc tcttgctcag tctggcagac
aggatggcct 5820tggcgtccac gccgctggcg ttgatggggt tttcctcgaa
cagctggttg taggtctgca 5880ccagctggat gaacagcttg tccacgtcgc
tgttgtcggg gttcaggtcg ccctcgatca 5940ggaagtggcc ccggaacttg
atcatgtggg ccagggccag atagatcagc cgcaggtcgg 6000ccttgtcggt
gctgtccacc agtttctttc tcaggtggta gatggtgggg tacttctcgt
6060ggtaggccac ctcgtccacg atgttgccga agatggggtg ccgctcgtgc
ttcttatcct 6120cttccaccag gaaggactct tccagtctgt ggaagaagct
gtcgtccacc ttggccatct 6180cgttgctgaa gatctcttgc agatagcaga
tccggttctt ccgtctggtg tatcttcttc 6240tggcggttct cttcagccgg
gtggcctcgg ctgtttcgcc gctgtcgaac agcagggctc 6300cgatcaggtt
cttcttgatg ctgtgccggt cggtgttgcc cagcaccttg aatttcttgc
6360tgggcacctt gtactcgtcg gtgatcacgg cccagcccac agagttggtg
ccgatgtcca 6420ggccgatgct gtacttcttg tcggctgctg ggactccgtg
gataccgacc ttccgcttct 6480tctttggggc catcttatcg tcatcgtctt
tgtaatcaat atcatgatcc ttgtagtctc 6540cgtcgtggtc cttatagtcc
atttttctcg agggatcctg atatatttct attaggtatt 6600tattattata
aaatataaat cttgaatgat aataaataaa atattagtta ttccttttct
6660agtttaaaat atacatatta taaatatata tatatatata tatattttta
ttgtgacaag 6720aatatataat tataaattat attatttatt tttgtatttt
tttttttttt tttttttttt 6780tctttttttg ttttattttt cttttttttt
ataaatatta tttttttctt ttatcatgca 6840cattggaata atacattaat
atatatatat atattatatt atacatatat tgaataatgt 6900ttataaaaaa
tgcataactt atatgaatat aatttttttt aaatatgaca aaaagaaaaa
6960aaaaaaaaac caaaaaaaat taaaattgaa atgaaatata taaatatatt
atttatatat 7020attatacatt gtttaatact actacatgta tatatatata
ttatatatat atatatatat 7080caattttttc aaaaataaat taatataaaa
agaggggaaa aaaaaaaaaa aaaaaaaaaa 7140aagataatta agtaagcatt
taaaaatata taaattgata atatataaaa ttaatcacat 7200ataaaagctt
ataaacacta ggttagctaa ttcgcttgta agaggtactc tcgtttatgc
7260aaaactattt gatatagcat tttaacaagt acacatatat atatgtaata
tatatactat 7320atatatctat tgcatgtgta ctaagcatgt gcatggcatc
ccctttttct cgtgtttaaa 7380acagtttgta tgataaaata taaaggattt
gaaaaagaga aaaaaatata tgatctcatc 7440ctatatagcg ccataatttt
tatttgggtt gaataaaatt ttctactaaa tttaggtgta 7500agtaaaataa
tggaatatat ataagtacaa taaaaaagtg cataaattaa aaaattttta
7560taataaatat tttttttaaa aaagtcaata ataatattaa atatatataa
cacaggatta 7620tatatgttca ctacaatttt ttatattata atataaattc
ttttcaattt tcattttatt 7680ttacatacac tttccttttt tgtcactata
ttttaatatt cacatattta gtttaaatac 7740tggctatttc tttctacatt
tgctagtaac aattgtgtag tgcttaaata tatacacaca 7800cctaaaactt
acaaagtatc ctaggaccat ggccaagcct ttgtctcaag aagaatccac
7860cctcattgaa agagcaacgg ctacaatcaa cagcatcccc atctctgaag
actacagcgt 7920cgccagcgca gctctctcta gcgacggccg catcttcact
ggtgtcaatg tatatcattt 7980tactggggga ccttgtgcag aactcgtggt
gctgggcact gctgctgctg cggcagctgg 8040caacctgact tgtatcgtcg
cgatcggaaa tgagaacagg ggcatcttga gcccctgcgg 8100acggtgccga
caggtgcttc tcgatctgca tcctgggatc aaagccatag tgaaggacag
8160tgatggacag ccgacggcag ttgggattcg tgaattgctg ccctctggtt
atgtgtggga 8220gggctaaccg cgggtacccc attaaattta tttaataata
gattaaaaat attataaaaa 8280taaaaacata aacacagaaa ttacaaaaaa
aatacatatg aatttttttt ttgtaatctt 8340ccttataaat atagaataat
gaatcatata aaacatatca ttattcattt atttacattt 8400aaaattattg
tttcagtatc tttaatttat tatgtatata taaaaataac ttacaatttt
8460attaataaac aatatatgtt tattaattca tgttttgtaa tttatgggat
agcgattttt 8520tttactgtct gtattttctt ttttaattat gttttaattg
tatttatttt atttttatta 8580ttgttctttt tatagtatta ttttaaaaca
aaatgtattt tctaagaact tataataata 8640ataatataaa ttttaataaa
aattatattt atcttttaca atatgaacat aaagtacaac 8700attaatatat
agcttttaat atttttattc ctaatcatgt aaatcttaaa tttttctttt
8760taaacatatg ttaaatattt atttctcatt atatataaga acatatttat
tacatctaga 8820ggtaccgagc tcgttttcga cactggatgg cggcgttagt
atcgaatcga cagcagtata 8880gcgaccagca ttcacatacg attgacgcat
gatattactt tctgcgcact taacttcgca 8940tctgggcaga tgatgtcgag
gcgaaaaaaa atataaatca cgctaacatt tgattaaaat 9000agaacaacta
caatataaaa aaactataca aatgacaagt tcttgaaaac aagaatcttt
9060ttattgtcag tactgattag aaaaactcat cgagcatcaa atgaaactgc
aatttattca 9120tatcaggatt atcaatacca tatttttgaa aaagccgttt
ctgtaatgaa ggagaaaact 9180caccgaggca gttccatagg atggcaagat
cctggtatcg gtctgcgatt ccgactcgtc 9240caacatcaat acaacctatt
aatttcccct cgtcaaaaat aaggttatca agtgagaaat 9300caccatgagt
gacgactgaa tccggtgaga atggcaaaag cttatgcatt tctttccaga
9360cttgttcaac aggccagcca ttacgctcgt catcaaaatc actcgcatca
accaaaccgt 9420tattcattcg tgattgcgcc tgagcgagac gaaatacgcg
atcgctgtta aaaggacaat 9480tacaaacagg aatcgaatgc aaccggcgca
ggaacactgc cagcgcatca acaatatttt 9540cacctgaatc aggatattct
tctaatacct ggaatgctgt tttgccgggg atcgcagtgg 9600tgagtaacca
tgcatcatca ggagtacgga taaaatgctt gatggtcgga agaggcataa
9660attccgtcag ccagtttagt ctgaccatct catctgtaac atcattggca
acgctacctt 9720tgccatgttt cagaaacaac tctggcgcat cgggcttccc
atacaatcga tagattgtcg 9780cacctgattg cccgacatta tcgcgagccc
atttataccc atataaatca gcatccatgt 9840tggaatttaa tcgcggcctc
gaaacgtgag tcttttcctt acccatggtt gtttatgttc 9900ggatgtgatg
tgagaactgt atcctagcaa gattttaaaa ggaagtatat gaaagaagaa
9960cctcagtggc aaatcctaac cttttatatt tctctacagg ggcgcggcgt
ggggacaatt 10020caacgcgtct gtgaggggag cgtttccctg ctcgcaggtc
tgcagcgagg agccgtaatt 10080tttgcttcgc gccgtgcggc catcaaaatg
tatggatgca aatgattata catggggatg 10140tatgggctaa atgtacgggc
gacagtcaca tcatgcccct gagctgcgca cgtcaagact 10200gtcaaggagg
gtattctggg cctccatgtc gctggcctaa cattagtaat gtaggtctga
10260ctttcactca tataagtctt atggtaacta aactaaggtc ttacctttac
tgatatatgt 10320cttactttca ctaacttagg tattactttt actaacttag
gtcttaaatt cagtaactaa 10380ggtcatactt cgactaacta aggtcttaca
ttcactgata taggtcttat gattactaac 10440ttaggtccta atttgactaa
cataagtcct aacattagta atgtaggtct taacttaact 10500aacttaggtc
ttaccttcac taatataggt cttaatatta ctgacttaag taattaaggt
10560actaacttag gtcgtaaggt aactaatata taggtcttaa ggtaactaat
ttaggtcttg 10620acttaataaa tataggtcct aacataaata gtataggtcc
taatataagt actataggcc 10680ttaacttaac caacataggt cctaacataa
gttatatagg tcttaacgta actaacataa 10740gtcattaagg tactaagttt
ggtcttaatt taacaataac atgtcgctgg cctaacatta 10800gtaatgtagg
tctgactttc actcatataa gtcttatggt aactaaacta aggtcttacc
10860tttactgata tatgtcttac tttcactaac ttaggtatta cttttactaa
cttaggtctt 10920aaattcagta actaaggtca tacttcgact aactaaggtc
ttacattcac tgatataggt 10980cttatgatta ctaacttagg tcctaatttg
actaacataa gtcctaacat tagtaatgta 11040ggtcttaact taactaactt
aggtcttacc ttcactaata taggtcttaa tattactgac 11100ttaagtaatt
aaggtactaa cttaggtcgt aaggtaacta atatataggt cttaaggtaa
11160ctaatttagg tcttgactta ataaatatag gtcctaacat aaatagtata
ggtcctaata 11220taagtactat aggccttaac ttaaccaaca taggtcctaa
cataagttat ataggtctta 11280acgtaactaa cataagtcat taaggtacta
agtttggtct taatttaaca ataaccatgt 11340cgctggccgg gtggtcttaa
tttaacaaat atagaccatg tcgctggccg ggtgacccgg 11400cggggacgag
gcaagctaaa cagatcctcg tgatacgcct atttttatag gttaatgtca
11460tgataataat ggtttcttag gacggatcgc ttgcctgtaa cttacacgcg
cctcgtatct 11520tttaatgatg gaataatttg ggaatttact ctgtgtttat
ttatttttat gttttgtatt 11580tggattttag aaagtaaata aagaaggtag
aagagttacg gaatgaagaa aaaaaaataa 11640acaaaggttt aaaaaatttc
aacaaaaagc gtactttaca tatatattta ttagacaaga 11700aaagcagatt
aaatagatat acattcgatt aacgataagt aaaatgtaaa atcacaggat
11760tttcgtgtgt ggtcttctac acagacaaga tgaaacaatt cggcattaat
acctgagagc 11820aggaagagca agataaaagg tagtatttgt tggcgatccc
cctagagtct tttacatctt 11880cggaaaacaa aaactatttt ttctttaatt
tcttttttta ctttctattt ttaatttata 11940tatttatatt aaaaaattta
aattataatt atttttatag cacgtgatga aaaggaccca 12000ggtggcactt
ttcggggaaa tctcgacctg cagcgtacga agct 12044
* * * * *
References