U.S. patent application number 15/039393 was published by the patent office on 2017-02-02 for TAL effector means useful for partial or full deletion of DNA tandem repeats.
The applicant listed for this patent is INSTITUT PASTEUR. Invention is credited to Valentine MOSBACH, Guy-Franck RICHARD, David VITERBO.
Publication Number | 20170029794 |
Application Number | 15/039393 |
Family ID | 49766009 |
Publication Date | 2017-02-02 |
United States Patent Application | 20170029794 |
Kind Code | A1 |
RICHARD; Guy-Franck; et al. | February 2, 2017 |
TAL EFFECTOR MEANS USEFUL FOR PARTIAL OR FULL DELETION OF DNA
TANDEM REPEATS
Abstract
The application relates to means, which derive from TAL
effectors and TALENs. The structure of the means of the application
is especially adapted for partial or full deletion of at least one
DNA tandem repeat, more particularly for partial or full deletion
of at least one DNA tandem repeat in a double-stranded DNA, more
particularly for partial or full deletion of at least one DNA
tandem repeat, which is contained in a double-stranded DNA and,
which forms a complex secondary structure, such as a hairpin, a
triple helix or a tetraplex secondary structure. The means of the
application are notably useful in the treatment and/or prevention
and/or palliation of a disease or disorder involving at least one
DNA tandem repeat, such as DM1, SCA8, SCA12, HDL2, SBMA, HD, DRPLA,
SCA1, SCA2, SCA3, SCA6, SCA7, SCA17, PSACH, DM2, SCA10, SPD1, OPMD,
CCD, HPE5, HFG syndrome, BPES, EIEE1, FRAXA, FXTAS and FRAXE.
Inventors: | RICHARD; Guy-Franck (Paris, FR); MOSBACH; Valentine (Paris, FR); VITERBO; David (Paris, FR) |
Applicant: |
Name | City | State | Country | Type |
INSTITUT PASTEUR | Paris | | FR | |
Family ID: | 49766009 |
Appl. No.: | 15/039393 |
Filed: | November 26, 2014 |
PCT Filed: | November 26, 2014 |
PCT NO: | PCT/EP2014/075718 |
371 Date: | May 25, 2016 |
Current U.S. Class: | 1/1 |
Current CPC Class: | C07K 2319/80 20130101; A61P 21/00 20180101; A61K 48/005 20130101; C12N 15/907 20130101; C12N 15/62 20130101; A61P 19/00 20180101; C12Y 301/21004 20130101; C12N 9/22 20130101; A61P 25/00 20180101 |
International Class: | C12N 9/22 20060101 C12N009/22; C12N 15/90 20060101 C12N015/90 |
Foreign Application Data
Date | Code | Application Number |
Nov 29, 2013 | EP | 13306644.9 |
Claims
1. A composition or kit comprising a first DNA-binding polypeptide
and a second DNA-binding polypeptide, wherein said first
DNA-binding polypeptide is different from said second DNA-binding
polypeptide, wherein each of said first and second DNA-binding
polypeptides binds to a DNA nucleic acid comprising at least one
DNA tandem repeat, wherein the DNA nucleic acid to which said first
DNA-binding polypeptide binds is one strand of a double-stranded
nucleic acid, wherein the DNA nucleic acid to which said second
DNA-binding polypeptide binds is the other strand of the same
double-stranded nucleic acid, wherein said double-stranded DNA
nucleic acid is a gene involved in a neurological and/or muscular
and/or skeletal disorder or disease involving said at least one DNA
tandem repeat, wherein each of said first and second DNA-binding
polypeptides comprises a TAL effector tandem repeat consisting of
adjacent units of TAL effector tandem repeat, wherein the ordered
series of RVDs formed by the RVDs respectively contained in said
adjacent units of TAL effector tandem repeat, in N- to
C-orientation, is an ordered series of amino acids, which
determines the recognition of the 5'-3' nucleotide sequence of a
DNA target site contained in the strand of double-stranded DNA
nucleic acid to which said DNA-binding polypeptide binds, wherein
the sequence of said DNA target site is: i. a fragment of said
strand of double-stranded DNA nucleic acid consisting of a fragment
of said at least one DNA tandem repeat, wherein said fragment
comprises more than one copy of said DNA sequence unit of said at
least one DNA tandem repeat, or ii. a fragment of said strand of
double-stranded DNA nucleic acid, which starts outside the sequence
of said at least one DNA tandem repeat and ends within the sequence
of said at least one DNA tandem repeat, or conversely, which starts
within the sequence of said at least one DNA tandem repeat and ends
outside the sequence of said at least one DNA tandem repeat,
wherein each of said first and second DNA-binding polypeptides is
directly or indirectly linked to one endonuclease monomer or to one
fragment of endonuclease monomer, wherein said fragment of
endonuclease monomer still comprises the catalytic domain of said
endonuclease monomer, and wherein said first and second DNA-binding
polypeptides induce a partial or complete deletion of said at least
one DNA tandem repeat.
2. The composition or kit of claim 1, wherein the DNA target site
of said first DNA-binding polypeptide is a DNA target site as
defined in claim 1 i. and the DNA target site of said second
DNA-binding polypeptide is a DNA target site as defined in claim 1
ii.
3. The composition or kit of claim 1 or 2, wherein the sequence of
the DNA target site of said first DNA-binding polypeptide is
different from the sequence of the DNA target site of said second
DNA-binding polypeptide.
4. The composition or kit of any one of claims 1-3, wherein the
endonuclease monomer or endonuclease monomer fragment of said first
DNA-binding polypeptide dimerizes with the endonuclease monomer or
endonuclease monomer fragment of said second DNA-binding
polypeptide, when said first and second DNA-binding polypeptides
are bound to their respective DNA target sites.
5. The composition or kit of any one of claims 1-4, wherein said
dimeric endonuclease is the endonuclease FokI.
6. The composition or kit of any one of claims 1-5, wherein the
nucleotide spacer length between the DNA target site of said first
DNA-binding polypeptide and the DNA target site of said second
DNA-binding polypeptide is of 15-24 nucleotides.
7. The composition or kit of any one of claims 1-6, wherein said
first and second DNA-binding polypeptides induce a double-strand
break specifically in said double-stranded DNA nucleic acid.
8. The composition or kit of any one of claims 1-7, wherein each of
said first and second DNA-binding polypeptides further comprises a
Nuclear Localization Signal (NLS).
9. The composition or kit of any one of claims 1-8, wherein none of
said first and second DNA-binding polypeptides comprises the acidic
transcriptional Activator Domain (AD) of a TAL effector.
10. A composition or kit, which comprises a first nucleic acid and
a second nucleic acid, wherein said first nucleic acid codes for a
first DNA-binding polypeptide, wherein said second nucleic acid
codes for a second DNA-binding polypeptide, wherein said first
DNA-binding polypeptide is different from said second DNA-binding
polypeptide, and wherein said first DNA-binding polypeptide and
said second DNA-binding polypeptide are as defined in any one of
claims 1-9.
11. A composition or kit, which comprises: a first recombinant
nucleic acid vector and a second recombinant nucleic acid vector,
wherein said first recombinant nucleic acid vector codes for a
first DNA-binding polypeptide, wherein said second recombinant
nucleic acid vector codes for a second DNA-binding polypeptide,
wherein said first DNA-binding polypeptide is different from said
second DNA-binding polypeptide, and wherein said first DNA-binding
polypeptide and said second DNA-binding polypeptide are as defined
in any one of claims 1-9; and/or comprising a first lentiviral
vector pseudotyped particle and a second lentiviral vector
pseudotyped particle, wherein said first lentiviral vector
pseudotyped particle codes for a first DNA-binding polypeptide,
wherein said second lentiviral vector pseudotyped particle codes
for a second DNA-binding polypeptide, wherein said first
DNA-binding polypeptide is different from said second DNA-binding
polypeptide, and wherein said first DNA-binding polypeptide and
said second DNA-binding polypeptide are as defined in any one of
claims 1-9.
12. The composition or kit of any one of claims 1-11, which is for
use in the treatment and/or palliation and/or prevention of said
neurological and/or muscular and/or skeletal disorder or
disease.
13. The composition or kit for the use according to claim 12,
wherein said first and second DNA-binding polypeptides induce a
deletion of said at least one DNA tandem repeat to a length below
the pathological threshold.
14. The composition or kit for the use according to claim 12 or 13,
wherein said neurological and/or muscular and/or skeletal disease
or disorder is a trinucleotide, tetranucleotide or pentanucleotide
disease or disorder.
15. The composition or kit for the use according to any one of
claims 12-14, wherein said DNA sequence unit of said at least one
DNA tandem repeat is 5'-CTG-3', 5'-TTG-3', 5'-GTC-3', 5'-CCTG-3',
5'-ATTCT-3' or 5'-AGAAT-3'.
16. The composition or kit for the use according to any one of
claims 12-15, wherein the sequence of said at least one DNA tandem
repeat forms a secondary structure, which is a hairpin, a triple
helix or a tetraplex secondary structure.
17. The composition or kit for the use according to any one of
claims 12-16, wherein said neurological and/or muscular and/or
skeletal disease or disorder is a trinucleotide, tetranucleotide or
pentanucleotide disease or disorder.
18. The composition or kit for the use according to any one of
claims 12-17, wherein said neurological and/or muscular and/or
skeletal disease or disorder is DM1, SCA8, SCA12, HDL2, SBMA, HD,
DRPLA, SCA1, SCA2, SCA3, SCA6, SCA7, SCA17, PSACH, DM2 or
SCA10.
19. The composition or kit for the use according to any one of
claims 12-18, wherein said gene involved in a neurological and/or
muscular and/or skeletal disorder or disease is: a gene coding for
DMPK, ATXN8, PPP2R2B, JPH3, AR, HTT, ATN1, ATXN1, ATXN2, ATXN3,
CACNA1A, ATXN7, TBP, COMP, ZNF9 or ATXN10, or the human gene coding
for DMPK, ATXN8, PPP2R2B, JPH3, AR, HTT, ATN1, ATXN1, ATXN2, ATXN3,
CACNA1A, ATXN7, TBP, COMP, ZNF9 or ATXN10, and wherein said gene
comprises said at least one DNA tandem repeat.
20. The composition or kit for the use according to any one of
claims 12-19, wherein said adjacent units of TAL effector tandem
repeat comprise one or several copy(ies) of at least one sequence
selected from the group consisting of SEQ ID NOs: 25, 26, 46 and
55, and/or comprise one or several copy(ies) of at least one of the
sequences of TAL effector tandem repeat units of the DNA-binding
polypeptide, which is coded by the plasmid deposited at the
Collection Nationale de Culture de Microorganismes (C.N.C.M.),
Paris, France, under deposit number I-4804 or under deposit number
I-4805.
21. The composition or kit for the use according to any one of
claims 12-20, wherein said DNA target site of claim 1 i. is the
sequence of SEQ ID NO: 10 or 11, and/or wherein said DNA target
site of claim 1 ii. is the sequence of SEQ ID NO: 4 or 5, and/or
wherein said first and second DNA-binding polypeptides are the
polypeptides coded by the sequences of SEQ ID NOs: 1 and 2,
respectively.
22. A method for producing a product that is useful for fully or
partially deleting a DNA tandem repeat that is contained in a
double stranded DNA, wherein said DNA tandem repeat forms, in said
double stranded DNA, a secondary structure, which is a hairpin, a
triple helix or a tetraplex structure, wherein said method
comprises producing a pair of DNA-binding polypeptides, wherein
said pair of DNA-binding polypeptides is a first DNA-binding
polypeptide and a second DNA-binding polypeptide as defined in any
one of claims 1-9, and wherein said pair of DNA-binding
polypeptides is a product useful for said full or partial DNA
tandem repeat deletion.
23. A method for fully or partially deleting in vitro a DNA tandem
repeat that is contained in a double stranded DNA, wherein said
method comprises placing in vitro said double-stranded DNA into
contact with: a first DNA-binding polypeptide and a second
DNA-binding polypeptide as defined in any one of claims 1-9,
wherein said first and second DNA-binding polypeptides induce the
full or partial deletion of said DNA tandem repeat, or with a first
nucleic acid and a second nucleic acid as defined in claim 10,
wherein the first and second DNA-binding polypeptides, which are
coded by said first and second nucleic acids induce the full or
partial deletion of said DNA tandem repeat, or with a first
recombinant nucleic acid vector and a second recombinant nucleic
acid vector as defined in claim 11, wherein the first and second
DNA-binding polypeptides, which are coded by said first and second
recombinant nucleic acid vectors, induce the full or partial
deletion of said DNA tandem repeat, or with a first lentiviral
vector pseudotyped particle and a second lentiviral vector
pseudotyped particle as defined in claim 11, wherein the first and
second DNA-binding polypeptides, which are coded by said first and
second lentiviral vector pseudotyped particles induce the full or
partial deletion of said DNA tandem repeat.
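For illustration only, the two kinds of DNA target site defined in
claim 1 (i. and ii.) and the spacer length of claim 6 can be expressed
as simple coordinate checks. The Python sketch below is not part of
the application. The names Interval, classify_target_site,
spacer_length and spacer_ok are hypothetical, coordinates are assumed
to be 0-based positions on a single reference strand, and "more than
one copy" of the repeat unit is read here as "longer than one unit",
which is one possible interpretation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end) on one reference strand (assumption)."""
    start: int
    end: int


def classify_target_site(site: Interval, repeat: Interval, unit_len: int) -> Optional[str]:
    """Return "i", "ii" or None for a target site relative to a repeat tract.

    "i":  the site lies entirely within the tandem repeat and is longer
          than one repeat unit (illustrative reading of claim 1 i.).
    "ii": the site starts outside the repeat and ends within it, or
          starts within it and ends outside (claim 1 ii.).
    """
    inside = repeat.start <= site.start and site.end <= repeat.end
    if inside:
        return "i" if (site.end - site.start) > unit_len else None
    starts_inside = repeat.start <= site.start < repeat.end
    ends_inside = repeat.start < site.end <= repeat.end
    # Exactly one end of the site must fall inside the repeat tract.
    if starts_inside != ends_inside:
        return "ii"
    return None


def spacer_length(left: Interval, right: Interval) -> int:
    """Number of nucleotides between the end of the left site and the start of the right site."""
    return right.start - left.end


def spacer_ok(left: Interval, right: Interval, lo: int = 15, hi: int = 24) -> bool:
    """Check the 15-24 nucleotide spacer range recited in claim 6."""
    return lo <= spacer_length(left, right) <= hi
```

With a hypothetical repeat tract at positions 100 to 190, a site at
110 to 126 would be classified as "i" and a site at 90 to 106 as "ii";
two sites separated by 18 nucleotides would satisfy the spacer check.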
Description
FIELD OF THE INVENTION
[0001] The application relates to means, which derive from
transcription activator-like (TAL) effectors, more particularly
from TAL effector endonucleases (TALENs).
[0002] The means of the application are notably useful for fully or
partially deleting a DNA tandem repeat, more particularly for fully
or partially deleting a DNA tandem repeat in a double-stranded DNA
molecule, more particularly for fully or partially deleting an
expanded DNA tandem repeat in a double-stranded DNA molecule.
[0003] The application also relates to medical and biotechnological
applications, more particularly in the field of diseases and
disorders involving expanded DNA tandem repeats in double-stranded
DNA molecules, such as trinucleotide repeat diseases or disorders,
tetranucleotide repeat diseases or disorders or pentanucleotide
repeat diseases or disorders.
BACKGROUND OF THE INVENTION
[0004] DNA tandem repeats occur frequently in double-stranded DNAs
of eukaryotic genomes, more particularly of the human genome. DNA
tandem repeat units of 2, 3, 4, 5 or even more nucleotides can be
observed in a genome at different frequencies and locations (exons,
introns, intergenic regions). DNA tandem repeats are prone to
recombination and/or random integration events, and are considered
to be at the center of species evolution.
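As a purely illustrative aid, a DNA tandem repeat in the sense used
above (a unit of a few nucleotides repeated head-to-tail) can be
located in a sequence with a short Python sketch. The function name
find_tandem_repeats and its parameters are hypothetical, and the
regular-expression approach is only one simple way of doing this; it
is not a method described in the application.

```python
import re


def find_tandem_repeats(seq: str, unit_len: int, min_copies: int = 2):
    """Return (start, unit, copies) for each tract of a unit of length
    unit_len repeated at least min_copies times in a row."""
    # ([ACGT]{k}) captures one unit; \1{m,} requires further adjacent copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // unit_len)
        for m in pattern.finditer(seq.upper())
    ]
```

For example, applied to a sequence made of "TT", five copies of "CAG"
and "AA", with unit_len=3, the sketch reports a single tract of five
CAG units starting at position 2.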
[0005] However, expansion in the length of a DNA tandem repeat can
result in deleterious effects on gene function, leading to disease
or disorder. Expansion in DNA tandem repeat is known to underlie
about 20 severe neurological and/or muscular and/or skeletal
diseases or disorders (McMurray 2010).
[0006] Over the last 20 years or so, it has been demonstrated that
replication slippage, double-strand break repair, base excision
repair and nucleotide excision repair (basically, any mechanism
involving de novo DNA synthesis within a DNA tandem repeat) are
involved in DNA tandem repeat expansion. However, the precise
mechanisms are still obscure.
[0007] A large number of studies were devoted to understanding the
mechanisms responsible for large trinucleotide repeat expansions,
using model systems as diverse as bacteria, yeast, Drosophila, mice
or human cell lines.
[0008] Richard et al. 1999 and Richard et al. 2003 demonstrated
that the insertion of a recognition site for the rare cutter
endonuclease I-SceI or HO between two short (CAG)n repeats
leads to the induction of a double-strand break (DSB) by said
endonuclease, resulting in contractions or expansions of the repeat
domain. However, the efficacy of such engineered nucleases is
highly variable depending on the genomic target tested, and
requires the insertion of the endonuclease recognition site.
[0009] Zinc-finger nucleases (ZFN) were developed for targeted gene
editing in eukaryotes. They were built by fusing modular
zinc-finger DNA-binding domains to the catalytic domain of the FokI
endonuclease (Mittelman et al. 2009). However, they induce high
toxicity and a high frequency of off-target mutations, probably due
to recognition and cutting of many degenerate sequences differing
only slightly from the targeted sequence.
[0010] Hence, the available prior art means are not fully adapted
to the deletion of an (expanded) DNA tandem repeat in a
double-stranded DNA, and are not adapted to medical
applications.
[0011] Furthermore, an expanded DNA tandem repeat domain in a
double-stranded DNA, such as those observed in pathological
conditions, poses particular technical problems. Indeed, such an
expanded DNA tandem repeat domain forms a complex secondary
structure, such as a hairpin, a triple helix or a tetraplex
secondary structure, which hinders or complicates accessibility to
appropriate cleavage and which may promote repeat expansion during
DSB repair (Richard et al. 2000).
[0012] Appropriate means should allow size reduction of the
(expanded) DNA tandem repeat down to a non-pathological level, even
when said (expanded) DNA tandem repeat has a complex secondary
structure, such as a hairpin, a triple helix or a tetraplex
secondary structure.
[0013] Appropriate means should also be as non-toxic as possible
to allow survival of the cell, and induce as few side mutations or
alterations as possible. Advantageously, they should be
sufficiently specific to avoid off-target cleavage as much as
possible.
[0014] The application provides means, which can achieve these
goals.
SUMMARY OF THE INVENTION
[0015] The means of the application derive from TAL effectors and
TALENs. The structure of the means of the application is especially
adapted for partial or full deletion of at least one DNA tandem
repeat, more particularly for partial or full deletion of at least
one DNA tandem repeat in a double-stranded DNA, more particularly
for partial or full deletion of at least one DNA tandem repeat,
which is contained in a double-stranded DNA and, which forms a
non-linear secondary structure, such as a hairpin, a triple helix
or a tetraplex secondary structure. The means of the application
are especially adapted for partial or full deletion of at least one
(expanded) DNA tandem repeat in a double-stranded DNA, such as
those observed in pathological conditions.
[0016] The application relates to the subject-matter as defined in
the claims as filed and as herein described.
[0017] More particularly, the application relates to DNA-binding
polypeptides and to products deriving therefrom such as nucleic
acids, vectors, cells, liposomes, nanoparticles, sets,
compositions, kits, pharmaceutical compositions, medicaments and
drugs.
[0018] The application also relates to uses of said products and to
methods involving at least one of said products, more particularly
in the medical field.
[0019] The products of the application are notably useful in the
treatment and/or prevention and/or palliation of a disease or
disorder involving at least one DNA tandem repeat, more
particularly of a trinucleotide, tetranucleotide or pentanucleotide
disease or disorder, such as DM1, SCA8, SCA12, HDL2, SBMA, HD,
DRPLA, SCA1, SCA2, SCA3, SCA6, SCA7, SCA17, PSACH, DM2, SCA10,
SPD1, OPMD, CCD, HPE5, HFG syndrome, BPES, EIEE1, FRAXA, FXTAS and
FRAXE (cf. Tables 6, 7 and 8 below).
[0020] The means of the application allow size reduction of the DNA
tandem repeat down to a non-pathological level at a high efficacy
rate (near 100% in heterozygous and homozygous yeast cells).
[0021] No increase in the mutation rate was detected. No large
genomic rearrangement, such as aneuploidy, segmental duplication or
translocation, was detected.
[0022] According to an advantageous aspect of the application, the
means of the application do not induce any length alteration or
mutation at off-target locations, e.g., in non-pathological genes,
which comprise the same repeat unit as the pathological gene.
[0023] This is believed to be the first demonstration of the
induction of a shortening of a DNA tandem repeat in a
double-stranded DNA to lengths below pathological thresholds in
humans, with 100% efficacy and a high specificity.
BRIEF DESCRIPTION OF THE FIGURES
[0024] FIGS. 1A and 1B: schematic representation of the experiments
of example 1.
[0025] FIG. 1A: Plasmids pCLS9996 (C.N.C.M. deposit number I-4804;
C.N.C.M. deposit date: 10 Oct. 2013) and pCLS16715 (C.N.C.M.
deposit number I-4805; C.N.C.M. deposit date: 10 Oct. 2013),
carrying the two TALEN arms were respectively transformed into
GFY40 strain or GFY6162-3D. Haploids were crossed and diploids
containing both TALEN arms were selected on SC-Leu supplemented
with G418 sulfate. As a control, the split-TALEN left arm carried
by pCLS9984 was transformed in GFY6162-3D, crossed to GFY40
carrying the TALEN right arm, and diploids were selected as
before.
[0026] FIG. 1B: Sequences recognized by both TALE DNA-binding
domains and by the split-TALE. The length of the spacer, which is
appropriate to induce a DSB was deduced from repeat tract lengths
analyzed in surviving cells after TALEN induction (length of 18
bp).
[0027] FIGS. 2A to 2D: Molecular analysis of survivors after TALEN
induction.
[0028] FIG. 2A: Survival after galactose induction (ratio of CFU on
galactose plates over CFU on glucose plates, after 3-5 days of
growth at 30 °C).
[0029] FIG. 2B: Molecular analysis of heterozygous diploids
(SUP4-opal/sup4-(CAG)).
[0030] FIG. 2C: PCR amplification of DNA extracted from survivors.
When both alleles are present, bands of slightly different sizes
corresponding to uncut alleles are visible in both lanes (arrow
labeled "Uncut"), along with restriction products of cut alleles
(arrows labeled "Cut"). When only the SUP4-opal allele is present,
no cut product is detected in the `I` lane (clones 8 and 11 to
20).
[0031] FIG. 2D: Molecular analysis of homozygous diploids
(sup4-(CAG)/sup4-(CAG)). Same as FIG. 2B, except that total genomic
DNA was digested with Ssp I.
[0032] FIGS. 3A to 3D: Karyotypes and sequencing of TALEN-induced
yeast colonies.
[0033] FIG. 3A: Sanger sequencing of survivors.
[0034] FIG. 3B: Two models proposing how heterozygous and
homozygous repeats may be formed following TALEN induction.
[0035] FIG. 3C: Deep sequencing of yeast genomes from yeast
colonies isolated on glucose or galactose plates.
[0036] FIG. 3D: Pulse-field gel electrophoresis of red and white
colonies after galactose induction.
[0037] FIG. 4: Southern blots (left: strains GFY6161-3C (MATa
leu2Δ1 his3Δ200 lys2Δ202 ade2-opal
sup4::(CAG)30) and GFY6162-3D; right: transformants
GFY6162-3C/1 and GFY6162-3D/2).
DETAILED DESCRIPTION OF THE INVENTION
[0038] The application relates to the subject-matter as defined in
the claims as filed and as herein described.
[0039] The means of the application derive from means, which were
created for genome editing, i.e., Transcription Activator-Like
(TAL) effectors and TAL effector endonucleases (TALENs).
[0040] TAL effectors and TALENs have been described e.g., in Boch
et al. 2009, Moscou et al. 2009, Bogdanove and Voytas 2011, Cermak
et al. 2011, Bedell et al. 2012, Beurdeley et al. 2012, WO
2011/072246 (and its national counterparts, more particularly its
US counterpart(s) (including the US continuation and divisional
application(s))), WO 2010/079430 (and its national counterparts,
more particularly its US counterpart(s) (including the US
continuation and divisional application(s))).
[0041] TAL effectors have been discovered in phytopathogenic
bacteria of the genus Xanthomonas, and are key virulence factors of
these bacteria. Once inside the plant cell, they enter the nucleus,
bind effector-specific DNA sequences and reprogram the host cell by
mimicking eukaryotic transcription factors (Boch et al. 2009;
Moscou et al. 2009). A naturally-occurring TAL effector typically
comprises:
[0042] a tandem repeat (or central domain), which is the direct
tandem repeat of adjacent amino acid units, wherein each unit of the
(tandem) repeat consists of 33, 34 or 35 amino acids, the N- to
C-ordered series of which determine the (specific) recognition of a
nucleotide sequence,
[0043] said tandem repeat being followed (in C-term) by a truncated
amino acid unit (usually, the truncation is at 20 amino acids), which
is not involved in the (specific) recognition of said nucleotide
sequence,
[0044] at least one Nuclear Localization Signal (NLS), and
[0045] an acidic transcriptional Activation Domain (AD) (cf. Boch et
al. 2009).
[0046] The number of (full-length) units of the tandem repeat of a
naturally-occurring TAL effector (i.e., the number of amino acid
units, which determine the (specific) recognition of the nucleotide
sequence) may e.g., range from 8 to 39, more particularly from 10
to 33, usually from 12 to 27.
[0047] TAL effectors are highly conserved among the different
bacterial species. Examples of TAL effectors, which derive from a
naturally-occurring source, include AvrBs3 (from Xanthomonas
campestris pv. vesicatoria), PthXo1 (from Xanthomonas oryzae pv.
oryzae), AvrXa27 (from Xanthomonas oryzae pv. oryzae), PthXo6 (from
Xanthomonas oryzae pv. oryzae), PthXo7 (from Xanthomonas oryzae pv.
oryzae).
[0048] The amino acid sequence of each (tandem) repeat unit is
largely invariant within a TAL effector, with the exception of two
adjacent amino acids, which are known as the Repeat Variable
Diresidue (RVD), and which typically are at positions 12 and 13
within the repeat unit.
[0049] When a TAL effector repeat unit consists of 34 or 35 amino
acids, the RVD is at positions 12 and 13.
[0050] When a TAL effector repeat unit consists of 33 amino acids,
the amino acid that is at the second position in the RVD is missing
(i.e., the variable amino acid, which would have been at position
13, is missing). Hence, in such a situation, the RVD does not
consist of two adjacent amino acids, but of only one amino acid. In
accordance with the acknowledged terminology in the field of TAL
effectors and TALENs, said amino acid at position 12 is
referred to as an RVD, although it is not followed by a variable
amino acid at position 13.
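The position rules above (RVD at positions 12 and 13 for units of 34
or 35 amino acids, and at position 12 only for units of 33 amino
acids) can be sketched in Python as follows. This is a minimal
illustration, not part of the application: the function name
extract_rvd is hypothetical, and the two repeat-unit strings are
illustrative examples chosen for the demonstration, not sequences
taken from the sequence listing.

```python
def extract_rvd(repeat_unit: str) -> str:
    """Return the RVD of one TAL effector repeat unit.

    Positions are 1-based in the text, so position 12 is index 11.
    For a 33-residue unit the missing second residue is written "*",
    following the notation of Table 5.
    """
    n = len(repeat_unit)
    if n in (34, 35):
        return repeat_unit[11:13]   # positions 12 and 13
    if n == 33:
        return repeat_unit[11] + "*"  # position 12 only
    raise ValueError(f"unexpected repeat unit length: {n}")


# Illustrative repeat units (assumptions made for this example only).
EXAMPLE_UNIT_34 = "LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG"
EXAMPLE_UNIT_33 = "LTPEQVVAIASNGGKQALETVQRLLPVLCQAHG"
```

Under these assumptions, the 34-residue example yields the RVD "HD"
and the 33-residue example yields "N*".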
[0051] Repeat units with different RVDs recognize different
nucleotides, and there is a direct correspondence between the RVDs
in the repeat domain and the nucleotides in the target DNA
sequence. Examples of RVDs and of their corresponding target
nucleotides are given in Table 5 below.
TABLE 5
RVD | Nucleotide |
HD | C |
NG | T |
NI | A |
NN | G or A |
NS | A or C or G |
N* | C or T |
HG | T |
H* | T |
IG | T |
HA | C |
ND | C |
NK | G |
HI | C |
HN | G |
NA | G |
SN | G or A |
YG | T |
* denotes a gap in the repeat sequence corresponding to a lack of the
amino acid residue at the second position of the RVD (e.g., when the
repeat consists of 33 amino acids, instead of 34 or 35 amino acids).
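By way of illustration, the RVD/nucleotide correspondence of Table 5
can be written as a lookup table and used to test whether an ordered
series of RVDs is compatible with a given 5'-3' nucleotide sequence.
The Python sketch below is an assumption-laden example rather than a
tool described in the application: the names RVD_TO_NUCLEOTIDES,
decode_rvds and matches are hypothetical, and binding is reduced to a
simple per-position yes/no match.

```python
# RVD -> nucleotide(s) recognized, transcribed from Table 5.
RVD_TO_NUCLEOTIDES = {
    "HD": "C", "NG": "T", "NI": "A", "NN": "GA", "NS": "ACG",
    "N*": "CT", "HG": "T", "H*": "T", "IG": "T", "HA": "C",
    "ND": "C", "NK": "G", "HI": "C", "HN": "G", "NA": "G",
    "SN": "GA", "YG": "T",
}


def decode_rvds(rvds):
    """For RVDs listed in N- to C-order, return the set of nucleotides
    each one recognizes, in 5'-3' order."""
    return [set(RVD_TO_NUCLEOTIDES[rvd]) for rvd in rvds]


def matches(rvds, site: str) -> bool:
    """True if every nucleotide of the site is among those recognized
    by the RVD at the same position (simplified model)."""
    if len(rvds) != len(site):
        return False
    return all(
        base in allowed
        for base, allowed in zip(site.upper(), decode_rvds(rvds))
    )
```

For instance, the series HD, NI, NN is compatible with both CAG and
CAA under Table 5 (NN recognizes G or A), whereas HD, NI, NK is
compatible with CAG only.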
[0052] In accordance with the acknowledged terminology in the field
of TAL effectors and TALENs, each of the amino acid units that forms
the tandem repeat of a TAL effector or TALEN (i.e., each of the
amino acid units, which determine the (specific) recognition of the
nucleotide sequence) is referred to as a (tandem) repeat
unit, although the repeat units of the same tandem repeat do not
have the same sequence.
[0053] Engineered (or man-made or artificial) TAL effectors have
been produced by modification of naturally-occurring TAL
effectors.
[0054] For example, engineered (or man-made or artificial) TAL
effectors have been produced by truncation of a naturally-occurring
TAL effector, to produce fragments of naturally-occurring TAL
effector, which have retained the DNA-binding function of the full
length TAL effector. More particularly, engineered (or man-made or
artificial) TAL effectors have been produced by truncation of the
acidic transcriptional Activation Domain, to produce a fragment of
a naturally-occurring TAL effector, which is devoid of the acidic
transcriptional Activation Domain, but which has retained the
DNA-binding function of the full length TAL effector.
[0055] Engineered (or man-made or artificial) TAL effectors have
been produced by modification of the RVD sequence and/or by
modification of the number of repeat units of naturally-occurring
TAL effectors or of fragments thereof, to recode them for defined
target DNA sequences (cf. e.g., WO 2011/072246 (and its national
counterparts, more particularly its US counterpart(s) (including
the US continuation and divisional application(s))), WO 2010/079430
(and its national counterparts, more particularly its US
counterpart(s) (including the US continuation and divisional
application(s)))).
[0056] TAL effectors have been used in genome editing (Bedell et
al. 2012, Cade et al. 2012, Chen et al. 2013, Qiu et al. 2013).
[0057] However, it is believed that TAL effectors and TALENs have
not been previously used for partial or full deletion of an
(expanded) DNA tandem repeat, more particularly for partial or full
deletion of an (expanded) DNA tandem repeat, which is contained in
a double-stranded DNA molecule, more particularly for partial or
full deletion of an (expanded) DNA tandem repeat, which is
contained in a double-stranded DNA molecule and, which forms a
non-linear secondary structure, such as a hairpin, a triple helix
or a tetraplex secondary structure.
[0058] It is also believed that TAL effectors and TALENs have not
been previously used for partial or full deletion of an (expanded)
DNA tandem repeat that is contained in a double-stranded DNA
molecule, such as those observed in pathological conditions.
[0059] The structure of the means of the application is especially
adapted for partial or full deletion of at least one DNA tandem
repeat, more particularly for partial or full deletion of at least
one DNA tandem repeat in a double-stranded DNA, more particularly
for partial or full deletion of at least one DNA tandem repeat,
which is contained in a double-stranded DNA and, which forms a
non-linear secondary structure, such as a hairpin, a triple helix
or a tetraplex secondary structure.
[0060] The structure of the means of the application is especially
adapted for partial or full deletion of at least one (expanded) DNA
tandem repeat that is contained in a double-stranded DNA molecule,
such as those observed in pathological conditions.
[0061] One of the means of the application is a DNA-binding
polypeptide, which binds, or specifically binds, to a DNA nucleic
acid comprising at least one DNA tandem repeat, wherein said
DNA-binding polypeptide comprises a TAL effector tandem repeat. A
TAL effector tandem repeat consists of adjacent amino acid units
(of TAL effector tandem repeat), each containing a Repeat Variable
Diresidue (RVD) that determines recognition of a nucleotide (cf.
above).
[0062] According to an embodiment of the application, said TAL
effector tandem repeat units are immediately or directly adjacent
to each other, i.e., are contiguous.
[0063] Said DNA-binding polypeptide may further comprise at least
one Nuclear Localization Signal (NLS), more particularly at least
one NLS of a TAL effector.
[0064] The term "polypeptide" is herein intended in accordance with
its ordinary meaning in the field of biology. The term
"polypeptide" generally refers to a chain of amino acids linked by
peptide bonds. It does not imply any restriction on maximal
length of the amino acid chain. As described below, a DNA-binding
polypeptide of the application comprises several units of TAL
effector, and therefore has a minimal length that typically is
above 50 amino acids, more particularly above 60 amino acids, more
particularly above 70 amino acids, more particularly above 100
amino acids, more particularly above 150 amino acids, more
particularly of at least 200 amino acids. The maximal length of a
DNA-binding polypeptide of the application typically is below 2,000
amino acids, more particularly below 1,500 amino acids, more
particularly below 1,400 amino acids, more particularly below 1,000
amino acids.
[0065] The DNA nucleic acid, to which the polypeptide of the
application binds or specifically binds, is a DNA nucleic acid that
comprises at least one DNA tandem repeat.
[0066] Said DNA nucleic acid can e.g., be a double-stranded DNA
nucleic acid or a strand of a double-stranded DNA nucleic acid,
more particularly a double-stranded DNA nucleic acid, more
particularly a chromosomal double-stranded DNA nucleic acid, more
particularly a double-stranded DNA nucleic acid that is contained
in a chromosome. Said double-stranded DNA nucleic acid can e.g., be
a gene, more particularly a eukaryotic gene, more particularly a
non-mammalian eukaryotic gene (e.g., a yeast gene) or a non-human
mammalian gene (e.g., a rodent gene, a rat gene, a mouse gene, a
pig gene, a rabbit gene) or a human gene. According to an
embodiment of the application, said at least one DNA nucleic acid
is a gene (or a strand of a gene), more particularly a human gene
(or a strand of a human gene). Advantageously, said gene (more
particularly, said human gene) is contained in a chromosome.
[0067] Said at least one DNA tandem repeat can be contained at any
location(s) of said gene, e.g., in a promoter and/or in the 5'UTR
and/or in at least one exon and/or in at least one intron and/or in
the 3'UTR of said gene.
[0068] In a DNA-binding polypeptide of the application, the ordered
series of RVDs formed by the RVDs respectively contained in said
adjacent units of TAL effector tandem repeat, in N- to
C-orientation, is an ordered series of amino acids, which,
according to the acknowledged RVD/nucleotide correspondence,
determines the recognition of the 5'-3' nucleotide sequence of a
DNA target site contained in said DNA nucleic acid.
[0069] An acknowledged RVD/nucleotide correspondence is shown in
Table 5 above.
[0070] According to an advantageous aspect of the application, a
DNA-binding polypeptide of the application does not comprise the
acidic transcriptional Activation Domain (AD) of a TAL effector.
Such a DNA-binding polypeptide of the application does not have the
function of transcriptional activation that a naturally-occurring
TAL effector has, but has retained the DNA-binding function of a
full length TAL effector.
[0071] The sequence of said DNA target site consists of:
i. a fragment of the sequence of said at least one DNA tandem
repeat, or
ii. a fragment of the sequence of said DNA nucleic acid,
which starts outside the sequence of said at least one DNA tandem
repeat and which ends within the sequence of said at least one DNA
tandem repeat, or conversely, which starts within the sequence of
said at least one DNA tandem repeat and which ends outside the
sequence of said at least one DNA tandem repeat.
[0072] For the sake of concision, a DNA target site, the sequence
of which satisfies feature i. above, will hereinafter be referred
to as "non-overlapping DNA target site", and a DNA target site, the
sequence of which satisfies feature ii. above, will hereinafter be
referred to as "overlapping DNA target site".
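The distinction between features i. and ii. can be illustrated by a minimal sketch, given here for illustration only. It assumes that the tandem repeat and the target site are described by 0-based, half-open coordinates on the same strand; the function name and the coordinate convention are hypothetical and are not part of the application.

```python
def classify_target_site(repeat_start, repeat_end, site_start, site_end):
    """Classify a DNA target site relative to a DNA tandem repeat.

    Coordinates are 0-based, half-open intervals [start, end) on the same
    strand (an assumption of this sketch, not a convention of the source).
    """
    # Does the first nucleotide of the site lie within the tandem repeat?
    starts_inside = repeat_start <= site_start < repeat_end
    # Does the last nucleotide of the site lie within the tandem repeat?
    ends_inside = repeat_start < site_end <= repeat_end

    if starts_inside and ends_inside:
        return "non-overlapping"  # feature i.: a fragment of the repeat
    if starts_inside != ends_inside:
        return "overlapping"      # feature ii.: crosses one repeat boundary
    return "neither"              # outside the repeat, or spanning all of it


# Hypothetical repeat occupying positions 10..33 of a nucleic acid.
print(classify_target_site(10, 34, 0, 15))   # site entering the repeat in 3'
print(classify_target_site(10, 34, 12, 27))  # site fully within the repeat
```

A site that spans the whole tandem repeat, or that does not reach into it, satisfies neither feature and is reported as such.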
[0073] An example of non-overlapping DNA target site is the DNA
target site of SEQ ID NO: 10 or 11 (cf. example 1 and FIG. 1B).
[0074] According to an embodiment of the application, the DNA
target site of a DNA-binding polypeptide of the application is a
non-overlapping DNA target site.
[0075] An example of overlapping DNA target site is the DNA target
site of SEQ ID NO: 4 or 5 (cf. example 1 and FIG. 1B).
[0076] For example, said DNA target site is: [0077] either fully
comprised within the DNA tandem repeat sequence: cf. e.g., the DNA
target site .sup.5'G(CTG).sub.4CT.sup.3' (SEQ ID NO: 10) shown
underlined in FIG. 1B (right TALE binding domain), [0078] or, is an
overlapping site consisting of a fragment of the 5' or 3' end of
the DNA tandem repeat sequence (wherein said fragment contains the
first nucleotide at said 5' end or the last nucleotide at said 3'
end respectively) and of a fragment of the DNA sequence that is
immediately adjacent to said 5' or 3' end of DNA tandem repeat
sequence outside said DNA tandem repeat sequence respectively,
i.e., a target site, which, for a portion of it, is the 5' or 3'
end (or extremity) of the DNA tandem repeat sequence and for the
rest of it the DNA sequence that is immediately or directly
adjacent to said 5' or 3' end of DNA tandem repeat sequence
(outside the DNA tandem repeat sequence): cf. e.g., the DNA target
site .sup.5'GTGATCCCCCCAGCA.sup.3' (SEQ ID NO: 4) shown underlined
in FIG. 1B (non-split left TALE binding domain), wherein the last
five nucleotides, i.e., .sup.5'CAGCA.sup.3' (SEQ ID NO: 6), are the
sequence of the 5' end of the DNA tandem repeat and wherein
.sup.5'GTGATCCCCC.sup.3' (SEQ ID NO: 7) is the DNA sequence that is
immediately adjacent to the 5' end of the DNA tandem repeat
sequence outside said DNA tandem repeat sequence (i.e., the gene
sequence that is immediately or directly adjacent to the 5' end of
.sup.5'CAGCA.sup.3').
[0079] According to an embodiment of the application, the DNA
target site of a DNA-binding polypeptide of the application is an
overlapping DNA target site.
[0080] A DNA tandem repeat occurs in a DNA nucleic acid, when a DNA
sequence unit (or pattern) of 2, 3, 4, 5 or more nucleotides is
repeated, i.e., the same DNA sequence unit of 2, 3, 4, 5 or more
nucleotides is identically repeated. Said DNA sequence unit can be
any sequence of at least two nucleotides, more particularly of at
least two different nucleotides.
[0081] When they relate to a DNA nucleic acid, the phrases
"repeat", "tandem repeat", "sequence unit(s)" and "unit(s)" (or
equivalent or similar phrases) are given their respective general
meaning of the field of DNA nucleic acids and DNA tandem repeats.
For example, the nucleic acid
TABLE-US-00002 [SEQ ID NO: 23]
GTGATCCCCCCAGCAGCAGCAGCAGCAGCAGCAG
[0082] contains a DNA tandem repeat consisting of eight copies of
the sequence unit CAG (said eight copies are shown underlined in
SEQ ID NO: 23 above).
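The counting of sequence unit copies in SEQ ID NO: 23 can be reproduced with a minimal sketch, given for illustration only. The function name and the min_copies parameter are hypothetical; the sketch only considers directly adjacent copies of a unit that is supplied by the user.

```python
import re


def find_tandem_repeat(sequence, unit, min_copies=2):
    """Return (start, copies) for the longest run of directly adjacent
    copies of `unit` in `sequence`, or None if no run has `min_copies`."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
    best = None
    for match in pattern.finditer(sequence):
        copies = (match.end() - match.start()) // len(unit)
        if best is None or copies > best[1]:
            best = (match.start(), copies)
    return best


# SEQ ID NO: 23, written as its 10-nucleotide flanking sequence
# followed by eight copies of the sequence unit CAG.
SEQ_ID_NO_23 = "GTGATCCCCC" + "CAG" * 8

print(find_tandem_repeat(SEQ_ID_NO_23, "CAG"))  # (10, 8)
```

The returned start position is 0-based, i.e., the tandem repeat begins immediately after the first ten nucleotides.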
[0083] According to an aspect of the application, said DNA sequence
unit consists of 2, 3, 4 or 5 nucleotides, wherein at least two
nucleotides of said unit are different nucleotides.
[0084] According to an aspect of the application, said DNA sequence
unit consists of 3, 4 or 5 nucleotides, wherein at least two
nucleotides of said unit or (at least) three nucleotides of said
unit are different nucleotides.
[0085] According to an aspect of the application, said DNA sequence
unit is selected from the group consisting of .sup.5'CTG.sup.3',
.sup.5'CAG.sup.3', .sup.5'CAA.sup.3', .sup.5'TTG.sup.3',
.sup.5'GAC.sup.3', .sup.5'GTC.sup.3', .sup.5'CCTG.sup.3',
.sup.5'CAGG.sup.3', .sup.5'ATTCT.sup.3', .sup.5'AGAAT.sup.3',
.sup.5'GCG.sup.3', .sup.5'CGC.sup.3', .sup.5'CGG.sup.3' and
.sup.5'CCG.sup.3'.
[0086] According to an aspect of the application, said DNA sequence
unit consists of 3 or 4 nucleotides, wherein at least two
nucleotides of said unit or (at least) three nucleotides of said
unit are different nucleotides.
[0087] According to an aspect of the application, said DNA sequence
unit is selected from the group consisting of .sup.5'CTG.sup.3',
.sup.5'CAG.sup.3', .sup.5'CAA.sup.3', .sup.5'TTG.sup.3',
.sup.5'GAC.sup.3', .sup.5'GTC.sup.3', .sup.5'CCTG.sup.3',
.sup.5'CAGG.sup.3', .sup.5'GCG.sup.3', .sup.5'CGC.sup.3',
.sup.5'CGG.sup.3' and .sup.5'CCG.sup.3'.
[0088] According to an aspect of the application, said DNA sequence
unit consists of 3 nucleotides, wherein at least two nucleotides of
said unit or the three nucleotides of said unit are different
nucleotides.
[0089] According to an aspect of the application, said DNA sequence
unit is selected from the group consisting of .sup.5'CTG.sup.3',
.sup.5'CAG.sup.3', .sup.5'CAA.sup.3', .sup.5'TTG.sup.3',
.sup.5'GAC.sup.3', .sup.5'GTC.sup.3', .sup.5'GCG.sup.3',
.sup.5'CGC.sup.3', .sup.5'CGG.sup.3' and .sup.5'CCG.sup.3'.
[0090] The number of DNA sequence units that are repeated in said
at least one DNA tandem repeat is of at least 2 units. According to
an aspect of the application, said number is of at least 3, 4, 5,
6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51 or 52 units.
[0091] Within a DNA tandem repeat, the copies of the sequence unit
are adjacent to each other. They can either be spaced apart from
each other by only a few nucleotides, e.g., by less than 11, 10, 9,
8, 7, 6, 5, 4, 3, 2 nucleotides, or can be directly adjacent to
each other. According to an aspect of the application, said copies
of DNA sequence unit are spaced apart from each other by only a few
nucleotides, e.g., by less than 6, 5, 4, 3, 2 nucleotides, or are
directly adjacent to each other. According to an aspect of the
application, said copies of DNA sequence unit are directly adjacent
to each other. For example, in the above-mentioned nucleic acid of
SEQ ID NO: 23, said copies of DNA sequence unit (i.e., the copies
of the sequence unit CAG) are directly adjacent to each other.
[0092] According to an aspect of the application, one DNA sequence
unit (or pattern) does not consist of the same nucleotide. For
example, the sequence unit CAG consists of three different
nucleotides (C, A and G).
[0093] In the application, a DNA tandem repeat is a direct tandem
repeat, i.e., it is not an inverted tandem repeat: the order in
which the nucleotides are contained in one DNA sequence unit is
conserved throughout the DNA tandem repeat.
[0094] When they relate to TAL effector or TALEN, the phrases
"repeat", "tandem repeat", "unit(s) of tandem repeat", "TAL
effector repeat unit(s)" or "repeat unit(s)" (or equivalent or
similar phrases) are given their respective general meaning of the
field of TAL effectors and TALENs. TAL effector repeat units are
the amino acid units that form the tandem repeat of the TAL
effector, i.e., the amino acid units, which determine the
(specific) recognition of the nucleotide sequence of the DNA target
site through the N- to C-ordered series of RVDs they respectively
contain.
[0095] As mentioned above, the units, which are considered (and
computed) as TAL effector repeat units, are those, which determine
the recognition of the DNA target site by direct correspondence of
the N- to C-ordered series of RVDs they form with the 5'-3'
nucleotide sequence of the DNA target site (e.g., in accordance
with Table 5 above). TAL effector repeat units do not include any
TAL effector amino acid unit, which would not be involved in said
(specific) recognition, such as e.g., the unit, which is in C-term
of the central domain of a naturally-occurring TAL effector and,
which is truncated at 20 amino acids.
[0096] The tandem repeat (or central domain) of a TAL effector
consists of adjacent amino acid sequence(s), which are known as the
repeat units of said TAL effector, and which each consist of a
frame sequence in which a RVD is contained. Please see above for
the description of the typical structure of a TAL effector, more
particularly of a TAL effector repeat unit.
[0097] Said repeat units can be directly or non-directly adjacent
to each other; they are more particularly directly adjacent to each
other.
[0098] For example, the polypeptide
TABLE-US-00003 [SEQ ID NO: 24]
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPEQVVAIASNGGG
KQALETVQRLLPVLCQAHGLTPEQVVAIASNIGGKQALETVQRLLPVLC
QAHGLTPEQVVAIASNNGGKQALETVQRLLPVLCQAHGLTPEQVVAIAS
NSGGKQALETVQRLLPVLCQAHGLTPEQVVAIASNGGKQALETVQRLLP
VLCQAHGLTPEQVVAIASHGGGKQALETVQRLLPVLCQAHGLTPEQVVA
IASHGGKQALETVQRLLPVLCQAHG
is a TAL effector tandem repeat consisting of eight (directly
adjacent) copies of the repeat unit
LTPEQVVAIASXXGGKQALETVQRLLPVLCQAHG [SEQ ID NO: 25], wherein XX is
the RVD and wherein the RVD is:
[0099] HD in the first repeat unit,
[0100] NG in the second repeat unit,
[0101] NI in the third repeat unit,
[0102] NN in the fourth repeat unit,
[0103] NS in the fifth repeat unit,
[0104] N* in the sixth repeat unit,
[0105] HG in the seventh repeat unit, and
[0106] H* in the eighth repeat unit,
(the first repeat unit is shown underlined in SEQ ID NO: 24, the
RVDs are shown in bold characters in each of the eight units). In
this example, the frame sequence of the TAL effector tandem repeat
unit is the sequence of SEQ ID NO: 25 (in this example, the frame
sequence is the same for each of the eight repeat units).
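Since the underlining and bold characters of SEQ ID NO: 24 may not be visible in every rendering, the eight RVDs can be recovered from the sequence itself. The following is a minimal sketch, given for illustration only, which assumes that every unit uses the frame sequence of SEQ ID NO: 25; the function name is hypothetical.

```python
import re

# SEQ ID NO: 24, copied line by line from the listing above.
SEQ_ID_NO_24 = (
    "LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPEQVVAIASNGGG"
    "KQALETVQRLLPVLCQAHGLTPEQVVAIASNIGGKQALETVQRLLPVLC"
    "QAHGLTPEQVVAIASNNGGKQALETVQRLLPVLCQAHGLTPEQVVAIAS"
    "NSGGKQALETVQRLLPVLCQAHGLTPEQVVAIASNGGKQALETVQRLLP"
    "VLCQAHGLTPEQVVAIASHGGGKQALETVQRLLPVLCQAHGLTPEQVVA"
    "IASHGGKQALETVQRLLPVLCQAHG"
)

# Frame sequence of SEQ ID NO: 25 with the RVD captured as one or two
# residues; a one-residue capture corresponds to an RVD such as N* or H*.
UNIT = re.compile(r"LTPEQVVAIAS([A-Z]{1,2}?)GGKQALETVQRLLPVLCQAHG")


def extract_rvds(polypeptide):
    """Return the N- to C-ordered series of RVDs found in `polypeptide`."""
    rvds = []
    for match in UNIT.finditer(polypeptide):
        rvd = match.group(1)
        rvds.append(rvd if len(rvd) == 2 else rvd + "*")
    return rvds


print(extract_rvds(SEQ_ID_NO_24))
```

The sketch would need one pattern per frame sequence to handle a tandem repeat formed by different unit frame sequences.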
[0107] In a DNA-binding polypeptide of the application, the number
of amino acids that are contained in one TAL effector tandem repeat
unit can be 33, 34 or 35 (i.e., the same as in a
naturally-occurring TAL effector), or can be lower, e.g., 29, 30,
31 or 32.
[0108] Hence, in a DNA-binding polypeptide of the application, the
number of amino acids that are contained in one TAL effector tandem
repeat unit can be an integer selected from 29-35, or from 30-35,
or from 31-35, or from 32-35, or from 29-34, or from 30-34, or from
31-34, or from 32-34, or from 30-33, or from 31-33,
or from 32-33.
[0109] According to an aspect of the application, the number of
amino acids that are contained in one repeat unit is 33, 34 or 35
(i.e., the same as in a naturally-occurring TAL effector), more
particularly 34.
[0110] The TAL effector repeat units of a DNA-binding polypeptide
of the application can each consist of the same number of amino
acids, or can consist of different numbers of amino acids.
According to an embodiment of the application, the TAL effector
repeat units of a DNA-binding polypeptide of the application each
consist of the same number of amino acids, e.g., 33, 34 or 35 amino
acids, e.g., 34 amino acids.
[0111] The N- to C-ordered series of TAL effector repeat units of a
DNA-binding polypeptide of the application can be followed in
C-term by a truncated unit, which consists of less than 29 amino
acids, more particularly a unit, which is truncated after the RVD
(i.e., after the amino acid at position 13), e.g., which is
truncated immediately after the amino acid at position 28, 27, 26,
25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14 or 13. An example of
such a truncated unit is the unit of SEQ ID NO: 56
(LTPQQVVAIASNGG), which can be viewed as a truncation of the TAL
effector repeat unit of SEQ ID NO: 46
(LTPQQVVAIASXXGGKQALETVQRLLPVLCQAHG) at amino acid position 14 with
XX=NG. Such a truncated unit is not involved in the (specific)
recognition of the nucleotide sequence of the DNA target site and
therefore is not considered as, and not computed as a TAL effector
repeat unit.
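The relationship between SEQ ID NO: 46 and SEQ ID NO: 56 can be checked with a minimal sketch, given for illustration only. The function name is hypothetical, and the sketch assumes a two-residue RVD.

```python
FRAME_SEQ_ID_NO_46 = "LTPQQVVAIASXXGGKQALETVQRLLPVLCQAHG"


def truncated_unit(frame, rvd, last_position):
    """Insert a two-residue RVD into `frame` and keep amino acids
    1..last_position (1-based), i.e., truncate after `last_position`."""
    if len(rvd) != 2:
        raise ValueError("this sketch only handles two-residue RVDs")
    if not 13 <= last_position <= 28:
        raise ValueError("truncation is expected after position 13 to 28")
    full_unit = frame.replace("XX", rvd, 1)
    return full_unit[:last_position]


print(truncated_unit(FRAME_SEQ_ID_NO_46, "NG", 14))  # LTPQQVVAIASNGG
```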
[0112] As mentioned above, the units, which are considered (and
computed) as TAL effector repeat units, are those, which determine
the recognition of the DNA target site by direct correspondence of
the N- to C-ordered series of RVDs they form with the 5'-3'
nucleotide sequence of the DNA target site (e.g., in accordance
with Table 5 above).
[0113] Units, which would not determine said recognition, such as
the above-mentioned truncated unit, are not considered, and are not
computed, as a TAL effector repeat unit.
[0114] Hence, in the application, units, which consist of 29-35
amino acids as described above, can be considered (and computed) as
TAL effector repeat units, whereas the above-mentioned truncated
unit is not considered (and is not computed) as a TAL effector
repeat unit.
[0115] The frame sequence of a TAL effector repeat unit is largely
invariant among the TAL effectors. Examples of (the frame sequence
of a) TAL effector repeat unit comprise: [0116] the sequence of SEQ
ID NO: 25 (LTPEQVVAIASXXGGKQALETVQRLLPVLCQAHG; cf. example 1
below), [0117] the sequence of SEQ ID NO: 46
(LTPQQVVAIASXXGGKQALETVQRLLPVLCQAHG; cf. example 1 below), [0118]
the sequence of SEQ ID NO: 55 (LTPEQVVAIASXXGGKQALETVQALLPVLCQAHG;
cf. example 1 below), [0119] the sequence of SEQ ID NO: 26
(LTPDQVVAIASXXGGKQALETVQRLLPVLCQDHG) wherein XX stands for the RVD.
Please note that, in a RVD sequence "XX", the first X is an amino
acid, whereas the second X is an amino acid or is absent (cf. e.g.,
N* or H* in Table 5 above).
[0120] Other examples of (the frame sequence of a) TAL effector
repeat unit comprise amino acid units, the respective sequences of
which are variant sequences of at least one of the sequences of SEQ
ID NOs: 25, 46, 55, 26. Said variant sequences: [0121] consist of
33, 34 or 35 amino acids, [0122] are at least 50%, more
particularly at least 51%, or at least 52%, or at least 53%, or at
least 54%, or at least 55%, or at least 55.5% identical to at least
one of SEQ ID NOs: 25, 46, 55 and 26 over the whole length of said
at least one SEQ ID sequence, [0123] have retained the "XX" RVD
sequence at positions 12 and 13, and [0124] have retained the
nucleotide recognition capacity of a TAL effector tandem repeat
unit (i.e., which determine the recognition of a nucleotide through
said "XX" RVD).
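The identity criterion can be illustrated by a minimal sketch, given for illustration only. It assumes an ungapped, position-by-position comparison of two sequences of equal length, with the "XX" placeholders counted as identical positions; the present passage does not specify the alignment method, so both choices are assumptions, and the function name is hypothetical.

```python
FRAMES = {
    25: "LTPEQVVAIASXXGGKQALETVQRLLPVLCQAHG",
    46: "LTPQQVVAIASXXGGKQALETVQRLLPVLCQAHG",
    55: "LTPEQVVAIASXXGGKQALETVQALLPVLCQAHG",
    26: "LTPDQVVAIASXXGGKQALETVQRLLPVLCQDHG",
}


def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two equal-length
    sequences (ungapped comparison; an assumption of this sketch)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# SEQ ID NO: 25 and SEQ ID NO: 26 differ at two of their 34 positions.
print(round(percent_identity(FRAMES[25], FRAMES[26]), 1))  # 94.1
```

Under these assumptions, the four frame sequences listed above are far more similar to each other than the 50% minimum stated for the variant sequences.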
[0125] According to an aspect of the application, said XX is
selected from the group consisting of HD, NG, NI, NN, NS, N*, HG,
H*, IG, HA, ND, NK, HI, HN, NA, SN and YG. The symbol * denotes
that the second X is missing (cf. e.g., Table 5 above).
[0126] A TAL effector tandem repeat can be: [0127] formed by the
same unit frame sequence, which is identically repeated (like a
homopolymer wherein only the RVDs vary), such as illustrated above
by the sequence of SEQ ID NO: 24, or can be [0128] formed by
different unit frame sequences, i.e., the TAL effector repeat units
do not all have the same frame sequence (like a heteropolymer,
e.g., as in the TAL effector tandem repeat sequence coded by SEQ ID
NO: 45 or SEQ ID NO: 54).
[0129] The TAL effector repeat units of a DNA-binding polypeptide
of the application can each have different frame sequences.
Nevertheless, a TAL effector tandem repeat of a DNA-binding
polypeptide of the application generally consists of repeat units
wherein at least one frame sequence is identically repeated.
[0130] Although the TAL effector repeat units of a DNA-binding
polypeptide can have different frame sequences, said units are
considered to be "repeat units" in accordance with the acknowledged
terminology in the field of TAL effectors and TALENs. Indeed, the
sequence variation between (the frame sequences of) two different
TAL effector units is low (cf. e.g., the sequence variation between
the 34aa-long sequences of SEQ ID NOs: 25, 26, 46 and 55) and the
function is conserved.
[0131] Hence, in a DNA-binding polypeptide of the application, the
adjacent units of TAL effector tandem repeat may for example
comprise or consist of one or several copy(ies) of at least one
sequence selected from the group consisting of SEQ ID NOs: 25, 26,
46, 55 and said variant sequences thereof, and/or comprise one or
several copy(ies) of at least one of the sequences of TAL effector
tandem repeat units of the DNA-binding polypeptide, which is coded
by the plasmid deposited at the Collection Nationale de Culture de
Microorganismes (C.N.C.M.), Paris, France, under deposit number
I-4804 or under deposit number I-4805.
[0132] The total number of the adjacent units forming the TAL
effector tandem repeat of a DNA-binding polypeptide of the
application can be from 8 to 39, usually from 10 to 33, 13 to 33,
13 to 34, 13 to 35, 14 to 33, 14 to 34 or 14 to 35, for example
from 12 to 27, 13 to 28, from 14 to 28, from 14 to 22, from 15 to
21, e.g., 15, 16, 17, 18, 19, 20 or 21.
[0133] For example, in FIG. 1B, each of the two engineered TAL
effectors binds to a target DNA site consisting of 15 nucleotides
(SEQ ID NO: 4 and 10, respectively); therefore, the number of
repeat units of each of said two engineered TAL effectors is
15.
[0134] Every combination of amino acid length of a TAL effector
tandem repeat unit and of number of TAL effector tandem repeat
units is herein explicitly encompassed, e.g., a number of amino
acids of 29-35 per TAL effector tandem repeat unit and a number of
13-33 TAL effector tandem repeat units per polypeptide, or a number
of amino acids of 33-35 per TAL effector tandem repeat unit and a
number of 12-27 TAL effector tandem repeat units per
polypeptide.
[0135] A DNA-binding polypeptide of the application can be a
non-naturally-occurring polypeptide, e.g., a man-made or artificial
or engineered polypeptide.
[0136] According to an aspect of the application, a DNA-binding
polypeptide of the application does not comprise the acidic
transcriptional activation domain (AD) of a TAL effector. According
to this aspect of the application, a DNA-binding polypeptide of the
application can be viewed as a fragment of TAL effector or of
engineered TAL effector, which still comprises the tandem repeat
and the NLS of said (engineered) TAL effector, and which is
advantageously devoid of the acidic transcriptional activation
domain (AD) of said TAL effector. Examples of such fragments
notably include the BamHI fragment of said TAL effector.
[0137] The total length of the TAL effector tandem repeat of a
polypeptide of the application (i.e., the total length formed by
the adjacent amino acid units forming said TAL effector tandem
repeat) can e.g., be above 50, 60, 70, 100, 150, 200, 250, 300,
350, 400, 450, 500, 550, 600, 650 or 700 amino acids and/or below
2,000, 1,800, 1,600, 1,500, 1,400, 1,200, 1,000, 900, 800 or 750
amino acids.
[0138] Every combination of one of these minimal lengths and of one
of these maximal lengths is herein explicitly encompassed, e.g., a
length above 50 and below 1,400 amino acids, or a length above 60
and below 1,400 amino acids, or above 70 and below 1,400 amino
acids, or above 100 and below 1,400 amino acids, or above 150 and
below 1,400 amino acids, or above 200 and below 1,400 amino acids,
or above 300 and below 1,500 amino acids, or above 400 and below
1,200 amino acids, or above 500 and below 1,200 amino acids, or
above 600 and below 1,200 amino acids, or above 600 and below 1,000
amino acids, or above 650 and below 800 amino acids, or above 700
and below 750 amino acids.
[0139] Hence, a DNA-binding polypeptide of the application can
e.g., comprise a TAL effector tandem repeat, wherein the total
number of adjacent amino acid units forming said TAL effector
tandem repeat is 8 to 39 (more particularly, 10 to 33, 13 to 33, 13
to 34, 13 to 35, 14 to 33, 14 to 34 or 14 to 35, for example from
12 to 27, 13 to 28, from 14 to 28, from 14 to 22, from 15 to 21,
e.g., 15, 16, 17, 18, 19, 20 or 21), wherein each of said adjacent
units of said TAL effector tandem repeat is selected from the group
consisting of the sequences of SEQ ID NOs: 25, 26, 46, 55 and said
variant sequences thereof, and wherein the N- to C-ordered series
of RVDs formed by said adjacent repeat units determines the
recognition of an overlapping DNA target site or of a
non-overlapping DNA target site.
[0140] Said DNA-binding polypeptide may further comprise at least
one NLS, more particularly at least one NLS of TAL effector.
[0141] According to an aspect of the application, the DNA target
site, which is recognized by the ordered series of RVDs of the TAL
effector tandem repeat units of a DNA-binding polypeptide of the
application, consists of 8 to 39 nucleotides, more particularly of
13 to 33, 13 to 34, 13 to 35, 14 to 33, 14 to 34 or 14 to 35
nucleotides, for example of 13 to 28 nucleotides, of 14 to 28
nucleotides, of 14 to 22 nucleotides, of 15 to 21 nucleotides,
e.g., of 15, 16, 17, 18, 19, 20 or 21 nucleotides.
[0142] According to an aspect of the application, said DNA target
site consists of a number of nucleotides, which is identical to the
number of TAL effector tandem repeat units of said DNA-binding
polypeptide.
[0143] According to an aspect of the application, a non-overlapping
DNA target site as defined above is a fragment of said at least one
DNA tandem repeat, and comprises more than one copy of the DNA
sequence unit of said at least one DNA tandem repeat, more
particularly at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13
(adjacent or directly adjacent) copies of the DNA sequence unit of
said at least one DNA tandem repeat, more particularly at least 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13 directly adjacent copies of
the DNA sequence unit of said at least one DNA tandem repeat.
[0144] According to an aspect of the application, the number of
copy(ies) of DNA sequence unit in said fragment of said at least
one DNA tandem repeat is an integer.
[0145] According to an alternative or complementary aspect of the
application, said copy number is not an integer, i.e., it is a
number with decimals (more particularly a number with two
decimals). For example, if the DNA sequence unit consists of 3
nucleotides, and if the fragment of the DNA tandem repeat that is
contained in the non-overlapping DNA target site consists of five
nucleotides, i.e., if it consists of one unit copy (3 nucleotides)
and (directly adjacent thereto) two thirds of another unit copy (2
nucleotides), the copy number is 3/3+2/3=1.67, i.e., the copy
number is not an integer.
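The arithmetic of this example can be written as a minimal sketch, given for illustration only. The function name is hypothetical; rounding to two decimals follows the "number with two decimals" wording above.

```python
def copy_number(fragment_length, unit_length):
    """Copy number of a DNA sequence unit in a tandem repeat fragment,
    rounded to two decimals."""
    return round(fragment_length / unit_length, 2)


print(copy_number(5, 3))  # 1.67: one full unit plus two thirds of a unit
print(copy_number(6, 3))  # 2.0: an integer copy number
```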
[0146] When it relates to a non-overlapping DNA target site, the
expression "more than one copy" encompasses a copy number, which is
or is not an integer, more particularly a copy number, which is more
than one and less than two, such as a copy number of 1.67, as well
as a copy number of two and above.
[0147] A non-overlapping DNA target site can e.g., be a fragment of
said at least one DNA tandem repeat, which comprises or consists of
more than one copy of the DNA sequence unit, e.g., which comprises
or consists of: [0148] one copy, or several directly adjacent
copies, of the DNA sequence unit, and, directly adjacent thereto
(in 5' and/or in 3'), [0149] no fragment, one fragment of said DNA
sequence unit (in 5' or in 3'), or two fragments of said DNA
sequence unit (one fragment in 5' and one fragment in 3').
[0150] For example, if the DNA sequence unit of the DNA tandem
repeat is .sup.5'CTG.sup.3', the sequence of the non-overlapping
DNA target site can e.g., be .sup.5'G(CTG).sub.4CT.sup.3' (SEQ ID
NO: 10), i.e., a fragment of the DNA tandem repeat, which consists
of four .sup.5'CTG.sup.3' units ((CTG).sub.4) and, directly
adjacent thereto, two fragments of DNA sequence unit (fragment G in
5' and fragment CT in 3').
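The decomposition described in this example can be reproduced with a minimal sketch, given for illustration only. The function name is hypothetical; the sketch assumes that the copies of the sequence unit are directly adjacent and that the 5' fragment is an end of the unit and the 3' fragment a beginning of the unit.

```python
def decompose_repeat_fragment(target, unit):
    """Split `target` into (5' fragment, number of full unit copies,
    3' fragment) relative to `unit`; return None if `target` is not a
    fragment of a tandem repeat of `unit`."""
    k = len(unit)
    for lead in range(k):  # candidate length of the 5' partial unit
        five_prime = target[:lead]
        if len(five_prime) != lead or not unit.endswith(five_prime):
            continue
        rest = target[lead:]
        copies = len(rest) // k
        three_prime = rest[copies * k:]
        if rest[:copies * k] == unit * copies and unit.startswith(three_prime):
            return five_prime, copies, three_prime
    return None


SEQ_ID_NO_10 = "G" + "CTG" * 4 + "CT"
print(decompose_repeat_fragment(SEQ_ID_NO_10, "CTG"))  # ('G', 4, 'CT')
```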
[0151] An example of non-overlapping DNA target site is the DNA
target site of SEQ ID NO: 10 or of SEQ ID NO: 11 (cf. FIG. 1B).
Hence, a DNA-binding polypeptide of the application can e.g.,
comprise a TAL effector tandem repeat as defined above, wherein
said units are selected from the group consisting of the sequences
of SEQ ID NOs: 25, 26, 46, 55 and said variant sequences thereof,
and wherein the N- to C-ordered series of RVDs formed by the RVDs
respectively contained in said units determines the recognition of
a non-overlapping DNA target site as defined above, e.g., the DNA
target site of SEQ ID NO: 10 or of SEQ ID NO: 11. An example of N-
to C-ordered series of RVDs, which determines the recognition of
the DNA target site of SEQ ID NO: 10 [.sup.5'G(CTG).sub.4CT.sup.3'],
is: NN; HD; NG; NN; HD; NG; NN; HD; NG; NN; HD; NG; NN; HD and NG
(cf. Table 5 above).
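The derivation of this series of RVDs from the 5'-3' sequence of the DNA target site can be written as a minimal sketch, given for illustration only. Table 5 is not reproduced in this passage: the dictionary below uses the commonly reported RVD/nucleotide correspondence, which agrees with the series given above for G, C and T; the entry for A (NI) is an assumption of this sketch, and the names are hypothetical.

```python
# One RVD per nucleotide (assumed simplification: Table 5 may list
# several RVDs per nucleotide; only one is used here).
RVD_FOR_NUCLEOTIDE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def rvd_series(target_site):
    """Return the N- to C-ordered series of RVDs for the 5'-3'
    nucleotide sequence of `target_site`, one RVD per nucleotide."""
    return [RVD_FOR_NUCLEOTIDE[nucleotide] for nucleotide in target_site.upper()]


SEQ_ID_NO_10 = "G" + "CTG" * 4 + "CT"
print("; ".join(rvd_series(SEQ_ID_NO_10)))
```

The series contains as many RVDs as the DNA target site contains nucleotides, here 15.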
[0152] An example of TAL effector tandem repeat, which can be
comprised in a DNA-binding polypeptide of the application and, which
(specifically) binds to a non-overlapping DNA target site, is the
polypeptide coded by the sequence of SEQ ID NO: 45 (cf. example 1
below), which (specifically) binds to the non-overlapping DNA
target site of SEQ ID NO: 10.
[0153] An example of TAL effector tandem repeat, which can be
comprised in a DNA-binding polypeptide of the application and, which
(specifically) binds to a non-overlapping DNA target site, is the
TAL effector tandem repeat coded by plasmid pCLS9996exp (C.N.C.M.
deposit number I-4804), which (specifically) binds to the
non-overlapping DNA target site of SEQ ID NO: 10.
[0154] The sequence of an overlapping DNA target site as defined
above is the sequence of a fragment of said DNA nucleic acid,
which, for a portion of it, is within said at least one DNA tandem
repeat [the "inside" portion], and which, for the remaining portion
of it, is outside said at least one DNA tandem repeat [the
"outside" portion].
[0155] For example, if the sequence of the overlapping DNA target
site is .sup.5'GTGATCCCCCCAGCA.sup.3' (SEQ ID NO: 4) within a DNA
nucleic acid comprising the (CAG).sub.n tandem repeat (cf. FIG.
1B), the portion of the DNA target site, which is within the DNA
tandem repeat [the "inside" portion], consists of
.sup.5'CAGCA.sup.3' (SEQ ID NO: 6), and the remaining portion,
which is outside said DNA tandem repeat [the "outside" portion], is
.sup.5'GTGATCCCCC.sup.3' (SEQ ID NO: 7), i.e., 10 nucleotides.
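The "inside" and "outside" portions of this example can be computed with a minimal sketch, given for illustration only. It uses the nucleic acid of SEQ ID NO: 23 as context and assumes that the position of the tandem repeat is known (0-based, half-open coordinates); the function name and the coordinate convention are hypothetical.

```python
def split_overlapping_site(nucleic_acid, repeat_start, repeat_end, site):
    """Return (outside portion, inside portion) of `site`, located at its
    first occurrence in `nucleic_acid`, relative to the tandem repeat
    occupying [repeat_start, repeat_end)."""
    site_start = nucleic_acid.find(site)
    if site_start < 0:
        raise ValueError("target site not found in the nucleic acid")
    site_end = site_start + len(site)
    inside_start = max(site_start, repeat_start)
    inside_end = min(site_end, repeat_end)
    if inside_start >= inside_end:
        raise ValueError("target site does not reach into the tandem repeat")
    inside = nucleic_acid[inside_start:inside_end]
    outside = nucleic_acid[site_start:inside_start] + nucleic_acid[inside_end:site_end]
    return outside, inside


SEQ_ID_NO_23 = "GTGATCCCCC" + "CAG" * 8  # tandem repeat at [10, 34)
SEQ_ID_NO_4 = "GTGATCCCCC" + "CAGCA"

print(split_overlapping_site(SEQ_ID_NO_23, 10, 34, SEQ_ID_NO_4))
```

For SEQ ID NO: 4, the sketch returns the sequences of SEQ ID NO: 7 (outside) and SEQ ID NO: 6 (inside).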
[0156] The portion of an overlapping DNA target site, which is
within said at least one DNA tandem repeat [the "inside" portion],
is a fragment of said at least one DNA tandem repeat, and consists
of at least a fragment of a copy of the DNA sequence unit of said
at least one DNA tandem repeat, more particularly of one, at least
one or more than one copy of the DNA sequence unit of said at least
one DNA tandem repeat, more particularly of two, at least two,
three, at least three, four or at least four (adjacent or directly
adjacent) copies of the DNA sequence unit of said at least one DNA
tandem repeat.
[0157] According to an aspect of the application, said copy number
is an integer (e.g., in the DNA tandem repeat fragment
.sup.5'CAGCAG.sup.3' (SEQ ID NO: 35), the copy number is two (two
.sup.5'CAG.sup.3' units)).
[0158] According to an alternative or complementary aspect of the
application, said copy number is not an integer, i.e., it is a
number with decimals (more particularly a number with two
decimals). For example, if the sequence of the overlapping DNA
target site is .sup.5'GTGATCCCCCCAGCA.sup.3' (SEQ ID NO: 4) within
a DNA nucleic acid comprising the (CAG).sub.n tandem repeat (cf.
FIG. 1B), the portion of the DNA target site, which is within the
DNA tandem repeat [the "inside" portion], consists of
.sup.5'CAGCA.sup.3' (SEQ ID NO: 6), i.e., consists of one DNA
sequence unit (unit CAG) and (directly adjacent thereto) a fragment
of the DNA sequence unit (CA), i.e., the "inside" portion consists
of one unit copy (unit CAG) and, directly adjacent thereto, two
thirds of another unit copy (CA), so that the copy number is 1+2/3=1.67,
i.e., the copy number is not an integer.
[0159] When it relates to the portion of an overlapping DNA target
site, which is within the DNA tandem repeat, the expression "more
than one copy" encompasses a copy number, which is or is not an
integer, more particularly a copy number, which is more than one
and less than two, such as a copy number of 1.67, as well as a copy
number of two and above.
[0160] More particularly, the portion of an overlapping DNA target
site, which is within said at least one DNA tandem repeat [the
"inside" portion], is a fragment of said at least one DNA tandem
repeat, which comprises or consists of more than one copy of the
DNA sequence unit, e.g., which comprises or consists of: [0161] one
copy, or several directly adjacent copies, of the DNA sequence
unit, and, directly adjacent thereto (in 5' and/or in 3'), [0162]
no fragment, one fragment of said DNA sequence unit (in 5' or in 3'),
or two fragments of said DNA sequence unit (one fragment in 5'
and one fragment in 3').
[0163] For example, if the DNA sequence unit of the DNA tandem
repeat is .sup.5'CTG.sup.3', the sequence of the portion of the
overlapping DNA target site, which is within said at least one DNA
tandem repeat [the "inside" portion], can e.g., be
.sup.5'G(CTG).sub.4CT.sup.3' (SEQ ID NO: 10), i.e., a fragment of
the DNA tandem repeat, which consists of four .sup.5'CTG.sup.3'
units ((CTG).sub.4) and, directly adjacent thereto, two fragments
of DNA sequence unit (fragment G in 5' and fragment CT in 3').
[0164] The portion of an overlapping DNA target site, which is
outside said DNA tandem repeat [the "outside" portion], consists of
at least 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotide(s), for example
of at least 5 nucleotides or of more than 5 nucleotides, more
particularly of at least 6, 7, 8, 9 or 10 nucleotides.
[0165] Any combination of: [0166] said number of nucleotides of the
"outside" (or first) portion and of [0167] said copy number of DNA
repeat unit(s) of the "inside" (or second) portion is herein
explicitly encompassed, e.g., an overlapping DNA target site,
comprising at least one or more than one copy of the DNA repeat
unit of said at least one DNA tandem repeat (cf. above) and
comprising an "outside" portion of at least 5, 6 or 7
nucleotides.
[0168] Alternatively or complementarily, the sequence of an
overlapping DNA target site as defined above can be viewed as the
sequence of a fragment of said DNA nucleic acid, which comprises or
consists of: [0169] a. a sequence comprising, or consisting of, at
least one, or more than one, copy of the DNA sequence unit (cf.
above), and, directly adjacent thereto, in 5' or in 3', [0170] b. a
sequence, which is of at least 5 nucleotides or of more than five
nucleotides, and which differs from the sequence of a.
[0171] Alternatively or complementarily, the sequence of an
overlapping DNA target site can be viewed as the sequence of a
fragment of said DNA nucleic acid, which comprises, but does not
consist of, a fragment of said at least one DNA tandem repeat,
wherein the copy number of said DNA sequence unit in said fragment
of said at least one DNA tandem repeat is at least one or more than
one, more particularly more than one.
[0172] More particularly, said fragment of said DNA nucleic acid
further comprises another sequence, which is of at least five or of
more than five nucleotides, and which is directly adjacent in 5' or
in 3' to said fragment of said at least one DNA tandem repeat. More
particularly, the end of said sequence of at least five or of more
than five nucleotides, which is directly linked to said fragment of
said at least one DNA tandem repeat, is a sequence (e.g., of the
same length as said DNA sequence unit, but) which differs from the
sequence of said DNA sequence unit.
[0173] Alternatively or complementarily, the sequence of an
overlapping DNA target site can be viewed as the sequence of a
fragment of said DNA nucleic acid, which consists of: [0174] a. a
nucleotide sequence, which is a fragment of said at least one DNA
tandem repeat, wherein the copy number of said DNA sequence unit in
said fragment of said at least one DNA tandem repeat is more than
one (said more than one copy being adjacent or directly adjacent to
each other, more particularly directly adjacent to each other), and
[0175] b. a nucleotide sequence of at least five or of more than
five nucleotides, which differs from said sequence of a., and which
is directly linked, in 5' or in 3', to said nucleotide sequence of
a., [0176] wherein the end of said nucleotide sequence of b., which
is directly linked to said nucleotide sequence of a., is a sequence
(e.g., of the same length as said DNA sequence unit, but) which
differs from the sequence of said DNA sequence unit.
[0177] Alternatively or complementarily, an overlapping DNA target
site can be viewed as a fragment of said DNA nucleic acid, which
consists of a portion, which is outside the DNA tandem repeat [the
"outside" portion] and of a portion, which is inside the DNA tandem
repeat [the "inside" portion], wherein the nucleotide length of
said "outside" portion is more than 20%, more particularly more
than 30%, more particularly more than 40%, more particularly more
than 45%, more particularly more than 50%, more particularly more
than 55%, more particularly more than 60% (but less than 100%) of
the total length of said (full-length) DNA target site.
[0178] For example, if the sequence of the overlapping DNA target
site is .sup.5'GTGATCCCCCCAGCA.sup.3' (SEQ ID NO: 4) within a DNA
nucleic acid comprising the (CAG).sub.n tandem repeat (cf. FIG.
1B), the portion of the DNA target site, which is outside the DNA
tandem repeat, consists of .sup.5'GTGATCCCCC.sup.3' (SEQ ID NO: 7),
i.e., consists of 10 nucleotides, whereas the DNA target site
consists of 15 nucleotides; hence, the portion of the overlapping
DNA target site, which is outside the DNA tandem repeat, consists
of a number of nucleotides, which is (10/15.times.100=) 66.7%,
i.e., of more than 60% (but less than 100%) of the total number of
nucleotides of the DNA target site.
[0179] Advantageously, an overlapping DNA target site is a fragment
of said DNA nucleic acid, which consists of a portion, which is
outside the DNA tandem repeat [the "outside" portion] and of a
portion, which is inside the DNA tandem repeat [the "inside"
portion], wherein the nucleotide length of said "outside" portion
is more than 40%, more particularly more than 45%, more
particularly more than 50%, more particularly more than 55%, more
particularly more than 60% (but less than 100%) of the total length
of said (full-length) DNA target site.
[0180] Alternatively or complementarily, an overlapping DNA target
site is a fragment of said DNA nucleic acid, which consists of a
portion, which is outside the DNA tandem repeat [the "outside"
portion] and of a portion, which is inside the DNA tandem repeat
[the "inside" portion], wherein the nucleotide length of said
"inside" portion is less than 80%, more particularly less than 70%,
more particularly less than 60%, more particularly less than 55%,
more particularly less than 50%, more particularly less than 45%,
more particularly less than 40% (but more than 0% or more than 1%)
of the total length of said (full-length) DNA target site. For
example, if the sequence of the overlapping DNA target site is
.sup.5'GTGATCCCCCCAGCA.sup.3' (SEQ ID NO: 4) within a DNA nucleic
acid comprising the (CAG).sub.n tandem repeat (cf. FIG. 1B), the
portion of the DNA target site, which is inside the DNA tandem
repeat, consists of .sup.5'CAGCA.sup.3' (SEQ ID NO: 6), i.e., of 5
nucleotides, whereas the DNA target site consists of 15
nucleotides; hence, the portion of the overlapping DNA target site,
which is inside the DNA tandem repeat, consists of a number of
nucleotides, which is (5/15.times.100=) 33.3%, i.e., of less than
40% (but more than 0% or more than 1%) of the total number of
nucleotides of the DNA target site.
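The two percentages computed above for SEQ ID NO: 4 (10/15 for the "outside" portion, 5/15 for the "inside" portion) can be reproduced with a minimal sketch. The function name portion_percentages is hypothetical; the two sequences are those given above for SEQ ID NO: 7 and SEQ ID NO: 6.

```python
from typing import Tuple

def portion_percentages(outside: str, inside: str) -> Tuple[float, float]:
    """Share (in %) of the full-length target site taken by each portion."""
    total = len(outside) + len(inside)
    return 100 * len(outside) / total, 100 * len(inside) / total

outside, inside = "GTGATCCCCC", "CAGCA"              # SEQ ID NO: 7 and SEQ ID NO: 6
assert outside + inside == "GTGATCCCCCCAGCA"         # SEQ ID NO: 4
out_pct, in_pct = portion_percentages(outside, inside)
print(f"{out_pct:.1f}% outside, {in_pct:.1f}% inside")   # 66.7% outside, 33.3% inside
```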
[0181] Advantageously, an overlapping DNA target site is a fragment
of said DNA nucleic acid, which consists of a portion, which is
outside the DNA tandem repeat [the "outside" portion] and of a
portion, which is inside the DNA tandem repeat [the "inside"
portion], wherein the nucleotide length of said "inside" portion is
less than 60%, more particularly less than 55%, more particularly
less than 50%, more particularly less than 45%, more particularly
less than 40% (but more than 0% or more than 1%) of the total
length of said (full-length) DNA target site.
[0182] Alternatively or complementarily, an overlapping DNA target
site is a fragment of said DNA nucleic acid, which comprises, but
does not consist of, a fragment of said at least one DNA tandem
repeat, wherein the nucleotide length of said at least one DNA
tandem repeat is more than 10% and less than 80%, more particularly
more than 15% and less than 70%, more particularly more than 20%
and less than 60%, more particularly more than 20% and less than
50%, more particularly more than 20% and less than 40%, of the
total nucleotide length of said DNA target site.
[0183] An example of overlapping DNA target site is the DNA target
site of SEQ ID NO: 4 or of SEQ ID NO: 5 (cf. FIG. 1B).
[0184] Hence, a DNA-binding polypeptide of the application can
e.g., comprise a TAL effector tandem repeat as defined above,
wherein said adjacent units are selected from the group consisting
of the sequences of SEQ ID NOs: 25, 26, 46, 55 and said variant
sequences thereof, and wherein the N- to C-ordered series of RVDs
formed by said adjacent units determine the recognition of the
(overlapping) DNA target site of SEQ ID NO: 4 or of SEQ ID NO:
5.
[0185] An example of N- to C-ordered series of RVDs, which
determine the recognition of the DNA target site of SEQ ID NO: 4
[.sup.5'GTGATCCCCCCAGCA.sup.3'], is NN; NG; NN; NI; NG; HD; HD; HD;
HD; HD; HD; NI; NN; HD and NI (cf. Table 5 above).
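The correspondence between the target site and the RVD series can be sketched as a per-nucleotide lookup. The mapping used below (A to NI, C to HD, G to NN, T to NG) is an assumption read off the example of SEQ ID NO: 4 given above; it is not a reproduction of Table 5, and other RVDs may recognize the same nucleotides. The names RVD_FOR_BASE and rvd_series are hypothetical.

```python
# Assumed one-RVD-per-nucleotide mapping, inferred from the SEQ ID NO: 4 example.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}

def rvd_series(target_site: str) -> list:
    """N- to C-ordered RVDs for a 5'->3' target site (hypothetical helper)."""
    return [RVD_FOR_BASE[base] for base in target_site.upper()]

series = rvd_series("GTGATCCCCCCAGCA")   # SEQ ID NO: 4
print("; ".join(series))
# NN; NG; NN; NI; NG; HD; HD; HD; HD; HD; HD; NI; NN; HD; NI
```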
[0186] An example of TAL effector tandem repeat, which can be
comprised in a DNA-binding polypeptide of the application, is the
polypeptide coded by the sequence of SEQ ID NO: 54 (cf. example 1
below), which (specifically) binds to the overlapping DNA target
site of SEQ ID NO: 4.
[0187] An example of TAL effector tandem repeat, which can be
comprised in a DNA-binding polypeptide of the application, is the TAL
effector tandem repeat coded by plasmid pCLS16715 (C.N.C.M. deposit
number I-4805), which (specifically) binds to the overlapping DNA
target site of SEQ ID NO: 4.
[0188] According to an aspect of the application, the sequence of
the DNA target site that is recognized by the ordered series of
RVDs of a DNA-binding polypeptide of the application is immediately
preceded in 5' by the nucleotide T.
[0189] Indeed, it has been observed that the presence of the
nucleotide T directly adjacent to the 5' end (or extremity) of the
DNA target site might be advantageous to adequately or efficiently
bind to a naturally-occurring DNA target.
[0190] For example, the DNA target site of SEQ ID NO: 10 (cf. right
TALE binding domain of FIG. 1B) is immediately preceded in 5' by
the nucleotide T. The DNA target site of SEQ ID NO: 4 (cf. the
non-split left TALE binding domain of FIG. 1B) also is immediately
preceded in 5' by the nucleotide T.
[0191] Said preceding T is not part of the DNA target site (the
RVDs of the TAL effector do not determine the recognition of said
T), but is believed to improve the stability of the binding.
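Checking whether a candidate target site is immediately preceded in 5' by a T can be sketched as below. The function name preceded_by_t and the flanking nucleotides of the toy strands are hypothetical and are not taken from any real gene sequence; only the target site itself (SEQ ID NO: 4) comes from the text.

```python
def preceded_by_t(strand: str, site: str) -> bool:
    """True if the first occurrence of `site` in `strand` (5'->3') directly follows a T.

    Hypothetical helper: the T itself is not part of the target site.
    """
    start = strand.find(site)
    return start > 0 and strand[start - 1] == "T"

site = "GTGATCCCCCCAGCA"                      # SEQ ID NO: 4
toy_strand = "AC" + "T" + site + "GCAG"       # hypothetical flanks, for illustration only
print(preceded_by_t(toy_strand, site))        # True
print(preceded_by_t("ACG" + site, site))      # False
```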
[0192] The (at least one) DNA nucleic acid, to which the
DNA-binding polypeptide of the application binds, or specifically
binds, can e.g., be a double-stranded DNA nucleic acid or a strand
of a double-stranded DNA nucleic acid.
[0193] Said DNA strand can be isolated from the other strand of the
double-stranded DNA nucleic acid, thereby forming a single-stranded
molecule, or can still be contained in said double-stranded DNA
nucleic acid molecule (i.e., be still in duplex with its
complementary strand). Advantageously, said DNA strand is still
contained in said double-stranded DNA nucleic acid molecule.
[0194] Hence, the DNA nucleic acid, to which the DNA-binding
polypeptide of the application binds, or specifically binds,
advantageously is a double-stranded DNA nucleic acid. When said DNA
nucleic acid is a double-stranded DNA nucleic acid, the DNA-binding
polypeptide of the application binds to one of the two strands of
the double-stranded DNA nucleic acid.
[0195] Said double-stranded DNA nucleic acid can e.g., be a
chromosomal DNA nucleic acid, more particularly a chromosomal
double-stranded DNA nucleic acid, more particularly a
double-stranded DNA nucleic acid that is contained in a
chromosome.
[0196] Said double-stranded DNA nucleic acid can e.g., be a gene,
more particularly a eukaryotic gene, more particularly a
non-mammalian eukaryotic gene (e.g., a yeast gene) or a non-human
mammalian gene (e.g., a rodent gene, a rat gene, a mouse gene, a
pig gene, a rabbit gene) or a human gene. According to an
embodiment of the application, said at least one DNA nucleic acid
is a gene, more particularly a human gene. Advantageously, said
gene (more particularly, said human gene) is a chromosomal gene,
more particularly a gene that is contained in a chromosome, more
particularly a gene that is contained in a human chromosome.
[0197] According to an aspect of the application, the DNA nucleic
acid, to which the DNA-binding polypeptide of the application
binds, or specifically binds, is a double-stranded DNA nucleic acid,
wherein at least one of its two strands contains nucleotide(s) T in
the sequence of the DNA tandem repeat (i.e., in at least one of
said two DNA strands, the unit of the DNA tandem repeat contains at
least one nucleotide T). According to this aspect of the
application, the DNA sequence unit that is repeated in the sequence
of said T-containing DNA tandem repeat can e.g., be selected from
the group consisting of .sup.5'CTG.sup.3', .sup.5'TTG.sup.3',
.sup.5'GTC.sup.3', .sup.5'CCTG.sup.3', .sup.5'ATTCT.sup.3' and
.sup.5'AGAAT.sup.3'.
[0198] According to an aspect of the application, the DNA nucleic
acid, to which the DNA-binding polypeptide of the application
binds, or specifically binds, is a double-stranded DNA nucleic acid,
wherein only one of its two strands contains the nucleotide T in
the sequence of said at least one DNA tandem repeat (i.e., in one
of said two DNA strands, the unit of the DNA tandem repeat contains
at least one nucleotide T, whereas in the other of said two DNA
strands, the unit of the DNA tandem repeat does not contain any
nucleotide T). According to this aspect of the application, the DNA
sequence unit that is repeated in the sequence of said T-containing
DNA tandem repeat can e.g., be selected from the group consisting
of .sup.5'CTG.sup.3', .sup.5'TTG.sup.3', .sup.5'GTC.sup.3' and
.sup.5'CCTG.sup.3'. The DNA-binding polypeptide of the application
may bind to the strand that contains said nucleotide T, or may bind
to the other strand. Please see FIG. 1B, which illustrates a human
gene, wherein only one of its two strands contains the nucleotide T
in the sequence of the DNA tandem repeat (i.e., the human DMPK gene,
which is involved in DM1 and which comprises the (CAG)n tandem
repeat in one strand and the (CTG)n tandem repeat in the
complementary strand):
as described in example 1 below, a first DNA-binding polypeptide of
the application (i.e., the left-hand TALEN of example 1 below)
binds to the strand containing the (CAG)n repeat at an overlapping
DNA binding site (e.g., SEQ ID NO: 4), whereas a second DNA-binding
polypeptide of the application, which is different from said first
DNA-binding polypeptide of the application, (i.e., the right-hand
TALEN of example 1 below) binds to the (complementary) strand
containing the (CTG)n repeat at a non-overlapping DNA binding site
(e.g., SEQ ID NO: 10).
[0199] According to an aspect of the application, the DNA nucleic
acid, to which the DNA-binding polypeptide of the application
binds, or specifically binds, is a double-stranded DNA nucleic acid,
wherein each of its two strands contains the nucleotide T in the
sequence of the DNA tandem repeat (i.e., in one of said two DNA
strands, the unit of the DNA tandem repeat contains at least one
nucleotide T, and in the other of said two DNA strands, the unit of
the DNA tandem repeat also contains at least one nucleotide T).
According to this aspect of the application, the DNA sequence unit
that is repeated in the sequence of said (T-containing) DNA tandem
repeat can e.g., be selected from the group consisting of
.sup.5'ATTCT.sup.3' and .sup.5'AGAAT.sup.3'. According to an
advantageous aspect of the application, the sequence of said at
least one DNA tandem repeat can form a non-linear secondary
structure, such as a hairpin, a triple helix or a tetraplex
secondary structure.
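The distinction drawn in the preceding paragraphs (nucleotide T present in the repeat unit of only one strand, or of both strands) can be checked by comparing a repeat unit with its reverse complement. This is a minimal sketch; the names reverse_complement and strand_t_content are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def strand_t_content(unit: str):
    """Return the complementary unit and how many strands carry a T in the unit."""
    other = reverse_complement(unit)
    count = ("T" in unit) + ("T" in other)
    label = {0: "neither strand", 1: "only one strand", 2: "both strands"}[count]
    return other, label

for unit in ("CTG", "TTG", "GTC", "CCTG", "ATTCT"):
    print(unit, *strand_t_content(unit))
# CTG CAG only one strand
# TTG CAA only one strand
# GTC GAC only one strand
# CCTG CAGG only one strand
# ATTCT AGAAT both strands
```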
[0200] According to an advantageous aspect of the application, said
DNA nucleic acid can be any DNA nucleic acid, more particularly any
double-stranded DNA nucleic acid (more particularly any human
double-stranded DNA nucleic acid), more particularly any gene (more
particularly any human gene), which is involved in a neurological
and/or muscular and/or skeletal disorder or disease and/or in a
disorder or disease involving at least one (abnormally-expanded)
DNA tandem repeat, more particularly in a neurological and/or
muscular and/or skeletal disorder or disease involving at least one
(abnormally-expanded) DNA tandem repeat.
[0201] Examples of such disorders and diseases, as well as of the
genes that are respectively involved in said disorders and diseases
are given in Tables 6, 7 and 8 below. Table 8 below shows examples
of the average number of DNA tandem repeat units that is observed
in a healthy subject (normal average range of repeat units).
TABLE 6
DISEASE or DISORDER
Acronym or abbreviation | Name | Phenotype MIM number (*)
DM1 | Myotonic dystrophy type 1 | 160900
SCA8 | Spinocerebellar ataxia 8 | 608768
SCA12 | Spinocerebellar ataxia 12 | 604326
HDL2 | Huntington's disease-like 2 | 606438
SBMA | Spinal and bulbar muscular atrophy (or Kennedy disease) | 313200
HD | Huntington's disease | 143100
DRPLA | Dentatorubral-pallidoluysian atrophy | 125370
SCA1 | Spinocerebellar ataxia 1 | 164400
SCA2 | Spinocerebellar ataxia 2 | 183090
SCA3 | Spinocerebellar ataxia 3 (Machado-Joseph disease) | 109150
SCA6 | Spinocerebellar ataxia 6 | 183086
SCA7 | Spinocerebellar ataxia 7 | 164500
SCA17 | Spinocerebellar ataxia 17 | 607136
PSACH | Pseudoachondroplasia | 177170
DM2 | Myotonic dystrophy 2 | 602668
SCA10 | Spinocerebellar ataxia 10 | 603516
SPD1 | Synpolydactyly | 186000
OPMD | Oculopharyngeal muscular dystrophy | 164300
CCD | Cleidocranial dysplasia | 119600
HPE5 | Holoprosencephaly 5 | 609637
HFG syndrome | Hand-Foot-Genital syndrome | 140000
BPES | Blepharophimosis, epicanthus inversus, and ptosis | 110100
EIEE1 | Epileptic encephalopathy, early infantile, 1 | 308350
FRAXA | Fragile X syndrome | 300624
FXTAS | X tremor/ataxia syndrome | 300623
FRAXE | Mental retardation, X-linked, associated with fragile site FRAXE | 309548
TABLE 7
DISEASE or DISORDER (acronym or abbreviation) | GENE: Name of the protein encoded by the gene | MIM number (*) | Gene ID of the sequence (.sctn.)
DM1 | DMPK (dystrophia myotonica protein kinase) | 605377 | 1762
SCA8 | ATXN8; protein-coding strand | 613289 | 724066
SCA8 | ATXN8; non-protein coding strand (ATXN8OS) | 603680 | 6315
SCA12 | PPP2R2B (regulatory subunit B of protein phosphatase 2) | 604325 | 5521
HDL2 | JPH3 (junctophilin-3) | 605268 | 57338
SBMA | AR (androgen receptor) | 313700 | 367
HD | HTT (huntingtin) | 613004 | 3064
DRPLA | ATN1 (atrophin 1) | 607462 | 1822
SCA1 | ATXN1 (ataxin-1) | 601556 | 6310
SCA2 | ATXN2 (ataxin-2) | 601517 | 6311
SCA3 | ATXN3 (ataxin-3) | 607047 | 4287
SCA6 | CACNA1A (calcium channel, voltage-dependent, P/Q type, alpha 1A subunit) | 601011 | 773
SCA7 | ATXN7 (ataxin-7) | 607640 | 6314
SCA17 | TBP (TATA box-binding protein) | 600075 | 6908
PSACH | COMP | 600310 | 1311
DM2 | ZNF9 (zinc-finger protein) | 116955 | 7555
SCA10 | ATXN10 (ataxin-10) | 611150 | 25814
SPD1 | HOXD13 (homeobox D13) | 142989 | 3239
OPMD | PABN1 (poly(A)-binding protein-2) | 602279 | 8106
CCD | RUNX2 (runt-related transcription factor 2) | 600211 | 860
HPE5 | ZIC2 (zinc-finger protein of cerebellum 2) | 603073 | 7546
HFG syndrome | HOXA13 (homeobox A13) | 142959 | 3209
BPES | FOXL2 | 605597 | 668
EIEE1 | ARX (homeobox gene) | 300382 | 170302
FRAXA | FMR1 (fragile X mental retardation 1) | 309550 | 2332
FXTAS | FMR1 (fragile X mental retardation 1) | 309550 | 2332
FRAXE | AFF2 | 300806 | 309548
TABLE 8
DISEASE or DISORDER | REPEAT UNIT 5'-3' | Complementary strand 5'-3' | Coding = C [encoded amino acids]; Non-Coding = NC | Normal average range of repeat units
DM1 | (CTG)n | (CAG)n | NC | 5-37
SCA8 (non-protein coding strand, ATXN8OS) | (CTG)n | (CAG)n | NC | 15-50
SCA8 (ATXN8-coding strand) | (CAG)n | (CTG)n | C [polyGln] |
SCA12 | (CAG)n | (CTG)n | NC | 7-32
HDL2 | (CAG)n | (CTG)n | NC | 6-28
SBMA | (CAG)n | (CTG)n | C [polyGln] | 10-36
HD | (CAG)n | (CTG)n | C [polyGln] | 9-36
DRPLA | (CAG)n | (CTG)n | C [polyGln] | 7-25
SCA1 | (CAG)n | (CTG)n | C [polyGln] | 6-39
SCA2 | (CAG)n | (CTG)n | C [polyGln] | 13-31
SCA3 | (CAG)n | (CTG)n | C [polyGln] | 13-44
SCA6 | (CAG)n | (CTG)n | C [polyGln] | 4-18
SCA7 | (CAG)n | (CTG)n | C [polyGln] | 4-35
SCA17 | (CAG)n | (CTG)n | C [polyGln] | 25-44
SCA17 | (CAA)n | (TTG)n | C [polyGln] | 25-44
PSACH | (GAC)n | (GTC)n | C [polyAsp] | 5
DM2 | (CCTG)n | (CAGG)n | NC | .ltoreq.30
SCA10 | (ATTCT)n | (AGAAT)n | NC | 10-29
SPD1 | (GCG)n | (CGC)n | C [polyAla] | 15
OPMD | (GCG)n | (CGC)n | C [polyAla] | 6
CCD | (GCG)n | (CGC)n | C [polyAla] | 17
HPE5 | (GCG)n | (CGC)n | C [polyAla] | 15
HFG syndrome | (GCG)n | (CGC)n | C [polyAla] | 18
BPES | (GCG)n | (CGC)n | C [polyAla] | 14
EIEE1 | (GCG)n | (CGC)n | C [polyAla] | 10-16
FRAXA | (CGG)n | (GCG)n | NC | 6-52
FXTAS | (CGG)n | (GCG)n | NC | 6-52
FRAXE | (CCG)n | (CGG)n | NC | 4-39
(*) MIM number of the Online Mendelian Inheritance in Man.RTM.
(OMIM.RTM.) database. OMIM.RTM. is authored and edited at the
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins
University School of Medicine, U.S.A., under the direction of Dr.
Ada Hamosh; please see http://www.omim.org/ as well as McKusick, V.
A. 1998 (Mendelian Inheritance in Man; A Catalog of Human Genes and
Genetic Disorders, Baltimore, Md., U.S.A., Johns Hopkins University
Press, ISBN 0-8018-5742-2). (.sctn.) Gene ID as available from NCBI
(National Center for Biotechnology Information, U.S. National
Library of Medicine, 8600 Rockville Pike, Bethesda Md., 20894,
U.S.A.) [http://www.ncbi.nlm.nih.gov/].
[0202] According to an aspect of the application, the DNA nucleic
acid, to which the DNA-binding polypeptide of the application
binds, or specifically binds, is a gene, more particularly the
human gene, coding for DMPK, ATXN8, PPP2R2B, JPH3, AR, HTT, ATN1,
ATXN1, ATXN2, ATXN3, CACNA1A, ATXN7, TBP, COMP, ZNF9, ATXN10,
HOXD13, PABN1, RUNX2, ZIC2, HOXA13, FOXL2, ARX, FMR1 or AFF2
(wherein said gene comprises said at least one DNA tandem
repeat).
[0203] According to an aspect of the application, said DNA nucleic
acid is a gene, more particularly the human gene, coding for DMPK,
ATXN8, PPP2R2B, JPH3, AR, HTT, ATN1, ATXN1, ATXN2, ATXN3, CACNA1A,
ATXN7, TBP, COMP, ZNF9 or ATXN10 (wherein said gene comprises said
at least one DNA tandem repeat).
[0204] More particularly, said DNA nucleic acid can be a gene, more
particularly the human gene, coding for DMPK (wherein said gene
comprises said at least one DNA tandem repeat).
[0205] According to an aspect of the application, the number of DNA
tandem repeat units contained in said human DNA nucleic acid is
above the average normal range, e.g., above the range that is
observed in a healthy subject, e.g., above the average normal range
of repeat units respectively indicated in Table 8 above (i.e.,
above 37 for the DM1 disease or disorder, above 50 for the SCA8
disease or disorder, above 32 for the SCA12 disease or disorder,
etc.).
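The comparison against the normal average range can be sketched as a lookup of the upper bound for each disorder. Only a few upper bounds from Table 8 are copied here; the names NORMAL_UPPER_BOUND and is_above_normal_range are hypothetical, the repeat counts passed to the function are arbitrary illustrative values, and the sketch is not a diagnostic criterion.

```python
# Upper bounds of the normal average range of repeat units (subset of Table 8).
NORMAL_UPPER_BOUND = {"DM1": 37, "SCA8": 50, "SCA12": 32, "HD": 36}

def is_above_normal_range(disorder: str, repeat_units: int) -> bool:
    """True if the number of repeat units exceeds the normal average range."""
    return repeat_units > NORMAL_UPPER_BOUND[disorder]

print(is_above_normal_range("DM1", 37))     # False: still within 5-37
print(is_above_normal_range("DM1", 38))     # True
print(is_above_normal_range("SCA12", 60))   # True
```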
[0206] The DNA-binding polypeptide of the application can be bound
to said DNA nucleic acid. More particularly, the DNA-binding
polypeptide of the application can be bound to said DNA nucleic
acid in vitro or in an in vitro cell.
[0207] The DNA-binding polypeptide of the application can be
directly or indirectly linked to at least one endonuclease monomer,
or to at least one fragment of endonuclease monomer, wherein said
fragment of endonuclease monomer still comprises the catalytic
domain of said endonuclease monomer.
[0208] More particularly, the DNA-binding polypeptide of the
application can be directly or indirectly linked to one
endonuclease monomer, or one fragment of endonuclease monomer,
wherein said fragment of endonuclease monomer still comprises the
catalytic domain of said endonuclease monomer.
[0209] The linkage of said (at least) one endonuclease monomer or
fragment thereof to said DNA-binding polypeptide is such that it
does not impede the endonuclease activity or function of said at
least one endonuclease monomer or at least one fragment thereof.
[0210] The resulting structure can be viewed as and functions as a
TALEN monomer.
[0211] In the application, the phrase "endonuclease" and the phrase
"catalytic domain" (or equivalent or similar phrases) are given
their respective ordinary meaning in the field of enzymology, more
particularly in the field of enzymology for biotechnological
applications. An endonuclease can e.g., be defined as an enzyme
that cleaves phosphodiester bond(s) within polynucleotide chain(s).
The catalytic domain of an endonuclease can e.g., be defined as the
region of said endonuclease, which contains the catalytic function
of the endonuclease.
[0212] In the application, the phrase "catalytic domain" (or an
equivalent or similar phrase) can be understood as meaning
"cleavage domain", i.e., the portion of the endonuclease, which
causes the cleavage of the polynucleotide chain(s).
[0213] In the application, the phrase "linked" (or an equivalent or
similar phrase) encompasses direct linkage, as well as indirect
linkage. It encompasses any chemical linkage, more particularly
covalent linkage, more particularly divalent covalent linkage.
[0214] Appropriate endonucleases notably comprise endonucleases,
which function as multimers, more particularly as dimers.
[0215] A dimeric endonuclease is an endonuclease, which is formed
by two monomers, the dimerization of which is required to cleave
the target DNA double strand. Each monomer of a dimeric
endonuclease comprises a catalytic domain.
[0216] Examples of such dimeric endonucleases notably include the
FokI endonuclease (Christian et al. 2010; Li et al. 2011; WO
94/18313 and its national counterparts, more particularly its US
counterpart(s), including the US continuation and divisional
application(s); WO 95/09233 and its national counterparts, more
particularly its US counterpart(s), including the US continuation
and divisional application(s)). An example of the sequence of a
FokI endonuclease and of its catalytic domain is available under
GENBANK accession number A32861. When the endonuclease is a
multimeric endonuclease, more particularly a dimeric endonuclease,
the DNA-binding polypeptide of the application is advantageously
linked to only one of said endonuclease monomers (advantageously in
only one exemplar).
[0217] Appropriate endonucleases also comprise monomeric
endonucleases. A monomeric endonuclease cleaves DNA, when it is
used as single monomer as well as when it is used in a pair of
monomeric endonucleases. Examples of monomeric endonucleases
include I-TevI, which is the homing endonuclease member of the
GIY-YIG protein family. Examples of fragments of I-TevI, which
still comprise the endonuclease catalytic domain, include the
I-TevI fragment, which consists of the N-terminal 183 residues of
wild-type I-TevI and a linker of 5 amino acids, e.g., the linker
QGPSG [SEQ ID NO: 22] (Beurdeley et al. 2013). When the
endonuclease is a monomeric endonuclease, the DNA-binding
polypeptide of the application is advantageously linked to said
endonuclease monomer in only one exemplar.
[0218] Examples of endonucleases also include non-naturally
occurring endonuclease, e.g., a non-naturally occurring
endonuclease, which derives from a naturally-occurring
endonuclease, more particularly from a naturally-occurring dimeric
endonuclease, by amino acid mutation(s) (e.g., by amino acid
replacement(s) and/or deletion(s) and/or addition(s), more
particularly by amino acid replacement(s)). For example, said
non-naturally occurring restriction endonuclease can be a (homo- or
hetero-) dimer, which differs from the FokI dimer by amino acid
mutation(s) in the catalytic domain of one or each of the two FokI
monomers. The number of amino acid mutation(s) per mutated FokI
monomer can e.g., be of three to six. For example, said amino acid
mutation(s) can be three to six mutations selected from positions
483, 486, 487, 490, 499 and 538 of the catalytic domain as
described in WO 2012/015938 and its national counterparts,
including its US national counterpart(s).
[0219] Advantageously, the DNA-binding polypeptide of the
application is linked to only one endonuclease monomer
(advantageously at only one exemplar).
[0220] In the application, the phrase "an endonuclease monomer" (or
an equivalent or a similar phrase) encompasses a monomer of a
dimeric endonuclease, as well as the monomer of a monomeric
endonuclease.
[0221] For medical applications, more particularly for applications
relating to treatment and/or palliation and/or prevention of
diseases or disorders, a dimeric endonuclease might be preferred to
a pair of monomeric endonucleases, because a monomeric endonuclease
might induce off-target single-strand cleavage.
[0222] According to an embodiment of the application, the
endonuclease is a dimeric (naturally-occurring or
non-naturally-occurring) endonuclease, such as FokI.
[0223] A fragment from an endonuclease monomer, which still
comprises the catalytic domain of the endonuclease monomer, can
also be used.
[0224] Said fragment can be a fragment of a monomer of a dimeric
endonuclease, or a fragment of a monomeric endonuclease (said
endonuclease being naturally-occurring or
non-naturally-occurring).
[0225] An example of a FokI endonuclease monomer is the sequence of
SEQ ID NO: 49 (cf. example 1 below).
[0226] Examples of a DNA-binding polypeptide of the application,
which is directly or indirectly linked to an endonuclease monomer
or to a fragment of endonuclease monomer, and which (specifically)
binds to a non-overlapping DNA target site (i.e., the DNA target
site of SEQ ID NO: 10) include: [0227] the polypeptide coded by the
sequence of SEQ ID NO: 39, and [0228] the polypeptide coded by
plasmid pCLS9996exp (C.N.C.M. deposit number I-4804).
[0229] Examples of a DNA-binding polypeptide of the application,
which is directly or indirectly linked to an endonuclease monomer
or to a fragment of endonuclease monomer, and which (specifically)
binds to an overlapping DNA target site (i.e., the DNA target site
of SEQ ID NO: 4), include: [0230] the polypeptide coded by the
sequence of SEQ ID NO: 50, and [0231] the polypeptide coded by
plasmid pCLS16715 (C.N.C.M. deposit number I-4805).
[0232] A DNA-binding polypeptide of the application may further
comprise a detection label or a selection marker, such as kanamycin
or a knockout leucine synthesis gene (e.g., LEU2) (cf. example 1
below).
[0233] The application also relates to a set comprising a first
DNA-binding polypeptide and a second DNA-binding polypeptide,
wherein only one, or each one, of said first and second DNA-binding
polypeptides is a DNA-binding polypeptide of the application. Said
first DNA-binding polypeptide is different from said second
DNA-binding polypeptide. Said set can be herein referred to as the
"polypeptide set of the application".
[0234] A polypeptide set of the application is: [0235] a set
comprising a first DNA-binding polypeptide and a second DNA-binding
polypeptide, wherein said first DNA-binding polypeptide is a
DNA-binding polypeptide of the application and wherein said second
DNA-binding polypeptide is a DNA-binding polypeptide of the
application; or [0236] a set comprising a first DNA-binding
polypeptide and a second DNA-binding polypeptide, wherein said
first DNA-binding polypeptide is a DNA-binding polypeptide of the
application and wherein said second DNA-binding polypeptide is not
a DNA-binding polypeptide of the application (said set may
hereinafter be more particularly referred to as "a mixed polypeptide set
of the application").
[0237] The phrase "set" is intended in accordance with its ordinary
meaning in the field. It notably encompasses the meaning of "a
plurality of", more particularly the meaning of "a pair of". Said
set or plurality (or pair) can e.g., be in the form of one
composition or kit, or of at least two compositions or at least two
kits.
[0238] Said one composition or kit comprises both said first and
second DNA-binding polypeptides.
[0239] Said at least two compositions or kits are in the form of
separate compositions or kits, each comprising one of said first
and second DNA-binding polypeptides (e.g., a first composition or
kit comprising said first DNA-binding polypeptide and a second
composition or kit comprising said second DNA-binding polypeptide,
wherein said first composition or kit is distinct or separate from
said second composition or kit). Said at least two compositions or
kits can be for simultaneous, separate, distinct or sequential use,
more particularly for simultaneous or sequential use.
[0240] In said polypeptide set, the first and second DNA-binding
polypeptides can e.g., be present as isolated polypeptides, as
individual polypeptides, as dimerized polypeptides, or can be
contained within cell(s), e.g., within host and/or genetically
engineered cell(s) (e.g., as described below) (the first
DNA-binding polypeptide can be contained within the same cell as
said second DNA-binding polypeptide, or in two distinct cells
respectively).
[0241] According to an aspect of the application, the first
DNA-binding polypeptide and the second DNA-binding polypeptide,
which are comprised in said set, are different from each other.
[0242] According to an aspect of the application, said first and
second DNA-binding polypeptides (specifically) bind to the same DNA
nucleic acid but at different DNA target sites.
[0243] More particularly, said first and second DNA-binding
polypeptides (specifically) bind to the same double-stranded DNA
nucleic acid, wherein said first DNA-binding polypeptide binds to
one strand of said double-stranded DNA nucleic acid, and wherein
said second DNA-binding polypeptide binds to the other strand of
said double-stranded DNA nucleic acid (i.e., to the complementary
strand). Hence, said first DNA-binding polypeptide recognizes or
binds to a first DNA target site and said second DNA-binding
polypeptide recognizes or binds to a second DNA target site, wherein
said first DNA target site is comprised in one strand of a
double-stranded DNA nucleic acid and said second DNA target site is
comprised in the other (complementary) strand of the same
double-stranded DNA nucleic acid.
[0244] Advantageously, said first DNA target site is different from
said second DNA target site.
[0245] Advantageously, said first DNA target site is comprised in a
first strand of a double-stranded DNA nucleic acid, without being
comprised in the second strand of the same double-stranded DNA
nucleic acid, and, conversely, said second DNA target site is
comprised in said second strand (of the same double-stranded DNA
nucleic acid as said first DNA target site), without being
comprised in said first strand.
[0246] The application thus relates to a composition or kit
comprising a first DNA-binding polypeptide and a second DNA-binding
polypeptide,
wherein said first DNA-binding polypeptide is different from said
second DNA-binding polypeptide, wherein each of said first and
second DNA-binding polypeptides binds to a DNA nucleic acid
comprising at least one DNA tandem repeat, wherein the DNA nucleic
acid to which said first DNA-binding polypeptide binds is one
strand of a double-stranded nucleic acid, wherein the DNA nucleic
acid to which said second DNA-binding polypeptide binds is the
other strand of the same double-stranded nucleic acid, wherein said
double-stranded DNA nucleic acid is a gene involved in a
neurological and/or muscular and/or skeletal disorder or disease
involving said at least one DNA tandem repeat, wherein each of said
first and second DNA-binding polypeptides comprises a TAL effector
tandem repeat consisting of adjacent units of TAL effector tandem
repeat, wherein the ordered series of RVDs formed by the RVDs
respectively contained in said adjacent units of TAL effector
tandem repeat, in N- to C-orientation, is an ordered series of
amino acids, which determines the recognition of the 5'-3'
nucleotide sequence of a DNA target site contained in the strand of
double-stranded DNA nucleic acid to which said DNA-binding
polypeptide binds, wherein the sequence of said DNA target site is:
[0247] i. a fragment of said strand of double-stranded DNA nucleic
acid consisting of a fragment of said at least one DNA tandem
repeat, wherein said fragment comprises more than one copy of said
DNA sequence unit of said at least one DNA tandem repeat, or
[0248] ii. a fragment of said strand of double-stranded DNA nucleic
acid, which starts outside the sequence of said at least one DNA
tandem repeat and ends within the sequence of said at least one DNA
tandem repeat, or conversely, which starts within the sequence of
said at least one DNA tandem repeat and ends outside the sequence of
said at least one DNA tandem repeat,
wherein each of said first and second DNA-binding polypeptides is
directly or indirectly linked to one endonuclease monomer or to one
fragment of endonuclease monomer, wherein said fragment of
endonuclease monomer still comprises the catalytic domain of said
endonuclease monomer, and wherein said first and second DNA-binding
polypeptides induce a partial or complete deletion of said at least
one DNA tandem repeat.
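The two alternatives i. and ii. for the DNA target site can be
illustrated by a minimal sketch, given here for explanatory purposes
only. It assumes that positions are expressed as 0-based, half-open
intervals along one strand, and it reads "more than one copy of said
DNA sequence unit" as "longer than one unit". The names Interval and
classify_target_site, the labels "case_i" and "case_ii" and the
coordinates used in the usage lines are hypothetical and are not
taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end), 0-based, along one DNA strand."""
    start: int
    end: int


def classify_target_site(site: Interval, repeat: Interval, unit_len: int) -> str:
    """Return 'case_i', 'case_ii' or 'neither' for a candidate target site.

    case_i : the site lies entirely within the tandem repeat and is
             longer than one repeat unit (simplifying assumption).
    case_ii: the site starts outside the repeat and ends within it,
             or starts within the repeat and ends outside it.
    """
    inside = repeat.start <= site.start and site.end <= repeat.end
    if inside:
        return "case_i" if (site.end - site.start) > unit_len else "neither"

    starts_outside_ends_within = (
        site.start < repeat.start and repeat.start < site.end <= repeat.end
    )
    starts_within_ends_outside = (
        repeat.start <= site.start < repeat.end and site.end > repeat.end
    )
    if starts_outside_ends_within or starts_within_ends_outside:
        return "case_ii"
    return "neither"


# Hypothetical layout: 10 nt of flanking sequence, then a 60-nt
# trinucleotide repeat occupying positions 10..69.
repeat = Interval(10, 70)
print(classify_target_site(Interval(0, 15), repeat, unit_len=3))   # straddles the 5' border
print(classify_target_site(Interval(35, 50), repeat, unit_len=3))  # entirely within the repeat
print(classify_target_site(Interval(0, 8), repeat, unit_len=3))    # entirely outside
```

In this sketch, a site that spans the whole repeat (starting before
it and ending after it) is reported as "neither", because it satisfies
neither of the two alternatives as worded above.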
[0249] Advantageously, said endonuclease monomer is the monomer of
a dimeric endonuclease.
[0250] According to an aspect of the application, the DNA target
site is a non-overlapping DNA target site (as defined above) for
only one of said first and second DNA-binding polypeptides, or for
each of said first and second DNA-binding polypeptides. When the
DNA target site is a non-overlapping DNA target site (as defined
above) for only one of said first and second DNA-binding
polypeptides, the DNA target site of the other of said first and
second DNA-binding polypeptides is: [0251] an overlapping DNA
target site (as above-defined), or is [0252] a DNA target site,
which is neither a non-overlapping site (as above-defined) nor an
overlapping site (as above-defined).
[0253] According to an aspect of the application, the DNA target
site is an overlapping DNA target site (as defined above) for only
one of said first and second DNA-binding polypeptides, or for each
of said first and second DNA-binding polypeptides. When the DNA
target site is an overlapping DNA target site (as defined above)
for only one of said first and second DNA-binding polypeptides, the
DNA target site of the other of said first and second DNA-binding
polypeptides is: [0254] a non-overlapping DNA target site (as
above-defined), or is [0255] a DNA target site, which is neither a
non-overlapping site (as above-defined) nor an overlapping site (as
above-defined).
[0256] Advantageously, the DNA target site is an overlapping DNA
target site (as defined above) for one of said first and second
DNA-binding polypeptides.
[0257] Advantageously, the DNA target site is a non-overlapping DNA
target site (as defined above) for one of said first and second
DNA-binding polypeptides.
[0258] Advantageously, the DNA target site is an overlapping DNA
target site (as defined above) for one of said first and second
DNA-binding polypeptides and is a non-overlapping DNA target site
(as defined above) for the other of said first and second
DNA-binding polypeptides.
[0259] This configuration drastically reduces the chance that the
first and second DNA-binding polypeptides induce a length
alteration or mutation at an off-target location, e.g., in a
non-pathological gene, which would comprise the same DNA repeat
unit as the targeted pathological gene.
[0260] For example, said first DNA-binding polypeptide binds to a
DNA target site of SEQ ID NO: 10 and said second DNA-binding
polypeptide binds to a DNA target site of SEQ ID NO: 4, or said
first DNA-binding polypeptide binds to a DNA target site of SEQ ID
NO: 11 and said second DNA-binding polypeptide binds to a DNA
target site of SEQ ID NO: 5 (cf. FIG. 1B).
[0261] For example, the ordered series of RVDs formed by the
adjacent units forming the TAL effector tandem repeat of said first
DNA-binding polypeptide is NN; HD; NG; NN; HD; NG; NN; HD; NG; NN;
HD; NG; NN; HD; NG, which determines the (specific) recognition of
(and the (specific) binding to) the non-overlapping DNA target site
of SEQ ID NO: 10, and the ordered series of RVDs formed by the
adjacent units forming the TAL effector tandem repeat of said
second DNA-binding polypeptide is NN; NG; NN; NI; NG; HD; HD; HD;
HD; HD; HD; NI; NN; HD; NI, which determines the (specific)
recognition of (and the (specific) binding to) the overlapping DNA
target site of SEQ ID NO: 4.
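The way an ordered series of RVDs, read in N- to C-orientation,
determines a 5'-3' nucleotide sequence can be sketched as follows.
The sketch assumes the commonly cited RVD/nucleotide correspondence
(NI for A, HD for C, NG for T, NN for G, NN being also reported to
accept A); the correspondence that governs in the application is the
one given in its own table, which is not reproduced in this section.
The names RVD_TO_BASE and decode_rvds are hypothetical, and the
printed strings are merely what this assumed mapping yields.

```python
# Assumed RVD -> nucleotide mapping (commonly cited code; NN is taken as G here).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}


def decode_rvds(series: str) -> str:
    """Translate a ';'-separated RVD series (N- to C-orientation)
    into the 5'-3' nucleotide sequence it would recognize."""
    rvds = [item.strip() for item in series.split(";") if item.strip()]
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds)


first_series = "NN; HD; NG; NN; HD; NG; NN; HD; NG; NN; HD; NG; NN; HD; NG"
second_series = "NN; NG; NN; NI; NG; HD; HD; HD; HD; HD; HD; NI; NN; HD; NI"

print(decode_rvds(first_series))   # GCTGCTGCTGCTGCT
print(decode_rvds(second_series))  # GTGATCCCCCCAGCA
```

Under this assumed mapping, the first series decodes to a sequence
made only of repeat units, whereas the second decodes to a sequence
whose 5' part lies outside the repeat and whose 3' part lies within
it.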
[0262] More particularly, the TAL effector tandem repeat of said
first DNA-binding polypeptide is different from the one of said
second DNA-binding polypeptide. The difference can be a difference
in amino acid sequence and/or in amino acid length.
[0263] More particularly: [0264] the frame sequence(s) of the TAL
effector tandem repeat units of said first DNA-binding polypeptide
is(are) different from the one(s) of said second DNA-binding
polypeptide; and/or [0265] said first DNA-binding polypeptide and
said second DNA-binding polypeptide have different DNA target
sites, i.e., the ordered series of RVDs formed by the units of the
TAL effector tandem repeat that is contained in said first
DNA-binding polypeptide is different from the ordered series of
RVDs formed by the units of the TAL effector tandem repeat that is
contained in said second DNA-binding polypeptide.
[0266] For example: [0267] the adjacent units of the TAL effector
tandem repeat that is contained in said first DNA-binding
polypeptide may comprise one or several copy(ies) of at least one
sequence selected from the group consisting of SEQ ID NOs: 25, 26,
46, 55 and said variants thereof, [0268] the adjacent units of the
TAL effector tandem repeat that is contained in said second
DNA-binding polypeptide may comprise one or several copy(ies) of at
least one sequence selected from the group consisting of SEQ ID
NOs: 25, 26, 46, 55 and said variants thereof, and [0269] the
ordered series of RVDs formed by the RVDs respectively contained in
the adjacent units of the TAL effector tandem repeat of said first
DNA-binding polypeptide, in N- to C-orientation, is different from
the ordered series of RVDs formed by the RVDs respectively
contained in the adjacent units of the TAL effector tandem repeat
of said second DNA-binding polypeptide, in N- to C-orientation.
[0270] For example, the ordered series of RVDs formed by the RVDs
respectively contained in the adjacent units of the TAL effector
tandem repeat of said first DNA-binding polypeptide, in N- to
C-orientation, determines the recognition of (and the (specific)
binding to) an overlapping DNA target site (as defined above, e.g.,
the DNA target site of SEQ ID NO: 4 or 5, more particularly the DNA
target site of SEQ ID NO: 4), and the ordered series of RVDs formed
by the RVDs respectively contained in the adjacent units of the TAL
effector tandem repeat of said second DNA-binding polypeptide, in
N- to C-orientation, determines the recognition of (and the
(specific) binding to) a non-overlapping DNA target site (as
defined above, e.g., the DNA target site of SEQ ID NO: 10 or 11,
more particularly the DNA target site of SEQ ID NO: 10; cf. FIG.
1B).
[0271] Each of said first and second DNA-binding polypeptides can
be linked to an endonuclease monomer or to a fragment of such a
monomer as described above.
[0272] Advantageously, each of said first and second DNA-binding
polypeptides is linked to the monomer of a dimeric endonuclease,
such as Fok I, or to a fragment of such a monomer as described
above (cf. above).
[0273] In a polypeptide set of the application, said first
DNA-binding polypeptide can be dimerized to said second DNA-binding
polypeptide.
[0274] The application thus relates to a polymer, more particularly
a dimer, which comprises said first and second DNA-binding
polypeptides. Said polymer may further comprise at least one
(double-stranded) DNA nucleic acid, more particularly at least one
(double-stranded) DNA nucleic acid comprising at least one DNA
tandem repeat (as defined above). Said at least one DNA nucleic
acid can be linked to said first and second DNA-binding
polypeptides by non-covalent linkage, e.g., by non-covalent binding
of the RVDs of said first and second DNA-binding polypeptides to
nucleotides of said at least one DNA nucleic acid, e.g., by
non-covalent binding of the RVDs of said first DNA-binding
polypeptide to nucleotides of one strand of said at least one
double-stranded DNA nucleic acid and by non-covalent binding of the
RVDs of said second DNA-binding polypeptide to nucleotides of the
other (complementary) strand of the same double-stranded DNA
nucleic acid.
[0275] Alternatively, in a polypeptide set of the application, said
first DNA-binding polypeptide can be not dimerized to said second
DNA-binding polypeptide.
[0276] More particularly, in a set of the application, said first
DNA-binding polypeptide can be contained separately from said
second DNA-binding polypeptide, e.g., to avoid dimerization of said
first DNA-binding polypeptide to said second DNA-binding
polypeptide.
[0277] The nucleotide length that extends from the DNA target site
of said first DNA-binding polypeptide to the DNA target site of
said second DNA-binding polypeptide is referred to as the "spacer
length". This terminology is in accordance with the terminology
that is used in the field of TALENs.
[0278] Said spacer length is the number of nucleotides extending
between the two proximal ends of the respective DNA target sites of
said first and second DNA-binding polypeptides.
[0279] On a double-stranded DNA nucleic acid, wherein a first
DNA-binding polypeptide recognizes or binds to a first DNA target
site on one strand of said double-stranded DNA nucleic acid and
wherein a second DNA-binding polypeptide recognizes or binds to a
second DNA target site on the other strand of said double-stranded
DNA nucleic acid, said spacer length can be viewed as the
nucleotide length that extends from the 3' end of one of the
respective DNA target sites of said first and second DNA-binding
polypeptides to the 3' end of the other of said DNA target sites
(the last 3' end nucleotides of said first and second DNA target
sites are not taken into account in the computation of said
nucleotide number).
[0280] For example, in FIG. 1B, the sequence of the spacer is
GCAGCAGCAGCAGCAGCAGC [SEQ ID NO: 8] (GCTGCTGCTGCTGCTGCTGC [SEQ ID
NO: 9] on the complementary strand (5'-3')). Hence, in FIG. 1B, the
spacer length is 20 nucleotides.
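The spacer example of FIG. 1B can be checked with a short sketch:
the two spacer sequences quoted above should be reverse complements
of each other, and each should be 20 nucleotides long. The helper
names reverse_complement and spacer_length are hypothetical, as are
the coordinates in the last line, which assume 0-based, half-open
positions on the strand bearing the first DNA target site.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the 5'-3' sequence of the complementary strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def spacer_length(first_site_end: int, second_site_start: int) -> int:
    """Number of nucleotides between the proximal ends of two target sites.

    first_site_end is the position just past the first site and
    second_site_start is the first position covered by the second site,
    both counted on the same strand (hypothetical convention).
    """
    return second_site_start - first_site_end


spacer_one_strand = "GCAGCAGCAGCAGCAGCAGC"    # quoted as SEQ ID NO: 8
spacer_other_strand = "GCTGCTGCTGCTGCTGCTGC"  # quoted as SEQ ID NO: 9

assert reverse_complement(spacer_one_strand) == spacer_other_strand
print(len(spacer_one_strand))  # 20
print(spacer_length(first_site_end=15, second_site_start=35))  # 20
```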
[0281] When each of the first and second DNA-binding polypeptides
is linked to a monomer of the same dimeric endonuclease, said
spacer length is selected to be sufficiently
short and sufficiently long for the two monomers of said dimeric
endonuclease to dimerize when said first and second DNA-binding
polypeptides are bound to their respective DNA target sites on each
strand of the same double-stranded DNA nucleic acid (cf. FIGS. 1A
and 1B). In other words, the respective DNA target sites of said
first and second DNA-binding polypeptides are selected to be spaced
apart by a spacer length that is appropriate for dimerization of
the two endonuclease monomers respectively borne by said first and
second DNA-binding polypeptides.
[0282] The respective DNA target sites of said first and second
DNA-binding polypeptides can be spaced apart by a nucleotide length
that may vary from 6 to 40 nt (or bp), optimal cleavage being usually
observed with a spacer length of 10 to 30 nt (or bp), e.g., of
15-24 nt (or bp), 15-21 nt (or bp) or 16-21 nt (or bp), e.g., 16,
17, 18, 19, 20 or 21 nt (or bp).
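The nested spacer ranges recited above can be summarized by a small
helper that reports the narrowest recited range containing a given
spacer length. This is a minimal sketch for illustration; the names
SPACER_RANGES and narrowest_range are hypothetical, and the function
does not predict cleavage efficiency.

```python
from typing import Optional, Tuple

# Ranges recited in the text, ordered from narrowest to broadest (in nt or bp).
SPACER_RANGES = [(16, 21), (15, 21), (15, 24), (10, 30), (6, 40)]


def narrowest_range(spacer_nt: int) -> Optional[Tuple[int, int]]:
    """Return the narrowest recited range that contains spacer_nt,
    or None if the length falls outside 6 to 40 nt."""
    for low, high in SPACER_RANGES:
        if low <= spacer_nt <= high:
            return (low, high)
    return None


print(narrowest_range(20))  # (16, 21), e.g. the 20-nt spacer of FIG. 1B
print(narrowest_range(12))  # (10, 30)
print(narrowest_range(5))   # None
```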
[0283] Advantageously: [0284] said DNA target site is an
overlapping DNA target site (as defined above) for one of said
first and second DNA-binding polypeptides, and is a non-overlapping
DNA target site (as defined above) for the other of said first and
second DNA-binding polypeptides, and [0285] each of said first and
second DNA-binding polypeptides is linked to the monomer of a
dimeric endonuclease (cf. above), such as Fok I, or to a fragment
of such a monomer as described above.
[0286] Advantageously: [0287] said DNA target site is an
overlapping DNA target site (as defined above) for one of said
first and second DNA-binding polypeptides, and is a non-overlapping
DNA target site (as defined above) for the other of said first and
second DNA-binding polypeptides, [0288] each of said first and
second DNA-binding polypeptides is linked to the monomer of a
dimeric endonuclease (cf. above), such as Fok I, or to a fragment
of such a monomer as described above, and [0289] the DNA target
site of said first DNA-binding polypeptide is spaced apart from the
one of said second DNA-binding polypeptide by a spacer length that
enables dimerization of the two endonuclease monomers respectively
borne by said first and second DNA-binding polypeptides (when said
first and second DNA-binding polypeptides are bound to their
respective DNA target sites), e.g., by a spacer length as indicated
above, e.g., a spacer length of 15-24 nt (or bp), 15-21 nt (or bp)
or 16-21 nt (or bp), e.g., 16, 17, 18, 19, 20 or 21 nt (or bp).
[0290] In a set, which comprises a first DNA-binding polypeptide,
which is a DNA-binding polypeptide of the application and which
further comprises a second DNA-binding polypeptide, which is not a
DNA-binding polypeptide of the application, i.e., in a mixed
polypeptide set of the application, said second DNA-binding
polypeptide, which is not of the application, can e.g., be as
above-defined except that its DNA target site is neither a
non-overlapping DNA target site as defined above nor an overlapping
DNA target site as defined above.
[0291] Said second DNA-binding polypeptide, which is not of the
application, can e.g., be identical to a DNA-binding polypeptide of
the application in all features (e.g., it comprises a TAL effector
tandem repeat as above-defined), except for the DNA target site,
which is not one that is recognized by a DNA-binding polypeptide of
the application.
[0292] Hence, said second DNA-binding polypeptide, which is not of
the application, can e.g., be identical to a DNA-binding polypeptide
of the application in all features (e.g., it comprises a TAL effector
tandem repeat as above-defined), except that the ordered series of
RVDs formed by the RVDs respectively contained in the adjacent
units of its TAL effector tandem repeat, in N- to C-orientation, is
an ordered series of amino acids, which according to the
RVD/nucleotide correspondence shown in Table 5 above, determines
the recognition of the 5'-3' nucleotide sequence of a DNA target
site that is contained in a DNA nucleic acid, wherein said DNA
nucleic acid is as above-defined, but wherein said DNA target site
is neither a non-overlapping DNA target site as defined above nor
an overlapping DNA target site as defined above. More particularly,
the DNA target site of said second DNA-binding polypeptide, which
is not of the application, can be a fragment of said DNA nucleic
acid, which does not comprise any fragment of said at least one DNA
tandem repeat, more particularly a fragment of said DNA nucleic
acid, which does not comprise any DNA sequence unit of said at least
one DNA tandem repeat.
[0293] Said first DNA-binding polypeptide, which is comprised in
the set with said second DNA-binding polypeptide, is a DNA-binding
polypeptide of the application, and therefore has a DNA target
site, which is either a non-overlapping DNA target site as
above-defined or an overlapping DNA target site as above-defined.
[0294] The spacer length is as above-defined. More particularly,
the spacer length between said first DNA-binding polypeptide of the
application and said second DNA-binding polypeptide (which is not
of the application) is a nucleotide length appropriate for
dimerization of the two endonuclease monomers respectively borne by
said first and second DNA-binding polypeptides.
The respective DNA target sites of said first and second
DNA-binding polypeptides can be spaced apart by a nucleotide length
that may vary from 6 to 40 nt (or bp), optimal cleavage being usually
observed with a spacer length of 10 to 30 nt (or bp), e.g., of
15-24 nt (or bp), 15-21 nt (or bp) or 16-21 nt (or bp), e.g., 16,
17, 18, 19, 20 or 21 nt (or bp).
[0295] According to an aspect of the application, said first and
second DNA-binding polypeptides induce a double-strand break in
said double-stranded DNA nucleic acid. More particularly, they
induce a double-strand break specifically in said double-stranded
DNA nucleic acid.
[0296] The application also relates to a nucleic acid, more
particularly a DNA or RNA, more particularly a DNA. Said nucleic
acid can be a man-made or artificial or engineered nucleic
acid.
[0297] A nucleic acid of the application codes for the DNA-binding
polypeptide of the application, more particularly for the
DNA-binding polypeptide of the application (directly or indirectly)
linked to (at least) one endonuclease monomer (cf. above) or to (at
least) one fragment of endonuclease monomer as above-defined.
[0298] The application relates more particularly to a coding
nucleic acid, the coding sequence of which consists of a sequence
coding for the DNA-binding polypeptide of the application, more
particularly for the DNA-binding polypeptide of the application
(directly or indirectly) linked to (at least) one endonuclease
monomer (cf. above) or to (at least) one fragment of endonuclease
monomer as above-defined (said coding being according to the
universal genetic code, taking due account of its degeneracy).
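The expression "taking due account of its degeneracy" can be
illustrated on the RVDs themselves, since each RVD is a pair of amino
acids written in one-letter code (e.g., NI is Asn-Ile). The sketch
below lists, under the standard genetic code, the synonymous codons
of the five amino acids that occur in the RVDs cited in this
application and counts how many distinct DNA sequences encode a given
RVD. The names CODONS, count_encodings and all_encodings are
hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code, restricted to the amino acids found in NI, HD, NG, NN.
CODONS = {
    "N": ("AAT", "AAC"),                # asparagine
    "I": ("ATT", "ATC", "ATA"),         # isoleucine
    "H": ("CAT", "CAC"),                # histidine
    "D": ("GAT", "GAC"),                # aspartic acid
    "G": ("GGT", "GGC", "GGA", "GGG"),  # glycine
}


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


def all_encodings(peptide: str) -> list:
    """Every DNA sequence that encodes the peptide."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


for rvd in ("NI", "HD", "NG", "NN"):
    print(rvd, count_encodings(rvd))
print(all_encodings("HD"))
```

A coding nucleic acid of a given polypeptide is therefore not a
single sequence but any member of such a family of synonymous
sequences.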
[0299] The application relates more particularly to a coding
nucleic acid, the coding sequence of which comprises a sequence,
which codes for the TAL effector tandem repeat of a DNA-binding
polypeptide of the application (said coding being according to the
universal genetic code, taking due account of its degeneracy). Said
coding sequence may e.g., comprise one or several copy(ies) of at
least one of the sequences coding for SEQ ID NO: 25, 26, 46, 55 and
said variant sequences thereof.
[0300] Examples of such coding nucleic acid sequences comprise:
[0301] the nucleic acid sequence of SEQ ID NO: 45, which consists
of 10 copies of a sequence coding for SEQ ID NO: 46 and of 5 copies
of a sequence coding for SEQ ID NO: 25, and which codes for a TAL
effector tandem repeat, wherein the ordered series of RVDs (i.e.,
NN; HD; NG; NN; HD; NG; NN; HD; NG; NN; HD; NG; NN; HD; NG)
determines the (specific) recognition of (and the (specific)
binding to) the non-overlapping DNA target site of SEQ ID NO: 10,
[0302] the portion of the nucleic acid sequence of the insert
carried by plasmid pCLS9996exp (C.N.C.M. deposit number I-4804),
which codes for a TAL effector tandem repeat (said TAL effector
tandem repeat (specifically) binds to the non-overlapping DNA target
site of SEQ ID NO: 10),
[0303] the nucleic acid sequence of SEQ ID NO: 54, which consists
of 5 copies of a sequence coding for SEQ ID NO: 46, 2 copies of a
sequence coding for SEQ ID NO: 55 and of 8 copies of a sequence
coding for SEQ ID NO: 25, and which codes for a TAL effector tandem
repeat, wherein the ordered series of RVDs (i.e., NN; NG; NN; NI;
NG; HD; HD; HD; HD; HD; HD; NI; NN; HD; NI) determines the
(specific) recognition of (and the (specific) binding to) the
overlapping DNA target site of SEQ ID NO: 4,
[0304] the portion of the nucleic acid sequence of the insert
carried by plasmid pCLS16715 (C.N.C.M. deposit number I-4805), which
codes for a TAL effector tandem repeat (said TAL effector tandem
repeat (specifically) binds to the overlapping DNA target site of
SEQ ID NO: 4).
[0305] The application more particularly relates to a nucleic acid
(DNA or RNA), which codes for the TAL effector tandem repeat coded
by plasmid pCLS9996exp (C.N.C.M. deposit number I-4804) or by
plasmid pCLS16715 (C.N.C.M. deposit number I-4805).
[0306] Examples of sequences coding for an endonuclease monomer or
for a fragment of endonuclease monomer as above-defined comprise:
[0307] the sequence of SEQ ID NO: 3 (which codes for the FokI
monomer of SEQ ID NO: 49) and the fragments thereof, which still
code for the FokI catalytic domain, and [0308] the sequences coding
for the I-TevI endonuclease and the fragments thereof, which still
code for the I-TevI catalytic domain.
[0309] The nucleic acid of the application may comprise at least
one coding sequence, wherein said at least one coding sequence
codes for (at least) one DNA-binding polypeptide of the application
(directly or indirectly) linked to (at least) one endonuclease
monomer (cf. above) or to (at least) one fragment of endonuclease
monomer as above-defined (according to the universal genetic code
and taking due account of its degeneracy). Examples of such coding
nucleic acid sequences comprise:
[0310] the nucleic acid sequence of SEQ ID NO: 39 (which codes for
a TALEN, which (specifically) binds to the non-overlapping DNA
target site of SEQ ID NO: 10),
[0311] the nucleic acid sequence of the insert carried by plasmid
pCLS9996exp (C.N.C.M. deposit number I-4804), which codes for a
TALEN that (specifically) binds to the non-overlapping DNA target
site of SEQ ID NO: 10,
[0312] the nucleic acid sequence of SEQ ID NO: 50 (which codes for
a TALEN, which (specifically) binds to the overlapping DNA target
site of SEQ ID NO: 4),
[0313] the nucleic acid sequence of the insert carried by plasmid
pCLS16715 (C.N.C.M. deposit number I-4805), which codes for a TALEN
that (specifically) binds to the overlapping DNA target site of SEQ
ID NO: 4.
[0314] The nucleic acid of the application can further comprise a
translational start codon, such as ATG, located (immediately) 5'
of said coding sequence, and/or further comprise a 3' UTR for
transcription termination and polyadenylation of the RNA transcript,
located (immediately) 3' of said coding sequence. For example,
said 3' UTR comprises a translational stop codon (such as TGA, TAG
or TAA) and a polyA sequence.
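The arrangement of a start codon, a coding sequence and a stop codon
can be verified on a candidate DNA sequence with a minimal sketch
such as the one below. It assumes that the sequence passed to the
function runs from the ATG start codon to the stop codon inclusive
(i.e., without the polyA sequence or any other regulatory element).
The name check_orf and the short test sequences are hypothetical and
do not correspond to any sequence of the application.

```python
STOP_CODONS = {"TGA", "TAG", "TAA"}


def check_orf(seq: str) -> list:
    """Return a list of problems found in a start-to-stop reading frame.

    An empty list means: length is a multiple of 3, first codon is ATG,
    last codon is a stop codon, and no stop codon occurs in between.
    """
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("does not start with ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[1:-1]):
        problems.append("internal stop codon")
    return problems


print(check_orf("ATGAATATTTGA"))  # ATG AAT ATT TGA
print(check_orf("ATGTAAAATTGA"))  # ATG TAA AAT TGA
print(check_orf("AATATTTGA"))     # AAT ATT TGA
```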
[0315] The nucleic acid of the application may further comprise
sequence(s), which does(do) not code for amino acid(s) but which
regulates(regulate) transcription and/or translation. For example,
the nucleic acid of the application can further comprise (at least)
one sequence for initiating DNA transcription located in 5' of said
coding sequence and/or further comprise (at least) one sequence for
terminating DNA transcription located in 3' of said coding
sequence. For example, the nucleic acid of the application may
further comprise (at least) one enhancer (such as the GAL10
enhancer of SEQ ID NO: 37) and a promoter (such as the CYC1
promoter of SEQ ID NO: 38) in 5' of said coding sequence, and may
further comprise a terminator (such as an ADH1 terminator of SEQ ID
NO: 40 or 51).
[0316] The nucleic acid of the application can (thereby) form an
expression cassette for expression in a host cell, more
particularly in a eukaryotic host cell, more particularly in a
mammalian cell, a non-human mammalian cell (e.g., a rodent cell,
such as a mouse cell), a human host cell, a yeast host cell, a
bacterial host cell or a plant host cell, more particularly in a
human host cell or a yeast host cell, more particularly in a human
host cell.
[0317] Hence, the nucleic acid of the application may consist of:
[0318] (at least) one sequence coding for a polypeptide consisting
of the DNA-binding polypeptide of the application (according to the
universal genetic code and taking due account of its degeneracy),
or coding for a polypeptide consisting of the DNA-binding
polypeptide of the application (directly or indirectly) linked to
(at least) one endonuclease monomer (cf. above) or to (at least)
one fragment of endonuclease monomer as above-defined (according to
the universal genetic code and taking due account of its
degeneracy), and [0319] optionally, sequence(s), which does(do) not
code for amino acids but which regulates(regulate) transcription
and/or translation, such as (at least) one sequence for initiating
DNA transcription located (immediately) in 5' of said nucleic acid
sequence and/or (at least) one sequence for terminating DNA
transcription located (immediately) in 3' of said nucleic acid
sequence, e.g., as above described.
[0320] For example, the sequence of the nucleic acid of the
application can comprise or consist of the sequence of SEQ ID NO: 2
or the sequence of SEQ ID NO: 1.
[0321] The sequence of SEQ ID NO: 2 codes for a DNA-binding
polypeptide of the application, which is linked to a FokI
endonuclease monomer and, which (specifically) binds to a DNA
target site that is the (non-split) left TALE DNA-binding domain of
FIG. 1B (cf. example 1 below). The sequence of the left-hand TALE
DNA-binding domain of FIG. 1B is the sequence of SEQ ID NO: 4,
i.e., an overlapping DNA target site as defined above.
[0322] The sequence of SEQ ID NO: 1 codes for a DNA-binding
polypeptide of the application, which is linked to a FokI
endonuclease monomer and, which (specifically) binds to a DNA
target site that is the right TALE DNA-binding domain of FIG. 1B
(cf. example 1 below). The sequence of the DNA target site of the
right TALE DNA-binding domain of FIG. 1B is the sequence of SEQ ID
NO: 10, i.e., a non-overlapping DNA target site as defined
above.
[0323] For example, the sequence of the nucleic acid of the
application can comprise or consist of the sequence of the insert
carried by plasmid pCLS9996exp (C.N.C.M. I-4804) or the sequence of
the insert carried by plasmid pCLS16715 (C.N.C.M. I-4805).
[0324] The application also relates to a nucleic acid vector, more
particularly a recombinant vector, more particularly a recombinant
expression nucleic acid vector, which comprises at least one
nucleic acid (DNA or RNA) of the application.
[0325] A nucleic acid vector of the application may comprise a
cloning site into which a nucleic acid is inserted, wherein the
sequence of said inserted nucleic acid is the sequence of the
nucleic acid of the application.
[0326] The nucleic acid vector of the application advantageously is
a non-integrative vector (i.e., a vector, which does not induce the
integration of the nucleic acid into the genome of the host into
which said vector has been introduced) and/or a non-replicative
vector.
[0327] According to an embodiment of the application, said nucleic
acid vector is a recombinant expression vector. More particularly,
said nucleic acid vector is an expression vector comprising a
cloning site into which a nucleic acid to be expressed is inserted
under the control of a 5' expression promoter (said promoter being
inducible or non-inducible), and optionally under the control of at
least one 5' expression enhancer, wherein the sequence of said
nucleic acid to be expressed is the sequence of the nucleic acid of
the application.
[0328] Advantageously, said expression vector is a non-integrative
vector, more particularly a vector for transient expression, for
example a plasmid.
[0329] An illustrative plasmid is the plasmid pCLS16715 (C.N.C.M.
I-4805), which carries the sequence of SEQ ID NO: 2 (as nucleic
acid to be expressed): SEQ ID NO: 2 codes for the TALEN monomer
that binds to the left-hand DNA target site of SEQ ID NO: 4; cf.
FIG. 1B. Another illustrative plasmid is the plasmid pCLS9996exp
(C.N.C.M. I-4804), which carries the sequence of SEQ ID NO: 1 (as
nucleic acid to be expressed): SEQ ID NO: 1 codes for the TALEN
monomer that binds to the right-hand DNA target site of SEQ ID NO:
10; cf. FIG. 1B.
[0330] Each of plasmid pCLS16715 and plasmid pCLS9996exp has been
deposited at the Collection Nationale de Cultures de
Microorganismes (C.N.C.M.) under the terms of the Budapest Treaty
(COLLECTION NATIONALE DE CULTURES DE MICROORGANISMES; Institut
Pasteur; 28, rue du Docteur Roux; F-75724 PARIS CEDEX 15;
FRANCE).
[0331] The C.N.C.M. deposit number of plasmid pCLS16715 is I-4805
and the date of the deposit under the terms of the Budapest Treaty
is 10 Oct. 2013. Deposit I-4805 is plasmid pCLS16715 transformed in
E. coli (more particularly, an E. coli strain, which is deficient
in the genes involved in the rearrangement and deletion of DNA,
such as E. coli SURE® 2, which is available from STRATAGENE, an
AGILENT TECHNOLOGIES division, California, U.S.A.; e.g., an E. coli
strain, which is endA1 glnV44 thi-1 gyrA96 relA1 lac recB recJ sbcC
umuC::Tn5 uvrC e14-Δ(mcrCB-hsdSMR-mrr)171 F[proAB+ lacIq lacZΔM15
Tn10 Amy CmR]). An example of
suitable growth medium is Lysogeny Broth (LB) growth
medium+ampicillin (e.g., ampicillin at 100 µg/mL). An example of
suitable incubation condition is 37°C (more particularly,
37°C under stirring conditions).
[0332] The C.N.C.M. deposit number of plasmid pCLS9996exp is I-4804
and the date of the deposit under the terms of the Budapest Treaty
is 10 Oct. 2013. Deposit I-4804 is plasmid pCLS9996exp transformed in
E. coli (more particularly, an E. coli strain, which is efficient
in DNA transformation and in maintenance of large plasmids, such as
E. coli DH10B (cf. Durfee et al. 2008, J. Bacteriol. 190(7):
2597-2606)). An example of suitable growth medium is Lysogeny Broth
(LB) growth medium+kanamycin sulfate (e.g., kanamycin at 50
µg/mL). An example of suitable incubation condition is
37°C (more particularly, 37°C under stirring
conditions).
[0333] Appropriate non-integrative vectors, more particularly
appropriate vectors for transient expression, also comprise
retroviral or lentiviral vectors, more particularly HIV vectors,
more particularly HIV1 vectors, wherein the integrase of said
vectors is or has been made defective, e.g., by class 1 integrase
mutation(s) (whereby said vectors are or have been made
non-integrative). Examples of such non-integrative vectors
comprise:
[0334] a HIV1 vector, the integrase of which has been
made defective by replacement of the .sup.262RRK motif by AAH as
described in Philippe et al. 2006 (cf. FIG. 1 of Philippe et al.
2006),
[0335] a retroviral or lentiviral vector, more particularly
a HIV vector, more particularly a HIV1 vector, as described in WO
99/55892, which has been made non-integrative e.g., by the method
described in Philippe et al. 2006 or by the method described in WO
2006/010834,
[0336] a non-integrative vector as described in WO
2009/019612, more particularly at paragraph
[0337] of WO 2009/019612.
[0338] The application more particularly relates to a recombinant
vector, more particularly a recombinant expression vector, more
particularly a recombinant retroviral expression vector, more
particularly a lentiviral expression vector, which comprises:
at least one nucleic acid (DNA or RNA) of the application, more
particularly at least one RNA of the application (and regulatory
elements for the expression of said at least one nucleic acid or
RNA), and a defective integrase, more particularly an integrase,
which has been made defective by mutation(s), more particularly an
integrase, which has been made defective by mutation(s), wherein
said mutation(s) comprise(s) or consist(s) of one or more point
mutations affecting a basic region of its C-terminal region.
[0339] Said defective integrase does not allow (or prevents) the
integration of said at least one nucleic acid or of the cDNA
thereof into the genome of a host cell, more particularly into the
genome of a mammalian cell (more particularly into the genome of a
mammalian neuronal cell and/or of a mammalian muscular cell and/or
of a cell of the mammalian skeleton), more particularly into the
genome of a human cell (more particularly into the genome of a
human neuronal cell and/or of a human muscular cell and/or of a
cell of the human skeleton).
[0340] Said defective integrase may e.g., be the integrase of Human
Immunodeficiency Virus type 1 (HIV1), Human Immunodeficiency Virus
type 2 (HIV2), Simian Immunodeficiency Virus (SIV), Feline
Immunodeficiency Virus (FIV), Equine Infectious Anemia Virus
(EIAV), Bovine Immunodeficiency Virus (BIV), visna virus or Caprine
Arthritis Encephalitis Virus (CAEV), more particularly the
integrase of HIV1, which has been made defective by mutation(s),
more particularly an integrase of HIV1, which has been made
defective by mutation(s), wherein said mutation(s) comprise(s) or
consist(s) of one or more point mutations affecting a basic region
of its C-terminal region (cf. WO 2006/010834). More particularly,
said integrase may e.g., be the integrase of HIV1, which has been
made defective by replacement of the .sup.262RRK motif by AAH (cf.
Philippe et al. 2006).
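By way of illustration only, the replacement of a three-residue motif at a given position of a protein sequence (such as the replacement of the motif RRK at position 262 by AAH) can be sketched in Python as follows. The function name replace_motif and the toy sequence are assumptions made for this sketch; the toy sequence is a placeholder and is not the sequence of the HIV1 integrase.

```python
def replace_motif(protein: str, start: int, expected: str, replacement: str) -> str:
    """Replace the motif `expected`, found at 1-based position `start`, by `replacement`.

    Raises ValueError if the expected motif is not present at that position,
    or if the replacement would change the length of the sequence.
    """
    idx = start - 1  # convert the 1-based residue number to a 0-based index
    found = protein[idx: idx + len(expected)]
    if found != expected:
        raise ValueError(f"expected {expected!r} at position {start}, found {found!r}")
    if len(replacement) != len(expected):
        raise ValueError("replacement must have the same length as the motif")
    return protein[:idx] + replacement + protein[idx + len(expected):]


# Hypothetical placeholder sequence: 261 filler residues, then RRK at 262-264.
toy = "G" * 261 + "RRK" + "G" * 10
mutant = replace_motif(toy, 262, "RRK", "AAH")
```

Checking that the expected motif is actually present before substituting it guards against an off-by-one residue numbering, which is a common source of error when a numbering scheme is transferred from one reference sequence to another.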
[0341] Said recombinant vector, more particularly a recombinant
expression vector, more particularly a recombinant retroviral
expression vector, more particularly a lentiviral expression vector
advantageously is non-integrative, more particularly
non-integrative and non-replicative.
[0342] The application more particularly relates to a recombinant
retroviral expression vector, more particularly a lentiviral
expression vector, more particularly a HIV expression vector, more
particularly a HIV1 expression vector, which comprises at least one
nucleic acid or RNA of the application (and regulatory elements for
the expression of said at least one nucleic acid), wherein the
integrase of said retrovirus or lentivirus has been made defective,
more particularly which has been made defective by mutation(s),
more particularly which has been made defective by mutation(s),
wherein said mutation(s) comprise(s) or consist(s) of one or more
point mutations affecting a basic region of its C-terminal region
(cf. WO 2006/010834). Said retrovirus or lentivirus may e.g., be
HIV1, HIV2, SIV, FIV, EIAV, BIV, visna or CAEV, more particularly
HIV1.
[0343] The application more particularly relates to a recombinant
HIV1 expression vector, which comprises at least one nucleic acid
or RNA of the application (and regulatory elements for the
expression of said at least one nucleic acid), wherein the
integrase of said HIV1 has been made defective, more particularly
which has been made defective by mutation(s), more particularly
which has been made defective by mutation(s), wherein said
mutation(s) comprise(s) or consist(s) of the replacement of the
.sup.262RRK motif by AAH (cf. Philippe et al. 2006).
[0344] Said recombinant vector, more particularly said recombinant
expression vector, more particularly said recombinant retroviral
expression vector, more particularly said lentiviral expression
vector, more particularly said HIV1 vector, may further comprise a
recombinant genome, which is devoid of, or has been deleted from,
all the lentiviral encoding sequences, and which comprises, between
the lentiviral LTR 5' and 3' sequences, a lentiviral encapsidation
psi sequence, a RNA nuclear export element, a transgene comprising
said at least one nucleic acid, and optionally, a promoter and/or a
sequence favoring the nuclear import of RNA (cf. WO 99/55892).
[0345] Appropriate vectors comprise vectors, which are especially
adapted for the expression of the nucleic acid of the application
by, or in, a particular type of cells, tissue(s) or organ(s), for
example, vectors, which are especially adapted for the expression
of the inserted nucleic acid by neuronal cells, more particularly:
[0346] an expression lentivirus-derived vector, more particularly,
a non-replicative expression lentivirus-derived vector (e.g., as
described in WO 2013/068430, cf. pages 35-44 of WO 2013/068430 and
the examples of WO 2013/068430) or
[0347] a lentiviral vector
pseudotyped particle, more particularly a lentiviral vector, which
has been pseudotyped with the G protein of a rabies virus (e.g., as
described in WO 2013/068430, cf. pages 41-44 of WO 2013/068430 and
the examples of WO 2013/068430).
[0348] The application more particularly relates to a recombinant
vector, more particularly a recombinant expression vector, more
particularly a recombinant (expression) plasmid, which
comprises:
i. at least one nucleic acid of the application, more particularly
at least one RNA of the application,
ii. expression regulatory
elements of said at least one nucleic acid or RNA,
iii. a
cis-acting central initiation region (cPPT) and a cis-acting
termination region (CTS), both of lentiviral origin, and
iv.
regulatory signals of retroviral origin (more particularly, of
lentiviral origin) for transcription (more particularly, for
reverse transcription), expression and packaging.
[0349] Examples of the structure of such vectors (elements ii.,
iii. and iv.) are described in WO 2013/068430, more particularly
from page 35 line 25 to page 41 line 4.
[0350] Said recombinant vector, more particularly said recombinant
expression vector, more particularly said recombinant (expression)
plasmid advantageously is non-replicative. Non-replication may be
achieved by any means that the person of ordinary skill in the art
may find appropriate, e.g., by deletion and/or mutation(s) of viral
sequence(s) (e.g., of the gag and/or pol and/or env gene(s)) and/or
of cis-acting genetic elements needed for particle formation (cf.
WO 2013/068430, more particularly from page 40 line 26 to page 41
line 4).
[0351] The application also relates to a recombinant viral
particle, more particularly to a lentiviral vector pseudotyped
particle, comprising at least one nucleic acid vector of the
application.
[0352] The application also relates to a recombinant viral
particle, more particularly to a lentiviral vector pseudotyped
particle, comprising GAG structural proteins and a viral core made
of (a) POL proteins and (b) a lentiviral genome comprising said at
least one nucleic acid or RNA of the application, expression
regulatory elements of said at least one nucleic acid or RNA, a
cis-acting central initiation region (cPPT) and a cis-acting
termination region (CTS), both of lentiviral origin, and regulatory
signals of retroviral origin for transcription (more particularly
for reverse transcription), expression and packaging, wherein said
particle is pseudotyped with the G protein of a Vesicular
Stomatitis Indiana Virus (VSIV or VSV) or with the G protein of a
rabies virus (cf. above and WO 2013/068430, more particularly from
page 41 line 6 to page 44 line 28). Said rabies virus can e.g., be
the ERA strain (ATCC vr332) or the CVS strain (ATCC vr959). The
sequence of the G protein of the ERA strain is available under
accession number AF406693. The sequence of the G protein of the CVS
rabies virus strain is available under accession number AF406694.
Said recombinant viral particle, more particularly said lentiviral
vector pseudotyped particle, may advantageously have been made
defective, i.e., the integrase of lentiviral origin (which is coded
by the pol gene) is devoid of the capacity of integration of the
lentiviral genome into the genome of a host cell, more particularly
into the genome of a mammalian cell (more particularly into the
genome of a mammalian neuronal cell and/or of a mammalian muscular
cell and/or of a cell of the mammalian skeleton), more particularly
into the genome of a human cell (more particularly into the genome
of a human neuronal cell and/or of a human muscular cell and/or of
a cell of the human skeleton). Said integrase may e.g., comprise
mutation(s), which alter(s) or impede(s) its integrase activity.
Examples of such defective integrases and of such mutation(s) are
described in WO 2013/068430 from page 43 line 6 to page 44 line
28.
[0353] The application also relates to a set comprising a nucleic
acid of the application and a nucleic acid vector of the
application.
[0354] The application also relates to a set comprising a first
nucleic acid and a second nucleic acid, wherein only one, or each
one, of said first and second nucleic acids is a nucleic acid of
the application. Said first nucleic acid is different from said
second nucleic acid.
[0355] When only one of said first and second nucleic acids is a
nucleic acid of the application, the other of said first and second
nucleic acid is a nucleic acid, which is not of the application.
For example, the set comprises a first nucleic acid, which is a
nucleic acid of the application, and a second nucleic acid, which
is not of the application, wherein said first nucleic acid of the
application codes for a first DNA-binding polypeptide and said
second nucleic acid codes for a second DNA-binding polypeptide, and
wherein said first and second DNA-binding polypeptides are the
first and second DNA-binding polypeptides of a mixed polypeptide
set of the application as defined above (first DNA-binding
polypeptide, which is of the application, and second DNA-binding
polypeptide, which is not of the application; cf. above).
[0356] The application more particularly relates to a set wherein
each of said first and second nucleic acids is a nucleic acid of
the application.
[0357] The application also relates to a set comprising a first
nucleic acid vector and a second nucleic acid vector, wherein only
one, or each one, of said first and second nucleic acid vectors is
a nucleic acid vector of the application. Said first nucleic acid
vector is different from said second nucleic acid vector.
[0358] When only one of said first and second nucleic acid vectors
is a nucleic acid vector of the application, the other of said
first and second nucleic acid vectors is a nucleic acid vector,
which is not of the application. For example, the set comprises a
first nucleic acid vector, which comprises a nucleic acid of the
application, and a second nucleic acid vector, which comprises a
nucleic acid, which is not of the application, wherein said first
nucleic acid of the application codes for a first DNA-binding
polypeptide and said second nucleic acid codes for a second
DNA-binding polypeptide, and wherein said first and second
DNA-binding polypeptides are the first and second DNA-binding
polypeptides of a mixed polypeptide set of the application as
defined above (first DNA-binding polypeptide, which is of the
application, and second DNA-binding polypeptide, which is not of
the application; cf. above).
[0359] The application more particularly relates to a set, wherein
each of said first and second nucleic acid vectors is a nucleic
acid vector of the application.
[0360] Each of these sets can be herein referred to as the "nucleic
acid/vector set of the application". The term "set" is intended
in accordance with its ordinary meaning in the field. It notably
encompasses the meaning of "a plurality of", more particularly the
meaning of "a pair of". Said set or plurality can e.g., be in the
form of one composition or kit, or of at least two compositions or
of at least two kits.
[0361] Said one composition or kit comprises both said first and
second nucleic acids or nucleic acid vectors.
[0362] Said at least two compositions or kits are in the form of
separate compositions or kits, each comprising one of said first
and second nucleic acids or nucleic acid vectors (e.g., a first
composition or kit comprising said first nucleic acid or nucleic
acid vector and a second composition or kit comprising said second
nucleic acid or nucleic acid vector, wherein said first composition
or kit is distinct or separate from said second composition or
kit). Said at least two compositions or kits can be for
simultaneous, separate, distinct or sequential use, more
particularly for simultaneous or sequential use.
[0363] In said nucleic acid/vector set, the first and second
nucleic acids or vectors can e.g., be present as isolated nucleic
acids or vectors, as individual nucleic acids or vectors, or can be
contained within cell(s), e.g., host and/or genetically engineered
cell(s) as described below (the first nucleic acid or vector can be
contained within the same cell as said second nucleic acid or
vector, or in two distinct cells respectively).
[0364] The application relates to a composition or kit
comprising:
a first recombinant nucleic acid vector and a second recombinant
nucleic acid vector, wherein said first recombinant nucleic acid
vector is different from said second recombinant nucleic acid
vector and wherein said first recombinant nucleic acid vector and
said second recombinant nucleic acid vector respectively code for
the first DNA-binding polypeptide and for the second DNA-binding
polypeptide as defined above; and/or comprising a first lentiviral
vector pseudotyped particle and a second lentiviral vector
pseudotyped particle, wherein said first lentiviral vector
pseudotyped particle is different from said second lentiviral
vector pseudotyped particle and wherein said first lentiviral
vector pseudotyped particle and said second lentiviral vector
pseudotyped particle respectively code for the first DNA-binding
polypeptide and for the second DNA-binding polypeptide as defined
above.
[0365] More particularly, each of said first and second recombinant
nucleic acid vectors is a recombinant nucleic acid vector of the
application, and/or each of said first and second lentiviral vector
pseudotyped particles is a lentiviral vector pseudotyped particle
of the application.
[0366] In said nucleic acid/vector set, said first nucleic acid
and/or said second nucleic acid can be contained in/on a
nanoparticle or liposome as described below, and/or said first
nucleic acid vector and/or said second nucleic acid vector can be
contained in/on a nanoparticle or liposome as described below.
[0367] Said nucleic acid/vector set can be contained in a
composition suitable for nucleic acid transfection of a cell, more
particularly of a eukaryotic cell, more particularly of a mammalian
cell, a non-human mammalian cell (e.g., a rodent cell, such as a
mouse cell), a human cell, a yeast cell, a bacterial cell or a
plant cell, more particularly of a human cell or a yeast cell, more
particularly of a human cell.
[0368] The term "transfection" herein encompasses its broadest
general meaning in the field of genetic engineering. It notably
encompasses any process of deliberately introducing a nucleic acid
into a cell (said process can be virus-mediated or not
virus-mediated, said cell can be eukaryotic or not eukaryotic).
[0369] Said nucleic acid/vector set may further comprise at least
one cell, more particularly at least one eukaryotic cell, more
particularly at least one mammalian cell, at least one non-human
mammalian cell (e.g., a rodent cell, such as a mouse cell), at
least one human cell, at least one yeast cell, at least one
bacterial cell or at least one plant cell, more particularly at
least one human cell or at least one yeast cell, more particularly
at least one human cell.
[0370] The application also relates to a nanoparticle or to a
liposome, which comprises at least one of the polypeptides, sets,
nucleic acids, vectors and host cells of the application, more
particularly which comprises at least one of the polypeptides, sets
and host cells of the application. Said at least one polypeptide,
set, nucleic acid, vector or host cell of the application can be
contained in and/or on nanoparticles. Said at least one
polypeptide, set, nucleic acid, vector or host cell of the
application can be contained in and/or on a liposome, e.g., it can
be encapsulated inside a liposome or associated to a liposome
delivery system. Said liposome can e.g., be a cationic liposome or a
pegylated liposome. Said liposome can be loaded with nanoparticles.
The nanoparticle and/or liposome formulation of the polypeptide,
set, nucleic acid, vector or host cell of the application is
notably useful for improved crossing of the blood-brain barrier
and/or for protection against serum degradation.
[0371] The application also relates to a cell, more particularly a
eukaryotic cell, more particularly a mammalian cell, a non-human
mammalian cell, a human cell, a yeast cell, a bacterial cell or a
plant cell, more particularly a human cell or a yeast cell, more
particularly a human cell, which comprises at least one DNA-binding
polypeptide of the application and/or at least one polypeptide set
of the application and/or at least one nucleic acid of the
application and/or at least one nucleic acid vector of the
application and/or at least one nucleic acid/vector set of the
application and/or at least one liposome or nanoparticle of the
application.
[0372] Said cell can e.g., be a host cell and/or a recombinant cell
and/or a genetically engineered cell. The application also relates
to the in vitro use of said cell for the production or synthesis of
at least one DNA-binding polypeptide of the application and/or at
least one polypeptide set of the application and/or at least one
nucleic acid of the application and/or at least one nucleic acid
vector of the application and/or at least one nucleic acid/vector
set of the application and/or at least one liposome or nanoparticle
of the application.
[0373] The application also relates to an in vitro method for the
production of a product, which binds, or specifically binds, to a
(double-stranded) DNA nucleic acid comprising at least one DNA
tandem repeat, more particularly which cleaves, or specifically
cleaves, a (double-stranded) DNA nucleic acid comprising at least
one DNA tandem repeat, more particularly which fully or partially
deletes said at least one DNA tandem repeat, more particularly
which fully or partially deletes said at least one DNA tandem
repeat in a specific manner.
[0374] Said method typically comprises in vitro growing said cell
of the application on a culture medium, allowing it to produce said
at least one DNA-binding polypeptide of the application and/or at
least one nucleic acid of the application and/or at least one
nucleic acid vector of the application, and collecting said at
least one DNA-binding polypeptide of the application and/or at
least one nucleic acid of the application and/or at least one
nucleic acid vector of the application.
[0375] The application also relates to a method for producing (at
least one) DNA-binding polypeptide, which binds, or specifically
binds, to a DNA nucleic acid comprising at least one DNA tandem
repeat, wherein said method comprises producing or synthesizing a
DNA-binding polypeptide of the application.
[0376] The application also relates to a method for producing a
pair of DNA-binding polypeptides, which comprises producing or
synthesizing a first DNA-binding polypeptide and a second
DNA-binding polypeptide, wherein said first DNA-binding polypeptide
and said second DNA-binding polypeptide are as defined above for a
polypeptide set of the application.
[0377] The expression "synthesizing a polypeptide" encompasses
synthesizing a polypeptide by chemical synthesis (e.g., by solid
phase synthesis, or by liquid phase synthesis), as well as
synthesizing a polypeptide by recombinant expression. More
particularly, said expression encompasses the synthesis of a
polypeptide by recombinant expression, more particularly by
recombinant expression of a nucleic acid of the application and/or
of a nucleic acid vector of the application and/or from a host cell
of the application and/or from a composition comprising a first
nucleic acid and a second nucleic acid of the application (as
defined above). Said method may further comprise the collection of
the synthesized polypeptide, e.g., by purification and/or
isolation, for example by antibody capture and/or by HPLC.
[0378] The application also relates to a non-human animal (e.g., a
rodent, such as a mouse, or a pig, or a rabbit), which has been
engineered to contain or produce at least one DNA-binding
polypeptide of the application and/or at least one polypeptide set
of the application and/or at least one nucleic acid of the
application and/or at least one nucleic acid vector of the
application and/or at least one nucleic acid/vector set of the
application and/or at least one cell of the application.
[0379] The application also relates to the use of said non-human
animal for the production or synthesis of at least one DNA-binding
polypeptide of the application and/or at least one nucleic acid of
the application and/or at least one nucleic acid vector of the
application.
[0380] The application also relates to a method for the production
of a product, which binds, or specifically binds, to a
(double-stranded) DNA nucleic acid comprising at least one DNA
tandem repeat, more particularly which cleaves, or specifically
cleaves, a (double-stranded) DNA nucleic acid comprising at least
one DNA tandem repeat, more particularly which fully or partially
deletes said at least one DNA tandem repeat, more particularly
which fully or partially deletes said at least one DNA tandem
repeat in a specific manner.
[0381] Said method typically comprises breeding or keeping said
non-human animal, allowing it to produce at least one DNA-binding
polypeptide of the application and/or at least one nucleic acid of
the application and/or at least one nucleic acid vector of the
application, and collecting from said animal at least one
DNA-binding polypeptide of the application and/or at least one
nucleic acid of the application and/or at least one nucleic acid
vector of the application.
[0382] The application also relates to a composition, more
particularly to a pharmaceutical composition, medicament, drug or
kit, which comprises at least one product of the application, more
particularly at least one DNA-binding polypeptide of the
application and/or at least one polypeptide set of the application
and/or at least one nucleic acid of the application and/or at least
one nucleic acid vector of the application and/or at least one
nucleic acid/vector set of the application and/or at least one
liposome or nanoparticle of the application and/or at least one
cell of the application.
[0383] Said pharmaceutical composition, medicament, drug or kit may
further comprise at least one pharmaceutically acceptable vehicle
or carrier, more particularly a physiologically acceptable vehicle
or carrier, more particularly a vehicle or carrier, which is
adapted to the physiology of a mammal, e.g., a human or non-human
mammal. Said vehicle or carrier can be mixed with said at least one
product of the application.
[0384] Said vehicle or carrier can e.g., be or comprise one or
several elements selected from at least one diluent, at least one
excipient, at least one additive, at least one pH adjuster, at
least one pH buffering agent, at least one emulsifier agent, at
least one dispersing agent, at least one preservative, at least one
surfactant, at least one gelling agent, at least one buffering
agent, at least one stabilizing agent and at least one solubilising
agent.
[0385] Appropriate pharmaceutically acceptable vehicles and
formulations include all known pharmaceutically acceptable vehicles
and formulations, such as those described in "Remington: The
Science and Practice of Pharmacy", 20.sup.th edition, Mack
Publishing Co.;
[0386] and "Pharmaceutical Dosage Forms and Drug Delivery Systems",
Ansel, Popovich and Allen Jr., Lippincott Williams and Wilkins.
[0387] When said composition, pharmaceutical composition,
medicament, drug or kit is intended for administration to a subject
(e.g., a non-human mammal or a human) in need thereof, the nature
of the vehicle will in general depend on the particular mode of
administration being employed. For instance, parenteral
formulations usually comprise injectable fluids that include
pharmaceutically and physiologically acceptable fluids, including
water, physiological saline, balanced salt solutions, buffers,
aqueous dextrose, glycerol, ethanol, sesame oil, combinations
thereof, or the like as a vehicle. The medium may also contain
conventional pharmaceutical adjunct materials such as, for example,
pharmaceutically acceptable salts to adjust the osmotic pressure,
buffers, preservatives and the like.
[0388] In said composition, pharmaceutical composition, drug,
medicament or kit of the application, said at least one product of
the application can e.g., be formulated as, or contained in, a
liquid solution, a suspension, an emulsion or a capsule. It can be
formulated e.g., for immediate release, for deferred release or
for sustained release.
[0389] Advantageously, said composition, pharmaceutical
composition, drug, medicament or kit is stored or contained in a
sterile container and/or environment.
[0390] The application describes products, which are DNA-binding
polypeptides, nucleic acids, sets, vectors, liposomes,
nanoparticles, cells, non-human animals, compositions,
pharmaceutical compositions, medicaments, drugs, kits.
[0391] Each of these products is useful in the medical field, more
particularly in the field of the treatment and/or palliation and/or
prevention of a disease or disorder.
[0392] Said disease or disorder can e.g., be any disorder or
disease involving at least one DNA tandem repeat (as above
described), more particularly at least one (direct) DNA tandem
repeat in a DNA nucleic acid, more particularly at least one
(direct) DNA tandem repeat in a double-stranded DNA nucleic
acid.
[0393] Said disease or disorder can e.g., be any disorder or
disease involving at least one expanded or abnormally-expanded DNA
tandem repeat, more particularly at least one expanded or
abnormally-expanded DNA tandem repeat in a DNA nucleic acid, more
particularly at least one expanded or abnormally-expanded DNA
tandem repeat in a double-stranded DNA nucleic acid. The phrase
"expanded" or "abnormally-expanded" means that the number of repeat
units forming the DNA tandem repeat is above the normal average
number (for the DNA nucleic acid in consideration).
[0394] Said disease or disorder can e.g., be a neurological and/or
muscular and/or skeletal disorder or disease.
[0395] Said disease or disorder can e.g., be a neurological and/or
muscular and/or skeletal disorder or disease involving at least one
DNA tandem repeat as above-described.
[0396] Said at least one DNA tandem repeat may e.g., have a
non-linear secondary structure such as a hairpin, a triple helix or
a tetraplex.
[0397] Said disease or disorder can e.g., be a neurological and/or
muscular and/or skeletal disorder or disease involving at least one
DNA tandem repeat in a double-stranded DNA nucleic acid, more
particularly at least one DNA tandem repeat in a double-stranded
DNA nucleic acid, wherein said at least one DNA tandem repeat has a
non-linear secondary structure such as a hairpin, a triple helix or
a tetraplex.
[0398] Said at least one DNA tandem repeat can be contained in a
gene, more particularly a eukaryotic gene, more particularly a
non-mammalian eukaryotic gene, e.g., a yeast gene, more
particularly a mammalian gene, e.g., a non-human mammalian gene or
a human gene.
[0399] Advantageously, said at least one DNA tandem repeat is
contained in a chromosome, more particularly is contained in a gene, a
non-mammalian eukaryotic gene, a mammalian gene, a non-human
mammalian gene or a human gene, wherein said gene is contained in a
chromosome, more particularly a human chromosome.
[0400] Said at least one DNA tandem repeat can be contained in any
location of said gene, e.g., in a promoter and/or in the 5'UTR
and/or in at least one exon and/or in at least one intron and/or in
the 3'UTR of said gene.
[0401] Said disease or disorder can e.g., be a trinucleotide repeat
disease or disorder, a tetranucleotide repeat disease or disorder,
or a pentanucleotide repeat disease or disorder.
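As a non-limiting illustration, the classification of a repeat disease or disorder by the length of its DNA tandem repeat unit can be sketched in Python as follows. The names REPEAT_CLASS and repeat_class are assumptions made for this sketch. The units CTG, CCTG and ATTCT are given as illustrative inputs only.

```python
# Map the length of the repeat unit to the name of the repeat class.
REPEAT_CLASS = {3: "trinucleotide", 4: "tetranucleotide", 5: "pentanucleotide"}


def repeat_class(unit: str) -> str:
    """Return the repeat class of a DNA tandem repeat unit, based on its length."""
    unit = unit.upper()
    if not unit or set(unit) - set("ACGT"):
        raise ValueError(f"not a DNA repeat unit: {unit!r}")
    # Lengths outside the table are reported generically, e.g. "6-nucleotide".
    return REPEAT_CLASS.get(len(unit), f"{len(unit)}-nucleotide")


classes = [repeat_class(u) for u in ("CTG", "CCTG", "ATTCT")]
```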
[0402] Said disease or disorder can e.g., be any disease or
disorder selected from the group consisting of DM1, SCA8, SCA12,
HDL2, SBMA, HD, DRPLA, SCA1, SCA2, SCA3, SCA6, SCA7, SCA17, PSACH,
DM2, SCA10, SPD1, OPMD, CCD, HPE5, HFG syndrome, BPES, EIEE1,
FRAXA, FXTAS and FRAXE (cf. Table 6 above).
[0403] According to an aspect of the application, when said at
least one DNA nucleic acid is a double-stranded nucleic acid, at
least one of its two strands (either only one of them or both of
them) contains nucleotide(s) T in the DNA tandem repeat unit.
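For illustration purposes, the above condition (presence of nucleotide T in the repeat unit of at least one of the two strands) can be checked with the following Python sketch. The function names reverse_complement and strands_with_t are assumptions made for this sketch. The repeat unit of the second strand is taken to be the reverse complement of the repeat unit of the first strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strands_with_t(unit: str) -> dict:
    """Report, for each strand, whether its repeat unit contains nucleotide T."""
    top = unit.upper()
    bottom = reverse_complement(top)
    return {
        "given_strand": "T" in top,
        "complementary_strand": "T" in bottom,
        "at_least_one_strand": "T" in top or "T" in bottom,
    }


cag = strands_with_t("CAG")  # T is carried by the complementary strand (CTG)
cgg = strands_with_t("CGG")  # neither CGG nor its reverse complement CCG contains T
```

A unit such as CAG satisfies the condition through its complementary strand only, whereas a unit composed exclusively of C and G does not satisfy it on either strand.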
[0404] Said disease or disorder can e.g., be any disease or
disorder selected from the group consisting of DM1, SCA8, SCA12,
HDL2, SBMA, HD, DRPLA, SCA1, SCA2, SCA3, SCA6, SCA7, SCA17, PSACH,
DM2 and SCA10 (cf. Table 6 above).
[0405] More particularly, said disease or disorder is DM1.
[0406] Said diseases and disorders are described in Table 6 above.
Table 7 identifies the at least one DNA nucleic acid, which is
involved in each of said diseases or disorders, respectively. Table
8 identifies the nature of the DNA tandem repeat unit that is
contained in said at least one DNA nucleic acid. Table 8 also
provides the normal average range of DNA tandem repeat units that
are contained in said at least one DNA nucleic acid. A number of
DNA tandem repeat units above said normal average range is
generally considered to be an abnormal number of DNA tandem repeat
units, i.e., it is then generally considered that the at least one
DNA tandem repeat is an expanded or abnormally-expanded DNA tandem
repeat.
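The comparison of a number of DNA tandem repeat units to a normal range can be sketched in Python as follows, as a minimal example only. The function names longest_tandem_run and is_expanded, the toy sequence and the thresholds used are assumptions made for this sketch. The thresholds are placeholders and are not taken from Table 8.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the number of units in the longest uninterrupted run of `unit`."""
    if not unit:
        raise ValueError("the repeat unit must not be empty")
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def is_expanded(sequence: str, unit: str, normal_max: int) -> bool:
    """True if the longest run exceeds `normal_max`, the upper bound of the normal range."""
    return longest_tandem_run(sequence, unit) > normal_max


# Hypothetical toy sequence: a run of 12 CTG units and a separate run of 3 CTG units.
toy = "AAGG" + "CTG" * 12 + "TTCA" + "CTG" * 3
run = longest_tandem_run(toy, "CTG")
expanded_low_threshold = is_expanded(toy, "CTG", normal_max=10)   # placeholder threshold
expanded_high_threshold = is_expanded(toy, "CTG", normal_max=37)  # placeholder threshold
```

This sketch counts only uninterrupted runs of the repeat unit; a tandem repeat that contains interruptions would be reported as several shorter runs and would need a more tolerant counting method.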
[0407] The application notably relates to the use of at least one
DNA-binding polypeptide of the application, more particularly the
use of first and second DNA-binding polypeptides of the
application (as above defined), or the use of at least one nucleic
acid or vector of the application (as above defined), more
particularly the use of first and second nucleic acids or vectors
of the application (as above defined), wherein said use is in the
manufacture of a medicament for treating and/or palliating and/or
preventing a disease or disorder involving at least one DNA tandem
repeat, more particularly a disease or disorder as above
defined.
[0408] The application also relates to said at least one
DNA-binding polypeptide of the application, more particularly to
said first and second DNA-binding polypeptides of the application,
or to said at least one nucleic acid or vector of the application,
or to said first and second nucleic acids of the application, for
its/their use as a medicament.
[0409] The application also relates to said at least one
DNA-binding polypeptide of the application, more particularly to
said first and second DNA-binding polypeptides of the application,
or to said at least one nucleic acid or vector of the application,
or to said first and second nucleic acids of the application, for
its/their use in the treatment and/or palliation and/or prevention
of a disease or disorder involving at least one DNA tandem repeat,
more particularly a disease or disorder as above defined.
[0410] The application also relates to a method for producing a
drug or medicament that is useful in the treatment and/or
palliation and/or prevention of a disease or disorder involving at
least one DNA tandem repeat, more particularly a disease or
disorder as above defined. Said method comprises: [0411] producing
said at least one DNA-binding polypeptide of the application, more
particularly said first and second DNA-binding polypeptides of the
application, and/or producing said at least one nucleic acid or
vector of the application, more particularly said first and second
nucleic acids of the application, and/or producing a composition or
kit of the application, [0412] formulating said polypeptide(s)
and/or nucleic acid(s) or vector(s) and/or composition or kit as a
drug or medicament (more particularly, mixing said polypeptide(s)
and/or nucleic acid(s) or vector(s) with at least one vehicle or
carrier e.g., at least one vehicle or carrier as above
defined).
[0413] A product of the application can induce a double-strand
break in a double-stranded DNA nucleic acid. More particularly, a
product of the application can induce a double-strand break
specifically in a double-stranded DNA nucleic acid.
[0414] A product of the application can act by cleaving said at
least one DNA tandem repeat, more particularly by reducing the
number of units contained in said at least one DNA tandem repeat,
more particularly by fully or partially deleting said at least one
DNA tandem repeat.
[0415] According to an advantageous aspect of the application, a
product of the application allows a deletion or reduction of said
at least one DNA tandem repeat down to a non-abnormal number of
repeat units, i.e., down to below the abnormal range (cf. e.g.,
Table 8 for the average normal range of DNA tandem repeat units
that is generally observed in illustrative diseases or
disorders).
[0416] The example below illustrates that the efficiency of a
product of the application in achieving said deletion or reduction
is very high (near 100% in heterozygous and homozygous yeast
cells).
[0417] The example below illustrates that a product of the
application can act without inducing an increase in the mutation
rate and without inducing any large genomic rearrangement, such as
aneuploidy, segmental duplication or translocation.
[0418] Advantageously, a means of the application is less toxic
than the prior art means, more particularly than the Zinc Finger
prior art means.
[0419] Advantageously, a means of the application does not induce
any length alteration or mutation at off-target locations, e.g., in
non-pathological genes, which comprise the same repeat unit as the
pathological gene. This is notably the case when the DNA target site
of said first DNA-binding polypeptide is a non-overlapping DNA target
site (as defined above) and when the DNA target site of said second
DNA-binding polypeptide is an overlapping DNA target site (as defined
above). Please see FIG. 1B for an illustration of such a
configuration.
[0420] It is believed that it is the first demonstration that the
shortening of a DNA tandem repeat to lengths below pathological
thresholds in humans can be induced with 100% efficacy and a high
specificity.
[0421] Reduction in size of an abnormally-expanded tandem repeat
unit provides a genetic treatment and/or palliation and/or
prevention of the disease or disorder. Indeed it has been
demonstrated that, when a large trinucleotide repeat contraction of
an expanded myotonic dystrophy allele occurred during transmission
from father to daughter, complete clinical examination of the
daughter showed no sign of myotonic dystrophy symptoms (O'Hoy et
al. 1993).
[0422] Hence, a product of the application actually provides a
means for gene therapy and/or palliation and/or prevention of said
diseases or disorders.
[0423] The application also relates to a method of treatment of a
subject in need thereof, which comprises administering at least one
product of the application to said subject. Said subject can e.g.,
be a mammal (e.g., a non-human mammal or a human), more
particularly a human.
[0424] Said at least one product can more particularly be at least
one DNA-binding polypeptide of the application, at least one
polypeptide set (composition or kit) of the application, at least
one nucleic acid, at least one set of nucleic acids, at least one
liposome, at least one nanoparticle, at least one vector or at
least one cell of the application.
[0425] Said at least one product can more particularly be at least
one pharmaceutical composition, medicament or drug of the
application.
[0426] The application also relates to the (in vitro) use of at
least one product of the application in the selection of a product
suitable for cleavage and/or reduction in size, and/or full or
partial deletion of at least one (expanded or abnormally-expanded)
DNA tandem repeat, more particularly at least one (expanded or
abnormally-expanded) DNA tandem repeat in a DNA nucleic acid, more
particularly at least one (expanded or abnormally-expanded) DNA
tandem repeat in a double-stranded DNA nucleic acid.
[0427] The application also relates to a method for identifying a
product useful in the treatment and/or palliation and/or prevention
of a disease or disorder as above defined, which comprises: [0428]
in vitro growing cells, which comprise at least one DNA nucleic acid,
wherein said at least one DNA nucleic acid comprises said at least
one DNA tandem repeat to be cleaved and/or reduced in size and/or
fully or partially deleted, wherein said cells are the cells of a
cell line (e.g., a cell line, which is considered by the person of
average skill in the art as a model of one or several of said
diseases or disorders, e.g., one or several of the diseases or
disorders of Table 6 above), or wherein said cells are cells, which
have been collected from a mammal, more particularly from a
non-human mammal or from a human, more particularly from a human in
need of said treatment and/or palliation and/or prevention (e.g., a
human affected by at least one disease or disorder listed in Table
6 above), [0429] contacting at least one cell of said cells with at
least one product of the application, [0430] contacting at least
one other of said cells with at least one other product of the
application, and [0431] selecting the at least one product, which
achieves said cleavage and/or reduction in size and/or full or
partial deletion with the highest efficiency and/or with the lowest
undesired side effects.
[0432] Selecting the at least one product, which achieves said
cleavage and/or reduction in size and/or full or partial deletion
with the lowest undesired side effects notably encompasses
selecting the at least one product, which achieves said cleavage
and/or reduction in size and/or full or partial deletion, and which
induces the lowest level of one or several side effect(s) selected
from the group consisting of induced toxicity, rate of induced
mutation, induced rate of genomic rearrangement, induced rate of
aneuploidy, induced rate of segmental duplication, induced rate of
translocation, rate of off-target cleavage, e.g., in
non-pathological genes, which comprise the same repeat unit as the
pathological gene.
[0433] The application also relates to a method for producing a
product useful for fully or partially deleting a DNA tandem repeat
that is contained in a double stranded DNA nucleic acid, more
particularly for fully or partially deleting a DNA tandem repeat
that is contained in a double stranded DNA nucleic acid and forms a
non-linear secondary structure in said double stranded DNA nucleic
acid (more particularly a secondary structure, which is a hairpin,
a triple helix or a tetraplex structure). Said double-stranded DNA
nucleic acid is as above defined and can e.g., be contained in a
chromosome, more particularly a gene that is contained in a
chromosome, more particularly in a human chromosome, more
particularly a human gene that is contained in a chromosome. Said
full or partial deletion is a deletion or excision of all or
several of the repeated units of said DNA tandem repeat, more
particularly a specific deletion or excision of all or several of
the repeated units of said DNA tandem repeat. Said method comprises
producing a pair of DNA-binding polypeptides of the application
(i.e., a first DNA-binding polypeptide and a second DNA-binding
polypeptide as defined above for a polypeptide set, or mixed set,
of the application), e.g., according to the method of the
application. Said pair of DNA-binding polypeptides is a product
useful for said full or partial DNA tandem repeat deletion.
[0434] At least one of said first and second DNA-binding
polypeptides is a DNA-binding polypeptide of the application, more
particularly a DNA-binding polypeptide of the application, the DNA
target of which is a non-overlapping or an overlapping DNA target
site as defined above, more particularly a DNA-binding polypeptide
of the application, the DNA target of which is a non-overlapping
DNA target site as defined above. Advantageously, said first
DNA-binding polypeptide is a DNA-binding polypeptide of the
application, the DNA target of which is a non-overlapping DNA
target site as defined above, and said second DNA-binding
polypeptide is a DNA-binding polypeptide of the application, the
DNA target of which is an overlapping DNA target site as defined
above.
[0435] The application also relates to a method for inducing (or
generating), more particularly in vitro inducing (or generating), a
double-strand DNA break (into a double-stranded DNA nucleic
acid).
[0436] Said method comprises placing a double-stranded DNA into
contact with a first DNA-binding polypeptide and with a second
DNA-binding polypeptide (said first DNA-binding polypeptide and
said second DNA-binding polypeptide are as defined above for a
polypeptide set, or mixed set, of the application), or with nucleic
acid(s) coding for said first and second DNA-binding polypeptides,
or with a composition or kit, which comprises said first
DNA-binding polypeptide and said second DNA-binding polypeptide
and/or which comprises nucleic acid(s) coding for said first and
second DNA-binding polypeptides (cf. above).
[0437] At least one of said first and second DNA-binding
polypeptides is a DNA-binding polypeptide of the application, more
particularly a DNA-binding polypeptide of the application, the DNA
target of which is a non-overlapping or an overlapping DNA target
site as defined above, more particularly a DNA-binding polypeptide
of the application, the DNA target of which is a non-overlapping
DNA target site as defined above. Advantageously, said first
DNA-binding polypeptide is a DNA-binding polypeptide of the
application, the DNA target of which is a non-overlapping DNA
target site as defined above, and said second DNA-binding
polypeptide is a DNA-binding polypeptide of the application, the
DNA target of which is an overlapping DNA target site as defined
above.
[0438] The application also relates to a method for fully or
partially deleting, more particularly for in vitro fully or
partially deleting, a DNA tandem repeat that is contained in a
double stranded DNA nucleic acid (or a DNA tandem repeat in a
double stranded DNA nucleic acid, which is contained in a
chromosomal DNA, more particularly in a human chromosomal DNA).
[0439] Said method comprises placing a double-stranded DNA into
contact with a first DNA-binding polypeptide and with a second
DNA-binding polypeptide (said first DNA-binding polypeptide and
said second DNA-binding polypeptide are as defined above for a
polypeptide set, or mixed set, of the application), or with nucleic
acid(s) coding for said first and second DNA-binding polypeptides,
or with a composition or kit, which comprises said first
DNA-binding polypeptide and said second DNA-binding polypeptide
and/or which comprises nucleic acid(s) coding for said first and
second DNA-binding polypeptides (cf. above).
[0440] At least one of said first and second DNA-binding
polypeptides is a DNA-binding polypeptide of the application, more
particularly a DNA-binding polypeptide of the application, the DNA
target of which is a non-overlapping or an overlapping DNA target
site as defined above, more particularly a DNA-binding polypeptide
of the application, the DNA target of which is a non-overlapping
DNA target site as defined above. Advantageously, said first
DNA-binding polypeptide is a DNA-binding polypeptide of the
application, the DNA target of which is a non-overlapping DNA
target site as defined above, and said second DNA-binding
polypeptide is a DNA-binding polypeptide of the application, the
DNA target of which is an overlapping DNA target site as defined
above.
[0441] Said first and second DNA-binding polypeptides produce or
generate a double-strand DNA break in said double-stranded DNA
nucleic acid. Please see FIGS. 1A, 1B and 3B for illustrations of
double strand DNA breaks. Said double-stranded DNA nucleic acid is
as above defined and can e.g., be contained in a chromosome, more
particularly a gene that is contained in a chromosome, more
particularly a human gene that is contained in a chromosome. Said
double-stranded DNA nucleic acid can be an isolated or purified
DNA, or can be contained in a cell. When said double-stranded DNA
nucleic acid is contained in a cell, said method may further
comprise allowing said first and second DNA-binding polypeptides to
contact or reach said double-stranded DNA.
[0442] The application also relates to an (in vitro) method for
fully or partially deleting a DNA tandem repeat that is contained
in a double stranded DNA nucleic acid, more particularly in a
chromosomal DNA, more particularly in a human chromosomal DNA. Said
method comprises: [0443] (in vitro) contacting a cell containing
said double-stranded DNA nucleic acid, more particularly said
chromosomal DNA, with at least one DNA-binding polypeptide of the
application and/or [0444] (in vitro) contacting and/or (in vitro)
transfecting said cell with at least one nucleic acid of the
application. More particularly, said method comprises: [0445]
contacting said cell with a first DNA-binding polypeptide of the
application and with a second DNA-binding polypeptide of the
application (said first and second DNA-binding polypeptides being
as above defined) and/or [0446] contacting and/or transfecting said
cell with a nucleic acid or with nucleic acids coding for said
first and second DNA-binding polypeptides, more particularly with a
first nucleic acid of the application and with a second nucleic
acid (as above defined).
[0447] The term "transfecting" is as defined above: it is
intended with its broadest general meaning in the field of genetic
engineering. It notably encompasses any process of deliberately
introducing a nucleic acid into a cell (said process can be
virus-mediated or not virus-mediated, said cell can be eukaryotic
or not eukaryotic).
[0448] Said cell can be an isolated or purified cell, or can be
contained in an organ or tissue, e.g., an organ or tissue, which
has been collected from a subject, a patient, a mammal, a non-human
mammal, a human. Said cell, organ or tissue can be a mammal cell,
organ or tissue, for example a human cell, organ or tissue, or a
non-human mammal cell, organ or tissue, such as rodent cell, organ
or tissue, a rat cell, organ or tissue, a mouse cell, organ or
tissue, a rabbit cell, organ or tissue, a pig cell, organ or
tissue. Whether it is human or non-human, said cell can e.g., be a
fibroblast cell, a neuronal cell, a skeletal muscle cell, a heart
cell, a skin cell, a kidney cell. Whether it is human or non-human,
said organ or tissue can e.g., be a skeletal muscle or a tissue or
sample thereof, a heart or a tissue or sample thereof, skin or a
tissue or sample thereof, kidney or a tissue or sample thereof.
[0449] In the application, unless specified otherwise or unless a
context dictates otherwise, all the terms have their ordinary
meaning in the relevant field(s).
[0450] The term "comprising", which is synonymous with "including"
or "containing", is open-ended, and does not exclude additional,
un-recited element(s), ingredient(s) or method step(s), whereas the
term "consisting of" is a closed term, which excludes any
additional element, step, or ingredient which is not explicitly
recited.
[0451] The term "essentially consisting of" is a partially open
term, which does not exclude additional, un-recited element(s),
step(s), or ingredient(s), as long as these additional element(s),
step(s) or ingredient(s) do not materially affect the basic and
novel properties of the application.
[0452] Accordingly, the term "comprising" (or "comprise(s)") hence
includes the term "consisting of" ("consist(s) of"), as well as the
term "essentially consisting of" ("essentially consist(s) of").
[0453] In an attempt to help the reader of the present application,
the description has been separated into various paragraphs and/or
sections and/or embodiments and/or aspects. These separations
should not be considered as disconnecting the substance of a
paragraph and/or section and/or embodiment and/or aspect from the
substance of another (other) paragraph(s) and/or section(s) and/or
embodiment(s) and/or aspect(s). To the contrary, the present
application encompasses all the combinations of the various
sections, paragraphs, embodiments and aspects that can be
contemplated by the person of average skill in the art.
[0454] Each of the relevant disclosures of all references cited
herein is specifically incorporated by reference. The following
examples are offered by way of illustration, and not by way of
limitation.
EXAMPLES
Example 1
[0455] Trinucleotide repeat expansions are responsible for at least
two dozen severe neurological or developmental disorders in
humans. A double-strand break between two short CAG/CTG
trinucleotide repeats was formerly shown to induce a high frequency
of repeat contractions in yeast cells (Richard et al. 1999). We
conceived that specific endonucleases called TALENs (described in
Cermak et al. 2011) could provide us with a new and modular tool
to induce a double-strand break within a repeat array.
[0456] Here we show, using a dedicated genetic selection screen,
that TALEN induction of a double-strand break into a CAG/CTG
trinucleotide repeat in heterozygous diploid cells results in gene
conversion of the repeat tract with near 100% efficacy, de facto
deleting the repeat tract. Induction of the same TALEN in
homozygous diploid cells leads to contractions of both repeat
tracts to a final length of 3-13 triplets, with 100% efficacy.
[0457] High throughput sequencing of yeast colonies, before and
after TALEN induction, shows that the TALEN does not increase
mutation rate to a level detectable in our experiments.
[0458] No other CAG/CTG triplet repeat of the yeast genome, besides
the one that was targeted, showed any length alteration or
mutation.
[0459] No large genomic rearrangement such as aneuploidy, segmental
duplication or translocation was detected.
[0460] It is believed that it is the first demonstration that
induction of a dedicated TALEN in a eukaryotic diploid nucleus
leads to shortening of a specific tandem repeat tract to lengths
below pathological thresholds in humans, with 100% efficacy and a
high specificity, effectively paving the way to gene therapy of
diseases or disorders linked to tandem repeat expansions.
[0461] In the present example, a TALEN designed to recognize and
cut a CAG/CTG trinucleotide repeat was assayed in a dedicated yeast
experimental system. The assay relies on a modified suppressor tRNA
gene (SUP4) in which the natural intron was replaced by either a
short spacer sequence (18 bp, hereafter called SUP4-opal) or a
CAG/CTG trinucleotide repeat (125-180 bp, depending on repeat
length, hereafter called sup4-(CAG)). The SUP4-opal allele is
functional and suppresses an ade2-opal non-sense mutation that
accumulates a red pigment into yeast cells, whereas the sup4-(CAG)
is not functional (Richard et al. 1999, Richard et al. 2000).
Diploid yeast cells carrying homozygous ade2-opal mutations are red
if only one copy of SUP4-opal is present, but they revert to white
if two copies are present (FIG. 1A). Haploid cells of opposite
mating types containing either SUP4-opal or sup4-(CAG), were
transformed with one of the two TALEN arms. As a control, a TALEN
arm modified to bind a recognition site split in two halves
separated by 49 bp, was also transformed in one of the two haploid
strains. The left arm of this split-TALEN should not be able to
bind its cognate site and therefore no double-strand break should
be induced (FIG. 1B). TALEN arms are carried by multicopy plasmids
(2 microns) and their expression is under the control of the
inducible GAL1-10 promoter (Giniger et al. 1985). Cells were
simultaneously plated on glucose and galactose media and colonies
were scored after 3-5 days of growth. Yeast survival to the TALEN
induction was 81.4% ± 7.2%, slightly less than survival to the
split-TALEN induction (96.4%, FIG. 2A). White colonies were scored
and represent a majority of cells on both media, even though they
are more frequent on galactose (82.5% of white colonies) as
compared to glucose (66.7%). This suggests that even in repressing
conditions (glucose), the GAL1-10 promoter shows some level of
leakiness which, combined with multicopy plasmids, is apparently
sufficient to induce TALEN expression. In support of this
observation, we noticed that when crossing two haploid strains
containing a stable trinucleotide repeat and one of the two TALEN
arms, none of the diploids obtained contained a repeat longer than
30 triplets, strongly suggesting that even in repressing
conditions, leaky expression of both TALEN arms occurs at a level
high enough to induce repeat contractions when both plasmids are in
the same diploid cell (cf. FIG. 4).
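By way of illustration only, the colour read-out of the assay described above can be sketched as a small function. The function name and the genotype labels are hypothetical. The zero-copy case is not stated in this paragraph: it is assumed to give red colonies, because the ade2-opal mutation is then not suppressed at all.

```python
def colony_colour(sup4_opal_copies: int) -> str:
    """Predicted colour of an ade2-opal/ade2-opal diploid colony.

    One functional SUP4-opal copy leaves the colony red; two copies
    turn it white. The zero-copy case (red) is an assumption.
    """
    if sup4_opal_copies not in (0, 1, 2):
        raise ValueError("a diploid carries 0, 1 or 2 SUP4-opal copies")
    return "white" if sup4_opal_copies == 2 else "red"


# Hypothetical genotype labels mapped to their number of functional copies.
genotypes = {
    "SUP4-opal/sup4-(CAG)": 1,
    "SUP4-opal/SUP4-opal": 2,
    "sup4-(CAG)/sup4-(CAG)": 0,
}

for name, copies in genotypes.items():
    print(f"{name}: {colony_colour(copies)}")
```

In this sketch, a white colony is simply the signature of a diploid that has acquired a second functional SUP4-opal copy.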
[0462] DNA originating from red and white colonies was subsequently
analyzed by Southern blotting. Forty-nine out of 52 red colonies
contain the two alleles, only three colonies showed the complete
deletion of the sup4-(CAG) allele (FIG. 2B). Conversely, 119 out of
120 white colonies only contain the SUP4-opal allele, whose signal
intensity was twice the intensity detected in red colonies,
suggesting that it corresponds to a near-complete deletion of the
sup4-(CAG) allele. We took advantage of a restriction site
polymorphism between SUP4-opal and sup4-(CAG) alleles, to
discriminate between a perfect homozygotization and a large
contraction of the sup4-(CAG) allele. DNA extracted from red or
white diploid survivors was amplified and digested with enzymes
recognizing one of the two alleles. In all ten white survivors
analyzed, restrictions showed the presence of only the SUP4-opal
allele (FIG. 2C).
[0463] Sequencing the same PCR products amplified from white
diploid survivors confirmed that only one sequence was present, and
not a mix of two different sequences, as would be expected for a
heterozygous SUP4/sup4 locus. These experiments proved that gene
conversion of the sup4-(CAG) allele by the SUP4-opal allele was
more than 99% efficient following TALEN expression. Comparatively,
there was no difference between glucose and galactose and no gene
conversion was detected when inducing the split-TALEN (FIG.
2B).
[0464] In a second set of experiments, we built a diploid strain
containing two sup4-(CAG) alleles of different lengths. In such a
strain, it is not possible to screen for white colonies, since both
alleles are deficient in suppressing ade2-opal mutation. In this
strain, survival to galactose induction dropped to 37.1% ± 18%, a
figure 2.2-fold lower than survival of the SUP4-opal/sup4-(CAG)
heterozygote (FIG. 2A). This shows that cutting both chromosomes
instead of one decreases viability by about a two-fold factor.
Molecular analysis showed that ca. 5% of colonies on glucose (2 out
of 37) showed a small expansion, whereas 59% (22 out of 37) of
colonies exhibited a contracted or deleted allele (FIG. 2D),
suggesting again that some TALEN induction occurs in repressing
conditions.
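The arithmetic behind the figures quoted in this paragraph can be checked with a short sketch. All numbers are taken from the text; the variable names are illustrative only.

```python
# Mean survival on galactose (percent), as quoted in the text.
survival_heterozygote = 81.4  # SUP4-opal/sup4-(CAG) diploid
survival_homozygote = 37.1    # diploid carrying two sup4-(CAG) alleles

# Fold decrease in survival when both chromosomes can be cut.
fold_decrease = survival_heterozygote / survival_homozygote

# Colonies analyzed on glucose (repressing conditions).
glucose_colonies = 37
expanded = 2      # colonies showing a small expansion
contracted = 22   # colonies showing a contracted or deleted allele

pct_expanded = 100 * expanded / glucose_colonies
pct_contracted = 100 * contracted / glucose_colonies

print(f"fold decrease in survival: {fold_decrease:.1f}")
print(f"expanded on glucose: {pct_expanded:.1f}%")
print(f"contracted or deleted on glucose: {pct_contracted:.1f}%")
```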
[0465] In galactose, 100% of the 153 colonies analyzed showed one
single band corresponding in size to the near-complete deletion of
the repeat tract. However, Southern blot resolution was not
sufficient to determine if both alleles harbored repeats of the
exact same length. DNA extracted from diploid survivors was
therefore amplified and sequenced. In 23 out of 60 sequenced
survivors (38%), only one sequence was present, as shown by good
quality, evenly spaced peaks (FIG. 3A). In 37 out of 60 survivors
(62%), a mix of two DNA sequences was read after the repeat tract,
indicating that the two alleles carry repeat tracts of different
lengths. Using this approach, only the shortest of the two repeat
tract lengths could be determined, and was found to range from
three to 13 triplets (with one exception: one sequence of 20
triplets). Given the size of both TALE recognition sites,
we determined that the minimal spacing between the two TALE
DNA-binding domains necessary to obtain active dimerization of the
Fok I nuclease and subsequent DSB formation was 18 bp (FIG.
1B).
[0466] Homozygous survivors may result from iterative coordinated
or uncoordinated breaks on both chromosomes, one (or two) allele(s)
being cut and repaired by intra-molecular mechanism, while the
other allele is repaired by gene conversion using the shortest one
as a template (FIG. 3B). Heterozygous survivors may result as
before, from iterative coordinated or uncoordinated breaks, that
will not be repaired by gene conversion and will therefore lead to
repeat tracts of different lengths. This may be due to the presence
of CAG repeats at DSB ends, which may impede one or more steps of
homologous recombination, including correct processing of the
break, subsequent formation of the Rad51 nucleofilament, or strand
invasion of the homologous template (which also contains CAG
repeats). In support of this hypothesis, distribution of repeat
tract lengths among heterozygous and homozygous survivors shows
that homozygous tract lengths are shorter on the average (mean=7
triplets) than heterozygous tract lengths (mean=9 triplets), this
difference being very significant (Wilcoxon test, p-value=0.0021,
FIG. 3A). This suggests that gene conversion between repeat tracts
may be hindered when tract lengths are too long, probably
inhibiting an early step in the recombination process. In these
cases, intramolecular repair is favored, giving rise to longer
repeat tracts of unequal lengths. However, we cannot totally
exclude that heterozygous survivors result from slippage occurring
during DNA synthesis associated with gene conversion. When
competition is possible between intra- and intermolecular repair
mechanisms, intramolecular events might be favored, even though
homologous recombination is highly efficient in yeast.
[0467] In order to determine TALEN specificity, particularly if an
increase in off-site mutations was associated with its expression,
we completely re-sequenced eight colonies growing on glucose plates
and seven colonies growing on galactose plates. Paired-end ILLUMINA
reads were generated and mapped to the S288C reference genome for
each colony (cf. Table 2). After removal of duplicates, coverage of
unique sequences was homogeneous in all 15 clones sequenced,
showing no aneuploidy and no segmental duplication. Among eight
glucose colonies, eight unique heterozygous SNPs were detected,
whereas among seven galactose colonies four unique heterozygous
SNPs were detected (FIG. 3C). These numbers are not significantly
different from each other and are in good agreement with
predictions. Lynch et al. 2008 determined that the average base
substitution rate per nucleotide site was 3.3 × 10⁻¹⁰ per
cell division, in S. cerevisiae. Given that glucose and galactose
colonies underwent approximately only 30 cell divisions before DNA
was extracted and sequenced, it was expected that most of the
colonies did not contain any base substitution. Nine colonies out
of 15 did not contain any SNP, whereas the remaining colonies contained
between one (three colonies) and four SNPs (one colony). Actually,
the number of colonies without any SNP was higher for clones
growing in galactose than for clones growing in glucose.
Altogether, five transitions for seven transversions were found
(ratio: 0.71), a proportion slightly higher than expected for
transitions (expected ratio: 0.61), but figures are small.
Insertions and deletions (Indels) of one base pair in
non-monotonous DNA are expected to be ten times less frequent than
base substitutions. Indeed, we only found one deletion of a GC
dinucleotide in an intergenic region. However, six indels were
found in monotonous poly-A/T stretches, but more importantly no
mutation was detected in any of the naturally occurring (at least
five triplet long) 88 CAG/CTG trinucleotide repeats of the S288C
genome. All indels and five out of twelve SNPs fall within
intergenic regions. Out of seven remaining SNPs in coding regions,
two are synonymous (third codon base) whereas five are
non-synonymous and encode point mutations in five different genes
(cf. Tables 3 and 4 below).
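The expectation that most colonies carry no base substitution can be illustrated with a rough order-of-magnitude sketch. The substitution rate and the number of cell divisions are those quoted above. The haploid genome size of about 12.1 Mb and the use of a Poisson model are assumptions made for this sketch only; they are not stated in the text.

```python
import math

RATE_PER_SITE_PER_DIVISION = 3.3e-10  # Lynch et al. 2008, as quoted above
CELL_DIVISIONS = 30                   # approximate figure quoted above
HAPLOID_GENOME_BP = 12.1e6            # assumption: approximate S288C genome size
PLOIDY = 2                            # the sequenced colonies are diploid

# Expected number of new base substitutions per diploid genome.
expected_substitutions = (
    RATE_PER_SITE_PER_DIVISION * HAPLOID_GENOME_BP * PLOIDY * CELL_DIVISIONS
)

# Poisson probability that a colony carries no new substitution at all.
p_no_substitution = math.exp(-expected_substitutions)

print(f"expected substitutions per colony: {expected_substitutions:.2f}")
print(f"probability of zero substitutions: {p_no_substitution:.2f}")
```

Under these assumptions the expected number of new substitutions per colony is well below one, so a majority of colonies would be expected to carry none.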
TABLE 3: summary of mutations detected in the 15 sequenced colonies; base substitutions
Chromosome | Position (1) | Mutation | Location  | Codon      | Amino acid
I          | 175371       | T -> C   | Intergene | --         | --
III        | 300201       | C -> A   | Intergene | --         | --
IV         | 628439       | C -> A   | RLI1      | GTG -> TTG | Val -> Leu
IV         | 1298899      | G -> T   | SYF1      | GTT -> TTT | Val -> Phe
X          | 333003       | A -> T   | ZAP1      | ACT -> ACA | Synonymous
X          | 626414       | T -> C   | ECM27     | TTT -> CTT | Phe -> Leu
XI         | 142750       | C -> T   | PIR1      | CCG -> CCA | Synonymous
XI         | 315846       | T -> G   | Intergene | --         | --
XI         | 609033       | A -> G   | PXL1      | CAG -> CGG | Gln -> Arg
XII        | 823062       | A -> C   | Intergene | --         | --
XIII       | 330662       | C -> G   | Intergene | --         | --
XV         | 1075334      | A -> G   | YOR389w   | AAC -> AGC | Asn -> Ser
TABLE 4: summary of mutations detected in the 15 sequenced colonies; insertions/deletions
Chromosome | Position (1) | Mutation | Sequence                  | Location
I          | 6737         | +A       | (A)19 SEQ ID NO: 17       | Intergene
I          | 101282       | +A       | (A)24 SEQ ID NO: 18       | Intergene
II         | 809788       | -T       | (T)19 SEQ ID NO: 19       | Intergene
VI         | 106271       | +TT      | (T)13 SEQ ID NO: 20       | Intergene
VII        | 95081        | -GC      | Non monotonous            | Intergene
VII        | 413969       | -GA      | (A)2G(A)12 SEQ ID NO: 21  | Intergene
XIII       | 918118       | +T       | (T)19 SEQ ID NO: 19       | Intergene
(1) mutation position according to GENBANK NC_001133 to
NC_001148, PLN 6 DEC. 2008 yeast genome assembly.
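The transition/transversion count quoted in the text can be reproduced from the twelve base substitutions listed in Table 3. The following is a minimal sketch; the helper name `is_transition` is illustrative, and the list simply transcribes the Mutation column of Table 3 in order.

```python
PURINES = {"A", "G"}


def is_transition(ref: str, alt: str) -> bool:
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    return (ref in PURINES) == (alt in PURINES)


# Base substitutions of Table 3, in table order, as (original, mutated) bases.
substitutions = [
    ("T", "C"), ("C", "A"), ("C", "A"), ("G", "T"),
    ("A", "T"), ("T", "C"), ("C", "T"), ("T", "G"),
    ("A", "G"), ("A", "C"), ("C", "G"), ("A", "G"),
]

transitions = sum(is_transition(ref, alt) for ref, alt in substitutions)
transversions = len(substitutions) - transitions
ratio = transitions / transversions

print(f"transitions: {transitions}, transversions: {transversions}")
print(f"transition/transversion ratio: {ratio:.2f}")
```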
[0468] We concluded that expression of a TALEN targeted to a
specific CAG/CTG trinucleotide repeat has no effect on other
triplet repeats and has no effect on the overall mutation rate of
the yeast genome. Since deep sequencing cannot reveal reciprocal
translocations that could be induced by the TALEN, pulsed-field gel
electrophoresis (PFGE) was run as a last control experiment on the
heterozygous SUP4-opal/sup4-opal::CAG strain. DNA from two colonies
grown on glucose and 20 colonies grown on galactose was prepared in
agarose plugs and separated by PFGE. All karyotypes were normal,
showing no evidence of aneuploidies, large segmental duplications
or translocations (FIG. 3D).
[0469] TALEN expression leads to trinucleotide repeat contractions
with a 100% efficacy, giving rise to survivors containing
homozygous or heterozygous shorter alleles.
[0470] Detailed Material, Methods & Results:
[0471] Plasmid pCLS9996 (marked with KANMX) and plasmid pCLS16715
(marked with LEU2), carrying the two TALEN arms, were transformed
into strain GFY40 (MATa ura3.DELTA.851 leu2.DELTA.1
his3.DELTA.200 lys2.DELTA.202 ade2-opal SUP4-opal; cf. Richard et
al. 1999) and strain GFY6162-3D (MATa ura3.DELTA.851 leu2.DELTA.1
his3.DELTA.200 trp1.DELTA.65 ade2-opal sup4-(CAG); cf. Richard et
al. 2003), respectively (cf. FIG. 1A).
[0472] Plasmid pCLS9996 has been deposited at the C.N.C.M. under
the terms of the Budapest Treaty [C.N.C.M. deposit number: I-4804;
deposit date under the terms of the Budapest Treaty: 10 October
2013].
[0473] Plasmid pCLS16715 has also been deposited at the C.N.C.M.
under the terms of the Budapest Treaty [C.N.C.M. deposit number:
I-4805; deposit date under the terms of the Budapest Treaty: 10
Oct. 2013].
[0474] Plasmid pCLS9996 codes for the right-hand TALEN monomer that
binds to the DNA target site of SEQ ID NO: 10 (cf. FIG. 1B).
[0475] Plasmid pCLS16715 codes for the left-hand TALEN monomer that
binds to the DNA target site of SEQ ID NO: 4 (cf. FIG. 1B).
[0476] Haploids were crossed and diploids containing both TALEN
arms were selected on SC-Leu supplemented with G418 sulfate (200
.mu.g/ml).
[0477] As a control, the split-TALEN left arm carried by pCLS9984
(marked with LEU2) was transformed in GFY6162-3D, crossed to GFY40
carrying the TALEN right arm, and diploids were selected as
before.
[0478] Repeat lengths were checked by Southern blot in several
independent diploids before galactose induction (cf. FIG. 4 and the
associated comments below).
[0479] The TALEN is normally repressed on glucose medium. Since one
copy of the active SUP4 tRNA is insufficient to suppress the
ade2-opal mutation, yeast cells are red (cf. Richard et al., 1999;
Richard et al., 2000; Richard et al., 2003). In the presence of
galactose, the TALEN is expressed, binds CAG/CTG trinucleotide
repeats and induces a double-strand break (DSB) in the repeat
tract. If a second copy of an active SUP4 tRNA is generated during
double-strand break repair, the ade2-opal mutation is
suppressed and yeast cells turn white (cf. FIG. 1A).
[0480] Sequences recognized by both TALE DNA-binding domains and by
the split-TALE.
[0481] The length of the spacer that is appropriate to induce a
DSB was deduced from repeat tract lengths analyzed in surviving
cells after TALEN induction (length of 18 bp; cf. FIG. 1B).
Sequence Data for Plasmid pCLS9996 (C.N.C.M. I-4804):
[0482] The sequence of the insert carried by plasmid pCLS9996
is:
TABLE-US-00009 [SEQ ID NO: 1] GCGCACATTTCCCCGAAAAGTGCCACCTGACG
TCCGATCAAAAATCATCGCTTCGCTGATTAA TTACCCCAGAAATAAGGCTAAAAAACTAATCG
CATTATCATCCTATGGTTGTTAATTTGATTC GTTCATTTGAAGGTTTGTGGGGCCAGGTTACT
GCCAATTTTTCCTCTTCATAACCATAAAAGC TAGTATTGTAGAATCTTTATTGTTCGGAGCAG
TGCGGCGCGAGGCACATCTGCGTTTCAGGAA CGCGACCGGTGAAGACGAGGACGCACGGAGGA
GAGTCTTCCTTCGGAGGGCTGTCACCCGCTC GGCGGCTTCTAATCCGTACTTCAATATAGCAA
TGAGCAGTTAAGCGTATTACTGAAAGTTCCA AAGAGAAGGTTTTTTTAGGCTAATCGACCTCG
AGCAGATCCGCCAGGCGTGTATATAGCGTGG ATGGCCAGGCAACTTTAGTGCTGACACATACA
GGCATATATATATGTGTGCGACGACACATGA TCATATGGCATGCATGTGCTCTGTATGTATAT
AAAACTCTTGTTTTCTTCTTTTCTCTAAATA TTCTTTCCTTATACATTAGGTCCTTTGTAGCA
TAAATTACTATACTTCTATAGACACGCAAAC ACAAATACACAGCGGCCTTGCCACCATGGGCG
ATCCTAAAAAGAAACGTAAGGTCATCGATAA GGAGACCGCCGCTGCCAAGTTCGAGAGACAGC
ACATGGACAGCATCGATATCGCCGATCTACG CACGCTCGGCTACAGCCAGCAGCAACAGGAGA
AGATCAAACCGAAGGTTCGTTCGACAGTGGC GCAGCACCACGAGGCACTGGTCGGCCACGGGT
TTACACACGCGCACATCGTTGCGTTAAGCCA ACACCCGGCAGCGTTAGGGACCGTCGCTGTCA
AGTATCAGGACATGATCGCAGCGTTGCCAGA GGCGACACACGAAGCGATCGTTGGCGTCGGCA
AACAGTGGTCCGGCGCACGCGCTCTGGAGGC CTTGCTCACGGTGGCGGGAGAGTTGAGAGGTC
CACCGTTACAGTTGGACACAGGCCAACTTCT CAAGATTGCAAAACGTGGCGGCGTGACCGCAG
TGGAGGCAGTGCATGCATGGCGCAATGCACT GACGGGTGCCCCGCTCAACTTGACCCCCCAGC
AGGTGGTGGCCATCGCCAGCAATAATGGTGG CAAGCAGGCGCTGGAGACGGTCCAGCGGCTGT
TGCCGGTGCTGTGCCAGGCCCACGGCTTGAC CCCGGAGCAGGTGGTGGCCATCGCCAGCCACG
ATGGCGGCAAGCAGGCGCTGGAGACGGTCCA GCGGCTGTTGCCGGTGCTGTGCCAGGCCCACG
GCTTGACCCCCCAGCAGGTGGTGGCCATCGC CAGCAATGGCGGTGGCAAGCAGGCGCTGGAGA
CGGTCCAGCGGCTGTTGCCGGTGCTGTGCCA GGCCCACGGCTTGACCCCCCAGCAGGTGGTGG
CCATCGCCAGCAATAATGGTGGCAAGCAGGC GCTGGAGACGGTCCAGCGGCTGTTGCCGGTGC
TGTGCCAGGCCCACGGCTTGACCCCGGAGCA GGTGGTGGCCATCGCCAGCCACGATGGCGGCA
AGCAGGCGCTGGAGACGGTCCAGCGGCTGTT GCCGGTGCTGTGCCAGGCCCACGGCTTGACCC
CCCAGCAGGTGGTGGCCATCGCCAGCAATGG CGGTGGCAAGCAGGCGCTGGAGACGGTCCAGC
GGCTGTTGCCGGTGCTGTGCCAGGCCCACGG CTTGACCCCCCAGCAGGTGGTGGCCATCGCCA
GCAATAATGGTGGCAAGCAGGCGCTGGAGAC GGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGG
CCCACGGCTTGACCCCGGAGCAGGTGGTGGC CATCGCCAGCCACGATGGCGGCAAGCAGGCGC
TGGAGACGGTCCAGCGGCTGTTGCCGGTGCT GTGCCAGGCCCACGGCTTGACCCCCCAGCAGG
TGGTGGCCATCGCCAGCAATGGCGGTGGCAA GCAGGCGCTGGAGACGGTCCAGCGGCTGTTGC
CGGTGCTGTGCCAGGCCCACGGCTTGACCCC CCAGCAGGTGGTGGCCATCGCCAGCAATAATG
GTGGCAAGCAGGCGCTGGAGACGGTCCAGCG GCTGTTGCCGGTGCTGTGCCAGGCCCACGGCT
TGACCCCGGAGCAGGTGGTGGCCATCGCCAG CCACGATGGCGGCAAGCAGGCGCTGGAGACGG
TCCAGCGGCTGTTGCCGGTGCTGTGCCAGGC CCACGGCTTGACCCCCCAGCAGGTGGTGGCCA
TCGCCAGCAATGGCGGTGGCAAGCAGGCGCT GGAGACGGTCCAGCGGCTGTTGCCGGTGCTGT
GCCAGGCCCACGGCTTGACCCCCCAGCAGGT GGTGGCCATCGCCAGCAATAATGGTGGCAAGC
AGGCGCTGGAGACGGTCCAGCGGCTGTTGCC GGTGCTGTGCCAGGCCCACGGCTTGACCCCGG
AGCAGGTGGTGGCCATCGCCAGCCACGATGG CGGCAAGCAGGCGCTGGAGACGGTCCAGCGGC
TGTTGCCGGTGCTGTGCCAGGCCCACGGCTT GACCCCCCAGCAGGTGGTGGCCATCGCCAGCA
ATGGCGGTGGCAAGCAGGCGCTGGAGACGGT CCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCC
ACGGCTTGACCCCTCAGCAGGTGGTGGCCAT CGCCAGCAATGGCGGCGGCAGGCCGGCGCTGG
AGAGCATTGTTGCCCAGTTATCTCGCCCTGA TCCGGCGTTGGCCGCGTTGACCAACGACCACC
TCGTCGCCTTGGCCTGCCTCGGCGGGCGTCC TGCGCTGGATGCAGTGAAAAAGGGATTGGGGG
ATCCTATCAGCCGTTCCCAGCTGGTGAAGTC CGAGCTGGAGGAGAAGAAATCCGAGTTGAGGC
ACAAGCTGAAGTACGTGCCCCACGAGTACAT CGAGCTGATCGAGATCGCCCGGAACAGCACCC
AGGACCGTATCCTGGAGATGAAGGTGATGGA GTTCTTCATGAAGGTGTACGGCTACAGGGGCA
AGCACCTGGGCGGCTCCAGGAAGCCCGACGG CGCCATCTACACCGTGGGCTCCCCCATCGACT
ACGGCGTGATCGTGGACACCAAGGCCTACTC CGGCGGCTACAACCTGCCCATCGGCCAGGCCG
ACGAAATGCAGAGGTACGTGGAGGAGAACCA GACCAGGAACAAGCACATCAACCCCAACGAGT
GGTGGAAGGTGTACCCCTCCAGCGTGACCGA GTTCAAGTTCCTGTTCGTGTCCGGCCACTTCA
AGGGCAACTACAAGGCCCAGCTGACCAGGCT GAACCACATCACCAACTGCAACGGCGCCGTGC
TGTCCGTGGAGGAGCTCCTGATCGGCGGCGA GATGATCAAGGCCGGCACCCTGACCCTGGAGG
AGGTGAGGAGGAAGTTCAACAACGGCGAGAT CAACTTCGCGGCCGACTGATAACTCGAGCGAT
CCTCTAGACGAGCTCCTCGAGCCTGCAGCAG CTGAAGCTTTGGACTTCTTCGCCAGAGGTTTG
GTCAAGTCTCCAATCAAGGTTGTCGGCTTGT CTACCTTGCCAGAAATTTACGAAAAGATGGAA
AAGGGTCAAATCGTTGGTAGATACGTTGTTG ACACTTCTAAATAAGCGAATTTCTTATGATTT
ATGATTTTTATTATTAAATAAGTTATAAAAA AAATAAGTGTATACAAATTTTAAAGTGACTCT
TAGGTTTTAAAACGAAAATTCTTATTCTTGA GTAACTCTTTCCTGTAGGTCAGGTTGCTTTCT
CAGGTATAGCATGAGGTCGCTCTTATTGACC ACACCTCTACCGGCATGCAAGCTTGGCGTAAT
CATGGTCATAGCTGTTTCCTGTGTGAAATTG
TTATCCGCTCACAATTCCACACAACATACGAG
CCGGAAGCATAAAGTGTAAAGCCTGGGGTGC CTAATGAGTGAGCTAACTCACATTAATTGCGT
TGCGCTCACTGCCCGCTTTCCAGTCGGGAAA CCTGTCGTGCCAGCAGATCTGTTTAGCTTGCC
TCGTCCCCGCCGGGTCACCCGGCCAGCGACA TGGAGGCCCAGAATACCCTCCTTGACAGTCTT
GACGTGCGCAGCTCAGGGGCATGATGTGACT GTCGCCCGTACATTTAGCCCATACATCCCCAT
GTATAATCATTTGCATCCATACATTTTGATG GCCGCACGGCGCGAAGCAAAAATTACGGCTCC
TCGCTGCAGACCTGCGAGCAGGGAAACGCTC CCCTCACAGACGCGTTGAATTGTCCCCACGCC
GCGCCCCTGTAGAGAAATATAAAAGGTTAGG ATTTGCCACTGAGGTTCTTCTTTCATATACTT
CCTTTTAAAATCTTGCTAGGATACAGTTCTC ACATCACATCCGAACATAAACAACCATGCATG
GGTAAGGAAAAGACTCACGTTTCGAGGCCGC GATTAAATTCCAACATGGATGCTGATTTATAT
GGGTATAAATGGGCTCGCGATAATGTCGGGC AATCAGGTGCGACAATCTATCGATTGTATGGG
AAGCCCGATGCGCCAGAGTTGTTTCTGAAAC ATGGCAAAGGTAGCGTTGCCAATGATGTTACA
GATGAGATGGTCAGACTAAACTGGCTGACGG AATTTATGCCTCTTCCGACCATCAAGCATTTT
ATCCGTACTCCTGATGATGCATGGTTACTCA CCACTGCGATCCCCGGCAAAACAGCATTCCAG
GTATTAGAAGAATATCCTGATTCAGGTGAAA ATATTGTTGATGCGCTGGCAGTGTTCCTGCGC
CGGTTGCATTCGATTCCTGTTTGTAATTGTC CTTTTAACAGCGATCGCGTATTTCGCCTCGCT
CAGGCGCAATCACGAATGAATAACGGTTTGG TTGATGCGAGTGATTTTGATGACGAGCGTAAT
GGCTGGCCTGTTGAACAAGTCTGGAAAGAAA TGCATAAGCTTTTGCCATTCTCACCGGATTCA
GTCGTCACTCATGGTGATTTCTCACTTGATA ACCTTATTTTTGACGAGGGGAAATTAATAGGT
TGTATTGATGTTGGACGAGTCGGAATCGCAG ACCGATACCAGGATCTTGCCATCCTATGGAAC
TGCCTCGGTGAGTTTTCTCCTTCATTACAGA AACGGCTTTTTCAAAAATATGGTATTGATAAT
CCTGATATGAATAAATTGCAGTTTCATTTGA TGCTCGATGAGTTTTTCTAATCAGTACTGACA
ATAAAAAGATTCTTGTTTTCAAGAACTTGTC ATTTGTATAGTTTTTTTATATTGTAGTTGTTC
TATTTTAATCAAATGTTAGCGTGATTTATAT TTTTTTTCGCCTCGACATCATCTGCCCAGATG
CGAAGTTAAGTGCGCAGAAAGTAATATCATG CGTCAATCGTATGTGAATGCTGGTCGCTATAC
TGCTGTCGATTCGATACTAACGCCGCCATCC AGTGTCGAAAACGAGCTCGAATTCATCGATGA
TATCAGATCCACTAGTGGCCTATGCGACCGC GGATCTGCCGGTCTCCCTATAGTGAGTCGTAT
TAATTTCGATAAGCCAGGTTAACCTGCATTA ATGAATCGGCCAACGCGCGGGGAGAGGCGGTT
TGCGTATTGGGCGCTCTTCCGCTTCCTCGCT CACTGACTCGCTGCGCTCGGTCGTTCGGCTGC
GGCGAGCGGTATCAGCATCGATGAATTCCAC GGACTATAGACTATACTAGTATACTCCGTCTA
CTGTACGATACACTTCCGCTCAGGTCCTTGT CCTTTAACGAGGCCTTACCACTCTTTTGTTAC
TCTATTGATCCAGCTCAGCAAAGGCAGTGTG ATCTAAGATTCTATCTTCGCGATGTAGTAAAA
CTAGCTAGACCGAGAAAGAGACTAGAAATGC AAAAGGCACTTCTACAATGGCTGCCATCATTA
TTATCCGATGTGACGCTGCAGCTTCTCAATG ATATTCGAATACGCTTTGAGGAGATACAGCCT
AATATCCGACAAACTGTTTTACAGATTTACG ATCGTACTTGTTACCCATCATTGAATTTTGAA
CATCCGAACCTGGGAGTTTTCCCTGAAACAG ATAGTATATTTGAACCTGTATAATAATATATA
GTCTAGCGCTTTACGGAAGACAATGTATGTA TTTCGGTTCCTGGAGAAACTATTGCATCTATT
GCATAGGTAATCTTGCACGTCGCATCCCCGG TTCATTTTCTGCGTTTCCATCTTGCACTTCAA
TAGCATATCTTTGTTAACGAAGCATCTGTGC TTCATTTTGTAGAACAAAAATGCAACGCGAGA
GCGCTAATTTTTCAAACAAAGAATCTGAGCT GCATTTTTACAGAACAGAAATGCAACGCGAAA
GCGCTATTTTACCAACGAAGAATCTGTGCTT CATTTTTGTAAAACAAAAATGCAACGCGAGAG
CGCTAATTTTTCAAACAAAGAATCTGAGCTG CATTTTTACAGAACAGAAATGCAACGCGAGAG
CGCTATTTTACCAACAAAGAATCTATACTTC TTTTTTGTTCTACAAAAATGCATCCCGAGAGC
GCTATTTTTCTAACAAAGCATCTTAGATTAC TTTTTTTCTCCTTTGTGCGCTCTATAATGCAG
TCTCTTGATAACTTTTTGCACTGTAGGTCCG TTAAGGTTAGAAGAAGGCTACTTTGGTGTCTA
TTTTCTCTTCCATAAAAAAAGCCTGACTCCA CTTCCCGCGTTTACTGATTACTAGCGAAGCTG
CGGGTGCATTTTTTCAAGATAAAGGCATCCC CGATTATATTCTATACCGATGTGGATTGCGCA
TACTTTGTGAACAGAAAGTGATAGCGTTGAT GATTCTTCATTGGTCAGAAAATTATGAACGGT
TTCTTCTATTTTGTCTCTATATACTACGTAT AGGAAATGTTTACATTTTCGTATTGTTTTCGA
TTCACTCTATGAATAGTTCTTACTACAATTT TTTTGTCTAAAGAGTAATACTAGAGATAAACA
TAAAAAATGTAGAGGTCGAGTTTAGATGCAA GTTCAAGGAGCGAAAGGTGGATGGGTAGGTTA
TATAGGGATATAGCACAGAGATATATAGCAA AGAGATACTTTTGAGCAATGTTTGTGGAAGCG
GTATTCGCAATATTTTAGTAGCTCGTTACAG TCCGGTGCGTTTTTGGTTTTTTGAAAGTGCGT
CTTCAGAGCGCTTTTGGTTTTCAAAAGCGCT CTGAAGTTCCTATACTTTCTAGAGAATAGGAA
CTTCGGAATAGGAACTTCAAAGCGTTTCCGA AAACGAGCGCTTCCGAAAATGCAACGCGAGCT
GCGCACATACAGCTCACTGTTCACGTCGCAC CTATATCTGCGTGTTGCCTGTATATATATATA
CATGAGAAGAACGGCATAGTGCGTGTTTATG CTTAAATGCGTACTTATATGCGTCTATTTATG
TAGGATGAAAGGTAGTCTAGTACCTCCTGTG ATATTATCCCATTCCATGCGGGGTATCGTATG
CTTCCTTCAGCACTACCCTTTAGCTGTTCTA TATGCTGCCACTCCTCAATTGGATTAGTCTCA
TCCTTCAATGCTATCATTTCCTTTGATATTG GATCATATGCATAGTACCGAGAAACTAGTGCG
AAGTAGTGATCAGGTATTGCTGTTATCTGAT GAGTATACGTTGTCCTGGCCACGGCAGAAGCA
CGCTTATCGCTCCAATTTCCCACAACATTAG TCAACTCCGTTAGGCCCTTCATTGAAAGAAAT
GAGGTCATCAAATGTCTTCCAATGTGAGATT TTGGGCCATTTTTTATAGCAAAGATTGAATAA
GGCGCATTTTTCTTCAAAGCTTTATTGTACG
ATCTGACTAAGTTATCTTTTAATAATTGGTAT TCCTGTTTATTGCTTGAAGAATTGCCGGTCC
TATTTACTCGTTTTAGGACTGGTTCAGAATTC ATCGATGCTCACTCAAAGGTCGGTAATACGG
TTATCCACAGAATCAGGGGATAACGCAGGAAA GAACATGTGAGCAAAAGGCCAGCAAAAGGCC
AGGAACCGTAAAAAGGCCGCGTTGCTGGCGTT TTTCCATAGGCTCCGCCCCCCTGACGAGCAT
CACAAAAATCGACGCTCAAGTCAGAGGTGGCG AAACCCGACAGGACTATAAAGATACCAGGCG
TTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCC TGTTCCGACCCTGCCGCTTACCGGATACCTG
TCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCT TTCTCATAGCTCACGCTGTAGGTATCTCAGT
TCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTG TGTGCACGAACCCCCCGTTCAGCCCGACCGC
TGCGCCTTATCCGGTAACTATCGTCTTGAGTC CAACCCGGTAAGACACGACTTATCGCCACTG
GCAGCAGCCACTGGTAACAGGATTAGCAGAGC GAGGTATGTAGGCGGTGCTACAGAGTTCTTG
AAGTGGTGGCCTAACTACGGCTACACTAGAAG GACAGTATTTGGTATCTGCGCTCTGCTGAAG
CCAGTTACCTTCGGAAAAAGAGTTGGTAGCTC TTGATCCGGCAAACAAACCACCGCTGGTAGC
GGTGGTTTTTTTGTTTGCAAGCAGCAGATTAC GCGCAGAAAAAAAGGATCTCAAGAAGATCCT
TTGATCTTTTCTACGGGGTCTGACGCTCAGTG GAACGAAAACTCACGTTAAGGGATTTTGGTC
ATGAGCGGATACATATTTGAATGTATTTAGAA AAATAAACAAATAGGGGTTCC
[0483] The nucleic acid of SEQ ID NO: 1 (carried by plasmid
pCLS9996) codes for the TALEN arm that binds to the DNA target site
of SEQ ID NO: 10 (cf. FIG. 1B). Hence, the nucleic acid of SEQ ID
NO: 1 comprises a sequence, which codes for adjacent units of TAL
effector tandem repeat that determine recognition of the DNA target
site of SEQ ID NO: 10, and which codes for an endonuclease. The
endonuclease is the monomer of a dimeric endonuclease, i.e., a FokI
monomer. The sequence, which codes for adjacent units of TAL
effector tandem repeat and for an endonuclease, is preceded by a
promoter and an enhancer, and is followed by a terminator. The
nucleic acid of SEQ ID NO: 1 further comprises a sequence, which
codes for a selection marker, i.e., the kanamycin selection marker.
The sequence, which codes for the selection marker, is preceded by
a promoter and is followed by a terminator.
[0484] The nucleic acid of SEQ ID NO: 1 further comprises a
replication origin, i.e., the 2-micron replication origin.
[0485] More particularly, the nucleic acid of SEQ ID NO: 1 (carried
by plasmid pCLS9996) comprises:
[0486] a GAL10 enhancer at positions 36-401 [SEQ ID NO: 37];
[0487] a CYC1 promoter at positions 402-641 [SEQ ID NO: 38];
[0488] a sequence coding for a TALEN arm (TALEN arm that binds to the DNA target site of SEQ ID NO: 10) at positions 656-3484 [SEQ ID NO: 39];
[0489] an ADH1 terminator at positions 3836-4155 [SEQ ID NO: 40];
[0490] a TEF promoter at positions 4357-4736 [SEQ ID NO: 41];
[0491] a sequence coding for the KANMX selection marker at positions 4740-5546 [SEQ ID NO: 42];
[0492] a TEF terminator at positions 5547-5759 [SEQ ID NO: 43]; and
[0493] the 2-micron replication origin at positions 6585-7929 [SEQ ID NO: 44].
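As a non-limiting illustration, the feature coordinates listed for SEQ ID NO: 1 can be handled as 1-based, inclusive intervals. The sketch below assumes that convention (it is the one implied by the stated positions, e.g., 36-401 immediately followed by 402-641); the names FEATURES, feature_length, extract and has_overlap are hypothetical helpers introduced for this example.

```python
# Feature coordinates within SEQ ID NO: 1 as listed in the text
# (assumed 1-based, inclusive).
FEATURES = {
    "GAL10 enhancer": (36, 401),
    "CYC1 promoter": (402, 641),
    "TALEN arm coding sequence": (656, 3484),
    "ADH1 terminator": (3836, 4155),
    "TEF promoter": (4357, 4736),
    "KANMX selection marker": (4740, 5546),
    "TEF terminator": (5547, 5759),
    "2-micron replication origin": (6585, 7929),
}


def feature_length(start, end):
    """Length in nucleotides of a 1-based inclusive interval."""
    return end - start + 1


def extract(sequence, start, end):
    """Return the subsequence for a 1-based inclusive interval."""
    return sequence[start - 1:end]


def has_overlap(features):
    """True if any two intervals share at least one position."""
    spans = sorted(features.values())
    return any(prev_end >= start
               for (_, prev_end), (start, _) in zip(spans, spans[1:]))


lengths = {name: feature_length(*span) for name, span in FEATURES.items()}
for name, n in lengths.items():
    print(f"{name}: {n} nt")
```

With these coordinates, the TALEN arm coding sequence spans 2829 nucleotides, i.e., a whole number of codons, and no two listed features overlap.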
[0494] Hence, the sequences of SEQ ID NOs: 37-44 are:
TABLE-US-00010 (GAL10 enhancer) SEQ ID NO: 37
GATCAAAAATCATCGCTTCGCTGATTAATTACCCCAGAAATAAGGCTAAA
AAACTAATCGCATTATCATCCTATGGTTGTTAATTTGATTCGTTCATTTG
AAGGTTTGTGGGGCCAGGTTACTGCCAATTTTTCCTCTTCATAACCATAA
AAGCTAGTATTGTAGAATCTTTATTGTTCGGAGCAGTGCGGCGCGAGGCA
CATCTGCGTTTCAGGAACGCGACCGGTGAAGACGAGGACGCACGGAGGAG
AGTCTTCCTTCGGAGGGCTGTCACCCGCTCGGCGGCTTCTAATCCGTACT
TCAATATAGCAATGAGCAGTTAAGCGTATTACTGAAAGTTCCAAAGAGAA GGTTTTTTTAGGCTAA
(CYC1 promoter) SEQ ID NO: 38
TCGACCTCGAGCAGATCCGCCAGGCGTGTATATAGCGTGGATGGCCAGGC
AACTTTAGTGCTGACACATACAGGCATATATATATGTGTGCGACGACACA
TGATCATATGGCATGCATGTGCTCTGTATGTATATAAAACTCTTGTTTTC
TTCTTTTCTCTAAATATTCTTTCCTTATACATTAGGTCCTTTGTAGCATA
AATTACTATACTTCTATAGACACGCAAACACAAATACACA
(coding for the TALEN arm that recognizes the DNA target site of SEQ ID NO: 10) SEQ ID NO: 39
ATGGGCGATCCTAAAAAGAAACGTAAGGTCATCGATAAGGAGACCGCCGC
TGCCAAGTTCGAGAGACAGCACATGGACAGCATCGATATCGCCGATCTACG
CACGCTCGGCTACAGCCAGCAGCAACAGGAGAAGATCAAACCGAAGGTTC
GTTCGACAGTGGCGCAGCACCACGAGGCACTGGTCGGCCACGGGTTTACA
CACGCGCACATCGTTGCGTTAAGCCAACACCCGGCAGCGTTAGGGACCGT
CGCTGTCAAGTATCAGGACATGATCGCAGCGTTGCCAGAGGCGACACACG
AAGCGATCGTTGGCGTCGGCAAACAGTGGTCCGGCGCACGCGCTCTGGAG
GCCTTGCTCACGGTGGCGGGAGAGTTGAGAGGTCCACCGTTACAGTTGGA
CACAGGCCAACTTCTCAAGATTGCAAAACGTGGCGGCGTGACCGCAGTGG
AGGCAGTGCATGCATGGCGCAATGCACTGACGGGTGCCCCGCTCAACTTG
ACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATAATGGTGGCAAGCAGGC
GCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCACGGCT
TGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAG
GCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCACGG
CTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGC
AGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCAC
GGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATAATGGTGGCAA
GCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCC
ACGGCTTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGC
AAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGC
CCACGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTG
GCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAG
GCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATAATGG
TGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCC
AGGCCCACGGCTTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGAT
GGCGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTG
CCAGGCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATG
GCGGTGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTG
TGCCAGGCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAA
TAATGGTGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGC
TGTGCCAGGCCCACGGCTTGACCCCGGAGCAGGTGGTGGCCATCGCCAGC
CACGATGGCGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGT
GCTGTGCCAGGCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCA
GCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCG
GTGCTGTGCCAGGCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCATCGC
CAGCAATAATGGTGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGC
CGGTGCTGTGCCAGGCCCACGGCTTGACCCCGGAGCAGGTGGTGGCCATC
GCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTT
GCCGGTGCTGTGCCAGGCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCA
TCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTG
TTGCCGGTGCTGTGCCAGGCCCACGGCTTGACCCCTCAGCAGGTGGTGGC
CATCGCCAGCAATGGCGGCGGCAGGCCGGCGCTGGAGAGCATTGTTGCCC
AGTTATCTCGCCCTGATCCGGCGTTGGCCGCGTTGACCAACGACCACCTC
GTCGCCTTGGCCTGCCTCGGCGGGCGTCCTGCGCTGGATGCAGTGAAAAA
GGGATTGGGGGATCCTATCAGCCGTTCCCAGCTGGTGAAGTCCGAGCTGG
AGGAGAAGAAATCCGAGTTGAGGCACAAGCTGAAGTACGTGCCCCACGAG
TACATCGAGCTGATCGAGATCGCCCGGAACAGCACCCAGGACCGTATCCT
GGAGATGAAGGTGATGGAGTTCTTCATGAAGGTGTACGGCTACAGGGGCA
AGCACCTGGGCGGCTCCAGGAAGCCCGACGGCGCCATCTACACCGTGGGC
TCCCCCATCGACTACGGCGTGATCGTGGACACCAAGGCCTACTCCGGCGG
CTACAACCTGCCCATCGGCCAGGCCGACGAAATGCAGAGGTACGTGGAGG
AGAACCAGACCAGGAACAAGCACATCAACCCCAACGAGTGGTGGAAGGTG
TACCCCTCCAGCGTGACCGAGTTCAAGTTCCTGTTCGTGTCCGGCCACTT
CAAGGGCAACTACAAGGCCCAGCTGACCAGGCTGAACCACATCACCAACT
GCAACGGCGCCGTGCTGTCCGTGGAGGAGCTCCTGATCGGCGGCGAGATG
ATCAAGGCCGGCACCCTGACCCTGGAGGAGGTGAGGAGGAAGTTCAACAA
CGGCGAGATCAACTTCGCGGCCGACTGA
(ADH1 terminator) SEQ ID NO: 40
TATTGACCACACCTCTACCGGCATGCAAGCTTGGCGTAATCATGGTCATA
GCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATAC
GAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAGCTAA
CTCACATTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCT
GTCGTGCCAGCAGATCTGTTTAGCTTGCCTCGTCCCCGCCGGGTCACCCG
GCCAGCGACATGGAGGCCCAGAATACCCTCCTTGACAGTCTTGACGTGCG
CAGCTCAGGGGCATGATGTG
(TEF promoter) SEQ ID NO: 41
TGAGGTTCTTCTTTCATATACTTCCTTTTAAAATCTTGCTAGGATACAGT
TCTCACATCACATCCGAACATAAACAACCATGCATGGGTAAGGAAAAGAC
TCACGTTTCGAGGCCGCGATTAAATTCCAACATGGATGCTGATTTATATG
GGTATAAATGGGCTCGCGATAATGTCGGGCAATCAGGTGCGACAATCTAT
CGATTGTATGGGAAGCCCGATGCGCCAGAGTTGTTTCTGAAACATGGCAA
AGGTAGCGTTGCCAATGATGTTACAGATGAGATGGTCAGACTAAACTGGC
TGACGGAATTTATGCCTCTTCCGACCATCAAGCATTTTATCCGTACTCCT
GATGATGCATGGTTACTCACCACTGCGATCCCC
(coding for the KANMX selection marker) SEQ ID NO: 42
GGCAAAACAGCATTCCAGGTATTAGAAGAATATCCTGATTCAGGTGAAAA
TATTGTTGATGCGCTGGCAGTGTTCCTGCGCCGGTTGCATTCGATTCCTG
TTTGTAATTGTCCTTTTAACAGCGATCGCGTATTTCGCCTCGCTCAGGCG
CAATCACGAATGAATAACGGTTTGGTTGATGCGAGTGATTTTGATGACGA
GCGTAATGGCTGGCCTGTTGAACAAGTCTGGAAAGAAATGCATAAGCTTT
TGCCATTCTCACCGGATTCAGTCGTCACTCATGGTGATTTCTCACTTGAT
AACCTTATTTTTGACGAGGGGAAATTAATAGGTTGTATTGATGTTGGACG
AGTCGGAATCGCAGACCGATACCAGGATCTTGCCATCCTATGGAACTGCC
TCGGTGAGTTTTCTCCTTCATTACAGAAACGGCTTTTTCAAAAATATGGT
ATTGATAATCCTGATATGAATAAATTGCAGTTTCATTTGATGCTCGATGA
GTTTTTCTAATCAGTACTGACAATAAAAAGATTCTTGTTTTCAAGAACTT
GTCATTTGTATAGTTTTTTTATATTGTAGTTGTTCTATTTTAATCAAATG
TTAGCGTGATTTATATTTTTTTTCGCCTCGACATCATCTGCCCAGATGCG
AAGTTAAGTGCGCAGAAAGTAATATCATGCGTCAATCGTATGTGAATGCT
GGTCGCTATACTGCTGTCGATTCGATACTAACGCCGCCATCCAGTGTCGA
AAACGAGCTCGAATTCATCGATGATATCAGATCCACTAGTGGCCTATGCG ACCGCGG
(TEF terminator) SEQ ID NO: 43
ATCTGCCGGTCTCCCTATAGTGAGTCGTATTAATTTCGATAAGCCAGGTT
AACCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTAT
TGGGCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTC
GGCTGCGGCGAGCGGTATCAGCATCGATGAATTCCACGG ACTATAGACTATACTAGTATACTC
(2-Micron replication origin) SEQ ID NO: 44
GCTATTTTTCTAACAAAGCATCTTAGATTACTTTTTTTCTCCTTTGTGCG
CTCTATAATGCAGTCTCTTGATAACTTTTTGCACTGTAGGTCCGTTAAGG
TTAGAAGAAGGCTACTTTGGTGTCTATTTTCTCTTCCATAAAAAAAGCCT
GACTCCACTTCCCGCGTTTACTGATTACTAGCGAAGCTGCGGGTGCATTT
TTTCAAGATAAAGGCATCCCCGATTATATTCTATACCGATGTGGATTGCG
CATACTTTGTGAACAGAAAGTGATAGCGTTGATGATTCTTCATTGGTCAG
AAAATTATGAACGGTTTCTTCTATTTTGTCTCTATATACTACGTATAGGA
AATGTTTACATTTTCGTATTGTTTTCGATTCACTCTATGAATAGTTCTTA
CTACAATTTTTTTGTCTAAAGAGTAATACTAGAGATAAACATAAAAAATG
TAGAGGTCGAGTTTAGATGCAAGTTCAAGGAGCGAAAGGTGGATGGGTAG
GTTATATAGGGATATAGCACAGAGATATATAGCAAAGAGATACTTTTGAG
CAATGTTTGTGGAAGCGGTATTCGCAATATTTTAGTAGCTCGTTACAGTC
CGGTGCGTTTTTGGTTTTTTGAAAGTGCGTCTTCAGAGCGCTTTTGGTTT
TCAAAAGCGCTCTGAAGTTCCTATACTTTCTAGAGAATAGGAACTTCGGA
ATAGGAACTTCAAAGCGTTTCCGAAAACGAGCGCTTCCGAAAATGCAACG
CGAGCTGCGCACATACAGCTCACTGTTCACGTCGCACCTATATCTGCGTG
TTGCCTGTATATATATATACATGAGAAGAACGGCATAGTGCGTGTTTATG
CTTAAATGCGTACTTATATGCGTCTATTTATGTAGGATGAAAGGTAGTCT
AGTACCTCCTGTGATATTATCCCATTCCATGCGGGGTATCGTATGCTTCC
TTCAGCACTACCCTTTAGCTGTTCTATATGCTGCCACTCCTCAATTGGAT
TAGTCTCATCCTTCAATGCTATCATTTCCTTTGATATTGGATCATATGCA
TAGTACCGAGAAACTAGTGCGAAGTAGTGATCAGGTATTGCTGTTATCTG
ATGAGTATACGTTGTCCTGGCCACGGCAGAAGCACGCTTATCGCTCCAAT
TTCCCACAACATTAGTCAACTCCGTTAGGCCCTTCATTGAAAGAAATGAG
GTCATCAAATGTCTTCCAATGTGAGATTTTGGGCCATTTTTTATAGCAAA
GATTGAATAAGGCGCATTTTTCTTCAAAGCTTTATTGTACGATCTGACTA
AGTTATCTTTTAATAATTGGTATTCCTGTTTATTGCTTGAAGAAT
[0495] In plasmid pCLS9996, the sequence coding for the TALEN arm
(SEQ ID NO: 39) comprises:
[0496] a sequence coding for 15 adjacent units of TAL effector tandem repeat, and
[0497] a sequence coding for an endonuclease.
[0498] The 15 adjacent units of TAL effector tandem repeat are an N-
to C-ordered series of 15 adjacent units each consisting of 34
amino acids. The last C-terminal unit of 34 amino acids is followed
by one (truncated) unit of 20 amino acids.
[0499] The ordered series of 15 adjacent units determines the
recognition of a specific DNA target site (of 15 nucleotides, i.e.,
of SEQ ID NO: 10), whereas the (truncated) unit of 20 amino acids
is not involved in the specific recognition of said DNA target
site.
[0500] The sequence coding for said 15 adjacent units of 34 amino
acids is at positions 499-2028 within the TALEN coding sequence of
SEQ ID NO: 39, i.e., is:
TABLE-US-00011 [SEQ ID NO: 45]
TTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATAATGGTGGCAAGCA
GGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCACG
GCTTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAG
CAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCA
CGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCA
AGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCC
CACGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATAATGGTGG
CAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGG
CCCACGGCTTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGC
GGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCA
GGCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATGGCG
GTGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGC
CAGGCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATAA
TGGTGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGT
GCCAGGCCCACGGCTTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCAC
GATGGCGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCT
GTGCCAGGCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCA
ATGGCGGTGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTG
CTGTGCCAGGCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAG
CAATAATGGTGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGG
TGCTGTGCCAGGCCCACGGCTTGACCCCGGAGCAGGTGGTGGCCATCGCC
AGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCC
GGTGCTGTGCCAGGCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCATCG
CCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTG
CCGGTGCTGTGCCAGGCCCACGGCTTGACCCCCCAGCAGGTGGTGGCCAT
CGCCAGCAATAATGGTGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCTGT
TGCCGGTGCTGTGCCAGGCCCACGGCTTGACCCCGGAGCAGGTGGTGGCC
ATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTCCAGCGGCT
GTTGCCGGTGCTGTGCCAGGCCCACGGCTTGACCCCCCAGCAGGTGGTGG
CCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTCCAGCGG
CTGTTGCCGGTGCTGTGCCAGGCCCACGGC.
[0501] The sequence of one of said 15 adjacent units of 34 amino
acids (coding sequence comprised in SEQ ID NO: 45) is:
TABLE-US-00012 [SEQ ID NO: 46] LTPQQVVAIASXXGGKQALETVQRLLPVLCQAHG,
or [SEQ ID NO: 25] LTPEQVVAIASXXGGKQALETVQRLLPVLCQAHG,
wherein XX is the RVD of the unit.
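For illustration only, the two unit templates given above (SEQ ID NO: 46 and SEQ ID NO: 25) can be treated as 34-residue strings in which the placeholder XX, at residues 12-13, is replaced by the RVD of the unit. The sketch below is a minimal example under that reading; build_unit and rvd_of are hypothetical helper names, and the sketch makes no statement as to which template is paired with which RVD in the actual TALEN arm.

```python
# Unit templates as given in the text; "XX" marks the RVD (residues 12-13).
UNIT_TEMPLATES = {
    "SEQ ID NO: 46": "LTPQQVVAIASXXGGKQALETVQRLLPVLCQAHG",
    "SEQ ID NO: 25": "LTPEQVVAIASXXGGKQALETVQRLLPVLCQAHG",
}

# 0-based slice covering residues 12-13 of a unit.
RVD_SLICE = slice(11, 13)


def build_unit(template, rvd):
    """Return the 34-residue unit obtained by inserting a two-letter RVD."""
    if len(rvd) != 2:
        raise ValueError("an RVD consists of exactly two residues")
    i = template.index("XX")
    return template[:i] + rvd + template[i + 2:]


def rvd_of(unit):
    """Return the RVD (residues 12-13) of a 34-residue unit."""
    return unit[RVD_SLICE]


unit = build_unit(UNIT_TEMPLATES["SEQ ID NO: 46"], "NN")
print(unit, len(unit), rvd_of(unit))
```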
[0502] The N- to C-ordered series of RVDs formed by the RVDs
respectively contained in the 15 adjacent units of TAL effector
tandem repeat is:
NN; HD; NG; NN; HD; NG; NN; HD; NG; NN; HD; NG; NN; HD; NG.
[0503] The N- to C-ordered series of RVDs determines the
recognition of the DNA target site of SEQ ID NO: 10, i.e.,
GCTGCTGCTGCTGCT (cf. Table 5 above; cf. FIG. 1B).
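The correspondence between the ordered RVD series and the DNA target site can be sketched as a simple lookup. The example below assumes the commonly used one-to-one simplification of the RVD code (NI for A, HD for C, NG for T, NN for G), which is taken here to agree with Table 5 of the application; NN is also reported to recognize A, which this simplified sketch ignores. RVD_TO_BASE and decode_rvds are hypothetical names introduced for the example.

```python
# Simplified one-to-one RVD code (assumption: consistent with Table 5).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}


def decode_rvds(series):
    """Return the DNA target predicted for a ';'-separated RVD series."""
    rvds = [r.strip() for r in series.split(";") if r.strip()]
    return "".join(RVD_TO_BASE[r] for r in rvds)


# N- to C-ordered RVD series of the 15 units, as stated in the text.
RIGHT_ARM_RVDS = "NN; HD; NG; NN; HD; NG; NN; HD; NG; NN; HD; NG; NN; HD; NG"

target = decode_rvds(RIGHT_ARM_RVDS)
print(target)  # GCTGCTGCTGCTGCT
```

The decoded string has 15 nucleotides, one per unit, and matches the target site stated for SEQ ID NO: 10.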
[0504] The sequence coding for said truncated unit of 20 amino
acids is (positions 2029-2088 within the TALEN coding sequence of
SEQ ID NO: 39):
TABLE-US-00013 [SEQ ID NO: 47]
TTGACCCCTCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGCGGCAGGCC GGCGCTGGAG.
[0505] The sequence of said unit of 20 amino acids is:
TABLE-US-00014 [SEQ ID NO: 48] LTPQQVVAIASNGGGRPALE.
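As a consistency check offered by way of example, the 60-nucleotide sequence of SEQ ID NO: 47 can be translated with the standard genetic code and compared with the 20-residue sequence of SEQ ID NO: 48. The sketch below is a minimal translator; CODON_TABLE and translate are hypothetical names, and the standard code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding sequence read in frame from its first nucleotide."""
    dna = dna.upper().replace(" ", "")
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# SEQ ID NO: 47 as printed in the text (truncated unit, 60 nt).
SEQ_ID_47 = ("TTGACCCCTCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGCGGCAGGCC"
             "GGCGCTGGAG")

truncated_unit = translate(SEQ_ID_47)
print(truncated_unit)  # LTPQQVVAIASNGGGRPALE
```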
[0506] The sequence coding for the FokI monomer is at positions
2885-3481 within SEQ ID NO: 1, which corresponds to positions
2230-2826 within the TALEN coding sequence of SEQ ID NO: 39, i.e.,
is:
TABLE-US-00015 [SEQ ID NO: 3]
cagctggtgaagtccgagctggaggagaagaaatccgagttgaggcaca
agctgaagtacgtgccccacgagtacatcgagctgatcgagatcgcccg
gaacagcacccaggaccgtatcctggagatgaaggtgatggagttcttc
atgaaggtgtacggctacaggggcaagcacctgggcggctccaggaagc
ccgacggcgccatctacaccgtgggctcccccatcgactacggcgtgat
cgtggacaccaaggcctactccggcggctacaacctgcccatcggccag
gccgacgaaatgcagaggtacgtggaggagaaccagaccaggaacaagc
acatcaaccccaacgagtggtggaaggtgtacccctccagcgtgaccga
gttcaagttcctgttcgtgtccggccacttcaagggcaactacaaggcc
cagctgaccaggctgaaccacatcaccaactgcaacggcgccgtgctgt
ccgtggaggagctcctgatcggcggcgagatgatcaaggccggcaccct
gaccctggaggaggtgaggaggaagttcaacaacggcgagatcaacttc gcggccgac.
[0507] The FokI monomer sequence (coded by the sequence of SEQ ID
NO: 3) is:
TABLE-US-00016 [SEQ ID NO: 49]
QLVKSELEEKKSELRHKLKYVPHEYIELIEIARNSTQDRILEMKVMEFFM
KVYGYRGKHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQAD
EMQRYVEENQTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQLT
RLNHITNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINFAAD.
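The coordinates given in this section for the TALEN coding sequence can be cross-checked arithmetically. The sketch below is illustrative only and rests on two stated assumptions: coordinates are 1-based and inclusive, and the FokI coordinates 2885-3481 are counted along the insert of SEQ ID NO: 1, in which the TALEN coding sequence of SEQ ID NO: 39 starts at position 656 (the coding sequence itself being only 2829 nucleotides long). The names span, to_cds and in_frame are hypothetical helpers.

```python
TALEN_START_IN_SEQ1 = 656   # first nucleotide of SEQ ID NO: 39 within SEQ ID NO: 1
REPEATS_START = 499         # first nucleotide of the 15 units within SEQ ID NO: 39
N_UNITS, UNIT_AA, TRUNCATED_AA = 15, 34, 20
FOKI_AA = 199               # number of residues listed for SEQ ID NO: 49


def span(start, n_residues):
    """1-based inclusive interval coding for n_residues from position start."""
    return (start, start + 3 * n_residues - 1)


def to_cds(position_in_seq1):
    """Convert a SEQ ID NO: 1 position to a SEQ ID NO: 39 position."""
    return position_in_seq1 - TALEN_START_IN_SEQ1 + 1


def in_frame(start):
    """True if a position is the first base of a codon of the coding sequence."""
    return (start - 1) % 3 == 0


repeats = span(REPEATS_START, N_UNITS * UNIT_AA)
truncated = span(repeats[1] + 1, TRUNCATED_AA)
foki_seq1 = (2885, 3481)
foki_cds = (to_cds(foki_seq1[0]), to_cds(foki_seq1[1]))

print("15 units:", repeats)
print("truncated unit:", truncated)
print("FokI within SEQ ID NO: 39:", foki_cds)
```

Under these assumptions, the three segments are in frame with the start codon, and the FokI segment is 597 nucleotides long, i.e., 199 codons, followed by the stop codon at the end of the 2829-nucleotide coding sequence.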
Sequence Data for Plasmid pCLS16715 (C.N.C.M. I-4805):
[0508] The sequence of the insert carried by plasmid pCLS16715
is:
TABLE-US-00017 [SEQ ID NO: 2] GGGTTCCGCGCACATTTCCCCGAAAAGTGCCA
CCTGACGTCCGATCAAAAATCATCGCTTCGC TGATTAATTACCCCAGAAATAAGGCTAAAAAA
CTAATCGCATTATCATCCTATGGTTGTTAAT TTGATTCGTTCATTTGAAGGTTTGTGGGGCCA
GGTTACTGCCAATTTTTCCTCTTCATAACCA TAAAAGCTAGTATTGTAGAATCTTTATTGTTC
GGAGCAGTGCGGCGCGAGGCACATCTGCGTT TCAGGAACGCGACCGGTGAAGACGAGGACGCA
CGGAGGAGAGTCTTCCTTCGGAGGGCTGTCA CCCGCTCGGCGGCTTCTAATCCGTACTTCAAT
ATAGCAATGAGCAGTTAAGCGTATTACTGAA AGTTCCAAAGAGAAGGTTTTTTTAGGCTAATC
GACCTCGAGCAGATCCGCCAGGCGTGTATAT AGCGTGGATGGCCAGGCAACTTTAGTGCTGAC
ACATACAGGCATATATATATGTGTGCGACGA CACATGATCATATGGCATGCATGTGCTCTGTA
TGTATATAAAACTCTTGTTTTCTTCTTTTCT CTAAATATTCTTTCCTTATACATTAGGTCCTT
TGTAGCATAAATTACTATACTTCTATAGACA CGCAAACACAAATACACAGCGGCCTTGCCACC
ATGGGCGATCCTAAAAAGAAACGTAAGGTCA TCGATTACCCATACGATGTTCCAGATTACGCT
ATCGATATCGCCGATCTACGCACGCTCGGCT ACAGCCAGCAGCAACAGGAGAAGATCAAACCG
AAGGTTCGTTCGACAGTGGCGCAGCACCACG AGGCACTGGTCGGCCACGGGTTTACACACGCG
CACATCGTTGCGTTAAGCCAACACCCGGCAG CGTTAGGGACCGTCGCTGTCAAGTATCAGGAC
ATGATCGCAGCGTTGCCAGAGGCGACACACG AAGCGATCGTTGGCGTCGGCAAACAGTGGTCC
GGCGCACGCGCTCTGGAGGCCTTGCTCACGG TGGCGGGAGAGTTGAGAGGTCCACCGTTACAG
TTGGACACAGGCCAACTTCTCAAGATTGCAA AACGTGGCGGCGTGACCGCAGTGGAGGCAGTG
CATGCATGGCGCAATGCACTGACGGGTGCCC CGCTCAACTTGACCCCCCAGCAGGTGGTGGCC
ATCGCCAGCAATAATGGTGGCAAGCAGGCGC TGGAGACGGTCCAGCGGCTGTTGCCGGTGCTG
TGCCAGGCCCACGGCTTGACCCCCCAGCAGG TGGTGGCCATCGCCAGCAATGGCGGTGGCAAG
CAGGCGCTGGAGACGGTCCAGCGGCTGTTGC CGGTGCTGTGCCAGGCCCACGGCTTGACCCCC
CAGCAGGTGGTGGCCATCGCCAGCAATAATG GTGGCAAGCAGGCGCTGGAGACGGTCCAGCGG
CTGTTGCCGGTGCTGTGCCAGGCCCACGGCT TGACCCCGGAGCAGGTGGTGGCCATCGCCAGC
AATATTGGTGGCAAGCAGGCGCTGGAGACGG TGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCC
CACGGCTTGACCCCCCAGCAGGTGGTGGCCA TCGCCAGCAATGGCGGTGGCAAGCAGGCGCTG
GAGACGGTCCAGCGGCTGTTGCCGGTGCTGT GCCAGGCCCACGGCTTGACCCCGGAGCAGGTG
GTGGCCATCGCCAGCCACGATGGCGGCAAGC AGGCGCTGGAGACGGTCCAGCGGCTGTTGCCG
GTGCTGTGCCAGGCCCACGGCTTGACCCCGG AGCAGGTGGTGGCCATCGCCAGCCACGATGGC
GGCAAGCAGGCGCTGGAGACGGTCCAGCGGC TGTTGCCGGTGCTGTGCCAGGCCCACGGCTTG
ACCCCGGAGCAGGTGGTGGCCATCGCCAGCC ACGATGGCGGCAAGCAGGCGCTGGAGACGGTC
CAGCGGCTGTTGCCGGTGCTGTGCCAGGCCC ACGGCTTGACCCCGGAGCAGGTGGTGGCCATC
GCCAGCCACGATGGCGGCAAGCAGGCGCTGG AGACGGTCCAGCGGCTGTTGCCGGTGCTGTGC
CAGGCCCACGGCTTGACCCCGGAGCAGGTGG TGGCCATCGCCAGCCACGATGGCGGCAAGCAG
GCGCTGGAGACGGTCCAGCGGCTGTTGCCGG TGCTGTGCCAGGCCCACGGCTTGACCCCGGAG
CAGGTGGTGGCCATCGCCAGCCACGATGGCG GCAAGCAGGCGCTGGAGACGGTCCAGCGGCTG
TTGCCGGTGCTGTGCCAGGCCCACGGCTTGA CCCCGGAGCAGGTGGTGGCCATCGCCAGCAAT
ATTGGTGGCAAGCAGGCGCTGGAGACGGTGC AGGCGCTGTTGCCGGTGCTGTGCCAGGCCCAC
GGCTTGACCCCCCAGCAGGTGGTGGCCATCG CCAGCAATAATGGTGGCAAGCAGGCGCTGGAG
ACGGTCCAGCGGCTGTTGCCGGTGCTGTGCC AGGCCCACGGCTTGACCCCGGAGCAGGTGGTG
GCCATCGCCAGCCACGATGGCGGCAAGCAGG CGCTGGAGACGGTCCAGCGGCTGTTGCCGGTG
CTGTGCCAGGCCCACGGCTTGACCCCGGAGC AGGTGGTGGCCATCGCCAGCAATATTGGTGGC
AAGCAGGCGCTGGAGACGGTGCAGGCGCTGT TGCCGGTGCTGTGCCAGGCCCACGGCTTGACC
CCTCAGCAGGTGGTGGCCATCGCCAGCAATG GCGGCGGCAGGCCGGCGCTGGAGAGCATTGTT
GCCCAGTTATCTCGCCCTGATCCGGCGTTGG CCGCGTTGACCAACGACCACCTCGTCGCCTTG
GCCTGCCTCGGCGGGCGTCCTGCGCTGGATG CAGTGAAAAAGGGATTGGGGGATCCTATCAGC
CGTTCCCAGCTGGTGAAGTCCGAGCTGGAGG AGAAGAAATCCGAGTTGAGGCACAAGCTGAAG
TACGTGCCCCACGAGTACATCGAGCTGATCG AGATCGCCCGGAACAGCACCCAGGACCGTATC
CTGGAGATGAAGGTGATGGAGTTCTTCATGA AGGTGTACGGCTACAGGGGCAAGCACCTGGGC
GGCTCCAGGAAGCCCGACGGCGCCATCTACA CCGTGGGCTCCCCCATCGACTACGGCGTGATC
GTGGACACCAAGGCCTACTCCGGCGGCTACA ACCTGCCCATCGGCCAGGCCGACGAAATGCAG
AGGTACGTGGAGGAGAACCAGACCAGGAACA AGCACATCAACCCCAACGAGTGGTGGAAGGTG
TACCCCTCCAGCGTGACCGAGTTCAAGTTCC TGTTCGTGTCCGGCCACTTCAAGGGCAACTAC
AAGGCCCAGCTGACCAGGCTGAACCACATCA CCAACTGCAACGGCGCCGTGCTGTCCGTGGAG
GAGCTCCTGATCGGCGGCGAGATGATCAAGG CCGGCACCCTGACCCTGGAGGAGGTGAGGAGG
AAGTTCAACAACGGCGAGATCAACTTCGCGG CCGACTGATAACTCGAGCGATCCTCTAGACGA
GCTCCTCGAGCCTGCAGCAGCTGAAGCTTTG GACTTCTTCGCCAGAGGTTTGGTCAAGTCTCC
AATCAAGGTTGTCGGCTTGTCTACCTTGCCA GAAATTTACGAAAAGATGGAAAAGGGTCAAAT
CGTTGGTAGATACGTTGTTGACACTTCTAAA TAAGCGAATTTCTTATGATTTATGATTTTTAT
TATTAAATAAGTTATAAAAAAAATAAGTGTA TACAAATTTTAAAGTGACTCTTAGGTTTTAAA
ACGAAAATTCTTATTCTTGAGTAACTCTTTC CTGTAGGTCAGGTTGCTTTCTCAGGTATAGCA
TGAGGTCGCTCTTATTGACCACACCTCTACC GGCATGCAAGCTTGGCGTAATCATGGTCATAG
CTGTTTCCTGTGTGAAATTGTTATCCGCTCA
CAATTCCACACAACATACGAGCCGGAAGCATA
AAGTGTAAAGCCTGGGGTGCCTAATGAGTGA GCTAACTCACATTAATTGCGTTGCGCTCACTG
CCCGCTTTCCAGTCGGGAAACCTGTCGTGCC AGCAGATCTATTACATTATGGGTGGTATGTTG
GAATAAAAATCAACTATCATCTACTAACTAG TATTTACGTTACTAGTATATTATCATATACGG
TGTTAGAAGATGACGCAAATGATGAGAAATA GTCATCTAAATTAGTGGAAGCTGAAACGCAAG
GATTGATAATGTAATAGGATCAATGAATATT AACATATAAAATGATGATAATAATATTTATAG
AATTGTGTAGAATTGCAGATTCCCTTTTATG GATTCCTAAATCCTCGAGGAGAACTTCTAGTA
TATCTACATACCTAATATTATTGCCTTATTA AAAATGGAATCCCAACAATTACATCAAAATCC
ACATTCTCTTCAAAATCAATTGTCCTGTACT TCCTTGTTCATGTGTGTTCAAAAACGTTATAT
TTATAGGATAATTATACTCTATTTCTCAACA AGTAATTGGTTGTTTGGCCGAGCGGTCTAAGG
CGCCTGATTCAAGAAATATCTTGACCGCAGT TAACTGTGGGAATACTCAGGTATCGTAAGATG
CAAGAGTTCGAATCTCTTAGCAACCATTATT TTTTTCCTCAACATAACGAGAACACACAGGGG
CGCTATCGCACAGAATCAAATTCGATGACTG GAAATTTTTTGTTAATTTCAGAGGTCGCCTGA
CGCATATACCTTTTTCAACTGAAAAATTGGG AGAAAAAGGAAAGGTGAGAGCCGCGGAACCGG
CTTTTCATATAGAATAGAGAAGCGTTCATGA CTAAATGCTTGCATCACAATACTTGAAGTTGA
CAATATTATTTAAGGACCTATTGTTTTTTCC AATAGGTGGTTAGCAATCGTCTTACTTTCTAA
CTTTTCTTACCTTTTACATTTCAGCAATATA TATATATATATTTCAAGGATATACCATTCTAA
TGTCTGCCCCTAAGAAGATCGTCGTTTTGCC AGGTGACCACGTTGGTCAAGAAATCACAGCCG
AAGCCATTAAGGTTCTTAAAGCTATTTCTGA TGTTCGTTCCAATGTCAAGTTCGATTTCGAAA
ATCATTTAATTGGTGGTGCTGCTATCGATGC TACAGGTGTCCCACTTCCAGATGAGGCGCTGG
AAGCCTCCAAGAAGGTTGATGCCGTTTTGTT AGGTGCTGTGGGTGGTCCTAAATGGGGTACCG
GTAGTGTTAGACCTGAACAAGGTTTACTAAA AATCCGTAAAGAACTTCAATTGTACGCCAACT
TAAGACCATGTAACTTTGCATCCGACTCTCT TTTAGACTTATCTCCAATCAAGCCACAATTTG
CTAAAGGTACTGACTTCGTTGTTGTCAGAGA ATTAGTGGGAGGTATTTACTTTGGTAAGAGAA
AGGAAGACGATGGTGATGGTGTCGCTTGGGA TAGTGAACAATACACCGTTCCAGAAGTGCAAA
GAATCACAAGAATGGCCGCTTTCATGGCCCT ACAACATGAGCCACCATTGCCTATTTGGTCCT
TGGATAAAGCTAATGTTTTGGCCTCTTCAAG ATTATGGAGAAAAACTGTGGAGGAAACCATCA
AGAACGAATTCCCTACATTGAAGGTTCAACA TCAATTGATTGATTCTGCCGCCATGATCCTAG
TTAAGAACCCAACCCACCTAAATGGTATTAT AATCACCAGCAACATGTTTGGTGATATCATCT
CCGATGAAGCCTCCGTTATCCCAGGTTCCTT GGGTTTGTTGCCATCTGCGTCCTTGGCCTCTT
TGCCAGACAAGAACACCGCATTTGGTTTGTA CGAACCATGCCACGGTTCTGCTCCAGATTTGC
CAAAGAATAAGGTCAACCCTATCGCCACTAT CTTGTCTGCTGCAATGATGTTGAAATTGTCAT
TGAACTTGCCTGAAGAAGGTAAGGCCATTGA AGATGCAGTTAAAAAGGTTTTGGATGCAGGTA
TCAGAACTGGTGATTTAGGTGGTTCCAACAG TACCACGGAAGTCGGTGATGCTGTCGCCGAAG
AAGTTAAGAAAATCCTTGCTTAAAAAGATTC TCTTTTTTTATGATATTTGTACATAAACTTTA
TAAATGAAATTCATAATAGAAACGACACGAA ATTACAAAATGGAATATGTTCATAGGGTAGAC
GAAACTATATACGCAATCTACATACATTTAT CAAGAAGGAGAAAAAGGAGGATGTAAAGGAAT
ACAGGTAAGCAAATTGATACTAATGGCTCAA CGTGATAAGGAAAAAGAATTGCACTTTAACAT
TAATATTGACAAGGAGGAGGGCACCACACAA AAAGTTAGGTGTAACAGAAAATCATGAAACTA
TGATTCCTAATTTATATATTGGAGGATTTTC TCTAAAAAAAAAAAAATACAACAAATAAAAAA
CACTCAATGACCTGACCATTTGATGGAGTTT AAGTCAATACCTTCTTGAACCATTTCCCATAA
TGGTGAAAGTTCCCTCAAGAATTTTACTCTG TCAGAAACGGCCTTAACGACGTAGTCGACCTC
CTCTTCAGTACTAAATCTACCAATACCAAAT CTGATGGAAGAATGGGCTAATGCATCATCCTT
ACCCAGCGCATGTAAAACATAAGAAGGTTCT AGGGAAGCAGATGTACAGGCTGAACCCGAGGA
TAATGCGATATCCCTTAGTGCCATCAATAAA GATTCTCCTTCCACGTAGGCGAAAGAAACGTT
AACACACCCTGGATAACGATGATCTGGAGAT CCGTTCAACGTGGTATGTTCAGCGGATAATAG
ACCTTTGACTAATTTATCGGATAGTCTTTTG ATGTGAGCTTGGTCGTTGTCAAATTCTTTCTT
CATCAATCTCGCAGCTTCACCAAATCCCGCT ACCAATGGGGGGGCCAAAGTACCAGATCTGCT
GCATTAATGAATCGGCCAACGCGCGGGGAGA GGCGGTTTGCGTATTGGGCGCTCTTCCGCTTC
CTCGCTCACTGACTCGCTGCGCTCGGTCGTT CGGCTGCGGCGAGCGGTATCAGCATCGATGAA
TTCCACGGACTATAGACTATACTAGTATACT CCGTCTACTGTACGATACACTTCCGCTCAGGT
CCTTGTCCTTTAACGAGGCCTTACCACTCTT TTGTTACTCTATTGATCCAGCTCAGCAAAGGC
AGTGTGATCTAAGATTCTATCTTCGCGATGT AGTAAAACTAGCTAGACCGAGAAAGAGACTAG
AAATGCAAAAGGCACTTCTACAATGGCTGCC ATCATTATTATCCGATGTGACGCTGCAGCTTC
TCAATGATATTCGAATACGCTTTGAGGAGAT ACAGCCTAATATCCGACAAACTGTTTTACAGA
TTTACGATCGTACTTGTTACCCATCATTGAA TTTTGAACATCCGAACCTGGGAGTTTTCCCTG
AAACAGATAGTATATTTGAACCTGTATAATA ATATATAGTCTAGCGCTTTACGGAAGACAATG
TATGTATTTCGGTTCCTGGAGAAACTATTGC ATCTATTGCATAGGTAATCTTGCACGTCGCAT
CCCCGGTTCATTTTCTGCGTTTCCATCTTGC ACTTCAATAGCATATCTTTGTTAACGAAGCAT
CTGTGCTTCATTTTGTAGAACAAAAATGCAA CGCGAGAGCGCTAATTTTTCAAACAAAGAATC
TGAGCTGCATTTTTACAGAACAGAAATGCAA CGCGAAAGCGCTATTTTACCAACGAAGAATCT
GTGCTTCATTTTTGTAAAACAAAAATGCAAC GCGAGAGCGCTAATTTTTCAAACAAAGAATCT
GAGCTGCATTTTTACAGAACAGAAATGCAAC GCGAGAGCGCTATTTTACCAACAAAGAATCTA
TACTTCTTTTTTGTTCTACAAAAATGCATCC
CGAGAGCGCTATTTTTCTAACAAAGCATCTTA GATTACTTTTTTTCTCCTTTGTGCGCTCTAT
AATGCAGTCTCTTGATAACTTTTTGCACTGTA GGTCCGTTAAGGTTAGAAGAAGGCTACTTTG
GTGTCTATTTTCTCTTCCATAAAAAAAGCCTG ACTCCACTTCCCGCGTTTACTGATTACTAGC
GAAGCTGCGGGTGCATTTTTTCAAGATAAAGG CATCCCCGATTATATTCTATACCGATGTGGA
TTGCGCATACTTTGTGAACAGAAAGTGATAGC GTTGATGATTCTTCATTGGTCAGAAAATTAT
GAACGGTTTCTTCTATTTTGTCTCTATATACT ACGTATAGGAAATGTTTACATTTTCGTATTG
TTTTCGATTCACTCTATGAATAGTTCTTACTA CAATTTTTTTGTCTAAAGAGTAATACTAGAG
ATAAACATAAAAAATGTAGAGGTCGAGTTTAG ATGCAAGTTCAAGGAGCGAAAGGTGGATGGG
TAGGTTATATAGGGATATAGCACAGAGATATA TAGCAAAGAGATACTTTTGAGCAATGTTTGT
GGAAGCGGTATTCGCAATATTTTAGTAGCTCG TTACAGTCCGGTGCGTTTTTGGTTTTTTGAA
AGTGCGTCTTCAGAGCGCTTTTGGTTTTCAAA AGCGCTCTGAAGTTCCTATACTTTCTAGAGA
ATAGGAACTTCGGAATAGGAACTTCAAAGCGT TTCCGAAAACGAGCGCTTCCGAAAATGCAAC
GCGAGCTGCGCACATACAGCTCACTGTTCACG TCGCACCTATATCTGCGTGTTGCCTGTATAT
ATATATACATGAGAAGAACGGCATAGTGCGTG TTTATGCTTAAATGCGTACTTATATGCGTCT
ATTTATGTAGGATGAAAGGTAGTCTAGTACCT CCTGTGATATTATCCCATTCCATGCGGGGTA
TCGTATGCTTCCTTCAGCACTACCCTTTAGCT GTTCTATATGCTGCCACTCCTCAATTGGATT
AGTCTCATCCTTCAATGCTATCATTTCCTTTG ATATTGGATCATATGCATAGTACCGAGAAAC
TAGTGCGAAGTAGTGATCAGGTATTGCTGTTA TCTGATGAGTATACGTTGTCCTGGCCACGGC
AGAAGCACGCTTATCGCTCCAATTTCCCACAA CATTAGTCAACTCCGTTAGGCCCTTCATTGA
AAGAAATGAGGTCATCAAATGTCTTCCAATGT GAGATTTTGGGCCATTTTTTATAGCAAAGAT
TGAATAAGGCGCATTTTTCTTCAAAGCTTTAT TGTACGATCTGACTAAGTTATCTTTTAATAA
TTGGTATTCCTGTTTATTGCTTGAAGAATTGC CGGTCCTATTTACTCGTTTTAGGACTGGTTC
AGAATTCATCGATGCTCACTCAAAGGTCGGTA ATACGGTTATCCACAGAATCAGGGGATAACG
CAGGAAAGAACATGTGAGCAAAAGGCCAGCAA AAGGCCAGGAACCGTAAAAAGGCCGCGTTGC
TGGCGTTTTTCCATAGGCTCCGCCCCCCTGAC GAGCATCACAAAAATCGACGCTCAAGTCAGA
GGTGGCGAAACCCGACAGGACTATAAAGATAC CAGGCGTTTCCCCCTGGAAGCTCCCTCGTGC
GCTCTCCTGTTCCGACCCTGCCGCTTACCGGA TACCTGTCCGCCTTTCTCCCTTCGGGAAGCG
TGGCGCTTTCTCATAGCTCACGCTGTAGGTAT CTCAGTTCGGTGTAGGTCGTTCGCTCCAAGC
TGGGCTGTGTGCACGAACCCCCCGTTCAGCCC GACCGCTGCGCCTTATCCGGTAACTATCGTC
TTGAGTCCAACCCGGTAAGACACGACTTATCG CCACTGGCAGCAGCCACTGGTAACAGGATTA
GCAGAGCGAGGTATGTAGGCGGTGCTACAGAG TTCTTGAAGTGGTGGCCTAACTACGGCTACA
CTAGAAGGACAGTATTTGGTATCTGCGCTCTG CTGAAGCCAGTTACCTTCGGAAAAAGAGTTG
GTAGCTCTTGATCCGGCAAACAAACCACCGCT GGTAGCGGTGGTTTTTTTGTTTGCAAGCAGC
AGATTACGCGCAGAAAAAAAGGATCTCAAGAA GATCCTTTGATCTTTTCTACGGGGTCTGACG
CTCAGTGGAACGAAAACTCACGTTAAGGGATT TTGGTCATGAGATTATCAAAAAGGATCTTCA
CCTAGATCCTTTTAAATTAAAAATGAAGTTTT AAATCAATCTAAAGTATATATGAGTAAACTT
GGTCTGACAGTTACCAATGCTTAATCAGTGAG GCACCTATCTCAGCGATCTGTCTATTTCGTT
CATCCATAGTTGCCTGACTCCCCGTCGTGTAG ATAACTACGATACGGGAGGGCTTACCATCTG
GCCCCAGTGCTGCAATGATACCGCGAGACCCA CGCTCACCGGCTCCAGATTTATCAGCAATAA
ACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGT GGTCCTGCAACTTTATCCGCCTCCATCCAGT
CTATTAATTGTTGCCGGGAAGCTAGAGTAAGT AGTTCGCCAGTTAATAGTTTGCGCAACGTTG
TTGCCATTGCTACAGGCATCGTGGTGTCACGC TCGTCGTTTGGTATGGCTTCATTCAGCTCCG
GTTCCCAACGATCAAGGCGAGTTACATGATCC CCCATGTTGTGCAAAAAAGCGGTTAGCTCCT
TCGGTCCTCCGATCGTTGTCAGAAGTAAGTTG GCCGCAGTGTTATCACTCATGGTTATGGCAG
CACTGCATAATTCTCTTACTGTCATGCCATCC GTAAGATGCTTTTCTGTGACTGGTGAGTACT
CAACCAAGTCATTCTGAGAATAGTGTATGCGG CGACCGAGTTGCTCTTGCCCGGCGTCAATAC
GGGATAATACCGCGCCACATAGCAGAACTTTA AAAGTGCTCATCATTGGAAAACGTTCTTCGG
GGCGAAAACTCTCAAGGATCTTACCGCTGTTG AGATCCAGTTCGATGTAACCCACTCGTGCAC
CCAACTGATCTTCAGCATCTTTTACTTTCACC AGCGTTTCTGGGTGAGCAAAAACAGGAAGGC
AAAATGCCGCAAAAAAGGGAATAAGGGCGACA CGGAAATGTTGAATACTCATACTCTTCCTTT
TTCAATATTATTGAAGCATTTATCAGGGTTAT TGTCTCATGAGCGGATACATATTTGAATGTA
TTTAGAAAAATAAACAAATAG
[0509] The nucleic acid of SEQ ID NO: 2 (carried by plasmid
pCLS16715) codes for the TALEN arm that binds to the DNA target
site of SEQ ID NO: 4 (cf. FIG. 1B). Hence, the nucleic acid of SEQ
ID NO: 2 comprises a sequence, which codes for adjacent units of
TAL effector tandem repeat that determine recognition of the DNA
target site of SEQ ID NO: 4, and which codes for an endonuclease.
The endonuclease is the monomer of a dimeric endonuclease, i.e., a
FokI monomer. The sequence, which codes for adjacent units of TAL
effector tandem repeat and for an endonuclease, is preceded by a
promoter and an enhancer, and is followed by a terminator.
[0510] The nucleic acid of SEQ ID NO: 2 further comprises a
sequence, which codes for a selection marker, i.e., a leucine
selection marker.
[0511] The nucleic acid of SEQ ID NO: 2 further comprises a
replication origin, i.e., the 2-micron replication origin.
[0512] More particularly, the nucleic acid of SEQ ID NO: 2 (carried
by plasmid pCLS16715) comprises: [0513] a GAL10 enhancer at
positions 43-408 [SEQ ID NO: 37, as in plasmid pCLS9996]; [0514] a
CYC1 promoter at positions 409-648 [SEQ ID NO: 38, as in plasmid
pCLS9996]; [0515] a sequence coding for a TALEN arm (TALEN arm that
binds to the DNA target site of SEQ ID NO: 4) at positions 663-3476
[SEQ ID NO: 50]; [0516] an ADH1 terminator at positions 3525-3844
[SEQ ID NO: 51]; [0517] a sequence coding for the LEU2 selection
marker at positions 4946-6040 [SEQ ID NO: 52]; [0518] the 2-micron
replication origin at positions 7583-8927 [SEQ ID NO: 53].
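The feature coordinates listed above can be cross-checked with a short Python sketch. It is illustrative only: the names `FEATURES`, `feature_lengths` and `overlapping_features` are hypothetical, the coordinates are copied from the list above, and positions are assumed to be 1-based and inclusive.

```python
# Feature coordinates within SEQ ID NO: 2, copied from the list above.
# Assumption: positions are 1-based and inclusive.
FEATURES = {
    "GAL10 enhancer": (43, 408),
    "CYC1 promoter": (409, 648),
    "TALEN arm coding sequence": (663, 3476),
    "ADH1 terminator": (3525, 3844),
    "LEU2 coding sequence": (4946, 6040),
    "2-micron origin": (7583, 8927),
}


def feature_lengths(features):
    """Return the length in nucleotides of each feature."""
    return {name: end - start + 1 for name, (start, end) in features.items()}


def overlapping_features(features):
    """Return pairs of neighbouring features whose coordinates overlap."""
    ordered = sorted(features.items(), key=lambda item: item[1][0])
    clashes = []
    for (name_a, (_, end_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        if start_b <= end_a:
            clashes.append((name_a, name_b))
    return clashes


if __name__ == "__main__":
    for name, length in feature_lengths(FEATURES).items():
        print(f"{name}: {length} nt")
    print("Overlaps:", overlapping_features(FEATURES))
```

Under these assumptions, both coding sequences have lengths that are multiples of three, as expected for open reading frames that include a stop codon.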
[0519] The sequences of SEQ ID NO: 37 (GAL10 enhancer) and of SEQ
ID NO: 38 (CYC1 promoter) are described above.
[0520] The sequences of SEQ ID NOs: 50-53 are:
TABLE-US-00018 (coding for the TALEN arm that binds to the DNA
target site of SEQ ID NO: 4) SEQ ID NO: 50
ATGGGCGATCCTAAAAAGAAACGTAAGGTCATCGATTACCCATACGATGT
TCCAGATTACGCTATCGATATCGCCGATCTACGCACGCTCGGCTACAGCC
AGCAGCAACAGGAGAAGATCAAACCGAAGGTTCGTTCGACAGTGGCGCAG
CACCACGAGGCACTGGTCGGCCACGGGTTTACACACGCGCACATCGTTGC
GTTAAGCCAACACCCGGCAGCGTTAGGGACCGTCGCTGTCAAGTATCAGG
ACATGATCGCAGCGTTGCCAGAGGCGACACACGAAGCGATCGTTGGCGTC
GGCAAACAGTGGTCCGGCGCACGCGCTCTGGAGGCCTTGCTCACGGTGGC
GGGAGAGTTGAGAGGTCCACCGTTACAGTTGGACACAGGCCAACTTCTCA
AGATTGCAAAACGTGGCGGCGTGACCGCAGTGGAGGCAGTGCATGCATGG
CGCAATGCACTGACGGGTGCCCCGCTCAACTTGACCCCCCAGCAGGTGGT
GGCCATCGCCAGCAATAATGGTGGCAAGCAGGCGCTGGAGACGGTCCAGC
GGCTGTTGCCGGTGCTGTGCCAGGCCCACGGCTTGACCCCCCAGCAGGTG
GTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTCCA
GCGGCTGTTGCCGGTGCTGTGCCAGGCCCACGGCTTGACCCCCCAGCAGG
TGGTGGCCATCGCCAGCAATAATGGTGGCAAGCAGGCGCTGGAGACGGTC
CAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCACGGCTTGACCCCGGAGCA
GGTGGTGGCCATCGCCAGCAATATTGGTGGCAAGCAGGCGCTGGAGACGG
TGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCACGGCTTGACCCCCCAG
CAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGAC
GGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCACGGCTTGACCCCGG
AGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAG
ACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCACGGCTTGACCCC
GGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGG
AGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCACGGCTTGACC
CCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCT
GGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCACGGCTTGA
CCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCG
CTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCACGGCTT
GACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGG
CGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCACGGC
TTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCA
GGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCACG
GCTTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATATTGGTGGCAAG
CAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCA
CGGCTTGACCCCCCAGCAGGTGGTGGCCATCGCCAGCAATAATGGTGGCA
AGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCC
CACGGCTTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGG
CAAGCAGGCGCTGGAGACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGG
CCCACGGCTTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATATTGGT
GGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCA
GGCCCACGGCTTGACCCCTCAGCAGGTGGTGGCCATCGCCAGCAATGGCG
GCGGCAGGCCGGCGCTGGAGAGCATTGTTGCCCAGTTATCTCGCCCTGAT
CCGGCGTTGGCCGCGTTGACCAACGACCACCTCGTCGCCTTGGCCTGCCT
CGGCGGGCGTCCTGCGCTGGATGCAGTGAAAAAGGGATTGGGGGATCCTA
TCAGCCGTTCCCAGCTGGTGAAGTCCGAGCTGGAGGAGAAGAAATCCGAG
TTGAGGCACAAGCTGAAGTACGTGCCCCACGAGTACATCGAGCTGATCGA
GATCGCCCGGAACAGCACCCAGGACCGTATCCTGGAGATGAAGGTGATGG
AGTTCTTCATGAAGGTGTACGGCTACAGGGGCAAGCACCTGGGCGGCTCC
AGGAAGCCCGACGGCGCCATCTACACCGTGGGCTCCCCCATCGACTACGG
CGTGATCGTGGACACCAAGGCCTACTCCGGCGGCTACAACCTGCCCATCG
GCCAGGCCGACGAAATGCAGAGGTACGTGGAGGAGAACCAGACCAGGAAC
AAGCACATCAACCCCAACGAGTGGTGGAAGGTGTACCCCTCCAGCGTGAC
CGAGTTCAAGTTCCTGTTCGTGTCCGGCCACTTCAAGGGCAACTACAAGG
CCCAGCTGACCAGGCTGAACCACATCACCAACTGCAACGGCGCCGTGCTG
TCCGTGGAGGAGCTCCTGATCGGCGGCGAGATGATCAAGGCCGGCACCCT
GACCCTGGAGGAGGTGAGGAGGAAGTTCAACAACGGCGAGATCAACTTCG CGGCCGACTGA
(ADH1 terminator) SEQ ID NO: 51
TTTGGACTTCTTCGCCAGAGGTTTGGTCAAGTCTCCAATCAAGGTTGTCG
GCTTGTCTACCTTGCCAGAAATTTACGAAAAGATGGAAAAGGGTCAAATC
GTTGGTAGATACGTTGTTGACACTTCTAAATAAGCGAATTTCTTATGATT
TATGATTTTTATTATTAAATAAGTTATAAAAAAAATAAGTGTATACAAAT
TTTAAAGTGACTCTTAGGTTTTAAAACGAAAATTCTTATTCTTGAGTAAC
TCTTTCCTGTAGGTCAGGTTGCTTTCTCAGGTATAGCATGAGGTCGCTCT
TATTGACCACACCTCTACCG (coding for the LEU2 selection marker) SEQ ID
NO: 52 ATGTCTGCCCCTAAGAAGATCGTCGTTTTGCCAGGTGACCACGTTGGTCA
AGAAATCACAGCCGAAGCCATTAAGGTTCTTAAAGCTATTTCTGATGTTC
GTTCCAATGTCAAGTTCGATTTCGAAAATCATTTAATTGGTGGTGCTGCT
ATCGATGCTACAGGTGTCCCACTTCCAGATGAGGCGCTGGAAGCCTCCAA
GAAGGTTGATGCCGTTTTGTTAGGTGCTGTGGGTGGTCCTAAATGGGGTA
CCGGTAGTGTTAGACCTGAACAAGGTTTACTAAAAATCCGTAAAGAACTT
CAATTGTACGCCAACTTAAGACCATGTAACTTTGCATCCGACTCTCTTTT
AGACTTATCTCCAATCAAGCCACAATTTGCTAAAGGTACTGACTTCGTTG
TTGTCAGAGAATTAGTGGGAGGTATTTACTTTGGTAAGAGAAAGGAAGAC
GATGGTGATGGTGTCGCTTGGGATAGTGAACAATACACCGTTCCAGAAGT
GCAAAGAATCACAAGAATGGCCGCTTTCATGGCCCTACAACATGAGCCAC
CATTGCCTATTTGGTCCTTGGATAAAGCTAATGTTTTGGCCTCTTCAAGA
TTATGGAGAAAAACTGTGGAGGAAACCATCAAGAACGAATTCCCTACATT
GAAGGTTCAACATCAATTGATTGATTCTGCCGCCATGATCCTAGTTAAGA
ACCCAACCCACCTAAATGGTATTATAATCACCAGCAACATGTTTGGTGAT
ATCATCTCCGATGAAGCCTCCGTTATCCCAGGTTCCTTGGGTTTGTTGCC
ATCTGCGTCCTTGGCCTCTTTGCCAGACAAGAACACCGCATTTGGTTTGT
ACGAACCATGCCACGGTTCTGCTCCAGATTTGCCAAAGAATAAGGTCAAC
CCTATCGCCACTATCTTGTCTGCTGCAATGATGTTGAAATTGTCATTGAA
CTTGCCTGAAGAAGGTAAGGCCATTGAAGATGCAGTTAAAAAGGTTTTGG
ATGCAGGTATCAGAACTGGTGATTTAGGTGGTTCCAACAGTACCACGGAA
GTCGGTGATGCTGTCGCCGAAGAAGTTAAGAAAATCCTTGCTTAA (2-Micron replication
origin) SEQ ID NO: 53
AACGAAGCATCTGTGCTTCATTTTGTAGAACAAAAATGCAACGCGAGAGC
GCTAATTTTTCAAACAAAGAATCTGAGCTGCATTTTTACAGAACAGAAAT
GCAACGCGAAAGCGCTATTTTACCAACGAAGAATCTGTGCTTCATTTTTG
TAAAACAAAAATGCAACGCGAGAGCGCTAATTTTTCAAACAAAGAATCTG
AGCTGCATTTTTACAGAACAGAAATGCAACGCGAGAGCGCTATTTTACCA
ACAAAGAATCTATACTTCTTTTTTGTTCTACAAAAATGCATCCCGAGAGC
GCTATTTTTCTAACAAAGCATCTTAGATTACTTTTTTTCTCCTTTGTGCG
CTCTATAATGCAGTCTCTTGATAACTTTTTGCACTGTAGGTCCGTTAAGG
TTAGAAGAAGGCTACTTTGGTGTCTATTTTCTCTTCCATAAAAAAAGCCT
GACTCCACTTCCCGCGTTTACTGATTACTAGCGAAGCTGCGGGTGCATTT
TTTCAAGATAAAGGCATCCCCGATTATATTCTATACCGATGTGGATTGCG
CATACTTTGTGAACAGAAAGTGATAGCGTTGATGATTCTTCATTGGTCAG
AAAATTATGAACGGTTTCTTCTATTTTGTCTCTATATACTACGTATAGGA
AATGTTTACATTTTCGTATTGTTTTCGATTCACTCTATGAATAGTTCTTA
CTACAATTTTTTTGTCTAAAGAGTAATACTAGAGATAAACATAAAAAATG
TAGAGGTCGAGTTTAGATGCAAGTTCAAGGAGCGAAAGGTGGATGGGTAG
GTTATATAGGGATATAGCACAGAGATATATAGCAAAGAGATACTTTTGAG
CAATGTTTGTGGAAGCGGTATTCGCAATATTTTAGTAGCTCGTTACAGTC
CGGTGCGTTTTTGGTTTTTTGAAAGTGCGTCTTCAGAGCGCTTTTGGTTT
TCAAAAGCGCTCTGAAGTTCCTATACTTTCTAGAGAATAGGAACTTCGGA
ATAGGAACTTCAAAGCGTTTCCGAAAACGAGCGCTTCCGAAAATGCAACG
CGAGCTGCGCACATACAGCTCACTGTTCACGTCGCACCTATATCTGCGTG
TTGCCTGTATATATATATACATGAGAAGAACGGCATAGTGCGTGTTTATG
CTTAAATGCGTACTTATATGCGTCTATTTATGTAGGATGAAAGGTAGTCT
AGTACCTCCTGTGATATTATCCCATTCCATGCGGGGTATCGTATGCTTCC
TTCAGCACTACCCTTTAGCTGTTCTATATGCTGCCACTCCTCAATTGGAT
TAGTCTCATCCTTCAATGCTATCATTTCCTTTGATATTGGATCAT
[0521] In plasmid pCLS16715, the sequence coding for the TALEN arm
(SEQ ID NO: 50) comprises: [0522] a sequence coding for 15 adjacent
units of TAL effector tandem repeat, and [0523] a sequence coding
for an endonuclease.
[0524] The 15 adjacent units of TAL effector tandem repeat are an N-
to C-ordered series of 15 adjacent units each consisting of 34
amino acids. The last C-terminal unit of 34 amino acids is followed
by one (truncated) unit of 20 amino acids.
[0525] The ordered series of 15 adjacent units determines the
recognition of a specific DNA target site (of 15 nucleotides, i.e.,
of SEQ ID NO: 4), whereas the (truncated) unit of 20 amino acids is
not involved in the specific recognition of said DNA target
site.
[0526] The sequence coding for said 15 adjacent units of 34 amino
acids is at positions 481-2010 within the TALEN coding sequence of
SEQ ID NO: 50, i.e., is:
TABLE-US-00019 [SEQ ID NO: 54] TTGACCCCCCAGCAGGTGGTGGCCATCGCCAG
CAATAATGGTGGCAAGCAGGCGCTGGAGACG GTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGC
CCACGGCTTGACCCCCCAGCAGGTGGTGGCC ATCGCCAGCAATGGCGGTGGCAAGCAGGCGCT
GGAGACGGTCCAGCGGCTGTTGCCGGTGCTG TGCCAGGCCCACGGCTTGACCCCCCAGCAGGT
GGTGGCCATCGCCAGCAATAATGGTGGCAAG CAGGCGCTGGAGACGGTCCAGCGGCTGTTGCC
GGTGCTGTGCCAGGCCCACGGCTTGACCCCG GAGCAGGTGGTGGCCATCGCCAGCAATATTGG
TGGCAAGCAGGCGCTGGAGACGGTGCAGGCG CTGTTGCCGGTGCTGTGCCAGGCCCACGGCTT
GACCCCCCAGCAGGTGGTGGCCATCGCCAGC AATGGCGGTGGCAAGCAGGCGCTGGAGACGGT
CCAGCGGCTGTTGCCGGTGCTGTGCCAGGCC CACGGCTTGACCCCGGAGCAGGTGGTGGCCAT
CGCCAGCCACGATGGCGGCAAGCAGGCGCTG GAGACGGTCCAGCGGCTGTTGCCGGTGCTGTG
CCAGGCCCACGGCTTGACCCCGGAGCAGGTG GTGGCCATCGCCAGCCACGATGGCGGCAAGCA
GGCGCTGGAGACGGTCCAGCGGCTGTTGCCG GTGCTGTGCCAGGCCCACGGCTTGACCCCGGA
GCAGGTGGTGGCCATCGCCAGCCACGATGGC GGCAAGCAGGCGCTGGAGACGGTCCAGCGGCT
GTTGCCGGTGCTGTGCCAGGCCCACGGCTTG ACCCCGGAGCAGGTGGTGGCCATCGCCAGCCA
CGATGGCGGCAAGCAGGCGCTGGAGACGGTC CAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCA
CGGCTTGACCCCGGAGCAGGTGGTGGCCATC GCCAGCCACGATGGCGGCAAGCAGGCGCTGGA
GACGGTCCAGCGGCTGTTGCCGGTGCTGTGC CAGGCCCACGGCTTGACCCCGGAGCAGGTGGT
GGCCATCGCCAGCCACGATGGCGGCAAGCAG GCGCTGGAGACGGTCCAGCGGCTGTTGCCGGT
GCTGTGCCAGGCCCACGGCTTGACCCCGGAG CAGGTGGTGGCCATCGCCAGCAATATTGGTGG
CAAGCAGGCGCTGGAGACGGTGCAGGCGCTG TTGCCGGTGCTGTGCCAGGCCCACGGCTTGAC
CCCCCAGCAGGTGGTGGCCATCGCCAGCAAT AATGGTGGCAAGCAGGCGCTGGAGACGGTCCA
GCGGCTGTTGCCGGTGCTGTGCCAGGCCCAC GGCTTGACCCCGGAGCAGGTGGTGGCCATCGC
CAGCCACGATGGCGGCAAGCAGGCGCTGGAG ACGGTCCAGCGGCTGTTGCCGGTGCTGTGCCA
GGCCCACGGCTTGACCCCGGAGCAGGTGGTG GCCATCGCCAGCAATATTGGTGGCAAGCAGGC
GCTGGAGACGGTGCAGGCGCTGTTGCCGGTG CTGTGCCAGGCCCACGGC.
[0527] The sequence of one of said 15 adjacent units of 34 amino
acids (coding sequence comprised in SEQ ID NO: 54) is:
TABLE-US-00020 [SEQ ID NO: 46] LTPQQVVAIASXXGGKQALETVQRLLPVLCQAHG,
or [SEQ ID NO: 55] LTPEQVVAIASXXGGKQALETVQALLPVLCQAHG, or [SEQ ID
NO: 25] LTPEQVVAIASXXGGKQALETVQRLLPVLCQAHG,
wherein XX is the RVD of the unit.
[0528] The N- to C-ordered series of RVDs formed by the RVDs
respectively contained in the 15 adjacent units of TAL effector
tandem repeat is:
NN; NG; NN; NI; NG; HD; HD; HD; HD; HD; HD; NI; NN; HD; NI.
[0529] The N- to C-ordered series of RVDs determines the
recognition of the DNA target site of SEQ ID NO: 4, i.e.,
GTGATCCCCCCAGCA (cf. Table 5 above; cf. FIG. 1B).
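The correspondence between the RVD series and the DNA target site can be illustrated with a minimal Python sketch. The mapping below (NI to A, HD to C, NG to T, NN to G) is the commonly used simplified RVD code and is an assumption here, since Table 5 is not reproduced in this section; NN is also known to tolerate A. The names `RVD_TO_BASE` and `rvds_to_target` are hypothetical.

```python
# Simplified RVD-to-nucleotide code (assumption; see Table 5 of the
# application for the authoritative correspondence).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}


def rvds_to_target(rvd_series):
    """Translate an N- to C-ordered, semicolon-separated RVD series
    into the 5'-3' DNA target sequence it is expected to recognize."""
    rvds = [item.strip() for item in rvd_series.rstrip(".").split(";")]
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds if rvd)


# RVD series of the 15 adjacent units, as given above.
SERIES = "NN; NG; NN; NI; NG; HD; HD; HD; HD; HD; HD; NI; NN; HD; NI."

if __name__ == "__main__":
    print(rvds_to_target(SERIES))  # GTGATCCCCCCAGCA
```

With this mapping, the 15 RVDs decode to the 15-nucleotide sequence of SEQ ID NO: 4.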
[0530] The sequence coding for said truncated unit of 20 amino
acids is the sequence of SEQ ID NO: 47 (coding for the unit of SEQ
ID NO: 48; same coding and amino acid sequences as in plasmid
pCLS9996), and is at positions 2011-2070 within the TALEN coding
sequence of SEQ ID NO: 50.
[0531] The sequence coding for the FokI monomer (same FokI monomer
as in the plasmid pCLS9996) is the sequence of SEQ ID NO: 3 (coding
for the FokI monomer of SEQ ID NO: 49), and is at positions
2212-2808 within the TALEN coding sequence of SEQ ID NO: 50.
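The positions given above for the repeat units, the truncated unit and the FokI monomer are consistent with simple codon arithmetic, as the following Python sketch shows. It assumes 1-based inclusive coordinates and three nucleotides per amino acid; the helper names `span_length` and `to_plasmid_position` are hypothetical, and the offset 663 is the start of the TALEN coding sequence within SEQ ID NO: 2 as stated earlier.

```python
CODON_NT = 3  # nucleotides per amino acid


def span_length(start, end):
    """Length of a 1-based, inclusive coordinate span."""
    return end - start + 1


def to_plasmid_position(cds_position, cds_start=663):
    """Convert a position within SEQ ID NO: 50 to a position within
    SEQ ID NO: 2, assuming the coding sequence starts at cds_start."""
    return cds_start + cds_position - 1


# Spans within SEQ ID NO: 50, copied from the paragraphs above.
repeat_nt = span_length(481, 2010)      # 15 units of 34 amino acids
truncated_nt = span_length(2011, 2070)  # truncated unit of 20 amino acids
foki_nt = span_length(2212, 2808)       # FokI monomer

if __name__ == "__main__":
    print(repeat_nt == 15 * 34 * CODON_NT)   # True
    print(truncated_nt == 20 * CODON_NT)     # True
    print(foki_nt // CODON_NT, "codons for the FokI monomer")
    print("Repeat array starts at plasmid position", to_plasmid_position(481))
```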
Sequence Data for the DNA Target Sites:
[0532] 5'-3' sequence of the DNA target site of the left-hand TALE
(cf. FIG. 1B)=
TABLE-US-00021 [SEQ ID NO: 4] GTGATCCCCCCAGCA
[0533] Sequence complementary to the sequence of the DNA target
site of the left-hand TALE (5'-3')=
TABLE-US-00022 [SEQ ID NO: 5] TGCTGGGGGGATCAC
[0534] Portion of the DNA target site of the left-hand TALE that is
the sequence of the 5' end of the tandem repeat:
TABLE-US-00023 [SEQ ID NO: 6] CAGCA,
[0535] Portion of the DNA target site of the left-hand TALE that is
the gene sequence that is immediately adjacent to the 5' end of the
tandem repeat (outside of the tandem repeat sequence):
TABLE-US-00024 [SEQ ID NO: 7] GTGATCCCCC
[0536] 5'-3' sequence of the spacer (cf. FIG. 1B)=
TABLE-US-00025 [SEQ ID NO: 8] GCAGCAGCAGCAGCAGCAGC
[0537] Sequence of the spacer on the complementary strand
(5'-3')=
TABLE-US-00026 [SEQ ID NO: 9] GCTGCTGCTGCTGCTGCTGC
[0538] 5'-3' sequence of the DNA target site of the right-hand TALE
(cf. FIG. 1B)=
TABLE-US-00027 [SEQ ID NO: 10] GCTGCTGCTGCTGCT
[0539] Sequence complementary to the DNA target site of the
right-hand TALE (5'-3')=
TABLE-US-00028 [SEQ ID NO: 11] AGCAGCAGCAGCAGC
[0540] Sequence of the split left TALE DNA-binding domain
(5'-3')=
TABLE-US-00029 [SEQ ID NO: 12] TCGCTG-
CAGGTCGGCCTCAGCCTGGCCGAAAGAAAGAAATGGTCTGTGATCCCCC- CAGCAGCAGC
[0541] Sequence complementary to the split left TALE DNA-binding
domain (5'-3')=
TABLE-US-00030 [SEQ ID NO: 13] GCTGCTGCTG-
GTCCAGCCGGAGTCGGACCGGCTTTCTTTCTTTACCAGACACTAGGGGG- CAGCGA
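The relationships between the target-site and spacer sequences listed above (SEQ ID NOs: 4 to 11) can be verified with a minimal reverse-complement sketch in Python. The function name `reverse_complement` and the constant names are hypothetical; the sequences are copied from the listing above.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the 5'-3' sequence of the complementary strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


SEQ_ID_4 = "GTGATCCCCCCAGCA"        # left-hand TALE target site
SEQ_ID_5 = "TGCTGGGGGGATCAC"        # its complementary strand
SEQ_ID_6 = "CAGCA"                  # portion inside the tandem repeat
SEQ_ID_7 = "GTGATCCCCC"             # portion adjacent to the tandem repeat
SEQ_ID_8 = "GCAGCAGCAGCAGCAGCAGC"   # spacer
SEQ_ID_9 = "GCTGCTGCTGCTGCTGCTGC"   # spacer, complementary strand
SEQ_ID_10 = "GCTGCTGCTGCTGCT"       # right-hand TALE target site
SEQ_ID_11 = "AGCAGCAGCAGCAGC"       # its complementary strand

if __name__ == "__main__":
    print(reverse_complement(SEQ_ID_4) == SEQ_ID_5)    # True
    print(reverse_complement(SEQ_ID_8) == SEQ_ID_9)    # True
    print(reverse_complement(SEQ_ID_10) == SEQ_ID_11)  # True
    print(SEQ_ID_7 + SEQ_ID_6 == SEQ_ID_4)             # True
```

The last check shows that the left-hand target site is the 10-nucleotide flanking sequence followed by the first 5 nucleotides of the repeat tract.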
[0542] Molecular analysis of survivors after TALEN induction.
Please see FIGS. 2A-2D.
[0543] FIG. 2A: Survival after galactose induction. Cells were
grown in YPLactate for 5 hours (one generation), then plated on
SC-Leu plates supplemented with 200 .mu.g/mL G418 sulfate,
containing either 20 g/L glucose or galactose. Survival was
determined as the ratio of CFU on galactose plates over CFU on
glucose plates, after 3-5 days of growth at 30.degree. C.
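The survival measure defined above is a simple ratio of colony counts. A minimal Python sketch follows; the function name `survival_ratio` and the colony counts in the usage example are hypothetical and are not data from the application.

```python
def survival_ratio(cfu_galactose, cfu_glucose):
    """Survival after TALEN induction, defined as CFU on galactose
    plates divided by CFU on glucose plates."""
    if cfu_glucose <= 0:
        raise ValueError("CFU count on glucose plates must be positive")
    return cfu_galactose / cfu_glucose


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(f"{survival_ratio(50, 200):.0%}")
```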
[0544] FIG. 2B: Molecular analysis of heterozygous diploids
(SUP4-opal/sup4-(CAG)) [(CAG).sub.13=SEQ ID NO: 27]. Red and white
colonies were picked, total genomic DNA was extracted, digested
with Eco RV, loaded on a 1% agarose gel and run overnight at 1
V/cm. The gel was vacuum transferred to a HYBOND-XL.RTM. nylon
membrane (AMERSHAM) and hybridized with a randomly-labeled probe
specific for a unique region downstream of SUP4. After washing, the
membrane was exposed overnight on a FUJIFILM FLA-9000.
[0545] FIG. 2C: DNA extracted from survivors was PCR amplified
using primers su3/su9 and in vitro digested using restriction
enzyme I-Sce I (I) or Pst I (P). For each clone, numbered 1 to 20,
the two lanes show the result of restriction with one of the two
enzymes. When both alleles are present, bands of slightly different
sizes corresponding to uncut alleles are visible in both lanes
(arrow labeled "Uncut"), along with restriction products of cut
alleles (arrows labeled "Cut"). When only the SUP4-opal allele is
present, no cut product is detected in the `I` lane (clones 8 and
11 to 20). Note that these 20 survivors correspond to the same
clones as in FIG. 2B.
[0546] FIG. 2D: Molecular analysis of homozygous diploids
(sup4-(CAG)/sup4-(CAG)). Same as FIG. 2B, except that total genomic
DNA was digested with Ssp I.
[(CAG).sub.29=SEQ ID NO: 28; (CAG).sub.15=SEQ ID NO: 29;
(CAG).sub.3=SEQ ID NO: 30]
[0547] Karyotypes and sequencing of TALEN-induced yeast colonies.
Please see FIGS. 3A-3D. FIG. 3A: Sanger sequencing of survivors.
PCR fragment amplified with su3/su9 (FIG. 2C) was sequenced using a
primer (su7) located ca. 210 bp upstream of the repeat tract.
[0548] Upper and lower graphs: when only one allele was present,
one unique sequence was read [upper graph, homozygous
(CTG).sub.9/(CTG).sub.9 ((CTG).sub.9=SEQ ID NO: 14); the sequence
reads:
TABLE-US-00031 (SEQ ID NO: 15)]
TAGCCGGGAATG(CTG).sub.9GGGGGATCACAGACCATTTCTTTCTT.
[0549] When two alleles of different lengths were present, the
sequence was blurry and unreadable after the shorter of the two
repeat tracts [lower graph, heterozygous (CTG).sub.9/(CTG).sub.n
((CTG).sub.9=SEQ ID NO: 14); the sequence reads:
TABLE-US-00032 (SEQ ID NO: 16)]
TAGCCGGGAATG(CTG).sub.9GGGGGATCACATACTTTTTTTTTCTTTCG.
[0551] The freeware 4PEAKS was used to visualize sequences.
[0552] Histogram at the bottom of FIG. 3B: length distribution of
alleles in homozygous and heterozygous survivors to TALEN
induction. The values read as shown in Table 1 below.
TABLE-US-00033 TABLE 1
Final length of    Number of        Number of
repeat tract       heterozygotes    homozygotes
3                  0                4
4                  0                4
5                  1                4
6                  1                4
7                  10               4
8                  9                16
9                  4                6
10                 3                4
11                 2                0
12                 1                0
13                 5                0
20                 1                0
[0553] Note that for heterozygous alleles only the length of the
shortest repeat can be precisely known, hence the statistical
difference observed between the two distributions is even more
important. FIG. 3B: Two models proposing how heterozygous and
homozygous repeats may be formed following TALEN induction.
[0554] FIG. 3C: Deep sequencing of yeast genomes from yeast
colonies isolated on glucose or galactose plates. Each of the 15
yeast genomes was re-sequenced to 700.times. coverage on
average (see Table 2 below). For each colony, the number of unique
SNPs and insertions/deletions is indicated.
TABLE-US-00034 TABLE 2 ILLUMINA sequencing data
                                           Initial read  Read length after  Median sequencing
Origin     Library  Total reads            length (bp)   trimming (bp)      depth
Galactose  GAL1     298 .times. 10.sup.6    110           82                 1601 X
           GAL2     119.6 .times. 10.sup.6  110           82                 677 X
           GAL3     134.4 .times. 10.sup.6  110           82                 780 X
           GAL4     117.8 .times. 10.sup.6  110           82                 675 X
           GAL5     262.2 .times. 10.sup.6  110           82                 765 X
           GAL6     167.6 .times. 10.sup.6  110           82                 975 X
           GAL7     155.4 .times. 10.sup.6  110           82                 1779 X
Glucose    GLU1     41.2 .times. 10.sup.6   110           83                 457 X
           GLU2     41.2 .times. 10.sup.6   110           83                 457 X
           GLU3     70 .times. 10.sup.6     110           83                 394 X
           GLU4     118 .times. 10.sup.6    110           83                 648 X
           GLU5     54 .times. 10.sup.6     110           83                 303 X
           GLU6     28 .times. 10.sup.6     110           83                 156 X
           GLU7     44 .times. 10.sup.6     110           83                 249 X
           GLU8     100 .times. 10.sup.6    110           83                 588 X
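The statement that the 15 genomes were sequenced to about 700.times. coverage on average can be checked against the median depths of Table 2. The Python sketch below is illustrative; the dictionary name `MEDIAN_DEPTH` is hypothetical and the values are copied from Table 2.

```python
from statistics import mean

# Median sequencing depth per library, copied from Table 2.
MEDIAN_DEPTH = {
    "GAL1": 1601, "GAL2": 677, "GAL3": 780, "GAL4": 675,
    "GAL5": 765, "GAL6": 975, "GAL7": 1779,
    "GLU1": 457, "GLU2": 457, "GLU3": 394, "GLU4": 648,
    "GLU5": 303, "GLU6": 156, "GLU7": 249, "GLU8": 588,
}


def mean_depth(depths, prefix=""):
    """Mean of the median depths for libraries whose name starts with prefix."""
    return mean(v for name, v in depths.items() if name.startswith(prefix))


if __name__ == "__main__":
    print(f"All libraries: {mean_depth(MEDIAN_DEPTH):.1f}X")
    print(f"Galactose:     {mean_depth(MEDIAN_DEPTH, 'GAL'):.1f}X")
    print(f"Glucose:       {mean_depth(MEDIAN_DEPTH, 'GLU'):.1f}X")
```

The overall mean of the 15 median depths rounds to 700, in agreement with the coverage stated in the text.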
[0555] Each library corresponds to one individual colony, collected
on glucose or galactose plates (Origin) and grown in non-selective
rich medium, whose DNA was
extracted and sonicated to an average size of 500 bp (BIORUPTOR,
maximum power (H), 30'' ON/30'' OFF cycles, 9 cycles). DNA ends
were subsequently repaired with T4 DNA polymerase (15 units,
NEBIOLABS) and KLENOW DNA polymerase (5 units, NEBIOLABS) and
phosphorylated with T4 DNA kinase (50 units, NEBIOLABS). Repaired
DNA was purified on two MINELUTE columns (QIAGEN) and eluted in 16
.mu.l (32 .mu.l final for each library). Addition of a 3' dATP was
performed with KLENOW DNA polymerase (exo-) (15 units, NEBIOLABS)
and home-made adapters containing a 4-bp unique tag used for
multiplexing, were ligated with 2 .mu.l T4 DNA ligase (NEBIOLABS,
high concentration, 2.times.10.sup.6 units/ml). DNA was size DNA
fragments was recovered in LOBIND microtubes (EPPENDORF). DNA was
PCR amplified with ILLUMINA primers PE1.0 and PE2.0 and PHUSION DNA
polymerase (1 unit, THERMO SCIENTIFIC). Depending on PCR
efficiency, 9, 12 or 15 PCR cycles were performed on each library.
Twenty-four PCR reactions were pooled, for each library, and
purified on QIAGEN purification columns (two columns were used for
24 PCR reactions).
[0557] Elution was performed in 60 .mu.l (twice 30 .mu.l) and DNA
was quantified on a spectrophotometer and on an agarose gel.
[0558] Two multiplexed libraries were loaded on each lane of a
HISEQ 2000 (ILLUMINA), and 110 bp paired-end reads were generated.
Read quality was evaluated with FASTQC v.0.10.1
[http://www.bioinformatics.babraham.ac.uk/projects/fastqc/], and
reads were trimmed using the paired-end mode of TRIMMOMATIC v0.30
[http://www.usadellab.org/cms/index.php?page=trimmomatic].
[0559] Trimmed reads were mapped to the S288C chromosome reference
sequences (GENBANK NC_001133 to NC_001148, PLN 6 Dec. 2008), plus
the two SUP4 alleles (SUP4-opal and sup4-(CAG)), using the
paired-end mapping mode of BWA v0.6.2 (Li and Durbin 2009) with
default parameters. The output SAM files were converted to sorted
BAM files using SAMTOOLS v0.1.18 (Li et al. 2009).
[0560] The command IndelRealigner from GATK v2.2 (DePristo et al.
2011) was used to realign the reads. Duplicate reads were removed
using the option "MarkDuplicates" implemented in Picard v1.81
[http://picard.sourceforge.net/]. Reads uniquely mapped to the
reference sequence with a minimum mapping quality of 30
(PHRED-scaled) were kept. MPILEUP files were generated by SAMTOOLS
without BAQ adjustments. SNPs and INDELs were called by the options
"mpileup2snp" and "mpileup2indel" of Varscan2 v2.3.5 (Koboldt et
al. 2012) with a minimum depth of 5 reads and a threshold of 0.3
for minimum variant allele frequency (strains are diploids).
Mismatches were kept when they represented at least 20% of the
reads supporting the variant on each strand. They were manually
examined and compared between all sequenced libraries for
interpretation.
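The variant-level thresholds described above (minimum depth of 5 reads, minimum variant allele frequency of 0.3, and at least 20% of variant-supporting reads on each strand) can be sketched as a post-hoc filter in Python. This is not the VarScan2 implementation: the `VariantCall` record, its field names and `passes_filters` are hypothetical, and the strand criterion reflects one reading of the text, namely that each strand must contribute at least 20% of the reads supporting the variant.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical per-site summary of a candidate variant."""
    depth: int              # total reads covering the site
    variant_reads_fwd: int  # variant-supporting reads, forward strand
    variant_reads_rev: int  # variant-supporting reads, reverse strand

    @property
    def variant_reads(self):
        return self.variant_reads_fwd + self.variant_reads_rev

    @property
    def allele_frequency(self):
        return self.variant_reads / self.depth if self.depth else 0.0


def passes_filters(call, min_depth=5, min_freq=0.3, min_strand_fraction=0.2):
    """Apply the depth, allele-frequency and strand-balance thresholds."""
    if call.depth < min_depth:
        return False
    if call.allele_frequency < min_freq:
        return False
    if call.variant_reads == 0:
        return False
    weaker_strand = min(call.variant_reads_fwd, call.variant_reads_rev)
    return weaker_strand / call.variant_reads >= min_strand_fraction


if __name__ == "__main__":
    # Hypothetical calls, for illustration only.
    print(passes_filters(VariantCall(100, 25, 25)))  # balanced heterozygous call
    print(passes_filters(VariantCall(100, 48, 2)))   # strong strand bias
```

A threshold of 0.3 on allele frequency suits diploid strains, in which a heterozygous variant is expected at a frequency near 0.5.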
[0561] FIG. 3D: Pulsed-field gel electrophoresis of red and white
colonies after galactose induction. Karyotypes are identical among
all clones and do not show any large chromosomal rearrangement,
either on chromosome X (bearing SUP4) or on any other
chromosome.
[0562] FIG. 4:
[0563] Left: strains GFY6161-3C (MATa leu2.DELTA.1 his3.DELTA.200
lys2.DELTA.202 ade2-opal sup4::(CAG).sub.30) and GFY6162-3D (MATa
ura3.DELTA.851 leu2.DELTA.1 his3.DELTA.200 trp1.DELTA.65 ade2-opal
sup4::(CAG).sub.100) were respectively transformed with pCLS9996
(KANMX marker) or pCLS16715 (LEU2 marker). Seven transformants were
analyzed by Southern blot, for each strain, to estimate repeat
length variability after transformation. Transformant 4 in strain
GFY6162-3C shows extensive contractions of the repeat tract, but
all other transformants exhibit stable trinucleotide repeats after
transformation. Right: Transformants GFY6162-3C/1 and GFY6162-3D/2
were crossed, and diploids were selected on glucose SC-Leu plates
supplemented with G418 sulfate (200 .mu.g/ml). Twelve independent
diploids were analyzed by Southern blot, as previously. None of the
diploids contained the repeat band around 100 triplets, showing
that it was contracted during or right after the cross, even though
cells were crossed on glucose medium. In this particular cross,
diploid #5 was selected for further induction experiments.
[(CTG).sub.122=SEQ ID NO: 31; (CTG).sub.72=SEQ ID NO: 32;
(CTG).sub.32=SEQ ID NO: 33; (CTG).sub.2=SEQ ID NO: 34]
Example 2
[0564] Myotonic dystrophy (DM) is caused by a CTG repeat expansion
in the 3'UTR of the DM protein kinase (DMPK) gene
[(CTG).sub.n.(CAG).sub.n repeat]. The size of the CTG repeat, which
increases from generation to generation with sometimes very large
expansions, is generally correlated with clinical severity and age
at onset, providing a molecular basis for the anticipation
phenomenon observed in DM1 families.
[0565] Transgenic mice carrying the human DMPK gene with a normal
CTG repeat (i.e., 5-37 repeat units) or with an expanded CTG repeat
(e.g., 200-3,000 CTG repeat units) were generated and bred as
described in Gantelet et al. 2007, Seznec et al. 2001,
Gomes-Pereira et al. 2007, Panaite et al. 2011, Panaite et al.
2013.
[0566] Transgenic mice carrying about 20 CTG repeat units (DM20
mice) are control mice, which do not show the DM1 phenotype.
[0567] Transgenic mice carrying 200-3,000 CTG repeat units develop
the DM1 phenotype, ranging from mild DM1 phenotype (e.g., mice,
which carry about 500 CTG repeat units) to severe DM1 phenotype
(e.g., mice, which carry more than 1,300 CTG repeat units).
[0568] Fibroblast primary cells have been isolated from DM20 mice
and mice carrying different lengths of expanded repeat (e.g., about
500 CTG repeat units; more than 1,300 CTG repeat units), and have
been cultured on a culture medium.
[0569] Human cells have been collected from healthy donors having a
normal DMPK CTG repeat length, as well as from DM1 patients at
different stages of the disease.
[0570] Plasmids coding for DNA-binding polypeptides of the
application, such as the TALEN described in example 1 above, have
been transfected into the mouse fibroblast primary cells or into
the human cells.
[0571] Plasmids coding for DNA-binding polypeptides of the
application, such as the TALEN described in example 1 above, have
been administered to the mice, e.g., by intravenous injection.
[0572] The effect of the TALEN on the repeat length has been
determined by Southern blot analysis and/or PCR, e.g., as described
in Jansen et al. 1994.
BIBLIOGRAPHIC REFERENCES
[0573] Bedell, V. M. et al. 2012. In vivo genome editing using a
high-efficiency TALEN system. Nature 491, 114-118, doi:nature11537
[pii] 10.1038/nature11537. [0574] Beurdeley, M. et al. 2013.
Compact designer TALENs for efficient genome engineering. Nat.
Commun. 4, 1762, doi:ncomms2782 [pii] 10.1038/ncomms2782. [0575]
Boch, J. et al. 2009. Breaking the code of DNA binding specificity
of TAL-type III effectors. Science 326, 1509-1512, doi:1178811
[pii] 10.1126/science.1178811. [0576] Bogdanove and Voytas. 2011.
TAL effectors: Customizable proteins for DNA targeting. Science 333,
1843-1846. [0577] Cade, L. et al. 2012. Highly efficient generation
of heritable zebrafish gene mutations using homo- and heterodimeric
TALENs. Nucleic Acids Res. 40, 8001-8010, doi:gks518 [pii]
10.1093/nar/gks518. [0578] Cermak, T. et al. 2011. Efficient design
and assembly of custom TALEN and other TAL effector-based
constructs for DNA targeting. Nucleic Acids Res. 39, e82,
doi:gkr218 [pii] 10.1093/nar/gkr218. [0579] Chen, S. et al. 2013. A
large-scale in vivo analysis reveals that TALENs are significantly
more mutagenic than ZFNs generated using context-dependent
assembly. Nucleic Acids Res. 41, 2769-2778, doi:gks1356 [pii]
10.1093/nar/gks1356. [0580] Christian, M. et al. 2010. Targeting
DNA double-strand breaks with TAL effector nucleases. Genetics 186,
757-761, doi:genetics.110.120717 [pii] 10.1534/genetics.110.120717.
[0581] DePristo, M. A. et al. 2011. A framework for variation
discovery and genotyping using next-generation DNA sequencing data.
Nat. Genet. 43, 491-498, doi:ng.806 [pii] 10.1038/ng.806. [0582]
Durfee et al. 2008. The complete genome sequence of Escherichia
coli DH10B: insights into the biology of a laboratory workhorse. J.
Bacteriol. 190(7): 2597-2606. [0583] Gantelet et al. 2007. The
expansion of 300 CTG repeats in myotonic dystrophy transgenic mice
does not induce sensory or motor neuropathy. Acta Neuropathol. 114:
175-185. [0584] Giniger, E., Varnum, S. M. & Ptashne, M. 1985.
Specific DNA binding of GAL4, a positive regulatory protein of
yeast. Cell 40, 767-774, doi:0092-8674(85)90336-8 [pii]. [0585]
Gomes-Pereira et al. 2007. CTG trinucleotide repeat "big jumps":
large expansions, small mice. PLoS Genet. 3: e52. [0586] Jansen et
al. 1994. Gonosomal mosaicism in myotonic dystrophy patients:
involvement of mitotic events in (CTG)n repeat variation and
selection against extreme expansion in sperm. Am. J. Hum. Genet.
54: 575-585. [0587] Koboldt, D. C. et al. 2012. VarScan 2: somatic
mutation and copy number alteration discovery in cancer by exome
sequencing. Genome Res. 22, 568-576, doi:gr.129684.111 [pii]
10.1101/gr.129684.111. [0588] Li, H. and Durbin, R. 2009. Fast and
accurate short read alignment with Burrows-Wheeler transform.
Bioinformatics 25: 1754-1760, doi:btp324 [pii]
10.1093/bioinformatics/btp324. [0589] Li, H. et al. 2009. The
sequence alignment/map format and SAMtools. Bioinformatics 25:
2078-2079, doi:btp352 [pii] 10.1093/bioinformatics/btp352. [0590]
Li, T. et al. 2011. TAL nucleases (TALNs): hybrid proteins composed
of TAL effectors and FokI DNA-cleavage domain. Nucleic Acids Res.
39, 359-372, doi:gkq704 [pii] 10.1093/nar/gkq704. [0591] Lynch, M.
et al. 2008. A genome-wide view of the spectrum of spontaneous
mutations in yeast. Proc. Natl. Acad. Sci. U.S.A. 105, 9272-9277,
doi:0803466105 [pii] 10.1073/pnas.0803466105. [0592] Moscou, M. J.
& Bogdanove, A. J. 2009. A simple cipher governs DNA
recognition by TAL effectors. Science 326, 1501, doi:1178817 [pii]
10.1126/science.1178817. [0593] McKusick, V. A. 1998.
Mendelian Inheritance in Man; A Catalog of Human Genes and Genetic
Disorders. Baltimore, Md., U.S.A., Johns Hopkins University Press,
ISBN 0-8018-5742-2. [0594] McMurray, C. T. 2010. Mechanisms of trinucleotide
repeat instability during human development. Nat. Rev. Genet.
11(11): 786-799. [0595] O'Hoy, K. L. et al. 1993. Reduction in size
of the myotonic dystrophy trinucleotide repeat mutation during
transmission. Science 259, 809-812. [0596] Panaite et al. 2011.
Peripheral neuropathy is linked to a severe form of myotonic
dystrophy in transgenic mice. J. Neuropathol. Exp. Neurol. 70:
678-685. [0597] Panaite et al. 2013. Functional and
histopathological identification of the respiratory failure in a
DMSXL transgenic mouse model of myotonic dystrophy. Dis. Model
Mech. 6(3): 622-631. [0598] Philippe S. et al. 2006. Lentiviral
vectors with a defective integrase allow efficient and sustained
transgene expression in vitro and in vivo. Proc. Natl. Acad. Sci. U.S.A. 103(47):
17684-17689. [0599] Qiu, Z. et al. 2013. High-efficiency and
heritable gene targeting in mouse by transcription activator-like
effector nucleases. Nucleic Acids Res., doi:gkt258 [pii]
10.1093/nar/gkt258. "Remington: The Science and Practice of
Pharmacy", 20th edition, Mack Publishing Co.; and
"Pharmaceutical Dosage Forms and Drug Delivery Systems", Ansel,
Popovich and Allen Jr., Lippincott Williams and Wilkins. [0600]
Richard, G.-F., Dujon, B. & Haber, J. E. 1999. Double-strand
break repair can lead to high frequencies of deletions within short
CAG/CTG trinucleotide repeats. Mol. Gen. Genet. 261, 871-882.
[0601] Richard, G.-F., Goellner, G. M., McMurray, C. T. &
Haber, J. E. 2000. Recombination-induced CAG trinucleotide repeat
expansions in yeast involve the MRE11/RAD50/XRS2 complex. EMBO J.
19, 2381-2390. [0602] Richard, G.-F., Cyncynatus, C. & Dujon,
B. 2003. Contractions and expansions of CAG/CTG trinucleotide
repeats occur during ectopic gene conversion in yeast, by a
MUS81-independent mechanism. J. Mol. Biol. 326, 769-782.
[0603] Seznec et al. 2001. Mice transgenic for the human myotonic
dystrophy region with expanded CTG repeats display muscular and
brain abnormalities. Hum. Mol. Genet. 10: 2717-2726. [0604] WO
94/18313 and its national counterparts including its US
counterpart(s) (including the US continuation and divisional
applications). [0605] WO 95/09233 and its national counterparts
including its US counterpart(s) (including the US continuation and
divisional applications). [0606] WO 99/55892 and its national
counterparts including its US counterpart(s) (including the US
continuation and divisional applications). [0607] WO 2006/010834
and its national counterparts including its US counterpart(s)
(including the US continuation and divisional applications). [0608]
WO 2009/019612 and its national counterparts including its US
counterpart(s) (including the US continuation and divisional
applications). [0609] WO 2011/072246 and its national counterparts,
including its US counterpart(s) (including the US continuation and
divisional applications). [0610] WO 2010/079430 and its national
counterparts, including its US counterpart(s) (including the US
continuation and divisional applications). [0611] WO 2012/015938
and its national counterparts, including its US national
counterpart(s) (including the US continuation and divisional
applications). [0612] WO 2013/068430 and its national counterparts
including its US counterpart(s) (including the US continuation and
divisional applications).
Sequence CWU 1
1
5618810DNAArtificialplasmid pCLS9996 1gcgcacattt ccccgaaaag
tgccacctga cgtccgatca aaaatcatcg cttcgctgat 60taattacccc agaaataagg
ctaaaaaact aatcgcatta tcatcctatg gttgttaatt 120tgattcgttc
atttgaaggt ttgtggggcc aggttactgc caatttttcc tcttcataac
180cataaaagct agtattgtag aatctttatt gttcggagca gtgcggcgcg
aggcacatct 240gcgtttcagg aacgcgaccg gtgaagacga ggacgcacgg
aggagagtct tccttcggag 300ggctgtcacc cgctcggcgg cttctaatcc
gtacttcaat atagcaatga gcagttaagc 360gtattactga aagttccaaa
gagaaggttt ttttaggcta atcgacctcg agcagatccg 420ccaggcgtgt
atatagcgtg gatggccagg caactttagt gctgacacat acaggcatat
480atatatgtgt gcgacgacac atgatcatat ggcatgcatg tgctctgtat
gtatataaaa 540ctcttgtttt cttcttttct ctaaatattc tttccttata
cattaggtcc tttgtagcat 600aaattactat acttctatag acacgcaaac
acaaatacac agcggccttg ccaccatggg 660cgatcctaaa aagaaacgta
aggtcatcga taaggagacc gccgctgcca agttcgagag 720acagcacatg
gacagcatcg atatcgccga tctacgcacg ctcggctaca gccagcagca
780acaggagaag atcaaaccga aggttcgttc gacagtggcg cagcaccacg
aggcactggt 840cggccacggg tttacacacg cgcacatcgt tgcgttaagc
caacacccgg cagcgttagg 900gaccgtcgct gtcaagtatc aggacatgat
cgcagcgttg ccagaggcga cacacgaagc 960gatcgttggc gtcggcaaac
agtggtccgg cgcacgcgct ctggaggcct tgctcacggt 1020ggcgggagag
ttgagaggtc caccgttaca gttggacaca ggccaacttc tcaagattgc
1080aaaacgtggc ggcgtgaccg cagtggaggc agtgcatgca tggcgcaatg
cactgacggg 1140tgccccgctc aacttgaccc cccagcaggt ggtggccatc
gccagcaata atggtggcaa 1200gcaggcgctg gagacggtcc agcggctgtt
gccggtgctg tgccaggccc acggcttgac 1260cccggagcag gtggtggcca
tcgccagcca cgatggcggc aagcaggcgc tggagacggt 1320ccagcggctg
ttgccggtgc tgtgccaggc ccacggcttg accccccagc aggtggtggc
1380catcgccagc aatggcggtg gcaagcaggc gctggagacg gtccagcggc
tgttgccggt 1440gctgtgccag gcccacggct tgacccccca gcaggtggtg
gccatcgcca gcaataatgg 1500tggcaagcag gcgctggaga cggtccagcg
gctgttgccg gtgctgtgcc aggcccacgg 1560cttgaccccg gagcaggtgg
tggccatcgc cagccacgat ggcggcaagc aggcgctgga 1620gacggtccag
cggctgttgc cggtgctgtg ccaggcccac ggcttgaccc cccagcaggt
1680ggtggccatc gccagcaatg gcggtggcaa gcaggcgctg gagacggtcc
agcggctgtt 1740gccggtgctg tgccaggccc acggcttgac cccccagcag
gtggtggcca tcgccagcaa 1800taatggtggc aagcaggcgc tggagacggt
ccagcggctg ttgccggtgc tgtgccaggc 1860ccacggcttg accccggagc
aggtggtggc catcgccagc cacgatggcg gcaagcaggc 1920gctggagacg
gtccagcggc tgttgccggt gctgtgccag gcccacggct tgacccccca
1980gcaggtggtg gccatcgcca gcaatggcgg tggcaagcag gcgctggaga
cggtccagcg 2040gctgttgccg gtgctgtgcc aggcccacgg cttgaccccc
cagcaggtgg tggccatcgc 2100cagcaataat ggtggcaagc aggcgctgga
gacggtccag cggctgttgc cggtgctgtg 2160ccaggcccac ggcttgaccc
cggagcaggt ggtggccatc gccagccacg atggcggcaa 2220gcaggcgctg
gagacggtcc agcggctgtt gccggtgctg tgccaggccc acggcttgac
2280cccccagcag gtggtggcca tcgccagcaa tggcggtggc aagcaggcgc
tggagacggt 2340ccagcggctg ttgccggtgc tgtgccaggc ccacggcttg
accccccagc aggtggtggc 2400catcgccagc aataatggtg gcaagcaggc
gctggagacg gtccagcggc tgttgccggt 2460gctgtgccag gcccacggct
tgaccccgga gcaggtggtg gccatcgcca gccacgatgg 2520cggcaagcag
gcgctggaga cggtccagcg gctgttgccg gtgctgtgcc aggcccacgg
2580cttgaccccc cagcaggtgg tggccatcgc cagcaatggc ggtggcaagc
aggcgctgga 2640gacggtccag cggctgttgc cggtgctgtg ccaggcccac
ggcttgaccc ctcagcaggt 2700ggtggccatc gccagcaatg gcggcggcag
gccggcgctg gagagcattg ttgcccagtt 2760atctcgccct gatccggcgt
tggccgcgtt gaccaacgac cacctcgtcg ccttggcctg 2820cctcggcggg
cgtcctgcgc tggatgcagt gaaaaaggga ttgggggatc ctatcagccg
2880ttcccagctg gtgaagtccg agctggagga gaagaaatcc gagttgaggc
acaagctgaa 2940gtacgtgccc cacgagtaca tcgagctgat cgagatcgcc
cggaacagca cccaggaccg 3000tatcctggag atgaaggtga tggagttctt
catgaaggtg tacggctaca ggggcaagca 3060cctgggcggc tccaggaagc
ccgacggcgc catctacacc gtgggctccc ccatcgacta 3120cggcgtgatc
gtggacacca aggcctactc cggcggctac aacctgccca tcggccaggc
3180cgacgaaatg cagaggtacg tggaggagaa ccagaccagg aacaagcaca
tcaaccccaa 3240cgagtggtgg aaggtgtacc cctccagcgt gaccgagttc
aagttcctgt tcgtgtccgg 3300ccacttcaag ggcaactaca aggcccagct
gaccaggctg aaccacatca ccaactgcaa 3360cggcgccgtg ctgtccgtgg
aggagctcct gatcggcggc gagatgatca aggccggcac 3420cctgaccctg
gaggaggtga ggaggaagtt caacaacggc gagatcaact tcgcggccga
3480ctgataactc gagcgatcct ctagacgagc tcctcgagcc tgcagcagct
gaagctttgg 3540acttcttcgc cagaggtttg gtcaagtctc caatcaaggt
tgtcggcttg tctaccttgc 3600cagaaattta cgaaaagatg gaaaagggtc
aaatcgttgg tagatacgtt gttgacactt 3660ctaaataagc gaatttctta
tgatttatga tttttattat taaataagtt ataaaaaaaa 3720taagtgtata
caaattttaa agtgactctt aggttttaaa acgaaaattc ttattcttga
3780gtaactcttt cctgtaggtc aggttgcttt ctcaggtata gcatgaggtc
gctcttattg 3840accacacctc taccggcatg caagcttggc gtaatcatgg
tcatagctgt ttcctgtgtg 3900aaattgttat ccgctcacaa ttccacacaa
catacgagcc ggaagcataa agtgtaaagc 3960ctggggtgcc taatgagtga
gctaactcac attaattgcg ttgcgctcac tgcccgcttt 4020ccagtcggga
aacctgtcgt gccagcagat ctgtttagct tgcctcgtcc ccgccgggtc
4080acccggccag cgacatggag gcccagaata ccctccttga cagtcttgac
gtgcgcagct 4140caggggcatg atgtgactgt cgcccgtaca tttagcccat
acatccccat gtataatcat 4200ttgcatccat acattttgat ggccgcacgg
cgcgaagcaa aaattacggc tcctcgctgc 4260agacctgcga gcagggaaac
gctcccctca cagacgcgtt gaattgtccc cacgccgcgc 4320ccctgtagag
aaatataaaa ggttaggatt tgccactgag gttcttcttt catatacttc
4380cttttaaaat cttgctagga tacagttctc acatcacatc cgaacataaa
caaccatgca 4440tgggtaagga aaagactcac gtttcgaggc cgcgattaaa
ttccaacatg gatgctgatt 4500tatatgggta taaatgggct cgcgataatg
tcgggcaatc aggtgcgaca atctatcgat 4560tgtatgggaa gcccgatgcg
ccagagttgt ttctgaaaca tggcaaaggt agcgttgcca 4620atgatgttac
agatgagatg gtcagactaa actggctgac ggaatttatg cctcttccga
4680ccatcaagca ttttatccgt actcctgatg atgcatggtt actcaccact
gcgatccccg 4740gcaaaacagc attccaggta ttagaagaat atcctgattc
aggtgaaaat attgttgatg 4800cgctggcagt gttcctgcgc cggttgcatt
cgattcctgt ttgtaattgt ccttttaaca 4860gcgatcgcgt atttcgcctc
gctcaggcgc aatcacgaat gaataacggt ttggttgatg 4920cgagtgattt
tgatgacgag cgtaatggct ggcctgttga acaagtctgg aaagaaatgc
4980ataagctttt gccattctca ccggattcag tcgtcactca tggtgatttc
tcacttgata 5040accttatttt tgacgagggg aaattaatag gttgtattga
tgttggacga gtcggaatcg 5100cagaccgata ccaggatctt gccatcctat
ggaactgcct cggtgagttt tctccttcat 5160tacagaaacg gctttttcaa
aaatatggta ttgataatcc tgatatgaat aaattgcagt 5220ttcatttgat
gctcgatgag tttttctaat cagtactgac aataaaaaga ttcttgtttt
5280caagaacttg tcatttgtat agttttttta tattgtagtt gttctatttt
aatcaaatgt 5340tagcgtgatt tatatttttt ttcgcctcga catcatctgc
ccagatgcga agttaagtgc 5400gcagaaagta atatcatgcg tcaatcgtat
gtgaatgctg gtcgctatac tgctgtcgat 5460tcgatactaa cgccgccatc
cagtgtcgaa aacgagctcg aattcatcga tgatatcaga 5520tccactagtg
gcctatgcga ccgcggatct gccggtctcc ctatagtgag tcgtattaat
5580ttcgataagc caggttaacc tgcattaatg aatcggccaa cgcgcgggga
gaggcggttt 5640gcgtattggg cgctcttccg cttcctcgct cactgactcg
ctgcgctcgg tcgttcggct 5700gcggcgagcg gtatcagcat cgatgaattc
cacggactat agactatact agtatactcc 5760gtctactgta cgatacactt
ccgctcaggt ccttgtcctt taacgaggcc ttaccactct 5820tttgttactc
tattgatcca gctcagcaaa ggcagtgtga tctaagattc tatcttcgcg
5880atgtagtaaa actagctaga ccgagaaaga gactagaaat gcaaaaggca
cttctacaat 5940ggctgccatc attattatcc gatgtgacgc tgcagcttct
caatgatatt cgaatacgct 6000ttgaggagat acagcctaat atccgacaaa
ctgttttaca gatttacgat cgtacttgtt 6060acccatcatt gaattttgaa
catccgaacc tgggagtttt ccctgaaaca gatagtatat 6120ttgaacctgt
ataataatat atagtctagc gctttacgga agacaatgta tgtatttcgg
6180ttcctggaga aactattgca tctattgcat aggtaatctt gcacgtcgca
tccccggttc 6240attttctgcg tttccatctt gcacttcaat agcatatctt
tgttaacgaa gcatctgtgc 6300ttcattttgt agaacaaaaa tgcaacgcga
gagcgctaat ttttcaaaca aagaatctga 6360gctgcatttt tacagaacag
aaatgcaacg cgaaagcgct attttaccaa cgaagaatct 6420gtgcttcatt
tttgtaaaac aaaaatgcaa cgcgagagcg ctaatttttc aaacaaagaa
6480tctgagctgc atttttacag aacagaaatg caacgcgaga gcgctatttt
accaacaaag 6540aatctatact tcttttttgt tctacaaaaa tgcatcccga
gagcgctatt tttctaacaa 6600agcatcttag attacttttt ttctcctttg
tgcgctctat aatgcagtct cttgataact 6660ttttgcactg taggtccgtt
aaggttagaa gaaggctact ttggtgtcta ttttctcttc 6720cataaaaaaa
gcctgactcc acttcccgcg tttactgatt actagcgaag ctgcgggtgc
6780attttttcaa gataaaggca tccccgatta tattctatac cgatgtggat
tgcgcatact 6840ttgtgaacag aaagtgatag cgttgatgat tcttcattgg
tcagaaaatt atgaacggtt 6900tcttctattt tgtctctata tactacgtat
aggaaatgtt tacattttcg tattgttttc 6960gattcactct atgaatagtt
cttactacaa tttttttgtc taaagagtaa tactagagat 7020aaacataaaa
aatgtagagg tcgagtttag atgcaagttc aaggagcgaa aggtggatgg
7080gtaggttata tagggatata gcacagagat atatagcaaa gagatacttt
tgagcaatgt 7140ttgtggaagc ggtattcgca atattttagt agctcgttac
agtccggtgc gtttttggtt 7200ttttgaaagt gcgtcttcag agcgcttttg
gttttcaaaa gcgctctgaa gttcctatac 7260tttctagaga ataggaactt
cggaatagga acttcaaagc gtttccgaaa acgagcgctt 7320ccgaaaatgc
aacgcgagct gcgcacatac agctcactgt tcacgtcgca cctatatctg
7380cgtgttgcct gtatatatat atacatgaga agaacggcat agtgcgtgtt
tatgcttaaa 7440tgcgtactta tatgcgtcta tttatgtagg atgaaaggta
gtctagtacc tcctgtgata 7500ttatcccatt ccatgcgggg tatcgtatgc
ttccttcagc actacccttt agctgttcta 7560tatgctgcca ctcctcaatt
ggattagtct catccttcaa tgctatcatt tcctttgata 7620ttggatcata
tgcatagtac cgagaaacta gtgcgaagta gtgatcaggt attgctgtta
7680tctgatgagt atacgttgtc ctggccacgg cagaagcacg cttatcgctc
caatttccca 7740caacattagt caactccgtt aggcccttca ttgaaagaaa
tgaggtcatc aaatgtcttc 7800caatgtgaga ttttgggcca ttttttatag
caaagattga ataaggcgca tttttcttca 7860aagctttatt gtacgatctg
actaagttat cttttaataa ttggtattcc tgtttattgc 7920ttgaagaatt
gccggtccta tttactcgtt ttaggactgg ttcagaattc atcgatgctc
7980actcaaaggt cggtaatacg gttatccaca gaatcagggg ataacgcagg
aaagaacatg 8040tgagcaaaag gccagcaaaa ggccaggaac cgtaaaaagg
ccgcgttgct ggcgtttttc 8100cataggctcc gcccccctga cgagcatcac
aaaaatcgac gctcaagtca gaggtggcga 8160aacccgacag gactataaag
ataccaggcg tttccccctg gaagctccct cgtgcgctct 8220cctgttccga
ccctgccgct taccggatac ctgtccgcct ttctcccttc gggaagcgtg
8280gcgctttctc atagctcacg ctgtaggtat ctcagttcgg tgtaggtcgt
tcgctccaag 8340ctgggctgtg tgcacgaacc ccccgttcag cccgaccgct
gcgccttatc cggtaactat 8400cgtcttgagt ccaacccggt aagacacgac
ttatcgccac tggcagcagc cactggtaac 8460aggattagca gagcgaggta
tgtaggcggt gctacagagt tcttgaagtg gtggcctaac 8520tacggctaca
ctagaaggac agtatttggt atctgcgctc tgctgaagcc agttaccttc
8580ggaaaaagag ttggtagctc ttgatccggc aaacaaacca ccgctggtag
cggtggtttt 8640tttgtttgca agcagcagat tacgcgcaga aaaaaaggat
ctcaagaaga tcctttgatc 8700ttttctacgg ggtctgacgc tcagtggaac
gaaaactcac gttaagggat tttggtcatg 8760agcggataca tatttgaatg
tatttagaaa aataaacaaa taggggttcc 8810211109DNAArtificialplasmid
pCLS16715 2gggttccgcg cacatttccc cgaaaagtgc cacctgacgt ccgatcaaaa
atcatcgctt 60cgctgattaa ttaccccaga aataaggcta aaaaactaat cgcattatca
tcctatggtt 120gttaatttga ttcgttcatt tgaaggtttg tggggccagg
ttactgccaa tttttcctct 180tcataaccat aaaagctagt attgtagaat
ctttattgtt cggagcagtg cggcgcgagg 240cacatctgcg tttcaggaac
gcgaccggtg aagacgagga cgcacggagg agagtcttcc 300ttcggagggc
tgtcacccgc tcggcggctt ctaatccgta cttcaatata gcaatgagca
360gttaagcgta ttactgaaag ttccaaagag aaggtttttt taggctaatc
gacctcgagc 420agatccgcca ggcgtgtata tagcgtggat ggccaggcaa
ctttagtgct gacacataca 480ggcatatata tatgtgtgcg acgacacatg
atcatatggc atgcatgtgc tctgtatgta 540tataaaactc ttgttttctt
cttttctcta aatattcttt ccttatacat taggtccttt 600gtagcataaa
ttactatact tctatagaca cgcaaacaca aatacacagc ggccttgcca
660ccatgggcga tcctaaaaag aaacgtaagg tcatcgatta cccatacgat
gttccagatt 720acgctatcga tatcgccgat ctacgcacgc tcggctacag
ccagcagcaa caggagaaga 780tcaaaccgaa ggttcgttcg acagtggcgc
agcaccacga ggcactggtc ggccacgggt 840ttacacacgc gcacatcgtt
gcgttaagcc aacacccggc agcgttaggg accgtcgctg 900tcaagtatca
ggacatgatc gcagcgttgc cagaggcgac acacgaagcg atcgttggcg
960tcggcaaaca gtggtccggc gcacgcgctc tggaggcctt gctcacggtg
gcgggagagt 1020tgagaggtcc accgttacag ttggacacag gccaacttct
caagattgca aaacgtggcg 1080gcgtgaccgc agtggaggca gtgcatgcat
ggcgcaatgc actgacgggt gccccgctca 1140acttgacccc ccagcaggtg
gtggccatcg ccagcaataa tggtggcaag caggcgctgg 1200agacggtcca
gcggctgttg ccggtgctgt gccaggccca cggcttgacc ccccagcagg
1260tggtggccat cgccagcaat ggcggtggca agcaggcgct ggagacggtc
cagcggctgt 1320tgccggtgct gtgccaggcc cacggcttga ccccccagca
ggtggtggcc atcgccagca 1380ataatggtgg caagcaggcg ctggagacgg
tccagcggct gttgccggtg ctgtgccagg 1440cccacggctt gaccccggag
caggtggtgg ccatcgccag caatattggt ggcaagcagg 1500cgctggagac
ggtgcaggcg ctgttgccgg tgctgtgcca ggcccacggc ttgacccccc
1560agcaggtggt ggccatcgcc agcaatggcg gtggcaagca ggcgctggag
acggtccagc 1620ggctgttgcc ggtgctgtgc caggcccacg gcttgacccc
ggagcaggtg gtggccatcg 1680ccagccacga tggcggcaag caggcgctgg
agacggtcca gcggctgttg ccggtgctgt 1740gccaggccca cggcttgacc
ccggagcagg tggtggccat cgccagccac gatggcggca 1800agcaggcgct
ggagacggtc cagcggctgt tgccggtgct gtgccaggcc cacggcttga
1860ccccggagca ggtggtggcc atcgccagcc acgatggcgg caagcaggcg
ctggagacgg 1920tccagcggct gttgccggtg ctgtgccagg cccacggctt
gaccccggag caggtggtgg 1980ccatcgccag ccacgatggc ggcaagcagg
cgctggagac ggtccagcgg ctgttgccgg 2040tgctgtgcca ggcccacggc
ttgaccccgg agcaggtggt ggccatcgcc agccacgatg 2100gcggcaagca
ggcgctggag acggtccagc ggctgttgcc ggtgctgtgc caggcccacg
2160gcttgacccc ggagcaggtg gtggccatcg ccagccacga tggcggcaag
caggcgctgg 2220agacggtcca gcggctgttg ccggtgctgt gccaggccca
cggcttgacc ccggagcagg 2280tggtggccat cgccagcaat attggtggca
agcaggcgct ggagacggtg caggcgctgt 2340tgccggtgct gtgccaggcc
cacggcttga ccccccagca ggtggtggcc atcgccagca 2400ataatggtgg
caagcaggcg ctggagacgg tccagcggct gttgccggtg ctgtgccagg
2460cccacggctt gaccccggag caggtggtgg ccatcgccag ccacgatggc
ggcaagcagg 2520cgctggagac ggtccagcgg ctgttgccgg tgctgtgcca
ggcccacggc ttgaccccgg 2580agcaggtggt ggccatcgcc agcaatattg
gtggcaagca ggcgctggag acggtgcagg 2640cgctgttgcc ggtgctgtgc
caggcccacg gcttgacccc tcagcaggtg gtggccatcg 2700ccagcaatgg
cggcggcagg ccggcgctgg agagcattgt tgcccagtta tctcgccctg
2760atccggcgtt ggccgcgttg accaacgacc acctcgtcgc cttggcctgc
ctcggcgggc 2820gtcctgcgct ggatgcagtg aaaaagggat tgggggatcc
tatcagccgt tcccagctgg 2880tgaagtccga gctggaggag aagaaatccg
agttgaggca caagctgaag tacgtgcccc 2940acgagtacat cgagctgatc
gagatcgccc ggaacagcac ccaggaccgt atcctggaga 3000tgaaggtgat
ggagttcttc atgaaggtgt acggctacag gggcaagcac ctgggcggct
3060ccaggaagcc cgacggcgcc atctacaccg tgggctcccc catcgactac
ggcgtgatcg 3120tggacaccaa ggcctactcc ggcggctaca acctgcccat
cggccaggcc gacgaaatgc 3180agaggtacgt ggaggagaac cagaccagga
acaagcacat caaccccaac gagtggtgga 3240aggtgtaccc ctccagcgtg
accgagttca agttcctgtt cgtgtccggc cacttcaagg 3300gcaactacaa
ggcccagctg accaggctga accacatcac caactgcaac ggcgccgtgc
3360tgtccgtgga ggagctcctg atcggcggcg agatgatcaa ggccggcacc
ctgaccctgg 3420aggaggtgag gaggaagttc aacaacggcg agatcaactt
cgcggccgac tgataactcg 3480agcgatcctc tagacgagct cctcgagcct
gcagcagctg aagctttgga cttcttcgcc 3540agaggtttgg tcaagtctcc
aatcaaggtt gtcggcttgt ctaccttgcc agaaatttac 3600gaaaagatgg
aaaagggtca aatcgttggt agatacgttg ttgacacttc taaataagcg
3660aatttcttat gatttatgat ttttattatt aaataagtta taaaaaaaat
aagtgtatac 3720aaattttaaa gtgactctta ggttttaaaa cgaaaattct
tattcttgag taactctttc 3780ctgtaggtca ggttgctttc tcaggtatag
catgaggtcg ctcttattga ccacacctct 3840accggcatgc aagcttggcg
taatcatggt catagctgtt tcctgtgtga aattgttatc 3900cgctcacaat
tccacacaac atacgagccg gaagcataaa gtgtaaagcc tggggtgcct
3960aatgagtgag ctaactcaca ttaattgcgt tgcgctcact gcccgctttc
cagtcgggaa 4020acctgtcgtg ccagcagatc tattacatta tgggtggtat
gttggaataa aaatcaacta 4080tcatctacta actagtattt acgttactag
tatattatca tatacggtgt tagaagatga 4140cgcaaatgat gagaaatagt
catctaaatt agtggaagct gaaacgcaag gattgataat 4200gtaataggat
caatgaatat taacatataa aatgatgata ataatattta tagaattgtg
4260tagaattgca gattcccttt tatggattcc taaatcctcg aggagaactt
ctagtatatc 4320tacataccta atattattgc cttattaaaa atggaatccc
aacaattaca tcaaaatcca 4380cattctcttc aaaatcaatt gtcctgtact
tccttgttca tgtgtgttca aaaacgttat 4440atttatagga taattatact
ctatttctca acaagtaatt ggttgtttgg ccgagcggtc 4500taaggcgcct
gattcaagaa atatcttgac cgcagttaac tgtgggaata ctcaggtatc
4560gtaagatgca agagttcgaa tctcttagca accattattt ttttcctcaa
cataacgaga 4620acacacaggg gcgctatcgc acagaatcaa attcgatgac
tggaaatttt ttgttaattt 4680cagaggtcgc ctgacgcata tacctttttc
aactgaaaaa ttgggagaaa aaggaaaggt 4740gagagccgcg gaaccggctt
ttcatataga atagagaagc gttcatgact aaatgcttgc 4800atcacaatac
ttgaagttga caatattatt taaggaccta ttgttttttc caataggtgg
4860ttagcaatcg tcttactttc taacttttct taccttttac atttcagcaa
tatatatata 4920tatatttcaa ggatatacca ttctaatgtc tgcccctaag
aagatcgtcg ttttgccagg 4980tgaccacgtt ggtcaagaaa tcacagccga
agccattaag gttcttaaag ctatttctga 5040tgttcgttcc aatgtcaagt
tcgatttcga aaatcattta attggtggtg ctgctatcga 5100tgctacaggt
gtcccacttc cagatgaggc gctggaagcc tccaagaagg ttgatgccgt
5160tttgttaggt gctgtgggtg gtcctaaatg gggtaccggt agtgttagac
ctgaacaagg 5220tttactaaaa atccgtaaag aacttcaatt gtacgccaac
ttaagaccat gtaactttgc 5280atccgactct cttttagact tatctccaat
caagccacaa tttgctaaag gtactgactt 5340cgttgttgtc agagaattag
tgggaggtat ttactttggt aagagaaagg aagacgatgg 5400tgatggtgtc
gcttgggata gtgaacaata caccgttcca gaagtgcaaa gaatcacaag
5460aatggccgct ttcatggccc tacaacatga gccaccattg cctatttggt
ccttggataa 5520agctaatgtt ttggcctctt caagattatg gagaaaaact
gtggaggaaa ccatcaagaa 5580cgaattccct acattgaagg ttcaacatca
attgattgat tctgccgcca tgatcctagt 5640taagaaccca acccacctaa
atggtattat aatcaccagc aacatgtttg gtgatatcat 5700ctccgatgaa
gcctccgtta tcccaggttc cttgggtttg ttgccatctg cgtccttggc
5760ctctttgcca gacaagaaca ccgcatttgg tttgtacgaa ccatgccacg
gttctgctcc 5820agatttgcca aagaataagg tcaaccctat cgccactatc
ttgtctgctg caatgatgtt 5880gaaattgtca ttgaacttgc ctgaagaagg
taaggccatt gaagatgcag ttaaaaaggt 5940tttggatgca ggtatcagaa
ctggtgattt aggtggttcc aacagtacca cggaagtcgg 6000tgatgctgtc
gccgaagaag ttaagaaaat ccttgcttaa aaagattctc tttttttatg
6060atatttgtac ataaacttta taaatgaaat tcataataga aacgacacga
aattacaaaa 6120tggaatatgt tcatagggta gacgaaacta tatacgcaat
ctacatacat ttatcaagaa 6180ggagaaaaag gaggatgtaa aggaatacag gtaagcaaat
tgatactaat ggctcaacgt 6240gataaggaaa aagaattgca ctttaacatt
aatattgaca aggaggaggg caccacacaa 6300aaagttaggt gtaacagaaa
atcatgaaac tatgattcct aatttatata ttggaggatt 6360ttctctaaaa
aaaaaaaaat acaacaaata aaaaacactc aatgacctga ccatttgatg
6420gagtttaagt caataccttc ttgaaccatt tcccataatg gtgaaagttc
cctcaagaat 6480tttactctgt cagaaacggc cttaacgacg tagtcgacct
cctcttcagt actaaatcta 6540ccaataccaa atctgatgga agaatgggct
aatgcatcat ccttacccag cgcatgtaaa 6600acataagaag gttctaggga
agcagatgta caggctgaac ccgaggataa tgcgatatcc 6660cttagtgcca
tcaataaaga ttctccttcc acgtaggcga aagaaacgtt aacacaccct
6720ggataacgat gatctggaga tccgttcaac gtggtatgtt cagcggataa
tagacctttg 6780actaatttat cggatagtct tttgatgtga gcttggtcgt
tgtcaaattc tttcttcatc 6840aatctcgcag cttcaccaaa tcccgctacc
aatggggggg ccaaagtacc agatctgctg 6900cattaatgaa tcggccaacg
cgcggggaga ggcggtttgc gtattgggcg ctcttccgct 6960tcctcgctca
ctgactcgct gcgctcggtc gttcggctgc ggcgagcggt atcagcatcg
7020atgaattcca cggactatag actatactag tatactccgt ctactgtacg
atacacttcc 7080gctcaggtcc ttgtccttta acgaggcctt accactcttt
tgttactcta ttgatccagc 7140tcagcaaagg cagtgtgatc taagattcta
tcttcgcgat gtagtaaaac tagctagacc 7200gagaaagaga ctagaaatgc
aaaaggcact tctacaatgg ctgccatcat tattatccga 7260tgtgacgctg
cagcttctca atgatattcg aatacgcttt gaggagatac agcctaatat
7320ccgacaaact gttttacaga tttacgatcg tacttgttac ccatcattga
attttgaaca 7380tccgaacctg ggagttttcc ctgaaacaga tagtatattt
gaacctgtat aataatatat 7440agtctagcgc tttacggaag acaatgtatg
tatttcggtt cctggagaaa ctattgcatc 7500tattgcatag gtaatcttgc
acgtcgcatc cccggttcat tttctgcgtt tccatcttgc 7560acttcaatag
catatctttg ttaacgaagc atctgtgctt cattttgtag aacaaaaatg
7620caacgcgaga gcgctaattt ttcaaacaaa gaatctgagc tgcattttta
cagaacagaa 7680atgcaacgcg aaagcgctat tttaccaacg aagaatctgt
gcttcatttt tgtaaaacaa 7740aaatgcaacg cgagagcgct aatttttcaa
acaaagaatc tgagctgcat ttttacagaa 7800cagaaatgca acgcgagagc
gctattttac caacaaagaa tctatacttc ttttttgttc 7860tacaaaaatg
catcccgaga gcgctatttt tctaacaaag catcttagat tacttttttt
7920ctcctttgtg cgctctataa tgcagtctct tgataacttt ttgcactgta
ggtccgttaa 7980ggttagaaga aggctacttt ggtgtctatt ttctcttcca
taaaaaaagc ctgactccac 8040ttcccgcgtt tactgattac tagcgaagct
gcgggtgcat tttttcaaga taaaggcatc 8100cccgattata ttctataccg
atgtggattg cgcatacttt gtgaacagaa agtgatagcg 8160ttgatgattc
ttcattggtc agaaaattat gaacggtttc ttctattttg tctctatata
8220ctacgtatag gaaatgttta cattttcgta ttgttttcga ttcactctat
gaatagttct 8280tactacaatt tttttgtcta aagagtaata ctagagataa
acataaaaaa tgtagaggtc 8340gagtttagat gcaagttcaa ggagcgaaag
gtggatgggt aggttatata gggatatagc 8400acagagatat atagcaaaga
gatacttttg agcaatgttt gtggaagcgg tattcgcaat 8460attttagtag
ctcgttacag tccggtgcgt ttttggtttt ttgaaagtgc gtcttcagag
8520cgcttttggt tttcaaaagc gctctgaagt tcctatactt tctagagaat
aggaacttcg 8580gaataggaac ttcaaagcgt ttccgaaaac gagcgcttcc
gaaaatgcaa cgcgagctgc 8640gcacatacag ctcactgttc acgtcgcacc
tatatctgcg tgttgcctgt atatatatat 8700acatgagaag aacggcatag
tgcgtgttta tgcttaaatg cgtacttata tgcgtctatt 8760tatgtaggat
gaaaggtagt ctagtacctc ctgtgatatt atcccattcc atgcggggta
8820tcgtatgctt ccttcagcac taccctttag ctgttctata tgctgccact
cctcaattgg 8880attagtctca tccttcaatg ctatcatttc ctttgatatt
ggatcatatg catagtaccg 8940agaaactagt gcgaagtagt gatcaggtat
tgctgttatc tgatgagtat acgttgtcct 9000ggccacggca gaagcacgct
tatcgctcca atttcccaca acattagtca actccgttag 9060gcccttcatt
gaaagaaatg aggtcatcaa atgtcttcca atgtgagatt ttgggccatt
9120ttttatagca aagattgaat aaggcgcatt tttcttcaaa gctttattgt
acgatctgac 9180taagttatct tttaataatt ggtattcctg tttattgctt
gaagaattgc cggtcctatt 9240tactcgtttt aggactggtt cagaattcat
cgatgctcac tcaaaggtcg gtaatacggt 9300tatccacaga atcaggggat
aacgcaggaa agaacatgtg agcaaaaggc cagcaaaagg 9360ccaggaaccg
taaaaaggcc gcgttgctgg cgtttttcca taggctccgc ccccctgacg
9420agcatcacaa aaatcgacgc tcaagtcaga ggtggcgaaa cccgacagga
ctataaagat 9480accaggcgtt tccccctgga agctccctcg tgcgctctcc
tgttccgacc ctgccgctta 9540ccggatacct gtccgccttt ctcccttcgg
gaagcgtggc gctttctcat agctcacgct 9600gtaggtatct cagttcggtg
taggtcgttc gctccaagct gggctgtgtg cacgaacccc 9660ccgttcagcc
cgaccgctgc gccttatccg gtaactatcg tcttgagtcc aacccggtaa
9720gacacgactt atcgccactg gcagcagcca ctggtaacag gattagcaga
gcgaggtatg 9780taggcggtgc tacagagttc ttgaagtggt ggcctaacta
cggctacact agaaggacag 9840tatttggtat ctgcgctctg ctgaagccag
ttaccttcgg aaaaagagtt ggtagctctt 9900gatccggcaa acaaaccacc
gctggtagcg gtggtttttt tgtttgcaag cagcagatta 9960cgcgcagaaa
aaaaggatct caagaagatc ctttgatctt ttctacgggg tctgacgctc
10020agtggaacga aaactcacgt taagggattt tggtcatgag attatcaaaa
aggatcttca 10080cctagatcct tttaaattaa aaatgaagtt ttaaatcaat
ctaaagtata tatgagtaaa 10140cttggtctga cagttaccaa tgcttaatca
gtgaggcacc tatctcagcg atctgtctat 10200ttcgttcatc catagttgcc
tgactccccg tcgtgtagat aactacgata cgggagggct 10260taccatctgg
ccccagtgct gcaatgatac cgcgagaccc acgctcaccg gctccagatt
10320tatcagcaat aaaccagcca gccggaaggg ccgagcgcag aagtggtcct
gcaactttat 10380ccgcctccat ccagtctatt aattgttgcc gggaagctag
agtaagtagt tcgccagtta 10440atagtttgcg caacgttgtt gccattgcta
caggcatcgt ggtgtcacgc tcgtcgtttg 10500gtatggcttc attcagctcc
ggttcccaac gatcaaggcg agttacatga tcccccatgt 10560tgtgcaaaaa
agcggttagc tccttcggtc ctccgatcgt tgtcagaagt aagttggccg
10620cagtgttatc actcatggtt atggcagcac tgcataattc tcttactgtc
atgccatccg 10680taagatgctt ttctgtgact ggtgagtact caaccaagtc
attctgagaa tagtgtatgc 10740ggcgaccgag ttgctcttgc ccggcgtcaa
tacgggataa taccgcgcca catagcagaa 10800ctttaaaagt gctcatcatt
ggaaaacgtt cttcggggcg aaaactctca aggatcttac 10860cgctgttgag
atccagttcg atgtaaccca ctcgtgcacc caactgatct tcagcatctt
10920ttactttcac cagcgtttct gggtgagcaa aaacaggaag gcaaaatgcc
gcaaaaaagg 10980gaataagggc gacacggaaa tgttgaatac tcatactctt
cctttttcaa tattattgaa 11040gcatttatca gggttattgt ctcatgagcg
gatacatatt tgaatgtatt tagaaaaata 11100aacaaatag
111093597DNAFlavobacterium 3cagctggtga agtccgagct ggaggagaag
aaatccgagt tgaggcacaa gctgaagtac 60gtgccccacg agtacatcga gctgatcgag
atcgcccgga acagcaccca ggaccgtatc 120ctggagatga aggtgatgga
gttcttcatg aaggtgtacg gctacagggg caagcacctg 180ggcggctcca
ggaagcccga cggcgccatc tacaccgtgg gctcccccat cgactacggc
240gtgatcgtgg acaccaaggc ctactccggc ggctacaacc tgcccatcgg
ccaggccgac 300gaaatgcaga ggtacgtgga ggagaaccag accaggaaca
agcacatcaa ccccaacgag 360tggtggaagg tgtacccctc cagcgtgacc
gagttcaagt tcctgttcgt gtccggccac 420ttcaagggca actacaaggc
ccagctgacc aggctgaacc acatcaccaa ctgcaacggc 480gccgtgctgt
ccgtggagga gctcctgatc ggcggcgaga tgatcaaggc cggcaccctg
540accctggagg aggtgaggag gaagttcaac aacggcgaga tcaacttcgc ggccgac
597415DNAArtificialDNA target site of the left-hand TALE of Figure
1B 4gtgatccccc cagca 15515DNAArtificialSequence complementary to
the DNA target site of SEQ ID NO 4 5tgctgggggg atcac
1565DNAArtificialportion of the DNA target site of SEQ ID NO 4 that
is the sequence of the 5' end of the DNA tandem repeat 6cagca
5710DNAArtificialportion of the DNA target site of SEQ ID NO 4 that
is the gene sequence that is immediately adjacent to the 5' end of
the tandem repeat (outside of the tandem repeat sequence)
7gtgatccccc 10820DNAArtificialspacer of Figure 1B 8gcagcagcag
cagcagcagc 20920DNAArtificialsequence of the spacer on the
complementary strand 9gctgctgctg ctgctgctgc 201015DNAArtificialDNA
target site of the right-hand TALE of Figure 1B 10gctgctgctg ctgct
151115DNAArtificialSequence complementary to the DNA target site of
SEQ ID NO 10 11agcagcagca gcagc 151265DNAArtificialSplit left TALE
DNA-binding domain of Figure 1B 12tcgctgcagg tcggcctcag cctggccgaa
agaaagaaat ggtctgtgat ccccccagca 60gcagc
651365DNAArtificialSequence complementary to SEQ ID NO 12
13gctgctgctg gtccagccgg agtcggaccg gctttctttc tttaccagac actagggggc
60agcga 651427DNAArtificialCTG repeat 14ctgctgctgc tgctgctgct
gctgctg 271565DNAArtificialDNA comprising a DNA direct tandem
repeat consisting of 9 copies of the unit CTG 15tagccgggaa
tgctgctgct gctgctgctg ctgctgctgg ggggatcaca gaccatttct 60ttctt
651668DNAArtificialDNA comprising a DNA direct tandem repeat
consisting of 9 copies of the unit CTG 16tagccgggaa tgctgctgct
gctgctgctg ctgctgctgg ggggatcaca tacttttttt 60ttctttcg
681719DNAArtificialmutation detected in yeast 17aaaaaaaaaa
aaaaaaaaa 191824DNAArtificialmutation detected in yeast
18aaaaaaaaaa aaaaaaaaaa aaaa 241919DNAArtificialmutation detected
in yeast 19tttttttttt ttttttttt 192013DNAArtificialmutation
detected in yeast 20tttttttttt ttt 132115DNAArtificialmutation
detected in yeast 21aagaaaaaaa aaaaa 15225PRTArtificiallinker 22Gln
Gly Pro Ser Gly 1 5 2334DNAArtificialDNA comprising a DNA direct
tandem repeat consisting of 8 copies of the unit CAG 23gtgatccccc
cagcagcagc agcagcagca gcag 3424270PRTArtificialTALE tandem repeat
consisting of 8 copies of the unit of SEQ ID NO 25 (the RVDs being
HD, NG, NI, NN, NS, N*, HG and H* respectively) 24Leu Thr Pro Glu
Gln Val Val Ala Ile Ala Ser His Asp Gly Gly Lys 1 5 10 15 Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 20 25 30
His Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Gly Gly 35
40 45 Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu
Cys 50 55 60 Gln Ala His Gly Leu Thr Pro Glu Gln Val Val Ala Ile
Ala Ser Asn 65 70 75 80 Ile Gly Gly Lys Gln Ala Leu Glu Thr Val Gln
Arg Leu Leu Pro Val 85 90 95 Leu Cys Gln Ala His Gly Leu Thr Pro
Glu Gln Val Val Ala Ile Ala 100 105 110 Ser Asn Asn Gly Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg Leu Leu 115 120 125 Pro Val Leu Cys Gln
Ala His Gly Leu Thr Pro Glu Gln Val Val Ala 130 135 140 Ile Ala Ser
Asn Ser Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg 145 150 155 160
Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Glu Gln Val 165
170 175 Val Ala Ile Ala Ser Asn Gly Gly Lys Gln Ala Leu Glu Thr Val
Gln 180 185 190 Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr
Pro Glu Gln 195 200 205 Val Val Ala Ile Ala Ser His Gly Gly Gly Lys
Gln Ala Leu Glu Thr 210 215 220 Val Gln Arg Leu Leu Pro Val Leu Cys
Gln Ala His Gly Leu Thr Pro 225 230 235 240 Glu Gln Val Val Ala Ile
Ala Ser His Gly Gly Lys Gln Ala Leu Glu 245 250 255 Thr Val Gln Arg
Leu Leu Pro Val Leu Cys Gln Ala His Gly 260 265 270
2534PRTArtificialTAL effector tandem repeat unit [XX is selected
from the group consisting of HD, NG, NI, NN, NS, N*, HG, H*, IG,
HA, ND, NK, HI, HN, NA, SN and YG (the symbol * denotes that the
second X is missing)] 25Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser
Xaa Xaa Gly Gly Lys 1 5 10 15 Gln Ala Leu Glu Thr Val Gln Arg Leu
Leu Pro Val Leu Cys Gln Ala 20 25 30 His Gly 2634PRTArtificialTAL
effector tandem repeat unit [XX is selected from the group
consisting of HD, NG, NI, NN, NS, N*, HG, H*, IG, HA, ND, NK, HI,
HN, NA, SN and YG (the symbol * denotes that the second X is
missing)] 26Leu Thr Pro Asp Gln Val Val Ala Ile Ala Ser Xaa Xaa Gly
Gly Lys 1 5 10 15 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
Leu Cys Gln Asp 20 25 30 His Gly 2739DNAArtificialCAG repeat
27cagcagcagc agcagcagca gcagcagcag cagcagcag 392887DNAArtificialCAG
repeat 28cagcagcagc agcagcagca gcagcagcag cagcagcagc agcagcagca
gcagcagcag 60cagcagcagc agcagcagca gcagcag 872945DNAArtificialCAG
repeat 29cagcagcagc agcagcagca gcagcagcag cagcagcagc agcag
45309DNAArtificialCAG repeat 30cagcagcag 931366DNAArtificialCTG
repeat 31ctgctgctgc tgctgctgct gctgctgctg ctgctgctgc tgctgctgct
gctgctgctg 60ctgctgctgc tgctgctgct gctgctgctg ctgctgctgc tgctgctgct
gctgctgctg 120ctgctgctgc tgctgctgct gctgctgctg ctgctgctgc
tgctgctgct gctgctgctg 180ctgctgctgc tgctgctgct gctgctgctg
ctgctgctgc tgctgctgct gctgctgctg 240ctgctgctgc tgctgctgct
gctgctgctg ctgctgctgc tgctgctgct gctgctgctg 300ctgctgctgc
tgctgctgct gctgctgctg ctgctgctgc tgctgctgct gctgctgctg 360ctgctg
36632216DNAArtificialCTG repeat 32ctgctgctgc tgctgctgct gctgctgctg
ctgctgctgc tgctgctgct gctgctgctg 60ctgctgctgc tgctgctgct gctgctgctg
ctgctgctgc tgctgctgct gctgctgctg 120ctgctgctgc tgctgctgct
gctgctgctg ctgctgctgc tgctgctgct gctgctgctg 180ctgctgctgc
tgctgctgct gctgctgctg ctgctg 2163396DNAArtificialCTG repeat
33ctgctgctgc tgctgctgct gctgctgctg ctgctgctgc tgctgctgct gctgctgctg
60ctgctgctgc tgctgctgct gctgctgctg ctgctg 96346DNAArtificialCTG
repeat 34ctgctg 6356DNAArtificialCTG repeat 35cagcag
6364DNAArtificialCTG repeat 36gcag 437366DNAArtificialGAL10
enhancer 37gatcaaaaat catcgcttcg ctgattaatt accccagaaa taaggctaaa
aaactaatcg 60cattatcatc ctatggttgt taatttgatt cgttcatttg aaggtttgtg
gggccaggtt 120actgccaatt tttcctcttc ataaccataa aagctagtat
tgtagaatct ttattgttcg 180gagcagtgcg gcgcgaggca catctgcgtt
tcaggaacgc gaccggtgaa gacgaggacg 240cacggaggag agtcttcctt
cggagggctg tcacccgctc ggcggcttct aatccgtact 300tcaatatagc
aatgagcagt taagcgtatt actgaaagtt ccaaagagaa ggttttttta 360ggctaa
36638240DNAArtificialCYC1 promoter 38tcgacctcga gcagatccgc
caggcgtgta tatagcgtgg atggccaggc aactttagtg 60ctgacacata caggcatata
tatatgtgtg cgacgacaca tgatcatatg gcatgcatgt 120gctctgtatg
tatataaaac tcttgttttc ttcttttctc taaatattct ttccttatac
180attaggtcct ttgtagcata aattactata cttctataga cacgcaaaca
caaatacaca 240392829DNAArtificialsequence coding for the TALEN arm
that recognizes the DNA target site of SEQ ID NO 10 39atgggcgatc
ctaaaaagaa acgtaaggtc atcgataagg agaccgccgc tgccaagttc 60gagagacagc
acatggacag catcgatatc gccgatctac gcacgctcgg ctacagccag
120cagcaacagg agaagatcaa accgaaggtt cgttcgacag tggcgcagca
ccacgaggca 180ctggtcggcc acgggtttac acacgcgcac atcgttgcgt
taagccaaca cccggcagcg 240ttagggaccg tcgctgtcaa gtatcaggac
atgatcgcag cgttgccaga ggcgacacac 300gaagcgatcg ttggcgtcgg
caaacagtgg tccggcgcac gcgctctgga ggccttgctc 360acggtggcgg
gagagttgag aggtccaccg ttacagttgg acacaggcca acttctcaag
420attgcaaaac gtggcggcgt gaccgcagtg gaggcagtgc atgcatggcg
caatgcactg 480acgggtgccc cgctcaactt gaccccccag caggtggtgg
ccatcgccag caataatggt 540ggcaagcagg cgctggagac ggtccagcgg
ctgttgccgg tgctgtgcca ggcccacggc 600ttgaccccgg agcaggtggt
ggccatcgcc agccacgatg gcggcaagca ggcgctggag 660acggtccagc
ggctgttgcc ggtgctgtgc caggcccacg gcttgacccc ccagcaggtg
720gtggccatcg ccagcaatgg cggtggcaag caggcgctgg agacggtcca
gcggctgttg 780ccggtgctgt gccaggccca cggcttgacc ccccagcagg
tggtggccat cgccagcaat 840aatggtggca agcaggcgct ggagacggtc
cagcggctgt tgccggtgct gtgccaggcc 900cacggcttga ccccggagca
ggtggtggcc atcgccagcc acgatggcgg caagcaggcg 960ctggagacgg
tccagcggct gttgccggtg ctgtgccagg cccacggctt gaccccccag
1020caggtggtgg ccatcgccag caatggcggt ggcaagcagg cgctggagac
ggtccagcgg 1080ctgttgccgg tgctgtgcca ggcccacggc ttgacccccc
agcaggtggt ggccatcgcc 1140agcaataatg gtggcaagca ggcgctggag
acggtccagc ggctgttgcc ggtgctgtgc 1200caggcccacg gcttgacccc
ggagcaggtg gtggccatcg ccagccacga tggcggcaag 1260caggcgctgg
agacggtcca gcggctgttg ccggtgctgt gccaggccca cggcttgacc
1320ccccagcagg tggtggccat cgccagcaat ggcggtggca agcaggcgct
ggagacggtc 1380cagcggctgt tgccggtgct gtgccaggcc cacggcttga
ccccccagca ggtggtggcc 1440atcgccagca ataatggtgg caagcaggcg
ctggagacgg tccagcggct gttgccggtg 1500ctgtgccagg cccacggctt
gaccccggag caggtggtgg ccatcgccag ccacgatggc 1560ggcaagcagg
cgctggagac ggtccagcgg ctgttgccgg tgctgtgcca ggcccacggc 1620ttgacccccc
agcaggtggt ggccatcgcc agcaatggcg gtggcaagca ggcgctggag
1680acggtccagc ggctgttgcc ggtgctgtgc caggcccacg gcttgacccc
ccagcaggtg 1740gtggccatcg ccagcaataa tggtggcaag caggcgctgg
agacggtcca gcggctgttg 1800ccggtgctgt gccaggccca cggcttgacc
ccggagcagg tggtggccat cgccagccac 1860gatggcggca agcaggcgct
ggagacggtc cagcggctgt tgccggtgct gtgccaggcc 1920cacggcttga
ccccccagca ggtggtggcc atcgccagca atggcggtgg caagcaggcg
1980ctggagacgg tccagcggct gttgccggtg ctgtgccagg cccacggctt
gacccctcag 2040caggtggtgg ccatcgccag caatggcggc ggcaggccgg
cgctggagag cattgttgcc 2100cagttatctc gccctgatcc ggcgttggcc
gcgttgacca acgaccacct cgtcgccttg 2160gcctgcctcg gcgggcgtcc
tgcgctggat gcagtgaaaa agggattggg ggatcctatc 2220agccgttccc
agctggtgaa gtccgagctg gaggagaaga aatccgagtt gaggcacaag
2280ctgaagtacg tgccccacga gtacatcgag ctgatcgaga tcgcccggaa
cagcacccag 2340gaccgtatcc tggagatgaa ggtgatggag ttcttcatga
aggtgtacgg ctacaggggc 2400aagcacctgg gcggctccag gaagcccgac
ggcgccatct acaccgtggg ctcccccatc 2460gactacggcg tgatcgtgga
caccaaggcc tactccggcg gctacaacct gcccatcggc 2520caggccgacg
aaatgcagag gtacgtggag gagaaccaga ccaggaacaa gcacatcaac
2580cccaacgagt ggtggaaggt gtacccctcc agcgtgaccg agttcaagtt
cctgttcgtg 2640tccggccact tcaagggcaa ctacaaggcc cagctgacca
ggctgaacca catcaccaac 2700tgcaacggcg ccgtgctgtc cgtggaggag
ctcctgatcg gcggcgagat gatcaaggcc 2760ggcaccctga ccctggagga
ggtgaggagg aagttcaaca acggcgagat caacttcgcg 2820gccgactga
282940320DNAArtificialADH1 terminator 40tattgaccac acctctaccg
gcatgcaagc ttggcgtaat catggtcata gctgtttcct 60gtgtgaaatt gttatccgct
cacaattcca cacaacatac gagccggaag cataaagtgt 120aaagcctggg
gtgcctaatg agtgagctaa ctcacattaa ttgcgttgcg ctcactgccc
180gctttccagt cgggaaacct gtcgtgccag cagatctgtt tagcttgcct
cgtccccgcc 240gggtcacccg gccagcgaca tggaggccca gaataccctc
cttgacagtc ttgacgtgcg 300cagctcaggg gcatgatgtg
32041383DNAArtificialTEF promoter 41tgaggttctt ctttcatata
cttcctttta aaatcttgct aggatacagt tctcacatca 60catccgaaca taaacaacca
tgcatgggta aggaaaagac tcacgtttcg aggccgcgat 120taaattccaa
catggatgct gatttatatg ggtataaatg ggctcgcgat aatgtcgggc
180aatcaggtgc gacaatctat cgattgtatg ggaagcccga tgcgccagag
ttgtttctga 240aacatggcaa aggtagcgtt gccaatgatg ttacagatga
gatggtcaga ctaaactggc 300tgacggaatt tatgcctctt ccgaccatca
agcattttat ccgtactcct gatgatgcat 360ggttactcac cactgcgatc ccc
38342807DNAArtificialsequence coding for the KANMX selection marker
42ggcaaaacag cattccaggt attagaagaa tatcctgatt caggtgaaaa tattgttgat
60gcgctggcag tgttcctgcg ccggttgcat tcgattcctg tttgtaattg tccttttaac
120agcgatcgcg tatttcgcct cgctcaggcg caatcacgaa tgaataacgg
tttggttgat 180gcgagtgatt ttgatgacga gcgtaatggc tggcctgttg
aacaagtctg gaaagaaatg 240cataagcttt tgccattctc accggattca
gtcgtcactc atggtgattt ctcacttgat 300aaccttattt ttgacgaggg
gaaattaata ggttgtattg atgttggacg agtcggaatc 360gcagaccgat
accaggatct tgccatccta tggaactgcc tcggtgagtt ttctccttca
420ttacagaaac ggctttttca aaaatatggt attgataatc ctgatatgaa
taaattgcag 480tttcatttga tgctcgatga gtttttctaa tcagtactga
caataaaaag attcttgttt 540tcaagaactt gtcatttgta tagttttttt
atattgtagt tgttctattt taatcaaatg 600ttagcgtgat ttatattttt
tttcgcctcg acatcatctg cccagatgcg aagttaagtg 660cgcagaaagt
aatatcatgc gtcaatcgta tgtgaatgct ggtcgctata ctgctgtcga
720ttcgatacta acgccgccat ccagtgtcga aaacgagctc gaattcatcg
atgatatcag 780atccactagt ggcctatgcg accgcgg
80743213DNAArtificialTEF terminator 43atctgccggt ctccctatag
tgagtcgtat taatttcgat aagccaggtt aacctgcatt 60aatgaatcgg ccaacgcgcg
gggagaggcg gtttgcgtat tgggcgctct tccgcttcct 120cgctcactga
ctcgctgcgc tcggtcgttc ggctgcggcg agcggtatca gcatcgatga
180attccacgga ctatagacta tactagtata ctc
213441345DNAArtificial2-Micron replication origin 44gctatttttc
taacaaagca tcttagatta ctttttttct cctttgtgcg ctctataatg 60cagtctcttg
ataacttttt gcactgtagg tccgttaagg ttagaagaag gctactttgg
120tgtctatttt ctcttccata aaaaaagcct gactccactt cccgcgttta
ctgattacta 180gcgaagctgc gggtgcattt tttcaagata aaggcatccc
cgattatatt ctataccgat 240gtggattgcg catactttgt gaacagaaag
tgatagcgtt gatgattctt cattggtcag 300aaaattatga acggtttctt
ctattttgtc tctatatact acgtatagga aatgtttaca 360ttttcgtatt
gttttcgatt cactctatga atagttctta ctacaatttt tttgtctaaa
420gagtaatact agagataaac ataaaaaatg tagaggtcga gtttagatgc
aagttcaagg 480agcgaaaggt ggatgggtag gttatatagg gatatagcac
agagatatat agcaaagaga 540tacttttgag caatgtttgt ggaagcggta
ttcgcaatat tttagtagct cgttacagtc 600cggtgcgttt ttggtttttt
gaaagtgcgt cttcagagcg cttttggttt tcaaaagcgc 660tctgaagttc
ctatactttc tagagaatag gaacttcgga ataggaactt caaagcgttt
720ccgaaaacga gcgcttccga aaatgcaacg cgagctgcgc acatacagct
cactgttcac 780gtcgcaccta tatctgcgtg ttgcctgtat atatatatac
atgagaagaa cggcatagtg 840cgtgtttatg cttaaatgcg tacttatatg
cgtctattta tgtaggatga aaggtagtct 900agtacctcct gtgatattat
cccattccat gcggggtatc gtatgcttcc ttcagcacta 960ccctttagct
gttctatatg ctgccactcc tcaattggat tagtctcatc cttcaatgct
1020atcatttcct ttgatattgg atcatatgca tagtaccgag aaactagtgc
gaagtagtga 1080tcaggtattg ctgttatctg atgagtatac gttgtcctgg
ccacggcaga agcacgctta 1140tcgctccaat ttcccacaac attagtcaac
tccgttaggc ccttcattga aagaaatgag 1200gtcatcaaat gtcttccaat
gtgagatttt gggccatttt ttatagcaaa gattgaataa 1260ggcgcatttt
tcttcaaagc tttattgtac gatctgacta agttatcttt taataattgg
1320tattcctgtt tattgcttga agaat 1345451530DNAArtificialsequence
coding for the TAL effector tandem repeat of the TALEN arm that
binds to the DNA target site of SEQ ID NO 10 (15 adjacent units of
34 amino acids) 45ttgacccccc agcaggtggt ggccatcgcc agcaataatg
gtggcaagca ggcgctggag 60acggtccagc ggctgttgcc ggtgctgtgc caggcccacg
gcttgacccc ggagcaggtg 120gtggccatcg ccagccacga tggcggcaag
caggcgctgg agacggtcca gcggctgttg 180ccggtgctgt gccaggccca
cggcttgacc ccccagcagg tggtggccat cgccagcaat 240ggcggtggca
agcaggcgct ggagacggtc cagcggctgt tgccggtgct gtgccaggcc
300cacggcttga ccccccagca ggtggtggcc atcgccagca ataatggtgg
caagcaggcg 360ctggagacgg tccagcggct gttgccggtg ctgtgccagg
cccacggctt gaccccggag 420caggtggtgg ccatcgccag ccacgatggc
ggcaagcagg cgctggagac ggtccagcgg 480ctgttgccgg tgctgtgcca
ggcccacggc ttgacccccc agcaggtggt ggccatcgcc 540agcaatggcg
gtggcaagca ggcgctggag acggtccagc ggctgttgcc ggtgctgtgc
600caggcccacg gcttgacccc ccagcaggtg gtggccatcg ccagcaataa
tggtggcaag 660caggcgctgg agacggtcca gcggctgttg ccggtgctgt
gccaggccca cggcttgacc 720ccggagcagg tggtggccat cgccagccac
gatggcggca agcaggcgct ggagacggtc 780cagcggctgt tgccggtgct
gtgccaggcc cacggcttga ccccccagca ggtggtggcc 840atcgccagca
atggcggtgg caagcaggcg ctggagacgg tccagcggct gttgccggtg
900ctgtgccagg cccacggctt gaccccccag caggtggtgg ccatcgccag
caataatggt 960ggcaagcagg cgctggagac ggtccagcgg ctgttgccgg
tgctgtgcca ggcccacggc 1020ttgaccccgg agcaggtggt ggccatcgcc
agccacgatg gcggcaagca ggcgctggag 1080acggtccagc ggctgttgcc
ggtgctgtgc caggcccacg gcttgacccc ccagcaggtg 1140gtggccatcg
ccagcaatgg cggtggcaag caggcgctgg agacggtcca gcggctgttg
1200ccggtgctgt gccaggccca cggcttgacc ccccagcagg tggtggccat
cgccagcaat 1260aatggtggca agcaggcgct ggagacggtc cagcggctgt
tgccggtgct gtgccaggcc 1320cacggcttga ccccggagca ggtggtggcc
atcgccagcc acgatggcgg caagcaggcg 1380ctggagacgg tccagcggct
gttgccggtg ctgtgccagg cccacggctt gaccccccag 1440caggtggtgg
ccatcgccag caatggcggt ggcaagcagg cgctggagac ggtccagcgg
1500ctgttgccgg tgctgtgcca ggcccacggc 15304634PRTArtificialTAL
effector tandem repeat unit [XX is selected from the group
consisting of HD, NG, NI, NN, NS, N*, HG, H*, IG, HA, ND, NK, HI,
HN, NA, SN and YG (the symbol * denotes that the second X is
missing)] 46Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Xaa Xaa Gly
Gly Lys 1 5 10 15 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
Leu Cys Gln Ala 20 25 30 His Gly 4760DNAArtificial(non-specific)
C-terminal truncated unit of 20 amino acids of the TALEN arm that
binds to the DNA target site of SEQ ID NO 10 47ttgacccctc
agcaggtggt ggccatcgcc agcaatggcg gcggcaggcc ggcgctggag
604820PRTArtificial(non-specific) C-terminal truncated unit of 20
amino acids of the TALEN arm that binds to the DNA target site of
SEQ ID NO 10 48Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly
Gly Gly Arg 1 5 10 15 Pro Ala Leu Glu 20 49199PRTArtificialFokI
monomer 49Gln Leu Val Lys Ser Glu Leu Glu Glu Lys Lys Ser Glu Leu
Arg His 1 5 10 15 Lys Leu Lys Tyr Val Pro His Glu Tyr Ile Glu Leu
Ile Glu Ile Ala 20 25 30 Arg Asn Ser Thr Gln Asp Arg Ile Leu Glu
Met Lys Val Met Glu Phe 35 40 45 Phe Met Lys Val Tyr Gly Tyr Arg
Gly Lys His Leu Gly Gly Ser Arg 50 55 60 Lys Pro Asp Gly Ala Ile
Tyr Thr Val Gly Ser Pro Ile Asp Tyr Gly 65 70 75 80 Val Ile Val Asp
Thr Lys Ala Tyr Ser Gly Gly Tyr Asn Leu Pro Ile 85 90 95 Gly Gln
Ala Asp Glu Met Gln Arg Tyr Val Glu Glu Asn Gln Thr Arg 100 105 110
Asn Lys His Ile Asn Pro Asn Glu Trp Trp Lys Val Tyr Pro Ser Ser 115
120 125 Val Thr Glu Phe Lys Phe Leu Phe Val Ser Gly His Phe Lys Gly
Asn 130 135 140 Tyr Lys Ala Gln Leu Thr Arg Leu Asn His Ile Thr Asn
Cys Asn Gly 145 150 155 160 Ala Val Leu Ser Val Glu Glu Leu Leu Ile
Gly Gly Glu Met Ile Lys 165 170 175 Ala Gly Thr Leu Thr Leu Glu Glu
Val Arg Arg Lys Phe Asn Asn Gly 180 185 190 Glu Ile Asn Phe Ala Ala
Asp 195 502811DNAArtificialsequence coding for the TALEN arm that
binds to the DNA target site of SEQ ID NO 4 50atgggcgatc ctaaaaagaa
acgtaaggtc atcgattacc catacgatgt tccagattac 60gctatcgata tcgccgatct
acgcacgctc ggctacagcc agcagcaaca ggagaagatc 120aaaccgaagg
ttcgttcgac agtggcgcag caccacgagg cactggtcgg ccacgggttt
180acacacgcgc acatcgttgc gttaagccaa cacccggcag cgttagggac
cgtcgctgtc 240aagtatcagg acatgatcgc agcgttgcca gaggcgacac
acgaagcgat cgttggcgtc 300ggcaaacagt ggtccggcgc acgcgctctg
gaggccttgc tcacggtggc gggagagttg 360agaggtccac cgttacagtt
ggacacaggc caacttctca agattgcaaa acgtggcggc 420gtgaccgcag
tggaggcagt gcatgcatgg cgcaatgcac tgacgggtgc cccgctcaac
480ttgacccccc agcaggtggt ggccatcgcc agcaataatg gtggcaagca
ggcgctggag 540acggtccagc ggctgttgcc ggtgctgtgc caggcccacg
gcttgacccc ccagcaggtg 600gtggccatcg ccagcaatgg cggtggcaag
caggcgctgg agacggtcca gcggctgttg 660ccggtgctgt gccaggccca
cggcttgacc ccccagcagg tggtggccat cgccagcaat 720aatggtggca
agcaggcgct ggagacggtc cagcggctgt tgccggtgct gtgccaggcc
780cacggcttga ccccggagca ggtggtggcc atcgccagca atattggtgg
caagcaggcg 840ctggagacgg tgcaggcgct gttgccggtg ctgtgccagg
cccacggctt gaccccccag 900caggtggtgg ccatcgccag caatggcggt
ggcaagcagg cgctggagac ggtccagcgg 960ctgttgccgg tgctgtgcca
ggcccacggc ttgaccccgg agcaggtggt ggccatcgcc 1020agccacgatg
gcggcaagca ggcgctggag acggtccagc ggctgttgcc ggtgctgtgc
1080caggcccacg gcttgacccc ggagcaggtg gtggccatcg ccagccacga
tggcggcaag 1140caggcgctgg agacggtcca gcggctgttg ccggtgctgt
gccaggccca cggcttgacc 1200ccggagcagg tggtggccat cgccagccac
gatggcggca agcaggcgct ggagacggtc 1260cagcggctgt tgccggtgct
gtgccaggcc cacggcttga ccccggagca ggtggtggcc 1320atcgccagcc
acgatggcgg caagcaggcg ctggagacgg tccagcggct gttgccggtg
1380ctgtgccagg cccacggctt gaccccggag caggtggtgg ccatcgccag
ccacgatggc 1440ggcaagcagg cgctggagac ggtccagcgg ctgttgccgg
tgctgtgcca ggcccacggc 1500ttgaccccgg agcaggtggt ggccatcgcc
agccacgatg gcggcaagca ggcgctggag 1560acggtccagc ggctgttgcc
ggtgctgtgc caggcccacg gcttgacccc ggagcaggtg 1620gtggccatcg
ccagcaatat tggtggcaag caggcgctgg agacggtgca ggcgctgttg
1680ccggtgctgt gccaggccca cggcttgacc ccccagcagg tggtggccat
cgccagcaat 1740aatggtggca agcaggcgct ggagacggtc cagcggctgt
tgccggtgct gtgccaggcc 1800cacggcttga ccccggagca ggtggtggcc
atcgccagcc acgatggcgg caagcaggcg 1860ctggagacgg tccagcggct
gttgccggtg ctgtgccagg cccacggctt gaccccggag 1920caggtggtgg
ccatcgccag caatattggt ggcaagcagg cgctggagac ggtgcaggcg
1980ctgttgccgg tgctgtgcca ggcccacggc ttgacccctc agcaggtggt
ggccatcgcc 2040agcaatggcg gcggcaggcc ggcgctggag agcattgttg
cccagttatc tcgccctgat 2100ccggcgttgg ccgcgttgac caacgaccac
ctcgtcgcct tggcctgcct cggcgggcgt 2160cctgcgctgg atgcagtgaa
aaagggattg ggggatccta tcagccgttc ccagctggtg 2220aagtccgagc
tggaggagaa gaaatccgag ttgaggcaca agctgaagta cgtgccccac
2280gagtacatcg agctgatcga gatcgcccgg aacagcaccc aggaccgtat
cctggagatg 2340aaggtgatgg agttcttcat gaaggtgtac ggctacaggg
gcaagcacct gggcggctcc 2400aggaagcccg acggcgccat ctacaccgtg
ggctccccca tcgactacgg cgtgatcgtg 2460gacaccaagg cctactccgg
cggctacaac ctgcccatcg gccaggccga cgaaatgcag 2520aggtacgtgg
aggagaacca gaccaggaac aagcacatca accccaacga gtggtggaag
2580gtgtacccct ccagcgtgac cgagttcaag ttcctgttcg tgtccggcca
cttcaagggc 2640aactacaagg cccagctgac caggctgaac cacatcacca
actgcaacgg cgccgtgctg 2700tccgtggagg agctcctgat cggcggcgag
atgatcaagg ccggcaccct gaccctggag 2760gaggtgagga ggaagttcaa
caacggcgag atcaacttcg cggccgactg a 281151320DNAArtificialADH1
terminator 51tttggacttc ttcgccagag gtttggtcaa gtctccaatc aaggttgtcg
gcttgtctac 60cttgccagaa atttacgaaa agatggaaaa gggtcaaatc gttggtagat
acgttgttga 120cacttctaaa taagcgaatt tcttatgatt tatgattttt
attattaaat aagttataaa 180aaaaataagt gtatacaaat tttaaagtga
ctcttaggtt ttaaaacgaa aattcttatt 240cttgagtaac tctttcctgt
aggtcaggtt gctttctcag gtatagcatg aggtcgctct 300tattgaccac
acctctaccg 320521095DNAArtificialsequence coding for the LEU2
selection marker 52atgtctgccc ctaagaagat cgtcgttttg ccaggtgacc
acgttggtca agaaatcaca 60gccgaagcca ttaaggttct taaagctatt tctgatgttc
gttccaatgt caagttcgat 120ttcgaaaatc atttaattgg tggtgctgct
atcgatgcta caggtgtccc acttccagat 180gaggcgctgg aagcctccaa
gaaggttgat gccgttttgt taggtgctgt gggtggtcct 240aaatggggta
ccggtagtgt tagacctgaa caaggtttac taaaaatccg taaagaactt
300caattgtacg ccaacttaag accatgtaac tttgcatccg actctctttt
agacttatct 360ccaatcaagc cacaatttgc taaaggtact gacttcgttg
ttgtcagaga attagtggga 420ggtatttact ttggtaagag aaaggaagac
gatggtgatg gtgtcgcttg ggatagtgaa 480caatacaccg ttccagaagt
gcaaagaatc acaagaatgg ccgctttcat ggccctacaa 540catgagccac
cattgcctat ttggtccttg gataaagcta atgttttggc ctcttcaaga
600ttatggagaa aaactgtgga ggaaaccatc aagaacgaat tccctacatt
gaaggttcaa 660catcaattga ttgattctgc cgccatgatc ctagttaaga
acccaaccca cctaaatggt 720attataatca ccagcaacat gtttggtgat
atcatctccg atgaagcctc cgttatccca 780ggttccttgg gtttgttgcc
atctgcgtcc ttggcctctt tgccagacaa gaacaccgca 840tttggtttgt
acgaaccatg ccacggttct gctccagatt tgccaaagaa taaggtcaac
900cctatcgcca ctatcttgtc tgctgcaatg atgttgaaat tgtcattgaa
cttgcctgaa 960gaaggtaagg ccattgaaga tgcagttaaa aaggttttgg
atgcaggtat cagaactggt 1020gatttaggtg gttccaacag taccacggaa
gtcggtgatg ctgtcgccga agaagttaag 1080aaaatccttg cttaa
1095531345DNAArtificial2-Micron replication origin 53aacgaagcat
ctgtgcttca ttttgtagaa caaaaatgca acgcgagagc gctaattttt 60caaacaaaga
atctgagctg catttttaca gaacagaaat gcaacgcgaa agcgctattt
120taccaacgaa gaatctgtgc ttcatttttg taaaacaaaa atgcaacgcg
agagcgctaa 180tttttcaaac aaagaatctg agctgcattt ttacagaaca
gaaatgcaac gcgagagcgc 240tattttacca acaaagaatc tatacttctt
ttttgttcta caaaaatgca tcccgagagc 300gctatttttc taacaaagca
tcttagatta ctttttttct cctttgtgcg ctctataatg 360cagtctcttg
ataacttttt gcactgtagg tccgttaagg ttagaagaag gctactttgg
420tgtctatttt ctcttccata aaaaaagcct gactccactt cccgcgttta
ctgattacta 480gcgaagctgc gggtgcattt tttcaagata aaggcatccc
cgattatatt ctataccgat 540gtggattgcg catactttgt gaacagaaag
tgatagcgtt gatgattctt cattggtcag 600aaaattatga acggtttctt
ctattttgtc tctatatact acgtatagga aatgtttaca 660ttttcgtatt
gttttcgatt cactctatga atagttctta ctacaatttt tttgtctaaa
720gagtaatact agagataaac ataaaaaatg tagaggtcga gtttagatgc
aagttcaagg 780agcgaaaggt ggatgggtag gttatatagg gatatagcac
agagatatat agcaaagaga 840tacttttgag caatgtttgt ggaagcggta
ttcgcaatat tttagtagct cgttacagtc 900cggtgcgttt ttggtttttt
gaaagtgcgt cttcagagcg cttttggttt tcaaaagcgc 960tctgaagttc
ctatactttc tagagaatag gaacttcgga ataggaactt caaagcgttt
1020ccgaaaacga gcgcttccga aaatgcaacg cgagctgcgc acatacagct
cactgttcac 1080gtcgcaccta tatctgcgtg ttgcctgtat atatatatac
atgagaagaa cggcatagtg 1140cgtgtttatg cttaaatgcg tacttatatg
cgtctattta tgtaggatga aaggtagtct 1200agtacctcct gtgatattat
cccattccat gcggggtatc gtatgcttcc ttcagcacta 1260ccctttagct
gttctatatg ctgccactcc tcaattggat tagtctcatc cttcaatgct
1320atcatttcct ttgatattgg atcat 1345541530DNAArtificialsequence
coding for the TAL effector tandem repeat of the TALEN arm that
binds to the DNA target site of SEQ ID NO 4 (15 adjacent units of
34 amino acids) 54ttgacccccc agcaggtggt ggccatcgcc agcaataatg
gtggcaagca ggcgctggag 60acggtccagc ggctgttgcc ggtgctgtgc caggcccacg
gcttgacccc ccagcaggtg 120gtggccatcg ccagcaatgg cggtggcaag
caggcgctgg agacggtcca gcggctgttg 180ccggtgctgt gccaggccca
cggcttgacc ccccagcagg tggtggccat cgccagcaat 240aatggtggca
agcaggcgct ggagacggtc cagcggctgt tgccggtgct gtgccaggcc 300cacggcttga
ccccggagca ggtggtggcc atcgccagca atattggtgg caagcaggcg
360ctggagacgg tgcaggcgct gttgccggtg ctgtgccagg cccacggctt
gaccccccag 420caggtggtgg ccatcgccag caatggcggt ggcaagcagg
cgctggagac ggtccagcgg 480ctgttgccgg tgctgtgcca ggcccacggc
ttgaccccgg agcaggtggt ggccatcgcc 540agccacgatg gcggcaagca
ggcgctggag acggtccagc ggctgttgcc ggtgctgtgc 600caggcccacg
gcttgacccc ggagcaggtg gtggccatcg ccagccacga tggcggcaag
660caggcgctgg agacggtcca gcggctgttg ccggtgctgt gccaggccca
cggcttgacc 720ccggagcagg tggtggccat cgccagccac gatggcggca
agcaggcgct ggagacggtc 780cagcggctgt tgccggtgct gtgccaggcc
cacggcttga ccccggagca ggtggtggcc 840atcgccagcc acgatggcgg
caagcaggcg ctggagacgg tccagcggct gttgccggtg 900ctgtgccagg
cccacggctt gaccccggag caggtggtgg ccatcgccag ccacgatggc
960ggcaagcagg cgctggagac ggtccagcgg ctgttgccgg tgctgtgcca
ggcccacggc 1020ttgaccccgg agcaggtggt ggccatcgcc agccacgatg
gcggcaagca ggcgctggag 1080acggtccagc ggctgttgcc ggtgctgtgc
caggcccacg gcttgacccc ggagcaggtg 1140gtggccatcg ccagcaatat
tggtggcaag caggcgctgg agacggtgca ggcgctgttg 1200ccggtgctgt
gccaggccca cggcttgacc ccccagcagg tggtggccat cgccagcaat
1260aatggtggca agcaggcgct ggagacggtc cagcggctgt tgccggtgct
gtgccaggcc 1320cacggcttga ccccggagca ggtggtggcc atcgccagcc
acgatggcgg caagcaggcg 1380ctggagacgg tccagcggct gttgccggtg
ctgtgccagg cccacggctt gaccccggag 1440caggtggtgg ccatcgccag
caatattggt ggcaagcagg cgctggagac ggtgcaggcg 1500ctgttgccgg
tgctgtgcca ggcccacggc 15305534PRTArtificialTAL effector tandem
repeat unit [XX is selected from the group consisting of HD, NG,
NI, NN, NS, N*, HG, H*, IG, HA, ND, NK, HI, HN, NA, SN and YG (the
symbol * denotes that the second X is missing)] 55Leu Thr Pro Glu
Gln Val Val Ala Ile Ala Ser Xaa Xaa Gly Gly Lys 1 5 10 15 Gln Ala
Leu Glu Thr Val Gln Ala Leu Leu Pro Val Leu Cys Gln Ala 20 25 30
His Gly 5614PRTArtificial(non-specific) C-terminal truncated unit
of 14 amino acids of the TALEN arm that binds to the DNA target
site of SEQ ID NO 10 56Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser
Asn Gly Gly 1 5 10
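The relationship between the truncated C-terminal unit sequences can be checked by translating the coding sequence of SEQ ID NO 47 and comparing it with SEQ ID NO 48 and SEQ ID NO 56. The sketch below assumes the standard genetic code; `CODON_TABLE` and `translate` are hypothetical helper names, not part of the application.

```python
# Illustrative sketch only; helper names are hypothetical, not from the application.
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering of each codon position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string (spaces allowed) in frame 1 using the standard code."""
    dna = dna.replace(" ", "").lower()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Sequences copied from the listing (SEQ ID NO 48 and 56 in one-letter codes).
SEQ_ID_47 = "ttgacccctc agcaggtggt ggccatcgcc agcaatggcg gcggcaggcc ggcgctggag"
SEQ_ID_48 = "LTPQQVVAIASNGGGRPALE"
SEQ_ID_56 = "LTPQQVVAIASNGG"

protein = translate(SEQ_ID_47)
print(protein == SEQ_ID_48)       # the 60-nt sequence encodes the 20-residue unit
print(protein[:14] == SEQ_ID_56)  # SEQ ID NO 56 is its first 14 residues
```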
* * * * *
References