U.S. patent application number 14/240735, for modified Cascade ribonucleoproteins and uses thereof, was published by the patent office on 2014-10-02.
The applicant listed for this patent is WAGENINGEN UNIVERSITEIT. The invention is credited to Stan Johan Jozef Brouns and John Van Der Oost.
Publication Number: 20140294773
Application Number: 14/240735
Family ID: 45695084
Publication Date: 2014-10-02
United States Patent Application 20140294773
Kind Code: A1
Brouns; Stan Johan Jozef; et al.
October 2, 2014
MODIFIED CASCADE RIBONUCLEOPROTEINS AND USES THEREOF
Abstract
A clustered regularly interspaced short palindromic repeat
(CRISPR)-associated complex for adaptive antiviral defence
(Cascade) is provided. The Cascade protein complex comprises at
least the CRISPR-associated protein subunits Cas7, Cas5 and Cas6,
at least one of which carries an additional amino acid sequence
possessing nucleic acid or chromatin modifying, visualising,
transcription activating or transcription repressing activity. The
Cascade complex with additional activity is combined with an RNA
molecule to produce a ribonucleoprotein complex. The RNA molecule
is selected to have substantial complementarity to a target
sequence. Targeted ribonucleoproteins can be used as genetic
engineering tools for precise cutting of nucleic acids in
homologous recombination, non-homologous end joining, gene
modification, gene integration or mutation repair, or for the
visualisation, transcriptional activation or repression of target
nucleic acids. A pair of ribonucleoproteins fused to FokI dimers
may be used to generate double-strand breaks in the DNA to
facilitate these applications in a sequence-specific manner.
Inventors: Brouns; Stan Johan Jozef (Wageningen, NL); Van Der Oost;
John (Renkum, NL)
Applicant: WAGENINGEN UNIVERSITEIT (Wageningen, NL)
Family ID: 45695084
Appl. No.: 14/240735
Filed: December 21, 2012
PCT Filed: December 21, 2012
PCT No.: PCT/EP2012/076674
371 Date: February 24, 2014
Current U.S. Class: 424/93.4; 424/93.6; 424/94.6; 435/196;
435/320.1; 435/325; 435/348; 435/366; 435/419; 435/455; 435/468;
435/471; 514/44A; 514/44R; 536/23.2
Current CPC Class: C07K 14/47 20130101; C07K 2319/85 20130101;
C12N 15/81 20130101; C07K 2319/22 20130101; A61K 48/005 20130101;
C07K 14/245 20130101; C12N 15/70 20130101; C12N 15/74 20130101;
C12N 9/22 20130101; C12N 15/62 20130101; C12N 15/82 20130101;
C12N 9/16 20130101; C07K 2319/80 20130101; C12N 15/66 20130101;
C12N 15/86 20130101; C12N 2310/20 20170501; C12Y 301/21004 20130101;
A61P 31/12 20180101; C12N 15/902 20130101; C07K 2319/09 20130101;
C07K 2319/71 20130101; C12N 15/907 20130101; C07K 2319/60 20130101;
A61K 38/00 20130101
Class at Publication: 424/93.4; 424/93.6; 424/94.6; 435/455;
435/468; 435/471; 435/196; 435/325; 435/348; 435/366; 435/419;
435/320.1; 514/44.R; 514/44.A; 536/23.2
International Class: C12N 9/16 20060101 C12N009/16; C12N 15/86
20060101 C12N015/86; C12N 15/81 20060101 C12N015/81; A61K 48/00
20060101 A61K048/00; C12N 15/82 20060101 C12N015/82
Claims
1.-46. (canceled)
47. A composition comprising: (a) a double-stranded target nucleic
acid comprising a protospacer adjacent motif; and (b) a
synthetically constructed non-natural nucleic acid comprising a
CRISPR RNA, wherein a spacer portion of said CRISPR RNA is
hybridized with one strand of said double-stranded target nucleic
acid.
48. The composition of claim 47, wherein said synthetically
constructed non-natural nucleic acid comprising a CRISPR RNA is a
designed non-natural nucleic acid comprising a CRISPR RNA.
49. The composition of claim 47, wherein said spacer portion of
said CRISPR RNA is hybridized adjacent to said protospacer adjacent
motif.
50. The composition of claim 47, further comprising a CRISPR
associated polypeptide.
51. The composition of claim 50, wherein said CRISPR associated
polypeptide is a nuclease.
52. The composition of claim 50, wherein said CRISPR associated
polypeptide forms a complex with said double-stranded target
nucleic acid.
53. The composition of claim 50, wherein said CRISPR associated
polypeptide forms a complex with said CRISPR RNA.
54. The composition of claim 47, wherein said protospacer adjacent
motif is located at the 5' end of said double-stranded target
nucleic acid.
55. The composition of claim 47, wherein said protospacer adjacent
motif is located at the 3' end of said double-stranded target
nucleic acid.
56. The composition of claim 47, wherein said protospacer adjacent
motif comprises 5'-CTT-3'.
57. The composition of claim 47, wherein said protospacer adjacent
motif comprises 5'-CAT-3'.
58. The composition of claim 47, wherein said protospacer adjacent
motif comprises 5'-CTC-3'.
59. The composition of claim 47, wherein said protospacer adjacent
motif comprises 5'-CCT-3'.
60. The composition of claim 47, wherein one strand of said
protospacer adjacent motif comprises a CC dinucleotide.
61. The composition of claim 47, wherein one strand of said
protospacer adjacent motif comprises a CC dinucleotide, wherein
said CC dinucleotide is hybridized with a GG dinucleotide on the
other strand of said protospacer adjacent motif.
62. The composition of claim 47, wherein said double-stranded
target nucleic acid is DNA.
63. The composition of claim 47, wherein said double-stranded
target nucleic acid is a target for genetic modification.
64. The composition of claim 47, wherein said spacer portion of
said CRISPR RNA hybridizes to said double-stranded target nucleic
acid with a dissociation constant from 1 picomolar to 1
micromolar.
65. The composition of claim 47, wherein said spacer portion of
said CRISPR RNA hybridizes to said double-stranded target nucleic
acid with a dissociation constant from 1 nanomolar to 100
nanomolar.
66. The composition of claim 47, wherein said spacer portion of
said CRISPR RNA is synthetically constructed to hybridize to a
sequence of said double-stranded target nucleic acid comprising at
least 50% identity to said spacer portion.
67. The composition of claim 47, wherein said CRISPR RNA comprises
a handle region.
68. The composition of claim 47, wherein said CRISPR RNA comprises
a seed region.
69. The composition of claim 47, wherein said CRISPR RNA comprises
a 3' hairpin region.
70. The composition of claim 47, wherein said CRISPR RNA comprises
a handle region, a seed region, and a 3' hairpin region, or any
combination thereof.
71. The composition of claim 70, wherein said handle region is
located 5' to said seed region which is located 5' to said spacer
portion which is located 5' to said 3' hairpin region.
72. The composition of claim 50, wherein said CRISPR associated
polypeptide is adapted to induce a modification in said
double-stranded target nucleic acid.
73. The composition of claim 50, wherein said CRISPR associated
polypeptide further comprises a polypeptide selected from the group
consisting of: helicase, nuclease, methyltransferase, demethylase,
acetylase, deacetylase, phosphatase, kinase, transcription
activator, RNA polymerase subunit, transcription repressor, DNA
binding polypeptide, DNA structuring polypeptide, marker
polypeptide, reporter polypeptide, fluorescent polypeptide, ligand
binding polypeptide, signal polypeptide, subcellular localization
polypeptide, and antibody epitope, or any combination thereof.
74. The composition of claim 50, wherein said CRISPR associated
polypeptide further comprises a nuclease.
75. The composition of claim 50, wherein said CRISPR associated
polypeptide further comprises a helicase.
76. The composition of claim 50, wherein said CRISPR associated
polypeptide further comprises a transcription activator.
77. The composition of claim 50, wherein said CRISPR associated
polypeptide further comprises a transcription repressor.
78. The composition of claim 47, wherein said double-stranded
target nucleic acid is bound to a solid support.
79. A method of synthesizing a synthetically constructed
non-natural CRISPR RNA comprising: synthesizing the synthetically
constructed non-natural nucleic acid of claim 47.
80. A composition comprising: a nucleic acid-targeting nucleic acid
comprising: (a) a synthetically constructed non-natural spacer
region adapted to hybridize to one strand of a double-stranded
target nucleic acid comprising a protospacer adjacent motif; (b) a
nucleic acid duplex; (c) a linker linking both strands of said
duplex, wherein at least one strand of said duplex is adapted to
form a complex with a CRISPR associated polypeptide.
81. The composition of claim 80, wherein said duplex comprises a
hairpin.
82. The composition of claim 81, wherein said duplex is 3' to said
synthetically constructed non-natural spacer region.
83. The composition of claim 80, wherein said linker comprises a
tetranucleotide loop.
84. The composition of claim 80, wherein said synthetically
constructed non-natural spacer region is 5' to said duplex.
85. The composition of claim 80, wherein said synthetically
constructed non-natural spacer region hybridizes adjacent to said
protospacer adjacent motif of said one strand of said
double-stranded target nucleic acid.
86. The composition of claim 80, wherein said spacer region
comprises a seed region.
87. The composition of claim 80, wherein said CRISPR associated
polypeptide is a nuclease.
88. The composition of claim 80, wherein said protospacer adjacent
motif is located at the 5' end of said double-stranded target
nucleic acid.
89. The composition of claim 80, wherein said protospacer adjacent
motif is located at the 3' end of said double-stranded target
nucleic acid.
90. The composition of claim 80, wherein said protospacer adjacent
motif comprises 5'-CTT-3'.
91. The composition of claim 80, wherein said protospacer adjacent
motif comprises 5'-CAT-3'.
92. The composition of claim 80, wherein said protospacer adjacent
motif comprises 5'-CTC-3'.
93. The composition of claim 80, wherein said protospacer adjacent
motif comprises 5'-CCT-3'.
94. The composition of claim 80, wherein one strand of said
protospacer adjacent motif comprises a CC dinucleotide.
95. The composition of claim 80, wherein one strand of said
protospacer adjacent motif comprises a CC dinucleotide, wherein
said CC dinucleotide is hybridized with a GG dinucleotide on the
other strand of said protospacer adjacent motif.
96. The composition of claim 80, wherein said double-stranded
target nucleic acid is DNA.
97. The composition of claim 80, wherein said double-stranded
target nucleic acid is a target for genetic modification.
98. The composition of claim 80, wherein said synthetically
constructed spacer region hybridizes to said double-stranded target
nucleic acid with a dissociation constant from 1 picomolar to 1
micromolar.
99. The composition of claim 80, wherein said synthetically
constructed spacer region hybridizes to said double-stranded target
nucleic acid with a dissociation constant from 1 nanomolar to 100
nanomolar.
100. The composition of claim 80, wherein said synthetically
constructed spacer region is synthetically constructed to hybridize
to a sequence of said double-stranded target nucleic acid
comprising at least 50% identity to said synthetically constructed
spacer region.
101. The composition of claim 80, wherein said nucleic
acid-targeting nucleic acid is from 35-75 nucleotides in
length.
102. The composition of claim 80, further comprising a delivery
vehicle.
103. The composition of claim 102, wherein said delivery vehicle
comprises a virus.
104. The composition of claim 102, wherein said delivery vehicle
comprises adenovirus.
105. The composition of claim 102, wherein said delivery vehicle
comprises an Agrobacterium.
106. A pharmaceutical composition comprising the composition of
claim 80.
107. A method of synthesizing a synthetically constructed
non-natural CRISPR RNA comprising: synthesizing the synthetically
constructed non-natural nucleic acid of claim 80.
108. The composition of claim 80 for use as a medicament.
109. A composition comprising: a delivery vehicle comprising a
CRISPR RNA, wherein a portion of said CRISPR RNA is adapted to
hybridize to a region of one strand of a double-stranded target
nucleic acid, wherein said region is adjacent to a protospacer
adjacent motif.
110. The composition of claim 109, wherein said delivery vehicle is
selected from the group consisting of: cationic polymers, lipids,
and cell-penetrating peptide, or any combination thereof.
111. The composition of claim 109, wherein said delivery vehicle
comprises a virus.
112. The composition of claim 109, wherein said delivery vehicle
comprises adenovirus.
113. The composition of claim 109, wherein said delivery vehicle
comprises an Agrobacterium.
114. A pharmaceutical composition comprising the composition of
claim 109.
115. The composition of claim 109 for use as a medicament.
116. A vector comprising: (a) a first polynucleotide sequence
encoding a CRISPR associated polypeptide; and (b) a second
polynucleotide sequence encoding a visualizing polypeptide.
117. The vector of claim 116, wherein said CRISPR associated
polypeptide comprises nuclease activity.
118. The vector of claim 116, wherein said vector comprises a third
polynucleotide sequence encoding for a linker, wherein said third
polynucleotide sequence is between said first and second
polynucleotide sequences.
119. The vector of claim 116, wherein said vector further comprises
a polynucleotide sequence encoding for a CRISPR RNA.
120. The vector of claim 116, wherein said visualizing polypeptide
comprises a fluorescent protein.
121. The vector of claim 116, wherein said visualizing polypeptide
comprises green fluorescent protein.
122. The vector of claim 116, wherein said visualizing polypeptide
comprises yellow fluorescent protein.
123. A pharmaceutical composition comprising: the vector of claim
116.
124. A delivery vehicle comprising: the vector of claim 116.
125. The delivery vehicle of claim 124, wherein said delivery
vehicle is selected from the group consisting of: cationic
polymers, lipids, and cell-penetrating peptide, or any combination
thereof.
126. The delivery vehicle of claim 124, wherein said delivery
vehicle comprises a virus.
127. The delivery vehicle of claim 124, wherein said delivery
vehicle comprises adenovirus.
128. The delivery vehicle of claim 124, wherein said delivery
vehicle comprises an Agrobacterium.
129. A method for modifying a genomic target nucleic acid in a
eukaryotic cell comprising: (a) contacting a nucleic acid-targeting
nucleic acid with a target nucleic acid, wherein said nucleic
acid-targeting nucleic acid comprises a CRISPR RNA; and (b)
modifying said target nucleic acid in said eukaryotic cell.
130. The method of claim 129, wherein said target nucleic acid is
adjacent to a protospacer adjacent motif.
131. The method of claim 129, wherein a spacer of said nucleic
acid-targeting nucleic acid is synthetically constructed.
132. The method of claim 129, wherein said target nucleic acid is
single stranded DNA.
133. The method of claim 129, wherein said target nucleic acid is
double stranded DNA.
134. The method of claim 129, wherein said target nucleic acid is
RNA.
135. The method of claim 129, wherein said target nucleic acid is
genomic DNA.
136. The method of claim 129, wherein said contacting comprises
hybridizing a portion of said nucleic acid-targeting nucleic acid
to said target nucleic acid.
137. The method of claim 129, wherein said contacting comprises
binding said nucleic acid-targeting nucleic acid to said target
nucleic acid with a dissociation constant of at least 1
micromolar.
138. The method of claim 129, wherein said nucleic acid-targeting
nucleic acid is introduced into said eukaryotic cell by a delivery
method selected from the group consisting of: microinjection,
transfection, electroporation, calcium co-precipitation, cationic
polymer and lipid delivery, and cell-penetrating particles, or any
combination thereof.
139. The method of claim 129, wherein said contacting comprises
hybridizing a synthetically constructed spacer portion of said
CRISPR RNA to a sequence of said target nucleic acid comprising at
least 50% identity to said synthetically constructed spacer
portion.
140. The method of claim 129, wherein said nucleic acid-targeting
nucleic acid is RNA.
141. The method of claim 129, wherein said nucleic acid-targeting
nucleic acid comprises a handle region.
142. The method of claim 129, wherein said nucleic acid-targeting
nucleic acid comprises a seed region.
143. The method of claim 129, wherein said nucleic acid-targeting
nucleic acid comprises a spacer region.
144. The method of claim 129, wherein said nucleic acid-targeting
nucleic acid comprises a 3' hairpin region.
145. The method of claim 129, wherein said nucleic acid-targeting
nucleic acid comprises a handle region, a seed region, a spacer
region, and a 3' hairpin region, or any combination thereof.
146. The method of claim 145, wherein said handle region is located
5' to said seed region which is located 5' to said spacer region
which is located 5' to said 3' hairpin region.
147. The method of claim 129, wherein said modifying comprises
cleaving said target nucleic acid.
148. The method of claim 129, wherein said modifying comprises
introducing a single-stranded break in said target nucleic
acid.
149. The method of claim 129, wherein said modifying comprises
introducing a double-stranded break in said target nucleic
acid.
150. The method of claim 129, wherein said modifying comprises
allowing said target nucleic acid to be visualized.
151. The method of claim 129, wherein said modifying comprises
introducing a modification selected from the group consisting of:
an organic dye, a radiolabel, and a spin label, or any combination
thereof.
152. The method of claim 129, wherein said modifying comprises
deleting a portion of said target nucleic acid.
153. The method of claim 129, wherein said modifying is selected
from the group consisting of: activating transcription of said
target nucleic acid, repressing transcription of said target
nucleic acid, or both activating and repressing transcription of
said target nucleic acid.
154. The method of claim 129, wherein said modifying comprises
inserting a nucleic acid into said target nucleic acid.
155. The method of claim 129, wherein said modifying is performed
by a CRISPR associated polypeptide.
156. The method of claim 155, wherein said CRISPR associated
polypeptide is a nuclease.
157. The method of claim 129, wherein said cell is an insect
cell.
158. The method of claim 129, wherein said cell is a cell of a
multicellular organism.
159. The method of claim 129, wherein said cell is a mammalian
cell.
160. The method of claim 129, wherein said cell is a mammalian stem
cell.
161. The method of claim 129, wherein said cell is a human
cell.
162. The method of claim 129, wherein said cell is a human stem
cell.
163. The method of claim 129, wherein said cell is a plant
cell.
164. The method of claim 129, wherein said cell is an isolated
cell.
165. The method of claim 164, wherein said cell is a yeast
cell.
166. The method of claim 164, wherein said cell is a fungal
cell.
167. A composition comprising: (a) a eukaryotic cell; (b) a
double-stranded target nucleic acid in said eukaryotic cell
comprising a protospacer adjacent motif; and (c) a non-natural
nucleic acid comprising a CRISPR RNA, wherein a spacer portion of
said CRISPR RNA is hybridized with one strand of said
double-stranded target nucleic acid in said eukaryotic cell.
168. The eukaryotic cell of claim 167, wherein a portion of said
CRISPR RNA hybridizes to one strand of a double-stranded target
nucleic acid in said eukaryotic cell.
169. The eukaryotic cell of claim 167, wherein said cell is an
insect cell.
170. The eukaryotic cell of claim 167, wherein said cell is a cell
of a multicellular organism.
171. The eukaryotic cell of claim 167, wherein said cell is a
mammalian cell.
172. The eukaryotic cell of claim 167, wherein said cell is a
mammalian stem cell.
173. The eukaryotic cell of claim 167, wherein said cell is a human
cell.
174. The eukaryotic cell of claim 167, wherein said cell is a human
stem cell.
175. The eukaryotic cell of claim 167, wherein said cell is a plant
cell.
176. The eukaryotic cell of claim 167, wherein said cell is a cell
of a multicellular tissue.
177. The eukaryotic cell of claim 167, wherein said cell is an
isolated cell.
178. The eukaryotic cell of claim 167, wherein said cell is a cell
of an organ.
179. The eukaryotic cell of claim 167, wherein said cell is a cell
of an organism.
180. A method for using a eukaryotic cell comprising: (a) isolating
said eukaryotic cell; (b) contacting a target nucleic acid in said
cell with a complex comprising: (i) a CRISPR RNA; and (ii) a CRISPR
associated polypeptide; (c) modifying said target nucleic acid; and
(d) relocating said eukaryotic cell.
181. The method of claim 180, wherein said isolating comprises
isolating from a whole tissue.
182. The method of claim 180, wherein said isolating comprises
isolating from an organ.
183. The method of claim 180, wherein said isolating comprises
isolating from an organism.
184. The method of claim 180, wherein said relocating comprises
transplanting said cell to a different location from which said
cell originated.
185. The method of claim 180, wherein said relocating comprises
transplanting said cell to a different organism from which said
cell originated.
186. The method of claim 180, wherein said relocating comprises
transplanting said cell to a same location from which said cell
originated.
187. The method of claim 180, wherein said relocating comprises
transplanting said cell to a same organism from which said cell
originated.
188. The method of claim 180, wherein said relocating comprises a
transplanting method selected from the group consisting of:
allograft transplant, autograft transplant, isograft transplant,
and xenograft transplant, or any combination thereof.
189. The method of claim 180, wherein said contacting comprises
hybridizing a portion of said nucleic acid-targeting nucleic acid
to said target nucleic acid.
190. The method of claim 180, wherein said contacting comprises
hybridizing a portion of said CRISPR RNA to said target nucleic
acid with a dissociation constant from 1 picomolar to 1
micromolar.
191. The method of claim 180, wherein said contacting comprises
hybridizing a portion of said CRISPR RNA to said target nucleic
acid with a dissociation constant from 1 nanomolar to 100
nanomolar.
192. The method of claim 180, wherein said contacting comprises
introducing said nucleic acid-targeting nucleic acid by a delivery
method selected from the group consisting of: microinjection,
transfection, electroporation, calcium co-precipitation, cationic
polymer and lipid delivery, and cell-penetrating particles, or any
combination thereof.
193. The method of claim 180, wherein said contacting comprises
contacting said complex to said target nucleic acid at a site
adjacent to a protospacer adjacent motif.
194. The method of claim 193, wherein said protospacer adjacent
motif is located 5' to said target nucleic acid.
195. The method of claim 193, wherein said protospacer adjacent
motif is located 3' to said target nucleic acid.
196. The method of claim 193, wherein said protospacer adjacent
motif comprises 5'-CTT-3'.
197. The method of claim 193, wherein said protospacer adjacent
motif comprises 5'-CAT-3'.
198. The method of claim 193, wherein said protospacer adjacent
motif comprises 5'-CTC-3'.
199. The method of claim 193, wherein said protospacer adjacent
motif comprises 5'-CCT-3'.
200. The method of claim 180, wherein said target nucleic acid is
selected from the group consisting of ssDNA, dsDNA, and RNA.
201. The method of claim 180, wherein said target nucleic acid is
DNA.
202. The method of claim 180, wherein said CRISPR associated
polypeptide is a nuclease.
203. The method of claim 180, wherein said CRISPR associated
polypeptide further comprises a polypeptide selected from the group
consisting of: helicase, nuclease, methyltransferase, demethylase,
acetylase, deacetylase, phosphatase, kinase, transcription
activator, RNA polymerase subunit, transcription repressor, DNA
binding polypeptide, DNA structuring polypeptide, marker
polypeptide, reporter polypeptide, fluorescent polypeptide, ligand
binding polypeptide, signal polypeptide, subcellular localization
polypeptide, and antibody epitope, or any combination thereof.
204. The method of claim 180, wherein said CRISPR RNA comprises a
handle region.
205. The method of claim 180, wherein said CRISPR RNA comprises a
seed region.
206. The method of claim 180, wherein said CRISPR RNA comprises a
3' hairpin region.
207. The method of claim 180, wherein said CRISPR RNA comprises a
handle region, a seed region, a spacer region and a 3' hairpin
region, or any combination thereof.
208. The method of claim 207, wherein said handle region is located
5' to said seed region which is located 5' to said spacer region,
which is located 5' to said 3' hairpin region.
209. The method of claim 180, wherein said contacting comprises
hybridizing a synthetically constructed spacer portion of said
CRISPR RNA to a sequence of said target nucleic acid comprising at
least 50% identity to said synthetically constructed spacer
portion.
210. The method of claim 180, wherein said nucleic acid-targeting
nucleic acid is RNA.
211. The method of claim 180, wherein said modifying comprises
cleaving said target nucleic acid.
212. The method of claim 180, wherein said modifying comprises
introducing a single-stranded break in said target nucleic
acid.
213. The method of claim 180, wherein said modifying comprises
introducing a double-stranded break in said target nucleic
acid.
214. The method of claim 180, wherein said modifying comprises
allowing said target nucleic acid to be visualized.
215. The method of claim 180, wherein said modifying comprises
introducing a modification selected from the group consisting of:
an organic dye, a radiolabel, and a spin label, or any combination
thereof.
216. The method of claim 180, wherein said modifying comprises
deleting a portion of said target nucleic acid.
217. The method of claim 180, wherein said modifying is selected
from the group consisting of: activating transcription of said
target nucleic acid, and repressing transcription of said target
nucleic acid, or any combination thereof.
218. The method of claim 180, wherein said modifying comprises
inserting a nucleic acid into said target nucleic acid.
219. The method of claim 180, wherein said eukaryotic cell is an
insect cell.
220. The method of claim 180, wherein said eukaryotic cell is a
mammalian cell.
221. The method of claim 180, wherein said eukaryotic cell is a
mammalian stem cell.
222. The method of claim 180, wherein said eukaryotic cell is a
human cell.
223. The method of claim 180, wherein said eukaryotic cell is a
human stem cell.
224. The method of claim 180, wherein said eukaryotic cell is a
plant cell.
225. A CRISPR associated polypeptide comprising: an amino acid
sequence comprising XXXNHXNNHXXHH, wherein: (a) "X" signifies an
amino acid identical to an amino acid at the same position in SEQ
ID NO: 3, "H" signifies an amino acid similar to an amino acid at
the same position in SEQ ID NO: 3, and "N" signifies any amino acid;
and (b) said polypeptide comprises an activity selected from the
group consisting of: visualizing activity, nuclease activity, or
both visualizing and nuclease activity.
226. The polypeptide of claim 225, further comprising at least 18%
identity to SEQ ID NO: 3.
227. The polypeptide of claim 225, further comprising at least 17%
identity to SEQ ID NO: 4.
228. The polypeptide of claim 225 further comprising at least 16%
identity to SEQ ID NO: 5.
229. The polypeptide of claim 225, wherein said CRISPR associated
polypeptide is adapted to induce a modification in said
double-stranded target nucleic acid.
230. The polypeptide of claim 229, wherein said modification is
adapted to cleave said double-stranded target nucleic acid.
231. The polypeptide of claim 229, wherein said modification
comprises introduction of a single-stranded break in said
double-stranded target nucleic acid.
232. The polypeptide of claim 229, wherein said modification
comprises introduction of a double-stranded break in said
double-stranded target nucleic acid.
233. The polypeptide of claim 229, wherein said modification
comprises deletion of a portion of said double-stranded target
nucleic acid.
234. The polypeptide of claim 229, wherein said modification
comprises introduction of a nucleic acid into said double-stranded
target nucleic acid.
235. The polypeptide of claim 229, wherein said modification is
selected from the group consisting of: a modification to visualize
said double-stranded target nucleic acid, a modification to
activate transcription of said double-stranded target nucleic acid,
and a modification to repress transcription of said double-stranded
target nucleic acid, or any combination thereof.
236. The polypeptide of claim 229, wherein said CRISPR associated
polypeptide further comprises a polypeptide selected from the group
consisting of: helicase, nuclease, methyltransferase, demethylase,
acetylase, deacetylase, phosphatase, kinase, transcription
activator, RNA polymerase subunit, transcription repressor, DNA
binding polypeptide, DNA structuring polypeptide, marker
polypeptide, reporter polypeptide, fluorescent polypeptide, ligand
binding polypeptide, signal polypeptide, subcellular localization
polypeptide, and antibody epitope, or any combination thereof.
237. A pharmaceutical composition comprising: the polypeptide of
claim 225.
238. A delivery vehicle comprising: the polypeptide of claim
225.
239. The delivery vehicle of claim 238, wherein said delivery
vehicle is selected from the group consisting of: cationic
polymers, lipids, cell-penetrating peptide, and viruses, or any
combination thereof.
240. A kit comprising: the composition of claim 80.
241. A kit comprising: the composition of claim 109.
242. A kit comprising: the vector of claim 116.
243. A kit comprising: the composition of claim 167.
244. A kit comprising: the polypeptide of claim 225.
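Several of the claims above turn on two sequence checks. Claims 56 to 59 ask whether a candidate site lies next to one of the listed PAM trinucleotides. Claim 66 asks whether the target sequence has at least 50% identity to the spacer.

The following is a minimal Python sketch of both checks, not the applicant's method. It makes these assumptions:
- The PAM sits immediately 5' of the protospacer on the single strand being scanned. The claims also cover a 3' PAM and the other strand.
- Identity is scored ungapped only.
- Seed-region weighting and the binding affinities of claims 64 and 65 are ignored.

The functions `find_candidate_sites`, `percent_identity` and `sites_matching_spacer` are hypothetical names used only for illustration.

```python
from __future__ import annotations

# PAM trinucleotides listed in claims 56-59, read 5'->3' on the scanned strand.
PAMS = ("CTT", "CAT", "CTC", "CCT")


def find_candidate_sites(dna: str, spacer_len: int,
                         pams: tuple[str, ...] = PAMS) -> list[tuple[int, str, str]]:
    """Return (start, pam, protospacer) for each window lying directly 3' of a PAM.

    Assumption (hypothetical model): the PAM is immediately 5' of the
    protospacer and only the given strand is scanned.
    """
    dna = dna.upper()
    sites = []
    # Stop early enough that a full-length protospacer still fits after the PAM.
    for i in range(len(dna) - 3 - spacer_len + 1):
        pam = dna[i:i + 3]
        if pam in pams:
            start = i + 3
            sites.append((start, pam, dna[start:start + spacer_len]))
    return sites


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def sites_matching_spacer(dna: str, spacer: str,
                          min_identity: float = 50.0) -> list[tuple[int, str, float]]:
    """Return (start, pam, identity) for PAM-adjacent sites meeting the threshold.

    The spacer may be given as RNA; U is converted to T so that it can be
    compared with the protospacer written in DNA letters.
    """
    spacer_dna = spacer.upper().replace("U", "T")
    hits = []
    for start, pam, protospacer in find_candidate_sites(dna, len(spacer_dna)):
        identity = percent_identity(spacer_dna, protospacer)
        if identity >= min_identity:
            hits.append((start, pam, identity))
    return hits
```

For example, with the toy sequence `"GGGCTTACGTACGTACGGG"` and the spacer `"ACGUACGUAC"`, the only PAM-adjacent window is at position 6 behind `CTT`, and it matches the spacer fully. A real design tool would also scan the reverse complement and model the seed region separately.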
Description
[0001] The invention relates to the field of genetic engineering
and more particularly to the area of gene and/or genome
modification of organisms, including prokaryotes and eukaryotes.
The invention also concerns methods of making site-specific tools
for use in methods of genome analysis and genetic modification,
whether in vivo or in vitro. The invention more particularly
relates to the field of ribonucleoproteins which recognise and
associate with nucleic acid sequences in a sequence-specific
manner.
[0002] Bacteria and archaea have a wide variety of defense
mechanisms against invasive DNA. So called CRISPR/Cas defense
systems provide adaptive immunity by integrating plasmid and viral
DNA fragments in loci of clustered regularly interspaced short
palindromic repeats (CRISPR) on the host chromosome. The viral or
plasmid-derived sequences, known as spacers, are separated from
each other by repeating host-derived sequences. These repetitive
elements are the genetic memory of this immune system and each
CRISPR locus contains a diverse repertoire of unique 'spacer'
sequences acquired during previous encounters with foreign genetic
elements.
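The repeat/spacer organisation described in [0002] can be illustrated with a short Python sketch that recovers the spacers lying between consecutive repeat copies. It makes three assumptions:
- Every repeat copy is identical and known in advance.
- The repeat is a made-up 10-nt sequence rather than a real CRISPR repeat.
- `split_crispr_array` is a hypothetical helper, not a published tool.

Real arrays contain degenerate repeats and would need approximate matching.

```python
from __future__ import annotations


def split_crispr_array(array: str, repeat: str) -> list[str]:
    """Return the spacers found between consecutive exact copies of `repeat`.

    Hypothetical helper: assumes exact, non-overlapping repeat copies.
    """
    array = array.upper()
    repeat = repeat.upper()
    # Record the start position of every exact repeat copy.
    positions = []
    pos = array.find(repeat)
    while pos != -1:
        positions.append(pos)
        pos = array.find(repeat, pos + len(repeat))
    # A spacer runs from the end of one repeat to the start of the next.
    return [array[left + len(repeat):right]
            for left, right in zip(positions, positions[1:])]


# Toy array: hypothetical repeat flanking two made-up spacers.
REPEAT = "CGGTTTATCC"
ARRAY = REPEAT + "AAAAACCCCC" + REPEAT + "GGGGGTTTTT" + REPEAT
print(split_crispr_array(ARRAY, REPEAT))
```

In this toy array the two spacers are recovered in their order along the array. Leader sequences and any sequence outside the outermost repeats are ignored.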
[0003] Acquisition of foreign DNA is the first step of
immunization, but protection requires that the CRISPR is
transcribed and that these long transcripts are processed into
short CRISPR-derived RNAs (crRNAs) that each contain a unique
spacer sequence complementary to a foreign nucleic acid
challenger.
[0004] In addition to the crRNA, genetic experiments in several
organisms have revealed that a unique set of CRISPR-associated
(Cas) proteins is required for the steps of acquiring immunity, for
crRNA biogenesis and for targeted interference. Also, a subset of
Cas proteins from phylogenetically distinct CRISPR systems has
been shown to assemble into large complexes that include a
crRNA.
[0005] A recent re-evaluation of the diversity of CRISPR/Cas
systems has resulted in a classification into three distinct types
(Makarova K. et al (2011) Nature Reviews Microbiology--AOP 9 May
2011; doi:10.1038/nrmicro2577) that vary in cas gene content, and
display major differences throughout the CRISPR defense pathway.
(The Makarova classification and nomenclature for CRISPR-associated
genes is adopted in the present specification.) RNA transcripts of
CRISPR loci (pre-crRNA) are cleaved specifically in the repeat
sequences by CRISPR associated (Cas) endoribonucleases in type I
and type III systems or by RNase III in type II systems; the
generated crRNAs are utilized by a Cas protein complex as a guide
RNA to detect complementary sequences of either invading DNA or
RNA. Cleavage of target nucleic acids has been demonstrated in
vitro for the Pyrococcus furiosus type III-B system, which cleaves
RNA in a ruler-anchored mechanism, and, more recently, in vivo for
the Streptococcus thermophilus type II system, which cleaves DNA in
the complementary target sequence (protospacer). In contrast, for
type I systems the mechanism of CRISPR-interference is still
largely unknown.
[0006] The model organism Escherichia coli strain K12 possesses a
CRISPR/Cas type I-E (previously known as CRISPR subtype E (Cse)).
It contains eight cas genes (cas1, cas2, cas3, cse1, cse2, cas7,
cas5, cas6e) and a downstream CRISPR (type-2 repeats). In
Escherichia coli K12 the eight cas genes are encoded upstream of
the CRISPR locus. Cas1 and Cas2 do not appear to be needed for
target interference, but are likely to participate in new target
sequence acquisition. In contrast, six Cas proteins: Cse1, Cse2,
Cas3, Cas7, Cas5 and Cas6e (previously also known as CasA, CasB,
Cas3, CasC/Cse4, CasD and CasE/Cse3 respectively) are essential for
protection against lambda phage challenge. Five of these proteins:
Cse1, Cse2, Cas7, Cas5 and Cas6e (previously known as CasA, CasB,
CasC/Cse4, CasD and CasE/Cse3 respectively) assemble with a crRNA
to form a multi-subunit ribonucleoprotein (RNP) referred to as
Cascade.
[0007] In E. coli, Cascade is a 405 kDa ribonucleoprotein complex
composed of an unequal stoichiometry of five functionally essential
Cas proteins: Cse1₁Cse2₂Cas7₆Cas5₁Cas6e₁
(i.e. under previous nomenclature
CasA₁B₂C₆D₁E₁) and a 61-nt CRISPR-derived
RNA. Cascade is an obligate RNP that relies on the crRNA for
complex assembly and stability, and for the identification of
invading nucleic acid sequences. Cascade is a surveillance complex
that finds and binds foreign nucleic acids that are complementary
to the spacer sequence of the crRNA.
[0008] Jore et al. (2011) entitled "Structural basis for CRISPR
RNA-guided DNA recognition by Cascade" Nature Structural &
Molecular Biology 18: 529-537 describes cleavage of the pre-crRNA
transcript by the Cas6e subunit of Cascade, after which the mature
61-nt crRNA is retained by the CRISPR complex. The
crRNA serves as a guide RNA for sequence specific binding of
Cascade to double stranded (ds) DNA molecules through base pairing
between the crRNA spacer and the complementary protospacer, forming
a so-called R-loop. This is known to be an ATP-independent
process.
[0009] Brouns S. J. J., et al (2008) entitled "Small CRISPR RNAs
guide antiviral defense in prokaryotes" Science 321: 960-964
teaches that Cascade loaded with a crRNA requires Cas3 for in vivo
phage resistance.
[0010] Marraffini L. & Sontheimer E. (2010) entitled "CRISPR
interference: RNA-directed adaptive immunity in bacteria and
archaea" Nature Reviews Genetics 11: 181-190 is a review article
which summarises the state of knowledge in the art in the field.
Some suggestions are made about CRISPR-based applications and
technologies, but this is mainly in the area of generating phage
resistant strains of domesticated bacteria for the dairy industry.
The specific cleavage of RNA molecules in vitro by a crRNP complex
in Pyrococcus furiosus is suggested as something which awaits
further development. Manipulation of CRISPR systems is also
suggested as a possible way of reducing transmission of
antibiotic-resistant bacterial strains in hospitals. The authors
stress that further research effort will be needed to explore the
potential utility of the technology in these areas.
[0011] US2011236530 A1 (Manoury et al.) entitled "Genetic cluster
of strains of Streptococcus thermophilus having unique rheological
properties for dairy fermentation" discloses certain S.
thermophilus strains which ferment milk so that it is highly
viscous and weakly ropy. A specific CRISPR locus of defined
sequence is disclosed.
[0012] US2011217739 A1 (Terns et al.) entitled "Cas6 polypeptides
and methods of use" discloses polypeptides which have Cas6
endoribonuclease activity. The polypeptides cleave a target RNA
polynucleotide having a Cas6 recognition domain and cleavage site.
Cleavage may be carried out in vitro or in vivo. Microbes such as
E. coli or Haloferax volcanii are genetically modified so as to
express Cas6 endoribonuclease activity.
[0013] WO2010054154 (Danisco) entitled "Bifidobacteria CRISPR
sequences" discloses various CRISPR sequences found in
Bifidobacteria and their use in making genetically altered strains
of the bacteria which are altered in their phage resistance
characteristics.
[0014] US2011189776 A1 (Terns et al.) entitled "Prokaryotic
RNAi-like system and methods of use" describes methods of
inactivating target polynucleotides in vitro or in prokaryotic
microbes in vivo. The methods use a psiRNA having a 5' region of
5-10 nucleotides chosen from a repeat from a CRISPR locus
immediately upstream of a spacer. The 3' region is substantially
complementary to a portion of the target polynucleotide. Also
described are polypeptides having endonuclease activity in the
presence of psiRNA and target polynucleotide.
[0015] EP2341149 A1 (Danisco) entitled "Use of CRISPR associated
genes (CAS)" describes how one or more Cas genes can be used for
modulating resistance of bacterial cells against bacteriophage;
particularly bacteria which provide a starter culture or probiotic
culture in dairy products.
[0016] WO2010075424 (The Regents of the University of California)
entitled "Compositions and methods for downregulating prokaryotic
genes" discloses an isolated polynucleotide comprising a CRISPR
array. At least one spacer of the CRISPR is complementary to a gene
of a prokaryote so that it can down-regulate expression of the
gene; particularly where the gene is associated with biofuel
production.
[0017] WO2008108989 (Danisco) entitled "Cultures with improved
phage resistance" discloses selecting bacteriophage resistant
strains of bacteria and also selecting the strains which have an
additional spacer having 100% identity with a region of phage RNA.
Improved strain combinations and starter culture rotations are
described for use in the dairy industry. Certain phages are
described for use as biocontrol agents.
[0018] WO2009115861 (Institut Pasteur) entitled "Molecular typing
and subtyping of Salmonella by identification of the variable
nucleotide sequences of the CRISPR loci" discloses methods for
detecting and identifying bacteria of the Salmonella genus by
using their variable nucleotide sequences contained in CRISPR
loci.
[0019] WO2006073445 (Danisco) entitled "Detection and typing of
bacterial strains" describes detecting and typing of bacterial
strains in food products, dietary supplements and environmental
samples. Strains of Lactobacillus are identified through specific
CRISPR nucleotide sequences.
[0020] Urnov F et al. (2010) entitled "Genome editing with
engineered zinc finger nucleases" Nature Reviews Genetics 11: 636-646 is a review
article about zinc finger nucleases and how they have been
instrumental in the field of reverse genetics in a range of model
organisms. Zinc finger nucleases have been developed so that
precisely targeted genome cleavage is possible, followed by gene
modification in the subsequent repair process. However, zinc finger
nucleases are generated by fusing a number of zinc finger
DNA-binding domains to a DNA cleavage domain. DNA sequence
specificity is achieved by coupling several zinc fingers in series,
each recognising a three-nucleotide motif. A significant drawback
with the technology is that new zinc fingers need to be developed
for each new DNA locus that is to be cleaved. This requires
protein engineering and extensive screening to ensure specificity
of DNA binding.
[0021] In the fields of genetic engineering and genomic research
there is an ongoing need for improved agents for sequence/site
specific nucleic acid detection and/or cleavage.
[0022] The inventors have made a surprising discovery in that
certain bacteria expressing Cas3, which has helicase-nuclease
activity, express Cas3 as a fusion with Cse1. The inventors have
also unexpectedly been able to produce artificial fusions of Cse1
with other nuclease enzymes.
[0023] The inventors have also discovered that Cas3-independent
target DNA recognition by Cascade marks DNA for cleavage by Cas3,
and that Cascade DNA binding is governed by topological
requirements of the target DNA.
[0024] The inventors have further found that Cascade is unable to
bind relaxed target plasmids, but surprisingly Cascade displays
high affinity for targets which have a negatively supercoiled (nSC)
topology.
[0025] Accordingly in a first aspect the present invention provides
a clustered regularly interspaced short palindromic repeat
(CRISPR)-associated complex for antiviral defence (Cascade), the
Cascade protein complex, or portion thereof, comprising at least
CRISPR-associated protein subunits: [0026] Cas7 (or COG1857)
having an amino acid sequence of SEQ ID NO:3 or a sequence of at
least 18% identity therewith, [0027] Cas5 (or COG1688) having an
amino acid sequence of SEQ ID NO:4 or a sequence of at least 17%
identity therewith, and [0028] Cas6 (or COG1583) having an amino
acid sequence of SEQ ID NO:5 or a sequence of at least 16% identity
therewith; and wherein at least one of the subunits includes an
additional amino acid sequence providing nucleic acid or chromatin
modifying, visualising, transcription activating or transcription
repressing activity.
[0029] A subunit which includes an additional amino acid sequence
having nucleic acid or chromatin modifying, visualising,
transcription activating or transcription repressing activity is an
example of what may be termed "a subunit linked to at least one
functional moiety"; a functional moiety being the polypeptide or
protein made up of the additional amino acid sequence. The
transcription activating activity may be that leading to activation
or upregulation of a desired gene; the transcription repressing
activity may be that leading to repression or downregulation of a
desired gene. The gene is selected by targeting the Cascade complex
of the invention with an RNA molecule, as described further below.
[0030] The additional amino acid sequence having nucleic acid or
chromatin modifying, visualising, transcription activating or
transcription repressing activity is preferably formed of
contiguous amino acid residues. These additional amino acids may be
viewed as a polypeptide or protein which is contiguous and forms
part of the Cas or Cse subunit(s) concerned. Such a polypeptide or
protein sequence is preferably not normally part of any Cas or Cse
subunit amino acid sequence. In other words, the additional amino
acid sequence having nucleic acid or chromatin modifying,
visualising, transcription activating or transcription repressing
activity may be other than a Cas or Cse subunit amino acid
sequence, or portion thereof, i.e. may be other than a Cas3 subunit
amino acid sequence or portion thereof.
[0031] The additional amino acid sequence with nucleic acid or
chromatin modifying, visualising, transcription activating or
transcription repressing activity may, as desired, be obtained or
derived from the same organism, e.g. E. coli, as the Cas or Cse
subunit(s).
[0032] Additionally and/or alternatively to the above, the
additional amino acid sequence having nucleic acid or chromatin
modifying, visualising, transcription activating or transcription
repressing activity may be "heterologous" to the amino acid
sequence of the Cas or Cse subunit(s). Therefore, the additional
amino acid sequence may be obtained or derived from an organism
different from the organism from which the Cas and/or Cse
subunit(s) are derived or originate.
[0033] Throughout, sequence identity may be determined by way of
BLAST and subsequent Cobalt multiple sequence alignment at the
National Center for Biotechnology Information webserver, where the
sequence in question is compared to a reference sequence (e.g. SEQ
ID NO: 3, 4 or 5). The amino acid sequences may be defined in terms
of percentage sequence similarity based on a BLOSUM62 matrix or
percentage identity with a given reference sequence (e.g. SEQ ID
NO:3, 4 or 5). The similarity or identity of a sequence involves an
initial step of making the best alignment before calculating the
percentage conservation with the reference and reflects a measure
of evolutionary relationship of sequences.
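As an illustration of the align-then-count procedure described above, the sketch below substitutes a simple Needleman-Wunsch global alignment with unit scores for the BLAST and Cobalt tools named in the text, then counts identical residues as a percentage of the reference length. The scoring values, the function names and the choice of denominator are assumptions made for illustration; percentages reported by BLAST or Cobalt may differ.

```python
"""Sketch: percent identity after a pairwise global alignment.

Assumption: Needleman-Wunsch with unit scores stands in for the
BLAST + Cobalt procedure described in the text.
"""


def global_align(query: str, ref: str, match: int = 1,
                 mismatch: int = -1, gap: int = -1) -> tuple[str, str]:
    """Return the two gapped strings of one optimal global alignment."""
    n, m = len(query), len(ref)
    # score[i][j] holds the best score for aligning query[:i] with ref[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if query[i - 1] == ref[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover the alignment.
    out_q, out_r = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_q.append(query[i - 1])
            out_r.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_q.append(query[i - 1])
            out_r.append("-")
            i -= 1
        else:
            out_q.append("-")
            out_r.append(ref[j - 1])
            j -= 1
    return "".join(reversed(out_q)), "".join(reversed(out_r))


def percent_identity(query: str, ref: str) -> float:
    """Identical aligned residues as a percentage of the reference length."""
    aq, ar = global_align(query, ref)
    identical = sum(1 for a, b in zip(aq, ar) if a == b and a != "-")
    return 100.0 * identical / len(ref)


# Lower identity limits stated in [0025] for SEQ ID NO: 3, 4 and 5.
MIN_IDENTITY = {"Cas7": 18.0, "Cas5": 17.0, "Cas6": 16.0}


def within_scope(subunit: str, query: str, ref: str) -> bool:
    """True if the query meets the stated lower identity limit."""
    return percent_identity(query, ref) >= MIN_IDENTITY[subunit]
```

For example, aligning the toy fragment "MSF" against "MSNF" places a gap opposite N and scores 3 identical residues out of 4, i.e. 75% identity. Real comparisons against SEQ ID NO: 3 to 5 would use full-length sequences and a substitution matrix such as BLOSUM62.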
[0034] Cas7 may have a sequence similarity of at least 31% with SEQ
ID NO:3; Cas5 may have a sequence similarity of at least 26% with
SEQ ID NO:4. Cas6 may have a sequence similarity of at least 27%
with SEQ ID NO:5.
TABLE-US-00001

For Cse1/CasA (502 AA):
>gi|6130667|ref|NP_417240.1| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 1]
MNLLIDNWIPVRPRNGGKVQIINLQSLYCSRDQWRLSLPRDDMELAALAL
LVCIGQIIAPAKDDVEFRHRIMNPLTEDEFQQLIAPWIDMFYLNHAEHPF
MQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCTAIALFN
QANQAPGFGGGFKSGLRGGTPVTTFVRGIDLRSTVLLNVLTLPRLQKQFP
NESHTENQPTWIKPIKSNESIPASSIGFVRGLFWQPAHIELCDPIGIGKC
SCCGQESNLRYTGFLKEKFTFTVNGLWPHPHSPCLVTVKKGEVEEKFLAF
TTSAPSWTQISRVVVDKIIQNENGNRVAAVVNQFRNIAPQSPLELIMGGY
RNNQASILERRHDVLMFNQGWQQYGNVINEIVTVGLGYKTALRKALYTFA
EGFKNKDFKGAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADL
RDKLHQLCEMLFNQSVAPYAHHPKLISTLALARATLYKHLRELKPQGGPS
NG

For Cse2/CasB (160 AA):
>gi|16130666|ref|NP_417239.1| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 2]
MADEIDAMALYRAWQQLDNGSCAQIRRVSEPDELRDIPAFYRLVQPFGWE
NPRHQQALLRMVFCLSAGKNVIRHQDKKSEQTTGISLGRALANSGRINER
RIFQLIRADRTADMVQLRRLLTHAEPVLDWPLMARMLTWWGKRERQQLLE
DFVLTTNKNA

For Cas7/CasC/Cse4 (363 AA):
>gi|16130665|ref|NP_417238.1| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 3]
MSNFINIHVLISHSPSCLNRDDMNMQKDAIFGGKRRVRISSQSLKRAMRK
SGYYAQNIGESSLRTIHLAQLRDVLRQKLGERFDQKIIDKTLALLSGKSV
DEAEKISADAVTPWVVGEIAWFCEQVAKAEADNLDDKKLLKVLKEDIAAI
RVNLQQGVDIALSGRMATSGMMTELGKVDGAMSIAHAITTHQVDSDIDWF
TAVDDLQEQGSAHLGTQEFSSGVFYRYANINLAQLQENLGGASREQALEI
ATHVVHMLATEVPGAKQRTYAAFNPADMVMVNFSDMPLSMANAFEKAVKA
KDGFLQPSIQAFNQYWDRVANGYGLNGAAAQFSLSDVDPITAQVKQMPTL
EQLKSWVRNNGEA

For Cas5/CasD (224 AA):
>gi|90111483|ref|NP_417237.2| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 4]
MRSYLILRLAGPMQAWGQPTFEGTRPTGRFPTRSGLLGLLGACLGIQRDD
TSSLQALSESVQFAVRCDELILDDRRVSVTGLRDYHTVLGAREDYRGLKS
HETIQTWREYLCDASFTVALWLTPHATMVISELEKAVLKPRYTPYLGRRS
CPLTHPLFLGTCQASDPQKALLNYEPVGGDIYSEESVTGHHLKFTARDEP
MITLPRQFASREWYVIKGGMDVSQ

For Cas6e/CasE (199 AA):
>gi|16130663|ref|NP_417236.1| CRISPR RNA precursor cleavage enzyme; CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 5]
MYLSKVIIARAWSRDLYQLHQGLWHLFPNRPDAARDFLFHVEKRNTPEGC
HVLLQSAQMPVSTAVATVIKTKQVEFQLQVGVPLYFRLRANPIKTILDNQ
KRLDSKGNIKRCRVPLIKEAEQIAWLQRKLGNAARVEDVHPISERPQYFS
GDGKSGKIQTVCFEGVLTINDAPALIDLVQQGIGPAKSMGCGLLSLAPL
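The records in TABLE-US-00001 follow the FASTA convention (a ">" header line followed by wrapped sequence lines), and each is labelled with a residue count. A minimal sketch, assuming only that convention, for reading such records and checking a sequence against its stated length; the function names are hypothetical.

```python
"""Sketch: parse FASTA-style records and check stated sequence lengths."""


def parse_fasta(text: str) -> dict[str, str]:
    """Map each '>' header (without the '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Drop internal whitespace in case a residue run was split.
            records[header] += "".join(line.split())
    return records


def length_matches(seq: str, stated_aa: int) -> bool:
    """True if the parsed sequence has the stated number of residues."""
    return len(seq) == stated_aa
```

Applied to the full Cse1 record above, the ten 50-residue lines plus the final "NG" give the stated 502 residues.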
[0035] In defining the range of sequence variants which fall within
the scope of the invention, for the avoidance of doubt, the
following are each optional limits on the extent of variation, to
be applied for each of SEQ ID NO:1, 2, 3, 4 or 5 starting from the
respective broadest range of variants as specified in terms of the
respective percentage identity above. The range of variants may
therefore include: at least 16%, or at least 17%, or
at least 18%, or at least 19%, or at least 20%, or at least 21%, or
at least 22%, or at least 23%, or at least 24%, or at least 25%, or
at least 26%, or at least 27%, or at least 28%, or at least 29%, or
at least 30%, or at least 31%, or at least 32%, or at least 33%, or
at least 34%, or at least 35%, or at least 36%, or at least 37%, or
at least 38%, or at least 39%, or at least 40%, or at least 41%, or
at least 42%, or at least 43%, or at least 44%, or at least 45%, or at
least 46%, or at least 47%, or at least 48%, or at least 49%, or at
least 50%, or at least 51%, or at least 52%, or at least 53%, or at
least 54%, or at least 55%, or at least 56%, or at least 57%, or at
least 58%, or at least 59%, or at least 60%, or at least 61%, or at
least 62%, or at least 63%, or at least 64%, or at least 65%, or at
least 66%, or at least 67%, or at least 68%, or at least 69%, or at
least 70%, or at least 71%, or at least 72%, or at least 73%, or at
least 74%, or at least 75%, or at least 76%, or at least 77%, or at
least 78%, or at least 79%, or at least 80%, or at least 81%, or at
least 82%, or at least 83%, or at least 84%, or at least 85%, or at
least 86%, or at least 87%, or at least 88%, or at least 89%, or at
least 90%, or at least 91%, or at least 92%, or at least 93%, or at
least 94%, or at least 95%, or at least 96%, or at least 97%, or at
least 98%, or at least 99%, or 100% amino acid sequence
identity.
[0036] Throughout, the Makarova et al. (2011) nomenclature is being
used in the definition of the Cas protein subunits. Table 2 on page
5 of the Makarova et al. article lists the Cas genes and the names
of the families and superfamilies to which they belong. Throughout,
reference to a Cas protein or Cse protein subunit includes cross
reference to the family or superfamily of which these subunits form
part.
[0037] Throughout, the reference sequences of the Cas and Cse
subunits of the invention may be defined as a nucleotide sequence
encoding the amino acid sequence. For example, the amino acid
sequence of SEQ ID NO:3 for Cas7 also includes all nucleic acid
sequences which encode that amino acid sequence. The variants of
Cas7 included within the scope of the invention therefore include
nucleotide sequences of at least the defined amino acid percentage
identities or similarities with the reference nucleic acid
sequence; as well as all possible percentage identities or
similarities between that lower limit and 100%.
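Because the genetic code is degenerate, many different nucleotide sequences encode one amino acid sequence, which is why a reference defined at the amino acid level covers all of its coding sequences. The sketch below translates DNA with the standard genetic code; the two DNA fragments in the usage note are illustrative only and are not taken from the native cas7 gene.

```python
"""Sketch: different DNA sequences can encode the same amino acid sequence."""

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG;
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence whose length is a multiple of 3."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))
```

Both "ATGTCTAAT" and "ATGAGCAAC" translate to "MSN", the first three residues of SEQ ID NO: 3, although the two DNA strings differ at four of nine positions.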
[0038] The Cascade complexes of the invention may be made up of
subunits derived or modified from more than one different bacterial
or archaeal prokaryote. Also, the subunits from different Cas
subtypes may be mixed.
[0039] In a preferred aspect, the Cas6 subunit is a Cas6e subunit
of SEQ ID NO: 17 below, or a sequence of at least 16% identity
therewith.
TABLE-US-00002

The sequence of a preferred Cas6e subunit is:
>gi|16130663|ref|NP_417236.1| CRISPR RNA precursor cleavage enzyme; CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 17]
MYLSKVIIARAWSRDLYQLHQGLWHLFPNRPDAARDFLFHVEKRNTPEGC
HVLLQSAQMPVSTAVATVIKTKQVEFQLQVGVPLYFRLRANPIKTILDNQ
KRLDSKGNIKRCRVPLIKEAEQIAWLQRKLGNAARVEDVHPISERPQYFS
GDGKSGKIQTVCFEGVLTINDAPALIDLVQQGIGPAKSMGCGLLSLAPL
[0040] The Cascade complexes, or portions thereof, of the
invention--which comprise at least one subunit which includes an
additional amino acid sequence having nucleic acid or chromatin
modifying, visualising, transcription activating or transcription
repressing activity--may further comprise a Cse2 (or YgcK-like)
subunit having an amino acid sequence of SEQ ID NO:2 or a sequence
of at least 20% identity therewith, or a portion thereof.
Alternatively, the Cse2 subunit is defined as having at least 38%
similarity with SEQ ID NO:2. Optionally, within the protein complex
of the invention it is the Cse2 subunit which includes the
additional amino acid sequence having nucleic acid or chromatin
modifying activity.
[0041] Additionally or alternatively, the Cascade complexes of the
invention may further comprise a Cse1 (or YgcL-like) subunit having
an amino acid sequence of SEQ ID NO: 1 or a sequence of at least 9%
identity therewith, or a portion thereof. Optionally within the
protein complex of the invention it is the Cse1 subunit which
includes the additional amino acid sequence having nucleic acid or
chromatin modifying, visualising, transcription activating or
transcription repressing activity.
[0042] In preferred embodiments, a Cascade complex of the invention
is a Type I CRISPR-Cas system protein complex; more preferably a
subtype I-E CRISPR-Cas protein complex or it can be based on a Type
I-A or Type I-B complex. A Type I-C, D or F complex is possible. In
particularly preferred embodiments based on the E. coli system, the
subunits may have the following stoichiometries:
Cse1₁Cse2₂Cas7₆Cas5₁Cas6₁ or
Cse1₁Cse2₂Cas7₆Cas5₁Cas6e₁.
[0043] The additional amino acid sequence having nucleic acid or
chromatin modifying, visualising, transcription activating or
transcription repressing activity may be translationally fused
through expression in natural or artificial protein expression
systems, or covalently linked by a chemical synthesis step to the
at least one subunit; preferably the at least one functional moiety
is fused or linked to at least the region of the N terminus and/or
the region of the C terminus of at least one of a Cse1, Cse2, Cas7,
Cas5, Cas6 or Cas6e subunit. In particularly preferred embodiments,
the additional amino acid sequence having nucleic acid or chromatin
modifying activity is fused or linked to the N terminus or the C
terminus of a Cse1, a Cse2 or a Cas5 subunit; more preferably the
linkage is in the region of the N terminus of a Cse1 subunit, the N
terminus of a Cse2 subunit, or the N terminus of a Cas7
subunit.
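At the sequence level, an N- or C-terminal translational fusion amounts to joining the functional moiety to one end of the subunit, usually through a short linker. The sketch below assumes a hypothetical flexible "GGGGS" linker, which the text does not specify; in practice the fusion is made at the DNA level in an expression construct.

```python
"""Sketch: N- or C-terminal fusion of a functional moiety to a Cascade subunit."""

# Hypothetical flexible linker; the specification does not name one.
DEFAULT_LINKER = "GGGGS"


def fuse(subunit: str, moiety: str, terminus: str = "N",
         linker: str = DEFAULT_LINKER) -> str:
    """Return the fused amino acid sequence in one-letter code.

    terminus="N": moiety-linker-subunit (moiety at the subunit N terminus).
    terminus="C": subunit-linker-moiety (moiety at the subunit C terminus).
    """
    if terminus == "N":
        return moiety + linker + subunit
    if terminus == "C":
        return subunit + linker + moiety
    raise ValueError("terminus must be 'N' or 'C'")
```

For example, fusing a placeholder moiety "MAAA" to the N terminus of the Cse1 fragment "MNLLIDNWIP" (the start of SEQ ID NO: 1) gives "MAAAGGGGSMNLLIDNWIP".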
[0044] The additional amino acid sequence having nucleic acid or
chromatin modifying, activating, repressing or visualising activity
may be a protein; optionally selected from a helicase, a nuclease,
a nuclease-helicase, a DNA methyltransferase (e.g. Dam), a DNA
demethylase, a histone methyltransferase, a histone demethylase, an
acetylase, a deacetylase, a phosphatase, a kinase, a transcription
(co-)activator, an RNA polymerase subunit, a transcription
repressor, a DNA binding protein, a DNA structuring protein, a
marker protein, a reporter protein, a fluorescent protein (e.g.
mCherry), a ligand binding protein (e.g. a heavy metal binding protein), a
signal peptide (e.g. Tat-signal sequence), a subcellular
localisation sequence (e.g. nuclear localisation sequence) or an
antibody epitope.
[0045] The protein concerned may be a heterologous protein from a
species other than the bacterial species from which the Cascade
protein subunits have their sequence origin.
[0046] When the protein is a nuclease, it may be one selected from
a type II restriction endonuclease such as FokI, or a mutant or an
active portion thereof. Other type II restriction endonucleases
which may be used include EcoRI, EcoRV, BglI, BamHI, BsgI and
BspMI. Preferably, one protein complex of the invention may be
fused to the N terminal domain of FokI and another protein complex
of the invention may be fused to the C terminal domain of FokI.
These two protein complexes may then be used together to achieve an
advantageous locus specific double stranded cut in a nucleic acid,
whereby the location of the cut in the genetic material is at the
design and choice of the user, as guided by the RNA component
(defined and described below) and due to the presence of a so-called
"protospacer adjacent motif" (PAM) sequence in the target nucleic
acid strand (also described in more detail below).
[0047] In a preferred embodiment, a protein complex of the
invention has an additional amino acid sequence which is a modified
restriction endonuclease, e.g. FokI. The modification is preferably
in the catalytic domain. In preferred embodiments, the modified
FokI is KKR Sharkey or ELD Sharkey which is fused to the Cse1
protein of the protein complex. In a preferred application of these
complexes of the invention, two of these complexes (KKR Sharkey and
ELD Sharkey) may be together in combination. A heterodimer pair of
protein complexes employing differently modified FokI has
particular advantage in targeted double stranded cutting of nucleic
acid. If homodimers are used then it is possible that there is more
cleavage at non-target sites due to non-specific activity. A
heterodimer approach advantageously increases the fidelity of the
cleavage in a sample of material.
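The paired-complex strategy can be pictured as a search for two target sites, one read on each DNA strand, close enough for the two FokI halves to meet, together with a rule that only a KKR/ELD pair is active. The sketch below is a simplified model under stated assumptions: exact site matching, user-supplied gap limits (hypothetical, as the text does not give a spacing here), and no PAM check, since the PAM requirements are described elsewhere in the specification.

```python
"""Sketch: candidate paired sites for a FokI-KKR / FokI-ELD Cascade pair."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str) -> list[int]:
    """Start positions of every (possibly overlapping) exact match."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def is_heterodimer(variant_a: str, variant_b: str) -> bool:
    """In this model only one KKR half plus one ELD half is active."""
    return {variant_a, variant_b} == {"KKR", "ELD"}


def paired_sites(dna: str, site_a: str, site_b: str,
                 min_gap: int, max_gap: int) -> list[tuple[int, int]]:
    """Pairs (i, j): site_a on the top strand starting at i, and site_b on
    the bottom strand, found as its reverse complement starting at j on the
    top strand, with min_gap <= (bases between them) <= max_gap.
    The gap limits are hypothetical parameters."""
    pairs = []
    rc_b = revcomp(site_b)
    for i in find_all(dna, site_a):
        end_a = i + len(site_a)
        for j in find_all(dna, rc_b):
            if min_gap <= j - end_a <= max_gap:
                pairs.append((i, j))
    return pairs
```

Restricting activity to the heterodimer means that two identical complexes bound near each other on an unintended locus are not counted as a cleavage-competent pair, which mirrors the fidelity argument made above for the KKR/ELD combination.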
[0048] The Cascade complex with additional amino acid sequence
having nucleic acid or chromatin modifying, visualising,
transcription activating or transcription repressing activity
defined and described above is a component part of an overall
system of the invention which advantageously permits the user to
select in a predetermined manner a precise genetic locus which is
desired to be cleaved, tagged or otherwise altered in some way, e.g.
methylation, using any of the nucleic acid or chromatin modifying,
visualising, transcription activating or transcription repressing
entities defined herein. The other component part of the system is
an RNA molecule which acts as a guide for directing the Cascade
complex of the invention to the correct locus on DNA or RNA
intended to be modified, cut or tagged.
[0049] The Cascade complex of the invention preferably also
comprises an RNA molecule which comprises a ribonucleotide sequence
of at least 50% identity to a desired target nucleic acid sequence,
and wherein the protein complex and the RNA molecule form a
ribonucleoprotein complex. Preferably the ribonucleoprotein complex
forms when the RNA molecule is hybridized to its intended target
nucleic acid sequence. The ribonucleoprotein complex forms when the
necessary components of Cascade-functional moiety combination and
RNA molecule and nucleic acid (DNA or RNA) are present together in
suitable physiological conditions, whether in vivo or in vitro.
Without wishing to be bound by any particular theory, the inventors
believe that in the context of dsDNA, particularly negatively
supercoiled DNA, the Cascade complex associating with the dsDNA
causes a partial unwinding of the duplex strands which then allows
the RNA to associate with one strand; the whole ribonucleoprotein
complex then migrates along the DNA strand until a target sequence
substantially complementary to at least a portion of the RNA
sequence is reached, at which point a stable interaction between
RNA and DNA strand occurs, and the function of the functional
moiety takes effect, whether by modifying, nuclease cutting or
tagging of the DNA at that locus.
[0050] In preferred embodiments, a portion of the RNA molecule has
at least 50% identity to the target nucleic acid sequence; more
preferably at least 95% identity to the target sequence. In more
preferred embodiments, the portion of the RNA molecule is
substantially complementary along its length to the target DNA
sequence; i.e. there are only one, two, three, four or five
mismatches which may be contiguous or non-contiguous. The RNA
molecule (or portion thereof) may have at least 51%, or at least
52%, or at least 53%, or at least 54%, or at least 55%, or at least
56%, or at least 57%, or at least 58%, or at least 59%, or at least
60%, or at least 61%, or at least 62%, or at least 63%, or at least
64%, or at least 65%, or at least 66%, or at least 67%, or at least
68%, or at least 69%, or at least 70%, or at least 71%, or at least
72%, or at least 73%, or at least 74%, or at least 75%, or at least
76%, or at least 77%, or at least 78%, or at least 79%, or at least
80%, or at least 81%, or at least 82%, or at least 83%, or at least
84%, or at least 85%, or at least 86%, or at least 87%, or at least
88%, or at least 89%, or at least 90%, or at least 91%, or at least
92%, or at least 93%, or at least 94%, or at least 95%, or at least
96%, or at least 97%, or at least 98%, or at least 99%, or 100%
identity to the target sequence.
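The identity thresholds and the limit of a few mismatches can be related with a simple scan. The sketch below slides an RNA guide along one DNA strand and reports every window within a chosen mismatch limit, together with the corresponding percent identity. It is an exhaustive string comparison under the assumption that the guide is written in the same sense as the strand (U read as T); it does not model the unwinding or protein-guided search proposed in [0049].

```python
"""Sketch: scan one DNA strand for windows matching an RNA guide."""


def rna_to_dna(rna: str) -> str:
    """Read an RNA sequence as DNA (U -> T)."""
    return rna.upper().replace("U", "T")


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def scan(strand: str, guide_rna: str,
         max_mismatches: int = 5) -> list[tuple[int, int, float]]:
    """Return (position, mismatches, percent identity) for each window of
    `strand` whose mismatch count against the guide is <= max_mismatches."""
    guide = rna_to_dna(guide_rna)
    k = len(guide)
    hits = []
    for pos in range(len(strand) - k + 1):
        mm = mismatches(strand[pos:pos + k], guide)
        if mm <= max_mismatches:
            hits.append((pos, mm, 100.0 * (k - mm) / k))
    return hits
```

With a 32-nt guide, five mismatches leave 27 of 32 positions identical, i.e. 84.375% identity: above the 50% and 70% lower limits mentioned in the text but below the preferred 95% level.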
[0051] The target nucleic acid may be DNA (ss or ds) or RNA.
[0052] In other preferred embodiments, the RNA molecule or portion
thereof has at least 70% identity with the target nucleic acid. At
such levels of identity, the target nucleic acid is preferably
dsDNA.
[0053] The RNA molecule will preferably require a high specificity
and affinity for the target nucleic acid sequence. A dissociation
constant (Kd) in the range 1 pM to 1 µM, preferably 1-100
nM, is desirable, as determined preferably by native gel
electrophoresis, or alternatively by isothermal titration calorimetry,
surface plasmon resonance, or fluorescence based titration methods.
Affinity may be determined using an electrophoretic mobility shift
assay (EMSA), also called gel retardation assay (see Semenova E et
al. (2011) Proc. Natl. Acad. Sci. USA 108: 10098-10103).
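A Kd can be estimated from gel-shift data by fitting the fraction of target shifted at each complex concentration. The sketch below assumes a simple 1:1 binding model with the Cascade complex in large excess over labelled target, so that fraction bound = [C] / (Kd + [C]), and fits Kd by least squares over a log-spaced grid. The model, the grid and the data in the usage note are illustrative assumptions, not values from the text.

```python
"""Sketch: estimate Kd from EMSA fraction-bound data (1:1 binding model)."""


def fraction_bound(conc_nM: float, kd_nM: float) -> float:
    """Hyperbolic binding curve; assumes complex in large excess over target."""
    return conc_nM / (kd_nM + conc_nM)


def fit_kd(concs_nM: list[float], fractions: list[float]) -> float:
    """Least-squares Kd over a log-spaced grid from 1e-3 nM (1 pM) to 1e4 nM."""
    best_kd, best_sse = 0.0, float("inf")
    for step in range(-300, 401):  # grid values 10**(step / 100) nM
        kd = 10 ** (step / 100)
        sse = sum((f - fraction_bound(c, kd)) ** 2
                  for c, f in zip(concs_nM, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


def in_preferred_range(kd_nM: float) -> bool:
    """True if Kd falls in the preferred 1-100 nM window."""
    return 1.0 <= kd_nM <= 100.0
```

For example, noise-free fractions generated with Kd = 10 nM at 1, 3, 10, 30 and 100 nM complex are fitted back to 10 nM, which lies in the preferred 1-100 nM window; at [C] = Kd exactly half of the target is shifted.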
[0054] The RNA molecule is preferably modelled on what are known
from nature in prokaryotes as CRISPR RNA (crRNA) molecules. The
structure of crRNA molecules is already established and explained
in more detail in Jore et al. (2011) Nature Structural &
Molecular Biology 18: 529-537. In brief, a mature crRNA of type I-E
is often 61 nucleotides long and consists of a 5' "handle" region
of 8 nucleotides, the "spacer" sequence of 32 nucleotides, and a 3'
sequence of 21 nucleotides which form a hairpin with a
tetranucleotide loop. However, the RNA used in the invention does
not have to be designed strictly to the design of naturally
occurring crRNA, whether in length, regions or specific RNA
sequences. What is clear, though, is that RNA molecules for use in
the invention may be designed based on gene sequence information in
the public databases or newly discovered, and then made
artificially, e.g. by chemical synthesis in whole or in part. The
RNA molecules of the invention may also be designed and produced by
way of expression in genetically modified cells or cell free
expression systems and this option may include synthesis of some or
all of the RNA sequence.
[0055] The structure and requirements of crRNA have also been
described in Semenova E et al. (2011) Proc. Natl. Acad. Sci. USA
108: 10098-10103. There is a so-called "SEED" portion forming the
5' end of the spacer sequence and which is flanked 5' thereto by
the 5' handle of 8 nucleotides. Semenova et al. (2011) have found
that all residues of the SEED sequence should be complementary to
the target sequence, although for the residue at position 6, a
mismatch may be tolerated. Similarly, when designing and making an
RNA component of a ribonucleoprotein complex of the invention
directed at a target locus (i.e. sequence), the necessary match and
mismatch rules for the SEED sequence can be applied.
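The SEED rule summarised above can be applied mechanically when checking a candidate guide against a target. The sketch below assumes an 8-nt SEED at the 5' end of the spacer (consistent with the 8-residue stretch referred to in [0056]) and treats position 6 as the only tolerated mismatch. Sequences are compared in the same sense, spacer RNA against the protospacer strand with U read as T; the function name is hypothetical.

```python
"""Sketch: SEED-region check following the rule summarised in [0055]."""


def seed_ok(spacer: str, protospacer: str, seed_len: int = 8,
            tolerated: frozenset[int] = frozenset({6})) -> bool:
    """True if every mismatch within the first `seed_len` spacer positions
    (1-based) falls at a tolerated position. U in either input is read as T."""
    s = spacer.upper().replace("U", "T")
    p = protospacer.upper().replace("U", "T")
    for pos in range(1, seed_len + 1):
        if s[pos - 1] != p[pos - 1] and pos not in tolerated:
            return False
    return True
```

Under these assumptions a mismatch at SEED position 3 is rejected, a mismatch at position 6 is accepted, and mismatches outside the first eight positions are not examined by this check.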
[0056] The invention therefore includes a method of detecting
and/or locating a single base change in a target nucleic acid
molecule comprising contacting a nucleic acid sample with a
ribonucleoprotein complex of the invention as hereinbefore
described, or with a Cascade complex and separate RNA component of
the invention as hereinbefore described, and wherein the sequence
of the RNA component (including when in the ribonucleoprotein
complex) is such that it discriminates between a normal allele and
a mutant allele by virtue of a single base change at position 6 of
a contiguous sequence of 8 nucleotide residues.
[0057] In embodiments of the invention, the RNA molecule may have a
length in the range of 35-75 residues. In preferred embodiments,
the portion of the RNA which is complementary to and used for
targeting a desired nucleic acid sequence is 32 or 33 residues
long. (In the context of a naturally occurring crRNA, this would
correspond to the spacer portion, as shown in FIG. 1 of Semenova et
al. (2011).)
[0058] A ribonucleoprotein complex of the invention may
additionally have an RNA component comprising 8 residues 5' to the
RNA sequence which has at least substantial complementarity to the
nucleic acid target sequence. (The RNA sequence having at least
substantial complementarity to the nucleic acid target sequence
would be understood, in the context of a crRNA, to correspond to
the spacer sequence. The 5' flanking sequence of the RNA
would be considered to correspond to the 5' handle of a crRNA. This
is shown in FIG. 1 of Semenova et al. (2011)).
[0059] A ribonucleoprotein complex of the invention may have a
hairpin and tetranucleotide loop forming sequence 3' to the RNA
sequence which has at least substantial complementarity to the DNA
target sequence. (In the context of crRNA, this would correspond to
a 3' handle flanking the spacer sequence as shown in FIG. 1 of
Semenova et al. (2011)).
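The RNA layout described in paragraphs [0057] to [0059] can be captured in a small data structure. This is a minimal sketch with hypothetical names (GuideRNA, problems), which treats the 32 or 33 nt targeting length as a preference rather than a requirement. The handle sequences in the usage lines are placeholders, not the handles of any particular CRISPR system, and should be replaced by those of the Cascade system actually used.

```python
from dataclasses import dataclass

RNA_BASES = set("ACGU")

@dataclass(frozen=True)
class GuideRNA:
    handle5: str  # 8 nt, corresponds to the crRNA 5' handle
    spacer: str   # targeting segment, preferably 32 or 33 nt
    handle3: str  # hairpin/tetranucleotide-loop forming 3' handle

    @property
    def sequence(self) -> str:
        """Full guide RNA, 5' -> 3'."""
        return self.handle5 + self.spacer + self.handle3

    def problems(self) -> list[str]:
        """List departures from the lengths and composition described in the text."""
        issues = []
        for name, part in (("handle5", self.handle5), ("spacer", self.spacer),
                           ("handle3", self.handle3)):
            if not set(part) <= RNA_BASES:
                issues.append(f"{name} contains non-RNA characters")
        if len(self.handle5) != 8:
            issues.append("5' handle should be 8 nt")
        if len(self.spacer) not in (32, 33):
            issues.append("spacer is outside the preferred 32-33 nt")
        if not 35 <= len(self.sequence) <= 75:
            issues.append("total length outside 35-75 nt")
        return issues

# Placeholder handles (hypothetical): an 8-nt 5' handle and a short GC stem with a GAAA loop
guide = GuideRNA(handle5="ACGUACGU", spacer="AUGC" * 8, handle3="GCGCGAAAGCGC")
print(len(guide.sequence), guide.problems())  # 52 []
```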
[0060] In some embodiments, the RNA may be a CRISPR RNA
(crRNA).
[0061] The Cascade proteins and complexes of the invention may be
characterised in vitro in terms of their ability to associate with
the guiding RNA component to form a ribonucleoprotein complex in
the presence of the target nucleic acid (which may be DNA or RNA).
An electrophoretic mobility shift assay (EMSA) may be used as a
functional assay for interaction of complexes of the invention with
their nucleic acid targets. In essence, the Cascade-functional
moiety complex of the invention is mixed with nucleic acid targets,
and the stable interaction of the complex is monitored by EMSA or by
a specific readout of the functional moiety, for example
endonucleolytic cleavage of target DNA at the desired site. Cleavage
can be confirmed by restriction fragment length analysis using
commercially available enzymes with known specificities and cleavage
sites in the target DNA molecule.
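EMSA titrations of the kind described for FIG. 1E and 1F are reduced to a dissociation constant by fitting the fraction of target bound against the concentration of free Cascade. The following is a minimal sketch, assuming a simple one-site binding model, fraction bound = [C]/(Kd + [C]), and fitting Kd by least squares with a golden-section search over log10(Kd). The function names and the synthetic data are illustrative and are not measurements from this application.

```python
import math

def fraction_bound(free_conc: float, kd: float) -> float:
    """One-site binding isotherm: fraction of target bound at a free-protein concentration."""
    return free_conc / (kd + free_conc)

def fit_kd(free_conc, frac_bound, lo=1e-3, hi=1e6, tol=1e-9):
    """Least-squares Kd (same units as free_conc) by golden-section search on log10(Kd).

    Assumes the sum of squared residuals has a single minimum inside [lo, hi].
    """
    def sse(log_kd: float) -> float:
        kd = 10.0 ** log_kd
        return sum((fraction_bound(c, kd) - f) ** 2 for c, f in zip(free_conc, frac_bound))

    a, b = math.log10(lo), math.log10(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/golden ratio, about 0.618
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)

# Synthetic, noise-free 2-fold series (hypothetical concentrations, e.g. nM)
free = [0.5 * 2 ** k for k in range(10)]   # 0.5 ... 256
frac = [fraction_bound(c, 12.0) for c in free]
print(fit_kd(free, frac))  # approximately 12.0
```

Fitting on a log scale keeps the search well behaved across the several orders of magnitude spanned by a 2-fold titration; with real, noisy gel data a dedicated nonlinear fitting routine with error estimates would normally be preferred.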
[0062] Visualisation of binding of Cascade proteins or complexes of
the invention to DNA or RNA in the presence of guiding RNA may be
achieved using scanning/atomic force microscopy (SFM/AFM) imaging
and this may provide an assay for the presence of functional
complexes of the invention.
[0063] The invention also provides a nucleic acid molecule encoding
at least one clustered regularly interspaced short palindromic
repeat (CRISPR)-associated protein subunit selected from:
[0064] a. a Cse1 subunit having an amino acid sequence of SEQ ID
NO: 1 or a sequence of at least 9% identity therewith;
[0065] b. a Cse2 subunit having an amino acid sequence of SEQ ID
NO: 2 or a sequence of at least 20% identity therewith;
[0066] c. a Cas7 subunit having an amino acid sequence of SEQ ID
NO: 3 or a sequence of at least 18% identity therewith;
[0067] d. a Cas5 subunit having an amino acid sequence of SEQ ID
NO: 4 or a sequence of at least 17% identity therewith;
[0068] e. a Cas6 subunit having an amino acid sequence of SEQ ID
NO: 5 or a sequence of at least 16% identity therewith;
and wherein at least one of a, b, c, d or e includes an additional
amino acid sequence having nucleic acid or chromatin modifying,
visualising, transcription activating or transcription repressing
activity.
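Items a. to e. define variant subunits by a minimum percent identity to the reference sequences. The following is a minimal sketch of applying those floors, assuming the candidate and reference proteins have already been aligned (equal length, gaps written as '-'). It also assumes that percent identity is computed as identical residues divided by aligned positions where neither sequence has a gap. Other definitions, such as dividing by the reference length, give different numbers, so that choice is an assumption. The function names and example sequences are hypothetical; the thresholds are those listed above.

```python
# Minimum percent identity to the reference subunit, from items a.-e. above
MIN_IDENTITY = {
    "Cse1": 9.0,   # SEQ ID NO: 1
    "Cse2": 20.0,  # SEQ ID NO: 2
    "Cas7": 18.0,  # SEQ ID NO: 3
    "Cas5": 17.0,  # SEQ ID NO: 4
    "Cas6": 16.0,  # SEQ ID NO: 5
}

def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Identical residues / ungapped aligned positions, as a percentage."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(r, q) for r, q in zip(aligned_ref, aligned_query) if r != "-" and q != "-"]
    if not pairs:
        return 0.0
    same = sum(r == q for r, q in pairs)
    return 100.0 * same / len(pairs)

def meets_floor(subunit: str, aligned_ref: str, aligned_query: str) -> bool:
    """True if the aligned query reaches the identity floor for the named subunit."""
    return percent_identity(aligned_ref, aligned_query) >= MIN_IDENTITY[subunit]

ref, query = "MKTAYIAK-QR", "MKSAYLAKEQR"  # hypothetical aligned fragments
print(percent_identity(ref, query))        # 80.0
print(meets_floor("Cse2", ref, query))     # True
```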
[0069] The additional amino acid sequence having nucleic acid or
chromatin modifying, visualising, transcription activating or
transcription repressing activity is preferably fused to the
CRISPR-associated protein subunit.
[0070] In the nucleic acids of the invention defined above, the
nucleotide sequence may be that which encodes the respective SEQ ID
NO: 1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4 or SEQ ID NO:5, or in
defining the range of variant sequences thereto, it may be a
sequence hybridisable to that nucleotide sequence, preferably under
stringent conditions, more preferably very high stringency
conditions. A variety of stringent hybridisation conditions will be
familiar to the skilled reader in the field. Hybridization of a
nucleic acid molecule occurs when two complementary nucleic acid
molecules undergo an amount of hydrogen bonding to each other known
as Watson-Crick base pairing. The stringency of hybridization can
vary according to the environmental (i.e.
chemical/physical/biological) conditions surrounding the nucleic
acids, temperature, the nature of the hybridization method, and the
composition and length of the nucleic acid molecules used.
Calculations regarding hybridization conditions required for
attaining particular degrees of stringency are discussed in
Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2001);
and Tijssen, Laboratory Techniques in Biochemistry and Molecular
Biology--Hybridization with Nucleic Acid Probes Part I, Chapter 2
(Elsevier, New York, 1993). The Tm is the temperature at which
50% of a given strand of a nucleic acid molecule is hybridized to
its complementary strand. The following is an exemplary set of
hybridization conditions and is not limiting:
Very High Stringency (allows sequences that share at least 90% identity to hybridize)
  Hybridization: 5x SSC at 65 °C for 16 hours
  Wash twice: 2x SSC at room temperature (RT) for 15 minutes each
  Wash twice: 0.5x SSC at 65 °C for 20 minutes each
High Stringency (allows sequences that share at least 80% identity to hybridize)
  Hybridization: 5x-6x SSC at 65 °C-70 °C for 16-20 hours
  Wash twice: 2x SSC at RT for 5-20 minutes each
  Wash twice: 1x SSC at 55 °C-70 °C for 30 minutes each
Low Stringency (allows sequences that share at least 50% identity to hybridize)
  Hybridization: 6x SSC at RT to 55 °C for 16-20 hours
  Wash at least twice: 2x-3x SSC at RT to 55 °C for 20-30 minutes each.
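Because stringency depends largely on salt concentration and on temperature relative to the Tm, it can help to estimate Tm for the SSC strengths listed above. The following is a rough sketch only. It uses an empirical approximation for long DNA duplexes commonly given in laboratory manuals, Tm ≈ 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 600/N (°C, N = duplex length in bp). It also assumes that 1x SSC is 0.15 M NaCl plus 0.015 M trisodium citrate, about 0.195 M Na+. Neither the formula nor its constants are stated in this application. The sketch ignores formamide and mismatches, and the function names and example probe are hypothetical.

```python
import math

def ssc_sodium_molar(ssc_x: float) -> float:
    """Na+ molarity of SSC at strength ssc_x.

    Assumes 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate (3 Na+ per citrate).
    """
    return ssc_x * (0.150 + 3 * 0.015)

def estimate_tm(seq: str, ssc_x: float, length_term: float = 600.0) -> float:
    """Rough Tm (degrees C) of a perfectly matched DNA duplex.

    Empirical approximation (an assumption, not from this document):
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - length_term/N
    """
    seq = seq.upper()
    n = len(seq)
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / n
    return (81.5 + 16.6 * math.log10(ssc_sodium_molar(ssc_x))
            + 0.41 * gc_percent - length_term / n)

probe = "ATGC" * 100  # hypothetical 400 bp duplex, 50% GC
for strength in (6, 5, 2, 1, 0.5):
    print(f"{strength}x SSC: Tm ~ {estimate_tm(probe, strength):.1f} C")
```

Lower SSC strength lowers the estimated Tm, which is why the final low-salt washes at high temperature are the stringent steps. At the 5x-6x SSC used for hybridization, the sodium concentration is near or above the range for which such formulas are usually quoted, so those values are especially rough.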
[0071] The nucleic acid molecule may be an isolated nucleic acid
molecule and may be an RNA or a DNA molecule.
[0072] The additional amino acid sequence may be selected from a
helicase, a nuclease, a nuclease-helicase (e.g. Cas3), a DNA
methyltransferase (e.g. Dam), a DNA demethylase, a histone
methyltransferase, a histone demethylase, an acetylase, a
deacetylase, a phosphatase, a kinase, a transcription
(co-)activator, an RNA polymerase subunit, a transcription
repressor, a DNA binding protein, a DNA structuring protein, a
marker protein, a reporter protein, a fluorescent protein (e.g.
mCherry), a ligand binding protein (e.g. a heavy metal binding
protein), a signal peptide (e.g. Tat-signal sequence), a subcellular
localisation sequence (e.g. nuclear localisation sequence), or an
antibody epitope. The additional amino acid sequence may or may not
be derived from a protein of the organism from which the relevant
Cascade protein subunit(s) are derived.
[0073] The invention includes an expression vector comprising a
nucleic acid molecule as hereinbefore defined. One expression
vector may contain the nucleotide sequence encoding a single
Cascade protein subunit and also the nucleotide sequence encoding
the additional amino acid sequence, whereby on expression the
subunit and additional sequence are fused. Other expression vectors
may comprise nucleotide sequences encoding just one or more Cascade
protein subunits which are not fused to any additional amino acid
sequence.
[0074] The additional amino acid sequence with nucleic acid or
chromatin modifying activity may be fused to any of the Cascade
subunits via a linker polypeptide. The linker may be of any length
up to about 60 or up to about 100 amino acid residues. Preferably
the linker has a number of amino acids in the range 10 to 60, more
preferably 10-20. The amino acids are preferably polar and/or small
and/or charged amino acids (e.g. Gln, Ser, Thr, Pro, Ala, Glu, Asp,
Lys, Arg, His, Asn, Cys, Tyr). The linker peptide is preferably
designed to obtain the correct spacing and positioning of the fused
functional moiety and the subunit of Cascade to which the moiety is
fused to allow proper interaction with the target nucleic acid.
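The linker preferences above can be expressed as a simple screening check. The following is a minimal sketch, assuming the linker is given as a one-letter amino acid string. The residue set mirrors the examples listed above (Q, S, T, P, A, E, D, K, R, H, N, C, Y), and the length bands follow the stated ranges. The function name, the dictionary keys and both example linkers are hypothetical. The check says nothing about whether a linker actually gives the correct spacing and positioning, which has to be established experimentally.

```python
# One-letter codes for the residues listed above as preferred linker components:
# Gln, Ser, Thr, Pro, Ala, Glu, Asp, Lys, Arg, His, Asn, Cys, Tyr
LISTED_RESIDUES = frozenset("QSTPAEDKRHNCY")

def describe_linker(linker: str) -> dict:
    """Summarise a candidate linker against the stated length ranges and residue list."""
    seq = linker.upper()
    n = len(seq)
    if 10 <= n <= 20:
        band = "more preferred (10-20)"
    elif 10 <= n <= 60:
        band = "preferred (10-60)"
    elif n <= 100:
        band = "within stated maximum (up to about 100)"
    else:
        band = "exceeds stated maximum"
    listed = sum(aa in LISTED_RESIDUES for aa in seq) / n if n else 0.0
    return {"length": n, "length_band": band, "fraction_listed": listed}

print(describe_linker("SEAAAKEAAAKEAAAKA"))  # hypothetical linker, all residues listed
print(describe_linker("GGGGSGGGGS"))         # Gly is not among the listed residues
```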
[0075] An expression vector of the invention (with or without
nucleotide sequence encoding amino acid residues which on
expression will be fused to a Cascade protein subunit) may further
comprise a sequence encoding an RNA molecule as hereinbefore
defined. Consequently, such expression vectors can be used in an
appropriate host to generate a ribonucleoprotein of the invention
which can target a desired nucleotide sequence.
[0076] Accordingly, the invention also provides a method of
modifying, visualising, or activating or repressing transcription
of a target nucleic acid comprising contacting the nucleic acid
with a ribonucleoprotein complex as hereinbefore defined. The
modifying may be by cleaving the nucleic acid or binding to it.
[0077] The invention also includes a method of modifying,
visualising, or activating or repressing transcription of a target
nucleic acid comprising contacting the nucleic acid with a Cascade
protein complex as hereinbefore defined, plus an RNA molecule as
hereinbefore defined.
[0078] In accordance with the above methods, the modification,
visualising, or activating or repressing transcription of a target
nucleic acid may therefore be carried out in vitro and in a cell
free environment; i.e. the method is carried out as a biochemical
reaction whether free in solution or whether involving a solid
phase. Target nucleic acid may be bound to a solid phase, for
example.
[0079] In a cell free environment, the order of adding each of the
target nucleic acid, the Cascade protein complex and the RNA
molecule is at the discretion of the skilled person. The three
components may be added simultaneously, sequentially in any desired
order, or separately at different times and in a desired order.
Thus it is possible for the target nucleic acid and RNA to be added
simultaneously to a reaction mix and then the Cascade protein
complex of the invention to be added separately and later in a
sequence of specific method steps.
[0080] The modification, visualising, or activating or repressing
transcription of a target nucleic acid may be made in situ in a
cell, whether an isolated cell or as part of a multicellular
tissue, organ or organism. Therefore in the context of whole tissue
and organs, and in the context of an organism, the method can be
carried out in vivo or it can be carried out by isolating a cell
from the whole tissue, organ or organism and then returning the
cell treated with ribonucleoprotein complex to its former location,
or a different location, whether within the same or a different
organism. Thus the method would include allografts, autografts,
isografts and xenografts.
[0081] In these embodiments, the ribonucleoprotein complex or the
Cascade protein complex of the invention requires an appropriate
form of delivery into the cell, which will be well known to persons
of skill in the art, including microinjection, whether into the
cell cytoplasm or into the nucleus.
[0082] Likewise, when present separately, the RNA molecule requires an
appropriate form of delivery into a cell, whether simultaneously,
separately or sequentially with the Cascade protein complex. Such
forms of introducing RNA into cells are well known to a person of
skill in the art and may include in vitro or ex vivo delivery via
conventional transfection methods. Physical methods, such as
microinjection and electroporation, as well as calcium
co-precipitation, and commercially available cationic polymers and
lipids, and cell-penetrating peptides, cell-penetrating particles
(gene-gun) may each be used. Viruses may also be used as
delivery vehicles, whether to the cytoplasm and/or nucleus, e.g.
via the (reversible) fusion of the Cascade protein complex of the
invention or a ribonucleoprotein complex of the invention to the
viral particle. Viral delivery (e.g. adenovirus delivery) or
Agrobacterium-mediated delivery may be used.
[0083] The invention also includes a method of modifying,
visualising, or activating or repressing transcription of a target
nucleic acid in a cell, comprising transfecting, transforming or
transducing the cell with any of the expression vectors as
hereinbefore described. The methods of transfection, transformation
or transduction are of the types well known to a person of skill in
the art. Where there is one expression vector used to generate
expression of a Cascade complex of the invention and when the RNA
is added directly to the cell then the same or a different method
of transfection, transformation or transduction may be used.
Similarly, when there is one expression vector being used to
generate expression of a Cascade-functional fusion complex of the
invention and when another expression vector is being used to
generate the RNA in situ via expression, then the same or a
different method of transfection, transformation or transduction
may be used.
[0084] In other embodiments, mRNA encoding the Cascade complex of
the invention is introduced into a cell so that the Cascade complex
is expressed in the cell. The RNA which guides the Cascade complex
to the desired target sequence is also introduced into the cell,
whether simultaneously, separately or sequentially from the mRNA,
such that the necessary ribonucleoprotein complex is formed in the
cell.
[0085] In the aforementioned methods of modifying or visualising a
target nucleic acid, the additional amino acid sequence may be a
marker and the marker associates with the target nucleic acid;
preferably wherein the marker is a protein; optionally a
fluorescent protein, e.g. green fluorescent protein (GFP) or yellow
fluorescent protein (YFP) or mCherry. Whether in vitro, ex vivo or
in vivo, methods of the invention can be used to directly
visualise a target locus in a nucleic acid molecule, preferably in
the form of a higher order structure such as a supercoiled plasmid
or chromosome, or a single stranded target nucleic acid such as
mRNA. Direct visualisation of a target locus may use electron
microscopy or fluorescence microscopy.
[0086] Other kinds of label may be used to mark the target nucleic
acid including organic dye molecules, radiolabels and spin labels
which may be small molecules.
[0087] In methods of the invention described above, the target
nucleic acid is preferably DNA, more preferably dsDNA, although the
target can be RNA, preferably mRNA.
[0088] In methods of the invention for modifying, visualising,
activating transcription or repressing transcription of a target
nucleic acid wherein the target nucleic acid is dsDNA, the
additional amino acid sequence with nucleic acid or chromatin
modifying activity may be a nuclease or a helicase-nuclease, and
the modification is preferably a single stranded or a double
stranded break at a desired locus. In this way unique sequence
specific cutting of DNA can be engineered by using the
Cascade-functional moiety complexes. The chosen sequence of the RNA
component of the final ribonucleoprotein complex provides the
desired sequence specificity for the action of the additional amino
acid sequence.
[0089] Therefore, the invention also provides a method of
non-homologous end joining of a dsDNA molecule in a cell at a
desired locus to remove at least a part of a nucleotide sequence
from the dsDNA molecule; optionally to knockout the function of a
gene or genes, wherein the method comprises making double stranded
breaks using any of the methods of modifying a target nucleic acid
as hereinbefore described.
[0090] The invention further provides a method of homologous
recombination of a nucleic acid into a dsDNA molecule in a cell at
a desired locus in order to modify an existing nucleotide sequence
or insert a desired nucleotide sequence, wherein the method
comprises making a double or single stranded break at the desired
locus using any of the methods of modifying a target nucleic acid
as hereinbefore described.
[0091] The invention therefore also provides a method of modifying,
activating or repressing gene expression in an organism comprising
modifying, activating transcription or repressing transcription of
a target nucleic acid sequence according to any of the methods
hereinbefore described, wherein the nucleic acid is dsDNA and the
functional moiety is selected from a DNA modifying enzyme (e.g. a
demethylase or deacetylase), a transcription activator or a
transcription repressor.
[0092] The invention additionally provides a method of modifying,
activating or repressing gene expression in an organism comprising
modifying, activating transcription or repressing transcription of
a target nucleic acid sequence according to any of the methods
hereinbefore described, wherein the nucleic acid is an mRNA and the
functional moiety is a ribonuclease; optionally selected from an
endonuclease, a 3' exonuclease or a 5' exonuclease.
[0093] In any of the methods of the invention as described above,
the cell which is subjected to the method may be a prokaryote.
Similarly, the cell may be a eukaryotic cell, e.g. a plant cell, an
insect cell, a yeast cell, a fungal cell, a mammalian cell or a
human cell. When the cell is a mammalian or human cell, it can be a
stem cell (but not a human embryonic stem cell). Such stem cells
for use in the invention are preferably isolated stem cells.
Optionally, in accordance with any method of the invention, a cell
is transfected in vitro.
[0094] Preferably though, in any of the methods of the invention,
the target nucleic acid has a specific tertiary structure,
optionally supercoiled, more preferably wherein the target nucleic
acid is negatively supercoiled. Advantageously, the
ribonucleoprotein complexes of the invention, whether produced in
vitro, or whether formed within cells, or whether formed within
cells via expression machinery of the cell, can be used to target a
locus which would otherwise be difficult to get access to in order
to apply the functional activity of a desired component, whether
labelling or tagging of a specific sequence, modification of
nucleic acid structure, switching on or off of gene expression, or
or modification of the target sequence itself involving single or
double stranded cutting followed by insertion of one or more
nucleotide residues or a cassette.
[0095] The invention also includes a pharmaceutical composition
comprising a Cascade protein complex or a ribonucleoprotein complex
of the invention as hereinbefore described.
[0096] The invention further includes a pharmaceutical composition
comprising an isolated nucleic acid or an expression vector of the
invention as hereinbefore described.
[0097] Also provided is a kit comprising a Cascade protein complex
of the invention as hereinbefore described plus an RNA molecule of
the invention as hereinbefore described.
[0098] The invention includes a Cascade protein complex or a
ribonucleoprotein complex or a nucleic acid or a vector, as
hereinbefore described for use as a medicament.
[0099] The invention allows a variety of possibilities to
physically alter DNA of prokaryotic or eukaryotic hosts at a
specified genomic locus, or change expression patterns of a gene at
a given locus. Host genomic DNA can be cleaved or modified by
methylation, visualized by fluorescence, transcriptionally
activated or repressed by functional domains such as nucleases,
methylases, fluorescent proteins, transcription activators or
repressors respectively, fused to suitable Cascade-subunits.
Moreover, the RNA-guided RNA-binding ability of Cascade permits the
monitoring of RNA trafficking in live cells using fluorescent
Cascade fusion proteins, and provides ways to sequester or destroy
host mRNAs causing interference with gene expression levels of a
host cell.
[0100] In any of the methods of the invention, the target nucleic
acid may be defined, preferably so if dsDNA, by the presence of at
least one of the following nucleotide triplets:
5'-CTT-3', 5'-CAT-3', 5'-CCT-3' or 5'-CTC-3' (or
5'-CUU-3', 5'-CAU-3', 5'-CCU-3' or 5'-CUC-3' if the target is an
RNA). The triplet is located in the target strand adjacent to the
sequence to which the RNA molecule component of a ribonucleoprotein
of the invention hybridizes. It marks the point in the target strand
beyond which, in the 5' to 3' (downstream) direction, no base
pairing with the RNA molecule component takes place; base pairing
takes place upstream of that point, over a stretch determined by the
preferred length of the RNA sequence of the RNA molecule component
of the ribonucleoprotein of the invention. In the context of a
native type I CRISPR system, the triplets correspond to what is
known as a "PAM" (protospacer adjacent motif). For ssDNA or ssRNA
targets, the presence of one of the triplets is less critical.
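Candidate target sites can be located by scanning the target strand for the triplets listed above and taking the sequence immediately 5' of each triplet as the region that base-pairs with the RNA component. The following is a minimal sketch, assuming the target strand is given 5' to 3' and contains only A, C, G, T (or U). It also assumes a 32-nt paired region, the preferred length stated earlier, and that the guide spacer is the reverse complement of the paired region written as RNA. Under these assumptions, the spacer's 5' end, where the SEED lies, pairs immediately next to the triplet. The function names and the example sequence are hypothetical.

```python
PAM_TRIPLETS = ("CTT", "CAT", "CCT", "CTC")             # read 5'->3' on the target strand
PARTNER_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # RNA base pairing with each DNA base

def find_sites(target_strand: str, spacer_len: int = 32):
    """Yield (index, triplet, paired_region, spacer) for each listed triplet.

    The paired region is the spacer_len bases immediately 5' of the triplet; the
    spacer is its reverse complement as RNA. RNA targets are read with U as T.
    """
    s = target_strand.upper().replace("U", "T")
    for i in range(spacer_len, len(s) - 2):
        triplet = s[i:i + 3]
        if triplet in PAM_TRIPLETS:
            region = s[i - spacer_len:i]
            spacer = "".join(PARTNER_RNA[b] for b in reversed(region))
            yield i, triplet, region, spacer

# Synthetic target strand: 2 nt, a 32-nt region, the triplet CTT, then 2 nt
target = "GG" + "ACGT" * 8 + "CTT" + "GG"
for site in find_sites(target):
    print(site)
```

For a dsDNA target of unknown orientation, the same scan would be run on the reverse complement as well, since either strand can serve as the target strand.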
[0101] The invention will now be described in detail and with
reference to specific examples and drawings in which:
[0102] FIG. 1 shows the results of gel-shift assays where Cascade
binds negatively supercoiled (nSC) plasmid DNA but not relaxed DNA.
A) Gel-shift of nSC plasmid DNA with J3-Cascade, containing a
targeting (J3) crRNA. pUC-λ was mixed with 2-fold increasing
amounts of J3-Cascade, from a pUC-λ:Cascade molar ratio of
1:0.5 up to a 1:256 molar ratio. The first and last lanes contain
only pUC-λ. B) Gel-shift as in (A) with R44-Cascade
containing a non-targeting (R44) crRNA. C) Gel-shift as in (A) with
Nt.BspQI nicked pUC-λ. D) Gel-shift as in (A) with PdmI
linearized pUC-λ. E) Fit of the fraction of pUC-λ bound to
J3-Cascade plotted against the concentration of free J3-Cascade
gives the dissociation constant (Kd) for specific binding. F) Fit
of the fraction of pUC-λ bound to R44-Cascade plotted against
the concentration of free R44-Cascade gives the dissociation
constant (Kd) for non-specific binding. G) Specific binding of
Cascade to the protospacer monitored by restriction analysis, using
the unique BsmI restriction site in the protospacer sequence. Lanes
1 and 5 contain only pUC-λ. Lanes 2 and 6 contain pUC-λ
mixed with Cascade. Lanes 3 and 7 contain pUC-λ mixed with
Cascade and subsequent BsmI addition. Lanes 4 and 8 contain
pUC-λ mixed with BsmI. H) Gel-shift of pUC-λ bound to
Cascade with subsequent Nt.BspQI cleavage of one strand of the
plasmid. Lanes 1 and 6 contain only pUC-λ. Lanes 2 and 7
contain pUC-λ mixed with Cascade. Lanes 3 and 8 contain
pUC-λ mixed with Cascade and subsequent Nt.BspQI nicking. Lanes
4 and 9 contain pUC-λ mixed with Cascade, followed by
addition of a ssDNA probe complementary to the displaced strand in
the R-loop and subsequent nicking with Nt.BspQI. Lanes 5 and 10
contain pUC-λ nicked with Nt.BspQI. I) Gel-shift of
pUC-λ bound to Cascade with subsequent EcoRI cleavage of both
strands of the plasmid. Lanes 1 and 6 contain only pUC-λ. Lanes
2 and 7 contain pUC-λ mixed with Cascade. Lanes 3 and 8
contain pUC-λ mixed with Cascade and subsequent EcoRI
cleavage. Lanes 4 and 9 contain pUC-λ mixed with Cascade,
followed by addition of a ssDNA probe complementary to the
displaced strand in the R-loop and subsequent cleavage with EcoRI.
Lanes 5 and 10 contain pUC-λ cleaved with EcoRI.
[0103] FIG. 2 shows scanning force micrographs demonstrating how
Cascade induces bending of target DNA upon protospacer binding.
A-P) Scanning force microscopy images of nSC plasmid DNA with
J3-Cascade containing a targeting (J3) crRNA. pUC-λ was mixed
with J3-Cascade at a pUC-λ:Cascade ratio of 1:7. Each image
shows a 500 × 500 nm surface area. White dots correspond to
Cascade.
[0104] FIG. 3 shows how BiFC analysis reveals that Cascade and Cas3
interact upon target recognition. A) Venus fluorescence of cells
expressing CascadeΔCse1 and CRISPR 7Tm, which targets 7
protospacers on the phage λ genome, and Cse1-N155Venus and
Cas3-C85Venus fusion proteins. B) Brightfield image of the cells in
(A). C) Overlay of (A) and (B). D) Venus fluorescence of phage
λ-infected cells expressing CascadeΔCse1 and CRISPR
7Tm, and Cse1-N155Venus and Cas3-C85Venus fusion proteins. E)
Brightfield image of the cells in (D). F) Overlay of (D) and (E).
G) Venus fluorescence of phage λ-infected cells expressing
CascadeΔCse1 and non-targeting CRISPR R44, and N155Venus and
C85Venus proteins. H) Brightfield image of the cells in (G). I)
Overlay of (G) and (H). J) Average of the fluorescence intensity of
4-7 individual cells of each strain, as determined using the
profile tool of LSM viewer (Carl Zeiss).
[0105] FIG. 4 shows Cas3 nuclease and helicase activities during
CRISPR-interference. A) Competent BL21-AI cells expressing Cascade,
a Cas3 mutant and CRISPR J3 were transformed with pUC-λ.
Colony forming units per microgram pUC-λ (cfu/µg DNA) are
depicted for each of the strains expressing a Cas3 mutant. Cells
expressing wt Cas3 and CRISPR J3 or CRISPR R44 serve as positive
and negative controls, respectively. B) BL21-AI cells carrying
Cascade, Cas3 mutant, and CRISPR encoding plasmids as well as
pUC-λ are grown under conditions that suppress expression of
the cas genes and CRISPR. At t=0 expression is induced. The
percentage of cells that lost pUC-λ over time is shown, as
determined by the ratio of ampicillin sensitive and ampicillin
resistant cells.
[0106] FIG. 5 shows how a Cascade-Cas3 fusion complex provides in
vivo resistance and has in vitro nuclease activity. A) Coomassie
Blue stained SDS-PAGE of purified Cascade and Cascade-Cas3 fusion
complex. B) Efficiency of plaquing of phage λ on cells
expressing Cascade-Cas3 fusion complex and a targeting (J3) or
non-targeting (R44) CRISPR and on cells expressing Cascade and Cas3
separately together with a targeting (J3) CRISPR. C) Gel-shift (in
the absence of divalent metal ions) of nSC target plasmid with
J3-Cascade-Cas3 fusion complex. pUC-λ was mixed with 2-fold
increasing amounts of J3-Cascade-Cas3, from a
pUC-λ:J3-Cascade-Cas3 molar ratio of 1:0.5 up to a 1:128
molar ratio. The first and last lanes contain only pUC-λ. D)
Gel-shift (in the absence of divalent metal ions) of nSC non-target
plasmid with J3-Cascade-Cas3 fusion complex. pUC-p7 was mixed with
2-fold increasing amounts of J3-Cascade-Cas3, from a
pUC-p7:J3-Cascade-Cas3 molar ratio of 1:0.5 up to a 1:128 molar
ratio. The first and last lanes contain only pUC-p7. E) Incubation
of nSC target plasmid (pUC-λ, left) or nSC non-target plasmid
(pUC-p7, right) with J3-Cascade-Cas3 in the presence of 10 mM
MgCl2. Lanes 1 and 7 contain only plasmid. F) Assay as in (E)
in the presence of 2 mM ATP. G) Assay as in (E) with the mutant
J3-Cascade-Cas3K320N complex. H) Assay as in (G) in the presence of
2 mM ATP.
[0107] FIG. 6 is a schematic diagram showing a model of the
CRISPR-interference type I pathway in E. coli.
[0108] FIG. 7 is a schematic diagram showing how a Cascade-FokI
fusion embodiment of the invention is used to create FokI dimers
which cut dsDNA to produce blunt ends as part of a process of
non-homologous end joining or homologous recombination.
[0109] FIG. 8 shows how BiFC analysis reveals that Cascade and Cas3
interact upon target recognition. Overlay of Brightfield image and
Venus fluorescence of cells expressing Cascade without Cse1,
Cse1-N155Venus and Cas3-C85Venus and either CRISPR 7Tm, which
targets 7 protospacers on the phage Lambda genome, or the
non-targeting CRISPR R44. Cells expressing CRISPR 7Tm are
fluorescent only when infected with phage Lambda, while cells
expressing CRISPR R44 are non-fluorescent. The highly intense
fluorescent dots (outside cells) are due to light-reflecting salt
crystals. White bars correspond to 10 µm.
[0110] FIG. 9 shows pUC-λ sequences of 4 clones [SEQ ID NOs:
39-42] encoding CRISPR J3, Cascade and Cas3 (wt or S483A/T485A).
These sequences indicate that the clones are escape mutants carrying
(partial) deletions of the protospacer or a single point mutation in
the seed region, which explains the inability to cure these plasmids.
[0111] FIG. 10 shows sequence alignments of cas3 genes from
organisms containing the Type I-E CRISPR/Cas system: an alignment of
the cas3-cse1 genes from Streptomyces sp. SPB78 (1st sequence,
Accession Number: ZP_07272643.1) [SEQ ID NO: 43], Streptomyces
griseus (2nd sequence, Accession Number: YP_001825054) [SEQ ID NO:
44] and Catenulispora acidiphila DSM 44928 (3rd sequence, Accession
Number: YP_003114638) [SEQ ID NO: 45], and an artificial E. coli
Cas3-Cse1 fusion protein [SEQ ID NO: 46] which includes the
polypeptide linker sequence from S. griseus.
[0112] FIG. 11 shows the design of a Cascade^KKR/ELD nuclease
pair in which the FokI nuclease domains are mutated such that only
heterodimers consisting of KKR and ELD nuclease domains are active,
and in which the distance between the opposing binding sites may be
varied to determine the optimal distance for a Cascade nuclease pair.
[0113] FIG. 12 is a schematic diagram showing genome targeting by a
Cascade-FokI nuclease pair.
[0114] FIG. 13 shows an SDS PAGE gel of Cascade-nuclease
complexes.
[0115] FIG. 14 shows electrophoresis gels of in vitro cleavage
assays of Cascade^KKR/ELD on plasmid DNA.
[0116] FIG. 15 shows Cascade^KKR/ELD cleavage patterns and
frequency [SEQ ID NO: 47].
EXAMPLES
Materials and Methods Used
Strains, Gene Cloning, Plasmids and Vectors
[0117] E. coli BL21-AI and E. coli BL21 (DE3) strains were used
throughout. Table 1 lists all plasmids used in this study. The
previously described pWUR408, pWUR480, pWUR404 and pWUR547 were
used for production of Strep-tag II R44-Cascade, and pWUR408,
pWUR514 and pWUR630 were used for production of Strep-tag II
J3-Cascade (Jore et al., (2011) Nature Structural & Molecular
Biology 18, 529-536; Semenova et al., (2011) Proceedings of the
National Academy of Sciences of the United States of America 108,
10098-10103). pUC-λ (pWUR610) and pUC-p7 (pWUR613) have been
described elsewhere (Jore et al., 2011; Semenova et al., 2011). The
C85Venus protein is encoded by pWUR647, which corresponds to pET52b
(Novagen) containing the synthetic GA1070943 construct (Table 2)
(Geneart) cloned between the BamHI and NotI sites. The N155Venus
protein is encoded by pWUR648, which corresponds to pRSF1b
(Novagen) containing the synthetic GA1070941 construct (Table 2)
(Geneart) cloned between the NotI and XhoI sites. The Cas3-C85Venus
fusion protein is encoded by pWUR649, which corresponds to pWUR647
containing the Cas3 amplification product using primers BG3186 and
BG3213 (Table 3) between the NcoI and BamHI sites. The
CasA-N155Venus fusion protein is encoded by pWUR650, which
corresponds to pWUR648 containing the CasA amplification product
using primers BG3303 and BG3212 (Table 3) between the NcoI and
BamHI sites. CRISPR 7Tm is encoded by pWUR651, which corresponds to
pACYCDuet-1 (Novagen) containing the synthetic GA1068859 construct
(Table 2) (Geneart) cloned between the NcoI and KpnI sites. The
Cascade encoding pWUR400, the CascadeΔCse1 encoding pWUR401
and the Cas3 encoding pWUR397 were described previously (Jore et
al., 2011). The Cas3H74A encoding pWUR652 was constructed using
site directed mutagenesis of pWUR397 with primers BG3093, BG3094
(Table 3).
TABLE 1 Plasmids used
Plasmid | Description and order of genes (5'-3') | Restriction sites | Primers | Source
pWUR397 | cas3 in pRSF-1b, no tags | | | 1
pWUR400 | casA-casB-casC-casD-casE in pCDF-1b, no tags | | | 1
pWUR401 | casB-casC-casD-casE in pCDF-1b, no tags | | | 1
pWUR404 | casE in pCDF-1b, no tags | | | 1
pWUR408 | casA in pRSF-1b, no tags | | | 1
pWUR480 | casB with Strep-tag II (N-term)-casC-casD in pET52b | | | 1
pWUR514 | casB with Strep-tag II (N-term)-casC-casD-casE in pET52b | | | 2
pWUR547 | E. coli R44 CRISPR, 7x spacer nr. 2, in pACYCDuet-1 | | | 2
pWUR613 | pUC-p7; pUC19 containing R44-protospacer on a 350 bp phage P7 amplicon | | | 2
pWUR630 | CRISPR poly J3, 5x spacer J3 in pACYCDuet-1 | NcoI/KpnI | | This study
pWUR610 | pUC-λ; pUC19 containing J3-protospacer on a 350 bp phage λ amplicon | | | 3
pWUR647 | C85Venus; GA1070943 (Table 2) in pET52b | BamHI/NotI | | This study
pWUR648 | N155Venus; GA1070941 (Table 2) in pRSF1b | NotI/XhoI | | This study
pWUR649 | cas3-C85Venus; pWUR647 containing cas3 amplicon | NcoI/BamHI | BG3186 + BG3213 | This study
pWUR650 | casA-N155Venus; pWUR648 containing casA amplicon | NcoI/NotI | BG3303 + BG3212 | This study
pWUR651 | CRISPR 7Tm; GA1068859 (Table 2) in pACYCDuet-1 | NcoI/KpnI | | This study
- | casB with Strep-tag II (N-term)-casC-casD-casE in pCDF-1b | | | This study
- | cas3-casA fusion | | | This study
- | cas3H74A-casA fusion | | | This study
- | cas3D75A-casA fusion | | | This study
- | cas3K320N-casA fusion | | | This study
- | cas3D452N-casA fusion | | | This study
Source 1 in the table above is Brouns et al. (2008) Science 321, 960-964. Source 2 in the table
above is Jore et al. (2011) Nature Structural & Molecular Biology 18: 529-536.
TABLE 2 Synthetic Constructs
GA1070943
ACTGGAAAGCGGGCAGTGAAAGGAAGGCCCATGAGGCCAGTTAATTAAGC
GGATCCTGGCGGCGGCAGCGGCGGCGGCAGCGACAAGCAGAAGAACGGCA
TCAAGGCGAACTTCAAGATCCGCCACAACATCGAGGACGGCGGCGTGCAG
CTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCT
GCTGCCCGACAACCACTACCTGAGCTACCAGTCCGCCCTGAGCAAAGACC
CCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCC
GGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAGCGGCCGCGGCGCG
CCTAGGCCTTGACGGCCTTCCTTCAATTCGCCCTATAGTGAG [SEQ ID NO: 6]
GA1070941
CACTATAGGGCGAATTGGCGGAAGGCCGTCAAGGCCGCATTTAATTAAGC
GGCCGCAGGCGGCGGCAGCGGCGGCGGCAGCATGGTGAGCAAGGGCGAGG
AGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA
AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTA
CGGCAAGCTGACCCTGAAGCTCATCTGCACCACCGGCAAGCTGCCCGTGC
CCTGGCCCACCCTCGTGACCACCCTCGGCTACGGCCTGCAGTGCTTCGCC
CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCC
CGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACT
ACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC
ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCA
CAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCACGGCCTAAC
TCGAGGGCGCGCCCTGGGCCTCATGGGCCTTCCGCTCACTGCCCGCTTTC
CAG [SEQ ID NO: 7]
GA1068859
CACTATAGGGCGAATTGGCGGAAGGCCGTCAAGGCCGCATGAGCTCCATG
GAAACAAAGAATTAGCTGATCTTTAATAATAAGGAAATGTTACATTAAGG
TTGGTGGGTTGTTTTTATGGGAAAAAATGCTTTAAGAACAAATGTATACT
TTTAGAGAGTTCCCCGCGCCAGCGGGGATAAACCGGGCCGATTGAAGGTC
CGGTGGATGGCTTAAAAGAGTTCCCCGCGCCAGCGGGGATAAACCGCCGC
AGGTACAGCAGGTAGCGCAGATCATCAAGAGTTCCCCGCGCCAGCGGGGA
TAAACCGACTTCTCTCCGAAAAGTCAGGACGCTGTGGCAGAGTTCCCCGC
GCCAGCGGGGATAAACCGCCTACGCGCTGAACGCCAGCGGTGTGGTGAAT
GAGTTCCCCGCGCCAGCGGGGATAAACCGGTGTGGCCATGCACGCCTTTA
ACGGTGAACTGGAGTTCCCCGCGCCAGCGGGGATAAACCGCACGAACTCA
GCCAGAACGACAAACAAAAGGCGAGTTCCCCGCGCCAGCGGGGATAAACC
GGCACCAGTACGCGCCCCACGCTGACGGTTTCTGAGTTCCCCGCGCCAGC
GGGATAAACCGCAGCTCCCATTTTCAAACCCAGGTACCCTGGGCCTCATG
GGCCTTCCGCTCACTGCCCGCTTTCCAG [SEQ ID NO: 8]
GA1047360
GAGCTCCCGGGCTGACGGTAATAGAGGCACCTACAGGCTCCGGTAAAACG
GAAACAGCGCTGGCCTATGCTTGGAAACTTATTGATCAACAAATTGCGGA
TAGTGTTATTTTTGCCCTCCCAACACAAGCTACCGCGAATGCTATGCTTA
CGAGAATGGAAGCGAGCGCGAGCCACTTATTTTCATCCCCAAATCTTATT
CTTGCTCATGGCAATTCACGGTTTAACCACCTCTTTCAATCAATAAAATC
ACGCGCGATTACTGAACAGGGGCAAGAAGAAGCGTGGGTTCAGTGTTGTC
AGTGGTTGTCACAAAGCAATAAGAAAGTGTTTCTTGGGCAAATCGGCGTT
TGCACGATTGATCAGGTGTTGATTTCGGTATTGCCAGTTAAACACCGCTT
TATCCGTGGTTTGGGAATTGGTAGATCTGTTTTAATTGTTAATGAAGTTC
ATGCTTACGACACCTATATGAACGGCTTGCTCGAGGCAGTGCTCAAGGCT
CAGGCTGATGTGGGAGGGAGTGTTATTCTTCTTTCCGCAACCCTACCAAT
GAAACAAAAACAGAAGCTTCTGGATACTTATGGTCTGCATACAGATCCAG
TGGAAAATAACTCCGCATATCCACTCATTAACTGGCGAGGTGTGAATGGT
GCGCAACGTTTTGATCTGCTAGCGGATCCGGTACC [SEQ ID NO: 9]
TABLE 3 Primers
Primer | Sequence | SEQ ID NO
BG3186 | ATAGCGCCATGGAACCTTTTAAATATATATGCCATTA | 10
BG3213 | ACAGTGGGATCCGCTTTGGGATTTGCAGGGATGACTCTGGT | 11
BG3303 | ATAGCGTCATGAATTTGCTTATTGATAACTGGATTCCTGTACG | 12
BG3212 | ACAGTGGCGGCCGCGCCATTTGATGGCCCTCCTTGCGGTTTTAA | 13
BG3076 | CGTATATCAAACTTTCCAATAGCATGAAGAGCAATGAAAAATAAC | 14
BG3449 | ATGATACCGCGAGACCCACGCTC | 15
BG3451 | CGGATAAAGTTGCAGGACCACTTC | 16
Protein Production and Purification
[0118] Cascade was expressed and purified as described (Jore et
al., 2011). Throughout purification, a buffer containing 20 mM HEPES
pH 7.5, 75 mM NaCl, 1 mM DTT and 2 mM EDTA was used for resuspension
and washing. Protein elution was performed in the same buffer
containing 4 mM desthiobiotin. The Cascade-Cas3 fusion complex was
expressed and purified in the same manner, with washing steps being
performed with 20 mM HEPES pH 7.5, 200 mM NaCl and 1 mM DTT, and
elution in 20 mM HEPES pH 7.5, 75 mM NaCl, 1 mM DTT containing 4 mM
desthiobiotin.
Electrophoretic Mobility Shift Assay
[0119] Purified Cascade or Cascade subcomplexes were mixed with
pUC-λ in a buffer containing 20 mM HEPES pH 7.5, 75 mM NaCl,
1 mM DTT and 2 mM EDTA, and incubated at 37 °C for 15 minutes.
Samples were run overnight on a 0.8% agarose gel in TAE and
post-stained for 30 minutes with SYBR Safe (Invitrogen) diluted
1:10,000 in TAE. Cleavage with BsmI (Fermentas) or Nt.BspQI (New
England Biolabs) was performed in the HEPES reaction buffer
supplemented with 5 mM MgCl2.
Scanning Force Microscopy
[0120] Purified Cascade was mixed with pUC-λ (at a molar ratio of
7:1; 250 nM Cascade, 35 nM DNA) in a buffer containing 20 mM HEPES
pH 7.5, 75 mM NaCl, 0.2 mM DTT and 0.3 mM EDTA, and incubated at
37 °C for 15 minutes. Subsequently, for AFM sample
preparation, the incubation mixture was diluted 10× in double-distilled
water and MgCl2 was added to a final concentration
of 1.2 mM. Deposition of the protein-DNA complexes and imaging were
carried out as described before (Dame et al. (2000) Nucleic Acids
Res. 28: 3504-3510).
Fluorescence Microscopy
[0121] BL21-AI cells carrying CRISPR- and cas-gene-encoding plasmids
were grown overnight at 37 °C in Luria-Bertani broth (LB)
containing ampicillin (100 µg/ml), kanamycin (50 µg/ml),
streptomycin (50 µg/ml) and chloramphenicol (34 µg/ml).
The overnight culture was diluted 1:100 in fresh antibiotic-containing
LB and grown for 1 hour at 37 °C. Expression of cas genes
and CRISPR was induced for 1 hour by adding L-arabinose to a final
concentration of 0.2% and IPTG to a final concentration of 1 mM.
For infection, cells were mixed with phage Lambda at a multiplicity
of infection (MOI) of 4. Cells were applied to poly-L-lysine-coated
microscope slides and analyzed using a Zeiss LSM510
confocal laser scanning microscope based on an Axiovert inverted
microscope, with a 40× oil-immersion objective (NA 1.3),
an argon laser as the excitation source (514 nm) and detection
at 530-600 nm. The pinhole was set at 203 µm for all
measurements.
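The MOI of 4 used above can be related to the expected fraction of infected cells. The following is a minimal Python sketch, assuming that phage adsorption follows a Poisson distribution (a standard approximation that the protocol itself does not state); the function name is illustrative only.

```python
import math


def fraction_infected(moi: float) -> float:
    """Expected fraction of cells receiving at least one phage.

    Assumes Poisson-distributed adsorption, so P(0 phage) = exp(-MOI).
    """
    if moi < 0:
        raise ValueError("MOI must be non-negative")
    return 1.0 - math.exp(-moi)


if __name__ == "__main__":
    # MOI of 4, as used for the infection experiments above
    print(f"MOI 4: {fraction_infected(4):.1%} of cells expected to be infected")
```

Under this assumption roughly 98% of cells receive at least one phage at an MOI of 4; the estimate ignores incomplete adsorption and cell clumping.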
pUC-λ Transformation Studies
[0122] LB containing kanamycin (50 µg/ml), streptomycin (50
µg/ml) and chloramphenicol (34 µg/ml) was inoculated from an
overnight pre-inoculum and grown to an OD600 of 0.3.
Expression of cas genes and CRISPR was induced for 45 minutes with
0.2% L-arabinose and 1 mM IPTG. Cells were collected by
centrifugation at 4 °C and made competent by resuspension
in ice-cold buffer containing 100 mM RbCl, 50 mM MnCl2,
30 mM potassium acetate, 10 mM CaCl2 and 15% glycerol, pH 5.8.
After a 3 hour incubation, cells were collected and resuspended in
a buffer containing 10 mM MOPS, 10 mM RbCl, 75 mM CaCl2, 15%
glycerol, pH 6.8. Transformation was performed by adding 80 ng
pUC-λ, followed by a 1 minute heat shock at 42 °C
and a 5 minute cold shock on ice. Next, cells were grown in LB for 45
minutes at 37 °C before plating on LB-agar plates
containing 0.2% L-arabinose, 1 mM IPTG, ampicillin (100 µg/ml),
kanamycin (50 µg/ml), streptomycin (50 µg/ml) and
chloramphenicol (34 µg/ml).
[0123] Plasmid curing was analyzed by transforming BL21-AI cells
containing cas gene- and CRISPR-encoding plasmids with pUC-λ
while growing the cells in the presence of 0.2% glucose to suppress
expression of the T7 polymerase gene. Expression of cas genes and
CRISPR was induced by collecting the cells and resuspending them in LB
containing 0.2% arabinose and 1 mM IPTG. Cells were plated on
LB-agar containing either streptomycin, kanamycin and
chloramphenicol (non-selective for pUC-λ) or ampicillin,
streptomycin, kanamycin and chloramphenicol (selective for
pUC-λ). After overnight growth, the percentage of plasmid loss
can be calculated from the ratio of colony-forming units on the
selective and non-selective plates.
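The plasmid-loss calculation described above can be sketched in Python as follows. This is a minimal illustration, assuming that colony counts are first converted to CFU/ml so that plates at different dilutions can be compared; the function names and the example counts are hypothetical, not data from this study.

```python
def cfu_per_ml(colonies: int, dilution: float, plated_ml: float) -> float:
    """Colony-forming units per ml from a count at a given dilution (e.g. 1e-5)."""
    return colonies / (dilution * plated_ml)


def plasmid_loss_percent(cfu_selective: float, cfu_nonselective: float) -> float:
    """Percentage of viable cells that no longer carry pUC-lambda.

    cfu_selective:    CFU/ml on plates that also contain ampicillin (plasmid kept)
    cfu_nonselective: CFU/ml on plates without ampicillin (all viable cells)
    """
    if cfu_nonselective <= 0:
        raise ValueError("non-selective count must be positive")
    loss = 100.0 * (cfu_nonselective - cfu_selective) / cfu_nonselective
    # Counting noise can push the selective count slightly above the
    # non-selective one; report that case as 0% loss rather than a negative value.
    return max(0.0, loss)


if __name__ == "__main__":
    # Hypothetical plate counts, not data from this study
    sel = cfu_per_ml(20, 1e-4, 0.1)
    nonsel = cfu_per_ml(200, 1e-4, 0.1)
    print(f"plasmid loss: {plasmid_loss_percent(sel, nonsel):.1f}%")
```

Clamping at 0% is a design choice for noisy counts; an alternative would be to report the raw (possibly negative) value together with its counting error.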
Phage Lambda Infection Studies
[0124] Host sensitivity to phage infection was tested using a
virulent phage Lambda (λvir), as described in Brouns et al. (2008)
Science 321, 960-964. The sensitivity of the host to infection was
calculated as the efficiency of plaquing (the ratio of the plaque count on
a strain containing an anti-λ CRISPR to that on a strain
containing the non-targeting R44 CRISPR), as described in Brouns et al.
(2008).
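The efficiency of plaquing is a plaque-count ratio. When the targeting and control strains are plated at different phage dilutions, each count can be normalized to a titer first so that the ratio stays meaningful. A minimal Python sketch under that assumption; the function names and example counts are hypothetical.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, plated_ml: float) -> float:
    """Phage titer (PFU/ml) from a plaque count at a given dilution (e.g. 1e-6)."""
    return plaques / (dilution * plated_ml)


def efficiency_of_plaquing(titer_targeting: float, titer_control: float) -> float:
    """EOP: titer on the anti-lambda CRISPR strain divided by titer on the R44 strain."""
    if titer_control <= 0:
        raise ValueError("control titer must be positive")
    return titer_targeting / titer_control


if __name__ == "__main__":
    # Hypothetical counts for one phage stock plated on both strains
    control = titer_pfu_per_ml(250, 1e-6, 0.1)   # non-targeting R44 CRISPR
    targeting = titer_pfu_per_ml(50, 1e-2, 0.1)  # anti-lambda CRISPR
    print(f"EOP = {efficiency_of_plaquing(targeting, control):.1e}")
```

If both strains are plated at the same dilution and volume, the normalization cancels and the EOP reduces to the plain plaque-count ratio described above.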
Example 1
Cascade Exclusively Binds Negatively Supercoiled Target DNA
[0125] The 3 kb pUC19-derived plasmid denoted pUC-λ contains
a 350 bp DNA fragment corresponding to part of the J gene of phage
λ, which is targeted by J3-Cascade (Cascade associated with
crRNA containing spacer J3; Westra et al. (2010) Molecular
Microbiology 77, 1380-1393). The electrophoretic mobility shift
assays show that Cascade has high affinity only for negatively
supercoiled (nSC) target plasmid. At a molar ratio of J3-Cascade to
pUC-λ of 6:1, all nSC plasmid was bound by Cascade (see FIG.
1A), while Cascade carrying the non-targeting crRNA R44
(R44-Cascade) displayed non-specific binding at a molar ratio of
128:1 (see FIG. 1B). The dissociation constant (Kd) for nSC
pUC-λ was determined to be 13 ± 1.4 nM for J3-Cascade (see
FIG. 1E) and 429 ± 152 nM for R44-Cascade (see FIG. 1F).
J3-Cascade was unable to bind relaxed target DNA, such as nicked
(see FIG. 1C) or linear pUC-λ (see FIG. 1D), with measurable
affinity, showing that high-affinity binding of Cascade to larger DNA
substrates requires an nSC topology.
[0126] To distinguish non-specific binding from specific binding,
the BsmI restriction site located within the protospacer was used.
While adding BsmI enzyme to pUC-λ gives a linear product in
the presence of R44-Cascade (see FIG. 1G, lane 4), pUC-λ is
protected from BsmI cleavage in the presence of J3-Cascade (see
FIG. 1G, lane 7), indicating specific binding to the protospacer.
This shows that Cas3 is not required for in vitro sequence specific
binding of Cascade to a protospacer sequence in a nSC plasmid.
[0127] Cascade binding to nSC pUC-λ was followed by nicking
with Nt.BspQI, giving rise to an open circular (OC) topology. Cascade is released
from the plasmid after strand nicking, as can be seen from the
absence of a mobility shift (see FIG. 1H, compare lane 8 to lane
10). In contrast, Cascade remains bound to its DNA target when a
ssDNA probe complementary to the displaced strand is added to the
reaction before DNA cleavage by Nt.BspQI (see FIG. 1H, lane 9). The
probe artificially stabilizes the Cascade R-loop on relaxed target
DNA. Similar observations are made when both DNA strands of
pUC-λ are cleaved after Cascade binding (see FIG. 1I, lanes 8
and 9).
Example 2
Cascade Induces Bending of Bound Target DNA
[0128] Complexes formed between purified Cascade and pUC-λ
were visualized. Specific complexes containing a single bound
J3-Cascade complex were formed, while non-specific R44-Cascade yielded
no DNA-bound complexes in this assay under identical conditions.
Of 81 DNA molecules observed, 76% had J3-Cascade
bound (see FIG. 2A-P). In most of these complexes (86%), Cascade was
located at the apex of a loop, whereas only a small fraction (14%)
was found at non-apical positions. These data show that
Cascade binding causes bending and possibly wrapping of the DNA,
probably to facilitate local melting of the DNA duplex.
Example 3
Naturally Occurring Fusions of Cas3 and Cse1: Cas3 Interacts with
Cascade Upon Protospacer Recognition
[0129] Sequence analysis of cas3 genes from organisms containing
the Type I-E CRISPR/Cas system (Figure S3) reveals that
Cas3 and Cse1 occur as fusion proteins in Streptomyces sp. SPB78
(Accession Number ZP_07272643.1), in Streptomyces griseus
(Accession Number YP_001825054), and in Catenulispora
acidiphila DSM 44928 (Accession Number YP_003114638).
Example 4
Bimolecular Fluorescence Complementation (BiFC) Shows how a Cse1
Fusion Protein Forming Part of Cascade Continues to Interact with
Cas3
[0130] BiFC experiments were used to monitor interactions between
Cas3 and Cascade in vivo before and after phage λ infection.
BiFC experiments rely on the capacity of the non-fluorescent halves
of a fluorescent protein, e.g., Yellow Fluorescent Protein (YFP), to
refold and form a fluorescent molecule when the two halves occur
in close proximity. BiFC therefore provides a tool to reveal
protein-protein interactions, since the efficiency of refolding is
greatly enhanced if the local concentrations are high, e.g., when
the two halves of the fluorescent protein are fused to interaction
partners. Cse1 was fused at the C-terminus with the N-terminal 155
amino acids of Venus (Cse1-N155Venus), an improved version of YFP
(Nagai et al (2002) Nature Biotechnology 20, 87-90). Cas3 was
C-terminally fused to the C-terminal 85 amino acids of Venus
(Cas3-C85Venus).
[0131] BiFC analysis reveals that Cascade does not interact with
Cas3 in the absence of invading DNA (FIG. 3ABC, FIG. 3P and FIG.
8). Upon infection with phage λ, however, cells expressing
CascadeΔCse1, Cse1-N155Venus and Cas3-C85Venus are
fluorescent if they co-express the anti-λ CRISPR 7Tm (FIG.
3DEF, FIG. 3P and FIG. 8). When they co-express a non-targeting
CRISPR R44 (FIG. 3GHI, FIG. 3P and FIG. 8), the cells remain
non-fluorescent. This shows that Cascade and Cas3 specifically
interact during infection upon protospacer recognition and that
Cse1 and Cas3 are in close proximity to each other in the
Cascade-Cas3 binary effector complex.
[0132] These results also show that a fusion of Cse1
with a heterologous protein disrupts neither assembly of the
Cascade-crRNA ribonucleoprotein nor the interaction of Cascade and
Cas3 with the target phage DNA, even when Cas3 itself is also a
fusion protein.
Example 5
Preparing a Designed Cas3-Cse1 Fusion Gives a Protein with In Vivo
Functional Activity
[0133] Providing in vitro evidence for Cas3 DNA cleavage activity
required purified and active Cas3. Despite various solubilization
strategies, Cas3 overproduced (Howard et al (2011) Biochem. J. 439,
85-95) in E. coli BL21 is mainly present in inactive aggregates and
inclusion bodies. Cas3 was therefore produced as a Cas3-Cse1 fusion
protein, containing a linker identical to that of the Cas3-Cse1
fusion protein in S. griseus (see FIG. 10). When co-expressed with
CascadeΔCse1 and CRISPR J3, the fusion complex was soluble
and was obtained in high purity with the same apparent
stoichiometry as Cascade (FIG. 5A). When this complex was tested
for its ability to provide resistance against phage λ
infection, the efficiency of plaquing (eop) on cells expressing the
fusion complex J3-Cascade-Cas3 was identical to that on cells expressing
the separate proteins (FIG. 5B).
[0134] Since the J3-Cascade-Cas3 fusion-complex was functional in
vivo, in vitro DNA cleavage assays were carried out using this
complex. When J3-Cascade-Cas3 was incubated with pUC-λ in the
absence of divalent metals, plasmid binding was observed at molar
ratios similar to those observed for Cascade (FIG. 5C), while
non-specific binding to a non-target plasmid (pUC-p7, a pUC19-derived
plasmid of the same size as pUC-λ but lacking a protospacer)
occurred only at high molar ratios (FIG. 5D), indicating that
non-specific DNA binding of the complex is also similar to that of
Cascade alone.
[0135] Interestingly, the J3-Cascade-Cas3 fusion complex displays
magnesium-dependent endonuclease activity on nSC target plasmids.
In the presence of 10 mM Mg2+, J3-Cascade-Cas3 nicks nSC
pUC-λ (FIG. 5E, lanes 3-7), but no cleavage is observed for
substrates that do not contain the target sequence (FIG. 5E, lanes
9-13) or that have a relaxed topology. No shift of the resulting
OC band is observed, in line with previous observations that
Cascade dissociates spontaneously after cleavage, without requiring
ATP-dependent Cas3 helicase activity. Instead, the helicase
activity of Cas3 appears to be involved in exonucleolytic plasmid
degradation. When both magnesium and ATP are added to the reaction,
full plasmid degradation occurred (FIG. 5H).
[0136] The inventors have found that Cascade alone is unable to
bind protospacers on relaxed DNA. In contrast, Cascade efficiently
locates targets in negatively supercoiled DNA and subsequently
recruits Cas3 via the Cse1
subunit. Endonucleolytic cleavage by the Cas3 HD-nuclease domain
causes spontaneous release of Cascade from the DNA through the loss
of supercoiling, remobilizing Cascade to locate new targets. The
target is then progressively unwound and cleaved by the joint
ATP-dependent helicase activity and HD-nuclease activity of Cas3,
leading to complete target DNA degradation and neutralization of
the invader.
[0137] Referring to FIG. 6 and without wishing to be bound to any
particular theory, a mechanism of operation for the
CRISPR-interference type I pathway in E. coli may involve the
following steps. (1) Cascade carrying a crRNA scans the nSC plasmid
DNA for a protospacer with an adjacent PAM. Whether strand separation
occurs during this stage is unknown. (2) Sequence-specific protospacer
binding is achieved through base pairing between the crRNA and the
complementary strand of the DNA, forming an R-loop. Upon binding,
Cascade induces bending of the DNA. (3) The Cse1 subunit of Cascade
recruits Cas3 upon DNA binding. This may be achieved by Cascade
conformational changes that take place upon nucleic acid binding.
(4) The HD-domain (darker part) of Cas3 catalyzes
Mg2+-dependent nicking of the displaced strand of the R-loop,
thereby altering the topology of the target plasmid from nSC to
relaxed OC. (5a and 5b) The plasmid relaxation causes spontaneous
dissociation of Cascade. Meanwhile Cas3 displays ATP-dependent
exonuclease activity on the target plasmid, requiring the helicase
domain for target dsDNA unwinding and the HD-nuclease domain for
successive cleavage activity. (6) Cas3 degrades the entire plasmid
in an ATP-dependent manner as it processively moves along, unwinds
and cleaves the target dsDNA.
Example 6
Preparation of Artificial Cas-Strep Tag Fusion Proteins and
Assembly of Cascade Complexes
[0138] Cascade complexes are produced and purified as described in
Brouns et al. (2008) Science 321: 960-964, using the expression
plasmids listed in Supplementary Table 3 of Jore et al. (2011)
Nature Structural & Molecular Biology 18: 529-536. Cascade is
routinely purified with an N-terminal Strep-tag II fused to CasB
(or CasC in CasCDE). Size exclusion chromatography (Superdex 200 HR
10/30 (GE)) is performed using 20 mM Tris-HCl (pH 8.0), 0.1 M NaCl,
1 mM dithiothreitol. Cascade preparations (~0.3 mg) are
incubated with DNase I (Invitrogen) in the presence of 2.5 mM
MgCl2 for 15 min at 37 °C prior to size exclusion
analysis. Co-purified nucleic acids are isolated by extraction
using an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1),
pH 8.0 (Fluka), and incubated with either DNase I (Invitrogen)
supplemented with 2.5 mM MgCl2 or RNase A (Fermentas) for 10
min at 37 °C. Cas subunit proteins fused to the amino acid
sequence of Strep-Tag are produced.
[0139] Plaque assays showing the biological activity of the
Strep-Tag Cascade subunits are performed using bacteriophage Lambda
and the efficiency of plaquing (EOP) was calculated as described in
Brouns et al (2008).
[0140] For purification of crRNA, samples are analyzed by ion-pair
reversed-phase HPLC on an Agilent 1100 HPLC with a UV (260 nm)
detector (Agilent) using a DNAsep column, 50 mm × 4.6 mm I.D.
(Transgenomic, San Jose, Calif.). The chromatographic analysis is
performed using the following buffer conditions: A) 0.1 M
triethylammonium acetate (TEAA) (pH 7.0) (Fluka); B) buffer A with
25% LC MS grade acetonitrile (v/v) (Fisher). crRNA is obtained by
injecting purified intact Cascade at 75 °C using a linear
gradient starting at 15% buffer B and extending to 60% B in 12.5
min, followed by a linear extension to 100% B over 2 min at a flow
rate of 1.0 ml/min. Hydrolysis of the cyclic phosphate terminus was
performed by incubating the HPLC-purified crRNA in a final
concentration of 0.1 M HCl at 4 °C for 1 hour. The samples
are concentrated to 5-10 µl on a vacuum concentrator (Eppendorf)
prior to ESI-MS analysis.
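The elution program above is piecewise linear, so the buffer composition at any time point can be computed by interpolation. A minimal Python sketch, assuming the gradient starts at injection (t = 0) and holds its end values outside the program; the names are illustrative. Because buffer B is buffer A with 25% (v/v) acetonitrile, the acetonitrile content is 0.25 × %B.

```python
from bisect import bisect_right
from typing import List, Tuple

# (time in min, % buffer B) breakpoints for the crRNA purification gradient
CRRNA_GRADIENT: List[Tuple[float, float]] = [(0.0, 15.0), (12.5, 60.0), (14.5, 100.0)]


def percent_b(t_min: float, program: List[Tuple[float, float]]) -> float:
    """Linearly interpolated %B at time t; end values are held outside the program."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return program[0][1]
    if t_min >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, t_min)  # program[i - 1] <= t < program[i]
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_acetonitrile(t_min: float) -> float:
    """Buffer B contains 25% (v/v) acetonitrile, so %ACN = 0.25 * %B."""
    return 0.25 * percent_b(t_min, CRRNA_GRADIENT)


if __name__ == "__main__":
    for t in (0.0, 6.25, 12.5, 13.5, 14.5):
        print(f"{t:5.2f} min: {percent_b(t, CRRNA_GRADIENT):5.1f}% B, "
              f"{percent_acetonitrile(t):4.1f}% acetonitrile")
```

The same function describes the capillary LC program of [0141] with breakpoints (0, 20), (5, 40) and (13, 60) in % buffer D, where D contains 50% methanol.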
[0141] Electrospray ionization mass spectrometry analysis of crRNA
is performed in negative mode using a UHR-TOF mass spectrometer
(maXis) or an HCT Ultra PTM Discovery instrument (both Bruker
Daltonics), coupled to an online capillary liquid chromatography
system (Ultimate 3000, Dionex, UK). RNA separations are performed
using a monolithic (PS-DVB) capillary column (200 µm × 50 mm
I.D., Dionex, UK). The chromatography is performed using the
following buffer conditions: C) 0.4 M
1,1,1,3,3,3-hexafluoro-2-propanol (HFIP, Sigma-Aldrich) adjusted
with triethylamine (TEA) to pH 7.0 and 0.1 mM TEAA, and D) buffer C
with 50% methanol (v/v) (Fisher). RNA analysis is performed at
50 °C with 20% buffer D, extending to 40% D in 5 min,
followed by a linear extension to 60% D over 8 min at a flow rate
of 2 µl/min.
[0142] Cascade protein is analyzed by native mass spectrometry in
0.15 M ammonium acetate (pH 8.0) at a protein concentration of 5
µM. The protein preparation is obtained by five sequential
concentration and dilution steps at 4 °C using a
centrifugal filter with a cut-off of 10 kDa (Millipore). Proteins
are sprayed from borosilicate glass capillaries and analyzed on
LCT electrospray time-of-flight or modified quadrupole
time-of-flight instruments (both Waters, UK) adjusted for optimal
performance in high mass detection (see Tahallah et al. (2001)
Rapid Commun. Mass Spectrom. 15: 596-601; and van den Heuvel
et al. (2006) Anal. Chem. 78: 7473-7483). Exact mass measurements
of the individual Cas proteins were acquired under denaturing
conditions (50% acetonitrile, 50% Milli-Q water, 0.1% formic acid).
Sub-complexes in solution were generated by the addition of
2-propanol to the spray solution to a final concentration of 5%
(v/v). Instrument settings were as follows: needle voltage
~1.2 kV, cone voltage ~175 V, source pressure 9 mbar.
Xenon was used as the collision gas for tandem mass spectrometric
analysis at a pressure of 1.5 × 10^-2 mbar. The collision voltage
varied between 10 and 200 V.
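The text does not spell out how masses are derived from the native spectra. The standard approach for a charge-state series is sketched below in Python, assuming positive-ion mode with protonation as the only charging mechanism (salt adducts ignored). For adjacent peaks m1 (charge z) and m2 (charge z + 1, lower m/z), M = z(m1 - H) = (z + 1)(m2 - H), so z = (m2 - H) / (m1 - m2), where H is the proton mass. The function names and the 400 kDa test mass are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def charge_from_adjacent_peaks(mz_z: float, mz_z_plus_1: float) -> int:
    """Charge z of the higher-m/z peak, given the adjacent (z + 1) peak at lower m/z."""
    if mz_z_plus_1 >= mz_z:
        raise ValueError("the (z + 1) peak must appear at lower m/z")
    return round((mz_z_plus_1 - PROTON_MASS) / (mz_z - mz_z_plus_1))


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M of an [M + zH]z+ ion observed at the given m/z."""
    return z * (mz - PROTON_MASS)


if __name__ == "__main__":
    # Synthetic peaks for a hypothetical 400 kDa complex at charges 40+ and 41+
    mass = 400_000.0
    mz40 = mass / 40 + PROTON_MASS
    mz41 = mass / 41 + PROTON_MASS
    z = charge_from_adjacent_peaks(mz40, mz41)
    print(f"z = {z}, M = {neutral_mass(mz40, z):.1f} Da")
```

In practice, the mass is usually averaged over all peaks of the series, which reduces the influence of peak-picking error on any single pair.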
[0143] Electrophoretic mobility shift assays (EMSA) are used to
demonstrate the functional activity of Cascade complexes for target
nucleic acids. EMSA is performed by incubating Cascade, CasBCDE or
CasCDE with 1 nM labelled nucleic acid in 50 mM Tris-Cl pH 7.5, 100
mM NaCl. Salmon sperm DNA (Invitrogen) is used as competitor. EMSA
reactions are incubated at 37 °C for 20-30 min prior to
electrophoresis on 5% polyacrylamide gels. The gels are dried and
analyzed using phosphor storage screens and a PMI phosphor imager
(Bio-Rad). Target DNA binding and cleavage activity of Cascade is
tested in the presence of 1-10 mM Ca2+, Mg2+ or Mn2+ ions.
[0144] DNA targets are gel-purified long oligonucleotides (Isogen
Life Sciences or Biolegio), listed in Supplementary Table 3 of Jore
et al. (2011). The oligonucleotides are end-labeled using
[γ-32P]ATP (PerkinElmer) and T4 polynucleotide kinase (Fermentas).
Double-stranded DNA targets are prepared by annealing complementary
oligonucleotides and digesting remaining ssDNA with Exonuclease I
(Fermentas). Labelled RNA targets are transcribed in vitro using T7
MAXIscript or T7 MEGAshortscript kits (Ambion) with
[α-32P]CTP (PerkinElmer), and the template is removed by DNase I
(Fermentas) digestion. Double-stranded RNA targets are prepared by
annealing complementary RNAs and digesting surplus ssRNA with RNase
T1 (Fermentas), followed by phenol extraction.
[0145] Plasmid mobility shift assays are performed using plasmid
pWUR613 containing the R44 protospacer. The fragment containing the
protospacer is PCR-amplified from bacteriophage P7 genomic DNA
using primers BG3297 and BG3298 (see Supplementary Table 3 of Jore
et al. (2011)). Plasmid (0.4 µg) and Cascade were mixed in a 1:10
molar ratio in a buffer containing 5 mM Tris-HCl (pH 7.5) and 20 mM
NaCl and incubated at 37 °C for 30 minutes. Cascade
proteins were then removed by proteinase K treatment (Fluka; 0.15
U, 15 min, 37 °C), followed by phenol/chloroform extraction.
RNA-DNA complexes were then treated with RNase H (Promega; 2 U, 1
h, 37 °C).
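Setting up the 1:10 plasmid:Cascade molar ratio requires converting the 0.4 µg of plasmid into moles. The following is a minimal Python sketch, assuming a plasmid size of about 3,000 bp (the size given for pUC-λ, and therefore pUC-p7, in Example 1) and the common approximation of 650 g/mol per base pair of dsDNA. The function name is illustrative.

```python
BP_MASS_G_PER_MOL = 650.0  # approximate mass of one base pair of dsDNA


def plasmid_pmol(mass_ug: float, length_bp: int) -> float:
    """Picomoles of a double-stranded plasmid from its mass (ug) and length (bp)."""
    grams = mass_ug * 1e-6
    return grams / (length_bp * BP_MASS_G_PER_MOL) * 1e12


if __name__ == "__main__":
    plasmid = plasmid_pmol(0.4, 3000)  # 0.4 ug of an assumed ~3 kb plasmid
    cascade = 10 * plasmid             # 1:10 plasmid:Cascade molar ratio
    print(f"plasmid {plasmid:.3f} pmol, Cascade {cascade:.2f} pmol")
```

Under these assumptions, 0.4 µg corresponds to roughly 0.2 pmol of plasmid, so about 2 pmol of Cascade would be needed. Using the exact plasmid length or 660 g/mol per base pair shifts the result only by a few percent.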
[0146] Strep-Tag-Cas protein subunit fusions that form Cascade
protein complexes or active sub-complexes with the RNA component
(equivalent to a crRNA) have the expected biological and
functional activity of scanning, specific attachment and
cleavage of nucleic acid targets. Fusions of the Cas subunits with
fluorescent proteins also form Cascade complexes and sub-complexes
with the RNA component (equivalent to crRNA) that retain biological
and functional activity and allow visualisation of the location of a
target nucleic acid sequence, for example in dsDNA.
Example 7
A Cascade-Nuclease Pair and Test of Nuclease Activity In Vitro
[0147] Six mutations designated "Sharkey" have been introduced by
random mutagenesis and screening to improve nuclease activity and
stability of the non-specific nuclease domain from Flavobacterium
okeanokoites restriction enzyme FokI (see Guo, J., et al. (2010) J.
Mol. Biol. 400: 96-107). Other mutations have been introduced that
reduce off-target cleavage activity. This is achieved by
engineering electrostatic interactions at the FokI dimer interface
of a zinc-finger nuclease (ZFN) pair, creating one FokI variant with
a positively charged interface (KKR: E490K, I538K, H537R) and another
with a negatively charged interface (ELD: Q486E, I499L, N496D) (see
Doyon, Y., et al. (2011) Nature Methods 8: 74-79). Each of these variants is
catalytically inactive as a homodimer, thereby reducing the
frequency of off-target cleavage.
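As a consistency check, the KKR and ELD fusion proteins listed in [0149] should differ only at the engineered interface residues. The following minimal Python sketch compares residues 101-200 of SEQ ID NO: 19 (ELD) and SEQ ID NO: 21 (KKR) and reports every mismatch. The mapping to FokI numbering is an assumption inferred here, not stated in the source: construct residue n is taken to be FokI residue n + 381, an offset under which the mismatches fall exactly at the substitutions named above. The variable and function names are hypothetical.

```python
from typing import List, Tuple

# Residues 101-200 of the ELD (SEQ ID NO: 19) and KKR (SEQ ID NO: 21) fusion proteins
ELD_101_200 = ("ADEMERYVEENQTRDKHLNPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQ"
               "LTRLNHITNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINFAD")
KKR_101_200 = ("ADEMQRYVKENQTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQ"
               "LTRLNRKTNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINFAD")

FRAGMENT_START = 101  # construct residue number of the first character above
FOKI_OFFSET = 381     # assumed: construct residue n corresponds to FokI residue n + 381


def interface_differences(eld: str, kkr: str) -> List[Tuple[int, str, str]]:
    """Return (FokI residue number, ELD residue, KKR residue) for each mismatch."""
    if len(eld) != len(kkr):
        raise ValueError("fragments must be the same length (no indels expected)")
    return [
        (FRAGMENT_START + i + FOKI_OFFSET, a, b)
        for i, (a, b) in enumerate(zip(eld, kkr))
        if a != b
    ]


if __name__ == "__main__":
    for pos, e, k in interface_differences(ELD_101_200, KKR_101_200):
        print(f"FokI {pos}: ELD {e} / KKR {k}")
```

Under this numbering, the ELD fusion carries E486, D496 and L499 where the KKR fusion keeps the wild-type Q, N and I. The KKR fusion carries K490, R537 and K538 where the ELD fusion keeps the wild-type E, H and I.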
Cascade-Nuclease Design
[0148] We translationally fused improved FokI nucleases to the
N-terminus of Cse1 to generate the Cse1 variants FokI^KKR-Cse1 and
FokI^ELD-Cse1. Each variant is co-expressed with the remaining Cascade
subunits (Cse2, Cas7, Cas5 and Cas6e) and with one of two distinct
CRISPR plasmids carrying uniform spacers. This loads the Cascade^KKR
complex with uniform P7-crRNA, and the Cascade^ELD complex with uniform
M13g8-crRNA. These complexes are purified using the N-terminally
StrepII-tagged Cse2 as described in Jore, M. M., et al. (2011) Nat.
Struct. Mol. Biol. 18(5): 529-536. Furthermore, an additional
purification step can be carried out using an N-terminally His-tagged
FokI to ensure that only full-length, intact Cascade-nuclease fusion
complexes are purified.
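In the nucleotide sequences listed in [0149], the FokI portion is written in upper case and the Cse1 portion in lower case. This case convention is assumed here to mark the fusion boundary; the lower-case block begins with atg and encodes the Cse1 start MNLL. A minimal Python sketch using the standard genetic code can check that the fusion is in frame by translating nucleotides 601-669 of SEQ ID NO: 18 across that boundary. The helper names are illustrative.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string (case-insensitive); trailing bases are ignored."""
    seq = dna.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def case_boundary(dna: str) -> int:
    """Index of the first lower-case base (assumed start of the Cse1 part), or -1."""
    return next((i for i, base in enumerate(dna) if base.islower()), -1)


# Nucleotides 601-669 of SEQ ID NO: 18, spanning the FokI/Cse1 junction (starts in frame)
JUNCTION = "CCCACCAACCGCGCGAAAGGCCTGGAAGCGGTGAGCGTGGCGAGC" "atgaatttgcttattgataactgg"

if __name__ == "__main__":
    cut = case_boundary(JUNCTION)
    print(translate(JUNCTION[:cut]), "|", translate(JUNCTION[cut:]),
          "| in frame:", cut % 3 == 0)
```

The translated junction reads PTNRAKGLEAVSVAS | MNLLIDNW, which matches residues 201-223 of SEQ ID NO: 19. The boundary falls on a codon boundary, so the Cse1 start codon is read in frame directly after the FokI domain.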
[0149] The nucleotide and amino acid sequences of the fusion
proteins used in this example were as follows:
>nucleotide sequence of FokI-(Sharkey-ELD)-Cse1 [SEQ ID NO: 18]
ATGGCTCAACTGGTTAAAAGCGAACTGGAAGAGAAAAAAAGTGAACTGCG
CCACAAACTGAAATATGTGCCGCATGAATATATCGAGCTGATTGAAATTG
CACGTAATCCGACCCAGGATCGTATTCTGGAAATGAAAGTGATGGAATTT
TTTATGAAAGTGTACGGCTATCGCGGTGAACATCTGGGTGGTAGCCGTAA
ACCGGATGGTGCAATTTATACCGTTGGTAGCCCGATTGATTATGGTGTTA
TTGTTGATACCAAAGCCTATAGCGGTGGTTATAATCTGCCGATTGGTCAG
GCAGATGAAATGGAACGTTATGTGGAAGAAAATCAGACCCGTGATAAACA
TCTGAATCCGAATGAATGGTGGAAAGTTTATCCGAGCAGCGTTACCGAGT
TTAAATTCCTGTTTGTTAGCGGTCACTTCAAAGGCAACTATAAAGCACAG
CTGACCCGTCTGAATCATATTACCAATTGTAATGGTGCAGTTCTGAGCGT
TGAAGAACTGCTGATTGGTGGTGAAATGATTAAAGCAGGCACCCTGACCC
TGGAAGAAGTTCGTCGCAAATTTAACAATGGCGAAATCAACTTTGCGGAT
CCCACCAACCGCGCGAAAGGCCTGGAAGCGGTGAGCGTGGCGAGCatgaa
tttgcttattgataactggattcctgtacgcccgcgaaacggggggaaag
tccaaatcataaatctgcaatcgctatactgcagtagagatcagtggcga
ttaagtttgccccgtgacgatatggaactggccgctttagcactgctggt
ttgcattgggcaaattatcgccccggcaaaagatgacgttgaatttcgac
atcgcataatgaatccgctcactgaagatgagtttcaacaactcatcgcg
ccgtggatagatatgttctaccttaatcacgcagaacatccctttatgca
gaccaaaggtgtcaaagcaaatgatgtgactccaatggaaaaactgttgg
ctggggtaagcggcgcgacgaattgtgcatttgtcaatcaaccggggcag
ggtgaagcattatgtggtggatgcactgcgattgcgttattcaaccaggc
gaatcaggcaccaggttttggtggtggttttaaaagcggtttacgtggag
gaacacctgtaacaacgttcgtacgtgggatcgatcttcgttcaacggtg
ttactcaatgtcctcacattacctcgtcttcaaaaacaatttcctaatga
atcacatacggaaaaccaacctacctggattaaacctatcaagtccaatg
agtctatacctgcttcgtcaattgggtttgtccgtggtctattctggcaa
ccagcgcatattgaattatgcgatcccattgggattggtaaatgttcttg
ctgtggacaggaaagcaatttgcgttataccggttttcttaaggaaaaat
ttacctttacagttaatgggctatggccccatccgcattccccttgtctg
gtaacagtcaagaaaggggaggttgaggaaaaatttcttgctttcaccac
ctccgcaccatcatggacacaaatcagccgagttgtggtagataagatta
ttcaaaatgaaaatggaaatcgcgtggcggcggttgtgaatcaattcaga
aatattgcgccgcaaagtcctcttgaattgattatggggggatatcgtaa
taatcaagcatctattcttgaacggcgtcatgatgtgttgatgtttaatc
aggggtggcaacaatacggcaatgtgataaacgaaatagtgactgttggt
ttgggatataaaacagccttacgcaaggcgttatatacctttgcagaagg
gtttaaaaataaagacttcaaaggggccggagtctctgttcatgagactg
cagaaaggcatttctatcgacagagtgaattattaattcccgatgtactg
gcgaatgttaatttttcccaggctgatgaggtaatagctgatttacgaga
caaacttcatcaattgtgtgaaatgctatttaatcaatctgtagctccct
atgcacatcatcctaaattaataagcacattagcgcttgcccgcgccacg
ctatacaaacatttacgggagttaaaaccgcaaggagggccatcaaatgg
ctga
>protein sequence of FokI-(Sharkey-ELD)-Cse1 [SEQ ID NO: 19]
MAQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNPTQDRILEMKVMEF
FMKVYGYRGEHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQ
ADEMERYVEENQTRDKHLNPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQ
LTRLNHITNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINFAD
PTNRAKGLEAVSVASMNLLIDNWIPVRPRNGGKVQIINLQSLYCSRDQWR
LSLPRDDMELAALALLVCIGQIIAPAKDDVEFRHRIMNPLTEDEFQQLIA
PWIDMFYLNHAEHPFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQ
GEALCGGCTAIALFNQANQAPGFGGGFKSGLRGGTPVTTFVRGIDLRSTV
LLNVLTLPRLQKQFPNESHTENQPTWIKPIKSNESIPASSIGFVRGLFWQ
PAHIELCDPIGIGKCSCCGQESNLRYTGFLKEKFTFTVNGLWPHPHSPCL
VTVKKGEVEEKFLAFTTSAPSWTQISRVVVDKIIQNENGNRVAAVVNQFR
NIAPQSPLELIMGGYRNNQASILERRHDVLMFNQGWQQYGNVINEIVTVG
LGYKTALRKALYTFAEGFKNKDFKGAGVSVHETAERHFYRQSELLIPDVL
ANVNFSQADEVIADLRDKLHQLCEMLFNQSVAPYAHHPKLISTLALARAT
LYKHLRELKPQGGPSNG*
>nucleotide sequence of FokI-(Sharkey-KKR)-Cse1 [SEQ ID NO: 20]
ATGGCTCAACTGGTTAAAAGCGAACTGGAAGAGAAAAAAAGTGAACTGCG
CCACAAACTGAAATATGTGCCGCATGAATATATCGAGCTGATTGAAATTG
CACGTAATCCGACCCAGGATCGTATTCTGGAAATGAAAGTGATGGAATTT
TTTATGAAAGTGTACGGCTATCGCGGTGAACATCTGGGTGGTAGCCGTAA
ACCGGATGGTGCAATTTATACCGTTGGTAGCCCGATTGATTATGGTGTTA
TTGTTGATACCAAAGCCTATAGCGGTGGTTATAATCTGCCGATTGGTCAG
GCAGATGAAATGCAGCGTTATGTGAAAGAAAATCAGACCCGCAACAAACA
TATTAACCCGAATGAATGGTGGAAAGTTTATCCGAGCAGCGTTACCGAGT
TTAAATTCCTGTTTGTTAGCGGTCACTTCAAAGGCAACTATAAAGCACAG
CTGACCCGTCTGAATCGTAAAACCAATTGTAATGGTGCAGTTCTGAGCGT
TGAAGAACTGCTGATTGGTGGTGAAATGATTAAAGCAGGCACCCTGACCC
TGGAAGAAGTTCGTCGCAAATTTAACAATGGCGAAATCAACTTTGCGGAT
CCCACCAACCGCGCGAAAGGCCTGGAAGCGGTGAGCGTGGCGAGCatgaa
tttgcttattgataactggattcctgtacgcccgcgaaacggggggaaag
tccaaatcataaatctgcaatcgctatactgcagtagagatcagtggcga
ttaagtttgccccgtgacgatatggaactggccgctttagcactgctggt
ttgcattgggcaaattatcgccccggcaaaagatgacgttgaatttcgac
atcgcataatgaatccgctcactgaagatgagtttcaacaactcatcgcg
ccgtggatagatatgttctaccttaatcacgcagaacatccctttatgca
gaccaaaggtgtcaaagcaaatgatgtgactccaatggaaaaactgttgg
ctggggtaagcggcgcgacgaattgtgcatttgtcaatcaaccggggcag
ggtgaagcattatgtggtggatgcactgcgattgcgttattcaaccaggc
gaatcaggcaccaggttttggtggtggttttaaaagcggtttacgtggag
gaacacctgtaacaacgttcgtacgtgggatcgatcttcgttcaacggtg
ttactcaatgtcctcacattacctcgtcttcaaaaacaatttcctaatga
atcacatacggaaaaccaacctacctggattaaacctatcaagtccaatg
agtctatacctgcttcgtcaattgggtttgtccgtggtctattctggcaa
ccagcgcatattgaattatgcgatcccattgggattggtaaatgttcttg
ctgtggacaggaaagcaatttgcgttataccggttttcttaaggaaaaat
ttacctttacagttaatgggctatggccccatccgcattccccttgtctg
gtaacagtcaagaaaggggaggttgaggaaaaatttcttgctttcaccac
ctccgcaccatcatggacacaaatcagccgagttgtggtagataagatta
ttcaaaatgaaaatggaaatcgcgtggcggcggttgtgaatcaattcaga
aatattgcgccgcaaagtcctcttgaattgattatggggggatatcgtaa
taatcaagcatctattcttgaacggcgtcatgatgtgttgatgtttaatc
aggggtggcaacaatacggcaatgtgataaacgaaatagtgactgttggt
ttgggatataaaacagccttacgcaaggcgttatatacctttgcagaagg
gtttaaaaataaagacttcaaaggggccggagtctctgttcatgagactg
cagaaaggcatttctatcgacagagtgaattattaattcccgatgtactg
gcgaatgttaatttttcccaggctgatgaggtaatagctgatttacgaga
caaacttcatcaattgtgtgaaatgctatttaatcaatctgtagctccct
atgcacatcatcctaaattaataagcacattagcgcttgcccgcgccacg
ctatacaaacatttacgggagttaaaaccgcaaggagggccatcaaatgg
ctga
>protein sequence of FokI-(Sharkey-KKR)-Cse1 [SEQ ID NO: 21]
MAQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNPTQDRILEMKVMEF
FMKVYGYRGEHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQ
ADEMQRYVKENQTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQ
LTRLNRKTNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINFAD
PTNRAKGLEAVSVASMNLLIDNWIPVRPRNGGKVQIINLQSLYCSRDQWR
LSLPRDDMELAALALLVCIGQIIAPAKDDVEFRHRIMNPLTEDEFQQLIA
PWIDMFYLNHAEHPFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQ
GEALCGGCTAIALFNQANQAPGFGGGFKSGLRGGTPVTTFVRGIDLRSTV
LLNVLTLPRLQKQFPNESHTENQPTWIKPIKSNESIPASSIGFVRGLFWQ
PAHIELCDPIGIGKCSCCGQESNLRYTGFLKEKFTFTVNGLWPHPHSPCL
VTVKKGEVEEKFLAFTTSAPSWTQISRVVVDKIIQNENGNRVAAVVNQFR
NIAPQSPLELIMGGYRNNQASILERRHDVLMFNQGWQQYGNVINEIVTVG
LGYKTALRKALYTFAEGFKNKDFKGAGVSVHETAERHFYRQSELLIPDVL
ANVNFSQADEVIADLRDKLHQLCEMLFNQSVAPYAHHPKLISTLALARAT
LYKHLRELKPQGGPSNG*
>nucleotide sequence of His.sub.6-Dual-monopartite NLS SV40-FokI-(Sharkey-KKR)-Cse1 [SEQ ID NO: 22]
ATGcatcaccatcatcaccacCCGAAAAAAAAGCGCAAAGTGGATCCGAA
GAAAAAACGTAAAGTTGAAGATCCGAAAGACATGGCTCAACTGGTTAAAA
GCGAACTGGAAGAGAAAAAAAGTGAACTGCGCCACAAACTGAAATATGTG
CCGCATGAATATATCGAGCTGATTGAAATTGCACGTAATCCGACCCAGGA
TCGTATTCTGGAAATGAAAGTGATGGAATTTTTTATGAAAGTGTACGGCT
ATCGCGGTGAACATCTGGGTGGTAGCCGTAAACCGGATGGTGCAATTTAT
ACCGTTGGTAGCCCGATTGATTATGGTGTTATTGTTGATACCAAAGCCTA
TAGCGGTGGTTATAATCTGCCGATTGGTCAGGCAGATGAAATGCAGCGTT
ATGTGAAAGAAAATCAGACCCGCAACAAACATATTAACCCGAATGAATGG
TGGAAAGTTTATCCGAGCAGCGTTACCGAGTTTAAATTCCTGTTTGTTAG
CGGTCACTTCAAAGGCAACTATAAAGCACAGCTGACCCGTCTGAATCGTA
AAACCAATTGTAATGGTGCAGTTCTGAGCGTTGAAGAACTGCTGATTGGT
GGTGAAATGATTAAAGCAGGCACCCTGACCCTGGAAGAAGTTCGTCGCAA
ATTTAACAATGGCGAAATCAACTTTGCGGATCCCACCAACCGCGCGAAAG
GCCTGGAAGCGGTGAGCGTGGCGAGCatgaatttgcttattgataactgg
attcctgtacgcccgcgaaacggggggaaagtccaaatcataaatctgca
atcgctatactgcagtagagatcagtggcgattaagtttgccccgtgacg
atatggaactggccgctttagcactgctggtttgcattgggcaaattatc
gccccggcaaaagatgacgttgaatttcgacatcgcataatgaatccgct
cactgaagatgagtttcaacaactcatcgcgccgtggatagatatgttct
accttaatcacgcagaacatccctttatgcagaccaaaggtgtcaaagca
aatgatgtgactccaatggaaaaactgttggctggggtaagcggcgcgac
gaattgtgcatttgtcaatcaaccggggcagggtgaagcattatgtggtg
gatgcactgcgattgcgttattcaaccaggcgaatcaggcaccaggtttt
ggtggtggttttaaaagcggtttacgtggaggaacacctgtaacaacgtt
cgtacgtgggatcgatcttcgttcaacggtgttactcaatgtcctcacat
tacctcgtcttcaaaaacaatttcctaatgaatcacatacggaaaaccaa
cctacctggattaaacctatcaagtccaatgagtctatacctgcttcgtc
aattgggtttgtccgtggtctattctggcaaccagcgcatattgaattat
gcgatcccattgggattggtaaatgttcttgctgtggacaggaaagcaat
ttgcgttataccggttttcttaaggaaaaatttacctttacagttaatgg
gctatggccccatccgcattccccttgtctggtaacagtcaagaaagggg
aggttgaggaaaaatttcttgctttcaccacctccgcaccatcatggaca
caaatcagccgagttgtggtagataagattattcaaaatgaaaatggaaa
tcgcgtggcggcggttgtgaatcaattcagaaatattgcgccgcaaagtc
ctcttgaattgattatggggggatatcgtaataatcaagcatctattctt
gaacggcgtcatgatgtgttgatgtttaatcaggggtggcaacaatacgg
caatgtgataaacgaaatagtgactgttggtttgggatataaaacagcct
tacgcaaggcgttatatacctttgcagaagggtttaaaaataaagacttc
aaaggggccggagtctctgttcatgagactgcagaaaggcatttctatcg
acagagtgaattattaattcccgatgtactggcgaatgttaatttttccc
aggctgatgaggtaatagctgatttacgagacaaacttcatcaattgtgt
gaaatgctatttaatcaatctgtagctccctatgcacatcatcctaaatt
aataagcacattagcgcttgcccgcgccacgctatacaaacatttacggg
agttaaaaccgcaaggagggccatcaaatggctga
>protein sequence of His.sub.6-Dual-monopartite NLS SV40-FokI-(Sharkey-KKR)-Cse1 [SEQ ID NO: 23]
MHHHHHHPKKKRKVDPKKKRKVEDPKDMAQLVKSELEEKKSELRHKLKYV
PHEYIELIEIARNPTQDRILEMKVMEFFMKVYGYRGEHLGGSRKPDGAIY
TVGSPIDYGVIVDTKAYSGGYNLPIGQADEMQRYVKENQTRNKHINPNEW
WKVYPSSVTEFKFLFVSGHFKGNYKAQLTRLNRKTNCNGAVLSVEELLIG
GEMIKAGTLTLEEVRRKFNNGEINFADPTNRAKGLEAVSVASMNLLIDNW
IPVRPRNGGKVQIINLQSLYCSRDQWRLSLPRDDMELAALALLVCIGQII
APAKDDVEFRHRIMNPLTEDEFQQLIAPWIDMFYLNHAEHPFMQTKGVKA
NDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCTAIALFNQANQAPGF
GGGFKSGLRGGTPVTTFVRGIDLRSTVLLNVLTLPRLQKQFPNESHTENQ
PTWIKPIKSNESIPASSIGFVRGLFWQPAHIELCDPIGIGKCSCCGQESN
LRYTGFLKEKFTFTVNGLWPHPHSPCLVTVKKGEVEEKFLAFTTSAPSWT
QISRVVVDKIIQNENGNRVAAVVNQFRNIAPQSPLELIMGGYRNNQASIL
ERRHDVLMFNQGWQQYGNVINEIVTVGLGYKTALRKALYTFAEGFKNKDF
KGAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADLRDKLHQLC
EMLFNQSVAPYAHHPKLISTLALARATLYKHLRELKPQGGPSNG*
>nucleotide sequence of His.sub.6-Dual-monopartite NLS SV40-FokI-(Sharkey-ELD)-Cse1 [SEQ ID NO: 24]
ATGcatcaccatcatcaccacCCGAAAAAAAAGCGCAAAGTGGATCCGAA
GAAAAAACGTAAAGTTGAAGATCCGAAAGACATGGCTCAACTGGTTAAAA
GCGAACTGGAAGAGAAAAAAAGTGAACTGCGCCACAAACTGAAATATGTG
CCGCATGAATATATCGAGCTGATTGAAATTGCACGTAATCCGACCCAGGA
TCGTATTCTGGAAATGAAAGTGATGGAATTTTTTATGAAAGTGTACGGCT
ATCGCGGTGAACATCTGGGTGGTAGCCGTAAACCGGATGGTGCAATTTAT
ACCGTTGGTAGCCCGATTGATTATGGTGTTATTGTTGATACCAAAGCCTA
TAGCGGTGGTTATAATCTGCCGATTGGTCAGGCAGATGAAATGGAACGTT
ATGTGGAAGAAAATCAGACCCGTGATAAACATCTGAATCCGAATGAATGG
TGGAAAGTTTATCCGAGCAGCGTTACCGAGTTTAAATTCCTGTTTGTTAG
CGGTCACTTCAAAGGCAACTATAAAGCACAGCTGACCCGTCTGAATCATA
TTACCAATTGTAATGGTGCAGTTCTGAGCGTTGAAGAACTGCTGATTGGT
GGTGAAATGATTAAAGCAGGCACCCTGACCCTGGAAGAAGTTCGTCGCAA
ATTTAACAATGGCGAAATCAACTTTGCGGATCCCACCAACCGCGCGAAAG
GCCTGGAAGCGGTGAGCGTGGCGAGCatgaatttgcttattgataactgg
attcctgtacgcccgcgaaacggggggaaagtccaaatcataaatctgca
atcgctatactgcagtagagatcagtggcgattaagtttgccccgtgacg
atatggaactggccgctttagcactgctggtttgcattgggcaaattatc
gccccggcaaaagatgacgttgaatttcgacatcgcataatgaatccgct
cactgaagatgagtttcaacaactcatcgcgccgtggatagatatgttct
accttaatcacgcagaacatccctttatgcagaccaaaggtgtcaaagca
aatgatgtgactccaatggaaaaactgttggctggggtaagcggcgcgac
gaattgtgcatttgtcaatcaaccggggcagggtgaagcattatgtggtg
gatgcactgcgattgcgttattcaaccaggcgaatcaggcaccaggtttt
ggtggtggttttaaaagcggtttacgtggaggaacacctgtaacaacgtt
cgtacgtgggatcgatcttcgttcaacggtgttactcaatgtcctcacat
tacctcgtcttcaaaaacaatttcctaatgaatcacatacggaaaaccaa
cctacctggattaaacctatcaagtccaatgagtctatacctgatcgtca
attgggtttgtccgtggtctattctggcaaccagcgcatattgaattatg
cgatcccattgggattggtaaatgttcttgctgtggacaggaaagcaatt
tgcgttataccggttttcttaaggaaaaatttacctttacagttaatggg
ctatggccccatccgcattccccttgtctggtaacagtcaagaaagggga
ggttgaggaaaaatttcttgctttcaccacctccgcaccatcatggacac
aaatcagccgagttgtggtagataagattattcaaaatgaaaatggaaat
cgcgtggcggcggttgtgaatcaattcagaaatattgcgccgcaaagtcc
tcttgaattgattatggggggatatcgtaataatcaagcatctattcttg
aacggcgtcatgatgtgttgatgtttaatcaggggtggcaacaatacggc
aatgtgataaacgaaatagtgactgttggtttgggatataaaacagcctt
acgcaaggcgttatatacctttgcagaagggtttaaaaataaagacttca
aaggggccggagtctctgttcatgagactgcagaaaggcatttctatcga
cagagtgaattattaattcccgatgtactggcgaatgttaatttttccca
ggctgatgaggtaatagctgatttacgagacaaacttcatcaattgtgtg
aaatgctatttaatcaatctgtagctccctatgcacatcatcctaaatta
ataagcacattagcgcttgcccgcgccacgctatacaaacatttacggga
gttaaaaccgcaaggagggccatcaaatggctga
>protein sequence of His.sub.6-Dual-monopartite NLS SV40-FokI-(Sharkey-ELD)-Cse1 [SEQ ID NO: 25]
MHHHHHHPKKKRKVDPKKKRKVEDPKDMAQLVKSELEEKKSELRHKLKYV
PHEYIELIEIARNPTQDRILEMKVMEFFMKVYGYRGEHLGGSRKPDGAIY
TVGSPIDYGVIVDTKAYSGGYNLPIGQADEMERYVEENQTRDKHLNPNEW
WKVYPSSVTEFKFLFVSGHFKGNYKAQLTRLNHITNCNGAVLSVEELLIG
GEMIKAGTLTLEEVRRKFNNGEINFADPTNRAKGLEAVSVASMNLLIDNW
IPVRPRNGGKVQIINLQSLYCSRDQWRLSLPRDDMELAALALLVCIGQII
APAKDDVEFRHRIMNPLTEDEFQQLIAPWIDMFYLNHAEHPFMQTKGVKA
NDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCTAIALFNQANQAPGF
GGGFKSGLRGGTPVTTFVRGIDLRSTVLLNVLTLPRLQKQFPNESHTENQ
PTWIKPIKSNESIPASSIGFVRGLFWQPAHIELCDPIGIGKCSCCGQESN
LRYTGFLKEKFTFTVNGLWPHPHSPCLVTVKKGEVEEKFLAFTTSAPSWT
QISRVVVDKIIQNENGNRVAAVVNQFRNIAPQSPLELIMGGYRNNQASIL
ERRHDVLMFNQGWQQYGNVINEIVTVGLGYKTALRKALYTFAEGFKNKDF
KGAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADLRDKLHQLC
EMLFNQSVAPYAHHPKLISTLALARATLYKHLRELKPQGGPSNG*
DNA Cleavage Assay
[0150] The specificity and activity of the complexes were tested
using an artificially constructed target plasmid as a substrate.
This plasmid contains M13 and P7 binding sites on opposing strands
such that both FokI domains face each other (see FIG. 11). The
distance between the Cascade binding sites varies between 25 and 50
base pairs in 5 bp increments. Because the Cascade binding sites
need to be flanked by one of four known PAM sequences
(5'-protospacer-CTT/CAT/CTC/CCT-3'), this distance range gives
sufficient flexibility to design such a pair for almost any given
sequence.
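The design constraint above can be sketched as a small search. The helper below is hypothetical and not part of the source. It scans the top strand for a PAM in the 5'-protospacer-PAM-3' orientation used here. It also scans the top strand for the reverse complement of a PAM, which marks a PAM on the bottom strand. It then keeps pairs whose PAMs face each other with 25 to 50 bp between them. This is how the spacing is counted in the target plasmids (SEQ ID NOs: 26-31). The 32 nt protospacer length is an assumption that matches the spacers of the synthetic CRISPR arrays in this section.

```python
"""Hypothetical sketch: enumerate facing Cascade binding-site pairs."""

# PAMs as written in this document: 5'-protospacer-PAM-3'
PAMS = ("CTT", "CAT", "CTC", "CCT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


# A PAM on the bottom strand appears on the top strand as its reverse complement
RC_PAMS = tuple(revcomp(p) for p in PAMS)  # ("AAG", "ATG", "GAG", "AGG")


def find_site_pairs(seq: str, min_gap: int = 25, max_gap: int = 50,
                    spacer_len: int = 32) -> list[tuple[int, int, int]]:
    """Return (top_pam_start, bottom_pam_start, gap) tuples.

    top_pam_start: 0-based index of a top-strand PAM with room for a
    spacer_len protospacer immediately 5' of it.
    bottom_pam_start: 0-based top-strand index of a bottom-strand PAM with
    room for a protospacer 3' of it (in top-strand coordinates).
    gap: base pairs strictly between the two PAMs.
    """
    seq = seq.upper()
    top = [i for i in range(spacer_len, len(seq) - 2)
           if seq[i:i + 3] in PAMS]
    bottom = [j for j in range(len(seq) - 2 - spacer_len)
              if seq[j:j + 3] in RC_PAMS]
    pairs = []
    for i in top:
        for j in bottom:
            gap = j - (i + 3)  # bases between the end of one PAM and the other
            if min_gap <= gap <= max_gap:
                pairs.append((i, j, gap))
    return pairs
```

Applied to the 25 bp target plasmid (SEQ ID NO: 31), the results include (56, 84, 25). This is the CTT at index 56 facing the AAG at index 84. Calling it with min_gap=30 and max_gap=40 narrows the search to the spacings that produced the most double-strand breaks in the in vitro assay described in this section.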
[0151] The sequences of the target plasmids used are as follows.
The number in each header indicates the distance between the M13
and P7 target sites. Protospacers are shown in bold, PAMs underlined:
TABLE-US-00007
>50 bp [SEQ ID NO: 26]
gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAA
TACCGTCTTGCTTTCGAGCGCTAGCTCTAGAACTAGTCCTCAGCCTAGGC
CTCGTTCCGAAGCTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGC
GGCCTTTAACTCggatcc
>45 bp [SEQ ID NO: 27]
gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAA
TACCGTCTTTTCGAGCGCTAGCTCTAGAACTAGTCCTCAGCCTAGGCCTC
GTTCAAGCTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGCGGCCT
TTAACTCggatcc
>40 bp [SEQ ID NO: 28]
gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAA
TACCGTCTTCGAGCGCTAGCTCTAGAACTAGTCCTCAGCCTAGGCCTCGA
AGCTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGCGGCCTTTAAC
TCggatcc
>35 bp [SEQ ID NO: 29]
gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAA
TACCGTCTTGCGCTAGCTCTAGAACTAGTCCTCAGCCTAGGCCTAAGCTG
TCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGCGGCCTTTAACTCgga
tcc
>30 bp [SEQ ID NO: 30]
gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAA
TACCGTCTTGCTAGCTCTAGAACTAGTCCTCAGCCTAGGAAGCTGTCTTT
CGCTGCTGAGGGTGACGATCCCGCATAGGCGGCCTTTAACTCggatcc
>25 bp [SEQ ID NO: 31]
gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAA
TACCGTCTTCTCTAGAACTAGTCCTCAGCCTAGGAAGCTGTCTTTCGCTG
CTGAGGGTGACGATCCCGCATAGGCGGCCTTTAACTCggatcc
[0153] Cleavage of the target plasmids was analysed on agarose
gels, on which negatively supercoiled (nSC) plasmid can be
distinguished from linearized or nicked plasmid. The cleavage site
of the Cascade.sup.KKR/ELD pair in a target vector was determined
by isolating linear cleavage products from an agarose gel and
filling in the recessed 3' ends left by FokI cleavage with the
Klenow fragment of E. coli DNA polymerase I to create blunt ends.
The linear vector was self-ligated, transformed, amplified, isolated
and sequenced. Filling in the recessed 3' ends and re-ligating
introduces extra nucleotides into the sequence, corresponding to
the overhang left by FokI cleavage. By aligning the sequence reads
to the original sequence, the cleavage sites can be mapped for each
individual clone. Below, the additional bases incorporated into the
sequence after filling in the recessed 3' ends left by FokI cleavage
are underlined:
##STR00001##
[0154] Reading from top to bottom, the 5'-3' sequences above are
SEQ ID NOs: 32-35, respectively.
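The expected effect of the fill-in and re-ligation step can be checked with a toy model. This sketch is illustrative and not from the source. It treats the cut site as lying on a linear window of the plasmid and ignores circularity. FokI is modelled as cutting the top strand before index `top_cut` and the bottom strand `overhang` nucleotides further along. This leaves identical 5' overhangs on both ends. Klenow fill-in copies that overhang onto both blunt ends, so self-ligation duplicates it in tandem. The second hypothetical function recovers the inserted bases by comparing a clone with the original.

```python
def fill_in_and_religate(window: str, top_cut: int, overhang: int) -> str:
    """Toy model of a staggered cut, Klenow fill-in and blunt self-ligation.

    The top strand is cut between window[top_cut - 1] and window[top_cut];
    the bottom strand is cut `overhang` nt further along (top-strand
    coordinates), leaving 5' overhangs window[top_cut:top_cut + overhang]
    on both ends. After fill-in both blunt ends carry these bases, so
    re-ligation produces them twice in tandem.
    """
    return window[:top_cut + overhang] + window[top_cut:]


def inserted_bases(original: str, clone: str) -> str:
    """Return the extra bases in `clone` relative to `original`.

    Uses the longest common prefix; in locally repetitive sequence the
    reported bases may be a shifted (rotated) copy of the true overhang.
    """
    extra = len(clone) - len(original)
    p = 0
    while p < len(original) and original[p] == clone[p]:
        p += 1
    return clone[p:p + extra]
```

For example, `fill_in_and_religate("AAAACCCCGGGGTTTT", 6, 4)` gives `"AAAACCCCGGCCGGGGTTTT"`. Aligning that clone back to the original recovers the 4 nt overhang `"CCGG"`.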
Cleavage of a Target Locus in Human Cells
[0155] The human CCR5 gene encodes the C-C chemokine receptor type
5 protein, which serves as a co-receptor for the human
immunodeficiency virus (HIV) on the surface of white blood cells.
The CCR5 gene, as well as an artificial GFP locus, is targeted
using a pair of Cascade.sup.KKR/ELD nucleases. A suitable binding
site pair is selected in the coding region of CCR5. Two separate
CRISPR arrays, each containing identical spacers targeting one of
the two binding sites, are constructed by DNA synthesis (Geneart).
[0156] The human CCR5 target gene selection and CRISPR designs used
are as follows:
TABLE-US-00008
>Part of the genomic human CCR5 sequence, containing the whole ORF (positions 347-1446) [SEQ ID NO: 36]
GGTGGAACAAGATGGATTATCAAGTGTCAAGTCCAATCTATGACATCAAT
TATTATACATCGGAGCCCTGCCAAAAAATCAATGTGAAGCAAATCGCAGC
CCGCCTCCTGCCTCCGCTCTACTCACTGGTGTTCATCTTTGGTTTTGTGG
GCAACATGCTGGTCATCCTCATCCTGATAAACTGCAAAAGGCTGAAGAGC
ATGACTGACATCTACCTGCTCAACCTGGCCATCTCTGACCTGTTTTTCCT
TCTTACTGTCCCCTTCTGGGCTCACTATGCTGCCGCCCAGTGGGACTTTG
GAAATACAATGTGTCAACTCTTGACAGGGCTCTATTTTATAGGCTTCTTC
TCTGGAATCTTCTTCATCATCCTCCTGACAATCGATAGGTACCTGGCTGT
CGTCCATGCTGTGTTTGCTTTAAAAGCCAGGACGGTCACCTTTGGGGTGG
TGACAAGTGTGATCACTTGGGTGGTGGCTGTGTTTGCGTCTCTCCCAGGA
ATCATCTTTACCAGATCTCAAAAAGAAGGTCTTCATTACACCTGCAGCTC
TCATTTTCCATACAGTCAGTATCAATTCTGGAAGAATTTCCAGACATTAA
AGATAGTCATCTTGGGGCTGGTCCTGCCGCTGCTTGTCATGGTCATCTGC
TACTCGGGAATCCTAAAAACTCTGCTTCGGTGTCGAAATGAGAAGAAGAG
GCACAGGGCTGTGAGGCTTATCTTCACCATCATGATTGTTTATTTTCTCT
TCTGGGCTCCCTACAACATTGTCCTTCTCCTGAACACCTTCCAGGAATTC
TTTGGCCTGAATAATTGCAGTAGCTCTAACAGGTTGGACCAAGCTATGCA
GGTGACAGAGACTCTTGGGATGACGCACTGCTGCATCAACCCCATCATCT
ATGCCTTTGTCGGGGAGAAGTTCAGAAACTACCTCTTAGTCTTCTTCCAA
AAGCACATTGCCAAACGCTTCTGCAAATGCTGTTCTATTTTCCAGCAAGA
GGCTCCCGAGCGAGCAAGCTCAGTTTACACCCGATCCACTGGGGAGCAGG
AAATATCTGTGGGCTTGTGACACGGACTCAAGTGGGCTGGTGACCCAGTC
[0157] Red1/2: chosen target sites (distance: 34 bp, PAM
5'-CTT-3'). Red1 is the first underlined sequence above, and Red2
is the second.
TABLE-US-00009
>CRISPR array red1 (italics: spacers; bold: repeats) [SEQ ID NO: 37]
ccatggTAATACGACTCACTATAGGGAGAATTAGCTGATCTTTAATAATA
AGGAAATGTTACATTAAGGTTGGTGGGTTGTTTTTATGGGAAAAAATGCT
TTAAGAACAAATGTATACTTTTAGAGAGTTCCCCGCGCCAGCGGGGATAA
ACCGCAAACACAGCATGGACGACAGCCAGGTACCTAGAGTTCCCCGCGCC
AGCGGGGATAAACCGCAAACACAGCATGGACGACAGCCAGGTACCTAGAG
TTCCCCGCGCCAGCGGGGATAAACCGCAAACACAGCATGGACGACAGCCA
GGTACCTAGAGTTCCCCGCGCCAGCGGGGATAAACCGAAAACAAAAGGCT
CAGTCGGAAGACTGGGCCTTTTGTTTTAACCCCTTGGGGCCTCTAAACGG
GTCTTGAGGGGTTTTTTGggtacc
>CRISPR array red2 (italics: spacers; bold: repeats) [SEQ ID NO: 38]
ccatggTAATACGACTCACTATAGGGAGAATTAGCTGATCTTTAATAATA
AGGAAATGTTACATTAAGGTTGGTGGGTTGTTTTTATGGGAAAAAATGCT
TTAAGAACAAATGTATACTTTTAGAGAGTTCCCCGCGCCAGCGGGGATAA
ACCGTGTGATCACTTGGGTGGTGGCTGTGTTTGCGTGAGTTCCCCGCGCC
AGCGGGGATAAACCGTGTGATCACTTGGGTGGTGGCTGTGTTTGCGTGAG
TTCCCCGCGCCAGCGGGGATAAACCGTGTGATCACTTGGGTGGTGGCTGT
GTTTGCGTGAGTTCCCCGCGCCAGCGGGGATAAACCGAAAACAAAAGGCT
CAGTCGGAAGACTGGGCCTTTTGTTTTAACCCCTTGGGGCCTCTAAACGG
GTCTTGAGGGGTTTTTTGggtacc
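The Red1/Red2 target choice can be cross-checked against the spacers in the two arrays. The sketch below is hypothetical and follows this document's convention: the crRNA spacer is complementary to a protospacer written 5'-protospacer-PAM-3'. It searches both strands of a sequence for that protospacer and reports the PAM next to each hit. The spacers were read from the arrays as the 32 nt between consecutive 29 nt repeats (GAGTTCCCCGCGCCAGCGGGGATAAACCG).

```python
"""Hypothetical check: locate crRNA targets and their PAMs in a sequence."""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_targets(seq: str, spacer: str) -> list[tuple[str, int, str]]:
    """Return (strand, top_index, pam) for every target of `spacer`.

    Assumed convention: the protospacer is the reverse complement of the
    spacer and is followed 3' by the PAM on its own strand.
    "+": protospacer on the top strand starting at top_index, PAM 3' of it.
    "-": protospacer on the bottom strand, so the spacer sequence itself
         appears on the top strand at top_index; the PAM (read on the
         bottom strand) lies just 5' of it in top-strand coordinates.
    """
    seq, spacer = seq.upper(), spacer.upper()
    proto = revcomp(spacer)
    hits = []
    i = seq.find(proto)
    while i != -1:
        end = i + len(proto)
        hits.append(("+", i, seq[end:end + 3]))
        i = seq.find(proto, i + 1)
    j = seq.find(spacer)
    while j != -1:
        hits.append(("-", j, revcomp(seq[max(j - 3, 0):j])))
        j = seq.find(spacer, j + 1)
    return hits
```

On the three lines of SEQ ID NO: 36 that contain both sites, the red1 spacer CAAACACAGCATGGACGACAGCCAGGTACCTA gives a top-strand hit at index 35 followed by CTT. The red2 spacer TGTGATCACTTGGGTGGTGGCTGTGTTTGCGT gives a bottom-strand hit at index 107 with PAM CTT. The red1 PAM ends at index 69 and the red2 PAM starts at index 104, which leaves the stated 34 bp between them.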
Delivery of Cascade.sup.KKR/ELD into the Nucleus of Human Cells
[0158] Cascade is very stable as a multi-subunit protein-RNA
complex and is easily produced in mg quantities in E. coli.
Transfection or micro-injection of the intact complex, as purified
from E. coli, is used for delivery (see FIG. 12). As shown in FIG.
12, Cascade-FokI nucleases are purified from E. coli and
encapsulated in protein transfection vesicles. These vesicles then
fuse with the cell membrane of human HepG2 cells, releasing the
nucleases into the cytoplasm (step 2). The NLS sequences are then
recognized by importin proteins, which facilitate passage through
the nuclear pore (step 3). Cascade.sup.KKR (open rectangle) and
Cascade.sup.ELD (filled rectangle) then find and cleave their
target site (step 4), inducing DNA repair pathways that alter the
target site and lead to the desired changes. Cascade.sup.KKR/ELD
nucleases need to act only once, so they do not need to be encoded
on DNA that remains permanently in the cell.
[0159] To deliver Cascade into human cells, protein transfection
reagents from various suppliers, including Pierce, NEB, Fermentas
and Clontech, are used. These reagents were recently developed for
the delivery of antibodies and can transfect a broad range of human
cell lines with efficiencies of up to 90%. Human HepG2 cells are
transfected, as are other cell lines including CHO-K1, COS-7, HeLa
and non-embryonic stem cells.
[0160] To import the Cascade.sup.KKR/ELD nuclease pair into the
nucleus, a tandem monopartite nuclear localisation signal (NLS)
from the large T-antigen of simian virus 40 (SV40) is fused to the
N-terminus of FokI. Because the NLS sits at the N-terminus of the
FokI-Cse1 fusion, complexes that have lost FokI through proteolysis
also lack the NLS, so only intact Cascade.sup.ELD/KKR is imported
into the nucleus. (The nuclear pore complex translocates RNA
polymerases (550 kDa) and other large protein complexes.) Prior to
transfection, the nuclease activity of the Cascade.sup.KKR/ELD pair
is checked in vitro using purified complexes and CCR5 PCR
amplicons, to avoid transfecting non-productive nuclease pairs.
Surveyor Assay
[0161] Transfected cells are cultivated and passaged for several
days. The efficiency of in vivo target DNA cleavage is then
assessed using the Surveyor assay of Guschin, D. Y., et al. (2010)
Methods Mol. Biol., 649: 247-256. Briefly, PCR amplicons of the
target DNA locus are mixed 1:1 with PCR amplicons from untreated
cells. The mixture is heated and allowed to re-anneal, giving rise
to mismatches at target sites that have been erroneously repaired
by NHEJ. A mismatch nuclease is then used to cleave only mismatched
DNA molecules, giving a maximum of 50% cleavage when target DNA
cleavage by Cascade.sup.KKR/ELD is complete. This procedure is
followed by sequencing of the target DNA amplicons from treated
cells. The assay allows rapid assessment and optimization of the
delivery procedure.
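The 50% ceiling follows from a simple annealing model. The sketch below is a back-of-the-envelope assumption and is not part of the source or of the published Surveyor protocol. It assumes that strands re-anneal at random and that all NHEJ-mutated alleles form one class, fully mismatched with wild type. It also assumes the mismatch nuclease cleaves every heteroduplex. Let f be the fraction of treated alleles that are mutated. After a 1:1 mix with untreated amplicons, the mutant fraction in the pool is p = f/2, and the heteroduplex fraction is 2p(1 - p). That value reaches 0.5 at f = 1.

```python
import math


def expected_cleaved_fraction(f_mutated: float) -> float:
    """Heteroduplex (cleavable) fraction after 1:1 mixing with untreated DNA.

    Assumes random re-annealing, a single mutant class and complete
    cleavage of every heteroduplex by the mismatch nuclease.
    """
    p = f_mutated / 2  # mutant strand fraction in the 1:1 mixture
    return 2 * p * (1 - p)


def mutated_fraction_from_cleavage(cleaved: float) -> float:
    """Invert the model above; valid for 0 <= cleaved <= 0.5."""
    if not 0 <= cleaved <= 0.5:
        raise ValueError("this model predicts at most 50% cleavage")
    return 1 - math.sqrt(1 - 2 * cleaved)
```

Under these assumptions, 40% mutated alleles would give 2 x 0.2 x 0.8 = 32% cleavage. Complete target modification gives the 50% maximum.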
Production of Cascade-Nuclease Pairs
[0162] The Cascade-nuclease complexes were constructed as explained
above. Affinity purification from E. coli using the StrepII-tagged
Cse2 subunit yields a complex with the expected stoichiometry when
compared to native Cascade. FIG. 13 shows the stoichiometry of
native Cascade (1), Cascade.sup.KKR with P7 crRNA and
Cascade.sup.ELD with M13 crRNA, 24 h after purification using only
Strep-Tactin. The bands of native Cascade (1) are, from top to
bottom: Cse1, Cas7, Cas5, Cas6e, Cse2. The Cascade.sup.KKR/ELD
lanes show the FokI-Cse1 fusion band and an additional band
representing Cse1 with a small part of FokI, resulting from
proteolytic degradation.
[0163] Apart from intact FokI-Cse1 fusion protein, we observed
that a fraction of the FokI-Cse1 fusion protein is proteolytically
cleaved, resulting in a Cse1 protein with only the linker and a
small part of FokI attached to it (as confirmed by mass
spectrometry, data not shown). In most protein isolations the
fraction of degraded fusion protein is approximately 40%. The
isolated protein is stably stored at -20.degree. C. in the elution
buffer (20 mM HEPES pH 7.5, 75 mM NaCl, 1 mM DTT, 4 mM
desthiobiotin) supplemented with 0.1% Tween 20 and 50% glycerol.
Under these storage conditions, the integrity and activity of the
complex were found to be stable for at least three weeks (data not
shown).
Introduction of a His.sub.6-Tag and NLS to the Cascade-Nuclease
[0164] The Cascade nuclease fusion design was modified to
incorporate a nuclear localization signal (NLS) to enable transport
into the nucleus of eukaryotic cells. For this, a tandem
monopartite NLS from the large T-antigen of simian virus 40 (SV40)
(sequence: PKKKRKVDPKKKRKV) was translationally fused to the
N-terminus of the FokI-Cse1 fusion protein, directly preceded by an
N-terminal His.sub.6-tag (sequence: MHHHHHH). The His.sub.6-tag
allows an additional Ni.sup.2+-resin affinity purification step
after StrepII purification. This additional step ensures that only
full-length Cascade-nuclease fusion complex is isolated, and
increases the efficiency of cleavage by preventing non-intact
Cascade complexes from binding to the target site and forming
unproductive nuclease pairs.
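The benefit of the extra purification step can be estimated under a simple assumption. In the sketch below, which is an illustration and not from the source, each of the two binding sites is occupied independently. One site holds a KKR complex and the other an ELD complex, each drawn at random from its preparation. With a degraded fraction d of FokI-Cse1 in both preparations, only (1 - d)^2 of bound pairs carry two intact FokI domains. For the roughly 40% degradation reported after StrepII purification alone, this is 0.36. If the Ni2+ step removed every degraded complex, every bound pair would be productive.

```python
from typing import Optional


def productive_pair_fraction(degraded_kkr: float,
                             degraded_eld: Optional[float] = None) -> float:
    """Fraction of site pairs occupied by two intact FokI-Cse1 complexes.

    Assumes independent, random occupancy of the two sites by complexes
    from the KKR and ELD preparations (a simplification for illustration).
    """
    if degraded_eld is None:
        degraded_eld = degraded_kkr
    return (1 - degraded_kkr) * (1 - degraded_eld)
```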
In Vitro Cleavage Assay
[0165] Cascade.sup.KKR/ELD activity and specificity were assayed in
vitro as described above. FIG. 14A shows a typical cleavage assay.
Target plasmids with the binding sites separated by 25 to 50 bp in
5 bp increments (lanes 1-6) were incubated for 30 minutes at
37.degree. C. with Cascade.sup.KKR/ELD carrying anti-P7 and
anti-M13 crRNA, respectively. A plasmid with both binding sites
removed served as a negative control (lane 7). Lane 10 contains the
target plasmid in its three possible topologies: the lowest band
represents the initial, negatively supercoiled (nSC) form of the
plasmid, the middle band the linearized form (cleaved by XbaI), and
the upper band the open circular (OC) form (after nicking with
Nt.BbrCI). The original plasmid exists in negatively supercoiled
form (nSC, control lane 8), and nicked or linearized products are
clearly distinguishable. Upon incubation, a linear cleavage product
was formed when the binding sites were separated by 30, 35 and 40
bp (lanes 2, 3, 4). At 25, 45 and 50 bp (lanes 1, 5, 6), the target
plasmid appeared to be incompletely cleaved, yielding the nicked
(OC) form. Cleavage is therefore best at distances between 30 and
40 bp, which still gives sufficient flexibility when designing a
crRNA pair for any given locus; both shorter and longer distances
result in increased nicking and fewer DSBs. There is very little
activity on the plasmid from which both protospacers have been
removed (lane 7), showing target specificity.
Cleavage Conditions
[0166] To determine the optimal buffer conditions for cleavage
assays, and to estimate whether the complex is expected to be active
under physiological conditions, the following two buffers were
selected: (1) NEB4 (New England Biolabs; 50 mM potassium acetate,
20 mM Tris-acetate, 10 mM magnesium acetate, 1 mM dithiothreitol,
pH 7.9) and (2) Buffer 0 (Fermentas; 50 mM Tris-HCl, 10 mM
MgCl.sub.2, 100 mM NaCl, 0.1 mg/mL BSA, pH 7.5). NEB4 is recommended
for optimal activity of the commercial intact FokI enzyme, whereas
Buffer 0 was chosen from a quick screen as giving good activity and
specificity (data not shown). FIG. 14B shows incubations in these
buffers for different times. Lanes 1-4 were incubated in Fermentas
Buffer 0 (lanes 1, 2 for 15 minutes; lanes 3, 4 for 30 minutes),
and lanes 5, 6 in NEB4 (30 minutes). Lanes 1, 3 and 5 contain the
target plasmid with 35 bp spacing; lanes 2, 4 and 6 contain the
non-target control plasmid (no binding sites). Lanes 7 and 8 were
incubated with only Cascade.sup.KKR or only Cascade.sup.ELD,
respectively (Buffer 0). Lane 9 is the topology marker as in FIG.
14A. Lanes 10 and 11 show the target and non-target plasmids
incubated without Cascade. In NEB4 there was a high amount of
nonspecific nicking and less cleavage (lanes 5, 6), whereas in
Buffer 0 activity was seen only on the target plasmid, with a high
amount of specific cleavage and little nicking (lanes 1-4). The
difference is likely caused by the NaCl in Buffer 0: higher ionic
strength weakens protein-protein interactions, leading to less
nonspecific activity. Incubation for 15 or 30 minutes made little
difference for either the target or the non-target plasmid (lanes
1, 2 versus lanes 3, 4). As expected, adding only one type of
Cascade (P7.sup.KKR or M13.sup.ELD) does not result in cleavage
(lanes 7, 8). This experiment shows that a designed Cascade nuclease
pair cleaves specifically at 100 mM NaCl, which is close to the
physiological salt concentration (137 mM NaCl, as in
phosphate-buffered saline). The Cascade nuclease pair is therefore
expected to be active in vivo in eukaryotic cells while displaying
negligible off-target cleavage.
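The ionic-strength argument can be made concrete with I = 1/2 Σ c_i z_i². The sketch below counts only the salts listed for each buffer. It ignores the Tris buffer species, DTT and BSA, which is a simplifying assumption, since Tris protonation depends on pH and temperature. On this count Buffer 0 comes out at about 0.13 M and NEB4 at about 0.08 M, compared with 0.137 M for 137 mM NaCl alone.

```python
def ionic_strength(ions: dict[str, tuple[float, int]]) -> float:
    """I = 1/2 * sum(c_i * z_i**2), with concentrations in mol/L."""
    return 0.5 * sum(c * z * z for c, z in ions.values())


# Salts only; Tris species, DTT and BSA are deliberately ignored.
BUFFER_0 = {
    "Na+": (0.100, +1),
    "Cl-": (0.100 + 2 * 0.010, -1),   # from NaCl and MgCl2
    "Mg2+": (0.010, +2),
}
NEB4 = {
    "K+": (0.050, +1),
    "OAc-": (0.050 + 2 * 0.010, -1),  # from KOAc and Mg(OAc)2
    "Mg2+": (0.010, +2),
}
NACL_137 = {"Na+": (0.137, +1), "Cl-": (0.137, -1)}
```

By this rough measure, Buffer 0 is closer to the 137 mM NaCl reference than NEB4. NEB4 contains no NaCl, although its potassium acetate still adds some ionic strength.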
Cleavage Site
[0167] The site of cleavage in the target plasmid with a spacing of
35 bp (pTarget35) was determined. FIG. 15 shows the upstream and
downstream cleavage sites of Cascade.sup.KKR/ELD in pTarget35, as
revealed by sequencing. FIG. 15A shows the target region within
pTarget35 with the potential cleavage sites annotated; parts of the
protospacers are indicated in red and blue. In FIG. 15B, the bar
chart shows four different cleavage patterns and their relative
abundance among the sequenced clones. The blue bars represent the
generated overhang, and the left and right borders of each bar
represent the left and right cleavage sites (see FIG. 15A for the
annotation).
[0168] FIG. 15A shows the original sequence of pTarget35, with
cleavage sites numbered from -7 to +7, where 0 lies in the middle
between the two protospacers (indicated in red and blue). Seventeen
clones were sequenced, and all show cleavage around position 0,
creating overhangs of 3 to 5 nucleotides (see FIG. 15B). Overhangs
of 4 nucleotides are the most abundant (cumulatively 88%), while
overhangs of 3 and 5 nucleotides each occur only once (6% each).
Cleavage occurred exactly as expected, with no clones showing
off-target cleavage.
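The percentages in [0168] follow from the clone counts. Of the seventeen clones, fifteen have a 4 nt overhang and one each has a 3 nt and a 5 nt overhang, the split implied by the stated percentages. The short sketch below reproduces the 88% and 6% figures from that split.

```python
from collections import Counter

# Overhang length (nt) per sequenced clone, as implied by [0168]
clones = [4] * 15 + [3] + [5]


def overhang_percentages(overhangs: list[int]) -> dict[int, int]:
    """Percentage of clones per overhang length, rounded to whole percent."""
    counts = Counter(overhangs)
    total = len(overhangs)
    return {k: round(100 * n / total) for k, n in sorted(counts.items())}
```

Here `overhang_percentages(clones)` gives 6% for 3 nt, 88% for 4 nt and 6% for 5 nt overhangs.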
Cleaving a Target Locus in Human Cells
[0169] Cascade.sup.KKR/ELD nucleases were successfully modified to
contain an N-terminal His.sub.6-tag followed by a dual monopartite
nuclear localisation signal. These modified Cascade nuclease fusion
proteins were co-expressed with either of two synthetically
constructed CRISPR arrays, each targeting one binding site in the
human CCR5 gene. The activity of this new nuclease pair is first
validated in vitro on a plasmid containing this region of the CCR5
gene. The nuclease pair is then transfected into a human cell line,
e.g. HeLa. The efficiency of target cleavage is assessed using the
Surveyor assay as described above.
Sequence CWU 1
1
471502PRTEscherichia coli 1Met Asn Leu Leu Ile Asp Asn Trp Ile Pro
Val Arg Pro Arg Asn Gly 1 5 10 15 Gly Lys Val Gln Ile Ile Asn Leu
Gln Ser Leu Tyr Cys Ser Arg Asp 20 25 30 Gln Trp Arg Leu Ser Leu
Pro Arg Asp Asp Met Glu Leu Ala Ala Leu 35 40 45 Ala Leu Leu Val
Cys Ile Gly Gln Ile Ile Ala Pro Ala Lys Asp Asp 50 55 60 Val Glu
Phe Arg His Arg Ile Met Asn Pro Leu Thr Glu Asp Glu Phe 65 70 75 80
Gln Gln Leu Ile Ala Pro Trp Ile Asp Met Phe Tyr Leu Asn His Ala 85
90 95 Glu His Pro Phe Met Gln Thr Lys Gly Val Lys Ala Asn Asp Val
Thr 100 105 110 Pro Met Glu Lys Leu Leu Ala Gly Val Ser Gly Ala Thr
Asn Cys Ala 115 120 125 Phe Val Asn Gln Pro Gly Gln Gly Glu Ala Leu
Cys Gly Gly Cys Thr 130 135 140 Ala Ile Ala Leu Phe Asn Gln Ala Asn
Gln Ala Pro Gly Phe Gly Gly 145 150 155 160 Gly Phe Lys Ser Gly Leu
Arg Gly Gly Thr Pro Val Thr Thr Phe Val 165 170 175 Arg Gly Ile Asp
Leu Arg Ser Thr Val Leu Leu Asn Val Leu Thr Leu 180 185 190 Pro Arg
Leu Gln Lys Gln Phe Pro Asn Glu Ser His Thr Glu Asn Gln 195 200 205
Pro Thr Trp Ile Lys Pro Ile Lys Ser Asn Glu Ser Ile Pro Ala Ser 210
215 220 Ser Ile Gly Phe Val Arg Gly Leu Phe Trp Gln Pro Ala His Ile
Glu 225 230 235 240 Leu Cys Asp Pro Ile Gly Ile Gly Lys Cys Ser Cys
Cys Gly Gln Glu 245 250 255 Ser Asn Leu Arg Tyr Thr Gly Phe Leu Lys
Glu Lys Phe Thr Phe Thr 260 265 270 Val Asn Gly Leu Trp Pro His Pro
His Ser Pro Cys Leu Val Thr Val 275 280 285 Lys Lys Gly Glu Val Glu
Glu Lys Phe Leu Ala Phe Thr Thr Ser Ala 290 295 300 Pro Ser Trp Thr
Gln Ile Ser Arg Val Val Val Asp Lys Ile Ile Gln 305 310 315 320 Asn
Glu Asn Gly Asn Arg Val Ala Ala Val Val Asn Gln Phe Arg Asn 325 330
335 Ile Ala Pro Gln Ser Pro Leu Glu Leu Ile Met Gly Gly Tyr Arg Asn
340 345 350 Asn Gln Ala Ser Ile Leu Glu Arg Arg His Asp Val Leu Met
Phe Asn 355 360 365 Gln Gly Trp Gln Gln Tyr Gly Asn Val Ile Asn Glu
Ile Val Thr Val 370 375 380 Gly Leu Gly Tyr Lys Thr Ala Leu Arg Lys
Ala Leu Tyr Thr Phe Ala 385 390 395 400 Glu Gly Phe Lys Asn Lys Asp
Phe Lys Gly Ala Gly Val Ser Val His 405 410 415 Glu Thr Ala Glu Arg
His Phe Tyr Arg Gln Ser Glu Leu Leu Ile Pro 420 425 430 Asp Val Leu
Ala Asn Val Asn Phe Ser Gln Ala Asp Glu Val Ile Ala 435 440 445 Asp
Leu Arg Asp Lys Leu His Gln Leu Cys Glu Met Leu Phe Asn Gln 450 455
460 Ser Val Ala Pro Tyr Ala His His Pro Lys Leu Ile Ser Thr Leu Ala
465 470 475 480 Leu Ala Arg Ala Thr Leu Tyr Lys His Leu Arg Glu Leu
Lys Pro Gln 485 490 495 Gly Gly Pro Ser Asn Gly 500
2160PRTEscherichia coli 2Met Ala Asp Glu Ile Asp Ala Met Ala Leu
Tyr Arg Ala Trp Gln Gln 1 5 10 15 Leu Asp Asn Gly Ser Cys Ala Gln
Ile Arg Arg Val Ser Glu Pro Asp 20 25 30 Glu Leu Arg Asp Ile Pro
Ala Phe Tyr Arg Leu Val Gln Pro Phe Gly 35 40 45 Trp Glu Asn Pro
Arg His Gln Gln Ala Leu Leu Arg Met Val Phe Cys 50 55 60 Leu Ser
Ala Gly Lys Asn Val Ile Arg His Gln Asp Lys Lys Ser Glu 65 70 75 80
Gln Thr Thr Gly Ile Ser Leu Gly Arg Ala Leu Ala Asn Ser Gly Arg 85
90 95 Ile Asn Glu Arg Arg Ile Phe Gln Leu Ile Arg Ala Asp Arg Thr
Ala 100 105 110 Asp Met Val Gln Leu Arg Arg Leu Leu Thr His Ala Glu
Pro Val Leu 115 120 125 Asp Trp Pro Leu Met Ala Arg Met Leu Thr Trp
Trp Gly Lys Arg Glu 130 135 140 Arg Gln Gln Leu Leu Glu Asp Phe Val
Leu Thr Thr Asn Lys Asn Ala 145 150 155 160 3363PRTEscherichia coli
3Met Ser Asn Phe Ile Asn Ile His Val Leu Ile Ser His Ser Pro Ser 1
5 10 15 Cys Leu Asn Arg Asp Asp Met Asn Met Gln Lys Asp Ala Ile Phe
Gly 20 25 30 Gly Lys Arg Arg Val Arg Ile Ser Ser Gln Ser Leu Lys
Arg Ala Met 35 40 45 Arg Lys Ser Gly Tyr Tyr Ala Gln Asn Ile Gly
Glu Ser Ser Leu Arg 50 55 60 Thr Ile His Leu Ala Gln Leu Arg Asp
Val Leu Arg Gln Lys Leu Gly 65 70 75 80 Glu Arg Phe Asp Gln Lys Ile
Ile Asp Lys Thr Leu Ala Leu Leu Ser 85 90 95 Gly Lys Ser Val Asp
Glu Ala Glu Lys Ile Ser Ala Asp Ala Val Thr 100 105 110 Pro Trp Val
Val Gly Glu Ile Ala Trp Phe Cys Glu Gln Val Ala Lys 115 120 125 Ala
Glu Ala Asp Asn Leu Asp Asp Lys Lys Leu Leu Lys Val Leu Lys 130 135
140 Glu Asp Ile Ala Ala Ile Arg Val Asn Leu Gln Gln Gly Val Asp Ile
145 150 155 160 Ala Leu Ser Gly Arg Met Ala Thr Ser Gly Met Met Thr
Glu Leu Gly 165 170 175 Lys Val Asp Gly Ala Met Ser Ile Ala His Ala
Ile Thr Thr His Gln 180 185 190 Val Asp Ser Asp Ile Asp Trp Phe Thr
Ala Val Asp Asp Leu Gln Glu 195 200 205 Gln Gly Ser Ala His Leu Gly
Thr Gln Glu Phe Ser Ser Gly Val Phe 210 215 220 Tyr Arg Tyr Ala Asn
Ile Asn Leu Ala Gln Leu Gln Glu Asn Leu Gly 225 230 235 240 Gly Ala
Ser Arg Glu Gln Ala Leu Glu Ile Ala Thr His Val Val His 245 250 255
Met Leu Ala Thr Glu Val Pro Gly Ala Lys Gln Arg Thr Tyr Ala Ala 260
265 270 Phe Asn Pro Ala Asp Met Val Met Val Asn Phe Ser Asp Met Pro
Leu 275 280 285 Ser Met Ala Asn Ala Phe Glu Lys Ala Val Lys Ala Lys
Asp Gly Phe 290 295 300 Leu Gln Pro Ser Ile Gln Ala Phe Asn Gln Tyr
Trp Asp Arg Val Ala 305 310 315 320 Asn Gly Tyr Gly Leu Asn Gly Ala
Ala Ala Gln Phe Ser Leu Ser Asp 325 330 335 Val Asp Pro Ile Thr Ala
Gln Val Lys Gln Met Pro Thr Leu Glu Gln 340 345 350 Leu Lys Ser Trp
Val Arg Asn Asn Gly Glu Ala 355 360 4224PRTEscherichia coli 4Met
Arg Ser Tyr Leu Ile Leu Arg Leu Ala Gly Pro Met Gln Ala Trp 1 5 10
15 Gly Gln Pro Thr Phe Glu Gly Thr Arg Pro Thr Gly Arg Phe Pro Thr
20 25 30 Arg Ser Gly Leu Leu Gly Leu Leu Gly Ala Cys Leu Gly Ile
Gln Arg 35 40 45 Asp Asp Thr Ser Ser Leu Gln Ala Leu Ser Glu Ser
Val Gln Phe Ala 50 55 60 Val Arg Cys Asp Glu Leu Ile Leu Asp Asp
Arg Arg Val Ser Val Thr 65 70 75 80 Gly Leu Arg Asp Tyr His Thr Val
Leu Gly Ala Arg Glu Asp Tyr Arg 85 90 95 Gly Leu Lys Ser His Glu
Thr Ile Gln Thr Trp Arg Glu Tyr Leu Cys 100 105 110 Asp Ala Ser Phe
Thr Val Ala Leu Trp Leu Thr Pro His Ala Thr Met 115 120 125 Val Ile
Ser Glu Leu Glu Lys Ala Val Leu Lys Pro Arg Tyr Thr Pro 130 135 140
Tyr Leu Gly Arg Arg Ser Cys Pro Leu Thr His Pro Leu Phe Leu Gly 145
150 155 160 Thr Cys Gln Ala Ser Asp Pro Gln Lys Ala Leu Leu Asn Tyr
Glu Pro 165 170 175 Val Gly Gly Asp Ile Tyr Ser Glu Glu Ser Val Thr
Gly His His Leu 180 185 190 Lys Phe Thr Ala Arg Asp Glu Pro Met Ile
Thr Leu Pro Arg Gln Phe 195 200 205 Ala Ser Arg Glu Trp Tyr Val Ile
Lys Gly Gly Met Asp Val Ser Gln 210 215 220
SEQ ID NO: 5; length 199; PRT; Escherichia coli
Met Tyr Leu Ser Lys Val Ile Ile Ala Arg Ala Trp Ser Arg Asp Leu 1
5 10 15 Tyr Gln Leu His Gln Gly Leu Trp His Leu Phe Pro Asn Arg Pro
Asp 20 25 30 Ala Ala Arg Asp Phe Leu Phe His Val Glu Lys Arg Asn
Thr Pro Glu 35 40 45 Gly Cys His Val Leu Leu Gln Ser Ala Gln Met
Pro Val Ser Thr Ala 50 55 60 Val Ala Thr Val Ile Lys Thr Lys Gln
Val Glu Phe Gln Leu Gln Val 65 70 75 80 Gly Val Pro Leu Tyr Phe Arg
Leu Arg Ala Asn Pro Ile Lys Thr Ile 85 90 95 Leu Asp Asn Gln Lys
Arg Leu Asp Ser Lys Gly Asn Ile Lys Arg Cys 100 105 110 Arg Val Pro
Leu Ile Lys Glu Ala Glu Gln Ile Ala Trp Leu Gln Arg 115 120 125 Lys
Leu Gly Asn Ala Ala Arg Val Glu Asp Val His Pro Ile Ser Glu 130 135
140 Arg Pro Gln Tyr Phe Ser Gly Asp Gly Lys Ser Gly Lys Ile Gln Thr
145 150 155 160 Val Cys Phe Glu Gly Val Leu Thr Ile Asn Asp Ala Pro
Ala Leu Ile 165 170 175 Asp Leu Val Gln Gln Gly Ile Gly Pro Ala Lys
Ser Met Gly Cys Gly 180 185 190 Leu Leu Ser Leu Ala Pro Leu 195
SEQ ID NO: 6; length 392; DNA; Artificial Sequence (GA1070943)
actggaaagc gggcagtgaa
aggaaggccc atgaggccag ttaattaagc ggatcctggc 60ggcggcagcg gcggcggcag
cgacaagcag aagaacggca tcaaggcgaa cttcaagatc 120cgccacaaca
tcgaggacgg cggcgtgcag ctcgccgacc actaccagca gaacaccccc
180atcggcgacg gccccgtgct gctgcccgac aaccactacc tgagctacca
gtccgccctg 240agcaaagacc ccaacgagaa gcgcgatcac atggtcctgc
tggagttcgt gaccgccgcc 300gggatcactc tcggcatgga cgagctgtac
aagtaagcgg ccgcggcgcg cctaggcctt 360gacggccttc cttcaattcg
ccctatagtg ag 392
SEQ ID NO: 7; length 603; DNA; Artificial Sequence (GA1070941)
cactataggg
cgaattggcg gaaggccgtc aaggccgcat ttaattaagc ggccgcaggc 60ggcggcagcg
gcggcggcag catggtgagc aagggcgagg agctgttcac cggggtggtg
120cccatcctgg tcgagctgga cggcgacgta aacggccaca agttcagcgt
gtccggcgag 180ggcgagggcg atgccaccta cggcaagctg accctgaagc
tcatctgcac caccggcaag 240ctgcccgtgc cctggcccac cctcgtgacc
accctcggct acggcctgca gtgcttcgcc 300cgctaccccg accacatgaa
gcagcacgac ttcttcaagt ccgccatgcc cgaaggctac 360gtccaggagc
gcaccatctt cttcaaggac gacggcaact acaagacccg cgccgaggtg
420aagttcgagg gcgacaccct ggtgaaccgc atcgagctga agggcatcga
cttcaaggag 480gacggcaaca tcctggggca caagctggag tacaactaca
acagccacaa cgtctatatc 540acggcctaac tcgagggcgc gccctgggcc
tcatgggcct tccgctcact gcccgctttc 600cag 603
SEQ ID NO: 8; length 679; DNA; Artificial Sequence (GA1068859)
cactataggg cgaattggcg gaaggccgtc aaggccgcat
gagctccatg gaaacaaaga 60attagctgat ctttaataat aaggaaatgt tacattaagg
ttggtgggtt gtttttatgg 120gaaaaaatgc tttaagaaca aatgtatact
tttagagagt tccccgcgcc agcggggata 180aaccgggccg attgaaggtc
cggtggatgg cttaaaagag ttccccgcgc cagcggggat 240aaaccgccgc
aggtacagca ggtagcgcag atcatcaaga gttccccgcg ccagcgggga
300taaaccgact tctctccgaa aagtcaggac gctgtggcag agttccccgc
gccagcgggg 360ataaaccgcc tacgcgctga acgccagcgg tgtggtgaat
gagttccccg cgccagcggg 420gataaaccgg tgtggccatg cacgccttta
acggtgaact ggagttcccc gcgccagcgg 480ggataaaccg cacgaactca
gccagaacga caaacaaaag gcgagttccc cgcgccagcg 540gggataaacc
ggcaccagta cgcgccccac gctgacggtt tctgagttcc ccgcgccagc
600ggggataaac cgcagctccc attttcaaac ccaggtaccc tgggcctcat
gggccttccg 660ctcactgccc gctttccag 679
SEQ ID NO: 9; length 685; DNA; Artificial Sequence (GA1047360)
gagctcccgg gctgacggta atagaggcac ctacaggctc
cggtaaaacg gaaacagcgc 60tggcctatgc ttggaaactt attgatcaac aaattgcgga
tagtgttatt tttgccctcc 120caacacaagc taccgcgaat gctatgctta
cgagaatgga agcgagcgcg agccacttat 180tttcatcccc aaatcttatt
cttgctcatg gcaattcacg gtttaaccac ctctttcaat 240caataaaatc
acgcgcgatt actgaacagg ggcaagaaga agcgtgggtt cagtgttgtc
300agtggttgtc acaaagcaat aagaaagtgt ttcttgggca aatcggcgtt
tgcacgattg 360atcaggtgtt gatttcggta ttgccagtta aacaccgctt
tatccgtggt ttgggaattg 420gtagatctgt tttaattgtt aatgaagttc
atgcttacga cacctatatg aacggcttgc 480tcgaggcagt gctcaaggct
caggctgatg tgggagggag tgttattctt ctttccgcaa 540ccctaccaat
gaaacaaaaa cagaagcttc tggatactta tggtctgcat acagatccag
600tggaaaataa ctccgcatat ccactcatta actggcgagg tgtgaatggt
gcgcaacgtt 660ttgatctgct agcggatccg gtacc 685
SEQ ID NO: 10; length 37; DNA; Artificial Sequence (Primer BG3186)
atagcgccat ggaacctttt aaatatatat gccatta
37
SEQ ID NO: 11; length 41; DNA; Artificial Sequence (Primer BG3213)
acagtgggat ccgctttggg
atttgcaggg atgactctgg t 41
SEQ ID NO: 12; length 43; DNA; Artificial Sequence (Primer BG3303)
atagcgtcat gaatttgctt attgataact ggattcctgt acg
43
SEQ ID NO: 13; length 44; DNA; Artificial Sequence (Primer BG3212)
acagtggcgg ccgcgccatt
tgatggccct ccttgcggtt ttaa 44
SEQ ID NO: 14; length 45; DNA; Artificial Sequence (Primer BG3076)
cgtatatcaa actttccaat agcatgaaga gcaatgaaaa ataac
45
SEQ ID NO: 15; length 23; DNA; Artificial Sequence (Primer BG3449)
atgataccgc gagacccacg
ctc 23
SEQ ID NO: 16; length 24; DNA; Artificial Sequence (Primer BG3451)
cggataaagt
tgcaggacca cttc 24
SEQ ID NO: 17; length 199; PRT; Escherichia coli
Met Tyr Leu Ser Lys
Val Ile Ile Ala Arg Ala Trp Ser Arg Asp Leu 1 5 10 15 Tyr Gln Leu
His Gln Gly Leu Trp His Leu Phe Pro Asn Arg Pro Asp 20 25 30 Ala
Ala Arg Asp Phe Leu Phe His Val Glu Lys Arg Asn Thr Pro Glu 35 40
45 Gly Cys His Val Leu Leu Gln Ser Ala Gln Met Pro Val Ser Thr Ala
50 55 60 Val Ala Thr Val Ile Lys Thr Lys Gln Val Glu Phe Gln Leu
Gln Val 65 70 75 80 Gly Val Pro Leu Tyr Phe Arg Leu Arg Ala Asn Pro
Ile Lys Thr Ile 85 90 95 Leu Asp Asn Gln Lys Arg Leu Asp Ser Lys
Gly Asn Ile Lys Arg Cys 100 105 110 Arg Val Pro Leu Ile Lys Glu Ala
Glu Gln Ile Ala Trp Leu Gln Arg 115 120 125 Lys Leu Gly Asn Ala Ala
Arg Val Glu Asp Val His Pro Ile Ser Glu 130 135 140 Arg Pro Gln Tyr
Phe Ser Gly Asp Gly Lys Ser Gly Lys Ile Gln Thr 145 150 155 160 Val
Cys Phe Glu Gly Val Leu Thr Ile Asn Asp Ala Pro Ala Leu Ile 165 170
175 Asp Leu Val Gln Gln Gly Ile Gly Pro Ala Lys Ser Met Gly Cys Gly
180 185 190 Leu Leu Ser Leu Ala Pro Leu 195
SEQ ID NO: 18; length 2154; DNA; Artificial Sequence (Fusion protein)
atggctcaac tggttaaaag cgaactggaa
gagaaaaaaa gtgaactgcg ccacaaactg 60aaatatgtgc cgcatgaata tatcgagctg
attgaaattg cacgtaatcc gacccaggat 120cgtattctgg aaatgaaagt
gatggaattt tttatgaaag tgtacggcta tcgcggtgaa 180catctgggtg
gtagccgtaa accggatggt gcaatttata ccgttggtag cccgattgat
240tatggtgtta ttgttgatac caaagcctat agcggtggtt ataatctgcc
gattggtcag 300gcagatgaaa tggaacgtta tgtggaagaa aatcagaccc
gtgataaaca tctgaatccg 360aatgaatggt ggaaagttta tccgagcagc
gttaccgagt ttaaattcct gtttgttagc 420ggtcacttca aaggcaacta
taaagcacag ctgacccgtc tgaatcatat taccaattgt 480aatggtgcag
ttctgagcgt tgaagaactg ctgattggtg gtgaaatgat taaagcaggc
540accctgaccc tggaagaagt tcgtcgcaaa tttaacaatg gcgaaatcaa
ctttgcggat 600cccaccaacc gcgcgaaagg cctggaagcg gtgagcgtgg
cgagcatgaa tttgcttatt 660gataactgga ttcctgtacg cccgcgaaac
ggggggaaag tccaaatcat aaatctgcaa 720tcgctatact gcagtagaga
tcagtggcga ttaagtttgc cccgtgacga tatggaactg 780gccgctttag cactgctggt
ttgcattggg caaattatcg ccccggcaaa agatgacgtt 840gaatttcgac
atcgcataat gaatccgctc actgaagatg agtttcaaca actcatcgcg
900ccgtggatag atatgttcta ccttaatcac gcagaacatc cctttatgca
gaccaaaggt 960gtcaaagcaa atgatgtgac tccaatggaa aaactgttgg
ctggggtaag cggcgcgacg 1020aattgtgcat ttgtcaatca accggggcag
ggtgaagcat tatgtggtgg atgcactgcg 1080attgcgttat tcaaccaggc
gaatcaggca ccaggttttg gtggtggttt taaaagcggt 1140ttacgtggag
gaacacctgt aacaacgttc gtacgtggga tcgatcttcg ttcaacggtg
1200ttactcaatg tcctcacatt acctcgtctt caaaaacaat ttcctaatga
atcacatacg 1260gaaaaccaac ctacctggat taaacctatc aagtccaatg
agtctatacc tgcttcgtca 1320attgggtttg tccgtggtct attctggcaa
ccagcgcata ttgaattatg cgatcccatt 1380gggattggta aatgttcttg
ctgtggacag gaaagcaatt tgcgttatac cggttttctt 1440aaggaaaaat
ttacctttac agttaatggg ctatggcccc atccgcattc cccttgtctg
1500gtaacagtca agaaagggga ggttgaggaa aaatttcttg ctttcaccac
ctccgcacca 1560tcatggacac aaatcagccg agttgtggta gataagatta
ttcaaaatga aaatggaaat 1620cgcgtggcgg cggttgtgaa tcaattcaga
aatattgcgc cgcaaagtcc tcttgaattg 1680attatggggg gatatcgtaa
taatcaagca tctattcttg aacggcgtca tgatgtgttg 1740atgtttaatc
aggggtggca acaatacggc aatgtgataa acgaaatagt gactgttggt
1800ttgggatata aaacagcctt acgcaaggcg ttatatacct ttgcagaagg
gtttaaaaat 1860aaagacttca aaggggccgg agtctctgtt catgagactg
cagaaaggca tttctatcga 1920cagagtgaat tattaattcc cgatgtactg
gcgaatgtta atttttccca ggctgatgag 1980gtaatagctg atttacgaga
caaacttcat caattgtgtg aaatgctatt taatcaatct 2040gtagctccct
atgcacatca tcctaaatta ataagcacat tagcgcttgc ccgcgccacg
2100ctatacaaac atttacggga gttaaaaccg caaggagggc catcaaatgg ctga 2154
SEQ ID NO: 19; length 717; PRT; Artificial Sequence (Fusion protein)
Met Ala Gln Leu Val
Lys Ser Glu Leu Glu Glu Lys Lys Ser Glu Leu 1 5 10 15 Arg His Lys
Leu Lys Tyr Val Pro His Glu Tyr Ile Glu Leu Ile Glu 20 25 30 Ile
Ala Arg Asn Pro Thr Gln Asp Arg Ile Leu Glu Met Lys Val Met 35 40
45 Glu Phe Phe Met Lys Val Tyr Gly Tyr Arg Gly Glu His Leu Gly Gly
50 55 60 Ser Arg Lys Pro Asp Gly Ala Ile Tyr Thr Val Gly Ser Pro
Ile Asp 65 70 75 80 Tyr Gly Val Ile Val Asp Thr Lys Ala Tyr Ser Gly
Gly Tyr Asn Leu 85 90 95 Pro Ile Gly Gln Ala Asp Glu Met Glu Arg
Tyr Val Glu Glu Asn Gln 100 105 110 Thr Arg Asp Lys His Leu Asn Pro
Asn Glu Trp Trp Lys Val Tyr Pro 115 120 125 Ser Ser Val Thr Glu Phe
Lys Phe Leu Phe Val Ser Gly His Phe Lys 130 135 140 Gly Asn Tyr Lys
Ala Gln Leu Thr Arg Leu Asn His Ile Thr Asn Cys 145 150 155 160 Asn
Gly Ala Val Leu Ser Val Glu Glu Leu Leu Ile Gly Gly Glu Met 165 170
175 Ile Lys Ala Gly Thr Leu Thr Leu Glu Glu Val Arg Arg Lys Phe Asn
180 185 190 Asn Gly Glu Ile Asn Phe Ala Asp Pro Thr Asn Arg Ala Lys
Gly Leu 195 200 205 Glu Ala Val Ser Val Ala Ser Met Asn Leu Leu Ile
Asp Asn Trp Ile 210 215 220 Pro Val Arg Pro Arg Asn Gly Gly Lys Val
Gln Ile Ile Asn Leu Gln 225 230 235 240 Ser Leu Tyr Cys Ser Arg Asp
Gln Trp Arg Leu Ser Leu Pro Arg Asp 245 250 255 Asp Met Glu Leu Ala
Ala Leu Ala Leu Leu Val Cys Ile Gly Gln Ile 260 265 270 Ile Ala Pro
Ala Lys Asp Asp Val Glu Phe Arg His Arg Ile Met Asn 275 280 285 Pro
Leu Thr Glu Asp Glu Phe Gln Gln Leu Ile Ala Pro Trp Ile Asp 290 295
300 Met Phe Tyr Leu Asn His Ala Glu His Pro Phe Met Gln Thr Lys Gly
305 310 315 320 Val Lys Ala Asn Asp Val Thr Pro Met Glu Lys Leu Leu
Ala Gly Val 325 330 335 Ser Gly Ala Thr Asn Cys Ala Phe Val Asn Gln
Pro Gly Gln Gly Glu 340 345 350 Ala Leu Cys Gly Gly Cys Thr Ala Ile
Ala Leu Phe Asn Gln Ala Asn 355 360 365 Gln Ala Pro Gly Phe Gly Gly
Gly Phe Lys Ser Gly Leu Arg Gly Gly 370 375 380 Thr Pro Val Thr Thr
Phe Val Arg Gly Ile Asp Leu Arg Ser Thr Val 385 390 395 400 Leu Leu
Asn Val Leu Thr Leu Pro Arg Leu Gln Lys Gln Phe Pro Asn 405 410 415
Glu Ser His Thr Glu Asn Gln Pro Thr Trp Ile Lys Pro Ile Lys Ser 420
425 430 Asn Glu Ser Ile Pro Ala Ser Ser Ile Gly Phe Val Arg Gly Leu
Phe 435 440 445 Trp Gln Pro Ala His Ile Glu Leu Cys Asp Pro Ile Gly
Ile Gly Lys 450 455 460 Cys Ser Cys Cys Gly Gln Glu Ser Asn Leu Arg
Tyr Thr Gly Phe Leu 465 470 475 480 Lys Glu Lys Phe Thr Phe Thr Val
Asn Gly Leu Trp Pro His Pro His 485 490 495 Ser Pro Cys Leu Val Thr
Val Lys Lys Gly Glu Val Glu Glu Lys Phe 500 505 510 Leu Ala Phe Thr
Thr Ser Ala Pro Ser Trp Thr Gln Ile Ser Arg Val 515 520 525 Val Val
Asp Lys Ile Ile Gln Asn Glu Asn Gly Asn Arg Val Ala Ala 530 535 540
Val Val Asn Gln Phe Arg Asn Ile Ala Pro Gln Ser Pro Leu Glu Leu 545
550 555 560 Ile Met Gly Gly Tyr Arg Asn Asn Gln Ala Ser Ile Leu Glu
Arg Arg 565 570 575 His Asp Val Leu Met Phe Asn Gln Gly Trp Gln Gln
Tyr Gly Asn Val 580 585 590 Ile Asn Glu Ile Val Thr Val Gly Leu Gly
Tyr Lys Thr Ala Leu Arg 595 600 605 Lys Ala Leu Tyr Thr Phe Ala Glu
Gly Phe Lys Asn Lys Asp Phe Lys 610 615 620 Gly Ala Gly Val Ser Val
His Glu Thr Ala Glu Arg His Phe Tyr Arg 625 630 635 640 Gln Ser Glu
Leu Leu Ile Pro Asp Val Leu Ala Asn Val Asn Phe Ser 645 650 655 Gln
Ala Asp Glu Val Ile Ala Asp Leu Arg Asp Lys Leu His Gln Leu 660 665
670 Cys Glu Met Leu Phe Asn Gln Ser Val Ala Pro Tyr Ala His His Pro
675 680 685 Lys Leu Ile Ser Thr Leu Ala Leu Ala Arg Ala Thr Leu Tyr
Lys His 690 695 700 Leu Arg Glu Leu Lys Pro Gln Gly Gly Pro Ser Asn
Gly 705 710 715
SEQ ID NO: 20; length 2154; DNA; Artificial Sequence (Fusion protein)
atggctcaac tggttaaaag cgaactggaa gagaaaaaaa gtgaactgcg ccacaaactg
60aaatatgtgc cgcatgaata tatcgagctg attgaaattg cacgtaatcc gacccaggat
120cgtattctgg aaatgaaagt gatggaattt tttatgaaag tgtacggcta
tcgcggtgaa 180catctgggtg gtagccgtaa accggatggt gcaatttata
ccgttggtag cccgattgat 240tatggtgtta ttgttgatac caaagcctat
agcggtggtt ataatctgcc gattggtcag 300gcagatgaaa tgcagcgtta
tgtgaaagaa aatcagaccc gcaacaaaca tattaacccg 360aatgaatggt
ggaaagttta tccgagcagc gttaccgagt ttaaattcct gtttgttagc
420ggtcacttca aaggcaacta taaagcacag ctgacccgtc tgaatcgtaa
aaccaattgt 480aatggtgcag ttctgagcgt tgaagaactg ctgattggtg
gtgaaatgat taaagcaggc 540accctgaccc tggaagaagt tcgtcgcaaa
tttaacaatg gcgaaatcaa ctttgcggat 600cccaccaacc gcgcgaaagg
cctggaagcg gtgagcgtgg cgagcatgaa tttgcttatt 660gataactgga
ttcctgtacg cccgcgaaac ggggggaaag tccaaatcat aaatctgcaa
720tcgctatact gcagtagaga tcagtggcga ttaagtttgc cccgtgacga
tatggaactg 780gccgctttag cactgctggt ttgcattggg caaattatcg
ccccggcaaa agatgacgtt 840gaatttcgac atcgcataat gaatccgctc
actgaagatg agtttcaaca actcatcgcg 900ccgtggatag atatgttcta
ccttaatcac gcagaacatc cctttatgca gaccaaaggt 960gtcaaagcaa
atgatgtgac tccaatggaa aaactgttgg ctggggtaag cggcgcgacg
1020aattgtgcat ttgtcaatca accggggcag ggtgaagcat tatgtggtgg
atgcactgcg 1080attgcgttat tcaaccaggc gaatcaggca ccaggttttg
gtggtggttt taaaagcggt 1140ttacgtggag gaacacctgt aacaacgttc
gtacgtggga tcgatcttcg ttcaacggtg 1200ttactcaatg tcctcacatt
acctcgtctt caaaaacaat ttcctaatga atcacatacg 1260gaaaaccaac
ctacctggat taaacctatc aagtccaatg agtctatacc tgcttcgtca
1320attgggtttg tccgtggtct attctggcaa ccagcgcata ttgaattatg
cgatcccatt 1380gggattggta aatgttcttg ctgtggacag gaaagcaatt
tgcgttatac cggttttctt 1440aaggaaaaat ttacctttac agttaatggg
ctatggcccc atccgcattc cccttgtctg 1500gtaacagtca agaaagggga
ggttgaggaa aaatttcttg ctttcaccac ctccgcacca 1560tcatggacac
aaatcagccg agttgtggta gataagatta ttcaaaatga aaatggaaat
1620cgcgtggcgg cggttgtgaa tcaattcaga aatattgcgc cgcaaagtcc
tcttgaattg 1680attatggggg gatatcgtaa taatcaagca tctattcttg
aacggcgtca tgatgtgttg 1740atgtttaatc aggggtggca acaatacggc
aatgtgataa acgaaatagt gactgttggt 1800ttgggatata aaacagcctt
acgcaaggcg ttatatacct ttgcagaagg gtttaaaaat 1860aaagacttca
aaggggccgg agtctctgtt catgagactg cagaaaggca tttctatcga
1920cagagtgaat tattaattcc cgatgtactg gcgaatgtta atttttccca
ggctgatgag 1980gtaatagctg atttacgaga caaacttcat caattgtgtg
aaatgctatt taatcaatct 2040gtagctccct atgcacatca tcctaaatta
ataagcacat tagcgcttgc ccgcgccacg 2100ctatacaaac atttacggga
gttaaaaccg caaggagggc catcaaatgg ctga 2154
SEQ ID NO: 21; length 717; PRT; Artificial Sequence (Fusion protein)
Met Ala Gln Leu Val Lys Ser Glu Leu Glu
Glu Lys Lys Ser Glu Leu 1 5 10 15 Arg His Lys Leu Lys Tyr Val Pro
His Glu Tyr Ile Glu Leu Ile Glu 20 25 30 Ile Ala Arg Asn Pro Thr
Gln Asp Arg Ile Leu Glu Met Lys Val Met 35 40 45 Glu Phe Phe Met
Lys Val Tyr Gly Tyr Arg Gly Glu His Leu Gly Gly 50 55 60 Ser Arg
Lys Pro Asp Gly Ala Ile Tyr Thr Val Gly Ser Pro Ile Asp 65 70 75 80
Tyr Gly Val Ile Val Asp Thr Lys Ala Tyr Ser Gly Gly Tyr Asn Leu 85
90 95 Pro Ile Gly Gln Ala Asp Glu Met Gln Arg Tyr Val Lys Glu Asn
Gln 100 105 110 Thr Arg Asn Lys His Ile Asn Pro Asn Glu Trp Trp Lys
Val Tyr Pro 115 120 125 Ser Ser Val Thr Glu Phe Lys Phe Leu Phe Val
Ser Gly His Phe Lys 130 135 140 Gly Asn Tyr Lys Ala Gln Leu Thr Arg
Leu Asn Arg Lys Thr Asn Cys 145 150 155 160 Asn Gly Ala Val Leu Ser
Val Glu Glu Leu Leu Ile Gly Gly Glu Met 165 170 175 Ile Lys Ala Gly
Thr Leu Thr Leu Glu Glu Val Arg Arg Lys Phe Asn 180 185 190 Asn Gly
Glu Ile Asn Phe Ala Asp Pro Thr Asn Arg Ala Lys Gly Leu 195 200 205
Glu Ala Val Ser Val Ala Ser Met Asn Leu Leu Ile Asp Asn Trp Ile 210
215 220 Pro Val Arg Pro Arg Asn Gly Gly Lys Val Gln Ile Ile Asn Leu
Gln 225 230 235 240 Ser Leu Tyr Cys Ser Arg Asp Gln Trp Arg Leu Ser
Leu Pro Arg Asp 245 250 255 Asp Met Glu Leu Ala Ala Leu Ala Leu Leu
Val Cys Ile Gly Gln Ile 260 265 270 Ile Ala Pro Ala Lys Asp Asp Val
Glu Phe Arg His Arg Ile Met Asn 275 280 285 Pro Leu Thr Glu Asp Glu
Phe Gln Gln Leu Ile Ala Pro Trp Ile Asp 290 295 300 Met Phe Tyr Leu
Asn His Ala Glu His Pro Phe Met Gln Thr Lys Gly 305 310 315 320 Val
Lys Ala Asn Asp Val Thr Pro Met Glu Lys Leu Leu Ala Gly Val 325 330
335 Ser Gly Ala Thr Asn Cys Ala Phe Val Asn Gln Pro Gly Gln Gly Glu
340 345 350 Ala Leu Cys Gly Gly Cys Thr Ala Ile Ala Leu Phe Asn Gln
Ala Asn 355 360 365 Gln Ala Pro Gly Phe Gly Gly Gly Phe Lys Ser Gly
Leu Arg Gly Gly 370 375 380 Thr Pro Val Thr Thr Phe Val Arg Gly Ile
Asp Leu Arg Ser Thr Val 385 390 395 400 Leu Leu Asn Val Leu Thr Leu
Pro Arg Leu Gln Lys Gln Phe Pro Asn 405 410 415 Glu Ser His Thr Glu
Asn Gln Pro Thr Trp Ile Lys Pro Ile Lys Ser 420 425 430 Asn Glu Ser
Ile Pro Ala Ser Ser Ile Gly Phe Val Arg Gly Leu Phe 435 440 445 Trp
Gln Pro Ala His Ile Glu Leu Cys Asp Pro Ile Gly Ile Gly Lys 450 455
460 Cys Ser Cys Cys Gly Gln Glu Ser Asn Leu Arg Tyr Thr Gly Phe Leu
465 470 475 480 Lys Glu Lys Phe Thr Phe Thr Val Asn Gly Leu Trp Pro
His Pro His 485 490 495 Ser Pro Cys Leu Val Thr Val Lys Lys Gly Glu
Val Glu Glu Lys Phe 500 505 510 Leu Ala Phe Thr Thr Ser Ala Pro Ser
Trp Thr Gln Ile Ser Arg Val 515 520 525 Val Val Asp Lys Ile Ile Gln
Asn Glu Asn Gly Asn Arg Val Ala Ala 530 535 540 Val Val Asn Gln Phe
Arg Asn Ile Ala Pro Gln Ser Pro Leu Glu Leu 545 550 555 560 Ile Met
Gly Gly Tyr Arg Asn Asn Gln Ala Ser Ile Leu Glu Arg Arg 565 570 575
His Asp Val Leu Met Phe Asn Gln Gly Trp Gln Gln Tyr Gly Asn Val 580
585 590 Ile Asn Glu Ile Val Thr Val Gly Leu Gly Tyr Lys Thr Ala Leu
Arg 595 600 605 Lys Ala Leu Tyr Thr Phe Ala Glu Gly Phe Lys Asn Lys
Asp Phe Lys 610 615 620 Gly Ala Gly Val Ser Val His Glu Thr Ala Glu
Arg His Phe Tyr Arg 625 630 635 640 Gln Ser Glu Leu Leu Ile Pro Asp
Val Leu Ala Asn Val Asn Phe Ser 645 650 655 Gln Ala Asp Glu Val Ile
Ala Asp Leu Arg Asp Lys Leu His Gln Leu 660 665 670 Cys Glu Met Leu
Phe Asn Gln Ser Val Ala Pro Tyr Ala His His Pro 675 680 685 Lys Leu
Ile Ser Thr Leu Ala Leu Ala Arg Ala Thr Leu Tyr Lys His 690 695 700
Leu Arg Glu Leu Lys Pro Gln Gly Gly Pro Ser Asn Gly 705 710 715
SEQ ID NO: 22; length 2235; DNA; Artificial Sequence (Fusion protein)
atgcatcacc atcatcacca
cccgaaaaaa aagcgcaaag tggatccgaa gaaaaaacgt 60aaagttgaag atccgaaaga
catggctcaa ctggttaaaa gcgaactgga agagaaaaaa 120agtgaactgc
gccacaaact gaaatatgtg ccgcatgaat atatcgagct gattgaaatt
180gcacgtaatc cgacccagga tcgtattctg gaaatgaaag tgatggaatt
ttttatgaaa 240gtgtacggct atcgcggtga acatctgggt ggtagccgta
aaccggatgg tgcaatttat 300accgttggta gcccgattga ttatggtgtt
attgttgata ccaaagccta tagcggtggt 360tataatctgc cgattggtca
ggcagatgaa atgcagcgtt atgtgaaaga aaatcagacc 420cgcaacaaac
atattaaccc gaatgaatgg tggaaagttt atccgagcag cgttaccgag
480tttaaattcc tgtttgttag cggtcacttc aaaggcaact ataaagcaca
gctgacccgt 540ctgaatcgta aaaccaattg taatggtgca gttctgagcg
ttgaagaact gctgattggt 600ggtgaaatga ttaaagcagg caccctgacc
ctggaagaag ttcgtcgcaa atttaacaat 660ggcgaaatca actttgcgga
tcccaccaac cgcgcgaaag gcctggaagc ggtgagcgtg 720gcgagcatga
atttgcttat tgataactgg attcctgtac gcccgcgaaa cggggggaaa
780gtccaaatca taaatctgca atcgctatac tgcagtagag atcagtggcg
attaagtttg 840ccccgtgacg atatggaact ggccgcttta gcactgctgg
tttgcattgg gcaaattatc 900gccccggcaa aagatgacgt tgaatttcga
catcgcataa tgaatccgct cactgaagat 960gagtttcaac aactcatcgc
gccgtggata gatatgttct accttaatca cgcagaacat 1020ccctttatgc
agaccaaagg tgtcaaagca aatgatgtga ctccaatgga aaaactgttg
1080gctggggtaa gcggcgcgac gaattgtgca tttgtcaatc aaccggggca
gggtgaagca 1140ttatgtggtg gatgcactgc gattgcgtta ttcaaccagg
cgaatcaggc accaggtttt 1200ggtggtggtt ttaaaagcgg tttacgtgga
ggaacacctg taacaacgtt cgtacgtggg 1260atcgatcttc gttcaacggt
gttactcaat gtcctcacat tacctcgtct tcaaaaacaa 1320tttcctaatg
aatcacatac ggaaaaccaa cctacctgga ttaaacctat caagtccaat
1380gagtctatac ctgcttcgtc aattgggttt gtccgtggtc tattctggca
accagcgcat 1440attgaattat gcgatcccat tgggattggt aaatgttctt
gctgtggaca ggaaagcaat 1500ttgcgttata ccggttttct taaggaaaaa
tttaccttta cagttaatgg gctatggccc 1560catccgcatt ccccttgtct
ggtaacagtc aagaaagggg aggttgagga aaaatttctt 1620gctttcacca
cctccgcacc atcatggaca caaatcagcc gagttgtggt agataagatt
1680attcaaaatg aaaatggaaa tcgcgtggcg gcggttgtga atcaattcag
aaatattgcg 1740ccgcaaagtc ctcttgaatt gattatgggg ggatatcgta
ataatcaagc atctattctt 1800gaacggcgtc atgatgtgtt gatgtttaat
caggggtggc aacaatacgg caatgtgata 1860aacgaaatag tgactgttgg
tttgggatat aaaacagcct tacgcaaggc gttatatacc 1920tttgcagaag
ggtttaaaaa taaagacttc aaaggggccg gagtctctgt tcatgagact
1980gcagaaaggc atttctatcg acagagtgaa ttattaattc
ccgatgtact ggcgaatgtt 2040aatttttccc aggctgatga ggtaatagct
gatttacgag acaaacttca tcaattgtgt 2100gaaatgctat ttaatcaatc
tgtagctccc tatgcacatc atcctaaatt aataagcaca 2160ttagcgcttg
cccgcgccac gctatacaaa catttacggg agttaaaacc gcaaggaggg
2220ccatcaaatg gctga 2235
SEQ ID NO: 23; length 744; PRT; Artificial Sequence (Fusion protein)
Met His His His His His His Pro Lys Lys Lys Arg Lys Val Asp Pro 1
5 10 15 Lys Lys Lys Arg Lys Val Glu Asp Pro Lys Asp Met Ala Gln Leu
Val 20 25 30 Lys Ser Glu Leu Glu Glu Lys Lys Ser Glu Leu Arg His
Lys Leu Lys 35 40 45 Tyr Val Pro His Glu Tyr Ile Glu Leu Ile Glu
Ile Ala Arg Asn Pro 50 55 60 Thr Gln Asp Arg Ile Leu Glu Met Lys
Val Met Glu Phe Phe Met Lys 65 70 75 80 Val Tyr Gly Tyr Arg Gly Glu
His Leu Gly Gly Ser Arg Lys Pro Asp 85 90 95 Gly Ala Ile Tyr Thr
Val Gly Ser Pro Ile Asp Tyr Gly Val Ile Val 100 105 110 Asp Thr Lys
Ala Tyr Ser Gly Gly Tyr Asn Leu Pro Ile Gly Gln Ala 115 120 125 Asp
Glu Met Gln Arg Tyr Val Lys Glu Asn Gln Thr Arg Asn Lys His 130 135
140 Ile Asn Pro Asn Glu Trp Trp Lys Val Tyr Pro Ser Ser Val Thr Glu
145 150 155 160 Phe Lys Phe Leu Phe Val Ser Gly His Phe Lys Gly Asn
Tyr Lys Ala 165 170 175 Gln Leu Thr Arg Leu Asn Arg Lys Thr Asn Cys
Asn Gly Ala Val Leu 180 185 190 Ser Val Glu Glu Leu Leu Ile Gly Gly
Glu Met Ile Lys Ala Gly Thr 195 200 205 Leu Thr Leu Glu Glu Val Arg
Arg Lys Phe Asn Asn Gly Glu Ile Asn 210 215 220 Phe Ala Asp Pro Thr
Asn Arg Ala Lys Gly Leu Glu Ala Val Ser Val 225 230 235 240 Ala Ser
Met Asn Leu Leu Ile Asp Asn Trp Ile Pro Val Arg Pro Arg 245 250 255
Asn Gly Gly Lys Val Gln Ile Ile Asn Leu Gln Ser Leu Tyr Cys Ser 260
265 270 Arg Asp Gln Trp Arg Leu Ser Leu Pro Arg Asp Asp Met Glu Leu
Ala 275 280 285 Ala Leu Ala Leu Leu Val Cys Ile Gly Gln Ile Ile Ala
Pro Ala Lys 290 295 300 Asp Asp Val Glu Phe Arg His Arg Ile Met Asn
Pro Leu Thr Glu Asp 305 310 315 320 Glu Phe Gln Gln Leu Ile Ala Pro
Trp Ile Asp Met Phe Tyr Leu Asn 325 330 335 His Ala Glu His Pro Phe
Met Gln Thr Lys Gly Val Lys Ala Asn Asp 340 345 350 Val Thr Pro Met
Glu Lys Leu Leu Ala Gly Val Ser Gly Ala Thr Asn 355 360 365 Cys Ala
Phe Val Asn Gln Pro Gly Gln Gly Glu Ala Leu Cys Gly Gly 370 375 380
Cys Thr Ala Ile Ala Leu Phe Asn Gln Ala Asn Gln Ala Pro Gly Phe 385
390 395 400 Gly Gly Gly Phe Lys Ser Gly Leu Arg Gly Gly Thr Pro Val
Thr Thr 405 410 415 Phe Val Arg Gly Ile Asp Leu Arg Ser Thr Val Leu
Leu Asn Val Leu 420 425 430 Thr Leu Pro Arg Leu Gln Lys Gln Phe Pro
Asn Glu Ser His Thr Glu 435 440 445 Asn Gln Pro Thr Trp Ile Lys Pro
Ile Lys Ser Asn Glu Ser Ile Pro 450 455 460 Ala Ser Ser Ile Gly Phe
Val Arg Gly Leu Phe Trp Gln Pro Ala His 465 470 475 480 Ile Glu Leu
Cys Asp Pro Ile Gly Ile Gly Lys Cys Ser Cys Cys Gly 485 490 495 Gln
Glu Ser Asn Leu Arg Tyr Thr Gly Phe Leu Lys Glu Lys Phe Thr 500 505
510 Phe Thr Val Asn Gly Leu Trp Pro His Pro His Ser Pro Cys Leu Val
515 520 525 Thr Val Lys Lys Gly Glu Val Glu Glu Lys Phe Leu Ala Phe
Thr Thr 530 535 540 Ser Ala Pro Ser Trp Thr Gln Ile Ser Arg Val Val
Val Asp Lys Ile 545 550 555 560 Ile Gln Asn Glu Asn Gly Asn Arg Val
Ala Ala Val Val Asn Gln Phe 565 570 575 Arg Asn Ile Ala Pro Gln Ser
Pro Leu Glu Leu Ile Met Gly Gly Tyr 580 585 590 Arg Asn Asn Gln Ala
Ser Ile Leu Glu Arg Arg His Asp Val Leu Met 595 600 605 Phe Asn Gln
Gly Trp Gln Gln Tyr Gly Asn Val Ile Asn Glu Ile Val 610 615 620 Thr
Val Gly Leu Gly Tyr Lys Thr Ala Leu Arg Lys Ala Leu Tyr Thr 625 630
635 640 Phe Ala Glu Gly Phe Lys Asn Lys Asp Phe Lys Gly Ala Gly Val
Ser 645 650 655 Val His Glu Thr Ala Glu Arg His Phe Tyr Arg Gln Ser
Glu Leu Leu 660 665 670 Ile Pro Asp Val Leu Ala Asn Val Asn Phe Ser
Gln Ala Asp Glu Val 675 680 685 Ile Ala Asp Leu Arg Asp Lys Leu His
Gln Leu Cys Glu Met Leu Phe 690 695 700 Asn Gln Ser Val Ala Pro Tyr
Ala His His Pro Lys Leu Ile Ser Thr 705 710 715 720 Leu Ala Leu Ala
Arg Ala Thr Leu Tyr Lys His Leu Arg Glu Leu Lys 725 730 735 Pro Gln
Gly Gly Pro Ser Asn Gly 740
SEQ ID NO: 24; length 2235; DNA; Artificial Sequence (Fusion protein)
atgcatcacc atcatcacca cccgaaaaaa aagcgcaaag tggatccgaa
gaaaaaacgt 60aaagttgaag atccgaaaga catggctcaa ctggttaaaa gcgaactgga
agagaaaaaa 120agtgaactgc gccacaaact gaaatatgtg ccgcatgaat
atatcgagct gattgaaatt 180gcacgtaatc cgacccagga tcgtattctg
gaaatgaaag tgatggaatt ttttatgaaa 240gtgtacggct atcgcggtga
acatctgggt ggtagccgta aaccggatgg tgcaatttat 300accgttggta
gcccgattga ttatggtgtt attgttgata ccaaagccta tagcggtggt
360tataatctgc cgattggtca ggcagatgaa atggaacgtt atgtggaaga
aaatcagacc 420cgtgataaac atctgaatcc gaatgaatgg tggaaagttt
atccgagcag cgttaccgag 480tttaaattcc tgtttgttag cggtcacttc
aaaggcaact ataaagcaca gctgacccgt 540ctgaatcata ttaccaattg
taatggtgca gttctgagcg ttgaagaact gctgattggt 600ggtgaaatga
ttaaagcagg caccctgacc ctggaagaag ttcgtcgcaa atttaacaat
660ggcgaaatca actttgcgga tcccaccaac cgcgcgaaag gcctggaagc
ggtgagcgtg 720gcgagcatga atttgcttat tgataactgg attcctgtac
gcccgcgaaa cggggggaaa 780gtccaaatca taaatctgca atcgctatac
tgcagtagag atcagtggcg attaagtttg 840ccccgtgacg atatggaact
ggccgcttta gcactgctgg tttgcattgg gcaaattatc 900gccccggcaa
aagatgacgt tgaatttcga catcgcataa tgaatccgct cactgaagat
960gagtttcaac aactcatcgc gccgtggata gatatgttct accttaatca
cgcagaacat 1020ccctttatgc agaccaaagg tgtcaaagca aatgatgtga
ctccaatgga aaaactgttg 1080gctggggtaa gcggcgcgac gaattgtgca
tttgtcaatc aaccggggca gggtgaagca 1140ttatgtggtg gatgcactgc
gattgcgtta ttcaaccagg cgaatcaggc accaggtttt 1200ggtggtggtt
ttaaaagcgg tttacgtgga ggaacacctg taacaacgtt cgtacgtggg
1260atcgatcttc gttcaacggt gttactcaat gtcctcacat tacctcgtct
tcaaaaacaa 1320tttcctaatg aatcacatac ggaaaaccaa cctacctgga
ttaaacctat caagtccaat 1380gagtctatac ctgcttcgtc aattgggttt
gtccgtggtc tattctggca accagcgcat 1440attgaattat gcgatcccat
tgggattggt aaatgttctt gctgtggaca ggaaagcaat 1500ttgcgttata
ccggttttct taaggaaaaa tttaccttta cagttaatgg gctatggccc
1560catccgcatt ccccttgtct ggtaacagtc aagaaagggg aggttgagga
aaaatttctt 1620gctttcacca cctccgcacc atcatggaca caaatcagcc
gagttgtggt agataagatt 1680attcaaaatg aaaatggaaa tcgcgtggcg
gcggttgtga atcaattcag aaatattgcg 1740ccgcaaagtc ctcttgaatt
gattatgggg ggatatcgta ataatcaagc atctattctt 1800gaacggcgtc
atgatgtgtt gatgtttaat caggggtggc aacaatacgg caatgtgata
1860aacgaaatag tgactgttgg tttgggatat aaaacagcct tacgcaaggc
gttatatacc 1920tttgcagaag ggtttaaaaa taaagacttc aaaggggccg
gagtctctgt tcatgagact 1980gcagaaaggc atttctatcg acagagtgaa
ttattaattc ccgatgtact ggcgaatgtt 2040aatttttccc aggctgatga
ggtaatagct gatttacgag acaaacttca tcaattgtgt 2100gaaatgctat
ttaatcaatc tgtagctccc tatgcacatc atcctaaatt aataagcaca
2160ttagcgcttg cccgcgccac gctatacaaa catttacggg agttaaaacc
gcaaggaggg 2220ccatcaaatg gctga 2235
SEQ ID NO: 25; length 744; PRT; Artificial Sequence (Fusion protein)
Met His His His His His His Pro Lys Lys
Lys Arg Lys Val Asp Pro 1 5 10 15 Lys Lys Lys Arg Lys Val Glu Asp
Pro Lys Asp Met Ala Gln Leu Val 20 25 30 Lys Ser Glu Leu Glu Glu
Lys Lys Ser Glu Leu Arg His Lys Leu Lys 35 40 45 Tyr Val Pro His
Glu Tyr Ile Glu Leu Ile Glu Ile Ala Arg Asn Pro 50 55 60 Thr Gln
Asp Arg Ile Leu Glu Met Lys Val Met Glu Phe Phe Met Lys 65 70 75 80
Val Tyr Gly Tyr Arg Gly Glu His Leu Gly Gly Ser Arg Lys Pro Asp 85
90 95 Gly Ala Ile Tyr Thr Val Gly Ser Pro Ile Asp Tyr Gly Val Ile
Val 100 105 110 Asp Thr Lys Ala Tyr Ser Gly Gly Tyr Asn Leu Pro Ile
Gly Gln Ala 115 120 125 Asp Glu Met Glu Arg Tyr Val Glu Glu Asn Gln
Thr Arg Asp Lys His 130 135 140 Leu Asn Pro Asn Glu Trp Trp Lys Val
Tyr Pro Ser Ser Val Thr Glu 145 150 155 160 Phe Lys Phe Leu Phe Val
Ser Gly His Phe Lys Gly Asn Tyr Lys Ala 165 170 175 Gln Leu Thr Arg
Leu Asn His Ile Thr Asn Cys Asn Gly Ala Val Leu 180 185 190 Ser Val
Glu Glu Leu Leu Ile Gly Gly Glu Met Ile Lys Ala Gly Thr 195 200 205
Leu Thr Leu Glu Glu Val Arg Arg Lys Phe Asn Asn Gly Glu Ile Asn 210
215 220 Phe Ala Asp Pro Thr Asn Arg Ala Lys Gly Leu Glu Ala Val Ser
Val 225 230 235 240 Ala Ser Met Asn Leu Leu Ile Asp Asn Trp Ile Pro
Val Arg Pro Arg 245 250 255 Asn Gly Gly Lys Val Gln Ile Ile Asn Leu
Gln Ser Leu Tyr Cys Ser 260 265 270 Arg Asp Gln Trp Arg Leu Ser Leu
Pro Arg Asp Asp Met Glu Leu Ala 275 280 285 Ala Leu Ala Leu Leu Val
Cys Ile Gly Gln Ile Ile Ala Pro Ala Lys 290 295 300 Asp Asp Val Glu
Phe Arg His Arg Ile Met Asn Pro Leu Thr Glu Asp 305 310 315 320 Glu
Phe Gln Gln Leu Ile Ala Pro Trp Ile Asp Met Phe Tyr Leu Asn 325 330
335 His Ala Glu His Pro Phe Met Gln Thr Lys Gly Val Lys Ala Asn Asp
340 345 350 Val Thr Pro Met Glu Lys Leu Leu Ala Gly Val Ser Gly Ala
Thr Asn 355 360 365 Cys Ala Phe Val Asn Gln Pro Gly Gln Gly Glu Ala
Leu Cys Gly Gly 370 375 380 Cys Thr Ala Ile Ala Leu Phe Asn Gln Ala
Asn Gln Ala Pro Gly Phe 385 390 395 400 Gly Gly Gly Phe Lys Ser Gly
Leu Arg Gly Gly Thr Pro Val Thr Thr 405 410 415 Phe Val Arg Gly Ile
Asp Leu Arg Ser Thr Val Leu Leu Asn Val Leu 420 425 430 Thr Leu Pro
Arg Leu Gln Lys Gln Phe Pro Asn Glu Ser His Thr Glu 435 440 445 Asn
Gln Pro Thr Trp Ile Lys Pro Ile Lys Ser Asn Glu Ser Ile Pro 450 455
460 Ala Ser Ser Ile Gly Phe Val Arg Gly Leu Phe Trp Gln Pro Ala His
465 470 475 480 Ile Glu Leu Cys Asp Pro Ile Gly Ile Gly Lys Cys Ser
Cys Cys Gly 485 490 495 Gln Glu Ser Asn Leu Arg Tyr Thr Gly Phe Leu
Lys Glu Lys Phe Thr 500 505 510 Phe Thr Val Asn Gly Leu Trp Pro His
Pro His Ser Pro Cys Leu Val 515 520 525 Thr Val Lys Lys Gly Glu Val
Glu Glu Lys Phe Leu Ala Phe Thr Thr 530 535 540 Ser Ala Pro Ser Trp
Thr Gln Ile Ser Arg Val Val Val Asp Lys Ile 545 550 555 560 Ile Gln
Asn Glu Asn Gly Asn Arg Val Ala Ala Val Val Asn Gln Phe 565 570 575
Arg Asn Ile Ala Pro Gln Ser Pro Leu Glu Leu Ile Met Gly Gly Tyr 580
585 590 Arg Asn Asn Gln Ala Ser Ile Leu Glu Arg Arg His Asp Val Leu
Met 595 600 605 Phe Asn Gln Gly Trp Gln Gln Tyr Gly Asn Val Ile Asn
Glu Ile Val 610 615 620 Thr Val Gly Leu Gly Tyr Lys Thr Ala Leu Arg
Lys Ala Leu Tyr Thr 625 630 635 640 Phe Ala Glu Gly Phe Lys Asn Lys
Asp Phe Lys Gly Ala Gly Val Ser 645 650 655 Val His Glu Thr Ala Glu
Arg His Phe Tyr Arg Gln Ser Glu Leu Leu 660 665 670 Ile Pro Asp Val
Leu Ala Asn Val Asn Phe Ser Gln Ala Asp Glu Val 675 680 685 Ile Ala
Asp Leu Arg Asp Lys Leu His Gln Leu Cys Glu Met Leu Phe 690 695 700
Asn Gln Ser Val Ala Pro Tyr Ala His His Pro Lys Leu Ile Ser Thr 705
710 715 720 Leu Ala Leu Ala Arg Ala Thr Leu Tyr Lys His Leu Arg Glu
Leu Lys 725 730 735 Pro Gln Gly Gly Pro Ser Asn Gly 740
SEQ ID NO: 26; length 168; DNA; Artificial Sequence (Target plasmid)
gaattcacaa cggtgagcaa
gtcactgttg gcaagccagg atctgaacaa taccgtcttg 60ctttcgagcg ctagctctag
aactagtcct cagcctaggc ctcgttccga agctgtcttt 120cgctgctgag
ggtgacgatc ccgcataggc ggcctttaac tcggatcc 168
SEQ ID NO: 27; length 163; DNA; Artificial Sequence (Target plasmid)
gaattcacaa cggtgagcaa gtcactgttg
gcaagccagg atctgaacaa taccgtcttt 60tcgagcgcta gctctagaac tagtcctcag
cctaggcctc gttcaagctg tctttcgctg 120ctgagggtga cgatcccgca
taggcggcct ttaactcgga tcc 163
SEQ ID NO: 28; length 158; DNA; Artificial Sequence (Target plasmid)
gaattcacaa cggtgagcaa gtcactgttg gcaagccagg atctgaacaa
taccgtcttc 60gagcgctagc tctagaacta gtcctcagcc taggcctcga agctgtcttt
cgctgctgag 120ggtgacgatc ccgcataggc ggcctttaac tcggatcc 158
SEQ ID NO: 29; length 153; DNA; Artificial Sequence (Target plasmid)
gaattcacaa
cggtgagcaa gtcactgttg gcaagccagg atctgaacaa taccgtcttg 60cgctagctct
agaactagtc ctcagcctag gcctaagctg tctttcgctg ctgagggtga
120cgatcccgca taggcggcct ttaactcgga tcc 153
SEQ ID NO: 30; length 148; DNA; Artificial Sequence (Target plasmid)
gaattcacaa cggtgagcaa gtcactgttg
gcaagccagg atctgaacaa taccgtcttg 60ctagctctag aactagtcct cagcctagga
agctgtcttt cgctgctgag ggtgacgatc 120ccgcataggc ggcctttaac tcggatcc 148
SEQ ID NO: 31; length 143; DNA; Artificial Sequence (Target plasmid)
gaattcacaa
cggtgagcaa gtcactgttg gcaagccagg atctgaacaa taccgtcttc 60tctagaacta
gtcctcagcc taggaagctg tctttcgctg ctgagggtga cgatcccgca
120taggcggcct ttaactcgga tcc 143
SEQ ID NO: 32; length 41; DNA; Artificial Sequence (Cleavage product)
cttgcgctag ctctagaact agtcctcagc ctaggcctaa g
41
SEQ ID NO: 33; length 41; DNA; Artificial Sequence (Cleavage product)
cttaggccta
ggctgaggac tagttctaga gctagcgcaa g 41
SEQ ID NO: 34; length 45; DNA; Artificial Sequence (Fill-in ligation)
cttgcgctag ctctagaact agctagtcct
cagcctaggc ctaag 45
SEQ ID NO: 35; length 45; DNA; Artificial Sequence (Fill-in ligation)
cttaggccta ggctgaggac tagctagttc tagagctagc gcaag 45
SEQ ID NO: 36; length 1100; DNA; Homo sapiens
ggtggaacaa gatggattat caagtgtcaa gtccaatcta tgacatcaat
tattatacat 60cggagccctg ccaaaaaatc aatgtgaagc aaatcgcagc ccgcctcctg
cctccgctct 120actcactggt gttcatcttt ggttttgtgg gcaacatgct
ggtcatcctc atcctgataa 180actgcaaaag gctgaagagc atgactgaca
tctacctgct caacctggcc atctctgacc 240tgtttttcct tcttactgtc
cccttctggg ctcactatgc tgccgcccag tgggactttg 300gaaatacaat
gtgtcaactc ttgacagggc tctattttat aggcttcttc tctggaatct
360tcttcatcat cctcctgaca atcgataggt acctggctgt cgtccatgct
gtgtttgctt 420taaaagccag gacggtcacc tttggggtgg tgacaagtgt
gatcacttgg gtggtggctg 480tgtttgcgtc tctcccagga atcatcttta
ccagatctca aaaagaaggt cttcattaca 540cctgcagctc tcattttcca
tacagtcagt atcaattctg gaagaatttc cagacattaa 600agatagtcat
cttggggctg gtcctgccgc tgcttgtcat ggtcatctgc tactcgggaa
660tcctaaaaac tctgcttcgg tgtcgaaatg agaagaagag gcacagggct
gtgaggctta 720tcttcaccat catgattgtt tattttctct tctgggctcc
ctacaacatt gtccttctcc 780tgaacacctt ccaggaattc tttggcctga
ataattgcag tagctctaac aggttggacc 840aagctatgca ggtgacagag
actcttggga tgacgcactg ctgcatcaac cccatcatct 900atgcctttgt
cggggagaag ttcagaaact acctcttagt cttcttccaa aagcacattg
960ccaaacgctt ctgcaaatgc tgttctattt tccagcaaga ggctcccgag
cgagcaagct 1020cagtttacac ccgatccact ggggagcagg aaatatctgt
gggcttgtga cacggactca 1080agtgggctgg tgacccagtc
110037424DNAArtificial SequenceCRISPR array red1 37ccatggtaat
acgactcact atagggagaa ttagctgatc tttaataata aggaaatgtt 60acattaaggt
tggtgggttg tttttatggg aaaaaatgct ttaagaacaa atgtatactt
120ttagagagtt ccccgcgcca gcggggataa accgcaaaca cagcatggac
gacagccagg 180tacctagagt tccccgcgcc agcggggata aaccgcaaac
acagcatgga cgacagccag 240gtacctagag ttccccgcgc cagcggggat
aaaccgcaaa cacagcatgg acgacagcca 300ggtacctaga gttccccgcg
ccagcgggga taaaccgaaa acaaaaggct cagtcggaag 360actgggcctt
ttgttttaac cccttggggc ctctaaacgg gtcttgaggg gttttttggg 420tacc
42438424DNAArtificial SequenceCRISPR array red2 38ccatggtaat
acgactcact atagggagaa ttagctgatc tttaataata aggaaatgtt 60acattaaggt
tggtgggttg tttttatggg aaaaaatgct ttaagaacaa atgtatactt
120ttagagagtt ccccgcgcca gcggggataa accgtgtgat cacttgggtg
gtggctgtgt 180ttgcgtgagt tccccgcgcc agcggggata aaccgtgtga
tcacttgggt ggtggctgtg 240tttgcgtgag ttccccgcgc cagcggggat
aaaccgtgtg atcacttggg tggtggctgt 300gtttgcgtga gttccccgcg
ccagcgggga taaaccgaaa acaaaaggct cagtcggaag 360actgggcctt
ttgttttaac cccttggggc ctctaaacgg gtcttgaggg gttttttggg 420tacc
4243943DNAArtificial Sequenceplasmid 39aaggatgcca gtgataagtg
gaatgccatg tgggctgtca aaa 434043DNAArtificial Sequenceplasmid
mutant 1 40aaggatgcga gtgataagtg gaatgccatg tgggctgtca aaa
434119DNAArtificial Sequenceplasmid mutant 2 41gccatgtggg ctgtcaaaa
194223DNAArtificial Sequenceplasmid mutant 3 42gaatgccatg
tgggctgtca aaa 23431604PRTStreptomyces sp. SPB78 43Met Pro Asp Gln
Leu Asn Ala Pro Thr Pro Leu Gly Asp Arg Leu Thr 1 5 10 15 Gly Ala
Val Arg Thr Val Trp Ala Lys His Asp Arg Asp Thr Gly Lys 20 25 30
Trp Leu Pro Leu Trp Arg His Met Thr Asp Ser Ala Ala Val Ala Gly 35
40 45 Leu Leu Trp Asp His Trp Leu Pro Arg Asn Ile Lys Asp Leu Ile
Ala 50 55 60 Glu Pro Leu Pro Gly Gly Val Ala Asp Ala Arg Ser Leu
Cys Val Trp 65 70 75 80 Leu Ala Gly Thr His Asp Ile Gly Lys Ala Thr
Pro Ala Phe Ala Cys 85 90 95 Gln Val Asp Glu Leu Ala Gly Val Met
Thr Ala Ala Gly Leu Asp Met 100 105 110 Arg Thr Ser Lys Gln Leu Gly
Glu Asp Arg Arg Met Ala Pro His Gly 115 120 125 Leu Ala Gly Gln Val
Leu Leu Gln Glu Trp Leu Glu Glu Arg Arg Gly 130 135 140 Trp Thr His
Arg Ala Ser Ala Gln Phe Ala Val Val Ala Gly Gly His 145 150 155 160
His Gly Val Pro Pro Asp His Met Gln Leu His Asn Leu Asp Ala His 165
170 175 Pro Glu Leu Leu Arg Thr Gln Gly Leu Ala Glu Ala Gln Trp Arg
Ala 180 185 190 Val Gln Asp Glu Leu Leu Asp Ala Cys Ala Leu Val Phe
Gly Val Glu 195 200 205 Glu Arg Leu Asp Ala Trp Arg Thr Val Lys Leu
Pro Gln Thr Val Gln 210 215 220 Val Leu Leu Thr Ala Thr Val Ile Val
Ser Asp Trp Ile Ala Ser Asn 225 230 235 240 Pro Asp Leu Phe Pro Tyr
Phe Pro Glu Glu His Pro Arg Glu Glu Ala 245 250 255 Glu Arg Val Ala
Ala Ala Trp Gln Gly Leu Leu Leu Pro Ala Pro Trp 260 265 270 Glu Pro
Glu Glu Pro Ser Ala Pro Ala Ala Glu Phe Tyr Ala Ser Arg 275 280 285
Phe Ala Leu Pro Pro Gly Ala Val Val Arg Pro Val Gln Glu Gln Ala 290
295 300 Leu Ala Met Ala Arg Asp Met Glu Arg Pro Gly Met Leu Ile Ile
Glu 305 310 315 320 Ala Pro Met Gly Glu Gly Lys Thr Glu Ala Ala Leu
Ala Val Ala Glu 325 330 335 Val Phe Ala Ala Arg Ser Gly Ala Gly Gly
Cys Tyr Val Ala Leu Pro 340 345 350 Thr Met Ala Thr Ser Asn Ala Met
Phe Pro Arg Leu Leu Arg Trp Leu 355 360 365 Asp Arg Leu Pro Arg Ala
Asp Val Ser Gly Gly Arg Asp His Glu Gln 370 375 380 Arg Ser Val Leu
Leu Ala His Ala Lys Ser Ala Leu Gln Glu Asp Tyr 385 390 395 400 Ala
Thr Leu Met Arg Glu Ser His Arg Thr Ile Ala Ala Val Asp Ala 405 410
415 Tyr Gly Asp Asp Ser Arg Pro Arg Lys Gly Arg Pro Ala Ala Asp Gly
420 425 430 Val Arg Arg Lys Ala Pro Ala Glu Leu Val Ala His Gln Trp
Leu Arg 435 440 445 Gly Arg Lys Lys Gly Leu Leu Ala Ser Phe Ala Val
Gly Thr Ile Asp 450 455 460 Gln Leu Leu Met Ala Gly Leu Lys Ser Arg
His Leu Ala Leu Arg His 465 470 475 480 Leu Ala Met Ala Gly Lys Val
Val Val Ile Asp Glu Val His Ala Tyr 485 490 495 Asp Thr Tyr Met Asn
Ala Tyr Leu Asp Arg Val Leu Ala Trp Leu Gly 500 505 510 Glu Tyr Arg
Val Pro Val Val Val Leu Ser Ala Thr Leu Pro Ala Arg 515 520 525 Arg
Arg Gly Glu Leu Ala Ala Ala Tyr Thr Gly Glu Asp Ala Gln Ala 530 535
540 Leu Thr Glu Ala Thr Gly Tyr Pro Leu Leu Thr Ala Val Val Pro Gly
545 550 555 560 Arg Glu Ala Val Gln Phe Val Ala Ala Ala Ser Gly Arg
Gly Ser Asp 565 570 575 Val Leu Leu Glu Lys Leu Asp Asp Asp Asp Glu
Ala Leu Ala Asp Arg 580 585 590 Leu Asp Thr Asp Leu Ala Asp Gly Gly
Cys Ala Leu Val Val Arg Asn 595 600 605 Thr Val Asp Arg Val Met Asp
Thr Ala Ser Val Leu Arg Glu Arg Phe 610 615 620 Gly Ala Asp His Val
Thr Val Ala His Ala Arg Phe Val Asp Leu Asp 625 630 635 640 Arg Ala
Arg Lys Asp Ser Glu Leu Leu Ala Arg Phe Gly Pro Pro Asp 645 650 655
Pro Asp Gly Gly Ser Pro Gln Arg Pro Arg Asn Ala His Ile Val Val 660
665 670 Ala Ser Gln Val Ala Glu Gln Ser Leu Asp Val Asp Phe Asp Leu
Leu 675 680 685 Val Ser Asp Leu Cys Pro Val Asp Leu Leu Leu Gln Arg
Met Gly Arg 690 695 700 Leu His Arg His Pro Arg Gly Arg Asp Gln Glu
Arg Arg Pro Ala Arg 705 710 715 720 Leu Arg Gln Ala Arg Cys Leu Val
Thr Gly Val Gly Trp Asp Thr Ser 725 730 735 Pro Ala Pro Glu Ala Asp
Glu Gly Ser Arg Ala Ile Tyr Gly Ala Tyr 740 745 750 Ser Leu Leu Arg
Ser Leu Ala Val Leu Ala Pro His Leu Gly Thr Ala 755 760 765 Gly Ala
Ala Gly His Pro Leu Arg Leu Pro Glu Asp Ile Ser Pro Leu 770 775 780
Val Arg Arg Ala Tyr Gly Glu Glu Asp Pro Cys Pro Pro Glu Trp Glu 785
790 795 800 Pro Val Leu Ala Pro Ala Arg Asp Lys Tyr Arg Thr Ala Arg
Glu Arg 805 810 815 Gln Ser Gln Lys Ala Glu Val Phe Arg Leu Asp Glu
Val Arg Lys Ala 820 825 830 Gly Arg Pro Leu Ile Gly Trp Ile Asp Ala
Gly Val Gly Asp Ala Asp 835 840 845 Asp Thr Pro Val Gly Arg Ala Gln
Val Arg Asp Thr Lys Glu Gly Leu 850 855 860 Glu Val Leu Val Val Arg
Arg Arg Ala Asp Gly Ser Leu Cys Thr Leu 865 870 875 880 Pro Trp Leu
Asp Lys Gly Arg Gly Gly Leu Glu Leu Pro Val Asp Ala 885 890 895 Val
Pro Ser Ala Leu Ala Ala Arg Ala Val Ala Ala Ser Gly Leu Arg 900 905
910 Leu Pro Tyr His Phe Thr Ser Ser Pro Gln Thr Leu Asp Arg Thr Leu
915 920 925 Ala Glu Leu Glu Glu Leu Tyr Val Pro Ala Trp Gln Glu Lys
Glu Ser 930 935 940 His Trp Ile Ala Gly Glu Leu Ile Leu Ala Leu Asp
Glu Glu Gly Arg 945 950 955 960 Ala Ala Leu Ala Gly Gln Gln Leu Val
Tyr Asn Pro Glu Glu Gly Leu 965 970 975 Leu Val Ala Ser Ala Asp Ala
Asn Thr Glu Ala Thr Ser Gly Arg Val 980 985 990 Met Asp Gly Lys Pro
Ser Ser Ala Gly Asp Gly Lys Pro Gly His Ala 995 1000 1005 Ala Asp
Gly Asn Arg Ala Arg Thr Thr Val Gly Gln Ser Pro Ala 1010 1015 1020
Asp Arg Gln Thr His Gln Pro Pro Glu Gly Glu Arg His Pro Val 1025
1030 1035 Pro Pro Ser Ala Ala Pro Pro Pro Ala Arg Pro Ser Phe Asp
Leu 1040 1045 1050 Thr Ser Arg Pro Trp Leu Pro Val Leu Leu Lys Asp
Gly Ser Glu 1055 1060 1065 Arg Glu Leu Ser Leu Pro Glu Val Phe Asp
Gln Ala Arg Asp Ile 1070 1075 1080 Arg Arg Leu Val Gly Asp Leu Pro
Thr Gln Asp Phe Ala Leu Thr 1085 1090 1095 Arg Met Leu Leu Ala Leu
Leu Tyr Asp Ala Leu Ser Glu Pro Gly 1100 1105 1110 Gly Asp Met Ala
Pro Ala Asp Thr Asp Ala Trp Glu Glu Leu Trp 1115 1120 1125 Leu Ser
Gln Ser Ala Tyr Ala Ala Pro Val Ala Ala Tyr Leu His 1130 1135 1140
Arg Tyr Arg Glu Arg Phe Asp Leu Leu His Pro Glu Ser Pro Phe 1145
1150 1155 Phe Gln Thr Pro Gly Leu Arg Thr Ala Lys Asn Glu Val Phe
Ser 1160 1165 1170 Leu Asn Arg Leu Val Ala Asp Val Pro Asn Gly Asp
Pro Phe Phe 1175 1180 1185 Ser Met Arg Arg Pro Gly Val Asp Arg Leu
Gly Phe Ala Glu Ala 1190 1195 1200 Ala Arg Trp Leu Val His Ala Gln
Ala Tyr Asp Thr Ser Gly Ile 1205 1210 1215 Lys Thr Gly Ala Val Gly
Asp Pro Arg Val Lys Ala Gly Lys Gly 1220 1225 1230 Tyr Pro Gln Gly
Pro Ala Trp Ala Gly Asn Leu Gly Gly Val Leu 1235 1240 1245 Leu Glu
Gly Asp Asn Leu His Glu Thr Leu Leu Leu Asn Leu Ile 1250 1255 1260
Ala Gly Asp Thr Pro Gly Val His Ala Ala Glu Val Asp Arg Pro 1265
1270 1275 Ala Trp Arg Ala Glu Pro Ser Gly Pro Ala Pro Ala Pro Asp
Leu 1280 1285 1290 Gly Leu Arg Pro Tyr Gly Leu Arg Asp Leu Tyr Thr
Trp Gln Ser 1295 1300 1305 Arg Arg Ile Arg Leu His His Asp Ala Asp
Gly Val His Gly Val 1310 1315 1320 Val Leu Ala Tyr Gly Asp Ser Leu
Glu Pro His Asn Arg His Gly 1325 1330 1335 His Glu Pro Met Thr Ser
Trp Arg Arg Ser Pro Thr Gln Glu Lys 1340 1345 1350 Lys Arg Gln Glu
Asn Leu Val Tyr Leu Pro Arg Glu His Asp Pro 1355 1360 1365 Ser Arg
Leu Ala Trp Arg Gly Met Asp Gly Leu Leu Ala Gly Arg 1370 1375 1380
Glu Thr Gly Ser Ala Gln Gly Pro Asp Gly Ala Asp Arg Leu Ala 1385
1390 1395 Pro Lys Val Val Gln Trp Ala Ala Gln Leu Thr Thr Glu Gly
Leu 1400 1405 1410 Leu Pro Arg Gly Tyr Leu Ile Arg Thr Arg Val Ile
Gly Ala Arg 1415 1420 1425 Tyr Gly Thr Gln Gln Ser Val Ile Asp Glu
Val Val Asp Asp Gly 1430 1435 1440 Val Leu Met Pro Ala Val Leu Leu
His Glu Ala Asp Arg Arg Tyr 1445 1450 1455 Gly Asp Lys Ala Val Asp
Ala Leu His Asp Ala Glu Lys Ala Val 1460 1465 1470 Gly Ala Leu Ala
Gln Leu Ala Ala Asp Leu Ala Leu Ala Val Gly 1475 1480 1485 Thr Asp
Pro Glu Pro Gly Arg Asn Thr Ala Arg Asp Leu Gly Phe 1490 1495 1500
Gly Thr Leu Asp Thr His Tyr Arg Arg Trp Leu Arg Glu Leu Gly 1505
1510 1515 Gly Thr Ser Asp Pro Glu Glu His Arg Asp Arg Trp Lys Gln
Glu 1520 1525 1530 Val Arg Arg Leu Val Ala Glu Leu Gly Glu Arg Leu
Leu Asp Gly 1535 1540 1545 Ala Gly Pro Ala Ala Trp Glu Gly Arg Leu
Val Glu Thr Gly Lys 1550 1555 1560 Gly Thr Arg Trp Leu Asn Asp Ala
Ala Ala Glu Leu Arg Phe Arg 1565 1570 1575 Thr Arg Leu Arg Glu Phe
Leu Thr Thr Ala Pro Asp Thr Pro Thr 1580 1585 1590 Ser Pro Arg Pro
Ala Pro Val Glu Ser Pro Ala 1595 1600

SEQ ID NO: 44 | 1559 | PRT | Streptomyces griseus
Met Ser Asn Thr Pro Met Ser Arg Asp His Pro Glu Ser Leu Ser Ala 1
5 10 15 Tyr Ala Arg Leu Ser Pro Val Ser Arg Thr Ala Trp Gly Lys His
Asp 20 25 30 Arg Gln Thr Glu Gln Trp Leu Pro Leu Trp Arg His Met
Ala Asp Ser 35 40 45 Ala Ala Val Ala Glu Arg Leu Trp Asp Gln Trp
Val Pro Asp Asn Val 50 55 60 Lys Ala Leu Ile Ala Asp Ala Phe Pro
Gln Gly Ala Gln Asp Ala Arg 65 70 75 80 Arg Val Ala Val Phe Leu Ala
Cys Val His Asp Ile Gly Lys Ala Thr 85 90 95 Pro Ala Phe Ala Cys
Gln Val Asp Gly Leu Ala Asp Arg Met Arg Ala 100 105 110 Ala Gly Leu
Ser Met Pro Tyr Leu Lys Gln Phe Gly Leu Asp Arg Arg 115 120 125 Met
Ala Pro His Gly Leu Ala Gly Gln Leu Leu Leu Gln Glu Trp Leu 130 135
140 Ala Glu Arg Phe Gly Trp Ser Glu Arg Ala Ser Gly Gln Phe Ala Val
145 150 155 160 Val Ala Gly Gly His His Gly Thr Pro Pro Asp His Gln
His Ile His 165 170 175 Asp Leu Gly Leu Arg Pro His Leu Leu Arg Thr
Ala Gly Glu Ser Gln 180 185 190 Asp Thr Trp Arg Ser Val Gln Asp Glu
Leu Met Asp Ala Cys Ala Val 195 200 205 Arg Ala Gly Val Gly Gly Arg
Phe Gly Ala Trp Arg Ser Val Arg Leu 210 215 220 Pro Gln Pro Val Gln
Val Val Leu Thr Ala Ile Val Ile Val Ser Asp 225 230 235 240 Trp Ile
Ala Ser Ser Ser Glu Leu Phe Pro Tyr Asp Pro Ala Ser Trp 245 250 255
Ser Pro Val Gly Pro Glu Gly Glu Gly Arg Arg Leu Thr Ala Ala Trp 260
265 270 Gly Gly Leu Asp Leu Pro Gly Pro Trp Arg Ala Asp Gln Pro Asp
Cys 275 280 285 Thr Ala Ala Glu Leu Phe Gly Lys Arg Phe Asp Leu Pro
Glu Gly Ala 290 295 300 Gly Val Arg Pro Val Gln Glu Glu Ala Val Arg
Val Ala Gln Glu Leu 305 310 315 320 Pro Gly Pro Gly Leu Leu Ile Ile
Glu Ala Pro Met Gly Glu Gly Lys 325 330 335 Thr Glu Ala Ala Phe Ala
Ala Ala Glu Ile Leu Ala Ala Arg Thr Gly 340 345 350 Ala Gly Gly Cys
Leu Val Ala Leu Pro Thr Arg Ala Thr Gly Asp Ala 355 360 365 Met Phe
Pro Arg Leu Leu Arg Trp Leu Glu Arg Leu Pro Ser Asp Gly 370 375 380
Pro Arg Ser Val Val Leu Ala His Ala Lys Ala Ala Leu Asn Glu Val 385
390 395 400 Trp Ala Gly Met Thr Lys Ala Asp Arg Arg Lys Ile Thr Ala
Val Asp 405 410
415 Leu Asp Ser Gln Val Glu Asp Val Ser Ser Ala Gly Gly Ala Arg Arg
420 425 430 Ala Asn Pro Ala Ser Leu His Ala His Gln Trp Leu Arg Gly
Arg Lys 435 440 445 Lys Ala Leu Leu Ser Ser Phe Ala Val Gly Thr Val
Asp Gln Val Leu 450 455 460 Phe Ala Gly Leu Lys Ser Arg His Leu Ala
Leu Arg His Leu Ala Val 465 470 475 480 Ala Gly Lys Val Val Ile Val
Asp Glu Val His Ala Tyr Asp Ala Tyr 485 490 495 Met Ser Ala Tyr Leu
Asp Arg Val Leu Glu Trp Leu Ala Ala Tyr Arg 500 505 510 Val Pro Val
Val Met Leu Ser Ala Thr Leu Pro Ala His Arg Arg Arg 515 520 525 Glu
Leu Ala Ala Ala Tyr Ala Gly Glu Glu Thr Pro Glu Leu Ala Asp 530 535
540 Ala Leu Ala Leu Pro Asp Asp Ala Tyr Pro Leu Ile Thr Ala Val Ala
545 550 555 560 Pro Gly Gly Leu Val Leu Thr Ala Arg Pro Glu Pro Ala
Ser Gly Arg 565 570 575 Arg Thr Glu Val Val Leu Glu Arg Leu Gly Asp
Gly Pro Ala Leu Leu 580 585 590 Ala Ala Arg Leu Asp Glu Glu Leu Arg
Asp Gly Gly Cys Ala Leu Val 595 600 605 Val Arg Asn Thr Val Asp Arg
Val Leu Glu Ala Ala Glu His Leu Arg 610 615 620 Ala His Phe Gly Ala
Glu Ala Val Thr Val Ala His Ser Arg Phe Val 625 630 635 640 Ala Ala
Asp Arg Ala Arg Asn Asp Thr Val Leu Arg Glu Arg Phe Gly 645 650 655
Pro Gly Gly Asp Arg Pro Ala Gly Pro His Ile Val Val Ala Ser Gln 660
665 670 Val Val Glu Gln Ser Leu Asp Ile Asp Phe Asp Leu Leu Val Thr
Asp 675 680 685 Leu Ala Pro Val Asp Leu Val Leu Gln Arg Met Gly Arg
Leu His Arg 690 695 700 His Pro Arg Thr Arg Pro Pro Arg Leu Ser Arg
Ala Arg Cys Leu Ile 705 710 715 720 Thr Gly Val Glu Asp Trp His Ala
Glu Arg Pro Val Pro Val Arg Gly 725 730 735 Ser Leu Ala Val Tyr Gln
Gly Pro His Thr Leu Leu Arg Ala Leu Ala 740 745 750 Val Leu Gly Pro
His Leu Asp Gly Val Pro Leu Val Leu Pro Asp His 755 760 765 Ile Ser
Pro Leu Val Gln Ala Ala Tyr Asp Glu Arg Pro Val Gly Pro 770 775 780
Ala His Trp Ala Pro Val Leu Asp Glu Ala Arg Arg Gln Tyr Leu Thr 785
790 795 800 Arg Leu Ala Glu Lys Arg Glu Arg Ala Asp Val Phe Arg Leu
Gly Pro 805 810 815 Val Arg Arg Pro Gly Arg Pro Leu Phe Gly Trp Leu
Asp Gly Asn Ala 820 825 830 Gly Asp Ala Asp Asp Ser Arg Thr Gly Arg
Ala Gln Val Arg Asp Ser 835 840 845 Glu Glu Ser Leu Glu Val Leu Val
Val Gln Arg Arg Ala Asp Gly Arg 850 855 860 Leu Thr Thr Val Ser Trp
Leu Asp Gly Gly Arg Gly Gly Leu Asp Leu 865 870 875 880 Pro Glu His
Ala Pro Pro Pro Pro Arg Ala Ala Glu Val Val Ala Ala 885 890 895 Cys
Ala Leu Thr Leu Pro Arg Ser Leu Thr His Pro Gly Val Ile Asp 900 905
910 Arg Thr Ile Ala Glu Leu Glu Arg Phe Val Val Pro Ala Trp Gln Val
915 920 925 Lys Glu Cys Pro Trp Leu Ala Gly Glu Leu Leu Leu Val Leu
Asp Glu 930 935 940 Asp Cys Gln Thr Arg Leu Ser Gly Leu Glu Val His
Tyr Ser Thr Asp 945 950 955 960 Gln Gly Leu Arg Val Gly Ser Val Gly
Thr Arg Ser Thr Asn Arg Ala 965 970 975 Lys Gly Leu Glu Ala Val Ser
Val Ala Ser Phe Asp Leu Val Ser Arg 980 985 990 Pro Trp Leu Pro Val
Gln Tyr Glu Asp Gly Ala Thr Gly Glu Leu Ser 995 1000 1005 Leu Arg
Glu Val Phe Ala Arg Ala Gly Glu Val Arg Arg Leu Val 1010 1015 1020
Gly Asp Leu Pro Thr Gln Glu Leu Ala Leu Leu Arg Leu Leu Leu 1025
1030 1035 Ala Ile Leu Tyr Asp Ala Tyr Asp Glu Ala Pro Gly Arg Ser
Gly 1040 1045 1050 Gly Ala Pro Ala Gln Leu Glu Asp Trp Glu Ala Leu
Trp Asp Glu 1055 1060 1065 Pro Asp Ser Phe Ala Val Val Ala Gly Tyr
Leu Asp Arg His Arg 1070 1075 1080 Asp Arg Phe Asp Leu Leu His Pro
Glu Arg Pro Phe Phe Gln Val 1085 1090 1095 Ala Gly Leu His Thr Gln
Lys His Glu Val Ala Ser Leu Asn Arg 1100 1105 1110 Ile Val Ala Asp
Val Pro Asn Gly Glu Ala Phe Phe Ser Met Arg 1115 1120 1125 Arg Pro
Gly Val His Arg Leu Gly Leu Ala Glu Ala Ala Arg Trp 1130 1135 1140
Leu Val His Thr His Ala Tyr Asp Ala Ser Gly Ile Lys Ser Gly 1145
1150 1155 Met Glu Gly Asp Ala Arg Val Lys Gly Gly Lys Val Tyr Pro
Gln 1160 1165 1170 Gly Val Gly Trp Val Gly Gly Leu Gly Gly Val Phe
Ala Glu Gly 1175 1180 1185 Ala Ser Leu Arg Glu Thr Leu Leu Leu Asn
Leu Ile Pro Thr Asp 1190 1195 1200 Glu Asp Ile Leu Thr Ser Glu Pro
Lys Ala Asp Leu Pro Val Trp 1205 1210 1215 Arg Arg Glu Thr Pro Pro
Gly Pro Gly Val Val Glu Gly Asp Pro 1220 1225 1230 Ser Ala Pro Arg
Pro Ala Gly Pro Arg Asp Leu Tyr Thr Trp Gln 1235 1240 1245 Ser Arg
Arg Leu Leu Leu His Thr Glu Gly Ser Asp Ala Ile Gly 1250 1255 1260
Val Val Leu Gly Tyr Gly Asp Pro Leu Ser Pro Ala Asn Arg Gln 1265
1270 1275 Lys Thr Glu Pro Met Thr Gly Trp Arg Arg Ser Pro Ala Gln
Glu 1280 1285 1290 Lys Lys Leu Gly Arg Pro Leu Val Tyr Leu Pro Arg
Gln His Asp 1295 1300 1305 Pro Gly Arg Ala Ala Trp Arg Gly Leu Ala
Ser Leu Leu Tyr Pro 1310 1315 1320 Gln Gly Glu Asp Gly Asp Thr Thr
Gly Arg Gly Thr Asp Arg Ser 1325 1330 1335 Arg Pro Ala Gly Ile Val
Arg Trp Leu Ala Leu Leu Ser Thr Glu 1340 1345 1350 Gly Val Leu Pro
Lys Gly Ser Leu Ile Arg Thr Arg Leu Val Gly 1355 1360 1365 Ala Val
Tyr Gly Thr Gln Gln Ser Val Val Asp Asp Val Val Asp 1370 1375 1380
Asp Ser Ile Ala Leu Pro Val Val Leu Leu His Gln Asp Arg Arg 1385
1390 1395 Leu His Gly Ala Val Ala Val Asp Ala Val Ala Asp Ala Glu
Arg 1400 1405 1410 Ala Val Ser Ala Leu Gly His Leu Ala Gly Asn Leu
Ala Arg Ala 1415 1420 1425 Ser Gly Ser Glu Ala Gly Pro Ala Thr Ala
Thr Ala Arg Asp Gln 1430 1435 1440 Gly Phe Gly Ala Leu Asp Gly Pro
Tyr Arg Arg Trp Leu Val Asp 1445 1450 1455 Leu Ala Glu Asp Thr Asp
Leu Glu Arg Ala Arg Ala Ala Trp Arg 1460 1465 1470 Asp Thr Val Arg
Leu Val Val Leu Gly Ile Gly Arg Glu Leu Leu 1475 1480 1485 Asp Ala
Ala Gly Arg Ala Ala Ala Glu Gly Arg Val Ile Glu Leu 1490 1495 1500
Pro Gly Val Gly Lys Arg Trp Ile Asp Ser Ser Arg Ala Asp Leu 1505
1510 1515 Trp Phe Arg Thr Arg Ile Asn Arg Val Leu Pro Arg Pro Leu
Pro 1520 1525 1530 Glu Ala His Ala Pro Thr Ala Asp Ile His Ala Gly
His Ala Val 1535 1540 1545 Arg Ala Asp Glu Ala Leu Ser Glu Glu Thr
Val 1550 1555

SEQ ID NO: 45 | 1540 | PRT | Catenulispora acidiphila
Met Phe Asn Val
Gly Ser Thr Arg Cys Trp Gly Asp Gly Gly Leu Arg 1 5 10 15 Asn Ala
Ala Glu Asp Leu Ser Ala Ala Thr Arg Ser Ala Trp Ala Lys 20 25 30
Ser Asp Pro Asp Ser Gly Gln Ser Leu Ser Leu Ile Arg His Leu Ala 35
40 45 Asp Ser Ala Ala Ile Ala Glu His Leu Trp Asp Gln Trp Leu Pro
Asp 50 55 60 His Val Lys Ser Leu Ile Ala Glu Gly Leu Pro Glu Gly
Leu Val Asp 65 70 75 80 Gly Arg Thr Leu Ala Val Trp Leu Ala Gly Thr
His Asp Ile Gly Lys 85 90 95 Leu Thr Pro Ala Phe Ala Cys Gln Cys
Glu Pro Leu Ala Gln Ala Met 100 105 110 Arg Glu Cys Gly Leu Asp Met
Pro Thr Arg Thr Gln Phe Gly Asp Asp 115 120 125 Arg Arg Val Ala Pro
His Gly Leu Ala Gly Gln Val Leu Leu Arg Glu 130 135 140 Trp Leu Met
Glu Arg His Gly Trp Ser Gly Arg Ser Ala Asp Ala Phe 145 150 155 160
Thr Val Ile Ala Gly Gly His His Gly Val Pro Pro Ser Tyr Ser Gln 165
170 175 Leu His Asp Leu Asp Ala Tyr Pro Glu Leu Leu Arg Thr Pro Gly
Ala 180 185 190 Ser Glu Gly Ile Trp Lys Ser Ser Gln His Glu Leu Leu
Asp Ala Cys 195 200 205 Ala Val Met Thr Gly Ala Ser Ser Arg Leu Ala
His Trp Arg Gly Leu 210 215 220 Arg Leu Ser Gln Gln Ala Gln Val Leu
Leu Thr Gly Leu Val Ile Val 225 230 235 240 Ala Asp Trp Ile Ala Ser
Asn Thr Asp Leu Phe Pro Tyr Pro Ala Leu 245 250 255 Gly Thr Gly Glu
Ala Ala Ile Asp Pro Gly Lys Arg Val Glu Leu Ala 260 265 270 Trp Arg
Gly Leu Glu Leu Pro Ala Pro Trp Ala Pro Lys Tyr Leu Met 275 280 285
Pro Gly Met Gln Gly Leu Leu Ala Ser Arg Phe Gly Leu Pro Ala Asp 290
295 300 Ala Gln Leu Arg Pro Val Gln Gln Met Ala Val Gln Leu Ala Ser
Ala 305 310 315 320 Asn Ala Ala Pro Gly Leu Leu Val Ile Glu Ala Pro
Met Gly Glu Gly 325 330 335 Lys Thr Glu Ala Ala Leu Leu Ala Ala Glu
Ile Leu Ala Ala Arg Ser 340 345 350 Gly Ala Gly Gly Val Phe Leu Ala
Leu Pro Thr Gln Ala Thr Ser Asn 355 360 365 Ala Met Phe Ala Arg Val
Val Asn Trp Leu Arg Gln Val Pro Arg Glu 370 375 380 Gly Val Ala Ser
Val His Leu Ala His Gly Lys Ala Ala Leu Asp Asp 385 390 395 400 Ala
Phe Ala Ser Phe Leu Arg Ala Ala Pro Arg Leu Thr Ser Ile Asp 405 410
415 Ala Asp Gly Tyr Ala Gly Glu Ala Asn Val Arg Arg Asp Arg Arg Ala
420 425 430 Gly Ser Ala Asp Met Val Ala His Gln Trp Leu Arg Gly Arg
Lys Lys 435 440 445 Gly Ile Leu Ser Pro Phe Val Val Gly Thr Ile Asp
Gln Leu Leu Phe 450 455 460 Thr Gly Leu Lys Ser Arg His Leu Ala Leu
Arg His Leu Ala Val Ala 465 470 475 480 Gly Lys Val Val Val Ile Asp
Glu Val His Ala Tyr Asp Ala Tyr Met 485 490 495 Ser Val Tyr Leu Glu
Arg Val Leu Ser Trp Leu Gly Ala Tyr Arg Val 500 505 510 Pro Val Val
Leu Leu Ser Ala Thr Leu Pro Ala Asp Arg Arg Gln Ala 515 520 525 Leu
Val Glu Ala Tyr Gly Gly Ile Thr Ser Glu Ala Leu Arg Asp Ala 530 535
540 Arg Glu Ala Tyr Pro Val Leu Thr Ala Val Thr Ile Gly Ala Pro Ala
545 550 555 560 Gln Ala Val Gly Thr Glu Pro Ala Glu Gly Arg Arg Val
Asp Val Asn 565 570 575 Val Glu Ala Phe Asp Asp Asp Leu Gly Arg Leu
Ala Asp Arg Leu Glu 580 585 590 Ala Glu Leu Val Asp Gly Gly Cys Ala
Leu Ile Ile Arg Asn Thr Val 595 600 605 Gly Arg Val Leu Gln Thr Ala
Gln Gln Leu Arg Glu Arg Phe Gly Ala 610 615 620 Gly Gln Val Thr Val
Ala His Ser Arg Phe Ile Asp Leu Asp Arg Ala 625 630 635 640 Arg Lys
Asp Ala Asp Leu Leu Ala Arg Phe Gly His Asp Gly Ala Arg 645 650 655
Pro Arg Arg His Ile Val Val Ala Ser Gln Val Ala Glu Gln Ser Leu 660
665 670 Asp Ile Asp Phe Asp Leu Leu Val Thr Asp Leu Ala Pro Ile Asp
Leu 675 680 685 Val Leu Gln Arg Met Gly Arg Val His Arg His His Arg
Gly Gly Pro 690 695 700 Glu Gln Ser Glu Arg Pro Pro Ser Leu Arg Thr
Ala Arg Cys Leu Val 705 710 715 720 Thr Gly Val Asp Trp Ala Gly Ile
Pro Ser Ala Pro Ile Ala Gly Ser 725 730 735 Val Ala Val Tyr Gly Leu
His Pro Leu Leu Arg Ser Leu Ala Val Leu 740 745 750 Gln Pro Tyr Leu
Thr Gly Ser Ala Leu Thr Leu Pro Gly Asp Ile Asn 755 760 765 Pro Leu
Val Gln Cys Ala Tyr Ala Gln Ser Phe Val Ala Pro Thr Gly 770 775 780
Trp Gly Glu Ala Met Asp Ala Ala Gln Ala Glu His Met Ala His Ile 785
790 795 800 Val Gln Gln Arg Glu Gly Ala Met Ala Phe Cys Leu Asp Glu
Val Arg 805 810 815 Gly Pro Gly Arg Ser Leu Ile Gly Trp Ile Asp Gly
Gly Val Gly Asp 820 825 830 Ala Asp Asp Thr Arg Ala Gly Arg Ala Gln
Val Arg Asp Ser Pro Glu 835 840 845 Thr Ile Glu Val Leu Val Val Gln
Arg Gly Ser Asp Gly Val Leu Arg 850 855 860 Thr Leu Pro Trp Leu Asp
Arg Gly Arg Gly Gly Leu Glu Leu Pro Thr 865 870 875 880 Glu Ala Val
Pro Pro Pro Arg Ala Ala Arg Ala Ala Ala Ala Ser Ala 885 890 895 Leu
Arg Leu Pro Gly Leu Phe Ala Lys Pro Trp Met Phe Asp Arg Val 900 905
910 Leu Arg Glu Leu Glu Arg Glu Tyr His Glu Ala Trp Gln Ala Lys Glu
915 920 925 Ser Ser Trp Leu Gln Gly Glu Leu Leu Leu Val Leu Asp Glu
Glu Cys 930 935 940 Arg Thr Val Leu Ala Gly Tyr Glu Leu Ser Tyr Asn
Pro Asp Asp Gly 945 950 955 960 Leu Glu Met Val Met Pro Gly Glu Pro
His Ala Ala Val Val Arg Asp 965 970 975 Lys Glu Ala Ser Asp Asp Lys
Thr Ala Ser Phe Asp Leu Thr Ser Ala 980 985 990 Pro Trp Leu Pro Val
Leu Tyr Ala Asp Gly Met Gln Gly Val Leu Ser 995 1000 1005 Leu Arg
Asp Val Phe Ala Gln Ser Asn Leu Ile Arg Arg Leu Val 1010 1015 1020
Gly Asp Leu Pro Thr Gln Asp Phe Ala Leu Leu Arg Leu Leu Leu 1025
1030 1035 Ala Val Leu Tyr Asp Ala Val Asp Gly Pro Arg Asp Gly Gln
Asp 1040 1045 1050 Trp Glu Asp Leu Trp Thr Ser Asp Asp Pro Phe Ala
Ala Val Pro 1055 1060 1065 Ala Tyr Leu Asp Ser His Arg Glu Arg Phe
Asp Leu Leu His Pro 1070 1075 1080 Ala Thr Pro Phe Tyr Gln Val Pro
Gly Leu Gln Thr Ala Lys Gly 1085 1090 1095 Glu Val Gly Pro Leu Asn
Lys Ile Val Ala Asp Val Pro Asp Gly 1100 1105 1110 Asp Pro Phe Leu
Thr Met Arg Met Pro Gly Val Glu Gln Leu Ser 1115
1120 1125 Phe Ala Glu Ala Ala Arg Trp Leu Val His Thr Gln Ala Phe
Asp 1130 1135 1140 Thr Ser Gly Ile Lys Ser Gly Val Val Gly Asp Pro
Lys Ala Val 1145 1150 1155 Asn Gly Lys Arg Tyr Pro Gln Gly Val Ala
Trp Leu Gly Asn Leu 1160 1165 1170 Gly Gly Val Phe Ala Glu Gly Asp
Thr Leu Arg Gln Thr Leu Leu 1175 1180 1185 Leu Asn Leu Ile Pro Ala
Asp Thr Thr Asn Leu Gln Val Thr Ser 1190 1195 1200 Ala Gln Asp Val
Pro Ala Trp Arg Gly Thr Asn Gly Arg Ala Gly 1205 1210 1215 Ser Asp
His Ala Asp Ala Glu Pro Arg Val Pro Ala Gly Leu Arg 1220 1225 1230
Asp Leu Tyr Thr Trp Gln Ser Arg Arg Ile Arg Leu Glu Tyr Asp 1235
1240 1245 Thr Arg Gly Val Thr Gly Ala Val Leu Thr Tyr Gly Asp Glu
Leu 1250 1255 1260 Thr Ala His Asn Lys His Gly Val Glu Pro Met Thr
Gly Trp Arg 1265 1270 1275 Arg Ser Lys Pro Gln Glu Lys Lys Leu Gly
Leu Ser Thr Val Tyr 1280 1285 1290 Met Pro Gln Gln His Asp Pro Thr
Arg Ala Ala Trp Arg Gly Ile 1295 1300 1305 Glu Ser Leu Leu Ala Gly
Ser Ala Gly Ser Gly Ser Ser Gln Thr 1310 1315 1320 Gly Glu Pro Ala
Ser His Tyr Arg Pro Lys Ile Val Asp Trp Leu 1325 1330 1335 Gly Glu
Leu Ala His His Gly Asn Leu Pro Ser Arg Gly Leu Ile 1340 1345 1350
Arg Val Arg Thr Ser Gly Ala Val Tyr Gly Thr Gln Gln Ser Ile 1355
1360 1365 Ile Asp Glu Val Val Ser Asp Glu Leu Thr Met Ala Val Val
Leu 1370 1375 1380 Leu His Glu Asp Asp Pro Arg Phe Gly Lys Ala Ala
Val Thr Ala 1385 1390 1395 Val Lys Asp Ala Asp Ser Ala Val Ala Ala
Leu Gly Asp Leu Ala 1400 1405 1410 Ser Asp Leu Ala Arg Ala Ala Gly
Leu Asp Pro Glu Pro Glu Arg 1415 1420 1425 Val Thr Ala Arg Asp Arg
Ala Phe Gly Ala Leu Asp Gly Pro Tyr 1430 1435 1440 Arg Arg Trp Leu
Leu Asp Leu Gly Asn Ser Thr Asp Pro Ala Ala 1445 1450 1455 Met Arg
Ala Val Trp Gln Gly Arg Val Tyr Asp Ile Ile Ala Val 1460 1465 1470
Gln Gly Gln Met Leu Leu Asp Ser Ala Gly Ser Ala Ala Ala Gln 1475
1480 1485 Gly Arg Met Val Lys Thr Thr Arg Gly Glu Arg Trp Met Asp
Asp 1490 1495 1500 Ser Leu Ala Asp Leu Tyr Phe Lys Gly Arg Ile Ala
Lys Ala Leu 1505 1510 1515 Ser Ser Arg Leu Gly Lys Lys Pro Thr Asp
Pro Gly Glu Pro Val 1520 1525 1530 Gly Ile Gln Glu Asp Pro Ala 1535
1540

SEQ ID NO: 46 | 1407 | PRT | Artificial Sequence | Fusion Cas3-Cse1
Met Glu Pro Phe
Lys Tyr Ile Cys His Tyr Trp Gly Lys Ser Ser Lys 1 5 10 15 Ser Leu
Thr Lys Gly Asn Asp Ile His Leu Leu Ile Tyr His Cys Leu 20 25 30
Asp Val Ala Ala Val Ala Asp Cys Trp Trp Asp Gln Ser Val Val Leu 35
40 45 Gln Asn Thr Phe Cys Arg Asn Glu Met Leu Ser Lys Gln Arg Val
Lys 50 55 60 Ala Trp Leu Leu Phe Phe Ile Ala Leu His Asp Ile Gly
Lys Phe Asp 65 70 75 80 Ile Arg Phe Gln Tyr Lys Ser Ala Glu Ser Trp
Leu Lys Leu Asn Pro 85 90 95 Ala Thr Pro Ser Leu Asn Gly Pro Ser
Thr Gln Met Cys Arg Lys Phe 100 105 110 Asn His Gly Ala Ala Gly Leu
Tyr Trp Phe Asn Gln Asp Ser Leu Ser 115 120 125 Glu Gln Ser Leu Gly
Asp Phe Phe Ser Phe Phe Asp Ala Ala Pro His 130 135 140 Pro Tyr Glu
Ser Trp Phe Pro Trp Val Glu Ala Val Thr Gly His His 145 150 155 160
Gly Phe Ile Leu His Ser Gln Asp Gln Asp Lys Ser Arg Trp Glu Met 165
170 175 Pro Ala Ser Leu Ala Ser Tyr Ala Ala Gln Asp Lys Gln Ala Arg
Glu 180 185 190 Glu Trp Ile Ser Val Leu Glu Ala Leu Phe Leu Thr Pro
Ala Gly Leu 195 200 205 Ser Ile Asn Asp Ile Pro Pro Asp Cys Ser Ser
Leu Leu Ala Gly Phe 210 215 220 Cys Ser Leu Ala Asp Trp Leu Gly Ser
Trp Thr Thr Thr Asn Thr Phe 225 230 235 240 Leu Phe Asn Glu Asp Ala
Pro Ser Asp Ile Asn Ala Leu Arg Thr Tyr 245 250 255 Phe Gln Asp Arg
Gln Gln Asp Ala Ser Arg Val Leu Glu Leu Ser Gly 260 265 270 Leu Val
Ser Asn Lys Arg Cys Tyr Glu Gly Val His Ala Leu Leu Asp 275 280 285
Asn Gly Tyr Gln Pro Arg Gln Leu Gln Val Leu Val Asp Ala Leu Pro 290
295 300 Val Ala Pro Gly Leu Thr Val Ile Glu Ala Pro Thr Gly Ser Gly
Lys 305 310 315 320 Thr Glu Thr Ala Leu Ala Tyr Ala Trp Lys Leu Ile
Asp Gln Gln Ile 325 330 335 Ala Asp Ser Val Ile Phe Ala Leu Pro Thr
Gln Ala Thr Ala Asn Ala 340 345 350 Met Leu Thr Arg Met Glu Ala Ser
Ala Ser His Leu Phe Ser Ser Pro 355 360 365 Asn Leu Ile Leu Ala His
Gly Asn Ser Arg Phe Asn His Leu Phe Gln 370 375 380 Ser Ile Lys Ser
Arg Ala Ile Thr Glu Gln Gly Gln Glu Glu Ala Trp 385 390 395 400 Val
Gln Cys Cys Gln Trp Leu Ser Gln Ser Asn Lys Lys Val Phe Leu 405 410
415 Gly Gln Ile Gly Val Cys Thr Ile Asp Gln Val Leu Ile Ser Val Leu
420 425 430 Pro Val Lys His Arg Phe Ile Arg Gly Leu Gly Ile Gly Arg
Ser Val 435 440 445 Leu Ile Val Asp Glu Val His Ala Tyr Asp Thr Tyr
Met Asn Gly Leu 450 455 460 Leu Glu Ala Val Leu Lys Ala Gln Ala Asp
Val Gly Gly Ser Val Ile 465 470 475 480 Leu Leu Ser Ala Thr Leu Pro
Met Lys Gln Lys Gln Lys Leu Leu Asp 485 490 495 Thr Tyr Gly Leu His
Thr Asp Pro Val Glu Asn Asn Ser Ala Tyr Pro 500 505 510 Leu Ile Asn
Trp Arg Gly Val Asn Gly Ala Gln Arg Phe Asp Leu Leu 515 520 525 Ala
His Pro Glu Gln Leu Pro Pro Arg Phe Ser Ile Gln Pro Glu Pro 530 535
540 Ile Cys Leu Ala Asp Met Leu Pro Asp Leu Thr Met Leu Glu Arg Met
545 550 555 560 Ile Ala Ala Ala Asn Ala Gly Ala Gln Val Cys Leu Ile
Cys Asn Leu 565 570 575 Val Asp Val Ala Gln Val Cys Tyr Gln Arg Leu
Lys Glu Leu Asn Asn 580 585 590 Thr Gln Val Asp Ile Asp Leu Phe His
Ala Arg Phe Thr Leu Asn Asp 595 600 605 Arg Arg Glu Lys Glu Asn Arg
Val Ile Ser Asn Phe Gly Lys Asn Gly 610 615 620 Lys Arg Asn Val Gly
Arg Ile Leu Val Ala Thr Gln Val Val Glu Gln 625 630 635 640 Ser Leu
Asp Val Asp Phe Asp Trp Leu Ile Thr Gln His Cys Pro Ala 645 650 655
Asp Leu Leu Phe Gln Arg Leu Gly Arg Leu His Arg His His Arg Lys 660
665 670 Tyr Arg Pro Ala Gly Phe Glu Ile Pro Val Ala Thr Ile Leu Leu
Pro 675 680 685 Asp Gly Glu Gly Tyr Gly Arg His Glu His Ile Tyr Ser
Asn Val Arg 690 695 700 Val Met Trp Arg Thr Gln Gln His Ile Glu Glu
Leu Asn Gly Ala Ser 705 710 715 720 Leu Phe Phe Pro Asp Ala Tyr Arg
Gln Trp Leu Asp Ser Ile Tyr Asp 725 730 735 Asp Ala Glu Met Asp Glu
Pro Glu Trp Val Gly Asn Gly Met Asp Lys 740 745 750 Phe Glu Ser Ala
Glu Cys Glu Lys Arg Phe Lys Ala Arg Lys Val Leu 755 760 765 Gln Trp
Ala Glu Glu Tyr Ser Leu Gln Asp Asn Asp Glu Thr Ile Leu 770 775 780
Ala Val Thr Arg Asp Gly Glu Met Ser Leu Pro Leu Leu Pro Tyr Val 785
790 795 800 Gln Thr Ser Ser Gly Lys Gln Leu Leu Asp Gly Gln Val Tyr
Glu Asp 805 810 815 Leu Ser His Glu Gln Gln Tyr Glu Ala Leu Ala Leu
Asn Arg Val Asn 820 825 830 Val Pro Phe Thr Trp Lys Arg Ser Phe Ser
Glu Val Val Asp Glu Asp 835 840 845 Gly Leu Leu Trp Leu Glu Gly Lys
Gln Asn Leu Asp Gly Trp Val Trp 850 855 860 Gln Gly Asn Ser Ile Val
Ile Thr Tyr Thr Gly Asp Glu Gly Met Thr 865 870 875 880 Arg Val Ile
Pro Ala Asn Pro Lys Gly Asp Pro Thr Asn Arg Ala Lys 885 890 895 Gly
Leu Glu Ala Val Ser Val Ala Ser Met Asn Leu Leu Ile Asp Asn 900 905
910 Trp Ile Pro Val Arg Pro Arg Asn Gly Gly Lys Val Gln Ile Ile Asn
915 920 925 Leu Gln Ser Leu Tyr Cys Ser Arg Asp Gln Trp Arg Leu Ser
Leu Pro 930 935 940 Arg Asp Asp Met Glu Leu Ala Ala Leu Ala Leu Leu
Val Cys Ile Gly 945 950 955 960 Gln Ile Ile Ala Pro Ala Lys Asp Asp
Val Glu Phe Arg His Arg Ile 965 970 975 Met Asn Pro Leu Thr Glu Asp
Glu Phe Gln Gln Leu Ile Ala Pro Trp 980 985 990 Ile Asp Met Phe Tyr
Leu Asn His Ala Glu His Pro Phe Met Gln Thr 995 1000 1005 Lys Gly
Val Lys Ala Asn Asp Val Thr Pro Met Glu Lys Leu Leu 1010 1015 1020
Ala Gly Val Ser Gly Ala Thr Asn Cys Ala Phe Val Asn Gln Pro 1025
1030 1035 Gly Gln Gly Glu Ala Leu Cys Gly Gly Cys Thr Ala Ile Ala
Leu 1040 1045 1050 Phe Asn Gln Ala Asn Gln Ala Pro Gly Phe Gly Gly
Gly Phe Lys 1055 1060 1065 Ser Gly Leu Arg Gly Gly Thr Pro Val Thr
Thr Phe Val Arg Gly 1070 1075 1080 Ile Asp Leu Arg Ser Thr Val Leu
Leu Asn Val Leu Thr Leu Pro 1085 1090 1095 Arg Leu Gln Lys Gln Phe
Pro Asn Glu Ser His Thr Glu Asn Gln 1100 1105 1110 Pro Thr Trp Ile
Lys Pro Ile Lys Ser Asn Glu Ser Ile Pro Ala 1115 1120 1125 Ser Ser
Ile Gly Phe Val Arg Gly Leu Phe Trp Gln Pro Ala His 1130 1135 1140
Ile Glu Leu Cys Asp Pro Ile Gly Ile Gly Lys Cys Ser Cys Cys 1145
1150 1155 Gly Gln Glu Ser Asn Leu Arg Tyr Thr Gly Phe Leu Lys Glu
Lys 1160 1165 1170 Phe Thr Phe Thr Val Asn Gly Leu Trp Pro His Pro
His Ser Pro 1175 1180 1185 Cys Leu Val Thr Val Lys Lys Gly Glu Val
Glu Glu Lys Phe Leu 1190 1195 1200 Ala Phe Thr Thr Ser Ala Pro Ser
Trp Thr Gln Ile Ser Arg Val 1205 1210 1215 Val Val Asp Lys Ile Ile
Gln Asn Glu Asn Gly Asn Arg Val Ala 1220 1225 1230 Ala Val Val Asn
Gln Phe Arg Asn Ile Ala Pro Gln Ser Pro Leu 1235 1240 1245 Glu Leu
Ile Met Gly Gly Tyr Arg Asn Asn Gln Ala Ser Ile Leu 1250 1255 1260
Glu Arg Arg His Asp Val Leu Met Phe Asn Gln Gly Trp Gln Gln 1265
1270 1275 Tyr Gly Asn Val Ile Asn Glu Ile Val Thr Val Gly Leu Gly
Tyr 1280 1285 1290 Lys Thr Ala Leu Arg Lys Ala Leu Tyr Thr Phe Ala
Glu Gly Phe 1295 1300 1305 Lys Asn Lys Asp Phe Lys Gly Ala Gly Val
Ser Val His Glu Thr 1310 1315 1320 Ala Glu Arg His Phe Tyr Arg Gln
Ser Glu Leu Leu Ile Pro Asp 1325 1330 1335 Val Leu Ala Asn Val Asn
Phe Ser Gln Ala Asp Glu Val Ile Ala 1340 1345 1350 Asp Leu Arg Asp
Lys Leu His Gln Leu Cys Glu Met Leu Phe Asn 1355 1360 1365 Gln Ser
Val Ala Pro Tyr Ala His His Pro Lys Leu Ile Ser Thr 1370 1375 1380
Leu Ala Leu Ala Arg Ala Thr Leu Tyr Lys His Leu Arg Glu Leu 1385
1390 1395 Lys Pro Gln Gly Gly Pro Ser Asn Gly 1400 1405

SEQ ID NO: 47 | 49 | DNA | Artificial Sequence | CASCADE sequence cleavage
ccgtcttgcg ctagctctag aactagtcct cagcctaggc ctaagctgt 49
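The listing does not state how several of the short DNA entries relate to one another. As a minimal sketch, assuming only the sequences exactly as printed for SEQ ID NOs 29, 32 to 35 and 47, the Python below checks relationships that can be read directly off the data: SEQ ID NOs 32/33 and 34/35 are reverse-complement pairs, SEQ ID NO: 34 equals SEQ ID NO: 32 with a 4-nt CTAG insertion, and SEQ ID NO: 47 lies inside target plasmid SEQ ID NO: 29 while containing SEQ ID NO: 32. The helper names (`revcomp`, `check_listing`) and the fixed insertion offset are illustrative assumptions, not part of the patent.

```python
# Sequences copied from the listing above, with spaces and position numbers removed.
SEQ = {
    29: ("gaattcacaacggtgagcaagtcactgttggcaagccaggatctgaacaataccgtcttg"
         "cgctagctctagaactagtcctcagcctaggcctaagctgtctttcgctgctgagggtga"
         "cgatcccgcataggcggcctttaactcggatcc"),
    32: "cttgcgctagctctagaactagtcctcagcctaggcctaag",
    33: "cttaggcctaggctgaggactagttctagagctagcgcaag",
    34: "cttgcgctagctctagaactagctagtcctcagcctaggcctaag",
    35: "cttaggcctaggctgaggactagctagttctagagctagcgcaag",
    47: "ccgtcttgcgctagctctagaactagtcctcagcctaggcctaagctgt",
}

# Stated lengths from the listing headers, used as a transcription sanity check.
LENGTHS = {29: 153, 32: 41, 33: 41, 34: 45, 35: 45, 47: 49}

COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a lowercase a/c/g/t DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def check_listing() -> dict:
    """Return a name -> bool map of relationships between listed entries."""
    for sid, n in LENGTHS.items():
        assert len(SEQ[sid]) == n, (sid, len(SEQ[sid]))
    # Hypothetical insertion point: directly after the first 22 nt of SEQ 32,
    # i.e. inside its ACTAGT motif.
    cut = 22
    return {
        "33_is_revcomp_of_32": SEQ[33] == revcomp(SEQ[32]),
        "35_is_revcomp_of_34": SEQ[35] == revcomp(SEQ[34]),
        "34_is_32_with_ctag_insert": SEQ[34] == SEQ[32][:cut] + "ctag" + SEQ[32][cut:],
        "47_within_29": SEQ[47] in SEQ[29],
        "32_within_47": SEQ[32] in SEQ[47],
    }


if __name__ == "__main__":
    for name, ok in check_listing().items():
        print(f"{name}: {ok}")
```

Run as a script, it prints one line per check. The CTAG insertion sits inside an ACTAGT motif, and duplicating CTAG there is what filling in a 4-nt 5' CTAG overhang and then blunt-ligating would produce. That fits the "Fill-in ligation" label, but the listing itself does not state the mechanism.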
* * * * *