Non-human Animals Having A Hexanucleotide Repeat Expansion In A C9orf72 Locus Heslin; David ; et al. [Regeneron Pharmaceuticals, Inc.]

Patent Application Summary

U.S. patent application number 16/986077 was filed with the patent office on 2020-08-05 and published on 2020-11-26 for non-human animals having a hexanucleotide repeat expansion in a C9orf72 locus. The applicant listed for this patent is Regeneron Pharmaceuticals, Inc. Invention is credited to Roxanne Ally, Gustavo Droguett, David Frendewey, Chunguang Guo, David Heslin, Daisuke Kajimura, Michael LaCroix-Fralish, Ka-Man Venus Lai, Lynn Macdonald, Alexander O. Mujica, Aarti Sharma-Kanning, Chia-Jen Siao, David M. Valenzuela.

Application Number	16/986077
Publication Number	20200370054
Family ID	1000005008392
Filed Date	2020-08-05

United States Patent Application	20200370054
Kind Code	A1
Heslin; David ; et al.	November 26, 2020

NON-HUMAN ANIMALS HAVING A HEXANUCLEOTIDE REPEAT EXPANSION IN A C9ORF72 LOCUS

Abstract

A non-human animal (e.g., a rodent) model for diseases associated with a C9ORF72 heterologous hexanucleotide repeat expansion sequence is provided, which non-human animal comprises a heterologous hexanucleotide repeat (GGGGCC) in an endogenous C9ORF72 locus. A non-human animal disclosed herein comprising a heterologous hexanucleotide repeat expansion sequence comprising at least one instance, e.g., repeat, of a hexanucleotide (GGGGCC) sequence may further exhibit a characteristic and/or phenotype associated with one or more neurodegenerative disorders (e.g., amyotrophic lateral sclerosis (ALS) and/or frontotemporal dementia (FTD), etc.). Methods of identifying therapeutic candidates that may be used to prevent, delay or treat one or more neurodegenerative disorders (e.g., amyotrophic lateral sclerosis (ALS, also referred to as Lou Gehrig's disease) and frontotemporal dementia (FTD)) are also provided.

Inventors:

Heslin; David; (Closter, NJ) ; Ally; Roxanne; (Briarwood, NY) ; Siao; Chia-Jen; (New York, NY) ; Lai; Ka-Man Venus; (Seattle, WA) ; Valenzuela; David M.; (Yorktown Heights, NY) ; Guo; Chunguang; (Thornwood, NY) ; LaCroix-Fralish; Michael; (Yorktown Heights, NY) ; Macdonald; Lynn; (Harrison, NY) ; Sharma-Kanning; Aarti; (New York, NY) ; Kajimura; Daisuke; (New York, NY) ; Droguett; Gustavo; (New City, NY) ; Frendewey; David; (New York, NY) ; Mujica; Alexander O.; (Elmsford, NY)

Applicant:

Name	City	State	Country	Type
Regeneron Pharmaceuticals, Inc.	Tarrytown	NY	US

Family ID:

1000005008392

Appl. No.:

16/986077

Filed:

August 5, 2020

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
15721517	Sep 29, 2017	10781453
16986077
62452795	Jan 31, 2017
62402613	Sep 30, 2016

Current U.S. Class:	1/1
Current CPC Class:	A01K 67/0278 20130101; C12N 2310/20 20170501; A01K 2267/0318 20130101; C12N 5/0619 20130101; C12N 15/907 20130101; C12N 5/0623 20130101; C12N 15/625 20130101; C12N 9/22 20130101; C12N 2800/30 20130101; C12N 15/113 20130101; A01K 2217/072 20130101; A01K 2227/105 20130101
International Class:	C12N 15/62 20060101 C12N015/62; A01K 67/027 20060101 A01K067/027; C12N 15/113 20060101 C12N015/113; C12N 15/90 20060101 C12N015/90; C12N 5/0793 20060101 C12N005/0793; C12N 5/0797 20060101 C12N005/0797

Claims

1. A non-human animal or non-human animal cell comprising in its genome a heterologous hexanucleotide repeat expansion sequence inserted at an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises at least one repeat of the hexanucleotide sequence set forth as SEQ ID NO: 1.

2.-32. (canceled)

33. A method of identifying a therapeutic candidate for the treatment of a disease or condition associated with the presence of a hexanucleotide repeat expansion sequence, the method comprising (a) administering a candidate agent to a non-human animal or a non-human animal cell comprising a C9orf72 locus genetically modified to comprise a hexanucleotide repeat expansion sequence comprising at least one repeat of the hexanucleotide sequence set forth as SEQ ID NO:1; (b) performing one or more assays to determine if the candidate agent has an effect on one or more signs, symptoms and/or conditions associated with the disease or condition; and (c) identifying the candidate agent that has an effect on the one or more signs, symptoms and/or conditions associated with the disease or condition as the therapeutic candidate.

34.-40. (canceled)

41. A host cell comprising a heterologous hexanucleotide repeat expansion sequence.

42. The host cell of claim 41, wherein the host cell is a bacterial cell.

43. A CRISPR/Cas system comprising a Cas protein and/or one or more gRNA, wherein the one or more gRNA is encoded by a DNA comprising a sequence selected from the group consisting of SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, and a combination thereof.

44. The CRISPR/Cas system of claim 43, wherein the one or more gRNA comprises a first, second and third gRNA, wherein the first gRNA is encoded by a DNA comprising the sequence set forth as SEQ ID NO: 39, wherein the second gRNA is encoded by a DNA comprising the sequence set forth as SEQ ID NO: 44, and wherein the third gRNA is encoded by a DNA comprising the sequence set forth as SEQ ID NO: 50.

45. The CRISPR/Cas system of claim 44, further comprising a fourth gRNA encoded by a DNA comprising the sequence set forth as SEQ ID NO: 47.

46. The CRISPR/Cas system of claim 45, further comprising a fifth, sixth, and seventh gRNA, wherein the fifth gRNA is encoded by a DNA comprising the sequence set forth as SEQ ID NO: 46, the sixth gRNA is encoded by a DNA comprising the sequence set forth as SEQ ID NO: 48, and the seventh gRNA is encoded by a DNA comprising the sequence set forth as SEQ ID NO: 49.

47. The CRISPR/Cas system of claim 42, wherein the gRNA comprises a tracrRNA encoded by a DNA comprising a sequence set forth as SEQ ID NO:63, 64 or 65.

48. The CRISPR/Cas system of claim 47, wherein the tracrRNA is encoded by a DNA comprising the sequence set forth as SEQ ID NO:63.

49. The CRISPR/Cas system of claim 47, wherein the tracrRNA is encoded by a DNA comprising the sequence set forth as SEQ ID NO:64.

50. The CRISPR/Cas system of claim 47, wherein the tracrRNA is encoded by a DNA comprising the sequence set forth as SEQ ID NO:65.

51. The CRISPR/Cas system of claim 43, further comprising an expression construct, wherein the expression construct comprises a nucleic acid encoding the Cas protein and/or DNA encoding the at least one gRNA, and wherein the expression construct optionally further comprises a drug resistance gene.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Application No. 62/402,613, filed Sep. 30, 2016, and U.S. Provisional Application No. 62/452,795, filed Jan. 31, 2017, each of which is hereby incorporated herein in its entirety by reference.

INCORPORATION BY REFERENCE OF SEQUENCE LISTING

[0002] An official copy of the sequence listing is submitted concurrently with the specification electronically via EFS-Web as an ASCII formatted sequence listing with a file name of "2017-09-29-10267US01-SEQ-LIST_ST25", a creation date of Sep. 29, 2017, and a size of about 94 KB. The sequence listing contained in this ASCII formatted document is part of the specification and is herein incorporated by reference in its entirety.

BACKGROUND

[0003] Neurodegenerative diseases are major contributors to disability and disease. In particular, amyotrophic lateral sclerosis (ALS, also referred to as Lou Gehrig's disease) and frontotemporal dementia (FTD) are rare nervous system disorders characterized by progressive neuronal loss and/or death.

[0004] Although aging is viewed as the greatest risk factor for neurodegenerative disease, several genetic components have been discovered. For example, mutations in the copper-zinc superoxide dismutase (SOD1) gene have long been associated with ALS. Also, expanded hexanucleotide repeats of GGGGCC within a non-coding region of the C9ORF72 gene have been linked to both ALS and FTD. Currently, there is no cure for either disease, although some treatments are able to prolong life by about 3-5 months.

[0005] While various laboratory animal models are extensively used in the development of most therapeutics, very few if any models exist that address neurodegenerative and inflammatory diseases in ways that provide for elucidation of the exact molecular mechanism by which identified genetic components cause disease, which elucidation in turn may uncover potential therapeutic modalities not only for ALS but also for other neurodegenerative diseases having a similar clinical presentation. Thus, the manner in which genetic mutations cause neurodegenerative disease remains largely unknown. Ideal animal models would contain the same genetic components and represent similar characteristics of human disease. Given the genetic differences between species, there is a high unmet need for the development of improved animal models that closely recapitulate human neurodegenerative and/or inflammatory disease. Such improved animal models would provide significant value in the development of effective therapeutic and/or prophylactic agents.

SUMMARY

[0006] The present invention encompasses the recognition that it is desirable to engineer non-human animals or non-human animal cells (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) to permit improved in vivo or in vitro systems for identifying and developing new therapeutics and, in some embodiments, therapeutic regimens, which can be used for the treatment of neurodegenerative diseases, disorders and conditions. In some embodiments, the in vivo or in vitro systems as described herein can be used for identifying and developing new therapeutics for treating diseases, disorders, and/or conditions associated with the C9ORF72 locus, particularly a heterologous hexanucleotide repeat expansion sequence in the locus, such as, e.g., neurodegenerative disorders. Further, non-human animals or non-human animal cells (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein that comprise an insertion of a hexanucleotide repeat expansion sequence in a C9ORF72 locus are desirable, for example, for use in identifying and developing therapeutics that target a GGGGCC hexanucleotide repeat (SEQ ID NO:1), products derived therefrom, e.g., sense or antisense RNA transcribed therefrom, a RAN translation product and/or dipeptide repeat protein encoded by the hexanucleotide repeat, etc. In some embodiments, non-human animals and non-human animal cells (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) as described herein respectively provide improved in vivo and in vitro systems (or models) for neurodegenerative diseases, disorders and conditions (e.g., ALS and/or FTD).

[0007] A non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its genome a heterologous hexanucleotide repeat expansion sequence inserted into an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises at least one repeat of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its germline genome a heterologous hexanucleotide repeat expansion sequence inserted into an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises at least one repeat of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the heterologous hexanucleotide expansion sequence is a non-rodent (e.g., non-rat or non-mouse, e.g., a human) hexanucleotide expansion sequence that comprises at least one instance, e.g., repeat, of the hexanucleotide sequence set forth as SEQ ID NO:1. In some embodiments, the (human) heterologous hexanucleotide repeat expansion sequence comprises more than one, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the (human) heterologous hexanucleotide repeat expansion sequence comprises at least about three, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about five, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. 
In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about ten, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about fifteen, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about twenty, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about thirty, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about forty, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about fifty, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about sixty, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about seventy, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about eighty, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. 
In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about ninety, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the heterologous (human) hexanucleotide repeat expansion sequence comprises at least about one-hundred, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1. In some embodiments, the non-human animal comprises the heterologous (human) hexanucleotide repeat expansion sequence in its germline genome.

[0008] In some embodiments, the heterologous (e.g., non-rodent, non-rat, non-mouse and/or human) hexanucleotide repeat expansion sequence comprises heterologous (e.g., non-rodent, non-rat, non-mouse and/or human) sequences that flank the at least one, e.g., at least about three, at least about five, at least about ten, at least about fifteen, at least about twenty, at least about thirty, at least about forty, at least about fifty, at least about sixty, at least about seventy, at least about eighty, at least about ninety or at least about one-hundred, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO:1. Accordingly, a heterologous (e.g., non-rodent, non-rat, non-mouse, and/or human) hexanucleotide repeat expansion sequence may comprise from 5' to 3': a first heterologous hexanucleotide flanking sequence, one or more (preferably contiguous) instances of the hexanucleotide set forth as SEQ ID NO:1, and a second heterologous hexanucleotide flanking sequence. In some embodiments, a heterologous hexanucleotide repeat expansion sequence is identical to or substantially identical to a naturally occurring genomic sequence comprising a first heterologous hexanucleotide flanking sequence, one or more instances of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second heterologous hexanucleotide flanking sequence. Naturally occurring first and/or second heterologous hexanucleotide flanking sequences may each independently be, e.g., at least 4 base pairs in length, e.g., at least 10 base pairs in length, e.g., at least 20 base pairs in length etc.

[0009] In some embodiments, a heterologous human hexanucleotide expansion sequence spans (and optionally encompasses) all or portions of exons 1a and/or exon 1b of a human C9orf72 gene. In some embodiments, a first heterologous hexanucleotide flanking sequence comprises all or part of the sequence of exon 1a of a human C9orf72 gene (set forth as SEQ ID NO:34) and/or a second heterologous hexanucleotide flanking sequence comprises all or part of the sequence of exon 1b of a human C9orf72 gene (set forth as SEQ ID NO:35). In some embodiments, a first heterologous hexanucleotide flanking sequence comprises the sequence set forth as SEQ ID NO:36, or a portion thereof, and/or a second heterologous hexanucleotide flanking sequence comprises the sequence set forth as SEQ ID NO:37, or a portion thereof.

[0010] An exemplary human hexanucleotide repeat expansion sequence is set forth as SEQ ID NO:2 (comprising from 5' to 3': a first heterologous hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36, 3 repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second heterologous hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37). Another exemplary human hexanucleotide repeat expansion sequence is set forth as SEQ ID NO:3 (comprising from 5' to 3': a first heterologous hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36, 100 repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second heterologous hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37). Accordingly, disclosed herein are non-human animals, e.g., rodents such as a rat or a mouse, whose genomes comprise in an endogenous C9orf72 locus a sequence set forth as SEQ ID NO:2, a variant of SEQ ID NO:2, a sequence set forth as SEQ ID NO:3, or a variant of SEQ ID NO:3.
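
The 5'-to-3' layout described above (a first flanking sequence, contiguous GGGGCC repeats, and a second flanking sequence) can be illustrated with a minimal Python sketch. The flanking strings below are hypothetical placeholders and are not the sequences set forth as SEQ ID NO:36 or SEQ ID NO:37; the function names are likewise illustrative assumptions. Only the GGGGCC unit and the repeat counts of 3 and 100 are taken from the text.

```python
import re

# Hexanucleotide unit named in the text (SEQ ID NO:1).
HEXANUCLEOTIDE = "GGGGCC"

# Hypothetical placeholder flanks, for illustration only. They are NOT
# the sequences set forth as SEQ ID NO:36 and SEQ ID NO:37.
PLACEHOLDER_FIRST_FLANK = "ACGTACGTAC"
PLACEHOLDER_SECOND_FLANK = "TTGCATGCAA"


def build_expansion(first_flank: str, repeats: int, second_flank: str) -> str:
    """Assemble, from 5' to 3': first flank, contiguous repeats, second flank."""
    if repeats < 1:
        raise ValueError("at least one repeat is required")
    return first_flank + HEXANUCLEOTIDE * repeats + second_flank


def longest_contiguous_repeats(sequence: str) -> int:
    """Return the length, in repeat units, of the longest uninterrupted GGGGCC run."""
    runs = re.findall(r"(?:GGGGCC)+", sequence.upper())
    return max((len(run) // len(HEXANUCLEOTIDE) for run in runs), default=0)


# Toy constructs with the repeat counts named for SEQ ID NO:2 and SEQ ID NO:3.
three_repeat = build_expansion(PLACEHOLDER_FIRST_FLANK, 3, PLACEHOLDER_SECOND_FLANK)
hundred_repeat = build_expansion(PLACEHOLDER_FIRST_FLANK, 100, PLACEHOLDER_SECOND_FLANK)

print(longest_contiguous_repeats(three_repeat))    # 3
print(longest_contiguous_repeats(hundred_repeat))  # 100
```

Counting the longest uninterrupted run, rather than all occurrences, mirrors the "preferably contiguous" language used for the repeats. A variant with 36 or 92 repeats would be built the same way by changing the `repeats` argument.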

[0011] In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) comprises in its genome a hexanucleotide repeat expansion sequence comprising a sequence that is a SEQ ID NO:2 variant, which comprises from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:34), one or two contiguous repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:35). In some embodiments, a non-human animal or a non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) as described herein comprises in its genome a hexanucleotide repeat expansion sequence comprising a sequence that is a SEQ ID NO:3 variant, which comprises from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:34), more than one and less than 100 contiguous repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:35). 
In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) comprises in its (germline) genome a hexanucleotide repeat expansion sequence comprising a sequence that is a SEQ ID NO:3 variant, which comprises from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:34), 36 contiguous repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:35). In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) as described herein comprises in its genome a hexanucleotide repeat expansion sequence comprising a sequence that is a SEQ ID NO:3 variant, which comprises from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:34), 92 contiguous repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:35).

[0012] In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) as disclosed herein is heterozygous or homozygous for a hexanucleotide repeat expansion sequence comprising a sequence that is a SEQ ID NO:2 variant, which comprises from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:34), one or two contiguous repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:35). In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) comprises in its (germline) genome a hexanucleotide repeat expansion sequence comprising a sequence that is a SEQ ID NO:3 variant, which comprises from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:34), more than one and less than 100 contiguous repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:35). 
In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) comprises in its (germline) genome a hexanucleotide repeat expansion sequence comprising a sequence that is a SEQ ID NO:3 variant, which comprises from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:34), 36 contiguous repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:35). In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) comprises in its (germline) genome a hexanucleotide repeat expansion sequence comprising a sequence that is a SEQ ID NO:3 variant, which comprises from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:34), 92 contiguous repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37 (or a portion thereof, e.g., a sequence set forth as SEQ ID NO:35).

[0013] In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) comprises in its (germline) genome a replacement of 5' untranslated and/or non-coding endogenous non-human sequences of the endogenous C9orf72 locus with the heterologous (human) hexanucleotide repeat expansion sequence. In some embodiments, the untranslated and/or non-coding sequence spanning between (and optionally encompassing at least a portion of) endogenous exon 1 (e.g., exon 1a and/or 1b) and the ATG start codon of the endogenous non-human C9orf72 locus, or a portion thereof, is replaced with the heterologous hexanucleotide repeat expansion sequence. Additional sequences (e.g., recombinase recognition sequences, a drug resistance cassette, a reporter gene, etc.) linked to the heterologous (human) hexanucleotide expansion sequence, may also replace the untranslated and/or non-coding sequence spanning between (and optionally encompassing) endogenous exon 1 (e.g., exon 1a and/or exon 1b) and the ATG start codon of the endogenous non-human C9orf72 locus, or a portion thereof.

[0014] Accordingly, in some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) as disclosed herein may comprise a heterozygous or homozygous replacement of an endogenous sequence that (i) starts from the 5' end of, within, or from the 3' end of an endogenous exon 1 and (ii) ends 5' of the endogenous ATG start codon, or a portion thereof, with a heterologous hexanucleotide repeat expansion sequence, e.g., a hexanucleotide repeat expansion sequence comprising at least one repeat of the hexanucleotide sequence set forth as SEQ ID NO:1. In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) as disclosed herein may comprise a heterozygous or homozygous replacement of an endogenous sequence that (i) starts from the 5' end of, within, or from the 3' end of an endogenous exon 1 and (ii) ends 5' of the endogenous ATG start codon, or a portion thereof, with a heterologous hexanucleotide repeat expansion sequence comprising from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 34, at least one instance of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:35.
In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) as disclosed herein may comprise a heterozygous or homozygous replacement of an endogenous sequence that (i) starts from the 5' end of, within, or from the 3' end of an endogenous exon 1 and (ii) ends 5' of the endogenous ATG start codon, or a portion thereof, with a heterologous hexanucleotide repeat expansion sequence comprising from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36, at least one instance of the hexanucleotide sequence set forth as SEQ ID NO:1, and a second human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37. In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) as disclosed herein may comprise a heterozygous or homozygous replacement of an endogenous sequence that (i) starts from the 5' end of, within, or from the 3' end of an endogenous exon 1 and (ii) ends 5' of the endogenous ATG start codon, or a portion thereof, with a heterologous hexanucleotide repeat expansion sequence comprising the sequence set forth as SEQ ID NO:2, a variant thereof, SEQ ID NO:3 or a variant thereof.
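
As a purely conceptual illustration of the replacement described in this paragraph, the following Python sketch swaps the stretch of a toy locus string that lies between a chosen position in exon 1 and the ATG start codon for a heterologous insert. Every sequence and coordinate below is a hypothetical placeholder chosen for readability, and the function name is an assumption. The sketch models string coordinates only and says nothing about how the replacement is carried out in a cell.

```python
def replace_upstream_of_start_codon(
    locus: str, region_start: int, start_codon_index: int, insert: str
) -> str:
    """Replace locus[region_start:start_codon_index] with `insert`.

    The replaced region ends immediately 5' of the ATG start codon, so the
    start codon and everything 3' of it are left untouched.
    """
    if locus[start_codon_index:start_codon_index + 3] != "ATG":
        raise ValueError("start_codon_index does not point at an ATG")
    if not 0 <= region_start <= start_codon_index:
        raise ValueError("region_start must lie at or 5' of the start codon")
    return locus[:region_start] + insert + locus[start_codon_index:]


# Hypothetical toy locus: 4 upstream bases, 8 bases standing in for the
# exon 1 / non-coding stretch, then the start codon and downstream bases.
TOY_LOCUS = "AAAA" + "CCTTCCTT" + "ATGGCT"

# Hypothetical insert: placeholder flanks around three GGGGCC repeats.
TOY_INSERT = "T" + "GGGGCC" * 3 + "A"

modified = replace_upstream_of_start_codon(
    TOY_LOCUS, region_start=4, start_codon_index=12, insert=TOY_INSERT
)
print(modified)
```

Passing the start codon position explicitly, instead of searching for the first ATG, is a deliberate choice in this sketch: the first ATG found in a sequence is not necessarily the start codon.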

[0015] In some embodiments, a non-human animal or non-human animal cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its (germline) genome a heterologous hexanucleotide repeat expansion sequence inserted at an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises one or more repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1, and wherein the non-human animal or cell exhibits one or more of the following characteristics: (i) increased expression of C9orf72 RNA sense and/or antisense transcripts compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by quantitative PCR, (ii) an increased number of RNA foci comprising a C9orf72 RNA sense and/or antisense transcript compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by fluorescence activated in situ hybridization, (iii) an increased level of dipeptide repeat proteins compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by immunofluorescence, or (iv) any combination of (i)-(iii).
In some embodiments, a non-human animal or cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its (germline) genome a heterologous hexanucleotide repeat expansion sequence inserted at an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises three or more repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1, and wherein the non-human animal or cell exhibits one or more of the following characteristics: (i) increased expression of C9orf72 RNA sense and/or antisense transcripts compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by quantitative PCR, (ii) an increased number of RNA foci comprising a C9orf72 RNA sense and/or antisense transcript compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by fluorescence activated in situ hybridization, (iii) an increased level of dipeptide repeat proteins compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by immunofluorescence, or (iv) any combination of (i)-(iii).
In some embodiments, a non-human animal or cell (e.g., embryonic stem cell, embryonic stem cell derived-motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its (germline) genome a heterologous hexanucleotide repeat expansion sequence inserted at an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises at least thirty repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1, and wherein the non-human animal or cell exhibits one or more of the following characteristics: (i) increased expression of C9orf72 RNA sense and/or antisense transcripts compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by quantitative PCR, (ii) an increased number of RNA foci comprising a C9orf72 RNA sense and/or antisense transcript compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by fluorescence activated in situ hybridization, (iii) an increased level of dipeptide repeat proteins compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by immunofluorescence, or (iv) any combination of (i)-(iii).
In some embodiments, a non-human animal or cell (e.g., embryonic stem cell, embryonic stem cell-derived motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its (germline) genome a heterologous hexanucleotide repeat expansion sequence inserted at an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises ninety or more repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1, and wherein the non-human animal or cell exhibits one or more of the following characteristics: (i) increased expression of C9orf72 RNA sense and/or antisense transcripts compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by quantitative PCR, (ii) an increased number of RNA foci comprising a C9orf72 RNA sense and/or antisense transcript compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by fluorescence activated in situ hybridization, (iii) an increased level of dipeptide repeat proteins compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by immunofluorescence, or (iv) any combination of (i)-(iii).
In some embodiments, a non-human animal or cell (e.g., embryonic stem cell, embryonic stem cell-derived motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its (germline) genome a heterologous hexanucleotide repeat expansion sequence inserted at an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises ninety-two repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1, and wherein the non-human animal or cell exhibits all of the following three characteristics: (i) increased expression of C9orf72 RNA sense and/or antisense transcripts compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by quantitative PCR, (ii) an increased number of RNA foci comprising a C9orf72 RNA sense and/or antisense transcript compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by fluorescence activated in situ hybridization, and (iii) an increased level of dipeptide repeat proteins compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by immunofluorescence.
In some embodiments, a non-human animal or cell (e.g., embryonic stem cell, embryonic stem cell-derived motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its (germline) genome a heterologous hexanucleotide repeat expansion sequence inserted at an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises more than ninety repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1, and wherein the non-human animal or cell exhibits all of the following three characteristics: (i) increased expression of C9orf72 RNA sense and/or antisense transcripts compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by quantitative PCR, (ii) an increased number of RNA foci comprising a C9orf72 RNA sense and/or antisense transcript compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by fluorescence activated in situ hybridization, and (iii) an increased level of dipeptide repeat proteins compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by immunofluorescence.
In some embodiments, a non-human animal or cell (e.g., embryonic stem cell, embryonic stem cell-derived motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its genome a heterologous hexanucleotide repeat expansion sequence inserted at an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises at least 92 repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1, and wherein the non-human animal or cell exhibits all of the following three characteristics: (i) increased expression of C9orf72 RNA sense and/or antisense transcripts compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by quantitative PCR, (ii) an increased number of RNA foci comprising a C9orf72 RNA sense and/or antisense transcript compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by fluorescence activated in situ hybridization, and (iii) an increased level of dipeptide repeat proteins compared to a control animal or cell comprising a wildtype C9orf72 locus, e.g., as evaluated by immunofluorescence.

[0016] In some embodiments, a non-human animal or cell (e.g., embryonic stem cell, embryonic stem cell-derived motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its genome a heterologous hexanucleotide repeat expansion sequence inserted at an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises a repeat of the hexanucleotide sequence set forth as SEQ ID NO: 1, and wherein one or more of the following characteristics of the non-human animal or cell is not significantly different compared to a control non-human animal or cell comprising a wildtype C9orf72 locus: (i) the amount of C9orf72 RNA sense and/or antisense transcripts, e.g., as evaluated by quantitative PCR, (ii) the number of RNA foci comprising a C9orf72 RNA sense and/or antisense transcript, e.g., as evaluated by fluorescence activated in situ hybridization, (iii) the level of dipeptide repeat proteins, e.g., as evaluated by immunofluorescence, or (iv) any combination of (i)-(iii).
In some embodiments, a non-human animal or cell (e.g., embryonic stem cell, embryonic stem cell-derived motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its genome a heterologous hexanucleotide repeat expansion sequence inserted at an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises three repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1, and wherein one or more of the following characteristics of the non-human animal or cell is not significantly different compared to a control non-human animal or cell comprising a wildtype C9orf72 locus: (i) the amount of C9orf72 RNA sense and/or antisense transcripts, e.g., as evaluated by quantitative PCR, (ii) the number of RNA foci comprising a C9orf72 RNA sense and/or antisense transcript, e.g., as evaluated by fluorescence activated in situ hybridization, (iii) the level of dipeptide repeat proteins, e.g., as evaluated by immunofluorescence, or (iv) any combination of (i)-(iii).
In some embodiments, a non-human animal or cell (e.g., embryonic stem cell, embryonic stem cell-derived motor neuron, brain cell, neuronal cell, muscle cell, heart cell) described herein comprises in its genome a heterologous hexanucleotide repeat expansion sequence inserted at an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises thirty repeats of the hexanucleotide sequence set forth as SEQ ID NO: 1, and wherein one or more of the following characteristics of the non-human animal or cell is not significantly different compared to a control non-human animal or cell comprising a wildtype C9orf72 locus: (i) the amount of C9orf72 RNA sense and/or antisense transcripts, e.g., as evaluated by quantitative PCR, (ii) the number of RNA foci comprising a C9orf72 RNA sense and/or antisense transcript, e.g., as evaluated by fluorescence activated in situ hybridization, (iii) the level of dipeptide repeat proteins, e.g., as evaluated by immunofluorescence, or (iv) any combination of (i)-(iii).
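
By way of illustration only, the repeat counts recited in the foregoing embodiments (3, 30, 90 and 92 repeats) could be checked against a candidate sequence with a short script such as the following. This is a minimal sketch: it assumes that SEQ ID NO: 1 is the hexanucleotide GGGGCC referred to in the Abstract, and the function names `longest_repeat_run` and `thresholds_met` are hypothetical helpers, not part of the disclosure.

```python
import re

# Assumption: SEQ ID NO: 1 is the GGGGCC hexanucleotide named in the Abstract.
HEXANUCLEOTIDE = "GGGGCC"

# Repeat counts recited in the embodiments above.
RECITED_THRESHOLDS = (3, 30, 90, 92)


def longest_repeat_run(sequence, unit=HEXANUCLEOTIDE):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile("(?:" + re.escape(unit) + ")+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def thresholds_met(repeat_count, thresholds=RECITED_THRESHOLDS):
    """List the recited repeat counts that `repeat_count` reaches or exceeds."""
    return [t for t in thresholds if repeat_count >= t]


if __name__ == "__main__":
    # Hypothetical toy sequence: arbitrary flanks around 36 repeats.
    toy = "ACGT" + HEXANUCLEOTIDE * 36 + "TTAA"
    count = longest_repeat_run(toy)
    print(count, thresholds_met(count))
```

The script reports only which recited counts a sequence reaches; whether an animal or cell exhibits the characteristics in (i)-(iii) is determined by the assays described herein, not by repeat number alone.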

[0017] In some embodiments, a nucleic acid construct (or targeting construct, or targeting vector) as described herein is provided.

[0018] In some embodiments, a nucleic acid construct as described herein comprises, from 5' to 3', a 5' non-human targeting arm comprising a polynucleotide that is homologous to a 5' portion of a non-human (e.g., a rodent such as a mouse or a rat) C9ORF72 locus, a heterologous hexanucleotide repeat expansion sequence comprising at least one of a hexanucleotide sequence set forth as SEQ ID NO:1, a first recombinase recognition site; a first promoter operably linked to a selectable marker, a second recombinase recognition site, and a 3' non-human targeting arm comprising a polynucleotide that is homologous to a 3' portion of a non-human (e.g., a rodent such as a mouse or a rat) C9ORF72 locus. In some embodiments, the 5' portion of a non-human (e.g., a rodent such as a mouse or rat) C9ORF72 locus includes a genomic sequence upstream of exon 1 of the non-human (e.g., rodent such as mouse or rat) C9ORF72 gene.

[0019] In some embodiments, recombinase recognition sites include loxP, lox511, lox2272, lox2372, lox66, lox71, loxM2, lox5171, FRT, FRT11, FRT71, attp, att, FRT, rox, or a combination thereof. In some embodiments, a recombinase gene is included in the construct, e.g., under the control of an inducible promoter. The recombinase gene may be selected from the group consisting of Cre, Flp (e.g., Flpe, Flpo), and Dre. In some certain embodiments, first and second recombinase recognition sites are lox (e.g., loxP) sites, and a recombinase gene encodes a Cre recombinase.

[0020] In some embodiments, a first promoter is selected from the group consisting of protamine (Prot; e.g., Prot1 or Prot5), Blimp1, Blimp1 (1 kb fragment), Blimp1 (2 kb fragment), Gata6, Gata4, Igf2, Lhx2, Lhx5, hUB1, Em7 and Pax3. In some certain embodiments, a first promoter is a hUB1 promoter in combination with an Em7 promoter.

[0021] In some embodiments, a selectable marker is selected from the group consisting of neomycin phosphotransferase (neo.sup.r), hygromycin B phosphotransferase (hyg.sup.r), puromycin-N-acetyltransferase (puro.sup.r), blasticidin S deaminase (bsr.sup.r), xanthine/guanine phosphoribosyl transferase (gpt), and Herpes simplex virus thymidine kinase (HSV-tk). In some certain embodiments, a selectable marker is neo.sup.r.

[0022] In some embodiments, the nucleic acid construct comprises the sequence set forth as SEQ ID NO:8, which comprises from 5' to 3': a 5' non-human (mouse) targeting arm, a first human hexanucleotide flanking sequence comprising the sequence set forth as SEQ ID NO:36, three repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, a second human hexanucleotide flanking sequence comprising the sequence set forth as SEQ ID NO:37, a floxed drug resistance (neo.sup.r) cassette and a 3' non-human (mouse) targeting arm. In some embodiments, the nucleic acid construct comprises the sequence set forth as SEQ ID NO:9, which comprises from 5' to 3': a 5' non-human (mouse) targeting arm, a first human hexanucleotide flanking sequence comprising the sequence set forth as SEQ ID NO:36, one-hundred repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, a second human hexanucleotide flanking sequence comprising the sequence set forth as SEQ ID NO:37, a floxed drug resistance (neo.sup.r) cassette and a 3' non-human (mouse) targeting arm.
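
The 5' to 3' ordering of elements recited for the constructs of SEQ ID NO:8 and SEQ ID NO:9 can be summarized, purely as an illustrative sketch, in the following script. The element labels are taken from the description above; the class `Element` and the functions `construct_layout` and `repeat_tract_bp` are hypothetical names introduced here, the six-nucleotide unit length assumes that SEQ ID NO:1 is GGGGCC, and no actual sequence of any SEQ ID NO is reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One labeled segment of a targeting construct (label only, no sequence)."""
    name: str
    note: str = ""


def construct_layout(repeat_count):
    """Return the construct elements in 5' to 3' order for a given repeat count."""
    if repeat_count < 1:
        raise ValueError("repeat_count must be at least 1")
    return [
        Element("5' targeting arm", "non-human (mouse)"),
        Element("first human flanking sequence", "SEQ ID NO:36"),
        Element("hexanucleotide repeats", f"{repeat_count} x SEQ ID NO:1"),
        Element("second human flanking sequence", "SEQ ID NO:37"),
        Element("floxed drug resistance cassette", "neo"),
        Element("3' targeting arm", "non-human (mouse)"),
    ]


def repeat_tract_bp(repeat_count, unit_length=6):
    """Length in base pairs of the repeat tract, assuming a 6-nt repeat unit."""
    return repeat_count * unit_length


if __name__ == "__main__":
    # Three repeats (as recited for SEQ ID NO:8) and one hundred (SEQ ID NO:9).
    for n in (3, 100):
        print(n, repeat_tract_bp(n), [e.name for e in construct_layout(n)])
```

Under these assumptions the two constructs differ only in the repeat element, which spans 18 bp for three repeats and 600 bp for one hundred repeats.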

[0023] In some embodiments, a method of making a non-human animal or non-human animal cell is provided whose genome comprises an insertion of a heterologous hexanucleotide repeat expansion sequence into an endogenous C9orf72 locus, wherein the heterologous hexanucleotide repeat expansion sequence comprises at least one, e.g., at least about 3 repeats, e.g., at least about 30 repeats, e.g., at least about 90 repeats, of a hexanucleotide sequence set forth as SEQ ID NO:1, the method comprising (a) introducing a nucleic acid sequence, e.g., a nucleic acid construct as described herein (e.g., a nucleic acid construct comprising a sequence set forth as SEQ ID NO:8 or a nucleic acid construct comprising a sequence set forth as SEQ ID NO:9), into a non-human embryonic stem cell so that the heterologous hexanucleotide repeat expansion sequence is inserted into an endogenous C9ORF72 locus, which nucleic acid comprises a polynucleotide that is homologous to the C9ORF72 locus; (b) obtaining a genetically modified non-human embryonic stem cell from (a); and optionally, (c) creating a non-human animal using the genetically modified non-human embryonic stem cell of (b). In some embodiments, a method of making a non-human animal described herein further comprises a step of breeding a non-human animal generated in (c) so that a non-human animal homozygous for the insertion is created.

[0024] In some embodiments, a method for making a non-human animal whose genome comprises an insertion of a heterologous hexanucleotide repeat expansion sequence, which comprises at least one repeat of the hexanucleotide sequence set forth as SEQ ID NO:1, in an endogenous C9ORF72 locus is provided, the method comprising modifying the genome of a non-human animal so that it comprises an inserted heterologous hexanucleotide repeat expansion sequence in an endogenous C9ORF72 locus, thereby making said non-human animal.

[0025] In some embodiments, a non-human animal is provided which is obtainable by, generated from, or produced from a method as described herein. In some embodiments, a non-human animal as disclosed herein is produced using a nucleic acid construct comprising a sequence set forth as SEQ ID NO:8. Such a non-human animal comprises a heterozygous or homozygous replacement of about 853 bp of an endogenous C9orf72 locus starting from within endogenous exon 1 with a heterologous nucleotide sequence comprising from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36, one to three repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, a human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37, and a floxed drug resistance (neo.sup.r) cassette, or upon excision of the neo gene, a lox recombination recognition sequence. In some embodiments, a non-human animal as disclosed herein is produced using a nucleic acid construct comprising a sequence set forth as SEQ ID NO:9. Such a non-human animal comprises a heterozygous or homozygous replacement of about 853 bp of an endogenous C9orf72 locus starting from within endogenous exon 1 with a heterologous nucleotide sequence comprising from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36, one to one-hundred (e.g., 36 or 92) repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, a human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37, and a floxed drug resistance (neo.sup.r) cassette, or upon excision of the neo gene, a lox recombination recognition sequence. 
In some embodiments, a non-human animal comprises a heterologous nucleotide sequence set forth as SEQ ID NO:4 (8026), a heterologous nucleotide sequence set forth as SEQ ID NO:5 (8027), a heterologous nucleotide sequence set forth as SEQ ID NO:6 (8028), or a heterologous nucleotide sequence set forth as SEQ ID NO:7 (8029), wherein the heterologous nucleotide sequence optionally replaces about 853 bp of an untranslated and/or non-coding sequence of an endogenous C9orf72 locus that starts within endogenous exon 1. In some embodiments, a non-human animal as disclosed herein is produced, e.g., by breeding an animal created using a nucleic acid construct comprising a sequence set forth as SEQ ID NO:8 with an animal created using a nucleic acid construct comprising a sequence set forth as SEQ ID NO:9. Such animals may comprise both (1) a heterozygous replacement of about 853 bp of an endogenous C9orf72 locus starting from within endogenous exon 1 with a heterologous nucleotide sequence comprising from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36, one to three repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, a human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37, and a floxed drug resistance (neo.sup.r) cassette, or upon excision of the neo gene, a lox recombination recognition sequence and (2) a heterozygous replacement of about 853 bp of an endogenous C9orf72 locus starting from within endogenous exon 1 with a heterologous nucleotide sequence comprising from 5' to 3': a first human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO: 36, one to one-hundred (e.g., 36 or 92) repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, a human hexanucleotide flanking sequence comprising a sequence set forth as SEQ ID NO:37, and a floxed drug resistance (neo.sup.r) cassette, or upon excision of the neo gene, a lox recombinase recognition sequence.

[0026] In some embodiments, an isolated non-human cell or tissue of a non-human animal as described herein, or as made by a method described herein, is provided. In some embodiments, an isolated cell or tissue comprises a C9ORF72 locus as described herein. In some embodiments, a cell is a neuronal cell or a cell from a neuronal lineage. In some embodiments, an immortalized cell line is provided, which is made from an isolated cell of a non-human animal as described herein.

[0027] In some embodiments, a non-human embryonic stem cell is provided whose genome comprises a C9ORF72 locus as described herein. In some embodiments, a non-human embryonic stem cell is a rodent embryonic stem cell. In some certain embodiments, a rodent embryonic stem cell is a mouse embryonic stem cell and is from a 129 strain, C57BL strain, or a mixture thereof. In some certain embodiments, a rodent embryonic stem cell is a mouse embryonic stem cell and is a mixture of 129 and C57BL strains.

[0028] Also described herein is a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas) system, or one or more components of a CRISPR/Cas system, which may be used to delete from a cell, e.g., an embryonic stem cell, a heterologous hexanucleotide repeat expansion sequence (or portion thereof) inserted at an endogenous C9ORF72 locus as described herein. Such components include, for example, Cas proteins and/or guide RNAs (gRNAs), which gRNA may include two separate RNA molecules, e.g., a targeter-RNA (e.g., CRISPR RNA (crRNA)) and an activator-RNA (e.g., tracrRNA), or a single-guide RNA (e.g., single-molecule gRNA (sgRNA)).

[0029] CRISPR/Cas systems include transcripts and other elements involved in the expression of, or directing the activity of, Cas genes. A CRISPR/Cas system can be, for example, a type I, a type II, or a type III system. Alternatively, a CRISPR/Cas system can be a type V system (e.g., subtype V-A or subtype V-B). A heterologous hexanucleotide repeat expansion sequence (or portion thereof) inserted at an endogenous C9ORF72 locus as described herein may be deleted by utilizing CRISPR complexes (comprising a guide RNA (gRNA) complexed with a Cas protein) for site-directed cleavage of nucleic acids.

[0030] A CRISPR/Cas system as described herein may comprise a Cas protein (e.g., Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cu1966, and homologs or modified versions thereof) and/or one or more guide RNA (gRNA), which target(s) a gRNA recognition sequence. A CRISPR/Cas system as described herein may further comprise at least one expression construct, which comprises a nucleic acid encoding a Cas protein (e.g., which may be operably linked to a promoter) and/or DNA encoding a gRNA as described herein.

[0031] In some embodiments, a gRNA recognition sequence, e.g., a target nucleic acid sequence to which a DNA-targeting segment of a gRNA will bind provided sufficient conditions for binding exist, is found in SEQ ID NO:45, or portion thereof. Site-specific binding and cleavage of SEQ ID NO:45 by Cas proteins can occur at locations determined by both (i) base-pairing complementarity between the gRNA and the target DNA and (ii) a short motif, called the protospacer adjacent motif (PAM), in the target DNA. The PAM can flank the guide RNA recognition sequence. Optionally, the guide RNA recognition sequence can be flanked on the 3' end by the PAM. Alternatively, the guide RNA recognition sequence can be flanked on the 5' end by the PAM. For example, the cleavage site of Cas proteins can be about 1 to about 10 or about 2 to about 5 base pairs (e.g., 3 base pairs) upstream or downstream of the PAM sequence. In some cases (e.g., when Cas9 from S. pyogenes or a closely related Cas9 is used), the PAM sequence of the non-complementary strand can be 5'-N.sub.1GG-3', where N.sub.1 is any DNA nucleotide and is immediately 3' of the guide RNA recognition sequence of the non-complementary strand of the target DNA. As such, the PAM sequence of the complementary strand would be 5'-CCN.sub.2-3', where N.sub.2 is any DNA nucleotide and is immediately 5' of the guide RNA recognition sequence of the complementary strand of the target DNA. In some such cases, N.sub.1 and N.sub.2 can be complementary and the N.sub.1-N.sub.2 base pair can be any base pair (e.g., N.sub.1=C and N.sub.2=G; N.sub.1=G and N.sub.2=C; N.sub.1=A and N.sub.2=T; or N.sub.1=T and N.sub.2=A). In the case of Cas9 from S. aureus, the PAM can be NNGRRT or NNGRR, where N can be A, G, C, or T, and R can be G or A. In some cases (e.g., for FnCpf1), the PAM sequence can be upstream of the 5' end and have the sequence 5'-TTN-3'.
In some embodiments, a gRNA recognition sequence starts at position 190, 196, 274, 899, 905, 1006, or 1068 of SEQ ID NO:45.
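
As a minimal sketch of the PAM rule described above for Cas9 from S. pyogenes, the following script scans both strands of a DNA sequence for candidate guide RNA recognition sequences followed immediately on the 3' side by 5'-NGG-3'. Several points are assumptions made for illustration: the 20-nucleotide recognition length, the 0-based coordinates, the 3-base-pair cut offset (the "e.g., 3 base pairs" example above), and the function names `find_spcas9_sites` and `predicted_cut_index`. The toy input is hypothetical and is not SEQ ID NO:45.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq, guide_len=20):
    """Find recognition sequences followed by an NGG PAM on either strand.

    Returns (strand, start, recognition_sequence, pam) tuples. `start` is a
    0-based index into the strand that was scanned, so "-" strand hits are
    indexed on the reverse complement of `seq`.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    sites = []
    for strand, scanned in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(scanned):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


def predicted_cut_index(start, guide_len=20, offset=3):
    """Index at which the scanned strand would be cut, `offset` bp 5' of the PAM."""
    return start + guide_len - offset


if __name__ == "__main__":
    # Hypothetical toy target: arbitrary flanks, a 20-nt sequence, a made-up TGG PAM.
    toy = "AAAAA" + "AGTACTGTGAGAGCAAGTAG" + "TGG" + "AAAAA"
    for strand, start, recognition, pam in find_spcas9_sites(toy):
        print(strand, start, recognition, pam, predicted_cut_index(start))
```

Other PAMs mentioned above (e.g., NNGRRT for Cas9 from S. aureus, or a 5' TTN for FnCpf1) would need a different pattern and, for a PAM on the 5' side, a different orientation of the search.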

[0032] As disclosed herein, guide RNAs may be provided in any form. In some embodiments, gRNA can be provided in the form of RNA, either as two molecules (a separate crRNA and tracrRNA) or as one molecule (sgRNA), and optionally in the form of a complex with a Cas protein. The gRNA can also be provided in the form of DNA encoding the gRNA. In some embodiments, the DNA encoding the gRNA can encode a single RNA molecule (sgRNA) or separate RNA molecules (e.g., separate crRNA and tracrRNA) (wherein the separate RNA molecules may be provided as one DNA molecule, or as separate DNA molecules encoding the crRNA and tracrRNA, respectively).

[0033] In one embodiment, a CRISPR/Cas system as described herein comprises a Cas9 protein or a protein derived from a Cas9 from a type II CRISPR/Cas system and/or at least one gRNA, wherein the at least one gRNA is encoded by DNA that encodes a crRNA and/or a tracrRNA. In some embodiments, a DNA encoding a crRNA comprises a sequence selected from the group consisting of AGTACTGTGAGAGCAAGTAG (R) (SEQ ID NO:38), GCTCTCACAGTACTCGCTGA (SEQ ID NO:39), CCGCAGCCTGTAGCAAGCTC (SEQ ID NO:40), CGGCCGCTAGCGCGATCGCG (SEQ ID NO:41), ACGCCCCGCGATCGCGCTAG (R) (SEQ ID NO:42), TGGCGAGTGGGTGAGTGAGG (SEQ ID NO:43), GGAAGAGGCGCGGGTAGAAG (SEQ ID NO:44), GAGTACTGTGAGAGCAAGTAG (R) (SEQ ID NO:46), GCCGCAGCCTGTAGCAAGCTC (SEQ ID NO:47), GCGGCCGCTAGCGCGATCGCG (SEQ ID NO:48), GACGCCCCGCGATCGCGCTAG (R) (SEQ ID NO:49), and GTGGCGAGTGGGTGAGTGAGG (SEQ ID NO:50). In one embodiment, a CRISPR/Cas system described herein comprises a combination of at least seven distinct crRNA encoding sequences, wherein each of the seven crRNA encoding sequences comprises a sequence set forth as SEQ ID NO: 38, 39, 40, 41, 42, 43 or 44. In one embodiment, a CRISPR/Cas9 system described herein comprises a combination of at least seven distinct crRNA encoding sequences, wherein each of the seven crRNA encoding sequences comprises a sequence set forth as SEQ ID NO: 46, 39, 47, 48, 49, 50 or 44. In one embodiment, a CRISPR/Cas9 system described herein comprises a combination of at least three distinct crRNA encoding sequences, each comprising a sequence set forth as SEQ ID NO: 40, 43 or 44. In one embodiment, a CRISPR/Cas9 system described herein comprises a combination of at least three distinct crRNA encoding sequences, each comprising a sequence set forth as SEQ ID NO: 47, 50 or 44. In one embodiment, a CRISPR/Cas9 system described herein comprises a combination of at least four distinct crRNA encoding sequences, each comprising a sequence set forth as SEQ ID NO: 38, 39, 41 or 42.
In one embodiment, a CRISPR/Cas9 system described herein comprises a combination of at least four distinct crRNA encoding sequences, each comprising a sequence set forth as SEQ ID NO: 46, 39, 48, or 49.
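
The relationship between the two families of crRNA encoding sequences listed above can be checked with a short script. Comparing the strings as written, each of SEQ ID NOS:46-50 equals one of SEQ ID NOS:38, 40, 41, 42 and 43 with a single G added at the 5' end, while SEQ ID NOS:39 and 44 already begin with G and appear unchanged in both families. The following is a minimal sketch that verifies only this string relationship; the text does not state the reason for the added G, and the names `CRRNA_DNA`, `g_prefixed_pairs` and `remap` are hypothetical.

```python
# crRNA encoding sequences as listed above, keyed by SEQ ID NO.
CRRNA_DNA = {
    38: "AGTACTGTGAGAGCAAGTAG",
    39: "GCTCTCACAGTACTCGCTGA",
    40: "CCGCAGCCTGTAGCAAGCTC",
    41: "CGGCCGCTAGCGCGATCGCG",
    42: "ACGCCCCGCGATCGCGCTAG",
    43: "TGGCGAGTGGGTGAGTGAGG",
    44: "GGAAGAGGCGCGGGTAGAAG",
    46: "GAGTACTGTGAGAGCAAGTAG",
    47: "GCCGCAGCCTGTAGCAAGCTC",
    48: "GCGGCCGCTAGCGCGATCGCG",
    49: "GACGCCCCGCGATCGCGCTAG",
    50: "GTGGCGAGTGGGTGAGTGAGG",
}


def g_prefixed_pairs(seqs):
    """Map each SEQ ID NO to the SEQ ID NO whose sequence is 'G' + its sequence."""
    pairs = {}
    for short_id, short_seq in seqs.items():
        for long_id, long_seq in seqs.items():
            if long_seq == "G" + short_seq:
                pairs[short_id] = long_id
    return pairs


def remap(combination, pairs):
    """Swap each SEQ ID NO for its G-prefixed counterpart, where one exists."""
    return [pairs.get(seq_id, seq_id) for seq_id in combination]


if __name__ == "__main__":
    pairs = g_prefixed_pairs(CRRNA_DNA)
    print(pairs)
    # The seven-, three- and four-member combinations recited above.
    for combo in ([38, 39, 40, 41, 42, 43, 44], [40, 43, 44], [38, 39, 41, 42]):
        print(combo, "->", remap(combo, pairs))
```

Applied to the combinations recited above, the remapping reproduces the alternative combinations given in the text (e.g., SEQ ID NOS: 40, 43 and 44 correspond to SEQ ID NOS: 47, 50 and 44).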

[0034] In some embodiments, a gRNA disclosed herein is encoded by DNA encoding a tracrRNA. In some embodiments, the tracrRNA encoding sequence comprises a sequence set forth as SEQ ID NO:63, 64 or 65. In some embodiments, a gRNA as described herein comprises a crRNA and a tracrRNA. In some embodiments, a gRNA as disclosed herein comprises one or more crRNA (e.g., encoded by DNA comprising a sequence set forth as SEQ ID NO: 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49 or 50) and a tracrRNA (e.g., encoded by DNA comprising a sequence set forth as SEQ ID NO:63, 64 or 65). In some embodiments, the DNA encoding the gRNA can encode a single RNA molecule (sgRNA) or separate RNA molecules (e.g., separate crRNA and tracrRNA) (wherein the separate RNA molecules may be provided as one DNA molecule, or as separate DNA molecules encoding the crRNA and tracrRNA, respectively).

[0035] Targeted genetic modifications can be generated by contacting a cell with a Cas protein and one or more guide RNAs that hybridize to one or more guide RNA recognition sequences within a target genomic locus. At least one of the one or more guide RNAs can form a complex with and can guide the Cas protein to at least one of the one or more guide RNA recognition sequences, and the Cas protein can cleave the target genomic locus within at least one of the one or more guide RNA recognition sequences. Cleavage by the Cas protein can create a double-strand break or a single-strand break (e.g., if the Cas protein is a nickase). The end sequences generated by the double-strand break or the single-strand break can then undergo recombination.

[0036] In some embodiments, a non-human germ cell is provided whose genome comprises a C9ORF72 locus as described herein. In some embodiments, a non-human germ cell is a rodent germ cell. In some certain embodiments, a rodent germ cell is a mouse germ cell and is from a 129 strain, C57BL strain, or a mixture thereof. In some certain embodiments, a rodent germ cell is a mouse germ cell and is a mixture of 129 and C57BL strains.

[0037] In some embodiments, the use of a non-human embryonic stem cell or germ cell as described herein is provided to make a genetically modified non-human animal. In some certain embodiments, a non-human embryonic stem cell or germ cell is a mouse embryonic stem cell or germ cell and is used to make a mouse comprising a C9ORF72 locus as described herein. In some certain embodiments, a non-human embryonic stem cell or germ cell is a rat embryonic stem cell or germ cell and is used to make a rat comprising a C9ORF72 locus as described herein.

[0038] In some embodiments, a non-human embryo is provided comprising, made from, obtained from, or generated from a non-human embryonic stem cell comprising a C9ORF72 locus as described herein. In some certain embodiments, a non-human embryo is a rodent embryo; in some embodiments, a mouse embryo; in some embodiments, a rat embryo.

[0039] In some embodiments, the use of a non-human embryo as described herein is provided to make a genetically modified non-human animal. In some certain embodiments, a non-human embryo is a mouse embryo and is used to make a mouse comprising a C9ORF72 locus as described herein. In some certain embodiments, a non-human embryo is a rat embryo and is used to make a rat comprising a C9ORF72 locus as described herein.

[0040] In some embodiments, a non-human animal model of amyotrophic lateral sclerosis (ALS) or frontotemporal dementia (FTD) is provided, which non-human animal has an endogenous C9ORF72 locus comprising a heterologous hexanucleotide repeat expansion sequence as disclosed herein.

[0041] In some embodiments, a non-human animal model of amyotrophic lateral sclerosis (ALS) or frontotemporal dementia (FTD) is provided, which is obtained by an insertion of a heterologous hexanucleotide repeat expansion sequence in an endogenous C9ORF72 locus.

[0042] In some embodiments, a method for identifying a therapeutic candidate for the treatment of a neurodegenerative disease, disorder or condition is provided, the method comprising (a) administering a candidate agent to a non-human animal or non-human animal cell (e.g., embryonic stem cell, an embryonic stem cell-derived motor neuron, a brain cell, a cortical cell, a neuronal cell, a muscle cell, a heart cell) whose genome comprises an endogenous C9ORF72 locus modified as described herein; (b) performing one or more assays to determine if the candidate agent has a modulating effect on one or more signs, symptoms and/or conditions associated with the disease, disorder or condition (e.g., increased transcription of sense or antisense C9orf72 RNA from the C9orf72 locus, increased nuclear and/or cytoplasmic RNA foci comprising sense or antisense C9orf72 RNA, increased RAN translation products (e.g., dipeptide repeat proteins)); and (c) identifying the candidate agent that has a modulating effect on the one or more signs, symptoms and/or conditions associated with the disease, disorder or condition as the therapeutic candidate. In some embodiments, the disease, disorder or condition is a neurodegenerative disease, disorder or condition. In some embodiments, the candidate agent is administered in vivo to a non-human animal as described herein, and one or more assays are performed on tissue comprising a brain cell, a cortical cell, a neuronal cell, a muscle cell, a heart cell, or a germ cell isolated from the non-human animal after administration. In some embodiments, the candidate agent is administered to a cell (e.g., an embryonic stem cell, an embryonic stem cell-derived motor neuron, a brain cell, a cortical cell, a neuronal cell, a muscle cell, a heart cell) comprising a hexanucleotide repeat expansion sequence at the C9orf72 locus as described herein, and the assay is performed in vitro.
In some embodiments, the assay is quantitative polymerase chain reaction (qPCR) to detect C9orf72 gene products, e.g., sense and antisense C9orf72 RNA. In some embodiments, qPCR may be performed with a primer and/or probe having a nucleotide sequence set forth in SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, or any combination thereof. In some embodiments, the assay measures RNA foci comprising a C9orf72 sense or antisense RNA transcript, e.g., an RNA transcript of a hexanucleotide repeat expansion sequence. In some embodiments, the assay measures RNA foci comprising a C9orf72 sense or antisense RNA transcript, e.g., an RNA transcript of a hexanucleotide repeat expansion sequence, using one or more probes having a nucleotide sequence as set forth in any one of SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, and/or SEQ ID NO:84. In some embodiments, the assay measures RAN translation products, e.g., the assay is immunofluorescence and RAN translation products (e.g., dipeptide repeat proteins, e.g., polyGA dipeptide repeat proteins) are measured with an anti-polyGA antibody. In some embodiments, the assay measures C9orf72 protein levels.

[0043] In some embodiments, use of a non-human animal as described herein is provided in the manufacture of a medicament for the treatment of a neurodegenerative disease, disorder or condition.

[0044] In some embodiments, a neurodegenerative disease, disorder or condition is amyotrophic lateral sclerosis (ALS). In some embodiments, a neurodegenerative disease, disorder or condition is frontotemporal dementia (FTD).

[0045] In various embodiments, one or more phenotypes as described herein is or are as compared to a reference or control. In some embodiments, a reference or control includes a non-human animal having a modification as described herein, a modification that is different than a modification as described herein, or no modification (e.g., a wild type non-human animal). Non-human animals comprising a heterologous hexanucleotide repeat expansion sequence comprising a sequence set forth as SEQ ID NO:2, a variant thereof, SEQ ID NO: 4, a variant thereof, or SEQ ID NO:5, or a variant thereof, may exhibit a wildtype phenotype, e.g., may be used as a reference, or control, non-human animal in the methods described herein.

[0046] In various embodiments, a non-human animal is homozygous for the C9orf72 locus described herein. In various embodiments, the non-human animal is heterozygous for the C9orf72 locus described herein.

[0047] In various embodiments, a non-human animal described herein is a rodent; in some embodiments, a mouse; in some embodiments, a rat.

[0048] As used in this application, the terms "about" and "approximately" are used as equivalents. Any numerals used in this application with or without about/approximately are meant to cover any normal fluctuations appreciated by one of ordinary skill in the relevant art.

[0049] Other features, objects, and advantages of non-human animals, cells and methods provided herein are apparent in the detailed description of certain embodiments that follows. It should be understood, however, that the detailed description, while indicating certain embodiments, is given by way of illustration only, not limitation. Various changes and modifications within the scope of the invention will become apparent to those skilled in the art from the detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0050] The Drawings included herein, which are composed of the following Figures, are for illustration purposes only and not for limitation. The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with the color drawing(s) will be provided by the United States Patent and Trademark Office upon request and payment of the necessary fee.

[0051] FIG. 1A shows a schematic illustration, not to scale, of the three reported mouse C9orf72 transcript isoforms (V1, V2 and V3) in the top box and a schematic illustration, not to scale, of a targeting strategy for insertion of one of two human heterologous hexanucleotide repeat expansion sequences spanning exons 1a and 1b of the human C9orf72 gene and comprising 3 or 100 repeats into an endogenous mouse C9orf72 locus. In FIG. 1A, white filled boxes represent mouse exons, with white diagonally striped boxes representing non-coding mouse exons of the mouse C9orf72 locus. Horizontally striped boxes are non-coding exons of a human C9orf72 locus and the diamond represents the hexanucleotide repeat. A first targeting vector comprising a sequence set forth as SEQ ID NO:2 and a second targeting vector comprising a sequence set forth as SEQ ID NO:4 were generated. The first targeting vector includes from 5' to 3': a mouse homology arm 89 Kb upstream from RP23-434N2 of the mouse 3110043021Rik gene and comprising SEQ ID NO:6; a human sequence set forth as SEQ ID NO:8 which spans non-coding exons 1a and 1b of human C9orf72 and includes the intervening intron containing three repeats of the hexanucleotide sequence GGGGCC; a drug selection cassette that comprises a promoter from the human ubiquitin 1 gene (hUb1) and the bacterial Em7 gene operably linked to a neomycin phosphotransferase resistance gene (neo-r) and is flanked by loxP sites; and a mouse homology arm 86 Kb downstream from RP23-434N2 of the mouse 3110043021Rik gene and comprising SEQ ID NO:7. 
The second targeting vector includes from 5' to 3': a mouse homology arm 89 Kb upstream from RP23-434N2 of the mouse 3110043021Rik gene and comprising SEQ ID NO:6; a human sequence set forth as SEQ ID NO:9 which spans non-coding exons 1a and 1b of human C9orf72 and includes the intervening intron containing 100 repeats of the hexanucleotide sequence GGGGCC; a drug selection cassette that comprises a promoter from the human ubiquitin 1 gene (hUb1) and the bacterial Em7 gene operably linked to a neomycin phosphotransferase resistance gene (neo-r) and is flanked by loxP sites; and a mouse homology arm 86 Kb downstream from RP23-434N2 of the mouse 3110043021Rik gene and comprising SEQ ID NO:7. Upon homologous recombination with the first or second targeting vector, a mouse genomic region of about 853 bp, including a portion of exon 1 and part of intron 1 of mouse 3110043021Rik, is replaced with a sequence comprising the genomic sequence spanning exons 1a-1b of the human C9orf72 non-coding sequence. The resulting modified mouse C9orf72-HRE₃ loci before and after excision of the drug resistance cassette are depicted in FIG. 1B. The resulting modified mouse C9orf72-HRE₁₀₀ loci before and after excision of the drug resistance cassette are depicted in FIG. 1C. In FIGS. 1B and 1C, murine non-coding regions are represented by diagonally striped boxes, human non-coding exons are represented by horizontally striped boxes, and mouse coding exons are represented by white boxes. Also shown in the top panels of FIGS. 1B and 1C is an approximate location of a probe (vertical white rectangle) used for Southern blot analysis (SEQ ID NO:29).
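
As an illustrative aside, the number of tandem GGGGCC units in a candidate insert (for example, a three-repeat or 100-repeat sequence) can be checked computationally. The Python sketch below is not part of the disclosed methods: the function name `longest_repeat_run` and the flanking bases in the example sequences are hypothetical, and only the hexanucleotide GGGGCC is taken from the text.

```python
import re

HEXANUCLEOTIDE = "GGGGCC"  # repeat unit named in the text


def longest_repeat_run(sequence: str, unit: str = HEXANUCLEOTIDE) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Hypothetical flanking bases, for illustration only.
three_repeat_insert = "ACGT" + HEXANUCLEOTIDE * 3 + "TTAG"
hundred_repeat_insert = "ACGT" + HEXANUCLEOTIDE * 100 + "TTAG"

print(longest_repeat_run(three_repeat_insert))
print(longest_repeat_run(hundred_repeat_insert))
```

The sketch counts only uninterrupted runs of the exact unit; an interrupted or imperfect repeat would be reported as separate, shorter runs.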

[0052] Shown in FIG. 2A is the result of Southern blot analysis of genomic DNA isolated from control ES cell clones, ES cell clones targeted with a targeting vector comprising a heterologous repeat expansion sequence comprising three repeats of the hexanucleotide sequence (8026) and after excision of the drug cassette (8027 A-C4), or ES cell clones targeted with a targeting vector comprising a heterologous repeat expansion sequence comprising 100 repeats of the hexanucleotide sequence (8028) and after excision of the drug cassette (8029 A-A3, 8029 A-A6, 8029 B-A4, 8029 B-A10). FIG. 2B shows the results of genotyping samples (n=6) including a control ES cell clone, the 8027 A-C4 clone, the 8029 A-A3 clone, the 8029 A-A6 clone, the 8029 B-A4 clone, the 8029 B-A10 clone, and controls (n=7) obtained from human samples containing three hexanucleotide repeat expansion sequences.

[0053] FIG. 3 shows a schematic illustration, not to scale, of the humanized C9orf72-HREx (where x ≥ 1), the humanized region, and the wildtype (WT) C9orf72 mouse loci. Also shown in FIG. 3 are the approximate locations of 5'- and 3'-primers (white arrows) and probes (filled rectangles) used in the TAQMAN® qualitative PCR analyses A, B, G, H, and D described in Table 1 to quantify gene expression products from the modified C9orf72-HRE loci (A, B, G, H) or both the modified and wildtype C9orf72 loci (D). In FIG. 3, murine non-coding regions are represented by diagonally striped boxes, human non-coding exons are represented by horizontally striped boxes, and mouse coding exons are represented by white boxes. The sequences for the primers and probes depicted in FIG. 3 and described in Table 1 are provided in Table 5.

[0054] TABLE 1

Analyses   Location of 5'-primer   Location of 3'-primer   Location of probe
A          Mouse exon 1a           Human exon 1a           Spans junction of mouse exon 1a and human exon 1a
B          Human exon 1a           Mouse exon 2            Human intron 2
G          Human Intron 2          Human Intron 2          Human Intron 2
H          Human Intron 2          Human Intron 2          Human Intron 2
D          Mouse Exon 5            Mouse Exon 6            Mouse Intron 6

[0055] FIG. 4 provides bar graphs showing expression levels (as determined by the TAQMAN® qualitative PCR assays A, B, G, and H depicted in FIG. 3) of the C9orf72 locus (y-axis) by embryonic stem cell derived motor neurons (ESMNs), total brain tissue, or parental embryonic stem (ES) cells that are heterozygous (Het) or homozygous (Homo) for a wildtype C9orf72 locus (control) or a modified C9orf72 locus comprising three (3×), thirty (30×) or ninety-two (92×) repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 relative to ESMNs, brain, or parental ESCs, respectively, that are heterozygous for a modified C9orf72 locus comprising three (3×) repeats of the hexanucleotide sequence set forth as SEQ ID NO:1. All ESMNs and parental ES cells were heterozygous for the modified C9orf72 loci, and all controls were homozygous for the wildtype C9orf72 locus.

[0056] FIGS. 5A-5C provide bar graphs showing the differences in the count values (Δct; y-axis) of C9orf72 gene products (detected by the TAQMAN® qualitative PCR assay A (FIG. 5A), assay B (FIG. 5B), or assay D (FIG. 5C) as depicted in FIG. 3) by embryonic stem cell derived motor neurons (ESMNs), total mouse brain, or parental embryonic stem (ES) cells that are heterozygous (het) or homozygous (homo) for a wildtype C9orf72 locus (Controls) or a modified C9orf72 locus comprising three (3×), thirty (30×) or ninety-two (92×) repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 and the count values of GAPDH gene products. All ESMNs and parental ES cells were heterozygous for the modified C9orf72 loci, and all controls were homozygous for the wildtype C9orf72 locus.
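
For orientation, a Δct value of the kind plotted in FIGS. 5A-5C is the difference between the count value of a target assay and that of a reference gene (GAPDH here). The Python sketch below shows only the arithmetic. The Ct numbers are invented for illustration, the sign convention (target minus reference) is an assumption, and the 2^-ΔΔCt conversion to relative expression is a commonly used convention assumed here rather than a calculation stated in this document.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Delta Ct: target-assay Ct minus reference-gene Ct (sign convention assumed)."""
    return ct_target - ct_reference


def relative_expression(dct_sample: float, dct_calibrator: float) -> float:
    """Fold change versus a calibrator via 2^-(ddCt); assumes ~100% PCR efficiency."""
    return 2.0 ** -(dct_sample - dct_calibrator)


# Invented Ct values, for illustration only.
dct_3x = delta_ct(ct_target=26.0, ct_reference=18.0)   # calibrator (3x repeats)
dct_92x = delta_ct(ct_target=25.0, ct_reference=18.0)  # sample (92x repeats)
fold_92x_vs_3x = relative_expression(dct_92x, dct_3x)

print(dct_3x, dct_92x, fold_92x_vs_3x)
```

Under this convention a lower Δct corresponds to more target transcript relative to the reference gene, so a one-cycle decrease corresponds to roughly a two-fold increase.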

[0057] FIG. 6 provides bar graphs showing the differences in the count values (Δct; y-axis) of C9orf72 gene products (detected by the TAQMAN® qualitative PCR assay B as depicted in FIG. 3) in tissues isolated from the cortex, brainstem, remaining (rem) brain, spinal cord, muscle, liver, heart, or kidneys of mice heterozygous (het) or homozygous (homo) for a wildtype C9orf72 locus (WT) or a modified C9orf72 locus comprising three (3×) or ninety-two (92×) repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 and the count values of β2-microglobulin (B2M) gene products.

[0058] FIG. 7 shows Western blot images (top) from reducing SDS-PAGE analysis of lysates from embryonic stem cell-derived motor neurons (ESMNs) homozygous for a wildtype C9orf72 locus (CTRL) or heterozygous for a modified C9orf72 locus comprising three (G₄C₂ 3×), thirty (G₄C₂ 30×) or ninety-two (G₄C₂ 92×) repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, blotted with anti-C9orf72 antibody (top) or anti-GAPDH antibody (bottom). Bar graphs (bottom panel) of the protein levels of C9orf72 of these samples normalized to protein levels of C9orf72 of ESMNs heterozygous for a modified C9orf72 locus comprising three repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 are also provided, as are molecular weight markers.

[0059] FIG. 8 shows a Western blot image (top) from reducing SDS-PAGE analysis of lysates from embryonic stem cell-derived motor neurons (ESMNs) heterozygous for a modified C9orf72 locus comprising three (G₄C₂ 3×) or ninety-two (G₄C₂ 92×) repeats of the hexanucleotide sequence set forth as SEQ ID NO:1. Lysates containing 0 μg, 1.25 μg, 2.5 μg, 5 μg, or 10 μg total proteins are blotted with anti-C9orf72 antibody (shown) or anti-GAPDH antibody (data not shown). Bar graphs (bottom) of the protein levels of C9orf72 of these samples normalized to protein levels of GAPDH in these samples are also provided, as are molecular weight markers.

[0060] FIGS. 9A and 9B are images obtained from fluorescent in situ hybridization (FISH) of embryonic stem cell derived motor neurons (ESMNs) heterozygous for a C9orf72 locus modified to comprise three (C9orf72 G₄C₂ 3×), thirty (C9orf72 G₄C₂ 30×) or ninety-two (C9orf72 G₄C₂ 92×) repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 stained with DNA (FIG. 9A) or LNA (FIG. 9B) probes, which images show the nuclear and cytoplasmic locations of sense (FIG. 9A) or antisense (FIG. 9B) transcripts of the hexanucleotide repeat sequence set forth in SEQ ID NO:1 in the ESMNs. Arrows point to exemplary stained RNA foci.

[0061] FIG. 10 provides images obtained from immunofluorescence of embryonic stem cell derived motor neurons (ESMNs) heterozygous for a C9orf72 locus modified to comprise three (C9orf72 G₄C₂ 3×) or ninety-two (C9orf72 G₄C₂ 92×) repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, which images show the nuclear locations of dipeptide repeat proteins (polyGA) translated (through RAN translation, a non-AUG mechanism) from transcripts of the hexanucleotide repeat sequence set forth in SEQ ID NO:1 in the ESMNs. Arrows point to exemplary stained polyGA dipeptide repeat proteins.

[0062] FIG. 11 shows a schematic illustration, not to scale, of about 1300 bp of a mouse C9ORF72 locus comprising a heterologous (human) hexanucleotide repeat expansion comprising about 92 repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, and which may be used as a reference sequence to generate a CRISPR/Cas system for the deletion of the expansion sequence. Also depicted in FIG. 11 are the approximate locations of (1) the 92 repeats of the hexanucleotide sequence depicted by downward pointing arrows, (2) the starting positions (190, 196 and 274) of three sites upstream of the hexanucleotide repeat expansion sequence that may be targeted by gRNA respectively comprising the sequence set forth as SEQ ID NO:38, SEQ ID NO:39, and SEQ ID NO:40, (3) the starting positions (899, 905, 1006 and 1068) of four sites downstream of the hexanucleotide repeat expansion sequence that may be targeted by gRNA respectively comprising the sequence set forth as SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43 and SEQ ID NO:44, and (4) the approximate locations for forward (F-) and reverse (R-) primers that may be used to confirm the deletion in selected cell clones. The nucleic acid sequence of the reference sequence depicted in FIG. 11 is set forth as SEQ ID NO:45.
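
To make the geometry of FIG. 11 concrete, the Python sketch below enumerates every pairing of one upstream and one downstream gRNA target site and reports the distance between their starting positions. It is an illustration only: the positions and SEQ ID NO assignments are those listed above, while the names `UPSTREAM_SITES`, `DOWNSTREAM_SITES` and `grna_pairs` are hypothetical, and the span between site starts only approximates an actual deletion, which would depend on the cut position within each site.

```python
from itertools import product

# Starting positions within the ~1300 bp reference (SEQ ID NO:45), keyed by gRNA SEQ ID NO.
UPSTREAM_SITES = {38: 190, 39: 196, 40: 274}
DOWNSTREAM_SITES = {41: 899, 42: 905, 43: 1006, 44: 1068}


def grna_pairs():
    """List (upstream SEQ ID, downstream SEQ ID, span in bp between site starts)."""
    return [
        (up_id, down_id, down_pos - up_pos)
        for (up_id, up_pos), (down_id, down_pos)
        in product(UPSTREAM_SITES.items(), DOWNSTREAM_SITES.items())
    ]


pairs = grna_pairs()
shortest = min(pairs, key=lambda p: p[2])
longest = max(pairs, key=lambda p: p[2])

print(len(pairs), shortest, longest)
```

Every pairing flanks the repeat expansion, so any of the combinations would in principle bracket the sequence to be deleted; the choice among them is not addressed by this sketch.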

[0063] FIG. 12 shows an exemplary 10,718 bp expression construct that may be used in a CRISPR/Cas system. The expression construct comprises a nucleic acid encoding a mouse Cas9 protein "mouse opt Cas9" fused with an N-terminal nuclear localization signal (NLS) and C-terminal nuclear localization signal, the expression of the fusion protein being under the control of a CAGG promoter. Upstream of the nucleic acid is a Kozak sequence, and downstream of the nucleic acid is a bovine growth hormone polyadenylation (bGHpA) tail. Also shown as part of the expression construct are an EF1 promoter driving the expression of a nucleotide sequence encoding a green fluorescent protein (GFP) fused with a puromycin resistance gene operably linked to an SV40 polyadenylation (SV40 polyA) tail, an origin of replication site (pMB1), and a β-lactamase gene providing ampicillin (Amp) resistance. The expression construct allows for the insertion of DNA encoding gRNA, e.g., a crRNA, between a U6 promoter and a termination signal. An expression construct as depicted in FIG. 12 may further comprise, downstream of the U6 promoter and upstream of a termination signal, a tracrRNA encoding sequence. Such tracrRNA encoding sequence is placed such that it may be operably linked to the, e.g., crRNA, upon its insertion. In some embodiments, a tracrRNA encoding sequence comprises GTTGGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO:63); GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO:64); GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO:65), or portions thereof.

DEFINITIONS

[0064] This invention is not limited to particular methods and experimental conditions described herein, as such methods and conditions may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention is defined by the claims.

[0065] Unless defined otherwise, all terms and phrases used herein include the meanings that the terms and phrases have attained in the art, unless the contrary is clearly indicated or clearly apparent from the context in which the term or phrase is used. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, particular methods and materials are now described. All publications mentioned herein are hereby incorporated by reference.

[0066] "Administration" includes the administration of a composition to a subject or system (e.g., to a cell, organ, tissue, organism, or relevant component or set of components thereof). Those of ordinary skill will appreciate that route of administration may vary depending, for example, on the subject or system to which the composition is being administered, the nature of the composition, the purpose of the administration, etc. For example, in certain embodiments, administration to an animal subject (e.g., to a human or a rodent) may be bronchial (including by bronchial instillation), buccal, enteral, interdermal, intra-arterial, intradermal, intragastric, intramedullary, intramuscular, intranasal, intraperitoneal, intrathecal, intravenous, intraventricular, mucosal, nasal, oral, rectal, subcutaneous, sublingual, topical, tracheal (including by intratracheal instillation), transdermal, vaginal and/or vitreal. In some embodiments, administration may involve intermittent dosing. In some embodiments, administration may involve continuous dosing (e.g., perfusion) for at least a selected period of time.

[0067] "Amelioration" includes the prevention, reduction or palliation of a state, or improvement of the state of a subject. Amelioration includes, but does not require, complete recovery or complete prevention of a disease, disorder or condition (e.g., radiation injury).

[0068] "Approximately", as applied to one or more values of interest, includes a value that is similar to a stated reference value. In certain embodiments, the term "approximately" or "about" refers to a range of values that fall within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value).
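
By way of a worked example of the percentage ranges recited above, the Python sketch below tests whether a value falls within a chosen percentage of a reference value in either direction. The function name `within_tolerance` and the example numbers are hypothetical, and the default of 25% simply picks the widest range listed in the definition.

```python
def within_tolerance(value: float, reference: float, percent: float = 25.0) -> bool:
    """True if `value` lies within +/- `percent` % of `reference` (either direction)."""
    return abs(value - reference) <= abs(reference) * percent / 100.0


# Illustrative numbers only: is 92 "about" 100 at a 10% or a 5% tolerance?
print(within_tolerance(92, 100, percent=10))
print(within_tolerance(92, 100, percent=5))
```

The sketch does not handle the stated exception for numbers that would exceed 100% of a possible value; that check would depend on the quantity being described.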

[0069] "Biologically active" includes a characteristic of any agent that has activity in a biological system, in vitro or in vivo (e.g., in an organism). For instance, an agent that, when present in an organism, has a biological effect within that organism is considered to be biologically active. In particular embodiments, where a protein or polypeptide is biologically active, a portion of that protein or polypeptide that shares at least one biological activity of the protein or polypeptide is typically referred to as a "biologically active" portion.

[0070] "Comparable" includes two or more agents, entities, situations, sets of conditions, etc. that may not be identical to one another but that are sufficiently similar to permit comparison therebetween so that conclusions may reasonably be drawn based on differences or similarities observed. Those of ordinary skill in the art will understand, in context, what degree of identity is required in any given circumstance for two or more such agents, entities, situations, sets of conditions, etc. to be considered comparable.

[0071] "Conservative", when describing a conservative amino acid substitution, includes substitution of an amino acid residue by another amino acid residue having a side chain R group with similar chemical properties (e.g., charge or hydrophobicity). In general, a conservative amino acid substitution will not substantially change the functional properties of interest of a protein, for example, the ability of a receptor to bind to a ligand. Examples of groups of amino acids that have side chains with similar chemical properties include: aliphatic side chains such as glycine, alanine, valine, leucine, and isoleucine; aliphatic-hydroxyl side chains such as serine and threonine; amide-containing side chains such as asparagine and glutamine; aromatic side chains such as phenylalanine, tyrosine, and tryptophan; basic side chains such as lysine, arginine, and histidine; acidic side chains such as aspartic acid and glutamic acid; and sulfur-containing side chains such as cysteine and methionine. Conservative amino acids substitution groups include, for example, valine/leucine/isoleucine, phenylalanine/tyrosine, lysine/arginine, alanine/valine, glutamate/aspartate, and asparagine/glutamine. In some embodiments, a conservative amino acid substitution can be a substitution of any native residue in a protein with alanine, as used in, for example, alanine scanning mutagenesis. In some embodiments, a conservative substitution is made that has a positive value in the PAM250 log-likelihood matrix disclosed in Gonnet, G. H. et al., 1992, Science 256:1443-1445. In some embodiments, a substitution is a moderately conservative substitution wherein the substitution has a nonnegative value in the PAM250 log-likelihood matrix.
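
The substitution groups named in the preceding paragraph can be expressed as a simple lookup. The Python sketch below is a minimal illustration that uses only the six example groups listed in the text (written with one-letter amino acid codes); it does not implement the PAM250 log-likelihood criterion or the alanine-scanning embodiment, and the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical.

```python
# Substitution groups listed in the paragraph above (one-letter amino acid codes).
CONSERVATIVE_GROUPS = [
    set("VLI"),  # valine/leucine/isoleucine
    set("FY"),   # phenylalanine/tyrosine
    set("KR"),   # lysine/arginine
    set("AV"),   # alanine/valine
    set("ED"),   # glutamate/aspartate
    set("NQ"),   # asparagine/glutamine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two different residues share at least one listed group."""
    a, b = original.upper(), replacement.upper()
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


print(is_conservative("K", "R"))  # lysine -> arginine
print(is_conservative("K", "E"))  # lysine -> glutamate
```

Because the groups are treated independently, membership is not transitive: alanine/valine and valine/leucine are each listed, but alanine/leucine is not reported as conservative by this sketch.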

[0072] "Control" includes the art-understood meaning of a "control" being a standard against which results are compared. Typically, controls are used to augment integrity in experiments by isolating variables in order to make a conclusion about such variables. In some embodiments, a control is a reaction or assay that is performed simultaneously with a test reaction or assay to provide a comparator. A "control" also includes a "control animal." A "control animal" may have a modification as described herein, a modification that is different than a modification as described herein, or no modification (i.e., a wild type animal). In one experiment, a "test" (i.e., a variable being tested) is applied. In a second experiment, the "control," the variable being tested is not applied. In some embodiments, a control is a historical control (i.e., of a test or assay performed previously, or an amount or result that is previously known). In some embodiments, a control is or comprises a printed or otherwise saved record. A control may be a positive control or a negative control.

[0073] "Disruption" includes the result of a homologous recombination event with a DNA molecule (e.g., with an endogenous homologous sequence such as a gene or gene locus). In some embodiments, a disruption may achieve or represent an insertion, deletion, substitution, replacement, missense mutation, or a frame-shift of a DNA sequence(s), or any combination thereof. Insertions may include the insertion of entire genes or fragments of genes, e.g., exons, which may be of an origin other than the endogenous sequence (e.g., a heterologous sequence). In some embodiments, a disruption may increase expression and/or activity of a gene or gene product (e.g., of a protein encoded by a gene). In some embodiments, a disruption may decrease expression and/or activity of a gene or gene product. In some embodiments, a disruption may alter sequence of a gene or an encoded gene product (e.g., an encoded protein). In some embodiments, a disruption may truncate or fragment a gene or an encoded gene product (e.g., an encoded protein). In some embodiments, a disruption may extend a gene or an encoded gene product. In some such embodiments, a disruption may achieve assembly of a fusion protein. In some embodiments, a disruption may affect level, but not activity, of a gene or gene product. In some embodiments, a disruption may affect activity, but not level, of a gene or gene product. In some embodiments, a disruption may have no significant effect on level of a gene or gene product. In some embodiments, a disruption may have no significant effect on activity of a gene or gene product. In some embodiments, a disruption may have no significant effect on either level or activity of a gene or gene product.

[0074] "Determining", "measuring", "evaluating", "assessing", "assaying" and "analyzing" include any form of measurement, and include determining if an element is present or not. These terms include both quantitative and qualitative determinations. Assaying may be relative or absolute. "Assaying for the presence of" can be determining the amount of something present and/or determining whether or not it is present.

[0075] "Endogenous locus" or "endogenous gene" includes a genetic locus found in a parent or reference organism prior to introduction of a disruption, deletion, replacement, alteration, or modification as described herein. In some embodiments, the endogenous locus has a sequence found in nature. In some embodiments, the endogenous locus is a wild type locus. In some embodiments, the reference organism is a wild type organism. In some embodiments, the reference organism is an engineered organism. In some embodiments, the reference organism is a laboratory-bred organism (whether wild type or engineered).

[0076] "Endogenous promoter" includes a promoter that is naturally associated, e.g., in a wild type organism, with an endogenous gene.

[0077] "Gene" includes a DNA sequence in a chromosome that codes for a product (e.g., an RNA product and/or a polypeptide product). In some embodiments, a gene includes coding sequence (i.e., sequence that encodes a particular product). In some embodiments, a gene includes non-coding sequence. In some particular embodiments, a gene may include both coding (e.g., exonic) and non-coding (e.g., intronic) sequence. In some embodiments, a gene may include one or more regulatory sequences (e.g., promoters, enhancers, etc.) and/or intron sequences that, for example, may control or impact one or more aspects of gene expression (e.g., cell-type-specific expression, inducible expression, etc.). For the purpose of clarity we note that, as used in the present application, the term "gene" generally refers to a portion of a nucleic acid that encodes a polypeptide; the term may optionally encompass regulatory sequences, as will be clear from context to those of ordinary skill in the art. This definition is not intended to exclude application of the term "gene" to non-protein-coding expression units but rather to clarify that, in most cases, the term as used in this document refers to a polypeptide-coding nucleic acid.

[0078] "Heterologous" includes an agent or entity from a different source. For example, when used in reference to a polypeptide, nucleic acid sequence, gene, or gene product present in a particular cell or organism, the term clarifies that the relevant polypeptide, nucleic acid sequence, gene, or gene product: 1) was engineered by the hand of man; 2) was introduced into the cell or organism (or a precursor thereof) through the hand of man (e.g., via genetic engineering); and/or 3) is not naturally produced by or present in the relevant cell or organism (e.g., the relevant cell type or organism type). "Heterologous" also includes a polypeptide, nucleic acid sequence, gene or gene product that is normally present in a particular native cell or organism, but has been modified, for example, by mutation or placement under the control of non-naturally associated and, in some embodiments, non-endogenous regulatory elements (e.g., a promoter).

[0079] "Host cell" includes a cell into which a nucleic acid or protein has been introduced. Persons of skill upon reading this disclosure will understand that such terms refer not only to the particular subject cell, but also to the progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the phrase "host cell". In some embodiments, a host cell is or comprises a prokaryotic or eukaryotic cell. In general, a host cell is any cell that is suitable for receiving and/or producing a heterologous nucleic acid or protein, regardless of the Kingdom of life to which the cell is designated. Exemplary cells include those of prokaryotes and eukaryotes (single-cell or multiple-cell), bacterial cells (e.g., strains of Escherichia coli, Bacillus spp., Streptomyces spp., etc.), mycobacteria cells, fungal cells, yeast cells (e.g., Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris, Pichia methanolica, etc.), plant cells, insect cells (e.g., SF-9, SF-21, baculovirus-infected insect cells, Trichoplusia ni, etc.), non-human animal cells, human cells, or cell fusions such as, for example, hybridomas or quadromas. In some embodiments, the cell is a human, monkey, ape, hamster, rat, or mouse cell. In some embodiments, the cell is eukaryotic and is selected from the following cells: CHO (e.g., CHO K1, DXB-11 CHO, Veggie-CHO), COS (e.g., COS-7), retinal cell, Vero, CV1, kidney (e.g., HEK293, 293 EBNA, MSR 293, MDCK, HaK, BHK), HeLa, HepG2, WI38, MRC 5, Colo205, HB 8065, HL-60, (e.g., BHK21), Jurkat, Daudi, A431 (epidermal), CV-1, U937, 3T3, L cell, C127 cell, SP2/0, NS-0, MMT 060562, Sertoli cell, BRL 3A cell, HT1080 cell, myeloma cell, tumor cell, and a cell line derived from an aforementioned cell. 
In some embodiments, the cell comprises one or more viral genes, e.g., a retinal cell that expresses a viral gene (e.g., a PER.C6® cell). In some embodiments, a host cell is or comprises an isolated cell. In some embodiments, a host cell is part of a tissue. In some embodiments, a host cell is part of an organism.

[0080] "Identity", in connection with a comparison of sequences, includes identity as determined by a number of different algorithms known in the art that can be used to measure nucleotide and/or amino acid sequence identity. In some embodiments, identities as described herein are determined using a ClustalW v. 1.83 (slow) alignment employing an open gap penalty of 10.0, an extend gap penalty of 0.1, and using a Gonnet similarity matrix (MACVECTOR™ 10.0.2, MacVector Inc., 2008).

[0081] "Improve", "increase", "eliminate", or "reduce" includes indicated values that are relative to a baseline measurement, such as a measurement in the same individual (or animal) prior to initiation of a treatment described herein, or a measurement in a control individual (or animal) or multiple control individuals (or animals) in the absence of the treatment described herein.

[0082] "Isolated" includes a substance and/or entity that has been (1) separated from at least some of the components with which it was associated when initially produced (whether in nature and/or in an experimental setting), and/or (2) designed, produced, prepared, and/or manufactured by the hand of man. Isolated substances and/or entities may be separated from about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more than about 99% of the other components with which they were initially associated. In some embodiments, isolated agents are about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more than about 99% pure. In some embodiments, a substance is "pure" if it is substantially free of other components. In some embodiments, as will be understood by those skilled in the art, a substance may still be considered "isolated" or even "pure", after having been combined with certain other components such as, for example, one or more carriers or excipients (e.g., buffer, solvent, water, etc.); in such embodiments, percent isolation or purity of the substance is calculated without including such carriers or excipients. To give but one example, in some embodiments, a biological polymer such as a polypeptide or polynucleotide that occurs in nature is considered to be "isolated" when: a) by virtue of its origin or source of derivation is not associated with some or all of the components that accompany it in its native state in nature; b) it is substantially free of other polypeptides or nucleic acids of the same species from the species that produces it in nature; or c) is expressed by or is otherwise in association with components from a cell or other expression system that is not of the species that produces it in nature. 
Thus, for instance, in some embodiments, a polypeptide that is chemically synthesized or is synthesized in a cellular system different from that which produces it in nature is considered to be an "isolated" polypeptide. Alternatively or additionally, in some embodiments, a polypeptide that has been subjected to one or more purification techniques may be considered to be an "isolated" polypeptide to the extent that it has been separated from other components: a) with which it is associated in nature; and/or b) with which it was associated when initially produced.
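
By way of non-limiting illustration, the purity calculation described above, in which carriers or excipients are left out of the calculation, may be sketched as follows. The function name, parameter names, and example masses are hypothetical and are not part of the disclosure; this is a minimal sketch under the assumption that all amounts are expressed as masses in the same unit.

```python
def percent_purity(substance_mass, other_component_masses, carrier_masses=()):
    """Return the purity (%) of a substance, ignoring carriers/excipients.

    carrier_masses (e.g., buffer, solvent, water) is accepted only to make
    explicit that these masses are deliberately left out of the calculation.
    """
    contaminants = sum(other_component_masses)
    total = substance_mass + contaminants  # carriers intentionally excluded
    if total <= 0:
        raise ValueError("total mass must be positive")
    return 100.0 * substance_mass / total


# Hypothetical preparation: 9 units of polypeptide, 1 unit of other
# components, dissolved in 90 units of buffer.
purity = percent_purity(9.0, [1.0], carrier_masses=[90.0])
```

In this hypothetical preparation the buffer does not lower the reported purity, consistent with the statement that percent isolation or purity is calculated without including such carriers or excipients.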

[0083] "Locus" or "Loci" includes a specific location(s) of a gene (or significant sequence), DNA sequence, polypeptide-encoding sequence, or position on a chromosome of the genome of an organism. For example, a "C9ORF72 locus" may refer to the specific location of a C9ORF72 gene, C9ORF72 DNA sequence, C9ORF72-encoding sequence, or C9ORF72 position on a chromosome of the genome of an organism that has been identified as to where such a sequence resides. A C9ORF72 locus may comprise a regulatory element of a C9ORF72 gene, including, but not limited to, an enhancer, a promoter, 5' and/or 3' UTR, or a combination thereof. Those of ordinary skill in the art will appreciate that chromosomes may, in some embodiments, contain hundreds or even thousands of genes and demonstrate physical co-localization of similar genetic loci when comparing between different species. Such genetic loci can be described as having shared synteny.

[0084] "Non-human animal" includes any vertebrate organism that is not a human. In some embodiments, a non-human animal is a cyclostome, a bony fish, a cartilaginous fish (e.g., a shark or a ray), an amphibian, a reptile, a mammal, or a bird. In some embodiments, a non-human mammal is a primate, a goat, a sheep, a pig, a dog, a cow, or a rodent. In some embodiments, a non-human animal is a rodent such as a rat or a mouse.

[0085] "Nucleic acid" includes any compound and/or substance that is or can be incorporated into an oligonucleotide chain. In some embodiments, a "nucleic acid" is a compound and/or substance that is or can be incorporated into an oligonucleotide chain via a phosphodiester linkage. As will be clear from context, in some embodiments, "nucleic acid" refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides); in some embodiments, "nucleic acid" refers to an oligonucleotide chain comprising individual nucleic acid residues. In some embodiments, a "nucleic acid" is or comprises RNA; in some embodiments, a "nucleic acid" is or comprises DNA. In some embodiments, a "nucleic acid" is, comprises, or consists of one or more natural nucleic acid residues. In some embodiments, a "nucleic acid" is, comprises, or consists of one or more nucleic acid analogs. In some embodiments, a nucleic acid analog differs from a "nucleic acid" in that it does not utilize a phosphodiester backbone. For example, in some embodiments, a "nucleic acid" is, comprises, or consists of one or more "peptide nucleic acids", which are known in the art, have peptide bonds instead of phosphodiester bonds in the backbone, and are considered within the scope of the present invention. Alternatively or additionally, in some embodiments, a "nucleic acid" has one or more phosphorothioate and/or 5'-N-phosphoramidite linkages rather than phosphodiester bonds. In some embodiments, a "nucleic acid" is, comprises, or consists of one or more natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine). 
In some embodiments, a "nucleic acid" is, comprises, or consists of one or more nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C-5 propynyl-cytidine, C-5 propynyl-uridine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 2-thiocytidine, methylated bases, intercalated bases, and combinations thereof). In some embodiments, a "nucleic acid" comprises one or more modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose) as compared with those in natural nucleic acids. In some embodiments, a "nucleic acid" has a nucleotide sequence that encodes a functional gene product such as an RNA or protein. In some embodiments, a "nucleic acid" includes one or more introns. In some embodiments, a "nucleic acid" includes one or more exons. In some embodiments, a "nucleic acid" is prepared by one or more of isolation from a natural source, enzymatic synthesis by polymerization based on a complementary template (in vivo or in vitro), reproduction in a recombinant cell or system, and chemical synthesis. In some embodiments, a "nucleic acid" is at least 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000 or more residues long. In some embodiments, a "nucleic acid" is single stranded; in some embodiments, a "nucleic acid" is double stranded. In some embodiments, a "nucleic acid" has a nucleotide sequence comprising at least one element that encodes, or is the complement of a sequence that encodes, a polypeptide. 
In some embodiments, a "nucleic acid" has enzymatic activity.

[0086] "Operably linked" includes a juxtaposition wherein the components described are in a relationship permitting them to function in their intended manner. A control sequence "operably linked" to a coding sequence is ligated in such a way that expression of the coding sequence is achieved under conditions compatible with the control sequences. "Operably linked" sequences include both expression control sequences that are contiguous with the gene of interest and expression control sequences that act in trans or at a distance to control the gene of interest. The term "expression control sequence" includes polynucleotide sequences that are necessary to effect the expression and processing of coding sequences to which they are ligated. "Expression control sequences" include: appropriate transcription initiation, termination, promoter and enhancer sequences; efficient RNA processing signals such as splicing and polyadenylation signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (e.g., a Kozak consensus sequence); sequences that enhance protein stability; and when desired, sequences that enhance protein secretion. The nature of such control sequences differs depending upon the host organism. For example, in prokaryotes, such control sequences generally include a promoter, a ribosomal binding site and a transcription termination sequence, while in eukaryotes such control sequences typically include promoters and transcription termination sequences. The term "control sequences" is intended to include components whose presence is essential for expression and processing, and can also include additional components whose presence is advantageous, for example, leader sequences and fusion partner sequences.

[0087] "Phenotype" includes a trait, or a class or set of traits, displayed by a cell or organism. In some embodiments, a particular phenotype may correlate with a particular allele or genotype. In some embodiments, a phenotype may be discrete; in some embodiments, a phenotype may be continuous.

[0088] "Physiological conditions" includes its art-understood meaning referencing conditions under which cells or organisms live and/or reproduce. In some embodiments, the term includes conditions of the external or internal milieu that may occur in nature for an organism or cell system. In some embodiments, physiological conditions are those conditions present within the body of a human or non-human animal, especially those conditions present at and/or within a surgical site. Physiological conditions typically include, e.g., a temperature range of 20-40° C., atmospheric pressure of 1, pH of 6-8, glucose concentration of 1-20 mM, oxygen concentration at atmospheric levels, and gravity as it is encountered on earth. In some embodiments, conditions in a laboratory are manipulated and/or maintained at physiologic conditions. In some embodiments, physiological conditions are encountered in an organism.

[0089] "Polypeptide" includes any polymeric chain of amino acids. In some embodiments, a polypeptide has an amino acid sequence that occurs in nature. In some embodiments, a polypeptide has an amino acid sequence that does not occur in nature. In some embodiments, a polypeptide has an amino acid sequence that contains portions that occur in nature separately from one another (i.e., from two or more different organisms, for example, human and non-human portions). In some embodiments, a polypeptide has an amino acid sequence that is engineered in that it is designed and/or produced through action of the hand of man.

[0090] "Prevent" or "prevention" in connection with the occurrence of a disease, disorder, and/or condition, includes reducing the risk of developing the disease, disorder and/or condition and/or delaying onset of one or more characteristics or symptoms of the disease, disorder or condition. Prevention may be considered complete when onset of a disease, disorder or condition has been delayed for a predefined period of time.

[0091] "Reference" includes a standard or control agent, animal, cohort, individual, population, sample, sequence or value against which an agent, animal, cohort, individual, population, sample, sequence or value of interest is compared. In some embodiments, a reference agent, animal, cohort, individual, population, sample, sequence or value is tested and/or determined substantially simultaneously with the testing or determination of the agent, animal, cohort, individual, population, sample, sequence or value of interest. In some embodiments, a reference agent, animal, cohort, individual, population, sample, sequence or value is a historical reference, optionally embodied in a tangible medium. In some embodiments, a reference may refer to a control. A "reference" also includes a "reference animal". A "reference animal" may have a modification as described herein, a modification that is different as described herein or no modification (i.e., a wild type animal). Typically, as would be understood by those skilled in the art, a reference agent, animal, cohort, individual, population, sample, sequence or value is determined or characterized under conditions comparable to those utilized to determine or characterize the agent, animal (e.g., a mammal), cohort, individual, population, sample, sequence or value of interest.

[0092] "Response" includes any beneficial alteration in a subject's condition that occurs as a result of or correlates with treatment. Such alteration may include stabilization of the condition (e.g., prevention of deterioration that would have taken place in the absence of the treatment), amelioration of symptoms of the condition, and/or improvement in the prospects for cure of the condition, etc. It may refer to a subject's response or to a neuron's response. Neuron or subject response may be measured according to a wide variety of criteria, including clinical criteria and objective criteria. Examination of the motor system of a subject may include examination of one or more of strength, tendon reflexes, superficial reflexes, muscle bulk, coordination, muscle tone, abnormal movements, station and gait. Techniques for assessing response include, but are not limited to, clinical examination, stretch flex (myotatic reflex), Hoffmann's reflex, and/or pressure tests. Methods and guidelines for assessing response to treatment are discussed in Brodal, A.: Neurological Anatomy in Relation to Clinical Medicine, ed. 2, New York, Oxford University Press, 1969; Medical Council of the U.K.: Aids to the Examination of the Peripheral Nervous System, Palo Alto, Calif., Pendragon House, 1978; Monrad-Krohn, G. H., Refsum, S.: The Clinical Examination of the Nervous System, ed. 12, London, H.K. Lewis & Co., 1964; and Wolf, J. K.: Segmental Neurology, A Guide to the Examination and Interpretation of Sensory and Motor Function, Baltimore, University Park Press, 1981. The exact response criteria can be selected in any appropriate manner, provided that when comparing groups of neurons and/or patients, the groups to be compared are assessed based on the same or comparable criteria for determining response rate. One of ordinary skill in the art will be able to select appropriate criteria.

[0093] "Risk", as will be understood from context, of a disease, disorder, and/or condition comprises likelihood that a particular individual will develop a disease, disorder, and/or condition (e.g., a radiation injury). In some embodiments, risk is expressed as a percentage. In some embodiments, risk is from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 and up to 100%. In some embodiments, risk is expressed as a risk relative to a risk associated with a reference sample or group of reference samples. In some embodiments, a reference sample or group of reference samples have a known risk of a disease, disorder, condition and/or event (e.g., a radiation injury). In some embodiments a reference sample or group of reference samples are from individuals comparable to a particular individual. In some embodiments, relative risk is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more.
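
By way of non-limiting illustration, risk expressed relative to a group of reference samples may be sketched as follows. The function name is hypothetical, and the use of the arithmetic mean of the reference group as the comparator is an assumption made for this sketch only; the text above does not prescribe how a group of reference samples is summarized.

```python
def relative_risk(individual_risk_pct, reference_risks_pct):
    """Return the individual's risk divided by the mean reference risk.

    Both arguments are percentages in the range 0-100. Averaging the
    reference group is an illustrative assumption, not a requirement.
    """
    if not reference_risks_pct:
        raise ValueError("at least one reference risk is required")
    reference = sum(reference_risks_pct) / len(reference_risks_pct)
    if reference == 0:
        raise ValueError("reference risk must be non-zero")
    return individual_risk_pct / reference


# Hypothetical values: an individual with a 20% risk compared against
# reference individuals with risks of 8% and 12%.
rr = relative_risk(20, [8, 12])
```

A value above 1 indicates a risk higher than that of the reference group, and a value below 1 indicates a lower risk.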

[0094] "Substantially" includes the qualitative condition of exhibiting total or near-total extent or degree of a characteristic or property of interest. One of ordinary skill in the biological arts will understand that biological and chemical phenomena rarely, if ever, go to completion and/or proceed to completeness or achieve or avoid an absolute result. The term "substantially" is therefore used herein to capture the potential lack of completeness inherent in many biological and chemical phenomena.

[0095] "Substantial homology" includes a comparison between amino acid or nucleic acid sequences. As will be appreciated by those of ordinary skill in the art, two sequences are generally considered to be "substantially homologous" if they contain homologous residues in corresponding positions. Homologous residues may be identical residues. Alternatively, homologous residues may be non-identical residues with appropriately similar structural and/or functional characteristics. For example, as is well known by those of ordinary skill in the art, certain amino acids are typically classified as "hydrophobic" or "hydrophilic" amino acids, and/or as having "polar" or "non polar" side chains. Substitution of one amino acid for another of the same type may often be considered a "homologous" substitution. Typical amino acid categorizations are summarized below.

TABLE-US-00002
Amino acid      3-Letter  1-Letter  Side-chain polarity  Side-chain charge  Hydropathy index
Alanine         Ala       A         Nonpolar             Neutral             1.8
Arginine        Arg       R         Polar                Positive           -4.5
Asparagine      Asn       N         Polar                Neutral            -3.5
Aspartic acid   Asp       D         Polar                Negative           -3.5
Cysteine        Cys       C         Nonpolar             Neutral             2.5
Glutamic acid   Glu       E         Polar                Negative           -3.5
Glutamine       Gln       Q         Polar                Neutral            -3.5
Glycine         Gly       G         Nonpolar             Neutral            -0.4
Histidine       His       H         Polar                Positive           -3.2
Isoleucine      Ile       I         Nonpolar             Neutral             4.5
Leucine         Leu       L         Nonpolar             Neutral             3.8
Lysine          Lys       K         Polar                Positive           -3.9
Methionine      Met       M         Nonpolar             Neutral             1.9
Phenylalanine   Phe       F         Nonpolar             Neutral             2.8
Proline         Pro       P         Nonpolar             Neutral            -1.6
Serine          Ser       S         Polar                Neutral            -0.8
Threonine       Thr       T         Polar                Neutral            -0.7
Tryptophan      Trp       W         Nonpolar             Neutral            -0.9
Tyrosine        Tyr       Y         Polar                Neutral            -1.3
Valine          Val       V         Nonpolar             Neutral             4.2

Ambiguous Amino Acids               3-Letter  1-Letter
Asparagine or aspartic acid         Asx       B
Glutamine or glutamic acid          Glx       Z
Leucine or Isoleucine               Xle       J
Unspecified or unknown amino acid   Xaa       X

[0096] As is well known in this art, amino acid or nucleic acid sequences may be compared using any of a variety of algorithms, including those available in commercial computer programs such as BLASTN for nucleotide sequences and BLASTP, gapped BLAST, and PSI-BLAST for amino acid sequences. Exemplary such programs are described in Altschul, S. F. et al., 1990, J. Mol. Biol., 215(3): 403-410; Altschul, S. F. et al., 1997, Methods in Enzymology; Altschul, S. F. et al., 1997, Nucleic Acids Res., 25:3389-3402; Baxevanis, A. D., and B. F. F. Ouellette (eds.) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Wiley, 1998; and Misener et al. (eds.) Bioinformatics Methods and Protocols (Methods in Molecular Biology, Vol. 132), Humana Press, 1998. In addition to identifying homologous sequences, the programs mentioned above typically provide an indication of the degree of homology. In some embodiments, two sequences are considered to be substantially homologous if at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more of their corresponding residues are homologous over a relevant stretch of residues. In some embodiments, the relevant stretch is a complete sequence. In some embodiments, the relevant stretch is at least 9, 10, 11, 12, 13, 14, 15, 16, 17 or more residues. In some embodiments, the relevant stretch includes contiguous residues along a complete sequence. In some embodiments, the relevant stretch includes discontinuous residues along a complete sequence, for example, noncontiguous residues brought together by the folded conformation of a polypeptide or a portion thereof. In some embodiments, the relevant stretch is at least 10, 15, 20, 25, 30, 35, 40, 45, 50, or more residues.
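
By way of non-limiting illustration, the degree of homology between two polypeptide sequences may be sketched as follows. The sketch assumes the two sequences have already been aligned to equal length (e.g., by one of the programs mentioned above), and it assumes that two residues are "homologous" when they fall in the same polarity and charge category of the amino acid table above. That grouping rule, the function names, and the example sequences are hypothetical choices for this sketch, not a method prescribed by the disclosure.

```python
# Categories taken from the amino acid table (polarity, charge).
GROUPS = (
    set("ACGILMFPWV"),  # nonpolar, neutral
    set("NQSTY"),       # polar, neutral
    set("RHK"),         # polar, positive
    set("DE"),          # polar, negative
)


def residue_group(residue):
    """Return the index of the residue's category, or None if unclassified."""
    for index, members in enumerate(GROUPS):
        if residue in members:
            return index
    return None  # gaps and ambiguous codes (B, Z, J, X) are never homologous


def percent_homology(seq_a, seq_b):
    """Percent of aligned positions whose residues share a category."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and pre-aligned")
    homologous = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        group_a = residue_group(a)
        if group_a is not None and group_a == residue_group(b):
            homologous += 1
    return 100.0 * homologous / len(seq_a)


# Hypothetical four-residue stretches.
full = percent_homology("ALKD", "VIRE")     # every position shares a category
partial = percent_homology("ALKD", "ALKS")  # D (negative) vs S (neutral) differ
```

Under these assumptions a pair of sequences would be regarded as substantially homologous when the returned percentage meets whichever of the thresholds recited above is selected.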

[0097] "Substantial identity" includes a comparison between amino acid or nucleic acid sequences. As will be appreciated by those of ordinary skill in the art, two sequences are generally considered to be "substantially identical" if they contain identical residues in corresponding positions. As is well known in this art, amino acid or nucleic acid sequences may be compared using any of a variety of algorithms, including those available in commercial computer programs such as BLASTN for nucleotide sequences and BLASTP, gapped BLAST, and PSI-BLAST for amino acid sequences. Exemplary such programs are described in Altschul, S. F. et al., 1990, J. Mol. Biol., 215(3): 403-410; Altschul, S. F. et al., 1997, Methods in Enzymology; Altschul, S. F. et al., 1997, Nucleic Acids Res., 25:3389-3402; Baxevanis, A. D., and B. F. F. Ouellette (eds.) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Wiley, 1998; and Misener et al. (eds.) Bioinformatics Methods and Protocols (Methods in Molecular Biology, Vol. 132), Humana Press, 1998. In addition to identifying identical sequences, the programs mentioned above typically provide an indication of the degree of identity. In some embodiments, two sequences are considered to be substantially identical if at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more of their corresponding residues are identical over a relevant stretch of residues. In some embodiments, the relevant stretch is a complete sequence. In some embodiments, the relevant stretch is at least 10, 15, 20, 25, 30, 35, 40, 45, 50, or more residues.
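
By way of non-limiting illustration, percent identity over a relevant stretch of two sequences may be sketched as follows. The sketch assumes the sequences are already aligned to equal length; the function names, the default threshold of 90%, and the default minimum stretch of 10 residues are illustrative selections from the values recited above and are not fixed by the disclosure.

```python
def percent_identity(seq_a, seq_b, start=0, end=None):
    """Percent of identical residues over seq[start:end] of aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    stretch_a = seq_a[start:end].upper()
    stretch_b = seq_b[start:end].upper()
    if not stretch_a:
        raise ValueError("relevant stretch is empty")
    # A gap character paired with a gap character is not counted as identity.
    identical = sum(
        1 for a, b in zip(stretch_a, stretch_b) if a == b and a != "-"
    )
    return 100.0 * identical / len(stretch_a)


def is_substantially_identical(seq_a, seq_b, threshold=90.0, min_stretch=10,
                               start=0, end=None):
    """Apply an illustrative threshold over a sufficiently long stretch."""
    if len(seq_a[start:end]) < min_stretch:
        return False
    return percent_identity(seq_a, seq_b, start, end) >= threshold


# Hypothetical 12-residue nucleotide stretches differing at one position.
a = "GGGGCCGGGGCC"
b = "GGGGCCGGGGCA"
identity = percent_identity(a, b)  # 11 of 12 positions identical
```

With these hypothetical inputs, `is_substantially_identical(a, b)` is true at the 90% threshold and false if the threshold is raised to 95%.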

[0098] "Targeting vector," "targeting construct" or "nucleic acid construct" includes a polynucleotide molecule that comprises a targeting region. A targeting region comprises a sequence that is identical or substantially identical to a sequence in a target cell, tissue or animal and provides for integration of the targeting construct into a position within the genome of the cell, tissue or animal via homologous recombination. Targeting regions that target using site-specific recombinase recognition sites (e.g., loxP or Frt sites) are also included. In some embodiments, a targeting construct as described herein further comprises a nucleic acid sequence or gene (e.g., a reporter gene or homologous or heterologous gene) of particular interest, a selectable marker, control and/or regulatory sequences, and other nucleic acid sequences that encode a recombinase or recombinogenic protein. In some embodiments, a targeting construct may comprise a gene of interest in whole or in part, wherein the gene of interest encodes a polypeptide, in whole or in part, that has a similar function as a protein encoded by an endogenous sequence. In some embodiments, a targeting construct may comprise a humanized gene of interest, in whole or in part, wherein the humanized gene of interest encodes a polypeptide, in whole or in part, that has a similar function as a polypeptide encoded by an endogenous sequence. In some embodiments, a targeting construct may comprise a reporter gene, in whole or in part, wherein the reporter gene encodes a polypeptide that is easily identified and/or measured using techniques known in the art.

[0099] "Transgenic animal", "transgenic non-human animal" or "Tg+" includes any non-naturally occurring non-human animal in which one or more of the cells of the non-human animal contain heterologous nucleic acid and/or gene encoding a polypeptide of interest, in whole or in part. In some embodiments, a heterologous nucleic acid and/or gene is introduced into the cell, directly or indirectly by introduction into a precursor cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus. The term genetic manipulation does not include classic breeding techniques, but rather is directed to introduction of recombinant DNA molecule(s). This molecule may be integrated within a chromosome, or it may be extrachromosomally replicating DNA. The term "Tg+" includes animals that are heterozygous or homozygous for a heterologous nucleic acid and/or gene, and/or animals that have single or multi-copies of a heterologous nucleic acid and/or gene.

[0100] "Treatment", "Treat" or "Treating" includes any administration of a substance (e.g., a therapeutic candidate) that partially or completely alleviates, ameliorates, relieves, inhibits, delays onset of, reduces severity of, and/or reduces incidence of one or more symptoms, features, and/or causes of a particular disease, disorder, and/or condition. In some embodiments, such treatment may be administered to a subject who does not exhibit signs of the relevant disease, disorder and/or condition and/or to a subject who exhibits only early signs of the disease, disorder, and/or condition. Alternatively or additionally, in some embodiments, treatment may be administered to a subject who exhibits one or more established signs of the relevant disease, disorder and/or condition. In some embodiments, treatment may be of a subject who has been diagnosed as suffering from the relevant disease, disorder, and/or condition. In some embodiments, treatment may be of a subject known to have one or more susceptibility factors that are statistically correlated with increased risk of development of the relevant disease, disorder, and/or condition.

[0101] "Variant" includes an entity that shows significant structural identity with a reference entity, but differs structurally from the reference entity in the presence or level of one or more chemical moieties as compared with the reference entity. In many embodiments, a "variant" also differs functionally from its reference entity. In general, whether a particular entity is properly considered to be a "variant" of a reference entity is based on its degree of structural identity with the reference entity. As will be appreciated by those skilled in the art, any biological or chemical reference entity has certain characteristic structural elements. A "variant", by definition, is a distinct chemical entity that shares one or more such characteristic structural elements. To give but a few examples, a small molecule may have a characteristic core structural element (e.g., a macrocycle core) and/or one or more characteristic pendent moieties so that a variant of the small molecule is one that shares the core structural element and the characteristic pendent moieties but differs in other pendent moieties and/or in types of bonds present (single vs. double, E vs. Z, etc.) within the core; a polypeptide may have a characteristic sequence element comprised of a plurality of amino acids having designated positions relative to one another in linear or three-dimensional space and/or contributing to a particular biological function; and a nucleic acid may have a characteristic sequence element comprised of a plurality of nucleotide residues having designated positions relative to one another in linear or three-dimensional space. For example, a "variant polypeptide" may differ from a reference polypeptide as a result of one or more differences in amino acid sequence and/or one or more differences in chemical moieties (e.g., carbohydrates, lipids, etc.) covalently attached to the polypeptide backbone. 
In some embodiments, a "variant polypeptide" shows an overall sequence identity with a reference polypeptide that is at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 99%. Alternatively or additionally, in some embodiments, a "variant polypeptide" does not share at least one characteristic sequence element with a reference polypeptide. In some embodiments, the reference polypeptide has one or more biological activities. In some embodiments, a "variant polypeptide" shares one or more of the biological activities of the reference polypeptide. In some embodiments, a "variant polypeptide" lacks one or more of the biological activities of the reference polypeptide. In some embodiments, a "variant polypeptide" shows a reduced level of one or more biological activities as compared with the reference polypeptide. In many embodiments, a polypeptide of interest is considered to be a "variant" of a parent or reference polypeptide if the polypeptide of interest has an amino acid sequence that is identical to that of the parent but for a small number of sequence alterations at particular positions. Typically, fewer than 20%, 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, or 2% of the residues in the variant are substituted as compared with the parent. In some embodiments, a "variant" has 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 substituted residue(s) as compared with a parent. Often, a "variant" has a very small number (e.g., fewer than 5, 4, 3, 2, or 1) of substituted functional residues (i.e., residues that participate in a particular biological activity). Furthermore, a "variant" typically has not more than 5, 4, 3, 2, or 1 additions or deletions, and often has no additions or deletions, as compared with the parent. 
Moreover, any additions or deletions are typically fewer than about 25, about 20, about 19, about 18, about 17, about 16, about 15, about 14, about 13, about 10, about 9, about 8, about 7, about 6, and commonly are fewer than about 5, about 4, about 3, or about 2 residues. In some embodiments, the parent or reference polypeptide is one found in nature. As will be understood by those of ordinary skill in the art, a plurality of variants of a particular polypeptide of interest may commonly be found in nature, particularly when the polypeptide of interest is an infectious agent polypeptide. In some embodiments, a non-human animal will comprise a variant of a nucleic acid construct used for targeted insertion of a heterologous hexanucleotide expansion sequence. As non-limiting examples, such nucleic acid constructs may comprise a 5' first heterologous hexanucleotide flanking sequence, n repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, a 3' second heterologous hexanucleotide flanking sequence, and optionally, a drug resistance reporter gene preferably flanked by recombinase recognition sequences. As shown in Example 1, an animal resulting from the targeted insertion may comprise in an endogenous locus a variant of the nucleic acid construct, e.g., wherein the variant comprises less than n repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 and/or lacks the drug resistance gene, see, e.g., FIGS. 1B and 1C. Accordingly, a variant of a sequence included herein includes sequences essentially identical to the reference parent sequence, but lacking one or more repeats and/or drug resistance gene(s).

[0102] "Vector" includes a nucleic acid molecule capable of transporting another nucleic acid with which it is associated. In some embodiments, vectors are capable of extra-chromosomal replication and/or expression of nucleic acids to which they are linked in a host cell such as a eukaryotic and/or prokaryotic cell. Vectors capable of directing the expression of operably linked genes are referred to herein as "expression vectors."

[0103] "Wild type" includes an entity having a structure and/or activity as found in nature in a "normal" (as contrasted with mutant, diseased, altered, etc.) state or context. Those of ordinary skill in the art will appreciate that wild type genes and polypeptides often exist in multiple different forms (e.g., alleles).

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

[0104] Non-human animals are provided having an insertion of a heterologous hexanucleotide repeat expansion sequence in an endogenous C9ORF72 locus. In some embodiments, non-human animals described herein are heterozygous for a modified C9ORF72 locus as described herein. In some embodiments, non-human animals as described herein comprise a first modified C9orf72 locus and a second modified C9orf72 locus, wherein the first and second loci are different. In some embodiments, non-human animals described herein are homozygous for a modified C9ORF72 locus as described herein. In some embodiments, non-human animals described herein develop ALS- and/or FTD-like disease due to the presence of the heterologous hexanucleotide repeat expansion sequence.

[0105] Various aspects of the invention are described in detail in the following sections. The use of sections is not meant to limit the invention. Each section can apply to any aspect of the invention. In this application, the use of "or" means "and/or" unless stated otherwise.

C9ORF72

[0106] Amyotrophic lateral sclerosis (ALS), also referred to as Lou Gehrig's disease, is the most frequent adult-onset paralytic disorder, characterized by the loss of upper and/or lower motor neurons. ALS occurs in as many as 20,000 individuals across the United States with about 5,000 new cases occurring each year. Frontotemporal dementia (FTD), originally referred to as Pick's disease after physician Arnold Pick, is a group of disorders caused by progressive cell degeneration in the frontal or temporal lobes of the brain. FTD is reported to account for 10-15% of all dementia cases. A hexanucleotide repeat expansion sequence between (and optionally spanning) exons 1a and 1b, two non-coding exons, of the human C9ORF72 gene has been linked to both ALS and FTD (DeJesus-Hernandez, M. et al., 2011, Neuron 72:245-256; Renton, A. E. et al., 2011, Neuron 72:257-268; Majounie, E. et al., 2012, Lancet Neurol. 11:323-330; Waite, A. J. et al., 2014, Neurobiol. Aging 35:1779.e5-1779.e13). It is estimated that the GGGGCC (SEQ ID NO:1) hexanucleotide repeat expansion accounts for about 50% of familial and many non-familial ALS cases. It is present in about 25% of familial FTD cases and about 8% of sporadic FTD cases.

[0107] Many pathological aspects related to the hexanucleotide repeat expansion sequence in C9ORF72 have been reported such as, for example, repeat length-dependent formation of RNA foci, sequestration of specific RNA-binding proteins, and accumulation and aggregation of dipeptide repeat proteins (e.g., reviewed in Stepto, A. et al., 2014, Acta Neuropathol. 127:377-389; see also Almeida, S. et al., 2013, Acta Neuropathol. 126:385-399; Bieniek, K. F. et al., 2014, JAMA Neurol. 71(6): 775-781; van Blitterswijk, M. et al., 2014, Mol. Neurodegen. 9:38, 10 pages). Knock-in mice that have been generated to contain a heterologous hexanucleotide repeat expansion sequence comprising 66 repeats of the hexanucleotide sequence (GGGGCC; SEQ ID NO:1) exhibit RNA foci and dipeptide protein aggregates in their neurons. These mice showed cortical neuron loss and exhibited behavior and motor deficits at 6 months of age (Chew, J. et al., 2015, Science May 14. Pii:aaa9344). However, the mechanism through which such repeat expansions cause disease, whether through loss of function or gain-of-function toxicity, remains unclear. Additionally, the contribution of a lower number of repeats in the hexanucleotide repeat expansion sequence to ALS/FTD is also unknown.
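
By way of non-limiting illustration, the number of contiguous repeats of the hexanucleotide sequence (GGGGCC; SEQ ID NO:1) in a nucleotide sequence may be counted as sketched below. The function name and the example flanking sequences are hypothetical; the sketch assumes a plain DNA string read on the strand carrying the GGGGCC unit and does not account for interrupted repeats or sequencing errors.

```python
import re

HEXANUCLEOTIDE = "GGGGCC"  # SEQ ID NO:1


def longest_repeat_run(sequence, unit=HEXANUCLEOTIDE):
    """Return the number of units in the longest uninterrupted tandem run."""
    pattern = re.compile("(?:%s)+" % re.escape(unit))
    run_lengths = [
        match.end() - match.start()
        for match in pattern.finditer(sequence.upper())
    ]
    # Each run length is a whole multiple of the unit length.
    return max(run_lengths, default=0) // len(unit)


# Hypothetical sequence: 66 repeats between arbitrary flanking bases.
example = "AT" + HEXANUCLEOTIDE * 66 + "TTA"
repeats = longest_repeat_run(example)
```

A count obtained in this way could be compared against the repeat number designed into a construct, for instance when checking whether fewer repeats are present than intended.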

[0108] Although C9ORF72 has been reported to regulate endosomal trafficking (Farg, M. A. et al., 2014, Human Mol. Gen. 23(13):3579-3595), much of the cellular function of C9ORF72 remains unknown. Indeed, C9ORF72 is a gene that encodes an uncharacterized protein with unknown function. Despite the lack of understanding surrounding C9ORF72, several animal models, including engineered cell lines, for ALS and/or FTD have been developed (Roberson, E. D., 2012, Ann. Neurol. 72(6):837-849; Panda, S. K. et al., 2013, Genetics 195:703-715; Suzuki, N. et al., 2013, Nature Neurosci. 16(12):1725-1728; Xu, Z. et al., 2013, Proc. Nat. Acad. Sci. U.S.A. 110(19):7778-7783; Hukema, R. K. et al., 2014, Acta Neuropathol. Comm. 2:166, 4 pages). Another report using a transgenic mouse strain containing a heterologous hexanucleotide repeat expansion sequence comprising 80 GGGGCC repeats operably linked with a fluorescent reporter and controlled by a tetracycline responsive element without any surrounding C9orf72 sequences demonstrated neuronal cytoplasmic inclusions similar to those seen in ALS-FTD patients, which suggests that expanded repeats of the hexanucleotide GGGGCC sequence itself may be responsible for disease (Hukema, R. K. et al., 2014, Acta Neuropathol. Comm. 2:166, 4 pages). These mice have been useful to establish an initial C9orf72 expression profile in cells of the CNS and provide some understanding of the mechanism of action associated with the repeat expansion; however, construct design can influence the phenotype of the resulting transgenic animal (see, e.g., Muller, U., 1999, Mech. Develop. 81:3-21). For example, a transgenic mouse strain containing an inducible GGGGCC repeat (Hukema, 2014, supra) was designed without human flanking sequence presumably due to the fact that such surrounding sequence was thought to affect translation of repeat sequences. Thus, such in vivo systems exploiting C9ORF72-mediated biology for therapeutic applications are incomplete.

C9ORF72 and Hexanucleotide Repeat Expansion Sequences

[0109] Mouse C9ORF72 transcript variants have been reported in the art (e.g., Koppers et al., Ann Neurol (2015); 78: 426-438; Atkinson et al., Acta Neuropathologica Communications (2015) 3: 59), and are also depicted in FIG. 1A. The genomic information for the three reported mouse C9ORF72 transcript variants is also available at the Ensembl web site under designations of ENSMUST00000108127 (V1), ENSMUST00000108126 (V2), and ENSMUST00000084724 (V3). Exemplary non-human (e.g., rodent) C9ORF72 mRNA and amino acid sequences are set forth in Table 2. For mRNA sequences, bold font contained within parentheses indicates coding sequence and consecutive exons, where indicated, are separated by alternating lower and upper case letters. For amino acid sequences, mature polypeptide sequences, where indicated, are in bold font.

[0110] Human C9ORF72 transcript variants are known in the art. One human C9ORF72 transcript variant lacks multiple exons in the central and 3' coding regions, and its 3' terminal exon extends beyond a splice site that is used in variant 3 (see below), which results in a novel 3' untranslated region (UTR) as compared to variant 3. This variant encodes a significantly shorter polypeptide and its C-terminal amino acid is distinct as compared to that which is encoded by two other variants. The mRNA and amino acid sequences of this variant can be found at GenBank accession numbers NM_145005.6 and NP_659442.2, respectively, and are hereby incorporated by reference. The sequences of NM_145005.6 and NP_659442.2 are respectively set forth as SEQ ID NO:10 and SEQ ID NO:11. A second human C9ORF72 transcript variant (2) differs in the 5' untranslated region (UTR) compared to variant 3. The mRNA and amino acid sequences of this variant can be found at GenBank accession numbers NM_018325.4 and NP_060795.1, respectively, and are hereby incorporated by reference. The sequences of NM_018325.4 and NP_060795.1 are respectively set forth as SEQ ID NO:12 and SEQ ID NO:13. A third human C9ORF72 transcript variant (3) contains the longest sequence among three reported variants and encodes the longer isoform. The mRNA and amino acid sequences of this variant can be found at GenBank accession numbers NM_001256054.2 and NP_001242983.1, respectively, and are hereby incorporated by reference. The sequences of NM_001256054.2 and NP_001242983.1 are respectively set forth as SEQ ID NO:14 and SEQ ID NO:15. Variants 2 and 3 encode the same protein.

[0111] A hexanucleotide repeat expansion sequence is generally a nucleotide sequence comprising at least one instance, e.g., one repeat, of the hexanucleotide sequence GGGGCC set forth as SEQ ID NO:1. For purposes of insertion into an endogenous non-human C9orf72 locus, a heterologous hexanucleotide repeat expansion sequence comprises at least one instance (repeat) and preferably more than one instance (repeat) of the hexanucleotide sequence set forth as SEQ ID NO:1 and may be identical to, or substantially identical to a genomic nucleic acid sequence spanning (and optionally including) non-coding exons 1a and 1b of a human "chromosome 9 open reading frame 72" (C9orf72), or a portion thereof. Non-limiting examples of heterologous hexanucleotide expansion sequences include the sequences set forth as SEQ ID NO:1, SEQ ID NO:2 (comprising three repeats of the GGGGCC hexanucleotide sequence) and SEQ ID NO:3 (comprising 100 repeats of the GGGGCC hexanucleotide sequence).
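For illustration only, the notion of a repeat tract with a given number of GGGGCC instances can be sketched in Python as follows. The function names are hypothetical and do not appear in this disclosure, and the sketch builds only a pure repeat tract; it does not reproduce SEQ ID NO:2 or SEQ ID NO:3, which may include additional sequence.

```python
HEXANUCLEOTIDE = "GGGGCC"  # the hexanucleotide set forth as SEQ ID NO:1


def build_repeat_tract(n_repeats: int) -> str:
    """Return a tract of n_repeats tandem copies of GGGGCC (hypothetical helper)."""
    if n_repeats < 1:
        raise ValueError("at least one instance of the hexanucleotide is required")
    return HEXANUCLEOTIDE * n_repeats


def longest_repeat_run(sequence: str) -> int:
    """Return the largest number of consecutive GGGGCC units found in sequence."""
    seq = sequence.upper()
    unit = len(HEXANUCLEOTIDE)
    best = 0
    for start in range(len(seq)):
        count = 0
        pos = start
        # Walk forward one unit at a time while the unit keeps matching.
        while seq.startswith(HEXANUCLEOTIDE, pos):
            count += 1
            pos += unit
        best = max(best, count)
    return best
```

A tract built with `build_repeat_tract(100)` is 600 nucleotides long, and `longest_repeat_run` can be applied to a longer sequence to estimate the repeat count it carries.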

TABLE-US-00003 TABLE 2 Mus musculus C9orf72 mRNA (NM_001081343; SEQ ID NO: 16) gtgtccggggcggggcggtcccggggcggggcccggagcgg gctgcggttgcggtccctgcgccggcggtgaaggcgcagca gcggcgagtggCTATTGCAAGCGTTCGGATAATGTGAGACC TGGAATGCAGTGAGACCTGGGATGCAGGG(ATGTCGACTAT CTGCCCCCCACCATCTCCTGCTGTTGCCAAGACAGAGATTG CTTTAAGTGGTGAATCACCCTTGTTGGCGGCTACCTTTGCT TACTGGGATAATATTCTTGGTCCTAGAGTAAGGCATATTTG GGCTCCAAAGACAGACCAAGTGCTTCTCAGTGATGGAGAAA TAACTTTTCTTGCCAACCACACTCTAAATGGAGAAATTCTT CGAAATGCAGAGAGTGGGGCTATAGATGTAAAATTTTTTGT CTTATCTGAAAAAGGGGTAATTATTGTTTCATTAATCTTCG ACGGAAACTGGAATGGAGATCGGAGCACTTATGGACTATCA ATTATACTGCCGCAGACAGAGCTGAGCTTCTACCTCCCACT TCACAGAGTGTGTGTTGACAGGCTAACACACATTATTCGAA AAGGAAGAATATGGATGCATAAGgaaagacaagaaaatgtc cagaaaattgtcttggaaggcacagagaggatggaagatca gGGTCAGAGTATCATTCCCATGCTTACTGGGGAAGTCATTC CTGTAATGGAGCTGCTTGCATCTATGAAATCCCACAGTGTT CCTGAAGACATTGATatagctgatacagtgctcaatgatga tgacattggtgacagctgtcacgaaggctttcttctcaaTG CCATCAGCTCACACCTGCAGACCTGTGGCTGTTCCGTTGTA GTTGGCAGCAGTGCAGAGAAAGTAAATAAGatagtaagaac gctgtgcctttttctgacaccagcagagaggaaatgctcca ggctgtgtgaagcagaatcgtcctttaagtacgaatcggga ctctttgtgcaaggcttgctaaagGATGCAACAGGCAGTTT TGTCCTACCCTTCCGGCAAGTTATGTATGCCCCGTACCCCA CCACGCACATTGATGTGGATGTCAACACTGTCAAGCAGATG CCACCGTGTCATGAACATATTTATAATCAACGCAGATACAT GAGGTCAGAGCTGACAGCCTTCTGGAGGGCAACTTCAGAAG AGGACATGGCGCAGGACACCATCATCTACACAGATGAGAGC TTCACTCCTGATTTgaatattttccaagatgtcttacacag agacactctagtgaaagccttcctggatcagGTCTTCCATT TGAAGCCTGGCCTGTCTCTCAGGAGTACTTTCCTTGCACAG TTCCTCCTCATTCTTCACAGAAAAGCCTTGACACTAATCAA GTACATCGAGGATGATACgcagaaggggaaaaagcccttta agtctcttcggaacctgaagatagatcttgatttaacagca gagggcgatcttaacataataatggctctagctgagaaaat taagccaggcctacactctttcatctttgggagacctttct acactagtgtacaagaacgtgatgttctaatgaccttttg a)ccgtgtggtttgctgtgtctgtctcttcacagtcacacc tgctgttacagtgtctcagcagtgtgtgggcacatccttcc tcccgagtcctgctgcaggacagggtacactacacttgtca gtagaagtctgtacctgatgtcaggtgcatcgttacagtga atgactcttcctagaatagatgtactcttttagggccttat gtttacaattatcctaagtactattgctgtcttttaaagat 
atgaatgatggaatatacacttgaccataactgctgattgg ttttttgttttgttttgtttgttttcttggaaacttatgat tcctggtttacatgtaccacactgaaaccctcgttagcttt acagataaagtgtgagttgacttcctgcccctctgtgttct gtggtatgtccgattacttctgccacagctaaacattagag catttaaagtttgcagttcctcagaaaggaacttagtctga ctacagattagttcttgagagaagacactgatagggcagag ctgtaggtgaaatcagttgttagcccttcctttatagacgt agtccttcagattcggtctgtacagaaatgccgaggggtca tgcatgggccctgagtatcgtgacctgtgacaagttttttg ttggtttattgtagttctgtcaaagaaagtggcatttgttt ttataattgttgccaacttttaaggttaattttcattattt ttgagccgaattaaaatgcgcacctcctgtgcctttcccaa tcttggaaaatataatttcttggcagagggtcagatttcag ggcccagtcactttcatctgaccaccctttgcacggctgcc gtgtgcctggcttagattagaagtccttgttaagtatgtca gagtacattcgctgataagatctttgaagagcagggaagcg tcttgcctctttcctttggtttctgcctgtactctggtgtt tcccgtgtcacctgcatcataggaacagcagagaaatctga cccagtgctatttttctaggtgctactatggcaaactcaag tggtctgtttctgttcctgtaacgttcgactatctcgctag ctgtgaagtactgattagtggagttctgtgcaacagcagtg taggagtatacacaaacacaaatatgtgtttctatttaaaa ctgtggacttagcataaaaagggagaatatatttatttttt acaaaagggataaaaatgggccccgttcctcacccaccaga tttagcgagaaaaagctttctattctgaaaggtcacggtgg ctttggcattacaaatcagaacaacacacactgaccatgat ggcttgtgaactaactgcaaggcactccgtcatggtaagcg agtaggtcccacctcctagtgtgccgctcattgctttacac agtagaatcttatttgagtgctaattgttgtctttgctgct ttactgtgttgttatagaaaatgtaagctgtacagtgaata agttattgaagcatgtgtaaacactgttatatatcttttct cctagatggggaattttgaataaaatacctttgaaattctg tgt Mus musculus C9orf72 amino acid (NP_001074812; SEQ ID NO: 17) MSTICPPPSPAVAKTEIALSGESPLLAATFAYWDNILGPRV RHIWAPKTDQVLLSDGEITFLANHTLNGEILRNAESGAIDV KFFVLSEKGVIIVSLIFDGNWNGDRSTYGLSIILPQTELSF YLPLHRVCVDRLTHIIRKGRIWMHKERQENVQKIVLEGTER MEDQGQSIIPMLTGEVIPVMELLASMKSHSVPEDIDIADTV LNDDDIGDSCHEGFLLNAISSHLQTCGCSVVVGSSAEKVNK IVRTLCLFLTPAERKCSRLCEAESSFKYESGLFVQGLLKDA TGSFVLPFRQVMYAPYPTTHIDVDVNTVKQMPPCHEHIYNQ RRYMRSELTAFWRATSEEDMAQDTIIYTDESFTPDLNIFQD VLHRDTLVKAFLDQVFHLKPGLSLRSTFLAQFLLILHRKAL TLIKYIEDDTQKGKKPFKSLRNLKIDLDLTAEGDLNIIMAL AEKIKPGLHSFIFGRPFYTSVQERDVLMTF Rattus norvegicus C9orf72 mRNA (NM_001007702; SEQ ID 
NO: 18) CGTTTGTAGTGTCAGCCATCCCAATTGCCTGTTCCTTCTCT GTGGGAGTGGTGTCTAGACAGTCCAGGCAGGGTATGCTAGG CAGGTGCGTTTTGGTTGCCTCAGATCGCAACTTGACTCCAT AACGGTGACCAAAGACAAAAGAAGGAAACCAGATTAAAAAG AACCGGACACAGACCCCTGCAGAATCTGGAGCGGCCGTGGT TGGGGGCGGGGCTACGACGGGGCGGACTCGGGGGCGTGGGA GGGCGGGGCCGGGGCGGGGCCCGGAGCCGGCTGCGGTTGCG GTCCCTGCGCCGGCGGTGAAGGCGCAGCGGCGGCGAGTGGC TATTGCAAGCGTTTGGATAATGTGAGACCTGGGATGCAGG G(ATGTCGACTATCTGCCCCCCACCATCTCCTGCTGTTGCC AAGACAGAGATTGCTTTAAGTGGTGAATCACCCTTGTTGGC GGCTACCTTTGCTTACTGGGATAATATTCTTGGTCCTAGAG TAAGGCACATTTGGGCTCCAAAGACAGACCAAGTACTCCTC AGTGATGGAGAAATCACTTTTCTTGCCAACCACACTCTGAA TGGAGAAATTCTTCGGAATGCGGAGAGTGGGGCAATAGATG TAAAGTTTTTTGTCTTATCTGAAAAGGGCGTCATTATTGTT TCATTAATCTTCGACGGGAACTGGAACGGAGATCGGAGCAC TTACGGACTATCAATTATACTGCCGCAGACGGAGCTGAGTT TCTACCTCCCACTGCACAGAGTGTGTGTTGACAGGCTAACG CACATCATTCGAAAAGGAAGGATATGGATGCACAAGGAAAG ACAAGAAAATGTCCAGAAAATTGTCTTGGAAGGCACCGAGA GGATGGAAGATCAGGGTCAGAGTATCATCCCTATGCTTACT GGGGAGGTCATCCCTGTGATGGAGCTGCTTGCGTCTATGAG ATCACACAGTGTTCCTGAAGACCTCGATATAGCTGATACAG TACTCAATGATGATGACATTGGTGACAGCTGTCATGAAGGC TTTCTTCTCAATGCCATCAGCTCACATCTGCAGACCTGCGG CTGTTCTGTGGTGGTAGGCAGCAGTGCAGAGAAAGTAAATA AGATAGTAAGAACACTGTGCCTTTTTCTGACACCAGCAGAG AGGAAGTGCTCCAGGCTGTGTGAAGCCGAATCGTCCTTTAA ATACGAATCTGGACTCTTTGTACAAGGCTTGCTAAAGGATG CGACTGGCAGTTTTGTACTACCTTTCCGGCAAGTTATGTAT GCCCCTTATCCCACCACACACATCGATGTGGATGTCAACAC TGTCAAGCAGATGCCACCGTGTCATGAACATATTTATAATC AACGCAGATACATGAGGTCAGAGCTGACAGCCTTCTGGAGG GCAACTTCAGAAGAGGACATGGCTCAGGACACCATCATCTA CACAGATGAGAGCTTCACTCCTGATTTGAATATTTTCCAAG ATGTCTTACACAGAGACACTCTAGTGAAAGCCTTTCTGGAT CAGGTCTTCCATTTGAAGCCTGGCCTGTCTCTCAGGAGTAC TTTCCTTGCACAGTTCCTCCTCATTCTTCACAGAAAAGCCT TGACACTAATCAAGTACATAGAGGATGACACGCAGAAGGGG AAAAAGCCCTTTAAGTCTCTTCGGAACCTGAAGATAGATCT TGATTTAACAGCAGAGGGCGACCTTAACATAATAATGGCTC TAGCTGAGAAAATTAAGCCAGGCCTACACTCTTTCATCTTC GGGAGACCTTTCTACACTAGTGTCCAAGAACGTGATGTTCT AATGACTTTTTAA)ACATGTGGTTTGCTCCGTGTGTCTCAT GACAGTCACACTTGCTGTTACAGTGTCTCAGCGCTTTGGAC ACATCCTTCCTCCAGGGTCCTGCCGCAGGACACGTTACACT 
ACACTTGTCAGTAGAGGTCTGTACCAGATGTCAGGTACATC GTTGTAGTGAATGTCTCTTTTCCTAGACTAGATGTACCCTC GTAGGGACTTATGTTTACAACCCTCCTAAGTACTAGTGCTG TCTTGTAAGGATACGAATGAAGGGATGTAAACTTCACCACA ACTGCTGGTTGGTTTTGTTGTTTTTGTTTTTTGAAACTTAT AATTCATGGTTTACATGCATCACACTGAAACCCTAGTTAGC TTTTTACAGGTAAGCTGTGAGTTGACTGCCTGTCCCTGTGT TCTCTGGCCTGTACGATCTGTGGCGTGTAGGATCACTTTTG CAACAACTAAAAACTAAAGCACTTTGTTTGCAGTTCTACAG AAAGCAACTTAGTCTGTCTGCAGATTCGTTTTTGAAAGAAG ACATGAGAAAGCGGAGTTTTAGGTGAAGTCAGTTGTTGGAT CTTCCTTTATAGACTTAGTCCTTTAGATGTGGTCTGTATAG ACATGCCCAACCATCATGCATGGGCACTGAATATCGTGAAC TGTGGTATGCTTTTTGTTGGTTTATTGTACTTCTGTCAAAG AAAGTGGCATTGGTTTTTATAATTGTTGCCAAGTTTTAAGG TTAATTTTCATTATTTTTGAGCCAAATTAAAATGTGCACCT CCTGTGCCTTTCCCAATCTTGGAAAATATAATTTCTTGGCA GAAGGTCAGATTTCAGGGCCCAGTCACTTTCGTCTGACTTC CCTTTGCACAGTCCGCCATGGGCCTGGCTTAGAAGTTCTTG TAAACTATGCCAGAGAGTACATTCGCTGATAAAATCTTCTT TGCAGAGCAGGAGAGCTTCTTGCCTCTTTCCTTTCATTTCT GCCTGGACTTTGGTGTTCTCCACGTTCCCTGCATCCTAAGG ACAGCAGGAGAACTCTGACCCCAGTGCTATTTCTCTAGGTG CTATTGTGGCAAACTCAAGCGGTCCGTCTCTGTCCCTGTAA CGTTCGTACCTTGCTGGCTGTGAAGTACTGACTGGTAAAGC TCCGTGCTACAGCAGTGTAGGGTATACACAAACACAAGTAA GTGTTTTATTTAAAACTGTGGACTTAGCATAAAAAGGGAGA CTATATTTATTTTTTACAAAAGGGATAAAAATGGAACCCTT TCCTCACCCACCAGATTTAGTCAGAAAAAAACATTCTATTC TGAAAGGTCACAGTGGTTTTGACATGACACATCAGAACAAC GCACACTGTCCATGATGGCTTATGAACTCCAAGTCACTCCA TCATGGTAAATGGGTAGATCCCTCCTTCTAGTGTGCCACAC CATTGCTTCCCACAGTAGAATCTTATTTAAGTGCTAAGTGT TGTCTCTGCTGGTTTACTCTGTTGTTTTAGAGAATGTAAGT TGTATAGTGAATAAGTTATTGAAGCATGTGTAAACACTGTT ATACATCTTTTCTCCTAGATGGGGAATTTGGAATAAAATAC CTTTAAAATTCAAAAAAAAAAAAAAAAAAAAAAAA Rattus norvegicus C9orf72 amino acid (NP_001007703; SEQ ID NO: 19) MSTICPPPSPAVAKTEIALSGESPLLAATFAYWDNILGPRV RHIWAPKTDQVLLSDGEITFLANHTLNGEILRNAESGAIDV KFFVLSEKGVIIVSLIFDGNWNGDRSTYGLSIILPQTELSF YLPLHRVCVDRLTHIIRKGRIWMHKERQENVQKIVLEGTER MEDQGQSIIPMLTGEVIPVMELLASMRSHSVPEDLDIADTV LNDDDIGDSCHEGFLLNAISSHLQTCGCSVVVGSSAEKVNK IVRTLCLFLTPAERKCSRLCEAESSFKYESGLFVQGLLKDA TGSFVLPFRQVMYAPYPTTHIDVDVNTVKQMPPCHEHIYNQ RRYMRSELTAFWRATSEEDMAQDTIIYTDESFTPDLNIFQD 
VLHRDTLVKAFLDQVFHLKPGLSLRSTFLAQFLLILHRKAL TLIKYIEDDTQKGKKPFKSLRNLKIDLDLTAEGDLNIIMAL AEKIKPGLHSFIFGRPFYTSVQERDVLMTF

C9ORF72 Targeting Vectors and Production of Non-Human Animals Having a Heterologous Hexanucleotide Repeat Expansion Sequence Inserted in a C9ORF72 Locus

[0112] Provided herein are targeting vectors or targeting constructs for the production of non-human animals having a heterologous hexanucleotide expansion sequence inserted into an endogenous C9ORF72 locus as described herein.

[0113] A. Large Targeting Vectors

[0114] In cells other than one-cell stage embryos, a targeting vector that is a "large targeting vector" or "LTVEC" can be used, which includes targeting vectors that comprise homology arms that correspond to and are derived from nucleic acid sequences larger than those typically used by other approaches intended to perform homologous recombination in cells. LTVECs also include targeting vectors comprising nucleic acid inserts having nucleic acid sequences larger than those typically used by other approaches intended to perform homologous recombination in cells. For example, LTVECs make possible the modification of large loci that cannot be accommodated by traditional plasmid-based targeting vectors because of their size limitations. For example, the targeted locus can be (i.e., the 5' and 3' homology arms can correspond to) a locus of the cell that is not targetable using a conventional method or that can be targeted only incorrectly or only with significantly low efficiency in the absence of a nick or double-strand break induced by a nuclease agent (e.g., a Cas protein).

[0115] A targeting vector includes homology arms. If the targeting vector also comprises a nucleic acid insert, the homology arms can flank the nucleic acid insert. For ease of reference, the homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms. This terminology relates to the relative position of the homology arms to the nucleic acid insert within the exogenous repair template. The 5' and 3' homology arms correspond to regions within the genomic region of interest, which are referred to herein as "5' target sequence" and "3' target sequence," respectively.

[0116] A homology arm and a target sequence "correspond" or are "corresponding" to one another when the two regions share a sufficient level of sequence identity to one another to act as substrates for a homologous recombination reaction. The term "homology" includes DNA sequences that are either identical or share sequence identity to a corresponding sequence. The sequence identity between a given target sequence and the corresponding homology arm found in the exogenous repair template can be any degree of sequence identity that allows for homologous recombination to occur. For example, the amount of sequence identity shared by the homology arm of the exogenous repair template (or a fragment thereof) and the target sequence (or a fragment thereof) can be at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity, such that the sequences undergo homologous recombination. A corresponding region of homology between the homology arm and the corresponding target sequence can be of any length that is sufficient to promote homologous recombination. The homology arms can be symmetrical (each about the same size in length), or they can be asymmetrical (one longer than the other).
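As a minimal sketch of how the percent sequence identity between a homology arm and its target sequence could be computed, the following Python function compares two sequences position by position. It assumes the two sequences are already aligned and of equal length (no gaps); the function name is hypothetical, and real comparisons would normally rely on an alignment tool.

```python
def percent_identity(arm: str, target: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences match.

    Assumption for this sketch: no gaps, so positions compare one-to-one.
    """
    if not arm or len(arm) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm.upper(), target.upper()))
    return 100.0 * matches / len(arm)
```

For example, a 10-nucleotide arm differing from its target at one position has 90% identity under this definition.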

[0117] The homology arms can correspond to a locus that is native to a cell (e.g., the targeted locus). Alternatively, for example, they can correspond to a region of a heterologous or exogenous segment of DNA that was integrated into the genome of the cell, including, for example, transgenes, expression cassettes, or heterologous or exogenous regions of DNA. Alternatively, the homology arms of the targeting vector can correspond to a region of a yeast artificial chromosome (YAC), a bacterial artificial chromosome (BAC), a human artificial chromosome, or any other engineered region contained in an appropriate host cell. Still further, the homology arms of the targeting vector can correspond to or be derived from a region of a BAC library, a cosmid library, or a P1 phage library, or can be derived from synthetic DNA.

[0118] Examples of LTVECs include vectors derived from a bacterial artificial chromosome (BAC), a human artificial chromosome, or a yeast artificial chromosome (YAC). Non-limiting examples of LTVECs and methods for making them are described, e.g., in U.S. Pat. Nos. 6,586,251; 6,596,541; and 7,105,348; and in WO 2002/036789, each of which is herein incorporated by reference in its entirety for all purposes. LTVECs can be in linear form or in circular form.

[0119] LTVECs can be of any length and are typically at least 10 kb in length. For example, an LTVEC can be from about 50 kb to about 500 kb, from about 50 kb to about 75 kb, from about 75 kb to about 100 kb, from about 100 kb to about 125 kb, from about 125 kb to about 150 kb, from about 150 kb to about 175 kb, from about 175 kb to about 200 kb, from about 200 kb to about 225 kb, from about 225 kb to about 250 kb, from about 250 kb to about 275 kb, from about 275 kb to about 300 kb, from about 300 kb to about 325 kb, from about 325 kb to about 350 kb, from about 350 kb to about 375 kb, from about 375 kb to about 400 kb, from about 400 kb to about 425 kb, from about 425 kb to about 450 kb, from about 450 kb to about 475 kb, or from about 475 kb to about 500 kb. Alternatively, an LTVEC can be at least 10 kb, at least 15 kb, at least 20 kb, at least 30 kb, at least 40 kb, at least 50 kb, at least 60 kb, at least 70 kb, at least 80 kb, at least 90 kb, at least 100 kb, at least 150 kb, at least 200 kb, at least 250 kb, at least 300 kb, at least 350 kb, at least 400 kb, at least 450 kb, or at least 500 kb or greater. The size of an LTVEC can be too large to enable screening of targeting events by conventional assays, e.g., Southern blotting and long-range (e.g., 1 kb to 5 kb) PCR.

[0120] The sum total of the 5' homology arm and the 3' homology arm in an LTVEC is typically at least 10 kb. As an example, the 5' homology arm can range from about 5 kb to about 150 kb and/or the 3' homology arm can range from about 5 kb to about 150 kb. Each homology arm can be, for example, from about 5 kb to about 10 kb, from about 10 kb to about 20 kb, from about 20 kb to about 30 kb, from about 30 kb to about 40 kb, from about 40 kb to about 50 kb, from about 50 kb to about 60 kb, from about 60 kb to about 70 kb, from about 70 kb to about 80 kb, from about 80 kb to about 90 kb, from about 90 kb to about 100 kb, from about 100 kb to about 110 kb, from about 110 kb to about 120 kb, from about 120 kb to about 130 kb, from about 130 kb to about 140 kb, from about 140 kb to about 150 kb, from about 150 kb to about 160 kb, from about 160 kb to about 170 kb, from about 170 kb to about 180 kb, from about 180 kb to about 190 kb, or from about 190 kb to about 200 kb. The sum total of the 5' and 3' homology arms can be, for example, from about 10 kb to about 20 kb, from about 20 kb to about 30 kb, from about 30 kb to about 40 kb, from about 40 kb to about 50 kb, from about 50 kb to about 60 kb, from about 60 kb to about 70 kb, from about 70 kb to about 80 kb, from about 80 kb to about 90 kb, from about 90 kb to about 100 kb, from about 100 kb to about 110 kb, from about 110 kb to about 120 kb, from about 120 kb to about 130 kb, from about 130 kb to about 140 kb, from about 140 kb to about 150 kb, from about 150 kb to about 160 kb, from about 160 kb to about 170 kb, from about 170 kb to about 180 kb, from about 180 kb to about 190 kb, from about 190 kb to about 200 kb, from about 200 kb to about 250 kb, from about 250 kb to about 300 kb, from about 300 kb to about 350 kb, or from about 350 kb to about 400 kb. 
Alternatively, each homology arm can be at least 5 kb, at least 10 kb, at least 15 kb, at least 20 kb, at least 30 kb, at least 40 kb, at least 50 kb, at least 60 kb, at least 70 kb, at least 80 kb, at least 90 kb, at least 100 kb, at least 110 kb, at least 120 kb, at least 130 kb, at least 140 kb, at least 150 kb, at least 160 kb, at least 170 kb, at least 180 kb, at least 190 kb, or at least 200 kb. Likewise, the sum total of the 5' and 3' homology arms can be at least 10 kb, at least 15 kb, at least 20 kb, at least 30 kb, at least 40 kb, at least 50 kb, at least 60 kb, at least 70 kb, at least 80 kb, at least 90 kb, at least 100 kb, at least 110 kb, at least 120 kb, at least 130 kb, at least 140 kb, at least 150 kb, at least 160 kb, at least 170 kb, at least 180 kb, at least 190 kb, at least 200 kb, at least 250 kb, at least 300 kb, at least 350 kb, or at least 400 kb.
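The arithmetic on homology arm lengths described above can be illustrated with a short Python sketch. The class and function names are hypothetical, and the 10 kb default reflects only the "typically at least 10 kb" sum total stated above; it is not a hard requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetingVector:
    """Hypothetical record of the three segment lengths of a targeting vector, in bp."""
    five_prime_arm_bp: int
    insert_bp: int
    three_prime_arm_bp: int

    @property
    def total_homology_bp(self) -> int:
        # Sum total of the 5' and 3' homology arms.
        return self.five_prime_arm_bp + self.three_prime_arm_bp


def meets_typical_ltvec_arm_total(vector: TargetingVector,
                                  min_total_bp: int = 10_000) -> bool:
    """True if the combined arm length reaches the typical LTVEC minimum (10 kb by default)."""
    return vector.total_homology_bp >= min_total_bp
```

Note that the arms may be asymmetrical: a 2 kb arm paired with a 9 kb arm satisfies the same sum total as two 5.5 kb arms.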

[0121] LTVECs can comprise nucleic acid inserts having nucleic acid sequences larger than those typically used by other approaches intended to perform homologous recombination in cells. For example, an LTVEC can comprise a nucleic acid insert ranging from about 1 kb to about 5 kb, from about 5 kb to about 10 kb, from about 10 kb to about 20 kb, from about 20 kb to about 40 kb, from about 40 kb to about 60 kb, from about 60 kb to about 80 kb, from about 80 kb to about 100 kb, from about 100 kb to about 150 kb, from about 150 kb to about 200 kb, from about 200 kb to about 250 kb, from about 250 kb to about 300 kb, from about 300 kb to about 350 kb, from about 350 kb to about 400 kb, from about 400 kb to about 450 kb, from about 450 kb to about 500 kb, or greater. Alternatively, the nucleic acid insert can be at least 1 kb, at least 5 kb, at least 10 kb, at least 20 kb, at least 30 kb, at least 40 kb, at least 60 kb, at least 80 kb, at least 100 kb, at least 150 kb, at least 200 kb, at least 250 kb, at least 300 kb, at least 350 kb, at least 400 kb, at least 450 kb, or at least 500 kb.

[0122] B. Construction of Large Targeting Vectors

[0123] Many of the techniques used to construct targeting vectors described herein are standard molecular biology techniques well known to the skilled artisan (see, e.g., Sambrook, J., E. F. Fritsch and T. Maniatis. Molecular Cloning: A Laboratory Manual, Second Edition, Vols. 1, 2, and 3, 1989; Current Protocols in Molecular Biology, Eds. Ausubel et al., Greene Publ. Assoc., Wiley Interscience, NY). Any methods known in the art for constructing large targeting vectors can be used.

[0124] In one example, the method for constructing a large targeting vector (LTVEC) comprises: (a) obtaining a large genomic DNA clone containing the gene/genes or chromosomal locus/loci of interest; and (b) appending homology boxes 1 and 2 to a modification cassette to generate the LTVEC. Optionally, such methods can further comprise verifying that each LTVEC has been engineered correctly. Optionally, such methods can further comprise purification, preparation, and linearization of LTVEC DNA for introduction into eukaryotic cells. Such methods are further described in US 2004/0018626, US 2013/0309670, and WO 2013/163394, each of which is herein incorporated by reference in its entirety for all purposes.

[0125] Genes or loci of interest can be selected based on specific criteria, such as detailed structural or functional data, or they can be selected in the absence of such detailed information as potential genes or gene fragments become predicted through the efforts of the various genome sequencing projects. It is not necessary to know the complete sequence and gene structure of a gene or locus of interest to produce LTVECs. The only sequence information that is required is approximately 80-100 nucleotides so as to obtain the genomic clone of interest as well as to generate the homology boxes used in making the LTVEC and to make probes for use in quantitative modification-of-allele (MOA) assays.

[0126] Once a gene or locus of interest has been selected, a large genomic clone containing this gene or locus can be obtained. This clone can be obtained in any one of several ways including, but not limited to, screening suitable DNA libraries (e.g., BAC, PAC, YAC, or cosmid) by standard hybridization or PCR techniques, or by any other methods familiar to the skilled artisan.

[0127] Homology boxes mark the sites of bacterial homologous recombination that are used to generate LTVECs from large cloned genomic fragments. Homology boxes are short segments of DNA, generally double-stranded and at least 40 nucleotides in length, that are homologous to regions within the large cloned genomic fragment flanking the region to be modified. The homology boxes are appended to the modification cassette so that following homologous recombination in bacteria, the modification cassette replaces the region to be modified. The technique of creating a targeting vector using bacterial homologous recombination can be performed in a variety of systems (see, e.g., Yang et al. (1997) Nat. Biotechnol. 15:859-865; Muyrers et al. (1999) Nucleic Acids Res. 27:1555-1557; Angrand et al. (1999) Nucleic Acids Res. 27:e16; Narayanan et al. (1999) Gene Ther. 6:442-447; Yu, et al. (2000) Proc. Natl. Acad. Sci. U.S.A. 97:5978-5983, each of which is herein incorporated by reference in its entirety for all purposes). One example of such a technology is ET cloning (see, e.g., Zhang et al. (1998) Nat. Genet. 20:123-128; Narayanan et al. (1999) Gene Ther. 6:442-447, each of which is herein incorporated by reference in its entirety for all purposes) and variations of this technology (see, e.g., Yu et al. (2000) Proc. Natl. Acad. Sci. U.S.A. 97:5978-5983, herein incorporated by reference in its entirety for all purposes). ET refers to the recE and recT proteins that carry out the homologous recombination reaction. RecE is an exonuclease that trims one strand of linear double-stranded DNA 5' to 3', thus leaving behind a linear double-stranded fragment with a 3' single-stranded overhang. This single-stranded overhang is coated by recT protein, which has single-stranded DNA (ssDNA) binding activity. ET cloning is performed using E. coli that transiently express the E. coli gene products of recE and recT and the bacteriophage lambda (λ) protein λgam. The λgam protein protects the donor DNA fragment from degradation by the recBC exonuclease system and it is preferred for efficient ET cloning in recBC⁺ hosts such as the frequently used E. coli strain DH10b.
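The role of homology boxes can be sketched at the level of plain strings, setting aside the enzymology of recE, recT, and λgam. In the following Python sketch, which is a simplified illustration and not the disclosed method, two boxes are taken from the sequence flanking the region to be modified, appended to a modification cassette, and the "recombination" is modeled as a replacement of everything between (and including) the boxes with the donor. All function names are hypothetical, and the sketch assumes each box occurs only once in the clone.

```python
def make_homology_boxes(clone: str, region_start: int, region_end: int,
                        box_len: int = 40) -> tuple[str, str]:
    """Return (box1, box2): box_len nucleotides flanking clone[region_start:region_end]."""
    if region_start - box_len < 0 or region_end + box_len > len(clone):
        raise ValueError("not enough flanking sequence for the requested box length")
    box1 = clone[region_start - box_len:region_start]
    box2 = clone[region_end:region_end + box_len]
    return box1, box2


def recombine(clone: str, donor: str, box1: str, box2: str) -> str:
    """Model recombination: swap the box1...box2 span of the clone for the donor.

    The donor is expected to be box1 + modification cassette + box2.
    """
    if clone.count(box1) != 1 or clone.count(box2) != 1:
        raise ValueError("each homology box must occur exactly once in the clone")
    left = clone.index(box1)
    right = clone.index(box2, left + len(box1))
    return clone[:left] + donor + clone[right + len(box2):]
```

With a donor assembled as `box1 + cassette + box2`, the result is the original clone with the region between the boxes replaced by the cassette, which mirrors the outcome described above.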

[0128] LTVECs can also be generated by DNA assembly methods, such as in vitro DNA assembly methods including Gibson DNA assembly or modifications of Gibson DNA assembly. See, e.g., US 2015/0376628, US 2016/0115486, WO 2015/200334, and US 2010/0035768, each of which is incorporated by reference in its entirety for all purposes.

[0129] Traditional methods of assembling nucleic acids employ time consuming steps of conventional enzymatic digestion with restriction enzymes, cloning of the nucleic acids, and ligating nucleic acids together. These methods are made more difficult when large fragments or vectors are being assembled together. However, the malleable target specificity of nucleases (e.g., guide RNAs and Cas9 nucleases) can be taken advantage of to convert nucleic acids into a form suitable for use in rapid assembly reactions. See, e.g., US 2015/0376628, US 2016/0115486, and WO 2015/200334, each of which is incorporated by reference in its entirety for all purposes.

[0130] Any DNA molecules of interest having overlapping sequences can be assembled by such methods, including DNAs which are naturally occurring, cloned DNA molecules, synthetically generated DNAs, and so forth. Assembling two nucleic acids includes any method of joining strands of two nucleic acids. For example, assembly includes joining digested nucleic acids such that strands from each nucleic acid anneal to the other and extension, in which each strand serves as a template for extension of the other.

[0131] Any in vitro or in vivo DNA assembly methods or rapid combinatorial methods can be used to assemble the nucleic acids. For example, a first and a second nucleic acid having overlapping ends can be combined with a ligase, exonuclease, DNA polymerase, and nucleotides and incubated at a constant temperature, such as at 50° C. Specifically, a T5 exonuclease could be used to remove nucleotides from the 5' ends of dsDNA producing complementary overhangs. The complementary single-stranded DNA overhangs can then be annealed, DNA polymerase used for gap filling, and Taq DNA ligase used to seal the resulting nicks at 50° C. Thus, two nucleic acids sharing overlapping end sequences can be joined into a covalently sealed molecule in a one-step isothermal reaction. See, e.g., Gibson et al. (2009) Nature Methods 6(5): 343-345, herein incorporated by reference in its entirety for all purposes.
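The net sequence-level result of joining two nucleic acids that share overlapping end sequences can be sketched as follows. This Python example models only the outcome (one copy of the shared overlap retained in the joined molecule); it does not model exonuclease resection, annealing, gap filling, or ligation. The function names and the 20-nucleotide minimum overlap are assumptions made for this sketch.

```python
def find_terminal_overlap(first: str, second: str, min_overlap: int = 20) -> int:
    """Length of the longest suffix of first that equals a prefix of second.

    Returns 0 if no overlap of at least min_overlap nucleotides exists.
    """
    max_len = min(len(first), len(second))
    for length in range(max_len, min_overlap - 1, -1):
        if first.endswith(second[:length]):
            return length
    return 0


def assemble(first: str, second: str, min_overlap: int = 20) -> str:
    """Join two sequences through their shared end, keeping the overlap once."""
    overlap = find_terminal_overlap(first, second, min_overlap)
    if overlap == 0:
        raise ValueError("no sufficient overlapping end sequence found")
    return first + second[overlap:]
```

Sequences without a sufficient shared end raise an error in this sketch, which corresponds to the case in which joiner oligos or a site-directed nuclease would be needed to create compatible ends.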

[0132] Site-directed nuclease agents (e.g., guide RNA-directed Cas proteins) allow rapid and efficient combination of nucleic acids by selecting and manipulating the end sequences generated by their endonuclease activity. For example, DNA assembly methods can combine a first polynucleotide with a nuclease agent (e.g., a gRNA-Cas complex) specific for a desired target site and an exonuclease. The target site can be chosen such that when the nuclease cleaves the nucleic acid, the resulting ends created by the cleavage have regions complementary to the ends of a second nucleic acid to be assembled with the first nucleic acid (e.g., overlapping ends). These complementary ends can then be assembled to yield a single assembled nucleic acid. Because the nuclease agent (e.g., gRNA-Cas complex) is specific for an individual target site, the method allows for modification of nucleic acids in a precise site-directed manner. By selecting a nuclease agent (e.g., a gRNA-Cas complex) specific for a target site that, on cleavage, produces end sequences complementary to those of a second nucleic acid, isothermal assembly can be used to assemble the resulting digested nucleic acid. Thus, by selecting nucleic acids and nuclease agents (e.g., gRNA-Cas complexes) that result in overlapping end sequences, nucleic acids can be assembled by rapid combinatorial methods to produce the final assembled nucleic acid in a fast and efficient manner. Alternatively, nucleic acids not having complementary ends can be assembled with joiner oligos designed to have complementary ends to each nucleic acid. By using the joiner oligos, two or more nucleic acids can be seamlessly assembled, thereby reducing unnecessary sequences in the resulting assembled nucleic acid.

[0133] Verification that the LTVEC has been engineered correctly can then be undertaken. For example, diagnostic PCR can be used to verify the novel junctions created by the introduction of the donor fragment into the gene or chromosomal locus of interest. Alternatively or additionally, diagnostic restriction enzyme digestion can be done to make sure that only the desired modifications have been introduced into the LTVEC during the bacterial homologous recombination process. Alternatively or additionally, direct sequencing of the LTVEC can be done, particularly the regions spanning the site of the modification to verify the novel junctions created by the introduction of the donor fragment into the gene or chromosomal locus of interest.

[0134] After any purification and further preparation of the LTVEC DNA for introduction into eukaryotic cells, the LTVEC is preferably linearized in a manner that leaves the modified endogenous gene or chromosomal locus DNA flanked with long homology arms. This can be accomplished by linearizing the LTVEC, preferably in the vector backbone, with any suitable restriction enzyme that digests only rarely. Examples of suitable restriction enzymes include NotI, PacI, SfiI, SrfI, SwaI, FseI, and so forth. The choice of restriction enzyme may be determined experimentally (i.e., by testing several different candidate rare cutters) or, if the sequence of the LTVEC is known, by analyzing the sequence and choosing a suitable restriction enzyme based on the analysis.
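Where the sequence of the LTVEC is known, the sequence analysis mentioned above can be sketched as a simple site count. The following Python sketch is illustrative only: the helper names `site_positions` and `single_cutters_in_backbone`, and the backbone coordinates passed to them, are hypothetical. The recognition sequences are the standard ones for the rare cutters named above (N denotes any base). The sketch treats the sequence as linear and therefore does not detect a site that spans the origin of a circular vector.

```python
import re

# Standard recognition sequences for the rare cutters named in the text.
# All are palindromic, so scanning one strand is sufficient.
RARE_CUTTERS = {
    "NotI": "GCGGCCGC",
    "PacI": "TTAATTAA",
    "SfiI": "GGCCNNNNNGGCC",
    "SrfI": "GCCCGGGC",
    "SwaI": "ATTTAAAT",
    "FseI": "GGCCGGCC",
}


def site_positions(sequence: str, site: str) -> list:
    """Return 0-based start positions of every (possibly overlapping) site."""
    pattern = "(?=(" + site.replace("N", "[ACGT]") + "))"
    return [m.start() for m in re.finditer(pattern, sequence.upper())]


def single_cutters_in_backbone(vector_seq: str, backbone_start: int,
                               backbone_end: int) -> list:
    """List enzymes that cut exactly once, with the whole site inside the
    half-open backbone interval [backbone_start, backbone_end)."""
    candidates = []
    for name, site in RARE_CUTTERS.items():
        hits = site_positions(vector_seq, site)
        if (len(hits) == 1
                and backbone_start <= hits[0]
                and hits[0] + len(site) <= backbone_end):
            candidates.append(name)
    return candidates
```

An enzyme returned by `single_cutters_in_backbone` would linearize the vector without cutting the homology arms or the insert; an empty list suggests testing other rare cutters experimentally, as the paragraph above notes.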

[0135] C. C9orf72-HRE Nucleic Acid Constructs

[0136] DNA sequences can be used to prepare LTVECs for knock-in animals (e.g., a C9ORF72-HRE). Typically, a polynucleotide molecule (e.g., an insert nucleic acid) comprising a hexanucleotide expansion sequence and/or a selectable marker is inserted into a vector, preferably a DNA vector, in order to replicate the polynucleotide molecule in a suitable host cell.

[0137] A polynucleotide molecule (or insert nucleic acid) comprises a segment of DNA that one desires to integrate into a target locus. In some embodiments, an insert nucleic acid comprises one or more polynucleotides of interest. In some embodiments, an insert nucleic acid comprises one or more expression cassettes. In some certain embodiments, an expression cassette comprises a polynucleotide of interest, a polynucleotide encoding a selection marker and/or a reporter gene along with, in some certain embodiments, various regulatory components that influence expression. Virtually any polynucleotide of interest may be contained within an insert nucleic acid and thereby integrated at a target genomic locus. Methods disclosed herein provide for at least 1, 2, 3, 4, 5, 6 or more polynucleotides of interest to be integrated into a targeted C9ORF72 genomic locus.

[0138] In some embodiments, a polynucleotide of interest contained in an insert nucleic acid encodes a reporter. In some embodiments, a polynucleotide of interest encodes a selectable marker.

[0139] In some embodiments, a polynucleotide of interest is flanked by or comprises site-specific recombination sites (e.g., loxP, Frt, etc.). In some certain embodiments, site-specific recombination sites flank a DNA segment that encodes a reporter and/or a DNA segment that encodes a selectable marker. Exemplary polynucleotides of interest, including selection markers and reporter genes that can be included within insert nucleic acids are described herein.

[0140] Various methods employed in preparation of plasmids, DNA constructs and/or targeting vectors and transformation of host organisms are known in the art. For other suitable expression systems for both prokaryotic and eukaryotic cells, as well as general recombinant procedures, see Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, J. et al., Cold Spring Harbor Laboratory Press: 1989.

[0141] As described above, exemplary non-human (e.g., rodent) C9ORF72 nucleic acid and amino acid sequences for use in constructing targeting vectors for knock-in animals are provided in Table 2. Other non-human C9ORF72 sequences can also be found in the GenBank database. C9ORF72 targeting vectors as disclosed herein comprise a heterologous hexanucleotide repeat expansion sequence, and optionally one or more sequences encoding a reporter gene and/or a selectable marker, flanked by sequences that are identical or substantially homologous to flanking sequences of a target region (also referred to as "homology arms") for insertion into the genome of a transgenic non-human animal.

[0142] To give but one example, an insertion start point may be set upstream (5'), within, or downstream (3') of a first exon, e.g., a first non-coding exon, to allow an insert nucleic acid to be operably linked to an endogenous regulatory sequence (e.g., a promoter). A targeting strategy for making a targeted insertion of a heterologous hexanucleotide repeat expansion sequence is provided in FIG. 1B and FIG. 1C. The drug selection cassette is flanked by loxP (LP) recombinase recognition sites that enable Cre-mediated excision of the drug selection cassette. This allows for, among other things, excision of the selection cassette. Thus, prior to phenotypic analysis the drug selection cassette may be removed leaving only the heterologous hexanucleotide repeat expansion sequence, and in some embodiments, one copy of the recombinase recognition site.

[0143] Disclosed herein are nucleic acid constructs useful for the modified mouse C9orf72 alleles depicted in FIGS. 1B and 1C, wherein the nucleic acid constructs comprise the sequences set forth in SEQ ID NO:8 and SEQ ID NO:9. SEQ ID NO:8 comprises from 5' to 3': a 5' homology arm (SEQ ID NO:20), a 962 bp human sequence spanning and including part of exon 1a and all of exon 1b of a human C9orf72 gene (SEQ ID NO:2), a floxed neomycin resistance cassette containing the neomycin resistance gene under the control of a human ubiquitin 1 and/or Em7 promoter (SEQ ID NO:21), and a 3' homology arm (SEQ ID NO:22). SEQ ID NO:9 comprises from 5' to 3': a 5' homology arm (SEQ ID NO:23), a 1261 bp human sequence spanning and including part of exon 1a and all of exon 1b of a human C9orf72 gene (SEQ ID NO:3), a floxed neomycin resistance cassette containing the neomycin resistance gene under the control of a human ubiquitin 1 and/or Em7 promoter (SEQ ID NO:24), and a 3' homology arm (SEQ ID NO:25).

[0144] As described herein, insertion of heterologous hexanucleotide repeat expansion sequence into an endogenous C9orf72 locus can comprise a replacement of or an insertion/addition to the C9orf72 locus or a portion thereof with an insert nucleic acid. In some embodiments, an insert nucleic acid comprises a reporter gene. In some certain embodiments, a reporter gene is positioned in operable linkage with an endogenous C9orf72 promoter. Such a modification allows for the expression of a reporter gene driven by an endogenous C9orf72 promoter. Alternatively, a reporter gene is not placed in operable linkage with an endogenous C9orf72 promoter.

[0145] A variety of reporter genes (or detectable moieties) can be used in targeting vectors described herein. Exemplary reporter genes include, for example, β-galactosidase (encoded by the lacZ gene), Green Fluorescent Protein (GFP), enhanced Green Fluorescent Protein (eGFP), MmGFP, blue fluorescent protein (BFP), enhanced blue fluorescent protein (eBFP), mPlum, mCherry, tdTomato, mStrawberry, J-Red, DsRed, mOrange, mKO, mCitrine, Venus, YPet, yellow fluorescent protein (YFP), enhanced yellow fluorescent protein (eYFP), Emerald, CyPet, cyan fluorescent protein (CFP), Cerulean, T-Sapphire, luciferase, alkaline phosphatase, or a combination thereof. The methods described herein demonstrate the construction of targeting vectors that employ a lacZ reporter gene that encodes β-galactosidase; however, persons of skill upon reading this disclosure will understand that non-human animals described herein can be generated in the absence of a reporter gene or with any reporter gene known in the art.

[0146] Where appropriate, the coding region of the genetic material or polynucleotide sequence(s) encoding a reporter polypeptide, in whole or in part, may be modified to include codons that are optimized for expression in the non-human animal (e.g., see U.S. Pat. Nos. 5,670,356 and 5,874,304). Codon optimized sequences are synthetic sequences, and preferably encode the identical polypeptide (or a biologically active fragment of a full length polypeptide which has substantially the same activity as the full length polypeptide) encoded by the non-codon optimized parent polynucleotide. In some embodiments, the coding region of the genetic material encoding a reporter polypeptide (e.g. lacZ), in whole or in part, may include an altered sequence to optimize codon usage for a particular cell type (e.g., a rodent cell). For example, the codons of the reporter gene to be inserted into the genome of a non-human animal (e.g., a rodent) may be optimized for expression in a cell of the non-human animal. Such a sequence may be described as a codon-optimized sequence.
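As a purely illustrative sketch of the codon substitution described above, the following Python code replaces codons with a preferred synonymous codon while checking that the encoded polypeptide is unchanged. The function names and the `PREFERRED` mapping are hypothetical placeholders for the example and do not represent measured codon usage for any organism; only the standard genetic code table is factual.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical preferences for illustration only (amino acid -> codon).
PREFERRED = {"L": "CTG", "A": "GCC", "R": "AGA"}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence with the standard code."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str, preferred: dict) -> str:
    """Swap each codon for the preferred synonymous codon, if one is given."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    optimized = "".join(
        preferred.get(CODON_TABLE[cds[i:i + 3]], cds[i:i + 3])
        for i in range(0, len(cds), 3)
    )
    # The optimized sequence must encode the identical polypeptide.
    if translate(optimized) != translate(cds):
        raise AssertionError("optimization changed the encoded polypeptide")
    return optimized
```

In practice the preference table would be derived from codon usage data for the relevant cell type; the sketch only shows that the substitution is synonymous by construction.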

[0147] Compositions and methods for making non-human animals that comprise an insertion of a heterologous hexanucleotide repeat expansion sequence in an endogenous C9ORF72 locus as described herein are provided, including compositions and methods for making non-human animals that express the heterologous hexanucleotide repeat expansion sequence, e.g., from a C9ORF72 promoter (e.g., an endogenous mouse promoter) and a C9ORF72 regulatory sequence (e.g., a human regulatory sequence, such as one found in exons 1a and 1b). In some embodiments, compositions and methods for making non-human animals that express a heterologous hexanucleotide repeat expansion sequence from an endogenous promoter and an endogenous regulatory sequence are also provided. Methods include inserting a targeting vector, as described herein, encoding a heterologous hexanucleotide repeat expansion sequence into the genome of a non-human animal so that a non-coding sequence of a C9ORF72 locus is deleted, in whole or in part. In some embodiments, a non-human animal described herein comprises an endogenous C9ORF72 locus that comprises a targeting vector as described herein.

[0148] Targeting vectors described herein may be introduced into ES cells and screened for ES clones harboring a disruption in a C9orf72 locus as described in Frendewey, D., et al., 2010, Methods Enzymol. 476:295-307. A variety of host embryos can be employed in the methods and compositions disclosed herein. For example, the pluripotent and/or totipotent cells having the targeted genetic modification can be introduced into a pre-morula stage embryo (e.g., an 8-cell stage embryo) from a corresponding organism. See, e.g., U.S. Pat. Nos. 7,576,259, 7,659,442, 7,294,754, and US 2008/0078000 A1, all of which are incorporated by reference herein in their entireties. In other cases, the donor ES cells may be implanted into a host embryo at the 2-cell stage, 4-cell stage, 8-cell stage, 16-cell stage, 32-cell stage, or 64-cell stage. The host embryo can also be a blastocyst or can be a pre-blastocyst embryo, a pre-morula stage embryo, a morula stage embryo, an uncompacted morula stage embryo, or a compacted morula stage embryo.

[0149] In some embodiments, the VELOCIMOUSE® method (Poueymirou, W. T. et al., 2007, Nat. Biotechnol. 25:91-99) may be applied to inject positive ES cells into an 8-cell embryo to generate fully ES cell-derived F0 generation heterozygous mice ready for lacZ expression profiling or breeding to homozygosity. Exemplary methods for generating non-human animals having a disruption in a C9orf72 locus are provided in Example 1.

[0150] Methods for generating transgenic non-human animals, including knockouts and knock-ins, are well known in the art (see, e.g., Gene Targeting: A Practical Approach, Joyner, ed., Oxford University Press, Inc. (2000)). For example, generation of transgenic rodents may optionally involve disruption of the genetic loci of an endogenous rodent gene and introduction of a reporter gene into the rodent genome, in some embodiments, at the same location as the endogenous rodent gene.

[0151] A schematic illustration (not to scale) of the genomic organization of a mouse C9orf72 is provided in FIG. 1A (top box). An exemplary targeting strategy for replacement of a non-coding sequence of an endogenous murine C9orf72 locus with a heterologous hexanucleotide repeat expansion sequence is also provided in FIG. 1A (bottom box). As illustrated, genomic DNA spanning between exon 1 and the ATG start codon, or a portion thereof, is replaced with a heterologous hexanucleotide repeat expansion sequence and a drug selection cassette flanked by site-specific recombinase recognition sites. The targeting vector used in this strategy may optionally include a recombinase-encoding sequence that is operably linked to a promoter that is developmentally regulated such that the recombinase is expressed in undifferentiated cells. Exemplary developmentally regulated promoters that can be included in targeting vectors described herein are provided in Table 3. Additional suitable promoters that can be used in targeting vectors described herein include those described in U.S. Pat. Nos. 8,697,851, 8,518,392 and 8,354,389, all of which are herein incorporated by reference. Upon homologous recombination, the non-coding sequence, e.g., approximately 800-1000 bp spanning from exon 1 (or within exon 1) to exon 2, of the endogenous murine C9orf72 locus is replaced by the sequence contained in the targeting vector. The drug selection cassette may be removed, e.g., optionally in a development-dependent manner such that progeny derived from mice whose germ line cells contain a disruption in a C9orf72 locus described above will shed the selectable marker from differentiated cells during development (see U.S. Pat. Nos. 8,697,851, 8,518,392 and 8,354,389, all of which are herein incorporated by reference).

TABLE-US-00004 TABLE 3 Prot promoter (SEQ ID NO: 26) CCAGTAGCAGCACCCACGTCCACCTTCTGTCTAGTAATGTCCAAC ACCTCCCTCAGTCCAAACACTGCTCTGCATCCATGTGGCTCCCAT TTATACCTGAAGCACTTGATGGGGCCTCAATGTTTTACTAGAGCC CACCCCCCTGCAACTCTGAGACCCTCTGGATTTGTCTGTCAGTGC CTCACTGGGGCGTTGGATAATTTCTTAAAAGGTCAAGTTCCCTCA GCAGCATTCTCTGAGCAGTCTGAAGATGTGTGCTTTTCACAGTTC AAATCCATGTGGCTGTTTCACCCACCTGCCTGGCCTTGGGTTATC TATCAGGACCTAGCCTAGAAGCAGGTGTGTGGCACTTAACACCTA AGCTGAGTGACTAACTGAACACTCAAGTGGATGCCATCTTTGTCA CTTCTTGACTGTGACACAAGCAACTCCTGATGCCAAAGCCCTGCC CACCCCTCTCATGCCCATATTTGGACATGGTACAGGTCCTCACTG GCCATGGTCTGTGAGGTCCTGGTCCTCTTTGACTTCATAATTCCT AGGGGCCACTAGTATCTATAAGAGGAAGAGGGTGCTGGCTCCCAG GCCACAGCCCACAAAATTCCACCTGCTCACAGGTTGGCTGGCTCG ACCCAGGTGGTGTCCCCTGCTCTGAGCCAGCTCCCGGCCAAGCCA GCACC Blimpl promoter 1 kb (SEQ ID NO: 27) TGCCATCATCACAGGATGTCCTTCCTTCTCCAGAAGACAGACTGG GGCTGAAGGAAAAGCCGGCCAGGCTCAGAACGAGCCCCACTAATT ACTGCCTCCAACAGCTTTCCACTCACTGCCCCCAGCCCAACATCC CCTTTTTAACTGGGAAGCATTCCTACTCTCCATTGTACGCACACG CTCGGAAGCCTGGCTGTGGGTTTGGGCATGAGAGGCAGGGACAAC AAAACCAGTATATATGATTATAACTTTTTCCTGTTTCCCTATTTC CAAATGGTCGAAAGGAGGAAGTTAGGTCTACCTAAGCTGAATGTA TTCAGTTAGCAGGAGAAATGAAATCCTATACGTTTAATACTAGAG GAGAACCGCCTTAGAATATTTATTTCATTGGCAATGACTCCAGGA CTACACAGCGAAATTGTATTGCATGTGCTGCCAAAATACTTTAGC TCTTTCCTTCGAAGTACGTCGGATCCTGTAATTGAGACACCGAGT TTAGGTGACTAGGGTTTTCTTTTGAGGAGGAGTCCCCCACCCCGC CCCGCTCTGCCGCGACAGGAAGCTAGCGATCCGGAGGACTTAGAA TACAATCGTAGTGTGGGTAAACATGGAGGGCAAGCGCCTGCAAAG GGAAGTAAGAAGATTCCCAGTCCTTGTTGAAATCCATTTGCAAAC AGAGGAAGCTGCCGCGGGTCGCAGTCGGTGGGGGGAAGCCCTGAA CCCCACGCTGCACGGCTGGGCTGGCCAGGTGCGGCCACGCCCCCA TCGCGGCGGCTGGTAGGAGTGAATCAGACCGTCAGTATTGGTAAA GAAGTCTGCGGCAGGGCAGGGAGGGGGAAGAGTAGTCAGTCGCTC GCTCACTCGCTCGCTCGCACAGACACTGCTGCAGTGACACTCGGC CCTCCAGTGTCGCGGAGACGCAAGAGCAGCGCGCAGCACCTGTCC GCCCGGAGCGAGCCCGGCCCGCGGCCGTAGAAAAGGAGGGACCGC CGAGGTGCGCGTCAGTACTGCTCAGCCCGGCAGGGACGCGGGAGG ATGTGGACTGGGTGGAC Blimpl promoter 2 kb (SEQ ID NO: 28) GTGGTGCTGACTCAGCATCGGTTAATAAACCCTCTGCAGGAGGCT GGATTTCTTTTGTTTAATTATCACTTGGACCTTTCTGAGAACTCT 
TAAGAATTGTTCATTCGGGTTTTTTTGTTTTGTTTTGGTTTGGTT TTTTTGGGTTTTTTTTTTTTTTTTTTTTTTGGTTTTTGGAGACAG GGTTTCTCTGTATATAGCCCTGGCACAAGAGCAAGCTAACAGCCT GTTTCTTCTTGGTGCTAGCGCCCCCTCTGGCAGAAAATGAAATAA CAGGTGGACCTACAACCCCCCCCCCCCCCCCCAGTGTATTCTACT CTTGTCCCCGGTATAAATTTGATTGTTCCGAACTACATAAATTGT AGAAGGATTTTTTAGATGCACATATCATTTTCTGTGATACCTTCC ACACACCCCTCCCCCCCAAAAAAATTTTTCTGGGAAAGTTTCTTG AAAGGAAAACAGAAGAACAAGCCTGTCTTTATGATTGAGTTGGGC TTTTGTTTTGCTGTGTTTCATTTCTTCCTGTAAACAAATACTCAA ATGTCCACTTCATTGTATGACTAAGTTGGTATCATTAGGTTGGGT CTGGGTGTGTGAATGTGGGTGTGGATCTGGATGTGGGTGGGTGTG TATGCCCCGTGTGTTTAGAATACTAGAAAAGATACCACATCGTAA ACTTTTGGGAGAGATGATTTTTAAAAATGGGGGTGGGGGTGAGGG GAACCTGCGATGAGGCAAGCAAGATAAGGGGAAGACTTGAGTTTC TGTGATCTAAAAAGTCGCTGTGATGGGATGCTGGCTATAAATGGG CCCTTAGCAGCATTGTTTCTGTGAATTGGAGGATCCCTGCTGAAG GCAAAAGACCATTGAAGGAAGTACCGCATCTGGTTTGTTTTGTAA TGAGAAGCAGGAATGCAAGGTCCACGCTCTTAATAATAAACAAAC AGGACATTGTATGCCATCATCACAGGATGTCCTTCCTTCTCCAGA AGACAGACTGGGGCTGAAGGAAAAGCCGGCCAGGCTCAGAACGAG CCCCACTAATTACTGCCTCCAACAGCTTTCCACTCACTGCCCCCA GCCCAACATCCCCTTTTTAACTGGGAAGCATTCCTACTCTCCATT GTACGCACACGCTCGGAAGCCTGGCTGTGGGTTTGGGCATGAGAG GCAGGGACAACAAAACCAGTATATATGATTATAACTTTTTCCTGT TTCCCTATTTCCAAATGGTCGAAAGGAGGAAGTTAGGTCTACCTA AGCTGAATGTATTCAGTTAGCAGGAGAAATGAAATCCTATACGTT TAATACTAGAGGAGAACCGCCTTAGAATATTTATTTCATTGGCAA TGACTCCAGGACTACACAGCGAAATTGTATTGCATGTGCTGCCAA AATACTTTAGCTCTTTCCTTCGAAGTACGTCGGATCCTGTAATTG AGACACCGAGTTTAGGTGACTAGGGTTTTCTTTTGAGGAGGAGTC CCCCACCCCGCCCCGCTCTGCCGCGACAGGAAGCTAGCGATCCGG AGGACTTAGAATACAATCGTAGTGTGGGTAAACATGGAGGGCAAG CGCCTGCAAAGGGAAGTAAGAAGATTCCCAGTCCTTGTTGAAATC CATTTGCAAACAGAGGAAGCTGCCGCGGGTCGCAGTCGGTGGGGG GAAGCCCTGAACCCCACGCTGCACGGCTGGGCTGGCCAGGTGCGG CCACGCCCCCATCGCGGCGGCTGGTAGGAGTGAATCAGACCGTCA GTATTGGTAAAGAAGTCTGCGGCAGGGCAGGGAGGGGGAAGAGTA GTCAGTCGCTCGCTCACTCGCTCGCTCGCACAGACACTGCTGCAG TGACACTCGGCCCTCCAGTGTCGCGGAGACGCAAGAGCAGCGCGC AGCACCTGTCCGCCCGGAGCGAGCCCGGCCCGCGGCCGTAGAAAA GGAGGGACCGCCGAGGTGCGCGTCAGTACTGCTCAGCCCGGCAGG GACGCGGGAGGATGTGGACTGGGTGGAC

[0152] D. Introduction of LTVEC into Cells

[0153] LTVEC DNA can be introduced into eukaryotic cells using any standard methodology. "Introducing" includes presenting to the cell the nucleic acid in such a manner that the sequence gains access to the interior of the cell. The introducing can be accomplished by any means.

[0154] The methods provided herein do not depend on a particular method for introducing a nucleic acid into the cell, only that the nucleic acid gains access to the interior of at least one cell. Methods for introducing nucleic acids into various cell types are known in the art and include, for example, stable transfection methods, transient transfection methods, and virus-mediated methods.

[0155] Transfection protocols as well as protocols for introducing nucleic acids into cells may vary. Non-limiting transfection methods include chemical-based transfection methods using liposomes; nanoparticles; calcium phosphate (see, e.g., Graham et al. (1973) Virology 52 (2): 456-67, Bacchetti et al. (1977) Proc. Natl. Acad. Sci. USA 74 (4): 1590-4, and Kriegler, M (1991). Transfer and Expression: A Laboratory Manual. New York: W. H. Freeman and Company. pp. 96-97, each of which is herein incorporated by reference in its entirety); dendrimers; or cationic polymers such as DEAE-dextran or polyethylenimine. Non-chemical methods include electroporation, sonoporation, and optical transfection. Particle-based transfection includes the use of a gene gun or magnet-assisted transfection (see, e.g., Bertram (2006) Current Pharmaceutical Biotechnology 7, 277-285, herein incorporated by reference in its entirety). Viral methods can also be used for transfection.

[0156] Introduction of nucleic acids into a cell can also be mediated by electroporation, by intracytoplasmic injection, by viral infection, by adenovirus, by adeno-associated virus, by lentivirus, by retrovirus, by transfection, by lipid-mediated transfection, or by nucleofection. Nucleofection is an improved electroporation technology that enables nucleic acid substrates to be delivered not only to the cytoplasm but also through the nuclear membrane and into the nucleus. In addition, use of nucleofection in the methods disclosed herein typically requires far fewer cells than regular electroporation (e.g., only about 2 million compared with 7 million by regular electroporation). In one example, nucleofection is performed using the LONZA® NUCLEOFECTOR™ system.

[0157] Introduction of nucleic acids into a cell (e.g., a one-cell stage embryo) can also be accomplished by microinjection. In one-cell stage embryos, microinjection can be into the maternal and/or paternal pronucleus or into the cytoplasm. If the microinjection is into only one pronucleus, the paternal pronucleus is preferable due to its larger size. Microinjection of an mRNA is preferably into the cytoplasm (e.g., to deliver mRNA directly to the translation machinery), while microinjection of a protein or a DNA encoding a Cas protein is preferably into the nucleus/pronucleus. Alternatively, microinjection can be carried out by injection into both the nucleus/pronucleus and the cytoplasm: a needle can first be introduced into the nucleus/pronucleus and a first amount can be injected, and while removing the needle from the one-cell stage embryo a second amount can be injected into the cytoplasm. If a nuclease agent protein is injected into the cytoplasm, the protein preferably comprises a nuclear localization signal to ensure delivery to the nucleus/pronucleus. Methods for carrying out microinjection are well known. See, e.g., Nagy et al. (2003) Manipulating the Mouse Embryo, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press; Meyer et al. (2010) Proc. Natl. Acad. Sci. USA 107:15022-15026; and Meyer et al. (2012) Proc. Natl. Acad. Sci. USA 109:9354-9359, each of which is herein incorporated by reference in its entirety.

[0158] Other methods for introducing nucleic acid or proteins into a cell can include, for example, vector delivery, particle-mediated delivery, exosome-mediated delivery, lipid-nanoparticle-mediated delivery, cell-penetrating-peptide-mediated delivery, or implantable-device-mediated delivery.

[0159] The introduction of nucleic acids into the cell can be performed one time or multiple times over a period of time. For example, the introduction can be performed at least two times over a period of time, at least three times over a period of time, at least four times over a period of time, at least five times over a period of time, at least six times over a period of time, at least seven times over a period of time, at least eight times over a period of time, at least nine times over a period of time, at least ten times over a period of time, at least eleven times over a period of time, at least twelve times over a period of time, at least thirteen times over a period of time, at least fourteen times over a period of time, at least fifteen times over a period of time, at least sixteen times over a period of time, at least seventeen times over a period of time, at least eighteen times over a period of time, at least nineteen times over a period of time, or at least twenty times over a period of time.

[0160] E. Screening for and Identifying Cells with Targeted Genetic Modifications

[0161] Cells in which the LTVEC has been introduced successfully can be selected by exposure to selection agents, depending on which selectable marker gene has been engineered into the LTVEC. As a non-limiting example, if the selectable marker is the neomycin phosphotransferase (neo) gene (see, e.g., Beck et al. (1982) Gene 19:327-336, herein incorporated by reference in its entirety for all purposes), then cells that have taken up the LTVEC can be selected in G418-containing media; cells that do not have the LTVEC will die whereas cells that have taken up the LTVEC will survive (see, e.g., Santerre, et al. (1984) Gene 30:147-156, herein incorporated by reference in its entirety for all purposes). Such selection markers can, for example, impart resistance to an antibiotic such as G418, hygromycin, blasticidin, neomycin, or puromycin. Such selection markers include neomycin phosphotransferase (neoʳ), hygromycin B phosphotransferase (hygʳ), puromycin-N-acetyltransferase (puroʳ), and blasticidin S deaminase (bsrʳ). In still other embodiments, the selection marker is operably linked to an inducible promoter and the expression of the selection marker is toxic to the cell. Non-limiting examples of such selection markers include xanthine/guanine phosphoribosyl transferase (gpt), hypoxanthine-guanine phosphoribosyltransferase (HGPRT) or herpes simplex virus thymidine kinase (HSV-TK).

[0162] The methods disclosed herein can further comprise identifying a cell having a modified genome. Various methods can be used to identify cells having a targeted genetic modification, such as a deletion or an insertion. Such methods can comprise identifying one cell having the targeted genetic modification at a target locus.

[0163] Conventional assays for screening for targeted modifications, such as long-range PCR, Sanger sequencing, or Southern blotting, link the inserted targeting vector to the targeted locus. For example, for a long-range PCR assay, one primer can recognize a sequence within the inserted DNA while the other recognizes a sequence in the genomic region of interest beyond the ends of the targeting vector's homology arms. Because of their large homology arm sizes, however, LTVECs do not permit screening by such conventional assays. To screen LTVEC targeting, modification-of-allele (MOA) assays including loss-of-allele (LOA) and gain-of-allele (GOA) assays can be used (see, e.g., US 2014/0178879 and Frendewey et al. (2010) Methods Enzymol. 476:295-307, each of which is herein incorporated by reference in its entirety for all purposes). The loss-of-allele (LOA) assay inverts the conventional screening logic and quantifies the number of copies of the native locus to which the mutation was directed. In a correctly targeted cell clone, the LOA assay detects one of the two native alleles (for genes not on the X or Y chromosome), the other allele being disrupted by the targeted modification. The same principle can be applied in reverse as a gain-of-allele (GOA) assay to quantify the copy number of the inserted targeting vector. For example, the combined use of GOA and LOA assays will reveal a correctly targeted heterozygous clone as having lost one copy of the native target gene and gained one copy of the drug resistance gene or other inserted marker.

[0164] As an example, quantitative polymerase chain reaction (qPCR) can be used as the method of allele quantification, but any method that can reliably distinguish the difference between zero, one, and two copies of the target gene or between zero, one, and two copies of the nucleic acid insert can be used to develop an MOA assay. For example, TAQMAN® can be used to quantify the number of copies of a DNA template in a genomic DNA sample, especially by comparison to a reference gene (see, e.g., U.S. Pat. No. 6,596,541, herein incorporated by reference in its entirety for all purposes). The reference gene is quantitated in the same genomic DNA as the target gene(s) or locus(loci). Therefore, two TAQMAN® amplifications (each with its respective probe) are performed. One TAQMAN® probe determines the "Ct" (Threshold Cycle) of the reference gene, while the other probe determines the Ct of the region of the targeted gene(s) or locus(loci) which is replaced by successful targeting (i.e., an LOA assay). The Ct is a quantity that reflects the amount of starting DNA for each of the TAQMAN® probes, i.e., a less abundant sequence requires more cycles of PCR to reach the threshold cycle. Decreasing by half the number of copies of the template sequence for a TAQMAN® reaction will result in an increase of about one Ct unit. TAQMAN® reactions in cells where one allele of the target gene(s) or locus(loci) has been replaced by homologous recombination will result in an increase of one Ct for the target TAQMAN® reaction without an increase in the Ct for the reference gene when compared to DNA from non-targeted cells. For a GOA assay, another TAQMAN® probe can be used to determine the Ct of the nucleic acid insert that is replacing the targeted gene(s) or locus(loci) by successful targeting.
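The relationship between Ct and copy number described above (halving the template raises the Ct by about one cycle) can be expressed as a short calculation. The following Python sketch is a simplified illustration, not the assay protocol of the cited references: the function name `copies_from_ct` and its parameters are hypothetical, amplification is assumed to double the product at every cycle, and the calibrator is assumed to be DNA from non-targeted cells carrying a known number of copies.

```python
def copies_from_ct(ct_target: float, ct_reference: float,
                   cal_ct_target: float, cal_ct_reference: float,
                   calibrator_copies: float = 2.0) -> float:
    """Estimate target copy number in a sample relative to a calibrator.

    Each Ct is first normalized to the reference gene measured in the same
    genomic DNA; the sample is then compared with the calibrator. Assumes
    ideal amplification (product doubles every cycle).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = cal_ct_target - cal_ct_reference
    delta_delta_ct = delta_sample - delta_calibrator
    return calibrator_copies * 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target Ct rises by one cycle in the clone
    # while the reference Ct is unchanged, as expected for loss of one allele.
    print(copies_from_ct(25.0, 22.0, 24.0, 22.0))
```

With the hypothetical values shown, the estimate is one copy, which matches the LOA readout for a heterozygous targeted clone. For a GOA probe, the calibrator would instead need to be a sample known to carry a defined number of copies of the insert.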

[0165] The screening step can also comprise arm-specific assays, which are assays used to distinguish correct targeted insertions of a nucleic acid insert into a target genomic locus from random transgenic insertions of the nucleic acid insert into genomic locations outside of the target genomic locus. Arm-specific assays determine copy numbers of a DNA template in LTVEC homology arms. See, e.g., US 2016/0177339, WO 2016/100819, US 2016/0145646, and WO 2016/081923, each of which is herein incorporated by reference in its entirety for all purposes. It can be useful to augment standard LOA and GOA assays to verify correct targeting by LTVECs. For example, LOA and GOA assays alone may not distinguish correctly targeted cell clones from clones in which a deletion of the target genomic locus coincides with random integration of an LTVEC elsewhere in the genome. Because the selection pressure in the targeted cell is based on the selection cassette, random transgenic integration of the LTVEC elsewhere in the genome will generally include the selection cassette and adjacent regions of the LTVEC but may exclude more distal regions of the LTVEC. For example, if a portion of an LTVEC is randomly integrated into the genome, and the LTVEC comprises a nucleic acid insert of around 5 kb or more in length with a selection cassette adjacent to the 3' homology arm, in some cases the 3' homology arm but not the 5' homology arm will be transgenically integrated with the selection cassette. Alternatively, if the selection cassette is adjacent to the 5' homology arm, in some cases the 5' homology arm but not the 3' homology arm will be transgenically integrated with the selection cassette. As an example, if LOA and GOA assays are used to assess targeted integration of the LTVEC, and the GOA assay utilizes probes against the selection cassette or any other unique (non-arm) region of the LTVEC, a heterozygous deletion at the target genomic locus combined with a random transgenic integration of the LTVEC will give the same readout as a heterozygous targeted integration of the LTVEC at the target genomic locus. To verify correct targeting by the LTVEC, arm-specific assays can be used in conjunction with LOA and/or GOA assays.
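The way LOA, GOA, and arm-specific readouts can be combined may be sketched as a simple decision rule. The following Python sketch is illustrative only. The class `CloneCounts`, the function `classify`, and the result labels are hypothetical, and the rule rests on an assumption not stated above: that in a correctly targeted clone each homology-arm probe remains at the diploid copy number, whereas a randomly integrated LTVEC fragment carrying an arm raises that arm's copy number above the diploid value.

```python
from dataclasses import dataclass

UNTARGETED = "untargeted"
INCONSISTENT = "inconsistent with heterozygous targeting"
TARGETED = "consistent with heterozygous targeted integration"
SUSPECT = "possible deletion plus random integration"


@dataclass
class CloneCounts:
    loa: int    # copies of the native target sequence (LOA probe)
    goa: int    # copies of the insert or selection cassette (GOA probe)
    arm_5: int  # copies detected by a 5' homology-arm probe
    arm_3: int  # copies detected by a 3' homology-arm probe


def classify(counts: CloneCounts, diploid_copies: int = 2) -> str:
    """Interpret rounded copy numbers for an autosomal target locus."""
    if counts.loa == diploid_copies and counts.goa == 0:
        return UNTARGETED
    if not (counts.loa == diploid_copies - 1 and counts.goa == 1):
        return INCONSISTENT
    # LOA and GOA alone cannot exclude deletion plus random integration,
    # so the arm probes are checked as well (assumed rule, see lead-in).
    if counts.arm_5 == diploid_copies and counts.arm_3 == diploid_copies:
        return TARGETED
    return SUSPECT
```

For example, a clone reading LOA = 1, GOA = 1 with a 3' arm probe at three copies would be flagged as suspect even though its LOA and GOA values alone match a correctly targeted heterozygote.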

[0166] Other examples of suitable quantitative assays include fluorescence-mediated in situ hybridization (FISH), comparative genomic hybridization, isothermic DNA amplification, quantitative hybridization to an immobilized probe(s), INVADER® Probes, TAQMAN® Molecular Beacon probes, or ECLIPSE™ probe technology (see, e.g., US 2005/0144655, herein incorporated by reference in its entirety for all purposes).

[0167] Next generation sequencing (NGS) can also be used for screening, particularly in one-cell stage embryos that have been modified. Next-generation sequencing can also be referred to as "NGS" or "massively parallel sequencing" or "high throughput sequencing." Such NGS can be used as a screening tool in addition to the MOA assays and retention assays to define the exact nature of the targeted genetic modification and to detect mosaicism. Mosaicism refers to the presence of two or more populations of cells with different genotypes in one individual who has developed from a single fertilized egg (i.e., zygote). In the methods disclosed herein, it is not necessary to screen for targeted clones using selection markers. For example, the MOA and NGS assays described herein can be relied on without using selection cassettes.

[0168] F. Methods of Making Genetically Modified Non-Human Animals

[0169] Genetically modified non-human animals can be generated employing the various methods disclosed herein. Any convenient method or protocol for producing a genetically modified organism, including the methods described herein, is suitable for producing such a genetically modified non-human animal. Such methods starting with genetically modifying a pluripotent cell such as an embryonic stem (ES) cell generally comprise: (1) modifying the genome of a pluripotent cell that is not a one-cell stage embryo using the methods described herein; (2) identifying or selecting the genetically modified pluripotent cell; (3) introducing the genetically modified pluripotent cell into a host embryo; and (4) implanting and gestating the host embryo comprising the genetically modified pluripotent cell in a surrogate mother. The surrogate mother can then produce F0 generation non-human animals comprising the targeted genetic modification and capable of transmitting the targeted genetic modification through the germline. Animals bearing the genetically modified genomic locus can be identified via a modification of allele (MOA) assay as described herein. The donor cell can be introduced into a host embryo at any stage, such as the blastocyst stage or the pre-morula stage (i.e., the 4 cell stage or the 8 cell stage). Progeny that are capable of transmitting the genetic modification through the germline are generated. The pluripotent cell can be, for example, an ES cell (e.g., a rodent ES cell, a mouse ES cell, or a rat ES cell) as discussed elsewhere herein. See, e.g., U.S. Pat. No. 7,294,754, herein incorporated by reference in its entirety for all purposes.

[0170] Alternatively, such methods starting with genetically modifying a one-cell stage embryo generally comprise: (1) modifying the genome of a one-cell stage embryo using the methods described herein; (2) identifying or selecting the genetically modified embryo; and (3) implanting and gestating the genetically modified embryo in a surrogate mother. The surrogate mother can then produce F0 generation non-human animals comprising the targeted genetic modification and capable of transmitting the targeted genetic modification through the germline. Animals bearing the genetically modified genomic locus can be identified via a modification of allele (MOA) assay as described herein.

[0171] Nuclear transfer techniques can also be used to generate the non-human mammalian animals. Briefly, methods for nuclear transfer can include the steps of: (1) enucleating an oocyte or providing an enucleated oocyte; (2) isolating or providing a donor cell or nucleus to be combined with the enucleated oocyte; (3) inserting the cell or nucleus into the enucleated oocyte to form a reconstituted cell; (4) implanting the reconstituted cell into the womb of a non-human animal to form an embryo; and (5) allowing the embryo to develop. In such methods, oocytes are generally retrieved from deceased animals, although they may be isolated also from either oviducts and/or ovaries of live animals. Oocytes can be matured in a variety of media known to those of ordinary skill in the art prior to enucleation. Enucleation of the oocyte can be performed in a number of manners well known to those of ordinary skill in the art. Insertion of the donor cell or nucleus into the enucleated oocyte to form a reconstituted cell can be by microinjection of a donor cell under the zona pellucida prior to fusion. Fusion may be induced by application of a DC electrical pulse across the contact/fusion plane (electrofusion), by exposure of the cells to fusion-promoting chemicals, such as polyethylene glycol, or by way of an inactivated virus, such as the Sendai virus. A reconstituted cell can be activated by electrical and/or non-electrical means before, during, and/or after fusion of the nuclear donor and recipient oocyte. Activation methods include electric pulses, chemically induced shock, penetration by sperm, increasing levels of divalent cations in the oocyte, and reducing phosphorylation of cellular proteins (as by way of kinase inhibitors) in the oocyte. The activated reconstituted cells, or embryos, can be cultured in medium well known to those of ordinary skill in the art and then transferred to the womb of an animal. 
See, e.g., US 2008/0092249, WO 1999/005266, US 2004/0177390, WO 2008/017234, and U.S. Pat. No. 7,612,250, each of which is herein incorporated by reference in its entirety for all purposes.

[0172] The various methods provided herein allow for the generation of a genetically modified non-human F0 animal wherein the cells of the genetically modified F0 animal comprise the targeted genetic modification. It is recognized that depending on the method used to generate the F0 animal, the number of cells within the F0 animal that have the targeted genetic modification will vary. The introduction of the donor ES cells into a pre-morula stage embryo from a corresponding organism (e.g., an 8-cell stage mouse embryo) via, for example, the VELOCIMOUSE® method allows for a greater percentage of the cell population of the F0 animal to comprise cells having the targeted genetic modification. See, e.g., US 2014/0331340, US 2008/0078001, US 2008/0028479, US 2006/0085866, and WO 2006/044962, each of which is herein incorporated by reference in its entirety for all purposes. For example, at least 50%, 60%, 65%, 70%, 75%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% of the cellular contribution of the non-human F0 animal can comprise a cell population having the targeted genetic modification. In addition, at least one or more of the germ cells of the F0 animal can have the targeted genetic modification.

[0173] A genetically modified founder non-human animal can be identified based upon the absence of endogenous genomic C9ORF72 sequences in its genome that are replaced with the heterologous hexanucleotide repeat expansion sequence and/or the presence (and/or expression) of the heterologous hexanucleotide repeat expansion sequence, drug resistance gene and/or reporter in tissues or cells of the non-human animal. A transgenic founder non-human animal can then be used to breed additional non-human animals carrying the heterologous hexanucleotide repeat expansion sequence thereby creating a series of non-human animals each carrying one or more copies of a C9ORF72 locus as described herein.

[0174] Transgenic non-human animals may also be produced to contain selected systems that allow for regulated or directed expression of the transgene. Exemplary systems include the Cre/loxP recombinase system of bacteriophage P1 (see, e.g., Lakso, M. et al., 1992, Proc. Natl. Acad. Sci. USA 89:6232-6236) and the FLP/Frt recombinase system of S. cerevisiae (O'Gorman, S. et al., 1991, Science 251:1351-1355). Such animals can be provided through the construction of "double" transgenic animals, e.g., by mating two transgenic animals, one containing a transgene encoding the heterologous hexanucleotide repeat expansion sequence and the other containing a transgene encoding a recombinase (e.g., a Cre recombinase).

[0175] Although embodiments employing an insertion of a heterologous hexanucleotide repeat expansion sequence in an endogenous C9ORF72 locus in a mouse are extensively discussed herein, other non-human animals that comprise a disruption in a C9ORF72 locus are also provided. Such non-human animals include any of those which can be genetically modified to replace a non-coding sequence of a C9ORF72 locus as disclosed herein, including, e.g., mammals, e.g., mouse, rat, rabbit, pig, bovine (e.g., cow, bull, buffalo), deer, sheep, goat, chicken, cat, dog, ferret, primate (e.g., marmoset, rhesus monkey), etc. For example, for those non-human animals for which suitable genetically modifiable ES cells are not readily available, other methods are employed to make a non-human animal comprising the genetic modification. Such methods include, e.g., modifying a non-ES cell genome (e.g., a fibroblast or an induced pluripotent cell) and employing somatic cell nuclear transfer (SCNT) to transfer the genetically modified genome to a suitable cell, e.g., an enucleated oocyte, and gestating the modified cell (e.g., the modified oocyte) in a non-human animal under suitable conditions to form an embryo.

[0176] Briefly, methods for nuclear transfer include steps of: (1) enucleating an oocyte; (2) isolating a donor cell or nucleus to be combined with the enucleated oocyte; (3) inserting the cell or nucleus into the enucleated oocyte to form a reconstituted cell; (4) implanting the reconstituted cell into the womb of an animal to form an embryo; and (5) allowing the embryo to develop. In such methods oocytes are generally retrieved from deceased animals, although they may be isolated also from either oviducts and/or ovaries of live animals. Oocytes may be matured in a variety of media known to persons of skill in the art prior to enucleation. Enucleation of the oocyte can be performed in a variety of ways known to persons of skill in the art. Insertion of a donor cell or nucleus into an enucleated oocyte to form a reconstituted cell is typically achieved by microinjection of a donor cell under the zona pellucida prior to fusion. Fusion may be induced by application of a DC electrical pulse across the contact/fusion plane (electrofusion), by exposure of the cells to fusion-promoting chemicals, such as polyethylene glycol, or by way of an inactivated virus, such as the Sendai virus. A reconstituted cell is typically activated by electrical and/or non-electrical means before, during, and/or after fusion of the nuclear donor and recipient oocyte. Activation methods include electric pulses, chemically induced shock, penetration by sperm, increasing levels of divalent cations in the oocyte, and reducing phosphorylation of cellular proteins (as by way of kinase inhibitors) in the oocyte. The activated reconstituted cells, or embryos, are typically cultured in medium known to persons of skill in the art and then transferred to the womb of an animal. See, e.g., U.S. Patent Application Publication No. 2008-0092249 A1, WO 1999/005266 A2, U.S. Patent Application Publication No. 2004-0177390 A1, WO 2008/017234 A1, and U.S. Pat. No. 7,612,250, each of which is herein incorporated by reference.

[0177] Methods for modifying a non-human animal genome (e.g., a pig, cow, rodent, chicken, etc.) include, e.g., employing a zinc finger nuclease (ZFN) or a transcription activator-like effector nuclease (TALEN) to modify a genome to include an insertion of a heterologous hexanucleotide repeat expansion sequence in a C9ORF72 locus as described herein.

[0178] In some embodiments, a non-human animal described herein is a mammal. In some embodiments, a non-human animal described herein is a small mammal, e.g., of the superfamily Dipodoidea or Muroidea. In some embodiments, a genetically modified animal described herein is a rodent. In some embodiments, a rodent described herein is selected from a mouse, a rat, and a hamster. In some embodiments, a rodent described herein is selected from the superfamily Muroidea. In some embodiments, a genetically modified animal described herein is from a family selected from Calomyscidae (e.g., mouse-like hamsters), Cricetidae (e.g., hamster, New World rats and mice, voles), Muridae (true mice and rats, gerbils, spiny mice, crested rats), Nesomyidae (climbing mice, rock mice, white-tailed rats, Malagasy rats and mice), Platacanthomyidae (e.g., spiny dormice), and Spalacidae (e.g., mole rats, bamboo rats, and zokors). In some certain embodiments, a genetically modified rodent described herein is selected from a true mouse or rat (family Muridae), a gerbil, a spiny mouse, and a crested rat. In some certain embodiments, a genetically modified mouse described herein is from a member of the family Muridae. In some embodiments, a non-human animal described herein is a rodent. In some certain embodiments, a rodent described herein is selected from a mouse and a rat. In some embodiments, a non-human animal described herein is a mouse.

[0179] In some embodiments, a non-human animal described herein is a rodent that is a mouse of a C57BL strain selected from C57BL/A, C57BL/An, C57BL/GrFa, C57BL/KaLwN, C57BL/6, C57BL/6J, C57BL/6ByJ, C57BL/6NJ, C57BL/10, C57BL/10ScSn, C57BL/10Cr, and C57BL/Ola. In some certain embodiments, a mouse described herein is a 129 strain selected from the group consisting of a strain that is 129P1, 129P2, 129P3, 129X1, 129S1 (e.g., 129S1/SV, 129S1/SvIm), 129S2, 129S4, 129S5, 129S9/SvEvH, 129/SvJae, 129S6 (129/SvEvTac), 129S7, 129S8, 129T1, 129T2 (see, e.g., Festing et al., 1999, Mammalian Genome 10:836; Auerbach, W. et al., 2000, Biotechniques 29(5):1024-1028, 1030, 1032). In some certain embodiments, a genetically modified mouse described herein is a mix of an aforementioned 129 strain and an aforementioned C57BL/6 strain. In some certain embodiments, a mouse described herein is a mix of aforementioned 129 strains, or a mix of aforementioned BL/6 strains. In some certain embodiments, a 129 strain of the mix as described herein is a 129S6 (129/SvEvTac) strain. In some embodiments, a mouse described herein is a BALB strain, e.g., BALB/c strain. In some embodiments, a mouse described herein is a mix of a BALB strain and another aforementioned strain.

[0180] In some embodiments, a non-human animal described herein is a rat. In some certain embodiments, a rat described herein is selected from a Wistar rat, an LEA strain, a Sprague Dawley strain, a Fischer strain, F344, F6, and Dark Agouti. In some certain embodiments, a rat strain as described herein is a mix of two or more strains selected from the group consisting of Wistar, LEA, Sprague Dawley, Fischer, F344, F6, and Dark Agouti.

[0181] A rat pluripotent and/or totipotent cell can be from any rat strain, including, for example, an ACI rat strain, a Dark Agouti (DA) rat strain, a Wistar rat strain, a LEA rat strain, a Sprague Dawley (SD) rat strain, or a Fischer rat strain such as Fischer F344 or Fischer F6. Rat pluripotent and/or totipotent cells can also be obtained from a strain derived from a mix of two or more strains recited above. For example, the rat pluripotent and/or totipotent cell can be from a DA strain or an ACI strain. The ACI rat strain is characterized as having black agouti, with white belly and feet and an RT1ᵃᵛ¹ haplotype. Such strains are available from a variety of sources including Harlan Laboratories. An example of a rat ES cell line from an ACI rat is an ACI.G1 rat ES cell. The Dark Agouti (DA) rat strain is characterized as having an agouti coat and an RT1ᵃᵛ¹ haplotype. Such rats are available from a variety of sources including Charles River and Harlan Laboratories. Examples of a rat ES cell line from a DA rat are the DA.2B rat ES cell line and the DA.2C rat ES cell line. In some cases, the rat pluripotent and/or totipotent cells are from an inbred rat strain. See, e.g., U.S. 2014/0235933 A1, filed on Feb. 20, 2014, and herein incorporated by reference in its entirety.

[0182] Non-human animals are provided that comprise an insertion of a heterologous hexanucleotide repeat expansion sequence in an endogenous C9ORF72 locus. In some embodiments, insertion of a heterologous hexanucleotide repeat expansion sequence is not pathogenic. In some embodiments, insertion of a heterologous hexanucleotide repeat expansion sequence results in one or more phenotypes as described herein, e.g., a phenotype associated with ALS and/or FTD. Insertion of a heterologous hexanucleotide repeat expansion sequence may be measured directly, e.g., by determining the approximate number of instances, e.g., repeats, of the hexanucleotide sequence set forth as SEQ ID NO:1 in the heterologous hexanucleotide repeat expansion sequence, e.g., by Southern Blot or polymerase chain reaction genotyping reactions.

Methods Employing Non-Human Animals Having an Insertion of a Heterologous Hexanucleotide Repeat Expansion Sequence in an Endogenous C9ORF72 Locus

[0183] Non-human animals as described herein provide improved animal models for neurodegenerative diseases, disorders and conditions. In particular, non-human animals as described herein provide improved animal models that translate to human diseases such as, for example, ALS and/or FTD, characterized by upper motor neuron symptoms and/or non-motor neuron loss.

[0184] Non-human animals as described herein provide an improved in vivo system and source of biological materials (e.g., cells) that comprise and/or express the inserted pathogenic heterologous hexanucleotide repeat expansion sequence in an endogenous C9ORF72 locus that are useful for a variety of assays. In various embodiments, non-human animals described herein may be used to develop therapeutics that treat, prevent and/or inhibit one or more symptoms associated with expression and/or activity of a pathogenic heterologous hexanucleotide repeat expansion. In various embodiments, non-human animals described herein are used to identify, screen and/or develop candidate therapeutics (e.g., antibodies, gRNAs (comprising CRISPR RNA and tracrRNA) and siRNA, etc.) that bind a pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof, e.g., resulting from RAN translation. In various embodiments, non-human animals described herein are used to screen and develop candidate therapeutics (e.g., antibodies, gRNAs (comprising CRISPR RNA and tracrRNA) and siRNA, etc.) that block activity of a pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof, e.g., resulting from RAN translation. In various embodiments, non-human animals described herein are used to determine the binding profile of antagonists and/or agonists of a pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof (transcript), e.g., resulting from RAN translation, of a non-human animal as described herein. In some embodiments, non-human animals described herein are used to determine the epitope or epitopes of one or more candidate therapeutic antibodies that bind a pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof, e.g., resulting from RAN translation.

[0185] In various embodiments, non-human animals described herein are used to determine the pharmacokinetic profiles of a drug targeting a pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof, e.g., resulting from RAN translation. In various embodiments, one or more non-human animals described herein and one or more control or reference non-human animals are each exposed to one or more candidate drugs targeting a pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof, e.g., resulting from RAN translation, at various doses (e.g., 0.1 mg/kg, 0.2 mg/kg, 0.3 mg/kg, 0.4 mg/kg, 0.5 mg/kg, 1 mg/kg, 2 mg/kg, 3 mg/kg, 4 mg/kg, 5 mg/kg, 7.5 mg/kg, 10 mg/kg, 15 mg/kg, 20 mg/kg, 25 mg/kg, 30 mg/kg, 40 mg/kg, or 50 mg/kg or more). Candidate therapeutic antibodies may be dosed via any desired route of administration including parenteral and non-parenteral routes of administration. Parenteral routes include, e.g., intravenous, intraarterial, intraportal, intramuscular, subcutaneous, intraperitoneal, intraspinal, intrathecal, intracerebroventricular, intracranial, intrapleural or other routes of injection. Non-parenteral routes include, e.g., oral, nasal, transdermal, pulmonary, rectal, buccal, vaginal, ocular. Administration may also be by continuous infusion, local administration, sustained release from implants (gels, membranes or the like), and/or intravenous injection. Blood is isolated from non-human animals (humanized and control) at various time points (e.g., 0 hr, 6 hr, 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, 11 days, or up to 30 or more days). 
Various assays may be performed to determine the pharmacokinetic profiles of administered drugs targeting a pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof, e.g., resulting from RAN translation, using samples obtained from non-human animals as described herein including, but not limited to, total IgG, anti-therapeutic antibody response, agglutination, etc.

[0186] In various embodiments, non-human animals as described herein are used to measure the therapeutic effect of blocking, modulating, and/or inhibiting activity of a pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof, e.g., resulting from Repeat-associated non-AUG (RAN) translation, and the effect on gene expression as a result of cellular changes. In various embodiments, a non-human animal as described herein or cells isolated therefrom are exposed to a drug targeting a pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof, e.g., resulting from RAN translation, of the non-human animal and, after a subsequent period of time, analyzed for effects on processes (or interactions) dependent on the pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof, e.g., resulting from RAN translation, for example, formation of RNA foci, protein aggregation from RAN translation products, motor neuron and/or non-motor neuron function, etc.

[0187] Cells from non-human animals as described herein can be isolated and used on an ad hoc basis, or can be maintained in culture for many generations. In various embodiments, cells from a non-human animal as described herein are immortalized (e.g., via use of a virus) and maintained in culture indefinitely (e.g., in serial cultures).

[0188] Non-human animals described herein provide an in vivo system for assessing the pharmacokinetic properties and/or efficacy of a drug (e.g., a drug targeting a pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof, e.g., resulting from RAN translation). In various embodiments, a drug may be delivered or administered to one or more non-human animals, cells derived therefrom or having the same genetic modifications thereof, as described herein, followed by monitoring of, or performing one or more assays on, the non-human animals (or cells isolated therefrom) to determine the effect of the drug on the non-human animal. Pharmacokinetic properties include, but are not limited to, how an animal processes the drug into various metabolites (or detection of the presence or absence of one or more drug metabolites, including, but not limited to, toxic metabolites), drug half-life, circulating levels of drug after administration (e.g., serum concentration of drug), anti-drug response (e.g., anti-drug antibodies), drug absorption and distribution, route of administration, routes of excretion and/or clearance of the drug. In some embodiments, pharmacokinetic and pharmacodynamic properties of drugs are monitored in or through the use of non-human animals described herein.
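
By way of illustration only, one of the pharmacokinetic properties named above, drug half-life, can be estimated from serum concentrations sampled at several time points. The sketch below is not part of the disclosed methods: it assumes simple first-order (mono-exponential) elimination, and the function name `elimination_half_life` and the example concentrations are hypothetical.

```python
import math


def elimination_half_life(times_h, concentrations):
    """Estimate half-life (hours) from a least-squares fit of ln(conc) vs time.

    Assumes first-order elimination, i.e. conc(t) = c0 * exp(-k * t).
    """
    if len(times_h) != len(concentrations) or len(times_h) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log(c) for c in concentrations]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # equals -k for a decaying concentration
    if slope >= 0:
        raise ValueError("concentrations do not decline over time")
    return math.log(2) / -slope


# Synthetic data (not from the source): sampled at 0 hr, 6 hr, 1, 2 and 3 days.
times = [0, 6, 24, 48, 72]
serum = [100.0 * 0.5 ** (t / 24.0) for t in times]
print(round(elimination_half_life(times, serum), 2))
```

A log-linear fit is used here because it needs no external libraries; a compartmental model may be more appropriate for real data.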

[0189] In some embodiments, performing an assay includes determining the effect on the phenotype and/or genotype of the non-human animal to which the drug is administered. In some embodiments, performing an assay includes determining lot-to-lot variability for a drug. In some embodiments, performing an assay includes determining the differences between the effects of a drug administered to a non-human animal described herein and a reference non-human animal. In various embodiments, reference non-human animals may have a modification as described herein, e.g., insertion of a non-pathogenic heterologous hexanucleotide repeat expansion sequence or no modification (i.e., a wild type non-human animal).

[0190] Exemplary parameters that may be measured in non-human animals (or in and/or using cells isolated therefrom) for assessing the pharmacokinetic properties of a drug include, but are not limited to, agglutination, autophagy, cell division, cell death, complement-mediated hemolysis, DNA integrity, drug-specific antibody titer, drug metabolism, gene expression arrays, metabolic activity, mitochondrial activity, oxidative stress, phagocytosis, protein biosynthesis, protein degradation, protein secretion, stress response, target tissue drug concentration, non-target tissue drug concentration, transcriptional activity, and the like. In various embodiments, non-human animals described herein are used to determine a pharmaceutically effective dose of a drug (e.g., a drug targeting a pathogenic heterologous hexanucleotide repeat expansion sequence or expression product thereof, e.g., resulting from RAN translation).

EXAMPLES

[0191] The following examples are provided so as to describe to those of ordinary skill in the art how to make and use methods and compositions of the invention, and are not intended to limit the scope of what the inventors regard as their invention. Unless indicated otherwise, temperature is indicated in Celsius, and pressure is at or near atmospheric.

Example 1

Insertion of a Heterologous Hexanucleotide Repeat Expansion Sequence in a Non-Human Embryonic Stem Cell at an Endogenous Non-Human C9ORF72 Locus

[0192] This example illustrates a targeted insertion of a heterologous hexanucleotide repeat expansion sequence into an embryonic stem cell at a C9orf72 locus of a non-human animal, particularly a rodent. In particular, this example specifically describes the replacement of a part of a non-coding sequence of a mouse C9orf72 locus with a heterologous human hexanucleotide repeat expansion sequence placed in operable linkage with a mouse C9orf72 promoter and/or human regulatory elements, e.g., those that may be found in exons 1a and/or 1b of the human C9orf72 gene. The C9orf72-HRE targeting vector for inserting a heterologous hexanucleotide repeat expansion sequence in an endogenous mouse C9orf72 locus was made as previously described (see, e.g., U.S. Pat. No. 6,586,251; Valenzuela et al., 2003, Nature Biotech. 21(6):652-659; and Adams, N. C. and N. W. Gale, in Mammalian and Avian Transgenesis--New Approaches, ed. Lois, S. P. a. C., Springer Verlag, Berlin Heidelberg, 2006). The resulting modified C9orf72 locus is depicted in FIG. 1A, bottom box.

[0193] Briefly, targeting vectors comprising a sequence set forth in SEQ ID NO:8 or SEQ ID NO:9 were generated using bacterial artificial chromosome (BAC) clones from a mouse RP23 BAC library (Adams, D. J. et al., 2005, Genomics 86:753-758) and introduced into F1 hybrid (129S6SvEvTac/C57BL6NTac) embryonic stem (ES) cells followed by culturing in selection medium containing G418. Drug-resistant colonies were picked 10 days after electroporation and screened for correct targeting as previously described (Valenzuela et al., supra; Frendewey, D. et al., 2010, Methods Enzymol. 476:295-307). Targeted ES cells are analyzed to determine the approximate size of hexanucleotide repeat expansions present in targeted mouse ES cell clones by Southern blot analysis and/or amplification of the C9orf72-HRE locus.

[0194] Specifically, Southern blot analysis was performed to determine the approximate size of hexanucleotide repeat expansions present in targeted C9ORF72 transgenic mouse ES cells. Genomic DNA was extracted from targeted mouse ES clones grown in single wells of a gelatin-coated 96 well plate. Once ES cell clones reached 100% confluence, cells were washed twice with 1× PBS and lysed overnight at 37° C. in 50 µL of lysis buffer (1M Tris pH 8.5, 0.5M EDTA, 20% SDS, 5M NaCl, and 1 mg/mL proteinase K). DNA was precipitated with the addition of 125 µL of ice cold 200 proof ethanol to each well, followed by an overnight incubation at 4° C. Precipitated DNA was washed twice with 70% ethanol, air dried, and resuspended in 30 µL 0.5× TE pH 8.0.

[0195] Extracted genomic DNAs (gDNA) were digested with HindII and ScaI overnight at 37° C. and size separated on a 1% agarose gel. Post-electrophoresis agarose gels were denatured (1M NaCl, 5% NaOH) and neutralized (1.5M NaCl, 0.5M Tris pH 7.5). Digested gDNAs were then transferred to Hybond-N membranes (Amersham) via overnight capillary transfer.

[0196] A probe corresponding to a 252 bp XmaI fragment (see FIG. 2A) contained within the humanized targeting vector

(5'- CCGGGGCGGGGCTGCGGTTGCGGTGCCTGCGCCCGCGGCGGCGGAGG CGCAGGCGGTGGCGAGTGGGTGAGTGAGGAGGCGGCATCCTGGCGGG TGGCTGTTTGGGGTTCGGCTGCCGGGAAGAGGCGCGGGTAGAAGCGG GGGCTCTCCTCAGAGCTCGACGCATTTTTACTTTCCCTCTCATTTCT CTGACCGAAGCTGGGTGTCGGGCTTTCGCCTCTAGCGACTGGTGGAA TTGCCTGCATCCGGGCC-3'; SEQ ID NO: 29)

was labeled with ³²P using Prime-It II Random Primer Labeling Kit (Agilent). Denatured probe was diluted in ExpressHyb Hybridization Solution (Takara) and incubated with prepared membranes overnight at 65° C. Autoradiography film was exposed to the probed blots for 72 hours.
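
As a simple consistency check, the printed probe sequence (SEQ ID NO: 29) can be joined and its length compared with the stated 252 bp. The sketch below is illustrative only; the variable names and the GC-fraction helper are hypothetical and are not part of the described protocol.

```python
# SEQ ID NO: 29 as printed above, kept in the same space-separated groups.
PROBE_GROUPS = """
CCGGGGCGGGGCTGCGGTTGCGGTGCCTGCGCCCGCGGCGGCGGAGG
CGCAGGCGGTGGCGAGTGGGTGAGTGAGGAGGCGGCATCCTGGCGGG
TGGCTGTTTGGGGTTCGGCTGCCGGGAAGAGGCGCGGGTAGAAGCGG
GGGCTCTCCTCAGAGCTCGACGCATTTTTACTTTCCCTCTCATTTCT
CTGACCGAAGCTGGGTGTCGGGCTTTCGCCTCTAGCGACTGGTGGAA
TTGCCTGCATCCGGGCC
"""

# Remove all whitespace to obtain the contiguous probe sequence.
probe = "".join(PROBE_GROUPS.split())


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


print(len(probe))                    # length in bases
print(round(gc_fraction(probe), 3))  # GC fraction of the probe
```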

[0197] As shown in FIG. 2B, an ES cell clone (8027 A-C4) comprising an inserted non-pathogenic heterologous hexanucleotide repeat expansion sequence comprising three repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 is obtained after introduction of the C9orf72-HRE-3 targeting vector comprising a sequence set forth as SEQ ID NO:4 and excision of the drug resistance cassette. After introduction of the C9orf72-HRE-100 targeting vector comprising a sequence set forth as SEQ ID NO:6, at least two ES cell clones (8029 A-A3 and 8029 A-A6) comprising an inserted heterologous hexanucleotide repeat expansion sequence, which is a variant of the sequence set forth as SEQ ID NO:7 and comprises about 92 repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, were obtained. Also at least two ES cell clones (8029 B-A6 and 8029 B-A4) comprising an inserted heterologous hexanucleotide repeat expansion sequence, which is a variant of the sequence set forth as SEQ ID NO:7 and comprises about 30 repeats of the hexanucleotide sequence set forth as SEQ ID NO:1, were obtained after introduction of the C9orf72-HRE-100 targeting vector (8028) and excision of the drug resistance cassette.

[0198] AmplideX PCR/CE C9ORF72 Kit (Asuragen) was also used according to manufacturer's instructions to confirm the number of instances of the hexanucleotide sequence set forth as SEQ ID NO:1 in heterologous hexanucleotide repeat expansion sequence inserted into the endogenous C9orf72 ES cell clones described. Purified mESC genomic total DNA from a 3× repeat clone (8027 A-C4), 2 individual 92× repeat clones (8029 A-A3, 8029 A-A6), and 2 individual 30× repeat clones (8029 B-A4, 8029 B-A10) was used as input DNA. F1H4 mESC genomic total DNA served as negative control, and Coriell Cell Repository purified human blood cell genomic DNA from patients with known C9ORF72 hexanucleotide expanded repeat alleles (samples ND11836 (HRE genotype: 8/expanded), ND14442 (2/expanded), ND6769 (13/44)) served as positive controls (Coriell Institute for Medical Research). PCR using the primers in Table 4 was performed on an ABI 9700 thermal cycler (Thermo Fisher). Amplicons were sized by capillary electrophoresis on an ABI 3500xL GeneScan using POP-7 polymer (Thermo Fisher) and NuSieve agarose gels (Lonza). 2-log DNA ladder (New England BioLabs) molecular weight marker was loaded on agarose gels for comparison, and bands were visualized with SYBR Gold Nucleic Acid Stain (Thermo Fisher).

TABLE 4
Primer name	Sequence (SEQ ID NO:)
2-Primer Fwd	TGCGCCTCCGCCGCCGCGGGCGCAGGCACCGCAACCGCA (SEQ ID NO: 30)
2-Primer Rev	CGCAGCCTGTAGCAAGCTCTGGAACTCAGGAGTCG (SEQ ID NO: 31)
3-Primer Fwd	ATGCAGGCAATTCCACCAGTCGCTAGAGGCGAAAGC (SEQ ID NO: 32)
3-Primer Rev	TAACCAGAAGAAAACAAGGAGGGAAACAACCGCAGCCTGT (SEQ ID NO: 33)
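
For reference, basic properties of the Table 4 primers can be tabulated as follows. This is an illustrative sketch only; the dictionary `PRIMERS` and the helper functions are hypothetical names, and no annealing temperatures or amplicon sizes are implied.

```python
# Primer sequences copied from Table 4 (SEQ ID NOs: 30-33).
PRIMERS = {
    "2-Primer Fwd": "TGCGCCTCCGCCGCCGCGGGCGCAGGCACCGCAACCGCA",
    "2-Primer Rev": "CGCAGCCTGTAGCAAGCTCTGGAACTCAGGAGTCG",
    "3-Primer Fwd": "ATGCAGGCAATTCCACCAGTCGCTAGAGGCGAAAGC",
    "3-Primer Rev": "TAACCAGAAGAAAACAAGGAGGGAAACAACCGCAGCCTGT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


for name, seq in PRIMERS.items():
    print(name, len(seq), round(gc_fraction(seq), 2), reverse_complement(seq))
```

The reverse complement is printed because it is the form needed when locating a primer's binding site on the opposite strand of a reference sequence.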

[0199] FIG. 2B confirms the presence of 3 repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 within the heterologous hexanucleotide repeat expansion sequence inserted into the endogenous C9orf72 locus of mouse ES cell clone 8027 A-C4, about 30 repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 within the heterologous hexanucleotide repeat expansion sequences inserted into the endogenous C9orf72 locus of mouse ES cell clones 8029 B-A9 and 8029 B-A10, and about 92 repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 within the heterologous hexanucleotide repeat expansion sequences at the endogenous C9orf72 locus of mouse ES cell clones 8029 A-A3 and 8029 A-A6.
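
The relationship between repeat number and tract length can be illustrated with simple arithmetic. The sketch below assumes that the hexanucleotide of SEQ ID NO:1 is the GGGGCC unit named in the abstract, so that each repeat contributes 6 bp. The function names and the flanking length used in the example are hypothetical; actual flanking lengths depend on the primers or restriction sites used and are not stated here.

```python
HEXANUCLEOTIDE = "GGGGCC"  # assumed unit of SEQ ID NO:1


def repeat_tract(n_repeats: int) -> str:
    """Return a tract of n tandem hexanucleotide repeats."""
    if n_repeats < 0:
        raise ValueError("repeat number must be non-negative")
    return HEXANUCLEOTIDE * n_repeats


def estimate_repeats(fragment_bp: int, flank_bp: int) -> int:
    """Estimate repeat number from a fragment size and an assumed flank size."""
    tract_bp = fragment_bp - flank_bp
    if tract_bp < 0:
        raise ValueError("flank cannot be longer than the fragment")
    return round(tract_bp / len(HEXANUCLEOTIDE))


# Tract lengths for the approximate repeat numbers reported for the ES cell clones.
for n in (3, 30, 92):
    print(n, "repeats:", len(repeat_tract(n)), "bp")

# Hypothetical example: a 682 bp fragment with 130 bp of non-repeat flank.
print(estimate_repeats(682, 130))
```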

Example 2

Generation of Embryonic Stem Cell Derived Motor Neurons and Non-Human Animals Comprising a Heterologous Hexanucleotide Repeat Expansion Sequence at an Endogenous Mouse C9ORF72 Locus

[0200] Embryonic Stem Cell Derived Motor Neurons

[0201] Parental embryonic stem cells (ESCs) homozygous for a wildtype C9orf72 locus (control) or heterozygous for a C9orf72 locus genetically modified with about 3 repeats (C9orf72HRE₃⁺/⁻), 30 repeats (C9orf72HRE₃₀⁺/⁻), or 92 repeats (C9orf72HRE₉₂⁺/⁻) of the hexanucleotide sequence set forth as SEQ ID NO:1 were cultured in embryonic stem cell medium (ESM; DMEM+15% Fetal bovine serum+Penicillin/Streptomycin+Glutamine+Non-essential amino acids+nucleosides+β-mercaptoethanol+Sodium pyruvate+LIF) for 2 days, during which the medium was changed daily. ES medium was replaced with 7 ml of ADFNK medium (Advanced DMEM/F12+Neurobasal medium+10% Knockout serum+Penicillin/Streptomycin+Glutamine+β-mercaptoethanol) 1 hour before trypsinization. ADFNK medium was aspirated and the ESCs were trypsinized with 0.05% trypsin-EDTA. Pelleted cells were resuspended in 12 ml of ADFNK and grown for two days in suspension. Cells were cultured for a further 4 days in ADFNK supplemented with retinoic acid (RA) and smoothened agonist to obtain motor neurons (ESMNs). Dissociated motor neurons were plated and matured in embryonic stem cell-derived motor neuron medium (ESMN; Neurobasal medium+2% Horse serum+B27+Glutamine+Penicillin/Streptomycin+β-mercaptoethanol+10 ng/ml GDNF, BDNF, CNTF).

[0202] Non-Human Animals

[0203] The VELOCIMOUSE® method (DeChiara, T. M. et al., 2010, Methods Enzymol. 476:285-294; DeChiara, T. M., 2009, Methods Mol. Biol. 530:311-324; Poueymirou et al., 2007, Nat. Biotechnol. 25:91-99) was used, in which targeted ES cells were injected into uncompacted 8-cell stage Swiss Webster embryos, to produce healthy, fully ES cell-derived F0 generation mice heterozygous for the C9orf72-HRE (3× or 100×) insertion. F0 generation heterozygous males were crossed with C57BL/6NTac females to generate F1 heterozygotes, which were intercrossed to produce F2 generation C9orf72-HRE⁺/⁺, C9orf72-HRE⁺/⁻ and wild type mice for molecular and phenotypic analyses.
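The genotype classes expected from the F1 intercross can be enumerated with a small Punnett-square sketch. It assumes simple Mendelian segregation of a single locus with no effect of genotype on viability, which the text does not state; the allele labels `HRE` and `WT` and the function name are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def intercross(parent1=("HRE", "WT"), parent2=("HRE", "WT")):
    """Expected offspring genotype fractions for a single-locus cross.

    Each parent is given as a pair of alleles. Assumes Mendelian
    segregation and equal viability of all genotypes (an assumption,
    not a result reported in the text).
    """
    # Every combination of one allele from each parent is equally likely.
    counts = Counter("/".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    for genotype, fraction in sorted(intercross().items()):
        print(genotype, fraction)
```

Under these assumptions an F1 heterozygote intercross yields homozygous HRE, heterozygous and wild type F2 animals in a 1:2:1 ratio.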

Example 3

Analysis of Motor Neurons or Brain Tissues Having a Heterologous Hexanucleotide Repeat Expansion Sequence in an Endogenous C9orf72 Locus

[0204] Recently, Liu et al. (2017) Cell Chem. Biol. 24:141-148 used quantitative polymerase chain reaction (qPCR) and digital droplet polymerase chain reaction (ddPCR) to quantify the copy number of sense and antisense RNA transcripts from the C9orf72 locus expressed by human fibroblast cell lines, or by human astrocytes and motor neurons derived from induced pluripotent stem cells (iPSCs), isolated from patients suffering from ALS. Liu et al. (2017), supra, detected significantly higher numbers of sense intronic, antisense, and sense C9orf72 transcripts in patient-derived fibroblasts compared to fibroblasts derived from healthy individuals. On average, three to four copies of C9orf72 intronic and antisense transcripts, and about 15-20 copies of C9orf72 sense mRNA transcripts, were detected per patient-derived fibroblast. Liu et al. (2017), supra. In contrast, Liu et al. (2017), supra, show that one or fewer intronic and antisense transcripts, and 5-10 copies of C9orf72 sense mRNA transcripts, were detected in non-disease fibroblast cell lines. As in the fibroblasts, expression of intronic, antisense, and sense C9orf72 transcripts was higher in patient-derived astrocytes and neuronal cells than in healthy-control derived astrocytes and neuronal cells. Liu et al. (2017), supra. By calculating the percentage of cells that contain RNA foci, the average number of foci per cell, and the distribution of different numbers of foci among cells, and by determining the number of C9orf72 transcripts in disease- or healthy-derived cells, Liu et al. (2017), supra, suggested that each focus seen in disease-derived cells is a single mutant C9orf72 intronic or antisense transcript, and further, that small numbers of RNA molecules may have a sizable impact on disease.

[0205] In this example, the stability of the size of the hexanucleotide repeat in a breeding colony was confirmed in F2 animals using AmplideX PCR/CE C9ORF72 Kit (Asuragen) as described above (data not shown). Additionally, RNA transcripts in mouse embryonic stem cell derived motor neurons (ESMNs), brain tissues, and parental embryonic stem cells comprising a wildtype C9orf72 locus (control) or a genetically modified C9orf72 locus that comprises three, thirty, or ninety-two repeats of the hexanucleotide sequence set forth in SEQ ID NO:1 were examined. RNA foci and dipeptide repeat protein levels were also evaluated in ESMNs derived from parental embryonic stem cells comprising a wildtype C9orf72 locus (control) or a genetically modified C9orf72 locus that comprises three, thirty, or ninety-two repeats of the hexanucleotide sequence set forth in SEQ ID NO:1.

[0206] Materials and Methods

[0207] Quantitative Polymerase Chain Reaction

[0208] Total RNA from each sample was extracted and reverse transcribed using primers that flank various regions, and probes that detect those regions of the modified C9orf72-HRE locus. Detectable regions include those that span the junction of mouse and human sequences, only human sequences, or only mouse sequences. qPCR of GAPDH or β2-microglobulin was performed using probes and primers of readily available kits.

[0209] Specifically, RNA was isolated from embryonic stem cell-derived motor neurons (ESMN), parental embryonic stem (ES) cells, or total brains isolated from mice comprising a wildtype (WT) C9orf72 locus (control) or a genetically modified C9orf72 locus comprising 3, 30 or 92 repeats of the hexanucleotide sequence set forth as SEQ ID NO:1.

[0210] Total RNA was isolated using the Direct-zol RNA Miniprep Plus kit according to the manufacturer's protocol (Zymo Research). About 1 µg total RNA was treated with DNase I (ThermoFisher) at 25 °C for 15 min. EDTA was added and the mixture was incubated at 65 °C for 10 min. Reverse transcription (RT) reactions were performed with a Maxima H Minus First Strand cDNA Synthesis Kit with dsDNase (ThermoFisher). After DNase I treatment, 10 µL of RT mixture containing RT buffer, random hexamer primers, dNTPs, and Maxima H Minus enzyme mix was added to make a final volume of 20 µL. The RT reaction mixture was incubated at 25 °C for 10 min, then at 50 °C for 15 min, and then at 85 °C for 5 min to inactivate the enzyme. The cDNA mix was diluted with water to a final volume of 100 µL.
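The dilution steps above determine how much input RNA each PCR well represents. The sketch below works through that arithmetic, assuming exactly 1 µg of input RNA (the text says "about"), complete carry-over of material, and the 3 µL of diluted cDNA per reaction given in the qPCR set-up that follows; the function name is illustrative.

```python
def rna_equivalent_per_reaction_ng(
    rna_input_ug: float = 1.0,      # "about 1 ug" in the text; exact value assumed
    diluted_cdna_ul: float = 100.0, # final volume after dilution with water
    cdna_per_pcr_ul: float = 3.0,   # cDNA volume added to each PCR reaction
) -> float:
    """RNA equivalents (ng) represented by the cDNA in one PCR reaction."""
    ng_per_ul = rna_input_ug * 1000.0 / diluted_cdna_ul
    return ng_per_ul * cdna_per_pcr_ul


if __name__ == "__main__":
    print(f"{rna_equivalent_per_reaction_ng():.1f} ng RNA equivalents per reaction")
```

Under these assumptions each reaction contains cDNA derived from roughly 30 ng of total RNA.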

[0211] After reverse transcription, the PCR reaction solution was reconstituted to a final volume of 8 µL containing 3 µL cDNA and 5 µL of PCR mixture, probe, and gene-specific primers. Unless otherwise noted, final primer and probe concentrations were 0.5 µM and 0.25 µM, respectively. qPCR was performed on a ViiA™ 7 Real-Time PCR Detection System (ThermoFisher). PCR reactions were run in quadruplicate in an optical 384-well plate at 95 °C for 10 min, followed by 45 cycles of 95 °C for 3 s and 60 °C for 30 s. The sequences of the primers and probes and SEQ ID NO used in each analysis (A, B, F, G, H) are provided in Table 5.
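The text does not state how relative transcript levels were computed from the quadruplicate reactions. One common approach, shown here purely as an assumption, is the comparative Ct (2^-ΔΔCt) method with GAPDH or β2-microglobulin as the reference gene. The Ct values in the example are invented for illustration and are not data from this disclosure.

```python
from statistics import mean


def relative_expression(target_ct, ref_ct, control_target_ct, control_ref_ct):
    """Fold change of a target transcript versus a control sample (2^-ddCt).

    Each argument is a list of replicate Ct values (e.g., quadruplicates).
    Assumes ~100% amplification efficiency for target and reference assays;
    this calculation method is an assumption, not taken from the text.
    """
    d_ct_sample = mean(target_ct) - mean(ref_ct)
    d_ct_control = mean(control_target_ct) - mean(control_ref_ct)
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Illustrative Ct values only.
    fold = relative_expression(
        target_ct=[24.0, 24.1, 23.9, 24.0],
        ref_ct=[18.0, 18.0, 18.1, 17.9],
        control_target_ct=[25.0, 25.1, 24.9, 25.0],
        control_ref_ct=[18.0, 18.1, 17.9, 18.0],
    )
    print(f"fold change vs control: {fold:.2f}")
```

A target that crosses threshold one cycle earlier than in the control, with an unchanged reference gene, corresponds to an approximately two-fold increase.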

TABLE 5
Analysis	Oligonucleotide	Sequence (SEQ ID NO:)
A	Forward Primer	CATCCCAATTGCCCTTTCC (SEQ ID NO: 66)
A	Reverse Primer	CCCACACCTGCTCTTGCTAGA (SEQ ID NO: 67)
A	Probe	TCTAGGTGGAAAGTGGG (SEQ ID NO: 68)
B	Forward Primer	GAGCAGGTGTGGGTTTAGGA (SEQ ID NO: 69)
B	Reverse Primer	CCAGGTCTCACTGCATTCCA (SEQ ID NO: 70)
B	Probe	ATTGCAAGCGTTCGGATAATGTGAGA (SEQ ID NO: 71)
D	Forward Primer	GCTGTCACGAAGGCTTTCTTC (SEQ ID NO: 72)
D	Reverse Primer	GCACTGCTGCCAACTACAAC (SEQ ID NO: 73)
D	Probe	TCAATGCCATCAGCTCACACCTGC (SEQ ID NO: 74)
G	Forward Primer	AAGAGGCGCGGGTAGAA (SEQ ID NO: 75)
G	Reverse Primer	CAGCTTCGGTCAGAGAAATGAG (SEQ ID NO: 76)
G	Probe	CTCTCCTCAGAGCTCGACGCATTT (SEQ ID NO: 77)
H	Forward Primer	CTGCACAATTTCAGCCCAAG (SEQ ID NO: 78)
H	Reverse Primer	CAGGTCATGTCCCACAGAAT (SEQ ID NO: 79)
H	Probe	CATATGAGGGCAGCAATGCAAGTC (SEQ ID NO: 80)
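To see where a primer from Table 5 sits relative to a template, one can search the template for the primer and for its reverse complement. The sketch below is illustrative only: the template is the first 60 nucleotides of SEQ ID NO:2 as printed in the sequence listing, and the helper names are assumptions introduced here.

```python
# First 60 nt of SEQ ID NO:2 as printed in the sequence listing.
SEQ2_START = "gggtctagcaagagcaggtgtgggtttaggaggtgtgtgtttttgtttttcccaccctct".upper()

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def locate(primer: str, template: str):
    """Return (strand, 0-based start on the template's top strand) or None.

    '+' means the primer matches the template as written; '-' means its
    reverse complement matches, i.e. the primer anneals to the top strand.
    """
    template = template.upper()
    primer = primer.upper()
    pos = template.find(primer)
    if pos != -1:
        return ("+", pos)
    pos = template.find(reverse_complement(primer))
    if pos != -1:
        return ("-", pos)
    return None


if __name__ == "__main__":
    print(locate("GAGCAGGTGTGGGTTTAGGA", SEQ2_START))   # Analysis B forward primer
    print(locate("CCCACACCTGCTCTTGCTAGA", SEQ2_START))  # Analysis A reverse primer
```

The same search applied to a full-length reference sequence would give the amplicon boundaries for each analysis; only the 60-nucleotide excerpt is used here to keep the example short.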

[0212] Western Blot Analysis

[0213] Differentiated embryoid bodies (EBs) were collected and homogenized in SDS sample buffer (2% SDS, 10% glycerol, 5% β-mercaptoethanol, 60 mM Tris-HCl, pH 6.8, bromophenol blue). Protein extracts were quantified using the RC DC protein assay (BioRad). Extracts (10 µg) were run on a 4-20% SDS-PAGE gel (ThermoFisher) and transferred onto nitrocellulose membrane using an iBLOT transfer unit (ThermoFisher). Immunoblots were probed with primary antibodies against C9orf72 and GAPDH (Millipore). Bound antibody was detected by incubation with secondary antibodies conjugated to horseradish peroxidase (Abcam) followed by chemiluminescence using a SuperSignal West Pico chemiluminescent substrate (Thermo Scientific). Signal was detected by autoradiography using Full Speed Blue sensitive medical X-Ray film (Ewen Parker XRay Corporation). Relative protein levels were calculated using ImageJ.

[0214] Fluorescent In Situ Hybridization (FISH) and Immunofluorescence (IF) for the Detection of RNA and Translation Products

[0215] Fluorescent in situ hybridization (FISH) and immunofluorescence were used, respectively, to determine the location of RNA transcribed from the hexanucleotide repeat sequence set forth as SEQ ID NO:1, and of dipeptide repeat proteins translated therefrom, in embryonic stem cell-derived motor neurons (ESMNs) generated as described in Example 2. Briefly, ESMNs were grown in four-well chamber slides (Lab-Tek II chamber slide system, ThermoFisher Scientific) and fixed with 4% PFA (Electron Microscopy Sciences) in PBS. Cells were then permeabilized with diethyl pyrocarbonate (DEPC) PBS/0.2% Triton X-100 (Fisher Scientific, catalog #BP151), washed with DEPC-PBS, blocked, and stained with LNA or DNA oligonucleotides for the detection of RNA transcription products, or with anti-polyGA antibody for the detection of RAN translation products, as described below. After staining, slides were incubated with an appropriate fluorescent dye, mounted with Fluoromount G (Southern Biotech) and visualized using confocal microscopy.

[0216] Detection of Sense or Antisense RNA Transcription Products

[0217] Slides were pre-hybridized with buffer consisting of 50% formamide (IBI Scientific, catalog #IB72020), DEPC 2×SSC [300 mM sodium chloride, 30 mM sodium citrate (pH 7.0)], 10% (w/v) dextran sulfate (Sigma-Aldrich, catalog #D8960), and DEPC 50 mM sodium phosphate (pH 7.0) for 30 min at 66 °C (for LNA probes) or 55 °C (for DNA probes). The hybridization buffer was then drained off, and 400 µl of 40 nM LNA probe mix or 200 ng/ml of DNA probe mix in hybridization buffer was added to each of the slides and incubated in the dark for 3 hours at 66 °C (for LNA probes) or 55 °C (for DNA probes). Slides incubated with LNA probes were rinsed once in DEPC 2×SSC/0.1% Tween 20 (Fisher Scientific, catalog no. BP337) at room temperature and three times in DEPC 0.1×SSC at 65 °C. Slides incubated with DNA probes were washed three times with 40% formamide in 2×SSC and briefly washed once in PBS. Slides were subsequently incubated with 1 µg/mL DAPI (Molecular Probes Inc.).
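The buffer strengths and probe quantities above follow from simple scaling. As a minimal sketch, the code below derives the salt concentrations of any SSC dilution from the 2× composition given in the text and computes the amount of LNA probe applied per slide; the function names are illustrative.

```python
# 2x SSC composition as stated in the text (mM).
SSC_2X_MM = {"sodium chloride": 300.0, "sodium citrate": 30.0}


def ssc_composition_mm(fold: float) -> dict:
    """Salt concentrations (mM) of an SSC solution at the given strength."""
    return {salt: conc / 2.0 * fold for salt, conc in SSC_2X_MM.items()}


def probe_amount_pmol(conc_nm: float, volume_ul: float) -> float:
    """Amount of probe in pmol: nM x uL gives fmol, divided by 1000 gives pmol."""
    return conc_nm * volume_ul / 1000.0


if __name__ == "__main__":
    print("0.1x SSC:", ssc_composition_mm(0.1))
    print("LNA probe per slide (pmol):", probe_amount_pmol(40.0, 400.0))
```

On this basis the 0.1×SSC wash contains about 15 mM sodium chloride and 1.5 mM sodium citrate, and 400 µl of 40 nM probe mix corresponds to 16 pmol of probe per slide.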

[0218] The sequences and SEQ ID NOs: of the LNA and DNA oligonucleotide probes used in this example, as well as the hybridization conditions of the probes, are provided in Table 6 below.

TABLE 6
Probe	Sequence (SEQ ID NO:)	Hybridization method
LNA sense G₄C₂ RNA	TYE563-CCCCGGCCCCGGCCCC (SEQ ID NO: 81)	66 °C hybridization and washes in 0.1× SSC
LNA antisense G₄C₂ RNA	TYE563-GGGGCCGGGGCCGGGGGGCCCC (SEQ ID NO: 82)	66 °C hybridization and washes in 0.1× SSC
DNA sense G₄C₂ RNA	CCCCGGCCCCGGCCCCGG-Cy3 (SEQ ID NO: 83)	55 °C hybridization and washes in 2× SSC
DNA antisense G₄C₂ RNA	GGGGCCGGGGCCGGGGC-Cy3 (SEQ ID NO: 84)	55 °C hybridization and washes in 2× SSC

[0219] Detection of Dipeptide Repeat Protein Products

[0220] After permeabilization, slides were blocked with 5% normal donkey serum diluted in Tris buffered saline (pH 7.4) with 0.2% Triton X-100 (TBS-T). Slides were incubated overnight at 4 °C with primary antibodies against poly-GA (Millipore) diluted in TBS-T with 5% normal donkey serum. After washing 3 times with TBS-T, slides were incubated with species-specific secondary antibodies coupled to Alexa 488 or 555 (1:1000 in TBS-T, ThermoFisher) and DAPI (1 µg/ml) (Molecular Probes Inc.) for 1 hr at room temperature. After washing 3 times with TBS-T, slides were mounted with Fluoromount G (Southern Biotech) and visualized using confocal microscopy.

[0221] Results

[0222] As shown in FIGS. 4, 5 and 6, ESMNs, total brain and neuronal tissues from mice comprising the hexanucleotide repeat expansion sequence set forth as SEQ ID NO:1 at the C9orf72 locus showed increased expression of the C9orf72 mRNA transcripts. Such increase appears to be correlated with the number of the hexanucleotide repeats present between exons 1a and 1b of the C9orf72 locus. FIG. 6 also shows that, similar to the neuronal tissues isolated from the mice comprising 3 or 92 repeats of the heterologous hexanucleotide sequence set forth as SEQ ID NO:1 at the endogenous C9orf72 locus and ESMNs comprising the same, C9orf72 expression was also enhanced in non-neuronal tissues, e.g., muscle and heart, in mice comprising 3 or 92 repeats of the heterologous hexanucleotide sequence set forth as SEQ ID NO:1 at the endogenous C9orf72 locus. Furthermore, the enhancement was specific for the humanized C9orf72 allele; no enhanced expression of the mouse C9orf72 allele, which does not contain the repeat sequence, was seen in heterozygous mice (data not shown).

[0223] Preliminary calculations indicate that ESMNs or brain cells with thirty or ninety-two repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 have approximately 17 copies of a C9orf72 mRNA per cell, consistent with the findings of Liu et al. (2017), supra. An increased number of repeats of a hexanucleotide sequence set forth in SEQ ID NO:1 is also directly correlated with an increase in C9orf72 protein levels (FIGS. 7 and 8), nuclear and cytoplasmic accumulation of sense and antisense C9orf72 RNA foci (FIGS. 9A and 9B), and dipeptide repeat proteins (FIG. 10). The data shown herein indicate that an increased number of repeats of a hexanucleotide sequence set forth in SEQ ID NO:1 at the C9orf72 locus results in cells exhibiting a molecular phenotype (e.g., increased transcription, accumulation of RNA foci, and/or increased dipeptide repeat proteins) similar to that of human cells isolated from patients diagnosed with ALS, and support the use of the non-human animals disclosed herein as a disease model for neurodegenerative disease.

Example 4

Behavioral Analysis of Non-Human Animals Having a Heterologous Hexanucleotide Repeat Expansion Sequence in an Endogenous C9orf72 Locus

[0224] This example describes behavioral analysis of non-human animals (e.g., rodents) described herein for ALS-like symptoms such as, for example, decreased body weight and/or significant motor abnormalities resulting from an insertion of a heterologous hexanucleotide repeat expansion sequence in an endogenous rodent (e.g., mouse) C9orf72 locus as described in Example 1.

[0225] Phenotypic studies of mice having a pathogenic heterologous hexanucleotide repeat expansion sequence inserted into an endogenous C9orf72 locus as described above, and/or control mice, e.g., wildtype mice or mice having a non-pathogenic heterologous hexanucleotide repeat expansion sequence inserted into an endogenous C9orf72 locus as described above, are performed at 8, 18, 37 (female) and 57-60 weeks (male). Body weight is measured on a bi-weekly basis, and body composition is analyzed by µCT scan (Dynamic 60). Standard 24 scan is used to visualize mass of the cervical region of the spine. All animal procedures were conducted in compliance with protocols approved by the Regeneron Pharmaceuticals Institutional Animal Care and Use Committee.

[0226] Assessment of overall motor function is performed using blinded subjective scoring assays. Analysis of motor impairment is conducted using rotarod, open field locomotor, and catwalk testing. Motor impairment score is measured using the system developed by the ALS Therapy Development Institute (ALSTDI, Gill A. et al., 2009, PLoS One 4:e6489). During catwalk testing, subjects walk across an illuminated glass platform while a video camera records from below. Gait-related parameters such as stride pattern, individual paw swing speed, stance duration, and pressure are reported for each animal. This test is used to phenotype mice and evaluate novel chemical entities for their effect on motor performance. CatWalk XT is a system for quantitative assessment of footfalls and gait in rats and mice. It is used to evaluate the locomotor ability of rodents in almost any kind of experimental model of central nervous, peripheral nervous, muscular, or skeletal abnormality.

[0227] CatWalk Gait Analysis: Animals are placed at the beginning of the runway of Noldus CatWalk XT 10, with the open end in front of them. Mice spontaneously run to the end of the runway to attempt to escape. The camera records and the software of the system measures the footprints. The footprints are analyzed for abnormalities in paw placement.

[0228] Open Field Test: Mice are placed in the Kinder Scientific open field system and evaluated for 60 minutes. The apparatus uses infrared beams and computer software to calculate fine movements, X+Y ambulation, distance traveled, number of rearing events, time spent rearing, and immobility time.

[0229] Rotorod: The rotorod test (IITC Life Science, Woodland Hills, Calif.) measures the latency for a mouse to fall from a rotating beam. The rotorod is set to the experimental regime that starts at 1 rpm and accelerates up to 15 rpm over 180 seconds. Then, the animals' latency to fall following the incremental regime is recorded. The average and maximum of the three longest durations of time that the animals stay on the beam without falling off are used to evaluate falling latency. Animals that manage to stay on the beam longer than 180 seconds are deemed to be asymptomatic.
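The rotorod regime and the latency summary described above can be sketched as follows. The text gives only the start speed, end speed and ramp duration, so a linear ramp is assumed; applying the 180-second criterion to the single longest trial is likewise an assumption, and the function names are illustrative.

```python
def rotorod_rpm(t_s: float, start_rpm: float = 1.0, end_rpm: float = 15.0,
                ramp_s: float = 180.0) -> float:
    """Beam speed at time t, ASSUMING a linear ramp from start to end speed."""
    t = min(max(t_s, 0.0), ramp_s)  # clamp to the ramp window
    return start_rpm + (end_rpm - start_rpm) * t / ramp_s


def summarize_latencies(latencies_s, top_n: int = 3, cutoff_s: float = 180.0) -> dict:
    """Average and maximum of the top_n longest latencies to fall.

    'asymptomatic' is set when the longest latency exceeds cutoff_s;
    using the longest trial for this criterion is an assumption.
    """
    longest = sorted(latencies_s, reverse=True)[:top_n]
    return {
        "mean_s": sum(longest) / len(longest),
        "max_s": longest[0],
        "asymptomatic": longest[0] > cutoff_s,
    }


if __name__ == "__main__":
    print(rotorod_rpm(90))
    print(summarize_latencies([60, 120, 90, 150]))
```

For example, with trial latencies of 60, 120, 90 and 150 seconds, the three longest trials give a mean of 120 seconds and a maximum of 150 seconds.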

[0230] Upper motor neuron impairment presents as spasticity (i.e., rigidity), increased reflexes, tremor, bradykinesia, and Babinski signs. Lower motor neuron impairment presents as muscle weakness, wasting, clasping, curling and dragging of feet, and fasciculations. Bulbar impairment presents as difficulty swallowing, slurring and tongue fasciculations. Overall motor function is also assessed starting at 32 weeks up to 60 weeks of age as percent of living animals at a given week. Mice are weighed weekly and assessment of overall motor function is performed using blinded subjective scoring assays (as described above). Weekly or bi-monthly clinical neurological exams are performed on the two groups of mice, looking at their motor impairment, tremor and rigidity of their hind limb muscles. For motor impairment, a blinded neurological scoring scale from zero (no symptoms) to four (mouse cannot right itself within 30 seconds of being placed on its side) is used, as shown in Table 7.

TABLE 7 ALS-TDI neurological scoring system
Score of 0: Full extension of hind legs away from lateral midline when mouse is suspended by its tail, and mouse can hold this for two seconds, suspended two to three times.
Score of 1: Collapse or partial collapse of leg extension towards lateral midline (weakness) or trembling of hind legs during tail suspension.
Score of 2: Toes curl under at least twice during walking of 12 inches, or any part of foot is dragging along cage bottom/table.
Score of 3: Rigid paralysis or minimal joint movement, foot not being used for generating forward motion.
Score of 4: Mouse cannot right itself within 30 seconds after being placed on either side.
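The scale in Table 7 can be expressed as a simple lookup for record keeping. The sketch below assumes that when several signs are observed the highest applicable score is assigned; that precedence rule, the argument names and the function name are assumptions made for illustration and are not stated in the text.

```python
def als_tdi_score(*, cannot_right_within_30s: bool = False,
                  rigid_paralysis: bool = False,
                  toe_curl_or_foot_drag: bool = False,
                  hindleg_collapse_or_trembling: bool = False) -> int:
    """Map observed signs to a 0-4 score following Table 7.

    ASSUMPTION: the most severe observed sign determines the score.
    """
    if cannot_right_within_30s:
        return 4
    if rigid_paralysis:
        return 3
    if toe_curl_or_foot_drag:
        return 2
    if hindleg_collapse_or_trembling:
        return 1
    return 0  # full hind leg extension held during tail suspension


if __name__ == "__main__":
    print(als_tdi_score(hindleg_collapse_or_trembling=True))
```

Keyword-only arguments are used so that each observation is named explicitly when an animal is scored.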

[0231] For tremor and rigidity, a scoring system with a scale from zero (no symptoms) to three (severe) is used. Table 8 sets forth the scoring methodology related to motor impairment, tremor and rigidity of animals during testing.

TABLE 8
Score	0	1	2	3
Motor impairment	no phenotype	clasping	clasping & dragging	Paralysis
Tremor	none	mild	moderate	Severe
Rigidity	none	mild	moderate	Severe

[0232] In another experiment mice are examined using a grip strength test. Briefly, the grip strength measures the neuromuscular function as maximal muscle strength of forelimbs, and is assessed by the grasping applied by a mouse on a grid that is connected to a sensor. All grip strength values obtained are normalized against mouse body weight.

[0233] In another experiment, the lumbar portion of spinal cords from control mice and mice comprising a pathogenic heterologous hexanucleotide repeat expansion sequence inserted into an endogenous C9orf72 locus at around 60 weeks old are collected for histopathological analysis. The total number of motor neurons in the spinal cords, and mean cell body area of motor neurons are observed in both test and control cohorts.

[0234] The thermal nociception of control mice and of test mice comprising an insertion of a pathogenic heterologous hexanucleotide repeat expansion sequence is tested at 20 weeks of age by placing animals on a metal surface maintained at 48 °C, 52 °C or 55 °C (IITC, Woodland Hills, Calif.). Latency to respond to the heat stimulus, defined as the time elapsed until the animal licks or flicks a hind paw, is measured. Mice remain on the plate until they perform either of two nocifensive behaviors: hindpaw licking or hindpaw shaking.

Example 5

Deletion of a Heterologous Hexanucleotide Repeat Expansion Sequence from an Endogenous Non-Human C9ORF72 Locus in a Non-Human Embryonic Stem Cell Using a CRISPR/Cas9 System

[0235] Potential guide RNA (gRNA) sequences for a reference hexanucleotide repeat expansion sequence (comprising at least one, at least about three, at least about five, at least about fifteen, at least about twenty, at least about thirty, at least about forty, at least about fifty, at least about 60, at least about 70, at least about 80, or at least about 90, preferably contiguous, repeats of the hexanucleotide sequence set forth as SEQ ID NO:1) are analyzed and scored. DNA encoding potentially effective gRNA (e.g., crRNA and/or tracrRNA) is synthesized and placed into an expression construct, which may also comprise a nucleic acid encoding a Cas protein. See, e.g., FIG. 12. ES cells comprising the reference hexanucleotide repeat expansion sequence are transfected with the expression construct(s) comprising the DNA encoding the gRNA and/or Cas protein, and a drug resistance gene. Drug-resistant clones are obtained by serial dilution, expanded for analysis and frozen. DNA from each drug-resistant ES cell clone is isolated and analyzed by PCR and visualization on an agarose gel. PCR products of a correct size are extracted and further sequenced to confirm deletion of the targeted hexanucleotide repeat expansion sequence.

[0236] FIG. 11 provides a not-to-scale depiction of a non-limiting exemplary reference hexanucleotide repeat expansion sequence, e.g., as found in 8029 A-A6 ES cells generated in Example 1, e.g., having a sequence as set forth as SEQ ID NO:45, and the positions therein that were more likely to be successfully targeted by gRNA. The DNA sequences encoding crRNA that target the positions depicted in FIG. 11, an exemplary sequence for which is provided as SEQ ID NO:45, and the SEQ ID NO: of each are provided in Table 9. Notably, the sequences set forth as SEQ ID NOs:46-50 contain an initial guanine, not found in the reference hexanucleotide repeat expansion sequence set forth as SEQ ID NO:45, for optimal expression with a U6 promoter.

TABLE 9 Designed gRNA sequences
Position in SEQ ID NO: 45	crRNA encoding sequence (SEQ ID NO:)
190	GCTACTTGCTCTCACAGTACT (SEQ ID NO: 46)
196	GCTCTCACAGTACTCGCTGA (SEQ ID NO: 39)
274	GCCGCAGCCTGTAGCAAGCTC (SEQ ID NO: 47)
899	GCGGCCGCTAGCGCGATCGCG (SEQ ID NO: 48)
905	GCTAGCGCGATCGCGGGGCG (SEQ ID NO: 49)
1006	GTGGCGAGTGGGTGAGTGAGG (SEQ ID NO: 50)
1068	GGAAGAGGCGCGGGTAGAAG (SEQ ID NO: 44)
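The addition of an initial guanine for U6-driven expression, noted above, can be sketched as a one-line rule. This is an illustration under the assumption that a guanine is prepended only when the target sequence does not already begin with one; the function name is hypothetical, and the 20-nucleotide input in the first example is simply the SEQ ID NO:46 sequence from Table 9 without its leading guanine.

```python
def crrna_encoding_sequence(target: str) -> str:
    """Return the crRNA-encoding sequence for a target site.

    ASSUMPTION: a 5' G is prepended only if the target does not already
    start with G, so that the insert begins with a guanine as described
    for expression from a U6 promoter.
    """
    target = target.upper()
    return target if target.startswith("G") else "G" + target


if __name__ == "__main__":
    # Target lacking a 5' G: a guanine is added (compare SEQ ID NO:46 in Table 9).
    print(crrna_encoding_sequence("CTACTTGCTCTCACAGTACT"))
    # Target already starting with G: unchanged (compare SEQ ID NO:39 in Table 9).
    print(crrna_encoding_sequence("GCTCTCACAGTACTCGCTGA"))
```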

[0237] DNA encoding the crRNA as set forth in Table 9 was made (Integrated DNA Technologies) and inserted into an expression construct in operable linkage with DNA encoding tracrRNA (e.g., DNA comprising the sequence set forth as SEQ ID NO:63). Successful ligation of the crRNA encoding sequences was confirmed by polymerase chain reaction with the vector screening primers set forth in Table 10, and the sequences of the gRNA (crRNA and tracrRNA) encoding sequences were confirmed by sequence analysis using the vector sequencing primers, also set forth in Table 10. Expression constructs comprising the correct gRNA encoding sequences under the control of a U6 promoter, a nucleic acid encoding a cas9 protein, and a puromycin resistance gene (FIG. 12) were amplified and purified.

TABLE 10
Primer name	Sequence (SEQ ID NO:)
Vector Screening forward primer, Position 190 gRNA	ACACCGCTCTCACAGTACTCGCTGAG (SEQ ID NO: 51)
Vector Screening forward primer, Position 196 gRNA	ACACCGCCGCAGCCTGTAGCAAGCTCG (SEQ ID NO: 52)
Vector Screening forward primer, Position 274 gRNA	ACACCGAGTACTGTGAGAGCAAGTAGG (SEQ ID NO: 53)
Vector Screening forward primer, Position 899 gRNA	ACACCGACGCCCCGCGATCGCGCTAGG (SEQ ID NO: 54)
Vector Screening forward primer, Position 905 gRNA	ACACCGCGGCCGCTAGCGCGATCGCGG (SEQ ID NO: 55)
Vector Screening forward primer, Position 1006 gRNA	ACACCGTGGCGAGTGGGTGAGTGAGGG (SEQ ID NO: 56)
Vector Screening forward primer, Position 1068 gRNA	ACACCGGAAGAGGCGCGGGTAGAAGG (SEQ ID NO: 57)
Vector Screening reverse primer, All gRNA	GACGCGTTAATGCCAACTTT (SEQ ID NO: 58)
Vector sequencing forward primer	GAGGGCCTATTTCCCATGAT (SEQ ID NO: 59)
Vector sequencing reverse primer	GACGCGTTAATGCCAACTTT (SEQ ID NO: 60)
Clone screening forward primer	GAACTTACGGAGTCCCACGA (SEQ ID NO: 61)
Clone screening reverse primer	GGAGACAGCTCGGGTACTGA (SEQ ID NO: 62)

[0238] 8029 A-A6 clones as obtained in Example 1 and comprising a hexanucleotide repeat expansion sequence comprising about 92 repeats of the hexanucleotide sequence set forth as SEQ ID NO:1 (e.g., a reference sequence set forth as SEQ ID NO:45) were transfected with different combinations of the crRNA set forth in Table 9 (plus tracrRNA sequence), a puromycin resistance gene, and a CRISPR/Cas9 endonuclease gene. In one combination, ES cells were transfected with a CRISPR/Cas9 system targeting sequences starting at positions 190, 196, 274, 899, 905, 1006, and 1068 of SEQ ID NO:45 (e.g., the expression construct(s) comprising a nucleic acid encoding cas9 protein and/or gRNA inserts having the sequences set forth as SEQ ID NOs: 39, 44 and 46-50). In a second combination, ES cells were transfected with a CRISPR/Cas9 system targeting positions 196, 1006 and 1067 of SEQ ID NO: 45 (e.g., the expression construct(s) comprising a nucleic acid encoding cas9 protein and/or DNA encoding gRNA inserts comprising the sequences set forth as SEQ ID NOs: 39, 50 and 44, respectively). In a third combination, ES cells were transfected with gRNA inserts targeting positions 196, 272, 1005 and 1067 of SEQ ID NO:45 (e.g., the expression construct(s) comprising a nucleic acid encoding cas9 protein and/or gRNA inserts comprising the sequences set forth as SEQ ID NOs: 39, 47, 50 and 44, respectively).

[0239] Puromycin-resistant ES clones were obtained by serial dilution, cultured in media (500 mL KO DMEM media, 95 mL Heat Inactivated FBS, 12 mL L-Glutamine, 6 mL Pen-Strep, 6 mL Non-Essential Amino Acids, 1.2 mL β-mercaptoethanol), expanded for analysis, and frozen. DNA from each clone was isolated using the DNeasy Blood and Tissue Kit according to the manufacturer's protocol (Qiagen) and analyzed by PCR using the clone screening forward and reverse primers set forth in Table 10. PCR products were visualized by agarose gel electrophoresis, and PCR products of a correct size were extracted and further sequenced to confirm deletion of the targeted hexanucleotide repeat expansion sequence.

[0240] Of one hundred sixty (160) clones, one hundred (100) were tested and eleven (11) demonstrated a deletion of the hexanucleotide repeat expansion sequence, e.g., as demonstrated by an amplified PCR product between 300 and 700 base pairs (data not shown). Sequence analysis confirmed deletion of the hexanucleotide repeat expansion sequence (data not shown). Of the three combinations tested, a CRISPR/Cas system targeting the combination of positions 196, 1005 and 1067 of SEQ ID NO: 45 proved most efficient in deleting the hexanucleotide repeat expansion sequence; this combination resulted in ten of the eleven positive clones. A CRISPR/Cas system targeting the combination of positions 196, 272, 1005 and 1067 of SEQ ID NO: 45 provided one clone.
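The clone counts reported above translate into deletion frequencies as follows. This is simple arithmetic on the stated counts; the text does not report how many tested clones came from each guide combination, so only the overall frequency and the share of positive clones per combination are computed. The function name is illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


if __name__ == "__main__":
    tested, positive = 100, 11
    print(f"overall deletion frequency: {percent(positive, tested):.1f}% of tested clones")
    print(f"three-guide combination: {percent(10, positive):.1f}% of positive clones")
    print(f"four-guide combination: {percent(1, positive):.1f}% of positive clones")
```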

EQUIVALENTS

[0241] Having thus described several aspects of at least one embodiment of this invention, it is to be appreciated by those skilled in the art that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the invention. Accordingly, the foregoing description and drawing are by way of example only and the invention is described in detail by the claims that follow.

[0242] Use of ordinal terms such as "first," "second," "third," etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements.

[0243] The articles "a" and "an" in the specification and in the claims, unless clearly indicated to the contrary, should be understood to include plural referents. Claims or descriptions that include "or" between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention also includes embodiments in which more than one, or all, of the group members are present in, employed in, or otherwise relevant to a given product or process. Furthermore, it is to be understood that the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, descriptive terms, etc., from one or more of the listed claims are introduced into another claim dependent on the same base claim (or, as relevant, any other claim) unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise. Where elements are presented as lists (e.g., in Markush group or similar format), it is to be understood that each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements, features, etc., certain embodiments of the invention or aspects of the invention consist of, or consist essentially of, such elements, features, etc. For purposes of simplicity, those embodiments have not in every case been specifically set forth in so many words herein. It should also be understood that any embodiment or aspect of the invention can be explicitly excluded from the claims, regardless of whether the specific exclusion is recited in the specification.

[0244] Those skilled in the art will appreciate typical standards of deviation or error attributable to values obtained in assays or other processes described herein.

[0245] The publications, websites and other reference materials referenced herein to describe the background of the invention and to provide additional detail regarding its practice are hereby incorporated by reference.

Sequence CWU 1

1

8416DNAArtificial SequenceHeterologous Hexanucleotide sequence 1ggggcc 62964DNAHomo sapiens 2gggtctagca agagcaggtg tgggtttagg aggtgtgtgt ttttgttttt cccaccctct 60ctccccacta cttgctctca cagtactcgc tgagggtgaa caagaaaaga cctgataaag 120attaaccaga agaaaacaag gagggaaaca accgcagcct gtagcaagct ctggaactca 180ggagtcgcgc gctatgcgat cgcggggccg gggccggggc cgcgatcgcg gggcgtggtc 240ggggcgggcc cgggggcggg cccggggcgg ggctgcggtt gcggtgcctg cgcccgcggc 300ggcggaggcg caggcggtgg cgagtgggtg agtgaggagg cggcatcctg gcgggtggct 360gtttggggtt cggctgccgg gaagaggcgc gggtagaagc gggggctctc ctcagagctc 420gacgcatttt tactttccct ctcatttctc tgaccgaagc tgggtgtcgg gctttcgcct 480ctagcgactg gtggaattgc ctgcatccgg gccccgggct tcccggcggc ggcggcggcg 540gcggcggcgc agggacaagg gatggggatc tggcctcttc cttgctttcc cgccctcagt 600acccgagctg tctccttccc ggggacccgc tgggagcgct gccgctgcgg gctcgagaaa 660agggagcctc gggtactgag aggcctcgcc tgggggaagg ccggagggtg ggcggcgcgc 720ggcttctgcg gaccaagtcg gggttcgcta ggaacccgag acggtccctg ccggcgagga 780gatcatgcgg gatgagatgg gggtgtggag acgcctgcac aatttcagcc caagcttcta 840gagagtggtg atgacttgca tatgagggca gcaatgcaag tcggtgtgct ccccattctg 900tgggacatga cctggttgct tcacagctcc gagatgacac agacttgctt aaaggaagtg 960actc 96431528DNAHomo sapiens 3gggtctagca agagcaggtg tgggtttagg aggtgtgtgt ttttgttttt cccaccctct 60ctccccacta cttgctctca cagtactcgc tgagggtgaa caagaaaaga cctgataaag 120attaaccaga agaaaacaag gagggaaaca accgcagcct gtagcaagct ctggaactca 180ggagtcgcgc gctatgcgat cgccgtctcg gggccggggc cggggccggg gccggggccg 240gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 300gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 360gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 420gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 480gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 540gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 600gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 660gggccggggc 
cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 720gggccggggc cggggccggg gccggggccg gggccggggc cgagaccctc gagggccggc 780cgctagcgcg atcgcggggc gtggtcgggg cgggcccggg ggcgggcccg gggcggggct 840gcggttgcgg tgcctgcgcc cgcggcggcg gaggcgcagg cggtggcgag tgggtgagtg 900aggaggcggc atcctggcgg gtggctgttt ggggttcggc tgccgggaag aggcgcgggt 960agaagcgggg gctctcctca gagctcgacg catttttact ttccctctca tttctctgac 1020cgaagctggg tgtcgggctt tcgcctctag cgactggtgg aattgcctgc atccgggccc 1080cgggcttccc ggcggcggcg gcggcggcgg cggcgcaggg acaagggatg gggatctggc 1140ctcttccttg ctttcccgcc ctcagtaccc gagctgtctc cttcccgggg acccgctggg 1200agcgctgccg ctgcgggctc gagaaaaggg agcctcgggt actgagaggc ctcgcctggg 1260ggaaggccgg agggtgggcg gcgcgcggct tctgcggacc aagtcggggt tcgctaggaa 1320cccgagacgg tccctgccgg cgaggagatc atgcgggatg agatgggggt gtggagacgc 1380ctgcacaatt tcagcccaag cttctagaga gtggtgatga cttgcatatg agggcagcaa 1440tgcaagtcgg tgtgctcccc attctgtggg acatgacctg gttgcttcac agctccgaga 1500tgacacagac ttgcttaaag gaagtgac 152843621DNAArtificial Sequence8026 insert nucleic acid without homology arms 4gggtctagca agagcaggtg tgggtttagg aggtgtgtgt ttttgttttt cccaccctct 60ctccccacta cttgctctca cagtactcgc tgagggtgaa caagaaaaga cctgataaag 120attaaccaga agaaaacaag gagggaaaca accgcagcct gtagcaagct ctggaactca 180ggagtcgcgc gctatgcgat cgcggggccg gggccggggc cgcgatcgcg gggcgtggtc 240ggggcgggcc cgggggcggg cccggggcgg ggctgcggtt gcggtgcctg cgcccgcggc 300ggcggaggcg caggcggtgg cgagtgggtg agtgaggagg cggcatcctg gcgggtggct 360gtttggggtt cggctgccgg gaagaggcgc gggtagaagc gggggctctc ctcagagctc 420gacgcatttt tactttccct ctcatttctc tgaccgaagc tgggtgtcgg gctttcgcct 480ctagcgactg gtggaattgc ctgcatccgg gccccgggct tcccggcggc ggcggcggcg 540gcggcggcgc agggacaagg gatggggatc tggcctcttc cttgctttcc cgccctcagt 600acccgagctg tctccttccc ggggacccgc tgggagcgct gccgctgcgg gctcgagaaa 660agggagcctc gggtactgag aggcctcgcc tgggggaagg ccggagggtg ggcggcgcgc 720ggcttctgcg gaccaagtcg gggttcgcta ggaacccgag acggtccctg ccggcgagga 780gatcatgcgg gatgagatgg 
gggtgtggag acgcctgcac aatttcagcc caagcttcta 840gagagtggtg atgacttgca tatgagggca gcaatgcaag tcggtgtgct ccccattctg 900tgggacatga cctggttgct tcacagctcc gagatgacac agacttgctt aaaggaagtg 960actcgagata acttcgtata atgtatgcta tacgaagtta tatgcatggc ctccgcgccg 1020ggttttggcg cctcccgcgg gcgcccccct cctcacggcg agcgctgcca cgtcagacga 1080agggcgcagc gagcgtcctg atccttccgc ccggacgctc aggacagcgg cccgctgctc 1140ataagactcg gccttagaac cccagtatca gcagaaggac attttaggac gggacttggg 1200tgactctagg gcactggttt tctttccaga gagcggaaca ggcgaggaaa agtagtccct 1260tctcggcgat tctgcggagg gatctccgtg gggcggtgaa cgccgatgat tatataagga 1320cgcgccgggt gtggcacagc tagttccgtc gcagccggga tttgggtcgc ggttcttgtt 1380tgtggatcgc tgtgatcgtc acttggtgag tagcgggctg ctgggctggc cggggctttc 1440gtggccgccg ggccgctcgg tgggacggaa gcgtgtggag agaccgccaa gggctgtagt 1500ctgggtccgc gagcaaggtt gccctgaact gggggttggg gggagcgcag caaaatggcg 1560gctgttcccg agtcttgaat ggaagacgct tgtgaggcgg gctgtgaggt cgttgaaaca 1620aggtgggggg catggtgggc ggcaagaacc caaggtcttg aggccttcgc taatgcggga 1680aagctcttat tcgggtgaga tgggctgggg caccatctgg ggaccctgac gtgaagtttg 1740tcactgactg gagaactcgg tttgtcgtct gttgcggggg cggcagttat ggcggtgccg 1800ttgggcagtg cacccgtacc tttgggagcg cgcgccctcg tcgtgtcgtg acgtcacccg 1860ttctgttggc ttataatgca gggtggggcc acctgccggt aggtgtgcgg taggcttttc 1920tccgtcgcag gacgcagggt tcgggcctag ggtaggctct cctgaatcga caggcgccgg 1980acctctggtg aggggaggga taagtgaggc gtcagtttct ttggtcggtt ttatgtacct 2040atcttcttaa gtagctgaag ctccggtttt gaactatgcg ctcggggttg gcgagtgtgt 2100tttgtgaagt tttttaggca ccttttgaaa tgtaatcatt tgggtcaata tgtaattttc 2160agtgttagac tagtaaattg tccgctaaat tctggccgtt tttggctttt ttgttagacg 2220tgttgacaat taatcatcgg catagtatat cggcatagta taatacgaca aggtgaggaa 2280ctaaaccatg ggatcggcca ttgaacaaga tggattgcac gcaggttctc cggccgcttg 2340ggtggagagg ctattcggct atgactgggc acaacagaca atcggctgct ctgatgccgc 2400cgtgttccgg ctgtcagcgc aggggcgccc ggttcttttt gtcaagaccg acctgtccgg 2460tgccctgaat gaactgcagg acgaggcagc gcggctatcg tggctggcca 
cgacgggcgt 2520tccttgcgca gctgtgctcg acgttgtcac tgaagcggga agggactggc tgctattggg 2580cgaagtgccg gggcaggatc tcctgtcatc tcaccttgct cctgccgaga aagtatccat 2640catggctgat gcaatgcggc ggctgcatac gcttgatccg gctacctgcc cattcgacca 2700ccaagcgaaa catcgcatcg agcgagcacg tactcggatg gaagccggtc ttgtcgatca 2760ggatgatctg gacgaagagc atcaggggct cgcgccagcc gaactgttcg ccaggctcaa 2820ggcgcgcatg cccgacggcg atgatctcgt cgtgacccat ggcgatgcct gcttgccgaa 2880tatcatggtg gaaaatggcc gcttttctgg attcatcgac tgtggccggc tgggtgtggc 2940ggaccgctat caggacatag cgttggctac ccgtgatatt gctgaagagc ttggcggcga 3000atgggctgac cgcttcctcg tgctttacgg tatcgccgct cccgattcgc agcgcatcgc 3060cttctatcgc cttcttgacg agttcttctg aggggatccg ctgtaagtct gcagaaattg 3120atgatctatt aaacaataaa gatgtccact aaaatggaag tttttcctgt catactttgt 3180taagaagggt gagaacagag tacctacatt ttgaatggaa ggattggagc tacgggggtg 3240ggggtggggt gggattagat aaatgcctgc tctttactga aggctcttta ctattgcttt 3300atgataatgt ttcatagttg gatatcataa tttaaacaag caaaaccaaa ttaagggcca 3360gctcattcct cccactcatg atctatagat ctatagatct ctcgtgggat cattgttttt 3420ctcttgattc ccactttgtg gttctaagta ctgtggtttc caaatgtgtc agtttcatag 3480cctgaagaac gagatcagca gcctctgttc cacatacact tcattctcag tattgttttg 3540ccaagttcta attccatcag acctcgacct gcagccccta gataacttcg tataatgtat 3600gctatacgaa gttatgctag c 362151006DNAArtificial Sequence8026 insert nucleic acid without homology arms and after excision of neo 5gggtctagca agagcaggtg tgggtttagg aggtgtgtgt ttttgttttt cccaccctct 60ctccccacta cttgctctca cagtactcgc tgagggtgaa caagaaaaga cctgataaag 120attaaccaga agaaaacaag gagggaaaca accgcagcct gtagcaagct ctggaactca 180ggagtcgcgc gctatgcgat cgcggggccg gggccggggc cgcgatcgcg gggcgtggtc 240ggggcgggcc cgggggcggg cccggggcgg ggctgcggtt gcggtgcctg cgcccgcggc 300ggcggaggcg caggcggtgg cgagtgggtg agtgaggagg cggcatcctg gcgggtggct 360gtttggggtt cggctgccgg gaagaggcgc gggtagaagc gggggctctc ctcagagctc 420gacgcatttt tactttccct ctcatttctc tgaccgaagc tgggtgtcgg gctttcgcct 480ctagcgactg gtggaattgc ctgcatccgg gccccgggct 
tcccggcggc ggcggcggcg 540gcggcggcgc agggacaagg gatggggatc tggcctcttc cttgctttcc cgccctcagt 600acccgagctg tctccttccc ggggacccgc tgggagcgct gccgctgcgg gctcgagaaa 660agggagcctc gggtactgag aggcctcgcc tgggggaagg ccggagggtg ggcggcgcgc 720ggcttctgcg gaccaagtcg gggttcgcta ggaacccgag acggtccctg ccggcgagga 780gatcatgcgg gatgagatgg gggtgtggag acgcctgcac aatttcagcc caagcttcta 840gagagtggtg atgacttgca tatgagggca gcaatgcaag tcggtgtgct ccccattctg 900tgggacatga cctggttgct tcacagctcc gagatgacac agacttgctt aaaggaagtg 960actcgagata acttcgtata atgtatgcta tacgaagtta tgctag 100664180DNAArtificial Sequence8026 Insert Nucleic acid without homology arms plus neo cassette 6ggtctagcaa gagcaggtgt gggtttagga ggtgtgtgtt tttgtttttc ccaccctctc 60tccccactac ttgctctcac agtactcgct gagggtgaac aagaaaagac ctgataaaga 120ttaaccagaa gaaaacaagg agggaaacaa ccgcagcctg tagcaagctc tggaactcag 180gagtcgcgcg ctatgcgatc gccgtctcgg ggccggggcc ggggccgggg ccggggccgg 240ggccggggcc ggggccgggg ccggggccgg ggccggggcc ggggccgggg ccggggccgg 300ggccggggcc ggggccgggg ccggggccgg ggccggggcc ggggccgggg ccggggccgg 360ggccggggcc ggggccgggg ccggggccgg ggccggggcc ggggccgggg ccggggccgg 420ggccggggcc ggggccgggg ccggggccgg ggccggggcc ggggccgggg ccggggccgg 480ggccggggcc ggggccgggg ccggggccgg ggccggggcc ggggccgggg ccggggccgg 540ggccggggcc ggggccgggg ccggggccgg ggccggggcc ggggccgggg ccggggccgg 600ggccggggcc ggggccgggg ccggggccgg ggccggggcc ggggccgggg ccggggccgg 660ggccggggcc ggggccgggg ccggggccgg ggccggggcc ggggccgggg ccggggccgg 720ggccggggcc ggggccgggg ccggggccgg ggccggggcc gagaccctcg agggccggcc 780gctagcgcga tcgcggggcg tggtcggggc gggcccgggg gcgggcccgg ggcggggctg 840cggttgcggt gcctgcgccc gcggcggcgg aggcgcaggc ggtggcgagt gggtgagtga 900ggaggcggca tcctggcggg tggctgtttg gggttcggct gccgggaaga ggcgcgggta 960gaagcggggg ctctcctcag agctcgacgc atttttactt tccctctcat ttctctgacc 1020gaagctgggt gtcgggcttt cgcctctagc gactggtgga attgcctgca tccgggcccc 1080gggcttcccg gcggcggcgg cggcggcggc ggcgcaggga caagggatgg ggatctggcc 1140tcttccttgc tttcccgccc 
tcagtacccg agctgtctcc ttcccgggga cccgctggga 1200gcgctgccgc tgcgggctcg agaaaaggga gcctcgggta ctgagaggcc tcgcctgggg 1260gaaggccgga gggtgggcgg cgcgcggctt ctgcggacca agtcggggtt cgctaggaac 1320ccgagacggt ccctgccggc gaggagatca tgcgggatga gatgggggtg tggagacgcc 1380tgcacaattt cagcccaagc ttctagagag tggtgatgac ttgcatatga gggcagcaat 1440gcaagtcggt gtgctcccca ttctgtggga catgacctgg ttgcttcaca gctccgagat 1500gacacagact tgcttaaagg aagtgactcg agataacttc gtataatgta tgctatacga 1560agttatatgc atggcctccg cgccgggttt tggcgcctcc cgcgggcgcc cccctcctca 1620cggcgagcgc tgccacgtca gacgaagggc gcagcgagcg tcctgatcct tccgcccgga 1680cgctcaggac agcggcccgc tgctcataag actcggcctt agaaccccag tatcagcaga 1740aggacatttt aggacgggac ttgggtgact ctagggcact ggttttcttt ccagagagcg 1800gaacaggcga ggaaaagtag tcccttctcg gcgattctgc ggagggatct ccgtggggcg 1860gtgaacgccg atgattatat aaggacgcgc cgggtgtggc acagctagtt ccgtcgcagc 1920cgggatttgg gtcgcggttc ttgtttgtgg atcgctgtga tcgtcacttg gtgagtagcg 1980ggctgctggg ctggccgggg ctttcgtggc cgccgggccg ctcggtggga cggaagcgtg 2040tggagagacc gccaagggct gtagtctggg tccgcgagca aggttgccct gaactggggg 2100ttggggggag cgcagcaaaa tggcggctgt tcccgagtct tgaatggaag acgcttgtga 2160ggcgggctgt gaggtcgttg aaacaaggtg gggggcatgg tgggcggcaa gaacccaagg 2220tcttgaggcc ttcgctaatg cgggaaagct cttattcggg tgagatgggc tggggcacca 2280tctggggacc ctgacgtgaa gtttgtcact gactggagaa ctcggtttgt cgtctgttgc 2340gggggcggca gttatggcgg tgccgttggg cagtgcaccc gtacctttgg gagcgcgcgc 2400cctcgtcgtg tcgtgacgtc acccgttctg ttggcttata atgcagggtg gggccacctg 2460ccggtaggtg tgcggtaggc ttttctccgt cgcaggacgc agggttcggg cctagggtag 2520gctctcctga atcgacaggc gccggacctc tggtgagggg agggataagt gaggcgtcag 2580tttctttggt cggttttatg tacctatctt cttaagtagc tgaagctccg gttttgaact 2640atgcgctcgg ggttggcgag tgtgttttgt gaagtttttt aggcaccttt tgaaatgtaa 2700tcatttgggt caatatgtaa ttttcagtgt tagactagta aattgtccgc taaattctgg 2760ccgtttttgg cttttttgtt agacgtgttg acaattaatc atcggcatag tatatcggca 2820tagtataata cgacaaggtg aggaactaaa ccatgggatc ggccattgaa 
caagatggat 2880tgcacgcagg ttctccggcc gcttgggtgg agaggctatt cggctatgac tgggcacaac 2940agacaatcgg ctgctctgat gccgccgtgt tccggctgtc agcgcagggg cgcccggttc 3000tttttgtcaa gaccgacctg tccggtgccc tgaatgaact gcaggacgag gcagcgcggc 3060tatcgtggct ggccacgacg ggcgttcctt gcgcagctgt gctcgacgtt gtcactgaag 3120cgggaaggga ctggctgcta ttgggcgaag tgccggggca ggatctcctg tcatctcacc 3180ttgctcctgc cgagaaagta tccatcatgg ctgatgcaat gcggcggctg catacgcttg 3240atccggctac ctgcccattc gaccaccaag cgaaacatcg catcgagcga gcacgtactc 3300ggatggaagc cggtcttgtc gatcaggatg atctggacga agagcatcag gggctcgcgc 3360cagccgaact gttcgccagg ctcaaggcgc gcatgcccga cggcgatgat ctcgtcgtga 3420cccatggcga tgcctgcttg ccgaatatca tggtggaaaa tggccgcttt tctggattca 3480tcgactgtgg ccggctgggt gtggcggacc gctatcagga catagcgttg gctacccgtg 3540atattgctga agagcttggc ggcgaatggg ctgaccgctt cctcgtgctt tacggtatcg 3600ccgctcccga ttcgcagcgc atcgccttct atcgccttct tgacgagttc ttctgagggg 3660atccgctgta agtctgcaga aattgatgat ctattaaaca ataaagatgt ccactaaaat 3720ggaagttttt cctgtcatac tttgttaaga agggtgagaa cagagtacct acattttgaa 3780tggaaggatt ggagctacgg gggtgggggt ggggtgggat tagataaatg cctgctcttt 3840actgaaggct ctttactatt gctttatgat aatgtttcat agttggatat cataatttaa 3900acaagcaaaa ccaaattaag ggccagctca ttcctcccac tcatgatcta tagatctata 3960gatctctcgt gggatcattg tttttctctt gattcccact ttgtggttct aagtactgtg 4020gtttccaaat gtgtcagttt catagcctga agaacgagat cagcagcctc tgttccacat 4080acacttcatt ctcagtattg ttttgccaag ttctaattcc atcagacctc gacctgcagc 4140ccctagataa cttcgtataa tgtatgctat acgaagttat 418071566DNAArtificial Sequence8028 insert nucleic acid with lox site after excision of neo and without homology arms 7gggtctagca agagcaggtg tgggtttagg aggtgtgtgt ttttgttttt cccaccctct 60ctccccacta cttgctctca cagtactcgc tgagggtgaa caagaaaaga cctgataaag 120attaaccaga agaaaacaag gagggaaaca accgcagcct gtagcaagct ctggaactca 180ggagtcgcgc gctatgcgat cgccgtctcg gggccggggc cggggccggg gccggggccg 240gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 
300gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 360gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 420gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 480gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 540gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 600gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 660gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg 720gggccggggc cggggccggg gccggggccg gggccggggc cgagaccctc gagggccggc 780cgctagcgcg atcgcggggc gtggtcgggg cgggcccggg ggcgggcccg gggcggggct 840gcggttgcgg tgcctgcgcc cgcggcggcg gaggcgcagg cggtggcgag tgggtgagtg 900aggaggcggc atcctggcgg gtggctgttt ggggttcggc tgccgggaag aggcgcgggt 960agaagcgggg gctctcctca gagctcgacg catttttact ttccctctca tttctctgac 1020cgaagctggg tgtcgggctt tcgcctctag cgactggtgg aattgcctgc atccgggccc 1080cgggcttccc ggcggcggcg gcggcggcgg cggcgcaggg acaagggatg gggatctggc 1140ctcttccttg ctttcccgcc ctcagtaccc gagctgtctc cttcccgggg acccgctggg 1200agcgctgccg ctgcgggctc gagaaaaggg agcctcgggt actgagaggc ctcgcctggg 1260ggaaggccgg agggtgggcg gcgcgcggct tctgcggacc aagtcggggt tcgctaggaa 1320cccgagacgg tccctgccgg cgaggagatc atgcgggatg agatgggggt gtggagacgc 1380ctgcacaatt tcagcccaag cttctagaga gtggtgatga cttgcatatg agggcagcaa 1440tgcaagtcgg tgtgctcccc attctgtggg acatgacctg gttgcttcac agctccgaga 1500tgacacagac ttgcttaaag gaagtgactc gagataactt cgtataatgt atgctatacg 1560aagtta 156683821DNAArtificial Sequence8026 targeting nucleic acid with homology arms and neo cassette 8gaaccgcggc gcgtcaagca gagacgagtt ccgcccacgt gaaagatggc gtttgtagtg 60acagccatcc caattgccct ttccttctag gtggaaagtg gggtctagca agagcaggtg 120tgggtttagg aggtgtgtgt ttttgttttt cccaccctct ctccccacta cttgctctca 180cagtactcgc tgagggtgaa caagaaaaga cctgataaag attaaccaga agaaaacaag 240gagggaaaca accgcagcct gtagcaagct ctggaactca ggagtcgcgc gctatgcgat 300cgcggggccg gggccggggc cgcgatcgcg gggcgtggtc ggggcgggcc cgggggcggg 360cccggggcgg ggctgcggtt 
gcggtgcctg cgcccgcggc ggcggaggcg caggcggtgg 420cgagtgggtg agtgaggagg cggcatcctg gcgggtggct gtttggggtt cggctgccgg 480gaagaggcgc gggtagaagc gggggctctc ctcagagctc gacgcatttt tactttccct 540ctcatttctc tgaccgaagc tgggtgtcgg gctttcgcct ctagcgactg gtggaattgc 600ctgcatccgg gccccgggct tcccggcggc ggcggcggcg gcggcggcgc agggacaagg 660gatggggatc tggcctcttc cttgctttcc cgccctcagt acccgagctg tctccttccc 720ggggacccgc tgggagcgct gccgctgcgg gctcgagaaa agggagcctc gggtactgag 780aggcctcgcc tgggggaagg ccggagggtg ggcggcgcgc ggcttctgcg gaccaagtcg 840gggttcgcta ggaacccgag acggtccctg ccggcgagga gatcatgcgg gatgagatgg 900gggtgtggag acgcctgcac aatttcagcc caagcttcta gagagtggtg atgacttgca 960tatgagggca gcaatgcaag tcggtgtgct ccccattctg tgggacatga cctggttgct 1020tcacagctcc gagatgacac agacttgctt aaaggaagtg actcgagata acttcgtata 1080atgtatgcta tacgaagtta tatgcatggc ctccgcgccg ggttttggcg cctcccgcgg 1140gcgcccccct cctcacggcg agcgctgcca cgtcagacga agggcgcagc gagcgtcctg 1200atccttccgc ccggacgctc aggacagcgg cccgctgctc ataagactcg gccttagaac 1260cccagtatca gcagaaggac attttaggac gggacttggg tgactctagg gcactggttt 1320tctttccaga gagcggaaca ggcgaggaaa agtagtccct tctcggcgat tctgcggagg 1380gatctccgtg

gggcggtgaa cgccgatgat tatataagga cgcgccgggt gtggcacagc 1440tagttccgtc gcagccggga tttgggtcgc ggttcttgtt tgtggatcgc tgtgatcgtc 1500acttggtgag tagcgggctg ctgggctggc cggggctttc gtggccgccg ggccgctcgg 1560tgggacggaa gcgtgtggag agaccgccaa gggctgtagt ctgggtccgc gagcaaggtt 1620gccctgaact gggggttggg gggagcgcag caaaatggcg gctgttcccg agtcttgaat 1680ggaagacgct tgtgaggcgg gctgtgaggt cgttgaaaca aggtgggggg catggtgggc 1740ggcaagaacc caaggtcttg aggccttcgc taatgcggga aagctcttat tcgggtgaga 1800tgggctgggg caccatctgg ggaccctgac gtgaagtttg tcactgactg gagaactcgg 1860tttgtcgtct gttgcggggg cggcagttat ggcggtgccg ttgggcagtg cacccgtacc 1920tttgggagcg cgcgccctcg tcgtgtcgtg acgtcacccg ttctgttggc ttataatgca 1980gggtggggcc acctgccggt aggtgtgcgg taggcttttc tccgtcgcag gacgcagggt 2040tcgggcctag ggtaggctct cctgaatcga caggcgccgg acctctggtg aggggaggga 2100taagtgaggc gtcagtttct ttggtcggtt ttatgtacct atcttcttaa gtagctgaag 2160ctccggtttt gaactatgcg ctcggggttg gcgagtgtgt tttgtgaagt tttttaggca 2220ccttttgaaa tgtaatcatt tgggtcaata tgtaattttc agtgttagac tagtaaattg 2280tccgctaaat tctggccgtt tttggctttt ttgttagacg tgttgacaat taatcatcgg 2340catagtatat cggcatagta taatacgaca aggtgaggaa ctaaaccatg ggatcggcca 2400ttgaacaaga tggattgcac gcaggttctc cggccgcttg ggtggagagg ctattcggct 2460atgactgggc acaacagaca atcggctgct ctgatgccgc cgtgttccgg ctgtcagcgc 2520aggggcgccc ggttcttttt gtcaagaccg acctgtccgg tgccctgaat gaactgcagg 2580acgaggcagc gcggctatcg tggctggcca cgacgggcgt tccttgcgca gctgtgctcg 2640acgttgtcac tgaagcggga agggactggc tgctattggg cgaagtgccg gggcaggatc 2700tcctgtcatc tcaccttgct cctgccgaga aagtatccat catggctgat gcaatgcggc 2760ggctgcatac gcttgatccg gctacctgcc cattcgacca ccaagcgaaa catcgcatcg 2820agcgagcacg tactcggatg gaagccggtc ttgtcgatca ggatgatctg gacgaagagc 2880atcaggggct cgcgccagcc gaactgttcg ccaggctcaa ggcgcgcatg cccgacggcg 2940atgatctcgt cgtgacccat ggcgatgcct gcttgccgaa tatcatggtg gaaaatggcc 3000gcttttctgg attcatcgac tgtggccggc tgggtgtggc ggaccgctat caggacatag 3060cgttggctac ccgtgatatt gctgaagagc ttggcggcga 
atgggctgac cgcttcctcg 3120tgctttacgg tatcgccgct cccgattcgc agcgcatcgc cttctatcgc cttcttgacg 3180agttcttctg aggggatccg ctgtaagtct gcagaaattg atgatctatt aaacaataaa 3240gatgtccact aaaatggaag tttttcctgt catactttgt taagaagggt gagaacagag 3300tacctacatt ttgaatggaa ggattggagc tacgggggtg ggggtggggt gggattagat 3360aaatgcctgc tctttactga aggctcttta ctattgcttt atgataatgt ttcatagttg 3420gatatcataa tttaaacaag caaaaccaaa ttaagggcca gctcattcct cccactcatg 3480atctatagat ctatagatct ctcgtgggat cattgttttt ctcttgattc ccactttgtg 3540gttctaagta ctgtggtttc caaatgtgtc agtttcatag cctgaagaac gagatcagca 3600gcctctgttc cacatacact tcattctcag tattgttttg ccaagttcta attccatcag 3660acctcgacct gcagccccta gataacttcg tataatgtat gctatacgaa gttatgctag 3720cattgtgact tgggcatcac ttgactgatg gtaatcagtt gcagagagag aagtgcactg 3780attaagtctg tccacacagg gtctgtctgg ccaggagtgc a 382194387DNAArtificial Sequence8026 insert nucleic acid with homology arms, hexanucleotide repeat(s) and neo cassette 9gaaccgcggc gcgtcaagca gagacgagtt ccgcccacgt gaaagatggc gtttgtagtg 60acagccatcc caattgccct ttccttctag gtggaaagtg gggtctagca agagcaggtg 120tgggtttagg aggtgtgtgt ttttgttttt cccaccctct ctccccacta cttgctctca 180cagtactcgc tgagggtgaa caagaaaaga cctgataaag attaaccaga agaaaacaag 240gagggaaaca accgcagcct gtagcaagct ctggaactca ggagtcgcgc gctatgcgat 300cgccgtctcg gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg 360gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg 420gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg 480gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg 540gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg 600gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg 660gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg 720gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg 780gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc cggggccggg 840gccggggccg gggccggggc cgagaccctc gagggccggc cgctagcgcg atcgcggggc 
900gtggtcgggg cgggcccggg ggcgggcccg gggcggggct gcggttgcgg tgcctgcgcc 960cgcggcggcg gaggcgcagg cggtggcgag tgggtgagtg aggaggcggc atcctggcgg 1020gtggctgttt ggggttcggc tgccgggaag aggcgcgggt agaagcgggg gctctcctca 1080gagctcgacg catttttact ttccctctca tttctctgac cgaagctggg tgtcgggctt 1140tcgcctctag cgactggtgg aattgcctgc atccgggccc cgggcttccc ggcggcggcg 1200gcggcggcgg cggcgcaggg acaagggatg gggatctggc ctcttccttg ctttcccgcc 1260ctcagtaccc gagctgtctc cttcccgggg acccgctggg agcgctgccg ctgcgggctc 1320gagaaaaggg agcctcgggt actgagaggc ctcgcctggg ggaaggccgg agggtgggcg 1380gcgcgcggct tctgcggacc aagtcggggt tcgctaggaa cccgagacgg tccctgccgg 1440cgaggagatc atgcgggatg agatgggggt gtggagacgc ctgcacaatt tcagcccaag 1500cttctagaga gtggtgatga cttgcatatg agggcagcaa tgcaagtcgg tgtgctcccc 1560attctgtggg acatgacctg gttgcttcac agctccgaga tgacacagac ttgcttaaag 1620gaagtgactc gagataactt cgtataatgt atgctatacg aagttatatg catggcctcc 1680gcgccgggtt ttggcgcctc ccgcgggcgc ccccctcctc acggcgagcg ctgccacgtc 1740agacgaaggg cgcagcgagc gtcctgatcc ttccgcccgg acgctcagga cagcggcccg 1800ctgctcataa gactcggcct tagaacccca gtatcagcag aaggacattt taggacggga 1860cttgggtgac tctagggcac tggttttctt tccagagagc ggaacaggcg aggaaaagta 1920gtcccttctc ggcgattctg cggagggatc tccgtggggc ggtgaacgcc gatgattata 1980taaggacgcg ccgggtgtgg cacagctagt tccgtcgcag ccgggatttg ggtcgcggtt 2040cttgtttgtg gatcgctgtg atcgtcactt ggtgagtagc gggctgctgg gctggccggg 2100gctttcgtgg ccgccgggcc gctcggtggg acggaagcgt gtggagagac cgccaagggc 2160tgtagtctgg gtccgcgagc aaggttgccc tgaactgggg gttgggggga gcgcagcaaa 2220atggcggctg ttcccgagtc ttgaatggaa gacgcttgtg aggcgggctg tgaggtcgtt 2280gaaacaaggt ggggggcatg gtgggcggca agaacccaag gtcttgaggc cttcgctaat 2340gcgggaaagc tcttattcgg gtgagatggg ctggggcacc atctggggac cctgacgtga 2400agtttgtcac tgactggaga actcggtttg tcgtctgttg cgggggcggc agttatggcg 2460gtgccgttgg gcagtgcacc cgtacctttg ggagcgcgcg ccctcgtcgt gtcgtgacgt 2520cacccgttct gttggcttat aatgcagggt ggggccacct gccggtaggt gtgcggtagg 2580cttttctccg tcgcaggacg cagggttcgg 
gcctagggta ggctctcctg aatcgacagg 2640cgccggacct ctggtgaggg gagggataag tgaggcgtca gtttctttgg tcggttttat 2700gtacctatct tcttaagtag ctgaagctcc ggttttgaac tatgcgctcg gggttggcga 2760gtgtgttttg tgaagttttt taggcacctt ttgaaatgta atcatttggg tcaatatgta 2820attttcagtg ttagactagt aaattgtccg ctaaattctg gccgtttttg gcttttttgt 2880tagacgtgtt gacaattaat catcggcata gtatatcggc atagtataat acgacaaggt 2940gaggaactaa accatgggat cggccattga acaagatgga ttgcacgcag gttctccggc 3000cgcttgggtg gagaggctat tcggctatga ctgggcacaa cagacaatcg gctgctctga 3060tgccgccgtg ttccggctgt cagcgcaggg gcgcccggtt ctttttgtca agaccgacct 3120gtccggtgcc ctgaatgaac tgcaggacga ggcagcgcgg ctatcgtggc tggccacgac 3180gggcgttcct tgcgcagctg tgctcgacgt tgtcactgaa gcgggaaggg actggctgct 3240attgggcgaa gtgccggggc aggatctcct gtcatctcac cttgctcctg ccgagaaagt 3300atccatcatg gctgatgcaa tgcggcggct gcatacgctt gatccggcta cctgcccatt 3360cgaccaccaa gcgaaacatc gcatcgagcg agcacgtact cggatggaag ccggtcttgt 3420cgatcaggat gatctggacg aagagcatca ggggctcgcg ccagccgaac tgttcgccag 3480gctcaaggcg cgcatgcccg acggcgatga tctcgtcgtg acccatggcg atgcctgctt 3540gccgaatatc atggtggaaa atggccgctt ttctggattc atcgactgtg gccggctggg 3600tgtggcggac cgctatcagg acatagcgtt ggctacccgt gatattgctg aagagcttgg 3660cggcgaatgg gctgaccgct tcctcgtgct ttacggtatc gccgctcccg attcgcagcg 3720catcgccttc tatcgccttc ttgacgagtt cttctgaggg gatccgctgt aagtctgcag 3780aaattgatga tctattaaac aataaagatg tccactaaaa tggaagtttt tcctgtcata 3840ctttgttaag aagggtgaga acagagtacc tacattttga atggaaggat tggagctacg 3900ggggtggggg tggggtggga ttagataaat gcctgctctt tactgaaggc tctttactat 3960tgctttatga taatgtttca tagttggata tcataattta aacaagcaaa accaaattaa 4020gggccagctc attcctccca ctcatgatct atagatctat agatctctcg tgggatcatt 4080gtttttctct tgattcccac tttgtggttc taagtactgt ggtttccaaa tgtgtcagtt 4140tcatagcctg aagaacgaga tcagcagcct ctgttccaca tacacttcat tctcagtatt 4200gttttgccaa gttctaattc catcagacct cgacctgcag cccctagata acttcgtata 4260atgtatgcta tacgaagtta tgctagcatt gtgacttggg catcacttga ctgatggtaa 
4320tcagttgcag agagagaagt gcactgatta agtctgtcca cacagggtct gtctggccag 4380gagtgca 4387101957DNAHomo sapiens 10acgtaaccta cggtgtcccg ctaggaaaga gaggtgcgtc aaacagcgac aagttccgcc 60cacgtaaaag atgacgcttg atatctccgg agcatttgga taatgtgaca gttggaatgc 120agtgatgtcg actctttgcc caccgccatc tccagctgtt gccaagacag agattgcttt 180aagtggcaaa tcacctttat tagcagctac ttttgcttac tgggacaata ttcttggtcc 240tagagtaagg cacatttggg ctccaaagac agaacaggta cttctcagtg atggagaaat 300aacttttctt gccaaccaca ctctaaatgg agaaatcctt cgaaatgcag agagtggtgc 360tatagatgta aagttttttg tcttgtctga aaagggagtg attattgttt cattaatctt 420tgatggaaac tggaatgggg atcgcagcac atatggacta tcaattatac ttccacagac 480agaacttagt ttctacctcc cacttcatag agtgtgtgtt gatagattaa cacatataat 540ccggaaagga agaatatgga tgcataagga aagacaagaa aatgtccaga agattatctt 600agaaggcaca gagagaatgg aagatcaggg tcagagtatt attccaatgc ttactggaga 660agtgattcct gtaatggaac tgctttcatc tatgaaatca cacagtgttc ctgaagaaat 720agatatagct gatacagtac tcaatgatga tgatattggt gacagctgtc atgaaggctt 780tcttctcaag taagaatttt tcttttcata aaagctggat gaagcagata ccatcttatg 840ctcacctatg acaagatttg gaagaaagaa aataacagac tgtctactta gattgttcta 900gggacattac gtatttgaac tgttgcttaa atttgtgtta tttttcactc attatatttc 960tatatatatt tggtgttatt ccatttgcta tttaaagaaa ccgagtttcc atcccagaca 1020agaaatcatg gccccttgct tgattctggt ttcttgtttt acttctcatt aaagctaaca 1080gaatcctttc atattaagtt gtactgtaga tgaacttaag ttatttaggc gtagaacaaa 1140attattcata tttatactga tctttttcca tccagcagtg gagtttagta cttaagagtt 1200tgtgccctta aaccagactc cctggattaa tgctgtgtac ccgtgggcaa ggtgcctgaa 1260ttctctatac acctatttcc tcatctgtaa aatggcaata atagtaatag tacctaatgt 1320gtagggttgt tataagcatt gagtaagata aataatataa agcacttaga acagtgcctg 1380gaacataaaa acacttaata atagctcata gctaacattt cctatttaca tttcttctag 1440aaatagccag tatttgttga gtgcctacat gttagttcct ttactagttg ctttacatgt 1500attatcttat attctgtttt aaagtttctt cacagttaca gattttcatg aaattttact 1560tttaataaaa gagaagtaaa agtataaagt attcactttt atgttcacag tcttttcctt 1620taggctcatg 
atggagtatc agaggcatga gtgtgtttaa cctaagagcc ttaatggctt 1680gaatcagaag cactttagtc ctgtatctgt tcagtgtcag cctttcatac atcattttaa 1740atcccatttg actttaagta agtcacttaa tctctctaca tgtcaatttc ttcagctata 1800aaatgatggt atttcaataa ataaatacat taattaaatg atattatact gactaattgg 1860gctgttttaa ggctcaataa gaaaatttct gtgaaaggtc tctagaaaat gtaggttcct 1920atacaaataa aagataacat tgtgcttata aaaaaaa 195711222PRTHomo sapiens 11Met Ser Thr Leu Cys Pro Pro Pro Ser Pro Ala Val Ala Lys Thr Glu1 5 10 15Ile Ala Leu Ser Gly Lys Ser Pro Leu Leu Ala Ala Thr Phe Ala Tyr 20 25 30Trp Asp Asn Ile Leu Gly Pro Arg Val Arg His Ile Trp Ala Pro Lys 35 40 45Thr Glu Gln Val Leu Leu Ser Asp Gly Glu Ile Thr Phe Leu Ala Asn 50 55 60His Thr Leu Asn Gly Glu Ile Leu Arg Asn Ala Glu Ser Gly Ala Ile65 70 75 80Asp Val Lys Phe Phe Val Leu Ser Glu Lys Gly Val Ile Ile Val Ser 85 90 95Leu Ile Phe Asp Gly Asn Trp Asn Gly Asp Arg Ser Thr Tyr Gly Leu 100 105 110Ser Ile Ile Leu Pro Gln Thr Glu Leu Ser Phe Tyr Leu Pro Leu His 115 120 125Arg Val Cys Val Asp Arg Leu Thr His Ile Ile Arg Lys Gly Arg Ile 130 135 140Trp Met His Lys Glu Arg Gln Glu Asn Val Gln Lys Ile Ile Leu Glu145 150 155 160Gly Thr Glu Arg Met Glu Asp Gln Gly Gln Ser Ile Ile Pro Met Leu 165 170 175Thr Gly Glu Val Ile Pro Val Met Glu Leu Leu Ser Ser Met Lys Ser 180 185 190His Ser Val Pro Glu Glu Ile Asp Ile Ala Asp Thr Val Leu Asn Asp 195 200 205Asp Asp Ile Gly Asp Ser Cys His Glu Gly Phe Leu Leu Lys 210 215 220123261DNAHomo sapiens 12gggcggggct gcggttgcgg tgcctgcgcc cgcggcggcg gaggcgcagg cggtggcgag 60tggatatctc cggagcattt ggataatgtg acagttggaa tgcagtgatg tcgactcttt 120gcccaccgcc atctccagct gttgccaaga cagagattgc tttaagtggc aaatcacctt 180tattagcagc tacttttgct tactgggaca atattcttgg tcctagagta aggcacattt 240gggctccaaa gacagaacag gtacttctca gtgatggaga aataactttt cttgccaacc 300acactctaaa tggagaaatc cttcgaaatg cagagagtgg tgctatagat gtaaagtttt 360ttgtcttgtc tgaaaaggga gtgattattg tttcattaat ctttgatgga aactggaatg 420gggatcgcag cacatatgga ctatcaatta tacttccaca gacagaactt 
agtttctacc 480tcccacttca tagagtgtgt gttgatagat taacacatat aatccggaaa ggaagaatat 540ggatgcataa ggaaagacaa gaaaatgtcc agaagattat cttagaaggc acagagagaa 600tggaagatca gggtcagagt attattccaa tgcttactgg agaagtgatt cctgtaatgg 660aactgctttc atctatgaaa tcacacagtg ttcctgaaga aatagatata gctgatacag 720tactcaatga tgatgatatt ggtgacagct gtcatgaagg ctttcttctc aatgccatca 780gctcacactt gcaaacctgt ggctgttccg ttgtagtagg tagcagtgca gagaaagtaa 840ataagatagt cagaacatta tgcctttttc tgactccagc agagagaaaa tgctccaggt 900tatgtgaagc agaatcatca tttaaatatg agtcagggct ctttgtacaa ggcctgctaa 960aggattcaac tggaagcttt gtgctgcctt tccggcaagt catgtatgct ccatatccca 1020ccacacacat agatgtggat gtcaatactg tgaagcagat gccaccctgt catgaacata 1080tttataatca gcgtagatac atgagatccg agctgacagc cttctggaga gccacttcag 1140aagaagacat ggctcaggat acgatcatct acactgacga aagctttact cctgatttga 1200atatttttca agatgtctta cacagagaca ctctagtgaa agccttcctg gatcaggtct 1260ttcagctgaa acctggctta tctctcagaa gtactttcct tgcacagttt ctacttgtcc 1320ttcacagaaa agccttgaca ctaataaaat atatagaaga cgatacgcag aagggaaaaa 1380agccctttaa atctcttcgg aacctgaaga tagaccttga tttaacagca gagggcgatc 1440ttaacataat aatggctctg gctgagaaaa ttaaaccagg cctacactct tttatctttg 1500gaagaccttt ctacactagt gtgcaagaac gagatgttct aatgactttt taaatgtgta 1560acttaataag cctattccat cacaatcatg atcgctggta aagtagctca gtggtgtggg 1620gaaacgttcc cctggatcat actccagaat tctgctctca gcaattgcag ttaagtaagt 1680tacactacag ttctcacaag agcctgtgag gggatgtcag gtgcatcatt acattgggtg 1740tctcttttcc tagatttatg cttttgggat acagacctat gtttacaata taataaatat 1800tattgctatc ttttaaagat ataataatag gatgtaaact tgaccacaac tactgttttt 1860ttgaaataca tgattcatgg tttacatgtg tcaaggtgaa atctgagttg gcttttacag 1920atagttgact ttctatcttt tggcattctt tggtgtgtag aattactgta atacttctgc 1980aatcaactga aaactagagc ctttaaatga tttcaattcc acagaaagaa agtgagcttg 2040aacataggat gagctttaga aagaaaattg atcaagcaga tgtttaattg gaattgatta 2100ttagatccta ctttgtggat ttagtccctg ggattcagtc tgtagaaatg tctaatagtt 2160ctctatagtc cttgttcctg gtgaaccaca 
gttagggtgt tttgtttatt ttattgttct 2220tgctattgtt gatattctat gtagttgagc tctgtaaaag gaaattgtat tttatgtttt 2280agtaattgtt gccaactttt taaattaatt ttcattattt ttgagccaaa ttgaaatgtg 2340cacctcctgt gccttttttc tccttagaaa atctaattac ttggaacaag ttcagatttc 2400actggtcagt cattttcatc ttgttttctt cttgctaagt cttaccatgt acctgctttg 2460gcaatcattg caactctgag attataaaat gccttagaga atatactaac taataagatc 2520tttttttcag aaacagaaaa tagttccttg agtacttcct tcttgcattt ctgcctatgt 2580ttttgaagtt gttgctgttt gcctgcaata ggctataagg aatagcagga gaaattttac 2640tgaagtgctg ttttcctagg tgctactttg gcagagctaa gttatctttt gttttcttaa 2700tgcgtttgga ccattttgct ggctataaaa taactgatta atataattct aacacaatgt 2760tgacattgta gttacacaaa cacaaataaa tattttattt aaaattctgg aagtaatata 2820aaagggaaaa tatatttata agaaagggat aaaggtaata gagcccttct gccccccacc 2880caccaaattt acacaacaaa atgacatgtt cgaatgtgaa aggtcataat agctttccca 2940tcatgaatca gaaagatgtg gacagcttga tgttttagac aaccactgaa ctagatgact 3000gttgtactgt agctcagtca tttaaaaaat atataaatac taccttgtag tgtcccatac 3060tgtgtttttt acatggtaga ttcttattta agtgctaact ggttattttc tttggctggt 3120ttattgtact gttatacaga atgtaagttg tacagtgaaa taagttatta aagcatgtgt 3180aaacattgtt atatatcttt tctcctaaat ggagaatttt gaataaaata tatttgaaat 3240tttaaaaaaa aaaaaaaaaa a 326113481PRTHomo sapiens 13Met Ser Thr Leu Cys Pro Pro Pro Ser Pro Ala Val Ala Lys Thr Glu1 5 10 15Ile Ala Leu Ser Gly Lys Ser Pro Leu Leu Ala Ala Thr Phe Ala Tyr 20 25 30Trp Asp Asn Ile Leu Gly Pro Arg Val Arg His Ile Trp Ala Pro Lys 35 40 45Thr Glu Gln Val Leu Leu Ser Asp Gly Glu Ile Thr Phe Leu Ala Asn 50 55 60His Thr Leu Asn Gly Glu Ile Leu Arg Asn Ala Glu Ser Gly Ala Ile65 70 75 80Asp Val Lys Phe Phe Val Leu Ser Glu Lys Gly Val Ile Ile Val Ser 85 90 95Leu Ile Phe Asp Gly Asn Trp Asn Gly Asp Arg Ser Thr Tyr Gly Leu 100 105 110Ser Ile Ile Leu Pro Gln Thr Glu Leu Ser Phe Tyr Leu Pro Leu His 115 120 125Arg Val Cys Val Asp Arg Leu Thr His Ile Ile Arg Lys Gly Arg Ile 130 135 140Trp Met His Lys Glu Arg Gln Glu Asn Val Gln Lys Ile Ile Leu 
Glu145 150 155 160Gly Thr Glu Arg Met Glu Asp Gln Gly Gln Ser Ile Ile Pro Met Leu 165 170 175Thr Gly Glu Val Ile Pro Val Met Glu Leu Leu Ser Ser Met Lys Ser 180 185 190His Ser Val Pro Glu Glu Ile Asp Ile Ala Asp Thr Val Leu Asn Asp 195 200 205Asp Asp Ile Gly Asp Ser Cys His Glu Gly Phe Leu Leu Asn Ala Ile 210
215 220Ser Ser His Leu Gln Thr Cys Gly Cys Ser Val Val Val Gly Ser Ser225 230 235 240Ala Glu Lys Val Asn Lys Ile Val Arg Thr Leu Cys Leu Phe Leu Thr 245 250 255Pro Ala Glu Arg Lys Cys Ser Arg Leu Cys Glu Ala Glu Ser Ser Phe 260 265 270Lys Tyr Glu Ser Gly Leu Phe Val Gln Gly Leu Leu Lys Asp Ser Thr 275 280 285Gly Ser Phe Val Leu Pro Phe Arg Gln Val Met Tyr Ala Pro Tyr Pro 290 295 300Thr Thr His Ile Asp Val Asp Val Asn Thr Val Lys Gln Met Pro Pro305 310 315 320Cys His Glu His Ile Tyr Asn Gln Arg Arg Tyr Met Arg Ser Glu Leu 325 330 335Thr Ala Phe Trp Arg Ala Thr Ser Glu Glu Asp Met Ala Gln Asp Thr 340 345 350Ile Ile Tyr Thr Asp Glu Ser Phe Thr Pro Asp Leu Asn Ile Phe Gln 355 360 365Asp Val Leu His Arg Asp Thr Leu Val Lys Ala Phe Leu Asp Gln Val 370 375 380Phe Gln Leu Lys Pro Gly Leu Ser Leu Arg Ser Thr Phe Leu Ala Gln385 390 395 400Phe Leu Leu Val Leu His Arg Lys Ala Leu Thr Leu Ile Lys Tyr Ile 405 410 415Glu Asp Asp Thr Gln Lys Gly Lys Lys Pro Phe Lys Ser Leu Arg Asn 420 425 430Leu Lys Ile Asp Leu Asp Leu Thr Ala Glu Gly Asp Leu Asn Ile Ile 435 440 445Met Ala Leu Ala Glu Lys Ile Lys Pro Gly Leu His Ser Phe Ile Phe 450 455 460Gly Arg Pro Phe Tyr Thr Ser Val Gln Glu Arg Asp Val Leu Met Thr465 470 475 480Phe143356DNAHomo sapiens 14acgtaaccta cggtgtcccg ctaggaaaga gaggtgcgtc aaacagcgac aagttccgcc 60cacgtaaaag atgacgcttg gtgtgtcagc cgtccctgct gcccggttgc ttctcttttg 120ggggcggggt ctagcaagag caggtgtggg tttaggagat atctccggag catttggata 180atgtgacagt tggaatgcag tgatgtcgac tctttgccca ccgccatctc cagctgttgc 240caagacagag attgctttaa gtggcaaatc acctttatta gcagctactt ttgcttactg 300ggacaatatt cttggtccta gagtaaggca catttgggct ccaaagacag aacaggtact 360tctcagtgat ggagaaataa cttttcttgc caaccacact ctaaatggag aaatccttcg 420aaatgcagag agtggtgcta tagatgtaaa gttttttgtc ttgtctgaaa agggagtgat 480tattgtttca ttaatctttg atggaaactg gaatggggat cgcagcacat atggactatc 540aattatactt ccacagacag aacttagttt ctacctccca cttcatagag tgtgtgttga 600tagattaaca catataatcc ggaaaggaag aatatggatg cataaggaaa 
gacaagaaaa 660tgtccagaag attatcttag aaggcacaga gagaatggaa gatcagggtc agagtattat 720tccaatgctt actggagaag tgattcctgt aatggaactg ctttcatcta tgaaatcaca 780cagtgttcct gaagaaatag atatagctga tacagtactc aatgatgatg atattggtga 840cagctgtcat gaaggctttc ttctcaatgc catcagctca cacttgcaaa cctgtggctg 900ttccgttgta gtaggtagca gtgcagagaa agtaaataag atagtcagaa cattatgcct 960ttttctgact ccagcagaga gaaaatgctc caggttatgt gaagcagaat catcatttaa 1020atatgagtca gggctctttg tacaaggcct gctaaaggat tcaactggaa gctttgtgct 1080gcctttccgg caagtcatgt atgctccata tcccaccaca cacatagatg tggatgtcaa 1140tactgtgaag cagatgccac cctgtcatga acatatttat aatcagcgta gatacatgag 1200atccgagctg acagccttct ggagagccac ttcagaagaa gacatggctc aggatacgat 1260catctacact gacgaaagct ttactcctga tttgaatatt tttcaagatg tcttacacag 1320agacactcta gtgaaagcct tcctggatca ggtctttcag ctgaaacctg gcttatctct 1380cagaagtact ttccttgcac agtttctact tgtccttcac agaaaagcct tgacactaat 1440aaaatatata gaagacgata cgcagaaggg aaaaaagccc tttaaatctc ttcggaacct 1500gaagatagac cttgatttaa cagcagaggg cgatcttaac ataataatgg ctctggctga 1560gaaaattaaa ccaggcctac actcttttat ctttggaaga cctttctaca ctagtgtgca 1620agaacgagat gttctaatga ctttttaaat gtgtaactta ataagcctat tccatcacaa 1680tcatgatcgc tggtaaagta gctcagtggt gtggggaaac gttcccctgg atcatactcc 1740agaattctgc tctcagcaat tgcagttaag taagttacac tacagttctc acaagagcct 1800gtgaggggat gtcaggtgca tcattacatt gggtgtctct tttcctagat ttatgctttt 1860gggatacaga cctatgttta caatataata aatattattg ctatctttta aagatataat 1920aataggatgt aaacttgacc acaactactg tttttttgaa atacatgatt catggtttac 1980atgtgtcaag gtgaaatctg agttggcttt tacagatagt tgactttcta tcttttggca 2040ttctttggtg tgtagaatta ctgtaatact tctgcaatca actgaaaact agagccttta 2100aatgatttca attccacaga aagaaagtga gcttgaacat aggatgagct ttagaaagaa 2160aattgatcaa gcagatgttt aattggaatt gattattaga tcctactttg tggatttagt 2220ccctgggatt cagtctgtag aaatgtctaa tagttctcta tagtccttgt tcctggtgaa 2280ccacagttag ggtgttttgt ttattttatt gttcttgcta ttgttgatat tctatgtagt 2340tgagctctgt aaaaggaaat 
tgtattttat gttttagtaa ttgttgccaa ctttttaaat 2400taattttcat tatttttgag ccaaattgaa atgtgcacct cctgtgcctt ttttctcctt 2460agaaaatcta attacttgga acaagttcag atttcactgg tcagtcattt tcatcttgtt 2520ttcttcttgc taagtcttac catgtacctg ctttggcaat cattgcaact ctgagattat 2580aaaatgcctt agagaatata ctaactaata agatcttttt ttcagaaaca gaaaatagtt 2640ccttgagtac ttccttcttg catttctgcc tatgtttttg aagttgttgc tgtttgcctg 2700caataggcta taaggaatag caggagaaat tttactgaag tgctgttttc ctaggtgcta 2760ctttggcaga gctaagttat cttttgtttt cttaatgcgt ttggaccatt ttgctggcta 2820taaaataact gattaatata attctaacac aatgttgaca ttgtagttac acaaacacaa 2880ataaatattt tatttaaaat tctggaagta atataaaagg gaaaatatat ttataagaaa 2940gggataaagg taatagagcc cttctgcccc ccacccacca aatttacaca acaaaatgac 3000atgttcgaat gtgaaaggtc ataatagctt tcccatcatg aatcagaaag atgtggacag 3060cttgatgttt tagacaacca ctgaactaga tgactgttgt actgtagctc agtcatttaa 3120aaaatatata aatactacct tgtagtgtcc catactgtgt tttttacatg gtagattctt 3180atttaagtgc taactggtta ttttctttgg ctggtttatt gtactgttat acagaatgta 3240agttgtacag tgaaataagt tattaaagca tgtgtaaaca ttgttatata tcttttctcc 3300taaatggaga attttgaata aaatatattt gaaattttaa aaaaaaaaaa aaaaaa 335615481PRTHomo sapiens 15Met Ser Thr Leu Cys Pro Pro Pro Ser Pro Ala Val Ala Lys Thr Glu1 5 10 15Ile Ala Leu Ser Gly Lys Ser Pro Leu Leu Ala Ala Thr Phe Ala Tyr 20 25 30Trp Asp Asn Ile Leu Gly Pro Arg Val Arg His Ile Trp Ala Pro Lys 35 40 45Thr Glu Gln Val Leu Leu Ser Asp Gly Glu Ile Thr Phe Leu Ala Asn 50 55 60His Thr Leu Asn Gly Glu Ile Leu Arg Asn Ala Glu Ser Gly Ala Ile65 70 75 80Asp Val Lys Phe Phe Val Leu Ser Glu Lys Gly Val Ile Ile Val Ser 85 90 95Leu Ile Phe Asp Gly Asn Trp Asn Gly Asp Arg Ser Thr Tyr Gly Leu 100 105 110Ser Ile Ile Leu Pro Gln Thr Glu Leu Ser Phe Tyr Leu Pro Leu His 115 120 125Arg Val Cys Val Asp Arg Leu Thr His Ile Ile Arg Lys Gly Arg Ile 130 135 140Trp Met His Lys Glu Arg Gln Glu Asn Val Gln Lys Ile Ile Leu Glu145 150 155 160Gly Thr Glu Arg Met Glu Asp Gln Gly Gln Ser Ile Ile Pro Met Leu 165 170 
175Thr Gly Glu Val Ile Pro Val Met Glu Leu Leu Ser Ser Met Lys Ser 180 185 190His Ser Val Pro Glu Glu Ile Asp Ile Ala Asp Thr Val Leu Asn Asp 195 200 205Asp Asp Ile Gly Asp Ser Cys His Glu Gly Phe Leu Leu Asn Ala Ile 210 215 220Ser Ser His Leu Gln Thr Cys Gly Cys Ser Val Val Val Gly Ser Ser225 230 235 240Ala Glu Lys Val Asn Lys Ile Val Arg Thr Leu Cys Leu Phe Leu Thr 245 250 255Pro Ala Glu Arg Lys Cys Ser Arg Leu Cys Glu Ala Glu Ser Ser Phe 260 265 270Lys Tyr Glu Ser Gly Leu Phe Val Gln Gly Leu Leu Lys Asp Ser Thr 275 280 285Gly Ser Phe Val Leu Pro Phe Arg Gln Val Met Tyr Ala Pro Tyr Pro 290 295 300Thr Thr His Ile Asp Val Asp Val Asn Thr Val Lys Gln Met Pro Pro305 310 315 320Cys His Glu His Ile Tyr Asn Gln Arg Arg Tyr Met Arg Ser Glu Leu 325 330 335Thr Ala Phe Trp Arg Ala Thr Ser Glu Glu Asp Met Ala Gln Asp Thr 340 345 350Ile Ile Tyr Thr Asp Glu Ser Phe Thr Pro Asp Leu Asn Ile Phe Gln 355 360 365Asp Val Leu His Arg Asp Thr Leu Val Lys Ala Phe Leu Asp Gln Val 370 375 380Phe Gln Leu Lys Pro Gly Leu Ser Leu Arg Ser Thr Phe Leu Ala Gln385 390 395 400Phe Leu Leu Val Leu His Arg Lys Ala Leu Thr Leu Ile Lys Tyr Ile 405 410 415Glu Asp Asp Thr Gln Lys Gly Lys Lys Pro Phe Lys Ser Leu Arg Asn 420 425 430Leu Lys Ile Asp Leu Asp Leu Thr Ala Glu Gly Asp Leu Asn Ile Ile 435 440 445Met Ala Leu Ala Glu Lys Ile Lys Pro Gly Leu His Ser Phe Ile Phe 450 455 460Gly Arg Pro Phe Tyr Thr Ser Val Gln Glu Arg Asp Val Leu Met Thr465 470 475 480Phe163198DNAMus musculus 16gtgtccgggg cggggcggtc ccggggcggg gcccggagcg ggctgcggtt gcggtccctg 60cgccggcggt gaaggcgcag cagcggcgag tggctattgc aagcgttcgg ataatgtgag 120acctggaatg cagtgagacc tgggatgcag ggatgtcgac tatctgcccc ccaccatctc 180ctgctgttgc caagacagag attgctttaa gtggtgaatc acccttgttg gcggctacct 240ttgcttactg ggataatatt cttggtccta gagtaaggca tatttgggct ccaaagacag 300accaagtgct tctcagtgat ggagaaataa cttttcttgc caaccacact ctaaatggag 360aaattcttcg aaatgcagag agtggggcta tagatgtaaa attttttgtc ttatctgaaa 420aaggggtaat tattgtttca ttaatcttcg acggaaactg 
gaatggagat cggagcactt 480atggactatc aattatactg ccgcagacag agctgagctt ctacctccca cttcacagag 540tgtgtgttga caggctaaca cacattattc gaaaaggaag aatatggatg cataaggaaa 600gacaagaaaa tgtccagaaa attgtcttgg aaggcacaga gaggatggaa gatcagggtc 660agagtatcat tcccatgctt actggggaag tcattcctgt aatggagctg cttgcatcta 720tgaaatccca cagtgttcct gaagacattg atatagctga tacagtgctc aatgatgatg 780acattggtga cagctgtcac gaaggctttc ttctcaatgc catcagctca cacctgcaga 840cctgtggctg ttccgttgta gttggcagca gtgcagagaa agtaaataag atagtaagaa 900cgctgtgcct ttttctgaca ccagcagaga ggaaatgctc caggctgtgt gaagcagaat 960cgtcctttaa gtacgaatcg ggactctttg tgcaaggctt gctaaaggat gcaacaggca 1020gttttgtcct acccttccgg caagttatgt atgccccgta ccccaccacg cacattgatg 1080tggatgtcaa cactgtcaag cagatgccac cgtgtcatga acatatttat aatcaacgca 1140gatacatgag gtcagagctg acagccttct ggagggcaac ttcagaagag gacatggcgc 1200aggacaccat catctacaca gatgagagct tcactcctga tttgaatatt ttccaagatg 1260tcttacacag agacactcta gtgaaagcct tcctggatca ggtcttccat ttgaagcctg 1320gcctgtctct caggagtact ttccttgcac agttcctcct cattcttcac agaaaagcct 1380tgacactaat caagtacatc gaggatgata cgcagaaggg gaaaaagccc tttaagtctc 1440ttcggaacct gaagatagat cttgatttaa cagcagaggg cgatcttaac ataataatgg 1500ctctagctga gaaaattaag ccaggcctac actctttcat ctttgggaga cctttctaca 1560ctagtgtaca agaacgtgat gttctaatga ccttttgacc gtgtggtttg ctgtgtctgt 1620ctcttcacag tcacacctgc tgttacagtg tctcagcagt gtgtgggcac atccttcctc 1680ccgagtcctg ctgcaggaca gggtacacta cacttgtcag tagaagtctg tacctgatgt 1740caggtgcatc gttacagtga atgactcttc ctagaataga tgtactcttt tagggcctta 1800tgtttacaat tatcctaagt actattgctg tcttttaaag atatgaatga tggaatatac 1860acttgaccat aactgctgat tggttttttg ttttgttttg tttgttttct tggaaactta 1920tgattcctgg tttacatgta ccacactgaa accctcgtta gctttacaga taaagtgtga 1980gttgacttcc tgcccctctg tgttctgtgg tatgtccgat tacttctgcc acagctaaac 2040attagagcat ttaaagtttg cagttcctca gaaaggaact tagtctgact acagattagt 2100tcttgagaga agacactgat agggcagagc tgtaggtgaa atcagttgtt agcccttcct 2160ttatagacgt agtccttcag 
attcggtctg tacagaaatg ccgaggggtc atgcatgggc 2220cctgagtatc gtgacctgtg acaagttttt tgttggttta ttgtagttct gtcaaagaaa 2280gtggcatttg tttttataat tgttgccaac ttttaaggtt aattttcatt atttttgagc 2340cgaattaaaa tgcgcacctc ctgtgccttt cccaatcttg gaaaatataa tttcttggca 2400gagggtcaga tttcagggcc cagtcacttt catctgacca ccctttgcac ggctgccgtg 2460tgcctggctt agattagaag tccttgttaa gtatgtcaga gtacattcgc tgataagatc 2520tttgaagagc agggaagcgt cttgcctctt tcctttggtt tctgcctgta ctctggtgtt 2580tcccgtgtca cctgcatcat aggaacagca gagaaatctg acccagtgct atttttctag 2640gtgctactat ggcaaactca agtggtctgt ttctgttcct gtaacgttcg actatctcgc 2700tagctgtgaa gtactgatta gtggagttct gtgcaacagc agtgtaggag tatacacaaa 2760cacaaatatg tgtttctatt taaaactgtg gacttagcat aaaaagggag aatatattta 2820ttttttacaa aagggataaa aatgggcccc gttcctcacc caccagattt agcgagaaaa 2880agctttctat tctgaaaggt cacggtggct ttggcattac aaatcagaac aacacacact 2940gaccatgatg gcttgtgaac taactgcaag gcactccgtc atggtaagcg agtaggtccc 3000acctcctagt gtgccgctca ttgctttaca cagtagaatc ttatttgagt gctaattgtt 3060gtctttgctg ctttactgtg ttgttataga aaatgtaagc tgtacagtga ataagttatt 3120gaagcatgtg taaacactgt tatatatctt ttctcctaga tggggaattt tgaataaaat 3180acctttgaaa ttctgtgt 319817481PRTMus musculus 17Met Ser Thr Ile Cys Pro Pro Pro Ser Pro Ala Val Ala Lys Thr Glu1 5 10 15Ile Ala Leu Ser Gly Glu Ser Pro Leu Leu Ala Ala Thr Phe Ala Tyr 20 25 30Trp Asp Asn Ile Leu Gly Pro Arg Val Arg His Ile Trp Ala Pro Lys 35 40 45Thr Asp Gln Val Leu Leu Ser Asp Gly Glu Ile Thr Phe Leu Ala Asn 50 55 60His Thr Leu Asn Gly Glu Ile Leu Arg Asn Ala Glu Ser Gly Ala Ile65 70 75 80Asp Val Lys Phe Phe Val Leu Ser Glu Lys Gly Val Ile Ile Val Ser 85 90 95Leu Ile Phe Asp Gly Asn Trp Asn Gly Asp Arg Ser Thr Tyr Gly Leu 100 105 110Ser Ile Ile Leu Pro Gln Thr Glu Leu Ser Phe Tyr Leu Pro Leu His 115 120 125Arg Val Cys Val Asp Arg Leu Thr His Ile Ile Arg Lys Gly Arg Ile 130 135 140Trp Met His Lys Glu Arg Gln Glu Asn Val Gln Lys Ile Val Leu Glu145 150 155 160Gly Thr Glu Arg Met Glu Asp Gln Gly Gln Ser Ile 
Ile Pro Met Leu 165 170 175Thr Gly Glu Val Ile Pro Val Met Glu Leu Leu Ala Ser Met Lys Ser 180 185 190His Ser Val Pro Glu Asp Ile Asp Ile Ala Asp Thr Val Leu Asn Asp 195 200 205Asp Asp Ile Gly Asp Ser Cys His Glu Gly Phe Leu Leu Asn Ala Ile 210 215 220Ser Ser His Leu Gln Thr Cys Gly Cys Ser Val Val Val Gly Ser Ser225 230 235 240Ala Glu Lys Val Asn Lys Ile Val Arg Thr Leu Cys Leu Phe Leu Thr 245 250 255Pro Ala Glu Arg Lys Cys Ser Arg Leu Cys Glu Ala Glu Ser Ser Phe 260 265 270Lys Tyr Glu Ser Gly Leu Phe Val Gln Gly Leu Leu Lys Asp Ala Thr 275 280 285Gly Ser Phe Val Leu Pro Phe Arg Gln Val Met Tyr Ala Pro Tyr Pro 290 295 300Thr Thr His Ile Asp Val Asp Val Asn Thr Val Lys Gln Met Pro Pro305 310 315 320Cys His Glu His Ile Tyr Asn Gln Arg Arg Tyr Met Arg Ser Glu Leu 325 330 335Thr Ala Phe Trp Arg Ala Thr Ser Glu Glu Asp Met Ala Gln Asp Thr 340 345 350Ile Ile Tyr Thr Asp Glu Ser Phe Thr Pro Asp Leu Asn Ile Phe Gln 355 360 365Asp Val Leu His Arg Asp Thr Leu Val Lys Ala Phe Leu Asp Gln Val 370 375 380Phe His Leu Lys Pro Gly Leu Ser Leu Arg Ser Thr Phe Leu Ala Gln385 390 395 400Phe Leu Leu Ile Leu His Arg Lys Ala Leu Thr Leu Ile Lys Tyr Ile 405 410 415Glu Asp Asp Thr Gln Lys Gly Lys Lys Pro Phe Lys Ser Leu Arg Asn 420 425 430Leu Lys Ile Asp Leu Asp Leu Thr Ala Glu Gly Asp Leu Asn Ile Ile 435 440 445Met Ala Leu Ala Glu Lys Ile Lys Pro Gly Leu His Ser Phe Ile Phe 450 455 460Gly Arg Pro Phe Tyr Thr Ser Val Gln Glu Arg Asp Val Leu Met Thr465 470 475 480Phe183435DNARattus norvegicus 18cgtttgtagt gtcagccatc ccaattgcct gttccttctc tgtgggagtg gtgtctagac 60agtccaggca gggtatgcta ggcaggtgcg ttttggttgc ctcagatcgc aacttgactc 120cataacggtg accaaagaca aaagaaggaa accagattaa aaagaaccgg acacagaccc 180ctgcagaatc tggagcggcc gtggttgggg gcggggctac gacggggcgg actcgggggc 240gtgggagggc ggggccgggg cggggcccgg agccggctgc ggttgcggtc cctgcgccgg 300cggtgaaggc gcagcggcgg cgagtggcta ttgcaagcgt ttggataatg tgagacctgg 360gatgcaggga tgtcgactat ctgcccccca ccatctcctg ctgttgccaa gacagagatt 420gctttaagtg gtgaatcacc 
cttgttggcg gctacctttg cttactggga taatattctt 480ggtcctagag taaggcacat ttgggctcca aagacagacc aagtactcct cagtgatgga 540gaaatcactt ttcttgccaa ccacactctg aatggagaaa ttcttcggaa tgcggagagt 600ggggcaatag atgtaaagtt ttttgtctta tctgaaaagg gcgtcattat tgtttcatta 660atcttcgacg ggaactggaa cggagatcgg agcacttacg gactatcaat tatactgccg 720cagacggagc tgagtttcta cctcccactg cacagagtgt gtgttgacag gctaacgcac 780atcattcgaa aaggaaggat atggatgcac aaggaaagac aagaaaatgt ccagaaaatt 840gtcttggaag gcaccgagag gatggaagat cagggtcaga gtatcatccc tatgcttact 900ggggaggtca tccctgtgat ggagctgctt
gcgtctatga gatcacacag tgttcctgaa 960gacctcgata tagctgatac agtactcaat gatgatgaca ttggtgacag ctgtcatgaa 1020ggctttcttc tcaatgccat cagctcacat ctgcagacct gcggctgttc tgtggtggta 1080ggcagcagtg cagagaaagt aaataagata gtaagaacac tgtgcctttt tctgacacca 1140gcagagagga agtgctccag gctgtgtgaa gccgaatcgt cctttaaata cgaatctgga 1200ctctttgtac aaggcttgct aaaggatgcg actggcagtt ttgtactacc tttccggcaa 1260gttatgtatg ccccttatcc caccacacac atcgatgtgg atgtcaacac tgtcaagcag 1320atgccaccgt gtcatgaaca tatttataat caacgcagat acatgaggtc agagctgaca 1380gccttctgga gggcaacttc agaagaggac atggctcagg acaccatcat ctacacagat 1440gagagcttca ctcctgattt gaatattttc caagatgtct tacacagaga cactctagtg 1500aaagcctttc tggatcaggt cttccatttg aagcctggcc tgtctctcag gagtactttc 1560cttgcacagt tcctcctcat tcttcacaga aaagccttga cactaatcaa gtacatagag 1620gatgacacgc agaaggggaa aaagcccttt aagtctcttc ggaacctgaa gatagatctt 1680gatttaacag cagagggcga ccttaacata ataatggctc tagctgagaa aattaagcca 1740ggcctacact ctttcatctt cgggagacct ttctacacta gtgtccaaga acgtgatgtt 1800ctaatgactt tttaaacatg tggtttgctc cgtgtgtctc atgacagtca cacttgctgt 1860tacagtgtct cagcgctttg gacacatcct tcctccaggg tcctgccgca ggacacgtta 1920cactacactt gtcagtagag gtctgtacca gatgtcaggt acatcgttgt agtgaatgtc 1980tcttttccta gactagatgt accctcgtag ggacttatgt ttacaaccct cctaagtact 2040agtgctgtct tgtaaggata cgaatgaagg gatgtaaact tcaccacaac tgctggttgg 2100ttttgttgtt tttgtttttt gaaacttata attcatggtt tacatgcatc acactgaaac 2160cctagttagc tttttacagg taagctgtga gttgactgcc tgtccctgtg ttctctggcc 2220tgtacgatct gtggcgtgta ggatcacttt tgcaacaact aaaaactaaa gcactttgtt 2280tgcagttcta cagaaagcaa cttagtctgt ctgcagattc gtttttgaaa gaagacatga 2340gaaagcggag ttttaggtga agtcagttgt tggatcttcc tttatagact tagtccttta 2400gatgtggtct gtatagacat gcccaaccat catgcatggg cactgaatat cgtgaactgt 2460ggtatgcttt ttgttggttt attgtacttc tgtcaaagaa agtggcattg gtttttataa 2520ttgttgccaa gttttaaggt taattttcat tatttttgag ccaaattaaa atgtgcacct 2580cctgtgcctt tcccaatctt ggaaaatata atttcttggc agaaggtcag atttcagggc 
2640ccagtcactt tcgtctgact tccctttgca cagtccgcca tgggcctggc ttagaagttc 2700ttgtaaacta tgccagagag tacattcgct gataaaatct tctttgcaga gcaggagagc 2760ttcttgcctc tttcctttca tttctgcctg gactttggtg ttctccacgt tccctgcatc 2820ctaaggacag caggagaact ctgaccccag tgctatttct ctaggtgcta ttgtggcaaa 2880ctcaagcggt ccgtctctgt ccctgtaacg ttcgtacctt gctggctgtg aagtactgac 2940tggtaaagct ccgtgctaca gcagtgtagg gtatacacaa acacaagtaa gtgttttatt 3000taaaactgtg gacttagcat aaaaagggag actatattta ttttttacaa aagggataaa 3060aatggaaccc tttcctcacc caccagattt agtcagaaaa aaacattcta ttctgaaagg 3120tcacagtggt tttgacatga cacatcagaa caacgcacac tgtccatgat ggcttatgaa 3180ctccaagtca ctccatcatg gtaaatgggt agatccctcc ttctagtgtg ccacaccatt 3240gcttcccaca gtagaatctt atttaagtgc taagtgttgt ctctgctggt ttactctgtt 3300gttttagaga atgtaagttg tatagtgaat aagttattga agcatgtgta aacactgtta 3360tacatctttt ctcctagatg gggaatttgg aataaaatac ctttaaaatt caaaaaaaaa 3420aaaaaaaaaa aaaaa 343519481PRTRattus norvegicus 19Met Ser Thr Ile Cys Pro Pro Pro Ser Pro Ala Val Ala Lys Thr Glu1 5 10 15Ile Ala Leu Ser Gly Glu Ser Pro Leu Leu Ala Ala Thr Phe Ala Tyr 20 25 30Trp Asp Asn Ile Leu Gly Pro Arg Val Arg His Ile Trp Ala Pro Lys 35 40 45Thr Asp Gln Val Leu Leu Ser Asp Gly Glu Ile Thr Phe Leu Ala Asn 50 55 60His Thr Leu Asn Gly Glu Ile Leu Arg Asn Ala Glu Ser Gly Ala Ile65 70 75 80Asp Val Lys Phe Phe Val Leu Ser Glu Lys Gly Val Ile Ile Val Ser 85 90 95Leu Ile Phe Asp Gly Asn Trp Asn Gly Asp Arg Ser Thr Tyr Gly Leu 100 105 110Ser Ile Ile Leu Pro Gln Thr Glu Leu Ser Phe Tyr Leu Pro Leu His 115 120 125Arg Val Cys Val Asp Arg Leu Thr His Ile Ile Arg Lys Gly Arg Ile 130 135 140Trp Met His Lys Glu Arg Gln Glu Asn Val Gln Lys Ile Val Leu Glu145 150 155 160Gly Thr Glu Arg Met Glu Asp Gln Gly Gln Ser Ile Ile Pro Met Leu 165 170 175Thr Gly Glu Val Ile Pro Val Met Glu Leu Leu Ala Ser Met Arg Ser 180 185 190His Ser Val Pro Glu Asp Leu Asp Ile Ala Asp Thr Val Leu Asn Asp 195 200 205Asp Asp Ile Gly Asp Ser Cys His Glu Gly Phe Leu Leu Asn Ala Ile 210 215 
220Ser Ser His Leu Gln Thr Cys Gly Cys Ser Val Val Val Gly Ser Ser225 230 235 240Ala Glu Lys Val Asn Lys Ile Val Arg Thr Leu Cys Leu Phe Leu Thr 245 250 255Pro Ala Glu Arg Lys Cys Ser Arg Leu Cys Glu Ala Glu Ser Ser Phe 260 265 270Lys Tyr Glu Ser Gly Leu Phe Val Gln Gly Leu Leu Lys Asp Ala Thr 275 280 285Gly Ser Phe Val Leu Pro Phe Arg Gln Val Met Tyr Ala Pro Tyr Pro 290 295 300Thr Thr His Ile Asp Val Asp Val Asn Thr Val Lys Gln Met Pro Pro305 310 315 320Cys His Glu His Ile Tyr Asn Gln Arg Arg Tyr Met Arg Ser Glu Leu 325 330 335Thr Ala Phe Trp Arg Ala Thr Ser Glu Glu Asp Met Ala Gln Asp Thr 340 345 350Ile Ile Tyr Thr Asp Glu Ser Phe Thr Pro Asp Leu Asn Ile Phe Gln 355 360 365Asp Val Leu His Arg Asp Thr Leu Val Lys Ala Phe Leu Asp Gln Val 370 375 380Phe His Leu Lys Pro Gly Leu Ser Leu Arg Ser Thr Phe Leu Ala Gln385 390 395 400Phe Leu Leu Ile Leu His Arg Lys Ala Leu Thr Leu Ile Lys Tyr Ile 405 410 415Glu Asp Asp Thr Gln Lys Gly Lys Lys Pro Phe Lys Ser Leu Arg Asn 420 425 430Leu Lys Ile Asp Leu Asp Leu Thr Ala Glu Gly Asp Leu Asn Ile Ile 435 440 445Met Ala Leu Ala Glu Lys Ile Lys Pro Gly Leu His Ser Phe Ile Phe 450 455 460Gly Arg Pro Phe Tyr Thr Ser Val Gln Glu Arg Asp Val Leu Met Thr465 470 475 480Phe20100DNAMus musculus 20gaaccgcggc gcgtcaagca gagacgagtt ccgcccacgt gaaagatggc gtttgtagtg 60acagccatcc caattgccct ttccttctag gtggaaagtg 100212648DNAArtificial Sequencefloxed neo cassette of 8026 plus lox sites 21ataacttcgt ataatgtatg ctatacgaag ttatatgcat ggcctccgcg ccgggttttg 60gcgcctcccg cgggcgcccc cctcctcacg gcgagcgctg ccacgtcaga cgaagggcgc 120agcgagcgtc ctgatccttc cgcccggacg ctcaggacag cggcccgctg ctcataagac 180tcggccttag aaccccagta tcagcagaag gacattttag gacgggactt gggtgactct 240agggcactgg ttttctttcc agagagcgga acaggcgagg aaaagtagtc ccttctcggc 300gattctgcgg agggatctcc gtggggcggt gaacgccgat gattatataa ggacgcgccg 360ggtgtggcac agctagttcc gtcgcagccg ggatttgggt cgcggttctt gtttgtggat 420cgctgtgatc gtcacttggt gagtagcggg ctgctgggct ggccggggct ttcgtggccg 480ccgggccgct 
cggtgggacg gaagcgtgtg gagagaccgc caagggctgt agtctgggtc 540cgcgagcaag gttgccctga actgggggtt ggggggagcg cagcaaaatg gcggctgttc 600ccgagtcttg aatggaagac gcttgtgagg cgggctgtga ggtcgttgaa acaaggtggg 660gggcatggtg ggcggcaaga acccaaggtc ttgaggcctt cgctaatgcg ggaaagctct 720tattcgggtg agatgggctg gggcaccatc tggggaccct gacgtgaagt ttgtcactga 780ctggagaact cggtttgtcg tctgttgcgg gggcggcagt tatggcggtg ccgttgggca 840gtgcacccgt acctttggga gcgcgcgccc tcgtcgtgtc gtgacgtcac ccgttctgtt 900ggcttataat gcagggtggg gccacctgcc ggtaggtgtg cggtaggctt ttctccgtcg 960caggacgcag ggttcgggcc tagggtaggc tctcctgaat cgacaggcgc cggacctctg 1020gtgaggggag ggataagtga ggcgtcagtt tctttggtcg gttttatgta cctatcttct 1080taagtagctg aagctccggt tttgaactat gcgctcgggg ttggcgagtg tgttttgtga 1140agttttttag gcaccttttg aaatgtaatc atttgggtca atatgtaatt ttcagtgtta 1200gactagtaaa ttgtccgcta aattctggcc gtttttggct tttttgttag acgtgttgac 1260aattaatcat cggcatagta tatcggcata gtataatacg acaaggtgag gaactaaacc 1320atgggatcgg ccattgaaca agatggattg cacgcaggtt ctccggccgc ttgggtggag 1380aggctattcg gctatgactg ggcacaacag acaatcggct gctctgatgc cgccgtgttc 1440cggctgtcag cgcaggggcg cccggttctt tttgtcaaga ccgacctgtc cggtgccctg 1500aatgaactgc aggacgaggc agcgcggcta tcgtggctgg ccacgacggg cgttccttgc 1560gcagctgtgc tcgacgttgt cactgaagcg ggaagggact ggctgctatt gggcgaagtg 1620ccggggcagg atctcctgtc atctcacctt gctcctgccg agaaagtatc catcatggct 1680gatgcaatgc ggcggctgca tacgcttgat ccggctacct gcccattcga ccaccaagcg 1740aaacatcgca tcgagcgagc acgtactcgg atggaagccg gtcttgtcga tcaggatgat 1800ctggacgaag agcatcaggg gctcgcgcca gccgaactgt tcgccaggct caaggcgcgc 1860atgcccgacg gcgatgatct cgtcgtgacc catggcgatg cctgcttgcc gaatatcatg 1920gtggaaaatg gccgcttttc tggattcatc gactgtggcc ggctgggtgt ggcggaccgc 1980tatcaggaca tagcgttggc tacccgtgat attgctgaag agcttggcgg cgaatgggct 2040gaccgcttcc tcgtgcttta cggtatcgcc gctcccgatt cgcagcgcat cgccttctat 2100cgccttcttg acgagttctt ctgaggggat ccgctgtaag tctgcagaaa ttgatgatct 2160attaaacaat aaagatgtcc actaaaatgg aagtttttcc tgtcatactt 
tgttaagaag 2220ggtgagaaca gagtacctac attttgaatg gaaggattgg agctacgggg gtgggggtgg 2280ggtgggatta gataaatgcc tgctctttac tgaaggctct ttactattgc tttatgataa 2340tgtttcatag ttggatatca taatttaaac aagcaaaacc aaattaaggg ccagctcatt 2400cctcccactc atgatctata gatctataga tctctcgtgg gatcattgtt tttctcttga 2460ttcccacttt gtggttctaa gtactgtggt ttccaaatgt gtcagtttca tagcctgaag 2520aacgagatca gcagcctctg ttccacatac acttcattct cagtattgtt ttgccaagtt 2580ctaattccat cagacctcga cctgcagccc ctagataact tcgtataatg tatgctatac 2640gaagttat 264822100DNAMus musculus 22attgtgactt gggcatcact tgactgatgg taatcagttg cagagagaga agtgcactga 60ttaagtctgt ccacacaggg tctgtctggc caggagtgca 10023100DNAMus musculus 23gaaccgcggc gcgtcaagca gagacgagtt ccgcccacgt gaaagatggc gtttgtagtg 60acagccatcc caattgccct ttccttctag gtggaaagtg 100242648DNAArtificial Sequencefloxed cassette of 8028 plus lox sites 24ataacttcgt ataatgtatg ctatacgaag ttatatgcat ggcctccgcg ccgggttttg 60gcgcctcccg cgggcgcccc cctcctcacg gcgagcgctg ccacgtcaga cgaagggcgc 120agcgagcgtc ctgatccttc cgcccggacg ctcaggacag cggcccgctg ctcataagac 180tcggccttag aaccccagta tcagcagaag gacattttag gacgggactt gggtgactct 240agggcactgg ttttctttcc agagagcgga acaggcgagg aaaagtagtc ccttctcggc 300gattctgcgg agggatctcc gtggggcggt gaacgccgat gattatataa ggacgcgccg 360ggtgtggcac agctagttcc gtcgcagccg ggatttgggt cgcggttctt gtttgtggat 420cgctgtgatc gtcacttggt gagtagcggg ctgctgggct ggccggggct ttcgtggccg 480ccgggccgct cggtgggacg gaagcgtgtg gagagaccgc caagggctgt agtctgggtc 540cgcgagcaag gttgccctga actgggggtt ggggggagcg cagcaaaatg gcggctgttc 600ccgagtcttg aatggaagac gcttgtgagg cgggctgtga ggtcgttgaa acaaggtggg 660gggcatggtg ggcggcaaga acccaaggtc ttgaggcctt cgctaatgcg ggaaagctct 720tattcgggtg agatgggctg gggcaccatc tggggaccct gacgtgaagt ttgtcactga 780ctggagaact cggtttgtcg tctgttgcgg gggcggcagt tatggcggtg ccgttgggca 840gtgcacccgt acctttggga gcgcgcgccc tcgtcgtgtc gtgacgtcac ccgttctgtt 900ggcttataat gcagggtggg gccacctgcc ggtaggtgtg cggtaggctt ttctccgtcg 960caggacgcag ggttcgggcc tagggtaggc 
tctcctgaat cgacaggcgc cggacctctg 1020gtgaggggag ggataagtga ggcgtcagtt tctttggtcg gttttatgta cctatcttct 1080taagtagctg aagctccggt tttgaactat gcgctcgggg ttggcgagtg tgttttgtga 1140agttttttag gcaccttttg aaatgtaatc atttgggtca atatgtaatt ttcagtgtta 1200gactagtaaa ttgtccgcta aattctggcc gtttttggct tttttgttag acgtgttgac 1260aattaatcat cggcatagta tatcggcata gtataatacg acaaggtgag gaactaaacc 1320atgggatcgg ccattgaaca agatggattg cacgcaggtt ctccggccgc ttgggtggag 1380aggctattcg gctatgactg ggcacaacag acaatcggct gctctgatgc cgccgtgttc 1440cggctgtcag cgcaggggcg cccggttctt tttgtcaaga ccgacctgtc cggtgccctg 1500aatgaactgc aggacgaggc agcgcggcta tcgtggctgg ccacgacggg cgttccttgc 1560gcagctgtgc tcgacgttgt cactgaagcg ggaagggact ggctgctatt gggcgaagtg 1620ccggggcagg atctcctgtc atctcacctt gctcctgccg agaaagtatc catcatggct 1680gatgcaatgc ggcggctgca tacgcttgat ccggctacct gcccattcga ccaccaagcg 1740aaacatcgca tcgagcgagc acgtactcgg atggaagccg gtcttgtcga tcaggatgat 1800ctggacgaag agcatcaggg gctcgcgcca gccgaactgt tcgccaggct caaggcgcgc 1860atgcccgacg gcgatgatct cgtcgtgacc catggcgatg cctgcttgcc gaatatcatg 1920gtggaaaatg gccgcttttc tggattcatc gactgtggcc ggctgggtgt ggcggaccgc 1980tatcaggaca tagcgttggc tacccgtgat attgctgaag agcttggcgg cgaatgggct 2040gaccgcttcc tcgtgcttta cggtatcgcc gctcccgatt cgcagcgcat cgccttctat 2100cgccttcttg acgagttctt ctgaggggat ccgctgtaag tctgcagaaa ttgatgatct 2160attaaacaat aaagatgtcc actaaaatgg aagtttttcc tgtcatactt tgttaagaag 2220ggtgagaaca gagtacctac attttgaatg gaaggattgg agctacgggg gtgggggtgg 2280ggtgggatta gataaatgcc tgctctttac tgaaggctct ttactattgc tttatgataa 2340tgtttcatag ttggatatca taatttaaac aagcaaaacc aaattaaggg ccagctcatt 2400cctcccactc atgatctata gatctataga tctctcgtgg gatcattgtt tttctcttga 2460ttcccacttt gtggttctaa gtactgtggt ttccaaatgt gtcagtttca tagcctgaag 2520aacgagatca gcagcctctg ttccacatac acttcattct cagtattgtt ttgccaagtt 2580ctaattccat cagacctcga cctgcagccc ctagataact tcgtataatg tatgctatac 2640gaagttat 264825100DNAMus musculus 25attgtgactt gggcatcact tgactgatgg 
taatcagttg cagagagaga agtgcactga 60ttaagtctgt ccacacaggg tctgtctggc caggagtgca 10026680DNAMus musculus 26ccagtagcag cacccacgtc caccttctgt ctagtaatgt ccaacacctc cctcagtcca 60aacactgctc tgcatccatg tggctcccat ttatacctga agcacttgat ggggcctcaa 120tgttttacta gagcccaccc ccctgcaact ctgagaccct ctggatttgt ctgtcagtgc 180ctcactgggg cgttggataa tttcttaaaa ggtcaagttc cctcagcagc attctctgag 240cagtctgaag atgtgtgctt ttcacagttc aaatccatgt ggctgtttca cccacctgcc 300tggccttggg ttatctatca ggacctagcc tagaagcagg tgtgtggcac ttaacaccta 360agctgagtga ctaactgaac actcaagtgg atgccatctt tgtcacttct tgactgtgac 420acaagcaact cctgatgcca aagccctgcc cacccctctc atgcccatat ttggacatgg 480tacaggtcct cactggccat ggtctgtgag gtcctggtcc tctttgactt cataattcct 540aggggccact agtatctata agaggaagag ggtgctggct cccaggccac agcccacaaa 600attccacctg ctcacaggtt ggctggctcg acccaggtgg tgtcccctgc tctgagccag 660ctcccggcca agccagcacc 680271052DNAMus musculus 27tgccatcatc acaggatgtc cttccttctc cagaagacag actggggctg aaggaaaagc 60cggccaggct cagaacgagc cccactaatt actgcctcca acagctttcc actcactgcc 120cccagcccaa catccccttt ttaactggga agcattccta ctctccattg tacgcacacg 180ctcggaagcc tggctgtggg tttgggcatg agaggcaggg acaacaaaac cagtatatat 240gattataact ttttcctgtt tccctatttc caaatggtcg aaaggaggaa gttaggtcta 300cctaagctga atgtattcag ttagcaggag aaatgaaatc ctatacgttt aatactagag 360gagaaccgcc ttagaatatt tatttcattg gcaatgactc caggactaca cagcgaaatt 420gtattgcatg tgctgccaaa atactttagc tctttccttc gaagtacgtc ggatcctgta 480attgagacac cgagtttagg tgactagggt tttcttttga ggaggagtcc cccaccccgc 540cccgctctgc cgcgacagga agctagcgat ccggaggact tagaatacaa tcgtagtgtg 600ggtaaacatg gagggcaagc gcctgcaaag ggaagtaaga agattcccag tccttgttga 660aatccatttg caaacagagg aagctgccgc gggtcgcagt cggtgggggg aagccctgaa 720ccccacgctg cacggctggg ctggccaggt gcggccacgc ccccatcgcg gcggctggta 780ggagtgaatc agaccgtcag tattggtaaa gaagtctgcg gcagggcagg gagggggaag 840agtagtcagt cgctcgctca ctcgctcgct cgcacagaca ctgctgcagt gacactcggc 900cctccagtgt cgcggagacg caagagcagc gcgcagcacc 
tgtccgcccg gagcgagccc 960ggcccgcggc cgtagaaaag gagggaccgc cgaggtgcgc gtcagtactg ctcagcccgg 1020cagggacgcg ggaggatgtg gactgggtgg ac 1052282008DNAMus musculus 28gtggtgctga ctcagcatcg gttaataaac cctctgcagg aggctggatt tcttttgttt 60aattatcact tggacctttc tgagaactct taagaattgt tcattcgggt ttttttgttt 120tgttttggtt tggttttttt gggttttttt tttttttttt tttttggttt ttggagacag 180ggtttctctg tatatagccc tggcacaaga gcaagctaac agcctgtttc ttcttggtgc 240tagcgccccc tctggcagaa aatgaaataa caggtggacc tacaaccccc cccccccccc 300ccagtgtatt ctactcttgt ccccggtata aatttgattg ttccgaacta cataaattgt 360agaaggattt tttagatgca catatcattt tctgtgatac cttccacaca cccctccccc 420ccaaaaaaat ttttctggga aagtttcttg aaaggaaaac agaagaacaa gcctgtcttt 480atgattgagt tgggcttttg ttttgctgtg tttcatttct tcctgtaaac aaatactcaa 540atgtccactt cattgtatga ctaagttggt atcattaggt tgggtctggg tgtgtgaatg 600tgggtgtgga tctggatgtg ggtgggtgtg tatgccccgt gtgtttagaa tactagaaaa 660gataccacat cgtaaacttt tgggagagat gatttttaaa aatgggggtg ggggtgaggg 720gaacctgcga tgaggcaagc aagataaggg gaagacttga gtttctgtga tctaaaaagt 780cgctgtgatg ggatgctggc tataaatggg cccttagcag cattgtttct gtgaattgga 840ggatccctgc tgaaggcaaa agaccattga aggaagtacc gcatctggtt tgttttgtaa 900tgagaagcag gaatgcaagg tccacgctct taataataaa caaacaggac attgtatgcc 960atcatcacag gatgtccttc cttctccaga agacagactg gggctgaagg aaaagccggc 1020caggctcaga acgagcccca ctaattactg cctccaacag ctttccactc actgccccca 1080gcccaacatc ccctttttaa ctgggaagca ttcctactct ccattgtacg cacacgctcg 1140gaagcctggc tgtgggtttg ggcatgagag gcagggacaa caaaaccagt atatatgatt 1200ataacttttt cctgtttccc tatttccaaa tggtcgaaag gaggaagtta ggtctaccta 1260agctgaatgt attcagttag caggagaaat gaaatcctat acgtttaata ctagaggaga 1320accgccttag aatatttatt tcattggcaa tgactccagg actacacagc gaaattgtat 1380tgcatgtgct gccaaaatac tttagctctt tccttcgaag tacgtcggat cctgtaattg 1440agacaccgag tttaggtgac tagggttttc ttttgaggag gagtccccca ccccgccccg 1500ctctgccgcg acaggaagct agcgatccgg aggacttaga atacaatcgt agtgtgggta 1560aacatggagg gcaagcgcct gcaaagggaa 
gtaagaagat tcccagtcct tgttgaaatc
1620
catttgcaaa cagaggaagc tgccgcgggt cgcagtcggt ggggggaagc cctgaacccc 1680
acgctgcacg gctgggctgg ccaggtgcgg ccacgccccc atcgcggcgg ctggtaggag 1740
tgaatcagac cgtcagtatt ggtaaagaag tctgcggcag ggcagggagg gggaagagta 1800
gtcagtcgct cgctcactcg ctcgctcgca cagacactgc tgcagtgaca ctcggccctc 1860
cagtgtcgcg gagacgcaag agcagcgcgc agcacctgtc cgcccggagc gagcccggcc 1920
cgcggccgta gaaaaggagg gaccgccgag gtgcgcgtca gtactgctca gcccggcagg 1980
gacgcgggag gatgtggact gggtggac 2008

SEQ ID NO: 29; Length: 252; Type: DNA; Organism: Artificial Sequence; Other information: Southern blot probe
ccggggcggg gctgcggttg cggtgcctgc gcccgcggcg gcggaggcgc aggcggtggc 60
gagtgggtga gtgaggaggc ggcatcctgg cgggtggctg tttggggttc ggctgccggg 120
aagaggcgcg ggtagaagcg ggggctctcc tcagagctcg acgcattttt actttccctc 180
tcatttctct gaccgaagct gggtgtcggg ctttcgcctc tagcgactgg tggaattgcc 240
tgcatccggg cc 252

SEQ ID NO: 30; Length: 39; Type: DNA; Organism: Artificial Sequence; Other information: Asuragen 2-Primer Fwd
tgcgcctccg ccgccgcggg cgcaggcacc gcaaccgca 39

SEQ ID NO: 31; Length: 35; Type: DNA; Organism: Artificial Sequence; Other information: Asuragen 2-Primer Rev
cgcagcctgt agcaagctct ggaactcagg agtcg 35

SEQ ID NO: 32; Length: 36; Type: DNA; Organism: Artificial Sequence; Other information: Asuragen 3-Primer Fwd
atgcaggcaa ttccaccagt cgctagaggc gaaagc 36

SEQ ID NO: 33; Length: 40; Type: DNA; Organism: Artificial Sequence; Other information: Asuragen 3-Primer Rev
taaccagaag aaaacaagga gggaaacaac cgcagcctgt 40

SEQ ID NO: 34; Length: 158; Type: DNA; Organism: Homo sapiens
acgtaaccta cggtgtcccg ctaggaaaga gaggtgcgtc aaacagcgac aagttccgcc 60
cacgtaaaag atgacgcttg gtgtgtcagc cgtccctgct gcccggttgc ttctcttttg 120
ggggcggggt ctagcaagag caggtgtggg tttaggag 158

SEQ ID NO: 35; Length: 487; Type: DNA; Organism: Homo sapiens
tatctccgga gcatttggat aatgtgacag ttggaatgca gtgatgtcga ctctttgccc 60
accgccatct ccagctgttg ccaagacaga gattgcttta agtggcaaat cacctttatt 120
agcagctact tttgcttact gggacaatat tcttggtcct agagtaaggc acatttgggc 180
tccaaagaca gaacaggtac ttctcagtga tggagaaata acttttcttg ccaaccacac 240
tctaaatgga gaaatccttc gaaatgcaga gagtggtgct atagatgtaa agttttttgt 300
cttgtctgaa aagggagtga ttattgtttc attaatcttt gatggaaact ggaatgggga 360
tcgcagcaca tatggactat caattatact tccacagaca gaacttagtt tctacctccc 420
acttcataga gtgtgtgttg atagattaac acatataatc cggaaaggaa gaatatggat 480
gcataag 487

SEQ ID NO: 36; Length: 198; Type: DNA; Organism: Homo sapiens
gggtctagca agagcaggtg tgggtttagg aggtgtgtgt ttttgttttt cccaccctct 60
ctccccacta cttgctctca cagtactcgc tgagggtgaa caagaaaaga cctgataaag 120
attaaccaga agaaaacaag gagggaaaca accgcagcct gtagcaagct ctggaactca 180
ggagtcgcgc gctatgcg 198

SEQ ID NO: 37; Length: 118; Type: DNA; Organism: Homo sapiens
gcgatcgcgg ggcgtggtcg gggcgggccc gggggcgggc ccggggcggg gctgcggttg 60
cggtgcctgc gcccgcggcg gcggaggcgc aggcggtggc gagtgggtga gtgaggag 118

SEQ ID NO: 38; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
agtactgtga gagcaagtag 20

SEQ ID NO: 39; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gctctcacag tactcgctga 20

SEQ ID NO: 40; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
ccgcagcctg tagcaagctc 20

SEQ ID NO: 41; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
cggccgctag cgcgatcgcg 20

SEQ ID NO: 42; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
acgccccgcg atcgcgctag 20

SEQ ID NO: 43; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
tggcgagtgg gtgagtgagg 20

SEQ ID NO: 44; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
ggaagaggcg cgggtagaag 20

SEQ ID NO: 45; Length: 1302; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gaacttacgg agtcccacga gggaaccgcg gcgcgtcaag cagagacgag ttccgcccac 60
gtgaaagatg gcgtttgtag tgacagccat cccaattgcc ctttccttct aggtggaaag 120
tggggtctag caagagcagg tgtgggttta ggaggtgtgt gtttttgttt ttcccaccct 180
ctctccccac tacttgctct cacagtactc gctgagggtg aacaagaaaa gacctgataa 240
agattaacca gaagaaaaca aggagggaaa caaccgcagc ctgtagcaag ctctggaact 300
caggagtcgc gcgctatgcg atcgccgtct cggggccggg gccggggccg gggccggggc 360
cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc 420
cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc 480
cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc 540
cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc 600
cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc 660
cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc 720
cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc 780
cggggccggg gccggggccg gggccggggc cggggccggg gccggggccg gggccggggc 840
cggggccggg gccggggccg gggccggggc cggggccggg gccgagaccc tcgagggccg 900
gccgctagcg cgatcgcggg gcgtggtcgg ggcgggcccg ggggcgggcc cggggcgggg 960
ctgcggttgc ggtgcctgcg cccgcggcgg cggaggcgca ggcggtgcga gtgggtgagt 1020
gaggaggcgg catcctggcg ggtggctgtt tggggttcgg ctgccgggaa gaggcgcggg 1080
tagaagcggg ggctctcctc agagctcgac gcatttttac tttccctctc atttctctga 1140
ccgaagctgg gtgtcgggct ttcgcctcta gcgactggtg gaattgcctg catccgggcc 1200
ccgggcttcc cggcggcggc ggcggcggcg gcggcgcagg gacaagggat ggggatctgg 1260
cctcttcctt gctttcccgc cctcagtacc cgagctgtct cc 1302

SEQ ID NO: 46; Length: 21; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gagtactgtg agagcaagta g 21

SEQ ID NO: 47; Length: 21; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gccgcagcct gtagcaagct c 21

SEQ ID NO: 48; Length: 21; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gcggccgcta gcgcgatcgc g 21

SEQ ID NO: 49; Length: 21; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gacgccccgc gatcgcgcta g 21

SEQ ID NO: 50; Length: 21; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gtggcgagtg ggtgagtgag g 21

SEQ ID NO: 51; Length: 26; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
acaccgctct cacagtactc gctgag 26

SEQ ID NO: 52; Length: 27; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
acaccgccgc agcctgtagc aagctcg 27

SEQ ID NO: 53; Length: 27; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
acaccgagta ctgtgagagc aagtagg 27

SEQ ID NO: 54; Length: 27; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
acaccgacgc cccgcgatcg cgctagg 27

SEQ ID NO: 55; Length: 27; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
acaccgcggc cgctagcgcg atcgcgg 27

SEQ ID NO: 56; Length: 27; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
acaccgtggc gagtgggtga gtgaggg 27

SEQ ID NO: 57; Length: 26; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
acaccggaag aggcgcgggt agaagg 26

SEQ ID NO: 58; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gacgcgttaa tgccaacttt 20

SEQ ID NO: 59; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gagggcctat ttcccatgat 20

SEQ ID NO: 60; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gacgcgttaa tgccaacttt 20

SEQ ID NO: 61; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gaacttacgg agtcccacga 20

SEQ ID NO: 62; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
ggagacagct cgggtactga 20

SEQ ID NO: 63; Length: 82; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gttggaacca ttcaaaacag catagcaagt taaaataagg ctagtccgtt atcaacttga 60
aaaagtggca ccgagtcggt gc 82

SEQ ID NO: 64; Length: 76; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gttttagagc tagaaatagc aagttaaaat aaggctagtc cgttatcaac ttgaaaaagt 60
ggcaccgagt cggtgc 76

SEQ ID NO: 65; Length: 86; Type: DNA; Organism: Artificial Sequence; Other information: Synthetic
gtttaagagc tatgctggaa acagcatagc aagtttaaat aaggctagtc cgttatcaac 60
ttgaaaaagt ggcaccgagt cggtgc 86

SEQ ID NO: 66; Length: 19; Type: DNA; Organism: Artificial Sequence; Other information: Forward Primer
catcccaatt gccctttcc 19

SEQ ID NO: 67; Length: 21; Type: DNA; Organism: Artificial Sequence; Other information: Reverse Primer
cccacacctg ctcttgctag a 21

SEQ ID NO: 68; Length: 17; Type: DNA; Organism: Artificial Sequence; Other information: Probe
tctaggtgga aagtggg 17

SEQ ID NO: 69; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Forward Primer
gagcaggtgt gggtttagga 20

SEQ ID NO: 70; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Reverse Primer
ccaggtctca ctgcattcca 20

SEQ ID NO: 71; Length: 26; Type: DNA; Organism: Artificial Sequence; Other information: Probe
attgcaagcg ttcggataat gtgaga 26

SEQ ID NO: 72; Length: 21; Type: DNA; Organism: Artificial Sequence; Other information: Forward Primer
gctgtcacga aggctttctt c 21

SEQ ID NO: 73; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Reverse Primer
gcactgctgc caactacaac 20

SEQ ID NO: 74; Length: 24; Type: DNA; Organism: Artificial Sequence; Other information: Probe
tcaatgccat cagctcacac ctgc 24

SEQ ID NO: 75; Length: 17; Type: DNA; Organism: Artificial Sequence; Other information: Forward Primer
aagaggcgcg ggtagaa 17

SEQ ID NO: 76; Length: 22; Type: DNA; Organism: Artificial Sequence; Other information: Reverse Primer
cagcttcggt cagagaaatg ag 22

SEQ ID NO: 77; Length: 24; Type: DNA; Organism: Artificial Sequence; Other information: Probe
ctctcctcag agctcgacgc attt 24

SEQ ID NO: 78; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Forward Primer
ctgcacaatt tcagcccaag 20

SEQ ID NO: 79; Length: 20; Type: DNA; Organism: Artificial Sequence; Other information: Reverse Primer
caggtcatgt cccacagaat 20

SEQ ID NO: 80; Length: 24; Type: DNA; Organism: Artificial Sequence; Other information: Probe
catatgaggg cagcaatgca agtc 24

SEQ ID NO: 81; Length: 16; Type: DNA; Organism: Artificial Sequence; Other information: LNA Probe for sense G4C2 RNA; Feature: TYE563, location (1)..(1)
ccccggcccc ggcccc 16

SEQ ID NO: 82; Length: 22; Type: DNA; Organism: Artificial Sequence; Other information: LNA Probe for antisense G4C2 RNA; Feature: TYE563, location (1)..(1)
ggggccgggg ccggggggcc cc 22

SEQ ID NO: 83; Length: 18; Type: DNA; Organism: Artificial Sequence; Other information: DNA Probe for sense G4C2 RNA; Feature: Cy3, location (18)..(18)
ccccggcccc ggccccgg 18

SEQ ID NO: 84; Length: 17; Type: DNA; Organism: Artificial Sequence; Other information: DNA Probe for antisense G4C2 RNA; Feature: Cy3, location (17)..(17)
ggggccgggg ccggggc 17

* * * * *
