Arc-based Capsids And Uses Thereof

MALONE; Colin; et al.

Patent Application Summary

U.S. patent application number 17/473209 was filed with the patent office on September 13, 2021, and published on December 30, 2021, for arc-based capsids and uses thereof. This patent application is currently assigned to VNV NEWCO INC. The applicant listed for this patent is VNV NEWCO INC. The invention is credited to Jessica CRISP, Adam FRAITES, Zachary GILBERT, Colin MALONE, Ian PEIKON, and Andrey PISAREV.

Application Number: 17/473209 (Publication No. 20210403907)
Family ID: 1000005899865
Publication Date: 2021-12-30

United States Patent Application 20210403907
Kind Code A1
MALONE; Colin; et al., December 30, 2021

ARC-BASED CAPSIDS AND USES THEREOF

Abstract

Disclosed herein, in certain embodiments, are recombinant Arc and endogenous Gag polypeptides, and methods of using recombinant Arc and endogenous Gag polypeptides.


Inventors: MALONE; Colin; (Brooklyn, NY) ; PEIKON; Ian; (Bethpage, NY) ; GILBERT; Zachary; (Brooklyn, NY) ; PISAREV; Andrey; (Brooklyn, NY) ; FRAITES; Adam; (Long Beach Twp., NJ) ; CRISP; Jessica; (Phoenix, AZ)
Applicant: VNV NEWCO INC. (New York, NY, US)
Assignee: VNV NEWCO INC.

Family ID: 1000005899865
Appl. No.: 17/473209
Filed: September 13, 2021

Related U.S. Patent Documents

Application Number (Filing Date)
17277119
PCT/US2019/051786 (Sep 18, 2019)
17473209
62733015 (Sep 18, 2018)

Current U.S. Class: 1/1
Current CPC Class: C12N 2310/20 20170501; C12N 2310/141 20130101; C12N 2310/531 20130101; C12N 2310/3519 20130101; C12N 15/111 20130101; C12N 2320/32 20130101; C12N 2310/12 20130101
International Class: C12N 15/11 20060101 C12N015/11

Claims



1. A composition comprising: (a) a capsid that comprises an endogenous Gag polypeptide and a heterologous cargo, wherein the endogenous Gag polypeptide is not an Arc polypeptide; and (b) a delivery component.

2. The composition of claim 1, wherein the delivery component comprises a microvesicle or a microparticle.

3. The composition of claim 1, wherein the delivery component further comprises a fusogenic molecule.

4. The composition of claim 1, wherein the delivery component further comprises a cell-specific binding protein or an engineered protein that binds to an antigen or cell surface molecule.

5. The composition of claim 1, further comprising a second polypeptide that is fused to the endogenous Gag polypeptide, wherein the second polypeptide binds to a target receptor, antigen, or cell surface molecule.

6. The composition of claim 5, wherein the second polypeptide is an antibody or antigen-binding fragment thereof, a human protein, a viral protein, or an engineered protein.

7. The composition of claim 1, wherein the endogenous Gag polypeptide is a Paraneoplastic Ma antigen family polypeptide.

8. The composition of claim 1, wherein the endogenous Gag polypeptide is a retrotransposon Gag-like family polypeptide.

9. The composition of claim 8, wherein the retrotransposon Gag-like family polypeptide is a PEG10 polypeptide.

10. The composition of claim 1, wherein the heterologous cargo comprises a nucleic acid molecule that comprises or encodes a component of a CRISPR-Cas system, zinc finger nuclease (ZFN) system, or transcription activator-like effector nuclease (TALEN) system.

11. A composition comprising: (a) a capsid that comprises an endogenous Gag polypeptide, wherein the endogenous Gag polypeptide is not an Arc polypeptide; and (b) a delivery component that comprises a liposome or a micelle.

12. The composition of claim 11, wherein the delivery component comprises the liposome.

13. The composition of claim 12, wherein the liposome comprises a surface presented lipopeptide.

14. The composition of claim 11, wherein the delivery component comprises the micelle.

15. The composition of claim 14, wherein the micelle comprises a hydrophilic polymer, a hydrophilic copolymer, a pH-sensitive polymer, or a pH-sensitive copolymer.

16. The composition of claim 11, further comprising a fusogenic molecule.

17. The composition of claim 11, further comprising a cell-specific binding protein or an engineered protein that binds to an antigen or cell surface molecule.

18. The composition of claim 11, wherein the endogenous Gag polypeptide is a Paraneoplastic Ma antigen family polypeptide.

19. The composition of claim 11, wherein the endogenous Gag polypeptide is a retrotransposon Gag-like family polypeptide.

20. The composition of claim 19, wherein the retrotransposon Gag-like family polypeptide is a PEG10 polypeptide.

21. The composition of claim 11, further comprising a nucleic acid molecule that comprises or encodes a component of a CRISPR-Cas system, zinc finger nuclease (ZFN) system, or transcription activator-like effector nuclease (TALEN) system.

22. A composition comprising: (a) a capsid that comprises a recombinant endogenous Gag polypeptide and a cargo, wherein the recombinant endogenous Gag polypeptide is not an Arc polypeptide; and (b) a delivery component.

23. The composition of claim 22, wherein the delivery component comprises a microvesicle or a microparticle.

24. The composition of claim 22, wherein the delivery component comprises a fusogenic molecule.

25. The composition of claim 22, wherein the delivery component comprises a cell-specific binding protein or an engineered protein that binds to an antigen or cell surface molecule.

26. The composition of claim 22, wherein the recombinant endogenous Gag polypeptide is a Paraneoplastic Ma antigen family polypeptide.

27. The composition of claim 22, wherein the recombinant endogenous Gag polypeptide is a retrotransposon Gag-like family polypeptide.

28. The composition of claim 27, wherein the retrotransposon Gag-like family polypeptide is a PEG10 polypeptide.

29. The composition of claim 22, wherein the cargo comprises a nucleic acid molecule that comprises or encodes a component of a CRISPR-Cas system, zinc finger nuclease (ZFN) system, or transcription activator-like effector nuclease (TALEN) system.

30. A method of preparing a composition suitable for delivery of a cargo, the method comprising: (a) isolating an endogenous Gag polypeptide, wherein the endogenous Gag polypeptide is not an Arc polypeptide; (b) incubating the endogenous Gag polypeptide with the cargo in conditions suitable for capsid formation, thereby packaging the cargo in a capsid that comprises the endogenous Gag polypeptide; and (c) formulating the capsid with a delivery component that comprises a liposome or a micelle.
Description



CROSS REFERENCE

[0001] This application is a continuation of U.S. patent application Ser. No. 17/277,119, filed Mar. 17, 2021, which is a national phase entry of International Application No. PCT/US2019/051786, filed Sep. 18, 2019, which claims the benefit of U.S. Provisional Patent Application No. 62/733,015, filed Sep. 18, 2018, each of which is incorporated herein by reference in its entirety.

SEQUENCE LISTING STATEMENT

[0002] The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jan. 15, 2020, is named 54838_702_303_SL.txt and is 148,382 bytes in size.

SUMMARY OF THE DISCLOSURE

[0003] Disclosed herein, in certain embodiments, are recombinant and engineered Arc polypeptides and recombinant and engineered endogenous Gag (endo-Gag) polypeptides. In some embodiments, also included are Arc-based capsids and endo-Gag-based capsids, either loaded or empty, and methods of preparing the capsids. Also included are methods of delivering the Arc-based capsids and endo-Gag-based capsids to a site of interest.

[0004] Disclosed herein, in certain embodiments, is a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide and a therapeutic agent. In some embodiments, the therapeutic agent is a nucleic acid. In some embodiments, the nucleic acid is an RNA. In some embodiments, the recombinant Arc polypeptide is a human Arc polypeptide comprising an amino acid sequence that is SEQ ID NO: 1 or an amino acid sequence that is at least 90% identical to SEQ ID NO: 1. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising an amino acid sequence that is, or is at least 90% identical to, any one of: a) SEQ ID NO: 2; b) SEQ ID NO: 3; c) SEQ ID NO: 4; d) SEQ ID NO: 5; e) SEQ ID NO: 6; f) SEQ ID NO: 7; g) SEQ ID NO: 8; h) SEQ ID NO: 9; i) SEQ ID NO: 10; j) SEQ ID NO: 11; k) SEQ ID NO: 12; l) SEQ ID NO: 13; m) SEQ ID NO: 14; or n) SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is a human endogenous Gag polypeptide. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising an amino acid sequence that is, or is at least 90% identical to, any one of: a) SEQ ID NO: 16; b) SEQ ID NO: 17; c) SEQ ID NO: 18; d) SEQ ID NO: 19; e) SEQ ID NO: 20; f) SEQ ID NO: 21; g) SEQ ID NO: 22; h) SEQ ID NO: 23; i) SEQ ID NO: 24; j) SEQ ID NO: 25; k) SEQ ID NO: 26; l) SEQ ID NO: 27; or m) SEQ ID NO: 28.
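The embodiments above repeatedly recite amino acid sequences that are "at least 90% identical" to a given SEQ ID NO. This excerpt does not state how percent identity is calculated, so the Python sketch below rests on stated assumptions: a Needleman-Wunsch global alignment with a match score of +1, mismatch and gap penalties of -1, and identity computed as identical aligned positions divided by alignment length (gaps included). Other conventions, such as substitution matrices like BLOSUM62, affine gap penalties, or dividing by the reference length, can yield different values. The function names and toy sequences are hypothetical and are not taken from the Sequence Listing.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty (assumed scoring).

    Returns one optimal alignment as two equal-length strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned positions divided by alignment length (gaps included), times 100."""
    aligned_q, aligned_r = global_align(query, reference)
    if not aligned_q:
        raise ValueError("cannot compute identity for two empty sequences")
    identical = sum(1 for x, y in zip(aligned_q, aligned_r) if x == y and x != "-")
    return 100.0 * identical / len(aligned_q)


def meets_identity_threshold(query: str, reference: str, threshold: float = 90.0) -> bool:
    """True if the query is at least `threshold` percent identical to the reference."""
    return percent_identity(query, reference) >= threshold


if __name__ == "__main__":
    reference = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical toy sequence, not a SEQ ID NO from this application
    variant = "ACDEFGHIKXMNPQRSTVWY"    # one substitution (L to X)
    print(percent_identity(variant, reference))          # 95.0
    print(meets_identity_threshold(variant, reference))  # True
```

With a 20-residue reference, a single substitution leaves 19 of 20 aligned positions identical (95%), which clears a 90% threshold, whereas three substitutions (85%) would not.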

[0005] Disclosed herein, in certain embodiments, is a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide, wherein the recombinant Arc polypeptide is not a rat Arc polypeptide or a human Arc polypeptide. In some embodiments, the capsid further comprises a cargo. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cargo is an RNA. In some embodiments, the cargo is a therapeutic agent. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising an amino acid sequence that is, or is at least 90% identical to, any one of: a) SEQ ID NO: 2; b) SEQ ID NO: 3; c) SEQ ID NO: 4; d) SEQ ID NO: 5; e) SEQ ID NO: 6; f) SEQ ID NO: 7; g) SEQ ID NO: 8; h) SEQ ID NO: 9; i) SEQ ID NO: 10; j) SEQ ID NO: 11; k) SEQ ID NO: 12; l) SEQ ID NO: 13; m) SEQ ID NO: 14; or n) SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising an amino acid sequence that is, or is at least 90% identical to, any one of: a) SEQ ID NO: 16; b) SEQ ID NO: 17; c) SEQ ID NO: 18; d) SEQ ID NO: 19; e) SEQ ID NO: 20; f) SEQ ID NO: 21; g) SEQ ID NO: 22; h) SEQ ID NO: 23; i) SEQ ID NO: 24; j) SEQ ID NO: 25; k) SEQ ID NO: 26; l) SEQ ID NO: 27; or m) SEQ ID NO: 28.

[0006] Disclosed herein, in certain embodiments, is a vector comprising DNA encoding a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide. In some embodiments, the vector further encodes a therapeutic agent. In some embodiments, the therapeutic agent is a nucleic acid. In some embodiments, the nucleic acid is an RNA. In some embodiments, the recombinant Arc polypeptide is a human Arc polypeptide comprising an amino acid sequence that is SEQ ID NO: 1 or an amino acid sequence that is at least 90% identical to SEQ ID NO: 1. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising an amino acid sequence that is, or is at least 90% identical to, any one of: a) SEQ ID NO: 2; b) SEQ ID NO: 3; c) SEQ ID NO: 4; d) SEQ ID NO: 5; e) SEQ ID NO: 6; f) SEQ ID NO: 7; g) SEQ ID NO: 8; h) SEQ ID NO: 9; i) SEQ ID NO: 10; j) SEQ ID NO: 11; k) SEQ ID NO: 12; l) SEQ ID NO: 13; m) SEQ ID NO: 14; or n) SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is a human endogenous Gag polypeptide. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising an amino acid sequence that is, or is at least 90% identical to, any one of: a) SEQ ID NO: 16; b) SEQ ID NO: 17; c) SEQ ID NO: 18; d) SEQ ID NO: 19; e) SEQ ID NO: 20; f) SEQ ID NO: 21; g) SEQ ID NO: 22; h) SEQ ID NO: 23; i) SEQ ID NO: 24; j) SEQ ID NO: 25; k) SEQ ID NO: 26; l) SEQ ID NO: 27; or m) SEQ ID NO: 28.
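The vector embodiments recite DNA that encodes a recombinant Arc or endogenous Gag polypeptide. As a minimal illustration of what "encodes" means at the sequence level, the Python sketch below translates a coding DNA sequence into an amino acid sequence using the standard genetic code. It assumes a single reading frame starting at the first base, DNA letters (T rather than U), and translation that stops at the first stop codon; `translate`, `CODON_TABLE`, and the toy sequence are hypothetical names introduced for illustration and do not describe any vector construct in this application.

```python
# Standard genetic code, enumerated in TCAG order over the first, second, and third codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # '*' marks a stop codon


def translate(cds: str) -> str:
    """Translate a coding DNA sequence in frame from its first base, stopping at the first stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    protein = []
    for start in range(0, len(cds), 3):
        residue = CODON_TABLE[cds[start:start + 3]]  # raises KeyError for non-ACGT characters
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical toy open reading frame, not a sequence from this application.
    print(translate("ATGGCTTGGTAA"))  # MAW
```

Real expression constructs also carry promoters, untranslated regions, and possibly introns, none of which this sketch models.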

[0007] Disclosed herein, in certain embodiments, is a vector comprising DNA encoding a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide, wherein the recombinant Arc polypeptide is not a rat Arc polypeptide or a human Arc polypeptide. In some embodiments, the vector further encodes a cargo. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cargo is an RNA. In some embodiments, the cargo is a therapeutic agent. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising an amino acid sequence that is, or is at least 90% identical to, any one of: a) SEQ ID NO: 2; b) SEQ ID NO: 3; c) SEQ ID NO: 4; d) SEQ ID NO: 5; e) SEQ ID NO: 6; f) SEQ ID NO: 7; g) SEQ ID NO: 8; h) SEQ ID NO: 9; i) SEQ ID NO: 10; j) SEQ ID NO: 11; k) SEQ ID NO: 12; l) SEQ ID NO: 13; m) SEQ ID NO: 14; or n) SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising an amino acid sequence that is, or is at least 90% identical to, any one of: a) SEQ ID NO: 16; b) SEQ ID NO: 17; c) SEQ ID NO: 18; d) SEQ ID NO: 19; e) SEQ ID NO: 20; f) SEQ ID NO: 21; g) SEQ ID NO: 22; h) SEQ ID NO: 23; i) SEQ ID NO: 24; j) SEQ ID NO: 25; k) SEQ ID NO: 26; l) SEQ ID NO: 27; or m) SEQ ID NO: 28.

[0008] Disclosed herein, in certain embodiments, is a method of delivering a cargo to a cell comprising administering to the cell a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide and a therapeutic agent. In some embodiments, the therapeutic agent is a nucleic acid. In some embodiments, the nucleic acid is an RNA. In some embodiments, the recombinant Arc polypeptide is a human Arc polypeptide comprising an amino acid sequence that is SEQ ID NO: 1 or an amino acid sequence that is at least 90% identical to SEQ ID NO: 1. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising an amino acid sequence that is, or is at least 90% identical to, any one of: a) SEQ ID NO: 2; b) SEQ ID NO: 3; c) SEQ ID NO: 4; d) SEQ ID NO: 5; e) SEQ ID NO: 6; f) SEQ ID NO: 7; g) SEQ ID NO: 8; h) SEQ ID NO: 9; i) SEQ ID NO: 10; j) SEQ ID NO: 11; k) SEQ ID NO: 12; l) SEQ ID NO: 13; m) SEQ ID NO: 14; or n) SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is a human endogenous Gag polypeptide. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising an amino acid sequence that is, or is at least 90% identical to, any one of: a) SEQ ID NO: 16; b) SEQ ID NO: 17; c) SEQ ID NO: 18; d) SEQ ID NO: 19; e) SEQ ID NO: 20; f) SEQ ID NO: 21; g) SEQ ID NO: 22; h) SEQ ID NO: 23; i) SEQ ID NO: 24; j) SEQ ID NO: 25; k) SEQ ID NO: 26; l) SEQ ID NO: 27; or m) SEQ ID NO: 28. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a vertebrate cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cell expresses a gene encoded by the nucleic acid. In some embodiments, the cargo is a therapeutic agent.

[0009] Disclosed herein, in certain embodiments, is a method of delivering a cargo to a cell comprising administering to the cell a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide, wherein the recombinant Arc polypeptide is not a rat Arc polypeptide or a human Arc polypeptide. In some embodiments, the capsid further comprises a cargo. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cargo is an RNA. In some embodiments, the cargo is a therapeutic agent. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 
12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; b) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; c) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; d) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; e) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; f) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or g) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22; or h) an amino acid sequence that is SEQ ID NO: 23 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 23; or i) an amino acid sequence that is SEQ ID NO: 24 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 24; or j) an amino acid sequence that is SEQ ID NO: 25 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 25; or k) an amino acid sequence that is SEQ ID NO: 26 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 26; or l) an amino acid sequence that is SEQ ID NO: 27 or an amino acid sequence that is 
at least 90% identical to the SEQ ID NO: 27; or m) an amino acid sequence that is SEQ ID NO: 28 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 28. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a vertebrate cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cell expresses a gene encoded by the nucleic acid. In some embodiments, the cargo is a therapeutic agent.

[0010] Disclosed herein, in certain embodiments, is a method of transfecting a nucleic acid into a cell comprising administering to the cell a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide and a therapeutic agent. In some embodiments, the therapeutic agent is a nucleic acid. In some embodiments, the nucleic acid is an RNA. In some embodiments, the recombinant Arc polypeptide is a human Arc polypeptide comprising an amino acid sequence that is SEQ ID NO: 1 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 1. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid 
sequence that is at least 90% identical to the SEQ ID NO: 12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is a human endogenous Gag polypeptide. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; b) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; c) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; d) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; e) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; f) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or g) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22; or h) an amino acid sequence that is SEQ ID NO: 23 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 23; or i) an amino acid sequence that is SEQ ID NO: 24 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 24; or j) an amino acid sequence that is SEQ ID NO: 25 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 25; or k) an amino acid sequence that is SEQ ID NO: 26 or 
an amino acid sequence that is at least 90% identical to the SEQ ID NO: 26; or l) an amino acid sequence that is SEQ ID NO: 27 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 27; or m) an amino acid sequence that is SEQ ID NO: 28 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 28.
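The "at least 90% identical" criterion recited throughout these paragraphs can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned by some external tool (equal length, with "-" marking gaps) and that identity is counted as identical positions divided by the number of residues in the reference sequence. The disclosure does not state an alignment method or a denominator, so both conventions, and the function names, are hypothetical. The toy sequences are placeholders, not any SEQ ID NO.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Assumed convention (hypothetical): identical positions divided by the
    number of non-gap residues in the reference sequence, times 100.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    ref_residues = sum(1 for r in aligned_ref if r != "-")
    if ref_residues == 0:
        raise ValueError("reference sequence contains no residues")
    matches = sum(
        1 for q, r in zip(aligned_query, aligned_ref) if q == r and r != "-"
    )
    return 100.0 * matches / ref_residues


def meets_identity_threshold(
    aligned_query: str, aligned_ref: str, threshold: float = 90.0
) -> bool:
    """True if the query is at least `threshold` percent identical to the reference."""
    return percent_identity(aligned_query, aligned_ref) >= threshold


# Toy 20-residue reference (placeholder, not a disclosed sequence).
reference = "ACDEFGHIKLMNPQRSTVWY"
two_substitutions = "ACDEFGHIKLMNPQRSTVAA"    # 18 of 20 identical
three_substitutions = "ACDEFGHIKLMNPQRSTAAA"  # 17 of 20 identical

print(percent_identity(two_substitutions, reference))            # 90.0
print(meets_identity_threshold(three_substitutions, reference))  # False
```

Using `>=` treats exactly 90% as satisfying "at least 90%". A real comparison would first require a pairwise alignment, which this sketch deliberately leaves out.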

[0011] Disclosed herein, in certain embodiments, is a method of transfecting a nucleic acid into a cell comprising administering to the cell a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide, wherein the recombinant Arc polypeptide is not a rat Arc polypeptide or a human Arc polypeptide. In some embodiments, the capsid further comprises a cargo. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cargo is an RNA. In some embodiments, the cargo is a therapeutic agent. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the 
SEQ ID NO: 12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 12; b) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; c) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; d) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15; e) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; f) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; g) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; h) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; i) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; j) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or k) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22.

[0012] Disclosed herein, in certain embodiments, is an engineered Arc or endo-Gag polypeptide comprising a cargo binding domain and at least one capsid forming subunit from an Arc or endo-Gag polypeptide. In some embodiments, the cargo binding domain comprises a nucleic acid binding domain. In some embodiments, the cargo binding domain comprises a polypeptide that binds to a small molecule. In some embodiments, the cargo binding domain comprises a polypeptide that binds to a protein, a peptide, or an antibody or binding fragment thereof. In some embodiments, the cargo binding domain comprises a polypeptide that binds to a peptidomimetic or a nucleotidomimetic. In some embodiments, the at least one capsid forming subunit comprises a polypeptide that corresponds to the CA N-lobe and/or CA C-lobe of SEQ ID NO: 1. In some embodiments, the engineered Arc or endo-Gag polypeptide further comprises a second capsid forming subunit from an Arc or endo-Gag polypeptide of a different species. In some embodiments, the second capsid forming subunit comprises a polypeptide that corresponds to the CA N-lobe and/or CA C-lobe of SEQ ID NO: 1. In some embodiments, the at least one capsid forming subunit and the second capsid forming subunit are each independently from an Arc or endo-Gag polypeptide of a species selected from a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant. In some embodiments, the at least one capsid forming subunit and the second capsid forming subunit are from two different species. In some embodiments, the cargo binding domain is fused either directly or via a linker to the C-terminus of the at least one capsid forming subunit. In some embodiments, the cargo binding domain is fused either directly or via a linker to the N-terminus of the at least one capsid forming subunit. In some embodiments, the second capsid forming subunit is fused either directly or via a linker to the C-terminus of the at least one capsid forming subunit. 
In some embodiments, the second capsid forming subunit is fused either directly or via a linker to the N-terminus of the at least one capsid forming subunit. In some embodiments, the cargo binding domain is fused either directly or via a linker to the N-terminus of the at least one capsid forming subunit, and the second capsid forming subunit is fused either directly or via a linker to the C-terminus of the at least one capsid forming subunit. In some embodiments, the cargo binding domain is fused either directly or via a linker to the C-terminus of the at least one capsid forming subunit, and the second capsid forming subunit is fused either directly or via a linker to the N-terminus of the at least one capsid forming subunit. In some embodiments, the engineered Arc or endo-Gag polypeptide further comprises a second polypeptide. In some embodiments, the second polypeptide is fused either directly or via a linker to the at least one capsid forming subunit. In some embodiments, the second polypeptide is fused either directly or via a linker to the cargo binding domain. In some embodiments, the second polypeptide is a protein or an antibody or a binding fragment thereof. In some embodiments, the protein is a human protein or a viral protein. In some embodiments, the protein is a human Gag-like protein. In some embodiments, the protein is a de novo engineered protein designed to bind to a target receptor of interest. In some embodiments, the second polypeptide guides the delivery of a capsid formed by the engineered Arc or endo-Gag polypeptide to a target site of interest.
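The fusion layouts described above (cargo binding domain N- or C-terminal to the at least one capsid forming subunit, second subunit on the opposite end, each junction direct or via a linker) can be sketched in Python as simple string assembly. This is an illustration under stated assumptions only: the `(GGGGS)x3` linker is a commonly used flexible linker chosen here as a hypothetical example, since the disclosure names no linker, and the domain strings are placeholders rather than real Arc or endo-Gag sequences.

```python
from typing import Optional, Sequence

# Hypothetical flexible linker, (GGGGS)x3; the disclosure does not specify a linker.
GS_LINKER = "GGGGS" * 3


def fuse(domains: Sequence[str], linker: Optional[str] = GS_LINKER) -> str:
    """Concatenate domains in N- to C-terminal order.

    linker=None (or "") gives direct fusions; otherwise the same linker is
    placed at every junction (a simplification: each junction could differ).
    """
    if not domains:
        raise ValueError("at least one domain is required")
    return (linker or "").join(domains)


# Placeholder strings standing in for real domains (hypothetical).
cargo_binding = "KRKRKR"
subunit_1 = "MEAAAA"  # the "at least one capsid forming subunit"
subunit_2 = "MQLLLL"  # a second capsid forming subunit from another species

# Cargo binding domain N-terminal, second subunit C-terminal, via linkers.
layout_a = fuse([cargo_binding, subunit_1, subunit_2])
# Second subunit N-terminal, cargo binding domain C-terminal, direct fusion.
layout_b = fuse([subunit_2, subunit_1, cargo_binding], linker=None)

print(layout_b)  # MQLLLLMEAAAAKRKRKR
```

Listing domains in N- to C-terminal order keeps each embodiment a one-line change; supporting a different linker at each junction would only require passing one linker per junction instead of a single shared one.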

[0013] Disclosed herein, in certain embodiments, is a truncated Arc or endo-Gag polypeptide wherein a portion that is not involved in capsid formation, nucleic acid binding, or delivery is removed. In some embodiments, the removed portion comprises a matrix (MA) domain, a reverse transcriptase (RT) domain, a nucleotide binding domain, or a combination thereof, provided that the nucleotide binding domain is not a human Arc RNA binding domain. In some embodiments, the removed portion comprises a CA C-lobe domain. In some embodiments, the truncation comprises an N-terminal deletion, a C-terminal deletion, or a combination thereof. In some embodiments, the N-terminal deletion comprises a deletion of up to 10, 20, 30, or 50 amino acids. In some embodiments, the C-terminal deletion comprises a deletion of up to 10, 20, 30, or 50 amino acids.
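The N-terminal, C-terminal, and combined deletions described above can be enumerated with a short Python sketch. It assumes the deletion sizes are exactly the listed thresholds (10, 20, 30, or 50 amino acids) plus zero for "no deletion at that terminus"; a real screen might use finer steps. The parent sequence and function names are hypothetical, and the sketch only generates candidate sequences: it says nothing about whether a variant still assembles into a capsid.

```python
from itertools import product

# Deletion sizes taken from the "up to 10, 20, 30, or 50 amino acids" language;
# 0 means no deletion at that terminus.
DELETION_SIZES = (0, 10, 20, 30, 50)


def truncate(seq: str, n_del: int = 0, c_del: int = 0) -> str:
    """Remove n_del residues from the N-terminus and c_del from the C-terminus."""
    if n_del < 0 or c_del < 0:
        raise ValueError("deletion sizes must be non-negative")
    if n_del + c_del >= len(seq):
        raise ValueError("deletions would remove the entire sequence")
    return seq[n_del:len(seq) - c_del]


def truncation_panel(seq: str, sizes=DELETION_SIZES) -> dict:
    """Map (n_del, c_del) to each N-terminal, C-terminal, or combined variant."""
    panel = {}
    for n_del, c_del in product(sizes, repeat=2):
        if (n_del, c_del) == (0, 0) or n_del + c_del >= len(seq):
            continue  # skip the full-length parent and impossible combinations
        panel[(n_del, c_del)] = truncate(seq, n_del, c_del)
    return panel


# Placeholder 120-residue sequence (hypothetical, not a disclosed Arc/endo-Gag).
parent = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(120))
variants = truncation_panel(parent)
print(len(variants))            # 24
print(len(variants[(10, 20)]))  # 90
```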

[0014] Disclosed herein, in certain embodiments, is an Arc or endo-Gag-based capsid comprising an engineered Arc or endo-Gag polypeptide, which is optionally a truncated Arc or endo-Gag polypeptide, and a cargo encapsulated by the capsid formed by the engineered Arc or endo-Gag polypeptide. In some embodiments, the cargo is a nucleic acid molecule. In some embodiments, the nucleic acid molecule is DNA, RNA, or a mixture of DNA and RNA. In some embodiments, the DNA and the RNA are each independently single-stranded, double-stranded, or a mixture of single- and double-stranded. In some embodiments, the cargo is a small molecule. In some embodiments, the cargo is a protein. In some embodiments, the cargo is a peptide. In some embodiments, the cargo is an antibody or a binding fragment thereof. In some embodiments, the cargo is a peptidomimetic or a nucleotidomimetic. In some embodiments, the Arc or endo-Gag-based capsid comprises one or more additional capsid subunits from one or more species of Arc or endo-Gag proteins that are different from the engineered Arc or endo-Gag polypeptide. In some embodiments, the Arc-based or endo-Gag-based capsid comprises one or more additional capsid subunits from non-Arc proteins. In some embodiments, the one or more additional capsid subunits comprise Copia protein, ASPRV1 protein, a protein from the SCAN domain family, a protein encoded by the Paraneoplastic Ma antigen family (e.g., PNMA5, PNMA6, PNMA6A, and PNMA6B), a protein from the retrotransposon Gag-like family (e.g., RTL3, RTL6, RTL8A, RTL8B), or a combination thereof. In some embodiments, the one or more additional capsid subunits comprise BOP, LDOC1, MOAP1, PEG10, PNMA3, PNMA5, PNMA6A, PNMA6B, RTL3, RTL6, RTL8A, RTL8B, and ZNF18. In some embodiments, the capsid has a diameter of at least 1 nm, 2 nm, 3 nm, 4 nm, 5 nm, 10 nm, 15 nm, 20 nm, 25 nm, 30 nm, 50 nm, 80 nm, 100 nm, 120 nm, 150 nm, 200 nm, 250 nm, 300 nm, 500 nm, 600 nm, or more. 
In some embodiments, the capsid has a diameter of from about 1 nm to about 600 nm, from about 1 nm to about 500 nm, from about 1 nm to about 200 nm, from about 1 nm to about 100 nm, from about 1 nm to about 50 nm, or from about 1 nm to about 30 nm. In some embodiments, the capsid has a reduced off-target effect. In some embodiments, the capsid does not have an off-target effect. In some embodiments, the capsid is formed ex vivo. In some embodiments, the capsid is formed in vitro.

[0015] Disclosed herein, in certain embodiments, is a nucleic acid polymer encoding a recombinant or engineered Arc polypeptide or a recombinant or engineered endogenous Gag polypeptide described herein.

[0016] Disclosed herein, in certain embodiments, is a vector comprising a nucleic acid polymer encoding a recombinant or engineered Arc polypeptide or a recombinant or engineered endogenous Gag polypeptide described herein.

[0017] Disclosed herein, in certain embodiments, is a method of preparing a loaded Arc-based or endo-Gag-based capsid comprising: incubating a plurality of recombinant or engineered Arc polypeptides or a plurality of recombinant or engineered endo-Gag polypeptides with a cargo in a solution for a time sufficient to generate the loaded capsid. In some embodiments, the method further comprises mixing the solution comprising the plurality of engineered Arc or endo-Gag polypeptides with a plurality of non-Arc or non-endo-Gag capsid forming subunits prior to incubating with the cargo. In some embodiments, the plurality of non-Arc or non-endo-Gag capsid forming subunits are mixed with the plurality of recombinant or engineered Arc or endo-Gag polypeptides at a ratio of 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, or 10:1. In some embodiments, the plurality of non-Arc or non-endo-Gag capsid forming subunits are mixed with the plurality of engineered Arc or endo-Gag polypeptides at a ratio of 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, or 1:10. In some embodiments, the method further comprises mixing the solution comprising the plurality of truncated Arc or endo-Gag polypeptides with a plurality of non-Arc or non-endo-Gag capsid forming subunits prior to incubating with the cargo. In some embodiments, the plurality of non-Arc or non-endo-Gag capsid forming subunits are mixed with the plurality of truncated Arc or endo-Gag polypeptides at a ratio of 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, or 10:1. In some embodiments, the plurality of non-Arc or non-endo-Gag capsid forming subunits are mixed with the plurality of truncated Arc or endo-Gag polypeptides at a ratio of 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, or 1:10. In some embodiments, the plurality of engineered Arc or endo-Gag polypeptides is obtained from a bacterial cell system, an insect cell system, or a mammalian cell system. 
In some embodiments, the plurality of engineered Arc or endo-Gag polypeptides is obtained from a cell-free system. In some embodiments, the plurality of truncated Arc or endo-Gag polypeptides is obtained from a bacterial cell system, an insect cell system, or a mammalian cell system. In some embodiments, the plurality of truncated Arc or endo-Gag polypeptides is obtained from a cell-free system. In some embodiments, the loaded Arc-based or endo-Gag-based capsid is formulated for systemic administration. In some embodiments, the loaded Arc or endo-Gag-based capsid is formulated for local administration. In some embodiments, the loaded Arc or endo-Gag-based capsid is formulated for parenteral administration. In some embodiments, the loaded Arc or endo-Gag-based capsid is formulated for oral administration. In some embodiments, the loaded Arc or endo-Gag-based capsid is formulated for topical administration. In some embodiments, the loaded Arc or endo-Gag-based capsid is formulated for sublingual or aerosol administration.
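The mixing ratios recited above (for example, 4:1 or 1:4 non-Arc or non-endo-Gag subunits to Arc or endo-Gag polypeptides) translate into simple bench arithmetic. The Python sketch below assumes the ratios are molar ratios of subunits and that each species is drawn from a stock of known micromolar concentration; neither assumption is stated in the disclosure, and the function names and stock concentrations are hypothetical.

```python
def split_by_ratio(total_nmol: float, non_arc_parts: int, arc_parts: int) -> tuple:
    """Split a total subunit amount (nmol) into (non-Arc, Arc) amounts.

    The ratio is read as non-Arc : Arc (or non-endo-Gag : endo-Gag), and is
    assumed here to be a molar ratio of subunits.
    """
    if total_nmol <= 0:
        raise ValueError("total amount must be positive")
    if non_arc_parts <= 0 or arc_parts <= 0:
        raise ValueError("ratio parts must be positive integers")
    whole = non_arc_parts + arc_parts
    return (total_nmol * non_arc_parts / whole, total_nmol * arc_parts / whole)


def stock_volume_ul(amount_nmol: float, stock_um: float) -> float:
    """Volume (uL) of a stock at stock_um (micromolar) holding amount_nmol (nmol).

    1 uM = 1 nmol/mL = 0.001 nmol/uL, so volume_uL = amount_nmol * 1000 / stock_uM.
    """
    if stock_um <= 0:
        raise ValueError("stock concentration must be positive")
    return amount_nmol * 1000.0 / stock_um


# Example: 10 nmol of total subunits at a 4:1 non-Arc:Arc ratio,
# drawn from hypothetical 50 uM stocks of each subunit.
non_arc_nmol, arc_nmol = split_by_ratio(10.0, 4, 1)
print(non_arc_nmol, arc_nmol)           # 8.0 2.0
print(stock_volume_ul(arc_nmol, 50.0))  # 40.0
```

Expressing the ratio as integer parts covers both the 2:1 to 10:1 and the 1:2 to 1:10 series with the same function; only the argument order changes.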

[0018] Disclosed herein, in certain embodiments, is use of an engineered or recombinant Arc-based or endo-Gag-based capsid for delivery of a cargo to a site of interest, comprising contacting a cell at the site of interest with an Arc-based or endo-Gag-based capsid for a time sufficient to facilitate cellular uptake of the capsid. In some embodiments, the cell is a tumor cell. In some embodiments, the tumor cell is a solid tumor cell. In some embodiments, the solid tumor cell is a cell from a bladder cancer, breast cancer, brain cancer, colorectal cancer, kidney cancer, liver cancer, lung cancer, pancreatic cancer, prostate cancer, skin cancer, stomach cancer, or thyroid cancer. In some embodiments, the tumor cell is from a hematologic malignancy. In some embodiments, the hematologic malignancy is a B-cell malignancy or a T-cell malignancy. In some embodiments, the hematologic malignancy is chronic lymphocytic leukemia (CLL), small lymphocytic lymphoma (SLL), diffuse large B-cell lymphoma (DLBCL), follicular lymphoma, mantle cell lymphoma, Burkitt lymphoma, cutaneous T-cell lymphoma, or peripheral T-cell lymphoma. In some embodiments, the cell is a somatic cell. In some embodiments, the cell is a stem cell or a progenitor cell. In some embodiments, the cell is a mesenchymal stem or progenitor cell. In some embodiments, the cell is a hematopoietic stem or progenitor cell. In some embodiments, the cell is a muscle cell, a skin cell, a blood cell, or an immune cell. In some embodiments, a target protein is overexpressed or is depleted in the cell. In some embodiments, a target gene in the cell has one or more mutations. In some embodiments, the cell comprises an impaired splicing mechanism. In some embodiments, the use is an in vivo use. In some embodiments, the Arc-based capsid is administered systemically to a subject. In some embodiments, the Arc-based or endo-Gag-based capsid is administered via local administration to a subject. 
In some embodiments, the Arc-based or endo-Gag-based capsid is administered parenterally to a subject. In some embodiments, the Arc-based capsid is administered orally to a subject. In some embodiments, the Arc-based or endo-Gag-based capsid is administered topically to a subject. In some embodiments, the Arc-based or endo-Gag-based capsid is administered via sublingual or aerosol administration to a subject. In some embodiments, the use is an in vitro or ex vivo use.

[0019] Disclosed herein, in certain embodiments, is a kit comprising an engineered Arc or endo-Gag polypeptide, a truncated Arc or endo-Gag polypeptide, a vector encoding a recombinant or engineered Arc or endo-Gag polypeptide, or an Arc-based or endo-Gag-based capsid.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] Various aspects of the disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings below.

[0021] FIG. 1 is a representation of exemplary Arc polypeptides.

[0022] FIG. 2 is a representation of exemplary engineered Arc polypeptides.

[0023] FIGS. 3A and 3B illustrate an exemplary method of engineering an Arc polypeptide to carry a specific cargo (FIG. 3A) (e.g., an RNA payload), or remove an off-function effect (FIG. 3B).

[0024] FIG. 4A shows the isolation of 6×His-tagged human Arc by elution from a HisTrap column with an imidazole gradient.

[0025] FIG. 4B shows the separation of 6×His-tagged human Arc from residual nucleic acids on a Mono Q column eluted with a NaCl gradient.

[0026] FIG. 5 shows a transmission electron microscope image of negatively stained human Arc capsids.

[0027] FIG. 6 shows transmission electron microscope images of negatively stained capsids formed from recombinantly expressed Arc orthologs.

[0028] FIG. 7 shows transmission electron microscope images of negatively stained capsids formed from recombinantly expressed endo-Gag proteins.

[0029] FIG. 8 shows selective internalization of Alexa594-labeled Arc capsids by HeLa cells.

[0030] FIG. 9 shows the delivery of Cre RNA to HeLa cells by Arc capsids.

[0031] FIG. 10 illustrates methods for screening Arc and endo-Gag gene candidates for the ability to transmit a heterologous RNA payload.

DETAILED DESCRIPTION OF THE DISCLOSURE

[0032] Administering diagnostic or therapeutic agents to a site of interest with precision has been an ongoing challenge. Available methods of delivering nucleic acids to cells have myriad limitations. For example, AAV vectors, which are often used for gene therapy, are immunogenic, have a limited payload capacity of <3 kb, suffer from poor biodistribution, can only be administered by direct injection, and pose a risk of disrupting host genes by integration. Non-viral methods have different limitations. Liposomes are delivered primarily to the liver. Extracellular vesicles have a limited payload capacity of <1 kb, limited scalability, and purification difficulties. Thus, there is a recognized need for new methods of delivering therapeutic payloads.

[0033] Most molecules do not possess an inherent affinity for a particular tissue or cell type in the body. In addition, administered agents often accumulate in the liver and kidney for clearance, or in unintended tissues or cell types. Methods for improving delivery include coating the agent of choice with hydrophobic compounds or polymers. Such an approach prolongs the circulation time of the agent and increases its hydrophobicity to promote cellular uptake. However, this approach does not actively direct the cargo to the site of interest.

[0034] To specifically target sites where therapy is needed, therapeutic compounds are optionally fused to moieties such as ligands, antibodies, and aptamers that recognize and bind to receptors displayed on the surface of targeted cells. Upon reaching a cell of interest, the therapeutic compound is optionally further delivered to an intracellular target. For example, a therapeutic RNA can be translated into a protein once it reaches ribosomes in the cytoplasm of the cell.

[0035] Arc (activity-regulated cytoskeleton-associated protein) regulates the endocytic trafficking of α-amino-3-hydroxy-5-methylisoxazole-4-propionic acid (AMPA)-type glutamate receptors. Arc activity has been linked to synaptic strength and neuronal plasticity. Phenotypes of Arc loss in experimental murine models include defective formation of long-term memory and reduced neuronal activity and plasticity.

[0036] Arc exhibits molecular properties similar to those of retroviral Gag proteins. The Arc gene may have originated from the Ty3/gypsy retrotransposon. An endogenous Gag (endo-Gag) protein is any protein, including Arc, that is endogenous to a eukaryotic organism and has predicted and annotated similarity to viral Gag proteins. Exemplary endo-Gag proteins are disclosed in Campillos M, Doerks T, Shah P K, and Bork P, Computational characterization of multiple Gag-like human proteins, Trends Genet. 2006 November; 22(11):585-9. An endo-Gag protein is optionally recombinantly expressed by any host cell, including a prokaryotic or eukaryotic cell, or a bacterial, yeast, insect, vertebrate, mammalian, or human cell. As described herein, in some embodiments an endo-Gag protein assembles into an endo-Gag capsid.

[0037] Disclosed herein, in certain embodiments, are Arc and endo-Gag polypeptides which assemble into a capsid for delivery of a cargo of interest. In some embodiments, also described herein are engineered Arc and endo-Gag polypeptides which assemble into a capsid for delivery of a cargo of interest. In additional embodiments, described herein are capsids, e.g., Arc-based or endo-Gag-based capsids, for delivery of a cargo of interest.

Arc Polypeptides and Endogenous Gag Polypeptides

[0038] In certain embodiments, disclosed herein is an Arc polypeptide. In certain embodiments, disclosed herein is an endo-Gag polypeptide. It should be understood that endo-Gag sequences are optional substitutes for Arc sequences to form any type of engineered Arc polypeptide described in this section.

[0039] In some instances, Arc is a non-human Arc polypeptide. In some instances, the Arc polypeptide comprises a full-length Arc polypeptide (e.g., a full-length non-human Arc polypeptide). In other instances, the Arc polypeptide comprises a fragment of non-human Arc, such as a truncated Arc polypeptide, that participates in the formation of a capsid. In additional instances, the Arc polypeptide comprises one or more domains of a non-human Arc polypeptide, in which at least one of the domains participates in the formation of a capsid. In further instances, the Arc polypeptide is a recombinant Arc polypeptide.

[0040] In some instances, endo-Gag is a non-human endo-Gag polypeptide. In some instances, the endo-Gag polypeptide comprises a full-length endo-Gag polypeptide (e.g., a full-length non-human endo-Gag polypeptide). In other instances, the endo-Gag polypeptide comprises a fragment of non-human endo-Gag, such as a truncated endo-Gag polypeptide, that participates in the formation of a capsid. In additional instances, the endo-Gag polypeptide comprises one or more domains of a non-human endo-Gag polypeptide, in which at least one of the domains participates in the formation of a capsid. In further instances, the endo-Gag polypeptide is a recombinant endo-Gag polypeptide.

[0041] In some embodiments, the Arc is a human Arc polypeptide with at least its RNA binding domain modified to bind to a cargo that is not native to the human Arc. In some instances, the Arc polypeptide comprises a full-length human Arc polypeptide with at least its RNA binding domain modified to bind to a cargo that is not native to the human Arc protein. In other instances, the Arc polypeptide comprises a human Arc fragment comprising modification(s) in at least its RNA binding domain. In additional instances, the Arc polypeptide comprises one or more domains of a human Arc polypeptide, in which at least one of the domains participates in the formation of a capsid and in which the RNA binding domain is modified to bind to a cargo that the native human Arc protein does not bind. In further instances, the Arc polypeptide is a recombinant human Arc polypeptide with at least its RNA binding domain modified to enable loading of a cargo that is not native to the human Arc protein.

[0042] In some embodiments, the endo-Gag is a human endo-Gag polypeptide with at least its RNA binding domain modified to bind to a cargo that is not native to the human endo-Gag. In some instances, the endo-Gag polypeptide comprises a full-length human endo-Gag polypeptide with at least its RNA binding domain modified to bind to a cargo that is not native to the human endo-Gag protein. In other instances, the endo-Gag polypeptide comprises a human endo-Gag fragment comprising modification(s) in at least its RNA binding domain to bind to a cargo that a native human endo-Gag protein does not bind. In additional instances, the endo-Gag polypeptide comprises one or more domains of a human endo-Gag polypeptide, in which at least one of the domains participates in the formation of a capsid and in which the RNA binding domain is modified to bind to a cargo that is not native to the human endo-Gag protein. In further instances, the endo-Gag polypeptide is a recombinant human endo-Gag polypeptide with at least its RNA binding domain modified to enable loading of a cargo that is not native to the human endo-Gag protein.

[0043] In some instances, the Arc or endo-Gag polypeptide is an engineered Arc or endo-Gag polypeptide. As used herein, an engineered polypeptide is a recombinant polypeptide that is not identical in sequence to a full-length, wild-type polypeptide. In some instances, the engineered Arc or endo-Gag polypeptide comprises a fragment of an Arc or endo-Gag polypeptide from a first species and at least one additional fragment from an Arc or endo-Gag polypeptide of a second species. In some cases, the first Arc or endo-Gag polypeptide is from a member of the kingdom Animalia, Plantae, Fungi, or Protista. In some cases, the first species is selected from a mammal, a rodent, a bird, a reptile, a fish, a vertebrate, a eukaryote, an insect, a fungus, or a plant. In some cases, the second Arc or endo-Gag polypeptide is from a member of the kingdom Animalia, Plantae, Fungi, or Protista, which is the same as or different from that of the first Arc or endo-Gag polypeptide. In some cases, the second species is selected from a mammal, a rodent, a bird, a reptile, a fish, a vertebrate, a eukaryote, an insect, a fungus, or a plant that is different from the first species.

[0044] In some embodiments, an exemplary mammalian Arc or endo-Gag protein for expression as a recombinant or engineered Arc polypeptide is from the species Homo sapiens. Additional exemplary species of primate Arc or endo-Gag proteins for expression as a recombinant or engineered Arc polypeptide include: Gorilla, Pongo abelii, Pan paniscus, Macaca nemestrina, Chlorocebus sabaeus, Papio anubis, Rhinopithecus roxellana, Macaca fascicularis, Nomascus leucogenys, Callithrix jacchus, Aotus nancymaae, Cebus capucinus imitator, Saimiri boliviensis boliviensis, Otolemur garnettii, and Macaca mulatta.

[0045] An exemplary species list of rodent Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Fukomys damarensis, Microcebus murinus, Heterocephalus glaber, Propithecus coquereli, Marmota marmota marmota, Galeopterus variegatus, Cavia porcellus, Dipodomys ordii, Octodon degus, Castor canadensis, Nannospalax galili, Carlito syrichta, Chinchilla lanigera, Mus musculus, Ictidomys tridecemlineatus, Rattus norvegicus, Microtus ochrogaster, Otolemur garnettii, Meriones unguiculatus, Cricetulus griseus, Neotoma lepida, Jaculus jaculus, Mustela putorius furo, Mesocricetus auratus, Tupaia chinensis, Chrysochloris asiatica, Elephantulus edwardii, Erinaceus europaeus, Ochotona princeps, Sorex araneus, Monodelphis domestica, Echinops telfairi, and Condylura cristata.

[0046] An exemplary species list of additional mammalian Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Vulpes vulpes, Canis lupus dingo, Felis catus, Panthera pardus, Callorhinus ursinus, Odobenus rosmarus divergens, Equus asinus, Sus scrofa, Manis javanica, Ceratotherium simum simum, Leptonychotes weddellii, Enhydra lutris kenyoni, Lipotes vexillifer, Bos grunniens, Bubalus bubalis, Camelus dromedarius, Vicugna pacos, Orcinus orca, Neomonachus schauinslandi, Tursiops truncatus, Bos taurus, Capra hircus, Delphinapterus leucas, Ovis aries musimon, Balaenoptera acutorostrata scammoni, Neophocaena asiaeorientalis asiaeorientalis, Miniopterus natalensis, Pteropus alecto, Physeter catodon, Loxodonta africana, Orycteropus afer afer, Bos mutus, Desmodus rotundus, Hipposideros armiger, Ailuropoda melanoleuca, Trichechus manatus latirostris, Rousettus latirostris, Rousettus aegyptiacus, Eptesicus fuscus, Rhinolophus sinicus, Cervus elaphus hippelaphus, Odocoileus virginianus texanus, Pantholops hodgsonii, Camelus bactrianus, Sarcophilus harrisii, Phascolarctos cinereus, and Ornithorhynchus anatinus.

[0047] An exemplary species list of bird Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Gallus gallus, Corvus cornix cornix, Parus major, Corvus brachyrhynchos, Dromaius novaehollandiae, and Apteryx rowi.

[0048] An exemplary species list of reptile Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Python bivittatus, Pogona vitticeps, Anolis carolinensis, Protobothrops mucrosquamatus, Alligator sinensis, Crocodylus porosus, Gavialis gangeticus, Alligator mississippiensis, Pelodiscus sinensis, Terrapene mexicana triunguis, Chrysemys picta bellii, Chelonia mydas, Nanorana parkeri, Xenopus tropicalis, Xenopus laevis, and Latimeria chalumnae.

[0049] An exemplary species list of fish Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Oncorhynchus mykiss, Acanthochromis polyacanthus, Oncorhynchus kisutch, Carassius auratus, and Austrofundulus limnaeus.

[0050] An exemplary species list of insect Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Drosophila serrata, Drosophila bipectinata, Solenopsis invicta, Temnothorax curvispinosus, Drosophila melanogaster, Agrilus planipennis, Camponotus floridanus, Pogonomyrmex barbatus, Nilaparvata lugens, Bombyx mori, Tribolium castaneum, and Leptinotarsa decemlineata.

[0051] An exemplary species list of plant Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes Spinacia oleracea and Erythranthe guttata.

[0052] An exemplary species list of fungal Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Saccharomyces cerevisiae, Rhizopus delemar, Fusarium oxysporum, Cryptococcus neoformans, Rhizophagus irregularis, Fusarium fujikuroi, Candida albicans, Trichophyton rubrum, Pyrenophora tritici-repentis, Rhizopus microsporus, Rhizoctonia solani, Aspergillus flavus, Verticillium dahliae, Fusarium verticillioides, Aspergillus niger, Fusarium graminearum, Aspergillus fumigatus, Zymoseptoria tritici, and Trichoderma harzianum.

[0053] An exemplary species list of protist Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Entamoeba histolytica, Paulinella micropora, Guillardia theta, Oxyrrhis marina, Seminavis robusta, Euglena longa, Naegleria gruberi, and Trichomonas vaginalis.

[0054] In some instances, the Arc or endo-Gag polypeptide comprises a capsid assembly/forming (CA) domain, a cargo binding domain (e.g., an RNA binding domain), and optionally a matrix (MA) domain, a reverse transcriptase (RT) domain, or a combination thereof. In some cases, the CA domain is further divided into an N-lobe domain and a C-lobe domain. In some cases, the cargo binding domain comprises an RNA binding domain, a DNA binding domain, a protein binding domain, a peptide binding domain, an antibody binding domain, a small molecule binding domain, or a peptidomimetic/nucleotidomimetic binding domain. Exemplary cargo binding domains include, but are not limited to, domains from GPCRs, antibodies or binding fragments thereof, lipoproteins, integrins, tyrosine kinases, DNA-binding proteins, RNA-binding proteins, nucleases, ligases, proteases, integrases, isomerases, phosphatases, GTPases, aromatases, esterases, adaptor proteins, G-proteins, GEFs, cytokines, interleukins, interleukin receptors, interferons, interferon receptors, caspases, transcription factors, neurotrophic factors and their receptors, growth factors and their receptors, signal recognition particle and receptor components, extracellular matrix proteins, integral components of membrane, ribosomal proteins, translation elongation factors, translation initiation factors, GPI-anchored proteins, tissue factors, dystrophin, utrophin, dystrobrevin, or any fusions, combinations, subunits, derivatives, or domains thereof.

[0055] In some embodiments, one or more non-essential regions that are not involved in capsid formation or nucleic acid binding are removed from an Arc or endo-Gag protein to generate an Arc or endo-Gag polypeptide. In such instances, one or more non-essential regions, e.g., an N-terminal region (e.g., up to 10 amino acids, up to 20 amino acids, up to 30 amino acids, or up to 50 amino acids), a C-terminal region (e.g., up to 10 amino acids, up to 20 amino acids, up to 30 amino acids, or up to 50 amino acids), an RT domain, an MA domain, or a combination thereof, are deleted from an Arc or endo-Gag protein to generate an Arc or endo-Gag polypeptide. In some cases, only the essential regions involved in capsid assembly/forming and cargo binding remain in an Arc or endo-Gag polypeptide. In additional cases, only the essential region involved in capsid assembly/forming (e.g., the N-lobe and/or the C-lobe) remains in an Arc or endo-Gag polypeptide.
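The terminal deletions described above amount to slicing residues off each end of a protein sequence. The following is a minimal sketch under stated assumptions: `trim_termini` and `MAX_TERMINAL_TRIM` are hypothetical names, the sequence is a toy placeholder rather than any Arc or endo-Gag sequence, and the 50-residue cap simply mirrors the largest range named in the paragraph.

```python
# Minimal sketch (hypothetical helper, not from the disclosure): delete
# non-essential N- and C-terminal residues from a one-letter protein sequence.
MAX_TERMINAL_TRIM = 50  # largest terminal deletion named in the text


def trim_termini(seq: str, n_trim: int = 0, c_trim: int = 0) -> str:
    """Return seq with n_trim N-terminal and c_trim C-terminal residues removed."""
    for label, k in (("n_trim", n_trim), ("c_trim", c_trim)):
        if not 0 <= k <= MAX_TERMINAL_TRIM:
            raise ValueError(f"{label} must be between 0 and {MAX_TERMINAL_TRIM}")
    if n_trim + c_trim >= len(seq):
        raise ValueError("trimming would remove the entire sequence")
    # Slice with an explicit end index so that c_trim == 0 keeps the C-terminus.
    return seq[n_trim:len(seq) - c_trim]


# Toy placeholder sequence: 10 "G" + 30 "K" + 10 "E" (not a real Arc sequence).
toy = "G" * 10 + "K" * 30 + "E" * 10
core = trim_termini(toy, n_trim=10, c_trim=10)
print(len(core))  # 30
```

Using `len(seq) - c_trim` as the slice end, rather than `-c_trim`, avoids the Python pitfall where `seq[n:-0]` returns an empty string.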

[0056] In certain embodiments, the RT domain, the MA domain, and/or the endogenous RNA binding domain are replaced with other cargo binding domains: for example, replaced with a DNA binding domain, a protein binding domain, a peptide binding domain, an antibody binding domain, a small molecule binding domain, a peptidomimetic binding domain, or a nucleotidomimetic binding domain. In some embodiments, an Arc or endo-Gag polypeptide comprises truncations or modifications of domains involved in capsid forming, nucleic acid binding, or delivery.

[0057] In some embodiments, the Arc or endo-Gag polypeptide comprises an MA domain, a CA N-lobe, a CA C-lobe, a cargo binding domain, and an RT domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the MA domain, the CA N-lobe, the CA C-lobe, the RT domain, and the cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the MA domain, the RT domain, the cargo binding domain, the CA N-lobe, and the CA C-lobe. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the cargo binding domain, the MA domain, the RT domain, the CA N-lobe, and the CA C-lobe. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, each domain is fused, either directly or indirectly, to each adjacent domain.

[0058] In some embodiments, the Arc or endo-Gag polypeptide comprises an MA domain, a CA N-lobe, a CA C-lobe, and a cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the MA domain, the CA N-lobe, the CA C-lobe, and the cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the MA domain, the cargo binding domain, the CA N-lobe, and the CA C-lobe. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the cargo binding domain, the MA domain, the CA N-lobe, and the CA C-lobe. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, each domain is fused, either directly or indirectly, to each adjacent domain.

[0059] In some embodiments, the Arc or endo-Gag polypeptide comprises a CA N-lobe, a CA C-lobe, and a cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the CA N-lobe, the CA C-lobe, and the cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the cargo binding domain, the CA N-lobe, and the CA C-lobe. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, each domain is fused, either directly or indirectly, to each adjacent domain.

[0060] In some embodiments, the Arc or endo-Gag polypeptide comprises a CA N-lobe and a cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the CA N-lobe and the cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the cargo binding domain and the CA N-lobe. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, the two domains are either directly or indirectly fused to each other.
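The domain arrangements in the preceding paragraphs can be thought of as ordered concatenations of domains, with an optional linker distinguishing indirect from direct fusion. The sketch below assumes hypothetical placeholder domain strings (`DOMAINS`), a hypothetical `fuse` helper, and uses the GGGGS unit only as an example linker. It enumerates candidate N-to-C orders with `itertools.permutations`; whether a given order "does not impede capsid assembly and cargo binding" is an experimental question that this sketch does not model.

```python
from itertools import permutations

# Hypothetical placeholder domain sequences (not real Arc/endo-Gag domains).
DOMAINS = {"CA_N": "NNNNNN", "CA_C": "CCCCCC", "CARGO": "RRRRRR"}


def fuse(order, domains=DOMAINS, linker: str = "") -> str:
    """Concatenate domains from N- to C-terminus.

    An empty linker models direct fusion; a non-empty linker (for example the
    flexible GGGGS unit) models indirect fusion between every adjacent pair.
    """
    return linker.join(domains[name] for name in order)


def candidate_orders(names):
    """All N-to-C arrangements; which ones still assemble is decided experimentally."""
    return list(permutations(names))


for order in candidate_orders(["CA_N", "CARGO"]):
    print(order, fuse(order, linker="GGGGS"))
```

Enumerating every permutation and filtering afterwards keeps the construct-building step separate from the (wet-lab) assembly screen.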

[0061] In some embodiments, the Arc or endo-Gag polypeptide is engineered to comprise a cargo binding domain, a CA domain, an MA domain, or an RT domain from one or more additional species to generate an engineered Arc or endo-Gag polypeptide. For example, the engineered Arc or endo-Gag polypeptide comprises a cargo binding domain, a CA domain, an MA domain, or an RT domain from a first species and a cargo binding domain, a CA domain, an MA domain, or an RT domain from a second species. In some cases, the first species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant. In some cases, the second species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant that is different from the first species.

[0062] In some instances, the engineered Arc or endo-Gag polypeptide comprises a cargo binding domain from a first species and a CA domain (e.g., a CA N-lobe and optionally a CA C-lobe) from a second species. The engineered Arc or endo-Gag polypeptide optionally comprises an MA domain and an RT domain from either the first species or the second species. In some cases, the first species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant. In some cases, the second species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant that is different from the first species.

[0063] In some instances, the engineered Arc or endo-Gag polypeptide comprises a cargo binding domain, a first CA domain, a second CA domain, and optionally an MA domain and/or an RT domain. In some cases, the cargo binding domain, the first CA domain, and optionally an MA domain and/or an RT domain are from a first species and the second CA domain is from a second species. In some cases, the first CA domain is from a first species and the cargo binding domain, the second CA domain, and optionally an MA domain and/or an RT domain are from a second species. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, each domain is fused, either directly or indirectly, to each adjacent domain.

[0064] In some instances, the engineered Arc or endo-Gag polypeptide comprises a cargo binding domain, a first CA domain, and a second CA domain. In some cases, the cargo binding domain and the first CA domain are from a first species and the second CA domain is from a second species. In some cases, the first CA domain is from a first species and the cargo binding domain and the second CA domain are from a second species. In such cases, the engineered Arc or endo-Gag polypeptide comprises, from the N-terminus to the C-terminus, one of the following domain arrangements: (i) a cargo binding domain, a first CA domain, and a second CA domain; (ii) a first CA domain, a cargo binding domain, and a second CA domain; or (iii) a first CA domain, a second CA domain, and a cargo binding domain. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, each domain is fused, either directly or indirectly, to each adjacent domain.

[0065] In some instances, the engineered Arc or endo-Gag polypeptide further comprises a second polypeptide. In some instances, the second polypeptide is fused directly or indirectly via a linker to one or more of: a cargo binding domain, a first CA domain, a second CA domain, an MA domain if present, or an RT domain if present. In some cases, the second polypeptide is a protein (e.g., a human protein), an antibody or binding fragment thereof, a viral protein, a Gag-like protein (e.g., a human Gag-like protein), or a de novo engineered protein designed to bind to a target receptor of interest. In some instances, the antibody or binding fragment thereof comprises a humanized antibody or binding fragment thereof, a murine antibody or binding fragment thereof, a chimeric antibody or binding fragment thereof, a monoclonal antibody or binding fragment thereof, a multi-specific antibody or binding fragment thereof, a bispecific antibody or binding fragment thereof, a monovalent Fab', a divalent F(ab')2, an F(ab')3 fragment, a single-chain variable fragment (scFv), a bis-scFv, an (scFv)2, a diabody, a minibody, a nanobody, a triabody, a tetrabody, a disulfide-stabilized Fv protein (dsFv), a single-domain antibody (sdAb), an IgNAR, a camelid antibody or binding fragment thereof, or a chemically modified derivative thereof. In some instances, the second polypeptide guides the delivery of a capsid formed by the engineered Arc or endo-Gag polypeptide to a target site of interest.

[0066] In some embodiments, a nucleic acid sequence or amino acid sequence of the disclosure (for example, encoding an Arc polypeptide or endo-Gag polypeptide) has at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 99.99% homology to an amino acid sequence provided herein. Various methods and software programs can be used to determine the homology between two or more sequences, such as NCBI BLAST, Clustal W, MAFFT, Clustal Omega, AlignMe, Praline, or another suitable method or algorithm.
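Once an alignment tool such as those listed above has produced a pairwise alignment, checking a threshold such as "at least 90% identity" reduces to counting identical columns. The following is a minimal sketch that assumes the two sequences are already aligned (gaps written as "-"). The normalization used here, identical residues divided by columns in which at least one sequence has a residue, is an assumption, since alignment programs differ in how they define the denominator. `percent_identity` and `meets_threshold` are hypothetical helper names.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Convention (an assumption; tools differ): identical, non-gap columns divided
    by the number of columns in which at least one sequence has a residue.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Skip columns where both sequences are gapped; they carry no information.
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aln_a: str, aln_b: str, threshold: float = 90.0) -> bool:
    """True if percent identity is at least `threshold` (e.g. 90 for "at least 90%")."""
    return percent_identity(aln_a, aln_b) >= threshold


# Toy aligned peptides (placeholders, not SEQ ID NOs from this disclosure).
print(percent_identity("MKTAYIAK", "MKTAYIAR"))  # 87.5
print(meets_threshold("MKTAYIAK", "MKTAYIAR", threshold=90.0))  # False
```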

[0067] In certain embodiments, the Arc polypeptide is a human polypeptide having the amino acid sequence of SEQ ID NO: 1 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 1.

[0068] In certain embodiments, the Arc polypeptide is a killer whale polypeptide having the amino acid sequence of SEQ ID NO: 2 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 2.

[0069] In certain embodiments, the Arc polypeptide is a white-tailed deer polypeptide having the amino acid sequence of SEQ ID NO: 3 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 3.

[0070] In certain embodiments, the Arc polypeptide is a platypus polypeptide having the amino acid sequence of SEQ ID NO: 4 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 4.

[0071] In certain embodiments, the Arc polypeptide is a goose polypeptide having the amino acid sequence of SEQ ID NO: 5 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 5.

[0072] In certain embodiments, the Arc polypeptide is a Dalmatian pelican polypeptide having the amino acid sequence of SEQ ID NO: 6 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 6.

[0073] In certain embodiments, the Arc polypeptide is a white-tailed eagle polypeptide having the amino acid sequence of SEQ ID NO: 7 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 7.

[0074] In certain embodiments, the Arc polypeptide is a king cobra polypeptide having the amino acid sequence of SEQ ID NO: 8 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 8.

[0075] In certain embodiments, the Arc polypeptide is a ray-finned fish polypeptide having the amino acid sequence of SEQ ID NO: 9 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 9.

[0076] In certain embodiments, the Arc polypeptide is a sperm whale polypeptide having the amino acid sequence of SEQ ID NO: 10 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 10.

[0077] In certain embodiments, the Arc polypeptide is a turkey polypeptide having the amino acid sequence of SEQ ID NO: 11 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 11.

[0078] In certain embodiments, the Arc polypeptide is a central bearded dragon polypeptide having the amino acid sequence of SEQ ID NO: 12 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 12.

[0079] In certain embodiments, the Arc polypeptide is a Chinese alligator polypeptide having the amino acid sequence of SEQ ID NO: 13 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 13.

[0080] In certain embodiments, the Arc polypeptide is an American alligator polypeptide having the amino acid sequence of SEQ ID NO: 14 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 14.

[0081] In certain embodiments, the Arc polypeptide is a Japanese gecko polypeptide having the amino acid sequence of SEQ ID NO: 15 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 15.

[0082] In certain embodiments, the endo-Gag polypeptide is a human PNMA3 polypeptide having the amino acid sequence of SEQ ID NO: 16 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 16.

[0083] In certain embodiments, the endo-Gag polypeptide is a human PNMA5 polypeptide having the amino acid sequence of SEQ ID NO: 17 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 17.

[0084] In certain embodiments, the endo-Gag polypeptide is a human PNMA6A polypeptide having the amino acid sequence of SEQ ID NO: 18 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 18.

[0085] In certain embodiments, the endo-Gag polypeptide is a human PNMA6B polypeptide having the amino acid sequence of SEQ ID NO: 19 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 19.

[0086] In certain embodiments, the endo-Gag polypeptide is a human RTL3 polypeptide having the amino acid sequence of SEQ ID NO: 20 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 20.

[0087] In certain embodiments, the endo-Gag polypeptide is a human RTL6 polypeptide having the amino acid sequence of SEQ ID NO: 21 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 21.

[0088] In certain embodiments, the endo-Gag polypeptide is a human RTL8A polypeptide having the amino acid sequence of SEQ ID NO: 22 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 22.

[0089] In certain embodiments, the endo-Gag polypeptide is a human RTL8B polypeptide having the amino acid sequence of SEQ ID NO: 23 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 23.

[0090] In certain embodiments, the endo-Gag polypeptide is a human BOP polypeptide having the amino acid sequence of SEQ ID NO: 24 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 24.

[0091] In certain embodiments, the endo-Gag polypeptide is a human LDOC1 polypeptide having the amino acid sequence of SEQ ID NO: 25 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 25.

[0092] In certain embodiments, the endo-Gag polypeptide is a human ZNF18 polypeptide having the amino acid sequence of SEQ ID NO: 26 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 26.

[0093] In certain embodiments, the endo-Gag polypeptide is a human MOAP1 polypeptide having the amino acid sequence of SEQ ID NO: 27 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 27.

[0094] In certain embodiments, the endo-Gag polypeptide is a human PEG10 polypeptide having the amino acid sequence of SEQ ID NO: 28 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 28.

[0095] In some cases, the recombinant Arc or endo-Gag polypeptide is an Arc polypeptide illustrated in FIG. 1.

[0096] In some cases, the engineered Arc or endo-Gag polypeptide is an engineered Arc polypeptide illustrated in FIG. 2.

Linkers

[0097] In certain embodiments, a polypeptide of the disclosure comprises a linker. In some embodiments, the linker is a peptide linker. In some instances, the linker is a rigid linker. In other instances, the linker is a flexible linker. In some cases, the linker is a non-cleavable linker. In other cases, the linker is a cleavable linker. In additional cases, the linker comprises a linear structure, or a non-linear structure (e.g., a cyclic structure).

[0098] In certain embodiments, non-cleavable linkers comprise short peptides of varying lengths. Exemplary non-cleavable linkers include (EAAAK)n (SEQ ID NO: 70) or (EAAAR)n (SEQ ID NO: 71), where n is from 1 to 5, and glutamic acid-proline or lysine-proline repeats of up to 30 residues. In some embodiments, the non-cleavable linker comprises (GGGGS)n (SEQ ID NO: 72) or (GGGS)n (SEQ ID NO: 73), wherein n is 1 to 10; KESGSVSSEQLAQFRSLD (SEQ ID NO: 74); or EGKSSGSGSESKST (SEQ ID NO: 75). In some embodiments, the non-cleavable linker comprises a poly-Gly/Ala polymer.
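The repeat linkers above are fully specified by a repeat unit and a copy number n, so they can be generated and range-checked mechanically. The following is a minimal sketch in which `REPEAT_LINKERS`, `repeat_linker`, and `dipeptide_repeat` are hypothetical names. The n ranges and the 30-residue cap are taken from the paragraph, and "glutamic acid-proline" and "lysine-proline" are written in one-letter code as EP and KP.

```python
# Repeat units and allowed n ranges as stated in the paragraph; the
# dictionary and helper functions themselves are hypothetical.
REPEAT_LINKERS = {
    "EAAAK": (1, 5),   # (EAAAK)n, SEQ ID NO: 70
    "EAAAR": (1, 5),   # (EAAAR)n, SEQ ID NO: 71
    "GGGGS": (1, 10),  # (GGGGS)n, SEQ ID NO: 72
    "GGGS": (1, 10),   # (GGGS)n, SEQ ID NO: 73
}
MAX_DIPEPTIDE_REPEAT = 30  # "up to 30 residues" of E-P or K-P repeats


def repeat_linker(unit: str, n: int) -> str:
    """Return (unit)n after checking n against the stated range for that unit."""
    lo, hi = REPEAT_LINKERS[unit]
    if not lo <= n <= hi:
        raise ValueError(f"n for ({unit})n must be between {lo} and {hi}")
    return unit * n


def dipeptide_repeat(pair: str, length: int) -> str:
    """Alternating repeat such as EPEP... or KPKP..., truncated to `length` residues."""
    if pair not in ("EP", "KP"):
        raise ValueError("pair must be 'EP' or 'KP'")
    if not 1 <= length <= MAX_DIPEPTIDE_REPEAT:
        raise ValueError(f"length must be between 1 and {MAX_DIPEPTIDE_REPEAT}")
    return (pair * ((length + 1) // 2))[:length]


print(repeat_linker("GGGGS", 3))  # GGGGSGGGGSGGGGS
print(dipeptide_repeat("EP", 5))  # EPEPE
```

Encoding the ranges as data rather than as separate functions keeps the stated limits in one place if further repeat units are added.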

[0099] In certain embodiments, the linker is a cleavable linker, e.g., an extracellular cleavable linker or an intracellular cleavable linker. In some instances, the linker is designed for cleavage in the presence of particular conditions or in a particular environment (e.g., under physiological conditions). For example, designing a linker for cleavage under specific conditions, such as by a specific enzyme, allows cellular uptake to be targeted to a specific location.

[0100] In some embodiments, the linker is a pH-sensitive linker. In one instance, the linker is cleaved under basic pH conditions. In other instances, the linker is cleaved under acidic pH conditions.

[0101] In some embodiments, the linker is cleaved in vivo by endogenous enzymes (e.g., proteases), such as serine proteases (e.g., thrombin), metalloproteases, furin, cathepsin B, necrotic enzymes (e.g., calpains), and the like. Exemplary cleavable linkers include, but are not limited to, GGAANLVRGG (SEQ ID NO: 76); SGRIGFLRTA (SEQ ID NO: 77); SGRSA (SEQ ID NO: 78); GFLG (SEQ ID NO: 79); ALAL (SEQ ID NO: 80); FK; PIC(Et)F-F (SEQ ID NO: 81), where C(Et) indicates S-ethylcysteine; PR(S/T)(L/I)(S/T) (SEQ ID NO: 82); DEVD (SEQ ID NO: 83); GWEHDG (SEQ ID NO: 84); RPLALWRS (SEQ ID NO: 85); or a combination thereof.
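When designing a construct, it can help to confirm where the intended cleavable linker sits and whether any listed motif also occurs elsewhere in the fusion. The following is a minimal sketch using Python's `re` module. The motif table is transcribed from the paragraph, with PR(S/T)(L/I)(S/T) written as the character-class regex `PR[ST][LI][ST]`. PIC(Et)F-F is omitted because S-ethylcysteine has no one-letter code. `find_cleavage_motifs` is a hypothetical helper. Which enzyme acts on which motif is outside the scope of this sketch, and short motifs such as FK will match frequently.

```python
import re

# Motifs transcribed from the paragraph. PR(S/T)(L/I)(S/T) becomes a regex with
# character classes; PIC(Et)F-F is omitted (S-ethylcysteine has no one-letter code).
CLEAVABLE_MOTIFS = {
    "SEQ ID NO: 76": "GGAANLVRGG",
    "SEQ ID NO: 77": "SGRIGFLRTA",
    "SEQ ID NO: 78": "SGRSA",
    "SEQ ID NO: 79": "GFLG",
    "SEQ ID NO: 80": "ALAL",
    "FK": "FK",
    "SEQ ID NO: 82": "PR[ST][LI][ST]",
    "SEQ ID NO: 83": "DEVD",
    "SEQ ID NO: 84": "GWEHDG",
    "SEQ ID NO: 85": "RPLALWRS",
}


def find_cleavage_motifs(seq: str):
    """Return (motif_id, start, matched_text) tuples sorted by start position."""
    hits = []
    for motif_id, pattern in CLEAVABLE_MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences be reported.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((motif_id, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Toy fusion sequence (placeholder, not a construct from this disclosure).
for hit in find_cleavage_motifs("MKGFLGAAPRSLSDEVDK"):
    print(hit)
```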

Capsids

[0102] In some embodiments, disclosed herein is a capsid. In some instances, the capsid comprises an Arc polypeptide and/or an endo-Gag polypeptide such as a Copia protein, an ASPRV1 protein, a protein from the SCAN domain family, a protein encoded by the Paraneoplastic Ma antigen family, a protein or a combination of proteins chosen from the retrotransposon Gag-like family, or a combination thereof. Exemplary endo-Gag polypeptides are BOP, LDOC1, MOAP1, PEG10, PNMA3, PNMA5, PNMA6A, PNMA6B, RTL3, RTL6, RTL8A, RTL8B, and ZNF18. In some instances, the Arc polypeptide, the Copia protein, the ASPRV1 protein, the protein from the SCAN domain family, the protein encoded by the Paraneoplastic Ma antigen family, and the protein or a combination of proteins chosen from the retrotransposon Gag-like family are each independently a full-length polypeptide. In other instances, the Arc polypeptide, the Copia protein, the ASPRV1 protein, the protein from the SCAN domain family, the protein encoded by the Paraneoplastic Ma antigen family, and the protein or a combination of proteins chosen from the retrotransposon Gag-like family are each independently a functional fragment thereof, e.g., a fragment that is capable of forming a subunit of a capsid.

Arc-Based Capsids and Endo-Gag-Based Capsids

[0103] In some embodiments, the capsid comprises an Arc-based capsid. In some embodiments, the capsid comprises an endo-Gag-based capsid. In some instances, the Arc-based and/or endo-Gag-based capsid comprises a plurality of recombinant Arc polypeptides and/or endo-Gag polypeptides described above, a plurality of engineered Arc polypeptides and/or endo-Gag polypeptides described above, or a combination thereof. In some cases, the Arc-based capsid comprises a plurality of recombinant Arc polypeptides. In other cases, the Arc-based capsid comprises a plurality of engineered Arc polypeptides. In some cases, the endo-Gag-based capsid comprises a plurality of recombinant endo-Gag polypeptides. In other cases, the endo-Gag-based capsid comprises a plurality of engineered endo-Gag polypeptides.

[0104] In some embodiments, the Arc-based or endo-Gag-based capsid comprises a first plurality of Arc and/or endo-Gag polypeptides from a first species and a second plurality of Arc and/or endo-Gag polypeptides from at least a second species. In some cases, the first species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant. In some cases, the second species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant that is different from the first species.

[0105] In some instances, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 20:1, 50:1, or 100:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 2:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 4:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 5:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 8:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 10:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 20:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 50:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 100:1. In some instances, the ratio is the comparison in molar concentration. In some instances, the ratio is the comparison in the number of capsid forming subunits (e.g., each recombinant or engineered Arc or endo-Gag polypeptide forms a capsid subunit).

[0106] In some instances, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, 1:10, 1:20, or 1:50. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:2. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:5. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:8. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:10. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:20. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:50. In some instances, the ratio is the comparison in molar concentration. In some instances, the ratio is the comparison in the number of capsid forming subunits (e.g., each recombinant or engineered Arc or endo-Gag polypeptide forms a capsid subunit).
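When the ratio is read as a comparison of capsid-forming subunits, a ratio a:b fixes the fraction a/(a+b) of subunits contributed by the first plurality. The following is a minimal sketch of that arithmetic. The total of 240 subunits is purely an illustrative assumption, not a value from this disclosure, and `split_subunits` is a hypothetical helper. Exact `Fraction` arithmetic avoids floating-point drift, and when the total is not divisible by a+b the realized ratio is necessarily approximate.

```python
from fractions import Fraction


def split_subunits(first: int, second: int, total: int):
    """Split `total` capsid-forming subunits between two populations at first:second.

    The first population receives round(total * first / (first + second)) subunits
    (Python rounds exact halves to even); the second receives the remainder.
    """
    if first <= 0 or second <= 0 or total <= 0:
        raise ValueError("ratio terms and total must be positive")
    n_first = round(total * Fraction(first, first + second))
    return n_first, total - n_first


ILLUSTRATIVE_TOTAL = 240  # hypothetical subunit count, not from this disclosure
for ratio in [(1, 1), (4, 1), (1, 5), (100, 1)]:
    print(ratio, split_subunits(*ratio, ILLUSTRATIVE_TOTAL))
```

For example, 100:1 with 240 subunits gives 238 and 2, so extreme ratios may only be approximated within a single capsid.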

[0107] In some embodiments, the Arc-based capsid or endo-Gag-based capsid comprises a plurality of recombinant or engineered Arc polypeptides and a plurality of non-Arc proteins. Exemplary species of non-Arc proteins include, but are not limited to, Copia, ASPRV1, a protein or a combination of proteins chosen from the SCAN domain family, a protein or a combination of proteins chosen from the Paraneoplastic Ma antigen family, and a protein or a combination of proteins chosen from the retrotransposon Gag-like family. Further exemplary species of non-Arc proteins include BOP, LDOC1, MOAP1, PEG10, PNMA3, PNMA5, PNMA6A, PNMA6B, RTL3, RTL6, RTL8A, RTL8B, and ZNF18.

[0108] In some instances, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 20:1, 50:1, or 100:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 2:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 4:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 5:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 8:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 10:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 20:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 50:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 100:1. In some instances, the ratio is the comparison in molar concentration. In some instances, the ratio is the comparison in the number of capsid forming subunits (e.g., each recombinant or engineered Arc polypeptide forms a capsid subunit).

[0109] In some instances, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, 1:10, 1:20, or 1:50. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:2. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:5. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:8. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:10. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:20. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:50. In some instances, the ratio is the comparison in molar concentration. In some instances, the ratio is the comparison in the number of capsid forming subunits (e.g., each recombinant or engineered Arc polypeptide forms a capsid subunit).
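To assemble a mixture at one of the Arc:non-Arc molar ratios in [0108] and [0109], each stock can be diluted with the usual C1V1 = C2V2 relation. The sketch below is hypothetical: the stock concentrations, final volume, total molarity, and the function name `plan_mix` are assumptions for illustration, and each protein is treated as a single molecular species.

```python
from dataclasses import dataclass


@dataclass
class MixPlan:
    arc_stock_ul: float
    non_arc_stock_ul: float
    buffer_ul: float


def plan_mix(arc_parts: int, non_arc_parts: int, total_um: float,
             final_ul: float, arc_stock_um: float, non_arc_stock_um: float) -> MixPlan:
    """Volumes giving a target Arc:non-Arc molar ratio at a given total molarity."""
    if arc_parts <= 0 or non_arc_parts <= 0:
        raise ValueError("ratio parts must be positive")
    parts = arc_parts + non_arc_parts
    arc_um = total_um * arc_parts / parts          # final Arc concentration
    non_arc_um = total_um * non_arc_parts / parts  # final non-Arc concentration
    # C1 * V1 = C2 * V2, solved for the stock volume V1 of each protein
    arc_ul = arc_um * final_ul / arc_stock_um
    non_arc_ul = non_arc_um * final_ul / non_arc_stock_um
    buffer_ul = final_ul - arc_ul - non_arc_ul
    if buffer_ul < 0:
        raise ValueError("stocks too dilute for the requested total concentration")
    return MixPlan(arc_ul, non_arc_ul, buffer_ul)


# Hypothetical 4:1 Arc:non-Arc mixture, 10 uM total protein in 100 uL.
plan = plan_mix(4, 1, total_um=10.0, final_ul=100.0,
                arc_stock_um=50.0, non_arc_stock_um=20.0)
print(plan)
```

Raising an error when the buffer volume would be negative flags stocks that cannot reach the requested concentration, rather than silently producing an impossible recipe.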

[0110] In some embodiments, the capsid has a diameter of at least 1 nm, or more. In some instances, the capsid has a diameter of at least 2 nm, 3 nm, 4 nm, 5 nm, 10 nm, 15 nm, 20 nm, 25 nm, 30 nm, 40 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, 100 nm, 150 nm, 200 nm, 300 nm, 400 nm, 500 nm, 600 nm, or more. In some instances, the capsid has a diameter of at least 5 nm, or more. In some cases, the capsid has a diameter of at least 10 nm, or more. In some instances, the capsid has a diameter of at least 20 nm, or more. In some cases, the capsid has a diameter of at least 30 nm, or more. In some cases, the capsid has a diameter of at least 40 nm, or more. In some cases, the capsid has a diameter of at least 50 nm, or more. In some cases, the capsid has a diameter of at least 80 nm, or more. In some cases, the capsid has a diameter of at least 100 nm, or more. In some cases, the capsid has a diameter of at least 200 nm, or more. In some cases, the capsid has a diameter of at least 300 nm, or more. In some cases, the capsid has a diameter of at least 400 nm, or more. In some cases, the capsid has a diameter of at least 500 nm, or more. In some cases, the capsid has a diameter of at least 600 nm, or more.

[0111] In some embodiments, the capsid has a diameter of at most 1 nm, or less. In some instances, the capsid has a diameter of at most 2 nm, 3 nm, 4 nm, 5 nm, 10 nm, 15 nm, 20 nm, 25 nm, 30 nm, 40 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, 100 nm, 150 nm, 200 nm, 300 nm, 400 nm, 500 nm, 600 nm, or less. In some instances, the capsid has a diameter of at most 5 nm, or less. In some cases, the capsid has a diameter of at most 10 nm, or less. In some instances, the capsid has a diameter of at most 20 nm, or less. In some cases, the capsid has a diameter of at most 30 nm, or less. In some cases, the capsid has a diameter of at most 40 nm, or less. In some cases, the capsid has a diameter of at most 50 nm, or less. In some cases, the capsid has a diameter of at most 80 nm, or less. In some cases, the capsid has a diameter of at most 100 nm, or less. In some cases, the capsid has a diameter of at most 200 nm, or less. In some cases, the capsid has a diameter of at most 300 nm, or less. In some cases, the capsid has a diameter of at most 400 nm, or less. In some cases, the capsid has a diameter of at most 500 nm, or less. In some cases, the capsid has a diameter of at most 600 nm, or less.

[0112] In some embodiments, the capsid has a diameter of about 1 nm, 2 nm, 3 nm, 4 nm, 5 nm, 10 nm, 15 nm, 20 nm, 25 nm, 30 nm, 40 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, 100 nm, 150 nm, 200 nm, 300 nm, 400 nm, 500 nm, or 600 nm. In some instances, the capsid has a diameter of about 5 nm. In some cases, the capsid has a diameter of about 10 nm. In some instances, the capsid has a diameter of about 20 nm. In some cases, the capsid has a diameter of about 30 nm. In some cases, the capsid has a diameter of about 40 nm. In some cases, the capsid has a diameter of about 50 nm. In some cases, the capsid has a diameter of about 80 nm. In some cases, the capsid has a diameter of about 100 nm. In some cases, the capsid has a diameter of about 200 nm. In some cases, the capsid has a diameter of about 300 nm. In some cases, the capsid has a diameter of about 400 nm. In some cases, the capsid has a diameter of about 500 nm. In some cases, the capsid has a diameter of about 600 nm.

[0113] In some embodiments, the capsid has a diameter of from about 1 nm to about 600 nm. In some instances, the capsid has a diameter of from about 2 nm to about 500 nm, from about 2 nm to about 400 nm, from about 2 nm to about 300 nm, from about 2 nm to about 200 nm, from about 2 nm to about 100 nm, from about 2 nm to about 50 nm, from about 2 nm to about 30 nm, from about 20 nm to about 400 nm, from about 20 nm to about 300 nm, from about 20 nm to about 200 nm, from about 20 nm to about 100 nm, from about 20 nm to about 50 nm, from about 20 nm to about 30 nm, from about 30 nm to about 500 nm, from about 30 nm to about 400 nm, from about 30 nm to about 300 nm, from about 30 nm to about 200 nm, from about 30 nm to about 100 nm, from about 30 nm to about 50 nm, from about 50 nm to about 300 nm, from about 50 nm to about 200 nm, from about 50 nm to about 100 nm, from about 2 nm to about 25 nm, from about 2 nm to about 20 nm, from about 2 nm to about 10 nm, from about 5 nm to about 25 nm, from about 5 nm to about 20 nm, from about 5 nm to about 10 nm, from about 10 nm to about 25 nm, or from about 10 nm to about 20 nm.
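In practice, whether a capsid preparation falls within one of the ranges in [0113] is a question about a distribution of measured diameters rather than a single value. The sketch below assumes a list of per-particle diameters (for example, from electron micrographs; this passage does not specify a measurement method) and reports the mean, the median, and the fraction of particles inside a chosen range. The measurements are hypothetical, and the bounds are applied exactly because this passage does not define how far "about" extends.

```python
from statistics import mean, median


def summarize_diameters(diameters_nm: list[float], low_nm: float, high_nm: float) -> dict:
    """Mean, median, and fraction of particles with low_nm <= diameter <= high_nm."""
    if not diameters_nm:
        raise ValueError("no measurements")
    inside = [d for d in diameters_nm if low_nm <= d <= high_nm]
    return {
        "mean_nm": mean(diameters_nm),
        "median_nm": median(diameters_nm),
        "fraction_in_range": len(inside) / len(diameters_nm),
    }


# Hypothetical per-particle diameters (nm); not data from the disclosure.
measured = [28.0, 31.5, 30.2, 33.0, 29.4, 52.0, 30.8, 27.9]
summary = summarize_diameters(measured, low_nm=20.0, high_nm=50.0)  # "about 20 nm to about 50 nm"
print(summary)
```

Reporting the median alongside the mean shows how a single oversized particle (52 nm here) pulls the mean upward while leaving the median near the bulk of the population.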

[0114] In some embodiments, the capsid has a reduced off-target effect. In some cases, the off-target effect is less than 10%, 5%, 4%, 3%, 2%, 1%, or 0.5%. In some cases, the off-target effect is no more than 10%, 5%, 4%, 3%, 2%, 1%, or 0.5%.

[0115] In some cases, the capsid does not have an off-target effect.
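Paragraph [0114] distinguishes an off-target effect that is "less than" a threshold from one that is "no more than" that threshold; the two readings differ only at the boundary value. The sketch below assumes, hypothetically, that the off-target effect is scored as off-target events divided by total events, expressed as a percentage. The disclosure does not define the metric here, and the function names and counts are illustrative. The case in [0115] with no off-target effect corresponds to 0%.

```python
from typing import Optional

THRESHOLDS_PCT = (10.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5)  # values recited in [0114]


def off_target_pct(off_target_events: int, total_events: int) -> float:
    """Hypothetical metric: share of all events that occur off target, in percent."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= off_target_events <= total_events:
        raise ValueError("off_target_events out of range")
    return 100.0 * off_target_events / total_events


def tightest_threshold(pct: float, inclusive: bool) -> Optional[float]:
    """Smallest recited threshold met: 'no more than' if inclusive, else 'less than'."""
    met = [t for t in THRESHOLDS_PCT if (pct <= t if inclusive else pct < t)]
    return min(met) if met else None


pct = off_target_pct(3, 300)  # hypothetical counts
print(pct, tightest_threshold(pct, inclusive=False), tightest_threshold(pct, inclusive=True))
```

A result of exactly 1% satisfies "no more than 1%" but only "less than 2%", which is why the sketch keeps the two comparisons separate.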

[0116] In certain embodiments, the formation of Arc and/or endo-Gag-based capsids occurs either ex vivo or in vitro.

[0117] In some instances, the Arc and/or endo-Gag-based capsid is assembled in vivo.

[0118] In some instances, the Arc and/or endo-Gag-based capsid is stable at room temperature. In some cases, the Arc and/or endo-Gag-based capsid is empty. In other cases, the Arc and/or endo-Gag-based capsid is loaded (for example, loaded with a cargo and/or a therapeutic agent, e.g., a DNA or an RNA).

[0119] In some instances, the Arc and/or endo-Gag-based capsid is stable at a temperature from about 2 °C to about 37 °C. In some instances, the Arc and/or endo-Gag-based capsid is stable at a temperature from about 2 °C to about 8 °C, about 2 °C to about 4 °C, about 20 °C to about 37 °C, about 25 °C to about 37 °C, about 20 °C to about 30 °C, about 25 °C to about 30 °C, or about 30 °C to about 37 °C. In some cases, the Arc and/or endo-Gag-based capsid is empty. In other cases, the Arc and/or endo-Gag-based capsid is loaded (for example, loaded with a cargo and/or a therapeutic agent, e.g., a DNA or an RNA).

[0120] In some instances, the Arc and/or endo-Gag-based capsid is stable for at least about 1 day, 2 days, 4 days, 5 days, 7 days, 14 days, 28 days, 30 days, 60 days, 2 months, 3 months, 4 months, 5 months, 6 months, 12 months, 18 months, 24 months, 3 years, 5 years, or longer. In some cases, the Arc and/or endo-Gag-based capsids show minimal degradation, e.g., less than 10%, 5%, 4%, 3%, 2%, 1%, or 0.5% of the total population of the Arc and/or endo-Gag-based capsids is degraded. In some cases, the Arc and/or endo-Gag-based capsid is empty. In other cases, the Arc and/or endo-Gag-based capsid is loaded (for example, loaded with a therapeutic agent, e.g., a DNA or an RNA).
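The degradation thresholds in [0120] can be checked by comparing an intact-capsid measurement at a later time point against the starting value. This sketch assumes some readout proportional to the number of intact capsids (the assay is not specified in the text); the time points, readouts, and function names are hypothetical.

```python
THRESHOLDS_PCT = (10.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5)  # recited in [0120]


def percent_degraded(intact_start: float, intact_now: float) -> float:
    """Percent of the starting capsid population that is no longer intact."""
    if intact_start <= 0:
        raise ValueError("intact_start must be positive")
    return 100.0 * (intact_start - intact_now) / intact_start


def thresholds_met(pct: float) -> list[float]:
    """All recited 'less than' thresholds satisfied by the degradation value."""
    return [t for t in THRESHOLDS_PCT if pct < t]


# Hypothetical intact-capsid readouts keyed by day of storage.
readouts = {0: 2000, 7: 1994, 28: 1962}
for day, value in readouts.items():
    pct = percent_degraded(readouts[0], value)
    print(f"day {day}: {pct:.2f}% degraded, below {thresholds_met(pct)}")
```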

Additional Capsids

[0121] In some embodiments, the capsid comprises the Copia protein. In some instances, the Copia protein is from Drosophila melanogaster (UniProtKB-P04146), Ceratitis capitata (UniProtKB-W8BHY5), or Drosophila simulans (UniProtKB-Q08461).

[0122] In some embodiments, the capsid comprises the protein ASPRV1. The ASPRV1 protein is a structural protein that participates in the development and maintenance of the skin barrier. In some instances, the protein ASPRV1 is from Homo sapiens (UniProtKB-Q53RT3).
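The UniProtKB accessions cited in [0121] and [0122] can be sanity-checked before they are used to retrieve sequences. The sketch below applies the accession-number regular expression published in the UniProt documentation and builds a FASTA URL for the UniProt REST service. It makes no network call unless `fetch_fasta` is invoked, and the helper names are illustrative.

```python
import re
import urllib.request

# Accession format published in the UniProt documentation.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def parse_accession(label: str) -> str:
    """Strip an optional 'UniProtKB-' prefix and validate the accession."""
    prefix = "UniProtKB-"
    acc = label[len(prefix):] if label.startswith(prefix) else label
    if not ACCESSION_RE.fullmatch(acc):
        raise ValueError(f"not a UniProtKB accession: {label!r}")
    return acc


def fasta_url(acc: str) -> str:
    return f"https://rest.uniprot.org/uniprotkb/{acc}.fasta"


def fetch_fasta(acc: str, timeout: float = 30.0) -> str:
    """Download the FASTA record (requires network access; not called below)."""
    with urllib.request.urlopen(fasta_url(acc), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


cited = {
    "Copia, D. melanogaster": "UniProtKB-P04146",
    "Copia, C. capitata": "UniProtKB-W8BHY5",
    "Copia, D. simulans": "UniProtKB-Q08461",
    "ASPRV1, H. sapiens": "UniProtKB-Q53RT3",
}
for name, label in cited.items():
    print(name, fasta_url(parse_accession(label)))
```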

[0123] In some embodiments, the capsid comprises a protein from the SCAN domain family. The SCAN domain occurs in a superfamily of zinc finger transcription factors. The SCAN domain, also known as the leucine-rich region (LeR), functions as a protein interaction domain that mediates self-association or selective association with other proteins.

[0124] In some embodiments, the capsid comprises a protein from the Paraneoplastic Ma antigen family. The Paraneoplastic Ma antigen family comprises about 14 members of neuro- and testis-specific proteins.

[0125] In some embodiments, the capsid comprises a protein encoded by a Retrotransposon Gag-like gene.

[0126] In some embodiments, the capsid comprises BOP, LDOC1, MOAP1, PEG10, PNMA3, PNMA5, PNMA6A, PNMA6B, RTL3, RTL6, RTL8A, RTL8B, and/or ZNF18.

Cargos

[0127] In some embodiments, a composition of the disclosure (for example, a capsid) comprises a cargo. In some embodiments, the cargo is a therapeutic agent. In some embodiments, the cargo is a nucleic acid molecule, a small molecule, a protein, a peptide, an antibody or binding fragment thereof, a peptidomimetic, or a nucleotidomimetic. In some instances, the cargo is a therapeutic cargo comprising, e.g., one or more drugs. In some instances, the cargo comprises a diagnostic tool for profiling, e.g., one or more markers (such as markers associated with one or more disease phenotypes). In additional instances, the cargo comprises an imaging tool.

[0128] In some instances, the cargo is a nucleic acid molecule. Exemplary nucleic acid molecules include DNA, RNA, or a mixture of DNA and RNA. In some instances, the nucleic acid molecule is a DNA polymer. In some cases, the DNA is a single stranded DNA polymer. In other cases, the DNA is a double stranded DNA polymer. In additional cases, the DNA is a hybrid of single and double stranded DNA polymer.

[0129] In some embodiments, the nucleic acid molecule is an RNA polymer, e.g., a single stranded RNA polymer, a double stranded RNA polymer, or a hybrid of single and double stranded RNA polymers. In some instances, the RNA comprises and/or encodes an antisense oligoribonucleotide, an siRNA, an mRNA, a tRNA, an rRNA, an snRNA, an shRNA, a microRNA, or a non-coding RNA.

[0130] In some embodiments, the nucleic acid molecule comprises a hybrid of DNA and RNA.

[0131] In some embodiments, the nucleic acid molecule is an antisense oligonucleotide, optionally comprising DNA, RNA, or a hybrid of DNA and RNA.

[0132] In some instances, the nucleic acid molecule comprises and/or encodes an mRNA molecule.

[0133] In some embodiments, the nucleic acid molecule comprises and/or encodes an RNAi molecule. In some cases, the RNAi molecule is a microRNA (miRNA) molecule. In other cases, the RNAi molecule is an siRNA molecule. The miRNA and/or siRNA is optionally double-stranded or in hairpin form, and is further optionally encapsulated as a precursor molecule.

[0134] In some embodiments, the nucleic acid molecule is for use in a nucleic acid-based therapy. In some instances, the nucleic acid molecule is for regulating gene expression (e.g., modulating mRNA translation or degradation), modulating RNA splicing, or RNA interference. In some cases, the nucleic acid molecule comprises and/or encodes an antisense oligonucleotide, a microRNA molecule, an siRNA molecule, or an mRNA molecule for use in regulating gene expression, modulating RNA splicing, or RNA interference.

[0135] In some instances, the nucleic acid molecule is for use in gene editing. Exemplary gene editing systems include, but are not limited to, CRISPR-Cas systems, zinc finger nuclease (ZFN) systems, and transcription activator-like effector nuclease (TALEN) systems. In some cases, the nucleic acid molecule comprises and/or encodes a component involved in the CRISPR-Cas systems, ZFN systems, or the TALEN systems.

[0136] In some cases, the nucleic acid molecule is for use in producing an antigen for a therapeutic and/or prophylactic vaccine. For example, the nucleic acid molecule encodes an antigen that is expressed and elicits a desirable immune response (e.g., a pro-inflammatory immune response, an anti-inflammatory immune response, a B cell response, an antibody response, a T cell response, a CD4+ T cell response, a CD8+ T cell response, a Th1 immune response, a Th2 immune response, a Th17 immune response, a Treg immune response, or a combination thereof).

[0137] In some cases, the nucleic acid molecule comprises a nucleic acid enzyme. Nucleic acid enzymes are RNA molecules (e.g., ribozymes) or DNA molecules (e.g., deoxyribozymes) that have catalytic activity. In some instances, the nucleic acid molecule is a ribozyme. In other instances, the nucleic acid molecule is a deoxyribozyme. In some cases, the nucleic acid molecule is an MNAzyme, which functions as a biosensor and/or a molecular switch (see, e.g., Mokany, et al., "MNAzymes, a versatile new class of nucleic acid enzymes that can function as biosensors and molecular switches," JACS 132(2): 1051-1059 (2010)).

[0138] In some instances, exemplary targets of the nucleic acid molecule include, but are not limited to, UL123 (human cytomegalovirus), APOB, AR (androgen receptor) gene, KRAS, PCSK9, CFTR, and SMN (e.g., SMN2).

[0139] In some embodiments, the nucleic acid molecule is at least 5 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 10 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 12 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 15 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 18 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 19 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 20 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 21 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 22 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 23 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 24 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 25 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 26 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 27 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 28 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 29 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 30 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 40 nucleotides or more in length. 
In some instances, the nucleic acid molecule is at least 50 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 100 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 200 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 300 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 500 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 1000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 2000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 3000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 4000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 5000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 8000 nucleotides or more in length.

[0140] In some embodiments, the nucleic acid molecule is at most 12 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 15 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 18 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 19 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 20 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 21 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 22 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 23 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 24 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 25 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 26 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 27 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 28 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 29 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 30 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 40 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 50 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 100 nucleotides or less in length. 
In some instances, the nucleic acid molecule is at most 200 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 300 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 500 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 1000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 2000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 3000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 4000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 5000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 8000 nucleotides or less in length.

[0141] In some embodiments, the nucleic acid molecule is about 5 nucleotides in length. In some instances, the nucleic acid molecule is about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 nucleotides in length. In some instances, the nucleic acid molecule is about 10 nucleotides in length. In some instances, the nucleic acid molecule is about 12 nucleotides in length. In some instances, the nucleic acid molecule is about 15 nucleotides in length. In some instances, the nucleic acid molecule is about 18 nucleotides in length. In some instances, the nucleic acid molecule is about 19 nucleotides in length. In some instances, the nucleic acid molecule is about 20 nucleotides in length. In some instances, the nucleic acid molecule is about 21 nucleotides in length. In some instances, the nucleic acid molecule is about 22 nucleotides in length. In some instances, the nucleic acid molecule is about 23 nucleotides in length. In some instances, the nucleic acid molecule is about 24 nucleotides in length. In some instances, the nucleic acid molecule is about 25 nucleotides in length. In some instances, the nucleic acid molecule is about 26 nucleotides in length. In some instances, the nucleic acid molecule is about 27 nucleotides in length. In some instances, the nucleic acid molecule is about 28 nucleotides in length. In some instances, the nucleic acid molecule is about 29 nucleotides in length. In some instances, the nucleic acid molecule is about 30 nucleotides in length. In some instances, the nucleic acid molecule is about 40 nucleotides in length. In some instances, the nucleic acid molecule is about 50 nucleotides in length. In some instances, the nucleic acid molecule is about 100 nucleotides in length. In some instances, the nucleic acid molecule is about 200 nucleotides in length. 
In some instances, the nucleic acid molecule is about 300 nucleotides in length. In some instances, the nucleic acid molecule is about 500 nucleotides in length. In some instances, the nucleic acid molecule is about 1000 nucleotides in length. In some instances, the nucleic acid molecule is about 2000 nucleotides in length. In some instances, the nucleic acid molecule is about 3000 nucleotides in length. In some instances, the nucleic acid molecule is about 4000 nucleotides in length. In some instances, the nucleic acid molecule is about 5000 nucleotides in length. In some instances, the nucleic acid molecule is about 8000 nucleotides in length.

[0142] In some embodiments, the nucleic acid molecule is from about 5 to about 10,000 nucleotides in length. In some instances, the nucleic acid molecule is from about 5 to about 9000 nucleotides in length, from about 5 to about 8000 nucleotides in length, from about 5 to about 7000 nucleotides in length, from about 5 to about 6000 nucleotides in length, from about 5 to about 5000 nucleotides in length, from about 5 to about 4000 nucleotides in length, from about 5 to about 3000 nucleotides in length, from about 5 to about 2000 nucleotides in length, from about 5 to about 1000 nucleotides in length, from about 5 to about 500 nucleotides in length, from about 5 to about 100 nucleotides in length, from about 5 to about 50 nucleotides in length, from about 5 to about 40 nucleotides in length, from about 5 to about 30 nucleotides in length, from about 5 to about 25 nucleotides in length, from about 5 to about 20 nucleotides in length, from about 10 to about 8000 nucleotides in length, from about 10 to about 7000 nucleotides in length, from about 10 to about 6000 nucleotides in length, from about 10 to about 5000 nucleotides in length, from about 10 to about 4000 nucleotides in length, from about 10 to about 3000 nucleotides in length, from about 10 to about 2000 nucleotides in length, from about 10 to about 1000 nucleotides in length, from about 10 to about 500 nucleotides in length, from about 10 to about 100 nucleotides in length, from about 10 to about 50 nucleotides in length, from about 10 to about 40 nucleotides in length, from about 10 to about 30 nucleotides in length, from about 10 to about 25 nucleotides in length, from about 10 to about 20 nucleotides in length, from about 18 to about 8000 nucleotides in length, from about 18 to about 7000 nucleotides in length, from about 18 to about 6000 nucleotides in length, from about 18 to about 5000 nucleotides in length, from about 18 to about 4000 nucleotides in length, from about 18 to about 3000 nucleotides in length, from about 18 to about 2000 nucleotides in length, from about 18 to about 1000 nucleotides in length, from about 18 to about 500 nucleotides in length, from about 18 to about 100 nucleotides in length, from about 18 to about 50 nucleotides in length, from about 18 to about 40 nucleotides in length, from about 18 to about 30 nucleotides in length, from about 18 to about 25 nucleotides in length, from about 12 to about 50 nucleotides in length, from about 20 to about 40 nucleotides in length, from about 20 to about 30 nucleotides in length, or from about 25 to about 30 nucleotides in length.
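Checking a candidate sequence against the length ranges in [0139] to [0142] amounts to counting nucleotides and testing interval membership. In this sketch, the ranges are a subset copied from [0142] and treated as inclusive, the sequence is a hypothetical 21-nt example, and the accepted alphabet (A, C, G, T, U, N) is an assumption chosen to allow DNA, RNA, or DNA/RNA hybrids. The function names are illustrative.

```python
# Ranges (nucleotides, inclusive) copied from [0142]; a subset for illustration.
DISCLOSED_RANGES_NT = [(5, 10000), (10, 50), (12, 50), (18, 30), (18, 25), (20, 30), (25, 30)]
ALLOWED = set("ACGTUN")  # assumption: DNA, RNA, or hybrid; N = any base


def length_nt(seq: str) -> int:
    """Count nucleotides after removing whitespace; reject unexpected symbols."""
    s = "".join(seq.split()).upper()
    unexpected = set(s) - ALLOWED
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return len(s)


def matching_ranges(n: int) -> list[tuple[int, int]]:
    """All listed (low, high) ranges that contain length n."""
    return [(lo, hi) for lo, hi in DISCLOSED_RANGES_NT if lo <= n <= hi]


candidate = "ACGUACGUAC GUACGUACGU A"  # hypothetical 21-nt RNA; spaces are ignored
n = length_nt(candidate)
print(n, matching_ranges(n))
```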

[0143] In some embodiments, the nucleic acid molecule comprises natural, synthetic, or artificial nucleotide analogues or bases. In some cases, the nucleic acid molecule comprises combinations of DNA, RNA and/or nucleotide analogues. In some instances, the synthetic or artificial nucleotide analogues or bases comprise modifications at one or more of ribose moiety, phosphate moiety, nucleoside moiety, or a combination thereof.

[0144] In some embodiments, a nucleotide analogue or artificial nucleotide base described above comprises a nucleic acid with a modification at a 2' hydroxyl group of the ribose moiety. In some instances, the modification includes an H, OR, R, halo, SH, SR, NH2, NHR, NR2, or CN, wherein R is an alkyl moiety. Exemplary alkyl moieties include, but are not limited to, halogens, sulfurs, thiols, thioethers, thioesters, amines (primary, secondary, or tertiary), amides, ethers, esters, alcohols, and oxygen. In some instances, the alkyl moiety further comprises a modification. In some instances, the modification comprises an azo group, a keto group, an aldehyde group, a carboxyl group, a nitro group, a nitroso group, a nitrile group, a heterocycle (e.g., imidazole, hydrazino, or hydroxylamino) group, an isocyanate or cyanate group, or a sulfur-containing group (e.g., sulfoxide, sulfone, sulfide, or disulfide). In some instances, the alkyl moiety further comprises a hetero substitution. In some instances, a carbon of the heterocyclic group is substituted by a nitrogen, oxygen, or sulfur. In some instances, the heterocyclic substitution includes, but is not limited to, morpholino, imidazole, and pyrrolidino.

[0145] In some instances, the modification at the 2' hydroxyl group is a 2'-O-methyl modification or a 2'-O-methoxyethyl (2'-O-MOE) modification. In some cases, the 2'-O-methyl modification adds a methyl group to the 2' hydroxyl group of the ribose moiety, whereas the 2'-O-methoxyethyl modification adds a methoxyethyl group to the 2' hydroxyl group of the ribose moiety.
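Each of these 2' modifications changes the mass of a residue by a fixed amount, which is one way such substitutions can be confirmed by mass spectrometry. Replacing the 2'-OH hydrogen with a methyl group adds CH2, and replacing it with a methoxyethyl group (CH2CH2OCH3) adds C3H6O. The sketch computes monoisotopic shifts from standard isotope masses; the residue counts in the example are hypothetical.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}

# Net atoms added when the 2'-OH hydrogen is replaced by the substituent.
DELTA = {
    "2'-O-methyl": {"C": 1, "H": 2},       # -H replaced by -CH3
    "2'-O-MOE": {"C": 3, "H": 6, "O": 1},  # -H replaced by -CH2CH2OCH3
}


def formula_mass(formula: dict[str, int]) -> float:
    """Monoisotopic mass of an elemental composition."""
    return sum(MONO[el] * count for el, count in formula.items())


def mass_shift(counts: dict[str, int]) -> float:
    """Total monoisotopic shift for a given number of each 2' modification."""
    return sum(formula_mass(DELTA[mod]) * n for mod, n in counts.items())


# Hypothetical oligonucleotide carrying 4 2'-O-methyl and 10 2'-O-MOE residues.
shift = mass_shift({"2'-O-methyl": 4, "2'-O-MOE": 10})
print(f"{shift:.4f} Da")
```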

[0146] In some instances, the modification at the 2' hydroxyl group is a 2'-O-aminopropyl modification, in which a propyl linker attaches an amine group to the 2' oxygen. In some instances, this modification neutralizes the phosphate-derived overall negative charge of the oligonucleotide molecule by introducing one positive charge from the amine group per sugar, and thereby improves cellular uptake owing to its zwitterionic character.
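The charge-neutralization argument in [0146] can be made concrete with idealized bookkeeping: a linear oligonucleotide of n residues has n - 1 internucleotide phosphates, each carrying one negative charge, and each 2'-O-aminopropyl sugar contributes one positive charge. The sketch assumes full ionization of the phosphates, full protonation of the amines, and no terminal phosphate groups; the actual net charge depends on pH and the relevant pKa values.

```python
def net_charge(n_residues: int, n_aminopropyl: int) -> int:
    """Idealized net charge of a linear oligonucleotide with 2'-O-aminopropyl sugars.

    Assumes every internucleotide phosphodiester is fully ionized (-1), every
    aminopropyl amine is fully protonated (+1), and there are no terminal phosphates.
    """
    if n_residues < 1:
        raise ValueError("need at least one residue")
    if not 0 <= n_aminopropyl <= n_residues:
        raise ValueError("n_aminopropyl must be between 0 and n_residues")
    phosphate_charge = -(n_residues - 1)  # one linkage between each adjacent pair
    amine_charge = n_aminopropyl          # one positive charge per modified sugar
    return phosphate_charge + amine_charge


for k in (0, 10, 20):
    print(f"20-mer with {k} 2'-O-AP residues: net charge {net_charge(20, k):+d}")
```

Under these assumptions, modifying every sugar of a 20-mer slightly overcompensates (+1), because there is one fewer linkage than residues.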

[0147] In some instances, the modification at the 2' hydroxyl group is a locked or bridged ribose modification (e.g., locked nucleic acid or LNA) in which the oxygen atom bound at the 2' carbon is linked to the 4' carbon by a methylene group, thus forming a 2'-C,4'-C-oxy-methylene-linked bicyclic ribonucleotide monomer.

[0148] In some embodiments, additional modifications at the 2' hydroxyl group include 2'-deoxy, 2'-deoxy-2'-fluoro, 2'-O-aminopropyl (2'-O-AP), 2'-O-dimethylaminoethyl (2'-O-DMAOE), 2'-O-dimethylaminopropyl (2'-O-DMAP), 2'-O-dimethylaminoethyloxyethyl (2'-O-DMAEOE), or 2'-O-N-methylacetamido (2'-O-NMA).

[0149] In some embodiments, a nucleotide analogue comprises a modified base such as, but not limited to, 5-propynyluridine, 5-propynylcytidine, 6-methyladenine, 6-methylguanine, N,N-dimethyladenine, 2-propyladenine, 2-propylguanine, 2-aminoadenine, 1-methylinosine, 3-methyluridine, 5-methylcytidine, 5-methyluridine and other nucleotides having a modification at the 5 position, 5-(2-amino)propyluridine, 5-halocytidine, 5-halouridine, 4-acetylcytidine, 1-methyladenosine, 2-methyladenosine, 3-methylcytidine, 6-methyluridine, 2-methylguanosine, 7-methylguanosine, 2,2-dimethylguanosine, 5-methylaminoethyluridine, 5-methyloxyuridine, deazanucleotides (such as 7-deaza-adenosine), 6-azauridine, 6-azacytidine, 6-azathymidine, 5-methyl-2-thiouridine, other thio bases (such as 2-thiouridine, 4-thiouridine, and 2-thiocytidine), dihydrouridine, pseudouridine, queuosine, archaeosine, naphthyl and substituted naphthyl groups, any O- and N-alkylated purines and pyrimidines (such as N6-methyladenosine, 5-methylcarbonylmethyluridine, uridine 5-oxyacetic acid, pyridine-4-one, or pyridine-2-one), phenyl and modified phenyl groups such as aminophenol or 2,4,6-trimethoxybenzene, modified cytosines that act as G-clamp nucleotides, 8-substituted adenines and guanines, 5-substituted uracils and thymines, azapyrimidines, carboxyhydroxyalkyl nucleotides, carboxyalkylaminoalkyl nucleotides, and alkylcarbonylalkylated nucleotides. Modified nucleotides also include those nucleotides that are modified with respect to the sugar moiety, as well as nucleotides having sugars or analogs thereof that are not ribosyl. For example, the sugar moieties, in some cases, are or are based on mannoses, arabinoses, glucopyranoses, galactopyranoses, 4'-thioribose, and other sugars, heterocycles, or carbocycles. The term nucleotide also includes universal bases. By way of example, universal bases include, but are not limited to, 3-nitropyrrole, 5-nitroindole, or nebularine.

[0150] In some embodiments, a nucleotide analogue further comprises a morpholino, a peptide nucleic acid (PNA), a methylphosphonate nucleotide, a thiolphosphonate nucleotide, a 2'-fluoro N3'-P5' phosphoramidate, or a 1',5'-anhydrohexitol nucleic acid (HNA). Morpholinos, or phosphorodiamidate morpholino oligomers (PMOs), are synthetic molecules whose structure mimics natural nucleic acid structure but deviates from the normal sugar and phosphate structures. In some instances, the five-membered ribose ring is replaced with a six-membered morpholino ring containing four carbons, one nitrogen, and one oxygen. In some cases, the monomers are linked by a phosphorodiamidate group instead of a phosphate group. In such cases, the backbone alterations remove all positive and negative charges, making morpholinos neutral molecules capable of crossing cellular membranes without the aid of the cellular delivery agents used for charged oligonucleotides.

[0151] In some embodiments, a peptide nucleic acid (PNA) does not contain a sugar ring or a phosphate linkage; instead, the bases are attached and appropriately spaced by oligoglycine-like molecules, thereby eliminating backbone charge.

[0152] In some embodiments, one or more modifications optionally occur at the internucleotide linkage. In some instances, modified internucleotide linkages include, but are not limited to, phosphorothioates; phosphorodithioates; methylphosphonates; 5'-alkylenephosphonates; 5'-methylphosphonate; 3'-alkylene phosphonates; borontrifluoridates; borano phosphate esters and selenophosphates of 3'-5' linkage or 2'-5' linkage; phosphotriesters; thionoalkylphosphotriesters; hydrogen phosphonate linkages; alkyl phosphonates; alkylphosphonothioates; arylphosphonothioates; phosphoroselenoates; phosphorodiselenoates; phosphinates; phosphoramidates; 3'-alkylphosphoramidates; aminoalkylphosphoramidates; thionophosphoramidates; phosphoropiperazidates; phosphoroanilothioates; phosphoroanilidates; ketones; sulfones; sulfonamides; carbonates; carbamates; methylenehydrazos; methylenedimethylhydrazos; formacetals; thioformacetals; oximes; methyleneiminos; methylenemethyliminos; thioamidates; linkages with riboacetyl groups; aminoethyl glycine; silyl or siloxane linkages; alkyl or cycloalkyl linkages with or without heteroatoms of, for example, 1 to 10 carbons that are saturated or unsaturated and/or substituted and/or contain heteroatoms; linkages with morpholino structures, amides, or polyamides wherein the bases are attached to the aza nitrogens of the backbone directly or indirectly; and combinations thereof.

[0153] In some embodiments, one or more modifications comprise a modified phosphate backbone in which the modification generates a neutral or uncharged backbone. In some instances, the phosphate backbone is modified by alkylation to generate an uncharged or neutral phosphate backbone. As used herein, alkylation includes methylation, ethylation, and propylation. In some cases, an alkyl group, as used herein in the context of alkylation, refers to a linear or branched saturated hydrocarbon group containing from 1 to 6 carbon atoms. In some instances, exemplary alkyl groups include, but are not limited to, methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, sec-butyl, tert-butyl, n-pentyl, isopentyl, neopentyl, hexyl, isohexyl, 1,1-dimethylbutyl, 2,2-dimethylbutyl, 3,3-dimethylbutyl, and 2-ethylbutyl groups. In some cases, a modified phosphate is a phosphate group as described in U.S. Pat. No. 9,481,905.

[0154] In some embodiments, additional modified phosphate backbones comprise methylphosphonate, ethylphosphonate, methylthiophosphonate, or methoxyphosphonate. In some cases, the modified phosphate is methylphosphonate. In some cases, the modified phosphate is ethylphosphonate. In some cases, the modified phosphate is methylthiophosphonate. In some cases, the modified phosphate is methoxyphosphonate.
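The neutral-backbone rationale in paragraphs [0150] and [0153]-[0154] can be made concrete with simple charge bookkeeping. The Python sketch below is illustrative only and rests on stated assumptions: each phosphodiester or phosphorothioate linkage is taken to carry about one negative charge at physiological pH, methylphosphonate, ethylphosphonate, and phosphorodiamidate (morpholino) linkages are taken as uncharged, and terminal groups and bases are ignored. The names `LINKAGE_CHARGE`, `backbone_charge`, and `uniform_backbone` are hypothetical.

```python
# Illustrative only: approximate formal charge per internucleotide linkage at
# physiological pH. Terminal phosphates and nucleobase charges are ignored.
LINKAGE_CHARGE = {
    "phosphodiester": -1,      # natural DNA/RNA backbone
    "phosphorothioate": -1,    # S replaces a non-bridging O; charge retained
    "methylphosphonate": 0,    # P-CH3 replaces the charged oxygen
    "ethylphosphonate": 0,     # P-C2H5, likewise uncharged
    "phosphorodiamidate": 0,   # morpholino (PMO) backbone
}


def backbone_charge(linkages: list[str]) -> int:
    """Sum the formal charges of the n - 1 linkages in an n-mer."""
    try:
        return sum(LINKAGE_CHARGE[name] for name in linkages)
    except KeyError as err:
        raise ValueError(f"unknown linkage type: {err.args[0]}") from None


def uniform_backbone(length: int, linkage: str) -> list[str]:
    """Hypothetical helper: an n-mer using one linkage type throughout."""
    if length < 2:
        raise ValueError("an oligonucleotide needs at least two nucleotides")
    return [linkage] * (length - 1)


if __name__ == "__main__":
    print(backbone_charge(uniform_backbone(20, "phosphodiester")))     # -19
    print(backbone_charge(uniform_backbone(20, "methylphosphonate")))  # 0
    # Mixed backbone: 5 neutral linkages followed by 14 charged ones.
    mixed = ["methylphosphonate"] * 5 + ["phosphorothioate"] * 14
    print(backbone_charge(mixed))                                      # -14
```

Under these assumptions a 20-mer with an all-phosphodiester backbone sums to -19, while an all-methylphosphonate backbone sums to 0, which is the sense in which such analogues are described as neutral.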

[0155] In some embodiments, one or more modifications further optionally include modifications of the ribose moiety, the phosphate backbone, and the nucleoside, or modifications of the nucleotide analogues at the 3' or the 5' terminus. For example, the 3' terminus optionally includes a 3' cationic group, or the nucleoside at the 3'-terminus is inverted with a 3'-3' linkage. In another alternative, the 3'-terminus is optionally conjugated with an aminoalkyl group, e.g., a 3' C5-aminoalkyl dT. In an additional alternative, the 3'-terminus is optionally conjugated with an abasic site, e.g., with an apurinic or apyrimidinic site. In some instances, the 5'-terminus is conjugated with an aminoalkyl group, e.g., a 5'-O-alkylamino substituent. In some cases, the 5'-terminus is conjugated with an abasic site, e.g., with an apurinic or apyrimidinic site.

[0156] In some embodiments, exemplary nucleic acid cargos include, but are not limited to, Fomivirsen, Mipomersen, AZD5312 (AstraZeneca), Nusinersen, and SB010 (Sterna Biologicals).

Small Molecules

[0157] In some embodiments, the cargo is a small molecule. In some instances, the small molecule is an inhibitor (e.g., a pan inhibitor or a selective inhibitor). In other instances, the small molecule is an activator. In additional cases, the small molecule is an agonist, an antagonist, a partial agonist, a mixed agonist/antagonist, or a competitive antagonist.

[0158] In some embodiments, the small molecule is a drug that falls under the class of analgesics, antianxiety drugs, antiarrhythmics, antibacterials, antibiotics, anticoagulants and thrombolytics, anticonvulsants, antidepressants, antidiarrheals, antiemetics, antifungals, antihistamines, antihypertensives, anti-inflammatories, antineoplastics, antipsychotics, antipyretics, antivirals, barbiturates, beta-blockers, bronchodilators, cold cures, corticosteroids, cough suppressants, cytotoxics, decongestants, diuretics, expectorants, hormones, hypoglycemics, immunosuppressives, laxatives, muscle relaxants, sex hormones, sleeping drugs, or tranquilizers.

[0159] In some embodiments, the small molecule is an inhibitor, e.g., an inhibitor of a kinase pathway such as the tyrosine kinase pathway or a serine/threonine kinase pathway. In some cases, the small molecule is a dual protein kinase inhibitor. In some cases, the small molecule is a lipid kinase inhibitor.

[0160] In some cases, the small molecule is a neuraminidase inhibitor.

[0161] In some cases, the small molecule is a carbonic anhydrase inhibitor.

[0162] In some embodiments, exemplary targets of the small molecule include, but are not limited to, vascular endothelial growth factor receptor 1 (VEGFR1), vascular endothelial growth factor receptor 2 (VEGFR2), vascular endothelial growth factor receptor 3 (VEGFR3), fibroblast growth factor receptor 1 (FGFR1), fibroblast growth factor receptor 2 (FGFR2), fibroblast growth factor receptor 3 (FGFR3), fibroblast growth factor receptor 4 (FGFR4), cyclin-dependent kinase 4 (CDK4), cyclin-dependent kinase 6 (CDK6), a receptor tyrosine kinase, a phosphoinositide 3-kinase (PI3K) isoform (e.g., PI3Kδ, also known as p110δ), Janus kinase 1 (JAK1), Janus kinase 3 (JAK3), a receptor from the family of platelet-derived growth factor receptors (PDGFR), and carbonic anhydrase (e.g., carbonic anhydrase I).

[0163] In some embodiments, the small molecule targets a viral protein, e.g., a viral envelope protein. In some embodiments, the small molecule decreases viral adsorption to a host cell. In some embodiments, the small molecule decreases viral entry into a host cell. In some embodiments, the small molecule decreases viral replication in a host or a host cell. In some embodiments, the small molecule decreases viral assembly.

[0164] In some embodiments, exemplary small molecule cargos include, but are not limited to, lenvatinib, palbociclib, regorafenib, idelalisib, tofacitinib, nintedanib, zanamivir, ethoxzolamide, and artemisinin.

Proteins

[0165] In some embodiments, the cargo is a protein. In some instances, the protein is a full-length protein. In other instances, the protein is a fragment, e.g., a functional fragment. In some cases, the protein is a naturally occurring protein. In additional cases, the protein is a de novo engineered protein. In further cases, the protein is a fusion protein. In further cases, the protein is a recombinant protein. Exemplary proteins include, but are not limited to, Fc fusion proteins, anticoagulants, blood factors, bone morphogenetic proteins, enzymes, growth factors, hormones, interferons, interleukins, and thrombolytics.

[0166] In some instances, the protein is for use in an enzyme replacement therapy.

[0167] In some cases, the protein is for use in antigen production for therapeutic and/or prophylactic vaccine production. For example, the protein comprises an antigen that elicits a desirable immune response (e.g., a pro-inflammatory immune response, an anti-inflammatory immune response, a B cell response, an antibody response, a T cell response, a CD4+ T cell response, a CD8+ T cell response, a Th1 immune response, a Th2 immune response, a Th17 immune response, a Treg immune response, or a combination thereof).

[0168] In some instances, exemplary protein cargos include, but are not limited to, romiplostim, liraglutide, recombinant human growth hormone (rHGH), biosynthetic human insulin (BHI), follicle-stimulating hormone (FSH), Factor VIII, erythropoietin (EPO), granulocyte colony-stimulating factor (G-CSF), alpha-galactosidase A, alpha-L-iduronidase, N-acetylgalactosamine-4-sulfatase, dornase alfa, tissue plasminogen activator (TPA), glucocerebrosidase, interferon-beta-1a, insulin-like growth factor 1 (IGF-1), or rasburicase.

Peptides

[0169] In some embodiments, the cargo is a peptide. In some instances, the peptide is a naturally occurring peptide. In other instances, the peptide is an artificially engineered peptide or a recombinant peptide. In some cases, the peptide targets a G-protein coupled receptor, an ion channel, a microbe, an anti-microbial target, a catalytic receptor or other Ig-family receptor, an intracellular target, a membrane-anchored target, or an extracellular target.

[0170] In some cases, the peptide comprises at least 2 amino acids. In some cases, the peptide comprises at least 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 amino acids. In some cases, the peptide comprises at least 10 amino acids. In some cases, the peptide comprises at least 15 amino acids. In some cases, the peptide comprises at least 20 amino acids. In some cases, the peptide comprises at least 30 amino acids. In some cases, the peptide comprises at least 40 amino acids. In some cases, the peptide comprises at least 50 amino acids. In some cases, the peptide comprises at least 60 amino acids. In some cases, the peptide comprises at least 70 amino acids. In some cases, the peptide comprises at least 80 amino acids. In some cases, the peptide comprises at least 90 amino acids. In some cases, the peptide comprises at least 100 amino acids.

[0171] In some cases, the peptide comprises at most 3 amino acids. In some cases, the peptide comprises at most 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 amino acids. In some cases, the peptide comprises at most 10 amino acids. In some cases, the peptide comprises at most 15 amino acids. In some cases, the peptide comprises at most 20 amino acids. In some cases, the peptide comprises at most 30 amino acids. In some cases, the peptide comprises at most 40 amino acids. In some cases, the peptide comprises at most 50 amino acids. In some cases, the peptide comprises at most 60 amino acids. In some cases, the peptide comprises at most 70 amino acids. In some cases, the peptide comprises at most 80 amino acids. In some cases, the peptide comprises at most 90 amino acids. In some cases, the peptide comprises at most 100 amino acids.

[0172] In some cases, the peptide has a molecular weight from about 1 to about 10 kDa. In some cases, the peptide has a molecular weight from about 1 to about 9 kDa, about 1 to about 6 kDa, about 1 to about 5 kDa, about 1 to about 4 kDa, about 1 to about 3 kDa, about 2 to about 8 kDa, about 2 to about 6 kDa, about 2 to about 4 kDa, about 1.2 to about 2.8 kDa, about 1.5 to about 2.5 kDa, or about 1.5 to about 2 kDa.
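As a rough cross-check between the length limits in [0170]-[0171] and the mass ranges in [0172], the average mass of a linear peptide can be estimated from its sequence. This Python sketch assumes an unmodified linear peptide built from the 20 standard amino acids, sums standard average residue masses, and adds one water for the free termini; terminal and post-translational modifications are ignored. The names `linear_peptide_mass_kda` and `in_range` are hypothetical.

```python
# Average (not monoisotopic) residue masses in daltons for the 20 standard
# amino acids, i.e., free amino acid mass minus one water.
RESIDUE_MASS_DA = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_DA = 18.02  # added once for the free N- and C-termini of a linear chain


def linear_peptide_mass_kda(sequence: str) -> float:
    """Approximate average mass of an unmodified linear peptide, in kDa."""
    seq = sequence.upper()
    unknown = set(seq) - set(RESIDUE_MASS_DA)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return (sum(RESIDUE_MASS_DA[aa] for aa in seq) + WATER_DA) / 1000.0


def in_range(sequence: str, lo_kda: float = 1.0, hi_kda: float = 10.0) -> bool:
    """True if the estimated mass lies within [lo_kda, hi_kda] (default 1-10 kDa)."""
    return lo_kda <= linear_peptide_mass_kda(sequence) <= hi_kda


if __name__ == "__main__":
    print(round(linear_peptide_mass_kda("A" * 20), 3))  # 1.44
    print(in_range("A" * 20))   # True
    print(in_range("G" * 10))   # False (about 0.59 kDa)
```

With these values a 20-residue polyalanine comes to about 1.44 kDa, inside the 1 to 10 kDa range, whereas a 10-residue polyglycine (about 0.59 kDa) falls below it; for quick estimates, roughly 110 Da per residue is a common rule of thumb.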

[0173] In some embodiments, the peptide is a cyclic peptide. In some instances, the cyclic peptide is a macrocyclic peptide. In other instances, the cyclic peptide is a constrained peptide. The cyclic peptides are assembled with varied linkages, such as head-to-tail, head-to-side-chain, side-chain-to-tail, and side-chain-to-side-chain linkages. In some instances, a cyclic peptide (e.g., a macrocyclic or a constrained peptide) has a molecular weight from about 500 Da to about 2000 Da. In other instances, a cyclic peptide (e.g., a macrocyclic or a constrained peptide) ranges from about 10 amino acids to about 100 amino acids, from about 10 amino acids to about 70 amino acids, or from about 10 amino acids to about 50 amino acids.
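The linkage types listed in [0173] change the mass only slightly relative to the linear precursor. The following Python sketch is an approximation under stated assumptions: it subtracts one water (about 18.02 Da) for amide-type closures (a head-to-tail backbone amide, or a lactam bridge used for head-to-side-chain, side-chain-to-tail, or side-chain-to-side-chain linkages) and two hydrogen atoms (about 2.016 Da) for a cysteine disulfide, then checks the result against the 500 to 2000 Da window. Residue masses are standard average values; the function names and the `closure` keys are hypothetical.

```python
# Average residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS_DA = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_DA = 18.02

# Assumed mass lost when the ring is closed.
CLOSURE_LOSS_DA = {
    "head_to_tail": WATER_DA,  # backbone amide bond forms, releasing one water
    "lactam": WATER_DA,        # side-chain amide (e.g., Lys to Asp/Glu), one water
    "disulfide": 2.016,        # Cys-Cys bridge, two hydrogen atoms lost
}


def cyclic_peptide_mass_da(sequence: str, closure: str = "head_to_tail") -> float:
    """Approximate average mass (Da) of a peptide cyclized by one closure."""
    seq = sequence.upper()
    unknown = set(seq) - set(RESIDUE_MASS_DA)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    if closure not in CLOSURE_LOSS_DA:
        raise ValueError(f"unknown closure type: {closure}")
    if closure == "disulfide" and seq.count("C") < 2:
        raise ValueError("a disulfide closure needs at least two cysteines")
    linear = sum(RESIDUE_MASS_DA[aa] for aa in seq) + WATER_DA
    return linear - CLOSURE_LOSS_DA[closure]


def within_cyclic_window(sequence: str, closure: str = "head_to_tail",
                         lo_da: float = 500.0, hi_da: float = 2000.0) -> bool:
    """True if the estimated cyclic mass lies within [lo_da, hi_da]."""
    return lo_da <= cyclic_peptide_mass_da(sequence, closure) <= hi_da


if __name__ == "__main__":
    print(round(cyclic_peptide_mass_da("A" * 10), 2))             # 710.8
    print(round(cyclic_peptide_mass_da("CAAC", "disulfide"), 3))  # 364.444
    print(within_cyclic_window("A" * 30))                         # False
```

Because the correction is at most one water per closure, the 500 to 2000 Da window corresponds roughly to 5 to 18 residues of average composition, so longer cyclic peptides within the stated 10 to 100 amino acid ranges would exceed it.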

[0174] In some cases, the peptide is for use in antigen production for therapeutic and/or prophylactic vaccine production. For example, the peptide comprises an antigen that elicits a desirable immune response (e.g., a pro-inflammatory immune response, an anti-inflammatory immune response, a B cell response, an antibody response, a T cell response, a CD4+ T cell response, a CD8+ T cell response, a Th1 immune response, a Th2 immune response, a Th17 immune response, a Treg immune response, or a combination thereof).

[0175] In some embodiments, the peptide comprises natural amino acids, unnatural amino acids, or a combination thereof. In some instances, an amino acid residue refers to a molecule containing both an amino group and a carboxyl group. Suitable amino acids include, without limitation, both the D- and L-isomers of the naturally occurring amino acids, as well as non-naturally occurring amino acids prepared by organic synthesis or other metabolic routes. The term amino acid, as used herein, includes, without limitation, α-amino acids, natural amino acids, non-natural amino acids, and amino acid analogs.

[0176] In some instances, α-amino acid refers to a molecule containing both an amino group and a carboxyl group bound to a carbon which is designated the α-carbon.

[0177] In some instances, β-amino acid refers to a molecule containing both an amino group and a carboxyl group in a β configuration.

[0178] In some embodiments, an amino acid analog is a racemic mixture. In some instances, the D isomer of the amino acid analog is used. In some cases, the L isomer of the amino acid analog is used. In some instances, the amino acid analog comprises chiral centers that are in the R or S configuration.

[0179] In some embodiments, exemplary peptide cargos include, but are not limited to, peginesatide, insulin, adrenocorticotropic hormone (ACTH), calcitonin, oxytocin, vasopressin, octreotide, and leuprorelin.

[0180] In some embodiments, exemplary peptide cargos include, but are not limited to, Telavancin, Dalbavancin, Oritavancin, Anidulafungin, Lanreotide, Pasireotide, Romidepsin, Linaclotide, and Peginesatide.

Antibodies

[0181] In some embodiments, the cargo is an antibody or a binding fragment thereof. In some instances, the antibody or binding fragment thereof comprises a humanized antibody or binding fragment thereof, murine antibody or binding fragment thereof, chimeric antibody or binding fragment thereof, monoclonal antibody or binding fragment thereof, bispecific antibody or binding fragment thereof, monovalent Fab', divalent F(ab')2, F(ab')3 fragments, single-chain variable fragment (scFv), bis-scFv, (scFv)2, diabody, minibody, nanobody, triabody, tetrabody, disulfide-stabilized Fv protein (dsFv), single-domain antibody (sdAb), IgNAR, camelid antibody or binding fragment thereof, or a chemically modified derivative thereof.

[0182] In some instances, the antibody or binding fragment thereof recognizes a cell surface protein. In some instances, the cell surface protein is an antigen expressed by a cancerous cell. In some instances, the cell surface protein is a neoepitope. In some instances, the cell surface protein comprises one or more mutations compared to a wild-type protein. Exemplary cancer antigens include, but are not limited to, alpha fetoprotein, ASLG659, B7-H3, BAFF-R, Brevican, CA125 (MUC16), CA15-3, CA19-9, carcinoembryonic antigen (CEA), CA242, CRIPTO (CR, CR1, CRGF, CRIPTO, TDGF1, teratocarcinoma-derived growth factor), CTLA-4, CXCR5, E16 (LAT1, SLC7A5), FcRH2 (IFGP4, IRTA4, SPAP1A (SH2 domain containing phosphatase anchor protein 1a), SPAP1B, SPAP1C), epidermal growth factor, ETBR, Fc receptor-like protein 1 (FCRH1), GEDA, HLA-DOB (beta subunit of MHC class II molecule (Ia antigen)), human chorionic gonadotropin, ICOS, IL-2 receptor, IL20Rα, immunoglobulin superfamily receptor translocation associated 2 (IRTA2), L6, Lewis Y, Lewis X, MAGE-1, MAGE-2, MAGE-3, MAGE-4, MART1, mesothelin, MDP, MPF (SMR, MSLN), MCP1 (CCL2), macrophage migration inhibitory factor (MIF), MPG, MSG783, mucin, MUC1-KLH, Napi3b (SLC34A2), nectin-4, Neu oncogene product, NCA, placental alkaline phosphatase, prostate-specific membrane antigen (PSMA), prostatic acid phosphatase, PSCA hlg, transferrin receptor, p97, purinergic receptor P2X ligand-gated ion channel 5 (P2X5), LY64 (lymphocyte antigen 64 (RP105)), gp100, P21, six transmembrane epithelial antigen of prostate (STEAP1), STEAP2, Sema 5b, tumor-associated glycoprotein 72 (TAG-72), TrpM4 (BR22450, FLJ20041, TRPM4, TRPM4B, transient receptor potential cation channel, subfamily M, member 4), and the like.

[0183] In some instances, the cell surface protein comprises clusters of differentiation (CD) cell surface markers. Exemplary CD cell surface markers include, but are not limited to, CD1, CD2, CD3, CD4, CD5, CD6, CD7, CD8, CD9, CD10, CD11a, CD11b, CD11c, CD11d, CDw12, CD13, CD14, CD15, CD15s, CD16, CDw17, CD18, CD19, CD20, CD21, CD22, CD23, CD24, CD25, CD26, CD27, CD28, CD29, CD30, CD31, CD32, CD33, CD34, CD35, CD36, CD37, CD38, CD39, CD40, CD41, CD42, CD43, CD44, CD45, CD45RO, CD45RA, CD45RB, CD46, CD47, CD48, CD49a, CD49b, CD49c, CD49d, CD49e, CD49f, CD50, CD51, CD52, CD53, CD54, CD55, CD56, CD57, CD58, CD59, CDw60, CD61, CD62E, CD62L (L-selectin), CD62P, CD63, CD64, CD65, CD66a, CD66b, CD66c, CD66d, CD66e, CD71, CD79 (e.g., CD79a, CD79b), CD90, CD95 (Fas), CD103, CD104, CD125 (IL5RA), CD134 (OX40), CD137 (4-1BB), CD152 (CTLA-4), CD221, CD274, CD279 (PD-1), CD319 (SLAMF7), CD326 (EpCAM), and the like.

[0184] In some embodiments, exemplary antibodies or binding fragments thereof include, but are not limited to, zalutumumab (HuMax-EGFr, Genmab), abagovomab (Menarini), abituzumab (Merck), adecatumumab (MT201), alacizumab pegol, alemtuzumab (Campath®, MabCampath, or Campath-1H; Leukosite), AlloMune (BioTransplant), amatuximab (Morphotek, Inc.), anti-VEGF (Genentech), anatumomab mafenatox, apolizumab (Hu1D10), ascrinvacumab (Pfizer Inc.), atezolizumab (MPDL3280A; Genentech/Roche), B43.13 (OvaRex, AltaRex Corporation), basiliximab (Simulect®, Novartis), belimumab (Benlysta®, GlaxoSmithKline), bevacizumab (Avastin®, Genentech), blinatumomab (Blincyto, AMG103; Amgen), BEC2 (ImClone Systems Inc.), carlumab (Janssen Biotech), catumaxomab (Removab, Trion Pharma), CEAcide (Immunomedics), cetuximab (Erbitux®, ImClone), citatuzumab bogatox (VB6-845), cixutumumab (IMC-A12, ImClone Systems Inc.), conatumumab (AMG 655, Amgen), dacetuzumab (SGN-40, huS2C6; Seattle Genetics, Inc.), daratumumab (Darzalex®, Janssen Biotech), detumomab, drozitumab (Genentech), durvalumab (MedImmune), dusigitumab (MedImmune), edrecolomab (MAb17-1A, Panorex, Glaxo Wellcome), elotuzumab (Empliciti™, Bristol-Myers Squibb), emibetuzumab (Eli Lilly), enavatuzumab (Facet Biotech Corp.), enfortumab vedotin (Seattle Genetics, Inc.), enoblituzumab (MGA271, MacroGenics, Inc.), ensituxumab (Neogenix Oncology, Inc.), epratuzumab (LymphoCide, Immunomedics, Inc.), ertumaxomab (Rexomun®, Trion Pharma), etaracizumab (Abegrin, MedImmune), farletuzumab (MORAb-003, Morphotek, Inc.), FBTA05 (Lymphomun, Trion Pharma), ficlatuzumab (AVEO Pharmaceuticals), figitumumab (CP-751871, Pfizer), flanvotumab (ImClone Systems), fresolimumab (GC1008, Sanofi-Aventis), futuximab, glaximab, ganitumab (Amgen), girentuximab (Rencarex®, Wilex AG), IMAB362 (Claudiximab, Ganymed Pharmaceuticals AG), imalumab (Baxalta), IMC-1C11 (ImClone Systems), IMC-C225 (ImClone Systems Inc.), imgatuzumab (Genentech/Roche), intetumumab (Centocor, Inc.), ipilimumab (Yervoy®, Bristol-Myers Squibb), iratumumab (Medarex, Inc.), isatuximab (SAR650984, Sanofi-Aventis), labetuzumab (CEA-CIDE, Immunomedics), lexatumumab (ETR2-ST01, Cambridge Antibody Technology), lintuzumab (SGN-33, Seattle Genetics), lucatumumab (Novartis), lumiliximab, mapatumumab (HGS-ETR1, Human Genome Sciences), matuzumab (EMD 72000, Merck), milatuzumab (hLL1, Immunomedics, Inc.), mitumomab (BEC-2, ImClone Systems), narnatumab (ImClone Systems), necitumumab (Portrazza™, Eli Lilly), nesvacumab (Regeneron Pharmaceuticals), nimotuzumab (h-R3, BIOMAb EGFR, TheraCIM, Theraloc, or CIMAher; Biotech Pharmaceutical Co.), nivolumab (Opdivo®, Bristol-Myers Squibb), obinutuzumab (Gazyva or Gazyvaro; Hoffmann-La Roche), ocaratuzumab (AME-133v, LY2469298; Mentrik Biotech, LLC), ofatumumab (Arzerra®, Genmab), onartuzumab (Genentech), ontuxizumab (Morphotek, Inc.), oregovomab (OvaRex®, AltaRex Corp.), otlertuzumab (Emergent BioSolutions), panitumumab (ABX-EGF, Amgen), pankomab (Glycotope GmbH), parsatuzumab (Genentech), patritumab, pembrolizumab (Keytruda®, Merck), pemtumomab (Theragyn, Antisoma), pertuzumab (Perjeta, Genentech), pidilizumab (CT-011, Medivation), polatuzumab vedotin (Genentech/Roche), pritumumab, racotumomab (Vaxira®, Recombio), ramucirumab (Cyramza®, ImClone Systems Inc.), rituximab (Rituxan®, Genentech), robatumumab (Schering-Plough), seribantumab (Sanofi/Merrimack Pharmaceuticals, Inc.), sibrotuzumab, siltuximab (Sylvant™, Janssen Biotech), Smart M195 (Protein Design Labs, Inc.), Smart 1D10 (Protein Design Labs, Inc.), tabalumab (LY2127399, Eli Lilly), taplitumomab paptox, tenatumomab, teprotumumab (Roche), tetulomab, TGN1412 (CD28-SuperMAB or TAB08), tigatuzumab (CS-1008, Daiichi Sankyo), tositumomab, trastuzumab (Herceptin®), tremelimumab (CP-675,206; Pfizer), tucotuzumab celmoleukin (EMD Pharmaceuticals), ublituximab, urelumab (BMS-663513, Bristol-Myers Squibb), volociximab (M200, Biogen Idec), and zatuximab.

[0185] In some instances, the antibody or binding fragment thereof is an antibody-drug conjugate (ADC). In some cases, the payload of the ADC comprises, for example, but is not limited to, an auristatin derivative, maytansine, a maytansinoid, a taxane, a calicheamicin, cemadotin, a duocarmycin, a pyrrolobenzodiazepine (PBD), or a tubulysin. In some instances, the payload comprises monomethyl auristatin E (MMAE) or monomethyl auristatin F (MMAF). In some instances, the payload comprises DM1 (mertansine) or DM4. In some instances, the payload comprises a pyrrolobenzodiazepine dimer.

Additional Cargos

[0186] In some embodiments, the cargo is a peptidomimetic. A peptidomimetic is a small protein-like polymer designed to mimic a peptide. In some instances, the peptidomimetic comprises D-peptides. In other instances, the peptidomimetic comprises L-peptides. Exemplary peptidomimetics include peptoids and β-peptides.

[0187] In some embodiments, the cargo is a nucleotidomimetic.

Vectors and Expression Systems

[0188] In certain embodiments, the Arc polypeptides, endo-Gag polypeptides, engineered Arc polypeptides, and engineered endo-Gag polypeptides described supra are encoded by plasmid vectors. In some embodiments, vectors include any suitable vectors derived from either eukaryotic or prokaryotic sources. In some cases, vectors are obtained from bacteria (e.g., E. coli), insects, yeast (e.g., Pichia pastoris), algae, or mammalian sources.

[0189] Exemplary bacterial vectors include pACYC177, pASK75, pBAD vector series, pBADM vector series, pET vector series, pETM vector series, pGEX vector series, pHAT, pHAT2, pMal-c2, pMal-p2, pQE vector series, pRSET A, pRSET B, pRSET C, pTrcHis2 series, pZA31-Luc, pZE21-MCS-1, pFLAG ATS, pFLAG CTS, pFLAG MAC, pFLAG Shift-12c, pTAC-MAT-1, pFLAG CTC, or pTAC-MAT-2.

[0190] Exemplary insect vectors include pFastBac1, pFastBac DUAL, pFastBac ET, pFastBac HTa, pFastBac HTb, pFastBac HTc, pFastBac M30a, pFastBac M30b, pFastBac M30c, pVL1392, pVL1393, pVL1393 M10, pVL1393 M11, pVL1393 M12, FLAG vectors such as pPolh-FLAG1 or pPolh-FLAG2, or MAT vectors such as pPolh-MAT1 or pPolh-MAT2.

[0191] In some cases, yeast vectors include Gateway® pDEST™ 14 vector, Gateway® pDEST™ 15 vector, Gateway® pDEST™ 17 vector, Gateway® pDEST™ 24 vector, Gateway® pYES-DEST52 vector, pBAD-DEST49 Gateway® destination vector, pAO815 Pichia vector, pFLD1 Pichia pastoris vector, pGAPZ A, B, & C Pichia pastoris vector, pPIC3.5K Pichia vector, pPIC6 A, B, & C Pichia vector, pPIC9K Pichia vector, pTEF1/Zeo, pYES2 yeast vector, pYES2/CT yeast vector, pYES2/NT A, B, & C yeast vector, or pYES3/CT yeast vector.

[0192] Exemplary algae vectors include pChlamy-4 vector or MCS vector.

[0193] Examples of mammalian vectors include transient expression vectors or stable expression vectors. Mammalian transient expression vectors include p3xFLAG-CMV 8, pFLAG-Myc-CMV 19, pFLAG-Myc-CMV 23, pFLAG-CMV 2, pFLAG-CMV 6a,b,c, pFLAG-CMV 5.1, pFLAG-CMV 5a,b,c, p3xFLAG-CMV 7.1, pFLAG-CMV 20, p3xFLAG-Myc-CMV 24, pCMV-FLAG-MAT1, pCMV-FLAG-MAT2, pBICEP-CMV 3, or pBICEP-CMV 4. Mammalian stable expression vectors include pFLAG-CMV 3, p3xFLAG-CMV 9, p3xFLAG-CMV 13, pFLAG-Myc-CMV 21, p3xFLAG-Myc-CMV 25, pFLAG-CMV 4, p3xFLAG-CMV 10, p3xFLAG-CMV 14, pFLAG-Myc-CMV 22, p3xFLAG-Myc-CMV 26, pBICEP-CMV 1, or pBICEP-CMV 2.

[0194] In some instances, a cell-free system is a mixture of cytoplasmic and/or nuclear components from a cell and is used for in vitro nucleic acid synthesis. In some cases, a cell-free system utilizes either prokaryotic cell components or eukaryotic cell components. In some cases, nucleic acid synthesis is carried out in a cell-free system based on, for example, Drosophila cells, Xenopus eggs, or HeLa cells (ATCC® CCL-2™). Exemplary cell-free systems include, but are not limited to, the E. coli S30 Extract System, the E. coli T7 S30 system, or PURExpress®.

Host Cells

[0195] In some embodiments, a host cell includes any suitable cell such as a naturally derived cell or a genetically modified cell. In some instances, a host cell is a production host cell. In some instances, a host cell is a eukaryotic cell. In other instances, a host cell is a prokaryotic cell. In some cases, a eukaryotic cell includes fungi (e.g., a yeast cell), an animal cell, or a plant cell. In some cases, a prokaryotic cell is a bacterial cell. Examples of bacterial cells include gram-positive bacteria or gram-negative bacteria. In some embodiments, the gram-negative bacterium is anaerobic, rod-shaped, or both.

[0196] In some instances, gram-positive bacteria include Actinobacteria, Firmicutes, or Tenericutes. In some cases, gram-negative bacteria include Aquificae, Deinococcus-Thermus, Fibrobacteres-Chlorobi/Bacteroidetes (FCB group), Fusobacteria, Gemmatimonadetes, Nitrospirae, Planctomycetes-Verrucomicrobia/Chlamydiae (PVC group), Proteobacteria, Spirochaetes, or Synergistetes. In some embodiments, the bacterium belongs to Acidobacteria, Chloroflexi, Chrysiogenetes, Cyanobacteria, Deferribacteres, Dictyoglomi, Thermodesulfobacteria, or Thermotogae. In some embodiments, a bacterial cell is Escherichia coli, Clostridium botulinum, or Coli bacilli.

[0197] Exemplary prokaryotic host cells include, but are not limited to, BL21, Mach1™, DH10B™, TOP10, DH5α, DH10Bac™, OmniMax™, MegaX™, DH12S™, INV110, TOP10F', INVαF, TOP10/P3, ccdB Survival, PIR1, PIR2, Stbl2™, Stbl3™, or Stbl4™.

[0198] In some instances, animal cells include a cell from a vertebrate or from an invertebrate. In some cases, an animal cell includes a cell from a marine invertebrate, fish, insect, amphibian, reptile, mammal, or human. In some cases, a fungus cell includes a yeast cell, such as brewer's yeast, baker's yeast, or wine yeast.

[0199] Fungi include ascomycetes (such as yeast, mold, or filamentous fungi), basidiomycetes, or zygomycetes. In some instances, yeast includes Ascomycota or Basidiomycota. In some cases, Ascomycota includes Saccharomycotina (true yeasts, e.g., Saccharomyces cerevisiae (baker's yeast)) or Taphrinomycotina (e.g., Schizosaccharomycetes (fission yeasts)). In some cases, Basidiomycota includes Agaricomycotina (e.g., Tremellomycetes) or Pucciniomycotina (e.g., Microbotryomycetes).

[0200] Exemplary yeast or filamentous fungi include, for example, the genera Saccharomyces, Schizosaccharomyces, Candida, Pichia, Hansenula, Kluyveromyces, Zygosaccharomyces, Yarrowia, Trichosporon, Rhodosporidium, Aspergillus, Fusarium, or Trichoderma. Exemplary yeast or filamentous fungi include, for example, the species Saccharomyces cerevisiae, Schizosaccharomyces pombe, Candida utilis, Candida boidinii, Candida albicans, Candida tropicalis, Candida stellatoidea, Candida glabrata, Candida krusei, Candida parapsilosis, Candida guilliermondii, Candida viswanathii, Candida lusitaniae, Rhodotorula mucilaginosa, Pichia methanolica, Pichia angusta, Pichia pastoris, Pichia anomala, Hansenula polymorpha, Kluyveromyces lactis, Zygosaccharomyces rouxii, Yarrowia lipolytica, Trichosporon pullulans, Rhodosporidium toruloides, Aspergillus niger, Aspergillus nidulans, Aspergillus awamori, Aspergillus oryzae, Trichoderma reesei, Brettanomyces bruxellensis, Candida stellata, Torulaspora delbrueckii, Zygosaccharomyces bailii, Cryptococcus neoformans, Cryptococcus gattii, or Saccharomyces boulardii.

[0201] Exemplary yeast host cells include, but are not limited to, Pichia pastoris yeast strains such as GS115, KM71H, SMD1168, SMD1168H, and X-33; and Saccharomyces cerevisiae yeast strains such as INVSc1.

[0202] In some instances, additional animal cells include cells obtained from a mollusk, arthropod, annelid, or sponge. In some cases, an additional animal cell is a mammalian cell, e.g., from a human, primate, ape, equine, bovine, porcine, canine, feline, or rodent. In some cases, a rodent includes mouse, rat, hamster, gerbil, chinchilla, fancy rat, or guinea pig.

[0203] Exemplary mammalian host cells include, but are not limited to, 293A cell line, 293FT cell line, 293F cells, 293-H cells, CHO DG44 cells, CHO-S cells, CHO-K1 cells, Expi293F™ cells, Flp-In™ T-REx™ 293 cell line, Flp-In™-293 cell line, Flp-In™-3T3 cell line, Flp-In™-BHK cell line, Flp-In™-CHO cell line, Flp-In™-CV-1 cell line, Flp-In™-Jurkat cell line, FreeStyle™ 293-F cells, FreeStyle™ CHO-S cells, GripTite™ 293 MSR cell line, GS-CHO cell line, HepaRG™ cells, T-REx™ Jurkat cell line, PER.C6 cells, T-REx™-293 cell line, T-REx™-CHO cell line, and T-REx™-HeLa cell line.

[0204] In some instances, a mammalian host cell is a primary cell. In some instances, a mammalian host cell is a stable cell line, or a cell line that has incorporated a genetic material of interest into its own genome and has the capability to express the product of the genetic material after many generations of cell division. In some cases, a mammalian host cell is a transient cell line, or a cell line that has not incorporated a genetic material of interest into its own genome and does not have the capability to express the product of the genetic material after many generations of cell division.

[0205] Exemplary insect host cells include, but are not limited to, Drosophila S2 cells, Sf9 cells, Sf21 cells, High Five™ cells, and expresSF+® cells.

[0206] In some instances, plant cells include a cell from algae. Exemplary algal cell lines include, but are not limited to, strains such as Chlamydomonas reinhardtii 137c or Synechococcus elongatus PCC 7942.

Methods of Use

[0207] Disclosed herein, in certain embodiments, are methods of preparing a capsid which encapsulates a cargo. In some embodiments, the method comprises incubating a plurality of Arc or endo-Gag polypeptides, engineered Arc or endo-Gag polypeptides, and/or recombinant Arc or endo-Gag polypeptides with a cargo in a solution for a time sufficient to generate a loaded Arc-based capsid or endo-Gag-based capsid.

[0208] In some instances, the method comprises mixing a solution comprising a plurality of engineered and/or recombinant Arc polypeptides with a plurality of non-Arc capsid forming subunits prior to incubating with the cargo. In some cases, the plurality of non-Arc capsid forming subunits are mixed with the plurality of engineered and/or recombinant Arc polypeptides at a ratio of 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, or 10:1. In other cases, the plurality of non-Arc capsid forming subunits are mixed with the plurality of engineered and/or recombinant Arc polypeptides at a ratio of 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, or 1:10.
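To make the mixing ratios above concrete, the following minimal Python sketch divides a total amount of subunits into non-Arc and Arc portions for each ratio listed. The helper name `split_by_ratio` is hypothetical. The sketch also assumes that the ratios are molar ratios written as non-Arc:Arc, because the text does not say whether they refer to moles or to mass.

```python
def split_by_ratio(total_nmol: float, non_arc_parts: int, arc_parts: int) -> tuple[float, float]:
    """Split a total subunit amount into (non-Arc, Arc) portions.

    Hypothetical helper: assumes the ratio is written non-Arc:Arc and refers
    to molar amounts of subunits.
    """
    if non_arc_parts <= 0 or arc_parts <= 0:
        raise ValueError("ratio parts must be positive")
    total_parts = non_arc_parts + arc_parts
    non_arc = total_nmol * non_arc_parts / total_parts
    arc = total_nmol * arc_parts / total_parts
    return non_arc, arc


# Ratios listed in the text: 1:1 through 10:1, then 1:2 through 1:10.
RATIOS = [(n, 1) for n in range(1, 11)] + [(1, n) for n in range(2, 11)]

if __name__ == "__main__":
    for non_arc_parts, arc_parts in RATIOS:
        non_arc, arc = split_by_ratio(100.0, non_arc_parts, arc_parts)
        print(f"{non_arc_parts}:{arc_parts} -> non-Arc {non_arc:.1f} nmol, Arc {arc:.1f} nmol")
```

If the ratios are instead meant by mass, each portion would first have to be converted using the molecular weight of the corresponding subunit.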

[0209] In some cases, the time sufficient to generate a loaded Arc-based capsid or endo-Gag-based capsid is at least about 5 minutes, at least about 10 minutes, at least about 20 minutes, at least about 30 minutes, at least about 1 hour, at least about 2 hours, at least about 4 hours, at least about 6 hours, at least about 10 hours, at least about 12 hours, at least about 24 hours, or more.

[0210] In some cases, the Arc-based capsid or endo-Gag-based capsid is prepared at a temperature from about 2 °C to about 37 °C. In some instances, the Arc-based capsid or endo-Gag-based capsid is prepared at a temperature from about 2 °C to about 8 °C, about 2 °C to about 4 °C, about 20 °C to about 37 °C, about 25 °C to about 37 °C, about 20 °C to about 30 °C, about 25 °C to about 30 °C, or about 30 °C to about 37 °C.

[0211] In some cases, the Arc-based capsid or endo-Gag-based capsid is prepared at room temperature.

[0212] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for systemic administration.

[0213] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for local administration.

[0214] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for parenteral (e.g., intra-arterial, intra-articular, intradermal, intralesional, intramuscular, intraocular, intraosseous infusion, intraperitoneal, intrathecal, intravenous, intravitreal, or subcutaneous) administration.

[0215] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for topical administration.

[0216] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for oral administration.

[0217] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for sublingual administration.

[0218] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for aerosol administration.

[0219] In certain embodiments, also described herein is a use of an Arc-based capsid or endo-Gag-based capsid for delivery of a cargo to a site of interest. In some instances, the method comprises contacting a cell at the site of interest with an Arc-based capsid or endo-Gag-based capsid for a time sufficient to facilitate cellular uptake of the capsid.

[0220] In some cases, the cell is a muscle cell, a skin cell, a blood cell, or an immune cell (e.g., a T cell or a B cell).

[0221] In some instances, the cell is a tumor cell, e.g., a solid tumor cell or a cell from a hematologic malignancy. In some cases, the solid tumor cell is a cell from a bladder cancer, breast cancer, brain cancer, colorectal cancer, kidney cancer, liver cancer, lung cancer, pancreatic cancer, prostate cancer, skin cancer, stomach cancer, or thyroid cancer. In some cases, the cell from a hematologic malignancy is from a B-cell malignancy or a T-cell malignancy. In some cases, the cell is from a leukemia, a lymphoma, a myeloma, chronic lymphocytic leukemia (CLL), small lymphocytic lymphoma (SLL), diffuse large B cell lymphoma (DLBCL), follicular lymphoma, mantle cell lymphoma, Burkitt lymphoma, cutaneous T-cell lymphoma, peripheral T cell lymphoma, multiple myeloma, plasmacytoma, acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), or chronic myeloid leukemia (CML).

[0222] In some embodiments, the cell is a somatic cell. In some instances, the cell is a blood cell, a skin cell, a connective tissue cell, a bone cell, a muscle cell, or a cell from an organ.

[0223] In some embodiments, the cell is an epithelial cell, a connective tissue cell, a muscular cell, or a neuron.

[0224] In some instances, the cell is an endodermal cell, a mesodermal cell, or an ectodermal cell. In some instances, the endoderm comprises cells of the respiratory system, the intestine, the liver, the gallbladder, the pancreas, the islets of Langerhans, the thyroid, or the hindgut. In some cases, the mesoderm comprises osteochondroprogenitor cells, muscle cells, cells from the digestive system, renal stem cells, cells from the reproductive system, or cells from the circulatory system (such as endothelial cells). Exemplary cells from the ectoderm comprise epithelial cells, cells of the anterior pituitary, cells of the peripheral nervous system, cells of the neuroendocrine system, cells of the eyes, cells of the central nervous system, ependymal cells, or cells of the pineal gland. In some cases, cells derived from the central and peripheral nervous system comprise neurons, Schwann cells, satellite glial cells, oligodendrocytes, or astrocytes. In some cases, neurons further comprise interneurons, pyramidal neurons, GABAergic neurons, dopaminergic neurons, serotonergic neurons, glutamatergic neurons, motor neurons from the spinal cord, or inhibitory spinal neurons.

[0225] In some embodiments, the cell is a stem cell or a progenitor cell. In some cases, the cell is a mesenchymal stem or progenitor cell. In other cases, the cell is a hematopoietic stem or progenitor cell.

[0226] In some cases, a target protein is overexpressed or is depleted in the cell. In some cases, the target protein is overexpressed in the cell. In additional cases, the target protein is depleted in the cell.

[0227] In some cases, a target gene in the cell has one or more mutations.

[0228] In some cases, the cell comprises an impaired splicing mechanism.

[0229] In some instances, the Arc-based capsid is administered systemically to a subject in need thereof.

[0230] In other instances, the Arc-based capsid or endo-Gag-based capsid is administered locally to a subject in need thereof.

[0231] In some embodiments, the Arc-based capsid or endo-Gag-based capsid is administered parenterally, orally, topically, sublingually, or by aerosol to a subject in need thereof. In some cases, the Arc-based capsid or endo-Gag-based capsid is administered parenterally to a subject in need thereof. In other cases, the Arc-based capsid or endo-Gag-based capsid is administered orally to a subject in need thereof. In additional cases, the Arc-based capsid or endo-Gag-based capsid is administered topically, sublingually, or by aerosol to a subject in need thereof.

[0232] In some embodiments, a delivery component is combined with an Arc-based capsid or endo-Gag-based capsid for a targeted delivery to a site of interest. In some instances, the delivery component comprises a carrier, e.g., an extracellular vesicle such as a micelle, a liposome, or a microvesicle; or a viral envelope.

[0233] In some instances, the delivery component serves as a primary delivery vehicle for an Arc-based capsid or endo-Gag-based capsid which does not comprise its own delivery component (e.g., in which the second polypeptide is not present). In such cases, the delivery component directs the Arc-based capsid or endo-Gag-based capsid to a target site of interest and optionally facilitates intracellular uptake.

[0234] In other instances, the delivery component enhances target specificity and/or sensitivity of an Arc-based capsid's second polypeptide. In such cases, the delivery component enhances the specificity and/or affinity of the Arc-based capsid or endo-Gag-based capsid for the target site. In additional cases, the delivery component enhances the specificity and/or affinity by about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 20-fold, 30-fold, 50-fold, 100-fold, 200-fold, 500-fold, or more. In further cases, the delivery component enhances the specificity and/or affinity by about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 200%, 500%, or more. Further still, the delivery component optionally reduces off-target effects by about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 20-fold, 30-fold, 50-fold, 100-fold, 200-fold, 500-fold, or more, or by about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 200%, 500%, or more.

[0235] In additional instances, the delivery component serves as a first vehicle that transports an Arc-based capsid to a general target region (e.g., a tumor microenvironment), and the Arc-based or endo-Gag-based capsid's second polypeptide serves as a second delivery molecule that drives the Arc-based capsid or endo-Gag-based capsid to the specific target site and optionally facilitates intracellular uptake. In such cases, the delivery component reduces off-target effects by about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 20-fold, 30-fold, 50-fold, 100-fold, 200-fold, 500-fold, or more, or by about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 200%, 500%, or more.

[0236] In further instances, the delivery component serves as a first vehicle that transports an Arc-based capsid to a target site of interest and the Arc-based or endo-Gag-based capsid's second polypeptide serves as a second delivery molecule that facilitates intracellular uptake.

[0237] In some embodiments, the delivery component comprises an extracellular vesicle. In some instances, the extracellular vesicle comprises a microvesicle, a liposome, or a micelle. In some instances, the extracellular vesicle has a diameter of from about 10 nm to about 2000 nm, from about 10 nm to about 1000 nm, from about 10 nm to about 800 nm, from about 20 nm to about 600 nm, from about 30 nm to about 500 nm, from about 50 nm to about 200 nm, or from about 80 nm to about 100 nm.

[0238] In some embodiments, the delivery component comprises a microvesicle. Also known as circulating microvesicles or microparticles, microvesicles are membrane-bound vesicles that comprise phospholipids. In some instances, the microvesicle has a diameter of from about 50 nm to about 1000 nm, from about 100 nm to about 800 nm, from about 200 nm to about 500 nm, or from about 50 nm to about 400 nm.

[0239] In some instances, the microvesicle originates from cell membrane inversion, exocytosis, shedding, blebbing, or budding. In some instances, the microvesicles are generated from differentiated cells. In other instances, the microvesicles are generated from undifferentiated cells, e.g., blast cells, progenitor cells, or stem cells.

[0240] In some embodiments, the delivery component comprises a liposome. In some instances, the liposome comprises a plurality of lipopeptides, which are presented on the surface of the liposome, for targeted delivery to a site or region of interest. In some cases, the liposomes fuse with the target cell, whereby the contents of the liposome are then emptied into the target cell. In some cases, a liposome is endocytosed by cells that are phagocytic. Endocytosis is then followed by intralysosomal degradation of liposomal lipids and release of the encapsulated agents.

[0241] Exemplary liposomes suitable for incorporation include, but are not limited to, multilamellar vesicles (MLV), oligolamellar vesicles (OLV), unilamellar vesicles (UV), small unilamellar vesicles (SUV), medium-sized unilamellar vesicles (MUV), large unilamellar vesicles (LUV), giant unilamellar vesicles (GUV), multivesicular vesicles (MVV), single or oligolamellar vesicles made by the reverse-phase evaporation method (REV), multilamellar vesicles made by the reverse-phase evaporation method (MLV-REV), stable plurilamellar vesicles (SPLV), frozen and thawed MLV (FATMLV), vesicles prepared by extrusion methods (VET), vesicles prepared by French press (FPV), vesicles prepared by fusion (FUV), dehydration-rehydration vesicles (DRV), and bubblesomes (BSV). In some instances, a liposome comprises Amphipol (A8-35). Techniques for preparing liposomes are described in, for example, COLLOIDAL DRUG DELIVERY SYSTEMS, vol. 66 (J. Kreuter ed., Marcel Dekker, Inc. (1994)).

[0242] Depending on the method of preparation, liposomes are unilamellar or multilamellar, and vary in size with diameters ranging from about 20 nm to greater than about 1000 nm.

[0243] In some instances, liposomes provided herein also comprise carrier lipids. In some embodiments, the carrier lipids are phospholipids. Carrier lipids capable of forming liposomes include, but are not limited to, dipalmitoylphosphatidylcholine (DPPC), phosphatidylcholine (PC; lecithin), phosphatidic acid (PA), phosphatidylglycerol (PG), phosphatidylethanolamine (PE), or phosphatidylserine (PS). Other suitable phospholipids further include distearoylphosphatidylcholine (DSPC), dimyristoylphosphatidylcholine (DMPC), dipalmitoylphosphatidylglycerol (DPPG), distearoylphosphatidylglycerol (DSPG), dimyristoylphosphatidylglycerol (DMPG), dipalmitoylphosphatidic acid (DPPA), dimyristoylphosphatidic acid (DMPA), distearoylphosphatidic acid (DSPA), dipalmitoylphosphatidylserine (DPPS), dimyristoylphosphatidylserine (DMPS), distearoylphosphatidylserine (DSPS), dipalmitoylphosphatidylethanolamine (DPPE), dimyristoylphosphatidylethanolamine (DMPE), distearoylphosphatidylethanolamine (DSPE), and the like, or combinations thereof. In some embodiments, the liposomes further comprise a sterol (e.g., cholesterol) that modulates liposome formation. Optionally, the carrier lipids are non-phosphate polar lipids.

[0244] In some embodiments, the delivery component comprises a micelle. In some instances, the micelle has a diameter from about 2 nm to about 250 nm, from about 20 nm to about 200 nm, from about 20 nm to about 100 nm, or from about 50 nm to about 100 nm.

[0245] In some instances, the micelle is a polymeric micelle, characterized by a core-shell structure in which a hydrophobic core is surrounded by a hydrophilic shell. In some cases, the hydrophilic shell further comprises a hydrophilic polymer or copolymer and a pH-sensitive component.

[0246] Exemplary hydrophilic polymers or copolymers include, but are not limited to, poly(N-substituted acrylamides), poly(N-acryloyl pyrrolidine), poly(N-acryloyl piperidine), poly(N-acryl-L-amino acid amides), poly(ethyl oxazoline), methylcellulose, hydroxypropyl acrylate, hydroxyalkyl cellulose derivatives and poly(vinyl alcohol), poly(N-isopropylacrylamide), poly(N-vinyl-2-pyrrolidone), polyethyleneglycol derivatives, and combinations thereof.

[0247] The pH-sensitive moiety includes, but is not limited to, an alkylacrylic acid such as methacrylic acid, ethylacrylic acid, propyl acrylic acid and butyl acrylic acid, or an amino acid such as glutamic acid.

[0248] In some instances, the hydrophobic moiety constitutes the core of the micelle and includes, for example, a single alkyl chain, such as octadecyl acrylate, or a double-chain alkyl compound, such as phosphatidylethanolamine or dioctadecylamine. In some cases, the hydrophobic moiety is optionally a water-insoluble polymer such as a poly(lactic acid) or a poly(ε-caprolactone).

[0249] Polymeric micelles exhibiting pH-sensitive properties are also contemplated and are formed, e.g., by using pH-sensitive polymers including, but not limited to, copolymers from methacrylic acid, methacrylic acid esters and acrylic acid esters, polyvinyl acetate phthalate, hydroxypropyl methyl cellulose phthalate, cellulose acetate phthalate, or cellulose acetate trimellitate.

[0250] In some embodiments, the delivery component comprises a viral envelope. Viral envelopes comprise glycoproteins, phospholipids, and additional proteins obtained from a host. In some instances, the viral envelope is permissive to a wide range of target cells. In other instances, the viral envelope is non-permissive and is specific to a target cell of interest. In some cases, the viral envelope comprises a cell-specific binding protein and optionally a fusogenic molecule that aids in delivering the cargo into a target cell by membrane fusion. In some cases, the viral envelope comprises an endogenous viral envelope. In other cases, the viral envelope is a modified envelope comprising one or more foreign proteins.

[0251] In some instances, the viral envelope is derived from a DNA virus. Exemplary enveloped DNA viruses include viruses from the families Herpesviridae, Poxviridae, and Hepadnaviridae.

[0252] In other instances, the viral envelope is derived from an RNA virus. Exemplary enveloped RNA viruses include viruses from the family of Bunyaviridae, Coronaviridae, Filoviridae, Flaviviridae, Orthomyxoviridae, Paramyxoviridae, Rhabdoviridae, and Togaviridae.

[0253] In additional instances, the viral envelope is derived from a virus from the family of Retroviridae.

[0254] In some embodiments, the viral envelope is from an oncolytic virus, such as an oncolytic DNA virus from the family Herpesviridae (for example, HSV-1) or Poxviridae (for example, vaccinia virus and myxoma virus); or an oncolytic RNA virus from the family Rhabdoviridae (for example, vesicular stomatitis virus (VSV)) or Paramyxoviridae (for example, measles virus (MV) and Newcastle disease virus (NDV)).

[0255] In some instances, the viral envelope further comprises a foreign or engineered protein that binds to an antigen or a cell surface molecule. Exemplary antigens and cell surface molecules for targeting include, but are not limited to, P-glycoprotein, Her2/Neu, erythropoietin (EPO), epidermal growth factor receptor (EGFR), vascular endothelial growth factor receptor (VEGF-R), cadherin, carcinoembryonic antigen (CEA), CD4, CD8, CD19, CD20, CD33, CD34, CD45, CD117 (c-kit), CD133, HLA-A, HLA-B, HLA-C, chemokine receptor 5 (CCR5), stem cell marker ABCG2 transporter, ovarian cancer antigen CA125, immunoglobulins, integrins, prostate specific antigen (PSA), prostate stem cell antigen (PSCA), dendritic cell-specific intercellular adhesion molecule 3-grabbing nonintegrin (DC-SIGN), thyroglobulin, granulocyte-macrophage colony stimulating factor (GM-CSF), myogenic differentiation promoting factor-1 (MyoD-1), Leu-7 (CD57), LeuM-1, cell proliferation-associated human nuclear antigen defined by the monoclonal antibody Ki-67 (Ki-67), viral envelope proteins, HIV gp120, or transferrin receptor.

[0256] In some embodiments, the Arc-based capsid or endo-Gag-based capsid is for in vitro use.

[0257] In some instances, the Arc-based capsid or endo-Gag-based capsid is for ex vivo use.

[0258] In some cases, the Arc-based capsid or endo-Gag-based capsid is for in vivo use.

Kits/Article of Manufacture

[0259] Disclosed herein, in certain embodiments, are kits and articles of manufacture for use with one or more methods described herein. Such kits include a carrier, package, or container that is compartmentalized to receive one or more containers such as vials, tubes, and the like, each of the container(s) comprising one of the separate elements to be used in a method described herein. Suitable containers include, for example, bottles, vials, syringes, and test tubes. In one embodiment, the containers are formed from a variety of materials such as glass or plastic.

[0260] For example, the container(s) include a recombinant or engineered Arc or endo-Gag polypeptide described above. Such kits optionally include an identifying description or label or instructions relating to its use in the methods described herein. For example, a kit typically includes labels listing contents and/or instructions for use, and package inserts with instructions for use. A set of instructions will also typically be included.

CERTAIN TERMINOLOGIES

[0261] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood. It is to be understood that the detailed description is exemplary and explanatory only and is not restrictive of any subject matter claimed. In this application, the use of the singular includes the plural unless specifically stated otherwise. It must be noted that, as used in the specification, the singular forms "a," "an" and "the" include plural referents unless the context clearly dictates otherwise. In this application, the use of "or" means "and/or" unless stated otherwise. Furthermore, use of the term "including" as well as other forms, such as "include," "includes," and "included," is not limiting.

[0262] Although various features of the invention may be described in the context of a single embodiment, the features may also be provided separately or in any suitable combination. Conversely, although the invention may be described herein in the context of separate embodiments for clarity, the invention may also be implemented in a single embodiment.

[0263] Reference in the specification to "some embodiments", "an embodiment", "one embodiment" or "other embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the inventions.

[0264] As used herein, ranges and amounts can be expressed as "about" a particular value or range. "About" also includes the exact amount. Hence "about 5 µL" means "about 5 µL" and also "5 µL." Generally, the term "about" includes an amount that would be expected to be within experimental error.

[0265] The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.

[0266] As used herein, the sequence of a CA N-lobe described herein corresponds to the human CA N-lobe. In some instances, the human CA N-lobe comprises residues 207-278 of SEQ ID NO: 1. In some instances, a CA N-lobe described herein has about 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to residues 207-278 of SEQ ID NO: 1. In some cases, a CA N-lobe described herein shares a structural similarity with the human CA N-lobe. For example, a CA N-lobe described herein shares about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% structural similarity with the human CA N-lobe. In some cases, the CA N-lobe shares a high structural similarity (e.g., 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% structural similarity) but does not share a high sequence identity (e.g., the sequence identity is lower than 80%, lower than 70%, lower than 60%, lower than 50%, lower than 40%, or lower than 30%). In some cases, the CA N-lobe comprises residues 207-278 of SEQ ID NO: 1.

[0267] As used herein, the sequence of a CA C-lobe described herein corresponds to the human CA C-lobe. In some instances, the human CA C-lobe comprises residues 278-370 of SEQ ID NO: 1. In some instances, a CA C-lobe described herein has about 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to residues 278-370 of SEQ ID NO: 1. In some cases, a CA C-lobe described herein shares a structural similarity with the human CA C-lobe. For example, a CA C-lobe described herein shares about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% structural similarity with the human CA C-lobe. In some cases, the CA C-lobe shares a high structural similarity (e.g., 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% structural similarity) but does not share a high sequence identity (e.g., the sequence identity is lower than 80%, lower than 70%, lower than 60%, lower than 50%, lower than 40%, or lower than 30%). In some cases, the CA C-lobe comprises residues 278-370 of SEQ ID NO: 1.
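The residue windows and percent-identity thresholds above can be illustrated with a short Python sketch. It extracts a 1-based, inclusive residue range such as 207-278 and computes percent identity between two sequences that have already been aligned. Several things are assumptions: the function names, the gap-handling convention (columns with a gap in either sequence are skipped, which is only one of several conventions in use), and the premise that alignment is done elsewhere. The sequence of SEQ ID NO: 1 is not reproduced here, so only toy strings are used, and structural similarity is not modeled.

```python
# Residue ranges of SEQ ID NO: 1 given in the text (1-based, inclusive).
CA_N_LOBE = (207, 278)
CA_C_LOBE = (278, 370)


def residues(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("residue range lies outside the sequence")
    return seq[start - 1:end]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns with a gap ('-') in either sequence are skipped. This is one of
    several conventions, and the choice changes the resulting number.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not SEQ ID NO: 1.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
    print(len(residues("A" * 400, *CA_N_LOBE)))  # 72 residues in the N-lobe window
```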

[0268] As used herein, the terms "individual(s)", "subject(s)" and "patient(s)" mean any mammal. In some embodiments, the mammal is a human. In some embodiments, the mammal is a non-human. None of the terms require or are limited to situations characterized by the supervision (e.g., constant or intermittent) of a health care worker (e.g., a doctor, a registered nurse, a nurse practitioner, a physician's assistant, an orderly or a hospice worker).

EXAMPLES

[0269] These examples are provided for illustrative purposes only and not to limit the scope of the claims provided herein.

Example 1--Construction of DNA Vectors Encoding Recombinant Arc Proteins and Engineered Arc Proteins

[0270] To construct recombinant DNA vectors for Arc expression, full-length cDNA open reading frames, excluding the initial methionine, are inserted into a cloning vector and subsequently transferred into an expression vector according to standard methods. The same approach is used to construct recombinant DNA vectors for expressing endo-Gag proteins. Human Arc cDNA encodes an annotated matrix domain (MA) and a capsid domain. The capsid domain has an N-terminal lobe (NTD) and a C-terminal lobe (CTD). FIG. 1 illustrates the structure of the human Arc protein and the predicted structures of Arc from python, platypus, and orca.

[0271] cDNAs encoding engineered Arc proteins are optionally generated by recombining Arc sequences from different species (FIG. 2), by inserting functional domains from other proteins into an Arc protein (FIG. 3A), by modifying the sequence of an Arc protein (FIG. 3B), and/or by any combination of the approaches exemplified in FIGS. 2-3. cDNAs encoding engineered endo-Gag proteins are likewise generated by recombining endo-Gag sequences from different species, by inserting functional domains from other proteins into an endo-Gag protein, by modifying the sequence of an endo-Gag protein, and/or by any combination of these approaches. Furthermore, an engineered endo-Gag protein optionally contains Arc sequences and an engineered Arc protein optionally contains endo-Gag sequences. Engineered Arc and endo-Gag protein monomers assemble into capsids.
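Two of the engineering approaches above, recombining sequences from different species and inserting a functional domain, can be sketched as plain string operations on protein sequences. The following Python sketch is illustrative only. The function names are hypothetical, and `swap_segment` assumes that both sequences share one residue numbering over the swapped window, so that they are already aligned there without gaps. Real orthologs generally need an alignment step first, and in practice the actual constructs are built at the DNA level.

```python
def swap_segment(recipient: str, donor: str, start: int, end: int) -> str:
    """Replace residues start..end (1-based, inclusive) of `recipient` with the
    same positions of `donor`.

    Assumes both sequences share one residue numbering over this window
    (i.e. they are already aligned there without gaps).
    """
    if not 1 <= start <= end <= min(len(recipient), len(donor)):
        raise ValueError("segment lies outside one of the sequences")
    return recipient[:start - 1] + donor[start - 1:end] + recipient[end:]


def insert_domain(host: str, after_residue: int, domain: str) -> str:
    """Insert `domain` immediately after residue `after_residue` (1-based) of `host`."""
    if not 0 <= after_residue <= len(host):
        raise ValueError("insertion point lies outside the sequence")
    return host[:after_residue] + domain + host[after_residue:]


if __name__ == "__main__":
    # Toy strings standing in for two orthologs; not real Arc sequences.
    species_a = "MAAAAAAAAAA"
    species_b = "MCCCCCCCCCC"
    print(swap_segment(species_a, species_b, 4, 7))  # MAACCCCAAAA
    print(insert_domain(species_a, 3, "GGGG"))       # MAAGGGGAAAAAAAA
```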

[0272] cDNAs encoding the Arc and endo-Gag proteins of Table 1 were inserted into an expression vector derived from pET-41a(+) (EMD Millipore (Novagen) Cat #70566). The entire cloning site of pET-41a(+) was removed and replaced with DNA having the nucleotide sequence of SEQ ID NO: 57, which encodes an alternative N-terminal tag having the amino acid sequence of SEQ ID NO: 58 and comprising a 6×His tag (SEQ ID NO: 59), a 6-amino-acid spacer (SEQ ID NO: 60), and an AcTEV™ cleavage site (SEQ ID NO: 61). Arc and endo-Gag open reading frames without their starting methionine codon were inserted after the AcTEV™ cleavage site by Gibson assembly (Gibson DG, Young L, Chuang RY, Venter JC, Hutchison CA 3rd, Smith HO (2009). "Enzymatic assembly of DNA molecules up to several hundred kilobases." Nature Methods 6(5): 343-345). After expression and AcTEV™ cleavage, the N-terminus of the resulting Arc or endo-Gag protein carries a single residual glycine from the AcTEV™ cleavage site.
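The tag design above can be checked with a short Python sketch. It translates SEQ ID NO: 57 with the standard genetic code, confirms that the result equals SEQ ID NO: 58 (Met, 6×His, GSGSGS spacer, ENLYFQG site), and models TEV cleavage between the Q and G of ENLYFQ/G. The residual glycine then remains on the product. The function names are illustrative, and the downstream open reading frame is a hypothetical placeholder, not an Arc or endo-Gag sequence. The space inside SEQ ID NO: 57 as printed in the listing is treated as a line-wrap artifact and removed.

```python
# Standard genetic code in the conventional TCAG codon order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence ('*' marks a stop codon)."""
    dna = "".join(dna.split()).upper()  # drop whitespace left over from line wrapping
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Sequences as listed in the text.
SEQ_ID_NO_57 = "ATGCATCACCATCACCATCACGGCTCAGGGTCTGGTAGCGAAAATCTGTA CTTCCAGGGG"
SEQ_ID_NO_58 = "MHHHHHHGSGSGSENLYFQG"
HIS_TAG, SPACER, TEV_SITE = "HHHHHH", "GSGSGS", "ENLYFQG"  # SEQ ID NOs: 59, 60, 61


def tev_cleave(protein: str) -> tuple[str, str]:
    """Split a fusion at the first TEV site ENLYFQ/G; the product keeps the G."""
    i = protein.find(TEV_SITE)
    if i < 0:
        raise ValueError("no TEV cleavage site found")
    cut = i + len(TEV_SITE) - 1  # cut between Q and G
    return protein[:cut], protein[cut:]


# Hypothetical placeholder ORF (NOT an Arc sequence), already lacking its start Met.
PLACEHOLDER_ORF = "KTLDSQ"

if __name__ == "__main__":
    tag = translate(SEQ_ID_NO_57)
    assert tag == SEQ_ID_NO_58 == "M" + HIS_TAG + SPACER + TEV_SITE
    removed, product = tev_cleave(tag + PLACEHOLDER_ORF)
    print(removed, product)  # MHHHHHHGSGSGSENLYFQ GKTLDSQ
```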

SEQ ID NO: 57 ATGCATCACCATCACCATCACGGCTCAGGGTCTGGTAGCGAAAATCTGTACTTCCAGGGG
SEQ ID NO: 58 MHHHHHHGSGSGSENLYFQG
SEQ ID NO: 59 HHHHHH
SEQ ID NO: 60 GSGSGS
SEQ ID NO: 61 ENLYFQG

TABLE 1. Sequences of Arc and endo-Gag polypeptides and nucleotides.

Gene name | Common name | Species (proper name) | Sequence ID | SEQ ID NO (amino acid) | SEQ ID NO (DNA)
Arc | Human | Homo sapiens | NP_056008.1 | 1 | 29
Arc | Killer Whale | Orcinus orca | XP_004265337.1 | 2 | 30
Arc | White Tailed Deer | Odocoileus virginianus texanus | XP_020755692.1 | 3 | 31
Arc | Platypus | Ornithorhynchus anatinus | XP_001512750.1 | 4 | 32
Arc | Goose | Anser cygnoides domesticus | XP_013046406.1 | 5 | 33
Arc | Dalmatian Pelican | Pelecanus crispus | KFQ60200.1 | 6 | 34
Arc | White Tailed Eagle | Haliaeetus albicilla | KFQ04633.1 | 7 | 35
Arc | King Cobra | Ophiophagus hannah | ETE60609.1 | 8 | 36
Arc | Ray Finned Fish | Austrofundulus limnaeus | XP_013881732.1 | 9 | 37
Arc | Sperm Whale | Physeter catodon | XP_007119193.2 | 10 | 38
Arc | Turkey | Meleagris gallopavo | XP_010707654.1 | 11 | 39
Arc | Central Bearded Dragon | Pogona vitticeps | XP_020633722.1 | 12 | 40
Arc | Chinese Alligator | Alligator sinensis | XP_006027442.1 | 13 | 41
Arc | American Alligator | Alligator mississippiensis | XP_019337372.1 | 14 | 42
Arc | Japanese Gecko | Gekko japonicus | XP_015273745.1 | 15 | 43
PNMA3 | Human | Homo sapiens | NP_001269464.1 | 16 | 44
PNMA5 | Human | Homo sapiens | NP_001096620.1 | 17 | 45
PNMA6A | Human | Homo sapiens | NP_116271.3 | 18 | 46
PNMA6B | Human | Homo sapiens | SP_P0C5W0.1 | 19 | 47
RTL3 | Human | Homo sapiens | NP_689907.1 | 20 | 48
RTL6 | Human | Homo sapiens | NP_115663.2 | 21 | 49
RTL8A | Human | Homo sapiens | NP_001071640.1 | 22 | 50
RTL8B | Human | Homo sapiens | NP_001071641.1 | 23 | 51
BOP | Human | Homo sapiens | NP_078903.3 | 24 | 52
LDOC1 | Human | Homo sapiens | NP_036449.1 | 25 | 53
ZNF18 | Human | Homo sapiens | NP_001290210.1 | 26 | 54
MOAP1 | Human | Homo sapiens | AAG31786.1 | 27 | 55
PEG10 | Human | Homo sapiens | NP_055883.2 | 28 | 56

Example 2--Expression and Purification of Arc and Endo-Gag Proteins

[0273] Expression vector constructs comprising Arc and endo-Gag open reading frames were transformed into the Rosetta 2 (DE3)pLysS E. coli strain (Millipore Sigma, Cat #71403). Arc or endo-Gag expression was induced with 0.1 mM IPTG, followed by a 16-hour incubation at 16 °C. Cell pellets were lysed by sonication in 20 mM sodium phosphate pH 7.4, 0.1 M NaCl, 40 mM imidazole, 1 mM DTT, and 10% glycerol. The lysate was treated with excess TURBO DNase (Thermo Fisher Scientific, Cat #AM2238), RNase Cocktail (Thermo Fisher Scientific, Cat #AM2286), and Benzonase Nuclease (Millipore Sigma, Cat #71205) to eliminate nucleic acids. NaCl was then added to the lysate to bring its concentration to 0.5 M, followed by centrifugation and filtration to remove cellular debris. The 6×His-tagged recombinant protein was loaded onto a HisTrap HP column (GE Healthcare, Cat #17-5247-01), washed with buffer A (20 mM sodium phosphate pH 7.4, 0.5 M NaCl, 40 mM imidazole, and 10% glycerol), and eluted with a linear gradient of buffer B (20 mM sodium phosphate pH 7.4, 0.5 M NaCl, 500 mM imidazole, and 10% glycerol). Collection tubes were supplemented in advance with 10 µl of 0.5 M EDTA pH 8.0 per 1 ml of eluate. The resulting Arc or endo-Gag protein is generally more than 95% pure as revealed by SDS-PAGE analysis, with a yield of up to 50 mg per 1 L of bacterial culture (FIG. 4A).
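Two small calculations are implicit in the purification above. The first is how much concentrated NaCl stock brings the lysate from 0.1 M to 0.5 M. The second is what final EDTA concentration results from 10 µl of 0.5 M EDTA per 1 ml of eluate. The Python sketch below works both out from simple mass balance. It assumes that volumes are additive and ignores the small nuclease additions. The 100 mL lysate volume and the 5 M NaCl stock are hypothetical values that are not stated in the text.

```python
def stock_volume_to_raise(volume_ml: float, current_m: float, target_m: float, stock_m: float) -> float:
    """Volume (mL) of a concentrated stock that brings a solution from current_m to target_m.

    From the mass balance current*V + stock*x = target*(V + x):
        x = V * (target - current) / (stock - target)
    """
    if not current_m < target_m < stock_m:
        raise ValueError("require current < target < stock concentration")
    return volume_ml * (target_m - current_m) / (stock_m - target_m)


def final_concentration_m(stock_m: float, added_ul: float, sample_ul: float) -> float:
    """Concentration (M) of a solute added as stock to a sample that initially lacks it."""
    return stock_m * added_ul / (added_ul + sample_ul)


if __name__ == "__main__":
    # Raise 100 mL of lysate from 0.1 M to 0.5 M NaCl using a hypothetical 5 M NaCl stock.
    print(f"{stock_volume_to_raise(100.0, 0.1, 0.5, 5.0):.2f} mL of 5 M NaCl")  # 8.89 mL
    # 10 uL of 0.5 M EDTA per 1 mL (1000 uL) of eluate.
    print(f"{final_concentration_m(0.5, 10.0, 1000.0) * 1000:.2f} mM EDTA")  # 4.95 mM
```

The result shows that the pre-loaded collection tubes give an EDTA concentration of roughly 5 mM in the eluate.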

[0274] Residual nucleic acid was removed by anion exchange chromatography on a Mono Q 5/50 GL column (GE Healthcare, Cat #17516601). Before loading onto the column, the recombinant protein was buffer-exchanged into buffer C (20 mM Tris-HCl pH 8.0, 100 mM NaCl, and 10% glycerol) using "Pierce Protein Concentrator PES, 10K MWCO, 5-20 ml" (Thermo Scientific, Cat #88528) according to the manufacturer's protocol. After loading, the Mono Q resin was washed with 2 ml of buffer C. Arc and endo-Gag proteins were eluted using a linear gradient of buffer D (20 mM Tris-HCl pH 8.0, 500 mM NaCl, and 10% glycerol). RNA efficiently separated from Arc and eluted at 600 mM NaCl (FIG. 4B).

[0275] The N-terminal 6×His tag and spacer were removed by concentrating the peak fractions of the Mono Q-purified Arc using a 10 kDa MWCO PES concentrator and then treating the concentrate with 10% v/v AcTEV™ Protease (Invitrogen™ #12575023). The cleavage efficiency is above 99% as revealed by SDS-PAGE. The protein is then diluted into HisTrap buffer A and cleaned up with HisTrap HP resin. The resulting purified Arc has an N-terminal glycine residue and does not contain the initial methionine.

Example 3--Capsid Assembly

[0276] Cleaved Arc protein (1 mg/mL) was loaded into a 20 kDa MWCO dialysis cassette and dialyzed overnight against 1 M sodium phosphate (pH 7.5) at room temperature. The following day, the solution was removed from the cassette, transferred to microcentrifuge tubes, and centrifuged at maximum speed for 5 minutes in a tabletop centrifuge. The supernatant was transferred to a 100 kDa MWCO regenerated cellulose Amicon ultrafiltration centrifugal concentrator, where the buffer was exchanged to PBS pH 7.5 and the volume was reduced 20-fold.
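The text does not say how the exchange from 1 M sodium phosphate to PBS was carried out in the 100 kDa concentrator. If it was done by repeated concentrate-and-dilute cycles, the carry-over of assembly buffer can be estimated as below. This is an idealized sketch: it assumes phosphate passes freely through the membrane and ignores hold-up volume, and the number of cycles is hypothetical.

```python
def residual_after_cycles(c0, concentration_factor, cycles):
    """Residual concentration of a freely permeating solute after repeated
    concentrate-then-dilute-back cycles in an ultrafiltration device.

    Concentrating leaves the small solute's concentration in the retentate
    unchanged; diluting back to the starting volume with new buffer divides
    it by concentration_factor. Each cycle therefore scales it by 1/factor.
    """
    if concentration_factor <= 1:
        raise ValueError("concentration factor must exceed 1")
    return c0 / concentration_factor ** cycles


# Starting from 1 M (1000 mM) sodium phosphate with 20-fold volume reductions.
for n in range(1, 4):
    print(n, "cycle(s):", residual_after_cycles(1000.0, 20, n), "mM phosphate")
```

Under these assumptions a single 20-fold cycle still leaves about 50 mM of the assembly phosphate, so more than one cycle would be needed to approach PBS composition.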

[0277] Capsid assembly was assayed by transmission electron microscopy (TEM). EM grids (Carbon Support Film, Square Grid, 400 mesh, 5-6 nm, Copper, CF400-Cu-UL) were prepared by glow discharge. A 5 µL sample of purified Arc was applied to the grid for 20 seconds and then wicked away with filter paper. The grid was then washed with Milli-Q H₂O, stained with 5 µL of 1% uranyl acetate in H₂O for 30 seconds, and air-dried for 1 minute. Images of Arc capsids were acquired on an FEI Talos L120C TEM equipped with a Gatan 4k × 4k OneView camera. FIG. 5 shows concentrated human Arc capsids. FIG. 6 shows capsids formed from recombinantly expressed Arc orthologs from other vertebrate species, and FIG. 7 shows capsids formed from recombinantly expressed endo-Gag genes from other vertebrate species.

Example 4--Selective Cellular Internalization of Arc Capsids

[0278] Capsids assembled from isolated recombinant human Arc protein (0.5 mg/ml) were fluorescently labeled by reaction with a 50-fold molar excess of Alexa Fluor™ 594 NHS ester (Invitrogen™ #A20004; dissolved in DMSO) in PBS (pH 8.5). Reactions were allowed to proceed for 2 hours in the dark. Alexa 594-labeled capsids were then dialyzed against PBS (pH 7.5) overnight at room temperature in the dark, with at least two buffer exchanges, to remove unreacted dye.
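A 50-fold molar excess has to be converted from a mass concentration of protein to moles of dye. The sketch below does that arithmetic under loudly stated assumptions: roughly 45,000 Da per Arc monomer (approximate, not given in the text), the excess counted per monomer rather than per assembled capsid, a hypothetical 1 ml reaction, and a dye molecular weight of about 820 g/mol that should be checked against the product datasheet.

```python
def protein_uM(mg_per_ml, mw_da):
    """Molar concentration in µM: mg/ml equals g/L; divide by g/mol, scale to µM."""
    return mg_per_ml / mw_da * 1e6


def dye_needed(mg_per_ml, mw_da, reaction_ml, molar_excess, dye_mw):
    """Return (nmol, µg) of NHS-ester dye for a given molar excess over protein."""
    protein_nmol = protein_uM(mg_per_ml, mw_da) * reaction_ml  # µM x ml = nmol
    dye_nmol = protein_nmol * molar_excess
    dye_ug = dye_nmol * dye_mw / 1000.0  # nmol x g/mol = ng; /1000 -> µg
    return dye_nmol, dye_ug


# Assumptions: ~45,000 Da per Arc monomer, excess counted per monomer,
# a hypothetical 1 ml reaction, and ~820 g/mol for the dye (check the datasheet).
nmol, ug = dye_needed(0.5, 45_000, 1.0, 50, 820)
print(f"{protein_uM(0.5, 45_000):.1f} µM Arc; {nmol:.0f} nmol = {ug:.0f} µg dye")
```

If the excess were instead defined per capsid, the dye amount would be far smaller, so the basis of the molar ratio matters when reproducing the labeling.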

[0279] HeLa cells (ATCC® CCL-2™) were seeded in 96-well plates 24 hours before the experiment at densities chosen so that they reached ~80% confluency at the time of treatment. Labeled capsids were then spiked into complete tissue culture medium to a final capsid concentration of 0.05 mg/ml. Treatments proceeded for 4 hours at 37 °C, after which cells were washed three times with imaging medium (DMEM without phenol red, with 10% FBS and 20 mM HEPES) containing 10 µg/ml Hoechst nuclear stain prior to imaging. Fluorescence microscopy revealed a punctate staining pattern, suggesting that the Arc capsids were internalized by the HeLa cells (FIG. 8). Little or no intracellular staining was observed after administration of Alexa Fluor™ 594-labeled bovine serum albumin (BSA; final concentration 0.05 mg/ml) or 45.6 µM free Alexa Fluor™ 594 dye under identical conditions.

Example 5--Heterologous RNA Delivery by Arc Capsids

[0280] Human Arc capsids were loaded with Cre RNA by spiking in excess RNA during capsid formation (i.e., during dialysis into 1 M sodium phosphate). Cre RNA-loaded capsids were administered to HeLa cells in biological triplicate at a final capsid concentration of 0.05 mg/ml for 4 hours at 37 °C. The cells were then washed three times with ice-cold 1× PBS prior to RNA extraction (Invitrogen™ TRIzol™ Reagent #15596026). Purified cell-associated RNA was quantified by qPCR in technical triplicate; values were normalized to cellular GAPDH levels and compared with levels of Escherichia coli rrsA (16S rRNA) and Arc RNA, either of which could have carried over from protein purification. Table 2 lists the primers used for the PCR reactions. The amount of cell-associated Cre RNA detected was more than 27-fold higher when Arc capsids were loaded with Cre RNA than with control capsids not loaded with Cre RNA (FIG. 9).
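The text states that values were normalized to GAPDH but does not name the calculation. One common approach is the comparative Ct (2^-ΔΔCt) method, which assumes near-100% and equal amplification efficiencies for the target and reference amplicons. The sketch below applies it to made-up technical-triplicate Ct values; the numbers are hypothetical and are not the data behind FIG. 9.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean target Ct minus mean reference (e.g., GAPDH) Ct."""
    return mean(target_cts) - mean(reference_cts)


def fold_change_ddct(test_target, test_ref, ctrl_target, ctrl_ref):
    """Relative quantity of target in test vs. control samples by 2^-ddCt."""
    ddct = delta_ct(test_target, test_ref) - delta_ct(ctrl_target, ctrl_ref)
    return 2.0 ** (-ddct)


# Hypothetical Ct values (technical triplicates) for illustration only.
cre_loaded = [25.0, 25.2, 24.8]
gapdh_loaded = [18.0, 18.1, 17.9]
cre_control = [28.0, 28.1, 27.9]
gapdh_control = [18.2, 18.0, 17.8]

fc = fold_change_ddct(cre_loaded, gapdh_loaded, cre_control, gapdh_control)
print(f"fold change: {fc:.1f}")  # fold change: 8.0
```

Here a 3-cycle lower normalized Ct in the loaded sample corresponds to 2^3 = 8-fold more Cre RNA; a real analysis would also verify primer efficiencies before relying on this method.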

TABLE 2
Primers for qPCR quantification of RNA delivered by Arc capsids to HeLa cells

Gene-Primer   Sequence                      SEQ ID NO:
GAPDH-F       AAGCTCATTTCCTGGTATGACAACGA    62
GAPDH-R       AGGGTCTCTCTCTTCCTCTTGTGCT     63
rrsA-F        GCTCAACCTGGGAACTGCATCTGAT     64
rrsA-R        TAATCCTGTTTGCTCCCCACGCTTT     65
Arc CDS-F     GGCCCCTCAGCTCCAGTGATTC        66
Arc CDS-R     CCTGTTGTCACTCTCCTGGCTCTGA     67
Cre CDS-F     GCCAAGACATAAGAAACCTCGCCT      68
Cre CDS-R     GTGAATCAACATCCTCCCTCCGTC      69
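As a quick sanity check on Table 2, the sketch below tabulates the length and GC content of each primer. It only parses the listed sequences; it does not estimate melting temperature or check target specificity.

```python
# Primer sequences transcribed from Table 2 (SEQ ID NOs: 62-69).
PRIMERS = {
    "GAPDH-F": "AAGCTCATTTCCTGGTATGACAACGA",
    "GAPDH-R": "AGGGTCTCTCTCTTCCTCTTGTGCT",
    "rrsA-F": "GCTCAACCTGGGAACTGCATCTGAT",
    "rrsA-R": "TAATCCTGTTTGCTCCCCACGCTTT",
    "Arc CDS-F": "GGCCCCTCAGCTCCAGTGATTC",
    "Arc CDS-R": "CCTGTTGTCACTCTCCTGGCTCTGA",
    "Cre CDS-F": "GCCAAGACATAAGAAACCTCGCCT",
    "Cre CDS-R": "GTGAATCAACATCCTCCCTCCGTC",
}


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, seq in PRIMERS.items():
    print(f"{name:10s} {len(seq):2d} nt  GC {gc_fraction(seq):.0%}")
```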

[0281] FIG. 10 illustrates an alternative method of demonstrating delivery of a heterologous RNA by an Arc or endo-Gag capsid. 6×His-tagged Arc or endo-Gag genes are expressed in a host cell. The resulting Arc or endo-Gag monomers are mixed with translatable Cre mRNA under capsid-forming conditions to form Cre mRNA-loaded capsids, which are then administered to LoxP-luciferase reporter mice. Upon successful delivery of Cre mRNA into mouse cells and subsequent translation of Cre recombinase, the LoxP sites of the reporter are recombined, leading to luciferase expression, which can be detected by bioluminescence imaging after administration of luciferin. This method is used to test the transmission potential of candidate Arc and endo-Gag genes: a positive luciferase signal indicates that the candidate gene encodes an Arc or endo-Gag protein capable of assembling into capsids that incorporate a heterologous cargo and deliver that cargo to a target cell.
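The reporter construct is not described in detail. A common design, assumed here, places a transcriptional stop cassette between two loxP sites in the same orientation upstream of luciferase; Cre-mediated recombination between directly repeated loxP sites excises the intervening DNA and leaves a single loxP site, allowing luciferase to be expressed. The string model below illustrates only that excision logic, with lowercase placeholder words standing in for the promoter, stop cassette, and luciferase sequences.

```python
# Canonical 34-bp loxP site: 13-bp inverted repeat, 8-bp spacer, 13-bp repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(construct):
    """Model Cre excision between the first two directly repeated loxP sites.

    Only same-orientation sites are found (str.find searches one strand), and
    recombination removes the DNA between them, leaving one loxP site. A
    construct with fewer than two sites is returned unchanged.
    """
    first = construct.find(LOXP)
    if first == -1:
        return construct
    second = construct.find(LOXP, first + len(LOXP))
    if second == -1:
        return construct
    return construct[:first] + construct[second:]


# Hypothetical lox-stop-lox reporter built from placeholder segments.
reporter = "promoter" + LOXP + "stopcassette" + LOXP + "luciferase"
recombined = cre_excise(reporter)
print(recombined.count(LOXP), "loxP site(s) remain;", "stopcassette" in recombined)
```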

[0282] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

TABLE-US-00004 TABLE 3 Arc and endo-Gag amino acid and nucleotide sequences SEQ ID NO: 1 GELDHRTSGGLHAYPGPRGGQVAKPNVILQIGKCRAEMLEHVRRTHRHLLAEVSKQVERELKGLHRSVGKLES NLDGYVPTSDSQRWKKSIKACLCRCQETIANLERWVKREMHVWREVFYRLERWADRLESTGGKYPVGSESARH TVSVGVGGPESYCHEADGYDYTVSPYAITPPPAAGELPGQEPAEAQQYQPWVPGEDGQPSPGVDTQIFEDPRE FLSHLEEYLRQVGGSEEYWLSQIQNHMNGPAKKWWEFKQGSVKNWVEFKKEFLQYSEGTLSREAIQRELDLPQ KQGEPLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLRHPLPKTLEQLIQRGMEVQDDLEQAAEP AGPHLPVEDEAETLTPAPNSESVASDRTQPE SEQ ID NO: 2 GELDQRTTGGLHAYPAPRGGPVAKPNVILQIGKCRAEMLEHVRRTHRHLLTEVSKQVERELKGLHRSVGKLES NLDGYVPTGDSQRWRKSIKACLCRCQETIANLERWVKREMHVWREVFYRLERWADRLESMGGKYPVGSNPSRH TTSVGVGGPESYGHEADTYDYTVSPYAITPPPAAGELPGQEAVEAQQYPPWGLGEDGQPSPGVDTQIFEDPRE FLSHLEEYLRQVGGSEEYWLSQIQNHMNGPAKKWWEYKQGSVKNWVEFKKEFLQYSEGALSREAVQRELDLPQ KQGEPLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLRPPLPKTLEQLIQKGMEVEDGLEQVAEP ASPHLPTEEESEALTPALTSESVASDRTQPE SEQ ID NO: 3 GELDHRTTGGLHAYPAPRGGPAAKPNVILQIGKCRAEMLEHVRRTHRHLLAEVSKQVERELKGLHRSVGKLES NLDGYVPTGDSQRWKKSIKACLSRCQETIANLERWVKREMHVWREVFYRLERWADRLESGGGKYPVGSDPARH TVSVGVGGPESYCQDADNYDYTVSPYAITPPPAAGQLPGQEEVEAQQYPPWAPGEDGQLSPGVDTQVFEDPRE FLRHLEDYLRQVGGSEEYWLSQIQNHMNGPAKKWWEYKQGSVKNWVEFKKEFLQYSEGTLSREAIQRELDLPQ KQGEPLDQFLWRKRDLYQTLYVDAEEEEIIQYVVGTLQPKLKRFLRPPLPKTLEQLIQKGMEVQDGLEQAAEP AAEEAEALTPALTNESVASDRTQPE SEQ ID NO: 4 GELDRLNPSSGLHPSSGLHPYPGLRGGATAKPNVILQIGKCRAEMLEHVRKTHRHLLTEVSRQVERELKGLHK SVGKLESNLDGYVPSSDSQRWKKSIKACLSRCQETIAHLERWVKREMNVWREVFYRLERWADRLEAMGGKYPA GEQARRTVSVGVGGPETCCPGDESYDCPISPYAVPPSTGESPESLDQGDQHYQQWFALPEESPVSPGVDTQIF EDPREFLRHLEKYLKQVGGTEEDWLSQIQNHMNGPAKKWWEYKQGSVKNWLEFKKEFLQYSEGTLTRDALKRE LDLPQKQGEPLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLHHPLPKTLEQLIQRGQEVQNGLE PTDDPAGQRTQSEDNDESLTPAVTNESTASEGTLPE SEQ ID NO: 5 GQLDNVTNAGIHSFQGHRGVANKPNVILQIGKCRAEMLEHVRRTHRHLLSEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLEKWADRLESMGGKYCPGEHGKQTVS VGVGGPEIRPSEGEIYDYALDMSQMYALTPPPGEMPSIPQAHDSYQWVSVSEDAPASPVETQVFEDPREFLSH 
LEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKEGE PLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQRGKEVQGNMDHSDEPSPQR TPEIQSGDSVESMPPSTTASPVPSNGTQPEPPSPPATVI SEQ ID NO: 6 GQLDNVTNAGIHSFQGHRGVANKPNVILQIGKCRAEMLEHVRRTHRHLLSEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLEKWADRLESMGGKYCPGEHGKQTVS VGVGGPEIRPSEGEIYDYALDMSQMYALTPPPGEVPSIPQAHDSYQWVSVSEDAPASPVETQVFEDPREFLSH LEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKEGE PLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQRGKEVQGNMDHSEEPSPQR TPEIQSGDSVDSVPPSTTASPVPSNGTQPE SEQ ID NO: 7 GQLDNVTNAGIHSFQGHRGVANKPNVILQIGKCRAEMLEHVRRTHRHLLSEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLEKWADRLESMGGKYCPGDHGKQTVS VGVGGPEIRPSEGEIYDYALDMSQMYALTPPPGEVPSIPQAHDSYQWVSTSEDAPASPVETQVFEDPREFLSH LEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKEGE PLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQRGKEVQGNMDHSEEPSPQR TPEIQSGDSVDSVPPSTTASPVPSNGTQPE SEQ ID NO: 8 GSWGLQRHVADERRGLATPTYGAVCSIREKKASQLSGQSCLEKELLGWKCTEAIVEMMQVDNFNHGNLHSCQG HRGMANHKPNVILQIGKCRAEMLDHVRRTHRHLLTEVSKQVERELKSLQKSVGKLENNLEDHVPSAAENQRWK KSIKACLARCQETIAHLERWVKREINVWKEVFFRLEKWADRLESGGGKYGPGDQSRQTVSVGVGAPEIQPRKE EIYDYALDMSQMYALTPPPMGEDPNVPQSHDSYQWITISDDSPPSPVETQIFEDPREFLTHLEDYLKQVGGTE EYWLSQIQNHMNGPAKKWWEYKQDSVKNWLEFKKEFLQYSEGTLTRDAIKQELDLPQKDGEPLDQFLWRKRDL YQTLYIDAEEEEVIQYVVGTLQPKLKRFLSHPYPKTLEQLIQRGKEVEGNLDNSEEPSPQRSPKHQLGGSVES LPPSSTASPVASDETHPDVSAPPVTVI SEQ ID NO: 9 GDGETQAENPSTSLNNTDEDILEQLKKIVMDQQHLYQKELKASFEQLSRKMFSQMEQMNSKQTDLLLEHQKQT VKHVDKRVEYLRAQFDASLGWRLKEQHADITTKIIPEIIQTVKEDISLCLSTLCSIAEDIQTSRATTVTGHAA VQTHPVDLLGEHHLGTTGHPRLQSTRVGKPDDVPESPVSLFMQGEARSRIVGKSPIKLQFPTFGKANDSSDPL QYLERCEDFLALNPLTDEELMATLRNVLHGTSRDWWDVARHKIQTWREFNKHFRAAFLSEDYEDELAERVRNR IQKEDESIRDFAYMYQSLCKRWNPAICEGDVVKLILKNINPQLPSQLRSRVTTVDELVRLGQQLEKDRQNQLQ YELRKSSGKIIQKSSSCETSALPNTKSTPNQQNPATSNRPPQVYCWRCKGHHAPASCPQWKADKHRAQPSRSS GPQTLTNLQAQDI SEQ ID NO: 10 
GELDQRAAGGLRAYPAPRGGPVAKPSVILQIGKCRAEMLEHVRRTHRHLLTEVSKQVERELKGLHRSVGKLEG NLDGYVPTGDSQRWKKSIKACLCRCQETIANLERWVKREMHVWREVFYRLERWADRLESMGGKYPVGTNPSRH TVSVGVGGPEGYSHEADTYDYTVSPYAITPPPAAGELPGQEAVEAQQYPPWGLGEDGQPGPGVDTQIFEDPRE FLSHLEEYLRQVGGSEEYWLSQIQNHMNGPAKKWWEFKQGSVKNWVEFKKEFLQYSEGTLSREAIQRELDLPQ KQGEPLDQFLWRKRDLYQTLYVDAEEEEIIQYVVGTLQPKLKRFLRPPLPKTLEQLIQKGMEVQDGLEQAAEP ASPRLPPEEESEALTPALTSESVASDRTQPE SEQ ID NO: 11 GQLDNVTNAGIHSFQGHRGVANKPNVILQIGKCRAEMLEHVRRTHRHLLSEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLEKWADRLESMGGKYCPGEHGKQTVS VGVGGPEIRPSEGEIYDYALDMSQMYALTPGPGEVPSIPQAHDSYQWVSVSEDAPASPVETQIFEDPHEFLSH LEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKEGE PLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQRGKEVQGNMDHSEEPSPQR TPEIQSGDSVESMPPSTTASPVPSNGTQPEPPSPPATVI SEQ ID NO: 12 GQLENINQGSLHAFQGHRGVVHNNKPNVILQIGKCRAEMLEHVRRTHRHLLTEVSKQVERELKGLQKSVGKLE NNLEDHVPSAAENQRWKKSIKACLARCQETIANLERWVKREMNVWKEVFFRLERWADRLESGGGKYCHADQGR QTVSVGVGGPEVRPSEGEIYDYALDMSQMYALTPPPMGDVPVIPQPHDSYQWVTDPEEAPPSPVETQIFEDPR EFLTHLEDYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWLEFKKEFLQYSEGTLTRDAIKQELDLP QKEGEPLDQFLWRKRDLYQTLYVEAEEEEVIQYVVGTLQPKLKRFLSHPYPKTLEQLIQRGKEVEGNLDNSEE PSPQRTPEHQLGDSVESLPPSTTASPAGSDKTQPEISLPPTTVI SEQ ID NO: 13 GQLDSVTNAGVHTYQGHRSVANKPNVILQIGKCRTEMLEHVRRTHRHLLTEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLERWADRLESMGGKYCPTDSARQTVS VGVGGPEIRPSEGEIYDYALDMSQMYALTPSPGELPSVPQPHDSYQWVTSPEDAPASPVETQVFEDPREFLCH LEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDTVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKDGE PLDQFLWRKRDLYQTLYIDADEEQIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQKGKEVQGSLDHSEEPSPQR ASEARTGDSVETLPPSTTTSPNTSSGTQPEAPSPPATVI SEQ ID NO: 14 GQLDSVTNAGVHTYQGHRGVANKPNVILQIGKCRTEMLEHVRRTHRHLLTEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLERWADRLESMGGKYCPTDSARQTVS VGVGGPEIRPSEGEIYDYALDMSQMYALTPSPGELPSIPQPHDSYQWVTSPEDAPASPVETQVFEDPREFLCH LEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDTVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKDGE 
PLDQFLWRKRDLYQTLYIDADEEQIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQKGKEVQGSLDHSEEPSPQR ASEARTGDSVESLPPSTTTSPNASSGTQPEAPSPPATVI SEQ ID NO: 15 GQLENVNHGNLHSFQGHRGGVANKPNVILQIGKCRAEMLDHVRRTHRHLLTEVSKQVERELKGLQKSVGKLEN NLEDHVPSAVENQRWKKSIKACLSRCQETIAHLERWVKREMNVWKEVFFRLERWADRLESGGGKYCHGDNHRQ TVSVGVGGPEVRPSEGEIYDYALDMSQMYALTPPSPGDVPVVSQPHDSYQWVTVPEDTPPSPVETQIFEDPRE FLTHLEDYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWLEFKKEFLQYSEGTLTRDAIKEELDLPQ KDGEPLDQFLWRKRDLYQTLYVEADEEEVIQYVVGTLQPKLKRFLSHPYPKTLEQLIQRGKEVEGNLDNSEEP TPQRTPEHQLCGSVESLPPSSTVSPVASDGTQPETSPLPATVI SEQ ID NO: 16 GPLTLLQDWCRGEHLNTRRCMLILGIPEDCGEDEFEETLQEACRHLGRYRVIGRMFRREENAQAILLELAQDI DYALLPREIPGKGGPWEVIVKPRNSDGEFLNRLNRFLEEERRTVSDMNRVLGSDTNCSAPRVTISPEFWTWAQ TLGAAVQPLLEQMLYRELRVFSGNTISIPGALAFDAWLEHTTEMLQMWQVPEGEKRRRLMECLRGPALQVVSG LRASNASITVEECLAALQQVFGPVESHKIAQVKLCKAYQEAGEKVSSFVLRLEPLLQRAVENNVVSRRNVNQT RLKRVLSGATLPDKLRDKLKLMKQRRKPPGFLALVKLLREEEEWEATLGPDRESLEGLEVAPRPPARITGVGA VPLPASGNSFDARPSQGYRRRRGRGQHRRGGVARAGSRGSRKRKRHTFCYSCGEDGHIRVQCINPSNLLLAKE TKEILEGGEREAQTNSR SEQ ID NO: 17 GALTLLEDWCKGMDMDPRKALLIVGIPMECSEVEIQDTVKAGLQPLCAYRVLGRMFRREDNAKAVFIELADTV NYTTLPSHIPGKGGSWEVVVKPRNPDDEFLSRLNYFLKDEGRSMTDVARALGCCSLPAESLDAEVMPQVRSPP LEPPKESMWYRKLKVFSGTASPSPGEETFEDWLEQVTEIMPIWQVSEVEKRRRLLESLRGPALSIMRVLQANN DSITVEQCLDALKQIFGDKEDFRASQFRFLQTSPKIGEKVSTFLLRLEPLLQKAVHKSPLSVRSTDMIRLKHL LARVAMTPALRGKLELLDQRGCPPNFLELMKLIRDEEEWENTEAVMKNKEKPSGRGRGASGRQARAEASVSAP QATVQARSFSDSSPQTIQGGLPPLVKRRRLLGSESTRGEDHGQATYPKAENQTPGREGPQAAGEELGNEAGAG AMSHPKPWET SEQ ID NO: 18 GAVTMLQDWCRWMGVNARRGLLILGIPEDCDDAEFQESLEAALRPMGHFTVLGKAFREEDNATAALVELDREV NYALVPREIPGTGGPWNVVFVPRCSGEEFLGLGRVFHFPEQEGQMVESVAGALGVGLRRVCWLRSIGQAVQPW VEAVRCQSLGVFSGRDQPAPGEESFEVWLDHTTEMLHVWQGVSERERRRRLLEGLRGTALQLVHALLAENPAR TAQDCLAALAQVFGDNESQATIRVKCLTAQQQSGERLSAFVLRLEVLLQKAMEKEALARASADRVRLRQMLTR AHLTEPLDEALRKLRMAGRSPSFLEMLGLVRESEAWEASLARSVRAQTQEGAGARAGAQAVARASTKVEAVPG GPGREPEGLLQAGGQEAEELLQEGLKPVLEECDN SEQ ID NO: 19 GAVTMLQDWCRWMGVNARRGLLILGIPEDCDDAEFQESLEAALRPMGHFTVLGKVFREEDNATAALVELDREV 
NYALVPREIPGTGGPWNVVFVPRCSGEEFLGLGRVFHFPEQEGQMVESVAGALGVGLRRVCWLRSIGQAVQPW VEAVRYQSLGVFSGRDQPAPGEESFEVWLDHTTEMLHVWQGVSERERRRRLLEGLRGTALQLVHALLAENPAR TAQDCLAALAQVFGDNESQATIRVKCLTAQQQSGERLSAFVLRLEVLLQKAMEKEALARASADRVRLRQMLTR AHLTEPLDEALRKLRMAGRSPSFLEMLGLVRESEAWEASLARSVRAQTQEGAGARAGAQAVARASTKVEAVPG GPGREPEGLRQAGGQEAEELLQEGLKPVLEECDN SEQ ID NO: 20 GVEDLAASYIVLKLENEIRQAQVQWLMEENAALQAQIPELQKSQAAKEYDLLRKSSEAKEPQKLPEHMNPPAA WEAQKTPEFKEPQKPPEPQDLLPWEPPAAWELQEAPAAPESLAPPATRESQKPPMAHEIPTVLEGQGPANTQD ATIAQEPKNSEPQDPPNIEKPQEAPEYQETAAQLEFLELPPPQEPLEPSNAQEFLELSAAQESLEGLIVVETS AASEFPQAPIGLEATDFPLQYTLTFSGDSQKLPEFLVQLYSYMRVRGHLYPTEAALVSFVGNCFSGRAGWWFQ LLLDIQSPLLEQCESFIPVLQDTFDNPENMKDANQCIHQLCQGEGHVATHFHLIAQELNWDESTLWIQFQEGL ASSIQDELSHTSPATNLSDLITQCISLEEKPDPNPLGKSSSAEGDGPESPPAENQPMQAAINCPHISEAEWVR WHKGRLCLYCGYPGHFARDCPVKPHQALQAGNIQACQ SEQ ID NO: 21 GVQPQTSKAESPALAASPNAQMDDVIDTLTSLRLTNSALRREASTLRAEKANLTNMLESVMAELTLLRTRARI PGALQITPPISSITSNGTRPMTTPPTSLPEPFSGDPGRLAGFLMQMDRFMIFQASRFPGEAERVAFLVSRLTG EAEKWAIPHMQPDSPLRNNYQGFLAELRRTYKSPLRHARRAQIRKTSASNRAVRERQMLCRQLASAGTGPCPV HPASNGTSPAPALPARARNL SEQ ID NO: 22 GDGRVQLMKALLAGPLRPAARRWRNPIPFPETFDGDTDRLPEFIVQTSSYMFVDENTFSNDALKVTFLITRLT GPALQWVIPYIRKESPLLNDYRGFLAEMKRVFGWEEDEDF SEQ ID NO: 23 GEGRVQLMKALLARPLRPAARRWRNPIPFPETFDGDTDRLPEFIVQTSSYMFVDENTFSNDALKVTFLITRLT GPALQWVIPYIKKESPLLSDYRGFLAEMKRVFGWEEDEDF SEQ ID NO: 24 GPRGRCRQQGPRIPIWAAANYANAHPWQQMDKASPGVAYTPLVDPWIERPCCGDTVCVRTTMEQKSTASGTCG GKPAERGPLAGHMPSSRPHRVDFCWVPGSDPGTFDGSPWLLDRFLAQLGDYMSFHFEHYQDNISRVCEILRRL TGRAQAWAAPYLDGDLPLPDDYELFCQDLKEVVQDPNSFAEYHAVVICPLPLASSQLPVAPQLPVVRQYLARF LEGLALDMGTAPRSLPAAMATPAVSGSNSVSRSALFEQQLTKESTPGPKEPPVLPSSTCSSKPGPVEPASSQP EEAAPTPVPRLSESANPPAQRPDPAHPGGPKPQKTEEEVLETEGDQEVSLGTPQEVVEAPETPGEPPLSPGF SEQ ID NO: 25 GVDELVLLLHALLMRHRALSIENSQLMEQLRLLVCERASLLRQVRPPSCPVPFPETFNGESSRLPEFIVQTAS YMLVNENRFCNDAMKVAFLISLLTGEAEEWVVPYIEMDSPILGDYRAFLDEMKQCFGWDDDEDDDDEEEEDDY SEQ ID NO: 26 GPVDLGQALGLLPSLAKAEDSQFSESDAALQEELSSPETARQLFRQFRYQVMSGPHETLKQLRKLCFQWLQPE 
VHTKEQILEILMLEQFLTILPGEIQMWVRKQCPGSGEEAVTLVESLKGDPQRLWQWISIQVLGQDILSEKMES PSCQVGEVEPHLEVVPQELGLENSSSGPGELLSHIVKEESDTEAELALAASQPARLEERLIRDQDLGASLLPA APQEQWRQLDSTQKEQYWDLMLETYGKMVSGAGISHPKSDLTNSIEFGEELAGIYLHVNEKIPRPTCIGDRQE NDKENLNLENHRDQELLHASCQASGEVPSQASLRGFFTEDEPGCFGEGENLPEALQNIQDEGTGEQLSPQERI SEKQLGQHLPNPHSGEMSTMWLEEKRETSQKGQPRAPMAQKLPTCRECGKTFYRNSQLIFHQRTHIGETYFQC TICKKAFLRSSDFVKHQRTHTGEKPCKCDYCGKGFSDFSGLRHHEKIHTGEKPYKCPICEKSFIQRSNFNRHQ RVHTGEKPYKCSHCGKSFSWSSSLDKHQRSHLGKKPFQ SEQ ID NO: 27 GTLRLLEDWCRGMDMNPRKALLIAGISQSCSVAEIEEALQAGLAPLGEYRLLGRMFRRDENRKVALVGLTAET SHALVPKEIPGKGGIWRVIFKPPDPDNTFLSRLNEFLAGEGMTVGELSRALGHENGSLDPEQGMIPEMWAPML AQALEALQPALQCLKYKKLRVFSGRESPEPGEEEFGRWMFHTTQMIKAWQVPDVEKRRRLLESLRGPALDVIR VLKINNPLITVDECLQALEEVFGVTDNPRELQVKYLTTYHKDEEKLSAYVLRLEPLLQKLVQRGAIERDAVNQ ARLDQVIAGAVHKTIRRELNLPEDGPAPGFLQLLVLIKDYEAAEEEEALLQAILEGNFT SEQ ID NO: 28 GTERRRDELSEEINNLREKVMKQSEENNNLQSQVQKLTEENTTLREQVEPTPEDEDDDIELRGAAAAAAPPPP IEEECPEDLPEKFDGNPDMLAPFMAQCQIFMEKSTRDFSVDRVRVCFVTSMMTGRAARWASAKLERSHYLMHN YPAFMMEMKHVFEDPQRREVAKRKIRRLRQGMGSVIDYSNAFQMIAQDLDWNEPALIDQYHEGLSDHIQEELS HLEVAKSLSALIGQCIHIERRLARAAAARKPRSPPRALVLPHIASHHQVDPTEPVGGARMRLTQEEKERRRKL NLCLYCGTGGHYADNCPAKASKSSPAGKLPGPAVEGPSATGPEIIRSPQDDASSPHLQVMLQIHLPGRHTLFV RAMIDSGASGNFIDHEYVAQNGIPLRIKDWPILVEAIDGRPIASGPVVHETHDLIVDLGDHREVLSFDVTQSP FFPVVLGVRWLSTHDPNITWSTRSIVFDSEYCRYHCRMYSPIPPSLPPPAPQPPLYYPVDGYRVYQPVRYYYV QNVYTPVDEHVYPDHRLVDPHIEMIPGAHSIPSGHVYSLSEPEMAALRDFVARNVKDGLITPTIAPNGAQVLQ VKRGWKLQVSYDCRAPNNFTIQNQYPRLSIPNLEDQAHLATYTEFVPQIPGYQTYPTYAAYPTYPVGFAWYPV GRDGQGRSLYVPVMITWNPHWYRQPPVPQYPPPQPPPPPPPPPPPPSYSTL SEQ ID NO: 29 GGGGAGCTGGACCACCGGACCAGCGGCGGGCTCCACGCCTACCCCGGGCCGCGGGGCGGGCAGGTGGCCAAGC CCAACGTGATCCTGCAGATCGGGAAGTGCCGGGCCGAGATGCTGGAGCACGTGCGGCGGACGCACCGGCACCT GCTGGCCGAGGTGTCCAAGCAGGTGGAGCGCGAGCTGAAGGGGCTGCACCGGTCGGTCGGGAAGCTGGAGAGC AACCTGGACGGCTACGTGCCCACGAGCGACTCGCAGCGCTGGAAGAAGTCCATCAAGGCCTGCCTGTGCCGCT GCCAGGAGACCATCGCCAACCTGGAGCGCTGGGTCAAGCGCGAGATGCACGTGTGGCGCGAGGTGTTCTACCG 
CCTGGAGCGCTGGGCCGACCGCCTGGAGTCCACGGGCGGCAAGTACCCGGTGGGCAGCGAGTCAGCCCGCCAC ACCGTTTCCGTGGGCGTGGGGGGTCCCGAGAGCTACTGCCACGAGGCAGACGGCTACGACTACACCGTCAGCC CCTACGCCATCACCCCGCCCCCAGCCGCTGGCGAGCTGCCCGGGCAGGAGCCCGCCGAGGCCCAGCAGTACCA GCCGTGGGTCCCCGGCGAGGACGGGCAGCCCAGCCCCGGCGTGGACACGCAGATCTTCGAGGACCCTCGAGAG TTCCTGAGCCACCTAGAGGAGTACTTGCGGCAGGTGGGCGGCTCTGAGGAGTACTGGCTGTCCCAGATCCAGA ATCACATGAACGGGCCGGCCAAGAAGTGGTGGGAGTTCAAGCAGGGCTCCGTGAAGAACTGGGTGGAGTTCAA GAAGGAGTTCCTGCAGTACAGCGAGGGCACGCTGTCCCGAGAGGCCATCCAGCGCGAGCTGGACCTGCCGCAG AAGCAGGGCGAGCCGCTGGACCAGTTCCTGTGGCGCAAGCGGGACCTGTACCAGACGCTCTACGTGGACGCGG ACGAGGAGGAGATCATCCAGTACGTGGTGGGCACCCTGCAGCCCAAGCTCAAGCGTTTCCTGCGCCACCCCCT GCCCAAGACCCTGGAGCAGCTCATCCAGAGGGGCATGGAGGTGCAGGATGACCTGGAGCAGGCGGCCGAGCCG GCCGGCCCCCACCTCCCGGTGGAGGATGAGGCGGAGACCCTCACGCCCGCCCCCAACAGCGAGTCCGTGGCCA GTGACCGGACCCAGCCCGAG SEQ ID NO: 30 GGGGAATTGGATCAACGTACTACCGGTGGCCTTCACGCATACCCTGCACCACGCGGGGGCCCTGTCGCGAAGC CAAATGTCATCCTGCAGATTGGGAAGTGCCGGGCTGAGATGCTGGAGCACGTCCGTCGGACGCATCGTCATCT TCTTACTGAGGTGTCAAAACAGGTGGAGCGTGAACTCAAAGGCTTGCACCGCAGCGTTGGGAAACTTGAAAGC AACTTAGATGGCTATGTGCCGACTGGCGACAGCCAGCGTTGGCGTAAGTCCATCAAAGCATGTTTGTGTCGTT GCCAGGAAACGATTGCAAACCTGGAGCGTTGGGTCAAACGGGAGATGCATGTCTGGCGTGAAGTATTTTATCG TTTAGAGCGTTGGGCCGATCGTTTAGAGAGCATGGGTGGTAAGTACCCTGTGGGGAGCAACCCTTCTCGGCAT
ACGACGTCAGTCGGTGTTGGCGGGCCGGAGTCCTACGGTCATGAAGCGGACACCTACGACTATACCGTAAGCC CTTATGCTATTACCCCACCACCTGCGGCCGGCGAATTACCTGGCCAGGAAGCCGTTGAGGCTCAACAATACCC TCCTTGGGGGCTGGGCGAGGATGGTCAACCTAGCCCAGGGGTAGACACGCAAATCTTTGAGGACCCACGGGAG TTTCTTTCCCACCTGGAAGAATACCTGCGTCAGGTTGGTGGGAGCGAAGAATACTGGCTGTCACAAATTCAAA ACCATATGAATGGTCCTGCAAAAAAATGGTGGGAATATAAACAGGGTTCCGTGAAAAACTGGGTTGAGTTTAA AAAGGAGTTTCTTCAATATTCCGAGGGCGCCCTCAGTCGGGAGGCGGTCCAACGCGAGTTGGACTTGCCACAG AAACAGGGGGAACCACTCGATCAATTCCTTTGGCGGAAACGTGACCTTTACCAGACATTGTACGTGGATGCAG ATGAGGAAGAAATTATCCAATATGTTGTGGGGACCCTGCAGCCGAAACTGAAACGTTTCCTTCGCCCGCCGCT GCCTAAAACGTTGGAACAACTTATTCAGAAAGGTATGGAGGTCGAGGATGGCTTAGAACAAGTCGCAGAGCCG GCCTCGCCACACTTGCCTACAGAGGAGGAATCGGAGGCGCTGACCCCAGCACTTACATCAGAGTCAGTGGCAT CAGACCGGACACAACCAGAG SEQ ID NO: 31 GGGGAGTTAGATCACCGTACAACGGGGGGGTTGCACGCATACCCTGCTCCACGTGGCGGGCCGGCAGCTAAGC CAAACGTAATCCTGCAGATTGGGAAGTGCCGGGCAGAGATGTTGGAGCACGTCCGGCGGACCCACCGGCACCT CCTGGCTGAAGTGTCTAAACAAGTAGAACGGGAACTCAAAGGTCTTCATCGTAGCGTCGGGAAATTGGAATCG AATTTGGACGGGTATGTTCCTACAGGCGACTCACAGCGGTGGAAAAAGAGCATCAAGGCCTGCCTGAGTCGCT GCCAGGAGACGATTGCTAACCTCGAACGCTGGGTTAAGCGGGAGATGCACGTTTGGCGCGAAGTCTTCTACCG GCTGGAGCGTTGGGCTGATCGGCTCGAATCTGGTGGGGGTAAGTATCCAGTTGGGTCCGACCCTGCTCGCCAC ACAGTCTCAGTTGGCGTAGGTGGGCCGGAGTCGTATTGCCAAGATGCGGACAACTATGATTATACAGTTTCCC CATACGCGATCACACCACCGCCGGCAGCAGGGCAGCTGCCAGGTCAGGAAGAGGTTGAGGCCCAGCAGTATCC ACCATGGGCCCCAGGGGAAGACGGCCAGCTTTCTCCTGGGGTGGACACTCAAGTTTTTGAAGATCCGCGTGAA TTTCTGCGGCATTTAGAAGATTATCTCCGCCAGGTCGGGGGGTCTGAAGAGTATTGGTTAAGCCAAATTCAAA ACCATATGAACGGCCCGGCCAAGAAGTGGTGGGAGTACAAGCAAGGGTCTGTGAAAAATTGGGTGGAGTTTAA GAAAGAATTCTTGCAATATTCTGAGGGCACTCTTTCGCGTGAAGCCATCCAACGCGAACTCGACTTACCGCAG AAACAAGGGGAACCTCTCGACCAATTTCTGTGGCGCAAACGCGACCTGTACCAGACTCTTTACGTCGATGCTG AGGAGGAAGAAATTATTCAATACGTAGTTGGCACACTGCAGCCTAAGCTTAAACGGTTTTTACGTCCACCATT GCCGAAGACGCTTGAACAACTCATCCAGAAGGGTATGGAGGTTCAAGATGGTCTGGAACAGGCAGCGGAACCA GCGGCGGAGGAGGCAGAAGCCCTGACACCTGCGTTAACTAACGAGTCTGTCGCGAGCGACCGCACCCAGCCGG AA SEQ ID NO: 32 
GGGGAATTAGACCGCCTGAACCCAAGCTCAGGCCTGCATCCATCCTCTGGTTTGCATCCATACCCAGGTCTCC GGGGCGGGGCAACCGCGAAGCCTAATGTCATTTTGCAAATTGGCAAATGCCGTGCGGAAATGCTTGAACACGT CCGCAAAACTCACCGTCATCTCCTCACAGAAGTATCGCGCCAAGTAGAACGCGAGCTCAAAGGCCTTCACAAA AGTGTTGGCAAGTTGGAATCAAATCTTGATGGGTACGTACCGTCAAGCGACTCCCAACGCTGGAAGAAAAGCA TTAAGGCGTGCTTATCCCGTTGCCAAGAGACGATTGCGCATTTAGAACGCTGGGTTAAACGTGAAATGAATGT ATGGCGTGAGGTGTTCTACCGTTTGGAACGTTGGGCGGACCGTCTGGAGGCTATGGGCGGTAAGTATCCTGCC GGTGAGCAGGCCCGGCGTACAGTTTCAGTGGGCGTTGGGGGCCCTGAGACATGTTGTCCAGGGGATGAAAGTT ATGATTGTCCGATTTCTCCGTATGCAGTTCCACCTTCCACCGGCGAGTCTCCGGAATCCTTAGACCAAGGGGA TCAGCACTATCAGCAGTGGTTTGCCCTCCCGGAGGAGTCCCCTGTTAGCCCTGGGGTTGATACCCAGATCTTT GAAGATCCTCGCGAGTTTTTACGTCATCTGGAGAAGTACCTGAAACAAGTCGGCGGGACAGAGGAAGACTGGC TTTCTCAAATCCAGAATCACATGAATGGGCCGGCGAAGAAGTGGTGGGAGTACAAGCAAGGGAGTGTTAAGAA TTGGCTTGAATTTAAGAAGGAATTTTTACAGTATTCGGAGGGCACACTGACGCGGGACGCGTTGAAACGTGAA CTGGATCTCCCACAGAAACAAGGCGAACCACTTGATCAATTTTTATGGCGGAAGCGCGACTTATATCAGACAC TCTACGTTGACGCCGATGAAGAGGAAATCATTCAGTACGTCGTGGGCACTCTTCAGCCGAAATTAAAACGCTT TCTCCATCACCCACTCCCTAAGACGCTTGAGCAGCTTATCCAACGGGGCCAAGAAGTTCAGAATGGTCTGGAG CCTACCGACGATCCTGCAGGCCAACGCACTCAATCGGAGGACAACGACGAAAGCCTTACCCCTGCCGTCACCA ATGAGAGTACTGCAAGCGAGGGCACCCTGCCAGAG SEQ ID NO: 33 GGGCAGCTTGATAACGTTACAAACGCGGGCATCCACTCCTTCCAGGGGCATCGTGGCGTAGCGAATAAGCCAA ATGTCATTCTGCAAATTGGTAAATGTCGTGCGGAAATGCTGGAGCACGTTCGCCGCACCCACCGCCATTTATT ATCTGAAGTATCTAAGCAGGTAGAACGTGAGCTGAAAGGGCTGCAAAAGTCCGTGGGCAAGCTCGAGAATAAC TTGGAGGATCATGTCCCTACAGATAACCAACGCTGGAAGAAGTCCATTAAAGCGTGCTTGGCTCGTTGTCAAG AGACTATCGCGCATTTAGAGCGTTGGGTGAAACGCGAAATGAACGTCTGGAAGGAGGTGTTTTTCCGGCTGGA AAAGTGGGCAGACCGGCTGGAGTCAATGGGTGGCAAGTACTGCCCGGGCGAACACGGGAAACAAACCGTCAGT GTAGGCGTGGGGGGTCCTGAAATCCGGCCTTCGGAGGGGGAAATTTATGATTATGCTCTGGATATGAGCCAGA TGTATGCACTCACCCCACCTCCAGGCGAAATGCCATCAATCCCACAAGCCCATGACAGCTATCAGTGGGTTAG TGTCTCAGAAGATGCCCCGGCGAGCCCTGTCGAAACCCAGGTATTTGAGGACCCTCGGGAATTCCTGTCTCAC CTGGAGGAATACCTGAAGCAGGTAGGCGGCACGGAGGAGTATTGGTTGTCCCAGATCCAGAATCACATGAATG 
GTCCGGCAAAAAAATGGTGGGAATATAAACAGGACTCCGTTAAAAACTGGGTTGAGTTTAAAAAGGAATTCTT GCAATACTCTGAAGGTACTTTAACTCGGGATGCTATTAAGCGTGAACTCGACTTGCCGCAAAAGGAAGGTGAA CCTCTTGACCAATTCCTTTGGCGGAAGCGGGACCTCTATCAGACACTTTACGTGGACGCGGATGAGGAGGAGA TCATTCAGTATGTGGTCGGTACCCTGCAGCCGAAGCTCAAGCGTTTCCTGAGCTATCCTCTCCCAAAGACTTT AGAACAGCTCATCCAGCGCGGTAAAGAAGTGCAGGGTAACATGGATCACTCCGATGAGCCTTCGCCGCAGCGT ACACCTGAAATTCAATCAGGTGACTCCGTAGAATCTATGCCACCTTCAACAACGGCATCTCCGGTTCCATCTA ATGGTACCCAACCTGAGCCGCCGAGCCCGCCAGCCACCGTTATC SEQ ID NO: 34 GGGCAACTTGACAACGTAACAAACGCTGGGATTCACTCCTTTCAGGGCCACCGCGGTGTCGCCAACAAGCCAA ACGTAATCTTGCAAATTGGCAAATGCCGTGCGGAGATGTTGGAACACGTTCGTCGTACACATCGTCACTTGCT GTCGGAAGTCTCTAAACAAGTAGAACGTGAACTTAAAGGGCTTCAAAAGTCAGTCGGCAAATTGGAAAACAAC CTTGAAGACCATGTACCAACCGACAATCAGCGTTGGAAAAAGTCTATCAAAGCTTGCCTGGCCCGTTGTCAAG AGACGATTGCTCACCTGGAGCGGTGGGTAAAGCGCGAGATGAATGTGTGGAAAGAGGTCTTCTTCCGCTTGGA AAAATGGGCCGACCGTTTGGAGTCCATGGGCGGTAAATATTGTCCGGGTGAACATGGTAAGCAAACAGTCTCT GTGGGCGTTGGTGGGCCGGAGATTCGGCCTTCTGAAGGCGAGATTTACGATTATGCGCTCGACATGTCCCAGA TGTATGCGCTTACACCACCACCGGGCGAGGTACCAAGCATTCCTCAAGCGCATGACAGTTATCAGTGGGTTAG CGTATCCGAAGACGCTCCTGCCTCGCCGGTAGAGACCCAGGTTTTTGAAGATCCTCGTGAATTTTTAAGCCAC TTGGAGGAGTATTTGAAGCAGGTAGGGGGGACAGAGGAATATTGGCTGTCTCAGATCCAGAACCACATGAATG GCCCGGCTAAAAAGTGGTGGGAATACAAACAAGATTCGGTAAAGAATTGGGTAGAATTTAAAAAGGAGTTTTT ACAGTACTCAGAGGGGACTCTCACGCGTGATGCGATCAAACGCGAGTTGGATCTTCCTCAAAAAGAGGGGGAG CCACTCGATCAGTTCCTCTGGCGCAAGCGGGATCTCTACCAAACACTCTACGTAGACGCAGACGAAGAAGAGA TCATCCAGTACGTGGTGGGTACGCTCCAGCCGAAACTCAAACGTTTCCTCAGCTACCCACTTCCTAAGACTCT GGAACAACTGATTCAGCGGGGCAAAGAGGTCCAGGGTAACATGGACCATTCAGAGGAACCTAGTCCGCAACGT ACACCTGAGATCCAATCTGGGGATTCTGTCGATTCGGTTCCACCTTCTACAACAGCGTCTCCGGTGCCGTCAA ATGGGACCCAACCAGAG SEQ ID NO: 35 GGGCAGCTTGATAATGTAACCAATGCAGGTATCCACTCTTTCCAGGGTCACCGCGGTGTGGCAAACAAGCCAA ATGTTATTCTGCAAATTGGTAAGTGTCGCGCTGAGATGTTAGAACACGTCCGGCGCACGCATCGGCATCTCCT GTCAGAGGTTTCAAAGCAGGTAGAGCGTGAATTAAAGGGCCTCCAGAAGTCCGTAGGTAAACTCGAAAATAAT 
CTTGAAGACCACGTTCCTACCGATAATCAACGGTGGAAAAAGTCAATCAAGGCGTGCTTAGCACGGTGTCAGG AAACGATCGCGCACCTCGAACGTTGGGTGAAGCGCGAAATGAATGTCTGGAAAGAAGTGTTCTTCCGGCTTGA GAAGTGGGCTGATCGGCTCGAATCCATGGGTGGCAAATATTGTCCAGGTGATCATGGCAAGCAAACGGTCTCC GTCGGTGTTGGTGGTCCGGAAATCCGGCCGAGCGAGGGTGAAATCTATGACTACGCTCTTGATATGTCCCAGA TGTATGCACTCACTCCTCCGCCGGGTGAGGTCCCGTCGATCCCGCAGGCGCATGACTCATACCAATGGGTGTC GACTAGCGAAGACGCACCAGCCTCCCCTGTTGAAACTCAAGTATTCGAGGACCCGCGTGAGTTCCTGAGCCAT TTAGAGGAGTACCTTAAGCAGGTTGGTGGTACCGAGGAATACTGGTTGAGCCAGATTCAGAATCACATGAACG GGCCGGCTAAGAAATGGTGGGAATACAAGCAGGATTCAGTCAAGAATTGGGTCGAATTTAAGAAGGAGTTTTT GCAGTACAGTGAGGGGACGCTCACACGCGACGCTATCAAACGGGAGCTGGACCTGCCACAAAAGGAGGGTGAA CCGCTTGATCAGTTTCTTTGGCGCAAGCGTGATCTGTATCAAACCCTGTATGTGGACGCTGACGAAGAAGAGA TCATTCAGTACGTGGTTGGGACTCTGCAACCAAAGCTGAAGCGTTTTCTTTCTTATCCTCTCCCTAAGACACT GGAACAGTTAATCCAACGTGGCAAGGAGGTCCAGGGTAATATGGACCACTCTGAGGAACCGAGCCCGCAACGT ACTCCTGAAATTCAGAGCGGGGATAGTGTCGACTCAGTTCCTCCAAGTACGACCGCATCCCCGGTCCCAAGTA ACGGTACCCAACCAGAG SEQ ID NO: 36 GGGTCTTGGGGCTTGCAACGTCACGTGGCTGATGAACGTCGTGGCCTCGCTACGCCTACCTACGGCGCGGTTT GTTCCATTCGGGAGAAAAAAGCCTCCCAACTGAGCGGCCAGAGCTGTTTGGAGAAAGAGTTGCTTGGTTGGAA ATGTACGGAGGCAATCGTGGAAATGATGCAAGTCGATAACTTTAACCACGGTAACTTACATAGCTGCCAAGGC CATCGGGGGATGGCAAATCACAAACCGAACGTAATCCTTCAAATCGGGAAATGTCGCGCAGAAATGTTAGACC ACGTGCGTCGCACCCACCGCCATCTCTTGACGGAGGTTTCGAAGCAGGTAGAACGCGAATTGAAGTCTCTCCA AAAGTCGGTTGGCAAGCTCGAGAATAATCTGGAAGACCACGTGCCATCGGCAGCGGAGAACCAACGTTGGAAG AAATCAATTAAAGCCTGCCTGGCCCGGTGCCAAGAAACAATTGCTCACCTCGAACGCTGGGTTAAACGCGAAA TCAACGTCTGGAAAGAAGTATTCTTTCGTCTGGAGAAGTGGGCGGACCGCCTTGAGTCGGGTGGGGGCAAGTA TGGGCCTGGTGACCAAAGTCGTCAAACTGTAAGTGTCGGTGTTGGGGCCCCAGAAATCCAACCGCGGAAAGAA GAAATCTATGACTACGCTCTCGACATGTCGCAGATGTATGCCTTAACACCACCGCCGATGGGTGAAGACCCAA ACGTACCTCAATCCCACGATAGCTACCAGTGGATTACCATCTCAGACGATTCACCTCCGTCGCCAGTGGAAAC TCAAATTTTCGAGGATCCACGCGAATTCCTTACCCATCTCGAGGATTATCTTAAGCAAGTGGGCGGGACTGAA GAATATTGGTTGAGTCAGATTCAAAATCATATGAACGGTCCGGCCAAGAAATGGTGGGAGTACAAACAAGATT 
CCGTGAAAAACTGGTTGGAATTCAAGAAGGAATTCCTTCAATACTCTGAGGGTACTTTGACACGTGACGCAAT TAAACAAGAACTTGACTTACCGCAGAAGGACGGCGAGCCATTGGATCAATTTCTTTGGCGGAAGCGGGACCTG TATCAGACGCTCTATATTGATGCAGAGGAGGAAGAAGTAATCCAATACGTTGTTGGCACACTCCAACCGAAAT TAAAACGTTTCCTTTCCCACCCGTATCCGAAAACTTTGGAACAGTTAATCCAACGTGGGAAAGAGGTGGAAGG CAACCTCGATAACTCTGAGGAGCCTAGCCCGCAACGGAGTCCAAAGCACCAATTGGGTGGTAGCGTCGAGAGC CTCCCACCTTCGTCGACCGCAAGTCCTGTTGCGTCAGACGAGACTCACCCAGACGTGAGCGCACCTCCGGTAA CGGTGATT SEQ ID NO: 37 GGGGACGGCGAGACTCAAGCTGAGAATCCATCTACCAGCTTGAACAACACTGACGAAGATATCTTGGAACAGC TCAAGAAAATTGTCATGGATCAACAACACCTGTATCAGAAAGAATTAAAGGCATCTTTTGAACAACTCAGTCG CAAAATGTTTTCCCAGATGGAACAAATGAATAGCAAGCAAACGGATCTGCTTTTAGAACATCAAAAACAGACT GTCAAACATGTAGACAAGCGCGTGGAGTATTTGCGGGCGCAATTCGATGCATCGTTAGGCTGGCGGTTGAAAG AGCAACACGCGGATATTACGACCAAAATCATTCCTGAGATCATCCAAACGGTGAAGGAAGATATTAGCCTGTG TCTTTCTACGCTCTGCAGTATCGCTGAAGATATCCAGACATCACGGGCTACCACTGTCACAGGGCATGCTGCC GTACAAACCCATCCTGTGGATCTTTTGGGTGAACACCATTTAGGGACCACGGGGCACCCACGCTTACAGTCGA CCCGTGTAGGGAAACCAGACGACGTACCTGAGTCGCCGGTAAGCCTGTTTATGCAAGGTGAGGCGCGTTCCCG GATCGTTGGCAAGAGTCCGATTAAACTGCAATTTCCGACGTTCGGCAAAGCAAACGATTCTTCCGACCCACTC CAATATCTGGAGCGGTGTGAGGACTTTCTTGCTCTTAACCCTTTAACTGATGAGGAACTTATGGCTACTTTGC GGAATGTGTTACATGGCACCTCTCGGGATTGGTGGGATGTCGCACGTCATAAAATCCAAACTTGGCGTGAGTT TAATAAACACTTCCGGGCGGCTTTCCTCAGCGAGGATTATGAAGATGAGTTGGCTGAGCGCGTCCGTAACCGC ATCCAAAAAGAAGATGAGTCTATCCGCGATTTCGCTTATATGTATCAGTCCTTGTGCAAGCGGTGGAACCCTG CTATCTGCGAAGGTGATGTAGTAAAGCTCATCCTGAAGAACATCAATCCACAACTGCCGTCTCAGTTACGCTC CCGGGTCACGACCGTGGATGAGCTTGTTCGCTTGGGCCAGCAGCTTGAAAAAGATCGTCAGAATCAGCTCCAA TATGAGCTTCGGAAGAGTTCCGGCAAAATTATCCAAAAATCTAGTTCGTGCGAAACTTCAGCGCTCCCGAACA CGAAGAGTACACCTAATCAACAAAACCCTGCTACCAGTAACCGTCCTCCACAGGTGTATTGCTGGCGGTGTAA GGGTCACCATGCCCCTGCCTCTTGTCCGCAATGGAAAGCTGATAAGCACCGTGCGCAACCTTCGCGGAGTTCT GGGCCACAAACTCTGACTAATCTCCAAGCTCAAGACATC SEQ ID NO: 38 GGGGAATTGGATCAACGTGCGGCAGGGGGCTTGCGCGCGTACCCGGCGCCGCGTGGTGGTCCAGTTGCCAAAC 
CGAGCGTAATTCTTCAGATTGGTAAGTGCCGCGCTGAGATGCTGGAACACGTCCGCCGCACGCATCGCCATCT TCTGACGGAGGTAAGTAAACAAGTGGAGCGCGAACTCAAGGGGTTACATCGGTCTGTCGGTAAGTTGGAGGGC AATTTAGACGGCTATGTGCCTACCGGTGATTCCCAACGCTGGAAAAAAAGTATCAAGGCGTGTCTCTGCCGGT GTCAGGAAACAATTGCAAATCTCGAGCGTTGGGTGAAACGTGAGATGCATGTTTGGCGTGAGGTATTCTATCG TTTGGAACGGTGGGCAGACCGTTTGGAGTCTATGGGGGGCAAGTATCCGGTGGGCACTAACCCGTCGCGGCAC ACAGTAAGTGTCGGGGTAGGGGGCCCGGAAGGCTATTCTCATGAAGCGGATACTTATGACTACACGGTGTCTC CGTATGCTATCACGCCACCGCCTGCCGCGGGTGAGTTGCCTGGTCAAGAGGCTGTCGAGGCACAACAGTACCC TCCATGGGGTCTGGGGGAGGACGGGCAACCAGGTCCGGGCGTGGACACGCAGATTTTTGAGGACCCTCGCGAA TTTTTGAGCCACTTAGAGGAGTACCTGCGGCAAGTAGGGGGGAGTGAAGAGTACTGGTTATCGCAAATTCAAA ATCATATGAATGGCCCTGCGAAGAAATGGTGGGAGTTCAAACAGGGGTCAGTCAAGAATTGGGTCGAGTTTAA GAAAGAATTTTTGCAATACAGTGAGGGTACGTTGAGTCGCGAGGCCATCCAACGTGAACTGGACCTCCCTCAG AAGCAGGGGGAGCCGTTAGATCAATTTTTATGGCGGAAACGTGACTTATACCAAACCCTCTACGTTGACGCTG AGGAAGAAGAAATTATTCAATATGTTGTCGGTACGCTGCAGCCAAAGCTGAAGCGGTTCCTCCGTCCTCCACT CCCTAAAACCTTAGAACAATTAATCCAAAAAGGCATGGAAGTTCAGGACGGGTTAGAACAAGCGGCCGAACCG GCCTCTCCGCGTCTGCCGCCGGAAGAGGAGAGTGAGGCTCTTACGCCTGCGCTCACGAGCGAATCAGTAGCCT CCGATCGGACACAGCCAGAG SEQ ID NO: 39 GGGCAGCTTGACAATGTGACGAACGCGGGGATTCACAGCTTTCAAGGGCACCGCGGCGTCGCCAACAAACCGA ATGTCATTCTGCAAATCGGTAAATGTCGTGCTGAAATGCTTGAGCACGTTCGTCGTACCCATCGTCACTTGCT TTCTGAAGTATCAAAACAAGTGGAGCGGGAACTCAAAGGCCTGCAAAAGTCAGTGGGTAAATTGGAGAATAAC CTCGAAGACCATGTACCTACAGACAACCAGCGGTGGAAAAAATCTATCAAGGCATGCCTCGCTCGTTGCCAGG AGACTATTGCCCATCTTGAGCGGTGGGTGAAACGTGAAATGAACGTATGGAAGGAAGTATTTTTTCGCTTAGA GAAGTGGGCTGATCGTCTTGAATCGATGGGCGGCAAGTACTGTCCTGGGGAACACGGCAAACAAACTGTATCT GTCGGCGTGGGGGGCCCGGAGATCCGGCCATCGGAAGGGGAAATTTATGATTATGCTCTCGACATGTCCCAAA TGTATGCTCTCACACCAGGGCCAGGGGAAGTACCGTCAATTCCGCAAGCACACGACAGCTACCAATGGGTATC TGTGAGCGAGGACGCGCCTGCCTCTCCGGTTGAGACGCAAATCTTTGAGGACCCACATGAATTTTTGTCTCAT CTTGAAGAATATCTCAAACAGGTTGGCGGCACAGAAGAATACTGGTTATCTCAGATCCAGAATCACATGAACG GCCCGGCTAAAAAGTGGTGGGAGTATAAGCAAGATTCCGTAAAGAACTGGGTCGAATTCAAGAAAGAGTTTCT 
TCAATACTCTGAGGGTACTCTGACGCGCGATGCAATTAAGCGGGAGTTAGACCTTCCACAAAAAGAGGGGGAG CCTCTTGACCAGTTCCTGTGGCGTAAGCGCGACCTCTATCAGACACTTTACGTCGACGCTGATGAAGAAGAGA TTATTCAATATGTTGTGGGTACCCTGCAGCCAAAGCTTAAGCGTTTCCTTAGCTACCCACTTCCGAAAACTCT GGAGCAGCTCATTCAACGCGGTAAGGAAGTGCAGGGCAACATGGACCACTCTGAAGAGCCTAGCCCGCAGCGC ACTCCTGAAATCCAATCAGGTGACAGTGTGGAGTCAATGCCGCCGTCAACCACCGCTTCTCCGGTACCTAGCA ACGGGACGCAACCAGAGCCTCCAAGCCCACCGGCTACAGTCATC SEQ ID NO: 40 GGGCAACTTGAGAATATTAACCAAGGTTCCCTGCACGCGTTTCAGGGTCATCGCGGCGTGGTCCATAACAACA AGCCTAACGTTATTCTCCAGATCGGGAAGTGCCGCGCCGAAATGCTGGAGCATGTGCGGCGCACCCATCGCCA TTTGCTCACTGAAGTATCAAAACAGGTGGAGCGTGAGTTGAAGGGGTTGCAGAAAAGTGTAGGCAAACTTGAA AATAATTTAGAAGACCACGTACCAAGTGCGGCTGAGAACCAACGCTGGAAGAAGTCGATTAAAGCCTGCTTAG CGCGTTGTCAGGAGACCATTGCGAACTTGGAACGCTGGGTTAAACGTGAGATGAATGTTTGGAAGGAGGTCTT TTTCCGCTTAGAGCGCTGGGCAGATCGCCTCGAATCCGGGGGTGGCAAGTACTGCCATGCAGACCAGGGTCGC CAAACTGTCAGCGTAGGTGTTGGTGGTCCTGAAGTGCGTCCGTCTGAAGGTGAAATTTACGATTACGCGTTGG ATATGAGCCAAATGTACGCCTTGACTCCGCCGCCTATGGGTGATGTTCCAGTAATTCCTCAGCCGCATGACAG TTATCAGTGGGTGACAGATCCGGAAGAAGCGCCACCAAGTCCGGTTGAGACACAAATTTTCGAGGACCCTCGG GAGTTTCTGACCCATCTTGAGGATTATTTAAAACAAGTCGGCGGGACAGAGGAATATTGGCTCTCACAGATCC AAAATCATATGAATGGGCCAGCGAAAAAGTGGTGGGAATATAAACAGGATAGTGTGAAGAACTGGCTTGAGTT CAAAAAAGAATTCTTGCAGTACTCAGAAGGCACGTTAACGCGGGACGCTATTAAACAGGAACTTGACCTTCCA CAAAAAGAAGGGGAACCGCTGGATCAATTCCTCTGGCGCAAACGCGATTTGTACCAAACTCTCTACGTCGAGG CAGAAGAAGAGGAGGTCATCCAATATGTAGTTGGCACACTGCAACCAAAACTGAAGCGGTTTCTTTCTCATCC GTACCCTAAAACCCTGGAGCAACTCATCCAGCGCGGGAAGGAAGTTGAGGGGAATTTGGACAATAGTGAAGAA CCGTCTCCACAGCGGACCCCAGAACATCAGCTGGGGGACAGTGTGGAATCTTTGCCGCCTAGTACTACGGCTT CGCCTGCCGGTTCGGATAAAACGCAACCTGAGATTAGCTTACCTCCAACTACAGTCATT SEQ ID NO: 41 GGGCAATTAGATTCGGTAACCAATGCGGGCGTCCACACCTACCAGGGCCATCGGAGCGTCGCCAATAAACCTA ACGTCATTCTTCAAATCGGGAAATGTCGGACTGAGATGCTGGAGCATGTCCGTCGGACTCATCGCCACCTGCT CACAGAAGTGTCAAAGCAAGTGGAACGTGAACTCAAGGGCTTACAGAAGAGCGTGGGCAAACTGGAAAACAAT CTTGAAGACCATGTCCCAACTGACAATCAGCGGTGGAAGAAGTCAATCAAGGCATGTCTCGCGCGTTGCCAAG 
AGACCATTGCTCACCTTGAGCGGTGGGTGAAACGTGAAATGAACGTGTGGAAGGAGGTGTTCTTCCGGTTAGA ACGCTGGGCCGACCGCCTTGAATCAATGGGTGGTAAATACTGCCCGACGGACTCTGCACGTCAGACAGTTAGC GTTGGGGTGGGGGGCCCGGAAATTCGGCCTAGTGAAGGCGAAATCTATGACTACGCGCTCGATATGAGCCAAA TGTACGCTCTTACGCCGTCACCGGGCGAATTGCCGTCCGTCCCTCAACCGCATGATTCATACCAGTGGGTCAC TAGTCCGGAAGACGCTCCGGCGTCACCAGTTGAAACGCAGGTATTCGAGGATCCTCGGGAGTTCTTGTGTCAT TTGGAAGAGTACCTGAAGCAGGTTGGCGGTACAGAGGAATATTGGCTGAGCCAGATTCAGAATCATATGAATG GTCCTGCAAAAAAGTGGTGGGAATATAAACAAGACACGGTTAAGAATTGGGTGGAATTCAAGAAGGAGTTCTT ACAATACAGTGAGGGTACACTTACCCGTGATGCGATTAAGCGGGAATTAGACCTCCCGCAAAAGGACGGTGAG CCTCTGGATCAATTTTTATGGCGTAAGCGTGACCTCTATCAGACATTATACATTGATGCCGATGAAGAACAGA TCATTCAGTACGTCGTGGGGACATTGCAACCTAAACTCAAGCGGTTCTTGTCCTATCCACTTCCAAAAACTCT TGAACAATTAATCCAGAAAGGGAAGGAGGTGCAGGGTTCACTTGACCACAGCGAGGAGCCGAGTCCTCAACGT GCGAGCGAGGCTCGGACGGGCGATAGTGTGGAAACCTTGCCGCCTTCTACCACTACATCACCAAATACGTCAT CTGGTACACAGCCAGAGGCACCATCGCCTCCAGCGACGGTAATC SEQ ID NO: 42 GGGCAGTTAGACAGTGTGACTAACGCCGGGGTGCATACGTACCAGGGGCACCGCGGGGTCGCCAATAAGCCAA ATGTAATTCTCCAGATTGGGAAGTGTCGTACAGAGATGTTGGAACATGTCCGTCGCACTCATCGCCACTTGCT CACCGAGGTCTCCAAACAAGTAGAACGCGAACTCAAGGGGCTCCAGAAGAGTGTTGGGAAGTTGGAGAATAAC CTCGAAGACCACGTTCCGACAGATAACCAACGGTGGAAAAAGTCTATTAAAGCCTGTCTCGCCCGTTGTCAAG AGACAATCGCACACTTGGAACGCTGGGTCAAACGGGAGATGAATGTGTGGAAGGAAGTCTTCTTCCGTCTCGA GCGGTGGGCGGATCGTTTAGAAAGTATGGGCGGTAAATATTGCCCAACTGACTCGGCTCGTCAAACGGTGTCG GTTGGCGTAGGCGGCCCGGAAATTCGCCCTAGCGAGGGTGAGATCTATGACTATGCACTTGACATGAGTCAGA TGTATGCGTTAACTCCGTCGCCAGGGGAGCTTCCAAGTATTCCACAGCCTCACGATAGTTATCAATGGGTAAC TTCTCCTGAAGACGCCCCAGCATCCCCAGTTGAGACACAAGTATTCGAGGACCCTCGTGAGTTTCTCTGTCAC CTCGAGGAGTACCTTAAACAGGTAGGCGGGACCGAAGAGTACTGGTTATCGCAAATCCAAAACCATATGAATG GTCCTGCCAAAAAGTGGTGGGAGTATAAACAAGATACTGTGAAGAATTGGGTAGAGTTCAAGAAAGAGTTCTT ACAGTACTCTGAGGGGACGTTAACTCGTGATGCGATCAAGCGCGAATTGGATTTACCTCAGAAGGACGGCGAG CCACTCGACCAGTTCTTATGGCGCAAGCGTGACTTGTATCAAACCCTTTATATCGATGCTGACGAGGAACAAA TTATCCAGTACGTAGTCGGTACGTTGCAACCAAAACTTAAACGCTTTCTGAGCTACCCATTACCTAAAACGTT 
GGAGCAACTGATCCAGAAAGGTAAAGAGGTGCAAGGGAGCCTGGATCATAGTGAAGAACCGAGCCCTCAGCGG GCTTCTGAAGCTCGGACCGGTGATAGCGTCGAATCTTTACCACCTAGTACCACAACCAGCCCGAATGCGTCAT CTGGTACCCAACCTGAAGCGCCTTCCCCACCTGCTACAGTCATT SEQ ID NO: 43 GGGCAGCTCGAGAATGTCAACCATGGGAACCTCCATTCTTTTCAAGGTCATCGCGGCGGCGTCGCCAACAAGC CAAACGTTATCTTGCAGATCGGTAAATGTCGTGCAGAGATGCTGGACCACGTCCGGCGGACCCACCGGCATTT ACTGACAGAGGTATCGAAACAGGTTGAACGTGAGTTGAAGGGGTTACAGAAATCAGTAGGGAAATTAGAAAAT AACTTAGAAGACCATGTCCCTTCAGCCGTTGAAAACCAGCGTTGGAAAAAATCGATCAAGGCCTGCCTTTCCC GCTGCCAAGAGACCATTGCCCACCTTGAGCGTTGGGTGAAGCGCGAGATGAACGTATGGAAAGAGGTTTTCTT
CCGCTTAGAGCGGTGGGCAGATCGGTTGGAATCTGGGGGCGGGAAATATTGTCACGGTGATAATCATCGTCAA ACAGTATCAGTCGGTGTTGGCGGCCCTGAGGTACGTCCATCTGAAGGCGAAATTTACGATTACGCTCTCGACA TGTCGCAAATGTACGCTTTAACACCGCCTAGCCCAGGGGATGTGCCTGTAGTTAGCCAGCCGCACGACAGCTA TCAGTGGGTTACGGTTCCGGAGGATACCCCTCCATCCCCGGTGGAGACGCAAATCTTCGAGGACCCACGGGAG TTCTTGACCCACTTAGAGGATTACTTAAAGCAAGTGGGGGGTACAGAGGAATATTGGTTATCTCAGATCCAGA ATCACATGAACGGGCCAGCCAAGAAGTGGTGGGAGTATAAGCAAGACTCAGTAAAAAATTGGCTCGAGTTTAA GAAGGAATTCCTTCAGTATTCCGAGGGGACACTTACGCGCGACGCTATCAAGGAAGAACTTGACCTCCCGCAA AAGGACGGGGAACCTCTTGATCAGTTCCTGTGGCGCAAGCGCGACTTGTACCAGACCCTGTACGTGGAGGCGG ATGAGGAGGAGGTGATCCAGTATGTTGTGGGGACTTTACAACCTAAATTAAAGCGTTTTCTCTCACACCCTTA CCCGAAAACGTTAGAGCAACTTATCCAACGGGGCAAAGAGGTGGAAGGGAACCTCGACAATTCAGAGGAACCA ACACCTCAGCGTACTCCAGAACACCAACTGTGTGGTTCTGTAGAATCGCTGCCTCCTTCCTCTACCGTCAGTC CAGTGGCTAGCGATGGTACTCAACCTGAGACTTCGCCATTGCCAGCGACTGTTATT SEQ ID NO: 44 GGGCCATTGACGTTGTTACAAGACTGGTGTCGTGGTGAACATTTAAACACCCGCCGGTGCATGTTGATCCTCG GTATCCCAGAAGATTGCGGCGAGGATGAGTTCGAAGAGACACTTCAGGAGGCGTGTCGCCATTTAGGGCGGTA CCGCGTGATCGGCCGCATGTTCCGTCGTGAGGAAAATGCCCAAGCGATCCTCTTGGAATTGGCGCAGGATATT GACTATGCCTTACTCCCTCGGGAAATCCCTGGGAAAGGCGGGCCTTGGGAGGTAATTGTGAAGCCGCGTAATT CCGACGGCGAATTCTTAAATCGGCTTAATCGCTTTCTTGAAGAGGAGCGCCGTACGGTCTCCGATATGAACCG TGTTTTGGGCTCGGATACTAACTGTTCAGCTCCTCGTGTCACCATTAGTCCTGAATTCTGGACTTGGGCACAG ACGCTGGGCGCAGCTGTCCAACCATTGCTCGAACAGATGCTCTACCGGGAGTTACGGGTCTTCAGTGGCAATA CGATTTCCATCCCAGGTGCTCTCGCTTTTGACGCGTGGCTGGAGCATACCACGGAAATGCTTCAAATGTGGCA GGTGCCTGAAGGGGAGAAACGGCGGCGCTTGATGGAGTGTTTGCGGGGGCCAGCCCTGCAAGTCGTTAGTGGG TTACGTGCATCGAATGCCAGTATCACTGTCGAAGAGTGTCTTGCTGCACTGCAGCAGGTATTCGGTCCAGTGG AAAGTCATAAGATTGCCCAAGTAAAGTTATGCAAAGCTTACCAGGAGGCTGGGGAAAAAGTAAGCAGCTTCGT TTTGCGTTTGGAGCCACTGCTTCAGCGTGCTGTAGAAAACAACGTGGTCAGTCGCCGCAATGTCAACCAAACA CGTCTTAAGCGTGTTCTGTCGGGCGCCACCCTTCCTGACAAGCTGCGTGATAAATTGAAGTTAATGAAACAGC GCCGTAAACCGCCGGGTTTCTTGGCGTTGGTTAAACTGTTACGTGAAGAGGAGGAGTGGGAGGCCACCTTAGG GCCAGACCGCGAGTCATTGGAGGGGTTAGAAGTGGCACCGCGCCCGCCAGCACGGATTACGGGTGTTGGCGCA 
GTACCTCTTCCGGCATCCGGGAATTCATTTGATGCCCGTCCTTCGCAAGGGTACCGGCGCCGTCGGGGTCGTG GTCAGCACCGTCGGGGCGGCGTTGCTCGTGCAGGCTCTCGTGGCTCTCGTAAGCGGAAACGGCACACCTTCTG CTATTCCTGTGGTGAGGATGGCCATATTCGTGTCCAATGCATTAACCCTAGCAATCTCCTGTTGGCTAAGGAG ACCAAAGAGATTTTGGAAGGGGGAGAACGTGAAGCGCAAACGAATTCACGT SEQ ID NO: 45 GGGGCTCTTACGCTCTTAGAAGACTGGTGTAAGGGTATGGACATGGACCCGCGGAAGGCTCTCCTGATTGTAG GTATTCCGATGGAATGCAGTGAGGTGGAAATCCAGGATACAGTTAAAGCTGGTCTTCAACCTCTGTGCGCTTA TCGTGTACTCGGCCGTATGTTCCGGCGGGAGGATAATGCGAAGGCTGTTTTCATTGAGCTGGCAGACACCGTG AATTACACCACGTTACCGTCTCACATTCCGGGTAAAGGGGGTTCCTGGGAAGTCGTTGTTAAACCTCGGAACC CTGACGACGAGTTCCTTTCTCGGCTTAACTACTTCTTGAAAGATGAGGGCCGCTCGATGACGGATGTCGCCCG GGCACTGGGGTGCTGTAGCTTACCTGCGGAATCACTGGACGCGGAAGTAATGCCACAGGTCCGCTCCCCACCA TTAGAACCTCCAAAAGAGAGTATGTGGTACCGTAAGTTAAAAGTGTTTAGTGGTACCGCGTCGCCTTCGCCGG GGGAGGAGACATTTGAGGACTGGTTAGAGCAAGTCACCGAGATCATGCCTATCTGGCAAGTATCTGAAGTTGA AAAGCGCCGTCGGTTACTGGAGTCACTCCGGGGCCCGGCACTCTCAATTATGCGCGTGTTACAAGCCAATAAC GATAGCATTACCGTTGAACAGTGTTTGGATGCATTAAAGCAGATCTTTGGCGACAAGGAAGACTTCCGTGCCT CTCAATTTCGTTTTCTTCAAACGTCCCCTAAAATTGGGGAGAAGGTGAGTACGTTCCTGCTGCGTTTAGAGCC ACTCTTGCAAAAGGCCGTTCACAAGAGCCCACTTTCGGTACGTAGTACTGATATGATTCGGTTAAAGCACCTG TTGGCACGCGTAGCCATGACCCCGGCACTGCGTGGTAAACTCGAATTACTCGACCAACGCGGGTGCCCACCTA ATTTTCTTGAGCTGATGAAGCTGATCCGGGATGAGGAAGAGTGGGAGAATACTGAAGCTGTGATGAAAAATAA AGAGAAACCTTCAGGTCGTGGCCGCGGTGCATCAGGCCGTCAAGCTCGCGCCGAGGCCAGTGTAAGTGCTCCG CAAGCAACAGTCCAAGCACGTAGCTTCTCTGATTCTAGCCCGCAGACGATTCAGGGGGGCTTACCACCTCTTG TCAAGCGTCGGCGCCTTTTGGGTTCGGAGAGCACACGTGGGGAAGACCACGGGCAAGCTACTTATCCGAAAGC AGAGAATCAGACTCCAGGGCGTGAGGGCCCGCAGGCGGCTGGGGAGGAACTTGGTAATGAGGCCGGGGCCGGC GCGATGTCCCACCCGAAACCGTGGGAAACC SEQ ID NO: 46 GGGGCTGTGACAATGCTCCAGGACTGGTGCCGTTGGATGGGCGTGAACGCTCGGCGGGGGCTGTTAATCTTAG GTATCCCTGAAGACTGTGACGATGCAGAGTTCCAAGAGTCGTTAGAAGCTGCACTCCGTCCTATGGGTCACTT TACTGTACTCGGTAAGGCCTTCCGCGAGGAAGACAACGCTACCGCTGCGCTGGTGGAATTAGATCGCGAGGTT AATTACGCACTTGTTCCACGCGAAATTCCGGGCACCGGCGGGCCTTGGAACGTCGTGTTCGTTCCTCGGTGCT 
CCGGCGAGGAATTCCTGGGGTTAGGCCGCGTGTTCCACTTTCCTGAACAGGAGGGCCAAATGGTAGAATCGGT TGCGGGGGCACTGGGGGTAGGTCTGCGCCGCGTGTGTTGGTTACGCTCGATCGGGCAAGCTGTACAACCATGG GTAGAAGCTGTTCGCTGCCAAAGCTTAGGGGTATTTAGTGGTCGTGATCAACCTGCACCTGGTGAAGAAAGCT TCGAGGTCTGGTTGGATCATACGACCGAGATGTTGCATGTGTGGCAAGGCGTGTCGGAACGGGAACGGCGCCG TCGTCTGCTGGAAGGGCTGCGTGGCACAGCCTTACAACTTGTACATGCCTTACTGGCAGAAAATCCGGCACGG ACAGCACAAGATTGCTTGGCTGCATTAGCCCAAGTTTTTGGTGATAACGAAAGCCAGGCAACGATTCGTGTTA AATGTTTGACAGCCCAACAGCAGAGTGGCGAACGCCTCTCTGCGTTCGTTCTCCGCTTAGAAGTACTTCTGCA AAAGGCTATGGAGAAGGAAGCATTGGCGCGCGCGTCAGCGGATCGGGTGCGTCTTCGTCAGATGCTGACACGC GCACATCTCACAGAGCCGTTGGATGAAGCCTTACGGAAATTGCGTATGGCAGGGCGTTCTCCGTCTTTTTTGG AAATGCTCGGCTTAGTACGCGAGTCAGAGGCCTGGGAGGCAAGTCTGGCTCGGTCCGTCCGGGCGCAAACCCA GGAGGGTGCAGGGGCCCGGGCGGGGGCCCAAGCAGTTGCGCGTGCCAGCACTAAGGTTGAAGCTGTACCTGGT GGCCCTGGCCGGGAGCCAGAAGGTCTCCTCCAAGCCGGGGGCCAAGAAGCGGAAGAACTTCTCCAAGAGGGCT TAAAGCCGGTTTTAGAGGAATGTGACAAT SEQ ID NO: 47 GGGGCGGTCACCATGTTGCAAGACTGGTGTCGGTGGATGGGCGTGAATGCTCGGCGGGGTTTATTGATCTTGG GTATCCCAGAAGACTGTGACGACGCCGAGTTTCAGGAGTCGCTCGAGGCCGCCCTTCGTCCAATGGGGCATTT TACGGTTCTGGGCAAGGTGTTCCGTGAAGAGGATAACGCTACAGCAGCTCTTGTGGAGCTTGACCGTGAGGTG AATTATGCGTTAGTACCTCGCGAGATTCCAGGTACCGGTGGGCCATGGAACGTAGTCTTCGTCCCACGTTGCT CGGGGGAGGAATTTCTGGGGCTTGGGCGCGTATTCCACTTTCCAGAACAGGAAGGGCAGATGGTCGAAAGCGT AGCAGGCGCTCTTGGCGTTGGTCTCCGGCGCGTGTGCTGGTTACGCTCCATCGGCCAAGCAGTCCAACCATGG GTTGAAGCCGTACGCTATCAATCTTTAGGTGTCTTCTCAGGCCGTGACCAGCCGGCGCCTGGTGAGGAATCCT TCGAAGTCTGGCTCGATCATACAACTGAGATGCTGCATGTATGGCAAGGTGTCTCAGAGCGGGAACGGCGGCG GCGGTTATTAGAGGGGCTCCGTGGGACTGCGCTCCAATTAGTACATGCGCTTTTGGCCGAAAATCCAGCCCGT ACTGCCCAAGATTGTCTGGCAGCACTCGCCCAAGTATTCGGCGACAACGAATCGCAGGCAACAATCCGCGTAA AGTGTCTTACAGCACAGCAGCAGTCAGGGGAACGTCTTAGTGCGTTCGTTCTGCGGCTGGAAGTGTTACTCCA GAAAGCCATGGAAAAGGAGGCATTGGCTCGCGCGAGCGCTGACCGTGTACGTCTGCGGCAAATGCTTACTCGC GCACATCTCACCGAGCCTCTCGATGAAGCACTGCGGAAACTGCGCATGGCAGGCCGCAGCCCGTCTTTCCTGG AAATGTTAGGCTTAGTCCGGGAGTCCGAAGCCTGGGAGGCCAGTCTGGCACGGTCAGTGCGGGCACAAACGCA 
AGAGGGTGCAGGGGCACGGGCGGGTGCACAAGCAGTTGCACGTGCCTCCACTAAAGTTGAGGCAGTGCCGGGT GGGCCAGGCCGTGAACCGGAGGGTTTGCGCCAAGCCGGCGGGCAGGAAGCCGAAGAATTACTCCAAGAAGGTT TAAAACCGGTTTTGGAGGAATGCGATAAC SEQ ID NO: 48 GGGGTGGAAGATTTGGCGGCATCTTACATCGTATTAAAGCTTGAGAACGAAATCCGGCAGGCGCAGGTCCAAT GGTTAATGGAGGAAAACGCCGCCCTGCAGGCCCAGATCCCTGAACTTCAAAAGTCGCAAGCCGCGAAGGAGTA TGATCTTCTGCGTAAATCTTCGGAGGCGAAGGAGCCGCAAAAACTGCCAGAACATATGAATCCACCGGCCGCT TGGGAAGCACAAAAGACTCCAGAGTTTAAGGAACCACAGAAACCTCCTGAACCACAGGATTTGCTTCCTTGGG AGCCGCCTGCTGCCTGGGAGTTGCAAGAAGCACCGGCTGCCCCTGAGTCACTGGCTCCGCCTGCAACCCGTGA GTCTCAGAAACCACCTATGGCGCATGAAATCCCTACTGTATTGGAGGGGCAAGGGCCTGCCAACACACAAGAC GCTACGATTGCTCAAGAACCAAAGAATAGCGAGCCGCAAGACCCTCCAAATATCGAGAAACCTCAGGAAGCTC CGGAATATCAAGAAACAGCGGCACAGTTGGAGTTTTTAGAACTTCCTCCACCTCAGGAGCCACTCGAACCGAG CAATGCGCAAGAATTTCTCGAGTTGTCGGCTGCCCAGGAGTCCTTAGAAGGCCTCATTGTAGTTGAAACGTCC GCGGCTTCGGAGTTCCCACAGGCTCCTATCGGGCTTGAAGCCACCGACTTTCCGCTGCAGTACACGCTTACCT TCTCTGGCGACAGCCAGAAGTTGCCAGAATTTTTGGTCCAACTCTACAGTTATATGCGGGTACGTGGGCACTT ATACCCTACCGAGGCGGCGTTAGTGTCGTTTGTAGGCAATTGTTTCTCAGGGCGCGCGGGCTGGTGGTTTCAG TTGCTTTTGGATATCCAGTCGCCTCTGTTAGAACAGTGTGAAAGTTTTATCCCGGTTCTCCAAGACACATTTG ACAATCCGGAAAACATGAAGGACGCAAACCAATGCATCCACCAGCTTTGTCAGGGCGAGGGTCATGTGGCCAC ACACTTCCACCTCATTGCACAAGAGCTTAATTGGGATGAAAGCACGCTGTGGATCCAGTTCCAGGAAGGCCTG GCCTCATCCATCCAGGATGAACTTTCCCATACATCGCCTGCTACCAACCTGAGTGATCTGATTACTCAATGCA TCTCATTAGAGGAAAAGCCTGACCCAAACCCGTTAGGGAAGTCCTCCTCGGCGGAGGGGGATGGCCCGGAAAG TCCGCCAGCAGAAAACCAACCTATGCAAGCTGCGATCAATTGTCCTCACATTTCCGAAGCAGAGTGGGTTCGT TGGCACAAAGGCCGGCTTTGTCTCTATTGCGGCTATCCGGGTCACTTCGCACGTGATTGCCCAGTGAAGCCAC ACCAGGCGTTACAGGCAGGGAACATTCAGGCTTGCCAA SEQ ID NO: 49 GGGGTGCAGCCGCAGACTAGCAAAGCTGAATCGCCGGCTCTCGCTGCCTCACCGAACGCACAAATGGATGACG TTATTGATACATTAACCTCCCTGCGTCTGACGAATTCGGCTCTGCGGCGGGAGGCTAGCACTCTTCGGGCCGA GAAAGCAAATTTAACTAATATGCTCGAGTCAGTGATGGCCGAGTTAACGCTGTTACGGACCCGTGCGCGGATT CCGGGGGCCCTGCAGATTACGCCACCAATTTCGTCTATTACTAGCAACGGTACTCGCCCGATGACGACTCCTC 
CAACTAGTTTACCTGAACCGTTTTCTGGCGATCCTGGCCGGTTAGCTGGTTTCCTTATGCAGATGGACCGTTT TATGATCTTTCAAGCTAGCCGGTTTCCAGGGGAGGCAGAGCGTGTTGCGTTCCTGGTGTCGCGCTTAACTGGC GAAGCAGAAAAATGGGCCATTCCTCACATGCAACCAGACTCTCCTTTGCGTAACAACTATCAAGGCTTCTTAG CAGAGTTACGGCGGACCTATAAGAGCCCGTTGCGTCACGCCCGGCGGGCGCAAATCCGGAAGACATCGGCCTC GAACCGGGCAGTCCGTGAACGCCAAATGCTTTGCCGGCAACTTGCATCAGCAGGTACAGGCCCATGCCCGGTA CACCCTGCTAGTAACGGGACTTCCCCGGCACCGGCATTACCAGCACGGGCGCGTAACTTA SEQ ID NO: 50 GGGGACGGTCGGGTACAGTTGATGAAGGCTTTATTGGCTGGCCCTTTACGTCCGGCGGCACGCCGTTGGCGGA ATCCTATTCCATTTCCAGAGACTTTTGATGGGGATACTGATCGCCTCCCGGAGTTTATCGTCCAAACTTCGTC CTACATGTTCGTTGACGAAAATACTTTCTCTAACGACGCTCTGAAAGTGACATTTCTCATTACCCGGCTGACA GGTCCAGCCTTGCAATGGGTCATTCCGTACATTCGTAAAGAAAGCCCGCTTCTTAACGACTATCGGGGTTTCC TGGCCGAGATGAAGCGGGTTTTTGGGTGGGAAGAGGACGAGGACTTT SEQ ID NO: 51 GGGGAAGGTCGGGTGCAACTTATGAAAGCGTTGCTTGCCCGCCCGCTTCGTCCAGCAGCACGTCGCTGGCGGA ATCCAATTCCTTTCCCGGAGACTTTTGACGGGGACACCGATCGGCTCCCAGAGTTCATTGTGCAGACGTCAAG CTATATGTTCGTGGATGAGAACACGTTCTCTAACGACGCGTTGAAAGTGACTTTCTTAATTACGCGTTTGACT GGCCCGGCTTTACAATGGGTGATTCCATACATTAAGAAAGAGTCACCGCTTCTCAGTGATTATCGCGGTTTTT TAGCCGAGATGAAGCGGGTCTTCGGGTGGGAAGAAGACGAAGACTTT SEQ ID NO: 52 GGGCCGCGTGGGCGTTGCCGTCAACAAGGTCCTCGGATTCCGATTTGGGCAGCGGCCAACTATGCCAACGCCC ACCCGTGGCAACAAATGGATAAGGCTTCGCCAGGCGTTGCTTACACACCTTTGGTTGATCCTTGGATTGAGCG GCCTTGTTGCGGTGACACGGTTTGTGTGCGCACCACAATGGAACAGAAGAGCACAGCGTCAGGCACTTGTGGT GGTAAGCCTGCTGAGCGTGGTCCTCTCGCGGGGCATATGCCGAGCTCACGCCCACATCGGGTTGATTTCTGTT GGGTTCCTGGTAGCGACCCAGGCACATTCGACGGCAGTCCATGGCTCTTAGATCGCTTTTTGGCGCAACTTGG TGATTACATGAGTTTTCACTTTGAACACTACCAGGACAATATCAGCCGTGTCTGCGAGATTCTTCGTCGGTTA ACGGGCCGCGCTCAGGCATGGGCTGCTCCTTACCTGGACGGGGACCTTCCACTGCCAGACGACTACGAATTGT TTTGTCAAGACCTTAAGGAGGTAGTACAGGACCCTAACAGTTTCGCCGAGTATCACGCCGTGGTGACTTGTCC ACTCCCTCTTGCTTCGTCCCAACTTCCTGTAGCTCCTCAGCTTCCGGTGGTACGCCAATACCTTGCGCGCTTC TTGGAGGGCCTTGCTTTGGATATGGGTACGGCGCCTCGGTCACTCCCGGCCGCTATGGCCACACCGGCAGTCT CCGGCTCGAACTCCGTTTCTCGTTCTGCCTTATTTGAACAACAACTCACAAAGGAATCCACTCCAGGCCCGAA 
AGAGCCACCTGTTCTCCCTAGCTCGACTTGCTCTAGCAAACCGGGTCCTGTCGAACCAGCCAGTTCACAACCT GAAGAGGCTGCTCCTACCCCGGTGCCGCGTTTGTCAGAGTCGGCTAACCCACCGGCTCAGCGTCCAGACCCTG CTCACCCTGGTGGTCCTAAACCACAAAAAACCGAAGAGGAAGTTTTAGAAACTGAGGGGGACCAGGAAGTTAG CCTGGGGACGCCGCAGGAGGTCGTAGAAGCGCCGGAAACACCAGGTGAACCACCGCTCAGCCCTGGGTTC SEQ ID NO: 53 GGGGTTGATGAATTGGTGCTCTTGTTGCACGCGCTGTTAATGCGCCATCGGGCGCTTTCCATTGAAAATTCTC AGTTGATGGAGCAACTTCGCTTGTTGGTCTGCGAACGGGCGAGCCTTCTTCGTCAGGTACGTCCGCCGAGCTG TCCAGTGCCATTTCCTGAGACTTTTAACGGGGAGTCATCACGGTTACCTGAGTTCATCGTCCAAACCGCAAGC TATATGTTAGTTAATGAAAATCGCTTTTGCAATGACGCAATGAAAGTCGCTTTTTTGATTAGCCTTCTTACTG GTGAAGCAGAAGAATGGGTCGTCCCATACATTGAGATGGATTCACCAATTCTTGGGGACTACCGTGCGTTCTT GGATGAGATGAAGCAGTGTTTTGGGTGGGACGATGATGAAGATGACGACGATGAGGAAGAGGAGGATGACTAT SEQ ID NO: 54 GGGCCTGTGGATTTAGGTCAGGCTTTGGGGTTGTTGCCATCCCTCGCTAAGGCCGAAGATTCCCAATTTAGCG AAAGCGATGCAGCTTTACAGGAGGAATTGTCTTCTCCGGAAACCGCACGGCAACTTTTTCGTCAATTTCGCTA TCAAGTCATGTCGGGGCCTCATGAAACACTGAAACAGTTACGGAAGTTATGTTTTCAGTGGCTGCAACCTGAA GTCCATACAAAGGAACAAATCCTCGAAATTCTGATGCTGGAACAGTTCTTGACCATTCTGCCTGGTGAAATTC AGATGTGGGTCCGCAAGCAGTGCCCTGGTAGTGGGGAGGAGGCGGTTACGTTAGTAGAATCCCTGAAAGGTGA TCCACAACGGCTCTGGCAATGGATCTCCATCCAAGTCCTGGGTCAGGATATCCTGTCTGAGAAAATGGAGTCA CCTTCTTGCCAGGTGGGCGAAGTGGAGCCACACCTGGAAGTTGTACCTCAGGAACTGGGGTTAGAGAATTCAT CTTCAGGGCCGGGGGAACTTCTTTCGCACATCGTGAAAGAGGAGTCTGACACTGAAGCAGAGTTGGCGTTAGC GGCATCCCAGCCAGCTCGTTTGGAAGAACGGCTGATTCGGGATCAGGACCTTGGGGCGTCCCTCCTCCCGGCA GCACCGCAGGAGCAATGGCGTCAATTAGACAGCACTCAAAAAGAACAATATTGGGACCTGATGCTGGAGACCT ACGGCAAAATGGTATCCGGCGCGGGTATCTCACACCCGAAGTCCGATTTAACGAACTCAATTGAGTTCGGTGA AGAGTTGGCAGGTATTTATTTACATGTAAACGAAAAGATTCCGCGGCCTACCTGCATTGGTGACCGCCAAGAA AACGACAAAGAAAACCTTAATTTGGAAAACCATCGTGACCAGGAATTATTACATGCCAGCTGCCAGGCCTCGG GCGAAGTGCCATCCCAGGCATCGTTACGTGGCTTCTTTACCGAGGACGAACCTGGTTGCTTCGGCGAAGGGGA GAACCTTCCTGAGGCACTTCAGAATATCCAGGATGAGGGGACTGGCGAACAGCTGAGCCCGCAAGAACGCATT AGTGAAAAACAGTTGGGTCAACATTTGCCAAATCCGCACTCGGGGGAGATGTCGACGATGTGGCTTGAAGAAA 
AACGGGAGACCAGCCAGAAAGGCCAACCACGTGCACCAATGGCGCAGAAATTGCCAACGTGCCGCGAATGTGG CAAAACGTTTTATCGCAATAGTCAACTTATCTTTCACCAACGCACACACACCGGTGAGACATATTTTCAATGC ACCATCTGCAAAAAGGCGTTTCTCCGGTCATCTGATTTCGTGAAACATCAGCGGACTCATACTGGCGAAAAAC CTTGTAAATGTGACTATTGTGGCAAGGGCTTTAGTGATTTTAGCGGGCTTCGGCATCACGAGAAGATCCATAC CGGCGAGAAGCCATACAAGTGTCCAATCTGTGAGAAATCTTTCATCCAGCGCAGTAATTTTAACCGCCACCAA CGGGTTCACACCGGTGAAAAGCCTTATAAATGCTCGCATTGTGGCAAGAGCTTCAGCTGGAGCTCCTCGCTCG ATAAGCATCAACGTTCACATCTGGGGAAGAAGCCGTTCCAA SEQ ID NO: 55 GGGACTCTCCGCTTACTTGAGGATTGGTGTCGGGGGATGGACATGAACCCACGTAAGGCCCTTCTTATCGCCG GGATTTCCCAGTCATGTTCAGTCGCCGAGATTGAAGAGGCGCTCCAAGCCGGGCTTGCTCCTTTAGGCGAGTA TCGTCTCCTTGGGCGGATGTTTCGCCGCGATGAAAATCGCAAAGTAGCGTTGGTTGGTCTCACAGCTGAAACT AGCCATGCGCTTGTACCTAAAGAAATTCCTGGTAAAGGCGGGATCTGGCGGGTTATTTTTAAACCACCGGACC CGGACAATACGTTTCTTTCTCGTTTGAATGAGTTCCTCGCGGGCGAGGGGATGACGGTGGGGGAACTTAGTCG TGCTCTTGGTCACGAAAATGGGTCATTAGACCCTGAACAGGGTATGATTCCGGAAATGTGGGCGCCGATGCTG GCACAGGCTCTGGAGGCTCTCCAACCGGCTTTACAGTGCCTTAAGTACAAGAAGCTGCGCGTTTTTTCAGGGC GCGAGTCTCCAGAGCCGGGTGAGGAGGAATTCGGCCGTTGGATGTTCCATACCACCCAGATGATCAAAGCGTG GCAGGTGCCGGATGTCGAGAAACGCCGCCGGCTGTTGGAATCACTCCGCGGGCCGGCACTTGACGTTATTCGG GTTCTGAAAATTAACAACCCGTTAATTACGGTAGATGAATGTTTGCAAGCACTTGAAGAGGTCTTTGGGGTGA CTGACAATCCTCGGGAATTGCAAGTAAAATACTTAACGACCTACCATAAGGACGAGGAGAAATTATCAGCCTA CGTACTGCGGCTGGAACCGCTGCTGCAGAAGCTCGTCCAGCGGGGGGCTATTGAACGGGACGCTGTTAATCAG GCTCGCCTGGATCAGGTAATCGCTGGGGCGGTACATAAAACTATCCGCCGTGAGCTGAACCTGCCTGAAGACG GGCCGGCGCCAGGCTTTCTTCAACTCCTCGTTTTGATTAAGGATTACGAGGCAGCTGAAGAGGAGGAAGCATT ACTTCAGGCCATTCTTGAAGGGAACTTTACT SEQ ID NO: 56 GGGACAGAACGGCGTCGCGACGAATTAAGTGAAGAAATTAATAATCTTCGTGAAAAGGTTATGAAACAGAGTG AGGAAAACAACAATCTTCAATCCCAAGTCCAGAAACTCACTGAGGAGAATACTACACTCCGTGAGCAAGTTGA ACCTACACCTGAAGATGAAGATGACGACATTGAGTTGCGGGGCGCAGCAGCCGCAGCCGCGCCTCCGCCGCCG ATCGAGGAGGAATGCCCGGAGGATTTACCGGAAAAATTTGATGGTAATCCGGACATGTTAGCGCCATTCATGG CCCAGTGCCAAATTTTTATGGAAAAGTCTACGCGCGATTTTAGTGTAGATCGCGTACGTGTATGTTTTGTGAC 
GAGCATGATGACTGGTCGCGCAGCCCGTTGGGCGTCAGCGAAATTGGAGCGGTCGCACTACCTGATGCATAAT TACCCGGCGTTCATGATGGAGATGAAACACGTGTTTGAAGACCCGCAGCGGCGGGAGGTGGCCAAACGCAAGA TCCGGCGGTTGCGGCAGGGCATGGGCAGCGTAATTGATTATAGTAATGCGTTTCAAATGATTGCGCAGGATCT GGATTGGAATGAACCTGCTCTCATTGATCAATATCATGAAGGGCTTAGTGACCATATTCAAGAGGAACTCTCT CACCTGGAAGTGGCTAAATCTCTCTCCGCCCTTATTGGCCAATGCATTCATATTGAGCGCCGTCTTGCACGTG CTGCTGCCGCTCGGAAACCGCGTAGTCCACCACGGGCTTTAGTGCTCCCACATATCGCGTCACACCATCAAGT AGATCCTACTGAGCCAGTGGGGGGTGCACGCATGCGCTTAACCCAAGAAGAAAAGGAACGTCGTCGTAAGCTG AATTTATGCCTGTACTGCGGCACTGGTGGCCATTATGCCGATAACTGTCCTGCCAAAGCCAGTAAGTCAAGCC CGGCTGGGAAACTTCCAGGTCCTGCCGTCGAGGGCCCTTCTGCTACCGGCCCAGAGATTATCCGCTCCCCGCA AGACGATGCGTCGTCGCCTCATCTCCAGGTAATGCTCCAAATCCACCTCCCTGGCCGGCACACACTCTTTGTC CGGGCGATGATTGACTCTGGGGCGTCTGGTAATTTTATTGATCACGAGTATGTTGCTCAAAATGGTATCCCTC TCCGGATCAAAGACTGGCCTATTCTGGTTGAAGCCATCGATGGCCGTCCGATCGCGAGCGGTCCTGTGGTTCA TGAAACGCATGACCTCATCGTTGATCTGGGTGACCACCGTGAAGTATTATCCTTTGATGTGACTCAGTCACCG TTTTTTCCAGTTGTTTTGGGCGTCCGTTGGCTTTCGACTCACGATCCTAACATCACGTGGTCGACACGGTCGA TTGTCTTCGATTCGGAATATTGTCGTTATCATTGCCGCATGTATTCACCAATTCCGCCGTCTCTCCCGCCGCC TGCGCCGCAACCTCCTCTGTATTACCCGGTGGACGGTTACCGTGTTTACCAGCCAGTTCGCTACTACTACGTA CAAAACGTGTACACGCCTGTTGATGAACACGTGTACCCAGATCACCGCCTGGTCGACCCTCATATTGAGATGA TCCCGGGTGCGCACTCGATCCCATCGGGCCATGTTTATTCCTTGTCTGAGCCAGAAATGGCCGCCTTACGGGA TTTTGTGGCCCGGAATGTCAAAGACGGCCTGATTACCCCGACAATTGCACCAAACGGTGCTCAGGTGTTGCAG GTGAAGCGGGGCTGGAAGTTGCAAGTCAGCTATGATTGTCGTGCGCCAAACAACTTCACTATTCAGAACCAAT ATCCACGTCTCAGCATCCCTAATCTCGAGGACCAGGCACATCTTGCAACATATACTGAATTTGTACCTCAGAT TCCTGGCTATCAGACTTATCCTACGTATGCTGCCTACCCAACATACCCGGTAGGTTTCGCATGGTACCCAGTA GGCCGGGACGGGCAGGGCCGCTCTTTATATGTTCCTGTCATGATTACATGGAACCCGCATTGGTACCGCCAGC CTCCGGTCCCACAGTACCCACCTCCTCAACCTCCACCACCTCCGCCGCCTCCTCCACCGCCACCTTCTTACTC GACATTA

SEQUENCE LISTING

851396PRTHomo sapiens 1Gly Glu Leu Asp His Arg Thr Ser Gly Gly Leu His Ala Tyr Pro Gly1 5 10 15Pro Arg Gly Gly Gln Val Ala Lys Pro Asn Val Ile Leu Gln Ile Gly 20 25 30Lys Cys Arg Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg His 35 40 45Leu Leu Ala Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu 50 55 60His Arg Ser Val Gly Lys Leu Glu Ser Asn Leu Asp Gly Tyr Val Pro65 70 75 80Thr Ser Asp Ser Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu Cys 85 90 95Arg Cys Gln Glu Thr Ile Ala Asn Leu Glu Arg Trp Val Lys Arg Glu 100 105 110Met His Val Trp Arg Glu Val Phe Tyr Arg Leu Glu Arg Trp Ala Asp 115 120 125Arg Leu Glu Ser Thr Gly Gly Lys Tyr Pro Val Gly Ser Glu Ser Ala 130 135 140Arg His Thr Val Ser Val Gly Val Gly Gly Pro Glu Ser Tyr Cys His145 150 155 160Glu Ala Asp Gly Tyr Asp Tyr Thr Val Ser Pro Tyr Ala Ile Thr Pro 165 170 175Pro Pro Ala Ala Gly Glu Leu Pro Gly Gln Glu Pro Ala Glu Ala Gln 180 185 190Gln Tyr Gln Pro Trp Val Pro Gly Glu Asp Gly Gln Pro Ser Pro Gly 195 200 205Val Asp Thr Gln Ile Phe Glu Asp Pro Arg Glu Phe Leu Ser His Leu 210 215 220Glu Glu Tyr Leu Arg Gln Val Gly Gly Ser Glu Glu Tyr Trp Leu Ser225 230 235 240Gln Ile Gln Asn His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Phe 245 250 255Lys Gln Gly Ser Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu 260 265 270Gln Tyr Ser Glu Gly Thr Leu Ser Arg Glu Ala Ile Gln Arg Glu Leu 275 280 285Asp Leu Pro Gln Lys Gln Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg 290 295 300Lys Arg Asp Leu Tyr Gln Thr Leu Tyr Val Asp Ala Asp Glu Glu Glu305 310 315 320Ile Ile Gln Tyr Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe 325 330 335Leu Arg His Pro Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Arg Gly 340 345 350Met Glu Val Gln Asp Asp Leu Glu Gln Ala Ala Glu Pro Ala Gly Pro 355 360 365His Leu Pro Val Glu Asp Glu Ala Glu Thr Leu Thr Pro Ala Pro Asn 370 375 380Ser Glu Ser Val Ala Ser Asp Arg Thr Gln Pro Glu385 390 3952396PRTOrcinus orca 2Gly Glu Leu Asp Gln Arg Thr Thr Gly Gly Leu His Ala Tyr Pro Ala1 5 10 15Pro Arg Gly Gly Pro Val Ala Lys 
Pro Asn Val Ile Leu Gln Ile Gly 20 25 30Lys Cys Arg Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg His 35 40 45Leu Leu Thr Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu 50 55 60His Arg Ser Val Gly Lys Leu Glu Ser Asn Leu Asp Gly Tyr Val Pro65 70 75 80Thr Gly Asp Ser Gln Arg Trp Arg Lys Ser Ile Lys Ala Cys Leu Cys 85 90 95Arg Cys Gln Glu Thr Ile Ala Asn Leu Glu Arg Trp Val Lys Arg Glu 100 105 110Met His Val Trp Arg Glu Val Phe Tyr Arg Leu Glu Arg Trp Ala Asp 115 120 125Arg Leu Glu Ser Met Gly Gly Lys Tyr Pro Val Gly Ser Asn Pro Ser 130 135 140Arg His Thr Thr Ser Val Gly Val Gly Gly Pro Glu Ser Tyr Gly His145 150 155 160Glu Ala Asp Thr Tyr Asp Tyr Thr Val Ser Pro Tyr Ala Ile Thr Pro 165 170 175Pro Pro Ala Ala Gly Glu Leu Pro Gly Gln Glu Ala Val Glu Ala Gln 180 185 190Gln Tyr Pro Pro Trp Gly Leu Gly Glu Asp Gly Gln Pro Ser Pro Gly 195 200 205Val Asp Thr Gln Ile Phe Glu Asp Pro Arg Glu Phe Leu Ser His Leu 210 215 220Glu Glu Tyr Leu Arg Gln Val Gly Gly Ser Glu Glu Tyr Trp Leu Ser225 230 235 240Gln Ile Gln Asn His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr 245 250 255Lys Gln Gly Ser Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu 260 265 270Gln Tyr Ser Glu Gly Ala Leu Ser Arg Glu Ala Val Gln Arg Glu Leu 275 280 285Asp Leu Pro Gln Lys Gln Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg 290 295 300Lys Arg Asp Leu Tyr Gln Thr Leu Tyr Val Asp Ala Asp Glu Glu Glu305 310 315 320Ile Ile Gln Tyr Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe 325 330 335Leu Arg Pro Pro Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Lys Gly 340 345 350Met Glu Val Glu Asp Gly Leu Glu Gln Val Ala Glu Pro Ala Ser Pro 355 360 365His Leu Pro Thr Glu Glu Glu Ser Glu Ala Leu Thr Pro Ala Leu Thr 370 375 380Ser Glu Ser Val Ala Ser Asp Arg Thr Gln Pro Glu385 390 3953390PRTOdocoileus virginianus texanus 3Gly Glu Leu Asp His Arg Thr Thr Gly Gly Leu His Ala Tyr Pro Ala1 5 10 15Pro Arg Gly Gly Pro Ala Ala Lys Pro Asn Val Ile Leu Gln Ile Gly 20 25 30Lys Cys Arg Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg His 35 40 
45Leu Leu Ala Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu 50 55 60His Arg Ser Val Gly Lys Leu Glu Ser Asn Leu Asp Gly Tyr Val Pro65 70 75 80Thr Gly Asp Ser Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu Ser 85 90 95Arg Cys Gln Glu Thr Ile Ala Asn Leu Glu Arg Trp Val Lys Arg Glu 100 105 110Met His Val Trp Arg Glu Val Phe Tyr Arg Leu Glu Arg Trp Ala Asp 115 120 125Arg Leu Glu Ser Gly Gly Gly Lys Tyr Pro Val Gly Ser Asp Pro Ala 130 135 140Arg His Thr Val Ser Val Gly Val Gly Gly Pro Glu Ser Tyr Cys Gln145 150 155 160Asp Ala Asp Asn Tyr Asp Tyr Thr Val Ser Pro Tyr Ala Ile Thr Pro 165 170 175Pro Pro Ala Ala Gly Gln Leu Pro Gly Gln Glu Glu Val Glu Ala Gln 180 185 190Gln Tyr Pro Pro Trp Ala Pro Gly Glu Asp Gly Gln Leu Ser Pro Gly 195 200 205Val Asp Thr Gln Val Phe Glu Asp Pro Arg Glu Phe Leu Arg His Leu 210 215 220Glu Asp Tyr Leu Arg Gln Val Gly Gly Ser Glu Glu Tyr Trp Leu Ser225 230 235 240Gln Ile Gln Asn His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr 245 250 255Lys Gln Gly Ser Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu 260 265 270Gln Tyr Ser Glu Gly Thr Leu Ser Arg Glu Ala Ile Gln Arg Glu Leu 275 280 285Asp Leu Pro Gln Lys Gln Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg 290 295 300Lys Arg Asp Leu Tyr Gln Thr Leu Tyr Val Asp Ala Glu Glu Glu Glu305 310 315 320Ile Ile Gln Tyr Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe 325 330 335Leu Arg Pro Pro Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Lys Gly 340 345 350Met Glu Val Gln Asp Gly Leu Glu Gln Ala Ala Glu Pro Ala Ala Glu 355 360 365Glu Ala Glu Ala Leu Thr Pro Ala Leu Thr Asn Glu Ser Val Ala Ser 370 375 380Asp Arg Thr Gln Pro Glu385 3904401PRTOrnithorhynchus anatinus 4Gly Glu Leu Asp Arg Leu Asn Pro Ser Ser Gly Leu His Pro Ser Ser1 5 10 15Gly Leu His Pro Tyr Pro Gly Leu Arg Gly Gly Ala Thr Ala Lys Pro 20 25 30Asn Val Ile Leu Gln Ile Gly Lys Cys Arg Ala Glu Met Leu Glu His 35 40 45Val Arg Lys Thr His Arg His Leu Leu Thr Glu Val Ser Arg Gln Val 50 55 60Glu Arg Glu Leu Lys Gly Leu His Lys Ser Val Gly Lys Leu Glu Ser65 70 
75 80Asn Leu Asp Gly Tyr Val Pro Ser Ser Asp Ser Gln Arg Trp Lys Lys 85 90 95Ser Ile Lys Ala Cys Leu Ser Arg Cys Gln Glu Thr Ile Ala His Leu 100 105 110Glu Arg Trp Val Lys Arg Glu Met Asn Val Trp Arg Glu Val Phe Tyr 115 120 125Arg Leu Glu Arg Trp Ala Asp Arg Leu Glu Ala Met Gly Gly Lys Tyr 130 135 140Pro Ala Gly Glu Gln Ala Arg Arg Thr Val Ser Val Gly Val Gly Gly145 150 155 160Pro Glu Thr Cys Cys Pro Gly Asp Glu Ser Tyr Asp Cys Pro Ile Ser 165 170 175Pro Tyr Ala Val Pro Pro Ser Thr Gly Glu Ser Pro Glu Ser Leu Asp 180 185 190Gln Gly Asp Gln His Tyr Gln Gln Trp Phe Ala Leu Pro Glu Glu Ser 195 200 205Pro Val Ser Pro Gly Val Asp Thr Gln Ile Phe Glu Asp Pro Arg Glu 210 215 220Phe Leu Arg His Leu Glu Lys Tyr Leu Lys Gln Val Gly Gly Thr Glu225 230 235 240Glu Asp Trp Leu Ser Gln Ile Gln Asn His Met Asn Gly Pro Ala Lys 245 250 255Lys Trp Trp Glu Tyr Lys Gln Gly Ser Val Lys Asn Trp Leu Glu Phe 260 265 270Lys Lys Glu Phe Leu Gln Tyr Ser Glu Gly Thr Leu Thr Arg Asp Ala 275 280 285Leu Lys Arg Glu Leu Asp Leu Pro Gln Lys Gln Gly Glu Pro Leu Asp 290 295 300Gln Phe Leu Trp Arg Lys Arg Asp Leu Tyr Gln Thr Leu Tyr Val Asp305 310 315 320Ala Asp Glu Glu Glu Ile Ile Gln Tyr Val Val Gly Thr Leu Gln Pro 325 330 335Lys Leu Lys Arg Phe Leu His His Pro Leu Pro Lys Thr Leu Glu Gln 340 345 350Leu Ile Gln Arg Gly Gln Glu Val Gln Asn Gly Leu Glu Pro Thr Asp 355 360 365Asp Pro Ala Gly Gln Arg Thr Gln Ser Glu Asp Asn Asp Glu Ser Leu 370 375 380Thr Pro Ala Val Thr Asn Glu Ser Thr Ala Ser Glu Gly Thr Leu Pro385 390 395 400Glu5404PRTAnser cygnoides domesticus 5Gly Gln Leu Asp Asn Val Thr Asn Ala Gly Ile His Ser Phe Gln Gly1 5 10 15His Arg Gly Val Ala Asn Lys Pro Asn Val Ile Leu Gln Ile Gly Lys 20 25 30Cys Arg Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg His Leu 35 40 45Leu Ser Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu Gln 50 55 60Lys Ser Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His Val Pro Thr65 70 75 80Asp Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu Ala Arg Cys 85 90 95Gln Glu Thr 
Ile Ala His Leu Glu Arg Trp Val Lys Arg Glu Met Asn 100 105 110Val Trp Lys Glu Val Phe Phe Arg Leu Glu Lys Trp Ala Asp Arg Leu 115 120 125Glu Ser Met Gly Gly Lys Tyr Cys Pro Gly Glu His Gly Lys Gln Thr 130 135 140Val Ser Val Gly Val Gly Gly Pro Glu Ile Arg Pro Ser Glu Gly Glu145 150 155 160Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr Ala Leu Thr Pro 165 170 175Pro Pro Gly Glu Met Pro Ser Ile Pro Gln Ala His Asp Ser Tyr Gln 180 185 190Trp Val Ser Val Ser Glu Asp Ala Pro Ala Ser Pro Val Glu Thr Gln 195 200 205Val Phe Glu Asp Pro Arg Glu Phe Leu Ser His Leu Glu Glu Tyr Leu 210 215 220Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp Leu Ser Gln Ile Gln Asn225 230 235 240His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr Lys Gln Asp Ser 245 250 255Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser Glu 260 265 270Gly Thr Leu Thr Arg Asp Ala Ile Lys Arg Glu Leu Asp Leu Pro Gln 275 280 285Lys Glu Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg Lys Arg Asp Leu 290 295 300Tyr Gln Thr Leu Tyr Val Asp Ala Asp Glu Glu Glu Ile Ile Gln Tyr305 310 315 320Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe Leu Ser Tyr Pro 325 330 335Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Arg Gly Lys Glu Val Gln 340 345 350Gly Asn Met Asp His Ser Asp Glu Pro Ser Pro Gln Arg Thr Pro Glu 355 360 365Ile Gln Ser Gly Asp Ser Val Glu Ser Met Pro Pro Ser Thr Thr Ala 370 375 380Ser Pro Val Pro Ser Asn Gly Thr Gln Pro Glu Pro Pro Ser Pro Pro385 390 395 400Ala Thr Val Ile6395PRTPelecanus crispus 6Gly Gln Leu Asp Asn Val Thr Asn Ala Gly Ile His Ser Phe Gln Gly1 5 10 15His Arg Gly Val Ala Asn Lys Pro Asn Val Ile Leu Gln Ile Gly Lys 20 25 30Cys Arg Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg His Leu 35 40 45Leu Ser Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu Gln 50 55 60Lys Ser Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His Val Pro Thr65 70 75 80Asp Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu Ala Arg Cys 85 90 95Gln Glu Thr Ile Ala His Leu Glu Arg Trp Val Lys Arg Glu Met Asn 100 105 110Val Trp Lys Glu Val Phe 
Phe Arg Leu Glu Lys Trp Ala Asp Arg Leu 115 120 125Glu Ser Met Gly Gly Lys Tyr Cys Pro Gly Glu His Gly Lys Gln Thr 130 135 140Val Ser Val Gly Val Gly Gly Pro Glu Ile Arg Pro Ser Glu Gly Glu145 150 155 160Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr Ala Leu Thr Pro 165 170 175Pro Pro Gly Glu Val Pro Ser Ile Pro Gln Ala His Asp Ser Tyr Gln 180 185 190Trp Val Ser Val Ser Glu Asp Ala Pro Ala Ser Pro Val Glu Thr Gln 195 200 205Val Phe Glu Asp Pro Arg Glu Phe Leu Ser His Leu Glu Glu Tyr Leu 210 215 220Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp Leu Ser Gln Ile Gln Asn225 230 235 240His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr Lys Gln Asp Ser 245 250 255Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser Glu 260 265 270Gly Thr Leu Thr Arg Asp Ala Ile Lys Arg Glu Leu Asp Leu Pro Gln 275 280 285Lys Glu Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg Lys Arg Asp Leu 290 295 300Tyr Gln Thr Leu Tyr Val Asp Ala Asp Glu Glu Glu Ile Ile Gln Tyr305 310 315 320Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe Leu Ser Tyr Pro 325 330 335Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Arg Gly Lys Glu Val Gln 340 345 350Gly Asn Met Asp His Ser Glu Glu Pro Ser Pro Gln Arg Thr Pro Glu 355 360 365Ile Gln Ser Gly Asp Ser Val Asp Ser Val Pro Pro Ser Thr Thr Ala 370 375 380Ser Pro Val Pro Ser Asn Gly Thr Gln Pro Glu385 390 3957395PRTHaliaeetus albicilla 7Gly Gln Leu Asp Asn Val Thr Asn Ala Gly Ile His Ser Phe Gln Gly1 5 10 15His Arg Gly Val Ala Asn Lys Pro Asn Val Ile Leu Gln Ile Gly Lys 20 25 30Cys Arg Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg His Leu 35 40 45Leu Ser Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu Gln 50 55 60Lys Ser Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His Val Pro Thr65
70 75 80Asp Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu Ala Arg Cys 85 90 95Gln Glu Thr Ile Ala His Leu Glu Arg Trp Val Lys Arg Glu Met Asn 100 105 110Val Trp Lys Glu Val Phe Phe Arg Leu Glu Lys Trp Ala Asp Arg Leu 115 120 125Glu Ser Met Gly Gly Lys Tyr Cys Pro Gly Asp His Gly Lys Gln Thr 130 135 140Val Ser Val Gly Val Gly Gly Pro Glu Ile Arg Pro Ser Glu Gly Glu145 150 155 160Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr Ala Leu Thr Pro 165 170 175Pro Pro Gly Glu Val Pro Ser Ile Pro Gln Ala His Asp Ser Tyr Gln 180 185 190Trp Val Ser Thr Ser Glu Asp Ala Pro Ala Ser Pro Val Glu Thr Gln 195 200 205Val Phe Glu Asp Pro Arg Glu Phe Leu Ser His Leu Glu Glu Tyr Leu 210 215 220Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp Leu Ser Gln Ile Gln Asn225 230 235 240His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr Lys Gln Asp Ser 245 250 255Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser Glu 260 265 270Gly Thr Leu Thr Arg Asp Ala Ile Lys Arg Glu Leu Asp Leu Pro Gln 275 280 285Lys Glu Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg Lys Arg Asp Leu 290 295 300Tyr Gln Thr Leu Tyr Val Asp Ala Asp Glu Glu Glu Ile Ile Gln Tyr305 310 315 320Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe Leu Ser Tyr Pro 325 330 335Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Arg Gly Lys Glu Val Gln 340 345 350Gly Asn Met Asp His Ser Glu Glu Pro Ser Pro Gln Arg Thr Pro Glu 355 360 365Ile Gln Ser Gly Asp Ser Val Asp Ser Val Pro Pro Ser Thr Thr Ala 370 375 380Ser Pro Val Pro Ser Asn Gly Thr Gln Pro Glu385 390 3958465PRTOphiophagus hannah 8Gly Ser Trp Gly Leu Gln Arg His Val Ala Asp Glu Arg Arg Gly Leu1 5 10 15Ala Thr Pro Thr Tyr Gly Ala Val Cys Ser Ile Arg Glu Lys Lys Ala 20 25 30Ser Gln Leu Ser Gly Gln Ser Cys Leu Glu Lys Glu Leu Leu Gly Trp 35 40 45Lys Cys Thr Glu Ala Ile Val Glu Met Met Gln Val Asp Asn Phe Asn 50 55 60His Gly Asn Leu His Ser Cys Gln Gly His Arg Gly Met Ala Asn His65 70 75 80Lys Pro Asn Val Ile Leu Gln Ile Gly Lys Cys Arg Ala Glu Met Leu 85 90 95Asp His Val Arg Arg Thr His Arg His Leu Leu 
Thr Glu Val Ser Lys 100 105 110Gln Val Glu Arg Glu Leu Lys Ser Leu Gln Lys Ser Val Gly Lys Leu 115 120 125Glu Asn Asn Leu Glu Asp His Val Pro Ser Ala Ala Glu Asn Gln Arg 130 135 140Trp Lys Lys Ser Ile Lys Ala Cys Leu Ala Arg Cys Gln Glu Thr Ile145 150 155 160Ala His Leu Glu Arg Trp Val Lys Arg Glu Ile Asn Val Trp Lys Glu 165 170 175Val Phe Phe Arg Leu Glu Lys Trp Ala Asp Arg Leu Glu Ser Gly Gly 180 185 190Gly Lys Tyr Gly Pro Gly Asp Gln Ser Arg Gln Thr Val Ser Val Gly 195 200 205Val Gly Ala Pro Glu Ile Gln Pro Arg Lys Glu Glu Ile Tyr Asp Tyr 210 215 220Ala Leu Asp Met Ser Gln Met Tyr Ala Leu Thr Pro Pro Pro Met Gly225 230 235 240Glu Asp Pro Asn Val Pro Gln Ser His Asp Ser Tyr Gln Trp Ile Thr 245 250 255Ile Ser Asp Asp Ser Pro Pro Ser Pro Val Glu Thr Gln Ile Phe Glu 260 265 270Asp Pro Arg Glu Phe Leu Thr His Leu Glu Asp Tyr Leu Lys Gln Val 275 280 285Gly Gly Thr Glu Glu Tyr Trp Leu Ser Gln Ile Gln Asn His Met Asn 290 295 300Gly Pro Ala Lys Lys Trp Trp Glu Tyr Lys Gln Asp Ser Val Lys Asn305 310 315 320Trp Leu Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser Glu Gly Thr Leu 325 330 335Thr Arg Asp Ala Ile Lys Gln Glu Leu Asp Leu Pro Gln Lys Asp Gly 340 345 350Glu Pro Leu Asp Gln Phe Leu Trp Arg Lys Arg Asp Leu Tyr Gln Thr 355 360 365Leu Tyr Ile Asp Ala Glu Glu Glu Glu Val Ile Gln Tyr Val Val Gly 370 375 380Thr Leu Gln Pro Lys Leu Lys Arg Phe Leu Ser His Pro Tyr Pro Lys385 390 395 400Thr Leu Glu Gln Leu Ile Gln Arg Gly Lys Glu Val Glu Gly Asn Leu 405 410 415Asp Asn Ser Glu Glu Pro Ser Pro Gln Arg Ser Pro Lys His Gln Leu 420 425 430Gly Gly Ser Val Glu Ser Leu Pro Pro Ser Ser Thr Ala Ser Pro Val 435 440 445Ala Ser Asp Glu Thr His Pro Asp Val Ser Ala Pro Pro Val Thr Val 450 455 460Ile4659451PRTAustrofundulus limnaeus 9Gly Asp Gly Glu Thr Gln Ala Glu Asn Pro Ser Thr Ser Leu Asn Asn1 5 10 15Thr Asp Glu Asp Ile Leu Glu Gln Leu Lys Lys Ile Val Met Asp Gln 20 25 30Gln His Leu Tyr Gln Lys Glu Leu Lys Ala Ser Phe Glu Gln Leu Ser 35 40 45Arg Lys Met Phe Ser Gln Met Glu Gln Met Asn Ser Lys 
Gln Thr Asp 50 55 60Leu Leu Leu Glu His Gln Lys Gln Thr Val Lys His Val Asp Lys Arg65 70 75 80Val Glu Tyr Leu Arg Ala Gln Phe Asp Ala Ser Leu Gly Trp Arg Leu 85 90 95Lys Glu Gln His Ala Asp Ile Thr Thr Lys Ile Ile Pro Glu Ile Ile 100 105 110Gln Thr Val Lys Glu Asp Ile Ser Leu Cys Leu Ser Thr Leu Cys Ser 115 120 125Ile Ala Glu Asp Ile Gln Thr Ser Arg Ala Thr Thr Val Thr Gly His 130 135 140Ala Ala Val Gln Thr His Pro Val Asp Leu Leu Gly Glu His His Leu145 150 155 160Gly Thr Thr Gly His Pro Arg Leu Gln Ser Thr Arg Val Gly Lys Pro 165 170 175Asp Asp Val Pro Glu Ser Pro Val Ser Leu Phe Met Gln Gly Glu Ala 180 185 190Arg Ser Arg Ile Val Gly Lys Ser Pro Ile Lys Leu Gln Phe Pro Thr 195 200 205Phe Gly Lys Ala Asn Asp Ser Ser Asp Pro Leu Gln Tyr Leu Glu Arg 210 215 220Cys Glu Asp Phe Leu Ala Leu Asn Pro Leu Thr Asp Glu Glu Leu Met225 230 235 240Ala Thr Leu Arg Asn Val Leu His Gly Thr Ser Arg Asp Trp Trp Asp 245 250 255Val Ala Arg His Lys Ile Gln Thr Trp Arg Glu Phe Asn Lys His Phe 260 265 270Arg Ala Ala Phe Leu Ser Glu Asp Tyr Glu Asp Glu Leu Ala Glu Arg 275 280 285Val Arg Asn Arg Ile Gln Lys Glu Asp Glu Ser Ile Arg Asp Phe Ala 290 295 300Tyr Met Tyr Gln Ser Leu Cys Lys Arg Trp Asn Pro Ala Ile Cys Glu305 310 315 320Gly Asp Val Val Lys Leu Ile Leu Lys Asn Ile Asn Pro Gln Leu Pro 325 330 335Ser Gln Leu Arg Ser Arg Val Thr Thr Val Asp Glu Leu Val Arg Leu 340 345 350Gly Gln Gln Leu Glu Lys Asp Arg Gln Asn Gln Leu Gln Tyr Glu Leu 355 360 365Arg Lys Ser Ser Gly Lys Ile Ile Gln Lys Ser Ser Ser Cys Glu Thr 370 375 380Ser Ala Leu Pro Asn Thr Lys Ser Thr Pro Asn Gln Gln Asn Pro Ala385 390 395 400Thr Ser Asn Arg Pro Pro Gln Val Tyr Cys Trp Arg Cys Lys Gly His 405 410 415His Ala Pro Ala Ser Cys Pro Gln Trp Lys Ala Asp Lys His Arg Ala 420 425 430Gln Pro Ser Arg Ser Ser Gly Pro Gln Thr Leu Thr Asn Leu Gln Ala 435 440 445Gln Asp Ile 45010396PRTPhyseter catodon 10Gly Glu Leu Asp Gln Arg Ala Ala Gly Gly Leu Arg Ala Tyr Pro Ala1 5 10 15Pro Arg Gly Gly Pro Val Ala Lys Pro Ser Val Ile Leu Gln 
Ile Gly 20 25 30Lys Cys Arg Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg His 35 40 45Leu Leu Thr Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu 50 55 60His Arg Ser Val Gly Lys Leu Glu Gly Asn Leu Asp Gly Tyr Val Pro65 70 75 80Thr Gly Asp Ser Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu Cys 85 90 95Arg Cys Gln Glu Thr Ile Ala Asn Leu Glu Arg Trp Val Lys Arg Glu 100 105 110Met His Val Trp Arg Glu Val Phe Tyr Arg Leu Glu Arg Trp Ala Asp 115 120 125Arg Leu Glu Ser Met Gly Gly Lys Tyr Pro Val Gly Thr Asn Pro Ser 130 135 140Arg His Thr Val Ser Val Gly Val Gly Gly Pro Glu Gly Tyr Ser His145 150 155 160Glu Ala Asp Thr Tyr Asp Tyr Thr Val Ser Pro Tyr Ala Ile Thr Pro 165 170 175Pro Pro Ala Ala Gly Glu Leu Pro Gly Gln Glu Ala Val Glu Ala Gln 180 185 190Gln Tyr Pro Pro Trp Gly Leu Gly Glu Asp Gly Gln Pro Gly Pro Gly 195 200 205Val Asp Thr Gln Ile Phe Glu Asp Pro Arg Glu Phe Leu Ser His Leu 210 215 220Glu Glu Tyr Leu Arg Gln Val Gly Gly Ser Glu Glu Tyr Trp Leu Ser225 230 235 240Gln Ile Gln Asn His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Phe 245 250 255Lys Gln Gly Ser Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu 260 265 270Gln Tyr Ser Glu Gly Thr Leu Ser Arg Glu Ala Ile Gln Arg Glu Leu 275 280 285Asp Leu Pro Gln Lys Gln Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg 290 295 300Lys Arg Asp Leu Tyr Gln Thr Leu Tyr Val Asp Ala Glu Glu Glu Glu305 310 315 320Ile Ile Gln Tyr Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe 325 330 335Leu Arg Pro Pro Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Lys Gly 340 345 350Met Glu Val Gln Asp Gly Leu Glu Gln Ala Ala Glu Pro Ala Ser Pro 355 360 365Arg Leu Pro Pro Glu Glu Glu Ser Glu Ala Leu Thr Pro Ala Leu Thr 370 375 380Ser Glu Ser Val Ala Ser Asp Arg Thr Gln Pro Glu385 390 39511404PRTMeleagris gallopavo 11Gly Gln Leu Asp Asn Val Thr Asn Ala Gly Ile His Ser Phe Gln Gly1 5 10 15His Arg Gly Val Ala Asn Lys Pro Asn Val Ile Leu Gln Ile Gly Lys 20 25 30Cys Arg Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg His Leu 35 40 45Leu Ser Glu Val Ser Lys Gln 
Val Glu Arg Glu Leu Lys Gly Leu Gln 50 55 60Lys Ser Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His Val Pro Thr65 70 75 80Asp Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu Ala Arg Cys 85 90 95Gln Glu Thr Ile Ala His Leu Glu Arg Trp Val Lys Arg Glu Met Asn 100 105 110Val Trp Lys Glu Val Phe Phe Arg Leu Glu Lys Trp Ala Asp Arg Leu 115 120 125Glu Ser Met Gly Gly Lys Tyr Cys Pro Gly Glu His Gly Lys Gln Thr 130 135 140Val Ser Val Gly Val Gly Gly Pro Glu Ile Arg Pro Ser Glu Gly Glu145 150 155 160Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr Ala Leu Thr Pro 165 170 175Gly Pro Gly Glu Val Pro Ser Ile Pro Gln Ala His Asp Ser Tyr Gln 180 185 190Trp Val Ser Val Ser Glu Asp Ala Pro Ala Ser Pro Val Glu Thr Gln 195 200 205Ile Phe Glu Asp Pro His Glu Phe Leu Ser His Leu Glu Glu Tyr Leu 210 215 220Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp Leu Ser Gln Ile Gln Asn225 230 235 240His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr Lys Gln Asp Ser 245 250 255Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser Glu 260 265 270Gly Thr Leu Thr Arg Asp Ala Ile Lys Arg Glu Leu Asp Leu Pro Gln 275 280 285Lys Glu Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg Lys Arg Asp Leu 290 295 300Tyr Gln Thr Leu Tyr Val Asp Ala Asp Glu Glu Glu Ile Ile Gln Tyr305 310 315 320Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe Leu Ser Tyr Pro 325 330 335Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Arg Gly Lys Glu Val Gln 340 345 350Gly Asn Met Asp His Ser Glu Glu Pro Ser Pro Gln Arg Thr Pro Glu 355 360 365Ile Gln Ser Gly Asp Ser Val Glu Ser Met Pro Pro Ser Thr Thr Ala 370 375 380Ser Pro Val Pro Ser Asn Gly Thr Gln Pro Glu Pro Pro Ser Pro Pro385 390 395 400Ala Thr Val Ile12409PRTPogona vitticeps 12Gly Gln Leu Glu Asn Ile Asn Gln Gly Ser Leu His Ala Phe Gln Gly1 5 10 15His Arg Gly Val Val His Asn Asn Lys Pro Asn Val Ile Leu Gln Ile 20 25 30Gly Lys Cys Arg Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg 35 40 45His Leu Leu Thr Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly 50 55 60Leu Gln Lys Ser Val Gly Lys Leu Glu Asn 
Asn Leu Glu Asp His Val65 70 75 80Pro Ser Ala Ala Glu Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys 85 90 95Leu Ala Arg Cys Gln Glu Thr Ile Ala Asn Leu Glu Arg Trp Val Lys 100 105 110Arg Glu Met Asn Val Trp Lys Glu Val Phe Phe Arg Leu Glu Arg Trp 115 120 125Ala Asp Arg Leu Glu Ser Gly Gly Gly Lys Tyr Cys His Ala Asp Gln 130 135 140Gly Arg Gln Thr Val Ser Val Gly Val Gly Gly Pro Glu Val Arg Pro145 150 155 160Ser Glu Gly Glu Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr 165 170 175Ala Leu Thr Pro Pro Pro Met Gly Asp Val Pro Val Ile Pro Gln Pro 180 185 190His Asp Ser Tyr Gln Trp Val Thr Asp Pro Glu Glu Ala Pro Pro Ser 195 200 205Pro Val Glu Thr Gln Ile Phe Glu Asp Pro Arg Glu Phe Leu Thr His 210 215 220Leu Glu Asp Tyr Leu Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp Leu225 230 235 240Ser Gln Ile Gln Asn His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu 245 250 255Tyr Lys Gln Asp Ser Val Lys Asn Trp Leu Glu Phe Lys Lys Glu Phe 260 265 270Leu Gln Tyr Ser Glu Gly Thr Leu Thr Arg Asp Ala Ile Lys Gln Glu 275 280 285Leu Asp Leu Pro Gln Lys Glu Gly Glu Pro Leu Asp Gln Phe Leu Trp 290 295 300Arg Lys Arg Asp Leu Tyr Gln Thr Leu Tyr Val Glu Ala Glu Glu Glu305 310 315 320Glu Val Ile Gln Tyr Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg 325 330 335Phe Leu Ser His Pro Tyr Pro Lys Thr Leu Glu Gln Leu Ile Gln Arg 340 345 350Gly Lys Glu Val Glu Gly Asn Leu Asp Asn Ser Glu Glu Pro Ser Pro 355 360 365Gln Arg Thr Pro Glu His Gln Leu Gly Asp Ser Val Glu Ser Leu Pro 370 375 380Pro Ser Thr Thr Ala Ser Pro Ala Gly Ser Asp Lys Thr Gln Pro Glu385 390 395 400Ile Ser Leu Pro Pro Thr Thr Val Ile 40513404PRTAlligator sinensis 13Gly Gln Leu Asp Ser Val Thr Asn Ala Gly Val His Thr Tyr Gln Gly1 5 10
15His Arg Ser Val Ala Asn Lys Pro Asn Val Ile Leu Gln Ile Gly Lys 20 25 30Cys Arg Thr Glu Met Leu Glu His Val Arg Arg Thr His Arg His Leu 35 40 45Leu Thr Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu Gln 50 55 60Lys Ser Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His Val Pro Thr65 70 75 80Asp Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu Ala Arg Cys 85 90 95Gln Glu Thr Ile Ala His Leu Glu Arg Trp Val Lys Arg Glu Met Asn 100 105 110Val Trp Lys Glu Val Phe Phe Arg Leu Glu Arg Trp Ala Asp Arg Leu 115 120 125Glu Ser Met Gly Gly Lys Tyr Cys Pro Thr Asp Ser Ala Arg Gln Thr 130 135 140Val Ser Val Gly Val Gly Gly Pro Glu Ile Arg Pro Ser Glu Gly Glu145 150 155 160Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr Ala Leu Thr Pro 165 170 175Ser Pro Gly Glu Leu Pro Ser Val Pro Gln Pro His Asp Ser Tyr Gln 180 185 190Trp Val Thr Ser Pro Glu Asp Ala Pro Ala Ser Pro Val Glu Thr Gln 195 200 205Val Phe Glu Asp Pro Arg Glu Phe Leu Cys His Leu Glu Glu Tyr Leu 210 215 220Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp Leu Ser Gln Ile Gln Asn225 230 235 240His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr Lys Gln Asp Thr 245 250 255Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser Glu 260 265 270Gly Thr Leu Thr Arg Asp Ala Ile Lys Arg Glu Leu Asp Leu Pro Gln 275 280 285Lys Asp Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg Lys Arg Asp Leu 290 295 300Tyr Gln Thr Leu Tyr Ile Asp Ala Asp Glu Glu Gln Ile Ile Gln Tyr305 310 315 320Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe Leu Ser Tyr Pro 325 330 335Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Lys Gly Lys Glu Val Gln 340 345 350Gly Ser Leu Asp His Ser Glu Glu Pro Ser Pro Gln Arg Ala Ser Glu 355 360 365Ala Arg Thr Gly Asp Ser Val Glu Thr Leu Pro Pro Ser Thr Thr Thr 370 375 380Ser Pro Asn Thr Ser Ser Gly Thr Gln Pro Glu Ala Pro Ser Pro Pro385 390 395 400Ala Thr Val Ile14404PRTAlligator mississippiensis 14Gly Gln Leu Asp Ser Val Thr Asn Ala Gly Val His Thr Tyr Gln Gly1 5 10 15His Arg Gly Val Ala Asn Lys Pro Asn Val Ile Leu Gln Ile Gly Lys 20 25 
30Cys Arg Thr Glu Met Leu Glu His Val Arg Arg Thr His Arg His Leu 35 40 45Leu Thr Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu Gln 50 55 60Lys Ser Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His Val Pro Thr65 70 75 80Asp Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu Ala Arg Cys 85 90 95Gln Glu Thr Ile Ala His Leu Glu Arg Trp Val Lys Arg Glu Met Asn 100 105 110Val Trp Lys Glu Val Phe Phe Arg Leu Glu Arg Trp Ala Asp Arg Leu 115 120 125Glu Ser Met Gly Gly Lys Tyr Cys Pro Thr Asp Ser Ala Arg Gln Thr 130 135 140Val Ser Val Gly Val Gly Gly Pro Glu Ile Arg Pro Ser Glu Gly Glu145 150 155 160Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr Ala Leu Thr Pro 165 170 175Ser Pro Gly Glu Leu Pro Ser Ile Pro Gln Pro His Asp Ser Tyr Gln 180 185 190Trp Val Thr Ser Pro Glu Asp Ala Pro Ala Ser Pro Val Glu Thr Gln 195 200 205Val Phe Glu Asp Pro Arg Glu Phe Leu Cys His Leu Glu Glu Tyr Leu 210 215 220Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp Leu Ser Gln Ile Gln Asn225 230 235 240His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr Lys Gln Asp Thr 245 250 255Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser Glu 260 265 270Gly Thr Leu Thr Arg Asp Ala Ile Lys Arg Glu Leu Asp Leu Pro Gln 275 280 285Lys Asp Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg Lys Arg Asp Leu 290 295 300Tyr Gln Thr Leu Tyr Ile Asp Ala Asp Glu Glu Gln Ile Ile Gln Tyr305 310 315 320Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe Leu Ser Tyr Pro 325 330 335Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Lys Gly Lys Glu Val Gln 340 345 350Gly Ser Leu Asp His Ser Glu Glu Pro Ser Pro Gln Arg Ala Ser Glu 355 360 365Ala Arg Thr Gly Asp Ser Val Glu Ser Leu Pro Pro Ser Thr Thr Thr 370 375 380Ser Pro Asn Ala Ser Ser Gly Thr Gln Pro Glu Ala Pro Ser Pro Pro385 390 395 400Ala Thr Val Ile15408PRTGekko japonicus 15Gly Gln Leu Glu Asn Val Asn His Gly Asn Leu His Ser Phe Gln Gly1 5 10 15His Arg Gly Gly Val Ala Asn Lys Pro Asn Val Ile Leu Gln Ile Gly 20 25 30Lys Cys Arg Ala Glu Met Leu Asp His Val Arg Arg Thr His Arg His 35 40 45Leu Leu Thr 
Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu 50 55 60Gln Lys Ser Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His Val Pro65 70 75 80Ser Ala Val Glu Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu 85 90 95Ser Arg Cys Gln Glu Thr Ile Ala His Leu Glu Arg Trp Val Lys Arg 100 105 110Glu Met Asn Val Trp Lys Glu Val Phe Phe Arg Leu Glu Arg Trp Ala 115 120 125Asp Arg Leu Glu Ser Gly Gly Gly Lys Tyr Cys His Gly Asp Asn His 130 135 140Arg Gln Thr Val Ser Val Gly Val Gly Gly Pro Glu Val Arg Pro Ser145 150 155 160Glu Gly Glu Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr Ala 165 170 175Leu Thr Pro Pro Ser Pro Gly Asp Val Pro Val Val Ser Gln Pro His 180 185 190Asp Ser Tyr Gln Trp Val Thr Val Pro Glu Asp Thr Pro Pro Ser Pro 195 200 205Val Glu Thr Gln Ile Phe Glu Asp Pro Arg Glu Phe Leu Thr His Leu 210 215 220Glu Asp Tyr Leu Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp Leu Ser225 230 235 240Gln Ile Gln Asn His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr 245 250 255Lys Gln Asp Ser Val Lys Asn Trp Leu Glu Phe Lys Lys Glu Phe Leu 260 265 270Gln Tyr Ser Glu Gly Thr Leu Thr Arg Asp Ala Ile Lys Glu Glu Leu 275 280 285Asp Leu Pro Gln Lys Asp Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg 290 295 300Lys Arg Asp Leu Tyr Gln Thr Leu Tyr Val Glu Ala Asp Glu Glu Glu305 310 315 320Val Ile Gln Tyr Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe 325 330 335Leu Ser His Pro Tyr Pro Lys Thr Leu Glu Gln Leu Ile Gln Arg Gly 340 345 350Lys Glu Val Glu Gly Asn Leu Asp Asn Ser Glu Glu Pro Thr Pro Gln 355 360 365Arg Thr Pro Glu His Gln Leu Cys Gly Ser Val Glu Ser Leu Pro Pro 370 375 380Ser Ser Thr Val Ser Pro Val Ala Ser Asp Gly Thr Gln Pro Glu Thr385 390 395 400Ser Pro Leu Pro Ala Thr Val Ile 40516455PRTHomo sapiens 16Gly Pro Leu Thr Leu Leu Gln Asp Trp Cys Arg Gly Glu His Leu Asn1 5 10 15Thr Arg Arg Cys Met Leu Ile Leu Gly Ile Pro Glu Asp Cys Gly Glu 20 25 30Asp Glu Phe Glu Glu Thr Leu Gln Glu Ala Cys Arg His Leu Gly Arg 35 40 45Tyr Arg Val Ile Gly Arg Met Phe Arg Arg Glu Glu Asn Ala Gln Ala 50 55 60Ile Leu 
Leu Glu Leu Ala Gln Asp Ile Asp Tyr Ala Leu Leu Pro Arg65 70 75 80Glu Ile Pro Gly Lys Gly Gly Pro Trp Glu Val Ile Val Lys Pro Arg 85 90 95Asn Ser Asp Gly Glu Phe Leu Asn Arg Leu Asn Arg Phe Leu Glu Glu 100 105 110Glu Arg Arg Thr Val Ser Asp Met Asn Arg Val Leu Gly Ser Asp Thr 115 120 125Asn Cys Ser Ala Pro Arg Val Thr Ile Ser Pro Glu Phe Trp Thr Trp 130 135 140Ala Gln Thr Leu Gly Ala Ala Val Gln Pro Leu Leu Glu Gln Met Leu145 150 155 160Tyr Arg Glu Leu Arg Val Phe Ser Gly Asn Thr Ile Ser Ile Pro Gly 165 170 175Ala Leu Ala Phe Asp Ala Trp Leu Glu His Thr Thr Glu Met Leu Gln 180 185 190Met Trp Gln Val Pro Glu Gly Glu Lys Arg Arg Arg Leu Met Glu Cys 195 200 205Leu Arg Gly Pro Ala Leu Gln Val Val Ser Gly Leu Arg Ala Ser Asn 210 215 220Ala Ser Ile Thr Val Glu Glu Cys Leu Ala Ala Leu Gln Gln Val Phe225 230 235 240Gly Pro Val Glu Ser His Lys Ile Ala Gln Val Lys Leu Cys Lys Ala 245 250 255Tyr Gln Glu Ala Gly Glu Lys Val Ser Ser Phe Val Leu Arg Leu Glu 260 265 270Pro Leu Leu Gln Arg Ala Val Glu Asn Asn Val Val Ser Arg Arg Asn 275 280 285Val Asn Gln Thr Arg Leu Lys Arg Val Leu Ser Gly Ala Thr Leu Pro 290 295 300Asp Lys Leu Arg Asp Lys Leu Lys Leu Met Lys Gln Arg Arg Lys Pro305 310 315 320Pro Gly Phe Leu Ala Leu Val Lys Leu Leu Arg Glu Glu Glu Glu Trp 325 330 335Glu Ala Thr Leu Gly Pro Asp Arg Glu Ser Leu Glu Gly Leu Glu Val 340 345 350Ala Pro Arg Pro Pro Ala Arg Ile Thr Gly Val Gly Ala Val Pro Leu 355 360 365Pro Ala Ser Gly Asn Ser Phe Asp Ala Arg Pro Ser Gln Gly Tyr Arg 370 375 380Arg Arg Arg Gly Arg Gly Gln His Arg Arg Gly Gly Val Ala Arg Ala385 390 395 400Gly Ser Arg Gly Ser Arg Lys Arg Lys Arg His Thr Phe Cys Tyr Ser 405 410 415Cys Gly Glu Asp Gly His Ile Arg Val Gln Cys Ile Asn Pro Ser Asn 420 425 430Leu Leu Leu Ala Lys Glu Thr Lys Glu Ile Leu Glu Gly Gly Glu Arg 435 440 445Glu Ala Gln Thr Asn Ser Arg 450 45517448PRTHomo sapiens 17Gly Ala Leu Thr Leu Leu Glu Asp Trp Cys Lys Gly Met Asp Met Asp1 5 10 15Pro Arg Lys Ala Leu Leu Ile Val Gly Ile Pro Met Glu Cys Ser Glu 20 25 
30Val Glu Ile Gln Asp Thr Val Lys Ala Gly Leu Gln Pro Leu Cys Ala 35 40 45Tyr Arg Val Leu Gly Arg Met Phe Arg Arg Glu Asp Asn Ala Lys Ala 50 55 60Val Phe Ile Glu Leu Ala Asp Thr Val Asn Tyr Thr Thr Leu Pro Ser65 70 75 80His Ile Pro Gly Lys Gly Gly Ser Trp Glu Val Val Val Lys Pro Arg 85 90 95Asn Pro Asp Asp Glu Phe Leu Ser Arg Leu Asn Tyr Phe Leu Lys Asp 100 105 110Glu Gly Arg Ser Met Thr Asp Val Ala Arg Ala Leu Gly Cys Cys Ser 115 120 125Leu Pro Ala Glu Ser Leu Asp Ala Glu Val Met Pro Gln Val Arg Ser 130 135 140Pro Pro Leu Glu Pro Pro Lys Glu Ser Met Trp Tyr Arg Lys Leu Lys145 150 155 160Val Phe Ser Gly Thr Ala Ser Pro Ser Pro Gly Glu Glu Thr Phe Glu 165 170 175Asp Trp Leu Glu Gln Val Thr Glu Ile Met Pro Ile Trp Gln Val Ser 180 185 190Glu Val Glu Lys Arg Arg Arg Leu Leu Glu Ser Leu Arg Gly Pro Ala 195 200 205Leu Ser Ile Met Arg Val Leu Gln Ala Asn Asn Asp Ser Ile Thr Val 210 215 220Glu Gln Cys Leu Asp Ala Leu Lys Gln Ile Phe Gly Asp Lys Glu Asp225 230 235 240Phe Arg Ala Ser Gln Phe Arg Phe Leu Gln Thr Ser Pro Lys Ile Gly 245 250 255Glu Lys Val Ser Thr Phe Leu Leu Arg Leu Glu Pro Leu Leu Gln Lys 260 265 270Ala Val His Lys Ser Pro Leu Ser Val Arg Ser Thr Asp Met Ile Arg 275 280 285Leu Lys His Leu Leu Ala Arg Val Ala Met Thr Pro Ala Leu Arg Gly 290 295 300Lys Leu Glu Leu Leu Asp Gln Arg Gly Cys Pro Pro Asn Phe Leu Glu305 310 315 320Leu Met Lys Leu Ile Arg Asp Glu Glu Glu Trp Glu Asn Thr Glu Ala 325 330 335Val Met Lys Asn Lys Glu Lys Pro Ser Gly Arg Gly Arg Gly Ala Ser 340 345 350Gly Arg Gln Ala Arg Ala Glu Ala Ser Val Ser Ala Pro Gln Ala Thr 355 360 365Val Gln Ala Arg Ser Phe Ser Asp Ser Ser Pro Gln Thr Ile Gln Gly 370 375 380Gly Leu Pro Pro Leu Val Lys Arg Arg Arg Leu Leu Gly Ser Glu Ser385 390 395 400Thr Arg Gly Glu Asp His Gly Gln Ala Thr Tyr Pro Lys Ala Glu Asn 405 410 415Gln Thr Pro Gly Arg Glu Gly Pro Gln Ala Ala Gly Glu Glu Leu Gly 420 425 430Asn Glu Ala Gly Ala Gly Ala Met Ser His Pro Lys Pro Trp Glu Thr 435 440 44518399PRTHomo sapiens 18Gly Ala Val Thr Met 
Leu Gln Asp Trp Cys Arg Trp Met Gly Val Asn1 5 10 15Ala Arg Arg Gly Leu Leu Ile Leu Gly Ile Pro Glu Asp Cys Asp Asp 20 25 30Ala Glu Phe Gln Glu Ser Leu Glu Ala Ala Leu Arg Pro Met Gly His 35 40 45Phe Thr Val Leu Gly Lys Ala Phe Arg Glu Glu Asp Asn Ala Thr Ala 50 55 60Ala Leu Val Glu Leu Asp Arg Glu Val Asn Tyr Ala Leu Val Pro Arg65 70 75 80Glu Ile Pro Gly Thr Gly Gly Pro Trp Asn Val Val Phe Val Pro Arg 85 90 95Cys Ser Gly Glu Glu Phe Leu Gly Leu Gly Arg Val Phe His Phe Pro 100 105 110Glu Gln Glu Gly Gln Met Val Glu Ser Val Ala Gly Ala Leu Gly Val 115 120 125Gly Leu Arg Arg Val Cys Trp Leu Arg Ser Ile Gly Gln Ala Val Gln 130 135 140Pro Trp Val Glu Ala Val Arg Cys Gln Ser Leu Gly Val Phe Ser Gly145 150 155 160Arg Asp Gln Pro Ala Pro Gly Glu Glu Ser Phe Glu Val Trp Leu Asp 165 170 175His Thr Thr Glu Met Leu His Val Trp Gln Gly Val Ser Glu Arg Glu 180 185 190Arg Arg Arg Arg Leu Leu Glu Gly Leu Arg Gly Thr Ala Leu Gln Leu 195 200 205Val His Ala Leu Leu Ala Glu Asn Pro Ala Arg Thr Ala Gln Asp Cys 210 215 220Leu Ala Ala Leu Ala Gln Val Phe Gly Asp Asn Glu Ser Gln Ala Thr225 230 235 240Ile Arg Val Lys Cys Leu Thr Ala Gln Gln Gln Ser Gly Glu Arg Leu 245 250 255Ser Ala Phe Val Leu Arg Leu Glu Val Leu Leu Gln Lys Ala Met Glu 260 265 270Lys Glu Ala Leu Ala Arg Ala Ser Ala Asp Arg Val Arg Leu Arg Gln 275 280 285Met Leu Thr Arg Ala His Leu Thr Glu Pro Leu Asp Glu Ala Leu Arg 290 295 300Lys Leu Arg Met Ala Gly Arg Ser Pro Ser Phe Leu Glu Met Leu Gly305 310 315 320Leu Val Arg Glu Ser Glu Ala Trp Glu Ala Ser Leu Ala Arg Ser Val 325 330 335Arg Ala Gln Thr Gln Glu Gly Ala Gly Ala Arg Ala Gly Ala Gln Ala 340 345 350Val Ala Arg Ala Ser Thr Lys Val Glu Ala Val Pro Gly Gly Pro Gly
355 360 365Arg Glu Pro Glu Gly Leu Leu Gln Ala Gly Gly Gln Glu Ala Glu Glu 370 375 380Leu Leu Gln Glu Gly Leu Lys Pro Val Leu Glu Glu Cys Asp Asn385 390 39519399PRTHomo sapiens 19Gly Ala Val Thr Met Leu Gln Asp Trp Cys Arg Trp Met Gly Val Asn1 5 10 15Ala Arg Arg Gly Leu Leu Ile Leu Gly Ile Pro Glu Asp Cys Asp Asp 20 25 30Ala Glu Phe Gln Glu Ser Leu Glu Ala Ala Leu Arg Pro Met Gly His 35 40 45Phe Thr Val Leu Gly Lys Val Phe Arg Glu Glu Asp Asn Ala Thr Ala 50 55 60Ala Leu Val Glu Leu Asp Arg Glu Val Asn Tyr Ala Leu Val Pro Arg65 70 75 80Glu Ile Pro Gly Thr Gly Gly Pro Trp Asn Val Val Phe Val Pro Arg 85 90 95Cys Ser Gly Glu Glu Phe Leu Gly Leu Gly Arg Val Phe His Phe Pro 100 105 110Glu Gln Glu Gly Gln Met Val Glu Ser Val Ala Gly Ala Leu Gly Val 115 120 125Gly Leu Arg Arg Val Cys Trp Leu Arg Ser Ile Gly Gln Ala Val Gln 130 135 140Pro Trp Val Glu Ala Val Arg Tyr Gln Ser Leu Gly Val Phe Ser Gly145 150 155 160Arg Asp Gln Pro Ala Pro Gly Glu Glu Ser Phe Glu Val Trp Leu Asp 165 170 175His Thr Thr Glu Met Leu His Val Trp Gln Gly Val Ser Glu Arg Glu 180 185 190Arg Arg Arg Arg Leu Leu Glu Gly Leu Arg Gly Thr Ala Leu Gln Leu 195 200 205Val His Ala Leu Leu Ala Glu Asn Pro Ala Arg Thr Ala Gln Asp Cys 210 215 220Leu Ala Ala Leu Ala Gln Val Phe Gly Asp Asn Glu Ser Gln Ala Thr225 230 235 240Ile Arg Val Lys Cys Leu Thr Ala Gln Gln Gln Ser Gly Glu Arg Leu 245 250 255Ser Ala Phe Val Leu Arg Leu Glu Val Leu Leu Gln Lys Ala Met Glu 260 265 270Lys Glu Ala Leu Ala Arg Ala Ser Ala Asp Arg Val Arg Leu Arg Gln 275 280 285Met Leu Thr Arg Ala His Leu Thr Glu Pro Leu Asp Glu Ala Leu Arg 290 295 300Lys Leu Arg Met Ala Gly Arg Ser Pro Ser Phe Leu Glu Met Leu Gly305 310 315 320Leu Val Arg Glu Ser Glu Ala Trp Glu Ala Ser Leu Ala Arg Ser Val 325 330 335Arg Ala Gln Thr Gln Glu Gly Ala Gly Ala Arg Ala Gly Ala Gln Ala 340 345 350Val Ala Arg Ala Ser Thr Lys Val Glu Ala Val Pro Gly Gly Pro Gly 355 360 365Arg Glu Pro Glu Gly Leu Arg Gln Ala Gly Gly Gln Glu Ala Glu Glu 370 375 380Leu Leu Gln Glu Gly Leu Lys 
Pro Val Leu Glu Glu Cys Asp Asn385 390 39520475PRTHomo sapiens 20Gly Val Glu Asp Leu Ala Ala Ser Tyr Ile Val Leu Lys Leu Glu Asn1 5 10 15Glu Ile Arg Gln Ala Gln Val Gln Trp Leu Met Glu Glu Asn Ala Ala 20 25 30Leu Gln Ala Gln Ile Pro Glu Leu Gln Lys Ser Gln Ala Ala Lys Glu 35 40 45Tyr Asp Leu Leu Arg Lys Ser Ser Glu Ala Lys Glu Pro Gln Lys Leu 50 55 60Pro Glu His Met Asn Pro Pro Ala Ala Trp Glu Ala Gln Lys Thr Pro65 70 75 80Glu Phe Lys Glu Pro Gln Lys Pro Pro Glu Pro Gln Asp Leu Leu Pro 85 90 95Trp Glu Pro Pro Ala Ala Trp Glu Leu Gln Glu Ala Pro Ala Ala Pro 100 105 110Glu Ser Leu Ala Pro Pro Ala Thr Arg Glu Ser Gln Lys Pro Pro Met 115 120 125Ala His Glu Ile Pro Thr Val Leu Glu Gly Gln Gly Pro Ala Asn Thr 130 135 140Gln Asp Ala Thr Ile Ala Gln Glu Pro Lys Asn Ser Glu Pro Gln Asp145 150 155 160Pro Pro Asn Ile Glu Lys Pro Gln Glu Ala Pro Glu Tyr Gln Glu Thr 165 170 175Ala Ala Gln Leu Glu Phe Leu Glu Leu Pro Pro Pro Gln Glu Pro Leu 180 185 190Glu Pro Ser Asn Ala Gln Glu Phe Leu Glu Leu Ser Ala Ala Gln Glu 195 200 205Ser Leu Glu Gly Leu Ile Val Val Glu Thr Ser Ala Ala Ser Glu Phe 210 215 220Pro Gln Ala Pro Ile Gly Leu Glu Ala Thr Asp Phe Pro Leu Gln Tyr225 230 235 240Thr Leu Thr Phe Ser Gly Asp Ser Gln Lys Leu Pro Glu Phe Leu Val 245 250 255Gln Leu Tyr Ser Tyr Met Arg Val Arg Gly His Leu Tyr Pro Thr Glu 260 265 270Ala Ala Leu Val Ser Phe Val Gly Asn Cys Phe Ser Gly Arg Ala Gly 275 280 285Trp Trp Phe Gln Leu Leu Leu Asp Ile Gln Ser Pro Leu Leu Glu Gln 290 295 300Cys Glu Ser Phe Ile Pro Val Leu Gln Asp Thr Phe Asp Asn Pro Glu305 310 315 320Asn Met Lys Asp Ala Asn Gln Cys Ile His Gln Leu Cys Gln Gly Glu 325 330 335Gly His Val Ala Thr His Phe His Leu Ile Ala Gln Glu Leu Asn Trp 340 345 350Asp Glu Ser Thr Leu Trp Ile Gln Phe Gln Glu Gly Leu Ala Ser Ser 355 360 365Ile Gln Asp Glu Leu Ser His Thr Ser Pro Ala Thr Asn Leu Ser Asp 370 375 380Leu Ile Thr Gln Cys Ile Ser Leu Glu Glu Lys Pro Asp Pro Asn Pro385 390 395 400Leu Gly Lys Ser Ser Ser Ala Glu Gly Asp Gly Pro Glu Ser Pro Pro 
405 410 415Ala Glu Asn Gln Pro Met Gln Ala Ala Ile Asn Cys Pro His Ile Ser 420 425 430Glu Ala Glu Trp Val Arg Trp His Lys Gly Arg Leu Cys Leu Tyr Cys 435 440 445Gly Tyr Pro Gly His Phe Ala Arg Asp Cys Pro Val Lys Pro His Gln 450 455 460Ala Leu Gln Ala Gly Asn Ile Gln Ala Cys Gln465 470 47521239PRTHomo sapiens 21Gly Val Gln Pro Gln Thr Ser Lys Ala Glu Ser Pro Ala Leu Ala Ala1 5 10 15Ser Pro Asn Ala Gln Met Asp Asp Val Ile Asp Thr Leu Thr Ser Leu 20 25 30Arg Leu Thr Asn Ser Ala Leu Arg Arg Glu Ala Ser Thr Leu Arg Ala 35 40 45Glu Lys Ala Asn Leu Thr Asn Met Leu Glu Ser Val Met Ala Glu Leu 50 55 60Thr Leu Leu Arg Thr Arg Ala Arg Ile Pro Gly Ala Leu Gln Ile Thr65 70 75 80Pro Pro Ile Ser Ser Ile Thr Ser Asn Gly Thr Arg Pro Met Thr Thr 85 90 95Pro Pro Thr Ser Leu Pro Glu Pro Phe Ser Gly Asp Pro Gly Arg Leu 100 105 110Ala Gly Phe Leu Met Gln Met Asp Arg Phe Met Ile Phe Gln Ala Ser 115 120 125Arg Phe Pro Gly Glu Ala Glu Arg Val Ala Phe Leu Val Ser Arg Leu 130 135 140Thr Gly Glu Ala Glu Lys Trp Ala Ile Pro His Met Gln Pro Asp Ser145 150 155 160Pro Leu Arg Asn Asn Tyr Gln Gly Phe Leu Ala Glu Leu Arg Arg Thr 165 170 175Tyr Lys Ser Pro Leu Arg His Ala Arg Arg Ala Gln Ile Arg Lys Thr 180 185 190Ser Ala Ser Asn Arg Ala Val Arg Glu Arg Gln Met Leu Cys Arg Gln 195 200 205Leu Ala Ser Ala Gly Thr Gly Pro Cys Pro Val His Pro Ala Ser Asn 210 215 220Gly Thr Ser Pro Ala Pro Ala Leu Pro Ala Arg Ala Arg Asn Leu225 230 23522113PRTHomo sapiens 22Gly Asp Gly Arg Val Gln Leu Met Lys Ala Leu Leu Ala Gly Pro Leu1 5 10 15Arg Pro Ala Ala Arg Arg Trp Arg Asn Pro Ile Pro Phe Pro Glu Thr 20 25 30Phe Asp Gly Asp Thr Asp Arg Leu Pro Glu Phe Ile Val Gln Thr Ser 35 40 45Ser Tyr Met Phe Val Asp Glu Asn Thr Phe Ser Asn Asp Ala Leu Lys 50 55 60Val Thr Phe Leu Ile Thr Arg Leu Thr Gly Pro Ala Leu Gln Trp Val65 70 75 80Ile Pro Tyr Ile Arg Lys Glu Ser Pro Leu Leu Asn Asp Tyr Arg Gly 85 90 95Phe Leu Ala Glu Met Lys Arg Val Phe Gly Trp Glu Glu Asp Glu Asp 100 105 110Phe23113PRTHomo sapiens 23Gly Glu Gly Arg Val 
Gln Leu Met Lys Ala Leu Leu Ala Arg Pro Leu1 5 10 15Arg Pro Ala Ala Arg Arg Trp Arg Asn Pro Ile Pro Phe Pro Glu Thr 20 25 30Phe Asp Gly Asp Thr Asp Arg Leu Pro Glu Phe Ile Val Gln Thr Ser 35 40 45Ser Tyr Met Phe Val Asp Glu Asn Thr Phe Ser Asn Asp Ala Leu Lys 50 55 60Val Thr Phe Leu Ile Thr Arg Leu Thr Gly Pro Ala Leu Gln Trp Val65 70 75 80Ile Pro Tyr Ile Lys Lys Glu Ser Pro Leu Leu Ser Asp Tyr Arg Gly 85 90 95Phe Leu Ala Glu Met Lys Arg Val Phe Gly Trp Glu Glu Asp Glu Asp 100 105 110Phe24364PRTHomo sapiens 24Gly Pro Arg Gly Arg Cys Arg Gln Gln Gly Pro Arg Ile Pro Ile Trp1 5 10 15Ala Ala Ala Asn Tyr Ala Asn Ala His Pro Trp Gln Gln Met Asp Lys 20 25 30Ala Ser Pro Gly Val Ala Tyr Thr Pro Leu Val Asp Pro Trp Ile Glu 35 40 45Arg Pro Cys Cys Gly Asp Thr Val Cys Val Arg Thr Thr Met Glu Gln 50 55 60Lys Ser Thr Ala Ser Gly Thr Cys Gly Gly Lys Pro Ala Glu Arg Gly65 70 75 80Pro Leu Ala Gly His Met Pro Ser Ser Arg Pro His Arg Val Asp Phe 85 90 95Cys Trp Val Pro Gly Ser Asp Pro Gly Thr Phe Asp Gly Ser Pro Trp 100 105 110Leu Leu Asp Arg Phe Leu Ala Gln Leu Gly Asp Tyr Met Ser Phe His 115 120 125Phe Glu His Tyr Gln Asp Asn Ile Ser Arg Val Cys Glu Ile Leu Arg 130 135 140Arg Leu Thr Gly Arg Ala Gln Ala Trp Ala Ala Pro Tyr Leu Asp Gly145 150 155 160Asp Leu Pro Leu Pro Asp Asp Tyr Glu Leu Phe Cys Gln Asp Leu Lys 165 170 175Glu Val Val Gln Asp Pro Asn Ser Phe Ala Glu Tyr His Ala Val Val 180 185 190Thr Cys Pro Leu Pro Leu Ala Ser Ser Gln Leu Pro Val Ala Pro Gln 195 200 205Leu Pro Val Val Arg Gln Tyr Leu Ala Arg Phe Leu Glu Gly Leu Ala 210 215 220Leu Asp Met Gly Thr Ala Pro Arg Ser Leu Pro Ala Ala Met Ala Thr225 230 235 240Pro Ala Val Ser Gly Ser Asn Ser Val Ser Arg Ser Ala Leu Phe Glu 245 250 255Gln Gln Leu Thr Lys Glu Ser Thr Pro Gly Pro Lys Glu Pro Pro Val 260 265 270Leu Pro Ser Ser Thr Cys Ser Ser Lys Pro Gly Pro Val Glu Pro Ala 275 280 285Ser Ser Gln Pro Glu Glu Ala Ala Pro Thr Pro Val Pro Arg Leu Ser 290 295 300Glu Ser Ala Asn Pro Pro Ala Gln Arg Pro Asp Pro Ala His Pro Gly305 
310 315 320Gly Pro Lys Pro Gln Lys Thr Glu Glu Glu Val Leu Glu Thr Glu Gly 325 330 335Asp Gln Glu Val Ser Leu Gly Thr Pro Gln Glu Val Val Glu Ala Pro 340 345 350Glu Thr Pro Gly Glu Pro Pro Leu Ser Pro Gly Phe 355 36025146PRTHomo sapiens 25Gly Val Asp Glu Leu Val Leu Leu Leu His Ala Leu Leu Met Arg His1 5 10 15Arg Ala Leu Ser Ile Glu Asn Ser Gln Leu Met Glu Gln Leu Arg Leu 20 25 30Leu Val Cys Glu Arg Ala Ser Leu Leu Arg Gln Val Arg Pro Pro Ser 35 40 45Cys Pro Val Pro Phe Pro Glu Thr Phe Asn Gly Glu Ser Ser Arg Leu 50 55 60Pro Glu Phe Ile Val Gln Thr Ala Ser Tyr Met Leu Val Asn Glu Asn65 70 75 80Arg Phe Cys Asn Asp Ala Met Lys Val Ala Phe Leu Ile Ser Leu Leu 85 90 95Thr Gly Glu Ala Glu Glu Trp Val Val Pro Tyr Ile Glu Met Asp Ser 100 105 110Pro Ile Leu Gly Asp Tyr Arg Ala Phe Leu Asp Glu Met Lys Gln Cys 115 120 125Phe Gly Trp Asp Asp Asp Glu Asp Asp Asp Asp Glu Glu Glu Glu Asp 130 135 140Asp Tyr14526549PRTHomo sapiens 26Gly Pro Val Asp Leu Gly Gln Ala Leu Gly Leu Leu Pro Ser Leu Ala1 5 10 15Lys Ala Glu Asp Ser Gln Phe Ser Glu Ser Asp Ala Ala Leu Gln Glu 20 25 30Glu Leu Ser Ser Pro Glu Thr Ala Arg Gln Leu Phe Arg Gln Phe Arg 35 40 45Tyr Gln Val Met Ser Gly Pro His Glu Thr Leu Lys Gln Leu Arg Lys 50 55 60Leu Cys Phe Gln Trp Leu Gln Pro Glu Val His Thr Lys Glu Gln Ile65 70 75 80Leu Glu Ile Leu Met Leu Glu Gln Phe Leu Thr Ile Leu Pro Gly Glu 85 90 95Ile Gln Met Trp Val Arg Lys Gln Cys Pro Gly Ser Gly Glu Glu Ala 100 105 110Val Thr Leu Val Glu Ser Leu Lys Gly Asp Pro Gln Arg Leu Trp Gln 115 120 125Trp Ile Ser Ile Gln Val Leu Gly Gln Asp Ile Leu Ser Glu Lys Met 130 135 140Glu Ser Pro Ser Cys Gln Val Gly Glu Val Glu Pro His Leu Glu Val145 150 155 160Val Pro Gln Glu Leu Gly Leu Glu Asn Ser Ser Ser Gly Pro Gly Glu 165 170 175Leu Leu Ser His Ile Val Lys Glu Glu Ser Asp Thr Glu Ala Glu Leu 180 185 190Ala Leu Ala Ala Ser Gln Pro Ala Arg Leu Glu Glu Arg Leu Ile Arg 195 200 205Asp Gln Asp Leu Gly Ala Ser Leu Leu Pro Ala Ala Pro Gln Glu Gln 210 215 220Trp Arg Gln Leu Asp Ser Thr Gln 
Lys Glu Gln Tyr Trp Asp Leu Met225 230 235 240Leu Glu Thr Tyr Gly Lys Met Val Ser Gly Ala Gly Ile Ser His Pro 245 250 255Lys Ser Asp Leu Thr Asn Ser Ile Glu Phe Gly Glu Glu Leu Ala Gly 260 265 270Ile Tyr Leu His Val Asn Glu Lys Ile Pro Arg Pro Thr Cys Ile Gly 275 280 285Asp Arg Gln Glu Asn Asp Lys Glu Asn Leu Asn Leu Glu Asn His Arg 290 295 300Asp Gln Glu Leu Leu His Ala Ser Cys Gln Ala Ser Gly Glu Val Pro305 310 315 320Ser Gln Ala Ser Leu Arg Gly Phe Phe Thr Glu Asp Glu Pro Gly Cys 325 330 335Phe Gly Glu Gly Glu Asn Leu Pro Glu Ala Leu Gln Asn Ile Gln Asp 340 345 350Glu Gly Thr Gly Glu Gln Leu Ser Pro Gln Glu Arg Ile Ser Glu Lys 355 360 365Gln Leu Gly Gln His Leu Pro Asn Pro His Ser Gly Glu Met Ser Thr 370 375 380Met Trp Leu Glu Glu Lys Arg Glu Thr Ser Gln Lys Gly Gln Pro Arg385 390 395 400Ala Pro Met Ala Gln Lys Leu Pro Thr Cys Arg Glu Cys Gly Lys Thr 405 410 415Phe Tyr Arg Asn Ser Gln Leu Ile Phe His Gln Arg Thr His Thr Gly 420 425 430Glu Thr Tyr Phe Gln Cys Thr Ile Cys Lys Lys Ala Phe Leu Arg Ser 435 440 445Ser Asp Phe Val Lys His Gln Arg Thr His Thr Gly Glu Lys Pro Cys 450 455 460Lys Cys Asp Tyr Cys Gly Lys Gly Phe Ser Asp Phe Ser Gly Leu Arg465 470 475 480His His Glu Lys Ile His Thr Gly Glu Lys Pro Tyr Lys Cys Pro Ile 485 490 495Cys Glu Lys Ser Phe Ile Gln Arg Ser Asn Phe Asn Arg His Gln Arg 500 505 510Val His Thr Gly Glu Lys Pro Tyr Lys Cys Ser His Cys Gly Lys Ser 515 520 525Phe Ser Trp Ser Ser Ser Leu Asp Lys His Gln Arg Ser His Leu Gly 530 535 540Lys Lys Pro Phe Gln54527351PRTHomo sapiens 27Gly Thr Leu Arg Leu Leu Glu Asp Trp Cys Arg Gly Met Asp Met Asn1 5 10 15Pro Arg Lys Ala Leu Leu Ile Ala Gly Ile Ser Gln Ser Cys Ser Val 20 25
30Ala Glu Ile Glu Glu Ala Leu Gln Ala Gly Leu Ala Pro Leu Gly Glu 35 40 45Tyr Arg Leu Leu Gly Arg Met Phe Arg Arg Asp Glu Asn Arg Lys Val 50 55 60Ala Leu Val Gly Leu Thr Ala Glu Thr Ser His Ala Leu Val Pro Lys65 70 75 80Glu Ile Pro Gly Lys Gly Gly Ile Trp Arg Val Ile Phe Lys Pro Pro 85 90 95Asp Pro Asp Asn Thr Phe Leu Ser Arg Leu Asn Glu Phe Leu Ala Gly 100 105 110Glu Gly Met Thr Val Gly Glu Leu Ser Arg Ala Leu Gly His Glu Asn 115 120 125Gly Ser Leu Asp Pro Glu Gln Gly Met Ile Pro Glu Met Trp Ala Pro 130 135 140Met Leu Ala Gln Ala Leu Glu Ala Leu Gln Pro Ala Leu Gln Cys Leu145 150 155 160Lys Tyr Lys Lys Leu Arg Val Phe Ser Gly Arg Glu Ser Pro Glu Pro 165 170 175Gly Glu Glu Glu Phe Gly Arg Trp Met Phe His Thr Thr Gln Met Ile 180 185 190Lys Ala Trp Gln Val Pro Asp Val Glu Lys Arg Arg Arg Leu Leu Glu 195 200 205Ser Leu Arg Gly Pro Ala Leu Asp Val Ile Arg Val Leu Lys Ile Asn 210 215 220Asn Pro Leu Ile Thr Val Asp Glu Cys Leu Gln Ala Leu Glu Glu Val225 230 235 240Phe Gly Val Thr Asp Asn Pro Arg Glu Leu Gln Val Lys Tyr Leu Thr 245 250 255Thr Tyr His Lys Asp Glu Glu Lys Leu Ser Ala Tyr Val Leu Arg Leu 260 265 270Glu Pro Leu Leu Gln Lys Leu Val Gln Arg Gly Ala Ile Glu Arg Asp 275 280 285Ala Val Asn Gln Ala Arg Leu Asp Gln Val Ile Ala Gly Ala Val His 290 295 300Lys Thr Ile Arg Arg Glu Leu Asn Leu Pro Glu Asp Gly Pro Ala Pro305 310 315 320Gly Phe Leu Gln Leu Leu Val Leu Ile Lys Asp Tyr Glu Ala Ala Glu 325 330 335Glu Glu Glu Ala Leu Leu Gln Ala Ile Leu Glu Gly Asn Phe Thr 340 345 35028708PRTHomo sapiens 28Gly Thr Glu Arg Arg Arg Asp Glu Leu Ser Glu Glu Ile Asn Asn Leu1 5 10 15Arg Glu Lys Val Met Lys Gln Ser Glu Glu Asn Asn Asn Leu Gln Ser 20 25 30Gln Val Gln Lys Leu Thr Glu Glu Asn Thr Thr Leu Arg Glu Gln Val 35 40 45Glu Pro Thr Pro Glu Asp Glu Asp Asp Asp Ile Glu Leu Arg Gly Ala 50 55 60Ala Ala Ala Ala Ala Pro Pro Pro Pro Ile Glu Glu Glu Cys Pro Glu65 70 75 80Asp Leu Pro Glu Lys Phe Asp Gly Asn Pro Asp Met Leu Ala Pro Phe 85 90 95Met Ala Gln Cys Gln Ile Phe Met Glu Lys Ser 
Thr Arg Asp Phe Ser 100 105 110Val Asp Arg Val Arg Val Cys Phe Val Thr Ser Met Met Thr Gly Arg 115 120 125Ala Ala Arg Trp Ala Ser Ala Lys Leu Glu Arg Ser His Tyr Leu Met 130 135 140His Asn Tyr Pro Ala Phe Met Met Glu Met Lys His Val Phe Glu Asp145 150 155 160Pro Gln Arg Arg Glu Val Ala Lys Arg Lys Ile Arg Arg Leu Arg Gln 165 170 175Gly Met Gly Ser Val Ile Asp Tyr Ser Asn Ala Phe Gln Met Ile Ala 180 185 190Gln Asp Leu Asp Trp Asn Glu Pro Ala Leu Ile Asp Gln Tyr His Glu 195 200 205Gly Leu Ser Asp His Ile Gln Glu Glu Leu Ser His Leu Glu Val Ala 210 215 220Lys Ser Leu Ser Ala Leu Ile Gly Gln Cys Ile His Ile Glu Arg Arg225 230 235 240Leu Ala Arg Ala Ala Ala Ala Arg Lys Pro Arg Ser Pro Pro Arg Ala 245 250 255Leu Val Leu Pro His Ile Ala Ser His His Gln Val Asp Pro Thr Glu 260 265 270Pro Val Gly Gly Ala Arg Met Arg Leu Thr Gln Glu Glu Lys Glu Arg 275 280 285Arg Arg Lys Leu Asn Leu Cys Leu Tyr Cys Gly Thr Gly Gly His Tyr 290 295 300Ala Asp Asn Cys Pro Ala Lys Ala Ser Lys Ser Ser Pro Ala Gly Lys305 310 315 320Leu Pro Gly Pro Ala Val Glu Gly Pro Ser Ala Thr Gly Pro Glu Ile 325 330 335Ile Arg Ser Pro Gln Asp Asp Ala Ser Ser Pro His Leu Gln Val Met 340 345 350Leu Gln Ile His Leu Pro Gly Arg His Thr Leu Phe Val Arg Ala Met 355 360 365Ile Asp Ser Gly Ala Ser Gly Asn Phe Ile Asp His Glu Tyr Val Ala 370 375 380Gln Asn Gly Ile Pro Leu Arg Ile Lys Asp Trp Pro Ile Leu Val Glu385 390 395 400Ala Ile Asp Gly Arg Pro Ile Ala Ser Gly Pro Val Val His Glu Thr 405 410 415His Asp Leu Ile Val Asp Leu Gly Asp His Arg Glu Val Leu Ser Phe 420 425 430Asp Val Thr Gln Ser Pro Phe Phe Pro Val Val Leu Gly Val Arg Trp 435 440 445Leu Ser Thr His Asp Pro Asn Ile Thr Trp Ser Thr Arg Ser Ile Val 450 455 460Phe Asp Ser Glu Tyr Cys Arg Tyr His Cys Arg Met Tyr Ser Pro Ile465 470 475 480Pro Pro Ser Leu Pro Pro Pro Ala Pro Gln Pro Pro Leu Tyr Tyr Pro 485 490 495Val Asp Gly Tyr Arg Val Tyr Gln Pro Val Arg Tyr Tyr Tyr Val Gln 500 505 510Asn Val Tyr Thr Pro Val Asp Glu His Val Tyr Pro Asp His Arg Leu 515 520 525Val 
Asp Pro His Ile Glu Met Ile Pro Gly Ala His Ser Ile Pro Ser 530 535 540Gly His Val Tyr Ser Leu Ser Glu Pro Glu Met Ala Ala Leu Arg Asp545 550 555 560Phe Val Ala Arg Asn Val Lys Asp Gly Leu Ile Thr Pro Thr Ile Ala 565 570 575Pro Asn Gly Ala Gln Val Leu Gln Val Lys Arg Gly Trp Lys Leu Gln 580 585 590Val Ser Tyr Asp Cys Arg Ala Pro Asn Asn Phe Thr Ile Gln Asn Gln 595 600 605Tyr Pro Arg Leu Ser Ile Pro Asn Leu Glu Asp Gln Ala His Leu Ala 610 615 620Thr Tyr Thr Glu Phe Val Pro Gln Ile Pro Gly Tyr Gln Thr Tyr Pro625 630 635 640Thr Tyr Ala Ala Tyr Pro Thr Tyr Pro Val Gly Phe Ala Trp Tyr Pro 645 650 655Val Gly Arg Asp Gly Gln Gly Arg Ser Leu Tyr Val Pro Val Met Ile 660 665 670Thr Trp Asn Pro His Trp Tyr Arg Gln Pro Pro Val Pro Gln Tyr Pro 675 680 685Pro Pro Gln Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro Ser 690 695 700Tyr Ser Thr Leu705291188DNAHomo sapiens 29ggggagctgg accaccggac cagcggcggg ctccacgcct accccgggcc gcggggcggg 60caggtggcca agcccaacgt gatcctgcag atcgggaagt gccgggccga gatgctggag 120cacgtgcggc ggacgcaccg gcacctgctg gccgaggtgt ccaagcaggt ggagcgcgag 180ctgaaggggc tgcaccggtc ggtcgggaag ctggagagca acctggacgg ctacgtgccc 240acgagcgact cgcagcgctg gaagaagtcc atcaaggcct gcctgtgccg ctgccaggag 300accatcgcca acctggagcg ctgggtcaag cgcgagatgc acgtgtggcg cgaggtgttc 360taccgcctgg agcgctgggc cgaccgcctg gagtccacgg gcggcaagta cccggtgggc 420agcgagtcag cccgccacac cgtttccgtg ggcgtggggg gtcccgagag ctactgccac 480gaggcagacg gctacgacta caccgtcagc ccctacgcca tcaccccgcc cccagccgct 540ggcgagctgc ccgggcagga gcccgccgag gcccagcagt accagccgtg ggtccccggc 600gaggacgggc agcccagccc cggcgtggac acgcagatct tcgaggaccc tcgagagttc 660ctgagccacc tagaggagta cttgcggcag gtgggcggct ctgaggagta ctggctgtcc 720cagatccaga atcacatgaa cgggccggcc aagaagtggt gggagttcaa gcagggctcc 780gtgaagaact gggtggagtt caagaaggag ttcctgcagt acagcgaggg cacgctgtcc 840cgagaggcca tccagcgcga gctggacctg ccgcagaagc agggcgagcc gctggaccag 900ttcctgtggc gcaagcggga cctgtaccag acgctctacg tggacgcgga cgaggaggag 960atcatccagt acgtggtggg 
caccctgcag cccaagctca agcgtttcct gcgccacccc 1020ctgcccaaga ccctggagca gctcatccag aggggcatgg aggtgcagga tgacctggag 1080caggcggccg agccggccgg cccccacctc ccggtggagg atgaggcgga gaccctcacg 1140cccgccccca acagcgagtc cgtggccagt gaccggaccc agcccgag 1188301188DNAOrcinus orca 30ggggaattgg atcaacgtac taccggtggc cttcacgcat accctgcacc acgcgggggc 60cctgtcgcga agccaaatgt catcctgcag attgggaagt gccgggctga gatgctggag 120cacgtccgtc ggacgcatcg tcatcttctt actgaggtgt caaaacaggt ggagcgtgaa 180ctcaaaggct tgcaccgcag cgttgggaaa cttgaaagca acttagatgg ctatgtgccg 240actggcgaca gccagcgttg gcgtaagtcc atcaaagcat gtttgtgtcg ttgccaggaa 300acgattgcaa acctggagcg ttgggtcaaa cgggagatgc atgtctggcg tgaagtattt 360tatcgtttag agcgttgggc cgatcgttta gagagcatgg gtggtaagta ccctgtgggg 420agcaaccctt ctcggcatac gacgtcagtc ggtgttggcg ggccggagtc ctacggtcat 480gaagcggaca cctacgacta taccgtaagc ccttatgcta ttaccccacc acctgcggcc 540ggcgaattac ctggccagga agccgttgag gctcaacaat accctccttg ggggctgggc 600gaggatggtc aacctagccc aggggtagac acgcaaatct ttgaggaccc acgggagttt 660ctttcccacc tggaagaata cctgcgtcag gttggtggga gcgaagaata ctggctgtca 720caaattcaaa accatatgaa tggtcctgca aaaaaatggt gggaatataa acagggttcc 780gtgaaaaact gggttgagtt taaaaaggag tttcttcaat attccgaggg cgccctcagt 840cgggaggcgg tccaacgcga gttggacttg ccacagaaac agggggaacc actcgatcaa 900ttcctttggc ggaaacgtga cctttaccag acattgtacg tggatgcaga tgaggaagaa 960attatccaat atgttgtggg gaccctgcag ccgaaactga aacgtttcct tcgcccgccg 1020ctgcctaaaa cgttggaaca acttattcag aaaggtatgg aggtcgagga tggcttagaa 1080caagtcgcag agccggcctc gccacacttg cctacagagg aggaatcgga ggcgctgacc 1140ccagcactta catcagagtc agtggcatca gaccggacac aaccagag 1188311170DNAOdocoileus virginianus texanus 31ggggagttag atcaccgtac aacggggggg ttgcacgcat accctgctcc acgtggcggg 60ccggcagcta agccaaacgt aatcctgcag attgggaagt gccgggcaga gatgttggag 120cacgtccggc ggacccaccg gcacctcctg gctgaagtgt ctaaacaagt agaacgggaa 180ctcaaaggtc ttcatcgtag cgtcgggaaa ttggaatcga atttggacgg gtatgttcct 240acaggcgact cacagcggtg gaaaaagagc atcaaggcct 
gcctgagtcg ctgccaggag 300acgattgcta acctcgaacg ctgggttaag cgggagatgc acgtttggcg cgaagtcttc 360taccggctgg agcgttgggc tgatcggctc gaatctggtg ggggtaagta tccagttggg 420tccgaccctg ctcgccacac agtctcagtt ggcgtaggtg ggccggagtc gtattgccaa 480gatgcggaca actatgatta tacagtttcc ccatacgcga tcacaccacc gccggcagca 540gggcagctgc caggtcagga agaggttgag gcccagcagt atccaccatg ggccccaggg 600gaagacggcc agctttctcc tggggtggac actcaagttt ttgaagatcc gcgtgaattt 660ctgcggcatt tagaagatta tctccgccag gtcggggggt ctgaagagta ttggttaagc 720caaattcaaa accatatgaa cggcccggcc aagaagtggt gggagtacaa gcaagggtct 780gtgaaaaatt gggtggagtt taagaaagaa ttcttgcaat attctgaggg cactctttcg 840cgtgaagcca tccaacgcga actcgactta ccgcagaaac aaggggaacc tctcgaccaa 900tttctgtggc gcaaacgcga cctgtaccag actctttacg tcgatgctga ggaggaagaa 960attattcaat acgtagttgg cacactgcag cctaagctta aacggttttt acgtccacca 1020ttgccgaaga cgcttgaaca actcatccag aagggtatgg aggttcaaga tggtctggaa 1080caggcagcgg aaccagcggc ggaggaggca gaagccctga cacctgcgtt aactaacgag 1140tctgtcgcga gcgaccgcac ccagccggaa 1170321203DNAOrnithorhynchus anatinus 32ggggaattag accgcctgaa cccaagctca ggcctgcatc catcctctgg tttgcatcca 60tacccaggtc tccggggcgg ggcaaccgcg aagcctaatg tcattttgca aattggcaaa 120tgccgtgcgg aaatgcttga acacgtccgc aaaactcacc gtcatctcct cacagaagta 180tcgcgccaag tagaacgcga gctcaaaggc cttcacaaaa gtgttggcaa gttggaatca 240aatcttgatg ggtacgtacc gtcaagcgac tcccaacgct ggaagaaaag cattaaggcg 300tgcttatccc gttgccaaga gacgattgcg catttagaac gctgggttaa acgtgaaatg 360aatgtatggc gtgaggtgtt ctaccgtttg gaacgttggg cggaccgtct ggaggctatg 420ggcggtaagt atcctgccgg tgagcaggcc cggcgtacag tttcagtggg cgttgggggc 480cctgagacat gttgtccagg ggatgaaagt tatgattgtc cgatttctcc gtatgcagtt 540ccaccttcca ccggcgagtc tccggaatcc ttagaccaag gggatcagca ctatcagcag 600tggtttgccc tcccggagga gtcccctgtt agccctgggg ttgataccca gatctttgaa 660gatcctcgcg agtttttacg tcatctggag aagtacctga aacaagtcgg cgggacagag 720gaagactggc tttctcaaat ccagaatcac atgaatgggc cggcgaagaa gtggtgggag 780tacaagcaag ggagtgttaa gaattggctt 
gaatttaaga aggaattttt acagtattcg 840gagggcacac tgacgcggga cgcgttgaaa cgtgaactgg atctcccaca gaaacaaggc 900gaaccacttg atcaattttt atggcggaag cgcgacttat atcagacact ctacgttgac 960gccgatgaag aggaaatcat tcagtacgtc gtgggcactc ttcagccgaa attaaaacgc 1020tttctccatc acccactccc taagacgctt gagcagctta tccaacgggg ccaagaagtt 1080cagaatggtc tggagcctac cgacgatcct gcaggccaac gcactcaatc ggaggacaac 1140gacgaaagcc ttacccctgc cgtcaccaat gagagtactg caagcgaggg caccctgcca 1200gag 1203331212DNAAnser cygnoides domesticus 33gggcagcttg ataacgttac aaacgcgggc atccactcct tccaggggca tcgtggcgta 60gcgaataagc caaatgtcat tctgcaaatt ggtaaatgtc gtgcggaaat gctggagcac 120gttcgccgca cccaccgcca tttattatct gaagtatcta agcaggtaga acgtgagctg 180aaagggctgc aaaagtccgt gggcaagctc gagaataact tggaggatca tgtccctaca 240gataaccaac gctggaagaa gtccattaaa gcgtgcttgg ctcgttgtca agagactatc 300gcgcatttag agcgttgggt gaaacgcgaa atgaacgtct ggaaggaggt gtttttccgg 360ctggaaaagt gggcagaccg gctggagtca atgggtggca agtactgccc gggcgaacac 420gggaaacaaa ccgtcagtgt aggcgtgggg ggtcctgaaa tccggccttc ggagggggaa 480atttatgatt atgctctgga tatgagccag atgtatgcac tcaccccacc tccaggcgaa 540atgccatcaa tcccacaagc ccatgacagc tatcagtggg ttagtgtctc agaagatgcc 600ccggcgagcc ctgtcgaaac ccaggtattt gaggaccctc gggaattcct gtctcacctg 660gaggaatacc tgaagcaggt aggcggcacg gaggagtatt ggttgtccca gatccagaat 720cacatgaatg gtccggcaaa aaaatggtgg gaatataaac aggactccgt taaaaactgg 780gttgagttta aaaaggaatt cttgcaatac tctgaaggta ctttaactcg ggatgctatt 840aagcgtgaac tcgacttgcc gcaaaaggaa ggtgaacctc ttgaccaatt cctttggcgg 900aagcgggacc tctatcagac actttacgtg gacgcggatg aggaggagat cattcagtat 960gtggtcggta ccctgcagcc gaagctcaag cgtttcctga gctatcctct cccaaagact 1020ttagaacagc tcatccagcg cggtaaagaa gtgcagggta acatggatca ctccgatgag 1080ccttcgccgc agcgtacacc tgaaattcaa tcaggtgact ccgtagaatc tatgccacct 1140tcaacaacgg catctccggt tccatctaat ggtacccaac ctgagccgcc gagcccgcca 1200gccaccgtta tc 1212341185DNAPelecanus crispus 34gggcaacttg acaacgtaac aaacgctggg attcactcct ttcagggcca ccgcggtgtc 
60gccaacaagc caaacgtaat cttgcaaatt ggcaaatgcc gtgcggagat gttggaacac 120gttcgtcgta cacatcgtca cttgctgtcg gaagtctcta aacaagtaga acgtgaactt 180aaagggcttc aaaagtcagt cggcaaattg gaaaacaacc ttgaagacca tgtaccaacc 240gacaatcagc gttggaaaaa gtctatcaaa gcttgcctgg cccgttgtca agagacgatt 300gctcacctgg agcggtgggt aaagcgcgag atgaatgtgt ggaaagaggt cttcttccgc 360ttggaaaaat gggccgaccg tttggagtcc atgggcggta aatattgtcc gggtgaacat 420ggtaagcaaa cagtctctgt gggcgttggt gggccggaga ttcggccttc tgaaggcgag 480atttacgatt atgcgctcga catgtcccag atgtatgcgc ttacaccacc accgggcgag 540gtaccaagca ttcctcaagc gcatgacagt tatcagtggg ttagcgtatc cgaagacgct 600cctgcctcgc cggtagagac ccaggttttt gaagatcctc gtgaattttt aagccacttg 660gaggagtatt tgaagcaggt aggggggaca gaggaatatt ggctgtctca gatccagaac 720cacatgaatg gcccggctaa aaagtggtgg gaatacaaac aagattcggt aaagaattgg 780gtagaattta aaaaggagtt tttacagtac tcagagggga ctctcacgcg tgatgcgatc 840aaacgcgagt tggatcttcc tcaaaaagag ggggagccac tcgatcagtt cctctggcgc 900aagcgggatc tctaccaaac actctacgta gacgcagacg aagaagagat catccagtac 960gtggtgggta cgctccagcc gaaactcaaa cgtttcctca gctacccact tcctaagact 1020ctggaacaac tgattcagcg gggcaaagag gtccagggta acatggacca ttcagaggaa 1080cctagtccgc aacgtacacc tgagatccaa tctggggatt ctgtcgattc ggttccacct 1140tctacaacag cgtctccggt gccgtcaaat gggacccaac cagag 1185351185DNAHaliaeetus albicilla 35gggcagcttg ataatgtaac caatgcaggt atccactctt tccagggtca ccgcggtgtg 60gcaaacaagc caaatgttat tctgcaaatt ggtaagtgtc gcgctgagat gttagaacac 120gtccggcgca cgcatcggca tctcctgtca gaggtttcaa agcaggtaga gcgtgaatta 180aagggcctcc agaagtccgt aggtaaactc gaaaataatc ttgaagacca cgttcctacc 240gataatcaac ggtggaaaaa gtcaatcaag gcgtgcttag cacggtgtca ggaaacgatc 300gcgcacctcg aacgttgggt gaagcgcgaa atgaatgtct ggaaagaagt gttcttccgg 360cttgagaagt gggctgatcg gctcgaatcc atgggtggca aatattgtcc aggtgatcat 420ggcaagcaaa cggtctccgt cggtgttggt ggtccggaaa tccggccgag cgagggtgaa 480atctatgact acgctcttga tatgtcccag atgtatgcac tcactcctcc gccgggtgag 540gtcccgtcga tcccgcaggc gcatgactca taccaatggg 
tgtcgactag cgaagacgca 600ccagcctccc ctgttgaaac tcaagtattc gaggacccgc gtgagttcct gagccattta 660gaggagtacc ttaagcaggt tggtggtacc gaggaatact ggttgagcca gattcagaat 720cacatgaacg ggccggctaa gaaatggtgg gaatacaagc aggattcagt caagaattgg 780gtcgaattta agaaggagtt tttgcagtac agtgagggga cgctcacacg cgacgctatc 840aaacgggagc tggacctgcc acaaaaggag ggtgaaccgc ttgatcagtt tctttggcgc 900aagcgtgatc tgtatcaaac cctgtatgtg gacgctgacg aagaagagat cattcagtac 960gtggttggga ctctgcaacc aaagctgaag cgttttcttt cttatcctct ccctaagaca 1020ctggaacagt taatccaacg tggcaaggag gtccagggta atatggacca ctctgaggaa 1080ccgagcccgc aacgtactcc tgaaattcag agcggggata gtgtcgactc agttcctcca 1140agtacgaccg catccccggt cccaagtaac ggtacccaac cagag 1185361395DNAOphiophagus hannah 36gggtcttggg gcttgcaacg tcacgtggct
gatgaacgtc gtggcctcgc tacgcctacc 60tacggcgcgg tttgttccat tcgggagaaa aaagcctccc aactgagcgg ccagagctgt 120ttggagaaag agttgcttgg ttggaaatgt acggaggcaa tcgtggaaat gatgcaagtc 180gataacttta accacggtaa cttacatagc tgccaaggcc atcgggggat ggcaaatcac 240aaaccgaacg taatccttca aatcgggaaa tgtcgcgcag aaatgttaga ccacgtgcgt 300cgcacccacc gccatctctt gacggaggtt tcgaagcagg tagaacgcga attgaagtct 360ctccaaaagt cggttggcaa gctcgagaat aatctggaag accacgtgcc atcggcagcg 420gagaaccaac gttggaagaa atcaattaaa gcctgcctgg cccggtgcca agaaacaatt 480gctcacctcg aacgctgggt taaacgcgaa atcaacgtct ggaaagaagt attctttcgt 540ctggagaagt gggcggaccg ccttgagtcg ggtgggggca agtatgggcc tggtgaccaa 600agtcgtcaaa ctgtaagtgt cggtgttggg gccccagaaa tccaaccgcg gaaagaagaa 660atctatgact acgctctcga catgtcgcag atgtatgcct taacaccacc gccgatgggt 720gaagacccaa acgtacctca atcccacgat agctaccagt ggattaccat ctcagacgat 780tcacctccgt cgccagtgga aactcaaatt ttcgaggatc cacgcgaatt ccttacccat 840ctcgaggatt atcttaagca agtgggcggg actgaagaat attggttgag tcagattcaa 900aatcatatga acggtccggc caagaaatgg tgggagtaca aacaagattc cgtgaaaaac 960tggttggaat tcaagaagga attccttcaa tactctgagg gtactttgac acgtgacgca 1020attaaacaag aacttgactt accgcagaag gacggcgagc cattggatca atttctttgg 1080cggaagcggg acctgtatca gacgctctat attgatgcag aggaggaaga agtaatccaa 1140tacgttgttg gcacactcca accgaaatta aaacgtttcc tttcccaccc gtatccgaaa 1200actttggaac agttaatcca acgtgggaaa gaggtggaag gcaacctcga taactctgag 1260gagcctagcc cgcaacggag tccaaagcac caattgggtg gtagcgtcga gagcctccca 1320ccttcgtcga ccgcaagtcc tgttgcgtca gacgagactc acccagacgt gagcgcacct 1380ccggtaacgg tgatt 1395371353DNAAustrofundulus limnaeus 37ggggacggcg agactcaagc tgagaatcca tctaccagct tgaacaacac tgacgaagat 60atcttggaac agctcaagaa aattgtcatg gatcaacaac acctgtatca gaaagaatta 120aaggcatctt ttgaacaact cagtcgcaaa atgttttccc agatggaaca aatgaatagc 180aagcaaacgg atctgctttt agaacatcaa aaacagactg tcaaacatgt agacaagcgc 240gtggagtatt tgcgggcgca attcgatgca tcgttaggct ggcggttgaa agagcaacac 300gcggatatta cgaccaaaat cattcctgag 
atcatccaaa cggtgaagga agatattagc 360ctgtgtcttt ctacgctctg cagtatcgct gaagatatcc agacatcacg ggctaccact 420gtcacagggc atgctgccgt acaaacccat cctgtggatc ttttgggtga acaccattta 480gggaccacgg ggcacccacg cttacagtcg acccgtgtag ggaaaccaga cgacgtacct 540gagtcgccgg taagcctgtt tatgcaaggt gaggcgcgtt cccggatcgt tggcaagagt 600ccgattaaac tgcaatttcc gacgttcggc aaagcaaacg attcttccga cccactccaa 660tatctggagc ggtgtgagga ctttcttgct cttaaccctt taactgatga ggaacttatg 720gctactttgc ggaatgtgtt acatggcacc tctcgggatt ggtgggatgt cgcacgtcat 780aaaatccaaa cttggcgtga gtttaataaa cacttccggg cggctttcct cagcgaggat 840tatgaagatg agttggctga gcgcgtccgt aaccgcatcc aaaaagaaga tgagtctatc 900cgcgatttcg cttatatgta tcagtccttg tgcaagcggt ggaaccctgc tatctgcgaa 960ggtgatgtag taaagctcat cctgaagaac atcaatccac aactgccgtc tcagttacgc 1020tcccgggtca cgaccgtgga tgagcttgtt cgcttgggcc agcagcttga aaaagatcgt 1080cagaatcagc tccaatatga gcttcggaag agttccggca aaattatcca aaaatctagt 1140tcgtgcgaaa cttcagcgct cccgaacacg aagagtacac ctaatcaaca aaaccctgct 1200accagtaacc gtcctccaca ggtgtattgc tggcggtgta agggtcacca tgcccctgcc 1260tcttgtccgc aatggaaagc tgataagcac cgtgcgcaac cttcgcggag ttctgggcca 1320caaactctga ctaatctcca agctcaagac atc 1353381188DNAPhyseter catodon 38ggggaattgg atcaacgtgc ggcagggggc ttgcgcgcgt acccggcgcc gcgtggtggt 60ccagttgcca aaccgagcgt aattcttcag attggtaagt gccgcgctga gatgctggaa 120cacgtccgcc gcacgcatcg ccatcttctg acggaggtaa gtaaacaagt ggagcgcgaa 180ctcaaggggt tacatcggtc tgtcggtaag ttggagggca atttagacgg ctatgtgcct 240accggtgatt cccaacgctg gaaaaaaagt atcaaggcgt gtctctgccg gtgtcaggaa 300acaattgcaa atctcgagcg ttgggtgaaa cgtgagatgc atgtttggcg tgaggtattc 360tatcgtttgg aacggtgggc agaccgtttg gagtctatgg ggggcaagta tccggtgggc 420actaacccgt cgcggcacac agtaagtgtc ggggtagggg gcccggaagg ctattctcat 480gaagcggata cttatgacta cacggtgtct ccgtatgcta tcacgccacc gcctgccgcg 540ggtgagttgc ctggtcaaga ggctgtcgag gcacaacagt accctccatg gggtctgggg 600gaggacgggc aaccaggtcc gggcgtggac acgcagattt ttgaggaccc tcgcgaattt 660ttgagccact tagaggagta 
cctgcggcaa gtagggggga gtgaagagta ctggttatcg 720caaattcaaa atcatatgaa tggccctgcg aagaaatggt gggagttcaa acaggggtca 780gtcaagaatt gggtcgagtt taagaaagaa tttttgcaat acagtgaggg tacgttgagt 840cgcgaggcca tccaacgtga actggacctc cctcagaagc agggggagcc gttagatcaa 900tttttatggc ggaaacgtga cttataccaa accctctacg ttgacgctga ggaagaagaa 960attattcaat atgttgtcgg tacgctgcag ccaaagctga agcggttcct ccgtcctcca 1020ctccctaaaa ccttagaaca attaatccaa aaaggcatgg aagttcagga cgggttagaa 1080caagcggccg aaccggcctc tccgcgtctg ccgccggaag aggagagtga ggctcttacg 1140cctgcgctca cgagcgaatc agtagcctcc gatcggacac agccagag 1188391212DNAMeleagris gallopavo 39gggcagcttg acaatgtgac gaacgcgggg attcacagct ttcaagggca ccgcggcgtc 60gccaacaaac cgaatgtcat tctgcaaatc ggtaaatgtc gtgctgaaat gcttgagcac 120gttcgtcgta cccatcgtca cttgctttct gaagtatcaa aacaagtgga gcgggaactc 180aaaggcctgc aaaagtcagt gggtaaattg gagaataacc tcgaagacca tgtacctaca 240gacaaccagc ggtggaaaaa atctatcaag gcatgcctcg ctcgttgcca ggagactatt 300gcccatcttg agcggtgggt gaaacgtgaa atgaacgtat ggaaggaagt attttttcgc 360ttagagaagt gggctgatcg tcttgaatcg atgggcggca agtactgtcc tggggaacac 420ggcaaacaaa ctgtatctgt cggcgtgggg ggcccggaga tccggccatc ggaaggggaa 480atttatgatt atgctctcga catgtcccaa atgtatgctc tcacaccagg gccaggggaa 540gtaccgtcaa ttccgcaagc acacgacagc taccaatggg tatctgtgag cgaggacgcg 600cctgcctctc cggttgagac gcaaatcttt gaggacccac atgaattttt gtctcatctt 660gaagaatatc tcaaacaggt tggcggcaca gaagaatact ggttatctca gatccagaat 720cacatgaacg gcccggctaa aaagtggtgg gagtataagc aagattccgt aaagaactgg 780gtcgaattca agaaagagtt tcttcaatac tctgagggta ctctgacgcg cgatgcaatt 840aagcgggagt tagaccttcc acaaaaagag ggggagcctc ttgaccagtt cctgtggcgt 900aagcgcgacc tctatcagac actttacgtc gacgctgatg aagaagagat tattcaatat 960gttgtgggta ccctgcagcc aaagcttaag cgtttcctta gctacccact tccgaaaact 1020ctggagcagc tcattcaacg cggtaaggaa gtgcagggca acatggacca ctctgaagag 1080cctagcccgc agcgcactcc tgaaatccaa tcaggtgaca gtgtggagtc aatgccgccg 1140tcaaccaccg cttctccggt acctagcaac gggacgcaac cagagcctcc aagcccaccg 
1200gctacagtca tc 1212401227DNAPogona vitticeps 40gggcaacttg agaatattaa ccaaggttcc ctgcacgcgt ttcagggtca tcgcggcgtg 60gtccataaca acaagcctaa cgttattctc cagatcggga agtgccgcgc cgaaatgctg 120gagcatgtgc ggcgcaccca tcgccatttg ctcactgaag tatcaaaaca ggtggagcgt 180gagttgaagg ggttgcagaa aagtgtaggc aaacttgaaa ataatttaga agaccacgta 240ccaagtgcgg ctgagaacca acgctggaag aagtcgatta aagcctgctt agcgcgttgt 300caggagacca ttgcgaactt ggaacgctgg gttaaacgtg agatgaatgt ttggaaggag 360gtctttttcc gcttagagcg ctgggcagat cgcctcgaat ccgggggtgg caagtactgc 420catgcagacc agggtcgcca aactgtcagc gtaggtgttg gtggtcctga agtgcgtccg 480tctgaaggtg aaatttacga ttacgcgttg gatatgagcc aaatgtacgc cttgactccg 540ccgcctatgg gtgatgttcc agtaattcct cagccgcatg acagttatca gtgggtgaca 600gatccggaag aagcgccacc aagtccggtt gagacacaaa ttttcgagga ccctcgggag 660tttctgaccc atcttgagga ttatttaaaa caagtcggcg ggacagagga atattggctc 720tcacagatcc aaaatcatat gaatgggcca gcgaaaaagt ggtgggaata taaacaggat 780agtgtgaaga actggcttga gttcaaaaaa gaattcttgc agtactcaga aggcacgtta 840acgcgggacg ctattaaaca ggaacttgac cttccacaaa aagaagggga accgctggat 900caattcctct ggcgcaaacg cgatttgtac caaactctct acgtcgaggc agaagaagag 960gaggtcatcc aatatgtagt tggcacactg caaccaaaac tgaagcggtt tctttctcat 1020ccgtacccta aaaccctgga gcaactcatc cagcgcggga aggaagttga ggggaatttg 1080gacaatagtg aagaaccgtc tccacagcgg accccagaac atcagctggg ggacagtgtg 1140gaatctttgc cgcctagtac tacggcttcg cctgccggtt cggataaaac gcaacctgag 1200attagcttac ctccaactac agtcatt 1227411212DNAAlligator sinensis 41gggcaattag attcggtaac caatgcgggc gtccacacct accagggcca tcggagcgtc 60gccaataaac ctaacgtcat tcttcaaatc gggaaatgtc ggactgagat gctggagcat 120gtccgtcgga ctcatcgcca cctgctcaca gaagtgtcaa agcaagtgga acgtgaactc 180aagggcttac agaagagcgt gggcaaactg gaaaacaatc ttgaagacca tgtcccaact 240gacaatcagc ggtggaagaa gtcaatcaag gcatgtctcg cgcgttgcca agagaccatt 300gctcaccttg agcggtgggt gaaacgtgaa atgaacgtgt ggaaggaggt gttcttccgg 360ttagaacgct gggccgaccg ccttgaatca atgggtggta aatactgccc gacggactct 420gcacgtcaga 
cagttagcgt tggggtgggg ggcccggaaa ttcggcctag tgaaggcgaa 480atctatgact acgcgctcga tatgagccaa atgtacgctc ttacgccgtc accgggcgaa 540ttgccgtccg tccctcaacc gcatgattca taccagtggg tcactagtcc ggaagacgct 600ccggcgtcac cagttgaaac gcaggtattc gaggatcctc gggagttctt gtgtcatttg 660gaagagtacc tgaagcaggt tggcggtaca gaggaatatt ggctgagcca gattcagaat 720catatgaatg gtcctgcaaa aaagtggtgg gaatataaac aagacacggt taagaattgg 780gtggaattca agaaggagtt cttacaatac agtgagggta cacttacccg tgatgcgatt 840aagcgggaat tagacctccc gcaaaaggac ggtgagcctc tggatcaatt tttatggcgt 900aagcgtgacc tctatcagac attatacatt gatgccgatg aagaacagat cattcagtac 960gtcgtgggga cattgcaacc taaactcaag cggttcttgt cctatccact tccaaaaact 1020cttgaacaat taatccagaa agggaaggag gtgcagggtt cacttgacca cagcgaggag 1080ccgagtcctc aacgtgcgag cgaggctcgg acgggcgata gtgtggaaac cttgccgcct 1140tctaccacta catcaccaaa tacgtcatct ggtacacagc cagaggcacc atcgcctcca 1200gcgacggtaa tc 1212421212DNAAlligator mississippiensis 42gggcagttag acagtgtgac taacgccggg gtgcatacgt accaggggca ccgcggggtc 60gccaataagc caaatgtaat tctccagatt gggaagtgtc gtacagagat gttggaacat 120gtccgtcgca ctcatcgcca cttgctcacc gaggtctcca aacaagtaga acgcgaactc 180aaggggctcc agaagagtgt tgggaagttg gagaataacc tcgaagacca cgttccgaca 240gataaccaac ggtggaaaaa gtctattaaa gcctgtctcg cccgttgtca agagacaatc 300gcacacttgg aacgctgggt caaacgggag atgaatgtgt ggaaggaagt cttcttccgt 360ctcgagcggt gggcggatcg tttagaaagt atgggcggta aatattgccc aactgactcg 420gctcgtcaaa cggtgtcggt tggcgtaggc ggcccggaaa ttcgccctag cgagggtgag 480atctatgact atgcacttga catgagtcag atgtatgcgt taactccgtc gccaggggag 540cttccaagta ttccacagcc tcacgatagt tatcaatggg taacttctcc tgaagacgcc 600ccagcatccc cagttgagac acaagtattc gaggaccctc gtgagtttct ctgtcacctc 660gaggagtacc ttaaacaggt aggcgggacc gaagagtact ggttatcgca aatccaaaac 720catatgaatg gtcctgccaa aaagtggtgg gagtataaac aagatactgt gaagaattgg 780gtagagttca agaaagagtt cttacagtac tctgagggga cgttaactcg tgatgcgatc 840aagcgcgaat tggatttacc tcagaaggac ggcgagccac tcgaccagtt cttatggcgc 900aagcgtgact 
tgtatcaaac cctttatatc gatgctgacg aggaacaaat tatccagtac 960gtagtcggta cgttgcaacc aaaacttaaa cgctttctga gctacccatt acctaaaacg 1020ttggagcaac tgatccagaa aggtaaagag gtgcaaggga gcctggatca tagtgaagaa 1080ccgagccctc agcgggcttc tgaagctcgg accggtgata gcgtcgaatc tttaccacct 1140agtaccacaa ccagcccgaa tgcgtcatct ggtacccaac ctgaagcgcc ttccccacct 1200gctacagtca tt 1212431224DNAGekko japonicus 43gggcagctcg agaatgtcaa ccatgggaac ctccattctt ttcaaggtca tcgcggcggc 60gtcgccaaca agccaaacgt tatcttgcag atcggtaaat gtcgtgcaga gatgctggac 120cacgtccggc ggacccaccg gcatttactg acagaggtat cgaaacaggt tgaacgtgag 180ttgaaggggt tacagaaatc agtagggaaa ttagaaaata acttagaaga ccatgtccct 240tcagccgttg aaaaccagcg ttggaaaaaa tcgatcaagg cctgcctttc ccgctgccaa 300gagaccattg cccaccttga gcgttgggtg aagcgcgaga tgaacgtatg gaaagaggtt 360ttcttccgct tagagcggtg ggcagatcgg ttggaatctg ggggcgggaa atattgtcac 420ggtgataatc atcgtcaaac agtatcagtc ggtgttggcg gccctgaggt acgtccatct 480gaaggcgaaa tttacgatta cgctctcgac atgtcgcaaa tgtacgcttt aacaccgcct 540agcccagggg atgtgcctgt agttagccag ccgcacgaca gctatcagtg ggttacggtt 600ccggaggata cccctccatc cccggtggag acgcaaatct tcgaggaccc acgggagttc 660ttgacccact tagaggatta cttaaagcaa gtggggggta cagaggaata ttggttatct 720cagatccaga atcacatgaa cgggccagcc aagaagtggt gggagtataa gcaagactca 780gtaaaaaatt ggctcgagtt taagaaggaa ttccttcagt attccgaggg gacacttacg 840cgcgacgcta tcaaggaaga acttgacctc ccgcaaaagg acggggaacc tcttgatcag 900ttcctgtggc gcaagcgcga cttgtaccag accctgtacg tggaggcgga tgaggaggag 960gtgatccagt atgttgtggg gactttacaa cctaaattaa agcgttttct ctcacaccct 1020tacccgaaaa cgttagagca acttatccaa cggggcaaag aggtggaagg gaacctcgac 1080aattcagagg aaccaacacc tcagcgtact ccagaacacc aactgtgtgg ttctgtagaa 1140tcgctgcctc cttcctctac cgtcagtcca gtggctagcg atggtactca acctgagact 1200tcgccattgc cagcgactgt tatt 1224441365DNAHomo sapiens 44gggccattga cgttgttaca agactggtgt cgtggtgaac atttaaacac ccgccggtgc 60atgttgatcc tcggtatccc agaagattgc ggcgaggatg agttcgaaga gacacttcag 120gaggcgtgtc gccatttagg gcggtaccgc 
gtgatcggcc gcatgttccg tcgtgaggaa 180aatgcccaag cgatcctctt ggaattggcg caggatattg actatgcctt actccctcgg 240gaaatccctg ggaaaggcgg gccttgggag gtaattgtga agccgcgtaa ttccgacggc 300gaattcttaa atcggcttaa tcgctttctt gaagaggagc gccgtacggt ctccgatatg 360aaccgtgttt tgggctcgga tactaactgt tcagctcctc gtgtcaccat tagtcctgaa 420ttctggactt gggcacagac gctgggcgca gctgtccaac cattgctcga acagatgctc 480taccgggagt tacgggtctt cagtggcaat acgatttcca tcccaggtgc tctcgctttt 540gacgcgtggc tggagcatac cacggaaatg cttcaaatgt ggcaggtgcc tgaaggggag 600aaacggcggc gcttgatgga gtgtttgcgg gggccagccc tgcaagtcgt tagtgggtta 660cgtgcatcga atgccagtat cactgtcgaa gagtgtcttg ctgcactgca gcaggtattc 720ggtccagtgg aaagtcataa gattgcccaa gtaaagttat gcaaagctta ccaggaggct 780ggggaaaaag taagcagctt cgttttgcgt ttggagccac tgcttcagcg tgctgtagaa 840aacaacgtgg tcagtcgccg caatgtcaac caaacacgtc ttaagcgtgt tctgtcgggc 900gccacccttc ctgacaagct gcgtgataaa ttgaagttaa tgaaacagcg ccgtaaaccg 960ccgggtttct tggcgttggt taaactgtta cgtgaagagg aggagtggga ggccacctta 1020gggccagacc gcgagtcatt ggaggggtta gaagtggcac cgcgcccgcc agcacggatt 1080acgggtgttg gcgcagtacc tcttccggca tccgggaatt catttgatgc ccgtccttcg 1140caagggtacc ggcgccgtcg gggtcgtggt cagcaccgtc ggggcggcgt tgctcgtgca 1200ggctctcgtg gctctcgtaa gcggaaacgg cacaccttct gctattcctg tggtgaggat 1260ggccatattc gtgtccaatg cattaaccct agcaatctcc tgttggctaa ggagaccaaa 1320gagattttgg aagggggaga acgtgaagcg caaacgaatt cacgt 1365451344DNAHomo sapiens 45ggggctctta cgctcttaga agactggtgt aagggtatgg acatggaccc gcggaaggct 60ctcctgattg taggtattcc gatggaatgc agtgaggtgg aaatccagga tacagttaaa 120gctggtcttc aacctctgtg cgcttatcgt gtactcggcc gtatgttccg gcgggaggat 180aatgcgaagg ctgttttcat tgagctggca gacaccgtga attacaccac gttaccgtct 240cacattccgg gtaaaggggg ttcctgggaa gtcgttgtta aacctcggaa ccctgacgac 300gagttccttt ctcggcttaa ctacttcttg aaagatgagg gccgctcgat gacggatgtc 360gcccgggcac tggggtgctg tagcttacct gcggaatcac tggacgcgga agtaatgcca 420caggtccgct ccccaccatt agaacctcca aaagagagta tgtggtaccg taagttaaaa 480gtgtttagtg 
gtaccgcgtc gccttcgccg ggggaggaga catttgagga ctggttagag 540caagtcaccg agatcatgcc tatctggcaa gtatctgaag ttgaaaagcg ccgtcggtta 600ctggagtcac tccggggccc ggcactctca attatgcgcg tgttacaagc caataacgat 660agcattaccg ttgaacagtg tttggatgca ttaaagcaga tctttggcga caaggaagac 720ttccgtgcct ctcaatttcg ttttcttcaa acgtccccta aaattgggga gaaggtgagt 780acgttcctgc tgcgtttaga gccactcttg caaaaggccg ttcacaagag cccactttcg 840gtacgtagta ctgatatgat tcggttaaag cacctgttgg cacgcgtagc catgaccccg 900gcactgcgtg gtaaactcga attactcgac caacgcgggt gcccacctaa ttttcttgag 960ctgatgaagc tgatccggga tgaggaagag tgggagaata ctgaagctgt gatgaaaaat 1020aaagagaaac cttcaggtcg tggccgcggt gcatcaggcc gtcaagctcg cgccgaggcc 1080agtgtaagtg ctccgcaagc aacagtccaa gcacgtagct tctctgattc tagcccgcag 1140acgattcagg ggggcttacc acctcttgtc aagcgtcggc gccttttggg ttcggagagc 1200acacgtgggg aagaccacgg gcaagctact tatccgaaag cagagaatca gactccaggg 1260cgtgagggcc cgcaggcggc tggggaggaa cttggtaatg aggccggggc cggcgcgatg 1320tcccacccga aaccgtggga aacc 1344461197DNAHomo sapiens 46ggggctgtga caatgctcca ggactggtgc cgttggatgg gcgtgaacgc tcggcggggg 60ctgttaatct taggtatccc tgaagactgt gacgatgcag agttccaaga gtcgttagaa 120gctgcactcc gtcctatggg tcactttact gtactcggta aggccttccg cgaggaagac 180aacgctaccg ctgcgctggt ggaattagat cgcgaggtta attacgcact tgttccacgc 240gaaattccgg gcaccggcgg gccttggaac gtcgtgttcg ttcctcggtg ctccggcgag 300gaattcctgg ggttaggccg cgtgttccac tttcctgaac aggagggcca aatggtagaa 360tcggttgcgg gggcactggg ggtaggtctg cgccgcgtgt gttggttacg ctcgatcggg 420caagctgtac aaccatgggt agaagctgtt cgctgccaaa gcttaggggt atttagtggt 480cgtgatcaac ctgcacctgg tgaagaaagc ttcgaggtct ggttggatca tacgaccgag 540atgttgcatg tgtggcaagg cgtgtcggaa cgggaacggc gccgtcgtct gctggaaggg 600ctgcgtggca cagccttaca acttgtacat gccttactgg cagaaaatcc ggcacggaca 660gcacaagatt gcttggctgc attagcccaa gtttttggtg ataacgaaag ccaggcaacg 720attcgtgtta aatgtttgac agcccaacag cagagtggcg aacgcctctc tgcgttcgtt 780ctccgcttag aagtacttct gcaaaaggct atggagaagg aagcattggc gcgcgcgtca 840gcggatcggg 
tgcgtcttcg tcagatgctg acacgcgcac atctcacaga gccgttggat 900gaagccttac ggaaattgcg tatggcaggg cgttctccgt cttttttgga aatgctcggc 960ttagtacgcg agtcagaggc ctgggaggca agtctggctc ggtccgtccg ggcgcaaacc 1020caggagggtg caggggcccg ggcgggggcc caagcagttg cgcgtgccag cactaaggtt 1080gaagctgtac ctggtggccc tggccgggag ccagaaggtc tcctccaagc cgggggccaa 1140gaagcggaag aacttctcca agagggctta aagccggttt tagaggaatg tgacaat 1197471197DNAHomo sapiens 47ggggcggtca ccatgttgca agactggtgt cggtggatgg gcgtgaatgc tcggcggggt 60ttattgatct tgggtatccc agaagactgt gacgacgccg agtttcagga gtcgctcgag 120gccgcccttc gtccaatggg gcattttacg gttctgggca aggtgttccg tgaagaggat 180aacgctacag cagctcttgt ggagcttgac cgtgaggtga attatgcgtt agtacctcgc 240gagattccag gtaccggtgg gccatggaac gtagtcttcg tcccacgttg ctcgggggag 300gaatttctgg ggcttgggcg cgtattccac tttccagaac aggaagggca gatggtcgaa 360agcgtagcag gcgctcttgg cgttggtctc cggcgcgtgt gctggttacg ctccatcggc 420caagcagtcc aaccatgggt tgaagccgta cgctatcaat ctttaggtgt cttctcaggc 480cgtgaccagc cggcgcctgg tgaggaatcc ttcgaagtct
ggctcgatca tacaactgag 540atgctgcatg tatggcaagg tgtctcagag cgggaacggc ggcggcggtt attagagggg 600ctccgtggga ctgcgctcca attagtacat gcgcttttgg ccgaaaatcc agcccgtact 660gcccaagatt gtctggcagc actcgcccaa gtattcggcg acaacgaatc gcaggcaaca 720atccgcgtaa agtgtcttac agcacagcag cagtcagggg aacgtcttag tgcgttcgtt 780ctgcggctgg aagtgttact ccagaaagcc atggaaaagg aggcattggc tcgcgcgagc 840gctgaccgtg tacgtctgcg gcaaatgctt actcgcgcac atctcaccga gcctctcgat 900gaagcactgc ggaaactgcg catggcaggc cgcagcccgt ctttcctgga aatgttaggc 960ttagtccggg agtccgaagc ctgggaggcc agtctggcac ggtcagtgcg ggcacaaacg 1020caagagggtg caggggcacg ggcgggtgca caagcagttg cacgtgcctc cactaaagtt 1080gaggcagtgc cgggtgggcc aggccgtgaa ccggagggtt tgcgccaagc cggcgggcag 1140gaagccgaag aattactcca agaaggttta aaaccggttt tggaggaatg cgataac 1197481425DNAHomo sapiens 48ggggtggaag atttggcggc atcttacatc gtattaaagc ttgagaacga aatccggcag 60gcgcaggtcc aatggttaat ggaggaaaac gccgccctgc aggcccagat ccctgaactt 120caaaagtcgc aagccgcgaa ggagtatgat cttctgcgta aatcttcgga ggcgaaggag 180ccgcaaaaac tgccagaaca tatgaatcca ccggccgctt gggaagcaca aaagactcca 240gagtttaagg aaccacagaa acctcctgaa ccacaggatt tgcttccttg ggagccgcct 300gctgcctggg agttgcaaga agcaccggct gcccctgagt cactggctcc gcctgcaacc 360cgtgagtctc agaaaccacc tatggcgcat gaaatcccta ctgtattgga ggggcaaggg 420cctgccaaca cacaagacgc tacgattgct caagaaccaa agaatagcga gccgcaagac 480cctccaaata tcgagaaacc tcaggaagct ccggaatatc aagaaacagc ggcacagttg 540gagtttttag aacttcctcc acctcaggag ccactcgaac cgagcaatgc gcaagaattt 600ctcgagttgt cggctgccca ggagtcctta gaaggcctca ttgtagttga aacgtccgcg 660gcttcggagt tcccacaggc tcctatcggg cttgaagcca ccgactttcc gctgcagtac 720acgcttacct tctctggcga cagccagaag ttgccagaat ttttggtcca actctacagt 780tatatgcggg tacgtgggca cttataccct accgaggcgg cgttagtgtc gtttgtaggc 840aattgtttct cagggcgcgc gggctggtgg tttcagttgc ttttggatat ccagtcgcct 900ctgttagaac agtgtgaaag ttttatcccg gttctccaag acacatttga caatccggaa 960aacatgaagg acgcaaacca atgcatccac cagctttgtc agggcgaggg tcatgtggcc 1020acacacttcc 
acctcattgc acaagagctt aattgggatg aaagcacgct gtggatccag 1080ttccaggaag gcctggcctc atccatccag gatgaacttt cccatacatc gcctgctacc 1140aacctgagtg atctgattac tcaatgcatc tcattagagg aaaagcctga cccaaacccg 1200ttagggaagt cctcctcggc ggagggggat ggcccggaaa gtccgccagc agaaaaccaa 1260cctatgcaag ctgcgatcaa ttgtcctcac atttccgaag cagagtgggt tcgttggcac 1320aaaggccggc tttgtctcta ttgcggctat ccgggtcact tcgcacgtga ttgcccagtg 1380aagccacacc aggcgttaca ggcagggaac attcaggctt gccaa 142549717DNAHomo sapiens 49ggggtgcagc cgcagactag caaagctgaa tcgccggctc tcgctgcctc accgaacgca 60caaatggatg acgttattga tacattaacc tccctgcgtc tgacgaattc ggctctgcgg 120cgggaggcta gcactcttcg ggccgagaaa gcaaatttaa ctaatatgct cgagtcagtg 180atggccgagt taacgctgtt acggacccgt gcgcggattc cgggggccct gcagattacg 240ccaccaattt cgtctattac tagcaacggt actcgcccga tgacgactcc tccaactagt 300ttacctgaac cgttttctgg cgatcctggc cggttagctg gtttccttat gcagatggac 360cgttttatga tctttcaagc tagccggttt ccaggggagg cagagcgtgt tgcgttcctg 420gtgtcgcgct taactggcga agcagaaaaa tgggccattc ctcacatgca accagactct 480cctttgcgta acaactatca aggcttctta gcagagttac ggcggaccta taagagcccg 540ttgcgtcacg cccggcgggc gcaaatccgg aagacatcgg cctcgaaccg ggcagtccgt 600gaacgccaaa tgctttgccg gcaacttgca tcagcaggta caggcccatg cccggtacac 660cctgctagta acgggacttc cccggcaccg gcattaccag cacgggcgcg taactta 71750339DNAHomo sapiens 50ggggacggtc gggtacagtt gatgaaggct ttattggctg gccctttacg tccggcggca 60cgccgttggc ggaatcctat tccatttcca gagacttttg atggggatac tgatcgcctc 120ccggagttta tcgtccaaac ttcgtcctac atgttcgttg acgaaaatac tttctctaac 180gacgctctga aagtgacatt tctcattacc cggctgacag gtccagcctt gcaatgggtc 240attccgtaca ttcgtaaaga aagcccgctt cttaacgact atcggggttt cctggccgag 300atgaagcggg tttttgggtg ggaagaggac gaggacttt 33951339DNAHomo sapiens 51ggggaaggtc gggtgcaact tatgaaagcg ttgcttgccc gcccgcttcg tccagcagca 60cgtcgctggc ggaatccaat tcctttcccg gagacttttg acggggacac cgatcggctc 120ccagagttca ttgtgcagac gtcaagctat atgttcgtgg atgagaacac gttctctaac 180gacgcgttga aagtgacttt cttaattacg cgtttgactg 
gcccggcttt acaatgggtg 240attccataca ttaagaaaga gtcaccgctt ctcagtgatt atcgcggttt tttagccgag 300atgaagcggg tcttcgggtg ggaagaagac gaagacttt 339521092DNAHomo sapiens 52gggccgcgtg ggcgttgccg tcaacaaggt cctcggattc cgatttgggc agcggccaac 60tatgccaacg cccacccgtg gcaacaaatg gataaggctt cgccaggcgt tgcttacaca 120cctttggttg atccttggat tgagcggcct tgttgcggtg acacggtttg tgtgcgcacc 180acaatggaac agaagagcac agcgtcaggc acttgtggtg gtaagcctgc tgagcgtggt 240cctctcgcgg ggcatatgcc gagctcacgc ccacatcggg ttgatttctg ttgggttcct 300ggtagcgacc caggcacatt cgacggcagt ccatggctct tagatcgctt tttggcgcaa 360cttggtgatt acatgagttt tcactttgaa cactaccagg acaatatcag ccgtgtctgc 420gagattcttc gtcggttaac gggccgcgct caggcatggg ctgctcctta cctggacggg 480gaccttccac tgccagacga ctacgaattg ttttgtcaag accttaagga ggtagtacag 540gaccctaaca gtttcgccga gtatcacgcc gtggtgactt gtccactccc tcttgcttcg 600tcccaacttc ctgtagctcc tcagcttccg gtggtacgcc aataccttgc gcgcttcttg 660gagggccttg ctttggatat gggtacggcg cctcggtcac tcccggccgc tatggccaca 720ccggcagtct ccggctcgaa ctccgtttct cgttctgcct tatttgaaca acaactcaca 780aaggaatcca ctccaggccc gaaagagcca cctgttctcc ctagctcgac ttgctctagc 840aaaccgggtc ctgtcgaacc agccagttca caacctgaag aggctgctcc taccccggtg 900ccgcgtttgt cagagtcggc taacccaccg gctcagcgtc cagaccctgc tcaccctggt 960ggtcctaaac cacaaaaaac cgaagaggaa gttttagaaa ctgaggggga ccaggaagtt 1020agcctgggga cgccgcagga ggtcgtagaa gcgccggaaa caccaggtga accaccgctc 1080agccctgggt tc 109253438DNAHomo sapiens 53ggggttgatg aattggtgct cttgttgcac gcgctgttaa tgcgccatcg ggcgctttcc 60attgaaaatt ctcagttgat ggagcaactt cgcttgttgg tctgcgaacg ggcgagcctt 120cttcgtcagg tacgtccgcc gagctgtcca gtgccatttc ctgagacttt taacggggag 180tcatcacggt tacctgagtt catcgtccaa accgcaagct atatgttagt taatgaaaat 240cgcttttgca atgacgcaat gaaagtcgct tttttgatta gccttcttac tggtgaagca 300gaagaatggg tcgtcccata cattgagatg gattcaccaa ttcttgggga ctaccgtgcg 360ttcttggatg agatgaagca gtgttttggg tgggacgatg atgaagatga cgacgatgag 420gaagaggagg atgactat 438541647DNAHomo sapiens 54gggcctgtgg atttaggtca 
ggctttgggg ttgttgccat ccctcgctaa ggccgaagat 60tcccaattta gcgaaagcga tgcagcttta caggaggaat tgtcttctcc ggaaaccgca 120cggcaacttt ttcgtcaatt tcgctatcaa gtcatgtcgg ggcctcatga aacactgaaa 180cagttacgga agttatgttt tcagtggctg caacctgaag tccatacaaa ggaacaaatc 240ctcgaaattc tgatgctgga acagttcttg accattctgc ctggtgaaat tcagatgtgg 300gtccgcaagc agtgccctgg tagtggggag gaggcggtta cgttagtaga atccctgaaa 360ggtgatccac aacggctctg gcaatggatc tccatccaag tcctgggtca ggatatcctg 420tctgagaaaa tggagtcacc ttcttgccag gtgggcgaag tggagccaca cctggaagtt 480gtacctcagg aactggggtt agagaattca tcttcagggc cgggggaact tctttcgcac 540atcgtgaaag aggagtctga cactgaagca gagttggcgt tagcggcatc ccagccagct 600cgtttggaag aacggctgat tcgggatcag gaccttgggg cgtccctcct cccggcagca 660ccgcaggagc aatggcgtca attagacagc actcaaaaag aacaatattg ggacctgatg 720ctggagacct acggcaaaat ggtatccggc gcgggtatct cacacccgaa gtccgattta 780acgaactcaa ttgagttcgg tgaagagttg gcaggtattt atttacatgt aaacgaaaag 840attccgcggc ctacctgcat tggtgaccgc caagaaaacg acaaagaaaa ccttaatttg 900gaaaaccatc gtgaccagga attattacat gccagctgcc aggcctcggg cgaagtgcca 960tcccaggcat cgttacgtgg cttctttacc gaggacgaac ctggttgctt cggcgaaggg 1020gagaaccttc ctgaggcact tcagaatatc caggatgagg ggactggcga acagctgagc 1080ccgcaagaac gcattagtga aaaacagttg ggtcaacatt tgccaaatcc gcactcgggg 1140gagatgtcga cgatgtggct tgaagaaaaa cgggagacca gccagaaagg ccaaccacgt 1200gcaccaatgg cgcagaaatt gccaacgtgc cgcgaatgtg gcaaaacgtt ttatcgcaat 1260agtcaactta tctttcacca acgcacacac accggtgaga catattttca atgcaccatc 1320tgcaaaaagg cgtttctccg gtcatctgat ttcgtgaaac atcagcggac tcatactggc 1380gaaaaacctt gtaaatgtga ctattgtggc aagggcttta gtgattttag cgggcttcgg 1440catcacgaga agatccatac cggcgagaag ccatacaagt gtccaatctg tgagaaatct 1500ttcatccagc gcagtaattt taaccgccac caacgggttc acaccggtga aaagccttat 1560aaatgctcgc attgtggcaa gagcttcagc tggagctcct cgctcgataa gcatcaacgt 1620tcacatctgg ggaagaagcc gttccaa 1647551053DNAHomo sapiens 55gggactctcc gcttacttga ggattggtgt cgggggatgg acatgaaccc acgtaaggcc 60cttcttatcg ccgggatttc 
ccagtcatgt tcagtcgccg agattgaaga ggcgctccaa 120gccgggcttg ctcctttagg cgagtatcgt ctccttgggc ggatgtttcg ccgcgatgaa 180aatcgcaaag tagcgttggt tggtctcaca gctgaaacta gccatgcgct tgtacctaaa 240gaaattcctg gtaaaggcgg gatctggcgg gttattttta aaccaccgga cccggacaat 300acgtttcttt ctcgtttgaa tgagttcctc gcgggcgagg ggatgacggt gggggaactt 360agtcgtgctc ttggtcacga aaatgggtca ttagaccctg aacagggtat gattccggaa 420atgtgggcgc cgatgctggc acaggctctg gaggctctcc aaccggcttt acagtgcctt 480aagtacaaga agctgcgcgt tttttcaggg cgcgagtctc cagagccggg tgaggaggaa 540ttcggccgtt ggatgttcca taccacccag atgatcaaag cgtggcaggt gccggatgtc 600gagaaacgcc gccggctgtt ggaatcactc cgcgggccgg cacttgacgt tattcgggtt 660ctgaaaatta acaacccgtt aattacggta gatgaatgtt tgcaagcact tgaagaggtc 720tttggggtga ctgacaatcc tcgggaattg caagtaaaat acttaacgac ctaccataag 780gacgaggaga aattatcagc ctacgtactg cggctggaac cgctgctgca gaagctcgtc 840cagcgggggg ctattgaacg ggacgctgtt aatcaggctc gcctggatca ggtaatcgct 900ggggcggtac ataaaactat ccgccgtgag ctgaacctgc ctgaagacgg gccggcgcca 960ggctttcttc aactcctcgt tttgattaag gattacgagg cagctgaaga ggaggaagca 1020ttacttcagg ccattcttga agggaacttt act 1053562124DNAHomo sapiens 56gggacagaac ggcgtcgcga cgaattaagt gaagaaatta ataatcttcg tgaaaaggtt 60atgaaacaga gtgaggaaaa caacaatctt caatcccaag tccagaaact cactgaggag 120aatactacac tccgtgagca agttgaacct acacctgaag atgaagatga cgacattgag 180ttgcggggcg cagcagccgc agccgcgcct ccgccgccga tcgaggagga atgcccggag 240gatttaccgg aaaaatttga tggtaatccg gacatgttag cgccattcat ggcccagtgc 300caaattttta tggaaaagtc tacgcgcgat tttagtgtag atcgcgtacg tgtatgtttt 360gtgacgagca tgatgactgg tcgcgcagcc cgttgggcgt cagcgaaatt ggagcggtcg 420cactacctga tgcataatta cccggcgttc atgatggaga tgaaacacgt gtttgaagac 480ccgcagcggc gggaggtggc caaacgcaag atccggcggt tgcggcaggg catgggcagc 540gtaattgatt atagtaatgc gtttcaaatg attgcgcagg atctggattg gaatgaacct 600gctctcattg atcaatatca tgaagggctt agtgaccata ttcaagagga actctctcac 660ctggaagtgg ctaaatctct ctccgccctt attggccaat gcattcatat tgagcgccgt 720cttgcacgtg ctgctgccgc 
tcggaaaccg cgtagtccac cacgggcttt agtgctccca 780catatcgcgt cacaccatca agtagatcct actgagccag tggggggtgc acgcatgcgc 840ttaacccaag aagaaaagga acgtcgtcgt aagctgaatt tatgcctgta ctgcggcact 900ggtggccatt atgccgataa ctgtcctgcc aaagccagta agtcaagccc ggctgggaaa 960cttccaggtc ctgccgtcga gggcccttct gctaccggcc cagagattat ccgctccccg 1020caagacgatg cgtcgtcgcc tcatctccag gtaatgctcc aaatccacct ccctggccgg 1080cacacactct ttgtccgggc gatgattgac tctggggcgt ctggtaattt tattgatcac 1140gagtatgttg ctcaaaatgg tatccctctc cggatcaaag actggcctat tctggttgaa 1200gccatcgatg gccgtccgat cgcgagcggt cctgtggttc atgaaacgca tgacctcatc 1260gttgatctgg gtgaccaccg tgaagtatta tcctttgatg tgactcagtc accgtttttt 1320ccagttgttt tgggcgtccg ttggctttcg actcacgatc ctaacatcac gtggtcgaca 1380cggtcgattg tcttcgattc ggaatattgt cgttatcatt gccgcatgta ttcaccaatt 1440ccgccgtctc tcccgccgcc tgcgccgcaa cctcctctgt attacccggt ggacggttac 1500cgtgtttacc agccagttcg ctactactac gtacaaaacg tgtacacgcc tgttgatgaa 1560cacgtgtacc cagatcaccg cctggtcgac cctcatattg agatgatccc gggtgcgcac 1620tcgatcccat cgggccatgt ttattccttg tctgagccag aaatggccgc cttacgggat 1680tttgtggccc ggaatgtcaa agacggcctg attaccccga caattgcacc aaacggtgct 1740caggtgttgc aggtgaagcg gggctggaag ttgcaagtca gctatgattg tcgtgcgcca 1800aacaacttca ctattcagaa ccaatatcca cgtctcagca tccctaatct cgaggaccag 1860gcacatcttg caacatatac tgaatttgta cctcagattc ctggctatca gacttatcct 1920acgtatgctg cctacccaac atacccggta ggtttcgcat ggtacccagt aggccgggac 1980gggcagggcc gctctttata tgttcctgtc atgattacat ggaacccgca ttggtaccgc 2040cagcctccgg tcccacagta cccacctcct caacctccac cacctccgcc gcctcctcca 2100ccgccacctt cttactcgac atta 21245760DNAArtificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 57atgcatcacc atcaccatca cggctcaggg tctggtagcg aaaatctgta cttccagggg 605820PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 58Met His His His His His His Gly Ser Gly Ser Gly Ser Glu Asn Leu1 5 10 15Tyr Phe Gln Gly 20596PRTArtificial SequenceDescription of Artificial 
Sequence Synthetic 6xHis tag 59His His His His His His1 5606PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 60Gly Ser Gly Ser Gly Ser1 5617PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 61Glu Asn Leu Tyr Phe Gln Gly1 56226DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 62aagctcattt cctggtatga caacga 266325DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 63agggtctctc tcttcctctt gtgct 256425DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 64gctcaacctg ggaactgcat ctgat 256525DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 65taatcctgtt tgctccccac gcttt 256622DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 66ggcccctcag ctccagtgat tc 226725DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 67cctgttgtca ctctcctggc tctga 256824DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 68gccaagacat aagaaacctc gcct 246924DNAArtificial SequenceDescription of Artificial Sequence Synthetic primer 69gtgaatcaac atcctccctc cgtc 247025PRTArtificial SequenceDescription of Artificial Sequence Synthetic PeptideMISC_FEATURE(1)..(25)This sequence may encompass 1-5 "Glu Ala Ala Ala Lys" repeating units 70Glu Ala Ala Ala Lys Glu Ala Ala Ala Lys Glu Ala Ala Ala Lys Glu1 5 10 15Ala Ala Ala Lys Glu Ala Ala Ala Lys 20 257125PRTArtificial SequenceDescription of Artificial Sequence Synthetic PeptideMISC_FEATURE(1)..(25)This sequence may encompass 1-5 "Glu Ala Ala Ala Arg" repeating units 71Glu Ala Ala Ala Arg Glu Ala Ala Ala Arg Glu Ala Ala Ala Arg Glu1 5 10 15Ala Ala Ala Arg Glu Ala Ala Ala Arg 20 257250PRTArtificial SequenceDescription of Artificial Sequence Synthetic PolypeptideMISC_FEATURE(1)..(50)This sequence may encompass 1-10 "Gly Gly Gly Gly Ser" repeating units 72Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly1 5 10 15Gly Gly Gly Ser Gly Gly Gly Gly 
Ser Gly Gly Gly Gly Ser Gly Gly 20 25 30Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly 35 40 45Gly Ser 507340PRTArtificial SequenceDescription of Artificial Sequence Synthetic PolypeptideMISC_FEATURE(1)..(40)This sequence may encompass 1-10 "Gly Gly Gly Ser" repeating units 73Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser1 5 10 15Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser 20 25 30Gly Gly Gly Ser Gly Gly Gly Ser 35 407418PRTArtificial SequenceDescription of Artificial Sequence Synthetic Peptide 74Lys Glu Ser Gly Ser Val Ser Ser Glu Gln Leu Ala Gln Phe Arg Ser1 5 10 15Leu Asp7514PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 75Glu Gly Lys Ser Ser Gly Ser Gly Ser Glu Ser Lys Ser Thr1 5 107610PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 76Gly Gly Ala Ala Asn Leu Val Arg Gly Gly1 5 107710PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 77Ser Gly Arg Ile Gly Phe Leu Arg Thr Ala1 5 10785PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 78Ser Gly Arg Ser Ala1 5794PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 79Gly Phe Leu Gly1804PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 80Ala Leu Ala Leu1815PRTArtificial SequenceDescription of Artificial Sequence Synthetic PeptideMOD_RES(3)..(3)S-ethylcysteine 81Pro Ile Cys Phe Phe1 5825PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptideMOD_RES(3)..(3)Ser or ThrMOD_RES(4)..(4)Leu or IleMOD_RES(5)..(5)Ser or Thr 82Pro Arg Xaa Xaa Xaa1 5834PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 83Asp Glu Val Asp1846PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 84Gly Trp Glu His Asp Gly1 5858PRTArtificial SequenceDescription of Artificial Sequence Synthetic peptide 85Arg Pro Leu Ala Leu Trp Arg Ser1 5
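Several short synthetic entries above are related to one another, and those relationships can be checked directly. The minimal Python sketch below assumes the standard genetic code and the three-letter amino-acid codes used in this listing. It translates the oligonucleotide of SEQ ID NO: 57 and confirms that it encodes the peptide of SEQ ID NO: 58. It then shows that SEQ ID NO: 58 is Met followed by SEQ ID NOs: 59, 60 and 61. Finally, it expands the repeat-unit linkers described for SEQ ID NOs: 70 to 73 to their maximum stated number of units. The helper names (`translate`, `to_three_letter`, `repeat_linker`) are hypothetical and are used for illustration only.

```python
# Sketch: cross-check SEQ ID NOs: 57-61 and the repeat linkers of SEQ ID NOs: 70-73.

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# One-letter to three-letter amino-acid codes, as written in the sequence listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna: str) -> str:
    """Translate a DNA string (whitespace ignored) into one-letter amino acids."""
    seq = "".join(dna.split()).lower()
    if len(seq) % 3 != 0:
        raise ValueError("DNA length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def to_three_letter(protein: str) -> str:
    """Render a one-letter protein string in the listing's three-letter style."""
    return " ".join(THREE_LETTER[aa] for aa in protein)


def repeat_linker(unit: str, n: int) -> str:
    """Join n copies of a three-letter repeat unit, e.g. 'Gly Gly Gly Gly Ser'."""
    if n < 1:
        raise ValueError("a linker needs at least one repeat unit")
    return " ".join([unit] * n)


# Sequences copied from the listing above (residue numbering removed).
SEQ_57 = "atgcatcacc atcaccatca cggctcaggg tctggtagcg aaaatctgta cttccagggg"
SEQ_58 = ("Met His His His His His His Gly Ser Gly Ser Gly Ser "
          "Glu Asn Leu Tyr Phe Gln Gly")
SEQ_59 = "His His His His His His"
SEQ_60 = "Gly Ser Gly Ser Gly Ser"
SEQ_61 = "Glu Asn Leu Tyr Phe Gln Gly"

protein = translate(SEQ_57)
print(protein)
assert to_three_letter(protein) == SEQ_58
assert SEQ_58 == " ".join(["Met", SEQ_59, SEQ_60, SEQ_61])

# Maximum-length forms of the repeat linkers (SEQ ID NOs: 70, 71, 72, 73).
for unit, n in [("Glu Ala Ala Ala Lys", 5), ("Glu Ala Ala Ala Arg", 5),
                ("Gly Gly Gly Gly Ser", 10), ("Gly Gly Gly Ser", 10)]:
    length = len(repeat_linker(unit, n).split())
    print(f"({unit}) x {n}: {length} residues")
```

Running the sketch prints `MHHHHHHGSGSGSENLYFQG`. It then reports 25, 25, 50 and 40 residues for the four linkers, which matches the lengths recorded for SEQ ID NOs: 70 to 73. Shorter linkers within the stated 1 to 5 or 1 to 10 unit ranges follow from smaller values of `n`.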

* * * * *

