Methods For Genomic Modification

Serber; Zach; et al.

Patent Application Summary

U.S. patent application number 15/424709 was filed with the patent office on 2017-02-03 and published on 2017-08-24 for methods for genomic modification. This patent application is currently assigned to Amyris, Inc. The applicant listed for this patent is Amyris, Inc. Invention is credited to Andrew Horwitz, Zach Serber.

Publication Number: 20170240923
Application Number: 15/424709
Family ID: 46051948
Publication Date: 2017-08-24

United States Patent Application 20170240923
Kind Code A1
Serber; Zach; et al. August 24, 2017

METHODS FOR GENOMIC MODIFICATION

Abstract

Provided herein are methods of integrating one or more exogenous nucleic acids into one or more selected target sites of a host cell genome. In certain embodiments, the methods comprise contacting the host cell genome with one or more integration polynucleotides comprising an exogenous nucleic acid to be integrated into a genomic target site, and a nuclease capable of causing a double-strand break near or within the genomic target site.


Inventors: Serber; Zach (Emeryville, CA); Horwitz; Andrew (Emeryville, CA)
Applicant: Amyris, Inc. (Emeryville, CA, US)
Assignee: Amyris, Inc. (Emeryville, CA)

Family ID: 46051948
Appl. No.: 15/424709
Filed: February 3, 2017

Related U.S. Patent Documents

Application Number Filing Date Patent Number
14178203 Feb 11, 2014 9701971
15424709
13459034 Apr 27, 2012 8685737
14178203
61539389 Sep 26, 2011
61500741 Jun 24, 2011
61479821 Apr 27, 2011

Current U.S. Class: 1/1
Current CPC Class: C12N 15/1093 20130101; C12N 15/905 20130101; C12N 15/81 20130101; C12N 15/1082 20130101
International Class: C12N 15/90 20060101 C12N015/90; C12N 15/81 20060101 C12N015/81

Claims



1. A method for simultaneously integrating a plurality of (n) exogenous nucleic acids into a plurality of (n) target sites of a host cell genome, wherein n is at least two, the method comprising: (a) simultaneously contacting a host cell with: (i) said plurality of exogenous nucleic acids, wherein: x is an integer that varies from 1 to n, and for each integer x, each exogenous nucleic acid (ES).sub.x comprises a first homology region (HR1).sub.x and a second homology region (HR2).sub.x, wherein (HR1).sub.x and (HR2).sub.x are capable of initiating host cell mediated homologous recombination of (ES).sub.x, at a target site (TS).sub.x selected from said plurality of (n) target sites of said host cell genome; and (ii) for each said target site (TS).sub.x, a nuclease (N).sub.x capable of cleaving at (TS).sub.x, whereupon said cleaving results in homologous recombination of (ES).sub.x at (TS).sub.x; and (b) recovering a host cell wherein each exogenous nucleic acid (ES).sub.x has integrated at its selected target sequence (TS).sub.x, wherein x is any integer from 1 to n wherein n is at least 2.

2. The method of claim 1, wherein (HR1).sub.x is homologous to a 5' region of (TS).sub.x, and (HR2).sub.x is homologous to a 3' region of (TS).sub.x.

3. The method of claim 1, wherein (N).sub.x is capable of cleaving at a region positioned between said 5' and 3' regions of (TS).sub.x.

4. The method of claim 1, wherein a single nuclease is capable of cleaving each (TS).sub.x.

5. The method of claim 1, wherein n=3, 4, 5, 6, 7, 8, 9 or 10.

6. The method of claim 1, wherein said recovering does not require integration of a selectable marker.

7. (canceled)

8. The method of claim 1, wherein said recovering occurs at a frequency of about one every 10, 9, 8, 7, 6, 5, 4, 3, or 2 contacted host cells, or clonal populations thereof, screened.

9. The method of claim 1, wherein said recovering comprises identifying said integrations by at least one method selected from the group consisting of PCR, Southern blot, restriction mapping, and DNA sequencing.

10. The method of claim 1, wherein (N).sub.x is capable of cleaving an endogenous genomic sequence within (TS).sub.x.

11. The method of claim 1, wherein (N).sub.x is capable of cleaving an exogenous sequence within (TS).sub.x that is a recognition sequence for a homing endonuclease.

12. (canceled)

13. (canceled)

14. The method of claim 1, wherein (ES).sub.x further comprises a nucleic acid of interest (D).sub.x positioned 3' of (HR1).sub.x and 5' of (HR2).sub.x.

15. The method of claim 14, wherein (D).sub.x is selected from the group consisting of a selectable marker, a promoter, a nucleic acid sequence encoding an epitope tag, a gene of interest, a reporter gene, a nucleic acid sequence encoding a termination codon, and a nucleic acid sequence encoding an enzyme of a biosynthetic pathway.

16. The method of claim 1, wherein (ES).sub.x is linear.

17. The method of claim 1, wherein the host cell comprises one or more heterologous nucleotide sequences encoding one or more enzymes of a biosynthetic pathway.

18. The method of claim 17, wherein the one or more heterologous nucleotide sequences encoding one or more enzymes of a biosynthetic pathway are genomically integrated.

19. (canceled)

20. The method of claim 15, wherein (D).sub.x is a member of a library (L).sub.x comprising a plurality of nucleic acid molecules that encode variants of an enzyme of a biosynthetic pathway.

21. The method of claim 1, wherein the host cell comprises one or more heterologous nucleotide sequences encoding one or more enzymes of a mevalonate (MEV) pathway for making isopentenyl pyrophosphate selected from the group consisting of: acetyl-CoA thiolase, HMG-CoA synthase, HMG-CoA reductase, mevalonate kinase, phosphomevalonate kinase and mevalonate pyrophosphate decarboxylase.

22. (canceled)

23. The method of claim 21, wherein the host cell comprises a plurality of heterologous nucleic acids encoding all the enzymes of the MEV pathway.

24. The method of claim 21, wherein each said exogenous nucleic acid (ES).sub.x comprises a nucleic acid of interest (D).sub.x positioned 3' of (HR1).sub.x and 5' of (HR2).sub.x, encoding a terpene synthase selected from the group consisting of: a monoterpene synthase, a diterpene synthase, a sesquiterpene synthase, a sesterterpene synthase, a triterpene synthase, a tetraterpene synthase, and a polyterpene synthase.

25. (canceled)

26. The method of claim 1, wherein (N).sub.x is provided as an expression vector comprising a nucleic acid sequence encoding (N).sub.x.

27. The method of claim 1, wherein (N).sub.x is transformed into the host cell as a purified protein.

28. The method of claim 1, wherein (N).sub.x is selected from the group consisting of an endonuclease, a zinc finger nuclease, a TAL-effector DNA binding domain-nuclease fusion protein (TALEN), a transposase, and a site-specific recombinase.

29. The method of claim 28, wherein the zinc finger nuclease is a fusion protein comprising the cleavage domain of a TypeIIS restriction endonuclease fused to an engineered zinc finger binding domain.

30. The method of claim 29, wherein the TypeIIS restriction endonuclease is selected from the group consisting of HO endonuclease and Fok I endonuclease.

31. (canceled)

32. The method of claim 30, wherein the endonuclease is a homing endonuclease selected from the group consisting of: an LAGLIDADG homing endonuclease, an HNH homing endonuclease, a His-Cys box homing endonuclease, a GIY-YIG homing endonuclease, and a cyanobacterial homing endonuclease.

33. The method of claim 30, wherein the endonuclease is selected from the group consisting of: H-DreI, I-SceI, I-SceII, I-SceIII, I-SceIV, I-SceV, I-SceVI, I-SceVII, I-CeuI, I-CeuAIIP, I-CreI, I-CrepsbIP, I-CrepsbIIP, I-CrepsbIIIP, I-CrepsbIVP, I-TliI, I-PpoI, Pi-PspI, F-SceI, F-SceII, F-SuvI, F-CphI, F-TevI, F-TevII, I-AmaI, I-AniI, I-ChuI, I-CmoeI, I-CpaI, I-CpaII, I-CsmI, I-CvuI, I-CvuAIP, I-DdiI, I-DdiII, I-DirI, I-DmoI, I-HmuI, I-HmuII, I-HsNIP, I-LlaI, I-MsoI, I-NaaI, I-NanI, I-NclIP, I-NgrIP, I-NitI, I-NjaI, I-Nsp236IP, I-PakI, I-PboIP, I-PcuIP, I-PcuAI, I-PcuVI, I-PgrIP, I-PobIP, I-PorI, I-PorIIP, I-PbpIP, I-SpBetaIP, I-ScaI, I-SexIP, I-SneIP, I-SpomI, I-SpomCP, I-SpomIP, I-SpomIIP, I-SquIP, I-Ssp6803I, I-SthPhiJP, I-SthPhiST3P, I-SthPhiSTe3bP, I-TdeIP, I-TevI, I-TevII, I-TevIII, i-UarAP, i-UarHGPAIP, I-UarHGPA13P, I-VinIP, I-ZbiIP, PI-MgaI, PI-MtuI, PI-MtuHIP, PI-MtuHIIP, PI-PfuI, PI-PfuII, PI-PkoI, PI-PkoII, PI-Rma43812IP, PI-SpBetaIP, PI-SceI, PI-TfuI, PI-TfuII, PI-ThyI, PI-TliI, or PI-TliII.

34. The method of claim 28, wherein the endonuclease is modified to specifically bind an endogenous genomic sequence, wherein the modified endonuclease no longer binds to its wild type endonuclease recognition sequence.

35. (canceled)

36. (canceled)

37. The method of claim 1, wherein the host cell is selected from the group consisting of a fungal cell, a bacterial cell, a plant cell, and an animal cell.

38. The method of claim 1, wherein the host cell is a yeast cell.

39. (canceled)

40. (canceled)

41. A host cell generated by the method of claim 1.

42-99. (canceled)

100. A method for markerless integration of an exogenous nucleic acid into a target site of a yeast cell genome, the method comprising: (a) simultaneously contacting a yeast cell with: (i) an exogenous nucleic acid (ES).sub.1 comprising a first homology region (HR1).sub.1 and a second homology region (HR2).sub.1, wherein (HR1).sub.1 and (HR2).sub.1 are capable of initiating host cell mediated homologous recombination at said target site (TS).sub.1; and (ii) a nuclease (N).sub.1 capable of cleaving at (TS).sub.1, whereupon said cleaving results in homologous recombination of (ES).sub.1 at (TS).sub.1; and (b) recovering a yeast cell having (ES).sub.1 integrated at (TS).sub.1, wherein said recovering does not require integration of a selectable marker.

101. (canceled)

102. A method for simultaneously integrating a plurality of (n) exogenous nucleic acids into a plurality of (n) target sites of a host cell genome, wherein n is at least two, the method comprising: (a) simultaneously contacting a host cell with: (i) a plurality of libraries, wherein: x is an integer that varies from 1 to n, and for each integer x, each library (L).sub.x comprises a plurality of exogenous nucleic acids, wherein a selected exogenous nucleic acid comprises, in a 5' to 3' orientation, a first homology region (HR1).sub.x, any nucleic acid of interest selected from the group (D).sub.x, and a second homology region (HR2).sub.x, wherein (HR1).sub.x and (HR2).sub.x are capable of initiating host cell mediated homologous recombination of said selected exogenous nucleic acid at a target site (TS).sub.x of said host cell genome; and (ii) for each said target site (TS).sub.x, a nuclease (N).sub.x capable of cleaving at (TS).sub.x, whereupon said cleaving results in homologous recombination of said selected exogenous nucleic acid at (TS).sub.x; and (b) recovering a host cell wherein each exogenous nucleic acid from each library (L).sub.x has integrated at each selected target sequence (TS).sub.x, wherein x is any integer from 1 to n wherein n is at least 2.

103. (canceled)

104. (canceled)
Description



1. CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is a divisional of U.S. application Ser. No. 13/459,034, filed Apr. 27, 2012, which claims benefit of priority of U.S. Provisional Application No. 61/479,821, filed on Apr. 27, 2011; U.S. Provisional Application No. 61/500,741, filed on Jun. 24, 2011; and U.S. Provisional Application No. 61/539,389, filed on Sep. 26, 2011, the contents of each of which are hereby incorporated by reference in their entirety.

2. FIELD OF THE INVENTION

[0002] The methods and compositions provided herein generally relate to the fields of molecular biology and genetic engineering.

3. BACKGROUND

[0003] Genetic engineering techniques to introduce and integrate exogenous nucleic acids into a host cell genome are needed in a variety of fields. For example, in the field of synthetic biology, the fabrication of a genetically modified strain requires the insertion of customized DNA sequences into a chromosome of the host cell, and commonly, industrial scale production requires the introduction of dozens of genes into the host organism. Optimized designs for the industrial strain are arrived at empirically, requiring construction and in vivo testing of many DNA assemblies, alone and/or in concert with other biosynthetic pathway components.

[0004] Genetic engineering is highly reliant on gene targeting, which utilizes an extrachromosomal fragment of donor template DNA and invokes a cell's homologous recombination (HR) machinery to exchange a chromosomal sequence with an exogenous donor sequence. See, e.g., Capecchi, Science 244:1288-1292 (1989). Gene targeting is limited in its efficiency; in plant and mammalian cells, only ~1 in 10^6 cells provided with excess template sequences undergo the desired gene modification. Yeast demonstrates an increased capacity for homologous recombination. However, the successful incorporation of exogenous DNA into yeast genomes is still a comparatively rare event (~1 in 10^5), and requires the use of a selectable marker to screen for recombinant cells, which usually comprise only a single genomic modification. In addition, since only a limited cache of selectable markers is available for use in yeast, selectable marker(s) must be removed from a recombinant strain to allow for additional genomic modifications using the same markers, and in some instances, prior to releasing the host cell in a manufacturing or natural environment. Thus, independent of the efficiency at which integration can be achieved at any single locus, the one-at-a-time serial nature of genomic engineering means that making changes at multiple loci requires as many engineering cycles as there are loci to be modified.

[0005] The efficiency of gene targeting can be improved when combined with a targeted genomic double-stranded break (DSB) introduced near the intended site of integration. See, e.g., Jasin, M., Trends Genet 12(6):224-228 (1996); and Urnov et al., Nature 435(7042):646-651 (2005). So-called "designer nucleases" are enzymes that can be tailored to bind to a specific "target" sequence of DNA in vivo and introduce a double-strand break there. Such targeted double-strand breaks can be effected, for instance, by transforming a host cell with a plasmid containing a gene that encodes the designer nuclease. The host cell repairs these double-strand breaks by either homology-directed DNA repair or non-homologous end joining. In the course of the repair, either mechanism may be utilized to incorporate an exogenous donor DNA at the target site. If the nuclease is introduced into the cell at the same time as the donor DNA is introduced, the cell can integrate the donor DNA at the target loci.

[0006] The advent of designer nucleases has made it possible to introduce transgenes into particular target loci in crops (Wright et al., Plant J 44:693-705 (2005)), to improve mammalian cell culture lines expressing therapeutic antibodies (Malphettes et al., Biotechnol Bioeng 106(5):774-783 (2010)), and even to edit the human genome to evoke resistance to HIV (Urnov et al., Nat Rev Genet 11(9):636-646 (2010)). While impactful, DSB-mediated HR has yet to be exploited to reduce the multiple rounds of engineering needed to integrate multiple DNA assemblies, for example, towards the construction of functional metabolic pathways in industrial microbes.

[0007] Thus, there exists a need for methods and compositions that allow for the simultaneous integration of a plurality of exogenous nucleic acids into specific regions of a host cell genome.

4. SUMMARY

[0008] Provided herein are methods and compositions for integrating one or more exogenous nucleic acids into specified genomic loci of a host cell. In some embodiments, a plurality of exogenous nucleic acids is simultaneously integrated with a single transformation reaction. In some embodiments, the methods comprise the introduction of one or more nucleases and one or more donor DNA assemblies into the cell to facilitate integration of the donor DNA at specified locations in the genome. The methods and compositions utilize the native homologous recombination machinery of the host cell, which recombination is further enhanced by inducing targeted double-strand breaks in the host cell's genome at the intended sites of integration.

[0009] Thus, in one aspect, provided herein is a method for integrating a plurality of exogenous nucleic acids into a host cell genome, the method comprising:

[0010] (a) contacting a host cell with:

[0011] (i) a plurality of exogenous nucleic acids, wherein each exogenous nucleic acid (ES).sub.x comprises a first homology region (HR1).sub.x and a second homology region (HR2).sub.x, wherein (HR1).sub.x and (HR2).sub.x are capable of initiating host cell mediated homologous recombination of (ES).sub.x at a target site (TS).sub.x of said host cell genome; and

[0012] (ii) for each said target site (TS).sub.x, a nuclease (N).sub.x capable of cleaving at (TS).sub.x, whereupon said cleaving results in homologous recombination of (ES).sub.x at (TS).sub.x; and

[0013] (b) recovering a host cell wherein each selected exogenous nucleic acid (ES).sub.x has integrated at each selected target sequence (TS).sub.x,

[0014] wherein x is any integer from 1 to n wherein n is at least 2.

[0015] In some embodiments, (HR1).sub.x is homologous to a 5' region of (TS).sub.x, and (HR2).sub.x is homologous to a 3' region of (TS).sub.x.

[0016] In some embodiments, (N).sub.x is capable of cleaving at a region positioned between said 5' and 3' regions of (TS).sub.x.
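The arrangement described in the preceding paragraphs, in which (HR1) matches the 5' region of the target site, (HR2) matches the 3' region, and the nuclease cleaves between them, can be illustrated with a toy string model. The following is only a sketch under stated assumptions: the names `Donor` and `integrate` are hypothetical, the sequences are invented, and real homologous recombination is not an exact string substitution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Donor:
    """Hypothetical model of an exogenous nucleic acid (ES)."""
    hr1: str      # homologous to the 5' region of the target site
    payload: str  # nucleic acid of interest (D), placed between HR1 and HR2
    hr2: str      # homologous to the 3' region of the target site


def integrate(genome: str, donor: Donor, cut_index: int) -> str:
    """Return the genome after a toy recombination event at the cut site."""
    start = genome.find(donor.hr1)
    end = genome.find(donor.hr2, start + len(donor.hr1)) if start != -1 else -1
    if start == -1 or end == -1:
        raise ValueError("homology regions not found in 5'-to-3' order")
    left_edge = start + len(donor.hr1)
    # The modeled break must fall between the two homology regions.
    if not (left_edge <= cut_index <= end):
        raise ValueError("cut does not fall between the homology regions")
    # The sequence between HR1 and HR2 is exchanged for the donor payload.
    return genome[:left_edge] + donor.payload + genome[end:]


# Invented example sequences, for illustration only.
genome = "GGGGACGTACTTTTTTGATCGACCCC"
donor = Donor(hr1="ACGTAC", payload="ATGAAA", hr2="GATCGA")
print(integrate(genome, donor, cut_index=13))  # GGGGACGTACATGAAAGATCGACCCC
```

In this model a cut outside the region flanked by the homology arms raises an error, which mirrors the requirement that the nuclease cleave between the 5' and 3' regions of the target site.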

[0017] In some embodiments, a single nuclease is capable of cleaving each (TS).sub.x.

[0018] In some embodiments, n=3, 4, 5, 6, 7, 8, 9 or 10. In some embodiments, n>10.

[0019] In some embodiments, said recovering does not require integration of a selectable marker. In some embodiments, said recovering occurs at a higher frequency as compared to not contacting the host cell with a nuclease capable of cleaving at said target site. In some embodiments, said recovering occurs at a frequency of about one every 10, 9, 8, 7, 6, 5, 4, 3, or 2 contacted host cells, or clonal populations thereof, screened. In some embodiments, said recovering comprises identifying said integrations by at least one method selected from the group consisting of PCR, Southern blot, restriction mapping, and DNA sequencing.

[0020] In some embodiments, (N).sub.x is capable of cleaving an endogenous host genomic sequence, e.g., a native locus within (TS).sub.x. In some embodiments, (N).sub.x is capable of cleaving an exogenous sequence, e.g., an introduced locus within (TS).sub.x.

[0021] In some embodiments, (ES).sub.x further comprises a nucleic acid of interest (D).sub.x positioned 3' of (HR1).sub.x and 5' of (HR2).sub.x. In some embodiments, (D).sub.x is selected from the group consisting of a promoter, a nucleic acid sequence encoding an epitope tag, a gene of interest, a reporter gene, and a nucleic acid sequence encoding a termination codon.

[0022] In some embodiments, (ES).sub.x is linear. In some embodiments, (N).sub.x is provided as an expression vector comprising the nucleic acid sequence encoding (N).sub.x. In some embodiments, (N).sub.x is transformed into the host cell as a purified protein. In some embodiments, (N).sub.x is transformed into the host cell as purified RNA.

[0023] In some embodiments, the host cell comprises one or more heterologous nucleotide sequences encoding one or more enzymes of a biosynthetic pathway. In some embodiments, the one or more heterologous nucleotide sequences encoding one or more enzymes of a biosynthetic pathway are genomically integrated. In some embodiments, each exogenous nucleic acid (ES).sub.x comprises a nucleic acid of interest (D).sub.x positioned 3' of (HR1).sub.x and 5' of (HR2).sub.x, encoding an enzyme of a biosynthetic pathway. In some embodiments, (D).sub.x is a member of a library (L).sub.x comprising a plurality of nucleic acid molecules that encode variants of an enzyme of a biosynthetic pathway.

[0024] In some embodiments, the host cell comprises one or more heterologous nucleotide sequences encoding one or more enzymes of a mevalonate (MEV) pathway for making isopentenyl pyrophosphate. In some embodiments, the one or more enzymes of the mevalonate pathway are selected from acetyl-CoA thiolase, HMG-CoA synthase, HMG-CoA reductase, mevalonate kinase, phosphomevalonate kinase and mevalonate pyrophosphate decarboxylase. In some embodiments, the host cell comprises a plurality of heterologous nucleic acids encoding all of the enzymes of a MEV pathway. In other words, the plurality of heterologous nucleic acids, taken together, encodes at least one enzyme of each class of enzymes of the MEV pathway listed above. In some embodiments, each exogenous nucleic acid (ES).sub.x comprises a nucleic acid of interest (D).sub.x positioned 3' of (HR1).sub.x and 5' of (HR2).sub.x, encoding a terpene synthase. In some embodiments, the terpene synthase is selected from the group consisting of a monoterpene synthase, a diterpene synthase, a sesquiterpene synthase, a sesterterpene synthase, a triterpene synthase, a tetraterpene synthase, and a polyterpene synthase.

[0025] In some embodiments, (N).sub.x is selected from the group consisting of an endonuclease, e.g., a meganuclease, a zinc finger nuclease, a TAL-effector DNA binding domain-nuclease fusion protein (TALEN), a transposase, and a site-specific recombinase, wherein x is 1 or any integer from 1 to n. In some embodiments, the zinc finger nuclease is a fusion protein comprising the cleavage domain of a TypeIIS restriction endonuclease fused to an engineered zinc finger binding domain. In some embodiments, the TypeIIS restriction endonuclease is selected from the group consisting of HO endonuclease and Fok I endonuclease. In some embodiments, the zinc finger binding domain comprises 3, 5 or 6 zinc fingers. In some embodiments, the endonuclease is a homing endonuclease selected from the group consisting of: an LAGLIDADG homing endonuclease, an HNH homing endonuclease, a His-Cys box homing endonuclease, a GIY-YIG homing endonuclease, and a cyanobacterial homing endonuclease. In some embodiments, the endonuclease is selected from the group consisting of: H-DreI, I-SceI, I-SceII, I-SceIII, I-SceIV, I-SceV, I-SceVI, I-SceVII, I-CeuI, I-CeuAIIP, I-CreI, I-CrepsbIP, I-CrepsbIIP, I-CrepsbIIIP, I-CrepsbIVP, I-TliI, I-PpoI, Pi-PspI, F-SceI, F-SceII, F-SuvI, F-CphI, F-TevI, F-TevII, I-AmaI, I-AniI, I-ChuI, I-CmoeI, I-CpaI, I-CpaII, I-CsmI, I-CvuI, I-CvuAIP, I-DdiI, I-DdiII, I-DirI, I-DmoI, I-HmuI, I-HmuII, I-HsNIP, I-LlaI, I-MsoI, I-NaaI, I-NanI, I-NclIP, I-NgrIP, I-NitI, I-NjaI, I-Nsp236IP, I-PakI, I-PboIP, I-PcuIP, I-PcuAI, I-PcuVI, I-PgrIP, I-PobIP, I-PorI, I-PorIIP, I-PbpIP, I-SpBetaIP, I-ScaI, I-SexIP, I-SneIP, I-SpomI, I-SpomCP, I-SpomIP, I-SpomIIP, I-SquIP, I-Ssp6803I, I-SthPhiJP, I-SthPhiST3P, I-SthPhiSTe3bP, I-TdeIP, I-TevI, I-TevII, I-TevIII, i-UarAP, i-UarHGPAIP, I-UarHGPA13P, I-VinIP, I-ZbiIP, PI-MgaI, PI-MtuI, PI-MtuHIP, PI-MtuHIIP, PI-PfuI, PI-PfuII, PI-PkoI, PI-PkoII, PI-Rma43812IP, PI-SpBetaIP, PI-SceI, PI-TfuI, PI-TfuII, PI-ThyI, PI-TliI, or PI-TliII. In particular embodiments, the endonuclease is Fcph-I.

[0026] In some embodiments, the endonuclease is modified to specifically bind an endogenous host cell genomic sequence, wherein the modified endonuclease no longer binds to its wild type endonuclease recognition sequence. In some embodiments, the modified endonuclease is derived from a homing endonuclease selected from the group consisting of: an LAGLIDADG homing endonuclease, an HNH homing endonuclease, a His-Cys box homing endonuclease, a GIY-YIG homing endonuclease, and a cyanobacterial homing endonuclease. In some embodiments, the modified endonuclease is derived from an endonuclease selected from the group consisting of: H-DreI, I-SceI, I-SceII, I-SceIII, I-SceIV, I-SceV, I-SceVI, I-SceVII, I-CeuI, I-CeuAIIP, I-CreI, I-CrepsbIP, I-CrepsbIIP, I-CrepsbIIIP, I-CrepsbIVP, I-TliI, I-PpoI, Pi-PspI, F-SceI, F-SceII, F-SuvI, F-CphI, F-TevI, F-TevII, I-AmaI, I-AniI, I-ChuI, I-CmoeI, I-CpaI, I-CpaII, I-CsmI, I-CvuI, I-CvuAIP, I-DdiI, I-DdiII, I-DirI, I-DmoI, I-HmuI, I-HmuII, I-HsNIP, I-LlaI, I-MsoI, I-NaaI, I-NanI, I-NclIP, I-NgrIP, I-NitI, I-NjaI, I-Nsp236IP, I-PakI, I-PboIP, I-PcuIP, I-PcuAI, I-PcuVI, I-PgrIP, I-PobIP, I-PorI, I-PorIIP, I-PbpIP, I-SpBetaIP, I-ScaI, I-SexIP, I-SneIP, I-SpomI, I-SpomCP, I-SpomIP, I-SpomIIP, I-SquIP, I-Ssp6803I, I-SthPhiJP, I-SthPhiST3P, I-SthPhiSTe3bP, I-TdeIP, I-TevI, I-TevII, I-TevIII, i-UarAP, i-UarHGPAIP, I-UarHGPA13P, I-VinIP, I-ZbiIP, PI-MgaI, PI-MtuI, PI-MtuHIP, PI-MtuHIIP, PI-PfuI, PI-PfuII, PI-PkoI, PI-PkoII, PI-Rma43812IP, PI-SpBetaIP, PI-SceI, PI-TfuI, PI-TfuII, PI-ThyI, PI-TliI, or PI-TliII.

[0027] In some embodiments, the host cell is a fungal cell, a bacterial cell, a plant cell, an animal cell, or a human cell. In particular embodiments, the host cell is a yeast cell. In some embodiments, the yeast cell is a haploid yeast cell. In some embodiments, the yeast cell is a Saccharomyces cerevisiae cell. In some embodiments, the Saccharomyces cerevisiae cell is of the Baker's yeast, Mauri, Santa Fe, IZ-1904, TA, BG-1, CR-1, SA-1, M-26, Y-904, PE-2, PE-5, VR-1, BR-1, BR-2, ME-2, VR-2, MA-3, MA-4, CAT-1, CB-1, NR-1, BT-1 or AL-1 strain.

[0028] In another aspect, provided herein is a method for markerless integration of an exogenous nucleic acid into a target site of a yeast cell genome, the method comprising:

[0029] (a) contacting a host yeast cell with:

[0030] (i) an exogenous nucleic acid (ES) comprising a first homology region (HR1) and a second homology region (HR2), wherein (HR1) and (HR2) are capable of initiating host cell mediated homologous recombination at said target site (TS); and

[0031] (ii) a nuclease (N) capable of cleaving at (TS), whereupon said cleaving results in homologous recombination of (ES) at (TS);

[0032] and

[0033] (b) recovering a host cell having (ES) integrated at (TS), wherein said recovering does not require integration of a selectable marker.

[0034] In another aspect, provided herein is a modified host cell generated by any of the methods of genomically integrating one or more exogenous nucleic acids described herein. In some embodiments, the modified host cell comprises:

[0035] (a) a plurality of exogenous nucleic acids, wherein each exogenous nucleic acid (ES).sub.x comprises a first homology region (HR1).sub.x and a second homology region (HR2).sub.x, wherein (HR1).sub.x and (HR2).sub.x are capable of initiating host cell mediated homologous recombination of (ES).sub.x at a target site (TS).sub.x of said host cell genome; and

[0036] (b) for each said target site (TS).sub.x, a nuclease (N).sub.x capable of cleaving at (TS).sub.x, whereupon said cleaving results in homologous recombination of (ES).sub.x at (TS).sub.x;

[0037] wherein x is any integer from 1 to n wherein n is at least 2.

[0038] In some embodiments, the modified host cell is a yeast cell and comprises:

[0039] (a) an exogenous nucleic acid (ES) comprising a first homology region (HR1) and a second homology region (HR2), wherein (HR1) and (HR2) are capable of initiating host cell mediated homologous recombination at a target site (TS) of the host cell genome; and

[0040] (b) a nuclease (N) capable of cleaving at (TS), whereupon said cleaving results in homologous recombination of (ES) at (TS);

[0041] wherein (ES) does not comprise a selectable marker.

[0042] In another aspect, provided herein is a composition comprising:

[0043] (a) a yeast cell;

[0044] (b) a plurality of exogenous nucleic acids, wherein each exogenous nucleic acid (ES).sub.x comprises:

[0045] (i) a first homology region (HR1).sub.x and a second homology region (HR2).sub.x, wherein (HR1).sub.x and (HR2).sub.x are capable of initiating host cell mediated homologous recombination of (ES).sub.x at a selected target site (TS).sub.x of a yeast cell genome; and

[0046] (ii) a nucleic acid of interest (D).sub.x positioned 3' of (HR1).sub.x and 5' of (HR2).sub.x;

[0047] (c) a plurality of nucleases, wherein each nuclease (N).sub.x is capable of cleaving at (TS).sub.x, whereupon said cleaving results in homologous recombination of (ES).sub.x at (TS).sub.x;

[0048] wherein x is any integer from 1 to n wherein n is at least 2.

[0049] In another aspect, provided herein is a kit useful for performing the methods for genomically integrating one or more exogenous nucleic acids described herein. In some embodiments, the kit comprises:

[0050] (a) a plurality of exogenous nucleic acids, wherein each exogenous nucleic acid (ES).sub.x comprises:

[0051] (i) a first homology region (HR1).sub.x and a second homology region (HR2).sub.x, wherein (HR1).sub.x and (HR2).sub.x are capable of initiating host cell mediated homologous recombination of (ES).sub.x at a selected target site (TS).sub.x of a yeast cell genome; and

[0052] (ii) a nucleic acid of interest (D).sub.x positioned 3' of (HR1).sub.x and 5' of (HR2).sub.x;

[0053] (b) a plurality of nucleases, wherein each nuclease (N).sub.x is capable of cleaving at (TS).sub.x, whereupon said cleaving results in homologous recombination of (ES).sub.x at (TS).sub.x;

[0054] wherein x is any integer from 1 to n wherein n is at least 2.

[0055] In some embodiments, (D).sub.x is selected from the group consisting of a selectable marker, a promoter, a nucleic acid sequence encoding an epitope tag, a gene of interest, a reporter gene, and a nucleic acid sequence encoding a termination codon. In some embodiments, the kit further comprises a plurality of primer pairs (P).sub.x, wherein each primer pair is capable of identifying integration of (ES).sub.x at (TS).sub.x by PCR. In some embodiments, (ES).sub.x is linear. In some embodiments, (ES).sub.x is circular.

[0056] In a particular embodiment, the kit enables site-specific integration of an exogenous nucleic acid at a unique target site within any of the approximately 6000 genetic loci of the yeast genome. In this embodiment, n ≥ 6000, wherein each (TS).sub.x is unique to a specific locus of the yeast cell genome.

5. BRIEF DESCRIPTION OF THE FIGURES

[0057] FIG. 1 provides an exemplary embodiment of markerless genomic integration of an exogenous nucleic acid using a site-specific nuclease.

[0058] FIG. 2 provides an exemplary embodiment of simultaneous genomic integration of a plurality of exogenous nucleic acids using a plurality of site-specific nucleases. HR1--upstream homology region; HR2--downstream homology region; TS--target site; N--site-specific nuclease; D--nucleic acid of interest.

[0059] FIG. 3 provides a schematic representation of the MEV pathway for isoprenoid production.

[0060] FIG. 4 provides an exemplary embodiment of the methods of generating combinatorial integration libraries provided herein. The hatch marks represent individual exogenous nucleic acid members of each library (L).sub.x.
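As a rough illustration of the combinatorial idea behind FIG. 4, the sketch below enumerates the genotypes that could arise if exactly one member of each library (L).sub.x integrates at its target site (TS).sub.x. The library contents, the site labels, and the function name `possible_genotypes` are hypothetical and are not taken from the application.

```python
from itertools import product
from math import prod

# Hypothetical libraries: one list of variant labels per target site.
libraries = {
    "TS1": ["variant_A1", "variant_A2", "variant_A3"],
    "TS2": ["variant_B1", "variant_B2"],
}


def possible_genotypes(libs: dict[str, list[str]]) -> list[dict[str, str]]:
    """List every combination with one library member per target site."""
    sites = list(libs)
    return [dict(zip(sites, combo)) for combo in product(*libs.values())]


genotypes = possible_genotypes(libraries)
# The count equals the product of the library sizes (here 3 * 2).
print(len(genotypes), prod(len(members) for members in libraries.values()))
```

Under this assumption the number of distinct combinations grows multiplicatively with each added library, which is why a single transformation with several libraries can sample a large design space.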

[0061] FIG. 5 provides results of colony PCR of 96 colonies of yeast cells transformed with empty vector DNA and linear "donor" DNA encoding functional EmGFP. The yeast cells comprised copies of "target" nucleic acid encoding a truncated, non-functional EmGFP genomically integrated at each of the HO, YGR250c, and NDT80 loci. Separate PCR reactions were performed to probe the HO, YGR250c, and NDT80 loci with primers specific to nucleic acid encoding functional EmGFP. No PCR products were observed, indicating that no replacements of the target nucleic acid encoding non-functional EmGFP with donor nucleic acid encoding functional EmGFP occurred.

[0062] FIG. 6 provides results of colony PCR of 96 colonies of yeast cells transformed with pZFN.gfp DNA and linear "donor" DNA encoding functional EmGFP. The yeast cells comprised copies of "target" nucleic acid encoding a truncated, non-functional EmGFP genomically integrated at each of the HO, YGR250c, and NDT80 loci. pZFN.gfp encodes a zinc finger nuclease which recognizes and cleaves a nucleic acid sequence specific to the non-functional EmGFP coding sequence. Separate PCR reactions were performed to probe the HO, YGR250c, and NDT80 loci with primers specific to nucleic acid encoding functional EmGFP. Numerous PCR products were observed, indicating successful replacement of the non-functional EmGFP integrations with DNA expressing functional EmGFP. 23 colonies have all 3 loci replaced.
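The colony counts reported for FIG. 6 can be converted into a recovery frequency with simple arithmetic. The snippet below is a minimal sketch; the helper name `recovery_frequency` is hypothetical, and only the two counts (23 and 96) come from the figure description.

```python
from fractions import Fraction


def recovery_frequency(positive: int, screened: int) -> Fraction:
    """Fraction of screened colonies that carry the desired integrations."""
    return Fraction(positive, screened)


# 23 of the 96 screened colonies had all three loci replaced.
triple = recovery_frequency(23, 96)
print(f"{float(triple):.1%}")                  # 24.0%
print(f"about 1 in {1 / float(triple):.1f}")   # about 1 in 4.2
```

In other words, roughly one in four screened colonies carried replacements at all three loci in this experiment.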

[0063] FIG. 7 provides the sesquiterpene titers of Strain B, a parental farnesene-producing yeast strain comprising enzymes of the mevalonate pathway and a plasmid encoding farnesene synthase (FS); Strain D, a derivative strain of Strain B in which 4 copies of amorphadiene synthase (ADS) have been genomically integrated; and Strain E, a derivative strain of Strain D in which the plasmid encoding FS has been lost. Nearly 100% of the sesquiterpene capacity of parental Strain B is maintained in Strains D and E with only the addition of multiple copies of ADS.

[0064] FIG. 8 provides results for cells co-transformed with linear donor DNAs for the SFC1 (GFP donor DNA) and YJR030c (ADE2 donor DNA) loci, the YJR030c endonuclease plasmid (pCUT006) and SFC1 endonuclease plasmid (pCUT058). 80% of colonies selected on URA dropout+Kan agar plates were GFP positive. Of these colonies, 91% were positive for ADE2 integration. In total, 72.8% of colonies had successfully integrated the markerless donor DNA at both loci.

6. DETAILED DESCRIPTION OF THE EMBODIMENTS

6.1 Definitions

[0065] As used herein, the terms "cleaves," "cleavage" and/or "cleaving" with respect to a nuclease, e.g. a homing endonuclease, zinc-finger nuclease or TAL-effector nuclease, refer to the act of creating a double-stranded break (DSB) in a particular nucleic acid. The DSB can leave a blunt end or sticky end (i.e., 5' or 3' overhang), as understood by those of skill in the art.

[0066] As used herein, the term "engineered host cell" refers to a host cell that is generated by genetically modifying a parent cell using genetic engineering techniques (i.e., recombinant technology). The engineered host cell may comprise additions, deletions, and/or modifications of nucleotide sequences to the genome of the parent cell.

[0067] As used herein, the term "heterologous" refers to what is not normally found in nature. The term "heterologous nucleotide sequence" refers to a nucleotide sequence not normally found in a given cell in nature. As such, a heterologous nucleotide sequence may be: (a) foreign to its host cell (i.e., is "exogenous" to the cell); (b) naturally found in the host cell (i.e., "endogenous") but present at an unnatural quantity in the cell (i.e., greater or lesser quantity than naturally found in the host cell); or (c) be naturally found in the host cell but positioned outside of its natural locus.

[0068] As used herein, the term "homology" refers to the identity between two or more nucleic acid sequences, or two or more amino acid sequences. Sequence identity can be measured in terms of percentage identity (or similarity or homology); the higher the percentage, the more nearly identical the sequences are to each other. Homologs or orthologs of nucleic acid or amino acid sequences possess a relatively high degree of sequence identity when aligned using standard methods. Methods of alignment of sequences for comparison are well known in the art. Various programs and alignment algorithms are described in: Smith & Waterman, Adv. Appl. Math. 2:482, 1981; Needleman & Wunsch, J. Mol. Biol. 48:443, 1970; Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988; Higgins & Sharp, Gene, 73:237-44, 1988; Higgins & Sharp, CABIOS 5:151-3, 1989; Corpet et al., Nuc. Acids Res. 16:10881-90, 1988; Huang et al. Computer Appls. Biosc. 8, 155-65, 1992; and Pearson et al., Meth. Mol. Bio. 24:307-31, 1994. Altschul et al., J. Mol. Biol. 215:403-10, 1990, presents a detailed consideration of sequence alignment methods and homology calculations. The NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol. 215:403-10, 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI, National Library of Medicine, Building 38A, Room 8N805, Bethesda, Md. 20894) and on the Internet, for use in connection with the sequence analysis programs blastp, blastn, blastx, tblastn and tblastx. Additional information can be found at the NCBI web site.
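By way of illustration only, percentage identity can be sketched as a count of matching columns in a pairwise alignment. The sketch below assumes the two sequences have already been aligned to equal length with "-" marking gaps. The function name and the choice to count gap columns in the denominator are assumptions made for this example; BLAST and the other cited programs apply their own alignment and scoring procedures.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption for this sketch: gap columns ('-') count toward the alignment
    length but never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy sequences, not taken from the specification.
print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

Other conventions, for example dividing by the length of the shorter sequence or ignoring gap columns, give different percentages for the same alignment, so the convention used should be stated alongside any reported value.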

[0069] As used herein, the term "markerless" refers to integration of a donor DNA into a target site within a host cell genome without accompanying integration of a selectable marker. In some embodiments, the term also refers to the recovery of such a host cell without utilizing a selection scheme that relies on integration of a selectable marker into the host cell genome. For example, in certain embodiments, a selection marker that is episomal or extrachromosomal may be utilized to select for cells comprising a plasmid encoding a nuclease capable of cleaving a genomic target site. Such use would be considered "markerless" so long as the selectable marker is not integrated into the host cell genome.

[0070] As used herein, the term "polynucleotide" refers to a polymer composed of nucleotide units as would be understood by one of skill in the art. Preferred nucleotide units include but are not limited to those comprising adenine (A), guanine (G), cytosine (C), thymine (T), and uracil (U). Useful modified nucleotide units include but are not limited to those comprising 4-acetylcytidine, 5-(carboxyhydroxylmethyl)uridine, 2-O-methylcytidine, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylamino-methyluridine, dihydrouridine, 2-O-methylpseudouridine, 2-O-methylguanosine, inosine, N6-isopentyladenosine, 1-methyladenosine, 1-methylpseudouridine, 1-methylguanosine, 1-methylinosine, 2,2-dimethylguanosine, 2-methyladenosine, 2-methylguanosine, 3-methylcytidine, 5-methylcytidine, N6-methyladenosine, 7-methylguanosine, 5-methylaminomethyluridine, 5-methoxyaminomethyl-2-thiouridine, 5-methoxyuridine, 5-methoxycarbonylmethyl-2-thiouridine, 5-methoxycarbonylmethyluridine, 2-methylthio-N6-isopentyladenosine, uridine-5-oxyacetic acid-methylester, uridine-5-oxyacetic acid, wybutoxosine, wybutosine, pseudouridine, queuosine, 2-thiocytidine, 5-methyl-2-thiouridine, 2-thiouridine, 4-thiouridine, 5-methyluridine, 2-O-methyl-5-methyluridine, 2-O-methyluridine, and the like. Polynucleotides include naturally occurring nucleic acids, such as deoxyribonucleic acid ("DNA") and ribonucleic acid ("RNA"), as well as nucleic acid analogs. Nucleic acid analogs include those that include non-naturally occurring bases, nucleotides that engage in linkages with other nucleotides other than the naturally occurring phosphodiester bond or that include bases attached through linkages other than phosphodiester bonds. 
Thus, nucleotide analogs include, for example and without limitation, phosphorothioates, phosphorodithioates, phosphorotriesters, phosphoramidates, boranophosphates, methylphosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs), and the like.

[0071] Conventional notation is used herein to describe polynucleotide sequences: the left-hand end of a single-stranded polynucleotide sequence is the 5'-end; the left-hand direction of a double-stranded polynucleotide sequence is referred to as the 5'-direction.

[0072] As used herein, the term "simultaneous," when used with respect to multiple integration, encompasses a period of time beginning at the point at which a host cell is co-transformed with a nuclease, e.g. a plasmid encoding a nuclease, and more than one donor DNA to be integrated into the host cell genome, and ending at the point at which the transformed host cell, or clonal populations thereof, is screened for successful integration of the donor DNAs at their respective target loci. In some embodiments, the period of time encompassed by "simultaneous" is at least the amount of time required for the nuclease to bind and cleave its target sequence within the host cell's chromosome(s). In some embodiments, the period of time encompassed by "simultaneous" is at least 6, 12, 24, 36, 48, 60, 72, 96 or more than 96 hours, beginning at the point at which a host cell is co-transformed with a nuclease, e.g. a plasmid encoding a nuclease, and more than one donor DNA.

6.2 Methods of Integrating Exogenous Nucleic Acids

[0073] Provided herein are methods of integrating one or more exogenous nucleic acids into one or more selected target sites of a host cell genome. In certain embodiments, the methods comprise contacting the host cell with one or more integration polynucleotides, i.e., donor DNAs, comprising an exogenous nucleic acid to be integrated into the genomic target site, and one or more nucleases capable of causing a double-strand break near or within the genomic target site. Cleavage near or within the genomic target site greatly increases the frequency of homologous recombination at or near the cleavage site.

[0074] In a particular aspect, provided herein is a method for markerless integration of an exogenous nucleic acid into a target site of a host cell genome, the method comprising:

[0075] (a) contacting a host cell with:

[0076] (i) an exogenous nucleic acid (ES) comprising a first homology region (HR1) and a second homology region (HR2), wherein (HR1) and (HR2) are capable of initiating host cell mediated homologous recombination at said target site (TS); and

[0077] (ii) a nuclease (N) capable of cleaving at (TS), whereupon said cleaving results in homologous recombination of (ES) at (TS);

[0078] and

[0079] (b) recovering a host cell having (ES) integrated at (TS), wherein said recovering does not require integration of a selectable marker.

[0080] FIG. 1 provides an exemplary embodiment of markerless genomic integration of an exogenous nucleic acid using a site-specific nuclease. A donor polynucleotide is introduced to a host cell, wherein the polynucleotide comprises a nucleic acid of interest (D) flanked by a first homology region (HR1) and a second homology region (HR2). HR1 and HR2 share homology with 5' and 3' regions, respectively, of a genomic target site (TS). A site-specific nuclease (N) is also introduced to the host cell, wherein the nuclease is capable of recognizing and cleaving a unique sequence within the target site. Upon induction of a double-stranded break within the target site by the site-specific nuclease, endogenous homologous recombination machinery integrates the nucleic acid of interest at the cleaved target site at a higher frequency as compared to a target site not comprising a double-stranded break. This increased frequency of integration obviates the need to co-integrate a selectable marker in order to select transformants having undergone a recombination event. By eliminating the need for selectable markers, for example, during construction of an engineered microbe, the time needed to build a strain comprising a complete and functional biosynthetic pathway is greatly reduced. In addition, engineering strategies are no longer limited by the need to recycle selectable markers due to there being a limited cache of markers available for a given host organism.
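The donor layout described for FIG. 1 (HR1, then D, then HR2) can be sketched as simple string assembly. In the hypothetical helper below, the homology regions are taken as the sequence immediately flanking the target site in a toy genome string. The function name, the coordinates, the 8-nucleotide arm length and all sequences are assumptions chosen for readability; they are not values from the specification.

```python
def build_donor(genome: str, ts_start: int, ts_end: int,
                payload: str, arm_len: int) -> str:
    """Return HR1 + D + HR2 for a target site spanning genome[ts_start:ts_end].

    Assumption for this sketch: the homology regions are the arm_len bases
    directly upstream and downstream of the target site.
    """
    if not (0 <= ts_start <= ts_end <= len(genome)):
        raise ValueError("target site lies outside the genome string")
    if ts_start < arm_len or ts_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    hr1 = genome[ts_start - arm_len:ts_start]  # upstream (5') homology region
    hr2 = genome[ts_end:ts_end + arm_len]      # downstream (3') homology region
    return hr1 + payload + hr2


# Toy 40-base "genome" with the target site at positions 16 to 24.
toy_genome = "AAAACCCCGGGGTTTTACGTACGTTTTTGGGGCCCCAAAA"
donor = build_donor(toy_genome, ts_start=16, ts_end=24,
                    payload="ATGTAA", arm_len=8)
print(donor)
```

In this toy layout the nuclease recognition sequence lies between the two arms and is absent from the donor, so an integrated copy would no longer carry the cleavage site. That is a property of the example as constructed, not a requirement stated in the text.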

[0081] In some embodiments, markerless recovery of a transformed cell comprising a successfully integrated exogenous nucleic acid occurs within a frequency of about one every 1000, 900, 800, 700, 600, 500, 400, 300, 200 or 100 contacted host cells, or clonal populations thereof, screened. In particular embodiments, markerless recovery of a transformed cell comprising a successfully integrated exogenous nucleic acid occurs within a frequency of about one every 90, 80, 70, 60, 50, 40, 30, 20, or 10 contacted host cells, or clonal populations thereof, screened. In more particular embodiments, markerless recovery of a transformed cell comprising a successfully integrated exogenous nucleic acid occurs within a frequency of about one every 9, 8, 7, 6, 5, 4, 3, or 2 contacted host cells, or clonal populations thereof, screened. In more particular embodiments, the host cell is a yeast cell, and the increased frequency of integration derives from yeast's increased capacity for homologous recombination relative to other host cell types.
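To give a feel for what a recovery frequency of "about one every N" implies for screening effort, the sketch below estimates the chance of finding at least one integrant among a given number of screened colonies. It assumes, for illustration only, that colonies are independent and that each is an integrant with probability 1/N; neither assumption is stated in the text, and the function name is hypothetical.

```python
def prob_at_least_one(one_in_n: int, colonies_screened: int) -> float:
    """Chance that at least one screened colony carries the integration.

    Assumption for this sketch: independent colonies, each positive with
    probability 1 / one_in_n.
    """
    if one_in_n < 1:
        raise ValueError("one_in_n must be at least 1")
    if colonies_screened < 0:
        raise ValueError("colonies_screened must be non-negative")
    p = 1.0 / one_in_n
    return 1.0 - (1.0 - p) ** colonies_screened


# 96 colonies corresponds to one plate of colony PCR reactions.
for n in (1000, 100, 10, 2):
    print(f"1 in {n}: {prob_at_least_one(n, 96):.3f}")
```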

[0082] A variety of methods are available to identify those cells having an altered genome at or near the target site without the use of a selectable marker. In some embodiments, such methods seek to detect any change in the target site, and include but are not limited to PCR methods, sequencing methods, nuclease digestion, e.g., restriction mapping, Southern blots, and any combination thereof.

[0083] In another aspect, provided herein is a method for integrating a plurality of exogenous nucleic acids into a host cell genome, the method comprising:

[0084] (a) contacting a host cell with:

[0085] (i) a plurality of exogenous nucleic acids, wherein each exogenous nucleic acid (ES).sub.x comprises a first homology region (HR1).sub.x and a second homology region (HR2).sub.x, wherein (HR1).sub.x and (HR2).sub.x are capable of initiating host cell mediated homologous recombination of (ES).sub.x at a target site (TS).sub.x of said host cell genome; and

[0086] (ii) for each said target site (TS).sub.x, a nuclease (N).sub.x capable of cleaving at (TS).sub.x, whereupon said cleaving results in homologous recombination of (ES).sub.x at (TS).sub.x;

[0087] and

[0088] (b) recovering a host cell wherein each selected exogenous nucleic acid (ES).sub.x has integrated at each selected target sequence (TS).sub.x,

[0089] wherein x is any integer from 1 to n wherein n is at least 2.

[0090] FIG. 2 provides an exemplary embodiment of simultaneous genomic integration of a plurality of exogenous nucleic acids using a plurality of site-specific nucleases. In this example, three polynucleotides are introduced to a host cell, wherein each polynucleotide comprises an exogenous nucleic acid (ES).sub.x comprising a nucleic acid of interest (D).sub.x, wherein x=1, 2 or 3. Each (D).sub.x is flanked by a first homology region (HR1).sub.x and a second homology region (HR2).sub.x. (HR1).sub.x and (HR2).sub.x share homology with 5' and 3' regions, respectively, of a selected target site (TS).sub.x, of three total unique target sites in the genome. A plurality of site-specific nucleases (N).sub.x is also introduced to the host cell, wherein each (N).sub.x is capable of recognizing and cleaving a unique sequence within its corresponding target site, (TS).sub.x. Upon cleavage of a target site (TS).sub.x by its corresponding site-specific nuclease (N).sub.x, endogenous homologous recombination machinery facilitates integration of the corresponding nucleic acid of interest (D).sub.x at (TS).sub.x.

[0091] In particular embodiments, each exogenous nucleic acid (ES).sub.x, optionally comprising a nucleic acid of interest (D).sub.x, is integrated into its respective genomic target site (TS).sub.x simultaneously, i.e., with a single transformation of the host cell with the plurality of integration polynucleotides and plurality of nucleases. In some embodiments, the methods are useful to simultaneously integrate any plurality of exogenous nucleic acids (ES).sub.x, that is, where x is any integer from 1 to n wherein n is at least 2, in accordance with the variables recited for the above described method. In some embodiments, the method of simultaneous integration provided herein is useful to simultaneously integrate up to 10 exogenous nucleic acids (ES).sub.x into 10 selected target sites (TS).sub.x, that is, where x is any integer from 1 to n wherein n=2, 3, 4, 5, 6, 7, 8, 9, or 10. In some embodiments, the method of simultaneous integration provided herein is useful to simultaneously integrate up to 20 exogenous nucleic acids (ES).sub.x into 20 selected target sites (TS).sub.x, that is, where x is any integer from 1 to n wherein n=2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20. In some embodiments, n=2. In some embodiments, n=3. In some embodiments, n=4. In some embodiments, n=5. In some embodiments, n=6. In some embodiments, n=7. In some embodiments, n=8. In some embodiments, n=9. In some embodiments, n=10. In some embodiments, n=11. In some embodiments, n=12. In some embodiments, n=13. In some embodiments, n=14. In some embodiments, n=15. In some embodiments, n=16. In some embodiments, n=17. In some embodiments, n=18. In some embodiments, n=19. In some embodiments, n=20. In some embodiments, the method of simultaneous integration provided herein is useful to simultaneously integrate more than 20 exogenous nucleic acids.
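The indexing scheme used above, in which each x from 1 to n ties together one (ES).sub.x, one (TS).sub.x and one (N).sub.x, can be sketched as a small record type with a consistency check. The class, field and function names are hypothetical bookkeeping devices introduced for this example, and the labels in the usage lines are placeholders rather than constructs from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationUnit:
    x: int             # index x, running from 1 to n
    exogenous: str     # label for (ES)x, i.e. (HR1)x + (D)x + (HR2)x
    target_site: str   # label for (TS)x
    nuclease: str      # label for (N)x, capable of cleaving at (TS)x


def check_plan(units: list[IntegrationUnit]) -> int:
    """Return n after checking that n >= 2 and that x covers 1..n exactly once."""
    n = len(units)
    if n < 2:
        raise ValueError("the multiplex method requires n of at least 2")
    if sorted(u.x for u in units) != list(range(1, n + 1)):
        raise ValueError("indices x must cover 1..n exactly once")
    return n


# Placeholder labels for a three-site plan, as in the FIG. 2 example (x = 1, 2, 3).
plan = [
    IntegrationUnit(1, "ES1", "TS1", "N1"),
    IntegrationUnit(2, "ES2", "TS2", "N2"),
    IntegrationUnit(3, "ES3", "TS3", "N3"),
]
print(check_plan(plan))
```

The check deliberately does not require the nuclease labels to differ, since a single nuclease may recognize the same sequence placed at several loci, as in the experiment described for FIG. 6.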

[0092] As with integration of a single exogenous nucleic acid at a single target site, the simultaneous multiple integration of a plurality of exogenous nucleic acids occurs at a substantially higher frequency as compared to not contacting the target sites with a nuclease capable of inducing a double-stranded break. In some embodiments, during the simultaneous integration of a plurality of exogenous nucleic acids at multiple loci, i.e., in the presence of multiple nucleases, the frequency of integration at any single locus is substantially higher compared to the frequency of integration at the same locus during a single integration event, i.e., in the presence of a single nuclease. Such an advantage is demonstrated in Example 6 (Section 7.5.2) below. Without being bound by theory, it is believed that the presence and activity of multiple nucleases, creating double-strand breaks (DSBs) at a plurality of target sites, enriches for transformants that successfully repair the DSBs by integrating donor DNA(s) at the cut site, and/or selects against transformants unable to repair the DSBs. Since DSBs are toxic to cells, it is believed that an increased number of nucleases leads to more DSBs, and correspondingly, an enrichment for cells able to repair the DSBs through HR-mediated integration of donor DNA(s).

[0093] In some embodiments, this increased frequency of integration obviates the requirement for co-integration of one or more selectable markers for the identification of the plurality of recombination events. In some embodiments, markerless recovery of a transformed cell comprising a plurality of successfully integrated exogenous nucleic acids occurs within a frequency of about one every 1000, 900, 800, 700, 600, 500, 400, 300, 200 or 100 contacted host cells, or clonal populations thereof, screened. In particular embodiments, markerless recovery occurs within a frequency of about one every 90, 80, 70, 60, 50, 40, 30, 20, or 10 contacted host cells, or clonal populations thereof, screened. In more particular embodiments, markerless recovery occurs within a frequency of about one every 9, 8, 7, 6, 5, 4, 3, or 2 contacted host cells, or clonal populations thereof, screened. In more particular embodiments, the host cell is a yeast cell, and the increased frequency of integration derives from yeast's increased capacity for homologous recombination relative to other host cell types.

6.2.1. Methods for Metabolic Pathway Engineering

[0094] The methods and compositions described herein provide particular advantages for constructing recombinant organisms comprising optimized biosynthetic pathways, for example, towards the conversion of biomass into biofuels, pharmaceuticals or biomaterials. Functional non-native biological pathways have been successfully constructed in microbial hosts for the production of precursors to the antimalarial drug artemisinin (see, e.g., Martin et al., Nat Biotechnol 21:796-802 (2003)); fatty acid-derived fuels and chemicals (e.g., fatty esters, fatty alcohols and waxes; see, e.g., Steen et al., Nature 463:559-562 (2010)); methyl halide-derived fuels and chemicals (see, e.g., Bayer et al., J Am Chem Soc 131:6508-6515 (2009)); polyketide synthases that make cholesterol lowering drugs (see, e.g., Ma et al., Science 326:589-592 (2009)); and polyketides (see, e.g., Kodumal, Proc Natl Acad Sci USA 101:15573-15578 (2004)).

[0095] Traditionally, metabolic engineering, and in particular, the construction of biosynthetic pathways, has proceeded in a one-at-a-time serial fashion whereby pathway components have been introduced, i.e., integrated into the host cell genome at a single locus at a time. The methods of integration provided herein can be utilized to reduce the time typically required to engineer a host cell, for example, a microbial cell, to comprise one or more heterologous nucleotide sequences encoding enzymes of a new metabolic pathway, i.e., a metabolic pathway that produces a metabolite that is not endogenously produced by the host cell. In other particular embodiments, the methods of integration provided herein can be used to efficiently engineer a host cell to comprise one or more heterologous nucleotide sequences encoding enzymes of a metabolic pathway that is endogenous to the host cell, i.e., a metabolic pathway that produces a metabolite that is endogenously produced by the host cell. In one example, a design strategy may seek to replace three native genes of a host cell with a complementary exogenous pathway. Modifying these three endogenous loci using the current state of the art requires three separate transformations. By contrast, the methods of simultaneous multiple integration provided herein enable all three integrations to be performed in a single transformation, thus reducing the rounds of engineering needed by three-fold. Moreover, the methods enable the porting of DNA assemblies, comprising optimized pathway components integrated at multiple sites in one host cell chassis, to analogous sites in a second host cell chassis. By reducing the number of rounds needed to engineer a desired genotype, the pace of construction of metabolic pathways is substantially increased.

6.2.1.1 Isoprenoid Pathway Engineering

[0096] In some embodiments, the methods provided herein can be utilized to simultaneously introduce or replace one or more components of a biosynthetic pathway to modify the product profile of an engineered host cell. In some embodiments, the biosynthetic pathway is the isoprenoid pathway.

[0097] Terpenes are a large class of hydrocarbons that are produced in many organisms. When terpenes are chemically modified (e.g., via oxidation or rearrangement of the carbon skeleton) the resulting compounds are generally referred to as terpenoids, which are also known as isoprenoids. Isoprenoids play many important biological roles, for example, as quinones in electron transport chains, as components of membranes, in subcellular targeting and regulation via protein prenylation, as photosynthetic pigments including carotenoids and chlorophyll, as hormones and cofactors, and as plant defense compounds with various monoterpenes, sesquiterpenes, and diterpenes. They are industrially useful as antibiotics, hormones, anticancer drugs, insecticides, and chemicals.

[0098] Terpenes are derived by linking units of isoprene (C.sub.5H.sub.8), and are classified by the number of isoprene units present. Hemiterpenes consist of a single isoprene unit. Isoprene itself is considered the only hemiterpene. Monoterpenes are made of two isoprene units, and have the molecular formula C.sub.10H.sub.16. Examples of monoterpenes are geraniol, limonene, and terpineol. Sesquiterpenes are composed of three isoprene units, and have the molecular formula C.sub.15H.sub.24. Examples of sesquiterpenes are farnesenes and farnesol. Diterpenes are made of four isoprene units, and have the molecular formula C.sub.20H.sub.32. Examples of diterpenes are cafestol, kahweol, cembrene, and taxadiene. Sesterterpenes are made of five isoprene units, and have the molecular formula C.sub.25H.sub.40. An example of a sesterterpene is geranylfarnesol. Triterpenes consist of six isoprene units, and have the molecular formula C.sub.30H.sub.48. Tetraterpenes contain eight isoprene units, and have the molecular formula C.sub.40H.sub.64. Biologically important tetraterpenes include the acyclic lycopene, the monocyclic gamma-carotene, and the bicyclic alpha- and beta-carotenes. Polyterpenes consist of long chains of many isoprene units. Natural rubber consists of polyisoprene in which the double bonds are cis.
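The classification above follows a simple rule: a terpene hydrocarbon built from k isoprene units has the formula (C5H8) multiplied by k. The sketch below encodes that rule together with the class names given in the text; the function and dictionary names are introduced here for illustration. Oxygen-containing examples named above, such as geraniol and farnesol, are terpenoid alcohols and do not match the pure hydrocarbon formula.

```python
# Class names by number of isoprene units, as listed in the text.
TERPENE_CLASSES = {
    1: "hemiterpene",
    2: "monoterpene",
    3: "sesquiterpene",
    4: "diterpene",
    5: "sesterterpene",
    6: "triterpene",
    8: "tetraterpene",
}


def terpene_formula(isoprene_units: int) -> str:
    """Molecular formula of a terpene hydrocarbon made of k C5H8 units."""
    if isoprene_units < 1:
        raise ValueError("at least one isoprene unit is required")
    return f"C{5 * isoprene_units}H{8 * isoprene_units}"


def terpene_class(isoprene_units: int):
    """Class name from the table above, or None if the text names no class."""
    return TERPENE_CLASSES.get(isoprene_units)


for k in sorted(TERPENE_CLASSES):
    print(k, terpene_class(k), terpene_formula(k))
```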

[0099] Terpenes are biosynthesized through condensations of isopentenyl pyrophosphate (isopentenyl diphosphate or IPP) and its isomer dimethylallyl pyrophosphate (dimethylallyl diphosphate or DMAPP). Two pathways are known to generate IPP and DMAPP, namely the mevalonate-dependent (MEV) pathway of eukaryotes (FIG. 3), and the mevalonate-independent or deoxyxylulose-5-phosphate (DXP) pathway of prokaryotes. Plants use both the MEV pathway and the DXP pathway. IPP and DMAPP in turn are condensed to polyprenyl diphosphates (e.g., geranyl diphosphate or GPP, farnesyl diphosphate or FPP, and geranylgeranyl diphosphate or GGPP) through the action of prenyl diphosphate synthases (e.g., GPP synthase, FPP synthase, and GGPP synthase, respectively). The polyprenyl diphosphate intermediates are converted to more complex isoprenoid structures by terpene synthases.

[0100] Terpene synthases are organized into large gene families that form multiple products. Examples of terpene synthases include monoterpene synthases, which convert GPP into monoterpenes; diterpene synthases, which convert GGPP into diterpenes; and sesquiterpene synthases, which convert FPP into sesquiterpenes. An example of a sesquiterpene synthase is farnesene synthase, which converts FPP to farnesene. Terpene synthases are important in the regulation of pathway flux to an isoprenoid because they operate at metabolic branch points and dictate the type of isoprenoid produced by the cell. Moreover, the terpene synthases hold the key to high yield production of such terpenes. As such, one strategy to improve pathway flux in hosts engineered for heterologous isoprenoid production is to introduce multiple copies of nucleic acids encoding terpene synthases. For example, in engineered microbes comprising the MEV pathway where the production of sesquiterpenes such as farnesene is desired, a sesquiterpene synthase, e.g., a farnesene synthase is utilized as the terminal enzyme of the pathway, and multiple copies of farnesene synthase genes may be introduced into the host cell towards the generation of a strain optimized for farnesene production.

[0101] Because the biosynthesis of any isoprenoid relies on the same pathway components upstream of the prenyl diphosphate synthase and terpene synthase, these pathway components, once engineered into a host "platform" strain, can be utilized towards the production of any sesquiterpene, and the identity of the sesquiterpene can be dictated by the particular sesquiterpene synthase introduced into the host cell. Moreover, where production of terpenes having different isoprene units is desired, for example a monoterpene instead of a sesquiterpene, both the prenyl diphosphate synthase and the terpene synthase can be replaced to produce the different terpene while still utilizing the upstream components of the pathway.

[0102] Accordingly, the methods and compositions provided herein can be utilized to efficiently modify a host cell comprising an isoprenoid producing pathway, e.g., the MEV pathway to produce a desired isoprenoid. In some embodiments, the host cell comprises the MEV pathway, and the methods of simultaneous multiple integration provided herein can be utilized to simultaneously introduce multiple copies of a prenyl diphosphate synthase and/or a terpene synthase to define the terpene product profile of the host cell. In some embodiments, the prenyl diphosphate synthase is GPP synthase and the terpene synthase is a monoterpene synthase. In some embodiments, the prenyl diphosphate synthase is FPP synthase and the terpene synthase is a sesquiterpene synthase. In some embodiments, the prenyl diphosphate synthase is GGPP synthase and the terpene synthase is a diterpene synthase. In other embodiments, the host cell comprises the MEV pathway and a prenyl diphosphate synthase and/or a terpene synthase for the production of a first type of terpene, for example, farnesene, and the methods of simultaneous multiple integration provided herein can be utilized to simultaneously replace one or more copies of the prenyl diphosphate synthase and/or a terpene synthase to produce a second type of terpene, for example, amorphadiene. These embodiments are exemplified in Examples 3 and 4 below. The methods provided herein can be similarly utilized towards the construction and/or modification of any biosynthetic pathway which utilizes multiple copies of pathway components, and are particularly useful for engineering host cells whose product profile can be readily modified with the addition or exchange of multiple copies of a single pathway component.
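The pairing of prenyl diphosphate synthase and terpene synthase described above, and the idea of retargeting a platform strain while keeping its upstream pathway, can be sketched as a lookup and a dictionary update. The dictionary keys, the function name and the representation of a strain as a plain dictionary are assumptions made for this illustration.

```python
# Synthase pairs named in the text for each terpene class.
PAIRING = {
    "monoterpene": ("GPP synthase", "monoterpene synthase"),
    "sesquiterpene": ("FPP synthase", "sesquiterpene synthase"),
    "diterpene": ("GGPP synthase", "diterpene synthase"),
}


def retarget(strain: dict, terpene_class: str) -> dict:
    """Return a copy of the strain with both synthases swapped for the new class.

    The upstream pathway entry is left untouched, mirroring the platform-strain
    idea; the input dictionary is not modified.
    """
    if terpene_class not in PAIRING:
        raise ValueError(f"no pairing listed for {terpene_class!r}")
    prenyl, synthase = PAIRING[terpene_class]
    return {
        **strain,
        "prenyl_diphosphate_synthase": prenyl,
        "terpene_synthase": synthase,
    }


platform = {
    "upstream": "MEV pathway",
    "prenyl_diphosphate_synthase": "FPP synthase",
    "terpene_synthase": "sesquiterpene synthase",
}
mono = retarget(platform, "monoterpene")
print(mono)
```

Switching between two sesquiterpenes, for example from farnesene to amorphadiene, would change only the terpene synthase entry, since both products derive from FPP.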

6.2.1.2 Methods of Generating Combinatorial Integration Libraries

[0103] Once biosynthetic pathways are constructed, the expression levels of all the components need to be orchestrated to optimize metabolic flux and achieve high product titers. Common approaches for optimizing flux include varying the identity of the pathway component gene, the codon optimization of the gene, the use of solubility tags, the use of truncations or known mutations, and the expression context of the gene (i.e. promoter and terminator choice). To sample this variability in the course of building a strain using traditional methods requires generating and archiving an impractically large number of strains. For example, if a strain engineer plans to integrate constructs at three loci, and has devised 10 variants for each locus, 1,000 strains would need to be generated to fully sample the combinatorial diversity. Since pathway genes work in concert, and not all metabolite intermediates can easily be screened for, it is often impossible to evaluate the individual contribution of the pathway genes after each integration cycle. Thus, strain engineers routinely make choices that severely limit the design space that they sample when constructing a novel metabolic pathway.
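The strain count in the example above is the product of the number of variants at each locus. A minimal sketch of that arithmetic follows; the function name is illustrative, and the second call uses made-up library sizes to show the unequal case.

```python
from math import prod


def library_size(variants_per_locus: list[int]) -> int:
    """Number of distinct strains needed to cover every combination of variants."""
    if not variants_per_locus or any(v < 1 for v in variants_per_locus):
        raise ValueError("each locus needs at least one variant")
    return prod(variants_per_locus)


print(library_size([10, 10, 10]))   # three loci, 10 variants each: 1000
print(library_size([10, 4, 25]))    # hypothetical unequal library sizes: 1000
```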

[0104] To better identify the optimal pathway design, the methods of genomic modification provided herein can be utilized to generate strains comprising combinatorial libraries of rationally designed integration constructs. The methods rely on the introduction of one or more nucleases and one or more donor DNA assemblies into the cell to facilitate multiple simultaneous integration of donor DNA at specified locations in the genome. However, to generate a diversity of engineered strains, the methods comprise co-transforming a library of donor DNAs, i.e., a mixture of integration constructs for each targeted locus, such that combinatorial integration libraries of host strains can be generated (FIG. 4). The high frequency of multiple integrations achieved means that the resultant strains can reasonably be screened directly for product without extensive genomic quality control, and the identity of top strains can be determined after screening, for example, by sequencing. This method removes the burden of individual strain generation, quality control and archiving, and allows the engineer to generate diverse integration combinations in a single tube, and sort out the best performing strains by screening, e.g., for the terminal product of the pathway.

[0105] Thus, in some embodiments, the methods for integrating a plurality of exogenous nucleic acids into a host cell genome provided herein comprise:

[0106] (d) contacting a host cell with:

[0107] (i) a plurality of libraries, wherein each library (L).sub.x comprises a plurality of exogenous nucleic acids, wherein a selected exogenous nucleic acid comprises, in a 5' to 3' orientation, a first homology region (HR1).sub.x, any nucleic acid of interest selected from the group (D).sub.x, and a second homology region (HR2).sub.x, wherein (HR1).sub.x and (HR2).sub.x are capable of initiating host cell mediated homologous recombination of said selected exogenous nucleic acid at a target site (TS).sub.x of said host cell genome; and

[0108] (ii) for each said target site (TS).sub.x, a nuclease (N).sub.x capable of cleaving at (TS).sub.x, whereupon said cleaving results in homologous recombination of said selected exogenous nucleic acid at (TS).sub.x;

[0109] and

[0110] (e) recovering a host cell wherein an exogenous nucleic acid from each library (L).sub.x has integrated at each selected target sequence (TS).sub.x,

[0111] wherein x is any integer from 1 to n wherein n is at least 2.

[0112] A schematic representation of this method is provided in FIG. 4.
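By way of non-limiting illustration, the combinatorial space described above can be sketched in Python. The sketch assumes that exactly one member of each library (L).sub.x integrates at its target site (TS).sub.x; the site names and construct names are hypothetical placeholders and do not appear in this disclosure.

```python
from itertools import product
from math import prod

# Hypothetical libraries: one list of donor constructs per target site.
# All names below are illustrative placeholders only.
libraries = {
    "TS1": ["D1_a", "D1_b", "D1_c"],
    "TS2": ["D2_a", "D2_b"],
    "TS3": ["D3_a", "D3_b", "D3_c", "D3_d"],
}


def library_diversity(libs):
    """Number of distinct genotypes if each site receives one library member."""
    return prod(len(members) for members in libs.values())


def enumerate_genotypes(libs):
    """Yield every combination as a dict mapping target site -> construct."""
    sites = list(libs)
    for combo in product(*(libs[site] for site in sites)):
        yield dict(zip(sites, combo))


if __name__ == "__main__":
    print(library_diversity(libraries))
    for genotype in enumerate_genotypes(libraries):
        print(genotype)
```

Under these assumptions the diversity is simply the product of the library sizes, which is why even modest libraries at a few loci can yield more strains than could practically be built one at a time.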

[0113] Also provided herein is a host cell comprising:

[0114] (a) a plurality of libraries, wherein each library (L).sub.x comprises a plurality of exogenous nucleic acids, wherein a selected exogenous nucleic acid comprises, in a 5' to 3' orientation, a first homology region (HR1).sub.x, any nucleic acid of interest selected from the group (D).sub.x, and a second homology region (HR2).sub.x, wherein (HR1).sub.x and (HR2).sub.x are capable of initiating host cell mediated homologous recombination of said selected exogenous nucleic acid at a target site (TS).sub.x of said host cell genome; and

[0115] (b) for each said target site (TS).sub.x, a nuclease (N).sub.x capable of cleaving at (TS).sub.x, whereupon said cleaving results in homologous recombination of said selected exogenous nucleic acid at (TS).sub.x,

[0116] wherein x is any integer from 1 to n wherein n is at least 2.

[0117] In some embodiments, each library (L).sub.x comprises exogenous nucleic acids encoding enzymes of a common biosynthetic pathway. In some embodiments, the group (D).sub.x comprises at least 10.sup.1, 10.sup.2, 10.sup.3, 10.sup.4, 10.sup.5, 10.sup.6, or more than 10.sup.6 unique nucleic acids of interest. In some embodiments, each library (L).sub.x comprises a plurality of exogenous nucleic acids encoding variants of an enzyme of a biosynthetic pathway. As used herein, the term "variant" refers to an enzyme of a biosynthetic pathway that compared to a selected enzyme has a different nucleotide or amino acid sequence. For example, in some embodiments, a library (L).sub.x comprises sesquiterpene synthase variants, and compared to the wild-type version of the selected sesquiterpene synthase, the sesquiterpene synthase variant may comprise nucleotide additions, deletions, and/or substitutions that may or may not result in changes to the corresponding amino acid sequence. In other embodiments, the enzyme variant comprises amino acid additions, deletions and/or substitutions relative to a reference enzyme, e.g., the wild-type version.

[0118] In some embodiments, the host cell comprises one or more heterologous nucleotide sequences encoding one or more enzymes of a biosynthetic pathway prior to said contacting. In some embodiments, the one or more heterologous nucleotide sequences encoding one or more enzymes of a biosynthetic pathway are genomically integrated.

6.3 Integration Polynucleotides

[0119] Advantageously, an integration polynucleotide, i.e., donor DNA, facilitates integration of one or more exogenous nucleic acid constructs into a selected target site of a host cell genome. In preferred embodiments, an integration polynucleotide comprises an exogenous nucleic acid (ES).sub.x comprising a first homology region (HR1).sub.x and a second homology region (HR2).sub.x, and optionally a nucleic acid of interest positioned between (HR1).sub.x and (HR2).sub.x. In some embodiments, the integration polynucleotide is a linear DNA molecule. In other embodiments, the integration polynucleotide is a circular DNA molecule.
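For illustration only, the layout of an integration polynucleotide described above may be modeled as a small data structure. The class name, field names, and toy sequences below are assumptions introduced for this sketch and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationPolynucleotide:
    """Hypothetical model of a donor DNA: (HR1)x, optional (D)x, (HR2)x."""

    hr1: str                # first homology region (HR1)x
    hr2: str                # second homology region (HR2)x
    payload: str = ""       # optional nucleic acid of interest (D)x
    circular: bool = False  # linear by default; circular in other embodiments

    def sequence(self) -> str:
        """Return the 5' to 3' sequence: HR1, then payload (if any), then HR2."""
        return self.hr1 + self.payload + self.hr2


if __name__ == "__main__":
    donor = IntegrationPolynucleotide(hr1="AAAA", hr2="TTTT", payload="GG")
    print(donor.sequence())
```

Leaving the payload empty corresponds to the case in which no nucleic acid of interest is positioned between the two homology regions.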

[0120] The integration polynucleotide can be generated by any technique apparent to one skilled in the art. In certain embodiments, the integration polynucleotide is generated using polymerase chain reaction (PCR) and molecular cloning techniques well known in the art. See, e.g., PCR Technology: Principles and Applications for DNA Amplification, ed. H A Erlich, Stockton Press, New York, N.Y. (1989); Sambrook et al., 2001, Molecular Cloning--A Laboratory Manual, 3.sup.rd edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.; U.S. Pat. No. 8,110,360.

6.3.1. Genomic Integration Sequences

[0121] In preferred embodiments, an integration polynucleotide comprises an exogenous nucleic acid (ES).sub.x comprising a first homology region (HR1).sub.x and a second homology region (HR2).sub.x, wherein (HR1).sub.x and (HR2).sub.x are capable of initiating host cell mediated homologous recombination at a selected target site (TS).sub.x within the host cell genome. To integrate an exogenous nucleic acid into the genome by homologous recombination, the integration polynucleotide preferably comprises (HR1).sub.x at one terminus and (HR2).sub.x at the other terminus. In some embodiments, (HR1).sub.x is homologous to a 5' region of the selected genomic target site (TS).sub.x, and (HR2).sub.x is homologous to a 3' region of the selected target site (TS).sub.x. In some embodiments, (HR1).sub.x is about 70%, 75%, 80%, 85%, 90%, 95% or 100% homologous to a 5' region of the selected genomic target site (TS).sub.x. In some embodiments, (HR2).sub.x is about 70%, 75%, 80%, 85%, 90%, 95% or 100% homologous to a 3' region of the selected target site (TS).sub.x.
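As a minimal sketch, the percentage figures above can be read as ungapped percent identity between a homology region and the corresponding genomic flank. That reading is an assumption made for this example; in practice a pairwise alignment that tolerates insertions and deletions would more likely be used. The function name and test sequences are hypothetical.

```python
def percent_identity(region: str, genomic_flank: str) -> float:
    """Ungapped percent identity between two equal-length DNA sequences.

    Assumption: positions are compared one-to-one with no gaps allowed.
    """
    if len(region) != len(genomic_flank):
        raise ValueError("sequences must be the same length for ungapped comparison")
    if not region:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(region.upper(), genomic_flank.upper()))
    return 100.0 * matches / len(region)


if __name__ == "__main__":
    # Toy 10-nucleotide sequences differing at one position.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```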

[0122] In certain embodiments, (HR1).sub.x is positioned 5' to a nucleic acid of interest (D).sub.x. In some embodiments, (HR1).sub.x is positioned immediately adjacent to the 5' end of (D).sub.x. In some embodiments, (HR1).sub.x is positioned upstream of the 5' end of (D).sub.x. In certain embodiments, (HR2).sub.x is positioned 3' to a nucleic acid of interest (D).sub.x. In some embodiments, (HR2).sub.x is positioned immediately adjacent to the 3' end of (D).sub.x. In some embodiments, (HR2).sub.x is positioned downstream of the 3' end of (D).sub.x.

[0123] Properties that may affect the integration of an integration polynucleotide at a particular genomic locus include but are not limited to: the lengths of the genomic integration sequences, the overall length of the excisable nucleic acid construct, and the nucleotide sequence or location of the genomic integration locus. For instance, effective heteroduplex formation between one strand of a genomic integration sequence and one strand of a particular locus in a host cell genome may depend on the length of the genomic integration sequence. An effective range for the length of a genomic integration sequence is 50 to 5,000 nucleotides. For a discussion of effective lengths of homology between genomic integration sequences and genomic loci, see Hasty et al., Mol Cell Biol 11:5586-91 (1991).

[0124] In some embodiments, (HR1).sub.x and (HR2).sub.x can comprise any nucleotide sequence of sufficient length and sequence identity that allows for genomic integration of the exogenous nucleic acid (ES).sub.x at any yeast genomic locus. In certain embodiments, each of (HR1).sub.x and (HR2).sub.x independently consists of about 50 to 5,000 nucleotides. In certain embodiments, each of (HR1).sub.x and (HR2).sub.x independently consists of about 100 to 2,500 nucleotides. In certain embodiments, each of (HR1).sub.x and (HR2).sub.x independently consists of about 100 to 1,000 nucleotides. In certain embodiments, each of (HR1).sub.x and (HR2).sub.x independently consists of about 250 to 750 nucleotides. In certain embodiments, each of (HR1).sub.x and (HR2).sub.x independently consists of about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800, 3900, 4000, 4100, 4200, 4300, 4400, 4500, 4600, 4700, 4800, 4900 or 5,000 nucleotides. In some embodiments, each of (HR1).sub.x and (HR2).sub.x independently consists of about 500 nucleotides.
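The nested length ranges recited above can be checked mechanically. The following sketch treats each "about" range as a hard inclusive bound, which is a simplifying assumption, and the range labels are hypothetical names chosen for this example.

```python
# Inclusive length ranges (in nucleotides) recited for (HR1)x and (HR2)x.
# Labels are illustrative; "about" is treated here as a hard bound.
LENGTH_RANGES = {
    "50-5000": (50, 5000),
    "100-2500": (100, 2500),
    "100-1000": (100, 1000),
    "250-750": (250, 750),
}


def ranges_satisfied(hr_length: int) -> list[str]:
    """Return the labels of every recited range that contains hr_length."""
    return [
        label
        for label, (low, high) in LENGTH_RANGES.items()
        if low <= hr_length <= high
    ]


if __name__ == "__main__":
    for length in (10, 60, 500, 3000):
        print(length, ranges_satisfied(length))
```

A homology region of about 500 nucleotides falls inside every recited range, which is consistent with its being singled out in the final embodiment.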

6.3.2. Nucleic Acids of Interest

[0125] In some embodiments, the integration polynucleotide further comprises a nucleic acid of interest (D).sub.x. The nucleic acid of interest can be any DNA segment deemed useful by one of skill in the art. For example, the DNA segment may comprise a gene of interest that can be "knocked in" to a host genome. In other embodiments, the DNA segment functions as a "knockout" construct that is capable of specifically disrupting a target gene upon integration of the construct into the target site of the host cell genome, thereby rendering the disrupted gene non-functional. Useful examples of a nucleic acid of interest (D).sub.x include but are not limited to: a protein-coding sequence, reporter gene, fluorescent marker coding sequence, promoter, enhancer, terminator, transcriptional activator, transcriptional repressor, transcriptional activator binding site, transcriptional repressor binding site, intron, exon, poly-A tail, multiple cloning site, nuclear localization signal, mRNA stabilization signal, integration loci, epitope tag coding sequence, degradation signal, or any other naturally occurring or synthetic DNA molecule. In some embodiments, (D).sub.x can be of natural origin. Alternatively, (D).sub.x can be completely of synthetic origin, produced in vitro. Furthermore, (D).sub.x can comprise any combination of isolated naturally occurring DNA molecules, or any combination of an isolated naturally occurring DNA molecule and a synthetic DNA molecule. For example, (D).sub.x may comprise a heterologous promoter operably linked to a protein coding sequence, a protein coding sequence linked to a poly-A tail, a protein coding sequence linked in-frame with an epitope tag coding sequence, and the like. 
The nucleic acid of interest (D).sub.x may be obtained by standard procedures known in the art from cloned DNA (e.g., a DNA "library"), by chemical synthesis, by cDNA cloning, or by the cloning of genomic DNA, or fragments thereof, purified from the desired cell, or by PCR amplification and cloning. See, for example, Sambrook et al., Molecular Cloning, A Laboratory Manual, 3d. ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001); Glover, D. M. (ed.), DNA Cloning: A Practical Approach, 2d. ed., IRL Press, Ltd., Oxford, U.K. (1995).

[0126] In particular embodiments, the nucleic acid of interest (D).sub.x does not comprise nucleic acid encoding a selectable marker. In these embodiments, the high efficiency of integration provided by the methods described herein allows for the screening and identification of integration events without the requirement for growth of transformed cells on selection media. However, in other embodiments where growth on selective media is nonetheless desired, the nucleic acid of interest (D).sub.x can comprise a selectable marker that may be used to select for the integration of the exogenous nucleic acid into a host genome.

[0127] A wide variety of selectable markers are known in the art (see, for example, Kaufman, Meth. Enzymol., 185:487 (1990); Kaufman, Meth. Enzymol., 185:537 (1990); Srivastava and Schlessinger, Gene, 103:53 (1991); Romanos et al., in DNA Cloning 2: Expression Systems, 2nd Edition, pages 123-167 (IRL Press 1995); Markie, Methods Mol. Biol., 54:359 (1996); Pfeifer et al., Gene, 188:183 (1997); Tucker and Burke, Gene, 199:25 (1997); Hashida-Okado et al., FEBS Letters, 425:117 (1998)). In some embodiments, the selectable marker is a drug resistant marker. A drug resistant marker enables cells to detoxify an exogenous drug that would otherwise kill the cell. Illustrative examples of drug resistant markers include but are not limited to those which confer resistance to antibiotics such as ampicillin, tetracycline, kanamycin, bleomycin, streptomycin, hygromycin, neomycin, Zeocin.TM., and the like. In other embodiments, the selectable marker is an auxotrophic marker. An auxotrophic marker allows cells to synthesize an essential component (usually an amino acid) while grown in media that lacks that essential component. Selectable auxotrophic gene sequences include, for example, hisD, which allows growth in histidine free media in the presence of histidinol. Other selectable markers include a bleomycin-resistance gene, a metallothionein gene, a hygromycin B-phosphotransferase gene, the AUR1 gene, an adenosine deaminase gene, an aminoglycoside phosphotransferase gene, a dihydrofolate reductase gene, a thymidine kinase gene, a xanthine-guanine phosphoribosyltransferase gene, and the like. In other embodiments, the selectable marker is a marker other than one which rescues an auxotrophic mutation. 
For example, the host cell strain can comprise mutations other than auxotrophic mutations, for example, mutations that are not lethal to the host and that also do not cause adverse effects on the intended use of the strain, e.g., industrial fermentation, so long as the mutations can be identified by a known selection method.

[0128] Host cell transformants comprising a chromosomally integrated polynucleotide can also be identified by selecting host cell transformants exhibiting other traits encoded by individual DNA segments or by combinations of DNA segments, e.g., expression of peptides that emit light, or by molecular analysis of individual host cell colonies, e.g., by restriction enzyme mapping, PCR amplification, or sequence analysis of isolated assembled polynucleotides or chromosomal integration sites.

6.4 Nucleases

[0129] In some embodiments of the methods described herein, a host cell genome is contacted with one or more nucleases capable of cleaving, i.e., causing a double-stranded break, at a designated region within a selected target site. In some embodiments, a double-strand break inducing agent is any agent that recognizes and/or binds to a specific polynucleotide recognition sequence to produce a break at or near the recognition sequence. Examples of double-strand break inducing agents include, but are not limited to, endonucleases, site-specific recombinases, transposases, topoisomerases, and zinc finger nucleases, and include modified derivatives, variants, and fragments thereof.

[0130] In some embodiments, each of the one or more nucleases is capable of causing a double-strand break at a designated region within a selected target site (TS).sub.x. In some embodiments, the nuclease is capable of causing a double-strand break at a region positioned between the 5' and 3' regions of (TS).sub.x with which (HR1).sub.x and (HR2).sub.x share homology, respectively. In other embodiments, the nuclease is capable of causing a double-strand break at a region positioned upstream or downstream of the 5' and 3' regions of (TS).sub.x.
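The alternative break positions described above can be illustrated with a short sketch that classifies a cut coordinate relative to the genomic regions matched by (HR1).sub.x and (HR2).sub.x. The coordinate convention (half-open intervals on one strand, with the HR1-matched region lying 5' of the HR2-matched region), the function name, and the example coordinates are all assumptions made for this illustration.

```python
def classify_cut(cut: int, hr1_match: tuple[int, int], hr2_match: tuple[int, int]) -> str:
    """Classify a double-strand break relative to the regions of (TS)x
    matched by (HR1)x and (HR2)x.

    Regions are half-open (start, end) intervals on the same strand,
    and hr1_match is assumed to lie entirely 5' of hr2_match.
    """
    hr1_start, hr1_end = hr1_match
    hr2_start, hr2_end = hr2_match
    if not (hr1_start < hr1_end <= hr2_start < hr2_end):
        raise ValueError("expected hr1_match to lie entirely 5' of hr2_match")
    if cut < hr1_start:
        return "upstream"
    if cut >= hr2_end:
        return "downstream"
    if hr1_end <= cut < hr2_start:
        return "between"
    return "within homology region"


if __name__ == "__main__":
    # Hypothetical coordinates on a chromosome.
    print(classify_cut(1550, (1000, 1500), (1600, 2100)))
```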

[0131] A recognition sequence is any polynucleotide sequence that is specifically recognized and/or bound by a double-strand break inducing agent. The length of the recognition site sequence can vary, and includes, for example, sequences that are at least 10, 12, 14, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70 or more nucleotides in length.

[0132] In some embodiments, the recognition sequence is palindromic, that is, the sequence on one strand reads the same in the opposite direction on the complementary strand. In some embodiments, the nick/cleavage site is within the recognition sequence. In other embodiments, the nick/cleavage site is outside of the recognition sequence. In some embodiments, cleavage produces blunt end termini. In other embodiments, cleavage produces single-stranded overhangs, i.e., "sticky ends," which can be either 5' overhangs, or 3' overhangs.
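The definition of a palindromic recognition sequence given above is equivalent to the sequence being equal to its own reverse complement. A minimal sketch follows; the helper names are hypothetical, and the two test sequences are the well-known recognition sequences of EcoRI (GAATTC) and FokI (GGATG), used here only as examples.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence reads the same 5' to 3' on both strands."""
    upper = seq.upper()
    return upper == reverse_complement(upper)


if __name__ == "__main__":
    print(is_palindromic("GAATTC"))  # EcoRI recognition sequence
    print(is_palindromic("GGATG"))   # FokI recognition sequence
```

A palindromic sequence necessarily has even length, since the central base of an odd-length sequence would have to be its own complement.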

[0133] In some embodiments, the recognition sequence within the selected target site can be endogenous or exogenous to the host cell genome. When the recognition site is an endogenous sequence, it may be a recognition sequence recognized by a naturally-occurring, or native double-strand break inducing agent. Alternatively, an endogenous recognition site could be recognized and/or bound by a modified or engineered double-strand break inducing agent designed or selected to specifically recognize the endogenous recognition sequence to produce a double-strand break. In some embodiments, the modified double-strand break inducing agent is derived from a native, naturally-occurring double-strand break inducing agent. In other embodiments, the modified double-strand break inducing agent is artificially created or synthesized. Methods for selecting such modified or engineered double-strand break inducing agents are known in the art. For example, amino acid sequence variants of the protein(s) can be prepared by mutations in the DNA. Methods for mutagenesis and nucleotide sequence alterations include, for example, Kunkel, (1985) Proc Natl Acad Sci USA 82:488-92; Kunkel, et al., (1987) Meth Enzymol 154:367-82; U.S. Pat. No. 4,873,192; Walker and Gaastra, eds. (1983) Techniques in Molecular Biology (MacMillan Publishing Company, New York) and the references cited therein. Guidance regarding amino acid substitutions not likely to affect biological activity of the protein is found, for example, in the model of Dayhoff, et al., (1978) Atlas of Protein Sequence and Structure (Natl Biomed Res Found, Washington, D.C.). Conservative substitutions, such as exchanging one amino acid with another having similar properties, may be preferable. 
Conservative deletions, insertions, and amino acid substitutions are not expected to produce radical changes in the characteristics of the protein, and the effect of any substitution, deletion, insertion, or combination thereof can be evaluated by routine screening assays. Assays for double strand break inducing activity are known and generally measure the overall activity and specificity of the agent on DNA substrates containing recognition sites.

[0134] In some embodiments of the methods provided herein, one or more of the nucleases is an endonuclease. Endonucleases are enzymes that cleave the phosphodiester bond within a polynucleotide chain, and include restriction endonucleases that cleave DNA at specific sites without damaging the bases. Restriction endonucleases include Type I, Type II, Type III, and Type IV endonucleases, which further include subtypes. Restriction endonucleases are further described and classified, for example in the REBASE database (webpage at rebase.neb.com; Roberts, et al., (2003) Nucleic Acids Res 31:418-20), Roberts, et al., (2003) Nucleic Acids Res 31:1805-12, and Belfort, et al., (2002) in Mobile DNA II, pp. 761-783, Eds. Craigie, et al., ASM Press, Washington, D.C.

[0135] As used herein, endonucleases also include homing endonucleases, which, like restriction endonucleases, bind and cut at a specific recognition sequence. However, the recognition sites for homing endonucleases are typically longer, for example, about 18 bp or more. Homing endonucleases, also known as meganucleases, have been classified into the following families based on conserved sequence motifs: an LAGLIDADG (SEQ ID NO: 50) homing endonuclease, an HNH homing endonuclease, a His-Cys box homing endonuclease, a GIY-YIG (SEQ ID NO: 51) homing endonuclease, and a cyanobacterial homing endonuclease. See, e.g., Stoddard, Quarterly Review of Biophysics 38(1): 49-95 (2006). These families differ greatly in their conserved nuclease active-site core motifs and catalytic mechanisms, biological and genomic distributions, and wider relationship to non-homing nuclease systems. See, for example, Guhan and Muniyappa (2003) Crit Rev Biochem Mol Biol 38:199-248; Lucas, et al., (2001) Nucleic Acids Res 29:960-9; Jurica and Stoddard, (1999) Cell Mol Life Sci 55:1304-26; Stoddard, (2006) Q Rev Biophys 38:49-95; and Moure, et al., (2002) Nat Struct Biol 9:764. Examples of useful specific homing endonucleases from these families include, but are not limited to: I-CreI (see, Rochaix et al., Nucleic Acids Res. 13: 975-984 (1985)), I-MsoI (see, Lucas et al., Nucleic Acids Res. 29: 960-969 (2001)), I-SceI (see, Foury et al., FEBS Lett. 440: 325-331 (1998)), I-SceIV (see, Moran et al., Nucleic Acids Res. 20: 4069-4076 (1992)), H-DreI (see, Chevalier et al., Mol. Cell 10: 895-905 (2002)), I-HmuI (see, Goodrich-Blair et al., Cell 63: 417-424 (1990); Goodrich-Blair et al., Cell 84: 211-221 (1996)), I-PpoI (see, Muscarella et al., Mol. Cell. Biol. 10: 3386-3396 (1990)), I-DirI (see, Johansen et al., Cell 76: 725-734 (1994); Johansen, Nucleic Acids Res. 21: 4405 (1993)), I-NjaI (see, Elde et al., Eur. J. Biochem. 259: 281-288 (1999); De Jonckheere et al., J. Eukaryot. Microbiol. 41: 457-463 (1994)), I-NanI (see, Elde et al., Eur. J. Biochem. 259: 281-288 (1999); De Jonckheere et al., J. Eukaryot. Microbiol. 41: 457-463 (1994)), I-NitI (see, De Jonckheere et al., J. Eukaryot. Microbiol. 41: 457-463 (1994); Elde et al., Eur. J. Biochem. 259: 281-288 (1999)), I-TevI (see, Chu et al., Cell 45: 157-166 (1986)), I-TevII (see, Tomaschewski et al., Nucleic Acids Res. 15: 3632-3633 (1987)), I-TevIII (see, Eddy et al., Genes Dev. 5: 1032-1041 (1991)), F-TevI (see, Fujisawa et al., Nucleic Acids Res. 13: 7473-7481 (1985)), F-TevII (see, Kadyrov et al., Dokl. Biochem. 339: 145-147 (1994); Kaliman, Nucleic Acids Res. 18: 4277 (1990)), F-CphI (see, Zeng et al., Curr. Biol. 19: 218-222 (2009)), PI-MgaI (see, Saves et al., Nucleic Acids Res. 29:4310-4318 (2001)), I-CsmI (see, Colleaux et al., Mol. Gen. Genet. 223:288-296 (1990)), I-CeuI (see, Turmel et al., J. Mol. Biol. 218: 293-311 (1991)) and PI-SceI (see, Hirata et al., J. Biol. Chem. 265: 6726-6733 (1990)).

[0136] In some embodiments of the methods described herein, a naturally occurring variant, and/or engineered derivative of a homing endonuclease is used. Methods for modifying the kinetics, cofactor interactions, expression, optimal conditions, and/or recognition site specificity, and screening for activity are known. See, for example, Epinat, et al., (2003) Nucleic Acids Res 31:2952-62; Chevalier, et al., (2002) Mol Cell 10:895-905; Gimble, et al., (2003) J Mol Biol 334:993-1008; Seligman, et al., (2002) Nucleic Acids Res 30:3870-9; Sussman, et al., (2004) J Mol Biol 342:31-41; Rosen, et al., (2006) Nucleic Acids Res 34:4791-800; Chames, et al., (2005) Nucleic Acids Res 33:e178; Smith, et al., (2006) Nucleic Acids Res 34:e149; Gruen, et al., (2002) Nucleic Acids Res 30:e29; Chen and Zhao, (2005) Nucleic Acids Res 33:e154; WO2005105989; WO2003078619; WO2006097854; WO2006097853; WO2006097784; and WO2004031346. Useful homing endonucleases also include those described in WO04/067736; WO04/067753; WO06/097784; WO06/097853; WO06/097854; WO07/034262; WO07/049095; WO07/049156; WO07/057781; WO07/060495; WO08/152524; WO09/001159; WO09/095742; WO09/095793; WO10/001189; WO10/015899; and WO10/046786.

[0137] Any homing endonuclease can be used as a double-strand break inducing agent including, but not limited to: H-DreI, I-SceI, I-SceII, I-SceIII, I-SceIV, I-SceV, I-SceVI, I-SceVII, I-CeuI, I-CeuAIIP, I-CreI, I-CrepsbIP, I-CrepsbIIP, I-CrepsbIIIP, I-CrepsbIVP, I-TliI, I-PpoI, PI-PspI, F-SceI, F-SceII, F-SuvI, F-CphI, F-TevI, F-TevII, I-AmaI, I-AniI, I-ChuI, I-CmoeI, I-CpaI, I-CpaII, I-CsmI, I-CvuI, I-CvuAIP, I-DdiI, I-DdiII, I-DirI, I-DmoI, I-HmuI, I-HmuII, I-HsNIP, I-LlaI, I-MsoI, I-NaaI, I-NanI, I-NclIP, I-NgrIP, I-NitI, I-NjaI, I-Nsp236IP, I-PakI, I-PboIP, I-PcuIP, I-PcuAI, I-PcuVI, I-PgrIP, I-PobiIP, I-PorI, I-PorIIP, I-PbpIP, I-SpBetaIP, I-ScaI, I-SexIP, I-SneIP, I-SpomI, I-SpomCP, I-SpomIP, I-SpomIIP, I-SquIP, I-Ssp6803I, I-SthPhiJP, I-SthPhiST3P, I-SthPhiSTe3bP, I-TdeIP, I-TevI, I-TevII, I-TevIII, I-UarAP, I-UarHGPAIP, I-UarHGPA13P, I-VinIP, I-ZbiIP, PI-MgaI, PI-MtuI, PI-MtuHIP, PI-MtuHIIP, PI-PfuI, PI-PfuII, PI-PkoI, PI-PkoII, PI-Rma43812IP, PI-SpBetaIP, PI-SceI, PI-TfuI, PI-TfuII, PI-ThyI, PI-TliI, or PI-TliII, or any variant or derivative thereof.

[0138] In some embodiments, the endonuclease binds a native or endogenous recognition sequence. In other embodiments, the endonuclease is a modified endonuclease that binds a non-native or exogenous recognition sequence and does not bind a native or endogenous recognition sequence.

[0139] In some embodiments of the methods provided herein, one or more of the nucleases is a TAL-effector DNA binding domain-nuclease fusion protein (TALEN). TAL effectors of plant pathogenic bacteria in the genus Xanthomonas play important roles in disease, or trigger defense, by binding host DNA and activating effector-specific host genes. See, e.g., Gu et al. (2005) Nature 435:1122-5; Yang et al., (2006) Proc. Natl. Acad. Sci. USA 103:10503-8; Kay et al., (2007) Science 318:648-51; Sugio et al., (2007) Proc. Natl. Acad. Sci. USA 104:10720-5; Romer et al., (2007) Science 318:645-8; Boch et al., (2009) Science 326(5959):1509-12; and Moscou and Bogdanove, (2009) Science 326(5959):1501. A TAL effector comprises a DNA binding domain that interacts with DNA in a sequence-specific manner through one or more tandem repeat domains. The repeated sequence typically comprises 34 amino acids, and the repeats are typically 91-100% homologous with each other. Polymorphism of the repeats is usually located at positions 12 and 13, and there appears to be a one-to-one correspondence between the identity of repeat variable-diresidues at positions 12 and 13 with the identity of the contiguous nucleotides in the TAL-effector's target sequence.

[0140] The TAL-effector DNA binding domain may be engineered to bind to a desired target sequence, and fused to a nuclease domain, e.g., from a type II restriction endonuclease, typically a nonspecific cleavage domain from a type II restriction endonuclease such as FokI (see e.g., Kim et al. (1996) Proc. Natl. Acad. Sci. USA 93:1156-1160). Other useful endonucleases may include, for example, HhaI, HindIII, NotI, BbvCI, EcoRI, BglI, and AlwI. Thus, in preferred embodiments, the TALEN comprises a TAL effector domain comprising a plurality of TAL effector repeat sequences that, in combination, bind to a specific nucleotide sequence in the target DNA sequence, such that the TALEN cleaves the target DNA within or adjacent to the specific nucleotide sequence. TALENs useful for the methods provided herein include those described in WO10/079430 and U.S. Patent Application Publication No. 2011/0145940.

[0141] In some embodiments, the TAL effector domain that binds to a specific nucleotide sequence within the target DNA can comprise 10 or more DNA binding repeats, and preferably 15 or more DNA binding repeats. In some embodiments, each DNA binding repeat comprises a repeat variable-diresidue (RVD) that determines recognition of a base pair in the target DNA sequence, wherein each DNA binding repeat is responsible for recognizing one base pair in the target DNA sequence, and wherein the RVD comprises one or more of: HD for recognizing C; NG for recognizing T; NI for recognizing A; NN for recognizing G or A; NS for recognizing A or C or G or T; N* for recognizing C or T, where * represents a gap in the second position of the RVD; HG for recognizing T; H* for recognizing T, where * represents a gap in the second position of the RVD; IG for recognizing T; NK for recognizing G; HA for recognizing C; ND for recognizing C; HI for recognizing C; HN for recognizing G; NA for recognizing G; SN for recognizing G or A; and YG for recognizing T.
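The RVD-to-base correspondence recited above lends itself to a simple lookup table. The following sketch transcribes that correspondence and derives, for a given array of repeats, a pattern describing the DNA sequences the array would be expected to recognize. The function names are hypothetical, and the four-repeat example is far shorter than the 10 or more repeats contemplated above; it is used only to keep the illustration brief.

```python
# RVD -> permitted base(s), transcribed from the correspondence recited above.
# "*" denotes a gap in the second position of the RVD.
RVD_TO_BASES = {
    "HD": "C", "NG": "T", "NI": "A", "NN": "GA", "NS": "ACGT", "N*": "CT",
    "HG": "T", "H*": "T", "IG": "T", "NK": "G", "HA": "C", "ND": "C",
    "HI": "C", "HN": "G", "NA": "G", "SN": "GA", "YG": "T",
}


def rvds_to_pattern(rvds):
    """Build a bracketed pattern with one position per DNA binding repeat."""
    parts = []
    for rvd in rvds:
        bases = RVD_TO_BASES[rvd]  # raises KeyError for an unlisted RVD
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def recognizes(rvds, site):
    """True if each base of site is permitted by the repeat at that position."""
    return len(site) == len(rvds) and all(
        base in RVD_TO_BASES[rvd] for rvd, base in zip(rvds, site.upper())
    )


if __name__ == "__main__":
    repeats = ["HD", "NG", "NI", "NN"]
    print(rvds_to_pattern(repeats))
    print(recognizes(repeats, "CTAG"), recognizes(repeats, "CTAC"))
```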

[0142] In some embodiments of the methods provided herein, one or more of the nucleases is a site-specific recombinase. A site-specific recombinase, also referred to as a recombinase, is a polypeptide that catalyzes conservative site-specific recombination between its compatible recombination sites, and includes native polypeptides as well as derivatives, variants and/or fragments that retain activity, and native polynucleotides, derivatives, variants, and/or fragments that encode a recombinase that retains activity. For reviews of site-specific recombinases and their recognition sites, see, Sauer (1994) Curr Op Biotechnol 5:521-7; and Sadowski, (1993) FASEB 7:760-7. In some embodiments, the recombinase is a serine recombinase or a tyrosine recombinase. In some embodiments, the recombinase is from the Integrase or Resolvase families. In some embodiments, the recombinase is an integrase selected from the group consisting of FLP, Cre, lambda integrase, and R. For other members of the Integrase family, see for example, Esposito, et al., (1997) Nucleic Acids Res 25:3605-14 and Abremski, et al., (1992) Protein Eng 5:87-91. 
Methods for modifying the kinetics, cofactor interaction and requirements, expression, optimal conditions, and/or recognition site specificity, and screening for activity of recombinases and variants are known; see, for example, Miller, et al., (1980) Cell 20:721-9; Lange-Gustafson and Nash, (1984) J Biol Chem 259:12724-32; Christ, et al., (1998) J Mol Biol 288:825-36; Lorbach, et al., (2000) J Mol Biol 296:1175-81; Vergunst, et al., (2000) Science 290:979-82; Dorgai, et al., (1995) J Mol Biol 252:178-88; Dorgai, et al., (1998) J Mol Biol 277:1059-70; Yagil, et al., (1995) J Mol Biol 252:163-7; Sclimenti, et al., (2001) Nucleic Acids Res 29:5044-51; Santoro and Schultz, (2002) Proc Natl Acad Sci USA 99:4185-90; Buchholz and Stewart, (2001) Nat Biotechnol 19:1047-52; Voziyanov, et al., (2002) Nucleic Acids Res 30:1656-63; Voziyanov, et al., (2003) J Mol Biol 326:65-76; Klippel, et al., (1988) EMBO J 7:3983-9; Arnold, et al., (1999) EMBO J 18:1407-14; WO03/08045; WO99/25840; and WO99/25841. The recognition sites range from about 30 nucleotide minimal sites to a few hundred nucleotides. Any recognition site for a recombinase can be used, including naturally occurring sites, and variants. Variant recognition sites are known; see, for example, Hoess, et al., (1986) Nucleic Acids Res 14:2287-300; Albert, et al., (1995) Plant J 7:649-59; Thomson, et al., (2003) Genesis 36:162-7; Huang, et al., (1991) Nucleic Acids Res 19:443-8; Siebler and Bode, (1997) Biochemistry 36:1740-7; Schlake and Bode, (1994) Biochemistry 33:12746-51; Thyagarajan, et al., (2001) Mol Cell Biol 21:3926-34; Umlauf and Cox, (1988) EMBO J 7:1845-52; Lee and Saito, (1998) Gene 216:55-65; WO01/23545; WO99/25821; WO99/25851; WO01/11058; WO01/07572 and U.S. Pat. No. 5,888,732.

[0143] In some embodiments of the methods provided herein, one or more of the nucleases is a transposase. Transposases are polypeptides that mediate transposition of a transposon from one location in the genome to another. Transposases typically induce double strand breaks to excise the transposon, recognize subterminal repeats, and bring together the ends of the excised transposon; in some systems, other proteins are also required to bring together the ends during transposition. Examples of transposons and transposases include, but are not limited to, the Ac/Ds, Dt/rdt, Mu-MI/Mn, and Spm(En)/dSpm elements from maize, the Tam elements from snapdragon, the Mu transposon from bacteriophage, bacterial transposons (Tn) and insertion sequences (IS), Ty elements of yeast (retrotransposon), Ta1 elements from Arabidopsis (retrotransposon), the P element transposon from Drosophila (Gloor, et al., (1991) Science 253:1110-1117), the Copia, Mariner and Minos elements from Drosophila, the Hermes elements from the housefly, the PiggyBack elements from Trichoplusia ni, Tc1 elements from C. elegans, and IAP elements from mice (retrotransposon).

[0144] In some embodiments of the methods provided herein, one or more of the nucleases is a zinc-finger nuclease (ZFN). ZFNs are engineered double-strand break inducing agents comprised of a zinc finger DNA binding domain and a double strand break inducing agent domain. Engineered ZFNs consist of two zinc finger arrays (ZFAs), each of which is fused to a single subunit of a non-specific endonuclease, such as the nuclease domain from the FokI enzyme, which becomes active upon dimerization. Typically, a single ZFA consists of 3 or 4 zinc finger domains, each of which is designed to recognize a specific nucleotide triplet (GGC, GAT, etc.). Thus, ZFNs composed of two "3-finger" ZFAs are capable of recognizing an 18 base pair target site; an 18 base pair recognition sequence is generally unique, even within large genomes such as those of humans and plants. By directing the co-localization and dimerization of two FokI nuclease monomers, ZFNs generate a functional site-specific endonuclease that creates a double-stranded break (DSB) in DNA at the targeted locus.
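The arithmetic behind the 18 base pair figure, and the reason such a site is generally unique, can be sketched as follows. The estimate assumes a random genome with equal and independent base frequencies, counts both strands, and ignores any spacer between the two half-sites; the genome size used in the example (about 3.2 billion base pairs for a human haploid genome) is an approximate outside figure, not a value from this disclosure. Function names are hypothetical.

```python
def zfn_site_length(fingers_per_array: int, arrays: int = 2, bp_per_finger: int = 3) -> int:
    """Total base pairs recognized by a ZFN pair, ignoring any spacer."""
    return fingers_per_array * bp_per_finger * arrays


def expected_random_occurrences(site_length: int, genome_bp: int) -> float:
    """Expected chance matches to one specific site in a random genome.

    Simplifying assumptions: uniform, independent bases; both strands
    searched; edge effects at chromosome ends ignored.
    """
    return 2 * genome_bp / 4 ** site_length


if __name__ == "__main__":
    length = zfn_site_length(fingers_per_array=3)
    print(length, 4 ** length)
    # Approximate human haploid genome size (assumed for illustration).
    print(expected_random_occurrences(length, 3_200_000_000))
```

Because the number of possible 18 base pair sequences far exceeds the number of positions in such a genome, the expected number of chance matches is well below one; real genomes are not random, so this is only a rough guide.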

[0145] Useful zinc-finger nucleases include those that are known and those that are engineered to have specificity for one or more target sites (TS) described herein. Zinc finger domains are amenable for designing polypeptides which specifically bind a selected polynucleotide recognition sequence, for example, within the target site of the host cell genome. ZFNs consist of an engineered DNA-binding zinc finger domain linked to a non-specific endonuclease domain, for example nuclease domain from a Type IIs endonuclease such as HO or FokI. Alternatively, engineered zinc finger DNA binding domains can be fused to other double-strand break inducing agents or derivatives thereof that retain DNA nicking/cleaving activity. For example, this type of fusion can be used to direct the double-strand break inducing agent to a different target site, to alter the location of the nick or cleavage site, to direct the inducing agent to a shorter target site, or to direct the inducing agent to a longer target site. In some examples a zinc finger DNA binding domain is fused to a site-specific recombinase, transposase, or a derivative thereof that retains DNA nicking and/or cleaving activity. Additional functionalities can be fused to the zinc-finger binding domain, including transcriptional activator domains, transcription repressor domains, and methylases. In some embodiments, dimerization of nuclease domain is required for cleavage activity.

[0146] Each zinc finger recognizes three consecutive base pairs in the target DNA. For example, a 3-finger domain recognizes a sequence of 9 contiguous nucleotides; because the nuclease must dimerize, two sets of zinc finger triplets are used to bind an 18 nucleotide recognition sequence. Useful designer zinc finger modules include those that recognize various GNN and ANN triplets (Dreier, et al., (2001) J Biol Chem 276:29466-78; Dreier, et al., (2000) J Mol Biol 303:489-502; Liu, et al., (2002) J Biol Chem 277:3850-6), as well as those that recognize various CNN or TNN triplets (Dreier, et al., (2005) J Biol Chem 280:35588-97; Jamieson, et al., (2003) Nature Rev Drug Discov 2:361-8). See also, Durai, et al., (2005) Nucleic Acids Res 33:5978-90; Segal, (2002) Methods 26:76-83; Porteus and Carroll, (2005) Nat Biotechnol 23:967-73; Pabo, et al., (2001) Ann Rev Biochem 70:313-40; Wolfe, et al., (2000) Ann Rev Biophys Biomol Struct 29:183-212; Segal and Barbas, (2001) Curr Opin Biotechnol 12:632-7; Segal, et al., (2003) Biochemistry 42:2137-48; Beerli and Barbas, (2002) Nat Biotechnol 20:135-41; Carroll, et al., (2006) Nature Protocols 1:1329; Ordiz, et al., (2002) Proc Natl Acad Sci USA 99:13290-5; Guan, et al., (2002) Proc Natl Acad Sci USA 99:13296-301; WO2002099084; WO00/42219; WO02/42459; WO2003062455; US20030059767; US Patent Application Publication Number 2003/0108880; U.S. Pat. Nos. 6,140,466, 6,511,808 and 6,453,242. Useful zinc-finger nucleases also include those described in WO03/080809; WO05/014791; WO05/084190; WO08/021207; WO09/042186; WO09/054985; and WO10/065123.

6.5 Genomic Target Sites

[0147] In the methods provided herein, a nuclease is introduced to the host cell that is capable of causing a double-strand break near or within a genomic target site, which greatly increases the frequency of homologous recombination at or near the cleavage site. In preferred embodiments, the recognition sequence for the nuclease is present in the host cell genome only at the target site, thereby minimizing any off-target genomic binding and cleavage by the nuclease.

[0148] In some embodiments, the genomic target site is endogenous to the host cell, such as a native locus. In some embodiments, the native genomic target site is selected according to the type of nuclease to be utilized in the methods of integration provided herein. If the nuclease to be utilized is a zinc finger nuclease, optimal target sites may be selected using a number of publicly available online resources. See, e.g., Reyon et al., BMC Genomics 12:83 (2011), which is hereby incorporated by reference in its entirety. For example, Oligomerized Pool Engineering (OPEN) is a highly robust and publicly available protocol for engineering zinc finger arrays with high specificity and in vivo functionality, and has been successfully used to generate ZFNs that function efficiently in plants, zebrafish, and human somatic and pluripotent stem cells. OPEN is a selection-based method in which a pre-constructed randomized pool of candidate ZFAs is screened to identify those with high affinity and specificity for a desired target sequence. ZFNGenome is a GBrowse-based tool for identifying and visualizing potential target sites for OPEN-generated ZFNs. ZFNGenome provides a compendium of potential ZFN target sites in sequenced and annotated genomes of model organisms. ZFNGenome currently includes a total of more than 11.6 million potential ZFN target sites, mapped within the fully sequenced genomes of seven model organisms: S. cerevisiae, C. reinhardtii, A. thaliana, D. melanogaster, D. rerio, C. elegans, and H. sapiens. Additional model organisms, including three plant species, Glycine max (soybean), Oryza sativa (rice), and Zea mays (maize), and three animal species, Tribolium castaneum (red flour beetle), Mus musculus (mouse), and Rattus norvegicus (brown rat), will be added in the near future. ZFNGenome provides information about each potential ZFN target site, including its chromosomal location and position relative to transcription initiation site(s). Users can query ZFNGenome using several different criteria (e.g., gene ID, transcript ID, target site sequence).

[0149] If the nuclease to be utilized is a TAL-effector nuclease, in some embodiments, optimal target sites may be selected in accordance with the methods described by Sanjana et al., Nature Protocols, 7:171-192 (2012), which is hereby incorporated by reference in its entirety. In brief, TALENs function as dimers, and a pair of TALENs, referred to as the left and right TALENs, target sequences on opposite strands of DNA. TALENs are engineered as a fusion of the TALE DNA-binding domain and a monomeric FokI catalytic domain. To facilitate FokI dimerization, the left and right TALEN target sites are chosen with a spacing of approximately 14-20 bases. Therefore, for a pair of TALENs, each targeting 20-bp sequences, an optimal target site should have the form 5'-TN.sup.19N.sup.14-20N.sup.19A-3', where the left TALEN targets 5'-TN.sup.19-3' and the right TALEN targets the antisense strand of 5'-N.sup.19A-3' (N=A, G, T or C).
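The target-site form described above (a 5' T, 19 further bases, a 14-20 base spacer, 19 bases, and a 3' A) lends itself to a simple sequence scan. The following Python sketch is a minimal illustration and is not the procedure of Sanjana et al.; the names `TalenSite`, `find_talen_sites`, and `reverse_complement` are hypothetical, and the input is assumed to contain only the letters A, C, G, and T.

```python
from typing import Iterator, NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


class TalenSite(NamedTuple):
    start: int         # 0-based index of the 5' T on the sense strand
    left_target: str   # 5'-TN19-3' on the sense strand
    spacer: str        # 14-20 bases between the two TALEN binding sites
    right_target: str  # right TALEN target, written 5'->3' on the antisense strand


def find_talen_sites(seq: str, arm: int = 20,
                     min_spacer: int = 14, max_spacer: int = 20) -> Iterator[TalenSite]:
    """Yield every window of the form 5'-T N19 N(14-20) N19 A-3' in seq."""
    seq = seq.upper()
    for i, base in enumerate(seq):
        if base != "T":
            continue
        for spacer_len in range(min_spacer, max_spacer + 1):
            end = i + arm + spacer_len + arm
            if end > len(seq):
                break  # longer spacers cannot fit either
            if seq[end - 1] != "A":
                continue
            left = seq[i:i + arm]
            spacer = seq[i + arm:i + arm + spacer_len]
            right_sense = seq[i + arm + spacer_len:end]
            yield TalenSite(i, left, spacer, reverse_complement(right_sense))


if __name__ == "__main__":
    toy = "T" + "C" * 19 + "G" * 14 + "C" * 19 + "A"  # made-up sequence
    for site in find_talen_sites(toy):
        print(site)
```

Because the right TALEN binds the antisense strand, its target is reported as the reverse complement of the sense-strand 5'-N19A-3' segment, so both reported targets begin with T.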

[0150] In other embodiments of the methods provided herein, the genomic target site is exogenous to the host cell. For example, one or more genomic target sites can be engineered into the host cell genome using traditional methods, e.g., gene targeting, prior to performing the methods of integration described herein. In some embodiments, multiple copies of the same target sequence are engineered into the host cell genome at different loci, thereby facilitating simultaneous multiple integration events with the use of only a single nuclease that specifically recognizes the target sequence. In other embodiments, a plurality of different target sequences is engineered into the host cell genome at different loci. In some embodiments, the engineered target site comprises a target sequence that is not otherwise represented in the native genome of the host cell. For example, homing endonucleases target large recognition sites (12-40 bp) that are usually embedded in introns or inteins, and as such, their recognition sites are extremely rare, with none or only a few of these sites present in a mammalian-sized genome. Thus, in some embodiments, the exogenous genomic target site is a recognition sequence for a homing endonuclease. In some embodiments, the homing endonuclease is selected from the group consisting of: H-DreI, I-SceI, I-SceII, I-SceIII, I-SceIV, I-SceV, I-SceVI, I-SceVII, I-CeuI, I-CeuAIIP, I-CreI, I-CrepsbIP, I-CrepsbIIP, I-CrepsbIIIP, I-CrepsbIVP, I-TliI, I-PpoI, Pi-PspI, F-SceI, F-SceII, F-SuvI, F-CphI, F-TevI, F-TevII, I-AmaI, I-AniI, I-ChuI, I-CmoeI, I-CpaI, I-CpaII, I-CsmI, I-CvuI, I-CvuAIP, I-DdiI, I-DdiII, I-DirI, I-DmoI, I-HmuI, I-HmuII, I-HsNIP, I-LlaI, I-MsoI, I-NaaI, I-NanI, I-NclIP, I-NgrIP, I-NitI, I-NjaI, I-Nsp236IP, I-PakI, I-PboIP, I-PcuIP, I-PcuAI, I-PcuVI, I-PgrIP, I-PobIP, I-PorI, I-PorIIP, I-PbpIP, I-SpBetaIP, I-ScaI, I-SexIP, I-SneIP, I-SpomI, I-SpomCP, I-SpomIP, I-SpomIIP, I-SquIP, I-Ssp6803I, I-SthPhiJP, I-SthPhiST3P, I-SthPhiSTe3bP, I-TdeIP, I-TevI, I-TevII, I-TevIII, I-UarAP, I-UarHGPAIP, I-UarHGPA13P, I-VinIP, I-ZbiIP, PI-MgaI, PI-MtuI, PI-MtuHIP, PI-MtuHIIP, PI-PfuI, PI-PfuII, PI-PkoI, PI-PkoII, PI-Rma43812IP, PI-SpBetaIP, PI-SceI, PI-TfuI, PI-TfuII, PI-ThyI, PI-TliI, or PI-TliII, or any variant or derivative thereof. In particular embodiments, the exogenous genomic target site is the recognition sequence for I-SceI, VDE (PI-SceI), F-CphI, PI-MgaI or PI-MtuII, each of which is provided below.

TABLE 1. Recognition and cleavage sites for select homing endonucleases.

Nuclease        Recognition sequence
I-SceI          TAGGGATAACAGGGTAAT (SEQ ID NO: 52)
VDE (PI-SceI)   TATGTCGGGTGCGGAGAAAGAGGTAATGAAA (SEQ ID NO: 53)
F-CphI          GATGCACGAGCGCAACGCTCACAA (SEQ ID NO: 54)
PI-MgaI         GCGTAGCTGCCCAGTATGAGTCAG (SEQ ID NO: 55)
PI-MtuII        ACGTGCACTACGTAGAGGGTCGCACCGCACCGATCTACAA (SEQ ID NO: 56)
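Whether a recognition sequence such as those in Table 1 is present only at the engineered target site can be checked by scanning both strands of the host genome sequence. The following Python sketch is illustrative only: `find_sites` and `reverse_complement` are hypothetical helper names, only exact matches are considered (a real nuclease may tolerate some mismatches), and the toy genome is made up for the example.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

I_SCEI_SITE = "TAGGGATAACAGGGTAAT"  # SEQ ID NO: 52, from Table 1


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(genome: str, site: str) -> List[Tuple[int, str]]:
    """Return (0-based position, strand) for every exact occurrence of site.

    Overlapping occurrences are reported. A palindromic site is searched
    only once so that it is not counted on both strands.
    """
    genome, site = genome.upper(), site.upper()
    queries = [(site, "+")]
    rc = reverse_complement(site)
    if rc != site:
        queries.append((rc, "-"))
    hits = []
    for query, strand in queries:
        pos = genome.find(query)
        while pos != -1:
            hits.append((pos, strand))
            pos = genome.find(query, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Made-up toy genome carrying one engineered I-SceI site.
    toy_genome = "ACGT" * 5 + I_SCEI_SITE + "GGCC" * 3
    hits = find_sites(toy_genome, I_SCEI_SITE)
    print(hits, "unique" if len(hits) == 1 else "not unique")
```

A single hit at the intended locus is the situation described as preferred above; any additional hit would indicate a potential off-target cleavage site under the exact-match assumption.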

6.6 Delivery

[0151] In some embodiments, the one or more nucleases useful for the methods described herein are provided, e.g., delivered into the host cell, as a purified protein. In other embodiments, the one or more nucleases are provided via polynucleotide(s) comprising a nucleic acid encoding the nuclease. In other embodiments, the one or more nucleases are introduced into the host cell as purified RNA which can be directly translated in the host cell.

[0152] In certain embodiments, an integration polynucleotide, a polynucleotide encoding a nuclease, or a purified nuclease protein as described above, or any combination thereof, may be introduced into a host cell using any conventional technique known in the art for introducing exogenous protein and/or nucleic acids into a cell. Such methods include, but are not limited to, direct uptake of the molecule by a cell from solution, or facilitated uptake through lipofection using, e.g., liposomes or immunoliposomes; particle-mediated transfection; etc. See, e.g., U.S. Pat. No. 5,272,065; Goeddel et al., eds, 1990, Methods in Enzymology, vol. 185, Academic Press, Inc., CA; Krieger, 1990, Gene Transfer and Expression--A Laboratory Manual, Stockton Press, NY; Sambrook et al., 1989, Molecular Cloning--A Laboratory Manual, Cold Spring Harbor Laboratory, NY; and Ausubel et al., eds., Current Edition, Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley Interscience, NY. Particular methods for transforming cells are well known in the art. See Hinnen et al., Proc. Natl. Acad. Sci. USA 75:1292-3 (1978); Cregg et al., Mol. Cell. Biol. 5:3376-3385 (1985). Exemplary techniques include, but are not limited to, spheroplasting, electroporation, PEG 1000 mediated transformation, and lithium acetate or lithium chloride mediated transformation.

[0153] In some embodiments, biolistics are utilized to introduce an integration polynucleotide, a polynucleotide encoding a nuclease, a purified nuclease protein, or any combination thereof into the host cell, in particular, host cells that are otherwise difficult to transform/transfect using conventional techniques, such as plants. Biolistics work by binding the transformation reaction to microscopic gold particles, and then propelling the particles using compressed gas at the target cells.

[0154] In some embodiments, the polynucleotide comprising nucleic acid encoding the nuclease is an expression vector that allows for the expression of a nuclease within a host cell. Suitable expression vectors include but are not limited to those known for use in expressing genes in Escherichia coli, yeast, or mammalian cells. Examples of Escherichia coli expression vectors include but are not limited to pSCM525, pDIC73, pSCM351, and pSCM353. Examples of yeast expression vectors include but are not limited to pPEX7 and pPEX408. Other examples of suitable expression vectors include the yeast-Escherichia coli pRS series of shuttle vectors comprising CEN.ARS sequences and yeast selectable markers; and 2.mu. plasmids. In some embodiments, a polynucleotide encoding a nuclease can be modified to substitute codons having a higher frequency of usage in the host cell, as compared to the naturally occurring polynucleotide sequence. For example the polynucleotide encoding the nuclease can be modified to substitute codons having a higher frequency of usage in S. cerevisiae, as compared to the naturally occurring polynucleotide sequence.
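The codon substitution described above can be sketched as a synonymous recoding of the nuclease coding sequence. The following Python example is a minimal illustration under stated assumptions: the function names are hypothetical, the standard genetic code is used, and the `PREFERRED` mapping is a small placeholder chosen for the example. An actual mapping would be derived from measured codon-usage frequencies for the host cell, e.g., S. cerevisiae.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}

# Placeholder preferred codons (assumption for illustration only).
PREFERRED = {"L": "TTG", "R": "AGA", "G": "GGT", "A": "GCT", "K": "AAG", "E": "GAA"}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def substitute_codons(cds: str, preferred: dict) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        # Amino acids without an entry in `preferred` keep their original codon.
        recoded.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(recoded)


if __name__ == "__main__":
    original = "ATGCTCCGGGGGGCCAAAGAGTGA"  # made-up coding sequence
    recoded = substitute_codons(original, PREFERRED)
    assert translate(recoded) == translate(original)  # protein is unchanged
    print(recoded)
```

The check that the translated protein is unchanged reflects the point of the substitution: only codon choice is altered, not the encoded nuclease.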

[0155] In some embodiments where the nuclease functions as a heterodimer requiring the separate expression of each monomer, as is the case for zinc finger nucleases and TAL-effector nucleases, each monomer of the heterodimer may be expressed from the same expression plasmid, or from different plasmids. In embodiments where multiple nucleases are introduced to the cell to effect double-strand breaks at different target sites, the nucleases may be encoded on a single plasmid or on separate plasmids.

[0156] In certain embodiments, the nuclease expression vector further comprises a selectable marker that allows for selection of host cells comprising the expression vector. Such selection can be helpful to retain the vector in the host cell for a period of time necessary for expression of sufficient amounts of nuclease to occur, for example, for a period of 12, 24, 36, 48, 60, 72, 84, 96, or more than 96 hours, after which the host cells may be grown under conditions under which the expression vector is no longer retained. In certain embodiments, the selectable marker is selected from the group consisting of: URA3, hygromycin B phosphotransferase, aminoglycoside phosphotransferase, zeocin resistance, and phosphinothricin N-acetyltransferase. In some embodiments, the nuclease expression vector may comprise a counter-selectable marker that allows for selection of host cells that do not contain the expression vector subsequent to integration of the one or more donor nucleic acid molecules. The nuclease expression vector used may also be a transient vector that has no selection marker, or is one that is not selected for. In particular embodiments, the progeny of a host cell comprising a transient nuclease expression vector loses the vector over time.

[0157] In certain embodiments, the expression vector further comprises a transcription termination sequence and a promoter operatively linked to the nucleotide sequence encoding the nuclease. In some embodiments, the promoter is a constitutive promoter. In some embodiments, the promoter is an inducible promoter. Illustrative examples of promoters suitable for use in yeast cells include, but are not limited to the promoter of the TEF1 gene of K. lactis, the promoter of the PGK1 gene of Saccharomyces cerevisiae, the promoter of the TDH3 gene of Saccharomyces cerevisiae, repressible promoters, e.g., the promoter of the CTR3 gene of Saccharomyces cerevisiae, and inducible promoters, e.g., galactose inducible promoters of Saccharomyces cerevisiae (e.g., promoters of the GAL1, GAL7, and GAL10 genes).

[0158] In some embodiments, an additional nucleotide sequence comprising a nuclear localization sequence (NLS) is linked to the 5' end of the nucleotide sequence encoding the nuclease. The NLS can facilitate nuclear localization of larger nucleases (>25 kD). In some embodiments, the nuclear localization sequence is an SV40 nuclear localization sequence. In some embodiments, the nuclear localization sequence is a yeast nuclear localization sequence.

[0159] A nuclease expression vector can be made by any technique apparent to one skilled in the art. In certain embodiments, the vector is made using polymerase chain reaction (PCR) and molecular cloning techniques well known in the art. See, e.g., PCR Technology: Principles and Applications for DNA Amplification, ed. HA Erlich, Stockton Press, New York, N.Y. (1989); Sambrook et al., 2001, Molecular Cloning--A Laboratory Manual, 3.sup.rd edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

6.7 Host Cells

[0160] In another aspect, provided herein is a modified host cell generated by any of the methods of genomically integrating one or more exogenous nucleic acids described herein. Suitable host cells include any cell in which integration of a nucleic acid or "donor DNA" of interest into a chromosomal or episomal locus is desired. In some embodiments, the cell is a cell of an organism having the ability to perform homologous recombination. Although several of the illustrative embodiments are demonstrated in yeast (S. cerevisiae), it is believed that the methods of genomic modification provided herein can be practiced on all biological organisms having a functional recombination system, even where the recombination system is not as proficient as in yeast. Other cells or cell types that have a functional homologous recombination system include bacteria such as Bacillus subtilis and E. coli (which is RecE RecT recombination proficient; Muyrers et al., EMBO rep. 1: 239-243, 2000); protozoa (e.g., Plasmodium, Toxoplasma); other yeast (e.g., Schizosaccharomyces pombe); filamentous fungi (e.g., Ashbya gossypii); plants, for instance the moss Physcomitrella patens (Schaefer and Zryd, Plant J. 11: 1195-1206, 1997); and animal cells, such as mammalian cells and chicken DT40 cells (Dieken et al., Nat. Genet. 12:174-182, 1996).

[0161] In some embodiments, the host cell is a prokaryotic cell. In some embodiments, the host cell is a eukaryotic cell. In some embodiments, the cell is a fungal cell (for instance, a yeast cell), a bacterial cell, a plant cell, or an animal cell (for instance, a chicken cell). In some embodiments, the host cell is a mammalian cell. In some embodiments, the host cell is a Chinese hamster ovary (CHO) cell, a COS-7 cell, a mouse fibroblast cell, a mouse embryonic carcinoma cell, or a mouse embryonic stem cell. In some embodiments, the host cell is an insect cell. In some embodiments, the host cell is a S2 cell, a Schneider cell, a S12 cell, a 5B1-4 cell, a Tn5 cell, or a Sf9 cell. In some embodiments, the host cell is a unicellular eukaryotic organism cell.

[0162] In particular embodiments, the host cell is a yeast cell. Useful yeast host cells include yeast cells that have been deposited with microorganism depositories (e.g. IFO, ATCC, etc.) and belong to the genera Aciculoconidium, Ambrosiozyma, Arthroascus, Arxiozyma, Ashbya, Babjevia, Bensingtonia, Botryoascus, Botryozyma, Brettanomyces, Bullera, Bulleromyces, Candida, Citeromyces, Clavispora, Cryptococcus, Cystofilobasidium, Debaryomyces, Dekkera, Dipodascopsis, Dipodascus, Eeniella, Endomycopsella, Eremascus, Eremothecium, Erythrobasidium, Fellomyces, Filobasidium, Galactomyces, Geotrichum, Guilliermondella, Hanseniaspora, Hansenula, Hasegawaea, Holtermannia, Hormoascus, Hyphopichia, Issatchenkia, Kloeckera, Kloeckeraspora, Kluyveromyces, Kondoa, Kuraishia, Kurtzmanomyces, Leucosporidium, Lipomyces, Lodderomyces, Malassezia, Metschnikowia, Mrakia, Myxozyma, Nadsonia, Nakazawaea, Nematospora, Ogataea, Oosporidium, Pachysolen, Pachytichospora, Phaffia, Pichia, Rhodosporidium, Rhodotorula, Saccharomyces, Saccharomycodes, Saccharomycopsis, Saitoella, Sakaguchia, Saturnospora, Schizoblastosporion, Schizosaccharomyces, Schwanniomyces, Sporidiobolus, Sporobolomyces, Sporopachydermia, Stephanoascus, Sterigmatomyces, Sterigmatosporidium, Symbiotaphrina, Sympodiomyces, Sympodiomycopsis, Torulaspora, Trichosporiella, Trichosporon, Trigonopsis, Tsuchiyaea, Udeniomyces, Waltomyces, Wickerhamia, Wickerhamiella, Williopsis, Yamadazyma, Yarrowia, Zygoascus, Zygosaccharomyces, Zygowilliopsis, and Zygozyma, among others.

[0163] In some embodiments, the yeast host cell is a Saccharomyces cerevisiae cell, a Pichia pastoris cell, a Schizosaccharomyces pombe cell, a Dekkera bruxellensis cell, a Kluyveromyces lactis cell, an Arxula adeninivorans cell, or a Hansenula polymorpha (now known as Pichia angusta) cell. In a particular embodiment, the yeast host cell is a Saccharomyces cerevisiae cell. In some embodiments, the yeast host cell is a Saccharomyces fragilis cell or a Kluyveromyces lactis (previously called Saccharomyces lactis) cell. In some embodiments, the yeast host cell is a cell belonging to the genus Candida, such as Candida lipolytica, Candida guilliermondii, Candida krusei, Candida pseudotropicalis, or Candida utilis. In another particular embodiment, the yeast host cell is a Kluyveromyces marxianus cell.

[0164] In particular embodiments, the yeast host cell is a Saccharomyces cerevisiae cell selected from the group consisting of a Baker's yeast cell, a CBS 7959 cell, a CBS 7960 cell, a CBS 7961 cell, a CBS 7962 cell, a CBS 7963 cell, a CBS 7964 cell, a IZ-1904 cell, a TA cell, a BG-1 cell, a CR-1 cell, a SA-1 cell, a M-26 cell, a Y-904 cell, a PE-2 cell, a PE-5 cell, a VR-1 cell, a BR-1 cell, a BR-2 cell, a ME-2 cell, a VR-2 cell, a MA-3 cell, a MA-4 cell, a CAT-1 cell, a CB-1 cell, a NR-1 cell, a BT-1 cell, and a AL-1 cell. In some embodiments, the host cell is a Saccharomyces cerevisiae cell selected from the group consisting of a PE-2 cell, a CAT-1 cell, a VR-1 cell, a BG-1 cell, a CR-1 cell, and a SA-1 cell. In a particular embodiment, the Saccharomyces cerevisiae host cell is a PE-2 cell. In another particular embodiment, the Saccharomyces cerevisiae host cell is a CAT-1 cell. In another particular embodiment, the Saccharomyces cerevisiae host cell is a BG-1 cell.

[0165] In some embodiments, the yeast host cell is a cell that is suitable for industrial fermentation, e.g., bioethanol fermentation. In particular embodiments, the cell is conditioned to subsist under high solvent concentration, high temperature, expanded substrate utilization, nutrient limitation, osmotic stress, acidity, sulfite and bacterial contamination, or combinations thereof, which are recognized stress conditions of the industrial fermentation environment.

6.8 Kits

[0166] In another aspect, provided herein is a kit useful for performing the methods for genomically integrating one or more exogenous nucleic acids described herein. In some embodiments, the kit comprises:

[0167] (a) a plurality of exogenous nucleic acids, wherein each exogenous nucleic acid (ES).sub.x comprises:

[0168] (i) a first homology region (HR1).sub.x and a second homology region (HR2).sub.x, wherein (HR1).sub.x and (HR2).sub.x are capable of initiating host cell mediated homologous recombination of (ES).sub.x at a selected target site (TS).sub.x of a host cell genome; and

[0169] (ii) a nucleic acid of interest (D).sub.x positioned 3' of (HR1).sub.x and 5' of (HR2).sub.x;

[0170] (b) a plurality of nucleases, wherein each nuclease (N).sub.x is capable of cleaving at (TS).sub.x, whereupon said cleaving results in homologous recombination of (ES).sub.x at (TS).sub.x;

[0171] wherein x is any integer from 1 to n wherein n is at least 2.

[0172] In some embodiments, (D).sub.x is selected from the group consisting of a selectable marker, a promoter, a nucleic acid sequence encoding an epitope tag, a gene of interest, a reporter gene, and a nucleic acid sequence encoding a termination codon. In some embodiments, the kit further comprises a plurality of primer pairs (P).sub.x, wherein each primer pair is capable of identifying integration of (ES).sub.x at (TS).sub.x by PCR. In some embodiments, (ES).sub.x is linear. In some embodiments, (ES).sub.x is circular.

[0173] In a particular embodiment, the kit enables site-specific integration of an exogenous nucleic acid at a unique target site within any of the approximately 6000 genetic loci of the yeast genome. In these embodiments, n.gtoreq.6000, wherein each (TS).sub.x is unique to a single locus of the yeast cell genome.

[0174] In some embodiments, the kit further comprises instructions for use that describe methods for integrating one or more exogenous nucleic acids into any genetic locus of a host yeast cell.

7. EXAMPLES

7.1 Example 1

Simultaneous Multiple Integration of a Plurality of Exogenous Nucleic Acids

[0175] The methods and compositions described herein are implemented to create a modified yeast cell comprising two exogenous nucleic acids integrated at two loci of the yeast cell genome in a single transformation step, wherein recovery of the modified yeast cell does not require the use of selectable marker(s).

[0176] A host strain is provided comprising: (a) a previously introduced recognition site for the F-CphI endonuclease positioned within the NDT80 locus; and (b) a previously introduced recognition site for the I-SceI endonuclease positioned within the HO locus. The host cell is simultaneously transformed with: (1) an expression plasmid encoding F-CphI; (2) an expression plasmid encoding I-SceI; (3) a linear DNA comprising an expression cassette encoding green fluorescent protein (GFP), flanked by two stretches of >500 bp sequence corresponding to the 5' and 3' regions of the NDT80 locus; and (4) a linear DNA comprising an expression cassette encoding lacZ, flanked by two stretches of >500 bp sequence corresponding to the 5' and 3' regions of the HO locus. As an alternative to inclusion of the expression plasmids encoding F-CphI and I-SceI, respectively, purified F-CphI and I-SceI protein are included in the transformation reaction. A non-double strand break control is performed by transforming host cells with the linear integration constructs (3) and (4) only, in the absence of F-CphI and I-SceI expression plasmid or purified protein.

[0177] Experimental and control transformants are plated on selection-free media, and colonies from each plate are visualized for expression of GFP and lacZ, respectively. Colony PCR is independently performed with a primer pair which anneals upstream and downstream of the junction between the integrated integration construct (3) or (4), respectively, and their respective target sequences, to confirm fidelity and frequency of integration.

7.2 Example 2

Simultaneous Multiple Integration of a Plurality of Exogenous Nucleic Acids

[0178] This Example provides results which demonstrate simultaneous integration of three exogenous nucleic acids at three different loci of a S. cerevisiae host following the induction of targeted double-stranded breaks in the host cell genome. In brief, an exogenous "target" nucleic acid sequence encoding a truncated, non-functional copy of Emerald Green Fluorescent Protein (emgfp.DELTA.) was integrated into the HO, YGR250c and NDT80 loci, respectively, of host yeast cells. Recombinant cells were transformed with linear "donor" DNA encoding an intact, functional copy of Emerald Green Fluorescent Protein (EmGFP) and either: (1) empty vector; or (2) an expression vector, pZFN.gfp, encoding a zinc-finger nuclease (ZFN.gfp) that specifically recognizes and cleaves a sequence within the emgfp.DELTA. coding sequence. Transformed colonies were screened by colony PCR (cPCR) for the replacement of one, two or three genomically integrated copies of the target emgfp.DELTA. coding sequence with the donor EmGFP coding sequence.

7.2.1. Construction and Integration of Target DNA

[0179] To generate exogenous genomic target sites for nuclease-mediated double-strand breaks, target DNAs encoding emgfp.DELTA. were constructed using RYSE-mediated assembly, as described in U.S. Pat. No. 8,110,360, the contents of which are hereby incorporated by reference in their entirety. Nucleotides 450 to 462 of the wild-type EmGFP coding sequence (SEQ ID NO:1) were replaced with the following sequence: 5'-CGTCTAAATCATG-3' (SEQ ID NO:2), resulting in the introduction of: (1) a premature stop codon at position 152 of EmGFP (emgfp.DELTA.); and (2) the recognition/cleavage sequence for ZFN.gfp.

[0180] For the targeted integration of the emgfp.DELTA. coding sequence into each of the HO, YGR250c and NDT80 loci, the emgfp.DELTA. coding sequence was flanked with .about.200-500 bp of upstream and downstream homologous sequences for each locus (SEQ ID NOS:3-8). A unique selectable marker was also incorporated into each construct, positioned 5' to the emgfp.DELTA. coding sequence, for selection of colonies having successful integration events. The HO integration construct included KanR, the YGR250c integration construct included URA3, and the NDT80 integration construct included NatR. Each integration construct was transformed sequentially into a naive CEN.PK2 haploid yeast strain (strain A), and the strain was confirmed to have three integrated copies of the emgfp.DELTA. coding sequence.

7.2.2. Construction of ZFN Yeast Expression Plasmid

[0181] Zinc finger nucleases consist of two functional domains: a DNA-binding domain comprised of a chain of zinc finger proteins and a DNA-cleaving domain comprised of the nuclease domain of FokI. The endonuclease domain of FokI functions as an obligate heterodimer in order to cleave DNA, and thus, a pair of ZFNs is required to bind and cut its target sequence. The target sequence of ZFN.gfp (CompoZr.RTM. Zinc Finger Nuclease, Sigma-Aldrich, St. Louis, Mo.) is: 5'-ACAACTACAACAGCCACAACgtctatATCATGGCCGACAAGCA-3' (SEQ ID NO: 9), with the recognition sequence indicated in uppercase and the cleavage sequence indicated in lowercase.

[0182] A high-copy ZFN.gfp yeast expression plasmid, pZFN.gfp, was constructed as follows. The genes ZFN.gfp.1 and ZFN.gfp.2, each encoding one member of the ZFN.gfp obligate heterodimer, were PCR-amplified from a mammalian expression plasmid and fused to the divergent P.sub.GAL1,10 promoter and ADH1 and CYC1 terminators, respectively. Individual PCR products of P.sub.GAL10>ZFN.gfp.1-T.sub.ADH1 and P.sub.GAL1>ZFN.gfp.2-T.sub.CYC1, along with a linearized vector backbone comprising a LEU2 selectable marker, were co-transformed into a naive yeast strain for in vivo assembly via homologous recombination of overlapping ends. The PCR products recombined at the P.sub.GAL1,10 promoter sequence and assembled into the vector backbone via homologous sequences added by the terminal primers. Transformants were selected on minimal media lacking leucine, isolated, and grown in liquid media. The plasmids from multiple clones were extracted from yeast using the Zymoprep Yeast Plasmid Miniprep I kit (Zymo Research). The eluent from the extraction protocol was then transformed into E. coli XL-1 blue chemically competent cells. Plasmids were propagated overnight in E. coli and miniprepped (Qiagen, Valencia, Calif.). Correct clones were identified by restriction mapping.

7.2.3. Transformation with Donor DNA and Induction of Double-Strand Breaks

[0183] A standard lithium acetate/SSDNA/PEG protocol (Gietz and Woods, Methods Enzymol. 350:87-96 (2002)) was used to co-transform strain A with linear "donor" DNA encoding EmGFP and either: (1) empty vector; or (2) the pZFN.gfp expression vector. The EmGFP coding sequence differs from the emgfp.DELTA. coding sequence at positions within the recognition/cleavage site for ZFN.gfp, namely positions 450 (C.fwdarw.G), 456 (A.fwdarw.T), 461 (T.fwdarw.C) and 462 (G.fwdarw.C). Thus, ZFN.gfp is expected to recognize and cleave within the emgfp.DELTA. sequence but not within the EmGFP sequence.
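The sequence relationships stated in this Example can be cross-checked using only the 13-nucleotide replacement sequence (SEQ ID NO:2), the replaced positions 450 to 462, and the four base differences listed above. The following Python sketch is an illustrative consistency check; the function names are hypothetical, and the full EmGFP coding sequence (SEQ ID NO:1) is not reproduced or assumed.

```python
REPLACEMENT = "CGTCTAAATCATG"   # SEQ ID NO:2, nucleotides 450-462 of emgfp-delta
FIRST_NT, LAST_NT = 450, 462    # 1-based, inclusive
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Position -> (base in emgfp-delta, base in EmGFP), as listed in the text.
DIFFERENCES = {450: ("C", "G"), 456: ("A", "T"), 461: ("T", "C"), 462: ("G", "C")}

FORWARD_PRIMER = "CAACTACAACAGCCACAAGGTCTATATCACC"  # SEQ ID NO: 10 (Table 2)


def codon_from_segment(segment: str, codon_number: int) -> str:
    """Return the given codon if it lies entirely within positions 450-462."""
    codon_start = 3 * (codon_number - 1) + 1  # 1-based first nucleotide of the codon
    if codon_start < FIRST_NT or codon_start + 2 > LAST_NT:
        raise ValueError("codon is not fully contained in the replaced segment")
    offset = codon_start - FIRST_NT
    return segment[offset:offset + 3]


def wild_type_segment() -> str:
    """Recover nucleotides 450-462 of EmGFP from the listed differences."""
    bases = list(REPLACEMENT)
    for position, (delta_base, wild_type_base) in DIFFERENCES.items():
        index = position - FIRST_NT
        if bases[index] != delta_base:
            raise ValueError(f"position {position} does not match the stated base")
        bases[index] = wild_type_base
    return "".join(bases)


if __name__ == "__main__":
    print(codon_from_segment(REPLACEMENT, 152))          # codon 152 of emgfp-delta
    print(wild_type_segment())                           # EmGFP nucleotides 450-462
    print(FORWARD_PRIMER.endswith(wild_type_segment()))  # primer 3' end vs. EmGFP
```

Codon 152 spans nucleotides 454 to 456, which read TAA in the replacement sequence, in agreement with the premature stop codon described for emgfp.DELTA.. The recovered wild-type segment, GGTCTATATCACC, coincides with the 3' end of the forward primer of SEQ ID NO: 10, which is consistent with that primer being described as specific to EmGFP.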

[0184] One microgram of the appropriate plasmid DNA was co-transformed with 70 ul of linear EmGFP DNA (.about.300 ng/ul). All transformations were recovered overnight in YP+2% galactose to induce ZFN expression. Various dilutions were plated onto minimal media agar plates lacking leucine to select for colonies transformed with plasmid DNA. Plates were incubated for 3 days at 30.degree. C.

7.2.4. Confirmation of Multiple Simultaneous Integration

[0185] Colony PCR was performed to determine the frequency of replacement of the emgfp.DELTA. coding sequence with the EmGFP coding sequence at each target locus. DNA was prepped from 96 colonies from each transformation and probed with primer pairs specific for EmGFP and HO, EmGFP and NDT80, and EmGFP and YGR250c, respectively, such that successful integration of the EmGFP coding sequence at each locus was expected to produce an amplicon of a predicted size, while non-integration was expected to produce no amplicon.

TABLE 2. Primer sequences for cPCR verification of multiple integration of the EmGFP coding sequence

Primer Name | Description | Sequence | SEQ ID NO
KMH749-Fixed GFP-fwd | Forward primer specific to EmGFP | CAACTACAACAGCCACAAGGTCTATATCACC | SEQ ID NO: 10
CR813 | Reverse primer for HO locus | CTCTAACGCTGTTGGTAGATTG | SEQ ID NO: 11
KMH773-NDT80-Ar | Reverse primer for NDT80 locus | ACCATGTGATAATACACTACTAATGTGACTACTAGTTGA | SEQ ID NO: 12
KMH679-YGR250c 3' rev | Reverse primer for YGR250c locus | TCAGACGCGTTCGGAGGAGAGTGCATTCAC | SEQ ID NO: 13

[0186] As indicated in FIG. 5, none of the 96 colonies transformed with linear EmGFP donor DNA (SEQ ID NO:1) and the empty vector control produced an amplicon during PCR, indicating that, in the absence of a double-strand break, there were no successful integration events, i.e., no replacements at any of the three loci comprising the target emgfp.DELTA. coding sequence. By contrast, of the 96 colonies transformed with linear EmGFP DNA and pZFN.gfp, 2 colonies had one locus replaced, 4 colonies had two loci replaced, and 23 colonies had all three loci replaced with the EmGFP coding sequence (FIG. 6). Colony PCR results were corroborated by visualizing the fluorescence of transformed colonies on plates (data not shown). None of the colonies transformed with EmGFP DNA and empty vector appeared green, indicating that none of the target emgfp.DELTA. coding sequences were replaced with functional EmGFP coding sequences. By contrast, .about.20% of colonies transformed with EmGFP DNA and pZFN.gfp appeared green, roughly correlating with the frequency of integration events observed by cPCR.
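The colony counts in paragraph [0186] can be converted to frequencies with simple arithmetic. The following is a minimal illustrative sketch only; the function and variable names are hypothetical and are not part of the described method, and the counts are those reported above for the pZFN.gfp transformation.

```python
# Colony PCR counts reported for the pZFN.gfp transformation (96 colonies screened).
TOTAL_COLONIES = 96
colonies_by_loci_replaced = {1: 2, 2: 4, 3: 23}  # number of loci replaced -> colony count


def integration_frequencies(counts, total):
    """Return (percent of colonies per class, percent with at least one locus replaced)."""
    per_class = {k: 100.0 * n / total for k, n in counts.items()}
    any_integration = 100.0 * sum(counts.values()) / total
    return per_class, any_integration


per_class, any_integration = integration_frequencies(
    colonies_by_loci_replaced, TOTAL_COLONIES
)
for loci, pct in sorted(per_class.items()):
    print(f"{loci} of 3 loci replaced: {pct:.1f}% of colonies")
print(f"at least one locus replaced: {any_integration:.1f}% of colonies")
```

By this arithmetic, roughly 24% of the screened colonies carried replacements at all three loci, and roughly 30% carried at least one replacement.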

[0187] These results demonstrate that induction of multiple targeted double-strand breaks in the genome of a host cell can facilitate simultaneous multiple targeted integration of exogenous donor nucleic acids.

7.3 Example 3

Simultaneous Multiple Integration of Terpene Synthase Genes to Facilitate Conversion of a Farnesene Producing Strain to an Amorphadiene Producing Strain

[0188] This Example provides results which demonstrate simultaneous integration of three sesquiterpene synthase genes at three different engineered loci of a S. cerevisiae host engineered for high mevalonate pathway flux. As a result, a parental strain producing farnesene and comprising a plasmid-based copy of the farnesene synthase gene was converted into an amorphadiene producing strain comprising multiple genomically integrated copies of amorphadiene synthase. In brief, URA3, NatR and KanR marker cassettes flanked by F-CphI sites were integrated at the Gal80, HXT3 and Mat.alpha. loci, respectively, of the host strain. The host was then co-transformed with a plasmid encoding the F-CphI endonuclease as well as three linear "donor" DNA constructs containing distinct codon optimizations of the amorphadiene synthase (ADS) gene expressed from the Gal1 promoter and terminated by the CYC1 terminator (ADS cassette), each flanked by homology regions for its respective target locus. Transformed colonies were screened by colony PCR (cPCR) for the replacement of one, two or three genomically integrated target marker loci with the ADS cassettes. A triply-integrated strain was identified and further engineered by integrating a fourth ADS cassette, and the resulting strain was cultured under conditions allowing for loss of the plasmid encoding farnesene synthase, such that its product profile was fully converted from farnesene to amorphadiene.

7.3.1. Construction of a Parental Farnesene Producing Strain

[0189] A farnesene-producing yeast strain, Y3639, useful for the multiple simultaneous integration of exogenous donor DNAs encoding amorphadiene synthase, was prepared as follows.

[0190] Strains Y93 (MAT A) and Y94 (MAT alpha) were generated by replacing the promoter of the ERG9 gene of yeast strains Y002 and Y003 (CEN.PK2 background MAT A or MAT alpha, respectively; ura3-52; trp1-289; leu2-3,112; his3.DELTA.1; MAL2-8C; SUC2; van Dijken et al. (2000) Enzyme Microb. Technol. 26:706-714), respectively, with the promoter of the MET3 gene of Saccharomyces cerevisiae. To this end, exponentially growing Y002 and Y003 cells were transformed with integration construct i8 (SEQ ID NO: 14), which comprised the kanamycin resistance marker (KanMX) flanked by the promoter and terminator of the Tef1 gene of Kluyveromyces lactis, the ERG9 coding sequence, a truncated segment of the ERG9 promoter (trunc. PERG9), and the MET3 promoter (PMET3), flanked by ERG9 upstream and downstream sequences. Host cell transformants were selected on medium comprising 0.5 .mu.g/mL Geneticin (Invitrogen Corp., Carlsbad, Calif.), and selected clones were confirmed by diagnostic PCR, yielding strains Y93 and Y94.

[0191] Strains Y176 (MAT A) and Y177 (MAT alpha) were generated by replacing the coding sequence of the ADE1 gene in strains Y93 and Y94, respectively, with the coding sequence of the LEU2 gene of Candida glabrata (CgLEU2). To this end, the 3.5 kb CgLEU2 genomic locus was PCR amplified from Candida glabrata genomic DNA (ATCC, Manassas, Va.) using primers 61-67-CPK066-G (SEQ ID NO: 15) and 61-67-CPK067-G (SEQ ID NO: 16), and transforming the PCR product into exponentially growing Y93 and Y94 cells. Host cell transformants were selected on CSM-L, and selected clones were confirmed by diagnostic PCR, yielding strains Y176 and Y177.

[0192] Strain Y188 was generated by introducing into strain Y176 an additional copy of the coding sequences of the ERG13, ERG10, and ERG12 genes of Saccharomyces cerevisiae, and a truncated coding sequence of the HMG1 gene of Saccharomyces cerevisiae, each under regulatory control of a galactose inducible promoter of the GAL1 or GAL10 gene of Saccharomyces cerevisiae. To this end, exponentially growing Y176 cells were transformed with 2 .mu.g of expression plasmids pAM491 and pAM495 digested with PmeI restriction endonuclease (New England Biolabs, Beverly, Mass.). Host cell transformants were selected on CSM lacking uracil and histidine (CSM-U-H), and selected clones were confirmed by diagnostic PCR, yielding strain Y188.

[0193] Strain Y189 was generated by introducing into strain Y177 an additional copy of the coding sequences of the ERG20, ERG8, and ERG19 genes of Saccharomyces cerevisiae, and a truncated coding sequence of the HMG1 gene of Saccharomyces cerevisiae, each under regulatory control of a galactose inducible promoter of the GAL1 or GAL10 gene of Saccharomyces cerevisiae. To this end, exponentially growing Y177 cells were transformed with 2 .mu.g of expression plasmids pAM489 and pAM497 digested with PmeI restriction endonuclease. Host cell transformants were selected on CSM lacking tryptophan and histidine (CSM-T-H), and selected clones were confirmed by diagnostic PCR, yielding strain Y189.

[0194] Strain Y238 was generated by mating strains Y188 and Y189, and by introducing an additional copy of the coding sequence of the IDI1 gene of Saccharomyces cerevisiae and a truncated coding sequence of the HMG1 gene of Saccharomyces cerevisiae, each under regulatory control of a galactose inducible promoter of the GAL1 or GAL10 gene of Saccharomyces cerevisiae. To this end, approximately 1.times.10.sup.7 cells of strains Y188 and Y189 were mixed on a YPD medium plate for 6 hours at room temperature, diploid cells were selected on CSM-H-U-T, and exponentially growing diploids were transformed with 2 .mu.g of expression plasmid pAM493 digested with PmeI restriction endonuclease. Host cell transformants were selected on CSM lacking adenine (CSM-A), and selected clones were confirmed by diagnostic PCR, yielding strain Y238.

[0195] Strains Y210 (MAT A) and Y211 (MAT alpha) were generated by sporulating strain Y238. The diploid cells were sporulated in 2% potassium acetate and 0.02% raffinose liquid medium, and approximately 200 genetic tetrads were isolated using a Singer Instruments MSM300 series micromanipulator (Singer Instrument Co, LTD. Somerset, UK). Spores were selected on CSM-A-H-U-T, and selected clones were confirmed by diagnostic PCR, yielding strains Y210 (MAT A) and Y211 (MAT alpha).

[0196] Strain Y221 was generated by transforming exponentially growing Y211 cells with vector pAM178. Host cell transformants were selected on CSM-L.

[0197] Strain Y290 was generated by deleting the coding sequence of the GAL80 gene of strain Y221. To this end, exponentially growing Y221 cells were transformed with integration construct i32 (SEQ ID NO: 17), which comprised the hygromycin B resistance marker (hph) flanked by the promoter and terminator of the Tef1 gene of Kluyveromyces lactis flanked by GAL80 upstream and downstream sequences. Host cell transformants were selected on medium comprising hygromycin B, and selected clones were confirmed by diagnostic PCR, yielding strain Y290.

[0198] Strain Y318 was generated by removing the pAM178 vector from strain Y290 by serial propagation in leucine-rich media, and testing individual colonies for their inability to grow on CSM-L, yielding strain Y318.

[0199] Strain Y409 was generated by introducing a heterologous nucleotide sequence encoding a .beta.-farnesene synthase into strain Y318. To this end, exponentially growing Y318 cells were transformed with expression plasmid pAM404. Host cell transformants were selected on CSM-L, yielding strain Y409.

[0200] Strain Y419 was generated by rendering the GAL promoters of strain Y409 constitutively active. To this end, exponentially growing Y409 cells were transformed with integration construct i33 (SEQ ID NO: 18), which comprised the nourseothricin resistance marker of Streptomyces noursei (NatR) flanked by the promoter and terminator of the Tef1 gene of Kluyveromyces lactis, and the coding sequence of the GAL4 gene of Saccharomyces cerevisiae under regulatory control of an "operative constitutive" version of its native promoter (PGAL4oc; Griggs & Johnston (1991) PNAS 88(19):8597-8601) and the GAL4 terminator (TGAL4), flanked by upstream and downstream sequences of the modified ERG9 promoter and coding sequences. Host cell transformants were selected on medium comprising nourseothricin, and selected clones were confirmed by diagnostic PCR, yielding strain Y419.

[0201] Strain Y677 was generated by introducing at the modified GAL80 locus of strain Y419 an additional copy of the coding region of the ERG12 gene of Saccharomyces cerevisiae under regulatory control of the promoter of the GAL1 gene of Saccharomyces cerevisiae. To this end, exponentially growing Y419 cells were transformed with integration construct i37 (SEQ ID NO: 19), which comprised the kanamycin resistance marker of Streptomyces noursei (KanR) flanked by the promoter and terminator of the Tef1 gene of Kluyveromyces lactis, and the coding and terminator sequences of the ERG12 gene of Saccharomyces cerevisiae flanked by the GAL1 promoter (PGAL1) and the ERG12 terminator (TERG12). Host cell transformants were selected on medium comprising kanamycin, and selected clones were confirmed by diagnostic PCR, yielding strain Y677.

[0202] Strain Y1551 was generated from strain Y677 by chemical mutagenesis. Mutated strains were screened for increased production of .beta.-farnesene, yielding strain Y1551.

[0203] Strain Y1778 was generated from strain Y1551 by chemical mutagenesis. Mutated strains were screened for increased production of .beta.-farnesene, yielding strain Y1778.

[0204] Strain Y1816 was generated by replacing the HXT3 coding sequence of strain Y1778 with two copies of an acetoacetyl-CoA thiolase coding sequence, one being derived from Saccharomyces cerevisiae and the other from C. butylicum, and one copy of the coding sequence of the HMGS gene of B. juncea. To this end, exponentially growing Y1778 cells were transformed with integration construct i301 (SEQ ID NO: 20), which comprised the hygromycin B resistance marker (hyg) flanked by the promoter and terminator of the Tef1 gene of Kluyveromyces lactis, the coding sequence of the ERG10 gene of Saccharomyces cerevisiae flanked by a truncated TDH3 promoter (tPTDH3) and the AHP1 terminator (TAHP1), the coding sequence of the acetoacetyl-CoA thiolase gene of C. butylicum (thiolase) flanked by the YPD1 promoter (PYPD1) and CCW12 terminator (TCCW12), and the coding sequence of the HMGS gene of B. juncea (HMGS) preceded by the TUB2 promoter (PTUB2), flanked by upstream and downstream sequences of the HXT3 gene of Saccharomyces cerevisiae. Host cell transformants were selected on medium comprising hygromycin B, and selected clones were confirmed by diagnostic PCR, yielding strain Y1816.

[0205] Strain Y2055 was generated from strain Y1778 by chemical mutagenesis. Mutant strains were screened for increased production of .beta.-farnesene, yielding strain Y2055.

[0206] Strain Y2295 was generated from strain Y2055 by chemical mutagenesis. Mutant strains were screened for increased production of .beta.-farnesene, yielding strain Y2295.

[0207] Strain Y3111 was generated by switching the mating type of strain Y2295 from MAT A to MAT alpha. To this end, exponentially growing Y2295 cells were transformed with integration construct i476 (SEQ ID NO: 21), which comprised the MAT alpha mating locus and the hygromycin B resistance marker (hygA). Host cell transformants were selected on medium comprising hygromycin B, and selected clones were confirmed by diagnostic PCR, yielding strain Y3111.

[0208] Strain Y2168 was generated from strain Y1816 by chemical mutagenesis. Mutant strains were screened for increased production of .beta.-farnesene, yielding strain Y2168.

[0209] Strain Y2446 was generated from strain Y2168 by chemical mutagenesis. Mutant strains were screened for increased production of .beta.-farnesene, yielding strain Y2446.

[0210] Strain Y3118 was generated by inserting into the native URA3 locus of strain Y2446 the coding sequence, promoter, and terminator of the GAL80 gene of Saccharomyces cerevisiae. To this end, exponentially growing Y2446 cells were transformed with integration construct i477 (SEQ ID NO: 22), which comprised the promoter, terminator, and coding sequence of the GAL80 gene of Saccharomyces cerevisiae (GAL80) flanked by overlapping URA3 sequences (which enable loop-out excision of the GAL80 gene by homologous recombination and restoration of the original URA3 sequence). Host cell transformants were selected on medium comprising 5-FOA, yielding strain Y3118.

[0211] Strain Y3215 was generated by mating strains Y3111 and Y3118. Approximately 1.times.10.sup.7 cells of strains Y3111 and Y3118 were mixed on a YPD medium plate for 6 hours at room temperature to allow for mating, followed by plating on YPD agar plates to isolate single colonies. Diploids were identified by colony PCR screening for the presence of both the hphA-marked MAT alpha locus and the wild-type MAT A locus.

[0212] Strain Y3000 was generated by sporulating strain Y3215 and looping out the GAL80 coding sequence. The diploid cells were sporulated in 2% potassium acetate and 0.02% raffinose liquid medium. Random spores were isolated, plated on YPD agar, grown for 3 days, and then replica-plated to CSM-U to permit growth only of cells lacking GAL80 (i.e., having a functional URA3 gene). Spores were then tested for .beta.-farnesene production, the best producer was identified, and the presence of integration construct i301 was confirmed by diagnostic PCR, yielding strain Y3000.

[0213] Strain Y3284 was generated by removing the URA3 marker from strain Y3000. To this end, exponentially growing Y3000 cells were transformed with integration construct i94 (SEQ ID NO: 23), which comprised the hisG coding sequence of Salmonella, and the coding sequence of the ERG13 gene and a truncated coding sequence of the HMG1 gene of Saccharomyces cerevisiae under control of a galactose inducible promoter of the GAL1 or GAL10 gene of Saccharomyces cerevisiae, flanked by upstream and downstream sequences of the URA3 gene of Saccharomyces cerevisiae. Host cell transformants were selected on medium comprising 5-FOA, and selected clones were confirmed by diagnostic PCR, yielding strain Y3284.

[0214] Strain Y3385 was generated by replacing the NDT80 coding sequence of strain Y3284 with an additional copy of the coding sequence of an acetyl-CoA synthetase gene of Saccharomyces cerevisiae and the coding sequence of the PDC gene of Z. mobilis. To this end, exponentially growing Y3284 cells were transformed with integration construct i467 (SEQ ID NO: 24), which comprised the URA3 marker, the coding sequence of the ACS2 gene of Saccharomyces cerevisiae (ACS2) flanked by the HXT3 promoter (PHXT3) and PGK1 terminator (TPGK1), and the coding sequence of the PDC gene of Z. mobilis (zmPDC) flanked by the GAL7 promoter (PGAL7) and the TDH3 terminator (TTDH3), flanked by upstream and downstream NDT80 sequences. Host cell transformants were selected on CSM-U, and selected clones were confirmed by diagnostic PCR, yielding strain Y3385.

[0215] Strain Y3547 was generated from strain Y3385 by chemical mutagenesis. Mutated strains were screened for increased production of .beta.-farnesene, yielding strain Y3547.

[0216] Strain Y3639 was generated from strain Y3547 by chemical mutagenesis. Mutated strains were screened for increased production of .beta.-farnesene, yielding strain Y3639.

7.3.2. Construction and Integration of Target DNA

[0217] Exogenous genomic target sites for F-CphI endonuclease-mediated double-strand breaks were integrated into three different loci of strain Y3639. Three target site cassettes were constructed using PCR assembly of overlapping fragments, each comprising the recognition sequence for the F-CphI endonuclease and the coding sequence for: (1) URA3 (flanked by homology regions for the modified Gal80 locus) (SEQ ID NO: 25); (2) NatR (flanked by homology regions for the modified HXT3 locus) (SEQ ID NO: 26); and (3) KanR (flanked by homology regions for the modified Mat.alpha. locus) (SEQ ID NO: 27), respectively. The target site cassettes were serially transformed into Y3639, and the resulting strain was confirmed by colony PCR to have three integrated copies of the F-CphI-flanked marker cassettes at the correct loci ("strain B").

7.3.3. Construction of F-CphI Yeast Expression Plasmid

[0218] The F-CphI yeast expression plasmid pAM1799, comprising a HygR selectable marker, has been described previously in U.S. Pat. No. 7,919,605, which is hereby incorporated by reference in its entirety.

7.3.4. Transformation with Donor DNA and Induction of Double-Strand Breaks

[0219] The standard lithium acetate/SSDNA/PEG protocol (Gietz and Woods, Methods Enzymol. 350:87-96 (2002)) was modified to include a 30 minute incubation of the cells at 30.degree. C. prior to the 42.degree. C. heat shock. This method was used to co-transform strain B with pAM1799, encoding the F-CphI endonuclease, and three linear "donor" DNAs, each comprising a codon optimized coding sequence for amorphadiene synthase (ADS) of Artemisia annua, flanked by homology regions to the modified Gal80 (SEQ ID NO: 28), HXT3 (SEQ ID NO: 29) and Mat.alpha. loci (SEQ ID NO: 30), respectively, of strain B.

[0220] One microgram of pAM1799 was co-transformed with .about.100 ng of each of the ADS donor DNAs. All transformations were recovered overnight in YP+2% galactose to induce F-CphI expression. Various dilutions were plated onto YPD agar plates containing hygromycin to select for colonies transformed with plasmid DNA. Plates were incubated for 3 days at 30.degree. C.

7.3.5. Confirmation of Multiple Simultaneous Integration

[0221] Colony PCR (cPCR) was performed to determine the frequency of replacement of the F-CphI-flanked marker cassette coding sequences with the ADS cassette coding sequence. DNA was prepped from 20 colonies and probed with primer pairs specific for ADS and the Gal80 locus, ADS and the HXT3 locus, and ADS and the Mat.alpha. locus, respectively, such that successful integration of the ADS cassette coding sequence at each locus was expected to produce an amplicon of a predicted size, while non-integration was expected to produce no amplicon. PCR reactions to produce amplicons from the 5' and 3' ends of each locus were attempted in multiplex. In some cases, only the 5' or the 3' amplicon was successfully detected, but proper integration of the ADS cassette was confirmed at these loci by sequencing larger PCR fragments.

TABLE 3. Primer sequences for cPCR verification of multiple integration of the ADS cassette coding sequence

Primer Name | Description | Sequence | SEQ ID NO
CUT24 | Gal80 locus US FOR | GTTTCTTTTGGATTGCGCTTGCC | SEQ ID NO: 31
ART12 | ADS codon v2 5' REV | TACTGACAACCACATGTTAC | SEQ ID NO: 32
ART45 | ADS ORF 5' REV | TACTGCTTCGGTAGTAGTTTCACCCTTCA | SEQ ID NO: 33
ART210 | Gal80 locus DS REV | GGGAAGTCCAATTCAATAGT | SEQ ID NO: 34
HJ207 | HXT3 locus US FOR | CATCTTCTCGAGATAACACCTGGAG | SEQ ID NO: 35
KB349 | CYC1T FOR | ACGCGTGTACGCATGTAAC | SEQ ID NO: 36
HJ602 | HXT3 locus DS REV | CAATTGGGGTTCTGGCAGTC | SEQ ID NO: 37
CUT76 | Mat.alpha. locus US FOR | GAAGCCTGCTTTCAAAATTAAGAACAAAGC | SEQ ID NO: 38
HJ632 | Mat.alpha. locus DS REV | GAATTTACCTGTTCTTAGCTTGTACCAGAG | SEQ ID NO: 39

[0222] Of the 20 colonies screened by cPCR, 14 had ADS integrated at the Gal80 locus, 17 had ADS integrated at the HXT3 locus, and four had ADS integrated at the Mat.alpha. locus. The low rate of integration at the Mat.alpha. locus can be explained by self-closure at this locus mediated by a direct repeat sequence flanking the F-CphI sites. In total, 6 clones had ADS integrated at a single site, 10 clones had ADS integrated at two sites, and three clones had ADS integrated at all three loci ("strains C"). The triply integrated strains were further confirmed by sequencing longer PCR products encompassing both flanks.
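The per-locus and per-clone tallies in paragraph [0222] describe the same set of integration events, so they can be cross-checked against each other. The following is an illustrative sketch only, with hypothetical variable names; it simply confirms that the two tallies account for the same number of events and derives how many screened colonies showed no integration.

```python
# Counts reported in paragraph [0222] for the 20 colonies screened by cPCR.
SCREENED = 20
integrations_per_locus = {"Gal80": 14, "HXT3": 17, "Mat-alpha": 4}
clones_by_sites_integrated = {1: 6, 2: 10, 3: 3}  # sites integrated -> clone count

# Total integration events, counted two independent ways.
events_by_locus = sum(integrations_per_locus.values())
events_by_clone = sum(sites * n for sites, n in clones_by_sites_integrated.items())

clones_with_integration = sum(clones_by_sites_integrated.values())
clones_without_integration = SCREENED - clones_with_integration
tallies_agree = events_by_locus == events_by_clone

print(f"events counted by locus: {events_by_locus}")
print(f"events counted by clone: {events_by_clone}")
print(f"tallies agree: {tallies_agree}")
print(f"clones with no integration: {clones_without_integration}")
```

Both tallies give 35 integration events, and 19 of the 20 screened colonies carried ADS at one or more loci.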

7.3.6. Completion of the Integrated ADS Strain and Sesquiterpene Assay

[0223] The triply integrated ADS strains were further engineered by integrating a final copy of ADS marked with a URA cassette (SEQ ID NO: 40) at the His3 locus using a standard protocol, and a resulting strain was confirmed to carry this fourth copy ("strain D"). Finally, strain D cells were passaged in non-selective media to lose the Leu+ marked, high-copy farnesene synthase plasmid (pAM404) ("strain E").

[0224] Several isolates of strain E were assayed for sesquiterpene production alongside strain D and the original parent strain B. In brief, isolates of strains B, D and E were incubated in separate wells of a 96-well plate containing 360 .mu.L of Bird Seed Medium (BSM) with 2% sucrose per well (preculture). After 3 days of incubation at 33.5.degree. C. with 999 rpm agitation, 14.4 .mu.L of each well was inoculated into a well of a new 96-well plate containing 360 .mu.L of fresh BSM with 4% sucrose (production culture). After another 2 days of incubation at 33.5.degree. C. with 999 rpm agitation, samples were taken and analyzed for sesquiterpene production by gas chromatography (GC) analysis. Samples were extracted with methanol-heptane (1:1 v/v), and the mixtures were centrifuged to remove cellular material. An aliquot of the methanol-heptane extract was diluted into heptane, and then injected onto a methyl silicone stationary phase using a pulsed split injection. Farnesene and amorphadiene were separated by boiling point using GC with flame ionization detection (FID). Trans-.beta.-caryophyllene was used as a retention time marker to monitor successful injection and elution during the specified GC oven profile.
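The transfer volumes in paragraph [0224] imply a fixed inoculum ratio for the production culture. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and the only inputs are the two volumes stated above.

```python
# Volumes reported in paragraph [0224], in microliters.
PRECULTURE_TRANSFER_UL = 14.4
FRESH_MEDIUM_UL = 360.0


def inoculum_summary(transfer_ul, medium_ul):
    """Return (inoculum as a fraction of the fresh medium volume, fold dilution of preculture)."""
    fraction_of_medium = transfer_ul / medium_ul
    fold_dilution = (transfer_ul + medium_ul) / transfer_ul
    return fraction_of_medium, fold_dilution


fraction, fold = inoculum_summary(PRECULTURE_TRANSFER_UL, FRESH_MEDIUM_UL)
print(f"inoculum: {fraction:.1%} of the fresh medium volume")
print(f"preculture diluted {fold:.1f}-fold in the production well")
```

Under these volumes the inoculum corresponds to 4% of the fresh medium volume, i.e., a 26-fold dilution of the preculture.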

[0225] As shown in FIG. 7, total sesquiterpene production remained nearly identical (3-3.5 g/L) for all strains, but the product profile was successfully converted from farnesene (strain B) to mixed product (strain D) to amorphadiene (strain E).

[0226] These results demonstrate that induction of multiple targeted double-strand breaks in the genome of a host cell can facilitate simultaneous multiple integrations of a functional gene cassette, in this case facilitating conversion of a farnesene-producing strain into an amorphadiene-producing strain in a single transformation.

7.4 Example 4

Simultaneous Replacement of Multiple Integrated Copies of Farnesene Synthase with Amorphadiene Synthase

[0227] This Example provides results which demonstrate the simultaneous replacement of four genomically integrated terpene synthase genes, facilitated by designer nuclease-induced double-strand breaks within the synthase coding regions. In brief, an existing farnesene production strain, derived from strain Y3639 (described in Example 3) but comprising four integrated rather than extrachromosomal copies of the farnesene synthase (FS) gene, was co-transformed with a plasmid encoding a designer TAL-effector nuclease (TALEN) and four linear donor DNAs encoding new terpene synthase genes. The designer TALEN is capable of binding to and cleaving a sequence unique to the integrated farnesene synthase genes. Transformed colonies were screened by colony PCR (cPCR), and strains with one, two, three or four genomically integrated target marker loci were identified.

7.4.1. Construction and Integration of Target DNA

[0228] Four donor cassettes, each comprising a terpene synthase coding sequence flanked by homology regions (.about.500 bp) to its respective target locus, were assembled by overlap PCR. Three of the donor DNAs comprised ADS coding sequences and no selectable marker (SEQ ID NOs: 41-43), while the final donor DNA was a cassette comprising a novel codon optimization of the farnesene synthase (FS) fused to a URA3 marker cassette (SEQ ID NO: 44). None of the donor DNAs contained the target site recognized by the FS-specific TALEN (5'-TAGTGGAGGAATTAAAAGAGGAAGTTAAGAAGGAATTGATAACTATCAA-3' (SEQ ID NO:45)).
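The requirement that no donor DNA contain the TALEN target site can be checked in silico before transformation. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the check is an exact string match on both strands (it does not model tolerated mismatches or TALEN half-site binding), and the toy donor sequences are invented for illustration and are not SEQ ID NOs: 41-44.

```python
# Target site recognized by the FS-specific TALEN (SEQ ID NO: 45).
TALEN_TARGET = "TAGTGGAGGAATTAAAAGAGGAAGTTAAGAAGGAATTGATAACTATCAA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def contains_target(donor, target=TALEN_TARGET):
    """True if the exact target site occurs on either strand of the donor."""
    donor = donor.upper()
    target = target.upper()
    return target in donor or reverse_complement(target) in donor


# Hypothetical toy sequences, for illustration only.
toy_donor_with_site = "GGCC" + TALEN_TARGET + "AATT"
toy_donor_without_site = "GGCCAATTGGCCAATTGGCCAATT"

print(contains_target(toy_donor_with_site))
print(contains_target(reverse_complement(toy_donor_with_site)))
print(contains_target(toy_donor_without_site))
```

A donor that passes this exact-match check could still be cleaved if the nuclease tolerates mismatches, so a real design workflow would likely need a mismatch-aware search.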

[0229] For the replacement of the four integrated FS cassettes in the strain (Strain F), the hyg+ marked TALEN plasmid was co-transformed into the host strain along with .about.500 ng of each linear donor DNA using the protocol described in Example 3. Various dilutions were plated onto CSM-URA+Hyg plates and incubated at 30.degree. C. for 3 days.

7.4.2. Confirmation of Multiple Simultaneous Integration

[0230] After selection for the TALEN plasmid and integration of the URA3-marked, codon-optimized FS cassette on CSM-URA+Hyg plates, colony PCR was performed to determine the frequency of replacement of the integrated FS cassettes with the unmarked ADS cassettes at three loci. DNA was prepped from 20 colonies and probed with primer pairs specific for integration of the ADS cassette at the NDT80, DIT1 and ERG10 loci, such that successful integration of the ADS cassette coding sequence at each locus was expected to produce an amplicon of a predicted size, while non-integration was expected to produce no amplicon.

TABLE 4. Primer sequences for cPCR verification of replacement of multiple farnesene synthase cassettes with amorphadiene synthase cassettes

Primer Name | Description | Sequence | SEQ ID NO
HJ272 | NDT80 5' FOR | ATAACAATATTATAAAAAGCGCTTAA | SEQ ID NO: 46
ART45 | ADS ORF 5' REV | TACTGCTTCGGTAGTAGTTTCACCCTTCA | SEQ ID NO: 47
HJ643 | DIT1 5' FOR | AAAATCCTTATATTATTGGCCC | SEQ ID NO: 48
HJ799 | ERG10 5' FOR | GTAGCCTAAAACAAGCGCC | SEQ ID NO: 49

[0231] Three out of 48 clones examined had integrated a single ADS cassette in addition to the URA3-marked FS, one clone had integrated two ADS cassettes, and one clone had integrated all three ADS cassettes. Multiple integration results were further confirmed by sequencing longer PCR products encompassing both flanks.

[0232] These results demonstrate that expression of a site-specific designer nuclease in a host cell comprising a biosynthetic pathway can facilitate the simultaneous replacement of multiple integrated copies of a pathway gene with new pathway genes in a single transformation step.

7.5 Example 5

Simultaneous Multiple Integration of Markerless DNA Constructs into Two Loci Cut with Distinct Designer Nucleases

[0233] This Example provides results which demonstrate the simultaneous integration of two markerless DNA constructs at two native target sites, each site being cut with a distinct designer nuclease. In brief, an ADE.sup.- host strain was co-transformed with: (1) a linear DNA fragment comprising a GFP cassette (flanked by upstream and downstream regions homologous to the SFC1 locus); (2) a linear DNA fragment comprising an ADE2 cassette (flanked with upstream and downstream regions homologous to the YJR030c locus); and (3) plasmid(s) encoding designer nucleases that target sequences in the native SFC1 and YJR030c open reading frames, respectively. After selection for the plasmid(s), transformed colonies were screened visually for GFP fluorescence and for white color, indicating complementation of the ADE phenotype. Colony PCR (cPCR) was also performed to confirm replacement of both loci. Interestingly, a significant improvement in the rate of integration at both target loci was observed when the designer endonucleases were used in combination compared to the rate of integration when only a single designer nuclease was used.

7.5.1. Construction of Donor DNA Cassettes

[0234] Two donor DNAs were generated using PCR assembly of overlapping fragments: (1) a linear DNA fragment comprising a GFP cassette flanked by .about.500 bp of upstream and downstream regions homologous to the SFC1 locus (SEQ ID NO: 58); and (2) a linear DNA fragment comprising an ADE2 cassette flanked by .about.500 bp of upstream and downstream regions homologous to the YJR030c locus (SEQ ID NO: 59).

7.5.2. Construction of Heterodimeric ZFN Expression Plasmids

[0235] A plasmid encoding the YJR030c-specific zinc finger nuclease (ZFN) was constructed in two ways. In the first version, the two ORFs of a heterodimeric ZFN, under control of a divergent Gal1-10 promoter and terminated by the Adh1 and CYC1 terminators, were cloned into a Kan-marked CEN-ARS vector by a three-part gap repair in yeast (pCUT006). A second version was also constructed wherein both ORFs of the heterodimeric ZFN were expressed from the Gal10 promoter as a single ORF with the monomers separated by a DNA sequence encoding a cleavable peptide linker. This second plasmid was constructed by a three-part ligation using linkers produced by type IIS restriction enzyme digest of PCR fragments into a Kan-marked CEN-ARS vector backbone (pCUT016). A plasmid encoding the SFC1-specific ZFN was also constructed as a single ORF using the same linker strategy, marker and backbone (pCUT015). The marker was then changed to URA by means of a gap repair reaction in yeast (pCUT058). To construct a single plasmid for expression of both the YJR030c and SFC1-specific nucleases, the single ORFs from pCUT016 and pCUT015 were subcloned into a new CEN-ARS Kan+ vector backbone, and expressed from the Gal1-10 divergent promoter with Cyc1 and Adh1 terminators (pCUT032).

7.5.3. Transformation with Donor DNA and Induction of Double-Strand Breaks

[0236] One microgram of each designer nuclease plasmid DNA, or the plasmid containing both designer endonucleases on a single plasmid, was co-transformed with .about.1 microgram of each of the donor DNAs. All transformations were recovered overnight in YP+2% galactose to induce nuclease expression. Various dilutions were plated onto URA dropout+Kan agar plates (for the dual plasmids) or YPD+Kan to select for colonies transformed with plasmid DNA. Plates were incubated for 3-4 days at 30.degree. C.

7.5.4. Confirmation of Multiple Simultaneous Integration

[0237] Marker-less integration at the SFC1 locus was scored by observation of GFP fluorescence under UV light using appropriate filters. Marker-less integration of ADE2 was scored by observation of a white colony color, indicating complementation of the ADE2 deletion phenotype (pink colonies) in the host strain. In a typical experiment, 50-150 colonies were assayed. The visual scoring strategy was confirmed in a subset of colonies by colony PCR using primers 5' of the integration construct and an internal reverse primer. Integration at each locus was expected to produce an amplicon of a predicted size, while non-integration was expected to produce no amplicon. The cPCR results confirmed the accuracy of the visual scoring method.

TABLE 4 Primer sequences for successful cPCR verification of multiple integration of the ADS cassette coding sequence

Primer Name   Description         Sequence                                          SEQ ID NO
CUT351        SFC1 5' cPCR        GCGAATGAGCCATGAATTATTAACCGC                       SEQ ID NO: 63
CUT350        YJR030c 5' cPCR     AGATGAAACGAATTACTAGCATTTTATCCGTTC                 SEQ ID NO: 64
CUT371        ADE2 cassette REV   TAACTACCATTACTCAGTGTACTTGATTGTTTTGTCCGATTTTCTTG   SEQ ID NO: 65
HJ788         GFP cassette REV    GCCGGGTGACAGAGAAATATTG                            SEQ ID NO: 66
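The colony PCR scoring rule described in paragraph [0237] (an amplicon of the predicted size indicates integration, no amplicon indicates non-integration) can be sketched as follows. This is a minimal illustration and not the applicants' analysis procedure. The pairing of each 5' locus primer with a cassette reverse primer is an assumption based on which cassette was targeted to which locus, and the function name and input format are hypothetical.

```python
# Assumed primer pairings, inferred from Table 4 and the donor DNA design:
# the GFP cassette targets SFC1 and the ADE2 cassette targets YJR030c.
PRIMER_PAIRS = {
    "SFC1": ("CUT351", "HJ788"),      # locus 5' primer + GFP cassette REV
    "YJR030c": ("CUT350", "CUT371"),  # locus 5' primer + ADE2 cassette REV
}


def score_colony(amplicon_seen):
    """Score one colony from its cPCR results.

    amplicon_seen maps a locus name to True when a band of the predicted
    size was observed; a missing entry is treated as no amplicon.
    """
    calls = {locus: bool(amplicon_seen.get(locus, False))
             for locus in PRIMER_PAIRS}
    # A colony counts as a double integrant only if both loci are positive.
    calls["both"] = all(calls[locus] for locus in PRIMER_PAIRS)
    return calls


if __name__ == "__main__":
    print(score_colony({"SFC1": True, "YJR030c": True}))
    print(score_colony({"SFC1": True}))
```

Because the reverse primer anneals inside the cassette, an unmodified locus gives no product, which is why the absence of a band is read as non-integration.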

[0238] As indicated in FIG. 8, in cells co-transformed with linear donor DNAs for the SFC1 and YJR030c loci, and the YJR030c endonuclease plasmid (pCUT006) and SFC1 endonuclease plasmid (pCUT058), 80% of colonies selected on URA dropout+Kan agar plates were GFP positive. Of these colonies, 91% were positive for ADE2 integration. In total, 72.8% of colonies had integrated the donor DNA at both loci.
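The 72.8% figure follows from multiplying the fraction of GFP-positive colonies by the fraction of those colonies that were also ADE2 positive. A short worked check in Python is given below; the function name is hypothetical and the inputs are the percentages reported in paragraph [0238].

```python
def double_integration_fraction(frac_gfp, frac_ade2_given_gfp):
    """Fraction of all colonies positive at both loci.

    frac_gfp: fraction of selected colonies that are GFP positive.
    frac_ade2_given_gfp: fraction of the GFP-positive colonies that are
    also positive for ADE2 integration.
    """
    return frac_gfp * frac_ade2_given_gfp


if __name__ == "__main__":
    both = double_integration_fraction(0.80, 0.91)
    print(f"{100 * both:.1f}% of colonies integrated at both loci")
```

Note that the 91% value is conditional on GFP positivity, so the product gives the share of all selected colonies carrying both integrations rather than the share that are ADE2 positive overall.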

[0239] In cells co-transformed with linear donor DNA for the SFC1 locus and the designer nuclease plasmid targeting SFC1 (pCUT015), 50% of the cells were positive for GFP. When cells were co-transformed with linear donor DNA for the YJR030c locus and the designer nuclease plasmid targeting the YJR030c locus (pCUT016), only 5% of the cells were positive for ADE2 integration. When the host cells were co-transformed with linear DNAs for the SFC1 and YJR030c loci, and the SFC1/YJR030c designer nuclease plasmid (pCUT032), 76% of the cells were GFP positive, and 63% were ADE2 positive. This result is notable in that it demonstrates an unexpectedly significant improvement in integration efficiency when multiple sites are targeted by designer endonucleases.
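The size of the improvement reported in paragraph [0239] can be made explicit by comparing, for each locus, the integration rate with the dual-nuclease plasmid (pCUT032) against the rate with the corresponding single-nuclease plasmid. The sketch below is simple arithmetic on the reported percentages; the dictionary keys and function name are hypothetical, and no statistical test is implied.

```python
# Percent of cells positive, as reported in paragraph [0239].
SINGLE_NUCLEASE = {"GFP at SFC1": 50, "ADE2 at YJR030c": 5}
DUAL_NUCLEASE = {"GFP at SFC1": 76, "ADE2 at YJR030c": 63}


def fold_change(single, dual):
    """Per-locus ratio of dual-nuclease rate to single-nuclease rate."""
    return {locus: dual[locus] / single[locus] for locus in single}


if __name__ == "__main__":
    for locus, ratio in fold_change(SINGLE_NUCLEASE, DUAL_NUCLEASE).items():
        print(f"{locus}: {ratio:.2f}-fold")
```

On these figures the increase is modest at the SFC1 locus and much larger at the YJR030c locus, where single-nuclease integration was rare.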

[0240] These results demonstrate that induction of multiple targeted double-strand breaks at native loci in the genome of a host cell can facilitate simultaneous, multiple, marker-less integrations of functional gene cassettes.

[0241] All publications, patents and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

Sequence CWU 1

1

661741DNAArtificial SequenceSynthetic Wild-type Em.GFP coding sequence 1atggtgagca agggcgagga gctgttcacc ggggtggtgc ccatcctggt cgagctggac 60ggcgacgtaa acggccacaa gttcagcgtg tccggcgagg gcgagggcga tgccacctac 120ggcaagctga ccctgaagtt catctgcacc accggcaagc tgcccgtgcc ctggcccacc 180ctcgtgacca ccttgaccta cggcgtgcag tgcttcgccc gctaccccga ccacatgaag 240cagcacgact tcttcaagtc cgccatgccc gaaggctacg tccaggagcg caccatcttc 300ttcaaggacg acggcaacta caagacccgc gccgaggtga agttcgaggg cgacaccctg 360gtgaaccgca tcgagctgaa gggcatcgac ttcaaggagg acggcaacat cctggggcac 420aagctggagt acaactacaa cagccacaag gtctatatca ccgccgacaa gcagaagaac 480ggcatcaagg tgaacttcaa gacccgccac aacatcgagg acggcagcgt gcagctcgcc 540gaccactacc agcagaacac ccccatcggc gacggccccg tgctgctgcc cgacaaccac 600tacctgagca cccagtccgc cctgagcaaa gaccccaacg agaagcgcga tcacatggtc 660ctgctggagt tcgtgaccgc cgccgggatc actctcggca tggacgagct gtacaagtaa 720ctcgagaagc ttgatccggc t 741213DNAArtificial SequenceSynthetic EmGFP linker encoding premature stop codon 2cgtctaaatc atg 133500DNAArtificial SequenceSynthetic HO upstream integration sequence 3cgcaagtcct gtttctatgc ctttctctta gtaattcacg aaataaacct atggtttacg 60aaatgatcca cgaaaatcat gttattattt acatcaacat atcgcgaaaa ttcatgtcat 120gtccacatta acatcattgc agagcaacaa ttcattttca tagagaaatt tgctactatc 180acccactagt actaccattg gtacctacta ctttgaattg tactaccgct gggcgttatt 240aggtgtgaaa ccacgaaaag ttcaccataa cttcgaataa agtcgcggaa aaaagtaaac 300agctattgct actcaaatga ggtttgcaga agcttgttga agcatgatga agcgttctaa 360acgcactatt catcattaaa tatttaaagc tcataaaatt gtattcaatt cctattctaa 420atggctttta tttctattac aactattagc tctaaatcca tatcctcata agcagcaatc 480aattctatct atactttaaa 5004200DNAArtificial SequenceSynthetic HO downstream integration sequence 4aatgtgtata ttagtttaaa aagttgtatg taataaaagt aaaatttaat attttggatg 60aaaaaaacca tttttagact ttttcttaac tagaatgctg gagtagaaat acgccatctc 120aagatacaaa aagcgttacc ggcactgatt tgtttcaacc agtatataga ttattattgg 180gtcttgatca actttcctca 2005490DNAArtificial 
SequenceSynthetic YGR250c upstream integration sequence 5gtacgatgtt tctcccgctg atccgattac tagccgaaga cgtaaaattg gcgcttgatt 60caatttatgc ccttcccggg aatagttgac caaagggcaa aaaaattcag tcggagattc 120cctattgggc ggaatttagt agatctcttt ccgtgcataa cgcctgcccg ttagtcgtta 180tttcacgtta acattttctt ggccactgcg ctatataaat aaatacatat atatatgtca 240agcacaataa agaaacttcc cttaaatatt gaataagtaa ataatagttg aaaagtgcct 300tttgttcgaa ggattagagt gttcttaatt ttagttcgtt caacggtctc aaaaaaagtg 360tgaacaagta aagcatagca cacatcccaa attacaaggc accctgatta aaaatccaaa 420aataaaccat aagttttatt ttactaaaaa cattatacgt gaaagacaaa ccgcatcaga 480agtttcgagg 4906185DNAArtificial SequenceSynthetic YGR250c downstream integration sequence 6attgcatcag gtccataaaa tgtttttgtc tgcttttttt tcttcatgta ttagttggtt 60tttattttta tattttcatt tatcttattc atacttttta ctcctttttt cttcattctt 120tacgatcttg gacattcaac tagcctatgg taacttttct tattactttg cccctccttg 180aggtg 1857504DNAArtificial SequenceSynthetic NDT80 upstream integration sequence 7catcaagcgc tccaagctga cataaatcgc actttgtatc tacttttttt tattcgaaaa 60caaggcacaa caatgaatct atcgccctgt gagattttca atctcaagtt tgtgtaatag 120atagcgttat attatagaac tataaaggtc cttgaatata catagtgttt cattcctatt 180actgtatatg tgactttaca ttgttacttc cgcggctatt tgacgttttc tgcttcaggt 240gcggcttgga gggcaaagtg tcagaaaatc ggccaggccg tatgacacaa aagagtagaa 300aacgagatct caaatatctc gaggcctgtc ctctatacaa ccgcccagct ctctgacaaa 360gctccagaac ggttgtcttt tgtttcgaaa agccaaggtc ccttataatt gccctccatt 420ttgtgtcacc tatttaagca aaaaattgaa agtttactaa cctttcatta aagagaaata 480acaatattat aaaaagcgct taaa 5048202DNAArtificial SequenceSynthetic NDT80 downstream integration sequence 8ataaactaat gattttaaat cgttaaaaaa atatgcgaat tctgtggatc gaacacagga 60cctccagata acttgaccga agttttttct tcagtctggc gctctcccaa ctgagctaaa 120tccgcttact atttgttatc agttcccttc atatctacat agaataggtt aagtatttta 180ttagttgcca gaagaactac tg 202943DNAArtificial SequenceSynthetic Recognition sequence for ZFN.gfp 9acaactacaa cagccacaac gtctatatca tggccgacaa gca 
431031DNAArtificial SequenceSynthetic Forward primer specific to Em.GFP 10caactacaac agccacaagg tctatatcac c 311122DNAArtificial SequenceSynthetic Reverse primer for HO locus 11ctctaacgct gttggtagat tg 221239DNAArtificial SequenceSynthetic Reverse primer for NDT80 locus 12accatgtgat aatacactac taatgtgact actagttga 391330DNAArtificial SequenceSynthetic Reverse primer for YGR250c locus 13tcagacgcgt tcggaggaga gtgcattcac 30145251DNAArtificial SequenceSynthetic Integration construct i8 14ttgcctatgc tttgtttgct ttgaacactt gtttccgctc tccttttact tattggctac 60taaaactacg tgtaaaagat cgcccagcgc aaaaaggtcc ggcggtttca aataatctcg 120aactattcct ataatatgca aaatagtagg taggaacaag tcgactctag gcagataagg 180aagatgtccg gtaaatggag actagtgctg accgggatag gcaatccaga gcctcagtac 240gctggtaccc gtcacaatgt agggctatat atgctggagc tgctacgaaa gcggcttggt 300ctgcagggga gaacttattc ccctgtgcct aatacgggcg gcaaagtgca ttatatagaa 360gacgaacatt gtacgatact aagatcggat ggccagtaca tgaatctaag tggagaacag 420gtgtgcaagg tctgggcccg gtacgccaag taccaagccc gacacgtagt tattcatgac 480gagttaagtg tggcgtgtgg aaaagtgcag ctcagagccc ccagcaccag tattagaggt 540cataatgggc tgcgaagcct gctaaaatgc agtggaggcc gtgtaccctt tgccaaattg 600gctattggaa tcggcagaga acctgggtcc cgttctagag accctgcgag cgtgtcccgg 660tgggttctgg gagctctaac tccgcaggaa ctacaaacct tgcttacaca gagtgaacct 720gctgcctggc gtgctctgac tcagtacatt tcatagtgga tggcggcgtt agtatcgaat 780cgacagcagt atagcgacca gcattcacat acgattgacg catgatatta ctttctgcgc 840acttaacttc gcatctgggc agatgatgtc gaggcgaaaa aaaatataaa tcacgctaac 900atttgattaa aatagaacaa ctacaatata aaaaaactat acaaatgaca agttcttgaa 960aacaagaatc tttttattgt cagtactgat tagaaaaact catcgagcat caaatgaaac 1020tgcaatttat tcatatcagg attatcaata ccatattttt gaaaaagccg tttctgtaat 1080gaaggagaaa actcaccgag gcagttccat aggatggcaa gatcctggta tcggtctgcg 1140attccgactc gtccaacatc aatacaacct attaatttcc cctcgtcaaa aataaggtta 1200tcaagtgaga aatcaccatg agtgacgact gaatccggtg agaatggcaa aagcttatgc 1260atttctttcc agacttgttc aacaggccag ccattacgct cgtcatcaaa 
atcactcgca 1320tcaaccaaac cgttattcat tcgtgattgc gcctgagcga gacgaaatac gcgatcgctg 1380ttaaaaggac aattacaaac aggaatcgaa tgcaaccggc gcaggaacac tgccagcgca 1440tcaacaatat tttcacctga atcaggatat tcttctaata cctggaatgc tgttttgccg 1500gggatcgcag tggtgagtaa ccatgcatca tcaggagtac ggataaaatg cttgatggtc 1560ggaagaggca taaattccgt cagccagttt agtctgacca tctcatctgt aacatcattg 1620gcaacgctac ctttgccatg tttcagaaac aactctggcg catcgggctt cccatacaat 1680cgatagattg tcgcacctga ttgcccgaca ttatcgcgag cccatttata cccatataaa 1740tcagcatcca tgttggaatt taatcgcggc ctcgaaacgt gagtcttttc cttacccatg 1800gttgtttatg ttcggatgtg atgtgagaac tgtatcctag caagatttta aaaggaagta 1860tatgaaagaa gaacctcagt ggcaaatcct aaccttttat atttctctac aggggcgcgg 1920cgtggggaca attcaacgcg tctgtgaggg gagcgtttcc ctgctcgcag gtctgcagcg 1980aggagccgta atttttgctt cgcgccgtgc ggccatcaaa atgtatggat gcaaatgatt 2040atacatgggg atgtatgggc taaatgtacg ggcgacagtc acatcatgcc cctgagctgc 2100gcacgtcaag actgtcaagg agggtattct gggcctccat gtcgctggcc gggtgacccg 2160gcggggacga ggcaagctaa acagatctga tcttgaaact gagtaagatg ctcagaatac 2220ccgtcaagat aagagtataa tgtagagtaa tataccaagt attcagcata ttctcctctt 2280cttttgtata aatcacggaa gggatgattt ataagaaaaa tgaatactat tacacttcat 2340ttaccaccct ctgatctaga ttttccaacg atatgtacgt agtggtataa ggtgaggggg 2400tccacagata taacatcgtt taatttagta ctaacagaga cttttgtcac aactacatat 2460aagtgtacaa atatagtaca gatatgacac acttgtagcg ccaacgcgca tcctacggat 2520tgctgacaga aaaaaaggtc acgtgaccag aaaagtcacg tgtaattttg taactcaccg 2580cattctagcg gtccctgtcg tgcacactgc actcaacacc ataaacctta gcaacctcca 2640aaggaaatca ccgtataaca aagccacagt tttacaactt agtctcttat gaagttactt 2700accaatgaga aatagaggct ctttctcgag aaatatgaat atggatatat atatatatat 2760atatatatat atatatatat gtaaacttgg ttctttttta gcttgtgatc tctagcttgg 2820gtctctctct gtcgtaacag ttgtgatatc ggaagaagag aaaagacgaa gagcagaagc 2880ggaaaacgta tacacgtcac atatcacaca cacacaatgg gaaagctatt acaattggca 2940ttgcatccgg tcgagatgaa ggcagctttg aagctgaagt tttgcagaac accgctattc 3000tccatctatg atcagtccac 
gtctccatat ctcttgcact gtttcgaact gttgaacttg 3060acctccagat cgtttgctgc tgtgatcaga gagctgcatc cagaattgag aaactgtgtt 3120actctctttt atttgatttt aagggctttg gataccatcg aagacgatat gtccatcgaa 3180cacgatttga aaattgactt gttgcgtcac ttccacgaga aattgttgtt aactaaatgg 3240agtttcgacg gaaatgcccc cgatgtgaag gacagagccg ttttgacaga tttcgaatcg 3300attcttattg aattccacaa attgaaacca gaatatcaag aagtcatcaa ggagatcacc 3360gagaaaatgg gtaatggtat ggccgactac atcttagatg aaaattacaa cttgaatggg 3420ttgcaaaccg tccacgacta cgacgtgtac tgtcactacg tagctggttt ggtcggtgat 3480ggtttgaccc gtttgattgt cattgccaag tttgccaacg aatctttgta ttctaatgag 3540caattgtatg aaagcatggg tcttttccta caaaaaacca acatcatcag agattacaat 3600gaagatttgg tcgatggtag atccttctgg cccaaggaaa tctggtcaca atacgctcct 3660cagttgaagg acttcatgaa acctgaaaac gaacaactgg ggttggactg tataaaccac 3720ctcgtcttaa acgcattgag tcatgttatc gatgtgttga cttatttggc cggtatccac 3780gagcaatcca ctttccaatt ttgtgccatt ccccaagtta tggccattgc aaccttggct 3840ttggtattca acaaccgtga agtgctacat ggcaatgtaa agattcgtaa gggtactacc 3900tgctatttaa ttttgaaatc aaggactttg cgtggctgtg tcgagatttt tgactattac 3960ttacgtgata tcaaatctaa attggctgtg caagatccaa atttcttaaa attgaacatt 4020caaatctcca agatcgaaca gtttatggaa gaaatgtacc aggataaatt acctcctaac 4080gtgaagccaa atgaaactcc aattttcttg aaagttaaag aaagatccag atacgatgat 4140gaattggttc caacccaaca agaagaagag tacaagttca atatggtttt atctatcatc 4200ttgtccgttc ttcttgggtt ttattatata tacactttac acagagcgtg aagtctgcgc 4260caaataacat aaacaaacaa ctccgaacaa taactaagta cttacataat aggtagaggc 4320ctatccttaa agataacctt atatttcatt acatcaacta attcgacctt attatctttc 4380gaattgaaat gcattatacc catcggtacg tctagctttg tcaccttccc cagtaaacgc 4440tgtttcttgc cgacaaacaa tgtggccctc tctccgtcaa tctgtaacga cccaaatcgt 4500attaaagttt cgccgtcctg ttcactgaac cttccctcat ttggagaatc tctcctcgcc 4560agcgacgcaa agtccttagg caactctagt tcaccttgaa tctccagcat catcatccca 4620agcggtgtta tcaccgtggt ctgcttttct cttgactgtg tcaacttctg ccattgacta 4680gcatctatat ctacactagg cattcttttc agctgtttat tgggctgaat 
gatagtgata 4740attctttttt ctatcactcc tttggctata ttagtggtta gcttactaaa aaagattaaa 4800ggaaaaatga aattcaagat gctaacgttg acatgtatat tttaagaaaa caaaaatcat 4860acaaagagga gatcggatat aaaagaataa cataaatatg tttagtgcat taggtaaatg 4920ggtccgaggc tctcgcaatg ataaggactt tgtgacgaag tataccgcag atttatcaca 4980aataacttca cagatccatc aattagatgt cgcgttaaag aaaagccaat ccatcttgag 5040tcaatggcaa tcaaatctga ccttttatgg tattgcgtta acggtattgg ccctgagcta 5100cacatattgg gagtaccatg gttatcgacc ataccttgtg gtgactgcgc tactatgcat 5160aggctcgcta atcttgttca aatgggcatt aaccaaactc tatgcatttt ataacaacaa 5220taggttacgc aagttggcaa aactccgtgc a 52511570DNAArtificial SequenceSynthetic Primer 61-67-CPK066-G 15ggtaagacgg ttgggtttta tcttttgcag ttggtactat taagaacaat cacaggaaac 60agctatgacc 701670DNAArtificial SequenceSynthetic Primer 61-67-CPK067-G 16ttgcgttttg tactttggtt cgctcaattt tgcaggtaga taatcgaaaa gttgtaaaac 60gacggccagt 70174162DNAArtificial SequenceSynthetic Integration construct i32 17gcctgtctac aggataaaga cgggtcggat acctgcacaa gcaatttggc acctgcatac 60cccatttccc cagtagataa cttcaacaca cacatcaatg tccctcacca gtttatttcc 120aaaagagacg ctttttacta cctgactaga ttttcatttt gtttcttttg gattgcgctt 180gcctttgtag gtgtgtcgtt tatcctttac gttttgactt ggtgctcgaa gatgctttca 240gagatggtgc ttatcctcat gtcttttggg tttgtcttca atacggcagc cgttgtcttg 300caaacggccg cctctgccat ggcaaagaat gctttccatg acgatcatcg tagtgcccaa 360ttgggtgcct ctatgatggg tatggcttgg gcaagtgtct ttttatgtat cgtggaattt 420atcctgctgg tcttctggtc tgttagggca aggttggcct ctacttactc catcgacaat 480tcaagataca gaacctcctc cagatggaat cccttccata gagagaagga gcaagcaact 540gacccaatat tgactgccac tggacctgaa gacatgcaac aaagtgcaag catagtgggg 600ccttcttcca atgctaatcc ggtcactgcc actgctgcta cggaaaacca acctaaaggt 660attaacttct tcactataag aaaatcacac gagcgcccgg acgatgtctc tgtttaaatg 720gcgcaagttt tccgctttgt aatatatatt tatacccctt tcttctctcc cctgcaatat 780aatagtttaa ttctaatatt aataatatcc tatattttct tcatttaccg gcgcactctc 840gcccgaacga cctcaaaatg tctgctacat tcataataac caaaagctca taactttttt 
900ttttgaacct gaatatatat acatcacata tcactgctgg tccttgccga ccagcgtata 960caatctcgat agttggtttc ccgttctttc cactcccgtc cacaggaaac agctatgacc 1020atgattacgc caagctattt aggtgacact atagaatact caagctatgc atcaagcttg 1080gtaccgagct cggatccact agtaacggcc gccagtgtgc tggaattcgc cctgtcgaca 1140ctagtaatac acatcatcgt cctacaagtt catcaaagtg ttggacagac aactatacca 1200gcatggatct cttgtatcgg ttcttttctc ccgctctctc gcaataacaa tgaacactgg 1260gtcaatcata gcctacacag gtgaacagag tagcgtttat acagggttta tacggtgatt 1320cctacggcaa aaatttttca tttctaaaaa aaaaaagaaa aatttttctt tccaacgcta 1380gaaggaaaag aaaaatctaa ttaaattgat ttggtgattt tctgagagtt ccctttttca 1440tatatcgaat tttgaatata aaaggagatc gaaaaaattt ttctattcaa tctgttttct 1500ggttttattt gatagttttt ttgtgtatta ttattatgga ttagtactgg tttatatggg 1560tttttctgta taacttcttt ttattttagt ttgtttaatc ttattttgag ttacattata 1620gttccctaac tgcaagagaa gtaacattaa aaatgaaaaa gcctgaactc accgcgacgt 1680ctgtcgagaa gtttctgatc gaaaagttcg acagcgtctc cgacctgatg cagctctcgg 1740agggcgaaga atctcgtgct ttcagcttcg atgtaggagg gcgtggatat gtcctgcggg 1800taaatagctg cgccgatggt ttctacaaag atcgttatgt ttatcggcac tttgcatcgg 1860ccgcgctccc gattccggaa gtgcttgaca ttggggaatt cagcgagagc ctgacctatt 1920gcatctcccg ccgtgcacag ggtgtcacgt tgcaagacct gcctgaaacc gaactgcccg 1980ctgttctgca gccggtcgcg gaggccatgg atgcgatcgc tgcggccgat cttagccaga 2040cgagcgggtt cggcccattc ggaccgcaag gaatcggtca atacactaca tggcgtgatt 2100tcatatgcgc gattgctgat ccccatgtgt atcactggca aactgtgatg gacgacaccg 2160tcagtgcgtc cgtcgcgcag gctctcgatg agctgatgct ttgggccgag gactgccccg 2220aagtccggca cctcgtgcac gcggatttcg gctccaacaa tgtcctgacg gacaatggcc 2280gcataacagc ggtcattgac tggagcgagg cgatgttcgg ggattcccaa tacgaggtcg 2340ccaacatctt cttctggagg ccgtggttgg cttgtatgga gcagcagacg cgctacttcg 2400agcggaggca tccggagctt gcaggatcgc cgcggctccg ggcgtatatg ctccgcattg 2460gtcttgacca actctatcag agcttggttg acggcaattt cgatgatgca gcttgggcgc 2520agggtcgatg cgacgcaatc gtccgatccg gagccgggac tgtcgggcgt acacaaatcg 2580cccgcagaag cgcggccgtc tggaccgatg 
gctgtgtaga agtactcgcc gatagtggaa 2640accgacgccc cagcactcgt ccgagggcaa aggaataggt ttaacttgat actactagat 2700tttttctctt catttataaa atttttggtt ataattgaag ctttagaagt atgaaaaaat 2760cctttttttt cattctttgc aaccaaaata agaagcttct tttattcatt gaaatgatga 2820atataaacct aacaaaagaa aaagactcga atatcaaaca ttaaaaaaaa ataaaagagg 2880ttatctgttt tcccatttag ttggagtttg cattttctaa tagatagaac tctcaattaa 2940tgtggattta gtttctctgt tcgttttttt ttgttttgtt ctcactgtat ttacatttct 3000atttagtatt tagttattca tataatctta acttctcgag gagctctaag ggcgaattct 3060gcagatatcc atcacactgg cggccgctcg agcatgcatc tagagggccc aattcgccct 3120atagtgagtc gtattacaat tcactggccg tcgttttaca acaagcatct tgccctgtgc 3180ttggccccca gtgcagcgaa cgttataaaa acgaatactg agtatatatc tatgtaaaac 3240aaccatatca tttcttgttc tgaactttgt ttacctaact agttttaaat ttcccttttt 3300cgtgcatgcg ggtgttctta tttattagca tactacattt gaaatatcaa atttccttag 3360tagaaaagtg agagaaggtg cactgacaca aaaaataaaa tgctacgtat aactgtcaaa 3420actttgcagc agcgggcatc cttccatcat agcttcaaac atattagcgt tcctgatctt 3480catacccgtg ctcaaaatga tcaaacaaac tgttattgcc aagaaataaa cgcaaggctg 3540ccttcaaaaa ctgatccatt agatcctcat atcaagcttc ctcatagaac gcccaattac 3600aataagcatg ttttgctgtt atcaccgggt gataggtttg ctcaaccatg gaaggtagca 3660tggaatcata atttggatac taatacaaat cggccatata atgccattag taaattgcgc 3720tcccatttag gtggttctcc aggaatacta ataaatgcgg tgcatttgca aaatgaattt 3780attccaaggc caaaacaaca cgatgaatgg ctttattttt ttgttattcc tgacatgaag 3840ctttatgtaa ttaaggaaac ggacatcgag gaatttgcat cttttttaga tgaaggagct 3900attcaagcac caaagctatc cttccaggat tatttaagcg gtaaggccaa ggcttcccaa 3960caggttcatg aagtgcatca tagaaagctt acaaggtttc agggtgaaac ttttctaaga 4020gattggaact tagtctgtgg gcattataag agagatgcta agtgtggaga aatgggaccc 4080gacataattg cagcatttca agatgaaaag ctttttcctg agaataatct agccttaatt 4140tctcatattg ggggtcatat tt 4162187879DNAArtificial SequenceSynthetic Integration construct i33misc_feature(270)..(270)n is a, c, g, or t 18atgaattggc cagttttttc caattatgga acgcctgttc ctgatccacg gcctgcactt 
60gcgaccacaa ttccacacct gaggcgcctg cctcttttcc agcatgtggc aactgtcccc 120acgacagggc atcccagaat cctctggtaa atcttaaatg aaactgacgc gtggcagtag 180attccaacaa tggtgggatg gcccgtggga aagtcgtgta gtgctcatac gcatcatatg 240acatggatga tacggccggg tcaaacggtn cgattgcagt tggaatgcaa atgagagtag 300cagatcattg ttgggcagcg gcttcaacac cagtgcttcg tcgtacggat accataaact 360gtcatttata ccaatctgcg acaccgtgtc ttctgcgaac acacccagca gtagagtgcc 420cagcatgaaa taggccagtg tgaggatcat cgtcgtcttg cctatgcttt gtttgctttg 480aacacttgtt tccgctctcc ttttacttat tggctactaa aactacgtgt aaaagatcgc 540ccagcgcaaa aaggtccggc ggtttcaaat aatctcgaac tattcctata atatgcaaaa 600tagtaggtag gaacaagtca actctaggca gataacgaag atgtccggta aatggagact 660agtgctgact gggataggca

atccagagcc tcagtacgct ggcacccgtc acaatgtagg 720gctatatatg ctggagctgc tacgaaagcg gcttggtctg caggggagaa cttattcccc 780tgtgcctaat acgggcggca aagtgcatta tatagaagac gaacattgta cgatactaag 840atcggatggc cagtacatga atctaagtgg agaacaggtg tgcaaggtct gggcccggta 900cgccaagtac caagcccgac acgttgttat tcatgacgag ttaagtgtgg cgtgtggaaa 960agtgcagctc agagccccca gcaccagtat tagaggtcat aatgggctgc gaagcctgct 1020aaaatgcagt ggaggccgtg taccctttgc caaattggct attggaatcg gcagagaacc 1080tgggtcccgt tctagagacc ctgcgagcgt gtcccggtgg gttctgggag ctctaactcc 1140gcaggaacta caaaccttgc ttacacagag tgaacctgct gcctggcgtg ctctgactca 1200gtacatttca taggacagca ttcgcccagt atttttttta ttctacaaac cttctataat 1260ttcaaagtat ttacataatt ctgtatcagt ttaatcacca taatatcgtt ttctttgttt 1320agtgcaatta atttttccta ttgttacttc gggccttttt ctgttttatg agctattttt 1380tccgtcatcc ttccggatcc agattttcag cttcatctcc agattgtgtc tacgtaatgc 1440acgccatcat tttaagagag gacagagaag caagcctcct gaaagatgaa gctactgtct 1500tctatcgaac aagcatgcga tatttgccga cttaaaaagc tcaagtgctc caaagaaaaa 1560ccgaagtgcg ccaagtgtct gaagaacaac tgggagtgtc gctactctcc caaaaccaaa 1620aggtctccgc tgactagggc acatctgaca gaagtggaat caaggctaga aagactggaa 1680cagctatttc tactgatttt tcctcgagaa gaccttgaca tgattttgaa aatggattct 1740ttacaggata taaaagcatt gttaacagga ttatttgtac aagataatgt gaataaagat 1800gccgtcacag atagattggc ttcagtggag actgatatgc ctctaacatt gagacagcat 1860agaataagtg cgacatcatc atcggaagag agtagtaaca aaggtcaaag acagttgact 1920gtatcgattg actcggcagc tcatcatgat aactccacaa ttccgttgga ttttatgccc 1980agggatgctc ttcatggatt tgattggtct gaagaggatg acatgtcgga tggcttgccc 2040ttcctgaaaa cggaccccaa caataatggg ttctttggcg acggttctct cttatgtatt 2100cttcgatcta ttggctttaa accggaaaat tacacgaact ctaacgttaa caggctcccg 2160accatgatta cggatagata cacgttggct tctagatcca caacatcccg tttacttcaa 2220agttatctca ataattttca cccctactgc cctatcgtgc actcaccgac gctaatgatg 2280ttgtataata accagattga aatcgcgtcg aaggatcaat ggcaaatcct ttttaactgc 2340atattagcca ttggagcctg gtgtatagag ggggaatcta ctgatataga tgttttttac 
2400tatcaaaatg ctaaatctca tttgacgagc aaggtcttcg agtcaggttc cataattttg 2460gtgacagccc tacatcttct gtcgcgatat acacagtgga ggcagaaaac aaatactagc 2520tataattttc acagcttttc cataagaatg gccatatcat tgggcttgaa tagggacctc 2580ccctcgtcct tcagtgatag cagcattctg gaacaaagac gccgaatttg gtggtctgtc 2640tactcttggg agatccaatt gtccctgctt tatggtcgat ccatccagct ttctcagaat 2700acaatctcct tcccttcttc tgtcgacgat gtgcagcgta ccacaacagg tcccaccata 2760tatcatggca tcattgaaac agcaaggctc ttacaagttt tcacaaaaat ctatgaacta 2820gacaaaacag taactgcaga aaaaagtcct atatgtgcaa aaaaatgctt gatgatttgt 2880aatgagattg aggaggtttc gagacaggca ccaaagtttt tacaaatgga tatttccacc 2940accgctctaa ccaatttgtt gaaggaacac ccttggctat cctttacaag attcgaactg 3000aagtggaaac agttgtctct tatcatttat gtattaagag attttttcac taattttacc 3060cagaaaaagt cacaactaga acaggatcaa aatgatcatc aaagttatga agttaaacga 3120tgctccatca tgttaagcga tgcagcacaa agaactgtta tgtctgtaag tagctatatg 3180gacaatcata atgtcacccc atattttgcc tggaattgtt cttattactt gttcaatgca 3240gtcctagtac ccataaagac tctactctca aactcaaaat cgaatgctga gaataacgag 3300accgcacaat tattacaaca aattaacact gttctgatgc tattaaaaaa actggccact 3360tttaaaatcc agacttgtga aaaatacatt caagtactgg aagaggtatg tgcgccgttt 3420ctgttatcac agtgtgcaat cccattaccg catatcagtt ataacaatag taatggtagc 3480gccattaaaa atattgtcgg ttctgcaact atcgcccaat accctactct tccggaggaa 3540aatgtcaaca atatcagtgt taaatatgtt tctcctggct cagtagggcc ttcacctgtg 3600ccattgaaat caggagcaag tttcagtgat ctagtcaagc tgttatctaa ccgtccaccc 3660tctcgtaact ctccagtgac aataccaaga agcacacctt cgcatcgctc agtcacgcct 3720tttctagggc aacagcaaca gctgcaatca ttagtgccac tgaccccgtc tgctttgttt 3780ggtggcgcca attttaatca aagtgggaat attgctgata gctcattgtc cttcactttc 3840actaacagta gcaacggtcc gaacctcata acaactcaaa caaattctca agcgctttca 3900caaccaattg cctcctctaa cgttcatgat aacttcatga ataatgaaat cacggctagt 3960aaaattgatg atggtaataa ttcaaaacca ctgtcacctg gttggacgga ccaaactgcg 4020tataacgcgt ttggaatcac tacagggatg tttaatacca ctacaatgga tgatgtatat 4080aactatctat tcgatgatga agatacccca 
ccaaacccaa aaaaagagta aaatgaatcg 4140tagatactga aaaaccccgc aagttcactt caactgtgca tcgtgcacca tctcaatttc 4200tttcatttat acatcgtttt gccttctttt atgtaactat actcctctaa gtttcaatct 4260tggccatgta acctctgatc tatagaattt tttaaatgac tagaattaat gcccatcttt 4320tttttggacc taaattcttc atgaaaatat attacgaggg cttattcaga agcttcgctc 4380agtcgacact agtaatacac atcatcgtcc tacaagttca tcaaagtgtt ggacagacaa 4440ctataccagc atggatctct tgtatcggtt cttttctccc gctctctcgc aataacaatg 4500aacactgggt caatcatagc ctacacaggt gaacagagta gcgtttatac agggtttata 4560cggtgattcc tacggcaaaa atttttcatt tctaaaaaaa aaaagaaaaa tttttctttc 4620caacgctaga aggaaaagaa aaatctaatt aaattgattt ggtgattttc tgagagttcc 4680ctttttcata tatcgaattt tgaatataaa aggagatcga aaaaattttt ctattcaatc 4740tgttttctgg ttttatttga tagttttttt gtgtattatt attatggatt agtactggtt 4800tatatgggtt tttctgtata acttcttttt attttagttt gtttaatctt attttgagtt 4860acattatagt tccctaactg caagagaagt aacattaaaa atgaccactc ttgacgacac 4920ggcttaccgg taccgcacca gtgtcccggg ggacgccgag gccatcgagg cactggatgg 4980gtccttcacc accgacaccg tcttccgcgt caccgccacc ggggacggct tcaccctgcg 5040ggaggtgccg gtggacccgc ccctgaccaa ggtgttcccc gacgacgaat cggacgacga 5100atcggacgcc ggggaggacg gcgacccgga ctcccggacg ttcgtcgcgt acggggacga 5160cggcgacctg gcgggcttcg tggtcgtctc gtactccggc tggaaccgcc ggctgaccgt 5220cgaggacatc gaggtcgccc cggagcaccg ggggcacggg gtcgggcgcg cgttgatggg 5280gctcgcgacg gagttcgccc gcgagcgggg cgccgggcac ctctggctgg aggtcaccaa 5340cgtcaacgca ccggcgatcc acgcgtaccg gcggatgggg ttcaccctct gcggcctgga 5400caccgccctg tacgacggca ccgcctcgga cggcgagcag gcgctctaca tgagcatgcc 5460ctgcccctga gtttaacttg atactactag attttttctc ttcatttata aaatttttgg 5520ttataattga agctttagaa gtatgaaaaa atcctttttt ttcattcttt gcaaccaaaa 5580taagaagctt cttttattca ttgaaatgat gaatataaac ctaacaaaag aaaaagactc 5640gaatatcaaa cattaaaaaa aaataaaaga ggttatctgt tttcccattt agttggagtt 5700tgcattttct aatagataga actctcaatt aatgtggatt tagtttctct gttcgttttt 5760ttttgttttg ttctcactgt atttacattt ctatttagta tttagttatt catataatct 
5820taacttctcg aggagctcga tcttgaaact gagtaagatg ctcagaatac ccgtcaagat 5880aagagtataa tgtagagtaa tataccaagt attcagcata ttctcctctt cttttgtata 5940aatcacggaa gggatgattt ataagaaaaa tgaatactat tacacttcat ttaccaccct 6000ctgatctaga ttttccaacg atatgtacgt agtggtataa ggtgaggggg tccacagata 6060taacatcgtt taatttagta ctaacagaga cttttgtcac aactacatat aagtgtacaa 6120atatagtaca gatatgacac acttgtagcg ccaacgcgca tcctacggat tgctgacaga 6180aaaaaaggtc acgtgaccag aaaagtcacg tgtaattttg taactcaccg cattctagcg 6240gtccctgtcg tgcacactgc actcaacacc ataaacctta gcaacctcca aaggaaatca 6300ccgtataaca aagccacagt tttacaactt agtctcttat gaagtgtctc tctctgtcgt 6360aacagttgtg atatcggaag aagagaaaag acgaagagca gaagcggaaa acgtatacac 6420gtcacatatc acacacacac aatgggaaag ctattacaat tggcattgca tccggtcgag 6480atgaaggcag ctttgaagct gaagttttgc agaacaccgc tattctccat ctatgatcag 6540tccacgtctc catatctctt gcactgtttc gaactgttga acttgacctc cagatcgttt 6600gctgctgtga tcagagagct gcatccagaa ttgagaaact gtgttactct cttttatttg 6660attttaaggg ctttggatac catcgaagac gatatgtcca tcgaacacga tttgaaaatt 6720gacttgttgc gtcacttcca cgagaaattg ttgttaacta aatggagttt cgacggaaat 6780gcccccgatg tgaaggacag agccgttttg acagatttcg aatcgattct tattgaattc 6840cacaaattga aaccagaata tcaagaagtc atcaaggaga tcaccgagaa aatgggtaat 6900ggtatggccg actacatctt ggatgaaaat tacaacttga atgggttgca aaccgtccac 6960gactacgacg tgtactgtca ctacgtagct ggtttggtcg gtgatggttt gacccgtttg 7020attgtcattg ccaagtttgc caacgaatct ttgtattcta atgagcaatt gtatgaaagc 7080atgggtcttt tcctacaaaa aaccaacatc atcagagact acaatgaaga tttggtcgat 7140ggtagatcct tctggcccaa ggaaatctgg tcacaatacg ctcctcagtt gaaggacttc 7200atgaaacctg aaaacgaaca actggggttg gactgtataa accacctcgt cttaaacgca 7260ttgagtcatg ttatcgatgt gttgacttat ttggccagta tccacgagca atccactttc 7320caattttgtg ccattcccca agttatggcc attgcaacct tggctttggt attcaacaac 7380cgtgaagtgc tacatggcaa tgtaaagatt cgtaagggta ctacctgcta tttaattttg 7440aaatcaagga ctttgcgtgg ctgtgtcgag atttttgact attacttacg tgatatcaaa 7500tctaaattgg ctgtgcaaga tccaaatttc 
ttaaaattga acattcaaat ctccaagatc 7560gaacaattca tggaagaaat gtaccaggat aaattacctc ctaacgtgaa gccaaatgaa 7620actccaattt tcttgaaagt taaagaaaga tccagatacg atgatgaatt ggtcccaacc 7680caacaagaag aagagtacaa gttcaatatg gttttatcta tcatcttgtc cgttcttctt 7740gggttttatt atatatacac tttacacaga gcgtgaagtc tgcgccaaat aacataaaca 7800aacaactccg aacaataact aagtacttac ataataggta gaggcctatc cttaaagata 7860accttatatt tcattacat 7879195714DNAArtificial SequenceSynthetic Integration construct i37 19gcctgtctac aggataaaga cgggtcggat acctgcacaa gcaatttggc acctgcatac 60cccatttccc cagtagataa cttcaacaca cacatcaatg tccctcacca gtttatttcc 120aaaagagacg ctttttacta cctgactaga ttttcatttt gtttcttttg gattgcgctt 180gcctttgtag gtgtgtcgtt tatcctttac gttttgactt ggtgctcgaa gatgctttca 240gagatggtgc ttatcctcat gtcttttggg tttgtcttca atacggcagc cgttgtcttg 300caaacggccg cctctgccat ggcaaagaat gctttccatg acgatcatcg tagtgcccaa 360ttgggtgcct ctatgatggg tttaaacgta tggcttgggc aagtgtcttt ttatgtatcg 420tggaatttat cctgctggtc ttctggtctg ttagggcaag gttggcctct acttactcca 480tcgacaattc aagatacaga acctcctcca gatggaatcc cttccataga gagaaggagc 540aagcaactga cccaatattg actgccactg gacctgaaga catgcaacaa agtgcaagca 600tagtggggcc ttcttccaat gctaatccgg tcactgccac tgctgctacg gaaaaccaac 660ctaaaggtat taacttcttc actataagaa aatcacacga gcgcccggac gatgtctctg 720tttaaatggc gcaagttttc cgctttgtaa tatatattta tacccctttc ttctctcccc 780tgcaatataa tagtttaatt ctaatattaa taatatccta tattttcttc atttaccggc 840gcactctcgc ccgaacgacc tcaaaatgtc tgctacattc ataataacca aaagctcata 900actttttttt ttgaacctga atatatatac atcacatatc actgctggtc ctgagaagtt 960aagattatat gaataactaa atactaaata gaaatgtaaa tacagtgaga acaaaacaaa 1020aaaaaacgaa cagagaaact aaatccacat taattgagag ttctatctat tagaaaatgc 1080aaactccaac taaatgggaa aacagataac ctcttttatt tttttttaat gtttgatatt 1140cgagtctttt tcttttgtta ggtttatatt catcatttca atgaataaaa gaagcttctt 1200attttggttg caaagaatga aaaaaaagga ttttttcata cttctaaagc ttcaattata 1260accaaaaatt ttataaatga agagaaaaaa tctagtagta tcaagttaaa cttagaaaaa 
1320ctcatcgagc atcaaatgaa actgcaattt attcatatca ggattatcaa taccatattt 1380ttgaaaaagc cgtttctgta atgaaggaga aaactcaccg aggcagttcc ataggatggc 1440aagatcctgg tatcggtctg cgattccgac tcgtccaaca tcaatacaac ctattaattt 1500cccctcgtca aaaataaggt tatcaagtga gaaatcacca tgagtgacga ctgaatccgg 1560tgagaatggc aaaagcttat gcatttcttt ccagacttgt tcaacaggcc agccattacg 1620ctcgtcatca aaatcactcg catcaaccaa accgttattc attcgtgatt gcgcctgagc 1680gagacgaaat acgcgatcgc tgttaaaagg acaattacaa acaggaatcg aatgcaaccg 1740gcgcaggaac actgccagcg catcaacaat attttcacct gaatcaggat attcttctaa 1800tacctggaat gctgttttgc cggggatcgc agtggtgagt aaccatgcat catcaggagt 1860acggataaaa tgcttgatgg tcggaagagg cataaattcc gtcagccagt ttagtctgac 1920catctcatct gtaacatcat tggcaacgct acctttgcca tgtttcagaa acaactctgg 1980cgcatcgggc ttcccataca atcgatagat tgtcgcacct gattgcccga cattatcgcg 2040agcccattta tacccatata aatcagcatc catgttggaa tttaatcgcg gcctcgaaac 2100gtgagtcttt tccttaccca tttttaatgt tacttctctt gcagttaggg aactataatg 2160taactcaaaa taagattaaa caaactaaaa taaaaagaag ttatacagaa aaacccatat 2220aaaccagtac taatccataa taataataca caaaaaaact atcaaataaa accagaaaac 2280agattgaata gaaaaatttt ttcgatctcc ttttatattc aaaattcgat atatgaaaaa 2340gggaactctc agaaaatcac caaatcaatt taattagatt tttcttttcc ttctagcgtt 2400ggaaagaaaa atttttcttt ttttttttag aaatgaaaaa tttttgccgt aggaatcacc 2460gtataaaccc tgtataaacg ctactctgtt cacctgtgta ggctatgatt gacccagtgt 2520tcattgttat tgcgagagag cgggagaaaa gaaccgatac aagagatcca tgctggtata 2580gttgtctgtc caacactttg atgaacttgt aggacgatga tgtgtattac tagtgtcgac 2640accatataca tatccatatc taatcttact tatatgttgt ggaaatgtaa agagccccat 2700tatcttagcc taaaaaaacc ttctctttgg aactttcagt aatacgctta actgctcatt 2760gctatattga agtacggatt agaagccgcc gagcgggcga cagccctccg acggaagact 2820ctcctccgtg cgtcctggtc ttcaccggtc gcgttcctga aacgcagatg tgcctcgcgc 2880cgcactgctc cgaacaataa agattctaca atactagctt ttatggttat gaagaggaaa 2940aattggcagt aacctggccc cacaaacctt caaatcaacg aatcaaatta acaaccatag 3000gataataatg cgattagttt tttagcctta 
tttctggggt aattaatcag cgaagcgatg 3060atttttgatc tattaacaga tatataaatg caaaagctgc ataaccactt taactaatac 3120tttcaacatt ttcggtttgt attacttctt attcaaatgt cataaaagta tcaacaaaaa 3180attgttaata tacctctata ctttaacgtc aaggagaaaa aactataatg tcattaccgt 3240tcttaacttc tgcaccggga aaggttatta tttttggtga acactctgct gtgtacaaca 3300agcctgccgt cgctgctagt gtgtctgcgt tgagaaccta cctgctaata agcgagtcat 3360ctgcaccaga tactattgaa ttggacttcc cggacattag ctttaatcat aagtggtcca 3420tcaatgattt caatgccatc accgaggatc aagtaaactc ccaaaaattg gccaaggctc 3480aacaagccac cgatggcttg tctcaggaac tcgttagtct tttggatccg ttgttagctc 3540aactatccga atccttccac taccatgcag cgttttgttt cctgtatatg tttgtttgcc 3600tatgccccca tgccaagaat attaagtttt ctttaaagtc tactttaccc atcggtgctg 3660ggttgggctc aagcgcctct atttctgtat cactggcctt agctatggcc tacttggggg 3720ggttaatagg atctaatgac ttggaaaagc tgtcagaaaa cgataagcat atagtgaatc 3780aatgggcctt cataggtgaa aagtgtattc acggtacccc ttcaggaata gataacgctg 3840tggccactta tggtaatgcc ctgctatttg aaaaagactc acataatgga acaataaaca 3900caaacaattt taagttctta gatgatttcc cagccattcc aatgatccta acctatacta 3960gaattccaag gtctacaaaa gatcttgttg ctcgcgttcg tgtgttggtc accgagaaat 4020ttcctgaagt tatgaagcca attctagatg ccatgggtga atgtgcccta caaggcttag 4080agatcatgac taagttaagt aaatgtaaag gcaccgatga cgaggctgta gaaactaata 4140atgaactgta tgaacaacta ttggaattga taagaataaa tcatggactg cttgtctcaa 4200tcggtgtttc tcatcctgga ttagaactta ttaaaaatct gagcgatgat ttgagaattg 4260gctccacaaa acttaccggt gctggtggcg gcggttgctc tttgactttg ttacgaagag 4320acattactca agagcaaatt gacagtttca aaaagaaatt gcaagatgat tttagttacg 4380agacatttga aacagacttg ggtgggactg gctgctgttt gttaagcgca aaaaatttga 4440ataaagatct taaaatcaaa tccctagtat tccaattatt tgaaaataaa actaccacaa 4500agcaacaaat tgacgatcta ttattgccag gaaacacgaa tttaccatgg acttcataag 4560ctaatttgcg ataggcatta tttattagtt gtttttaatc ttaactgtgt atgaagtttt 4620atgtaataaa gatagaaaga gaaacaaaaa aaaatttttc gtagtatcaa ttcagctttc 4680gaagacagaa tgaaatttaa gcagaccatc atcttgccct gtgcttggcc cccagtgcag 
4740cgaacgttat aaaaacgaat actgagtata tatctatgta aaacaaccat atcatttctt 4800gttctgaact ttgtttacct aactagtttt aaatttccct ttttcgtgca tgcgggtgtt 4860cttatttatt agcatactac atttgaaata tcaaatttcc ttagtagaaa agtgagagaa 4920ggtgcactga cacaaaaaat aaaatgctac gtataactgt caaaactttg cagcagcggg 4980catccttcca tcatagcttc aaacatatta gcgttcctga tcttcatacc cgtgctcaaa 5040atgatcaaac aaactgttat tgccaagaaa taaacgcaag gctgccttca aaaactgatc 5100cattagatcc tcatatcaag cttcctcata gaacgcccaa ttacaataag catgttttgc 5160tgttatcacc gggtgatagg tttgctcaac catggaaggt agcatggaat cataatttgg 5220atactaatac aaatcggcca tataatgcca ttagtaaatt gcgctcccat ttaggtggtt 5280ctccaggcaa atttgaatac taataaatgc ggtgcatttg caaaatgaat ttattccaag 5340gccaaaacaa cacgatgaat ggctttattt ttttgttatt cctgacatga agctttatgt 5400aattaaggaa acggacatcg aggaatttgc atctttttta gatgaaggag ctattcaagc 5460accaaagcta tccttccagg attatttaag cggtaaggcc aaggcttccc aacaggttca 5520tgaagtgcat catagaaagc ttacaaggtt tcagggtgaa acttttctaa gagattggaa 5580cttagtctgt gggcattata agagagatgc taagtgtgga gaaatgggac ccgacataat 5640tgcagcattt caagatgaaa agctttttcc tgagaataat ctagccttaa tttctcatat 5700tgggggtcat attt 5714207688DNAArtificial SequenceSynthetic Integration construct i301 20gacggcacgg ccacgcgttt aaaccgccga gctattcgcg gaacattcta gctcgtttgc 60atcttcttgc atttggtagg ttttcaatag ttcggtaata ttaacggata cctactatta 120tcccctagta ggctcttttc acggagaaat tcgggagtgt tttttttccg tgcgcatttt 180cttagctata ttcttccagc ttcgcctgct gcccggtcat cgttcctgtc acgtagtttt 240tccggattcg tccggctcat ataataccgc aataaacacg gaatatctcg ttccgcggat 300tcggttaaac tctcggtcgc ggattatcac agagaaagct tcgtggagaa tttttccaga 360ttttccgctt tccccgatgt tggtatttcc ggaggtcatt atactgaccg ccattataat 420gactgtacaa cgaccttctg gagaaagaaa caactcaata acgatgtggg acattggggg 480cccactcaaa aaatctgggg actatatccc cagagaattt ctccagaaga gaagaaaagt 540caaagttttt tttcgcttgg gggttgcata taaagctcac acgcggccag ggggagccat 600gaaaaagcct gaactcaccg cgacgtctgt cgagaagttt ctgatcgaaa agttcgacag 660cgtctccgac ctgatgcagc 
tctcggaggg cgaagaatct cgtgctttca gcttcgatgt 720aggagggcgt ggatatgtcc tgcgggtaaa tagctgcgcc gatggtttct acaaagatcg 780ttatgtttat cggcactttg catcggccgc gctcccgatt ccggaagtgc ttgacattgg 840ggaattcagc gagagcctga cctattgcat ctcccgccgt gcacagggtg tcacgttgca 900agacctgcct gaaaccgaac tgcccgctgt tctgcagccg gtcgcggagg ccatggatgc 960gatcgctgcg gccgatctta gccagacgag cgggttcggc ccattcggac cgcaaggaat 1020cggtcaatac actacatggc gtgatttcat atgcgcgatt gctgatcccc atgtgtatca 1080ctggcaaact gtgatggacg acaccgtcag tgcgtccgtc gcgcaggctc tcgatgagct 1140gatgctttgg gccgaggact gccccgaagt ccggcacctc gtgcacgcgg atttcggctc 1200caacaatgtc ctgacggaca atggccgcat aacagcggtc attgactgga gcgaggcgat 1260gttcggggat tcccaatacg aggtcgccaa catcttcttc tggaggccgt ggttggcttg 1320tatggagcag cagacgcgct acttcgagcg gaggcatccg gagcttgcag gatcgccgcg 1380gctccgggcg tatatgctcc gcattggtct tgaccaactc tatcagagct tggttgacgg 1440caatttcgat gatgcagctt gggcgcaggg tcgatgcgac gcaatcgtcc gatccggagc 1500cgggactgtc gggcgtacac aaatcgcccg cagaagcgcg gccgtctgga ccgatggctg 1560tgtagaagta ctcgccgata gtggaaaccg acgccccagc actcgtccga gggcaaagga 1620atagcgctcg tccaacgccg gcggacctcg ctcgtccaac gccggcggac ctcttttaat 1680tctgctgtaa cccgtacatg cccaaaatag ggggcgggtt acacagaata tataacatcg 1740taggtgtctg ggtgaacagt ttattcctgg catccactaa atataatgga gcccgctttt 1800taagctggca tccagaaaaa aaaagaatcc cagcaccaaa atattgtttt cttcaccaac 1860catcagttca taggtccatt ctcttagcgc aactacagag aacaggggca caaacaggca 1920aaaaacgggc acaacctcaa
tggagtgatg caacctgcct ggagtaaatg atgacacaag 1980gcaattgacc cacgcatgta tctatctcat tttcttacac cttctattac cttctgctct 2040ctctgatttg gaaaaagctg aaaaaaaagg ttgaaaccag ttccctgaaa ttattcccct 2100acttgactaa taagtatata aagacggtag gtattgattg taattctgta aatctatttc 2160ttaaacttct taaattctac ttttatagtt agtctttttt ttagttttaa aacaccaaga 2220acttagtttc gatccccgcg tgcttggccg gccgtatccc cgcgtgcttg gccggccgta 2280tgtctcagaa cgtttacatt gtatcgactg ccagaacccc aattggttca ttccagggtt 2340ctctatcctc caagacagca gtggaattgg gtgctgttgc tttaaaaggc gccttggcta 2400aggttccaga attggatgca tccaaggatt ttgacgaaat tatttttggt aacgttcttt 2460ctgccaattt gggccaagct ccggccagac aagttgcttt ggctgccggt ttgagtaatc 2520atatcgttgc aagcacagtt aacaaggtct gtgcatccgc tatgaaggca atcattttgg 2580gtgctcaatc catcaaatgt ggtaatgctg atgttgtcgt agctggtggt tgtgaatcta 2640tgactaacgc accatactac atgccagcag cccgtgcggg tgccaaattt ggccaaactg 2700ttcttgttga tggtgtcgaa agagatgggt tgaacgatgc gtacgatggt ctagccatgg 2760gtgtacacgc agaaaagtgt gcccgtgatt gggatattac tagagaacaa caagacaatt 2820ttgccatcga atcctaccaa aaatctcaaa aatctcaaaa ggaaggtaaa ttcgacaatg 2880aaattgtacc tgttaccatt aagggattta gaggtaagcc tgatactcaa gtcacgaagg 2940acgaggaacc tgctagatta cacgttgaaa aattgagatc tgcaaggact gttttccaaa 3000aagaaaacgg tactgttact gccgctaacg cttctccaat caacgatggt gctgcagccg 3060tcatcttggt ttccgaaaaa gttttgaagg aaaagaattt gaagcctttg gctattatca 3120aaggttgggg tgaggccgct catcaaccag ctgattttac atgggctcca tctcttgcag 3180ttccaaaggc tttgaaacat gctggcatcg aagacatcaa ttctgttgat tactttgaat 3240tcaatgaagc cttttcggtt gtcggtttgg tgaacactaa gattttgaag ctagacccat 3300ctaaggttaa tgtatatggt ggtgctgttg ctctaggtca cccattgggt tgttctggtg 3360ctagagtggt tgttacactg ctatccatct tacagcaaga aggaggtaag atcggtgttg 3420ccgccatttg taatggtggt ggtggtgctt cctctattgt cattgaaaag atatgattac 3480gttctgcgat tttctcatga tctttttcat aaaatacata aatatataaa tggctttatg 3540tataacaggc ataatttaaa gttttatttg cgattcatcg tttttcaggt actcaaacgc 3600tgaggtgtgc cttttgactt acttttccgc cttggcaagc tggccgggtg 
atacttgcac 3660aagttccact aattactgac atttgtggta ttaactcgtt tgactgctct acaattgtag 3720gatgttaatc aatgtcttgg ctgcctaacc tgcaggccgc gagcgccgat atgctatgta 3780atagacaata aaaccatgtt tatataaaaa aaattcaaaa tagaaaacga ttctgtacaa 3840ggagtatttt ttttttgttc tagtgtgttt atattatcct tggctaagag gcactgcgta 3900tacttcaagg tacccctgtg ttttgaaaaa aaacaacagt aaaataggaa ctccgcgagg 3960ttcaggaacc tgaaacaaaa tcaataaaaa cattatatgc gtttcgaaca aaattaaaga 4020aaaagaataa atatagatta aaaaaaaaaa gaagaaatta aaagaatttc tactaaatcc 4080caattgttat atatttgtta aatgccaaaa aagtttataa aaaatttaga atgtataaat 4140aataataaac taagtaacgc gatcgccgac gccgccgata tctccctcgc cagcggccgc 4200cttatggcta agaatgttgg aattttggcc atggacatct acttcccacc aacttgtgtt 4260cagcaggagg ctttagaagc acatgacgga gcctcaaagg gtaagtacac aatcggatta 4320ggacaggatt gcttagcatt ctgcactgaa ttggaggacg tcatctcaat gtctttcaac 4380gccgtcacct cattgttaga gaagtacaaa atcgacccaa accagatcgg aaggttggaa 4440gtcggttctg aaaccgtcat cgacaagtct aaatcaatca agactttcgt tatgcagttg 4500ttcgaaaagt gcggtaatac tgacgtcgag ggtgtagact ctactaacgc ttgttatggt 4560ggtaccgcag ctttattgaa ctgcgtaaac tgggttgagt caaactcatg ggatggtagg 4620tacggattag tcatttgcac cgattctgcc gtctacgccg agggtccagc aaggccaacc 4680ggtggagctg cagctattgc tatgttaatc ggaccagatg cccctatagt cttcgagtct 4740aagttgaggg gttcacacat ccctaacgtc tacgacttct acaagccaaa cttggcctca 4800gagtatccag ttgtcgacgg aaagttatct cagacatgct acttgatggc cttagattca 4860tgttacaagc acttatgcaa caagttcgaa aagttggagg gaaaggagtt ctcaattaac 4920gacgccgact acttcgtttt tcactctcca tacaacaaat tggtccagaa gtcattcgcc 4980aggttattgt acaacgattt tttgagaaac gcatcatcta tcgatgaggc cgccaaggag 5040aaattcaccc catattcttc tttgtcattg gacgagtctt accagtctag ggacttggag 5100aaggtatcac agcaattggc taaaaccttc tatgacgcca aagttcagcc aaccaccttg 5160gtccctaaac aggtcggaaa tatgtatact gcatctttgt atgccgcctt tgcctctttg 5220atccacaaca agcacaacga tttagtcgga aaaagggttg tcatgttttc ttacggtgcc 5280ggatctactg ccactatgtt ctcattgagg ttatgcgaaa accagtcacc attttcattg 5340tctaacatcg cctcagtcat 
ggacgtaggt gtctcacctg agaagttcgt agaaaccatg 5400aagttgatgg agcacagata cggtgccaaa gaattcgtca cttcaaaaga gggaatcttg 5460gatttgttgg ccccaggaac ctactatttg aaggaggtcg actctttgta cagaaggttc 5520tatggaaaga agggagacga cggatctgtc gcaaacggtc agtaaatcgg cggcgtcggc 5580gatcgcgtta aggcggccgc tggcgaggga gatatttcaa cctgggccta acagtaaaga 5640tatcctcctc aaaactggtg cacttaatcg ctgaatttgt tctggcttct cttctttttc 5700tttattcccc ccatgggcca aaaaaaatag tactatcagg aatttggcgc cgggtcacga 5760tatacgtgta cagtgaccta ggcgacgcca caaggaaaaa ggaaaaaaac agaaaaaaca 5820acaaaaacta aaacaaacac gaaaacttta atagatctaa gtgaagtagt ggtgaggcaa 5880ttggagtgac atagcagcta ctacaactac aaaaaaggcg cgccacggtc gtgcggatat 5940gaaagaggtc gttatagctt ctgccgtcag gaccgccatc ggatcttacg gtaagtcatt 6000aaaggacgtc cctgccgttg atttaggagc caccgcaatt aaagaggccg ttaaaaaggc 6060aggtataaag ccagaggacg tcaacgaggt catcttggga aatgtcttac aagccggatt 6120aggtcaaaac ccagcaagac aagcatcatt caaagccggt ttacctgtcg agatacctgc 6180aatgaccatc aacaaggttt gcggttcagg attaaggacc gtttctttag cagcacagat 6240cattaaggct ggagatgcag acgttatcat tgctggtggt atggaaaaca tgtcaagagc 6300cccatacttg gctaataacg ccaggtgggg atataggatg ggaaacgcca agtttgtcga 6360cgaaatgatt actgacggat tgtgggacgc cttcaatgac tatcacatgg gtataaccgc 6420agaaaacatt gccgagaggt ggaatatctc aagagaagaa caggatgagt ttgcattggc 6480ctcacagaaa aaagcagagg aggcaataaa gtcaggtcag tttaaggatg aaatcgtccc 6540agtcgtcatc aagggaagaa agggtgagac agttgtcgac accgacgaac accctagatt 6600tggttcaacc atcgagggat tagcaaagtt gaagccagcc ttcaagaaag acggaaccgt 6660aaccgccggt aatgcatctg gattgaacga ttgcgcagca gttttggtca taatgtcagc 6720cgagaaagct aaggagttgg gtgtcaagcc attggcaaaa attgtttcat acggatcagc 6780cggtgtcgac cctgccatca tgggttacgg acctttttac gccaccaagg ctgcaatcga 6840aaaggccggt tggaccgtag atgaattgga tttgatcgag tcaaacgagg cctttgccgc 6900ccaatcattg gctgtcgcca aggacttgaa gttcgacatg aacaaggtca acgtcaacgg 6960tggtgccatc gcattgggtc accctatcgg agcctctggt gccaggatct tggttacctt 7020ggtccacgcc atgcagaaga gggacgcaaa gaagggtttg gccaccttgt 
gcatcggtgg 7080aggtcaggga acagctatct tgttagagaa atgcagcccc tcagcccccc tagcgtcgaa 7140taaaagacat tggtacatga tatcaaacag aattttaaca tttcttgatc cagtttgtaa 7200acaaaacaaa caatttttct accatttaac ttcataccat cggcgagagc cgaacaggaa 7260aaaaaagaag tctccggtta tcgtaagcag tatcaaataa taagaatgta tgtgtgtgca 7320atttgttata cccacgaaga agtgcgcagt agagttagaa aaccaactga gtaatcttta 7380ctcccgacaa tcgtccaata atcctcttgt tgctaggaac gtgatgatgg atttcgtttg 7440aaatccggac ggaaaactca aaagaagtcc aaccaccaac cattttcgag cctcaagaat 7500ctctaagcag gtttctttac taaggggatg gcctttctgt cctggacatt ttttccttcc 7560ttttttcatt tccttgaaag gaacagattt tttttgactt ttgccacaca gctgcactat 7620ctcaacccct tttacatttt aagttttcgg gttgaatggc cggtgtttaa accccagcgc 7680ctggcggg 7688215025DNAArtificial SequenceSynthetic Integration construct i476 21caggatccga cggcacggcc acgcgtttaa accgcctggg ataggatagt agcaactctt 60ggaggagagc attgtcagtt gtccagtctc tgaagttaag tagtaagttt gcggagtcaa 120agggggatgg cttttgccat ttgtgagagt tgtgcggcag catcttattc aaatagagct 180gtattctgaa gacctcttgt agaacatcat ccatactaaa aagtaaatcg tcctgtccca 240ttacgagctg tattagtgct gtgaccctct gtatatttac gttgccatga agaaggtaat 300gggcgatatt ttgatacaat tcctgagttg catgttggat tgagtttacg aagggtcgcc 360agacggccag aaacctccag gcggagttaa caactagtaa tacggcatcc atgtttgcat 420cagcgccgag cctataccag tcactgagta gacgttttct tgctcttttt atgtcctgac 480ttcttttgac gagggggcat tctctagaga cacaggcagt tgcttccagc aactgccgta 540cggccgttct catgctgtcg aggatttttt ttgggacgat attgtcatta tagggcagtg 600tgtgacttat gaattgttgt agaaggacgt ctgtgatgtt ggagatatgt attttgttaa 660ctcttcttga gacgatttgg ccctggatag cgaagcgtgc ggttacaaat aggtcgtctt 720gttcaagaag gtaggcgagg acattatcta tcagtacaaa catcttagta gtgtctgagg 780agagggttga ttgtttatgt atttttgcga aatatatata tatatattct acacagatat 840atacatattt gtttttcggg ctcattcttt cttctttgcc agaggctcac cgctcaagag 900gtccgctaat tctggagcga ttgttattgt tttttctttt cttcttctat tcgaaaccca 960gtttttgatt tgaatgcgag ataaactggt attcttcatt agattctcta ggcccttggt 1020atctagatat gggttctcga 
tgttctttgc aaaccaactt tctagtattc ggacattttc 1080ttttgtaaac cggtgtcctc tgtaaggttt agtacttttg tttatcatat cttgagttac 1140cacattaaat accaacccat ccgccgattt atttttctgt gtaagttgat aattacttct 1200atcgttttct atgctgcgca tttctttgag taatacagta atggtagtag tgagttgaga 1260tgttgtttgc aacaacttct tctcctcatc actaatctta cggtttttgt tggccctaga 1320taagaatcct aatatatccc ttaattcaac ttcttcttct gttgttacac tctctggtaa 1380cttaggtaaa ttacagcaaa tagaaaagag ctttttattt atgtctagta tgctggattt 1440aaactcatct gtgatttgtg gatttaaaag gtctttaatg ggtattttat tcattttttc 1500ttgcttatct tccttttttt cttgcccact tctaagctga tttcaatctc tcctttatat 1560atatttttaa gttccaacat tttatgtttc aaaacattaa tgatgtctgg gttttgtttg 1620ggatgcaatt tattgcttcc caatgtagaa aagtacatca tatgaaacaa cttaaactct 1680taactacttc ttttaacctt cactttttat gaaatgtatc aaccatatat aataacttaa 1740tagacgacat tcacaatatg tttacttcga agcctgcttt caaaattaag aacaaagcat 1800ccaaatcata cagaaacaca gcggtttcaa aaaagctgaa agaaaaacgt ctagctgagc 1860atgtgaggcc aagctgcttc aatattattc gaccactcaa gaaagatatc cagattcctg 1920ttccttcctc tcgattttta aataaaatcc aaattcacag gatagcgtct ggaagtcaaa 1980atactcagtt tcgacagttc aataagacat ctataaaatc ttcaaagaaa tatttaaact 2040catttatggc ttttagagca tattactcac agtttggctc cggtgtaaaa caaaatgtct 2100tgtcttctct gctcgctgaa gaatggcacg cggacaaaat gcagcacgga atatgggact 2160acttcgcgca acagtataat tttataaacc ctggttttgg ttttgtagag tggttgacga 2220ataattatgc tgaagtacgt ggtgacggat attgggaaga tgtgtttgta catttggcct 2280tatagagtgt ggtcgtggcg gaggttgttt atctttcgag tactgaatgt tgtcagtata 2340gctatcctat ttgaaactcc ccatcgtctt gctcttgttc ccaatgtttg tttatacact 2400catatggcta tacccttatc tacttgcctc ttttgtttat gtctatgtat ttgtataaaa 2460tatgatatta ctcagactca agcaaacaat caatgctcac acgcggccag ggggagcctc 2520gacactagta atacacatca tcgtcctaca agttcatcaa agtgttggac agacaactat 2580accagcatgg atctcttgta tcggttcttt tctcccgctc tctcgcaata acaatgaaca 2640ctgggtcaat catagcctac acaggtgaac agagtagcgt ttatacaggg tttatacggt 2700gattcctacg gcaaaaattt ttcatttcta aaaaaaaaaa gaaaaatttt 
tctttccaac 2760gctagaagga aaagaaaaat ctaattaaat tgatttggtg attttctgag agttcccttt 2820ttcatatatc gaattttgaa tataaaagga gatcgaaaaa atttttctat tcaatctgtt 2880ttctggtttt atttgatagt ttttttgtgt attattatta tggattagta ctggtttata 2940tgggtttttc tgtataactt ctttttattt tagtttgttt aatcttattt tgagttacat 3000tatagttccc taactgcaag agaagtaaca ttaaaaatga aaaagcctga actcaccgcg 3060acgtctgtcg agaagtttct gatcgaaaag ttcgacagcg tctccgacct gatgcagctc 3120tcggagggcg aagaatctcg tgctttcagc ttcgatgtag gagggcgtgg atatgtcctg 3180cgggtaaata gctgcgccga tggtttctac aaagatcgtt atgtttatcg gcactttgca 3240tcggccgcgc tcccgattcc ggaagtgctt gacattgggg aattcagcga gagcctgacc 3300tattgcatct cccgccgtgc acagggtgtc acgttgcaag acctgcctga aaccgaactg 3360cccgctgttc tgcagccggt cgcggaggcc atggatgcga tcgctgcggc cgatcttagc 3420cagacgagcg ggttcggccc attcggaccg caaggaatcg gtcaatacac tacatggcgt 3480gatttcatat gcgcgattgc tgatccccat gtgtatcact ggcaaactgt gatggacgac 3540accgtcagtg cgtccgtcgc gcaggctctc gatgagctga tgctttgggc cgaggactgc 3600cccgaagtcc ggcacctcgt gcacgcggat ttcggctcca acaatgtcct gacggacaat 3660ggccgcataa cagcggtcat tgactggagc gaggcgatgt tcggggattc ccaatacgag 3720gtcgccaaca tcttcttctg gaggccgtgg ttggcttgta tggagcagca gacgcgctac 3780ttcgagcgga ggcatccgga gcttgcagga tcgccgcggc tccgggcgta tatgctccgc 3840attggtcttg accaactcta tcagagcttg gttgacggca atttcgatga tgcagcttgg 3900gcgcagggtc gatgcgacgc aatcgtccga tccggagccg ggactgtcgg gcgtacacaa 3960atcgcccgca gaagcgcggc cgtctggacc gatggctgtg tagaagtact cgccgatagt 4020ggaaaccgac gccccagcac tcgtccgagg gcaaaggaat aggtttaact tgatactact 4080agattttttc tcttcattta taaaattttt ggttataatt gaagctttag aagtatgaaa 4140aaatcctttt ttttcattct ttgcaaccaa aataagaagc ttcttttatt cattgaaatg 4200atgaatataa acctaacaaa agaaaaagac tcgaatatca aacattaaaa aaaaataaaa 4260gaggttatct gttttcccat ttagttggag tttgcatttt ctaatagata gaactctcaa 4320ttaatgtgga tttagtttct ctgttcgttt ttttttgttt tgttctcact gtatttacat 4380ttctatttag tatttagtta ttcatataat cttaacttct cgaggagctc cgctcgtcca 4440acgccggcgg acctcggagg 
ttgtttatct ttcgagtact gaatgttgtc agtatagcta 4500tcctatttga aactccccat cgtcttgctc ttgttcccaa tgtttgttta tacactcata 4560tggctatacc cttatctact tgcctctttt gtttatgtct atgtatttgt ataaaatatg 4620atattactca gactcaagca aacaatcaat tcttagcatc attctttgtt cttatcttaa 4680ccataaacga tcttgatgtg acttttgtaa tttgaacgaa ttggctatac gggacggatg 4740acaaatgcac cattactcta ggttgttgtt ggatcttaac aaaccgtaaa ggtaaactgc 4800ccatgcggtt cacatgactt ttgactttcc tttgtttgct agttaccttc ggcttcacaa 4860tttgtttttc cacttttcta acaggtttat cacctttcaa acttatcttt atcttattcg 4920ccttcttggg tgcctccaca gtagaggtta cttccttttt aatatgtact tttaggatac 4980tttcacgctt tataacacgg tgtttaaacc ccagcgcctg gcggg 5025223665DNAArtificial SequenceSynthetic Integration construct i477 22agctcgagga cggcacggcc acgcgtttaa accgccaagc ttttcaattc atcttttttt 60tttttgttct tttttttgat tccggtttct ttgaaatttt tttgattcgg taatctccga 120gcagaaggaa gaacgaagga aggagcacag acttagattg gtatatatac gcatatgtgg 180tgttgaagaa acatgaaatt gcccagtatt cttaacccaa ctgcacagaa caaaaacctg 240caggaaacga agataaatca tgtcgaaagc tacatataag gaacgtgctg ctactcatcc 300tagtcctgtt gctgccaagc tatttaatat catgcacgaa aagcaaacaa acttgtgtgc 360ttcattggat gttcgtacca ccaaggaatt actggagtta gttgaagcat taggtcccaa 420aatttgttta ctaaaaacac atgtggatat cttgactgat ttttccatgg agggcacagt 480taagccgcta aaggcattat ccgccaagta caatttttta ctcttcgaag acagaaaatt 540tgctgacatt ggtaatacag tcaaattgca gtactctgcg ggtgtataca gaatagcaga 600atgggcagac attacgaatg cacacggtgt ggtgggccca ggtattgtta gcggtttgaa 660gcaggcggca gaagaagtaa caaaggaacc tagaggcctt ttgatgttag cagaattgtc 720atgcaagggc tccctatcta ctggagaata tactaagggt actgttgaca ttgcgaagag 780tgacaaagat tttgttatcg gctttattgc tcaaagagac atgggtggaa gagatgaagg 840ttacgattgg ttgattatga cacccggtgt gggtttagat gacaagggag acgcattggg 900tcaacagtat agaaccgtgg atgatgtggt ctctacagga tctgacatta ttattgttgg 960aagcgctcgt ccaacgccgg cggacctatg gcgcaagttt tccgctttgt aatatatatt 1020tatacccctt tcttctctcc cctgcaatat aatagtttaa ttctaatatt aataatatcc 1080tatattttct tcatttaccg 
gcgcactctc gcccgaacga cctcaaaatg tctgctacat 1140tcataataac caaaagctca taactttttt ttttgaacct gaatatatat acatcacata 1200tcactgctgg tccttgccga ccagcgtata caatctcgat agttggtttc ccgttctttc 1260cactcccgtc atggactaca acaagagatc ttcggtctca accgtgccta atgcagctcc 1320cataagagtc ggattcgtcg gtctcaacgc agccaaagga tgggcaatca agacacatta 1380ccccgccata ctgcaactat cgtcacaatt tcaaatcact gccttataca gtccaaaaat 1440tgagacttct attgccacca tccagcgtct aaaattgagt aatgccactg cttttcccac 1500tttagagtca tttgcatcat cttccactat agatatgata gtgatagcta tccaagtggc 1560cagtcattat gacgttgtta tgcctctctt ggaattctcc aaaaataatc cgaacctcaa 1620gtatcttttc gtagaatggg cccttgcatg ttcactagat caagccgaat ccatttataa 1680ggctgctgct gaacgtgggg ttcaaaccat catctcttta caaggtcgta aatcaccata 1740tattttgaga gcaaaagaat taatatctca aggctatatc ggcgacatta attctatcga 1800gattgctgga aatggcggtt ggtacggcta cgaaaggcct gttaaatcac caaaatacat 1860ctatgaaatc gggaacggtg tagatctggt aaccacaaca tttggtcaca caatcgatat 1920tttacaatac atgacaagtt cgtacttttc caggataaat gcaatggttt tcaataatat 1980tccagagcaa gagctgatag atgagcgtgg taaccgattg ggccagcgag tcccaaagac 2040agtaccggat catcttttat tccaaggcac attgttaaat ggcaatgttc cagtgtcatg 2100cagtttcaaa ggtggcaaac ctaccaaaaa atttaccaaa aatttggtca ttgatattca 2160cggtaccaag ggagatttga aacttgaagg cgatgccgga ttcgcagaaa tttcaaatct 2220ggtcctttac tacagtggaa ctagagcaaa cgacttcccg ctagctaatg gacaacaagc 2280tcctttagac ccggggtatg atgcaggtaa agaaatcatg gaagtatatc atttacgaaa 2340ttataatgcc attgtcggta atattcatcg actgtatcaa tctatctctg acttccactt 2400caatacaaag aaaattcctg aattaccctc acaatttgta atgcaaggtt tcgatttcga 2460aggctttccc accttgatgg atgctctgat attacacagg ttaatcgaga gcgtttataa 2520aagtaacatg atgggctcca cattaaacgt tagcaatatc tcgcattata gtttataaaa 2580gcatcttgcc ctgtgcttgg cccccagtgc agcgaacgtt ataaaaacga atactgagta 2640tatatctatg taaaacaacc atatcatttc ttgttctgaa ctttgtttac ctaactagtt 2700ttaaatttcc ctttttcgtg catgcgggtg ttcttattta ttagcatact acatttgaaa 2760tatcaaattt ccttagtaga aaagtgagag aaggtgcact gacacaaaaa 
ataaaatccc 2820cgcgtgcttg gccggccgtc ttcattggat gttcgtacca ccaaggaatt actggagtta 2880gttgaagcat taggtcccaa aatttgttta ctaaaaacac atgtggatat cttgactgat 2940ttttccatgg agggcacagt taagccgcta aaggcattat ccgccaagta caatttttta 3000ctcttcgaag acagaaaatt tgctgacatt ggtaatacag tcaaattgca gtactctgcg 3060ggtgtataca gaatagcaga atgggcagac attacgaatg cacacggtgt ggtgggccca 3120ggtattgtta gcggtttgaa gcaggcggca gaagaagtaa caaaggaacc tagaggcctt 3180ttgatgttag cagaattgtc atgcaagggc tccctatcta ctggagaata tactaagggt 3240actgttgaca ttgcgaagag tgacaaagat tttgttatcg gctttattgc tcaaagagac 3300atgggtggaa gagatgaagg ttacgattgg ttgattatga cacccggtgt gggtttagat 3360gacaagggag acgcattggg tcaacagtat agaaccgtgg atgatgtggt ctctacagga 3420tctgacatta ttattgttgg aagaggacta tttgcaaagg gaagggatgc taaggtagag 3480ggtgaacgtt acagaaaagc aggctgggaa gcatatttga gaagatgcgg ccagcaaaac 3540taaaaaactg tattataagt aaatgcatgt atactaaact cacaaattag agcttcaatt 3600taattatatc agttattacc cgggaatctc ggtgtttaaa ccccagcgcc tggcgggtct 3660agatc 36652310623DNAArtificial SequenceSynthetic Integration construct i94 23atgagtgata gggaattcgt cacggtagat cccgtcacta tcataatcaa agaatgcatt 60aatttatcga cagcgatgcg gaaatactct aaatttacct ctcaatctgg agtggccgct 120ttgctggggg gaggaagtga aatatttagc aatcaagatg actacttggc tcacacattc 180aacaatttga ataccaacaa gcacaatgat ccatttttat ctggattcat tcagttaaga 240cttatgttga ataaactgaa aaatctagat aatatagatt cactaaccat attgcagcca 300tttttattaa ttgtgagtac
aagttccatt tctggttaca tcacttccct ggccctggac 360tctttgcaga aattctttac cttgaatatc atcaatgaat catcgcaaaa ctatattggt 420gcacacaggg cgacggtaaa tgctctaaca cattgtaggt ttgaaggatc tcaacaactt 480tctgatgatt cagttctttt gaaagtcgtg tttttactgc gttcaatcgt cgactcacct 540tacggagatt tattatcaaa ctctatcata tatgacgtat tgcaaacgat tctttcattg 600gcttgtaata acagaaggag cgaagtcctt aggaatgctg cacaatcaac aatgatagcc 660gttaccgtaa agattttctc aaaactaaag actattgagc ctgttaatgt gaatcaaata 720tacatcaatg atgaaagtta cacaaatgat gtattgaagg ccgatacaat tggcacaaat 780gtagaatcca aagaagaagg aagtcaagaa gatcccatcg gcatgaaagt gaataatgag 840gaagctatta gcgaggacga tggcattgaa gaagagcata ttcattcaga gaagagcaca 900aatggcgccg aacaactaga tattgtgcaa aaaacaacaa gatcaaattc caggatccaa 960gcgtatgctg atgataacta tggattgccc gtggttaggc aatatttaaa cttattacta 1020tcattgattg cgccagaaaa tgaattaaaa cattcatact ccactagaat atttggccta 1080gagttaattc aaacggcatt agaaatttca ggtgatcgat tgcagctata cccacggctt 1140tttacactga tatcagatcc tattttcaaa agcattttgt ttatcataca gaacactaca 1200aaattatcac tacttcaagc tacattgcag ctatttacta ctctagttgt tatattgggc 1260aacaacttac aattacagat cgagctcact ctaacaagaa tattttctat tcttttagat 1320gatggtaccg caaataactc gagttctgaa aataagaaca agccatcaat aataaaggaa 1380cttctaattg agcaaatatc catcttatgg actaggtcgc catctttttt tacttctact 1440tttatcaatt tcgattgtaa tctcgatagg gcagacgttt ccataaactt tttgaaggct 1500ttgactaaat tggccttacc agaatccgcc ttaactacca cagaaagtgt accacccatt 1560tgccttgagg gattggtctc cctagtcgat gatatgttcg atcacatgaa ggacattgac 1620agagaagaat ttggcaggca aaagaatgaa atggaaatct taaaaaagag ggaccgtaaa 1680acagagttta ttgaatgtac caatgcattc aatgaaaagc ccaaaaaggg tattccgatg 1740ttaatagaaa aaggtttcat tgcttccgac tccgataaag atattgcgga gtttcttttc 1800aataataaca accgtatgaa taaaaaaaca atcggtttgc tactttgcca tccggacaaa 1860gtaagcttgt tgaatgaata tattcgtttg tttgattttt cagggttaag ggtcgatgaa 1920gctattagaa ttttgttgac gaaatttagg ttgcctggtg aatcgcaaca aattgaaaga 1980atcatcgaag ccttctcgtc tgcgtattgt gaaaatcaag attacgatcc atccaaaatc 
2040agtgacaacg cggaggatga catttctact gttcaaccag acgctgattc tgttttcatt 2100ttaagttatt caattattat gttgaacact gacctacata accctcaagt gaaggaacac 2160atgtcatttg aagattactc tggtaactta aagggatgct gtaatcacaa agacttccca 2220ttctggtatt tggatagaat ttactgttca atcagagata aagaaattgt tatgcctgaa 2280gagcaccacg gcaacgaaaa gtggtttgaa gatgcttgga ataacttgat atcttcaact 2340actgttataa ctgaaataaa aaaagacaca caatctgtca tggataaatt aacacccttg 2400gagcttttga actttgatag agcaattttt aaacaagttg gcccaagtat tgtcagtact 2460ttattcaaca tttacgtagt tgcatctgat gaccatatat ctaccagaat gataacaagt 2520ttggacaaat gttcctatat ttccgcattt tttgacttca aagatctctt taatgatata 2580ctaaactcca ttgctaaggg cactactttg attaattcaa gccatgacga tgaactttca 2640actttagctt ttgaatatgg cccaatgcca ctggtgcaaa ttaaattcga agacactaac 2700actgagatcc cggttagtac agatgctgtt agatttggta gatcatttaa gggtcaacta 2760aatacagttg tttttttccg gattattcgc aggaacaaag atcctaaaat tttctccaag 2820gaattatggt taaacattgt taatattata ctaacattgt acgaagactt gattttgtct 2880cctgatattt tccctgattt acaaaaaaga ctgaaattaa gcaacttgcc taagccatct 2940cctgaaattt ctattaacaa gagcaaagaa agcaaaggtc tcttatcaac atttgcttct 3000tatttaaaag gtgatgaaga acccacagaa gaggaaatca aatcctcaaa aaaagcgatg 3060gagtgcataa agtcgagtaa tattgccgcc tctgtctttg gaaatgaatc aaatataaca 3120gcggatttaa taaaaacttt actagactcc gccaaaactg agaaaaacgc agataattcc 3180aggtattttg aagcagaact tttatttatc atcgaattga ctattgcatt atttctattt 3240tgcaaagagg agaaagaatt aggaaagttc atacttcaaa aagttttcca actttctcac 3300acgaaaggcc tcacgaaaag gactgttcgt agaatgctaa catacaaaat tttgttaatt 3360tcgttatgtg cggatcagac ggagtacttg tccaaattaa taaacgatga gctgttaaaa 3420aagggggata tttttaccca aaaatttttt gcaactaatc aaggtaagga atttttgaag 3480agactatttt cattgaccga atcagagttt tatagaggat ttttactagg aaatgagaat 3540ttttggaaat ttttaagaaa agttacagca atgaaagagc agagcgagag catttttgaa 3600tatttaaatg aatcgatcaa gacagacagc aatattttga caaatgagaa cttcatgtgg 3660gtcctaggac tattagatga aatttcatca atgggtgccg ttggaaatca ctgggaaata 3720gaatacaaga aattgacaga aagtggtcat 
aaaattgata aggagaatcc atacaagaaa 3780tcgatcgaat tatcattgaa atccattcaa ctaacatcac acttgctgga agataataac 3840gatctgcgta aaaacgagat attcgctatt attcaagctt tggcacatca atgcatcaat 3900ccgtgtaagc agataagtga atttgcagtg gtaacgctag agcagacgct catcaataaa 3960atcgaaattc caactaatga gatggaatcg gtagaagaat taattgaggg cggattacta 4020ccgttgctaa attcgagtga aacacaggaa gaccagaaaa tcctcatttc atccatatta 4080acaataattt caaatgttta tttgcattat ttgaaactag ggaagacaag caacgaaacg 4140tttttgaaaa ttttgagtat tttcaataaa tttgtagagg actcagatat tgaaaaaaag 4200ctacagcaat taatacttga taagaagagt attgagaagg gcaacggttc atcatctcat 4260ggatctgcac atgaacaaac accagagtca aacgacgttg aaattgaggc tactgcgcca 4320attgatgaca atacagacga tgataacaaa ccgaagttat ctgatgtaga aaaggattaa 4380agatgctaag agatagtgat gatatttcat aaataatgta attctatata tgttaattac 4440cttttttgcg aggcatattt atggtgaagg ataagttttg accatcaaag aaggttaatg 4500tggctgtggt ttcagggtcc atacccggga gttatgacaa ttacaacaac agaattcttt 4560ctatatatgc acgaacttgt aatatggaag aaattatgac gtacaaacta taaagtaaat 4620attttacgta acacatggtg ctgttgtgct tctttttcaa gagaatacca atgacgtatg 4680actaagttta ggatttaatg caggtgacgg acccatcttt caaacgattt atatcagtgg 4740cgtccaaatt gttaggtttt gttggttcag caggtttcct gttgtgggtc atatgacttt 4800gaaccaaatg gccggctgct agggcagcac ataaggataa ttcacctgcc aagacggcac 4860aggcaactat tcttgctaat tgacgtgcgt tggtaccagg agcggtagca tgtgggcctc 4920ttacacctaa taagtccaac atggcacctt gtggttctag aacagtacca ccaccgatgg 4980tacctacttc gatggatggc atggatacgg aaattctcaa atcaccgtcc acttctttca 5040tcaatgttat acagttggaa ctttcgacat tttgtgcagg atcttgtcct aatgccaaga 5100aaacagctgt cactaaatta gctgcatgtg cgttaaatcc accaacagac ccagccattg 5160cagatccaac caaattctta gcaatgttca actcaaccaa tgcggaaaca tcacttttta 5220acacttttct gacaacatca ccaggaatag tagcttctgc gacgacactc ttaccacgac 5280cttcgatcca gttgatggca gctggttttt tgtcggtaca gtagttacca gaaacggaga 5340caacctccat atcttcccag ccatactctt ctaccatttg ctttaatgag tattcgacac 5400ccttagaaat catattcata cccattgcgt caccagtagt tgttctaaat ctcatgaaga 
5460gtaaatctcc tgctagacaa gtttgaatat gttgcagacg tgcaaatctt gatgtagagt 5520taaaagcttt tttaattgcg ttttgtccct cttctgagtc taaccatatc ttacaggcac 5580cagatctttt caaagttggg aaacggacta ctgggcctct tgtcatacca tccttagtta 5640aaacagttgt tgcaccaccg ccagcattga ttgccttaca gccacgcatg gcagaagcta 5700ccaaacaacc ctctgtagtt gccattggta tatgataaga tgtaccatcg ataaccaagg 5760ggcctataac accaacgggc aaaggcatgt aacctataac attttcacaa caagcgccaa 5820atacgcggtc gtagtcataa tttttatatg gtaaacgatc agatgctaat acaggagctt 5880ctgccaaaat tgaaagagcc ttcctacgta ccgcaaccgc tctcgtagta tcacctaatt 5940ttttctccaa agcgtacaaa ggtaacttac cgtgaataac caaggcagcg acctctttgt 6000tcttcaattg ttttgtattt ccactactta ataatgcttc taattcttct aaaggacgta 6060ttttcttatc caagctttca atatcgcggg aatcatcttc ctcactagat gatgaaggtc 6120ctgatgagct cgattgcgca gatgataaac ttttgacttt cgatccagaa atgactgttt 6180tattggttaa aactggtgta gaagcctttt gtacaggagc agtaaaagac ttcttggtga 6240cttcagtctt caccaattgg tctgcagcca ttatagtttt ttctccttga cgttaaagta 6300tagaggtata ttaacaattt tttgttgata cttttatgac atttgaataa gaagtaatac 6360aaaccgaaaa tgttgaaagt attagttaaa gtggttatgc agcttttgca tttatatatc 6420tgttaataga tcaaaaatca tcgcttcgct gattaattac cccagaaata aggctaaaaa 6480actaatcgca ttattatcct atggttgtta atttgattcg ttgatttgaa ggtttgtggg 6540gccaggttac tgccaatttt tcctcttcat aaccataaaa gctagtattg tagaatcttt 6600attgttcgga gcagtgcggc gcgaggcaca tctgcgtttc aggaacgcga ccggtgaaga 6660ccaggacgca cggaggagag tcttccgtcg gagggctgtc gcccgctcgg cggcttctaa 6720tccgtacttc aatatagcaa tgagcagtta agcgtattac tgaaagttcc aaagagaagg 6780tttttttagg ctaagataat ggggctcttt acatttccac aacatataag taagattaga 6840tatggatatg tatatggtgg tattgccatg taatatgatt attaaacttc tttgcgtcca 6900tccaaaaaaa aagtaagaat ttttgaaaat tcaatataaa tgaaactctc aactaaactt 6960tgttggtgtg gtattaaagg aagacttagg ccgcaaaagc aacaacaatt acacaataca 7020aacttgcaaa tgactgaact aaaaaaacaa aagaccgctg aacaaaaaac cagacctcaa 7080aatgtcggta ttaaaggtat ccaaatttac atcccaactc aatgtgtcaa ccaatctgag 7140ctagagaaat ttgatggcgt ttctcaaggt 
aaatacacaa ttggtctggg ccaaaccaac 7200atgtcttttg tcaatgacag agaagatatc tactcgatgt ccctaactgt tttgtctaag 7260ttgatcaaga gttacaacat cgacaccaac aaaattggta gattagaagt cggtactgaa 7320actctgattg acaagtccaa gtctgtcaag tctgtcttga tgcaattgtt tggtgaaaac 7380actgacgtcg aaggtattga cacgcttaat gcctgttacg gtggtaccaa cgcgttgttc 7440aactctttga actggattga atctaacgca tgggatggta gagacgccat tgtagtttgc 7500ggtgatattg ccatctacga taagggtgcc gcaagaccaa ccggtggtgc cggtactgtt 7560gctatgtgga tcggtcctga tgctccaatt gtatttgact ctgtaagagc ttcttacatg 7620gaacacgcct acgattttta caagccagat ttcaccagcg aatatcctta cgtcgatggt 7680catttttcat taacttgtta cgtcaaggct cttgatcaag tttacaagag ttattccaag 7740aaggctattt ctaaagggtt ggttagcgat cccgctggtt cggatgcttt gaacgttttg 7800aaatatttcg actacaacgt tttccatgtt ccaacctgta aattggtcac aaaatcatac 7860ggtagattac tatataacga tttcagagcc aatcctcaat tgttcccaga agttgacgcc 7920gaattagcta ctcgcgatta tgacgaatct ttaaccgata agaacattga aaaaactttt 7980gttaatgttg ctaagccatt ccacaaagag agagttgccc aatctttgat tgttccaaca 8040aacacaggta acatgtacac cgcatctgtt tatgccgcct ttgcatctct attaaactat 8100gttggatctg acgacttaca aggcaagcgt gttggtttat tttcttacgg ttccggttta 8160gctgcatctc tatattcttg caaaattgtt ggtgacgtcc aacatattat caaggaatta 8220gatattacta acaaattagc caagagaatc accgaaactc caaaggatta cgaagctgcc 8280atcgaattga gagaaaatgc ccatttgaag aagaacttca aacctcaagg ttccattgag 8340catttgcaaa gtggtgttta ctacttgacc aacatcgatg acaaatttag aagatcttac 8400gatgttaaaa aataatcttc ccccatcgat tgcatcttgc tgaaccccct tcataaatgc 8460tttatttttt tggcagcctg ctttttttag ctctcattta atagagtagt tttttaatct 8520atatactagg aaaactcttt atttaataac aatgatatat atatattcca gtggtgcatg 8580aacgcatgag aaagcccccg gaagatcatc ttccgggggc tttttttttg gcgcgcgata 8640cagaccggtt cagacaggat aaagaggaac gcagaatgtt agacaacacc cgcttacgca 8700tagctattca gaaatcaggc cgtttaagcg atgattcacg agaattgctg gcccgctgcg 8760gcataaaaat taatttacac actcagcgcc tgattgcgat ggcggaaaac atgccgattg 8820atatcctgcg cgtgcgtgat gatgacattc cgggtctggt aatggatggc gtggtcgatc 
8880tcggtattat cggcgaaaac gtgctggaag aagagctact caaccgccgc gcacagggcg 8940aagatccacg ctatttaacc ctgcgccgtc ttgacttcgg cggctgccgt ttatcgctgg 9000caacaccggt tgacgaagcc tgggacggcc cggccgcgct ggacggtaaa cgtatcgcta 9060cctcatatcc gcacctcctc aaacgctacc tcgaccagaa aggcgtctct tttaaatcgt 9120gtctgttaaa tggttctgtc gaagtcgcgc cgcgcgcggg gctggccgac gctatctgcg 9180atttggtctc taccggcgcg acgcttgaag ctaacggcct gcgtgaagtc gaagttatct 9240accgctctaa agcctgtctg attcagcgcg acggtgagat ggcacagagc aagcaagagc 9300tgatcgataa attgctgacc cgtattcagg gcgtgattca ggcgcgcgaa tcgaaataca 9360tcatgatgca cgcgccaagt gaacgcctgg aagaggttat cgccctgctg ccaggcgccg 9420aaaggccgac aattctgccg ctggcaggcg agcaacagcg cgtggcgatg cacatggtca 9480gcagcgaaac gttgttctgg gaaaccatgg agaaactgaa agcgcttggc gccagctcga 9540ttctggtact gccgatcgag aagatgatgg agtgatctga cgcctgatgg cgctgcgctt 9600atcaggccta cgtaatgcgt tgaaaaactg tattataagt aaatgcatgt atactaaact 9660cacaaattag agcttcaatt taattatatc agttattacc cgggaatctc ggtcgtaatg 9720atttctataa tgacgaaaaa aaaaaaattg gaaagaaaaa gcttcatggc ctttataaaa 9780aggaactatc caatacctcg ccagaaccaa gtaacagtat tttacggggc acaaatcaag 9840aacaataaga caggactgta aagatggacg cattgaactc caaagaacaa caagagttcc 9900aaaaagtagt ggaacaaaag caaatgaagg atttcatgcg tttgtactct aatctggtag 9960aaagatgttt cacagactgt gtcaatgact tcacaacatc aaagctaacc aataaggaac 10020aaacatgcat catgaagtgc tcagaaaagt tcttgaagca tagcgaacgt gtagggcagc 10080gtttccaaga acaaaacgct gccttgggac aaggcttggg ccgataaggt gtactggcgt 10140atatatatct aattatgtat ctctggtgta gcccattttt agcatgtaaa tataaagaga 10200aaccatatct aatctaacca aatccaaaca aaattcaata gttactatcg cttttttctt 10260tctgtatcgc aaataagtga aaattaaaaa agaaagatta aattggaagt tggatatggg 10320ctggaacagc agcagtaatc ggtatcgggt tcgccactaa tgacgtccta cgattgcact 10380caacagacct tgacgctcac gccgtagcgg gcgacaagtc aaacggaaca accgttgccg 10440ttcccatcgg agtccgacct aggccgaact ccgtgaattt ctgataacaa cggtcggtaa 10500agactggttc cccagtatat ttcttctctc aggagcaggg gccaatgcca aaagcgacat 10560taacccggag gacaaggctc 
cactgtgttc caccgaattt cccacctgat aatatctgat 10620aac 10623248479DNAArtificial SequenceSynthetic Integration construct i467 24gacggcacgg ccacgcgttt aaaccgccct ccaagctgac ataaatcgca ctttgtatct 60actttttttt attcgaaaac aaggcacaac aatgaatcta tcgccctgtg agattttcaa 120tctcaagttt gtgtaataga tagcgttata ttatagaact ataaaggtcc ttgaatatac 180atagtgtttc attcctatta ctgtatatgt gactttacat tgttacttcc gcggctattt 240gacgttttct gcttcaggtg cggcttggag ggcaaagtgt cagaaaatcg gccaggccgt 300atgacacaaa agagtagaaa acgagatctc aaatatctcg aggcctgtcc tctatacaac 360cgcccagctc tctgacaaag ctccagaacg gttgtctttt gtttcgaaaa gccaaggtcc 420cttataattg ccctccattt tgtgtcacct atttaagcaa aaaattgaaa gtttactaac 480ctttcattaa agagaaataa caatattata aaaagcgctt aaagctcaca cgcggccagg 540gggagccgtt catcatctca tggatctgca catgaacaaa caccagagtc aaacgacgtt 600gaaattgagg ctactgcgcc aattgatgac aatacagacg atgataacaa accgaagtta 660tctgatgtag aaaaggatta aagatgctaa gagatagtga tgatatttca taaataatgt 720aattctatat atgttaatta ccttttttgc gaggcatatt tatggtgaag gataagtttt 780gaccatcaaa gaaggttaat gtggctgtgg tttcagggtc cataaagctt ttcaattcat 840cttttttttt tttgttcttt tttttgattc cggtttcttt gaaatttttt tgattcggta 900atctccgagc agaaggaaga acgaaggaag gagcacagac ttagattggt atatatacgc 960atatgtggtg ttgaagaaac atgaaattgc ccagtattct taacccaact gcacagaaca 1020aaaacctgca ggaaacgaag ataaatcatg tcgaaagcta catataagga acgtgctgct 1080actcatccta gtcctgttgc tgccaagcta tttaatatca tgcacgaaaa gcaaacaaac 1140ttgtgtgctt cattggatgt tcgtaccacc aaggaattac tggagttagt tgaagcatta 1200ggtcccaaaa tttgtttact aaaaacacat gtggatatct tgactgattt ttccatggag 1260ggcacagtta agccgctaaa ggcattatcc gccaagtaca attttttact cttcgaagac 1320agaaaatttg ctgacattgg taatacagtc aaattgcagt actctgcggg tgtatacaga 1380atagcagaat gggcagacat tacgaatgca cacggtgtgg tgggcccagg tattgttagc 1440ggtttgaagc aggcggcaga agaagtaaca aaggaaccta gaggcctttt gatgttagca 1500gaattgtcat gcaagggctc cctatctact ggagaatata ctaagggtac tgttgacatt 1560gcgaagagtg acaaagattt tgttatcggc tttattgctc aaagagacat gggtggaaga 
1620gatgaaggtt acgattggtt gattatgaca cccggtgtgg gtttagatga caagggagac 1680gcattgggtc aacagtatag aaccgtggat gatgtggtct ctacaggatc tgacattatt 1740attgttggaa gaggactatt tgcaaaggga agggatgcta aggtagaggg tgaacgttac 1800agaaaagcag gctgggaagc atatttgaga agatgcggcc agcaaaacta aaaaactgta 1860ttataagtaa atgcatgtat actaaactca caaattagag cttcaattta attatatcag 1920ttattacccg ggaatctcgg tcgtaatgat ttctataatg acgaaaaaaa aaaaattgga 1980aagaaaaagc ttcatggcct ttataaaaag gaactatcca atacctcgcc agaaccaagt 2040aacagtattt tacggggcac aaatcaagaa caataagaca ggactgtaaa gatggacgca 2100tcgctcgtcc aacgccggcg gacctgtttt caatagttcg gtaatattaa cggataccta 2160ctattatccc ctagtaggct cttttcacgg agaaattcgg gagtgttttt tttccgtgcg 2220cattttctta gctatattct tccagcttcg cctgctgccc ggtcatcgtt cctgtcacgt 2280agtttttccg gattcgtccg gctcatataa taccgcaata aacacggaat atctcgttcc 2340gcggattcgg ttaaactctc ggtcgcggat tatcacagag aaagcttcgt ggagaatttt 2400tccagatttt ccgctttccc cgatgttggt atttccggag gtcattatac tgaccgccat 2460tataatgact gtacaacgac cttctggaga aagaaacaac tcaataacga tgtgggacat 2520tgggggccca ctcaaaaaat ctggggacta tatccccaga gaatttctcc agaagagaag 2580aaaagtcaaa gttttttttc gcttgggggt tgcatataaa tacaggcgct gttttatctt 2640cagcatgaat attccataat tttacttaat agcttttcat aaataataga atcacaaaca 2700aaatttacat ctgagttaaa caatcatgac aatcaaggaa cataaagtag tttatgaagc 2760tcacaacgta aaggctctta aggctcctca acatttttac aacagccaac ccggcaaggg 2820ttacgttact gatatgcaac attatcaaga aatgtatcaa caatctatca atgagccaga 2880aaaattcttt gataagatgg ctaaggaata cttgcattgg gatgctccat acaccaaagt 2940tcaatctggt tcattgaaca atggtgatgt tgcatggttt ttgaacggta aattgaatgc 3000atcatacaat tgtgttgaca gacatgcctt tgctaatccc gacaagccag ctttgatcta 3060tgaagctgat gacgaatccg acaacaaaat catcacattt ggtgaattac tcagaaaagt 3120ttcccaaatc gctggtgtct taaaaagctg gggcgttaag aaaggtgaca cagtggctat 3180ctatttgcca atgattccag aagcggtcat tgctatgttg gctgtggctc gtattggtgc 3240tattcactct gttgtctttg ctgggttctc cgctggttcg ttgaaagatc gtgtcgttga 3300cgctaattct aaagtggtca tcacttgtga 
tgaaggtaaa agaggtggta agaccatcaa 3360cactaaaaaa attgttgacg aaggtttgaa cggagtcgat ttggtttccc gtatcttggt 3420tttccaaaga actggtactg aaggtattcc aatgaaggcc ggtagagatt actggtggca 3480tgaggaggcc gctaagcaga gaacttacct acctcctgtt tcatgtgacg ctgaagatcc 3540tctattttta ttatacactt ccggttccac tggttctcca aagggtgtcg ttcacactac 3600aggtggttat ttattaggtg ccgctttaac aactagatac gtttttgata ttcacccaga 3660agatgttctc ttcactgccg gtgacgtcgg ctggatcacg ggtcacacct atgctctata 3720tggtccatta accttgggta ccgcctcaat aattttcgaa tccactcctg cctacccaga 3780ttatggtaga tattggagaa ttatccaacg tcacaaggct acccatttct atgtggctcc 3840aactgcttta agattaatca aacgtgtagg tgaagccgaa attgccaaat atgacacttc 3900ctcattacgt gtcttgggtt ccgtcggtga accaatctct ccagacttat gggaatggta 3960tcatgaaaaa gtgggtaaca aaaactgtgt catttgtgac actatgtggc aaacagagtc 4020tggttctcat ttaattgctc ctttggcagg tgctgtccca acaaaacctg gttctgctac 4080cgtgccattc tttggtatta acgcttgtat cattgaccct gttacaggtg tggaattaga 4140aggtaatgat gtcgaaggtg tccttgccgt taaatcacca tggccatcaa tggctagatc 4200tgtttggaac caccacgacc gttacatgga tacttacttg aaaccttatc ctggtcacta 4260tttcacaggt gatggtgctg gtagagatca tgatggttac tactggatca ggggtagagt 4320tgacgacgtt gtaaatgttt ccggtcatag attatccaca tcagaaattg aagcatctat 4380ctcaaatcac gaaaacgtct cggaagctgc tgttgtcggt attccagatg aattgaccgg 4440tcaaaccgtc gttgcatatg tttccctaaa agatggttat ctacaaaaca acgctactga 4500aggtgatgca gaacacatca caccagataa tttacgtaga gaattgatct tacaagttag 4560gggtgagatt ggtcctttcg cctcaccaaa aaccattatt ctagttagag atctaccaag 4620aacaaggtca ggaaagatta
tgagaagagt tctaagaaag gttgcttcta acgaagccga 4680acagctaggt gacctaacta ctttggccaa cccagaagtt gtacctgcca tcatttctgc 4740tgtagagaac caatttttct ctcaaaaaaa gaaataaatt gaattgaatt gaaatcgata 4800gatcaatttt tttcttttct ctttccccat cctttacgct aaaataatag tttattttat 4860tttttgaata ttttttattt atatacgtat atatagacta ttatttatct tttaatgatt 4920attaagattt ttattaaaaa aaaattcgct cctcttttaa tgcctttatg cagttttttt 4980ttcccattcg atatttctat gttcgggttc agcgtatttt aagtttaata actcgaaaat 5040tctgcgttcg ttaaagcttt cgagaaggat attatttcga aataaaccgt gttgtgtaag 5100cttgaagcct ttttgcgctg ccaatattct tatccatcta ttgtactctt tagatccagt 5160atagtgtatt cttcctgctc caagctcatc ccatccccgc gtgcttggcc ggccgttttg 5220ccagcttact atccttcttg aaaatatgca ctctatatct tttagttctt aattgcaaca 5280catagatttg ctgtataacg aattttatgc tattttttaa atttggagtt cagtgataaa 5340agtgtcacag cgaatttcct cacatgtagg gaccgaattg tttacaagtt ctctgtacca 5400ccatggagac atcaaaaatt gaaaatctat ggaaagatat ggacggtagc aacaagaata 5460tagcacgagc cgcggagttc atttcgttac ttttgatatc actcacaact attgcgaagc 5520gcttcagtga aaaaatcata aggaaaagtt gtaaatatta ttggtagtat tcgtttggta 5580aagtagaggg ggtaattttt cccctttatt ttgttcatac attcttaaat tgctttgcct 5640ctccttttgg aaagctatac ttcggagcac tgttgagcga aggctcatta gatatatttt 5700ctgtcatttt ccttaaccca aaaataaggg aaagggtcca aaaagcgctc ggacaactgt 5760tgaccgtgat ccgaaggact ggctatacag tgttcacaaa atagccaagc tgaaaataat 5820gtgtagctat gttcagttag tttggctagc aaagatataa aagcaggtcg gaaatattta 5880tgggcattat tatgcagagc atcaacatga taaaaaaaaa cagttgaata ttccctcaaa 5940aatgtcttac accgtcggaa cctacttggc cgagaggttg gtccagatcg gattgaagca 6000ccacttcgcc gtcgccggtg actacaactt ggtcttgttg gacaacttgt tgttgaacaa 6060gaacatggag caggtctatt gctgcaacga gttgaactgc ggtttctcag cagaaggtta 6120tgcaagagcc aagggagcag ccgctgccgt cgtcacctac tcagtcggtg cattatcagc 6180attcgatgca attggaggtg cttacgctga gaacttgcca gtcatcttga tctctggagc 6240acctaacaac aacgaccatg ctgctggtca cgtattgcac cacgccttgg gtaaaacaga 6300ctaccactac cagttggaaa tggcaaaaaa tattaccgca gccgcagagg 
ccatctacac 6360cccagaggaa gcacctgcca aaattgacca cgtcataaag accgctttga gagagaagaa 6420gcctgtttac ttggagatcg cctgcaacat cgcttctatg ccatgcgccg cacctggtcc 6480agcctctgct ttgttcaacg acgaggcctc tgacgaagct tcattgaacg ccgcagtcga 6540agagacatta aagttcatcg ccaacaggga caaagttgcc gtcttagtcg gttcaaagtt 6600gagggccgct ggtgccgaag aggcagctgt caagttcgct gacgccttgg gaggagccgt 6660cgccaccatg gccgcagcaa aatctttctt tcctgaggag aacccacatt acatcggaac 6720ctcatggggt gaagtatcat atcctggagt agaaaaaacc atgaaagagg ccgatgccgt 6780aatagcattg gctcctgtct tcaacgacta ctcaaccaca ggatggactg atataccaga 6840tccaaagaaa ttagtcttgg ctgagcctag gtctgtcgtc gtaaacggta tcaggttccc 6900ttctgttcat ttgaaggact acttaacaag attggcccaa aaggtatcta aaaagactgg 6960tgccttggac ttcttcaagt cattaaacgc aggagaattg aaaaaagcag caccagccga 7020tccatcagcc ccattagtta acgctgaaat cgctagacaa gtagaggctt tgttgactcc 7080aaacactacc gtcatagctg agacaggtga ctcttggttc aacgcacaga gaatgaaatt 7140gccaaatggt gccagggtcg agtatgaaat gcagtgggga catataggtt ggtcagtccc 7200agccgccttt ggatacgcag taggtgcccc tgagaggagg aacatattga tggttggtga 7260tggttcattc caattaacag cccaggaggt agcccaaatg gtcaggttga agttgcctgt 7320catcatcttc ttgatcaaca attacggata caccatcgag gtcatgatcc acgacggacc 7380ttacaacaac atcaaaaact gggactacgc cggtttgatg gaggttttca acggtaacgg 7440tggttatgac tcaggagccg gtaagggatt aaaggctaag accggtggtg aattggctga 7500agcaattaag gtcgcattgg ccaacaccga tggacctaca ttgattgaat gcttcatcgg 7560aagggaggac tgcaccgagg aattggttaa atggggtaaa agggtagccg ctgctaattc 7620aagaaaacca gttaataaat tattataata agtgaattta ctttaaatct tgcatttaaa 7680taaattttct ttttatagct ttatgactta gtttcaattt atatactatt ttaatgacat 7740tttcgattca ttgattgaaa gctttgtgtt ttttcttgat gcgctattgc attgttcttg 7800tctttttcgc cacatgtaat atctgtagta gatacctgat acattgtgga tgctgagtga 7860aattttagtt aataatggag gcgctcttaa taattttggg gatattggct taacctgcag 7920gccgcgagcg ccgatataaa ctaatgattt taaatcgtta aaaaaatatg cgaattctgt 7980ggatcgaaca caggacctcc agataacttg accgaagttt tttcttcagt ctggcgctct 8040cccaactgag ctaaatccgc 
ttactatttg ttatcagttc ccttcatatc tacatagaat 8100aggttaagta ttttattagt tgccagaaga actactgata gttgggaata tttggtgaat 8160aatgaagatt gggtgaataa tttgataatt ttgagattca attgttaatc aatgttacaa 8220tattatgtat acagagtata ctagaagttc tcttcggaga tcttgaagtt cacaaaaggg 8280aatcgatatt tctacataat attatcatta cttcttcccc atcttatatt tgtcattcat 8340tattgattat gatcaatgca ataatgattg gtagttgcca aacatttaat acgatcctct 8400gtaatatttc tatgaataat tatcacagca acgttcaatt atcttcaatt ccggtgttta 8460aaccccagcg cctggcggg 8479255628DNAArtificial SequenceSynthetic URA3 FcphI target site cassette 25gcctgtctac aggataaaga cgggtcggat acctgcacaa gcaatttggc acctgcatac 60cccatttccc cagtagataa cttcaacaca cacatcaatg tccctcacca gtttatttcc 120aaaagagacg ctttttacta cctgactaga ttttcatttt gtttcttttg gattgcgctt 180gcctttgtag gtgtgtcgtt tatcctttac gttttgactt ggtgctcgaa gatgctttca 240gagatggtgc ttatcctcat gtcttttggg tttgtcttca atacggcagc cgttgtcttg 300caaacggccg cctctgccat ggcaaagaat gctttccatg acgatcatcg tagtgcccaa 360ttgggtgcct ctatgatggg tttaaacgta tggcttgggc aagtgtcttt ttatgtatcg 420tggaatttat cctgctggtc ttctggtctg ttagggcaag gttggcctct acttactcca 480tcgacaattc aagatacaga acctcctcca gatggaatcc cttccataga gagaaggagc 540aagcaactga cccaatattg actgccactg gacctgaaga catgcaacaa agtgcaagca 600tagtggggcc ttcttccaat gctaatccgg tcactgccac tgctgctacg gaaaaccaac 660ctaaaggtat taacttcttc actataagaa aatcacacga gcgcccggac gatgtctctg 720tttaaatggc gcaagttttc cgctttgtaa tatatattta tacccctttc ttctctcccc 780tgcaatataa tagtttaatt ctaatattaa taatatccta tattttcttc atttaccggc 840gcactctcgc ccgaacgacc tcaaaatgtc tgctacattc ataataacca aaagctcata 900actttttttt ttgaacctga atatatatac atcacatatc actgctggtc ctttgtgagc 960gttgcgctcg tgcatcatgc gtccatcttt acagtcctgt cttattgttc ttgatttgtg 1020ccccgtaaaa tactgttact tggttctggc gaggtattgg atagttcctt tttataaagg 1080ccatgaagct ttttctttcc aatttttttt ttttcgtcat tatagaaatc attacgaccg 1140agattcccgg gtaataactg atataattaa attgaagctc taatttgtga gtttagtata 1200catgcattta cttataatac agttttttag ttttgctggc 
cgcatcttct caaatatgct 1260tcccagcctg cttttctgta acgttcaccc tctaccttag catcccttcc ctttgcaaat 1320agtcctcttc caacaataat aatgtcagat cctgtagaga ccacatcatc cacggttcta 1380tactgttgac ccaatgcgtc tcccttgtca tctaaaccca caccgggtgt cataatcaac 1440caatcgtaac cttcatctct tccacccatg tctctttgag caataaagcc gataacaaaa 1500tctttgtcac tcttcgcaat gtcaacagta cccttagtat attctccagt agatagggag 1560cccttgcatg acaattctgc taacatcaaa aggcctctag gttcctttgt tacttcttct 1620gccgcctgct tcaaaccgct aacaatacct gggcccacca caccgtgtgc attcgtaatg 1680tctgcccatt ctgctattct gtatacaccc gcagagtact gcaatttgac tgtattacca 1740atgtcagcaa attttctgtc ttcgaagagt aaaaaattgt acttggcgga taatgccttt 1800agcggcttaa ctgtgccctc catggaaaaa tcagtcaaga tatccacatg tgtttttagt 1860aaacaaattt tgggacctaa tgcttcaact aactccagta attccttggt ggtacgaaca 1920tccaatgaag cacacaagtt tgtttgcttt tcgtgcatga tattaaatag cttggcagca 1980acaggactag gatgagtagc agcacgttcc ttatatgtag ctttcgacat gatttatctt 2040cgtttcctgc aggtttttgt tctgtgcagt tgggttaaga atactgggca atttcatgtt 2100tcttcaacac cacatatgcg tatatatacc aatctaagtc tgtgctcctt ccttcgttct 2160tccttctgct cggagattac cgaatcaaaa aaatttcaaa gaaaccggaa tcaaaaaaaa 2220gaacaaaaaa aaaaaagatg aattgaaaag ctttatggac cctgaaacca cagccacatt 2280aaccttcttt gatggtcaaa acttatcctt caccataaat atgcctcgca aaaaaggtaa 2340ttaacatata tagaattaca ttatttatga aatatcatca ctatctctta gcatctttaa 2400tccttttcta catcagataa cttcggtttg ttatcatcgt ctgtattgtc atcaattggc 2460gcagtagcct caatttcaac gtcgtttgac tctggtgttt gttcatgtgc agatccatga 2520gatgatgaac ttgtgagcgt tgcgctcgtg catcaccata tacatatcca tatctaatct 2580tacttatatg ttgtggaaat gtaaagagcc ccattatctt agcctaaaaa aaccttctct 2640ttggaacttt cagtaatacg cttaactgct cattgctata ttgaagtacg gattagaagc 2700cgccgagcgg gcgacagccc tccgacggaa gactctcctc cgtgcgtcct ggtcttcacc 2760ggtcgcgttc ctgaaacgca gatgtgcctc gcgccgcact gctccgaaca ataaagattc 2820tacaatacta gcttttatgg ttatgaagag gaaaaattgg cagtaacctg gccccacaaa 2880ccttcaaatc aacgaatcaa attaacaacc ataggataat aatgcgatta gttttttagc 2940cttatttctg 
gggtaattaa tcagcgaagc gatgattttt gatctattaa cagatatata 3000aatgcaaaag ctgcataacc actttaacta atactttcaa cattttcggt ttgtattact 3060tcttattcaa atgtcataaa agtatcaaca aaaaattgtt aatatacctc tatactttaa 3120cgtcaaggag aaaaaactat aatgtcatta ccgttcttaa cttctgcacc gggaaaggtt 3180attatttttg gtgaacactc tgctgtgtac aacaagcctg ccgtcgctgc tagtgtgtct 3240gcgttgagaa cctacctgct aataagcgag tcatctgcac cagatactat tgaattggac 3300ttcccggaca ttagctttaa tcataagtgg tccatcaatg atttcaatgc catcaccgag 3360gatcaagtaa actcccaaaa attggccaag gctcaacaag ccaccgatgg cttgtctcag 3420gaactcgtta gtcttttgga tccgttgtta gctcaactat ccgaatcctt ccactaccat 3480gcagcgtttt gtttcctgta tatgtttgtt tgcctatgcc cccatgccaa gaatattaag 3540ttttctttaa agtctacttt acccatcggt gctgggttgg gctcaagcgc ctctatttct 3600gtatcactgg ccttagctat ggcctacttg ggggggttaa taggatctaa tgacttggaa 3660aagctgtcag aaaacgataa gcatatagtg aatcaatggg ccttcatagg tgaaaagtgt 3720attcacggta ccccttcagg aatagataac gctgtggcca cttatggtaa tgccctgcta 3780tttgaaaaag actcacataa tggaacaata aacacaaaca attttaagtt cttagatgat 3840ttcccagcca ttccaatgat cctaacctat actagaattc caaggtctac aaaagatctt 3900gttgctcgcg ttcgtgtgtt ggtcaccgag aaatttcctg aagttatgaa gccaattcta 3960gatgccatgg gtgaatgtgc cctacaaggc ttagagatca tgactaagtt aagtaaatgt 4020aaaggcaccg atgacgaggc tgtagaaact aataatgaac tgtatgaaca actattggaa 4080ttgataagaa taaatcatgg actgcttgtc tcaatcggtg tttctcatcc tggattagaa 4140cttattaaaa atctgagcga tgatttgaga attggctcca caaaacttac cggtgctggt 4200ggcggcggtt gctctttgac tttgttacga agagacatta ctcaagagca aattgacagt 4260ttcaaaaaga aattgcaaga tgattttagt tacgagacat ttgaaacaga cttgggtggg 4320actggctgct gtttgttaag cgcaaaaaat ttgaataaag atcttaaaat caaatcccta 4380gtattccaat tatttgaaaa taaaactacc acaaagcaac aaattgacga tctattattg 4440ccaggaaaca cgaatttacc atggacttca taagctaatt tgcgataggc attatttatt 4500agttgttttt aatcttaact gtgtatgaag ttttatgtaa taaagataga aagagaaaca 4560aaaaaaaatt tttcgtagta tcaattcagc tttcgaagac agaatgaaat ttaagcagac 4620catcatcttg ccctgtgctt ggcccccagt gcagcgaacg 
ttataaaaac gaatactgag 4680tatatatcta tgtaaaacaa ccatatcatt tcttgttctg aactttgttt acctaactag 4740ttttaaattt ccctttttcg tgcatgcggg tgttcttatt tattagcata ctacatttga 4800aatatcaaat ttccttagta gaaaagtgag agaaggtgca ctgacacaaa aaataaaatg 4860ctacgtataa ctgtcaaaac tttgcagcag cgggcatcct tccatcatag cttcaaacat 4920attagcgttc ctgatcttca tacccgtgct caaaatgatc aaacaaactg ttattgccaa 4980gaaataaacg caaggctgcc ttcaaaaact gatccattag atcctcatat caagcttcct 5040catagaacgc ccaattacaa taagcatgtt ttgctgttat caccgggtga taggtttgct 5100caaccatgga aggtagcatg gaatcataat ttggatacta atacaaatcg gccatataat 5160gccattagta aattgcgctc ccatttaggt ggttctccag gcaaatttga atactaataa 5220atgcggtgca tttgcaaaat gaatttattc caaggccaaa acaacacgat gaatggcttt 5280atttttttgt tattcctgac atgaagcttt atgtaattaa ggaaacggac atcgaggaat 5340ttgcatcttt tttagatgaa ggagctattc aagcaccaaa gctatccttc caggattatt 5400taagcggtaa ggccaaggct tcccaacagg ttcatgaagt gcatcataga aagcttacaa 5460ggtttcaggg tgaaactttt ctaagagatt ggaacttagt ctgtgggcat tataagagag 5520atgctaagtg tggagaaatg ggacccgaca taattgcagc atttcaagat gaaaagcttt 5580ttcctgagaa taatctagcc ttaatttctc atattggggg tcatattt 56282610019DNAArtificial SequenceSynthetic HXT3 FcphI target site cassette 26accggtatat caaatggcgg tgtagtttga aaagtacact gtatgtccat taataaaatt 60acgccgaaga acactgacta gctataaatt ctcttccggc gtccaatttc aacctaaggc 120aagtattgtt aatcaataat gcgtgttggt aaatatgaag cccgatgttg attaagaaga 180attattcgta taagaattaa gcaagcaaaa ttaaggaaaa ttttttcttt cctattcgtc 240atcgcagaca gccttcatct tctcgagata acacctggag gaggagcaat gaaatgaaag 300gaaaaaaaaa tactttcttt ttcttgaaaa aagaaaaaaa ttgtaagatg agctattcgc 360ggaacattct agctcgtttg catcttcttg catttggtag gttttcaata gttcggtaat 420attaacggat acctactatt atcccctagt aggctctttt cacggagaaa ttcgggagtg 480ttttttttcc gtgcgcattt tcttagctat attcttccag cttcgcctgc tgcccggtca 540tcgttcctgt cacgtagttt ttccggattc gtccggctca tataataccg caataaacac 600ggaatatctc gttccgcgga ttcggttaaa ctctcggtcg cggattatca cagagaaagc 660ttcgtggaga atttttccag attttccgct 
ttccccgatg ttggtatttc cggaggtcat 720tatactgacc gccattataa tgactgtaca acgaccttct ggagaaagaa acaactcaat 780aacgatgtgg gacattgggg gcccactcaa aaaatctggg gactatatcc ccagagaatt 840tctccagaag agaagaaaag tcaaagtttt ttttcgcttg ggggttgcat ataaattgtg 900agcgttgcgc tcgtgcatcg tcgacactag taatacacat catcgtccta caagttcatc 960aaagtgttgg acagacaact ataccagcat ggatctcttg tatcggttct tttctcccgc 1020tctctcgcaa taacaatgaa cactgggtca atcatagcct acacaggtga acagagtagc 1080gtttatacag ggtttatacg gtgattccta cggcaaaaat ttttcatttc taaaaaaaaa 1140aagaaaaatt tttctttcca acgctagaag gaaaagaaaa atctaattaa attgatttgg 1200tgattttctg agagttccct ttttcatata tcgaattttg aatataaaag gagatcgaaa 1260aaatttttct attcaatctg ttttctggtt ttatttgata gtttttttgt gtattattat 1320tatggattag tactggttta tatgggtttt tctgtataac ttctttttat tttagtttgt 1380ttaatcttat tttgagttac attatagttc cctaactgca agagaagtaa cattaaaaat 1440gaccactctt gacgacacgg cttaccggta ccgcaccagt gtcccggggg acgccgaggc 1500catcgaggca ctggatgggt ccttcaccac cgacaccgtc ttccgcgtca ccgccaccgg 1560ggacggcttc accctgcggg aggtgccggt ggacccgccc ctgaccaagg tgttccccga 1620cgacgaatcg gacgacgaat cggacgccgg ggaggacggc gacccggact cccggacgtt 1680cgtcgcgtac ggggacgacg gcgacctggc gggcttcgtg gtcgtctcgt actccggctg 1740gaaccgccgg ctgaccgtcg aggacatcga ggtcgccccg gagcaccggg ggcacggggt 1800cgggcgcgcg ttgatggggc tcgcgacgga gttcgcccgc gagcggggcg ccgggcacct 1860ctggctggag gtcaccaacg tcaacgcacc ggcgatccac gcgtaccggc ggatggggtt 1920caccctctgc ggcctggaca ccgccctgta cgacggcacc gcctcggacg gcgagcaggc 1980gctctacatg agcatgccct gcccctgagt ttaacttgat actactagat tttttctctt 2040catttataaa atttttggtt ataattgaag ctttagaagt atgaaaaaat cctttttttt 2100cattctttgc aaccaaaata agaagcttct tttattcatt gaaatgatga atataaacct 2160aacaaaagaa aaagactcga atatcaaaca ttaaaaaaaa ataaaagagg ttatctgttt 2220tcccatttag ttggagtttg cattttctaa tagatagaac tctcaattaa tgtggattta 2280gtttctctgt tcgttttttt ttgttttgtt ctcactgtat ttacatttct atttagtatt 2340tagttattca tataatctta acttctcgag gagctcgatg cgtccatctt tacagtcctg 
2400tcttattgtt cttgatttgt gccccgtaaa atactgttac ttggttctgg cgaggtattg 2460gatagttcct ttttataaag gccatgaagc tttttctttc caattttttt tttttcgtca 2520ttatagaaat cattacgacc gagattcccg ggtaataact gatataatta aattgaagct 2580ctaatttgtg agtttagtat acatgcattt acttataata cagtttttta gttttgctgg 2640ccgcatcttc tcaaatatgc ttcccagcct gcttttctgt aacgttcacc ctctacctta 2700gcatcccttc cctttgcaaa tagtcctctt ccaacaataa taatgtcaga tcctgtagag 2760accacatcat ccacggttct atactgttga cccaatgcgt ctcccttgtc atctaaaccc 2820acaccgggtg tcataatcaa ccaatcgtaa ccttcatctc ttccacccat gtctctttga 2880gcaataaagc cgataacaaa atctttgtca ctcttcgcaa tgtcaacagt acccttagta 2940tattctccag tagataggga gcccttgcat gacaattctg ctaacatcaa aaggcctcta 3000ggttcctttg ttacttcttc tgccgcctgc ttcaaaccgc taacaatacc tgggcccacc 3060acaccgtgtg cattcgtaat gtctgcccat tctgctattc tgtatacacc cgcagagtac 3120tgcaatttga ctgtattacc aatgtcagca aattttctgt cttcgaagag taaaaaattg 3180tacttggcgg ataatgcctt tagcggctta actgtgccct ccatggaaaa atcagtcaag 3240atatccacat gtgtttttag taaacaaatt ttgggaccta atgcttcaac taactccagt 3300aattccttgg tggtacgaac atccaatgaa gcacacaagt ttgtttgctt ttcgtgcatg 3360atattaaata gcttggcagc aacaggacta ggatgagtag cagcacgttc cttatatgta 3420gctttcgaca tgatttatct tcgtttcctg caggtttttg ttctgtgcag ttgggttaag 3480aatactgggc aatttcatgt ttcttcaaca ccacatatgc gtatatatac caatctaagt 3540ctgtgctcct tccttcgttc ttccttctgc tcggagatta ccgaatcaaa aaaatttcaa 3600agaaaccgga atcaaaaaaa agaacaaaaa aaaaaaagat gaattgaaaa gctttatgga 3660ccctgaaacc acagccacat taaccttctt tgatggtcaa aacttatcct tcaccataaa 3720tatgcctcgc aaaaaaggta attaacatat atagaattac attatttatg aaatatcatc 3780actatctctt agcatcttta atccttttct acatcagata acttcggttt gttatcatcg 3840tctgtattgt catcaattgg cgcagtagcc tcaatttcaa cgtcgtttga ctctggtgtt 3900tgttcatgtg cagatccatg agatgatgaa cttgtgagcg ttgcgctcgt gcatccgctc 3960gtccaacgcc ggcggacctc gctcgtccaa cgccggcgga cctcttttaa ttctgctgta 4020acccgtacat gcccaaaata gggggcgggt tacacagaat atataacatc gtaggtgtct 4080gggtgaacag tttattcctg gcatccacta 
aatataatgg agcccgcttt ttaagctggc 4140atccagaaaa aaaaagaatc ccagcaccaa aatattgttt tcttcaccaa ccatcagttc 4200ataggtccat tctcttagcg caactacaga gaacaggggc acaaacaggc aaaaaacggg 4260cacaacctca atggagtgat gcaacctgcc tggagtaaat gatgacacaa ggcaattgac 4320ccacgcatgt atctatctca ttttcttaca ccttctatta ccttctgctc tctctgattt 4380ggaaaaagct gaaaaaaaag gttgaaacca gttccctgaa attattcccc tacttgacta 4440ataagtatat aaagacggta ggtattgatt gtaattctgt aaatctattt cttaaacttc 4500ttaaattcta cttttatagt tagtcttttt tttagtttta aaacaccaag aacttagttt 4560cgatccccgc gtgcttggcc ggccgtatcc ccgcgtgctt ggccggccgt atgtctcaga 4620acgtttacat tgtatcgact gccagaaccc caattggttc attccagggt tctctatcct 4680ccaagacagc agtggaattg ggtgctgttg ctttaaaagg cgccttggct aaggttccag 4740aattggatgc atccaaggat tttgacgaaa ttatttttgg taacgttctt tctgccaatt 4800tgggccaagc tccggccaga caagttgctt tggctgccgg tttgagtaat catatcgttg 4860caagcacagt taacaaggtc tgtgcatccg ctatgaaggc aatcattttg ggtgctcaat 4920ccatcaaatg tggtaatgct gatgttgtcg tagctggtgg ttgtgaatct atgactaacg 4980caccatacta catgccagca gcccgtgcgg gtgccaaatt tggccaaact gttcttgttg 5040atggtgtcga aagagatggg ttgaacgatg cgtacgatgg tctagccatg ggtgtacacg 5100cagaaaagtg tgcccgtgat tgggatatta ctagagaaca acaagacaat tttgccatcg 5160aatcctacca aaaatctcaa aaatctcaaa aggaaggtaa attcgacaat gaaattgtac 5220ctgttaccat taagggattt agaggtaagc ctgatactca agtcacgaag gacgaggaac 5280ctgctagatt acacgttgaa aaattgagat ctgcaaggac tgttttccaa aaagaaaacg 5340gtactgttac tgccgctaac gcttctccaa tcaacgatgg tgctgcagcc gtcatcttgg 5400tttccgaaaa agttttgaag
gaaaagaatt tgaagccttt ggctattatc aaaggttggg 5460gtgaggccgc tcatcaacca gctgatttta catgggctcc atctcttgca gttccaaagg 5520ctttgaaaca tgctggcatc gaagacatca attctgttga ttactttgaa ttcaatgaag 5580ccttttcggt tgtcggtttg gtgaacacta agattttgaa gctagaccca tctaaggtta 5640atgtatatgg tggtgctgtt gctctaggtc acccattggg ttgttctggt gctagagtgg 5700ttgttacact gctatccatc ttacagcaag aaggaggtaa gatcggtgtt gccgccattt 5760gtaatggtgg tggtggtgct tcctctattg tcattgaaaa gatatgatta cgttctgcga 5820ttttctcatg atctttttca taaaatacat aaatatataa atggctttat gtataacagg 5880cataatttaa agttttattt gcgattcatc gtttttcagg tactcaaacg ctgaggtgtg 5940ccttttgact tacttttccg ccttggcaag ctggccgggt gatacttgca caagttccac 6000taattactga catttgtggt attaactcgt ttgactgctc tacaattgta ggatgttaat 6060caatgtcttg gctgcctaac ctgcaggccg cgagcgccga tatgctatgt aatagacaat 6120aaaaccatgt ttatataaaa aaaattcaaa atagaaaacg attctgtaca aggagtattt 6180tttttttgtt ctagtgtgtt tatattatcc ttggctaaga ggcactgcgt atacttcaag 6240gtacccctgt gttttgaaaa aaaacaacag taaaatagga actccgcgag gttcaggaac 6300ctgaaacaaa atcaataaaa acattatatg cgtttcgaac aaaattaaag aaaaagaata 6360aatatagatt aaaaaaaaaa agaagaaatt aaaagaattt ctactaaatc ccaattgtta 6420tatatttgtt aaatgccaaa aaagtttata aaaaatttag aatgtataaa taataataaa 6480ctaagtaacg cgatcgccga cgccgccgat atctccctcg ccagcggccg ccttatggct 6540aagaatgttg gaattttggc catggacatc tacttcccac caacttgtgt tcagcaggag 6600gctttagaag cacatgacgg agcctcaaag ggtaagtaca caatcggatt aggacaggat 6660tgcttagcat tctgcactga attggaggac gtcatctcaa tgtctttcaa cgccgtcacc 6720tcattgttag agaagtacaa aatcgaccca aaccagatcg gaaggttgga agtcggttct 6780gaaaccgtca tcgacaagtc taaatcaatc aagactttcg ttatgcagtt gttcgaaaag 6840tgcggtaata ctgacgtcga gggtgtagac tctactaacg cttgttatgg tggtaccgca 6900gctttattga actgcgtaaa ctgggttgag tcaaactcat gggatggtag gtacggatta 6960gtcatttgca ccgattctgc cgtctacgcc gagggtccag caaggccaac cggtggagct 7020gcagctattg ctatgttaat cggaccagat gcccctatag tcttcgagtc taagttgagg 7080ggttcacaca tccctaacgt ctacgacttc tacaagccaa acttggcctc 
agagtatcca 7140gttgtcgacg gaaagttatc tcagacatgc tacttgatgg ccttagattc atgttacaag 7200cacttatgca acaagttcga aaagttggag ggaaaggagt tctcaattaa cgacgccgac 7260tacttcgttt ttcactctcc atacaacaaa ttggtccaga agtcattcgc caggttattg 7320tacaacgatt ttttgagaaa cgcatcatct atcgatgagg ccgccaagga gaaattcacc 7380ccatattctt ctttgtcatt ggacgagtct taccagtcta gggacttgga gaaggtatca 7440cagcaattgg ctaaaacctt ctatgacgcc aaagttcagc caaccacctt ggtccctaaa 7500caggtcggaa atatgtatac tgcatctttg tatgccgcct ttgcctcttt gatccacaac 7560aagcacaacg atttagtcgg aaaaagggtt gtcatgtttt cttacggtgc cggatctact 7620gccactatgt tctcattgag gttatgcgaa aaccagtcac cattttcatt gtctaacatc 7680gcctcagtca tggacgtagg tgtctcacct gagaagttcg tagaaaccat gaagttgatg 7740gagcacagat acggtgccaa agaattcgtc acttcaaaag agggaatctt ggatttgttg 7800gccccaggaa cctactattt gaaggaggtc gactctttgt acagaaggtt ctatggaaag 7860aagggagacg acggatctgt cgcaaacggt cagtaaatcg gcggcgtcgg cgatcgcgtt 7920aaggcggccg ctggcgaggg agatatttca acctgggcct aacagtaaag atatcctcct 7980caaaactggt gcacttaatc gctgaatttg ttctggcttc tcttcttttt ctttattccc 8040cccatgggcc aaaaaaaata gtactatcag gaatttggcg ccgggtcacg atatacgtgt 8100acagtgacct aggcgacgcc acaaggaaaa aggaaaaaaa cagaaaaaac aacaaaaact 8160aaaacaaaca cgaaaacttt aatagatcta agtgaagtag tggtgaggca attggagtga 8220catagcagct actacaacta caaaaaaggc gcgccacggt cgtgcggata tgaaagaggt 8280cgttatagct tctgccgtca ggaccgccat cggatcttac ggtaagtcat taaaggacgt 8340ccctgccgtt gatttaggag ccaccgcaat taaagaggcc gttaaaaagg caggtataaa 8400gccagaggac gtcaacgagg tcatcttggg aaatgtctta caagccggat taggtcaaaa 8460cccagcaaga caagcatcat tcaaagccgg tttacctgtc gagatacctg caatgaccat 8520caacaaggtt tgcggttcag gattaaggac cgtttcttta gcagcacaga tcattaaggc 8580tggagatgca gacgttatca ttgctggtgg tatggaaaac atgtcaagag ccccatactt 8640ggctaataac gccaggtggg gatataggat gggaaacgcc aagtttgtcg acgaaatgat 8700tactgacgga ttgtgggacg ccttcaatga ctatcacatg ggtataaccg cagaaaacat 8760tgccgagagg tggaatatct caagagaaga acaggatgag tttgcattgg cctcacagaa 8820aaaagcagag gaggcaataa 
agtcaggtca gtttaaggat gaaatcgtcc cagtcgtcat 8880caagggaaga aagggtgaga cagttgtcga caccgacgaa caccctagat ttggttcaac 8940catcgaggga ttagcaaagt tgaagccagc cttcaagaaa gacggaaccg taaccgccgg 9000taatgcatct ggattgaacg attgcgcagc agttttggtc ataatgtcag ccgagaaagc 9060taaggagttg ggtgtcaagc cattggcaaa aattgtttca tacggatcag ccggtgtcga 9120ccctgccatc atgggttacg gaccttttta cgccaccaag gctgcaatcg aaaaggccgg 9180ttggaccgta gatgaattgg atttgatcga gtcaaacgag gcctttgccg cccaatcatt 9240ggctgtcgcc aaggacttga agttcgacat gaacaaggtc aacgtcaacg gtggtgccat 9300cgcattgggt caccctatcg gagcctctgg tgccaggatc ttggttacct tggtccacgc 9360catgcagaag agggacgcaa agaagggttt ggccaccttg tgcatcggtg gaggtcaggg 9420aacagctatc ttgttagaga aatgcagccc ctcagccccc ctagcgtcga ataaaagaca 9480ttggtacatg atatcaaaca gaattttaac atttcttgat ccagtttgta aacaaaacaa 9540acaatttttc taccatttaa cttcatacca tcggcgagag ccgaacagga aaaaaaagaa 9600gtctccggtt atcgtaagca gtatcaaata ataagaatgt atgtgtgtgc aatttgttat 9660acccacgaag aagtgcgcag tagagttaga aaaccaactg agtaatcttt actcccgaca 9720atcgtccaat aatcctcttg ttgctaggaa cgtgatgatg gatttcgttt gaaatccgga 9780cggaaaactc aaaagaagtc caaccaccaa ccattttcga gcctcaagaa tctctaagca 9840ggtttcttta ctaaggggat ggcctttctg tcctggacat tttttccttc cttttttcat 9900ttccttgaaa ggaacagatt ttttttgact tttgccacac agctgcacta tctcaacccc 9960ttttacattt taagttttcg ggttgaatgg ccggtgttta aaccccagcg cctggcggg 10019275027DNAArtificial SequenceSynthetic Mat alpha FcphI target site cassette 27tgatgtctgg gttttgtttg ggatgcaatt tattgcttcc caatgtagaa aagtacatca 60tatgaaacaa cttaaactct taactacttc ttttaacctt cactttttat gaaatgtatc 120aaccatatat aataacttaa tagacgacat tcacaatatg tttacttcga agcctgcttt 180caaaattaag aacaaagcat ccaaatcata cagaaacaca gcggtttcaa aaaagctgaa 240agaaaaacgt ctagctgagc atgtgaggcc aagctgcttc aatattattc gaccactcaa 300gaaagatatc cagattcctg ttccttcctc tcgattttta aataaaatcc aaattcacag 360gatagcgtct ggaagtcaaa atactcagtt tcgacagttc aataagacat ctataaaatc 420ttcaaagaaa tatttaaact catttatggc ttttagagca tattactcac 
agtttggctc 480cggtgtaaaa caaaatgtct tgtcttctct gctcgctgaa gaatggcacg cggacaaaat 540gcagcacgga atatgggact acttcgcgca acagtataat tttataaacc ctggttttgg 600ttttgtagag tggttgacga ataattatgc tgaagtacgt ggtgacggat attgggaaga 660tgtgtttgta catttggcct tatagagtgt ggtcgtggcg gaggttgttt atctttcgag 720tactgaatgt tgtcagtata gctatcctat ttgaaactcc ccatcgtctt gctcttgttc 780ccaatgtttg tttatacact catatggcta tacccttatc tacttgcctc ttttgtttat 840gtctatgtat ttgtataaaa tatgatatta ctcagactca agcaaacaat caatttgtga 900gcgttgcgct cgtgcatcga gaagttaaga ttatatgaat aactaaatac taaatagaaa 960tgtaaataca gtgagaacaa aacaaaaaaa aacgaacaga gaaactaaat ccacattaat 1020tgagagttct atctattaga aaatgcaaac tccaactaaa tgggaaaaca gataacctct 1080tttatttttt tttaatgttt gatattcgag tctttttctt ttgttaggtt tatattcatc 1140atttcaatga ataaaagaag cttcttattt tggttgcaaa gaatgaaaaa aaaggatttt 1200ttcatacttc taaagcttca attataacca aaaattttat aaatgaagag aaaaaatcta 1260gtagtatcaa gttaaactta gaaaaactca tcgagcatca aatgaaactg caatttattc 1320atatcaggat tatcaatacc atatttttga aaaagccgtt tctgtaatga aggagaaaac 1380tcaccgaggc agttccatag gatggcaaga tcctggtatc ggtctgcgat tccgactcgt 1440ccaacatcaa tacaacctat taatttcccc tcgtcaaaaa taaggttatc aagtgagaaa 1500tcaccatgag tgacgactga atccggtgag aatggcaaaa gcttatgcat ttctttccag 1560acttgttcaa caggccagcc attacgctcg tcatcaaaat cactcgcatc aaccaaaccg 1620ttattcattc gtgattgcgc ctgagcgaga cgaaatacgc gatcgctgtt aaaaggacaa 1680ttacaaacag gaatcgaatg caaccggcgc aggaacactg ccagcgcatc aacaatattt 1740tcacctgaat caggatattc ttctaatacc tggaatgctg ttttgccggg gatcgcagtg 1800gtgagtaacc atgcatcatc aggagtacgg ataaaatgct tgatggtcgg aagaggcata 1860aattccgtca gccagtttag tctgaccatc tcatctgtaa catcattggc aacgctacct 1920ttgccatgtt tcagaaacaa ctctggcgca tcgggcttcc catacaatcg atagattgtc 1980gcacctgatt gcccgacatt atcgcgagcc catttatacc catataaatc agcatccatg 2040ttggaattta atcgcggcct cgaaacgtga gtcttttcct tacccatttt taatgttact 2100tctcttgcag ttagggaact ataatgtaac tcaaaataag attaaacaaa ctaaaataaa 2160aagaagttat acagaaaaac ccatataaac 
cagtactaat ccataataat aatacacaaa 2220aaaactatca aataaaacca gaaaacagat tgaatagaaa aattttttcg atctcctttt 2280atattcaaaa ttcgatatat gaaaaaggga actctcagaa aatcaccaaa tcaatttaat 2340tagatttttc ttttccttct agcgttggaa agaaaaattt ttcttttttt ttttagaaat 2400gaaaaatttt tgccgtagga atcaccgtat aaaccctgta taaacgctac tctgttcacc 2460tgtgtaggct atgattgacc cagtgttcat tgttattgcg agagagcggg agaaaagaac 2520cgatacaaga gatccatgct ggtatagttg tctgtccaac actttgatga acttgtagga 2580cgatgatgtg tattactagt gtcgacatgc gtccatcttt acagtcctgt cttattgttc 2640ttgatttgtg ccccgtaaaa tactgttact tggttctggc gaggtattgg atagttcctt 2700tttataaagg ccatgaagct ttttctttcc aatttttttt ttttcgtcat tatagaaatc 2760attacgaccg agattcccgg gtaataactg atataattaa attgaagctc taatttgtga 2820gtttagtata catgcattta cttataatac agttttttag ttttgctggc cgcatcttct 2880caaatatgct tcccagcctg cttttctgta acgttcaccc tctaccttag catcccttcc 2940ctttgcaaat agtcctcttc caacaataat aatgtcagat cctgtagaga ccacatcatc 3000cacggttcta tactgttgac ccaatgcgtc tcccttgtca tctaaaccca caccgggtgt 3060cataatcaac caatcgtaac cttcatctct tccacccatg tctctttgag caataaagcc 3120gataacaaaa tctttgtcac tcttcgcaat gtcaacagta cccttagtat attctccagt 3180agatagggag cccttgcatg acaattctgc taacatcaaa aggcctctag gttcctttgt 3240tacttcttct gccgcctgct tcaaaccgct aacaatacct gggcccacca caccgtgtgc 3300attcgtaatg tctgcccatt ctgctattct gtatacaccc gcagagtact gcaatttgac 3360tgtattacca atgtcagcaa attttctgtc ttcgaagagt aaaaaattgt acttggcgga 3420taatgccttt agcggcttaa ctgtgccctc catggaaaaa tcagtcaaga tatccacatg 3480tgtttttagt aaacaaattt tgggacctaa tgcttcaact aactccagta attccttggt 3540ggtacgaaca tccaatgaag cacacaagtt tgtttgcttt tcgtgcatga tattaaatag 3600cttggcagca acaggactag gatgagtagc agcacgttcc ttatatgtag ctttcgacat 3660gatttatctt cgtttcctgc aggtttttgt tctgtgcagt tgggttaaga atactgggca 3720atttcatgtt tcttcaacac cacatatgcg tatatatacc aatctaagtc tgtgctcctt 3780ccttcgttct tccttctgct cggagattac cgaatcaaaa aaatttcaaa gaaaccggaa 3840tcaaaaaaaa gaacaaaaaa aaaaaagatg aattgaaaag ctttatggac cctgaaacca 
3900cagccacatt aaccttcttt gatggtcaaa acttatcctt caccataaat atgcctcgca 3960aaaaaggtaa ttaacatata tagaattaca ttatttatga aatatcatca ctatctctta 4020gcatctttaa tccttttcta catcagataa cttcggtttg ttatcatcgt ctgtattgtc 4080atcaattggc gcagtagcct caatttcaac gtcgtttgac tctggtgttt gttcatgtgc 4140agatccatga gatgatgaac ttgtgagcgt tgcgctcgtg catccggagg ttgtttatct 4200ttcgagtact gaatgttgtc agtatagcta tcctatttga aactccccat cgtcttgctc 4260ttgttcccaa tgtttgttta tacactcata tggctatacc cttatctact tgcctctttt 4320gtttatgtct atgtatttgt ataaaatatg atattactca gactcaagca aacaatcaat 4380tcttagcatc attctttgtt cttatcttaa ccataaacga tcttgatgtg acttttgtaa 4440tttgaacgaa ttggctatac gggacggatg acaaatgcac cattactcta ggttgttgtt 4500ggatcttaac aaaccgtaaa ggtaaactgc ccatgcggtt cacatgactt ttgactttcc 4560tttgtttgct agttaccttc ggcttcacaa tttgtttttc cacttttcta acaggtttat 4620cacctttcaa acttatcttt atcttattcg ccttcttggg tgcctccaca gtagaggtta 4680cttccttttt aatatgtact tttaggatac tttcacgctt tataacaata tcaagtttac 4740cttcttcatt actattcatc ttcgccacaa gtcttctctc ccttggtgtt tccaatctaa 4800ctacaaaact gttgattagg gtgtacatca ccctaacaag atcatgtatt tgcttcctct 4860ggtacaagct aagaacaggt aaattcaaaa catcccagag taatatcttc aaagggctat 4920accctttaaa catatctcgg catatttgta ttaacccact aatattttga cggccaatct 4980tttctatttt tattttcata tcatcgacgt aatgaccact taaaaac 5027
SEQ ID NO: 28, length 6126, DNA, Artificial Sequence, Synthetic ADS_GAL80 donor cassette
gcctgtctac aggataaaga cgggtcggat acctgcacaa gcaatttggc acctgcatac 60cccatttccc cagtagataa cttcaacaca cacatcaatg tccctcacca gtttatttcc 120aaaagagacg ctttttacta cctgactaga ttttcatttt gtttcttttg gattgcgctt 180gcctttgtag gtgtgtcgtt tatcctttac gttttgactt ggtgctcgaa gatgctttca 240gagatggtgc ttatcctcat gtcttttggg tttgtcttca atacggcagc cgttgtcttg 300caaacggccg cctctgccat ggcaaagaat gctttccatg acgatcatcg tagtgcccaa 360ttgggtgcct ctatgatggg tttaaacgta tggcttgggc aagtgtcttt ttatgtatcg 420tggaatttat cctgctggtc ttctggtctg ttagggcaag gttggcctct acttactcca 480tcgacaattc aagatacaga acctcctcca gatggaatcc cttccataga 
gagaaggagc 540aagcaactga cccaatattg actgccactg gacctgaaga catgcaacaa agtgcaagca 600tagtggggcc ttcttccaat gctaatccgg tcactgccac tgctgctacg gaaaaccaac 660ctaaaggtat taacttcttc actataagaa aatcacacga gcgcccggac gatgtctctg 720tttaaatggc gcaagttttc cgctttgtaa tatatattta tacccctttc ttctctcccc 780tgcaatataa tagtttaatt ctaatattaa taatatccta tattttcttc atttaccggc 840gcactctcgc ccgaacgacc tcaaaatgtc tgctacattc ataataacca aaagctcata 900actttttttt ttgaacctga atatatatac atcacatatc actgctggtc ctcttcgagc 960gtcccaaaac cttctcaagc aaggttttca gtataatgtt acatgcgtac acgcgtctgt 1020acagaaaaaa aagaaaaatt tgaaatataa ataacgttct taatactaac ataactataa 1080aaaaataaat agggacctag acttcaggtt gtctaactcc ttccttttcg gttagagcgg 1140atcttagcta gcttaaatag acatgggata aactaataaa gacttaatta aatgcttgta 1200ctcatctccc atacgagtga agttatcttt acccgcgtac tgcacttcca ggaattgaca 1260aagatagata actgccatca gtaatggtct gggtatgttt ttagtggtca agtattctct 1320gtttatgtcc ttccaaacgt cttcgacttc cttgtatatg agggtttgtg cgtattcctc 1380atttacatta tattccttca tataagattc tagagaagat gatgagtgct ttctctcctg 1440ttctgccttg tgcgtcatca agtcgtttag acgacgaccc aaaataccag aataccggaa 1500tagtggtggc gctgacacgg cccattcaac tgattcttta gtgaaaatat ctgacatccc 1560caggtaacat gtggttgtca gtaggttcgc tccaccagtt ataataacaa ctggatcgtg 1620ttcttcagtg gtcggtatgt gcccttcgtt tgcccacttt gcttcgacca tgagattccg 1680tacaaattcc ttaacaaact cttttccgca attaaataaa tctgttctcc cttcttttgc 1740aagaaattcc tccatttctg tatacgtatc catgaataat ttatagatag gcttcatgta 1800ttctggcagc gtatcaaggc aggtaatcga ccaacgttct acagcttcag tgaaaatttt 1860taattcttca tatgttccat aagcatcgta agtatcatca atcagggtta taacggctac 1920cactttggtg aagaaaactc gtgctcttga atattgtggt tcataaccag aacctaaacc 1980ccagaaataa cactctacaa tgcggtcgcg caaacaaggt gcgttctttt ttatatcaaa 2040tgctttccac catttacaga cgtggctaag ctcctccttg tgcaaagatt gcagcaggtt 2100gaattctaat tttgctaatt ttaaaagcgt cttattatga gagtcttgtt gttggtagaa 2160aggaatatac tgagcagctt ctatgcgggg aagacgcttc cataaaggtt gttttaatgc 2220tctttgtatt tcggtaaaca aagccggatt 
agtggagaat gcatccttag tcattataga 2280taatcgactt ctagtaaacc caagcgcgtc ttccaaaata atttcgcctg gaacccgcat 2340ggaagtcgcc tcatataact ctaataaccc ttccacatca ttagccaatg attgtttaaa 2400agcaccgttt ttatctttgt aattattgaa aacatcacag gtcacgtagt agccttgctt 2460cctcatcaag cgaaaccata gtgaactgcg atctccattc cagttgtctc cgtatgtctc 2520gtaaatacat tgaagagcat ggtcaatttc cctttcaaaa tggtatggaa ttcctagtct 2580ctgtatttca tctattaatt ttaacaaatt agcatgcttc ataggaatgt ctaaggcttc 2640cttcaggagc tgtcttactt ctttcttcag atcgtttaca atttgttcga ccccttgttc 2700tacttgtttc tcataaatga gaaactgatc accccaaata gagggaggga aattggctat 2760aggtctaatc ggtttctcct cggttagtgc catggatcct tcagttacag taaatgtttt 2820agttgcatcg tcataggtcc attcaccatc aacaccatta tcattagcat attgcttaaa 2880gaccttttcg gctgttgctg catctactgc ttcggtagta gtttcaccct tcaaagtttt 2940accattcaaa attaatttat attgacccat ttatattgaa ttttcaaaaa ttcttacttt 3000ttttttggat ggacgcaaag aagtttaata atcatattac atggcattac caccatatac 3060atatccatat ctaatcttac ttatatgttg tggaaatgta aagagcccca ttatcttagc 3120ctaaaaaaac cttctctttg gaactttcag taatacgctt aactgctcat tgctatattg 3180aagtacggat tagaagccgc cgagcgggcg acagccctcc gacggaagac tctcctccgt 3240gcgtcctcgt cttcaccggt cgcgttcctg aaacgcagat gtgcctcgcg ccgcactgct 3300ccgaacaata aagattctac aatactagct tttatggtta tgaagaggaa aaattggcag 3360taacctggcc ccacaaacct tcaaattaac gaatcaaatt aacaaccata ggatgataat 3420gcgattagtt ttttagcctt atttctgggg taattaatca gcgaagcgat gatttttgat 3480ctattaacag atatataaat ggaaaagctg cataaccact ttaactaata ctttcaacat 3540tttcagtttg tattacttct tattcaaatg tcataaaagt atcaacaaaa aattgttaat 3600atacctctat actttaacgt caaggagaaa aaactataat gtcattaccg ttcttaactt 3660ctgcaccggg aaaggttatt atttttggtg aacactctgc tgtgtacaac aagcctgccg 3720tcgctgctag tgtgtctgcg ttgagaacct acctgctaat aagcgagtca tctgcaccag 3780atactattga attggacttc ccggacatta gctttaatca taagtggtcc atcaatgatt 3840tcaatgccat caccgaggat caagtaaact cccaaaaatt ggccaaggct caacaagcca 3900ccgatggctt gtctcaggaa ctcgttagtc ttttggatcc gttgttagct caactatccg 
3960aatccttcca ctaccatgca gcgttttgtt tcctgtatat gtttgtttgc ctatgccccc 4020atgccaagaa tattaagttt tctttaaagt ctactttacc catcggtgct gggttgggct 4080caagcgcctc tatttctgta tcactggcct tagctatggc ctacttgggg gggttaatag 4140gatctaatga cttggaaaag ctgtcagaaa acgataagca tatagtgaat caatgggcct 4200tcataggtga aaagtgtatt cacggtaccc cttcaggaat agataacgct gtggccactt 4260atggtaatgc cctgctattt gaaaaagact cacataatgg aacaataaac acaaacaatt 4320ttaagttctt agatgatttc ccagccattc caatgatcct aacctatact agaattccaa 4380ggtctacaaa agatcttgtt gctcgcgttc gtgtgttggt caccgagaaa tttcctgaag 4440ttatgaagcc aattctagat gccatgggtg aatgtgccct acaaggctta gagatcatga 4500ctaagttaag taaatgtaaa ggcaccgatg acgaggctgt agaaactaat aatgaactgt 4560atgaacaact attggaattg ataagaataa atcatggact gcttgtctca atcggtgttt 4620ctcatcctgg attagaactt attaaaaatc tgagcgatga tttgagaatt ggctccacaa 4680aacttaccgg tgctggtggc ggcggttgct ctttgacttt gttacgaaga gacattactc 4740aagagcaaat tgacagtttc aaaaagaaat tgcaagatga ttttagttac gagacatttg 4800aaacagactt gggtgggact ggctgctgtt tgttaagcgc aaaaaatttg aataaagatc 4860ttaaaatcaa atccctagta ttccaattat ttgaaaataa aactaccaca aagcaacaaa 4920ttgacgatct attattgcca ggaaacacga atttaccatg gacttcataa gctaatttgc 4980gataggcatt atttattagt tgtttttaat cttaactgtg tatgaagttt tatgtaataa 5040agatagaaag agaaacaaaa aaaaattttt cgtagtatca attcagcttt cgaagacaga 5100atgaaattta agcagaccat tcatcttgcc ctgtgcttgg cccccagtgc agcgaacgtt 5160ataaaaacga atactgagta tatatctatg taaaacaacc atatcatttc ttgttctgaa 5220ctttgtttac ctaactagtt ttaaatttcc ctttttcgtg catgcgggtg ttcttattta

5280ttagcatact acatttgaaa tatcaaattt ccttagtaga aaagtgagag aaggtgcact 5340gacacaaaaa ataaaatgct acgtataact gtcaaaactt tgcagcagcg ggcatccttc 5400catcatagct tcaaacatat tagcgttcct gatcttcata cccgtgctca aaatgatcaa 5460acaaactgtt attgccaaga aataaacgca aggctgcctt caaaaactga tccattagat 5520cctcatatca agcttcctca tagaacgccc aattacaata agcatgtttt gctgttatca 5580ccgggtgata ggtttgctca accatggaag gtagcatgga atcataattt ggatactaat 5640acaaatcggc catataatgc cattagtaaa ttgcgctccc atttaggtgg ttctccaggc 5700aaatttgaat actaataaat gcggtgcatt tgcaaaatga atttattcca aggccaaaac 5760aacacgatga atggctttat ttttttgtta ttcctgacat gaagctttat gtaattaagg 5820aaacggacat cgaggaattt gcatcttttt tagatgaagg agctattcaa gcaccaaagc 5880tatccttcca ggattattta agcggtaagg ccaaggcttc ccaacaggtt catgaagtgc 5940atcatagaaa gcttacaagg tttcagggtg aaacttttct aagagattgg aacttagtct 6000gtgggcatta taagagagat gctaagtgtg gagaaatggg acccgacata attgcagcat 6060ttcaagatga aaagcttttt cctgagaata atctagcctt aatttctcat attgggggtc 6120atattt 6126
SEQ ID NO: 29, length 9550, DNA, Artificial Sequence, Synthetic ADS_HXT3 donor cassette
accggtatat caaatggcgg tgtagtttga aaagtacact gtatgtccat taataaaatt 60acgccgaaga acactgacta gctataaatt ctcttccggc gtccaatttc aacctaaggc 120aagtattgtt aatcaataat gcgtgttggt aaatatgaag cccgatgttg attaagaaga 180attattcgta taagaattaa gcaagcaaaa ttaaggaaaa ttttttcttt cctattcgtc 240atcgcagaca gccttcatct tctcgagata acacctggag gaggagcaat gaaatgaaag 300gaaaaaaaaa tactttcttt ttcttgaaaa aagaaaaaaa ttgtaagatg agctattcgc 360ggaacattct agctcgtttg catcttcttg catttggtag gttttcaata gttcggtaat 420attaacggat acctactatt atcccctagt aggctctttt cacggagaaa ttcgggagtg 480ttttttttcc gtgcgcattt tcttagctat attcttccag cttcgcctgc tgcccggtca 540tcgttcctgt cacgtagttt ttccggattc gtccggctca tataataccg caataaacac 600ggaatatctc gttccgcgga ttcggttaaa ctctcggtcg cggattatca cagagaaagc 660ttcgtggaga atttttccag attttccgct ttccccgatg ttggtatttc cggaggtcat 720tatactgacc gccattataa tgactgtaca acgaccttct ggagaaagaa acaactcaat 780aacgatgtgg gacattgggg gcccactcaa aaaatctggg 
gactatatcc ccagagaatt 840tctccagaag agaagaaaag tcaaagtttt ttttcgcttg ggggttgcat ataaacttcg 900agcgtcccaa aaccttctca agcaaggttt tcagtataat gttacatgcg tacacgcgtc 960tgtacagaaa aaaaagaaaa atttgaaata taaataacgt tcttaatact aacataacta 1020taaaaaaata aatagggacc tagacttcag gttgtctaac tccttccttt tcggttagag 1080cggatcttag atagacatag ggtagaccaa caaagactta attaagtgct tatattcgtc 1140acccattctg gtgaaattgt ctttaccagc gtattgaact tctaagaatt gacataaata 1200gataacagcc attaacaatg gacgtggaat atttttggtg gtcaagtatt ctctgttaat 1260atctttccaa acgtcttcaa cttccttgta aatcaaagtt tgagcgtatt cttcgttaac 1320gttgtattcc ttcatgtaag attctaagga agaggaggag tgctttcttt cttgctcagc 1380cttgtgagtc atcaaatcgt ttaatcttct acccaaaata ccagagtatc tgaacaaagg 1440tggagcggaa acggcccatt caacagattc cttggtgaaa atatcagaca tacctaagta 1500acaagtagta gtcaacaagt tagcaccacc agtaatgata acgactggat cgtgttcttc 1560agtagttgga atatgacctt cgttagccca tttagcttca accatcaagt tacgaacgaa 1620ttccttaacg aattccttac cacaattgaa caagtcggta cgaccttcct tagccaagaa 1680ttcttccatt tcggtgtaag tgtccatgaa caacttataa attggcttca tgtattctgg 1740taaagtatct aaacaggtaa tggaccatct ttcgacagct tcagtgaaaa tcttcaattc 1800ttcgtaggta ccgtaagcgt cgtaagtgtc gtcaatcaag gtaataacag caaccacctt 1860agtaaagaaa actctagctc tagagtattg tggttcgtaa ccggaaccca aaccccagaa 1920gtaacattcg acaattctat ctctcaaaca tggagcgttc ttcttaatat cgaaggcttt 1980ccaccactta caaacgtgag acaattcttc cttgtgcaag gattgcaaca agttaaattc 2040caatttagcc aatttcaaca aggttttgtt gtgagagtct tgttgttggt agaatgggat 2100gtattgagca gcttcgattc ttggcaaacg cttccacaat ggttgtttca aagctctttg 2160gatttcagtg aacaaagctg ggttagtaga gaaggcatct ttagtcatga tggacaatct 2220agatctagta aaacccaaag cgtcttccaa gatgatttca ccaggaactc tcatggaggt 2280ggcttcatac aattctaaca aaccttcaac atcgttggcc aaagattgct taaaagcacc 2340attcttatct ttgtagttgt tgaaaacgtc acaagtaacg tagtaacctt gctttctcat 2400taatctgaac cataaggaag atctgtcacc gttccagtta tcaccgtaag tttcgtaaat 2460acattgcaaa gcgtgatcaa tttctctttc gaaatggtat ggaataccta atctttggat 2520ttcgtcgatt 
aatttcaata agttagcgtg cttcattggg atgtccaaag cttccttcaa 2580caattgtcta acttccttct tcaaatcgtt aacgatttgt tcaacacctt gctcaacttg 2640cttttcgtag atcaagaatt ggtcacccca gatagaaggt ggaaaattag caattggtct 2700gattggtttt tcttcagtca aagccatgga tccttcagtt acagtaaatg ttttagttgc 2760atcgtcatag gtccattcac catcaacacc attatcatta gcatattgct taaagacctt 2820ttcggctgtt gctgcatcta ctgcttcggt agtagtttca cccttcaaag ttttaccatt 2880caaaattaat ttatattgac ccatttatat tgaattttca aaaattctta cttttttttt 2940ggatggacgc aaagaagttt aataatcata ttacatggca ttaccaccat atacatatcc 3000atatctaatc ttacttatat gttgtggaaa tgtaaagagc cccattatct tagcctaaaa 3060aaaccttctc tttggaactt tcagtaatac gcttaactgc tcattgctat attgaagtac 3120ggattagaag ccgccgagcg ggcgacagcc ctccgacgga agactctcct ccgtgcgtcc 3180tcgtcttcac cggtcgcgtt cctgaaacgc agatgtgcct cgcgccgcac tgctccgaac 3240aataaagatt ctacaatact agcttttatg gttatgaaga ggaaaaattg gcagtaacct 3300ggccccacaa accttcaaat taacgaatca aattaacaac cataggatga taatgcgatt 3360agttttttag ccttatttct ggggtaatta atcagcgaag cgatgatttt tgatctatta 3420acagatatat aaatggaaaa gctgcataac cactttaact aatactttca acattttcag 3480tttgtattac ttcttattca aatgtcataa aagtatcaac aaaaaattgt taatatttta 3540attctgctgt aacccgtaca tgcccaaaat agggggcggg ttacacagaa tatataacat 3600cgtaggtgtc tgggtgaaca gtttattcct ggcatccact aaatataatg gagcccgctt 3660tttaagctgg catccagaaa aaaaaagaat cccagcacca aaatattgtt ttcttcacca 3720accatcagtt cataggtcca ttctcttagc gcaactacag agaacagggg cacaaacagg 3780caaaaaacgg gcacaacctc aatggagtga tgcaacctgc ctggagtaaa tgatgacaca 3840aggcaattga cccacgcatg tatctatctc attttcttac accttctatt accttctgct 3900ctctctgatt tggaaaaagc tgaaaaaaaa ggttgaaacc agttccctga aattattccc 3960ctacttgact aataagtata taaagacggt aggtattgat tgtaattctg taaatctatt 4020tcttaaactt cttaaattct acttttatag ttagtctttt ttttagtttt aaaacaccaa 4080gaacttagtt tcgatccccg cgtgcttggc cggccgtatc cccgcgtgct tggccggccg 4140tatgtctcag aacgtttaca ttgtatcgac tgccagaacc ccaattggtt cattccaggg 4200ttctctatcc tccaagacag cagtggaatt gggtgctgtt 
gctttaaaag gcgccttggc 4260taaggttcca gaattggatg catccaagga ttttgacgaa attatttttg gtaacgttct 4320ttctgccaat ttgggccaag ctccggccag acaagttgct ttggctgccg gtttgagtaa 4380tcatatcgtt gcaagcacag ttaacaaggt ctgtgcatcc gctatgaagg caatcatttt 4440gggtgctcaa tccatcaaat gtggtaatgc tgatgttgtc gtagctggtg gttgtgaatc 4500tatgactaac gcaccatact acatgccagc agcccgtgcg ggtgccaaat ttggccaaac 4560tgttcttgtt gatggtgtcg aaagagatgg gttgaacgat gcgtacgatg gtctagccat 4620gggtgtacac gcagaaaagt gtgcccgtga ttgggatatt actagagaac aacaagacaa 4680ttttgccatc gaatcctacc aaaaatctca aaaatctcaa aaggaaggta aattcgacaa 4740tgaaattgta cctgttacca ttaagggatt tagaggtaag cctgatactc aagtcacgaa 4800ggacgaggaa cctgctagat tacacgttga aaaattgaga tctgcaagga ctgttttcca 4860aaaagaaaac ggtactgtta ctgccgctaa cgcttctcca atcaacgatg gtgctgcagc 4920cgtcatcttg gtttccgaaa aagttttgaa ggaaaagaat ttgaagcctt tggctattat 4980caaaggttgg ggtgaggccg ctcatcaacc agctgatttt acatgggctc catctcttgc 5040agttccaaag gctttgaaac atgctggcat cgaagacatc aattctgttg attactttga 5100attcaatgaa gccttttcgg ttgtcggttt ggtgaacact aagattttga agctagaccc 5160atctaaggtt aatgtatatg gtggtgctgt tgctctaggt cacccattgg gttgttctgg 5220tgctagagtg gttgttacac tgctatccat cttacagcaa gaaggaggta agatcggtgt 5280tgccgccatt tgtaatggtg gtggtggtgc ttcctctatt gtcattgaaa agatatgatt 5340acgttctgcg attttctcat gatctttttc ataaaataca taaatatata aatggcttta 5400tgtataacag gcataattta aagttttatt tgcgattcat cgtttttcag gtactcaaac 5460gctgaggtgt gccttttgac ttacttttcc gccttggcaa gctggccggg tgatacttgc 5520acaagttcca ctaattactg acatttgtgg tattaactcg tttgactgct ctacaattgt 5580aggatgttaa tcaatgtctt ggctgcctaa cctgcaggcc gcgagcgccg atatgctatg 5640taatagacaa taaaaccatg tttatataaa aaaaattcaa aatagaaaac gattctgtac 5700aaggagtatt ttttttttgt tctagtgtgt ttatattatc cttggctaag aggcactgcg 5760tatacttcaa ggtacccctg tgttttgaaa aaaaacaaca gtaaaatagg aactccgcga 5820ggttcaggaa cctgaaacaa aatcaataaa aacattatat gcgtttcgaa caaaattaaa 5880gaaaaagaat aaatatagat taaaaaaaaa aagaagaaat taaaagaatt tctactaaat 5940cccaattgtt 
atatatttgt taaatgccaa aaaagtttat aaaaaattta gaatgtataa 6000ataataataa actaagtaac gcgatcgccg acgccgccga tatctccctc gccagcggcc 6060gccttatggc taagaatgtt ggaattttgg ccatggacat ctacttccca ccaacttgtg 6120ttcagcagga ggctttagaa gcacatgacg gagcctcaaa gggtaagtac acaatcggat 6180taggacagga ttgcttagca ttctgcactg aattggagga cgtcatctca atgtctttca 6240acgccgtcac ctcattgtta gagaagtaca aaatcgaccc aaaccagatc ggaaggttgg 6300aagtcggttc tgaaaccgtc atcgacaagt ctaaatcaat caagactttc gttatgcagt 6360tgttcgaaaa gtgcggtaat actgacgtcg agggtgtaga ctctactaac gcttgttatg 6420gtggtaccgc agctttattg aactgcgtaa actgggttga gtcaaactca tgggatggta 6480ggtacggatt agtcatttgc accgattctg ccgtctacgc cgagggtcca gcaaggccaa 6540ccggtggagc tgcagctatt gctatgttaa tcggaccaga tgcccctata gtcttcgagt 6600ctaagttgag gggttcacac atccctaacg tctacgactt ctacaagcca aacttggcct 6660cagagtatcc agttgtcgac ggaaagttat ctcagacatg ctacttgatg gccttagatt 6720catgttacaa gcacttatgc aacaagttcg aaaagttgga gggaaaggag ttctcaatta 6780acgacgccga ctacttcgtt tttcactctc catacaacaa attggtccag aagtcattcg 6840ccaggttatt gtacaacgat tttttgagaa acgcatcatc tatcgatgag gccgccaagg 6900agaaattcac cccatattct tctttgtcat tggacgagtc ttaccagtct agggacttgg 6960agaaggtatc acagcaattg gctaaaacct tctatgacgc caaagttcag ccaaccacct 7020tggtccctaa acaggtcgga aatatgtata ctgcatcttt gtatgccgcc tttgcctctt 7080tgatccacaa caagcacaac gatttagtcg gaaaaagggt tgtcatgttt tcttacggtg 7140ccggatctac tgccactatg ttctcattga ggttatgcga aaaccagtca ccattttcat 7200tgtctaacat cgcctcagtc atggacgtag gtgtctcacc tgagaagttc gtagaaacca 7260tgaagttgat ggagcacaga tacggtgcca aagaattcgt cacttcaaaa gagggaatct 7320tggatttgtt ggccccagga acctactatt tgaaggaggt cgactctttg tacagaaggt 7380tctatggaaa gaagggagac gacggatctg tcgcaaacgg tcagtaaatc ggcggcgtcg 7440gcgatcgcgt taaggcggcc gctggcgagg gagatatttc aacctgggcc taacagtaaa 7500gatatcctcc tcaaaactgg tgcacttaat cgctgaattt gttctggctt ctcttctttt 7560tctttattcc ccccatgggc caaaaaaaat agtactatca ggaatttggc gccgggtcac 7620gatatacgtg tacagtgacc taggcgacgc cacaaggaaa 
aaggaaaaaa acagaaaaaa 7680caacaaaaac taaaacaaac acgaaaactt taatagatct aagtgaagta gtggtgaggc 7740aattggagtg acatagcagc tactacaact acaaaaaagg cgcgccacgg tcgtgcggat 7800atgaaagagg tcgttatagc ttctgccgtc aggaccgcca tcggatctta cggtaagtca 7860ttaaaggacg tccctgccgt tgatttagga gccaccgcaa ttaaagaggc cgttaaaaag 7920gcaggtataa agccagagga cgtcaacgag gtcatcttgg gaaatgtctt acaagccgga 7980ttaggtcaaa acccagcaag acaagcatca ttcaaagccg gtttacctgt cgagatacct 8040gcaatgacca tcaacaaggt ttgcggttca ggattaagga ccgtttcttt agcagcacag 8100atcattaagg ctggagatgc agacgttatc attgctggtg gtatggaaaa catgtcaaga 8160gccccatact tggctaataa cgccaggtgg ggatatagga tgggaaacgc caagtttgtc 8220gacgaaatga ttactgacgg attgtgggac gccttcaatg actatcacat gggtataacc 8280gcagaaaaca ttgccgagag gtggaatatc tcaagagaag aacaggatga gtttgcattg 8340gcctcacaga aaaaagcaga ggaggcaata aagtcaggtc agtttaagga tgaaatcgtc 8400ccagtcgtca tcaagggaag aaagggtgag acagttgtcg acaccgacga acaccctaga 8460tttggttcaa ccatcgaggg attagcaaag ttgaagccag ccttcaagaa agacggaacc 8520gtaaccgccg gtaatgcatc tggattgaac gattgcgcag cagttttggt cataatgtca 8580gccgagaaag ctaaggagtt gggtgtcaag ccattggcaa aaattgtttc atacggatca 8640gccggtgtcg accctgccat catgggttac ggaccttttt acgccaccaa ggctgcaatc 8700gaaaaggccg gttggaccgt agatgaattg gatttgatcg agtcaaacga ggcctttgcc 8760gcccaatcat tggctgtcgc caaggacttg aagttcgaca tgaacaaggt caacgtcaac 8820ggtggtgcca tcgcattggg tcaccctatc ggagcctctg gtgccaggat cttggttacc 8880ttggtccacg ccatgcagaa gagggacgca aagaagggtt tggccacctt gtgcatcggt 8940ggaggtcagg gaacagctat cttgttagag aaatgcagcc cctcagcccc cctagcgtcg 9000aataaaagac attggtacat gatatcaaac agaattttaa catttcttga tccagtttgt 9060aaacaaaaca aacaattttt ctaccattta acttcatacc atcggcgaga gccgaacagg 9120aaaaaaaaga agtctccggt tatcgtaagc agtatcaaat aataagaatg tatgtgtgtg 9180caatttgtta tacccacgaa gaagtgcgca gtagagttag aaaaccaact gagtaatctt 9240tactcccgac aatcgtccaa taatcctctt gttgctagga acgtgatgat ggatttcgtt 9300tgaaatccgg acggaaaact caaaagaagt ccaaccacca accattttcg agcctcaaga 9360atctctaagc 
aggtttcttt actaagggga tggcctttct gtcctggaca ttttttcctt 9420ccttttttca tttccttgaa aggaacagat tttttttgac ttttgccaca cagctgcact 9480atctcaaccc cttttacatt ttaagttttc gggttgaatg gccggtgttt aaaccccagc 9540gcctggcggg 9550
SEQ ID NO: 30, length 4429, DNA, Artificial Sequence, Synthetic ADS_Mat alpha donor cassette
tgatgtctgg gttttgtttg ggatgcaatt tattgcttcc caatgtagaa aagtacatca 60tatgaaacaa cttaaactct taactacttc ttttaacctt cactttttat gaaatgtatc 120aaccatatat aataacttaa tagacgacat tcacaatatg tttacttcga agcctgcttt 180caaaattaag aacaaagcat ccaaatcata cagaaacaca gcggtttcaa aaaagctgaa 240agaaaaacgt ctagctgagc atgtgaggcc aagctgcttc aatattattc gaccactcaa 300gaaagatatc cagattcctg ttccttcctc tcgattttta aataaaatcc aaattcacag 360gatagcgtct ggaagtcaaa atactcagtt tcgacagttc aataagacat ctataaaatc 420ttcaaagaaa tatttaaact catttatggc ttttagagca tattactcac agtttggctc 480cggtgtaaaa caaaatgtct tgtcttctct gctcgctgaa gaatggcacg cggacaaaat 540gcagcacgga atatgggact acttcgcgca acagtataat tttataaacc ctggttttgg 600ttttgtagag tggttgacga ataattatgc tgaagtacgt ggtgacggat attgggaaga 660tgtgtttgta catttggcct tatagagtgt ggtcgtggcg gaggttgttt atctttcgag 720tactgaatgt tgtcagtata gctatcctat ttgaaactcc ccatcgtctt gctcttgttc 780ccaatgtttg tttatacact catatggcta tacccttatc tacttgcctc ttttgtttat 840gtctatgtat ttgtataaaa tatgatatta ctcagactca agcaaacaat caattttcaa 900aaattcttac tttttttttg gatggacgca aagaagttta ataatcatat tacatggcat 960taccaccata tacatatcca tatacatatc catatctaat cttacttata tgttgtggaa 1020atgtaaagag ccccattatc ttagcctaaa aaaaccttct ctttggaact ttcagtaata 1080cgcttaactg ctcattgcta tattgaagta cggattagaa gccgccgagc gggtgacagc 1140cctccgaagg aagactctcc tccgtgcgtc ctcgtcttca ccggtcgcgt tcctgaaacg 1200cagatgtgcc tcgcgccgca ctgctccgaa caataaagat tctacaatac tagcttttat 1260ggttatgaag aggaaaaatt ggcagtaacc tggccccaca aaccttcaaa tgaacgaatc 1320aaattaacaa ccataggatg ataatgcgat tagtttttta gccttatttc tggggtaatt 1380aatcagcgaa gcgatgattt ttgatctatt aacagatata taaatgcaaa aactgcataa 1440ccactttaac taatactttc aacattttcg gtttgtatta 
cttcttattc aaatgtaata 1500aaagtatcaa caaaaaattg ttaatatacc tctatacttt aacgtcaagg agaaaaaacc 1560ccggatccat gggtcaatat aaattaattt tgaatggtaa aactttgaag ggtgaaacta 1620ctaccgaagc agtagatgca gcaacagccg aaaaggtctt taagcaatat gctaatgata 1680atggtgttga tggtgaatgg acctatgacg atgcaactaa aacatttact gtaactgaag 1740gatccatggc cttaacagaa gagaagccaa tcagacccat agccaacttc cctccatcaa 1800tatggggtga tcaatttctt atatacgaaa agcaagttga acaaggtgtc gaacaaatcg 1860tcaacgatct aaagaaggag gtcaggcagt tattaaaaga agctctggac ataccgatga 1920aacatgctaa tctactaaaa ttaattgatg agattcaaag attgggcatc ccttatcact 1980tcgagagaga gattgatcat gcattgcaat gtatatacga gacgtacggt gataactgga 2040acggagatag gagtagtcta tggttcaggt tgatgagaaa gcaaggctac tatgtgacat 2100gcgatgtctt taataactat aaggacaaaa atggtgcatt taagcaatct ctagccaacg 2160atgttgaagg tttattagaa ttatatgaag ccacaagcat gagagtccca ggagagatca 2220tattggaaga tgctctaggg ttcacacgtt ctagattgtc aataatgacc aaggacgcat 2280tcagtactaa ccctgcgtta ttcactgaaa tccaaagggc actaaagcaa cctctttgga 2340aaagattgcc tcgtattgaa gcagcacaat atataccctt ctatcagcag caggactctc 2400acaacaaaac tctacttaaa ttagcaaagt tagaattcaa tttgttgcaa agtttacata 2460aggaagaatt atctcacgta tgcaaatggt ggaaagcttt cgatattaag aagaacgccc 2520catgcctgag agacagaata gttgagtgtt atttctgggg tctgggttcc ggttatgaac 2580cacaatattc tagggccaga gtctttttta ccaaagttgt cgcggtcata actttaatcg 2640acgatactta tgatgcttat ggtacctacg aggagttaaa gatattcacc gaagctgtgg 2700aaaggtggtc tataacttgc ttggacaccc taccagaata tatgaaacca atttacaaat 2760tgtttatgga tacttatact gaaatggagg aattcctggc aaaagaaggt agaactgatt 2820tgtttaattg cggtaaggaa tttgtgaagg aattcgtaag aaacttaatg gtcgaggcaa 2880aatgggccaa cgaaggacac attccaacaa ccgaagagca tgatcctgtc gtgataatca 2940ccggtggcgc taacctacta accactacct gctacctagg tatgtctgat atcttcacaa 3000aagagagtgt tgaatgggca gttagtgctc cgcctttgtt caggtatagc ggtatacttg 3060gcaggcgttt aaatgatctt atgacccata aagccgaaca agaaagaaaa cactcaagct 3120cctctcttga atcctatatg aaggaataca atgttaacga ggaatatgcg caaactttaa 3180tatacaagga 
ggtggaagat gtgtggaaag atatcaacag agaatatctt actacaaaga 3240acataccccg tccacttttg atggcagtga tttacttgtg tcagttctta gaggtacaat 3300acgctggtaa ggataacttt acaagaatgg gcgacgaata caaacatttg atcaagtcat 3360tattagttta tcccatgtcc atctaagcta gctaagatcc gctctaaccg aaaaggaagg 3420agttagacaa cctgaagtct aggtccctat ttattttttt atagttatgt tagtattaag 3480aacgttattt atatttcaaa tttttctttt ttttctgtac agacgcgtgt acgcatgtaa 3540cattatactg aaaaccttgc ttgagaaggt tttgggacgc tcgaagcgga ggttgtttat 3600ctttcgagta ctgaatgttg tcagtatagc tatcctattt gaaactcccc atcgtcttgc 3660tcttgttccc aatgtttgtt tatacactca tatggctata cccttatcta cttgcctctt 3720ttgtttatgt ctatgtattt gtataaaata tgatattact cagactcaag caaacaatca 3780attcttagca tcattctttg ttcttatctt aaccataaac gatcttgatg tgacttttgt 3840aatttgaacg aattggctat acgggacgga tgacaaatgc accattactc taggttgttg 3900ttggatctta acaaaccgta aaggtaaact gcccatgcgg ttcacatgac ttttgacttt 3960cctttgtttg ctagttacct tcggcttcac aatttgtttt tccacttttc taacaggttt 4020atcacctttc aaacttatct ttatcttatt cgccttcttg ggtgcctcca cagtagaggt 4080tacttccttt ttaatatgta cttttaggat actttcacgc tttataacaa tatcaagttt 4140accttcttca ttactattca tcttcgccac aagtcttctc tcccttggtg tttccaatct 4200aactacaaaa ctgttgatta gggtgtacat caccctaaca agatcatgta tttgcttcct 4260ctggtacaag ctaagaacag gtaaattcaa aacatcccag agtaatatct tcaaagggct 4320atacccttta aacatatctc ggcatatttg tattaaccca ctaatatttt gacggccaat 4380cttttctatt tttattttca tatcatcgac gtaatgacca cttaaaaac

4429
SEQ ID NO: 31, length 23, DNA, Artificial Sequence, Synthetic Primer CUT24
gtttcttttg gattgcgctt gcc 23
SEQ ID NO: 32, length 20, DNA, Artificial Sequence, Synthetic Primer ART12
tactgacaac cacatgttac 20
SEQ ID NO: 33, length 29, DNA, Artificial Sequence, Synthetic Primer ART45
tactgcttcg gtagtagttt cacccttca 29
SEQ ID NO: 34, length 20, DNA, Artificial Sequence, Synthetic Primer ART210
gggaagtcca attcaatagt 20
SEQ ID NO: 35, length 25, DNA, Artificial Sequence, Synthetic Primer HJ207
catcttctcg agataacacc tggag 25
SEQ ID NO: 36, length 19, DNA, Artificial Sequence, Synthetic Primer KB349
acgcgtgtac gcatgtaac 19
SEQ ID NO: 37, length 20, DNA, Artificial Sequence, Synthetic Primer HJ602
caattggggt tctggcagtc 20
SEQ ID NO: 38, length 30, DNA, Artificial Sequence, Synthetic Primer CUT76
gaagcctgct ttcaaaatta agaacaaagc 30
SEQ ID NO: 39, length 30, DNA, Artificial Sequence, Synthetic Primer HJ362
gaatttacct gttcttagct tgtaccagag 30
SEQ ID NO: 40, length 6523, DNA, Artificial Sequence, Synthetic ADS_HIS3 donor cassette
attgtgaggg tcagttattt catccagata taacccgaga ggaaacttct tagcgtctgt 60tttcgtacca taaggcagtt catgaggtat attttcgtta ttgaagccca gctcgtgaat 120gcttaatgct gctgaactgg tgtccatgtc gcctaggtac gcaatctcca caggctgcaa 180aggttttgtc tcaagagcaa tgttattgtg caccccgtaa ttggtcaaca agtttaatct 240gtgcttgtcc accagctctg tcgtaacctt cagttcatcg actatctgaa gaaatttact 300aggaatagtg ccatggtaca gcaaccgaga atggcaattt ctactcgggt tcagcaacgc 360tgcataaacg ctgttggtgc cgtagacata ttcgaagata ggattatcat tcataagttt 420cagagcaatg tccttattct ggaacttgga tttatggctc ttttggttta atttcgcctg 480attcttgatc tcctttagct tctcgacgtg ggcctttttc ttgccatatg gatccgctgc 540acggtcctgt tccctagcat gtacgtgagc gtatttcctt ttaaaccacg acgctttgtc 600ttcattcaac gtttcccatt gtttttttct actattgctt tgctgtggga aaaacttatc 660gaaagatgac gactttttct taattctcgt tttaagagct tggtgagcgc taggagtcac 720tgccaggtat cgtttgaaca cggcattagt cagggaagtc ataacacagt cctttcccgc 780aattttcttt ttctattact cttggcctcc tctagtacac tctatatttt tttatgcctc 840ggtaatgatt ttcatttttt ttttttccac ctagcggatg actctttttt tttcttagcg 900attggcatta tcacataatg aattatacat tatataaagt aatgtgattt cttcgaagaa 960tatactaaaa aaatcgttat tgtcttgaag gtgaaatttc tactcttatt aatggtgaac 1020gttaagctga tgctatgatg gaagctgatt 
ggtcttaact tgcttgtcat cttgctaatg 1080gtcattggct cgtgttatta cttaagttat ttgtactcgt tttgaacgta atgctaatga 1140tcatcttatg gaataatagt gagtggtttc agggtccata aagcttttca attcatcttt 1200tttttttttg ttcttttttt tgattccggt ttctttgaaa tttttttgat tcggtaatct 1260ccgagcagaa ggaagaacga aggaaggagc acagacttag attggtatat atacgcatat 1320gtggtgttga agaaacatga aattgcccag tattcttaac ccaactgcac agaacaaaaa 1380cctgcaggaa acgaagataa atcatgtcga aagctacata taaggaacgt gctgctactc 1440atcctagtcc tgttgctgcc aagctattta atatcatgca cgaaaagcaa acaaacttgt 1500gtgcttcatt ggatgttcgt accaccaagg aattactgga gttagttgaa gcattaggtc 1560ccaaaatttg tttactaaaa acacatgtgg atatcttgac tgatttttcc atggagggca 1620cagttaagcc gctaaaggca ttatccgcca agtacaattt tttactcttc gaagacagaa 1680aatttgctga cattggtaat acagtcaaat tgcagtactc tgcgggtgta tacagaatag 1740cagaatgggc agacattacg aatgcacacg gtgtggtggg cccaggtatt gttagcggtt 1800tgaagcaggc ggcggaagaa gtaacaaagg aacctagagg ccttttgatg ttagcagaat 1860tgtcatgcaa gggctcccta gctactggag aatatactaa gggtactgtt gacattgcga 1920agagtgacaa agattttgtt atcggcttta ttgctcaaag agacatgggt ggaagagatg 1980aaggttacga ttggttgatt atgacacccg gtgtgggttt agatgacaag ggagacgcat 2040tgggtcaaca gtatagaacc gtggatgatg tggtctctac aggatctgac attattattg 2100ttggaagagg actatttgca aagggaaggg atgctaaggt agagggtgaa cgttacagaa 2160aagcaggctg ggaagcatat ttgagaagat gcggccagca aaactaaaaa actgtattat 2220aagtaaatgc atgtatacta aactcacaaa ttagagcttc aatttaatta tatcagttat 2280taccacgaaa atcgttattg tcttgaaggt gaaatttcta ctcttattaa tggtgaacgt 2340taagctgatg ctatgatgga agctgattgg tcttaacttg cttgtcatct tgctaatggt 2400catatggctc gtgttattac ttaagttatt tgtactcgtt ttgaacgtaa tgctaatgat 2460catcttatgc ttcgagcgtc ccaaaacctt ctcaagcaag gttttcagta taatgttaca 2520tgcgtacacg cgtctgtaca gaaaaaaaag aaaaatttga aatataaata acgttcttaa 2580tactaacata actataaaaa aataaatagg gacctagact tcaggttgtc taactccttc 2640cttttcggtt agagcggatc ttagctagct tagctagctt agatagacat agggtagacc 2700aacaaagact taattaagtg cttatattcg tcacccattc tggtgaaatt gtctttacca 
2760gcgtattgaa cttctaagaa ttgacataaa tagataacag ccattaacaa tggacgtgga 2820atatttttgg tggtcaagta ttctctgtta atatctttcc aaacgtcttc aacttccttg 2880taaatcaaag tttgagcgta ttcttcgtta acgttgtatt ccttcatgta agattctaag 2940gaagaggagg agtgctttct ttcttgctca gccttgtgag tcatcaaatc gtttaatctt 3000ctacccaaaa taccagagta tctgaacaaa ggtggagcgg aaacggccca ttcaacagat 3060tccttggtga aaatatcaga catacctaag taacaagtag tagtcaacaa gttagcacca 3120ccagtaatga taacgactgg atcgtgttct tcagtagttg gaatatgacc ttcgttagcc 3180catttagctt caaccatcaa gttacgaacg aattccttaa cgaattcctt accacaattg 3240aacaagtcgg tacgaccttc cttagccaag aattcttcca tttcggtgta agtgtccatg 3300aacaacttat aaattggctt catgtattct ggtaaagtat ctaaacaggt aatggaccat 3360ctttcgacag cttcagtgaa aatcttcaat tcttcgtagg taccgtaagc gtcgtaagtg 3420tcgtcaatca aggtaataac agcaacagcc ttagtaaaga aaactctagc tctagagtat 3480tgtggttcgt aaccggaacc caaaccccag aagtaacatt cgacaattct atctctcaaa 3540catggagcgt tcttcttaat atcgaaggct ttccaccact tacaaacgtg agacaattct 3600tccttgtgca aggattgcaa caagttaaat tccaatttag ccaatttcaa caaggttttg 3660ttgtgagagt cttgttgttg gtagaatggg atgtattgag cagcttcgat tcttggcaaa 3720cgcttccaca atggttgttt caaagctctt tggatttcag tgaacaaagc tgggttagta 3780gagaaggcat ctttagtcat gatggacaat ctagatctag taaaacccaa agcgtcttcc 3840aagatgattt caccaggaac tctcatggag gtggcttcat acaattctaa caaaccttca 3900acatcgttgg ccaaagattg cttaaaagca ccattcttat ctttgtagtt gttgaaaacg 3960tcacaagtaa cgtagtaacc ttgctttctc attaatctga accataagga agatctgtca 4020ccgttccagt tatcaccgta agtttcgtaa atacattgca aagcgtgatc aatttctctt 4080tcgaaatggt atggaatacc taatctttgg atttcgtcga ttaatttcaa taagttagcg 4140tgcttcattg ggatgtccaa agcttccttc aacaattgtc taacttcctt cttcaaatcg 4200ttaacgattt gttcaacacc ttgctcaact tgcttttcgt agatcaagaa ttggtcaccc 4260cagatagaag gtggaaaatt agcaattggt ctgattggtt tttcttcagt caaagccatg 4320gatccttcag ttacagtaaa tgttttagtt gcatcgtcat aggtccattc accatcaaca 4380ccattatcat tagcatattg cttaaagacc ttttcggctg ttgctgcatc tactgcttcg 4440gtagtagttt cacccttcaa agttttacca 
ttcaaaatta atttatattg acccattata 4500gttttttctc cttgacgtta aagtatagag gtatattaac aattttttgt tgatactttt 4560attacatttg aataagaagt aatacaaacc gaaaatgttg aaagtattag ttaaagtggt 4620tatgcagttt ttgcatttat atatctgtta atagatcaaa aatcatcgct tcgctgatta 4680attaccccag aaataaggct aaaaaactaa tcgcattatc atcctatggt tgttaatttg 4740attcgttcat ttgaaggttt gtggggccag gttactgcca atttttcctc ttcataacca 4800taaaagctag tattgtagaa tctttattgt tcggagcagt gcggcgcgag gcacatctgc 4860gtttcaggaa cgcgaccggt gaagacgagg acgcacggag gagagtcttc cttcggaggg 4920ctgtcacccg ctcggcggct tctaatccgt acttcaatat agcaatgagc agttaagcgt 4980attactgaaa gttccaaaga gaaggttttt ttaggctaag ataatggggc tctttacatt 5040tccacaacat ataagtaaga ttagatatgg atatgtatat ggatatgtat atggtggtaa 5100tgccatgtaa tatgattatt aaacttcttt gcgtccatcc aaaaaaaaag taagaatttt 5160tgaaaattca atataaatgt ctcagaacgt ttacattgta tcgactgcca gaaccccaat 5220tggttcattc cagggttctc tatcctccaa gacagcagtg gaattgggtg ctgttgcttt 5280aaaaggcgcc ttggctaagg ttccagaatt ggatgcatcc aaggattttg acgaaattat 5340ttttggtaac gttctttctg ccaatttggg ccaagctccg gccagacaag ttgctttggc 5400tgccggtttg agtaatcata tcgttgcaag cacagttaac aaggtctgtg catccgctat 5460gaaggcaatc attttgggtg ctcaatccat caaatgtggt aatgctgatg ttgtcgtagc 5520tggtggttgt gaatctatga ctaacgcacc atactacatg ccagcagccc gtgcgggtgc 5580caaatttggc caaactgttc ttgttgatgg tgtcgaaaga gatgggttga acgatgcgta 5640cgatggtcta gccatgggtg tacacgcaga aaagtgtgcc cgtgattggg atattactag 5700agaacaacaa gacaattttg ccatcgaatc ctaccaaaaa tctcaaaaat ctcaaaagga 5760aggtaaattc gacaatgaaa ttgtacctgt taccattaag ggatttagag gtaagcctga 5820tactcaagtc acgaaggacg aggaacctgc tagattacac gttgaaaaat tgagatctgc 5880aaggactgtt ttccaaaaag aaaacggtac tgttactgcc gctaacgctt ctccaatcaa 5940cgatggtgct gcagccgtca tcttggtttc cgaaaaagtt ttgaaggaaa agaatttgaa 6000gcctttggct attatcaaag gttggggtga ggccgctcat caaccagctg attttacatg 6060ggctccatct cttgcagttc caaaggcttt gaaacatgct ggcatcgaag acatcaattc 6120tgttgattac tttgaattca atgaagcctt ttcggttgtc ggtttggtga acactaagat 
6180tttgaagcta gacccatcta aggttaatgt atatggtggt gctgttgctc taggtcaccc 6240attgggttgt tctggtgcta gagtggttgt tacactgcta tccatcttac agcaagaagg 6300aggtaagatc ggtgttgccg ccatttgtaa tggtggtggt ggtgcttcct ctattgtcat 6360tgaaaagata tgattacgtt ctgcgatttt ctcatgatct ttttcataaa atacataaat 6420atataaatgg ctttatgtat aacaggcata atttaaagtt ttatttgcga ttcatcgttt 6480ttcaggtact caaacgctga ggtgtgcctt ttgacttact ttt 65234110379DNAArtificial SequenceSynthetic ADS_FS.1 donor cassette 41atattgtttt ttccctagct gttttcggtt tggttatgaa cgcattgtct gcgaaaagaa 60acgtggataa gtattataga aacagcactt cttcggccaa caatatcacg caaattgagc 120aagatagtgc tataaaaggt cttttgccct tttttgccta ttatgcaagc attgctttac 180tagtgtggat gcaaccaagc tttattacac tctctttcat cctttccgtt ggtttcacgg 240gagcatttac cgtcggaaga ataatcgttt gccatttaac taagcagagc tttcccatgt 300tcaatgcacc catgttaatt cctttgtgcc agatagtatt gtacaaaata tgtctatccc 360tttggggaat tgagtctaat aaaatcgtct ttgccctatc ttggcttggg ttcggtctct 420cactaggtgt tcacattatg tttatgaatg acattatcca tgaatttact gagtacctgg 480acgtttatgc tttatccatc aagcgctcca agctgacata aatcgcactt tgtatctact 540tttttttatt cgaaaacaag gcacaacaat gaatctatcg ccctgtgaga ttttcaatct 600caagtttgtg taatagatag cgttatatta tagaactata aaggtccttg aatatacata 660gtgtttcatt cctattactg tatatgtgac tttacattgt tacttccgcg gctatttgac 720gttttctgct tcaggtgcgg cttggagggc aaagtgtcag aaaatcggcc aggccgtatg 780acacaaaaga gtagaaaacg agatctcaaa tatctcgagg cctgtcctct atacaaccgc 840ccagctctct gacaaagctc cagaacggtt gtcttttgtt tcgaaaagcc aaggtccctt 900ataattgccc tccattttgt gtcacctatt taagcaaaaa attgaaagtt tactaacctt 960tcattaaaga gaaataacaa tattataaaa agcgcttaaa tatacctgag aaagcaacct 1020gacctacagg aaagagttac tcaagaataa gaattttcgt tttaaaacct aagagtcact 1080ttaaaatttg tatacactta ttttttttat aacttattta ataataaaaa tcataaatca 1140taagaaattc gcttatttag aagtgtcaac aacgtatcta ccaacgattt gacccttttc 1200catcttttcg taaatttctg gcaaggtaga caagccgaca accttgattg gagacttgac 1260caaacctctg gcgaagaatt gttaattaag agctcagatc ttatcgtcgt catccttgta 
1320atccatcgat actagtgcgg ccgcccttta gtgagggttg aattcgaatt ttcaaaaatt 1380cttacttttt ttttggatgg acgcaaagaa gtttaataat catattacat ggcattacca 1440ccatatacat atccatatac atatccatat ctaatcttac ttatatgttg tggaaatgta 1500aagagcccca ttatcttagc ctaaaaaaac cttctctttg gaactttcag taatacgctt 1560aactgctcat tgcgccgccg agcgggtgac agccctccga aggaagactc tcctccgtgc 1620gtcctcgtct tcaccggtcg cgttcctgaa acgcagatgt gcctcgcgcc gcactgctcc 1680gaacaataaa gattctacaa tactagcttt tatggttatg aagaggaaaa attggcagta 1740acctggcccc acaaaccttc aaatgaacga atcaaattaa caaccatagg atgataatgc 1800gattagtttt ttagccttat ttctggggta attaatcagc gaagcgatga tttttgatct 1860attaacagat atataaatgc aaaaactgca taaccacttt aactaatact ttcaacattt 1920tcggtttgta ttacttctta ttcaaatgta ataaaagtat caacaaaaaa ttgttaatat 1980acctctatac tttaacgtca aggagaaaaa actataatgg gtcaatataa attaattttg 2040aatggtaaaa ctttgaaggg tgaaactact accgaagcag tagatgcagc aacagccgaa 2100aaggtcttta agcaatatgc taatgataat ggtgttgatg gtgaatggac ctatgacgat 2160gcaactaaaa catttactgt aactgaagga tccatggctt tgactgaaga aaaaccaatc 2220agaccaattg ctaattttcc accttctatc tggggtgacc aattcttgat ctacgaaaag 2280caagttgagc aaggtgttga acaaatcgtt aacgatttga agaaggaagt tagacaattg 2340ttgaaggaag ctttggacat cccaatgaag cacgctaact tattgaaatt aatcgacgaa 2400atccaaagat taggtattcc ataccatttc gaaagagaaa ttgatcacgc tttgcaatgt 2460atttacgaaa cttacggtga taactggaac ggtgacagat cttccttatg gttcagatta 2520atgagaaagc aaggttacta cgttacttgt gacgttttca acaactacaa agataagaat 2580ggtgctttta agcaatcttt ggccaacgat gttgaaggtt tgttagaatt gtatgaagcc 2640acctccatga gagttcctgg tgaaatcatc ttggaagacg ctttgggttt tactagatct 2700agattgtcca tcatgactaa agatgccttc tctactaacc cagctttgtt cactgaaatc 2760caaagagctt tgaaacaacc attgtggaag cgtttgccaa gaatcgaagc tgctcaatac 2820atcccattct accaacaaca agactctcac aacaaaacct tgttgaaatt ggctaaattg 2880gaatttaact tgttgcaatc cttgcacaag gaagaattgt ctcacgtttg taagtggtgg 2940aaagccttcg atattaagaa gaacgctcca tgtttgagag atagaattgt cgaatgttac 3000ttctggggtt tgggttccgg ttacgaacca 
caatactcta gagctagagt tttctttact 3060aaggtggttg ctgttattac cttgattgac gacacttacg acgcttacgg tacctacgaa 3120gaattgaaga ttttcactga agctgtcgaa agatggtcca ttacctgttt agatacttta 3180ccagaataca tgaagccaat ttataagttg ttcatggaca cttacaccga aatggaagaa 3240ttcttggcta aggaaggtcg taccgacttg ttcaattgtg gtaaggaatt cgttaaggaa 3300ttcgttcgta acttgatggt tgaagctaaa tgggctaacg aaggtcatat tccaactact 3360gaagaacacg atccagtcgt tatcattact ggtggtgcta acttgttgac tactacttgt 3420tacttaggta tgtctgatat tttcaccaag gaatctgttg aatgggccgt ttccgctcca 3480cctttgttca gatactctgg tattttgggt agaagattaa acgatttgat gactcacaag 3540gctgagcaag aaagaaagca ctcctcctct tccttagaat cttacatgaa ggaatacaac 3600gttaacgaag aatacgctca aactttgatt tacaaggaag ttgaagacgt ttggaaagat 3660attaacagag aatacttgac caccaaaaat attccacgtc cattgttaat ggctgttatc 3720tatttatgtc aattcttaga agttcaatac gctggtaaag acaatttcac cagaatgggt 3780gacgaatata agcacttaat taagtctttg ttggtctacc ctatgtctat ctaagatccg 3840ctctaaccga aaaggaagga gttagacaac ctgaagtcta ggtccctatt tattttttta 3900tagttatgtt agtattaaga acgttattta tatttcaaat ttttcttttt tttctgtaca 3960gacgcgtgta cgcatgtaac attatactga aaaccttgct tgagaaggtt ttgggacgct 4020cgaaggtttt caatagttcg gtaatattaa cggataccta ctattatccc ctagtaggct 4080cttttcacgg agaaattcgg gagtgttttt tttccgtgcg cattttctta gctatattct 4140tccagcttcg cctgctgccc ggtcatcgtt cctgtcacgt agtttttccg gattcgtccg 4200gctcatataa taccgcaata aacacggaat atctcgttcc gcggattcgg ttaaactctc 4260ggtcgcggat tatcacagag aaagcttcgt ggagaatttt tccagatttt ccgctttccc 4320cgatgttggt atttccggag gtcattatac tgaccgccat tataatgact gtacaacgac 4380cttctggaga aagaaacaac tcaataacga tgtgggacat tgggggccca ctcaaaaaat 4440ctggggacta tatccccaga gaatttctcc agaagagaag aaaagtcaaa gttttttttc 4500gcttgggggt tgcatataaa tacaggcgct gttttatctt cagcatgaat attccataat 4560tttacttaat agcttttcat aaataataga atcacaaaca aaatttacat ctgagttaaa 4620caatcatgac aatcaaggaa cataaagtag tttatgaagc tcacaacgta aaggctctta 4680aggctcctca acatttttac aacagccaac ccggcaaggg ttacgttact gatatgcaac 
4740attatcaaga aatgtatcaa caatctatca atgagccaga aaaattcttt gataagatgg 4800ctaaggaata cttgcattgg gatgctccat acaccaaagt tcaatctggt tcattgaaca 4860atggtgatgt tgcatggttt ttgaacggta aattgaatgc atcatacaat tgtgttgaca 4920gacatgcctt tgctaatccc gacaagccag ctttgatcta tgaagctgat gacgaatccg 4980acaacaaaat catcacattt ggtgaattac tcagaaaagt ttcccaaatc gctggtgtct 5040taaaaagctg gggcgttaag aaaggtgaca cagtggctat ctatttgcca atgattccag 5100aagcggtcat tgctatgttg gctgtggctc gtattggtgc tattcactct gttgtctttg 5160ctgggttctc cgctggttcg ttgaaagatc gtgtcgttga cgctaattct aaagtggtca 5220tcacttgtga tgaaggtaaa agaggtggta agaccatcaa cactaaaaaa attgttgacg 5280aaggtttgaa cggagtcgat ttggtttccc gtatcttggt tttccaaaga actggtactg 5340aaggtattcc aatgaaggcc ggtagagatt actggtggca tgaggaggcc gctaagcaga 5400gaacttacct acctcctgtt tcatgtgacg ctgaagatcc tctattttta ttatacactt 5460ccggttccac tggttctcca aagggtgtcg ttcacactac aggtggttat ttattaggtg 5520ccgctttaac aactagatac gtttttgata ttcacccaga agatgttctc ttcactgccg 5580gtgacgtcgg ctggatcacg ggtcacacct atgctctata tggtccatta accttgggta 5640ccgcctcaat aattttcgaa tccactcctg cctacccaga ttatggtaga tattggagaa 5700ttatccaacg tcacaaggct acccatttct atgtggctcc aactgcttta agattaatca 5760aacgtgtagg tgaagccgaa attgccaaat atgacacttc ctcattacgt gtcttgggtt 5820ccgtcggtga accaatctct ccagacttat gggaatggta tcatgaaaaa gtgggtaaca 5880aaaactgtgt catttgtgac actatgtggc aaacagagtc tggttctcat ttaattgctc 5940ctttggcagg tgctgtccca acaaaacctg gttctgctac cgtgccattc tttggtatta 6000acgcttgtat cattgaccct gttacaggtg tggaattaga aggtaatgat gtcgaaggtg 6060tccttgccgt taaatcacca tggccatcaa tggctagatc tgtttggaac caccacgacc 6120gttacatgga tacttacttg aaaccttatc ctggtcacta tttcacaggt gatggtgctg 6180gtagagatca tgatggttac tactggatca ggggtagagt tgacgacgtt gtaaatgttt 6240ccggtcatag attatccaca tcagaaattg aagcatctat ctcaaatcac gaaaacgtct 6300cggaagctgc tgttgtcggt attccagatg aattgaccgg tcaaaccgtc gttgcatatg 6360tttccctaaa agatggttat ctacaaaaca acgctactga aggtgatgca gaacacatca 6420caccagataa tttacgtaga gaattgatct 
tacaagttag gggtgagatt ggtcctttcg 6480cctcaccaaa aaccattatt ctagttagag atctaccaag aacaaggtca ggaaagatta 6540tgagaagagt tctaagaaag gttgcttcta acgaagccga acagctaggt gacctaacta 6600ctttggccaa cccagaagtt gtacctgcca tcatttctgc tgtagagaac caatttttct 6660ctcaaaaaaa gaaataaatt gaattgaatt gaaatcgata gatcaatttt tttcttttct 6720ctttccccat cctttacgct aaaataatag tttattttat tttttgaata ttttttattt 6780atatacgtat atatagacta ttatttatct tttaatgatt attaagattt ttattaaaaa 6840aaaattcgct cctcttttaa tgcctttatg cagttttttt ttcccattcg atatttctat 6900gttcgggttc agcgtatttt aagtttaata actcgaaaat tctgcgttcg ttaaagcttt 6960cgagaaggat attatttcga aataaaccgt gttgtgtaag cttgaagcct ttttgcgctg 7020ccaatattct tatccatcta ttgtactctt tagatccagt atagtgtatt cttcctgctc 7080caagctcatc ccatccccgc gtgcttggcc ggccgttttg ccagcttact atccttcttg 7140aaaatatgca ctctatatct tttagttctt aattgcaaca catagatttg ctgtataacg 7200aattttatgc tattttttaa atttggagtt cagtgataaa agtgtcacag cgaatttcct 7260cacatgtagg gaccgaattg tttacaagtt ctctgtacca ccatggagac atcaaaaatt 7320gaaaatctat ggaaagatat ggacggtagc aacaagaata tagcacgagc cgcggagttc
7380atttcgttac ttttgatatc actcacaact attgcgaagc gcttcagtga aaaaatcata 7440aggaaaagtt gtaaatatta ttggtagtat tcgtttggta aagtagaggg ggtaattttt 7500cccctttatt ttgttcatac attcttaaat tgctttgcct ctccttttgg aaagctatac 7560ttcggagcac tgttgagcga aggctcatta gatatatttt ctgtcatttt ccttaaccca 7620aaaataaggg aaagggtcca aaaagcgctc ggacaactgt tgaccgtgat ccgaaggact 7680ggctatacag tgttcacaaa atagccaagc tgaaaataat gtgtagctat gttcagttag 7740tttggctagc aaagatataa aagcaggtcg gaaatattta tgggcattat tatgcagagc 7800atcaacatga taaaaaaaaa cagttgaata ttccctcaaa aatgtcttac accgtcggaa 7860cctacttggc cgagaggttg gtccagatcg gattgaagca ccacttcgcc gtcgccggtg 7920actacaactt ggtcttgttg gacaacttgt tgttgaacaa gaacatggag caggtctatt 7980gctgcaacga gttgaactgc ggtttctcag cagaaggtta tgcaagagcc aagggagcag 8040ccgctgccgt cgtcacctac tcagtcggtg cattatcagc attcgatgca attggaggtg 8100cttacgctga gaacttgcca gtcatcttga tctctggagc acctaacaac aacgaccatg 8160ctgctggtca cgtattgcac cacgccttgg gtaaaacaga ctaccactac cagttggaaa 8220tggcaaaaaa tattaccgca gccgcagagg ccatctacac cccagaggaa gcacctgcca 8280aaattgacca cgtcataaag accgctttga gagagaagaa gcctgtttac ttggagatcg 8340cctgcaacat cgcttctatg ccatgcgccg cacctggtcc agcctctgct ttgttcaacg 8400acgaggcctc tgacgaagct tcattgaacg ccgcagtcga agagacatta aagttcatcg 8460ccaacaggga caaagttgcc gtcttagtcg gttcaaagtt gagggccgct ggtgccgaag 8520aggcagctgt caagttcgct gacgccttgg gaggagccgt cgccaccatg gccgcagcaa 8580aatctttctt tcctgaggag aacccacatt acatcggaac ctcatggggt gaagtatcat 8640atcctggagt agaaaaaacc atgaaagagg ccgatgccgt aatagcattg gctcctgtct 8700tcaacgacta ctcaaccaca ggatggactg atataccaga tccaaagaaa ttagtcttgg 8760ctgagcctag gtctgtcgtc gtaaacggta tcaggttccc ttctgttcat ttgaaggact 8820acttaacaag attggcccaa aaggtatcta aaaagactgg tgccttggac ttcttcaagt 8880cattaaacgc aggagaattg aaaaaagcag caccagccga tccatcagcc ccattagtta 8940acgctgaaat cgctagacaa gtagaggctt tgttgactcc aaacactacc gtcatagctg 9000agacaggtga ctcttggttc aacgcacaga gaatgaaatt gccaaatggt gccagggtcg 9060agtatgaaat gcagtgggga catataggtt 
ggtcagtccc agccgccttt ggatacgcag 9120taggtgcccc tgagaggagg aacatattga tggttggtga tggttcattc caattaacag 9180cccaggaggt agcccaaatg gtcaggttga agttgcctgt catcatcttc ttgatcaaca 9240attacggata caccatcgag gtcatgatcc acgacggacc ttacaacaac atcaaaaact 9300gggactacgc cggtttgatg gaggttttca acggtaacgg tggttatgac tcaggagccg 9360gtaagggatt aaaggctaag accggtggtg aattggctga agcaattaag gtcgcattgg 9420ccaacaccga tggacctaca ttgattgaat gcttcatcgg aagggaggac tgcaccgagg 9480aattggttaa atggggtaaa agggtagccg ctgctaattc aagaaaacca gttaataaat 9540tattataata agtgaattta ctttaaatct tgcatttaaa taaattttct ttttatagct 9600ttatgactta gtttcaattt atatactatt ttaatgacat tttcgattca ttgattgaaa 9660gctttgtgtt ttttcttgat gcgctattgc attgttcttg tctttttcgc cacatgtaat 9720atctgtagta gatacctgat acattgtgga tgctgagtga aattttagtt aataatggag 9780gcgctcttaa taattttggg gatattggct taacctgcag gccgcgagcg ccgatataaa 9840ctaatgattt taaatcgtta aaaaaatatg cgaattctgt ggatcgaaca caggacctcc 9900agataacttg accgaagttt tttcttcagt ctggcgctct cccaactgag ctaaatccgc 9960ttactatttg ttatcagttc ccttcatatc tacatagaat aggttaagta ttttattagt 10020tgccagaaga actactgata gttgggaata tttggtgaat aatgaagatt gggtgaataa 10080tttgataatt ttgagattca attgttaatc aatgttacaa tattatgtat acagagtata 10140ctagaagttc tcttcggaga tcttgaagtt cacaaaaggg aatcgatatt tctacataat 10200attatcatta cttcttcccc atcttatatt tgtcattcat tattgattat gatcaatgca 10260ataatgattg gtagttgcca aacatttaat acgatcctct gtaatatttc tatgaataat 10320tatcacagca acgttcaatt atcttcaatt ccggtgttta aaccccagcg cctggcggg 10379424140DNAArtificial SequenceSynthetic ADS_FS.2 donor cassette 42tcctcttgtg gtccgtggct tgtaattttc gggaaagata gaaatcactg cattcatcga 60aagaagtgaa aataaattgt atcggggaag taatgagggg tggaatacgt aattgcttct 120caatttgata atacaagtta cagtcttttt attgagagtc cctggaagga aaattattga 180tgtttacctt tttttgcgac gcgcgtctgg ccataaataa aagggttctc ttgccaagaa 240aaaataaaaa ggcgccttaa gggaccttct atgacaaata tggtgaggta tgcaacctca 300atgaagagca atgagaaaat ttaaggggta agtaagcatc ggaatttgtt gtttcctaac 
360aatttgtcta atttactcaa taatatcagg agaattgatc gaaaaaagca aaccaggaac 420ccctcacaaa taagggaaca taaagtaatt gctcgtcttt acatacatgg cactcaatcc 480cagacgtcgc gtgctaaaaa tccttatatt attggcccct caggagttta tttgaatttt 540gattgcattg ctttcagtgg acagtatatc ataaaatttg caagggcata gtgcctgccc 600tacgatgttg taaaacaatt tctgaaaata ggttcagaat caaaaatgat gtataaatat 660tgaaataaat tttcacataa attgtgctcc tccgcaaagt cttgactaaa taaacaattt 720gttaatatcc tttcaaaaat tcttactttt tttttggatg gacgcaaaga agtttaataa 780tcatattaca tggcattacc accatataca tatccatata catatccata tctaatctta 840cttatatgtt gtggaaatgt aaagagcccc attatcttag cctaaaaaaa ccttctcttt 900ggaactttca gtaatacgct taactgctca ttgctatatt gaagtacgga ttagaagccg 960ccgagcgggt gacagccctc cgaaggaaga ctctcctccg tgcgtcctcg tcttcaccgg 1020tcgcgttcct gaaacgcaga tgtgcctcgc gccgcactgc tccgaacaat aaagattcta 1080caatactagc ttttatggtt atgaagagga aaaattggca gtaacctggc cccacaaacc 1140ttcaaatgaa cgaatcaaat taacaaccat aggatgataa tgcgattagt tttttagcct 1200tatttctggg gtaattaatc agcgaagcga tgatttttga tctattaaca gatatataaa 1260tgcaaaaact gcataaccac tttaactaat actttcaaca ttttcggttt gtattacttc 1320ttattcaaat gtaataaaag tatcaacaaa aaattgttaa tatacctcta tactttaacg 1380tcaaggagaa aaaaccccgg atccatgggt caatataaat taattttgaa tggtaaaact 1440ttgaagggtg aaactactac cgaagcagta gatgcagcaa cagccgaaaa ggtctttaag 1500caatatgcta atgataatgg tgttgatggt gaatggacct atgacgatgc aactaaaaca 1560tttactgtaa ctgaaggatc catggctttg actgaagaaa aaccaatcag accaattgct 1620aattttccac cttctatctg gggtgaccaa ttcttgatct acgaaaagca agttgagcaa 1680ggtgttgaac aaatcgttaa cgatttgaag aaggaagtta gacaattgtt gaaggaagct 1740ttggacatcc caatgaagca cgctaactta ttgaaattaa tcgacgaaat ccaaagatta 1800ggtattccat accatttcga aagagaaatt gatcacgctt tgcaatgtat ttacgaaact 1860tacggtgata actggaacgg tgacagatct tccttatggt tcagattaat gagaaagcaa 1920ggttactacg ttacttgtga cgttttcaac aactacaaag ataagaatgg tgcttttaag 1980caatctttgg ccaacgatgt tgaaggtttg ttagaattgt atgaagccac ctccatgaga 2040gttcctggtg aaatcatctt ggaagacgct ttgggtttta 
ctagatctag attgtccatc 2100atgactaaag atgccttctc tactaaccca gctttgttca ctgaaatcca aagagctttg 2160aaacaaccat tgtggaagcg tttgccaaga atcgaagctg ctcaatacat cccattctac 2220caacaacaag actctcacaa caaaaccttg ttgaaattgg ctaaattgga atttaacttg 2280ttgcaatcct tgcacaagga agaattgtct cacgtttgta agtggtggaa agccttcgat 2340attaagaaga acgctccatg tttgagagat agaattgtcg aatgttactt ctggggtttg 2400ggttccggtt acgaaccaca atactctaga gctagagttt tctttactaa ggtggttgct 2460gttattacct tgattgacga cacttacgac gcttacggta cctacgaaga attgaagatt 2520ttcactgaag ctgtcgaaag atggtccatt acctgtttag atactttacc agaatacatg 2580aagccaattt ataagttgtt catggacact tacaccgaaa tggaagaatt cttggctaag 2640gaaggtcgta ccgacttgtt caattgtggt aaggaattcg ttaaggaatt cgttcgtaac 2700ttgatggttg aagctaaatg ggctaacgaa ggtcatattc caactactga agaacacgat 2760ccagtcgtta tcattactgg tggtgctaac ttgttgacta ctacttgtta cttaggtatg 2820tctgatattt tcaccaagga atctgttgaa tgggccgttt ccgctccacc tttgttcaga 2880tactctggta ttttgggtag aagattaaac gatttgatga ctcacaaggc tgagcaagaa 2940agaaagcact cctcctcttc cttagaatct tacatgaagg aatacaacgt taacgaagaa 3000tacgctcaaa ctttgattta caaggaagtt gaagacgttt ggaaagatat taacagagaa 3060tacttgacca ccaaaaatat tccacgtcca ttgttaatgg ctgttatcta tttatgtcaa 3120ttcttagaag ttcaatacgc tggtaaagac aatttcacca gaatgggtga cgaatataag 3180cacttaatta agtctttgtt ggtctaccct atgtctatct aagatccgct ctaaccgaaa 3240aggaaggagt cagacaacct gaagtctagg tccctattta tttttttata gttatgttag 3300tattaagaac gttatttata tttcaaattt ttcttttttt tctgtacaga cgcgtgtacg 3360catgtaacat tatactgaaa accttgcttg agaaggtttt gggacgctcg aagctacctt 3420tttgttcttt tacttaaaca ttagttagtt cgttttcttt ttctcatttt tttatgtttc 3480ccccccaaag ttctgatttt ataatatttt atttcacaca attccattta acagaggggg 3540aatagattct ttagcttaga aaattagtga tcaatatata tttgcctttc ttttcatctt 3600ttcagtgata ttaatggttt cgagacactg caatggccct agttgtctaa gaggatagat 3660gttactgtca aagatgatat tttgaatttc aattgacgta attaatgata ctattaataa 3720tacagagcgt atatgaagta ttgcaaataa catgcacagt tcttttggga tgagaatgat 3780aatgaaaggc 
gaaggcgggc gttcagaaaa gcgttgcgga gtaacaagtg attaaatagc 3840acccaaataa tcttctttga tactaccgat tgcgtgaata gaactcactt gactgataca 3900accttcaatt ttaactctaa ttctactttt tatggtgatg acatcctcgg aactttggta 3960tgatggtggg tttgaacccg cattaaaggt taaatcttga ggcatcagat gctttgtcac 4020aaatactttc attggaccta cttgcacttc gaacccgtgc tgagaacatg aaacgactgt 4080gccgtccact acttcccctt taaatggttt gaaaactaca gctctatatt tcacgttgaa 4140434475DNAArtificial SequenceSynthetic ADS_FS.3 donor cassette 43attgaagcac ctgtggagta tttaaaaact gcggttacat ggcctacaga tgaaatatgt 60gctcaactaa tgacacaatt cccaccagga acgccgacca gtgtcctgct gcagactatt 120tcagatgagc tagagaaaag ttctgacaac ctgttcacgt tatctgattt aaagagcaaa 180ctgaaagtta ttggcttatt cgagcacatg gaagatatcc catttttcga caagctgaaa 240ctaagcaatg cgcccgtgaa ggacatgcct atggtcacaa aggcgttcac caaattttgc 300gaaacaatag caaaaaggca tacaagaggc ctactgtcat accgattacc ttttaaccta 360ctggactaca attgcatacc gaatgagagt tattcattag aggtttatga gtcattgtac 420aacatcatta ctctatactt ctggctcagc aacaggtacc caaactactt cattgacatg 480gaatctgcta aagatttgaa gtatttctgt gagatgatta ttttcgagaa acttgatcga 540ttaaagaaga atccttacgc acataagccc tttggttcta caagaggtca cctctcatct 600tcgagaagaa gattgcgtac ataatctacg atatatcctg taaatagaaa cagctacact 660gcttgaaagc cttaacatga tacatttctg gtatgatgcc attgttgtgc cctgccgggt 720ttatcgtttc ctaacaggca cgtcacttat aacgaggtgc ctgtcgttta ccgcccaagc 780cggttttttc gctggagagt acggtactac tagcccacca cacgttcgtg gccaggttga 840taggccaccg ttgagcaaag ggcagtaaaa tatataaaag aggaacaagc gcttccatta 900agagcactgc taagcctact cgttttctag ttctctgaaa aaaggtagcc taaaacaagc 960gccatatcat atatatttat acagattaga cgtactcaaa attcttactt tttttttgga 1020tggacgcaaa gaagtttaat aatcatatta catggcatta ccaccatata catatccata 1080tacatatcca tatctaatct tacttatatg ttgtggaaat gtaaagagcc ccattatctt 1140agcctaaaaa aaccttctct ttggaacttt cagtaatacg cttaactgct cattgcgccg 1200ccgagcgggt gacagccctc cgaaggaaga ctctcctccg tgcgtcctcg tcttcaccgg 1260tcgcgttcct gaaacgcaga tgtgcctcgc gccgcactgc tccgaacaat aaagattcta 
1320caatactagc ttttatggtt atgaagagga aaaattggca gtaacctggc cccacaaacc 1380ttcaaatgaa cgaatcaaat taacaaccat aggatgataa tgcgattagt tttttagcct 1440tatttctggg gtaattaatc agcgaagcga tgatttttga tctattaaca gatatataaa 1500tgcaaaaact gcataaccac tttaactaat actttcaaca ttttcggttt gtattacttc 1560ttattcaaat gtaataaaag tatcaacaaa aaattgttaa tatacctcta tactttaacg 1620tcaaggagaa aaaactataa tgggtcaata taaattaatt ttgaatggta aaactttgaa 1680gggtgaaact actaccgaag cagtagatgc agcaacagcc gaaaaggtct ttaagcaata 1740tgctaatgat aatggtgttg atggtgaatg gacctatgac gatgcaacta aaacatttac 1800tgtaactgaa ggatccatgg ctttgactga agaaaaacca atcagaccaa ttgctaattt 1860tccaccttct atctggggtg accaattctt gatctacgaa aagcaagttg agcaaggtgt 1920tgaacaaatc gttaacgatt tgaagaagga agttagacaa ttgttgaagg aagctttgga 1980catcccaatg aagcacgcta acttattgaa attaatcgac gaaatccaaa gattaggtat 2040tccataccat ttcgaaagag aaattgatca cgctttgcaa tgtatttacg aaacttacgg 2100tgataactgg aacggtgaca gatcttcctt atggttcaga ttaatgagaa agcaaggtta 2160ctacgttact tgtgacgttt tcaacaacta caaagataag aatggtgctt ttaagcaatc 2220tttggccaac gatgttgaag gtttgttaga attgtatgaa gccacctcca tgagagttcc 2280tggtgaaatc atcttggaag acgctttggg ttttactaga tctagattgt ccatcatgac 2340taaagatgcc ttctctacta acccagcttt gttcactgaa atccaaagag ctttgaaaca 2400accattgtgg aagcgtttgc caagaatcga agctgctcaa tacatcccat tctaccaaca 2460acaagactct cacaacaaaa ccttgttgaa attggctaaa ttggaattta acttgttgca 2520atccttgcac aaggaagaat tgtctcacgt ttgtaagtgg tggaaagcct tcgatattaa 2580gaagaacgct ccatgtttga gagatagaat tgtcgaatgt tacttctggg gtttgggttc 2640cggttacgaa ccacaatact ctagagctag agttttcttt actaaggtgg ttgctgttat 2700taccttgatt gacgacactt acgacgctta cggtacctac gaagaattga agattttcac 2760tgaagctgtc gaaagatggt ccattacctg tttagatact ttaccagaat acatgaagcc 2820aatttataag ttgttcatgg acacttacac cgaaatggaa gaattcttgg ctaaggaagg 2880tcgtaccgac ttgttcaatt gtggtaagga attcgttaag gaattcgttc gtaacttgat 2940ggttgaagct aaatgggcta acgaaggtca tattccaact actgaagaac acgatccagt 3000cgttatcatt actggtggtg ctaacttgtt 
gactactact tgttacttag gtatgtctga 3060tattttcacc aaggaatctg ttgaatgggc cgtttccgct ccacctttgt tcagatactc 3120tggtattttg ggtagaagat taaacgattt gatgactcac aaggctgagc aagaaagaaa 3180gcactcctcc tcttccttag aatcttacat gaaggaatac aacgttaacg aagaatacgc 3240tcaaactttg atttacaagg aagttgaaga cgtttggaaa gatattaaca gagaatactt 3300gaccaccaaa aatattccac gtccattgtt aatggctgtt atctatttat gtcaattctt 3360agaagttcaa tacgctggta aagacaattt caccagaatg ggtgacgaat ataagcactt 3420aattaagtct ttgttggtct accctatgtc tatctaagat ccgctctaac cgaaaaggaa 3480ggagttagac aacctgaagt ctaggtccct atttattttt ttatagttat gttagtatta 3540agaacgttat ttatatttca aatttttctt ttttttctgt acagacgcgt gtacgcatgt 3600aacattatac tgaaaacctt gcttgagaag gttttgggac gctcgaagga tacttgcaca 3660agttccacta attactgaca tttgtggtat taactcgttt gactgctcta caattgtagg 3720atgttaatca atgtcttggc tgccttcatt ctcttcaggc tctattaatt ttaaccgtta 3780taagttcctt ttctcccttg gaagcaaaca tcaactgcct taaaatctgg tggcgaggaa 3840agaggaaatg gcatgtacta atgatggtcc taataaatat cccgaaattg tgagtgttaa 3900gcacctgttc caacattcgg gatccaagca tgaatttagt gctggtaaac gattttcaaa 3960atccattggt aaaatattca aacgaaactc tgctttgaaa acttctagaa ctgaaacggc 4020aaatcataaa atggaattga aaaaaagaga gggtgttacc ttattgccac ctgtcccaga 4080atcattatta cataaactca attcttggtt ggaaactttt tcttccacca agaacatgaa 4140aatcgaagaa aacaaaattg ttattaatga aaaagagatt cgggattcag tctcttacta 4200ccctgataag aatggaggaa gtgctgtatt ttgttacttg cccgaccttg tgctatatta 4260taagccgcct ataaaagtca caggcaagca atgtccaata aagagaagtc cttgggaatc 4320gatggaaatc caatatcaaa agtttatgta ccccttagaa aggttggaaa gacagtttga 4380ggaagttcca tttaggccct ggtattttgc aatgcgatta aaggaacttt acagatgctg 4440tgaaaggtct tttactaacg cggcaaatag aggaa 4475448247DNAArtificial SequenceSynthetic 9XFS_URA3_FS.4 donor cassette 44attgctattg agtaagttcg atccgtttgg cgtcttttgg ggtgtaacgc caaacttatt 60acttttccta tttgaggttg gtattgattg ttgtcaaaga atgaaaatat acacaaacgc 120cacaatatac gtaccaggtt cacgaaaact gatcgtatgg ttcataccct gacttggcaa 180acctaatgtg accgtcgctg attagcggat 
cacgaaaagt gatctcgata caattagagg 240atccacgaaa atgatgtgaa tgaatacatg aaagattcat gagatctgac aacatggtag 300acgtgtgtgt ctcatggaaa ttgatgcagt tgaagacatg tgcgtcacga aaaaagaaat 360caatcctaca cagggcttaa gggcaaatgt attcatgtgt gtcacgaaaa gtgatgtaac 420taaatacacg attaccatgg aaattaacgt accttttttg tgcgtgtatt gaaatattat 480gacatattac agaaagggtt cgcaagtcct gtttctatgc ctttctctta gtaattcacg 540aaataaacct atggtttacg aaatgatcca cgaaaatcat gttattattt acatcaacat 600atcgcgaaaa ttcatgtcat gtccacatta acatcattgc agagcaacaa ttcattttca 660tagagaaatt tgctactatc acccactagt actaccattg gtacctacta ctttgaattg 720tactaccgct gggcgttatt aggtgtgaaa ccacgaaaag ttcaccataa cttcgaataa 780agtcgcggaa aaaagtaaac agctattgct actcaaatga ggtttgcaga agcttgttga 840agcatgatga agcgttctaa acgcactatt catcattaaa tatttaaagc tcataaaatt 900gtattcaatt cctattctaa atggctttta tttctattac aactattagc tctaaatcca 960tatcctcata agcagcaatc aattctatct atactttaaa agtaaaattc ttgagggaac 1020tttcaccatt atgggaaatg gttcaagaag gtattgactt aaactccatc aaatggtcag 1080gtcattgagt gttttttatt tgttgtattt tttttttttt agagaaaatc ctccaatatc 1140aaattaggaa tcgtagtttc atgattttct gttacaccta actttttgtg tggtgccctc 1200ctccttgtca atattaatgt taaagtgcaa ttctttttcc ttatcacgtt gagccattag 1260tatcaatttg cttacctgta ttcctttact atcctccttt ttctccttct tgataaatgt 1320atgtagattg cgtatatagt ttcgtctacc ctatgaacat attccatttt gtaatttcgt 1380gtcgtttcta ttatgaattt catttataaa gtttatgtac aaatatcata aaaaaagaga 1440atctttttaa gcaaggattt tcttaacttc ttcggcgaca gcatcaccga cttcggtggt 1500actgttggaa ccacctaaat caccagttct gatacctgca tccaaaacct ttttaactgc 1560atcttcaatg gccttacctt cttcaggcaa gttcaatgac aatttcaaca tcattgcagc 1620agacaagata gtggcgatag ggtcaacctt attctttggc aaatctggag cagaaccgtg 1680gcatggttcg tacaaaccaa atgcggtgtt cttgtctggc aaagaggcca aggacgcaga 1740tggcaacaaa cccaaggaac ctgggataac ggaggcttca tcggagatga tatcaccaaa 1800catgttgctg gtgattataa taccatttag gtgggttggg ttcttaacta ggatcatggc 1860ggcagaatca atcaattgat gttgaacctt caatgtaggg aattcgttct tgatggtttc 1920ctccacagtt 
tttctccata atcttgaaga ggccaaaaca ttagctttat ccaaggacca 1980aataggcaat ggtggctcat gttgtagggc catgaaagcg gccattcttg tgattctttg 2040cacttctgga acggtgtatt gttcactatc ccaagcgaca ccatcaccat cgtcttcctt 2100tctcttacca aagtaaatac ctcccactaa ttctctgaca acaacgaagt cagtaccttt 2160agcaaattgt ggcttgattg gagataagtc taaaagagag tcggatgcaa agttacatgg 2220tcttaagttg gcgtacaatt gaagttcttt acggattttt agtaaacctt gttcaggtct 2280aacactaccg gtaccccatt taggaccacc cacagcacct aacaaaacgg catcaacctt 2340cttggaggct tccagcgcct catctggaag tgggacacct gtagcatcga tagcagcacc 2400accaattaaa tgattttcga aatcgaactt gacattggaa cgaacatcag aaatagcttt 2460aagaacctta atggcttcgg ctgtgatttc ttgaccaacg tggtcacctg gcaaaacgac 2520gatcttctta ggggcagaca taggggcaga cattagaatg gtatatcctt gaaatatata 2580tatatattgc tgaaatgtaa aaggtaagaa aagttagaaa gtaagacgat tgctaaccac 2640ctattggaaa aaacaatagg tccttaaata atattgtcaa cttcaagtat tgtgatgcaa 2700gcatttagtc atgaacgctt ctctattcta tatgaaaagc cggttccggc ctctcacctt 2760tcctttttct cccaattttt cagttgaaaa aggtatatgc gtcaggcgac ctctgaaatt 2820aacaaaaaat ttccagtcat cgaatttgat tctgtgcgat agcgcccctg tgtgttctcg 2880ttatgttgag gaaaaaaata atggttgcta agagattcga actcttgcat cttacgatac 2940ctgagtattc ccacagttaa ctgcggtcaa gatatttctt gaatcaggcg ccttagaccg 3000ctcggccaaa caaccaatta cttgttgaga aatagagtat aattatccta taaatataac 3060gtttttgaac acacatgaac aaggaagtac aggacaattg attttgaaga gaatgtggat 3120tttgatgtaa ttgttgggat tccattttta ataaggcaat aatattaggt atgtggatat 3180actagaagtt ctcctcgacc gtaataatca tattacatgg cattaccacc atatacatat
3240ccatatacat atccatatct aatcttactt atatgttgtg gaaatgtaaa gagccccatt 3300atcttagcct aaaaaaacct tctctttgga actttcagta atacgcttaa ctgctcattg 3360cgccgccgag cgggtgacag ccctccgaag gaagactctc ctccgtgcgt cctcgtcttc 3420accggtcgcg ttcctgaaac gcagatgtgc ctcgcgccgc actgctccga acaataaaga 3480ttctacaata ctagctttta tggttatgaa gaggaaaaat tggcagtaac ctggccccac 3540aaaccttcaa atgaacgaat caaattaaca accataggat gataatgcga ttagtttttt 3600agccttattt ctggggtaat taatcagcga agcgatgatt tttgatctat taacagatat 3660ataaatgcaa aaactgcata accactttaa ctaatacttt caacattttc ggtttgtatt 3720acttcttatt caaatgtaat aaaagtatca acaaaaaatt gttaatatac ctctatactt 3780taacgtcaag gagaaaaaac tataatgtcc actttgccta tttcctctgt ttcttcctct 3840ccatctactt ccccaattgt tgtcgatgat aaggattcta ccaagccaga cgttattaga 3900cataccgcta acttcaacgc ttccatttgg ggtgatcaat ttttgactta cgatgaacca 3960gaggacttgg ttatgaagaa gcaattagtc gaagagttaa aggaggaagt taagaaggaa 4020ttgattacta ttaagggttc taacgaacct atgcaacatg ttaagttgat cgaattaatc 4080aatgctgttc aacgtttagg tattgcttac cattttgaag aagaaatcga agaagcttta 4140caacatatcc atgtcactta cggtgaacaa tgggtcgata aggaaaattt acaatctatc 4200tccttgtggt ttcgtttgtt gcgtcaacaa ggtttcaatg tctcctctgg tgttttcaag 4260gactttatgg acgaaaaagg taaatttaag gaatctttgt gtaacgacgc tcaaggtatc 4320ttggccttgt acgaagctgc tttcatgaga gtcgaagatg aaaccatctt ggacaacgct 4380ttggaattct ctaaggttca cttggacatt attgccaagg atccatcttg tgactcttcc 4440ttaagaaccc aaatccacca agccttgaag caaccattga gaagaagatt ggccagaatc 4500gaagctttac actatatgcc aatttaccaa caagaaactt ctcacgacga ggttttgttg 4560aagttggcta agttagattt ctctgttttg caatccatgc ataagaaaga attgtctcac 4620atctgcaagt ggtggaaaga tttagacttg caaaacaaat tgccattcgt tcgtgataga 4680gttgtcgaag gttacttctg gattttgtct atttactatg aaccacaaca tgccagaacc 4740agaatgttct tgatgaagtc ttgtatgtgg ttggtcgttt tggatgatac cttcgacaac 4800tacggtacct acgaagaatt ggaaattttc actcaagccg ttgagaagtg gtccatttct 4860tgtttggaca tgttgccaga atacatgaag ttgatctacc aagaattggt taacttgcac 4920gttgaaatgg aagaatcttt agaaaaggaa 
ggtaaggctt atcaaatcca ctacgttaag 4980gaaatggcta aggaattggt cagaaactac ttggttgaag ccagatggtt gaaagaaggt 5040tacatgccta ctttggaaga gtacatgtct gtttccatgg ttaccggtac ctacggtttg 5100atcactgcta gatcttacgt tggtagaggt gacattgtta acgaggacac ttttaaatgg 5160gtttcttcct acccacctat tgttgaagct tcttgtgtta tcattagatt gatggatgat 5220attgtctctc ataaagaaga acaagaaaga ggtcatgttg cctcctccat cgaatgttat 5280tctaaggaat ccggtgcttc tgaagaagaa gcctgtgaat acatctctag aaaagtcgaa 5340gacgcctgga aggttattaa cagagaatct ttgagaccaa ccgccgttcc attccctttg 5400ttaatgccag ctatcaactt ggctagaatg tgtgaagttt tatactctgt taacgatggt 5460ttcactcacg ccgaaggtga catgaaatct tacatgaagt ccttctttgt ccatccaatg 5520gtcgtttgag cgaatttctt atgatttatg atttttatta ttaaataagt tataaaaaaa 5580ataagtgtat acaaatttta aagtgactct taggttttaa aacgaaaatt cttattcttg 5640agtaactctt tcctgtaggt caggttgctt tctcaggtat agcatgaggt cgctcatgcg 5700tccatcttta cagtcctgtc ttattgttct tgatttgtgc cccgtaaaat actgttactt 5760ggttctggcg aggtattgga tagttccttt ttataaaggc catgaagctt tttctttcca 5820attttttttt tttcgtcatt atagaaatca ttacgaccga gattcccggg taataactga 5880tataattaaa ttgaagctct aatttgtgag tttagtatac atgcatttac ttataataca 5940gttttttagt tttgctggcc gcatcttctc aaatatgctt cccagcctgc ttttctgtaa 6000cgttcaccct ctaccttagc atcccttccc tttgcaaata gtcctcttcc aacaataata 6060atgtcagatc ctgtagagac cacatcatcc acggttctat actgttgacc caatgcgtct 6120cccttgtcat ctaaacccac accgggtgtc ataatcaacc aatcgtaacc ttcatctctt 6180ccacccatgt ctctttgagc aataaagccg ataacaaaat ctttgtcact cttcgcaatg 6240tcaacagtac ccttagtata ttctccagta gatagggagc ccttgcatga caattctgct 6300aacatcaaaa ggcctctagg ttcctttgtt acttcttctg ccgcctgctt caaaccgcta 6360acaatacctg ggcccaccac accgtgtgca ttcgtaatgt ctgcccattc tgctattctg 6420tatacacccg cagagtactg caatttgact gtattaccaa tgtcagcaaa ttttctgtct 6480tcgaagagta aaaaattgta cttggcggat aatgccttta gcggcttaac tgtgccctcc 6540atggaaaaat cagtcaagat atccacatgt gtttttagta aacaaatttt gggacctaat 6600gcttcaacta actccagtaa ttccttggtg gtacgaacat ccaatgaagc acacaagttt 
6660gtttgctttt cgtgcatgat attaaatagc ttggcagcaa caggactagg atgagtagca 6720gcacgttcct tatatgtagc tttcgacatg atttatcttc gtttcctgca ggtttttgtt 6780ctgtgcagtt gggttaagaa tactgggcaa tttcatgttt cttcaacacc acatatgcgt 6840atatatacca atctaagtct gtgctccttc cttcgttctt ccttctgctc ggagattacc 6900gaatcaaaaa aatttcaaag aaaccggaat caaaaaaaag aacaaaaaaa aaaaagatga 6960attgaaaagc tttatggacc ctgaaaccac agccacatta accttctttg atggtcaaaa 7020cttatccttc accataaata tgcctcgcaa aaaaggtaat taacatatat agaattacat 7080tatttatgaa atatcatcac tatctcttag catctttaat ccttttctac atcagataac 7140ttcggtttgt tatcatcgtc tgtattgtca tcaattggcg cagtagcctc aatttcaacg 7200tcgtttgact ctggtgtttg ttcatgtgca gatccatgag atgatgaaat gtgtatatta 7260gtttaaaaag ttgtatgtaa taaaagtaaa atttaatatt ttggatgaaa aaaaccattt 7320ttagactttt tcttaactag aatgctggag tagaaatacg ccatctcaag atacaaaaag 7380cgttaccggc actgatttgt ttcaaccagt atatagatta ttattgggtc ttgatcaact 7440ttcctcagac atatcagtaa cagttatcaa gctaaatatt tacgcgaaag aaaaacaaat 7500attttaattg tgatacttgt gaattttatt ttattaagga tacaaagtta agagaaaaca 7560aaatttatat acaatataag taatattcat atatatgtga tgaatgcagt cttaacgaga 7620agacatggcc ttggtgacaa ctctcttcaa accaacttca gcctttctca attcatcagc 7680agatgggtct tcgatttgca aagcagccaa agcatcggac aaagcagctt caatcttgga 7740cttggaacct ctcttcaatt tagaagacaa gactgggtca gtgacagttt gttcgatgga 7800ggcaacgtag gattccaatc tttgtctagc ttcgtgcttc ttggcaaaag cttcatcggc 7860agccttgaac tcttcagctt ggttaaccat cttttcaatt tcttcagaag acaatctacc 7920aacagcgtta gagatagtga tgttagaaga cttaccggta gacttttcga cggcagtaac 7980cttcaagata ccgttagcat caacttcgaa gatagcttcc aagactggtt caccagctgg 8040catcattggg atgttcttca agtcgaattc acccaacaaa gtgttttctt tacagttaac 8100acgttcacct tggtagactg ggaattgaac ggtggtttgg ttgtcagcac atgtagtaaa 8160ggttcttctc ttgatggttg gaacagtagt gtttcttgga acaacgatac cgaacatgtc 8220accttgcata ccaacaccta gagataa 82474549DNAArtificial SequenceSynthetic Recognition sequence for FS-specific TALEN 45tagtggagga attaaaagag gaagttaaga aggaattgat aactatcaa 
49
46 26 DNA Artificial Sequence Synthetic Primer HJ272 46
ataacaatat tataaaaagc gcttaa 26
47 29 DNA Artificial Sequence Synthetic Primer ART45 47
tactgcttcg gtagtagttt cacccttca 29
48 22 DNA Artificial Sequence Synthetic Primer HJ643 48
aaaatcctta tattattggc cc 22
49 19 DNA Artificial Sequence Synthetic Primer HJ799 49
gtagcctaaa acaagcgcc 19
50 9 PRT Artificial Sequence Synthetic Homing endonuclease family conserved motif 50
Leu Ala Gly Leu Ile Asp Ala Asp Gly 1 5
51 6 PRT Artificial Sequence Synthetic Homing endonuclease family conserved motif 51
Gly Ile Tyr Tyr Ile Gly 1 5
52 18 DNA Artificial Sequence Synthetic Recognition sequence for I-SceI 52
tagggataac agggtaat 18
53 31 DNA Artificial Sequence Synthetic Recognition sequence for VDE (PI-SceI) 53
tatgtcgggt gcggagaaag aggtaatgaa a 31
54 24 DNA Artificial Sequence Synthetic Recognition sequence for F-CphI 54
gatgcacgag cgcaacgctc acaa 24
55 24 DNA Artificial Sequence Synthetic Recognition sequence for PI-MgaI 55
gcgtagctgc ccagtatgag tcag 24
56 40 DNA Artificial Sequence Synthetic Recognition sequence for PI-MtuII 56
acgtgcacta cgtagagggt cgcaccgcac cgatctacaa 40
57 4300 DNA Artificial Sequence Synthetic ADE2_SFC1 donor cassette 57
ccggtggtgc ttatactgtt tctactgcag ctgccgctac tgttagatct accatcagaa 60gattaagaga aatggttgaa gcttaaactt ctttcattca ttttctcttg gctttcacta 120ggtatatcta ttccataacg actatgtttt gtatttgtta atttacataa aaccatatca 180gtacatcaac gaactgtaaa aaagaaactt tagcataatt attgcggata tttaaactca 240cttgcaggta agagcaaaag gcgattgatt tccagtccgc ctcttgtcac gtgatttagt 300aagaaatttt gacagcaccc tcggtttaat ggaaaagagg ggcgtttttc gatgaacccg 360aggggagact agagaatcat ctcggttgaa tggagcatta tttttttagt agcgcccgcc 420cggagaaatg gacgttggcg aatgagccat gaattattaa ccgcccatgt ctaccagata 480gacggcacgg ccacgcgttt aaaccgcccc acatcgtatg acagtaccag cctagtcccg 540gtaaaccgca aacggacctt aattgtgacg aagggcccaa atttgatggg tcggtgttaa 600tgattagtcc tcattgtcat aataaagtgt gatgatggag gcaatgatga tatacggtag 660tactactgct cgaggtgcta tcttttaacc aatcctttga gattcttgtc gccacggagt 720tactaccttt tacaaaccgt aatgtcacat
tttgcatata tcttatgtat aaatatatag 780ttcacttact acttgttctc gttttgttaa ctttcttgtt gtagttcttc ttgttcttgg 840cgtttccccc tttgttttct atctgcttca taagtaaagt gcaaagcatt ttggaagata 900ttatcaattg agtcattgaa agaaacttgg catcttccct attactaaaa ctaagaatac 960ttgattcaag aaagaagttt atattagttt tagccgtaag ataacataac aaagaagaag 1020aaagaaaatt cttgaataat acataacttt tcttaaaaga atcaaagaca gataaaattt 1080aagagatatt aaatattagt gagaagccga gaattttgta acaccaacat aacactgaca 1140tctttaacaa cttttaatta tgatacattt cttacgtcat gattgattat tacagctatg 1200ctgacaaatg actcttgttg catggctacg aaccgggtaa tactaagtga ttgactcttg 1260ctgacctttt attaagaact aaatggacaa tattatggag catttcatgt ataaattggt 1320gcgtaaaatc gttggatctc tcttctaagt acatcctact ataacaatca agaaaaacaa 1380gaaaatcgga caaaacaatc aagtatggat tctagaacag ttggtatatt aggaggggga 1440caattgggac gtatgattgt tgaggcagca aacaggctca acattaagac ggtaatacta 1500gatgctgaaa attctcctgc caaacaaata agcaactcca atgaccacgt taatggctcc 1560ttttccaatc ctcttgatat cgaaaaacta gctgaaaaat gtgatgtgct aacgattgag 1620attgagcatg ttgatgttcc tacactaaag aatcttcaag taaaacatcc caaattaaaa 1680atttaccctt ctccagaaac aatcagattg atacaagaca aatatattca aaaagagcat 1740ttaatcaaaa atggtatagc agttacccaa agtgttcctg tggaacaagc cagtgagacg 1800tccctattga atgttggaag agatttgggt tttccattcg tcttgaagtc gaggactttg 1860gcatacgatg gaagaggtaa cttcgttgta aagaataagg aaatgattcc ggaagctttg 1920gaagtactga aggatcgtcc tttgtacgcc gaaaaatggg caccatttac taaagaatta 1980gcagtcatga ttgtgaggtc tgttaacggt ttagtgtttt cttacccaat tgtagagact 2040atccacaagg acaatatttg tgacttatgt tatgcgcctg ctagagttcc ggactccgtt 2100caacttaagg cgaagttgtt ggcagaaaat gcaatcaaat cttttcccgg ttgtggtata 2160tttggtgtgg aaatgttcta tttagaaaca ggggaattgc ttattaacga aattgcccca 2220aggcctcaca actctggaca ttataccatt gatgcttgcg tcacttctca atttgaagct 2280catttgagat caatattgga tttgccaatg ccaaagaatt tcacatcttt ctccaccatt 2340acaacgaacg ccattatgct aaatgttctt ggagacaaac atacaaaaga taaagagcta 2400gaaacttgcg aaagagcatt ggcgactcca ggttcctcag tgtacttata tggaaaagag 
2460tctagaccta acagaaaagt aggtcacata aatattattg cctccagtat ggcggaatgt 2520gaacaaaggc tgaactacat tacaggtaga actgatattc caatcaaaat ctctgtcgct 2580caaaagttgg acttggaagc aatggtcaaa ccattggttg gaatcatcat gggatcagac 2640tctgacttgc cggtaatgtc tgccgcatgt gcggttttaa aagattttgg cgttccattt 2700gaagtgacaa tagtctctgc tcatagaact ccacatagga tgtcagcata tgctatttcc 2760gcaagcaagc gtggaattaa aacaattatc gctggagctg gtggggctgc tcacttgcca 2820ggtatggtgg ctgcaatgac accacttcct gtcatcggtg tgcccgtaaa aggttcttgt 2880ctagatggag tagattcttt acattcaatt gtgcaaatgc ctagaggtgt tccagtagct 2940accgtcgcta ttaataatag tacgaacgct gcgctgttgg ctgtcagact gcttggcgct 3000tatgattcaa gttatacaac gaaaatggaa cagtttttat taaagcaaga agaagaagtt 3060cttgtcaaag cacaaaagtt agaaactgtc ggttacgaag cttatctaga aaacaagtaa 3120tatataagtt tattgatata cttgtacagc aaataattat aaaatgatat acctattttt 3180taggctttgt tatgattaca tcaaatgtgg acttcataca tagaaatcaa cgcttacagg 3240tgtccttttt taagaatttc atacataaga tcatgatgaa caatgggact acaaaatgaa 3300ataaagaaaa aatagaaata gaatagaaga tcaattatta atcgccctat tcttccttat 3360tacctacaca aaataaagca gcaacataag aaacaaaaac aaaatgaaaa caaaccaaat 3420aaatctatgt aagcatactc atttcaattt gatattcatt acttgacttt tttgtcctta 3480tttgaggctc cataagcgcg ccattttccc ctactccctt ttttcgtaaa tagtaataat 3540gtgctgaaaa gaacaatgaa gtagttatca tacatattcc gtcgtgtcga tatgagggga 3600ggtgtctctt tctttcatcc cttgtcgcaa cctccaatat ataagagcat aagcaactga 3660tcttacttta gtaattaact tagcatacct ggcccgaagg aagaaaaaaa attcacctca 3720acaacatggt tcctaagttt tacaaacttt caaacggctt caaaatccca agcattgctt 3780tgggaaccta ccggtgttta aaccccagcg cctggcgggg atattccaag atcgcaaaca 3840gccgaaattg tgtatgaagg tgtcaagtgc ggctaccgtc atttcgatac tgctgttctt 3900tatggtaatg agaaggaagt tggcgatggt atcattaaat ggttgaacga agatccaggg 3960aaccataaac gtgaggaaat cttctacact actaaattat ggaattcgca aaacggatat 4020aaaagagcta aagctgccat tcagcaatgt ttgaatgaag tctcgggctt gcaatacatc 4080gatcttcttt tgattcattc gccactggaa ggttctaaat taaggttgga aacttggcgc 4140gccatgcaag aagcggttga tgaaggattg 
gttaagtcta taggggtttc caactatggg 4200aaaaagcaca ttgatgaact tttgaactgg ccagaactga agcacaagcc agtggtcaac 4260caaatcgaga tatcaccttg gattatgaga caagaattag 4300583586DNAArtificial SequenceSynthetic GFP_SFC1 donor cassette 58ccggtggtgc ttatactgtt tctactgcag ctgccgctac tgttagatct accatcagaa 60gattaagaga aatggttgaa gcttaaactt ctttcattca ttttctcttg gctttcacta 120ggtatatcta ttccataacg actatgtttt gtatttgtta atttacataa aaccatatca 180gtacatcaac gaactgtaaa aaagaaactt tagcataatt attgcggata tttaaactca 240cttgcaggta agagcaaaag gcgattgatt tccagtccgc ctcttgtcac gtgatttagt 300aagaaatttt gacagcaccc tcggtttaat ggaaaagagg ggcgtttttc gatgaacccg 360aggggagact agagaatcat ctcggttgaa tggagcatta tttttttagt agcgcccgcc 420cggagaaatg gacgttggcg aatgagccat gaattattaa ccgcccatgt ctaccagata 480gacggcacgg ccacgcgttt aaaccgcccc acatcgtatg acagtaccag cctagtcccg 540gtaaaccgca aacggacctt aattgtgacg aagggcccaa atttgatggg tcggtgttaa 600tgattagtcc tcattgtcat aataaagtgt gatgatggag gcaatgatga tatacggtag 660tactactgct cgaggtgcta tcttttaacc aatcctttga gattcttgtc gccacggagt 720tactaccttt tacaaaccgt aatgtcacat tttgcatata tcttatgtat aaatatatag 780ttcacttact acttgttctc gttttgttaa ctttcttgtt gtagttcttc ttgttcttgg 840cgtttccccc tttgttttct atctgcttca taagtaaagt gcaaagcatt ttggaagata 900ttatcaattg agtcattgaa agaaacttgg catcttccct attactaaaa ctaagaatac 960ttgattcaag aaagaagttt atattagttt tagccgtaag ataacataac aaagaagaag 1020aaagaaaaac acaattacag taacaataac aagaggacag atactaccaa aatgtgtggg 1080gaagcgggta agctgccaca gcaattaatg cacaacattt aacctacatt cttccttatc 1140ggatcctcaa aacccttaaa aacatatgcc tcaccctaac atattttcca attaaccctc 1200aatatttctc tgtcacccgg cctctatttt ccattttctt ctttacccgc cacgcgtttt 1260tttctttcaa atttttttct tctttcttct ttttcttcca cgtcctcttg cataaataaa 1320taaaccgttt tgaaaccaaa ctcgcctctc tctctccttt ttgaaatatt tttgggtttg 1380tttgatcctt tccttcccaa tctctcttgt ttaatatata ttcatttata tcacgctctc 1440tttttatctt cctttttttc ctctctcttg tattcttcct tcccctttct actcaaacca 1500agaagaaaaa gaaaaggtca atctttgtta aagaatagga 
tcttctacta catcagcttt 1560tagatttttc acgcttactg cttttttctt cccaagatcg aaaatttact gaattaacaa 1620tggtgagcaa gggcgaggag ctgttcaccg gggtggtgcc catcctggtc gagctggacg 1680gcgacgtaaa cggccacaag ttcagcgtgt ccggcgaggg cgagggcgat gccacctacg 1740gcaagctgac cctgaagttc atctgcacca ccggcaagct gcccgtgccc tggcccaccc 1800tcgtgaccac cttgacctac ggcgtgcagt gcttcgcccg ctaccccgac cacatgaagc 1860agcacgactt cttcaagtcc gccatgcccg aaggctacgt ccaggagcgc accatcttct 1920tcaaggacga cggcaactac aagacccgcg ccgaggtgaa gttcgagggc gacaccctgg 1980tgaaccgcat cgagctgaag ggcatcgact tcaaggagga cggcaacatc ctggggcaca 2040agctggagta caactacaac agccacaagg tctatatcac cgccgacaag cagaagaacg 2100gcatcaaggt gaacttcaag acccgccaca acatcgagga cggcagcgtg cagctcgccg 2160accactacca gcagaacacc cccatcggcg acggccccgt gctgctgccc gacaaccact 2220acctgagcac ccagtccgcc ctgagcaaag accccaacga gaagcgcgat cacatggtcc 2280tgctggagtt cgtgaccgcc gccgggatca ctctcggcat ggacgagctg tacaagtaac 2340tcgagaagct tgatccggct gcgaatttct tatgatttat gatttttatt attaaataag 2400ttataaaaaa aataagtgta tacaaatttt aaagtgactc ttaggtttta aaacgaaaat 2460tcttattctt gagtaactct ttcctgtagg tcaggttgct ttctcaggta tagcatgagg 2520tcgctcttat tgaccacacc tctaccggca tgccgagcat gatgaacaat gggactacaa 2580aatgaaataa agaaaaaata gaaatagaat agaagatcaa ttattaatcg ccctattctt 2640ccttattacc tacacaaaat aaagcagcaa cataagaaac aaaaacaaaa tgaaaacaaa 2700ccaaataaat ctatgtaagc atactcattt caatttgata ttcattactt gacttttttg 2760tccttatttg aggctccata agcgcgccat tttcccctac tccctttttt cgtaaatagt 2820aataatgtgc tgaaaagaac aatgaagtag ttatcataca tattccgtcg tgtcgatatg 2880aggggaggtg tctctttctt tcatcccttg tcgcaacctc caatatataa gagcataagc 2940aactgatctt actttagtaa ttaacttagc atacctggcc cgaaggaaga aaaaaaattc 3000acctcaacaa catggttcct aagttttaca aactttcaaa cggcttcaaa atcccaagca 3060ttgctttggg aacctaccgg tgtttaaacc ccagcgcctg gcggggatat tccaagatcg 3120caaacagccg aaattgtgta tgaaggtgtc aagtgcggct accgtcattt cgatactgct 3180gttctttatg gtaatgagaa ggaagttggc gatggtatca ttaaatggtt gaacgaagat 3240ccagggaacc 
ataaacgtga ggaaatcttc tacactacta aattatggaa ttcgcaaaac 3300ggatataaaa gagctaaagc tgccattcag caatgtttga atgaagtctc gggcttgcaa 3360tacatcgatc ttcttttgat tcattcgcca ctggaaggtt ctaaattaag gttggaaact 3420tggcgcgcca tgcaagaagc ggttgatgaa ggattggtta agtctatagg ggtttccaac 3480tatgggaaaa agcacattga tgaacttttg aactggccag aactgaagca caagccagtg 3540gtcaaccaaa tcgagatatc accttggatt atgagacaag aattag 3586
59 4300 DNA Artificial Sequence Synthetic ADE2_YJR030C donor cassette 59
ggaactttat gaacatatta agtctgcttg atgaaatgtc atgtgcaggt gccgttggca 60caaaatggga acaaaattat gaaaattcag tcgaggatgg gtgtgaagca cctgaatcca 120atccataccg ttctattatt gatctttcta gtcgttccat taatataact gcagacctac 180tttcgactgt tgggaggagt aattcagcac tcaacaagaa tgagattata gctgctattc 240aaggtctcgc ccatcaatgc ctcaatccct gtgacgaact aggtatgcaa gcattgcaag 300cattagagaa cattctgtta tctcgagcaa gtcaactacg tacggaaaaa gttgcggtgg 360ataacctact agagacagga ttattaccga tttttgagtt ggatgaaatc caagatgtca 420agatgaaacg aattactagc attttatccg ttctttctaa aatgacggca cggccacgcg
480tttaaaccgc ccttcttggg tcaacttgtg gaaggcgtaa caagtaacga aacttttttg 540agagtgctta acgtatttaa caagtatgta gatgatccta cggtggaaag gcagttgcaa 600gagttgatta tttctaagag ggaaatcgag aaggagtaga ccaacgataa tgtaactata 660ccaagaaact tagtatgatg gaattttttc aaggagtcgt aaattagatt ttcgcaggta 720ataatgcgat atataagcaa tctcatttaa atatcgacgg tggcatttat accatcattt 780actgttatat ttatctaacg cgtcgcgacg cgttagataa aatacaacaa gatttttttt 840tcgtgtcacc aatggcatga aagcttcgag aataatttga aggaaatttc acttaatggg 900aaaaataaaa atgtaccctt atcgagatta ctcttttacc ctcagttcaa ttaaaattca 960tcatgaacca agtaaaagtt cctctaatta cgaacgagca agcaaattag tattgtgtgg 1020gagacgggtt cttgaataat acataacttt tcttaaaaga atcaaagaca gataaaattt 1080aagagatatt aaatattagt gagaagccga gaattttgta acaccaacat aacactgaca 1140tctttaacaa cttttaatta tgatacattt cttacgtcat gattgattat tacagctatg 1200ctgacaaatg actcttgttg catggctacg aaccgggtaa tactaagtga ttgactcttg 1260ctgacctttt attaagaact aaatggacaa tattatggag catttcatgt ataaattggt 1320gcgtaaaatc gttggatctc tcttctaagt acatcctact ataacaatca agaaaaacaa 1380gaaaatcgga caaaacaatc aagtatggat tctagaacag ttggtatatt aggaggggga 1440caattgggac gtatgattgt tgaggcagca aacaggctca acattaagac ggtaatacta 1500gatgctgaaa attctcctgc caaacaaata agcaactcca atgaccacgt taatggctcc 1560ttttccaatc ctcttgatat cgaaaaacta gctgaaaaat gtgatgtgct aacgattgag 1620attgagcatg ttgatgttcc tacactaaag aatcttcaag taaaacatcc caaattaaaa 1680atttaccctt ctccagaaac aatcagattg atacaagaca aatatattca aaaagagcat 1740ttaatcaaaa atggtatagc agttacccaa agtgttcctg tggaacaagc cagtgagacg 1800tccctattga atgttggaag agatttgggt tttccattcg tcttgaagtc gaggactttg 1860gcatacgatg gaagaggtaa cttcgttgta aagaataagg aaatgattcc ggaagctttg 1920gaagtactga aggatcgtcc tttgtacgcc gaaaaatggg caccatttac taaagaatta 1980gcagtcatga ttgtgaggtc tgttaacggt ttagtgtttt cttacccaat tgtagagact 2040atccacaagg acaatatttg tgacttatgt tatgcgcctg ctagagttcc ggactccgtt 2100caacttaagg cgaagttgtt ggcagaaaat gcaatcaaat cttttcccgg ttgtggtata 2160tttggtgtgg aaatgttcta tttagaaaca ggggaattgc 
ttattaacga aattgcccca 2220aggcctcaca actctggaca ttataccatt gatgcttgcg tcacttctca atttgaagct 2280catttgagat caatattgga tttgccaatg ccaaagaatt tcacatcttt ctccaccatt 2340acaacgaacg ccattatgct aaatgttctt ggagacaaac atacaaaaga taaagagcta 2400gaaacttgcg aaagagcatt ggcgactcca ggttcctcag tgtacttata tggaaaagag 2460tctagaccta acagaaaagt aggtcacata aatattattg cctccagtat ggcggaatgt 2520gaacaaaggc tgaactacat tacaggtaga actgatattc caatcaaaat ctctgtcgct 2580caaaagttgg acttggaagc aatggtcaaa ccattggttg gaatcatcat gggatcagac 2640tctgacttgc cggtaatgtc tgccgcatgt gcggttttaa aagattttgg cgttccattt 2700gaagtgacaa tagtctctgc tcatagaact ccacatagga tgtcagcata tgctatttcc 2760gcaagcaagc gtggaattaa aacaattatc gctggagctg gtggggctgc tcacttgcca 2820ggtatggtgg ctgcaatgac accacttcct gtcatcggtg tgcccgtaaa aggttcttgt 2880ctagatggag tagattcttt acattcaatt gtgcaaatgc ctagaggtgt tccagtagct 2940accgtcgcta ttaataatag tacgaacgct gcgctgttgg ctgtcagact gcttggcgct 3000tatgattcaa gttatacaac gaaaatggaa cagtttttat taaagcaaga agaagaagtt 3060cttgtcaaag cacaaaagtt agaaactgtc ggttacgaag cttatctaga aaacaagtaa 3120tatataagtt tattgatata cttgtacagc aaataattat aaaatgatat acctattttt 3180taggctttgt tatgattaca tcaaatgtgg acttcataca tagaaatcaa cgcttacagg 3240tgtccttttt taagaatttc atacataaga tcatgttgag ataattgttg ggattccatt 3300gttgataaag gctataatat taggtataca gaatatacta gaagttctcc tcgaggatat 3360aggaatcctc aaaatggaat ctatatttct acatactaat attacgatta ttcctcattc 3420cgttttatat gtttatattc attgatccta ttacattatc aatccttgcg tttcagcttc 3480ctctaacttc gatgacagct tctcataact tatgtcatca tcttaacacc gtatatgata 3540atatattgat aatataacta ttagttgata gacgatagtg gatttttatt ccaacatacc 3600acccataatg taatagatct aatgaatcca tttgtttgtt aatagtttga atgtttttat 3660cggaagaggt ttggtcatta cgtctgcaat attctttttg gtttcgatat agcatacgtg 3720cagatgattt cctgatactt catctctcaa tctcattgct ttagtaccaa aaaatctgtt 3780cctaaatttc cggtgtttaa accccagcgc ctggcgggtc ttcattattg gatataatta 3840tactgattgt agatttactg tcggttagta atcctttggt aattggtttc ttgtcaagtt 3900cttgtatcag 
gtaacttaga ttatttaata atgggacaga ttcacttatc gcgtgtattt 3960ctgcttccgt agttgaagta catgttaatg aagccttggt ggactttcct ccaattacct 4020ttccattaag taaatatatg ttgccaattt gtgatttata atacggttgg ttgccatacg 4080aggcatcgct tataacaact aatttatttg ttggcttaac aggtttgctt ttgtgccata 4140ttaattgctt atctctcgta ttccatatga actgtatcaa ttcatatgtc atatctaaca 4200cttgcttgga cggaaatagt atatgttgtg caagtgtgtt gatgtagtat aataggtcaa 4260atctaaattt atatccaaca tatgatgcta gacctatcag 4300603586DNAArtificial SequenceSynthetic GFP_YJR030C donor cassette 60ggaactttat gaacatatta agtctgcttg atgaaatgtc atgtgcaggt gccgttggca 60caaaatggga acaaaattat gaaaattcag tcgaggatgg gtgtgaagca cctgaatcca 120atccataccg ttctattatt gatctttcta gtcgttccat taatataact gcagacctac 180tttcgactgt tgggaggagt aattcagcac tcaacaagaa tgagattata gctgctattc 240aaggtctcgc ccatcaatgc ctcaatccct gtgacgaact aggtatgcaa gcattgcaag 300cattagagaa cattctgtta tctcgagcaa gtcaactacg tacggaaaaa gttgcggtgg 360ataacctact agagacagga ttattaccga tttttgagtt ggatgaaatc caagatgtca 420agatgaaacg aattactagc attttatccg ttctttctaa aatgacggca cggccacgcg 480tttaaaccgc ccttcttggg tcaacttgtg gaaggcgtaa caagtaacga aacttttttg 540agagtgctta acgtatttaa caagtatgta gatgatccta cggtggaaag gcagttgcaa 600gagttgatta tttctaagag ggaaatcgag aaggagtaga ccaacgataa tgtaactata 660ccaagaaact tagtatgatg gaattttttc aaggagtcgt aaattagatt ttcgcaggta 720ataatgcgat atataagcaa tctcatttaa atatcgacgg tggcatttat accatcattt 780actgttatat ttatctaacg cgtcgcgacg cgttagataa aatacaacaa gatttttttt 840tcgtgtcacc aatggcatga aagcttcgag aataatttga aggaaatttc acttaatggg 900aaaaataaaa atgtaccctt atcgagatta ctcttttacc ctcagttcaa ttaaaattca 960tcatgaacca agtaaaagtt cctctaatta cgaacgagca agcaaattag tattgtgtgg 1020gagacgggac acaattacag taacaataac aagaggacag atactaccaa aatgtgtggg 1080gaagcgggta agctgccaca gcaattaatg cacaacattt aacctacatt cttccttatc 1140ggatcctcaa aacccttaaa aacatatgcc tcaccctaac atattttcca attaaccctc 1200aatatttctc tgtcacccgg cctctatttt ccattttctt ctttacccgc cacgcgtttt 1260tttctttcaa atttttttct 
tctttcttct ttttcttcca cgtcctcttg cataaataaa 1320taaaccgttt tgaaaccaaa ctcgcctctc tctctccttt ttgaaatatt tttgggtttg 1380tttgatcctt tccttcccaa tctctcttgt ttaatatata ttcatttata tcacgctctc 1440tttttatctt cctttttttc ctctctcttg tattcttcct tcccctttct actcaaacca 1500agaagaaaaa gaaaaggtca atctttgtta aagaatagga tcttctacta catcagcttt 1560tagatttttc acgcttactg cttttttctt cccaagatcg aaaatttact gaattaacaa 1620tggtgagcaa gggcgaggag ctgttcaccg gggtggtgcc catcctggtc gagctggacg 1680gcgacgtaaa cggccacaag ttcagcgtgt ccggcgaggg cgagggcgat gccacctacg 1740gcaagctgac cctgaagttc atctgcacca ccggcaagct gcccgtgccc tggcccaccc 1800tcgtgaccac cttgacctac ggcgtgcagt gcttcgcccg ctaccccgac cacatgaagc 1860agcacgactt cttcaagtcc gccatgcccg aaggctacgt ccaggagcgc accatcttct 1920tcaaggacga cggcaactac aagacccgcg ccgaggtgaa gttcgagggc gacaccctgg 1980tgaaccgcat cgagctgaag ggcatcgact tcaaggagga cggcaacatc ctggggcaca 2040agctggagta caactacaac agccacaagg tctatatcac cgccgacaag cagaagaacg 2100gcatcaaggt gaacttcaag acccgccaca acatcgagga cggcagcgtg cagctcgccg 2160accactacca gcagaacacc cccatcggcg acggccccgt gctgctgccc gacaaccact 2220acctgagcac ccagtccgcc ctgagcaaag accccaacga gaagcgcgat cacatggtcc 2280tgctggagtt cgtgaccgcc gccgggatca ctctcggcat ggacgagctg tacaagtaac 2340tcgagaagct tgatccggct gcgaatttct tatgatttat gatttttatt attaaataag 2400ttataaaaaa aataagtgta tacaaatttt aaagtgactc ttaggtttta aaacgaaaat 2460tcttattctt gagtaactct ttcctgtagg tcaggttgct ttctcaggta tagcatgagg 2520tcgctcttat tgaccacacc tctaccggca tgccgagcat gttgagataa ttgttgggat 2580tccattgttg ataaaggcta taatattagg tatacagaat atactagaag ttctcctcga 2640ggatatagga atcctcaaaa tggaatctat atttctacat actaatatta cgattattcc 2700tcattccgtt ttatatgttt atattcattg atcctattac attatcaatc cttgcgtttc 2760agcttcctct aacttcgatg acagcttctc ataacttatg tcatcatctt aacaccgtat 2820atgataatat attgataata taactattag ttgatagacg atagtggatt tttattccaa 2880cataccaccc ataatgtaat agatctaatg aatccatttg tttgttaata gtttgaatgt 2940ttttatcgga agaggtttgg tcattacgtc tgcaatattc tttttggttt 
cgatatagca 3000tacgtgcaga tgatttcctg atacttcatc tctcaatctc attgctttag taccaaaaaa 3060tctgttccta aatttccggt gtttaaaccc cagcgcctgg cgggtcttca ttattggata 3120taattatact gattgtagat ttactgtcgg ttagtaatcc tttggtaatt ggtttcttgt 3180caagttcttg tatcaggtaa cttagattat ttaataatgg gacagattca cttatcgcgt 3240gtatttctgc ttccgtagtt gaagtacatg ttaatgaagc cttggtggac tttcctccaa 3300ttacctttcc attaagtaaa tatatgttgc caatttgtga tttataatac ggttggttgc 3360catacgaggc atcgcttata acaactaatt tatttgttgg cttaacaggt ttgcttttgt 3420gccatattaa ttgcttatct ctcgtattcc atatgaactg tatcaattca tatgtcatat 3480ctaacacttg cttggacgga aatagtatat gttgtgcaag tgtgttgatg tagtataata 3540ggtcaaatct aaatttatat ccaacatatg atgctagacc tatcag 3586
61 43 DNA Artificial Sequence Synthetic Recognition sequence for ZFN.YJR030C 61
cccggtatca gcaacccccc atgacgataa cgttgatgaa acg 43
62 40 DNA Artificial Sequence Synthetic Recognition sequence for ZFN.SFC1 62
cacctttacc gtttatgaat atgtaaggga gcatttagaa 40
63 27 DNA Artificial Sequence Synthetic Primer CUT351 63
gcgaatgagc catgaattat taaccgc 27
64 33 DNA Artificial Sequence Synthetic Primer CUT350 64
agatgaaacg aattactagc attttatccg ttc 33
65 47 DNA Artificial Sequence Synthetic Primer CUT371 65
taactaccat tactcagtgt acttgattgt tttgtccgat tttcttg 47
66 22 DNA Artificial Sequence Synthetic Primer HJ788 66
gccgggtgac agagaaatat tg 22

* * * * *

