Cytosolic Isobutanol Pathway Localization for the Production of Isobutanol Urano; Jun ; et al. [Gevo, Inc.]

Patent Application Summary

U.S. patent application number 14/157799, for cytosolic isobutanol pathway localization for the production of isobutanol, was filed with the patent office on January 17, 2014, and published on 2014-10-16. This patent application is currently assigned to Gevo, Inc. The applicant listed for this patent is Gevo, Inc. The invention is credited to Aristos A. Aristidou, Ruth Berry, Thomas Buelter, Catherine Asleson Dundon, Reid M. Renny Feldman, Andrew Hawkins, Ishmeet Kalra, Doug Lies, Peter Meinhold, Matthew Peters, Stephanie Porter-Scheinman, Christopher Smith, Jun Urano.

Publication Number	20140308721
Application Number	14/157799
Family ID	43586479
Publication Date	2014-10-16

United States Patent Application	20140308721
Kind Code	A1
Urano; Jun ; et al.	October 16, 2014

Cytosolic Isobutanol Pathway Localization for the Production of Isobutanol

Abstract

The present invention provides recombinant microorganisms comprising an isobutanol producing metabolic pathway with at least one isobutanol pathway enzyme localized in the cytosol, wherein said recombinant microorganism is selected to produce isobutanol from a carbon source. Methods of using said recombinant microorganisms to produce isobutanol are also provided. In various aspects of the invention, the recombinant microorganisms may comprise cytosolically active isobutanol pathway enzymes. In some embodiments, the invention provides mutated, modified, and/or chimeric isobutanol pathway enzymes with cytosolic activity. In various embodiments described herein, the recombinant microorganisms may be microorganisms of the Saccharomyces clade, Crabtree-negative yeast microorganisms, Crabtree-positive yeast microorganisms, post-WGD (whole genome duplication) yeast microorganisms, pre-WGD (whole genome duplication) yeast microorganisms, and non-fermenting yeast microorganisms.

Inventors:

Urano; Jun; (Irvine, CA) ; Dundon; Catherine Asleson; (Englewood, CO) ; Meinhold; Peter; (Topanga, CA) ; Feldman; Reid M. Renny; (San Francisco, CA) ; Aristidou; Aristos A.; (Maple Grove, MN) ; Hawkins; Andrew; (Parker, CO) ; Buelter; Thomas; (Santa Monica, CA) ; Peters; Matthew; (Highlands Ranch, CO) ; Lies; Doug; (Parker, CO) ; Porter-Scheinman; Stephanie; (Englewood, CO) ; Smith; Christopher; (Parker, CO) ; Berry; Ruth; (Englewood, CO) ; Kalra; Ishmeet; (Englewood, CO)

Applicant:

Name	City	State	Country	Type
Gevo, Inc.	Englewood	CO	US

Assignee:

Gevo, Inc., Englewood, CO

Family ID:

43586479

Appl. No.:

14/157799

Filed:

January 17, 2014

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number	Child Application
13176452	Jul 5, 2011		14157799
12855276	Aug 12, 2010	8232089	13176452
61272058	Aug 12, 2009
61272059	Aug 12, 2009

Current U.S. Class:	435/160 ; 435/254.21
Current CPC Class:	C12N 9/1022 20130101; C12N 15/81 20130101; C12N 9/88 20130101; Y02E 50/10 20130101; C12P 7/16 20130101; C12N 9/0006 20130101
Class at Publication:	435/160 ; 435/254.21
International Class:	C12N 15/81 20060101 C12N015/81; C12P 7/16 20060101 C12P007/16

Government Interests

ACKNOWLEDGMENT OF GOVERNMENTAL SUPPORT

[0002] This invention was made with government support under Contract No. IIP-0823122, awarded by the National Science Foundation, and under Contract No. EP-D-09-023, awarded by the Environmental Protection Agency. The government has certain rights in the invention.

Claims

1. A recombinant yeast microorganism comprising an isobutanol producing metabolic pathway, wherein said isobutanol producing metabolic pathway comprises the following substrate to product conversions: (i) pyruvate to acetolactate; (ii) acetolactate to 2,3-dihydroxyisovalerate; (iii) 2,3-dihydroxyisovalerate to α-ketoisovalerate; (iv) α-ketoisovalerate to isobutyraldehyde; and (v) isobutyraldehyde to isobutanol; wherein a) the enzyme that catalyzes a substrate to product conversion of pyruvate to acetolactate is an acetolactate synthase; b) the enzyme that catalyzes a substrate to product conversion of acetolactate to 2,3-dihydroxyisovalerate is a ketol-acid reductoisomerase derived from Lactococcus lactis; c) the enzyme that catalyzes a substrate to product conversion of 2,3-dihydroxyisovalerate to α-ketoisovalerate is a dihydroxy acid dehydratase; d) the enzyme that catalyzes a substrate to product conversion of α-ketoisovalerate to isobutyraldehyde is an α-ketoisovalerate decarboxylase; and e) the enzyme that catalyzes a substrate to product conversion of isobutyraldehyde to isobutanol is an alcohol dehydrogenase.

2. The recombinant yeast microorganism of claim 1, wherein said acetolactate synthase is derived from a bacterial organism.

3. The recombinant yeast microorganism of claim 2, wherein said bacterial organism is Bacillus subtilis.

4. The recombinant yeast microorganism of claim 1, wherein said ketol-acid reductoisomerase is an NADH-dependent ketol-acid reductoisomerase.

5. The recombinant yeast microorganism of claim 1, wherein said dihydroxy acid dehydratase comprises the amino acid sequence P(I/L)XXXGX(I/L)XIL (SEQ ID NO: 27), wherein X is any natural or non-natural amino acid.

6. The recombinant yeast microorganism of claim 5, wherein said dihydroxy acid dehydratase enzyme is derived from a bacterial organism.

7. The recombinant yeast microorganism of claim 6, wherein said bacterial organism is Lactococcus lactis.

8. The recombinant yeast microorganism of claim 1, wherein said α-ketoisovalerate decarboxylase is derived from a bacterial organism.

9. The recombinant yeast microorganism of claim 8, wherein said bacterial organism is Lactococcus lactis.

10. The recombinant yeast microorganism of claim 1, wherein said alcohol dehydrogenase is derived from a bacterial organism.

11. The recombinant yeast microorganism of claim 10, wherein said bacterial organism is Lactococcus lactis.

12. The recombinant yeast microorganism of claim 1, wherein the recombinant yeast microorganism has been engineered to inactivate one or more endogenous pyruvate decarboxylase (PDC) genes.

13. The recombinant yeast microorganism of claim 12, wherein said one or more endogenous PDC genes is selected from PDC1, PDC5, and PDC6.

14. The recombinant yeast microorganism of claim 1, wherein the recombinant yeast microorganism has been engineered to inactivate one or more endogenous glycerol-3-phosphate dehydrogenase (GPD) genes.

15. The recombinant yeast microorganism of claim 14, wherein said one or more endogenous GPD genes is selected from GPD1 and GPD2.

16. The recombinant yeast microorganism of claim 1, wherein the recombinant yeast microorganism is a yeast microorganism of the Saccharomyces clade.

17. The recombinant yeast microorganism of claim 16, wherein said yeast microorganism of the Saccharomyces clade is S. cerevisiae.

18. A method of producing isobutanol, comprising: (a) providing a recombinant yeast microorganism according to claim 1; and (b) cultivating said recombinant yeast microorganism in a culture medium containing a feedstock providing the carbon source, until a recoverable quantity of the isobutanol is produced.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of U.S. application Ser. No. 13/176,452, filed Jul. 5, 2011, which is a divisional of U.S. application Ser. No. 12/855,276, filed Aug. 12, 2010, which issued as U.S. Pat. No. 8,232,089, which claims the benefit of U.S. Provisional Application Ser. No. 61/272,058, filed Aug. 12, 2009, and U.S. Provisional Application Ser. No. 61/272,059, filed Aug. 12, 2009, each of which is herein incorporated by reference in its entirety for all purposes.

TECHNICAL FIELD

[0003] Recombinant microorganisms and methods of producing such organisms are provided. Also provided are methods of producing metabolites that are biofuels by contacting a suitable substrate with recombinant microorganisms and enzymatic preparations therefrom.

DESCRIPTION OF THE TEXT FILE SUBMITTED ELECTRONICALLY

[0004] The contents of the text file submitted electronically herewith are incorporated herein by reference in their entirety: A computer readable format copy of the Sequence Listing (filename: GEVO_041_18US_SeqList_ST25.txt, date recorded: Jan. 16, 2014, file size: 343 kilobytes).

BACKGROUND

[0005] Biofuels have a long history dating back to the beginning of the 20th century. As early as 1900, Rudolf Diesel demonstrated an engine running on peanut oil at the World Exhibition in Paris, France. Soon thereafter, Henry Ford demonstrated his Model T running on ethanol derived from corn. Petroleum-derived fuels displaced biofuels in the 1930s and 1940s because of increased supply and greater efficiency at lower cost.

[0006] Market fluctuations in the 1970s, coupled with the decrease in US oil production, led to an increase in crude oil prices and a renewed interest in biofuels. Today, many interest groups, including policy makers, industry planners, aware citizens, and the financial community, are interested in replacing petroleum-derived fuels with biomass-derived biofuels. The leading motivations for developing biofuels are economic, political, and environmental.

[0007] One is the threat of "peak oil", the point at which the consumption rate of crude oil exceeds the supply rate; the resulting significant increase in fuel cost drives demand for alternative fuels. In addition, instability in the Middle East and other oil-rich regions has increased the demand for domestically produced biofuels. Also, environmental concerns about possible climate change driven by carbon dioxide are an important social and ethical driving force, which is starting to result in government regulations and policies such as caps on carbon dioxide emissions from automobiles, taxes on carbon dioxide emissions, and tax incentives for the use of biofuels.

[0008] Ethanol is the most abundant fermentatively produced fuel today but has several drawbacks when compared to gasoline. Butanol, in comparison, has several advantages over ethanol as a fuel: it can be made from the same feedstocks as ethanol but, unlike ethanol, it is compatible with gasoline at any ratio and can also be used as a pure fuel in existing combustion engines without modifications. Unlike ethanol, butanol does not absorb water and can thus be stored and distributed in the existing petrochemical infrastructure. Because its energy content is higher than that of ethanol and close to that of gasoline, butanol gives better fuel economy (miles per gallon) than ethanol. Also, butanol-gasoline blends have lower vapor pressure than ethanol-gasoline blends, which is important in reducing evaporative hydrocarbon emissions.

[0009] Isobutanol has the same advantages as butanol, with the additional advantage of a higher octane number due to its branched carbon chain. Isobutanol is also useful as a commodity chemical and as a precursor to MTBE.

[0010] Isobutanol has been produced in recombinant microorganisms expressing a heterologous, five-step metabolic pathway (See, e.g., WO/2007/050671 to Donaldson et al., WO/2008/098227 to Liao et al., and WO/2009/103533 to Festel et al.). However, the microorganisms produced have fallen short of commercial relevance due to their low performance characteristics, including, for example, low productivity, low titer, low yield, and the requirement for oxygen during the fermentation process. Thus, recombinant microorganisms exhibiting increased isobutanol productivity, titer, and/or yield are desirable.
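
The three performance characteristics named above are conventionally reported as titer (g isobutanol per L), yield (g isobutanol per g sugar consumed), and volumetric productivity (g per L per h). The Python sketch below computes them under those conventional definitions and compares yield with a stoichiometric ceiling. It assumes glucose as the only carbon source and the simplified stoichiometry of one glucose to two pyruvate to one isobutanol (plus two CO2), with no carbon diverted to biomass or byproducts. The sample values in the usage lines are hypothetical, not data from this application.

```python
from dataclasses import dataclass

# Molar masses (g/mol) from standard atomic weights C 12.011, H 1.008, O 15.999.
MW_GLUCOSE = 6 * 12.011 + 12 * 1.008 + 6 * 15.999     # C6H12O6, ~180.16
MW_ISOBUTANOL = 4 * 12.011 + 10 * 1.008 + 1 * 15.999  # C4H10O, ~74.12

# Assumed stoichiometry: 1 glucose -> 2 pyruvate -> 1 isobutanol + 2 CO2,
# so the mass-yield ceiling is the molar mass ratio (~0.411 g/g).
THEORETICAL_YIELD_G_PER_G = MW_ISOBUTANOL / MW_GLUCOSE


@dataclass
class FermentationSample:
    isobutanol_g_per_l: float        # isobutanol concentration at sampling
    glucose_consumed_g_per_l: float  # glucose consumed up to sampling
    elapsed_h: float                 # time since inoculation


def titer(s: FermentationSample) -> float:
    """Titer: product concentration, g/L."""
    return s.isobutanol_g_per_l


def mass_yield(s: FermentationSample) -> float:
    """Yield: g isobutanol per g glucose consumed."""
    return s.isobutanol_g_per_l / s.glucose_consumed_g_per_l


def percent_of_theoretical(s: FermentationSample) -> float:
    """Yield expressed as a percentage of the stoichiometric ceiling."""
    return 100.0 * mass_yield(s) / THEORETICAL_YIELD_G_PER_G


def volumetric_productivity(s: FermentationSample) -> float:
    """Average volumetric productivity over the run, g/L/h."""
    return s.isobutanol_g_per_l / s.elapsed_h


if __name__ == "__main__":
    sample = FermentationSample(10.0, 50.0, 48.0)  # hypothetical values
    print(f"titer {titer(sample):.1f} g/L, "
          f"yield {mass_yield(sample):.3f} g/g "
          f"({percent_of_theoretical(sample):.1f}% of theoretical), "
          f"productivity {volumetric_productivity(sample):.3f} g/L/h")
```

Keeping the ceiling as a computed ratio rather than a hard-coded number makes the stoichiometric assumption explicit and easy to change.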

SUMMARY OF THE INVENTION

[0011] The present invention provides cytosolically active dihydroxyacid dehydratase (DHAD) enzymes and recombinant microorganisms comprising said cytosolically active DHAD enzymes. In some embodiments, said recombinant microorganisms may further comprise one or more additional enzymes catalyzing a reaction in an isobutanol producing metabolic pathway. As described herein, the recombinant microorganisms of the present invention are useful for the production of several beneficial metabolites, including, but not limited to isobutanol.

[0012] In a first aspect, the invention provides cytosolically active dihydroxyacid dehydratase (DHAD) enzymes. These cytosolically active DHAD enzymes generally exhibit the ability to convert 2,3-dihydroxyisovalerate to ketoisovalerate in the cytosol. The cytosolically active DHAD enzymes of the present invention, as described herein, can include native (i.e. parental) DHAD enzymes that exhibit cytosolic activity, as well as DHAD enzymes that have been modified or mutated to increase their cytosolic localization and/or activity as compared to native (i.e. parental) DHAD enzymes.

[0013] In various embodiments described herein, the DHAD enzymes may be derived from a prokaryotic organism. In one embodiment, the prokaryotic organism is a bacterial organism. In another embodiment, the bacterial organism is Lactococcus lactis. In a specific embodiment, the DHAD enzyme from L. lactis comprises the amino acid sequence of SEQ ID NO: 18. In another embodiment, the bacterial organism is Francisella tularensis. In a specific embodiment, the DHAD enzyme from F. tularensis comprises the amino acid sequence of SEQ ID NO: 14. In another embodiment, the bacterial organism is Gramella forsetii. In a specific embodiment, the DHAD enzyme from G. forsetii comprises the amino acid sequence of SEQ ID NO: 17.

[0014] In alternative embodiments described herein, the DHAD enzyme may be derived from a eukaryotic organism. In one embodiment, the eukaryotic organism is a fungal organism. In an exemplary embodiment, the fungal organism is Neurospora crassa. In a specific embodiment, the DHAD enzyme from N. crassa comprises the amino acid sequence of SEQ ID NO: 165.

[0015] In some embodiments, the invention provides modified or mutated DHAD enzymes, wherein said DHAD enzymes exhibit increased cytosolic activity as compared to their parental DHAD enzymes. In another embodiment, the invention provides modified or mutated DHAD enzymes, wherein said DHAD enzymes exhibit increased cytosolic activity as compared to the DHAD enzyme having the amino acid sequence of SEQ ID NO: 11.

[0016] In further embodiments, the invention provides DHAD enzymes comprising the amino acid sequence P(I/L)XXXGX(I/L)XIL (SEQ ID NO: 27), wherein X is any natural or non-natural amino acid, and wherein said DHAD enzymes exhibit the ability to convert 2,3-dihydroxyisovalerate to ketoisovalerate in the cytosol.
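
Because SEQ ID NO: 27 is written as a degenerate motif, one practical way to screen candidate DHAD sequences for it is to translate the motif into a regular expression. The Python sketch below assumes one-letter amino acid codes and treats each X as any letter A to Z, so non-natural residues, which have no standard one-letter code, are not modelled. It reports 1-based start positions and uses a lookahead so overlapping occurrences are all found. The test sequence is an invented fragment, not a sequence from this application, and a motif hit alone does not demonstrate cytosolic DHAD activity.

```python
import re

SEQ_ID_NO_27 = "P(I/L)XXXGX(I/L)XIL"


def motif_to_regex(motif: str) -> str:
    """Translate motif notation: '(A/B)' -> '[AB]', 'X' -> any letter."""
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            close = motif.index(")", i)
            parts.append("[" + "".join(motif[i + 1:close].split("/")) + "]")
            i = close + 1
        elif ch == "X":
            parts.append("[A-Z]")
            i += 1
        else:
            parts.append(ch)
            i += 1
    return "".join(parts)


# Zero-width lookahead wrapper so overlapping occurrences are all reported.
MOTIF_RE = re.compile("(?=(" + motif_to_regex(SEQ_ID_NO_27) + "))")


def find_motif(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each occurrence."""
    seq = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in MOTIF_RE.finditer(seq)]


if __name__ == "__main__":
    print(find_motif("MKTPLAAAGSIKILQQ"))  # prints [(4, 'PLAAAGSIKIL')]
```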

[0017] In some embodiments, the DHAD enzymes of the present invention exhibit a properly folded iron-sulfur cluster domain and/or redox active domain in the cytosol. In one embodiment, the DHAD enzymes comprise a mutated or modified iron-sulfur cluster domain and/or redox active domain.

[0018] In another aspect, the present invention provides recombinant microorganisms comprising a cytosolically active DHAD enzyme. In one embodiment, the invention provides recombinant microorganisms comprising a DHAD enzyme derived from a prokaryotic organism, wherein said DHAD enzyme exhibits activity in the cytosol. In one embodiment, the DHAD enzyme is derived from a bacterial organism. In a specific embodiment, the DHAD enzyme is derived from L. lactis and comprises the amino acid sequence of SEQ ID NO: 18. In another embodiment, the invention provides recombinant microorganisms comprising a DHAD enzyme derived from a eukaryotic organism, wherein said DHAD enzyme exhibits activity in the cytosol. In one embodiment, the DHAD enzyme is derived from a fungal organism. In an alternative embodiment, the DHAD enzyme is derived from a yeast organism.

[0019] In one embodiment, the invention provides recombinant microorganisms comprising a modified or mutated DHAD enzyme, wherein said DHAD enzyme exhibits increased cytosolic activity as compared to the parental DHAD enzyme. In another embodiment, the invention provides recombinant microorganisms comprising a modified or mutated DHAD enzyme, wherein said DHAD enzyme exhibits increased cytosolic activity as compared to the DHAD enzyme having the amino acid sequence of SEQ ID NO: 11.

[0020] In another embodiment, the invention provides recombinant microorganisms comprising a DHAD enzyme comprising the amino acid sequence P(I/L)XXXGX(I/L)XIL (SEQ ID NO: 27), wherein X is any natural or non-natural amino acid, and wherein said DHAD enzyme exhibits the ability to convert 2,3-dihydroxyisovalerate to ketoisovalerate in the cytosol.

[0021] In some embodiments, the invention provides recombinant microorganisms comprising a DHAD enzyme fused to a peptide tag, whereby said DHAD enzyme exhibits increased cytosolic localization and/or cytosolic DHAD activity as compared to the parental microorganism. In one embodiment, the peptide tag is non-cleavable. In another embodiment, the peptide tag is fused at the N-terminus of the DHAD enzyme. In another embodiment, the peptide tag is fused at the C-terminus of the DHAD enzyme. In certain embodiments, the peptide tag may be selected from the group consisting of ubiquitin, ubiquitin-like (UBL) proteins, myc, HA-tag, green fluorescent protein (GFP), and the maltose binding protein (MBP).
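
At the protein level, the N-terminal and C-terminal tag placements described above differ only in where the tag sequence is joined. The Python sketch below illustrates this with the widely used myc (EQKLISEEDL) and HA (YPYDVPDYA) epitope sequences. The target sequence, the optional linker, and the handling of the initiator methionine (a new Met is placed in front of an N-terminal tag and the target's own Met is dropped) are illustrative assumptions, not constructs disclosed in this application.

```python
MYC_TAG = "EQKLISEEDL"
HA_TAG = "YPYDVPDYA"


def fuse_tag(protein: str, tag: str, terminus: str, linker: str = "") -> str:
    """Return the fused amino acid sequence (one-letter codes).

    terminus: "N" places tag + linker before the protein, "C" places
    linker + tag after it.
    """
    if terminus == "N":
        # Assumed design: the fusion starts with its own initiator Met,
        # so the target's Met (if present) is not repeated after the tag.
        core = protein[1:] if protein.startswith("M") else protein
        return "M" + tag + linker + core
    if terminus == "C":
        return protein + linker + tag
    raise ValueError("terminus must be 'N' or 'C'")


if __name__ == "__main__":
    target = "MSTAR"  # hypothetical placeholder, not a real DHAD sequence
    print(fuse_tag(target, MYC_TAG, "N"))               # MEQKLISEEDLSTAR
    print(fuse_tag(target, HA_TAG, "C", linker="GS"))   # MSTARGSYPYDVPDYA
```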

[0022] In certain embodiments described herein, it may be desirable to further overexpress an additional enzyme that converts 2,3-dihydroxyisovalerate (DHIV) to ketoisovalerate (KIV) in the cytosol. In a specific embodiment, the enzyme may be selected from the group consisting of 3-isopropylmalate isomerase (Leu1p) and imidazoleglycerol-phosphate dehydratase (His3p).

[0023] In various embodiments described herein, the recombinant microorganisms may be further engineered to express an isobutanol producing metabolic pathway comprising at least one exogenous gene that catalyzes a step in the conversion of pyruvate to isobutanol. In one embodiment, the recombinant microorganism may be engineered to express an isobutanol producing metabolic pathway comprising at least two exogenous genes. In another embodiment, the recombinant microorganism may be engineered to express an isobutanol producing metabolic pathway comprising at least three exogenous genes. In another embodiment, the recombinant microorganism may be engineered to express an isobutanol producing metabolic pathway comprising at least four exogenous genes. In another embodiment, the recombinant microorganism may be engineered to express an isobutanol producing metabolic pathway comprising five exogenous genes. Thus, the present invention further provides recombinant microorganisms that comprise an isobutanol producing metabolic pathway and methods of using said recombinant microorganisms to produce isobutanol.

[0024] In one embodiment, the recombinant microorganisms comprise an isobutanol producing metabolic pathway with at least one isobutanol pathway enzyme localized in the cytosol. In another embodiment, the recombinant microorganisms comprise an isobutanol producing metabolic pathway with at least two isobutanol pathway enzymes localized in the cytosol. In another embodiment, the recombinant microorganisms comprise an isobutanol producing metabolic pathway with at least three isobutanol pathway enzymes localized in the cytosol. In another embodiment, the recombinant microorganisms comprise an isobutanol producing metabolic pathway with at least four isobutanol pathway enzymes localized in the cytosol. In an exemplary embodiment, the recombinant microorganisms comprise an isobutanol producing metabolic pathway with five isobutanol pathway enzymes localized in the cytosol. In a further exemplary embodiment, at least one of the pathway enzymes localized to the cytosol is a cytosolically active DHAD enzyme as disclosed herein.

[0025] In various embodiments described herein, the isobutanol pathway enzyme(s) is/are selected from the group consisting of acetolactate synthase (ALS), ketol-acid reductoisomerase (KARI), dihydroxyacid dehydratase (DHAD), 2-keto-acid decarboxylase (KIVD), and alcohol dehydrogenase (ADH).
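
The five enzymes listed above act in a fixed order, each consuming the product of the previous step. The Python sketch below encodes that order as a small data structure, checks that the steps chain correctly, and lists which pathway enzymes a given construct still lacks in the cytosol, in the sense of the "at least one" to "five" cytosolic-enzyme embodiments. The set of cytosolic enzymes passed in is a hypothetical input, and ALS is simplified to a single substrate entry although it condenses two pyruvate molecules.

```python
from typing import NamedTuple


class Step(NamedTuple):
    enzyme: str
    substrate: str
    product: str


# Order of conversions as recited in claim 1 and listed in this paragraph.
ISOBUTANOL_PATHWAY = (
    Step("ALS", "pyruvate", "acetolactate"),  # uses 2 pyruvate per acetolactate
    Step("KARI", "acetolactate", "2,3-dihydroxyisovalerate"),
    Step("DHAD", "2,3-dihydroxyisovalerate", "alpha-ketoisovalerate"),
    Step("KIVD", "alpha-ketoisovalerate", "isobutyraldehyde"),
    Step("ADH", "isobutyraldehyde", "isobutanol"),
)


def check_chain(steps) -> None:
    """Raise ValueError if any step's product is not the next step's substrate."""
    for prev, nxt in zip(steps, steps[1:]):
        if prev.product != nxt.substrate:
            raise ValueError(f"{prev.enzyme} -> {nxt.enzyme}: "
                             f"{prev.product!r} != {nxt.substrate!r}")


def missing_in_cytosol(steps, cytosolic: set[str]) -> list[str]:
    """Pathway enzymes, in pathway order, that are not in the cytosolic set."""
    return [s.enzyme for s in steps if s.enzyme not in cytosolic]


if __name__ == "__main__":
    check_chain(ISOBUTANOL_PATHWAY)
    # Hypothetical construct with three of the five enzymes active in the cytosol.
    print(missing_in_cytosol(ISOBUTANOL_PATHWAY, {"ALS", "KIVD", "ADH"}))
```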

[0026] As described herein, the cytosolically active isobutanol pathway enzymes of the present invention can include native (i.e. parental) enzymes that exhibit cytosolic activity, as well as isobutanol pathway enzymes that have been modified or mutated to increase their cytosolic localization and/or activity as compared to native (i.e. parental) pathway enzymes.

[0027] In various embodiments described herein, the isobutanol pathway enzymes may be derived from a prokaryotic organism. In alternative embodiments described herein, the isobutanol pathway enzymes may be derived from a eukaryotic organism.

[0028] In some embodiments, the invention provides modified or mutated isobutanol pathway enzymes, wherein said isobutanol pathway enzymes exhibit increased cytosolic activity as compared to their parental isobutanol pathway enzymes. In another embodiment, the invention provides modified or mutated isobutanol pathway enzymes, wherein said isobutanol pathway enzymes exhibit increased cytosolic activity as compared to the homologous isobutanol pathway enzyme from S. cerevisiae.

[0029] In various embodiments described herein, at least one of the isobutanol pathway enzymes exhibiting cytosolic activity is ALS. In one embodiment, the ALS is derived from a prokaryotic organism, including, but not limited to Bacillus subtilis or L. lactis. In another embodiment, the ALS is derived from a eukaryotic organism, including, but not limited to Magnaporthe grisea, Phaeosphaeria nodorum, Talaromyces stipitatus, and Trichoderma atroviride.

[0030] In additional embodiments, at least one of the isobutanol pathway enzymes exhibiting cytosolic activity is KARI. In one embodiment, the KARI is derived from a prokaryotic organism, including, but not limited to, Escherichia coli, B. subtilis, or L. lactis. In another embodiment, the KARI is derived from a eukaryotic organism, including, but not limited to, Piromyces sp. E2, S. cerevisiae, and Arabidopsis. In certain specific embodiments, the KARI comprises an amino acid sequence from an organism selected from the group consisting of E. coli, S. cerevisiae, B. subtilis, Piromyces sp. E2, Buchnera aphidicola, Spinacia oleracea, Oryza sativa, Chlamydomonas reinhardtii, N. crassa, Schizosaccharomyces pombe, Laccaria bicolor, Ignicoccus hospitalis, Picrophilus torridus, Acidiphilium cryptum, Cyanobacteria/Synechococcus sp., Zymomonas mobilis, Bacteroides thetaiotaomicron, Methanococcus maripaludis, Vibrio fischeri, Shewanella sp., G. forsetii, Psychromonas ingrahamii, and Cytophaga hutchinsonii. In additional embodiments, the KARI may be an NADH-dependent KARI.
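
One commonly cited reason for preferring an NADH-dependent KARI, offered here as background rather than as a statement from this section, is redox balance: glycolysis of one glucose to two pyruvate yields two NADH, and the pathway then consumes one reduced cofactor at KARI and one at ADH per isobutanol. The Python sketch below does that bookkeeping under the simplifying assumptions of one glucose to one isobutanol and no other NADH sources or sinks; the cofactor preferences passed in are illustrative.

```python
from collections import Counter

# Simplifying assumption: glucose -> 2 pyruvate yields 2 NADH, and the
# two pyruvate are converted into exactly one isobutanol.
NADH_FROM_GLYCOLYSIS = 2


def cofactor_balance(kari_cofactor: str, adh_cofactor: str) -> dict[str, int]:
    """Per glucose (= per isobutanol): NADH left over from glycolysis and
    NADPH that must be supplied elsewhere, for the given KARI and ADH
    cofactor preferences."""
    for c in (kari_cofactor, adh_cofactor):
        if c not in ("NADH", "NADPH"):
            raise ValueError(f"unknown cofactor {c!r}")
    demand = Counter([kari_cofactor, adh_cofactor])  # one use per enzyme
    return {
        "nadh_surplus": NADH_FROM_GLYCOLYSIS - demand["NADH"],
        "nadph_required": demand["NADPH"],
    }


if __name__ == "__main__":
    for kari in ("NADPH", "NADH"):
        print(kari, cofactor_balance(kari, "NADH"))
```

Under these assumptions only the all-NADH case leaves neither surplus NADH nor an NADPH requirement.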

[0031] In various embodiments described herein, the isobutanol pathway enzyme may be mutated or modified to remove an N-terminal mitochondrial targeting sequence (MTS). Removal of the MTS can increase cytosolic localization of the isobutanol pathway enzyme and/or increase the cytosolic activity of the isobutanol pathway enzyme as compared to the parental isobutanol pathway enzyme.
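
Removing an N-terminal MTS, as described above, amounts to expressing the protein from a point downstream of the targeting sequence while keeping a translational start. The Python sketch below shows that operation on an amino acid sequence. The MTS length is a hypothetical input (in practice it would come from experimental data or a targeting-sequence predictor), and re-adding an initiator methionine when the truncated sequence does not already begin with one is a common design choice assumed here.

```python
def remove_mts(protein: str, mts_length: int) -> str:
    """Drop the first `mts_length` residues of a precursor protein and make
    sure the truncated sequence still begins with an initiator Met."""
    if not protein.startswith("M"):
        raise ValueError("expected a precursor sequence starting with Met")
    if not 0 < mts_length < len(protein):
        raise ValueError("mts_length must be positive and shorter than the protein")
    mature = protein[mts_length:]
    return mature if mature.startswith("M") else "M" + mature


if __name__ == "__main__":
    # Hypothetical precursor whose first 4 residues are treated as the MTS.
    print(remove_mts("MLRRSTAKV", 4))  # prints MSTAKV
```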

[0032] In some embodiments, the MTS may be modified or mutated to reduce or eliminate its ability to target the isobutanol pathway enzyme to the mitochondria. Selected modification of the MTS can increase cytosolic localization of the isobutanol pathway enzyme and/or increase the cytosolic activity of the isobutanol pathway enzyme as compared to the parental isobutanol pathway enzyme.

[0033] In additional embodiments, the invention provides recombinant microorganisms comprising an isobutanol pathway enzyme fused to a peptide tag, whereby said isobutanol pathway enzyme exhibits increased cytosolic localization and/or cytosolic activity as compared to the parental enzyme. As a result, the recombinant microorganism comprising the tagged isobutanol pathway enzyme will generally exhibit an increased ability to perform a step involved in the conversion of pyruvate to isobutanol in the cytosol. In one embodiment, the peptide tag is non-cleavable. In another embodiment, the peptide tag is fused at the N-terminus of the isobutanol pathway enzyme. In another embodiment, the peptide tag is fused at the C-terminus of the isobutanol pathway enzyme. In certain embodiments, the peptide tag may be selected from the group consisting of ubiquitin, ubiquitin-like (UBL) proteins, myc, HA-tag, green fluorescent protein (GFP), and the maltose binding protein (MBP).

[0034] In various embodiments described herein, the recombinant microorganisms may further comprise a nucleic acid encoding a chaperone protein, wherein said chaperone protein assists the folding of a protein exhibiting cytosolic activity. In a preferred embodiment, the protein exhibiting cytosolic activity is an isobutanol pathway enzyme. In one embodiment, the chaperone protein may be a native protein. In another embodiment, the chaperone protein may be an exogenous protein. In some embodiments, the chaperone protein may be selected from the group consisting of: endoplasmic reticulum oxidoreductin 1 (Ero1), including variants of Ero1 that have been suitably altered to reduce or prevent its normal localization to the endoplasmic reticulum; thioredoxins (including, but not limited to, Trx1 and Trx2); thioredoxin reductase (Trr1); glutaredoxins (including, but not limited to, Grx1, Grx2, Grx3, Grx4, Grx5, Grx6, Grx7, and Grx8); glutathione reductase (Glr1); Jac1, including variants of Jac1 that have been suitably altered to reduce or prevent its normal mitochondrial localization; and homologs or variants thereof.

[0035] In some embodiments, the recombinant microorganisms may further comprise one or more genes encoding an iron-sulfur cluster assembly protein. In one embodiment, the iron-sulfur cluster assembly protein encoding genes may be derived from prokaryotic organisms. In one embodiment, the iron-sulfur cluster assembly protein encoding genes are derived from a bacterial organism, including, but not limited to E. coli, L. lactis, Helicobacter pylori, and Entamoeba histolytica. In specific embodiments, the bacterially derived iron-sulfur cluster assembly protein encoding genes are selected from the group consisting of cyaY, iscS, iscU, iscA, hscB, hscA, fdx, isuX, sufA, sufB, sufC, sufD, sufS, sufE, apbC, and homologs or variants thereof.

[0036] In another embodiment, the iron-sulfur cluster assembly protein encoding genes may be derived from eukaryotic organisms, including, but not limited to yeasts and plants. In one embodiment, the iron-sulfur cluster protein encoding genes are derived from a yeast organism, including, but not limited to S. cerevisiae. In specific embodiments, the yeast derived genes encoding iron-sulfur cluster assembly proteins are selected from the group consisting of Cfd1, Nbp35, Nar1, Cia1, and homologs or variants thereof. In a further embodiment, the iron-sulfur cluster assembly protein encoding genes may be derived from plant nuclear genes which encode proteins translocated to chloroplast or plant genes found in the chloroplast genome itself.

[0037] In some embodiments, one or more genes encoding an iron-sulfur cluster assembly protein may be mutated or modified to remove a signal peptide, whereby localization of the product of said one or more genes to the mitochondria or other subcellular compartment is prevented. In certain embodiments, it may be preferable to overexpress one or more genes encoding an iron-sulfur cluster assembly protein.

[0038] In certain embodiments described herein, it may be desirable to reduce or eliminate the activity and/or protein levels of one or more iron-sulfur cluster containing cytosolic proteins. In a specific embodiment, the iron-sulfur cluster containing cytosolic protein is 3-isopropylmalate dehydratase (Leu1p). In one embodiment, the recombinant microorganism comprises a mutation in the LEU1 gene resulting in the reduction of Leu1p protein levels. In another embodiment, the recombinant microorganism comprises a partial deletion of the LEU1 gene resulting in the reduction of Leu1p protein levels. In another embodiment, the recombinant microorganism comprises a complete deletion of the LEU1 gene resulting in the reduction of Leu1p protein levels. In another embodiment, the recombinant microorganism comprises a modification of the regulatory region associated with the LEU1 gene resulting in the reduction of Leu1p protein levels. In yet another embodiment, the recombinant microorganism comprises a modification of a transcriptional regulator for the LEU1 gene resulting in the reduction of Leu1p protein levels.

[0039] In additional embodiments, the present invention provides recombinant microorganisms comprising chimeric proteins consisting of isobutanol pathway enzymes. In one embodiment, the chimeric proteins consist of ALS and at least one additional protein. In a specific embodiment, the additional protein is KARI. In a preferred embodiment, the chimeric protein exhibits the biocatalytic properties of both ALS and KARI. Such a chimeric protein allows for an increase in the concentration of 2-acetolactate at the active site of KARI as compared to the parental microorganism, giving the recombinant microorganism an enhanced ability to convert 2-acetolactate to 2,3-dihydroxyisovalerate. In another embodiment, the chimeric proteins consist of KARI and at least one additional protein. In a specific embodiment, the additional protein is DHAD. In a preferred embodiment, the chimeric protein exhibits the biocatalytic properties of both KARI and DHAD. In each of the various embodiments described herein, the proteins may be connected via a flexible linker.
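
When an ALS-KARI or KARI-DHAD chimera is built at the DNA level, the two coding sequences have to be joined in frame: the upstream stop codon is removed and the linker must be a whole number of codons. The Python sketch below assumes a (Gly4Ser)n linker, one common flexible linker that this application does not specifically name, encoded with GGT (Gly) and TCT (Ser) codons without any codon optimization. The short ORFs in the usage lines are placeholders, not real ALS or KARI genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
G4S_LINKER_DNA = "GGTGGTGGTGGTTCT"  # Gly-Gly-Gly-Gly-Ser (assumed linker)


def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def in_frame_stops(seq: str) -> list[int]:
    """0-based codon indices of in-frame stop codons."""
    return [i for i, c in enumerate(codons(seq)) if c in STOP_CODONS]


def fuse_orfs(upstream: str, downstream: str, linker_repeats: int = 3) -> str:
    """Join two ORFs through a (G4S)n linker, keeping a single reading frame."""
    up, down = upstream.upper(), downstream.upper()
    for name, orf in (("upstream", up), ("downstream", down)):
        if len(orf) % 3 or not orf.startswith("ATG"):
            raise ValueError(f"{name} is not a complete ORF starting with ATG")
    if codons(up)[-1] in STOP_CODONS:
        up = up[:-3]  # drop the upstream stop so translation reads through
    if in_frame_stops(up):
        raise ValueError("upstream ORF has an internal in-frame stop codon")
    # The downstream ATG is kept; it simply encodes an internal Met.
    return up + G4S_LINKER_DNA * linker_repeats + down


if __name__ == "__main__":
    fused = fuse_orfs("ATGGCTTAA", "ATGAAATGA", linker_repeats=1)
    print(fused, in_frame_stops(fused))  # only the final codon is a stop
```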

[0040] In various embodiments described herein, the recombinant microorganisms may be engineered to express native genes that catalyze a step in the conversion of pyruvate to isobutanol. In one embodiment, the recombinant microorganism is engineered to increase the activity of a native metabolic pathway gene for conversion of pyruvate to isobutanol. In another embodiment, the recombinant microorganism is further engineered to include at least one enzyme encoded by an exogenous gene and at least one enzyme encoded by a native gene. In yet another embodiment, the recombinant microorganism comprises a reduction in the activity of a native metabolic pathway as compared to a parental microorganism.

[0041] In another embodiment, the present invention provides recombinant microorganisms comprising a scaffold system tethered to one or more isobutanol pathway enzymes. In a specific embodiment, the scaffold system is the MAP kinase scaffold (Ste5) system. In a further embodiment, one or more of the isobutanol pathway enzymes may be modified or mutated to comprise a protein domain allowing for binding to the scaffold system.

[0042] In various embodiments described herein, the recombinant microorganisms may be microorganisms of the Saccharomyces clade, Saccharomyces sensu stricto microorganisms, Crabtree-negative yeast microorganisms, Crabtree-positive yeast microorganisms, post-WGD (whole genome duplication) yeast microorganisms, pre-WGD (whole genome duplication) yeast microorganisms, and non-fermenting yeast microorganisms.

[0043] In some embodiments, the recombinant microorganisms may be yeast recombinant microorganisms of the Saccharomyces clade.

[0044] In some embodiments, the recombinant microorganisms may be Saccharomyces sensu stricto microorganisms. In one embodiment, the Saccharomyces sensu stricto is selected from the group consisting of S. cerevisiae, S. kudriavzevii, S. mikatae, S. bayanus, S. uvarum, S. cariocanus, and hybrids thereof.

[0045] In some embodiments, the recombinant microorganisms may be Crabtree-negative recombinant yeast microorganisms. In one embodiment, the Crabtree-negative yeast microorganism is classified into a genus selected from the group consisting of Kluyveromyces, Pichia, Hansenula, Issatchenkia, and Candida. In additional embodiments, the Crabtree-negative yeast microorganism is selected from Kluyveromyces lactis, Kluyveromyces marxianus, Pichia anomala, Pichia stipitis, Hansenula anomala, Issatchenkia orientalis, Candida utilis and Kluyveromyces waltii.

[0046] In some embodiments, the recombinant microorganisms may be Crabtree-positive recombinant yeast microorganisms. In one embodiment, the Crabtree-positive yeast microorganism is classified into a genus selected from the group consisting of Saccharomyces, Kluyveromyces, Zygosaccharomyces, Debaryomyces, Candida, Pichia, and Schizosaccharomyces. In additional embodiments, the Crabtree-positive yeast microorganism is selected from the group consisting of Saccharomyces cerevisiae, Saccharomyces uvarum, Saccharomyces bayanus, Saccharomyces paradoxus, Saccharomyces castellii, Saccharomyces kluyveri, Kluyveromyces thermotolerans, Candida glabrata, Z. bailii, Z. rouxii, Debaryomyces hansenii, Pichia pastoris, and Schizosaccharomyces pombe.

[0047] In some embodiments, the recombinant microorganisms may be post-WGD (whole genome duplication) yeast recombinant microorganisms. In one embodiment, the post-WGD yeast recombinant microorganism is classified into a genus selected from the group consisting of Saccharomyces and Candida. In additional embodiments, the post-WGD yeast is selected from the group consisting of Saccharomyces cerevisiae, Saccharomyces uvarum, Saccharomyces bayanus, Saccharomyces paradoxus, Saccharomyces castellii, and Candida glabrata.

[0048] In some embodiments, the recombinant microorganisms may be pre-WGD (whole genome duplication) yeast recombinant microorganisms. In one embodiment, the pre-WGD yeast recombinant microorganism is classified into a genus selected from the group consisting of Saccharomyces, Kluyveromyces, Candida, Pichia, Debaryomyces, Hansenula, Pachysolen, Yarrowia, Issatchenkia, and Schizosaccharomyces. In additional embodiments, the pre-WGD yeast is selected from the group consisting of Saccharomyces kluyveri, Kluyveromyces thermotolerans, Kluyveromyces marxianus, Kluyveromyces waltii, Kluyveromyces lactis, Candida tropicalis, Pichia pastoris, Pichia anomala, Pichia stipitis, Debaryomyces hansenii, Hansenula anomala, Pachysolen tannophilus, Yarrowia lipolytica, Issatchenkia orientalis, and Schizosaccharomyces pombe.

[0049] In some embodiments, the recombinant microorganisms may be non-fermenting yeast microorganisms, including, but not limited to, those classified into a genus selected from the group consisting of Trichosporon, Rhodotorula, and Myxozyma.

[0050] In another aspect, the present invention provides methods of producing isobutanol using one or more recombinant microorganisms of the invention. In one embodiment, the method includes cultivating one or more recombinant microorganisms in a culture medium containing a feedstock providing the carbon source until a recoverable quantity of the isobutanol is produced and, optionally, recovering the isobutanol. In one embodiment, the microorganism is selected to produce isobutanol from a carbon source at a yield of at least about 5 percent theoretical. In another embodiment, the microorganism is selected to produce isobutanol at a yield of at least about 10 percent, at least about 15 percent, at least about 20 percent, at least about 25 percent, at least about 30 percent, at least about 35 percent, at least about 40 percent, at least about 45 percent, at least about 50 percent, at least about 55 percent, at least about 60 percent, at least about 65 percent, at least about 70 percent, at least about 75 percent, at least about 80 percent theoretical, at least about 85 percent theoretical, or at least about 90 percent theoretical.

[0051] In one embodiment, the microorganism produces isobutanol from a carbon source at a specific productivity of at least about 0.7 mg/L/hr per OD. In another embodiment, the microorganism produces isobutanol from a carbon source at a specific productivity of at least about 1 mg/L/hr per OD, at least about 10 mg/L/hr per OD, at least about 50 mg/L/hr per OD, at least about 100 mg/L/hr per OD, at least about 250 mg/L/hr per OD, or at least about 500 mg/L/hr per OD.

BRIEF DESCRIPTION OF DRAWINGS

[0052] Illustrative embodiments of the invention are shown in the drawings, in which:

[0053] FIG. 1 illustrates an exemplary embodiment of an isobutanol pathway.

[0054] FIG. 2 illustrates acetoin produced by GEVO1187 (no ALS), GEVO2280 (B. subtilis AlsS, not codon-optimized), GEVO2618 (B. subtilis AlsS), GEVO2621 (T. atroviride ALS), and GEVO2622 (T. stipitatus ALS). All acetoin values are normalized to OD600 and reported as mM/OD.

[0055] FIG. 3 illustrates the specific activity at pH 7.5 of KARI enzyme in whole cell lysates for GEVO1803 containing empty vector (pGV1102), ilv5ΔN47 (pGV1831), ilv5ΔN46 (pGV1901), full-length ILV5 (pGV1833), and E. coli ilvC codon-optimized for S. cerevisiae (pGV1824).

[0056] FIG. 4 illustrates the results from fermentations of GEVO2107 transformed with plasmids for expression of KARI and different DHAD homologs (shown in legend).

[0057] FIG. 5 illustrates a phylogenetic tree of 53 representative DHAD homologs following pairwise global alignments and progressive assembly of alignments using Neighbor-Joining phylogeny.

DETAILED DESCRIPTION

[0058] As used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a polynucleotide" includes a plurality of such polynucleotides and reference to "the microorganism" includes reference to one or more microorganisms, and so forth.

[0059] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this disclosure belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice of the disclosed methods and compositions, the exemplary methods, devices and materials are described herein.

[0060] Any publications discussed above and throughout the text are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior disclosure.

[0061] The term "microorganism" includes prokaryotic and eukaryotic microbial species from the Domains Archaea, Bacteria and Eucarya, the latter including yeast and filamentous fungi, protozoa, algae, or higher Protista. The terms "microbial cells" and "microbes" are used interchangeably with the term microorganism.

[0062] The term "genus" is defined as a taxonomic group of related species according to the Taxonomic Outline of Bacteria and Archaea (Garrity et al., 2007, TOBA Release 7.7, Michigan State University Board of Trustees).

[0063] The term "species" is defined as a collection of closely related organisms with greater than 97% 16S ribosomal RNA sequence homology and greater than 70% genomic hybridization and sufficiently different from all other organisms so as to be recognized as a distinct unit.
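The two numeric thresholds in this definition are conjunctive and strict ("greater than"). The hypothetical helper below, a minimal sketch assuming that "homology" is expressed as percent 16S rRNA sequence identity, checks only those two thresholds; the third criterion (being recognizably distinct from all other organisms) is not computable from two numbers and is left out.

```python
def meets_species_thresholds(rrna_identity_pct: float, dna_hybridization_pct: float) -> bool:
    """Hypothetical check of the two numeric criteria only.

    rrna_identity_pct: 16S rRNA sequence identity between two organisms, in percent.
    dna_hybridization_pct: genomic (DNA-DNA) hybridization, in percent.
    Both values must strictly exceed their thresholds.
    """
    return rrna_identity_pct > 97.0 and dna_hybridization_pct > 70.0


print(meets_species_thresholds(98.5, 75.0))  # True
print(meets_species_thresholds(97.0, 80.0))  # False: 97% is not "greater than 97%"
```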

[0064] The terms "recombinant microorganism," "modified microorganism" and "recombinant host cell" are used interchangeably herein and refer to microorganisms that have been genetically modified to express or over-express endogenous polynucleotides, or to express heterologous polynucleotides, such as those included in a vector, or which have an alteration in expression of an endogenous gene. By "alteration" it is meant that the expression of the gene, or level of an RNA molecule or equivalent RNA molecules encoding one or more polypeptides or polypeptide subunits, or activity of one or more polypeptides or polypeptide subunits is upregulated or downregulated, such that expression, level, or activity is greater than or less than that observed in the absence of the alteration. For example, the term "alter" can mean "inhibit," but the use of the word "alter" is not limited to this definition.

[0065] The term "expression" with respect to a gene sequence refers to transcription of the gene and, as appropriate, translation of the resulting mRNA transcript to a protein. Thus, as will be clear from the context, expression of a protein results from transcription and translation of the open reading frame sequence. The level of expression of a desired product in a host cell may be determined on the basis of either the amount of corresponding mRNA that is present in the cell, or the amount of the desired product encoded by the selected sequence. For example, mRNA transcribed from a selected sequence can be quantitated by qRT-PCR or by Northern hybridization (Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989)). Protein encoded by a selected sequence can be quantitated by various methods, e.g., by ELISA, by assaying for the biological activity of the protein, or by employing assays that are independent of such activity, such as western blotting or radioimmunoassay, using antibodies that recognize and bind the protein. The polynucleotide generally encodes a target enzyme involved in a metabolic pathway for producing a desired metabolite. It is understood that the terms "recombinant microorganism" and "recombinant host cell" refer not only to the particular recombinant microorganism but to the progeny or potential progeny of such a microorganism. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.

[0066] The term "wild-type microorganism" describes a cell that occurs in nature, i.e. a cell that has not been genetically modified. A wild-type microorganism can be genetically modified to express or overexpress a first target enzyme. This microorganism can act as a parental microorganism in the generation of a microorganism modified to express or overexpress a second target enzyme. In turn, the microorganism modified to express or overexpress a first and a second target enzyme can be modified to express or overexpress a third target enzyme.

[0067] Accordingly, a "parental microorganism" functions as a reference cell for successive genetic modification events. Each modification event can be accomplished by introducing a nucleic acid molecule into the reference cell. The introduction facilitates the expression or overexpression of a target enzyme. It is understood that the term "facilitates" encompasses the activation of endogenous polynucleotides encoding a target enzyme through genetic modification of, e.g., a promoter sequence in a parental microorganism. It is further understood that the term "facilitates" encompasses the introduction of heterologous polynucleotides encoding a target enzyme into a parental microorganism.

[0068] The term "engineer" refers to any manipulation of a microorganism that results in a detectable change in the microorganism, wherein the manipulation includes but is not limited to inserting a polynucleotide and/or polypeptide heterologous to the microorganism and mutating a polynucleotide and/or polypeptide native to the microorganism.

[0069] The term "mutation" as used herein indicates any modification of a nucleic acid and/or polypeptide which results in an altered nucleic acid or polypeptide. Mutations include, for example, point mutations, deletions, or insertions of single or multiple residues in a polynucleotide, which includes alterations arising within a protein-encoding region of a gene as well as alterations in regions outside of a protein-encoding sequence, such as, but not limited to, regulatory or promoter sequences. A genetic alteration may be a mutation of any type. For instance, the mutation may constitute a point mutation, a frame-shift mutation, an insertion, or a deletion of part or all of a gene. In addition, in some embodiments of the modified microorganism, a portion of the microorganism genome has been replaced with a heterologous polynucleotide. In some embodiments, the mutations are naturally-occurring. In other embodiments, the mutations are the result of artificial selection pressure. In still other embodiments, the mutations in the microorganism genome are the result of genetic engineering.

[0070] The term "biosynthetic pathway", also referred to as "metabolic pathway", refers to a set of anabolic or catabolic biochemical reactions for converting one chemical species into another. Gene products belong to the same "metabolic pathway" if they, in parallel or in series, act on the same substrate, produce the same product, or act on or produce a metabolic intermediate (i.e., metabolite) between the same substrate and metabolite end product.

[0071] The term "heterologous" as used herein with reference to molecules and in particular enzymes and polynucleotides, indicates molecules that are expressed in an organism other than the organism from which they originated or are found in nature, independently of the level of expression that can be lower, equal or higher than the level of expression of the molecule in the native microorganism.

[0072] On the other hand, the term "native" or "endogenous" as used herein with reference to molecules, and in particular enzymes and polynucleotides, indicates molecules that are expressed in the organism in which they originated or are found in nature, independently of the level of expression, which can be lower, equal, or higher than the level of expression of the molecule in the native microorganism. It is understood that expression of native enzymes or polynucleotides may be modified in recombinant microorganisms.

[0073] The term "feedstock" is defined as a raw material or mixture of raw materials supplied to a microorganism or fermentation process from which other products can be made. For example, a carbon source, such as biomass or the carbon compounds derived from biomass, is a feedstock for a microorganism that produces a biofuel in a fermentation process. However, a feedstock may contain nutrients other than a carbon source.

[0074] The term "substrate" or "suitable substrate" refers to any substance or compound that is converted or meant to be converted into another compound by the action of an enzyme. The term includes not only a single compound, but also combinations of compounds, such as solutions, mixtures and other materials which contain at least one substrate, or derivatives thereof. Further, the term "substrate" encompasses not only compounds that provide a carbon source suitable for use as a starting material, such as any biomass derived sugar, but also intermediate and end product metabolites used in a pathway associated with a recombinant microorganism as described herein.

[0075] The term "C2-compound", as used with reference to a carbon source for engineered yeast microorganisms having mutations in all pyruvate decarboxylase (PDC) genes that reduce pyruvate decarboxylase activity, refers to organic compounds containing two carbon atoms, including but not limited to ethanol and acetate.

[0076] The term "fermentation" or "fermentation process" is defined as a process in which a microorganism is cultivated in a culture medium containing raw materials, such as feedstock and nutrients, wherein the microorganism converts raw materials, such as a feedstock, into products.

[0077] The term "volumetric productivity" or "production rate" is defined as the amount of product formed per volume of medium per unit of time. Volumetric productivity is reported in grams per liter per hour (g/L/h).

[0078] The term "specific productivity" or "specific production rate" is defined as the amount of product formed per volume of medium per unit of time per amount of cells. Specific productivity is reported in grams or milligrams per liter per hour per OD (g/L/h/OD or mg/L/h/OD).
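The two productivity definitions differ only by the cell-density normalization. The sketch below assumes that the titer accumulated from zero over the stated time window and that a single OD value (e.g., OD600) represents the cell amount; the run values are hypothetical. Reporting in mg/L/h/OD matches the units used for the specific productivity thresholds in paragraph [0051].

```python
def volumetric_productivity(titer_g_per_l: float, hours: float) -> float:
    """Product formed per volume of medium per unit time, in g/L/h."""
    if hours <= 0:
        raise ValueError("fermentation time must be positive")
    return titer_g_per_l / hours


def specific_productivity(titer_g_per_l: float, hours: float, od: float) -> float:
    """Volumetric productivity normalized to cell density, in g/L/h/OD."""
    if od <= 0:
        raise ValueError("OD must be positive")
    return volumetric_productivity(titer_g_per_l, hours) / od


# Hypothetical run: 2.4 g/L isobutanol after 48 h at an OD of 6.
vp = volumetric_productivity(2.4, 48.0)
sp = specific_productivity(2.4, 48.0, 6.0)
print(f"{vp:.3f} g/L/h")             # 0.050 g/L/h
print(f"{sp * 1000:.2f} mg/L/h/OD")  # 8.33 mg/L/h/OD
```

Whether to normalize by final OD or by a time-averaged OD is a design choice the definition leaves open; the sketch uses a single value for simplicity.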

[0079] The term "yield" is defined as the amount of product obtained per unit weight of raw material and may be expressed as g product per g substrate (g/g). Yield may be expressed as a percentage of the theoretical yield. "Theoretical yield" is defined as the maximum amount of product that can be generated from a given amount of substrate, as dictated by the stoichiometry of the metabolic pathway used to make the product. For example, the theoretical yield for one typical conversion of glucose to isobutanol is 0.41 g/g. As such, a yield of isobutanol from glucose of 0.39 g/g would be expressed as 95% of theoretical or 95% theoretical yield.
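The 0.41 g/g figure can be reproduced from molar masses, assuming the overall conversion C6H12O6 → C4H10O + 2 CO2 + H2O (one mole of isobutanol per mole of glucose). The sketch below checks that the assumed equation balances, converts the molar ratio to a mass ratio, and expresses the 0.39 g/g example as a percentage of theoretical. Average atomic masses are standard values; the helper names are illustrative.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # average atomic masses, g/mol

GLUCOSE = {"C": 6, "H": 12, "O": 6}
ISOBUTANOL = {"C": 4, "H": 10, "O": 1}
CO2 = {"C": 1, "O": 2}
WATER = {"H": 2, "O": 1}


def molar_mass(formula: dict) -> float:
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def atom_totals(side: list) -> dict:
    """Sum atoms over (coefficient, formula) pairs on one side of a reaction."""
    totals = {}
    for coeff, formula in side:
        for element, count in formula.items():
            totals[element] = totals.get(element, 0) + coeff * count
    return totals


# Assumed overall stoichiometry: C6H12O6 -> C4H10O + 2 CO2 + H2O
reactants = [(1, GLUCOSE)]
products = [(1, ISOBUTANOL), (2, CO2), (1, WATER)]
assert atom_totals(reactants) == atom_totals(products), "equation is not balanced"

# 1 mol isobutanol per mol glucose, converted to a mass ratio (g/g).
theoretical_g_per_g = molar_mass(ISOBUTANOL) / molar_mass(GLUCOSE)


def percent_of_theoretical(observed_g_per_g: float) -> float:
    return 100.0 * observed_g_per_g / theoretical_g_per_g


print(f"{theoretical_g_per_g:.3f} g/g")                    # 0.411 g/g
print(f"{percent_of_theoretical(0.39):.0f}% theoretical")  # 95% theoretical
```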

[0080] The term "titer" is defined as the strength of a solution or the concentration of a substance in solution. For example, the titer of a biofuel in a fermentation broth is described as g of biofuel in solution per liter of fermentation broth (g/L).

[0081] "Aerobic conditions" are defined as conditions under which the oxygen concentration in the fermentation medium is sufficiently high for an aerobic or facultative anaerobic microorganism to use as a terminal electron acceptor.

[0082] In contrast, "anaerobic conditions" are defined as conditions under which the oxygen concentration in the fermentation medium is too low for the microorganism to use as a terminal electron acceptor. Anaerobic conditions may be achieved by sparging a fermentation medium with an inert gas such as nitrogen until oxygen is no longer available to the microorganism as a terminal electron acceptor. Alternatively, anaerobic conditions may be achieved by the microorganism consuming the available oxygen of the fermentation until oxygen is unavailable to the microorganism as a terminal electron acceptor.

[0083] "Aerobic metabolism" refers to a biochemical process in which oxygen is used as a terminal electron acceptor to make energy, typically in the form of ATP, from carbohydrates. Aerobic metabolism occurs, e.g., via glycolysis and the TCA cycle, wherein a single glucose molecule is metabolized completely into carbon dioxide in the presence of oxygen.

[0084] In contrast, "anaerobic metabolism" refers to a biochemical process in which oxygen is not the final acceptor of electrons contained in NADH. Anaerobic metabolism can be divided into anaerobic respiration, in which compounds other than oxygen serve as the terminal electron acceptor, and substrate level phosphorylation, in which the electrons from NADH are utilized to generate a reduced product via a "fermentative pathway."

[0085] In "fermentative pathways", NAD(P)H donates its electrons to a molecule produced by the same metabolic pathway that produced the electrons carried in NAD(P)H. For example, in one of the fermentative pathways of certain yeast strains, NADH generated through glycolysis transfers its electrons to acetaldehyde, formed from pyruvate by decarboxylation, yielding ethanol. Fermentative pathways are usually active under anaerobic conditions but may also occur under aerobic conditions when NADH is not fully oxidized via the respiratory chain. For example, above certain glucose concentrations, Crabtree-positive yeasts produce large amounts of ethanol under aerobic conditions.

[0086] The term "byproduct" means an undesired product related to the production of a biofuel or biofuel precursor. Byproducts are generally disposed of as waste, adding cost to a production process.

[0087] The term "non-fermenting yeast" refers to a yeast species that fails to demonstrate an anaerobic metabolism in which the electrons from NADH are utilized to generate a reduced product via a fermentative pathway, such as the production of ethanol and CO2 from glucose. Non-fermenting yeast can be identified by the "Durham Tube Test" (J. A. Barnett, R. W. Payne, and D. Yarrow. 2000. Yeasts: Characteristics and Identification. 3rd edition. pp. 28-29. Cambridge University Press, Cambridge, UK) or by monitoring the production of fermentation products such as ethanol and CO2.

[0088] The term "polynucleotide" is used herein interchangeably with the term "nucleic acid" and refers to an organic polymer composed of two or more monomers including nucleotides, nucleosides or analogs thereof, including but not limited to single stranded or double stranded, sense or antisense deoxyribonucleic acid (DNA) of any length and, where appropriate, single stranded or double stranded, sense or antisense ribonucleic acid (RNA) of any length, including siRNA. The term "nucleotide" refers to any of several compounds that consist of a ribose or deoxyribose sugar joined to a purine or a pyrimidine base and to a phosphate group, and that are the basic structural units of nucleic acids. The term "nucleoside" refers to a compound (such as guanosine or adenosine) that consists of a purine or pyrimidine base combined with deoxyribose or ribose and is found especially in nucleic acids. The term "nucleotide analog" or "nucleoside analog" refers, respectively, to a nucleotide or nucleoside in which one or more individual atoms have been replaced with a different atom or with a different functional group. Accordingly, the term polynucleotide includes nucleic acids of any length, DNA, RNA, analogs and fragments thereof. A polynucleotide of three or more nucleotides is also called a nucleotidic oligomer or oligonucleotide.

[0089] It is understood that the polynucleotides described herein include "genes" and that the nucleic acid molecules described herein include "vectors" or "plasmids." Accordingly, the term "gene", also called a "structural gene", refers to a polynucleotide that codes for a particular sequence of amino acids, which comprise all or part of one or more proteins or enzymes, and may include regulatory (non-transcribed) DNA sequences, such as promoter sequences, which determine for example the conditions under which the gene is expressed. The transcribed region of the gene may include untranslated regions, including introns, 5'-untranslated region (UTR), and 3'-UTR, as well as the coding sequence.

[0090] The term "operon" refers to two or more genes which are transcribed as a single transcriptional unit from a common promoter. In some embodiments, the genes comprising the operon are contiguous genes. It is understood that transcription of an entire operon can be modified (i.e., increased, decreased, or eliminated) by modifying the common promoter. Alternatively, any gene or combination of genes in an operon can be modified to alter the function or activity of the encoded polypeptide. The modification can result in an increase in the activity of the encoded polypeptide. Further, the modification can impart new activities on the encoded polypeptide. Exemplary new activities include the use of alternative substrates and/or the ability to function in alternative environmental conditions.

[0091] A "vector" is any means by which a nucleic acid can be propagated and/or transferred between organisms, cells, or cellular components. Vectors include viruses, bacteriophage, pro-viruses, plasmids, phagemids, transposons, and artificial chromosomes such as YACs (yeast artificial chromosomes), BACs (bacterial artificial chromosomes), and PLACs (plant artificial chromosomes), and the like, that are "episomes," that is, that replicate autonomously or can integrate into a chromosome of a host cell. A vector can also be a naked RNA polynucleotide, a naked DNA polynucleotide, a polynucleotide composed of both DNA and RNA within the same strand, a poly-lysine-conjugated DNA or RNA, a peptide-conjugated DNA or RNA, a liposome-conjugated DNA, or the like, that are not episomal in nature, or it can be an organism which comprises one or more of the above polynucleotide constructs such as an agrobacterium or a bacterium.

[0092] "Transformation" refers to the process by which a vector is introduced into a host cell. Transformation (or transduction, or transfection) can be achieved by any one of a number of means including chemical transformation (e.g., lithium acetate transformation), electroporation, microinjection, biolistics (or particle bombardment-mediated delivery), or agrobacterium mediated transformation.

[0093] The term "enzyme" as used herein refers to any substance that catalyzes or promotes one or more chemical or biochemical reactions, which usually includes enzymes totally or partially composed of a polypeptide, but can include enzymes composed of a different molecule including polynucleotides.

[0094] The term "protein", "peptide" or "polypeptide" as used herein indicates an organic polymer composed of two or more amino acidic monomers and/or analogs thereof. As used herein, the term "amino acid" or "amino acidic monomer" refers to any natural and/or synthetic amino acids including glycine and both D and L optical isomers. The term "amino acid analog" refers to an amino acid in which one or more individual atoms have been replaced, either with a different atom, or with a different functional group. Accordingly, the term polypeptide includes amino acidic polymers of any length, including full length proteins and peptides, as well as analogs and fragments thereof. A polypeptide of three or more amino acids is also called a protein oligomer or oligopeptide.

[0095] The term "homolog", used with respect to an original enzyme or gene of a first family or species, refers to distinct enzymes or genes of a second family or species which are determined by functional, structural or genomic analyses to be an enzyme or gene of the second family or species which corresponds to the original enzyme or gene of the first family or species. Most often, homologs will have functional, structural or genomic similarities. Techniques are known by which homologs of an enzyme or gene can readily be cloned using genetic probes and PCR. Identity of cloned sequences as homologs can be confirmed using functional assays and/or by genomic mapping of the genes.

[0096] A protein has "homology" or is "homologous" to a second protein if the amino acid sequence encoded by its gene is similar to the amino acid sequence encoded by the gene of the second protein. Alternatively, a protein has homology to a second protein if the two proteins have "similar" amino acid sequences. (Thus, the term "homologous proteins" is defined to mean that the two proteins have similar amino acid sequences.)

[0097] The term "analog" or "analogous" refers to nucleic acid or protein sequences or protein structures that are related to one another in function only and are not from common descent or do not share a common ancestral sequence. Analogs may differ in sequence but may share a similar structure, due to convergent evolution. For example, two enzymes are analogs or analogous if the enzymes catalyze the same reaction of conversion of a substrate to a product, are unrelated in sequence, and irrespective of whether the two enzymes are related in structure.

Cytosolically Localized Isobutanol Pathway Enzymes and Recombinant Microorganisms Comprising the Same

[0098] Biosynthetic pathways for the production of isobutanol and 2-methyl-1-butanol by recombinant microorganisms are described by Atsumi et al. (Atsumi et al., 2008, Nature 451: 86-89). One strategy described herein for improving isobutanol production by recombinant microorganisms is the localization of the enzymes catalyzing the isobutanol biosynthetic pathway to the yeast cytosol. Cytosolic localization of isobutanol pathway enzyme activity is desirable because the ideal biocatalyst (e.g., a recombinant microorganism) will have the entire isobutanol pathway functionally expressed in the same compartment, preferably the cytosol. In addition, this localization allows the pathway to utilize pyruvate and NAD(P)H generated in the cytosol by glycolysis and/or the pentose phosphate pathway without the need to transfer these metabolites to an alternative compartment (i.e., the mitochondria). However, such a compartmental localization strategy in yeast is not feasible unless the pathway enzymes are active in that compartment. Thus, if one or more of the cytosolically localized pathway enzymes lacks catalytic activity in the cytosol, high-level isobutanol production will not occur. As the present application shows in the Examples below, inefficient cytosolic activity of one or more isobutanol pathway enzymes (e.g., DHAD or ALS) can limit isobutanol production.

[0099] The present inventors describe herein cytosolically active isobutanol pathway enzymes and their use in the production of various beneficial metabolites, such as isobutanol and 2-methyl-1-butanol. Using a combination of genetic selection and biochemical analyses, the present inventors have identified a number of isobutanol pathway enzymes, including DHAD enzymes, that have activity in the cytosol. Accordingly, in one aspect, the present application describes the discovery of DHADs with enhanced cytosolic activity and shows that these newly identified, cytosolically active DHADs facilitate improved isobutanol production when co-expressed in the cytosol with the remaining four isobutanol pathway enzymes.

[0100] As shown in Example 3 below, the native DHAD of yeast is localized to the mitochondria. Therefore, for economically viable production of isobutanol in the yeast cytosol, it is important to identify heterologous DHAD enzymes that are "cytosolically active" in yeast (i.e., "active in the cytosol" of the yeast). In addition, the present application shows that economically viable isobutanol production will not occur in the absence of ALS, KARI, KIVD, and ADH enzymes that are cytosolically active in yeast, which makes the identification of native and/or heterologous ALS, KARI, KIVD, and ADH enzymes additionally and/or independently important to cytosolic isobutanol production.
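The requirement that all five enzymes be active in one compartment can be pictured as a chain in which any step without cytosolic activity breaks cytosolic flux. The sketch below encodes the five-step pyruvate-to-isobutanol pathway described by Atsumi et al. (2008), omitting cofactors and the CO2 released by ALS and KIVD, and lists which steps a strain lacks in the cytosol. The strain's enzyme set is hypothetical, and the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    enzyme: str
    substrate: str
    product: str


# Five-step pyruvate-to-isobutanol pathway; cofactors and CO2 omitted.
ISOBUTANOL_PATHWAY = (
    Step("ALS", "pyruvate", "2-acetolactate"),
    Step("KARI", "2-acetolactate", "2,3-dihydroxyisovalerate"),
    Step("DHAD", "2,3-dihydroxyisovalerate", "2-ketoisovalerate"),
    Step("KIVD", "2-ketoisovalerate", "isobutyraldehyde"),
    Step("ADH", "isobutyraldehyde", "isobutanol"),
)


def is_contiguous(pathway) -> bool:
    """Each step's product must be the next step's substrate."""
    return all(a.product == b.substrate for a, b in zip(pathway, pathway[1:]))


def steps_lacking_cytosolic_activity(cytosolic_enzymes: set) -> list:
    """Enzymes, in pathway order, with no cytosolically active version in a strain."""
    return [s.enzyme for s in ISOBUTANOL_PATHWAY if s.enzyme not in cytosolic_enzymes]


# Hypothetical strain relying on the native, mitochondrial DHAD.
print(is_contiguous(ISOBUTANOL_PATHWAY))                                 # True
print(steps_lacking_cytosolic_activity({"ALS", "KARI", "KIVD", "ADH"}))  # ['DHAD']
```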

[0101] As used herein, the term "cytosolically active" or "active in the cytosol" means the enzyme exhibits enzymatic activity in the cytosol of a eukaryotic organism. Cytosolically active enzymes may further be additionally and/or independently characterized as enzymes that generally exhibit a specific cytosolic activity which is greater than the specific mitochondrial activity. In certain respects, cytosolically active enzymes of the present invention exhibit a ratio of the specific activity of the mitochondrial fraction over the specific activity of the whole cell fraction of less than 1, as determined by the method disclosed in Example 3 herein. Cytosolically active enzymes may further be additionally and/or independently characterized as enzymes that, when overexpressed, result in increased activity in the whole cell fraction and do not result in increased activity in the mitochondrial fraction, as determined by the method disclosed in Example 20. Cytosolically active enzymes may further be additionally and/or independently characterized as enzymes that, when overexpressed as one of the five enzymes that together comprise the five-step biosynthetic pathway for the conversion of pyruvate to isobutanol, result in increased isobutanol production compared to enzymes that are not cytosolically active or that are less cytosolically active.
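The ratio criterion reduces to one division and a comparison. The reasoning, as a sketch rather than the Example 3 protocol itself: an enzyme housed in mitochondria is enriched in the mitochondrial fraction relative to the whole-cell lysate (ratio above 1), while a cytosolic enzyme is depleted from it (ratio below 1). The measurement values and the unit (activity units per mg protein) are hypothetical assumptions.

```python
def mito_over_whole_cell(mito_specific_activity: float, whole_cell_specific_activity: float) -> float:
    """Specific activity (e.g., U per mg protein) of the mitochondrial fraction
    divided by that of the whole-cell fraction."""
    if whole_cell_specific_activity <= 0:
        raise ValueError("whole-cell specific activity must be positive")
    return mito_specific_activity / whole_cell_specific_activity


def passes_ratio_criterion(mito: float, whole_cell: float) -> bool:
    """Ratio criterion of paragraph [0101]: mitochondrial/whole-cell ratio below 1."""
    return mito_over_whole_cell(mito, whole_cell) < 1.0


# Hypothetical measurements (U/mg protein).
print(passes_ratio_criterion(0.02, 0.10))  # True: depleted from the mitochondrial fraction
print(passes_ratio_criterion(0.50, 0.10))  # False: enriched in the mitochondrial fraction
```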

[0102] As used herein, the term "cytosolically localized" or "cytosolic localization" means the enzyme is localized in the cytosol of a eukaryotic organism. Cytosolically localized enzymes may further be additionally and/or independently characterized as enzymes that exhibit a cytosolic protein level which is greater than the mitochondrial protein level.

Identification of Cytosolically Active Isobutanol Pathway Enzymes

[0103] In one aspect, the present invention encompasses a number of strategies for identifying isobutanol pathway enzymes that exhibit cytosolic activity and/or cytosolic localization, as well as methods for modifying said isobutanol pathway enzymes to increase their cytosolic activity and/or cytosolic localization.

[0104] In various embodiments described herein, the isobutanol pathway enzymes may be derived from a prokaryotic organism. In alternative embodiments described herein, the isobutanol pathway enzyme may be derived from a eukaryotic organism. In one embodiment, the eukaryotic organism is a fungal organism. As described herein, the present inventors have found that in general, an enzyme from a fungal source is more likely to show activity in yeast than a bacterial enzyme expressed in yeast. In addition, homologs that are normally expressed in the cytosol are desired, as a normally cytoplasmic enzyme is likely to show higher activity in the cytosol as compared to an enzyme that is relocalized to the cytosol from other organelles, such as the mitochondria. Fungal homologs of various isobutanol pathway enzymes are often localized to the mitochondria. The present inventors have found that fungal homologs of isobutanol pathway enzymes that are cytosolically localized will generally be expected to exhibit higher activity in the cytosol of yeast than those of wild-type yeast strains. Thus, in one embodiment, the present invention provides fungal isobutanol pathway enzyme homologs that are cytosolically active and/or cytosolically localized.

Dihydroxyacid Dehydratase (DHAD)

[0105] In additional embodiments, at least one of the isobutanol pathway enzymes exhibiting cytosolic activity is a dihydroxyacid dehydratase (DHAD). In accordance with this embodiment, the present invention provides cytosolically active dihydroxyacid dehydratases (DHADs) and further describes methods for their use in the production of various beneficial metabolites, such as isobutanol and 2-methyl-1-butanol. As noted above, biosynthetic pathways for the production of isobutanol and 2-methyl-1-butanol have been described (Atsumi et al., 2008, Nature 451: 86-89). In these biosynthetic pathways, DHAD catalyzes the conversion of 2,3-dihydroxyisovalerate to 2-ketoisovalerate and of 2,3-dihydroxy-3-methylvalerate to 2-keto-3-methylvalerate, respectively. Using a combination of genetic selection and biochemical analyses, the present inventors have identified a number of DHAD homologs that have activity in the cytosol.

[0106] Among the many strategies for identifying cytosolically active DHADs, the present inventors performed multiway protein alignments between several DHAD homologs. Using this analysis, the present inventors identified a protein motif that was surprisingly unique to a subset of DHAD homologs exhibiting cytosolic activity. This protein motif, P(I/L)XXXGX(I/L)XIL (SEQ ID NO: 27), was found in DHAD homologs demonstrating cytosolic activity in yeast. Therefore, in one embodiment, the present invention provides DHAD enzymes comprising the amino acid sequence P(I/L)XXXGX(I/L)XIL (SEQ ID NO: 27), wherein X is any natural or non-natural amino acid, and wherein said DHAD enzyme exhibits the ability to convert 2,3-dihydroxyisovalerate to ketoisovalerate in the cytosol. DHAD enzymes harboring this sequence include those derived from L. lactis (SEQ ID NO: 18), G. forsetii (SEQ ID NO: 17), Acidobacteria bacterium Ellin345 (SEQ ID NO: 16), Saccharopolyspora erythraea (SEQ ID NO: 19), Yarrowia lipolytica (SEQ ID NO: 13), Francisella tularensis (SEQ ID NO: 14), Arabidopsis thaliana (SEQ ID NO: 15), Thermotoga petrophila (SEQ ID NO: 10), and Victivallis vadensis (SEQ ID NO: 11). Also encompassed herein are DHAD enzymes that comprise a motif that is at least about 70% similar, at least about 80% similar, or at least about 90% similar to the motif shown in SEQ ID NO: 27.
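As an illustration only (the inventors identified the motif by multiway protein alignment, not by the code below), the following Python sketch translates P(I/L)XXXGX(I/L)XIL (SEQ ID NO: 27) into a regular expression, treating each X as any single residue, and reports every occurrence, including overlapping ones, in a protein sequence given in one-letter code. The function name and the toy sequence are hypothetical; the toy sequence does not correspond to any SEQ ID NO.

```python
import re

# P(I/L)XXXGX(I/L)XIL (SEQ ID NO: 27) as a regular expression:
# P, I or L, any three residues, G, any residue, I or L, any residue, then I, L.
# The lookahead wrapper lets overlapping occurrences be reported.
MOTIF_27 = re.compile(r"(?=(P[IL].{3}G.[IL].IL))")


def find_motif(sequence: str, pattern: re.Pattern = MOTIF_27) -> list:
    """Return (1-based start, matched peptide) for every occurrence of the motif."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    toy = "MKTAYPIKAAGLLAILQRS"  # hypothetical sequence, not a real DHAD
    print(find_motif(toy))  # [(6, 'PIKAAGLLAIL')]
```

An exact pattern match of this kind only flags candidates; whether a hit actually has cytosolic DHAD activity still has to be established experimentally, as the text describes.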

[0107] As described herein, an even more specific version of this motif has been identified by the present inventors. Thus, in a further embodiment, the present invention provides DHAD enzymes comprising the amino acid sequence PIKXXGX(I/L)XIL (SEQ ID NO: 28), wherein X is any natural or non-natural amino acid, and wherein said DHAD enzyme exhibits the ability to convert 2,3-dihydroxyisovalerate to ketoisovalerate in the cytosol. DHAD enzymes harboring this sequence include those derived from L. lactis (SEQ ID NO: 18), G. forsetii (SEQ ID NO: 17), Acidobacteria bacterium Ellin345 (SEQ ID NO: 16), Y. lipolytica (SEQ ID NO: 13), F. tularensis (SEQ ID NO: 14), A. thaliana (SEQ ID NO: 15), T. petrophila (SEQ ID NO: 10), and V. vadensis (SEQ ID NO: 11). Also encompassed herein are DHAD enzymes that comprise a motif that is at least about 70% similar, at least about 80% similar, or at least about 90% similar to the motif shown in SEQ ID NO: 28.
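The text does not define how "percent similar to the motif" is calculated. Under one hypothetical interpretation, the sketch below scores every 11-residue window by the fraction of constrained (non-X) motif positions it satisfies. A window that matches SEQ ID NO: 28 exactly scores 1.0, and a single substitution at one of its seven constrained positions scores 6/7 (about 86%). The scoring rule, the function names, and the toy sequence are all assumptions made for illustration.

```python
# Each motif position lists the allowed residues; None marks an X (any residue).
MOTIF_27 = ["P", "IL", None, None, None, "G", None, "IL", None, "I", "L"]
MOTIF_28 = ["P", "I", "K", None, None, "G", None, "IL", None, "I", "L"]


def window_score(window: str, motif: list) -> float:
    """Fraction of constrained (non-X) motif positions satisfied by one window."""
    constrained = [(aa, allowed) for aa, allowed in zip(window, motif) if allowed is not None]
    hits = sum(1 for aa, allowed in constrained if aa in allowed)
    return hits / len(constrained)


def best_window(sequence: str, motif: list) -> tuple:
    """Return (1-based start, score) of the highest-scoring window; (0, 0.0) if none."""
    seq = sequence.upper()
    k = len(motif)
    best = (0, 0.0)
    for i in range(len(seq) - k + 1):
        score = window_score(seq[i:i + k], motif)
        if score > best[1]:
            best = (i + 1, score)
    return best


if __name__ == "__main__":
    # Hypothetical I->V variant of a toy sequence, not real data.
    start, score = best_window("MKTAYPVKAAGLLAILQRS", MOTIF_28)
    print(start, round(score, 3), score >= 0.70)  # 6 0.857 True
```

A substitution-matrix-based similarity score (for example, one using BLOSUM62) would be an equally plausible reading of "similar." This sketch uses the simpler positional rule only because it is easy to check by hand.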

[0108] As noted above, one such cytosolically active DHAD identified herein is exemplified by the L. lactis DHAD amino acid sequence of SEQ ID NO: 18, which is encoded by the L. lactis ilvD gene. As described herein, the present inventors have discovered that yeast strains expressing the cytosolically active L. lactis ilvD (DHAD) exhibit higher isobutanol production than yeast strains expressing the S. cerevisiae ILV3 (DHAD), even when the ILV3 from S. cerevisiae is truncated at its N-terminus to remove a putative mitochondrial targeting sequence. In addition to the use and identification of the cytosolically active DHAD homolog from L. lactis, the present invention encompasses a number of different strategies for identifying DHAD enzymes that exhibit cytosolic activity and/or cytosolic localization, as well as methods for modifying DHADs to increase their ability to exhibit cytosolic activity and/or cytosolic localization.

[0109] In various embodiments described herein, the DHAD enzymes may be derived from a prokaryotic organism. In one embodiment, the prokaryotic organism is a bacterial organism. In another embodiment, the bacterial organism is L. lactis. In a specific embodiment, the DHAD enzyme from L. lactis comprises the amino acid sequence of SEQ ID NO: 18. In other embodiments, the bacterial organism belongs to a genus selected from Lactococcus, Gramella, Acidobacteria, Francisella, Thermotoga, and Victivallis.

[0110] In alternative embodiments, the DHAD enzyme may be derived from a eukaryotic organism. In one embodiment, the eukaryotic organism is a fungal organism. In an exemplary embodiment, the fungal organism is Neurospora crassa. In a specific embodiment, the DHAD enzyme from N. crassa comprises the amino acid sequence of SEQ ID NO: 165.

[0111] As described herein, the present inventors have found that, in general, an enzyme from a fungal source is more likely to show activity in yeast than a bacterial enzyme expressed in yeast. In addition, homologs that are normally expressed in the cytosol are desired, as a normally cytoplasmic enzyme is likely to show higher activity in the cytosol than an enzyme that has been relocalized to the cytosol from another organelle, such as the mitochondria. Fungal homologs of various isobutanol pathway enzymes, including DHAD, are often localized to the mitochondria. The present inventors have found that cytosolically localized fungal homologs of DHAD will generally be expected to exhibit higher activity in the yeast cytosol than the corresponding DHAD of wild-type yeast strains. Thus, in one embodiment, the present invention provides fungal DHAD homologs that are cytosolically active and/or cytosolically localized.

[0112] In another embodiment, the eukaryotic organism is a yeast organism. In another embodiment, the eukaryotic organism is selected from the group consisting of the genera Entamoeba and Giardia.

[0113] In various embodiments described herein, the recombinant microorganism may exhibit at least about 5 percent greater dihydroxyacid dehydratase (DHAD) activity in the cytosol as compared to the parental microorganism. In another embodiment, the recombinant microorganism may exhibit at least about 10 percent, at least about 15 percent, at least about 20 percent, at least about 25 percent, at least about 30 percent, at least about 35 percent, at least about 40 percent, at least about 45 percent, at least about 50 percent, at least about 55 percent, at least about 60 percent, at least about 65 percent, at least about 70 percent, at least about 75 percent, at least about 80 percent, at least about 100 percent, at least about 200 percent, or at least about 500 percent greater dihydroxyacid dehydratase (DHAD) activity in the cytosol as compared to the parental microorganism.
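The percentage comparisons recited here, and the parallel ones given later for ALS, KARI, KIVD, and ADH, amount to a relative increase over the parental strain. The minimal Python sketch below computes that increase and reports the largest recited threshold it meets. It assumes that "about" is treated as an exact cutoff and that both activities are measured in the same units. The function names and the activity values are hypothetical, not measured data.

```python
# Percent-increase thresholds recited in the text, relative to the parental strain.
THRESHOLDS = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 100, 200, 500]


def percent_increase(recombinant: float, parental: float) -> float:
    """Percent by which recombinant cytosolic activity exceeds parental activity."""
    if parental <= 0:
        raise ValueError("parental activity must be positive")
    return (recombinant - parental) / parental * 100.0


def highest_threshold_met(recombinant: float, parental: float):
    """Largest recited threshold met, or None if the increase is below 5 percent."""
    gain = percent_increase(recombinant, parental)
    met = [t for t in THRESHOLDS if gain >= t]
    return max(met) if met else None


if __name__ == "__main__":
    # Hypothetical specific activities in arbitrary units.
    print(percent_increase(5.0, 2.0), highest_threshold_met(5.0, 2.0))  # 150.0 100
```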

[0114] In another embodiment, the present invention provides DHAD enzymes that, when overexpressed in yeast, result in increased activity in the whole cell fraction and do not result in increased activity in the mitochondrial fraction. In one embodiment, the DHAD activity in the whole cell fraction is increased by at least about 2-fold. In another embodiment, DHAD activity in the whole cell fraction is increased by at least about 5-fold. In yet another embodiment, DHAD activity in the whole cell fraction is increased by at least about 7-fold. In yet another embodiment, DHAD activity in the whole cell fraction is increased by at least about 10-fold. In yet another embodiment, DHAD activity in the whole cell fraction is increased by at least about 50-fold. In yet another embodiment, DHAD activity in the whole cell fraction is increased by at least about 100-fold.
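The criterion above combines two measurements: whole-cell DHAD activity should increase by some fold over a control, while activity in the mitochondrial fraction should not increase. The Python sketch below expresses that test. The `mito_max_fold` tolerance is an assumption, since real fractionation data are noisy and the text does not specify how "no increase" is judged. All names and numbers are hypothetical.

```python
FOLD_THRESHOLDS = [2, 5, 7, 10, 50, 100]  # whole-cell fold increases recited in the text


def fold_change(overexpressed: float, control: float) -> float:
    """Ratio of activity in the overexpressing strain to the control strain."""
    if control <= 0:
        raise ValueError("control activity must be positive")
    return overexpressed / control


def cytosolic_pattern(wc_over: float, wc_ctrl: float, mito_over: float, mito_ctrl: float,
                      min_fold: float = 2.0, mito_max_fold: float = 1.0) -> bool:
    """True if whole-cell activity rises at least min_fold while mitochondrial
    activity does not rise above mito_max_fold (an assumed tolerance)."""
    return (fold_change(wc_over, wc_ctrl) >= min_fold
            and fold_change(mito_over, mito_ctrl) <= mito_max_fold)


def fold_thresholds_met(wc_over: float, wc_ctrl: float) -> list:
    """Recited whole-cell fold thresholds satisfied by the measured increase."""
    fc = fold_change(wc_over, wc_ctrl)
    return [t for t in FOLD_THRESHOLDS if fc >= t]


if __name__ == "__main__":
    # Hypothetical activities: whole-cell 2.0 -> 14.0, mitochondrial unchanged at 1.0.
    print(cytosolic_pattern(14.0, 2.0, 1.0, 1.0), fold_thresholds_met(14.0, 2.0))
```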

Acetolactate Synthase (ALS)

[0115] As described herein, the isobutanol pathway enzymes in addition to DHAD should preferably be active in the cytosol. These cytosolically active isobutanol pathway enzymes will generally exhibit enzymatic activity in the cytosol. For instance, a cytosolically active ALS should generally exhibit the ability to convert two molecules of pyruvate to acetolactate in the cytosol. Thus, in various embodiments described herein, at least one of the isobutanol pathway enzymes exhibiting cytosolic activity is acetolactate synthase (ALS). In yeasts such as S. cerevisiae, the native acetolactate synthase, encoded in S. cerevisiae by the ILV2 gene, is naturally expressed in the yeast mitochondria. Unlike the endogenous acetolactate synthase of yeast, heterologous acetolactate synthases such as the B. subtilis alsS and the L. lactis alsS are expressed in the yeast cytosol (i.e., they are cytosolically localized). Thus, cytosolic expression of acetolactate synthase can be achieved by transforming a yeast with a gene encoding an acetolactate synthase protein (EC 2.2.1.6).

[0116] ALS homologs that could be cytosolically expressed and localized in yeast are predicted to lack a mitochondrial targeting sequence, as analyzed using MitoProt (Claros et al., 1996, Eur. J. Biochem 241: 779-86). Such cytosolically localized ALS proteins can be used as the first step in the isobutanol pathway. ALS homologs include, but are not limited to, the following: the Serratia marcescens ALS (GenBank Accession No. ADH43113.1) (probability of mitochondrial localization 0.07), the Enterococcus faecalis ALS (GenBank Accession No. NP_814940) (probability of mitochondrial localization 0.21), the Leuconostoc mesenteroides ALS (GenBank Accession No. YP_818010.1) (probability of mitochondrial localization 0.21), the Staphylococcus aureus ALS (GenBank Accession No. YP_417545) (probability of mitochondrial localization 0.13), the Burkholderia cenocepacia ALS (GenBank Accession No. YP_624435) (probability of mitochondrial localization 0.15), the T. atroviride ALS (SEQ ID NO: 71) (probability of mitochondrial localization 0.19), the T. stipitatus ALS (SEQ ID NO: 72) (probability of mitochondrial localization 0.19), and the Magnaporthe grisea ALS (GenBank Accession No. EDJ99221) (probability of mitochondrial localization 0.02), as well as a homolog or variant of any of the foregoing and a polypeptide having at least 60% identity to any one of the foregoing and exhibiting cytosolic ALS activity.
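The MitoProt probabilities listed above can be used to rank candidate ALS homologs. In the Python sketch below, those values are stored exactly as given in the text and the candidates are filtered by a cutoff. The 0.5 default cutoff is an assumption: the text does not state the threshold used to call a protein non-mitochondrial. Running MitoProt itself is outside the scope of this sketch.

```python
# MitoProt probabilities of mitochondrial localization, as listed in [0116].
ALS_MITO_PROBABILITY = {
    "Serratia marcescens (ADH43113.1)": 0.07,
    "Enterococcus faecalis (NP_814940)": 0.21,
    "Leuconostoc mesenteroides (YP_818010.1)": 0.21,
    "Staphylococcus aureus (YP_417545)": 0.13,
    "Burkholderia cenocepacia (YP_624435)": 0.15,
    "T. atroviride (SEQ ID NO: 71)": 0.19,
    "T. stipitatus (SEQ ID NO: 72)": 0.19,
    "Magnaporthe grisea (EDJ99221)": 0.02,
}


def predicted_cytosolic(probabilities: dict, cutoff: float = 0.5) -> list:
    """Names whose mitochondrial-localization probability is below cutoff,
    ordered from least to most likely to be mitochondrial (ties by name)."""
    kept = [(p, name) for name, p in probabilities.items() if p < cutoff]
    return [name for p, name in sorted(kept)]


if __name__ == "__main__":
    for name in predicted_cytosolic(ALS_MITO_PROBABILITY):
        print(name, ALS_MITO_PROBABILITY[name])
```

With the assumed cutoff, all eight listed homologs are retained. A stricter cutoff such as 0.1 would keep only the M. grisea and S. marcescens enzymes.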

[0117] In one embodiment, the cytosolically active ALS is derived from a prokaryotic organism, including, but not limited to, B. subtilis or L. lactis, which exhibit cytosolic activity. In another embodiment, the ALS may be derived from a eukaryotic organism, including, but not limited to, M. grisea, P. nodorum, T. stipitatus, and T. atroviride.

[0118] In some embodiments, an ALS enzyme that is predicted to be mitochondrially localized may be mutated or modified to remove or alter its N-terminal mitochondrial targeting sequence (MTS), thereby eliminating the ability of the MTS to target the ALS enzyme to the mitochondria. Removal of the MTS can increase the cytosolic localization and/or the cytosolic activity of the ALS as compared to the parental ALS.

[0119] The conversion of two pyruvate molecules to acetolactate can be carried out by either an acetohydroxyacid synthase (AHAS) or an acetolactate synthase (ALS). AHASs are involved in the biosynthesis of branched-chain amino acids in the mitochondria of yeasts; they are FAD-dependent and are feedback-inhibited by branched-chain amino acids. ALSs are catabolic and are involved in the conversion of pyruvate to acetoin; they are FAD-independent and are not feedback-inhibited by branched-chain amino acids. In addition, ALSs are specific for the conversion of two pyruvates to acetolactate. Therefore, ALSs are favored over AHASs. Moreover, because yeast AHASs are normally mitochondrial, a fungal ALS that is cytoplasmic is favored. Sequence analysis has shown that a conserved sequence, `RFDDR`, found in AHASs is not conserved among ALSs (Le et al., 2005, Bull. Korean Chem Soc 26: 916-20). This sequence is likely involved in FAD binding by AHASs and thus could be used to distinguish between the FAD-dependent AHASs and the FAD-independent ALSs. Using this region to distinguish between AHASs and ALSs, BLAST searches of fungal sequence databases were performed and resulted in the identification of ALS homologs from several fungal species (M. grisea, P. nodorum, T. atroviride, T. stipitatus, P. marneffei, and Glomerella graminicola). Of these sequences, the ALS homologs from M. grisea, P. nodorum, T. stipitatus, and T. atroviride will generally be expected to be cytosolically localized.
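The RFDDR criterion can be sketched as a simple presence test. The Python example below is a much cruder heuristic than the BLAST-based screen described above: it checks only for an exact `RFDDR` substring. In real sequences the region would first have to be located by alignment, because an exact-match test can miss diverged AHAS sequences. The function name and the two sequence fragments are hypothetical.

```python
AHAS_FAD_MOTIF = "RFDDR"  # conserved in AHASs but not in ALSs (Le et al., 2005)


def classify_pyruvate_condensing_enzyme(sequence: str) -> str:
    """Crude exact-match heuristic: RFDDR present suggests an FAD-dependent AHAS."""
    if AHAS_FAD_MOTIF in sequence.upper():
        return "AHAS-like"
    return "ALS candidate"


if __name__ == "__main__":
    # Hypothetical fragments, not real AHAS or ALS sequences.
    candidates = {"fragment_a": "GTMRFDDRVTGK", "fragment_b": "GTMKYEDAVTGK"}
    for name, seq in candidates.items():
        print(name, classify_pyruvate_condensing_enzyme(seq))
```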

[0120] In one embodiment, the recombinant microorganism may exhibit at least about 5 percent greater acetolactate synthase (ALS) activity in the cytosol as compared to the parental microorganism. In another embodiment, the recombinant microorganism may exhibit at least about 10 percent, at least about 15 percent, at least about 20 percent, at least about 25 percent, at least about 30 percent, at least about 35 percent, at least about 40 percent, at least about 45 percent, at least about 50 percent, at least about 55 percent, at least about 60 percent, at least about 65 percent, at least about 70 percent, at least about 75 percent, at least about 80 percent, at least about 100 percent, at least about 200 percent, or at least about 500 percent greater acetolactate synthase (ALS) activity in the cytosol as compared to the parental microorganism.

Ketol-Acid Reductoisomerase (KARI)

[0121] In additional embodiments, at least one of the isobutanol pathway enzymes exhibiting cytosolic activity is a ketol-acid reductoisomerase (KARI). A cytosolically active KARI should generally exhibit the ability to convert acetolactate to 2,3-dihydroxyisovalerate in the cytosol.

[0122] In one embodiment, the KARI is derived from a prokaryotic organism, including, but not limited to Escherichia coli, B. subtilis or L. lactis.

[0123] In another embodiment, the KARI is derived from a eukaryotic organism, including, but not limited to, Piromyces sp. E2, S. cerevisiae, and Arabidopsis. Fungal homologs of KARI are generally mitochondrially localized. The present inventors have identified a fungal homolog from the anaerobic rumen fungus Piromyces sp. E2 that is cytosolically localized.

[0124] In certain specific embodiments, the KARI comprises an amino acid sequence selected from the group consisting of E. coli (GenBank No: NP_418222, SEQ ID NO: 1), S. cerevisiae (GenBank No: NP_013459, SEQ ID NO: 2), B. subtilis (GenBank No: CAB14789), and the KARI enzymes from Piromyces sp E2 (GenBank No: CAA76356), B. aphidicola (GenBank No: AAF13807), S. oleracea (GenBank No: CAA40356), O. sativa (GenBank No: NP_001056384, SEQ ID NO: 3), C. reinhardtii (GenBank No: XP_001702649, SEQ ID NO: 6), N. crassa (GenBank No: XP_961335), S. pombe (GenBank No: NP_001018845), L. bicolor (GenBank No: XP_001880867), I. hospitalis (GenBank No: YP_001435197), P. torridus (GenBank No: YP_023851, SEQ ID NO: 7), A. cryptum (GenBank No: YP_001235669, SEQ ID NO: 5), Cyanobacteria/Synechococcus sp. (GenBank No: YP_473733), Z. mobilis (GenBank No: YP_162876, SEQ ID NO: 8), B. thetaiotaomicron (GenBank No: NP_810987), M. maripaludis (GenBank No: YP_001097443, SEQ ID NO: 4), V. fischeri (GenBank No: YP_205911), Shewanella sp (GenBank No: YP_732498.1), G. forsetii (GenBank No: YP_862142), P. ingrahamii (GenBank No: YP_942294), and C. hutchinsonii (GenBank No: YP_677763), a homolog or variant of any of the foregoing, and a polypeptide having at least 60% identity to any one of the foregoing and exhibiting cytosolic KARI activity.

[0125] In additional embodiments, the KARI may be an NADH-dependent KARI. Thus, in one embodiment, the present invention provides recombinant microorganisms in which the NADPH-dependent KARI enzyme is replaced with an enzyme that preferentially depends on NADH (i.e., a KARI that is NADH-dependent). In one embodiment, such enzymes may be identified in nature. In an alternative embodiment, such enzymes may be generated by protein engineering techniques, including, but not limited to, directed evolution or site-directed mutagenesis. NADH-dependent KARIs useful in various methods of the present invention are described in commonly owned and co-pending applications U.S. Ser. No. 12/610,784 and PCT/US09/62952 (published as WO/2010/051527), which are herein incorporated by reference in their entireties for all purposes.

[0126] In one embodiment, a microorganism is provided in which cofactor usage is balanced during the production of a fermentation product, and the microorganism produces the fermentation product at a higher yield compared to a modified microorganism in which the cofactor usage is not balanced. In another embodiment of the present invention, a microorganism is provided in which the cofactor usage is balanced during the production of isobutanol, and the microorganism produces isobutanol at a higher yield compared to a modified microorganism in which the cofactor usage is not balanced. Methods for achieving cofactor balance are described in commonly owned and co-pending applications U.S. Ser. No. 12/610,784 and PCT/US09/62952 (published as WO/2010/051527), which are herein incorporated by reference in their entireties for all purposes.
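To make "cofactor balance" concrete, the Python sketch below tallies reduced cofactors per glucose converted to one isobutanol. It relies on several simplifying assumptions. Glycolysis yields 2 NADH per glucose (2 pyruvate). KARI and ADH each consume one reduced cofactor. All other reactions, including the pentose phosphate pathway and respiration, are ignored. The ADH cofactor is a parameter because this passage does not fix it. Under these assumptions, an NADPH-dependent KARI leaves one NADH surplus and one NADPH deficit, while an NADH-dependent KARI paired with an NADH-dependent ADH balances.

```python
from collections import Counter


def cofactor_ledger(kari_cofactor: str = "NADPH", adh_cofactor: str = "NADH") -> dict:
    """Net reduced cofactor per glucose converted to one isobutanol.

    Positive values are surplus and negative values are deficit. Only glycolysis,
    KARI and ADH are counted; all other reactions are ignored (simplification).
    """
    ledger = Counter()
    ledger["NADH"] += 2           # glycolysis: glucose -> 2 pyruvate + 2 NADH
    ledger[kari_cofactor] -= 1    # KARI: acetolactate -> 2,3-dihydroxyisovalerate
    ledger[adh_cofactor] -= 1     # ADH: isobutyraldehyde -> isobutanol
    return {name: n for name, n in ledger.items() if n != 0}


def is_balanced(**kwargs) -> bool:
    """True when no reduced cofactor is left in surplus or deficit."""
    return not cofactor_ledger(**kwargs)


if __name__ == "__main__":
    print(cofactor_ledger())                  # {'NADH': 1, 'NADPH': -1}
    print(is_balanced(kari_cofactor="NADH"))  # True
```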

[0127] In one embodiment, the recombinant microorganism may exhibit at least about 5 percent greater ketol-acid reductoisomerase (KARI) activity in the cytosol as compared to the parental microorganism. In another embodiment, the recombinant microorganism may exhibit at least about 10 percent, at least about 15 percent, at least about 20 percent, at least about 25 percent, at least about 30 percent, at least about 35 percent, at least about 40 percent, at least about 45 percent, at least about 50 percent, at least about 55 percent, at least about 60 percent, at least about 65 percent, at least about 70 percent, at least about 75 percent, at least about 80 percent, at least about 100 percent, at least about 200 percent, or at least about 500 percent greater ketol-acid reductoisomerase (KARI) activity in the cytosol as compared to the parental microorganism.

Keto-Acid Decarboxylase (KIVD)

[0128] A cytosolically active KIVD should generally exhibit the ability to convert ketoisovalerate to isobutyraldehyde in the cytosol. In one embodiment, the cytosolically active KIVD is derived from a prokaryotic organism, including, but not limited to, L. lactis, which exhibits cytosolic activity. In a specific embodiment, the KIVD enzyme from L. lactis comprises the amino acid sequence of SEQ ID NO: 173. In additional embodiments, the cytosolically active KIVD is derived from, for example, Enterobacter cloacae (Accession No. P23234.1), Mycobacterium smegmatis (Accession No. A0R480.1), Mycobacterium tuberculosis (Accession No. O53865.1), Mycobacterium avium (Accession No. Q742Q2.1), Azospirillum brasilense (Accession No. P51852.1), B. subtilis (see Oku et al., 1988, J. Biol. Chem. 263: 18386-96), a homolog or variant of any of the foregoing, and a polypeptide having at least 60% identity to any one of the foregoing and exhibiting cytosolic KIVD activity.

[0129] In an alternative embodiment, the KIVD may be derived from a eukaryotic organism.

[0130] In one embodiment, the recombinant microorganism may exhibit at least about 5 percent greater 2-keto-acid decarboxylase (KIVD) activity in the cytosol as compared to the parental microorganism. In another embodiment, the recombinant microorganism may exhibit at least about 10 percent, at least about 15 percent, at least about 20 percent, at least about 25 percent, at least about 30 percent, at least about 35 percent, at least about 40 percent, at least about 45 percent, at least about 50 percent, at least about 55 percent, at least about 60 percent, at least about 65 percent, at least about 70 percent, at least about 75 percent, at least about 80 percent, at least about 100 percent, at least about 200 percent, or at least about 500 percent greater 2-keto-acid decarboxylase (KIVD) activity in the cytosol as compared to the parental microorganism.

Alcohol Dehydrogenase (ADH)

[0131] A cytosolically active ADH (used interchangeably herein with isobutanol dehydrogenase, "IDH") should generally exhibit the ability to convert isobutyraldehyde to isobutanol in the cytosol. In one embodiment, the cytosolically active ADH is derived from a prokaryotic organism, including, but not limited to, L. lactis. In a specific embodiment, the ADH enzyme from L. lactis comprises the amino acid sequence of SEQ ID NO: 175. In additional embodiments, the ADH is derived from, for example, Lactobacillus brevis (Accession No. YP_794451.1), Pediococcus acidilactici (Accession No. ZP_06197454.1), Bacillus cereus (Accession No. YP_001374103.1), Bacillus thuringiensis (Accession No. ZP_04101989.1), Leptotrichia goodfellowii (Accession No. ZP_06011170.1), Actinobacillus pleuropneumoniae (Accession No. ZP_00134308.2), Streptococcus sanguinis (Accession No. YP_001035842.1), Eikenella corrodens (Accession No. ZP_03713785.1), Exiguobacterium sp. (Accession No. YP_002886170.1), Neisseria elongata (Accession No. ZP_06736067.1), E. coli (Accession No. ZP_06937530.1), Neisseria meningitidis (Accession No. CBA03965.1), Erwinia pyrifoliae (Accession No. CAY75147.1), and Colwellia psychrerythraea (Accession No. YP_270515.1), a homolog or variant of any of the foregoing, and a polypeptide having at least 60% identity to any one of the foregoing and having cytosolic ADH activity.

[0132] In an alternative embodiment, the ADH may be derived from a eukaryotic organism, including, but not limited to, S. cerevisiae and D. melanogaster. In a specific embodiment, the ADH enzyme from S. cerevisiae is Adh7. In another specific embodiment, the ADH enzyme from D. melanogaster comprises the amino acid sequence of SEQ ID NO: 176.

[0133] In one embodiment, the recombinant microorganism may exhibit at least about 5 percent greater alcohol dehydrogenase (ADH) activity in the cytosol as compared to the parental microorganism. In another embodiment, the recombinant microorganism may exhibit at least about 10 percent, at least about 15 percent, at least about 20 percent, at least about 25 percent, at least about 30 percent, at least about 35 percent, at least about 40 percent, at least about 45 percent, at least about 50 percent, at least about 55 percent, at least about 60 percent, at least about 65 percent, at least about 70 percent, at least about 75 percent, at least about 80 percent, at least about 100 percent, at least about 200 percent, or at least about 500 percent greater alcohol dehydrogenase (ADH) activity in the cytosol as compared to the parental microorganism.

Chimeric Isobutanol Pathway Enzymes

[0134] In another aspect, the present invention provides recombinant microorganisms comprising chimeric proteins consisting of isobutanol pathway enzymes. In one embodiment, the chimeric proteins consist of ALS and at least one additional protein. In a specific embodiment, the additional protein is KARI. In a preferred embodiment, the chimeric protein exhibits the biocatalytic properties of both ALS and KARI. A chimeric protein that incorporates the activities of both ALS and KARI will generally be expected to reduce the effect of diffusion and to decrease the time available for spontaneous decomposition of 2-acetolactate. Using a flexible linker and/or structural and sequence information to create a protein with the biocatalytic properties of both ALS and KARI will generally increase the concentration of 2-acetolactate at the active site of KARI, causing 2-acetolactate to be converted to 2,3-dihydroxyisovalerate near its theoretical maximum (with very little effect of diffusion); the total concentration of 2-acetolactate should therefore remain low, correspondingly decreasing its spontaneous decomposition. This will generally have the effect of increasing the rate of conversion of 2-acetolactate to 2,3-dihydroxyisovalerate.

[0135] In another embodiment, the chimeric proteins consist of KARI and at least one additional protein. In a specific embodiment, the additional protein is DHAD. In a preferred embodiment, the chimeric protein exhibits the biocatalytic properties of both KARI and DHAD. In each of the various embodiments described herein, the proteins may be connected via a flexible linker.
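At the sequence level, a two-enzyme chimera joined by a flexible linker can be assembled as shown in the Python sketch below. The text does not specify a linker. The repeated Gly-Gly-Gly-Gly-Ser unit used here is an assumption, chosen because (GGGGS)n linkers are commonly used for flexible fusions. Dropping the initiator Met of the C-terminal partner is a design choice, not a requirement stated in the text. The partner sequences are hypothetical placeholders for, for example, ALS and KARI.

```python
def build_fusion(n_partner: str, c_partner: str, linker_unit: str = "GGGGS",
                 linker_repeats: int = 3, drop_internal_met: bool = True) -> str:
    """Join two protein sequences (one-letter codes) through a repeated linker.

    A trailing stop symbol '*' on the N-terminal partner is removed, and the
    initiator Met of the C-terminal partner is optionally dropped because it
    no longer starts translation inside the fusion.
    """
    n_seq = n_partner.upper().rstrip("*")
    c_seq = c_partner.upper()
    if drop_internal_met and c_seq.startswith("M"):
        c_seq = c_seq[1:]
    return n_seq + linker_unit * linker_repeats + c_seq


if __name__ == "__main__":
    # Hypothetical placeholders standing in for full ALS and KARI sequences.
    print(build_fusion("MAAA*", "MKKK", linker_repeats=2))  # MAAAGGGGSGGGGSKKK
```

Whether a given linker length preserves both activities, and actually channels 2-acetolactate as the text anticipates, would still need to be tested experimentally.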

Isobutanol Pathway Enzymes Attached to a Protein Scaffold

[0136] In another aspect, the present invention provides recombinant microorganisms comprising a scaffold system tethered to one or more isobutanol pathway enzymes. In a specific embodiment, the scaffold system is the MAP kinase scaffold (Ste5) system. In a further embodiment, one or more of the isobutanol pathway enzymes may be modified or mutated to comprise a protein domain allowing for binding to the scaffold system.

[0137] The present inventors have found that via the use of a protein scaffold, the isobutanol pathway enzymes that act in concert as part of a single pathway can be co-localized. In some embodiments, the scaffold systems are adapted for binding to the isobutanol pathway enzymes. By tethering the enzymes that work together in the pathway to a scaffold protein, they are brought into close physical proximity with each other, thus increasing the efficiency of the isobutanol production.

[0138] There are several advantages to keeping pathway enzymes together on a scaffold system. One is that proteins that would normally localize to an intracellular compartment, such as the mitochondria, are partitioned onto the scaffold, thus keeping a sizeable portion of the protein population in the cytosol. Another is that the chemical product of each enzyme is physically close to the next enzyme in the pathway, which speeds reaction time and decreases the possibility that the product would be used in a competing pathway. Finally, unstable products of the enzymes would be used more quickly, since the next enzyme in the pathway would be adjacent and able to use them as substrates, thus decreasing nonproductive degradation of these products.

[0139] In a preferred embodiment, the isobutanol pathway enzymes are arranged in the sequence in which they are needed to function (i.e. ALS followed by KARI followed by DHAD followed by KIVD followed by ADH). In another embodiment, the scaffolded protein complex is targeted to the cytosol by adding localization signals to the scaffold. In yet another embodiment, the scaffolded protein complex is targeted to the cell wall by adding localization signals to the scaffold. As would be understood by one of skill in the art, the scaffold system allows for co-localization of proteins or enzymes in addition to the isobutanol pathway enzymes. Such proteins may include chaperone proteins, proteins for the conversion of xylose to xylulose-5P, cellulases, etc.
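The preferred scaffold order (ALS, KARI, DHAD, KIVD, ADH) is simply the order in which each enzyme's product becomes the next enzyme's substrate. The Python sketch below encodes the reactions as described in this section and checks whether a proposed arrangement has that property. Cofactors, CO2, and water are deliberately omitted. The data structure and function name are illustrative assumptions.

```python
# enzyme -> (substrates, product), following the reactions described in this section.
REACTIONS = {
    "ALS": (("pyruvate", "pyruvate"), "2-acetolactate"),
    "KARI": (("2-acetolactate",), "2,3-dihydroxyisovalerate"),
    "DHAD": (("2,3-dihydroxyisovalerate",), "2-ketoisovalerate"),
    "KIVD": (("2-ketoisovalerate",), "isobutyraldehyde"),
    "ADH": (("isobutyraldehyde",), "isobutanol"),
}


def is_channeled(order: list) -> bool:
    """True if each enzyme's product is a substrate of the next enzyme in order."""
    for upstream, downstream in zip(order, order[1:]):
        product = REACTIONS[upstream][1]
        if product not in REACTIONS[downstream][0]:
            return False
    return True


if __name__ == "__main__":
    print(is_channeled(["ALS", "KARI", "DHAD", "KIVD", "ADH"]))  # True
    print(is_channeled(["ALS", "DHAD", "KARI", "KIVD", "ADH"]))  # False
```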

Removal and/or Modification of N-Terminal Mitochondrial Targeting Sequences

[0140] The localization of the enzymes involved in the production of isobutanol is desired to be cytosolic. Cytosolic localization allows the pathway to utilize pyruvate and NAD(P)H generated in the cytosol by glycolysis and/or the pentose phosphate pathway, without the need to transfer these metabolites to another compartment (i.e., the mitochondria). However, the yeast enzymes acetohydroxyacid synthase (AHAS; Ilv2+Ilv6), ketol-acid reductoisomerase (KARI; Ilv5), and dihydroxyacid dehydratase (DHAD; Ilv3), which carry out the first three steps of isobutanol production, are physiologically localized to the mitochondria. Mitochondrial matrix proteins are typically targeted to the mitochondria by an N-terminal mitochondrial targeting sequence (MTS), which is then cleaved off in the mitochondria, resulting in the `mature` form of the enzyme (Paschen et al., 2001, IUBMB Life 52: 101-112). Indeed, the N-terminal targeting sequence of Ilv6 has been defined (Pang et al., 1999, Biochemistry 38: 5222-31). N-terminal deletions of Ilv5 have also been shown to relocalize this enzyme to the cytosol (Omura, 2008, Appl. Microbiol. Biotechnol. 78: 503-513; See also Omura, WO/2009/078108 A1, hereby incorporated by reference in its entirety).

[0141] One mechanism identified by the present inventors for the cytosolic localization of isobutanol pathway enzymes involves the removal and/or modification of N-terminal mitochondrial targeting sequences (MTS). Nuclear genome-encoded proteins destined to reside in the mitochondria often contain an N-terminal Mitochondrial Targeting Sequence (MTS) that is recognized by a set of proteins collectively known as the mitochondrial import machinery. Following recognition and import, the MTS is physically cleaved off of the imported protein. In eukaryotes, homologs of two of the isobutanol pathway enzymes, ketol-acid reductoisomerase (KARI, e.g. S. cerevisiae Ilv5) and dihydroxyacid dehydratase (DHAD, e.g. S. cerevisiae Ilv3), are predicted to be mitochondrial, based upon the presence of an N-terminal MTS as well as several in vivo functional and mutational studies (See e.g., Omura, F., 2008, Appl. Microbiol. Biotechnol. 78: 503-513). As described herein, the present inventors have designed isobutanol pathway enzymes in which the predicted MTS is removed or modified. In some instances, there is experimental evidence for the length of the MTS. Specifically, the MTS of Ilv6 has been experimentally defined as the N-terminal 61 amino acids (Pang et al., 1999, Biochemistry 38: 5222-31). The MTS of Ilv5 has been reported to be the N-terminal 47 residues (Kassow A., 1992, "Metabolic effects of deleting the region encoding the transit peptide in Saccharomyces cerevisiae ILV5" PhD thesis, University of Copenhagen). In addition, deletion of the N-terminal 46 amino acids of Ilv5 has been shown to result in an active enzyme that is localized in the cytosol (Omura, F., 2008, Appl. Microbiol. Biotechnol. 78: 503-513).

[0142] As described herein, the present inventors utilize deletions and/or modifications of the N-terminal MTS to localize the enzymes of the isobutanol pathway to the cytosol. In various embodiments, the MTS can be entirely or partly deleted, or its sequence can be modified to eliminate its ability to target the protein to the mitochondria. A benefit of removing the entire MTS is that the resulting protein is essentially the `mature` form of the enzyme. Deletion of the N-terminal MTS can also be applied to all enzymes/homologs to be used for isobutanol production. This is especially true for homologs from eukaryotic organisms other than S. cerevisiae in which the enzymes are localized to the mitochondria. In addition, some bacterial homologs may have a putative MTS. Because bacterial enzymes do not undergo N-terminal cleavage, N-terminal deletions may be deleterious to these enzymes. In such cases, modifications of the sequence that block the MTS function of the N-terminal sequence may be preferable, as such alterations would likely be less deleterious to the enzyme's activity. N-terminal MTSs can be predicted by MitoProt II (See, e.g., Claros et al., 1996, Eur. J. Biochem. 241: 779-786). Using this program, the lengths of the MTS for Ilv2 and Ilv3 were predicted to be the N-terminal 55 and 20 amino acids, respectively. Modification of the MTS as contemplated herein includes the introduction of one or multiple mutations to inhibit MTS function. It is thought that the mitochondrial import machinery recognizes the amphipathic alpha helix formed by the MTS. Thus, modifications that may inhibit MTS function would be amino acid changes that disrupt this helix, such as mutations of the charged residues. Such modifications prevent recognition of the MTS by the mitochondrial import machinery, and thereby prevent subsequent cleavage of the MTS and import of the protein into the mitochondria.
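At the sequence level, an MTS deletion is an N-terminal truncation. The Python sketch below applies the MTS lengths cited in this section. The Ilv5 entry uses the 46-residue deletion reported to give an active cytosolic enzyme, rather than the 47-residue reported MTS. The sketch optionally restores an initiator methionine. That step is an assumption about how a truncated open reading frame would be expressed, not something stated in the text. The toy sequence is hypothetical and is not any Ilv protein.

```python
# N-terminal MTS lengths cited in this section (amino acids).
MTS_LENGTH = {
    "Ilv6": 61,  # experimentally defined (Pang et al., 1999)
    "Ilv5": 46,  # deletion reported to give an active cytosolic enzyme (Omura, 2008)
    "Ilv2": 55,  # MitoProt II prediction
    "Ilv3": 20,  # MitoProt II prediction
}


def remove_mts(sequence: str, mts_length: int, add_start_met: bool = True) -> str:
    """Delete the first mts_length residues; optionally restore an initiator Met."""
    if not 0 < mts_length < len(sequence):
        raise ValueError("mts_length must be positive and shorter than the sequence")
    mature = sequence[mts_length:]
    if add_start_met and not mature.startswith("M"):
        mature = "M" + mature
    return mature


if __name__ == "__main__":
    toy = "MLSRATKAVQ"  # hypothetical 10-residue sequence
    print(remove_mts(toy, 4))  # MATKAVQ
```

For bacterial homologs with only a putative MTS, the text favors point mutations over truncation, so this kind of deletion would not be the default for those enzymes.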

Peptide Tags to Augment Cytosolic Localization of Isobutanol Pathway Enzymes

[0143] In additional embodiments, the mitochondrially imported isobutanol pathway enzymes can be expressed as a chimeric fusion protein to augment cytosolic localization. In one embodiment, the isobutanol pathway enzyme is fused to a peptide tag, whereby said isobutanol pathway enzyme exhibits increased cytosolic localization and/or cytosolic activity in yeast as compared to the parental isobutanol pathway enzyme. In one embodiment, the isobutanol pathway enzyme is fused to a peptide tag following removal of the N-terminal Mitochondrial Targeting Sequence (MTS). In one embodiment, the peptide tag is non-cleavable. In a preferred embodiment, the peptide tag is fused at the N-terminus of the isobutanol pathway enzyme. Peptide tags useful in the present invention preferably have the following properties: (1) they do not significantly hinder the normal enzymatic function of the isobutanol pathway enzyme; (2) they fold in such a way as to block recognition of an N-terminal MTS by the normal mitochondrial import machinery; (3) they promote the stable expression and/or folding of the isobutanol pathway enzyme they precede; and (4) they can be detected, for example, by Western blotting or SDS-PAGE plus Coomassie staining, to facilitate analysis of the overexpressed chimeric protein.

[0144] Suitable peptide tags for use in the present invention include, but are not limited to, ubiquitin, ubiquitin-like (UBL) proteins, myc, HA-tag, green fluorescent protein (GFP), and maltose binding protein (MBP). Ubiquitin and ubiquitin-like proteins offer several advantages. For instance, the use of ubiquitin or similar UBLs (e.g., SUMO) as a solubility- and expression-enhancing fusion partner has been well documented (Ecker et al., 1989, J Biol Chem 264: 7715-9; Marblestone et al., 2006, Protein Science 15: 182-9). In fact, in S. cerevisiae, several ribosomal proteins are naturally expressed as C-terminal fusions to ubiquitin. Following translation and protein folding, ubiquitin is cleaved from its co-expressed partner by a highly specific ubiquitin hydrolase, which recognizes and requires the extreme C-terminal Gly-Gly motif of ubiquitin and cleaves immediately after this sequence; a similar pathway removes UBL proteins from their fusion partners.
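
The cleavage rule described above can be modeled as a toy function. A fused N-terminal tag is released only if it ends in Gly-Gly, so altering that motif leaves the fusion intact, which is the property a non-cleavable tag needs. The tag string is a placeholder rather than a real ubiquitin sequence. Substituting valine for the final glycine is an illustrative assumption.

```python
def process_fusion(tag: str, partner: str) -> list[str]:
    """Toy model of the hydrolase rule: cut immediately after a C-terminal Gly-Gly."""
    if tag.endswith("GG"):
        return [tag, partner]  # tag released; partner starts at its own first residue
    return [tag + partner]     # no Gly-Gly motif, so the fusion is left intact


def make_noncleavable(tag: str, substitute: str = "V") -> str:
    """Replace the last Gly of a ...GG tag so the hydrolase rule no longer applies."""
    if not tag.endswith("GG"):
        return tag
    return tag[:-1] + substitute


tag = "MTAGLRLRGG"  # placeholder tag ending in the Gly-Gly motif
print(process_fusion(tag, "MSTK"))                     # ['MTAGLRLRGG', 'MSTK']
print(process_fusion(make_noncleavable(tag), "MSTK"))  # ['MTAGLRLRGVMSTK']
```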

[0145] The invention described herein provides a method to re-localize a normally mitochondrial protein or enzyme by expressing it as a fusion with an N-terminal, non-cleavable ubiquitin or ubiquitin-like molecule. In doing so, the re-targeted enzyme exhibits enhanced expression, solubility, and function in the cytosol. In another embodiment, the sequence encoding the MTS can be replaced with a sequence encoding one or more copies of the c-myc epitope tag (amino acids EQKLISEEDL, SEQ ID NO: 9), which will generally not target a protein into the mitochondria and can easily be detected by commercially available antibodies.
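
The following is a minimal sketch of the c-myc replacement. It assumes that the MTS is a known N-terminal prefix and that a new initiator methionine precedes the tag. The helper name is hypothetical. No linker residues are added here, although a real construct might include one.

```python
MYC_EPITOPE = "EQKLISEEDL"  # c-myc epitope tag, SEQ ID NO: 9


def replace_mts_with_myc(protein: str, mts_length: int, copies: int = 1) -> str:
    """Swap the N-terminal MTS for Met followed by `copies` tandem c-myc epitopes."""
    if copies < 1:
        raise ValueError("at least one epitope copy is required")
    if not 0 <= mts_length < len(protein):
        raise ValueError("mts_length must be >= 0 and shorter than the protein")
    return "M" + MYC_EPITOPE * copies + protein[mts_length:]


tagged = replace_mts_with_myc("MLRKFSTAVDE", 5, copies=2)
print(tagged)  # MEQKLISEEDLEQKLISEEDLSTAVDE
```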

Altering the Iron-Sulfur Cluster Domain and/or Redox Active Domain

[0146] The yeast cytosol generally has a different redox potential from that of a bacterial cell or of the yeast mitochondria. As a result, isobutanol pathway enzymes that contain an iron-sulfur (FeS) domain and/or a redox-active domain may require the redox potential of their native environment to fold or to be expressed in a functional form. Expressing such enzymes in the yeast cytosol, which can have an unfavorable redox potential, can result in inactive proteins even when the proteins are expressed. The present inventors have identified a number of strategies to overcome this problem, which can arise when an isobutanol pathway enzyme suited to an environment with a specific redox potential is expressed in the yeast cytosol.

[0147] In one embodiment, the present invention provides isobutanol pathway enzymes that exhibit a properly folded iron-sulfur cluster domain and/or redox active domain in the cytosol. Such isobutanol pathway enzymes will generally comprise a mutated or modified iron-sulfur cluster domain and/or redox active domain, allowing for a non-native isobutanol pathway enzyme to be expressed in the yeast cytosol in a functional form.

[0148] In various embodiments described herein, the recombinant microorganisms may further comprise a nucleic acid encoding a chaperone protein, wherein said chaperone protein assists in the folding of a protein exhibiting cytosolic activity. In a preferred embodiment, the protein exhibiting cytosolic activity is DHAD. In one embodiment, the chaperone may be a native protein. In another embodiment, the chaperone protein may be an exogenous protein. In some embodiments, the chaperone protein may be selected from the group consisting of: endoplasmic reticulum oxidoreductin 1 (Ero1, Accession No. NP_013576.1), including variants of Ero1 that have been suitably altered to reduce or prevent its normal localization to the endoplasmic reticulum; thioredoxins (including Trx1, Accession No. NP_013144.1; and Trx2, Accession No. NP_011725.1); thioredoxin reductase (Trr1, Accession No. NP_010640.1); glutaredoxins (including Grx1, Accession No. NP_009895.1; Grx2, Accession No. NP_010801.1; Grx3, Accession No. NP_010383.1; Grx4, Accession No. NP_01101.1; Grx5, Accession No. NP_015266.1; Grx6, Accession No. NP_010274.1; Grx7, Accession No. NP_009570.1; Grx8, Accession No. NP_013468.1); glutathione reductase Glr1 (Accession No. NP_015234.1); and Jac1 (Accession No. NP_011497.1), including variants of Jac1 that have been suitably altered to reduce or prevent its normal mitochondrial localization; and homologs or variants thereof.

[0149] As described herein, iron-sulfur cluster assembly for insertion into yeast apo-iron-sulfur proteins begins in yeast mitochondria. To assemble in yeast the active iron-sulfur proteins containing the cluster, either the apo-iron-sulfur protein is imported into the mitochondria from the cytosol and the iron-sulfur cluster is inserted into the protein and the active protein remains localized in the mitochondria; or the iron-sulfur clusters or precursors thereof are exported from the mitochondria to the cytosol and the active protein is assembled in the cytosol or other cellular compartments.

[0150] Targeting yeast mitochondrial iron-sulfur proteins or non-yeast iron-sulfur proteins to the yeast cytosol can result in such proteins not being properly assembled with their iron-sulfur clusters. The present invention overcomes this problem by co-expressing in yeast, and targeting to the cytosol, proteins for iron-sulfur cluster assembly and for cluster insertion into apo-iron-sulfur proteins, including such proteins from organisms other than yeast, together with the apo-iron-sulfur protein, so that active iron-sulfur proteins are assembled in the yeast cytosol.

[0151] Therefore, in one embodiment of this invention, the apo-iron-sulfur protein DHAD enzyme encoded by the E. coli ilvD gene is expressed in yeast together with E. coli iron-sulfur cluster assembly and insertion genes comprising either the cyaY, iscS, iscU, iscA, hscB, hscA, fdx and iscX genes or the sufA, sufB, sufC, sufD, sufS and sufE genes. This strategy allows both the apo-iron-sulfur protein (DHAD) and the iron-sulfur cluster assembly and insertion components (the products of the isc or suf genes) to come from the same organism, enabling assembly of the active DHAD iron-sulfur protein in the yeast cytosol. As a modification of this embodiment, for those E. coli iron-sulfur cluster assembly and insertion components that localize to or are predicted to localize to the yeast mitochondria upon expression in yeast, the genes for these components are engineered to eliminate such targeting signals to ensure localization of the components in the yeast cytoplasm. Thus, in some embodiments, one or more genes encoding an iron-sulfur cluster assembly protein may be mutated or modified to remove a signal peptide, whereby localization of the product of said one or more genes to the mitochondria is prevented. In certain embodiments, it may be preferable to overexpress one or more genes encoding an iron-sulfur cluster assembly protein.
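
The co-expression design can be written down as a small construct plan. This is a hypothetical data-structure sketch and not a cloning protocol. Each plan pairs ilvD with one complete E. coli Fe-S module, with the isc module listed using the standard E. coli gene name iscX. Any component the user flags as predicted to go to the mitochondria is marked for removal of its targeting signal. The flagged components are an input because the text does not say which ones are predicted to be mitochondrial.

```python
# E. coli Fe-S cluster assembly and insertion modules (one module is chosen per plan).
FES_MODULES = {
    "isc": ("cyaY", "iscS", "iscU", "iscA", "hscB", "hscA", "fdx", "iscX"),
    "suf": ("sufA", "sufB", "sufC", "sufD", "sufS", "sufE"),
}


def coexpression_plan(module: str, predicted_mitochondrial=frozenset()) -> list[str]:
    """List the genes to express together with E. coli ilvD (the DHAD apo-protein).

    Genes in `predicted_mitochondrial` get a hypothetical "_dMTS" suffix, meaning the
    expressed version should lack its mitochondrial targeting signal.
    """
    if module not in FES_MODULES:
        raise KeyError(f"unknown module {module!r}; expected one of {sorted(FES_MODULES)}")
    genes = ("ilvD",) + FES_MODULES[module]
    return [g + "_dMTS" if g in predicted_mitochondrial else g for g in genes]


print(coexpression_plan("suf"))
print(coexpression_plan("isc", predicted_mitochondrial={"iscU"}))
```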

[0152] In additional embodiments, iron-sulfur cluster assembly and insertion components from other than E. coli can be co-expressed with the E. coli DHAD protein to provide assembly of the active DHAD iron-sulfur cluster protein. Such iron-sulfur cluster assembly and insertion components from other organisms can consist of the products of the Helicobacter pylori nifS and nifU genes or the Entamoeba histolytica nifS and nifU genes. As a modification of this embodiment, for those non-E. coli iron-sulfur cluster assembly and insertion components that localize to or are predicted to localize to the yeast mitochondria upon expression in yeast, the genes for these components can be engineered to eliminate such targeting signals to ensure localization of the components in the yeast cytoplasm.

[0153] As a further modification of this embodiment, in addition to co-expression of these proteins in aerobically-grown yeast, these proteins may be co-expressed in anaerobically-grown yeast to lower the redox state of the yeast cytoplasm to improve assembly of the active iron-sulfur protein.

[0154] In another embodiment, the above iron-sulfur cluster assembly and insertion components can be co-expressed with DHAD apo-iron-sulfur enzymes other than the E. coli IlvD gene product to generate active DHAD enzymes in the yeast cytoplasm. As a modification of this embodiment, for those DHAD enzymes that localize to or are predicted to localize to the yeast mitochondria upon expression in yeast, the genes for these enzymes can be engineered to eliminate such targeting signals to ensure localization of the enzymes in the yeast cytoplasm.

[0155] In additional embodiments, the above methods used to generate active DHAD enzymes localized to yeast cytoplasm may be combined with methods to generate active acetolactate synthase, KARI, KIVD and ADH enzymes in the same yeast for the production of isobutanol by yeast.

[0156] In another embodiment, production of active iron-sulfur proteins other than DHAD enzymes in yeast cytoplasm can be accomplished by co-expression with iron-sulfur cluster assembly and insertion proteins from organisms other than yeast, with proper targeting of the proteins to the yeast cytoplasm if necessary and expression in anaerobically growing yeast if needed to improve assembly of the active proteins.

[0157] In another embodiment, the iron-sulfur cluster assembly protein encoding genes may be derived from eukaryotic organisms, including, but not limited to, yeasts and plants. In one embodiment, the iron-sulfur cluster protein encoding genes are derived from a yeast organism, including, but not limited to, S. cerevisiae. In specific embodiments, the yeast derived genes encoding iron-sulfur cluster assembly proteins are selected from the group consisting of Cfd1 (Accession No. NP_012263.1), Nbp35 (Accession No. NP_011424.1), Nar1 (Accession No. NP_014159.1), Cia1 (Accession No. NP_010553.1), and homologs or variants thereof. In a further embodiment, the iron-sulfur cluster assembly protein encoding genes may be derived from plant nuclear genes which encode proteins translocated to the chloroplast or from plant genes found in the chloroplast genome itself.

[0158] As noted above, the iron-sulfur cluster assembly genes may be derived from eukaryotic organisms, including, but not limited to yeasts and plants. In one embodiment, the iron-sulfur cluster genes are derived from a yeast organism, including, but not limited to S. cerevisiae. In specific embodiments, the yeast derived iron-sulfur cluster assembly genes are selected from the group consisting of CFD1, NBP35, NAR1, CIA1, and homologs or variants thereof. In a further embodiment, the iron-sulfur cluster assembly genes may be derived from a plant chloroplast.

[0159] In certain embodiments described herein, it may be desirable to reduce or eliminate the activity and/or protein levels of one or more iron-sulfur cluster-containing cytosolic proteins. This modification increases the capacity of a yeast to incorporate [Fe-S] clusters into cytosolically expressed proteins, which can be native proteins expressed in a non-native compartment or heterologous proteins. This is achieved by deleting a highly expressed native cytoplasmic [Fe-S]-dependent protein. More specifically, the LEU1 gene is deleted. This gene encodes 3-isopropylmalate dehydratase, which catalyzes the isomerization of 2-isopropylmalate to 3-isopropylmalate as part of the leucine biosynthetic pathway in yeast. Leu1p contains a [4Fe-4S] cluster that takes part in the dehydratase catalysis. DHAD also contains a [4Fe-4S] cluster involved in its dehydratase activity. Therefore, although the two enzymes have different substrate preferences, the process of Fe-S cluster incorporation is generally similar for the two proteins. Leu1p is present in yeast at approximately 10,000 molecules per cell (Ghaemmaghami et al., 2003, Nature 425: 737). Deletion of LEU1 is therefore expected to free enough capacity for the cell to incorporate [Fe-S] clusters into on the order of 10,000 molecules of cytosolically expressed DHAD. Taking into account the specific activity of DHAD (E. coli DHAD is reported to have a specific activity of 63 U/mg; Flint et al., 1993, J Biological Chem 268: 14732), the LEU1 deletion strain would generally exhibit an increased capacity for DHAD activity in the cytosol as measured in cell lysate.
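
The capacity argument can be made concrete with a back-of-envelope calculation. This sketch assumes that each of the roughly 10,000 freed cluster-loading "slots" yields a fully active E. coli DHAD molecule at the cited specific activity of 63 U/mg. It also assumes a DHAD monomer mass of about 65.5 kDa. Neither assumption comes from the text, so the result is an illustrative estimate and not a measured value.

```python
AVOGADRO = 6.02214076e23  # molecules per mol
DHAD_COPIES = 10_000      # molecules per cell, matching the Leu1p copy number cited above
SPECIFIC_ACTIVITY = 63.0  # U/mg for E. coli DHAD; 1 U = 1 umol substrate converted per min
DHAD_MW = 65_500.0        # g/mol, ASSUMED approximate mass of the E. coli IlvD monomer


def activity_per_cell(copies: float, mw_g_per_mol: float, u_per_mg: float) -> float:
    """Return the DHAD activity (U) contributed by `copies` fully active molecules in one cell."""
    mass_mg = copies * mw_g_per_mol / AVOGADRO * 1e3  # grams converted to milligrams
    return mass_mg * u_per_mg


per_cell = activity_per_cell(DHAD_COPIES, DHAD_MW, SPECIFIC_ACTIVITY)
print(f"{per_cell:.2e} U per cell")  # 6.85e-11 U per cell
print(f"{per_cell * 1e9:.3f} U per 10^9 cells")
```

Under these assumptions, about 1.1e-12 mg of DHAD per cell corresponds to roughly 0.07 U per 10^9 cells. Actual lysate activity would depend on cluster occupancy and on how many DHAD molecules are expressed.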

[0160] In alternative embodiments, it may be desirable to further overexpress an additional enzyme that converts 2,3-dihydroxyisovalerate to 2-ketoisovalerate in the cytosol. In a specific embodiment, the enzyme may be selected from the group consisting of 3-isopropylmalate dehydratase (Leu1p) and imidazoleglycerol-phosphate dehydratase (His3p). Because DHAD activity is limited in the cytosol, alternative dehydratases that convert dihydroxyisovalerate (DHIV) to 2-ketoisovalerate (KIV) and are physiologically localized to the yeast cytosol may be used. Leu1p and His3p are dehydratases that may potentially exhibit affinity for DHIV. Leu1p is an Fe-S binding protein that is involved in leucine biosynthesis and is normally localized to the cytosol. His3p is involved in histidine biosynthesis and, like Leu1p, is localized or predicted to be localized to the cytosol. This modification addresses the problem of DHAD activity limiting isobutanol production in the cytosol of yeast. An alternative dehydratase that is active in the cytosol, even one with low activity toward DHIV, may thus be used in place of DHAD in the isobutanol pathway. As described herein, such an enzyme may be further engineered to increase its activity with DHIV.

The Microorganism in General

[0161] Native producers of 1-butanol, such as Clostridium acetobutylicum, are known, but these organisms also generate byproducts such as acetone, ethanol, and butyrate during fermentations. Furthermore, these microorganisms are relatively difficult to manipulate, with significantly fewer tools available than in more commonly used production hosts such as S. cerevisiae or E. coli. Additionally, the physiology and metabolic regulation of these native producers are much less well understood, impeding rapid progress towards high-efficiency production. Furthermore, no native microorganisms have been identified that can metabolize glucose into isobutanol in industrially relevant quantities.

[0162] The production of isobutanol and other fusel alcohols by various yeast species, including Saccharomyces cerevisiae, is of special interest to the distillers of alcoholic beverages, for whom fusel alcohols often constitute undesirable off-notes. Production of isobutanol in wild-type yeasts has been documented on various growth media. These range from grape must from winemaking (Romano et al., 2003, World J. of Microbiol Biot. 19: 311-5), in which 12-219 mg/L isobutanol was produced, to supplemented minimal media (Oliviera et al., 2005, World J. of Microbiol Biot. 21: 1569-76), producing 16-34 mg/L isobutanol. Work from Dickinson et al. (J Biol Chem. 272: 26871-8, 1997) has identified the enzymatic steps used in an endogenous S. cerevisiae pathway that converts branched-chain amino acids (e.g., valine or leucine) to the corresponding fusel alcohols, such as isobutanol from valine.

[0163] Recombinant microorganisms provided herein can express a plurality of heterologous and/or native target enzymes involved in pathways for the production of isobutanol from a suitable carbon source.

[0164] Accordingly, "engineered" or "modified" microorganisms are produced via the introduction of genetic material into a host or parental microorganism of choice and/or by modification of the expression of native genes, thereby modifying or altering the cellular physiology and biochemistry of the microorganism. Through the introduction of genetic material and/or the modification of the expression of native genes the parental microorganism acquires new properties, e.g. the ability to produce a new, or greater quantities of, an intracellular metabolite. As described herein, the introduction of genetic material into and/or the modification of the expression of native genes in a parental microorganism results in a new or modified ability to produce isobutanol. The genetic material introduced into and/or the genes modified for expression in the parental microorganism contains gene(s), or parts of genes, coding for one or more of the enzymes involved in a biosynthetic pathway for the production of isobutanol and may also include additional elements for the expression and/or regulation of expression of these genes, e.g. promoter sequences.

[0165] In addition to the introduction of a genetic material into a host or parental microorganism, an engineered or modified microorganism can also include alteration, disruption, deletion or knocking-out of a gene or polynucleotide to alter the cellular physiology and biochemistry of the microorganism. Through the alteration, disruption, deletion or knocking-out of a gene or polynucleotide the microorganism acquires new or improved properties (e.g., the ability to produce a new metabolite or greater quantities of an intracellular metabolite, improve the flux of a metabolite down a desired pathway, and/or reduce the production of byproducts).

[0166] Recombinant microorganisms provided herein may also produce metabolites in quantities not available in the parental microorganism. A "metabolite" refers to any substance produced by metabolism or a substance necessary for or taking part in a particular metabolic process. A metabolite can be an organic compound that is a starting material (e.g., glucose or pyruvate), an intermediate (e.g., 2-ketoisovalerate), or an end product (e.g., isobutanol) of metabolism. Metabolites can be used to construct more complex molecules, or they can be broken down into simpler ones. Intermediate metabolites may be synthesized from other metabolites, perhaps used to make more complex substances, or broken down into simpler compounds, often with the release of chemical energy.

[0167] Exemplary metabolites include glucose, pyruvate, and isobutanol. The metabolite isobutanol can be produced by a recombinant microorganism which expresses or over-expresses a metabolic pathway that converts pyruvate to isobutanol. An exemplary metabolic pathway that converts pyruvate to isobutanol may comprise an acetohydroxy acid synthase (ALS), a ketol-acid reductoisomerase (KARI), a dihydroxy-acid dehydratase (DHAD), a 2-keto-acid decarboxylase (KIVD), and an alcohol dehydrogenase (ADH).
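
The pathway above can be sketched as an ordered list of steps, together with a check that each product feeds the next enzyme and that carbon balances. ALS condenses two pyruvate molecules and releases one CO2, and KIVD releases a second CO2. Cofactors are omitted. The yield line assumes that glycolysis converts one glucose into the two pyruvate consumed, and the helper names are illustrative.

```python
# Each step: (enzyme, substrate, substrate molecules consumed, product, CO2 released)
PATHWAY = [
    ("ALS", "pyruvate", 2, "2-acetolactate", 1),
    ("KARI", "2-acetolactate", 1, "2,3-dihydroxyisovalerate", 0),
    ("DHAD", "2,3-dihydroxyisovalerate", 1, "2-ketoisovalerate", 0),
    ("KIVD", "2-ketoisovalerate", 1, "isobutyraldehyde", 1),
    ("ADH", "isobutyraldehyde", 1, "isobutanol", 0),
]

# Carbon atoms per molecule.
CARBONS = {
    "pyruvate": 3, "2-acetolactate": 5, "2,3-dihydroxyisovalerate": 5,
    "2-ketoisovalerate": 5, "isobutyraldehyde": 4, "isobutanol": 4,
}


def check_pathway(steps):
    """Verify that each product feeds the next step and that carbon is conserved at every step."""
    for (enz, _, _, product, _), (next_enz, substrate, _, _, _) in zip(steps, steps[1:]):
        if product != substrate:
            raise ValueError(f"{enz} product {product} does not feed {next_enz}")
    for enz, substrate, n_sub, product, co2 in steps:
        if n_sub * CARBONS[substrate] != CARBONS[product] + co2:
            raise ValueError(f"carbon not balanced at {enz}")
    return steps[0][1], steps[-1][3], sum(step[4] for step in steps)


start, end, total_co2 = check_pathway(PATHWAY)
print(start, "->", end, "with", total_co2, "CO2 released")  # pyruvate -> isobutanol with 2 CO2 released

# Theoretical mass yield, assuming 1 glucose supplies the 2 pyruvate via glycolysis.
GLUCOSE_MW, ISOBUTANOL_MW = 180.16, 74.12  # g/mol
print(f"max yield = {ISOBUTANOL_MW / GLUCOSE_MW:.3f} g isobutanol per g glucose")  # 0.411
```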

[0168] Accordingly, provided herein are recombinant microorganisms that produce isobutanol and in some aspects may include the elevated expression of target enzymes such as ALS, KARI, DHAD, KIVD, and ADH.

[0169] The disclosure identifies specific genes useful in the methods, compositions and organisms of the disclosure; however, it will be recognized that absolute identity to such genes is not necessary. For example, changes in a particular gene or polynucleotide comprising a sequence encoding a polypeptide or enzyme can be made and screened for activity. Typically, such changes comprise conservative and silent mutations. Such modified or mutated polynucleotides and polypeptides can be screened for expression of a functional enzyme using methods known in the art.

[0170] Due to the inherent degeneracy of the genetic code, other polynucleotides which encode substantially the same or functionally equivalent polypeptides can also be used to clone and express the polynucleotides encoding such enzymes.

[0171] As will be understood by those of skill in the art, it can be advantageous to modify a coding sequence to enhance its expression in a particular host. The genetic code is redundant, with 64 possible codons, but most organisms preferentially use a subset of these codons. The codons used most often in a species are called optimal codons, and those used infrequently are classified as rare or low-usage codons. Codons can be substituted to reflect the preferred codon usage of the host, a process sometimes called "codon optimization" or "controlling for species codon bias."

[0172] Optimized coding sequences containing codons preferred by a particular prokaryotic or eukaryotic host (Murray et al., 1989, Nucl Acids Res. 17: 477-508) can be prepared, for example, to increase the rate of translation or to produce recombinant RNA transcripts having desirable properties, such as a longer half-life, as compared with transcripts produced from a non-optimized sequence. Translation stop codons can also be modified to reflect host preference. For example, typical stop codons for S. cerevisiae and mammals are UAA and UGA, respectively. The typical stop codon for monocotyledonous plants is UGA, whereas insects and E. coli commonly use UAA as the stop codon (Dalphin et al., 1996, Nucl Acids Res. 24: 216-8). Methodology for optimizing a nucleotide sequence for expression in a plant is provided, for example, in U.S. Pat. No. 6,015,891, and the references cited therein.
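
One simple form of codon optimization encodes each amino acid with the host's single most-used codon and ends with the host's preferred stop codon. The codon table below is a hypothetical excerpt covering only a few amino acids. A real tool would use a complete, host-specific codon-usage table and often weights codons by frequency rather than always picking one. TAA is used as the stop codon to match the S. cerevisiae preference noted above, written in DNA (T instead of U).

```python
# Hypothetical excerpt of a "preferred codon" table for an imagined host; "*" is the stop codon.
PREFERRED_CODON = {
    "M": "ATG", "K": "AAG", "E": "GAA", "L": "TTG", "S": "TCT", "*": "TAA",
}


def back_translate(protein: str, table: dict = PREFERRED_CODON) -> str:
    """Return a DNA coding sequence using one preferred codon per residue plus a stop codon."""
    missing = set(protein) - set(table)
    if missing:
        raise KeyError(f"no codon defined for: {''.join(sorted(missing))}")
    return "".join(table[aa] for aa in protein) + table["*"]


print(back_translate("MKLE"))  # ATGAAGTTGGAATAA
```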

[0173] Those of skill in the art will recognize that, due to the degenerate nature of the genetic code, a variety of DNA compounds differing in their nucleotide sequences can be used to encode a given enzyme of the disclosure. The native DNA sequences encoding the biosynthetic enzymes described above are referenced herein merely to illustrate an embodiment of the disclosure, and the disclosure includes DNA compounds of any sequence that encode the amino acid sequences of the polypeptides and proteins of the enzymes utilized in the methods of the disclosure. In similar fashion, a polypeptide can typically tolerate one or more amino acid substitutions, deletions, and insertions in its amino acid sequence without loss or significant loss of a desired activity. The disclosure includes such polypeptides with amino acid sequences different from those of the specific proteins described herein, so long as the modified or variant polypeptides have the enzymatic anabolic or catabolic activity of the reference polypeptide. Furthermore, the amino acid sequences encoded by the DNA sequences shown herein merely illustrate embodiments of the disclosure.

[0174] In addition, homologs of enzymes useful for generating metabolites are encompassed by the microorganisms and methods provided herein.

[0175] As used herein, two proteins (or a region of the proteins) are substantially homologous when the amino acid sequences have at least about 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity. To determine the percent identity of two amino acid sequences, or of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). In one embodiment, the length of a reference sequence aligned for comparison purposes is at least 30%, typically at least 40%, more typically at least 50%, even more typically at least 60%, and even more typically at least 70%, 80%, 90%, or 100% of the length of the reference sequence. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, the molecules are identical at that position (as used herein, amino acid or nucleic acid "identity" is equivalent to amino acid or nucleic acid "homology"). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
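
The identity calculation described above can be sketched for two sequences that have already been aligned, with gaps written as '-'. This sketch uses one common convention: identical positions divided by the number of aligned columns, with columns containing a gap counted in the denominator. Alignment programs differ in the denominator they use, so this is illustrative rather than the definition used by any particular tool.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap; they carry no information.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


print(round(percent_identity("ACDEFG", "ACD-FG"), 2))  # 83.33
```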

[0176] When "homologous" is used in reference to proteins or peptides, it is recognized that residue positions that are not identical often differ by conservative amino acid substitutions. A "conservative amino acid substitution" is one in which an amino acid residue is substituted by another amino acid residue having a side chain (R group) with similar chemical properties (e.g., charge or hydrophobicity). In general, a conservative amino acid substitution will not substantially change the functional properties of a protein. In cases where two or more amino acid sequences differ from each other by conservative substitutions, the percent sequence identity or degree of homology may be adjusted upwards to correct for the conservative nature of the substitution. Means for making this adjustment are well known to those of skill in the art (See, e.g., Pearson W. R., 1994, Methods in Mol Biol 25: 365-89).

[0177] The following six groups each contain amino acids that are conservative substitutions for one another: 1) Serine (S), Threonine (T); 2) Aspartic Acid (D), Glutamic Acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Alanine (A), Valine (V), and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).

[0178] Sequence homology for polypeptides, which is also referred to as percent sequence identity, is typically measured using sequence analysis software. See commonly owned and co-pending application US 2009/0226991. A typical algorithm used for comparing a molecule sequence to a database containing a large number of sequences from different organisms is the computer program BLAST. When searching a database containing sequences from a large number of different organisms, it is typical to compare amino acid sequences. Database searches using amino acid sequences can be performed with algorithms described in commonly owned and co-pending application US 2009/0226991.

[0179] The disclosure provides recombinant microorganisms comprising a biochemical pathway for the production of isobutanol from a suitable substrate at a high yield. A recombinant microorganism of the disclosure comprises one or more recombinant polynucleotides within the genome of the organism or external to the genome within the organism. The microorganism can comprise a reduction in expression, disruption or knockout of a gene found in the wild-type organism and/or introduction of a heterologous polynucleotide and/or expression or overexpression of an endogenous polynucleotide.

[0180] In one aspect, the disclosure provides a recombinant microorganism that comprises elevated expression of at least one target enzyme as compared to a parental microorganism, or that encodes an enzyme not found in the parental organism. In another or further aspect, the microorganism comprises a reduction in expression, disruption or knockout of at least one gene encoding an enzyme that competes for a metabolite necessary for the production of isobutanol. The recombinant microorganism produces at least one metabolite involved in a biosynthetic pathway for the production of isobutanol. In general, the recombinant microorganisms comprise at least one recombinant metabolic pathway that comprises a target enzyme and may further include a reduction in activity or expression of an enzyme in a competitive biosynthetic pathway. The pathway acts to modify a substrate or metabolic intermediate in the production of isobutanol. The target enzyme is encoded by, and expressed from, a polynucleotide derived from a suitable biological source. In some embodiments, the polynucleotide comprises a gene derived from a prokaryotic or eukaryotic source and recombinantly engineered into the microorganism of the disclosure. In other embodiments, the polynucleotide comprises a gene that is native to the host organism.

[0181] It is understood that a range of microorganisms can be modified to include a recombinant metabolic pathway suitable for the production of isobutanol. In various embodiments, microorganisms may be selected from yeast microorganisms. Yeast microorganisms for the production of isobutanol may be selected based on certain characteristics:

[0182] One characteristic may include the property that the microorganism is selected to convert various carbon sources into isobutanol. The term "carbon source" generally refers to a substance suitable to be used as a source of carbon for prokaryotic or eukaryotic cell growth. Examples of suitable carbon sources are described in commonly owned and co-pending application US 2009/0226991. Accordingly, in one embodiment, the recombinant microorganism herein disclosed can convert a variety of carbon sources to products, including but not limited to glucose, galactose, mannose, xylose, arabinose, lactose, sucrose, and mixtures thereof.

[0183] The recombinant microorganism may thus further include a pathway for the fermentation of five-carbon (pentose) sugars, including xylose, to isobutanol. Most yeast species metabolize xylose via a complex route, in which xylose is first reduced to xylitol by a xylose reductase (XR) enzyme. The xylitol is then oxidized to xylulose by a xylitol dehydrogenase (XDH) enzyme. The xylulose is then phosphorylated by a xylulokinase (XK) enzyme. This pathway operates inefficiently in yeast species because it introduces a redox imbalance in the cell. The xylose-to-xylitol step primarily uses NADPH as a cofactor, whereas the xylitol-to-xylulose step uses NAD+ as a cofactor. Other processes must operate to restore redox balance within the cell. This often means that the organism cannot grow anaerobically on xylose or other pentose sugars. Accordingly, a yeast species that can efficiently ferment xylose and other pentose sugars into a desired fermentation product is highly desirable.

[0184] Thus, in one aspect, the recombinant microorganism is engineered to express a functional exogenous xylose isomerase. Exogenous xylose isomerases functional in yeast are known in the art. See, e.g., Rajgarhia et al., US20060234364, which is herein incorporated by reference in its entirety. In an embodiment according to this aspect, the exogenous xylose isomerase gene is operatively linked to promoter and terminator sequences that are functional in the yeast cell. In a preferred embodiment, the recombinant microorganism further has a deletion or disruption of a native gene that encodes an enzyme (e.g., XR and/or XDH) that catalyzes the conversion of xylose to xylitol. In a further preferred embodiment, the recombinant microorganism also contains a functional, exogenous xylulokinase (XK) gene operatively linked to promoter and terminator sequences that are functional in the yeast cell. In one embodiment, the xylulokinase (XK) gene is overexpressed.
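
The cofactor argument for replacing the XR/XDH route with xylose isomerase can be sketched as a simple ledger. The sketch assumes a strictly NADPH-dependent xylose reductase and an NAD+-dependent xylitol dehydrogenase, which are the typical preferences of these enzymes, although many yeast XRs can also use NADH to some extent. The shared xylulokinase step is omitted because it is the same on both routes.

```python
from collections import Counter

# Cofactor changes per xylose converted to xylulose (negative = consumed, positive = produced).
XR_XDH_ROUTE = [
    Counter({"NADPH": -1, "NADP+": +1}),  # XR: xylose + NADPH -> xylitol + NADP+
    Counter({"NAD+": -1, "NADH": +1}),    # XDH: xylitol + NAD+ -> xylulose + NADH
]
XI_ROUTE = [Counter()]                    # XI: xylose -> xylulose, no cofactor


def net_cofactors(route):
    """Sum the per-step cofactor changes and drop cofactors whose net change is zero."""
    total = Counter()
    for step in route:
        total.update(step)  # update() adds counts, including negative ones
    return {k: v for k, v in total.items() if v != 0}


print(net_cofactors(XR_XDH_ROUTE))  # {'NADPH': -1, 'NADP+': 1, 'NAD+': -1, 'NADH': 1}
print(net_cofactors(XI_ROUTE))      # {}
```

Under these assumptions, the XR/XDH ledger nets one NADPH oxidized and one NAD+ reduced per xylose, which other reactions must rebalance. The xylose isomerase ledger is empty.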

[0185] In one embodiment, the microorganism has reduced or no pyruvate decarboxylase (PDC) activity. PDC catalyzes the decarboxylation of pyruvate to acetaldehyde, which is then reduced to ethanol by ADH via the oxidation of NADH to NAD+. Ethanol production is the main pathway to oxidize the NADH from glycolysis. Deletion of this pathway increases the pyruvate and the reducing equivalents (NADH) available for the isobutanol pathway. Accordingly, deletion of PDC genes can further increase the yield of isobutanol.

[0186] In another embodiment, the microorganism has reduced or no glycerol-3-phosphate dehydrogenase (GPD) activity. GPD catalyzes the reduction of dihydroxyacetone phosphate (DHAP) to glycerol-3-phosphate (G3P) via the oxidation of NADH to NAD+. Glycerol is then produced from G3P by Glycerol-3-phosphatase (GPP). Glycerol production is a secondary pathway to oxidize excess NADH from glycolysis. Reduction or elimination of this pathway would increase the pyruvate and reducing equivalents (NADH) available for the isobutanol pathway. Thus, deletion of GPD genes can further increase the yield of isobutanol.

[0187] In yet another embodiment, the microorganism has reduced or no PDC activity and reduced or no GPD activity.

[0188] In one embodiment, the yeast microorganisms may be selected from the "Saccharomyces Yeast Clade", as described in commonly owned and co-pending application US 2009/0226991.

[0189] The "Saccharomyces sensu stricto" taxonomic group is a cluster of yeast species that are highly related to S. cerevisiae (Rainieri et al., 2003, J. Biosci Bioengin 96: 1-9). Saccharomyces sensu stricto yeast species include, but are not limited to, S. cerevisiae, S. kudriavzevii, S. mikatae, S. bayanus, S. uvarum, S. carocanis, and hybrids derived from these species (Masneuf et al., 1998, Yeast 7: 61-72).

[0190] An ancient whole genome duplication (WGD) event occurred during the evolution of the hemiascomycete yeasts and was discovered using comparative genomic tools (Kellis et al., 2004, Nature 428: 617-24; Dujon et al., 2004, Nature 430:35-44; Langkjaer et al., 2003, Nature 428: 848-52; Wolfe et al., 1997, Nature 387: 708-13). On the basis of this major evolutionary event, yeasts can be divided into species that diverged from a common ancestor following the WGD event (termed "post-WGD yeast" herein) and species that diverged from the yeast lineage prior to the WGD event (termed "pre-WGD yeast" herein).

[0191] Accordingly, in one embodiment, the yeast microorganism may be selected from a post-WGD yeast genus, including but not limited to Saccharomyces and Candida. Preferred post-WGD yeast species include: S. cerevisiae, S. uvarum, S. bayanus, S. paradoxus, S. castellii, and C. glabrata.

[0192] In another embodiment, the yeast microorganism may be selected from a pre-whole genome duplication (pre-WGD) yeast genus, including but not limited to Saccharomyces, Kluyveromyces, Candida, Pichia, Issatchenkia, Debaryomyces, Hansenula, Yarrowia, and Schizosaccharomyces. Representative pre-WGD yeast species include: S. kluyveri, K. thermotolerans, K. marxianus, K. waltii, K. lactis, C. tropicalis, P. pastoris, P. anomala, P. stipitis, I. orientalis, I. occidentalis, I. scutulata, D. hansenii, H. anomala, Y. lipolytica, and S. pombe.

[0193] A yeast microorganism may be either Crabtree-negative or Crabtree-positive, as described in commonly owned and co-pending application US 2009/0226991. In one embodiment, the yeast microorganism may be selected from yeast with a Crabtree-negative phenotype, including but not limited to the following genera: Kluyveromyces, Pichia, Issatchenkia, Hansenula, and Candida. Crabtree-negative species include but are not limited to: K. lactis, K. marxianus, P. anomala, P. stipitis, I. orientalis, I. occidentalis, I. scutulata, H. anomala, and C. utilis. In another embodiment, the yeast microorganism may be selected from a yeast with a Crabtree-positive phenotype, including but not limited to Saccharomyces, Kluyveromyces, Zygosaccharomyces, Debaryomyces, Pichia, and Schizosaccharomyces. Crabtree-positive yeast species include but are not limited to: S. cerevisiae, S. uvarum, S. bayanus, S. paradoxus, S. castellii, S. kluyveri, K. thermotolerans, C. glabrata, Z. bailii, Z. rouxii, D. hansenii, P. pastoris, and S. pombe.

[0194] Another characteristic may include the property that the microorganism is non-fermenting. In other words, it cannot metabolize a carbon source anaerobically, although it is able to metabolize a carbon source in the presence of oxygen. Non-fermenting yeast refers to both naturally occurring yeasts and genetically modified yeasts. During anaerobic fermentation with fermentative yeast, the main pathway to oxidize the NADH from glycolysis is through the production of ethanol. Ethanol is produced by alcohol dehydrogenase (ADH) via the reduction of acetaldehyde, which is generated from pyruvate by pyruvate decarboxylase (PDC). In one embodiment, a fermentative yeast can be engineered to be non-fermentative by the reduction or elimination of the native PDC activity. Most of the pyruvate produced by glycolysis is then not consumed by PDC and is available for the isobutanol pathway. Deletion of this pathway increases the pyruvate and the reducing equivalents available for the isobutanol pathway. Fermentative pathways contribute to low yield and low productivity of isobutanol. Accordingly, deletion of PDC may increase yield and productivity of isobutanol.

[0195] In some embodiments, the recombinant microorganisms may be non-fermenting yeast microorganisms, including, but not limited to, those classified into a genus selected from the group consisting of Trichosporon, Rhodotorula, and Myxozyma.

[0196] In one embodiment, a yeast microorganism is engineered to convert a carbon source, such as glucose, to pyruvate by glycolysis, and the pyruvate is converted to isobutanol via an engineered isobutanol pathway (See, e.g., WO/2007/050671, WO/2008/098227, and Atsumi et al., 2008, Nature 451: 86-9). Alternative pathways for the production of isobutanol have been described in WO/2007/050671 and in Dickinson et al., 1998, J Biol Chem 273:25751-6.

[0197] Accordingly, the engineered isobutanol pathway to convert pyruvate to isobutanol can be comprised of the following reactions:

[0198] 1. 2 pyruvate → acetolactate + CO2

[0199] 2. acetolactate + NAD(P)H → 2,3-dihydroxyisovalerate + NAD(P)+

[0200] 3. 2,3-dihydroxyisovalerate → alpha-ketoisovalerate

[0201] 4. alpha-ketoisovalerate → isobutyraldehyde + CO2

[0202] 5. isobutyraldehyde + NAD(P)H → isobutanol + NAD(P)+

[0203] These reactions are carried out by the enzymes 1) acetolactate synthase (ALS), 2) ketol-acid reductoisomerase (KARI), 3) dihydroxy-acid dehydratase (DHAD), 4) ketoisovalerate decarboxylase (KIVD), and 5) an alcohol dehydrogenase (ADH) (FIG. 1). In another embodiment, the yeast microorganism is engineered to overexpress these enzymes. These enzymes can be encoded by native genes or, alternatively, by heterologous genes. For example, ALS can be encoded by the alsS gene of B. subtilis, alsS of L. lactis, or the ilvK gene of K. pneumoniae. KARI can be encoded by the ilvC genes of E. coli, C. glutamicum, M. maripaludis, or Piromyces sp E2. DHAD can be encoded by the ilvD genes of E. coli, C. glutamicum, or L. lactis. KIVD can be encoded by the kivD gene of L. lactis. ADH can be encoded by ADH2, ADH6, or ADH7 of S. cerevisiae.
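As an illustrative consistency check on reactions 1 to 5 ([0198] to [0202]), the following Python sketch (written for this explanation, not part of the disclosed method) counts carbon atoms on each side of every reaction and then sums the reactions to obtain the overall conversion. Cofactors and the water released in step 3 carry no carbon and are therefore omitted; the dictionary and function names are illustrative only.

```python
# Carbon atoms per molecule for each metabolite in reactions 1-5.
# NAD(P)H, NAD(P)+ and water carry no carbon, so they are left out.
CARBONS = {
    "pyruvate": 3,
    "acetolactate": 5,
    "2,3-dihydroxyisovalerate": 5,
    "alpha-ketoisovalerate": 5,
    "isobutyraldehyde": 4,
    "isobutanol": 4,
    "CO2": 1,
}

# Each step: (enzyme, substrates, products) with stoichiometric coefficients.
PATHWAY = [
    ("ALS", {"pyruvate": 2}, {"acetolactate": 1, "CO2": 1}),
    ("KARI", {"acetolactate": 1}, {"2,3-dihydroxyisovalerate": 1}),
    ("DHAD", {"2,3-dihydroxyisovalerate": 1}, {"alpha-ketoisovalerate": 1}),
    ("KIVD", {"alpha-ketoisovalerate": 1}, {"isobutyraldehyde": 1, "CO2": 1}),
    ("ADH", {"isobutyraldehyde": 1}, {"isobutanol": 1}),
]


def carbon_count(side: dict[str, int]) -> int:
    """Total carbon atoms on one side of a reaction."""
    return sum(coeff * CARBONS[name] for name, coeff in side.items())


def unbalanced_steps(pathway) -> list[str]:
    """Enzymes whose reaction is not carbon-balanced (empty list if all balance)."""
    return [enz for enz, subs, prods in pathway
            if carbon_count(subs) != carbon_count(prods)]


def net_reaction(pathway) -> dict[str, int]:
    """Sum all steps; intermediates cancel, leaving the overall conversion.

    Negative coefficients are net consumption, positive are net production.
    """
    net: dict[str, int] = {}
    for _, subs, prods in pathway:
        for name, coeff in subs.items():
            net[name] = net.get(name, 0) - coeff
        for name, coeff in prods.items():
            net[name] = net.get(name, 0) + coeff
    return {name: coeff for name, coeff in net.items() if coeff != 0}


if __name__ == "__main__":
    print(unbalanced_steps(PATHWAY))  # []
    print(net_reaction(PATHWAY))      # {'pyruvate': -2, 'CO2': 2, 'isobutanol': 1}
```

Summing the five steps cancels every intermediate and leaves 2 pyruvate → 1 isobutanol + 2 CO2.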

[0204] In one embodiment, pathway steps 2 and 5 may be carried out by KARI and ADH enzymes that utilize NADH (rather than NADPH) as a co-factor. Such enzymes are described in commonly owned and co-pending applications U.S. Ser. No. 12/610,784 and PCT/US09/62952 (published as WO/2010/051527), which are herein incorporated by reference in their entireties for all purposes. The present inventors have found that utilization of NADH-dependent KARI and ADH enzymes to catalyze pathway steps 2 and 5, respectively, surprisingly enables production of isobutanol under anaerobic conditions. Thus, in one embodiment, the recombinant microorganisms of the present invention may use an NADH-dependent KARI to catalyze the conversion of acetolactate (+NADH) to produce 2,3-dihydroxyisovalerate. In another embodiment, the recombinant microorganisms of the present invention may use an NADH-dependent ADH to catalyze the conversion of isobutyraldehyde (+NADH) to produce isobutanol. In yet another embodiment, the recombinant microorganisms of the present invention may use both an NADH-dependent KARI to catalyze the conversion of acetolactate (+NADH) to produce 2,3-dihydroxyisovalerate, and an NADH-dependent ADH to catalyze the conversion of isobutyraldehyde (+NADH) to produce isobutanol.
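The redox reasoning behind this finding can be made explicit with a per-glucose cofactor ledger. The Python sketch below is an illustrative accounting written for this explanation, not the inventors' model. It assumes standard glycolytic stoichiometry (one glucose gives 2 pyruvate and 2 NADH), that both pyruvate molecules enter steps 1 to 5 to form one isobutanol, and it ignores biomass formation and any other NADH sources or sinks. The function names are hypothetical.

```python
from collections import Counter


def cofactor_ledger(kari_cofactor: str, adh_cofactor: str) -> Counter:
    """Net change in reduced cofactors per glucose converted to one isobutanol.

    Positive values: reduced cofactor produced; negative values: consumed.
    Assumes glycolysis yields 2 NADH per glucose (no other sources or sinks).
    """
    ledger = Counter()
    ledger["NADH"] += 2           # glycolysis: glucose -> 2 pyruvate + 2 NADH
    ledger[kari_cofactor] -= 1    # step 2 (KARI): acetolactate -> 2,3-dihydroxyisovalerate
    ledger[adh_cofactor] -= 1     # step 5 (ADH): isobutyraldehyde -> isobutanol
    return ledger


def is_redox_balanced(ledger: Counter) -> bool:
    """True when every reduced cofactor is produced and consumed equally."""
    return all(value == 0 for value in ledger.values())


if __name__ == "__main__":
    # NADH-dependent KARI and ADH reoxidize exactly the glycolytic NADH.
    print(dict(cofactor_ledger("NADH", "NADH")))    # {'NADH': 0}
    # NADPH-dependent enzymes leave surplus NADH and an NADPH deficit.
    print(dict(cofactor_ledger("NADPH", "NADPH")))  # {'NADH': 2, 'NADPH': -2}
```

Under these assumptions, only the NADH/NADH combination closes the balance without requiring the cell to reoxidize surplus NADH and regenerate NADPH by other routes, which is consistent with the anaerobic production described in this paragraph.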

[0205] The yeast microorganism of the invention may be engineered to have increased ability to convert pyruvate to isobutanol. In one embodiment, the yeast microorganism may be engineered to have increased ability to convert pyruvate to isobutyraldehyde. In another embodiment, the yeast microorganism may be engineered to have increased ability to convert pyruvate to keto-isovalerate. In another embodiment, the yeast microorganism may be engineered to have increased ability to convert pyruvate to 2,3-dihydroxyisovalerate. In another embodiment, the yeast microorganism may be engineered to have increased ability to convert pyruvate to acetolactate.

[0206] Furthermore, any of the genes encoding the foregoing enzymes, any other genes mentioned herein, or any of the regulatory elements that control or modulate their expression may be optimized by genetic or protein engineering techniques, such as directed evolution or rational mutagenesis, which are known to those of ordinary skill in the art. Such techniques allow those of ordinary skill in the art to optimize the enzymes for expression and activity in yeast.

[0207] In addition, genes encoding these enzymes can be identified from other fungal and bacterial species and can be expressed for the modulation of this pathway. A variety of organisms could serve as sources for these enzymes, including, but not limited to, Saccharomyces spp., including S. cerevisiae and S. uvarum, Kluyveromyces spp., including K. thermotolerans, K. lactis, and K. marxianus, Pichia spp., Hansenula spp., including H. polymorpha, Candida spp., Trichosporon spp., Yamadazyma spp., including Y. stipitis, Torulaspora pretoriensis, Schizosaccharomyces spp., including S. pombe, Cryptococcus spp., Aspergillus spp., Neurospora spp., or Ustilago spp. Sources of genes from anaerobic fungi include, but are not limited to, Piromyces spp., Orpinomyces spp., or Neocallimastix spp. Sources of prokaryotic enzymes that are useful include, but are not limited to, Escherichia coli, Zymomonas mobilis, Staphylococcus aureus, Bacillus spp., Clostridium spp., Corynebacterium spp., Pseudomonas spp., Lactococcus spp., Enterobacter spp., and Salmonella spp.

Methods in General

Identification of PDC and GPD in a Yeast Microorganism

[0208] Any method can be used to identify genes that encode enzymes with pyruvate decarboxylase (PDC) activity or glycerol-3-phosphate dehydrogenase (GPD) activity. Suitable methods for the identification of PDC and GPD are described in co-pending applications U.S. Ser. No. 12/343,375 (published as US 2009/0226991), U.S. Ser. No. 12/696,645, and U.S. Ser. No. 12/820,505, which claim priority to U.S. Provisional Application 61/016,483, all of which are herein incorporated by reference in their entireties for all purposes.

Genetic Insertions and Deletions

[0209] Any method can be used to introduce a nucleic acid molecule into yeast and many such methods are well known. For example, transformation and electroporation are common methods for introducing nucleic acid into yeast cells. See, e.g., Gietz et al., 1992, Nuc Acids Res. 27: 69-74; Ito et al., 1983, J. Bacteriol. 153: 163-8; and Becker et al., 1991, Methods in Enzymology 194: 182-7.

[0210] In an embodiment, the integration of a gene of interest into a DNA fragment or target gene of a yeast microorganism occurs according to the principle of homologous recombination. According to this embodiment, an integration cassette containing a module comprising at least one yeast marker gene and/or the gene to be integrated (internal module) is flanked on either side by DNA fragments homologous to those of the ends of the targeted integration site (recombinogenic sequences). After transforming the yeast with the cassette by appropriate methods, a homologous recombination between the recombinogenic sequences may result in the internal module replacing the chromosomal region in between the two sites of the genome corresponding to the recombinogenic sequences of the integration cassette. (Orr-Weaver et al., 1981, PNAS USA 78: 6354-58).

[0211] In an embodiment, the integration cassette for integration of a gene of interest into a yeast microorganism includes the heterologous gene under the control of an appropriate promoter and terminator, together with the selectable marker, flanked by recombinogenic sequences for integration of the heterologous gene into the yeast chromosome. In an embodiment, the gene to be integrated is a native gene, introduced in order to increase the copy number of that native gene. The selectable marker gene can be any marker gene used in yeast, including, but not limited to, HIS3, TRP1, LEU2, URA3, bar, ble, hph, and kan. The recombinogenic sequences can be chosen at will, depending on the integration site suitable for the desired application.
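The replacement outcome described in [0210] and [0211] can be illustrated with a toy string model. This Python sketch is conceptual only: it treats DNA as a plain string, assumes exact, single-occurrence homology arms, and ignores strand orientation, mismatches, and recombination efficiency. The function name and the sequence labels (HOMOL_L, native_locus, and so on) are hypothetical placeholders.

```python
def integrate_cassette(chromosome: str, left_arm: str, module: str, right_arm: str) -> str:
    """Toy model of double-crossover integration of a cassette.

    The region of `chromosome` between the sequences matching `left_arm`
    and `right_arm` is replaced by `module` (e.g., promoter-gene-terminator
    plus marker); the homology arms themselves are retained.
    """
    start = chromosome.find(left_arm)
    if start == -1:
        raise ValueError("left homology arm not found")
    left_end = start + len(left_arm)
    right_start = chromosome.find(right_arm, left_end)
    if right_start == -1:
        raise ValueError("right homology arm not found downstream of left arm")
    return chromosome[:left_end] + module + chromosome[right_start:]


chrom = "upstream|HOMOL_L|native_locus|HOMOL_R|downstream"
result = integrate_cassette(chrom, "HOMOL_L", "|promoter-gene-terminator-marker|", "HOMOL_R")
print(result)  # upstream|HOMOL_L|promoter-gene-terminator-marker|HOMOL_R|downstream
```

In this model the chromosomal segment between the two recombinogenic sequences is lost and replaced by the internal module, mirroring the outcome of the homologous recombination event.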

[0212] In another embodiment, integration of a gene into the chromosome of the yeast microorganism may occur via random integration (Kooistra et al., 2004, Yeast 21: 781-792).

[0213] Additionally, in an embodiment, certain introduced marker genes are removed from the genome using techniques well known to those skilled in the art. For example, loss of the URA3 marker can be obtained by plating URA3-containing cells on medium containing 5-fluoroorotic acid (5-FOA) and selecting for FOA-resistant colonies (Boeke et al., 1984, Mol. Gen. Genet 197: 345-47).

[0214] The exogenous nucleic acid molecule contained within a yeast cell of the disclosure can be maintained within that cell in any form. For example, exogenous nucleic acid molecules can be integrated into the genome of the cell or maintained in an episomal state that can stably be passed on ("inherited") to daughter cells. Such extra-chromosomal genetic elements (such as plasmids, etc.) can additionally contain selection markers that ensure the presence of such genetic elements in daughter cells. Moreover, the yeast cells can be stably or transiently transformed. In addition, the yeast cells described herein can contain a single copy, or multiple copies of a particular exogenous nucleic acid molecule as described above.

Reduction of Enzymatic Activity

[0215] Yeast microorganisms within the scope of the invention may have reduced enzymatic activity, such as reduced glycerol-3-phosphate dehydrogenase activity. The term "reduced" as used herein with respect to a particular enzymatic activity refers to a lower level of enzymatic activity than that measured in a comparable yeast cell of the same species. The term "reduced" also encompasses complete elimination of the enzymatic activity relative to a comparable yeast cell of the same species. Thus, yeast cells lacking glycerol-3-phosphate dehydrogenase activity are considered to have reduced glycerol-3-phosphate dehydrogenase activity, since most, if not all, comparable yeast strains have at least some glycerol-3-phosphate dehydrogenase activity. Such reduced enzymatic activities can be the result of lower enzyme concentration, lower specific activity of an enzyme, or a combination thereof. Many different methods can be used to make yeast having reduced enzymatic activity. For example, a yeast cell can be engineered to have a disrupted enzyme-encoding locus using common mutagenesis or knock-out technology. In addition, one or more point mutations can be introduced that result in an enzyme with reduced activity.

[0216] Alternatively, antisense technology can be used to reduce enzymatic activity. For example, yeast can be engineered to contain a cDNA that encodes an antisense molecule that prevents an enzyme from being made. The term "antisense molecule" as used herein encompasses any nucleic acid molecule that contains sequences that correspond to the coding strand of an endogenous polypeptide. An antisense molecule can also have flanking sequences (e.g., regulatory sequences). Antisense molecules can be ribozymes or antisense oligonucleotides. A ribozyme can have any general structure, including, without limitation, hairpin, hammerhead, or axhead structures, provided the molecule cleaves RNA.

[0217] Yeast having a reduced enzymatic activity can be identified using many methods. For example, yeast having reduced glycerol-3-phosphate dehydrogenase activity can be easily identified using common methods, which may include, for example, measuring glycerol formation via liquid chromatography.

Overexpression of Heterologous Genes

[0218] Methods for overexpressing a polypeptide from a native or heterologous nucleic acid molecule are well known. Such methods include, without limitation, constructing a nucleic acid sequence such that a regulatory element promotes the expression of a nucleic acid sequence that encodes the desired polypeptide. Typically, regulatory elements are DNA sequences that regulate the expression of other DNA sequences at the level of transcription. Thus, regulatory elements include, without limitation, promoters, enhancers, and the like. For example, the exogenous genes can be under the control of an inducible promoter or a constitutive promoter. Moreover, methods for expressing a polypeptide from an exogenous nucleic acid molecule in yeast are well known. For example, nucleic acid constructs that are used for the expression of exogenous polypeptides within Kluyveromyces and Saccharomyces are well known (see, e.g., U.S. Pat. Nos. 4,859,596 and 4,943,529 for Kluyveromyces and, e.g., Gellissen et al., Gene 190(1):87-97 (1997) for Saccharomyces). Yeast plasmids have a selectable marker and an origin of replication. In addition, certain plasmids may also contain a centromeric sequence. These centromeric plasmids are generally single-copy or low-copy plasmids. Plasmids without a centromeric sequence that use either a 2 micron (S. cerevisiae) or 1.6 micron (K. lactis) replication origin are high-copy plasmids. The selectable marker can be either prototrophic, such as HIS3, TRP1, LEU2, URA3, or ADE2, or an antibiotic resistance marker, such as bar, ble, hph, or kan.

[0219] In another embodiment, heterologous control elements can be used to activate or repress expression of endogenous genes. Additionally, when expression is to be repressed or eliminated, the gene for the relevant enzyme, protein or RNA can be eliminated by known deletion techniques.

[0220] As described herein, any yeast within the scope of the disclosure can be identified by selection techniques specific to the particular enzyme being expressed, over-expressed or repressed. Methods of identifying the strains with the desired phenotype are well known to those skilled in the art. Such methods include, without limitation, PCR, RT-PCR, and nucleic acid hybridization techniques such as Northern and Southern analysis, altered growth capabilities on a particular substrate or in the presence of a particular substrate, a chemical compound, a selection agent and the like. In some cases, immunohistochemistry and biochemical techniques can be used to determine if a cell contains a particular nucleic acid by detecting the expression of the encoded polypeptide. For example, an antibody having specificity for an encoded enzyme can be used to determine whether or not a particular yeast cell contains that encoded enzyme. Further, biochemical techniques can be used to determine if a cell contains a particular nucleic acid molecule encoding an enzymatic polypeptide by detecting a product produced as a result of the expression of the enzymatic polypeptide. For example, transforming a cell with a vector encoding acetolactate synthase and detecting increased acetolactate concentrations compared to a cell without the vector indicates that the vector is both present and that the gene product is active. Methods for detecting specific enzymatic activities or the presence of particular products are well known to those skilled in the art. For example, the presence of acetolactate can be determined as described by Hugenholtz and Starrenburg, 1992, Appl. Micro. Biot. 38:17-22.

Increase of Enzymatic Activity

[0221] Yeast microorganisms of the invention may be further engineered to have increased activity of enzymes. The term "increased" as used herein with respect to a particular enzymatic activity refers to a higher level of enzymatic activity than that measured in a comparable yeast cell of the same species. For example, overexpression of a specific enzyme can lead to an increased level of activity in the cells for that enzyme. Increased activities for enzymes involved in glycolysis or the isobutanol pathway would result in increased productivity and yield of isobutanol.

[0222] Methods to increase enzymatic activity are known to those skilled in the art. Such techniques may include increasing the expression of the enzyme by increased copy number and/or use of a strong promoter, introduction of mutations to relieve negative regulation of the enzyme, introduction of specific mutations to increase specific activity and/or decrease the Km for the substrate, or by directed evolution. See, e.g., Methods in Molecular Biology (vol. 231), ed. Arnold and Georgiou, Humana Press (2003).

Microorganism Characterized by Producing Isobutanol at High Yield

[0223] For a biocatalyst to produce isobutanol most economically, a high yield is desired. Preferably, isobutanol is the only product produced. Extra products reduce product yield and increase capital and operating costs, particularly if the extra products have little or no value. Extra products also require additional capital and operating costs to separate them from isobutanol.

[0224] The microorganism may convert one or more carbon sources derived from biomass into isobutanol with a yield of greater than 5% of theoretical. In one embodiment, the yield is greater than 10% of theoretical. In one embodiment, the yield is greater than 50% of theoretical. In one embodiment, the yield is greater than 60% of theoretical. In another embodiment, the yield is greater than 70% of theoretical. In yet another embodiment, the yield is greater than 80% of theoretical. In yet another embodiment, the yield is greater than 85% of theoretical. In yet another embodiment, the yield is greater than 90% of theoretical. In yet another embodiment, the yield is greater than 95% of theoretical. In still another embodiment, the yield is greater than 97.5% of theoretical.

[0225] More specifically, the microorganism converts glucose, which can be derived from biomass, into isobutanol with a yield of greater than 5% of theoretical. In one embodiment, the yield is greater than 10% of theoretical. In one embodiment, the yield is greater than 50% of theoretical. In one embodiment, the yield is greater than 60% of theoretical. In another embodiment, the yield is greater than 70% of theoretical. In yet another embodiment, the yield is greater than 80% of theoretical. In yet another embodiment, the yield is greater than 85% of theoretical. In yet another embodiment, the yield is greater than 90% of theoretical. In yet another embodiment, the yield is greater than 95% of theoretical. In still another embodiment, the yield is greater than 97.5% of theoretical.
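This section does not state how "theoretical" yield is computed. A common convention, assumed in the Python sketch below, is the ceiling set by pathway stoichiometry: one glucose yields two pyruvate by glycolysis, and reactions 1 to 5 ([0198] to [0202]) convert two pyruvate into one isobutanol, giving 1 mol/mol, or about 0.41 g isobutanol per g glucose. The molar masses are rounded standard values, and the function names and example figures are hypothetical.

```python
# Approximate molar masses (g/mol) from standard atomic weights.
MW_GLUCOSE = 180.16     # C6H12O6
MW_ISOBUTANOL = 74.12   # C4H10O

# Assumed stoichiometry: glucose -> 2 pyruvate -> 1 isobutanol + 2 CO2.
MAX_MOL_PER_MOL = 1.0


def theoretical_mass_yield() -> float:
    """Maximum grams of isobutanol per gram of glucose under the assumed stoichiometry."""
    return MAX_MOL_PER_MOL * MW_ISOBUTANOL / MW_GLUCOSE


def percent_of_theoretical(isobutanol_g: float, glucose_consumed_g: float) -> float:
    """Observed mass yield expressed as a percentage of the theoretical maximum."""
    if glucose_consumed_g <= 0:
        raise ValueError("glucose consumed must be positive")
    observed = isobutanol_g / glucose_consumed_g
    return 100.0 * observed / theoretical_mass_yield()


print(round(theoretical_mass_yield(), 3))              # 0.411
print(round(percent_of_theoretical(36.0, 100.0), 1))   # 87.5
```

In this hypothetical example, 36 g of isobutanol from 100 g of glucose consumed corresponds to about 87.5% of theoretical, which falls within the "greater than 85%" embodiment.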

Microorganism Characterized by Production of Isobutanol from Pyruvate Via an Overexpressed Isobutanol Pathway and a PDC-Minus and GPD-Minus Phenotype

[0226] In yeast, the conversion of pyruvate to acetaldehyde is a major drain on the pyruvate pool, and, hence, a major source of competition with the isobutanol pathway. This reaction is catalyzed by the pyruvate decarboxylase (PDC) enzyme. Reduction of this enzymatic activity in the yeast microorganism results in an increased availability of pyruvate and reducing equivalents to the isobutanol pathway and may improve isobutanol production and yield in a yeast microorganism that expresses a pyruvate-dependent isobutanol pathway.

[0227] Reduction of PDC activity can be accomplished by 1) mutation or deletion of a positive transcriptional regulator of the structural genes encoding PDC, or 2) mutation or deletion of all PDC genes in a given organism. The term "transcriptional regulator" refers to a protein or nucleic acid that acts in trans to increase or decrease the transcription of a different locus in the genome. For example, in S. cerevisiae, the PDC2 gene, which encodes a positive transcriptional regulator of the PDC1, PDC5, and PDC6 genes, can be deleted; an S. cerevisiae strain in which the PDC2 gene is deleted is reported to have only about 10% of wild-type PDC activity (Hohmann, 1993, Mol Gen Genet 241:657-66). Alternatively, for example, all structural genes for PDC (e.g., PDC1, PDC5, and PDC6 in S. cerevisiae, or PDC1 in K. lactis) can be deleted.

[0228] Crabtree-positive yeast strains, such as an S. cerevisiae strain that contains disruptions in all three PDC genes, no longer produce ethanol by fermentation. However, a downstream product of the reaction catalyzed by PDC, acetyl-CoA, is needed for anabolic production of necessary molecules. Therefore, the Pdc-minus mutant is unable to grow solely on glucose and requires a two-carbon carbon source, either ethanol or acetate, to synthesize acetyl-CoA (Flikweert et al., 1999, FEMS Microbiol Lett. 174: 73-9; and van Maris et al., 2004, Appl Environ Microbiol. 70: 159-66).

[0229] Thus, in an embodiment, such a Crabtree-positive yeast strain may be evolved to generate variants of the PDC mutant yeast that no longer require a two-carbon molecule and that have a growth rate on glucose similar to wild type. Any method, including chemostat evolution or serial dilution, may be used to generate variants of strains with deletions of all three PDC genes that can grow on glucose as the sole carbon source at a rate similar to wild type (van Maris et al., 2004, Appl Envir Micro 70: 159-66).

[0230] Another byproduct that would decrease the yield of isobutanol is glycerol. Glycerol is produced by 1) the reduction of the glycolysis intermediate dihydroxyacetone phosphate (DHAP) to glycerol-3-phosphate (G3P), via the oxidation of NADH to NAD+, by glycerol-3-phosphate dehydrogenase (GPD), followed by 2) the dephosphorylation of glycerol-3-phosphate to glycerol by glycerol-3-phosphatase (GPP). Production of glycerol results in the loss of carbon as well as reducing equivalents. Reduction of GPD activity would increase the yield of isobutanol. Reduction of GPD activity in addition to PDC activity would further increase the yield of isobutanol. Reduction of glycerol production has been reported to increase the yield of ethanol production (Nissen et al., 2000, Yeast 16: 463-74; Nevoigt et al., Method of modifying a yeast cell for the production of ethanol, WO/2009/056984). Disruption of this pathway has also been reported to increase the yield of lactate in a yeast engineered to produce lactate instead of ethanol (Dundon et al., Yeast cells having disrupted pathway from dihydroxyacetone phosphate to glycerol, US 2009/0053782).

[0231] In one embodiment, the microorganism is a Crabtree-positive yeast with reduced or no GPD activity. In another embodiment, the microorganism is a Crabtree-positive yeast with reduced or no GPD activity that expresses an isobutanol biosynthetic pathway and produces isobutanol. In yet another embodiment, the microorganism is a Crabtree-positive yeast with reduced or no GPD activity and with reduced or no PDC activity. In another embodiment, the microorganism is a Crabtree-positive yeast with reduced or no GPD activity and with reduced or no PDC activity that expresses an isobutanol biosynthetic pathway and produces isobutanol.

[0232] In another embodiment, the microorganism is a Crabtree-negative yeast with reduced or no GPD activity. In another embodiment, the microorganism is a Crabtree-negative yeast with reduced or no GPD activity, expresses the isobutanol biosynthetic pathway and produces isobutanol. In yet another embodiment, the microorganism is a Crabtree-negative yeast with reduced or no GPD activity and with reduced or no PDC activity. In another embodiment, the microorganism is a Crabtree-negative yeast with reduced or no GPD activity, with reduced or no PDC activity, expresses an isobutanol biosynthetic pathway and produces isobutanol.

[0233] PDC-minus/GPD-minus yeast production strains are described in co-pending applications U.S. Ser. No. 12/343,375 (published as US 2009/0226991), U.S. Ser. No. 12/696,645, and U.S. Ser. No. 12/820,505, which claim priority to U.S. Provisional Application 61/016,483, all of which are herein incorporated by reference in their entireties for all purposes.

Method of Using Microorganism for High-Yield Isobutanol Fermentation

[0234] In a method to produce isobutanol from a carbon source at high yield, the yeast microorganism is cultured in an appropriate culture medium containing a carbon source.

[0235] Another exemplary embodiment provides a method for producing isobutanol comprising culturing a recombinant yeast microorganism of the invention in a suitable culture medium containing a carbon source that can be converted to isobutanol by the yeast microorganism of the invention.

[0236] In certain embodiments, the method further includes isolating isobutanol from the culture medium. For example, isobutanol may be isolated from the culture medium by any method known to those skilled in the art, such as distillation, pervaporation, or liquid-liquid extraction, including methods disclosed in co-pending applications U.S. Ser. No. 12/342,992 (published as US 2009/0171129) and PCT/US08/88187 (published as WO/2009/086391), which are herein incorporated by reference in their entireties for all purposes.

[0237] This invention is further illustrated by the following examples that should not be construed as limiting. The contents of all references, patents, and published patent applications cited throughout this application, as well as the Figure and the Sequence Listing, are incorporated herein by reference for all purposes.

EXAMPLES

General Methods

[0238] TABLE 1. Amino acid sequences disclosed herein.

SEQ ID NO	Protein, Accession No.
1	E. coli IlvC, NP_418222
2	S. cerevisiae Ilv5, NP_013459
3	Oryza sativa KARI, NP_001056384
4	Methanococcus maripaludis KARI, YP_001097443
5	Acidiphilium cryptum KARI, YP_001235669
6	Chlamydomonas reinhardtii KARI, XP_001702649
7	Picrophilus torridus KARI, YP_023851
8	Zymomonas mobilis KARI, YP_162876
9	c-myc epitope tag
10	Thermotoga petrophila RKU-1 dihydroxyacid dehydratase (DHAD), YP_001243973.1
11	Victivallis vadensis ATCC BAA-548 dihydroxyacid dehydratase (DHAD), ZP_01924101.1
12	Termite group 1 bacterium phylotype Rs-D17 dihydroxyacid dehydratase (DHAD), YP_001956631.1
13	Yarrowia lipolytica dihydroxyacid dehydratase (DHAD), XP_502180.2
14	Francisella tularensis subsp. tularensis WY96-3418 dihydroxyacid dehydratase (DHAD), YP_001122023.1
15	Arabidopsis thaliana dihydroxyacid dehydratase (DHAD), AAK64025.1
16	Candidatus Koribacter versatilis Ellin345 dihydroxyacid dehydratase (DHAD), YP_592184.1 (Acidobacter)
17	Gramella forsetii KT0803 dihydroxyacid dehydratase (DHAD), YP_862145.1
18	Lactococcus lactis subsp. lactis Il1403 dihydroxyacid dehydratase (DHAD), NP_267379.1
19	Saccharopolyspora erythraea NRRL 2338 dihydroxyacid dehydratase (DHAD), YP_001103528.2
20	Saccharomyces cerevisiae Ilv3, NP_012550.1
21	Piromyces sp E2 ilvD
22	Ralstonia eutropha JMP134 ilvD, YP_298150.1
23	Chromohalobacter salexigens ilvD, YP_573197.1
24	Picrophilus torridus DSM9790 ilvD, YP_024215.1
25	Sulfolobus tokodaii str. 7 dihydroxyacid dehydratase (DHAD), NP_378168.1
26	Saccharomyces cerevisiae Ilv3ΔN
27	P(I/L)XXXGX(I/L)XIL (conserved motif described in Example 17)
28	PIKXXGX(I/L)XIL (conserved motif described in Example 17)

TABLE 2. Nucleic acid sequences disclosed herein.
SEQ ID NO: Gene, Accession No.
87: Lactococcus lactis subsp. lactis Il1403 (Ll_ilvD)
88: Saccharomyces cerevisiae ILV3 (ScILV3(FL))
89: Saccharomyces cerevisiae ILV3ΔN (ScILV3ΔN)
90: Gramella forsetii KT0803 (Gf_ilvD)
91: Saccharopolyspora erythraea NRRL 2338 (Se_ilvD)
92: Candidatus Koribacter versatilis Ellin345 ilvD (Acidobacter)
93: Piromyces sp. E2 ilvD (Piromyces ilvD)
94: Ralstonia eutropha JMP134 ilvD (Re_ilvD)
95: Chromohalobacter salexigens ilvD (Cs_ilvD)
96: Picrophilus torridus DSM9790 ilvD (Pt_ilvD)
97: Sulfolobus tokodaii str. 7 ilvD (St_ilvD)
98: E. coli ilvC(Q110V) (Ec_ilvC(Q110V))
99: Lactococcus lactis kivD (Ll_kivD)
100: S. cerevisiae ILV5 (ScILV5)

[0239] Determination of Optical Density.

[0240] The optical density of the yeast cultures is determined at 600 nm using a DU 800 spectrophotometer (Beckman-Coulter, Fullerton, Calif., USA). Samples are diluted as necessary to yield an optical density of between 0.1 and 0.8.
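The dilution rule above is simple arithmetic: choose a dilution that brings the reading into the 0.1-0.8 window, then multiply the reading back by the dilution factor. A minimal Python sketch, assuming a hypothetical set of convenient dilution factors and that readings inside this window are proportional to cell density (the reason the window is used):

```python
"""Bring an OD600 reading into the 0.1-0.8 window and back-calculate culture OD."""

LOW, HIGH = 0.1, 0.8  # acceptable reading range from the method above
# Hypothetical set of convenient dilution factors; not specified in the method.
DILUTIONS = (1, 2, 5, 10, 20, 50, 100)


def pick_dilution(expected_od: float) -> int:
    """Return the smallest listed dilution whose expected reading falls in range."""
    for factor in DILUTIONS:
        if LOW <= expected_od / factor <= HIGH:
            return factor
    raise ValueError(f"no listed dilution brings OD {expected_od} into range")


def culture_od(reading: float, factor: int) -> float:
    """Undiluted OD600 = diluted reading x dilution factor (valid only in range)."""
    if not LOW <= reading <= HIGH:
        raise ValueError(f"reading {reading} is outside {LOW}-{HIGH}; re-dilute")
    return reading * factor


print(pick_dilution(2.5))    # 5: a 1:5 dilution should read about 0.5
print(culture_od(0.48, 5))
```

Restricting the reading range, rather than trusting any reading, is what keeps the back-calculated OD proportional to cell density.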

[0241] Gas Chromatography.

[0242] Volatile organic compounds, including ethanol and isobutanol, were analyzed on an HP 5890 gas chromatograph fitted with an HP 7673 autosampler and a DB-FFAP column (J&W; 30 m length, 0.32 mm ID, 0.25 µm film thickness) or equivalent, connected to a flame ionization detector (FID). The temperature program was as follows: injector 200 °C; detector 300 °C; oven 50 °C for 1 minute, then a 31 °C/minute gradient to 140 °C, followed by a 2.5 min hold. Analysis was performed using authentic standards (>99%, obtained from Sigma-Aldrich) and a 5-point calibration curve with 1-pentanol as the internal standard.

[0243] High Performance Liquid Chromatography for Quantitative Analysis of Glucose and Organic Acids.

[0244] Glucose and organic acids were analyzed on an HP-1100 High Performance Liquid Chromatography system equipped with an Aminex HPX-87H ion exclusion column (Bio-Rad, 300 × 7.8 mm) or equivalent and an H+ cation guard column (Bio-Rad) or equivalent. Organic acids were detected using an HP-1100 UV detector (210 nm, 8 nm bandwidth; 360 nm reference), while glucose was detected using an HP-1100 refractive index detector. The column temperature was 60 °C. The method was isocratic, with 0.008 N sulfuric acid in water as the mobile phase. Flow was set at 1 mL/min. Injection volume was 20 µL, and the run time was 30 minutes.

[0245] High Performance Liquid Chromatography for Quantitative Analysis of Ketoisovalerate and Isobutyraldehyde.

[0246] The DNPH derivatives of ketoisovalerate and isobutyraldehyde were analyzed on an HP-1100 High Performance Liquid Chromatography system (Hewlett Packard 1200 HPLC stack) equipped with an Agilent Eclipse XDB-C18 column (150 × 4.0 mm; 5 µm particles [P/N #993967-902]) and a C18 guard cartridge. The analytes were detected using an HP-1100 UV detector at 360 nm. The column temperature was 50 °C. The method was isocratic, with 0.1% H3PO4 and 70% acetonitrile in water as the mobile phase. Flow was set at 3 mL/min. Injection size was 10 µL, and the run time was 2 minutes.

[0247] Molecular Biology and Bacterial Cell Culture.

[0248] Standard molecular biology methods for cloning and plasmid construction are generally used, unless otherwise noted (Sambrook, J., Russell, D. W. Molecular Cloning, A Laboratory Manual. 3rd ed. 2001, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press).

[0249] Standard recombinant DNA and molecular biology techniques used in the Examples are well known in the art and are described by Sambrook, J., Russell, D. W. Molecular Cloning, A Laboratory Manual. 3rd ed. 2001, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press; by T. J. Silhavy, M. L. Berman, and L. W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1984); and by Ausubel, F. M. et al., Current Protocols in Molecular Biology, pub. by Greene Publishing Assoc. and Wiley-Interscience (1987).

[0250] General materials and methods suitable for the routine maintenance and growth of bacterial cultures are well known in the art. Techniques suitable for use in the following examples may be found in Manual of Methods for General Bacteriology (Philipp Gerhardt, R. G. E. Murray, Ralph N. Costilow, Eugene W. Nester, Willis A. Wood, Noel R. Krieg and G. Briggs Phillips, eds.), American Society for Microbiology, Washington, D.C. (1994), or in Thomas D. Brock, Biotechnology: A Textbook of Industrial Microbiology, Second Edition, Sinauer Associates, Inc., Sunderland, Mass. (1989).

[0251] Yeast transformations--S. cerevisiae.

[0252] S. cerevisiae strains were transformed by the lithium acetate method (Gietz et al., Nucleic Acids Res. 27:69-74 (1992)). Cells from 50 mL YPD cultures (YPGal for valine auxotrophs) were collected by centrifugation (2700 rcf, 2 minutes, 25 °C) once the cultures reached an OD600 of 1.0. The cells were washed with 50 mL sterile water and collected by centrifugation at 2700 rcf for 2 minutes at 25 °C. The cells were washed again with 25 mL sterile water and collected by centrifugation at 2700 rcf for 2 minutes at 25 °C. The cells were resuspended in 1 mL of 100 mM lithium acetate and transferred to a 1.5 mL Eppendorf tube. The cells were collected by centrifugation for 20 sec at 18,000 rcf, 25 °C, and resuspended in a volume of 100 mM lithium acetate approximately 4 times the volume of the cell pellet. A mixture of DNA (final volume of 15 µl with sterile water), 72 µl 50% PEG, 10 µl 1 M lithium acetate, and 3 µl denatured salmon sperm DNA was prepared for each transformation. In a 1.5 mL tube, 15 µl of the cell suspension was added to the DNA mixture (85 µl), and the transformation suspension was vortexed with 5 short pulses. The transformation was incubated for 30 minutes at 30 °C, followed by 22 minutes at 42 °C. The cells were collected by centrifugation for 20 sec at 18,000 rcf, 25 °C, resuspended in 100 µl SOS (1 M sorbitol, 34% (v/v) YP (1% yeast extract, 2% peptone), 6.5 mM CaCl2) or 100 µl YP (1% yeast extract, 2% peptone), and spread over an appropriate selective plate.

[0253] Yeast Transformations--K. lactis.

[0254] K. lactis cells were transformed according to a slightly modified version of the protocol described by Kooistra et al., Yeast 21: 781-792 (2004). Saturated overnight cultures of K. lactis cells were diluted 1:50 into 100 mL YPD, placed in a 30 °C shaker (250 rpm), and grown for 4-5 hours until the culture reached an OD600 of 0.3-0.5. Cells were collected by centrifugation (2 min, 3000 × g) and washed with 50 mL cold sterile EB (electroporation buffer; 10 mM Tris-HCl, pH 7.5, 270 mM sucrose, 1 mM MgCl2) at 4 °C. Cells were resuspended in 50 mL YPD containing 25 mM DTT and 20 mM HEPES, pH 8.0. Cells were transferred back into the flasks used to grow them and incubated in a 30 °C incubator (without shaking) for 30 minutes. Cells were then collected by centrifugation (2 minutes, 3000 × g) and washed with 10 mL ice-cold sterile EB, as above. Cells were then resuspended in one cell-pellet volume of ice-cold sterile EB. Sixty microliters of cells were mixed with plasmid DNA and incubated on ice for 15 minutes. For targeted integrations, or transformation of linear DNA, approximately 200-400 ng of non-specific, short (50-500 bp) linear DNA fragments were added to 300-400 ng of the linearized integrating DNA construct. This carrier DNA was provided either by gel-purified AluI-digested salmon sperm DNA or by a mixture of annealed primers 35+36 (yielding an ~85 bp linear duplex fragment). Cells were transferred to a chilled electroporation cuvette (2 mm gap) and pulsed using a BioRad Gene Pulser at 1 kV, 400 Ω, and 25 µF. The cell suspension was immediately transferred to a 14 mL round-bottom Falcon tube containing 1 mL room-temperature YPD and incubated vertically at 30 °C, 225 RPM, for 6-18 h. Cells were collected in a 1.7 mL tube by centrifugation for 10 seconds at maximum speed and resuspended in 150 µL YPD before being spread onto appropriate selection plates.
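The pulse settings translate into a field strength across the 2 mm cuvette and a nominal RC time constant. A minimal sketch, assuming that the 400 Ω setting is a resistor placed in parallel with the sample, so the nominal τ = R·C (about 10 ms here) is an upper bound that the sample's own resistance (a hypothetical parameter below) shortens:

```python
"""Field strength and RC time constant for the K. lactis electroporation pulse."""
from typing import Optional


def field_strength_kv_per_cm(voltage_kv: float, gap_mm: float) -> float:
    """E = V / d, with the cuvette gap converted from mm to cm."""
    return voltage_kv / (gap_mm / 10.0)


def time_constant_ms(r_ohm: float, c_uf: float, sample_ohm: Optional[float] = None) -> float:
    """tau = R * C; a sample resistance in parallel with R shortens the pulse."""
    if sample_ohm is not None:
        r_ohm = r_ohm * sample_ohm / (r_ohm + sample_ohm)
    return r_ohm * c_uf * 1e-6 * 1e3  # seconds -> milliseconds


print(field_strength_kv_per_cm(1.0, 2.0))  # 1 kV across a 2 mm gap: about 5 kV/cm
print(time_constant_ms(400, 25))           # nominal upper bound, about 10 ms
```

Comparing the time constant the instrument reports after each pulse with this nominal value is one way to spot cell suspensions with excess salt.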

[0255] Yeast Colony PCR with FailSafe™ PCR System (EPICENTRE® Biotechnologies, Madison, Wis.; Catalog #FS99250).

[0256] Cells from each colony were added to 20 µl of colony PCR mix (each reaction mix contains 6.8 µl water, 1.5 µl of each primer, 0.2 µl of FailSafe PCR Enzyme Mix, and 10 µl 2× FailSafe Master Mix). Unless otherwise noted, 2× FailSafe Master Mix E was used. The PCR reactions were incubated in a thermocycler using the following touchdown PCR conditions: 1 cycle of 94 °C × 2 min; 10 cycles of 94 °C × 20 s, 63-54 °C × 20 s (decreasing 1 °C per cycle), 72 °C × 60 s; 40 cycles of 94 °C × 20 s, 53 °C × 20 s, 72 °C × 60 s; and 1 cycle of 72 °C × 5 min.
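Written out explicitly, the touchdown program lowers the annealing temperature by 1 °C in each of the first 10 cycles (63 °C down to 54 °C) before 40 cycles at 53 °C. The sketch below (helper names are hypothetical) expands that schedule and sums the programmed hold times; it ignores ramp rates, so the real run time on a given thermocycler will be longer:

```python
"""Expand the touchdown colony-PCR program into per-cycle annealing temperatures."""


def annealing_schedule(start_c=63, end_c=54, step_c=1, plateau_c=53, plateau_cycles=40):
    """Touchdown cycles from start_c down to end_c, then plateau_cycles at plateau_c."""
    touchdown = list(range(start_c, end_c - 1, -step_c))  # 63, 62, ..., 54
    return touchdown + [plateau_c] * plateau_cycles


def hold_time_s(n_cycles, denature_s=20, anneal_s=20, extend_s=60,
                initial_s=120, final_s=300):
    """Total programmed hold time, ignoring ramp times between temperatures."""
    return initial_s + n_cycles * (denature_s + anneal_s + extend_s) + final_s


temps = annealing_schedule()
print(len(temps), temps[:3], temps[9], temps[10])  # 50 [63, 62, 61] 54 53
print(hold_time_s(len(temps)) / 60)                # about 90 minutes
```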

[0257] Zymoclean Gel DNA Recovery Kit (Zymo Research, Orange, Calif.; Catalog #D4002) Protocol:

[0258] DNA fragments were recovered from agarose gels according to manufacturer's protocol.

[0259] Zymo Research DNA Clean and Concentrator Kit (Zymo Research, Orange, Calif.; Catalog #D4004) Protocol:

[0260] DNA fragments were purified according to manufacturer's protocol.

[0261] Preparation of Cell Lysates for In Vitro Enzyme Assays.

[0262] To grow cultures for cell lysates, triplicate independent cultures of the desired strain were grown overnight in 3 mL of the appropriate medium at 30 °C, 250 rpm. The following day, the overnight cultures were diluted into 50 mL fresh medium in 250 mL baffle-bottomed Erlenmeyer flasks and incubated at 30 °C at 250 rpm. Cells were grown for at least 4 generations, and the cultures were harvested in mid-log phase (OD600 of 1-3). The cells of each culture were collected by centrifugation (2700 × g, 5 min, 4 °C). The cell pellets were washed by resuspending in 20 mL of ice-cold water and centrifuged at 2700 × g, 4 °C for 5 min. All supernatant was removed from each tube, and the tubes were frozen at -80 °C until use.
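The "at least 4 generations" requirement fixes how dilute the starting culture must be: four doublings are a 2^4 = 16-fold increase in OD600. A minimal sketch, assuming exponential growth throughout and a hypothetical overnight OD600 (the method does not give one):

```python
"""Relate inoculum density to the number of generations before harvest."""
import math


def max_starting_od(harvest_od: float, generations: int = 4) -> float:
    """Highest starting OD600 that still allows the required number of doublings."""
    return harvest_od / 2 ** generations


def generations_elapsed(start_od: float, end_od: float) -> float:
    """Number of doublings between two OD600 values (exponential growth assumed)."""
    return math.log2(end_od / start_od)


def inoculum_ml(overnight_od: float, start_od: float, final_ml: float = 50.0) -> float:
    """Volume of overnight culture giving start_od in final_ml (C1V1 = C2V2)."""
    return start_od * final_ml / overnight_od


start = max_starting_od(2.0)                   # harvest target OD600 = 2
print(start, generations_elapsed(start, 2.0))  # 0.125 4.0
print(inoculum_ml(overnight_od=5.0, start_od=start))  # hypothetical overnight OD
```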

[0263] Lysates were prepared by thawing each cell pellet on ice and preparing a 20% (w/v) cell suspension in lysis buffer. The lysis buffer varied with the enzyme assay: 0.1 M Tris-HCl pH 8.0, 5 mM MgSO4 for DHAD assays; 50 mM potassium phosphate buffer pH 6.0, 1 mM MgSO4 for ALS assays; 250 mM KPO4 pH 7.5, 10 mM MgCl2 for KARI assays; and 50 mM NaHPO4, 5 mM MgCl2 for KIVD assays. Yeast/Fungal Protease Arrest solution (G Biosciences, catalog #788-333) was added at 10 µL per 1 mL of lysis buffer. A volume of 800 µL of cell suspension was added to 1 mL of 0.5 mm glass beads in a chilled 1.5 mL tube. Cells were lysed by bead beating (6 rounds, 1 minute per round, 30 beats per second) with 2 minutes of chilling on ice between rounds. The tubes were then centrifuged (20,000 × g, 15 min) to pellet debris, and the supernatants (cell lysates) were retained in fresh tubes on ice. The protein concentration of each lysate was measured using the BioRad Bradford protein assay reagent (BioRad, Hercules, Calif.) according to the manufacturer's instructions.
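A 20% (w/v) suspension contains 0.2 g of cells per mL of final suspension. A minimal sketch of the volume arithmetic, assuming (hypothetically) a pellet density of about 1 g/mL, so that the pellet itself contributes roughly its mass in mL to the final volume, and adding Protease Arrest at the stated 10 µL per mL of lysis buffer:

```python
"""Volumes for a 20% (w/v) cell suspension in lysis buffer."""


def lysis_volumes(pellet_g: float, wv_percent: float = 20.0,
                  pellet_density_g_per_ml: float = 1.0):
    """Return (total suspension mL, lysis buffer mL, Protease Arrest uL)."""
    total_ml = pellet_g / (wv_percent / 100.0)                 # 0.2 g per mL of suspension
    buffer_ml = total_ml - pellet_g / pellet_density_g_per_ml  # assumed pellet density
    protease_ul = 10.0 * buffer_ml                             # 10 uL per 1 mL lysis buffer
    return total_ml, buffer_ml, protease_ul


total, buffer, protease = lysis_volumes(0.25)
print(total, buffer, protease)
print(total >= 0.8)  # enough for the 800 uL taken for bead beating
```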

[0264] Preparation of Fractionated Lysates from S. cerevisiae Strains for In Vitro Enzyme Assays.

[0265] To grow cultures for fractionated cell lysates, triplicate independent cultures of the desired strain were grown overnight in 3 mL of the appropriate medium at 30 °C, 250 rpm. The following day, the overnight cultures were used to inoculate 1 L cultures of each strain, which were grown in the appropriate medium at 30 °C at 250 rpm until they reached an OD600 of approximately 2. The cells were collected by centrifugation (1600 × g, 2 min), and the culture medium was decanted. The cell pellets were resuspended in 50 mL sterile deionized water and collected by centrifugation (1600 × g, 2 min), and the supernatant was discarded.

[0266] To obtain spheroplasts, the cell pellets were resuspended in 0.1 M Tris-SO4, pH 9.3, to a final concentration of 0.1 g/mL, and DTT was added to a final concentration of 10 mM. Cells were incubated with gentle agitation (60 rev/min) on an orbital shaker for 20 min at 30 °C, and the cells were then collected by centrifugation (1600 × g, 2 min) and the supernatant discarded. Each cell pellet was resuspended in spheroplasting buffer, consisting of (final concentrations) 1.2 M sorbitol (Amresco, catalog #0691) and 20 mM potassium phosphate pH 7.4, and then collected by centrifugation (1600 × g, 10 min). Each cell pellet was resuspended in spheroplasting buffer to a final concentration of 0.1 g cells/mL in a 500 mL centrifuge bottle, and 50 mg of Zymolyase 20T (Seikagaku Biobusiness, Code #120491) was added to each cell suspension. The suspensions were incubated overnight (approximately 16 hrs) at 30 °C with gentle agitation (60 rev/min) on an orbital shaker. The efficacy of spheroplasting was assessed by diluting an aliquot of each cell suspension 1:10 in either sterile water or spheroplasting buffer and comparing the aliquots microscopically (at 40× magnification). In all cases, >90% of the water-diluted cells lysed, indicating efficient spheroplasting. The spheroplasts were centrifuged (3000 × g, 10 min, 20 °C), and the supernatant was discarded. Each cell pellet was resuspended in 50 mL spheroplasting buffer without Zymolyase, and cells were collected by centrifugation (3000 × g, 10 min, 20 °C).

[0267] To fractionate spheroplasts, the cells were resuspended to a final concentration of 0.5 g/mL in ice-cold mitochondrial isolation buffer (MIB), consisting of (final concentrations) 0.6 M D-mannitol (BD Difco Cat #217020) and 20 mM HEPES-KOH, pH 7.4. For each 1 mL of the resulting cell suspension, 0.01 mL of Yeast/Fungal Protease Arrest solution (G Biosciences, catalog #788-333) was added. The cell suspension was subjected to 35 strokes of a Dounce homogenizer with the B (tight) pestle, and the resulting homogenate was centrifuged (2500 × g, 10 min, 4 °C) to pellet cell debris, unbroken cells, and spheroplasts. Following centrifugation, 2 mL of each sample (1 mL for the pGV1900-transformed cells) was saved in a 2 mL centrifuge tube on ice and designated the "W" (whole cell extract) fraction, while the remaining supernatant was transferred to a clean, ice-cold 35 mL Oak Ridge screw-cap tube and centrifuged (12,000 × g, 20 min, 4 °C) to pellet mitochondria and other organellar structures. Following centrifugation, 5 mL of each resulting supernatant was transferred to a clean tube on ice, taking care to avoid the small, loose pellet, and labeled the "S" (soluble cytosol) fraction. The resulting pellets were resuspended in MIB containing Protease Arrest solution and labeled the "P" (pellet) fractions. The BioRad Protein Assay reagent (BioRad, Hercules, Calif.) was used according to the manufacturer's instructions to determine the protein concentration of each fraction.
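Measuring protein in each fraction is what allows activities in W, S, and P to be compared on a per-milligram basis. One common way to summarize such data, sketched here with hypothetical numbers and helper names (the metric itself is an assumption, not taken from the method), is the specific activity of each fraction and its enrichment relative to the whole-cell extract; a cytosolic enzyme would be expected to be enriched in S and depleted in P:

```python
"""Specific activity and enrichment for W / S / P subcellular fractions."""
from dataclasses import dataclass


@dataclass
class Fraction:
    name: str
    protein_mg_per_ml: float   # from the Bradford assay
    activity_per_ml: float     # volumetric activity from an enzyme assay (any unit)

    @property
    def specific_activity(self) -> float:
        return self.activity_per_ml / self.protein_mg_per_ml


def enrichment(fraction: Fraction, whole: Fraction) -> float:
    """Specific activity of a fraction relative to the whole-cell extract."""
    return fraction.specific_activity / whole.specific_activity


# Hypothetical measurements for one strain and one enzyme.
w = Fraction("W", protein_mg_per_ml=5.0, activity_per_ml=10.0)
s = Fraction("S", protein_mg_per_ml=4.0, activity_per_ml=12.0)
p = Fraction("P", protein_mg_per_ml=2.0, activity_per_ml=1.0)
for f in (s, p):
    print(f.name, round(enrichment(f, w), 2))  # S 1.5, then P 0.25
```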

[0268] Preparation of Fractionated Lysates from K. lactis Strains for In Vitro Enzyme Assays.

[0269] Cultures (20 mL YPD) were inoculated with yeast cells (GEVO1742 and GEVO1829) and incubated at 30 °C with shaking at 250 RPM until they reached late-log to stationary phase (OD600 of approximately 10). Cells from the 20 mL cultures were used to inoculate a 250 mL YPD culture at an OD600 of approximately 0.2. The cultures were incubated at 30 °C with shaking at 250 RPM until they reached mid-log phase (OD600 ~2).

[0270] To prepare spheroplasts, the cells were collected in 500 mL bottles by centrifugation at 5000 × g for 5 minutes at room temperature. The pellets were resuspended in 8 mL Spheroplasting Buffer A (25 mM potassium phosphate (pH 7.5), 1 mM MgCl2, 1 mM EDTA, 1.25 mM TPP, 1 mM DTT) without sorbitol and transferred to pre-weighed 50 mL tubes. The cells were collected at 1600 × g for 5 minutes at room temperature and resuspended in 8 mL of Spheroplasting Buffer A with 2.5 M sorbitol (Amresco Code #0691) and protease inhibitor (G Biosciences Yeast/Fungal ProteaseArrest™, Catalog #788-333). Approximately 5 mg of Zymolyase 20T (Seikagaku Biobusiness Code #120491) was added to each cell suspension. The suspensions were incubated at 30 °C with gentle agitation (e.g., 50 RPM), with the tube on its side for good mixing, for 1-2 hours. The efficiency of spheroplast formation was verified by diluting the spheroplast suspension 1:10 into Spheroplasting Buffer A with 2.5 M sorbitol and 1:10 into water. Spheroplasts should remain intact when diluted into the buffer but appear fuzzy or disappear completely when diluted into water. The spheroplasts were collected at 1600 × g for 7 minutes at 4 °C, gently washed with 2 mL of Spheroplasting Buffer A with 2.5 M sorbitol and protease inhibitor, and collected again at 1600 × g for 7 minutes at 4 °C. The spheroplasts were resuspended in 2 mL of Spheroplasting Buffer A with 2.5 M sorbitol and protease inhibitor.

[0271] To fractionate the spheroplasts, 8 mL of Spheroplasting Buffer A with 0.2 M sorbitol and protease inhibitor was slowly added to the cell suspension, bringing the final concentration of sorbitol to 0.66 M. The spheroplasts were broken with 10 strokes of a B (tight-fitting) pestle in a 15 mL Dounce homogenizer (Bellco Glass, Inc. Cat #1984-10015) on ice. The homogenate was transferred to a 50 mL tube, and the cell debris was collected by centrifugation at 4 °C for 10 minutes at 1600 × g. The supernatant was transferred to a 15 mL tube with a pipette; this supernatant is the "W" fraction. 5 mL of this "W" fraction was transferred to a 35 mL Oak Ridge tube and centrifuged at 48,000 × g for 20 minutes at 4 °C. The resulting supernatant was transferred to a 15 mL tube and labeled "S." The pellet was resuspended in 5 mL of Spheroplasting Buffer A with 0.66 M sorbitol and protease inhibitor and labeled "P." All fractions were kept on ice while in use. The BioRad Protein Assay reagent (BioRad, Hercules, Calif.) was used according to the manufacturer's instructions to determine the protein concentration of each fraction.
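The 0.66 M figure follows from volume-weighted mixing: 2 mL of spheroplasts in buffer with 2.5 M sorbitol plus 8 mL of buffer with 0.2 M sorbitol gives (2 × 2.5 + 8 × 0.2) / 10 = 0.66 M. A small sketch of that check, treating the 2 mL spheroplast suspension as if it were entirely 2.5 M buffer (an approximation that ignores the volume occupied by the cells):

```python
"""Final solute concentration after mixing solutions (volume-weighted average)."""


def mixed_concentration(parts):
    """parts: iterable of (volume, concentration) pairs in consistent units."""
    parts = list(parts)
    total_volume = sum(v for v, _ in parts)
    return sum(v * c for v, c in parts) / total_volume


final_m = mixed_concentration([(2.0, 2.5), (8.0, 0.2)])  # (mL, M sorbitol)
print(round(final_m, 2))  # 0.66
```

Lowering the osmotic support gradually in this way, instead of diluting into water, is what lets the Dounce homogenizer break the plasma membrane while organelles remain largely intact.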

[0272] ALS Assay.

[0273] Cell lysates were prepared and protein concentrations were determined as described above. The colorimetric ALS assay (FAD-independent) performed here was based on the assay described in Hugenholtz, J. and Starrenburg, J. C., Appl. Microbiol. Biotechnol. (1992) 38:17-22. Reaction buffer was prepared by mixing 900 µl 1 M potassium phosphate buffer pH 6.0, 180 µl 100 mM MgSO4, 180 µl 100 mM TPP, 3.96 ml 500 mM pyruvate, and 12.78 ml water. For the no-substrate control, the pyruvate was replaced with an equal volume of water. Lysates were prepared at a final protein concentration of 2 µg/µl in Spheroplasting Buffer A with 0.66 M sorbitol. To 900 µL ALS buffer, 100 µL of lysate was added, and the mixture was incubated at 30 °C for 30 min. Acetoin standards were also prepared at concentrations of 2 mM, 1 mM, 0.5 mM, and 0 mM. From each sample and standard, 175 µL was transferred to a fresh 1.5 mL tube. To each sample and standard, 25 µL 35% (v/v) H2SO4 was added, and all were incubated at 37 °C for 30 min. After the incubation, the following were added in order to each standard and sample, with vortex mixing between additions: 50 µL 50% (w/v) NaOH, 50 µL 0.5% creatine, and 50 µL 5% 1-naphthol (in 2.5 N NaOH). The samples and standards were incubated at room temperature for 1 hour and mixed by vortexing every 15 minutes. Each sample or standard (100 µL) was transferred to a 96-well, half-area, UV-Star, transparent, flat-bottom plate (Catalog #675801, Greiner Bio One, Frickenhausen, Germany), and absorbance was measured at 530 nm in a plate reader.
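The reaction-buffer recipe can be checked by converting each stock into its final concentration: the five volumes sum to 18.0 mL, giving 50 mM potassium phosphate, 1 mM MgSO4, 1 mM TPP, and 110 mM pyruvate, which the 900 µL + 100 µL assay then dilutes a further 0.9-fold. A minimal sketch of that arithmetic (the dictionary layout is only illustrative, and it ignores any buffer components contributed by the lysate itself):

```python
"""Final concentrations for the ALS reaction buffer and the assembled assay."""

# component: (stock concentration in mM, volume added in mL), from the recipe
STOCKS = {
    "potassium phosphate pH 6.0": (1000.0, 0.90),
    "MgSO4": (100.0, 0.18),
    "TPP": (100.0, 0.18),
    "pyruvate": (500.0, 3.96),
}
WATER_ML = 12.78


def final_mM(stocks, water_ml, assay_fraction=1.0):
    """C_final = C_stock * V_stock / V_total, optionally diluted again in the assay."""
    total_ml = sum(v for _, v in stocks.values()) + water_ml
    return {name: c * v / total_ml * assay_fraction for name, (c, v) in stocks.items()}


buffer_mM = final_mM(STOCKS, WATER_ML)
assay_mM = final_mM(STOCKS, WATER_ML, assay_fraction=900 / 1000)  # 900 uL buffer + 100 uL lysate
for name in STOCKS:
    print(f"{name}: {buffer_mM[name]:.0f} mM in buffer, {assay_mM[name]:.1f} mM in assay")
```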

[0274] KARI Assay.

[0275] Cell lysates were prepared and protein concentrations were determined as described above. Acetolactate substrate was made by mixing 50 µl of ethyl 2-acetoxy-2-methylacetoacetate with 990 µl of water. Then 2 N NaOH was added in 10 µl increments, with vortex mixing between additions, until a total of 260 µl of NaOH had been added. The acetolactate was agitated at room temperature for 20 min and then held on ice. NADPH was prepared in 0.01 N NaOH (to improve stability) at a concentration of 50 mM. The actual concentration was determined by reading the absorbance of a diluted sample at 340 nm in a spectrophotometer and using the molar extinction coefficient of 6.22 mM⁻¹ cm⁻¹ (6,220 M⁻¹ cm⁻¹) (the OD340 of a 100 µM solution of NAD(P)H should be 0.622 in a 1 cm pathlength). Three buffers were prepared and held on ice. Reaction buffer contained 250 mM KPO4 pH 7.5, 10 mM MgCl2, 1 mM DTT, 10 mM acetolactate, and 0.2 mM NADPH. No-substrate buffer contained everything except the acetolactate. No-NAD(P)H buffer contained everything except the NADPH. Reactions were performed in triplicate using 10 µl of cell extract with 90 µl of reaction buffer in a 96-well plate in a SpectraMax 340PC multi-plate reader (Molecular Devices, Sunnyvale, Calif.). The reaction was followed at 340 nm by measuring a kinetic curve for 5 minutes, with OD readings taken every 10 seconds. The reactions were performed at 30 °C in complete, no-substrate, and no-NAD(P)H buffers. The Vmax for each extract was determined after subtracting the background reading of the no-substrate control from the reading in complete buffer.
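Both the NADPH stock check and the conversion of a background-corrected A340 slope into an activity follow the Beer-Lambert law, A = ε·c·l, with ε340 = 6.22 mM⁻¹ cm⁻¹ for NAD(P)H. A minimal sketch, assuming a 1 cm cuvette for the stock check; the effective pathlength of a 100 µL well depends on the plate and reader and is left as an explicit, hypothetical parameter, as are the example slopes and protein amount:

```python
"""Beer-Lambert arithmetic for NAD(P)H-linked assays at 340 nm."""

EPS_340 = 6.22  # NAD(P)H extinction coefficient, mM^-1 cm^-1


def stock_mM(a340: float, dilution_factor: float, path_cm: float = 1.0) -> float:
    """Concentration of an undiluted stock from the A340 of a diluted aliquot."""
    return a340 / (EPS_340 * path_cm) * dilution_factor


def specific_activity(slope_complete: float, slope_no_substrate: float,
                      path_cm: float, well_ml: float, protein_mg: float) -> float:
    """umol NAD(P)H oxidized per min per mg protein (U/mg).

    Slopes are A340 per minute; both are negative while NAD(P)H is consumed,
    so the no-substrate background is subtracted before conversion.
    """
    net = slope_no_substrate - slope_complete   # positive net decrease
    mM_per_min = net / (EPS_340 * path_cm)      # mM equals umol per mL
    return mM_per_min * well_ml / protein_mg


print(stock_mM(0.622, 500))  # a 1:500 dilution reading 0.622 implies a ~50 mM stock
print(specific_activity(-0.0722, -0.01, path_cm=1.0, well_ml=0.1, protein_mg=0.01))
```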

[0276] DHAD Assay.

[0277] Cell lysates were prepared and protein concentrations were determined as described above. The DHAD activity of each lysate was determined as follows. In a fresh 1.5 mL centrifuge tube, 50 µL of each lysate was mixed with 50 µL of 0.1 M 2,3-dihydroxyisovalerate (DHIV), 25 µL of 0.1 M MgSO4, and 375 µL of 0.05 M Tris-HCl pH 8.0, and the mixture was incubated for 30 min at 35 °C. Each tube was then heated to 95 °C for 5 min to inactivate enzymatic activity, and the solution was centrifuged (16,000 × g for 5 min) to pellet insoluble debris. To prepare samples for analysis, 100 µL of each reaction was mixed with 100 µL of a solution consisting of 4 parts 15 mM dinitrophenylhydrazine (DNPH) in acetonitrile and 1 part 50 mM citric acid, pH 3.0, and the mixture was heated to 70 °C for 30 min in a thermocycler. The solution was then analyzed by HPLC as described above in General Methods to quantify the ketoisovalerate (KIV) present in the sample.
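Converting the HPLC result into a specific activity requires undoing the 1:1 dilution introduced by derivatization and then scaling by reaction volume, incubation time, and protein. A minimal sketch, assuming the HPLC calibration reports KIV concentration in the derivatized mixture and that any no-substrate background has already been subtracted; the example inputs are hypothetical. (For reference, the assay's final DHIV concentration is 0.1 M × 50/500 = 10 mM.)

```python
"""DHAD specific activity from KIV measured by HPLC after DNPH derivatization."""


def dhad_specific_activity(kiv_hplc_uM: float, lysate_mg_per_ml: float,
                           derivatization_dilution: float = 2.0,  # 100 uL + 100 uL
                           reaction_ul: float = 500.0,
                           lysate_ul: float = 50.0,
                           minutes: float = 30.0) -> float:
    """Return nmol KIV formed per minute per mg of lysate protein."""
    kiv_reaction_uM = kiv_hplc_uM * derivatization_dilution
    kiv_nmol = kiv_reaction_uM * reaction_ul / 1000.0   # uM * uL = pmol; /1000 -> nmol
    protein_mg = lysate_mg_per_ml * lysate_ul / 1000.0
    return kiv_nmol / minutes / protein_mg


print(dhad_specific_activity(kiv_hplc_uM=300.0, lysate_mg_per_ml=2.0))  # about 100 nmol/min/mg
```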

[0278] KIVD Assay.

[0279] Cell lysates were prepared and protein concentrations were determined as described above. KIVD assay buffer, containing 1 Roche Protease Inhibitor tablet per 5 mL buffer, was added to each cell pellet to create a 20% (w/v) cell suspension. The KIVD assay buffer was prepared at a final concentration of 0.05 M NaHPO4·H2O, 5 mM MgCl2·8H2O, and 1.5 mM thiamin pyrophosphate chloride. The reaction substrate, α-ketoisovalerate (3-methyl-2-oxobutanoic acid, Acros Organics), was added at 30 mM where appropriate. Lysates were diluted in reaction buffer to a final protein concentration of 0.1 µg/µL. In 1.5 mL tubes, 50 µL of lysate (5 µg of protein) was mixed with 200 µL of reaction buffer with or without substrate. The reactions were incubated at 37 °C for 20 minutes and then immediately filtered through a 2 µm filter plate. The filtered samples were diluted 1:10 in water, and 100 µL of each 1:10 dilution was mixed with 100 µL of derivatization reagent in 0.2 mL thin-wall PCR tubes. Derivatization reagent was prepared by mixing 4 mL of 15 mM 2,4-dinitrophenylhydrazine (DNPH) in HPLC-grade acetonitrile with 1 mL of 50 mM citric acid buffer, pH 3. The samples were incubated at 70 °C for 30 minutes and analyzed by HPLC.

[0280] ADH Assay.

[0281] Cell lysates were prepared and protein concentrations were determined as described above. Assays (set up in triplicate for each lysate) contained 10 µL of each lysate (or an appropriate dilution of each lysate) plus 90 µL of reaction buffer, which consisted of (final concentrations in 1× reaction buffer): 0.1 M Tris-HCl pH 7.5, 10 mM MgCl2, 1 mM DTT, 0.2 mM NADH (or NADPH, where indicated; each diluted from a spectrophotometrically confirmed 4.4 mM stock), and 11 mM isobutyraldehyde. Where indicated, a parallel set of control reactions was set up using reaction buffer lacking isobutyraldehyde and/or NAD(P)H. For experiments measuring the acetaldehyde-dependent oxidation of NAD(P)H, reaction buffer was prepared in which acetaldehyde was substituted for isobutyraldehyde. In these cases, the reaction buffer contained at least 11 mM acetaldehyde, although the exact amount is an estimate because of the inherent difficulty of pipetting acetaldehyde solution accurately. Finally, in some cases a parallel set of reactions lacking yeast cell lysate was included as a negative control. After the reactions were added (using a multichannel pipette) to the wells of a 96-well plate, the plate was immediately placed into a plate reader pre-warmed to 30 °C, and the absorbance at 340 nm was measured every 12 seconds over a period of 300 seconds. Kinetic parameters were computed from assays with linear slopes (where necessary, assays were repeated with appropriate dilutions to obtain linear NAD(P)H consumption curves).

Composition of Culture Media

[0282] Drugs: When indicated, G418 (Calbiochem, Gibbstown, N.J.) was added at 0.2 g/L, Phleomycin (InvivoGen, San Diego, Calif.) was added at 7.5 mg/L, Hygromycin (InvivoGen, San Diego, Calif.) was added at 0.2 g/L, and 5-fluoro-orotic acid (FOA; Toronto Research Chemicals, North York, Ontario, Canada) was added at 1 g/L.

[0283] YP: 1% (w/v) yeast extract, 2% (w/v) peptone.

[0284] YPD: YP containing 2% (w/v) glucose unless otherwise noted.

[0285] YPGal: YP containing 2% (w/v) galactose.

[0286] YPE: YP containing 2% (w/v) ethanol.

[0287] SC media: 6.7 g/L Difco™ Yeast Nitrogen Base, 14 g/L Sigma™ Synthetic Dropout Media supplement (includes amino acids and nutrients excluding histidine, tryptophan, uracil, and leucine; Sigma-Aldrich, St. Louis, Mo.), 0.076 g/L histidine, 0.076 g/L tryptophan, 0.380 g/L leucine, and 0.076 g/L uracil. Drop-out versions of SC media are made by omitting one or more of histidine (H), tryptophan (W), leucine (L), or uracil (U or Ura). When indicated, SC media are supplemented with additional isoleucine (9xI; 0.684 g/L), valine (9xV; 0.684 g/L), or both isoleucine and valine (9xIV). SCD is SC containing 2% (w/v) glucose unless otherwise noted, SCGal is SC containing 2% (w/v) galactose, and SCE is SC containing 2% (w/v) ethanol. For example, SCD-Ura+9xIV would be composed of 6.7 g/L Difco™ Yeast Nitrogen Base, 14 g/L Sigma™ Synthetic Dropout Media supplement (includes amino acids and nutrients excluding histidine, tryptophan, uracil, and leucine), 0.076 g/L histidine, 0.076 g/L tryptophan, 0.380 g/L leucine, 0.684 g/L isoleucine, 0.684 g/L valine, and 20 g/L glucose.
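The SC naming convention is compositional: start from the base, add back each of histidine, tryptophan, leucine, and uracil unless it is dropped out, add 9× isoleucine and/or valine (9 × 0.076 = 0.684 g/L) when indicated, and add 20 g/L of the carbon source. A minimal sketch of that rule (function and key names are hypothetical), which reproduces the SCD-Ura+9xIV example:

```python
"""Build SC medium recipes (g/L) from the drop-out/supplement naming convention."""

BASE = {"Yeast Nitrogen Base": 6.7, "Synthetic Dropout supplement (-H -W -L -U)": 14.0}
ADD_BACKS = {"H": ("histidine", 0.076), "W": ("tryptophan", 0.076),
             "L": ("leucine", 0.380), "U": ("uracil", 0.076)}
EXTRA = {"I": "isoleucine", "V": "valine"}
CARBON = {"D": "glucose", "Gal": "galactose", "E": "ethanol"}


def sc_recipe(carbon="D", dropouts=(), nine_x=""):
    """carbon: 'D', 'Gal' or 'E'; dropouts: codes from ADD_BACKS; nine_x: 'I', 'V' or 'IV'."""
    recipe = dict(BASE)
    for code, (name, grams) in ADD_BACKS.items():
        if code not in dropouts:
            recipe[name] = grams
    for code in nine_x:
        recipe[EXTRA[code]] = round(9 * 0.076, 3)   # 0.684 g/L
    recipe[CARBON[carbon]] = 20.0                   # 2% (w/v)
    return recipe


for component, grams in sc_recipe("D", dropouts=("U",), nine_x="IV").items():
    print(f"{grams:g} g/L {component}")   # SCD-Ura+9xIV
```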

[0288] SCD-V+9xI: 6.7 g/L Difco™ Yeast Nitrogen Base, 0.076 g/L adenine hemisulfate, 0.076 g/L alanine, 0.076 g/L arginine hydrochloride, 0.076 g/L asparagine monohydrate, 0.076 g/L aspartic acid, 0.076 g/L cysteine hydrochloride monohydrate, 0.076 g/L glutamic acid monosodium salt, 0.076 g/L glutamine, 0.076 g/L glycine, 0.076 g/L myo-inositol, 0.76 g/L isoleucine, 0.076 g/L lysine monohydrochloride, 0.076 g/L methionine, 0.008 g/L p-aminobenzoic acid potassium salt, 0.076 g/L phenylalanine, 0.076 g/L proline, 0.076 g/L serine, 0.076 g/L threonine, 0.076 g/L tyrosine disodium salt, and 20 g/L glucose.

[0289] YNB: 6.7 g/L Difco™ Yeast Nitrogen Base supplemented with indicated nutrients as follows: histidine (H; 0.076 g/L), tryptophan (W; 0.076 g/L), leucine (L; 0.380 g/L), uracil (U or Ura; 0.076 g/L), isoleucine (I; 0.076 g/L), valine (V; 0.076 g/L), and casamino acids (CAA; 10 g/L). When indicated, YNB media are supplemented with higher amounts of isoleucine (10xI = 0.76 g/L), valine (10xV = 0.76 g/L), or both isoleucine and valine (10xIV). YNBD is YNB containing 2% (w/v) glucose unless otherwise noted, YNBGal is YNB containing 2% (w/v) galactose, and YNBE is YNB containing 2% (w/v) ethanol. For example, YNBGal+HWLU+10xI+G418 would be composed of 6.7 g/L Difco™ Yeast Nitrogen Base, 0.076 g/L histidine, 0.076 g/L tryptophan, 0.380 g/L leucine, 0.076 g/L uracil, 0.76 g/L isoleucine, 0.2 g/L G418, and 20 g/L galactose.

[0290] Plates: Solid versions of the above described media contain 2% (w/v) agar.

Example 1

Isobutanol Pathway is Partially Cytosolic when Expressed in Yeast

[0291] The purpose of this example is to illustrate that three enzymes in the isobutanol biosynthetic pathway (acetolactate synthase, ketoisovalerate decarboxylase, and isobutanol dehydrogenase) are localized to the cytosol when expressed in yeast.

TABLE 3. Genotype of strains disclosed in Example 1.
GEVO No.: Genotype/Source
1287: K. lactis ATCC 200826 MATα uraA1 trp1 leu2 lysA1 ade1 lac4-8 [pKD1]
1742: K. lactis ATCC 200826 MATα uraA1 trp1 leu2 lysA1 ade1 lac4-8 [pKD1] pdc1::KanR
1829: K. lactis ATCC 200826 MATα uraA1 trp1 leu2 lysA1 ade1 lac4-8 [pKD1] pdc1::KanR {P_TDH3:Ec_ilvC-ΔN; P_TEF1:Ec_ilvD-ΔN (codon optimized for K. lactis):ScLEU2 integrated} {P_TEF1:Ll_kivD; P_TDH3:ScADH7:KmURA3 integrated} {P_CUP1-1:Bs_alsS, TRP1 random integrated}

TABLE 4. Plasmids disclosed in Example 1.
pGV No.: Genotype
pGV1503: ScTEF1 promoter-kanR, bla, pUC ori (GEVO)
pGV1537: KlPDC1 promoter region + Klpdc1 3' UTR sequence, ScTEF1 promoter-kanR, bla, pUC ori (GEVO)
pGV1590: TEF1 promoter:Ll-kivD (codon optimized for E. coli):TDH3 promoter:ADH7:CYC1 terminator, Km-URA3, 1.6 micron ori, bla, pUC ori (GEVO)
pGV1726: CUP1 promoter:Bs-alsS:CYC1 terminator, TRP1, bla, pUC ori
pGV1727: TEF1 promoter:Ec-ilvDΔN (codon optimized for K. lactis):TDH3 promoter:Ec-ilvCΔN:CYC1 terminator, LEU2, bla, pUC ori (GEVO)

Plasmids

[0292] pGV1503 contains an S. cerevisiae TEF1 promoter region driving a G418-resistance gene (kan.sup.R).

[0293] pGV1537 was constructed by inserting an (AatII plus MfeI)-digested PCR product containing approximately 500 bp each of the KlPDC1 5' and 3' untranslated regions into (AatII plus EcoRI)-digested pGV1503. The insert was generated by SOE-PCR. First, the KlPDC1 5' and 3' untranslated regions were amplified from K. lactis genomic DNA using primer pairs 1006+1016 and 1017+1009, respectively. Primers 1016 and 1017 were designed to have overlapping sequences. The two fragments were then joined by PCR using primers 1006+1009.
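In SOE-PCR, the overlapping tails built into primers 1016 and 1017 make the end of the 5' UTR product identical to the start of the 3' UTR product, so the joined product carries the shared sequence only once. A minimal in-silico sketch of the expected junction; the sequences and the 15 nt minimum overlap are hypothetical, and it models only the final sequence, not the PCR itself:

```python
"""Predict the product of a splicing-by-overlap-extension (SOE) PCR join."""


def soe_join(upstream: str, downstream: str, min_overlap: int = 15) -> str:
    """Merge two fragments at the longest suffix/prefix overlap >= min_overlap."""
    up, down = upstream.upper(), downstream.upper()
    for size in range(min(len(up), len(down)), min_overlap - 1, -1):
        if up.endswith(down[:size]):
            return up + down[size:]
    raise ValueError(f"fragments share no overlap of at least {min_overlap} nt")


overlap = "ACGTACGTACGTACGTAC"              # hypothetical 18 nt shared tail
five_prime = "AAAACCCCGGGGTTTT" + overlap   # stands in for the 5' UTR product
three_prime = overlap + "GGGGCCCC"          # stands in for the 3' UTR product
print(soe_join(five_prime, three_prime))
```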

[0294] pGV1590 is a K. lactis plasmid for expression of the L. lactis kivD and the S. cerevisiae ADH7. Expression of the L. lactis kivD is driven by the S. cerevisiae TEF1 promoter and expression of the S. cerevisiae ADH7 is driven by the S. cerevisiae TDH3 promoter. pGV1590 was generated by cloning a SalI-NotI fragment carrying the S. cerevisiae ADH7 gene into the XhoI-NotI sites of pGV1585. The S. cerevisiae ADH7 gene fragment originated as a PCR product from S. cerevisiae genomic DNA using primers 410 and 411.
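Cloning a SalI fragment into XhoI-cut vector (pGV1590), like ligating the MfeI end of the pGV1537 insert into the EcoRI site of pGV1503, works because each pair of enzymes leaves identical single-stranded overhangs (5'-TCGA and 5'-AATT, respectively). A minimal sketch that derives overhangs from recognition sites and top-strand cut positions for the enzymes named in this section; it handles only palindromic sites, and the helper names are hypothetical:

```python
"""Check whether two restriction enzymes leave ligation-compatible ends."""

# name: (palindromic recognition site 5'->3', cut position after N bases on top strand)
ENZYMES = {
    "SalI": ("GTCGAC", 1), "XhoI": ("CTCGAG", 1),
    "EcoRI": ("GAATTC", 1), "MfeI": ("CAATTG", 1),
    "AatII": ("GACGTC", 5), "SacI": ("GAGCTC", 5),
    "NotI": ("GCGGCCGC", 2), "NgoMIV": ("GCCGGC", 1),
}


def overhang(enzyme: str) -> tuple:
    """Return (end type, overhang sequence) for a palindromic site."""
    site, cut = ENZYMES[enzyme]
    bottom_cut = len(site) - cut          # mirror position on the bottom strand
    if cut < bottom_cut:
        return ("5'", site[cut:bottom_cut])
    if cut > bottom_cut:
        return ("3'", site[bottom_cut:cut])
    return ("blunt", "")


def compatible(a: str, b: str) -> bool:
    """Palindromic overhangs are compatible when type and sequence match."""
    return overhang(a) == overhang(b)


print(overhang("SalI"), compatible("SalI", "XhoI"))              # ("5'", 'TCGA') True
print(compatible("MfeI", "EcoRI"), compatible("SalI", "NotI"))  # True False
```

After ligation, a SalI/XhoI hybrid junction (GTCGAG) is recognized by neither enzyme, and the same holds for an MfeI/EcoRI junction.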

[0295] pGV1726 is a yeast integration plasmid (utilizing the S. cerevisiae TRP1 gene as selection marker) for random integration (i.e. for K. lactis). This plasmid does not carry a yeast replication origin, thus is unable to replicate episomally. This plasmid also carries the B. subtilis alsS gene, whose expression is under the control of the S. cerevisiae CUP1 promoter. pGV1726 was generated by cloning a SacI-NgoMIV fragment carrying the S. cerevisiae CUP1 promoter, Bs-alsS ORF and the CYC1 terminator into the same sites of pGV1645. The vector, pGV1645, is a K. lactis expression plasmid that was used for expression of the B. subtilis alsS under the control of the K. lactis PDC1 promoter. This plasmid also carries the S. cerevisiae TRP1 gene as a selection marker and the 1.6 micron replication origin. Digestion of pGV1645 with SacI and NgoMIV removes the K. lactis PDC1 promoter, B. subtilis alsS, CYC1 terminator and the 1.6 micron origin of replication. The insert fragment carrying the S. cerevisiae CUP1 promoter, B. subtilis alsS ORF and the CYC1 terminator was obtained from pGV1649 via digestion with SacI and NgoMIV. The CUP1 promoter originated as a PCR product from S. cerevisiae genomic DNA using primers 637 and 638. The B. subtilis alsS originated as a PCR product from B. subtilis genomic DNA using primers 767 and 697.

[0296] pGV1727 is a yeast integration plasmid (utilizing the S. cerevisiae LEU2 gene as selection marker) for random integration (i.e., for K. lactis). This plasmid does not carry a yeast replication origin and thus cannot replicate episomally. This plasmid carries the E. coli ilvDΔN and ilvCΔN genes, whose expression is under the control of the S. cerevisiae TEF1 and TDH3 promoters, respectively. The E. coli ilvDΔN is a shortened version of E. coli ilvD in which the sequence coding for the first 24 amino acids, which encodes a putative mitochondrial targeting sequence, was removed. Likewise, the E. coli ilvCΔN is a shortened version of E. coli ilvC in which the sequence coding for the first 22 amino acids, which is predicted to function as a mitochondrial targeting sequence, was removed. pGV1727 was generated by cloning an XhoI-NgoMIV fragment carrying the E. coli ilvCΔN gene and the CYC1 terminator into the same sites of pGV1635. The vector, pGV1635, is a K. lactis expression plasmid that was used for expression of the E. coli ilvDΔN gene under the control of the S. cerevisiae TEF1 promoter. The ilvDΔN gene is followed by the TDH3 promoter, a short MCS (including an XhoI site), the CYC1 terminator and the 1.6 micron replication origin. This plasmid carries the S. cerevisiae LEU2 gene as a selection marker. Digestion of pGV1635 with XhoI and NgoMIV removes the CYC1 terminator and the 1.6 micron replication origin. This sequence was replaced by the insert fragment carrying the E. coli ilvCΔN and the CYC1 terminator, which was obtained by digesting pGV1677 with XhoI and NgoMIV. The E. coli ilvDΔN originated as a PCR product from pGV1578 (a plasmid carrying E. coli ilvD codon-optimized for K. lactis, from DNA2.0, Menlo Park, Calif.) using primers 1151 and 1152. The E. coli ilvCΔN originated as a PCR product from pGV1160 (a plasmid carrying the full-length E. coli ilvC gene) using primers 1149 and 1150. The E. coli ilvC in pGV1160 originated as a PCR product from E. coli genomic DNA using primers 387 and 388.

[0297] GEVO1287 was transformed with PmlI-digested pGV1537, yielding GEVO1742. GEVO1829 was constructed by sequentially transforming GEVO1742 with gene fragments from pGV1590, pGV1727, and pGV1726 following the standard lithium acetate protocol. First, a 7.8 kb fragment of pGV1590 generated by digestion with NgoMIV and MfeI was transformed into GEVO1742. Next, this transformant strain was transformed with pGV1727 (FIG. 4) that had been linearized by digestion with BcgI. Finally, this transformant strain was transformed with pGV1726 that had been linearized by digestion with AhdI. The final transformant was GEVO1829.

[0298] Cellular fractions were prepared from GEVO1742 and GEVO1829 as described above. For all three fractions ("W," "S," and "P"), specific activities were calculated using the protein concentration measured in the "W" fraction. Below are the results of the assays measuring isobutanol dehydrogenase, acetolactate synthase, and ketoisovalerate decarboxylase activities.

Alcohol Dehydrogenase (ADH) Assay

[0299] The results from the assay are summarized in Table 5. The "W" and "S" fractions of the pathway-carrying strain (GEVO1829) contained at least three times the NADPH-dependent alcohol dehydrogenase activity found in the same fractions of GEVO1742. The "W" and "S" fractions of GEVO1829 contained more than four times the activity present in the "P" fraction. These data indicated that S. cerevisiae Adh7 activity was predominantly localized to the cytosol.

TABLE 5 Alcohol Dehydrogenase Activity.
Sample | Specific Alcohol Dehydrogenase Activity (U/mg protein)
1742 W | 0.08 ± 0.00
1742 S | 0.07 ± 0.02
1742 P | 0.03 ± 0.012
1829 W | 0.26 ± 0.00
1829 S | 0.25 ± 0.02
1829 P | 0.04 ± 0.02

Acetolactate Synthase (ALS) Assay

[0300] The results from the assay are summarized in Table 6. The "W" and "S" fractions of the isobutanol pathway-carrying strain (GEVO1829) contained ALS activity, while no activity was detected in the same fractions of GEVO1742. The "W" and "S" fractions contained more than three times the ALS activity of the "P" fraction. These data indicated that B. subtilis ALS activity was predominantly localized to the cytosol.

TABLE 6 Acetolactate Synthase Activity.
Sample | Specific Acetolactate Synthase Activity (U/mg protein)
1742 W | 0.00 ± 0.00
1742 S | 0.00 ± 0.00
1742 P | 0.00 ± 0.00
1829 W | 0.10 ± 0.01
1829 S | 0.10 ± 0.00
1829 P | 0.03 ± 0.00

Ketoisovalerate Decarboxylase (KIVD) Assay

[0301] The results from the assay are summarized in Table 7. The "W" and "S" fractions of the isobutanol pathway-carrying strain (GEVO1829) contained 8-10 times greater activity than the same fractions of GEVO1742. Furthermore, the activity in the "S" fraction was 45× higher than that detected in the "P" fraction. These data indicated that L. lactis KIVD activity was predominantly localized in the cytosol.

TABLE 7 Ketoisovalerate Decarboxylase (KIVD) Assay.
Sample | Specific Ketoisovalerate Decarboxylase Activity (U/mg protein)
1742 W | 0.05 ± 0.00
1742 S | 0.05 ± 0.04
1742 P | 0.03 ± 0.00
1829 W | 0.38 ± 0.02
1829 S | 0.45 ± 0.04
1829 P | 0.01 ± 0.00
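The fold differences quoted in [0299]-[0301] follow from the mean values in Tables 5-7. A minimal Python sketch that recomputes them is shown below; it uses only the tabulated means, ignores the reported standard deviations, and treats a zero control value (ALS in GEVO1742) as "not detected", reported as an infinite ratio. The dictionary layout and function names are illustrative choices, not part of the original method. For example, the KIVD "S"/"P" ratio is 0.45/0.01 = 45, matching the 45× stated in [0301].

```python
# Mean specific activities (U/mg protein) transcribed from Tables 5-7
ACTIVITY = {
    "ADH": {"1742": {"W": 0.08, "S": 0.07, "P": 0.03},
            "1829": {"W": 0.26, "S": 0.25, "P": 0.04}},
    "ALS": {"1742": {"W": 0.00, "S": 0.00, "P": 0.00},
            "1829": {"W": 0.10, "S": 0.10, "P": 0.03}},
    "KIVD": {"1742": {"W": 0.05, "S": 0.05, "P": 0.03},
             "1829": {"W": 0.38, "S": 0.45, "P": 0.01}},
}


def fold(numerator: float, denominator: float) -> float:
    """Ratio of two mean activities; infinite when the denominator was not detected (0)."""
    return float("inf") if denominator == 0 else numerator / denominator


def summarize(assay: str) -> dict:
    """Cytosol vs. pellet in GEVO1829, and GEVO1829 vs. GEVO1742 for W and S fractions."""
    data = ACTIVITY[assay]
    return {
        "S_vs_P": fold(data["1829"]["S"], data["1829"]["P"]),
        "W_vs_control_W": fold(data["1829"]["W"], data["1742"]["W"]),
        "S_vs_control_S": fold(data["1829"]["S"], data["1742"]["S"]),
    }


for assay in ACTIVITY:
    print(assay, {name: round(value, 1) for name, value in summarize(assay).items()})
```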

Example 2

Construction of an ILV3 Deletion Mutant

[0302] The purpose of this example is to describe the construction of an ILV3 deletion mutant of S. cerevisiae, GEVO2244.

TABLE 8 Genotype of strains disclosed in Example 2.
GEVO No. | Genotype/Source
GEVO1147 | K. lactis, NRRL Y-1140 (obtained from USDA)
GEVO1188 | S. cerevisiae, CEN.PK (obtained from Euroscarf); MATα ura3 leu2 his3 trp1
GEVO2145 | S. cerevisiae, CEN.PK; MATα ura3 leu2 his3 trp1 ilv3::Kl_URA3
GEVO2244 | S. cerevisiae, CEN.PK; MATα ura3 leu2 his3 trp1 ilv3Δ

TABLE 9 Plasmids disclosed in Example 2.
Plasmid name | Genotype
pUC19 | bla, pUC-ori (obtained from Invitrogen)
pGV1299 | K. lactis URA3, bla, pUC-ori (GEVO)

[0303] Plasmid pGV1299 was constructed by cloning the K. lactis URA3 gene into pUC19. The K. lactis URA3 was obtained by PCR from K. lactis genomic DNA using primers 575 and 576. The PCR product was digested with EcoRI and BamHI and cloned into similarly digested pUC19. The K. lactis URA3 insert was sequenced (Laragen Inc) to confirm that the sequence was correct.

[0304] The ilv3::Kl_URA3 integration cassette contained, from 5' to 3', the following: 1) an 80 bp homology to ILV3 (positions +158 to +237) that functions as the 5' targeting sequence for the integration, 2) the K. lactis URA3 marker gene, 3) a 60 bp homology to a region of ILV3 (positions -21 to +39) that lies further upstream of the 5' targeting sequence, to facilitate loop-out of the K. lactis URA3 marker, and 4) a 221 bp homology to the 3' region of ILV3 (positions +1759 to +1979) that functions as the 3' targeting sequence for the integration. This cassette was generated by SOE-PCR. The K. lactis URA3 gene was amplified from pGV1299 using primers 1887 and 1888. The 3' region of ILV3 was first amplified from genomic DNA using primers 1623 and 1892, and this product was then used as template to amplify the 3' region of ILV3 using primers 1889 and 1890. The K. lactis URA3 and the 3' region of ILV3 were combined by SOE-PCR using primers 1886 and 1890.
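The homology-arm lengths in [0304] can be checked against their coordinates. This minimal Python sketch assumes the common gene-numbering convention in which +1 is the first nucleotide of the start codon and there is no position 0 (so -1 is immediately followed by +1); the patent does not state the convention, but under this assumption the arithmetic reproduces the stated 80, 60 and 221 bp. The function name `span_length` is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length (bp) of an inclusive interval in gene coordinates with no position 0.

    Assumes +1 is the first base of the start codon and -1 is the base
    immediately upstream of it.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the interval spans the non-existent position 0
    return length


# Homology regions of the ilv3::Kl_URA3 cassette, as listed in [0304]
arms = {
    "5' targeting": (158, 237),
    "loop-out repeat": (-21, 39),
    "3' targeting": (1759, 1979),
}
for name, (start, end) in arms.items():
    print(f"{name}: {span_length(start, end)} bp")
```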

[0305] GEVO1188 was transformed with the ilv3::KI_URA3 cassette described above and plated onto YNBD+W+CAA (-Ura) plates. Initially, eight colonies (#1-8) were patched onto YNBD+HUWLIV plates and then replica plated onto YNBD+HUWLI (-V) plates to test for valine auxotrophy. As none of these exhibited valine auxotrophy, an additional eight colonies (#9-16) were streaked out for single colonies and 3 or 4 isolates (A through C or D) from each streak were tested for valine auxotrophy. Isolates A-C from clone #12 exhibited valine auxotrophy.

[0306] These isolates were tested for correct integration by colony PCR using primer pairs 1916+1920 and 1917+1921 for the 5' and 3' junctions, respectively. Correctly sized bands were observed for clones #12A through C with primer pair 1916+1920. A correctly sized band was observed for clone #12A when FailSafe Master Mix A or C was used with primer pair 1917+1921. Clone #12A was designated GEVO2145. The valine auxotrophy of GEVO2145 was reconfirmed by streaking it onto SCD+9xIV and SCD-V+9xI plates. GEVO2145 exhibited no growth on the medium lacking valine (SCD-V+9xI), while it grew on medium supplemented with valine (SCD+9xIV). The parent strain, GEVO1188, grew on both media.

[0307] GEVO2145 was streaked onto YNBE+W+CAA+FOA to isolate strains in which the K. lactis URA3 had been excised through homologous recombination, i.e., "looped out". Five FOA-resistant clones (A-E) were tested for valine and uracil auxotrophy. All five clones were auxotrophic for both nutrients. Clone A was designated GEVO2244. Colony PCR using primers 1891 and 1892 with FailSafe Buffer C confirmed the loss of the Kl_URA3 cassette.

Example 3

DHAD Activity is Localized to Mitochondria

[0308] The purpose of this Example is to demonstrate that the DHAD activity encoded by ScILV3 is localized to the mitochondria.

TABLE 9 Genotype of strains disclosed in Example 3.
GEVO No. | Genotype/Source
GEVO2244 | S. cerevisiae, CEN.PK; MATα ura3 leu2 his3 trp1 ilv3Δ

TABLE 10 Plasmids disclosed in Example 3.
pGV No. | Genotype
pGV1106 | pUC ori, bla (AmpR), 2 micron ori, URA3, TDH3 promoter-Myc tag-polylinker-CYC1 terminator
pGV1900 | pUC ori, bla (AmpR), 2 micron ori, URA3, TEF1 promoter-ScILV3(FL)

[0309] Plasmid pGV1106 is a variant of p426GPD (described in Mumberg et al., 1995, Gene 119-122). To obtain pGV1106, annealed oligos 271 and 272 were ligated into p426GPD that had been digested with SpeI and XhoI, and the inserted DNA was confirmed by sequencing.

[0310] Plasmid pGV1900 was generated by amplifying the full-length, native ScILV3 nucleotide sequence from S. cerevisiae strain CEN.PK genomic DNA using primers 1617 and 1618. The resulting 1.76 kb fragment, which contained the complete ScILV3 coding sequence (SEQ ID NO: 88) flanked by 5' SalI and 3' BamHI restriction site sequences, was digested with SalI and BamHI and ligated into pGV1662 (described in Example 6) that had been digested with SalI and BamHI.

[0311] To measure DHAD activities present in fractionated cell extracts, GEVO2244 was transformed singly with either pGV1106, which served as an empty vector control, or with pGV1900, which is an expression plasmid for ScILV3.

[0312] An independent clonal transformant of each plasmid was isolated, and a 1 L culture of each strain was grown in SCGal-Ura+9xIV at 30°C and 250 rpm. The OD600 was noted, the cells were collected by centrifugation (1600×g, 2 min), and the culture medium was decanted. The cell pellets were resuspended in 50 mL sterile deionized water, collected by centrifugation (1600×g, 2 min), and the supernatant was discarded. The OD600 and total wet cell pellet weight of each culture are listed in Table 11, below:

TABLE 11 OD600 and pellet mass (g) of strain GEVO2244 transformed with the indicated plasmids.
Plasmid | OD600 | Pellet mass (g)
pGV1106 | 2.2 | 7.6
pGV1900 | 1.3 | 3.8

[0313] To obtain spheroplasts, the cell pellets were resuspended in 0.1 M Tris-SO4, pH 9.3, to a final concentration of 0.1 g/mL, and DTT was added to a final concentration of 10 mM. Cells were incubated with gentle agitation (60 rev/min) on an orbital shaker for 20 min at 30°C, then collected by centrifugation (1600×g, 2 min), and the supernatant was discarded. Each cell pellet was resuspended in spheroplasting buffer (final concentrations: 1.2 M sorbitol (Amresco, catalog #0691), 20 mM potassium phosphate, pH 7.4) and then collected by centrifugation (1600×g, 10 min). Each cell pellet was resuspended in spheroplasting buffer to a final concentration of 0.1 g cells/mL in a 500 mL centrifuge bottle, and 50 mg of Zymolyase 20T (Seikagaku Biobusiness, Code #120491) was added to each cell suspension. The suspensions were incubated overnight (~16 h) at 30°C with gentle agitation (60 rev/min) on an orbital shaker. The efficacy of spheroplasting was ascertained by diluting an aliquot of each cell suspension 1:10 in either sterile water or spheroplasting buffer and comparing the aliquots microscopically (40× magnification). In all cases, >90% of the water-diluted cells lysed, indicating efficient spheroplasting. The spheroplasts were centrifuged (3000×g, 10 min, 20°C), and the supernatant was discarded. Each cell pellet was resuspended in 50 mL spheroplasting buffer without Zymolyase, and cells were collected by centrifugation (3000×g, 10 min, 20°C).

[0314] To fractionate spheroplasts, the cells were resuspended to a final concentration of 0.5 g/mL in ice-cold mitochondrial isolation buffer (MIB), consisting of (final concentrations) 0.6 M D-mannitol (BD Difco Cat#217020) and 20 mM HEPES-KOH, pH 7.4. For each 1 mL of the resulting cell suspension, 0.01 mL of Yeast/Fungal Protease Arrest solution (G Biosciences, catalog #788-333) was added. The cell suspension was subjected to 35 strokes of a Dounce homogenizer with the B (tight) pestle, and the resulting suspension was centrifuged (2500×g, 10 min, 4°C) to pellet cell debris, unbroken cells, and spheroplasts. Following centrifugation, 2 mL of each supernatant (1 mL for the pGV1900-transformed cells) was saved in a 2 mL centrifuge tube on ice and designated the "W" (whole cell extract) fraction, while the remaining supernatant was transferred to a clean, ice-cold 35 mL Oakridge screw-cap tube and centrifuged (12,000×g, 20 min, 4°C) to pellet mitochondria and other organellar structures. Following centrifugation, 5 mL of each resulting supernatant was transferred to a clean tube on ice, taking care to avoid the small, loose pellet, and labelled the "S" (soluble cytosol) fraction. The resulting pellets were resuspended in MIB containing Protease Arrest solution and labelled the "P" (pellet) fractions. Protein from the "P" fractions was released by diluting 1:5 in DHAD assay buffer (see above) and mixing rapidly in a 1.5 mL tube with a Retsch Ball Mill MM301 in the presence of 0.1 mm glass beads. Mixing was performed four times for 1 min each.
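The buffer volumes implied by [0313] and [0314] follow directly from the pellet masses in Table 11. The Python sketch below is illustrative only: it treats the pellet volume as negligible, assumes the spheroplast mass equals the initial wet pellet mass (the patent does not report the spheroplast mass), and approximates the cell suspension volume by the MIB volume when scaling the Protease Arrest solution. The function name `resuspension_volume_ml` is hypothetical.

```python
def resuspension_volume_ml(mass_g: float, conc_g_per_ml: float) -> float:
    """Buffer volume (mL) that brings mass_g of cells to conc_g_per_ml.

    Simplifying assumption: the volume occupied by the cells themselves is ignored.
    """
    return mass_g / conc_g_per_ml


PELLETS_G = {"pGV1106": 7.6, "pGV1900": 3.8}  # wet pellet masses from Table 11

for plasmid, mass in PELLETS_G.items():
    tris_or_sorbitol = resuspension_volume_ml(mass, 0.1)  # Tris-SO4/DTT and spheroplasting steps
    mib = resuspension_volume_ml(mass, 0.5)               # MIB, assuming no mass loss
    protease_arrest = 0.01 * mib                          # 0.01 mL per mL of suspension
    zymolyase_mg_per_g = 50 / mass                        # fixed 50 mg per suspension
    print(f"{plasmid}: {tris_or_sorbitol:.0f} mL at 0.1 g/mL, {mib:.1f} mL MIB, "
          f"{protease_arrest:.3f} mL Protease Arrest, {zymolyase_mg_per_g:.1f} mg Zymolyase per g")
```

Because Zymolyase was added as a fixed 50 mg per suspension, the smaller pGV1900 pellet received roughly twice as much enzyme per gram of cells as the pGV1106 pellet.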

[0315] The BioRad Protein Assay reagent (BioRad, Hercules, Calif.) was used according to the manufacturer's instructions to determine the protein concentration of each fraction.

[0316] The DHAD activity of each fraction was ascertained as described in the methods above.

TABLE 12 Specific activities (KIV generation) and ratios of specific activities from fractionated lysates of S. cerevisiae strain GEVO2244 carrying plasmids to overexpress the indicated DHAD homolog. Each data point is the result of triplicate samples.
Lysate (pGV# and fraction*) | DHAD | Sp. Activity [U/mg protein in fraction] | Std. Dev.
1106 W | -- | n.d. |
1106 S | -- | n.d. |
1106 P | -- | n.d. |
1900 W | ScILV3(FL) | 0.0096 | 0.0018
1900 S | ScILV3(FL) | 0.0052 | 0.0004
1900 P | ScILV3(FL) | 0.0340 | 0.0029

[0317] Cells overexpressing the full-length, native S. cerevisiae Ilv3 contained a greater proportion of the specific DHAD activity in the mitochondrial fraction (P) than in the cytosolic fraction (S).
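The mitochondrial enrichment in Table 12 can be expressed as a P/S ratio with an uncertainty. The sketch below uses standard first-order error propagation for a quotient, assuming the errors of the two fractions are independent and that the tabulated Std. Dev. values describe the triplicate means; neither assumption is stated in the patent. The function name `ratio_with_sd` is hypothetical.

```python
import math


def ratio_with_sd(a: float, sd_a: float, b: float, sd_b: float) -> tuple[float, float]:
    """Return a/b and its first-order propagated SD, assuming independent errors."""
    ratio = a / b
    sd = ratio * math.sqrt((sd_a / a) ** 2 + (sd_b / b) ** 2)
    return ratio, sd


# ScILV3(FL) specific activities (U/mg protein) and SDs from Table 12
S = (0.0052, 0.0004)  # cytosolic fraction
P = (0.0340, 0.0029)  # mitochondrial/organellar pellet

p_over_s, sd = ratio_with_sd(*P, *S)
print(f"P/S = {p_over_s:.1f} ± {sd:.1f}")
```

Under these assumptions the pellet fraction carries several-fold more specific activity than the cytosolic fraction, consistent with [0317].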

Example 4

Replacing Current Mitochondrially Targeted Isobutanol Pathway Enzymes with Fungal Homologs or Functional Analogs that are Targeted to the Cytosol

[0318] The purpose of this example is to illustrate that fungal homologs of isobutanol pathway enzymes exhibit cytosolic activity.

TABLE 13 Genotype of strains disclosed in Example 4.
GEVO No. | Genotype/Source
1187 | MATa ura3-52 leu2-3_112 his3Δ1 trp1-289 ADE2 CEN.PK2-1C
2280 | MATa ura3-52 leu2-3_112 his3Δ1 trp1-289 ADE2 CEN.PK2-1C integrated pGV1730 at PDC1 locus
2618 | MATa ura3-52 leu2-3_112 his3Δ1 trp1-289 ADE2 CEN.PK2-1C integrated pGV2114 at PDC1 locus
2621 | MATa ura3-52 leu2-3_112 his3Δ1 trp1-289 ADE2 CEN.PK2-1C integrated pGV2117 at PDC1 locus
2622 | MATa ura3-52 leu2-3_112 his3Δ1 trp1-289 ADE2 CEN.PK2-1C integrated pGV2118 at PDC1 locus

TABLE 14 Plasmids disclosed in Example 4.
pGV No. | Genotype
1730 | P_Cup1-11:Bs_alsS, pUC ORI, AmpR, TRP1, PDC1 3'-fragment-NruI-PDC1 5'-fragment.
2114 | P_Cup1-11:Bs_alsScoSc, pUC ORI, AmpR, TRP1, PDC1 3'-fragment-NruI-PDC1 5'-fragment.
2117 | P_Cup1-11:Ta_alsS, pUC ORI, AmpR, TRP1, PDC1 3'-fragment-NruI-PDC1 5'-fragment.
2118 | P_Cup1-11:Ts_alsS, pUC ORI, AmpR, TRP1, PDC1 3'-fragment-NruI-PDC1 5'-fragment.

[0319] Because yeast AHASs are normally mitochondrial, fungal ALS enzymes were favored as candidate cytosolically functional isobutanol pathway enzymes. Sequence analysis by Le and Choi (Bull. Korean Chem. Soc. (2005) 26:916-920) showed that a conserved sequence, `RFDDR`, is found in AHASs but is not conserved among ALSs. This sequence is likely involved in FAD binding by AHASs and thus could be used to distinguish between the FAD-dependent AHASs and the FAD-independent ALSs. Using this region to distinguish between AHASs and ALSs, BLAST searches of fungal sequence databases were performed, resulting in the identification of ALS homologs from several fungal species (Magnaporthe grisea, Phaeosphaeria nodorum, Trichoderma atroviride (SEQ ID NO: 71), Talaromyces stipitatus (SEQ ID NO: 72), Penicillium marneffei, and Glomerella graminicola). Of these sequences, the ALS homologs from M. grisea, P. nodorum, T. atroviride and T. stipitatus are predicted to be cytoplasmic by MitoProt II v.1.101, as described in M. G. Claros and P. Vincens, Computational method to predict mitochondrially imported proteins and their targeting sequences, Eur. J. Biochem. 241, 779-786 (1996).
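The motif-based triage in [0319] can be sketched as a simple filter over BLAST hits. The Python example below is a deliberate simplification: it uses an exact substring match for `RFDDR`, whereas the actual analysis relied on BLAST searches and sequence comparison, and it does not reproduce the MitoProt II localization prediction. The function names and the protein fragments are hypothetical, not real database entries.

```python
def classify_by_motif(protein_seq: str, motif: str = "RFDDR") -> str:
    """Crude heuristic: 'AHAS-like' if the FAD-binding motif is present, else 'ALS-like'."""
    return "AHAS-like" if motif in protein_seq.upper() else "ALS-like"


def als_candidates(hits: dict[str, str], motif: str = "RFDDR") -> list[str]:
    """Names of hits that lack the motif, i.e. putative FAD-independent ALSs."""
    return sorted(name for name, seq in hits.items()
                  if classify_by_motif(seq, motif) == "ALS-like")


# Hypothetical, truncated protein fragments (not real database entries)
hits = {
    "hit_A": "MKQVLAGRFDDRVTGKL",
    "hit_B": "MSTLPAGKEFDVRQAAT",
}
print(als_candidates(hits))
```

Candidates passing this filter would still need a separate localization prediction, as done with MitoProt II in the text.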

[0320] Fungal ALS genes were synthesized by DNA 2.0 with codon optimization biased for S. cerevisiae. The following ALS constructs were made and tested for ALS activity by assaying acetoin in the media during a growth timecourse. All ALS genes were cloned into the integration vector pGV1730 (SEQ ID NO: 69) as described herein.

[0321] Plasmid pGV1730 is a yeast integration plasmid used to replace the PDC1 gene in S. cerevisiae with the B. subtilis alsS gene (SEQ ID NO: 70) (not codon optimized for S. cerevisiae) expressed using the S. cerevisiae CUP1 promoter. This plasmid carries the S. cerevisiae TRP1 gene as a selection marker.

[0322] Construction of pGV2114. pGV1730 was treated with BamHI and SalI and the 4932 bp vector fragment was purified by gel electrophoresis as described. The B. subtilis alsS gene (codon-optimized for expression in S. cerevisiae) was ligated to the pGV1730 vector fragment as a BamHI and SalI 1722 bp fragment using standard methods with an approximately 5:1 insert:vector molar ratio and transformed into TOP10 chemically competent E. coli cells. Plasmid DNA was isolated and correct clones were confirmed using restriction enzyme analysis.

[0323] Construction of pGV2117. pGV1730 was treated with BamHI and SalI and the 4932 bp vector fragment was purified by gel electrophoresis as described. The T. atroviride ALS gene was ligated to the pGV1730 vector fragment as a BamHI and SalI 1686 bp fragment using standard methods with an approximately 5:1 insert:vector molar ratio and transformed into TOP10 chemically competent E. coli cells. Plasmid DNA was isolated and correct clones were confirmed using restriction enzyme analysis.

[0324] Construction of pGV2118. pGV1730 was treated with BamHI and SalI and the 4932 bp vector fragment was purified by gel electrophoresis as described. The T. stipitatus ALS gene was ligated to the pGV1730 vector fragment as a BamHI and SalI 1707 bp fragment using standard methods with an approximately 5:1 insert:vector molar ratio and transformed into TOP10 chemically competent E. coli cells. Plasmid DNA was isolated and correct clones were confirmed using restriction enzyme analysis.
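The approximately 5:1 insert:vector molar ratio used in [0322]-[0324] translates into insert mass through fragment length, because for double-stranded DNA the number of moles is proportional to mass divided by length. A minimal Python sketch is shown below; the 50 ng vector amount is a hypothetical example, and the expected plasmid size is simply vector plus insert length, which is approximate because the exact overhang bookkeeping is not given in the text. The function name `insert_ng` is illustrative.

```python
def insert_ng(vector_ng: float, vector_bp: int, insert_bp: int, molar_ratio: float = 5.0) -> float:
    """Insert mass (ng) that gives `molar_ratio` insert:vector (moles ~ mass / length)."""
    return vector_ng * molar_ratio * insert_bp / vector_bp


VECTOR_BP = 4932  # BamHI-SalI backbone of pGV1730
INSERTS_BP = {
    "pGV2114 (B. subtilis alsS, coSc)": 1722,
    "pGV2117 (T. atroviride ALS)": 1686,
    "pGV2118 (T. stipitatus ALS)": 1707,
}

for plasmid, insert_bp in INSERTS_BP.items():
    print(f"{plasmid}: ~{VECTOR_BP + insert_bp} bp expected, "
          f"{insert_ng(50, VECTOR_BP, insert_bp):.0f} ng insert per 50 ng vector")
```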

[0325] All yeast strains were constructed by digesting the plasmid to be integrated with NruI and then transforming the linearized plasmid according to the standard yeast transformation protocol as described herein. Transformants were selected by plating transformed cells onto SCD-W medium and growing at 30°C for 2 days. Primary transformants were single-colony purified and then tested for correct integration by colony PCR. Yeast colony PCR to check for proper integration of the integrative plasmids was performed using the FailSafe™ PCR System (EPICENTRE® Biotechnologies, Madison, Wis.; Catalog #FS99250) according to the manufacturer's protocol. The PCR reactions were incubated in a thermocycler using the following conditions: 1 cycle of 94°C for 2 min; 40 cycles of 94°C for 30 s, 53°C for 30 s, and 72°C for 60 s; and 1 cycle of 72°C for 10 min. The presence of the expected PCR product was assessed by agarose gel electrophoresis. Primer pairs for the 5'-end and 3'-end integration sites each contained one primer annealing to the plasmid and one annealing to the genome.
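The cycling program above can be written as data, which makes its structure and total hold time explicit. This is a minimal Python sketch; the `Step` class and `hold_time_minutes` function are hypothetical names, and the total counts only programmed hold times, not the ramping between temperatures, so a real run takes longer.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: float
    seconds: int


# Thermocycler program from [0325]: (steps, number of cycles)
PROGRAM = [
    ([Step(94, 120)], 1),                              # initial denaturation
    ([Step(94, 30), Step(53, 30), Step(72, 60)], 40),  # denature / anneal / extend
    ([Step(72, 600)], 1),                              # final extension
]


def hold_time_minutes(program) -> float:
    """Sum of programmed hold times in minutes; ramp times are ignored."""
    return sum(cycles * sum(step.seconds for step in steps) for steps, cycles in program) / 60


print(hold_time_minutes(PROGRAM))  # 92.0
```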

[0326] Yeast strains GEVO1187, 2280, 2618, 2621 and 2622 were grown in YPD overnight at 30°C. A 100 mL culture was inoculated to an OD600 of 1 and split into two 50 mL cultures; this was time zero. One of the 50 mL cultures received 500 µM CuSO4 at 2 hours and the other did not. At 0, 2, 2.5, 3, 4, 7.5, and 23 hours, 1 mL was removed from each culture. At each timepoint the OD600 was determined, and acetoin concentrations were determined by GC as described in the General Methods. Before GC, samples were treated with H2SO4 to convert intermediates to acetoin. FIG. 2 shows the acetoin concentrations, normalized to cell OD, in the media of the strains in which transcription of the ALS genes was induced by CuSO4. Both the T. stipitatus ALS and the T. atroviride ALS showed increased levels of acetoin as compared to the no-ALS control (FIG. 2).

[0327] ALS activity in whole cell lysates is determined as described in General Methods. Activity in mitochondrial/organellar (P) and cytosolic (S) fractions and whole cell (W) lysates is assayed as described in General Methods.

Example 5

Replacing Current Mitochondrially Targeted Isobutanol Pathway Enzymes with Homologs or Functional Analogs from Anaerobic Fungi

[0328] The purpose of this example is to illustrate that homologs of isobutanol pathway enzymes from anaerobic fungi exhibit cytosolic activity.

TABLE 15 Genotype of strains disclosed in Example 5.
GEVO No. | Genotype
GEVO2244 | S. cerevisiae, CEN.PK; MATα ura3 leu2 his3 trp1 ilv3Δ

TABLE 16 Plasmids disclosed in Example 5.
Plasmid name | Genotype
pGV1106 | pUC ori, bla (AmpR), 2 µm ori, URA3, TDH3 promoter-Myc tag-polylinker-CYC1 terminator
pGV1662 | pUC ori, bla (AmpR), 2 µm ori, URA3, TEF1 promoter-(kivD)
pGV1855 | pUC ori, bla (AmpR), 2 µm ori, URA3, TEF1 promoter-Ll_ilvD

[0329] Plasmid pGV1106 is described in Example 3, above.

[0330] Plasmid pGV1662 (SEQ ID NO: 81) served as the parental plasmid of pGV1855, pGV1900, and pGV2019. The salient features of pGV1662 include the yeast 2 micron origin of replication, the URA3 selectable marker, and the ScTEF1 promoter sequence followed by restriction sites into which an ORF can be cloned to permit its expression under the regulation of the TEF1 promoter.

[0331] Plasmid pGV1855 contains the L. lactis ilvD. The L. lactis ilvD sequence was synthesized (DNA2.0, Menlo Park, Calif.) and included a unique SalI and a NotI site at the 5' and 3' end of the coding sequence, respectively. The synthesized DNA was digested with SalI and NotI and ligated into vector pGV1662 that had been digested with SalI plus NotI, yielding pGV1855.
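The requirement in [0331] that the SalI and NotI sites be unique within the synthesized fragment can be checked with a short scan. This Python sketch uses the standard recognition sequences (SalI GTCGAC, NotI GCGGCCGC, BamHI GGATCC, XhoI CTCGAG); all four are palindromic, so scanning one strand suffices. The function names and the test sequence are hypothetical.

```python
import re

SITES = {"SalI": "GTCGAC", "NotI": "GCGGCCGC", "BamHI": "GGATCC", "XhoI": "CTCGAG"}


def site_positions(seq: str, site: str) -> list[int]:
    """0-based start positions of a palindromic recognition site (overlapping matches included)."""
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


def is_unique(seq: str, enzyme: str) -> bool:
    """True if the enzyme's recognition site occurs exactly once in seq."""
    return len(site_positions(seq, SITES[enzyme])) == 1


# Hypothetical synthetic ORF flanked by a 5' SalI site and a 3' NotI site
fragment = "GTCGAC" + "ATGAAATAA" + "GCGGCCGC"
for enzyme in SITES:
    print(enzyme, site_positions(fragment, SITES[enzyme]))
```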

[0332] The DHAD homolog (ilvD) from the anaerobic fungus Piromyces sp. E2 has a predicted MTS of 49 amino acids at the N-terminus. Thus, a nucleotide sequence encoding the Piromyces ilvD lacking the N-terminal 49 amino acids, with a start codon placed at the N-terminus, was synthesized (SEQ ID NO: 73). In addition, a SalI site and a BamHI site were introduced at the 5' and 3' ends of this ORF, respectively. This fragment was cloned into the SalI and BamHI sites of pGV1662. The resulting plasmid was transformed into GEVO2242. An empty vector, pGV1106, is used as a negative control. Plasmid pGV1855, expressing L. lactis ilvD, is used as a positive control.

[0333] An independent clonal transformant of each plasmid is isolated, and a 1 L culture of each strain is grown in SCGal-Ura+9xIV at 30°C and 250 rpm. The OD600 is noted, the cells are collected by centrifugation (1600×g, 2 min), and the culture medium is decanted. The cell pellets are resuspended in 50 mL sterile deionized water, collected by centrifugation (1600×g, 2 min), and the supernatant is discarded.

[0334] To obtain spheroplasts, the cell pellets are resuspended in 0.1 M Tris-SO4, pH 9.3, to a final concentration of 0.1 g/mL, and DTT is added to a final concentration of 10 mM. Cells are incubated with gentle agitation (60 rev/min) on an orbital shaker for 20 min at 30°C, then collected by centrifugation (1600×g, 2 min), and the supernatant is discarded. Each cell pellet is resuspended in spheroplasting buffer (final concentrations: 1.2 M sorbitol (Amresco, catalog #0691), 20 mM potassium phosphate, pH 7.4) and then collected by centrifugation (1600×g, 10 min). Each cell pellet is resuspended in spheroplasting buffer to a final concentration of 0.1 g cells/mL in a 500 mL centrifuge bottle, and 50 mg of Zymolyase 20T (Seikagaku Biobusiness, Code #120491) is added to each cell suspension. The suspensions are incubated overnight (approximately 16 h) at 30°C with gentle agitation (60 rev/min) on an orbital shaker. The efficacy of spheroplasting is ascertained by diluting an aliquot of each cell suspension 1:10 in either sterile water or spheroplasting buffer and comparing the aliquots microscopically (40× magnification). The spheroplasts are centrifuged (3000×g, 10 min, 20°C), and the supernatant is discarded. Each cell pellet is resuspended in 50 mL spheroplasting buffer without Zymolyase, and cells are collected by centrifugation (3000×g, 10 min, 20°C).

[0335] To fractionate spheroplasts, the cells are resuspended to a final concentration of 0.5 g/mL in ice-cold mitochondrial isolation buffer (MIB), consisting of (final concentrations) 0.6 M D-mannitol (BD Difco Cat#217020) and 20 mM HEPES-KOH, pH 7.4. For each 1 mL of the resulting cell suspension, 0.01 mL of Yeast/Fungal Protease Arrest solution (G Biosciences, catalog #788-333) is added. The cell suspension is subjected to 35 strokes of a Dounce homogenizer with the B (tight) pestle, and the resulting suspension is centrifuged (2500×g, 10 min, 4°C) to pellet cell debris, unbroken cells, and spheroplasts. Following centrifugation, 2 mL of each supernatant (1 mL for the pGV1900-transformed cells) is saved in a 2 mL centrifuge tube on ice and designated the "W" (whole cell extract) fraction, while the remaining supernatant is transferred to a clean, ice-cold 35 mL Oakridge screw-cap tube and centrifuged (12,000×g, 20 min, 4°C) to pellet mitochondria and other organellar structures. Following centrifugation, 5 mL of each resulting supernatant is transferred to a clean tube on ice, taking care to avoid the small, loose pellet, and labelled the "S" (soluble cytosol) fraction. The resulting pellets are resuspended in MIB containing Protease Arrest solution and labelled the "P" (pellet) fractions. The protein concentration of each fraction is determined using the BioRad Protein Assay reagent (BioRad, Hercules, Calif.) according to the manufacturer's instructions.

[0336] The DHAD activity of each fraction is ascertained using the DHAD assays as described above in the General Methods.

Example 6

Modification of the N-Terminal Mitochondrial Targeting Sequence of an Isobutanol Pathway Enzyme

[0337] The purpose of this example is to illustrate that removal or modification of N-terminal mitochondrial targeting sequences allows for cytosolic activity of isobutanol pathway enzymes.

TABLE 17 Genotype of strains disclosed in Example 6.
GEVO No. | Genotype
1803 | MATa/α ura3/ura3 leu2/leu2 his3/his3 trp1/trp1 pdc1::Bs-alsS, TRP1/PDC1

TABLE 18 Plasmids disclosed in Example 6.
Plasmid name | Relevant genes/usage | Genotype
pGV1354 | Plasmid that contains the Ilv5ΔN47 gene. | P_TDH3:ILVΔN47:CYC1 term, bla, ColE1 ORI, URA3, 2µ ori.
pGV1662 | Parent vector that has ampicillin resistance, the 2µ origin, a URA3 gene, the TEF1 promoter, the CYC1 terminator region and an E. coli origin. It also has the L. lactis kivD gene, which is removed by cutting the plasmid with SalI and NotI and then gel purifying the vector portion. SalI and NotI were used for cloning genes to be expressed from the TEF1 promoter. | pTEF1::L. lactis kivD::CYC1 term, bla, ColE1 ORI, URA3, 2µ ori.
pGV1810 | Plasmid that contains the full-length ILV5 gene. This was used as a PCR template to generate the Δ46-ilv5 mutant. | pTEF1::ILV5::CYC1 term, bla, ColE1 ORI, URA3, 2µ ori.
pGV1831 | Plasmid that contains the Ilv5ΔN47 gene under control of the TEF1 promoter. | pTEF1::Sc Ilv5ΔN47::CYC1 term, bla, ColE1 ORI, URA3, 2µ ori.
pGV1833 | Plasmid that contains the full-length ILV5 gene under control of the TEF1 promoter. | pTEF1::Sc ILV5:CYC1 term, bla, ColE1 ORI, URA3, 2µ ori.
pGV1901 | The S. cerevisiae KARI with the N-terminal 46 amino acids deleted (Δ46), cloned into pGV1662 at the SalI-NotI sites of the vector. The S. cerevisiae Δ46 KARI was a SalI-NotI fragment that was PCR amplified from pGV1810 using primers 1809 and 1615. | pTEF1::Δ46ilv5 KARI::CYC1 term, bla, ColE1 ORI, URA3, 2µ ori.
pGV1824 | The E. coli coSc KARI cloned into pGV1662 at the SalI-BamHI sites of the vector. | pTEF1::E. coli coSc KARI:CYC1 term, bla, ColE1 ORI, URA3, 2µ ori.

[0338] The yeast enzymes acetohydroxyacid synthase (AHAS; ILV2+ILV6), ketol-acid reductoisomerase (KARI; ILV5), and dihydroxyacid dehydratase (DHAD; ILV3) that carry out the first three steps of isobutanol production are physiologically localized to the mitochondria. Mitochondrial matrix proteins are typically targeted to the mitochondria by an N-terminal mitochondrial targeting sequence (MTS), which is then cleaved off in the mitochondria resulting in the `mature` form of the enzyme. N-terminal deletions of ILV5 have been shown to re-localize this enzyme to the cytosol (Omura, 2008, Appl. Microbiol. Biotechnol. 78: 503-513; Omura, WO/2009/078108 A1, hereby incorporated by reference in its entirety).

[0339] N-terminal mitochondrial targeting sequences (MTS) are predicted by the MitoProt II software (Claros et al., 1996, Eur. J. Biochem. 241: 779-786). Two N-terminal deletions of the ILV5 gene were constructed, one lacking the first 46 amino acids and one lacking the first 47 amino acids.

[0340] pGV1831 was constructed as follows. pGV1662 was digested with SalI and NotI, and the large fragment (6.3 kb vector backbone) was gel purified by agarose gel electrophoresis. The Ilv5ΔN47 gene was excised from plasmid pGV1354 (SEQ ID NO: 80) using SalI and NotI. The ilv5ΔN47 gene fragment (1.06 kb) was purified away from the larger vector fragment by agarose gel electrophoresis. The pGV1662 vector and ilv5ΔN47 insert were ligated using standard methods in an approximately 5:1 insert:vector molar ratio and transformed into TOP10 chemically competent E. coli cells. Plasmid DNA was isolated and correct clones were confirmed by restriction enzyme analysis, namely generation of the correct insert size by digesting clones with SalI and NotI. The clones were verified by sequencing with primers 351, 1625, and 1626. Purified plasmid DNA was transformed into S. cerevisiae strain GEVO1803 using a standard yeast transformation protocol.

[0341] pGV1833 was constructed as follows. pGV1662 was digested with SalI and NotI and the large fragment (6.3 Kb vector backbone) was gel purified by agarose gel electrophoresis. Primers 1615 and 1616 were used to amplify the S. cerevisiae ILV5 gene from the plasmid template pGV1810 by PCR. The correct fragment size was verified with DNA gel electrophoresis (1.2 Kb). The PCR product was purified using the Qiagen QIAquick PCR Purification Kit. The PCR product was then digested with XhoI and NotI to generate ends compatible with the pGV1662 backbone (the XhoI end of the PCR product is compatible with the SalI end of the vector, although the ligated junction cannot be recut with either enzyme). After digestion, the PCR product was purified with a Qiagen QIAquick PCR Purification Kit. The two fragments were ligated using standard methods in an approximately 5:1 insert:vector molar ratio and transformed into TOP10 chemically competent E. coli cells. Plasmid DNA was isolated and correct clones were confirmed using restriction enzyme analysis. In this case, SacI plus NotI digestion yielded a fragment of the predicted size (1.6 Kb). The clones were verified by sequencing with the primers 351, 1625, and 1626. Purified plasmid DNA was transformed into S. cerevisiae strain GEVO1803.
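The statement in paragraph [0341] that an XhoI end ligates to a SalI end but the junction cannot be recut follows from the recognition sequences. The sketch below assumes the standard sites and cut positions, SalI G^TCGAC and XhoI C^TCGAG, both of which leave a 5'-TCGA overhang; the helper names are hypothetical.

```python
# Recognition site and top-strand cut index for each enzyme (standard values).
SITES = {"SalI": ("GTCGAC", 1), "XhoI": ("CTCGAG", 1)}


def overhang(enzyme: str) -> str:
    """5' overhang of a palindromic site cut at index k on the top strand
    (and, by symmetry, at len - k on the bottom strand)."""
    site, cut = SITES[enzyme]
    return site[cut:len(site) - cut]


def hybrid_junction(left_enzyme: str, right_enzyme: str) -> str:
    """Top-strand sequence after ligating the left half of one cut site
    to the right half of another."""
    left_site, left_cut = SITES[left_enzyme]
    right_site, right_cut = SITES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


assert overhang("SalI") == overhang("XhoI") == "TCGA"  # compatible cohesive ends
junction = hybrid_junction("SalI", "XhoI")  # vector SalI end joined to insert XhoI end
print(junction)  # GTCGAG
# The junction itself matches neither recognition site, so neither enzyme recuts it.
print([name for name, (site, _) in SITES.items() if site == junction])  # []
```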

[0342] pGV1901 was constructed as follows. pGV1662 was digested with SalI and NotI and the large fragment (6.3 Kb vector backbone) was gel purified by agarose gel electrophoresis. The ILV5 gene was amplified from pGV1810 (SEQ ID NO: 82) using primers 1809 (which removes the first 46 amino acids from the N-terminus while adding a methionine codon) and 1615. The PCR product was digested with SalI and NotI. After digestion, the PCR product was purified on an agarose gel and the proper fragment (1.07 Kb) was recovered using the Zymoclean Gel DNA Recovery Kit. The pGV1662 vector and Ilv5-.DELTA.46 PCR products were ligated using standard methods in an approximately 5:1 insert:vector molar ratio and transformed into TOP10 chemically competent E. coli cells. Plasmid DNA was isolated and correct clones were confirmed with PCR screening of colonies using primers 351 and 1577. The predicted correct PCR product was 580 bp. The clones were sequenced using primers 351, 1625, and 1626. Purified plasmid DNA was transformed into S. cerevisiae strain GEVO1803 using the standard yeast transformation protocol.
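At the protein level, the .DELTA.46 construct of paragraph [0342] removes the first 46 residues of KARI and restores a start methionine. The sketch below shows that bookkeeping on a placeholder sequence (residues 2-46 written as X); it is not the ILV5 sequence, the function name is hypothetical, and applying the same Met-adding rule to other deletion lengths is an assumption.

```python
def n_terminal_deletion(protein: str, n_deleted: int) -> str:
    """Delete residues 1..n_deleted and prepend an initiator methionine,
    mirroring the description of primer 1809 (removes 46 codons, adds Met)."""
    if not 0 < n_deleted < len(protein):
        raise ValueError("n_deleted must leave at least one residue")
    return "M" + protein[n_deleted:]


# Placeholder only: Met, 45 'X' residues, then a short stand-in for the mature region.
TOY_KARI = "M" + "X" * 45 + "DEFGHIK"
print(n_terminal_deletion(TOY_KARI, 46))  # MDEFGHIK
```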

[0343] pGV1824 contains the E. coli ilvC gene, codon optimized for S. cerevisiae, cloned into the SalI and BamHI sites of pGV1662 as described above. The sequence of the codon-optimized E. coli ilvC is found as SEQ ID NO: 83.

[0344] Plasmids were transformed into the yeast strain GEVO1803 and an individual colony was purified from each transformation. KARI assays of whole cell lysates were performed at pH 7.5 as described in General Methods. Results are shown in FIG. 3.

[0345] KARI activity in mitochondrial/organellar (P) and cytosolic (S) fractions and whole cell (W) lysates is assayed as described in General Methods.

Example 7

Scaffolding Two or More Isobutanol Pathway Enzymes

[0346] The purpose of this example is to illustrate how isobutanol pathway enzymes can be scaffolded in order to localize them to the cytosol.

[0347] Cellulolytic microorganisms utilize a scaffolded enzyme complex called a cellulosome. In such a complex, numerous enzymes are docked to a single scaffold protein, called a scaffoldin, which contains multiple binding domains called cohesin domains. Each cohesin domain interacts with a dockerin domain. In a cellulosome complex, each cellulolytic enzyme also has a dockerin domain that allows it to bind to the scaffoldin.

[0348] The cohesin domains of a scaffoldin protein, for example CipA from Clostridium thermocellum, can be expressed in yeast. The dockerin domains of cellulolytic enzymes from the same organism, for example Xyn10B, can be fused to the isobutanol pathway enzymes, and the fusion proteins expressed in yeast.
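As a toy illustration of the docking logic in paragraphs [0347] and [0348], the sketch below models a scaffoldin as a fixed number of cohesin slots that accept only enzymes fused to a dockerin from the same organism. It ignores binding affinity, cohesin type, and geometry; the class names, the two-cohesin scaffold, and the enzyme labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PathwayEnzyme:
    name: str
    dockerin_species: Optional[str]  # None if no dockerin domain is fused


@dataclass
class Scaffoldin:
    species: str                     # organism the cohesin domains come from
    n_cohesins: int
    docked: List[str] = field(default_factory=list)

    def dock(self, enzyme: PathwayEnzyme) -> bool:
        """Occupy one free cohesin if the enzyme's dockerin matches the species."""
        if enzyme.dockerin_species != self.species:
            return False
        if len(self.docked) >= self.n_cohesins:
            return False
        self.docked.append(enzyme.name)
        return True


scaffold = Scaffoldin(species="C. thermocellum", n_cohesins=2)
enzymes = [
    PathwayEnzyme("KARI-dockerin", "C. thermocellum"),
    PathwayEnzyme("DHAD-dockerin", "C. thermocellum"),
    PathwayEnzyme("KIVD", None),  # no dockerin, so it stays unbound
]
print([scaffold.dock(e) for e in enzymes])  # [True, True, False]
print(scaffold.docked)  # ['KARI-dockerin', 'DHAD-dockerin']
```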

[0349] The activity of each pathway enzyme in whole cell lysates is determined as described in General Methods. Activity in mitochondrial/organellar (P) and cytosolic (S) fractions and whole cell (W) lysates is assayed as described in General Methods.

Example 8

Adding Tags, e.g., Ubiquitin Tags, to the N-Terminus of an Isobutanol Pathway Enzyme

[0350] The purpose of this example is to demonstrate that isobutanol pathway enzymes can be targeted to the yeast cytosol. For instance, this example illustrates how a DHAD enzyme can be targeted to the yeast cytosol.

TABLE-US-00020 TABLE 18 Genotype of strains disclosed in Example 8.
GEVO No.	Genotype/Source
GEVO2242	S. cerevisiae, CEN.PK; MAT.alpha. ura3 leu2 his3 trp1 ilv5.sup.D255E pdc1::Bs-alsS,TRP1
GEVO2244	S. cerevisiae, CEN.PK; MAT.alpha. ura3 leu2 his3 trp1 ilv3.DELTA.

TABLE-US-00021 TABLE 19 Plasmids disclosed in Example 8.
pGV No.	Genotype
pGV1106	pMB1 ori, bla (AmpR), 2 .mu.m ori, URA3, TDH3 promoter-Myc tag-polylinker-CYC1 terminator
pGV1662	pMB1 ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-(kivD)
pGV1784	pUC ori, kanR, Mm_ubiquitin coding sequence
pGV1855	pMB1 ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Ll_ilvD
pGV1897	pMB1 ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Mm_ubiquitin(Gly-X)
pGV1900	pMB1 ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-ScILV3(FL)
pGV2019	pUC ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-ScILV3.DELTA.N
pGV2052	pMB1 ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Mm_ubiquitin(Gly-X)-ScIlv3(FL)
pGV2053	pMB1 ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Mm_ubiquitin(Gly-X)-ScIlv3.DELTA.N
pGV2054	pMB1 ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Mm_ubiquitin(Gly-X)-Ll_ilvD
pGV2055	pMB1 ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Mm_ubiquitin(Gly-X)-Gf_ilvD
pGV2056	pMB1 ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Mm_ubiquitin(Gly-X)-Se_ilvD

[0351] To develop the constructs required to express DHAD as a fusion with an N-terminal ubiquitin, plasmid pGV1784 was synthesized by DNA2.0. This plasmid contained the synthesized sequence of the Mus musculus ubiquitin gene, codon-optimized for expression in S. cerevisiae (SEQ ID NO: 86). Using this plasmid as the template, the M. musculus ubiquitin gene was amplified by PCR with primers 1792 and 1794 to generate a PCR product containing the M. musculus ubiquitin coding sequence flanked by XhoI and NotI restriction sites at its 5' and 3' ends, respectively, and altered to lack the codon for its endogenous C-terminal-most glycine residue (denoted Gly-X). This PCR product was cloned into pGV1662 (described in Example 6), yielding pGV1897.
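At the protein level, the Gly-X fusion of paragraph [0351] amounts to dropping the final glycine of ubiquitin and joining the downstream enzyme in frame. The sketch below uses the canonical 76-residue ubiquitin sequence, which is identical in mouse and human; the patent's actual construct is the codon-optimized DNA of SEQ ID NO: 86, which is not reproduced here, and the target sequence is a hypothetical placeholder.

```python
# Canonical 76-residue ubiquitin (identical in mouse and human), in blocks of ten.
UBIQUITIN = (
    "MQIFVKTLTG" "KTITLEVEPS" "DTIENVKAKI" "QDKEGIPPDQ"
    "QRLIFAGKQL" "EDGRTLSDYN" "IQKESTLHLV" "LRLRGG"
)


def ubiquitin_gly_x_fusion(target: str) -> str:
    """In-frame N-terminal ubiquitin fusion lacking ubiquitin's C-terminal-most Gly."""
    if len(UBIQUITIN) != 76 or not UBIQUITIN.endswith("GG"):
        raise ValueError("unexpected ubiquitin sequence")
    return UBIQUITIN[:-1] + target


target = "MTDKS"  # hypothetical stand-in for a DHAD sequence
fusion = ubiquitin_gly_x_fusion(target)
print(fusion[70:75], fusion[75:])  # LRLRG MTDKS
```

At the DNA level the equivalent step is omitting the last codon of the stop-less ubiquitin coding sequence, which keeps the downstream ORF in frame because exactly three nucleotides are removed.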

[0352] Plasmid pGV1897 was then used as a recipient cloning vector for sequences encoding S. cerevisiae ILV3 (ScIlv3(FL), SEQ ID NO: 88), S. cerevisiae Ilv3.DELTA.N (ScIlv3.DELTA.N, SEQ ID NO: 89), L. lactis ilvD (Ll_ilvD, SEQ ID NO: 87), G. forsetti ilvD (Gf_ilvD, SEQ ID NO: 90), and S. erythraea ilvD (Se_ilvD, SEQ ID NO: 91), yielding plasmids pGV2052-2056, respectively.

[0353] The DHAD activity exhibited by cells transformed with each of the resulting constructs is determined by in vitro assay. GEVO2244 is transformed (singly) with pGV2052-2056, pGV1106 (empty control vector), pGV1855 (expressing native, unfused Ll_ilvD), or pGV1900 (expressing native, full-length Sc_ILV3(FL)). Lysates of transformants are prepared, and DHAD activity in mitochondrial/organellar (P) and cytosolic (S) fractions and whole cell (W) lysates is assayed as described in Example 3.

[0354] In an analogous manner, a desired ALS (e.g., the B. subtilis alsS) or KARI gene whose product is known or predicted to be mitochondrial can be re-targeted to the cytosol by the methods detailed in this example. The nucleotide sequence encoding a full-length or variant ALS or KARI is amplified by PCR using primers that introduce restriction sites convenient for cloning the final product as an in-frame fusion to the M. musculus ubiquitin gene. The resulting construct is transformed into a host S. cerevisiae cell suitable for assaying the in vitro activity of the expressed M. musculus ubiquitin chimeric fusion protein, using the methods described in Example 3.

Example 9

Dihydroxy Acid Dehydratase Limits Isobutanol Production in Yeast

[0355] This example illustrates the specific activity of various DHAD homologs in yeast. The example also illustrates that high specific activity of the Lactococcus lactis IlvD enzyme (SEQ ID NO: 18) correlates with an increase in isobutanol production.

[0356] Plasmid pGV1106 was used as a control and is described in Example 3. Plasmid pGV1662 (described in Example 6) served as the parental plasmid of pGV1855, pGV1900, and pGV2019 (see Example 5). Plasmids pGV1851-1855 and pGV1904-1907 are all variants of pGV1662 (see Table 20), in which the kivD ORF sequence present in pGV1662 was excised and replaced with a sequence encoding a DHAD homolog, as indicated below.

TABLE-US-00022 TABLE 20 Plasmids disclosed in Example 9.
pGV No.	Genotype
pGV1851	pUC ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Gramella forsetti ilvD
pGV1852	pUC ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Chromohalobacter salexigens ilvD
pGV1853	pUC ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Ralstonia eutropha ilvD
pGV1854	pUC ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Saccharopolyspora erythraea ilvD
pGV1855	pUC ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Ll_ilvD
pGV1900	pUC ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-ScILV3(FL)
pGV1904	pUC ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Acidobacteria bacterium Ellin345 ilvD
pGV1905	pUC ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Picrophilus torridus DSM 9790 ilvD
pGV1906	pUC ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Piromyces species E2 ilvD
pGV1907	pUC ori, bla (AmpR), 2 .mu.m ori, URA3, TEF1 promoter-Sulfolobus tokodaii strain 7 ilvD

[0357] Plasmid pGV1851 contains the G. forsetti ilvD gene (SEQ ID NO: 90). Plasmid pGV1852 contains the C. salexigens ilvD gene (SEQ ID NO: 95). Plasmid pGV1853 contains the R. eutropha ilvD gene (SEQ ID NO: 94). Plasmid pGV1854 contains the S. erythraea ilvD (SEQ ID NO: 91). Plasmid pGV1855 contains the L. lactis ilvD (SEQ ID NO: 87). Plasmid pGV1900 contains the S. cerevisiae ILV3 (SEQ ID NO: 88). Plasmid pGV1904 contains the A. bacterium Ellin345 ilvD (SEQ ID NO: 92). Plasmid pGV1905 contains the P. torridus DSM 9790 ilvD (SEQ ID NO: 96). Plasmid pGV1906 contains the Piromyces sp. E2 ilvD (SEQ ID NO: 93). Plasmid pGV1907 contains the S. tokodaii ilvD (SEQ ID NO: 97). All sequences except the full-length S. cerevisiae ILV3 were synthesized with 5' SalI and 3' NotI sites by DNA2.0 (Menlo Park, Calif.), digested with SalI and NotI, and ligated into pGV1662 that had also been digested with SalI and NotI. For plasmid pGV1900, the sequence containing the open reading frame of the full-length S. cerevisiae ILV3 was amplified from S. cerevisiae genomic DNA using primers 1617 and 1618, and the resulting 1.8 kb fragment was digested with SalI plus BamHI and cloned into pGV1662. The DHADs were tested for in vitro activity using whole cell lysates. To minimize endogenous background activity, the DHADs were expressed in a yeast strain deficient for DHAD activity (GEVO2244; ilv3.DELTA.) (see Example 2).

[0358] To grow cultures for cell lysates, triplicate independent cultures of each desired strain were grown overnight in 3 mL SCD-Ura+9xIV at 30.degree. C., 250 rpm. The following day, the overnight cultures were diluted 1:50 into 50 mL fresh SCD-Ura in a 250 mL baffle-bottomed Erlenmeyer flask and incubated at 30.degree. C. at 250 rpm. After approximately 10 hours, the OD.sub.600 of each culture was measured, and the cells of each culture were collected by centrifugation (2700.times.g, 5 min). The cell pellets were washed by resuspending them in 1 mL of water, the suspension was transferred to a 1.5 mL tube, and the cells were collected by centrifugation (16,000.times.g, 30 seconds). All supernatant was removed from each tube, and the tubes were frozen at -80.degree. C. until use.

[0359] Lysates were prepared by resuspending each cell pellet in 0.7 mL of lysis buffer, which consisted of 0.1M Tris-HCl pH 8.0 and 5 mM MgSO.sub.4, with 10 .mu.L of Yeast/Fungal Protease Arrest solution (G Biosciences, catalog #788-333) per 1 mL of lysis buffer. Eight hundred microliters of cell suspension were added to 1 mL of 0.5 mm glass beads that had been placed in a chilled 1.5 mL tube. Cells were lysed by bead beating (6 rounds, 1 minute per round, 30 beats per second) with 2 minutes of chilling on ice between rounds. The tubes were then centrifuged (20,000.times.g, 15 min) to pellet debris, and the supernatants (cell lysates) were retained in fresh tubes on ice. The protein concentration of each lysate was measured using the BioRad Bradford protein assay reagent (BioRad, Hercules, Calif.) according to the manufacturer's instructions.
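Bradford absorbance readings are usually converted into protein concentration with a standard curve. The sketch below fits a straight line by ordinary least squares to hypothetical BSA standards and interpolates a diluted sample; it assumes the readings fall within the linear range of the assay, and all numbers are illustrative, not data from this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical BSA standards (mg/mL) and their A595 readings.
std_conc = [0.0, 0.25, 0.5, 0.75, 1.0]
std_a595 = [0.00, 0.12, 0.24, 0.36, 0.48]
slope, intercept = fit_line(std_conc, std_a595)


def protein_mg_per_ml(a595: float, dilution: float = 1.0) -> float:
    """Interpolate a sample on the standard curve and undo its dilution."""
    return (a595 - intercept) / slope * dilution


print(round(protein_mg_per_ml(0.30, dilution=10), 2))  # 6.25
```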

[0360] The DHAD activity of each lysate was determined as follows. In a fresh 1.5 mL centrifuge tube, 50 .mu.L of each lysate was mixed with 50 .mu.L of 0.1M 2,3-dihydroxyisovalerate (DHIV), 25 .mu.L of 0.1 M MgSO.sub.4, and 375 .mu.L of 0.05M Tris-HCl pH 8.0, and the mixture was incubated for 30 min at 35.degree. C. Each tube was then heated to 95.degree. C. for 5 min to inactivate any enzymatic activity, and the solution was centrifuged (16,000.times.g for 5 min) to pellet insoluble debris. To prepare samples for analysis, 100 .mu.L of each reaction was mixed with 100 .mu.L of a solution consisting of 4 parts 15 mM dinitrophenyl hydrazine (DNPH) in acetonitrile and 1 part 50 mM citric acid, pH 3.0, and the mixture was heated to 70.degree. C. for 30 min in a thermocycler. The solution was then analyzed by HPLC as described in General Methods to quantitate the concentration of ketoisovalerate (KIV) present in the sample. The results are shown in Table 21.
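The reaction volumes in paragraph [0360] fix the final assay concentrations and the conversion from measured KIV to specific activity. The sketch below assumes 1 U = 1 .mu.mol KIV formed per minute and that the HPLC value has already been corrected for the 1:1 mixing with DNPH reagent; General Methods is not reproduced here, so both points are assumptions, as are the example inputs.

```python
# Volumes (uL) from paragraph [0360].
LYSATE_UL, DHIV_UL, MGSO4_UL, TRIS_UL = 50, 50, 25, 375
REACTION_UL = LYSATE_UL + DHIV_UL + MGSO4_UL + TRIS_UL  # 500 uL
INCUBATION_MIN = 30.0


def final_mM(stock_mM: float, volume_ul: int) -> float:
    """Concentration of a stock after dilution into the full reaction."""
    return stock_mM * volume_ul / REACTION_UL


def specific_activity_u_per_mg(kiv_mM: float, protein_mg_per_ml: float) -> float:
    """U/mg, assuming 1 U = 1 umol KIV formed per minute.

    kiv_mM: KIV in the 500 uL reaction (already corrected for DNPH mixing).
    protein_mg_per_ml: Bradford value for the lysate added to the reaction.
    """
    umol_kiv = kiv_mM * REACTION_UL / 1000.0            # mM x mL = umol
    mg_protein = protein_mg_per_ml * LYSATE_UL / 1000.0
    return umol_kiv / INCUBATION_MIN / mg_protein


print(final_mM(100.0, DHIV_UL), final_mM(100.0, MGSO4_UL))  # 10.0 5.0
print(round(specific_activity_u_per_mg(1.0, 2.0), 3))        # 0.167 (hypothetical inputs)
```

The assay therefore runs at 10 mM DHIV and 5 mM MgSO.sub.4, with the lysate diluted tenfold.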

TABLE-US-00023 TABLE 21 Specific activities (KIV generation) from lysates of S. cerevisiae strain GEVO2244 carrying plasmids to overexpress the indicated DHAD homolog. Each data point is the result of triplicate samples.
Plasmid	Gene	Specific activity (U/mg total protein)
pGV1106	Control (i.e. no DHAD)	n.d.
pGV1851	Gramella forsetti ilvD	0.012
pGV1852	Chromohalobacter salexigens (SEQ ID NO: 95)	n.d.
pGV1853	Ralstonia eutropha (SEQ ID NO: 94)	n.d.
pGV1854	Saccharopolyspora erythraea ilvD	0.002
pGV1855	Lactococcus lactis ilvD	0.027
pGV1900	Saccharomyces cerevisiae ILV3(FL)	0.148
pGV1904	Acidobacteria bacterium Ellin345 DHAD	0.004
pGV1905	Picrophilus torridus DSM 9790 DHAD	n.d.
pGV1906	Piromyces Sp E2 DHAD	0.016
pGV1907	Sulfolobus tokodaii str. 7 DHAD	0.001
* n.d., not detectable

Example 10

Dihydroxy Acid Dehydratase Limits Isobutanol Production in Yeast

[0361] This example illustrates that high specific DHAD activity, and in particular the high specific activity of the L. lactis IlvD enzyme (SEQ ID NO: 18), correlates with an increase in isobutanol production.

TABLE-US-00024 TABLE 22 Genotype of strains disclosed in Example 10.
GEVO No.	Genotype/Source
GEVO1186	S. cerevisiae, CEN.PK; MATa/.alpha. ura3/ura3 leu2/leu2 his3/his3 trp1/trp1
GEVO1188	S. cerevisiae, CEN.PK; MAT.alpha. ura3 leu2 his3 trp1
GEVO1803	MATa/.alpha. ura3/ura3 leu2/leu2 his3/his3 trp1/trp1 pdc1::Bs-alsS, TRP1/PDC1
GEVO2107	MATa/.alpha. ura3/ura3 leu2/leu2 his3/his3 trp1/trp1 pdc1::Bs-alsS, TRP1/PDC1 pdc6::{ScTEF1p-Ll_kivd ScTDH3p-Dm_ADH URA3}/PDC6

TABLE-US-00025 TABLE 23 Plasmids disclosed in Example 10.
pGV No.	Genotype
p423GPD	P.sub.TDH3:MCS:T.sub.CYC1, HIS3, 2-micron, bla, pUC ori (Mumberg, D. et al. (1995) Gene 156: 119-122; obtained from ATCC)
pGV1103	P.sub.TDH3:myc-tag:MCS:T.sub.CYC1, HIS3, 2 micron, bla, pUC ori
pGV1730	P.sub.CUP1:Bs-alsS:T.sub.PDC1/PDC1-3' region:PDC1-5' region, TRP1, bla, pUC ori
pGV1914	P.sub.TEF1:Ll_kivD P.sub.TDH3:Dm_ADH PDC6 5', 3' targeting homology URA3 pUC ori bla(ampR)
pGV1974	P.sub.TEF1:Sc_ILV3.DELTA.N:P.sub.TDH3:Ec_ilvC.sup.Q110V-coSc:T.sub.CYC1, HIS3, 2 micron, bla, pUC ori bla(ampR)
pGV1981	P.sub.TEF1:Lactococcus lactis ilvD-coSc:P.sub.TDH3:Ec_ilvC.sup.Q110V-coSc:T.sub.CYC1, HIS3, 2 micron, bla, pUC ori
pGV2001	P.sub.TEF1:P.sub.TDH3:EC_ilvC.sup.Q110V-coSc:T.sub.CYC1, HIS3, 2 micron, bla, pUC ori

[0362] Plasmid pGV1103 was generated by inserting a linker (primer 271 annealed to primer 272) containing a myc-tag and a new MCS (SalI-EcoRI-SmaI-BamHI-NotI) into the SpeI and XhoI sites of p423GPD. The construction of plasmid pGV1730 is described in Example 4.

[0363] pGV1914 (SEQ ID NO: 117) is a yeast integrating vector that includes the S. cerevisiae URA3 gene as a selection marker and contains homologous sequence for targeting the HpaI-digested, linearized plasmid for integration at the PDC6 locus of S. cerevisiae. pGV1914 carries the D. melanogaster adh (Dm_ADH) (SEQ ID NO: 116) and L. lactis kivd (Ll_kivD) genes, expressed under the control of the S. cerevisiae TDH3 and TEF1 promoters, respectively. The open reading frame sequence of Dm_ADH was originally amplified by PCR from clone RH54514 (available from the Drosophila Genome Resource Center).

[0364] Plasmid pGV1974 is a yeast high copy plasmid with HIS3 as a marker for the expression of E. coli ilvC.sup.Q110V (SEQ ID NO: 98) and S. cerevisiae ILV3.DELTA.N (SEQ ID NO: 89). pGV1974 was generated by cloning a SacI-NotI fragment (4.9 kb, SEQ ID NO: 118) carrying the S. cerevisiae TEF1 promoter:S. cerevisiae ilv3.DELTA.N:S. cerevisiae TDH3 promoter:E. coli ilvC.sup.Q110V into the SacI-NotI sites of pGV1103 (5.4 kb), a yeast expression plasmid carrying the HIS3 marker.

[0365] Plasmid pGV1981 is a yeast high copy plasmid with HIS3 as a marker for the expression of E. coli ilvC.sup.Q110V and L. lactis ilvD. pGV1981 was generated by cloning a SalI-BamHI fragment (1.7 kb) carrying the L. lactis ilvD ORF (SEQ ID NO: 87 with SalI and BamHI sites introduced at the 5' and 3' ends, respectively) into the SalI-BamHI sites of pGV1974 (8.5 kb), replacing the S. cerevisiae Ilv3.DELTA.N ORF.

[0366] Plasmid pGV2001 is a yeast high copy plasmid with HIS3 as a marker for the expression of E. coli ilvC.sup.Q110V. pGV2001 was generated by digesting pGV1974 with SalI and BamHI to remove the S. cerevisiae Ilv3.DELTA.N ORF. The digest was treated with Klenow to fill in the 5' overhangs, and the larger 8.5 kb fragment was isolated and self-ligated.
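Filling in the SalI and BamHI 5' overhangs with Klenow, as in paragraph [0366], leaves blunt ends, so the junction formed on self-ligation can be predicted. The sketch below assumes the standard cut positions SalI G^TCGAC and BamHI G^GATCC; the source does not state the junction sequence, and the helper name is hypothetical.

```python
# Standard recognition sites with the top-strand cut index.
SALI = ("GTCGAC", 1)   # G^TCGAC
BAMHI = ("GGATCC", 1)  # G^GATCC


def fill_in_blunt_junction(left: tuple, right: tuple) -> str:
    """Top-strand sequence across a blunt ligation of two filled-in ends.

    For a palindromic site cut at index k, the 5' overhang spans site[k:len-k].
    Klenow extends the recessed 3' end across that overhang, so the upstream
    end keeps site[:len-k] and the downstream end keeps site[k:].
    """
    (left_site, k_left), (right_site, k_right) = left, right
    return left_site[: len(left_site) - k_left] + right_site[k_right:]


junction = fill_in_blunt_junction(SALI, BAMHI)  # promoter-side end + terminator-side end
print(junction)  # GTCGAGATCC
print("GTCGAC" in junction, "GGATCC" in junction)  # False False
```

Under these assumptions the fill-in adds four nucleotides from each overhang, and the religated junction regenerates neither the SalI nor the BamHI site.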

[0367] GEVO1803 was made by transforming GEVO1186 with the 6.7 kb pGV1730 (contains the S. cerevisiae TRP1 marker and the CUP1 promoter-driven B. subtilis alsS) that had been linearized by digestion with NruI. Completion of the digest was confirmed by running a small sample on a gel. The digested DNA was then purified using the Zymo Research DNA Clean and Concentrator and used in the transformation. Correct integration of Trp+ clones into the PDC1 locus was confirmed by colony PCR using primer pairs 1440+1441 and 1442+1443 for the 5' and 3' junctions, respectively. Expression of B. subtilis alsS was confirmed by qRT-PCR using primer pair 1323+1324.

[0368] GEVO2107 was made by transforming GEVO1803 with linearized, HpaI-digested pGV1914. Correct integration of pGV1914 at the PDC6 locus was confirmed by analyzing candidate Ura+ colonies by colony PCR using primers 1440 plus 1441, or 1443 plus 1633, to detect the 5' and 3' junctions of the integrated construct, respectively. Expression of all transgenes was confirmed by qRT-PCR using primer pairs 1321 plus 1322, 1587 plus 1588, and 1633 plus 1634 to examine Bs_alsS, Ll_kivD, and Dm_ADH transcript levels, respectively.

[0369] GEVO2107 was transformed with plasmids that contained either a KARI alone (pGV2001 with E. coli ilvC.sup.Q110V) or the same KARI with a DHAD (pGV1974 with the S. cerevisiae Ilv3.DELTA.N or pGV1981 with the L. lactis ilvD). Fermentations were carried out with three independent transformants for each DHAD homolog tested, as well as for the no-DHAD control plasmid. Seed cultures were grown in SCD-H medium to mid-log phase. The fermentations were initiated by collecting cells and resuspending them in 25 mL of SCD-H (5% glucose) medium to an OD.sub.600 of 1. Fermentations were performed aerobically in 125 mL unbaffled flasks shaken at 250 rpm at 30.degree. C. At 0, 24, 48 and 72 hours, the OD.sub.600 was checked and 2 mL samples were taken. These samples were centrifuged at 18,000.times.g in a microcentrifuge, and 1.5 mL of the clarified medium was transferred to a 1.5 mL Eppendorf tube. The clarified medium was stored at 4.degree. C. until analyzed by GC and HPLC as described in General Methods. At 24 and 48 hours, 2.5 mL of glucose from a 400 g/L stock solution was added to the cultures. FIG. 4 shows the production of isobutanol in these fermentations. All values were adjusted for the dilution caused by the volume change from adding glucose. Cells expressing the L. lactis ilvD produced an increased amount of isobutanol.
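Each glucose addition in paragraph [0369] delivers 1.0 g of glucose (2.5 mL of a 400 g/L stock) and dilutes the culture. The sketch below computes a cumulative correction factor for each time point, assuming the 2 mL sample is withdrawn before that time point's feed and ignoring evaporation; the sampling order and the function name are assumptions, not details from the source.

```python
from typing import Dict, List


def dilution_factors(v0_ml: float, times_h: List[int], sample_ml: float,
                     feeds_ml: Dict[int, float]) -> Dict[int, float]:
    """Multiplier that undoes dilution by earlier feeds at each time point.

    Assumes the sample at each time point is taken before that time point's feed.
    """
    volume, factor, factors = v0_ml, 1.0, {}
    for t in times_h:
        factors[t] = factor            # sample reflects only the feeds made so far
        volume -= sample_ml            # sample withdrawn
        feed = feeds_ml.get(t, 0.0)
        if feed:
            factor *= (volume + feed) / volume
            volume += feed
    return factors


factors = dilution_factors(25.0, [0, 24, 48, 72], 2.0, {24: 2.5, 48: 2.5})
print({t: round(f, 3) for t, f in factors.items()})
# {0: 1.0, 24: 1.0, 48: 1.119, 72: 1.249}
```

A measured isobutanol titre at time t would then be multiplied by `factors[t]`.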

Example 11

Assaying DHAD Activity in Fractionated Cell Extracts

[0370] The purpose of this Example is to describe how DHAD activity can be measured in fractionated cellular extracts that are enriched for either mitochondrial or soluble cytosolic components.

[0371] Plasmids pGV1106, pGV1662, pGV1855, and pGV1900 are described in Example 9 above; pGV2019 is described in Example 5. To measure the DHAD activities present in fractionated cell extracts, strain GEVO2244 was transformed singly with either pGV1106, which served as an empty vector control, or with one of pGV1855, pGV1900, or pGV2019, which are expression plasmids for L. lactis ilvD, S. cerevisiae ILV3 (full length), and S. cerevisiae ILV3.DELTA.N, respectively.

[0372] An independent clonal transformant for each plasmid was isolated, and a 1 L culture of each strain was grown in SCGal-Ura+9xIV at 30.degree. C. at 250 rpm. The OD.sub.600 was recorded, the cells were collected by centrifugation (1600.times.g, 2 min), and the culture medium was decanted. The cell pellets were resuspended in 50 mL sterile deionized water and collected by centrifugation (1600.times.g, 2 min), and the supernatant was discarded. The OD.sub.600 and total wet cell pellet weight of each culture are listed in Table 24, below:

TABLE-US-00026 TABLE 24 OD.sub.600 and pellet mass (g) of strain GEVO2244 transformed with the indicated plasmids.
Plasmid	OD.sub.600	Pellet mass (g)
pGV1106	2.2	7.6
pGV1855	2.3	7.7
pGV1900	1.3	3.8
pGV2019	2.6	8.4

[0373] To obtain spheroplasts, the cell pellets were resuspended in 0.1 M Tris-SO.sub.4, pH 9.3, to a final concentration of 0.1 g/mL, and DTT was added to a final concentration of 10 mM. Cells were incubated with gentle (60 rev/min) agitation on an orbital shaker for 20 min at 30.degree. C., and the cells were then collected by centrifugation (1600.times.g, 2 min) and the supernatant discarded. Each cell pellet was resuspended in spheroplasting buffer, which consists of (final concentrations) 1.2M sorbitol (Amresco, catalog #0691) and 20 mM potassium phosphate, pH 7.4, and the cells were then collected by centrifugation (1600.times.g, 10 min). Each cell pellet was resuspended in spheroplasting buffer to a final concentration of 0.1 g cells/mL in a 500 mL centrifuge bottle, and 50 mg of Zymolyase 20T (Seikagaku Biobusiness, Code#120491) was added to each cell suspension. The suspensions were incubated overnight (approximately 16 hrs) at 30.degree. C. with gentle agitation (60 rev/min) on an orbital shaker. The efficacy of spheroplasting was assessed by diluting an aliquot of each cell suspension 1:10 in either sterile water or spheroplasting buffer and comparing the aliquots microscopically (under 40.times. magnification). In all cases, >90% of the water-diluted cells lysed, indicating efficient spheroplasting. The spheroplasts were centrifuged (3000.times.g, 10 min, 20.degree. C.), and the supernatant was discarded. Each cell pellet was resuspended in 50 mL spheroplasting buffer without Zymolyase, and cells were collected by centrifugation (3000.times.g, 10 min, 20.degree. C.).
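For the pellet masses in Table 24, the 0.1 g/mL resuspension and 10 mM DTT step of paragraph [0373] imply the volumes computed below. The sketch treats 0.1 g/mL as grams of wet cells per mL of buffer (ignoring the cells' own volume) and assumes a 1 M DTT stock; both are assumptions, and the function names are hypothetical.

```python
# Wet pellet masses (g) from Table 24.
PELLET_G = {"pGV1106": 7.6, "pGV1855": 7.7, "pGV1900": 3.8, "pGV2019": 8.4}


def buffer_ml(pellet_g: float, g_per_ml: float) -> float:
    """Buffer volume for a target cell density, ignoring the cells' own volume."""
    return pellet_g / g_per_ml


def stock_ml(final_volume_ml: float, final_mM: float, stock_mM: float) -> float:
    """C1*V1 = C2*V2, ignoring the small volume the stock itself adds."""
    return final_volume_ml * final_mM / stock_mM


for plasmid, grams in PELLET_G.items():
    tris_ml = buffer_ml(grams, 0.1)            # 0.1 g/mL in 0.1 M Tris-SO4, pH 9.3
    dtt_ml = stock_ml(tris_ml, 10.0, 1000.0)   # 10 mM DTT from a hypothetical 1 M stock
    print(f"{plasmid}: {tris_ml:.0f} mL Tris-SO4, {dtt_ml:.2f} mL 1 M DTT")
```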

[0374] To fractionate spheroplasts, the cells were resuspended to a final concentration of 0.5 g/mL in ice-cold mitochondrial isolation buffer (MIB), consisting of (final concentrations) 0.6M D-mannitol (BD Difco Cat#217020) and 20 mM HEPES-KOH, pH 7.4. For each 1 mL of the resulting cell suspension, 0.01 mL of Yeast/Fungal Protease Arrest solution (G Biosciences, catalog #788-333) was added. The cell suspension was subjected to 35 strokes of a Dounce homogenizer with the B (tight) pestle, and the resulting suspension was centrifuged (2500.times.g, 10 min, 4.degree. C.) to pellet cell debris, unbroken cells, and unbroken spheroplasts. Following centrifugation, 2 mL of each sample (1 mL for the pGV1900-transformed cells) was saved in a 2 mL centrifuge tube on ice and designated the "W" (whole cell extract) fraction, while the remaining supernatant was transferred to a clean, ice-cold 35 mL Oakridge screw-cap tube and centrifuged (12,000.times.g, 20 min, 4.degree. C.) to pellet mitochondria and other organellar structures. Following centrifugation, 5 mL of each resulting supernatant was transferred to a clean tube on ice, taking care to avoid the small, loose pellet, and labelled the "S" (soluble cytosol) fraction. The resulting pellets were resuspended in MIB containing Protease Arrest solution and labelled the "P" ("pellet") fractions. Protein from the "P" fraction was released after 1:5 dilution in DHAD assay buffer (see above) by rapid mixing in a 1.5 mL tube with a Retsch Ball Mill MM301 in the presence of 0.1 mm glass beads. Bead beating was performed 4 times for 1 minute at 30 beats per second, after which insoluble debris was removed by centrifugation (20,000.times.g, 10 min, 4.degree. C.) and the soluble portion retained for use.

[0375] The BioRad Protein Assay reagent (BioRad, Hercules, Calif.) was used according to the manufacturer's instructions to determine the protein concentration of each fraction; the data are summarized in Table 25, below:

TABLE-US-00027 TABLE 25 Protein concentrations of mitochondrial/organellar (P) and cytosolic (S) fractions and whole cell (W) lysates, prepared as described in the text.
plasmid/fraction	protein [.mu.g/.mu.L]
1106 P	20.3
1855 P	17.7
1900 P	9.2
2019 P	19.7
1106 S	12.3
1855 S	12.9
1900 S	7.9
2019 S	12.4
1106 W	14.0
1855 W	15.0
1900 W	7.9
2019 W	14.7

[0376] The DHAD activity of each fraction was determined as follows. In a fresh 1.5 mL centrifuge tube, 50 .mu.L of each fraction was mixed with 50 .mu.L of 0.1M 2,3-dihydroxyisovalerate (DHIV), 25 .mu.L of 0.1 M MgSO.sub.4, and 375 .mu.L of 0.05M Tris-HCl pH 8.0, and the mixture was incubated for 30 min at 35.degree. C. Each reaction was carried out in triplicate. Each tube was then heated to 95.degree. C. for 5 min to inactivate any enzymatic activity, and the solution was centrifuged (16,000.times.g for 5 min) to pellet insoluble debris. To prepare samples for analysis, 100 .mu.L of each reaction was mixed with 100 .mu.L of a solution consisting of 4 parts 15 mM dinitrophenyl hydrazine (DNPH) in acetonitrile and 1 part 50 mM citric acid, pH 3.0, and the mixture was heated to 70.degree. C. for 30 min in a thermocycler. Analysis of ketoisovalerate by HPLC was carried out as described in General Methods. Data from the experiment are summarized below in Table 26.

TABLE-US-00028 TABLE 26 Specific activities (KIV generation) and ratios of specific activities from fractionated lysates of S. cerevisiae strain GEVO2244 carrying plasmids to overexpress the indicated DHAD homolog. Each data point is the result of triplicate samples.
Lysate (pGV# and fraction*)	DHAD	Sp. Activity [U/mg protein in fraction]	Std. Dev.	Ratio of Sp. Activities (Cyto or Mito to Whole-Cell)
1106 WCL	--	n.d.
1106 cyto	--	n.d.
1106 mito	--	n.d.
1855 WCL	Ll_ilvD	0.0006	4.7E-05
1855 cyto	Ll_ilvD	0.0011	0.0001	1.76
1855 mito	Ll_ilvD	2E-05	3.5E-05	0.03
1900 WCL	ScILV3(FL)	0.0096	0.0018
1900 cyto	ScILV3(FL)	0.0052	0.0004	0.54
1900 mito	ScILV3(FL)	0.0340	0.0029	3.53
*WCL, whole cell lysate; cyto, cytosolic-enriched fraction; mito, mitochondrial (organellar)-enriched fraction
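The ratio column of Table 26 divides each fraction's specific activity by the whole-cell value. Recomputing it from the rounded means in the table, as below, gives 1.83 and 3.54 rather than the reported 1.76 and 3.53, presumably because the published ratios were calculated from unrounded data. The Ll_ilvD mitochondrial mean (2E-05) is smaller than its standard deviation (3.5E-05), so its cytosol:mitochondria ratio is poorly determined.

```python
# Mean specific activities (U/mg protein) from Table 26.
TABLE_26 = {
    "Ll_ilvD": {"WCL": 0.0006, "cyto": 0.0011, "mito": 2e-05},
    "ScILV3(FL)": {"WCL": 0.0096, "cyto": 0.0052, "mito": 0.0340},
}


def ratios_to_whole_cell(activities: dict) -> dict:
    """Fraction specific activity divided by the whole-cell lysate value."""
    wcl = activities["WCL"]
    return {frac: round(activities[frac] / wcl, 2) for frac in ("cyto", "mito")}


for dhad, acts in TABLE_26.items():
    print(dhad, ratios_to_whole_cell(acts),
          "cyto/mito =", round(acts["cyto"] / acts["mito"], 2))
# Ll_ilvD {'cyto': 1.83, 'mito': 0.03} cyto/mito = 55.0
# ScILV3(FL) {'cyto': 0.54, 'mito': 3.54} cyto/mito = 0.15
```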

[0377] Cells overexpressing the L. lactis ilvD generated a significantly greater proportion of DHAD activity in the cytosolic fraction versus the mitochondrial fraction, whereas cells overexpressing the full-length, native (mitochondrial) S. cerevisiae ILV3 resulted in a greater proportion of the specific activity residing in the mitochondrial fraction.

Example 12

Alternative, Native Dehydratases with DHAD Activity

[0378] This example describes how DHAD activity, i.e., the conversion of 2,3-dihydroxyisovalerate to ketoisovalerate, is measured in S. cerevisiae overexpressing native dehydratases.

TABLE-US-00029 TABLE 27 Plasmids disclosed in Example 12.
pGV No.	Genotype
p426TEF	P.sub.TEF1:MCS:T.sub.CYC1, URA3, 2-micron, bla, pUC-ori (Mumberg, D. et al. (1995) Gene 156: 119-122; obtained from ATCC)
1102	P.sub.TEF1:HA-tag:MCS:T.sub.CYC1, URA3, 2-micron, bla, pUC-ori
1106	P.sub.TDH3:myc-tag:MCS:T.sub.CYC1, URA3, 2-micron, bla, pUC-ori
1662	P.sub.TEF1:Ll_kivd:T.sub.CYC1, URA3, 2-micron, bla, pUC-ori
1894	P.sub.TEF1:Ec_ilvC.sup.Q110V-coSc:T.sub.CYC1, URA3, 2-micron, bla, pUC-ori
2000	P.sub.TEF1:Sc_ILV3.DELTA.N:P.sub.TDH3:Ec_ilvC.sup.Q110V-coSc:T.sub.CYC1, URA3, 2-micron, bla, pUC-ori
2111	P.sub.TEF1:Ll_ilvD:P.sub.TDH3:Ec_ilvC.sup.Q110V-coSc:T.sub.CYC1, URA3, 2-micron, bla, pUC-ori
2112	P.sub.TEF1:Sc_LEU1:P.sub.TDH3:Ec_ilvC.sup.Q110V-coSc:T.sub.CYC1, URA3, 2-micron, bla, pUC-ori
2113	P.sub.TEF1:Sc_HIS3:P.sub.TDH3:Ec_ilvC.sup.Q110V-coSc:T.sub.CYC1, URA3, 2-micron, bla, pUC-ori

[0379] Plasmid pGV1102 was generated by inserting a linker (primer 269 annealed to primer 270) containing an HA-tag and a new MCS (SalI-EcoRI-SmaI-BamHI-NotI) into the SpeI and XhoI sites of p426TEF. Plasmids pGV1106 and pGV1662 are described in Examples 3 and 5, respectively. Plasmid pGV1894 is a yeast high copy plasmid with URA3 as a marker for the expression of E. coli ilvC.sup.Q110V and was generated by cloning a XhoI-NotI fragment (1.5 kb) carrying the E. coli ilvC.sup.Q110V ORF (SEQ ID NO: 98) into the SalI-NotI sites of pGV1662 (6.3 kb), replacing the L. lactis kivD ORF. Plasmids pGV2000, pGV2111, pGV2112, and pGV2113 are yeast high copy plasmids with URA3 as a marker for the expression of E. coli ilvC.sup.Q110V and a DHAD. pGV2000 is generated by cloning a SacI-NotI fragment (4.9 kb) from pGV1974 (described in Example 10) carrying the S. cerevisiae TEF1 promoter:S. cerevisiae Ilv3.DELTA.N:S. cerevisiae TDH3 promoter:E. coli ilvC.sup.Q110V into the SacI-NotI sites of pGV1106 (6.6 kb), a yeast expression plasmid carrying the URA3 marker. pGV2111 is generated by cloning a SalI-BamHI fragment (1.7 kb) carrying the L. lactis ilvD ORF (SEQ ID NO: 87 with SalI and BamHI sites introduced at the 5' and 3' ends, respectively) into the SalI-BamHI sites of pGV2000 (8.4 kb), replacing the S. cerevisiae Ilv3.DELTA.N ORF. pGV2112 is generated by cloning the S. cerevisiae LEU1 gene as a SalI-BamHI fragment (2.3 kb), generated by PCR with primers 2163 and 1842 and genomic DNA as template, into the SalI-BamHI sites of pGV2000 (8.4 kb), replacing the S. cerevisiae Ilv3.DELTA.N ORF. pGV2113 is generated by cloning the S. cerevisiae HIS3 gene as a SalI-BamHI fragment (0.7 kb), generated by PCR with primers 2183 and 2184 and genomic DNA as template, into the SalI-BamHI sites of pGV2000 (8.4 kb), replacing the S. cerevisiae Ilv3.DELTA.N ORF.

[0380] DHADs are tested for in vitro activity using whole cell lysates. The DHAD (S. cerevisiae Ilv3.DELTA.N) as well as LEU1 and HIS3 are expressed from pGV2000, pGV2112, and pGV2113, respectively, in GEVO2244 to minimize endogenous DHAD background activity. A plasmid that does not express DHAD, pGV1894, and a plasmid that expresses the L. lactis ilvD, pGV2111, are used as negative and positive controls, respectively.

[0381] To grow cultures for cell lysates, triplicate independent cultures of each desired strain are grown overnight in 3 mL YNBD+HLW+10xIV at 30.degree. C., 250 rpm. The following day, the overnight cultures are diluted 1:50 into 50 mL fresh YNBD+HLW+10xIV in a 250 mL baffle-bottomed Erlenmeyer flask and incubated at 30.degree. C. at 250 rpm. After approximately 10 hours, the OD.sub.600 of each culture is measured, and the cells of each culture are collected by centrifugation (2700.times.g, 5 min). The cell pellets are washed by resuspending them in 1 mL of water, the suspension is transferred to a 1.5 mL tube, and the cells are collected by centrifugation (16,000.times.g, 30 seconds). All supernatant is removed from each tube, and the tubes are frozen at -80.degree. C. until use.

[0382] Lysates are prepared by resuspending each cell pellet in 0.7 mL of lysis buffer consisting of 0.1 M Tris-HCl pH 8.0 and 5 mM MgSO4, supplemented with 10 µL of Yeast/Fungal Protease Arrest solution (G Biosciences, catalog #788-333) per 1 mL of lysis buffer. Eight hundred microliters of cell suspension are added to 1 mL of 0.5 mm glass beads in a chilled 1.5 mL tube. Cells are lysed by bead beating (6 rounds, 1 minute per round, 30 beats per second) with 2 minutes of chilling on ice between rounds. The tubes are then centrifuged (20,000×g, 15 min) to pellet debris, and the supernatants (cell lysates) are retained in fresh tubes on ice. The protein concentration of each lysate is measured using the Bio-Rad Bradford protein assay reagent (Bio-Rad, Hercules, Calif.) according to the manufacturer's instructions.

[0383] The DHAD activity of each lysate is determined as follows. In a fresh 1.5 mL centrifuge tube, 50 µL of lysate is mixed with 50 µL of 0.1 M 2,3-dihydroxyisovalerate (DHIV), 25 µL of 0.1 M MgSO4, and 375 µL of 0.05 M Tris-HCl pH 8.0, and the mixture is incubated for 30 min at 35°C. Each tube is then heated to 95°C for 5 min to inactivate all enzymatic activity, and the solution is centrifuged (16,000×g, 5 min) to pellet insoluble debris. To prepare samples for analysis, 100 µL of each reaction is mixed with 100 µL of a solution consisting of 4 parts 15 mM 2,4-dinitrophenylhydrazine (DNPH) in acetonitrile and 1 part 50 mM citric acid, pH 3.0, and the mixture is heated to 70°C for 30 min in a thermocycler. The solution is then analyzed by HPLC as described above in General Methods to quantify the ketoisovalerate (KIV) present in the sample.
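The volumes in [0383] fix the composition of the 500 µL reaction. The short Python sketch below computes final concentrations by simple dilution (C_final = C_stock × V_added / V_total). It assumes that the lysate carries the 0.1 M Tris-HCl and 5 mM MgSO4 of the [0382] lysis buffer and ignores the protease inhibitor. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Addition:
    """One liquid added to the reaction tube."""
    name: str
    volume_ul: float
    solutes_mm: dict = field(default_factory=dict)  # solute -> stock concentration (mM)


def final_concentrations(additions):
    """Return the total volume (µL) and the final mM of every solute after mixing."""
    total_ul = sum(a.volume_ul for a in additions)
    final_mm = {}
    for a in additions:
        for solute, stock_mm in a.solutes_mm.items():
            # C_final = C_stock * V_added / V_total, summed over every source of the solute
            final_mm[solute] = final_mm.get(solute, 0.0) + stock_mm * a.volume_ul / total_ul
    return total_ul, final_mm


# Volumes from [0383]; lysate composition assumed to match the [0382] lysis buffer.
reaction = [
    Addition("lysate", 50, {"Tris": 100.0, "MgSO4": 5.0}),
    Addition("0.1 M DHIV", 50, {"DHIV": 100.0}),
    Addition("0.1 M MgSO4", 25, {"MgSO4": 100.0}),
    Addition("0.05 M Tris-HCl pH 8.0", 375, {"Tris": 50.0}),
]

total_ul, final_mm = final_concentrations(reaction)
print(f"total volume: {total_ul} µL")
for solute, conc in sorted(final_mm.items()):
    print(f"{solute}: {conc:.1f} mM")
```

Under these assumptions, the reaction contains 10 mM DHIV, 5.5 mM MgSO4, and 47.5 mM Tris in 500 µL.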

[0384] DHADs are tested for in vitro activity using whole cell lysates. The DHADs are expressed in a yeast deficient for DHAD activity (GEVO2244; ilv3.DELTA.) to minimize endogenous background activity.

Example 13

Cloning of Low-Abundance, Endogenous Cytosolic Iron-Sulfur Cluster Assembly Machinery for Overexpression in S. cerevisiae

[0385] The purpose of this example is to describe how three known components of the S. cerevisiae cytosolic iron-sulfur assembly machinery were cloned for overexpression in S. cerevisiae, with the aim of increasing cytosolic DHAD activity.

[0386] In the yeast S. cerevisiae, at least four genes (CIA1, CFD1, NAR1, and NBP35) encode activities that contribute to the proper assembly and/or transfer of iron-sulfur [Fe-S] clusters of cytosolic proteins. Of these four genes, three (CFD1, NAR1, and NBP35) have been shown to be expressed at very low levels during aerobic growth on glucose (Ghaemmaghami et al., 2003, Nature, 425: 737-741). These three genes are therefore attractive candidates for overexpression to increase the cellular capacity for proper assembly of cytosolic [Fe-S] cluster proteins.

TABLE 27 Plasmids disclosed in Example 13.
pGV No. | Genotype
pGV2074 | pUC ori, bla (AmpR), 2 µm ori, TPI1 promoter-hph (HygroR), PGK1 promoter, TEF1 promoter, TDH3 promoter
pGV2127 | pUC ori, bla (AmpR), 2 µm ori, TPI1 promoter-hph (HygroR), PGK1 promoter, TEF1 promoter, TDH3 promoter-CFD1
pGV2138 | pUC ori, bla (AmpR), 2 µm ori, TPI1 promoter-hph (HygroR), PGK1 promoter, TEF1 promoter-NAR1, TDH3 promoter-CFD1
pGV2144 | pUC ori, bla (AmpR), 2 µm ori, TPI1 promoter-hph (HygroR), PGK1 promoter-NBP35, TEF1 promoter, TDH3 promoter
pGV2147 | pUC ori, bla (AmpR), 2 µm ori, TPI1 promoter-hph (HygroR), PGK1 promoter-NBP35, TEF1 promoter-NAR1, TDH3 promoter-CFD1

[0387] To clone the sequences for CFD1, NAR1, and NBP35 into an appropriate S. cerevisiae expression vector, the following steps were carried out. Vector pGV2074 (SEQ ID NO: 133) was used as the parental plasmid for the cloning steps described below. The salient features of pGV2074 include a bacterial origin of replication (pUC) and selectable marker (bla), an S. cerevisiae 2 µm origin of replication and selectable marker (the hph gene, conferring resistance to hygromycin, operably linked to the TPI1 promoter region), and sequences containing the S. cerevisiae promoters of the PGK1, TDH3, and TEF1 genes, each followed by one or more unique restriction sites to facilitate the introduction of coding sequences.

[0388] First, the CFD1 coding sequence was amplified from S. cerevisiae genomic DNA by PCR using primers 2195 and 2196, which also added 5' XhoI and 3' NotI sites, respectively. The resulting ~890 bp product was digested with XhoI plus NotI and ligated into pGV2074 that had been digested with XhoI plus NotI, yielding plasmid pGV2127. All sequences amplified by PCR were confirmed by DNA sequencing. Next, the NAR1 coding sequence was amplified from S. cerevisiae genomic DNA by PCR using primers 2197 and 2198, which added 5' SalI and 3' BamHI sites, respectively. The resulting ~1485 bp product was digested with SalI plus BamHI and cloned into pGV2127 that had also been digested with SalI plus BamHI, yielding pGV2138. Next, the NBP35 coding sequence was amplified from S. cerevisiae genomic DNA by PCR using primers 2259 and 2260, which added a 5' BglII site and 3' KpnI and XhoI sites (in 5'-to-3' order), respectively. The resulting ~995 bp product was digested with BglII plus XhoI and ligated into pGV2074 that had been digested with BglII plus SalI (XhoI and SalI ends are compatible), yielding pGV2144. Finally, pGV2144 was digested with AvrII plus BamHI, and the resulting 1.78 kb fragment (containing the PGK1 promoter and the NBP35 ORF) was gel purified and ligated into vector pGV2138 that had been digested with AvrII plus BglII (BamHI and BglII ends are compatible), yielding pGV2147.
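In [0388], the XhoI end of the NBP35 fragment is joined to a SalI end of pGV2074, and the BamHI end of the pGV2144 fragment is joined to a BglII end of pGV2138. The minimal Python sketch below computes the sequence formed at such a junction and checks whether either parental enzyme can recut it. It uses standard recognition sites and cut positions, and the helper names are hypothetical.

```python
# Recognition site and top-strand cut offset for the enzymes paired in [0388].
ENZYMES = {
    "XhoI": ("CTCGAG", 1),
    "SalI": ("GTCGAC", 1),
    "BamHI": ("GGATCC", 1),
    "BglII": ("AGATCT", 1),
}


def junction(left_enzyme, right_enzyme):
    """Top-strand sequence formed when a left piece cut with left_enzyme is
    ligated to a right piece cut with right_enzyme.

    Assumes both enzymes leave the same 5' overhang, so the left half of one
    site is joined to the right half (overhang included) of the other.
    """
    left_site, left_cut = ENZYMES[left_enzyme]
    right_site, right_cut = ENZYMES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


def recut_by(seq):
    """Enzymes from the table whose recognition site occurs in seq."""
    return sorted(name for name, (site, _) in ENZYMES.items() if site in seq)


for left, right in [("SalI", "XhoI"), ("XhoI", "SalI"), ("BglII", "BamHI"), ("SalI", "SalI")]:
    site = junction(left, right)
    print(f"{left} end + {right} end -> {site}, recut by: {recut_by(site) or 'none'}")
```

Hybrid SalI/XhoI and BglII/BamHI junctions are not recognized by either parental enzyme, whereas a SalI/SalI junction regenerates the SalI site.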

Example 14

Cloning of Heterologous Cytosolic Iron-Sulfur Cluster Assembly Machinery for Overexpression in S. cerevisiae

[0389] The purpose of this example is to describe how one or more cytosolic iron-sulfur assembly machinery components, from various species, can be cloned to permit their overexpression in S. cerevisiae, thereby increasing cytosolic DHAD activity.

[0390] In addition to the endogenous cytosolic iron-sulfur assembly machinery found in S. cerevisiae, homologous sequences and activities have been identified in other microbial and eukaryotic species. For example, the ApbC protein of Salmonella enterica serovar Typhimurium has been shown, in vitro, to bind iron-sulfur clusters and transfer them effectively to Leu1, a known cytosolic [Fe-S] cluster-containing S. cerevisiae substrate (Boyd et al., 2008, Biochemistry, 47: 8195-202). Other homologs of the known S. cerevisiae cytosolic iron-sulfur assembly machinery components likewise exist and are attractive candidates for overexpression in S. cerevisiae. Table 28 lists several exemplary homologs and their GenBank accession numbers, as identified by previous homology searches (Boyd et al., 2009, J. Biol. Chem. 284: 110-118). The table also includes two closely related S. cerevisiae homologs, Nbp35 and Cfd1. Of note, Ind1 is reported to be localized to and functional in the mitochondria (Bych et al., 2008, EMBO J. 27: 1736-46), whereas Hcf101 is reported to participate in iron-sulfur cluster assembly in Arabidopsis chloroplasts (Lezhneva et al., 2004, Plant J. Cell Mol Biol 37: 174-185).

TABLE 28 Functionally homologous proteins involved in iron-sulfur cluster formation.
Gene | Source | Accession Number
ApbC | Salmonella enterica serovar Typhimurium LT2 | NP_461098
Ind1 | Yarrowia lipolytica | YALI0B18590g
Hcf101 | Arabidopsis thaliana | AAR97892.1
Nubp1 | Homo sapiens | NP_002475.2
Nbp35 | S. cerevisiae | CAA96797.1
Cfd1 | S. cerevisiae | AAS56623

[0391] The cloning of one or more of these genes is carried out using techniques well known to one skilled in the art. Oligonucleotide primers are designed that are homologous to the 5' and 3' ends of each desired reading frame and that also incorporate a restriction site convenient for cloning each reading frame into vector pGV2074. A standard PCR reaction is used to amplify each gene, either from the genome of each host organism or from an in vitro synthesized DNA fragment, and the resulting PCR product is cloned into the expression vector pGV2074. For a protein known to be targeted to the mitochondria, such as Yarrowia lipolytica Ind1, PCR primers are designed to amplify the majority of the coding sequence while excluding the known N-terminal mitochondrial targeting sequence (Bych et al., 2008, EMBO J. 27: 1736-46).
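The primer design described in [0391] can be sketched as shown below. This is illustrative only. The 20-nt annealing length, the 4-nt "GCGC" clamp that gives the enzyme room to cut near the end of the product, the toy ORF, and the function names are all assumptions, not the primers used in this work. The function also rejects ORFs that contain an internal copy of either cloning site.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_cloning_primers(orf, five_prime_site, three_prime_site, anneal_len=20, clamp="GCGC"):
    """Return (forward, reverse) primers, both written 5'->3'.

    forward = clamp + 5' site + first anneal_len bases of the ORF
    reverse = clamp + reverse complement of 3' site
              + reverse complement of the last anneal_len bases of the ORF
    """
    orf = orf.upper()
    for site in (five_prime_site, three_prime_site):
        # An internal copy of a cloning site would cut the insert during digestion.
        if site in orf or reverse_complement(site) in orf:
            raise ValueError(f"ORF contains an internal {site} site")
    forward = clamp + five_prime_site + orf[:anneal_len]
    reverse = clamp + reverse_complement(three_prime_site) + reverse_complement(orf[-anneal_len:])
    return forward, reverse


# Toy ORF (not a real gene) cloned as a SalI (GTCGAC) - BamHI (GGATCC) fragment.
toy_orf = "ATGGCTAAAGAAACCGTTCTGCTGAAAGAAGCTGGTGCTTAA"
fwd, rev = design_cloning_primers(toy_orf, "GTCGAC", "GGATCC")
print("forward:", fwd)
print("reverse:", rev)
```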

Example 15

Overexpression of S. cerevisiae Cytosolic Iron-Sulfur Assembly Machinery to Increase Cytosolic DHAD Activity

[0392] The purpose of this example is to describe how a plasmid expressing one or more iron-sulfur assembly machinery components is co-expressed with a DHAD, thereby increasing the cytosolic activity of the DHAD.

[0393] Strain GEVO2244 is co-transformed with one of pGV1851, pGV1852, pGV1853, pGV1854, pGV1855, pGV1904, pGV1905, pGV1906, or pGV1907 (pGV1851-55 and pGV1904-07 are described in Table 20), plus either pGV2074 (Table 27), which serves as an empty-vector control, or pGV2147 (Table 27), which serves as the cytosolic Fe-S cluster machinery overexpression plasmid. Doubly transformed cells are selected by plating onto SCD-Ura+9xIV containing 0.1 g/L hygromycin B.

[0394] Three independent isolates from each transformation are cultured in SCD-Ura+9xIV containing 0.1 g/L Hygromycin B to obtain a cell mass suitable for preparation of a lysate, as described in Example 3. Lysates are prepared from each culture, and the resulting lysates are assayed for DHAD activity as described in Example 3. To further confirm that the increased DHAD activity is due specifically to increased cytosolic activity, cultures of GEVO2244 containing pGV1855 plus either pGV2074 or pGV2147 are grown in SCD-Ura+9xIV containing 0.1 g/L Hygromycin B as otherwise described in Example 11. Fractionated lysates are prepared and in vitro assays to measure DHAD activity are further carried out as described in Example 11.

Example 16

Deletion of LEU1

[0395] The purpose of this example is to describe the deletion of LEU1 to increase the iron-sulfur cluster availability in the yeast cytosol.

TABLE 29 Plasmids disclosed in Example 16.
pGV No. | Genotype
pGV1299 | K. lactis URA3, bla, pUC-ori (GEVO)
pGV1981 | P_TEF1:Lactococcus lactis ilvD-coSc:P_TDH3:Ec_ilvC^Q110V-coSc:T_CYC1, HIS3, 2-micron, bla, pUC-ori
pGV2001 | P_TEF1:P_TDH3:Ec_ilvC^Q110V-coSc:T_CYC1, HIS3, 2-micron, bla, pUC-ori

[0396] The LEU1 gene was deleted by transforming cells with a leu1::K. lactis URA3 deletion cassette generated by two rounds of PCR. First, the K. lactis URA3 gene was amplified from pGV1299 (described in Example 2) with primers 2171 and 2172. These primers add 40 bp of LEU1 promoter and terminator sequence to the 5' and 3' ends of the K. lactis URA3 gene, respectively. This PCR product was then used as the template for a second PCR with primers 2170 and 2173. Primer 2170 adds an additional 36 bp of LEU1 promoter sequence at the 5' end, and primer 2173 adds an additional 38 bp of LEU1 terminator sequence at the 3' end. This PCR product was transformed into GEVO2244 (described in Example 2) to generate GEVO2570. The 5' junction of the integration was confirmed by colony PCR using primers 2226 and 587, and the 3' junction by colony PCR using primers 588 and 2175. Loss of the LEU1 gene was confirmed by the absence of a PCR product with primers 2167 and 2227.
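The homology arithmetic of [0396] can be sketched in Python as follows. The sketch assumes that each round's tails are contiguous with the previous ones and that the arms directly flank the LEU1 ORF. The placeholder sequences and function names are hypothetical. After two rounds, the marker carries 76 bp of LEU1 promoter homology (40 + 36) and 78 bp of terminator homology (40 + 38).

```python
def add_tails(product, upstream_tail, downstream_tail):
    """Model a PCR whose primers carry 5' tails: the tails end up flanking the template."""
    return upstream_tail + product + downstream_tail


def two_round_cassette(marker, promoter, terminator, round1=(40, 40), round2=(36, 38)):
    """Build a deletion cassette by two rounds of tailed PCR.

    promoter:      sequence upstream of the target ORF, ending just before the ATG
    terminator:    sequence downstream of the target ORF, starting just after the stop codon
    round1/round2: (upstream bp, downstream bp) of homology added in each round
    """
    up1, down1 = round1
    up2, down2 = round2
    first = add_tails(marker, promoter[-up1:], terminator[:down1])
    # Round 2 extends both arms outward with the next stretch of flanking sequence.
    second = add_tails(first, promoter[-(up1 + up2):-up1], terminator[down1:down1 + down2])
    return first, second


# Placeholder sequences (not real LEU1 or URA3 sequence).
promoter = "ACGT" * 50
terminator = "TTGCA" * 40
marker = "ATG" + "C" * 100 + "TAA"

first, second = two_round_cassette(marker, promoter, terminator)
print(len(first) - len(marker), "bp of homology after round 1")   # 80
print(len(second) - len(marker), "bp of homology after round 2")  # 154
```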

[0397] Like its parent GEVO2244, GEVO2570 carries a deletion of ILV3. GEVO2570 is used to measure DHAD activity with overexpression of L. lactis ilvD as described in Examples 2 and 4. A plasmid without a DHAD (pGV2001) is used as a negative control.

Example 17

Conserved Motif Amongst Cytosolically Active DHAD Enzymes

[0398] This example illustrates that DHAD enzymes with a specific amino acid sequence motif are more likely to be functional when expressed in the yeast cytosol.

[0399] Based on data from biochemical assays (see Example 10), several DHAD homologs were identified that exhibit at least some cytosolic activity. A total of ten different homologs were tested using biochemical assays. The DHADs were expressed from 2 micron yeast vectors transformed into GEVO2244. The homologs were then ranked by their measured specific activity in both whole cell lysates and cytosolic fractions.

[0400] Based on these data, four DHAD homologs exhibit cytosolic DHAD activity: L. lactis (SEQ ID NO: 18), G. forsetii (SEQ ID NO: 17), Acidobacteria (SEQ ID NO: 16), and S. erythraea (SEQ ID NO: 19). Four DHAD homologs exhibit no cytosolic DHAD activity: R. eutropha (SEQ ID NO: 22), C. salexigens (SEQ ID NO: 23), P. torridus (SEQ ID NO: 24), and S. tokodaii (SEQ ID NO: 25). One motif-containing homolog, Piromyces sp. E2 (SEQ ID NO: 21), was inconclusive: it did not complement the GEVO2242 valine auxotrophy but had detectable biochemical DHAD activity. Because this homolog has a putative organellar targeting sequence, the protein is likely localized to the mitochondria, which would explain its inability to complement the GEVO2242 auxotrophy despite containing the motif.

[0401] A multiple sequence alignment (MSA) was created using the Align Multiple Sequences tool of Clone Manager 9 Professional Edition software with the "MultiWay" function. This function performs exhaustive pairwise global alignments of all sequences and progressively assembles the alignments using Neighbor-Joining phylogeny. A total of 53 representative DHAD homologs (FIG. 5) were aligned using the BLOSUM62 scoring matrix. This alignment generated the tree shown in FIG. 5.

[0402] Many of the DHAD homologs exhibiting cytosolic activity (e.g., L. lactis, G. forsetii, Acidobacteria, and S. erythraea) share more than 40% overall homology with the S. cerevisiae DHAD encoded by ILV3. However, the 40% homology cut-off also includes several DHAD homologs that do not exhibit cytosolic DHAD activity (e.g., R. eutropha, C. salexigens, P. torridus, and S. tokodaii). The Piromyces sp. E2 DHAD failed to complement in the genetic/biochemical assay, but this result is still consistent with the motif hypothesis because the protein retained its mitochondrial localization signal. A common sequence motif, unique to DHAD homologs that are cytosolically active, was therefore identified: P(I/L)XXXGX(I/L)XIL (SEQ ID NO: 27), where (I/L) indicates an isoleucine or leucine at that position and X indicates any natural or non-natural amino acid. This motif is found in all DHAD homologs exhibiting cytosolic activity. Furthermore, a more specific version of this motif, PIKXXGX(I/L)XIL (SEQ ID NO: 28), is conserved in all DHAD homologs exhibiting cytosolic activity except the S. erythraea DHAD. This motif is conserved among the majority, if not all, of the eukaryotic DHAD homologs.

[0403] Six additional DHAD homologs were identified: SEQ ID NOs: 10-15 as specified in Table 1. These DHAD homologs (SEQ ID NOs: 10-15) contain the motifs PYHKEGGLGIL (SEQ ID NO: 145), PYSEKGGLAIL (SEQ ID NO: 146), PYKPEGGIAIL (SEQ ID NO: 147), PLKPSGHLQIL (SEQ ID NO: 148), PIKKTGHLQIL (SEQ ID NO: 149), and PIKETGHIQIL (SEQ ID NO: 150), respectively.
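To scan a protein sequence for the motifs defined in [0402], the motifs can be written as regular expressions. The Python sketch below is illustrative, and the function name is hypothetical. It treats X as any single residue and (I/L) strictly as isoleucine or leucine, and applies both patterns to the six peptides listed in [0403]. The printed positions show which peptides satisfy each pattern under this literal reading.

```python
import re

# [0402] motifs as regular expressions: "(I/L)" -> "[IL]", "X" (any residue) -> ".".
GENERAL_MOTIF = re.compile(r"P[IL]...G.[IL].IL")  # P(I/L)XXXGX(I/L)XIL, SEQ ID NO: 27
STRICT_MOTIF = re.compile(r"PIK..G.[IL].IL")      # PIKXXGX(I/L)XIL, SEQ ID NO: 28


def find_motifs(protein):
    """0-based start positions of each motif in a one-letter protein sequence.

    re.finditer reports non-overlapping matches only.
    """
    return {
        "general": [m.start() for m in GENERAL_MOTIF.finditer(protein)],
        "strict": [m.start() for m in STRICT_MOTIF.finditer(protein)],
    }


# The six motif-region peptides listed in [0403] (SEQ ID NOs: 145-150).
peptides = ["PYHKEGGLGIL", "PYSEKGGLAIL", "PYKPEGGIAIL",
            "PLKPSGHLQIL", "PIKKTGHLQIL", "PIKETGHIQIL"]
for pep in peptides:
    print(pep, find_motifs(pep))
```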

Example 18

Use of Cytosolically Localized DHADs for the Production of Isobutanol

[0404] The following example illustrates the use of DHADs that are active in the yeast cytosol and that, when expressed as part of an isobutanol biosynthetic pathway, lead to isobutanol production.

[0405] A yeast strain containing one integrated copy of the B. subtilis alsS gene codon-optimized for expression in S. cerevisiae (SEQ ID NO: 144), two integrated copies of the L. lactis kivD gene (SEQ ID NOs: 99 and 151), one integrated copy of the L. lactis adhA^RE1 gene (SEQ ID NO: 152), and one integrated copy of the S. cerevisiae AFT1 gene (SEQ ID NO: 153) was transformed with high-copy, three-component isobutanol pathway plasmids. Each plasmid contains a KARI (Ec_ilvC_coSc^P2D1-A1-his6, SEQ ID NO: 154), an ADH (L. lactis adhA^RE1, SEQ ID NO: 152), and a DHAD expressed from the S. cerevisiae PDC1-286 promoter. The DHAD varied as listed in Table 31. The isobutanol titer and DHAD activity of each strain were compared with those of a control strain whose plasmid did not express a DHAD. Strains, plasmids, and DHADs are listed in Tables 30, 31, and 32, respectively.

TABLE 30 Genotype of strains disclosed in Example 18.
GEVO No. | Genotype
GEVO3868 | S. cerevisiae, CEN.PK2, MATa ura3 leu2 his3 trp1 gpd1::T_Kl_URA3 gpd2::T_Kl_URA3 tma29::T_Kl_URA3 pdc1::P_PDC1-Ll_kivD2_coSc5-P_FBA1-LEU2-T_LEU2-P_ADH1-Bs_alsS1_coSc-T_CYC1-P_PGK1-Ll_kivD2_coEc-P_ENO2-Sp_HIS5 pdc5::T_Kl_URA3 pdc6::P_TDH3-Sc_AFT1-P_ENO2-Ll_adhA^RE1-T_Kl_URA3_short-P_FBA1-Kl_URA3-T_Kl_URA3 {evolved for C2 supplement-independence, glucose tolerance and faster growth}

TABLE 31 Plasmids disclosed in Example 18.
Plasmid Name | DHAD | Genotype
pGV2663 | none | P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2635 | L. lactis | P_PDC1-286-Ll_ilvD_coSc, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2671 | S. cerevisiae ilv3ΔN20 | P_PDC1-286-Sc_ilv3_ΔN20, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2672 | G. forsetii | P_PDC1-286-Gf_ilvD_coSc, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2673 | S. erythraea | P_PDC1-286-Se_ilvD_coSc, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2674 | F. tularensis | P_PDC1-286-Ft_ilvD_coSc, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2675 | S. cerevisiae ilv3ΔN19 | P_PDC1-286-Sc_ilv3_ΔN19, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2676 | S. cerevisiae ilv3ΔN23 | P_PDC1-286-Sc_ilv3_ΔN23, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2677 | N. crassa ilvD2 | P_PDC1-286-Nc_ilvD2_coSc, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2678 | Acidobacteria bacterium | P_PDC1-286-Ab_ilvD_coSc, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2679 | Acaryochloris marina | P_PDC1-286-Am_ilvD_coSc, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2680 | Lyngbya spp. | P_PDC1-286-Lsp_ilvD_coSc, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r
pGV2681 | E. coli | P_PDC1-286-Ec_ilvD_coKl, P_TDH3-Ec_ilvC_coSc^P2D1-A1-his6, P_ENO2-Ll_adhA^RE1, 2µ ori, pUC ori, bla, G418r

TABLE 32 DHAD sequences disclosed in Example 18.
DHAD | Abbreviation | SEQ ID NO (DNA) | SEQ ID NO (protein)
L. lactis | Ll_ilvD_coSc | 155 | 18
S. cerevisiae ilv3ΔN20 | Sc_ilv3_ΔN20 | 89 | 26
G. forsetii | Gf_ilvD_coSc | 90 | 17
S. erythraea | Se_ilvD_coSc | 91 | 19
F. tularensis | Ft_ilvD_coSc | 156 | 14
S. cerevisiae ilv3ΔN19 | Sc_ilv3_ΔN19 | 157 | 163
S. cerevisiae ilv3ΔN23 | Sc_ilv3_ΔN23 | 158 | 164
N. crassa ilvD2 | Nc_ilvD2_coSc | 159 | 165
A. bacterium | Ab_ilvD_coSc | 92 | 16
A. marina | Am_ilvD_coSc | 160 | 166
Lyngbya spp. | Lsp_ilvD_coSc | 161 | 167
E. coli | Ec_ilvD_coKl | 162 | 168

[0406] Cloning techniques included digestion with restriction enzymes, gel purification of DNA fragments (Zymoclean Gel DNA Recovery Kit, Cat# D4002, Zymo Research Corp, Orange, Calif.), ligation of two DNA fragments using the DNA Ligation Kit (Mighty Mix, Cat# TAK 6023, Clontech Laboratories, Madison, Wis.), and bacterial transformation into competent E. coli cells (Xtreme Efficiency DH5α Competent Cells, Cat# ABP-CE-CCO2096P, Allele Biotechnology, San Diego, Calif.). Plasmid DNA was purified from E. coli cells using the Qiagen QIAprep Spin Miniprep Kit (Cat# 27106, Qiagen, Valencia, Calif.).

[0407] Yeast media used for this example include YP medium (1% (w/v) yeast extract, 2% (w/v) peptone), YPD medium (YP medium containing 2% (w/v) glucose), and YPD supplemented with glycerol and ethanol (YPD medium containing 1% (v/v) of 80% glycerol and 1% (v/v) ethanol). The antibiotic G418 was added to agar plates at a final concentration of 0.2 g/L. Precultures were grown in YP medium supplemented with 5% glucose, 1% ethanol, and 0.2 g/L G418. Fermentations were carried out in YP medium containing 8% glucose, 1% (v/v) of ergosterol and Tween-80 in 100% ethanol, 200 mM MES (pH 6.5), and 0.2 g/L G418.

[0408] A large patch of S. cerevisiae strain GEVO3868 was grown on a YPD plate. Cells from the patch were scraped from the plate, resuspended in 2 mL YPD containing 1% (v/v) ethanol and 1% (v/v) of 80% glycerol, and incubated overnight in a 30°C orbital shaker. The following morning, 1 mL of the overnight culture was used to inoculate 50 mL of the same medium, which was returned to the 30°C orbital shaker. After 6 hours, the cells were at an OD600 of 0.55. They were diluted to an OD600 of 0.1 in the same medium and grown overnight at 30°C. In the morning, the cells were diluted to an OD600 of 0.6 and grown for 3 hours at 30°C until the OD600 reached 1.1, and the cells were collected by centrifugation at 2700 rcf for 2 min at room temperature. The medium was removed, the cells were washed with 50 mL sterile milliQ water, and the cells were centrifuged for 2 min at 2700 rcf at room temperature. After the supernatant was removed, the cells were washed with 25 mL sterile milliQ water and centrifuged at 2700 rcf for 2 min at room temperature. The supernatant was removed, and the cells were resuspended in 1 mL 100 mM lithium acetate. The cells were centrifuged for 10 sec, the supernatant was removed, and the cells were resuspended in 400 µL 100 mM lithium acetate. The cells were transformed as follows. First, a mixture of plasmid DNA (brought to 15 µL with sterile water), 72 µL 50% PEG, 10 µL 1 M lithium acetate, and 3 µL denatured salmon sperm DNA (10 mg/mL) was prepared for each transformation. In a sterile 1.5 mL tube, 15 µL of the cell suspension was added to the DNA mixture (100 µL), and the transformation suspension was vortexed with 5 short pulses. The transformation was incubated for 30 min at 30°C, followed by 22 min at 42°C. The cells were collected by centrifugation (18,000×g, 10 seconds, 25°C).
After the supernatant was removed, the cells were resuspended in 400 µL YPD. After overnight recovery with shaking at 30°C and 250 RPM, the cells were spread onto selective plates (YPD containing 0.2 g/L G418). Transformants were then single-colony purified on selective plates.
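The OD600 readings in [0408] allow a rough estimate of the growth rate before transformation. The small Python sketch below assumes purely exponential growth between the two readings, which is an approximation, and readings within the spectrophotometer's linear range. The function name is hypothetical.

```python
import math


def doubling_time_h(od_start, od_end, hours):
    """Doubling time from two OD600 readings, assuming exponential growth."""
    if od_end <= od_start:
        raise ValueError("OD600 must increase over the interval")
    growth_rate = math.log(od_end / od_start) / hours  # specific growth rate, 1/h
    return math.log(2) / growth_rate


# Final pre-transformation growth step in [0408]: OD600 0.6 -> 1.1 in 3 h.
print(f"{doubling_time_h(0.6, 1.1, 3.0):.2f} h")  # 3.43 h
```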

[0409] For fermentations, 3 mL cultures of GEVO3868 transformed with each 2µ plasmid were started in YPD containing 1% ethanol and 0.2 g/L G418 and incubated overnight at 30°C and 250 RPM. There were three biological replicates of each strain, for 39 cultures in total. After the OD600 of these cultures was measured the next day, the appropriate volume of culture was used to inoculate 50 mL of YP with 5% glucose, 1% ethanol, and 0.2 g/L G418 (in a baffled flask) to an OD600 of approximately 0.1. These cultures were incubated overnight at 30°C and 250 RPM. The next day, the cultures containing S. cerevisiae ilv3ΔN20, ilv3ΔN19, and ilv3ΔN23 had not reached an OD600 of 5 (OD600 0.6-2.4), so their incubation was continued for another 24 h at 30°C and 250 RPM. The remaining 30 cultures had reached an OD600 of approximately 5 and were centrifuged in 50 mL Falcon tubes at 2700 rcf for 5 min at 25°C. The cells from these 30 cultures were resuspended in 50 mL YP with 8% glucose, 1% (v/v) ethanol, ergosterol, Tween-80, 200 mM MES (pH 6.5), and 0.2 g/L G418. The cultures were transferred to 250 mL unbaffled flasks with closed screw caps and incubated at 30°C and 75 RPM. The next day, the remaining 9 cultures had reached a higher OD600 (3-5) and were prepared for fermentation as described above. At 24 and 48 h after transfer to the 250 mL unbaffled flasks with closed screw caps, samples were taken from each of the 39 flasks to determine OD600 and prepared for gas chromatography as follows. A 2 mL sample was removed from each flask and its OD600 was determined. The remaining sample was centrifuged for 10 min at maximum speed, and 1 mL of the supernatant was analyzed by gas chromatography as described. For the final 72 h time point, the same procedures were used for measuring OD600 and for gas chromatography analysis.
In addition, samples were analyzed by high performance liquid chromatography, and cells were prepared for enzyme assays. Three 15 mL Falcon tubes per flask (117 in total) were weighed, and 14 mL of the appropriate sample was transferred into each tube. After centrifugation at 3000×g for 5 min at 4°C, the supernatant was removed and the cells were washed in 3 mL cold, sterile water. The tubes were centrifuged as above for 2 min, the supernatant was removed, and the tubes were reweighed to determine the total cell weight. The Falcon tubes were stored at -80°C.
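Inoculating each 50 mL flask "to an OD600 of approximately 0.1" in [0409] is a C1V1 = C2V2 calculation. The minimal Python sketch below uses hypothetical overnight OD600 values and assumes that 50 mL is the total culture volume after inoculation. The function name is illustrative.

```python
def inoculum_volume_ml(od_culture, od_target, final_volume_ml):
    """Culture volume that gives od_target in final_volume_ml (C1*V1 = C2*V2).

    Treats final_volume_ml as the total volume after adding the inoculum.
    """
    if od_culture <= od_target:
        raise ValueError("culture is too dilute to reach the target OD600")
    return od_target * final_volume_ml / od_culture


# Hypothetical overnight OD600 readings; target OD600 0.1 in 50 mL as in [0409].
for od in (2.5, 4.0, 6.2):
    v = inoculum_volume_ml(od, 0.1, 50.0)
    print(f"OD600 {od}: add {v:.2f} mL culture to {50.0 - v:.2f} mL medium")
```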

[0410] Analysis of organic acid metabolites was performed on an HP-1200 HPLC system equipped with two Restek RFQ 150×7.8 mm columns in series. Organic acid metabolites were detected using an HP-1100 UV detector (210 nm) and a refractive index detector. The column temperature was 60°C. The method was isocratic, with 0.0180 N H2SO4 (in Milli-Q water) as the mobile phase. Flow was set to 1.1 mL/min. Injection volume was 20 µL and run time was 16 min. Analysis was performed using authentic standards (>99%, obtained from Sigma-Aldrich, with the exception of 2,3-dihydroxyisovalerate (DHIV), which was custom synthesized according to Cioffi et al., 1980, Anal Biochem 104: 485) and a 5-point calibration curve.

[0411] Analysis of volatile organic compounds, including ethanol and isobutanol, was performed on an HP 5890, 6890, or 7890 gas chromatograph fitted with an HP 7673 autosampler and a DB-FFAP column (J&W; 30 m length, 0.32 mm ID, 0.25 µm film thickness) or equivalent, connected to a flame ionization detector (FID). The temperature program was as follows: 230°C for the injector, 300°C for the detector, 100°C oven for 1 minute, a 70°C/minute gradient to 230°C, and then a hold for 2.5 min. Analysis was performed using authentic standards (>99%, obtained from Sigma-Aldrich) and a 5-point calibration curve with 1-pentanol as the internal standard.
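A 5-point calibration with 1-pentanol as internal standard, as in [0411], is commonly evaluated by regressing the analyte-to-internal-standard peak-area ratio against concentration, which corrects for variation in injection volume. The Python sketch below assumes that approach and uses hypothetical peak areas rather than data from this work. The function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def internal_standard_curve(std_conc, analyte_areas, istd_areas):
    """Fit the analyte/internal-standard peak-area ratio against concentration."""
    ratios = [a / i for a, i in zip(analyte_areas, istd_areas)]
    return fit_line(std_conc, ratios)


def quantify(analyte_area, istd_area, slope, intercept):
    """Concentration of an unknown from its area ratio and the calibration line."""
    return (analyte_area / istd_area - intercept) / slope


# Hypothetical 5-point isobutanol calibration (g/L) with a constant 1-pentanol spike.
conc = [0.5, 1.0, 2.0, 4.0, 8.0]
istd = [1000.0] * 5
areas = [1000.0 * (0.2 * c + 0.01) for c in conc]
slope, intercept = internal_standard_curve(conc, areas, istd)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"unknown: {quantify(405.0, 500.0, slope, intercept):.2f} g/L")
```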

[0412] For DHAD activity assays, cells were thawed on ice and resuspended in lysis buffer (50 mM Tris pH 8.0 and 5 mM MgSO4) to give a 20% cell suspension by mass. Glass beads (1000 µL, 0.5 mm diameter) were added to a 1.5 mL Eppendorf tube, followed by 875 µL of cell suspension. Yeast cells were lysed using a Retsch MM301 mixer mill (Retsch Inc., Newtown, Pa.), mixing 6 × 1 min each at full speed with 1 min incubations on ice between bead-beating steps. The tubes were centrifuged for 10 min at 23,500×g at 4°C, and the supernatant was removed for use. These lysates were held on ice until assayed. Yeast lysate protein concentration was determined using the Bio-Rad Bradford Protein Assay Reagent Kit (Cat# 500-0006, Bio-Rad Laboratories, Hercules, Calif.) with BSA for the standard curve. Briefly, 10 µL of standard or lysate was added to a microcentrifuge tube. The samples were diluted (1:40) to fall within the linear range of the standard curve. Then 500 µL of diluted and filtered Bio-Rad protein assay dye was added to the blank and samples, and the tubes were vortexed. Samples were incubated at room temperature for 6 min and transferred into cuvettes, and the OD595 was determined in a spectrophotometer. The linear regression of the standards was then used to calculate the protein concentration of each sample. DHAD assays were performed in technical triplicate for each sample, and a no-lysate control with lysis buffer was included. To assay each sample, 10 µL of an appropriate dilution of lysate in assay buffer was mixed with 90 µL of assay buffer (5 µL of 0.1 M MgSO4, 10 µL of 0.1 M DHIV, and 75 µL of 50 mM Tris pH 8.0) and incubated in a thermocycler for 30 minutes at 35°C, then at 95°C for 5 minutes. Cell debris and precipitate were removed from the samples by centrifugation at 3000×g for 5 min.
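The Bradford quantification in [0412] reduces to fitting the BSA standards and back-calculating through the 1:40 dilution. The minimal Python sketch below uses hypothetical absorbance values. It assumes blank-corrected A595 readings and a strictly linear response, and it rejects readings outside the range of the standards. The function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def lysate_protein_mg_per_ml(a595_sample, a595_blank, standards, dilution=40):
    """Protein concentration of the undiluted lysate from one Bradford reading.

    standards: list of (BSA mg/mL, blank-corrected A595) pairs
    dilution:  fold-dilution applied to the lysate before the assay (1:40 in [0412])
    """
    conc = [c for c, _ in standards]
    absorb = [a for _, a in standards]
    slope, intercept = fit_line(conc, absorb)
    corrected = a595_sample - a595_blank
    if not min(absorb) <= corrected <= max(absorb):
        raise ValueError("reading outside the standard curve; adjust the dilution")
    return (corrected - intercept) / slope * dilution


# Hypothetical BSA standards (mg/mL, blank-corrected A595) and sample readings.
bsa = [(0.0, 0.0), (0.125, 0.1), (0.25, 0.2), (0.5, 0.4), (1.0, 0.8)]
print(f"{lysate_protein_mg_per_ml(0.15, 0.05, bsa):.1f} mg/mL")
```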

[0413] Finally, 75 µL of supernatant was transferred to new PCR tubes and analyzed by liquid chromatography for the 2-ketoisovalerate (KIV) product. DNPH reagent (12 mM 2,4-dinitrophenylhydrazine, 20 mM citric acid pH 3.0, 80% acetonitrile, 20% MilliQ H2O) was added to each sample in a 1:1 ratio. Samples were incubated for 30 min at 70°C in a thermocycler (Eppendorf Mastercycler). Analysis of KIV was performed on an HP-1200 HPLC system equipped with an Eclipse XDB C-18 reverse phase column (Agilent) and a C-18 reverse phase guard column (Phenomenex). Ketoisovalerate was detected using an HP-1100 UV detector (360 nm). The column temperature was 50°C. The method was isocratic, with 70% acetonitrile, 2.5% phosphoric acid (4%), and 27.5% water as the mobile phase. Flow was set to 3 mL/min. Injection volume was 10 µL and run time was 2 min.
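Table 33 reports DHAD activity in U/mg. The Python sketch below shows how the assay in [0412]-[0413] converts to a specific activity, assuming that 1 U corresponds to 1 µmol KIV formed per minute (the unit definition is not stated in this section). It also assumes that the KIV concentration has already been corrected for the 1:1 DNPH derivatization and for the no-lysate background. All numbers in the example are hypothetical, and the function name is illustrative.

```python
def dhad_specific_activity(kiv_mm, reaction_ul, minutes, lysate_ul, lysate_dilution, protein_mg_per_ml):
    """DHAD specific activity in µmol KIV per min per mg protein (assumed U/mg).

    kiv_mm:            KIV in the stopped reaction (corrected, see lead-in)
    reaction_ul:       assay volume (10 µL lysate dilution + 90 µL assay buffer)
    lysate_ul:         volume of diluted lysate added to the assay
    lysate_dilution:   fold-dilution of the lysate before it was added
    protein_mg_per_ml: undiluted lysate protein concentration (Bradford)
    """
    kiv_umol = kiv_mm * reaction_ul / 1000.0  # mM = µmol/mL, so mM x µL / 1000 = µmol
    protein_mg = protein_mg_per_ml * (lysate_ul / 1000.0) / lysate_dilution
    return kiv_umol / (minutes * protein_mg)


# Hypothetical: 0.5 mM KIV after 30 min from 10 µL of a 1:10 dilution of a 5 mg/mL lysate.
print(f"{dhad_specific_activity(0.5, 100, 30, 10, 10, 5.0):.3f} U/mg")
```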

[0414] The 72-hour data are summarized in Table 33. The data demonstrate that the DHADs contained in plasmids pGV2635, 2677, 2674, 2672, 2673, and 2676 led to isobutanol titers of at least 2.5 g/L, and these DHADs are considered to be significantly active in the cytosolic isobutanol pathway. The DHADs contained in plasmids pGV2675, 2681, 2680, 2678, 2679, and 2671 led to isobutanol titers below 2.5 g/L and are considered to be inactive or poorly active in the cytosolic isobutanol pathway. The no-DHAD control plasmid pGV2663 gave 1.53 g/L.

TABLE 33 Isobutanol production with selected DHADs.
Plasmid (DHAD gene) | OD600 | Isobutanol produced [g/L] | DHAD activity (U/mg)
pGV2635 (L. lactis) | 8.6 ± 0.6 | 9.02 ± 0.28 | 0.62 ± 0.01
pGV2677 (N. crassa) | 9.4 ± 0.6 | 6.30 ± 0.85 | 0.42 ± 0.02
pGV2674 (F. tularensis) | 7.5 ± 0.7 | 6.22 ± 0.31 | 0.30 ± 0.00
pGV2672 (G. forsetii) | 8.1 ± 0.6 | 6.10 ± 0.26 | 0.20 ± 0.00
pGV2673 (S. erythraea) | 8.0 ± 1.1 | 3.23 ± 0.12 | 0.03 ± 0.00
pGV2676 (S. cerevisiae ilv3ΔN23) | 5.2 ± 0.2 | 2.67 ± 0.06 | 0.02 ± 0.00
pGV2675 (S. cerevisiae ilv3ΔN19) | 5.0 ± 0.2 | 2.27 ± 0.16 | 0.09 ± 0.00
pGV2681 (E. coli) | 6.9 ± 0.6 | 2.21 ± 0.09 | 0.03 ± 0.00
pGV2680 (Lyngbya spp.) | 6.9 ± 1.3 | 2.13 ± 0.09 | 0.02 ± 0.00
pGV2678 (Acidobacteria) | 7.5 ± 0.2 | 2.06 ± 0.17 | 0.03 ± 0.00
pGV2679 (A. marina) | 7.5 ± 0.6 | 2.05 ± 0.06 | 0.03 ± 0.00
pGV2671 (S. cerevisiae) | 5.5 ± 0.0 | 1.92 ± 0.03 | 0.44 ± 0.01
pGV2663 (none) | 6.7 ± 0.2 | 1.53 ± 0.18 | 0.01 ± 0.01
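The 2.5 g/L cut-off in [0414] can be applied mechanically to the Table 33 titers. This Python sketch uses the mean titers from Table 33, ignores the ± spread, and uses a hypothetical function name. It splits the DHAD plasmids into the two groups, and pGV2676 (2.67 g/L) falls in the active group.

```python
# Mean isobutanol titers (g/L) at 72 h, copied from Table 33.
TITERS_G_PER_L = {
    "pGV2635": 9.02, "pGV2677": 6.30, "pGV2674": 6.22, "pGV2672": 6.10,
    "pGV2673": 3.23, "pGV2676": 2.67, "pGV2675": 2.27, "pGV2681": 2.21,
    "pGV2680": 2.13, "pGV2678": 2.06, "pGV2679": 2.05, "pGV2671": 1.92,
    "pGV2663": 1.53,  # no-DHAD control
}


def classify(titers, threshold=2.5, control="pGV2663"):
    """Split DHAD plasmids into active (>= threshold) and inactive/poor (< threshold).

    Both lists are ordered by decreasing titer; the control is excluded.
    """
    active, inactive = [], []
    for plasmid, titer in sorted(titers.items(), key=lambda kv: -kv[1]):
        if plasmid == control:
            continue
        if titer >= threshold:
            active.append(plasmid)
        else:
            inactive.append(plasmid)
    return active, inactive


active, inactive = classify(TITERS_G_PER_L)
print("active:", active)
print("inactive or poorly active:", inactive)
```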

Example 19

Overexpression of the L. lactis ilvD in K. lactis and K. marxianus

[0415] The purpose of this example is to demonstrate activity of L. lactis DHAD in K. lactis and in K. marxianus.

[0416] Strains, plasmids, and sequences disclosed herein are listed in Tables 34, 35, and 36, respectively.

TABLE 34 Genotype of strains disclosed in Example 19.
GEVO Number | Genotype
GEVO2504 | K. marxianus NRRL-Y-7571 ura3-delta2 pdc1Δ::Ll.kivd2 coSc. P_TDH3:Dm_ADH:P_FBA1:URA3:P_Sc_FBA1:31COX4_MTS:Bs_alsS1_coSc
GEVO2543 | K. marxianus ura3-delta2 pdc1Δ::{Ll_kivd2 co:P_Sc_TDH3:Ec_ilvC^Q110V coSC:P_Sc_TPI1:G418^R:P_Sc_CUP1:Bs_alsS1_coSc}
GEVO2598 | K. marxianus ura3-delta2 pdc1Δ::{Ll_kivd2 co:P_Sc_TDH3:Ec_ilvC^Q110V coSC:P_Sc_TPI1:G418^R:P_Sc_CUP1:Bs_alsS1_coSc} + random integration of {P_Sc_TEF1:Ll_ilvD_coSc URA3}
GEVO1287 | K. lactis MATalpha uraA1 trp1 leu2 lysA1 ade1 lac4-8 [pKD1] (ATCC 200826)

TABLE 35 Plasmids disclosed in Example 19.
Plasmid Name | Relevant Genes/Usage | Genotype
pGV2271 | Empty 1.6 micron vector that can be maintained in K. lactis. Encodes hygromycin resistance. | 1.6μ ori, bla, hygroR
pGV2273 | 1.6 micron vector for expression of KARI, KIVD, DHAD and ADH in K. lactis | P_TDH3:Ec_ilvC_P2D1-A1, P_TEF1:Ll_ilvD_coSc, P_PGK1:Ll_kivD2_coEc, P_ENO2:Ll_adhA, 1.6μ ori, bla, HygroR
pGV2069 | 2 micron plasmid for expression of KIVD, DHAD, KARI, and ALS in K. marxianus | P_TDH3:Ec_ilvC_coSc^Q110V, P_TEF1:Ll_ilvD_coSc, P_PGK1:Ll_kivD2_coEc, P_CUP1:Bs_alsS1_coSc, P_ENO2:Dm_adhA, 2μ ori, bla, G418
pGV1855 | 2 micron plasmid for expression of DHAD in K. marxianus | P_TEF1:Ll_ilvD, 2μ ori, bla, URA

TABLE 36 Amino acid and nucleotide sequences of enzymes and genes disclosed in Example 19.
Enz. | Source | Gene (SEQ ID NO) | Corresponding Protein (SEQ ID NO)
ALS | B. subtilis | Bs_alsS1_coSc (SEQ ID NO: 144) | Bs_AlsS1_coSc (SEQ ID NO: 169)
KARI | E. coli | Ec_ilvC_coSc^Q110V (SEQ ID NO: 98) | Ec_IlvC_coSc^Q110V (SEQ ID NO: 170)
KARI | E. coli | Ec_ilvC_coSc^P2D1-A1 (SEQ ID NO: 171) | Ec_ilvC_coSc^P2D1-A1 (SEQ ID NO: 172)
KIVD | L. lactis | Ll_kivd2_coEc (SEQ ID NO: 99) | Ll_Kivd2_coEc (SEQ ID NO: 173)
DHAD | L. lactis | Ll_ilvD_coSc (SEQ ID NO: 155) | Ll_IlvD_coSc (SEQ ID NO: 18)
ADH | L. lactis | Ll_adhA (SEQ ID NO: 174) | Ll_adhA (SEQ ID NO: 175)
ADH | D. melanogaster | Dm_adh (SEQ ID NO: 116) | Dm_adh (SEQ ID NO: 176)

[0417] To generate GEVO2543, GEVO2504 was transformed with pGV2069 to integrate three genes into the genome: Bs_alsS1_coSc (SEQ ID NO: 144), Ec_ilvC_coSc^Q110V (SEQ ID NO: 98), and Ll_kivd2_coEc (SEQ ID NO: 99). To generate GEVO2598, GEVO2543 was transformed with pGV1855 to integrate into the chromosome the L. lactis ilvD gene codon optimized for S. cerevisiae (gene sequence SEQ ID NO: 155, also referred to as Ll_ilvD_coSc; protein sequence SEQ ID NO: 18). GEVO1287 was transformed with either pGV2271 (control plasmid) or pGV2273, which contains Ll_ilvD_coSc.

[0418] GEVO2543, GEVO2598, and GEVO1287 transformed with pGV2271 or pGV2273 were inoculated into 3 mL of YPD (GEVO2543 and GEVO2598) or YPD supplemented with 0.1 g/L hygromycin (GEVO1287 transformants) for overnight culture. After approximately 18 hours, a 50 mL YPD culture in a baffled 250 mL shake flask was inoculated to an OD600 of 0.15 and shaken at 250 rpm for approximately 9 hours. DHAD activity and protein concentration were then measured.
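Setting the starting OD600 of the 50 mL culture is a simple dilution, C1 × V1 = C2 × V2. A minimal Python sketch; the overnight-culture OD600 of 6.0 is a hypothetical value (the source does not report it), and the function name is illustrative.

```python
def inoculum_volume_ml(od_overnight: float, od_target: float, culture_volume_ml: float) -> float:
    """Volume of pre-culture (mL) needed so the culture starts at od_target.

    Uses C1*V1 = C2*V2 and treats culture_volume_ml as the final volume,
    i.e. it ignores the small volume contributed by the inoculum itself.
    """
    if od_overnight <= od_target:
        raise ValueError("pre-culture must be denser than the target starting OD")
    return od_target * culture_volume_ml / od_overnight


# 50 mL culture started at OD600 0.15 (as in the text) from a hypothetical overnight at OD600 6.0
vol = inoculum_volume_ml(od_overnight=6.0, od_target=0.15, culture_volume_ml=50.0)
print(f"{vol:.2f} mL")  # 1.25 mL
```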

[0419] Overexpression of the L. lactis ilvD gene increased DHAD specific activity (activity per mg of total cell lysate protein) in both K. marxianus and K. lactis. Table 37 shows the average DHAD activity from technical triplicates, comparing strains expressing the L. lactis DHAD gene with strains not expressing it.

TABLE 37 DHAD activity in whole cell yeast lysates.
Strain | Activity [mU/mg]
K. marxianus strain GEVO2543 (no DHAD) | 0.010 ± 0.002
K. marxianus strain GEVO2598 (DHAD) | 0.016 ± 0.001
K. lactis strain GEVO1287 + pGV2271 (no DHAD) | 0.052 ± 0.003
K. lactis strain GEVO1287 + pGV2273 (DHAD) | 0.122 ± 0.011
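The increases in Table 37 can be expressed as fold changes. A Python sketch, assuming the ± values are standard deviations of independent measurements so that the relative uncertainties of a ratio add in quadrature (a first-order approximation); the source does not state which dispersion measure it reports, and the helper name is illustrative.

```python
import math


def ratio_with_error(a: float, sa: float, b: float, sb: float) -> tuple[float, float]:
    """Return (a/b, propagated uncertainty), assuming independent errors on a and b."""
    r = a / b
    return r, r * math.sqrt((sa / a) ** 2 + (sb / b) ** 2)


# (mean, +/-) DHAD activities from Table 37, in mU/mg: (with DHAD, without DHAD)
table37 = {
    "K. marxianus": ((0.016, 0.001), (0.010, 0.002)),  # GEVO2598 vs GEVO2543
    "K. lactis": ((0.122, 0.011), (0.052, 0.003)),     # pGV2273 vs pGV2271 in GEVO1287
}

for species, ((a, sa), (b, sb)) in table37.items():
    fold, err = ratio_with_error(a, sa, b, sb)
    print(f"{species}: {fold:.1f} +/- {err:.1f}-fold")
```

On these numbers the L. lactis DHAD corresponds to roughly a 1.6-fold increase in K. marxianus and roughly a 2.3-fold increase in K. lactis, each with an uncertainty of about 0.3.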

Example 20

L. lactis ilvD Activity is Localized to the Yeast Cytosol

[0420] The purpose of this example is to demonstrate that the Lactococcus lactis ilvD protein localizes to the cytosol when expressed in a yeast strain.

[0421] The S. cerevisiae strain GEVO1187 (S. cerevisiae CEN.PK2, MATa ura3 leu2 his3 trp1 ADE2) was transformed with plasmid pGV2484, a 2μ plasmid expressing the L. lactis ilvD gene codon optimized for S. cerevisiae (gene sequence SEQ ID NO: 155, also referred to as Ll_ilvD_coSc; protein sequence SEQ ID NO: 18) under the S. cerevisiae TEF1 promoter (P_TEF1:Ll_ilvD_coSc, 2μ ori, bla, G418R). Briefly, the strain was grown in YPD to an OD600 of 0.6 to 0.8. Cells were washed in H2O and then resuspended in 100 mM lithium acetate. In a 1.5 mL tube, 15 μL of the cell suspension was added to a mixture of DNA (brought to a final volume of 15 μL with sterile water), 72 μL of 50% PEG, 10 μL of 1 M lithium acetate, and 3 μL of denatured salmon sperm DNA (10 mg/mL). The transformation suspension was vortexed with 5 short pulses. The mixture was incubated at 30 °C for 30 minutes, followed by incubation at 42 °C for 22 minutes. The cells were collected by centrifugation (18,000 × g, 10 seconds, 25 °C), resuspended in 1 mL YPD medium (1% (w/v) yeast extract, 2% (w/v) peptone, 2% (w/v) glucose, pH 5), and allowed to recover overnight with shaking at 30 °C and 250 rpm. The cells were then spread on YPD agar plates supplemented with 0.2 g/L G418, and transformants were single-colony purified on G418 selective plates.
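The transformation mix in [0421] combines five components (15 + 15 + 72 + 10 + 3 μL). A short Python mass-balance sketch of the final concentration of each solute; it assumes the 15 μL cell suspension carries 100 mM lithium acetate (its resuspension buffer) and that the DNA solution contributes no lithium acetate or PEG. The dictionary keys and helper name are illustrative labels, not names from the source.

```python
# Component volumes (uL) and stock concentrations for the transformation mix.
# Assumption: the cell suspension is counted as 100 mM lithium acetate because
# the cells were resuspended in 100 mM LiOAc before being added.
components = {
    "cells_in_LiOAc": {"volume_ul": 15, "LiOAc_mM": 100},
    "DNA_in_water": {"volume_ul": 15},
    "PEG_50pct": {"volume_ul": 72, "PEG_pct": 50},
    "LiOAc_1M": {"volume_ul": 10, "LiOAc_mM": 1000},
    "ssDNA_10mg_per_ml": {"volume_ul": 3, "ssDNA_mg_per_ml": 10},
}

total_ul = sum(c["volume_ul"] for c in components.values())


def final_conc(key: str) -> float:
    """Mass-balance dilution for one solute: sum(C_i * V_i) / V_total."""
    return sum(c.get(key, 0) * c["volume_ul"] for c in components.values()) / total_ul


print(total_ul)                                 # 115
print(round(final_conc("PEG_pct"), 1))          # 31.3
print(round(final_conc("LiOAc_mM"), 1))         # 100.0
print(round(final_conc("ssDNA_mg_per_ml"), 2))  # 0.26
```

Under these assumptions the mix is about 31% PEG, 100 mM lithium acetate, and 0.26 mg/mL carrier DNA in 115 μL.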

[0422] All isolations of crude mitochondrial fractions were performed in duplicate. GEVO1187 and GEVO1187 transformed with pGV2484 were each grown in 100 mL of YPG medium (1% (w/v) yeast extract, 2% (w/v) peptone, 3% (v/v) glycerol, pH 5) overnight at 30 °C and 250 rpm. This overnight culture was used to inoculate 840 mL of YPG in a 2800 mL baffled flask to an OD600 of 0.03, and cells were grown at 30 °C and 250 rpm for 20 to 28 hours. At an OD600 of about 2.0, cells were harvested by centrifugation at 3000 × g for 5 minutes, resuspended in 100 mL H2O, and centrifuged again at 3000 × g for 5 minutes. Cells were incubated in 2 mL/g CWW (cell wet weight) of DTT buffer (100 mM Tris-H2SO4 pH 9.4, 10 mM DTT) for 20 minutes at 30 °C. Cells were then resuspended in 7 mL/g CWW of Zymolyase buffer (1.2 M sorbitol, 20 mM potassium phosphate pH 7.4) and centrifuged at 3000 × g for 5 minutes. Cells were spheroplasted by incubation in Zymolyase buffer containing Zymolyase (Seikagaku Biobusiness Corporation #120491-1; 3 mg/g CWW) for 45 minutes at 30 °C on a rocking platform. Spheroplasts equivalent to 100 OD600 units were set aside for whole cell lysate preparation (see below). The remaining spheroplasts were resuspended in Zymolyase buffer and centrifuged at 3000 × g for 5 minutes before resuspension in 6.5 mL/g CWW of homogenization buffer (0.6 M sorbitol, 10 mM Tris-HCl pH 7.4, 1 mM EDTA, 1 mM PMSF, 0.2% (w/v) BSA; chilled to 4 °C). Spheroplasts were homogenized on ice with 15 strokes of a pre-chilled glass-Teflon homogenizer (40 mL capacity), and the homogenate was diluted 2-fold with homogenization buffer. Cell debris and nuclei were pelleted by sequential centrifugation of the supernatant at 1500 × g for 5 minutes and then at 4000 × g for 5 minutes. The mitochondrial fraction was then pelleted by centrifugation at 12,000 × g for 15 minutes. The crude mitochondrial pellet was resuspended in 10 mL SEM buffer (250 mM sucrose, 1 mM EDTA, 10 mM MOPS-KOH pH 7.2) and centrifuged at 4000 × g for 5 minutes to remove remaining cellular debris and nuclei, and the mitochondrial fraction was recovered by centrifugation at 12,000 × g for 15 minutes. In addition to mitochondrial markers, this fraction may contain markers of the plasma membrane, the endoplasmic reticulum, and vacuoles. The mitochondrial pellet was resuspended in 750 μL SEM buffer + Protease Arrest (GBiosciences #786-108).
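Because most reagent amounts in [0422] are specified per gram of cell wet weight (CWW), they scale linearly with the harvested biomass. A small Python sketch that scales those ratios and records the initial differential-centrifugation sequence (before the SEM wash); the 4 g CWW input and the names PER_GRAM_CWW, reagents_for, and SPINS are hypothetical.

```python
# Per-gram-of-CWW reagent ratios taken from the procedure text.
PER_GRAM_CWW = {
    "DTT buffer (mL)": 2.0,
    "Zymolyase buffer wash (mL)": 7.0,
    "Zymolyase enzyme (mg)": 3.0,
    "Homogenization buffer (mL)": 6.5,
}

# Differential centrifugation of the homogenate:
# (relative centrifugal force in x g, minutes, fraction kept)
SPINS = [
    (1_500, 5, "supernatant (debris and nuclei pelleted)"),
    (4_000, 5, "supernatant (remaining debris pelleted)"),
    (12_000, 15, "pellet (crude mitochondrial fraction)"),
]


def reagents_for(cww_g: float) -> dict[str, float]:
    """Scale each per-gram amount to a given cell wet weight in grams."""
    if cww_g <= 0:
        raise ValueError("cell wet weight must be positive")
    return {name: ratio * cww_g for name, ratio in PER_GRAM_CWW.items()}


for name, amount in reagents_for(4.0).items():  # 4 g CWW is a hypothetical harvest
    print(f"{name}: {amount:g}")
for rcf, minutes, keep in SPINS:
    print(f"{rcf} x g, {minutes} min -> keep {keep}")
```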

[0423] Whole cell yeast lysates were prepared from the 100 OD600 units of spheroplasts set aside above by resuspending the cells at 20% (w/v) in SEM buffer + 1× Protease Arrest (GBiosciences #786-108). An 875 μL aliquot of the cell suspension was added to a 1.5 mL Eppendorf tube containing 1000 μL of glass beads (0.5 mm diameter). Cells were lysed in a Retsch MM301 mixer mill (Retsch Inc., Newtown, Pa.) with six 1-minute cycles at full speed and 1-minute incubations on ice between cycles. The tubes were centrifuged for 10 minutes at 23,500 × g at 4 °C, and the supernatant was removed, aliquoted, flash frozen in liquid nitrogen, and stored at -80 °C.

[0424] The resuspended mitochondrial fraction (see above) was added to a 1.5 mL Eppendorf tube containing 1000 μL of glass beads (0.1 mm diameter), and additional buffer was added if necessary to fill the tube completely. The mitochondrial fraction was lysed in a Retsch MM301 mixer mill (Retsch Inc., Newtown, Pa.) with three 1-minute cycles at full speed and 1-minute incubations on ice between cycles. The tubes were centrifuged for 10 minutes at 23,500 × g at 4 °C, and the supernatant was removed, aliquoted, flash frozen in liquid nitrogen, and stored at -80 °C.
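The two bead-beating lysis procedures in [0423] and [0424] differ mainly in bead diameter and the number of beating cycles; both end with the same 23,500 × g clarification spin. A small Python sketch recording the two programs side by side; the class name is hypothetical, and the time calculation assumes ice rests occur only between cycles (one fewer rest than cycles).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeadBeatProgram:
    """Mixer-mill lysis settings as described in the text (illustrative container)."""
    bead_diameter_mm: float
    cycles: int
    beat_min: float = 1.0
    ice_rest_min: float = 1.0

    def total_minutes(self) -> float:
        # Assumption: rests occur only *between* cycles, so there are cycles - 1 of them.
        return self.cycles * self.beat_min + (self.cycles - 1) * self.ice_rest_min


WHOLE_CELL = BeadBeatProgram(bead_diameter_mm=0.5, cycles=6)
MITOCHONDRIAL = BeadBeatProgram(bead_diameter_mm=0.1, cycles=3)

for label, prog in [("whole cell", WHOLE_CELL), ("mitochondrial", MITOCHONDRIAL)]:
    print(f"{label}: {prog.bead_diameter_mm} mm beads, {prog.cycles} cycles, "
          f"{prog.total_minutes():g} min total")
```

The gentler mitochondrial program (half as many cycles, finer beads) reflects that isolated organelles need less mechanical force to open than intact cells.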

[0425] Protein concentrations of the whole cell yeast lysates and mitochondrial fraction lysates were determined using the Bio-Rad Bradford Protein Assay Reagent Kit (Cat# 500-0006, Bio-Rad Laboratories, Hercules, Calif.) with BSA as the standard. Briefly, 10 μL of standard or lysate was added to a microcentrifuge tube; lysates were diluted (1:10 to 1:40) so that they fell within the linear range of the standard curve. Then 500 μL of diluted and filtered Bio-Rad protein assay dye was added to the blank and the samples, followed by vortexing. Samples were incubated at room temperature for 6 minutes and transferred into cuvettes, and the OD595 was measured in a spectrophotometer. The protein concentration of each sample was calculated from a linear regression of the standards.
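The final calculation in [0425] amounts to an ordinary least-squares line through the BSA standards, inverted for each sample and multiplied by the sample's dilution factor. A minimal Python sketch; the BSA concentrations and OD595 readings are hypothetical placeholders chosen only to illustrate the arithmetic, and real Bradford curves are linear only over a limited range.

```python
def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def protein_mg_per_ml(od595: float, slope: float, intercept: float, dilution_factor: float) -> float:
    """Invert the standard curve, then correct for the sample dilution."""
    return (od595 - intercept) / slope * dilution_factor


# Hypothetical BSA standards (mg/mL) and blank-corrected OD595 readings.
bsa = [0.0, 0.125, 0.25, 0.5, 0.75]
od = [0.00, 0.10, 0.20, 0.40, 0.60]
slope, intercept = linear_fit(bsa, od)

# A lysate diluted 1:20 that reads OD595 = 0.30
print(round(protein_mg_per_ml(0.30, slope, intercept, dilution_factor=20), 2))  # 7.5
```

With these placeholder standards the fit has a slope of 0.8 per mg/mL and a near-zero intercept, so an OD595 of 0.30 at a 1:20 dilution corresponds to about 7.5 mg/mL in the undiluted lysate.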

[0426] Three samples of each of the mitochondrial and whole cell yeast lysates were assayed for DHAD activity, along with no-lysate controls. Table 38 shows the DHAD activity (mU/mg protein) averaged over duplicate cultures, comparing strain GEVO1187 (no DHAD expression) with GEVO1187 transformed with pGV2484 (L. lactis DHAD expressed from pGV2484). DHAD activity was measured in both the whole cell yeast lysate and the mitochondrial fraction lysate. Expression of DHAD from pGV2484 resulted in about a 7-fold increase in DHAD activity in the whole cell yeast lysate but did not affect the DHAD activity localized to the mitochondrial fraction. Subtracting the background activity of the GEVO1187 whole cell yeast lysate (0.27 mU/mg) from that of GEVO1187 transformed with pGV2484 (1.87 mU/mg) gives an increase of 1.60 mU/mg. These data suggest that L. lactis DHAD activity does not localize to the organellar structures harvested in the mitochondrial fraction and is therefore cytosolic when expressed in yeast.
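The background subtraction and the localization argument above can be checked directly against Table 38. A Python sketch, assuming the ± values are standard deviations of independent measurements; the "more than twice the combined uncertainty" cutoff is an illustrative rule of thumb, not a statistical test used in the source, and the helper name is hypothetical.

```python
import math

# Table 38 values: (mean, +/-) in mU/mg
whole_ctrl, whole_dhad = (0.27, 0.07), (1.87, 0.14)
mito_ctrl, mito_dhad = (3.76, 0.01), (3.85, 0.13)


def difference(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    """a - b with uncertainties combined in quadrature (independent errors assumed)."""
    return a[0] - b[0], math.hypot(a[1], b[1])


for label, (a, b) in {"whole cell": (whole_dhad, whole_ctrl),
                      "mitochondrial": (mito_dhad, mito_ctrl)}.items():
    delta, err = difference(a, b)
    # Rule of thumb: a difference under twice its uncertainty is treated as no clear change.
    verdict = "increase" if delta > 2 * err else "no clear change"
    print(f"{label}: {delta:+.2f} +/- {err:.2f} mU/mg ({verdict})")

print(f"whole-cell fold change: {whole_dhad[0] / whole_ctrl[0]:.1f}")
```

The whole-cell lysate shows a clear gain of about 1.60 mU/mg (roughly 6.9-fold, the "about 7-fold" in the text), while the 0.09 mU/mg mitochondrial difference is smaller than its own uncertainty.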

TABLE 38 DHAD activity in whole cell yeast lysates and mitochondrial fraction lysates.
Strain | Lysate | Activity [mU/mg]
GEVO1187 | Whole cell | 0.27 ± 0.07
GEVO1187 transformed with pGV2484 | Whole cell | 1.87 ± 0.14
GEVO1187 | Mitochondrial | 3.76 ± 0.01
GEVO1187 transformed with pGV2484 | Mitochondrial | 3.85 ± 0.13

Example 21

Overexpression of the L. lactis ilvD in Issatchenkia orientalis

[0427] The purpose of this example is to demonstrate cytosolic activity of L. lactis DHAD in I. orientalis.

[0428] An engineered strain derived from the wild-type I. orientalis strain ATCC PTA-6658 was further modified to contain chromosomally integrated copies of all five isobutanol pathway genes. First, both alleles of the PDC1 locus were deleted in series (see e.g. WO/2007/106524, which is herein incorporated by reference in its entirety). Each deletion event simultaneously integrated a copy of the B. subtilis alsS gene and a copy of the L. lactis kivD gene, which encode SEQ ID NOs: 169 and 173, respectively. This resulted in a Pdc- strain with two integrated copies each of the B. subtilis alsS gene and the L. lactis kivD gene (pdc1Δ::Ll_kivD:Bs_alsS/pdc1Δ::Ll_kivD:Bs_alsS). This strain was further engineered to delete a single allele of the GPD1 locus (see e.g. WO/2007/106524). This deletion event simultaneously integrated a single copy each of the L. lactis adhA^RE1, the E. coli ilvC^P2D1-A1, and the L. lactis ilvD genes, which encode the proteins shown in SEQ ID NOs: 177, 172, and 18, respectively. The result was a Pdc- Gpd+ strain with one integrated copy each of the Ll_adhA^RE1, Ec_ilvC^P2D1-A1, and Ll_ilvD genes (GPD1/gpd1Δ::[Ll_adhA^RE1:Ec_ilvC^P2D1-A1:URA3:Ll_ilvD]). This strain is GEVO4306 (Table 39).

[0429] To generate a control strain that does not express the pathway genes, both alleles of the PDC1 locus were deleted in series with no simultaneous integration of heterologous genes. Next, one of the two GPD1 alleles was deleted, also with no simultaneous integration of heterologous genes. The resulting control strain is GEVO4308 (pdc1Δ::loxP/pdc1Δ::loxP GPD1/gpd1Δ::loxP:URA3:loxP) (Table 39).

TABLE 39 Genotype of strains disclosed in Example 21.
GEVO Number | Genotype
4306 | pdc1Δ::[Ll_kivD:Bs_alsS]/pdc1Δ::[Ll_kivD:Bs_alsS] GPD1/gpd1Δ::[Ll_adhA^RE1:Ec_ilvC^P2D1-A1:URA3:Ll_ilvD]
4308 | pdc1Δ::loxP/pdc1Δ::loxP GPD1/gpd1Δ::loxP:URA3:loxP

[0430] Overexpression of the L. lactis ilvD gene increased DHAD specific activity (activity per mg of total cell lysate protein). Table 40 shows the average DHAD activity from technical triplicates, comparing the strain expressing the L. lactis DHAD gene with the strain not expressing it. When expressed together with the remainder of the isobutanol pathway, the L. lactis ilvD gene resulted in isobutanol production (Table 40).

TABLE 40 DHAD activity in whole cell yeast lysates and isobutanol titer after 72 hr fermentation.
Strain | Activity [mU/mg] | Isobutanol titer [g/L]
GEVO4306 | 0.041 ± 0.009 | 0.56 ± 0.01
GEVO4308 | 0.012 ± 0.002 | 0.00 ± 0.00

Example 22

Cytosolic ALS Homologs that Support Isobutanol Production

[0431] This example demonstrates isobutanol production by strains expressing cytosolically localized ALS genes together with the rest of the isobutanol pathway. The ALS genes were integrated into the PDC1 locus of S. cerevisiae strain GEVO1187, and the other isobutanol pathway genes were expressed from a plasmid. Isobutanol production by strains carrying the ALS genes from T. atroviride (Ta_ALS) and T. stipitatus (Ts_ALS) was compared with that of strains carrying the ALS gene from B. subtilis. Plasmids described in this example are listed in Table 41.

TABLE 41 Plasmids disclosed in Example 22.
Plasmid name | Relevant Genes/Usage | Genotype
pGV1730 | Integration plasmid that will integrate P_CUP1-1:Bs_alsS2 into PDC1 using digestion with NruI for targeting. This was the parent vector for cloning the ALS homologs. | See Table 14.
pGV1773 | Vector with Bacillus subtilis AlsS codon optimized for S. cerevisiae. | P_PDC1:Bs_AlsS1_coSc, P_TDH3:Ll_kivD, P_ADH1:Sc_ADH7_coSc, URA3 5'-end, pUC ORI, kanR.
pGV1802 | DNA2.0 plasmid carrying the Trichoderma atroviride ALS. | Ta_ALS_coSc in DNA2.0 vector
pGV1803 | DNA2.0 plasmid carrying the Talaromyces stipitatus ALS. | Ts_ALS_coSc in DNA2.0 vector
pGV2082 | High copy 2μ plasmid with 4 isobutanol pathway genes, without an ALS gene. | Ec_ilvC_coSc^Q110V, Ll_ilvD_coSc, Ll_kivD2_coEc, and Dm_ADH, 2μ ori, bla, G418R.
pGV2114 | Integration plasmid that will integrate into PDC1 using digestion with NruI for targeting. It carries the Bacillus subtilis AlsS gene codon optimized for S. cerevisiae. | See Table 14.
pGV2117 | Integration plasmid that will integrate into PDC1 using digestion with NruI for targeting. It carries the Trichoderma atroviride ALS gene codon optimized for S. cerevisiae. | See Table 14.
pGV2118 | Integration plasmid that will integrate into PDC1 using digestion with NruI for targeting. It carries the Talaromyces stipitatus ALS gene codon optimized for S. cerevisiae. | See Table 14.

[0432] Strains with integrated ALS genes expressed from the CUP1 promoter were transformed with pGV2082, which carries the other four isobutanol pathway genes: Ec_ilvC_coSc^Q110V (SEQ ID NO: 98), Ll_ilvD (SEQ ID NO: 155), Ll_kivd2_coEc (SEQ ID NO: 99), and Dm_ADH (SEQ ID NO: 116).

[0433] GEVO2618, GEVO2621, and GEVO2622 (see Table 13) were each transformed with pGV2082. Control strains GEVO2280 (B. subtilis alsS2) and GEVO1187 (no ALS) (Table 13) were also transformed with pGV2082.

[0434] Fermentations of the transformed strains GEVO1187, GEVO2280, GEVO2618, GEVO2621, and GEVO2622 were performed. Strains encoding the ALS from T. atroviride (SEQ ID NO: 71) and T. stipitatus (SEQ ID NO: 72) produced more isobutanol than the strain containing the B. subtilis alsS2, and the strain containing Bs_AlsS1_coSc produced the most isobutanol. Table 42 shows the final OD600, glucose consumption, and isobutanol titer for each strain. Integration of the cytosolic genes Ta_ALS_coSc and Ts_ALS_coSc led in each case to an isobutanol titer about 6-fold higher than that of the strain without an integrated ALS gene, demonstrating that these strains produce isobutanol using a cytosolic pathway.

TABLE 42 Results of fermentations with cytosolic ALS homologs at 72 hrs.
Strain | OD600 | Glucose consumed [g/L] | Isobutanol produced [g/L]
GEVO1187 | 10.9 ± 0.3 | 233 ± 36 | 0.3 ± 0.0
GEVO2280 | 9.9 ± 0.3 | 274 ± 26 | 1.3 ± 0.11
GEVO2618 | 9.4 ± 0.2 | 138 ± 9 | 2.6 ± 0.09
GEVO2621 | 9.9 ± 0.3 | 161 ± 52 | 1.9 ± 0.18
GEVO2622 | 10.8 ± 0.6 | 182 ± 47 | 1.8 ± 0.15
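The "6-fold" figure in [0434] follows from dividing each strain's mean titer by that of the no-ALS control GEVO1187 (0.3 g/L). A minimal Python sketch of that arithmetic using the Table 42 means; standard deviations are ignored, the control titer is itself rounded to one decimal so the ratios are approximate, and which of GEVO2618, GEVO2621, and GEVO2622 carries which ALS gene is defined in Table 13, outside this section.

```python
# Mean isobutanol titers (g/L) at 72 h from Table 42.
titers = {
    "GEVO1187": 0.3,
    "GEVO2280": 1.3,
    "GEVO2618": 2.6,
    "GEVO2621": 1.9,
    "GEVO2622": 1.8,
}

control_strain = "GEVO1187"  # no integrated ALS gene
baseline = titers[control_strain]

# Rank the ALS-bearing strains by titer and express each as a multiple of the control.
ranked = sorted((s for s in titers if s != control_strain), key=titers.get, reverse=True)
for strain in ranked:
    print(f"{strain}: {titers[strain]:.1f} g/L, {titers[strain] / baseline:.1f}x control")
```

The two strains with titers of 1.9 and 1.8 g/L come out at about 6.3x and 6.0x the control, consistent with the "6-fold" statement, while the highest-titer strain reaches about 8.7x.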

[0435] The foregoing detailed description has been given for clearness of understanding only, and no unnecessary limitations should be understood therefrom, as modifications will be obvious to those skilled in the art.

[0436] While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth and as follows in the scope of the appended claims.

[0437] The disclosures, including the claims, figures and/or drawings, of each and every patent, patent application, and publication cited herein are hereby incorporated herein by reference in their entireties.

Sequence Listing

1771491PRTEscherichia coli 1Met Ala Asn Tyr Phe Asn Thr Leu Asn Leu Arg Gln Gln Leu Ala Gln 1 5 10 15 Leu Gly Lys Cys Arg Phe Met Gly Arg Asp Glu Phe Ala Asp Gly Ala 20 25 30 Ser Tyr Leu Gln Gly Lys Lys Val Val Ile Val Gly Cys Gly Ala Gln 35 40 45 Gly Leu Asn Gln Gly Leu Asn Met Arg Asp Ser Gly Leu Asp Ile Ser 50 55 60 Tyr Ala Leu Arg Lys Glu Ala Ile Ala Glu Lys Arg Ala Ser Trp Arg 65 70 75 80 Lys Ala Thr Glu Asn Gly Phe Lys Val Gly Thr Tyr Glu Glu Leu Ile 85 90 95 Pro Gln Ala Asp Leu Val Ile Asn Leu Thr Pro Asp Lys Gln His Ser 100 105 110 Asp Val Val Arg Thr Val Gln Pro Leu Met Lys Asp Gly Ala Ala Leu 115 120 125 Gly Tyr Ser His Gly Phe Asn Ile Val Glu Val Gly Glu Gln Ile Arg 130 135 140 Lys Asp Ile Thr Val Val Met Val Ala Pro Lys Cys Pro Gly Thr Glu 145 150 155 160 Val Arg Glu Glu Tyr Lys Arg Gly Phe Gly Val Pro Thr Leu Ile Ala 165 170 175 Val His Pro Glu Asn Asp Pro Lys Gly Glu Gly Met Ala Ile Ala Lys 180 185 190 Ala Trp Ala Ala Ala Thr Gly Gly His Arg Ala Gly Val Leu Glu Ser 195 200 205 Ser Phe Val Ala Glu Val Lys Ser Asp Leu Met Gly Glu Gln Thr Ile 210 215 220 Leu Cys Gly Met Leu Gln Ala Gly Ser Leu Leu Cys Phe Asp Lys Leu 225 230 235 240 Val Glu Glu Gly Thr Asp Pro Ala Tyr Ala Glu Lys Leu Ile Gln Phe 245 250 255 Gly Trp Glu Thr Ile Thr Glu Ala Leu Lys Gln Gly Gly Ile Thr Leu 260 265 270 Met Met Asp Arg Leu Ser Asn Pro Ala Lys Leu Arg Ala Tyr Ala Leu 275 280 285 Ser Glu Gln Leu Lys Glu Ile Met Ala Pro Leu Phe Gln Lys His Met 290 295 300 Asp Asp Ile Ile Ser Gly Glu Phe Ser Ser Gly Met Met Ala Asp Trp 305 310 315 320 Ala Asn Asp Asp Lys Lys Leu Leu Thr Trp Arg Glu Glu Thr Gly Lys 325 330 335 Thr Ala Phe Glu Thr Ala Pro Gln Tyr Glu Gly Lys Ile Gly Glu Gln 340 345 350 Glu Tyr Phe Asp Lys Gly Val Leu Met Ile Ala Met Val Lys Ala Gly 355 360 365 Val Glu Leu Ala Phe Glu Thr Met Val Asp Ser Gly Ile Ile Glu Glu 370 375 380 Ser Ala Tyr Tyr Glu Ser Leu His Glu Leu Pro Leu Ile Ala Asn Thr 385 390 395 400 Ile Ala Arg Lys Arg Leu Tyr Glu Met Asn Val Val Ile Ser Asp Thr 405 
410 415 Ala Glu Tyr Gly Asn Tyr Leu Phe Ser Tyr Ala Cys Val Pro Leu Leu 420 425 430 Lys Pro Phe Met Ala Glu Leu Gln Pro Gly Asp Leu Gly Lys Ala Ile 435 440 445 Pro Glu Gly Ala Val Asp Asn Gly Gln Leu Arg Asp Val Asn Glu Ala 450 455 460 Ile Arg Ser His Ala Ile Glu Gln Val Gly Lys Lys Leu Arg Gly Tyr 465 470 475 480 Met Thr Asp Met Lys Arg Ile Ala Val Ala Gly 485 490 2395PRTSaccharomyces cerevisiae 2Met Leu Arg Thr Gln Ala Ala Arg Leu Ile Cys Asn Ser Arg Val Ile 1 5 10 15 Thr Ala Lys Arg Thr Phe Ala Leu Ala Thr Arg Ala Ala Ala Tyr Ser 20 25 30 Arg Pro Ala Ala Arg Phe Val Lys Pro Met Ile Thr Thr Arg Gly Leu 35 40 45 Lys Gln Ile Asn Phe Gly Gly Thr Val Glu Thr Val Tyr Glu Arg Ala 50 55 60 Asp Trp Pro Arg Glu Lys Leu Leu Asp Tyr Phe Lys Asn Asp Thr Phe 65 70 75 80 Ala Leu Ile Gly Tyr Gly Ser Gln Gly Tyr Gly Gln Gly Leu Asn Leu 85 90 95 Arg Asp Asn Gly Leu Asn Val Ile Ile Gly Val Arg Lys Asp Gly Ala 100 105 110 Ser Trp Lys Ala Ala Ile Glu Asp Gly Trp Val Pro Gly Lys Asn Leu 115 120 125 Phe Thr Val Glu Asp Ala Ile Lys Arg Gly Ser Tyr Val Met Asn Leu 130 135 140 Leu Ser Asp Ala Ala Gln Ser Glu Thr Trp Pro Ala Ile Lys Pro Leu 145 150 155 160 Leu Thr Lys Gly Lys Thr Leu Tyr Phe Ser His Gly Phe Ser Pro Val 165 170 175 Phe Lys Asp Leu Thr His Val Glu Pro Pro Lys Asp Leu Asp Val Ile 180 185 190 Leu Val Ala Pro Lys Gly Ser Gly Arg Thr Val Arg Ser Leu Phe Lys 195 200 205 Glu Gly Arg Gly Ile Asn Ser Ser Tyr Ala Val Trp Asn Asp Val Thr 210 215 220 Gly Lys Ala His Glu Lys Ala Gln Ala Leu Ala Val Ala Ile Gly Ser 225 230 235 240 Gly Tyr Val Tyr Gln Thr Thr Phe Glu Arg Glu Val Asn Ser Asp Leu 245 250 255 Tyr Gly Glu Arg Gly Cys Leu Met Gly Gly Ile His Gly Met Phe Leu 260 265 270 Ala Gln Tyr Asp Val Leu Arg Glu Asn Gly His Ser Pro Ser Glu Ala 275 280 285 Phe Asn Glu Thr Val Glu Glu Ala Thr Gln Ser Leu Tyr Pro Leu Ile 290 295 300 Gly Lys Tyr Gly Met Asp Tyr Met Tyr Asp Ala Cys Ser Thr Thr Ala 305 310 315 320 Arg Arg Gly Ala Leu Asp Trp Tyr Pro Ile Phe Lys Asn Ala Leu Lys 325 330 335 Pro 
Val Phe Gln Asp Leu Tyr Glu Ser Thr Lys Asn Gly Thr Glu Thr 340 345 350 Lys Arg Ser Leu Glu Phe Asn Ser Gln Pro Asp Tyr Arg Glu Lys Leu 355 360 365 Glu Lys Glu Leu Asp Thr Ile Arg Asn Met Glu Ile Trp Lys Val Gly 370 375 380 Lys Glu Val Arg Lys Leu Arg Pro Glu Asn Gln 385 390 395 3578PRTOryza sativa 3Met Ala Ala Ser Thr Thr Leu Ala Leu Ser His Pro Lys Thr Leu Ala 1 5 10 15 Ala Ala Ala Ala Ala Ala Pro Lys Ala Pro Thr Ala Pro Ala Ala Val 20 25 30 Ser Phe Pro Val Ser His Ala Ala Cys Ala Pro Leu Ala Ala Arg Arg 35 40 45 Arg Ala Val Thr Ala Met Val Ala Ala Pro Pro Ala Val Gly Ala Ala 50 55 60 Met Pro Ser Leu Asp Phe Asp Thr Ser Val Phe Asn Lys Glu Lys Val 65 70 75 80 Ser Leu Ala Gly His Glu Glu Tyr Ile Val Arg Gly Gly Arg Asn Leu 85 90 95 Phe Pro Leu Leu Pro Glu Ala Phe Lys Gly Ile Lys Gln Ile Gly Val 100 105 110 Ile Gly Trp Gly Ser Gln Gly Pro Ala Gln Ala Gln Asn Leu Arg Asp 115 120 125 Ser Leu Ala Glu Ala Lys Ser Asp Ile Val Val Lys Ile Gly Leu Arg 130 135 140 Lys Gly Ser Lys Ser Phe Asp Glu Ala Arg Ala Ala Gly Phe Thr Glu 145 150 155 160 Glu Ser Gly Thr Leu Gly Asp Ile Trp Glu Thr Val Ser Gly Ser Asp 165 170 175 Leu Val Leu Leu Leu Ile Ser Asp Ala Ala Gln Ala Asp Asn Tyr Glu 180 185 190 Lys Ile Phe Ser His Met Lys Pro Asn Ser Ile Leu Gly Leu Ser His 195 200 205 Gly Phe Leu Leu Gly His Leu Gln Ser Ala Gly Leu Asp Phe Pro Lys 210 215 220 Asn Ile Ser Val Ile Ala Val Cys Pro Lys Gly Met Gly Pro Ser Val 225 230 235 240 Arg Arg Leu Tyr Val Gln Gly Lys Glu Ile Asn Gly Ala Gly Ile Asn 245 250 255 Ser Ser Phe Ala Val His Gln Asp Val Asp Gly Arg Ala Thr Asp Val 260 265 270 Ala Leu Gly Trp Ser Val Ala Leu Gly Ser Pro Phe Thr Phe Ala Thr 275 280 285 Thr Leu Glu Gln Glu Tyr Lys Ser Asp Ile Phe Gly Glu Arg Gly Ile 290 295 300 Leu Leu Gly Ala Val His Gly Ile Val Glu Ala Leu Phe Arg Arg Tyr 305 310 315 320 Thr Glu Gln Gly Met Asp Glu Glu Met Ala Tyr Lys Asn Thr Val Glu 325 330 335 Gly Ile Thr Gly Ile Ile Ser Lys Thr Ile Ser Lys Lys Gly Met Leu 340 345 350 Glu Val Tyr Asn Ser Leu Thr 
Glu Glu Gly Lys Lys Glu Phe Asn Lys 355 360 365 Ala Tyr Ser Ala Ser Phe Tyr Pro Cys Met Asp Ile Leu Tyr Glu Cys 370 375 380 Tyr Glu Asp Val Ala Ser Gly Ser Glu Ile Arg Ser Val Val Leu Ala 385 390 395 400 Gly Arg Arg Phe Tyr Glu Lys Glu Gly Leu Pro Ala Phe Pro Met Gly 405 410 415 Asn Ile Asp Gln Thr Arg Met Trp Lys Val Gly Glu Lys Val Arg Ser 420 425 430 Thr Arg Pro Glu Asn Asp Leu Gly Pro Leu His Pro Phe Thr Ala Gly 435 440 445 Val Tyr Val Ala Leu Met Met Ala Gln Ile Glu Val Leu Arg Lys Lys 450 455 460 Gly His Ser Tyr Ser Glu Ile Ile Asn Glu Ser Val Ile Glu Ser Val 465 470 475 480 Asp Ser Leu Asn Pro Phe Met His Ala Arg Gly Val Ala Phe Met Val 485 490 495 Asp Asn Cys Ser Thr Thr Ala Arg Leu Gly Ser Arg Lys Trp Ala Pro 500 505 510 Arg Phe Asp Tyr Ile Leu Thr Gln Gln Ala Phe Val Thr Val Asp Lys 515 520 525 Asp Ala Pro Ile Asn Gln Asp Leu Ile Ser Asn Phe Met Ser Asp Pro 530 535 540 Val His Gly Ala Ile Glu Val Cys Ala Glu Leu Arg Pro Thr Val Asp 545 550 555 560 Ile Ser Val Pro Ala Asn Ala Asp Phe Val Arg Pro Glu Leu Arg Gln 565 570 575 Ser Ser 4329PRTMethanococcus maripaludis 4Met Lys Val Phe Tyr Asp Ser Asp Phe Lys Leu Asp Ala Leu Lys Glu 1 5 10 15 Lys Thr Ile Ala Val Ile Gly Tyr Gly Ser Gln Gly Arg Ala Gln Ser 20 25 30 Leu Asn Met Lys Asp Ser Gly Leu Asn Val Val Val Gly Leu Arg Lys 35 40 45 Asn Gly Ala Ser Trp Glu Asn Ala Lys Ala Asp Gly His Asn Val Met 50 55 60 Thr Ile Glu Glu Ala Ala Glu Lys Ala Asp Ile Ile His Ile Leu Ile 65 70 75 80 Pro Asp Glu Leu Gln Ala Glu Val Tyr Glu Ser Gln Ile Lys Pro Tyr 85 90 95 Leu Lys Glu Gly Lys Thr Leu Ser Phe Ser His Gly Phe Asn Ile His 100 105 110 Tyr Gly Phe Ile Val Pro Pro Lys Gly Val Asn Val Val Leu Val Ala 115 120 125 Pro Lys Ser Pro Gly Lys Met Val Arg Arg Thr Tyr Glu Glu Gly Phe 130 135 140 Gly Val Pro Gly Leu Ile Cys Ile Glu Ile Asp Ala Thr Asn Asn Ala 145 150 155 160 Phe Asp Ile Val Ser Ala Met Ala Lys Gly Ile Gly Leu Ser Arg Ala 165 170 175 Gly Val Ile Gln Thr Thr Phe Lys Glu Glu Thr Glu Thr Asp Leu Phe 180 185 190 Gly Glu 
Gln Ala Val Leu Cys Gly Gly Val Thr Glu Leu Ile Lys Ala 195 200 205 Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr 210 215 220 Phe Glu Thr Cys His Glu Leu Lys Leu Ile Val Asp Leu Ile Tyr Gln 225 230 235 240 Lys Gly Phe Lys Asn Met Trp Asn Asp Val Ser Asn Thr Ala Glu Tyr 245 250 255 Gly Gly Leu Thr Arg Arg Ser Arg Ile Val Thr Ala Asp Ser Lys Ala 260 265 270 Ala Met Lys Glu Ile Leu Lys Glu Ile Gln Asp Gly Arg Phe Thr Lys 275 280 285 Glu Phe Val Leu Glu Lys Gln Val Asn His Ala His Leu Lys Ala Met 290 295 300 Arg Arg Ile Glu Gly Asp Leu Gln Ile Glu Glu Val Gly Ala Lys Leu 305 310 315 320 Arg Lys Met Cys Gly Leu Glu Lys Glu 325 5339PRTAcidiphilium cryptum 5Met Arg Val Tyr Tyr Asp Ser Asp Ala Asp Val Asn Leu Ile Lys Ala 1 5 10 15 Lys Lys Val Ala Val Val Gly Tyr Gly Ser Gln Gly His Ala His Ala 20 25 30 Leu Asn Leu Lys Glu Ser Gly Val Lys Glu Leu Val Val Ala Leu Arg 35 40 45 Lys Gly Ser Ala Ala Val Ala Lys Ala Glu Ala Ala Gly Leu Arg Val 50 55 60 Met Thr Pro Glu Glu Ala Ala Ala Trp Ala Asp Val Val Met Ile Leu 65 70 75 80 Thr Pro Asp Glu Gly Gln Gly Asp Leu Tyr Arg Asp Ser Leu Ala Ala 85 90 95 Asn Leu Lys Pro Gly Ala Ala Ile Ala Phe Ala His Gly Leu Asn Ile 100 105 110 His Phe Asn Leu Ile Glu Pro Arg Ala Asp Ile Asp Val Phe Met Ile 115 120 125 Ala Pro Lys Gly Pro Gly His Thr Val Arg Ser Glu Tyr Gln Arg Gly 130 135 140 Gly Gly Val Pro Cys Leu Val Ala Val Ala Gln Asn Pro Ser Gly Asn 145 150 155 160 Ala Leu Asp Ile Ala Leu Ser Tyr Ala Ser Ala Ile Gly Gly Gly Arg 165 170 175 Ala Gly Ile Ile Glu Thr Thr Phe Lys Glu Glu Cys Glu Thr Asp Leu 180 185 190 Phe Gly Glu Gln Thr Val Leu Cys Gly Gly Leu Val Glu Leu Ile Lys 195 200 205 Ala Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala 210 215 220 Tyr Phe Glu Cys Leu His Glu Val Lys Leu Ile Val Asp Leu Ile Tyr 225 230 235 240 Glu Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn Thr Ala Glu 245 250 255 Tyr Gly Glu Tyr Val Thr Gly Pro Arg Met Ile Thr Pro Glu Thr Lys 260 265 270 Ala Glu Met Lys Arg Val Leu Asp Asp 
Ile Gln Lys Gly Arg Phe Thr 275 280 285 Arg Asp Trp Met Leu Glu Asn Lys Val Asn Gln Thr Asn Phe Lys Ala 290 295 300 Met Arg Arg Ala Asn Ala Ala His Pro Ile Glu Glu Val Gly Glu Lys 305 310 315 320 Leu Arg Ala Met Met Pro Trp Ile Lys Lys Gly Ala Leu Val Asp Lys 325 330 335 Thr Arg Asn 6555PRTChlamydomonas reinhardtii 6Met Gln Leu Leu Asn Ser Lys Ser Arg Val Leu Ser Gly Ser Arg Gln 1 5 10 15 Gln Ala Ala Ala Lys Ala Val Arg Val Ala Pro Ser Gly Arg Arg Ser 20 25 30 Ala Val Arg Val Ser Ala Ala Val His Leu Asp Phe Asn Thr Lys Val 35 40 45 Phe Gln Lys Glu His Ala Lys Phe Gly Pro Thr Glu Glu Tyr Ile Val 50 55 60 Arg Gly Gly Arg Asp Lys Tyr Pro Leu Leu Lys Glu Ala Phe Lys Gly 65 70 75 80 Ile Lys Lys Val Ser Val Ile Gly Trp Gly Ser Gln Ala Pro Ala Gln 85 90 95 Ala Gln Asn Leu Arg Asp Ser Ile Ala Glu Ala Gly Met Asp Ile Lys 100 105 110 Val Ala Ile Gly Leu Arg Pro Asp Ser Pro Ser Trp Ala Glu Ala Glu 115 120 125 Ala Cys Gly Phe Ser Lys Thr Asp Gly Thr Leu Gly Glu Val Phe Glu 130 135 140 Gln Ile Ser Ser Ser Asp Phe Val Ile Leu Leu Ile Ser Asp Ala Ala 145

150 155 160 Gln Ala Lys Leu Tyr Pro Arg Ile Leu Ala Ala Met Lys Pro Gly Ala 165 170 175 Thr Leu Gly Leu Ser His Gly Phe Leu Leu Gly Val Met Arg Asn Asp 180 185 190 Gly Val Asp Phe Arg Lys Asp Ile Asn Val Val Leu Val Ala Pro Lys 195 200 205 Gly Met Gly Pro Ser Val Arg Arg Leu Tyr Glu Gln Gly Lys Ser Val 210 215 220 Asn Gly Ala Gly Ile Asn Cys Ser Phe Ala Ile Gln Gln Asp Ala Thr 225 230 235 240 Gly Gln Ala Ala Asp Ile Ala Ile Gly Trp Ala Ile Gly Val Gly Ala 245 250 255 Pro Phe Ala Phe Pro Thr Thr Leu Glu Ser Glu Tyr Lys Ser Asp Ile 260 265 270 Tyr Gly Glu Arg Cys Val Leu Leu Gly Ala Val His Gly Ile Val Glu 275 280 285 Ala Leu Phe Arg Arg Tyr Thr Arg Gln Gly Met Ser Asp Glu Glu Ala 290 295 300 Phe Lys Gln Ser Val Glu Ser Ile Thr Gly Pro Ile Ser Arg Thr Ile 305 310 315 320 Ser Thr Lys Gly Met Leu Ser Val Tyr Asn Ser Phe Asn Glu Ala Asp 325 330 335 Lys Lys Ile Phe Glu Gln Ala Tyr Ser Ala Ser Tyr Lys Pro Ala Leu 340 345 350 Asp Ile Cys Phe Glu Ile Tyr Glu Asp Val Ala Ser Gly Asn Glu Ile 355 360 365 Lys Ser Val Val Gln Ala Val Gln Arg Phe Asp Arg Phe Pro Met Gly 370 375 380 Lys Ile Asp Gln Thr Tyr Met Trp Lys Val Gly Gln Lys Val Arg Ala 385 390 395 400 Glu Arg Asp Glu Ser Lys Ile Pro Val Asn Pro Phe Thr Ala Gly Val 405 410 415 Tyr Val Ala Val Met Met Ala Thr Val Glu Val Leu Arg Glu Lys Gly 420 425 430 His Pro Phe Ser Glu Ile Cys Asn Glu Ser Ile Ile Glu Ala Val Asp 435 440 445 Ser Leu Asn Pro Tyr Met His Ala Arg Gly Val Ala Phe Met Val Asp 450 455 460 Asn Cys Ser Tyr Thr Ala Arg Leu Gly Ser Arg Lys Trp Ala Pro Arg 465 470 475 480 Phe Asp Tyr Ile Ile Glu Gln Gln Ala Phe Val Asp Ile Asp Ser Gly 485 490 495 Lys Ala Ala Asp Lys Glu Val Met Ala Glu Phe Leu Ala His Pro Val 500 505 510 His Ser Ala Leu Ala Thr Cys Ser Ser Met Arg Pro Ser Val Asp Ile 515 520 525 Ser Val Gly Gly Glu Asn Ser Ser Val Gly Val Gly Ala Gly Ala Ala 530 535 540 Arg Thr Glu Phe Arg Ser Thr Ala Ala Lys Val 545 550 555 7329PRTPicrophilus torridus 7Met Glu Lys Val Tyr Thr Glu Asn Asp Leu Lys Glu Asn Leu Met 
Arg 1 5 10 15 Asn Lys Lys Ile Ala Val Leu Gly Tyr Gly Ser Gln Gly Arg Ala Trp 20 25 30 Ala Leu Asn Met Arg Asp Ser Gly Leu Asn Val Thr Val Gly Leu Glu 35 40 45 Arg Gln Gly Lys Ser Trp Glu Lys Ala Val Ala Asp Gly Phe Lys Pro 50 55 60 Leu Lys Ser Arg Asp Ala Val Arg Asp Ala Asp Ala Val Ile Phe Leu 65 70 75 80 Val Pro Asp Met Ala Gln Arg Glu Leu Tyr Lys Asn Ile Met Asn Asp 85 90 95 Ile Lys Asp Asp Ala Asp Ile Val Phe Ala His Gly Phe Asn Val His 100 105 110 Tyr Gly Leu Ile Asn Pro Lys Asn His Asp Val Tyr Met Val Ala Pro 115 120 125 Lys Ala Pro Gly Pro Ser Val Arg Glu Phe Tyr Glu Arg Gly Gly Gly 130 135 140 Val Pro Val Leu Ile Ala Val Ala Asn Asp Val Ser Gly Arg Ser Lys 145 150 155 160 Glu Lys Ala Leu Ser Ile Ala Tyr Ser Leu Gly Ala Leu Arg Ala Gly 165 170 175 Ala Ile Glu Thr Thr Phe Lys Glu Glu Thr Glu Thr Asp Leu Ile Gly 180 185 190 Glu Gln Leu Asp Leu Val Gly Gly Ile Thr Glu Leu Leu Arg Ser Thr 195 200 205 Phe Asn Ile Met Val Glu Met Gly Tyr Lys Pro Glu Met Ala Tyr Phe 210 215 220 Glu Ala Ile Asn Glu Met Lys Leu Ile Val Asp Gln Val Phe Glu Lys 225 230 235 240 Gly Ile Ser Gly Met Leu Arg Ala Val Ser Asp Thr Ala Lys Tyr Gly 245 250 255 Gly Leu Thr Thr Gly Lys Tyr Ile Ile Asn Asp Asp Val Arg Lys Arg 260 265 270 Met Arg Glu Arg Ala Glu Tyr Ile Val Ser Gly Lys Phe Ala Glu Glu 275 280 285 Trp Ile Glu Glu Tyr Gly Glu Gly Ser Lys Asn Leu Glu Ser Met Met 290 295 300 Leu Asp Ile Asp Asn Ser Leu Glu Glu Gln Val Gly Lys Gln Leu Arg 305 310 315 320 Glu Ile Val Leu Arg Gly Arg Pro Lys 325 8339PRTZymomonas mobilis 8Met Lys Val Tyr Tyr Asp Ser Asp Ala Asp Leu Gly Leu Ile Lys Ser 1 5 10 15 Lys Lys Ile Ala Ile Leu Gly Tyr Gly Ser Gln Gly His Ala His Ala 20 25 30 Gln Asn Leu Arg Asp Ser Gly Val Ala Glu Val Ala Ile Ala Leu Arg 35 40 45 Pro Asp Ser Ala Ser Val Lys Lys Ala Gln Asp Ala Gly Phe Lys Val 50 55 60 Leu Thr Asn Ala Glu Ala Ala Lys Trp Ala Asp Ile Leu Met Ile Leu 65 70 75 80 Ala Pro Asp Glu His Gln Ala Ala Ile Tyr Ala Glu Asp Leu Lys Asp 85 90 95 Asn Leu Arg Pro Gly Ser Ala Ile 
Ala Phe Ala His Gly Leu Asn Ile 100 105 110 His Phe Gly Leu Ile Glu Pro Arg Lys Asp Ile Asp Val Phe Met Ile 115 120 125 Ala Pro Lys Gly Pro Gly His Thr Val Arg Ser Glu Tyr Val Arg Gly 130 135 140 Gly Gly Val Pro Cys Leu Val Ala Val Asp Gln Asp Ala Ser Gly Asn 145 150 155 160 Ala His Asp Ile Ala Leu Ala Tyr Ala Ser Gly Ile Gly Gly Gly Arg 165 170 175 Ser Gly Val Ile Glu Thr Thr Phe Arg Glu Glu Val Glu Thr Asp Leu 180 185 190 Phe Gly Glu Gln Ala Val Leu Cys Gly Gly Leu Thr Ala Leu Ile Thr 195 200 205 Ala Gly Phe Glu Thr Leu Thr Glu Ala Gly Tyr Ala Pro Glu Met Ala 210 215 220 Phe Phe Glu Cys Met His Glu Met Lys Leu Ile Val Asp Leu Ile Tyr 225 230 235 240 Glu Ala Gly Ile Ala Asn Met Arg Tyr Ser Ile Ser Asn Thr Ala Glu 245 250 255 Tyr Gly Asp Ile Val Ser Gly Pro Arg Val Ile Asn Glu Glu Ser Lys 260 265 270 Lys Ala Met Lys Ala Ile Leu Asp Asp Ile Gln Ser Gly Arg Phe Val 275 280 285 Ser Lys Phe Val Leu Asp Asn Arg Ala Gly Gln Pro Glu Leu Lys Ala 290 295 300 Ala Arg Lys Arg Met Ala Ala His Pro Ile Glu Gln Val Gly Ala Arg 305 310 315 320 Leu Arg Lys Met Met Pro Trp Ile Ala Ser Asn Lys Leu Val Asp Lys 325 330 335 Ala Arg Asn 910PRTArtificial Sequencec-myc epitope tag 9Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu 1 5 10 10554PRTThermotoga petrophila 10Met Arg Ser Asp Val Ile Lys Lys Gly Leu Glu Arg Ala Pro His Arg 1 5 10 15 Ser Leu Leu Lys Ala Leu Gly Ile Thr Asp Asp Glu Met Arg Arg Pro 20 25 30 Phe Ile Gly Ile Val Ser Ser Trp Asn Glu Ile Ile Pro Gly His Val 35 40 45 His Leu Asp Lys Val Val Glu Ala Val Lys Ala Gly Val Arg Met Ala 50 55 60 Gly Gly Val Pro Phe Val Phe Pro Thr Ile Gly Ile Cys Asp Gly Ile 65 70 75 80 Ala Met Asp His Arg Gly Met Lys Phe Ser Leu Pro Ser Arg Glu Leu 85 90 95 Ile Ala Asp Ser Ile Glu Ile Val Ala Ser Gly Phe Pro Phe Asp Gly 100 105 110 Leu Val Phe Val Pro Asn Cys Asp Lys Ile Thr Pro Gly Met Met Met 115 120 125 Ala Met Gly Arg Leu Asn Ile Pro Ser Val Leu Ile Ser Gly Gly Pro 130 135 140 Met Leu Ala Gly Arg Tyr Asn Gly Arg Asp Ile Asp Leu Ile Thr Val 145 150 
155 160 Phe Glu Ala Val Gly Gly Tyr Lys Val Gly Lys Val Asp Glu Glu Thr 165 170 175 Leu Lys Ala Ile Glu Asp Leu Ala Cys Pro Gly Ala Gly Ser Cys Ala 180 185 190 Gly Leu Phe Thr Ala Asn Thr Met Asn Ser Leu Ala Glu Ala Leu Gly 195 200 205 Ile Ala Pro Arg Gly Asn Gly Thr Val Pro Ala Val His Ala Lys Arg 210 215 220 Leu Arg Met Ala Lys Glu Ala Gly Met Leu Val Val Glu Leu Val Lys 225 230 235 240 Arg Asp Val Lys Pro Arg Asp Ile Val Thr Leu Asp Ser Phe Met Asn 245 250 255 Ala Val Met Val Asp Leu Ala Thr Gly Gly Ser Thr Asn Thr Val Leu 260 265 270 His Leu Lys Ala Ile Ala Glu Ser Phe Gly Ile Asp Phe Asp Ile Lys 275 280 285 Leu Phe Asp Glu Leu Ser Arg Lys Ile Pro His Ile Cys Asn Ile Ser 290 295 300 Pro Val Gly Pro Tyr His Ile Gln Asp Leu Asp Asp Ala Gly Gly Ile 305 310 315 320 Tyr Ala Val Met Lys Arg Leu Gln Glu Asn Gly Leu Leu Lys Glu Asp 325 330 335 Ala Met Thr Ile Tyr Leu Arg Lys Ile Gly Asp Leu Val Arg Glu Ala 340 345 350 Lys Ile Leu Asn Glu Asp Val Ile Arg Pro Phe Asp Asn Pro Tyr His 355 360 365 Lys Glu Gly Gly Leu Gly Ile Leu Phe Gly Asn Leu Ala Pro Glu Gly 370 375 380 Ala Val Ala Lys Leu Ser Gly Val Pro Glu Lys Met Met His His Val 385 390 395 400 Gly Pro Ala Val Val Phe Glu Asp Gly Glu Glu Ala Thr Lys Ala Ile 405 410 415 Leu Ser Gly Lys Ile Lys Lys Gly Asp Val Val Val Ile Arg Tyr Glu 420 425 430 Gly Pro Lys Gly Gly Pro Gly Met Arg Glu Met Leu Ser Pro Thr Ser 435 440 445 Ala Ile Val Gly Met Gly Leu Ala Glu Asp Val Ala Leu Ile Thr Asp 450 455 460 Gly Arg Phe Ser Gly Gly Ser His Gly Ala Val Ile Gly His Val Ser 465 470 475 480 Pro Glu Ala Ala Glu Gly Gly Pro Ile Gly Ile Val Lys Asp Gly Asp 485 490 495 Leu Ile Glu Ile Asp Phe Glu Lys Arg Thr Leu Asn Leu Leu Ile Ser 500 505 510 Asp Glu Glu Phe Glu Arg Arg Met Lys Glu Phe Thr Pro Leu Val Lys 515 520 525 Glu Val Asp Ser Asp Tyr Leu Arg Arg Tyr Ala Phe Phe Val Gln Ser 530 535 540 Ala Ser Lys Gly Ala Ile Phe Arg Lys Pro 545 550 11561PRTVictivallis vadensis 11Met Arg Ser Asp Thr Met Lys Lys Gly Pro Glu Arg Ala Pro His Arg 1 5 10 
15 Gly Leu Met Arg Ala Thr Gly Leu Lys Lys Glu Asp Phe Asp Lys Pro 20 25 30 Phe Ile Gly Val Cys Asn Ser Tyr Thr Asn Ile Val Pro Gly His Cys 35 40 45 His Leu Lys Lys Val Gly Glu Ile Ile Cys Asp Ala Ile Arg Glu Ala 50 55 60 Gly Gly Val Pro Tyr Glu Phe Asn Thr Ile Ala Val Cys Asp Gly Ile 65 70 75 80 Ala Met Gly His Lys Gly Met Lys Tyr Ser Leu Ala Ser Arg Glu Ile 85 90 95 Ile Ala Asp Ser Val Glu Thr Met Gly Thr Ala His Pro Phe Asp Ala 100 105 110 Met Ile Cys Ile Pro Asn Cys Asp Lys Val Val Pro Gly Met Leu Met 115 120 125 Gly Ala Met Arg Leu Asn Ile Pro Thr Ile Phe Ala Ser Gly Gly Pro 130 135 140 Met Arg Ala Gly Lys Pro Gln Ala Glu Gly Gly Pro Asp Thr Asp Leu 145 150 155 160 Ile Ser Ile Phe Glu Gly Val Ala Ala Asn Arg Ile Gly Lys Leu Ser 165 170 175 Asp Glu Gly Leu Glu Ala Leu Glu Cys Ser Ala Cys Pro Gly Pro Gly 180 185 190 Ser Cys Ser Gly Met Phe Thr Ala Asn Ser Met Asn Cys Leu Cys Glu 195 200 205 Ala Leu Gly Ile Ala Leu Pro Gly Asn Gly Thr Ile Ala Ala Asp Ser 210 215 220 Pro Glu Arg Val Glu Leu Trp Lys Arg Ala Ala Arg Arg Ala Val Glu 225 230 235 240 Leu Ala Arg Met Glu Asn Pro Pro Thr Ala Lys Asp Phe Ala Thr Pro 245 250 255 Ala Ala Phe Gln Asn Ala Leu Val Leu Asp Met Ala Met Gly Gly Ser 260 265 270 Ser Asn Thr Val Leu His Thr Leu Ala Val Ala Thr Glu Ala Gly Thr 275 280 285 Lys Leu Asp Leu Lys Lys Leu Asp Glu Ile Ser Ala Arg Thr Pro Asn 290 295 300 Ile Cys Lys Leu Ser Pro Ser Val Gln Tyr His Ile Val Glu Asp Gly 305 310 315 320 Asn Arg Val Gly Gly Ile Met Ala Ile Leu Lys Glu Ile Ser Lys Val 325 330 335 Pro Gly Leu Ile Asp Gly Ser Ala Pro Thr Val Ser Gly Lys Thr Leu 340 345 350 Ala Glu Glu Phe Asn Gly Ala Pro Asp Pro Asp Gly Thr Ile Ile Arg 355 360 365 Pro Leu Ser Asn Pro Tyr Ser Glu Lys Gly Gly Leu Ala Ile Leu Phe 370 375 380 Gly Asn Leu Ala Glu Lys Gly Cys Val Val Lys Ala Ala Gly Val Ala 385 390 395 400 Lys Ala Met Leu Thr His Lys Gly Pro Ala Val Ile Phe Asp Ser Glu 405 410 415 Glu Glu Ala Gly Glu Gly Ile Leu Ala Gly Lys Val Lys Ala Gly Asp 420 425 430 Val Val Val Ile 
Arg Tyr Glu Gly Pro Lys Gly Gly Pro Gly Met Gln 435 440 445 Glu Met Leu Ala Pro Thr Ser Tyr Ile Met Gly Arg Gly Leu Gly Glu 450 455 460 Ser Val Ala Leu Val Thr Asp Gly Arg Phe Ser Gly Gly Thr Arg Gly 465 470 475 480 Ala Cys Ile Gly His Val Ser Pro Glu Ala Ala Ala Gly Gly Leu Ile 485 490 495 Gly Leu Val Glu Pro Gly Asp Ile Ile Glu Ile Asp Ile Pro Asn Arg 500 505 510 Ser Ile Lys Leu Asp Val Pro Asp Glu Val Ile Ala Glu Arg Arg Lys 515 520 525 Asn Trp Lys Pro Arg Glu Pro Lys Ile Lys Thr Gly Tyr Leu Ala Lys 530 535 540 Tyr Ala Ser Leu Ala Thr Ser Ala Asp Thr Gly Gly Val Leu Lys Val 545 550 555 560 Asn 12549PRTUnknownTermite Group 1 Bacterium Phylotype Rs-D17 12Met Arg Ser Asp Gln Ile Lys Arg Gly Ala Val Arg Ala Pro Asn Arg 1 5 10 15 Cys Leu Leu Tyr Ser Thr Gly Ile Ser Pro Gly Asp Leu Asp Lys Pro 20 25 30 Phe Ile Gly Ile Ala Ser Ser Phe Thr Asp Leu Val Pro Gly His Val 35 40 45 Ala Met Arg Asp Leu Glu Arg Tyr Val Glu Arg Gly Ile Ala Ala Gly 50 55 60 Gly Gly Val Pro Phe Ile Phe Gly Ala Pro Ala Val Cys Asp Gly Ile 65 70 75 80 Ala Met Gly His Ser Gly Met His Tyr
Ser Leu Gly Ser Arg Glu Ile 85 90 95 Ile Ala Asp Leu Val Glu Thr Val Ala Asn Ala His Met Leu Asp Gly 100 105 110 Leu Ile Leu Leu Ser Asn Cys Asp Lys Val Thr Pro Gly Met Leu Met 115 120 125 Ala Ala Ala Arg Leu Asn Ile Pro Ala Ile Val Val Thr Ala Gly Ala 130 135 140 Met Met Thr Gly Met Tyr Asp Lys Lys Arg Arg Ser Met Val Arg Asp 145 150 155 160 Thr Phe Glu Ala Val Gly Gln Phe Gln Ala Gly Lys Ile Thr Glu Lys 165 170 175 Gln Leu Ser Glu Leu Glu Met Ala Ala Cys Pro Gly Ala Gly Ala Cys 180 185 190 Gln Gly Met Tyr Thr Ala Asn Thr Met Ala Cys Leu Thr Glu Thr Met 195 200 205 Gly Met Ser Met Arg Gly Cys Ala Thr Thr Leu Ala Val Ser Ala Lys 210 215 220 Lys Lys Arg Ile Ala Tyr Glu Ser Gly Ile Arg Val Val Ala Leu Val 225 230 235 240 Lys Lys Asp Val Lys Pro Arg Asp Ile Leu Thr Leu Ala Ala Phe Lys 245 250 255 Asn Ala Ile Val Ala Asp Met Ala Leu Gly Gly Ser Thr Asn Thr Val 260 265 270 Leu His Leu Pro Ala Ile Ala Asn Glu Ala Gly Ile Glu Leu Pro Leu 275 280 285 Glu Leu Phe Asp Glu Ile Ser Lys Lys Thr Pro Gln Ile Ala Cys Leu 290 295 300 Glu Pro Ala Gly Asp His Tyr Met Glu Asp Leu Asp Asn Ala Gly Gly 305 310 315 320 Ile Pro Ala Val Leu Phe Ala Ile Gln Lys Asn Leu Ala His Ser Lys 325 330 335 Thr Val Ser Gly Phe Asp Ile Ile Glu Ile Ala Asn Ser Ala Glu Ile 340 345 350 Leu Asp Glu Tyr Val Ile Arg Ala Lys Asn Pro Tyr Lys Pro Glu Gly 355 360 365 Gly Ile Ala Ile Leu Arg Gly Asn Ile Ala Pro Arg Gly Cys Val Val 370 375 380 Lys Gln Ala Ala Val Ser Glu Lys Met Lys Val Phe Ser Gly Arg Ala 385 390 395 400 Arg Val Phe Asn Ser Glu Asp Asn Ala Met Lys Ala Ile Leu Asp Asn 405 410 415 Lys Ile Val Pro Gly Asp Ile Val Val Ile Arg Tyr Glu Gly Pro Ala 420 425 430 Gly Gly Pro Gly Met Arg Glu Met Leu Ser Pro Thr Ser Ala Leu His 435 440 445 Gly Met Gly Leu Ser Asp Ser Val Ala Leu Leu Thr Asp Gly Arg Phe 450 455 460 Ser Gly Gly Thr Arg Gly Pro Cys Ile Gly His Ile Ser Pro Glu Ala 465 470 475 480 Ala Ala Asp Gly Ala Ile Val Ala Ile Asn Glu Gly Asp Thr Ile Asn 485 490 495 Ile Asn Ile Pro Glu Arg Thr Leu Asn Val 
Glu Leu Thr Asp Asp Glu 500 505 510 Ile Lys Ala Arg Ile Gly Lys Val Ile Lys Pro Glu Pro Lys Ile Lys 515 520 525 Thr Gly Tyr Met Ala Arg Tyr Ala Lys Leu Val Gln Ser Ala Asp Thr 530 535 540 Gly Ala Val Leu Lys 545 13573PRTYarrowia lipolytica 13Met Ile Arg Ala Arg Asn Tyr Ala Thr Lys Ala His Thr Leu Asn Lys 1 5 10 15 Phe Ser Lys Ile Ile Thr Glu Pro Lys Ser Gln Gly Ala Ser Gln Ala 20 25 30 Met Leu Tyr Ala Cys Gly Phe Asn Glu Ala Asp Leu Gly Lys Pro Gln 35 40 45 Val Gly Val Ala Ser Val Trp Trp Ser Gly Asn Pro Cys Asn Met His 50 55 60 Leu Leu Asp Leu Asn Phe Lys Val Lys Glu Gly Ile Glu Lys His Asn 65 70 75 80 Leu Lys Ala Met Gln Phe Asn Thr Ile Gly Val Ser Asp Gly Ile Ser 85 90 95 Met Gly Thr Lys Gly Met Arg Tyr Ser Leu Gln Ser Arg Asp Met Ile 100 105 110 Ala Asp Ser Ile Glu Thr Leu Met Met Ala Gln His Tyr Asp Ala Asn 115 120 125 Ile Ser Ile Pro Gly Cys Asp Lys Asn Met Pro Gly Val Leu Met Ala 130 135 140 Met Gly Arg Val Asn Arg Pro Ser Ile Met Leu Tyr Gly Gly Thr Ile 145 150 155 160 His Pro Gly Lys Ala Glu Thr Arg Lys Gly Glu Asp Ile Asp Ile Val 165 170 175 Ser Ala Phe Gln Ala Tyr Gly Gln Tyr Ile Ala Gly Gly Ile Ser Glu 180 185 190 Thr Glu Arg Ala Asp Val Ile Arg His Ala Cys Pro Gly Gln Gly Ala 195 200 205 Cys Gly Gly Met Tyr Thr Ala Asn Thr Met Ala Ser Ala Ala Glu Val 210 215 220 Leu Gly Met Thr Leu Pro Gly Ser Ser Ser Ala Pro Ala Ile Ser Lys 225 230 235 240 Glu Lys Met Ala Glu Cys Glu Ala Leu Gly Pro Ala Ile Asn Lys Leu 245 250 255 Leu Glu Met Asp Leu Lys Pro Lys Asp Ile Met Thr Arg Gln Ala Phe 260 265 270 Glu Asn Ala Ile Ala Tyr Ile Ile Ala Thr Gly Gly Ser Thr Asn Ala 275 280 285 Val Leu His Leu Leu Ala Ile Ala His Thr Val Asp Val Pro Leu Thr 290 295 300 Ile Asp Asp Phe Gln Arg Ile Ser Asp Asn Thr Pro Leu Leu Ala Asp 305 310 315 320 Phe Lys Pro Ser Gly Ala His Val Met Ala Asp Leu Gln Lys Trp Gly 325 330 335 Gly Thr Pro Ala Val Ile Lys Met Leu Ile Glu Gln Gly Phe Ile Asp 340 345 350 Gly Ser Pro Met Thr Cys Ser Gly Glu Ser Leu Lys Asp Thr Val Ala 355 360 365 Lys Tyr Pro 
Ser Leu Pro Lys Glu Gln Asp Ile Phe Ala Ser Val Asp 370 375 380 Ala Pro Leu Lys Pro Ser Gly His Leu Gln Ile Leu Lys Gly Ser Leu 385 390 395 400 Ala Pro Gly Gly Ser Val Gly Lys Ile Thr Gly Lys Glu Gly Thr Phe 405 410 415 Phe Lys Gly Thr Ala Arg Cys Phe Asp Glu Glu Asp Leu Phe Ile Glu 420 425 430 Ala Leu Glu Lys Gly Glu Ile Lys Lys Gly Glu Lys Thr Cys Val Ile 435 440 445 Ile Arg Tyr Glu Gly Pro Lys Gly Gly Pro Gly Met Pro Glu Met Leu 450 455 460 Lys Pro Ser Ser Ala Leu Met Gly Tyr Gly Leu Gly Lys Asp Val Ala 465 470 475 480 Leu Leu Thr Asp Gly Arg Phe Ser Gly Gly Ser His Gly Phe Leu Ile 485 490 495 Gly His Ile Val Pro Glu Ala Tyr Glu Gly Gly Pro Ile Gly Leu Val 500 505 510 Glu Asp Gly Asp Glu Ile Ile Ile Asp Ala Asp Asn Asn Ile Ile Asp 515 520 525 Leu Leu Val Asp Glu Lys Thr Met Ala Glu Arg Lys Ala Lys Trp Thr 530 535 540 Pro Pro Ala Pro Arg Tyr Thr Ser Gly Thr Leu His Lys Tyr Ser Lys 545 550 555 560 Leu Val Ser Asp Ala Ser Thr Gly Cys Ile Thr Asp Ala 565 570 14560PRTFrancisella tularensis 14Met Lys Lys Val Leu Asn Lys Tyr Ser Arg Arg Leu Thr Glu Asp Lys 1 5 10 15 Ser Gln Gly Ala Ser Gln Ala Met Leu Tyr Gly Thr Glu Met Asn Asp 20 25 30 Ala Asp Met His Lys Pro Gln Ile Gly Ile Gly Ser Val Trp Tyr Glu 35 40 45 Gly Asn Thr Cys Asn Met His Leu Asn Gln Leu Ala Gln Phe Val Lys 50 55 60 Asp Ser Val Glu Lys Glu Asn Leu Lys Gly Met Arg Phe Asn Thr Ile 65 70 75 80 Gly Val Ser Asp Gly Ile Ser Met Gly Thr Asp Gly Met Ser Tyr Ser 85 90 95 Leu Gln Ser Arg Asp Leu Ile Ala Asp Ser Ile Glu Thr Val Met Ser 100 105 110 Ala His Trp Tyr Asp Gly Leu Val Ser Ile Pro Gly Cys Asp Lys Asn 115 120 125 Met Pro Gly Cys Met Met Ala Leu Gly Arg Leu Asn Arg Pro Gly Phe 130 135 140 Val Ile Tyr Gly Gly Thr Ile Gln Ala Gly Val Met Arg Gly Lys Pro 145 150 155 160 Ile Asp Ile Val Thr Ala Phe Gln Ser Tyr Gly Ala Cys Leu Ser Gly 165 170 175 Gln Ile Thr Glu Gln Glu Arg Gln Glu Thr Ile Lys Lys Ala Cys Pro 180 185 190 Gly Ala Gly Ala Cys Gly Gly Met Tyr Thr Ala Asn Thr Met Ala Cys 195 200 205 Ala Ile Glu Ala 
Leu Gly Met Ser Leu Pro Phe Ser Ser Ser Thr Ser 210 215 220 Ala Thr Ser Val Glu Lys Val Gln Glu Cys Asp Lys Ala Gly Glu Thr 225 230 235 240 Ile Lys Asn Leu Leu Glu Leu Asp Ile Lys Pro Arg Asp Ile Met Thr 245 250 255 Arg Lys Ala Phe Glu Asn Ala Met Val Leu Ile Thr Val Met Gly Gly 260 265 270 Ser Thr Asn Ala Val Leu His Leu Leu Ala Met Ala Ser Ser Val Asp 275 280 285 Val Asp Leu Ser Ile Asp Asp Phe Gln Glu Ile Ala Asn Lys Thr Pro 290 295 300 Val Leu Ala Asp Phe Lys Pro Ser Gly Lys Tyr Val Met Ala Asn Leu 305 310 315 320 His Ala Ile Gly Gly Thr Pro Ala Val Met Lys Met Leu Leu Lys Ala 325 330 335 Gly Met Leu His Gly Asp Cys Leu Thr Val Thr Gly Lys Thr Leu Ala 340 345 350 Glu Asn Leu Glu Asn Val Ala Asp Leu Pro Glu Asp Asn Thr Ile Ile 355 360 365 His Lys Leu Asp Asn Pro Ile Lys Lys Thr Gly His Leu Gln Ile Leu 370 375 380 Lys Gly Asn Val Ala Pro Glu Gly Ser Val Ala Lys Ile Thr Gly Lys 385 390 395 400 Glu Gly Glu Ile Phe Glu Gly Val Ala Asn Val Phe Asp Ser Glu Glu 405 410 415 Glu Met Val Ala Ala Val Glu Thr Gly Lys Val Lys Lys Gly Asp Val 420 425 430 Ile Val Ile Arg Tyr Glu Gly Pro Lys Gly Gly Pro Gly Met Pro Glu 435 440 445 Met Leu Lys Pro Thr Ser Leu Ile Met Gly Ala Gly Leu Gly Gln Asp 450 455 460 Val Ala Leu Ile Thr Asp Gly Arg Phe Ser Gly Gly Ser His Gly Phe 465 470 475 480 Ile Val Gly His Ile Thr Pro Glu Ala Tyr Glu Gly Gly Met Ile Ala 485 490 495 Leu Leu Glu Asn Gly Asp Lys Ile Thr Ile Asp Ala Ile Asn Asn Val 500 505 510 Ile Asn Val Asp Leu Ser Asp Gln Glu Ile Ala Gln Arg Lys Ser Lys 515 520 525 Trp Arg Ala Ser Lys Gln Lys Ala Ser Arg Gly Thr Leu Lys Lys Tyr 530 535 540 Ile Lys Thr Val Ser Ser Ala Ser Thr Gly Cys Val Thr Asp Leu Asp 545 550 555 560 15581PRTArabidopsis thaliana 15Met Pro Ser Ile Ile Ser Cys Ser Ala Gln Ser Val Thr Ala Asp Pro 1 5 10 15 Ser Pro Pro Ile Thr Asp Thr Asn Lys Leu Asn Lys Tyr Ser Ser Arg 20 25 30 Ile Thr Glu Pro Lys Ser Gln Gly Gly Ser Gln Ala Ile Leu His Gly 35 40 45 Val Gly Leu Ser Asp Asp Asp Leu Leu Lys Pro Gln Ile Gly Ile Ser 50 55 60 
Ser Val Trp Tyr Glu Gly Asn Thr Cys Asn Met His Leu Leu Lys Leu 65 70 75 80 Ser Glu Ala Val Lys Glu Gly Val Glu Asn Ala Gly Met Val Gly Phe 85 90 95 Arg Phe Asn Thr Ile Gly Val Ser Asp Ala Ile Ser Met Gly Thr Arg 100 105 110 Gly Met Cys Phe Ser Leu Gln Ser Arg Asp Leu Ile Ala Asp Ser Ile 115 120 125 Glu Thr Val Met Ser Ala Gln Trp Tyr Asp Gly Asn Ile Ser Ile Pro 130 135 140 Gly Cys Asp Lys Asn Met Pro Gly Thr Ile Met Ala Met Gly Arg Leu 145 150 155 160 Asn Arg Pro Gly Ile Met Val Tyr Gly Gly Thr Ile Lys Pro Gly His 165 170 175 Phe Gln Asp Lys Thr Tyr Asp Ile Val Ser Ala Phe Gln Ser Tyr Gly 180 185 190 Glu Phe Val Ser Gly Ser Ile Ser Asp Glu Gln Arg Lys Thr Val Leu 195 200 205 His His Ser Cys Pro Gly Ala Gly Ala Cys Gly Gly Met Tyr Thr Ala 210 215 220 Asn Thr Met Ala Ser Ala Ile Gly Ala Met Gly Met Ser Leu Pro Tyr 225 230 235 240 Ser Ser Ser Ile Pro Ala Glu Asp Pro Leu Lys Leu Asp Glu Cys Arg 245 250 255 Leu Ala Gly Lys Tyr Leu Leu Glu Leu Leu Lys Met Asp Leu Lys Pro 260 265 270 Arg Asp Ile Ile Thr Pro Lys Ser Leu Arg Asn Ala Met Val Ser Val 275 280 285 Met Ala Leu Gly Gly Ser Thr Asn Ala Val Leu His Leu Ile Ala Ile 290 295 300 Ala Arg Ser Val Gly Leu Glu Leu Thr Leu Asp Asp Phe Gln Lys Val 305 310 315 320 Ser Asp Ala Val Pro Phe Leu Ala Asp Leu Lys Pro Ser Gly Lys Tyr 325 330 335 Val Met Glu Asp Ile His Lys Ile Gly Gly Thr Pro Ala Val Leu Arg 340 345 350 Tyr Leu Leu Glu Leu Gly Leu Met Asp Gly Asp Cys Met Thr Val Thr 355 360 365 Gly Gln Thr Leu Ala Gln Asn Leu Glu Asn Val Pro Ser Leu Thr Glu 370 375 380 Gly Gln Glu Ile Ile Arg Pro Leu Ser Asn Pro Ile Lys Glu Thr Gly 385 390 395 400 His Ile Gln Ile Leu Arg Gly Asp Leu Ala Pro Asp Gly Ser Val Ala 405 410 415 Lys Ile Thr Gly Lys Glu Gly Leu Tyr Phe Ser Gly Pro Ala Leu Val 420 425 430 Phe Glu Gly Glu Glu Ser Met Leu Ala Ala Ile Ser Ala Asp Pro Met 435 440 445 Ser Phe Lys Gly Thr Val Val Val Ile Arg Gly Glu Gly Pro Lys Gly 450 455 460 Gly Pro Gly Met Pro Glu Met Leu Thr Pro Thr Ser Ala Ile Met Gly 465 470 475 480 Ala 
Gly Leu Gly Lys Glu Cys Ala Leu Leu Thr Asp Gly Arg Phe Ser 485 490 495 Gly Gly Ser His Gly Phe Val Val Gly His Ile Cys Pro Glu Ala Gln 500 505 510 Glu Gly Gly Pro Ile Gly Leu Ile Lys Asn Gly Asp Ile Ile Thr Ile 515 520 525 Asp Ile Gly Lys Lys Arg Ile Asp Thr Gln Val Ser Pro Glu Glu Met 530 535 540 Asn Asp Arg Arg Lys Lys Trp Thr Ala Pro Ala Tyr Lys Val Asn Arg 545 550 555 560 Gly Val Leu Tyr Lys Tyr Ile Lys Asn Val Gln Ser Ala Ser Asp Gly 565 570 575 Cys Val Thr Asp Glu 580 16573PRTCandidatus Koribacter versatilis 16Met Thr Glu Lys Ser Pro Lys Pro His Lys Arg Ser Asp Ala Ile Thr 1 5 10 15 Glu Gly Pro Asn Arg Ala Pro Ala Arg Ala Met Leu Arg Ala Ala Gly 20 25 30 Phe Thr Pro Glu Asp Leu Arg Lys Pro Ile Ile Gly Ile Ala Asn Thr 35 40 45 Trp Ile Glu Ile Gly Pro Cys Asn Leu His Leu Arg Glu Leu Ala Glu 50 55 60 His Ile Lys Gln Gly Val Arg Glu Ala Gly Gly Thr Pro Met Glu Phe 65 70 75 80 Asn Thr Val Ser Ile Ser Asp Gly Ile Thr Met Gly Ser Glu Gly Met 85 90 95 Lys Ala Ser Leu Val Ser Arg Glu Val Ile Ala Asp Ser Ile Glu Leu 100 105 110 Val Ala
Arg Gly Asn Leu Phe Asp Gly Leu Ile Ala Leu Ser Gly Cys 115 120 125 Asp Lys Thr Ile Pro Gly Thr Ile Met Ala Leu Glu Arg Leu Asp Ile 130 135 140 Pro Gly Leu Met Leu Tyr Gly Gly Ser Ile Ala Pro Gly Lys Phe His 145 150 155 160 Ala Gln Lys Val Thr Ile Gln Asp Val Phe Glu Ala Val Gly Thr His 165 170 175 Ala Arg Gly Lys Met Ser Asp Ala Asp Leu Glu Glu Leu Glu His Asn 180 185 190 Ala Cys Pro Gly Ala Gly Ala Cys Gly Gly Gln Phe Thr Ala Asn Thr 195 200 205 Met Ser Met Cys Gly Glu Phe Leu Gly Ile Ser Pro Met Gly Ala Asn 210 215 220 Ser Val Pro Ala Met Thr Val Glu Lys Gln Gln Val Ala Arg Arg Cys 225 230 235 240 Gly His Leu Val Met Glu Leu Val Arg Arg Asp Ile Arg Pro Ser Gln 245 250 255 Ile Ile Thr Arg Lys Ala Ile Glu Asn Ala Ile Ala Ser Val Ala Ala 260 265 270 Ser Gly Gly Ser Thr Asn Ala Val Leu His Leu Leu Ala Ile Ala His 275 280 285 Glu Met Asp Val Glu Leu Asn Ile Glu Asp Phe Asp Lys Ile Ser Ser 290 295 300 Arg Thr Pro Leu Leu Cys Glu Leu Lys Pro Ala Gly Arg Phe Thr Ala 305 310 315 320 Thr Asp Leu His Asp Ala Gly Gly Ile Pro Leu Val Ala Gln Arg Leu 325 330 335 Leu Glu Ala Asn Leu Leu His Ala Asp Ala Leu Thr Val Thr Gly Lys 340 345 350 Thr Ile Ala Glu Glu Ala Lys Gln Ala Lys Glu Thr Pro Gly Gln Glu 355 360 365 Val Val Arg Pro Leu Thr Asp Pro Ile Lys Ala Thr Gly Gly Leu Met 370 375 380 Ile Leu Lys Gly Asn Leu Ala Ser Glu Gly Cys Val Val Lys Leu Val 385 390 395 400 Gly His Lys Lys Leu Phe Phe Glu Gly Pro Ala Arg Val Phe Glu Ser 405 410 415 Glu Glu Glu Ala Phe Ala Gly Val Glu Asp Arg Thr Ile Gln Ala Gly 420 425 430 Glu Val Val Val Val Arg Tyr Glu Gly Pro Lys Gly Gly Pro Gly Met 435 440 445 Arg Glu Met Leu Gly Val Thr Ala Ala Ile Ala Gly Thr Glu Leu Ala 450 455 460 Glu Thr Val Ala Leu Ile Thr Asp Gly Arg Phe Ser Gly Ala Thr Arg 465 470 475 480 Gly Leu Ser Val Gly His Val Ala Pro Glu Ala Ala Asn Gly Gly Ala 485 490 495 Ile Ala Val Val Arg Asn Gly Asp Ile Ile Thr Leu Asp Val Glu Arg 500 505 510 Arg Glu Leu Arg Val His Leu Thr Asp Ala Glu Leu Glu Ala Arg Leu 515 520 525 Arg Asn Trp 
Arg Ala Pro Glu Pro Arg Tyr Lys Arg Gly Val Phe Ala 530 535 540 Lys Tyr Ala Ser Thr Val Ser Ser Ala Ser Phe Gly Ala Val Thr Gly 545 550 555 560 Ser Thr Ile Glu Asn Lys Thr Leu Ala Gly Ser Thr Lys 565 570 17562PRTGramella forsetii 17Met Asp Lys Thr Ala Met Asn Asn Lys Tyr Ser Ser Thr Ile Thr Gln 1 5 10 15 Ser Asp Ser Gln Pro Ala Ser Gln Ala Met Leu His Ala Ile Gly Leu 20 25 30 Asn Lys Glu Asp Leu Lys Lys Pro Phe Val Gly Ile Gly Ser Thr Gly 35 40 45 Tyr Glu Gly Asn Pro Cys Asn Met His Leu Asn Asp Leu Ala Lys Glu 50 55 60 Val Lys Lys Gly Thr Gln Asn Ala Asp Leu Asn Gly Leu Ile Phe Asn 65 70 75 80 Thr Ile Gly Val Ser Asp Gly Ile Ser Met Gly Thr Pro Gly Met Arg 85 90 95 Phe Ser Leu Pro Ser Arg Asp Leu Ile Ala Asp Ser Met Glu Thr Val 100 105 110 Val Gly Gly Met Ser Tyr Asp Gly Leu Val Thr Val Val Gly Cys Asp 115 120 125 Lys Asn Met Pro Gly Ala Leu Met Ala Met Leu Arg Leu Asn Arg Pro 130 135 140 Ser Val Leu Val Tyr Gly Gly Thr Ile Ala Ser Gly Cys His Asn Gly 145 150 155 160 Lys Lys Leu Asp Val Val Ser Ala Phe Glu Ala Trp Gly Ser Lys Val 165 170 175 Ser Gly Asp Met Gln Glu Glu Glu Tyr Gln Gln Val Ile Glu Lys Ala 180 185 190 Cys Pro Gly Ala Gly Ala Cys Gly Gly Met Tyr Thr Ala Asn Thr Met 195 200 205 Ala Ser Ser Ile Glu Ala Leu Gly Met Ser Leu Pro Phe Asn Ser Ser 210 215 220 Asn Pro Ala Thr Gly Pro Glu Lys Thr Gln Glu Ser Val Lys Ala Gly 225 230 235 240 Glu Ala Met Lys Tyr Leu Leu Glu Asn Asp Leu Lys Pro Lys Asp Ile 245 250 255 Val Thr Ala Lys Ser Leu Glu Asn Ala Ile Arg Leu Leu Thr Val Leu 260 265 270 Gly Gly Ser Thr Asn Ala Val Leu His Phe Leu Ala Ile Ala Lys Ala 275 280 285 Ala Glu Ile Asn Phe Gly Leu Lys Asp Phe Thr Arg Ile Cys Glu Glu 290 295 300 Thr Pro Phe Leu Ala Asp Leu Lys Pro Ser Gly Lys Tyr Leu Met Glu 305 310 315 320 Asp Ile His Arg Ile Gly Gly Ile Pro Ala Val Met Lys Tyr Met Leu 325 330 335 Glu Lys Gly Leu Leu His Gly Glu Cys Met Thr Val Thr Gly Lys Thr 340 345 350 Ile Ala Glu Asn Leu Glu Asn Val Lys Pro Leu Pro Asp Asp Gln Asp 355 360 365 Val Ile His Pro Val 
Glu Lys Pro Ile Lys Ala Thr Gly His Ile Arg 370 375 380 Ile Leu Tyr Gly Asn Leu Ala Ser Glu Gly Ser Val Ala Lys Ile Thr 385 390 395 400 Gly Lys Glu Gly Leu Glu Phe Gln Gly Lys Ala Arg Val Phe Asn Gly 405 410 415 Glu Phe Glu Ala Asn Glu Gly Ile Ser Ser Gly Lys Val Gln Lys Gly 420 425 430 Asp Val Val Val Ile Arg Tyr Glu Gly Pro Lys Gly Gly Pro Gly Met 435 440 445 Pro Glu Met Leu Lys Pro Thr Ser Ala Ile Met Gly Ala Gly Leu Gly 450 455 460 Lys Ser Val Ala Leu Ile Thr Asp Gly Arg Phe Ser Gly Gly Thr His 465 470 475 480 Gly Phe Val Val Gly His Ile Thr Pro Glu Ala Gln Gln Gly Gly Leu 485 490 495 Ile Gly Leu Leu Lys Asp Gly Asp Glu Ile Ser Ile Asn Ala Glu Lys 500 505 510 Asn Thr Ile Glu Ala His Leu Ser Ala Glu Glu Ile Asn Arg Arg Lys 515 520 525 Glu Ala Trp Lys Ala Pro Ala Leu Lys Val Asn Gly Gly Val Leu Tyr 530 535 540 Lys Tyr Ala Lys Thr Val Ala Ser Ala Ser Glu Gly Cys Val Thr Asp 545 550 555 560 Glu Phe 18570PRTLactococcus lactis 18Met Glu Phe Lys Tyr Asn Gly Lys Val Glu Ser Val Glu Leu Asn Lys 1 5 10 15 Tyr Ser Lys Thr Leu Thr Gln Asp Pro Thr Gln Pro Ala Thr Gln Ala 20 25 30 Met Tyr Tyr Gly Ile Gly Phe Lys Asp Glu Asp Phe Lys Lys Ala Gln 35 40 45 Val Gly Ile Val Ser Met Asp Trp Asp Gly Asn Pro Cys Asn Met His 50 55 60 Leu Gly Thr Leu Gly Ser Lys Ile Lys Ser Ser Val Asn Gln Thr Asp 65 70 75 80 Gly Leu Ile Gly Leu Gln Phe His Thr Ile Gly Val Ser Asp Gly Ile 85 90 95 Ala Asn Gly Lys Leu Gly Met Arg Tyr Ser Leu Val Ser Arg Glu Val 100 105 110 Ile Ala Asp Ser Ile Glu Thr Asn Ala Gly Ala Glu Tyr Tyr Asp Ala 115 120 125 Ile Val Ala Ile Pro Gly Cys Asp Lys Asn Met Pro Gly Ser Ile Ile 130 135 140 Gly Met Ala Arg Leu Asn Arg Pro Ser Ile Met Val Tyr Gly Gly Thr 145 150 155 160 Ile Glu His Gly Glu Tyr Lys Gly Glu Lys Leu Asn Ile Val Ser Ala 165 170 175 Phe Glu Ser Leu Gly Gln Lys Ile Thr Gly Asn Ile Ser Asp Glu Asp 180 185 190 Tyr His Gly Val Ile Cys Asn Ala Ile Pro Gly Gln Gly Ala Cys Gly 195 200 205 Gly Met Tyr Thr Ala Asn Thr Leu Ala Ala Ala Ile Glu Thr Leu Gly 210 215 220 Met 
Ser Leu Pro Tyr Ser Ser Ser Asn Pro Ala Val Ser Gln Glu Lys 225 230 235 240 Gln Glu Glu Cys Asp Glu Ile Gly Leu Ala Ile Lys Asn Leu Leu Glu 245 250 255 Lys Asp Ile Lys Pro Ser Asp Ile Met Thr Lys Glu Ala Phe Glu Asn 260 265 270 Ala Ile Thr Ile Val Met Val Leu Gly Gly Ser Thr Asn Ala Val Leu 275 280 285 His Ile Ile Ala Met Ala Asn Ala Ile Gly Val Glu Ile Thr Gln Asp 290 295 300 Asp Phe Gln Arg Ile Ser Asp Ile Thr Pro Val Leu Gly Asp Phe Lys 305 310 315 320 Pro Ser Gly Lys Tyr Met Met Glu Asp Leu His Lys Ile Gly Gly Leu 325 330 335 Pro Ala Val Leu Lys Tyr Leu Leu Lys Glu Gly Lys Leu His Gly Asp 340 345 350 Cys Leu Thr Val Thr Gly Lys Thr Leu Ala Glu Asn Val Glu Thr Ala 355 360 365 Leu Asp Leu Asp Phe Asp Ser Gln Asp Ile Met Arg Pro Leu Lys Asn 370 375 380 Pro Ile Lys Ala Thr Gly His Leu Gln Ile Leu Tyr Gly Asn Leu Ala 385 390 395 400 Gln Gly Gly Ser Val Ala Lys Ile Ser Gly Lys Glu Gly Glu Phe Phe 405 410 415 Lys Gly Thr Ala Arg Val Phe Asp Gly Glu Gln His Phe Ile Asp Gly 420 425 430 Ile Glu Ser Gly Arg Leu His Ala Gly Asp Val Ala Val Ile Arg Asn 435 440 445 Ile Gly Pro Val Gly Gly Pro Gly Met Pro Glu Met Leu Lys Pro Thr 450 455 460 Ser Ala Leu Ile Gly Ala Gly Leu Gly Lys Ser Cys Ala Leu Ile Thr 465 470 475 480 Asp Gly Arg Phe Ser Gly Gly Thr His Gly Phe Val Val Gly His Ile 485 490 495 Val Pro Glu Ala Val Glu Gly Gly Leu Ile Gly Leu Val Glu Asp Asp 500 505 510 Asp Ile Ile Glu Ile Asp Ala Val Asn Asn Ser Ile Ser Leu Lys Val 515 520 525 Ser Asp Glu Glu Ile Ala Lys Arg Arg Ala Asn Tyr Gln Lys Pro Thr 530 535 540 Pro Lys Ala Thr Arg Gly Val Leu Ala Lys Phe Ala Lys Leu Thr Arg 545 550 555 560 Pro Ala Ser Glu Gly Cys Val Thr Asp Leu 565 570 19568PRTSaccharopolyspora erythraea 19Met Ser Thr Ser Thr Asp Gly Thr Gly Gln Ser Gly Arg Gly Leu Lys 1 5 10 15 Pro Arg Ser Gly Asp Val Thr Glu Gly Ile Glu Arg Ala Ala Ala Arg 20 25 30 Gly Met Leu Arg Ala Val Gly Met Gln Asp Ala Asp Phe Ala Lys Pro 35 40 45 Gln Ile Gly Val Ala Ser Ser Trp Asn Glu Ile Thr Pro Cys Asn Leu 50 55 60 Ser Leu 
Gln Arg Leu Ala Gln Ala Ser Lys Glu Gly Val His Ala Ala 65 70 75 80 Gly Gly Phe Pro Met Glu Phe Gly Thr Ile Ser Val Ser Asp Gly Ile 85 90 95 Ser Met Gly His Val Gly Met His Tyr Ser Leu Val Ser Arg Glu Val 100 105 110 Ile Ala Asp Ser Val Glu Thr Val Met Glu Ala Glu Arg Leu Asp Gly 115 120 125 Ser Val Leu Leu Ala Gly Cys Asp Lys Ser Leu Pro Gly Met Leu Met 130 135 140 Ala Ala Ala Arg Leu Asp Val Ala Ala Val Phe Val Tyr Ala Gly Ser 145 150 155 160 Ile Leu Pro Gly Arg Val Asp Asp Arg Glu Val Thr Ile Ile Asp Ala 165 170 175 Phe Glu Ala Val Gly Ala Cys Ala Arg Gly Leu Ile Ser Glu Ala Glu 180 185 190 Val Asp Arg Ile Glu Arg Ala Ile Cys Pro Gly Glu Gly Ala Cys Gly 195 200 205 Gly Met Tyr Thr Ala Asn Thr Met Ala Cys Ala Ala Glu Ala Met Gly 210 215 220 Met Ser Leu Pro Gly Ser Ala Ser Pro Pro Ser Val Asp Arg Arg Arg 225 230 235 240 Asp Ala Gly Ala Arg Glu Ala Gly Arg Ala Val Val Gly Met Ile Glu 245 250 255 Arg Gly Leu Thr Ala Arg Gln Ile Leu Thr Lys Glu Ala Phe Glu Asn 260 265 270 Ala Ile Ala Val Val Met Ala Phe Gly Gly Ser Thr Asn Ala Val Leu 275 280 285 His Leu Leu Ala Ile Ala Arg Glu Ala Glu Val Asp Leu Thr Leu Asp 290 295 300 Asp Phe Asn Arg Ile Gly Asp Arg Val Pro His Leu Ala Asp Val Lys 305 310 315 320 Pro Phe Gly Arg His Val Met Thr Ala Val Asp Arg Ile Gly Gly Val 325 330 335 Pro Val Val Met Lys Ala Leu Leu Asp Ala Gly Leu Leu His Gly Asp 340 345 350 Cys Met Thr Val Thr Gly Lys Thr Val Ala Glu Asn Leu Ala Glu Leu 355 360 365 Asp Pro Pro Glu Leu Asp Gly Glu Val Leu His Lys Leu Ser Asn Pro 370 375 380 Leu His Pro Thr Gly Gly Leu Thr Ile Leu Arg Gly Ser Leu Ala Pro 385 390 395 400 Glu Gly Ala Val Val Lys Ser Ala Gly Phe Asp Ser Ala Thr Phe Glu 405 410 415 Gly Thr Ala Arg Val Phe Asp Gly Glu Gln Gly Ala Met Asp Ala Val 420 425 430 Glu Asp Gly Ser Leu Lys Ala Gly Asp Val Val Val Ile Arg Tyr Glu 435 440 445 Gly Pro Arg Gly Gly Pro Gly Met Arg Glu Met Leu Ala Val Thr Gly 450 455 460 Ala Ile Lys Gly Ala Gly Leu Gly Lys Asp Val Leu Leu Leu Thr Asp 465 470 475 480 Gly Arg Phe 
Ser Gly Gly Thr Thr Gly Leu Cys Ile Gly His Val Ala 485 490 495 Pro Glu Ala Thr Asp Gly Gly Pro Ile Ala Phe Val Arg Asp Gly Asp 500 505 510 Pro Ile Arg Leu Asp Leu Ala Gly Arg Thr Leu Asp Leu Leu Val Asp 515 520 525 Glu Ala Glu Leu Ala Arg Arg Lys Glu Gly Trp Val Pro Arg Glu Pro 530 535 540 Lys Phe Arg Gln Gly Val Leu Gly Lys Tyr Ala Arg Leu Val Arg Ser 545 550 555 560 Ala Ala Val Gly Ala Val Cys Ser 565 20585PRTSaccharomyces cerevisiae 20Met Gly Leu Leu Thr Lys Val Ala Thr Ser Arg Gln Phe Ser Thr Thr 1 5 10 15 Arg Cys Val Ala Lys Lys Leu Asn Lys Tyr Ser Tyr Ile Ile Thr Glu 20 25 30 Pro Lys Gly Gln Gly Ala Ser Gln Ala Met Leu Tyr Ala Thr Gly Phe 35 40 45 Lys Lys Glu Asp Phe Lys Lys Pro Gln Val Gly Val Gly Ser Cys Trp 50 55 60 Trp Ser Gly Asn Pro Cys Asn Met His Leu Leu Asp Leu Asn Asn Arg 65 70 75 80 Cys Ser Gln Ser Ile Glu Lys Ala Gly Leu Lys Ala Met Gln Phe Asn 85 90 95 Thr Ile Gly Val Ser Asp Gly Ile Ser Met Gly Thr Lys Gly Met Arg 100 105 110 Tyr Ser Leu Gln Ser Arg Glu Ile Ile Ala Asp Ser Phe Glu Thr Ile 115 120 125 Met Met Ala Gln
His Tyr Asp Ala Asn Ile Ala Ile Pro Ser Cys Asp 130 135 140 Lys Asn Met Pro Gly Val Met Met Ala Met Gly Arg His Asn Arg Pro 145 150 155 160 Ser Ile Met Val Tyr Gly Gly Thr Ile Leu Pro Gly His Pro Thr Cys 165 170 175 Gly Ser Ser Lys Ile Ser Lys Asn Ile Asp Ile Val Ser Ala Phe Gln 180 185 190 Ser Tyr Gly Glu Tyr Ile Ser Lys Gln Phe Thr Glu Glu Glu Arg Glu 195 200 205 Asp Val Val Glu His Ala Cys Pro Gly Pro Gly Ser Cys Gly Gly Met 210 215 220 Tyr Thr Ala Asn Thr Met Ala Ser Ala Ala Glu Val Leu Gly Leu Thr 225 230 235 240 Ile Pro Asn Ser Ser Ser Phe Pro Ala Val Ser Lys Glu Lys Leu Ala 245 250 255 Glu Cys Asp Asn Ile Gly Glu Tyr Ile Lys Lys Thr Met Glu Leu Gly 260 265 270 Ile Leu Pro Arg Asp Ile Leu Thr Lys Glu Ala Phe Glu Asn Ala Ile 275 280 285 Thr Tyr Val Val Ala Thr Gly Gly Ser Thr Asn Ala Val Leu His Leu 290 295 300 Val Ala Val Ala His Ser Ala Gly Val Lys Leu Ser Pro Asp Asp Phe 305 310 315 320 Gln Arg Ile Ser Asp Thr Thr Pro Leu Ile Gly Asp Phe Lys Pro Ser 325 330 335 Gly Lys Tyr Val Met Ala Asp Leu Ile Asn Val Gly Gly Thr Gln Ser 340 345 350 Val Ile Lys Tyr Leu Tyr Glu Asn Asn Met Leu His Gly Asn Thr Met 355 360 365 Thr Val Thr Gly Asp Thr Leu Ala Glu Arg Ala Lys Lys Ala Pro Ser 370 375 380 Leu Pro Glu Gly Gln Glu Ile Ile Lys Pro Leu Ser His Pro Ile Lys 385 390 395 400 Ala Asn Gly His Leu Gln Ile Leu Tyr Gly Ser Leu Ala Pro Gly Gly 405 410 415 Ala Val Gly Lys Ile Thr Gly Lys Glu Gly Thr Tyr Phe Lys Gly Arg 420 425 430 Ala Arg Val Phe Glu Glu Glu Gly Ala Phe Ile Glu Ala Leu Glu Arg 435 440 445 Gly Glu Ile Lys Lys Gly Glu Lys Thr Val Val Val Ile Arg Tyr Glu 450 455 460 Gly Pro Arg Gly Ala Pro Gly Met Pro Glu Met Leu Lys Pro Ser Ser 465 470 475 480 Ala Leu Met Gly Tyr Gly Leu Gly Lys Asp Val Ala Leu Leu Thr Asp 485 490 495 Gly Arg Phe Ser Gly Gly Ser His Gly Phe Leu Ile Gly His Ile Val 500 505 510 Pro Glu Ala Ala Glu Gly Gly Pro Ile Gly Leu Val Arg Asp Gly Asp 515 520 525 Glu Ile Ile Ile Asp Ala Asp Asn Asn Lys Ile Asp Leu Leu Val Ser 530 535 540 Asp Lys Glu Met Ala 
Gln Arg Lys Gln Ser Trp Val Ala Pro Pro Pro 545 550 555 560 Arg Tyr Thr Arg Gly Thr Leu Ser Lys Tyr Ala Lys Leu Val Ser Asn 565 570 575 Ala Ser Asn Gly Cys Val Leu Asp Ala 580 585 21592PRTPiromyces sp 21Met Ser Phe Ser Leu Ala Asn Leu Ala Ala Lys Gly Ser Asn Leu Phe 1 5 10 15 Lys Phe Thr Pro Ala Leu Leu Ser Ala Lys Arg Phe Gly Ser Ser Gly 20 25 30 Lys Pro Ile Asn Lys Phe Ser Lys Ile Ile Thr Glu Pro Lys Ser Arg 35 40 45 Gly Gly Ser Gln Ala Met Leu Ile Ala Thr Gly Ile Lys Pro Glu Asp 50 55 60 Leu Lys Lys Pro Gln Ile Gly Ile Gly Ser Val Trp Tyr Asp Gly Asn 65 70 75 80 Pro Cys Asn Met His Leu Leu Asp Leu Gly Ser Val Val Lys Lys Ala 85 90 95 Val Gln Lys Gln Asn Met Asn Gly Met Arg Phe Asn Met Ile Gly Val 100 105 110 Ser Asp Gly Ile Ser Asn Gly Thr Asp Gly Met Ser Phe Ser Leu Gln 115 120 125 Ser Arg Glu Ile Ile Ala Asp Ser Ile Glu Thr Ile Met Ser Ala Gln 130 135 140 Tyr Tyr Asp Ala Asn Ile Ser Leu Pro Gly Cys Asp Lys Asn Met Pro 145 150 155 160 Gly Cys Leu Ile Ala Ala Ala Arg Leu Asn Arg Pro Thr Ile Ile Ile 165 170 175 Tyr Gly Gly Thr Ile Lys Pro Gly His Thr Lys Lys Gly Glu Thr Ile 180 185 190 Asp Leu Val Ser Ala Phe Gln Cys Tyr Gly Gln Tyr Leu Ala Gly Glu 195 200 205 Ile Thr Glu Glu Gln Arg Glu Glu Ile Val Asn Asn Ala Cys Pro Gly 210 215 220 Ala Gly Ala Cys Gly Gly Met Tyr Thr Ala Asn Thr Met Ala Ser Ile 225 230 235 240 Ile Glu Ser Met Gly Met Ser Leu Pro Tyr Ser Ala Ser Thr Pro Ala 245 250 255 Glu Asp Pro Leu Lys Glu Leu Glu Cys Ile Asn Ala Ala Ala Ala Ile 260 265 270 Lys Asn Leu Met Glu Lys Asp Ile Lys Pro Leu Asp Ile Met Thr Arg 275 280 285 Lys Ala Phe Glu Asn Ala Ile Thr Ile Thr Leu Ile Leu Gly Gly Ser 290 295 300 Thr Asn Ser Val Leu His Leu Leu Ala Ile Ala Arg Ala Cys Lys Val 305 310 315 320 Pro Leu Thr Ile Asp Asp Phe Gln Glu Phe Ser Asn Arg Ile Pro Val 325 330 335 Leu Ala Asp Leu Lys Pro Ser Gly Lys Tyr Val Met Glu Asp Leu Gln 340 345 350 Leu Ile Gly Gly Leu Pro Ala Ile Gln Lys Tyr Leu Leu Asn Glu Gly 355 360 365 Leu Leu His Gly Asp Ile Met Thr Val Thr Gly Lys 
Thr Leu Ala Glu 370 375 380 Asn Leu Lys Asp Val Ala Pro Ile Asp Phe Glu Thr Gln Asp Ile Ile 385 390 395 400 Arg Pro Leu Ser Asn Pro Ile Lys Lys Asn Gly His Ile Ile Ile Met 405 410 415 Lys Gly Asn Val Ser Pro Asp Gly Gly Val Ala Lys Ile Thr Gly Lys 420 425 430 Gln Gly Leu Phe Phe Glu Gly Val Ala Asn Cys Phe Asp Cys Glu Glu 435 440 445 Asp Met Leu Ala Ala Leu Glu Arg Gly Glu Ile Lys Lys Gly Gln Val 450 455 460 Ile Ile Ile Arg Tyr Glu Gly Pro Thr Gly Gly Pro Gly Met Pro Glu 465 470 475 480 Met Leu Thr Pro Thr Ser Ala Ile Met Gly Ala Gly Leu Gly Lys Asp 485 490 495 Val Ala Leu Leu Thr Asp Gly Arg Phe Ser Gly Gly Ser His Gly Phe 500 505 510 Ile Ile Gly His Ile Thr Pro Glu Ala Gln Val Gly Gly Pro Ile Ala 515 520 525 Leu Ile Lys Asn Gly Asp Lys Ile Thr Ile Asp Ala Asn Lys Arg Thr 530 535 540 Ile His Ala His Val Ser Glu Glu Glu Phe Ala Lys Arg Arg Ala Glu 545 550 555 560 Trp Lys Ala Pro Pro Tyr Arg Ala Thr Gln Gly Thr Leu Lys Lys Tyr 565 570 575 Ile Lys Leu Val Lys Pro Ala Asn Phe Gly Cys Val Thr Asp Glu Trp 580 585 590 22587PRTRalstonia eutropha 22Met Pro Tyr Ala Asp Asp Pro Lys Leu Pro Gln Asp Gly Ala Ala Pro 1 5 10 15 Thr Glu Gly Leu Ala Lys Gly Leu Thr Asn Tyr Gly Asp Thr Gly Phe 20 25 30 Ser Leu Phe Leu Arg Lys Ala Phe Ile Lys Gly Ala Gly Phe Thr Asp 35 40 45 Asp Ala Leu Ser Arg Pro Val Ile Gly Ile Val Asn Thr Gly Ser Ser 50 55 60 Tyr Asn Pro Cys His Gly Asn Ala Pro Gln Leu Val Glu Ala Val Lys 65 70 75 80 Arg Gly Val Met Leu Ala Gly Gly Leu Pro Val Asp Phe Pro Thr Ile 85 90 95 Ser Val His Glu Ser Phe Ser Ala Pro Thr Ser Met Tyr Leu Arg Asn 100 105 110 Leu Met Ser Met Asp Thr Glu Glu Met Ile Arg Ala Gln Pro Met Asp 115 120 125 Ala Val Val Leu Ile Gly Gly Cys Asp Lys Thr Val Pro Ala Gln Leu 130 135 140 Met Gly Ala Ala Ser Ala Gly Val Pro Ala Ile Gln Leu Val Thr Gly 145 150 155 160 Ser Met Leu Thr Gly Ser His Arg Ser Glu Arg Val Gly Ala Cys Thr 165 170 175 Asp Cys Arg Arg Tyr Trp Gly Arg Tyr Arg Ala Glu Glu Ile Asp Ser 180 185 190 Ala Glu Ile Ala Asp Val Asn Asn Gln Leu 
Val Ala Ser Val Gly Thr 195 200 205 Cys Ser Val Met Gly Thr Ala Ser Thr Met Ala Cys Val Ala Glu Ala 210 215 220 Leu Gly Met Met Val Ser Gly Gly Ala Ser Ala Pro Ala Val Thr Ala 225 230 235 240 Asp Arg Val Arg Val Ala Glu Arg Thr Gly Thr Thr Ala Val Gly Met 245 250 255 Ala Ala Ala Arg Leu Thr Pro Asp Arg Ile Leu Thr Gly Lys Ala Phe 260 265 270 Glu Asn Ala Leu Arg Val Leu Leu Ala Ile Gly Gly Ser Thr Asn Gly 275 280 285 Ile Val His Leu Thr Ala Ile Ala Gly Arg Leu Gly Ile Asp Ile Asp 290 295 300 Leu Ala Gly Leu Asp Arg Met Ser Arg Glu Thr Pro Val Leu Val Asp 305 310 315 320 Leu Lys Pro Ser Gly Gln His Tyr Met Glu Asp Phe His Lys Ala Gly 325 330 335 Gly Met Leu Thr Leu Leu Arg Glu Leu Arg Pro Leu Leu His Leu Asp 340 345 350 Thr Leu Thr Val Ser Gly Arg Thr Leu Gly Glu Glu Leu Asp Ala Ala 355 360 365 Pro Pro Leu Phe Pro Gln Asp Val Ile Arg Ser Ala Gly Asn Pro Ile 370 375 380 Tyr Pro Ala Gly Gly Leu Ala Val Leu Arg Gly Asn Leu Ala Pro Gly 385 390 395 400 Gly Ala Ile Ile Lys Gln Ser Ala Ala Asn Pro Ala Leu Met Glu His 405 410 415 Glu Gly Arg Ala Val Val Phe Glu Asn Ala Glu Asp Met Ala Gln Arg 420 425 430 Ile Asp Asp Glu Ser Leu Asp Val Lys Ala Asp Asp Ile Leu Val Leu 435 440 445 Lys Arg Ile Gly Pro Thr Gly Ala Pro Gly Met Pro Glu Ala Gly Tyr 450 455 460 Met Pro Ile Pro Lys Lys Leu Ala Arg Ala Gly Val Lys Asp Met Val 465 470 475 480 Arg Val Ser Asp Gly Arg Met Ser Gly Thr Ala Ala Gly Thr Ile Val 485 490 495 Leu His Val Thr Pro Glu Ala Ala Ile Gly Gly Pro Leu Ala Leu Val 500 505 510 Gln Ser Gly Asp Arg Ile Arg Leu Ser Val Ala Asn Arg Glu Ile Ala 515 520 525 Leu Leu Val Asp Asp Ala Glu Leu Ala Arg Arg Ala Ala Ala Gln Pro 530 535 540 Val Glu Arg Pro Arg Ala Glu Arg Gly Tyr Arg Lys Leu Phe Leu Glu 545 550 555 560 Thr Val Thr Gln Ala Asp Gln Gly Val Asp Phe Asp Phe Leu Arg Ala 565 570 575 Ala Gln Thr Val Asp Thr Val Pro Lys Gln Gly 580 585 23581PRTChromohalobacter salexigens 23Met Thr His Lys Lys Arg Pro Leu Arg Ser Ala Glu Trp Phe Gly Asn 1 5 10 15 Asp Asp Lys Asn Gly Phe Met Tyr 
Arg Ser Trp Met Lys Asn Gln Gly 20 25 30 Ile Pro Asp His Glu Phe Arg Gly Lys Pro Ile Ile Gly Ile Cys Asn 35 40 45 Thr Phe Ser Glu Leu Thr Pro Cys Asn Ala His Phe Arg Lys Leu Ala 50 55 60 Glu His Val Lys Lys Gly Val Leu Glu Ala Gly Gly Tyr Pro Val Glu 65 70 75 80 Phe Pro Val Phe Ser Asn Gly Glu Ser Asn Leu Arg Pro Thr Ala Met 85 90 95 Phe Thr Arg Asn Leu Ala Ser Met Asp Val Glu Glu Ala Ile Arg Gly 100 105 110 Asn Pro Leu Asp Ala Val Val Leu Leu Val Gly Cys Asp Lys Thr Thr 115 120 125 Pro Ala Leu Leu Met Gly Ala Ala Ser Cys Asp Ile Pro Thr Ile Val 130 135 140 Val Thr Gly Gly Pro Met Leu Asn Gly Lys His Lys Gly Arg Asp Ile 145 150 155 160 Gly Ser Gly Thr Val Val Trp Gln Leu Ser Glu Glu Val Lys Ala Gly 165 170 175 Lys Ile Ser Leu His Asp Phe Met Ala Ala Glu Ala Gly Met Ser Arg 180 185 190 Ser Ala Gly Thr Cys Asn Thr Met Gly Thr Ala Ser Thr Met Ala Cys 195 200 205 Met Ala Glu Ser Leu Gly Thr Ser Leu Pro His Asn Ala Ala Ile Pro 210 215 220 Ala Val Asp Ser Arg Arg Tyr Val Leu Ala His Leu Ser Gly Asn Arg 225 230 235 240 Ile Val Glu Met Val Asp Glu Asp Leu Thr Leu Ser Lys Val Leu Thr 245 250 255 Lys Ser Ala Phe Glu Asn Ala Ile Arg Thr Asn Ala Ala Ile Gly Gly 260 265 270 Ser Thr Asn Ala Val Ile His Leu Gln Ala Ile Ala Gly Arg Met Gly 275 280 285 Val Asp Leu Thr Leu Asp Asp Trp Thr Arg Val Gly Arg Gly Thr Pro 290 295 300 Thr Ile Val Asp Leu Gln Pro Ser Gly Arg Tyr Leu Met Glu Glu Phe 305 310 315 320 Tyr Tyr Ala Gly Gly Leu Pro Ala Val Leu Arg Arg Leu Gly Glu Ala 325 330 335 Asp Arg Leu Pro His Lys Asp Ala Leu Thr Val Asn Gly Lys Thr Leu 340 345 350 Trp Glu Asn Val Gln Asp Ala Pro Leu Tyr Asn Asp Ala Val Ile Leu 355 360 365 Pro Leu Asp Ala Pro Leu Arg Glu Asp Gly Gly Met Cys Val Met Arg 370 375 380 Gly Asn Leu Ala Pro Asn Gly Ala Val Leu Lys Pro Ser Ala Ala Thr 385 390 395 400 Pro Ala Leu Met Gln His Arg Gly Arg Ala Val Val Phe Glu Asn Phe 405 410 415 Asp Asp Tyr Lys Ala Arg Ile Asn Asp Pro Asp Leu Asp Val Thr Ala 420 425 430 Asp Asp Ile Leu Val Met Lys Asn Cys Gly Pro Arg Gly 
Tyr His Gly 435 440 445 Met Ala Glu Val Gly Asn Met Gly Leu Pro Ala Lys Leu Leu Glu Gln 450 455 460 Gly Val Thr Asp Met Val Arg Ile Ser Asp Ala Arg Met Ser Gly Thr 465 470 475 480 Ala Tyr Gly Thr Val Val Leu His Val Ala Pro Glu Ala Ala Ala Gly 485 490 495 Gly Pro Leu Ala Ala Val Arg Asn Gly Asp Trp Ile Ala Leu Asp Ala 500 505 510 Tyr Ser Gly Lys Leu His Leu Glu Val Asp Asp Ala Glu Ile Ala Ser 515 520 525 Arg Leu Ala Glu Ala Asp Pro Thr Ala Glu Ser Thr Arg Ile Ala Ser 530 535 540 Thr Gly Gly Tyr Arg Gln Leu Tyr Ile Glu His Val Leu Gln Ala Asp 545 550 555 560 Gln Gly Cys Asp Phe Asp Phe Leu Val Gly Cys Arg Gly Ala Glu Val 565 570 575 Pro Arg His Ser His 580 24329PRTPicrophilus torridus 24Met Glu Lys Val Tyr Thr Glu Asn Asp Leu Lys Glu Asn Leu Met Arg 1 5 10 15 Asn Lys Lys Ile Ala Val Leu Gly Tyr Gly Ser Gln Gly Arg Ala Trp 20 25 30 Ala Leu Asn Met Arg Asp Ser Gly Leu Asn Val Thr Val Gly Leu Glu 35 40 45 Arg Gln Gly Lys Ser Trp Glu Lys Ala Val Ala Asp Gly Phe Lys Pro 50 55 60 Leu Lys Ser Arg Asp Ala Val Arg Asp Ala Asp Ala Val Ile Phe Leu 65 70
75 80 Val Pro Asp Met Ala Gln Arg Glu Leu Tyr Lys Asn Ile Met Asn Asp 85 90 95 Ile Lys Asp Asp Ala Asp Ile Val Phe Ala His Gly Phe Asn Val His 100 105 110 Tyr Gly Leu Ile Asn Pro Lys Asn His Asp Val Tyr Met Val Ala Pro 115 120 125 Lys Ala Pro Gly Pro Ser Val Arg Glu Phe Tyr Glu Arg Gly Gly Gly 130 135 140 Val Pro Val Leu Ile Ala Val Ala Asn Asp Val Ser Gly Arg Ser Lys 145 150 155 160 Glu Lys Ala Leu Ser Ile Ala Tyr Ser Leu Gly Ala Leu Arg Ala Gly 165 170 175 Ala Ile Glu Thr Thr Phe Lys Glu Glu Thr Glu Thr Asp Leu Ile Gly 180 185 190 Glu Gln Leu Asp Leu Val Gly Gly Ile Thr Glu Leu Leu Arg Ser Thr 195 200 205 Phe Asn Ile Met Val Glu Met Gly Tyr Lys Pro Glu Met Ala Tyr Phe 210 215 220 Glu Ala Ile Asn Glu Met Lys Leu Ile Val Asp Gln Val Phe Glu Lys 225 230 235 240 Gly Ile Ser Gly Met Leu Arg Ala Val Ser Asp Thr Ala Lys Tyr Gly 245 250 255 Gly Leu Thr Thr Gly Lys Tyr Ile Ile Asn Asp Asp Val Arg Lys Arg 260 265 270 Met Arg Glu Arg Ala Glu Tyr Ile Val Ser Gly Lys Phe Ala Glu Glu 275 280 285 Trp Ile Glu Glu Tyr Gly Glu Gly Ser Lys Asn Leu Glu Ser Met Met 290 295 300 Leu Asp Ile Asp Asn Ser Leu Glu Glu Gln Val Gly Lys Gln Leu Arg 305 310 315 320 Glu Ile Val Leu Arg Gly Arg Pro Lys 325 25560PRTPicrophilus torridus 25Met Asn Pro Asp Lys Lys Lys Arg Ser Asn Leu Ile Tyr Gly Gly Tyr 1 5 10 15 Glu Lys Ala Pro Asn Arg Ala Phe Leu Lys Ala Met Gly Leu Thr Asp 20 25 30 Asp Asp Ile Ala Lys Pro Ile Val Gly Val Ala Val Ala Trp Asn Glu 35 40 45 Ala Gly Pro Cys Asn Ile His Leu Leu Gly Leu Ser Asn Ile Val Lys 50 55 60 Glu Gly Val Arg Ser Gly Gly Gly Thr Pro Arg Val Phe Thr Ala Pro 65 70 75 80 Val Val Ile Asp Gly Ile Ala Met Gly Ser Glu Gly Met Lys Tyr Ser 85 90 95 Leu Val Ser Arg Glu Ile Val Ala Asn Thr Val Glu Leu Val Val Asn 100 105 110 Ala His Gly Tyr Asp Gly Phe Val Ala Leu Ala Gly Cys Asp Lys Thr 115 120 125 Pro Pro Gly Met Met Met Ala Met Ala Arg Leu Asn Ile Pro Ser Ile 130 135 140 Ile Met Tyr Gly Gly Thr Thr Leu Pro Gly Asn Phe Lys Gly Lys Pro 145 150 155 160 Ile Thr Ile Gln Asp Val 
Tyr Glu Ala Val Gly Ala Tyr Ser Lys Gly 165 170 175 Lys Ile Thr Ala Glu Asp Leu Arg Leu Met Glu Asp Asn Ala Ile Pro 180 185 190 Gly Pro Gly Thr Cys Gly Gly Leu Tyr Thr Ala Asn Thr Met Gly Leu 195 200 205 Met Thr Glu Ala Leu Gly Leu Ala Leu Pro Gly Ser Ala Ser Pro Pro 210 215 220 Ala Val Asp Ser Ala Arg Val Lys Tyr Ala Tyr Glu Thr Gly Lys Ala 225 230 235 240 Leu Met Asn Leu Ile Glu Ile Gly Leu Lys Pro Arg Asp Ile Leu Thr 245 250 255 Phe Glu Ala Phe Glu Asn Ala Ile Thr Val Leu Met Ala Ser Gly Gly 260 265 270 Ser Thr Asn Ala Val Leu His Leu Leu Ala Ile Ala Tyr Glu Ala Gly 275 280 285 Val Lys Leu Thr Leu Asp Asp Phe Asp Arg Ile Ser Gln Arg Thr Pro 290 295 300 Glu Ile Val Asn Met Lys Pro Gly Gly Glu Tyr Ala Met Tyr Asp Leu 305 310 315 320 His Arg Val Gly Gly Ala Pro Leu Ile Met Lys Lys Leu Leu Glu Ala 325 330 335 Asp Leu Leu His Gly Asp Val Ile Thr Val Thr Gly Lys Thr Val Lys 340 345 350 Gln Asn Leu Glu Glu Tyr Lys Leu Pro Asn Val Pro His Glu His Ile 355 360 365 Val Arg Pro Ile Ser Asn Pro Phe Asn Pro Thr Gly Gly Ile Arg Ile 370 375 380 Leu Lys Gly Ser Leu Ala Pro Glu Gly Ala Val Ile Lys Val Ser Ala 385 390 395 400 Thr Lys Val Arg Tyr His Lys Gly Pro Ala Arg Val Phe Asn Ser Glu 405 410 415 Glu Glu Ala Phe Lys Ala Val Leu Glu Glu Lys Ile Gln Glu Asn Asp 420 425 430 Val Val Val Ile Arg Tyr Glu Gly Pro Lys Gly Gly Pro Gly Met Arg 435 440 445 Glu Met Leu Ala Val Thr Ser Ala Ile Val Gly Gln Gly Leu Gly Glu 450 455 460 Lys Val Ala Leu Ile Thr Asp Gly Arg Phe Ser Gly Ala Thr Arg Gly 465 470 475 480 Ile Met Val Gly His Val Ala Pro Glu Ala Ala Val Gly Gly Pro Ile 485 490 495 Ala Leu Leu Arg Asp Gly Asp Thr Ile Ile Ile Asp Ala Asn Asn Gly 500 505 510 Arg Leu Asp Val Asp Leu Pro Gln Glu Glu Leu Lys Lys Arg Ala Asp 515 520 525 Glu Trp Thr Pro Pro Pro Pro Lys Tyr Lys Ser Gly Leu Leu Ala Gln 530 535 540 Tyr Ala Arg Leu Val Ser Ser Ser Ser Leu Gly Ala Val Leu Leu Thr 545 550 555 560 26566PRTArtificial SequenceSaccharomyces cerevisiae ILV3deltaN 26Met Lys Lys Leu Asn Lys Tyr Ser Tyr 
Ile Ile Thr Glu Pro Lys Gly 1 5 10 15 Gln Gly Ala Ser Gln Ala Met Leu Tyr Ala Thr Gly Phe Lys Lys Glu 20 25 30 Asp Phe Lys Lys Pro Gln Val Gly Val Gly Ser Cys Trp Trp Ser Gly 35 40 45 Asn Pro Cys Asn Met His Leu Leu Asp Leu Asn Asn Arg Cys Ser Gln 50 55 60 Ser Ile Glu Lys Ala Gly Leu Lys Ala Met Gln Phe Asn Thr Ile Gly 65 70 75 80 Val Ser Asp Gly Ile Ser Met Gly Thr Lys Gly Met Arg Tyr Ser Leu 85 90 95 Gln Ser Arg Glu Ile Ile Ala Asp Ser Phe Glu Thr Ile Met Met Ala 100 105 110 Gln His Tyr Asp Ala Asn Ile Ala Ile Pro Ser Cys Asp Lys Asn Met 115 120 125 Pro Gly Val Met Met Ala Met Gly Arg His Asn Arg Pro Ser Ile Met 130 135 140 Val Tyr Gly Gly Thr Ile Leu Pro Gly His Pro Thr Cys Gly Ser Ser 145 150 155 160 Lys Ile Ser Lys Asn Ile Asp Ile Val Ser Ala Phe Gln Ser Tyr Gly 165 170 175 Glu Tyr Ile Ser Lys Gln Phe Thr Glu Glu Glu Arg Glu Asp Val Val 180 185 190 Glu His Ala Cys Pro Gly Pro Gly Ser Cys Gly Gly Met Tyr Thr Ala 195 200 205 Asn Thr Met Ala Ser Ala Ala Glu Val Leu Gly Leu Thr Ile Pro Asn 210 215 220 Ser Ser Ser Phe Pro Ala Val Ser Lys Glu Lys Leu Ala Glu Cys Asp 225 230 235 240 Asn Ile Gly Glu Tyr Ile Lys Lys Thr Met Glu Leu Gly Ile Leu Pro 245 250 255 Arg Asp Ile Leu Thr Lys Glu Ala Phe Glu Asn Ala Ile Thr Tyr Val 260 265 270 Val Ala Thr Gly Gly Ser Thr Asn Ala Val Leu His Leu Val Ala Val 275 280 285 Ala His Ser Ala Gly Val Lys Leu Ser Pro Asp Asp Phe Gln Arg Ile 290 295 300 Ser Asp Thr Thr Pro Leu Ile Gly Asp Phe Lys Pro Ser Gly Lys Tyr 305 310 315 320 Val Met Ala Asp Leu Ile Asn Val Gly Gly Thr Gln Ser Val Ile Lys 325 330 335 Tyr Leu Tyr Glu Asn Asn Met Leu His Gly Asn Thr Met Thr Val Thr 340 345 350 Gly Asp Thr Leu Ala Glu Arg Ala Lys Lys Ala Pro Ser Leu Pro Glu 355 360 365 Gly Gln Glu Ile Ile Lys Pro Leu Ser His Pro Ile Lys Ala Asn Gly 370 375 380 His Leu Gln Ile Leu Tyr Gly Ser Leu Ala Pro Gly Gly Ala Val Gly 385 390 395 400 Lys Ile Thr Gly Lys Glu Gly Thr Tyr Phe Lys Gly Arg Ala Arg Val 405 410 415 Phe Glu Glu Glu Gly Ala Phe Ile Glu Ala Leu Glu Arg Gly 
Glu Ile 420 425 430 Lys Lys Gly Glu Lys Thr Val Val Val Ile Arg Tyr Glu Gly Pro Arg 435 440 445 Gly Ala Pro Gly Met Pro Glu Met Leu Lys Pro Ser Ser Ala Leu Met 450 455 460 Gly Tyr Gly Leu Gly Lys Asp Val Ala Leu Leu Thr Asp Gly Arg Phe 465 470 475 480 Ser Gly Gly Ser His Gly Phe Leu Ile Gly His Ile Val Pro Glu Ala 485 490 495 Ala Glu Gly Gly Pro Ile Gly Leu Val Arg Asp Gly Asp Glu Ile Ile 500 505 510 Ile Asp Ala Asp Asn Asn Lys Ile Asp Leu Leu Val Ser Asp Lys Glu 515 520 525 Met Ala Gln Arg Lys Gln Ser Trp Val Ala Pro Pro Pro Arg Tyr Thr 530 535 540 Arg Gly Thr Leu Ser Lys Tyr Ala Lys Leu Val Ser Asn Ala Ser Asn 545 550 555 560 Gly Cys Val Leu Asp Ala 565 2711PRTArtificial SequenceConserved DHAD motif 27Pro Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Ile Leu 1 5 10 2811PRTArtificial SequenceConserved DHAD motif 28Pro Ile Lys Xaa Xaa Gly Xaa Xaa Xaa Ile Leu 1 5 10 2936DNAArtificial SequencePrimer 387 29gtcacagtcg acatggctaa ctacttcaat acactg 363033DNAArtificial SequencePrimer 388 30gcataaggat ccttaacccg caacagcaat acg 333136DNAArtificial SequencePrimer 410 31gactttgtcg acatgcttta cccagaaaaa tttcag 363241DNAArtificial SequencePrimer 411 32ctaatagcgg ccgcctattt atggaatttc ttatcataat c 413335DNAArtificial SequencePrimer 637 33ttttgagctc gccgatccca ttaccgacat ttggg 353496DNAArtificial SequencePrimer 638 34aaagtcgaca ccgatatacc tgtatgtgtc accaccaatg tatctataag tatccatgct 60agccctaggt ttatgtgatg attgattgat tgattg 963536DNAArtificial SequencePrimer 697 35gagtacggat ccctagagag ctttcgtttt catgag 363637DNAArtificial SequencePrimer 767 36caagaagtcg acatgttgac aaaagcaaca aaagaac 373732DNAArtificial SequencePrimer 1149 37cgcttactcg agatgggccg cgatgaattc gc 323833DNAArtificial SequencePrimer 1150 38gcataaagat ctttaacccg caacagcaat acg 333935DNAArtificial SequencePrimer 1151 39agacgtgtcg acatgactgg catgactgat gcaga 354034DNAArtificial SequencePrimer 1152 40gtttagggat cctcatccac ccaacttcga tttg 344131DNAArtificial SequencePrimer 1006 41gtagaagacg tcacctggta gaccaaagat g 314232DNAArtificial 
SequencePrimer 1009 42catcgtgacg tcgctcaatt gactgctgct ac 324332DNAArtificial SequencePrimer 1016 43actaagcgac acgtgcggtt tctgtggtat ag 324436DNAArtificial SequencePrimer 1017 44gaaaccgcac gtgtcgctta gtttacattt ctttcc 36451647DNAArtificial SequenceL. lactis kivD (codon optimized for E. coli) in pGV1590 45atgtatactg ttggtgatta tctgctggac cgtctgcatg aactgggtat cgaagaaatc 60ttcggcgttc cgggtgatta caatctgcag ttcctggatc agatcatctc tcataaagac 120atgaaatggg tgggtaacgc taacgaactg aacgcaagct acatggcaga tggttatgca 180cgtaccaaga aagccgcggc atttctgacc actttcggtg ttggcgaact gagcgccgtc 240aacggtctgg cgggctccta cgccgaaaac ctgccggtgg tggagatcgt aggcagccca 300acgagcaaag ttcagaacga aggtaaattc gtccaccaca ctctggctga cggcgatttc 360aaacacttca tgaaaatgca tgaacctgtg actgcggcac gtacgctgct gactgcagag 420aacgctactg tggaaatcga ccgcgttctg tctgcgctgc tgaaagaacg caaaccagtt 480tacatcaacc tgcctgtgga tgttgcggca gctaaagcgg aaaaaccgag cctgccgctg 540aagaaagaaa actccacttc taacactagc gaccaggaaa tcctgaacaa aatccaggag 600tctctgaaaa acgcaaagaa accaatcgtg atcaccggcc acgaaatcat ttcttttggt 660ctggagaaga ccgtgaccca attcatcagc aaaaccaaac tgccgattac caccctgaac 720ttcggcaagt cctctgttga cgaggctctg ccgtctttcc tgggcatcta caacggtact 780ctgagcgaac cgaacctgaa agaatttgtt gaatctgcgg acttcatcct gatgctgggc 840gttaaactga ccgactcttc taccggtgca ttcactcacc atctgaacga aaacaaaatg 900attagcctga acatcgacga gggtaaaatc ttcaacgagc gtatccagaa cttcgacttc 960gaaagcctga tcagctctct gctggacctg tccgaaatcg agtataaagg caaatacatt 1020gacaaaaagc aagaagattt cgtaccatct aacgcactgc tgtcccagga tcgcctgtgg 1080caggccgtgg agaacctgac ccagagcaat gaaaccatcg tggcggaaca aggtacgagc 1140tttttcggcg cgtcttctat ctttctgaaa tccaaaagcc attttatcgg tcagccgctg 1200tggggtagca ttggctatac tttcccggca gcgctgggct ctcagatcgc tgataaagaa 1260tctcgtcatc tgctgttcat cggtgacggt tccctgcagc tgaccgtaca ggaactgggt 1320ctggcaattc gtgaaaagat caacccgatt tgcttcatta ttaacaatga cggctacacc 1380gttgagcgtg agatccacgg tccgaaccag tcttacaacg atatccctat gtggaactac 1440tctaaactgc cggagtcctt 
cggcgcaact gaggaccgtg ttgtgtctaa aattgtgcgt 1500accgaaaacg aatttgtgag cgtgatgaaa gaggcccagg ccgatccgaa ccgtatgtac 1560tggatcgaac tgatcctggc gaaagaaggc gcaccgaagg tactgaagaa aatgggcaag 1620ctgtttgctg aacagaataa atcctaa 1647461086DNAArtificial SequenceS. cerevisiae ADH7 in pGV1590 46atgctttacc cagaaaaatt tcagggcatc ggtatttcca acgcaaagga ttggaagcat 60cctaaattag tgagttttga cccaaaaccc tttggcgatc atgacgttga tgttgaaatt 120gaagcctgtg gtatctgcgg atctgatttt catatagccg ttggtaattg gggtccagtc 180ccagaaaatc aaatccttgg acatgaaata attggccgcg tggtgaaggt tggatccaag 240tgccacactg gggtaaaaat cggtgaccgt gttggtgttg gtgcccaagc cttggcgtgt 300tttgagtgtg aacgttgcaa aagtgacaac gagcaatact gtaccaatga ccacgttttg 360actatgtgga ctccttacaa ggacggctac atttcacaag gaggctttgc ctcccacgtg 420aggcttcatg aacactttgc tattcaaata ccagaaaata ttccaagtcc gctagccgct 480ccattattgt gtggtggtat tacagttttc tctccactac taagaaatgg ctgtggtcca 540ggtaagaggg taggtattgt tggcatcggt ggtattgggc atatggggat tctgttggct 600aaagctatgg gagccgaggt ttatgcgttt tcgcgaggcc actccaagcg ggaggattct 660atgaaactcg gtgctgatca ctatattgct atgttggagg ataaaggctg gacagaacaa 720tactctaacg ctttggacct tcttgtcgtt tgctcatcat ctttgtcgaa agttaatttt 780gacagtatcg ttaagattat gaagattgga ggctccatcg tttcaattgc tgctcctgaa 840gttaatgaaa agcttgtttt aaaaccgttg ggcctaatgg gagtatcaat ctcaagcagt 900gctatcggat ctaggaagga aatcgaacaa ctattgaaat tagtttccga aaagaatgtc 960aaaatatggg tggaaaaact tccgatcagc gaagaaggcg tcagccatgc ctttacaagg 1020atggaaagcg gagacgtcaa atacagattt actttggtcg attatgataa gaaattccat 1080aaatag 1086471716DNAArtificial SequenceB. 
subtilis alsS in pGV1726 47atgttgacaa aagcaacaaa agaacaaaaa tcccttgtga aaagcagagg ggcggagctt 60gttgttgatt gcttagcgga gcaaggtgtc acacatgtat ttggcattcc aggtgcaaaa 120attgatgcgg tatttgacgc tttacaagat aaagggcctg aaattatcgt tgcccggcat 180gaacaaaatg cagcatttat ggcgcaagca gtcggccgtt taactggaaa accgggagtc 240gtgttagtca catcaggacc aggtgcttcg aacttggcaa caggactgct gacagcaaac 300actgaaggtg accctgtcgt tgcgcttgct gggaacgtga tccgtgcaga tcgtttaaaa 360cggacacatc aatctttgga taatgcggcg ctattccagc cgattacaaa atacagtgta 420gaagttcaag atgtaaaaaa tataccggaa gctgttacaa atgcgtttag gatagcgtca 480gcagggcagg ctggggccgc ttttgtgagt tttccgcaag atgttgtgaa tgaagtcaca 540aatacaaaaa acgtacgtgc tgtcgcagcg ccaaaacttg gtcccgcagc agatgacgca 600atcagtatgg ccattgcaaa aattcaaaca gcaaaacttc ctgtcgtttt agtcggcatg 660aagggcggaa gaccggaagc gattaaagcg gttcgcaagc tattgaaaaa agtgcagctt 720ccattcgttg aaacatatca agctgccggt actcttacga gagatttaga ggatcagtat 780tttggccgga tcggtttatt ccgcaaccag cctggcgatc tgctgcttga gcaggctgat 840gttgttctga caatcggcta tgacccaatt gaatatgatc cgaaattctg gaatgtcaat 900ggagaccgga cgatcatcca tttagacgag attctggctg acattgatca tgcttaccag 960ccggatcttg aactgatcgg tgatattcca tctacgatca atcatatcga acacgatgct 1020gtgaaagtag actttgcgga acgtgagcag aagatccttt ctgatttaaa acaatatatg 1080catgagggtg agcaggtgcc tgcagattgg aaatcagaca gagtgcatcc
tcttgaaatc 1140gttaaagaat tgcgaaacgc agtcgatgat catgttacag tgacttgcga tatcggttca 1200cacgcgattt ggatgtcacg ttatttccgc agctacgagc cgttaacatt aatgattagt 1260aacggtatgc aaacactcgg cgttgcgctt ccttgggcaa tcggcgcttc attggtgaaa 1320ccgggagaaa aagtagtatc agtctccggt gatggcggtt tcttattctc agctatggaa 1380ttagagacag cagttcgttt aaaagcacca attgtacaca ttgtatggaa cgacagcaca 1440tatgacatgg ttgcattcca gcaattgaaa aaatataatc gtacatctgc ggtcgatttc 1500ggaaatatcg atatcgtgaa atacgcggaa agcttcggag caactggctt acgcgtagaa 1560tcaccagacc agctggcaga tgttctgcgt caaggcatga acgctgaggg gcctgtcatc 1620attgatgtcc cggttgacta cagtgataac gttaatttag caagtgacaa gcttccgaaa 1680gaattcgggg aactcatgaa aacgaaagct ctctag 1716481410DNAArtificial SequenceE. coli ilvCdeltaN in pGV1727 48atgggccgcg atgaattcgc cgatggcgcg agctaccttc agggtaaaaa agtagtcatc 60gtcggctgtg gcgcacaggg tctgaaccag ggcctgaaca tgcgtgattc tggtctcgat 120atctcctacg ctctgcgtaa agaagcgatt gccgagaagc gcgcgtcctg gcgtaaagcg 180accgaaaatg gttttaaagt gggtacttac gaagaactga tcccacaggc ggatctggtg 240attaacctga cgccggacaa gcagcactct gatgtagtgc gcaccgtaca gccactgatg 300aaagacggcg cggcgctggg ctactcgcac ggtttcaaca tcgtcgaagt gggcgagcag 360atccgtaaag atatcaccgt agtgatggtt gcgccgaaat gcccaggcac cgaagtgcgt 420gaagagtaca aacgtgggtt cggcgtaccg acgctgattg ccgttcaccc ggaaaacgat 480ccgaaaggcg aaggcatggc gattgccaaa gcctgggcgg ctgcaaccgg tggtcaccgt 540gcgggtgtgc tggaatcgtc cttcgttgcg gaagtgaaat ctgacctgat gggcgagcaa 600accatcctgt gcggtatgtt gcaggctggc tctctgctgt gcttcgacaa gctggtggaa 660gaaggtaccg atccagcata cgcagaaaaa ctgattcagt tcggttggga aaccatcacc 720gaagcactga aacagggcgg catcaccctg atgatggacc gtctctctaa cccggcgaaa 780ctgcgtgctt atgcgctttc tgaacagctg aaagagatca tggcacccct gttccagaaa 840catatggacg acatcatctc cggcgaattc tcttccggta tgatggcgga ctgggccaac 900gatgataaga aactgctgac ctggcgtgaa gagaccggca aaaccgcgtt tgaaaccgcg 960ccgcagtatg aaggcaaaat cggcgagcag gagtacttcg ataaaggcgt actgatgatt 1020gcgatggtga aagcgggcgt tgaactggcg ttcgaaacca tggtcgattc cggcatcatt 
1080gaagagtctg catattatga atcactgcac gagctgccgc tgattgccaa caccatcgcc 1140cgtaagcgtc tgtacgaaat gaacgtggtt atctctgata ccgctgagta cggtaactat 1200ctgttctctt acgcttgtgt gccgttgctg aaaccgttta tggcagagct gcaaccgggc 1260gacctgggta aagctattcc ggaaggcgcg gtagataacg ggcaactgcg tgatgtgaac 1320gaagcgattc gcagccatgc gattgagcag gtaggtaaga aactgcgcgg ctatatgaca 1380gatatgaaac gtattgctgt tgcgggttaa 1410491782DNAArtificial SequenceE. coli ilvDdeltaN (codon optimized for K. lactis) in pGV1727 49atgactggca tgactgatgc agatttcgga aagccaatca ttgccgtcgt caactctttt 60acacaattcg ttccgggtca tgtccatttg cgtgatctag gtaagcttgt tgccgaacaa 120attgaagctg caggtggtgt cgcaaaagag tttaatacta ttgctgtgga cgacggtata 180gctatggggc atggcggtat gttatactct ttaccatcga gagaattaat tgcagactca 240gtcgaatata tggttaatgc tcattgtgcc gatgcaatgg tttgtatctc taattgtgat 300aagataacgc ctggtatgtt gatggcgtcc ttgagattga acatcccagt aatcttcgta 360tctggcggcc caatggaggc tggtaaaact aagttaagtg atcagatcat caaacttgat 420cttgtggatg caatgattca aggtgcagat ccaaaagttt cagactcgca gtcagaccaa 480gttgaaagaa gtgcatgtcc aacttgtggt tcttgcagtg gaatgttcac ggctaactct 540atgaattgct tgactgaagc tctaggttta tctcaaccag gaaatggttc attattagcg 600acccatgcag acagaaagca attgttctta aatgccggaa aaagaattgt ggaactaacg 660aaaaggtatt acgaacaaaa tgatgaatca gcattaccga ggaatatagc ttcaaaggct 720gcattcgaaa atgccatgac attggatatt gcaatgggtg gtagtacaaa cacggtctta 780catcttctag ctgcagccca agaagctgag atagatttca ccatgtctga tatcgacaag 840ctttcacgta aggttccaca gttatgtaag gttgcaccat caactcaaaa gtatcacatg 900gaagacgttc atcgtgcagg aggggttatt ggtattttag gggagttgga cagagccggt 960cttttaaaca gggatgtgaa gaatgtattg ggtttaacac ttccacagac attagagcaa 1020tacgatgtca tgttaactca agatgatgcc gtgaaaaaca tgttcagggc aggtccagca 1080gggatcagaa ccacccaagc attctcgcaa gactgtaggt gggacacttt ggacgatgat 1140agagcaaatg gatgtataag atcgcttgag catgcttata gtaaggatgg tggtttagca 1200gtattatatg gaaacttcgc tgaaaatggt tgcattgtga aaactgctgg tgtagatgat 1260agtattttga aatttactgg acccgctaaa gtttacgaaa gtcaagacga 
tgctgttgag 1320gctatacttg gcggaaaggt ggtagcagga gacgtggtag tgataagata tgagggacca 1380aagggaggac caggtatgca ggaaatgctt tacccaactt catttttgaa gtccatggga 1440ctaggaaaag cttgtgccct tatcactgac ggtagattct ctggtggcac ttcgggttta 1500agtatcggtc acgtatcacc agaggcagct tctggtggtt cgattggatt gattgaagat 1560ggagatttga tcgccataga tatcccaaat agaggtatcc aattacaagt ctcagacgct 1620gaattggctg caagaagaga agcacaagat gccagaggag ataaggcttg gactcctaaa 1680aatagagaac gtcaagtaag tttcgccctt agggcttatg cttcattggc tacttcagcc 1740gataaggggg cagtaagaga caaatcgaag ttgggtggat ga 17825039DNAArtificial SequencePrimer 575 50ttttgaattc tggttctatc gaggagaaaa agcgacaag 395135DNAArtificial SequencePrimer 576 51ttttggatcc ggatgtgaag tcgttgacac agtcg 355222DNAArtificial SequencePrimer 1623 52gtctctgata aggaaatggc tc 225360DNAArtificial SequencePrimer 1886 53tcaagaagcc tcaagtcggg gttggttcct gttggtggtc cggtaaccca tgtaacatgc 605460DNAArtificial SequencePrimer 1887 54cggtaaccca tgtaacatgc atctattgga cttgaataac attctggttc tatcgaggag 605560DNAArtificial SequencePrimer 1888 55ctttcgttaa caagcccatc tctacttttt tcttggctgt atccggatgt gaagtcgttg 605660DNAArtificial SequencePrimer 1889 56gatgggcttg ttaacgaaag ttgctacatc tagacaattc tgcattatag gccccaatcg 605720DNAArtificial SequencePrimer 1890 57ttagtggcag caaagcagag 205820DNAArtificial SequencePrimer 1892 58acatgatgcc cgttcacaac 205920DNAArtificial SequencePrimer 1916 59caggatgaca gttcgatgag 206020DNAArtificial SequencePrimer 1917 60tgtcaacgac ttcacatccg 206120DNAArtificial SequencePrimer 1920 61tgcagcctag ctttgaagac 206220DNAArtificial SequencePrimer 1921 62tacgttagga ccccagtatc 206367DNAArtificial SequencePrimer 271 63ctagcatgga acaaaaactc atctcagaag aagatggtgt cgacgaattc ccgggatccg 60cggccgc 676467DNAArtificial SequencePrimer 272 64tcgagcggcc gcggatcccg ggaattcgtc gacaccatct tcttctgaga tgagtttttg 60ttccatg 676534DNAArtificial SequencePrimer 421 65gccaacggat cctcaagcat ctaaaacaca accg 346636DNAArtificial SequencePrimer 551 66gctcatgtcg acatgaagaa gctcaacaag tactcg 
366735DNAArtificial SequencePrimer 1617 67cgttgagtcg acatgggctt gttaacgaaa gttgc 356834DNAArtificial SequencePrimer 1618 68gccaacggat cctcaagcat ctaaaacaca accg 34696654DNAArtificial SequencepGV1730 69caggcaagtg cacaaacaat acttaaataa atactactca gtaataacct atttcttagc 60atttttgacg aaatttgcta ttttgttaga gtcttttaca ccatttgtct ccacacctcc 120gcttacatca acaccaataa cgccatttaa tctaagcgca tcaccaacat tttctggcgt 180cagtccacca gctaacataa aatgtaagct ttcggggctc tcttgccttc caacccagtc 240agaaatcgag ttccaatcca aaagttcacc tgtcccacct gcttctgaat caaacaaggg 300aataaacgaa tgaggtttct gtgaagctgc actgagtagt atgttgcagt cttttggaaa 360tacgagtctt ttaataactg gcaaaccgag gaactcttgg tattcttgcc acgactcatc 420tccatgcagt tggacgatat caatgccgta atcattgacc agagccaaaa catcctcctt 480aggttgatta cgaaacacgc caaccaagta tttcggagtg cctgaactat ttttatatgc 540ttttacaaga cttgaaattt tccttgcaat aaccgggtca attgttctct ttctattggg 600cacacatata atacccagca agtcagcatc ggaatctaga gcacattctg cggcctctgt 660gctctgcaag ccgcaaactt tcaccaatgg accagaacta cctgtgaaat taataacaga 720catactccaa gctgcctttg tgtgcttaat cacgtatact cacgtgctca atagtcacca 780atgccctccc tcttggccct ctccttttct tttttcgacc gaattaattc ttaatcggca 840aaaaaagaaa agctccggat caagattgta cgtaaggtga caagctattt ttcaataaag 900aatatcttcc actactgcca tctggcgtca taactgcaaa gtacacatat attacgatgc 960tgtctattaa atgcttccta tattatatat atagtaatgt cgttgacgtc gccggcagga 1020gagtgaaaga gccttgttta tatatttttt tttcctatgt tcaacgagga cagctaggtt 1080tatgcaaaaa tgtgccatca ccataagctg attcaaatga gctaaaaaaa aaatagttag 1140aaaataaggt ggtgttgaac gatagcaagt agatcaagac accgtctaac agaaaaaggg 1200gcagcggaca atattatgca attatgaaga aaagtactca aagggtcgga aaaatattca 1260aacgatattt gcattaaatc ctcaattgat tgattattcc atagtaaaat accgtaacaa 1320cacaaaattg ttctcaaatt cataaattat tcattttttc cacgagcctc atcacacgaa 1380aagtcagaag agcatacata atcttttaaa tgcataggtt atgcattttg caaatgccac 1440caggcaacaa aaatatgcgt ttagcgggcg gaatcgggaa ggaagccgga accaccaaaa 1500actggaagct acgtttttaa ggaaggtatg ggtgcagtgt gcttatctca agaaatatta 
1560gttatgatat aaggtgttga agtttagaga taggtaaata aacgcggggt gtgtttatta 1620catgaagaag aagttagttt ctgccttgct tgtttatctt gcacatcaca tcagcggaac 1680atatgctcac ccagtcgcga catccaattt atagaaatca gcttgtgggt attgttcaga 1740gaatttttca atcattggag caatcatttt acatggaccg caccaagtgg cgtagaaatc 1800tacgacaact agcttgtctt gagcaattgc agagtcgaat tcgctggcag ttttgaattg 1860agtaaccatt atttgtatcg aggtgtctag tcttctatta cactaatgca gtttcagggt 1920tttggaaacc acactgttta aacagtgttc cttaatcaag gatacctctt tttttttcct 1980tggttccact aattcatcgg tttttttttt ggaagacatc ttttccaacg aaaagaatat 2040acatatcgtt taagagaaat tctccaaatt tgtaaagaag cggacccaga cttaagccta 2100accaggccaa ttcaacagac tgtcggcaac ttcttgtctg gtctttccat ggtaagtgac 2160agtgcagtaa taatatgaac caatttattt ttcgttacat aaaaatgctt ataaaacttt 2220aactaataat tagagattaa atcgcggccg cggatcccta gagagctttc gttttcatga 2280gttccccgaa ttctttcgga agcttgtcac ttgctaaatt aacgttatca ctgtagtcaa 2340ccgggacatc aatgatgaca ggcccctcag cgttcatgcc ttgacgcaga acatctgcca 2400gctggtctgg tgattctacg cgtaagccag ttgctccgaa gctttccgcg tatttcacga 2460tatcgatatt tccgaaatcg accgcagatg tacgattata ttttttcaat tgctggaatg 2520caaccatgtc atatgtgctg tcgttccata caatgtgtac aattggtgct tttaaacgaa 2580ctgctgtctc taattccata gctgagaata agaaaccgcc atcaccggag actgatacta 2640ctttttctcc cggtttcacc aatgaagcgc cgattgccca aggaagcgca acgccgagtg 2700tttgcatacc gttactaatc attaatgtta acggctcgta gctgcggaaa taacgtgaca 2760tccaaatcgc gtgtgaaccg atatcgcaag tcactgtaac atgatcatcg actgcgtttc 2820gcaattcttt aacgatttca agaggatgca ctctgtctga tttccaatct gcaggcacct 2880gctcaccctc atgcatatat tgttttaaat cagaaaggat cttctgctca cgttccgcaa 2940agtctacttt cacagcatcg tgttcgatat gattgatcgt agatggaata tcaccgatca 3000gttcaagatc cggctggtaa gcatgatcaa tgtcagccag aatctcgtct aaatggatga 3060tcgtccggtc tccattgaca ttccagaatt tcggatcata ttcaattggg tcatagccga 3120ttgtcagaac aacatcagcc tgctcaagca gcagatcgcc aggctggttg cggaataaac 3180cgatccggcc aaaatactga tcctctaaat ctctcgtaag agtaccggca gcttgatatg 3240tttcaacgaa tggaagctgc acttttttca 
atagcttgcg aaccgcttta atcgcttccg 3300gtcttccgcc cttcatgccg actaaaacga caggaagttt tgctgtttga atttttgcaa 3360tggccatact gattgcgtca tctgctgcgg gaccaagttt tggcgctgcg acagcacgta 3420cgttttttgt atttgtgact tcattcacaa catcttgcgg aaaactcaca aaagcggccc 3480cagcctgccc tgctgacgct atcctaaacg catttgtaac agcttccggt atatttttta 3540catcttgaac ttctacactg tattttgtaa tcggctggaa tagcgccgca ttatccaaag 3600attgatgtgt ccgttttaaa cgatctgcac ggatcacgtt cccagcaagc gcaacgacag 3660ggtcaccttc agtgtttgct gtcagcagtc ctgttgccaa gttcgaagca cctggtcctg 3720atgtgactaa cacgactccc ggttttccag ttaaacggcc gactgcttgc gccataaatg 3780ctgcattttg ttcatgccgg gcaacgataa tttcaggccc tttatcttgt aaagcgtcaa 3840ataccgcatc aatttttgca cctggaatgc caaatacatg tgtgacacct tgctccgcta 3900agcaatcaac aacaagctcc gcccctctgc ttttcacaag ggatttttgt tcttttgttg 3960cttttgtcaa catgtcgact ttatgtgatg attgattgat tgattgtaca gtttgttttt 4020cttaatatct atttcgatga cttctatatg atattgcact aacaagaaga tattataatg 4080caattgatac aagacaagga gttatttgct tctcttttat atgattctga caatccatat 4140tgcgttggta gtcttttttg ctggaacggt tcagcggaaa agacgcatcg ctctttttgc 4200ttctagaaga aatgccagca aaagaatctc ttgacagtga ctgacagcaa aaatgtcttt 4260ttctaactag taacaaggct aagatatcag cctgaaataa agggtggtga agtaataatt 4320aaatcatccg tataaaccta tacacatata tgaggaaaaa taatacaaaa gtgttttaaa 4380tacagataca tacatgaaca tatgcacgta tagcgcccaa atgtcggtaa tgggatcggc 4440gagctccagc ttttgttccc tttagtgagg gttaattgcg cgcttggcgt aatcatggtc 4500atagctgttt cctgtgtgaa attgttatcc gctcacaatt ccacacaaca taggagccgg 4560aagcataaag tgtaaagcct ggggtgccta atgagtgagg taactcacat taattgcgtt 4620gcgctcactg cccgctttcc agtcgggaaa cctgtcgtgc cagctgcatt aatgaatcgg 4680ccaacgcgcg gggagaggcg gtttgcgtat tgggcgctct tccgcttcct cgctcactga 4740ctcgctgcgc tcggtcgttc ggctgcggcg agcggtatca gctcactcaa aggcggtaat 4800acggttatcc acagaatcag gggataacgc aggaaagaac atgtgagcaa aaggccagca 4860aaaggccagg aaccgtaaaa aggccgcgtt gctggcgttt ttccataggc tccgcccccc 4920tgacgagcat cacaaaaatc gacgctcaag tcagaggtgg cgaaacccga caggactata 
4980aagataccag gcgtttcccc ctggaagctc cctcgtgcgc tctcctgttc cgaccctgcc 5040gcttaccgga tacctgtccg cctttctccc ttcgggaagc gtggcgcttt ctcatagctc 5100acgctgtagg tatctcagtt cggtgtaggt cgttcgctcc aagctgggct gtgtgcacga 5160accccccgtt cagcccgacc gctgcgcctt atccggtaac tatcgtcttg agtccaaccc 5220ggtaagacac gacttatcgc cactggcagc agccactggt aacaggatta gcagagcgag 5280gtatgtaggc ggtgctacag agttcttgaa gtggtggcct aactacggct acactagaag 5340gacagtattt ggtatctgcg ctctgctgaa gccagttacc ttcggaaaaa gagttggtag 5400ctcttgatcc ggcaaacaaa ccaccgctgg tagcggtggt ttttttgttt gcaagcagca 5460gattacgcgc agaaaaaaag gatctcaaga agatcctttg atcttttcta cggggtctga 5520cgctcagtgg aacgaaaact cacgttaagg gattttggtc atgagattat caaaaaggat 5580cttcacctag atccttttaa attaaaaatg aagttttaaa tcaatctaaa gtatatatga 5640gtaaacttgg tctgacagtt accaatgctt aatcagtgag gcacctatct cagcgatctg 5700tctatttcgt tcatccatag ttgcctgact ccccgtcgtg tagataacta cgatacggga 5760gggcttacca tctggcccca gtgctgcaat gataccgcga gacccacgct caccggctcc 5820agatttatca gcaataaacc agccagccgg aagggccgag cgcagaagtg gtcctgcaac 5880tttatccgcc tccatccagt ctattaattg ttgccgggaa gctagagtaa gtagttcgcc 5940agttaatagt ttgcgcaacg ttgttgccat tgctacaggc atcgtggtgt cacgctcgtc 6000gtttggtatg gcttcattca gctccggttc ccaacgatca aggcgagtta catgatcccc 6060catgttgtgc aaaaaagcgg ttagctcctt cggtcctccg atcgttgtca gaagtaagtt 6120ggccgcagtg ttatcactca tggttatggc agcactgcat aattctctta ctgtcatgcc 6180atccgtaaga tgcttttctg tgactggtga gtactcaacc aagtcattct gagaatagtg 6240tatgcggcga ccgagttgct cttgcccggc gtcaatacgg gataataccg cgccacatag 6300cagaacttta aaagtgctca tcattggaaa acgttcttcg gggcgaaaac tctcaaggat 6360cttaccgctg ttgagatcca gttcgatgta acccactcgt gcacccaact gatcttcagc 6420atcttttact ttcaccagcg tttctgggtg agcaaaaaca ggaaggcaaa atgccgcaaa 6480aaagggaata agggcgacac ggaaatgttg aatactcata ctcttccttt ttcaatatta 6540ttgaagcatt tatcagggtt attgtctcat gagcggatac atatttgaat gtatttagaa 6600aaataaacaa ataggggttc cgcgcacatt tccccgaaaa gtgccacctg acgt 6654701728DNAArtificial SequenceB. 
subtilis alsS in pGV1730 70gtcgacatgt tgactaaagc tacaaaagag cagaaatcat tggtgaaaaa taggggtgca 60gaacttgttg tggactgttt ggtagaacag ggcgtaacac atgtttttgg tatcccaggt 120gcaaaaatcg acgccgtgtt tgatgcatta caagacaagg gtccagaaat tattgttgct 180agacatgagc aaaatgccgc atttatggcg caagctgtag gtaggcttac aggtaaacct 240ggtgttgtcc tagttacgtc tggcccagga gcctccaatt tagcaactgg tctattgaca 300gctaatactg agggagatcc tgtagttgcg ttagccggta atgtaattag agctgatagg 360cttaagagaa ctcaccagtc tctagacaac gctgctttat tccaaccgat caccaagtac 420tcagtagagg tacaagacgt aaagaatata cctgaagctg tgacaaacgc atttcgtata 480gcttctgctg gtcaggctgg tgccgcgttt gtttcttttc ctcaagacgt tgtcaatgaa 540gtgaccaata ctaaaaacgt tagagcggtt gcagccccta aactaggtcc agccgcagac 600gacgcaatta gcgctgcaat tgctaaaatt cagacggcga aactaccagt agtccttgtc 660ggtatgaagg gcggaagacc agaagcaata aaagctgttc gtaagttatt gaagaaagtc 720caattacctt tcgttgagac ttaccaagca gcaggtactt tatctagaga tttagaggat 780cagtattttg gaaggatagg tctatttaga aaccaaccag gagatttact attagaacaa 840gctgatgttg tacttactat cggttatgat cctatagagt atgacccaaa gttttggaac 900ataaatgggg atagaacaat tatacatcta gacgagataa tcgccgacat cgatcacgct 960tatcaaccag atttagaact aatcggagat atcccgtcaa caatcaatca tattgaacat 1020gatgctgtaa aggttgagtt cgctgaacgt gagcagaaaa tcttatctga tctaaagcaa 1080tatatgcatg agggtgaaca agttccagca gactggaaat ctgaccgtgc acatcctttg 1140gaaatcgtta aggaactaag aaatgcggtc gatgatcatg tgactgttac atgtgatatc 1200ggttcacatg caatttggat gtcacgttat tttaggagct acgaaccatt aactttaatg 1260atatctaacg ggatgcaaac tctgggggtt gcacttcctt gggctattgg cgctagttta 1320gttaagcccg gtgagaaggt ggtatcggta tcaggtgatg gtggctttct gttttcggct 1380atggaattag aaactgcagt ccgtttaaaa gctcccattg tgcatattgt ctggaatgat 1440tctacttacg acatggttgc ttttcaacag ttgaagaaat acaatagaac ttcggctgta 1500gactttggta acatcgatat tgtgaaatat gctgagtctt ttggcgcaac aggcctgagg 1560gtggaaagtc cagatcagtt agctgatgtg ttgagacaag ggatgaatgc cgagggaccg 1620gtaatcatag atgtgccagt tgactactca gacaatatta atttggcttc tgataaactt 1680cctaaagagt ttggcgagct 
aatgaagacc aaagccttat aaggatcc 1728711698DNATrichoderma atraviride 71gtcgacatga caaaggatac cgttgacatt ttgattgatt ctttaaaagc agcaggtgta 60aaatatgttt tcggcgttcc gggagcgaaa attgactccg tgtttaatgc cctaatcgat 120catccagaca tcaagttagt tgtatgtaga cacgaacaaa acgccgcctt tatcgcagca 180gctatgggta aggttaccgg tagacctggt gtctgcatcg ctacaagtgg gcctgggact 240tctaatttgg ttacaggcct ggttacagcg accgacgaag gggcgccggt tgttgctata 300gtgggttcag ttaaacgtag tcaatcatta caaagaactc atcagtcgct aaggggagcc 360gacctgttgg ctcccgttac caagaaggtg gtaagtgccg ttgtcgaaga tcaagttgcc 420gaaatcatgt tggatgcatt tcgtgttgca gctgcttccc ctccaggcgc taccgctgtg 480tctcttccca tcgatctgat gacgccagcc aaatctactt ctaccgttac ggccttccca 540gctgaatgtt tcatacctcc aaaatacggc aaaagccctg aaactacatt acaagccgca 600gccgatttga taagcgccgc caaagctcca
gttctattct tagggatgcg tgttagcgag 660tctgacgata caattagcgc agtacacggt tttcttcgta agcatcctgt tccagttgtg 720gaaacctttc aagctgcagg cgcgatttcc aaagagctag tgcacttatt ttatggtaga 780atcggtttat tttctaatca accgggtgat caattgctac aacatgcgga cctagtaata 840gcgatcggct tagatcaagc tgagtatgac gctaatatgt ggaacgccag aggcacaaca 900attttacatg tcgatataca accagcggac tttgttgctc attataaacc taagatcgag 960ctggtcggtt cactagcaga caacatgaca gatttgactt ctaggttgga tacggtcgct 1020aggctacaat taacgaaacc tggtgaagcc attagaacca acatgtggga atggcaaaat 1080tccccggaag cctccggtag atcaacgggt cctgttcatc cattgcactt tattagacta 1140tttcaatcca ttattgaccc gagcaccact gtaattagtg atgtaggtag tgtgtatatc 1200tggttgtgca gatacttcta ctcttacgct cgtagaactt tcctgatgag taacgtgcag 1260caaacacttg gagtcgctat gccttgggcg ataggggtat ctttatctca gacgccacct 1320agtagtaaga aagttgtatc cattagcggt gatggtggtt ttatgttctc ttcacaagag 1380ttggtgacag ctgttcaaca aggttgcaac atcactcatt ttatatggaa cgatggaaaa 1440tataacatgg tggaatttca agaagttaat aagtatggta ggtcatccgg cgtggatcta 1500ggtggagtgg attttgtaaa gttagctgat agtatgggag ccaagggttt aagagtatca 1560agtgctggcg atcttgaagc cgtaatgaag gaagcattag catacgacgg tgtatgtttg 1620gttgacatag aaattgacta ctctcaaaac cataacttaa tgatggattt ggtaacatcc 1680gatgtatctt aaggatcc 1698721720DNATalaromyces stipitatus 72agtcgacatg tctaacagga acccttctca cgtgatagtg gagtcattat ctaatgccgg 60cgttaagata gttttcggga taccaggtgc aaaggtcgat ggtatctttg atgcattgtc 120agatcatcct actatcaagt tgattgtgtg tagacatgaa cagaacgctg cctttatggc 180agccgcagtt ggacgtctta ctggcgcccc gggtgtctgc ttagtaacga gtgggcctgg 240aacttctaat ttggtaaccg gtttagctac tgccactaca gaaggtgatc ctgttttagc 300aatagctgga acagtctcta gattgcaagc agctaggcat actcatcaaa gtttagatgt 360taacaaagta ttagaagggg tctgtaagag tgtaatacaa gtcggggtgg aagatcaagt 420gagtgaagta atcgctaatg cttttagaca tgcgaggcaa ttcccacaag gagccaccgc 480agttgcgctg ccaatggata taataaaatc tacttccgtg ggtgtgccac cttttccatc 540tctatcattc gaggcaccag gttatggtag ttccaatacg aaactttgta aagtagcggt 600cgataaacta attgcggcga 
aatatccagt gatactgctg ggaatgagat cctcagaccc 660tgagattgta gcttcagtcc gtcgtatgat aaaagatcat accttgcctg tagttgaaac 720ttttcaagct gcgggagcca tctcagaaga tttgcttcat agatactatg ggagggtggg 780tttattccgt aatcaacctg gtgacaaagt actagcaaga gcagacctga ttattgcagt 840tggctacgat ccatacgaat atgatgcaga aacatggaat gtcaataatc cagcaaccat 900acacaacatt attcacattg attacacaca ttccagggtg tcacaacact atatgcctca 960tgttgagcta ctgggaaacc cagcggatat cgtcgatgaa ttgacggcca gtttacaggc 1020cctaaaacca aacttttggt ctggggctga agatacctta gaaaatatta ggcaagaaat 1080agctcgttgt gaagccactg ccactcatac tgaatctttg caagatggcg cggttcagcc 1140tactcacttc gtatatcaat tgaggcatct gttaccaaag gaaactattg ttgctgttga 1200tgtaggaacc gtctatatct acatgatgag atacttccaa acctattcac cgagacactt 1260gctgtgtagt aatggacaac aaactttggg agttggtttg ccttgggcta tagctgcttc 1320actaattcaa gaacctcctt gcagtaggaa ggttgtctct atatctggtg atggcgggtt 1380tatgtttagt agccaagaac tggctacggc agtcttgcaa aagtgtaaca taacccattt 1440tatttggaat gacagcggct acaacatggt tgaatttcaa gaggaggcta agtatggtcg 1500tagctctggt ataaaactag gcggtattga tttcgtcaaa tttgcagagg ctttcgacgg 1560tgcgcgtgga ttccgtataa acagcaccaa agaagttaag gaggtcatta aagaggcact 1620agcctttgaa ggcgttgcta tagttgatgt cagaatcgat tattctagga gtcatgaatt 1680aatgaaagat attattccaa aggactacca ataaggatcc 1720731635DNAArtificial SequencePiromyces ilvD with predicted MTS removed 73atgggtagtc aagcgatgtt aatcgcaact ggtataaaac cagaagattt aaaaaagcca 60cagatcggca taggcagtgt ttggtatgat ggaaatccat gcaacatgca tctattggat 120cttggctccg tggtaaaaaa ggccgttcaa aaacaaaata tgaatggtat gagattcaat 180atgattggag tgtcagacgg gatctccaac ggtacggatg gaatgtcctt ttctttgcag 240tcccgtgaaa ttattgcgga ttctatcgaa acaatcatgt ctgcacaata ttatgatgct 300aacatcagct tacctggctg cgacaagaac atgcctggtt gtttaatcgc cgctgccaga 360ttgaacagac cgactataat tatctacggt ggcacgatca agcccggaca tacaaaaaag 420ggagagacga ttgatttagt ctcggccttc caatgttatg ggcaatactt ggctggagaa 480attactgaag agcaaagaga agaaatagtg aataatgcat gtcctggcgc aggtgcatgc 540ggtggaatgt atacagctaa 
tacaatggct tccataatcg aatcaatggg tatgagttta 600ccttactccg cctcgacccc ggcagaagac ccattgaaag agcttgaatg tataaacgcg 660gcagctgcaa ttaagaattt aatggaaaaa gacatcaagc cattagacat aatgacaaga 720aaagcgtttg agaacgctat aactattact ttgattcttg gagggagtac aaactccgtt 780ctgcaccttt tggctatcgc tagggcctgc aaagtcccat taactattga cgatttccag 840gaattttcta ataggatacc cgttttagcc gacttaaaac ctagtggtaa atatgtcatg 900gaagatttgc agttgatcgg cggtcttcca gctattcaga aatatcttct gaatgaaggt 960ctacttcatg gtgatattat gactgttacc ggaaagaccc tagcagagaa tttgaaagac 1020gttgctccaa tcgattttga aactcaagat ataattagac ctttatcgaa tcccattaaa 1080aagaatggtc acattatcat tatgaaaggt aacgtctctc cggacggtgg tgttgctaaa 1140attacaggta agcagggatt gtttttcgaa ggcgtggcga attgctttga ttgtgaagaa 1200gacatgttag ctgcactgga aagaggcgaa attaaaaaag gtcaagtgat tataataagg 1260tatgaaggcc ccactggagg gcctggtatg ccggagatgc taactccgac cagtgctatt 1320atgggtgctg ggttaggaaa agatgtagca ctattaacag atggcagatt ttcaggcggg 1380tcacacggct tcattattgg tcatattacg cctgaggcac aagtaggtgg tccaattgcc 1440ctaatcaaaa acggtgataa gataactata gacgcgaata aacgtaccat acatgcccat 1500gtcagcgaag aagaatttgc taaaagacgt gccgagtgga aagcaccacc ttacagagct 1560actcaaggta ctttaaagaa atacattaag ctggttaaac ccgcaaactt tggatgtgtt 1620accgatgagt ggtaa 16357424DNAArtificial SequencePrimer 351 74cttcttgctc attagaaaga aagc 247521DNAArtificial SequencePrimer 1625 75caaggttacg gtcaaggttt g 217622DNAArtificial SequencePrimer 1626 76cattggttcc ggttacgttt ac 227746DNAArtificial SequencePrimer 1615 77caactcgcgg ccgcggatcc taggttattg gttttctggt ctcaac 467832DNAArtificial SequencePrimer 1616 78cgccgactcg agatgttgag aactcaagcc gc 327944DNAArtificial SequencePrimer 1809 79cgccgactcg aggtcgacat gggtttgaag caaatcaact tcgg 44807564DNAArtificial SequencepGV1354 80tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat gcagctcccg gagacggtca 60cagcttgtct gtaagcggat gccgggagca gacaagcccg tcagggcgcg tcagcgggtg 120ttggcgggtg tcggggctgg cttaactatg cggcatcaga gcagattgta ctgagagtgc 180accataaacg acattactat atatataata 
taggaagcat ttaatagaca gcatcgtaat 240atatgtgtac tttgcagtta tgacgccaga tggcagtagt ggaagatatt ctttattgaa 300aaatagcttg tcaccttacg tacaatcttg atccggagct tttctttttt tgccgattaa 360gaattaattc ggtcgaaaaa agaaaaggag agggccaaga gggagggcat tggtgactat 420tgagcacgtg agtatacgtg attaagcaca caaaggcagc ttggagtatg tctgttatta 480atttcacagg tagttctggt ccattggtga aagtttgcgg cttgcagagc acagaggccg 540cagaatgtgc tctagattcc gatgctgact tgctgggtat tatatgtgtg cccaatagaa 600agagaacaat tgacccggtt attgcaagga aaatttcaag tcttgtaaaa gcatataaaa 660atagttcagg cactccgaaa tacttggttg gcgtgtttcg taatcaacct aaggaggatg 720ttttggctct ggtcaatgat tacggcattg atatcgtcca actgcatgga gatgagtcgt 780ggcaagaata ccaagagttc ctcggtttgc cagttattaa aagactcgta tttccaaaag 840actgcaacat actactcagt gcagcttcac agaaacctca ttcgtttatt cccttgtttg 900attcagaagc aggtgggaca ggtgaacttt tggattggaa ctcgatttct gactgggttg 960gaaggcaaga gagccccgaa agcttacatt ttatgttagc tggtggactg acgccagaaa 1020atgttggtga tgcgcttaga ttaaatggcg ttattggtgt tgatgtaagc ggaggtgtgg 1080agacaaatgg tgtaaaagac tctaacaaaa tagcaaattt cgtcaaaaat gctaagaaat 1140aggttattac tgagtagtat ttatttaagt attgtttgtg cacttgccta tgcggtgtga 1200aataccgcac agatgcgtaa ggagaaaata ccgcatcagg aaattgtaaa cgttaatatt 1260ttgttaaaat tcgcgttaaa tttttgttaa atcagctcat tttttaacca ataggccgaa 1320atcggcaaaa tcccttataa atcaaaagaa tagaccgaga tagggttgag tgttgttcca 1380gtttggaaca agagtccact attaaagaac gtggactcca acgtcaaagg gcgaaaaacc 1440gtctatcagg gcgatggccc actacgtgaa ccatcaccct aatcaagttt tttggggtcg 1500aggtgccgta aagcactaaa tcggaaccct aaagggagcc cccgatttag agcttgacgg 1560ggaaagccgg cgaacgtggc gagaaaggaa gggaagaaag cgaaaggagc gggcgctagg 1620gcgctggcaa gtgtagcggt cacgctgcgc gtaaccacca cacccgccgc gcttaatgcg 1680ccgctacagg gcgcgtcgcg ccattcgcca ttcaggctgc gcaactgttg ggaagggcga 1740tcggtgcggg cctcttcgct attacgccag ctggcgaaag ggggatgtgc tgcaaggcga 1800ttaagttggg taacgccagg gttttcccag tcacgacgtt gtaaaacgac ggccagtgag 1860cgcgcgtaat acgactcact atagggcgaa ttgggtaccg gccgcaaatt aaagccttcg 1920agcgtcccaa 
aaccttctca agcaaggttt tcagtataat gttacatgcg tacacgcgtc 1980tgtacagaaa aaaaagaaaa atttgaaata taaataacgt tcttaatact aacataacta 2040taaaaaaata aatagggacc tagacttcag gttgtctaac tccttccttt tcggttagag 2100cggatgtggg gggagggcgt gaatgtaagc gtgacataac taattacatg actcgagcgg 2160ccgcggatcc ttattggttt tctggtctca actttctgac ttccttacca accttccaga 2220tttccatgtt tctgatggtg tctaattcct tttctagctt ttctctgtag tcaggttgag 2280agttgaattc caaagatctc ttggtttcgg taccgttctt ggtagattcg tacaagtctt 2340ggaaaacagg cttcaaagca ttcttgaaga ttgggtacca gtccaaagca cctcttctgg 2400cggtggtgga acaagcatcg tacatgtaat ccataccgta cttaccgatc aatgggtata 2460gagattgggt agcttcttcg acggtttcgt tgaaagcttc agatggggag tgaccgtttt 2520ctctcaagac gtcgtattga gccaagaaca taccgtggat accacccatt aaacaacctc 2580tttcaccgta caagtcagag ttgacttctc tttcgaaagt ggtttggtaa acgtaaccgg 2640aaccaatggc aacggccaaa gcttgggcct tttcgtgagc cttaccggtg acatcgttcc 2700agacggcgta agaagagtta ataccacgac cttccttgaa caaagatctg acagttctac 2760cggaaccctt tggagcaacc aagataacat ctaagtcctt tggtggttca acgtgagtca 2820agtccttgaa gactggggag aaaccgtggg agaagtacaa agtcttaccc ttggtcaaca 2880atggcttgat agcaggccag gtttctgatt gagcggcatc ggacaacaag ttcataacgt 2940aactacctct cttgatagca tcttcaacag tgaacaagtt cttgcctgga acccaaccgt 3000cttcgatggc agccttccaa gaagcaccat ctttacggac accaatgata acgttcaaac 3060cgttgtctct caagttcaaa ccttgaccgt aaccttggga accgtaaccg atcaaagcaa 3120aagtgtcgtt cttgaagtag tccaacaact tttctcttgg ccagtcagct ctttcgtaga 3180cggtttcaac agtaccaccg aagttgattt gcttcaacat gtcgacacca tcttcttctg 3240agatgagttt ttgttccatg ctagttctag aatccgtcga aactaagttc tggtgtttta 3300aaactaaaaa aaagactaac tataaaagta gaatttaaga agtttaagaa atagatttac 3360agaattacaa tcaataccta ccgtctttat atacttatta gtcaagtagg ggaataattt 3420cagggaactg gtttcaacct tttttttcag ctttttccaa atcagagaga gcagaaggta 3480atagaaggtg taagaaaatg agatagatac atgcgtgggt caattgcctt gtgtcatcat 3540ttactccagg caggttgcat cactccattg aggttgtgcc cgttttttgc ctgtttgtgc 3600ccctgttctc tgtagttgcg ctaagagaat ggacctatga 
actgatggtt ggtgaagaaa 3660acaatatttt ggtgctggga ttcttttttt ttctggatgc cagcttaaaa agcgggctcc 3720attatattta gtggatgcca ggaataaact gttcacccag acacctacga tgttatatat 3780tctgtgtaac ccgcccccta ttttgggcat gtacgggtta cagcagaatt aaaaggctaa 3840ttttttgact aaataaagtt aggaaaatca ctactattaa ttatttacgt attctttgaa 3900atggcgagta ttgataatga taaactgagc tagatctggg cccgagctcc agcttttgtt 3960ccctttagtg agggttaatt gcgcgcttgg cgtaatcatg gtcatagctg tttcctgtgt 4020gaaattgtta tccgctcaca attccacaca acataggagc cggaagcata aagtgtaaag 4080cctggggtgc ctaatgagtg aggtaactca cattaattgc gttgcgctca ctgcccgctt 4140tccagtcggg aaacctgtcg tgccagctgc attaatgaat cggccaacgc gcggggagag 4200gcggtttgcg tattgggcgc tcttccgctt cctcgctcac tgactcgctg cgctcggtcg 4260ttcggctgcg gcgagcggta tcagctcact caaaggcggt aatacggtta tccacagaat 4320caggggataa cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta 4380aaaaggccgc gttgctggcg tttttccata ggctccgccc ccctgacgag catcacaaaa 4440atcgacgctc aagtcagagg tggcgaaacc cgacaggact ataaagatac caggcgtttc 4500cccctggaag ctccctcgtg cgctctcctg ttccgaccct gccgcttacc ggatacctgt 4560ccgcctttct cccttcggga agcgtggcgc tttctcatag ctcacgctgt aggtatctca 4620gttcggtgta ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc gttcagcccg 4680accgctgcgc cttatccggt aactatcgtc ttgagtccaa cccggtaaga cacgacttat 4740cgccactggc agcagccact ggtaacagga ttagcagagc gaggtatgta ggcggtgcta 4800cagagttctt gaagtggtgg cctaactacg gctacactag aaggacagta tttggtatct 4860gcgctctgct gaagccagtt accttcggaa aaagagttgg tagctcttga tccggcaaac 4920aaaccaccgc tggtagcggt ggtttttttg tttgcaagca gcagattacg cgcagaaaaa 4980aaggatctca agaagatcct ttgatctttt ctacggggtc tgacgctcag tggaacgaaa 5040actcacgtta agggattttg gtcatgagat tatcaaaaag gatcttcacc tagatccttt 5100taaattaaaa atgaagtttt aaatcaatct aaagtatata tgagtaaact tggtctgaca 5160gttaccaatg cttaatcagt gaggcaccta tctcagcgat ctgtctattt cgttcatcca 5220tagttgcctg actccccgtc gtgtagataa ctacgatacg ggagggctta ccatctggcc 5280ccagtgctgc aatgataccg cgagacccac gctcaccggc tccagattta tcagcaataa 5340accagccagc 
cggaagggcc gagcgcagaa gtggtcctgc aactttatcc gcctccatcc 5400agtctattaa ttgttgccgg gaagctagag taagtagttc gccagttaat agtttgcgca 5460acgttgttgc cattgctaca ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat 5520tcagctccgg ttcccaacga tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag 5580cggttagctc cttcggtcct ccgatcgttg tcagaagtaa gttggccgca gtgttatcac 5640tcatggttat ggcagcactg cataattctc ttactgtcat gccatccgta agatgctttt 5700ctgtgactgg tgagtactca accaagtcat tctgagaata gtgtatgcgg cgaccgagtt 5760gctcttgccc ggcgtcaata cgggataata ccgcgccaca tagcagaact ttaaaagtgc 5820tcatcattgg aaaacgttct tcggggcgaa aactctcaag gatcttaccg ctgttgagat 5880ccagttcgat gtaacccact cgtgcaccca actgatcttc agcatctttt actttcacca 5940gcgtttctgg gtgagcaaaa acaggaaggc aaaatgccgc aaaaaaggga ataagggcga 6000cacggaaatg ttgaatactc atactcttcc tttttcaata ttattgaagc atttatcagg 6060gttattgtct catgagcgga tacatatttg aatgtattta gaaaaataaa caaatagggg 6120ttccgcgcac atttccccga aaagtgccac ctgaacgaag catctgtgct tcattttgta 6180gaacaaaaat gcaacgcgag agcgctaatt tttcaaacaa agaatctgag ctgcattttt 6240acagaacaga aatgcaacgc gaaagcgcta ttttaccaac gaagaatctg tgcttcattt 6300ttgtaaaaca aaaatgcaac gcgagagcgc taatttttca aacaaagaat ctgagctgca 6360tttttacaga acagaaatgc aacgcgagag cgctatttta ccaacaaaga atctatactt 6420cttttttgtt ctacaaaaat gcatcccgag agcgctattt ttctaacaaa gcatcttaga 6480ttactttttt tctcctttgt gcgctctata atgcagtctc ttgataactt tttgcactgt 6540aggtccgtta aggttagaag aaggctactt tggtgtctat tttctcttcc ataaaaaaag 6600cctgactcca cttcccgcgt ttactgatta ctagcgaagc tgcgggtgca ttttttcaag 6660ataaaggcat ccccgattat attctatacc gatgtggatt gcgcatactt tgtgaacaga 6720aagtgatagc gttgatgatt cttcattggt cagaaaatta tgaacggttt cttctatttt 6780gtctctatat actacgtata ggaaatgttt acattttcgt attgttttcg attcactcta 6840tgaatagttc ttactacaat ttttttgtct aaagagtaat actagagata aacataaaaa 6900atgtagaggt cgagtttaga tgcaagttca aggagcgaaa ggtggatggg taggttatat 6960agggatatag cacagagata tatagcaaag agatactttt gagcaatgtt tgtggaagcg 7020gtattcgcaa tattttagta gctcgttaca gtccggtgcg 
tttttggttt tttgaaagtg 7080cgtcttcaga gcgcttttgg ttttcaaaag cgctctgaag ttcctatact ttctagagaa 7140taggaacttc ggaataggaa cttcaaagcg tttccgaaaa cgagcgcttc cgaaaatgca 7200acgcgagctg cgcacataca gctcactgtt cacgtcgcac ctatatctgc gtgttgcctg 7260tatatatata tacatgagaa gaacggcata gtgcgtgttt atgcttaaat gcgtacttat 7320atgcgtctat ttatgtagga tgaaaggtag tctagtacct cctgtgatat tatcccattc 7380catgcggggt atcgtatgct tccttcagca ctacccttta gctgttctat atgctgccac 7440tcctcaattg gattagtctc atccttcaat gctatcattt cctttgatat tggatcatat 7500taagaaacca ttattatcat gacattaacc tataaaaata ggcgtatcac gaggcccttt 7560cgtc 7564817955DNAArtificial SequencepGV1662 81ttggatcata ctaagaaacc attattatca tgacattaac ctataaaaat aggcgtatca 60cgaggccctt tcgtctcgcg cgtttcggtg atgacggtga aaacctctga cacatgcagc 120tcccggagac ggtcacagct tgtctgtaag cggatgccgg gagcagacaa gcccgtcagg 180gcgcgtcagc gggtgttggc gggtgtcggg gctggcttaa ctatgcggca tcagagcaga 240ttgtactgag agtgcaccat accacagctt ttcaattcaa ttcatcattt tttttttatt 300cttttttttg atttcggttt ctttgaaatt tttttgattc ggtaatctcc gaacagaagg 360aagaacgaag gaaggagcac agacttagat tggtatatat acgcatatgt agtgttgaag 420aaacatgaaa ttgcccagta ttcttaaccc aactgcacag aacaaaaacc tgcaggaaac 480gaagataaat catgtcgaaa gctacatata aggaacgtgc tgctactcat cctagtcctg 540ttgctgccaa gctatttaat atcatgcacg aaaagcaaac aaacttgtgt gcttcattgg 600atgttcgtac caccaaggaa ttactggagt tagttgaagc attaggtccc aaaatttgtt 660tactaaaaac acatgtggat atcttgactg atttttccat ggagggcaca gttaagccgc 720taaaggcatt atccgccaag tacaattttt tactcttcga agacagaaaa tttgctgaca 780ttggtaatac agtcaaattg cagtactctg cgggtgtata cagaatagca gaatgggcag 840acattacgaa tgcacacggt gtggtgggcc caggtattgt tagcggtttg aagcaggcgg 900cagaagaagt aacaaaggaa cctagaggcc ttttgatgtt agcagaattg tcatgcaagg 960gctccctatc tactggagaa tatactaagg gtactgttga cattgcgaag agcgacaaag 1020attttgttat cggctttatt gctcaaagag acatgggtgg aagagatgaa ggttacgatt 1080ggttgattat gacacccggt gtgggtttag atgacaaggg agacgcattg ggtcaacagt 1140atagaaccgt ggatgatgtg gtctctacag gatctgacat 
tattattgtt ggaagaggac 1200tatttgcaaa gggaagggat gctaaggtag agggtgaacg ttacagaaaa gcaggctggg 1260aagcatattt gagaagatgc ggccagcaaa actaaaaaac tgtattataa gtaaatgcat 1320gtatactaaa ctcacaaatt agagcttcaa tttaattata tcagttatta ccctatgcgg 1380tgtgaaatac cgcacagatg cgtaaggaga aaataccgca tcaggaaatt gtaaacgtta 1440atattttgtt aaaattcgcg ttaaattttt gttaaatcag ctcatttttt aaccaatagg 1500ccgaaatcgg caaaatccct tataaatcaa aagaatagac cgagataggg ttgagtgttg 1560ttccagtttg gaacaagagt ccactattaa agaacgtgga ctccaacgtc aaagggcgaa 1620aaaccgtcta tcagggcgat ggcccactac gtgaaccatc accctaatca agttttttgg 1680ggtcgaggtg ccgtaaagca ctaaatcgga accctaaagg gagcccccga tttagagctt 1740gacggggaaa gccggcgaac gtggcgagaa aggaagggaa gaaagcgaaa ggagcgggcg 1800ctagggcgct ggcaagtgta gcggtcacgc tgcgcgtaac caccacaccc gccgcgctta 1860atgcgccgct acagggcgcg tcgcgccatt cgccattcag gctgcgcaac tgttgggaag 1920ggcgatcggt gcgggcctct tcgctattac gccagctggc gaaaggggga tgtgctgcaa 1980ggcgattaag ttgggtaacg ccagggtttt cccagtcacg acgttgtaaa acgacggcca 2040gtgagcgcgc gtaatacgac tcactatagg gcgaattggg taccggccgc aaattaaagc 2100cttcgagcgt cccaaaacct tctcaagcaa ggttttcagt ataatgttac atgcgtacac 2160gcgtctgtac
agaaaaaaaa gaaaaatttg aaatataaat aacgttctta atactaacat 2220aactataaaa aaataaatag ggacctagac ttcaggttgt ctaactcctt ccttttcggt 2280tagagcggat gtggggggag ggcgtgaatg taagcgtgac ataactaatt acatgactcg 2340agcggccgcg gatccttagg atttattctg ttcagcaaac agcttgccca ttttcttcag 2400taccttcggt gcgccttctt tcgccaggat cagttcgatc cagtacatac ggttcggatc 2460ggcctgggcc tctttcatca cgctcacaaa ttcgttttcg gtacgcacaa ttttagacac 2520aacacggtcc tcagttgcgc cgaaggactc cggcagttta gagtagttcc acatagggat 2580atcgttgtaa gactggttcg gaccgtggat ctcacgctca acggtgtagc cgtcattgtt 2640aataatgaag caaatcgggt tgatcttttc acgaattgcc agacccagtt cctgtacggt 2700cagctgcagg gaaccgtcac cgatgaacag cagatgacga gattctttat cagcgatctg 2760agagcccagc gctgccggga aagtatagcc aatgctaccc cacagcggct gaccgataaa 2820atggcttttg gatttcagaa agatagaaga cgcgccgaaa aagctcgtac cttgttccgc 2880cacgatggtt tcattgctct gggtcaggtt ctccacggcc tgccacaggc gatcctggga 2940cagcagtgcg ttagatggta cgaaatcttc ttgctttttg tcaatgtatt tgcctttata 3000ctcgatttcg gacaggtcca gcagagagct gatcaggctt tcgaagtcga agttctggat 3060acgctcgttg aagattttac cctcgtcgat gttcaggcta atcattttgt tttcgttcag 3120atggtgagtg aatgcaccgg tagaagagtc ggtcagttta acgcccagca tcaggatgaa 3180gtccgcagat tcaacaaatt ctttcaggtt cggttcgctc agagtaccgt tgtagatgcc 3240caggaaagac ggcagagcct cgtcaacaga ggacttgccg aagttcaggg tggtaatcgg 3300cagtttggtt ttgctgatga attgggtcac ggtcttctcc agaccaaaag aaatgatttc 3360gtggccggtg atcacgattg gtttctttgc gtttttcaga gactcctgga ttttgttcag 3420gatttcctgg tcgctagtgt tagaagtgga gttttctttc ttcagcggca ggctcggttt 3480ttccgcttta gctgccgcaa catccacagg caggttgatg taaactggtt tgcgttcttt 3540cagcagcgca gacagaacgc ggtcgatttc cacagtagcg ttctctgcag tcagcagcgt 3600acgtgccgca gtcacaggtt catgcatttt catgaagtgt ttgaaatcgc cgtcagccag 3660agtgtggtgg acgaatttac cttcgttctg aactttgctc gttgggctgc ctacgatctc 3720caccaccggc aggttttcgg cgtaggagcc cgccagaccg ttgacggcgc tcagttcgcc 3780aacaccgaaa gtggtcagaa atgccgcggc tttcttggta cgtgcataac catctgccat 3840gtagcttgcg ttcagttcgt tagcgttacc cacccatttc 
atgtctttat gagagatgat 3900ctgatccagg aactgcagat tgtaatcacc cggaacgccg aagatttctt cgatacccag 3960ttcatgcaga cggtccagca gataatcacc aacagtatac atgtcgacaa acttagatta 4020gattgctatg ctttctttct aatgagcaag aagtaaaaaa agttgtaata gaacaagaaa 4080aatgaaactg aaacttgaga aattgaagac cgtttattaa cttaaatatc aatgggaggt 4140catcgaaaga gaaaaaaatc aaaaaaaaaa ttttcaagaa aaagaaacgt gataaaaatt 4200tttattgcct ttttcgacga agaaaaagaa acgaggcggt ctcttttttc ttttccaaac 4260ctttagtacg ggtaattaac gacaccctag aggaagaaag aggggaaatt tagtatgctg 4320tgcttgggtg ttttgaagtg gtacggcgat gcgcggagtc cgagaaaatc tggaagagta 4380aaaaaggagt agaaacattt tgaagctatg agctccagct tttgttccct ttagtgaggg 4440ttaattgcgc gcttggcgta atcatggtca tagctgtttc ctgtgtgaaa ttgttatccg 4500ctcacaattc cacacaacat aggagccgga agcataaagt gtaaagcctg gggtgcctaa 4560tgagtgaggt aactcacatt aattgcgttg cgctcactgc ccgctttcca gtcgggaaac 4620ctgtcgtgcc agctgcatta atgaatcggc caacgcgcgg ggagaggcgg tttgcgtatt 4680gggcgctctt ccgcttcctc gctcactgac tcgctgcgct cggtcgttcg gctgcggcga 4740gcggtatcag ctcactcaaa ggcggtaata cggttatcca cagaatcagg ggataacgca 4800ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga accgtaaaaa ggccgcgttg 4860ctggcgtttt tccataggct ccgcccccct gacgagcatc acaaaaatcg acgctcaagt 4920cagaggtggc gaaacccgac aggactataa agataccagg cgtttccccc tggaagctcc 4980ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat acctgtccgc ctttctccct 5040tcgggaagcg tggcgctttc tcatagctca cgctgtaggt atctcagttc ggtgtaggtc 5100gttcgctcca agctgggctg tgtgcacgaa ccccccgttc agcccgaccg ctgcgcctta 5160tccggtaact atcgtcttga gtccaacccg gtaagacacg acttatcgcc actggcagca 5220gccactggta acaggattag cagagcgagg tatgtaggcg gtgctacaga gttcttgaag 5280tggtggccta actacggcta cactagaagg acagtatttg gtatctgcgc tctgctgaag 5340ccagttacct tcggaaaaag agttggtagc tcttgatccg gcaaacaaac caccgctggt 5400agcggtggtt tttttgtttg caagcagcag attacgcgca gaaaaaaagg atctcaagaa 5460gatcctttga tcttttctac ggggtctgac gctcagtgga acgaaaactc acgttaaggg 5520attttggtca tgagattatc aaaaaggatc ttcacctaga tccttttaaa ttaaaaatga 5580agttttaaat 
caatctaaag tatatatgag taaacttggt ctgacagtta ccaatgctta 5640atcagtgagg cacctatctc agcgatctgt ctatttcgtt catccatagt tgcctgactc 5700cccgtcgtgt agataactac gatacgggag ggcttaccat ctggccccag tgctgcaatg 5760ataccgcgag acccacgctc accggctcca gatttatcag caataaacca gccagccgga 5820agggccgagc gcagaagtgg tcctgcaact ttatccgcct ccatccagtc tattaattgt 5880tgccgggaag ctagagtaag tagttcgcca gttaatagtt tgcgcaacgt tgttgccatt 5940gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg cttcattcag ctccggttcc 6000caacgatcaa ggcgagttac atgatccccc atgttgtgca aaaaagcggt tagctccttc 6060ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt tatcactcat ggttatggca 6120gcactgcata attctcttac tgtcatgcca tccgtaagat gcttttctgt gactggtgag 6180tactcaacca agtcattctg agaatagtgt atgcggcgac cgagttgctc ttgcccggcg 6240tcaatacggg ataataccgc gccacatagc agaactttaa aagtgctcat cattggaaaa 6300cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt tgagatccag ttcgatgtaa 6360cccactcgtg cacccaactg atcttcagca tcttttactt tcaccagcgt ttctgggtga 6420gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa gggcgacacg gaaatgttga 6480atactcatac tcttcctttt tcaatattat tgaagcattt atcagggtta ttgtctcatg 6540agcggataca tatttgaatg tatttagaaa aataaacaaa taggggttcc gcgcacattt 6600ccccgaaaag tgccacctga acgaagcatc tgtgcttcat tttgtagaac aaaaatgcaa 6660cgcgagagcg ctaatttttc aaacaaagaa tctgagctgc atttttacag aacagaaatg 6720caacgcgaaa gcgctatttt accaacgaag aatctgtgct tcatttttgt aaaacaaaaa 6780tgcaacgcga gagcgctaat ttttcaaaca aagaatctga gctgcatttt tacagaacag 6840aaatgcaacg cgagagcgct attttaccaa caaagaatct atacttcttt tttgttctac 6900aaaaatgcat cccgagagcg ctatttttct aacaaagcat cttagattac tttttttctc 6960ctttgtgcgc tctataatgc agtctcttga taactttttg cactgtaggt ccgttaaggt 7020tagaagaagg ctactttggt gtctattttc tcttccataa aaaaagcctg actccacttc 7080ccgcgtttac tgattactag cgaagctgcg ggtgcatttt ttcaagataa aggcatcccc 7140gattatattc tataccgatg tggattgcgc atactttgtg aacagaaagt gatagcgttg 7200atgattcttc attggtcaga aaattatgaa cggtttcttc tattttgtct ctatatacta 7260cgtataggaa atgtttacat tttcgtattg ttttcgattc 
actctatgaa tagttcttac 7320tacaattttt ttgtctaaag agtaatacta gagataaaca taaaaaatgt agaggtcgag 7380tttagatgca agttcaagga gcgaaaggtg gatgggtagg ttatataggg atatagcaca 7440gagatatata gcaaagagat acttttgagc aatgtttgtg gaagcggtat tcgcaatatt 7500ttagtagctc gttacagtcc ggtgcgtttt tggttttttg aaagtgcgtc ttcagagcgc 7560ttttggtttt caaaagcgct ctgaagttcc tatactttct agagaatagg aacttcggaa 7620taggaacttc aaagcgtttc cgaaaacgag cgcttccgaa aatgcaacgc gagctgcgca 7680catacagctc actgttcacg tcgcacctat atctgcgtgt tgcctgtata tatatataca 7740tgagaagaac ggcatagtgc gtgtttatgc ttaaatgcgt acttatatgc gtctatttat 7800gtaggatgaa aggtagtcta gtacctcctg tgatattatc ccattccatg cggggtatcg 7860tatgcttcct tcagcactac cctttagctg ttctatatgc tgccactcct caattggatt 7920agtctcatcc ttcaatgcta tcatttcctt tgata 7955828572DNAArtificial SequencepGV1810 82tagaaaaact catcgagcat caaatgaaac tgcaatttat tcatatcagg attatcaata 60ccatattttt gaaaaagccg tttctgtaat gaaggagaaa actcaccgag gcagttccat 120aggatggcaa gatcctggta tcggtctgcg attccgactc gtccaacatc aatacaacct 180attaatttcc cctcgtcaaa aataaggtta tcaagtgaga aatcaccatg agtgacgact 240gaatccggtg agaatggcaa aagtttatgc atttctttcc agacttgttc aacaggccag 300ccattacgct cgtcatcaaa atcactcgca tcaaccaaac cgttattcat tcgtgattgc 360gcctgagcga ggcgaaatac gcgatcgctg ttaaaaggac aattacaaac aggaatcgag 420tgcaaccggc gcaggaacac tgccagcgca tcaacaatat tttcacctga atcaggatat 480tcttctaata cctggaacgc tgtttttccg gggatcgcag tggtgagtaa ccatgcatca 540tcaggagtac ggataaaatg cttgatggtc ggaagtggca taaattccgt cagccagttt 600agtctgacca tctcatctgt aacatcattg gcaacgctac ctttgccatg tttcagaaac 660aactctggcg catcgggctt cccatacaag cgatagattg tcgcacctga ttgcccgaca 720ttatcgcgag cccatttata cccatataaa tcagcatcca tgttggaatt taatcgcggc 780ctcgacgttt cccgttgaat atggctcata ttcttccttt ttcaatatta ttgaagcatt 840tatcagggtt attgtctcat gagcggatac atatttgaat gtatttagaa aaataaacaa 900ataggggtca gtgttacaac caattaacca attctgaaca ttatcgcgag cccatttata 960cctgaatatg gctcataaca ccccttgttt gcctggcggc agtagcgcgg tggtcccacc 1020tgaccccatg 
ccgaactcag aagtgaaacg ccgtagcgcc gatggtagtg tggggactcc 1080ccatgcgaga gtagggaact gccaggcatc aaataaaacg aaaggctcag tcgaaagact 1140gggcctttcg cccgggctaa ttagggggtg tcgcccttat tcgactctat agtgaagttc 1200ctattctcta gaaagtatag gaacttctga agtggggcta gccacgaaaa acaaactaac 1260ttatgcgcat cattagatgt aagaactact aaagagctac tggagttggt tgaggcttta 1320ggtccaaaaa tttgtttgtt gaagacacat gttgacatat taacagattt ttctatggag 1380ggtaccgtta agcctctgaa agcgttaagc gcgaaatata actttctttt atttgaagac 1440cgtaagtttg ctgatattgg aaatactgtt aagttgcaat atagcgcagg agtttataga 1500attgccgaat gggctgacat tacgaatgcc cacggtgttg taggtcctgg cattgtgtct 1560ggattgaaac aagcggcaga ggaagtgact aaggaaccaa gaggtttact aatgctggcg 1620gaattatctt gcaaaggctc tctagccacc ggtgaatata caaaaggtac tgtggatatt 1680gcaaagtctg ataaggactt cgtaatcggt tttattgcac aaagagatat gggaggtcgt 1740gacgagggct acgattggtt aattatgaca ccaggcgtag gattagatga caaaggcgat 1800gcgttaggcc aacagtatcg tacagtcgat gatgtcgtaa gtaccggttc tgatatcatt 1860attgtcggga gaggtttatt tgccaagggc cgtgatgcga aagtggaggg ggaaagatat 1920aggaaggcag gttgggaggc ttacttgaga agatgtggtc agcagaatta agcggccgca 1980taacaatact gacagtacta aataattgcc tacttggctt cacatacgtt gcatacgtcg 2040atatagataa taatgataat gacagcagga ttatcgtaat acgtaatagt tgaaaatctc 2100aaaaatgtgt gggtcattac gtaaataatg ataggaatgg gattcttcta tttttccttt 2160ttccattcta gcagccgtcg ggaaaacgtg gcatcctctc tttcgggctc aattggagtc 2220acgctgccgt gagcatcctc tctttccata tctaacaact gagcacgtaa ccaatggaaa 2280agcatgagct tagcgttgct ccaaaaaagt attggatggt taataccatt tgtctgttct 2340cttctgactt tgactcctca aaaaaaaaaa atctacaatc aacagatcgc ttcaattacg 2400ccctcacaaa aacttttttc cttcttcttc gcccacgtta aattttatcc ctcatgttgt 2460ctaacggatt tctgcacttg atttattata aaaagacaaa gacataatac ttctctatca 2520atttcagtta ttgttcttcc ttgcgttatt cttctgttct tctttttctt ttgtcatata 2580taaccataac caagtaatac atattcaaac tcgagatgtt gagaactcaa gccgccagat 2640tgatctgcaa ctcccgtgtc atcactgcta agagaacctt tgctttggcc acccgtgctg 2700ctgcttacag cagaccagct gcccgtttcg ttaagccaat 
gatcactacc cgtggtttga 2760agcaaatcaa cttcggtggt actgttgaaa ccgtctacga aagagctgac tggccaagag 2820aaaagttgtt ggactacttc aagaacgaca cttttgcttt gatcggttac ggttcccaag 2880gttacggtca aggtttgaac ttgagagaca acggtttgaa cgttatcatt ggtgtccgta 2940aagatggtgc ttcttggaag gctgccatcg aagacggttg ggttccaggc aagaacttgt 3000tcactgttga agatgctatc aagagaggta gttacgttat gaacttgttg tccgatgccg 3060ctcaatcaga aacctggcct gctatcaagc cattgttgac caagggtaag actttgtact 3120tctcccacgg tttctcccca gtcttcaagg acttgactca cgttgaacca ccaaaggact 3180tagatgttat cttggttgct ccaaagggtt ccggtagaac tgtcagatct ttgttcaagg 3240aaggtcgtgg tattaactct tcttacgccg tctggaacga tgtcaccggt aaggctcacg 3300aaaaggccca agctttggcc gttgccattg gttccggtta cgtttaccaa accactttcg 3360aaagagaagt caactctgac ttgtacggtg aaagaggttg tttaatgggt ggtatccacg 3420gtatgttctt ggctcaatac gacgtcttga gagaaaacgg tcactcccca tctgaagctt 3480tcaacgaaac cgtcgaagaa gctacccaat ctctataccc attgatcggt aagtacggta 3540tggattacat gtacgatgct tgttccacca ccgccagaag aggtgctttg gactggtacc 3600caatcttcaa gaatgctttg aagcctgttt tccaagactt gtacgaatct accaagaacg 3660gtaccgaaac caagagatct ttggaattca actctcaacc tgactacaga gaaaagctag 3720aaaaggaatt agacaccatc agaaacatgg aaatctggaa ggttggtaag gaagtcagaa 3780agttgagacc agaaaaccaa taacctagga tcttgtttaa agattacgga tatttaactt 3840acttagaata atgccatttt tttgagttat aataatccta cgttagtgtg agcgggattt 3900aaactgtgag gaccttaata cattcagaca cttctgcggt atcaccctac ttattccctt 3960cgagattata tctaggaacc catcaggttg gtggaagatt acccgttcta agacttttca 4020gcttcctcta ttgatgttac acctggacac cccttttctg gcatccagtt tttaatcttc 4080agtggcatgt gagattctcc gaaattaatt aaagcaatca cacaattctc tcggatacca 4140cctcggttga aactgacagg tggtttgtta cgcatgctaa tgcaaaggag cctatatacc 4200tttggctcgg ctgctgtaac agggaatata aagggcagca taatttagga gtttagtgaa 4260cttgcaacat ttactatttt cccttcttac gtaaatattt ttctttttaa ttctaaatca 4320atctttttca attttttgtt tgtattcttt tcttgcttaa atctataact acaaaaaaca 4380catacataaa ctaaaagtcg acatgtaccc atacgatgtt ccagattacg caggtggtgg 4440tgtcgacatg 
cctaaataca gatcagctac gactacacac ggtagaaata tggccggagc 4500cagggcccta tggagagcca ccggcatgac agatgcagat tttggtaaac ctataattgc 4560tgtagttaac tcttttacac agtttgttcc aggtcatgta catctaagag acttgggcaa 4620attggtggca gaacaaatcg aggctgctgg tggtgttgca aaagaattta acactattgc 4680cgtagacgac ggcattgcga tgggtcatgg cggtatgctt tattcgctac cctccagaga 4740attaattgca gacagcgttg aatatatggt aaatgcccac tgcgcagatg ccatggtttg 4800catttccaat tgtgacaaaa tcacgccggg catgttgatg gcgtcattga gactaaatat 4860tcctgtgatc ttcgttagcg gaggtcccat ggaagccggg aaaactaaac tttccgatca 4920gataatcaag ttagacttgg tcgatgccat gatccagggt gcggacccca aagtaagcga 4980ctctcaatcc gatcaagttg aaagatccgc atgtccaact tgcgggagtt gctctgggat 5040gttcacggcg aactctatga attgcctaac agaggccctg ggcctgtcac aacctggcaa 5100cggttcgctt ttagcaactc atgctgatag aaagcaatta tttctaaatg ctggtaaaag 5160aatcgttgaa ttaacaaaaa gatattacga acaaaacgat gaatctgcac tgccaaggaa 5220cattgcttca aaggccgctt tcgaaaacgc tatgacattg gatattgcaa tgggtggaag 5280cacaaatact gtccttcatc tactggcggc tgctcaagaa gcagaaattg atttcacaat 5340gagcgatatc gacaagctat cacgtaaggt cccgcagctg tgtaaagtgg caccgtctac 5400tcaaaaatac cacatggaag atgtccatcg tgctggaggc gttatcggaa tcttggggga 5460gttggacagg gccggtctat taaacagaga tgttaagaac gtgctaggtc taactttgcc 5520tcaaacctta gagcagtacg acgttatgtt aactcaagat gacgcagtca aaaacatgtt 5580cagagcgggg ccagctggaa taaggactac ccaagcgttc tcgcaagatt gcagatggga 5640tactctggac gatgatagag ctaacggttg cataagatca ctagagcatg cttactcgaa 5700agatggaggt ttagctgttt tatacggtaa ttttgccgaa aacggatgta tagtgaagac 5760cgctggggtt gatgattcaa ttctaaaatt cactgggcca gccaaggtat acgagtcaca 5820agatgatgct gttgaagcca tcttaggtgg gaaagtggtg gcaggggacg tggtggtaat 5880aagatatgaa ggtccaaagg gtggtccagg tatgcaagaa atgctgtacc ctacttcttt 5940ccttaaatct atgggtttag gcaaggcttg tgctcttata accgatggta gattttctgg 6000aggtacatca ggcctttcca taggacatgt tagccccgaa gctgcctcag gtggtagtat 6060tggcttaatc gaggatggtg acttaattgc tattgacatt cctaacaggg gtattcaact 6120acaggttagc gatgcagaat tagccgctag aagagaggca 
caagatgcga gaggcgataa 6180agcatggaca cctaagaaca gggagagaca agtgagcttt gccctgagag cttatgcctc 6240gctggcgacg agcgcagaca aaggagccgt aagagataaa tcaaaattgg gtggttaggg 6300atccgcgatt taatctctaa ttattagtta aagttttata agcattttta tgtaacgaaa 6360aataaattgg ttcatattat tactgcactg tcacttacca tggaaagacc agacaagaag 6420ttgccgacag tctgttgaat tggcctggtt aggcttaagt ctgggtccgc ttctttacaa 6480atttggagaa tttctcttaa acgatatgta tattcttttc gttggaaaag atgtcttcca 6540aaaaaaaaac cgatgaatta gtggaaccaa ggaaaaaaaa agaggtatcc ttgattaagg 6600aacactgttt aaacagtgtg gtttccaaaa ccctgaaact gcattagtgt aatagaagac 6660tagacacctc gatacaaata atggttactc aattcaaaac tgccagcgaa ttcgactctg 6720caattgctca agacaagcta gttgtcgtag atttctacgc cacttggtgc ggtccatgta 6780aaatgattgc tccaatgatt gaaaaattct ctgaacaata cccacaagct gatttctata 6840aattggatgt cgatgaattg ggtgatgttg cacaaaagaa tgaagtttcc gctatgccaa 6900ctttgcttct attcaagaac ggtaaggaag ttgcaaaggt tgttggtgcc aacccagcgg 6960ctattaagca agccattgct gctaatgctt aaactcaccc aatgaccgat atattgtgtt 7020tctatactgt gtttgttata tatagtttac ctttaagctt aaaatgaagt gaagttccta 7080tactttctag agaataggaa cttctatagt gagtcgaata agggcgacac aaaatttatt 7140ctaaatgcat aataaatact gataacatct tatagtttgt attatatttt gtattatcgt 7200tgacatgtat aattttgata tcaaaaactg attttccctt tattattttc gagatttatt 7260ttcttaattc tctttaacaa actagaaata ttgtatatac aaaaaatcat aaataataga 7320tgaatagttt aattataggt gttcatcaat cgaaaaagca acgtatctta tttaaagtgc 7380gttgcttttt tctcatttat aaggttaaat aattctcata tatcaagcaa agtgacaggc 7440gcccttaaat attctgacaa atgctctttc cctaaactcc ccccataaaa aaacccgccg 7500aagcgggttt ttacgttatt tgcggattaa cgattactcg ttatcagaac cgcccagggg 7560gcccgagctt aagactggcc gtcgttttac aacacagaaa gagtttgtag aaacgcaaaa 7620aggccatccg tcaggggcct tctgcttagt ttgatgcctg gcagttccct actctcgcct 7680tccgcttcct cgctcactga ctcgctgcgc tcggtcgttc ggctgcggcg agcggtatca 7740gctcactcaa aggcggtaat acggttatcc acagaatcag gggataacgc aggaaagaac 7800atgtgagcaa aaggccagca aaaggccagg aaccgtaaaa aggccgcgtt gctggcgttt 7860ttccataggc 
tccgcccccc tgacgagcat cacaaaaatc gacgctcaag tcagaggtgg 7920cgaaacccga caggactata aagataccag gcgtttcccc ctggaagctc cctcgtgcgc 7980tctcctgttc cgaccctgcc gcttaccgga tacctgtccg cctttctccc ttcgggaagc 8040gtggcgcttt ctcatagctc acgctgtagg tatctcagtt cggtgtaggt cgttcgctcc 8100aagctgggct gtgtgcacga accccccgtt cagcccgacc gctgcgcctt atccggtaac 8160tatcgtcttg agtccaaccc ggtaagacac gacttatcgc cactggcagc agccactggt 8220aacaggatta gcagagcgag gtatgtaggc ggtgctacag agttcttgaa gtggtgggct 8280aactacggct acactagaag aacagtattt ggtatctgcg ctctgctgaa gccagttacc 8340ttcggaaaaa gagttggtag ctcttgatcc ggcaaacaaa ccaccgctgg tagcggtggt 8400ttttttgttt gcaagcagca gattacgcgc agaaaaaaag gatctcaaga agatcctttg 8460atcttttcta cggggtctga cgctcagtgg aacgacgcgc gcgtaactca cgttaaggga 8520ttttggtcat gagcttgcgc cgtcccgtca agtcagcgta atgctctgct tt 8572831488DNAArtificial SequenceE. coli codon optimized ilvC for expression in S. cerevisiae 83ctcgagatgg ccaactattt taacacatta aatttgagac aacaattggc tcaactgggt 60aagtgcagat ttatgggaag ggacgagttt gctgatggtg cttcttatct gcaaggaaag 120aaagtagtaa ttgttggctg cggtgctcag ggtctaaacc aaggtttaaa catgagagat 180tcaggtctgg atatttcgta tgcattgagg aaagaggcaa ttgcagaaaa gagggcctcc 240tggcgtaaag cgacggaaaa tgggttcaaa gttggtactt acgaagaact gatccctcag 300gcagatttag tgattaacct aacaccagat aagcaacact cagacgtagt aagaacagtt 360caaccgctga tgaaggatgg ggcagcttta ggttactctc atggctttaa tatcgttgaa 420gtgggcgagc agatcagaaa agatataaca gtcgtaatgg ttgcaccaaa gtgcccaggt 480acggaagtca gagaggagta caagaggggt tttggtgtac ctacattgat cgccgtacat
540cctgaaaatg accccaaagg tgaaggtatg gcaattgcga aggcatgggc agccgcaacc 600ggaggtcata gagcgggtgt gttagagagt tctttcgtag ctgaggtcaa gagtgactta 660atgggtgaac aaaccattct gtgcggaatg ttgcaggcag ggtctttact atgctttgat 720aaattggtcg aagagggtac agatcctgcc tatgctgaaa agttgataca atttggttgg 780gagacaatca ccgaggcact taaacaaggt ggcataacat tgatgatgga tagactttca 840aatccggcca agctaagagc ctacgcctta tctgagcaac taaaagagat catggcacca 900ttattccaaa agcacatgga cgatattatc tccggtgagt tttcctcagg aatgatggca 960gattgggcaa acgatgataa aaagttattg acgtggagag aagaaaccgg caagacggca 1020ttcgagacag ccccacaata cgaaggtaaa attggtgaac aagaatactt tgataaggga 1080gtattgatga tagctatggt gaaggcaggg gtagaacttg cattcgaaac tatggttgac 1140tccggtatca ttgaagaatc tgcatactat gagtctttgc atgaattgcc tttgatagca 1200aatactattg caagaaaaag actttacgag atgaatgttg tcatatcaga cactgcagaa 1260tatggtaatt acttatttag ctacgcatgt gtcccgttgt taaagccctt catggccgag 1320ttacaacctg gtgatttggg gaaggctatt ccggaaggag cggttgacaa tggccaactg 1380agagacgtaa atgaagctat tcgttcacat gctatagaac aggtgggtaa aaagctgaga 1440ggatatatga ccgatatgaa aagaattgca gtggcaggat gaagatct 14888438DNAArtificial SequencePrimer 1792 84ttttctcgag atgcagattt ttgtgaagac cctcactg 388551DNAArtificial SequencePrimer 1794 85ttttgcggcc gcggatccgt cgacacctcg caggcgcaac accaggtgca g 5186228DNAArtificial SequenceM. musculus ubiquitin gene codon-optimized for expression in S. 
cerevisiae 86atgcagattt ttgtgaagac cctcactggc aaaaccatca cccttgaggt cgagcccagt 60gacaccattg agaatgtcaa agccaaaatt caagacaagg agggtatccc acctgaccag 120cagcgtctga tatttgccgg caaacagctg gaggatggcc gcactctctc agactacaac 180atccagaaag agtccaccct gcacctggtg ttgcgcctgc gaggtgga 228871713DNALactococcus lactis 87atggagttta agtataacgg caaagttgaa tctgttgaac tgaataagta cagcaaaacg 60ttgacacaag atcccacaca acccgccaca caggcaatgt attacggcat cgggtttaaa 120gacgaagatt tcaagaaagc tcaagtgggt atagtgtcga tggactggga tggaaatcca 180tgcaacatgc atttaggaac ccttggatca aagattaaaa gctcagtaaa tcagacagat 240ggtctgatcg gcttacaatt tcatacgata ggagtttctg atgggatagc aaatggaaag 300ttgggaatga gatactccct tgtttccaga gaagttatag ctgactctat tgaaaccaac 360gctggcgctg aatactatga tgcaattgta gccatcccag gttgtgacaa aaatatgcca 420ggttctatta ttggtatggc aagacttaat aggccaagca ttatggtgta tggaggaaca 480atagaacacg gtgaatataa aggtgagaaa ttgaacatcg tatcggcttt tgaatctcta 540ggccagaaaa ttaccggcaa tatctctgat gaagattatc acggtgttat ttgtaatgct 600attcctggtc aaggggcatg tggggggatg tacacagcta ataccttagc tgccgctatc 660gaaacactag gtatgtcatt gccgtattct tcttcgaacc ctgcagtatc tcaagaaaaa 720caagaagaat gtgatgagat tggattagcc attaagaatc ttttggaaaa agacatcaag 780cctagtgata taatgactaa ggaggcgttc gagaacgcta ttaccattgt gatggtcttg 840gggggtagta ctaatgctgt cttgcatatt attgcaatgg ctaacgcgat aggtgtcgaa 900ataactcagg atgacttcca aagaattagt gacattactc cagtactagg tgattttaaa 960ccttcaggta aatatatgat ggaagatttg cataaaattg gaggcttgcc agcagtgctt 1020aagtaccttc taaaggaagg aaaattgcat ggtgactgcc ttactgtgac gggtaaaaca 1080ttagccgaga atgtcgagac tgccctagac ttggatttcg actcacaaga tatcatgagg 1140ccactaaaga atcctatcaa ggccaccggc cacttgcaga ttctgtacgg taatttagct 1200caagggggtt ccgtagcaaa aattagcggt aaagaaggag agttcttcaa aggcactgcc 1260agagtctttg atggtgaaca acattttatc gacggcatag aatctggtcg tttgcatgct 1320ggagatgtag cggtaattag gaatataggt cccgtcggcg gacctggtat gcccgaaatg 1380ctgaagccta catcagcatt aattggtgcg ggtttaggga aaagttgcgc gttaattacg 1440gatggtagat tctccggtgg 
cactcacggt tttgttgtcg gccatattgt gcctgaagcc 1500gttgagggtg gactaatcgg cttagttgaa gatgacgata taatagagat agatgcagtc 1560aacaactcta tatccctgaa agtttccgat gaagaaatcg caaagagaag agctaattat 1620cagaagccaa ctccgaaagc caccagggga gttttggcaa aattcgctaa attaacccgt 1680cctgcatcgg aagggtgtgt tactgatctg taa 1713881758DNASaccharomyces cerevisiae 88atgggcttgt taacgaaagt tgctacatct agacaattct ctacaacgag atgcgttgca 60aagaagctca acaagtactc gtatatcatc actgaaccta agggccaagg tgcgtcccag 120gccatgcttt atgccaccgg tttcaagaag gaagatttca agaagcctca agtcggggtt 180ggttcctgtt ggtggtccgg taacccatgt aacatgcatc tattggactt gaataacaga 240tgttctcaat ccattgaaaa agcgggtttg aaagctatgc agttcaacac catcggtgtt 300tcagacggta tctctatggg tactaaaggt atgagatact cgttacaaag tagagaaatc 360attgcagact cctttgaaac catcatgatg gcacaacact acgatgctaa catcgccatc 420ccatcatgtg acaaaaacat gcccggtgtc atgatggcca tgggtagaca taacagacct 480tccatcatgg tatatggtgg tactatcttg cccggtcatc caacatgtgg ttcttcgaag 540atctctaaaa acatcgatat cgtctctgcg ttccaatcct acggtgaata tatttccaag 600caattcactg aagaagaaag agaagatgtt gtggaacatg catgcccagg tcctggttct 660tgtggtggta tgtatactgc caacacaatg gcttctgccg ctgaagtgct aggtttgacc 720attccaaact cctcttcctt cccagccgtt tccaaggaga agttagctga gtgtgacaac 780attggtgaat acatcaagaa gacaatggaa ttgggtattt tacctcgtga tatcctcaca 840aaagaggctt ttgaaaacgc cattacttat gtcgttgcaa ccggtgggtc cactaatgct 900gttttgcatt tggtggctgt tgctcactct gcgggtgtca agttgtcacc agatgatttc 960caaagaatca gtgatactac accattgatc ggtgacttca aaccttctgg taaatacgtc 1020atggccgatt tgattaacgt tggtggtacc caatctgtga ttaagtatct atatgaaaac 1080aacatgttgc acggtaacac aatgactgtt accggtgaca ctttggcaga acgtgcaaag 1140aaagcaccaa gcctacctga aggacaagag attattaagc cactctccca cccaatcaag 1200gccaacggtc acttgcaaat tctgtacggt tcattggcac caggtggagc tgtgggtaaa 1260attaccggta aggaaggtac ttacttcaag ggtagagcac gtgtgttcga agaggaaggt 1320gcctttattg aagccttgga aagaggtgaa atcaagaagg gtgaaaaaac cgttgttgtt 1380atcagatatg aaggtccaag aggtgcacca ggtatgcctg aaatgctaaa gccttcctct 
1440gctctgatgg gttacggttt gggtaaagat gttgcattgt tgactgatgg tagattctct 1500ggtggttctc acgggttctt aatcggccac attgttcccg aagccgctga aggtggtcct 1560atcgggttgg tcagagacgg cgatgagatt atcattgatg ctgataataa caagattgac 1620ctattagtct ctgataagga aatggctcaa cgtaaacaaa gttgggttgc acctccacct 1680cgttacacaa gaggtactct atccaagtat gctaagttgg tttccaacgc ttccaacggt 1740tgtgttttag atgcttga 1758891701DNASaccharomyces cerevisiae 89atgaagaagc tcaacaagta ctcgtatatc atcactgaac ctaagggcca aggtgcgtcc 60caggccatgc tttatgccac cggtttcaag aaggaagatt tcaagaagcc tcaagtcggg 120gttggttcct gttggtggtc cggtaaccca tgtaacatgc atctattgga cttgaataac 180agatgttctc aatccattga aaaagcgggt ttgaaagcta tgcagttcaa caccatcggt 240gtttcagacg gtatctctat gggtactaaa ggtatgagat actcgttaca aagtagagaa 300atcattgcag actcctttga aaccatcatg atggcacaac actacgatgc taacatcgcc 360atcccatcat gtgacaaaaa catgcccggt gtcatgatgg ccatgggtag acataacaga 420ccttccatca tggtatatgg tggtactatc ttgcccggtc atccaacatg tggttcttcg 480aagatctcta aaaacatcga tatcgtctct gcgttccaat cctacggtga atatatttcc 540aagcaattca ctgaagaaga aagagaagat gttgtggaac atgcatgccc aggtcctggt 600tcttgtggtg gtatgtatac tgccaacaca atggcttctg ccgctgaagt gctaggtttg 660accattccaa actcctcttc cttcccagcc gtttccaagg agaagttagc tgagtgtgac 720aacattggtg aatacatcaa gaagacaatg gaattgggta ttttacctcg tgatatcctc 780acaaaagagg cttttgaaaa cgccattact tatgtcgttg caaccggtgg gtccactaat 840gctgttttgc atttggtggc tgttgctcac tctgcgggtg tcaagttgtc accagatgat 900ttccaaagaa tcagtgatac tacaccattg atcggtgact tcaaaccttc tggtaaatac 960gtcatggccg atttgattaa cgttggtggt acccaatctg tgattaagta tctatatgaa 1020aacaacatgt tgcacggtaa cacaatgact gttaccggtg acactttggc agaacgtgca 1080aagaaagcac caagcctacc tgaaggacaa gagattatta agccactctc ccacccaatc 1140aaggccaacg gtcacttgca aattctgtac ggttcattgg caccaggtgg agctgtgggt 1200aaaattaccg gtaaggaagg tacttacttc aagggtagag cacgtgtgtt cgaagaggaa 1260ggtgccttta ttgaagcctt ggaaagaggt gaaatcaaga agggtgaaaa aaccgttgtt 1320gttatcagat atgaaggtcc aagaggtgca ccaggtatgc ctgaaatgct 
aaagccttcc 1380tctgctctga tgggttacgg tttgggtaaa gatgttgcat tgttgactga tggtagattc 1440tctggtggtt ctcacgggtt cttaatcggc cacattgttc ccgaagccgc tgaaggtggt 1500cctatcgggt tggtcagaga cggcgatgag attatcattg atgctgataa taacaagatt 1560gacctattag tctctgataa ggaaatggct caacgtaaac aaagttgggt tgcacctcca 1620cctcgttaca caagaggtac tctatccaag tatgctaagt tggtttccaa cgcttccaac 1680ggttgtgttt tagatgcttg a 1701901689DNAGramella forsetii 90atggataaaa cagccatgaa taacaaatac tcttctacta ttacacaaag tgactcacaa 60ccagcgtcac aagcaatgct tcacgccatc ggccttaata aggaagattt gaaaaagcct 120tttgtaggca tcggcagtac cggatatgaa ggaaacccat gcaacatgca cctgaatgat 180ttggctaagg aagtgaaaaa aggcactcag aatgcagatt taaacggtct gatctttaat 240acaattggcg tcagcgatgg aatatctatg ggtactccag gtatgaggtt ctcattgcca 300tcccgtgact tgattgcaga tagcatggaa acagtagttg gtggaatgtc gtatgatggt 360ttagttaccg tagttgggtg tgataaaaac atgccaggag cattaatggc aatgttgagg 420ttaaatcgtc cgtcggtttt agtgtatggg ggaacaattg ctagtggttg ccacaatgga 480aagaagttag atgttgtgtc tgctttcgag gcctggggtt ctaaagtttc aggtgatatg 540caggaagaag aataccagca agtcattgaa aaggcatgtc ctggtgcagg tgcttgtggg 600ggtatgtaca cagccaacac catggcttca tctattgaag ccttggggat gtccttgcct 660tttaactcat ccaatcctgc aactggtccg gaaaaaactc aagaatctgt caaagctggc 720gaggctatga aatacttact agaaaatgat ctgaaaccca aagatattgt gacggccaag 780tcgctggaaa atgctattag attgctaacg gttttgggtg gtagtaccaa tgccgtcttg 840cacttcttgg ctatagctaa ggcagccgaa ataaactttg gtttgaaaga ttttacaaga 900atatgtgagg aaactccctt cttggccgac ttaaaaccat ctggtaagta tctgatggaa 960gacattcata ggataggcgg aatccccgcg gttatgaagt acatgttaga gaaaggatta 1020cttcatggtg agtgcatgac ggtaactggc aagactatcg cagaaaacct tgaaaatgtg 1080aaacctctgc cagatgatca ggacgtgatt catccagtcg aaaaacctat taaagctact 1140ggacatatca ggattttgta tggcaattta gccagcgaag gctccgtagc caagattact 1200gggaaggaag gattagaatt tcaaggtaag gccagagtct ttaatggcga atttgaggcc 1260aatgaaggga tcagtagcgg aaaggtccaa aaaggcgacg tagtagtaat tagatatgag 1320ggtcccaagg ggggtccggg tatgccggaa atgctaaaac ccacgtcagc 
aataatggga 1380gctggtcttg gtaagagtgt cgctttaata actgacggta gattcagcgg cggtactcat 1440ggttttgtcg tgggtcatat aacccctgaa gcgcaacaag gtggactaat agggctattg 1500aaagatggtg atgaaatttc gatcaacgcg gagaaaaaca cgattgaagc acatttatcc 1560gcagaagaaa ttaatagaag aaaggaggct tggaaggctc ctgctctaaa agttaacggt 1620ggggtacttt acaaatatgc gaagacagtt gctagtgcat cagaggggtg tgttacagac 1680gagttctaa 1689911707DNASaccharopolyspora erythraea 91atgagtacga gtacagatgg tacgggtcaa tcaggtagag gactaaaacc aaggtccgga 60gacgtaaccg agggtataga aagagccgcc gcaagaggca tgttacgtgc agtcggtatg 120caagatgctg acttcgccaa gcctcaaatt ggtgtcgctt cgtcttggaa cgagataact 180ccctgtaatc tttcccttca gcgtttagca caagcgtcta aggaaggagt gcatgcagct 240ggtgggttcc caatggaatt tggcactatt tcagtgagtg atgggatatc tatgggccat 300gttggaatgc attactctct agtgagtagg gaggtgattg ctgattcggt tgagacggta 360atggaagctg aaaggctgga cggttccgtt ttgttagccg gttgtgacaa gagcctaccg 420ggtatgctaa tggccgcagc acgtttagat gtcgccgctg tattcgtgta tgcaggttcc 480atactgcctg gaagagtaga cgatagagaa gtaactatta ttgacgcttt tgaagccgtc 540ggagcttgtg caaggggctt gatctcagaa gccgaggtgg ataggattga aagggctata 600tgcccaggtg aaggcgcttg tggaggaatg tatacggcga ataccatggc ttgtgcggct 660gaagcaatgg gcatgtcgtt accaggatca gcctcccctc ctagcgtaga tcgtagaaga 720gacgcgggcg cacgtgaagc tggtagagct gtggtcggta tgattgaacg tggtcttaca 780gccagacaaa tattgactaa agaggcgttc gaaaacgcta tcgcggttgt tatggctttt 840ggcggcagta ctaatgctgt tctgcatttg ctggcaattg cacgtgaggc agaagttgat 900ttaacattag atgattttaa caggattggt gatagagtgc ctcatctggc tgatgttaag 960ccatttggaa ggcacgtgat gaccgcagtc gataggatag gtggagtacc agtagtaatg 1020aaagccttgt tggatgctgg tttgcttcat ggagactgta tgacagttac tgggaaaact 1080gtcgccgaga atctagctga attagaccca ccagaattag acggggaagt tcttcacaaa 1140ctgtctaacc ccttacaccc taccggcggc ttgaccatct tgagagggag cttggcccct 1200gagggagctg ttgtcaaaag cgctggcttt gactccgcaa cattcgaggg tactgcacgt 1260gttttcgatg gagagcaggg tgccatggat gctgttgagg atggttcatt gaaagcgggt 1320gacgtggtag tcatcagata tgaaggtcca agaggcggtc caggtatgag 
ggaaatgctt 1380gctgtaacag gggctatcaa aggtgcaggg ttagggaagg acgttctatt gttaactgat 1440ggtagatttt cgggtggaac cacaggttta tgcatcggac acgtcgcgcc cgaagcaact 1500gacggcggtc cgattgcttt tgttcgtgac ggtgatccta ttagactgga tttagcgggt 1560agaactttgg atctattagt agatgaagcc gaacttgcaa gaagaaaaga aggctgggtt 1620ccgagagaac ccaagtttag acaaggtgtt ttgggcaaat acgctagact ggttaggtct 1680gctgcagttg gagccgtctg ctcttga 1707921722DNACandidatus Koribacter versatilis 92atgactgaga agtcaccaaa accccataag agatccgatg caatcacaga ggggccaaat 60cgtgctcctg ctcgtgctat gttaagggct gcaggtttta ctcctgagga tttgagaaaa 120cccattatcg gtatagccaa cacatggatt gaaattggcc cttgcaactt acatctaaga 180gaattggccg aacatatcaa gcaaggtgta agagaagctg gagggacacc catggaattt 240aatacagttt ccatctccga cgggataacc atgggatcag aaggtatgaa agctagtcta 300gtgagtcgtg aggtaatagc cgattcaatt gagttagttg ccagaggaaa cttgtttgat 360ggactaatag ctttatctgg atgtgataag acaatcccag gtacaattat ggcattggag 420agacttgata tcccaggcct tatgctttat ggtggttcaa ttgctccggg caaattccac 480gcacagaagg ttacgatcca agatgtattc gaagccgttg gtacccacgc taggggtaaa 540atgagcgatg cagacttaga agagcttgag cacaatgctt gtcctggtgc tggggcgtgc 600ggaggacagt tcacagctaa tactatgtct atgtgtggtg aatttctggg tatatctcct 660atgggagcga atagcgttcc cgcaatgacg gtcgagaaac aacaagtcgc gcgtagatgt 720ggacatttag ttatggagtt ggtgagaaga gacatcaggc cgtctcaaat cataacaaga 780aaagcaattg agaacgcaat agcatcagtt gcggctagtg gaggtagtac taacgcggtc 840ctgcatctgt tagctattgc acacgagatg gatgtcgaat tgaacattga agattttgat 900aagataagct ctcgtactcc acttctttgt gaactgaaac cagccggtag gtttacggct 960acagatttgc atgacgctgg tggtattcca ttagttgctc aaagactgtt ggaagcaaat 1020ttgttacacg ctgacgcttt gacagtaact ggcaagacta ttgcagaaga agctaaacag 1080gccaaagaaa ccccgggcca agaagtagtc aggcccttga ccgacccaat taaggctacc 1140ggcggattaa tgatcttaaa aggtaatcta gcatcagaag ggtgcgtggt aaagttggtt 1200ggtcacaaga agttattctt cgaaggtcct gcgagagttt ttgaatctga agaagaagca 1260tttgccggcg tcgaggatag gacgattcaa gcgggtgaag ttgtagtggt cagatacgaa 1320gggccaaaag gcggacctgg aatgcgtgaa 
atgttaggcg ttactgctgc gatagctggc 1380accgagttag ctgaaactgt ggccctaatc accgacggta gattttcggg tgcaacaaga 1440ggtctatccg tggggcatgt cgcacctgaa gccgcaaatg gtggtgccat tgccgtagtt 1500aggaatggtg acattattac gctggatgtt gagagaagag aattaagggt tcatttgact 1560gatgctgaat tggaggccag attgcgtaac tggagagcgc ctgaaccgag atacaaacgt 1620ggtgttttcg ctaaatatgc ttctacggtc tcatcagcat cgttcggagc tgtaacaggt 1680tctaccatag aaaacaaaac actggcaggc tcgactaagt aa 1722931779DNAPiromyces sp 93atgtctttct cactggctaa cctggccgct aagggttcga acttgttcaa atttactcct 60gcgcttctaa gcgcaaagcg ttttggttca tcaggaaagc caattaataa gttcagcaag 120attataacag agccaaagtc tagagggggt agtcaagcga tgttaatcgc aactggtata 180aaaccagaag atttaaaaaa gccacagatc ggcataggca gtgtttggta tgatggaaat 240ccatgcaaca tgcatctatt ggatcttggc tccgtggtaa aaaaggccgt tcaaaaacaa 300aatatgaatg gtatgagatt caatatgatt ggagtgtcag acgggatctc caacggtacg 360gatggaatgt ccttttcttt gcagtcccgt gaaattattg cggattctat cgaaacaatc 420atgtctgcac aatattatga tgctaacatc agcttacctg gctgcgacaa gaacatgcct 480ggttgtttaa tcgccgctgc cagattgaac agaccgacta taattatcta cggtggcacg 540atcaagcccg gacatacaaa aaagggagag acgattgatt tagtctcggc cttccaatgt 600tatgggcaat acttggctgg agaaattact gaagagcaaa gagaagaaat agtgaataat 660gcatgtcctg gcgcaggtgc atgcggtgga atgtatacag ctaatacaat ggcttccata 720atcgaatcaa tgggtatgag tttaccttac tccgcctcga ccccggcaga agacccattg 780aaagagcttg aatgtataaa cgcggcagct gcaattaaga atttaatgga aaaagacatc 840aagccattag acataatgac aagaaaagcg tttgagaacg ctataactat tactttgatt 900cttggaggga gtacaaactc cgttctgcac cttttggcta tcgctagggc ctgcaaagtc 960ccattaacta ttgacgattt ccaggaattt tctaatagga tacccgtttt agccgactta 1020aaacctagtg gtaaatatgt catggaagat ttgcagttga tcggcggtct tccagctatt 1080cagaaatatc ttctgaatga aggtctactt catggtgata ttatgactgt taccggaaag 1140accctagcag agaatttgaa agacgttgct ccaatcgatt ttgaaactca agatataatt 1200agacctttat cgaatcccat taaaaagaat ggtcacatta tcattatgaa aggtaacgtc 1260tctccggacg gtggtgttgc taaaattaca ggtaagcagg gattgttttt cgaaggcgtg 1320gcgaattgct 
ttgattgtga agaagacatg ttagctgcac tggaaagagg cgaaattaaa 1380aaaggtcaag tgattataat aaggtatgaa ggccccactg gagggcctgg tatgccggag 1440atgctaactc cgaccagtgc tattatgggt gctgggttag gaaaagatgt agcactatta 1500acagatggca gattttcagg cgggtcacac ggcttcatta ttggtcatat tacgcctgag 1560gcacaagtag gtggtccaat tgccctaatc aaaaacggtg ataagataac tatagacgcg 1620aataaacgta ccatacatgc ccatgtcagc gaagaagaat ttgctaaaag acgtgccgag 1680tggaaagcac caccttacag agctactcaa ggtactttaa agaaatacat taagctggtt 1740aaacccgcaa actttggatg tgttaccgat gagtggtaa 1779941764DNARalstonia eutropha 94atgccgtacg cagatgaccc aaaattacct caagatgggg ctgcgcctac agaaggtttg 60gccaagggcc ttactaatta tggtgatact ggtttctctt tattcctgag gaaggctttt 120atcaaaggtg caggttttac cgatgatgca ctatcaaggc cggtgatagg aattgtaaat 180actggatctt cttataaccc atgccacggc aacgcccctc aattagtgga ggcggtgaag 240agaggtgtca tgttggcagg tggtttaccc gtagacttcc ctactatatc cgtccacgag 300tcatttagcg cacccactag tatgtattta aggaacttga tgtccatgga taccgaagaa 360atgattcgtg ctcagccgat ggacgccgtc gttctgatag ggggttgtga caaaacagtt 420ccagcccaac tgatgggtgc cgcatcagct ggagtaccag ccatccaatt agtcacaggt 480tctatgctaa ctggtagcca tagaagtgag agagtcggag cgtgtacgga ttgtcgtaga 540tactggggta gataccgtgc tgaggagatt gattcagccg agatcgcaga tgttaataat 600cagttggttg cctcagttgg tacatgctcg gtcatgggga cagcttcaac aatggcttgt 660gtagcagagg ccttgggtat gatggtttct ggcggtgctt cggcacctgc tgtgaccgcg 720gatagagtta gggtcgcgga acgtaccggg acgactgctg ttggaatggc ggcggccagg 780ttgacacctg atagaatatt aacaggtaaa gcctttgaaa acgctttgag agttctactg 840gcaatcggcg gttcaacaaa tgggatagta catctaacgg ctattgctgg tagactagga 900atcgacatcg acctagcagg gttggacaga atgtctcgtg aaacgcctgt tctggttgac 960ttgaaaccta gcggtcaaca ttacatggaa gattttcata aggccggagg
aatgttaacg 1020ttgttacgtg aactgagacc actattacac ttagatactt tgaccgttag tggaaggacc 1080cttggcgaag aattagatgc agcaccccct ctgttcccac aagatgtcat tagaagtgca 1140ggtaatccta tttatcccgc aggtggatta gcggtccttc gtggtaattt ggctccaggc 1200ggggctatca tcaaacaatc cgctgcgaac ccagctctta tggagcatga aggaagagcc 1260gtagtttttg aaaatgcaga agacatggct caaagaattg acgacgaatc cttagacgtg 1320aaagctgacg atattcttgt acttaaaagg attggtccaa ctggcgcccc gggtatgcct 1380gaagctggct atatgccgat accaaagaag ttagcaagag caggggttaa ggatatggta 1440agagttagtg atggtcgtat gtctggaacg gcagctggca caatagtttt gcatgtgaca 1500ccagaagcag ccataggggg acccttagcc cttgttcagt cgggagatag aattaggcta 1560tctgtggcca accgtgaaat tgcattgtta gtagatgatg ccgaattagc aaggagggcc 1620gctgctcaac ccgtagaaag accaagggct gagagaggtt atagaaaatt gtttctggag 1680acagtaactc aggcggatca gggtgttgat ttcgactttt tgagagctgc tcaaactgtg 1740gatacagtcc caaagcaagg ctaa 1764951746DNAChromohalobacter salexigens 95atgactcata agaagagacc tttaagaagt gccgagtggt tcggtaatga tgacaaaaat 60ggatttatgt atagatcgtg gatgaaaaac caaggtatac ccgatcacga gtttagaggt 120aaaccgataa ttggtatctg caataccttt agtgaactaa ctccatgcaa cgcccatttc 180agaaagttag cagaacatgt gaaaaaaggt gtattagaag caggcggtta cccggttgaa 240tttccagtat tttctaacgg ggaatctaat ttgagaccaa ctgctatgtt cacaaggaat 300ttggctagta tggatgtcga ggaagccatt agaggcaatc cattagacgc agtcgtgttg 360cttgtgggtt gtgataaaac aacaccagcc ttacttatgg gtgctgcttc ttgtgacatt 420ccgactatag ttgttacagg tgggccaatg cttaacggga aacacaaggg aagagacatc 480ggatcaggta cggtcgtgtg gcagctttct gaagaggtta aggccgggaa aatttcctta 540catgatttca tggcggctga ggctggaatg agccgttccg ctggcacttg taacactatg 600ggaaccgcct ctaccatggc atgcatggcc gaatctcttg gtacttcatt gccacacaat 660gccgctattc cggccgtgga tagccgtagg tatgtacttg cacatttgag tggtaatagg 720attgtcgaaa tggtcgatga agacctaaca ctgagcaaag tgctgaccaa gagcgctttt 780gaaaacgcta tcagaacgaa tgctgcgatt ggcgggtcaa ccaatgcagt aatccatcta 840caggcaatcg caggtagaat gggggtggac ttgacactag atgactggac aagagtaggt 900cgtggcacgc ctactatcgt cgatttacaa 
ccctcgggta ggtacttgat ggaggaattt 960tattatgcgg gaggtctgcc tgcagtttta aggagattgg gggaagctga tagactaccc 1020cataaagatg ccttaaccgt taatggcaag accctgtggg aaaacgttca agatgcgcca 1080ttatacaacg acgccgttat tttgccattg gatgctccct tacgtgagga cggaggcatg 1140tgtgtgatgc gtggtaatct tgcgcctaac ggggctgtat taaaacctag cgcagcaact 1200cctgctctaa tgcagcacag gggcagagcg gttgtttttg agaattttga tgattacaaa 1260gccaggataa atgatcctga cttggatgtt actgccgatg atatattagt aatgaagaac 1320tgtggtccta gaggttatca tggtatggca gaagtaggca acatgggact gcctgcaaaa 1380ctactggagc agggtgtcac ggacatggtc cgtatttcag atgcaagaat gagtggaacc 1440gcttacggta ctgttgtatt gcatgtagct cctgaagctg ctgccggtgg tcccttagct 1500gccgttcgta atggcgattg gatcgcacta gacgcatatt caggaaaatt acacttggag 1560gtcgatgatg ctgaaatagc gtccagatta gcagaggcag acccaacagc tgaatcaact 1620aggatagcgt caacaggagg ttacagacaa ctttacattg aacatgtttt gcaagctgat 1680caaggctgtg atttcgattt cttagttgga tgcaggggcg cagaagtccc aagacattcc 1740cactaa 174696990DNAPicrophilus torridus 96atggaaaagg tttatacgga gaacgaccta aaggaaaact tgatgcgtaa caaaaagata 60gcagttctag gttatggctc acaaggtaga gcttgggcat taaatatgag agacagcgga 120ttaaatgtga cagtgggatt ggaaagacag gggaaatctt gggaaaaagc cgttgctgat 180ggctttaagc cacttaagtc aagagatgct gttagagacg ctgacgcagt cattttctta 240gtcccagaca tggcccagag agaattatat aagaatatta tgaatgatat taaagatgac 300gcagacatcg tttttgccca cggctttaac gttcattatg gtcttattaa tcctaaaaac 360catgatgttt acatggtggc tcctaaagca cccggcccat cggtaaggga gttttacgaa 420agagggggag gggtcccggt tcttattgct gttgcaaatg atgtctcggg ccgttctaaa 480gaaaaggcgt taagtatagc gtatagcttg ggtgccttga gagcaggtgc gattgaaacc 540accttcaaag aggaaactga aacagaccta atcggtgaac aattggatct ggttggaggt 600attactgaat tactaagatc aacgtttaat attatggttg aaatgggtta taaaccagaa 660atggcttatt ttgaggccat caatgagatg aagttgatag tagaccaggt attcgaaaaa 720ggtatttctg gtatgcttag agccgtaagt gataccgcta aatatggagg tctgacaact 780ggtaagtaca taataaatga tgatgtaaga aaaaggatga gggaaagggc agaatacatt 840gtgtcaggaa aattcgctga ggagtggatt gaagaatacg 
gcgagggttc taagaatctg 900gaaagtatga tgttggatat cgataactcc ctagaagagc aagttggaaa gcaattaaga 960gaaatcgtct taaggggacg tcctaagtaa 990971683DNASulfolobus tokodaii 97atgaacccag acaagaaaaa acgttcgaat ctgatatatg gtggatacga gaaggctcct 60aacagggcct tcttgaaagc catgggcttg acggatgatg acatcgctaa accaatagtc 120ggtgtcgctg ttgcttggaa tgaagctggc ccatgtaata ttcatttact aggtttatct 180aatattgtta aagaaggagt gaggtcaggg ggtggcactc cgagggtatt taccgcccct 240gttgtgattg acggtatcgc aatgggttct gaagggatga agtattccct tgtttcaaga 300gaaattgtgg caaatacggt cgagcttgtg gttaatgctc acgggtacga tggtttcgtt 360gcattagctg ggtgtgacaa gactccacca ggaatgatga tggcaatggc tagattaaac 420attcccagca ttatcatgta tggaggcaca acactacctg gtaatttcaa aggaaaaccc 480atcactatcc aggatgtata tgaggctgtt ggggcttatt ctaaaggaaa gattacagca 540gaagatttaa gattgatgga agataatgct attccaggtc cgggaacctg cggcggtcta 600tacacagcca atactatggg cttaatgaca gaagcccttg gtcttgcgct accaggcagt 660gcttctcctc cagcagtgga tagtgcaagg gtaaaatatg catacgaaac gggtaaagcc 720ctaatgaatt taatcgaaat cgggttaaaa cctcgtgaca ttcttacctt tgaagccttt 780gaaaacgcaa taaccgtatt gatggcgtcg ggcggatcaa ccaacgcagt gttgcattta 840ctggcgatag catacgaagc aggcgttaaa ttaactttag atgattttga tcgtatatcc 900caaagaacac cagaaattgt taacatgaag cctggaggtg aatacgctat gtacgatttg 960catagggtcg gtggtgctcc cctgataatg aagaaattgc ttgaggccga cttattgcac 1020ggtgatgtaa taactgttac tggtaagacc gtcaaacaga atcttgagga gtataagttg 1080ccaaatgttc cacacgaaca cattgtcagg cccatatcca acccttttaa cccaacagga 1140gggataagaa ttttgaaggg ttcactggct ccagagggcg cagtaattaa agtctccgcc 1200actaaggtga gataccataa gggtccagcg agagtcttca attccgaaga ggaagccttt 1260aaggcagttc tggaagaaaa aatccaagag aatgatgtag ttgttatcag atatgaagga 1320cctaagggcg gtcctggaat gcgtgaaatg ttggctgtca cgtcggctat cgtgggtcaa 1380ggtttaggtg aaaaagttgc cttgattact gacggtagat tttcaggagc cacgagaggt 1440attatggtcg gacatgtagc tcccgaggcg gcagtaggtg gtccgatagc tttgctgagg 1500gacggtgaca caatcataat tgatgcaaat aatggcagac tagacgtcga tctacctcaa 1560gaagaattaa agaaaagagc tgatgagtgg 
acgcctcctc ccccgaaata taaaagtgga 1620ttattggctc aatacgctag actagttagc agttcttcac taggtgcggt gctattgact 1680taa 1683981476DNAArtificial SequenceE. coli ilvC(Q110V) 98atggccaact attttaacac attaaatttg agacaacaat tggctcaact gggtaagtgc 60agatttatgg gaagggacga gtttgctgat ggtgcttctt atctgcaagg aaagaaagta 120gtaattgttg gctgcggtgc tcagggtcta aaccaaggtt taaacatgag agattcaggt 180ctggatattt cgtatgcatt gaggaaagag gcaattgcag aaaagagggc ctcctggcgt 240aaagcgacgg aaaatgggtt caaagttggt acttacgaag aactgatccc tcaggcagat 300ttagtgatta acctaacacc agataaggtt cactcagacg tagtaagaac agttcaaccg 360ctgatgaagg atggggcagc tttaggttac tctcatggct ttaatatcgt tgaagtgggc 420gagcagatca gaaaagatat aacagtcgta atggttgcac caaagtgccc aggtacggaa 480gtcagagagg agtacaagag gggttttggt gtacctacat tgatcgccgt acatcctgaa 540aatgacccca aaggtgaagg tatggcaatt gcgaaggcat gggcagccgc aaccggaggt 600catagagcgg gtgtgttaga gagttctttc gtagctgagg tcaagagtga cttaatgggt 660gaacaaacca ttctgtgcgg aatgttgcag gcagggtctt tactatgctt tgataaattg 720gtcgaagagg gtacagatcc tgcctatgct gaaaagttga tacaatttgg ttgggagaca 780atcaccgagg cacttaaaca aggtggcata acattgatga tggatagact ttcaaatccg 840gccaagctaa gagcctacgc cttatctgag caactaaaag agatcatggc accattattc 900caaaagcaca tggacgatat tatctccggt gagttttcct caggaatgat ggcagattgg 960gcaaacgatg ataaaaagtt attgacgtgg agagaagaaa ccggcaagac ggcattcgag 1020acagccccac aatacgaagg taaaattggt gaacaagaat actttgataa gggagtattg 1080atgatagcta tggtgaaggc aggggtagaa cttgcattcg aaactatggt tgactccggt 1140atcattgaag aatctgcata ctatgagtct ttgcatgaat tgcctttgat agcaaatact 1200attgcaagaa aaagacttta cgagatgaat gttgtcatat cagacactgc agaatatggt 1260aattacttat ttagctacgc atgtgtcccg ttgttaaagc ccttcatggc cgagttacaa 1320cctggtgatt tggggaaggc tattccggaa ggagcggttg acaatggcca actgagagac 1380gtaaatgaag ctattcgttc acatgctata gaacaggtgg gtaaaaagct gagaggatat 1440atgaccgata tgaaaagaat tgcagtggca ggatga 1476991647DNALactococcus lactis 99atgtatactg ttggtgatta tctgctggac cgtctgcatg aactgggtat cgaagaaatc 60ttcggcgttc cgggtgatta 
caatctgcag ttcctggatc agatcatctc tcataaagac 120atgaaatggg tgggtaacgc taacgaactg aacgcaagct acatggcaga tggttatgca 180cgtaccaaga aagccgcggc atttctgacc actttcggtg ttggcgaact gagcgccgtc 240aacggtctgg cgggctccta cgccgaaaac ctgccggtgg tggagatcgt aggcagccca 300acgagcaaag ttcagaacga aggtaaattc gtccaccaca ctctggctga cggcgatttc 360aaacacttca tgaaaatgca tgaacctgtg actgcggcac gtacgctgct gactgcagag 420aacgctactg tggaaatcga ccgcgttctg tctgcgctgc tgaaagaacg caaaccagtt 480tacatcaacc tgcctgtgga tgttgcggca gctaaagcgg aaaaaccgag cctgccgctg 540aagaaagaaa actccacttc taacactagc gaccaggaaa tcctgaacaa aatccaggag 600tctctgaaaa acgcaaagaa accaatcgtg atcaccggcc acgaaatcat ttcttttggt 660ctggagaaga ccgtgaccca attcatcagc aaaaccaaac tgccgattac caccctgaac 720ttcggcaagt cctctgttga cgaggctctg ccgtctttcc tgggcatcta caacggtact 780ctgagcgaac cgaacctgaa agaatttgtt gaatctgcgg acttcatcct gatgctgggc 840gttaaactga ccgactcttc taccggtgca ttcactcacc atctgaacga aaacaaaatg 900attagcctga acatcgacga gggtaaaatc ttcaacgagc gtatccagaa cttcgacttc 960gaaagcctga tcagctctct gctggacctg tccgaaatcg agtataaagg caaatacatt 1020gacaaaaagc aagaagattt cgtaccatct aacgcactgc tgtcccagga tcgcctgtgg 1080caggccgtgg agaacctgac ccagagcaat gaaaccatcg tggcggaaca aggtacgagc 1140tttttcggcg cgtcttctat ctttctgaaa tccaaaagcc attttatcgg tcagccgctg 1200tggggtagca ttggctatac tttcccggca gcgctgggct ctcagatcgc tgataaagaa 1260tctcgtcatc tgctgttcat cggtgacggt tccctgcagc tgaccgtaca ggaactgggt 1320ctggcaattc gtgaaaagat caacccgatt tgcttcatta ttaacaatga cggctacacc 1380gttgagcgtg agatccacgg tccgaaccag tcttacaacg atatccctat gtggaactac 1440tctaaactgc cggagtcctt cggcgcaact gaggaccgtg ttgtgtctaa aattgtgcgt 1500accgaaaacg aatttgtgag cgtgatgaaa gaggcccagg ccgatccgaa ccgtatgtac 1560tggatcgaac tgatcctggc gaaagaaggc gcaccgaagg tactgaagaa aatgggcaag 1620ctgtttgctg aacagaataa atcctaa 16471001188DNASaccharomyces cerevisiae 100atgttgagaa ctcaagccgc cagattgatc tgcaactccc gtgtcatcac tgctaagaga 60acctttgctt tggccacccg tgctgctgct tacagcagac cagctgcccg tttcgttaag 
120ccaatgatca ctacccgtgg tttgaagcaa atcaacttcg gtggtactgt tgaaaccgtc 180tacgaaagag ctgactggcc aagagaaaag ttgttggact acttcaagaa cgacactttt 240gctttgatcg gttacggttc ccaaggttac ggtcaaggtt tgaacttgag agacaacggt 300ttgaacgtta tcattggtgt ccgtaaagat ggtgcttctt ggaaggctgc catcgaagac 360ggttgggttc caggcaagaa cttgttcact gttgaagatg ctatcaagag aggtagttac 420gttatgaact tgttgtccga tgccgctcaa tcagaaacct ggcctgctat caagccattg 480ttgaccaagg gtaagacttt gtacttctcc cacggtttct ccccagtctt caaggacttg 540actcacgttg aaccaccaaa ggacttagat gttatcttgg ttgctccaaa gggttccggt 600agaactgtca gatctttgtt caaggaaggt cgtggtatta actcttctta cgccgtctgg 660aacgatgtca ccggtaaggc tcacgaaaag gcccaagctt tggccgttgc cattggttcc 720ggttacgttt accaaaccac tttcgaaaga gaagtcaact ctgacttgta cggtgaaaga 780ggttgtttaa tgggtggtat ccacggtatg ttcttggctc aatacgacgt cttgagagaa 840aacggtcact ccccatctga agctttcaac gaaaccgtcg aagaagctac ccaatctcta 900tacccattga tcggtaagta cggtatggat tacatgtacg atgcttgttc caccaccgcc 960agaagaggtg ctttggactg gtacccaatc ttcaagaatg ctttgaagcc tgttttccaa 1020gacttgtacg aatctaccaa gaacggtacc gaaaccaaga gatctttgga attcaactct 1080caacctgact acagagaaaa gctagaaaag gaattagaca ccatcagaaa catggaaatc 1140tggaaggttg gtaaggaagt cagaaagttg agaccagaaa accaataa 118810120DNAArtificial SequencePrimer 1321 101aatcatatcg aacacgatgc 2010220DNAArtificial SequencePrimer 1322 102tcagaaagga tcttctgctc 2010320DNAArtificial SequencePrimer 1323 103atcgatatcg tgaaatacgc 2010420DNAArtificial SequencePrimer 1324 104agctggtctg gtgattctac 2010538DNAArtificial SequencePrimer 1409 105attgatgcgg ccgcgattta atctctaatt attagtta 3810634DNAArtificial SequencePrimer 1410 106cacccagtcg cgacatccaa tttatagaaa tcag 3410732DNAArtificial SequencePrimer 1411 107attggatgtc gcgactgggt gagcatatgt tc 3210832DNAArtificial SequencePrimer 1412 108gagaaagccg gcaggagagt gaaagagcct tg 3210921DNAArtificial SequencePrimer 1440 109atcgtacatc ttccaagcat c 2111020DNAArtificial SequencePrimer 1441 110aatcggaacc ctaaagggag 2011120DNAArtificial SequencePrimer 
1443 111tgcagatgca gatgtgagac 2011224DNAArtificial SequencePrimer 1587 112cggctgccag aactctacta actg 2411323DNAArtificial SequencePrimer 1588 113gcgacgtcta ctggcaggtt aat 2311424DNAArtificial SequencePrimer 1633 114tccgtcactg gattcaatgc catc 2411520DNAArtificial SequencePrimer 1634 115ttcgccaggg agctggtgaa 20116771DNADrosophila melanogaster 116atgtcgttta ctttgaccaa caagaacgtg attttcgttg ccggtctggg aggcattggt 60ctggacacca gcaaggagct gctcaagcgc gatctgaaga acctggtgat cctcgaccgc 120attgagaacc cggctgccat tgccgagctg aaggcaatca atccaaaggt gaccgtcacc 180ttctacccct atgatgtgac cgtgcccatt gccgagacca ccaagctgct gaagaccatc 240ttcgcccagc tgaagaccgt cgatgtcctg atcaacggag ctggtatcct ggacgatcac 300cagatcgagc gcaccattgc cgtcaactac actggcctgg tcaacaccac gacggccatt 360ctggacttct gggacaagcg caagggcggt cccggtggta tcatctgcaa cattggatcc 420gtcactggat tcaatgccat ctaccaggtg cccgtctact ccggcaccaa ggccgccgtg 480gtcaacttca ccagctccct ggcgaaactg gcccccatta ccggcgtgac ggcttacact 540gtgaaccccg gcatcacccg caccaccctg gtgcacacgt tcaactcctg gttggatgtt 600gagcctcagg ttgccgagaa gctcctggct catcccaccc agccctcgtt ggcctgcgcc 660gagaacttcg tcaaggctat cgagctgaac cagaacggag ccatctggaa actggacttg 720ggcaccctgg aggccatcca gtggaccaag cactgggact ccggcatcta a 7711178870DNAArtificial SequencepGV1914 117tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat gcagctcccg gagacggtca 60cagcttgtct gtaagcggat gccgggagca gacaagcccg tcagggcgcg tcagcgggtg 120ttggcgggtg tcggggctgg cttaactatg cggcatcaga gcagattgta ctgagagtgc 180accatatgcg gtgtgaaata ccgcacagat gcgtaaggag aaaataccgc atcaggcgcc 240attcgccatt caggctgcgc aactgttggg aagggcgatc ggtgcgggcc tcttcgctat 300tacgccagct ggcgaaaggg ggatgtgctg caaggcgatt aagttgggta acgccagggt 360tttcccagtc acgacgttgt aaaacgacgg ccagtgaatt cataccacag cttttcaatt 420caattcatca tttttttttt attctttttt ttgatttcgg tttccttgaa atttttttga 480ttcggtaatc tccgaacaga aggaagaacg aaggaaggag cacagactta gattggtata 540tatacgcata tgtagtgttg aagaaacatg aaattgccca gtattcttaa cccaactgca 600cagaacaaaa acctgcagga aacgaagata 
aatcatgtcg aaagctacat ataaggaacg 660tgctgctact catcctagtc ctgttgctgc caagctattt aatatcatgc acgaaaagca 720aacaaacttg tgtgcttcat tggatgttcg taccaccaag gaattactgg agttagttga 780agcattaggt cccaaaattt gtttactaaa aacacatgtg gatatcttga ctgatttttc 840catggagggc acagttaagc cgctaaaggc attatccgcc aagtacaatt ttttactctt 900cgaagacaga aaatttgctg acattggtaa tacagtcaaa ttgcagtact ctgcgggtgt 960atacagaata gcagaatggg cagacattac gaatgcacac ggtgtggtgg gcccaggtat 1020tgttagcggt ttgaagcagg cggcagaaga agtaacaaag gaacctagag gccttttgat 1080gttagcagaa ttgtcatgca agggctccct atctactgga gaatatacta agggtactgt 1140tgacattgcg aagagcgaca aagattttgt tatcggcttt attgctcaaa gagacatggg 1200tggaagagat gaaggttacg attggttgat tatgacaccc ggtgtgggtt tagatgacaa 1260gggagacgca ttgggtcaac agtatagaac cgtggatgat gtggtctcta caggatctga 1320cattattatt gttggaagag gactatttgc aaagggaagg gatgctaagg tagagggtga 1380acgttacaga aaagcaggct gggaagcata tttgagaaga tgcggccagc aaaactaaaa 1440aactgtatta taagtaaatg catgtatact aaactcacaa attagagctt caatttaatt 1500atatcagtta ttaccctatg cggtgtgaaa taccgcacag atgcgtaagg agaaaatacc 1560gcatcaggaa attgtaaacg ttaatatttt gttaaaattc gcgttaaatt tttgttaaat 1620cagctcattt tttaaccaat aggccgaaat cggcaaaatc ccttataaat caaaagaata 1680gaccgagata gggttgagtg ttgttccagt ttggaacaag agtccactat taaagaacgt 1740ggactccaac gtcaaagggc gaaaaaccgt ctatcagggc gatggcccac tacgtgaacc 1800atcaccctaa tcaagttttt tggggtcgag gtgccgtaaa gcactaaatc ggaaccctaa 1860agggagcccc cgatttagag cttgacgggg aaagccggcg aggactgcaa tagcacaaga 1920ttaagataga atggcttcaa acagccgcct tttatacata ttggtaaaag ctcgcgaatc 1980gcaccatatc ccttatcctg taatcaaatc gatctaggtg cagatacaga tcaattcata 2040aaaagaaatt gaagcaccag tttatcacta ctacactatc tttttctttt tttttttttt 2100ttgcgcagtt tcgccctttg ttcaatatca cttgataagt tgtgggcttt ttctgtcact 2160cattcggctt aaaaagtatt cgttcttttg tgttttatga aaagggaacg tgatataaaa 2220aaacatcctt tggtgtggga catgggcttt tgtttagaga atggttatca ctaccgcccc 2280cacccttgaa agccacagaa aatgaaaaag tatgtgaata aggtgtgaac tctataacat 
2340tttggccaaa tgccacagcc gatctgcata ttccaatgga catgatgcaa caacaattga 2400tgtcacattc tcttacacac ttcgattggt ccgtacgtag tactttttac ataactgact 2460caggcgtttc cttcattgaa atgctcatct attgccaagt acatagaatc cacagtgcat 2520aggttaacgc attgtaccca aacgacggga aacaaggaag gatgcagaat gagcacttgt 2580tatttataaa aagacacggg agggggaatc ccgtctttcg tccgtcggag ccaaagagat 2640gagccaaagc agaaaaacag gggacgccgc ccttcttccg tcccgtgcgt gaggggggcg 2700cggccattcg gtttttgcaa tatgacctgt gggccaaaaa tcgaaaaaaa
aaaaaaaaat 2760aagaggcggc tgcggaattt tataagacaa gcgcagggcc aaagaaaaaa taataattga 2820cgtggctgaa caacagtctc tccccacccc tttccaaaaa ggggaatgaa atacgagttc 2880tttttcccaa ttggtagata ttcaacaaga gacgcgcagt acgtaacatg cgaattgcgt 2940aattcacggc gataacgtag tatttagatt tagtataatt tgaaccgatg tatttatttg 3000tctgattgat ttatgtattc aaactgtgta agtttattta tttgcaacaa taattcgttt 3060gagtacacta ctaatggcgg ccgcttagat gccggagtcc cagtgcttgg tccactggat 3120ggcctccagg gtgcccaagt ccagtttcca gatggctccg ttctggttca gctcgatagc 3180cttgacgaag ttctcggcgc aggccaacga gggctgggtg ggatgagcca ggagcttctc 3240ggcaacctga ggctcaacat ccaaccagga gttgaacgtg tgcaccaggg tggtgcgggt 3300gatgccgggg ttcacagtgt aagccgtcac gccggtaatg ggggccagtt tcgccaggga 3360gctggtgaag ttgaccacgg cggccttggt gccggagtag acgggcacct ggtagatggc 3420attgaatcca gtgacggatc caatgttgca gatgatacca ccgggaccgc ccttgcgctt 3480gtcccagaag tccagaatgg ccgtcgtggt gttgaccagg ccagtgtagt tgacggcaat 3540ggtgcgctcg atctggtgat cgtccaggat accagctccg ttgatcagga catcgacggt 3600cttcagctgg gcgaagatgg tcttcagcag cttggtggtc tcggcaatgg gcacggtcac 3660atcatagggg tagaaggtga cggtcacctt tggattgatt gccttcagct cggcaatggc 3720agccgggttc tcaatgcggt cgaggatcac caggttcttc agatcgcgct tgagcagctc 3780cttgctggtg tccagaccaa tgcctcccag accggcaacg aaaatcacgt tcttgttggt 3840caaagtaaac gacataccgg tatctcctag atccgtcgaa gtcgaaacta agttctggtg 3900ttttaaaact aaaaaaaaga ctaactataa aagtagaatt taagaagttt aagaaataga 3960tttacagaat tacaatcaat acctaccgtc tttatatact tattagtcaa gtaggggaat 4020aatttcaggg aactggtttc aacctttttt ttcagctttt tccaaatcag agagagcaga 4080aggtaataga aggtgtaaga aaatgagata gatacatgcg tgggtcaatt gccttgtgtc 4140atcatttact ccaggcaggt tgcatcactc cattgaggtt gtgcccgttt tttgcctgtt 4200tgtgcccctg ttctctgtag ttgcgctaag agaatggacc tatgaactga tggttggtga 4260agaaaacaat attttggtgc tgggattctt tttttttctg gatgccagct taaaaagcgg 4320gctccattat atttagtgga tgccaggaat aaactgttca cccagacacc tacgatgtta 4380tatattctgt gtaacccgcc ccctattttg ggcatgtacg ggttacagca gaattaaaag 4440gctaattttt tgactaaata 
aagttaggaa aatcactact attaattatt tacgtattct 4500ttgaaatggc gagtattgat aatgataaac tggatcctta ggatttattc tgttcagcaa 4560acagcttgcc cattttcttc agtaccttcg gtgcgccttc tttcgccagg atcagttcga 4620tccagtacat acggttcgga tcggcctggg cctctttcat cacgctcaca aattcgtttt 4680cggtacgcac aattttagac acaacacggt cctcagttgc gccgaaggac tccggcagtt 4740tagagtagtt ccacataggg atatcgttgt aagactggtt cggaccgtgg atctcacgct 4800caacggtgta gccgtcattg ttaataatga agcaaatcgg gttgatcttt tcacgaattg 4860ccagacccag ttcctgtacg gtcagctgca gggaaccgtc accgatgaac agcagatgac 4920gagattcttt atcagcgatc tgagagccca gcgctgccgg gaaagtatag ccaatgctac 4980cccacagcgg ctgaccgata aaatggcttt tggatttcag aaagatagaa gacgcgccga 5040aaaagctcgt accttgttcc gccacgatgg tttcattgct ctgggtcagg ttctccacgg 5100cctgccacag gcgatcctgg gacagcagtg cgttagatgg tacgaaatct tcttgctttt 5160tgtcaatgta tttgccttta tactcgattt cggacaggtc cagcagagag ctgatcaggc 5220tttcgaagtc gaagttctgg atacgctcgt tgaagatttt accctcgtcg atgttcaggc 5280taatcatttt gttttcgttc agatggtgag tgaatgcacc ggtagaagag tcggtcagtt 5340taacgcccag catcaggatg aagtccgcag attcaacaaa ttctttcagg ttcggttcgc 5400tcagagtacc gttgtagatg cccaggaaag acggcagagc ctcgtcaaca gaggacttgc 5460cgaagttcag ggtggtaatc ggcagtttgg ttttgctgat gaattgggtc acggtcttct 5520ccagaccaaa agaaatgatt tcgtggccgg tgatcacgat tggtttcttt gcgtttttca 5580gagactcctg gattttgttc aggatttcct ggtcgctagt gttagaagtg gagttttctt 5640tcttcagcgg caggctcggt ttttccgctt tagctgccgc aacatccaca ggcaggttga 5700tgtaaactgg tttgcgttct ttcagcagcg cagacagaac gcggtcgatt tccacagtag 5760cgttctctgc agtcagcagc gtacgtgccg cagtcacagg ttcatgcatt ttcatgaagt 5820gtttgaaatc gccgtcagcc agagtgtggt ggacgaattt accttcgttc tgaactttgc 5880tcgttgggct gcctacgatc tccaccaccg gcaggttttc ggcgtaggag cccgccagac 5940cgttgacggc gctcagttcg ccaacaccga aagtggtcag aaatgccgcg gctttcttgg 6000tacgtgcata accatctgcc atgtagcttg cgttcagttc gttagcgtta cccacccatt 6060tcatgtcttt atgagagatg atctgatcca ggaactgcag attgtaatca cccggaacgc 6120cgaagatttc ttcgataccc agttcatgca gacggtccag cagataatca 
ccaacagtat 6180acatgtcgac aaacttagat tagattgcta tgctttcttt ctaatgagca agaagtaaaa 6240aaagttgtaa tagaacaaga aaaatgaaac tgaaacttga gaaattgaag accgtttatt 6300aacttaaata tcaatgggag gtcatcgaaa gagaaaaaaa tcaaaaaaaa aattttcaag 6360aaaaagaaac gtgataaaaa tttttattgc ctttttcgac gaagaaaaag aaacgaggcg 6420gtctcttttt tcttttccaa acctttagta cgggtaatta acgacaccct agaggaagaa 6480agaggggaaa tttagtatgc tgtgcttggg tgttttgaag tggtacggcg atgcgcggag 6540tccgagaaaa tctggaagag taaaaaagga gtagaaacat tttgaagcta tgagctccag 6600cttttgttcc ctttagtgag ggttaattgc gcgcttggcg taatcatggt catagctgtt 6660tcctgtgtga aattgttatc cgctcacaat tccacacaac ataggagccg gaagcataaa 6720gtgtaaagcc tggggtgcct aatgagtgag gtaactcaca ttaattgcgt tgcgctcact 6780gcccgctttc cagtcgggaa acctgtcgtg ccagctgcat taatgaatcg gccaacgcgc 6840ggggagaggc ggtttgcgta ttgggcgctc ttccgcttcc tcgctcactg actcgctgcg 6900ctcggtcgtt cggctgcggc gagcggtatc agctcactca aaggcggtaa tacggttatc 6960cacagaatca ggggataacg caggaaagaa catgtgagca aaaggccagc aaaaggccag 7020gaaccgtaaa aaggccgcgt tgctggcgtt tttccatagg ctccgccccc ctgacgagca 7080tcacaaaaat cgacgctcaa gtcagaggtg gcgaaacccg acaggactat aaagatacca 7140ggcgtttccc cctggaagct ccctcgtgcg ctctcctgtt ccgaccctgc cgcttaccgg 7200atacctgtcc gcctttctcc cttcgggaag cgtggcgctt tctcatagct cacgctgtag 7260gtatctcagt tcggtgtagg tcgttcgctc caagctgggc tgtgtgcacg aaccccccgt 7320tcagcccgac cgctgcgcct tatccggtaa ctatcgtctt gagtccaacc cggtaagaca 7380cgacttatcg ccactggcag cagccactgg taacaggatt agcagagcga ggtatgtagg 7440cggtgctaca gagttcttga agtggtggcc taactacggc tacactagaa ggacagtatt 7500tggtatctgc gctctgctga agccagttac cttcggaaaa agagttggta gctcttgatc 7560cggcaaacaa accaccgctg gtagcggtgg tttttttgtt tgcaagcagc agattacgcg 7620cagaaaaaaa ggatctcaag aagatccttt gatcttttct acggggtctg acgctcagtg 7680gaacgaaaac tcacgttaag ggattttggt catgagatta tcaaaaagga tcttcaccta 7740gatcctttta aattaaaaat gaagttttaa atcaatctaa agtatatatg agtaaacttg 7800gtctgacagt taccaatgct taatcagtga ggcacctatc tcagcgatct gtctatttcg 7860ttcatccata gttgcctgac 
tccccgtcgt gtagataact acgatacggg agggcttacc 7920atctggcccc agtgctgcaa tgataccgcg agacccacgc tcaccggctc cagatttatc 7980agcaataaac cagccagccg gaagggccga gcgcagaagt ggtcctgcaa ctttatccgc 8040ctccatccag tctattaatt gttgccggga agctagagta agtagttcgc cagttaatag 8100tttgcgcaac gttgttgcca ttgctacagg catcgtggtg tcacgctcgt cgtttggtat 8160ggcttcattc agctccggtt cccaacgatc aaggcgagtt acatgatccc ccatgttgtg 8220caaaaaagcg gttagctcct tcggtcctcc gatcgttgtc agaagtaagt tggccgcagt 8280gttatcactc atggttatgg cagcactgca taattctctt actgtcatgc catccgtaag 8340atgcttttct gtgactggtg agtactcaac caagtcattc tgagaatagt gtatgcggcg 8400accgagttgc tcttgcccgg cgtcaatacg ggataatacc gcgccacata gcagaacttt 8460aaaagtgctc atcattggaa aacgttcttc ggggcgaaaa ctctcaagga tcttaccgct 8520gttgagatcc agttcgatgt aacccactcg tgcacccaac tgatcttcag catcttttac 8580tttcaccagc gtttctgggt gagcaaaaac aggaaggcaa aatgccgcaa aaaagggaat 8640aagggcgaca cggaaatgtt gaatactcat actcttcctt tttcaatatt attgaagcat 8700ttatcagggt tattgtctca tgagcggata catatttgaa tgtatttaga aaaataaaca 8760aataggggtt ccgcgcacat ttccccgaaa agtgccacct gacgtctaag aaaccattat 8820tatcatgaca ttaacctata aaaataggcg tatcacgagg ccctttcgtc 88701184857DNAArtificial SequenceSacI-NotI fragment 118gagctcatag cttcaaaatg tttctactcc ttttttactc ttccagattt tctcggactc 60cgcgcatcgc cgtaccactt caaaacaccc aagcacagca tactaaattt cccctctttc 120ttcctctagg gtgtcgttaa ttacccgtac taaaggtttg gaaaagaaaa aagagaccgc 180ctcgtttctt tttcttcgtc gaaaaaggca ataaaaattt ttatcacgtt tctttttctt 240gaaaattttt tttttgattt ttttctcttt cgatgacctc ccattgatat ttaagttaat 300aaacggtctt caatttctca agtttcagtt tcatttttct tgttctatta caactttttt 360tacttcttgc tcattagaaa gaaagcatag caatctaatc taagtttgtc gacatgaaga 420agctcaacaa gtactcgtat atcatcactg aacctaaggg ccaaggtgcg tcccaggcca 480tgctttatgc caccggtttc aagaaggaag atttcaagaa gcctcaagtc ggggttggtt 540cctgttggtg gtccggtaac ccatgtaaca tgcatctatt ggacttgaat aacagatgtt 600ctcaatccat tgaaaaagcg ggtttgaaag ctatgcagtt caacaccatc ggtgtttcag 660acggtatctc tatgggtact aaaggtatga 
gatactcgtt acaaagtaga gaaatcattg 720cagactcctt tgaaaccatc atgatggcac aacactacga tgctaacatc gccatcccat 780catgtgacaa aaacatgccc ggtgtcatga tggccatggg tagacataac agaccttcca 840tcatggtata tggtggtact atcttgcccg gtcatccaac atgtggttct tcgaagatct 900ctaaaaacat cgatatcgtc tctgcgttcc aatcctacgg tgaatatatt tccaagcaat 960tcactgaaga agaaagagaa gatgttgtgg aacatgcatg cccaggtcct ggttcttgtg 1020gtggtatgta tactgccaac acaatggctt ctgccgctga agtgctaggt ttgaccattc 1080caaactcctc ttccttccca gccgtttcca aggagaagtt agctgagtgt gacaacattg 1140gtgaatacat caagaagaca atggaattgg gtattttacc tcgtgatatc ctcacaaaag 1200aggcttttga aaacgccatt acttatgtcg ttgcaaccgg tgggtccact aatgctgttt 1260tgcatttggt ggctgttgct cactctgcgg gtgtcaagtt gtcaccagat gatttccaaa 1320gaatcagtga tactacacca ttgatcggtg acttcaaacc ttctggtaaa tacgtcatgg 1380ccgatttgat taacgttggt ggtacccaat ctgtgattaa gtatctatat gaaaacaaca 1440tgttgcacgg taacacaatg actgttaccg gtgacacttt ggcagaacgt gcaaagaaag 1500caccaagcct acctgaagga caagagatta ttaagccact ctcccaccca atcaaggcca 1560acggtcactt gcaaattctg tacggttcat tggcaccagg tggagctgtg ggtaaaatta 1620ccggtaagga aggtacttac ttcaagggta gagcacgtgt gttcgaagag gaaggtgcct 1680ttattgaagc cttggaaaga ggtgaaatca agaagggtga aaaaaccgtt gttgttatca 1740gatatgaagg tccaagaggt gcaccaggta tgcctgaaat gctaaagcct tcctctgctc 1800tgatgggtta cggtttgggt aaagatgttg cattgttgac tgatggtaga ttctctggtg 1860gttctcacgg gttcttaatc ggccacattg ttcccgaagc cgctgaaggt ggtcctatcg 1920ggttggtcag agacggcgat gagattatca ttgatgctga taataacaag attgacctat 1980tagtctctga taaggaaatg gctcaacgta aacaaagttg ggttgcacct ccacctcgtt 2040acacaagagg tactctatcc aagtatgcta agttggtttc caacgcttcc aacggttgtg 2100ttttagatgc ttgaggatcc agtttatcat tatcaatact cgccatttca aagaatacgt 2160aaataattaa tagtagtgat tttcctaact ttatttagtc aaaaaattag ccttttaatt 2220ctgctgtaac ccgtacatgc ccaaaatagg gggcgggtta cacagaatat ataacatcgt 2280aggtgtctgg gtgaacagtt tattcctggc atccactaaa tataatggag cccgcttttt 2340aagctggcat ccagaaaaaa aaagaatccc agcaccaaaa tattgttttc ttcaccaacc 
2400atcagttcat aggtccattc tcttagcgca actacagaga acaggggcac aaacaggcaa 2460aaaacgggca caacctcaat ggagtgatgc aacctgcctg gagtaaatga tgacacaagg 2520caattgaccc acgcatgtat ctatctcatt ttcttacacc ttctattacc ttctgctctc 2580tctgatttgg aaaaagctga aaaaaaaggt tgaaaccagt tccctgaaat tattccccta 2640cttgactaat aagtatataa agacggtagg tattgattgt aattctgtaa atctatttct 2700taaacttctt aaattctact tttatagtta gtcttttttt tagttttaaa acaccagaac 2760ttagtttcga ctcgagatgg ccaactattt taacacatta aatttgagac aacaattggc 2820tcaactgggt aagtgcagat ttatgggaag ggacgagttt gctgatggtg cttcttatct 2880gcaaggaaag aaagtagtaa ttgttggctg cggtgctcag ggtctaaacc aaggtttaaa 2940catgagagat tcaggtctgg atatttcgta tgcattgagg aaagaggcaa ttgcagaaaa 3000gagggcctcc tggcgtaaag cgacggaaaa tgggttcaaa gttggtactt acgaagaact 3060gatccctcag gcagatttag tgattaacct aacaccagat aaggttcact cagacgtagt 3120aagaacagtt caaccgctga tgaaggatgg ggcagcttta ggttactctc atggctttaa 3180tatcgttgaa gtgggcgagc agatcagaaa agatataaca gtcgtaatgg ttgcaccaaa 3240gtgcccaggt acggaagtca gagaggagta caagaggggt tttggtgtac ctacattgat 3300cgccgtacat cctgaaaatg accccaaagg tgaaggtatg gcaattgcga aggcatgggc 3360agccgcaacc ggaggtcata gagcgggtgt gttagagagt tctttcgtag ctgaggtcaa 3420gagtgactta atgggtgaac aaaccattct gtgcggaatg ttgcaggcag ggtctttact 3480atgctttgat aaattggtcg aagagggtac agatcctgcc tatgctgaaa agttgataca 3540atttggttgg gagacaatca ccgaggcact taaacaaggt ggcataacat tgatgatgga 3600tagactttca aatccggcca agctaagagc ctacgcctta tctgagcaac taaaagagat 3660catggcacca ttattccaaa agcacatgga cgatattatc tccggtgagt tttcctcagg 3720aatgatggca gattgggcaa acgatgataa aaagttattg acgtggagag aagaaaccgg 3780caagacggca ttcgagacag ccccacaata cgaaggtaaa attggtgaac aagaatactt 3840tgataaggga gtattgatga tagctatggt gaaggcaggg gtagaacttg cattcgaaac 3900tatggttgac tccggtatca ttgaagaatc tgcatactat gagtctttgc atgaattgcc 3960tttgatagca aatactattg caagaaaaag actttacgag atgaatgttg tcatatcaga 4020cactgcagaa tatggtaatt acttatttag ctacgcatgt gtcccgttgt taaagccctt 4080catggccgag ttacaacctg gtgatttggg 
gaaggctatt ccggaaggag cggttgacaa 4140tggccaactg agagacgtaa atgaagctat tcgttcacat gctatagaac aggtgggtaa 4200aaagctgaga ggatatatga ccgatatgaa aagaattgca gtggcaggat gaagatccgc 4260ggccgctcga gtcatgtaat tagttatgtc acgcttacat tcacgccctc cccccacatc 4320cgctctaacc gaaaaggaag gagttagaca acctgaagtc taggtcccta tttatttttt 4380tatagttatg ttagtattaa gaacgttatt tatatttcaa atttttcttt tttttctgta 4440cagacgcgtg tacgcatgta acattatact gaaaaccttg cttgagaagg ttttgggacg 4500ctcgaaggct ttaatttgcg gccggtaccc aattcgccct atagtgagtc gtattacgcg 4560cgctcactgg ccgtcgtttt acaacgtcgt gactgggaaa accctggcgt tacccaactt 4620aatcgccttg cagcacatcc ccctttcgcc agctggcgta atagcgaaga ggcccgcacc 4680gatcgccctt cccaacagtt gcgcagcctg aatggcgaat ggcgcgacgc gccctgtagc 4740ggcgcattaa gcgcggcggg tgtggtggtt acgcgcagcg tgaccgctac acttgccagc 4800gccctagcgc ccgctccttt cgctttcttc ccttcctttc tcgccacgtt cgccggc 485711934DNAArtificial SequencePrimer 421 119gccaacggat cctcaagcat ctaaaacaca accg 3412036DNAArtificial SequencePrimer 551 120gctcatgtcg acatgaagaa gctcaacaag tactcg 3612167DNAArtificial SequencePrimer 269 121ctagcatgta cccatacgat gttcctgact atgcgggtgt cgacgaattc ccgggatccg 60cggccgc 6712267DNAArtificial SequencePrimer 270 122tcgagcggcc gcggatcccg ggaattcgtc gacacccgca tagtcaggaa catcgtatgg 60gtacatg 6712335DNAArtificial SequencePrimer 1842 123ttttggatcc ctaccaatcc tggtggactt tatcg 3512437DNAArtificial SequencePrimer 2163 124ttggtagtcg acatggttta cactccatcc aagggtc 3712532DNAArtificial SequencePrimer 2183 125acagtagtcg acatgacaga gcagaaagcc ct 3212634DNAArtificial SequencePrimer 2184 126tacatcggat ccctacataa gaacaccttt ggtg 3412741DNAArtificial SequencePrimer 2195 127ttgttcctcg agatggagga acaggagata ggcgttcctg c 4112842DNAArtificial SequencePrimer 2196 128gttcttgcgg ccgcttattt tggagattct atctggggtt gc 4212940DNAArtificial SequencePrimer 2197 129ttcttggtcg acatgagtgc tctactgtcc gagtctgacc 4013043DNAArtificial SequencePrimer 2198 130ttgttcggat ccttaccagg tgctcccaac agagacgaga tcc 4313142DNAArtificial 
SequencePrimer 2259 131tcagtaagat ctatgactga gatactacca catgtaaacg ac 4213244DNAArtificial SequencePrimer 2260 132catatcctcg aggtacccta tacatccccc acagcatctc gcag 441337685DNAArtificial SequencepGV2074 133ttggatcata ctaagaaacc attattatca tgacattaac ctataaaaat aggcgtatca 60cgaggccctt tcgtctcgcg cgtttcggtg atgacggtga aaacctctga cacatgcagc 120tcccggagac ggtcacagct tgtctgtaag cggatgccgg gagcagacaa gcccgtcagg 180gcgcgtcagc gggtgttggc gggtgtcggg gctggcttaa ctatgcggca tcagagcaga 240ttgtactgag agtgcaccat accacagctt ttcaattcaa ttcatcattt tttttttatt 300cttttttttg atttcggttt ctttgaaatt tttttgattc ggtaatctcc gaacagaagg 360aagaacgaag gaaggagcac agacttagat tggtatatat acgcatatgg caaattaaag 420ccttcgagcg tcccaaaacc ttctcaagca aggttttcag tataatgtta catgcgtaca 480cgcgtctgta cagaaaaaaa agaaaaattt gaaatataaa taacgttctt aatactaaca 540taactataaa aaaataaata gggacctaga cttcaggttg tctaactcct tccttttcgg 600ttagagcgga tgtgggggga gggcgtgaat gtaagcgtga cataagaatt cttattcctt 660tgccctcgga cgagtgctgg ggcgtcggtt tccactatcg gcgagtactt ctacacagcc 720atcggtccag acggccgcgc ttctgcgggc gatttgtgta cgcccgacag tcccggctcc 780ggatcggacg attgcgtcgc atcgaccctg cgcccaagct gcatcatcga aattgccgtc 840aaccaagctc tgatagagtt ggtcaagacc aatgcggagc atatacgccc ggaggcgcgg 900cgatcctgca agctccggat gcctccgctc gaagtagcgc gtctgctgct ccatacaagc 960caaccacggc ctccagaaga ggatgttggc gacctcgtat tgggaatccc cgaacatcgc 1020ctcgctccag tcaatgaccg ctgttatgcg gccattgtcc gtcaggacat tgttggagcc 1080gaaatccgca tgcacgaggt gccggacttc ggggcagtcc tcggcccaaa gcatcagctc 1140atcgagagcc tgcgcgacgg acgcactgac ggtgtcgtcc atcacagttt gccagtgata 1200cacatgggga tcagcaatcg cgcatatgaa atcacgccat gtagtgtatt gaccgattcc 1260ttgcggtccg aatgggccga acccgctcgt ctggctaaga tcggccgcag cgatcgcatc 1320catggcctcc gcgaccggct ggagaacagc gggcagttcg gtttcaggca ggtcttgcaa 1380cgtgacaccc tgtgcacggc gggagatgca ataggtcagg ctctcgctga actccccaat 1440gtcaagcact tccggaatcg ggagcgcggc cgatgcaaag tgccgataaa cataacgatc 1500tttgtagaaa ccatcggcgc agctatttac ccgcaggaca tatccacgcc 
ctcctacatc 1560gaagctgaaa gcacgagatt cttcgccctc cgagagctgc atcaggtcgg agacgctgtc 1620gaacttttcg atcagaaact tctcgacaga cgtcgcggtg agttcaggct ttttacccat 1680actagttttt agtttatgta tgtgtttttt gtagttatag atttaagcaa gaaaagaata 1740caaacaaaaa attgaaaaag attgatttag aattaaaaag aaaaatattt acgtaagaag 1800ggaaaatagt aaatgttgca agttcactaa actcctaaat tatgctgccc tttatattcc 1860ctgttacagc agccgagcca aaggtatata ggctcctttg cattagcatg cgtaacaaac 1920cacctgtcag tttcaaccga ggtggtatcc gagagaattg tgtgattgct ttaattaatt 1980tcggagaatc tcacatgcca ctgaagatta aaaactggat gccagaaaag gggtgtccag 2040gtgtaacatc aatagaggaa gctgaaaagt cttagaacgg gtaatcttcc accaacctga 2100tgggttccta gatataatct cgaagggaat aagtagggtg ataccgcaga agtgtctgaa 2160tgtattaagg tcctcacagt ttaaatcccg ctcacactaa cgtaggatta ttataactca 2220aaaaaatggc attattctaa gtaagttaaa tatccgtaat ctttaaacag cggccgcaga 2280tctctcgagt cgaaactaag ttctggtgtt ttaaaactaa aaaaaagact aactataaaa 2340gtagaattta agaagtttaa gaaatagatt tacagaatta caatcaatac ctaccgtctt 2400tatatactta ttagtcaagt aggggaataa tttcagggaa ctggtttcaa cctttttttt 2460cagctttttc
caaatcagag agagcagaag gtaatagaag gtgtaagaaa atgagataga 2520tacatgcgtg ggtcaattgc cttgtgtcat catttactcc aggcaggttg catcactcca 2580ttgaggttgt gcccgttttt tgcctgtttg tgcccctgtt ctctgtagtt gcgctaagag 2640aatggaccta tgaactgatg gttggtgaag aaaacaatat tttggtgctg ggattctttt 2700tttttctgga tgccagctta aaaagcgggc tccattatat ttagtggatg ccaggaataa 2760actgttcacc cagacaccta cgatgttata tattctgtgt aacccgcccc ctattttggg 2820catgtacggg ttacagcaga attaaaaggc taattttttg actaaataaa gttaggaaaa 2880tcactactat taattattta cgtattcttt gaaatggcga gtattgataa tgataaactg 2940gatccgtcga caaacttaga ttagattgct atgctttctt tctaatgagc aagaagtaaa 3000aaaagttgta atagaacaag aaaaatgaaa ctgaaacttg agaaattgaa gaccgtttat 3060taacttaaat atcaatggga ggtcatcgaa agagaaaaaa atcaaaaaaa aaattttcaa 3120gaaaaagaaa cgtgataaaa atttttattg cctttttcga cgaagaaaaa gaaacgaggc 3180ggtctctttt ttcttttcca aacctttagt acgggtaatt aacgacaccc tagaggaaga 3240aagaggggaa atttagtatg ctgtgcttgg gtgttttgaa gtggtacggc gatgcgcgga 3300gtccgagaaa atctggaaga gtaaaaaagg agtagaaaca ttttgaagct atgagctcag 3360atctgttaac cttgttttat atttgttgta aaaagtagat aattacttcc ttgatgatct 3420gtaaaaaaga gaaaaagaaa gcatctaaga acttgaaaaa ctacgaatta gaaaagacca 3480aatatgtatt tcttgcattg accaatttat gcaagtttat atatatgtaa atgtaagttt 3540cacgaggttc tactaaacta aaccaccccc ttggttagaa gaaaagagtg tgtgagaaca 3600ggctgttgtt gtcacacgat tcggacaatt ctgtttgaaa gagagagagt aacagtacga 3660tcgaacgaac tttgctctgg agatcacagt gggcatcata gcatgtggta ctaaaccctt 3720tcccgccatt ccagaacctt cgattgcttg ttacaaaacc tgtgagccgt cgctaggacc 3780ttgttgtgtg acgaaattgg aagctgcaat caataggaag acaggaagtc gagcgtgtct 3840gggttttttc agttttgttc tttttgcaaa caaatcacga gcgacggtaa tttctttctc 3900gataagaggc cacgtgcttt atgagggtaa catcaattca agaaggaggg aaacacttcc 3960tttttctggc cctgataata gtatgagggt gaagccaaaa taaaggattc gcgcccaaat 4020cggcatcttt aaatgcaggt atgcgatagt tcctcactct ttccttactc acgagtaatt 4080cttgcaaatg cctattatgc agatgttata atatctgtgc gtcttgagtt gagcctaggg 4140agctccagct tttgttccct ttagtgaggg ttaattgcgc 
gcttggcgta atcatggtca 4200tagctgtttc ctgtgtgaaa ttgttatccg ctcacaattc cacacaacat aggagccgga 4260agcataaagt gtaaagcctg gggtgcctaa tgagtgaggt aactcacatt aattgcgttg 4320cgctcactgc ccgctttcca gtcgggaaac ctgtcgtgcc agctgcatta atgaatcggc 4380caacgcgcgg ggagaggcgg tttgcgtatt gggcgctctt ccgcttcctc gctcactgac 4440tcgctgcgct cggtcgttcg gctgcggcga gcggtatcag ctcactcaaa ggcggtaata 4500cggttatcca cagaatcagg ggataacgca ggaaagaaca tgtgagcaaa aggccagcaa 4560aaggccagga accgtaaaaa ggccgcgttg ctggcgtttt tccataggct ccgcccccct 4620gacgagcatc acaaaaatcg acgctcaagt cagaggtggc gaaacccgac aggactataa 4680agataccagg cgtttccccc tggaagctcc ctcgtgcgct ctcctgttcc gaccctgccg 4740cttaccggat acctgtccgc ctttctccct tcgggaagcg tggcgctttc tcatagctca 4800cgctgtaggt atctcagttc ggtgtaggtc gttcgctcca agctgggctg tgtgcacgaa 4860ccccccgttc agcccgaccg ctgcgcctta tccggtaact atcgtcttga gtccaacccg 4920gtaagacacg acttatcgcc actggcagca gccactggta acaggattag cagagcgagg 4980tatgtaggcg gtgctacaga gttcttgaag tggtggccta actacggcta cactagaagg 5040acagtatttg gtatctgcgc tctgctgaag ccagttacct tcggaaaaag agttggtagc 5100tcttgatccg gcaaacaaac caccgctggt agcggtggtt tttttgtttg caagcagcag 5160attacgcgca gaaaaaaagg atctcaagaa gatcctttga tcttttctac ggggtctgac 5220gctcagtgga acgaaaactc acgttaaggg attttggtca tgagattatc aaaaaggatc 5280ttcacctaga tccttttaaa ttaaaaatga agttttaaat caatctaaag tatatatgag 5340taaacttggt ctgacagtta ccaatgctta atcagtgagg cacctatctc agcgatctgt 5400ctatttcgtt catccatagt tgcctgactc cccgtcgtgt agataactac gatacgggag 5460ggcttaccat ctggccccag tgctgcaatg ataccgcgag acccacgctc accggctcca 5520gatttatcag caataaacca gccagccgga agggccgagc gcagaagtgg tcctgcaact 5580ttatccgcct ccatccagtc tattaattgt tgccgggaag ctagagtaag tagttcgcca 5640gttaatagtt tgcgcaacgt tgttgccatt gctacaggca tcgtggtgtc acgctcgtcg 5700tttggtatgg cttcattcag ctccggttcc caacgatcaa ggcgagttac atgatccccc 5760atgttgtgca aaaaagcggt tagctccttc ggtcctccga tcgttgtcag aagtaagttg 5820gccgcagtgt tatcactcat ggttatggca gcactgcata attctcttac tgtcatgcca 5880tccgtaagat 
gcttttctgt gactggtgag tactcaacca agtcattctg agaatagtgt 5940atgcggcgac cgagttgctc ttgcccggcg tcaatacggg ataataccgc gccacatagc 6000agaactttaa aagtgctcat cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatc 6060ttaccgctgt tgagatccag ttcgatgtaa cccactcgtg cacccaactg atcttcagca 6120tcttttactt tcaccagcgt ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaa 6180aagggaataa gggcgacacg gaaatgttga atactcatac tcttcctttt tcaatattat 6240tgaagcattt atcagggtta ttgtctcatg agcggataca tatttgaatg tatttagaaa 6300aataaacaaa taggggttcc gcgcacattt ccccgaaaag tgccacctga acgaagcatc 6360tgtgcttcat tttgtagaac aaaaatgcaa cgcgagagcg ctaatttttc aaacaaagaa 6420tctgagctgc atttttacag aacagaaatg caacgcgaaa gcgctatttt accaacgaag 6480aatctgtgct tcatttttgt aaaacaaaaa tgcaacgcga gagcgctaat ttttcaaaca 6540aagaatctga gctgcatttt tacagaacag aaatgcaacg cgagagcgct attttaccaa 6600caaagaatct atacttcttt tttgttctac aaaaatgcat cccgagagcg ctatttttct 6660aacaaagcat cttagattac tttttttctc ctttgtgcgc tctataatgc agtctcttga 6720taactttttg cactgtaggt ccgttaaggt tagaagaagg ctactttggt gtctattttc 6780tcttccataa aaaaagcctg actccacttc ccgcgtttac tgattactag cgaagctgcg 6840ggtgcatttt ttcaagataa aggcatcccc gattatattc tataccgatg tggattgcgc 6900atactttgtg aacagaaagt gatagcgttg atgattcttc attggtcaga aaattatgaa 6960cggtttcttc tattttgtct ctatatacta cgtataggaa atgtttacat tttcgtattg 7020ttttcgattc actctatgaa tagttcttac tacaattttt ttgtctaaag agtaatacta 7080gagataaaca taaaaaatgt agaggtcgag tttagatgca agttcaagga gcgaaaggtg 7140gatgggtagg ttatataggg atatagcaca gagatatata gcaaagagat acttttgagc 7200aatgtttgtg gaagcggtat tcgcaatatt ttagtagctc gttacagtcc ggtgcgtttt 7260tggttttttg aaagtgcgtc ttcagagcgc ttttggtttt caaaagcgct ctgaagttcc 7320tatactttct agagaatagg aacttcggaa taggaacttc aaagcgtttc cgaaaacgag 7380cgcttccgaa aatgcaacgc gagctgcgca catacagctc actgttcacg tcgcacctat 7440atctgcgtgt tgcctgtata tatatataca tgagaagaac ggcatagtgc gtgtttatgc 7500ttaaatgcgt acttatatgc gtctatttat gtaggatgaa aggtagtcta gtacctcctg 7560tgatattatc ccattccatg cggggtatcg tatgcttcct 
tcagcactac cctttagctg 7620ttctatatgc tgccactcct caattggatt agtctcatcc ttcaatgcta tcatttcctt 7680tgata 768513424DNAArtificial SequencePrimer 587 134ccaatgcaga ccgatcttct accc 2413522DNAArtificial SequencePrimer 588 135gatcacgtga tctgttgtat tg 2213620DNAArtificial SequencePrimer 2167 136tacatggggt acttctcctc 2013760DNAArtificial SequencePrimer 2170 137cagtcaacaa atataaagaa tattgaaatt gacagttttt gtcgctatcg atttttatta 6013860DNAArtificial SequencePrimer 2171 138ttttgtcgct atcgattttt attatttgct gttttaaatc attctggttc tatcgaggag 6013960DNAArtificial SequencePrimer 2172 139catgttattg acgccaggtt tggacgttgt ttttcactgt atccggatgt gaagtcgttg 6014060DNAArtificial SequencePrimer 2173 140tggttttaga aaaggatggt gtgcttgtcg ctgagacaca tgttattgac gccaggtttg 6014120DNAArtificial SequencePrimer 2175 141tctagttcag agcttggtgc 2014220DNAArtificial SequencePrimer 2226 142tgctccattt ggaagtctcg 2014320DNAArtificial SequencePrimer 2227 143tatctacgaa gtgacctgcg 201441716DNAArtificial SequenceB. subtilis alsS codon-optimized for expression in S. 
cerevisiae 144atgttgacta aagctacaaa agagcagaaa tcattggtga aaaatagggg tgcagaactt 60gttgtggact gtttggtaga acagggcgta acacatgttt ttggtatccc aggtgcaaaa 120atcgacgccg tgtttgatgc attacaagac aagggtccag aaattattgt tgctagacat 180gagcaaaatg ccgcatttat ggcgcaagct gtaggtaggc ttacaggtaa acctggtgtt 240gtcctagtta cgtctggccc aggagcctcc aatttagcaa ctggtctatt gacagctaat 300actgagggag atcctgtagt tgcgttagcc ggtaatgtaa ttagagctga taggcttaag 360agaactcacc agtctctaga caacgctgct ttattccaac cgatcaccaa gtactcagta 420gaggtacaag acgtaaagaa tatacctgaa gctgtgacaa acgcatttcg tatagcttct 480gctggtcagg ctggtgccgc gtttgtttct tttcctcaag acgttgtcaa tgaagtgacc 540aatactaaaa acgttagagc ggttgcagcc cctaaactag gtccagccgc agacgacgca 600attagcgctg caattgctaa aattcagacg gcgaaactac cagtagtcct tgtcggtatg 660aagggcggaa gaccagaagc aataaaagct gttcgtaagt tattgaagaa agtccaatta 720cctttcgttg agacttacca agcagcaggt actttatcta gagatttaga ggatcagtat 780tttggaagga taggtctatt tagaaaccaa ccaggagatt tactattaga acaagctgat 840gttgtactta ctatcggtta tgatcctata gagtatgacc caaagttttg gaacataaat 900ggggatagaa caattataca tctagacgag ataatcgccg acatcgatca cgcttatcaa 960ccagatttag aactaatcgg agatatcccg tcaacaatca atcatattga acatgatgct 1020gtaaaggttg agttcgctga acgtgagcag aaaatcttat ctgatctaaa gcaatatatg 1080catgagggtg aacaagttcc agcagactgg aaatctgacc gtgcacatcc tttggaaatc 1140gttaaggaac taagaaatgc ggtcgatgat catgtgactg ttacatgtga tatcggttca 1200catgcaattt ggatgtcacg ttattttagg agctacgaac cattaacttt aatgatatct 1260aacgggatgc aaactctggg ggttgcactt ccttgggcta ttggcgctag tttagttaag 1320cccggtgaga aggtggtatc ggtatcaggt gatggtggct ttctgttttc ggctatggaa 1380ttagaaactg cagtccgttt aaaagctccc attgtgcata ttgtctggaa tgattctact 1440tacgacatgg ttgcttttca acagttgaag aaatacaata gaacttcggc tgtagacttt 1500ggtaacatcg atattgtgaa atatgctgag tcttttggcg caacaggcct gagggtggaa 1560agtccagatc agttagctga tgtgttgaga caagggatga atgccgaggg accggtaatc 1620atagatgtgc cagttgacta ctcagacaat attaatttgg cttctgataa acttcctaaa 1680gagtttggcg agctaatgaa gaccaaagcc ttataa 
171614511PRTThermotoga petrophila 145Pro Tyr His Lys Glu Gly Gly Leu Gly Ile Leu 1 5 10 14611PRTVictivallis vadensis 146Pro Tyr Ser Glu Lys Gly Gly Leu Ala Ile Leu 1 5 10 14711PRTUnknownTermite Group 1 Bacterium Phylotype Rs-D17 147Pro Tyr Lys Pro Glu Gly Gly Ile Ala Ile Leu 1 5 10 14811PRTYarrowia lipolytica 148Pro Leu Lys Pro Ser Gly His Leu Gln Ile Leu 1 5 10 14911PRTFrancisella tularensis 149Pro Ile Lys Lys Thr Gly His Leu Gln Ile Leu 1 5 10 15011PRTArabidopsis thaliana 150Pro Ile Lys Glu Thr Gly His Ile Gln Ile Leu 1 5 10 1511647DNAArtificial SequenceL. lactis kivD codon-optimized for expression in S. cerevisiae 151atgtatacag tgggtgatta cttgctagac cgtttacatg aattaggcat agaagagatt 60tttggagtac caggtgatta caatttgcaa ttcttggatc agattatctc acataaggat 120atgaaatggg tcgggaatgc gaacgagtta aacgcttcct atatggcaga tggttatgcg 180agaaccaaaa aggccgctgc ttttcttaca actttcggtg taggtgaact ttcagccgtt 240aatggattag ccggatctta cgctgaaaac ttaccagtcg ttgaaattgt tggttctcct 300acttctaagg tacaaaacga gggaaagttt gttcaccata ctttagcgga tggtgatttc 360aaacatttca tgaagatgca tgaacctgtc acagcagcga gaacactttt gaccgcagag 420aacgctactg ttgaaatcga tagagtatta agtgctttgt taaaagagag aaagccagtg 480tatatcaact tgcctgttga tgtcgctgcc gcaaaagcag aaaaaccatc tttaccattg 540aaaaaggaaa actcaaccag taacacatct gatcaagaga ttctaaacaa aatccaagag 600tcattgaaaa acgccaaaaa gccaatcgtt ataacaggcc atgaaatcat ctcctttggt 660ttagaaaaga ccgttacaca attcatctct aagacaaaat tgccaatcac tactttaaac 720tttggcaaat cttctgtaga tgaagcttta ccatcttttc taggtatcta caatggtact 780ctttctgaac caaacctaaa agagtttgtc gaatccgctg atttcatact tatgttgggt 840gttaagctaa ctgatagttc aactggtgca ttcactcacc acttgaacga aaacaagatg 900atatcattga atatcgatga gggcaagata ttcaacgaac gtattcaaaa cttcgatttt 960gaatcactaa tttcttctct acttgatttg tcagaaatag aatacaaagg aaaatacatc 1020gacaaaaagc aagaggattt cgtgccatca aacgcattgt tgtcacaaga tagactttgg 1080caagctgtgg aaaacctaac tcaatctaac gaaacaattg tggctgagca aggcacatca 1140ttcttcggcg catctagtat tttcttgaag agtaagtccc acttcatcgg 
tcaacctctt 1200tggggatcta ttgggtacac cttccctgcc gcattgggat cacagatcgc agacaaggag 1260tccagacatc tactattcat tggggacgga tcattgcaac ttaccgttca ggaattaggg 1320ttggctataa gagaaaagat caatcctatt tgttttatca tcaacaatga tggctataca 1380gtagaacgtg agattcacgg tccaaatcaa agttacaatg acatcccaat gtggaattac 1440tctaagttgc ctgaatcttt cggtgcaaca gaagatagag ttgtttccaa aatagtcaga 1500acagaaaatg agtttgtttc tgtcatgaag gaagctcaag ccgatcctaa tagaatgtac 1560tggattgaac taatcttggc caaggaagga gccccaaaag tattgaaaaa gatgggaaaa 1620cttttcgctg aacagaataa gtcctga 16471521023DNAArtificial SequenceL. lactis alcohol dehydrogenase (adhA) RE1 152atgaaagcag cagtagtaag acacaatcca gatggttatg cggaccttgt tgaaaaggaa 60cttcgagcaa tcaaacctaa tgaagctttg cttgacatgg agtattgtgg agtctgtcat 120accgatttgc acgttgcagc aggtgatttt ggcaacaaag cagggactgt tcttggtcat 180gaaggaattg gaattgtcaa agaaattgga gctgatgtaa gctcgcttca agttggtgat 240cgggtttcag tggcttggtt ctttgaagga tgtggtcact gtgaatactg tgtatctggt 300aatgaaactt tttgtcgaga agttaaaaat gcaggatatt cagttgatgg cggaatggct 360gaagaagcaa ttgttgttgc cgattatgct gtcaaagttc ctgacggact tgacccaatt 420gaagctagct caattacttg tgctggagta acaacttaca aagcaatcaa agtatcagga 480gtaaaacctg gtgattggca agtaattttt ggtgctggag gacttggaaa tttagcaatt 540caatatgcta aaaatgtttt tggagcaaaa gtaattgctg ttgatattaa tcaagataaa 600ttaaatttag ctaaaaaaat tggagctgat gtgactatca attctggtga tgtaaatcca 660gttgatgaaa ttaaaaaaat aactggcggc ttaggggtgc aaagtgcaat agtttgtgct 720gttgcaagga ttgcttttga acaagcggtt gcttctttga aacctatggg caaaatggtt 780gctgtggcag ttcccaatac tgagatgact ttatcagttc caacagttgt ttttgacgga 840gtggaggttg caggttcact tgtcggaaca agacttgact tggcagaagc ttttcaattt 900ggagcagaag gtaaggtaaa accaattgtt gcgacacgca aactggaaga aatcaatgat 960attattgatg aaatgaaggc aggaaaaatt gaaggccgaa tggtcattga ttttactaaa 1020taa 10231532073DNASaccharomyces cerevisiae 153atggaaggct tcaatccggc tgacatagaa catgcgtcac cgattaattc atctgacagc 60cattcatcct cctttgtata tgctctaccc aaaagtgcta gtgaatatgt agtcaaccat 120aatgagggtc gtgcaagtgc 
aagtggaaat ccagccgcag tgccgtctcc cataatgaca 180ctgaatctca aaagcacaca ttccctcaat attgatcagc atgttcatac ctcaacatcg 240ccgacggaaa ctattgggca tattcatcat gtggaaaagc tgaatcaaaa caatttgatt 300catctggatc cagtacccaa ctttgaagat aagtccgata ttaagccttg gttgcaaaag 360attttttatc ctcaaggaat agaacttgtg atagaaaggt cggacgcatt taaagttgtc 420ttcaagtgta aagctgctaa aaggggaagg aacgcgagaa ggaaaagaaa agataagccc 480aaaggacagg accacgaaga cgagaaatcc aagatcaatg atgacgaatt agaatatgcg 540agtccttcta atgccacagt aaccaatggg cctcaaacat cgcccgatca aacatcctcc 600ataaagccaa agaaaaaaag atgtgtatcg aggtttaata actgtccgtt tagagtacga 660gctacttatt cgttaaagag gaaaagatgg agcattgttg taatggacaa taaccattca 720catcagctaa agtttaaccc tgattccgaa gagtacaaaa aattcaaaga aaaattaaga 780aaggataatg acgtagatgc aatcaagaaa ttcgacgaat tggaatacag aactttggcc 840aatttgccca ttccaacagc tacaatcccc tgtgattgtg gtttaacaaa tgaaatacaa 900agtttcaatg tcgtattgcc cactaacagt aatgttactt catcagcatc ctcttcaact 960gtatcgtcca tatcccttga ttcatcgaat gcatctaaaa ggccatgctt accctctgta 1020aataacaccg gtagtatcaa taccaataac gtaaggaaac cgaaaagcca gtgtaagaat 1080aaagacacac tcttaaaaag aaccaccatg cagaactttc tcacaactaa atcaaggctg 1140cgtaagaccg gtacgccaac atcttcgcaa cactcatcta cagcattttc aggatatatt 1200gatgatcctt tcaatttgaa tgaaatcttg ccactgccgg catccgattt caagctaaac 1260actgtaacaa atttgaacga aattgacttt acgaacattt ttaccaaatc gccgcatcca 1320catagcgggt ctacccatcc aagacaagtc ttcgaccaat tggacgattg ttcctctata 1380ctcttctctc cattaactac aaacacgaat aatgaatttg aaggagagtc agatgatttt 1440gttcattctc catatttgaa ctcagaggca gatttcagcc aaattcttag tagtgctccc 1500ccagtccatc atgacccaaa tgaaacacat caggaaaacc aggatattat tgatagattt 1560gctaatagtt cccaagaaca taatgagtat attctacaat atttgacgca ctccgatgct 1620gctaaccaca ataacatcgg cgttccaaac aacaattcac attcgctaaa tactcagcat 1680aacgtttctg atctgggcaa ctcactttta agacaagaag ctttagttgg cagctcttca 1740acaaaaatct tcgacgaatt gaaatttgta caaaatggcc cacacggttc tcaacatcct 1800atagattttc aacatgttga ccatcgtcat ctcagctcta atgaacctca agtacgatca 
1860catcaatatg gtccgcaaca gcagccaccg cagcaattgc aatatcacca aaatcagccc 1920cacgacggcc ataaccacga acagcaccaa acagtacaaa aggatatgca aacgcatgaa 1980tcgctagaaa taatgggaaa cacattattg gaagagttca aagacattaa aatggtgaac 2040ggcgagttga agtatgtgaa gccagaagat tag 20731541494DNAArtificial SequenceE. coli ketol-acid reductoisomerase P2D1-A1 154atggccaact attttaacac attaaatttg agacaacaat tggctcaact gggtaagtgc 60agatttatgg gaagggacga gtttgctgat ggtgcttctt atctgcaagg aaagaaagta 120gtaattgttg gctgcggtgc tcagggtcta aaccaaggtt taaacatgag agattcaggt 180ctggatattt cgtatgcatt gaggaaagag tctattgcag aaaaggatgc cgattggcgt 240aaagcgacgg aaaatgggtt caaagttggt acttacgaag aactgatccc tcaggcagat 300ttagtgatta acctaacacc agataaggtt cactcagacg tagtaagaac agttcaaccg 360ctgatgaagg atggggcagc tttaggttac tctcatggct ttaatatcgt tgaagtgggc 420gagcagatca gaaaaggtat aacagtcgta atggttgcgc caaagtgccc aggtacggaa 480gtcagagagg agtacaagag gggttttggt gtacctacat tgatcgccgt acatcctgaa 540aatgacccca aacgtgaagg tatggcaata gcgaaggcat gggcagccgc aaccggaggt 600catagagcgg gtgtgttaga gagttctttc gtagctgagg tcaagagtga cttaatgggt 660gaacaaacca ttctgtgcgg aatgttgcag gcagggtctt tactatgctt tgataaattg 720gtcgaagagg gtacagatcc tgcctatgct gaaaagttga tacaatttgg ttgggagaca 780atcaccgagg cacttaaaca aggtggcata acattgatga tggatagact ttcaaatccg 840gccaagctaa gagcctacgc cttatctgag caactaaaag agatcatggc accattattc 900caaaagcaca tggacgatat tatctccggt gagttttcct caggaatgat ggcagattgg 960gcaaacgatg ataaaaagtt attgacgtgg agagaagaaa ccggcaagac ggcattcgag 1020acagccccac aatacgaagg taaaattggt gaacaagaat actttgataa gggagtattg 1080atgatagcta tggtgaaggc aggggtagaa cttgcattcg aaactatggt tgactccggt 1140atcattgaag aatctgcata ctatgagtct ttgcatgaat tgcctttgat
agcaaatact 1200attgcaagaa aaagacttta cgagatgaat gttgtcatat cagacactgc agaatatggt 1260aattacttat ttagctacgc gtgtgtcccg ttgttagagc ccttcatggc cgagttacaa 1320cctggtgatt tggggaaggc tattccggaa ggagcggttg acaatggcca actgagagac 1380gtaaatgaag ctattcgttc gcatgctata gaacaggtgg gtaaaaagct gagaggatat 1440atgaccgata tgaaaagaat tgcagtggca ggacaccacc accaccacca ctaa 14941551713DNAArtificial SequenceL. lactis ilvD codon-optimized for expression in S. cerevisiae 155atggagttta agtataacgg caaagttgaa tctgttgaac tgaataagta cagcaaaacg 60ttgacacaag atcccacaca acccgccaca caggcaatgt attacggcat cgggtttaaa 120gacgaagatt tcaagaaagc tcaagtgggt atagtgtcga tggactggga tggaaatcca 180tgcaacatgc atttaggaac ccttggatca aagattaaaa gctcagtaaa tcagacagat 240ggtctgatcg gcttacaatt tcatacgata ggagtttctg atgggatagc aaatggaaag 300ttgggaatga gatactccct tgtttccaga gaagttatag ctgactctat tgaaaccaac 360gctggcgctg aatactatga tgcaattgta gccatcccag gttgtgacaa aaatatgcca 420ggttctatta ttggtatggc aagacttaat aggccaagca ttatggtgta tggaggaaca 480atagaacacg gtgaatataa aggtgagaaa ttgaacatcg tatcggcttt tgaatctcta 540ggccagaaaa ttaccggcaa tatctctgat gaagattatc acggtgttat ttgtaatgct 600attcctggtc aaggggcatg tggggggatg tacacagcta ataccttagc tgccgctatc 660gaaacactag gtatgtcatt gccgtattct tcttcgaacc ctgcagtatc tcaagaaaaa 720caagaagaat gtgatgagat tggattagcc attaagaatc ttttggaaaa agacatcaag 780cctagtgata taatgactaa ggaggcgttc gagaacgcta ttaccattgt gatggtcttg 840gggggtagta ctaatgctgt cttgcatatt attgcaatgg ctaacgcgat aggtgtcgaa 900ataactcagg atgacttcca aagaattagt gacattactc cagtactagg tgattttaaa 960ccttcaggta aatatatgat ggaagatttg cataaaattg gaggcttgcc agcagtgctt 1020aagtaccttc taaaggaagg aaaattgcat ggtgactgcc ttactgtgac gggtaaaaca 1080ttagccgaga atgtcgagac tgccctagac ttggatttcg actcacaaga tatcatgagg 1140ccactaaaga atcctatcaa ggccaccggc cacttgcaga ttctgtacgg taatttagct 1200caagggggtt ccgtagcaaa aattagcggt aaagaaggag agttcttcaa aggcactgcc 1260agagtctttg atggtgaaca acattttatc gacggcatag aatctggtcg tttgcatgct 1320ggagatgtag cggtaattag 
gaatataggt cccgtcggcg gacctggtat gcccgaaatg 1380ctgaagccta catcagcatt aattggtgcg ggtttaggga aaagttgcgc gttaattacg 1440gatggtagat tctccggtgg cactcacggt tttgttgtcg gccatattgt gcctgaagcc 1500gttgagggtg gactaatcgg cttagttgaa gatgacgata taatagagat agatgcagtc 1560aacaactcta tatccctgaa agtttccgat gaagaaatcg caaagagaag agctaattat 1620cagaagccaa ctccgaaagc caccagggga gttttggcaa aattcgctaa attaacccgt 1680cctgcatcgg aagggtgtgt tactgatctg taa 17131561683DNAArtificial SequenceF. tularensis ilvD codon-optimized for expression in S. cerevisiae 156atgaaaaagg tgctgaataa gtactcaaga cgtcttaccg aagataagtc tcaaggtgct 60tctcaggcta tgctatacgg aacagagatg aatgatgcag atatgcacaa gcctcaaatc 120ggtatcggtt ccgtttggta tgaaggaaat acttgtaata tgcatttgaa tcaattagca 180caatttgtca aggattctgt tgaaaaggaa aacttgaaag gcatgagatt caacacaatt 240ggagtttctg atggtatctc catgggtact gatggcatgt cctactctct acaatcacgt 300gatctaatcg ctgattcaat cgaaacagtt atgagtgcac actggtatga tggcctagtt 360tcaatcccag gttgtgacaa aaacatgcca ggttgcatga tggcccttgg tagattaaac 420agaccaggtt tcgtgatcta cggtggaacc atacaagctg gcgttatgag aggcaaacct 480attgatattg tcacagcttt ccaatcatat ggagcatgct tatctgggca aataactgaa 540caggaaagac aagagactat caaaaaggct tgtccaggtg caggagcctg tggcggcatg 600tacacagcta acacaatggc ctgtgccatt gaggcccttg gaatgagttt gcctttttcc 660tcttctactt ctgcaacttc agttgaaaag gtacaagagt gtgataaggc aggcgaaaca 720atcaaaaact tgttagaatt ggacattaaa ccaagagaca tcatgactag aaaagctttc 780gaaaacgcta tggtactaat tacagtaatg ggaggttcaa caaatgccgt gttacatctg 840ttagcaatgg cttcatccgt cgatgtagat ttgagtatcg atgactttca ggaaatagct 900aacaaaactc cagtgctggc tgatttcaag ccatccggga aatatgtcat ggcaaacttg 960catgcaattg gcgggactcc tgcagttatg aaaatgttgc tgaaggccgg aatgcttcat 1020ggcgattgtt tgactgtaac tgggaaaacc ttagccgaaa acttggaaaa tgtggccgac 1080ctgccagaag ataacacaat catacacaaa ctagataacc caatcaaaaa gactggtcat 1140ttgcaaatct tgaaggggaa tgttgcccca gaaggttctg ttgctaagat aacagggaag 1200gaaggtgaga tattcgaggg cgtagccaat gtctttgatt cagaggaaga gatggttgcc 
1260gcagtcgaaa ctggaaaagt caaaaagggc gatgttattg ttattagata cgaaggtcct 1320aaaggtggcc ctggcatgcc tgaaatgctt aagccaacct ctttgataat gggtgctgga 1380ctaggccagg atgttgcatt aatcacagat ggcagatttt caggtggtag tcatggtttc 1440attgtaggtc acattacacc agaagcatac gaaggcggta tgatcgcctt attagaaaac 1500ggtgataaga taacaatcga tgctatcaac aatgtgataa atgtagactt aagtgatcaa 1560gagattgctc aacgtaaatc taagtggaga gcatcaaagc aaaaagcttc cagaggtaca 1620ctgaaaaagt acattaagac cgtctcttct gcttctaccg ggtgcgtgac tgatttggat 1680tga 16831571704DNASaccharomyces cerevisiae 157atggcaaaga agctcaacaa gtactcgtat atcatcactg aacctaaggg ccaaggtgcg 60tcccaggcca tgctttatgc caccggtttc aagaaggaag atttcaagaa gcctcaagtc 120ggggttggtt cctgttggtg gtccggtaac ccatgtaaca tgcatctatt ggacttgaat 180aacagatgtt ctcaatccat tgaaaaagcg ggtttgaaag ctatgcagtt caacaccatc 240ggtgtttcag acggtatctc tatgggtact aaaggtatga gatactcgtt acaaagtaga 300gaaatcattg cagactcctt tgaaaccatc atgatggcac aacactacga tgctaacatc 360gccatcccat catgtgacaa aaacatgccc ggtgtcatga tggccatggg tagacataac 420agaccttcca tcatggtata tggtggtact atcttgcccg gtcatccaac atgtggttct 480tcgaagatct ctaaaaacat cgatatcgtc tctgcgttcc aatcctacgg tgaatatatt 540tccaagcaat tcactgaaga agaaagagaa gatgttgtgg aacatgcatg cccaggtcct 600ggttcttgtg gtggtatgta tactgccaac acaatggctt ctgccgctga agtgctaggt 660ttgaccattc caaactcctc ttccttccca gccgtttcca aggagaagtt agctgagtgt 720gacaacattg gtgaatacat caagaagaca atggaattgg gtattttacc tcgtgatatc 780ctcacaaaag aggcttttga aaacgccatt acttatgtcg ttgcaaccgg tgggtccact 840aatgctgttt tgcatttggt ggctgttgct cactctgcgg gtgtcaagtt gtcaccagat 900gatttccaaa gaatcagtga tactacacca ttgatcggtg acttcaaacc ttctggtaaa 960tacgtcatgg ccgatttgat taacgttggt ggtacccaat ctgtgattaa gtatctatat 1020gaaaacaaca tgttgcacgg taacacaatg actgttaccg gtgacacttt ggcagaacgt 1080gcaaagaaag caccaagcct acctgaagga caagagatta ttaagccact ctcccaccca 1140atcaaggcca acggtcactt gcaaattctg tacggttcat tggcaccagg tggagctgtg 1200ggtaaaatta ccggtaagga aggtacttac ttcaagggta gagcacgtgt gttcgaagag 
1260gaaggtgcct ttattgaagc cttggaaaga ggtgaaatca agaagggtga aaaaaccgtt 1320gttgttatca gatatgaagg tccaagaggt gcaccaggta tgcctgaaat gctaaagcct 1380tcctctgctc tgatgggtta cggtttgggt aaagatgttg cattgttgac tgatggtaga 1440ttctctggtg gttctcacgg gttcttaatc ggccacattg ttcccgaagc cgctgaaggt 1500ggtcctatcg ggttggtcag agacggcgat gagattatca ttgatgctga taataacaag 1560attgacctat tagtctctga taaggaaatg gctcaacgta aacaaagttg ggttgcacct 1620ccacctcgtt acacaagagg tactctatcc aagtatgcta agttggtttc caacgcttcc 1680aacggttgtg ttttagatgc ttga 17041581692DNASaccharomyces cerevisiae 158atgaacaagt actcgtatat catcactgaa cctaagggcc aaggtgcgtc ccaggccatg 60ctttatgcca ccggtttcaa gaaggaagat ttcaagaagc ctcaagtcgg ggttggttcc 120tgttggtggt ccggtaaccc atgtaacatg catctattgg acttgaataa cagatgttct 180caatccattg aaaaagcggg tttgaaagct atgcagttca acaccatcgg tgtttcagac 240ggtatctcta tgggtactaa aggtatgaga tactcgttac aaagtagaga aatcattgca 300gactcctttg aaaccatcat gatggcacaa cactacgatg ctaacatcgc catcccatca 360tgtgacaaaa acatgcccgg tgtcatgatg gccatgggta gacataacag accttccatc 420atggtatatg gtggtactat cttgcccggt catccaacat gtggttcttc gaagatctct 480aaaaacatcg atatcgtctc tgcgttccaa tcctacggtg aatatatttc caagcaattc 540actgaagaag aaagagaaga tgttgtggaa catgcatgcc caggtcctgg ttcttgtggt 600ggtatgtata ctgccaacac aatggcttct gccgctgaag tgctaggttt gaccattcca 660aactcctctt ccttcccagc cgtttccaag gagaagttag ctgagtgtga caacattggt 720gaatacatca agaagacaat ggaattgggt attttacctc gtgatatcct cacaaaagag 780gcttttgaaa acgccattac ttatgtcgtt gcaaccggtg ggtccactaa tgctgttttg 840catttggtgg ctgttgctca ctctgcgggt gtcaagttgt caccagatga tttccaaaga 900atcagtgata ctacaccatt gatcggtgac ttcaaacctt ctggtaaata cgtcatggcc 960gatttgatta acgttggtgg tacccaatct gtgattaagt atctatatga aaacaacatg 1020ttgcacggta acacaatgac tgttaccggt gacactttgg cagaacgtgc aaagaaagca 1080ccaagcctac ctgaaggaca agagattatt aagccactct cccacccaat caaggccaac 1140ggtcacttgc aaattctgta cggttcattg gcaccaggtg gagctgtggg taaaattacc 1200ggtaaggaag gtacttactt caagggtaga gcacgtgtgt 
tcgaagagga aggtgccttt 1260attgaagcct tggaaagagg tgaaatcaag aagggtgaaa aaaccgttgt tgttatcaga 1320tatgaaggtc caagaggtgc accaggtatg cctgaaatgc taaagccttc ctctgctctg 1380atgggttacg gtttgggtaa agatgttgca ttgttgactg atggtagatt ctctggtggt 1440tctcacgggt tcttaatcgg ccacattgtt cccgaagccg ctgaaggtgg tcctatcggg 1500ttggtcagag acggcgatga gattatcatt gatgctgata ataacaagat tgacctatta 1560gtctctgata aggaaatggc tcaacgtaaa caaagttggg ttgcacctcc acctcgttac 1620acaagaggta ctctatccaa gtatgctaag ttggtttcca acgcttccaa cggttgtgtt 1680ttagatgctt ga 16921591923DNANeurospora crassa 159atggcttcta atcaagataa caaggcagtt gctccagacg ctgctgcacc agcgggtcag 60tcaacaacca ccacaactac aaatgataac agtgaaagga atctaccaaa ggaaggcgaa 120tacattcaat ggaggacact tccagcgggc aatccagatc agttgaacag atggagtcat 180ttcctgactc gtgagcatga gtttccaggc gctcaggcaa tgttgtacgg tgcgggtgta 240cctaacaaag atatgatgaa aaaggctcct catgttggga tcgctactgt ttggtgggaa 300ggtaacccat gtaatactca tctgcttgat ctaggtcaaa aagtcaaaaa ggctgttgaa 360agagagaaga tgttagcttg gcaattcaac acaattggcg ttagtgacgg aataacaatg 420ggtggtgaag gcatgaggta ctctttgcag agcagagaga tcatagcaga ttctatagag 480actgtgacat gtgcacaaca ccatgatgcc aatatctcaa ttccagggtg cgacaaaaac 540atgccaggcg tcatcatggc agctgcaaga cacaacagac cattcgttat gatctacgga 600ggtacaatga gaggcggtca ttccgaatta cttgatagac ctatcaatat cgtaacttgt 660tacgaggcct caggggccta tacttatggt agacttaagc cagcctgtcc aaactccact 720gctaccccat ctgacgtgat ggacgatata gaacaacacg cctgtccagg ggctggagct 780tgtggaggga tgtacaccgc gaatactatg gcaaccgcca tagaagctat gggtctgaca 840gcaccagggt catcctcctt tccagccagc tcaccagaaa agttcagaga gtgcgaaaaa 900gccgcggaat acattaagat atgcatggaa aaagatattc gtccaagaga cttactaaca 960aaggcttcct tcgagaatgc tctcgtcttg acaatgattc taggtggttc aaccaacggt 1020gttttacatt acttagccat ggccaactcc gccgatgtcg atctaactct tgatgatatc 1080aatagagtca gtgctaagac tcctttcctc gctgatatgg ctccatctgg tagatactat 1140atggaggatt tgtacaaggt aggtggtact ccagccgtac tcaagatgtt gatagctgcc 1200ggctatatcg atggaacaat tccaacaata acaggaaaat 
ctttggctga aaacgtgtca 1260gattggccat ctttagaccc tgatcaaaag attatccgtc ctttggataa tcctatcaaa 1320tcacaaggtc acattagagt gctgtatggt aacttctctc ctggtggggc tgttgccaag 1380atcacgggta aggaaggtct tagttttact ggtaaggcaa gatgctttaa caaagagttt 1440gaattggatg ctgcgctgaa aaactctgaa atcacgctcg aacaaggaaa tcaagttcta 1500attgtaaggt atgaaggccc taagggcgga ccaggcatgc cagaacaatt gaaagcatct 1560gccgctatca tgggcgctgg tttgacgaac gtagctttag tcacggatgg gcgatactct 1620ggcgcttctc acggtttcat cgtcggtcac gtcgtgcctg aggcggcaac tggcggacct 1680attgctttag taaaggatgg agatttgatc acaattgatg cagtcagaaa tagaattgat 1740gttgtcaaaa ccgtagaagg agtggagggc gaggaggaaa ttgcaaaggt tttagaagag 1800aggaaaaagg gatggaaagc acctaagatg aagccaacaa gaggagccct ggccaaatac 1860gcaagacttg ttggtgacgc atcacatgga gcagttacag acttaggagg agatgcttac 1920taa 19231601686DNAAcaryochloris marina 160atgtcagata atcgtaattc tcaagtagtc acacaaggtg ttcaaagagc acctaataga 60gctatgttaa gagctgtagg attcggagat gatgatttca cgaaaccaat agttggattg 120gctaatggtt tctctactat tactccttgt aacatgggaa ttgatagttt ggccacaaga 180gctgaagcat ctattaggac ggctggtgca atgccacaaa agtttggaac cattacaata 240agcgatggga tatcaatggg tacagaaggt atgaagtatt ctctcgtttc aagagaagtg 300attgccgatt ccattgaaac agcttgcatg ggccagagta tggatggcgt attagcaatt 360ggtggctgcg acaaaaacat gcctggcgcg atgttagcaa tggctcgtat gaacatacca 420gccatcttcg tatatggtgg cactatcaag ccaggccacc tcaatggtga agatttgact 480gtcgtatcag ctttcgaagc tgtggggcaa cattccgccg gtagaatatc cgaagccgaa 540cttacagcag tcgaaaagca tgcatgtcca ggcgctggat catgtggtgg catgtacacg 600gccaacacaa tgtctagtgc ttttgaggct atgggcatgt ccttgatgta ctcatccact 660atggctgcag aagatgagga gaaggctgtt tctgccgaac aatctgcggc tgtgctagtt 720gaggcaatcc acaaacagat tctaccaaga gatattctaa ccagaaaggc gtttgagaac 780gcaatagcag tcataatggc tgttgggggt tccacaaatg cagttctcca cttgttagcg 840atttcaagag cagcaggaga ctctttaact ttagatgatt tcgaaactat cagggctcaa 900gttccagtga tttgtgattt gaagccttct ggtcgatatg tcgctacaga ccttcataaa 960gctggcggaa tcccattagt tatgaaaatg ctattagagc atgggctatt 
acatggggat 1020gcattgacta ttaccggcaa gacaattgca gagcaattgg ccgatgtgcc atctgaacct 1080tctcctgatc aagacgtaat ccgtccttgg gataatccaa tgtacaagca aggtcacctt 1140gccatcttga gaggtaactt ggccacagaa ggtgcagtag ccaagatcac agggatcaaa 1200aaccctcaaa tcactggacc agctagagtt ttcgagtcag aggaagcctg tttagaagcg 1260atcctggccg gaaagatcca accaaatgac gtgatagtcg ttcgatacga aggtccaaaa 1320ggaggaccag gtatgaggga aatgctggct cctacttccg caatcatagg tgcgggtcta 1380ggagactcag ttggccttat cactgatggg agattttccg gtgggacata cggtatggtt 1440gttggacatg tagcaccaga agcagctgtt ggtgggacca ttgctctggt tcaagagggt 1500gaccaaatca ctatcgatgc tcacgctaga aagttggagc tgcatgtctc tgaccaagag 1560cttaaagagc gtaaggaaaa gtgggagcag ccaaaaccac tgtacaataa gggtgtgctt 1620gcgaagtacg ccaaactcgt aagctctagt tcagtaggag cggttacaga tttggatttg 1680ttctaa 16861611683DNALyngbya spp. 161atgtccgata acttccgttc tcaagccatt acacagggca aaaagagaac tcctaataga 60gctatgctga gagcagttgg atttggagat gaagatttca acaaaccaat tgttggtatt 120gccaatggct actccaccat aactccttgc aacatcggtc ttaacgatct tgcacatagg 180gccgaaacag ctctaaagca agcagacgcc atgccacaaa tgttcggaac tattactgta 240agtgatggaa ttgcaatggg aaccgaaggt atgaagtact ctcttgttag cagagaagtt 300atagccgatg ctattgaaac tgcttgtaac ggacagtcta tggatggggt cttagcaata 360ggaggttgtg acaaaaacat gcctggtgct atgatcgcca tagcgcgtat gaatatccct 420gctatctttg tatacggcgg tacaatcaag ccaggtaatc taaacggttg tgatctaaca 480gttgtctccg cattcgaagc cgttggagag tattctgctg gcaaactaga tgacgataga 540ttactggaca tcgagagatt agcatgccct ggttctggct catgtggggg aatgttcact 600gctaatacaa tgtctagtgc atttgaagca atgggtatga gtctgatgta cagcagtaca 660atggcatccg aagatgctga aaaggctgat tccaccgaaa agtccgcttt tgttttgaga 720gaggcaattt ctcagagaat cctacctaag caaatcctga cgaggaaagc cttcgaaaac 780gcaattgcag tcatcatggc ggtaggcggc tccacaaact ctgtattgca tctattggct 840attgcctatg ctgccgatgt agaattgacc atagatgatt tcgaaacaat tcgtgggaga 900gtaccagttt tgtgtgatct taagccatca ggacgatttg tcactaccga tttccataag 960gctggtggag tcccattgat catgaagatg ttactcgaac aaggtttgat ccatggggat 
1020gcccttacta taacgggtaa aacagtcgca gagcaattag ctgatatccc atctcaacca 1080tctgccgacc aagaggtgat aagaccatgg aataacccaa tgtacaagca aggtcacttg 1140gcgatcctta aggggaatct tgcaacagaa ggttcagtcg ccaagataac aggtgtgaaa 1200aagcctcaga tgacaggtcc agcgcgagtt tttgaatcag aagagcaatg cttagaagct 1260atactagccg gcaaaatcca agctggggac gttttagtgg ttagatacga aggtccaaaa 1320gggggaccag gtatgagaga aatgctggct ccaacatctg caatcattgg tgccggcttg 1380ggtgattctg ttggactcat tacggatggc agattctctg gcggaacata tggtttggta 1440gtcggacacg ttgctccaga ggctgcagtg ggtggtaaca tcgctttagt gcaagagggc 1500gattcaatta ctattgatgc ttcacagcgt ttgttacaag taaacatctc tgaccaggtg 1560ttggagcaaa gacgacaaaa ctggcaacca ccacaaccta gatacactaa aggcgtatta 1620gcgaagtacg caaagttggt ttcaagtagt tcagttggcg cagttactga tctcgattgt 1680taa 16831621851DNAArtificial SequenceE. coli ilvD codon-optimized for expression in K. lactis 162atgccgaaat acagatcagc aacaacaacc catggtagaa atatggctgg tgcaagggct 60ctatggagag ctactggcat gactgatgca gatttcggaa agccaatcat tgccgtcgtc 120aactctttta cacaattcgt tccgggtcat gtccatttgc gtgatctagg taagcttgtt 180gccgaacaaa ttgaagctgc aggtggtgtc gcaaaagagt ttaatactat tgctgtggac 240gacggtatag ctatggggca tggcggtatg ttatactctt taccatcgag agaattaatt 300gcagactcag tcgaatatat ggttaatgct cattgtgccg atgcaatggt ttgtatctct 360aattgtgata agataacgcc tggtatgttg atggcgtcct tgagattgaa catcccagta 420atcttcgtat ctggcggccc aatggaggct ggtaaaacta agttaagtga tcagatcatc 480aaacttgatc ttgtggatgc aatgattcaa ggtgcagatc caaaagtttc agactcgcag 540tcagaccaag ttgaaagaag tgcatgtcca acttgtggtt cttgcagtgg aatgttcacg 600gctaactcta tgaattgctt gactgaagct ctaggtttat ctcaaccagg aaatggttca 660ttattagcga cccatgcaga cagaaagcaa ttgttcttaa atgccggaaa aagaattgtg 720gaactaacga aaaggtatta cgaacaaaat gatgaatcag cattaccgag gaatatagct 780tcaaaggctg cattcgaaaa tgccatgaca ttggatattg caatgggtgg tagtacaaac 840acggtcttac atcttctagc tgcagcccaa gaagctgaga tagatttcac catgtctgat 900atcgacaagc tttcacgtaa ggttccacag ttatgtaagg ttgcaccatc aactcaaaag 960tatcacatgg aagacgttca 
tcgtgcagga ggggttattg gtattttagg ggagttggac 1020agagccggtc ttttaaacag ggatgtgaag aatgtattgg gtttaacact tccacagaca 1080ttagagcaat acgatgtcat gttaactcaa gatgatgccg tgaaaaacat gttcagggca 1140ggtccagcag ggatcagaac cacccaagca ttctcgcaag actgtaggtg ggacactttg 1200gacgatgata gagcaaatgg atgtataaga tcgcttgagc atgcttatag taaggatggt 1260ggtttagcag tattatatgg aaacttcgct gaaaatggtt gcattgtgaa aactgctggt 1320gtagatgata gtattttgaa atttactgga cccgctaaag tttacgaaag tcaagacgat 1380gctgttgagg ctatacttgg cggaaaggtg gtagcaggag acgtggtagt gataagatat 1440gagggaccaa agggaggacc aggtatgcag gaaatgcttt acccaacttc atttttgaag 1500tccatgggac taggaaaagc ttgtgccctt atcactgacg gtagattctc tggtggcact 1560tcgggtttaa gtatcggtca cgtatcacca gaggcagctt ctggtggttc gattggattg 1620attgaagatg gagatttgat cgccatagat atcccaaata gaggtatcca attacaagtc 1680tcagacgctg aattggctgc aagaagagaa gcacaagatg ccagaggaga taaggcttgg 1740actcctaaaa atagagaacg tcaagtaagt ttcgccctta gggcttatgc ttcattggct 1800acttcagccg ataagggggc agtaagagac aaatcgaagt tgggtggatg a 1851163567PRTSaccharomyces cerevisiae
163Met Ala Lys Lys Leu Asn Lys Tyr Ser Tyr Ile Ile Thr Glu Pro Lys 1 5 10 15 Gly Gln Gly Ala Ser Gln Ala Met Leu Tyr Ala Thr Gly Phe Lys Lys 20 25 30 Glu Asp Phe Lys Lys Pro Gln Val Gly Val Gly Ser Cys Trp Trp Ser 35 40 45 Gly Asn Pro Cys Asn Met His Leu Leu Asp Leu Asn Asn Arg Cys Ser 50 55 60 Gln Ser Ile Glu Lys Ala Gly Leu Lys Ala Met Gln Phe Asn Thr Ile 65 70 75 80 Gly Val Ser Asp Gly Ile Ser Met Gly Thr Lys Gly Met Arg Tyr Ser 85 90 95 Leu Gln Ser Arg Glu Ile Ile Ala Asp Ser Phe Glu Thr Ile Met Met 100 105 110 Ala Gln His Tyr Asp Ala Asn Ile Ala Ile Pro Ser Cys Asp Lys Asn 115 120 125 Met Pro Gly Val Met Met Ala Met Gly Arg His Asn Arg Pro Ser Ile 130 135 140 Met Val Tyr Gly Gly Thr Ile Leu Pro Gly His Pro Thr Cys Gly Ser 145 150 155 160 Ser Lys Ile Ser Lys Asn Ile Asp Ile Val Ser Ala Phe Gln Ser Tyr 165 170 175 Gly Glu Tyr Ile Ser Lys Gln Phe Thr Glu Glu Glu Arg Glu Asp Val 180 185 190 Val Glu His Ala Cys Pro Gly Pro Gly Ser Cys Gly Gly Met Tyr Thr 195 200 205 Ala Asn Thr Met Ala Ser Ala Ala Glu Val Leu Gly Leu Thr Ile Pro 210 215 220 Asn Ser Ser Ser Phe Pro Ala Val Ser Lys Glu Lys Leu Ala Glu Cys 225 230 235 240 Asp Asn Ile Gly Glu Tyr Ile Lys Lys Thr Met Glu Leu Gly Ile Leu 245 250 255 Pro Arg Asp Ile Leu Thr Lys Glu Ala Phe Glu Asn Ala Ile Thr Tyr 260 265 270 Val Val Ala Thr Gly Gly Ser Thr Asn Ala Val Leu His Leu Val Ala 275 280 285 Val Ala His Ser Ala Gly Val Lys Leu Ser Pro Asp Asp Phe Gln Arg 290 295 300 Ile Ser Asp Thr Thr Pro Leu Ile Gly Asp Phe Lys Pro Ser Gly Lys 305 310 315 320 Tyr Val Met Ala Asp Leu Ile Asn Val Gly Gly Thr Gln Ser Val Ile 325 330 335 Lys Tyr Leu Tyr Glu Asn Asn Met Leu His Gly Asn Thr Met Thr Val 340 345 350 Thr Gly Asp Thr Leu Ala Glu Arg Ala Lys Lys Ala Pro Ser Leu Pro 355 360 365 Glu Gly Gln Glu Ile Ile Lys Pro Leu Ser His Pro Ile Lys Ala Asn 370 375 380 Gly His Leu Gln Ile Leu Tyr Gly Ser Leu Ala Pro Gly Gly Ala Val 385 390 395 400 Gly Lys Ile Thr Gly Lys Glu Gly Thr Tyr Phe Lys Gly Arg Ala Arg 405 410 415 Val Phe Glu Glu 
Glu Gly Ala Phe Ile Glu Ala Leu Glu Arg Gly Glu 420 425 430 Ile Lys Lys Gly Glu Lys Thr Val Val Val Ile Arg Tyr Glu Gly Pro 435 440 445 Arg Gly Ala Pro Gly Met Pro Glu Met Leu Lys Pro Ser Ser Ala Leu 450 455 460 Met Gly Tyr Gly Leu Gly Lys Asp Val Ala Leu Leu Thr Asp Gly Arg 465 470 475 480 Phe Ser Gly Gly Ser His Gly Phe Leu Ile Gly His Ile Val Pro Glu 485 490 495 Ala Ala Glu Gly Gly Pro Ile Gly Leu Val Arg Asp Gly Asp Glu Ile 500 505 510 Ile Ile Asp Ala Asp Asn Asn Lys Ile Asp Leu Leu Val Ser Asp Lys 515 520 525 Glu Met Ala Gln Arg Lys Gln Ser Trp Val Ala Pro Pro Pro Arg Tyr 530 535 540 Thr Arg Gly Thr Leu Ser Lys Tyr Ala Lys Leu Val Ser Asn Ala Ser 545 550 555 560 Asn Gly Cys Val Leu Asp Ala 565 164563PRTSaccharomyces cerevisiae 164Met Asn Lys Tyr Ser Tyr Ile Ile Thr Glu Pro Lys Gly Gln Gly Ala 1 5 10 15 Ser Gln Ala Met Leu Tyr Ala Thr Gly Phe Lys Lys Glu Asp Phe Lys 20 25 30 Lys Pro Gln Val Gly Val Gly Ser Cys Trp Trp Ser Gly Asn Pro Cys 35 40 45 Asn Met His Leu Leu Asp Leu Asn Asn Arg Cys Ser Gln Ser Ile Glu 50 55 60 Lys Ala Gly Leu Lys Ala Met Gln Phe Asn Thr Ile Gly Val Ser Asp 65 70 75 80 Gly Ile Ser Met Gly Thr Lys Gly Met Arg Tyr Ser Leu Gln Ser Arg 85 90 95 Glu Ile Ile Ala Asp Ser Phe Glu Thr Ile Met Met Ala Gln His Tyr 100 105 110 Asp Ala Asn Ile Ala Ile Pro Ser Cys Asp Lys Asn Met Pro Gly Val 115 120 125 Met Met Ala Met Gly Arg His Asn Arg Pro Ser Ile Met Val Tyr Gly 130 135 140 Gly Thr Ile Leu Pro Gly His Pro Thr Cys Gly Ser Ser Lys Ile Ser 145 150 155 160 Lys Asn Ile Asp Ile Val Ser Ala Phe Gln Ser Tyr Gly Glu Tyr Ile 165 170 175 Ser Lys Gln Phe Thr Glu Glu Glu Arg Glu Asp Val Val Glu His Ala 180 185 190 Cys Pro Gly Pro Gly Ser Cys Gly Gly Met Tyr Thr Ala Asn Thr Met 195 200 205 Ala Ser Ala Ala Glu Val Leu Gly Leu Thr Ile Pro Asn Ser Ser Ser 210 215 220 Phe Pro Ala Val Ser Lys Glu Lys Leu Ala Glu Cys Asp Asn Ile Gly 225 230 235 240 Glu Tyr Ile Lys Lys Thr Met Glu Leu Gly Ile Leu Pro Arg Asp Ile 245 250 255 Leu Thr Lys Glu Ala Phe Glu Asn Ala Ile Thr 
Tyr Val Val Ala Thr 260 265 270 Gly Gly Ser Thr Asn Ala Val Leu His Leu Val Ala Val Ala His Ser 275 280 285 Ala Gly Val Lys Leu Ser Pro Asp Asp Phe Gln Arg Ile Ser Asp Thr 290 295 300 Thr Pro Leu Ile Gly Asp Phe Lys Pro Ser Gly Lys Tyr Val Met Ala 305 310 315 320 Asp Leu Ile Asn Val Gly Gly Thr Gln Ser Val Ile Lys Tyr Leu Tyr 325 330 335 Glu Asn Asn Met Leu His Gly Asn Thr Met Thr Val Thr Gly Asp Thr 340 345 350 Leu Ala Glu Arg Ala Lys Lys Ala Pro Ser Leu Pro Glu Gly Gln Glu 355 360 365 Ile Ile Lys Pro Leu Ser His Pro Ile Lys Ala Asn Gly His Leu Gln 370 375 380 Ile Leu Tyr Gly Ser Leu Ala Pro Gly Gly Ala Val Gly Lys Ile Thr 385 390 395 400 Gly Lys Glu Gly Thr Tyr Phe Lys Gly Arg Ala Arg Val Phe Glu Glu 405 410 415 Glu Gly Ala Phe Ile Glu Ala Leu Glu Arg Gly Glu Ile Lys Lys Gly 420 425 430 Glu Lys Thr Val Val Val Ile Arg Tyr Glu Gly Pro Arg Gly Ala Pro 435 440 445 Gly Met Pro Glu Met Leu Lys Pro Ser Ser Ala Leu Met Gly Tyr Gly 450 455 460 Leu Gly Lys Asp Val Ala Leu Leu Thr Asp Gly Arg Phe Ser Gly Gly 465 470 475 480 Ser His Gly Phe Leu Ile Gly His Ile Val Pro Glu Ala Ala Glu Gly 485 490 495 Gly Pro Ile Gly Leu Val Arg Asp Gly Asp Glu Ile Ile Ile Asp Ala 500 505 510 Asp Asn Asn Lys Ile Asp Leu Leu Val Ser Asp Lys Glu Met Ala Gln 515 520 525 Arg Lys Gln Ser Trp Val Ala Pro Pro Pro Arg Tyr Thr Arg Gly Thr 530 535 540 Leu Ser Lys Tyr Ala Lys Leu Val Ser Asn Ala Ser Asn Gly Cys Val 545 550 555 560 Leu Asp Ala 165640PRTNeurospora crassa 165Met Ala Ser Asn Gln Asp Asn Lys Ala Val Ala Pro Asp Ala Ala Ala 1 5 10 15 Pro Ala Gly Gln Ser Thr Thr Thr Thr Thr Thr Asn Asp Asn Ser Glu 20 25 30 Arg Asn Leu Pro Lys Glu Gly Glu Tyr Ile Gln Trp Arg Thr Leu Pro 35 40 45 Ala Gly Asn Pro Asp Gln Leu Asn Arg Trp Ser His Phe Leu Thr Arg 50 55 60 Glu His Glu Phe Pro Gly Ala Gln Ala Met Leu Tyr Gly Ala Gly Val 65 70 75 80 Pro Asn Lys Asp Met Met Lys Lys Ala Pro His Val Gly Ile Ala Thr 85 90 95 Val Trp Trp Glu Gly Asn Pro Cys Asn Thr His Leu Leu Asp Leu Gly 100 105 110 Gln Lys Val Lys Lys Ala 
Val Glu Arg Glu Lys Met Leu Ala Trp Gln 115 120 125 Phe Asn Thr Ile Gly Val Ser Asp Gly Ile Thr Met Gly Gly Glu Gly 130 135 140 Met Arg Tyr Ser Leu Gln Ser Arg Glu Ile Ile Ala Asp Ser Ile Glu 145 150 155 160 Thr Val Thr Cys Ala Gln His His Asp Ala Asn Ile Ser Ile Pro Gly 165 170 175 Cys Asp Lys Asn Met Pro Gly Val Ile Met Ala Ala Ala Arg His Asn 180 185 190 Arg Pro Phe Val Met Ile Tyr Gly Gly Thr Met Arg Gly Gly His Ser 195 200 205 Glu Leu Leu Asp Arg Pro Ile Asn Ile Val Thr Cys Tyr Glu Ala Ser 210 215 220 Gly Ala Tyr Thr Tyr Gly Arg Leu Lys Pro Ala Cys Pro Asn Ser Thr 225 230 235 240 Ala Thr Pro Ser Asp Val Met Asp Asp Ile Glu Gln His Ala Cys Pro 245 250 255 Gly Ala Gly Ala Cys Gly Gly Met Tyr Thr Ala Asn Thr Met Ala Thr 260 265 270 Ala Ile Glu Ala Met Gly Leu Thr Ala Pro Gly Ser Ser Ser Phe Pro 275 280 285 Ala Ser Ser Pro Glu Lys Phe Arg Glu Cys Glu Lys Ala Ala Glu Tyr 290 295 300 Ile Lys Ile Cys Met Glu Lys Asp Ile Arg Pro Arg Asp Leu Leu Thr 305 310 315 320 Lys Ala Ser Phe Glu Asn Ala Leu Val Leu Thr Met Ile Leu Gly Gly 325 330 335 Ser Thr Asn Gly Val Leu His Tyr Leu Ala Met Ala Asn Ser Ala Asp 340 345 350 Val Asp Leu Thr Leu Asp Asp Ile Asn Arg Val Ser Ala Lys Thr Pro 355 360 365 Phe Leu Ala Asp Met Ala Pro Ser Gly Arg Tyr Tyr Met Glu Asp Leu 370 375 380 Tyr Lys Val Gly Gly Thr Pro Ala Val Leu Lys Met Leu Ile Ala Ala 385 390 395 400 Gly Tyr Ile Asp Gly Thr Ile Pro Thr Ile Thr Gly Lys Ser Leu Ala 405 410 415 Glu Asn Val Ser Asp Trp Pro Ser Leu Asp Pro Asp Gln Lys Ile Ile 420 425 430 Arg Pro Leu Asp Asn Pro Ile Lys Ser Gln Gly His Ile Arg Val Leu 435 440 445 Tyr Gly Asn Phe Ser Pro Gly Gly Ala Val Ala Lys Ile Thr Gly Lys 450 455 460 Glu Gly Leu Ser Phe Thr Gly Lys Ala Arg Cys Phe Asn Lys Glu Phe 465 470 475 480 Glu Leu Asp Ala Ala Leu Lys Asn Ser Glu Ile Thr Leu Glu Gln Gly 485 490 495 Asn Gln Val Leu Ile Val Arg Tyr Glu Gly Pro Lys Gly Gly Pro Gly 500 505 510 Met Pro Glu Gln Leu Lys Ala Ser Ala Ala Ile Met Gly Ala Gly Leu 515 520 525 Thr Asn Val Ala Leu Val Thr 
Asp Gly Arg Tyr Ser Gly Ala Ser His 530 535 540 Gly Phe Ile Val Gly His Val Val Pro Glu Ala Ala Thr Gly Gly Pro 545 550 555 560 Ile Ala Leu Val Lys Asp Gly Asp Leu Ile Thr Ile Asp Ala Val Arg 565 570 575 Asn Arg Ile Asp Val Val Lys Thr Val Glu Gly Val Glu Gly Glu Glu 580 585 590 Glu Ile Ala Lys Val Leu Glu Glu Arg Lys Lys Gly Trp Lys Ala Pro 595 600 605 Lys Met Lys Pro Thr Arg Gly Ala Leu Ala Lys Tyr Ala Arg Leu Val 610 615 620 Gly Asp Ala Ser His Gly Ala Val Thr Asp Leu Gly Gly Asp Ala Tyr 625 630 635 640 166561PRTAcaryochloris marina 166Met Ser Asp Asn Arg Asn Ser Gln Val Val Thr Gln Gly Val Gln Arg 1 5 10 15 Ala Pro Asn Arg Ala Met Leu Arg Ala Val Gly Phe Gly Asp Asp Asp 20 25 30 Phe Thr Lys Pro Ile Val Gly Leu Ala Asn Gly Phe Ser Thr Ile Thr 35 40 45 Pro Cys Asn Met Gly Ile Asp Ser Leu Ala Thr Arg Ala Glu Ala Ser 50 55 60 Ile Arg Thr Ala Gly Ala Met Pro Gln Lys Phe Gly Thr Ile Thr Ile 65 70 75 80 Ser Asp Gly Ile Ser Met Gly Thr Glu Gly Met Lys Tyr Ser Leu Val 85 90 95 Ser Arg Glu Val Ile Ala Asp Ser Ile Glu Thr Ala Cys Met Gly Gln 100 105 110 Ser Met Asp Gly Val Leu Ala Ile Gly Gly Cys Asp Lys Asn Met Pro 115 120 125 Gly Ala Met Leu Ala Met Ala Arg Met Asn Ile Pro Ala Ile Phe Val 130 135 140 Tyr Gly Gly Thr Ile Lys Pro Gly His Leu Asn Gly Glu Asp Leu Thr 145 150 155 160 Val Val Ser Ala Phe Glu Ala Val Gly Gln His Ser Ala Gly Arg Ile 165 170 175 Ser Glu Ala Glu Leu Thr Ala Val Glu Lys His Ala Cys Pro Gly Ala 180 185 190 Gly Ser Cys Gly Gly Met Tyr Thr Ala Asn Thr Met Ser Ser Ala Phe 195 200 205 Glu Ala Met Gly Met Ser Leu Met Tyr Ser Ser Thr Met Ala Ala Glu 210 215 220 Asp Glu Glu Lys Ala Val Ser Ala Glu Gln Ser Ala Ala Val Leu Val 225 230 235 240 Glu Ala Ile His Lys Gln Ile Leu Pro Arg Asp Ile Leu Thr Arg Lys 245 250 255 Ala Phe Glu Asn Ala Ile Ala Val Ile Met Ala Val Gly Gly Ser Thr 260 265 270 Asn Ala Val Leu His Leu Leu Ala Ile Ser Arg Ala Ala Gly Asp Ser 275 280 285 Leu Thr Leu Asp Asp Phe Glu Thr Ile Arg Ala Gln Val Pro Val Ile 290 295 300 Cys Asp Leu Lys 
Pro Ser Gly Arg Tyr Val Ala Thr Asp Leu His Lys 305 310 315 320 Ala Gly Gly Ile Pro Leu Val Met Lys Met Leu Leu Glu His Gly Leu 325 330 335 Leu His Gly Asp Ala Leu Thr Ile Thr Gly Lys Thr Ile Ala Glu Gln 340 345 350 Leu Ala Asp Val Pro Ser Glu Pro Ser Pro Asp Gln Asp Val Ile Arg 355 360 365 Pro Trp Asp Asn Pro Met Tyr Lys Gln Gly His Leu Ala Ile Leu Arg 370 375 380 Gly Asn Leu Ala Thr Glu Gly Ala Val Ala Lys Ile Thr Gly Ile Lys 385 390 395 400 Asn Pro Gln Ile Thr Gly Pro Ala Arg Val Phe Glu Ser Glu Glu Ala 405 410 415 Cys Leu Glu Ala Ile Leu Ala Gly Lys Ile Gln Pro Asn Asp Val Ile 420 425 430 Val Val Arg Tyr Glu Gly Pro Lys Gly Gly Pro Gly Met Arg Glu Met 435 440 445 Leu Ala Pro Thr Ser Ala Ile Ile Gly Ala Gly Leu Gly Asp Ser Val 450 455 460 Gly Leu Ile Thr Asp Gly Arg Phe Ser Gly Gly Thr Tyr Gly Met Val 465 470 475 480 Val Gly His Val Ala Pro Glu Ala Ala Val Gly Gly Thr Ile Ala Leu 485 490 495 Val Gln Glu Gly Asp Gln Ile Thr Ile Asp Ala His Ala Arg Lys Leu 500 505 510 Glu Leu His Val Ser Asp Gln Glu Leu Lys Glu Arg Lys Glu Lys Trp 515 520
525 Glu Gln Pro Lys Pro Leu Tyr Asn Lys Gly Val Leu Ala Lys Tyr Ala 530 535 540 Lys Leu Val Ser Ser Ser Ser Val Gly Ala Val Thr Asp Leu Asp Leu 545 550 555 560 Phe 167559PRTLyngbya spp. 167Met Ser Asp Asn Phe Arg Ser Gln Ala Ile Thr Gln Gly Lys Lys Arg 1 5 10 15 Thr Pro Asn Arg Ala Met Leu Arg Ala Val Gly Phe Gly Asp Glu Asp 20 25 30 Phe Asn Lys Pro Ile Val Gly Ile Ala Asn Gly Tyr Ser Thr Ile Thr 35 40 45 Pro Cys Asn Ile Gly Leu Asn Asp Leu Ala His Arg Ala Glu Thr Ala 50 55 60 Leu Lys Gln Ala Asp Ala Met Pro Gln Met Phe Gly Thr Ile Thr Val 65 70 75 80 Ser Asp Gly Ile Ala Met Gly Thr Glu Gly Met Lys Tyr Ser Leu Val 85 90 95 Ser Arg Glu Val Ile Ala Asp Ala Ile Glu Thr Ala Cys Asn Gly Gln 100 105 110 Ser Met Asp Gly Val Leu Ala Ile Gly Gly Cys Asp Lys Asn Met Pro 115 120 125 Gly Ala Met Ile Ala Ile Ala Arg Met Asn Ile Pro Ala Ile Phe Val 130 135 140 Tyr Gly Gly Thr Ile Lys Pro Gly Asn Leu Asn Gly Cys Asp Leu Thr 145 150 155 160 Val Val Ser Ala Phe Glu Ala Val Gly Glu Tyr Ser Ala Gly Lys Leu 165 170 175 Asp Asp Asp Arg Leu Leu Asp Ile Glu Arg Leu Ala Cys Pro Gly Ser 180 185 190 Gly Ser Cys Gly Gly Met Phe Thr Ala Asn Thr Met Ser Ser Ala Phe 195 200 205 Glu Ala Met Gly Met Ser Leu Met Tyr Ser Ser Thr Met Ala Ser Glu 210 215 220 Asp Ala Glu Lys Ala Asp Ser Thr Glu Lys Ser Ala Phe Val Leu Arg 225 230 235 240 Glu Ala Ile Ser Gln Arg Ile Leu Pro Lys Gln Ile Leu Thr Arg Lys 245 250 255 Ala Phe Glu Asn Ala Ile Ala Val Ile Met Ala Val Gly Gly Ser Thr 260 265 270 Asn Ser Val Leu His Leu Leu Ala Ile Ala Tyr Ala Ala Asp Val Glu 275 280 285 Leu Thr Ile Asp Asp Phe Glu Thr Ile Arg Gly Arg Val Pro Val Leu 290 295 300 Cys Asp Leu Lys Pro Ser Gly Arg Phe Val Thr Thr Asp Phe His Lys 305 310 315 320 Ala Gly Gly Val Pro Leu Ile Met Lys Met Leu Leu Glu Gln Gly Leu 325 330 335 Ile His Gly Asp Ala Leu Thr Ile Thr Gly Lys Thr Val Ala Glu Gln 340 345 350 Leu Ala Asp Ile Pro Ser Gln Pro Ser Ala Asp Gln Glu Val Ile Arg 355 360 365 Pro Trp Asn Asn Pro Met Tyr Lys Gln Gly His Leu Ala Ile Leu Lys 
370 375 380 Gly Asn Leu Ala Thr Glu Gly Ser Val Ala Lys Ile Thr Gly Val Lys 385 390 395 400 Lys Pro Gln Met Thr Gly Pro Ala Arg Val Phe Glu Ser Glu Glu Gln 405 410 415 Cys Leu Glu Ala Ile Leu Ala Gly Lys Ile Gln Ala Gly Asp Val Leu 420 425 430 Val Val Arg Tyr Glu Gly Pro Lys Gly Gly Pro Gly Met Arg Glu Met 435 440 445 Leu Ala Pro Thr Ser Ala Ile Ile Gly Ala Gly Leu Gly Asp Ser Val 450 455 460 Gly Leu Ile Thr Asp Gly Arg Phe Ser Gly Gly Thr Tyr Gly Leu Val 465 470 475 480 Val Gly His Val Ala Pro Glu Ala Ala Val Gly Gly Asn Ile Ala Leu 485 490 495 Val Gln Glu Gly Asp Ser Ile Thr Ile Asp Ala Ser Gln Arg Leu Leu 500 505 510 Gln Val Asn Ile Ser Asp Gln Val Leu Glu Gln Arg Arg Gln Asn Trp 515 520 525 Gln Pro Pro Gln Pro Arg Tyr Thr Lys Gly Val Leu Ala Lys Tyr Ala 530 535 540 Lys Leu Val Ser Ser Ser Ser Val Gly Ala Val Thr Asp Leu Asp 545 550 555 168616PRTEscherichia coli 168Met Pro Lys Tyr Arg Ser Ala Thr Thr Thr His Gly Arg Asn Met Ala 1 5 10 15 Gly Ala Arg Ala Leu Trp Arg Ala Thr Gly Met Thr Asp Ala Asp Phe 20 25 30 Gly Lys Pro Ile Ile Ala Val Val Asn Ser Phe Thr Gln Phe Val Pro 35 40 45 Gly His Val His Leu Arg Asp Leu Gly Lys Leu Val Ala Glu Gln Ile 50 55 60 Glu Ala Ala Gly Gly Val Ala Lys Glu Phe Asn Thr Ile Ala Val Asp 65 70 75 80 Asp Gly Ile Ala Met Gly His Gly Gly Met Leu Tyr Ser Leu Pro Ser 85 90 95 Arg Glu Leu Ile Ala Asp Ser Val Glu Tyr Met Val Asn Ala His Cys 100 105 110 Ala Asp Ala Met Val Cys Ile Ser Asn Cys Asp Lys Ile Thr Pro Gly 115 120 125 Met Leu Met Ala Ser Leu Arg Leu Asn Ile Pro Val Ile Phe Val Ser 130 135 140 Gly Gly Pro Met Glu Ala Gly Lys Thr Lys Leu Ser Asp Gln Ile Ile 145 150 155 160 Lys Leu Asp Leu Val Asp Ala Met Ile Gln Gly Ala Asp Pro Lys Val 165 170 175 Ser Asp Ser Gln Ser Asp Gln Val Glu Arg Ser Ala Cys Pro Thr Cys 180 185 190 Gly Ser Cys Ser Gly Met Phe Thr Ala Asn Ser Met Asn Cys Leu Thr 195 200 205 Glu Ala Leu Gly Leu Ser Gln Pro Gly Asn Gly Ser Leu Leu Ala Thr 210 215 220 His Ala Asp Arg Lys Gln Leu Phe Leu Asn Ala Gly Lys Arg Ile Val 
225 230 235 240 Glu Leu Thr Lys Arg Tyr Tyr Glu Gln Asn Asp Glu Ser Ala Leu Pro 245 250 255 Arg Asn Ile Ala Ser Lys Ala Ala Phe Glu Asn Ala Met Thr Leu Asp 260 265 270 Ile Ala Met Gly Gly Ser Thr Asn Thr Val Leu His Leu Leu Ala Ala 275 280 285 Ala Gln Glu Ala Glu Ile Asp Phe Thr Met Ser Asp Ile Asp Lys Leu 290 295 300 Ser Arg Lys Val Pro Gln Leu Cys Lys Val Ala Pro Ser Thr Gln Lys 305 310 315 320 Tyr His Met Glu Asp Val His Arg Ala Gly Gly Val Ile Gly Ile Leu 325 330 335 Gly Glu Leu Asp Arg Ala Gly Leu Leu Asn Arg Asp Val Lys Asn Val 340 345 350 Leu Gly Leu Thr Leu Pro Gln Thr Leu Glu Gln Tyr Asp Val Met Leu 355 360 365 Thr Gln Asp Asp Ala Val Lys Asn Met Phe Arg Ala Gly Pro Ala Gly 370 375 380 Ile Arg Thr Thr Gln Ala Phe Ser Gln Asp Cys Arg Trp Asp Thr Leu 385 390 395 400 Asp Asp Asp Arg Ala Asn Gly Cys Ile Arg Ser Leu Glu His Ala Tyr 405 410 415 Ser Lys Asp Gly Gly Leu Ala Val Leu Tyr Gly Asn Phe Ala Glu Asn 420 425 430 Gly Cys Ile Val Lys Thr Ala Gly Val Asp Asp Ser Ile Leu Lys Phe 435 440 445 Thr Gly Pro Ala Lys Val Tyr Glu Ser Gln Asp Asp Ala Val Glu Ala 450 455 460 Ile Leu Gly Gly Lys Val Val Ala Gly Asp Val Val Val Ile Arg Tyr 465 470 475 480 Glu Gly Pro Lys Gly Gly Pro Gly Met Gln Glu Met Leu Tyr Pro Thr 485 490 495 Ser Phe Leu Lys Ser Met Gly Leu Gly Lys Ala Cys Ala Leu Ile Thr 500 505 510 Asp Gly Arg Phe Ser Gly Gly Thr Ser Gly Leu Ser Ile Gly His Val 515 520 525 Ser Pro Glu Ala Ala Ser Gly Gly Ser Ile Gly Leu Ile Glu Asp Gly 530 535 540 Asp Leu Ile Ala Ile Asp Ile Pro Asn Arg Gly Ile Gln Leu Gln Val 545 550 555 560 Ser Asp Ala Glu Leu Ala Ala Arg Arg Glu Ala Gln Asp Ala Arg Gly 565 570 575 Asp Lys Ala Trp Thr Pro Lys Asn Arg Glu Arg Gln Val Ser Phe Ala 580 585 590 Leu Arg Ala Tyr Ala Ser Leu Ala Thr Ser Ala Asp Lys Gly Ala Val 595 600 605 Arg Asp Lys Ser Lys Leu Gly Gly 610 615 169571PRTBacillus subtilis 169Met Leu Thr Lys Ala Thr Lys Glu Gln Lys Ser Leu Val Lys Asn Arg 1 5 10 15 Gly Ala Glu Leu Val Val Asp Cys Leu Val Glu Gln Gly Val Thr His 20 25 30 
Val Phe Gly Ile Pro Gly Ala Lys Ile Asp Ala Val Phe Asp Ala Leu 35 40 45 Gln Asp Lys Gly Pro Glu Ile Ile Val Ala Arg His Glu Gln Asn Ala 50 55 60 Ala Phe Met Ala Gln Ala Val Gly Arg Leu Thr Gly Lys Pro Gly Val 65 70 75 80 Val Leu Val Thr Ser Gly Pro Gly Ala Ser Asn Leu Ala Thr Gly Leu 85 90 95 Leu Thr Ala Asn Thr Glu Gly Asp Pro Val Val Ala Leu Ala Gly Asn 100 105 110 Val Ile Arg Ala Asp Arg Leu Lys Arg Thr His Gln Ser Leu Asp Asn 115 120 125 Ala Ala Leu Phe Gln Pro Ile Thr Lys Tyr Ser Val Glu Val Gln Asp 130 135 140 Val Lys Asn Ile Pro Glu Ala Val Thr Asn Ala Phe Arg Ile Ala Ser 145 150 155 160 Ala Gly Gln Ala Gly Ala Ala Phe Val Ser Phe Pro Gln Asp Val Val 165 170 175 Asn Glu Val Thr Asn Thr Lys Asn Val Arg Ala Val Ala Ala Pro Lys 180 185 190 Leu Gly Pro Ala Ala Asp Asp Ala Ile Ser Ala Ala Ile Ala Lys Ile 195 200 205 Gln Thr Ala Lys Leu Pro Val Val Leu Val Gly Met Lys Gly Gly Arg 210 215 220 Pro Glu Ala Ile Lys Ala Val Arg Lys Leu Leu Lys Lys Val Gln Leu 225 230 235 240 Pro Phe Val Glu Thr Tyr Gln Ala Ala Gly Thr Leu Ser Arg Asp Leu 245 250 255 Glu Asp Gln Tyr Phe Gly Arg Ile Gly Leu Phe Arg Asn Gln Pro Gly 260 265 270 Asp Leu Leu Leu Glu Gln Ala Asp Val Val Leu Thr Ile Gly Tyr Asp 275 280 285 Pro Ile Glu Tyr Asp Pro Lys Phe Trp Asn Ile Asn Gly Asp Arg Thr 290 295 300 Ile Ile His Leu Asp Glu Ile Ile Ala Asp Ile Asp His Ala Tyr Gln 305 310 315 320 Pro Asp Leu Glu Leu Ile Gly Asp Ile Pro Ser Thr Ile Asn His Ile 325 330 335 Glu His Asp Ala Val Lys Val Glu Phe Ala Glu Arg Glu Gln Lys Ile 340 345 350 Leu Ser Asp Leu Lys Gln Tyr Met His Glu Gly Glu Gln Val Pro Ala 355 360 365 Asp Trp Lys Ser Asp Arg Ala His Pro Leu Glu Ile Val Lys Glu Leu 370 375 380 Arg Asn Ala Val Asp Asp His Val Thr Val Thr Cys Asp Ile Gly Ser 385 390 395 400 His Ala Ile Trp Met Ser Arg Tyr Phe Arg Ser Tyr Glu Pro Leu Thr 405 410 415 Leu Met Ile Ser Asn Gly Met Gln Thr Leu Gly Val Ala Leu Pro Trp 420 425 430 Ala Ile Gly Ala Ser Leu Val Lys Pro Gly Glu Lys Val Val Ser Val 435 440 445 Ser Gly Asp Gly 
Gly Phe Leu Phe Ser Ala Met Glu Leu Glu Thr Ala 450 455 460 Val Arg Leu Lys Ala Pro Ile Val His Ile Val Trp Asn Asp Ser Thr 465 470 475 480 Tyr Asp Met Val Ala Phe Gln Gln Leu Lys Lys Tyr Asn Arg Thr Ser 485 490 495 Ala Val Asp Phe Gly Asn Ile Asp Ile Val Lys Tyr Ala Glu Ser Phe 500 505 510 Gly Ala Thr Gly Leu Arg Val Glu Ser Pro Asp Gln Leu Ala Asp Val 515 520 525 Leu Arg Gln Gly Met Asn Ala Glu Gly Pro Val Ile Ile Asp Val Pro 530 535 540 Val Asp Tyr Ser Asp Asn Ile Asn Leu Ala Ser Asp Lys Leu Pro Lys 545 550 555 560 Glu Phe Gly Glu Leu Met Lys Thr Lys Ala Leu 565 570 170491PRTArtificial SequenceE. coli ilvC Q110V 170Met Ala Asn Tyr Phe Asn Thr Leu Asn Leu Arg Gln Gln Leu Ala Gln 1 5 10 15 Leu Gly Lys Cys Arg Phe Met Gly Arg Asp Glu Phe Ala Asp Gly Ala 20 25 30 Ser Tyr Leu Gln Gly Lys Lys Val Val Ile Val Gly Cys Gly Ala Gln 35 40 45 Gly Leu Asn Gln Gly Leu Asn Met Arg Asp Ser Gly Leu Asp Ile Ser 50 55 60 Tyr Ala Leu Arg Lys Glu Ala Ile Ala Glu Lys Arg Ala Ser Trp Arg 65 70 75 80 Lys Ala Thr Glu Asn Gly Phe Lys Val Gly Thr Tyr Glu Glu Leu Ile 85 90 95 Pro Gln Ala Asp Leu Val Ile Asn Leu Thr Pro Asp Lys Val His Ser 100 105 110 Asp Val Val Arg Thr Val Gln Pro Leu Met Lys Asp Gly Ala Ala Leu 115 120 125 Gly Tyr Ser His Gly Phe Asn Ile Val Glu Val Gly Glu Gln Ile Arg 130 135 140 Lys Asp Ile Thr Val Val Met Val Ala Pro Lys Cys Pro Gly Thr Glu 145 150 155 160 Val Arg Glu Glu Tyr Lys Arg Gly Phe Gly Val Pro Thr Leu Ile Ala 165 170 175 Val His Pro Glu Asn Asp Pro Lys Gly Glu Gly Met Ala Ile Ala Lys 180 185 190 Ala Trp Ala Ala Ala Thr Gly Gly His Arg Ala Gly Val Leu Glu Ser 195 200 205 Ser Phe Val Ala Glu Val Lys Ser Asp Leu Met Gly Glu Gln Thr Ile 210 215 220 Leu Cys Gly Met Leu Gln Ala Gly Ser Leu Leu Cys Phe Asp Lys Leu 225 230 235 240 Val Glu Glu Gly Thr Asp Pro Ala Tyr Ala Glu Lys Leu Ile Gln Phe 245 250 255 Gly Trp Glu Thr Ile Thr Glu Ala Leu Lys Gln Gly Gly Ile Thr Leu 260 265 270 Met Met Asp Arg Leu Ser Asn Pro Ala Lys Leu Arg Ala Tyr Ala Leu 275 280 285 Ser Glu Gln 
Leu Lys Glu Ile Met Ala Pro Leu Phe Gln Lys His Met 290 295 300 Asp Asp Ile Ile Ser Gly Glu Phe Ser Ser Gly Met Met Ala Asp Trp 305 310 315 320 Ala Asn Asp Asp Lys Lys Leu Leu Thr Trp Arg Glu Glu Thr Gly Lys 325 330 335 Thr Ala Phe Glu Thr Ala Pro Gln Tyr Glu Gly Lys Ile Gly Glu Gln 340 345 350 Glu Tyr Phe Asp Lys Gly Val Leu Met Ile Ala Met Val Lys Ala Gly 355 360 365 Val Glu Leu Ala Phe Glu Thr Met Val Asp Ser Gly Ile Ile Glu Glu 370 375 380 Ser Ala Tyr Tyr Glu Ser Leu His Glu Leu Pro Leu Ile Ala Asn Thr 385 390 395 400 Ile Ala Arg Lys Arg Leu Tyr Glu Met Asn Val Val Ile Ser Asp Thr 405 410 415 Ala Glu Tyr Gly Asn Tyr Leu Phe Ser Tyr Ala Cys Val Pro Leu Leu 420 425 430 Lys Pro Phe Met Ala Glu Leu Gln Pro Gly Asp Leu Gly Lys Ala Ile 435 440 445 Pro Glu Gly Ala Val Asp Asn Gly Gln Leu Arg Asp Val Asn Glu Ala 450 455 460 Ile Arg Ser His Ala Ile Glu Gln Val Gly Lys Lys Leu Arg Gly Tyr 465 470 475 480 Met Thr Asp Met Lys Arg Ile Ala Val Ala Gly 485 490 1711476DNAArtificial SequenceE. coli ilvC codon-optimized for expression in S. cerevisiae (P2D1-A1) 171atggccaact attttaacac
attaaatttg agacaacaat tggctcaact gggtaagtgc 60agatttatgg gaagggacga gtttgctgat ggtgcttctt atctgcaagg aaagaaagta 120gtaattgttg gctgcggtgc tcagggtcta aaccaaggtt taaacatgag agattcaggt 180ctggatattt cgtatgcatt gaggaaagag tctattgcag aaaaggatgc cgattggcgt 240aaagcgacgg aaaatgggtt caaagttggt acttacgaag aactgatccc tcaggcagat 300ttagtgatta acctaacacc agataaggtt cactcagacg tagtaagaac agttcaaccg 360ctgatgaagg atggggcagc tttaggttac tctcatggct ttaatatcgt tgaagtgggc 420gagcagatca gaaaaggtat aacagtcgta atggttgcgc caaagtgccc aggtacggaa 480gtcagagagg agtacaagag gggttttggt gtacctacat tgatcgccgt acatcctgaa 540aatgacccca aacgtgaagg tatggcaata gcgaaggcat gggcagccgc aaccggaggt 600catagagcgg gtgtgttaga gagttctttc gtagctgagg tcaagagtga cttaatgggt 660gaacaaacca ttctgtgcgg aatgttgcag gcagggtctt tactatgctt tgataaattg 720gtcgaagagg gtacagatcc tgcctatgct gaaaagttga tacaatttgg ttgggagaca 780atcaccgagg cacttaaaca aggtggcata acattgatga tggatagact ttcaaatccg 840gccaagctaa gagcctacgc cttatctgag caactaaaag agatcatggc accattattc 900caaaagcaca tggacgatat tatctccggt gagttttcct caggaatgat ggcagattgg 960gcaaacgatg ataaaaagtt attgacgtgg agagaagaaa ccggcaagac ggcattcgag 1020acagccccac aatacgaagg taaaattggt gaacaagaat actttgataa gggagtattg 1080atgatagcta tggtgaaggc aggggtagaa cttgcattcg aaactatggt tgactccggt 1140atcattgaag aatctgcata ctatgagtct ttgcatgaat tgcctttgat agcaaatact 1200attgcaagaa aaagacttta cgagatgaat gttgtcatat cagacactgc agaatatggt 1260aattacttat ttagctacgc gtgtgtcccg ttgttagagc ccttcatggc cgagttacaa 1320cctggtgatt tggggaaggc tattccggaa ggagcggttg acaatggcca actgagagac 1380gtaaatgaag ctattcgttc acatgctata gaacaggtgg gtaaaaagct gagaggatat 1440atgaccgata tgaaaagaat tgcagtggca ggatga 1476172491PRTArtificial SequenceE. coli ilvC codon-optimized for expression in S. 
cerevisiae (P2D1-A1) 172Met Ala Asn Tyr Phe Asn Thr Leu Asn Leu Arg Gln Gln Leu Ala Gln 1 5 10 15 Leu Gly Lys Cys Arg Phe Met Gly Arg Asp Glu Phe Ala Asp Gly Ala 20 25 30 Ser Tyr Leu Gln Gly Lys Lys Val Val Ile Val Gly Cys Gly Ala Gln 35 40 45 Gly Leu Asn Gln Gly Leu Asn Met Arg Asp Ser Gly Leu Asp Ile Ser 50 55 60 Tyr Ala Leu Arg Lys Glu Ser Ile Ala Glu Lys Asp Ala Asp Trp Arg 65 70 75 80 Lys Ala Thr Glu Asn Gly Phe Lys Val Gly Thr Tyr Glu Glu Leu Ile 85 90 95 Pro Gln Ala Asp Leu Val Ile Asn Leu Thr Pro Asp Lys Val His Ser 100 105 110 Asp Val Val Arg Thr Val Gln Pro Leu Met Lys Asp Gly Ala Ala Leu 115 120 125 Gly Tyr Ser His Gly Phe Asn Ile Val Glu Val Gly Glu Gln Ile Arg 130 135 140 Lys Gly Ile Thr Val Val Met Val Ala Pro Lys Cys Pro Gly Thr Glu 145 150 155 160 Val Arg Glu Glu Tyr Lys Arg Gly Phe Gly Val Pro Thr Leu Ile Ala 165 170 175 Val His Pro Glu Asn Asp Pro Lys Arg Glu Gly Met Ala Ile Ala Lys 180 185 190 Ala Trp Ala Ala Ala Thr Gly Gly His Arg Ala Gly Val Leu Glu Ser 195 200 205 Ser Phe Val Ala Glu Val Lys Ser Asp Leu Met Gly Glu Gln Thr Ile 210 215 220 Leu Cys Gly Met Leu Gln Ala Gly Ser Leu Leu Cys Phe Asp Lys Leu 225 230 235 240 Val Glu Glu Gly Thr Asp Pro Ala Tyr Ala Glu Lys Leu Ile Gln Phe 245 250 255 Gly Trp Glu Thr Ile Thr Glu Ala Leu Lys Gln Gly Gly Ile Thr Leu 260 265 270 Met Met Asp Arg Leu Ser Asn Pro Ala Lys Leu Arg Ala Tyr Ala Leu 275 280 285 Ser Glu Gln Leu Lys Glu Ile Met Ala Pro Leu Phe Gln Lys His Met 290 295 300 Asp Asp Ile Ile Ser Gly Glu Phe Ser Ser Gly Met Met Ala Asp Trp 305 310 315 320 Ala Asn Asp Asp Lys Lys Leu Leu Thr Trp Arg Glu Glu Thr Gly Lys 325 330 335 Thr Ala Phe Glu Thr Ala Pro Gln Tyr Glu Gly Lys Ile Gly Glu Gln 340 345 350 Glu Tyr Phe Asp Lys Gly Val Leu Met Ile Ala Met Val Lys Ala Gly 355 360 365 Val Glu Leu Ala Phe Glu Thr Met Val Asp Ser Gly Ile Ile Glu Glu 370 375 380 Ser Ala Tyr Tyr Glu Ser Leu His Glu Leu Pro Leu Ile Ala Asn Thr 385 390 395 400 Ile Ala Arg Lys Arg Leu Tyr Glu Met Asn Val Val Ile Ser Asp Thr 405 410 
415 Ala Glu Tyr Gly Asn Tyr Leu Phe Ser Tyr Ala Cys Val Pro Leu Leu 420 425 430 Glu Pro Phe Met Ala Glu Leu Gln Pro Gly Asp Leu Gly Lys Ala Ile 435 440 445 Pro Glu Gly Ala Val Asp Asn Gly Gln Leu Arg Asp Val Asn Glu Ala 450 455 460 Ile Arg Ser His Ala Ile Glu Gln Val Gly Lys Lys Leu Arg Gly Tyr 465 470 475 480 Met Thr Asp Met Lys Arg Ile Ala Val Ala Gly 485 490 173548PRTLactococcus lactis 173Met Tyr Thr Val Gly Asp Tyr Leu Leu Asp Arg Leu His Glu Leu Gly 1 5 10 15 Ile Glu Glu Ile Phe Gly Val Pro Gly Asp Tyr Asn Leu Gln Phe Leu 20 25 30 Asp Gln Ile Ile Ser His Lys Asp Met Lys Trp Val Gly Asn Ala Asn 35 40 45 Glu Leu Asn Ala Ser Tyr Met Ala Asp Gly Tyr Ala Arg Thr Lys Lys 50 55 60 Ala Ala Ala Phe Leu Thr Thr Phe Gly Val Gly Glu Leu Ser Ala Val 65 70 75 80 Asn Gly Leu Ala Gly Ser Tyr Ala Glu Asn Leu Pro Val Val Glu Ile 85 90 95 Val Gly Ser Pro Thr Ser Lys Val Gln Asn Glu Gly Lys Phe Val His 100 105 110 His Thr Leu Ala Asp Gly Asp Phe Lys His Phe Met Lys Met His Glu 115 120 125 Pro Val Thr Ala Ala Arg Thr Leu Leu Thr Ala Glu Asn Ala Thr Val 130 135 140 Glu Ile Asp Arg Val Leu Ser Ala Leu Leu Lys Glu Arg Lys Pro Val 145 150 155 160 Tyr Ile Asn Leu Pro Val Asp Val Ala Ala Ala Lys Ala Glu Lys Pro 165 170 175 Ser Leu Pro Leu Lys Lys Glu Asn Ser Thr Ser Asn Thr Ser Asp Gln 180 185 190 Glu Ile Leu Asn Lys Ile Gln Glu Ser Leu Lys Asn Ala Lys Lys Pro 195 200 205 Ile Val Ile Thr Gly His Glu Ile Ile Ser Phe Gly Leu Glu Lys Thr 210 215 220 Val Thr Gln Phe Ile Ser Lys Thr Lys Leu Pro Ile Thr Thr Leu Asn 225 230 235 240 Phe Gly Lys Ser Ser Val Asp Glu Ala Leu Pro Ser Phe Leu Gly Ile 245 250 255 Tyr Asn Gly Thr Leu Ser Glu Pro Asn Leu Lys Glu Phe Val Glu Ser 260 265 270 Ala Asp Phe Ile Leu Met Leu Gly Val Lys Leu Thr Asp Ser Ser Thr 275 280 285 Gly Ala Phe Thr His His Leu Asn Glu Asn Lys Met Ile Ser Leu Asn 290 295 300 Ile Asp Glu Gly Lys Ile Phe Asn Glu Arg Ile Gln Asn Phe Asp Phe 305 310 315 320 Glu Ser Leu Ile Ser Ser Leu Leu Asp Leu Ser Glu Ile Glu Tyr Lys 325 330 335 Gly Lys 
Tyr Ile Asp Lys Lys Gln Glu Asp Phe Val Pro Ser Asn Ala 340 345 350 Leu Leu Ser Gln Asp Arg Leu Trp Gln Ala Val Glu Asn Leu Thr Gln 355 360 365 Ser Asn Glu Thr Ile Val Ala Glu Gln Gly Thr Ser Phe Phe Gly Ala 370 375 380 Ser Ser Ile Phe Leu Lys Ser Lys Ser His Phe Ile Gly Gln Pro Leu 385 390 395 400 Trp Gly Ser Ile Gly Tyr Thr Phe Pro Ala Ala Leu Gly Ser Gln Ile 405 410 415 Ala Asp Lys Glu Ser Arg His Leu Leu Phe Ile Gly Asp Gly Ser Leu 420 425 430 Gln Leu Thr Val Gln Glu Leu Gly Leu Ala Ile Arg Glu Lys Ile Asn 435 440 445 Pro Ile Cys Phe Ile Ile Asn Asn Asp Gly Tyr Thr Val Glu Arg Glu 450 455 460 Ile His Gly Pro Asn Gln Ser Tyr Asn Asp Ile Pro Met Trp Asn Tyr 465 470 475 480 Ser Lys Leu Pro Glu Ser Phe Gly Ala Thr Glu Asp Arg Val Val Ser 485 490 495 Lys Ile Val Arg Thr Glu Asn Glu Phe Val Ser Val Met Lys Glu Ala 500 505 510 Gln Ala Asp Pro Asn Arg Met Tyr Trp Ile Glu Leu Ile Leu Ala Lys 515 520 525 Glu Gly Ala Pro Lys Val Leu Lys Lys Met Gly Lys Leu Phe Ala Glu 530 535 540 Gln Asn Lys Ser 545 1741023DNALactococcus lactis 174atgaaagcag cagtagtaag acacaatcca gatggttatg cggaccttgt tgaaaaggaa 60cttcgagcaa tcaaacctaa tgaagctttg cttgacatgg agtattgtgg agtctgtcat 120accgatttgc acgttgcagc aggtgattat ggcaacaaag cagggactgt tcttggtcat 180gaaggaattg gaattgtcaa agaaattgga gctgatgtaa gctcgcttca agttggtgat 240cgggtttcag tggcttggtt ctttgaagga tgtggtcact gtgaatactg tgtatctggt 300aatgaaactt tttgtcgaga agttaaaaat gcaggatatt cagttgatgg cggaatggct 360gaagaagcaa ttgttgttgc cgattatgct gtcaaagttc ctgacggact tgacccaatt 420gaagctagct caattacttg tgctggagta acaacttaca aagcaatcaa agtatcagga 480gtaaaacctg gtgattggca agtaattttt ggtgctggag gacttggaaa tttagcaatt 540caatatgcta aaaatgtttt tggagcaaaa gtaattgctg ttgatattaa tcaagataaa 600ttaaatttag ctaaaaaaat tggagctgat gtgattatca attctggtga tgtaaatcca 660gttgatgaaa ttaaaaaaat aactggcggc ttaggggtgc aaagtgcaat agtttgtgct 720gttgcaagga ttgcttttga acaagcggtt gcttctttga aacctatggg caaaatggtt 780gctgtggcac ttcccaatac tgagatgact ttatcagttc caacagttgt 
ttttgacgga 840gtggaggttg caggttcact tgtcggaaca agacttgact tggcagaagc ttttcaattt 900ggagcagaag gtaaggtaaa accaattgtt gcgacacgca aactggaaga aatcaatgat 960attattgatg aaatgaaggc aggaaaaatt gaaggccgaa tggtcattga ttttactaaa 1020taa 1023175340PRTLactococcus lactis 175Met Lys Ala Ala Val Val Arg His Asn Pro Asp Gly Tyr Ala Asp Leu 1 5 10 15 Val Glu Lys Glu Leu Arg Ala Ile Lys Pro Asn Glu Ala Leu Leu Asp 20 25 30 Met Glu Tyr Cys Gly Val Cys His Thr Asp Leu His Val Ala Ala Gly 35 40 45 Asp Tyr Gly Asn Lys Ala Gly Thr Val Leu Gly His Glu Gly Ile Gly 50 55 60 Ile Val Lys Glu Ile Gly Ala Asp Val Ser Ser Leu Gln Val Gly Asp 65 70 75 80 Arg Val Ser Val Ala Trp Phe Phe Glu Gly Cys Gly His Cys Glu Tyr 85 90 95 Cys Val Ser Gly Asn Glu Thr Phe Cys Arg Glu Val Lys Asn Ala Gly 100 105 110 Tyr Ser Val Asp Gly Gly Met Ala Glu Glu Ala Ile Val Val Ala Asp 115 120 125 Tyr Ala Val Lys Val Pro Asp Gly Leu Asp Pro Ile Glu Ala Ser Ser 130 135 140 Ile Thr Cys Ala Gly Val Thr Thr Tyr Lys Ala Ile Lys Val Ser Gly 145 150 155 160 Val Lys Pro Gly Asp Trp Gln Val Ile Phe Gly Ala Gly Gly Leu Gly 165 170 175 Asn Leu Ala Ile Gln Tyr Ala Lys Asn Val Phe Gly Ala Lys Val Ile 180 185 190 Ala Val Asp Ile Asn Gln Asp Lys Leu Asn Leu Ala Lys Lys Ile Gly 195 200 205 Ala Asp Val Ile Ile Asn Ser Gly Asp Val Asn Pro Val Asp Glu Ile 210 215 220 Lys Lys Ile Thr Gly Gly Leu Gly Val Gln Ser Ala Ile Val Cys Ala 225 230 235 240 Val Ala Arg Ile Ala Phe Glu Gln Ala Val Ala Ser Leu Lys Pro Met 245 250 255 Gly Lys Met Val Ala Val Ala Leu Pro Asn Thr Glu Met Thr Leu Ser 260 265 270 Val Pro Thr Val Val Phe Asp Gly Val Glu Val Ala Gly Ser Leu Val 275 280 285 Gly Thr Arg Leu Asp Leu Ala Glu Ala Phe Gln Phe Gly Ala Glu Gly 290 295 300 Lys Val Lys Pro Ile Val Ala Thr Arg Lys Leu Glu Glu Ile Asn Asp 305 310 315 320 Ile Ile Asp Glu Met Lys Ala Gly Lys Ile Glu Gly Arg Met Val Ile 325 330 335 Asp Phe Thr Lys 340 176256PRTDrosophila melanogaster 176Met Ser Phe Thr Leu Thr Asn Lys Asn Val Ile Phe Val Ala Gly Leu 1 5 10 15 Gly Gly Ile 
Gly Leu Asp Thr Ser Lys Glu Leu Leu Lys Arg Asp Leu 20 25 30 Lys Asn Leu Val Ile Leu Asp Arg Ile Glu Asn Pro Ala Ala Ile Ala 35 40 45 Glu Leu Lys Ala Ile Asn Pro Lys Val Thr Val Thr Phe Tyr Pro Tyr 50 55 60 Asp Val Thr Val Pro Ile Ala Glu Thr Thr Lys Leu Leu Lys Thr Ile 65 70 75 80 Phe Ala Gln Leu Lys Thr Val Asp Val Leu Ile Asn Gly Ala Gly Ile 85 90 95 Leu Asp Asp His Gln Ile Glu Arg Thr Ile Ala Val Asn Tyr Thr Gly 100 105 110 Leu Val Asn Thr Thr Thr Ala Ile Leu Asp Phe Trp Asp Lys Arg Lys 115 120 125 Gly Gly Pro Gly Gly Ile Ile Cys Asn Ile Gly Ser Val Thr Gly Phe 130 135 140 Asn Ala Ile Tyr Gln Val Pro Val Tyr Ser Gly Thr Lys Ala Ala Val 145 150 155 160 Val Asn Phe Thr Ser Ser Leu Ala Lys Leu Ala Pro Ile Thr Gly Val 165 170 175 Thr Ala Tyr Thr Val Asn Pro Gly Ile Thr Arg Thr Thr Leu Val His 180 185 190 Thr Phe Asn Ser Trp Leu Asp Val Glu Pro Gln Val Ala Glu Lys Leu 195 200 205 Leu Ala His Pro Thr Gln Pro Ser Leu Ala Cys Ala Glu Asn Phe Val 210 215 220 Lys Ala Ile Glu Leu Asn Gln Asn Gly Ala Ile Trp Lys Leu Asp Leu 225 230 235 240 Gly Thr Leu Glu Ala Ile Gln Trp Thr Lys His Trp Asp Ser Gly Ile 245 250 255 177340PRTArtificial SequenceL. 
lactis AdhA RE1 177Met Lys Ala Ala Val Val Arg His Asn Pro Asp Gly Tyr Ala Asp Leu 1 5 10 15 Val Glu Lys Glu Leu Arg Ala Ile Lys Pro Asn Glu Ala Leu Leu Asp 20 25 30 Met Glu Tyr Cys Gly Val Cys His Thr Asp Leu His Val Ala Ala Gly 35 40 45 Asp Phe Gly Asn Lys Ala Gly Thr Val Leu Gly His Glu Gly Ile Gly 50 55 60 Ile Val Lys Glu Ile Gly Ala Asp Val Ser Ser Leu Gln Val Gly Asp 65 70 75 80 Arg Val Ser Val Ala Trp Phe Phe Glu Gly Cys Gly His Cys Glu Tyr 85 90 95 Cys Val Ser Gly Asn Glu Thr Phe Cys Arg Glu Val Lys Asn Ala Gly 100 105 110 Tyr Ser Val Asp Gly Gly Met Ala Glu Glu Ala Ile Val Val Ala Asp 115 120 125 Tyr Ala Val Lys Val Pro Asp Gly Leu Asp Pro Ile Glu Ala Ser Ser 130 135 140 Ile Thr Cys Ala Gly Val Thr Thr Tyr Lys Ala Ile Lys Val Ser Gly 145 150 155 160 Val Lys Pro Gly Asp Trp Gln Val Ile Phe Gly Ala Gly Gly Leu Gly 165 170 175 Asn Leu Ala Ile Gln Tyr Ala Lys Asn Val Phe Gly Ala Lys Val Ile 180 185 190 Ala Val Asp Ile Asn Gln Asp Lys Leu Asn Leu Ala Lys Lys Ile Gly 195 200 205 Ala Asp Val Thr Ile Asn Ser Gly Asp Val Asn Pro Val Asp Glu Ile 210 215 220 Lys Lys Ile Thr Gly Gly Leu Gly Val Gln Ser Ala Ile Val Cys Ala 225 230
235 240 Val Ala Arg Ile Ala Phe Glu Gln Ala Val Ala Ser Leu Lys Pro Met 245 250 255 Gly Lys Met Val Ala Val Ala Val Pro Asn Thr Glu Met Thr Leu Ser 260 265 270 Val Pro Thr Val Val Phe Asp Gly Val Glu Val Ala Gly Ser Leu Val 275 280 285 Gly Thr Arg Leu Asp Leu Ala Glu Ala Phe Gln Phe Gly Ala Glu Gly 290 295 300 Lys Val Lys Pro Ile Val Ala Thr Arg Lys Leu Glu Glu Ile Asn Asp 305 310 315 320 Ile Ile Asp Glu Met Lys Ala Gly Lys Ile Glu Gly Arg Met Val Ile 325 330 335 Asp Phe Thr Lys 340

* * * * *