Enhancing Microbial Metabolism Of C5 Organic Carbon Merkx-Jacques; Alexandra ; et al. [MARA Renewables Corporation]

Patent Application Summary

U.S. patent application number 16/864098 was filed with the patent office on 2020-04-30 and published on 2020-10-15 for enhancing microbial metabolism of C5 organic carbon. The applicant listed for this patent is MARA Renewables Corporation. Invention is credited to Roberto E. Armenta, Jeremy Benjamin, Alexandra Merkx-Jacques, Denise Muise, Holly Rasmussen, Mark Scaife, David Woodhall.

Publication Number	20200325465
Application Number	16/864098
Family ID	1000004925969
Filed Date	2020-04-30
Publication Date	2020-10-15

United States Patent Application	20200325465
Kind Code	A1
Merkx-Jacques; Alexandra ; et al.	October 15, 2020

ENHANCING MICROBIAL METABOLISM OF C5 ORGANIC CARBON

Abstract

Provided herein are recombinant microorganisms having two or more copies of a nucleic acid sequence encoding xylose isomerase, wherein the nucleic acid encoding the xylose isomerase is an exogenous nucleic acid. Optionally, the recombinant microorganisms include at least one nucleic acid sequence encoding a xylulose kinase and/or at least one nucleic acid sequence encoding a xylose transporter. The provided recombinant microorganisms are capable of growing on xylose as a carbon source.

Inventors:

Merkx-Jacques; Alexandra; (Dartmouth, CA) ; Woodhall; David; (Dartmouth, CA) ; Scaife; Mark; (Dartmouth, CA) ; Armenta; Roberto E.; (Dartmouth, CA) ; Muise; Denise; (Dartmouth, CA) ; Rasmussen; Holly; (Dartmouth, CA) ; Benjamin; Jeremy; (Dartmouth, CA)

Applicant:

Name	City	State	Country	Type
MARA Renewables Corporation	Dartmouth		CA

Family ID:

1000004925969

Appl. No.:

16/864098

Filed:

April 30, 2020

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
15918786	Mar 12, 2018	10662418
16864098
15208849	Jul 13, 2016	9951326
15918786
62191983	Jul 13, 2015
62354444	Jun 24, 2016

Current U.S. Class:	1/1
Current CPC Class:	C12P 7/6427 20130101; C07K 14/40 20130101; C12N 1/12 20130101; C07K 14/38 20130101; C12Y 503/01005 20130101; C12N 1/10 20130101; C12Y 207/01017 20130101; C07K 2319/21 20130101; C12N 9/92 20130101; C12P 7/64 20130101; C12N 9/1205 20130101
International Class:	C12N 9/92 20060101 C12N009/92; C12N 9/12 20060101 C12N009/12; C07K 14/40 20060101 C07K014/40; C12P 7/64 20060101 C12P007/64; C07K 14/38 20060101 C07K014/38; C12N 1/12 20060101 C12N001/12; C12N 1/10 20060101 C12N001/10

Claims

1. A recombinant microorganism comprising two or more copies of a nucleic acid sequence encoding xylose isomerase, wherein the nucleic acid encoding xylose isomerase is an exogenous nucleic acid.

2. The recombinant microorganism of claim 1, further comprising at least one nucleic acid sequence encoding a xylulose kinase.

3. The recombinant microorganism of claim 2, further comprising at least one nucleic acid sequence encoding a xylose transporter.

4. The recombinant microorganism of claim 1, wherein the microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the exogenous nucleic acid sequence encoding xylose isomerase.

5. The recombinant microorganism of claim 1, wherein the nucleic acid sequence encoding the xylose isomerase is at least 90% identical to SEQ ID NO:2.

6. The recombinant microorganism of claim 2, wherein the microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the nucleic acid sequence encoding the xylulose kinase.

7. The recombinant microorganism of claim 6, wherein the nucleic acid sequence encoding the xylulose kinase is at least 90% identical to SEQ ID NO:5.

8. The recombinant microorganism of claim 3, wherein the microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the nucleic acid sequence encoding the xylose transporter.

9. The recombinant microorganism of claim 8, wherein the xylose transporter is GXS1 from Candida intermedia.

10. The recombinant microorganism of claim 8, wherein the nucleic acid sequence encoding the xylose transporter is at least 90% identical to SEQ ID NO:23.

11. The recombinant microorganism of claim 3, wherein the recombinant microorganism has increased xylose transport activity as compared to a non-recombinant control microorganism, increased xylose isomerase activity as compared to a non-recombinant control microorganism, increased xylulose kinase activity as compared to a non-recombinant control microorganism, or any combination thereof.

12. The recombinant microorganism of claim 3, wherein the recombinant microorganism grows with xylose as the sole carbon source.

13. The recombinant microorganism of claim 3, wherein the nucleic acid sequence encoding the xylose isomerase, the xylulose kinase and/or the xylose transporter is operably linked to a promoter.

14. The recombinant microorganism of claim 13, wherein the promoter is a tubulin promoter that is at least 80% identical to SEQ ID NO:25 or SEQ ID NO:26.

15. The recombinant microorganism of claim 3, wherein the nucleic acid sequence encoding the xylose isomerase, the xylulose kinase and/or the xylose transporter comprises a terminator.

16. The recombinant microorganism of claim 15, wherein the terminator is a tubulin terminator that is at least 80% identical to SEQ ID NO:27, SEQ ID NO:28, or SEQ ID NO:30.

17. The recombinant microorganism of claim 1, wherein the microorganism is either a Thraustochytrium or a Schizochytrium microorganism.

18. The recombinant microorganism of claim 17, wherein the microorganism is ONC-T18.

19. A method of making a recombinant xylose-metabolizing microorganism comprising: providing one or more nucleic acid constructs comprising a nucleic acid sequence encoding a xylose isomerase, a nucleic acid sequence encoding a xylulose kinase and a nucleic acid sequence encoding a xylose transporter; transforming the microorganism with the one or more nucleic acid constructs; and isolating microorganisms comprising at least two or more copies of the nucleic acid sequences encoding the xylose isomerase.

20. The method of claim 19, further comprising isolating microorganisms comprising at least one copy of the nucleic acid sequence encoding the xylulose kinase.

21. The method of claim 20, further comprising isolating microorganisms comprising at least one copy of the xylose transporter.

22. The method of claim 19, wherein the providing comprises providing a first nucleic acid construct comprising a nucleic acid sequence encoding a xylose isomerase, a second nucleic acid construct comprising a nucleic acid sequence encoding a xylulose kinase and a third nucleic acid construct comprising a nucleic acid sequence encoding a xylose transporter.

23. The method of claim 22, wherein the first, second, and/or third nucleic acid construct comprises a promoter, a selectable marker, a nucleic acid sequence encoding a 2A peptide, the nucleic acid sequence encoding the xylose isomerase, and a terminator.

24. The method of claim 23, wherein the promoter is a tubulin promoter that is at least 80% identical to SEQ ID NO:25 or SEQ ID NO:26.

25. The method of claim 23, wherein the terminator is a tubulin terminator that is at least 80% identical to SEQ ID NO:27, SEQ ID NO:28, or SEQ ID NO:30.

26. The method of claim 19, wherein the isolated recombinant microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the exogenous nucleic acid sequence encoding xylose isomerase.

27. The method of claim 19, wherein the nucleic acid sequence encoding the xylose isomerase is at least 90% identical to SEQ ID NO:2.

28. The method of claim 19, wherein the isolated recombinant microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the nucleic acid sequence encoding the xylulose kinase.

29. The method of claim 19, wherein the nucleic acid sequence encoding the xylulose kinase is at least 90% identical to SEQ ID NO:5.

30. The method of claim 19, wherein the isolated recombinant microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the nucleic acid sequence encoding the xylose transporter.

31. The method of claim 30, wherein the xylose transporter is GXS1 from Candida intermedia.

32. The method of claim 30, wherein the nucleic acid sequence encoding the xylose transporter is at least 90% identical to SEQ ID NO:23.

33. The method of claim 19, wherein the isolated recombinant microorganism has increased xylose transport activity as compared to a control non-recombinant microorganism, increased xylose isomerase activity as compared to a control non-recombinant microorganism, increased xylulose kinase activity as compared to a control non-recombinant microorganism, or a combination thereof.

34. The method of claim 19, wherein the isolated recombinant microorganism grows with xylose as the sole carbon source.

35. The method of claim 19, wherein the microorganism is either a Thraustochytrium or a Schizochytrium microorganism.

36. The method of claim 19, wherein the microorganism is ONC-T18.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of currently pending U.S. application Ser. No. 15/918,786, filed Mar. 12, 2018, which is a continuation of U.S. application Ser. No. 15/208,849, filed Jul. 13, 2016 (now U.S. Pat. No. 9,951,326), which claims the benefit of priority to U.S. Provisional Application No. 62/191,983, filed Jul. 13, 2015, and U.S. Provisional Application No. 62/354,444, filed Jun. 24, 2016, which are incorporated by reference herein in their entireties.

BACKGROUND OF THE INVENTION

[0002] Heterotrophic fermentation of microorganisms is an efficient way of generating high value oil and biomass products. Under certain cultivation conditions, microorganisms synthesize intracellular oil, which can be extracted and used to produce fuel (e.g., biodiesel, bio-jetfuel, and the like) and nutritional lipids (e.g., polyunsaturated fatty acids such as DHA, EPA, and DPA). The biomass of some microorganisms is of great nutritional value due to high polyunsaturated fatty acid (PUFA) and protein content, and can be used as a nutritional supplement for animal feed. Thraustochytrids are eukaryotic, single-cell microorganisms that can be used in such fermentation processes to produce lipids. Heterotrophic fermentations with Thraustochytrids convert organic carbon provided in the growth medium to lipids, which are harvested from the biomass at the end of the fermentation process. However, existing microorganism fermentations use mainly expensive carbohydrates, such as glucose, as the carbon source.

BRIEF SUMMARY OF THE INVENTION

[0003] Provided herein are recombinant microorganisms having two or more copies of a nucleic acid sequence encoding xylose isomerase, wherein the nucleic acid encoding the xylose isomerase is an exogenous nucleic acid. Optionally, the recombinant microorganisms include at least one nucleic acid sequence encoding a xylulose kinase and/or at least one nucleic acid sequence encoding a xylose transporter. The provided recombinant microorganisms are capable of growing on xylose as a carbon source.

BRIEF DESCRIPTION OF THE DRAWINGS

[0004] FIG. 1 is a schematic of the xylose metabolism pathway.

[0005] FIG. 2 is a graph showing expression of xylose isomerase in WT ONC-T18 during cycles of glucose starvation.

[0006] FIG. 3 is a graph showing expression of the putative xylulose kinase in WT ONC-T18 during cycles of glucose starvation.

[0007] FIG. 4 is a schematic showing an alpha-tubulin ble-isomerase plasmid construct.

[0008] FIG. 5 is a schematic showing an alpha-tubulin hygro-xylB plasmid construct.

[0009] FIG. 6 is a schematic showing a nucleic acid construct having an alpha-tubulin promoter, a ble sequence, a 2A sequence, a xylose isomerase sequence, and an alpha-tubulin terminator.

[0010] FIG. 7 is an image of a Southern blot to probe the xylose isomerase His-tagged gene within recombinant ONC-T18 strains "6" and "16".

[0011] FIG. 8 is a graph showing the qPCR determination of the number of xylose isomerase His-tagged gene insertions in recombinant ONC-T18 strains.

[0012] FIG. 9 is an image of a Southern blot to probe the xylB gene within recombinant ONC-T18 strains containing both xylose isomerase and xylulose kinase referred to in the graph as "7-3" and "7-7".

[0013] FIG. 10 is a graph of qPCR determination of the number of xylB gene insertions in recombinant 7-3 and 7-7 ONC-T18 strains.

[0014] FIG. 11 is a graph showing the expression of the xylose isomerase gene transcript in recombinant ONC-T18 strains "6" and "16."

[0015] FIG. 12 is a graph showing the in vitro xylose isomerase activity in WT ONC-T18 and recombinant ONC-T18 strains "6" and "16."

[0016] FIG. 13 is a graph showing the combined xylose isomerase and xylulose kinase activity in vitro of recombinant ONC-T18 strain "16" encoding only xylose isomerase and recombinant ONC-T18 strains "7-3" and "7-7" encoding xylose isomerase and xylulose kinase.

[0017] FIGS. 14A and 14B are graphs showing xylose uptake improvement and decreased xylitol production in recombinant ONC-T18 strain "16" (squares). The Wild Type (WT) strain is represented by diamonds.

[0018] FIGS. 15A and 15B are graphs showing xylose usage improvement and decreased xylitol production in recombinant ONC-T18 strain "16" (squares) and recombinant ONC-T18 strains "7-3" (triangles) and "7-7" (asterisks). The Wild Type (WT) strain is represented by diamonds.

[0019] FIG. 16 is a graph showing accumulation of xylitol during a glucose:xylose fermentation with recombinant ONC-T18 strain "16" and recombinant ONC-T18 strain "7-7."

[0020] FIG. 17 is a schematic of different versions of the constructs used for transformation of ONC-T18.

[0021] FIG. 18 is a graph showing the alignment of the xylB sequence from E. coli (SEQ ID NO:20) with the codon optimized version of E. coli xylB (SEQ ID NO:5).

[0022] FIGS. 19A, 19B, and 19C are graphs showing xylose usage (FIG. 19A), glucose usage (FIG. 19B) and percent xylitol made (FIG. 19C) in strains comprising xylose isomerase, xylulose kinase and the sugar transporter Gxs1. WT is wild-type; IsoHis XylB "7-7" contains the xylose isomerase and xylB sequences, 36-2, 36-9 and 36-16 are transformants containing Gxs1, xylose isomerase and the xylB sequences (xylulose kinase).

[0023] FIGS. 20A and 20B are graphs showing the impact of temperature incubation on the activity of isomerase from T18 (FIG. 20A) and E. coli (FIG. 20B) with xylose (diamond) and xylulose (square).

[0024] FIGS. 21A and 21B are graphs showing dose dependency of isomerase from T18 (FIG. 21A) and E. coli (FIG. 21B) with xylose (diamond) and xylulose (square).

[0025] FIGS. 22A and 22B are graphs showing xylose use (FIG. 22A) and decreased xylitol production (FIG. 22B) in a T18B strain engineered with xylose isomerases ("16" (squares), "B" (x), and "6" (crosses)). FIGS. 22C (xylose) and 22D (xylitol production) show the same data expressed relative to wild type (diamonds) at 4 (gray) and 7 (black) days.

[0026] FIGS. 23A and 23B are graphs showing xylose use and decreased xylitol production in a T18B strain engineered with a xylose isomerase "16" (squares) and strains engineered to express a xylose isomerase and xylulose kinase "7-7" (x) and "7-3" (triangles). FIGS. 23C (xylose) and 23D (xylitol production) show the same data relative to wild type (diamonds) at 9 (gray) and 11 (black) days.

[0027] FIG. 24 is a graph showing improved xylose usage and decreased xylitol production in a T18B strain engineered to express a xylose isomerase and xylulose kinase "7-7" in fermentation. The wild type strain is represented by diamonds and the dotted line and the strain "7-7" is represented by circles.

[0028] FIG. 25 is a schematic showing α-tubulin aspTx-neo and α-tubulin gxs1-neo constructs.

[0029] FIG. 26A is an image of a Southern blot to probe the Gxs1 gene within "7-7" T18B strains engineered with the xylose transporter Gxs1. FIG. 26B is an image of a Southern blot to probe the AspTx gene within "7-7" T18B strains engineered with the xylose transporter AspTx.

[0030] FIG. 27A is a graph showing the use of xylose in T18 strains engineered with a xylose isomerase, a xylulose kinase and either the Gxs1 transporter (triangles) or AspTx transporter (circles). Strain "7-7" is represented by diamonds. FIG. 27B is a bar graph of the ratio of xylitol production versus xylose use for each of the 3 modified strains. FIG. 27C is a bar graph showing xylose use relative to strain "7-7." FIG. 27D is a bar graph showing xylitol production made relative to strain "7-7."

[0031] FIG. 28 is a graph showing growth of wild type (WT) (diamonds), isohis strain "16" (squares), strain "7-7" (x), and transporter strains Gxs1 (asterisks) and AspTx (triangles) in media containing xylose as sole carbon source.

[0032] FIG. 29A is a graph showing remaining glucose in alternative feedstock containing glucose and xylose during growth of WT (squares), strain "7-7" (triangles), and transporter strains Gxs1 (asterisks) and AspTx (crosses). FIG. 29B is a graph showing xylose remaining and xylitol produced over time when WT (squares) strain "7-7" (triangles) and transporter strains Gxs1 (asterisks) and AspTx (crosses) are grown on alternative feedstock containing glucose and xylose.

DETAILED DESCRIPTION OF THE INVENTION

[0033] Microorganisms such as Thraustochytrids encode genes required for the metabolism of xylose. However, the microorganism's innate metabolic pathways produce a large amount of the sugar alcohol, xylitol, which is secreted and potentially hinders growth of the microorganisms (see FIG. 14, WT). Furthermore, carbon atoms sequestered into xylitol are atoms that are diverted away from the target product in this process, namely, lipid production. In nature, two xylose metabolism pathways exist: the xylose reductase/xylitol dehydrogenase pathway and the xylose isomerase/xylulose kinase pathway (FIG. 1). Thraustochytrids have genes that encode proteins active in both pathways; however, the former pathway appears to be dominant, as evidenced by a build-up of xylitol when grown in a xylose medium. In other organisms, the build-up of xylitol has been shown to be due to an imbalance in the redox co-factors required for the xylose reductase/xylitol dehydrogenase pathway. Since the isomerase/kinase pathway does not depend on redox co-factors, over-expression of the isomerase gene removes co-factor dependence in the conversion of xylose to xylulose. As shown herein, transcriptomic studies with ONC-T18 showed that its xylose isomerase and putative xylulose kinase genes are mostly expressed during glucose starvation (FIG. 2 and FIG. 3), whereas the putatively identified genes encoding the xylose reductase and xylitol dehydrogenase are constitutively expressed. To increase the expression of the isomerase and kinase throughout all growth stages, microorganisms were engineered to include an ONC-T18 isomerase gene and an E. coli xylulose kinase gene (xylB) such that they are under the control of a constitutively expressed promoter and terminator, e.g., an α-tubulin promoter and terminator. Optionally, the genes can be under the control of an inducible promoter and/or terminator.

[0034] The provided recombinant microorganisms demonstrate a level of control of the amount of expression of a gene of interest via the number of integrated transgene copies. As shown in the examples below, a recombinant ONC-T18 strain (Iso-His #16) harbouring eight (8) transgene copies demonstrates higher levels of xylose isomerase transcript expression, enzyme activity and xylose metabolism than a strain harbouring a single copy of the transgene (Iso-His #6). When Iso-His #16 was further modified to incorporate the xylB gene, a similar phenomenon was observed. Multiple copies of the xylB gene conferred greater enzyme activity and xylose metabolism productivity compared to single insertions. Thus, unexpectedly, it was necessary not only to recreate a xylose metabolism pathway, but to do so with multiple copies of the necessary transgenes. It was not anticipated that the Thraustochytrid genome could accommodate multiple transgene copies and remain viable; therefore, it was not expected to observe such variability in expression levels amongst transformant strains. However, as provided herein, recombinant microorganisms can be produced that allow for controlled expression levels of transgenes indirectly by selecting among transformant strains that possess a transgene copy number "tailored" to a particular expression level optimized for the metabolic engineering of a particular pathway, e.g., the xylose pathway.

[0035] Provided herein are nucleic acids encoding one or more genes involved in xylose metabolism. The present application provides recombinant microorganisms, methods for making the microorganisms, and methods for producing oil using the microorganisms that are capable of metabolizing xylose. Specifically, provided herein are nucleic acids and polypeptides encoding xylose isomerase, xylulose kinase and xylose transporters for modifying microorganisms to be capable of metabolizing xylose and/or growing on xylose as the sole carbon source. Thus, provided are nucleic acids encoding a xylose isomerase. The nucleic acid sequences can be endogenous or heterologous to the microorganism. Exemplary nucleic acid sequences of xylose isomerases include, but are not limited to, those from Piromyces sp., Streptococcus sp., and Thraustochytrids. For example, exemplary nucleic acid sequences encoding xylose isomerases include, but are not limited to, SEQ ID NO:2 and SEQ ID NO:15; and exemplary polypeptide sequences of xylose isomerase include, but are not limited to, SEQ ID NO:16. Exemplary nucleic acid sequences of xylulose kinases include, but are not limited to, those from E. coli, Piromyces sp., Saccharomyces sp., and Pichia sp. For example, exemplary nucleic acid sequences encoding xylulose kinases include, but are not limited to, SEQ ID NO:5, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19 and SEQ ID NO:20. Exemplary nucleic acid sequences encoding sugar transporters, e.g., xylose transporters, include, but are not limited to, those from Aspergillus sp., as well as Gfx1, Gxs1 and Sut1. For example, exemplary nucleic acid sequences encoding xylose transporters include, but are not limited to, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, and SEQ ID NO:24.

[0036] Nucleic acid, as used herein, refers to deoxyribonucleotides or ribonucleotides and polymers and complements thereof. The term includes deoxyribonucleotides or ribonucleotides in either single- or double-stranded form. The term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs). Unless otherwise indicated, conservatively modified variants of nucleic acid sequences (e.g., degenerate codon substitutions) and complementary sequences can be used in place of a particular nucleic acid sequence recited herein. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and polynucleotide.
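
The idea of a degenerate codon substitution described above can be illustrated with a short Python sketch. It assumes the standard genetic code and checks whether two coding sequences that differ only at synonymous positions translate to the same polypeptide. The function names and the example sequences are hypothetical and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (length a multiple of 3) to protein."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_synonymous(seq_a: str, seq_b: str) -> bool:
    """Return True when both sequences encode the same polypeptide."""
    return translate(seq_a) == translate(seq_b)


# Hypothetical example: third-position changes that preserve Met-Ala-Leu-Gly.
original = "ATGGCTCTGGGT"
recoded = "ATGGCCCTCGGC"
print(translate(original), is_synonymous(original, recoded))
```

A codon-optimized gene, such as the xylB variant aligned in FIG. 18, would be expected to pass a check of this kind against its source sequence, although the sketch itself makes no claim about those particular sequences.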

[0037] A nucleic acid is operably linked when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA that encodes a presequence or secretory leader is operably linked to DNA that encodes a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation. Generally, operably linked means that the DNA sequences being linked are near each other, and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers do not have to be contiguous. For example, a nucleic acid sequence that is operably linked to a second nucleic acid sequence is covalently linked, either directly or indirectly, to such second sequence, although any effective three-dimensional association is acceptable. A single nucleic acid sequence can be operably linked to multiple other sequences. For example, a single promoter can direct transcription of multiple RNA species. Linking can be accomplished by ligation at convenient restriction sites. If such sites do not exist, the synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice.

[0038] The terms identical or percent identity, in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same (i.e., about 60% identity, preferably 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher identity over a specified region, when compared and aligned for maximum correspondence over a comparison window or designated region) as measured using the BLAST or BLAST 2.0 sequence comparison algorithms with default parameters described below, or by manual alignment and visual inspection (see, e.g., NCBI web site or the like). Such sequences are then said to be substantially identical. This definition also refers to, or may be applied to, the complement of a test sequence. The definition also includes sequences that have deletions and/or additions, as well as those that have substitutions. As described below, the preferred algorithms can account for gaps and the like. Preferably, identity exists over a region that is at least about 25 amino acids or nucleotides in length, or more preferably over a region that is 50-100 amino acids or nucleotides in length.
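
As a minimal sketch of how a percent identity figure can be computed once two sequences have been aligned, the following Python function counts matching columns. It is not the BLAST calculation itself. The treatment of gaps (a column with a gap in only one sequence counts as a mismatch) and the function names are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns that hold the same residue.

    Both inputs must already be aligned: equal length, '-' marking gaps.
    Assumption: columns that are gaps in both sequences are skipped, and
    a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / columns


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True when percent identity is at least the given threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 12-nucleotide example with a single mismatch (11 of 12 match).
print(meets_threshold("ATGGCTCTGGGT", "ATGGCCCTGGGT"))
```

A threshold such as "at least 90% identical" then reduces to a comparison against the returned value, subject to whichever alignment and gap convention is actually used.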

[0039] For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Preferably, default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.

[0040] A comparison window, as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981); by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970); by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988); by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.); or by manual alignment and visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds. 1995 supplement)).
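
The local homology approach cited above can be sketched in a few lines of Python. The version below computes only the best local alignment score, using a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) are illustrative assumptions rather than parameters taken from this disclosure or from the cited software packages.

```python
def smith_waterman_score(seq_a: str, seq_b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between two sequences (linear gap penalty)."""
    best = 0
    # prev holds the previous row of the dynamic programming matrix.
    prev = [0] * (len(seq_b) + 1)
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            diagonal = prev[j - 1] + (match if a == b else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical example: the shared core "ACGT" scores 4 matches x 2 = 8.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Keeping only two rows of the matrix is enough when the score alone is needed. Recovering the aligned region itself would require storing the full matrix and tracing back from the best cell.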

[0041] Preferred examples of algorithms suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., J. Mol. Biol. 215:403-410 (1990), and Altschul et al., Nuc. Acids Res. 25:3389-3402 (1997), respectively. BLAST and BLAST 2.0 are used, with the parameters described herein, to determine percent sequence identity for nucleic acids or proteins. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information, as known in the art. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of a selected length (W) in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The Expectation value (E) represents the number of different alignments with scores equivalent to or better than what is expected to occur in a database search by chance. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength of 3, expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)), alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
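
The seed-and-extend step described above can be sketched as follows for nucleotide sequences. This is a simplified, ungapped illustration and not the NCBI implementation: the reward M=5 and penalty N=-4 follow the BLASTN defaults quoted in the text, while the drop-off value X=20, the function names, and the example sequences are assumptions. The neighborhood threshold T, which applies to scoring-matrix searches, is omitted.

```python
def extend_one_direction(query, subject, q_pos, s_pos, step, start_score,
                         reward=5, penalty=-4, x_drop=20):
    """Extend without gaps from (q_pos, s_pos) in direction step (+1 or -1).

    Returns (best_score, best_length), where best_length is the number of
    positions that were added to reach best_score.
    """
    score = start_score
    best, best_len, length = start_score, 0, 0
    while 0 <= q_pos < len(query) and 0 <= s_pos < len(subject):
        score += reward if query[q_pos] == subject[s_pos] else penalty
        length += 1
        if score > best:
            best, best_len = score, length
        # Halt when the score reaches zero or has fallen by X from its maximum.
        if score <= 0 or best - score >= x_drop:
            break
        q_pos += step
        s_pos += step
    return best, best_len


def extend_seed(query, subject, q_start, s_start, word_len=11,
                reward=5, penalty=-4, x_drop=20):
    """Grow an exact word match into a high scoring pair (HSP).

    Returns (score, q_begin, q_end) with q_end exclusive.
    """
    word = query[q_start:q_start + word_len]
    if len(word) != word_len or word != subject[s_start:s_start + word_len]:
        raise ValueError("seed is not an exact word match of length word_len")
    seed_score = reward * word_len
    right_best, right_len = extend_one_direction(
        query, subject, q_start + word_len, s_start + word_len, +1,
        seed_score, reward, penalty, x_drop)
    left_best, left_len = extend_one_direction(
        query, subject, q_start - 1, s_start - 1, -1,
        right_best, reward, penalty, x_drop)
    return left_best, q_start - left_len, q_start + word_len + right_len


# Hypothetical example with a short word length (W=4) for readability.
query = "GGACGTACGTTT"
subject = "CCACGTACGTAA"
print(extend_seed(query, subject, 2, 2, word_len=4))
```

In this example the seed "ACGT" grows rightward across a second "ACGT" and stops at the sequence ends, so the reported pair covers query[2:10]. A smaller X stops the extension sooner after a run of mismatches, which trades sensitivity for speed as the text describes.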

[0042] The term polypeptide, as used herein, generally has its art-recognized meaning of a polymer of at least three amino acids and is intended to include peptides and proteins. However, the term is also used to refer to specific functional classes of polypeptides, such as, for example, desaturases, elongases, etc. For each such class, the present disclosure provides several examples of known sequences of such polypeptides. Those of ordinary skill in the art will appreciate, however, that the term polypeptide is intended to be sufficiently general as to encompass not only polypeptides having the complete sequence recited herein (or in a reference or database specifically mentioned herein), but also to encompass polypeptides that represent functional fragments (i.e., fragments retaining at least one activity) of such complete polypeptides. Moreover, those in the art understand that protein sequences generally tolerate some substitution without destroying activity. Thus, any polypeptide that retains activity and shares at least about 30-40% overall sequence identity, often greater than about 50%, 60%, 70%, or 80%, and further usually including at least one region of much higher identity, often greater than 90% or even 95%, 96%, 97%, 98%, or 99% in one or more highly conserved regions, usually encompassing at least 3-4 and often up to 20 or more amino acids, with another polypeptide of the same class, is encompassed within the relevant term polypeptide as used herein. Those in the art can determine other regions of similarity and/or identity by analysis of the sequences of various polypeptides described herein. As is known by those in the art, a variety of strategies are known, and tools are available, for performing comparisons of amino acid or nucleotide sequences in order to assess degrees of identity and/or similarity. These strategies include, for example, manual alignment, computer assisted sequence alignment and combinations thereof. 
A number of algorithms (which are generally computer implemented) for performing sequence alignment are widely available, or can be produced by one of skill in the art. Representative algorithms include, e.g., the local homology algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2: 482); the homology alignment algorithm of Needleman and Wunsch (J. Mol. Biol., 1970, 48: 443); the search for similarity method of Pearson and Lipman (Proc. Natl. Acad. Sci. (USA), 1988, 85: 2444); and/or by computerized implementations of these algorithms (e.g., GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Dr., Madison, Wis.). Readily available computer programs incorporating such algorithms include, for example, BLASTN, BLASTP, Gapped BLAST, PILEUP, CLUSTALW, etc. When utilizing BLAST and Gapped BLAST programs, default parameters of the respective programs may be used. Alternatively, the practitioner may use non-default parameters depending on his or her experimental and/or other requirements (see for example, the Web site having URL www.ncbi.nlm.nih.gov).
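By way of illustration only, a percent identity of the kind referred to above can be computed from a global alignment in the manner of Needleman and Wunsch, as in the following minimal sketch. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the choice of alignment length as the denominator are assumptions made for this example; the programs named above use their own scoring schemes and may define identity differently.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return one optimal global alignment of a and b as two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of alignment length."""
    x, y = needleman_wunsch(a, b)
    if not x:
        return 0.0  # two empty sequences: no positions to compare
    matches = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * matches / len(x)


if __name__ == "__main__":
    print(percent_identity("ACGT", "AGT"))
```

A threshold such as "at least 90% identical" could then be checked by comparing the returned value against 90.0, subject to the same assumptions.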

[0043] As discussed above, the nucleic acids encoding the xylose transporter, xylulose kinase and xylose isomerase can be linked to a promoter and/or terminator. Examples of promoters and terminators include, but are not limited to, tubulin promoters and terminators. By way of example, the promoter is a tubulin promoter, e.g., an alpha-tubulin promoter. Optionally, the promoter is at least 80% identical to SEQ ID NO:25 or SEQ ID NO:26. Optionally, the terminator is a tubulin terminator. Optionally, the terminator is at least 80% identical to SEQ ID NO:27, SEQ ID NO:28, or SEQ ID NO:30.

[0044] As used herein, the terms promoter, promoter element, and regulatory sequence refer to a polynucleotide that regulates expression of a selected polynucleotide sequence operably linked to the promoter, and that effects expression of the selected polynucleotide sequence in cells. The term Thraustochytrium promoter, as used herein, refers to a promoter that functions in a Thraustochytrium cell. In some embodiments, a promoter element is or comprises untranslated regions (UTRs) in a position 5' of coding sequences. 5' UTRs form part of the mRNA transcript and so are an integral part of protein expression in eukaryotic organisms. Following transcription, 5' UTRs can regulate protein expression at both the transcription and translation levels.

[0045] As used herein, the term terminator refers to a polynucleotide that abrogates expression of, targets for maturation (e.g., adding a polyA tail), or imparts mRNA stability to a selected polynucleotide sequence operably linked to the terminator in cells. A terminator sequence may be downstream of a stop codon in a gene. The term Thraustochytrium terminator, as used herein, refers to a terminator that functions in a Thraustochytrium cell. Provided herein are also nucleic acid constructs that include nucleic acid sequences encoding xylose isomerase, xylulose kinase and xylose transporter as well as promoters, terminators, selectable markers, 2A peptides or any combination thereof. By way of example, provided is a first nucleic acid construct including a promoter, a selectable marker, a nucleic acid sequence encoding a 2A peptide, a nucleic acid sequence encoding a xylose isomerase, and a terminator. Also provided is a second nucleic acid construct including a promoter, selectable marker, a nucleic acid sequence encoding a 2A peptide, a nucleic acid sequence encoding a xylulose kinase, and a terminator. Further provided is a third nucleic acid construct including a promoter, a nucleic acid sequence encoding a xylose transporter, a nucleic acid sequence encoding a 2A peptide, a selectable marker, and a terminator. These constructs are exemplary and the nucleic acid sequences encoding the xylose isomerase, xylulose kinase and xylose transporter can be included on the same construct under control of the same or different promoters. Optionally, each of the nucleic acid sequences encoding the xylose isomerase, xylulose kinase and xylose transporter are on the same construct and are separated by 2A polypeptide sequences, e.g., as shown in SEQ ID NO:6. 
Thus, by way of example, a nucleic acid construct can include a tubulin promoter, nucleic acid sequences encoding a xylose isomerase, xylulose kinase, and xylose transporter separated by a nucleic acid sequence encoding SEQ ID NO:6, a tubulin terminator and a selectable marker. Optionally, the selectable marker is the ble gene. Optionally, the selectable marker comprises SEQ ID NO:29.
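By way of illustration only, the three exemplary constructs described above can be written down as ordered element lists, as in the following minimal sketch. The element names are plain-text labels chosen for this example, and reading the order in which the elements are recited as their physical order on the construct is an assumption rather than a statement of the disclosure.

```python
from typing import Dict, Tuple

# Elements of the first, second and third exemplary constructs, in the order
# in which the text recites them (assumed here to reflect construct layout).
EXEMPLARY_CONSTRUCTS: Dict[str, Tuple[str, ...]] = {
    "first": ("promoter", "selectable marker", "2A peptide",
              "xylose isomerase", "terminator"),
    "second": ("promoter", "selectable marker", "2A peptide",
               "xylulose kinase", "terminator"),
    "third": ("promoter", "xylose transporter", "2A peptide",
              "selectable marker", "terminator"),
}


def is_flanked(elements: Tuple[str, ...]) -> bool:
    """True if the element list starts with a promoter and ends with a terminator."""
    return len(elements) >= 3 and elements[0] == "promoter" and elements[-1] == "terminator"


def coding_elements(elements: Tuple[str, ...]) -> Tuple[str, ...]:
    """Return the elements that remain after removing promoter, terminator and 2A peptide."""
    regulatory_or_linker = ("promoter", "terminator", "2A peptide")
    return tuple(e for e in elements if e not in regulatory_or_linker)


if __name__ == "__main__":
    for name, elements in EXEMPLARY_CONSTRUCTS.items():
        print(name, is_flanked(elements), coding_elements(elements))
```

In this representation the third construct differs from the first two only in that the gene of interest precedes, rather than follows, the 2A peptide and selectable marker.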

[0046] The phrase selectable marker, as used herein, refers either to a nucleotide sequence, e.g., a gene, that encodes a product (polypeptide) that allows for selection, or to the gene product (e.g., polypeptide) itself. The term selectable marker is used herein as it is generally understood in the art and refers to a marker whose presence within a cell or organism confers a significant growth or survival advantage or disadvantage on the cell or organism under certain defined culture conditions (selective conditions). For example, the conditions may be the presence or absence of a particular compound or a particular environmental condition such as increased temperature, increased radiation, presence of a compound that is toxic in the absence of the marker, etc. The presence or absence of such compound(s) or environmental condition(s) is referred to as a selective condition or selective conditions. By growth advantage is meant either enhanced viability (e.g., cells or organisms with the growth advantage have an increased life span, on average, relative to otherwise identical cells), increased rate of proliferation (also referred to herein as growth rate) relative to otherwise identical cells or organisms, or both. In general, a population of cells having a growth advantage will exhibit fewer dead or nonviable cells and/or a greater rate of cell proliferation than a population of otherwise identical cells lacking the growth advantage. Although typically a selectable marker will confer a growth advantage on a cell, certain selectable markers confer a growth disadvantage on a cell, e.g., they make the cell more susceptible to the deleterious effects of certain compounds or environmental conditions than otherwise identical cells not expressing the marker. Antibiotic resistance markers are a non-limiting example of a class of selectable marker that can be used to select cells that express the marker. 
In the presence of an appropriate concentration of antibiotic (selective conditions), such a marker confers a growth advantage on a cell that expresses the marker. Thus, cells that express the antibiotic resistance marker are able to survive and/or proliferate in the presence of the antibiotic while cells that do not express the antibiotic resistance marker are not able to survive and/or are unable to proliferate in the presence of the antibiotic.

[0047] Examples of selectable markers include markers conferring resistance to common bacterial antibiotics, such as but not limited to ampicillin, kanamycin and chloramphenicol, as well as markers for selective compounds known to function in microalgae; examples include rrnS and AadA (aminoglycoside 3'-adenylyltransferase), which may be isolated from E. coli plasmid R538-1, conferring resistance to spectinomycin and streptomycin, respectively, in E. coli and some microalgae (Hollingshead and Vapnek, Plasmid 13:17-30, 1985; Meslet-Cladiere and Vallon, Eukaryot Cell. 10(12):1670-8, 2011). Another example is the 23S RNA protein, rrnL, which confers resistance to erythromycin (Newman, Boynton et al., Genetics, 126:875-888, 1990; Roffey, Golbeck et al., Proc. Natl Acad. Sci. USA, 88:9122-9126, 1991). Another example is Ble, a GC-rich gene isolated from Streptoalloteichus hindustanus that confers resistance to zeocin (Stevens, Purton et al., Mol. Gen. Genet., 251:23-30, 1996). Aph7 is yet another example, which is a Streptomyces hygroscopicus-derived aminoglycoside phosphotransferase gene that confers resistance to hygromycin B (Berthold, Schmitt et al., Protist 153(4):401-412, 2002). Additional examples include: AphVIII, a Streptomyces rimosus-derived aminoglycoside 3'-phosphotransferase type VIII that confers resistance to paromomycin in E. coli and some microalgae (Sizova, Lapina et al., Gene 181(1-2):13-18, 1996; Sizova, Fuhrmann et al., Gene 277(1-2):221-229, 2001); Nat & Sat-1, which encode nourseothricin acetyl transferase from Streptomyces noursei and streptothricin acetyl transferase from E. coli, which confer resistance to nourseothricin (Zaslavskaia, Lippmeier et al., Journal of Phycology 36(2):379-386, 2000); Neo, an aminoglycoside 3'-phosphotransferase, conferring resistance to the aminoglycosides kanamycin, neomycin, and the analog G418 (Hasnain, Manavathu et al., Molecular and Cellular Biology 5(12):3647-3650, 1985); and Cry1, a ribosomal protein S14 that confers resistance to emetine (Nelson, Savereide et al., Molecular and Cellular Biology 14(6):4011-4019, 1994).

[0048] Other selectable markers include nutritional markers, also referred to as auto- or auxo-trophic markers. These include photoautotrophy markers that impose selection based on the restoration of photosynthetic activity within a photosynthetic organism. Photoautotrophic markers include, but are not limited to, AtpB, TscA, PetB, NifH, psaA and psaB (Boynton, Gillham et al., Science 240(4858):1534-1538 1988; Goldschmidt-Clermont, Nucleic Acids Research 19(15):4083-4089, 1991; Kindle, Richards et al., PNAS, 88(5):1721-1725, 1991; Redding, MacMillan et al., EMBO J 17(1):50-60, 1998; Cheng, Day et al., Biochemical and Biophysical Research Communications 329(3):966-975, 2005). Alternative or additional nutritional markers include ARG7, which encodes argininosuccinate lyase, a critical step in arginine biosynthesis (Debuchy, Purton et al., EMBO J 8(10):2803-2809, 1989); NIT1, which encodes a nitrate reductase essential to nitrogen metabolism (Fernandez, Schnell et al., PNAS, 86(17):6449-6453, 1989); THI10, which is essential to thiamine biosynthesis (Ferris, Genetics 141(2):543-549, 1995); and NIC1, which catalyzes an essential step in nicotinamide biosynthesis (Ferris, Genetics 141(2):543-549, 1995). Such markers are generally enzymes that function in a biosynthetic pathway to produce a compound that is needed for cell growth or survival. In general, under nonselective conditions, the required compound is present in the environment or is produced by an alternative pathway in the cell. Under selective conditions, functioning of the biosynthetic pathway, in which the marker is involved, is needed to produce the compound.

[0049] The phrase selection agent, as used herein refers to an agent that introduces a selective pressure on a cell or populations of cells either in favor of or against the cell or population of cells that bear a selectable marker. For example, the selection agent is an antibiotic and the selectable marker is an antibiotic resistance gene. Optionally, zeocin is used as the selection agent.

[0050] Suitable microorganisms that can be transformed with the provided nucleic acids encoding the genes involved in xylose metabolism and nucleic acid constructs containing the same include, but are not limited to, algae (e.g., microalgae), fungi (including yeast), bacteria, or protists. Optionally, the microorganism includes Thraustochytrids of the order Thraustochytriales, more specifically Thraustochytriales of the genus Thraustochytrium. Optionally, the population of microorganisms includes Thraustochytriales as described in U.S. Pat. Nos. 5,340,594 and 5,340,742, which are incorporated herein by reference in their entireties. The microorganism can be a Thraustochytrium species, such as the Thraustochytrium species deposited as ATCC Accession No. PTA-6245 (i.e., ONC-T18) as described in U.S. Pat. No. 8,163,515, which is incorporated by reference herein in its entirety. Thus, the microorganism can have an 18S rRNA sequence that is at least 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9% or more (e.g., including 100%) identical to SEQ ID NO:1.

[0051] Microalgae are acknowledged in the field to represent a diverse group of organisms. For the purpose of this document, the term microalgae will be used to describe unicellular microorganisms derived from aquatic and/or terrestrial environments (some cyanobacteria are terrestrial/soil dwelling). Aquatic environments extend from oceanic environments to freshwater lakes and rivers, and also include brackish environments such as estuaries and river mouths. Microalgae can be photosynthetic; optionally, microalgae are heterotrophic. Microalgae can be of eukaryotic nature or of prokaryotic nature. Microalgae can be non-motile or motile.

[0052] The term thraustochytrid, as used herein, refers to any member of the order Thraustochytriales, which includes the family Thraustochytriaceae. Strains described as thraustochytrids include the following organisms: Order: Thraustochytriales; Family: Thraustochytriaceae; Genera: Thraustochytrium (Species: sp., arudimentale, aureum, benthicola, globosum, kinnei, motivum, multirudimentale, pachydermum, proliferum, roseum, striatum), Ulkenia (Species: sp., amoeboidea, kerguelensis, minuta, profunda, radiata, sailens, sarkariana, schizochytrops, visurgensis, yorkensis), Schizochytrium (Species: sp., aggregatum, limnaceum, mangrovei, minutum, octosporum), Japonochytrium (Species: sp., marinum), Aplanochytrium (Species: sp., haliotidis, kerguelensis, profunda, stocchinoi), Althornia (Species: sp., crouchii), or Elina (Species: sp., marisalba, sinorifica). Species described within Ulkenia will be considered to be members of the genus Thraustochytrium. Strains described as being within the genus Thraustochytrium may share traits in common with and also be described as falling within the genus Schizochytrium. For example, in some taxonomic classifications ONC-T18 may be considered within the genus Thraustochytrium, while in other classifications it may be described as within the genus Schizochytrium because it comprises traits indicative of both genera.

[0053] The term transformation, as used herein, refers to a process by which an exogenous or heterologous nucleic acid molecule (e.g., a vector or recombinant nucleic acid molecule) is introduced into a recipient cell or microorganism. The exogenous or heterologous nucleic acid molecule may or may not be integrated into (i.e., covalently linked to) chromosomal DNA making up the genome of the host cell or microorganism. For example, the exogenous or heterologous polynucleotide may be maintained on an episomal element, such as a plasmid. Alternatively or additionally, the exogenous or heterologous polynucleotide may become integrated into a chromosome so that it is inherited by daughter cells through chromosomal replication. Methods for transformation include, but are not limited to, calcium phosphate precipitation; Ca²⁺ treatment; fusion of recipient cells with bacterial protoplasts containing the recombinant nucleic acid; treatment of the recipient cells with liposomes containing the recombinant nucleic acid; DEAE dextran; fusion using polyethylene glycol (PEG); electroporation; magnetoporation; biolistic delivery; retroviral infection; lipofection; and micro-injection of DNA directly into cells.

[0054] The term transformed, as used in reference to cells, refers to cells that have undergone transformation as described herein such that the cells carry exogenous or heterologous genetic material (e.g., a recombinant nucleic acid). The term transformed can also or alternatively be used to refer to microorganisms, strains of microorganisms, tissues, organisms, etc. that contain exogenous or heterologous genetic material.

[0055] The term introduce, as used herein with reference to introduction of a nucleic acid into a cell or organism, is intended to have its broadest meaning and to encompass introduction, for example by transformation methods (e.g., calcium-chloride-mediated transformation, electroporation, particle bombardment), and also introduction by other methods including transduction, conjugation, and mating. Optionally, a construct is utilized to introduce a nucleic acid into a cell or organism.

[0056] The microorganisms for use in the methods described herein can produce a variety of lipid compounds. As used herein, the term lipid includes phospholipids, free fatty acids, esters of fatty acids, triacylglycerols, sterols and sterol esters, carotenoids, xanthophylls (e.g., oxycarotenoids), hydrocarbons, and other lipids known to one of ordinary skill in the art. Optionally, the lipid compounds include unsaturated lipids. The unsaturated lipids can include polyunsaturated lipids (i.e., lipids containing at least 2 unsaturated carbon-carbon bonds, e.g., double bonds) or highly unsaturated lipids (i.e., lipids containing 4 or more unsaturated carbon-carbon bonds). Examples of unsaturated lipids include omega-3 and/or omega-6 polyunsaturated fatty acids, such as docosahexaenoic acid (i.e., DHA), eicosapentaenoic acid (i.e., EPA), and other naturally occurring unsaturated, polyunsaturated and highly unsaturated compounds.

[0057] Provided herein are recombinant microorganisms engineered to express polypeptides for metabolizing C5 carbon sugars such as xylose. Specifically, provided is a recombinant microorganism having one or more copies of a nucleic acid sequence encoding xylose isomerase, wherein the nucleic acid encoding xylose isomerase is an exogenous nucleic acid. Optionally, the recombinant microorganism comprises two or more copies of the nucleic acid sequence encoding xylose isomerase. Optionally, the recombinant microorganism also contains one or two copies of an endogenous nucleic acid sequence encoding xylose isomerase. By way of example, the recombinant microorganism can contain one or two copies of an endogenous nucleic acid sequence encoding xylose isomerase and one copy of an exogenous nucleic acid sequence encoding xylose isomerase. Optionally, the recombinant microorganism includes three copies of a nucleic acid sequence encoding xylose isomerase, one being exogenously introduced and the other two being endogenous. The term recombinant when used with reference to a cell, nucleic acid, polypeptide, vector, or the like indicates that the cell, nucleic acid, polypeptide, vector or the like has been modified by or is the result of laboratory methods and is non-naturally occurring. Thus, for example, recombinant microorganisms include microorganisms produced by or modified by laboratory methods, e.g., transformation methods for introducing nucleic acids into the microorganism. Recombinant microorganisms can include nucleic acid sequences not found within the native (non-recombinant) form of the microorganisms or can include nucleic acid sequences that have been modified, e.g., linked to a non-native promoter.

[0058] As used herein, the term exogenous refers to a substance, such as a nucleic acid (e.g., nucleic acids including regulatory sequences and/or genes) or polypeptide, that is artificially introduced into a cell or organism and/or does not naturally occur in the cell in which it is present. In other words, the substance, such as nucleic acid or polypeptide, originates from outside a cell or organism into which it is introduced. An exogenous nucleic acid can have a nucleotide sequence that is identical to that of a nucleic acid naturally present in the cell. For example, a Thraustochytrid cell can be engineered to include a nucleic acid having a Thraustochytrid or Thraustochytrium regulatory sequence. In a particular example, an endogenous Thraustochytrid or Thraustochytrium regulatory sequence is operably linked to a gene with which the regulatory sequence is not involved under natural conditions. Although the Thraustochytrid or Thraustochytrium regulatory sequence may naturally occur in the host cell, the introduced nucleic acid is exogenous according to the present disclosure. An exogenous nucleic acid can have a nucleotide sequence that is different from that of any nucleic acid that is naturally present in the cell. For example, the exogenous nucleic acid can be a heterologous nucleic acid, i.e., a nucleic acid from a different species or organism. Thus, an exogenous nucleic acid can have a nucleic acid sequence that is identical to that of a nucleic acid that is naturally found in a source organism but that is different from the cell into which the exogenous nucleic acid is introduced. As used herein, the term endogenous, refers to a nucleic acid sequence that is native to a cell. As used herein, the term heterologous refers to a nucleic acid sequence that is not native to a cell, i.e., is from a different organism than the cell. The terms exogenous and endogenous or heterologous are not mutually exclusive. 
Thus, a nucleic acid sequence can be exogenous and endogenous, meaning the nucleic acid sequence can be introduced into a cell but have a sequence that is the same as or similar to the sequence of a nucleic acid naturally present in the cell. Similarly, a nucleic acid sequence can be exogenous and heterologous meaning the nucleic acid sequence can be introduced into a cell but have a sequence that is not native to the cell, e.g., a sequence from a different organism.
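By way of illustration only, the relationship among the terms exogenous, endogenous and heterologous as defined above can be expressed as a small decision rule, as in the following minimal sketch. The class `NucleicAcid`, its two fields and the function `labels` are hypothetical names introduced for this example and reduce the definitions to two yes/no properties.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class NucleicAcid:
    introduced: bool       # artificially introduced into the host cell
    native_sequence: bool  # same as or similar to a sequence naturally present in the host


def labels(na: NucleicAcid) -> Set[str]:
    """Return every term from the definitions above that applies to na."""
    out: Set[str] = set()
    # Exogenous: artificially introduced and/or not naturally occurring in the cell.
    if na.introduced or not na.native_sequence:
        out.add("exogenous")
    # Endogenous and heterologous describe the sequence, not how it arrived.
    out.add("endogenous" if na.native_sequence else "heterologous")
    return out


if __name__ == "__main__":
    # An introduced extra copy of a host gene is both exogenous and endogenous.
    print(sorted(labels(NucleicAcid(introduced=True, native_sequence=True))))
    # An introduced gene from another organism is exogenous and heterologous.
    print(sorted(labels(NucleicAcid(introduced=True, native_sequence=False))))
```

The sketch makes the stated point explicit: exogenous can apply together with either of the other two terms, because it depends on how the nucleic acid came to be in the cell rather than on the origin of its sequence.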

[0059] As discussed above, the provided recombinant microorganisms contain at least two copies of a nucleic acid sequence encoding a xylose isomerase. The provided microorganisms optionally also contain at least one nucleic acid sequence encoding a xylulose kinase. Optionally, the recombinant microorganisms comprise at least one nucleic acid sequence encoding a xylose transporter. The nucleic acid sequences encoding the xylose isomerase, xylulose kinase, and/or xylose transporter are, optionally, exogenous nucleic acid sequences. Optionally, the nucleic acid sequence encoding the xylose isomerase is an endogenous nucleic acid sequence. Optionally, the nucleic acid sequence encoding the xylulose kinase and/or xylose transporter is a heterologous nucleic acid. Optionally, the microorganism contains at least two copies of a nucleic acid sequence encoding a xylose isomerase, at least two copies of a nucleic acid sequence encoding a xylulose kinase, and at least one nucleic acid sequence encoding a xylose transporter. Optionally, the heterologous nucleic acid sequence encoding the xylose isomerase is at least 90% identical to SEQ ID NO:2. Optionally, the heterologous nucleic acid sequence encoding the xylulose kinase is at least 90% identical to SEQ ID NO:5. As noted above, optionally, the nucleic acid encoding the xylose transporter is a heterologous nucleic acid. Optionally, the xylose transporter encoded by the heterologous nucleic acid is GXS1 from Candida intermedia. Optionally, the heterologous nucleic acid sequence encoding the xylose transporter is at least 90% identical to SEQ ID NO:23.

[0060] The provided recombinant microorganisms not only contain nucleic acid sequences encoding genes involved in xylose metabolism, they can include multiple copies of such sequences. Thus, the microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the nucleic acid sequence encoding xylose isomerase. Optionally, the microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the nucleic acid sequence encoding the xylulose kinase. Optionally, the microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the nucleic acid sequence encoding the xylose transporter.

[0061] In the provided microorganisms, the nucleic acids, e.g., xylose isomerase, xylulose kinase or xylose transporter can be operably linked to a promoter and/or terminator. Optionally, the exogenous nucleic acid sequence encoding the xylose isomerase is operably linked to a promoter. Optionally, the nucleic acid sequence encoding the xylulose kinase and/or the nucleic acid sequence encoding the xylose transporter are also operably linked to a promoter. Optionally, the promoter is a tubulin promoter. Optionally, the promoter is at least 80% identical to SEQ ID NO:25 or SEQ ID NO:26. Optionally, the exogenous nucleic acid sequence encoding the xylose isomerase comprises a terminator. Optionally, the nucleic acid sequence encoding the xylulose kinase comprises a terminator. Optionally, the nucleic acid sequence encoding the xylose transporter comprises a terminator. Optionally, the terminator is a tubulin terminator. Optionally, the terminator is at least 80% identical to SEQ ID NO:27, SEQ ID NO:28, or SEQ ID NO:30.

[0062] The provided microorganisms can include a selectable marker to confirm transformation of genes of interest. Thus, the microorganism can further include a selectable marker. Optionally, the selectable marker is an antibiotic resistance gene. Optionally, the antibiotic is zeocin, hygromycin B, kanamycin or neomycin. Optionally, the microorganism is either a Thraustochytrium or a Schizochytrium microorganism. Optionally, the microorganism is ONC-T18.

[0063] The provided microorganisms have distinguishing features over wild type microorganisms. For example, the recombinant microorganisms can have increased xylose transport activity as compared to a non-recombinant control (or wild type) microorganism, increased xylose isomerase activity as compared to a non-recombinant control (or wild type) microorganism, increased xylulose kinase activity as compared to a non-recombinant control (or wild type) microorganism, or any combination of these activities. Optionally, the recombinant microorganism grows with xylose as the sole carbon source.

[0064] Also provided are methods of making the recombinant microorganisms. Thus, provided is a method of making a recombinant xylose-metabolizing microorganism including providing one or more nucleic acid constructs comprising a nucleic acid sequence encoding a xylose isomerase, a nucleic acid sequence encoding a xylulose kinase and a nucleic acid sequence encoding a xylose transporter; transforming the microorganism with the one or more nucleic acid constructs; and isolating microorganisms comprising at least two copies of the nucleic acid sequences encoding the xylose isomerase. Optionally, the methods further include isolating microorganisms comprising at least two copies of the nucleic acid sequence encoding the xylulose kinase. Optionally, the method includes isolating microorganisms comprising at least one copy of the xylose transporter. Optionally, the one or more nucleic acid constructs further comprise a selectable marker.
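By way of illustration only, the isolating step described above can be viewed as a filter on copy number, as in the following minimal sketch. It assumes that copy numbers for each isolate have already been determined by some separate method, which this example does not address; the function name `select_isolates`, the dictionary keys and the isolate names and values are hypothetical and are not data from the present disclosure.

```python
from typing import Dict, List

# Thresholds recited in the text: at least two copies of the xylose isomerase
# sequence; optionally at least two of the xylulose kinase sequence and at
# least one of the xylose transporter sequence.
MIN_ISOMERASE = 2
MIN_KINASE = 2
MIN_TRANSPORTER = 1


def select_isolates(copy_numbers: Dict[str, Dict[str, int]],
                    require_kinase: bool = False,
                    require_transporter: bool = False) -> List[str]:
    """Return the names of isolates that meet the copy-number thresholds."""
    selected = []
    for name, counts in copy_numbers.items():
        if counts.get("xylose_isomerase", 0) < MIN_ISOMERASE:
            continue
        if require_kinase and counts.get("xylulose_kinase", 0) < MIN_KINASE:
            continue
        if require_transporter and counts.get("xylose_transporter", 0) < MIN_TRANSPORTER:
            continue
        selected.append(name)
    return selected


# Hypothetical copy numbers for illustration only (not measured values).
example = {
    "isolate_A": {"xylose_isomerase": 1, "xylulose_kinase": 2, "xylose_transporter": 1},
    "isolate_B": {"xylose_isomerase": 3, "xylulose_kinase": 1, "xylose_transporter": 0},
    "isolate_C": {"xylose_isomerase": 8, "xylulose_kinase": 2, "xylose_transporter": 1},
}

if __name__ == "__main__":
    print(select_isolates(example))
    print(select_isolates(example, require_kinase=True, require_transporter=True))
```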

[0065] In the provided methods, the nucleic acid sequences encoding the xylose isomerase, xylulose kinase and xylose transporter can be located on the same or different constructs. Optionally, the method includes providing a first nucleic acid construct comprising a nucleic acid sequence encoding a xylose isomerase, a second nucleic acid construct comprising a nucleic acid sequence encoding a xylulose kinase and a third nucleic acid construct comprising a nucleic acid sequence encoding a xylose transporter. Optionally, the first, second and third nucleic acid constructs comprise the same selectable marker. Optionally, the first nucleic acid construct comprises a promoter, a selectable marker, a nucleic acid sequence encoding a 2A peptide, the nucleic acid sequence encoding the xylose isomerase, and a terminator. Optionally, the second nucleic acid construct comprises a promoter, selectable marker, a nucleic acid sequence encoding a 2A peptide, the nucleic acid sequence encoding the xylulose kinase, and a terminator. Optionally, the third nucleic acid construct comprises a promoter, the nucleic acid sequence encoding the xylose transporter, a nucleic acid sequence encoding a 2A peptide, a selectable marker, and a terminator. As noted above, selectable markers include, but are not limited to, antibiotic resistance genes. Optionally, the antibiotic is zeocin, hygromycin B, kanamycin or neomycin. Promoters used for the constructs include, but are not limited to, a tubulin promoter. Optionally, the promoter is at least 80% identical to SEQ ID NO:25 or SEQ ID NO:26. Terminators used for the constructs include, but are not limited to, a tubulin terminator. Optionally, the terminator is at least 80% identical to SEQ ID NO:27, SEQ ID NO:28, or SEQ ID NO:30.

[0066] In the provided methods, the isolated recombinant microorganisms can include one or more copies of the xylose isomerase, xylulose kinase and xylose transporter. Optionally, the isolated recombinant microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the nucleic acid sequence encoding xylose isomerase. Optionally, the xylose isomerase is an endogenous xylose isomerase or a heterologous xylose isomerase. Optionally, the nucleic acid sequence encoding the xylose isomerase is at least 90% identical to SEQ ID NO:2. Optionally, the isolated recombinant microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the nucleic acid sequence encoding the xylulose kinase. Optionally, the xylulose kinase is a heterologous xylulose kinase. Optionally, the nucleic acid sequence encoding the xylulose kinase is at least 90% identical to SEQ ID NO:5. Optionally, the isolated recombinant microorganism comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 copies of the nucleic acid sequence encoding the xylose transporter. Optionally, the xylose transporter is a heterologous xylose transporter. Optionally, the xylose transporter is GXS1 from Candida intermedia. Optionally, the nucleic acid sequence encoding the xylose transporter is at least 90% identical to SEQ ID NO:23. Optionally, the microorganism is either a Thraustochytrium or a Schizochytrium microorganism. Optionally, the microorganism is ONC-T18.

[0067] As noted above, the isolated recombinant microorganisms can have increased xylose transport activity as compared to a control non-recombinant microorganism, increased xylose isomerase activity as compared to a control non-recombinant microorganism, increased xylulose kinase activity as compared to a control non-recombinant microorganism, or a combination thereof. Optionally, the isolated recombinant microorganism grows with xylose as the sole carbon source.

[0068] As described herein, a control or standard control refers to a sample, measurement, or value that serves as a reference, usually a known reference, for comparison to a test sample, measurement, or value. For example, a test microorganism, e.g., a microorganism transformed with nucleic acid sequences encoding genes for metabolizing xylose can be compared to a known normal (wild-type) microorganism (e.g., a standard control microorganism). A standard control can also represent an average measurement or value gathered from a population of microorganisms (e.g., standard control microorganisms) that do not grow or grow poorly on xylose as the sole carbon source or that do not have or have minimal levels of xylose isomerase activity, xylulose kinase activity and/or xylose transport activity. One of skill will recognize that standard controls can be designed for assessment of any number of parameters (e.g., RNA levels, polypeptide levels, specific cell types, and the like).

[0069] Also provided herein are methods of producing oil using the recombinant microorganisms. The method includes providing the recombinant microorganism, wherein the microorganism grows on xylose as the sole carbon source, and culturing the microorganism in a culture medium under suitable conditions to produce the oil. Optionally, the oil comprises triglycerides. Optionally, the oil comprises alpha linolenic acid, arachidonic acid, docosahexaenoic acid, docosapentaenoic acid, eicosapentaenoic acid, gamma-linolenic acid, linoleic acid, linolenic acid, or a combination thereof. Optionally, the method further includes isolating the oil.

[0070] The provided methods include or can be used in conjunction with additional steps for culturing microorganisms according to methods known in the art. For example, a Thraustochytrid, e.g., a Thraustochytrium sp., can be cultivated according to methods described in U.S. Patent Publications 2009/0117194 or 2012/0244584, which are herein incorporated by reference in their entireties for each step of the methods or composition used therein.

[0071] Microorganisms are grown in a growth medium (also known as culture medium). Any of a variety of media can be suitable for use in culturing the microorganisms described herein. Optionally, the medium supplies various nutritional components, including a carbon source and a nitrogen source, for the microorganism. Medium for Thraustochytrid culture can include any of a variety of carbon sources. Examples of carbon sources include fatty acids, lipids, glycerols, triglycerols, carbohydrates, polyols, amino sugars, and any kind of biomass or waste stream. Fatty acids include, for example, oleic acid. Carbohydrates include, but are not limited to, glucose, cellulose, hemicellulose, fructose, dextrose, xylose, lactulose, galactose, maltotriose, maltose, lactose, glycogen, gelatin, starch (corn or wheat), acetate, m-inositol (e.g., derived from corn steep liquor), galacturonic acid (e.g., derived from pectin), L-fucose (e.g., derived from galactose), gentiobiose, glucosamine, alpha-D-glucose-1-phosphate (e.g., derived from glucose), cellobiose, dextrin, alpha-cyclodextrin (e.g., derived from starch), and sucrose (e.g., from molasses). Polyols include, but are not limited to, maltitol, erythritol, and adonitol. Amino sugars include, but are not limited to, N-acetyl-D-galactosamine, N-acetyl-D-glucosamine, and N-acetyl-beta-D-mannosamine.

[0072] Optionally, the microorganisms provided herein are cultivated under conditions that increase biomass and/or production of a compound of interest (e.g., oil or total fatty acid (TFA) content). Thraustochytrids, for example, are typically cultured in saline medium. Optionally, Thraustochytrids can be cultured in medium having a salt concentration from about 0.5 g/L to about 50.0 g/L. Optionally, Thraustochytrids are cultured in medium having a salt concentration from about 0.5 g/L to about 35 g/L (e.g., from about 18 g/L to about 35 g/L). Optionally, the Thraustochytrids described herein can be grown in low salt conditions. For example, the Thraustochytrids can be cultured in a medium having a salt concentration from about 0.5 g/L to about 20 g/L (e.g., from about 0.5 g/L to about 15 g/L). The culture medium optionally includes NaCl. Optionally, the medium includes natural or artificial sea salt and/or artificial seawater.

[0073] The culture medium can include non-chloride-containing sodium salts as a source of sodium. Examples of non-chloride sodium salts suitable for use in accordance with the present methods include, but are not limited to, soda ash (a mixture of sodium carbonate and sodium oxide), sodium carbonate, sodium bicarbonate, sodium sulfate, and mixtures thereof. See, e.g., U.S. Pat. Nos. 5,340,742 and 6,607,900, the entire contents of each of which are incorporated by reference herein. A significant portion of the total sodium, for example, can be supplied by non-chloride salts such that less than about 100%, 75%, 50%, or 25% of the total sodium in culture medium is supplied by sodium chloride.
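By way of illustration only, the share of total sodium supplied by sodium chloride can be estimated from a medium recipe. The sketch below is not part of the disclosed methods; the recipe values and the helper names (`sodium_moles`, `nacl_sodium_fraction`) are hypothetical, and only standard molar masses are assumed.

```python
# Molar mass (g/mol) and sodium atoms per formula unit for a few sodium salts.
SALTS = {
    "NaCl": (58.44, 1),
    "Na2SO4": (142.04, 2),
    "NaHCO3": (84.01, 1),
    "Na2CO3": (105.99, 2),
}


def sodium_moles(salt, grams_per_litre):
    """Moles of sodium per litre contributed by one salt."""
    molar_mass, na_per_formula = SALTS[salt]
    return grams_per_litre / molar_mass * na_per_formula


def nacl_sodium_fraction(recipe):
    """Fraction (0 to 1) of total sodium in the recipe that comes from NaCl."""
    total = sum(sodium_moles(salt, grams) for salt, grams in recipe.items())
    if total == 0:
        raise ValueError("recipe contains no sodium")
    return sodium_moles("NaCl", recipe.get("NaCl", 0.0)) / total


# Hypothetical recipe in g/L; these values are NOT taken from the source.
recipe = {"NaCl": 2.0, "Na2SO4": 10.0}
fraction = nacl_sodium_fraction(recipe)
print(f"Sodium supplied by NaCl: {fraction:.1%}")
```

With this hypothetical recipe, less than about 25% of the total sodium comes from sodium chloride, which corresponds to the lowest of the thresholds recited above.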

[0074] Medium for Thraustochytrid culture can include any of a variety of nitrogen sources. Exemplary nitrogen sources include ammonium solutions (e.g., NH.sub.4 in H.sub.2O), ammonium or amine salts (e.g., (NH.sub.4).sub.2SO.sub.4, (NH.sub.4).sub.3PO.sub.4, NH.sub.4NO.sub.3, NH.sub.4OOCH.sub.2CH.sub.3 (NH.sub.4Ac)), peptone, tryptone, yeast extract, malt extract, fish meal, sodium glutamate, soy extract, casamino acids and distiller grains. Concentrations of nitrogen sources in suitable medium typically range between and including about 1 g/L and about 25 g/L.

[0075] The medium optionally includes a phosphate, such as potassium phosphate or sodium phosphate. Inorganic salts and trace nutrients in medium can include ammonium sulfate, sodium bicarbonate, sodium orthovanadate, potassium chromate, sodium molybdate, selenous acid, nickel sulfate, copper sulfate, zinc sulfate, cobalt chloride, iron chloride, manganese chloride, calcium chloride, and EDTA. Vitamins such as pyridoxine hydrochloride, thiamine hydrochloride, calcium pantothenate, p-aminobenzoic acid, riboflavin, nicotinic acid, biotin, folic acid and vitamin B12 can be included.

[0076] The pH of the medium can be adjusted to between and including 3.0 and 10.0 using acid or base, where appropriate, and/or using the nitrogen source. Optionally, the medium can be sterilized.

[0077] Generally, a medium used for culture of a microorganism is a liquid medium. However, the medium used for culture of a microorganism can be a solid medium. In addition to carbon and nitrogen sources as discussed herein, a solid medium can contain one or more components (e.g., agar or agarose) that provide structural support and/or allow the medium to be in solid form.

[0078] Optionally, the resulting biomass is pasteurized to inactivate undesirable substances present in the biomass. For example, the biomass can be pasteurized to inactivate compound degrading substances. The biomass can be present in the fermentation medium or isolated from the fermentation medium for the pasteurization step. The pasteurization step can be performed by heating the biomass and/or fermentation medium to an elevated temperature. For example, the biomass and/or fermentation medium can be heated to a temperature from about 50.degree. C. to about 95.degree. C. (e.g., from about 55.degree. C. to about 90.degree. C. or from about 65.degree. C. to about 80.degree. C.). Optionally, the biomass and/or fermentation medium can be heated from about 30 minutes to about 120 minutes (e.g., from about 45 minutes to about 90 minutes, or from about 55 minutes to about 75 minutes). The pasteurization can be performed using a suitable heating means, such as, for example, by direct steam injection.

[0079] Optionally, no pasteurization step is performed. Stated differently, the method taught herein optionally lacks a pasteurization step.

[0080] Optionally, the biomass can be harvested according to a variety of methods, including those currently known to one skilled in the art. For example, the biomass can be collected from the fermentation medium using, for example, centrifugation (e.g., with a solid-ejecting centrifuge) or filtration (e.g., cross-flow filtration). Optionally, the harvesting step includes use of a precipitation agent for the accelerated collection of cellular biomass (e.g., sodium phosphate or calcium chloride).

[0081] Optionally, the biomass is washed with water. Optionally, the biomass can be concentrated up to about 20% solids. For example, the biomass can be concentrated to about 5% to about 20% solids, from about 7.5% to about 15% solids, or from about solids to about 20% solids, or any percentage within the recited ranges. Optionally, the biomass can be concentrated to about 20% solids or less, about 19% solids or less, about 18% solids or less, about 17% solids or less, about 16% solids or less, about 15% solids or less, about 14% solids or less, about 13% solids or less, about 12% solids or less, about 11% solids or less, about 10% solids or less, about 9% solids or less, about 8% solids or less, about 7% solids or less, about 6% solids or less, about 5% solids or less, about 4% solids or less, about 3% solids or less, about 2% solids or less, or about 1% solids or less.

[0082] The provided methods, optionally, include isolating the polyunsaturated fatty acids from the biomass or microorganisms. Isolation of the polyunsaturated fatty acids can be performed using one or more of a variety of methods, including those currently known to one of skill in the art. For example, methods of isolating polyunsaturated fatty acids are described in U.S. Pat. No. 8,163,515, which is incorporated by reference herein in its entirety. Optionally, the medium is not sterilized prior to isolation of the polyunsaturated fatty acids. Optionally, sterilization comprises an increase in temperature. Optionally, the polyunsaturated fatty acids produced by the microorganisms and isolated from the provided methods are medium chain fatty acids. Optionally, the one or more polyunsaturated fatty acids are selected from the group consisting of alpha linolenic acid, arachidonic acid, docosahexaenoic acid, docosapentaenoic acid, eicosapentaenoic acid, gamma-linolenic acid, linoleic acid, linolenic acid, and combinations thereof.

[0083] Oil including polyunsaturated fatty acids (PUFAs) and other lipids produced according to the method described herein can be utilized in any of a variety of applications exploiting their biological, nutritional, or chemical properties. Thus, the provided methods optionally include isolating oil from the harvested portion of the threshold volume. Optionally, the oil is used to produce fuel, e.g., biofuel. Optionally, the oil can be used in pharmaceuticals, food supplements, animal feed additives, cosmetics, and the like. Lipids produced according to the methods described herein can also be used as intermediates in the production of other compounds.

[0084] By way of example, the oil produced by the microorganisms cultured using the provided methods can comprise fatty acids. Optionally, the fatty acids are selected from the group consisting of alpha linolenic acid, arachidonic acid, docosahexaenoic acid, docosapentaenoic acid, eicosapentaenoic acid, gamma-linolenic acid, linoleic acid, linolenic acid, and combinations thereof. Optionally, the oil comprises triglycerides. Optionally, the oil comprises fatty acids selected from the group consisting of palmitic acid (C16:0), myristic acid (C14:0), palmitoleic acid (C16:1(n-7)), cis-vaccenic acid (C18:1(n-7)), docosapentaenoic acid (C22:5(n-6)), docosahexaenoic acid (C22:6(n-3)), and combinations thereof.

[0085] Optionally, the lipids produced according to the methods described herein can be incorporated into a final product (e.g., a food or feed supplement, an infant formula, a pharmaceutical, a fuel, etc.). Suitable food or feed supplements into which the lipids can be incorporated include beverages such as milk, water, sports drinks, energy drinks, teas, and juices; confections such as candies, jellies, and biscuits; fat-containing foods and beverages such as dairy products; processed food products such as soft rice (or porridge); infant formulae; breakfast cereals; or the like. Optionally, one or more produced lipids can be incorporated into a dietary supplement, such as, for example, a vitamin or multivitamin. Optionally, a lipid produced according to the method described herein can be included in a dietary supplement and optionally can be directly incorporated into a component of food or feed (e.g., a food supplement).

[0086] Examples of feedstuffs into which lipids produced by the methods described herein can be incorporated include pet foods such as cat foods; dog foods; feeds for aquarium fish, cultured fish or crustaceans, etc.; feed for farm-raised animals (including livestock and fish or crustaceans raised in aquaculture). Food or feed material into which the lipids produced according to the methods described herein can be incorporated is preferably palatable to the organism which is the intended recipient. This food or feed material can have any physical properties currently known for a food material (e.g., solid, liquid, soft).

[0087] Optionally, one or more of the produced compounds (e.g., PUFAs) can be incorporated into a nutraceutical or pharmaceutical product. Examples of such nutraceuticals or pharmaceuticals include various types of tablets, capsules, drinkable agents, etc. Optionally, the nutraceutical or pharmaceutical is suitable for topical application. Dosage forms can include, for example, capsules, oils, granula, granula subtilae, pulveres, tabellae, pilulae, trochisci, or the like.

[0088] The oil or lipids produced according to the methods described herein can be incorporated into products as described herein in combination with any of a variety of other agents. For instance, such compounds can be combined with one or more binders or fillers, chelating agents, pigments, salts, surfactants, moisturizers, viscosity modifiers, thickeners, emollients, fragrances, preservatives, etc., or any combination thereof.

[0089] Disclosed are materials, compositions, and components that can be used for, can be used in conjunction with, can be used in preparation for, or are products of the disclosed methods and compositions. These and other materials are disclosed herein, and it is understood that when combinations, subsets, interactions, groups, etc. of these materials are disclosed that while specific reference of each various individual and collective combinations and permutations of these compounds may not be explicitly disclosed, each is specifically contemplated and described herein. For example, if a method is disclosed and discussed and a number of modifications that can be made to a number of molecules including the method are discussed, each and every combination and permutation of the method, and the modifications that are possible are specifically contemplated unless specifically indicated to the contrary. Likewise, any subset or combination of these is also specifically contemplated and disclosed. This concept applies to all aspects of this disclosure including, but not limited to, steps in methods using the disclosed compositions. Thus, if there are a variety of additional steps that can be performed, it is understood that each of these additional steps can be performed with any specific method steps or combination of method steps of the disclosed methods, and that each such combination or subset of combinations is specifically contemplated and should be considered disclosed.

[0090] Publications cited herein and the material for which they are cited are hereby specifically incorporated by reference in their entireties.

[0091] The examples below are intended to further illustrate certain aspects of the methods and compositions described herein, and are not intended to limit the scope of the claims.

EXAMPLES

Example 1. C5 Carbon Metabolism by Recombinant Thraustochytrids

[0092] In nature, two xylose metabolism pathways exist, the xylose reductase/xylitol dehydrogenase pathway and the xylose isomerase/xylulose kinase pathway (FIG. 1). ONC-T18 encodes genes from both pathways, and, as described above, the xylose reductase/xylitol dehydrogenase pathway is dominant, as evidenced by a build-up of xylitol when grown in a xylose medium. Since the isomerase/kinase pathway does not depend on redox co-factors, over-expression of ONC-T18's isomerase gene removes co-factor dependence in the conversion of xylose to xylulose. As shown herein in FIGS. 2 and 3, transcriptomic studies with ONC-T18 showed that its xylose isomerase and putative xylulose kinase genes were mostly expressed during glucose starvation; whereas, the putatively identified genes encoding for the xylose reductase and xylitol dehydrogenase were constitutively expressed.

[0093] T18 isomerase was purified by metal-affinity chromatography following his-tagging and over-expression in yeast INVSc1. As a positive control, his-tagged XylA from E. coli strain W3110 was over-expressed and purified from E. coli strain BL21(DE3)plysS. The protein concentration of purified proteins was determined by a standard Bradford assay. The impact of temperature on the activity of T18 isomerase and E. coli isomerase was determined using 5 .mu.g of protein and 0.75 g/L of either xylose or xylulose in 5 mM MgATP, 50 mM Hepes (pH 7.4), 10 mM MgCl.sub.2. Reactions were incubated overnight at 10.degree. C., 25.degree. C., 30.degree. C., 37.degree. C., 50.degree. C., 60.degree. C., and 80.degree. C. Reactions were stopped by heat inactivation at 95.degree. C. for 5 mins. Reactions were analyzed by HPLC and the concentration of the sugars present was determined from the area under the peak relative to a standard curve. T18 isomerase had higher activity on both xylose and xylulose at temperatures at and above 37.degree. C. (FIG. 20A). This is in contrast to E. coli isomerase, which had higher activity at temperatures between 25.degree. C. and 30.degree. C. (FIG. 20B).
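By way of illustration only, determining a sugar concentration from an HPLC peak area relative to a standard curve can be sketched as a linear least-squares fit followed by inversion of the fitted line. The standard concentrations and peak areas below are hypothetical placeholders, not measured values, and the linear detector response is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_area(area, slope, intercept):
    """Invert the standard curve to convert a peak area into g/L."""
    return (area - intercept) / slope


# Hypothetical xylose standards (g/L) and their peak areas (arbitrary units).
standards_g_per_l = [0.0, 0.25, 0.5, 0.75, 1.0]
peak_areas = [0.0, 50.0, 100.0, 150.0, 200.0]

slope, intercept = fit_line(standards_g_per_l, peak_areas)
xylose_g_per_l = concentration_from_area(120.0, slope, intercept)
print(f"Xylose in sample: {xylose_g_per_l:.2f} g/L")
```

In practice a separate curve would be fitted for each analyte (e.g., xylose, xylulose, xylitol), and samples falling outside the range of the standards would be diluted and re-run rather than extrapolated.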

[0094] Dose-dependency was determined by incubating increasing protein concentrations of the isomerase with 0.75 g/L xylose or xylulose in 5 mM MgATP, 50 mM Hepes (pH 7.4), 10 mM MgCl.sub.2. Reactions were incubated overnight at 30.degree. C. (E. coli) or 50.degree. C. (T18) then stopped by heat inactivation at 95.degree. C. for 5 mins. Reactions were analyzed by HPLC and the concentration of the sugars present was determined from the area under the peak relative to a standard curve. A dose dependency of T18 isomerase was observed on both xylose and xylulose (FIGS. 21A and 21B).

[0095] This example describes the use of a Thraustochytrium ONC-T18-derived (ONC-T18) alpha-tubulin promoter to express endogenous and/or heterologous xylose metabolism transgenes in Thraustochytrid species, including ONC-T18. However, as discussed throughout, other regulatory elements can be used. FIGS. 4 and 5 show constructs of the plasmids containing the xylose isomerase and xylulose kinase genes, respectively. As described herein, the xylose metabolism transgenes were present in multiple (.gtoreq.8) copies within the genome of the host. In the case of ONC-T18, the modified organisms demonstrated an increased metabolism of xylose compared to wild-type (WT) cells. For example, a strain modified to express an endogenous xylose isomerase gene (SEQ ID NO:2) (strain Iso-His #16) and a strain modified to express an endogenous xylose isomerase gene (SEQ ID NO:2) and a xylulose kinase gene (SEQ ID NO:5) (Iso-His+xylB, strain 7-7) both used 40% more xylose than the WT strain. Both Iso-His #16 and 7-7 converted less xylose to xylitol than the WT strain, 40% less and 420% less, respectively. The constructs used for transformation of ONC-T18 are shown in FIGS. 4 and 5. ONC-T18 transformants were created using standard biolistics protocols as described by BioRad's Biolistic PDS-1000/He Particle Delivery System (Hercules, Calif.). Briefly, 0.6 .mu.m gold particles were coated with 2.5 .mu.g of linearized plasmid DNA (EcoRI, 37.degree. C., overnight). The coated gold particles were used to bombard plates previously spread with 1 ml of ONC-T18 cells at an OD600 of 1.0. The bombardment parameters included using a helium pressure of 1350 or 1100 psi with a target distance of 3 or 6 cm. After an overnight recovery, the cells were washed off the plate and plated on media containing selection antibiotics (Zeo 250 .mu.g/mL and hygro 400 .mu.g/mL). Plates were incubated for 1 week at 25.degree. C. to identify resistant colonies. The resulting transformants were screened by PCR and Southern blot.

[0096] Southern blots were performed using standard protocols. Briefly, approximately 20 .mu.g of genomic DNA were digested with 40 units of BamHI restriction enzyme in a total volume of 50 .mu.L overnight at 37.degree. C. 7.2 .mu.g of each digested sample was run on a 1.0% agarose gel at 50V for approximately 1.5 hours, with a digoxigenin (DIG) DNA molecular-weight marker II (Roche, Basel, Switzerland). DNA was depurinated in the gel by submerging the gel in 250 mM HCl for 15 minutes. The gel was further denatured by incubation in a solution containing 0.5 M NaOH and 1.5 M NaCl (pH 7.5) for two 15 minute washes. The reaction was then neutralized by incubation in 0.5 M Tris-HCl (pH 7.5) for two 15 minute washes. Finally, the gel was equilibrated in 20.times. saline-sodium citrate (SSC) buffer for 15 minutes. DNA was transferred to a positively charged nylon membrane using a standard transfer apparatus. DNA was fixed to the membrane using a UV Stratalinker at an exposure of 120,000 .mu.J. Southern blot probe was generated using a PCR DIG Probe Synthesis Kit (Roche, Basel, Switzerland) to generate a DIG-labelled probe according to the manufacturer's instructions. The DNA affixed to the nylon membrane was prehybridized with 20 mL of DIG EasyHyb solution (DIG EasyHyb Granules, Roche, Basel, Switzerland). The DIG-labelled probe was denatured by adding 40 .mu.L of the ble-probe reaction mixture to 300 .mu.L of ddH.sub.2O and incubated at 99.degree. C. for 5 minutes. This solution was then added to 20 mL of DIG hybridization solution to create the probe solution. The probe solution was then added to the DNA-affixed nylon membrane and incubated at 53.degree. C. overnight. The following day, the membrane was washed twice in 2.times.SSC, 0.1% SDS at room temperature. The membrane was further washed twice in 0.1.times.SSC, 0.1% SDS at 68.degree. C. for 15 minutes. For detection, the membrane was washed and blocked using DIG Wash and Block Buffer set (Roche, Basel, Switzerland) according to the manufacturer's instructions. An anti-DIG-AP conjugated antibody from a DIG Nucleic Acid Detection Kit (Roche, Basel, Switzerland) was used for detection. 2 .mu.L of the antibody solution was added to 20 mL detection solution and incubated with the membrane at room temperature for 30 minutes. The blot was then immersed in a washing buffer provided with the kit. CDP-Star (Roche, Basel, Switzerland) was used for visualization. 10 .mu.L of the CDP-star solution was incubated on the membrane in 1 mL of detection solution, which was covered in a layer of `sheet-protector` plastic to hold the solution to the membrane. Signal was immediately detected using a ChemiDoc imaging system (BioRad Laboratories, Hercules, Calif.).
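By way of illustration only, the quantities recited in the Southern blot protocol imply a simple loading calculation. The sketch below works through the arithmetic from the stated amounts; the function names are hypothetical, and the calculation assumes the digest volume remains 50 microlitres.

```python
def loading_volume_ul(digest_ug, digest_volume_ul, load_ug):
    """Volume of digest (microlitres) that contains the desired mass of DNA."""
    concentration_ug_per_ul = digest_ug / digest_volume_ul
    return load_ug / concentration_ug_per_ul


def dilution_fold(stock_ul, diluent_ml):
    """Approximate fold-dilution when a small stock volume is added to a diluent."""
    return diluent_ml * 1000.0 / stock_ul


# 20 micrograms of genomic DNA digested in 50 microlitres; 7.2 micrograms loaded per lane.
lane_volume = loading_volume_ul(digest_ug=20.0, digest_volume_ul=50.0, load_ug=7.2)

# 40 units of BamHI for 20 micrograms of DNA.
units_per_ug = 40.0 / 20.0

# 2 microlitres of antibody solution in 20 mL of detection solution.
antibody_dilution = dilution_fold(stock_ul=2.0, diluent_ml=20.0)

print(f"Load about {lane_volume:.0f} uL per lane")
print(f"Enzyme: {units_per_ug:.0f} units per ug DNA")
print(f"Antibody diluted about 1:{antibody_dilution:,.0f}")
```

Under these assumptions each lane receives about 18 microlitres of digest, so a single 50 microlitre digest supplies two full lanes.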

[0097] The codon optimized ble gene was cloned under the control of T18B .alpha.-tubulin promoter and terminator elements (FIG. 6). The isomerase gene was cloned from T18B in such a way as to add a six-histidine tag on the N-terminus of the expressed protein (Iso-His). Xylose isomerase enzymatic activity was confirmed by over-expression and purification of the histidine-tagged protein in yeast. The isomerase gene (along with the introduced six-histidine tag) was cloned under the control of the .alpha.-tubulin promoter and terminators by cloning the gene downstream of the ble gene and a 2A sequence (FIG. 4 and FIG. 6). Biolistic transformation of T18B with this plasmid (pALPHTB-B2G-hisIso) resulted in Zeocin (zeo) resistant transformants. Many transformant strains were obtained from this procedure. Two of these strains are shown as example #6 containing one copy of the transgene and example #16 containing eight copies of the transgene (FIG. 7).

[0098] The insertion of the Iso-His transgene within the T18B genome was confirmed by PCR and Southern blot analysis (FIG. 7). Qualitatively, these data showed the presence of a single copy of the transgene in strain #6 and multiple, concatameric, transgene copies, at a single site, in strain #16. The precise number of Iso-His transgene insertions was determined by qPCR on genomic DNA (FIG. 8). These data showed the presence of one copy of the transgene in strain #6 and eight copies of the transgene in strain #16 (FIG. 8). To test whether an increase in copy number correlated with an increase in expression level, mRNA was isolated from WT, Iso-His #6 and Iso-His #16 T18B cells and qRT-PCR was performed. FIG. 11 shows significantly increased expression of the Iso-His transcript in strain #16 cells, containing eight copies of the transgene, compared to strain #6, containing a single copy of the transgene. No Iso-His transcript is detectable in WT cells (FIG. 11). To assess whether increased mRNA expression correlated with increased isomerase enzymatic activity, cell extracts were harvested from WT, Iso-His #6 and Iso-His #16 cells. Enhanced isomerase enzyme activity is observed in strain #16 cells compared with strain #6 and WT cells (FIG. 12). Finally, the ability of strain #16 to metabolize xylose was examined in xylose depletion assays (FIG. 14) and compared with WT cells. These flask fermentations demonstrated the ability to metabolise xylose and quantify the amount of xylose converted to xylitol. Thus, FIG. 14 shows an increase in xylose metabolism in Iso-His strain #16 compared with WT cells and significantly less production of xylitol.
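By way of illustration only, one common way to estimate transgene copy number from qPCR on genomic DNA is the comparative Ct approach, in which the transgene signal is normalized to a reference gene and then to a calibrator of known copy number. The text above does not state which calculation was used, so this is an assumed method; the Ct values are hypothetical and perfect doubling per cycle is assumed.

```python
def relative_copy_number(ct_target, ct_reference,
                         ct_target_cal, ct_reference_cal,
                         efficiency=2.0):
    """Copy number of a target relative to a calibrator sample.

    efficiency is the fold amplification per cycle (2.0 means perfect
    doubling); the calibrator is assumed to carry exactly one copy.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return efficiency ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values; these are NOT data from FIG. 8.
copies = relative_copy_number(
    ct_target=22.0, ct_reference=24.0,          # multi-copy strain
    ct_target_cal=25.0, ct_reference_cal=24.0,  # single-copy calibrator
)
print(f"Estimated transgene copies: {copies:.0f}")
```

A target that crosses threshold three cycles earlier than the single-copy calibrator, after normalization, corresponds to an eight-fold higher copy number under these assumptions.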

[0099] For flask assays, cells were grown in media for 2 to 3 days. Pellets were washed twice in Media 2 (9 g/L NaCl, 4 g/L MgSO.sub.4, 100 mg/L CaCl.sub.2, 5 mg/L FeCl.sub.3, 20 g/L (NH.sub.4).sub.2SO.sub.4, 0.86 g/L KH.sub.2PO.sub.4, 150 .mu.g/L vitamin B12, 30 .mu.g/L biotin, 6 mg/L thiamine hydrochloride, 1.5 mg/L cobalt (II) chloride, 3 mg/L manganese chloride) containing no sugar. Then, minimal media containing 20 g/L glucose and 50 g/L xylose was inoculated to an OD600 of 0.05 with the washed cells. Samples were taken at various time points and the amount of sugar remaining in the supernatant was analyzed by HPLC. As shown in FIGS. 22A, 22B, 22C and 22D, increased xylose isomerase gene copy number resulted in up to 40% more xylose usage and a 20% decrease in xylitol production when compared to WT.
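By way of illustration only, comparisons such as "more xylose usage" and "decrease in xylitol production" relative to WT can be derived from the HPLC concentrations as follows. The concentrations below are hypothetical and are not taken from FIGS. 22A to 22D; the function names are likewise illustrative.

```python
def consumed(initial_g_per_l, remaining_g_per_l):
    """Sugar consumed (g/L) between inoculation and sampling."""
    return initial_g_per_l - remaining_g_per_l


def xylitol_yield_percent(xylose_consumed, xylitol_produced):
    """Xylitol produced as a percentage (by mass) of the xylose consumed."""
    if xylose_consumed <= 0:
        return 0.0
    return 100.0 * xylitol_produced / xylose_consumed


def relative_change_percent(strain_value, wt_value):
    """Percent change of a strain's value relative to wild type."""
    return 100.0 * (strain_value - wt_value) / wt_value


# Hypothetical end-point values in g/L, starting from 50 g/L xylose.
wt_used = consumed(50.0, 30.0)
strain_used = consumed(50.0, 25.0)

wt_yield = xylitol_yield_percent(wt_used, xylitol_produced=10.0)
strain_yield = xylitol_yield_percent(strain_used, xylitol_produced=5.0)

print(f"Xylose used vs WT: {relative_change_percent(strain_used, wt_used):+.0f}%")
print(f"Xylitol yield: WT {wt_yield:.0f}%, strain {strain_yield:.0f}%")
```

Expressing xylitol as a fraction of the xylose consumed, rather than as an absolute concentration, keeps the comparison fair when strains consume different amounts of xylose.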

[0100] Iso-His strain #16 was then used as the parent strain for a second round of transformation to introduce the E. coli xylB gene. This gene was introduced under hygromycin (hygro) selection. The hygro gene from pChlamy_3, the 2A sequence, and the T18B codon optimized W3110 E. coli xylB gene were cloned under the control of the T18B .alpha.-tubulin promoter and terminator elements for expression in T18B iso-his #16 (FIG. 5). The in vitro ability of the E. coli xylulose kinase to work in concert with the T18B isomerase was confirmed by over-expression and purification of the histidine-tagged proteins in yeast followed by enzymatic reactions with xylose and xylulose. Biolistic transformation of T18B iso-his strain #16 with the xylB plasmid (pJB47) resulted in hygro and zeo resistant transformants. The insertion of the hygro-2A-xylB genes within the T18B genome was confirmed by PCR and Southern blot analysis (FIG. 9). Qualitatively, these data show the presence of a single copy of the transgene in strain #7-3 and multiple, concatameric, transgene copies, at a single site, in strain #7-7. The number of xylB gene insertions was determined by qPCR on genomic DNA isolations (FIG. 10). FIG. 10 shows sixteen insertions of the transgene in strain 7-7 and one copy in strain 7-3. To determine whether multiple copies of the transgene confer enhanced xylose metabolism in vitro, cell extract assays were performed and the ability of the cell extracts to metabolise xylose was analysed (FIG. 13). The ability of the transformant cells to metabolize xylose was examined through flask-based xylose depletion assays (FIG. 15). In this experiment, WT cells consumed the least amount of xylose and made the most xylitol. Strains Iso-His #16, 7-3 and 7-7 all consumed similar amounts of xylose; however, only 7-7, containing multiple copies of the xylB transgene, did not make significant amounts of xylitol. Finally, strains Iso-His #16 and 7-7 were tested in 5 L fermentation vessels in media containing glucose and xylose. During a seventy-seven (77) hour fermentation, strain Iso-His #16 converted approximately 8% of xylose to xylitol, whereas strain 7-7 converted approximately 2% of xylose to xylitol. Xylitol accumulation in this fermentation is shown in FIG. 16.

[0101] For flask assays, cells were grown in media for 2 to 3 days. Pellets were washed twice in media containing no sugar. Media containing 20 g/L glucose and 50 g/L xylose was inoculated to an OD600 of 0.05 with the washed cells. Samples were taken at various time points and the amount of sugar remaining in the supernatant was analyzed by HPLC. As shown in FIGS. 23A, 23B, 23C and 23D, up to 50% more xylose was used and an 80% reduction in xylitol was observed in strains over-expressing both a xylose isomerase and a xylulose kinase when compared to WT.
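By way of illustration only, inoculating a flask to a target OD600 of 0.05 amounts to a simple dilution calculation. The sketch below assumes that OD600 scales linearly with cell density over the range used; the washed-cell OD600 and the flask volume are hypothetical values.

```python
def inoculum_volume_ml(target_od, final_volume_ml, cell_suspension_od):
    """Volume of washed cell suspension needed to reach the target OD600.

    Uses C1 * V1 = C2 * V2, where V2 is the final culture volume.
    """
    if cell_suspension_od <= target_od:
        raise ValueError("cell suspension must be denser than the target")
    return target_od * final_volume_ml / cell_suspension_od


# Hypothetical: washed cells at OD600 2.5, 100 mL final culture volume.
volume = inoculum_volume_ml(target_od=0.05,
                            final_volume_ml=100.0,
                            cell_suspension_od=2.5)
print(f"Add {volume:.1f} mL of washed cells, then bring to 100 mL with medium")
```

Dense suspensions are usually diluted before their OD600 is read, since the linear relationship assumed here does not hold at high optical densities.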

[0102] To further analyze these strains, the strains were grown in parallel 5 L Sartorius fermenters. Initial media contained 20 g/L glucose and 50 g/L xylose along with other basal media components. Both cultures were maintained at 28.degree. C. and pH 5.5, with constant mixing at 720 RPM and constant aeration at 1 Lpm of environmental air. The cultures were fed glucose for 16 hrs followed by an 8 hr starvation period. This cycle was completed 3 times. During starvation periods, 10 mL samples were taken every 0.5 hr. Glucose, xylose and xylitol concentrations were quantified in these samples by HPLC. Larger 50 mL samples were taken periodically for further biomass and oil content quantification. Glucose feed rates matched glucose consumption rates, which were quantified by CO.sub.2 detected in the culture exhaust gas. As shown in FIG. 24, the 7-7 strain used up to 52% more xylose than WT under these conditions.
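By way of illustration only, the feed and starvation cycling described above can be laid out as a timetable. The sketch below assumes that the first feed starts at time zero and that samples are taken at both the start and the end of each starvation period; neither detail is stated in the text, and the function name is hypothetical.

```python
def feed_starve_schedule(cycles=3, feed_h=16.0, starve_h=8.0,
                         sample_interval_h=0.5):
    """Return one dict per cycle with feed/starvation times and sample times (h)."""
    schedule = []
    t = 0.0
    for cycle in range(1, cycles + 1):
        starve_start = t + feed_h
        starve_end = starve_start + starve_h
        # Assumption: sample at both end points of the starvation window.
        n_samples = int(round(starve_h / sample_interval_h)) + 1
        samples = [starve_start + i * sample_interval_h
                   for i in range(n_samples)]
        schedule.append({
            "cycle": cycle,
            "feed_start_h": t,
            "starve_start_h": starve_start,
            "starve_end_h": starve_end,
            "sample_times_h": samples,
        })
        t = starve_end
    return schedule


schedule = feed_starve_schedule()
for entry in schedule:
    print(f"Cycle {entry['cycle']}: feed from {entry['feed_start_h']:.0f} h, "
          f"starve {entry['starve_start_h']:.0f}-{entry['starve_end_h']:.0f} h, "
          f"{len(entry['sample_times_h'])} samples")

total_sample_volume_ml = 10.0 * sum(len(e["sample_times_h"]) for e in schedule)
```

Under these assumptions the three cycles span 72 hours, and the 10 mL starvation samples alone remove about 510 mL from a 5 L vessel, which may be worth accounting for when interpreting concentration data.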

[0103] By Southern blot analysis, it was observed that strain Iso-His #16 contains eight (8) insertions of the isomerase transgene (FIG. 8). This unexpected multiple insertion resulted in an increase in isomerase gene expression relative to strains harbouring a single copy (FIG. 11) as well as increased isomerase in vitro activity (FIG. 12). Strain Iso-His #16 demonstrated greater xylose productivity than strains harbouring a single copy of the isomerase transgene (FIG. 14).

[0104] Similarly, within the Iso-His+xylB transformants, one of the clones (Iso-His+xylB 7-7) also had multiple insertions of the xylB gene (FIG. 10), which resulted in increased in vivo activity of both the xylose isomerase and xylulose kinase within the cell (FIG. 13). This clone was capable of using either as much or more xylose than the parental strain, Iso-His #16, while producing significantly less xylitol (FIG. 15). Furthermore, Iso-His+xylB 7-7 produced more biomass than WT in the presence of xylose. These two strains showed that not only is the presence of both the isomerase and the kinase genes important, but the number of insertions is as well.

[0105] To further optimize the iso-his and xylB containing "7-7" strain, this strain was transformed with a xylose transporter. FIG. 17 shows exemplary constructs for transformation. Examples of xylose transporters to be used include, but are not limited to, At5g17010 and At5g59250 (Arabidopsis thaliana), Gfx1 and GXS1 (Candida), AspTx (Aspergillus), and Sut1 (Pichia). Gxs1 (SEQ ID NO:23) was selected for transformation. The results are shown in FIGS. 19A, 19B, and 19C. The transformants 36-2, 36-9, and 36-16, containing GXS1, use more xylose than 7-7 and WT strains. They also use glucose more slowly than WT and 7-7 strains. The data demonstrate both xylose and glucose being used in the earlier stages by the GXS1 containing strains. Further, the percent of xylitol made by the GXS1 containing strains is lower than both WT and 7-7 strains.

[0106] To further analyze the effect of sugar transporters on the metabolism of xylose, codon optimized xylose transporters AspTX from Aspergillus (An11g01100) and Gxs1 from Candida were introduced in the 7-7 strain (isohis+xylB). FIG. 25 shows the alpha-tubulin aspTx-neo and alpha-tubulin gxs1-neo constructs. T18 transformants were created using standard biolistics protocols as described by BioRad's Biolistic PDS-1000/He Particle Delivery System. Briefly, 0.6 .mu.m gold particles were coated with 2.5 .mu.g of linearized plasmid DNA (EcoRI, 37.degree. C., overnight). The coated gold particles were used to bombard WD plates previously spread with 1 ml of T18 cells at an OD600 of 1.0. The bombardment parameters included using a helium pressure of 1350 or 1100 psi with a target distance of 3 or 6 cm. After an overnight recovery, the cells were washed off the plate and plated on media containing selection antibiotics (G418 at 2 mg/mL). Plates were incubated for 1 week at 25.degree. C. to identify resistant colonies. The resulting transformants were screened by PCR and Southern blot (FIG. 26).

[0107] Southern blots were performed using standard protocols. Briefly, approximately 20 μg of genomic DNA were digested with 40 units of BamHI restriction enzyme in a total volume of 50 μL o/n at 37° C. 7.2 μg of each digested sample was run on a 1.0% agarose gel at 50 V for approximately 1.5 hours, with a digoxigenin (DIG) DNA molecular-weight marker II (Roche). DNA was depurinated in the gel by submerging the gel in 250 mM HCl for 15 minutes. The gel was further denatured by incubation in a solution containing 0.5 M NaOH and 1.5 M NaCl (pH 7.5) for two 15 minute washes. The reaction was then neutralized by incubation in 0.5 M Tris-HCl (pH 7.5) for two 15 minute washes. Finally, the gel was equilibrated in 20× saline-sodium citrate (SSC) buffer for 15 minutes. DNA was transferred to a positively charged nylon membrane (Roche) using a standard transfer apparatus. DNA was fixed to the membrane using a UV Stratalinker at an exposure of 120,000 μJ. The Southern blot probe was generated using a PCR DIG Probe Synthesis Kit (Roche) to generate a DIG-labelled probe according to the manufacturer's instructions. The DNA affixed to the nylon membrane was prehybridised with 20 mL of DIG EasyHyb solution (DIG EasyHyb Granules, Roche). The DIG-labelled probe was denatured by adding 40 μL of the ble-probe reaction mixture to 300 μL of ddH2O and incubating at 99° C. for 5 minutes. This solution was then added to 20 mL of DIG hybridization solution to create the probe solution. The probe solution was then added to the DNA-affixed nylon membrane and incubated at 53° C. overnight. The following day, the membrane was washed twice in 2×SSC, 0.1% SDS at RT. The membrane was further washed twice in 0.1×SSC, 0.1% SDS at 68° C. for 15 minutes. For detection, the membrane was washed and blocked using the DIG Wash and Block Buffer Set (Roche) according to the manufacturer's instructions. An anti-DIG-AP conjugated antibody from a DIG Nucleic Acid Detection Kit (Roche) was used for detection. 2 μL of the antibody solution was added to 20 mL of detection solution and incubated with the membrane at RT for 30 minutes. The blot was then immersed in a washing buffer provided with the kit. CDP-Star (Roche) was used for visualization. 10 μL of the CDP-Star solution was incubated on the membrane in 1 mL of detection solution, which was covered in a layer of 'sheet-protector' plastic to hold the solution to the membrane. Signal was immediately detected using a ChemiDoc imaging system.
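The quantities in the Southern blot protocol imply a few simple ratios that can be worked out directly. The Python sketch below computes the gel loading volume, the enzyme-to-DNA ratio, and the antibody fold dilution from the stated values. It assumes the digest volume stays at 50 μL (no loading dye or evaporation accounted for) and that added volumes are additive; the function names are hypothetical.

```python
def load_volume_ul(mass_to_load_ug, digest_mass_ug, digest_volume_ul):
    """Volume of digest containing the requested DNA mass.

    Assumption: DNA is evenly distributed and the digest volume is unchanged.
    """
    return digest_volume_ul * mass_to_load_ug / digest_mass_ug


def units_per_ug(enzyme_units, dna_mass_ug):
    """Restriction enzyme units per microgram of DNA."""
    return enzyme_units / dna_mass_ug


def fold_dilution(stock_volume_ul, diluent_volume_ul):
    """Final volume divided by stock volume, assuming additive volumes."""
    return (stock_volume_ul + diluent_volume_ul) / stock_volume_ul


if __name__ == "__main__":
    # 7.2 ug loaded from a 20 ug digest in 50 uL.
    print("load volume (uL):", load_volume_ul(7.2, 20.0, 50.0))
    # 40 units of BamHI for 20 ug of genomic DNA.
    print("BamHI units per ug:", units_per_ug(40.0, 20.0))
    # 2 uL of antibody added to 20 mL (20,000 uL) of detection solution.
    print("antibody fold dilution:", fold_dilution(2.0, 20000.0))
```

Under these assumptions the load works out to about 18 μL of digest per lane, the digest uses 2 units of BamHI per μg of DNA, and the antibody is diluted roughly 1:10,000.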

[0108] For flask assays, cells were grown in media for 2 to 3 days. Pellets were washed twice in Media 2 (9 g/L NaCl, 4 g/L MgSO4, 100 mg/L CaCl2, 5 mg/L FeCl3, 20 g/L (NH4)2SO4, 0.86 g/L KH2PO4, 150 μg/L vitamin B12, 30 μg/L biotin, 6 mg/L thiamine hydrochloride, 1.5 mg/L cobalt (II) chloride, 3 mg/L manganese chloride) containing no sugar. Then, Media 2 containing 20 g/L glucose and 20 g/L xylose was inoculated to an OD600 of 0.05 with the washed cells. As shown in FIGS. 27A, 27B, 27C and 27D, the expression of the xylose isomerase, xylulose kinase, and either xylose transporter resulted in up to 71% more xylose used and 40% less xylitol produced than in the parental strain 7-7.
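The Media 2 recipe and the inoculation target can be scaled to any flask volume with simple proportional arithmetic. The Python sketch below stores the stated concentrations in mg/L, scales them to a working volume, and estimates the inoculum volume from C1V1 = C2V2. The flask volume and the OD600 of the washed cell suspension used in the example are hypothetical, and the calculation assumes OD600 is proportional to cell density over the range used.

```python
# Media 2 component concentrations from paragraph [0108], in mg/L.
MEDIA_2_MG_PER_L = {
    "NaCl": 9000.0,
    "MgSO4": 4000.0,
    "CaCl2": 100.0,
    "FeCl3": 5.0,
    "(NH4)2SO4": 20000.0,
    "KH2PO4": 860.0,
    "vitamin B12": 0.150,
    "biotin": 0.030,
    "thiamine hydrochloride": 6.0,
    "cobalt (II) chloride": 1.5,
    "manganese chloride": 3.0,
}

# Sugars added for the assay in paragraph [0108], in mg/L.
SUGARS_MG_PER_L = {"glucose": 20000.0, "xylose": 20000.0}


def amounts_mg(volume_l, recipe=MEDIA_2_MG_PER_L):
    """Mass of each component (mg) needed for the given volume (L)."""
    return {name: conc * volume_l for name, conc in recipe.items()}


def inoculum_volume_ml(suspension_od600, target_od600, final_volume_ml):
    """Volume of cell suspension needed to reach the target OD600.

    Uses C1*V1 = C2*V2 and assumes OD600 scales linearly with cell density.
    """
    if suspension_od600 <= 0:
        raise ValueError("suspension OD600 must be positive")
    return target_od600 * final_volume_ml / suspension_od600


if __name__ == "__main__":
    # Hypothetical example: a 0.5 L batch and a 50 mL flask inoculated
    # from a washed suspension measured at OD600 = 5.0.
    for name, mg in amounts_mg(0.5).items():
        print(f"{name}: {mg:g} mg")
    print("inoculum (mL):", inoculum_volume_ml(5.0, 0.05, 50.0))
```

Keeping all concentrations in a single unit (mg/L) avoids mixing the g/L, mg/L, and μg/L values that appear in the recipe.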

[0109] For flask assays, cells were grown in media for 2 to 3 days. Pellets were washed twice in saline. Then, media containing 60 g/L xylose instead of glucose was inoculated to an OD600 of 0.05 with the washed cells. Samples were taken at various time points and the amount of sugar remaining in the supernatant was analyzed by HPLC. FIG. 28 shows that T18 growth in media containing xylose as the main carbon source requires over-expression of both an isomerase and a kinase. The expression of the transporters in this background did not significantly increase xylose usage in this media.
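Sugar usage in assays of this kind is derived from the residual sugar measured in the supernatant at each time point. The Python sketch below shows one way to turn such a time course into cumulative consumption and an average consumption rate. The time points and concentrations in the example are hypothetical placeholders, not values from FIG. 28, and the calculation ignores evaporation and sampling volume.

```python
def sugar_consumed(remaining_g_per_l):
    """Cumulative sugar consumed (g/L) relative to the first sample."""
    initial = remaining_g_per_l[0]
    return [initial - value for value in remaining_g_per_l]


def average_rate_g_per_l_h(times_h, remaining_g_per_l):
    """Average consumption rate between the first and last samples."""
    elapsed = times_h[-1] - times_h[0]
    if elapsed <= 0:
        raise ValueError("time points must span a positive interval")
    return (remaining_g_per_l[0] - remaining_g_per_l[-1]) / elapsed


if __name__ == "__main__":
    # Hypothetical HPLC readings for residual xylose, starting at 60 g/L.
    times_h = [0, 24, 48, 72]
    xylose_g_per_l = [60.0, 52.0, 40.0, 30.0]
    print("consumed (g/L):", sugar_consumed(xylose_g_per_l))
    print("average rate (g/L/h):",
          average_rate_g_per_l_h(times_h, xylose_g_per_l))
```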

[0110] Enhanced xylose usage by T18 7-7 and transporter strains was observed in media containing carbon from alternative feedstocks. For flask assays, cells were grown in media for 2 to 3 days. Pellets were washed twice in 0.9% saline solution. Media 2 containing 20 g/L glucose and 50 g/L xylose, supplied as a combination of lab-grade glucose and glucose and xylose from an alternative forestry feedstock, was inoculated to an OD600 of 0.05 with the washed cells. Samples were taken at various time points and the amount of sugar remaining in the supernatant was analyzed by HPLC. As shown in FIGS. 29A and 29B, in media containing sugars from an alternative feedstock, the T18 7-7 strains encoding transporters used more xylose than wild-type or T18 7-7.
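Comparisons such as "more xylose than wild-type or T18 7-7" are commonly expressed as a percent change relative to a reference strain. The Python sketch below shows that calculation. The input values are hypothetical placeholders chosen only to illustrate the arithmetic; they are not taken from FIGS. 29A and 29B.

```python
def percent_change(value, reference):
    """Percent change of `value` relative to `reference`.

    Positive results mean more than the reference; negative mean less.
    """
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return 100.0 * (value - reference) / reference


if __name__ == "__main__":
    # Hypothetical xylose used (g/L) by a reference and a transporter strain.
    reference_xylose_used = 20.0
    transporter_xylose_used = 25.0
    print(percent_change(transporter_xylose_used, reference_xylose_used))
```

The same function applies to xylitol production, where a negative result indicates less xylitol than the reference strain.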

Sequence CWU 1

1

3011723DNAThraustochytrium sp. 1gtagtcatac gctcgtctca aagattaagc catgcatgtg taagtataag cgattatact 60gtgagactgc gaacggctca ttatatcagt tatgatttct tcggtatttt ctttatatgg 120atacctgcag taattctgga attaatacat gctgagaggg cccgactgtt cgggagggcc 180gcacttatta gagttgaagc caagtaagat ggtgagtcat gataattgag cagatcgctt 240gtttggagcg atgaatcgtt tgagtttctg ccccatcagt tgtcgacggt agtgtattgg 300actacggtga ctataacggg tgacggggag ttagggctcg actccggaga gggagcctga 360gagacggcta ccacatccaa ggaaggcagc aggcgcgtaa attacccaat gtggactcca 420cgaggtagtg acgagaaata tcaatgcggg gcgcttcgcg tcttgctatt ggaatgagag 480caatgtaaaa ccctcatcga ggatcaactg gagggcaagt ctggtgccag cagccgcggt 540aattccagct ccagaagcgt atgctaaagt tgttgcagtt aaaaagctcg tagttgaatt 600tctggggcgg gagccccggt ctttgcgcga ctgcgctctg tttgccgagc ggctcctctg 660ccatcctcgc ctcttttttt agtggcgtcg ttcactgtaa ttaaagcaga gtgttccaag 720caggtcgtat gacctggatg tttattatgg gatgatcaga tagggctcgg gtgctatttt 780gttggtttgc acatctgagt aatgatgaat aggaacagtt gggggtattc gtatttagga 840gctagaggtg aaattcttgg atttccgaaa gacgaactac agcgaaggca tttaccaagc 900atgttttcat taatcaagaa cgaaagtctg gggatcgaag atgattagat accatcgtag 960tctagaccgt aaacgatgcc gacttgcgat tgcggggtgt ttgtattgga ccctcgcagc 1020agcacatgag aaatcaaagt ctttgggttc cggggggagt atggtcgcaa ggctgaaact 1080taaaggaatt gacggaaggg caccaccagg agtggagcct gcggcttaat ttgactcaac 1140acgggaaaac ttaccaggtc cagacatagg taggattgac agattgagag ctctttcttg 1200attctatggg tggtggtgca tggccgttct tagttggtgg agtgatttgt ctggttaatt 1260ccgttaacga acgagacctc ggcctactaa atagcggtgg gtatggcgac atacttgcgt 1320acgcttctta gagggacatg ttcggtatac gagcaggaag ttcgaggcaa taacaggtct 1380gtgatgccct tagatgttct gggccgcacg cgcgctacac tgatgggttc aacgggtggt 1440catcgttgtt cgcagcgagg tgctttgccg gaaggcatgg caaatccttt caacgcccat 1500cgtgctgggg ctagattttt gcaattatta atctccaacg aggaattcct agtaaacgca 1560agtcatcagc ttgcattgaa tacgtccctg ccctttgtac acaccgcccg tcgcacctac 1620cgattgaacg gtccgatgaa accatgggat gaccttttga gcgtttgttc gcgagggggg 1680tcagaactcg ggtgaatctt 
attgtttaga ggaaggtgaa gtc 172321320DNAThraustochytrium sp. 2atggagttct tccccgaggt ggccaaggtg gagtacgccg gccccgagag ccgcgacgtc 60ctggcgtata gatggtacaa caaggaagag gtagtgatgg ggaagaaaat gaaggagtgg 120ctgaggttct cggtgtgctt ttggcatacc tttcgcggaa acgggtcgga cccctttggc 180aagcccacca tcacgcaccg cttcgcaggc gacgatggtt cggacaccat ggagaacgcc 240ctccggcgcg ttgaggcggc ctttgagctc tttgtcaagc tcggcgtgga gttctactcc 300tttcacgacg tcgatgtggc gcctgagggc aagacgctca aggagacaaa cgagaacctg 360gacaagatca cggaccgcat gctcgagctg caacaggaga cgggcgtcaa gctgctctgg 420ggcactgcca acttgttctc tcatccgcga tacatgaacg gcgggtcaac aaacccggat 480cccaaggtct ttgtgcgcgc cgccgcgcag gtgaaaaagg ccatcgacgt gacccacaaa 540ctcggtggcg aaggctttgt gttctggggc ggtcgggagg gttacatgca cattctcaac 600acggatatgg tccgtgaaat gaatcattac gcgaaaatgc tcaagatggc catcgcctac 660aagaaaaaga tcggcttcgg cgggcagatc ctggtcgaac ccaagccccg cgagcccatg 720aagcaccagt atgactacga cgtgcagacc gtcattggct ttctcagaca gcacggcctg 780gaaaacgagg tcagcctcaa cgtggagccc aatcacacgc agctcgccgg gcacgagttt 840gagcacgatg tcgtcctcgc cgcgcagctc ggcatgctcg gcagcgtcga cgccaacacg 900ggctccgaga gcctcgggtg ggacacggac gagttcatca ccgaccaaac gcgcgccact 960gtgctttgca aggccatcat tgagatgggt ggtttcgttc agggcggtct caactttgac 1020gccaaggtcc gtcgggagag caccgacccg gaggacctct ttatcgctca tgtcgcctcg 1080attgacgcgc tcgccaaggg tctgcgcaac gcttcgcagc tcgtttctga cggccgcatg 1140cgcaaaatgc tccaggaccg gtacgccggc tgggatgagg gcatcggaca aaagattgag 1200attggggaaa cctcgcttga ggacctcgag gcccactgcc tgcaggacga cacggaacca 1260gtcaagacgt cggccaagca ggagaaattc cttgccgttc tcaaccacta catttcctaa 1320311PRTArtificial SequenceSynthetic Construct 3Met His His His His His His Gly Ser Met Ser1 5 10433DNAArtificial SequenceSynthetic Construct 4atgcaccacc accaccacca cggttccatg tcg 3351455DNAArtificial SequenceArtificial Construct 5atgtacatcg gcatcgacct cggcacctcg ggcgtcaagg tcatcctcct caacgagcag 60ggcgaggtcg tcgccgccca gaccgagaag ctcaccgtct cgcgcccgca cccgctctgg 120tcggagcagg acccggagca gtggtggcag gccaccgacc 
gcgccatgaa ggccctcggc 180gaccagcact cgctccagga cgtcaaggcc ctcggcatcg ccggccagat gcacggcgcc 240accctcctcg acgcccagca gcgcgtcctc cgcccggcca tcctctggaa cgacggccgc 300tgcgcccagg agtgcaccct cctcgaggcc cgcgtcccgc agtcgcgcgt catcaccggc 360aacctcatga tgccgggctt caccgccccg aagctcctct gggtccagcg ccacgagccg 420gagatcttcc gccagatcga caaggtcctc ctcccgaagg actacctccg cctccgcatg 480accggcgagt tcgcctcgga catgtcggac gccgccggca ccatgtggct cgacgtcgcc 540aagcgcgact ggtcggacgt catgctccag gcctgcgacc tctcgcgcga ccagatgccg 600gccctctacg agggctcgga gatcaccggc gccctcctcc cggaggtcgc caaggcctgg 660ggcatggcca ccgtcccggt cgtcgccggc ggcggcgaca acgccgccgg cgccgtcggc 720gtcggcatgg tcgacgccaa ccaggccatg ctctcgctcg gcacctcggg cgtctacttc 780gccgtctcgg agggcttcct ctcgaagccg gagtcggccg tccactcgtt ctgccacgcc 840ctcccgcagc gctggcacct catgtcggtc atgctctcgg ccgcctcgtg cctcgactgg 900gccgccaagc tcaccggcct ctcgaacgtc ccggccctca tcgccgccgc ccagcaggcc 960gacgagtcgg ccgagccggt ctggttcctc ccgtacctct cgggcgagcg caccccgcac 1020aacaacccgc aggccaaggg cgtcttcttc ggcctcaccc accagcacgg cccgaacgag 1080ctcgcccgcg ccgtcctcga gggcgtcggc tacgccctcg ccgacggcat ggacgtcgtc 1140cacgcctgcg gcatcaagcc gcagtcggtc accctcatcg gcggcggcgc ccgctcggag 1200tactggcgcc agatgctcgc cgacatctcg ggccagcagc tcgactaccg caccggcggc 1260gacgtcggcc cggccctcgg cgccgcccgc ctcgcccaga tcgccgccaa cccggagaag 1320tcgctcatcg agctcctccc gcagctcccg ctcgagcagt cgcacctccc ggacgcccag 1380cgctacgccg cctaccagcc gcgccgcgag accttccgcc gcctctacca gcagctcctc 1440ccgctcatgg cctaa 1455622PRTArtificial SequenceSynthetic Construct 6Gly Ser Gly Ala Thr Asn Phe Ser Leu Leu Lys Gln Ala Gly Asp Val1 5 10 15Glu Glu Asn Pro Gly Pro 20720DNAArtificial SequenceSynthetic Construct 7gacgacgtga ccctgttcat 20819DNAArtificial SequenceSynthetic Construct 8tcccggaagt tcgtggaca 19920DNAArtificial SequenceSynthetic Construct 9tgagattggg gaaacctcgc 201022DNAArtificial SequenceSynthetic Construct 10tcttgactgg ttggccgtgt cg 221120DNAArtificial SequenceSynthetic Construct 11cgtcctgcgc 
attgatcttg 201220DNAArtificial SequenceSynthetic Construct 12ggcgagcttc tccttgatgt 201320DNAArtificial SequenceSynthetic Construct 13ggcgtcaacc acaaggagta 201420DNAArtificial SequenceSynthetic Construct 14tgtcgttgat gaccttggca 20151669DNAPiromyces sp. 15gtaaatggct aaggaatatt tcccacaaat tcaaaagatt aagttcgaag gtaaggattc 60taagaatcca ttagccttcc actactacga tgctgaaaag gaagtcatgg gtaagaaaat 120gaaggattgg ttacgtttcg ccatggcctg gtggcacact ctttgcgccg aaggtgctga 180ccaattcggt ggaggtacaa agtctttccc atggaacgaa ggtactgatg ctattgaaat 240tgccaagcaa aaggttgatg ctggtttcga aatcatgcaa aagcttggta ttccatacta 300ctgtttccac gatgttgatc ttgtttccga aggtaactct attgaagaat acgaatccaa 360ccttaaggct gtcgttgctt acctcaagga aaagcaaaag gaaaccggta ttaagcttct 420ctggagtact gctaacgtct tcggtcacaa gcgttacatg aacggtgcct ccactaaccc 480agactttgat gttgtcgccc gtgctattgt tcaaattaag aacgccatag acgccggtat 540tgaacttggt gctgaaaact acgtcttctg gggtggtcgt gaaggttaca tgagtctcct 600taacactgac caaaagcgtg aaaaggaaca catggccact atgcttacca tggctcgtga 660ctacgctcgt tccaagggat tcaagggtac tttcctcatt gaaccaaagc caatggaacc 720aaccaagcac caatacgatg ttgacactga aaccgctatt ggtttcctta aggcccacaa 780cttagacaag gacttcaagg tcaacattga agttaaccac gctactcttg ctggtcacac 840tttcgaacac gaacttgcct gtgctgttga tgctggtatg ctcggttcca ttgatgctaa 900ccgtggtgac taccaaaacg gttgggatac tgatcaattc ccaattgatc aatacgaact 960cgtccaagct tggatggaaa tcatccgtgg tggtggtttc gttactggtg gtaccaactt 1020cgatgccaag actcgtcgta actctactga cctcgaagac atcatcattg cccacgtttc 1080tggtatggat gctatggctc gtgctcttga aaacgctgcc aagctcctcc aagaatctcc 1140atacaccaag atgaagaagg aacgttacgc ttccttcgac agtggtattg gtaaggactt 1200tgaagatggt aagctcaccc tcgaacaagt ttacgaatac ggtaagaaga acggtgaacc 1260aaagcaaact tctggtaagc aagaactcta cgaagctatt gttgccatgt accaataagt 1320taatcgtagt taaattggta aaataattgt aaaatcaata aacttgtcaa tcctccaatc 1380aagtttaaaa gatcctatct ctgtactaat taaatatagt acaaaaaaaa atgtataaac 1440aaaaaaaagt ctaaaagacg gaagaattta atttagggaa aaaataaaaa taataataaa 1500caatagataa 
atcctttata ttaggaaaat gtcccattgt attattttca tttctactaa 1560aaaagaaagt aaataaaaca caagaggaaa ttttcccttt tttttttttt tgtaataaat 1620tttatgcaaa tataaatata aataaaataa taaaaaaaaa aaaaaaaaa 166916395PRTStreptomyces lividans 16Met Asn Tyr Gln Pro Thr Ser Glu Asp Arg Phe Thr Phe Gly Leu Trp1 5 10 15Thr Val Gly Trp Gln Gly Leu Asp Pro Phe Gly Asp Ala Thr Arg Glu 20 25 30Ala Leu Asp Pro Ala Glu Ser Val Arg Arg Leu Ser Gln Leu Gly Ala 35 40 45Tyr Gly Val Thr Phe His Asp Asp Glu Leu Ile Pro Phe Gly Ser Ser 50 55 60Asp Asn Glu Arg Gly Val Ala His Gly Ala Gly Val Ala His Gln Ala65 70 75 80Val Pro Ala Gly Ala Gly Arg Asp Arg His Glu Gly Ala Asp Gly Asp 85 90 95Asp Glu Pro Val His Ala Pro Gly Cys Ser Arg Asp Gly Ala Phe Thr 100 105 110Ala Asn Asp Arg Asp Val Arg Gly Thr Arg Cys Ala Arg Ala Ile Arg 115 120 125Asn Ile Asp Leu Ala Val Glu His Val Ala Arg Ala Ser Thr Cys Ala 130 135 140Trp Gly Gly Arg Glu Gly Ala Glu Ser Gly Ala Ala Lys Asp Val Arg145 150 155 160Asp Ala Leu Asp Arg Met Lys Glu Ala Phe Asp Leu Leu Gly Glu Tyr 165 170 175Val Thr Glu Gln Gly Tyr Asp Leu Lys Phe Ala Ile Glu Pro Lys Pro 180 185 190Asn Glu Pro Arg Gly Asp Ile Leu Leu Pro Thr Val Gly His Ala Leu 195 200 205Ala Phe Ile Glu Arg Leu Glu Arg Pro Glu Leu Tyr Gly Val Asn Pro 210 215 220Glu Val Gly His Glu Gln Met Ala Gly Leu Asn Phe Pro His Gly Ile225 230 235 240Ala Gln Ala Leu Trp Ala Gly Lys Leu Phe His Ile Asp Leu Asn Gly 245 250 255Gln Ser Gly Ile Lys Tyr Asp Gln Asp Leu Arg Phe Gly Ala Gly Asp 260 265 270Leu Arg Ala Ala Phe Trp Leu Val Asp Leu Leu Glu Arg Ala Gly Tyr 275 280 285Ala Gly Pro Arg His Phe Asp Phe Lys Pro Pro Arg Thr Glu Asn Phe 290 295 300Asp Ala Val Trp Pro Ser Ala Ala Gly Cys Met Arg Asn Tyr Leu Ile305 310 315 320Leu Lys Asp Arg Ala Ala Ala Phe Arg Ala Asp Pro Gln Val Gln Glu 325 330 335Ala Leu Ala Ala Ala Arg Leu Asp Glu Leu Ala Arg Pro Thr Ala Glu 340 345 350Asp Gly Leu Ala Ala Leu Leu Ala Asp Arg Ser Ala Tyr Asp Thr Phe 355 360 365Asp Val Asp Ala Ala Ala Ala Arg Gly Met Ala Phe Glu His Leu 
Asp 370 375 380Gln Leu Ala Met Asp His Leu Leu Gly Ala Arg385 390 395172040DNAPiromyces sp. 17attatataaa ataactttaa ataaaacaat ttttatttgt ttatttaatt attcaaaaaa 60aattaaagta aaagaaaaat aatacagtag aacaatagta ataatatcaa aatgaagact 120gttgctggta ttgatcttgg aactcaaagt atgaaagtcg ttatttacga ctatgaaaag 180aaagaaatta ttgaaagtgc tagctgtcca atggaattga tttccgaaag tgacggtacc 240cgtgaacaaa ccactgaatg gtttgacaag ggtcttgaag tttgttttgg taagcttagt 300gctgataaca aaaagactat tgaagctatt ggtatttctg gtcaattaca cggttttgtt 360cctcttgatg ctaacggtaa ggctttatac aacatcaaac tttggtgtga tactgctacc 420gttgaagaat gtaagattat cactgatgct gccggtggtg acaaggctgt tattgatgcc 480cttggtaacc ttatgctcac cggtttcacc gctccaaaga tcctctggct caagcgcaac 540aagccagaag ctttcgctaa cttaaagtac attatgcttc cacacgatta cttaaactgg 600aagcttactg gtgattacgt tatggaatac ggtgatgcct ctggtaccgc tctcttcgat 660tctaagaacc gttgctggtc taagaagatt tgcgatatca ttgacccaaa acttttagat 720ttacttccaa agttaattga accaagcgct ccagctggta aggttaatga tgaagccgct 780aaggcttacg gtattccagc cggtattcca gtttccgctg gtggtggtga taacatgatg 840ggtgctgttg gtactggtac tgttgctgat ggtttcctta ccatgtctat gggtacttct 900ggtactcttt acggttacag tgacaagcca attagtgacc cagctaatgg tttaagtggt 960ttctgttctt ctactggtgg atggcttcca ttactttgta ctatgaactg tactgttgcc 1020actgaattcg ttcgtaacct cttccaaatg gatattaagg aacttaatgt tgaagctgcc 1080aagtctccat gtggtagtga aggtgtttta gttattccat tcttcaatgg tgaaagaact 1140ccaaacttac caaacggtcg tgctagtatt actggtctta cttctgctaa caccagccgt 1200gctaacattg ctcgtgctag tttcgaatcc gccgttttcg ctatgcgtgg tggtttagat 1260gctttccgta agttaggttt ccaaccaaag gaaattcgtc ttattggtgg tggttctaag 1320ctgatctctg gagacaaatt gccgctgata tcatgaacct tccaatcaga gttccacttt 1380tagaagaagc tgctgctctt ggtggtgctg ttcaagcttt atggtgtctt aagaaccaat 1440ctggtaagtg tgatattgtt gaactttgca aagaacacat taagattgat gaatctaaga 1500atgctaaccc aattgccgaa aatgttgctg tttacgacaa ggcttacgat gaatactgca 1560aggttgtaaa tactctttct ccattatatg cttaaattgc caatgtaaaa aaaaatataa 1620tgccatataa ttgccttgtc aatacactgt 
tcatgttcat ataatcatag gacattgaat 1680ttacaaggtt tatacaatta atatctatta tcatattatt atacagcatt tcattttcta 1740agattagacg aaacaattct tggttccttg caatatacaa aatttacatg aatttttaga 1800atagtctcgt atttatgccc aataatcagg aaaattacct aatgctggat tcttgttaat 1860aaaaacaaaa taaataaatt aaataaacaa ataaaaatta taagtaaata taaatatata 1920agtaatataa aaaaaaagta aataaataaa taaataaata aaaatttttt gcaaatatat 1980aaataaataa ataaaatata aaaataattt agcaaataaa ttaaaaaaaa aaaaaaaaaa 2040181803DNASaccharomyces sp. 18atgttgtgtt cagtaattca gagacagaca agagaggttt ccaacacaat gtctttagac 60tcatactatc ttgggtttga tctttcgacc caacaactga aatgtctcgc cattaaccag 120gacctaaaaa ttgtccattc agaaacagtg gaatttgaaa aggatcttcc gcattatcac 180acaaagaagg gtgtctatat acacggcgac actatcgaat gtcccgtagc catgtggtta 240gaggctctag atctggttct ctcgaaatat cgcgaggcta aatttccatt gaacaaagtt 300atggccgtct cagggtcctg ccagcagcac gggtctgtct actggtcctc ccaagccgaa 360tctctgttag agcaattgaa taagaaaccg gaaaaagatt tattgcacta cgtgagctct 420gtagcatttg caaggcaaac cgcccccaat tggcaagacc acagtactgc aaagcaatgt 480caagagtttg aagagtgcat aggtgggcct gaaaaaatgg ctcaattaac agggtccaga 540gcccatttta gatttactgg tcctcaaatt ctgaaaattg cacaattaga accagaagct 600tacgaaaaaa caaagaccat ttctttagtg tctaattttt tgacttctat cttagtgggc 660catcttgttg aattagagga ggcagatgcc tgtggtatga acctttatga tatacgtgaa 720agaaaattca gtgatgagct actacatcta attgatagtt cttctaagga taaaactatc 780agacaaaaat taatgagagc acccatgaaa aatttgatag cgggtaccat ctgtaaatat 840tttattgaga agtacggttt caatacaaac tgcaaggtct ctcccatgac tggggataat 900ttagccacta tatgttcttt acccctgcgg aagaatgacg ttctcgtttc cctaggaaca 960agtactacag ttcttctggt caccgataag tatcacccct ctccgaacta tcatcttttc 1020attcatccaa ctctgccaaa ccattatatg ggtatgattt gttattgtaa tggttctttg 1080gcaagggaga ggataagaga cgagttaaac aaagaacggg aaaataatta tgagaagact 1140aacgattgga ctctttttaa tcaagctgtg ctagatgact cagaaagtag tgaaaatgaa 1200ttaggtgtat attttcctct gggggagatc gttcctagcg taaaagccat aaacaaaagg 1260gttatcttca atccaaaaac gggtatgatt gaaagagagg tggccaagtt 
caaagacaag 1320aggcacgatg ccaaaaatat tgtagaatca caggctttaa gttgcagggt aagaatatct 1380cccctgcttt cggattcaaa cgcaagctca caacagagac tgaacgaaga tacaatcgtg 1440aagtttgatt acgatgaatc tccgctgcgg gactacctaa ataaaaggcc agaaaggact 1500ttttttgtag gtggggcttc taaaaacgat gctattgtga agaagtttgc tcaagtcatt 1560ggtgctacaa agggtaattt taggctagaa acaccaaact catgtgccct tggtggttgt 1620tataaggcca tgtggtcatt gttatatgac tctaataaaa ttgcagttcc ttttgataaa 1680tttctgaatg acaattttcc atggcatgta atggaaagca tatccgatgt ggataatgaa 1740aattgggatc gctataattc caagattgtc cccttaagcg aactggaaaa gactctcatc 1800taa 1803192942DNAPichia sp.misc_feature(4)..(4)n is a, c, g, or tmisc_feature(38)..(38)n is a, c, g, or tmisc_feature(85)..(85)n is a, c, g, or tmisc_feature(88)..(88)n is a, c, g, or tmisc_feature(146)..(146)n is a, c, g, or tmisc_feature(174)..(174)n is a, c, g, or t 19ttanacagtt ttccagaatc caaattttcc aaccaacnaa aaacggaccc agaaagttac 60agatttttca gagcttcatc ttttntanga tttcacagct tcatcaattt cagaccatag 120ccataatgac ttttgtagag tttccnatca ctattcccaa ccagcagcgt gtgnaaactg 180ccatcaccta tagtgcctac ttttcggttt tcaccagtgt ggtttttggc ctagttacaa 240attcgctaga gaatgttgtg tatgcttttg gagcgcagac tgccatcacg ttagtgttga 300ctgcattcaa ctggccgtgg ttcacgagtg ctcccggtat cgaatggctc ccggtagaat 360tttaggatcg tatggtgact tggcgattta actgggtagc acaagggaat tttcaggaaa 420ttttctggtt ggacattttg ggcggctgaa ctttcatggt taaaaggact aaggccagat 480tctcgggggg agaaaaattt ctgttagttt ggaattttcc gagccccaca cattgcgatg 540gtagattcgg tacgaaacta tataaacggt tggattccta gaaagggcca gatcagattg 600tagstagtat atatagcata tagatccctg gaggataccc acagacatta ctgctactaa 660ttcataccat acttgacgta tatctgcgca tacatatcta ccccaacttt catataaaat 720tcctagattt attgcatctt ctaatagagt catttttcag atttttcaat ttccatagaa 780agcatacatt ttcatacagc ttctatttgt taatcgacct gataatttta ctagccatat 840ttcttttttt gatttttcac

ttaatcgaca tataaatact cacgtagttg acactcacaa 900tgaccactac cccatttgat gctccagata agctcttcct cgggttcgat ctttcgactc 960agcagttgaa gatcatcgtc accgatgaaa acctcgctgc tctcaaaacc tacaatgtcg 1020agttcgatag catcaacagc tctgtccaga agggtgtcat tgctatcaac gacgaaatca 1080gcaagggtgc cattatttcc cccgtttaca tgtggttgga tgcccttgac catgtttttg 1140aagacatgaa gaaggacgga ttccccttca acaaggttgt tggtatttcc ggttcttgtc 1200aacagcacgg ttcggtatac tggtctagaa cggccgagaa ggtcttgtcc gaattggacg 1260ctgaatcttc gttatcgagc cagatgagat ctgctttcac cttcaagcac gctccaaact 1320ggcaggatca ctctaccggt aaagagcttg aagagttcga aagagtgatt ggtgctgatg 1380ccttggctga tatctctggt tccagagccc attacagatt cacagggctc cagattagaa 1440agttgtctac cagattcaag cccgaaaagt acaacagaac tgctcgtatc tctttagttt 1500cgtcatttgt tgccagtgtg ttgcttggta gaatcacctc cattgaagaa gccgatgctt 1560gtggaatgaa cttgtacgat atcgaaaagc gcgagttcaa cgaagagctc ttggccatcg 1620ctgctggtgt ccaccctgag ttggatggtg tagaacaaga cggtgaaatt tacagagctg 1680gtatcaatga gttgaagaga aagttgggtc ctgtcaaacc tataacatac gaaagcgaag 1740gtgacattgc ctcttacttt gtcaccagat acggcttcaa ccccgactgt aaaatctact 1800cgttcaccgg agacaatttg gccacgatta tctcgttgcc tttggctcca aatgatgctt 1860tgatctcatt gggtacttct actacagttt taattatcac caagaactac gctccttctt 1920ctcaatacca tttgtttaaa catccaacca tgcctgacca ctacatgggc atgatctgct 1980actgtaacgg ttccttggcc agagaaaagg ttagagacga agtcaacgaa aagttcaatg 2040tagaagacaa gaagtcgtgg gacaagttca atgaaatctt ggacaaatcc acagacttca 2100acaacaagtt gggtatttac ttcccacttg gcgaaattgt ccctaatgcc gctgctcaga 2160tcaagagatc ggtgttgaac agcaagaacg aaattgtaga cgttgagttg ggcgacaaga 2220actggcaacc tgaagatgat gtttcttcaa ttgtagaatc acagactttg tcttgtagat 2280tgagaactgg tccaatgttg agcaagagtg gagattcttc tgcttccagc tctgcctcac 2340ctcaaccaga aggtgatggt acagatttgc acaaggtcta ccaagacttg gttaaaaagt 2400ttggtgactt gttcactgat ggaaagaagc aaacctttga gtctttgacc gccagaccta 2460accgttgtta ctacgtcggt ggtgcttcca acaacggcag cattatccsc aagatgggtt 2520ccatcttggc tcccgtcaac ggaaactaca aggttgacat tcctaacgcc 
tgtgcattgg 2580gtggtgctta caaggccagt tggagttacg agtgtgaagc caagaaggaa tggatcggat 2640acgatcagta tatcaacaga ttgtttgaag taagtgacga gatgaatctg ttcgaagtca 2700aggataaatg gctcgaatat gccaacgggg ttggaatgtt ggccaagatg gaaagtgaat 2760tgaaacacta aaatccataa tagcttgtat agaggtatag aaaaagagaa cgttatagag 2820taaagacaat gtagcatata tgtgcgaata tcacgataga cgttatacag aagattactt 2880tcacatcatt ttgaaaatat cttgatatgt tcatatttca ttcgcctcta gcatttttca 2940ga 2942201455DNAE. coli 20atgtatatcg ggatagatct tggcacctcg ggcgtaaaag ttattttgct caacgagcag 60ggtgaggtgg ttgctgcgca aacggaaaag ctgaccgttt cgcgcccgca tccactctgg 120tcggaacaag acccggaaca gtggtggcag gcaactgatc gcgcaatgaa agctctgggc 180gatcagcatt ctctgcagga cgttaaagca ttgggtattg ccggccagat gcacggagca 240accttgctgg atgctcagca acgggtgtta cgccctgcca ttttgtggaa cgacgggcgc 300tgtgcgcaag agtgcacttt gctggaagcg cgagttccgc aatcgcgggt gattaccggc 360aacctgatga tgcccggatt tactgcgcct aaattgctat gggttcagcg gcatgagccg 420gagatattcc gtcaaatcga caaagtatta ttaccgaaag attacttgcg tctgcgtatg 480acgggggagt ttgccagcga tatgtctgac gcagctggca ccatgtggct ggatgtcgca 540aagcgtgact ggagtgacgt catgctgcag gcttgcgact tatctcgtga ccagatgccc 600gcattatacg aaggcagcga aattactggt gctttgttac ctgaagttgc gaaagcgtgg 660ggtatggcga cggtgccagt tgtcgcaggc ggtggcgaca atgcagctgg tgcagttggt 720gtgggaatgg ttgatgctaa tcaggcaatg ttatcgctgg ggacgtcggg ggtctatttt 780gctgtcagcg aagggttctt aagcaagcca gaaagcgccg tacatagctt ttgccatgcg 840ctaccgcaac gttggcattt aatgtctgtg atgctgagtg cagcgtcgtg tctggattgg 900gccgcgaaat taaccggcct gagcaatgtc ccagctttaa tcgctgcagc tcaacaggct 960gatgaaagtg ccgagccagt ttggtttctg ccttatcttt ccggcgagcg tacgccacac 1020aataatcccc aggcgaaggg ggttttcttt ggtttgactc atcaacatgg ccccaatgaa 1080ctggcgcgag cagtgctgga aggcgtgggt tatgcgctgg cagatggcat ggatgtcgtg 1140catgcctgcg gtattaaacc gcaaagtgtt acgttgattg ggggcggggc gcgtagtgag 1200tactggcgtc agatgctggc ggatatcagc ggtcagcagc tcgattaccg tacggggggg 1260gatgtggggc cagcactggg cgcagcaagg ctggcgcaga tcgcggcgaa tccagagaaa 1320tcgctcattg 
aattgttgcc gcaactaccg ttagaacagt cgcatctacc agatgcgcag 1380cgttatgccg cttatcagcc acgacgagaa acgttccgtc gcctctatca gcaacttctg 1440ccattaatgg cgtaa 1455211623DNAAspergillus sp. 21atggctatcg gcaatcttta cttcattgcg gccatcgccg tcgtcggcgg tggtctgttc 60ggtttcgata tctcgtcgat gtcggccatc atcgagaccg atgcctatct ctgttacttc 120aaccaggctc ctgtcactta cgatgatgat ggcaagaggg tctgtcaggg ccccagcgcg 180agtgtgcagg gtggtatcac cgcctccatg gctggtggtt cctggttggg ctcgttgatc 240tcgggtttca tctcggacag gcttggtcgt cgtactgcca ttcagatcgg ttccatcatc 300tggtgcattg gatccatcat tgtctgtgcc tcccagaaca ttcccatgct gatcgtcggt 360cgtatcatca acggtctgag tgtgggtatc tgctccgctc aggtgccagt gtatatttcg 420gagattgctc ctccaaccaa gcgtggtcgt gtcgtcggtc tgcaacaatg ggctattacc 480tggggtatcc tgatcatgtt ctacgtctcc tatggatgca gcttcatcaa gggtacggcg 540gccttccgga ttccctgggg tctgcagatg atccctgccg tgctattgtt cctgggtatg 600atgctcctgc ctgagtcacc ccgctggctg gcacgcaagg accgatggga ggagtgccac 660gctgttttga ccctcgtcca cggtcaggga gacccgagct ctccctttgt gcagcgtgaa 720tatgaagaga tcaagagcat gtgcgagttt gagcgccaaa acgcggatgt ctcctacctc 780gagctgttca agcccaacat gcttaaccgt acccatgtgg gtgttttcgt tcagatctgg 840tctcagttga ctggaatgaa cgtcatgatg tactacatca cctacgtctt tgccatggcc 900ggcttgaaag gtaacaacaa cttgatctcc tccagtatcc agtacgtgat caacgtgtgc 960atgactgtgc cggctctggt gtggggtgat cagtggggcc gtcgcccgac cttcttgatc 1020ggttccctct tcatgatgat ctggatgtac attaatgctg gtctgatggc cagctacggt 1080catcccgcgc cgcccggcgg tctcaacaac gtggaagccg agtcctgggt catccacggc 1140gcgcccagca aggctgtcat tgccagtacc tacctcttcg tagcctcata cgccatctcc 1200ttcggccccg ccagctgggt gtacccgccg gaactcttcc ctctgcgtgt gcgcggcaag 1260gctaccgccc tctgcacttc agccaactgg gccttcaact tcgccctcag ctattttgtc 1320cccccggcat ttgtcaacat ccagtggaag gtctacatcc tcttcggtgt cttctgtact 1380gccatgttct tgcacatttt cttcttcttt cccgagacca cgggtaagac cctggaagag 1440gtcgaggcca tcttcactga tcccaatggt attccgtaca tcggtactcc cgcctggaag 1500acaaagaacg agtactcgcg cggtgcacac attgaggagg ttggctttga agatgagaag 1560aaggttgctg 
gtgggcagac tatccaccag gaggtcacgg ctactccgga taagattgct 1620tga 1623221644DNACandida sp. 22atgtcacaag attcgcattc ttctggtgcc gctacaccag tcaatggttc catccttgaa 60aaggaaaaag aagactctcc agttcttcaa gttgatgccc cacaaaaggg tttcaaggac 120tacattgtca tttctatctt ctgttttatg gttgccttcg gtggtttcgt cttcggtttc 180gacactggta ccatttccgg tttcgtgaac atgtctgact ttaaagacag attcggtcaa 240caccacgctg atggtactcc ttacttgtcc gacgttagag ttggtttgat gatttctatt 300ttcaacgttg gttgcgctgt cggtggtatt ttcctctgca aggtcgctga tgtctggggt 360agaagaattg gtcttatgtt ctccatggct gtctacgttg ttggtattat tattcagatc 420tcttcatcca ccaagtggta ccagttcttc attggtcgtc ttattgctgg tttggctgtt 480ggtaccgttt ctgtcgtttc cccacttttc atctctgagg tttctccaaa gcaaattaga 540ggtactttag tgtgctgctt ccagttgtgt atcaccttgg gtatcttctt gggttactgt 600actacttacg gtactaagac ctacactgac tctagacagt ggagaattcc tttgggtttg 660tgtttcgctt gggctatctt gttggttgtc ggtatgttga acatgccaga gtctccaaga 720tacttggttg agaagcacag aattgatgag gccaagagat ccattgccag atccaacaag 780atccctgagg aggacccatt cgtctacact gaggttcagc ttattcaggc cggtattgag 840agagaagctt tggctggtca ggcatcttgg aaggagttga tcactggtaa gccaaagatc 900ttcagaagag ttatcatggg tattatgctt cagtccttgc aacagttgac cggtgacaac 960tacttcttct actacggtac taccattttc caggctgtcg gtttgaagga ttctttccag 1020acttctatca ttttgggtat tgtcaacttt gcttccacct tcgttggtat ctatgtcatt 1080gagagattgg gtagaagatt gtgtcttttg accggttccg ctgctatgtt catctgtttc 1140atcatctact ctttgattgg tactcagcac ttgtacaagc aaggttactc caacgagacc 1200tccaacactt acaaggcttc tggtaacgct atgatcttca tcacttgtct ttacattttc 1260ttctttgctt ctacctgggc tggtggtgtt tactgtatca tttccgagtc ctacccattg 1320agaattagat ccaaggccat gtctattgct accgctgcta actggttgtg gggtttcttg 1380atttccttct tcactccatt catcaccagt gccatccact tctactacgg tttcgttttc 1440actggttgtt tggctttctc tttcttctac gtctacttct tcgtctacga aaccaagggt 1500ctttctttgg aggaggttga tgagatgtac gcttccggtg ttcttccact caagtctgcc 1560agctgggttc caccaaatct tgagcacatg gctcactctg ccggttacgc tggtgctgac 1620aaggccaccg acgaacaggt ttaa 
1644

SEQ ID NO: 23, length 1569, DNA, Candida sp.

atgggtttgg aggacaatag aatggttaag cgtttcgtca acgttggcga gaagaaggct 60
ggctctactg ccatggccat catcgtcggt ctttttgctg cttctggtgg tgtccttttc 120
ggatacgata ctggtactat ttctggtgtg atgaccatgg actacgttct tgctcgttac 180
ccttccaaca agcactcttt tactgctgat gaatcttctt tgattgtttc tatcttgtct 240
gttggtactt tctttggtgc actttgtgct ccattcctta acgacaccct cggtagacgt 300
tggtgtctta ttctttctgc tcttattgtc ttcaacattg gtgctatctt gcaggtcatc 360
tctactgcca ttccattgct ttgtgctggt agagttattg caggttttgg tgtcggtttg 420
atttctgcta ctattccatt gtaccaatct gagactgctc caaagtggat cagaggtgcc 480
attgtctctt gttaccagtg ggctattacc attggtcttt tcttggcctc ttgtgtcaac 540
aagggtactg agcacatgac taactctgga tcttacagaa ttccacttgc tattcaatgt 600
ctttggggtc ttatcttggg tatcggtatg atcttcttgc cagagactcc aagattctgg 660
atctccaagg gtaaccagga gaaggctgct gagtctttgg ccagattgag aaagcttcca 720
attgaccacc cagactctct cgaggaatta agagacatca ctgctgctta cgagttcgag 780
actgtgtacg gtaagtcctc ttggagccag gtgttctctc acaagaacca ccagttgaag 840
agattgttca ctggtgtggc tatccaggct ttccagcaat tgaccggtgt taacttcatt 900
ttctactacg gtactacctt cttcaagaga gctggtgtta acggtttcac tatctccttg 960
gccactaaca ttgtcaatgt cggttctact attccaggta ttcttttgat ggaagtcctc 1020
ggtagaagaa acatgttgat gggtggtgct actggtatgt ctctttctca attgatcgtt 1080
gccattgttg gtgttgctac ctcggaaaac aacaagtctt cccagtccgt ccttgttgct 1140
ttctcctgta ttttcattgc cttcttcgct gccacctggg gtccatgtgc ttgggttgtt 1200
gttggtgagt tgttcccatt gagaaccaga gctaagtctg tctccttgtg tactgcttcc 1260
aactggttgt ggaactgggg tattgcttac gctactccat acatggtgga tgaagacaag 1320
ggtaacttgg gttccaatgt gttcttcatc tggggtggtt tcaacttggc ttgtgttttc 1380
ttcgcttggt acttcatcta cgagaccaag ggtctttctt tggagcaggt cgacgagttg 1440
tacgagcatg tcagcaaggc ttggaagtct aagggcttcg ttccatctaa gcactctttc 1500
agagagcagg tggaccagca aatggactcc aaaactgaag ctattatgtc tgaagaagct 1560
tctgtttaa 1569

SEQ ID NO: 24, length 1662, DNA, Pichia sp.

atgtctgtag atgaaaatca attggagaat ggacaacttc tatcctccga aaatgaggca 60
tcatcacctt ttaaagagtc tatcccttct cgctcttccc tctacttaat agctcttaca 120
gtttcacttt tgggagttca attgacttgg tcggttgaac ttggttatgg tacaccgtat 180
ttattctcac ttggtcttcg taaagaatgg acttcaatta tatggattgc cggtcctttg 240
actggaatat taattcagcc aattgctggt atattgtccg accgggttaa ttcaagaata 300
ggtcggcgga gaccgttcat gctctgtgct agtttgttag gaacattcag cttattcctt 360
atgggctggg cccctgatat ttgcctcttt atatttagca atgaggttct aatgaaacgt 420
gttactatcg ttttggctac gattagcatt tatttgcttg acgtggccgt caatgtcgta 480
atggctagca ctcgatcttt aattgttgat tcagtccgtt cagatcaaca gcatgaagca 540
aattcctggg ctggaagaat gataggtgta ggcaatgtgc ttgggtactt actaggctat 600
ttacctctat atcgcatctt ctcctttctc aatttcacac agttacaggt gttttgcgta 660
cttgcctcca tttccttggt actcacagtt accatcacaa caatatttgt gagtgaaagg 720
agattcccac cagttgaaca cgagaaatcg gttgctggag aaatctttga attttttaca 780
actatgcgac aaagtattac cgcacttcca tttacattaa aaagaatttg ttttgttcaa 840
ttttttgcat actttggatg gtttccattt ttgttttata ttactaccta tgtgggtatt 900
ttatatttac gccatgctcc taaaggccat gaagaagatt gggacatggc gactcgtcaa 960
gggtcgttcg cattactgct ttttgctatc atttctcttg ccgcaaatac agcacttcca 1020
ttgttgctcg aggacacgga ggatgatgag gaggacgaat cgagtgatgc atctaataat 1080
gaatacaaca ttcaagaaag aaacgatctc ggaaatataa gaactggtac taatacaccc 1140
cgtcttggta atttgagcga aacaacttct ttccgttcgg aaaatgagcc ctcacgacgc 1200
aggcttttac cgtctagtag atcaattatg acaacgatat cctccaaggt acaaatcaaa 1260
ggacttactc ttcctattct gtggttgagc tcccatgtcc tttttggtgt ttgtatgttg 1320
agcacgatat tcttgcaaac atcatggcaa gcgcaggcaa tggtagctat ctgtggactg 1380
tcctgggcat gtactctatg gattccatat tcgctatttt cttcagaaat agggaagctt 1440
ggattacgag aaagcagtgg caaaatgatt ggtgttcaca atgtatttat atctgccccc 1500
caagtgttga gcaccatcat tgccaccatt gtatttattc aatcggaggg cagtcatcga 1560
gacatcgccg acaatagtat agcatgggtg ttgagaattg gaggtatatc tgcatttcta 1620
gccgcgtacc aatgccggca tcttttgccc atcaactttt ga 1662

SEQ ID NO: 25, length 490, DNA, Thraustochytrium sp.

cgcggcttcc cgtctccaag cttcgtctcg gtagagattc tatcttcgcc cggcagcccg 60
ccgccgtccg gcaagtgtag aacggcagaa agcccacttg cacggaacgc ccgacaagtt 120
gacgaaagcg gcccgcaagt gcggcagccc ggctggtttt tcctcgcggc gaggccaaac 180
cgccaacgcc accaagccag acaccaggta tgtgccgcac gcgccgccgc acgcgagccc 240
cgaggatgcc ccgtacgcgc tgacgccttt ctccgccccg cccgcgagaa gacgcgctcc 300
ggcaacggcg ggagccgagc gaacgggcga ggattgatcg agtagctgca ggttgagaaa 360
aaaggaaaac cgccgagatg gacaacggct ggatggacga gaagacgcac gaggacgcga 420
ggactgacga tgatcacgtg cgcaggaaga cttgaaaaga agcaaggaag gtagaaaaaa 480
aagaagaaat 490

SEQ ID NO: 26, length 1004, DNA, Thraustochytrium sp.

ggcctgtctc ccttggccat ccattgcgct gcggaagcat tggattgcga actgcgtcgg 60
ccagatcgct tggtttccca acatgagacg cgctctgtcg gcaagaccat ttccgccccc 120
ggctttgctc acaaccaact cgtagtagat tttgtaaaga acactgcacg tctgactgct 180
cccagcccgc acgcattgcg cttggcagcc tcggtcccaa accgtcacgg tcgctgcccg 240
gtccacggga aaaaataact tttgtccgcg agcggccgtt caaggcgcag ccgcgagcgt 300
gccaaccgtc cgtcccgcat tcttttccca atgttggatt cattcattct tgccaggcca 360
gatcatctgt gcctccctcg cgtgcccttc cttagcgtgc gcagatctct tcttcccaga 420
gcccgcgcgg cgcttcgtgg agtcggcgtc catgtcatgc gcgcgcggcg tcttgacccc 480
ctcggcccct ttggttcgcg gctgcgcaac gagccgtttc acgccattgc gaccaaccgc 540
gcgctaaaat cggattggcc gttgcacgcc gattttgcag cacctctggg ctgtgaggga 600
cgaccgtcca cttttacccg cacagagtgg actttcaccc cctcactcca ctgaagccaa 660
cttttcgccg tcttcccaac ccaaagttta tgctagccct catgccgcaa cggacgtcac 720
ccccatttcc actggcgacg tggggacctg ggcgcaataa ggcgcgagaa ggaaattacg 780
acggcacact ggggccagaa gagggcacta ggagcggcaa cccactggcg cggcacagcg 840
gtttggcgcg gggatcaaag caaaacccgg ctcatccaga gcaaacccga atcagccttc 900
agacggtcgt gcctaacaac acgccgttct accccgcctt ccttgcgtcc ctcgcctccc 960
ccgagcccaa gtcttccgcc cgctcctaac gccaaccaag caag 1004

SEQ ID NO: 27, length 590, DNA, Thraustochytrium sp.

atgcattcgc atggctccgc accaccacac accaccgccc ctcttctttc cttgctcact 60
cgatccatag ccacttacct gccccttccc tctaccactg ccacgtgcgg cgtatgagcg 120
cgcttgcacc cgcaaccttc tctctagttg ttcacaatta cacccgctat caatactcac 180
gcattcatct tccccttttt ttctacttta cgtaccggtg ctcacttact tacacctgcc 240
cgccttgttc attcattctt ctcgatgaca acggcaggct ctgcttgcgg cgcgcgcacg 300
catcccttac tccgccgcgc accgacaagc ctgcgcaaaa aacaaaaaaa acttatcttc 360
gctcgcggct ccgatgtcgc ggcggcgtac gagaccgcgc cgagttccgc ccgccatgcg 420
atcgagagtc tctctcgtag gagcgggacc gcgagcgacc tcggtgcctc cgatagccag 480
ctgggcttct agaccggctg ggggaccgcc cgcggcgtac ctctgcgctt cggtggccct 540
taaaaggctg atcgtggaaa aggtcgctct ccagtctgcg gtttagcggc 590

SEQ ID NO: 28, length 640, DNA, Thraustochytrium sp.

gcgccttcag gcaggctgat ccctactgtg ggggctctga cggacggccg gtctttgtac 60
gtaaacaggc gcttcttcgc ggcccgccga ggggggcggc aacgagccgg gtggcgtggc 120
acggacaagg caagagcctt tccatcccgc ataaagtgat gcaccatttt gaccttgttg 180
atcgtttttg tgtgtttaga gcggccccgt gcgggtaggc gaagtgcgct tctgagcaag 240
gaagagagag gtgcagcttc ttcttgatca gtgtggtaat cttcaacggc cacgctcgct 300
tattcgatac ctgtaaagct accggtgcac ccgtgcaagt tgggcaccac gtagttgtac 360
tggtgaatcc aaatgttagc cgctagcttg gtgccctttt cgacaggaag ggcttggtga 420
aaagccatgc tgtcgatctc ccttgggtcc tcgttcgtga cgctaggcca gagaatagct 480
gtgtgccgcg cagtcgaagc cagcgcgcgc gcgtcggggc cgagcataga gttagcaatt 540
cagttgtttc gggctcttga tgaggccgcc agagagcgaa gaaggatgaa cttaccagat 600
ccgcgctccg gtgtattggt gatgggcggc ttggtctccg 640

SEQ ID NO: 29, length 372, DNA, Artificial Sequence (Synthetic Construct)

atggccaagc tcacctcggc cgtcccggtg ctcaccgcgc gcgacgtcgc cggcgcggtc 60
gagttctgga ccgaccggct cgggttctcc cgggacttcg tggaggacga cttcgccggc 120
gtggtccggg acgacgtgac cctgttcatc agcgcggtcc aggaccaggt ggtgccggac 180
aacaccctgg cctgggtgtg ggtgcgcggc ctggacgagc tgtacgccga gtggtcggag 240
gtcgtgtcca cgaacttccg ggacgcctcc gggccggcca tgaccgagat cggcgagcag 300
ccgtgggggc gggagttcgc cctgcgcgac ccggccggca actgcgtgca cttcgtggcc 360
gaggagcagg ac 372

SEQ ID NO: 30, length 1017, DNA, Thraustochytrium sp.

gcgttgtttc ggcacgcgca attgccggga tgggaatgtg cattggtgca cgggattgct 60
gcgtcggccg ggccgtctcc caacatgaga cgcgctcggc aggaccgctt ccggttggca 120
taacgtcgtt ttttccctgc tgtcccagct cgcgctttcg agacgaagct atctgtacta 180
ccctctctac tcgcgatcac tcgctcgcaa ggaaaatgga agttaacgaa aggtcaatca 240
ttttttgcgc tctgcattca tttgctctct tcttgttgtt tgtggaacca aacggtcaga 300
cgcgtggatc gcttttgtta ggcactcggg aacggctgtc cctttaagca ctcaaccgaa 360
cagtcgggcc acttggtctg caacagcgag accaacttgg gtgcatggcg gcggctcatc 420
ttccactgcc actccatggg caggtcgtga aaggagagcc acagcgagta gcccgcctgc 480
tggcggctct gtccgcacga ctggcacaca gacgtcccgg cgtcgttctg gccaaagcac 540
atggtctgga agcgggtccg gtaacaagag gcgcaacgcc aaggctcgct cgaggccggc 600
tcgttgtccg cgttgagcgt caaaatcacg gggggcggca cgcccgcagg gcgctcgggc 660
cccgtgatcg acggatcgtc gatagcgagc acgacctcgt aacgccaggc gcgaaaatcg 720
ttgggcttgg cagccttgtc gtcaggatgc gcctgcacaa tgtcctcgac ctcgccaggc 780
actggtttga agtacgcctc cttgtgctcc tccggggccg cctctgcctc cttgtcgccg 840
tggatctcga gttgggccat gcggggaccg gccggcggca gaccgttaaa atgccagagg 900
tgcagcggtg gcttgtacga gtcgagggcg tggtcgcaga agagtagcgt aaaatggtca 960
ccgccgtgca ggatccatac agggtgcccc ggcgtcttga ggcagtcgtg tacagga 1017

* * * * *