Methods Of Making Surfactant And Cleaning Compositions Through Microbially Produced Branched Fatty Alcohols

RUDE; Mathew

Patent Application Summary

U.S. patent application number 14/508927 was filed with the patent office on 2017-06-29 for methods of making surfactant and cleaning compositions through microbially produced branched fatty alcohols. This patent application is currently assigned to LS9, INC. The applicant listed for this patent is Reg Life Sciences, LLC. Invention is credited to Mathew RUDE.

Application Number: 20170183695 14/508927
Document ID: /
Family ID: 44120899
Filed Date: 2017-06-29

United States Patent Application 20170183695
Kind Code A9
RUDE; Mathew June 29, 2017

METHODS OF MAKING SURFACTANT AND CLEANING COMPOSITIONS THROUGH MICROBIALLY PRODUCED BRANCHED FATTY ALCOHOLS

Abstract

The invention provides a surfactant and/or a cleaning composition comprising a microbially produced branched fatty alcohol or a derivative thereof. The invention also provides a household cleaning composition and a personal or pet care cleaning composition comprising a microbially produced branched fatty alcohol or a derivative thereof.


Inventors: RUDE; Mathew; (South San Francisco, CA)
Applicant: Reg Life Sciences, LLC (Ames, IA, US)
Assignee: LS9, INC. (South San Francisco, CA)

Prior Publication:
  Document Identifier Publication Date
US 20160097065 A1 April 7, 2016
Family ID: 44120899
Appl. No.: 14/508927
Filed: October 7, 2014

Related U.S. Patent Documents

Application Number Filing Date Patent Number
13026871 Feb 14, 2011 8859259
14508927
61304448 Feb 14, 2010
61324310 Apr 15, 2010

Current U.S. Class: 1/1
Current CPC Class: C11D 1/29 20130101; C11D 1/28 20130101; C11D 1/345 20130101; C11D 1/72 20130101; C11D 1/75 20130101; C07C 31/125 20130101; C11D 1/62 20130101; C11D 3/202 20130101; C11D 1/662 20130101; C12P 7/64 20130101; C12P 7/04 20130101; C07C 33/025 20130101
International Class: C12P 7/64 20060101 C12P007/64

Claims



1-31. (canceled)

32: A method of making a surfactant composition using branched chain fatty alcohols produced in a recombinant host cell, the method comprising, (a) providing a recombinant host cell genetically modified to comprise (i) a polynucleotide encoding a polypeptide comprising one or more subunits having branched chain alpha-keto acid dehydrogenase (BKD) activity (E.C. 1.2.4.4.) capable of catalyzing a conversion of an alpha-keto acid to a branched acyl-CoA, (ii) a polynucleotide encoding a polypeptide having beta-ketoacyl-ACP synthase (FabH) activity capable of catalyzing a condensation of a branched acyl-CoA and a malonyl-ACP to produce a branched acyl-ACP, and (iii) a polynucleotide encoding a polypeptide having fatty aldehyde biosynthesis activity capable of catalyzing a conversion of a branched fatty acyl-ACP into a branched fatty aldehyde; (b) culturing the recombinant host cell in the presence of a carbon source under conditions effective to express the polynucleotides and produce branched chain fatty alcohols that are secreted into the extracellular environment of the host cell; (c) collecting the branched chain fatty alcohols; and (d) blending the branched chain fatty alcohols to make a surfactant composition.

33: The method of claim 32, further comprising a polypeptide having fatty alcohol biosynthesis activity, wherein said polypeptide is an alcohol dehydrogenase (EC 1.1.1.1).

33. (canceled)

34: The method of claim 32, wherein said one or more subunits are selected from the group consisting of E1 alpha/beta (decarboxylase), E2 (dihydrolipoyl transacylase), and E3 (dihydrolipoyl dehydrogenase) subunits.

35: The method of claim 32, wherein said one or more subunits are selected from the group consisting of E1 alpha/beta (decarboxylase) and E2 (dihydrolipoyl transacylase) subunits.

36: The method of claim 32, wherein said polypeptide having fatty aldehyde biosynthesis activity is an acyl-ACP reductase (AAR).

37: The method of claim 36, wherein said AAR is an enzyme from Synechococcus elongatus.

38: The method of claim 32, wherein said polypeptide having fatty aldehyde biosynthesis activity is carboxylic acid reductase (CAR).

39: The method of claim 32, wherein the recombinant host cell is an E. coli host cell.

40: The method of claim 32, wherein the branched chain fatty alcohols include one or more of saturated or unsaturated C12, C14 and C16 fatty alcohols.
Description



CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of U.S. Pat. No. 8,859,259, filed Feb. 14, 2011, which claims the benefit of U.S. Provisional Patent Application No. 61/304,448, filed Feb. 14, 2010, and U.S. Provisional Patent Application No. 61/324,310, filed Apr. 15, 2010, the entire contents of which are hereby incorporated by reference.

BACKGROUND OF THE INVENTION

[0002] Fatty alcohols have many commercial uses. Worldwide annual sales of fatty alcohols and their derivatives are in excess of US$1 billion. Fatty alcohols are used in diverse industries. For example, they are used in the cosmetic and food industries as emulsifiers, emollients, and thickeners. Due to their amphiphilic nature, fatty alcohols can be formulated or be used per se as nonionic surfactants, which are useful in personal care and household products, for example, in detergents. In addition, fatty alcohols are used in waxes, gums, resins, pharmaceutical salves and lotions, lubricating oil additives, textile antistatic and finishing agents, plasticizers, cosmetics, industrial solvents, and solvents for fats.

[0003] One major use for fatty alcohols is in cleaning compositions. In addition, fatty alcohols find applicability as surfactants, which are, for example, capable of enhancing oil recovery and/or engine performance. Conventional surfactants comprise molecules having at least one water-solubilizing substituent or moiety (e.g., hydrophilic group) and at least one oleophilic substituent or moiety (e.g., hydrophobic group). Examples of hydrophilic groups include, without limitation, carboxylate, sulfate, sulfonate, amine oxide, or polyoxyethylene. Examples of the hydrophobic groups include, without limitation, alkyl, alkenyl, or alkaryl hydrophobes, which typically contain about 10 to about 20 carbon atoms.

[0004] Surfactants are typically regarded as the major force behind cleaning products' ability to break up stains, solubilize dirt and soil, and/or prevent their redeposition to surfaces. As such, surfactants are also referred to as wetting agents and foamers, which lower the surface tension of the medium in which they are dissolved. Capable of lowering the interfacial tension between two media or interfaces (e.g., air/water, water/oil, or oil/solid interfaces), surfactants play a key role, and are often the most important component in detergents. Conventional detergent compositions contain mixtures of various surfactants in order to remove different types of soils and stains from surfaces.

[0005] The earliest sources of hydrophobe groups were natural fats and oils, which were converted into soaps (e.g., carboxylate hydrophile) using base via saponification processes. Coconut and palm oils are to this day used to manufacture soaps and alkylsulfate surfactants. As edible oils have become scarcer, it has become increasingly prevalent to manufacture detergents from petrochemicals, using processes such as the Ziegler process to convert petroleum-derived ethylene to fatty alcohols. For example, ethylene has been converted into alkylbenzene sulfonate surfactants, which are commonly found in today's detergents and cleaning compositions.

[0006] Fatty alcohols can also serve as starting materials in the preparation of surfactants and of other cleaning composition ingredients including, for example, alkyl sulfates, fatty ether sulfates, fatty alcohol sulfates, fatty phosphate esters, alkylbenzyl dimethylammonium salts, fatty amine oxides, alkyl polyglucosides, and alkyl glyceryl ether sulfonates. Among these, alkyl sulfates are commonly known due to the ease of their manufacture as well as their improved solubility and surfactant characteristics over traditional soap-based surfactants. However, long-chain alkyl surfactants have less than optimal performance as surfactants or as component(s) of detergents at low temperatures (e.g., about 50 °C or lower, about 30 °C or lower).

[0007] While there have been isolated reports that branching, especially towards the middle part of the long-chain alkyl, can reduce solubility of the surfactant, others have described that, in commercial practices, branching in fatty alcohols is highly desirable. See, e.g., R. G. Laughlin, "The Aqueous Phase Behavior of Surfactants," Academic Press, N.Y., (1994), at page 347; but see, Finger et al., Detergent alcohols--the effect of alcohol structure and molecular weight on surfactant properties, J. Amer. Oil Chemists' Society, Vol. 44:525 (1967); Technical Bulletin, Shell Chemical Co., SC:164-80. In addition, K. R. Wormuth, et al., Langmuir, vol 7 (1991):2048-2053, describes the technical advantages observed with a number of branched alkyl sulfates, especially with the "branched Guerbet" type, derived from the highly branched "Exxal" alcohols (Exxon). Phase studies have established a lipophilic ranking (i.e., a hydrophobicity ranking) of highly branched/double tail > methyl branched > linear. Furthermore, patents and applications, including, for example, U.S. Pat. No. 6,008,181, indicate that certain branched or multi-branched fatty alcohol derivatives exhibit improved cleaning capacity, especially at lower temperatures.

[0008] Branched fatty alcohols and various precursors are known to have additional preferred properties such as considerably lower melting points, which can in turn confer lower pour points when made into industrial chemicals, as compared to linear alcohols of comparable molecular weights. They are also known to confer substantially lower volatility and vapor pressure, and improved stability against oxidation and rancidity than their linear counterparts. These additional preferred properties, in addition to making branched materials desirable surfactants, make them particularly suited as components or feedstocks for cosmetic and pharmaceutical applications, as components of plasticizers for making synthetic resins, as solvents for solutions for printing ink and specialty inks, or as industrial lubricants.

[0009] Those added preferred properties can be alternatively obtained from unsaturated fatty alcohols and precursors. But unsaturation promotes oxidation, leading to short shelf lives and corrosion. Thus desirable properties, e.g., lower melting points, pour points, volatility, and vapor pressure and improved oxidative stability, are better achieved via branching.

[0010] Obtaining branched materials from crude petroleum requires a significant financial investment and consumes a great deal of energy. It is also an inefficient process because frequently it is necessary to crack the long chain hydrocarbons in crude petroleum to produce smaller monomers, which only then become useful as raw materials for manufacturing complex specialty chemicals. Furthermore, it is commonplace in the petrochemical industry to obtain branched chemicals, such as branched alcohols and aldehydes, by isomerization of straight-chain hydrocarbons. Expensive catalysts are typically required for isomerization, thus increasing manufacturing cost. The catalysts often then become undesirable contaminants that must be removed from the finished products, adding yet further cost to the processes.

[0011] Obtaining specialty chemicals such as branched alcohols or derivatives from crude petroleum also drains the dwindling resource of petroleum, in addition to the cost and problems associated with exploring, extracting, transporting, and refining. One estimate of world petroleum consumption is 30 billion barrels per year. By some estimates, it is predicted that at current production levels, the world's petroleum reserves could be depleted before 2050.

[0012] Finally, processing and manufacturing of surfactants and/or detergents from petroleum inevitably releases greenhouse gases (e.g., in the form of carbon dioxide) and other forms of air pollution (e.g., carbon monoxide, sulfur dioxide, etc.). The accumulation of greenhouse gases in the atmosphere can lead to increased global warming, causing local pollution and spillage as well as global environmental detriments.

[0013] Thus, although it is possible to obtain branched fatty alcohols and derivatives from natural oils and petroleum, it would be desirable to produce these branched materials from other sources, such as directly from biomass.

SUMMARY OF THE INVENTION

[0014] The invention provides a surfactant composition and a cleaning composition comprising one or more microbially produced branched fatty alcohols, branched fatty alcohol precursors, or branched fatty alcohol derivatives thereof.

[0015] The invention provides a surfactant composition comprising about 0.001 wt. % to about 100 wt. % of one or more microbially produced branched fatty alcohols or branched alcohol derivatives thereof.

[0016] The invention also provides a liquid cleaning composition comprising (a) about 0.1 wt. % to about 50 wt. % of one or more microbially produced branched fatty alcohols or derivatives thereof, or about 0.1 wt. % to about 50 wt. % of a surfactant comprising one or more microbially produced branched fatty alcohols or derivatives thereof, (b) about 1 wt. % to about 30 wt. % of one or more co-surfactants, (c) about 0 wt. % to about 10 wt. % of one or more detergency builders, (d) 0 wt. % to about 2 wt. % of one or more enzymes, (e) about 0 wt. % to about 15 wt. % of one or more chelating agents, (f) about 0 wt. % to about 20 wt. % of one or more hydrotropes, (g) about 0 wt. % to about 1.0 wt. % of one or more organic sequestering agents, and (h) about 0.1 wt. % to about 98 wt. % of a solvent system. In some embodiments, the liquid cleaning composition further comprises one or more suitable adjuncts.
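The component ranges recited for the liquid cleaning composition can be checked mechanically against a candidate formulation. The following Python sketch is offered only as an illustration: the dictionary keys, the function names, and the example formulation are hypothetical and do not come from the specification. It also simplifies in two ways: it treats the recited limits as nominal bounds without applying the tolerance implied by the term "about," and it does not model optional adjuncts.

```python
# Nominal wt. % ranges (low, high) for the liquid cleaning composition,
# components (a) through (h). Key names are illustrative assumptions.
LIQUID_RANGES = {
    "branched_fatty_alcohol": (0.1, 50.0),      # (a)
    "co_surfactant": (1.0, 30.0),               # (b)
    "detergency_builder": (0.0, 10.0),          # (c)
    "enzyme": (0.0, 2.0),                       # (d)
    "chelating_agent": (0.0, 15.0),             # (e)
    "hydrotrope": (0.0, 20.0),                  # (f)
    "organic_sequestering_agent": (0.0, 1.0),   # (g)
    "solvent_system": (0.1, 98.0),              # (h)
}


def out_of_range(formulation, ranges=LIQUID_RANGES):
    """Return {component: wt_percent} for every component outside its range.

    Components missing from the formulation are treated as 0 wt. %.
    """
    problems = {}
    for name, (low, high) in ranges.items():
        wt_percent = formulation.get(name, 0.0)
        if not (low <= wt_percent <= high):
            problems[name] = wt_percent
    return problems


def total_wt_percent(formulation):
    """Sum of all component weight percentages."""
    return sum(formulation.values())


# Hypothetical formulation, for illustration only (not from the specification).
example = {
    "branched_fatty_alcohol": 12.0,
    "co_surfactant": 8.0,
    "detergency_builder": 3.0,
    "enzyme": 0.5,
    "chelating_agent": 1.0,
    "hydrotrope": 4.0,
    "organic_sequestering_agent": 0.5,
    "solvent_system": 71.0,
}

print(out_of_range(example))      # components outside their nominal ranges
print(total_wt_percent(example))  # should total 100 wt. % for a full recipe
```

A formulation that raises the co-surfactant level to 40 wt. %, for instance, would be reported by `out_of_range` because it exceeds the nominal 30 wt. % upper limit for component (b).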

[0017] The invention further provides a solid cleaning composition comprising (a) about 0.1 wt. % to about 50 wt. % of one or more microbially produced branched fatty alcohols or derivatives thereof, or about 0.1 wt. % to about 50 wt. % of a surfactant comprising one or more microbially produced branched fatty alcohols or derivatives thereof, (b) about 1 wt. % to about 30 wt. % of one or more co-surfactants, (c) about 1 wt. % to about 60 wt. % of one or more detergency builders, (d) about 0 wt. % to about 2 wt. % of one or more enzymes, (e) about 0 wt. % to about 20 wt. % of one or more hydrotropes, (f) about 10 wt. % to about 35 wt. % of one or more filler salts, (g) about 0 wt. % to about 15 wt. % of one or more chelating agents, and (h) about 0.01 wt. % to about 1 wt. % of one or more organic sequestering agents. In certain embodiments, the solid cleaning composition further comprises one or more suitable adjuncts.

[0018] In particular embodiments, the invention pertains to a household cleaning composition comprising (a) about 0.1 wt. % to about 50 wt. % of one or more microbially produced branched fatty alcohols and/or derivatives thereof, or about 0.1 wt. % to about 50 wt. % of a surfactant comprising one or more microbially produced fatty alcohols and/or derivatives thereof; (b) about 1 wt. % to about 30 wt. % of one or more co-surfactants; (c) about 0 wt. % to about 30 wt. % of one or more detergency builders; (d) about 0 wt. % to about 2.0 wt. % of one or more suitable detersive enzymes; (e) about 0 wt. % to about 15 wt. % of one or more chelating agents; (f) about 0 wt. % to about 20 wt. % of one or more hydrotropes; (g) about 0 to about 15 wt. % of one or more rheology modifiers; (h) about 0 wt. % to about 1.0 wt. % of one or more organic sequestering agents; and (i) various other adjuncts such as, for example, one or more of bleaching agents, additional enzymes, suds suppressors, dispersants, lime-soap dispersants, soil suspension and anti-redeposition agents, and corrosion inhibitors. In an exemplary embodiment, a laundry composition can also comprise softening agents, fragrances, bleach systems, dyes or colorants, preservatives, germicides, fungicides, fabric care benefit agents, gelling agents, antideposition agents, and other detersive adjuncts.

[0019] Such a household cleaning composition can be a liquid, which further comprises water and/or a suitable aqueous carrier or solvent. Liquid compositions can be in a "concentrated" form, the density of which can range from, for example, about 400 to about 1,200 g/L, when measured at 20 °C. For example, the water content of a typical concentrated liquid detergent is less than about 40 wt. %, or less than about 30 wt. %. Alternatively, a household cleaning composition can be a solid, for example, in the form of a tablet, a bar, a powder or a granule. Granular compositions can also be in a "compact" form, which is best reflected by density and, in terms of composition, by the amount of inorganic filler salt. Inorganic filler salts are conventional ingredients of solid cleaning compositions, present in substantial amounts, varying from, for example, about 10 wt. % to about 35 wt. %. Suitable filler salts include, for example, alkali and alkaline-earth metal salts of sulfates and chlorides. An exemplary filler salt is sodium sulfate.

[0020] In another embodiment, the invention provides a personal or beauty care cleaning or treatment composition comprising (a) about 0.1 wt. % to about 50 wt. % of one or more microbially produced branched fatty alcohols and/or derivatives thereof, or about 0.1 wt. % to about 50 wt. % of a surfactant comprising one or more microbially produced branched fatty alcohols and/or derivatives thereof; (b) about 0.001 wt. % to about 30 wt. % of one or more co-surfactants; (c) about 0 wt. % to about 30 wt. % of one or more detergency builders; (d) about 0 wt. % to about 2.0 wt. % of one or more suitable detersive enzymes; (e) about 0 wt. % to about 15 wt. % of one or more chelating agents; (f) about 0 wt. % to about 20 wt. % of one or more hydrotropes; (g) about 0 to about 15 wt. % of one or more rheology modifiers; (h) about 0 wt. % to about 1.0 wt. % of one or more organic sequestering agents; and (i) various other adjuncts such as, for example, one or more of conditioner, silicone, fragrances, silica particles, cationic cellulose or guar polymers, silicone microemulsion stabilizers, fatty amphiphiles, germicides, fungicides, anti-dandruff agents, pearlescent agents, foam boosters, pediculocides, pH adjusting agents, UV absorbers, sunscreens, skin active agents, vitamins, minerals, herbal/fruit/food extracts, sphingolipids, sensory indicators, suspension agents, and mixtures thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] FIG. 1A and FIG. 1B are schematics of two exemplary alternative pathways for producing branched fatty alcohols using recombinant microbial host cells.

[0022] FIG. 2A lists representative homologs of BKD E1 alpha subunit, their amino acid sequences and polynucleotide sequences, as well as amino acid sequence motifs of suitable BKD E1 alpha subunit homologs and variants. FIG. 2B lists representative homologs of BKD E1 beta subunit, their amino acid sequences and polynucleotide sequences, as well as amino acid sequence motifs of suitable BKD E1 beta subunit homologs and variants. FIG. 2C lists representative homologs of BKD E2 subunit, their amino acid sequences and polypeptide sequences, as well as amino acid sequence motifs of suitable BKD E2 subunit homologs and variants. FIG. 2D lists representative homologs of BKD E3 subunit homologs and variants, as well as amino acid sequence motifs of suitable BKD E3 subunit homologs and variants. FIG. 2E lists representative homologs of beta ketoacyl-ACP synthase homologs, their amino acid sequences and polynucleotide sequences, as well as amino acid sequences of suitable beta keto-acyl-ACP synthase homologs and variants.

[0023] FIG. 3A is a table of BKD E1 alpha subunit homologs. FIG. 3B is a table of BKD E1 beta subunit homologs. FIG. 3C is a table of BKD E2 subunit homologs. FIG. 3D is a table of BKD E3 subunit homologs. FIG. 3E is a table of beta ketoacyl-ACP synthase homologs. These tables also present % identity in reference to the sequences of various organisms. For example, "ID % Pp" indicates that the identities listed in the column below are in reference to a P. putida gene encoding that subunit. "ID % Bs" refers to the identity to a B. subtilis gene encoding that subunit. "ID % Sc" and "ID % Sc2" refer to identity to a first and a second S. coelicolor gene encoding that subunit, respectively. "ID % Sa" and "ID % Sa2" refer to identity to a first and a second S. avermitilis gene encoding that subunit, respectively.

[0024] FIG. 4A depicts a GC/MS trace of branched fatty alcohol production of strain MG1655_ΔtonA AAR:kan transformed with a pGL10 vector containing P. putida Pput1450, Pput1451, Pput1452 and Pput1453 inserts, and with B. subtilis fabH1. The figure indicates the production of iso-C14:0, iso-C15:0, anteiso-C15:0, iso-C16:0, iso-C17:0 and anteiso-C17:0 branched fatty alcohols. FIG. 4B depicts the production of branched fatty acyl-CoA precursors by feeding branched substrates isobutyrate and isovalerate to an engineered E. coli strain comprising the pDG10 and OP-180 plasmids, the latter of which contained tesA under the control of a Ptrc promoter.

[0025] FIG. 5 is a representative calibration curve obtained by linear regression, which was used in the semi-quantitative measurement of the amount of branched fatty alcohol yield relative to the amount of straight-chain fatty alcohol yield.

[0026] FIG. 6A is a listing of nucleotide sequence of the pDG2 plasmid. FIG. 6B depicts a map of the pDG6 plasmid. FIG. 6C is a listing of nucleotide sequence of the pDG6 plasmid, constructed by inserting B. subtilis fabH1 into pDG2, comprising E. coli PfabH1 (promoter) and B. subtilis fabH1. The B. subtilis fabH1 insert is in upper case italic letters. FIG. 6D depicts a map of the pDG7 plasmid. FIG. 6E is a listing of nucleotide sequence of the pDG7 plasmid, constructed by inserting a B. subtilis fabH2 into pDG2, comprising E. coli PfabH1 (promoter) and B. subtilis fabH2. FIG. 6F depicts a map of the pDG8 plasmid. FIG. 6G is a listing of nucleotide sequence of pDG8 plasmid, constructed by inserting S. coelicolor fabH into pDG2, comprising E. coli PfabH1 (promoter) and S. coelicolor fabH. FIG. 6H is a plasmid map of the pDG10 plasmid. FIG. 6I is a listing of nucleotide sequence of the pDG10 plasmid, comprising a C. acetobutylicum ptb_buk insert. FIG. 6J is a listing of nucleotide sequence of the pLS9-111 plasmid. FIG. 6K is a listing of nucleotide sequence of the pLS9-114 plasmid. FIG. 6L is a listing of nucleotide sequence of the pLS9-115 plasmid.

[0027] FIG. 7 is a listing of nucleotide sequence of the pKZ4 plasmid having a pGL10.173B vector backbone and a polynucleotide insert encoding a BKD complex from Pseudomonas putida. The P. putida genes encoding a BKD complex are shown in lower case italic letters.

[0028] FIG. 8 is a listing of nucleotide sequence of the pGL10.173B vector backbone, which contains the BamHI and EcoRI sites into which the Pseudomonas putida bkd genes (operon) were inserted. The BamHI and EcoRI restriction sites are marked.

[0029] FIG. 9 is a listing of additional nucleotide and amino acid sequences of the disclosure.

DETAILED DESCRIPTION OF THE INVENTION

[0030] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein, including GenBank database sequences, are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

[0031] Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.

DEFINITIONS

[0032] Throughout the specification, a reference may be made using an abbreviated gene name or polypeptide name, but it is understood that such an abbreviated gene or polypeptide name represents the genus of genes or polypeptides. Such gene names include all genes encoding the same polypeptide and homologous polypeptides having the same physiological function. Polypeptide names include all polypeptides and homologous polypeptides that have the same activity (e.g., that catalyze the same fundamental chemical reaction).

[0033] Unless otherwise indicated, the accession numbers referenced herein are derived from the NCBI database (National Center for Biotechnology Information) maintained by the National Institutes of Health, U.S.A. Unless otherwise indicated, the accession numbers are as provided in the database as of December 2009.

[0034] EC numbers are established by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) (available at http://www.chem.qmul.ac.uk/iubmb/enzyme/). The EC numbers referenced herein are derived from the KEGG Ligand database, maintained by the Kyoto Encyclopedia of Genes and Genomes, sponsored in part by the University of Tokyo. Unless otherwise indicated, the EC numbers are as provided in the database as of October 2008.

[0035] The articles "a" and "an" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element.

[0036] The term "about" is used herein to mean a value ±20% of a given numerical value. Thus, "about 60%" means a value of between 60 ± (20% of 60) (i.e., between 48 and 72).
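The arithmetic behind this definition can be written out directly. The following Python sketch is illustrative only; the function names `about_range` and `is_about` are hypothetical helpers rather than terms of the specification. Because 20% of 60 is 12, "about 60" spans 60 - 12 to 60 + 12, that is, 48 to 72.

```python
def about_range(value, tolerance=0.20):
    """Return the (low, high) interval covered by "about <value>".

    The interval is value +/- (tolerance * value), with tolerance = 20%.
    """
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


def is_about(candidate, value, tolerance=0.20):
    """Return True if candidate lies within "about <value>" (inclusive)."""
    low, high = about_range(value, tolerance)
    return low <= candidate <= high


low, high = about_range(60)
print(low, high)         # bounds of "about 60"
print(is_about(70, 60))  # 70 lies inside the interval
print(is_about(75, 60))  # 75 lies outside the interval
```

The same helper applies to any recited limit; for example, an upper limit of "about 50 wt. %" would extend to 60 wt. % under this definition.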

[0037] The term "alkyl" is used herein to mean a straight chain or a branched chain hydrocarbon residue having from about 6 carbon atoms to about 26 carbon atoms and in the context of the present specification is used interchangeably with the term "fatty."

[0038] As used herein, the term "alcohol dehydrogenase" (EC 1.1.1.*) refers to a polypeptide capable of catalyzing the conversion of a fatty aldehyde to an alcohol (e.g., fatty alcohol). In certain embodiments, these enzymes can also be referred to as fatty aldehyde reductases, oxidoreductases, or aldo-keto reductases. Additionally, one of ordinary skill in the art will appreciate that some alcohol dehydrogenases will catalyze other reactions as well. For example, some alcohol dehydrogenases will accept other substrates in addition to fatty aldehydes. Such non-specific alcohol dehydrogenases are, therefore, also included in this definition. Nucleic acid sequences encoding alcohol dehydrogenases are known in the art, and such alcohol dehydrogenases are publicly available. Exemplary GenBank Accession Numbers are provided in Table 8 herein.

[0039] As used herein, the term "attenuate" means to weaken, reduce, or diminish. For example, a polypeptide can be attenuated by modifying the polypeptide to reduce its activity (e.g., by modifying a nucleotide sequence that encodes the polypeptide) or its expression level.

[0040] As used herein, the term "biomass" refers to any biological material from which a carbon source is derived. In some instances, a biomass is processed into a carbon source, which is suitable for bioconversion. In other instances, the biomass may not require further processing into a carbon source. The carbon source can be converted into a fatty alcohol. One exemplary source of biomass is plant matter or vegetation. For example, corn, sugar cane, or switchgrass can be used as biomass. Another non-limiting example of biomass is metabolic wastes, such as animal matter, for example cow manure. In addition, biomass may include algae and other marine plants. Biomass also includes waste products from industry, agriculture, forestry, and households. Examples of such waste products that can be used as biomass are fermentation waste, ensilage, straw, lumber, sewage, garbage, cellulosic urban waste, and food leftovers. Biomass also includes carbon sources such as carbohydrates (e.g., monosaccharides, disaccharides, or polysaccharides).

[0041] As used herein, the phrase "carbon source" refers to a substrate or compound suitable to be used as a source of carbon for prokaryotic or simple eukaryotic cell growth. Carbon sources can be in various forms, including, but not limited to polymers, carbohydrates, acids, alcohols, aldehydes, ketones, amino acids, peptides, and gases (e.g., CO and CO2). These include, for example, various monosaccharides, such as glucose, fructose, mannose, and galactose; oligosaccharides, such as fructo-oligosaccharide and galacto-oligosaccharide; polysaccharides such as xylose and arabinose; disaccharides, such as sucrose, maltose, and turanose; cellulosic material, such as methyl cellulose and sodium carboxymethyl cellulose; saturated or unsaturated fatty acid esters, such as succinate, lactate, and acetate; alcohols, such as ethanol, methanol, and glycerol, or mixtures thereof. The carbon source can also be a product of photosynthesis, including, but not limited to, glucose. A preferred carbon source is biomass. Another preferred carbon source is glucose.

[0042] A nucleotide sequence is "complementary" to another nucleotide sequence if each of the bases of the two sequences matches (i.e., is capable of forming Watson-Crick base pairs). The term "complementary strand" is used herein interchangeably with the term "complement". The complement of a nucleic acid strand can be the complement of a coding strand or the complement of a non-coding strand.
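The complementarity test described in this definition can be sketched in a few lines of Python. The sketch rests on stated assumptions that are not part of the specification: both strands are DNA written in the 5' to 3' direction (so one strand is compared against the reverse complement of the other), only the unambiguous bases A, C, G, and T are handled, and the function names are hypothetical.

```python
# Watson-Crick pairing partners for DNA bases.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(seq):
    """Return the complementary strand of seq, written 5'->3'.

    This is the reverse complement: the sequence is read backwards and
    each base is replaced by its Watson-Crick partner.
    """
    return "".join(WATSON_CRICK[base] for base in reversed(seq.upper()))


def is_complementary(seq_a, seq_b):
    """Return True if every base of seq_a pairs with the opposing base of seq_b."""
    return complement_strand(seq_a) == seq_b.upper()


print(complement_strand("ATGC"))         # complementary strand of ATGC
print(is_complementary("ATGC", "GCAT"))  # every base pairs
print(is_complementary("ATGC", "ATGC"))  # bases do not pair
```

Under these assumptions, two sequences of unequal length are never reported as complementary, because the comparison requires every base of each strand to have a pairing partner.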

[0043] As used herein, the term "conditions sufficient to allow expression" means any conditions that allow a host cell to produce a desired product, such as a polypeptide or fatty alcohol described herein. Suitable conditions include, for example, fermentation conditions. Fermentation conditions can comprise many parameters, such as temperature ranges, levels of aeration, and media composition. Each of these conditions, individually and in combination, allows the host cell to grow. Exemplary culture media include broths or gels. Generally, the medium includes a carbon source, such as glucose, fructose, cellulose, or the like, that can be metabolized by a host cell directly. In addition, enzymes can be used in the medium to facilitate the mobilization (e.g., the depolymerization of starch or cellulose to fermentable sugars) and subsequent metabolism of the carbon source.

[0044] To determine if conditions are sufficient to allow expression, a host cell can be cultured, for example, for about 4, 8, 12, 24, 36, or 48 hours. During and/or after culturing, samples can be obtained and analyzed to determine if the conditions allow expression. For example, the host cells in the sample or the medium in which the host cells were grown can be tested for the presence of a desired product. When testing for the presence of a product, assays, such as TLC, HPLC, GC/FID, GC/MS, LC/MS, and MS, can be used.

[0045] It is understood that the polypeptides described herein may have additional conservative or non-essential amino acid substitutions, which do not have a substantial effect on the polypeptide functions. Whether or not a particular substitution will be tolerated (i.e., will not adversely affect desired biological properties, such as carboxylic acid reductase activity) can be determined as described in Bowie et al., Science, 247: 1306-1310 (1990). A "conservative amino acid substitution" refers to the replacement of one amino acid residue with another amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine), and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine).
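The side-chain families listed above lend themselves to a simple lookup. The following Python sketch is illustrative only: it encodes the six families exactly as listed, using standard one-letter amino acid codes, and treats a substitution as conservative when the two residues share at least one family. The function names are hypothetical, and the check says nothing about whether a given substitution is actually tolerated by a particular polypeptide, which is a separate question.

```python
# Side-chain families as listed in the definition, in one-letter codes.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),             # lysine, arginine, histidine
    "acidic": set("DE"),             # aspartic acid, glutamic acid
    "uncharged_polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta_branched": set("TVI"),     # threonine, valine, isoleucine
    "aromatic": set("YFWH"),
}


def shared_families(res_a, res_b):
    """Return the sorted names of the families containing both residues."""
    a, b = res_a.upper(), res_b.upper()
    return sorted(
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if a in members and b in members
    )


def is_conservative(res_a, res_b):
    """True if two different residues share at least one side-chain family."""
    return res_a.upper() != res_b.upper() and bool(shared_families(res_a, res_b))


print(is_conservative("K", "R"))   # lysine -> arginine, both basic
print(is_conservative("D", "K"))   # aspartic acid -> lysine, no shared family
print(shared_families("V", "I"))   # valine and isoleucine share two families
```

Because some residues belong to more than one family (for example, valine is both nonpolar and beta-branched), the lookup returns every shared family rather than a single label.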

[0046] As used herein, "control element" means a transcriptional and/or a translational control element. Control elements include promoters and enhancers, such as ribosome binding sequences. The term "promoter element," "promoter," or "promoter sequence" refers to a DNA sequence that functions as a switch that activates the expression of a gene. If the gene is activated, it is said to be transcribed or participating in transcription. Transcription involves the synthesis of mRNA from the gene. A promoter, therefore, serves as a transcriptional regulatory element and also provides a site for initiation of transcription of the gene into mRNA. Control elements interact specifically with cellular proteins involved in transcription (Maniatis et al., Science, 236: 1237 (1987)).

[0047] As used herein, the term "detergent" refers broadly to agents and materials that are useful in cleaning applications or as cleaning aids. This term is thus used interchangeably with the term "cleaning composition." The term encompasses materials and agents that comprise various surfactants at various percentages by weight or by volume, as well as suitable additives, and are capable of emulsifying stains in a cleaning matrix. A detergent can take the physical form of, for example, a liquid, a paste, a gel, a bar, a powder, a tablet, or a granule. Granular compositions can also be in "compact" form, whereas liquid compositions can be in "concentrate" form.

[0048] As used herein, detergent compositions include articles and compositions of cleaning and/or treatment. As used herein, the term "cleaning and/or treatment composition" includes, unless otherwise indicated, tablet, granular, or powder-form all-purpose or "heavy duty" washing agents, especially laundry detergents; liquid, gel, or paste-form all-purpose washing agents, especially the so-called heavy-duty liquid types; liquid fine-fabric detergents; hand dishwashing agents, or light duty dishwashing agents, especially those of the high-foaming type; and machine dishwashing agents, including the various tablet, granular, liquid and rinse-aid types for household and institutional use. The compositions can also be in unit dose packages, including those known in the art and those that are water soluble, water insoluble and/or water permeable.

[0049] As used herein, detergent compositions also include personal or beauty care products in the form of skin and hair care compositions including, for example, conditioning treatments, cleansing products, such as hair and/or scalp shampoos, body washes, hand cleaners, water-less hand sanitizers/cleansers, facial cleansers, and the like.

[0050] As used herein, the term "fatty acid" means a carboxylic acid having the formula RCOOH. R represents an aliphatic group, preferably an alkyl group. R can comprise about 4 or more carbon atoms. In some embodiments, the fatty acid comprises between about 4 and about 22 carbon atoms. Fatty acids can be saturated, monounsaturated, or polyunsaturated. In addition, fatty acids can comprise a straight or branched chain. The branched chains may have one or more points of branching. In addition, the branched chains may include cyclic branches. In a preferred embodiment, the fatty acid is made from a fatty acid biosynthetic pathway.

[0051] As used herein, the term "fatty acid biosynthetic pathway" means a biosynthetic pathway that produces fatty acids. The fatty acid biosynthetic pathway includes fatty acid enzymes that can be engineered, as described herein, to produce fatty acids, and in some embodiments can be expressed with additional enzymes to produce fatty acids having desired carbon chain characteristics.

[0052] As used herein, the term "fatty acid derivative" means products made in part from the fatty acid biosynthetic pathway of the production host organism. "Fatty acid derivative" also includes products made in part from acyl-ACP or acyl-ACP derivatives. The fatty acid biosynthetic pathway includes fatty acid synthase enzymes which can be engineered as described herein to produce fatty acid derivatives, and in some examples can be expressed with additional enzymes to produce fatty acid derivatives having desired carbon chain characteristics. Exemplary fatty acid derivatives include, for example, fatty acids, acyl-CoA, fatty aldehydes, short and long chain alcohols, hydrocarbons, fatty alcohols, and esters (e.g., waxes, fatty acid esters, or fatty esters), although due to their separate and industrial utilities and depending on the sources from which they derive, hydrocarbons can sometimes be grouped into a separate "hydrocarbon" category.

[0053] As used herein, the term "fatty acid derivative enzyme" means any enzyme that may be expressed or overexpressed in the production of fatty acid derivatives. These enzymes may be part of the fatty acid biosynthetic pathway. Non-limiting examples of fatty acid derivative enzymes include fatty acid synthases, thioesterases, acyl-CoA synthases, acyl-CoA reductases, alcohol dehydrogenases, alcohol acyltransferases, fatty alcohol-forming acyl-CoA reductases, carboxylic acid reductases (e.g., fatty acid reductases), acyl-ACP reductases, fatty acid hydroxylases, acyl-CoA desaturases, acyl-ACP desaturases, acyl-CoA oxidases, acyl-CoA dehydrogenases, ester synthases, and/or alkane biosynthetic polypeptides, etc. Fatty acid derivative enzymes can convert a substrate into a fatty acid derivative. In some examples, the substrate may be a fatty acid derivative that the fatty acid derivative enzyme converts into a different fatty acid derivative.

[0054] As used herein, "fatty acid enzyme" means any enzyme involved in fatty acid biosynthesis. Fatty acid enzymes can be expressed or overexpressed in host cells to produce fatty acids. Non-limiting examples of fatty acid enzymes include fatty acid synthases and thioesterases.

[0055] As used herein, "fatty aldehyde" means an aldehyde having the formula RCHO characterized by an unsaturated carbonyl group (C=O). In a preferred embodiment, the fatty aldehyde is any aldehyde made from a fatty acid or fatty acid derivative. In one embodiment, the R group is at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or 26 carbons in length, or is a value between any two of the foregoing values.

[0056] R can be straight or branched chain. The branched chains may have one or more points of branching. In addition, the branched chains may include cyclic branches.

[0057] Furthermore, R can be saturated or unsaturated. If unsaturated, the R can have one or more points of unsaturation.

[0058] In one embodiment, the fatty aldehyde is produced biosynthetically.

[0059] Fatty aldehydes have many uses. For example, fatty aldehydes can be used to produce many specialty chemicals. For example, fatty aldehydes are used to produce polymers, resins, dyes, flavorings, plasticizers, perfumes, pharmaceuticals, and other chemicals. Some are used as solvents, preservatives, or disinfectants. Some natural and synthetic compounds, such as vitamins and hormones, are aldehydes.

[0060] The terms "fatty aldehyde biosynthetic polypeptide", "carboxylic acid reductase", and "CAR" are used interchangeably herein.

[0061] As used herein, "fatty alcohol" means an alcohol having the formula ROH. In a preferred embodiment, the fatty alcohol is any alcohol made from a fatty acid or fatty acid derivative. In one embodiment, the R group is at least about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or 26 carbons in length, or is a value between any two of the foregoing values. Typically, the fatty alcohol comprises an R group that is 6 to 26 carbons in length. Preferably, the fatty alcohol comprises an R group that is 8, 10, 12, 14, 16, or 18 carbons in length.

[0062] R can be straight or branched chain. The branched chains may have one or more points of branching. In addition, the branched chains may include cyclic branches. In a particular embodiment, the fatty alcohol of the present invention comprises one or more points of branching.

[0063] Furthermore, R can be saturated or unsaturated. If unsaturated, the R can have one or more points of unsaturation.

[0064] In one embodiment, the branched fatty alcohol is produced biosynthetically.

[0065] Fatty alcohols have many uses. For example, fatty alcohols can be used to produce various specialty chemicals. As such, fatty alcohols are used as a biofuel; as solvents for fats, waxes, gums, and resins; in pharmaceutical salves, emollients, and lotions; as lubricating-oil additives; in detergents and emulsifiers; as textile antistatic and finishing agents; as plasticizers; as nonionic surfactants; in cosmetics, e.g., as thickeners.

[0066] The term "fatty alcohol derivative" refers to a compound derived from a fatty alcohol. The fatty alcohol derivative can include the oxygen atom derived from the fatty alcohol, or, in some embodiments, does not include the aforesaid oxygen atom, in, for example, fatty amine oxides. For example, a fatty amide, which also can be referred to as an alkyl amide, refers to a compound comprising an amide group and a hydrocarbon residue having about 6 carbon atoms or more, wherein the hydrocarbon residue is bonded to the carbonyl group of the amide group or to the nitrogen atom of the amide group. In some embodiments, the hydrocarbon residue of the fatty alcohol is bonded to the carbonyl group of the amide group or to the nitrogen atom of the amide group. In some embodiments, the hydrocarbon residue is saturated. In other embodiments, the hydrocarbon residue is monounsaturated. In further embodiments, the hydrocarbon residue is polyunsaturated. In certain other embodiments, the hydrocarbon residue can be a straight-chain residue. In certain further embodiments, the hydrocarbon residue can contain one or more points of branching.

[0067] Branched fatty alcohols have particularly beneficial properties as compared to their corresponding straight-chain isomers (i.e., isomers of the same molecular weight). For example, branched fatty alcohols tend to have considerably lower melting points when compared to their corresponding straight-chain isomers. Lower melting points confer lower pour points. In addition, branched fatty alcohols tend to have substantially lower volatility and vapor pressure, and improved stability against oxidation and rancidity, as compared to their corresponding straight-chain isomers. These beneficial properties make branched fatty alcohols and/or derivatives thereof particularly suitable as components or feedstocks for cosmetic and pharmaceutical applications, as components of plasticizers for synthetic resins, as solvents for solutions for printing ink and specialty inks, or as industrial lubricants. These materials are also well suited as components of surfactants that have good low-temperature detersive performance. As such, they are especially desirable as ingredients of various household and/or personal care cleaning/treatment compositions wherein low washing temperatures are preferred.

[0068] As used herein, "fraction of modern carbon" or "f_M" has the same meaning as defined by National Institute of Standards and Technology (NIST) Standard Reference Materials (SRMs) 4990B and 4990C, known as oxalic acid standards HOxI and HOxII, respectively. The fundamental definition relates to 0.95 times the 14C/12C isotope ratio of HOxI (referenced to AD 1950). This is roughly equivalent to decay-corrected pre-Industrial Revolution wood. For the current living biosphere (plant material), f_M is approximately 1.1.

[0069] "Gene knockout", as used herein, refers to a procedure by which a gene encoding a target protein is modified or inactivated so as to reduce or eliminate the function of the intact protein. Inactivation of the gene may be performed by general methods such as mutagenesis by UV irradiation or treatment with N-methyl-N'-nitro-N-nitrosoguanidine, site-directed mutagenesis, homologous recombination, insertion-deletion mutagenesis, or "Red-driven integration" (Datsenko et al., Proc. Natl. Acad. Sci. USA, 97: 6640-45 (2000)). For example, in one embodiment, a construct is introduced into a host cell, such that it is possible to select for homologous recombination events in the host cell. One of skill in the art can readily design a knock-out construct including both positive and negative selection genes for efficiently selecting transfected cells that undergo a homologous recombination event with the construct. The alteration in the host cell may be obtained, for example, by replacing through a single or double crossover recombination a wild type DNA sequence by a DNA sequence containing the alteration. For convenient selection of transformants, the alteration may, for example, be a DNA sequence encoding an antibiotic resistance marker or a gene complementing a possible auxotrophy of the host cell. Mutations include, but are not limited to, deletion-insertion mutations. An example of such an alteration includes a gene disruption (i.e., a perturbation of a gene) such that the product that is normally produced from this gene is not produced in a functional form. This could be due to a complete deletion, a deletion and insertion of a selective marker, an insertion of a selective marker, a frameshift mutation, an in-frame deletion, or a point mutation that leads to premature termination. In some instances, the entire mRNA for the gene is absent. In other situations, the amount of mRNA produced varies.

[0070] Calculations of "homology" between two sequences can be performed as follows. The sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). In a preferred embodiment, the length of a reference sequence that is aligned for comparison purposes is at least about 30%, preferably at least about 40%, more preferably at least about 50%, even more preferably at least about 60%, and even more preferably at least about 70%, at least about 80%, at least about 90%, or about 100% of the length of the reference sequence. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein, amino acid or nucleic acid "identity" is equivalent to amino acid or nucleic acid "homology"). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
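
The position-by-position comparison described above can be illustrated with a short Python sketch. It is not part of the original disclosure: the function name percent_identity is hypothetical, the two inputs are assumed to be already aligned with "-" marking gap positions, and the denominator (the full alignment length, so that every gap position lowers the result) is one common convention among several that alignment tools use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Identical, non-gap positions are counted and divided by the total
    number of alignment columns, so introduced gaps reduce the value.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)
```

With this convention, the aligned pair "ACGT-A" and "ACGTTA" has five identical positions out of six columns, which is roughly 83% identity.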

[0071] The comparison of sequences and determination of percent homology between two sequences can be accomplished using a mathematical algorithm. In a preferred embodiment, the percent homology between two amino acid sequences is determined using the Needleman and Wunsch (1970), J. Mol. Biol. 48:444-453, algorithm that has been incorporated into the GAP program in the GCG software package, using either a BLOSUM 62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. In yet another preferred embodiment, the percent homology between two nucleotide sequences is determined using the GAP program in the GCG software package, using a NWSgapdna.CMP matrix and a gap weight of about 40, 50, 60, 70, or 80 and a length weight of about 1, 2, 3, 4, 5, or 6. A particularly preferred set of parameters (and the one that should be used if the practitioner is uncertain about which parameters should be applied to determine if a molecule is within a homology limitation of the claims) is a BLOSUM 62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
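
For orientation, the dynamic-programming idea behind the Needleman and Wunsch global alignment can be sketched in Python as follows. This is a simplified illustration and not the GAP program of the GCG package: it assumes a flat match/mismatch score instead of a BLOSUM 62 or PAM250 matrix, and a single linear gap penalty instead of separate gap and length weights. The function name needleman_wunsch and its default scores are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (simplified sketch).

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

A production comparison would substitute the scoring matrix and the gap and length weights recited above for the flat scores assumed here.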

[0072] Other methods for aligning sequences for comparison are well known in the art. Various programs and alignment algorithms are described in, for example, Smith & Waterman, Adv. Appl. Math. 2:482, 1981; Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988; Higgins & Sharp, Gene 73:237-244, 1988; Higgins & Sharp, CABIOS 5:151-153, 1989; Corpet et al., Nucleic Acids Research 16:10881-10890, 1988; Huang et al., CABIOS 8:155-165, 1992; Pearson et al., Methods in Molecular Biology 24:307-331, 1994; and Altschul et al., J. Mol. Biol. 215:403-410, 1990.

[0073] As used herein, a "host cell" is a cell used to produce a product described herein (e.g., a branched fatty alcohol described herein). A host cell can be modified to express or overexpress selected genes or to have attenuated expression of selected genes. Non-limiting examples of host cells include plant, animal, human, bacteria, yeast, or filamentous fungi cells.

[0074] As used herein, the term "hybridizes under low stringency, medium stringency, high stringency, or very high stringency conditions" describes conditions for hybridization and washing. Guidance for performing hybridization reactions can be found, for example, in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. Aqueous and nonaqueous methods are described in that reference, and either method can be used. Examples of hybridization conditions referred to herein are as follows: 1) low stringency hybridization conditions in 6× sodium chloride/sodium citrate (SSC) at about 45 °C, followed by two washes in 0.2× SSC, 0.1% SDS at least at 50 °C (the temperature of the washes can be increased to 55 °C for low stringency conditions); 2) medium stringency hybridization conditions in 6× SSC at about 45 °C, followed by one or more washes in 0.2× SSC, 0.1% SDS at 60 °C; 3) high stringency hybridization conditions in 6× SSC at about 45 °C, followed by one or more washes in 0.2× SSC, 0.1% SDS at 65 °C; and 4) very high stringency hybridization conditions in 0.5 M sodium phosphate, 7% SDS at 65 °C, followed by one or more washes at 0.2× SSC, 1% SDS at 65 °C. Very high stringency conditions of 4) are the preferred conditions unless otherwise specified.

[0075] The term "isolated" as used herein with respect to nucleic acids, such as DNA or RNA, refers to molecules separated from other DNAs or RNAs, respectively, that are present in the natural source of the nucleic acid. Moreover, by an "isolated nucleic acid" is meant to include nucleic acid fragments, which are not naturally occurring as fragments and would not be found in the natural state. The term "isolated" is also used herein to refer to polypeptides, which are isolated from other cellular proteins, and is meant to encompass both purified and recombinant polypeptides. The term "isolated" as used herein also refers to a nucleic acid or peptide that is substantially free of cellular material, viral material, or culture medium when produced by recombinant DNA techniques. The term "isolated" as used herein also refers to a nucleic acid or peptide that is substantially free of chemical precursors or other chemicals when chemically synthesized.

[0076] As used herein, the "level of expression of a gene in a cell" refers to the level of mRNA, pre-mRNA nascent transcript(s), transcript processing intermediates, mature mRNA(s), and degradation products encoded by the gene in the cell.

[0077] As used herein, the term "microorganism" means prokaryotic and eukaryotic microbial species from the domains Archaea, Bacteria, and Eucarya, the latter including yeast and filamentous fungi, protozoa, algae, or higher Protista. The terms "microbial cells" (i.e., cells from microbes) and "microbes" are used interchangeably and refer to cells or small organisms that can only be seen with the aid of a microscope.

[0078] As used herein, the term "nucleic acid" refers to polynucleotides, such as deoxyribonucleic acid (DNA), and, where appropriate, ribonucleic acid (RNA). The term should also be understood to include, as equivalents, analogs of either RNA or DNA made from nucleotide analogs, and, as applicable to the embodiment being described, single (sense or antisense) and double-stranded polynucleotides, ESTs, chromosomes, cDNAs, mRNAs, and rRNAs.

[0079] As used herein, the term "operably linked" means that a selected nucleotide sequence (e.g., encoding a polypeptide described herein) is in proximity to a promoter to allow the promoter to regulate expression of the selected DNA. In addition, the promoter is located upstream of the selected nucleotide sequence in terms of the direction of transcription and translation. By "operably linked" is meant that a nucleotide sequence and a regulatory sequence(s) are connected in such a way as to permit gene expression when the appropriate molecules (e.g., transcriptional activator proteins) are bound to the regulatory sequence(s).

[0080] The term "or" is used herein to mean, and is used interchangeably with, the term "and/or," unless context clearly indicates otherwise.

[0081] As used herein, "overexpress" means to express or cause to be expressed a nucleic acid or polypeptide in a cell at a greater concentration than is normally expressed in a corresponding wild-type cell. For example, a polypeptide can be "overexpressed" in a recombinant host cell when the polypeptide is present in a greater concentration in the recombinant host cell compared to its concentration in a non-recombinant host cell of the same species.

[0082] As used herein, "partition coefficient" or "P" is defined as the equilibrium concentration of a compound in an organic phase divided by the concentration at equilibrium in an aqueous phase (e.g., fermentation broth). In one embodiment of a bi-phasic system described herein, the organic phase is formed by the fatty aldehyde or fatty alcohol during the production process. However, in some examples, an organic phase can be provided, such as by providing a layer of octane, to facilitate product separation. When describing a two phase system, the partition characteristics of a compound can be described as logP. For example, a compound with a logP of 1 would partition 10:1 to the organic phase: aqueous phase. A compound with a logP of -1 would partition 1:10 to the organic phase: aqueous phase. By choosing an appropriate fermentation broth and organic phase, a branched fatty aldehyde or branched fatty alcohol with a high logP value can separate into the organic phase even at very low concentrations in the fermentation vessel.
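
The relationship between logP and the organic:aqueous ratio in the examples above can be checked numerically. The Python sketch below is illustrative only and assumes a base-10 logarithm, consistent with the 10:1 and 1:10 examples; the function names and the equal-volume default are hypothetical and not part of the original disclosure.

```python
def partition_ratio(log_p):
    """Organic:aqueous equilibrium concentration ratio, P = 10**logP."""
    return 10.0 ** log_p


def fraction_in_organic_phase(log_p, organic_volume=1.0, aqueous_volume=1.0):
    """Fraction of the total compound found in the organic phase.

    Amount in each phase is concentration times volume, and the organic
    concentration is P times the aqueous concentration at equilibrium.
    """
    if organic_volume <= 0 or aqueous_volume <= 0:
        raise ValueError("phase volumes must be positive")
    organic_amount = partition_ratio(log_p) * organic_volume
    return organic_amount / (organic_amount + aqueous_volume)
```

For equal phase volumes, a logP of 1 places about 91% (10/11) of the compound in the organic phase, and a logP of -1 places about 9% (1/11) there.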

[0083] As used herein, the term "purify," "purified," or "purification" means the removal or isolation of a molecule from its environment by, for example, isolation or separation. "Substantially purified" molecules are at least about 60% free, preferably at least about 75% free, and more preferably at least about 90% free from other components with which they are associated. As used herein, these terms also refer to the removal of contaminants from a sample. For example, the removal of contaminants can result in an increase in the percentage of branched fatty aldehyde or branched fatty alcohol in a sample. For example, when branched fatty alcohols are produced in a host cell, the branched fatty alcohols can be purified by the removal of host cell proteins, or by simply separating and removing linear fatty alcohols that are produced during the same process. After purification, the percentage of branched fatty alcohols in the sample is increased.

[0084] The terms "purify," "purified," and "purification" do not require absolute purity. They are relative terms. Thus, for example, when branched fatty alcohols are produced in host cells, a purified branched fatty alcohol is one that is substantially separated from other cellular components (e.g., nucleic acids, polypeptides, lipids, carbohydrates, or other compounds, such as, for example, linear fatty alcohols). In another example, a purified branched fatty alcohol preparation is one in which the branched fatty alcohol is substantially free from contaminants, such as those that might be present following fermentation. In some embodiments, a branched fatty alcohol is purified when at least about 50% by weight of a sample is composed of the branched fatty alcohol. In other embodiments, a branched fatty alcohol is purified when at least about 60%, 70%, 80%, 85%, 90%, 92%, 95%, 98%, or 99% or more by weight of a sample is composed of the branched fatty alcohol.

[0085] As used herein, the term "recombinant polypeptide" refers to a polypeptide that is produced by recombinant DNA techniques, wherein generally DNA encoding the expressed protein or RNA is transferred into a suitable expression vector and that is in turn used to transform a host cell to produce the polypeptide or RNA.

[0086] As used herein, the term "substantially identical" (or "substantially homologous") is used to refer to a first amino acid or nucleotide sequence that contains a sufficient number of identical or equivalent (e.g., with a similar side chain, such as involving conservative amino acid substitutions) amino acid residues or nucleotides to a second amino acid or nucleotide sequence such that the first and second amino acid or nucleotide sequences have similar activities.

[0087] As used herein, the term "surfactants" refers broadly to surface active agents. These agents are typically amphipathic molecules comprising both hydrophilic and hydrophobic moieties that partition preferentially at the interface between fluid phases with different degrees of polarity and hydrogen bonding, such as, for example, an oil/water interface, or an air/water interface. Surfactants are capable of reducing surface and interfacial tension and forming microemulsions. These characteristics confer detergency, emulsifying, foaming and dispersing traits, making them some of the most versatile process chemicals.

[0088] Surfactants can be natural or synthetic in origin. Surfactants from natural origin can be derived from, for example, vegetable or animal sources. Surfactants derived from synthetic origin are typically those derived from petroleum.

[0089] There are many types of surfactants, including, for example, anionic surfactants, cationic surfactants, non-ionic surfactants, and amphoteric/zwitterionic surfactants, each with distinct characteristics.

[0090] The hydrophilic end of an anionic surfactant is negatively charged in solution. Anionic surfactants have good cleaning properties and high sudsing potential, which make them some of the most widely used types of surfactants in, for example, laundry detergents, dishwashing liquids, and shampoos. Known anionic surfactants include, for example, alkyl sulfates, alkyl ethoxylate sulfates, and soaps.

[0091] The hydrophilic end of a cationic surfactant is positively charged in solution. Three types of cationic surfactants are the most commonly known. The first type is the esterquat, which is widely included in, for example, fabric treatment agents or softeners and in detergents with built-in softeners. This is because esterquat is capable of adding softness to fabrics. The second type is a mono alkyl quaternary system, which is found in many household cleaners due to its disinfecting and/or sanitizing properties.

[0092] Non-ionic surfactants do not have an electrical charge in solution, making them resistant to water hardness deactivation. They are typically excellent grease removers. The most commonly used non-ionic surfactants are ethers or derivatives of fatty alcohols.

[0093] Amphoteric/zwitterionic surfactants are milder than the other types of surfactants, making them particularly suitable for use in personal or beauty care cleaning/treatment products. They may contain two oppositely-charged groups. While the positive charge is typically conferred by ammonium, the source of the negative charge can vary. For example, the negative charge can be conferred by carboxylate, sulfate, sulfonate, or a combination thereof. They can be anionic (e.g., negatively charged), cationic (e.g., positively charged) or non-ionic (e.g., no charge) in solution, depending on the acidity or pH of the solution. They have good compatibility with the other types of surfactants and are well known for being soluble and effective in the presence of high concentrations of electrolytes, acids and alkalis. An example of an amphoteric/zwitterionic surfactant is an alkyl betaine.

[0094] In typical applications, different types of surfactants are blended or otherwise used together to achieve an array of desirable properties.

[0095] As used herein, the term "synthase" means an enzyme that catalyzes a synthesis process. As used herein, the term synthase includes synthases, synthetases, and ligases.

[0096] As used herein, the term "transfection" means the introduction of a nucleic acid (e.g., via an expression vector) into a recipient cell by nucleic acid-mediated gene transfer.

[0097] As used herein, "transformation" refers to a process in which a cell's genotype is changed as a result of the cellular uptake of exogenous DNA or RNA. This may result in the transformed cell expressing a recombinant form of an RNA or polypeptide. In the case of antisense expression from the transferred gene, the expression of a naturally-occurring form of the polypeptide is disrupted.

[0098] As used herein, a "transport protein" is a polypeptide that facilitates the movement of one or more compounds in and/or out of a cellular organelle and/or a cell.

[0099] As used herein, a "variant" of polypeptide X refers to a polypeptide having the amino acid sequence of polypeptide X in which one or more amino acid residues are altered. The variant may have conservative changes or nonconservative changes. Guidance in determining which amino acid residues may be substituted, inserted, or deleted without affecting biological activity may be found using computer programs well known in the art, for example, LASERGENE software (DNASTAR).

[0100] The term "variant," when used in the context of a polynucleotide sequence, may encompass a polynucleotide sequence related to that of a gene or the coding sequence thereof. This definition may also include, for example, "allelic," "splice," "species," or "polymorphic" variants. A splice variant may have significant identity to a reference polynucleotide, but will generally have a greater or fewer number of polynucleotides due to alternative splicing of exons during mRNA processing. The corresponding polypeptide may possess additional functional domains or an absence of domains. Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides generally will have significant amino acid sequence identity relative to each other. A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species.

[0101] As used herein, the term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of useful vector is an episome (i.e., a nucleic acid capable of extra-chromosomal replication). Useful vectors are those capable of autonomous replication and/or expression of nucleic acids to which they are linked. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as "expression vectors". In general, expression vectors of utility in recombinant DNA techniques are often in the form of "plasmids," which refer generally to circular double stranded DNA loops that, in their vector form, are not bound to the chromosome. In the present specification, "plasmid" and "vector" are used interchangeably, as the plasmid is the most commonly used form of vector. However, also included are such other forms of expression vectors that serve equivalent functions and that become known in the art subsequently hereto.

Surfactants and Cleaning Compositions Comprising a Microbially Produced Branched Fatty Alcohol or a Branched Fatty Alcohol Derivative Thereof

[0102] The invention provides a surfactant composition comprising one or more microbially produced branched chain fatty alcohols and/or derivatives thereof. The invention further provides a detergent/cleaning composition, such as, for example, a household cleaning composition or a personal or beauty care cleaning composition, comprising such a surfactant.

[0103] In one aspect, the invention features a surfactant composition comprising branched chain fatty alcohols and/or derivatives thereof produced by microbes. In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing genes encoding at least one subunit of a branched-chain alpha-keto acid dehydrogenase polypeptide. The host cell expresses genes encoding at least two subunits of a branched-chain alpha-keto acid dehydrogenase polypeptide. For example, the host cell expresses a set of genes encoding a first subunit and a second subunit of a branched-chain alpha-keto acid dehydrogenase polypeptide. In certain embodiments, the host cell expresses a third gene encoding the second subunit of a branched-chain alpha-keto acid dehydrogenase polypeptide. In some embodiments, the first and second polypeptides have branched-chain alpha-keto acid decarboxylase activity, and the third polypeptide has lipoamide acyltransferase activity. In further embodiments, the host cell expresses a fourth gene encoding the third subunit of a branched-chain alpha-keto acid dehydrogenase polypeptide. In some embodiments, the fourth polypeptide has lipoamide dehydrogenase activity.

[0104] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a gene encoding a beta ketoacyl-ACP synthase polypeptide. In certain embodiments, the beta ketoacyl-ACP synthase polypeptide has FabH activity. In certain embodiments, the beta ketoacyl-ACP synthase polypeptide has specificity for branched-chain acyl-CoA substrates.

[0105] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a set of genes encoding at least one subunit of a branched-chain alpha-keto acid dehydrogenase complex. Specifically, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a first gene encoding a first polypeptide comprising the amino acid sequence that is any one of SEQ ID NOs:1, 3, 5, 7, 9, 11, 13, and 15, or one that has at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% sequence identity to the amino acid sequence of any one of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, and 15, or a variant thereof; a second gene encoding a second polypeptide comprising an amino acid sequence of any one of SEQ ID NOs:24, 26, 28, 30, 32, 34, 36, and 38, or one that has at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% sequence identity to the amino acid sequence of any one of SEQ ID NOs:24, 26, 28, 30, 32, 34, 36, and 38, or a variant thereof. 
In certain embodiments, the host cell also expresses a third gene encoding a third polypeptide comprising the amino acid sequence of any one of SEQ ID NOs:47, 49, 51, 53, 55, 57, 59, and 61, or one that has at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% sequence identity to the amino acid sequence of any one of SEQ ID NOs:47, 49, 51, 53, 55, 57, 59, and 61, or a variant thereof. In some embodiments, the branched fatty aldehyde, branched fatty alcohol, or a derivative thereof is isolated from the host cell, for example, isolated from the extracellular environment of the host cell. In some embodiments, the branched fatty aldehyde, branched fatty alcohol, or the derivative thereof is spontaneously secreted, completely or partially, from the host cell. In alternative embodiments, the branched fatty aldehyde, branched fatty alcohol, or the derivative thereof is transported into the extracellular environment. In further embodiments, the branched fatty aldehyde, branched fatty alcohol, or the derivative thereof is passively transported or spontaneously secreted into the extracellular environment.
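By way of illustration only, the tiered sequence identity language above can be sketched in code. The following minimal Python sketch assumes that the two amino acid sequences have already been aligned (gaps written as "-") and that percent identity is taken as identical positions divided by aligned columns; this passage does not prescribe that particular calculation, and the term "about" is treated here as an exact cutoff purely for simplicity. The function names and the example sequences are hypothetical.

```python
# Identity tiers recited in the text, in percent.
IDENTITY_TIERS = (30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85,
                  90, 91, 92, 93, 94, 95, 96, 97, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: identity = identical columns / compared columns, where
    columns that are gaps in both sequences are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def highest_tier_met(identity: float):
    """Return the highest recited tier satisfied, or None if below 30%."""
    met = [tier for tier in IDENTITY_TIERS if identity >= tier]
    return max(met) if met else None


# Hypothetical toy alignment: 4 identical columns out of 6.
identity = percent_identity("MKT-AL", "MKTQAV")
tier = highest_tier_met(identity)
```

In practice the alignment itself would come from an established alignment program; the sketch only shows how a computed identity value maps onto the recited thresholds.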

[0106] The first polypeptide comprises the amino acid sequence of any one of SEQ ID NOs:1, 3, 5, 7, 9, 11, 13, and 15, with one or more amino acid substitutions, additions, insertions, or deletions, and the second polypeptide comprises the amino acid sequence of any one of SEQ ID NOs: 24, 26, 28, 30, 32, 34, 36, and 38, wherein the first and second polypeptides together have alpha-keto acid decarboxylase activity. In certain embodiments, the first polypeptide comprises one or more or all of the amino acid sequence motifs selected from SEQ ID NOs:17-23. The second polypeptide comprises one or more or all of the amino acid sequence motifs selected from SEQ ID NOs:40-46. In some embodiments, the third polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 47, 49, 51, 53, 55, 57, 59, and 61, with one or more amino acid substitutions, additions, insertions, or deletions, wherein the third polypeptide has lipoamide acyltransferase activity. The third polypeptide comprises one or more or all of the amino acid sequence motifs selected from SEQ ID NOs:63-68. In some embodiments, the first, second and third polypeptides are capable of catalyzing the conversion of alpha-keto acids to branched acyl-CoAs. It is within the capacity of those skilled in the art to devise a suitable enzymatic assay using the appropriate substrates. Examples of such assays are described herein.

[0107] In some embodiments, the first, second, and third polypeptides independently comprise 1 or more, 5 or more, 10 or more, 15 or more, 20 or more, 30 or more, 40 or more, 50 or more, or 100 or more of the following conservative amino acid substitutions: replacement of an aliphatic amino acid, such as alanine, valine, leucine, and isoleucine, with another aliphatic amino acid; replacement of a serine with a threonine; replacement of a threonine with a serine; replacement of an acidic residue, such as aspartic acid and glutamic acid, with another acidic residue; replacement of a residue bearing an amide group, such as asparagine and glutamine, with another residue bearing an amide group; exchange of a basic residue, such as lysine and arginine, with another basic residue; and replacement of an aromatic residue, such as phenylalanine and tyrosine, with another aromatic residue. In some embodiments, the first and second polypeptides independently comprise about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, or more amino acid substitutions, additions, insertions, or deletions. In some embodiments, the third polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, or more amino acid substitutions, additions, insertions, or deletions. In some embodiments, the first and second polypeptides have branched-chain alpha-keto acid decarboxylase activity and the third polypeptide has lipoamide acyltransferase activity. In some embodiments, the first, second and third polypeptides are capable of catalyzing the conversion of branched alpha-keto acids to branched acyl-CoAs.
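For illustration, the conservative substitution classes enumerated above can be expressed as residue groups and used to count how many substitutions between a reference sequence and a variant are conservative. The Python sketch below is a simplification under stated assumptions: each group contains only the residues named in the text (the "such as" language may encompass others), the two sequences are of equal length so that only substitutions, not additions, insertions, or deletions, are considered, and the function names and example sequences are hypothetical.

```python
# One-letter codes for the residue classes named in the text.
# Assumption: only the explicitly named residues are included in each class.
CONSERVATIVE_GROUPS = (
    frozenset("AVLI"),  # aliphatic: alanine, valine, leucine, isoleucine
    frozenset("ST"),    # serine <-> threonine
    frozenset("DE"),    # acidic: aspartic acid, glutamic acid
    frozenset("NQ"),    # amide-bearing: asparagine, glutamine
    frozenset("KR"),    # basic: lysine, arginine
    frozenset("FY"),    # aromatic: phenylalanine, tyrosine
)


def is_conservative(original: str, replacement: str) -> bool:
    """True if swapping `original` for `replacement` stays within one class."""
    o, r = original.upper(), replacement.upper()
    if o == r:
        return False  # not a substitution at all
    return any(o in group and r in group for group in CONSERVATIVE_GROUPS)


def count_substitutions(reference: str, variant: str):
    """Return (conservative, other) substitution counts for equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sketch handles substitutions only; lengths must match")
    conservative = 0
    other = 0
    for o, r in zip(reference.upper(), variant.upper()):
        if o == r:
            continue
        if is_conservative(o, r):
            conservative += 1
        else:
            other += 1
    return conservative, other


# Hypothetical toy sequences: A->L, S->T, D->E are conservative; F->W is not
# counted as conservative because tryptophan is not named in the text.
counts = count_substitutions("AVSDKF", "LVTEKW")
```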

[0108] In certain embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a fourth gene encoding a fourth polypeptide comprising the amino acid sequence of any one of SEQ ID NOs:69, 71, 73, 75, 77, 79, 81, and 83, or one that has at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% sequence identity to the amino acid sequence of any one of SEQ ID NOs:69, 71, 73, 75, 77, 79, 81, and 83, or a variant thereof. In some embodiments, the branched fatty aldehyde, branched fatty alcohol, or a derivative thereof is isolated from the host cell, for example, from the extracellular environment. In certain embodiments, the branched fatty aldehyde, branched fatty alcohol, or the derivative thereof is spontaneously secreted, partially or completely, into the extracellular environment. In other embodiments, the branched fatty aldehyde, branched fatty alcohol, or the derivative thereof is transported into the extracellular environment. In certain embodiments, the branched fatty aldehyde, branched fatty alcohol or the derivative thereof is passively transported into the extracellular environment.

[0109] The fourth polypeptide comprises the amino acid sequence of any one of SEQ ID NOs:69, 71, 73, 75, 77, 79, 81, and 83, with one or more amino acid substitutions, additions, insertions, or deletions, and the polypeptide has lipoamide dehydrogenase activity. In certain embodiments, the fourth polypeptide comprises one or more or all of the amino acid sequence motifs selected from SEQ ID NOs:85-89. In some embodiments, the fourth polypeptide comprises 1 or more, 5 or more, 10 or more, 15 or more, 20 or more, 30 or more, 40 or more, 50 or more, or 100 or more of the following conservative amino acid substitutions: replacement of an aliphatic amino acid, such as alanine, valine, leucine, and isoleucine, with another aliphatic amino acid; replacement of a serine with a threonine; replacement of a threonine with a serine; replacement of an acidic residue, such as aspartic acid and glutamic acid, with another acidic residue; replacement of a residue bearing an amide group, such as asparagine and glutamine, with another residue bearing an amide group; exchange of a basic residue, such as lysine and arginine, with another basic residue; and replacement of an aromatic residue, such as phenylalanine and tyrosine, with another aromatic residue. In some embodiments, the fourth polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, or more amino acid substitutions, additions, insertions, or deletions. In some embodiments, the fourth polypeptide has lipoamide dehydrogenase activity. In some embodiments, the first, second, third and fourth polypeptides have branched chain alpha-keto acid decarboxylase and/or lipoamide acyltransferase and/or lipoamide dehydrogenase activity. In some embodiments, the first, second, third and fourth polypeptides, optionally forming a complex, are capable of catalyzing the conversion of alpha-keto acids to branched-chain acyl-CoAs.

[0110] In certain embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell further expressing a gene encoding a beta-ketoacyl ACP synthase comprising the amino acid sequence of any one of SEQ ID NOs: 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, and 120, or one that has at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% sequence identity to the amino acid sequence of any one of SEQ ID NOs:90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, and 120, or a variant thereof. In some embodiments, the branched fatty aldehyde, branched fatty alcohol, or a derivative thereof is isolated from the host cell, for example, from the extracellular environment. In certain embodiments, the branched fatty aldehyde, branched fatty alcohol, or the derivative thereof is spontaneously secreted, partially or completely, into the extracellular environment. In other embodiments, the branched fatty aldehyde, branched fatty alcohol, or the derivative thereof is transported into the extracellular environment. In certain embodiments, the branched fatty aldehyde, branched fatty alcohol, or the derivative thereof is passively transported into the extracellular environment.

[0111] The beta ketoacyl-ACP synthase polypeptide comprises the amino acid sequence of any one of SEQ ID NOs:90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, and 120, with one or more amino acid substitutions, additions, insertions, or deletions. In some embodiments, the beta ketoacyl-ACP synthase polypeptide comprises one or more or all of the amino acid sequence motifs selected from SEQ ID NOs:122-127. In some embodiments, the polypeptide has FabH activity. In certain embodiments, the beta ketoacyl-ACP synthase polypeptide has specificity for branched-chain fatty acyl-CoA substrates. In certain embodiments, the polypeptide is capable of catalyzing the condensation reaction between a branched acyl-CoA and malonyl-ACP. It is within the capacity of those skilled in the art to devise a suitable enzymatic assay using the appropriate substrates in order to distinguish those polypeptides that have sequence homology to the beta-ketoacyl-ACP synthase polypeptides herein but that are not suitable or do not have specificity for branched-chain substrates. Two examples of such enzymatic assays are described herein.

[0112] The beta ketoacyl-ACP synthase polypeptide can comprise 1 or more, 5 or more, 10 or more, 15 or more, 20 or more, 30 or more, 40 or more, 50 or more, or 100 or more of the following conservative amino acid substitutions: replacement of an aliphatic amino acid, such as alanine, valine, leucine, and isoleucine, with another aliphatic amino acid; replacement of a serine with a threonine; replacement of a threonine with a serine; replacement of an acidic residue, such as aspartic acid and glutamic acid, with another acidic residue; replacement of a residue bearing an amide group, such as asparagine and glutamine, with another residue bearing an amide group; exchange of a basic residue, such as lysine and arginine, with another basic residue; and replacement of an aromatic residue, such as phenylalanine and tyrosine, with another aromatic residue. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200 or more amino acid substitutions, additions, insertions, or deletions. In some embodiments, the polypeptide has FabH activity. In some embodiments, the polypeptide has specificity for branched-chain acyl-CoAs. In some embodiments, the polypeptide is capable of catalyzing the condensation of a branched acyl-CoA and malonyl-ACP.

[0113] In certain embodiments, the first polypeptide comprises an amino acid sequence motif of any one of or one or more or all of SEQ ID NOs:17-23, wherein the first polypeptide is about 200 to about 800 amino acid residues in length, or about 300 to about 700 amino acid residues in length, or about 400 to about 600 amino acid residues in length. In some embodiments, the second polypeptide comprises an amino acid sequence motif of any one of or one or more or all of SEQ ID NOs:40-46, wherein the second polypeptide is about 200 to about 800 amino acid residues in length, or about 300 to about 700 amino acid residues in length, or about 400 to about 600 amino acid residues in length. In some embodiments, the third polypeptide comprises an amino acid sequence motif of any one of or one or more or all of SEQ ID NOs:63-68, wherein the third polypeptide is about 200 to about 800 amino acid residues in length, or about 300 to about 700 amino acid residues in length, or about 400 to about 600 amino acid residues in length. In some embodiments, the first, second and optionally the third polypeptides are capable of catalyzing the conversion of alpha-keto acid substrates to branched acyl-CoAs.
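As a non-limiting illustration, screening a candidate polypeptide against the motif and length criteria above can be sketched as follows. The Python sketch assumes that each motif can be written as a regular expression and treats the "about" length bounds as exact; the motif patterns shown are hypothetical placeholders and do not reproduce any of the SEQ ID NOs referenced herein, and the function names are likewise hypothetical.

```python
import re

# Length windows recited in the text (amino acid residues).
# Assumption: "about" is treated as an exact bound for simplicity.
LENGTH_WINDOWS = ((200, 800), (300, 700), (400, 600))


def length_windows_met(sequence: str):
    """Return the recited length windows that the sequence falls within."""
    n = len(sequence)
    return [(low, high) for low, high in LENGTH_WINDOWS if low <= n <= high]


def motifs_found(sequence: str, motif_patterns: dict):
    """Map each motif name to whether its pattern occurs in the sequence."""
    return {
        name: re.search(pattern, sequence) is not None
        for name, pattern in motif_patterns.items()
    }


# Hypothetical placeholder motifs; NOT the motifs of the sequence listing.
example_motifs = {"motif_a": r"GDG[AS]T", "motif_b": r"HWW"}

# Hypothetical 455-residue toy sequence containing motif_a only.
candidate = "M" + "A" * 349 + "GDGAT" + "L" * 100

windows = length_windows_met(candidate)
found = motifs_found(candidate, example_motifs)
has_any_motif = any(found.values())   # "one or more" of the motifs
has_all_motifs = all(found.values())  # "all" of the motifs
```

A sequence-based screen of this kind only narrows the candidates; whether a polypeptide actually has the recited activity would still be established by an enzymatic assay.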

[0114] In certain embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell further expressing a gene encoding a fatty aldehyde biosynthesis polypeptide selected from those listed in Table 6, or a variant thereof. In some embodiments, the fatty aldehyde biosynthesis polypeptide comprises the amino acid sequence of an enzyme listed in Table 6, with one or more amino acid substitutions, additions, insertions, or deletions, and the polypeptide has carboxylic acid reductase activity. In some embodiments, the polypeptide has fatty acid reductase activity. In some embodiments, the fatty aldehyde biosynthesis polypeptide comprises one or more conservative amino acid substitutions. In some embodiments, the fatty aldehyde biosynthesis polypeptide has about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, or more amino acid substitutions, additions, insertions, or deletions. In some embodiments, the polypeptide has carboxylic acid reductase activity. In some embodiments, the polypeptide has fatty acid reductase activity.

[0115] In some embodiments, the branched fatty alcohol or a derivative thereof is isolated from the host cell, for example, from the extracellular environment. In some embodiments, the branched fatty alcohol or the derivative thereof is spontaneously secreted, partially or completely, from the host cell. In alternative embodiments, the branched fatty alcohol or the derivative thereof is transported into the extracellular environment. In other embodiments, the branched fatty alcohol or the derivative thereof is passively transported into the extracellular environment.

[0116] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell wherein a gene encoding a fatty acid synthase is modified. For example, modifying the expression of a gene encoding a fatty acid synthase includes expressing a gene encoding a fatty acid synthase in the host cell and/or increasing the expression or activity of an endogenous fatty acid synthase in the host cell. Alternatively, modifying the expression of a gene encoding a fatty acid synthase includes attenuating a gene encoding a fatty acid synthase in the host cell and/or decreasing the expression or activity of an endogenous fatty acid synthase in the host cell. In some embodiments, the fatty acid synthase is a thioesterase. In particular embodiments, the thioesterase is encoded by tesA, tesA without leader sequence, tesB, fatB, fatB2, fatB3, fatA, or fatA1.

[0117] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a gene encoding a fatty alcohol biosynthesis polypeptide. The fatty alcohol biosynthesis polypeptide is, for example, an alcohol dehydrogenase. In particular embodiments, the fatty alcohol biosynthesis polypeptide is one selected from the enzymes listed in Table 8, or a variant thereof.

[0118] In certain other embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a gene encoding another aldehyde biosynthetic polypeptide or an acyl-ACP reductase polypeptide comprising the amino acid sequence of any of the enzymes listed in Table 7, or a variant thereof. In some embodiments, the branched fatty alcohol or derivative thereof is isolated from the host cell, for example, from the extracellular environment. In certain embodiments, the branched fatty alcohol or derivative thereof is spontaneously secreted, partially or completely, from the host cell. In alternative embodiments, the branched fatty alcohol or derivative thereof is transported into the extracellular environment. In other embodiments, the branched fatty alcohol or derivative thereof is passively transported into the extracellular environment.

[0119] The acyl-ACP reductase polypeptide, for example, comprises the amino acid sequence of an enzyme selected from those listed in Table 7, with one or more amino acid substitutions, additions, insertions, or deletions, and the polypeptide has reductase activity. In certain embodiments, the polypeptide is capable of catalyzing the conversion of a suitable biological substrate into an aldehyde. The acyl-ACP reductase polypeptide, for example, comprises one or more conservative amino acid substitutions, or has about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, or more amino acid substitutions, additions, insertions, or deletions. In some embodiments, the polypeptide has reductase activity. In some embodiments, the polypeptide is capable of catalyzing the conversion of a suitable biological substrate into an aldehyde.

[0120] In any of the above-described embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell, which is genetically engineered to express an attenuated level of a fatty acid degradation enzyme relative to a wild type host cell. For example, the host cell is genetically engineered to express an attenuated level of an acyl-CoA synthase relative to a wild type host cell. In particular embodiments, the host cell expresses an attenuated level of an acyl-CoA synthase encoded by fadD, fadK, BH3103, yhfL, Pfl-4354, EAV15023, fadD1, fadD2, RPC_4074, fadDD35, fadDD22, faa3p or the gene encoding the protein ZP_01644857. In certain embodiments, the genetically engineered host cell comprises a knockout of one or more genes encoding a fatty acid degradation enzyme, such as the aforementioned acyl-CoA synthase genes. In certain embodiments, the host cell is genetically engineered to express, relative to a wild type host cell, a decreased level of at least one of a gene encoding an acyl-CoA dehydrogenase, a gene encoding an outer membrane protein receptor, and a gene encoding a transcriptional regulator of fatty acid biosynthesis. In some embodiments, the gene encoding an acyl-CoA dehydrogenase is fadE. In some embodiments, the gene encoding an outer membrane protein receptor is tonA (also known as fhuA). Yet in other embodiments, the gene encoding a transcriptional regulator of fatty acid biosynthesis is fabR.

[0121] In yet other embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell, which is genetically engineered to express an attenuated level of a dehydratase/isomerase enzyme, such as an enzyme encoded by fabA or by a gene listed in Table 1 or Table 2. In some embodiments, the host cell comprises a knockout of fabA or a gene listed in Table 1 or Table 2. In other embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell, which is genetically engineered to express an attenuated level of an endogenous ketoacyl-ACP synthase, such as an enzyme encoded by fabB or by a gene listed in Table 3 or Table 4. In certain embodiments, the host cell comprises a knockout of fabB or a gene listed in Table 3 or Table 4. In yet other embodiments, the host cell is genetically engineered to express a modified level of a gene encoding a desaturase enzyme, such as desA.

[0122] In certain embodiments, the branched-chain alpha-keto acid dehydrogenase complex polypeptides, the beta ketoacyl-ACP synthase polypeptide, the aldehyde biosynthesis polypeptide, the fatty acid synthase, the acyl-ACP reductase, the alcohol biosynthesis polypeptide, and the fatty acid degradation enzyme polypeptide are each independently obtained from a bacterium, a plant, an insect, a yeast, a fungus, or a mammal. For example, each of the above-mentioned polypeptides is from a mammalian cell, plant cell, insect cell, yeast cell, fungus cell, filamentous fungi cell, bacterial cell, or any other organism described herein. In certain embodiments, the branched-chain alpha-keto acid dehydrogenase complex polypeptides can be from a bacterium that uses branched amino acids as a carbon source, including, for example, Pseudomonas putida or Bacillus subtilis. In certain embodiments, the branched-chain alpha-keto acid dehydrogenase complex polypeptide can be from a bacterium that comprises branched fatty acids in its phospholipids, including, for example, a Legionella, Stenotrophomonas, Alteromonas, Flavobacterium, Myxococcus, Bacteroides, Micrococcus, Staphylococcus, Bacillus, Clostridium, Listeria, Lactococcus, or Streptomyces bacterium. In some embodiments, the bacterium is Legionella pneumophila, Stenotrophomonas maltophilia, Alteromonas macleodii, Flavobacterium psychrophilum, Myxococcus xanthus, Bacteroides thetaiotaomicron, Micrococcus luteus, Staphylococcus aureus, Clostridium thermocellum, Listeria monocytogenes, Streptomyces lividans, Streptomyces coelicolor, Streptomyces glaucescens, Streptococcus pneumoniae, Streptomyces peucetius, Streptococcus pyogenes, Escherichia coli, Escherichia coli K-12, Lactococcus lactis ssp. lactis, Mycobacterium tuberculosis, Enterococcus tuberculosis, Bacillus subtilis, or Lactobacillus plantarum. 
In certain embodiments, suitable fatty aldehyde biosynthesis polypeptides, fatty alcohol biosynthesis polypeptides, acyl-ACP reductases, and other polypeptides of the invention can be from a mycobacterium selected from the group consisting of Mycobacterium smegmatis, Mycobacterium abscessus, Mycobacterium avium, Mycobacterium bovis, Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum, and Mycobacterium ulcerans. In other embodiments, the bacterium is Nocardia sp. NRRL 5646, Nocardia farcinica, Streptomyces griseus, Salinispora arenicola, or Clavibacter michiganensis. In certain further embodiments, the polypeptide of the invention is derived from a cyanobacterium, including, for example, Synechococcus elongatus PCC7942, Synechocystis sp. PCC6803, Cyanothece sp. ATCC51142, Prochlorococcus marinus subsp. pastoris str. CCMP1986 PMM0533, Gloeobacter violaceus PCC7421, Nostoc punctiforme PCC73102, Anabaena variabilis ATCC29413, Synechococcus elongatus PCC6301, Nostoc sp. PCC 7120, Microcoleus chthonoplastes PCC7420, Arthrospira maxima CS-328, Lyngbya sp. PCC8106, Nodularia spumigena CCY9414, Trichodesmium erythraeum IMS101, Microcystis aeruginosa, Nostoc azollae, Anabaena variabilis, Crocosphaera watsonii, Thermosynechococcus elongatus, Gloeobacter violaceus, Cyanobium, or Prochlorococcus marinus.

[0123] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell cultured in the presence of at least one biological substrate for the branched-chain alpha-keto acid dehydrogenase polypeptides, the beta ketoacyl-ACP synthase polypeptide, the aldehyde biosynthesis polypeptide, the acyl-ACP reductase, and/or the alcohol biosynthesis polypeptide. Suitable substrates for the branched-chain alpha-keto acid dehydrogenase polypeptides can include, without limitation, 2-oxo-isovalerate, 2-oxo-isobutyrate, or 2-oxo-3-methyl-valerate.

[0124] In another aspect, the invention features a surfactant or detergent composition comprising a microbially produced branched fatty alcohol or a derivative thereof. In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a first polynucleotide that hybridizes to a complement of a nucleotide sequence of any one of SEQ ID NOs:2, 4, 6, 8, 10, 12, 14, and 16, or to a fragment thereof, and a second polynucleotide that hybridizes to a complement of a second polynucleotide sequence of any one of SEQ ID NOs:25, 27, 29, 31, 33, 35, 37, and 39. In certain embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a third polynucleotide that hybridizes to a complement of a third nucleotide sequence of any one of SEQ ID NOs:48, 50, 52, 54, 56, 58, 60, and 62, or to a fragment thereof, wherein the first and second polynucleotides encode the first and second polypeptides having branched-chain alpha-keto acid decarboxylase activity, and wherein the third polynucleotide encodes a polypeptide having lipoamide acyltransferase activity. In some embodiments, the first and the second polypeptides, optionally forming a single subunit, optionally together with the third polypeptide, are capable of catalyzing the conversion of branched-chain alpha-keto acids to branched acyl-CoAs.

[0125] The first polynucleotide hybridizes under low stringency, medium stringency, high stringency, or very high stringency conditions, to a complement of the nucleotide sequence of any one of SEQ ID NOs:2, 4, 6, 8, 10, 12, 14, and 16, or to a fragment thereof. The second polynucleotide hybridizes under low stringency, medium stringency, high stringency, or very high stringency conditions, to a complement of the nucleotide sequence of any one of SEQ ID NOs: 25, 27, 29, 31, 33, 35, 37, and 39, or to a fragment thereof. The third polynucleotide hybridizes under low stringency, medium stringency, high stringency, or very high stringency conditions, to a complement of the nucleotide sequence of any one of SEQ ID NOs: 48, 50, 52, 54, 56, 58, 60, and 62.

[0126] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a fourth polynucleotide that hybridizes to a complement of a nucleotide sequence of any one of SEQ ID NOs:70, 72, 74, 76, 78, 80, 82, and 84, or to a fragment thereof, wherein the fourth polynucleotide encodes a polypeptide having lipoamide dehydrogenase activity. In some embodiments, the first, second, and optionally the third and/or fourth polypeptides are capable of catalyzing the conversion of branched-chain alpha-keto acids into branched acyl-CoAs.

[0127] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a polynucleotide that hybridizes to a complement of a nucleotide sequence of any one of SEQ ID NOs:91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, and 121, or to a fragment thereof, wherein the polynucleotide encodes a polypeptide having beta ketoacyl-ACP synthase activity. In some embodiments, the polypeptide is capable of catalyzing the condensation of a branched acyl-CoA with malonyl-ACP. In some embodiments, the polypeptide has FabH activity. In some embodiments, the polypeptide has specificity for branched acyl-CoA substrates. The polynucleotide hybridizes under low stringency, medium stringency, high stringency, or very high stringency conditions, to a complement of the nucleotide sequence of any one of SEQ ID NOs:91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, and 121, or to a fragment thereof.

[0128] In some embodiments, the branched fatty aldehyde, branched fatty alcohol, or derivative thereof is isolated from the host cell, for example, from the extracellular environment. In certain embodiments, the branched fatty aldehyde, branched fatty alcohol, or derivative thereof is spontaneously secreted, partially or completely, from the host cell. In alternative embodiments, the branched fatty aldehyde, branched fatty alcohol, or derivative thereof is transported into the extracellular environment. In other embodiments, the branched fatty aldehyde, branched fatty alcohol, or derivative thereof is passively transported into the extracellular environment.

[0129] In certain embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a polynucleotide that hybridizes to a complement of the nucleotide sequence encoding a fatty aldehyde biosynthesis polypeptide listed in Table 6, or to a fragment thereof, wherein the polynucleotide encodes a polypeptide having carboxylic acid reductase activity. In some embodiments, the polypeptide has fatty acid reductase activity.

[0130] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell, wherein the expression of a gene encoding a fatty acid synthase is modified. In certain embodiments, modifying the expression of a gene encoding a fatty acid synthase includes expressing a gene encoding a fatty acid synthase in the host cell and/or increasing the expression or activity of an endogenous fatty acid synthase in the host cell. In alternate embodiments, modifying the expression of a gene encoding a fatty acid synthase includes attenuating a gene encoding a fatty acid synthase in the host cell and/or decreasing the expression or activity of an endogenous fatty acid synthase in the host cell. In some embodiments, the fatty acid synthase is a thioesterase. In particular embodiments, the thioesterase is encoded by tesA, tesA without leader sequence, tesB, fatB, fatB2, fatB3, fatA, or fatA1.

[0131] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a gene encoding a fatty alcohol biosynthesis polypeptide. For example, the fatty alcohol biosynthesis polypeptide is an alcohol dehydrogenase. In particular embodiments, the fatty alcohol biosynthesis polypeptide is one selected from those listed in Table 8, or a variant thereof.

[0132] In any of the above-described embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell, which is genetically engineered to express an attenuated level of a fatty acid degradation enzyme relative to a wild type host cell. In some embodiments, the host cell is genetically engineered to express an attenuated level of an acyl-CoA synthase relative to a wild type host cell. In particular embodiments, the host cell expresses an attenuated level of an acyl-CoA synthase encoded by fadD, fadK, BH3103, yhfL, Pfl-4354, EAV15023, fadD1, fadD2, RPC_4074, fadDD35, fadDD22, faa3p or the gene encoding the protein ZP_01644857. In certain embodiments, the genetically engineered host cell comprises a knockout of one or more genes encoding a fatty acid degradation enzyme, such as the aforementioned acyl-CoA synthase genes. In certain embodiments, the host cell is genetically engineered to express, relative to a wild type host cell, a decreased level of at least one of a gene encoding an acyl-CoA dehydrogenase, a gene encoding an outer membrane protein receptor, and a gene encoding a transcriptional regulator of fatty acid biosynthesis. In some embodiments, the gene encoding an acyl-CoA dehydrogenase is fadE. In some embodiments, the gene encoding an outer membrane protein receptor is tonA (also known as fhuA). Yet in other embodiments, the gene encoding a transcriptional regulator of fatty acid biosynthesis is fabR.

[0133] In yet other embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell, which is genetically engineered to express an attenuated level of a dehydratase/isomerase enzyme, such as an enzyme encoded by fabA or by a gene listed in Table 1 or Table 2. In some embodiments, the host cell comprises a knockout of fabA or a gene listed in Table 1 or Table 2. In other embodiments, the host cell is genetically engineered to express an attenuated level of a ketoacyl-ACP synthase, such as an enzyme encoded by fabB or by a gene listed in Table 3 or Table 4. In certain embodiments, the host cell comprises a knockout of fabB or a gene listed in Table 3 or Table 4. In yet other embodiments, the host cell is genetically engineered to express a modified level of a gene encoding a desaturase enzyme, such as desA.

[0134] In some embodiments, the branched fatty alcohol or a derivative thereof is isolated from the host cell, for example, from the extracellular environment. In certain embodiments, the branched fatty alcohol or the derivative thereof is spontaneously secreted, partially or completely, from the host cell. In alternative embodiments, the branched fatty alcohol or the derivative thereof is transported into the extracellular environment. In other embodiments, the branched fatty alcohol or the derivative thereof is passively transported into the extracellular environment.

[0135] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a polynucleotide that hybridizes to a complement of a nucleotide sequence encoding an acyl-ACP reductase selected from those listed in Table 7, or to a fragment thereof, wherein the polynucleotide encodes a polypeptide having reductase activity. The polynucleotide hybridizes under low stringency, medium stringency, high stringency, or very high stringency conditions, to a complement of the nucleotide sequence encoding an acyl-ACP reductase selected from those listed in Table 7, or to a fragment thereof.

[0136] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a gene encoding a fatty alcohol biosynthesis polypeptide in the host cell. For example, the fatty alcohol biosynthesis polypeptide is an alcohol dehydrogenase. In particular embodiments, the fatty alcohol biosynthesis polypeptide is one selected from those listed in Table 8, or a variant thereof.

[0137] In any of the above-described embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell, which is genetically engineered to express an attenuated level of a fatty acid degradation enzyme relative to a wild type host cell. In some embodiments, the host cell is genetically engineered to express an attenuated level of an acyl-CoA synthase relative to a wild type host cell. In particular embodiments, the host cell expresses an attenuated level of an acyl-CoA synthase encoded by fadD, fadK, BH3103, yhfL, Pfl-4354, EAV15023, fadD1, fadD2, RPC_4074, fadDD35, fadDD22, faa3p or the gene encoding the protein ZP_01644857. In certain embodiments, the genetically engineered host cell comprises a knockout of one or more genes encoding a fatty acid degradation enzyme, such as the aforementioned acyl-CoA synthase genes. In certain embodiments, the host cell is genetically engineered to express, relative to a wild type host cell, a decreased level of at least one of a gene encoding an acyl-CoA dehydrogenase, a gene encoding an outer membrane protein receptor, and a gene encoding a transcriptional regulator of fatty acid biosynthesis. In some embodiments, the gene encoding an acyl-CoA dehydrogenase is fadE. In some embodiments, the gene encoding an outer membrane protein receptor is tonA (also known as fhuA). Yet in other embodiments, the gene encoding a transcriptional regulator of fatty acid biosynthesis is fabR.

[0138] In yet other embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell, which is genetically engineered to express an attenuated level of a dehydratase/isomerase enzyme, such as an enzyme encoded by fabA or by a gene listed in Table 1 or Table 2. In some embodiments, the host cell comprises a knockout of fabA or a gene listed in Table 1 or Table 2. In other embodiments, the host cell is genetically engineered to express an attenuated level of a ketoacyl-ACP synthase, such as an enzyme encoded by fabB or by a gene listed in Table 3 or Table 4. In certain embodiments, the host cell comprises a knockout of fabB or a gene listed in Table 3 or Table 4. In yet other embodiments, the host cell is genetically engineered to express a modified level of a gene encoding a desaturase enzyme, such as desA.

[0139] In some embodiments, the branched fatty alcohol or derivative thereof is isolated from the host cell, e.g., from the extracellular environment. In certain embodiments, the branched fatty alcohol or derivative thereof is spontaneously secreted, partially or completely, from the host cell. In alternative embodiments, the branched fatty alcohol or derivative thereof is transported into the extracellular environment. In other embodiments, the branched fatty alcohol or derivative thereof is passively transported into the extracellular environment.

[0140] In some embodiments, the branched-chain alpha-keto acid dehydrogenase complex, the beta ketoacyl-ACP synthase polypeptide, the aldehyde biosynthesis polypeptide, the fatty acid synthase, the acyl-ACP reductase, the alcohol biosynthesis polypeptide, and the fatty acid degradation enzyme polypeptide are each independently from a bacterium, a plant, an insect, a yeast, a fungus, or a mammal. For example, the branched-chain alpha-keto acid dehydrogenase complex polypeptides can be from a bacterium that uses branched amino acids as a carbon source, including, for example, Pseudomonas putida or Bacillus subtilis. In another example, the branched-chain alpha-keto acid dehydrogenase complex polypeptide can be from a bacterium that comprises branched fatty acids in its phospholipids, including, for example, a Legionella, Stenotrophomonas, Alteromonas, Flavobacterium, Myxococcus, Bacteroides, Micrococcus, Staphylococcus, Bacillus, Clostridium, Listeria, Lactococcus, or Streptomyces bacterium. In some embodiments, the bacterium is a Legionella pneumophila, Stenotrophomonas maltophilia, Alteromonas macleodii, Flavobacterium psychrophilum, Myxococcus xanthus, Bacteroides thetaiotaomicron, Micrococcus luteus, Staphylococcus aureus, Clostridium thermocellum, Listeria monocytogenes, Streptomyces lividans, Streptomyces coelicolor, Streptomyces glaucescens, Streptococcus pneumoniae, Streptomyces peucetius, Streptococcus pyogenes, Escherichia coli, Escherichia coli K-12, Lactococcus lactis ssp. lactis, Mycobacterium tuberculosis, Enterococcus tuberculosis, Bacillus subtilis, or Lactobacillus plantarum. 
In some embodiments, suitable fatty aldehyde biosynthesis polypeptides, fatty alcohol biosynthesis polypeptides, acyl-ACP reductases, and other polypeptides of the invention can be from a mycobacterium selected from the group consisting of Mycobacterium smegmatis, Mycobacterium abscessus, Mycobacterium avium, Mycobacterium bovis, Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum, and Mycobacterium ulcerans. In other embodiments, the bacterium is Nocardia sp. NRRL 5646, Nocardia farcinica, Streptomyces griseus, Salinispora arenicola, or Clavibacter michiganensis. In yet further embodiments, the polypeptide of the invention is derived from a cyanobacterium, including, for example, Synechococcus elongatus PCC7942, Synechocystis sp. PCC6803, Cyanothece sp. ATCC51142, Prochlorococcus marinus subsp. pastoris str. CCMP1986 PMM0533, Gloeobacter violaceus PCC7421, Nostoc punctiforme PCC73102, Anabaena variabilis ATCC29413, Synechococcus elongatus PCC6301, Nostoc sp. PCC 7120, Microcoleus chthonoplastes PCC7420, Arthrospira maxima CS-328, Lyngbya sp. PCC8106, Nodularia spumigena CCY9414, Trichodesmium erythraeum IMS101, Microcystis aeruginosa, Nostoc azollae, Anabaena variabilis, Crocosphaera watsonii, Thermosynechococcus elongatus, Gloeobacter violaceus, Cyanobium, or Prochlorococcus marinus.

[0141] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell cultured in the presence of at least one biological substrate for the branched-chain alpha-keto acid dehydrogenase polypeptides, the beta ketoacyl-ACP synthase polypeptide, the aldehyde biosynthesis polypeptide, the acyl-ACP reductase, or the alcohol biosynthesis polypeptide. In some embodiments, the host cell is cultured under conditions that allow the expression of the branched-chain alpha-keto acid dehydrogenase polypeptides, the beta ketoacyl-ACP synthase, the aldehyde biosynthesis polypeptide, the acyl-ACP reductase, and/or the alcohol biosynthesis polypeptide. In particular embodiments, the host cell is cultured under conditions that allow the production of branched fatty alcohols or derivatives thereof.

[0142] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell cultured in the presence of at least one biological substrate for the branched-chain alpha-keto acid dehydrogenase complex, the aldehyde biosynthesis polypeptide, the alcohol biosynthesis polypeptide, and/or the acyl-ACP reductase polypeptide. Accordingly, the host cell is cultured under conditions that allow expression of branched-chain alpha-keto acid dehydrogenase complex, the aldehyde biosynthesis polypeptide, the alcohol biosynthesis polypeptide, and/or the acyl-ACP reductase polypeptide.

[0143] In some embodiments, the branched fatty alcohol or derivative thereof is isolated from the host cell, e.g., from the extracellular environment. In some embodiments, the branched fatty alcohol or derivative thereof is spontaneously secreted, partially or completely, from the host cell. In alternative embodiments, the branched fatty alcohol or derivative thereof is transported into the extracellular environment. In other embodiments, the branched fatty alcohol or derivative thereof is passively transported into the extracellular environment.

[0144] In another aspect, the invention features a surfactant or detergent composition comprising a microbially produced branched fatty alcohol or a derivative thereof. In certain embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing one or more recombinant vectors comprising at least the E1 alpha and beta subunits of a branched-chain alpha-keto acid dehydrogenase. In certain embodiments, the recombinant vector further comprises an E2 subunit of a branched-chain alpha-keto acid dehydrogenase. The subunits can be introduced into the host cell in separate vectors or together in a single vector. For example, the vector can comprise a first polynucleotide sequence having at least about 30%, e.g., at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% sequence identity to a polynucleotide sequence listed in FIG. 2A, and a second polynucleotide sequence having at least about 30%, e.g., at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% sequence identity to a polynucleotide sequence listed in FIG. 2B. 
In another example, the vector can further comprise a third polynucleotide having at least about 30%, e.g., at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% sequence identity to a polynucleotide sequence listed in FIG. 2C. The polynucleotides encoding the alpha and beta subunits of the E1 subunit can be linked and constitute a single operon, or they may be separately introduced into a vector and/or into a host cell. Likewise, the polynucleotides encoding the E1 subunit and the polynucleotide encoding the E2 subunit can be linked and constitute a single operon, or they may be separately introduced into a vector and/or into a host cell. For example, a first vector can comprise a first polynucleotide sequence having at least about 30%, e.g., at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% sequence identity to a polynucleotide sequence listed in FIG. 2A, and a second vector can comprise a second polynucleotide having at least about 30%, e.g., at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% sequence identity to a polynucleotide sequence listed in FIG. 2B.

[0145] In some embodiments, the nucleotide sequence of the first polynucleotide has at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to the nucleotide sequence of any one of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14 and 16; the nucleotide sequence of the second polynucleotide has at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to the nucleotide sequence of any one of SEQ ID NOs: 25, 27, 29, 31, 33, 35, 37, and 39; and the nucleotide sequence of the third polynucleotide has at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to the nucleotide sequence of any one of SEQ ID NOs: 48, 50, 52, 54, 56, 58, 60, and 62. 
In some embodiments, the nucleotide sequence of the first polynucleotide is any one of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14 and 16, the nucleotide sequence of the second polynucleotide is any one of SEQ ID NOs: 25, 27, 29, 31, 33, 35, 37, and 39, and the nucleotide sequence of the third polynucleotide, when present, is any one of SEQ ID NOs: 48, 50, 52, 54, 56, 58, 60, and 62.
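
By way of illustration, a percent sequence identity of the kind recited above can be computed from a pairwise alignment as sketched below in Python. This is a minimal sketch under stated assumptions, not the method of the disclosure: it assumes the two sequences have already been aligned (equal length, with "-" marking gaps), it counts identical columns over all columns in which at least one sequence has a residue, and the function names are hypothetical. Other conventions for the denominator exist and give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: inputs are already aligned, of equal length, with '-' for gaps.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a gap opposite a gap carries no information
        columns += 1
        if a == b:
            matches += 1  # a residue opposite a gap never counts as a match
    if columns == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold_percent: float) -> bool:
    """True if the alignment reaches the given percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent
```

Under these assumptions, the aligned pair `"ACGT"` and `"ACGA"` has 75% identity, so it would meet an "at least about 70%" threshold but not an "at least about 80%" threshold.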

[0146] In some embodiments, each of the vectors above, or another vector, can comprise a fourth polynucleotide sequence having at least about 30%, e.g., at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% sequence identity to a polynucleotide sequence listed in FIG. 2D. In certain embodiments, the nucleotide sequence of the fourth polynucleotide has at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to the nucleotide sequence of any one of SEQ ID NOs:70, 72, 74, 76, 78, 80, 82, and 84. In some embodiments, the nucleotide sequence of the fourth polynucleotide is any one of SEQ ID NOs:70, 72, 74, 76, 78, 80, 82, and 84.

[0147] In some embodiments, each of the vectors above, or another vector, can be introduced into the host cell wherein the vector comprises a beta-ketoacyl ACP synthase nucleotide that has at least about 30%, e.g., at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% sequence identity to a polynucleotide sequence listed in FIG. 2E. In certain embodiments, the nucleotide sequence of the beta-ketoacyl ACP synthase nucleotide has at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to the nucleotide sequence of any one of SEQ ID NOs:91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, and 121. In some embodiments, the nucleotide sequence of the beta-ketoacyl ACP synthase nucleotide is any one of SEQ ID NOs: 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, and 121.

[0148] In yet another embodiment, an individual vector comprising a beta-ketoacyl-ACP synthase nucleotide that has at least about 30%, e.g., at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% sequence identity to a polynucleotide sequence listed in FIG. 2E can be introduced into a suitable host cell, independent of whether one or more other vectors comprising one or more subunits of a branched-chain alpha-keto acid dehydrogenase is introduced into the same cell. For example, the host cell can suitably be one that expresses an endogenous branched-chain alpha-keto acid dehydrogenase, or one or more subunits thereof.

[0149] In some embodiments, the branched fatty aldehyde, branched fatty alcohol, or derivative thereof is isolated from the host cell, for example, from the extracellular environment. In some embodiments, the branched fatty aldehyde, branched fatty alcohol or derivative thereof is spontaneously secreted, partially or completely, from the host cell. In alternative embodiments, the branched fatty aldehyde, branched fatty alcohol or derivative thereof is transported into the extracellular environment. In other embodiments, the branched fatty aldehyde, branched fatty alcohol, or derivative thereof is passively transported into the extracellular environment.

[0150] The recombinant vector can further comprise a promoter operably linked to the nucleotide sequence. In certain embodiments, the promoter is a developmentally-regulated, an organelle-specific, a tissue-specific, an inducible, a constitutive, or a cell-specific promoter.

[0151] In other embodiments, the recombinant vector comprises at least one sequence selected from the group consisting of (a) a regulatory sequence operatively coupled to the nucleotide sequence; (b) a selection marker operatively coupled to the nucleotide sequence; (c) a marker sequence operatively coupled to the nucleotide sequence; (d) a purification moiety operatively coupled to the nucleotide sequence; (e) a secretion sequence operatively coupled to the nucleotide sequence; and (f) a targeting sequence operatively coupled to the nucleotide sequence.

[0152] In some embodiments, the recombinant vector is a plasmid.

[0153] In some embodiments, the host cell expresses a polypeptide encoded by the recombinant vector. In some embodiments, the nucleotide sequence is stably incorporated into the genomic DNA of the host cell, and the expression of the nucleotide sequence is under the control of a regulated promoter region. In an exemplary embodiment, one or more of the polynucleotides encoding a branched-chain alpha-keto acid dehydrogenase polypeptide, a beta ketoacyl-ACP synthase polypeptide, a fatty aldehyde biosynthesis polypeptide, a fatty alcohol biosynthesis polypeptide, and/or an acyl-ACP reductase of the invention can be stably incorporated into the genomic DNA of the host cell, and the expression of the polynucleotide sequence is under the control of a regulated promoter region.

[0154] In some embodiments, an above-described vector or another vector can be introduced into the host cell wherein the vector comprises a fatty aldehyde biosynthesis polynucleotide having at least about 70% sequence identity to a nucleotide sequence encoding an enzyme listed in Table 6.

[0155] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell wherein the expression of a gene encoding a fatty acid synthase is modified. In certain embodiments, modifying the expression of a gene encoding a fatty acid synthase includes expressing a gene encoding a fatty acid synthase in the host cell and/or increasing the expression or activity of an endogenous fatty acid synthase in the host cell. In alternate embodiments, modifying the expression of a gene encoding a fatty acid synthase includes attenuating a gene encoding a fatty acid synthase in the host cell and/or decreasing the expression or activity of an endogenous fatty acid synthase in the host cell. In some embodiments, the fatty acid synthase is a thioesterase. In particular embodiments, the thioesterase is encoded by tesA, tesA without leader sequence, tesB, fatB, fatB2, fatB3, fatA, or fatA1.

[0156] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell expressing a gene encoding a fatty alcohol biosynthesis polypeptide. For example, the fatty alcohol biosynthesis polypeptide is an alcohol dehydrogenase. In particular embodiments, the fatty alcohol biosynthesis polypeptide comprises the amino acid sequence of an enzyme listed in Table 8, or a variant thereof.

[0157] In any of the embodiments described above, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell, which is genetically engineered to express an attenuated level of a fatty acid degradation enzyme relative to a wild type host cell. In some embodiments, the host cell is genetically engineered to express an attenuated level of an acyl-CoA synthase relative to a wild type host cell. In particular embodiments, the host cell expresses an attenuated level of an acyl-CoA synthase encoded by fadD, fadK, BH3103, yhfL, Pfl-4354, EAV15023, fadD1, fadD2, RPC_4074, fadDD35, fadDD22, faa3p or the gene encoding the protein ZP_01644857. In certain embodiments, the genetically engineered host cell comprises a knockout of one or more genes encoding a fatty acid degradation enzyme, such as the aforementioned acyl-CoA synthase genes. In certain embodiments, the host cell is genetically engineered to express, relative to a wild type host cell, a decreased level of at least one of a gene encoding an acyl-CoA dehydrogenase, a gene encoding an outer membrane protein receptor, and a gene encoding a transcriptional regulator of fatty acid biosynthesis. In some embodiments, the gene encoding an acyl-CoA dehydrogenase is fadE. In some embodiments, the gene encoding an outer membrane protein receptor is tonA (also known as fhuA). Yet in other embodiments, the gene encoding a transcriptional regulator of fatty acid biosynthesis is fabR.

[0158] In yet other embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell, which is genetically engineered to express an attenuated level of a dehydratase/isomerase enzyme, such as an enzyme encoded by fabA or by a gene listed in Table 1 or Table 2. In some embodiments, the host cell comprises a knockout of fabA or a gene listed in Table 1 or Table 2. In other embodiments, the host cell is genetically engineered to express an attenuated level of a ketoacyl-ACP synthase, such as an enzyme encoded by fabB or by a gene listed in Table 3 or Table 4. In certain embodiments, the host cell comprises a knockout of fabB or a gene listed in Table 3 or Table 4. In yet other embodiments, the host cell is genetically engineered to express a modified level of a gene encoding a desaturase enzyme, such as desA.

[0159] In certain other embodiments, any of the vectors comprising the E1 alpha, E1 beta, and/or optionally E2 and/or optionally E3 subunits of a branched-chain alpha-keto acid dehydrogenase complex or another vector can be introduced into the host cell wherein the vector further comprises an acyl-ACP reductase polynucleotide having at least about 70% sequence identity to a nucleotide sequence encoding an enzyme listed in Table 7.

[0160] In some embodiments, the host cell is cultured in the presence of at least one biological substrate for the branched-chain alpha-keto acid dehydrogenase complex, the aldehyde biosynthesis polypeptide, the alcohol biosynthesis polypeptide, and/or the acyl-ACP reductase polypeptide. In certain embodiments, the host cell is cultured under conditions that are sufficient for expressing a branched-chain alpha-keto acid dehydrogenase complex, an aldehyde biosynthesis polypeptide, an alcohol biosynthesis polypeptide, and/or an acyl-ACP reductase polypeptide. In certain other embodiments, the host cell is cultured under conditions that allow the production of branched fatty alcohols or derivatives thereof.

[0161] In some embodiments, the microbially produced branched fatty alcohol and/or derivative thereof is produced by a host cell cultured in the presence of at least one biological substrate for the branched-chain alpha-keto acid dehydrogenase complex, the aldehyde biosynthesis polypeptide, the alcohol biosynthesis polypeptide, and/or the acyl-ACP reductase polypeptide. Accordingly, the host cell is cultured under conditions that allow expression of branched-chain alpha-keto acid dehydrogenase complex, the aldehyde biosynthesis polypeptide, the alcohol biosynthesis polypeptide, and/or the acyl-ACP reductase polypeptide.

[0162] In some embodiments, the branched fatty alcohol or derivative thereof is isolated from the host cell, for example, from the extracellular environment. In some embodiments, the branched fatty alcohol or derivative thereof is secreted from the host cell. In alternative embodiments, the branched fatty alcohol or derivative thereof is transported into the extracellular environment. In other embodiments, the branched fatty alcohol or derivative thereof is passively transported into the extracellular environment.

[0163] In any of the aspects of the invention described herein, the host cell can be selected from the group consisting of a mammalian cell, plant cell, insect cell, yeast cell, fungus cell, filamentous fungi cell, and bacterial cell. In some embodiments, the host cell is a Gram-positive bacterial cell. In other embodiments, the host cell is a Gram-negative bacterial cell. In some embodiments, the host cell is selected from the genus Escherichia, Bacillus, Lactobacillus, Rhodococcus, Pseudomonas, Aspergillus, Trichoderma, Neurospora, Fusarium, Humicola, Rhizomucor, Kluyveromyces, Pichia, Mucor, Myceliophthora, Penicillium, Phanerochaete, Pleurotus, Trametes, Chrysosporium, Saccharomyces, Stenotrophomonas, Schizosaccharomyces, Yarrowia, or Streptomyces.

[0164] In certain embodiments, the host cell is a Bacillus lentus cell, a Bacillus brevis cell, a Bacillus stearothermophilus cell, a Bacillus licheniformis cell, a Bacillus alkalophilus cell, a Bacillus coagulans cell, a Bacillus circulans cell, a Bacillus pumilus cell, a Bacillus thuringiensis cell, a Bacillus clausii cell, a Bacillus megaterium cell, a Bacillus subtilis cell, or a Bacillus amyloliquefaciens cell.

[0165] In other embodiments, the host cell is a Trichoderma koningii cell, a Trichoderma viride cell, a Trichoderma reesei cell, a Trichoderma longibrachiatum cell, an Aspergillus awamori cell, an Aspergillus fumigatus cell, an Aspergillus foetidus cell, an Aspergillus nidulans cell, an Aspergillus niger cell, an Aspergillus oryzae cell, a Humicola insolens cell, a Humicola lanuginosa cell, a Rhodococcus opacus cell, a Rhizomucor miehei cell, or a Mucor miehei cell.

[0166] In yet other embodiments, the host cell is a Streptomyces lividans cell or a Streptomyces murinus cell.

[0167] In yet other embodiments, the host cell is an Actinomycetes cell.

[0168] In some embodiments, the host cell is a Saccharomyces cerevisiae cell.

[0169] In particular embodiments, the host cell is a cell from a eukaryotic plant, algae, cyanobacterium, green-sulfur bacterium, green non-sulfur bacterium, purple sulfur bacterium, purple non-sulfur bacterium, extremophile, yeast, fungus, engineered organisms thereof, or a synthetic organism. In some embodiments, the host cell is light dependent or fixes carbon. In some embodiments, the host cell has autotrophic activity. In some embodiments, the host cell has photoautotrophic activity, such as in the presence of light. In some embodiments, the host cell is heterotrophic or mixotrophic in the absence of light. In certain embodiments, the host cell is a cell from Arabidopsis thaliana, Panicum virgatum, Miscanthus giganteus, Zea mays, Botryococcus braunii, Chlamydomonas reinhardtii, Dunaliella salina, Synechococcus sp. PCC 7002, Synechococcus sp. PCC 7942, Synechocystis sp. PCC 6803, Thermosynechococcus elongatus BP-1, Chlorobium tepidum, Chloroflexus aurantiacus, Chromatium vinosum, Rhodospirillum rubrum, Rhodobacter capsulatus, Rhodopseudomonas palustris, Clostridium ljungdahlii, Clostridium thermocellum, Penicillium chrysogenum, Pichia pastoris, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pseudomonas fluorescens, or Zymomonas mobilis.

[0170] In other embodiments, the host cell is a CHO cell, a COS cell, a VERO cell, a BHK cell, a HeLa cell, a Cv1 cell, an MDCK cell, a 293 cell, a 3T3 cell, or a PC12 cell.

[0171] In yet other embodiments, the host cell is an E. coli cell. In certain embodiments, the E. coli cell is a strain B, a strain C, a strain K, or a strain W E. coli cell.

[0172] In further embodiments, the host cell can be genetically engineered to express an attenuated level of a dehydratase/isomerase enzyme. For example, an E. coli cell is chosen as a suitable host cell, wherein one or more of the endogenous dehydratase/isomerase enzymes such as those listed in Table 1 below can be attenuated or knocked out.

TABLE 1
E. coli dehydratase/isomerase enzymes

Gene  Name                                                    Polynucleotide Acc. No.  Polypeptide Acc. No.
fabA  beta-hydroxydecanoyl thioester dehydrase                GU072596.1               ACY27485.1
fabZ  (3R)-hydroxymyristoyl acyl carrier protein dehydratase  GU072604                 ACY27493.1
cysM  cysteine synthase B                                     CP001637.1               ACX38914
maoC  fused aldehyde dehydrogenase/enoyl-CoA hydratase        CP001637                 ACX39905.1

[0173] Other dehydratase/isomerase enzymes encoded by a gene listed below in Table 2 can also be attenuated or knocked out from an organism comprising such a gene.

TABLE 2
Other dehydratase/isomerase enzymes

Organism                                                         Accession No.
Shigella sp. D9                                                  ZP_05432652
Citrobacter youngae ATCC 29220                                   ZP_04561391.1
Salmonella enterica                                              YP_001570967.1
Escherichia fergusonii ATCC 35469                                YP_002382254.1
Klebsiella pneumoniae NTUH-K2044                                 YP_002918743.1
Enterobacter cancerogenus ATCC 35316                             ZP_03281954.1
Cronobacter turicensis                                           CBA29728.1
Erwinia pyrifoliae Ep1/96                                        YP_002649242.1
Pectobacterium carotovorum subsp. carotovorum PC1                YP_003018119.1
Dickeya dadantii Ech703                                          YP_002987184.1
Edwardsiella ictaluri 93-146                                     YP_002932813.1
Providencia alcalifaciens DSM 30120                              ZP_03317956.1
Yersinia kristensenii ATCC 33638                                 ZP_04624337.1
Photorhabdus asymbiotica                                         YP_003041580.1
Pantoea sp. At-9b                                                ZP_05728924.1
Actinobacillus succinogenes 130Z                                 YP_001344737.1
Mannheimia succiniciproducens MBEL55E                            YP_088386.1
Pasteurella multocida subsp. multocida str. Pm70                 NP_245421.1
Haemophilus somnus 129PT                                         YP_719117.1
Proteus mirabilis HI4320                                         YP_002150544.1
Sodalis glossinidius str. 'morsitans'                            YP_454706.1
Candidatus Blochmannia pennsylvanicus str. BPEN                  YP_277927.1
Aggregatibacter aphrophilus NJ8700                               YP_003007342.1
Vibrio cholerae MZO-3                                            ZP_01958381.1
Baumannia cicadellinicola str. Hc (Homalodisca coagulata)        YP_588853.1
Vibrionales bacterium SWAT-3                                     ZP_01815187.1
Aliivibrio salmonicida LFI1238                                   YP_002262988.1
Aeromonas salmonicida subsp. salmonicida A449                    YP_001141819.1
Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis  NP_871303.1
Glaciecola sp. HTCC2999                                          ZP_03560821.1
Alteromonas macleodii ATCC 27126                                 ZP_04714556.1

[0174] In other embodiments, the host cell is genetically engineered to express an attenuated level of an endogenous ketoacyl-ACP synthase. For example, an E. coli cell is used as a suitable host cell, wherein one or more of the ketoacyl-ACP synthase genes listed in Table 3 below can be attenuated or knocked out.

TABLE 3
E. coli ketoacyl-ACP synthase enzymes

Gene  Name                                            Polynucleotide Acc. No.  Polypeptide Acc. No.
fabB  beta-ketoacyl synthase/                         GU072597.1               ACY27486.1
      3-oxoacyl-[acyl-carrier-protein] synthase I
fabF  3-oxoacyl-[acyl-carrier-protein] synthase II    GU072598.1               ACY27487
fadJ  fused enoyl-CoA hydratase and                   CP001637.1               ACX38989.1
      epimerase/isomerase/3-hydroxyacyl-CoA dehydrogenase
xerC  site-specific tyrosine recombinase              CP001637.1               ACX41768.1
yqeF  predicted acyltransferase                       CP001637.1               ACX38529.1
murQ  predicted PTS component                         CP001637.1               ACX38907.1

[0175] Other endogenous ketoacyl-ACP synthases, such as the ones listed in Table 4, can be attenuated or knocked out from an organism comprising such an enzyme.

TABLE 4
Other ketoacyl-ACP synthases

Organism                                                         Accession No.
Shigella boydii CDC 3083-94                                      YP_001881145.1
Escherichia fergusonii ATCC 35469                                YP_002382013.1
Salmonella enterica subsp. arizonae                              YP_001569590.1
Citrobacter sp. 30_2                                             ZP_04562837.1
Klebsiella pneumoniae subsp. pneumoniae MGH 78578                YP_001336360.1
Pectobacterium carotovorum subsp. carotovorum WPP14              ZP_03831287.1
Enterobacter cancerogenus ATCC 35316                             ZP_03283474.1
Pantoea sp. At-9b                                                ZP_05730617.1
Cronobacter turicensis                                           CBA32510.1
Dickeya dadantii Ech586                                          ZP_05723897.1
Erwinia tasmaniensis Et1/99                                      YP_001907100.1
Serratia proteamaculans 568                                      YP_001479594.1
Edwardsiella ictaluri 93-146                                     YP_002934130.1
Sodalis glossinidius str. 'morsitans'                            YP_455303.1
Yersinia aldovae ATCC 35236                                      ZP_04620215.1
Providencia stuartii ATCC 25827                                  ZP_02961167.1
Photorhabdus asymbiotica                                         YP_003040275.1
Proteus mirabilis HI4320                                         YP_002151524.1
Candidatus Blochmannia pennsylvanicus str. BPEN                  YP_278005.1
Glaciecola sp. HTCC2999                                          ZP_03561088.1
Vibrio cholerae V51                                              ZP_04919940.1
Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis  NP_871411.1
Tolumonas auensis DSM 9187                                       YP_002892770.1
Actinobacillus pleuropneumoniae serovar 1 str. 4074              ZP_00134992.2
Aggregatibacter aphrophilus NJ8700                               YP_003007711.1
Pseudoalteromonas tunicata D2                                    ZP_01135065.1
Vibrionales bacterium SWAT-3                                     ZP_01816638.1
Pasteurella multocida subsp. multocida str. Pm70                 NP_245276.1
Mannheimia succiniciproducens MBEL55E                            YP_088783.1
Haemophilus somnus 129PT                                         YP_718877.1
Shewanella loihica PV-4                                          YP_001094535.1
Aliivibrio salmonicida LFI1238                                   YP_002262558.1

[0176] In yet other embodiments, the host cell is genetically engineered to express a modified level of a gene encoding a desaturase enzyme, such as desA.

[0177] In certain embodiments, the microorganism is genetically engineered to express a modified level (including, e.g., to attenuate or knock out or to express or overexpress) of a gene encoding a fatty aldehyde biosynthesis polypeptide. In some embodiments, the fatty aldehyde biosynthesis polypeptide comprises an amino acid sequence that has at least 70% sequence identity to an enzyme listed in Table 6.

[0178] In certain embodiments, the microorganism is genetically engineered to express a modified level of a fatty acid synthase in the host cell. An exemplary fatty acid synthase is a thioesterase encoded by, for example, tesA, tesA without leader sequence, tesB, fatB, fatB2, fatB3, fatA, or fatA1.

[0179] In certain embodiments, the microorganism is genetically engineered to express a modified level of gene encoding a fatty alcohol biosynthesis polypeptide. For example, the fatty alcohol biosynthesis polypeptide is an alcohol dehydrogenase. In particular embodiments, the fatty alcohol biosynthesis polypeptide comprises an amino acid sequence that has at least 70% sequence identity to an enzyme listed in Table 8.

Branched-Chain Alpha-Keto Acid Dehydrogenase Complex (BKD Complex) and Beta Ketoacyl-ACP Synthase

[0180] The methods described herein can be used to produce branched fatty alcohols and/or derivatives, for example, from alpha-keto acids. The oxidative decarboxylation step, which converts the alpha-keto acids to the corresponding branched-chain acyl-CoA, involves a branched-chain alpha-keto acid dehydrogenase complex (bkd; EC 1.2.4.4) (Denoya et al., J. Bacteriol. 177:3504 (1995)), which consists of E1 alpha/beta (decarboxylase), E2 (dihydrolipoyl transacylase), and E3 (dihydrolipoyl dehydrogenase) subunits. Any microorganism that possesses branched-chain fatty acids and/or grows on branched-chain amino acids can be used as a source to isolate bkd genes for expression in host cells, for example, E. coli. Furthermore, E. coli has the E3 component as part of its pyruvate dehydrogenase complex (lpd, EC 1.8.1.4, GenBank accession NP_414658). Thus, branched fatty alcohols and/or derivatives can be made by heterologously expressing only the E1 alpha/beta and E2 bkd genes. Furthermore, certain of the host cells, including E. coli, can produce branched products when only the E1 alpha/beta is expressed without co-expression of the E2 bkd gene.
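The subunit reasoning in the preceding paragraph can be sketched as a small lookup: given the BKD subunits a host already supplies, list the ones left to express heterologously. This is a minimal illustration, not a design tool; the function name, the subunit labels, and the include_e2 flag are assumptions introduced here, and the only host fact taken from the text is that E. coli supplies E3 (lpd).

```python
# Subunits of the branched-chain alpha-keto acid dehydrogenase complex, as named in the text.
BKD_SUBUNITS = ("E1 alpha", "E1 beta", "E2", "E3")

# From the text: E. coli has the E3 component (lpd) in its pyruvate dehydrogenase complex.
E_COLI_ENDOGENOUS = {"E3"}


def subunits_to_express(endogenous, include_e2=True):
    """Return the BKD subunits the host does not supply, in complex order.

    include_e2=False models the case described in the text where only
    E1 alpha/beta is expressed without co-expression of the E2 gene.
    """
    endogenous = set(endogenous)
    unknown = endogenous - set(BKD_SUBUNITS)
    if unknown:
        raise ValueError(f"unknown subunit(s): {sorted(unknown)}")
    needed = [s for s in BKD_SUBUNITS if s not in endogenous]
    if not include_e2:
        needed = [s for s in needed if s != "E2"]
    return needed
```

Calling subunits_to_express(E_COLI_ENDOGENOUS) yields E1 alpha, E1 beta, and E2, which matches the statement that only the E1 alpha/beta and E2 bkd genes need to be expressed in E. coli.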

[0181] On the other hand, microorganisms that endogenously express a suitable beta-ketoacyl ACP synthase can be engineered to express or overexpress at least the first (E1) subunit of a branched-chain alpha keto acid dehydrogenase complex, optionally also the second (E2) and/or the third (E3) subunits of that complex to produce the desirable branched fatty alcohols and/or derivatives thereof. The endogenous beta-ketoacyl ACP synthase can be overexpressed, or can be modified such that it is attenuated or deleted, and a heterologous beta-ketoacyl ACP synthase gene can be expressed in its place.

[0182] In a further embodiment, microorganisms that endogenously express at least the first (E1) subunit of a branched-chain alpha keto acid dehydrogenase complex, and optionally also the second (E2) and/or the third (E3) subunits of that complex, can be engineered to express or overexpress a beta-ketoacyl ACP synthase. For example, the endogenous genes encoding the subunits of the branched-chain alpha keto acid dehydrogenase complex can be overexpressed, or can be modified such that they are attenuated or deleted and a gene encoding one or more subunits of a heterologous branched-chain alpha keto acid dehydrogenase complex can be expressed in the host cell.

Substrates for Branched Fatty Alcohol Production

[0183] The branched fatty alcohols and/or derivatives, as well as the surfactant compositions comprising them, can be produced from, for example, branched fatty aldehydes, which themselves can be produced from an appropriate substrate. While not wishing to be bound by theory, it is believed that the branched fatty aldehyde biosynthetic polypeptides described herein produce branched fatty aldehydes from substrates via a reduction mechanism. In some instances, the substrate is a branched fatty acid derivative, and a fatty aldehyde having particular branching patterns and carbon chain length can be produced from a branched fatty acid derivative having those characteristics. The branched fatty aldehyde can then be converted into the desired branched fatty alcohol in a reaction catalyzed by a fatty alcohol biosynthesis polypeptide.

[0184] Alternatively, a suitable acyl-ACP reductase can be employed to convert a branched acyl-ACP into a fatty aldehyde, which can in turn be converted into a branched fatty alcohol in a reaction catalyzed by a fatty alcohol biosynthesis polypeptide.

[0185] Accordingly, each step within a biosynthetic pathway that leads to the production of a branched fatty acid derivative substrate can be modified to produce or overproduce the branched substrate of interest. For example, known genes involved in the fatty acid biosynthetic pathway or the fatty aldehyde biosynthesis pathway can be expressed, overexpressed, or attenuated in host cells to produce a desired substrate (see, e.g., International Publication WO 2008/119082, the disclosure of which is incorporated by reference).

Synthesis of Branched Fatty Alcohols and Substrates

[0186] Fatty acid synthase (FAS) is a group of polypeptides that catalyze the initiation and elongation of acyl chains (Marrakchi et al., Biochemical Society, 30: 1050-1055 (2002)). The acyl carrier protein (ACP) along with the enzymes in the FAS pathway control the length, degree of saturation, and branching of the fatty acid derivatives produced. The fatty acid biosynthetic pathway involves the precursors acetyl-CoA and malonyl-CoA. The steps in this pathway are catalyzed by enzymes of the fatty acid biosynthesis (fab) and acetyl-CoA carboxylase (acc) gene families (see, e.g., Heath et al., Prog. Lipid Res., 40(6): 467-97 (2001)).

[0187] Host cells can be engineered to express fatty acid derivative substrates by recombinantly expressing or overexpressing one or more fatty acid synthase genes, such as acetyl-CoA and/or malonyl-CoA synthase genes. For example, to increase acetyl-CoA production, one or more of the following genes can be expressed in a host cell: pdh (a multienzyme complex comprising aceEF (which encodes the E1p dehydrogenase component and the E2p dihydrolipoamide acyltransferase component of the pyruvate and 2-oxoglutarate dehydrogenase complexes) and lpd), panK, fabH, fabB, fabD, fabG, acpP, and fabF. Exemplary GenBank accession numbers for these genes are: pdh (BAB34380, AAC73227, AAC73226), panK (also known as coaA, AAC76952), aceEF (AAC73227, AAC73226), fabH (AAC74175), fabB (P0A953), fabD (AAC74176), fabG (AAC74177), acpP (AAC74178), and fabF (AAC74179). Additionally, the expression levels of fadE, gpsA, ldhA, pflB, adhE, pta, poxB, ackA, and/or ackB can be attenuated or knocked out in an engineered host cell by transformation with conditionally replicative or non-replicative plasmids containing null or deletion mutations of the corresponding genes or by substituting promoter or enhancer sequences. Exemplary GenBank accession numbers for these genes are: fadE (AAC73325), gpsA (AAC76632), ldhA (AAC74462), pflB (AAC73989), adhE (AAC74323), pta (AAC75357), poxB (AAC73958), ackA (AAC75356), and ackB (BAB81430). The resulting host cells will have increased acetyl-CoA production levels when grown in an appropriate environment.

[0188] Malonyl-CoA overexpression can be effected by introducing accABCD (e.g., accession number AAC73296, EC 6.4.1.2) into a host cell. Fatty acid production can be further increased by introducing into the host cell a DNA sequence encoding a lipase (e.g., accession numbers CAA89087, CAA98876).

[0189] In addition, inhibiting PlsB can lead to an increase in the levels of long chain acyl-ACP, which will inhibit early steps in the pathway (e.g., accABCD, fabH, and fabI). The plsB (e.g., accession number AAC77011) D311E mutation can be used to increase the amount of available fatty acids.

[0190] In addition, a host cell can be engineered to overexpress a sfa gene (suppressor of fabA, e.g., accession number AAN79592) to increase production of monounsaturated fatty acids (Rock et al., J. Bacteriology, 178: 5382-5387 (1996)).

[0191] The chain length of a fatty acid derivative substrate can be selected for by modifying the expression of selected thioesterases. Thioesterase influences the chain length of fatty acids produced. Hence, host cells can be engineered to express, overexpress, have attenuated expression, or not to express one or more selected thioesterases to increase the production of a preferred fatty acid derivative substrate. For example, C10 fatty acids can be produced by expressing a thioesterase that has a preference for producing C10 fatty acids and attenuating thioesterases that have a preference for producing fatty acids other than C10 fatty acids (e.g., a thioesterase which prefers to produce C14 fatty acids). This would result in a relatively homogeneous population of fatty acids that have a carbon chain length of 10. In other instances, C14 fatty acids can be produced by attenuating endogenous thioesterases that produce non-C14 fatty acids and expressing the thioesterases that have a preference for C14-ACP. In some situations, C12 fatty acids can be produced by expressing thioesterases that have a preference for C12-ACP and attenuating thioesterases that preferentially produce non-C12 fatty acids. Acetyl-CoA, malonyl-CoA, and fatty acid overproduction can be verified using methods known in the art, for example, by using radioactive precursors, HPLC, or GC-MS subsequent to cell lysis. Non-limiting examples of thioesterases that can be used in the methods described herein are listed in Table 5.

TABLE 5
Thioesterases

Accession Number      Source Organism           Gene
AAC73596              E. coli                   tesA without leader sequence
AAC73555              E. coli                   tesB
Q41635, AAA34215      Umbellularia californica  fatB
AAC49269              Cuphea hookeriana         fatB2
Q39513; AAC72881      Cuphea hookeriana         fatB3
Q39473, AAC49151      Cinnamomum camphora       fatB
CAA85388              Arabidopsis thaliana      fatB [M141T]*
NP_189147; NP_193041  Arabidopsis thaliana      fatA
CAC39106              Bradyrhizobium japonicum  fatA
AAC72883              Cuphea hookeriana         fatA
AAL79361              Helianthus annuus         fatA1

*Mayer et al., BMC Plant Biology, 7: 1-11 (2007)
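The chain-length selection strategy described in paragraph [0191] amounts to partitioning the available thioesterases by their preferred chain length: express those that match the target and attenuate the rest. The sketch below illustrates that partition only. The thioesterase names and chain-length preferences in the example are hypothetical placeholders; Table 5 does not state chain-length preferences, and none are implied here.

```python
def plan_thioesterases(preferences, target_length):
    """Split thioesterases into those to express and those to attenuate.

    preferences maps a thioesterase name to the acyl chain length it prefers.
    """
    express = sorted(name for name, length in preferences.items()
                     if length == target_length)
    attenuate = sorted(name for name, length in preferences.items()
                       if length != target_length)
    return {"express": express, "attenuate": attenuate}


# Hypothetical placeholder data, for illustration only.
EXAMPLE_PREFERENCES = {"te_A": 10, "te_B": 14, "te_C": 12}
```

With the placeholder data, a target length of 10 selects te_A for expression and te_B and te_C for attenuation, mirroring the C10 example in the text.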

[0192] In certain embodiments, a host cell, which is used to produce branched fatty alcohols and/or derivatives herein, can be engineered to express or overexpress one or more fatty aldehyde biosynthetic polypeptides. Alternatively, the host cell can be engineered to express an attenuated level of an endogenous fatty aldehyde biosynthetic polypeptide. In other instances, a fatty aldehyde biosynthetic polypeptide, a variant, or a fragment thereof is expressed in a host cell that contains a naturally occurring mutation that results in an increased level of branched fatty aldehyde substrate in the host cell or of branched fatty alcohol produced by the host cell. In some instances, a branched fatty aldehyde is produced by expressing a fatty aldehyde biosynthesis gene, for example, a carboxylic acid reductase gene, encoding a protein listed in Table 6, below, as well as a polynucleotide variant thereof. In some instances, the fatty aldehyde biosynthesis gene encodes one of the enzymes listed in Table 6 below.

TABLE 6
Fatty Aldehyde Biosynthesis Genes

Name/Organism                                  Accession No.
Nocardia sp. NRRL 5646                         >gi|40796035|gb|AAR91681.1|
Mycobacterium tuberculosis H37Rv               >gi|15609727|ref|NP_217106.1
Mycobacterium smegmatis str. MC2 155           >gi|118174788|gb|ABK75684.1|
Mycobacterium smegmatis str. MC2 155           >gi|118469671|ref|YP_889972.1|
FadD9                                          uniprot|A0PPD8|A0PPD8_MYCUA
Tsukamurella paurometabola DSM 20162           >gi|22798060|ref|ZP_04027864.1|
Cyanobium sp. PCC 7001                         >gi|254431429|ref|ZP_05045132.1|
Putative acyl-CoA dehydrogenase                >uniprot|A0QIB5|A0QIB5_MYCA1
NAD dependent epimerase/dehydratase            >uniprot|A0QWI7|A0QWI7_MYCS2
Mycobacterium intracellulare ATCC13950         >gi|254819907|ref|ZP_05224908.1|
Putative long-chain fatty-acid-CoA ligase      >uniprot|A0R484|A0R484_MYCS2
Mycobacterium kansasii ATCC 12478              >gi|240173202|ref|ZP_04751860.1|
Probable fatty-acid-CoA ligase fadD9           >uniprot|A1KLT8|A1KLT8_MYCBP
Mycobacterium intracellulare ATCC13950         >gi|254822803|ref|ZP_05227804.1|
Fatty-acid-CoA ligase fadD9                    >uniprot|A1QUM2|A1QUM2_MYCTF
Thioester reductase domain                     >uniprot|A1T887|A1T887_MYCVP
Thioester reductase domain                     >uniprot|A1UFA8|A1UFA8_MYCSK
Mycobacterium avium subsp. ATCC 25291          >gi|254775919|ref|ZP_05217435.1|
Thioester reductase domain                     >uniprot|A3PYW9|A3PYW9_MYCSJ
Mycobacterium leprae Br4923                    >gi|219932734|emb|CAR70557.1|
Putative acyl-CoA synthetase                   >uniprot|A5CM59|A5CM59_CLAM3
Thioester reductase domain                     >uniprot|A8M8D3|A8M8D3_SALAI
Probable fatty-acid-CoA ligase FadD            >uniprot|B1MCR9|B1MCR9_MYCAB
Probable fatty-acid-CoA ligase FadD            >uniprot|B1MCS0|B1MCS0_MYCAB
Putative fatty-acid-CoA ligase                 >uniprot|B1MDX4|B1MDX4_MYCAB
Probable fatty-acid-coa ligase FadD            >uniprot|B1MLD7|B1MLD7_MYCAB
Putative carboxylic acid reductase             >uniprot|B1VMZ4|B1VMZ4_STRGG
Fatty-acid-CoA ligase FadD9_1                  >uniprot|B2HE95|B2HE95_MYCMM
Fatty-acid-CoA ligase FadD9                    >uniprot|B2HN69|B2HN69_MYCMM
Putative Acyl-CoA synthetase                   >uniprot|O69484|O69484_MYCLE
Probable peptide synthetase nrp                >uniprot|Q10896|Q10896_MYCTU
Putative carboxylic acid reductase             >uniprot|Q5YY80|Q5YY80_NOCFA
ATP/NADPH-dependent carboxylic acid reductase  >uniprot|Q6RKB1|Q6RKB1_9NOCA
FadD9                                          >uniprot|Q741P9|Q741P9_MYCPA
Substrate--CoA ligase, putative                >uniprot|Q7D6X4|Q7D6X4_MYCTU
Probable fatty-acid-coa ligase fadd9           >uniprot|Q7TY99|Q7TY99_MYCBO
Putative acyl-CoA synthetase                   >uniprot|Q9CCT4|Q9CCT4_MYCLE
Putative uncharacterized protein               >uniprot|Q54JK0|Q54JK0_DICDI
Putative non-ribosomal peptide synthetase      >uniprot|Q2MFQ3|Q2MFQ3_STRRY
Mycobacterium tuberculosis EAS054              >gi|215431545|ref|ZP_03429464.1|
Mycobacterium tuberculosis GM 1503             >gi|218754327|ref|ZP_03533123.1|
Mycobacterium tuberculosis T85                 >gi|215446840|ref|ZP_03433592.1|
Mycobacterium tuberculosis T17                 >gi|219558593|ref|ZP_03537669.1|
Mycobacterium intracellulare ATCC13950         >gi|254819907|ref|ZP_05224908.1|
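The accession strings in Table 6 appear in two pipe-delimited shapes: a "gi" form (gi number, source database tag, accession) and a "uniprot" form (accession, entry name). As a convenience for readers who want to look these entries up, the following sketch splits either shape into labeled fields. The function name and the field labels in the returned dictionary are assumptions made for this illustration; only the two shapes visible in the table are handled.

```python
def parse_identifier(raw: str) -> dict:
    """Split a Table 6 style identifier into labeled fields.

    Handles '>gi|<number>|<db>|<accession>|' and '>uniprot|<accession>|<entry>'.
    A leading '>' and a trailing '|' are optional, as in the table.
    """
    fields = [f for f in raw.strip().lstrip(">").split("|") if f]
    if len(fields) == 4 and fields[0] == "gi":
        return {"source": fields[2], "gi": fields[1], "accession": fields[3]}
    if len(fields) == 3 and fields[0] == "uniprot":
        return {"source": "uniprot", "accession": fields[1], "entry_name": fields[2]}
    raise ValueError(f"unrecognized identifier: {raw!r}")
```

For instance, the first row's string ">gi|40796035|gb|AAR91681.1|" parses to the accession AAR91681.1 with source tag "gb".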

[0193] In certain embodiments, a host cell, which is used to produce branched fatty alcohols and/or derivatives herein, can be engineered to express or overexpress one or more acyl-ACP reductase polypeptides, variants, or fragments thereof to achieve an improved production of one or more desirable branched fatty alcohols or derivatives. Alternatively, a host cell can be engineered to express an attenuated level of an endogenous acyl-ACP reductase. Non-limiting examples of suitable acyl-ACP reductases are listed in Table 7 below:

TABLE 7
Acyl-ACP Reductase Polypeptides

Organism                                               Accession No.
Synechococcus elongatus PCC7942                        Synpcc7942_1594 (YP_400611)
Synechocystis sp.                                      sll0209 (NP_442146)
Cyanothece sp. ATCC51142                               cce_1430 (YP_001802846)
Prochlorococcus marinus subsp. pastoris str. CCMP1986  PMM0533 (NP_892651)
Gloeobacter violaceus PCC7421                          NP_96091 (gll3145)
Nostoc punctiforme PCC73102                            ZP_00108837 (Npun02004176)
Anabaena variabilis ATCC29413                          YP_323044 (Ava_2534)
Synechococcus elongatus PCC6301                        YP_170761 (syc0051_d)
Nostoc sp. PCC 7120                                    alr5284 (NP_489324)
Prochlorococcus marinus subsp. pastoris str. CCMP1986  PMM0533 (NP_892651)

[0194] In certain embodiments, a host cell, which is used to produce fatty alcohols and/or derivatives herein, can be further engineered to express or overexpress one or more fatty alcohol biosynthesis polypeptides, variants, or fragments thereof in order to achieve an improved production of one or more desirable branched fatty alcohols or derivatives. Alternatively, a host cell can be engineered to express an attenuated level of an endogenous fatty alcohol biosynthesis polypeptide. Non-limiting examples of suitable fatty alcohol biosynthesis polypeptides are listed in Table 8 below:

TABLE 8
Fatty Alcohol Biosynthesis/Alcohol Dehydrogenase Polypeptide

Name  GenBank Accession No.  Name  GenBank Accession No.  Name  GenBank Accession No.
ygjB  NP_418690              YggP  YP_026187              YciK  NP_415787
yahK  NP_414859              YiaY  YP_026233              YgfF  NP_417378
adhP  NP_415995              FucO  NP_417279              YghA  NP_417476
ydjL  NP_416290              EutG  NP_416948              YjgI  NP_418670
ydjJ  NP_416288              YqhD  NP_417484              YdfG  NP_416057
idnD  NP_418688              AdhE  NP_415757              YgcW  NP_417254
Tdh   NP_418073              dkgB  NP_414743              UcpA  NP_416921
yjjN  NP_418778              YdjG  NP_416285              EntA  NP_415128
rspB  NP_416097              YeaE  NP_416295              FolM  NP_416123
gatD  NP_416594              dkgA  NP_417485              HdhA  NP_416136
yphC  NP_417040              YajO  NP_414953              HcaB  NP_417036
yhdH  NP_417719              YghZ  NP_417474              SrlD  NP_417185
ycjQ  NP_415829              Tas   NP_417311              KduD  NP_417319
yncB  NP_415966              YdhF  YP_025305              IdnO  NP_418687
Qor   NP_418475              YdbC  NP_415924              FabG  NP_415611
frmA  NP_414890              ybbO  NP_415026              FabI  NP_415804
ybdR  NP_415141              yohF  NP_416641              YdjA  NP_416279

[0195] In some instances, a host cell, which can be used to produce branched fatty alcohols and/or derivatives herein, is genetically engineered to increase the level of branched fatty acids in the host cell relative to a corresponding wild-type host cell. For example, the host cell can be genetically engineered to express a reduced level of an acyl-CoA synthase relative to a wild-type host cell. In one embodiment, the level of expression of one or more genes (e.g., an acyl-CoA synthase gene) is reduced by genetically engineering a "knock out" host cell.

[0196] Any known acyl-CoA synthase gene can be reduced or knocked out in a host cell. Non-limiting examples of acyl-CoA synthase genes include fadD, fadK, BH3103, yhfL, Pfl-4354, EAV15023, fadD1, fadD2, RPC_4074, fadDD35, fadDD22, faa3p or the gene encoding the protein ZP_01644857. Specific examples of acyl-CoA synthase genes include fadDD35 from M. tuberculosis H37Rv [NP_217021], fadDD22 from M. tuberculosis H37Rv [NP_217464], fadD from E. coli [NP_416319], fadK from E. coli [YP_416216], fadD from Acinetobacter sp. ADP1 [YP_045024], fadD from Haemophilus influenzae Rd KW20 [NP_438551], fadD from Rhodopseudomonas palustris Bis B18 [YP_533919], BH3101 from Bacillus halodurans C-125 [NP_243969], Pfl-4354 from Pseudomonas fluorescens Pfo-1 [YP_350082], EAV15023 from Comamonas testosteroni KF-1 [ZP_01520072], yhfL from B. subtilis [NP_388908], fadD1 from P. aeruginosa PAO1 [NP_251989], fadD1 from Ralstonia solanacearum GMI1000 [NP_520978], fadD2 from P. aeruginosa PAO1 [NP_251990], the gene encoding the protein ZP_01644857 from Stenotrophomonas maltophilia R551-3, faa3p from Saccharomyces cerevisiae [NP_012257], faa1p from Saccharomyces cerevisiae [NP_014962], lcfA from Bacillus subtilis [CAA99571], or those described in Shockey et al., Plant Physiol., 129: 1710-1722 (2002); Caviglia et al., J. Biol. Chem., 279: 1163-1169 (2004); Knoll et al., J. Biol. Chem., 269(23): 16348-56 (1994); Johnson et al., J. Biol. Chem., 269: 18037-18046 (1994); and Black et al., J. Biol. Chem. 267: 25513-25520 (1992).

Production of Branched Precursors

[0197] Branched fatty alcohols and derivatives can be produced from branched fatty aldehydes containing one or more branch points, using branched acyl-ACPs as substrates for a fatty aldehyde biosynthesis polypeptide or an acyl-ACP reductase polypeptide as described herein. The first step in forming branched fatty alcohol precursors is the production of the corresponding alpha-keto acids by a branched-chain amino acid aminotransferase. Host cells may endogenously include genes encoding such enzymes or such genes can be recombinantly introduced. E. coli, for example, endogenously expresses such an enzyme, IlvE (EC 2.6.1.42; GenBank accession YP_026247). In host cells where no branched-chain amino acid aminotransferase is expressed, an E. coli IlvE or any other branched-chain amino acid aminotransferase (e.g., IlvE from Lactococcus lactis (GenBank accession AAF34406), IlvE from Pseudomonas putida (GenBank accession NP_745648), or IlvE from Streptomyces coelicolor (GenBank accession NP_629657)) can be introduced.

[0198] In another embodiment, the production of alpha-keto acids can be achieved using the methods described in Park et al., PNAS, 104:7797-7802 (2007) and Atsumi et al., Nature, 451: 86-89 (2008). For example, 2-ketoisovalerate can be produced by overexpressing the genes encoding IlvI, IlvH, IlvH mutant, IlvB, IlvN, IlvGM, IlvC, or IlvD. Alternatively, 2-keto-3-methyl-valerate can be produced by overexpressing the genes encoding IlvA and IlvI, IlvH (or AlsS of Bacillus subtilis), IlvC, IlvD, or their homologs. 2-keto-4-methyl-pentanoate can also be produced by overexpressing the genes encoding IlvI, IlvH, IlvC, IlvD and LeuA, LeuB, LeuC, LeuD, or their homologs.
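The gene lists in the preceding paragraph can be collected into a simple lookup keyed by the target alpha-keto acid. The sketch below only restates the names given in the text; the variable and function names are hypothetical, homologs are not enumerated, and the AlsS alternative to IlvI/IlvH mentioned for 2-keto-3-methyl-valerate is noted in a comment rather than modeled.

```python
# Candidate genes to overexpress for each alpha-keto acid, as named in the text.
KETO_ACID_GENES = {
    "2-ketoisovalerate": ["IlvI", "IlvH", "IlvH mutant", "IlvB", "IlvN",
                          "IlvGM", "IlvC", "IlvD"],
    # The text also allows AlsS of Bacillus subtilis in place of IlvI/IlvH here.
    "2-keto-3-methyl-valerate": ["IlvA", "IlvI", "IlvH", "IlvC", "IlvD"],
    "2-keto-4-methyl-pentanoate": ["IlvI", "IlvH", "IlvC", "IlvD",
                                   "LeuA", "LeuB", "LeuC", "LeuD"],
}


def candidate_genes(keto_acid: str) -> list:
    """Return a copy of the gene list named in the text for one alpha-keto acid."""
    if keto_acid not in KETO_ACID_GENES:
        raise ValueError(f"no gene list recorded for {keto_acid!r}")
    return list(KETO_ACID_GENES[keto_acid])


def shared_genes() -> list:
    """Genes that appear in the list for every alpha-keto acid above."""
    gene_sets = [set(genes) for genes in KETO_ACID_GENES.values()]
    return sorted(set.intersection(*gene_sets))
```

Under this restatement, IlvC, IlvD, IlvH, and IlvI are common to all three lists, while IlvA and the Leu genes distinguish the 2-keto-3-methyl-valerate and 2-keto-4-methyl-pentanoate routes respectively.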

[0199] In another example, isobutyryl-CoA can be made in a host cell, for example in E. coli, through the coexpression of a crotonyl-CoA reductase (Ccr, EC 1.6.5.5, 1.1.1.1) and isobutyryl-CoA mutase (large subunit IcmA, EC 5.4.99.2; small subunit IcmB, EC 5.4.99.2) (Han and Reynolds, J. Bacteriol., 179: 5157 (1997)). Crotonyl-CoA is an intermediate in fatty acid biosynthesis in E. coli and other microorganisms. Non-limiting examples of ccr and icm genes from selected microorganisms are listed in Table 9.

TABLE 9
ccr and icm Genes from Selected Microorganisms

Organism                    Gene  GenBank Accession #
Streptomyces coelicolor     Ccr   NP_630556
                            icmA  NP_629554
                            icmB  NP_630904
Streptomyces cinnamonensis  Ccr   AAD53915
                            icmA  AAC08713
                            icmB  AJ246005

Formation of Branched Cyclic Fatty Alcohols and Derivatives

[0200] Branched cyclic fatty alcohols can be produced from suitable alpha keto acids using branched cyclic fatty acid derivatives such as a branched cyclic acyl-ACP as substrates. To produce branched cyclic fatty acid derivative substrates, genes that provide cyclic precursors (e.g., the ans, chc, and plm gene families) can be introduced into a host cell and expressed to allow initiation of fatty acid biosynthesis from branched cyclic precursors. For example, to convert a host cell, such as E. coli, into one capable of synthesizing ω-cyclic fatty acids (cyFA), a gene that provides the cyclic precursor cyclohexylcarbonyl-CoA (CHC-CoA) (Cropp et al., Nature Biotech., 18: 980-983 (2000)) can be introduced and expressed in the host cell. Non-limiting examples of genes that provide CHC-CoA in E. coli include: ansJ, ansK, ansL, chcA, and ansM from the ansatrienin gene cluster of Streptomyces collinus (Chen et al., Eur. J. Biochem., 261: 98-107 (1999)) or plmJ, plmK, plmL, chcA, and plmM from the phoslactomycin B gene cluster of Streptomyces sp. HK803 (Palaniappan et al., J. Biol. Chem., 278: 35552-35557 (2003)) together with the chcB gene (Patton et al., Biochem., 39: 7595-7604 (2000)) from S. collinus, S. avermitilis, or S. coelicolor (see Table 10). The genes listed in Table 10 can then be expressed to allow initiation and elongation of ω-cyclic fatty acids. Alternatively, the homologous genes can be isolated from microorganisms that make cyFA and expressed in a host cell (e.g., E. coli).

TABLE-US-00010 TABLE 10 Genes for the Synthesis of CHC-CoA

Organism                     Gene         GenBank Accession No.
Streptomyces collinus        ansJK        U72144*
Streptomyces collinus        ansL         AF268489
Streptomyces collinus        chcA
Streptomyces collinus        ansM
Streptomyces collinus        chcB
Streptomyces sp. HK803       plmJK        AAQ84158
Streptomyces sp. HK803       plmL         AAQ84159
Streptomyces sp. HK803       chcA         AAQ84160
Streptomyces sp. HK803       plmM         AAQ84161
Streptomyces coelicolor      chcB/caiD    NP_629292
Streptomyces avermitilis     chcB/caiD    NP_629292

*Only chcA is annotated in GenBank entry U72144; ansJKLM are according to Chen et al. (Eur. J. Biochem., 261: 98-107 (1999)).

[0201] Genes fabH, acp, and fabF allow initiation and elongation of ω-cyclic fatty acids because they have broad substrate specificity. If the coexpression of any of these genes with the genes listed in Table 10 does not yield cyFA, then fabH, acp, and/or fabF homologs from microorganisms that make cyFAs (e.g., those listed in Table 11) can be isolated (e.g., by using degenerate PCR primers or heterologous DNA sequence probes) and coexpressed.

TABLE-US-00011 TABLE 11 Non-Limiting Examples of Microorganisms Containing ω-cyclic Fatty Acids

Organism                               Reference
Curtobacterium pusillum                ATCC19096
Alicyclobacillus acidoterrestris       ATCC49025
Alicyclobacillus acidocaldarius        ATCC27009
Alicyclobacillus cycloheptanicus*      Moore, J. Org. Chem., 62: 2173 (1997)

*Uses cycloheptylcarbonyl-CoA and not cyclohexylcarbonyl-CoA as precursor for cyFA biosynthesis.

Branched Fatty Alcohol Saturation Levels

[0202] The degree of saturation in branched fatty acid derivative substrates, such as, for example, a branched acyl-ACP (which can then be converted into branched fatty aldehydes and then branched fatty alcohols as described herein), can be controlled by regulating the degree of saturation of fatty acid intermediates. For example, the sfa, gns, and fab families of genes can be expressed or overexpressed to control the saturation of a branched acyl-ACP. In certain embodiments, the host cells can be engineered to reduce the expression of an sfa, gns, or fab gene and control the level of saturated substrates vs. unsaturated substrates, which in turn affects the production level of saturated branched fatty alcohols or derivatives vs. unsaturated branched fatty alcohols or derivatives.

[0203] In some instances, a host cell can be engineered to express an attenuated level of a dehydratase/isomerase and/or a ketoacyl-ACP synthase. For example, a host cell can be engineered to express a decreased level of fabA and/or fabB. In some instances, the host cell can be cultured or grown in the presence of unsaturated fatty acids. In some instances, the host cell can be engineered to express or overexpress a gene encoding a desaturase enzyme. One non-limiting example of a desaturase is B. subtilis DesA (AF037430). Other genes encoding desaturases are known in the art and can be introduced or used in the host cell and methods described herein, such as desaturases that use acyl-ACPs, including, for example, hexadecanoyl-ACP or octadecanoyl-ACP.

[0204] In some embodiments, host cells can be engineered to produce unsaturated fatty acids by engineering the production host to overexpress fabB or by growing the production host at low temperatures (e.g., less than 37 °C). FabB has a preference for cis-δ3-decenoyl-ACP and results in unsaturated fatty acid production in E. coli. Overexpression of fabB results in the production of a significant percentage of unsaturated fatty acids (de Mendoza et al., J. Biol. Chem., 258: 2098-2101 (1983)). The gene fabB may be inserted into and expressed in host cells not naturally having the gene. These unsaturated fatty acids can then be used as intermediates in host cells that are engineered to produce branched and unsaturated fatty acid derivative substrates, such as branched and unsaturated fatty aldehydes, which can in turn be converted into branched and unsaturated fatty alcohols and derivatives.

[0205] In other instances, a repressor of fatty acid biosynthesis, for example, fabR (GenBank accession NP_418398), can be deleted, which will also result in increased unsaturated fatty acid production in E. coli (Zhang et al., J. Biol. Chem., 277: 15558 (2002)). Similar deletions may be made in other host cells. A further increase in unsaturated fatty acids may be achieved, for example, by overexpressing fabM (trans-2, cis-3-decenoyl-ACP isomerase, GenBank accession DAA05501) and controlled expression of fabK (trans-2-enoyl-ACP reductase II, GenBank accession NP_357969) from Streptococcus pneumoniae (Marrakchi et al., J. Biol. Chem., 277: 44809 (2002)), while deleting E. coli fabI (trans-2-enoyl-ACP reductase, GenBank accession NP_415804). In some examples, the endogenous fabF gene can be attenuated, thus increasing the percentage of palmitoleate (C16:1) produced.

Production of Genetic Variants

[0206] Variants can be naturally occurring or created in vitro. In particular, such variants can be created using genetic engineering techniques, such as site directed mutagenesis, random chemical mutagenesis, Exonuclease III deletion procedures, or standard cloning techniques. Alternatively, such variants, fragments, analogs, or derivatives can be created using chemical synthesis or modification procedures.

[0207] Methods of making variants are well known in the art. These include procedures in which nucleic acid sequences obtained from natural isolates are modified to generate nucleic acids that encode polypeptides having characteristics that enhance their value in industrial or laboratory applications. In such procedures, a large number of variant sequences having one or more nucleotide differences with respect to the sequence obtained from the natural isolate are generated and characterized. Typically, these nucleotide differences result in amino acid changes with respect to the polypeptides encoded by the nucleic acids from the natural isolates.

[0208] For example, variants can be created using error prone PCR (see, e.g., Leung et al., Technique, 1: 11-15 (1989); and Caldwell et al., PCR Methods Applic., 2: 28-33 (1992)). In error prone PCR, PCR is performed under conditions where the copying fidelity of the DNA polymerase is low, such that a high rate of point mutations is obtained along the entire length of the PCR product. Briefly, in such procedures, nucleic acids to be mutagenized (e.g., a fatty aldehyde biosynthetic polynucleotide sequence) are mixed with PCR primers, reaction buffer, MgCl2, MnCl2, Taq polymerase, and an appropriate concentration of dNTPs for achieving a high rate of point mutation along the entire length of the PCR product. For example, the reaction can be performed using 20 fmoles of nucleic acid to be mutagenized (e.g., a fatty aldehyde biosynthetic polynucleotide sequence), 30 pmoles of each PCR primer, a reaction buffer comprising 50 mM KCl, 10 mM Tris HCl (pH 8.3), and 0.01% gelatin, 7 mM MgCl2, 0.5 mM MnCl2, 5 units of Taq polymerase, 0.2 mM dGTP, 0.2 mM dATP, 1 mM dCTP, and 1 mM dTTP. PCR can be performed for 30 cycles of 94 °C for 1 min, 45 °C for 1 min, and 72 °C for 1 min. However, it will be appreciated that these parameters can be varied as appropriate. The mutagenized nucleic acids are then cloned into an appropriate vector and the activities of the polypeptides encoded by the mutagenized nucleic acids are evaluated.
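As a rough illustration of the example reaction in paragraph [0208], the following Python sketch converts the stated final concentrations into pipetting volumes and totals the hold time of the cycling program. The 100 microliter reaction volume, the stock concentrations, and all function and variable names are assumptions made for illustration only; none of them are given in this specification.

```python
# Final concentrations (mM) taken from the example reaction in the text.
FINAL_MM = {
    "MgCl2": 7.0, "MnCl2": 0.5,
    "dGTP": 0.2, "dATP": 0.2, "dCTP": 1.0, "dTTP": 1.0,
}

# Hypothetical stock concentrations (mM); assumed, not from the specification.
STOCK_MM = {
    "MgCl2": 100.0, "MnCl2": 10.0,
    "dGTP": 10.0, "dATP": 10.0, "dCTP": 10.0, "dTTP": 10.0,
}


def pipetting_volumes(reaction_ul=100.0):
    """Return microliters of each stock needed, using C1*V1 = C2*V2.

    reaction_ul is an assumed total reaction volume.
    """
    return {
        name: FINAL_MM[name] * reaction_ul / STOCK_MM[name]
        for name in FINAL_MM
    }


def cycling_minutes(cycles=30, step_minutes=(1, 1, 1)):
    """Total hold time for the cycling program (ramp times are ignored)."""
    return cycles * sum(step_minutes)


if __name__ == "__main__":
    for name, volume in pipetting_volumes().items():
        print(f"{name}: {volume:.1f} uL")
    print(f"Cycling hold time: {cycling_minutes()} min")
```

The unequal dNTP concentrations (0.2 mM dGTP and dATP versus 1 mM dCTP and dTTP) are carried through unchanged from the text; the sketch only performs the dilution arithmetic.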

[0209] Variants can also be created using oligonucleotide directed mutagenesis to generate site-specific mutations in any cloned DNA of interest. Oligonucleotide mutagenesis is described in, for example, Reidhaar-Olson et al., Science, 241: 53-57 (1988).

[0210] Variants can also be generated by assembly PCR, which involves the assembly of a PCR product from a mixture of small DNA fragments. A large number of different PCR reactions occur in parallel in the same vial, with the products of one reaction priming the products of another reaction. Assembly PCR is described in, e.g., U.S. Pat. No. 5,965,408.

[0211] Still another method of generating variants is sexual PCR mutagenesis, wherein forced homologous recombination occurs between DNA molecules of different, but highly related, DNA sequence in vitro as a result of random fragmentation of the DNA molecule based on sequence homology. This is followed by fixation of the crossover by primer extension in a PCR reaction. Sexual PCR mutagenesis is described in, for example, Stemmer, Proc. Natl. Acad. Sci. USA, 91: 10747-10751 (1994).

[0212] Variants can also be created by in vivo mutagenesis. In some embodiments, random mutations in a nucleic acid sequence are generated by propagating the sequence in a bacterial strain, such as an E. coli strain, which carries mutations in one or more of the DNA repair pathways. Such "mutator" strains have a higher random mutation rate than that of a wild-type strain. Propagating a DNA sequence (e.g., a BKD polynucleotide sequence, a beta acyl-ACP synthase polynucleotide sequence, a fatty aldehyde biosynthesis polynucleotide sequence, or a fatty alcohol biosynthesis polynucleotide sequence) in one of these strains will eventually generate random mutations within the DNA. Mutator strains suitable for use for in vivo mutagenesis are described in, for example, International Publication WO 91/016427.

[0213] Variants can also be generated using cassette mutagenesis. In cassette mutagenesis, a small region of a double stranded DNA molecule is replaced with a synthetic oligonucleotide "cassette" that differs from the native sequence. The oligonucleotide often contains a completely and/or partially randomized native sequence.

[0214] Recursive ensemble mutagenesis can also be used to generate variants. Recursive ensemble mutagenesis is an algorithm for protein engineering (i.e., protein mutagenesis) developed to produce diverse populations of phenotypically related mutants whose members differ in amino acid sequence. This method uses a feedback mechanism to control successive rounds of combinatorial cassette mutagenesis. Recursive ensemble mutagenesis is described in, for example, Arkin et al., Proc. Natl. Acad. Sci. USA, 89: 7811-7815 (1992).

[0215] In some embodiments, variants are created using exponential ensemble mutagenesis. Exponential ensemble mutagenesis is a process for generating combinatorial libraries with a high percentage of unique and functional mutants, wherein small groups of residues are randomized in parallel to identify, at each altered position, amino acids which lead to functional proteins. Exponential ensemble mutagenesis is described in, for example, Delegrave et al., Biotech. Res., 11: 1548-1552 (1993). Random and site-directed mutagenesis are described in, for example, Arnold, Curr. Opin. Biotech., 4: 450-455 (1993).

[0216] In some embodiments, variants are created using shuffling procedures wherein portions of a plurality of nucleic acids that encode distinct polypeptides are fused together to create chimeric nucleic acid sequences that encode chimeric polypeptides as described in, for example, U.S. Pat. Nos. 5,965,408 and 5,939,250.

[0217] Polynucleotide variants also include nucleic acid analogs. Nucleic acid analogs can be modified at the base moiety, sugar moiety, or phosphate backbone to improve, for example, stability, hybridization, or solubility of the nucleic acid. Modifications at the base moiety include deoxyuridine for deoxythymidine and 5-methyl-2'-deoxycytidine or 5-bromo-2'-deoxycytidine for deoxycytidine. Modifications of the sugar moiety include modification of the 2' hydroxyl of the ribose sugar to form 2'-halo, 2'-O-methyl or 2'-O-allyl sugars. The deoxyribose phosphate backbone can be modified to produce morpholino nucleic acids, in which each base moiety is linked to a six-membered, morpholino ring, or peptide nucleic acids, in which the deoxyphosphate backbone is replaced by a pseudopeptide backbone and the four bases are retained. (See, e.g., Summerton et al., Antisense Nucleic Acid Drug Dev., 7: 187-195 (1997); and Hyrup et al., Bioorgan. Med. Chem., 4: 5-23 (1996)). In addition, the deoxyphosphate backbone can be replaced with, for example, a phosphorothioate or phosphorodithioate backbone, a phosphoroamidite, or an alkyl phosphotriester backbone.

Production of Polypeptide Variants

[0218] Conservative substitutions are those that substitute an amino acid in a polypeptide by another amino acid of similar characteristics. Common conservative substitutions include, without limitation: replacing an aliphatic amino acid, such as alanine, valine, leucine, and isoleucine, with another aliphatic amino acid; replacing a serine with a threonine or vice versa; replacing an acidic residue, such as aspartic acid and glutamic acid, with another acidic residue; replacing a residue bearing an amide group, such as asparagine and glutamine, with another residue bearing an amide group; replacing a basic residue, such as lysine and arginine, with another basic residue; and replacing an aromatic residue, such as phenylalanine and tyrosine, with another aromatic residue.
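The substitution classes listed in paragraph [0218] can be sketched in Python as a simple lookup. This is a minimal illustration only: the group labels (for example "hydroxyl" for serine and threonine), the function names, and the one-letter encoding are assumptions, and the list in the text is expressly non-limiting.

```python
# Residue groups from the paragraph above, in one-letter amino acid codes.
# The group labels are illustrative names, not terms defined by the text.
CONSERVATIVE_GROUPS = {
    "aliphatic": set("AVLI"),  # alanine, valine, leucine, isoleucine
    "hydroxyl": set("ST"),     # serine, threonine
    "acidic": set("DE"),       # aspartic acid, glutamic acid
    "amide": set("NQ"),        # asparagine, glutamine
    "basic": set("KR"),        # lysine, arginine
    "aromatic": set("FY"),     # phenylalanine, tyrosine
}


def is_conservative(original, replacement):
    """True if both residues differ and fall in the same listed group."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS.values()
    )


def classify_substitutions(parent, variant):
    """List (position, parent residue, variant residue, conservative?)
    for every differing position of two equal-length sequences."""
    if len(parent) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, p, v, is_conservative(p, v))
        for i, (p, v) in enumerate(zip(parent, variant))
        if p != v
    ]
```

For instance, comparing the hypothetical sequences "MKLSD" and "MRLAD" flags the lysine-to-arginine change at position 2 as conservative and the serine-to-alanine change at position 4 as non-conservative under these groups.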

[0219] Other polypeptide variants are those in which one or more amino acid residues include a substituent group. Still other polypeptide variants are those in which the polypeptide is associated with another compound, such as a compound to increase the half-life of the polypeptide (e.g., polyethylene glycol).

[0220] Additional polypeptide variants are those in which additional amino acids are fused to the polypeptide, such as a leader sequence, a secretory sequence, a proprotein sequence, or a sequence which facilitates purification, enrichment, or stabilization of the polypeptide.

[0221] In some instances, the polypeptide variants described herein retain the same biological function as a polypeptide from which they are derived (e.g., retain branched-chain alpha keto acid dehydrogenase activity, retain beta ketoacyl-ACP synthase activity, such as FabH activity, or retain fatty aldehyde biosynthetic activity, such as carboxylic acid or fatty acid reductase activity) and have amino acid sequences substantially identical thereto.

[0222] In other instances, the polypeptide variants have at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, or more than about 95% homology to an amino acid sequence from which they are derived. In another embodiment, the polypeptide variants include a fragment comprising at least about 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof.
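The thresholds in paragraph [0222] can be checked with a short Python sketch such as the one below. It is a simplified stand-in, not one of the comparison programs referred to in this specification: it assumes the two sequences have already been aligned (with "-" marking gaps), it reports plain percent identity rather than a scored homology, and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free aligned columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def longest_shared_run(parent, candidate):
    """Length of the longest stretch of consecutive residues present in
    both sequences (longest common substring, dynamic programming)."""
    best = 0
    previous = [0] * (len(candidate) + 1)
    for p in parent:
        current = [0] * (len(candidate) + 1)
        for j, c in enumerate(candidate, start=1):
            if p == c:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best
```

A candidate could then be screened with, for example, `percent_identity(a, b) >= 50.0` or `longest_shared_run(parent, fragment) >= 5`, matching the lowest thresholds recited above.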

[0223] The polypeptide variants or fragments thereof can be obtained by isolating nucleic acids encoding them using techniques described herein or by expressing synthetic nucleic acids encoding them. Alternatively, polypeptide variants or fragments thereof can be obtained through biochemical enrichment or purification procedures. The sequence of polypeptide variants or fragments can be determined by proteolytic digestion, gel electrophoresis, and/or microsequencing. The sequence of the polypeptide variants or fragments can then be compared to the amino acid sequence from which it is derived using any of the programs described herein.

[0224] The polypeptide variants and fragments thereof can be assayed for enzymatic activity. For example, the polypeptide variants or fragments can be contacted with a substrate under conditions that allow the polypeptide variants or fragments to function. A decrease in the level of the substrate or an increase in the level of the desired product can be measured to determine its activity.

Modifications to Increase Conversion of Branched Substrates to Branched Fatty Alcohol

[0225] Host cells can be engineered using known polypeptides to produce branched fatty alcohols from a branched substrate, including, for example, a branched fatty acid, a branched fatty acid derivative, a branched acyl-CoA, or a branched acyl-CoA derivative substrate. For example, one method of making branched fatty alcohols involves increasing the expression of, or expressing more active forms of, fatty alcohol forming acyl-CoA reductases (encoded by a gene such as acr1 from FAR, EC 1.2.1.50/1.1.1) or acyl-CoA reductases (EC 1.2.1.50) and alcohol dehydrogenase (EC 1.1.1.1).

[0226] The host cell can also be, for example, modified or engineered, such that it expresses or overexpresses at least one (E1) subunit of a branched-chain alpha keto acid dehydrogenase complex, and a beta ketoacyl-ACP synthase. The host cell can be further engineered such that it expresses or overexpresses a fatty aldehyde biosynthesis polypeptide and/or a fatty alcohol biosynthesis polypeptide. Alternatively, the host cell can be engineered such that it expresses or overexpresses an acyl-ACP reductase polypeptide and a fatty alcohol biosynthesis polypeptide.

[0227] In certain embodiments, the gene encoding the subunits of branched-chain alpha keto acid dehydrogenase complex can be derived from a bacterium, a plant, an insect, a yeast, a fungus, or a mammal. For example, the subunits of the branched-chain alpha keto acid dehydrogenase complex can be derived from a bacterium that uses branched amino acids as a carbon source, including, for example, Pseudomonas putida or Bacillus subtilis. In another example, the branched-chain alpha-keto acid dehydrogenase complex polypeptide can be from a bacterium that comprises branched fatty acids in its phospholipids, including, e.g., a Legionella, Stenotrophomonas, Alteromonas, Flavobacterium, Myxococcus, Bacteroides, Micrococcus, Staphylococcus, Bacillus, Clostridium, Listeria, Lactococcus, or Streptomyces. In some embodiments, the bacterium is a Legionella pneumophila, Stenotrophomonas maltophilia, Alteromonas macleodii, Flavobacterium psychrophilum, Myxococcus xanthus, Bacteroides thetaiotaomicron, Micrococcus luteus, Staphylococcus aureus, Clostridium thermocellum, Listeria monocytogenes, Streptomyces lividans, Streptomyces coelicolor, Streptomyces glaucescens, Streptococcus pneumoniae, Streptomyces peucetius, Streptococcus pyogenes, Escherichia coli, Escherichia coli K-12, Lactococcus lactis ssp. lactis, Mycobacterium tuberculosis, Enterococcus tuberculosis, Bacillus subtilis, or Lactobacillus plantarum. In some embodiments, suitable fatty aldehyde biosynthesis polypeptides, fatty alcohol biosynthesis polypeptides, acyl-ACP reductases, and other polypeptides of the invention can be from a mycobacterium selected from Mycobacterium smegmatis, Mycobacterium abscessus, Mycobacterium avium, Mycobacterium bovis, Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum, or Mycobacterium ulcerans. In other embodiments, the bacterium is Nocardia sp. NRRL 5646, Nocardia farcinica, Streptomyces griseus, Salinispora arenicola, or Clavibacter michiganensis. In yet further embodiments, the polypeptide of the invention is derived from a cyanobacterium, including, for example, Synechococcus elongatus PCC7942, Synechocystis sp. PCC6803, Cyanothece sp. ATCC51142, Prochlorococcus marinus subsp. pastoris str. CCMP1986 PMM0533, Gloeobacter violaceus PCC7421, Nostoc punctiforme PCC73102, Anabaena variabilis ATCC29413, Synechococcus elongatus PCC6301, Nostoc sp. PCC 7120, Microcoleus chthonoplastes PCC7420, Arthrospira maxima CS-328, Lyngbya sp. PCC8106, Nodularia spumigena CCY9414, Trichodesmium erythraeum IMS101, Microcystis aeruginosa, Nostoc azollae, Anabaena variabilis, Crocosphaera watsonii, Thermosynechococcus elongatus, Gloeobacter violaceus, Cyanobium, or Prochlorococcus marinus.

Genetic Engineering of Host Cells to Produce Branched Fatty Alcohols

[0228] Various host cells can be used to produce branched fatty alcohols, as described herein. A host cell can be any prokaryotic or eukaryotic cell. For example, host cells can be bacterial cells (such as E. coli), insect cells, yeast cells, or mammalian cells (such as Chinese hamster ovary (CHO) cells, COS cells, VERO cells, BHK cells, HeLa cells, Cv1 cells, MDCK cells, 293 cells, 3T3 cells, or PC12 cells). Other exemplary host cells include cells from the members of the genus Escherichia, Bacillus, Lactobacillus, Rhodococcus, Pseudomonas, Aspergillus, Trichoderma, Neurospora, Fusarium, Humicola, Rhizomucor, Kluyveromyces, Pichia, Mucor, Myceliophthora, Penicillium, Phanerochaete, Pleurotus, Trametes, Chrysosporium, Saccharomyces, Schizosaccharomyces, Yarrowia, or Streptomyces. Yet other exemplary host cells can be a Bacillus lentus cell, a Bacillus brevis cell, a Bacillus stearothermophilus cell, a Bacillus licheniformis cell, a Bacillus alkalophilus cell, a Bacillus coagulans cell, a Bacillus circulans cell, a Bacillus pumilus cell, a Bacillus thuringiensis cell, a Bacillus clausii cell, a Bacillus megaterium cell, a Bacillus subtilis cell, a Bacillus amyloliquefaciens cell, a Trichoderma koningii cell, a Trichoderma viride cell, a Trichoderma reesei cell, a Trichoderma longibrachiatum cell, an Aspergillus awamori cell, an Aspergillus fumigatus cell, an Aspergillus foetidus cell, an Aspergillus nidulans cell, an Aspergillus niger cell, an Aspergillus oryzae cell, a Humicola insolens cell, a Humicola lanuginosa cell, a Rhizomucor miehei cell, a Mucor miehei cell, a Streptomyces lividans cell, a Streptomyces murinus cell, or an Actinomycetes cell. Host cells can also be cyanobacterial cells such as, for example, Synechococcus sp., Synechococcus elongatus, or Synechocystis sp. cells.

[0229] In a preferred embodiment, the host cell is an E. coli cell, a Saccharomyces cerevisiae cell, or a Bacillus subtilis cell. For example, the host cell can be one from E. coli strain B, C, K, or W. Other suitable host cells are known to those skilled in the art.

[0230] Various methods well known in the art can be used to genetically engineer host cells to produce branched fatty alcohols. The methods can include the use of vectors, preferably expression vectors, containing a nucleic acid encoding the first (E1 alpha/beta) subunit of a branched-chain alpha keto acid dehydrogenase, and optionally also the second (E2) and/or the third (E3) subunit of that enzyme, and/or a beta ketoacyl-ACP synthase, and/or a fatty aldehyde biosynthetic polypeptide, and/or an alcohol dehydrogenase, and/or an acyl-ACP reductase described herein, or a polypeptide variant or a fragment thereof. Those skilled in the art will appreciate that a variety of viral vectors (for example, retroviral vectors, lentiviral vectors, adenoviral vectors, and adeno-associated viral vectors) and non-viral vectors can be used in the methods described herein.

[0231] The recombinant expression vectors can include polynucleotides described herein in a form suitable for expression in a host cell. The recombinant expression vectors can include one or more control sequences, selected on the basis of the host cell to be used for expression. The control sequence is operably linked to the nucleic acid sequence to be expressed. Such control sequences are described, for example, in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Control sequences include those that direct constitutive expression of a nucleotide sequence in many types of host cells and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression of protein desired, etc. The expression vectors described herein can be introduced into host cells to produce polypeptides, including fusion polypeptides, encoded by the nucleic acids as described herein.

[0232] In some embodiments, recombinant expression vectors can be designed for expression of a gene encoding a first (E1 alpha/beta) subunit, and optionally a second (E2) and/or a third (E3) subunit of a branched-chain alpha-keto acid dehydrogenase (or variant) and/or a gene encoding a beta-ketoacyl ACP synthase (or variant), and/or a gene encoding a fatty aldehyde biosynthesis polypeptide (or variant), and/or a gene encoding an alcohol dehydrogenase (or variant), and/or a gene encoding an acyl-ACP reductase (or variant) in a suitable host cell. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for example, by using T7 promoter regulatory sequences and T7 polymerase.

[0233] Expression of genes encoding polypeptides in prokaryotes, for example, E. coli, is most often carried out with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion polypeptides. Fusion vectors add a number of amino acids to a polypeptide encoded therein, usually to the amino terminus of the recombinant polypeptide. Such fusion vectors typically serve three purposes: (1) to increase expression of the recombinant polypeptide; (2) to increase the solubility of the recombinant polypeptide; and (3) to aid in the purification of the recombinant polypeptide by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant polypeptide. This enables separation of the recombinant polypeptide from the fusion moiety after purification of the fusion polypeptide. Examples of enzymes that cleave at such sites, and that have cognate recognition sequences, include Factor Xa, thrombin, and enterokinase. Exemplary fusion expression vectors include pGEX (Pharmacia Biotech Inc.; Smith et al., Gene, 67: 31-40 (1988)), pMAL (New England Biolabs, Beverly, Mass.), and pRIT5 (Pharmacia, Piscataway, N.J.), which fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant polypeptide.

[0234] Examples of inducible, non-fusion E. coli expression vectors include pTrc (Amann et al., Gene, 69: 301-315 (1988)) and pET 11d (Studier et al., Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990), pp. 60-89). Target gene expression from the pTrc vector relies on host RNA polymerase transcription from a hybrid trp-lac fusion promoter. Target gene expression from the pET 11d vector relies on transcription from a T7 gn10-lac fusion promoter mediated by a coexpressed viral RNA polymerase (T7 gn1). This viral polymerase is supplied by host strains BL21(DE3) or HMS174(DE3) from a resident λ prophage harboring a T7 gn1 gene under the transcriptional control of the lacUV5 promoter.

[0235] One strategy to maximize expression is to express the polypeptide in a host cell with an impaired capacity to proteolytically cleave the recombinant polypeptide (see Gottesman, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990), pp. 119-128). Another strategy is to alter the nucleic acid sequence to be inserted into an expression vector so that the individual codons for each amino acid are those preferentially utilized in the host cell (Wada et al., Nucleic Acids Res., 20: 2111-18 (1992)). These strategies can be carried out by standard DNA synthesis techniques.
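The codon-replacement strategy in paragraph [0235] can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the standard genetic code is used for translation, while the `PREFERRED` table is a small hypothetical example and is not host codon-usage data from this specification or from Wada et al.; the function names are likewise hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): AMINO[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}

# Hypothetical preferred codon per amino acid (illustrative values only).
PREFERRED = {"L": "CTG", "R": "CGT", "K": "AAA", "G": "GGC"}


def translate(dna):
    """Translate an in-frame DNA sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna, preferred=PREFERRED):
    """Swap each codon for the preferred synonymous codon, if one is listed.

    Codons for amino acids absent from `preferred` are left unchanged,
    so the encoded polypeptide is preserved.
    """
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        recoded.append(preferred.get(CODON_TO_AA[codon], codon))
    return "".join(recoded)
```

Because only synonymous codons are exchanged, `translate(recode(seq))` should equal `translate(seq)` for any in-frame sequence, which is a convenient check before ordering a synthetic gene.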

[0236] In another embodiment, the host cell is a yeast cell, and the expression vector is a yeast expression vector. Examples of vectors for expression in yeast S. cerevisiae include pYepSec1 (Baldari et al., EMBO J., 6: 229-234 (1987)), pMFa (Kurjan et al., Cell, 30: 933-943 (1982)), pJRY88 (Schultz et al., Gene, 54: 113-123 (1987)), pYES2 (Invitrogen Corporation, San Diego, Calif.), and picZ (Invitrogen Corp, San Diego, Calif.).

[0237] Alternatively, polypeptides described herein can be expressed in insect cells using baculovirus expression vectors. Available baculovirus vectors include, for example, the pAc series (Smith et al., Mol. Cell Biol., 3: 2156-2165 (1983)) and the pVL series (Lucklow et al., Virology, 170: 31-39 (1989)).

[0238] In yet another embodiment, the polypeptides described herein can be expressed in mammalian cells using a mammalian expression vector. Examples of mammalian expression vectors include pCDM8 (Seed, Nature, 329: 840 (1987)) and pMT2PC (Kaufman et al., EMBO J., 6: 187-195 (1987)). When used in mammalian cells, the expression vector's control functions can be provided by viral regulatory elements. Commonly used promoters include those derived from polyoma, Adenovirus 2, cytomegalovirus, and Simian Virus 40. Other suitable expression systems for both prokaryotic and eukaryotic cells are described in chapters 16-17 of Sambrook et al., eds., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989.

[0239] Vectors can be introduced into prokaryotic or eukaryotic cells via conventional transformation or transfection techniques. As used herein, the terms "transformation" and "transfection" refer to a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation.

[0240] It is known that, depending upon the expression vector and transformation technique used, only a small fraction of bacterial cells will take up and replicate the expression vector. In order to identify and select these transformants, a gene that encodes a selectable marker (e.g., resistance to antibiotics) can be introduced into the host cells along with the gene of interest. Selectable markers include those that confer resistance to drugs, such as ampicillin, kanamycin, chloramphenicol, or tetracycline. Nucleic acids encoding a selectable marker can be introduced into a host cell on the same vector as that encoding a polypeptide described herein or can be introduced on a separate vector. Cells stably transformed with the introduced nucleic acid can be identified by drug selection (e.g., cells that have incorporated the selectable marker gene will survive, while the other cells die).

[0241] It is known that, depending upon the expression vector and transfection technique used, only a small fraction of mammalian cells may integrate the foreign DNA into their genome. In order to identify and select these integrants, a gene that encodes a selectable marker (e.g., resistance to antibiotics) can be introduced into the host cells along with the gene of interest. Preferred selectable markers include those which confer resistance to drugs, such as G418, hygromycin, and methotrexate. Nucleic acids encoding a selectable marker can be introduced into a host cell on the same vector as that encoding a polypeptide described herein or can be introduced on a separate vector. Cells stably transfected with the introduced nucleic acid can be identified by drug selection (e.g., cells that have incorporated the selectable marker gene will survive, while the other cells die).

Transport Proteins

[0242] Transport proteins can export or excrete polypeptides and organic compounds (e.g., branched fatty alcohols) out of a host cell. A number of transport and efflux proteins can be modified to selectively secrete particular types of compounds such as branched fatty alcohols.

[0243] Non-limiting examples of suitable transport proteins are ATP-Binding Cassette (ABC) transport proteins, efflux proteins, and fatty acid transporter proteins (FATP). Additional suitable transport proteins include the ABC transport proteins from organisms such as Caenorhabditis elegans, Arabidopsis thaliana, Alcaligenes eutrophus, or Rhodococcus erythropolis. Exemplary ABC transport proteins include, without limitation, CER5, AtMRP5, AmiS2, and AtPGP1. Host cells can also be chosen for their endogenous ability to secrete organic compounds. The efficiency of organic compound production and secretion into the host cell environment (e.g., culture medium, fermentation broth) can be expressed as a ratio of intracellular product to extracellular product. For example, the ratio can be about 5:1, 4:1, 3:1, 2:1, 1:1, 1:2, 1:3, 1:4, or 1:5.

Fermentation

[0244] The production and isolation of branched fatty alcohols can be enhanced by employing beneficial fermentation techniques. One method for maximizing production while reducing costs is increasing the percentage of the carbon source that is converted to the branched fatty alcohol products.

[0245] During normal cellular lifecycles, carbon is used in cellular functions such as producing lipids, saccharides, proteins, organic acids, and nucleic acids. Reducing the amount of carbon necessary for growth-related activities can increase the efficiency of carbon source conversion to product. This can be achieved by, for example, first growing host cells to a desired density (for example, a density achieved at the peak of the log phase of growth). At such a point, replication checkpoint genes can be harnessed to stop the growth of cells. Specifically, quorum sensing mechanisms (reviewed in Camilli et al., Science, 311: 1113 (2006); Venturi, FEMS Microbiol. Rev., 30: 274-291 (2006); and Reading et al., FEMS Microbiol. Lett., 254: 1-11 (2006)) can be used to activate checkpoint genes, such as p53, p21, or other checkpoint genes.

[0246] Genes that can be activated to stop cell replication and/or growth in E. coli include umuDC genes. The overexpression of umuDC genes stops the progression from stationary phase to exponential growth (Murli et al., J. Bact., 182: 1127 (2000)). UmuC is a DNA polymerase that can carry out translesion synthesis over non-coding lesions--the mechanistic basis of most UV and chemical mutagenesis. The umuDC gene products are involved in the process of translesion synthesis and also serve as a DNA sequence damage checkpoint. The umuDC gene products include UmuC, UmuD, UmuD', UmuD'.sub.2C, UmuD'.sub.2, and UmuD.sub.2. Simultaneously, product-producing genes can be activated, thus minimizing the need for replication and maintenance pathways to be used while a fatty aldehyde is being made. Host cells can also be engineered to express umuC and umuD from E. coli in pBAD24 under the prpBCDE promoter system through de novo synthesis of this gene with the appropriate end-product production genes.

[0247] The percentage of input carbons converted to branched fatty alcohols can be a cost driver. The more efficient the process is (i.e., the higher the percentage of input carbons converted to branched fatty alcohols), the less expensive the process will be. For oxygen-containing carbon sources (e.g., glucose and other carbohydrate based sources), the oxygen must be released in the form of carbon dioxide. For every 2 oxygen atoms released, a carbon atom is also released, leading to a maximal theoretical metabolic efficiency of approximately 34% (w/w) (for fatty acid derived products). This figure, however, changes for other organic compounds and carbon sources. Typical efficiencies in the literature are less than about 5%. Host cells engineered to produce fatty alcohols can have greater than about 1, 3, 5, 10, 15, 20, 25, or 30% efficiency. In one example, host cells can exhibit an efficiency of about 10% to about 25%. In other examples, such host cells can exhibit an efficiency of about 25% to about 30%. In other examples, host cells can exhibit greater than 30% efficiency.
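As an illustrative sketch only, the weight-for-weight efficiency discussed above can be computed as product mass divided by carbon source mass. The example assumes a saturated C.sub.16 fatty alcohol made from glucose by way of acetyl-CoA (four glucose molecules supplying the sixteen product carbons) and ignores the carbon spent on reducing equivalents and cell maintenance. The function names, the chosen product, and the run values are hypothetical and are not taken from the disclosure.

```python
# Standard atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(formula):
    """Molar mass (g/mol) from an element-count dict such as {"C": 6, "H": 12, "O": 6}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def mass_efficiency_percent(product_g, carbon_source_g):
    """Percent (w/w) of the carbon source mass recovered as product."""
    if carbon_source_g <= 0:
        raise ValueError("carbon source mass must be positive")
    return 100.0 * product_g / carbon_source_g


GLUCOSE = {"C": 6, "H": 12, "O": 6}
C16_ALCOHOL = {"C": 16, "H": 34, "O": 1}  # assumed example product

# Assumption: 4 glucose -> 8 acetyl-CoA (+ 8 CO2) -> one C16 chain.
upper_bound = mass_efficiency_percent(molar_mass(C16_ALCOHOL), 4 * molar_mass(GLUCOSE))

# Hypothetical run: 2 g/L of product from 20 g/L of glucose consumed.
observed = mass_efficiency_percent(2.0, 20.0)

print(f"illustrative upper bound: {upper_bound:.1f}% (w/w)")
print(f"hypothetical observed efficiency: {observed:.1f}% (w/w)")
```

Under these simplifying assumptions the bound falls close to the approximately 34% (w/w) figure given above; a different product or carbon source would give a different value.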

[0248] The host cell can be additionally engineered to express recombinant cellulosomes, such as those described in International Publication WO 2008/100251. These cellulosomes can allow the host cell to use cellulosic material as a carbon source. For example, the host cell can be additionally engineered to express invertases (EC 3.2.1.26) so that sucrose can be used as a carbon source. Similarly, the host cell can be engineered using the teachings described in U.S. Pat. Nos. 5,000,000; 5,028,539; 5,424,202; 5,482,846; and 5,602,030, so that the host cell can assimilate carbon efficiently and use cellulosic materials as carbon sources.

[0249] In one example, the fermentation chamber can enclose a fermentation that is undergoing a continuous reduction. In this instance, a stable reductive environment can be created. The electron balance can be maintained by the release of carbon dioxide (in gaseous form). Efforts to augment the NAD/H and NADP/H balance can also help stabilize the electron balance. The availability of intracellular NADPH can also be enhanced by engineering the host cell to express an NADH:NADPH transhydrogenase. The expression of one or more NADH:NADPH transhydrogenases converts the NADH produced in glycolysis to NADPH, which can enhance the production of fatty alcohols.

[0250] For small scale production, the engineered host cells can be (a) grown in batches of, for example, about 100 mL, 500 mL, 1 L, 2 L, 5 L, or 10 L, (b) fermented, and (c) induced to express desired bkd genes, beta-ketoacyl ACP synthase genes, fatty aldehyde biosynthesis genes, alcohol dehydrogenase genes, and/or acyl-ACP reductase genes, based on the specific genes encoded in the appropriate plasmids. For large scale production, the engineered host cells can be (a) grown in batches of about 10 L, 100 L, 1000 L, 10,000 L, 100,000 L, 1,000,000 L, or larger, (b) fermented, and (c) induced to express the desired bkd genes, beta-ketoacyl ACP synthase genes, fatty aldehyde biosynthesis genes, alcohol dehydrogenase genes, and/or acyl-ACP reductase genes, based on the specific genes encoded in the plasmids or incorporated into the host cell's genome.

[0251] For example, a suitable production host, such as an E. coli, harboring plasmids containing the desired genes or having the genes integrated in its chromosome can be incubated in a suitable reactor, for example a 1 L reactor, for 20 hours at 37.degree. C. in an M9 medium supplemented with 2% glucose, carbenicillin, and chloramphenicol. When the OD.sub.600 of the culture reaches 0.9, the production host can be induced with IPTG. After incubation, the spent media can be extracted and the organic phase can be examined for the presence of branched fatty alcohols using GC-MS.

[0252] In some instances, after the first hour of induction, aliquots of no more than about 10% of the total cell volume can be removed each hour and allowed to sit without agitation to allow the branched fatty alcohols to rise to the surface and undergo a spontaneous phase separation or precipitation. The branched fatty alcohol component can then be collected, and the aqueous phase returned to the reaction chamber. The reaction chamber can be operated continuously. When the OD.sub.600 drops below 0.6, the cells can be replaced with a new batch grown from a seed culture.
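The decision points in the example above (induction at an OD.sub.600 of 0.9, hourly aliquots of no more than about 10% after the first hour of induction, and cell replacement when the OD.sub.600 drops below 0.6) can be sketched as a small rule set. This is a minimal illustration under the assumption that the thresholds are applied exactly as stated; the function and constant names are hypothetical and do not describe any actual control software.

```python
INDUCTION_OD600 = 0.9         # induce once the culture reaches this density
REPLACEMENT_OD600 = 0.6       # replace cells when the density drops below this
MAX_ALIQUOT_FRACTION = 0.10   # "no more than about 10%" removed per hourly aliquot


def next_step(od600, induced, hours_since_induction=0.0):
    """Return the next operating step for the culture as a short description."""
    if not induced:
        return "induce with IPTG" if od600 >= INDUCTION_OD600 else "continue growth"
    if od600 < REPLACEMENT_OD600:
        return "replace cells from seed culture"
    if hours_since_induction >= 1.0:
        return "remove hourly aliquot for phase separation"
    return "continue incubation"


def max_aliquot_liters(culture_volume_l):
    """Largest aliquot (in liters) that stays within the 10% guideline."""
    return MAX_ALIQUOT_FRACTION * culture_volume_l


print(next_step(0.5, induced=False))
print(next_step(0.9, induced=False))
print(next_step(1.2, induced=True, hours_since_induction=2.0))
print(next_step(0.55, induced=True, hours_since_induction=5.0))
print(max_aliquot_liters(1.0))
```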

Producing Branched Fatty Alcohols and Derivatives Using Cell-Free Methods

[0253] In some embodiments, branched fatty alcohols can be produced using a purified polypeptide (e.g., a branched-chain alpha keto acid dehydrogenase complex polypeptide) described herein and a substrate (e.g., an alpha keto acid, malonyl-CoA, 2-oxo-isovalerate, 2-oxo-isobutyrate, 2-oxo-3-methyl-valerate, 2-oxo-isocaproate, 2-oxoglutarate, 2-oxopentanoate, 3-methyl-2-oxobutanoate, 3-methyl-2-oxopentanoate, 4-methyl-2-oxopentanoate, or pyruvate) produced, for example, by a method described herein. For example, a host cell can be engineered to express a branched-chain alpha keto acid dehydrogenase polypeptide or the E1 (alpha and beta), and optionally, the E2 and/or the E3 subunits thereof, or variants as described herein. The host cell can be cultured under conditions sufficient to allow expression of the polypeptide. Cell free extracts can then be generated using known methods, including, for example, cell lysis using detergents or sonication. The expressed polypeptides can be purified. Thereafter, substrates described herein can be added to the cell free extracts and maintained under conditions to allow conversion of the substrates (e.g., alpha keto acids, such as 2-oxo-isovalerate, 2-oxo-isobutyrate, 2-oxo-3-methyl-valerate, 2-oxo-isocaproate, 2-oxoglutarate, 2-oxopentanoate, 3-methyl-2-oxobutanoate, 3-methyl-2-oxopentanoate, 4-methyl-2-oxopentanoate, or pyruvate) to branched chain acyl-CoAs, which can then be converted into branched fatty aldehydes and branched fatty alcohols. The branched fatty alcohols can then be separated and purified using known techniques.

Post-Production Processing

[0254] Depending on the intended use of the branched fatty alcohols produced in accordance with the methods herein, post-production processing may or may not be necessary. As such, in certain industrial applications, the produced branched fatty alcohols and/or derivatives may be suitably used per se as surfactants. Moreover, such surfactants can be directly blended or formulated into suitable cleaning compositions.

[0255] The branched fatty alcohols produced during fermentation can be separated from the fermentation media using any known technique for separating fatty alcohols from aqueous media. One exemplary separation process is a two phase (bi-phasic) separation process, which involves fermenting the genetically engineered host cells under conditions sufficient to produce a branched fatty alcohol, allowing it to collect in an organic phase, and separating the organic phase from the aqueous fermentation broth. This method can be practiced in both batch and continuous fermentation processes.

[0256] Bi-phasic separation uses the relative immiscibility of fatty alcohols to facilitate separation. Immiscible refers to the relative inability of a compound to dissolve in water and is defined by the compound's partition coefficient. One of ordinary skill in the art will appreciate that by choosing a fermentation broth and organic phase, such that the branched fatty alcohol being produced has a high logP value, the branched fatty alcohol can separate into the organic phase, even at very low concentrations, in the fermentation vessel.
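The effect of a high logP value on phase separation can be illustrated with a simple equilibrium partition calculation. This sketch assumes that the partition coefficient P (equal to 10 raised to the logP value) can be applied as the concentration ratio between the organic phase and the aqueous broth, which is a simplification because logP is formally defined for the octanol/water system. The function name and the example volumes are hypothetical.

```python
def fraction_in_organic(log_p, organic_volume_l, aqueous_volume_l):
    """Equilibrium fraction of a compound residing in the organic phase.

    Assumes P = C_organic / C_aqueous = 10 ** log_p at equilibrium.
    """
    if organic_volume_l <= 0 or aqueous_volume_l <= 0:
        raise ValueError("phase volumes must be positive")
    partition = 10.0 ** log_p
    organic_amount = partition * organic_volume_l
    return organic_amount / (organic_amount + aqueous_volume_l)


# Hypothetical case: logP of 6, 10 mL of organic phase over 1 L of broth.
print(f"{fraction_in_organic(6.0, 0.01, 1.0):.4%} of the compound is in the organic phase")
```

Because the amount in each phase scales with P times the phase volume, a compound with a high logP collects almost entirely in even a small organic phase, consistent with separation at very low concentrations.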

[0257] The branched fatty alcohols produced by the methods described herein can be relatively immiscible in the fermentation broth and the cytoplasm. Therefore, the branched fatty alcohol can collect in an organic phase either intracellularly or extracellularly. The collection of the products in the organic phase can lessen the impact of the branched fatty alcohol on cellular function and can allow the host cell to produce more product.

[0258] The branched fatty alcohols can thus be produced as homogeneous compounds wherein at least about 60%, 70%, 80%, 90%, or 95% of the branched fatty alcohols produced will have carbon chain lengths that vary by less than about 6 carbons, less than about 4 carbons, or less than about 2 carbons. These compounds can also be produced with a relatively uniform degree of saturation. They can be used per se as surfactants or can be formulated into suitable cleaning compositions. They can also be used as fuels, fuel additives, starting materials for production of other chemical compounds (e.g., polymers, surfactants, plastics, textiles, solvents, adhesives, etc.), or personal care additives. These compounds can also be used as feedstock for subsequent reactions, for example, hydrogenation, catalytic cracking (e.g., via hydrogenation, pyrolysis, or both), and can be dehydrated to make other products. In particular, these branched products confer low volatility, beneficial low-temperature properties, as well as oxidative stability, making them ideal for low temperature applications such as in household cleaning compositions and personal and beauty care products.

[0259] In some embodiments, the branched fatty alcohols produced using methods described herein can contain between about 50% and about 90% carbon, or between about 5% and about 25% hydrogen. In other embodiments, the branched fatty alcohols produced using methods described herein can contain between about 65% and about 85% carbon, or between about 10% and about 15% hydrogen.
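As a quick check on the carbon and hydrogen ranges above, the mass percentages of a saturated fatty alcohol can be computed from its molecular formula. The sketch assumes a saturated, acyclic alcohol of formula C.sub.nH.sub.2n+2O (branching does not change the formula); the function name is hypothetical.

```python
# Standard atomic masses (g/mol).
MASS_C, MASS_H, MASS_O = 12.011, 1.008, 15.999


def carbon_hydrogen_percent(n_carbons):
    """Return (percent C, percent H) by mass for a saturated alcohol CnH(2n+2)O."""
    if n_carbons < 1:
        raise ValueError("need at least one carbon")
    carbon = n_carbons * MASS_C
    hydrogen = (2 * n_carbons + 2) * MASS_H
    total = carbon + hydrogen + MASS_O
    return 100.0 * carbon / total, 100.0 * hydrogen / total


for n in (6, 16, 26):
    pct_c, pct_h = carbon_hydrogen_percent(n)
    print(f"C{n}: {pct_c:.1f}% carbon, {pct_h:.1f}% hydrogen")
```

For chain lengths of 6 to 26 carbons, the computed values stay within the narrower ranges of about 65% to about 85% carbon and about 10% to about 15% hydrogen.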

[0260] In some embodiments, the branched fatty alcohols produced in accordance with the disclosure herein comprise a C.sub.6-C.sub.26 branched fatty alcohol. In some embodiments, the branched fatty alcohol comprises a C.sub.6, C.sub.7, C.sub.8, C.sub.9, C.sub.10, C.sub.11, C.sub.12, C.sub.13, C.sub.14, C.sub.15, C.sub.16, C.sub.17, C.sub.18, C.sub.19, C.sub.20, C.sub.21, C.sub.22, C.sub.23, C.sub.24, C.sub.25, or a C.sub.26 branched fatty alcohol. In particular embodiments, the branched fatty alcohol is a C.sub.6, C.sub.8, C.sub.10, C.sub.12, C.sub.13, C.sub.14, C.sub.15, C.sub.16, C.sub.17, or C.sub.18 branched fatty alcohol. In certain embodiments, the hydroxyl group of the branched fatty alcohol is in the primary (C.sub.1) position. In certain embodiments, the branched fatty alcohol is an iso-fatty alcohol or an anteiso-fatty alcohol. In exemplary embodiments, the branched fatty alcohol is selected from iso-C.sub.7:0, iso-C.sub.8:0, iso-C.sub.9:0, iso-C.sub.10:0, iso-C.sub.11:0, iso-C.sub.12:0, iso-C.sub.13:0, iso-C.sub.14:0, iso-C.sub.15:0, iso-C.sub.16:0, iso-C.sub.17:0, iso-C.sub.18:0, iso-C.sub.19:0, anteiso-C.sub.7:0, anteiso-C.sub.8:0, anteiso-C.sub.9:0, anteiso-C.sub.10:0, anteiso-C.sub.11:0, anteiso-C.sub.12:0, anteiso-C.sub.13:0, anteiso-C.sub.14:0, anteiso-C.sub.15:0, anteiso-C.sub.16:0, anteiso-C.sub.17:0, anteiso-C.sub.18:0, and anteiso-C.sub.19:0 fatty alcohol.
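To make the iso and anteiso designations concrete, the following sketch writes out the carbon skeleton of a saturated primary alcohol with a given total carbon count. It follows the usual lipid nomenclature, in which an iso chain carries a methyl branch on the penultimate carbon and an anteiso chain carries it on the antepenultimate carbon. The use of SMILES strings and the function names are assumptions made for illustration only.

```python
def iso_alcohol_smiles(total_carbons):
    """SMILES for a saturated iso-branched primary alcohol (methyl on the penultimate carbon)."""
    if total_carbons < 4:
        raise ValueError("an iso-branched alcohol needs at least 4 carbons")
    # Hydroxyl on C1, a linear run of carbons, then the branch carbon with its two methyl ends.
    return "O" + "C" * (total_carbons - 3) + "C(C)C"


def anteiso_alcohol_smiles(total_carbons):
    """SMILES for a saturated anteiso-branched primary alcohol (methyl on the antepenultimate carbon)."""
    if total_carbons < 5:
        raise ValueError("an anteiso-branched alcohol needs at least 5 carbons")
    # Hydroxyl on C1, a linear run of carbons, then the branch carbon followed by an ethyl end.
    return "O" + "C" * (total_carbons - 4) + "C(C)CC"


print("iso-C15:0    ", iso_alcohol_smiles(15))      # 13-methyltetradecan-1-ol
print("anteiso-C15:0", anteiso_alcohol_smiles(15))  # 12-methyltetradecan-1-ol
```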

[0261] In certain embodiments, the fatty alcohol product can comprise straight chain fatty alcohols. In other embodiments, the branched fatty alcohols produced by the host cells described herein can comprise one or more points of branching. In certain embodiments, the branched fatty alcohols produced by the host cells as described herein can comprise one or more cyclic moieties.

[0262] In some embodiments, the branched fatty alcohols can be unsaturated branched fatty alcohols. For example, the branched fatty alcohols produced in accordance with the present description can be monounsaturated branched fatty alcohols. In certain embodiments, the unsaturated branched fatty alcohol can be a C6:1, C7:1, C8:1, C9:1, C10:1, C11:1, C12:1, C13:1, C14:1, C15:1, C16:1, C17:1, C18:1, C19:1, C20:1, C21:1, C22:1, C23:1, C24:1, C25:1, or a C26:1 unsaturated branched fatty alcohol. In other embodiments, the branched fatty alcohol is unsaturated at the omega-7 position. In certain embodiments, the unsaturated branched fatty alcohol comprises a cis double bond.

[0263] In some embodiments, branched fatty alcohols are produced at a yield, relative to a straight-chain fatty alcohol, of about 20%, for example, about 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, or higher. In an exemplary embodiment, the total amount of branched fatty alcohols produced is estimated to be about 45% to about 50% relative to the amount of straight-chain fatty alcohols produced by a host cell.

[0264] In any of the aspects described herein, the production yield of fatty alcohols, including branched fatty alcohols and straight chain fatty alcohols, is about 1 mg/L, 5 mg/L, 10 mg/L, 15 mg/L, 20 mg/L, 25 mg/L, about 50 mg/L, about 75 mg/L, about 100 mg/L, about 125 mg/L, about 150 mg/L, about 175 mg/L, about 200 mg/L, about 225 mg/L, about 250 mg/L, about 275 mg/L, about 300 mg/L, about 325 mg/L, about 350 mg/L, about 375 mg/L, about 400 mg/L, about 425 mg/L, about 450 mg/L, about 475 mg/L, about 500 mg/L, about 525 mg/L, about 550 mg/L, about 575 mg/L, about 600 mg/L, about 625 mg/L, about 650 mg/L, about 675 mg/L, about 700 mg/L, about 725 mg/L, about 750 mg/L, about 775 mg/L, about 800 mg/L, about 825 mg/L, about 850 mg/L, about 875 mg/L, about 900 mg/L, about 925 mg/L, about 950 mg/L, about 975 mg/L, about 1000 mg/L, about 1050 mg/L, about 1075 mg/L, about 1100 mg/L, about 1125 mg/L, about 1150 mg/L, about 1175 mg/L, about 1200 mg/L, about 1225 mg/L, about 1250 mg/L, about 1275 mg/L, about 1300 mg/L, about 1325 mg/L, about 1350 mg/L, about 1375 mg/L, about 1400 mg/L, about 1425 mg/L, about 1450 mg/L, about 1475 mg/L, about 1500 mg/L, about 1525 mg/L, about 1550 mg/L, about 1575 mg/L, about 1600 mg/L, about 1625 mg/L, about 1650 mg/L, about 1675 mg/L, about 1700 mg/L, about 1725 mg/L, about 1750 mg/L, about 1775 mg/L, about 1800 mg/L, about 1825 mg/L, about 1850 mg/L, about 1875 mg/L, about 1900 mg/L, about 1925 mg/L, about 1950 mg/L, about 1975 mg/L, about 2000 mg/L, or more.

[0265] In another aspect, the branched fatty alcohol produced in accordance with the present invention is produced by culturing a host cell described herein in a medium having a low level of iron, under conditions sufficient to produce a branched fatty alcohol. In particular embodiments, the medium contains less than about 500 .mu.M iron, less than about 400 .mu.M iron, less than about 300 .mu.M iron, less than about 200 .mu.M iron, less than about 150 .mu.M iron, less than about 100 .mu.M iron, less than about 90 .mu.M iron, less than about 80 .mu.M iron, less than about 70 .mu.M iron, less than about 60 .mu.M iron, less than about 50 .mu.M iron, less than about 40 .mu.M iron, less than about 30 .mu.M iron, less than about 20 .mu.M iron, less than about 10 .mu.M iron, or less than about 5 .mu.M iron. In certain embodiments, the medium does not contain iron.

[0266] Bioproducts (e.g., surfactants and cleaning compositions) comprising microbially produced branched fatty alcohols and/or derivatives, produced using the fatty acid biosynthetic pathway, have not been produced from renewable sources and, as such, are new compositions of matter. These new bioproducts can be distinguished from organic compounds derived from petrochemical carbon on the basis of dual carbon-isotopic fingerprinting or .sup.14C dating. Additionally, the specific source of biosourced carbon (e.g., glucose vs. glycerol) can be determined by dual carbon-isotopic fingerprinting (see, e.g., U.S. Pat. No. 7,169,588, which is herein incorporated by reference).

[0267] The ability to distinguish bioproducts from petroleum based organic compounds is beneficial in tracking these materials in commerce. Organic compounds or chemicals comprising both biologically based and petroleum based carbon isotope profiles may be distinguished from organic compounds and chemicals made only of petroleum based materials. Hence, the surfactants and cleaning compositions of the present invention can be followed in commerce on the basis of their unique carbon isotope profile.

[0268] Surfactants or cleaning compositions produced in accordance with the present disclosure can be distinguished from petroleum-derived compounds by comparing the stable carbon isotope ratio (.sup.13C/.sup.12C) of each. The .sup.13C/.sup.12C ratio in a given bioproduct is a consequence of the .sup.13C/.sup.12C ratio in atmospheric carbon dioxide at the time the carbon dioxide is fixed. It also reflects the precise metabolic pathway. Regional variations also occur. Petroleum, C.sub.3 plants (the broadleaf), C.sub.4 plants (the grasses), and marine carbonates all show significant differences in .sup.13C/.sup.12C and the corresponding .delta..sup.13C values. Moreover, lipid matter of C.sub.3 and C.sub.4 plants analyze differently than materials derived from the carbohydrate components of the same plants as a consequence of the metabolic pathway.

[0269] Within the precision of measurement, .sup.13C shows large variations due to isotopic fractionation effects, the most significant of which for bioproducts is the photosynthetic mechanism. The major cause of differences in the carbon isotope ratio in plants is closely associated with differences in the pathway of photosynthetic carbon metabolism in the plants, particularly the reaction occurring during the primary carboxylation (i.e., the initial fixation of atmospheric CO.sub.2). Two large classes of vegetation are those that incorporate the "C.sub.3" (or Calvin-Benson) photosynthetic cycle and those that incorporate the "C.sub.4" (or Hatch-Slack) photosynthetic cycle.

[0270] In C.sub.3 plants, the primary CO.sub.2 fixation/carboxylation reaction involves the enzyme ribulose-1,5-diphosphate carboxylase, and the first stable product is a 3-carbon compound. C.sub.3 plants, such as hardwoods and conifers, are dominant in the temperate climate zones.

[0271] In C.sub.4 plants, an additional carboxylation reaction involving phosphoenol-pyruvate carboxylase is the primary carboxylation reaction. The first stable carbon compound is a 4-carbon acid that is subsequently decarboxylated. The CO.sub.2 thus released is refixed by the C.sub.3 cycle. Examples of C.sub.4 plants are tropical grasses, corn, and sugar cane.

[0272] Both C.sub.4 and C.sub.3 plants exhibit a range of .sup.13C/.sup.12C isotopic ratios, but typical values are about -7 to about -13 per mil for C.sub.4 plants and about -19 to about -27 per mil for C.sub.3 plants (see, e.g., Stuiver et al., Radiocarbon, 19: 355 (1977)). Coal and petroleum fall generally in this latter range. The .sup.13C measurement scale was originally defined by a zero set by Pee Dee Belemnite (PDB) limestone, where values are given in parts per thousand deviations from this material. The ".delta..sup.13C" values are expressed in parts per thousand (per mil), abbreviated ‰, and are calculated as follows:

.delta..sup.13C(‰)=[(.sup.13C/.sup.12C).sub.sample-(.sup.13C/.sup.12C).sub.standard]/(.sup.13C/.sup.12C).sub.standard.times.1000

[0273] Since the PDB reference material (RM) has been exhausted, a series of alternative RMs have been developed in cooperation with the IAEA, USGS, NIST, and other selected international isotope laboratories. The notation for the per mil deviation from PDB is .delta..sup.13C. Measurements are made on CO.sub.2 by high precision stable isotope ratio mass spectrometry (IRMS) on molecular ions of masses 44, 45, and 46.
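The .delta..sup.13C calculation given above can be sketched in a few lines. The function below implements the stated formula directly; the numerical value used for the standard ratio is an assumed, commonly cited figure for the PDB scale and is not taken from this disclosure, and the helper that compares a result with the typical C.sub.3 and C.sub.4 ranges quoted above is hypothetical.

```python
ASSUMED_STANDARD_RATIO = 0.0112372  # assumed 13C/12C of the PDB standard (illustrative value)


def delta13c_per_mil(ratio_sample, ratio_standard=ASSUMED_STANDARD_RATIO):
    """delta-13C in per mil: (R_sample - R_standard) / R_standard * 1000."""
    if ratio_standard <= 0:
        raise ValueError("standard ratio must be positive")
    return (ratio_sample - ratio_standard) / ratio_standard * 1000.0


def typical_range(delta):
    """Compare a delta-13C value with the typical ranges quoted in the text."""
    if -13.0 <= delta <= -7.0:
        return "typical C4 plant range"
    if -27.0 <= delta <= -19.0:
        return "typical C3 plant range (coal and petroleum generally fall here)"
    return "outside the typical ranges quoted"


# Hypothetical sample whose 13C/12C ratio is 1.23% below the standard.
sample_ratio = ASSUMED_STANDARD_RATIO * (1.0 - 0.0123)
delta = delta13c_per_mil(sample_ratio)
print(f"delta13C = {delta:.1f} per mil -> {typical_range(delta)}")
```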

[0274] The branched fatty alcohol and derivative compositions as well as the surfactants or cleaning compositions described herein include bioproducts produced by any of the methods described herein. The surfactants and cleaning compositions can have a .delta..sup.13C of about -28 or greater, about -27 or greater, -20 or greater, -18 or greater, -15.4 or greater, -15 or greater, -13 or greater, -10 or greater, or -8 or greater. A surfactant or cleaning composition so produced can have a .delta..sup.13C of about -30 to about -15, about -27 to about -19, about -25 to about -21, about -15 to about -5, about -15.4 to about -10.9, about -13.92 to about -13.84, about -13 to about -7, or about -13 to about -10. For example it can have a .delta..sup.13C of about -10, -11, -12, or -12.3.

[0275] The surfactants or cleaning compositions herein can also be distinguished from petroleum-derived compounds by comparing the amount of .sup.14C in each compound. Because .sup.14C has a nuclear half life of 5730 years, petroleum based chemicals containing "older" carbon can be distinguished from bioproducts which contain "newer" carbon (see, e.g., Currie, "Source Apportionment of Atmospheric Particles," Characterization of Environmental Particles, J. Buffle and H. P. van Leeuwen, Eds., 1 of Vol. I of the IUPAC Environmental Analytical Chemistry Series (Lewis Publishers, Inc.) (1992), pp. 3-74).

[0276] The basic assumption in radiocarbon dating is that the constancy of .sup.14C concentration in the atmosphere leads to the constancy of .sup.14C in living organisms. But because of atmospheric nuclear testing since 1950 and the burning of fossil fuel since 1850, .sup.14C has acquired a second, geochemical time characteristic. Its concentration in atmospheric CO.sub.2, and hence in the living biosphere, approximately doubled at the peak of nuclear testing, in the mid-1960s. It has since been gradually returning to the steady-state cosmogenic (atmospheric) baseline isotope ratio (.sup.14C/.sup.12C) of about 1.2.times.10.sup.-12, with an approximate relaxation "half-life" of 7-10 years. (This latter half-life is not to be taken literally; rather, the detailed atmospheric nuclear input/decay function should be used to trace the variation of atmospheric and biospheric .sup.14C since the onset of the nuclear age.)

[0277] It is this latter biospheric .sup.14C time characteristic that holds out the promise of annual dating of recent biospheric carbon. .sup.14C can be measured by accelerator mass spectrometry (AMS), with results given in units of "fraction of modern carbon" (f.sub.M). As used herein, "fraction of modern carbon" or "f.sub.M" has the same meaning as defined by National Institute of Standards and Technology (NIST) Standard Reference Materials (SRMs) 4990B and 4990C, known as oxalic acid standards HOxI and HOxII, respectively. The fundamental definition relates to 0.95 times the .sup.14C/.sup.12C isotope ratio of HOxI (referenced to AD 1950). This is roughly equivalent to decay-corrected pre-Industrial Revolution wood. For the current living biosphere (plant material), f.sub.M is approximately 1.1.

[0278] The invention provides surfactants or cleaning compositions, having an f.sub.M .sup.14C of at least about 1. An exemplary surfactant has an f.sub.M .sup.14C of at least about 1.01, of at least about 1.5, an f.sub.M .sup.14C of about 1 to about 1.5, an f.sub.M .sup.14C of about 1.04 to about 1.18, or an f.sub.M .sup.14C of about 1.111 to about 1.124. Likewise, an exemplary cleaning composition has an f.sub.M .sup.14C of at least about 1.01, of at least about 1.5, an f.sub.M .sup.14C of about 1 to about 1.5, an f.sub.M .sup.14C of about 1.04 to about 1.18, or an f.sub.M .sup.14C of about 1.111 to about 1.124.

[0279] Another measurement of .sup.14C is known as the percent of modern carbon, pMC. For an archaeologist or geologist using .sup.14C dates, AD 1950 equals "zero years old". This also represents 100 pMC. "Bomb carbon" in the atmosphere reached almost twice the normal level in 1963 at the peak of thermonuclear weapons testing. Its distribution within the atmosphere has been approximated since its appearance, showing values that are greater than 100 pMC for plants and animals living since AD 1950. It has gradually decreased over time with today's value being near 107.5 pMC. This means that a fresh biomass material, such as corn, would give a .sup.14C signature near 107.5 pMC. Petroleum based compounds will have a pMC value of zero. Combining fossil carbon with present day carbon will result in a dilution of the present day pMC content. By presuming 107.5 pMC represents the .sup.14C content of present day biomass materials and 0 pMC represents the .sup.14C content of petroleum based products, the measured pMC value for that material will reflect the proportions of the two component types. For example, a material derived 100% from present day soybeans would give a radiocarbon signature near 107.5 pMC. If that material was diluted 50% with petroleum based products, it would give a radiocarbon signature of approximately 54 pMC.

[0280] A biologically based carbon content is derived by assigning "100%" equal to 107.5 pMC and "0%" equal to 0 pMC. For example, a sample measuring 99 pMC will give an equivalent biologically based carbon content of about 92% (99/107.5). This value is referred to as the mean biologically based carbon result and assumes all the components within the analyzed material originated either from present day biological material or petroleum based material.
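The linear scaling between pMC and biologically based carbon content can be sketched as follows, assuming the two-component model described above (107.5 pMC for present day biomass and 0 pMC for petroleum based carbon). The function names are hypothetical.

```python
PRESENT_DAY_PMC = 107.5  # pMC assigned to present day biomass in the text
PETROLEUM_PMC = 0.0      # pMC assigned to petroleum based carbon in the text


def biobased_carbon_percent(measured_pmc):
    """Mean biologically based carbon content (%) for a measured pMC value."""
    return 100.0 * (measured_pmc - PETROLEUM_PMC) / (PRESENT_DAY_PMC - PETROLEUM_PMC)


def blended_pmc(biobased_fraction):
    """Expected pMC of a blend containing the given fraction (0 to 1) of present day carbon."""
    if not 0.0 <= biobased_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return biobased_fraction * PRESENT_DAY_PMC + (1.0 - biobased_fraction) * PETROLEUM_PMC


half_blend = blended_pmc(0.5)
print(f"50% dilution with petroleum carbon: {half_blend:.2f} pMC")
print(f"bio-based content recovered from that value: {biobased_carbon_percent(half_blend):.0f}%")
```

A 50% dilution gives 53.75 pMC, which rounds to the approximately 54 pMC stated above.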

[0281] A surfactant or a cleaning composition comprising branched fatty alcohols and/or derivatives described herein can have a pMC of at least about 50, 60, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99, or 100. In other instances, such a surfactant or cleaning composition can have a pMC of between about 50 and about 100; between about 60 and about 100; between about 70 and about 100; between about 80 and about 100; between about 85 and about 100; between about 87 and about 98; or between about 90 and about 95. In yet other instances, it can have a pMC of about 90, 91, 92, 93, 94, or 94.2.

[0282] Accordingly, the present invention is drawn to a branched fatty alcohol or a derivative thereof produced by an engineered microbial host cell. The engineered microbial host cell expresses: (a) a first gene encoding a first polypeptide having at least about 85% sequence identity to the amino acid sequence of any one of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13 and 15, or of a variant thereof; and (b) a second gene encoding a second polypeptide having at least about 85% sequence identity to the amino acid sequence of any one of SEQ ID NOs:24, 26, 28, 30, 32, 34, 36, and 38, or of a variant thereof, and is cultured in the presence of one or more biological substrates of the first and second polypeptides. In some embodiments, the microbial host cell is engineered to express a third gene encoding a third polypeptide comprising an amino acid sequence having at least about 85% sequence identity to the amino acid sequence of any one of SEQ ID NOs:47, 49, 51, 53, 55, 57, 59, and 61, or of a variant thereof. In some embodiments, the microbial host cell is engineered to express a fourth gene encoding a fourth polypeptide comprising an amino acid sequence having at least about 85% sequence identity to the amino acid sequence of any one of SEQ ID NOs:69, 71, 73, 75, 77, 79, 81, and 83, or of a variant thereof. In any of the above embodiments, the microbial host cell is engineered to express a beta-ketoacyl ACP synthase gene in the host cell, wherein the beta-ketoacyl ACP synthase gene encodes a polypeptide comprising an amino acid sequence having at least about 85% sequence identity to the amino acid sequence of any one of SEQ ID NOs:90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, and 120, or of a variant thereof. The beta-ketoacyl ACP synthase is, for example, a FabH that has specificity for branched-chain acyl-CoA substrates. In any of the embodiments above, the microbial host cell is engineered to express a fatty aldehyde biosynthesis polypeptide, or a variant thereof. In any of the embodiments above, the microbial host cell is engineered to express an acyl-ACP reductase polypeptide or a variant thereof, to modify the expression of a gene encoding a fatty acid synthase, which comprises expressing a gene encoding a thioesterase in the microbial host cell, to express a gene encoding an alcohol dehydrogenase or a variant thereof, and/or to express an attenuated level of a fatty acid degradation enzyme relative to a wild type host cell. The fatty acid degradation enzyme is, for example, an acyl-CoA synthase.

Branched Fatty Alcohol Derivatives

[0283] A derivative of the branched fatty alcohol produced in accordance with the methods described herein can be produced by converting the isolated branched fatty alcohol into a branched fatty alcohol derivative thereof. The branched fatty alcohol derivative can be any suitable branched fatty alcohol derivative selected from, for example, a branched fatty ether sulfate, a branched fatty phosphate ester, an alkylbenzyldimethyl-ammonium chloride, a branched fatty amine oxide, a branched fatty alcohol sulfate, a branched alkyl polyglucoside, a branched alkyl glyceryl ether sulfonate, and a branched ethoxylated fatty alcohol. Typically, the branched fatty alcohol derivative comprises an alkyl group that is about 6 to about 26 carbons in length. Preferably, the branched fatty alcohol comprises an alkyl group that is about 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 carbons in length. In certain embodiments, the alkyl group comprises one or more points of branching. In this regard, the number of carbons in the alkyl group refers to the hydrocarbon group derived from the branched fatty alcohol, and not to any carbon atoms added in the preparation of the branched fatty alcohol derivative, such as polyethoxy groups and the like.

[0284] As used herein, the term "fatty ether sulfate" is the same as "alkyl ether sulfate" wherein the alkyl residue is a fatty residue, and denotes a compound of the structure: RO(CH.sub.2CH.sub.2O).sub.n--SO.sub.3H, wherein R is a C.sub.6-C.sub.26 alkyl group as defined herein, and n is an integer of 1 to about 50. Fatty ether sulfate can also refer to the salt denoted by RO(CH.sub.2CH.sub.2O).sub.nSO.sub.3X, where n and R are as defined above and X is a cation. An exemplary fatty ether sulfate salt is a sodium salt, for example, RO(CH.sub.2CH.sub.2O).sub.nSO.sub.3Na. In an exemplary embodiment, the R group comprises one or more points of branching.
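As a worked illustration of the general structure RO(CH.sub.2CH.sub.2O).sub.nSO.sub.3Na, the molar mass of a sodium fatty ether sulfate can be computed from the alkyl chain length and the number of ethoxy units. The sketch assumes that R is a saturated alkyl group of formula C.sub.mH.sub.2m+1 (branched or linear, since branching does not change the formula); the function name and the example values are hypothetical.

```python
# Standard atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "S": 32.06, "Na": 22.990}


def sodium_ether_sulfate_molar_mass(alkyl_carbons, ethoxy_units):
    """Molar mass (g/mol) of RO(CH2CH2O)nSO3Na with R = CmH(2m+1)."""
    if alkyl_carbons < 1 or ethoxy_units < 1:
        raise ValueError("chain length and ethoxy count must be at least 1")
    counts = {
        "C": alkyl_carbons + 2 * ethoxy_units,
        "H": 2 * alkyl_carbons + 1 + 4 * ethoxy_units,
        "O": 1 + ethoxy_units + 3,  # ether O of RO, one O per ethoxy unit, three O in SO3
        "S": 1,
        "Na": 1,
    }
    return sum(ATOMIC_MASS[element] * count for element, count in counts.items())


# Hypothetical example: a C12 alkyl group with one ethoxy unit (C14H29NaO5S).
print(f"{sodium_ether_sulfate_molar_mass(12, 1):.2f} g/mol")
```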

[0285] As used herein, the term "fatty alcohol sulfate" denotes a compound of the structure: ROSO.sub.3H wherein R is a C.sub.8-C.sub.26 alkyl group. Fatty alcohol sulfate can also refer to the salt of the above structure, denoted by ROSO.sub.3X where R is as defined above and X is a cation. An exemplary fatty alcohol sulfate salt is a sodium salt, for example, ROSO.sub.3Na. In an exemplary embodiment, the R group comprises one or more points of branching.
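The two sulfate structures defined above differ only in the number n of ethyleneoxy units, so their elemental composition can be tallied with one small routine. The following is a minimal illustrative sketch, not part of the disclosed methods. It assumes R is a saturated alkyl group of formula C.sub.mH.sub.2m+1 (branching does not change the element counts), treats the fatty alcohol sulfate as the n = 0 case, uses rounded standard atomic masses, and the function names are hypothetical.

```python
# Rounded standard atomic masses in g/mol (assumed values for illustration).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "S": 32.06, "Na": 22.990}


def sulfate_sodium_salt_formula(alkyl_carbons: int, eo_units: int = 0) -> dict:
    """Element counts for RO(CH2CH2O)nSO3Na with R = saturated alkyl CmH(2m+1).

    eo_units = 0 corresponds to the fatty alcohol sulfate salt ROSO3Na.
    A branched R has the same element counts as its linear isomer.
    """
    if alkyl_carbons < 1 or eo_units < 0:
        raise ValueError("alkyl_carbons must be >= 1 and eo_units >= 0")
    return {
        "C": alkyl_carbons + 2 * eo_units,          # R plus two carbons per EO unit
        "H": 2 * alkyl_carbons + 1 + 4 * eo_units,  # R plus four hydrogens per EO unit
        "O": 1 + eo_units + 3,                      # RO, one per EO unit, three in SO3
        "S": 1,
        "Na": 1,
    }


def molar_mass(formula: dict) -> float:
    """Sum of atomic masses weighted by element counts, in g/mol."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


if __name__ == "__main__":
    c12_sulfate = sulfate_sodium_salt_formula(12)          # ROSO3Na, R = C12
    c12_ether_sulfate = sulfate_sodium_salt_formula(12, 2)  # two EO units
    print(c12_sulfate, round(molar_mass(c12_sulfate), 2))
    print(c12_ether_sulfate, round(molar_mass(c12_ether_sulfate), 2))
```

For a C.sub.12 alkyl group with n = 0 the sketch gives the composition C.sub.12H.sub.25NaO.sub.4S, about 288.4 g/mol, which matches the commonly quoted molar mass of sodium dodecyl sulfate.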

[0286] As used herein, the term "fatty phosphate ester" is the same as "alkyl phosphate ester" wherein the alkyl residue is a fatty residue, and denotes a compound of the structure:

ROP(O)(OH).sub.2

[0287] As used herein, alkylbenzyldimethylammonium chlorides have the structure:

##STR00001##

wherein R is a C.sub.8-C.sub.26 alkyl group as defined herein. For example, the alkyl group of R comprises one or more points of branching.

[0288] As used herein, the term "fatty amine oxide" is the same as "alkyl amine oxide" wherein the alkyl residue is a fatty residue as defined herein, and denotes a compound of the structure:

##STR00002##

wherein R is a C.sub.8-C.sub.26 alkyl group as defined herein and wherein R.sup.1 and R.sup.2 are C.sub.1-C.sub.26 alkyl groups, preferably C.sub.1-C.sub.6 alkyl groups. Preferably the alkyl groups of R, R.sup.1 and R.sup.2 each independently comprises one or more points of branching.

[0289] Alkyl polyglucosides have the structure: RO(C.sub.nH.sub.2nO).sub.tZ.sub.x wherein R is a C.sub.8-C.sub.26 alkyl group, preferably comprising one or more points of branching, Z is a glucose residue, n is 2 or 3, t is from 0 to 10, and x is from about 1 to 10, preferably from about 1.5 to 4.

[0290] Alkyl glyceryl ether sulfonates have the structure:

##STR00003##

wherein R is a C.sub.8-C.sub.26 alkyl group as defined herein, preferably comprising one or more points of branching, and n is an integer from 1 to 4, for example, 1, 2, 3, or 4.

[0291] As used herein, the term "fatty alcohol alkoxylate" is the same as "alkoxylated fatty alcohol" and denotes a compound of the structure: RO(CH.sub.2CH.sub.2).sub.nOH wherein R is a C.sub.8-C.sub.26 alkyl group as defined herein and n is an integer from 1 to about 50. Preferably R comprises one or more points of branching.

[0292] The branched fatty alcohol derivatives can be produced by any suitable method, many of which are known in the art. See, e.g., "Handbook on Soaps, Detergents, and Acid Slurry," 2.sup.nd ed., NIIR Board, Asia Pacific Business Press, Inc., Delhi, India.

[0293] In one embodiment, the branched fatty alcohol derivative is an ethoxylated branched fatty alcohol, which is also known in the art as a branched fatty alcohol ethoxylate, and has a structure as described herein. Preferably, the ethoxylated branched fatty alcohol contains from about 1 to about 50 moles of ethylene oxide per mole of branched fatty alcohol.

Surfactants or Detersive Surfactants

[0294] A surfactant composition of the present invention can comprise about 0.001 wt. % to about 100 wt. % of microbially produced branched fatty alcohols and/or derivatives thereof. Preferably, a surfactant composition is a blend of a microbially produced branched fatty alcohol and/or derivative in combination with one or more other surfactants and/or surfactant systems that have been derived from similar (e.g., microbially derived) or different sources (e.g., synthetic, petroleum-derived). Those other surfactants and/or surfactant systems can confer additional desirable properties. In some embodiments, the one or more other surfactants and/or surfactant systems that are blended with the microbially produced branched fatty alcohols and/or derivatives can comprise linear or branched fatty alcohol derivatives, or they can be other types of surfactants such as cationic surfactants, anionic surfactants and/or amphoteric/zwitterionic surfactants. These other surfactants and/or surfactant systems are collectively referred to as "co-surfactants" herein. For example, a surfactant composition of the invention can be a blend of a microbially produced branched fatty alcohol and/or derivative composition prepared in accordance with the disclosure herein, and a cationic surfactant derived from a petrochemical source, and the resulting surfactant composition not only has good cleaning properties but also contributes certain disinfecting and/or sanitizing benefits.

[0295] The cleaning composition of the invention can comprise, in addition to the microbially produced branched fatty alcohols and/or derivatives, or the surfactants comprising such branched materials and/or derivatives, co-surfactants selected from nonionic surfactants, anionic surfactants, cationic surfactants, ampholytic surfactants, zwitterionic surfactants, semi-polar nonionic surfactants, and mixtures thereof. When present, the total amount of surfactants, including the microbially produced branched fatty alcohols and/or derivatives thereof, and the co-surfactants, is typically present at a level of about 0.1 wt. % or higher (e.g., about 1.0 wt. % or higher, about 10 wt. % or higher, about 25 wt. % or higher, about 50 wt. % or higher, about 70 wt. % or higher). For example, the total amount of surfactant in a cleaning composition can vary from about 0.1 wt. % to about 80 wt. % (e.g., from about 0.1 wt. % to about 40 wt. %, from about 0.1 wt. % to about 12 wt. %, from about 1.0 wt. % to about 50 wt. %, or from about 5 wt. % to about 40 wt. %).

[0296] Various known surfactants can be suitable co-surfactants. In some embodiments, the co-surfactant can comprise an anionic surfactant. In certain embodiments, the amount of one or more anionic surfactants in the cleaning composition can be, for example, about 1 wt. % or more (e.g., about 5 wt. % or more, about 10 wt. % or more, about 20 wt. % or more, about 30 wt. % or more, about 40 wt. % or more). For example, the amount of one or more anionic surfactants in the cleaning composition can vary from about 1 wt. % to about 40 wt. %. Suitable anionic surfactants include, for example, linear alkylbenzenesulfonate, alpha-olefinsulfonate, alkyl sulfate (fatty alcohol sulfate), alcohol ethoxysulfate, secondary alkanesulfonate, alpha-sulfo fatty acid methyl esters, alkyl- or alkenylsuccinic acid or soap. In some embodiments, an anionic surfactant is, for example, a C.sub.10-C.sub.18 alkyl alkoxy ester (AE.sub.xS), wherein x is from 1-30. Other suitable anionic surfactants can be found in International Publication WO 98/39403, Surface Active Agents and Detergents (Vol. I & II, by Schwartz, Perry and Berch), and U.S. Pat. Nos. 3,929,678, 6,020,303, 6,060,443, 6,008,181, International Publications WO 99/05243, WO 99/05242 and WO 99/05244, the relevant disclosures of which are incorporated herein by reference.

[0297] In another embodiment, the co-surfactant can comprise a cationic surfactant. Suitable cationic surfactants include, for example, those having long-chain hydrocarbyl groups. Examples include ammonium surfactants such as alkyltrimethylammonium halogenides, and surfactants having the formula [R.sup.2(OR.sup.3).sub.y][R.sup.4(OR.sup.3).sub.y].sub.2R.sup.5N.sup.+X.sup.-, wherein R.sup.2 is an alkyl or alkyl benzyl group having from about 8 to about 18 carbon atoms in the alkyl chain; each R.sup.3 is independently selected from the group consisting of --CH.sub.2CH.sub.2--, --CH.sub.2CH(CH.sub.3)--, --CH.sub.2(CH(CH.sub.2OH))--, --CH.sub.2CH.sub.2CH.sub.2--, and mixtures thereof; each R.sup.4 is selected from the group consisting of C.sub.1-C.sub.4 alkyl, C.sub.1-C.sub.4 hydroxyalkyl, benzyl ring structures formed by joining the two R.sup.4 groups, --CH.sub.2CHOH--CHOHCOR.sup.6CHOHCH.sub.2OH wherein R.sup.6 is any hexose or hexose polymer having a molecular weight less than about 1,000, and hydrogen, when y is not 0; R.sup.5 is the same as R.sup.4 or is an alkyl chain wherein the total number of carbon atoms of R.sup.2 plus R.sup.5 is not more than about 18; each y is from 0 to about 10 and the sum of the y values is from 0 to about 15; and X is any compatible anion.

[0298] Certain quaternary ammonium surfactants may also be suitable as cationic co-surfactants, and examples of those are described in International Publication WO 98/39403. Examples of suitable quaternary ammonium compounds include coconut trimethyl ammonium chloride or bromide; coconut methyl dihydroxyethyl ammonium chloride or bromide; decyl triethyl ammonium chloride; decyl dimethyl hydroxyethyl ammonium chloride or bromide; C.sub.12-15 dimethyl hydroxyethyl ammonium chloride or bromide; coconut dimethyl hydroxyethyl ammonium chloride or bromide; myristyl trimethyl ammonium methyl sulphate; lauryl dimethyl benzyl ammonium chloride or bromide; lauryl dimethyl (ethenoxy).sub.4 ammonium chloride or bromide. Other cationic surfactants have been described in U.S. Pat. Nos. 4,228,044, 4,228,042, 4,239,660, 4,260,529, 6,136,769, 6,004,922, 6,022,844, and 6,221,825, International Publications WO 98/35002, WO 98/35003, WO 98/35004, WO 98/35005, WO 98/35006, and WO 00/47708, as well as European Patent Application EP 000,224. When included herein, the surfactant/detergent and the cleaning/treatment compositions of the present invention can comprise, for example, from about 0.2 wt. % to about 25 wt. %, preferably from about 1 wt. % to about 8 wt. %, of cationic surfactants.

[0299] In certain embodiments, co-surfactants can comprise nonionic surfactants. Polyethylene, polypropylene, and polybutylene oxide condensates of alkyl phenols are suitable, with the polyethylene oxide condensates being preferred. They include the condensation products of alkyl phenols having an alkyl group having about 6 to about 14 carbon atoms, preferably from about 8 to about 14 carbon atoms, in either a straight-chain or branched-chain configuration, with alkylene oxide. In particular embodiments, the ethylene oxide is present in an amount of from about 2 to about 25 moles (e.g., from about 3 to about 15 moles) of ethylene oxide per mole of alkyl phenol. Commercially available nonionic surfactants of this type include Igepal.TM. CO-630 (The GAF Corp.), Triton.TM. X-45, X-114, X-100 and X-102 (Dow Chemicals). These surfactants are commonly referred to as alkylphenol alkoxylates (e.g., alkyl phenol ethoxylates).

[0300] Moreover, condensation products of primary and secondary aliphatic alcohols with from about 1 to about 25 moles of ethylene oxide are suitable nonionic co-surfactants. The alkyl chain of the aliphatic alcohol can be straight or branched, primary or secondary, and generally can contain about 8 to about 22 (e.g., about 8 to about 20, or about 10 to about 18) carbon atoms with about 2 to about 10 moles (e.g., about 2 to about 5 moles) of ethylene oxide per mole of alcohol present in the condensation products. Examples of commercially available nonionic surfactants of this type include Tergitol.TM. 15-S-9, Tergitol.TM. 24-L-6 NMW (Union Carbide); Neodol.TM. 45-9, Neodol.TM. 23-3, Neodol.TM. 45-7, Neodol.TM. 45-5 (Shell Chemical), Kyro.TM. EOB (Procter & Gamble), and Genapol LA 030 or 050 (Hoechst).

[0301] Further examples of nonionic co-surfactants include C.sub.12-C.sub.18 alkyl ethoxylates (e.g., NEODOL.RTM. nonionic surfactants (Shell)), C.sub.6-C.sub.12 alkyl phenol alkoxylates wherein the alkoxylate units are a mixture of ethyleneoxy and propyleneoxy units, C.sub.12-C.sub.18 alcohol and C.sub.6-C.sub.12 alkyl phenol condensates with ethylene oxide/propylene oxide block alkyl polyamine ethoxylates (e.g., PLURONIC.RTM. (BASF)), C.sub.14-C.sub.22 mid-chain branched alcohols as described in U.S. Pat. No. 6,150,322, C.sub.14-C.sub.22 mid-chain branched alkyl alkoxylates, BAE.sub.x, wherein x is from 1-30, as described in U.S. Pat. Nos. 6,153,577, 6,020,303 and 6,093,856, alkylpolysaccharides as described in U.S. Pat. No. 4,565,647, alkylpolyglycosides as described in U.S. Pat. No. 4,483,780 and U.S. Pat. No. 4,483,779, polyhydroxy detergent acid amides as described in U.S. Pat. No. 5,332,528, or ether capped poly(oxyalkylated) alcohol surfactants as described in U.S. Pat. No. 6,482,994 and International Publication WO 01/42408.

[0302] Semi-polar nonionic surfactants can also be suitable. They include, e.g., water-soluble amine oxides containing 1 alkyl moiety of from about 10 to about 18 carbon atoms and 2 moieties selected from alkyl or hydroxyalkyl moieties containing about 1 to about 3 carbon atoms, water-soluble phosphine oxides containing 1 alkyl moiety of about 10 to about 18 carbon atoms and 2 moieties selected from alkyl or hydroxyalkyl moieties containing about 1 to about 3 carbon atoms; and water-soluble sulfoxides containing 1 alkyl moiety of about 10 to about 18 carbon atoms and a moiety selected from alkyl or hydroxyalkyl moieties of about 1 to about 3 carbon atoms. Semi-polar nonionic surfactants have been described in, e.g., International Publication WO 01/32816, U.S. Pat. Nos. 4,681,704 and 4,133,779.

[0303] Moreover, alkylpolysaccharides, such as those described in U.S. Pat. No. 4,565,647, having a hydrophobic group containing about 6 to about 30 carbon atoms (e.g., from about 10 to about 16 carbon atoms) and a polysaccharide, can also be suitable semi-polar nonionic co-surfactants. Others have been described in, for example, International Publication WO 98/39403. When included herein, the cleaning/treatment compositions of the present invention can comprise, for example, about 0.2 wt. % or more (e.g., about 1 wt. % or more, about 5 wt. % or more, or about 8 wt. % or more) of such semi-polar nonionic surfactants. For example, the cleaning compositions of the invention can comprise about 0.2 wt. % to about 15 wt. % (e.g., about 1 wt. % to about 10 wt. %) of semi-polar nonionic surfactants.

[0304] In certain embodiments, the co-surfactants comprise ampholytic surfactants. Ampholytic surfactants can be broadly described as aliphatic derivatives of secondary or tertiary amines, or aliphatic derivatives of heterocyclic secondary and tertiary amines in which the aliphatic radical can be straight- or branched-chain. One of the aliphatic substituents can contain at least about 8 carbon atoms (e.g., from about 8 to about 18 carbon atoms), and at least another contains an anionic water-solubilizing group, e.g., carboxy, sulfonate, or sulfate. Ampholytic surfactants have been described in, for example, U.S. Pat. No. 3,929,678. When included therein, a cleaning composition of the invention can comprise, for example, about 0.2 wt. % to about 15 wt. % (e.g., about 1 wt. % to about 10 wt. %) of ampholytic surfactants.

[0305] In certain other embodiments, especially in personal care cleaning/treatment compositions, zwitterionic surfactants are included as co-surfactants. These surfactants can be broadly described as derivatives of secondary and tertiary amines, derivatives of heterocyclic secondary and tertiary amines, or derivatives of quaternary ammonium, quaternary phosphonium or tertiary sulfonium compounds. Zwitterionic surfactants have been described in, for example, U.S. Pat. No. 3,929,678. When included therein, a surfactant/detergent or cleaning/treatment composition of the invention can comprise, for example, about 0.2 wt. % to about 15 wt. % (e.g., about 1 wt. % to about 10 wt. %) of zwitterionic surfactants.

[0306] In further embodiments, primary or tertiary amines can be included as co-surfactants. Suitable primary amines include amines according to the formula R.sup.1NH.sub.2 wherein R.sup.1 is a C.sub.6-C.sub.12, preferably C.sub.6-C.sub.10, alkyl chain; or R.sup.4X(CH.sub.2)n, wherein X is --O--, --C(O)NH-- or --NH--, R.sup.4 is a C.sub.6-C.sub.12 alkyl chain, and n is from 1 to 5 (e.g., 3). The alkyl chain of R.sup.1 can be straight or branched, and can be interrupted with up to 12, but preferably less than 5, ethylene oxide moieties. Preferred amines include n-alkyl amines, selected from, for example, 1-hexylamine, 1-octylamine, 1-decylamine and laurylamine, C.sub.8-C.sub.10 oxypropylamine, octyloxypropylamine, 2-ethylhexyl-oxypropylamine, lauryl amido propylamine or amido propylamine. Suitable tertiary amines include those having the formula R.sup.1R.sup.2R.sup.3N wherein R.sup.1 and R.sup.2 are C.sub.1-C.sub.8 alkyl chains, R.sup.3 is either a C.sub.6-C.sub.12, preferably C.sub.6-C.sub.10, alkyl chain, or R.sup.3 is R.sup.4X(CH.sub.2)n, whereby X is --O--, --C(O)NH-- or --NH--, R.sup.4 is a C.sub.4-C.sub.12, n is between 1 and 5 (e.g., 2, 3, or 4), R.sup.5 is H or C.sub.1-C.sub.2 alkyl, and x is between 1 and 6. R.sup.3 and R.sup.4 may be linear or branched. The alkyl chain of R.sup.3 can be interrupted with up to 12, but preferably less than 5, ethylene oxide moieties. Preferred tertiary amines include, for example, 1-hexylamine, 1-octylamine, 1-decylamine, 1-dodecylamine, n-dodecyldimethylamine, bishydroxyethylcoconutalkylamine, oleylamine(7)ethoxylated, lauryl amido propylamine, and cocoamido propylamine. Other useful detersive surfactants have been described in the prior art, for example, in U.S. Pat. Nos. 3,664,961, 3,919,678, 4,222,905, and 4,239,659.

[0307] In some embodiments, the detergent/cleaning composition of the invention comprises greater than about 5 wt. % anionic surfactant and/or less than about 25 wt. % nonionic surfactant. For example, the composition comprises greater than about 10 wt. % anionic surfactants. In another example, the composition comprises less than 15%, more preferably, less than 12% nonionic surfactants.

[0308] The total amount of surfactants included in a cleaning composition of the invention is typically about 0.1 wt. % or more (e.g., about 1 wt. % or more, about 10 wt. % or more, about 25 wt. % or more, about 50 wt. % or more, about 60 wt. % or more, about 70 wt. % or more). An exemplary cleaning composition of the invention comprises about 0.1 wt. % to about 80 wt. % (e.g., about 1 wt. % to about 50 wt. %, about 10 wt. % to about 40 wt. %, about 20 wt. % to about 35 wt. %) of total surfactants, including the microbially produced branched fatty alcohols and/or derivatives thereof and co-surfactants.

[0309] One criterion by which the type(s) and amount(s) of surfactants to be included in cleaning compositions can be determined is compatibility with the enzyme components present in the cleaning compositions. For example, in liquid or gel compositions, the cleaning composition (including all the surfactants, which are, for example, pre-formulated into a surfactant package) is prepared such that it promotes, or at least does not degrade, the stability of any enzyme in the cleaning composition.

[0310] A surfactant composition of the present invention, or a surfactant package, which can be formulated and subsequently included in a cleaning composition, can be in any form, for example, a liquid; a solid such as a powder, granules, agglomerate, paste, tablet, pouches, bar; a gel; an emulsion; or in a suitable form to be delivered in dual-compartment containers. The composition can also be formulated into a spray or foam detergent, premoistened wipes (e.g., the cleaning composition in combination with a nonwoven material as described, for example, in U.S. Pat. No. 6,121,165), dry wipes (e.g., a cleaning composition in combination with a nonwoven material, activated with water by a consumer, as described, for example, in U.S. Pat. No. 5,980,931), and other homogeneous or multiphase consumer cleaning product forms.

Cleaning Compositions

[0311] Surfactant compositions comprising branched fatty alcohols and/or derivatives thereof, e.g., sulfate, alkoxylated or alkoxylated sulfate derivatives, are particularly suitable as soil detachment-promoting ingredients of laundry detergents, dishwashing liquids and powders, and various other cleaning compositions. They exhibit good dissolving power, especially when faced with greasy soils, and it is particularly advantageous that they display outstanding soil-detaching power even at low washing temperatures.

[0312] The branched fatty alcohol/derivative compositions of the present invention can be included or blended into a surfactant package as described above, which comprises about 0.0001 wt. % to about 100 wt. % of one or more branched fatty alcohols and/or derivatives produced by a genetically engineered host cell or microbe. That surfactant package can then be blended into a cleaning composition to impart detergency and cleaning power to the cleaning composition. In alternative embodiments, the branched fatty alcohols and/or derivatives thereof produced by the host cell or microbe can be blended into a cleaning composition directly, in an amount of about 0.001 wt. % or more (e.g., about 0.001 wt. % or more, about 0.1 wt. % or more, about 1 wt. % or more, about 10 wt. % or more, about 20 wt. % or more, or about 40 wt. % or more) based on the total weight of the cleaning composition. For example, the branched fatty alcohols and/or derivatives thereof can be blended into a composition in an amount of about 0.001 wt. % to about 50 wt. % (e.g., about 0.01 wt. % to about 45 wt. %, about 0.1 wt. % to about 40 wt. %, about 1 wt. % to about 35 wt. %). Accordingly, a cleaning composition of the present invention, in either a solid form (e.g., a tablet, granule, powder, or compact), or a liquid form (e.g., a fluid, gel, paste, emulsion, or concentrate) can comprise about 0.001 wt. % to about 50 wt. % of microbially produced branched fatty alcohols and/or derivatives thereof. For example, a cleaning composition of the invention can comprise about 0.5 wt. % to about 44 wt. % of microbially produced branched fatty alcohols and/or derivatives thereof. Preferably, the cleaning composition comprises about 1 wt. % to about 30 wt. % of microbially produced branched fatty alcohols and/or derivatives.

[0313] Alternatively, a cleaning composition of the present invention can comprise about 0.001 wt. % to about 80 wt. % of a surfactant package formulated to comprise about 0.001 wt. % to about 100 wt. % of microbially produced branched fatty alcohols and/or derivatives. For example, a cleaning composition of the present invention can comprise about 0.1 wt. % to about 50 wt. % of such a surfactant package. The surfactant package can comprise other surfactants (i.e., co-surfactants), which can include surfactants derived from similar (e.g., microbially-produced surfactant) or different sources (e.g., petroleum-derived surfactants). In a particular embodiment, however, the surfactant package can consist mostly or exclusively of branched fatty alcohols and/or derivatives produced by a host cell or a microbe as described herein.
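When the branched material reaches the cleaning composition by way of a surfactant package, its level in the finished composition is the product of two weight fractions: the package level in the composition and the branched fatty alcohol/derivative level in the package. The short sketch below illustrates that arithmetic only; the function name and the example figures are hypothetical and are not taken from any exemplified formulation.

```python
def net_branched_wt_percent(package_in_composition: float,
                            branched_in_package: float) -> float:
    """Wt. % of branched fatty alcohols/derivatives in the finished composition.

    Both arguments are weight percentages (0 to 100): the surfactant package
    as a share of the cleaning composition, and the branched material as a
    share of the surfactant package.
    """
    for value in (package_in_composition, branched_in_package):
        if not 0.0 <= value <= 100.0:
            raise ValueError("weight percentages must lie between 0 and 100")
    return package_in_composition * branched_in_package / 100.0


if __name__ == "__main__":
    # Hypothetical example: a package used at 50 wt. % of the composition,
    # itself containing 60 wt. % branched fatty alcohol derivatives.
    print(net_branched_wt_percent(50.0, 60.0))  # prints 30.0
```

In this hypothetical case the finished composition contains 30 wt. % of the branched material, which lies within the about 0.001 wt. % to about 50 wt. % range recited above for a cleaning composition.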

Industrial Cleaning Compositions, Household Cleaning Compositions & Personal Care Cleaning Compositions

[0314] In certain embodiments, the cleaning composition of the present invention is a liquid or solid laundry detergent composition. In some embodiments, the cleaning composition is a hard surface cleaning composition, wherein the hard surface cleaning composition preferably impregnates a nonwoven substrate. As used herein, "impregnate" means that the hard surface cleaning composition is placed in contact with a nonwoven substrate such that at least a portion of the nonwoven substrate is penetrated by the hard surface cleaning composition. For example, the hard surface cleaning composition preferably saturates the nonwoven substrate. In other embodiments, the cleaning composition of the present invention is a car care composition, which is useful for cleaning various surfaces such as hard wood, tile, ceramic, plastic, leather, metal, and/or glass. In some embodiments, the cleaning composition is a dish-washing composition, such as, for example, a liquid hand dishwashing composition, a solid automatic dishwashing composition, a liquid automatic dishwashing composition, and a tab/unit dose form automatic dishwashing composition.

[0315] In further embodiments, the cleaning composition can be used in industrial environments for cleaning of various equipment, machinery, and for use in oil drilling operations. For example, the cleaning composition of the present invention can be particularly suited in environments wherein it comes into contact with free hardness and in compositions that require hardness tolerant surfactant systems, such as when used to aid oil drilling.

[0316] In some embodiments, the cleaning composition of the invention can be formulated into personal or pet care compositions such as shampoos, body washes, or liquid or solid soaps.

[0317] Common cleaning adjuncts applicable to most cleaning compositions, including household cleaning compositions, personal care compositions, and the like, include builders, enzymes, polymers, suds boosters, suds suppressors (antifoam), dyes, fillers, germicides, hydrotropes, anti-oxidants, perfumes, pro-perfumes, enzyme stabilizing agents, pigments, and the like. In some embodiments, the cleaning composition is a liquid cleaning composition, wherein the composition comprises one or more selected from solvents, chelating agents, dispersants, and water. In other embodiments, the cleaning composition is a solid, wherein the composition further comprises, for example, an inorganic filler salt. Inorganic filler salts are conventional ingredients of solid cleaning compositions, present in substantial amounts, varying from, for example, about 10 wt. % to about 35 wt. %. Suitable filler salts include, for example, alkali and alkaline-earth metal salts of sulfates and chlorides. An exemplary filler salt is sodium sulfate.

[0318] Household cleaning compositions, including, e.g., laundry detergents and household surface cleaners, typically comprise certain additional, in some embodiments, more specialized, ingredients or cleaning adjuncts selected from one or more of: bleaches, bleach activators, catalytic materials, suds boosters, suds suppressors (antifoams), diverse active ingredients or specialized materials such as dispersant polymers (e.g., dispersant polymers sold by BASF or Dow Chemicals), silvercare, anti-tarnish and/or anti-corrosion agents, dyes, germicides, alkalinity sources, hydrotropes, anti-oxidants, enzyme stabilizing agents, pro-perfumes, perfumes, solubilizing agents, carriers, processing aids, pigments, and, for liquid formulations, solvents, chelating agents, dye transfer inhibiting agents, dispersants, brighteners, dyes, structure elasticizing agents, fabric softeners, anti-abrasion agents, hydrotropes, processing aids, and other fabric care agents. The cleaning adjuncts particularly useful for household cleaning compositions and the levels of use have been described in, e.g., U.S. Pat. Nos. 5,576,282, 6,306,812 and 6,326,348. A comprehensive list of suitable laundry or other household cleaning adjuncts can be found, e.g., in International Publication WO 99/05245.

[0319] Personal/pet or beauty care cleaning compositions including, e.g., shampoos, facial cleansers, hand sanitizers, bodywash, and the like, can also comprise, in some embodiments, other more specialized adjuncts, including, for example, conditioning agents such as vitamins, silicone, silicone emulsion stabilizing components, cationic cellulose or polymers such as Guar polymers, anti-dandruff agents, antibacterial agents, dispersed gel network phase, suspending agents, viscosity modifiers, dyes, non-volatile solvents or diluents (water soluble or insoluble), foam boosters, pediculocides, pH adjusting agents, perfumes, preservatives, chelants, proteins, skin active agents, sunscreens, UV absorbers, minerals, herbal/fruit/food extracts, sphingolipid derivatives or synthetic derivatives, and clay.

Common Adjuncts

[0320] (1) Enzymes

[0321] Various known detersive enzymes can be blended into a cleaning composition of the present invention. Suitable enzymes include, e.g., proteases, amylases, lipases, cellulases, pectinases, mannanases, arabinases, galactanases, xylanases, oxidases (e.g., laccases), peroxidases, and/or mixtures thereof. They can provide enhanced cleaning performance and/or fabric care benefits. In general, just as the selection of the type and amount of surfactants to be formulated into a cleaning composition should take account of the enzymes therein, the types of enzyme chosen to be included in the composition should take account of the other components in the composition (including the surfactants). Considerations may include, e.g., the pH-optimum of the overall composition, the presence or absence of enzyme stabilization agents, etc. The enzymes should be present in the cleaning compositions in effective amounts.

[0322] Suitable proteases include those of animal, vegetable or microbial origin. Microbial origin is preferred. Chemically modified or engineered mutants (e.g., those described in International Publications WO 92/19729, 98/20115, 98/20116, and 98/34946) can also be included. Suitable proteases can be a serine protease or a metallo protease, preferably an alkaline microbial protease or a trypsin-like protease. Examples of alkaline proteases are subtilisins, especially those derived from Bacillus, e.g., subtilisin Novo, subtilisin Carlsberg, subtilisin 309, subtilisin 147 and subtilisin 168 (as described in International Publications WO 89/06279 and WO 05/103244). Other suitable serine proteases include those from Micrococcineae sp. and those from Cellulomonas sp. and variants thereof as, e.g., described in International Publication WO 05/052146. Examples of trypsin-like proteases include trypsin (e.g., of porcine or bovine origin) and Fusarium proteases such as those described in International Publications WO 89/06270 and WO 94/25583. Many proteases are commercially available, including, e.g., Alcalase.TM., Savinase.TM., Primase.TM., Duralase.TM., Esperase.TM., Coronase.TM., Polarzyme.TM., Kannase.TM. (Novozymes A/S), Maxatase.TM., Maxacal.TM., Maxapem.TM., Properase.TM., Purafect.TM., Purafect Prime.TM., Purafect OxP.RTM., FNA, FN2, FN3, and FN4 (Genencor Int'l Inc.).

[0323] Suitable lipases include those of bacterial or fungal origin. For example, suitable lipases can be derived from yeast, from genera such as Candida, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia, or derived from a filamentous fungus, such as Acremonium, Aspergillus, Aureobasidium, Cryptococcus, Filobasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Piromyces, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Thermomyces or Trichoderma. Many chemically modified lipases are also suitable, including, for example, those from Humicola (e.g., a modified lipase from H. lanuginosa as described in EP 258 068 and 305 216, a modified lipase from H. insolens as described in International Publication WO 96/13580), those from Pseudomonas (e.g., a modified lipase from P. alcaligenes, or from P. pseudoalcaligenes as described in EP 218 272, a modified lipase from P. cepacia as described in EP 331 376, a modified lipase from P. stutzeri as described in GB 1,372,034, a modified lipase from P. fluorescens or Pseudomonas sp. strain SD 705, as described in International Publications WO 95/06720 and WO 96/27002, a modified lipase from P. wisconsinensis as described in International Publication WO 96/12012), and those from Bacillus (e.g., a modified lipase from B. subtilis as described in Dartois et al., Biochimica et Biophysica Acta, 1131, 253-360 (1993), a modified lipase from B. stearothermophilus as described in JP Application 64/744992, or a modified lipase from B. pumilus as described in International Publication WO 91/16422). Other examples of lipase variants include those described in International Publications WO 92/05249, WO 94/01541, WO 95/35381, WO 96/00292, WO 95/30744, WO 94/25578, WO 95/14783, WO 95/22615, WO 97/04079, WO 97/07202, and EP 407 225 and 260 105.

[0324] A number of lipase enzymes, which can be included in a cleaning composition of the invention, are commercially available. They include Lipolase.TM., Lipolase.TM. Ultra and Lipex.TM. (Novozymes A/S). Suitable amylases (.alpha. and/or .beta.) include those of bacterial or fungal origin. Chemically modified or engineered mutant amylases can also be suitably included in a cleaning composition of the invention. Amylases include, for example, .alpha.-amylases obtained from Bacillus (for example, from a special strain of B. licheniformis as described in GB Patent 1,296,839). Various mutant amylases, which can be suitably included in a cleaning composition, have been described in International Publications WO 94/02597, WO 94/18314, WO 96/23873, and WO 97/43424. A number of amylases, which can be included in a cleaning composition of the present invention, are commercially available. They include Duramyl.TM., Termamyl.TM., Stainzyme.TM., Stainzyme Ultra.TM., Stainzyme Plus.TM., Fungamyl.TM. and BAN.TM. (Novozymes A/S), Rapidase.TM. and Purastar.TM. (Genencor Int'l Inc.).

[0325] Suitable cellulases include those of bacterial or fungal origin. Chemically modified or engineered mutant cellulases can also be suitably included in a cleaning composition of the invention. Cellulases include, for example, those obtained from the genera Bacillus, Pseudomonas, Humicola (e.g., from Humicola insolens), Fusarium (e.g., from Fusarium oxysporum), Thielavia, Acremonium, and Myceliophthora, as described in U.S. Pat. Nos. 4,435,307, 5,648,263, 5,691,178, 5,776,757, and International Publication WO 89/09259. Especially suitable cellulases are the alkaline or neutral cellulases that impart color care benefits. Examples of such cellulases include those described in EP 0 495 257, 0 531 372, and International Publications WO 96/11262, WO 96/29397, and WO 98/08940. A number of cellulases, especially those that provide added color care benefits, are commercially available and can be included in a cleaning composition of the invention, especially in, for example, a laundry detergent composition. Commercially available cellulases include, e.g., Renozyme.TM., Celluclean.TM., Endolase.TM., Celluzyme.TM., and Carezyme.TM. (Novozymes A/S), Clazinase.TM. and Puradax HA.TM. (Genencor International Inc.), and KAC-500(B).TM. (Kao Corporation).

[0326] Suitable peroxidases/oxidases include those of plant, bacterial or fungal origin. Chemically modified or engineered mutant peroxidases/oxidases can also be suitably included in a cleaning composition of the invention. Useful peroxidases include, for example, those obtained from the genus Coprinus (e.g., a peroxidase from C. cinereus and variants thereof as described in International Publications WO 93/24618, WO 95/10602, and WO 98/15257). Commercially available peroxidases include, for example, Guardzyme.TM. (Novozymes A/S).

[0327] Suitable enzymes described above can be present in a cleaning composition of the present invention at levels of about 0.00001 wt. % or higher (e.g., about 0.01 wt % or higher, about 0.1 wt. % or higher, about 0.5 wt. % or higher, or about 1 wt. % or higher). For example, one or more such enzymes can be present in a cleaning composition of the invention in an amount of about 0.00001 wt. % to about 2 wt. % (e.g., about 0.0001 wt. % to about 1 wt. %, about 0.001 wt. % to about 0.5 wt. %) based on the total weight of the cleaning composition. In certain embodiments, the enzyme(s) can be present or used at very low levels, for example, at about 0.001 wt. % or lower. In alternative embodiments, enzyme(s) can be formulated, for example, into a heavier duty laundry detergent composition, at about 0.1 wt. % and higher, for example, at about 0.5 wt. % or higher.

[0328] 2) Enzyme Stabilizers

[0329] In certain embodiments, the cleaning composition of the present invention, which comprises one or more enzymes, further comprises one or more enzyme stabilizers. For example, the enzymes employed in the cleaning composition can be stabilized by the presence of water-soluble sources of calcium and/or magnesium ions in the finished compositions that provide such ions to the enzymes. Known stabilizing agents include, for example, a polyol such as propylene glycol or a glycerol, a sugar or a sugar alcohol, a lactic acid, a boric acid, a boric acid derivative such as an aromatic borate ester, and a phenyl boronic acid derivative such as a 4-formylphenyl boronic acid. The enzyme stabilizers can be incorporated into the cleaning composition according to known methods, such as, for example, those described in International Publications WO 92/19709 and WO 92/19708.

[0330] 3) Builders

[0331] Cleaning compositions of the present invention optionally comprise one or more detergent builders or builder systems. When a builder is used, the subject composition can comprise, e.g., at least about 1 wt. % (e.g., at least about 1 wt. %, at least about 5 wt. %, at least about 10 wt. %, at least about 20 wt. %, at least about 30 wt. %, at least about 40 wt. %, at least about 50 wt. %, or more) of one or more builders. For example, a solid cleaning composition of the present invention can comprise, e.g., about 1 wt. % to about 60 wt. % (e.g., about 5 wt. % to about 50 wt. %, about 10 wt. % to about 40 wt. %, about 15 wt. % to about 30 wt. %) of one or more builders or a builder system. For example, a liquid cleaning composition of the present invention can comprise about 0 wt. % to about 10 wt. % of one or more detergency builders.

[0332] Various known builder materials can be used, including, e.g., aluminosilicate materials, silicates, polycarboxylates, alkyl- or alkenyl-succinic acid, and fatty acids, materials such as ethylenediamine tetraacetate, diethylene triamine pentamethyleneacetate, and metal ion sequestrants such as aminopolyphosphonates, particularly ethylenediamine tetramethylene phosphonic acid and diethylene triamine pentamethylenephosphonic acid. Particularly, builder materials such as calcium sequestrant materials, precipitating materials, calcium ion-exchange materials, polycarboxylate materials, citrate builders, succinic acid builders, aminocarboxylates, and mixtures thereof are preferred.

[0333] Examples of calcium sequestrant builder materials include alkali metal polyphosphates, such as sodium tripolyphosphate, and organic sequestrants, such as ethylene diamine tetra-acetic acid. Examples of precipitating builder materials include sodium orthophosphate and sodium carbonate. Examples of calcium ion-exchange builder materials include the water-insoluble crystalline or amorphous aluminosilicates, of which zeolites are the best known, e.g., zeolite A, zeolite B (also known as zeolite P), zeolite C, zeolite X, zeolite Y, and also the zeolite P-type as described in, e.g., EP 0 384 070.

[0334] Citrate builders, including, for example, citric acid and soluble salts thereof (particularly the sodium salt), are polycarboxylate builders of particular importance for heavy duty liquid detergent formulations due to their availability from renewable resources and their biodegradability. Oxydisuccinates are also especially useful in such compositions and combinations. Useful succinic acid builders can also be C.sub.5-C.sub.20 alkyl and alkenyl succinic acids and salts thereof, including laurylsuccinate, myristylsuccinate, palmitylsuccinate, 2-dodecenylsuccinate, and 2-pentadecenylsuccinate, with dodecenylsuccinic acid being particularly preferred.

[0335] Suitable polycarboxylate builders include, for example, cyclic compounds, particularly alicyclic compounds, such as those described in U.S. Pat. Nos. 3,308,067, 3,723,322, 3,835,163, 3,923,679, 4,102,903, 4,120,874, 4,144,226, and 4,158,635.

[0336] Ether hydroxypolycarboxylates, copolymers of maleic anhydride with ethylene or vinyl methyl ether, 1,3,5-trihydroxybenzene-2,4,6-trisulphonic acid, and carboxymethyl oxysuccinic acid, various alkali metal, ammonium, and substituted ammonium salts of polyacetic acids such as ethylenediamine tetraacetic acid and nitrilotriacetic acid, and polycarboxylates such as mellitic acid, succinic acid, oxy-disuccinic acid, polymaleic acid, benzene 1,3,5-tricarboxylic acid, carboxymethyloxy-succinic acid, and soluble salts thereof can be used as builders. Other nitrogen-containing, phosphorus-free aminocarboxylates are sometimes used. Specific examples include ethylene diamine disuccinic acid and salts thereof (ethylene diamine disuccinates, EDDS), ethylene diamine tetraacetic acid and salts thereof (ethylene diamine tetraacetates, EDTA), and diethylene triamine penta acetic acid and salts thereof (diethylene triamine penta acetates, DTPA). In particular embodiments of a liquid composition, 3,3-dicarboxy-4-oxa-1,6-hexanedioates and related compounds as described in U.S. Pat. No. 4,566,984 can be suitable.

[0337] 4) Chelating Agents

[0338] Cleaning compositions of the present invention can optionally comprise one or a mixture of more than one copper, iron and/or manganese chelating agents. When such an agent is used, the subject cleaning composition can comprise, for example, about 0.005 wt. % or more (e.g., about 0.01 wt. % or more, about 1 wt. % or more, about 5 wt. % or more, about 10 wt. % or more) chelating agents. For example, a cleaning composition of the invention comprises about 0.005 wt. % to about 15 wt. % (e.g., about 0.01 wt. % to about 12 wt. %, about 0.1 wt. % to about 10 wt. %, about 1 wt. % to about 8 wt. %, about 2 wt. % to about 6 wt. %) chelating agents.

[0339] Suitable chelating agents include, e.g., amino carboxylates, amino phosphonates, polyfunctionally-substituted aromatic chelating agents, or mixtures, which are capable of removing copper, iron, or manganese ions from washing mixtures by forming soluble chelates.

[0340] Amino carboxylates include, for example, ethylenediaminetetraacetates, N-hydroxyethylethylenediaminetriacetates, nitrilotriacetates, ethylenediamine tetrapropionates, triethylenetetraaminehexaacetates, diethylenetriamine penta-acetates, and ethanoldiglycines, and alkali metal, ammonium, and substituted ammonium salts thereof.

[0341] Amino phosphonates are selectively used in cleaning compositions because they increase the amount of total phosphorus. For some applications wherein the amount of total phosphorus in a cleaning composition is limited, amino phosphonates may not be suitable chelating agents or should be used in low amounts. Amino phosphonates include, e.g., ethylenediamine tetrakis (methylenephosphonates). The amino phosphonates preferably do not contain alkyl or alkenyl groups with more than about 6 carbon atoms.

[0342] Suitable polyfunctionally-substituted aromatic chelating agents have been described in, e.g., U.S. Pat. No. 3,812,044. Exemplary polyfunctionally-substituted aromatic chelating agents include a dihydroxydisulfobenzene, such as a 1,2-dihydroxy-3,5-disulfobenzene.

[0343] In some embodiments, biodegradable chelators can be included in a cleaning composition of the invention. An exemplary biodegradable chelator is ethylenediamine disuccinate ("EDDS"), especially the [S,S] isomer as described in U.S. Pat. No. 4,704,233.

[0344] The compositions herein may also contain water-soluble methyl glycine diacetic acid (MGDA) salts (or acid form) as a chelant or co-builder useful with, for example, insoluble builders such as zeolites, layered silicates and the like.

[0345] 5) Hydrotropes

[0346] Hydrotropes can be optionally included in cleaning compositions of the present invention to improve the physical and chemical stability of the compositions. Suitable hydrotropes include sulfonated hydrotropes, which include, for example, alkyl aryl sulfonates, or alkyl aryl sulfonic acids. Alkyl aryl sulfonates can be sodium, potassium, calcium, or ammonium xylene sulfonates; sodium, potassium, calcium, or ammonium toluene sulfonates; sodium, potassium, calcium, or ammonium cumene sulfonates; sodium, potassium, calcium, or ammonium substituted or unsubstituted naphthalene sulfonates, and mixtures thereof. Preferred among these are the sodium salts. Alkyl aryl sulfonic acids can be xylenesulfonic acid, toluenesulfonic acid, cumenesulfonic acid, substituted or unsubstituted naphthalenesulfonic acid, or salts thereof. In certain embodiments, a mixture of xylenesulfonic acid and p-toluene sulfonate can be used.

[0347] If present, a cleaning composition of the present invention comprises hydrotropes in an amount of about 0.01 wt. % or more (e.g., about 0.02 wt. % or more, about 0.05 wt. % or more, about 0.1 wt. % or more, about 1 wt. % or more, about 5 wt. % or more, about 10 wt. % or more, or about 15 wt. % or more). On the other hand, a cleaning composition of the present invention comprises hydrotropes in an amount of no more than about 20 wt. % (e.g., no more than about 20 wt. %, no more than about 15 wt. %, no more than about 10 wt. %, no more than about 5 wt. %, no more than about 1 wt. %). In certain embodiments, the cleaning composition can comprise hydrotropes in an amount of about 0.01 wt. % to about 20 wt. % (e.g., about 0.02 wt. % to about 18 wt. %, about 0.05 wt. % to about 15 wt. %, about 0.1 wt. % to about 10 wt. %, about 1 wt. % to about 5 wt. %), based on the total weight of the cleaning composition.

[0348] 6) Rheology Modifier

[0349] A cleaning composition of the present invention, when in the form of a liquid, can suitably comprise a rheology modifier to provide a matrix that is "shear-thinning." A shear-thinning fluid, as it is understood by those skilled in the art, is a fluid the viscosity of which decreases as shear is applied. Thus, at rest, for example, during storage or shipping of a composition, the liquid matrix of the composition preferably has a relatively high viscosity. When shear is applied to the composition, however, such as in the act of pouring or squeezing the composition from its container, the viscosity of the matrix is lowered to the extent that dispensing of the fluid product is easily and readily accomplished.
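
As an illustration of the shear-thinning behavior described above, the following minimal sketch uses the power-law (Ostwald-de Waele) relation, a common textbook idealization that is not part of this disclosure, in which apparent viscosity equals K multiplied by the shear rate raised to the power (n - 1), with n below 1 for a shear-thinning fluid. The function name and the values chosen for the consistency K and the flow index n are hypothetical and serve only to show the trend.

```python
def apparent_viscosity(shear_rate, consistency_k, flow_index_n):
    """Apparent viscosity under the power-law model: K * shear_rate**(n - 1).

    Assumption: the liquid matrix is idealized as a power-law fluid.
    A flow index n below 1 corresponds to shear-thinning behavior.
    """
    if shear_rate <= 0:
        raise ValueError("shear_rate must be positive")
    return consistency_k * shear_rate ** (flow_index_n - 1.0)


# Hypothetical parameters, chosen only for illustration.
K = 5.0   # consistency index (Pa.s^n)
N = 0.4   # flow index; n < 1 means shear-thinning

near_rest = apparent_viscosity(0.1, K, N)     # low shear, e.g. storage
pouring = apparent_viscosity(100.0, K, N)     # high shear, e.g. dispensing

# For a shear-thinning fluid the viscosity near rest exceeds that during pouring.
print(near_rest > pouring)
```

Under this idealization, a larger gap between the two values corresponds to a matrix that holds its structure in storage yet dispenses easily.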

[0350] Various materials that are capable of forming shear-thinning fluids when combined with water or other aqueous liquids are known. One type of useful structuring agent for this purpose comprises non-polymeric (except for conventional alkoxylation) crystalline hydroxy-functional materials that can form thread-like structuring systems throughout the liquid matrix when crystallized within the matrix in situ. Such materials include, e.g., crystalline hydroxyl-containing fatty acids, fatty esters, or fatty waxes. Specific examples of crystalline hydroxyl-containing rheology modifiers include castor oil and derivatives. Preferred are hydrogenated castor oil derivatives such as hydrogenated castor oil and hydrogenated castor wax. A number of these materials are commercially available, including, for example, THIXCIN.RTM. (Elementis Specialties), 1,4-di-O-benzyl-D-threitol in the R,R and S,S forms and any mixtures, optically active or not, and others described in, for example, U.S. Pat. No. 6,080,708 and International Publication WO 02/40627.

[0351] Suitable polymeric rheology modifiers include those of the polyacrylate, polysaccharide or polysaccharide derivative type. Polysaccharide derivatives typically used as rheology modifiers comprise polymeric gum materials. Such gums include pectin, alginate, arabinogalactan, carrageenan, gellan gum, xanthan gum and guar gum. Another suitable rheology modifier is a combination of a solvent and a polycarboxylate polymer. The solvent can be, for example, an alkylene glycol, more preferably dipropylene glycol. For example, the solvent can comprise a mixture of dipropylene glycol and 1,2-propanediol, with a ratio of dipropylene glycol to 1,2-propanediol being about 3:1 to about 1:3 (e.g., about 1:1). The polycarboxylate polymer can be, e.g., a polyacrylate, polymethacrylate, or mixtures thereof. The polyacrylate can be a copolymer of unsaturated mono- or di-carbonic acid and 1-30C alkyl ester of the (meth)acrylic acid, or a polyacrylate of unsaturated mono- or di-carbonic acid and 1-30C alkyl ester of the (meth)acrylic acid. Some of these polymers are commercially available, for example, under the tradename Carbopol.RTM. Aqua 30 (Lubrizol, Wickliffe, Ohio).

[0352] The rheology modifiers can be present at a level of about 0.5 wt. % to about 15 wt. % (e.g., about 1 wt. % to about 12 wt. %, about 2 wt. % to about 9 wt. %), based on the total weight of the cleaning composition. The polycarboxylate polymer is suitably present at a level of about 0.1 wt. % to about 10 wt. % (e.g., about 1 wt. % to about 8 wt. %, about 1.5 wt. % to about 6 wt. %, about 2 wt. % to about 5 wt. %) in the cleaning composition.

[0353] 6) Solvents or Solvent Systems

[0354] A cleaning composition of the invention can be in a liquid form, wherein one or more suitable solvents or solvent systems are included. Suitable solvents include water, lipophilic fluids, or organic solvents. Examples of suitable lipophilic fluids include siloxanes, other types of silicones, hydrocarbons, glycol ethers, glycerine derivatives such as glycerine ethers, perfluorinated amines, perfluorinated and hydrofluoroether solvents, low-volatility nonfluorinated organic solvents, diol solvents, other environmentally-friendly solvents and mixtures thereof. Particularly suitable solvents include low molecular weight primary and secondary alcohols, such as methanol, ethanol, propanol, or isopropanol. Polyhydric alcohols, e.g., polyols containing about 2 to about 6 carbon atoms and/or about 2 to about 6 hydroxy groups (e.g., propylene glycol, ethylene glycol, glycerin, and 1,2-propanediol), are also suitable.

[0355] Solvents can be absent, for example, from anhydrous solid embodiments of the cleaning compositions of the invention. But in a liquid cleaning composition, they are typically present at levels of about 0.1 wt. % to about 98 wt. % (e.g., about 1 wt. % to about 90 wt. %, about 10 wt. % to about 80 wt. %, about 20 wt. % to about 75 wt. %).

[0356] 7) Organic Sequestering Agent

[0357] A cleaning composition of the invention can optionally comprise about 0.01 wt. % to about 1.0 wt. % of an organic sequestering agent. Non-limiting examples of organic sequestering agents include nitrilotriacetic acid, EDTA, organic phosphonates, sodium citrate, sodium tartrate monosuccinate, sodium tartrate disuccinate, and mixtures thereof.
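
The broad weight-percent ranges recited in the preceding paragraphs for optional adjuncts can be collected into a simple screening check. The sketch below is illustrative only: it treats the "about" endpoints as hard limits, which the disclosure does not require, and the function name, dictionary keys, and sample formulation are hypothetical. Builder ranges are omitted because they differ between solid and liquid compositions.

```python
# Broadest ranges (wt. % of the total composition) taken from the paragraphs above.
# Assumption: "about" endpoints are treated as hard limits for illustration only.
BROAD_RANGES_WT_PCT = {
    "enzymes": (0.00001, 2.0),
    "chelating_agents": (0.005, 15.0),
    "hydrotropes": (0.01, 20.0),
    "rheology_modifiers": (0.5, 15.0),
    "solvents": (0.1, 98.0),
    "organic_sequestering_agent": (0.01, 1.0),
}


def components_out_of_range(formulation):
    """Return the sorted names of listed adjuncts whose level falls outside its range.

    `formulation` maps a component name to its level in wt. %. Components that
    are not in BROAD_RANGES_WT_PCT are ignored, since every adjunct is optional.
    """
    if sum(formulation.values()) > 100.0:
        raise ValueError("component levels exceed 100 wt. % in total")
    flagged = []
    for name, level in formulation.items():
        bounds = BROAD_RANGES_WT_PCT.get(name)
        if bounds is None:
            continue
        low, high = bounds
        if not low <= level <= high:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical liquid formulation; the values are not taken from the disclosure.
example = {"enzymes": 0.5, "hydrotropes": 25.0, "solvents": 60.0}
print(components_out_of_range(example))
```

A check of this kind only flags levels against the broadest recited ranges; the narrower "e.g." ranges could be substituted in the table without changing the logic.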

[0358] Certain adjuncts are particularly suitable for laundry/household cleaning applications as compared to personal/beauty care cleaning compositions, while for other adjuncts the reverse is true. Certain adjuncts are categorized and described below as particularly suitable for the former or the latter, but that categorization is not meant to be exclusive in that adjuncts that are suitable for laundry/household cleaning compositions can be included in personal/beauty care cleaning compositions and vice versa as appropriate.

Adjuncts Particularly Suitable for Laundry/Household Applications

[0359] 1) Bleach System

[0360] A bleach system suitable for use herein can contain one or more bleaching agents. Suitable bleaching agents include, e.g., catalytic metal complexes, activated peroxygen sources, bleach activators, bleach boosters, photobleaches, bleaching enzymes, free radical initiators, and hypohalite bleaches. Suitable activated peroxygen sources include, e.g., preformed peracids, a hydrogen peroxide source with a bleach activator, or a mixture thereof. Suitable preformed peracids include, e.g., percarboxylic acids and salts, percarbonic acids and salts, perimidic acids and salts, peroxymonosulfuric acids and salts, and mixtures thereof. Suitable sources of hydrogen peroxide include, e.g., perborate compounds, percarbonate compounds, perphosphate compounds and mixtures thereof. Suitable types and levels of activated peroxygen sources are described in, e.g., U.S. Pat. Nos. 5,576,282, 6,306,812, and 6,326,348.

[0361] A household cleaning composition of the invention can optionally comprise a photobleach, which can be, for example, a xanthene dye photobleach, a photo-initiator, or mixtures thereof. Suitable photobleaches can also be catalytic photobleaches and photo-initiators. In certain embodiments, catalytic photobleaches are selected from the group consisting of water soluble phthalocyanines of the formula:

##STR00004##

wherein: PC is the phthalocyanine ring system; Me is Zn; Fe(II); Ca; Mg; Na; K; Al--Z.sub.1; Si(IV); P(V); Ti(IV); Ge(IV); Cr(VI); Ga(III); Zr(IV); In(III); Sn(IV) or Hf(VI); Z.sub.1 is a halide; sulfate; nitrate; carboxylate; alkanolate; or hydroxyl ion; q is 0; 1 or 2; r is 1 to 4; Q.sub.1 is a sulfo or carboxyl group; or a radical of the formula: --SO.sub.2X.sub.2--R.sub.1--X.sub.3.sup.+; --O--R.sub.1--X.sub.3.sup.+; or --(CH.sub.2).sub.t--Y.sub.1.sup.+; in which R.sub.1 is a branched or unbranched C.sub.1-C.sub.8 alkylene; or 1,3- or 1,4-phenylene; X.sub.2 is --NH--; or --N--C.sub.1-C.sub.5 alkyl; X.sub.3.sup.+ is a group of the formula:

##STR00005##

or, in the case where R.sub.1.dbd.C.sub.1-C.sub.5 alkylene, also a group of the formula:

##STR00006##

Y.sub.1.sup.+ is a group of the formula:

##STR00007##

wherein t is 0 or 1; R.sub.2 and R.sub.3 independently of one another are C.sub.1-C.sub.6 alkyl; R.sub.4 is C.sub.1-C.sub.5 alkyl; C.sub.5-C.sub.7 cycloalkyl or NR.sub.7R.sub.8; R.sub.5 and R.sub.6 independently of one another are C.sub.1-C.sub.5 alkyl; R.sub.7 and R.sub.8 independently of one another are hydrogen or C.sub.1-C.sub.5 alkyl; R.sub.9 and R.sub.10 independently of one another are unsubstituted C.sub.1-C.sub.6 alkyl or C.sub.1-C.sub.6 alkyl substituted by hydroxyl, cyano, carboxyl, carb-C.sub.1-C.sub.6 alkoxy, C.sub.1-C.sub.6 alkoxy, phenyl, naphthyl or pyridyl; u is from 1 to 6; A.sub.1 is a unit which completes an aromatic 5- to 7-membered nitrogen heterocycle, which may where appropriate also contain one or two further nitrogen atoms as ring members, and B.sub.1 is a unit which completes a saturated 5- to 7-membered nitrogen heterocycle, which may where appropriate also contain 1 to 2 nitrogen, oxygen and/or sulfur atoms as ring members; Q.sub.2 is hydroxyl; C.sub.1-C.sub.22 alkyl; branched C.sub.3-C.sub.22 alkyl; C.sub.2-C.sub.22 alkenyl; branched C.sub.3-C.sub.22 alkenyl and mixtures thereof; C.sub.1-C.sub.22 alkoxy; a sulfo or carboxyl radical; a radical of the formula:

##STR00008##

a branched alkoxy radical of the formula:

##STR00009##

an alkylethyleneoxy unit of the formula: -(T.sub.1)d-(CH.sub.2).sub.b (OCH.sub.2CH.sub.2)e-B.sub.3; or an ester of the formula: COOR.sub.18, wherein B.sub.2 is hydrogen; hydroxyl; C.sub.1-C.sub.30 alkyl; C.sub.1-C.sub.30 alkoxy; --CO.sub.2H; --CH.sub.2COOH; --SO.sub.3-M.sub.1OSO.sub.3-M.sub.1; --PO.sub.3.sup.2-M.sub.1; --OPO.sub.3.sup.2-M.sub.1; and mixtures thereof; B.sub.3 is hydrogen; hydroxyl; --COOH; --SO.sub.3-M.sub.1; --OSO.sub.3-M.sub.1 or C.sub.1-C.sub.6 alkoxy; M.sub.1 is a water-soluble cation; T.sub.1 is --O--; or --NH--; X.sub.1 and X.sub.4 independently of one another are --O--; --NH-- or --N--C.sub.1-C.sub.5alkyl; R.sub.11 and R.sub.12 independently of one another are hydrogen; a sulfo group and salts thereof; a carboxyl group and salts thereof or a hydroxyl group; at least one of the radicals R.sub.11 and R.sub.12 being a sulfo or carboxyl group or salts thereof, Y.sub.2 is --O--; --S--; --NH-- or --N--C.sub.1-C.sub.5alkyl; R.sub.13 and R.sub.14 independently of one another are hydrogen; C.sub.1-C.sub.6 alkyl; hydroxy-C.sub.1-C.sub.6 alkyl; cyano-C.sub.1-C.sub.6 alkyl; sulfo-C.sub.1-C.sub.6 alkyl; carboxy or halogen-C.sub.1-C.sub.6 alkyl; unsubstituted phenyl or phenyl substituted by halogen, C.sub.1-C.sub.4 alkyl or C.sub.1-C.sub.4 alkoxy; sulfo or carboxyl or R.sub.13 and R.sub.14 together with the nitrogen atom to which they are bonded form a saturated 5- or 6-membered heterocyclic ring which may additionally also contain a nitrogen or oxygen atom as a ring member; R.sub.15 and R.sub.16 independently of one another are C.sub.1-C.sub.6 alkyl or aryl-C.sub.1-C.sub.6 alkyl radicals; R.sub.17 is hydrogen; an unsubstituted C.sub.1-C.sub.6 alkyl or C.sub.1-C.sub.6 alkyl substituted by halogen, hydroxyl, cyano, phenyl, carboxyl, carb-C.sub.1-C.sub.6 alkoxy or C.sub.1-C.sub.6 alkoxy; R.sub.18 is C.sub.1-C.sub.22 alkyl; branched C.sub.3-C.sub.22 alkyl; C.sub.1-C.sub.22 alkenyl or branched C.sub.3-C.sub.22 alkenyl; C.sub.3-C.sub.22 glycol; C.sub.1-C.sub.22 
alkoxy; branched C.sub.3-C.sub.22 alkoxy; and mixtures thereof; M is hydrogen; or an alkali metal ion or ammonium ion, Z.sub.2.sup.- is a chlorine; bromine; alkylsulfate or arylsulfate ion; a is 0 or 1; b is from 0 to 6; c is from 0 to 100; d is 0; or 1; e is from 0 to 22; v is an integer from 2 to 12; w is 0 or 1; and A is an organic or inorganic anion, and s is equal to r in cases of monovalent anions A.sup.- and less than or equal to r in cases of polyvalent anions, it being necessary for A.sub.s.sup.- to compensate the positive charge; where, when r is not equal to 1, the radicals Q.sub.1 can be identical or different, and where the phthalocyanine ring system may also comprise further solubilising groups.

[0362] Other suitable catalytic photobleaches include xanthene dyes, sulfonated zinc phthalocyanine, sulfonated aluminium phthalocyanine, Eosin Y, Phloxine B, Rose Bengal, C. I. Food Red 14, and mixtures thereof. In some embodiments, a photobleach can be a mixture of sulfonated zinc phthalocyanine and sulfonated aluminium phthalocyanine, wherein the weight ratio of sulfonated zinc phthalocyanine to sulfonated aluminium phthalocyanine is greater than 1, greater than 1 but less than about 100, or from 1 to about 4.

[0363] Suitable photo-initiators include, e.g., aromatic 1,4-quinones such as anthraquinones and naphthaquinones; alpha amino ketones, particularly those containing benzoyl moieties; alphahydroxy ketones, particularly alpha-hydroxy acetophenones; phosphorus-containing photoinitiators, including monoacyl, bisacyl and trisacyl phosphine oxide and sulphides; dialkoxy acetophenones; alpha-haloacetophenones; trisacyl phosphine oxides; benzoin and benzoin based photoinitiators; and mixtures thereof. Photo-initiators can, e.g., be 2-ethyl anthraquinone; Vitamin K3; 2-sulphate-anthraquinone; 2-methyl 1-[4-phenyl]-2-morpholinopropan-1-one (Irgacure.RTM. 907); (2-benzyl-2-dimethyl amino-1-(4-morpholinophenyl)-butan-1-one (Irgacure.RTM. 369); (1-[4-(2-hydroxyethoxy)-phenyl]-2 hydroxy-2-methyl-1-propan-1-one) (Irgacure.RTM. 2959); 1-hydroxy-cyclohexyl-phenyl-ketone (Irgacure.RTM. 184) (Ciba); oligo[2-hydroxy 2-methyl-1-[4(1-methyl)-phenyl]propanone (Esacure.RTM. KIP 150) (Lamberti); 2-4-6-(trimethyl-benzoyl)diphenyl-phosphine oxide, bis(2,4,6-trimethylbenzoyl)-phenyl-phosphine oxide (Irgacure.RTM. 819); (2,4,6 trimethyl benzoyl) phenyl phosphinic acid ethyl ester (Lucirin.RTM. TPO-L(BASF)); and mixtures thereof.

[0364] A number of photobleaches are commercially available, including those described above, from, e.g., Aldrich; Frontier Scientific; Ciba; BASF; Lamberti S.p.A; Dayglo Color Corporation; Organic Dyestuffs Corp.

[0365] 2) Pearlescent Agents

[0366] Pearlescent agents are optional but commonly included ingredients of a number of household cleaners, especially, e.g., in hard surface cleaners. They are typically crystalline or glassy solids, transparent or translucent compounds capable of reflecting and/or refracting light to produce a pearlescent effect. For example, they are crystalline particles insoluble in the composition in which they are incorporated. Preferably the pearlescent agents have the shape of thin plates or spheres (that is, generally spherical particles). As commonly practiced in the art, particle sizes are measured across the largest diameter of spheres. Plate-like particles are defined as those wherein two dimensions of the particle (length and width) are at least 5 times the third dimension (depth or thickness). Other crystal shapes like cubes or needles typically do not display a pearlescent effect and thus are not used as pearlescent agents.

[0367] Suitable pearlescent agents have D0.99 (sometimes referred to as D99) volume particle size of less than 50 .mu.m. Preferably the pearlescent agents have D0.99 of less than 40 .mu.m, e.g., less than 30 .mu.m. More preferably the particles have volume particle size of greater than 1 .mu.m. The D0.99 is a measure of particle size relating to particle size distribution and meaning in this instance that 99% of the particles have volume particle size of less than 50 .mu.m. Volume particle size and particle size distribution can be measured using conventional methods and equipment, such as, e.g., a Hydro 2000G (Malvern Instruments). The choice of a particle size needs to balance the ease of distribution vs. the efficacy of the pearlescent agent because the smaller the particles, the easier they are suspended, but the lower the efficacy.
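
The D0.99 criterion above can be illustrated with a short sketch. It assumes the conventional volume-weighted reading of D0.99 (the diameter below which 99% of the total particle volume lies) and treats each particle as a sphere, so that volume scales with the cube of the diameter. The function name and the sample diameters are hypothetical; a particle size analyzer of the kind mentioned above would normally report this value directly.

```python
def volume_weighted_d(diameters_um, quantile=0.99):
    """Smallest diameter at which the cumulative volume fraction reaches `quantile`.

    Assumption: particles are spheres, so each particle's volume is
    proportional to diameter**3 (the constant factor cancels out).
    """
    if not diameters_um:
        raise ValueError("at least one diameter is required")
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must be in (0, 1]")
    ordered = sorted(diameters_um)
    volumes = [d ** 3 for d in ordered]
    total = sum(volumes)
    cumulative = 0.0
    for diameter, volume in zip(ordered, volumes):
        cumulative += volume
        if cumulative / total >= quantile:
            return diameter
    return ordered[-1]


def meets_d99_limit(diameters_um, limit_um=50.0):
    """True when the volume-weighted D0.99 is below the stated limit."""
    return volume_weighted_d(diameters_um, 0.99) < limit_um


# Hypothetical sample: many 10 um particles and a single 20 um particle.
sample = [10.0] * 99 + [20.0]
print(volume_weighted_d(sample), meets_d99_limit(sample))
```

Because the weighting is by volume, a few large particles can dominate the result even when they are a small fraction of the particle count, as the sample illustrates.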

[0368] Liquid compositions containing less water and more organic solvents will typically have a refractive index that is higher in comparison to the more aqueous compositions. In these compositions, pearlescent agents with high refractive index are preferably included because otherwise the pearlescent agents do not impart sufficient visual pearlescence even when introduced at high levels (e.g., more than about 3 wt. %). In liquid compositions containing less water and more organic solvents, the pearlescent agent is preferably one having a refractive index of more than 1.41 (e.g., more than 1.8, more than 2.0). In some embodiments, the difference in refractive index between the pearlescent agent and the cleaning composition or medium, to which the pearlescent agent is added, is at least 0.02, or at least 0.2, or at least 0.6.
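
The refractive index criteria in the preceding paragraph can be expressed as two simple checks. This is a minimal sketch: the function names and the sample index values are hypothetical, and the thresholds are the figures recited above, applied here as exact cut-offs for illustration.

```python
def exceeds_index_threshold(agent_index, threshold=1.41):
    """True when the pearlescent agent's refractive index is above the threshold."""
    return agent_index > threshold


def meets_index_contrast(agent_index, medium_index, minimum=0.02):
    """True when the agent and the medium differ in refractive index by at least `minimum`."""
    return abs(agent_index - medium_index) >= minimum


# Hypothetical values for an agent and a low-water liquid medium.
agent = 1.90
medium = 1.40

print(exceeds_index_threshold(agent))
for minimum in (0.02, 0.2, 0.6):
    print(minimum, meets_index_contrast(agent, medium, minimum))
```

Checking the three recited minimum differences in turn shows which of the progressively stricter embodiments a given agent and medium pair would satisfy.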

[0369] A liquid cleaning composition may comprise about 0.01 wt. % or more (e.g., about 0.02 wt. % or more, about 0.05 wt. % or more, about 0.1 wt. % or more, about 0.5 wt. % or more, about 1.0 wt. % or more, about 1.5 wt. % or more) of one or more pearlescent agents. Typically, however, the liquid composition comprises no more than about 2 wt. % (e.g., no more than about 1.5 wt. %, no more than about 1.0 wt. %, no more than about 0.5 wt. %) of one or more pearlescent agents. For example, a liquid cleaning composition herein comprises about 0.01 wt. % to about 2.0 wt. % (e.g., about 0.1 wt. % to about 1.5 wt. %) of pearlescent agents.

[0370] Suitable pearlescent agents may be organic or inorganic. Organic pearlescent agents include, e.g., monoester and/or diester of alkylene glycols, propylene glycol, diethylene glycol, dipropylene glycol, triethylene glycol or tetraethylene glycol with fatty acids containing about 6 to about 22, preferably about 12 to about 18 carbon atoms, e.g., caproic acid, caprylic acid, 2-ethylhexanoic acid, capric acid, lauric acid, isotridecanoic acid, myristic acid, palmitic acid, palmitoleic acid, stearic acid, isostearic acid, oleic acid, elaidic acid, petroselic acid, linoleic acid, linolenic acid, arachic acid, gadoleic acid, behenic acid, erucic acid, and mixtures thereof.

[0371] Inorganic pearlescent agents include mica, metal oxide coated mica, silica coated mica, bismuth oxychloride coated mica, bismuth oxychloride, myristyl myristate, glass, metal oxide coated glass, guanine, glitter, and mixtures thereof.

[0372] Organic pearlescent agents such as ethylene glycol monostearate and ethylene glycol distearate provide pearlescence, but typically only when the composition is in motion. Hence only when the composition is poured will the composition exhibit pearlescence. Inorganic pearlescent materials are preferred as they provide both dynamic and static pearlescence. By dynamic pearlescence it is meant that the composition exhibits a pearlescent effect when the composition is in motion. By static pearlescence it is meant that the composition exhibits pearlescence when the composition is static.

[0373] Inorganic pearlescent agents are available as a powder, or as a slurry of the powder in an appropriate suspending agent. Suitable suspending agents include ethylhexyl hydroxy-stearate and hydrogenated castor oil. The powder or slurry of the powder can be added to the composition without the need for any additional process steps.

[0374] Optionally, co-crystallizing agents can be used to enhance the crystallization of the organic pearlescent agents. Suitable co-crystallizing agents include but are not limited to fatty acids and/or fatty alcohols having a linear or branched, optionally hydroxyl substituted, alkyl group containing from about 12 to about 22, preferably from about 16 to about 22, and more preferably from about 18 to about 20 carbon atoms, such as palmitic acid, linoleic acid, stearic acid, oleic acid, ricinoleic acid, behenic acid, cetearyl alcohol, hydroxystearyl alcohol, behenyl alcohol, linolyl alcohol, linolenyl alcohol, and mixtures thereof.

[0375] 3) Perfumes/Fragrances

[0376] The term "perfume" as used herein encompasses individual perfume ingredients as well as perfume accords. The perfume ingredients are often premixed to form a perfume accord prior to adding to a cleaning composition. As used herein, the term "perfume" can also include perfume microcapsules. Perfume microcapsules comprise perfume raw materials encapsulated within a capsule made with materials selected from urea and formaldehyde; melamine and formaldehyde; phenol and formaldehyde; gelatine; polyurethane; polyamides; cellulose ethers; cellulose esters; polymethacrylate; and mixtures thereof. Encapsulation techniques are known and described in, for example, "Microencapsulation: Methods and Industrial Applications," Benita, Simon, ed. (Marcel Dekker, Inc., 1996).

[0377] The perfume ingredients that can be included in a cleaning composition can include various natural and synthetic chemicals. Exemplary perfume ingredients include aldehydes, ketones, esters, natural extracts, natural essences and the like.

[0378] Industrial cleaning compositions often do not comprise perfume ingredients. However, perfume ingredients are commonly found in household and personal care cleaning compositions. When used, the perfume or perfume accord is present, e.g., in an amount of about 0.0001 wt. % or more (e.g., about 0.01 wt. % or more, about 0.1 wt. % or more, about 0.5 wt. % or more, about 2 wt. % or more), based on the total weight of the cleaning composition. For example, the perfume or perfume accord can be present in an amount of about 0.0001 wt. % to about 10 wt. % (e.g., about 0.01 wt. % to about 5 wt. %, about 0.1 wt. % to about 2 wt. %, preferably about 0.02 wt. % to about 0.8 wt. %, or about 0.003 wt. % to about 0.6 wt. %), by weight of the detergent composition. The level of perfume ingredients in a perfume accord, if one exists, is typically from about 0.0001 wt. % to about 99 wt. % by weight of the perfume accord. Exemplary perfume ingredients and perfume accords are disclosed in, for example, U.S. Pat. Nos. 5,445,747, 5,500,138, 5,531,910, 6,491,840, and 6,903,061.

[0379] 4) Dyes, Colorants, and Preservatives

[0380] The cleaning compositions herein can optionally contain dyes, colorants, and/or preservatives, or contain one or more, or none of these components. The dyes, colorants and/or preservatives can be naturally occurring or slightly processed from natural materials, or they can be synthetic. For example, naturally occurring preservatives include benzyl alcohol, potassium sorbate and bisabolol, sodium benzoate, and 2-phenoxyethanol. Synthetic preservatives can be selected from, for example, mildewstats or bacteriostats, methyl, ethyl, and propyl parabens, and bisguanidine components (e.g., Dantagard.TM. and/or Glydant.TM. (Lonza Group)). Mildewstat or bacteriostat compounds include, without limitation, KATHON.RTM. GC, a 5-chloro-2-methyl-4-isothiazolin-3-one, KATHON.RTM. ICP, a 2-methyl-4-isothiazolin-3-one, and a blend thereof, and KATHON.RTM. 886, a 5-chloro-2-methyl-4-isothiazolin-3-one (Dow Chemicals); BRONOPOL, a 2-bromo-2-nitropropane-1,3-diol (Boots, Co. Ltd.); DOWICIDE.TM. A, a 1,2-benzoisothiazolin-3-one (Dow Chemicals); and IRGASAN.RTM. DP 200, a 2,4,4'-trichloro-2'-hydroxydiphenylether (Ciba-Geigy, AG).

[0381] Dyes and colorants include synthetic dyes such as Liquitint.RTM. Yellow or Blue or natural plant dyes or pigments, such as natural yellow, orange, red, and/or brown pigments, such as carotenoids, including, for example, beta-carotene and lycopene. The composition can additionally contain fluorescent whitening agents or bluing agents. Certain dyes can be light sensitive, including for example Acid Blue 145 (Crompton), Hidacid.RTM. blue (Hilton, Davis, Knowles & Triconh); Pigment Green No. 7, FD&C Green No. 7, Acid Blue 1, Acid Blue 80, Acid Violet 48, and Acid Yellow 17 (Sandoz Corp.); D&C Yellow No. 10 (Warner Jenkinson).

[0382] If present, dyes or colorants are, e.g., present in an amount of about 0.001 wt. % or more (e.g., about 0.002 wt. % or more, 0.01 wt. % or more, 0.05 wt. % or more, 0.1 wt. % or more; 0.5 wt. % or more). Usually, dyes and colorants are present, if at all, in an amount of no more than about 1 wt. % (e.g., no more than about 0.8 wt. %, no more than about 0.5 wt. %, no more than about 0.2 wt. %, no more than about 0.1 wt. %, no more than about 0.01 wt. %). For example, dyes and colorants can be present in a cleaning composition herein in an amount of about 0.001 wt. % to about 1 wt. % (e.g., about 0.01 wt. % to about 0.4 wt. %).

[0383] 5) Fabric Care Benefit Agents

[0384] A household cleaning composition can be a laundry detergent, wherein a preferred optional ingredient can be a fabric care benefit agent. As used herein, "fabric care benefit agent" refers to any material that can provide fabric care benefits such as fabric softening, color protection, pill/fuzz reduction, anti-abrasion, anti-wrinkle, and the like to garments and fabrics, particularly on cotton and cotton-rich garments and fabrics, when an adequate amount of the material is present on the garment/fabric. Non-limiting examples of fabric care benefit agents include cationic surfactants, silicones, polyolefin waxes, latexes, oily sugar derivatives, cationic polysaccharides, polyurethanes and mixtures thereof. Suitable silicones include, e.g., silicone fluids such as poly(di)alkyl siloxanes, especially polydimethyl siloxanes and cyclic silicones.

[0385] Polydimethyl siloxane derivatives include, e.g., organofunctional silicones. One embodiment of functional silicones is the ABn type silicones, as described in U.S. Pat. Nos. 6,903,061, 6,833,344, and International Publication WO02/018528. A number of silicones are commercially available, including, e.g., Waro.TM. and Silsoft.TM. 843 (GE Silicones). Functionalized silicones or copolymers with one or more different types of functional groups such as amino, alkoxy, alkyl, phenyl, polyether, acrylate, silicon hydride, mercaptopropyl, carboxylic acid, and quaternized nitrogen are also suitable as fabric care benefit agents. A number of these are commercially available including, e.g., SM2125, Silwet 7622 (GE Silicones), DC8822, PP-5495, DC-5562 (Dow Chemicals), KF-888, KF-889 (Shin Etsu Silicones); Ultrasil.RTM. SW-12, Ultrasil.RTM. DW-18, Ultrasil.RTM. DW-AV, Ultrasil.RTM. Q-Plus, Ultrasil.RTM. Ca-I, Ultrasil.RTM. CA-2, Ultrasil.RTM. SA-I, Ultrasil.RTM. PE-100 (Noveon Inc.), Pecosil.RTM. CA-20, Pecosil.RTM. SM-40, Pecosil.RTM. PAN-150 (Phoenix Chemical). Oily sugar derivatives suitable as fabric care benefit agents are described in International Publication WO 98/16538. Olean.RTM. is a commercial brand for certain oily sugar derivatives marketed by The Procter and Gamble Co.

[0386] Many dispersible polyolefins can be used to provide fabric care benefits. The polyolefins can be in the form of waxes, emulsions, dispersions, or suspensions. Preferably, the polyolefin is a polyethylene, polypropylene, or a mixture. The polyolefin can be partially modified to contain various functional groups, such as carboxyl, alkylamide, sulfonic acid or amide groups. For example, the polyolefin is partially carboxyl modified or oxidized.

[0387] Polymer latex can also be used to provide fabric care benefits in a water based cleaning composition. Non-limiting examples of polymer latexes include those described in, e.g., International Publication WO 02/018451. Additional non-limiting examples include the monomers used in producing polymer latexes, such as 100% or pure butylacrylate, butylacrylate and butadiene mixtures with at least 20 wt. % of butylacrylate, butylacrylate and less than 20 wt. % of other monomers excluding butadiene, alkylacrylate with an alkyl carbon chain at or greater than C.sub.6, alkylacrylate with an alkyl carbon chain at or greater than C.sub.6 and less than 50 wt. % of other monomers, or a third monomer added into monomer systems above.

[0388] Cationic surfactants are also useful in this invention. Examples of cationic surfactants have been described in, e.g., U.S. Patent Publication US2005/0164905.

[0389] Fatty acids can also be used as fabric care benefit agents. When deposited on fabrics, fatty acids or soaps thereof, provide fabric care benefits (e.g., softness, shape retention) to laundry fabrics. Useful fatty acids (or soaps, such as alkali metal soaps) are the higher fatty acids containing from about 8 to about 24 carbon atoms, more preferably from about 12 to about 18 carbon atoms. Soaps can be made by direct saponification of fats and oils or by the neutralization of free fatty acids. Particularly useful are the sodium and potassium salts of the mixtures of fatty acids derived from coconut oil and tallow. Fatty acids can be from natural or synthetic origin, both saturated and unsaturated with linear or branched chains.

[0390] Color care agents are another type of fabric care benefit agent that can be suitably included in a cleaning composition. Examples include metallo catalysts for color maintenance, such as those described in International Publication WO 98/39403.

[0391] Fabric care benefit agents, when present in a household cleaning composition such as a laundry detergent composition, can suitably be present at a level of up to about 30 wt. % (e.g., up to about 20 wt. %, up to about 15 wt. %, up to about 10 wt. %, up to about 5 wt. %, up to about 2 wt. %), based on the total weight of the cleaning composition. For example, a cleaning composition of the invention comprises about 1 wt. % to about 20 wt. % (e.g., about 2 wt. % to about 15 wt. %, about 5 wt. % to about 10 wt. %) of one or more fabric care benefit agents.

[0392] 6) Deposition Aid

[0393] As used herein, "deposition aid" refers to any cationic polymer or combination of cationic polymers that significantly enhances the deposition of the fabric care benefit agent onto the fabric during laundering. An effective deposition aid typically has a strong binding capability with the water insoluble fabric care benefit agents via physical forces such as van der Waals forces or non-covalent chemical bonds such as hydrogen bonding and/or ionic bonding.

[0394] An exemplary deposition aid is a cationic or amphoteric polymer. Suitable amphoteric polymers have a net cationic charge. The cationic charge density of the polymer can range from about 0.05 milliequivalents/g to about 6 milliequivalents/g. The charge density is calculated by dividing the net number of charges per repeating unit by the molecular weight of the repeating unit. Nonlimiting examples of deposition aids include cationic polysaccharides, chitosan and its derivatives, and cationic synthetic polymers. Specific deposition aids include, e.g., cationic hydroxy ethyl cellulose, cationic starch, cationic guar derivatives, and mixtures thereof. Certain deposition aids are commercially available, including, e.g., the JR 30M, JR 400, JR 125, LR 400 and LK 400 polymers (Amerchol Corp.), Celquat.RTM. H200, Celquat.RTM. L-200, and the Cato.RTM. starch (National Starch and Chemical Co.), and Jaguar Cl 3 and Jaguar Excel (Rhodia, Inc.).
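
As a rough illustration of the charge density calculation described in paragraph [0394], the following Python sketch divides the net charge per repeating unit by the molecular weight of the repeating unit and checks the result against the stated range of about 0.05 to about 6 milliequivalents/g. The function names, the factor of 1000 used to convert equivalents per gram to milliequivalents per gram, and the example repeating unit (one net positive charge, 200 g/mol) are illustrative assumptions and are not taken from the specification.

```python
def charge_density_meq_per_g(net_charges_per_unit: float,
                             unit_mw_g_per_mol: float) -> float:
    """Net charges per repeating unit divided by repeating-unit molecular weight.

    The factor of 1000 converts eq/g to meq/g (an assumed unit convention).
    """
    if unit_mw_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    return 1000.0 * net_charges_per_unit / unit_mw_g_per_mol


def in_stated_range(density_meq_per_g: float,
                    low: float = 0.05, high: float = 6.0) -> bool:
    """True if the density lies within the approximate range given in [0394]."""
    return low <= density_meq_per_g <= high


# Hypothetical repeating unit: one net cationic charge, 200 g/mol.
example_density = charge_density_meq_per_g(1, 200.0)
print(example_density, in_stated_range(example_density))
```

Because the stated limits are qualified by "about", the range check is only a guide and the boundaries should not be treated as exact cut-offs.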

[0395] 7) Fabric Substantive and Hueing Dye

[0396] Dyes can be included in a cleaning composition of the invention, e.g., a laundry detergent. Conventionally, dyes include certain types of acid, basic, reactive, disperse, direct, vat, sulphur or solvent dyes. For inclusion in cleaning compositions, direct dyes, acid dyes, and reactive dyes are preferred. Direct dyes are water-soluble dyes taken up directly by fibers from an aqueous solution containing an electrolyte, presumably due to selective adsorption. In the Color Index system, direct dye refers to various planar, highly conjugated molecular structures that contain one or more anionic sulfonate groups. Acid dyes are water soluble anionic dyes that are applied from an acidic solution. Reactive dyes are those containing reactive groups capable of forming covalent linkages with certain portions of the molecules of natural or synthetic fibers. Suitable fabric substantive dyes that can be included in a cleaning composition include, e.g., azo compounds, stilbenes, oxazines and phthalocyanines.

[0397] Hueing dyes are another type of dyes that may be present in a household cleaning composition of the invention. Such dyes have been found to exhibit good tinting efficiency during a laundry wash cycle without exhibiting excessive undesirable build up during laundering. Typically, a hueing dye is included in the laundry detergent composition in an amount sufficient to provide a tinting effect to fabric washed in a solution containing the detergent. In one embodiment, the detergent composition comprises, e.g., about 0.0001 wt. % to about 0.05 wt. % (e.g., about 0.001 wt. % to about 0.01 wt. %) of a hueing dye.

[0398] 8) Dye Transfer Inhibitors

[0399] A household cleaning composition of the invention, e.g., a laundry detergent composition, can comprise one or more compounds for inhibiting the transfer, from one fabric to another, of solubilized and suspended dyes encountered during fabric laundering operations involving colored fabrics. Exemplary dye transfer inhibitors include polymeric dye transfer inhibiting agents, which are capable of complexing or absorbing the fugitive dyes washed out of dyed fabrics before the dyes have an opportunity to become attached to other articles in the wash. Polymeric dye transfer agents are described in, e.g., International Publication WO 98/39403. Modified polyethyleneimine polymers, such as those described in International Publication WO 00/05334, which are water-soluble or dispersible modified polyamines, can also be used. Other exemplary dye transfer inhibiting agents include, e.g., polyvinylpyridine N-oxide (PVNO), polyvinyl pyrrolidone (PVP), polyvinyl imidazole, N-vinyl-pyrrolidone and N-vinylimidazole copolymers (PVPVI), copolymers thereof, and mixtures thereof. The amount of dye transfer inhibiting agents in the cleaning composition can be, e.g., about 0.01 wt. % to about 10 wt. % (e.g., about 0.02 wt. % to about 5 wt. %, about 0.03 wt. % to about 2 wt. %).

[0400] 9) Optional Ingredients

[0401] Unless specified herein below, an "effective amount" of a particular adjunct or ingredient is preferably an amount of about 0.01 wt. % or more (e.g., about 0.1 wt. % or more, about 0.5 wt. % or more, about 1.0 wt. % or more, about 2.0 wt. % or more), based on the total weight of the detergent composition. Optional adjuncts are usually present in an amount of no more than about 20 wt. % (e.g., no more than about 15 wt. %, no more than about 10 wt. %, no more than about 5 wt. %, or no more than about 1 wt. %).

[0402] Examples of other suitable cleaning adjuncts, one or more of which may be included in a cleaning composition, include, e.g., effervescent systems comprising hydrogen peroxide and catalase; optical brighteners or fluorescers; soil release polymers; dispersants; suds suppressors; photoactivators; hydrolysable surfactants; preservatives; anti-oxidants; anti-shrinkage agents; gelling agents (e.g., amidoamines, amidoamine oxides, gellan gums); anti-wrinkle agents; germicides; fungicides; color speckles; antideposition agents such as cellulose derivatives, colored beads, spheres or extrudates; sunscreens; fluorinated compounds; clays; luminescent agents or chemiluminescent agents; anti-corrosion and/or appliance protectant agents; alkalinity sources or other pH adjusting agents; solubilizing agents; processing aids; pigments; free radical scavengers, and mixtures thereof. Suitable materials and effective amounts are described in, e.g., U.S. Pat. Nos. 5,705,464, 5,710,115, 5,698,504, 5,695,679, 5,686,014 and 5,646,101. Mixtures of the above components can be made in any proportion.

[0403] 10) Encapsulated Composition

[0404] A cleaning composition, such as a household cleaning composition including a laundry detergent, a dishwashing liquid, or a surface cleaning composition, of the present invention can optionally be encapsulated within a water soluble film. The water-soluble film can be made from polyvinyl alcohol or other suitable variations, carboxy methyl cellulose, cellulose derivatives, starch, modified starch, sugars, PEG, waxes, or combinations thereof.

[0405] In certain embodiments the water-soluble film may comprise other adjuncts such as a copolymer of vinyl alcohol and a carboxylic acid, the advantages of which have been described in, for example, U.S. Pat. No. 7,022,656. An exemplary benefit of such encapsulation practice is the improvement of the shelf-life of the pouched composition. Another exemplary advantage is that this practice provides improved cold water (e.g., less than 10.degree. C.) solubility to the cleaning composition. The level of the co-polymer in the film material is at least about 60 wt. % (e.g., about 65 wt. %, about 70 wt. %, about 80 wt. %) by weight. The polymer can have any average molecular weight, preferably about 1,000 daltons to about 1,000,000 daltons (e.g., about 10,000 daltons to about 300,000 daltons, about 15,000 daltons to about 200,000 daltons, about 20,000 daltons to about 150,000 daltons). In certain embodiments, the copolymer present in the film is about 60% to about 98% hydrolysed (e.g., about 80% to about 95% hydrolysed), to improve the dissolution of the material. In certain embodiments, the copolymer comprises about 0.1 mol % to about 30 mol % (e.g., about 1 mol % to about 6 mol %) of carboxylic acid. In certain embodiments, the water-soluble film comprises additional co-monomers, including, for example, sulfonates and ethoxylates such as 2-acrylamido-2-methyl-1-propane sulphonic acid. In further embodiments, the film can also comprise other ingredients, including, for example, plasticizers, for example, glycerol, ethylene glycol, diethyleneglycol, propane diol, 2-methyl-1,3-propane diol, sorbitol, and mixtures thereof, additional water, disintegrating aids, fillers, anti-foaming agents, emulsifying/dispersing agents, and/or antiblocking agents. It may be useful that the pouch or water-soluble film itself comprises a detergent additive to be delivered to the wash water, for example organic polymeric soil release agents, dispersants, or dye transfer inhibitors. Optionally the surface of the film of the pouch may be dusted with fine powder to reduce the coefficient of friction. Sodium aluminosilicate, silica, talc and amylose are examples of suitable fine powders. Certain water-soluble films are commercially available, for example, those marketed under the tradename M8630.TM. (Mono-Sol).

Adjuncts Particularly Suitable for Personal Care Applications

[0406] 1) Hair Conditioning Agents

[0407] Cleaning compositions of the invention can comprise, in some embodiments such as, for example, in personal or beauty care applications, certain known conditioning agents. An exemplary conditioning agent especially suitable for personal care compositions such as shampoos, is a silicone or a silicone-containing material. Such materials can be selected from, e.g., non-volatile silicones, siloxane gums and resins, aminofunctional silicones, quaternary silicones, and mixtures thereof with each other and with volatile silicones. Examples of these silicone polymers have been disclosed, for example, in U.S. Pat. No. 6,316,541.

[0408] Silicone oils are flowable silicone materials having a viscosity as measured at 25.degree. C. of less than about 50,000 centistokes (e.g., less than about 30,000 centistokes). For example, silicone oils typically have a viscosity of about 5 centistokes to about 50,000 centistokes (e.g., about 10 centistokes to about 30,000 centistokes). Suitable silicone oils include polyalkyl siloxanes, polyaryl siloxanes, polyalkylaryl siloxanes, polyether siloxane copolymers, and mixtures thereof. Other insoluble, non-volatile silicone fluids having hair conditioning properties can also be used. Methods of making microemulsions of silicone particles are described in the art, including, e.g., the technique described in U.S. Pat. No. 6,316,541. The silicone may, e.g., be a liquid at ambient temperatures, so as to be of a suitable viscosity to enable the material itself to be readily emulsified to the required particle size of about 0.15 microns or less.

[0409] The amount of silicone incorporated into a cleaning composition of the invention may depend on the type of composition and the particular silicone materials used. A preferred amount is about 0.01 wt. % to about 10 wt. %, although these limits are not absolute. The lower limit is determined by the minimum level to achieve acceptable conditioning for a target consumer group and the upper limit by the maximum level to avoid making the hair and/or skin unacceptably greasy. The activity of the microemulsion can be adjusted accordingly to achieve the desired amount of silicone or a lower level of the preformed microemulsion can be added.

[0410] The microemulsion of silicone oil may be further stabilized by sodium lauryl sulfate or sodium lauryl ether sulfate with 1-10 moles of ethoxylation. Additional emulsifier, preferably chosen from anionic, cationic, nonionic, amphoteric and zwitterionic surfactants, and mixtures thereof may be present. The amount of emulsifier will typically be in the ratio of about 1:1 to about 1:7 parts by weight of the silicone, although larger amounts of emulsifier can be used, e.g., in about 5:1 parts by weight of the silicone or more. Use of these emulsifiers may be necessary to maintain clarity of the microemulsion if the microemulsion is diluted prior to addition to the personal care cleaning composition. The same detersive surfactants in the cleaning composition can also serve as the emulsifier in the preformed microemulsion.
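
The typical emulsifier level given in paragraph [0410] can be turned into a mass range for a given silicone charge, as in the following Python sketch. It assumes the ratio is read as emulsifier:silicone by weight, so that 1:1 is the upper typical bound and 1:7 the lower typical bound; that reading, the function name, and the 100 g example are assumptions made for illustration only.

```python
def typical_emulsifier_mass_range(silicone_mass_g: float) -> tuple[float, float]:
    """Return (low, high) emulsifier mass in grams for a given silicone mass.

    Assumes the ratio of about 1:1 to about 1:7 is emulsifier:silicone by weight.
    """
    if silicone_mass_g <= 0:
        raise ValueError("silicone mass must be positive")
    low = silicone_mass_g / 7.0   # 1:7 emulsifier:silicone
    high = silicone_mass_g / 1.0  # 1:1 emulsifier:silicone
    return low, high


# Hypothetical batch containing 100 g of silicone.
low_g, high_g = typical_emulsifier_mass_range(100.0)
print(f"typical emulsifier mass: {low_g:.1f} g to {high_g:.1f} g")
```

The paragraph also allows larger amounts (e.g., about 5:1 or more), so the returned range describes only the typical case, not an upper limit.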

[0411] The silicone microemulsion may be further stabilized using an emulsion polymerization process. A suitable emulsion polymerization process has been described by, for example, U.S. Pat. No. 6,316,541. A typical emulsifier is TEA dodecyl benzene sulfonate which is formed in the process when triethanolamine (TEA) is used to neutralize the dodecyl benzene sulfonic acid used as the emulsion polymerization catalyst. It has been found that selection of the anionic counterion, typically an amine, and/or selection of the alkyl or alkenyl group in the sulfonic acid catalyst can further improve the stability of the microemulsion in the shampoo composition. Examples of preferred amines include, without limitation, triisopropanol amine, diisopropanol amine, and aminomethyl propanol.

[0412] 2) Pearlescent Agents

[0413] Pearlescent agents, such as those described herein above can be suitably included in a personal care cleaning composition such as a shampoo. They are defined, for the purpose of the present disclosure, as materials which impart to a composition the appearance of mother of pearl. Pearlescence is produced by specular reflection of light. Light reflected from pearl platelets or spheres as they lie essentially parallel to each other at different levels in the composition creates a sense of depth and luster. Some light is reflected off the pearlescent agent, and the remainder will pass through the agent, which may pass directly through or be refracted. Reflected, refracted light produces a different colour, brightness and luster.

[0414] 3) Cationic Cellulose or Guar Polymer

[0415] Cleaning compositions of the present invention can further contain a cationic polymer to aid the deposition of the silicone oil component and enhance conditioning performance. Non-limiting examples of such polymers are described in the CTFA Cosmetic Ingredient Dictionary, 3rd ed, Estrin, Crosley, & Haynes eds., (The Cosmetic, Toiletry, and Fragrance Association, Inc., Washington, D. C. (1982)). Suitable cationic polymers include polysaccharide polymers, such as cationic cellulose derivatives, for example, salts of hydroxyethyl cellulose reacted with trimethyl ammonium substituted epoxide, referred to in the industry (CTFA) as Polyquaternium 10, as well as Polymer LR, JR, JP and KG series polymers (Amerchol Corp.). Other suitable cationic cellulose polymers include the polymeric quaternary ammonium salts of hydroxyethyl cellulose reacted with lauryl dimethyl ammonium-substituted epoxide, referred to in the industry (CTFA) as Polyquaternium 24, available under the tradename Polymer LM-200 (Amerchol Corp). Suitable cationic guar polymers include cationic guar gum derivatives, such as guar hydroxypropyltrimonium chloride, and those described in, for example, U.S. Pat. No. 5,756,720. Certain of these polymers are commercially available, including, for example, Jaguar.RTM. Excel (Rhodia Corp.).

[0416] When used, the cationic polymers herein are either soluble in the cleaning composition or are soluble in a complex coacervate phase in the cleaning composition formed by the cationic polymer and the anionic, amphoteric and/or zwitterionic detersive surfactant component described hereinbefore. Complex coacervates of the cationic polymer can also be formed with other charged materials in the composition.

[0417] Concentrations of the cationic polymer in the composition can range from about 0.01 wt. % to about 3 wt. % (e.g., about 0.05 wt. % to about 2 wt. %, about 0.1 wt. % to about 1 wt. %). Suitable cationic polymers have cationic charge densities of at least about 0.4 meq/gm (e.g., at least about 0.6 meq/gm). Suitable cationic polymers have cationic charge densities of no more than about 5 meq/gm, at the pH of intended use of the cleaning composition. An exemplary personal care cleaning composition, such as, for example, a shampoo, generally has a pH range of about 3 to about 9 (e.g., about 4 to about 8). As used herein, "cationic charge density" of a polymer refers to the ratio of the number of positive charges on the polymer to the molecular weight of the polymer. The average molecular weight of suitable cationic guars and cellulose polymers is typically at least about 800,000 daltons. For example, a suitable cationic polymer, which can be included in a cleaning composition of the present invention, is one of sufficiently high cationic charge density to effectively enhance deposition efficiency of the solid particle components in the cleaning composition. Cationic polymers comprising cationic cellulose polymers and cationic guar derivatives with cationic charge densities of at least about 0.5 meq/gm and preferably less than about 7 meq/gm are suitable.

[0418] Preferably, the deposition polymers give good clarity and adequate flocculation on dilution with water during use, especially when suitable electrolytes including, e.g., sodium chloride, sodium benzoate, magnesium chloride, and magnesium sulfate, are added.

[0419] 4) Perfumes/Fragrances

[0420] Just as perfumes or perfume accords are typically included in a household cleaning composition of the invention, perfumes or perfume accords as described herein (e.g., supra) are often included in a personal care cleaning composition, such as a shampoo or a bodywash composition. The perfume ingredients, which optionally can be formulated into a perfume accord prior to blending or formulating the cleaning composition, can be obtained from a wide variety of natural or synthetic sources. They include, without limitation, aldehydes, ketones, esters, and the like. They also include, for example, natural extracts and essences, which can include complex mixtures of ingredients, such as orange oils, lemon oils, rose extracts, lavender, musk, patchouli, balsamic essence, sandalwood oil, pine oil, cedar, and the like. The amount of perfume to be included in a cleaning composition of the invention can vary, for example, from about 0.0001 wt. % to about 2 wt. % (e.g., about 0.01 wt. % to about 1.0 wt. %, about 0.1 wt. % to about 0.5 wt. %), based on the total weight of the cleaning composition.

[0421] 5) Sensory Indicators--Silica Particles

[0422] Optionally, in a personal care cleaning composition of the invention, various sensory indicators can be included. These agents provide a change in sensory feel after an appropriate usage time, allowing for easy and precise recognition of the appropriate time of washing. For example, these agents are particularly suitable for cleaning compositions such as hand cleansers. An exemplary type of sensory indicator is silica particles. The properties of the silica particles may be adjusted to provide the desired end point in time.

[0423] Various silica particles are commercially available, including, for example, those made and distributed by INEOS Silicas Ltd. These particles have also been described in, for example, U.S. Pat. No. 6,165,510 and U.S. Patent Publication 2003/0044442.

[0424] Silica particles can be present in an amount that can initially be felt by hands when starting washing with the cleaning composition. In one embodiment, the amount of silica particles is about 0.05 wt. % to about 8 wt. %. In some embodiments, suitable silica particles can have an initial average diameter of about 50 .mu.m to about 600 .mu.m (e.g., about 180 to about 420 .mu.m). In some embodiments, silica particles can further comprise color or pigment on the surface. In other embodiments, suitable silica particles diminish in size and cannot be felt by users during washing before about 5 min, about 2 min, about 30 sec, about 25 sec, about 20 sec, about 15 sec, about 10 sec, about 5 sec, about 5 to about 30 sec, or about 10 to about 30 sec.

[0425] Silica particles can also, in addition to providing sensory indications, improve the dispensing of the cleaning composition. For example, by including these particles, the cleaning composition, such as a liquid hand cleaner or a shampoo, may achieve a desirable thickness such that it is easier to be dispensed with a pump.

[0426] It is often desirable to regulate the viscosity of a composition comprising silica particles, however. Addition of glycerin has been found to be an effective approach to achieve this regulation. Glycerin is typically added to a composition comprising silica particles in an amount of at least about 1 wt. % (e.g., about 2 wt. %, about 2.5 wt. %, about 3 wt. %, about 4 wt. %, about 5 wt. %, or about 6 wt. %), based on the total weight of the cleaning composition. In some embodiments, glycerin is added in an amount of less than about 10 wt. % (e.g., less than about 8 wt. %, less than about 6 wt. %, less than about 4 wt. %, less than about 2 wt. %). The addition of glycerin may, in certain embodiments, help prevent clogging of pumps.

[0427] 6) Suspension Agents-Viscosity Control

[0428] Cleaning compositions of the invention can also include a suspending agent that allows the particulate matter therein, e.g., the silica particles, to remain suspended. Suspending agents are materials that are capable of increasing the ability of the composition to suspend material. Examples of suspending agents include, e.g., synthetic structuring agents, polymeric gums, polysaccharides, pectin, alginate, arabinogalactan, carrageenan, gellan gum, xanthan gum, guar gum, rhamsan gum, furcellaran gum, and other natural gums. An exemplary synthetic structuring agent is a polyacrylate. An exemplary acrylate aqueous solution used to form a stable suspension of the solid particles is manufactured by Lubrizol as CARBOPOL.TM. resins, also known as CARBOMER.TM., which are hydrophilic high molecular weight, crosslinked acrylic acid polymers. Other polymers suitable as suspension agents include, e.g., CARBOPOL.TM. Aqua 30, CARBOPOL.TM. 940 and CARBOPOL.TM. 934.

[0429] The suspending agents can be used alone or in combination. The amount of suspending agent can be any amount that provides for a desired level of suspending ability. In certain embodiments, the suspending agent is present in an amount of about 0.01 wt. % to about 15 wt. % (e.g., about 0.1 wt. % to about 12 wt. %, about 1 wt. % to about 10 wt. %, about 2 wt. % to about 5 wt. %) by weight of the cleaning composition.

[0430] 7) Other Suitable Adjuncts

[0431] A number of other adjuncts can be suitable for inclusion in a personal care cleaning composition. These include, for example, thickeners, such as hydroxyethyl cellulose derivatives (e.g., Methocel.TM. products, Dow Chemicals; Natrosol.RTM. products, Aqualon Ashland; Carbopol.TM. products, Lubrizol).

[0432] Stability enhancers can also be included as suitable adjuncts. They are typically nonionic surfactants, including those having a hydrophilic-lipophilic balance range of about 9-18. These surfactants can be straight chained or branched chained, and they typically contain various levels of ethoxylation/propoxylation. The nonionic surfactants useful in the present invention are preferably formed from a fatty alcohol, a fatty acid, or a glyceride with a C.sub.3 to C.sub.24 carbon chain, preferably a C.sub.12 to C.sub.18 carbon chain, derivatized to yield a Hydrophilic-Lipophilic Balance (HLB) of at least 9. HLB is understood to mean the balance between the size and strength of the hydrophilic group and the size and strength of the lipophilic group of the surfactant. Suitable adjuncts for personal care cleaning compositions can also include various vitamins, including, for example, vitamin B complex, including thiamine, nicotinic acid, biotin, pantothenic acid, choline, riboflavin, vitamin B6, vitamin B12, pyridoxine, inositol, and carnitine; vitamins A, C, D, E, K; and their derivatives.
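
Paragraph [0432] describes HLB only qualitatively and does not state how it is to be calculated. One common estimate for nonionic surfactants is Griffin's method, in which HLB equals 20 times the mass fraction of the hydrophilic portion of the molecule. The Python sketch below uses that method purely as an assumption for illustration; the function names and the example molecular masses are hypothetical and are not taken from the specification.

```python
def griffin_hlb(hydrophilic_mass: float, total_mass: float) -> float:
    """Estimate HLB by Griffin's method: 20 * (hydrophilic mass / total mass).

    Griffin's method is an assumed choice; the specification names no method.
    """
    if total_mass <= 0 or not 0 <= hydrophilic_mass <= total_mass:
        raise ValueError("masses must satisfy 0 <= hydrophilic <= total, total > 0")
    return 20.0 * hydrophilic_mass / total_mass


def is_stability_enhancer_candidate(hlb: float) -> bool:
    """True if the HLB falls in the range of about 9 to about 18 given in [0432]."""
    return 9.0 <= hlb <= 18.0


# Hypothetical ethoxylated fatty alcohol: 300 g/mol hydrophilic portion
# out of a 500 g/mol molecule.
example_hlb = griffin_hlb(300.0, 500.0)
print(example_hlb, is_stability_enhancer_candidate(example_hlb))
```

Under this estimate, raising the degree of ethoxylation increases the hydrophilic mass fraction and therefore the HLB, which is consistent with the paragraph's statement that the surfactants are derivatized to reach an HLB of at least 9.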

[0433] Further suitable adjuncts may include one or more materials such as antimicrobial agents, antifungal agents, antidandruff agents, dyes, foam boosters, pediculocides, pH adjusting agents, preservatives, proteins, skin active agents, sunscreens, UV absorbers, minerals, herbal/fruit/food extracts, sphingolipid derivatives or synthetic derivatives, and clay.

Examples of Preferred Embodiments

[0434] Surfactant compositions of the invention can be formulated or used without substantial post-production processing. This is especially the case if the surfactant composition is applied in industrial settings, for example, in the oil industry for oil recovery applications. Because only minimal purity specifications are typically required in such settings, it is potentially possible to use whole-cell broths. Surfactants comprising microbially-produced branched fatty alcohols and derivatives prepared in accordance with the methods herein are relatively more selective, as compared with conventional chemical surfactants. As such, they are required in small quantities and are effective under a broad range of oil and reservoir conditions. They are also more environmentally friendly in protecting coastal areas from additional damage inflicted by synthetic chemicals, because they are readily biodegradable and have lower toxicity than synthetic surfactants. Potentially, an increase of about 30% or more in total oil recovery from underground sandstone can be achieved using surfactants comprising microbially produced fatty alcohols and derivatives such as those described herein.

[0435] Microbially-produced fatty alcohols, including branched fatty alcohols and derivatives thereof such as those described herein, are also more anaerobic, halotolerant and thermo-tolerant as compared to their petroleum-derived counterparts, making surfactants comprising these fatty alcohols particularly useful for in situ enhanced oil recovery. These surfactants are potent reducers of oil viscosity, making it vastly easier to pump heavy oils from underground sandstone as well as through commercial pipelines for long distances. Microbially-produced fatty alcohols and derivatives and surfactants comprising these materials can also be used to desludge crude oil storage tanks. The branched fatty alcohols and derivatives described herein also have improved low temperature properties, and are thus particularly suited for application in low temperature environments such as in the deep sea.

[0436] Potentially, suitable host cells can be engineered such that the culture broth not only provides suitable surfactants but also provides biodegradation of hydrocarbons, resulting in microbial remediation of hydrocarbon- and crude oil-contaminated soils. Furthermore, the branched fatty alcohols, derivatives thereof, as well as the surfactants comprising these materials can be used to manage and emulsify hydrocarbon-water mixtures. This capacity to effectively emulsify oil/water mixtures can be utilized in oil spill management.

[0437] With more extensive post-production processing, surfactants comprising the branched fatty alcohols and derivatives as described herein can be particularly suitable as food additives or in the health care and cosmetic industries. The branching of these molecules confers added oxidative stability and significantly decreased volatility and vapor pressure. They are also useful as ingredients in various household and personal and/or pet care cleaning compositions, with particular advantages at lower washing temperatures.

[0438] In certain embodiments, the invention features a surfactant composition comprising about 0.001 wt. % to about 100 wt. % (e.g., about 0.01 wt. % to about 80 wt. %, about 0.1 wt. % to about 70 wt. %, about 1 wt. % to about 60 wt. %, about 5 wt. % to about 50 wt. %) of one or more microbially produced branched fatty alcohols and/or derivatives thereof. An exemplary surfactant composition of the invention comprises about 0.1 wt. % to about 50 wt. % of microbially produced branched fatty alcohols and/or derivatives thereof. The surfactant composition of the present invention can further comprise one or more other co-surfactants, derived from similar origins (e.g., microbially produced) or different origins (e.g., chemically synthesized, derived from petroleum sources).

[0439] In another aspect, the invention pertains to a cleaning composition comprising one or more surfactants comprising branched fatty alcohols and derivatives produced in accordance with the methods described herein. The inventive cleaning composition can be formulated as a solid cleaning composition or as a liquid cleaning composition.

[0440] In certain embodiments, the invention provides a cleaning composition comprising about 0.1 wt. % to about 50 wt. % (e.g., about 0.1 wt. % to about 50 wt. %, about 0.5 wt. % to about 45 wt. %, about 1 wt. % to about 40 wt. %, about 5 wt. % to about 35 wt. %, about 10 wt. % to about 30 wt. %) of one or more microbially produced branched fatty alcohols and/or derivatives thereof. An exemplary cleaning composition comprises about 1 wt. % to about 40 wt. % of microbially produced branched fatty alcohols and/or derivatives thereof. In another embodiment, the composition comprises about 2 wt. % to about 20 wt. % of microbially produced branched fatty alcohols and/or derivatives thereof.

[0441] In one embodiment, the invention features a liquid cleaning composition comprising (a) about 0.1 wt. % to about 50 wt. % (e.g., about 0.1 wt. % to about 50 wt. %, about 0.5 wt. % to about 45 wt. %, about 1 wt. % to about 40 wt. %, about 5 wt. % to about 35 wt. %, about 10 wt. % to about 30 wt. %) of one or more microbially produced branched fatty alcohols and/or derivatives thereof; (b) about 1 wt. % to about 30 wt. % (e.g., about 2 wt. % to about 25 wt. %, about 5 wt. % to about 20 wt. %) of one or more co-surfactants; (c) about 0 wt. % to about 10 wt. % (e.g., about 0 wt. % to about 10 wt. %, about 0 wt. % to about 8 wt. %, about 0 wt. % to about 5 wt. %, about 0 wt. % to about 2 wt. %) of one or more detergency builders; (d) about 0 wt. % to about 2.0 wt. % (e.g., about 0.0001 wt. % to about 1.5 wt. %, about 0.001 wt. % to about 1 wt. %, about 0.01 wt. % to about 0.8 wt. %) of one or more enzymes; (e) about 0 wt. % to about 15 wt. % (e.g., about 0 wt. % to about 12 wt. %, about 0 wt. % to about 10 wt. %, about 0 wt. % to about 8 wt. %, about 0 wt. % to about 5 wt. %) of one or more chelating agents; (f) about 0 wt. % to about 20 wt. % (e.g., about 0 wt. % to about 15 wt. %, about 0 wt. % to about 10 wt. %, about 0 wt. % to about 5 wt. %) of one or more hydrotropes; (g) about 0 wt. % to about 15 wt. % (e.g., about 0 wt. % to about 10 wt. %, about 0 wt. % to about 8 wt. %, about 0 wt. % to about 5 wt. %) of one or more rheology modifiers; (h) about 0 wt. % to about 1.0 wt. % (e.g., about 0 wt. % to about 0.8 wt. %, about 0 wt. % to about 0.5 wt. %, about 0 wt. % to about 0.2 wt. %) of one or more organic sequestering agents; and (i) about 0.1 wt. % to about 98 wt. % (e.g., about 0.1 wt. % to about 95 wt. %, about 1 wt. % to about 90 wt. %, about 10 wt. % to about 85 wt. %) of a solvent system comprising water or other suitable solvents.
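As a rough illustration of how a candidate liquid formulation could be checked against the broadest ranges recited for components (a) through (i) above, the following sketch may be useful. The component keys, the example recipe, and the requirement that the listed components sum to 100 wt. % are assumptions made for illustration; the specification uses "about" and open-ended "comprising" language, which this strict check ignores.

```python
# Illustrative sketch only. The numeric bounds are the broadest ranges recited
# for the liquid cleaning composition; key names are hypothetical, and the
# strict bounds ignore the word "about" used in the text.
LIQUID_RANGES = {
    "branched_fatty_alcohols": (0.1, 50.0),    # (a)
    "co_surfactants": (1.0, 30.0),             # (b)
    "detergency_builders": (0.0, 10.0),        # (c)
    "enzymes": (0.0, 2.0),                     # (d)
    "chelating_agents": (0.0, 15.0),           # (e)
    "hydrotropes": (0.0, 20.0),                # (f)
    "rheology_modifiers": (0.0, 15.0),         # (g)
    "organic_sequestering_agents": (0.0, 1.0), # (h)
    "solvent_system": (0.1, 98.0),             # (i)
}


def check_formulation(formulation, ranges=LIQUID_RANGES, tol=1e-6):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, (low, high) in ranges.items():
        value = formulation.get(name, 0.0)
        if not low <= value <= high:
            problems.append(f"{name}: {value} wt. % outside {low}-{high} wt. %")
    for name in formulation:
        if name not in ranges:
            problems.append(f"{name}: not a recognized component")
    # Assumption: the listed components make up the whole composition.
    total = sum(formulation.values())
    if abs(total - 100.0) > tol:
        problems.append(f"total is {total} wt. %, expected 100 wt. %")
    return problems


# Hypothetical recipe, invented purely to exercise the check.
example = {
    "branched_fatty_alcohols": 15.0,
    "co_surfactants": 10.0,
    "detergency_builders": 2.0,
    "enzymes": 0.5,
    "chelating_agents": 1.0,
    "hydrotropes": 3.0,
    "rheology_modifiers": 1.0,
    "organic_sequestering_agents": 0.2,
    "solvent_system": 67.3,
}

if __name__ == "__main__":
    print(check_formulation(example) or "all checks passed")
```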

[0442] In another embodiment, the invention features a solid detergent composition comprising (a) about 0.1 wt. % to about 50 wt. % of one or more microbially produced fatty alcohols and/or derivatives thereof, (b) about 1 wt. % to about 30 wt. % of one or more co-surfactants, (c) about 1 wt. % to about 60 wt. % of one or more detergency builders, (d) about 0 wt. % to about 2.0 wt. % of one or more enzymes, (e) about 0 wt. % to about 20 wt. % of one or more hydrotropes, (f) about 10 wt. % to about 35 wt. % of one or more filler salts, (g) about 0 wt. % to about 15 wt. % of one or more chelating agents, and (h) about 0.01 wt. % to about 1 wt. % of one or more organic sequestering agents.

[0443] When the cleaning composition is a solid (e.g., a particulate, a granule, a tablet), the composition herein can be in any solid form, such as a granular composition or for example a tablet, flake, extrudate, agglomerate, or granule-containing composition. Alternatively, the detergent composition can be a powder. The composition herein can be made by methods such as dry-mixing, agglomerating, compaction, spray drying of various ingredients comprised in the composition herein, or a combination thereof. The composition herein preferably has a bulk density of from about 300 g/L, 350 g/L, or 450 g/L to 1500 g/L, 1000 g/L, or 850 g/L.

[0444] In certain embodiments, a liquid cleaning composition of the present invention is formulated such that during use, the wash water will have a pH of between about 6.5 and about 11.0 (e.g., between about 6.5 and about 11, between about 7.0 and about 8.5).

[0445] In one embodiment, the invention provides a dishwashing detergent composition. The dishwashing detergent composition can be formulated for use in hand washing of dishes or for use in automatic dishwashers. A skilled person will appreciate that a detergent composition formulated for use in automatic dishwashers should contain suitable antifoaming agents in order to prevent excessive foaming of the detergent composition within the dishwasher. However, foaming may be desirable when hand washing dishes. Antifoaming agents are known. For example, various silicone antifoam compounds can be used, including a variety of relatively high molecular weight polymers containing siloxane units and hydrocarbyl groups of various types. Other suitable antifoam agents include monocarboxylic fatty acids and soluble salts thereof, high molecular weight fatty esters (e.g., fatty acid triglycerides), fatty acid esters of monovalent alcohols, aliphatic C.sub.18-C.sub.40 ketones (e.g., stearone), N-alkylated amino triazines (e.g., tri- to hexa-alkylmelamine or di- to tetra-alkyldiamine chlortriazines), propylene oxide, bis-stearic acid amide, and monostearyl di-alkali metal (e.g., sodium, potassium, lithium) phosphate and phosphate esters, amine oxides, alkanolamides, betaines, and mixtures thereof.

[0446] In addition, the dishwashing detergents can optionally comprise one or more enzymes, gelling agents, abrasive materials, fragrances, solubility enhancers, or antideposition agents, e.g., cellulose derivatives. Abrasive materials can be, e.g., pumice, sand, feldspar, corn meal, or mixtures thereof. An antideposition agent can be present in an exemplary cleaning composition in an amount of about 0.1 wt. % to about 5 wt. % (e.g., about 0.1 wt. % to about 2 wt. %).

[0447] In certain embodiments, the invention provides a laundry detergent composition comprising, in addition to the microbially produced branched fatty alcohols and/or derivatives thereof as described herein, the co-surfactants and the builders, optionally one or more enzymes, gelling agents, fragrances, antideposition agents, brighteners, anticaking agents, pearlescent agents, fabric softeners, bleach systems, dyes or colorants, preservatives, fabric care benefit agents, hueing dyes, soil release polymers, photoactivators, hydrolysable surfactants, anti-shrinkage agents, anti-wrinkle agents, germicides, fungicides, color speckles, colored beads, fluorinated compounds, etc.

[0448] In certain embodiments, the invention further provides a solid surface cleaning composition. In addition to the microbially produced branched fatty alcohols and/or derivatives thereof as described herein, the co-surfactants and the builders, the surface cleaning composition can further comprise one or more of the optional ingredients including, without limitation, one or more enzymes, gelling agents, fragrances, antideposition agents, pearlescent agents, soil release polymers, germicides, abrasive materials, fungicides and mixtures thereof.

[0449] In certain embodiments, the invention also provides a personal and/or pet care cleaning composition comprising one or more microbially produced branched fatty alcohols and/or derivatives thereof, builders, and co-surfactants. Optionally, additional components can be included in the personal and/or pet care cleaning composition, including, for example, conditioners, silicones, silica particles, cationic cellulose or guar polymers, silicone microemulsion stabilizers, enzymes, fatty amphiphiles, germicides, fungicides, anti-dandruff agents, pearlescent agents, foam boosters, pediculocides, pH adjusting agents, UV absorbers, sunscreens, skin active agents, vitamins, minerals, herbal/fruit/food extracts, sphingolipids, sensory indicators, suspension agents, and mixtures thereof.

[0450] The invention further provides a method for cleaning a substrate, such as fibers, fabrics, hard surfaces, skin, hair, etc., by contacting the substrate with the cleaning composition of the invention and water. Agitation is preferably provided to enhance cleaning. Suitable means for providing agitation include rubbing by hand or with a brush, sponge, cloth, mop, or other cleaning device, automatic laundry machines, automatic dishwashers, and the like.

EXAMPLES

[0451] The invention is further illustrated by the following examples. The examples are provided for illustrative purposes only. They are not to be construed as limiting the scope or content of the invention in any way.

[0452] Although particular methods are described, one of ordinary skill in the art will understand that other, similar methods also can be used. In general, standard laboratory practices were used, unless otherwise stipulated. For example, standard laboratory practices were used for: cloning; manipulation and sequencing of nucleic acids; purification and analysis of proteins; and other molecular biological and biochemical techniques. Such techniques are explained in detail in standard laboratory manuals, such as Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd ed., vol. 1-3, Cold Spring Harbor, New York (2000), and Ausubel et al., Current Protocols in Molecular Biology, Greene Publ. Assoc. & Wiley-Interscience (1989).

Example 1

Constructing E. coli MG1655 .DELTA.fadE .DELTA.tonA AAR:Kan

[0453] This example describes the construction of a genetically engineered microorganism in which the expression of a fatty acid degradation enzyme is attenuated.

[0454] The fadE gene of E. coli MG1655 was deleted using the lambda red system described by Datsenko et al., Proc. Natl. Acad. Sci. USA 97: 6640-6645 (2000), with the following modifications:

[0455] The following two primers were used to create the deletion of fadE:

TABLE-US-00012
Del-fadE-F (SEQ ID NO: 158) 5'-AAAAACAGCAACAATGTGAGCTTTGTTGTAATTATATTGTAAAC ATATTGATTCCGGGGATCCGTCGACC; and
Del-fadE-R (SEQ ID NO: 159) 5'-AAACGGAGCCTTTCGGCTCCGTTATTCATTTACGCGGCTTCAAC TTTCCTGTAGGCTGGAGCTGCTTC

[0456] The Del-fadE-F and Del-fadE-R primers were used to amplify the kanamycin resistance (Km.sup.R) cassette from plasmid pKD13 by PCR. The PCR product was then used to transform electrocompetent E. coli MG1655 cells containing pKD46 that had been previously induced with arabinose for 3-4 hours. Following a 3-hour outgrowth in SOC medium at 37.degree. C., the cells were plated on Luria agar plates containing 50 .mu.g/ml of kanamycin. Resistant colonies were identified and isolated after an overnight incubation at 37.degree. C. Disruption of the fadE gene was confirmed in some of the colonies by PCR amplification using primers fadE-L2 and fadE-R1, which were designed to flank the E. coli fadE gene.

[0457] The fadE deletion confirmation primers were:

TABLE-US-00013
fadE-L2 (SEQ ID NO: 160) 5'-CGGGCAGGTGCTATGACCAGGAC; and
fadE-R1 (SEQ ID NO: 161) 5'-CGCGGCGTTGACCGGCAGCCTGG

[0458] After the fadE deletion was confirmed, a single colony was used to remove the Km.sup.R marker using the pCP20 plasmid as described by Datsenko et al., supra. The resulting MG1655 E. coli strain with the fadE gene deleted and the Km.sup.R marker removed was named E. coli MG1655 .DELTA.fadE, or E. coli MG1655 D1.

[0459] Furthermore, the expression of an outer membrane protein receptor for ferrichrome, colicin M, or phages T1, T5, and phi80 is attenuated.

[0460] The tonA gene of E. coli MG1655, which encodes a ferrichrome outer membrane transporter (GenBank Accession No. NP_414692), was deleted from strain E. coli MG1655 D1 of Example 1, using the lambda red system according to Datsenko et al., supra, but with the following modifications:

[0461] The primers used to create the deletion of tonA were:

TABLE-US-00014
Del-tonA-F (SEQ ID NO: 162) 5'-ATCATTCTCGTTTACGTTATCATTCACTTTACATCAGAGATATAC CAATGATTCCGGGGATCCGTCGACC; and
Del-tonA-R (SEQ ID NO: 163) 5'-GCACGGAAATCCGTGCCCCAAAAGAGAAATTAGAAACGGAAG GTTGCGG TTGTAGGCTGGAGCTGCTTC

[0462] The Del-tonA-F and Del-tonA-R primers were used to amplify the kanamycin resistance (Km.sup.R) cassette from plasmid pKD13 by PCR. The PCR product obtained in this way was used to transform electrocompetent E. coli MG1655 D1 cells of Example 1 containing pKD46, which cells had been previously induced with arabinose for 3-4 h. Following a 3-hour outgrowth in SOC medium at 37.degree. C., cells were plated on Luria agar plates containing 50 .mu.g/ml of kanamycin. Resistant colonies were identified and isolated after an overnight incubation at 37.degree. C. Disruption of the tonA gene was confirmed in some of the colonies by PCR amplification using primers flanking the E. coli tonA gene, tonA-verF and tonA-verR:

TABLE-US-00015
tonA-verF (SEQ ID NO: 164) 5'-CAACAGCAACCTGCTCAGCAA; and
tonA-verR (SEQ ID NO: 165) 5'-AAGCTGGAGCAGCAAAGCGTT

[0463] After the tonA deletion was confirmed, a single colony was used to remove the Km.sup.R marker using the pCP20 plasmid as described by Datsenko et al., supra. The resulting MG1655 E. coli strain having fadE and tonA gene deletions was named E. coli MG1655 .DELTA.fadE_.DELTA.tonA, or E. coli MG1655 DV2.
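The forward deletion primers used above for fadE and tonA end in the same 20 nucleotides. A minimal sketch of how such a primer can be split into a gene-specific 5' arm and a shared 3' region is given below. The sequences are copied from SEQ ID NOs: 158 and 162 with the line-wrap spaces removed; treating the shared 3' end as the region that anneals to the pKD13 template is an assumption made for illustration, as are the function and variable names.

```python
# Illustrative sketch only. Sequences are SEQ ID NO: 158 and SEQ ID NO: 162
# as printed in the tables above, with the line-wrap spaces removed.
DELETION_PRIMERS = {
    "Del-fadE-F": (
        "AAAAACAGCAACAATGTGAGCTTTGTTGTAATTATATTGTAAAC"
        "ATATTGATTCCGGGGATCCGTCGACC"
    ),
    "Del-tonA-F": (
        "ATCATTCTCGTTTACGTTATCATTCACTTTACATCAGAGATATAC"
        "CAATGATTCCGGGGATCCGTCGACC"
    ),
}

# Assumption: the 3' end common to both primers is the part that anneals to
# the resistance cassette template; the rest is gene-specific homology.
SHARED_3_PRIME_END = "ATTCCGGGGATCCGTCGACC"


def split_primer(primer: str, priming_site: str = SHARED_3_PRIME_END):
    """Split a primer into (homology arm, priming site).

    Raises ValueError if the primer does not end with the priming site.
    """
    if not primer.endswith(priming_site):
        raise ValueError("primer does not end with the expected priming site")
    return primer[: -len(priming_site)], priming_site


if __name__ == "__main__":
    for name, seq in DELETION_PRIMERS.items():
        arm, site = split_primer(seq)
        print(name, len(arm), len(site))
```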

[0464] The aar gene encoding Synechococcus elongatus PCC_Synpcc7942_1594 enzyme is integrated into the chromosome with the kanamycin marker directly after the aar sequence.

Example 2

Expression of BKD Homologs and FabH in E. coli

[0465] A branched chain alpha-keto acid dehydrogenase complex from Pseudomonas putida and a FabH from Bacillus subtilis were used to generate two E. coli plasmids for expression. First, the Pseudomonas putida BKD operon was PCR-amplified from Pseudomonas putida F1 genomic DNA. The following primers were used:

TABLE-US-00016
P.p.BKDFUsion_F: (SEQ ID NO: 166) 5'-ATAAACCATGGATCCATGAACGAGTACGCCCC-3'
P.pBKDFusion_R: (SEQ ID NO: 167) 5'-CCAAGCTTCGAATTCTCAGATATGCAAGGCGTG-3'

[0466] Using these primers, Pseudomonas putida Pput_1450 (GenBank Accession No. A5W0E08), Pput_1451 (GenBank Accession No. A5W0E9), Pput_1452 (GenBank Accession No. A5W0F0), and Pput_1453 (GenBank Accession No. A5W0F1) were amplified. The PCR product was then cloned into vector pGL10.173B (See, FIG. 8), a plasmid with a pBR322 backbone and a pTrc promoter to drive gene expression. The PCR product was cloned into pGL between BamHI and EcoRI restriction sites. Correct insertion of the PCR product was verified by diagnostic restriction digests. The resulting plasmid was named "pKZ4" (See, FIG. 7).

[0467] To clone E. coli PfabH promoter-B. subtilis fabH1 into a pACYC vector, insert of pDG6 (pCFDuet-E. coli PfabH promoter-B. subtilis fabH1) was subcloned into pACYC vector using NcoI and AvrII restriction sites. The resulting plasmid was named pDG6 (pCFDuet+E. coli PfabH+B. subtilis fabH1). (See, FIG. 6B and FIG. 6C)

[0468] E. coli strain MG1655 .DELTA.fadE_.DELTA.tonA, AAR:kan was transformed with pKZ4 and pDG6 (pCFDuet+E. coli PfabH+B. subtilis fabH1). The strain was evaluated for production of branched chain materials using shake flask fermentation. Shake flask fermentation was carried out using Che-9 media. Specifically, cultures of E. coli MG1655 .DELTA.fadE_.DELTA.tonA AAR:kan without plasmids or carrying individual plasmids were used as controls. Seed cultures of E. coli MG1655 .DELTA.fadE_.DELTA.tonA AAR:kan, E. coli MG1655 .DELTA.fadE_.DELTA.tonA AAR:kan+pKZ4, E. coli MG1655 .DELTA.fadE_.DELTA.tonA AAR:kan+pDG6, and E. coli MG1655 .DELTA.fadE_.DELTA.tonA AAR:kan+pKZ4+pDG6 were grown in LB broth supplemented with the appropriate antibiotics. After 4 hours of growth, the cultures were diluted 1:25 in Che-9 2NBT medium with the appropriate selection marker and grown overnight. The cultures were then diluted in 4NBT to a final OD.sub.600 of about 0.2. After 6 h of growth, IPTG was added to a final concentration of 1 mM. At 24 h post-induction, 1 mL of culture was extracted with 0.5 mL of methyl tert-butyl ether (MTBE) and subjected to GC/MS analysis. The analysis revealed the production of iso-C.sub.14:0, iso-C.sub.15:0, anteiso-C.sub.15:0, iso-C.sub.16:0, iso-C.sub.17:0, and anteiso-C.sub.17:0 fatty alcohols. (See, FIG. 4A).
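The dilution and induction steps in the fermentation above follow the usual C1 x V1 = C2 x V2 relationship. The sketch below works through that arithmetic. The overnight OD, the flask volume, and the IPTG stock concentration are hypothetical values chosen for illustration, since the example does not state them.

```python
# Illustrative sketch only. The overnight OD600, culture volume, and IPTG
# stock concentration used below are hypothetical; the example gives only
# the target OD600 (about 0.2) and the final IPTG concentration (1 mM).

def dilution_volumes(stock_conc: float, final_conc: float, final_volume: float):
    """Return (volume of stock, volume of diluent) using C1*V1 = C2*V2.

    Concentrations share one unit and volumes share another.
    """
    if stock_conc <= 0 or final_conc < 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_conc > stock_conc:
        raise ValueError("cannot dilute to a higher concentration")
    stock_volume = final_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


if __name__ == "__main__":
    # Hypothetical overnight culture at OD600 4.0 diluted to OD600 0.2 in 20 mL.
    culture_ml, medium_ml = dilution_volumes(4.0, 0.2, 20.0)
    print(f"{culture_ml:.2f} mL culture + {medium_ml:.2f} mL 4NBT medium")

    # Hypothetical 1 M (1000 mM) IPTG stock to reach 1 mM in the same 20 mL.
    iptg_ml, _ = dilution_volumes(1000.0, 1.0, 20.0)
    print(f"{iptg_ml * 1000:.0f} uL IPTG stock")
```

The IPTG calculation treats the added volume as negligible relative to the culture volume, which is a common simplification.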

Example 3

Quantification and Identification of Branched Fatty Alcohols

Instrumentation:

[0469] The instrument is an Agilent 5975B MSD system equipped with a 30 m.times.0.25 mm (0.10 .mu.m film) DB-5 column. The mass spectrometer was equipped with an electron impact ionization source. Two GC/MS programs were utilized.

[0470] GC/MS program #1: The temperature of the column is held isothermal at 90.degree. C. for 5 min, then is raised to 300.degree. C. with a 25.degree. C./min ramp, and finally stays at 300.degree. C. for 1.6 min. The total run time is 15 min. With this program, the inlet temperature is held at 300.degree. C. The injector is set at splitless mode. 1 .mu.L of sample is injected for every injection. The carrier gas (helium) is released at 1.0 mL/min. The source temperature of the mass spectrometer is held at 230.degree. C.

[0471] GC/MS program #2: The temperature of the column is held isothermal at 100.degree. C. for 3 min, then is raised to 320.degree. C. with a 20.degree. C./min ramp, and finally stays isothermal at 320.degree. C. for 5 min. The total run time is 19 min. The injector is set at splitless mode. 1 .mu.L of sample is injected for every injection. The carrier gas (helium) is released at 1.2 mL/min. The ionization source temperature is set at 230.degree. C.
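The total run times stated for the two GC/MS programs follow directly from the hold times and ramp rates. A brief sketch of that arithmetic is shown below; the function name is hypothetical, and a single linear ramp between the two holds is assumed, as described in the text.

```python
# Illustrative sketch only: reproduces the stated run times from the oven
# parameters, assuming one linear ramp between an initial and a final hold.

def oven_run_time(initial_temp_c, initial_hold_min, ramp_c_per_min,
                  final_temp_c, final_hold_min):
    """Total oven program time in minutes."""
    if ramp_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    ramp_min = (final_temp_c - initial_temp_c) / ramp_c_per_min
    return initial_hold_min + ramp_min + final_hold_min


# Program #1: 90 C for 5 min, 25 C/min to 300 C, hold 1.6 min.
program_1 = oven_run_time(90, 5, 25, 300, 1.6)
# Program #2: 100 C for 3 min, 20 C/min to 320 C, hold 5 min.
program_2 = oven_run_time(100, 3, 20, 320, 5)

if __name__ == "__main__":
    print(round(program_1, 2), round(program_2, 2))
```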

Samples:

[0472] Extracts containing fatty alcohols produced by the engineered E. coli strains were analyzed by GC/MS. In FIG. 4A, chromatograms of the extracts from the mutant strains are compared to those from control strains, which produce only straight chain fatty alcohols. The branched fatty alcohols produced are: iso-C.sub.14:0, iso-C.sub.15:0, anteiso-C.sub.15:0, iso-C.sub.16:0, iso-C.sub.17:0, and anteiso-C.sub.17:0.

[0473] The top panel of FIG. 4A shows GC/MS chromatograms of extracts from strain E. coli MG1655 .DELTA.fadE .DELTA.tonA AAR:kan+pKZ4+pDG6 (a) and from control strain E. coli MG1655 .DELTA.fadE .DELTA.tonA AAR:kan+pBR322+pCFDuet (b). Both chromatograms were obtained with GC/MS program #2. Compared to the control strain, the mutant strain produces branched-chain fatty alcohols, and the peaks representing the branched fatty alcohols are boxed.

GC/MS semi-quantitative analysis:

[0474] In addition to the qualitative analysis, semi-quantitative analysis was performed to obtain the ratio between the branched chain compounds and the straight chain isomers. Due to the lack of commercially available standards for branched fatty alcohols, accurate quantitation of the branched chain compounds was challenging. However, by using a straight chain standard with the same functional group, the relative quantity or yield of branched-chain fatty alcohols in relation to the yield of their straight-chain counterparts (isomers) was estimated semi-quantitatively. A standard curve quantitation method was applied, wherein standard mixtures with different concentrations were analyzed by the same GC/MS program as the samples. After data acquisition, the instrument response (total ion current) was plotted against the concentrations of the standards. Linear calibration curves were obtained. (See, FIG. 5). The concentration of branched alcohols in a given sample was calculated according to Equation 1: y=ax+b, wherein y is the instrument response for a particular compound in a sample and x is the concentration of the branched fatty alcohol product in the sample. Slope a and intercept b for this calibration curve were determined by the linear regression of all calibration levels of standard fatty alcohols (FIG. 4A lower panel). Accordingly, the relative concentration of branched fatty alcohols in the production mixture was calculated.
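The standard curve approach of Equation 1 can be sketched in a few lines: fit y=ax+b to the standards by least squares, then invert the line to estimate a concentration from a sample response. The calibration data below are invented placeholders, not measurements from this example, and the function names are hypothetical.

```python
# Illustrative sketch only. The calibration levels and responses are invented
# placeholders; they are not data from the example.

def fit_line(concentrations, responses):
    """Ordinary least-squares fit of response = a * concentration + b."""
    n = len(concentrations)
    if n < 2 or n != len(responses):
        raise ValueError("need at least two matched calibration points")
    mean_x = sum(concentrations) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, responses))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def concentration_from_response(response, a, b):
    """Invert Equation 1 (y = a*x + b) to estimate the concentration x."""
    return (response - b) / a


# Hypothetical straight-chain standard at four levels (mg/L) and the
# corresponding total ion current responses (arbitrary units).
standard_mg_per_l = [10.0, 25.0, 50.0, 100.0]
standard_response = [20500.0, 50500.0, 100500.0, 200500.0]

slope, intercept = fit_line(standard_mg_per_l, standard_response)

if __name__ == "__main__":
    # Response of a branched alcohol peak quantified against this standard.
    print(concentration_from_response(80500.0, slope, intercept))
```

Because a straight-chain standard stands in for each branched compound, the result is a semi-quantitative estimate, as the text notes.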

[0475] The table below lists the compounds used as standards to quantify different branched fatty alcohol compounds.

TABLE-US-00017
Alcohol in sample          Standard used for quantitation
Iso-Alc C.sub.15:0         Alc C.sub.15:0
Anteiso-Alc C.sub.15:0     Alc C.sub.15:0
Alc C.sub.15:0             Alc C.sub.15:0
Ald C.sub.16:0             Alc C.sub.15:0
Alc C.sub.16:0             Alc C.sub.15:0

[0476] Once the titers were obtained for all the fatty alcohol compounds, the ratio between the production of branched chain fatty alcohols and the production of straight chain isomers was calculated according to Equation 2:

\[ \text{Percentage production} = \frac{\text{Total branched chain products in mg/L}}{\text{Total straight chain products in mg/L}} \times 100\% \]

[0477] Using this method, we were able to semi-quantitatively estimate the amount of branched fatty alcohol yield relative to the straight-chain fatty alcohol yield to be about 48%.
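The percentage calculation described above reduces to a ratio of summed titers. The sketch below shows one way to compute it; the titer values are invented placeholders and do not reproduce the measured result reported above.

```python
# Illustrative sketch only. All titers below are invented placeholders in
# mg/L; they are not the measured values behind the reported estimate.

def percentage_production(branched_titers, straight_titers):
    """Branched-chain titer as a percentage of straight-chain titer."""
    total_straight = sum(straight_titers.values())
    if total_straight <= 0:
        raise ValueError("total straight-chain titer must be positive")
    return 100.0 * sum(branched_titers.values()) / total_straight


hypothetical_branched = {"iso-C15:0": 5.0, "anteiso-C15:0": 4.0, "iso-C16:0": 3.0}
hypothetical_straight = {"C14:0": 10.0, "C16:0": 20.0, "C18:0": 10.0}

if __name__ == "__main__":
    print(percentage_production(hypothetical_branched, hypothetical_straight))
```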

Example 4

Production of Branched Acyl-CoA Precursors

[0478] An E. coli strain, MG1655(DE3) .DELTA.fadE::FRT .DELTA.fabH::cat/pDG6 was created, which was tested for its ability to utilize branched-chain substrate molecules to create branched-chain fatty precursors of branched fatty alcohols in vivo.

[0479] The strain MG1655(DE3) .DELTA.fadE::FRT .DELTA.fabH::cat/pDG6 was constructed as follows:

[0480] A region of the E. coli fabH gene described in Lai, et al., 2003, J. Biol. Chem. 278(51): 59494, was replaced by an antibiotic resistance gene. This deletion was performed in a strain that was complemented for fabH by the plasmid pDG6 carrying the B. subtilis fabH1 gene.

[0481] Initially, the pDG2 plasmid was constructed. The pCDFDuet-1 vector was purchased from Novagen/EMD Biosciences. The vector carries the CloDF13 replicon, lacI gene and streptomycin/spectinomycin resistance gene (aadA).

[0482] The C-terminal portion of the plsX gene, which contains an internal promoter for the downstream fabH gene, was amplified from E. coli MG1655 genomic DNA using primers 5'-TGAATTCCATGGCGCAACTCACTCTTCTTTTAGTCG-3' (SEQ ID NO:168) and 5'-CAGTACCTCGAGTCTTCGTATACATATGCGCT CAGTCAC-3' (SEQ ID NO:169). These primers introduced NcoI and XhoI restriction sites near the ends, as well as an internal NdeI site.
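The restriction sites that these primers are said to introduce can be located with a simple string search. The sketch below uses the standard recognition sequences for NcoI (CCATGG), XhoI (CTCGAG), and NdeI (CATATG) and the two primer sequences given above (SEQ ID NOs: 168 and 169) with the line-wrap space removed. Because all three sites are palindromic, only the printed strand is searched. The function and variable names are hypothetical.

```python
# Illustrative sketch only. Primer sequences are SEQ ID NO: 168 and 169 as
# printed above, with the line-wrap space removed.
RECOGNITION_SITES = {
    "NcoI": "CCATGG",
    "XhoI": "CTCGAG",
    "NdeI": "CATATG",
}

PLSX_FORWARD = "TGAATTCCATGGCGCAACTCACTCTTCTTTTAGTCG"
PLSX_REVERSE = "CAGTACCTCGAGTCTTCGTATACATATGCGCT" "CAGTCAC"


def find_sites(sequence: str, sites=RECOGNITION_SITES):
    """Map each enzyme name to the 0-based start positions of its site."""
    seq = sequence.upper().replace(" ", "")
    hits = {}
    for enzyme, site in sites.items():
        hits[enzyme] = [i for i in range(len(seq) - len(site) + 1)
                        if seq.startswith(site, i)]
    return hits


if __name__ == "__main__":
    for name, primer in (("forward", PLSX_FORWARD), ("reverse", PLSX_REVERSE)):
        print(name, find_sites(primer))
```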

[0483] Both the plsX insert and pCDFDuet-1 vector were digested with restriction enzymes NcoI and XhoI. The cut vector was treated with Antarctic phosphatase. The insert was ligated into the vector and transformed into chemically competent TOP10 cells. Clones were screened by DNA sequencing. See, FIG. 6A.

[0484] Then a pDG6 plasmid was constructed using the pDG2 plasmid. The fabH1 gene from Bacillus subtilis strain 168 was amplified from plasmid pLS9-114 (see, FIG. 6K) using primers 5'-CCTTGGGGCATATGAAAGCTG-3' (SEQ ID NO:170) and 5'-TTTAGTCATCTCGAGTGCACCTCACCTTT-3' (SEQ ID NO:171). These primers introduced or included NdeI and XhoI restriction sites.

[0485] Both the fabH1 insert and pDG2 vector were digested with restriction enzymes NdeI and XhoI. The cut vector was treated with Antarctic phosphatase. The insert was ligated into the vector and transformed into chemically competent TOP10 cells. Clones were screened by DNA sequencing. See, FIG. 6B and FIG. 6C.

[0486] Then, the cat chloramphenicol resistance gene was amplified from template plasmid pKD3 using primers 5'-GCCACATTGCCGCGCCAAACGAAACC GTTTCAACCATGGCATATGAATATCCTCCTTAGTTCCTATTCCG-3' (SEQ ID NO: 172) and 5'-CGCCCCAGATTTCACGTATTGATCGGCTACGCTTAATGCAT GTGTAGGCTGGAGCTGCTTC-3' (SEQ ID NO:173), which added 50 bp nucleotide ends that are homologous to the E. coli fabH gene. This linear PCR product was used to inactivate the E. coli fabH gene.

[0487] Strain MG1655(DE3) .DELTA.fadE::FRT was first transformed with plasmid pKD46 encoding the lambda red recombinase genes. MG1655(DE3) .DELTA.fadE::FRT/pKD46 was then transformed with plasmid pDG6. Finally, MG1655(DE3) .DELTA.fadE::FRT/pKD46+pDG6 was induced for expression of the recombinase genes by addition of 10 mM arabinose and transformed with the linear PCR product as described in Datsenko et al. (supra). Colonies were selected on LB plates containing 30 .mu.g/mL chloramphenicol and screened using colony PCR with primers 5'-TTGACACGTC TAACCCTGGC-3' (SEQ ID NO:174) and 5'-CTGTCCAGGGAACACAAATG C-3' (SEQ ID NO:175).

[0488] A number of other constructs, including pDG7 and pDG8, were also constructed following the approach described above. The plasmids were prepared as follows.

[0489] The plasmid pDG7 was prepared from pDG2 with a B. subtilis fabH2 insert. The fabH2 gene from Bacillus subtilis strain 168 was amplified from plasmid pLS9-111 (see, FIG. 6J) using primers 5'-TTGTGTCGCCCTTTCGCTG-3' (SEQ ID NO:176) and 5'-CTTACGTACGTACTCGAGTGACGC-3' (SEQ ID NO:177). These primers introduced or included NdeI and XhoI restriction sites.

[0490] Both the fabH2 insert and pDG2 vector were digested with restriction enzymes NdeI and XhoI. The cut vector was treated with Antarctic phosphatase. The insert was ligated into the vector and transformed into chemically competent TOP10 cells. Clones were screened by DNA sequencing. See, FIG. 6D and FIG. 6E.

[0491] The plasmid pDG8 was prepared from pDG2 with an S. coelicolor fabH insert. The fabH gene from Streptomyces coelicolor was amplified from plasmid pLS9-115 (see, FIG. 6L) using primers 5'-AAGTGGGGCATATGTCTAAGATC-3' (SEQ ID NO:178) and 5'-GTGATCCGGCTCGAGGTGGTTAC-3' (SEQ ID NO:179). These primers introduced or included NdeI and XhoI restriction sites.

[0492] Both the fabH insert and pDG2 vector were digested with restriction enzymes NdeI and XhoI. The cut vector was treated with Antarctic phosphatase. The insert was ligated into the vector and transformed into chemically competent TOP10 cells. Clones were screened by DNA sequencing. See, FIG. 6F and FIG. 6G.

[0493] The plasmid pDG10 was prepared using the pCR-Blunt vector, which was purchased from Invitrogen, with a C. acetobutylicum ptb_buk operon insert, wherein the ptb part represents the gene encoding C. acetobutylicum phosphotransbutyrylase (GenBank Accession AAA75486.1, SEQ ID NO:156), and the buk part represents the gene encoding C. acetobutylicum butyrate kinase (GenBank Accession JN0795, SEQ ID NO:157). The ptb_buk operon was amplified from Clostridium acetobutylicum genomic DNA (ATCC 824) using primers 5'-CTTAACTTCATGTGAAAAGTTTGT-3' (SEQ ID NO:180) and 5'-ACAATACCCATGTTTATAGGGCAA-3' (SEQ ID NO:181). The PCR product was ligated into the pCR-Blunt vector following the manufacturer's instructions. Colonies were verified by DNA sequencing. See, FIG. 6H and FIG. 6I.

[0494] E. coli strains were transformed with pDG10, with the OP-180 plasmid comprising the E. coli thioesterase gene tesA under the control of the Ptrc promoter, and, independently, with one of the pDG6, pDG7, and pDG8 plasmids described above.

[0495] These strains were fed the branched molecule isobutyrate, which resulted in iso-C.sub.14:0 and iso-C.sub.16:0 branched acyl-CoA precursors. Independently, they were fed the branched molecule isovalerate, which resulted in iso-C.sub.13:0 and iso-C.sub.15:0 branched acyl-CoA precursors. See, FIG. 4B. These precursors can then be incorporated into the branched fatty alcohol pathways as described herein and depicted in FIG. 1A and FIG. 1B.
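The chain lengths reported above are consistent with simple carbon bookkeeping: a four-carbon starter (isobutyrate) gives even-numbered iso chains, and a five-carbon starter (isovalerate) gives odd-numbered iso chains. The sketch below illustrates this, assuming that each fatty acid elongation cycle adds two carbons to the starter unit. The number of cycles chosen for each product is an illustrative assumption, and the names used are hypothetical.

```python
# Illustrative sketch only. Assumes each elongation cycle adds two carbons
# to the branched starter unit; cycle counts below are chosen to match the
# chain lengths named in the text.
STARTER_CARBONS = {
    "isobutyrate": 4,  # 2-methylpropanoate
    "isovalerate": 5,  # 3-methylbutanoate
}


def chain_lengths(starter: str, cycles):
    """Total carbon count after each given number of elongation cycles."""
    base = STARTER_CARBONS[starter]
    return [base + 2 * n for n in cycles]


if __name__ == "__main__":
    print("isobutyrate:", chain_lengths("isobutyrate", [5, 6]))
    print("isovalerate:", chain_lengths("isovalerate", [4, 5]))
```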

Other Embodiments

[0496] It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.

Sequence Listing

1811330PRTBacillus subtilis 1Met Ser Thr Asn Arg His Gln Ala Leu Gly Leu Thr Asp Gln Glu Ala 1 5 10 15 Val Asp Met Tyr Arg Thr Met Leu Leu Ala Arg Lys Ile Asp Glu Arg 20 25 30 Met Trp Leu Leu Asn Arg Ser Gly Lys Ile Pro Phe Val Ile Ser Cys 35 40 45 Gln Gly Gln Glu Ala Ala Gln Val Gly Ala Ala Phe Ala Leu Asp Arg 50 55 60 Glu Met Asp Tyr Val Leu Pro Tyr Tyr Arg Asp Met Gly Val Val Leu 65 70 75 80 Ala Phe Gly Met Thr Ala Lys Asp Leu Met Met Ser Gly Phe Ala Lys 85 90 95 Ala Ala Asp Pro Asn Ser Gly Gly Arg Gln Met Pro Gly His Phe Gly 100 105 110 Gln Lys Lys Asn Arg Ile Val Thr Gly Ser Ser Pro Val Thr Thr Gln 115 120 125 Val Pro His Ala Val Gly Ile Ala Leu Ala Gly Arg Met Glu Lys Lys 130 135 140 Asp Ile Ala Ala Phe Val Thr Phe Gly Glu Gly Ser Ser Asn Gln Gly 145 150 155 160 Asp Phe His Glu Gly Ala Asn Phe Ala Ala Val His Lys Leu Pro Val 165 170 175 Ile Phe Met Cys Glu Asn Asn Lys Tyr Ala Ile Ser Val Pro Tyr Asp 180 185 190 Lys Gln Val Ala Cys Glu Asn Ile Ser Asp Arg Ala Ile Gly Tyr Gly 195 200 205 Met Pro Gly Val Thr Val Asn Gly Asn Asp Pro Leu Glu Val Tyr Gln 210 215 220 Ala Val Lys Glu Ala Arg Glu Arg Ala Arg Arg Gly Glu Gly Pro Thr 225 230 235 240 Leu Ile Glu Thr Ile Ser Tyr Arg Leu Thr Pro His Ser Ser Asp Asp 245 250 255 Asp Asp Ser Ser Tyr Arg Gly Arg Glu Glu Val Glu Glu Ala Lys Lys 260 265 270 Ser Asp Pro Leu Leu Thr Tyr Gln Ala Tyr Leu Lys Glu Thr Gly Leu 275 280 285 Leu Ser Asp Glu Ile Glu Gln Thr Met Leu Asp Glu Ile Met Ala Ile 290 295 300 Val Asn Glu Ala Thr Asp Glu Ala Glu Asn Ala Pro Tyr Ala Ala Pro 305 310 315 320 Glu Ser Ala Leu Asp Tyr Val Tyr Ala Lys 325 330 2993DNABacillus subtilis 2ctacttcgca taaacataat caagcgctga ctcaggagct gcatatgggg cgttctccgc 60ttcatccgtc gcttcattta cgattgccat aatttcatcc agcatggttt gttctatctc 120atcggacagc aggcctgttt cctttaagta agcttgataa gtaagcaggg gatcactttt 180tttcgcttcc tctacttctt cacggcctct gtagctgctg tcatcgtcat cactggaatg 240tggtgtaagg cggtaagaaa tcgtttcaat taatgtcggg ccttctcctc tgcgtgccct 300ttcgcgtgct tctttaaccg cttgataaac 
ttccagcgga tcatttccat tcacagttac 360gccaggcatc ccatagccta tggcacggtc ggaaatgttc tcacatgcga cttgcttatc 420gtaaggcact gagattgcgt atttgttgtt ttcacacatg aaaataaccg gcagcttatg 480gacagcggca aagtttgccc cttcatggaa atcgccttgg tttgaagacc cttccccgaa 540tgtaacaaag gctgcgatat cctttttctc catacgtccc gcaagcgcaa taccgactgc 600gtgcggcact tgcgttgtaa ccggagatga tcccgtcaca atgcggtttt tcttttgtcc 660gaaatgtccc ggcatctggc ggcctcctga gttcggatct gctgcttttg caaacccgga 720catcattaag tcctttgctg tcatgccaaa cgcgagcacg acacccatgt ctctgtagta 780cggcaataca taatccattt cacggtcaag tgcgaaagcc gctcctacct gtgctgcttc 840ctgtccttga caagagatta caaatggaat tttgccagaa cggtttaaca gccacattct 900ttcatcgatt tttcttgcta acagcatggt tctatacata tcaacggctt cctgatcagt 960cagccctagt gcttgatgtc ggtttgtact cat 9933406PRTStreptomyces avermitilis 3Met Thr Val Glu Ser Thr Ala Ala Arg Lys Pro Arg Arg Ser Ala Gly 1 5 10 15 Thr Lys Ser Ala Ala Ala Lys Arg Thr Ser Pro Gly Ala Lys Lys Ser 20 25 30 Pro Ser Thr Thr Gly Ala Glu His Glu Leu Ile Gln Leu Leu Thr Pro 35 40 45 Asp Gly Arg Arg Val Lys Asn Pro Glu Tyr Asp Ala Tyr Val Ala Asp 50 55 60 Ile Thr Pro Glu Glu Leu Arg Gly Leu Tyr Arg Asp Met Val Leu Ser 65 70 75 80 Arg Arg Phe Asp Ala Glu Ala Thr Ser Leu Gln Arg Gln Gly Glu Leu 85 90 95 Gly Leu Trp Ala Ser Met Leu Gly Gln Glu Ala Ala Gln Ile Gly Ser 100 105 110 Gly Arg Ala Thr Arg Asp Asp Asp Tyr Val Phe Pro Thr Tyr Arg Glu 115 120 125 His Gly Val Ala Trp Cys Arg Gly Val Asp Pro Thr Asn Leu Leu Gly 130 135 140 Met Phe Arg Gly Val Asn Asn Gly Gly Trp Asp Pro Asn Ser Asn Asn 145 150 155 160 Phe His Leu Tyr Thr Ile Val Ile Gly Ser Gln Thr Leu His Ala Thr 165 170 175 Gly Tyr Ala Met Gly Ile Ala Lys Asp Gly Ala Asp Ser Ala Val Ile 180 185 190 Ala Tyr Phe Gly Asp Gly Ala Ser Ser Gln Gly Asp Val Ala Glu Ser 195 200 205 Phe Thr Phe Ser Ala Val Tyr Asn Ala Pro Val Val Phe Phe Cys Gln 210 215 220 Asn Asn Gln Trp Ala Ile Ser Glu Pro Thr Glu Lys Gln Thr Arg Val 225 230 235 240 Pro Leu Tyr Gln Arg Ala Gln Gly Tyr Gly Phe Pro Gly Val Arg Val 
245 250 255 Asp Gly Asn Asp Val Leu Ala Cys Leu Ala Val Thr Lys Trp Ala Leu 260 265 270 Glu Arg Ala Arg Arg Gly Glu Gly Pro Thr Leu Val Glu Ala Phe Thr 275 280 285 Tyr Arg Met Gly Ala His Thr Thr Ser Asp Asp Pro Thr Lys Tyr Arg 290 295 300 Ala Asp Glu Glu Arg Glu Ala Trp Glu Ala Lys Asp Pro Ile Leu Arg 305 310 315 320 Leu Arg Thr Tyr Leu Glu Ala Ser Asn His Ala Asp Glu Gly Phe Phe 325 330 335 Ala Glu Leu Glu Val Glu Ser Glu Ala Leu Gly Arg Arg Val Arg Glu 340 345 350 Val Val Arg Ala Met Pro Asp Pro Asp His Phe Ala Ile Phe Glu Asn 355 360 365 Val Tyr Ala Asp Gly His Ala Leu Val Asp Glu Glu Arg Ala Gln Phe 370 375 380 Ala Ala Tyr Gln Ala Ser Phe Thr Thr Glu Pro Asp Gly Gly Ser Ala 385 390 395 400 Ala Gly Gln Gly Gly Asn 405 41221DNAStreptomyces avermitilis 4gtgaccgtgg agagcactgc cgcgcgaaag ccgcgacgca gcgccggtac gaagagcgcc 60gcagccaagc gcaccagccc cggcgccaag aagtcaccga gcacgaccgg cgccgagcac 120gagctgattc agctgctcac gcccgacggc cggcgggtga agaaccccga gtacgacgcg 180tacgtcgcgg acatcacccc cgaagagctg cgcggtctgt accgggacat ggtgctgagc 240cgccgcttcg acgcagaggc cacctccctg caacgccagg gcgagctggg cctgtgggcc 300tcgatgctcg ggcaggaggc cgcccagatc ggctcgggcc gggccacccg tgacgacgac 360tacgtcttcc cgacctaccg cgagcacggc gtcgcctggt gccgcggggt cgaccccacc 420aacctgctcg gcatgttccg cggcgtgaac aacggcggct gggatcccaa cagcaacaac 480ttccacctct acacgatcgt catcggctcg cagacgctgc acgccaccgg ctacgccatg 540ggtatcgcca aggacggcgc cgactcggcc gtgatcgcgt acttcggtga cggcgcctcc 600agccagggtg acgtcgccga atcgttcacc ttctccgcgg tctacaacgc ccctgtcgtc 660ttcttctgcc agaacaacca gtgggcgatc tccgagccca ccgagaagca gacccgcgtc 720ccgctctacc agcgcgcgca gggctacggc ttcccgggcg tccgcgtcga cggcaacgac 780gtactggcct gcctcgccgt caccaagtgg gccctcgagc gggcccgccg gggcgagggg 840cccacgttgg tcgaggcgtt cacgtaccgc atgggcgcgc acaccacctc cgacgacccg 900accaagtacc gggccgacga ggagcgcgag gcgtgggagg cgaaggaccc gatcctgcgt 960ctgcgcacgt atctcgaggc ctcaaaccac gcggacgagg gattcttcgc ggaactcgag 1020gtggagagcg aggcgttggg aaggcgagtg cgcgaagtgg tgcgtgccat 
gccggacccg 1080gaccacttcg ccatcttcga gaacgtgtac gcggacgggc atgcgctcgt cgacgaggag 1140cgggcgcagt tcgccgccta ccaggcgtcg ttcacgacgg agcctgacgg cggctccgcc 1200gcgggacagg ggggtaactg a 12215410PRTPseudomonas putida 5Met Asn Glu Tyr Ala Pro Leu Arg Leu His Val Pro Glu Pro Thr Gly 1 5 10 15 Arg Pro Gly Cys Gln Thr Asp Phe Ser Tyr Leu Arg Leu Asn Asp Ala 20 25 30 Gly Gln Ala Arg Lys Pro Ala Ile Asp Val Asp Ala Ala Asp Thr Ala 35 40 45 Asp Leu Ser Tyr Ser Leu Val Arg Val Leu Asp Glu Gln Gly Asp Ala 50 55 60 Gln Gly Pro Trp Ala Glu Asp Ile Asp Pro Gln Ile Leu Arg Gln Gly 65 70 75 80 Met Arg Ala Met Leu Lys Thr Arg Ile Phe Asp Ser Arg Met Val Val 85 90 95 Ala Gln Arg Gln Lys Lys Met Ser Phe Tyr Met Gln Ser Leu Gly Glu 100 105 110 Glu Ala Ile Gly Ser Gly Gln Ala Leu Ala Leu Asn Arg Thr Asp Met 115 120 125 Cys Phe Pro Thr Tyr Arg Gln Gln Ser Ile Leu Met Ala Arg Asp Val 130 135 140 Ser Leu Val Glu Met Ile Cys Gln Leu Leu Ser Asn Glu Arg Asp Pro 145 150 155 160 Leu Lys Gly Arg Gln Leu Pro Ile Met Tyr Ser Val Arg Glu Ala Gly 165 170 175 Phe Phe Thr Ile Ser Gly Asn Leu Ala Thr Gln Phe Val Gln Ala Val 180 185 190 Gly Trp Ala Met Ala Ser Ala Ile Lys Gly Asp Thr Lys Ile Ala Ser 195 200 205 Ala Trp Ile Gly Asp Gly Ala Thr Ala Glu Ser Asp Phe His Thr Ala 210 215 220 Leu Thr Phe Ala His Val Tyr Arg Ala Pro Val Ile Leu Asn Val Val 225 230 235 240 Asn Asn Gln Trp Ala Ile Ser Thr Phe Gln Ala Ile Ala Gly Gly Glu 245 250 255 Ser Thr Thr Phe Ala Gly Arg Gly Val Gly Cys Gly Ile Ala Ser Leu 260 265 270 Arg Val Asp Gly Asn Asp Phe Val Ala Val Tyr Ala Ala Ser Arg Trp 275 280 285 Ala Ala Glu Arg Ala Arg Arg Gly Leu Gly Pro Ser Leu Ile Glu Trp 290 295 300 Val Thr Tyr Arg Ala Gly Pro His Ser Thr Ser Asp Asp Pro Ser Lys 305 310 315 320 Tyr Arg Pro Ala Asp Asp Trp Ser His Phe Pro Leu Gly Asp Pro Ile 325 330 335 Ala Arg Leu Lys Gln His Leu Ile Lys Ile Gly His Trp Ser Glu Glu 340 345 350 Glu His Gln Ala Val Thr Ala Glu Leu Glu Ala Ala Val Ile Ala Ala 355 360 365 Gln Lys Glu Ala Glu Gln Tyr Gly Thr Leu Ala 
Asn Gly His Ile Pro 370 375 380 Ser Ala Ala Ser Met Phe Glu Asp Val Tyr Lys Glu Met Pro Glu His 385 390 395 400 Leu Arg Arg Gln Arg Gln Glu Leu Gly Val 405 410 61233DNAPseudomonas putida 6tcaaaccccc agttcctggc gttgacggcg caggtgttcg ggcatctcct tgtacacatc 60ctcgaacatc gaggcggcgc tcgggatgtg cccgttagcc agggtgccgt actgctcggc 120ttctttctgt gcggcaatca ccgcagcttc gagctcggcc gtgacggctt ggtgttcttc 180ttcggaccag tggccgatct tgatcaggtg ctgcttcagg cgggcgatcg ggtcacccag 240cgggaagtgg ctccagtcat cggcagggcg gtacttggag gggtcgtccg acgtcgagtg 300cgggccggca cggtaggtga cccactcgat caggcttggg cccaggccgc ggcgggcgcg 360ctcggcagcc cagcgcgagg cggcgtacac ggcgacgaag tcgttgccgt caacccgcag 420cgaggcaatg ccgcagccca cgccacggcc ggcgaaggtg gtcgactcgc caccggcgat 480ggcctggaag gtagaaatcg cccactggtt gttgaccaca ttgaggatca ccggggcgcg 540gtaaacgtgg gcaaaggtga gggcggtgtg gaagtccgac tcggcggtgg ctccgtcacc 600gatccacgcc gaagcaatct tggtatcgcc cttgatcgcc gaggccatgg cccagccgac 660tgcctgcacg aactgggtcg ccaggttgcc gctgatggtg aagaagccgg cttcgcgcac 720cgagtacatg atcggcaact ggcggccctt gagggggtcg cgctcgttgg acagcagttg 780gcagatcatc tcgaccagcg atacgtcgcg ggccatcagg atgctttgct ggcggtaggt 840cgggaagcac atgtcggtgc ggttcagcgc cagcgcctgg ccactgccga tggcttcttc 900gcccaggctt tgcatgtaga aggacatctt cttctggcgc tgggcaacca ccatgcggct 960gtcgaagatc cgcgtcttga gcatggcgcg catgccttga cgaaggatct gtgggtcgat 1020gtcttcggcc caggggcctt gcgcatcacc ttgctcgtcg agcacgcgga ccaggctgta 1080ggacaggtcg gcagtgtcgg cagcatcgac atcgatcgcg ggtttacggg cttgacctgc 1140atcgttgagg cgcaggtagg aaaaatcggt ctggcagcct ggccggccgg tgggctcggg 1200cacatgcaaa cgcagggggg cgtactcgtt cat 12337331PRTListeria monocytogenes 7Met Thr Leu Lys Glu Ala Gly Leu Thr Glu Asp Lys Leu Ile Lys Met 1 5 10 15 Tyr Glu Thr Met Leu Met Ala Arg Arg Leu Asp Glu Arg Met Trp Leu 20 25 30 Leu Asn Arg Ser Gly Lys Ile Pro Phe Thr Ile Ser Gly Gln Gly Gln 35 40 45 Glu Thr Ala Gln Ile Gly Ala Ala Phe Ala Phe Asp Leu Asp Lys Asp 50 55 60 Tyr Ala Leu Pro Tyr Tyr Arg Asp Leu Ala Val Val Leu Ala Phe Gly 
65 70 75 80 Met Thr Ala Lys Asp Ile Met Leu Ser Ala Phe Ala Lys Ala Glu Asp 85 90 95 Pro Asn Ser Gly Gly Arg Gln Met Pro Ala His Phe Gly Gln Lys Ser 100 105 110 Asn Arg Ile Val Thr Gln Ser Ser Pro Val Thr Thr Gln Phe Pro His 115 120 125 Ala Ala Gly Ile Gly Leu Ala Ala Lys Met Ala Gly Asp Glu Ile Ala 130 135 140 Ile Tyr Ala Ser Thr Gly Glu Gly Ser Ser Asn Gln Gly Asp Phe His 145 150 155 160 Glu Gly Ile Asn Phe Ala Ser Val His Lys Leu Pro Val Val Phe Val 165 170 175 Ile His Asn Asn Gln Tyr Ala Ile Ser Val Pro Ala Ser Lys Gln Tyr 180 185 190 Ala Ala Glu Lys Leu Ser Asp Arg Ala Ile Gly Tyr Gly Ile Pro Gly 195 200 205 Glu Arg Val Asp Gly Thr Asn Met Gly Glu Val Tyr Ala Ala Phe Lys 210 215 220 Arg Ala Ala Asp Arg Ala Arg Asn Gly Glu Gly Pro Thr Leu Ile Glu 225 230 235 240 Thr Val Ser Tyr Arg Phe Thr Pro His Ser Ser Asp Asp Asp Asp Ser 245 250 255 Ser Tyr Arg Ser Arg Glu Glu Val Asn Glu Ala Lys Gly Lys Asp Pro 260 265 270 Leu Thr Ile Phe Gln Thr Glu Leu Leu Glu Glu Gly Tyr Leu Thr Glu 275 280 285 Glu Lys Ile Ala Glu Ile Glu Lys Asn Ile Ala Lys Glu Val Asn Glu 290 295 300 Ala Thr Asp Tyr Ala Glu Ser Ala Ala Tyr Ala Glu Pro Glu Ser Ser 305 310 315 320 Leu Leu Tyr Val Tyr Asp Glu Glu Ala Asn Ser 325 330 8996DNAListeria monocytogenes 8atgactttaa aagaagcagg tttaacagaa gataaattaa ttaaaatgta tgaaacaatg 60ctaatggcaa gaagactaga cgagcgtatg tggttgctga accgttctgg gaaaattcct 120ttcaccattt ctggacaagg acaagaaacg gcacaaattg gcgcagcgtt tgcctttgat 180ttagataaag attacgcatt accatattac cgtgatttag cggtggtgtt agcatttggg 240atgacagcga aagatattat gttatccgcg ttcgctaaag cagaggatcc aaactctggt 300ggacgtcaaa tgccagctca ttttggtcaa aaatcaaatc gcatcgtgac acaaagttca 360ccagtaacaa cgcagttccc gcatgcagca ggtattggtc ttgcagcgaa aatggccggt 420gatgagattg caatttatgc ttcaacgggt gaaggatctt ctaaccaagg agatttccat 480gaaggaatca acttcgcatc tgtacataag ttgccagttg ttttcgtgat tcacaataac 540caatatgcca tttccgttcc agcatcgaaa caatatgctg cagaaaaact atccgaccga 600gcaatcggtt atggtatccc aggggaacgt gtggatggca caaatatggg tgaagtatac 
660gcggcattta aacgtgcagc agatcgtgca agaaacggcg agggccccac tttaattgaa 720acagtttctt accgattcac accgcactct tctgatgatg atgacagcag ttatcgttcc 780agagaagaag tgaacgaagc aaaaggaaaa gatccactga caattttcca aacagaatta 840ctcgaagaag gttacttaac agaagaaaaa atcgctgaaa tcgaaaaaaa tattgcaaaa 900gaagttaacg aagcaaccga ttacgcggaa agtgcagcat acgctgaacc agaatcatct 960ttactttatg tatatgatga agaagcgaat agctga 9969381PRTStreptomyces avermitilis 9Met Thr Val Met Glu Gln Arg Gly Ala Tyr Arg Pro Thr Pro Pro Pro 1 5 10 15 Ala Trp Gln Pro Arg Thr Asp Pro Ala Pro Leu Leu Pro Asp Ala Leu 20 25 30 Pro His Arg Val Leu Gly Thr Glu Ala Ala Ala Glu Ala Asp Pro Leu 35 40 45 Leu Leu Arg Arg Leu Tyr Ala Glu Leu Val Arg Gly Arg Arg Tyr Asn 50 55 60 Thr Gln Ala Thr Ala Leu Thr Lys Gln Gly Arg Leu Ala Val Tyr Pro 65 70 75 80 Ser Ser Thr Gly Gln Glu Ala Cys Glu Val Ala Ala Ala Leu Val Leu 85

90 95 Glu Glu Arg Asp Trp Leu Phe Pro Ser Tyr Arg Asp Thr Leu Ala Ala 100 105 110 Val Ala Arg Gly Leu Asp Pro Val Gln Ala Leu Thr Leu Leu Arg Gly 115 120 125 Asp Trp His Thr Gly Tyr Asp Pro Arg Glu His Arg Ile Ala Pro Leu 130 135 140 Cys Thr Pro Leu Ala Thr Gln Leu Pro His Ala Val Gly Leu Ala His 145 150 155 160 Ala Ala Arg Leu Lys Gly Asp Asp Val Val Ala Leu Ala Leu Val Gly 165 170 175 Asp Gly Gly Thr Ser Glu Gly Asp Phe His Glu Ala Leu Asn Phe Ala 180 185 190 Ala Val Trp Gln Ala Pro Val Val Phe Leu Val Gln Asn Asn Gly Phe 195 200 205 Ala Ile Ser Val Pro Leu Ala Lys Gln Thr Ala Ala Pro Ser Leu Ala 210 215 220 His Lys Ala Val Gly Tyr Gly Met Pro Gly Arg Leu Val Asp Gly Asn 225 230 235 240 Asp Ala Ala Ala Val His Glu Val Leu Ser Asp Ala Val Ala His Ala 245 250 255 Arg Ala Gly Gly Gly Pro Thr Leu Val Glu Ala Val Thr Tyr Arg Ile 260 265 270 Asp Ala His Thr Asn Ala Asp Asp Ala Thr Arg Tyr Arg Gly Asp Ser 275 280 285 Glu Val Glu Ala Trp Arg Ala His Asp Pro Ile Ala Leu Leu Glu His 290 295 300 Glu Leu Thr Glu Arg Gly Leu Leu Asp Glu Asp Gly Ile Arg Ala Ala 305 310 315 320 Arg Glu Asp Ala Glu Ala Met Ala Ala Asp Leu Arg Ala Arg Met Asn 325 330 335 Gln Asp Pro Ala Leu Asp Pro Met Asp Leu Phe Ala His Val Tyr Ala 340 345 350 Glu Pro Thr Pro Gln Leu Arg Glu Gln Glu Ala Gln Leu Arg Ala Glu 355 360 365 Leu Ala Ala Glu Ala Asp Gly Pro Gln Gly Val Gly Arg 370 375 380 101146DNAStreptomyces avermitilis 10atgacggtca tggagcagcg gggcgcttac cggcccacac cgccgcccgc ctggcagccc 60cgcaccgacc ccgcgccact gctgcccgac gcgctgcccc accgcgtcct gggcaccgag 120gcggccgcgg aggccgaccc gctactgctg cgccgcctgt acgcggagct ggtgcgcggc 180cgccgctaca acacgcaggc cacggctctc accaagcagg gccggctcgc cgtctacccg 240tcgagcacgg gccaggaggc ctgcgaggtc gccgccgcgc tcgtgctgga ggagcgcgac 300tggctcttcc ccagctaccg ggacaccctc gccgccgtcg cccgcggcct cgatcccgtc 360caggcgctca ccctcctgcg cggcgactgg cacaccgggt acgacccccg tgagcaccgc 420atcgcgcccc tgtgcacccc tctcgcgacc cagctcccgc acgccgtcgg cctcgcgcac 480gccgcccgcc tcaagggcga cgacgtggtc 
gcgctcgccc tggtcggcga cggcggcacc 540agcgagggcg acttccacga ggcactgaac ttcgccgccg tctggcaggc gccggtcgtc 600ttcctcgtgc agaacaacgg cttcgccatc tccgtcccgc tcgccaagca gaccgccgcc 660ccgtcgctgg cccacaaggc cgtcggctac gggatgccgg gccgcctggt cgacggcaac 720gacgcggcgg ccgtgcacga ggtcctcagc gacgccgtgg cccacgcgcg cgcgggaggg 780gggccgacgc tcgtggaggc ggtgacctac cgcatcgacg cccacaccaa cgccgacgac 840gcgacgcgct accgggggga ctccgaggtg gaggcctggc gcgcgcacga cccgatcgcg 900ctcctggagc acgagttgac cgaacgcggg ctgctcgacg aggacggcat ccgggccgcc 960cgcgaggacg ccgaggcgat ggccgcggac ctgcgcgcac gcatgaacca ggatccggcc 1020ctggacccca tggacctgtt cgcccatgtg tatgccgagc ccacccccca gctgcgggag 1080caggaagccc agttgcgggc cgagctggca gcggaggccg acgggcccca aggagtcggc 1140cgatga 114611384PRTMicrococcus luteus 11Met Thr Leu Val Asp His Thr Arg Pro Thr Gly Gly Gln Ser Ala Gly 1 5 10 15 Ser Pro Pro Pro Ala Gly Pro Ala Glu Ala Val Met Leu Gln Val Leu 20 25 30 Asp Thr Glu Gly Arg Arg Arg Pro Gln Pro Glu Leu Asp Pro Trp Ile 35 40 45 Glu Asp Val Asp Ala Ala Ala Leu Ala Ala Leu Tyr Arg Gln Met Ala 50 55 60 Val Val Arg Arg Leu Asp Val Glu Ala Thr His Leu Gln Arg Gln Gly 65 70 75 80 Glu Leu Ala Leu Trp Pro Pro Leu Leu Gly Gln Glu Ala Ala Gln Val 85 90 95 Gly Ser Ala Val Ala Leu Arg Pro Asp Asp Phe Val Phe Pro Ser Tyr 100 105 110 Arg Glu Asn Gly Val Ala Leu Leu Arg Gly Val Pro Ala Leu Asp Leu 115 120 125 Leu Arg Val Trp Arg Gly Ser Thr Phe Ser Ser Trp Asp Pro Asn Glu 130 135 140 Thr Arg Val Ala Thr Gln Gln Ile Ile Ile Gly Ala Gln Ala Leu His 145 150 155 160 Ala Val Gly Tyr Ala Met Gly Val Gln Arg Asp Gln Ala Asp Val Ala 165 170 175 Thr Ile Val Tyr Phe Gly Asp Gly Ala Thr Ser Gln Gly Asp Val Asn 180 185 190 Glu Ala Met Val Phe Ser Ala Ser Tyr Gln Ala Pro Val Val Phe Phe 195 200 205 Cys Gln Asn Asn His Trp Ala Ile Ser Glu Pro Val Arg Leu Gln Thr 210 215 220 Arg Arg Ser Ile Ala Asp Arg Pro Trp Gly Phe Gly Ile Pro Ser Met 225 230 235 240 Arg Val Asp Gly Asn Asp Val Leu Ala Val Leu Ala Ala Thr Arg Ala 245 250 255 Ala Val Glu Arg Ala 
Ala Asp Gly Gly Gly Pro Thr Phe Val Glu Ala 260 265 270 Val Thr Tyr Arg Met Gly Pro His Thr Thr Ala Asp Asp Pro Thr Arg 275 280 285 Tyr Arg Asp Asp Ala Glu Leu Glu Ala Trp Lys Ala Arg Asp Pro Leu 290 295 300 Thr Arg Val Glu Ala His Leu Arg Thr Leu Asp Val Asp Val Asp Ala 305 310 315 320 Val Leu Ala Gln Ala Gln Ala Glu Ala Asp Glu Leu Ala Ala Glu Val 325 330 335 Arg Arg Ala Leu Glu Ala Leu Glu Glu Asp Gly Ala Asp Arg Leu Phe 340 345 350 Asp Glu Ile Tyr Ala Glu Pro His Gln Glu Leu Glu Arg Gln Arg Arg 355 360 365 Glu His Ala Leu Tyr Leu Gln Gln Phe Asp Asp Glu Glu Ala Gly Ala 370 375 380 121155DNAMicrococcus luteus 12gtgaccctcg tggaccacac ccgtcccacc ggcggacagt ccgccggctc tccgcccccg 60gcgggcccgg ccgaggccgt gatgctccag gtgctcgaca cggagggccg ccgccgtccg 120cagccggagc tcgacccgtg gatcgaggac gtcgacgccg ccgccctcgc cgcgctgtac 180cgccagatgg ccgtggtccg tcgcctcgac gtcgaggcca cgcacctgca gcgtcagggc 240gagctggccc tgtggccgcc gctgctgggc caggaggccg cccaggtggg ctccgccgtc 300gcgctgcgcc cggacgactt cgtcttcccg tcctaccgcg agaacggcgt ggccctgctg 360cgcggcgtcc ccgcgctgga cctgctgcgg gtgtggcgcg gctccacgtt ctcgagctgg 420gacccgaacg agacgcgggt ggccacccag cagatcatca tcggcgcgca ggccctgcac 480gccgtcggct acgcgatggg cgtccagcgg gaccaggcgg acgtcgccac gatcgtctac 540ttcggcgacg gcgccacgag ccagggcgac gtcaacgagg ccatggtctt cagcgcctcc 600taccaggcgc ccgtggtgtt cttctgccag aacaaccact gggccatctc cgagcccgtg 660cgcctgcaga cccgccgcag catcgcggac cgcccgtggg gcttcggcat cccgtcgatg 720cgcgtggacg gcaacgacgt cctggccgtg ctcgccgcaa cccgcgccgc cgtcgagcgc 780gcggccgacg ggggcggccc cacgttcgtc gaggccgtca cctaccgcat gggtccacac 840accaccgcgg acgaccccac ccgctaccgg gacgacgccg agctcgaggc ctggaaggcc 900cgtgacccgc tgacccgcgt ggaggcgcac ctgcgcaccc tcgacgtgga cgtggacgcc 960gtgcttgcac aggcccaggc cgaggccgac gagctggcag cggaggtccg ccgtgccctc 1020gaggcgctcg aggaggacgg cgcggacagg ctcttcgacg agatctacgc ggagccccac 1080caggagctcg agcggcagcg ccgcgagcac gccctctacc tgcagcagtt cgacgacgag 1140gaggcgggcg cgtga 115513330PRTStaphylococcus aureus 13Met Ile Asp 
Tyr Lys Ser Leu Gly Leu Ser Glu Glu Asp Leu Lys Val 1 5 10 15 Ile Tyr Lys Trp Met Asp Leu Gly Arg Lys Ile Asp Glu Arg Leu Trp 20 25 30 Leu Leu Asn Arg Ala Gly Lys Ile Pro Phe Val Val Ser Gly Gln Gly 35 40 45 Gln Glu Ala Thr Gln Ile Gly Met Ala Tyr Ala Leu Glu Glu Gly Asp 50 55 60 Ile Thr Ala Pro Tyr Tyr Arg Asp Leu Ala Phe Val Thr Tyr Met Gly 65 70 75 80 Ile Ser Ala Tyr Asp Thr Phe Leu Ser Ala Phe Gly Lys Lys Asp Asp 85 90 95 Val Asn Ser Gly Gly Lys Gln Met Pro Ser His Phe Ser Ser Arg Ala 100 105 110 Lys Asn Ile Leu Ser Gln Ser Ser Pro Val Ala Thr Gln Ile Pro His 115 120 125 Ala Val Gly Ala Ala Leu Ala Leu Lys Met Asp Gly Lys Lys Lys Ile 130 135 140 Ala Thr Ala Thr Val Gly Glu Gly Ser Ser Asn Gln Gly Asp Phe His 145 150 155 160 Glu Gly Leu Asn Phe Ala Gly Val His Lys Leu Pro Phe Val Cys Val 165 170 175 Ile Ile Asn Asn Lys Tyr Ala Ile Ser Val Pro Asp Ser Leu Gln Tyr 180 185 190 Ala Ala Glu Lys Leu Ser Asp Arg Ala Leu Gly Tyr Gly Ile His Gly 195 200 205 Glu Gln Val Asp Gly Asn Asp Pro Leu Ala Met Tyr Lys Ala Met Lys 210 215 220 Glu Ala Arg Asp Arg Ala Ile Ser Gly Gln Gly Ser Thr Leu Ile Glu 225 230 235 240 Ala Val Thr Ser Arg Met Thr Ala His Ser Ser Asp Asp Asp Asp Gln 245 250 255 Tyr Arg Thr Lys Glu Glu Arg Glu Ala Leu Lys Lys Ala Asp Cys Asn 260 265 270 Glu Lys Phe Lys Lys Glu Leu Leu Ser Ala Gly Ile Ile Asp Asp Ala 275 280 285 Trp Leu Ala Glu Ile Glu Ala Glu His Lys Asp Ile Ile Asn Lys Ala 290 295 300 Thr Lys Ala Ala Glu Asp Ala Pro Tyr Pro Ser Val Glu Glu Ala Tyr 305 310 315 320 Ala Phe Val Tyr Glu Glu Gly Ser Leu Asn 325 330 14993DNAStaphylococcus aureus 14ttagttaaga ctcccttctt cgtacacaaa tgcataggct tcttcgacac ttggatatgg 60cgcgtcttca gcagcctttg tcgctttatt gatgatgtct ttatgctccg cttctatttc 120tgccaaccaa gcatcatcga taatgccagc tgaaagcaac tcttttttga acttttcatt 180gcagtcagct tttttaagcg cttcacgctc ttctttcgta cgatattggt cgtcatcatc 240tgatgaatga gctgtcatac gacttgttac tgcttcaatc aaagttgaac cttgaccaga 300aatagctcga tctcttgctt ctttcatcgc tttatacatt gctaatggat cattaccatc 
360tacttgttca ccatgtatac cgtaaccaag tgctctatcc gataattttt cagctgcgta 420ttgtaatgaa tcaggtactg aaattgcata tttattattt ataatgacac atacaaaagg 480aagtttgtgt acacccgcga agtttaaacc ttcatggaag tcaccttggt ttgagctacc 540ttcaccaaca gttgctgttg caattttctt cttaccatcc atttttaaag ctaaagcagc 600accaacagca tggggtattt gagttgctac cggtgaactt tgagacaaaa tattcttagc 660tctactacta aagtgtgatg gcatttgttt tccaccagag ttaacatcgt ctttctttcc 720aaacgctgat aaaaacgtat catacgctga gatacccata taagtaacga aagctagatc 780tctataataa ggcgctgtaa tatcaccttc ttctaatgcg tatgccatcc caatctgagt 840tgcttcttgt ccttgaccac ttacaacaaa tggaatttta cctgcacggt tcaataacca 900cagtctttca tctatttttc tacctaaatc catccattta tatattactt ttaggtcttc 960ttcgctaagg cctaatgatt tataatcaat cat 99315331PRTStreptococcus mutans 15Met Ala Arg Lys Ile Leu Glu Val Ile Ile Ala Met Leu Ser Lys Lys 1 5 10 15 Gln Tyr Leu Asp Met Phe Leu Lys Met Gln Arg Ile Arg Asp Val Asp 20 25 30 Thr Lys Leu Asn Lys Leu Val Arg Arg Gly Phe Val Gln Gly Met Thr 35 40 45 His Phe Ser Val Gly Glu Glu Ala Ala Ser Val Gly Ala Ile Gln Gly 50 55 60 Leu Thr Asp Gln Asp Ile Ile Phe Ser Asn His Arg Gly His Gly Gln 65 70 75 80 Thr Ile Ala Lys Gly Ile Asp Ile Pro Ala Met Phe Ala Glu Leu Ala 85 90 95 Gly Lys Ala Thr Gly Ser Ser Lys Gly Arg Gly Gly Ser Met His Leu 100 105 110 Ala Asn Leu Glu Lys Gly Asn Tyr Gly Thr Asn Gly Ile Val Gly Gly 115 120 125 Gly Tyr Ala Leu Ala Val Gly Ala Ala Leu Thr Gln Gln Tyr Asp Asn 130 135 140 Thr Gly Asn Ile Val Val Ala Phe Ser Gly Asp Ser Ala Thr Asn Glu 145 150 155 160 Gly Ser Phe His Glu Ser Val Asn Leu Ala Ala Val Trp Asn Leu Pro 165 170 175 Val Ile Phe Phe Ile Ile Asn Asn Arg Tyr Gly Ile Ser Thr Asp Ile 180 185 190 Asn Tyr Ser Thr Lys Ile Ser His Leu Tyr Leu Arg Ala Asp Ala Tyr 195 200 205 Gly Ile Pro Gly His Tyr Val Glu Asp Gly Asn Asp Val Ile Ala Val 210 215 220 Tyr Glu Lys Met Gln Glu Val Ile Asp Tyr Val Arg Ser Gly Asn Gly 225 230 235 240 Pro Ala Leu Val Glu Val Glu Ser Tyr Arg Trp Phe Gly His Ser Thr 245 250 255 Ala Asp Ala Gly Ala Tyr 
Arg Thr Lys Glu Glu Val Asp Ala Trp Lys 260 265 270 Ala Lys Asp Pro Leu Lys Lys Tyr Arg Thr Tyr Leu Thr Glu Asn Lys 275 280 285 Ile Ala Thr Asp Glu Glu Leu Asp Met Ile Glu Lys Glu Val Ala Gln 290 295 300 Glu Ile Glu Asp Ala Val Lys Phe Ala Gln Asp Ser Pro Glu Pro Glu 305 310 315 320 Leu Ser Val Ala Phe Glu Asp Val Trp Val Asp 325 330 16996DNAStreptococcus mutans 16atggcaagaa aaattttgga ggtcattata gcaatgttat ctaaaaaaca atatttggat 60atgtttttaa aaatgcagcg tatccgtgat gtcgatacaa aactcaataa attagttcgt 120cgtggtttcg tacaaggtat gacacacttt tcagtaggag aagaggcggc ttcggttggt 180gcgattcaag gcttgactga tcaggatatt atcttttcaa atcaccgtgg acatggtcaa 240accattgcaa aagggattga cattcctgct atgtttgcag aattagccgg taaggcaacg 300ggttcttcaa aaggtcgtgg tggttctatg cacttggcaa atcttgaaaa aggaaactat 360gggaccaatg gtattgttgg cgggggttat gccttagcag tcggtgctgc tttgacacag 420caatatgaca atacgggaaa tattgttgtc gccttttcag gagactcggc aactaatgaa 480ggctctttcc atgagtctgt taatttggca gctgtctgga atttaccggt tatcttcttt 540attattaata atcgttatgg tatctcaaca gatatcaatt attctactaa gatttcacat 600ctttatttac gtgctgatgc ttatggtatt cctggacatt atgttgaaga tggtaatgat 660gtcattgcag tttatgaaaa aatgcaggaa gtcattgatt atgtgcgttc aggaaatggg 720ccagctcttg ttgaagtgga atcttatcgt tggttcggac attctactgc tgatgcagga 780gcttaccgta caaaagaaga agtagatgct tggaaagcta aagatcctct caagaaatac 840cgcacttatc taacagaaaa taagattgca acagatgagg aacttgatat gattgaaaaa 900gaagtcgcac aggaaattga ggatgcagtg aaatttgccc aagatagccc tgaaccagag 960ctttctgtag cttttgaaga tgtttgggta gattag 9961716PRTArtificial sequenceSynthetic polypeptide 17Xaa Xaa Xaa Gly Xaa Glu Ala Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 1 5 10 15 188PRTArtificial sequenceSynthetic polypeptide 18Asp Xaa Xaa Xaa Pro Xaa Tyr Arg 1 5 1911PRTArtificial sequenceSynthetic polypeptide 19Xaa Gln Xaa Xaa Xaa Ala Xaa Gly Xaa Ala Xaa 1 5 10 2020PRTArtificial sequenceSynthetic polypeptide 20Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gly Xaa Gly Xaa Xaa Xaa 1 5 10 15 Xaa Xaa Asp Xaa 20 2121PRTArtificial 
sequenceSynthetic polypeptide 21Phe Xaa Xaa Val Xaa Xaa Xaa Pro Val Xaa Xaa Xaa Xaa Xaa Asn Asn 1 5 10 15 Xaa Xaa Ala Ile Ser 20 2216PRTArtificial sequenceSynthetic polypeptide 22Xaa Xaa Xaa Gly Xaa Xaa Xaa Xaa Xaa Xaa Xaa Val Asp Gly Asn Asp 1 5 10 15 2331PRTArtificial sequenceSynthetic polypeptide 23Xaa Ala Arg Xaa Gly Xaa Gly Pro Xaa Leu Xaa Glu Xaa Xaa Xaa Tyr 1 5 10 15 Arg Xaa Xaa Xaa His Xaa Xaa Xaa Asp Asp Xaa Xaa Xaa Tyr Arg 20 25 30 24327PRTBacillus subtillis 24Met Ser Val Met Ser Tyr Ile Asp Ala Ile Asn Leu Ala Met Lys Glu 1 5 10 15 Glu Met Glu Arg Asp Ser Arg Val Phe Val Leu Gly Glu Asp Val Gly 20 25 30 Arg Lys Gly Gly Val Phe Lys Ala Thr Ala Gly Leu Tyr Glu Gln Phe 35 40 45 Gly Glu Glu Arg Val Met Asp Thr Pro Leu Ala Glu Ser Ala Ile Ala 50 55 60 Gly Val Gly Ile Gly Ala Ala Met Tyr Gly Met Arg Pro Ile Ala Glu 65 70 75 80 Met Gln Phe Ala Asp Phe Ile Met Pro Ala Val

Asn Gln Ile Ile Ser 85 90 95 Glu Ala Ala Lys Ile Arg Tyr Arg Ser Asn Asn Asp Trp Ser Cys Pro 100 105 110 Ile Val Val Arg Ala Pro Tyr Gly Gly Gly Val His Gly Ala Leu Tyr 115 120 125 His Ser Gln Ser Val Glu Ala Ile Phe Ala Asn Gln Pro Gly Leu Lys 130 135 140 Ile Val Met Pro Ser Thr Pro Tyr Asp Ala Lys Gly Leu Leu Lys Ala 145 150 155 160 Ala Val Arg Asp Glu Asp Pro Val Leu Phe Phe Glu His Lys Arg Ala 165 170 175 Tyr Arg Leu Ile Lys Gly Glu Val Pro Ala Asp Asp Tyr Val Leu Pro 180 185 190 Ile Gly Lys Ala Asp Val Lys Arg Glu Gly Asp Asp Ile Thr Val Ile 195 200 205 Thr Tyr Gly Leu Cys Val His Phe Ala Leu Gln Ala Ala Glu Arg Leu 210 215 220 Glu Lys Asp Gly Ile Ser Ala His Val Val Asp Leu Arg Thr Val Tyr 225 230 235 240 Pro Leu Asp Lys Glu Ala Ile Ile Glu Ala Ala Ser Lys Thr Gly Lys 245 250 255 Val Leu Leu Val Thr Glu Asp Thr Lys Glu Gly Ser Ile Met Ser Glu 260 265 270 Val Ala Ala Ile Ile Ser Glu His Cys Leu Phe Asp Leu Asp Ala Pro 275 280 285 Ile Lys Arg Leu Ala Gly Pro Asp Ile Pro Ala Met Pro Tyr Ala Pro 290 295 300 Thr Met Glu Lys Tyr Phe Met Val Asn Pro Asp Lys Val Glu Ala Ala 305 310 315 320 Met Arg Glu Leu Ala Glu Phe 325 25984DNABacillus subtillis 25ttaaaactcc gctaattctc tcatcgccgc ttccacttta tcagggttga ccataaagta 60tttttccatt gtcggcgcat aaggcatagc cggaatatca ggacctgcaa gccgtttgat 120cggcgcgtct aagtcgaaca gacaatgctc ggatataatt gcggctactt cgctcatgat 180gctgccttct tttgtatctt ctgtgaccaa aagaaccttt ccagttttgg acgcagcttc 240gatgatggct tctttatcaa gcgggtaaac tgttcttaaa tccaccacat gcgctgaaat 300gccatctttt tcgagacgtt ctgcagcttg taaggcgaag tggacacaca ggccgtatgt 360gatcactgtg atgtcgtcgc cttccctttt tacgtccgcc ttgccgattg gcaggacata 420atcatcagcc ggaacctcgc cctttatcag acggtatgcc cgcttgtgct caaaaaacag 480cacggggtct tcgtcacgaa ctgcggcttt taagagccct ttcgcgtcat atggtgttga 540tggcatgaca attttcagtc cgggctggtt ggcgaaaatt gcttcgactg attgagaatg 600atacagggct ccgtgcacgc ctccgccgta tggcgctctg acgacaatcg gacagctcca 660gtcattgttg ctgcggtagc ggattttagc cgcttcagaa ataatttggt tgactgccgg 
720cataatgaaa tcagcaaact gcatttcagc aatcggtctc attccgtaca ttgccgctcc 780gataccgact cctgcgattg cagattcagc aagcggcgta tccataacgc gctcttcccc 840aaattgttca tagagtcccg ctgtcgcttt aaacacaccg ccttttcttc ctacatcttc 900cccaaggacg aaaacgcgag aatctcgttc catttcttct ttcatcgcca aattgattgc 960atcaatatat gacattactg acat 98426325PRTStreptomyces avermitilis 26Met Ala Glu Lys Met Ala Ile Ala Lys Ala Ile Asn Glu Ser Leu Arg 1 5 10 15 Lys Ala Leu Glu Ser Asp Pro Lys Val Leu Ile Met Gly Glu Asp Val 20 25 30 Gly Lys Leu Gly Gly Val Phe Arg Val Thr Asp Gly Leu Gln Lys Asp 35 40 45 Phe Gly Glu Glu Arg Val Ile Asp Thr Pro Leu Ala Glu Ser Gly Ile 50 55 60 Val Gly Thr Ala Ile Gly Leu Ala Leu Arg Gly Tyr Arg Pro Val Val 65 70 75 80 Glu Ile Gln Phe Asp Gly Phe Val Phe Pro Ala Tyr Asp Gln Ile Val 85 90 95 Thr Gln Leu Ala Lys Met His Ala Arg Ala Leu Gly Lys Ile Lys Leu 100 105 110 Pro Val Val Val Arg Ile Pro Tyr Gly Gly Gly Ile Gly Ala Val Glu 115 120 125 His His Ser Glu Ser Pro Glu Ala Leu Phe Ala His Val Ala Gly Leu 130 135 140 Lys Val Val Ser Pro Ser Asn Ala Ser Asp Ala Tyr Trp Met Met Gln 145 150 155 160 Gln Ala Ile Gln Ser Asp Asp Pro Val Ile Phe Phe Glu Pro Lys Arg 165 170 175 Arg Tyr Trp Asp Lys Gly Glu Val Asn Val Glu Ala Ile Pro Asp Pro 180 185 190 Leu His Lys Ala Arg Val Val Arg Glu Gly Thr Asp Leu Thr Leu Ala 195 200 205 Ala Tyr Gly Pro Met Val Lys Val Cys Gln Glu Ala Ala Ala Ala Ala 210 215 220 Glu Glu Glu Gly Lys Ser Leu Glu Val Val Asp Leu Arg Ser Met Ser 225 230 235 240 Pro Ile Asp Phe Asp Ala Val Gln Ala Ser Val Glu Lys Thr Arg Arg 245 250 255 Leu Val Val Val His Glu Ala Pro Val Phe Leu Gly Thr Gly Ala Glu 260 265 270 Ile Ala Ala Arg Ile Thr Glu Arg Cys Phe Tyr His Leu Glu Ala Pro 275 280 285 Val Leu Arg Val Gly Gly Tyr His Ala Pro Tyr Pro Pro Ala Arg Leu 290 295 300 Glu Glu Glu Tyr Leu Pro Gly Leu Asp Arg Val Leu Asp Ala Val Asp 305 310 315 320 Arg Ser Leu Ala Tyr 325 27978DNAStreptomyces avermitilis 27atggccgaga agatggcgat cgccaaggcg atcaacgagt cgctgcgcaa ggccctggag 60tccgacccca 
aggttctgat catgggtgag gacgtcggca agctcggtgg cgtcttccgc 120gtcaccgacg gcctgcagaa ggacttcggc gaggagcggg tcatcgacac cccgctcgcc 180gagtcgggca tcgtcggcac ggcgatcggt ctcgccctgc gcggctaccg cccggtggtg 240gagatccagt tcgacggctt cgtcttcccg gcgtacgacc agatcgtcac gcagctcgcg 300aagatgcacg cgcgggcgct cggcaagatc aagctccccg ttgtcgtccg catcccgtac 360ggcggcggca tcggcgccgt cgagcaccac tccgagtccc ccgaggcgct cttcgcgcac 420gtggcgggcc tcaaggtggt ctccccgtcc aacgcgtcgg acgcgtactg gatgatgcag 480caggccatcc agagcgacga cccggtgatc ttcttcgagc cgaagcggcg ctactgggac 540aagggcgagg tcaacgtcga ggcgatcccc gacccgctgc acaaggcccg tgtggtgcgt 600gagggcaccg acctgacgct cgccgcgtac ggcccgatgg tgaaggtctg ccaggaggcc 660gcggccgccg ccgaggagga gggcaagtcc ctggaggtcg tcgacctgcg ctccatgtcg 720ccgatcgact tcgacgccgt ccaggcctcc gtcgagaaga cccgccgtct ggtcgtggtg 780cacgaggcgc cggtgttcct gggcacgggc gcggagatcg ccgcccgcat cacggagcgc 840tgcttctacc acctggaggc acccgtgctg agggtcggcg gctaccacgc cccgtatccg 900ccggcgcgtc tggaagagga gtaccttccg ggccttgacc gggtgctcga tgccgtcgac 960cgctcgctgg cgtactga 97828410PRTPseudomonas putida 28Met Asn Glu Tyr Ala Pro Leu Arg Leu His Val Pro Glu Pro Thr Gly 1 5 10 15 Arg Pro Gly Cys Gln Thr Asp Phe Ser Tyr Leu Arg Leu Asn Asp Ala 20 25 30 Gly Gln Ala Arg Lys Pro Ala Ile Asp Val Asp Ala Ala Asp Thr Ala 35 40 45 Asp Leu Ser Tyr Ser Leu Val Arg Val Leu Asp Glu Gln Gly Asp Ala 50 55 60 Gln Gly Pro Trp Ala Glu Asp Ile Asp Pro Gln Ile Leu Arg Gln Gly 65 70 75 80 Met Arg Ala Met Leu Lys Thr Arg Ile Phe Asp Ser Arg Met Val Val 85 90 95 Ala Gln Arg Gln Lys Lys Met Ser Phe Tyr Met Gln Ser Leu Gly Glu 100 105 110 Glu Ala Ile Gly Ser Gly Gln Ala Leu Ala Leu Asn Arg Thr Asp Met 115 120 125 Cys Phe Pro Thr Tyr Arg Gln Gln Ser Ile Leu Met Ala Arg Asp Val 130 135 140 Ser Leu Val Glu Met Ile Cys Gln Leu Leu Ser Asn Glu Arg Asp Pro 145 150 155 160 Leu Lys Gly Arg Gln Leu Pro Ile Met Tyr Ser Val Arg Glu Ala Gly 165 170 175 Phe Phe Thr Ile Ser Gly Asn Leu Ala Thr Gln Phe Val Gln Ala Val 180 185 190 Gly Trp Ala Met Ala 
Ser Ala Ile Lys Gly Asp Thr Lys Ile Ala Ser 195 200 205 Ala Trp Ile Gly Asp Gly Ala Thr Ala Glu Ser Asp Phe His Thr Ala 210 215 220 Leu Thr Phe Ala His Val Tyr Arg Ala Pro Val Ile Leu Asn Val Val 225 230 235 240 Asn Asn Gln Trp Ala Ile Ser Thr Phe Gln Ala Ile Ala Gly Gly Glu 245 250 255 Ser Thr Thr Phe Ala Gly Arg Gly Val Gly Cys Gly Ile Ala Ser Leu 260 265 270 Arg Val Asp Gly Asn Asp Phe Val Ala Val Tyr Ala Ala Ser Arg Trp 275 280 285 Ala Ala Glu Arg Ala Arg Arg Gly Leu Gly Pro Ser Leu Ile Glu Trp 290 295 300 Val Thr Tyr Arg Ala Gly Pro His Ser Thr Ser Asp Asp Pro Ser Lys 305 310 315 320 Tyr Arg Pro Ala Asp Asp Trp Ser His Phe Pro Leu Gly Asp Pro Ile 325 330 335 Ala Arg Leu Lys Gln His Leu Ile Lys Ile Gly His Trp Ser Glu Glu 340 345 350 Glu His Gln Ala Val Thr Ala Glu Leu Glu Ala Ala Val Ile Ala Ala 355 360 365 Gln Lys Glu Ala Glu Gln Tyr Gly Thr Leu Ala Asn Gly His Ile Pro 370 375 380 Ser Ala Ala Ser Met Phe Glu Asp Val Tyr Lys Glu Met Pro Glu His 385 390 395 400 Leu Arg Arg Gln Arg Gln Glu Leu Gly Val 405 410 291233DNAPseudomonas putida 29tcaaaccccc agttcctggc gttgacggcg caggtgttcg ggcatctcct tgtacacatc 60ctcgaacatc gaggcggcgc tcgggatgtg cccgttagcc agggtgccgt actgctcggc 120ttctttctgt gcggcaatca ccgcagcttc gagctcggcc gtgacggctt ggtgttcttc 180ttcggaccag tggccgatct tgatcaggtg ctgcttcagg cgggcgatcg ggtcacccag 240cgggaagtgg ctccagtcat cggcagggcg gtacttggag gggtcgtccg acgtcgagtg 300cgggccggca cggtaggtga cccactcgat caggcttggg cccaggccgc ggcgggcgcg 360ctcggcagcc cagcgcgagg cggcgtacac ggcgacgaag tcgttgccgt caacccgcag 420cgaggcaatg ccgcagccca cgccacggcc ggcgaaggtg gtcgactcgc caccggcgat 480ggcctggaag gtagaaatcg cccactggtt gttgaccaca ttgaggatca ccggggcgcg 540gtaaacgtgg gcaaaggtga gggcggtgtg gaagtccgac tcggcggtgg ctccgtcacc 600gatccacgcc gaagcaatct tggtatcgcc cttgatcgcc gaggccatgg cccagccgac 660tgcctgcacg aactgggtcg ccaggttgcc gctgatggtg aagaagccgg cttcgcgcac 720cgagtacatg atcggcaact ggcggccctt gagggggtcg cgctcgttgg acagcagttg 780gcagatcatc tcgaccagcg atacgtcgcg ggccatcagg 
atgctttgct ggcggtaggt 840cgggaagcac atgtcggtgc ggttcagcgc cagcgcctgg ccactgccga tggcttcttc 900gcccaggctt tgcatgtaga aggacatctt cttctggcgc tgggcaacca ccatgcggct 960gtcgaagatc cgcgtcttga gcatggcgcg catgccttga cgaaggatct gtgggtcgat 1020gtcttcggcc caggggcctt gcgcatcacc ttgctcgtcg agcacgcgga ccaggctgta 1080ggacaggtcg gcagtgtcgg cagcatcgac atcgatcgcg ggtttacggg cttgacctgc 1140atcgttgagg cgcaggtagg aaaaatcggt ctggcagcct ggccggccgg tgggctcggg 1200cacatgcaaa cgcagggggg cgtactcgtt cat 123330327PRTListeria monocytogenes 30Met Pro Val Ile Ser Tyr Ile Asp Ala Ile Thr Met Ala Leu Lys Glu 1 5 10 15 Glu Met Glu Arg Asp Asp Lys Val Phe Ile Leu Gly Glu Asp Val Gly 20 25 30 Lys Lys Gly Gly Val Phe Lys Ala Thr Ala Gly Leu Tyr Asp Glu Phe 35 40 45 Gly Glu Asp Arg Val Leu Asp Thr Pro Leu Ala Glu Ser Ala Ile Ala 50 55 60 Gly Val Gly Ile Gly Ala Ala Met Tyr Gly Tyr Arg Pro Val Ala Glu 65 70 75 80 Met Gln Phe Ala Asp Phe Ile Met Pro Ala Val Asn Gln Ile Ile Ser 85 90 95 Glu Ala Ala Arg Ile Arg Tyr Arg Ser Asn Asn Asp Trp Ser Cys Pro 100 105 110 Met Val Ile Arg Ala Pro Phe Gly Gly Gly Val His Gly Ala Leu Tyr 115 120 125 His Ser Gln Ser Val Glu Lys Val Phe Phe Gly Gln Pro Gly Leu Lys 130 135 140 Ile Val Val Pro Ser Ser Pro Tyr Asp Ala Lys Gly Leu Leu Lys Ala 145 150 155 160 Ala Ile Arg Asp Asn Asp Pro Val Leu Phe Phe Glu His Lys Arg Ala 165 170 175 Tyr Arg Leu Leu Lys Gly Glu Val Pro Glu Thr Asp Tyr Ile Val Pro 180 185 190 Ile Gly Glu Ala Asn Val Val Arg Glu Gly Asp Asp Ile Thr Val Ile 195 200 205 Thr Tyr Gly Leu Ala Val Gln Phe Ala Gln Gln Ala Ala Glu Arg Leu 210 215 220 Ala Ala Glu Gly Val Glu Ala His Ile Leu Asp Leu Arg Thr Ile Tyr 225 230 235 240 Pro Leu Asp Gln Glu Ala Ile Ile Glu Ala Thr Lys Lys Thr Gly Lys 245 250 255 Val Leu Leu Val Thr Glu Asp Asn Lys Gln Gly Ser Ile Ile Ser Glu 260 265 270 Val Ala Ala Ile Ile Ser Glu His Cys Leu Phe Asp Leu Asp Ala Pro 275 280 285 Ile Ala Arg Leu Ala Gly Pro Asp Thr Pro Ala Met Pro Phe Ala Pro 290 295 300 Thr Met Glu Lys His Phe Met Ile Asn Pro Asp 
Lys Val Ala Asp Ala 305 310 315 320 Met Lys Glu Leu Ala Glu Phe 325 31984DNAListeria monocytogenes 31atgccagtca tttcatatat tgatgcaata accatggcgc ttaaagaaga aatggagcgc 60gatgataaag tatttatttt aggagaagat gttgggaaaa aaggtggcgt atttaaagcg 120actgctggtc tatatgacga atttggtgaa gacagagtac ttgatacacc acttgctgaa 180tctgccattg ccggagttgg aattggcgcg gcgatgtatg gctaccgccc agttgcagaa 240atgcaatttg ctgactttat tatgccagct gtcaaccaaa tcatttcaga agctgccaga 300attcggtacc gttctaataa cgattggtct tgtccaatgg ttattcgcgc accttttggc 360ggcggggtac acggggcact ttaccattca caatctgttg aaaaagtgtt tttcggacaa 420cctggtttga aaatcgttgt tccttcttca ccatatgatg caaaagggct tttaaaagcg 480gcgattcgcg ataatgatcc agtgcttttc tttgagcata aacgtgcgta ccgcttgcta 540aaaggcgaag tgccagaaac tgattatatc gttccaatcg gcgaagcaaa tgttgttcgt 600gaaggtgatg atattacagt aattacttac ggacttgcgg ttcaatttgc ccaacaagca 660gcagaacgtt tagcagcgga aggcgtagaa gcacatattc ttgatttacg gacaatctat 720ccactagacc aagaagcaat tattgaagca acgaaaaaaa caggtaaagt acttcttgta 780acggaagata acaaacaagg aagtattatc agtgaagtgg cagcaatcat ttcggagcat 840tgtttatttg acttagacgc accgattgct agactcgcag gacctgatac cccagcgatg 900ccttttgctc caacaatgga aaaacatttt atgatcaatc cagataaagt ggcggatgca 960atgaaagaat tagcggaatt ttag 98432334PRTStreptomyces avermitilis 32Met Thr Thr Val Ala Leu Lys Pro Ala Thr Met Ala Gln Ala Leu Thr 1 5 10 15 Arg Ala Leu Arg Asp Ala Met Ala Ala Asp Pro Ala Val His Val Met 20 25 30 Gly Glu Asp Val Gly Thr Leu Gly Gly Val Phe Arg Val Thr Asp Gly 35 40 45 Leu Ala Lys Glu Phe Gly Glu Asp Arg Cys Thr Asp Thr Pro Leu Ala 50 55 60 Glu Ala Gly Ile Leu Gly Thr Ala Val Gly Met Ala Met Tyr Gly Leu 65 70 75 80 Arg Pro Val Val Glu Met Gln Phe Asp Ala Phe Ala Tyr Pro Ala Phe 85 90 95 Glu Gln Leu Ile Ser His Val Ala Arg Met Arg Asn Arg Thr Arg Gly 100 105 110 Ala Met Pro Leu Pro Ile Thr Ile Arg Val Pro Tyr Gly Gly Gly Ile 115 120 125 Gly Gly Val Glu His His Ser Asp Ser Ser Glu Ala Tyr Tyr Met Ala 130 135 140 Thr Pro Gly Leu His Val Val Thr Pro Ala Thr Val Ala Asp Ala 
Tyr 145 150 155 160 Gly Leu Leu Arg Ala Ala Ile Ala Ser Asp Asp Pro Val Val Phe Leu 165 170 175 Glu Pro Lys Arg Leu Tyr Trp Ser Lys Asp Ser Trp Asn Pro Asp Glu 180 185 190 Pro Gly Thr Val Glu Pro Ile Gly Arg Ala Val Val Arg Arg Ser Gly 195 200 205 Arg Ser Ala Thr Leu Ile Thr Tyr Gly Pro Ser Leu Pro Val Cys Leu 210 215 220 Glu Ala Ala Glu Ala Ala Arg Ala Glu Gly Trp Asp Leu Glu Val Val 225 230 235 240 Asp Leu Arg Ser Leu Val Pro Phe Asp Asp Glu Thr Val Cys Ala Ser 245 250 255 Val Arg Arg Thr Gly Arg Ala Val Val Val His Glu Ser Gly Gly Tyr 260 265 270 Gly Gly Pro Gly Gly Glu Ile Ala Ala Arg Ile Thr Glu Arg Cys Phe 275 280 285 His His Leu Glu Ala Pro Val Leu Arg Val Ala Gly Phe Asp Ile Pro 290 295 300

Tyr Pro Pro Pro Met Leu Glu Arg His His Leu Pro Gly Val Asp Arg 305 310 315 320 Ile Leu Asp Ala Val Gly Arg Leu Gln Trp Glu Ala Gly Ser 325 330 331005DNAStreptomyces avermitilis 33atgaccaccg ttgccctcaa gccggccacc atggcgcagg cactcacacg cgcgttgcgt 60gacgccatgg ccgccgaccc cgccgtccac gtgatgggcg aggacgtcgg cacgctcggc 120ggggtcttcc gggtcaccga cgggctcgcc aaggagttcg gcgaggaccg ctgcacggac 180acgccgctcg ccgaggcagg catcctcggc acggccgtcg gcatggcgat gtacgggctg 240cggccggtcg tcgagatgca gttcgacgcg ttcgcgtacc cggcgttcga gcagctcatc 300agccatgtcg cgcggatgcg caaccgcacc cgcggggcga tgccgctgcc gatcaccatc 360cgtgtcccct acggcggcgg aatcggcgga gtcgaacacc acagcgactc ctccgaggcg 420tactacatgg cgactccggg gctccatgtc gtcacgcccg ccacggtcgc cgacgcgtac 480gggctgctgc gcgccgccat cgcctccgac gacccggtcg tcttcctgga gcccaagcgg 540ctgtactggt cgaaggactc ctggaacccg gacgagccgg ggaccgttga accgataggc 600cgcgcggtgg tgcggcgctc gggccggagc gccacgctca tcacgtacgg gccttccctg 660cccgtctgcc tggaggcggc cgaggcggcc cgggccgagg gctgggacct cgaagtcgtc 720gatctgcgct ccctggtgcc cttcgacgac gagacggtgt gcgcgtcggt gcgccggacc 780ggacgcgccg tcgtcgtgca cgagtcgggt ggttacggcg gcccgggcgg ggagatcgcc 840gcgcggatca ccgagcgctg cttccaccat ctggaggcgc cggtgctgcg cgtcgccggg 900ttcgacatcc cgtatccgcc gccgatgctg gagcgccatc atctgcccgg tgtcgaccgg 960atcctggacg cggtggggcg gcttcagtgg gaggcgggga gctga 100534355PRTMicrococcus luteus 34Met Ser Glu Arg Met Thr Phe Gly Arg Ala Ile Asn Arg Gly Leu His 1 5 10 15 Arg Ala Leu Ala Asp Asp Pro Lys Val Leu Leu Met Gly Glu Asp Ile 20 25 30 Gly Ala Leu Gly Gly Val Phe Arg Ile Thr Asp Gly Leu Gln Ala Glu 35 40 45 Phe Gly Glu Asp Arg Val Leu Asp Thr Pro Leu Ala Glu Ser Gly Ile 50 55 60 Val Gly Thr Ala Ile Gly Leu Ala Met Arg Gly Tyr Arg Pro Val Val 65 70 75 80 Glu Ile Gln Phe Asp Gly Phe Val Tyr Pro Ala Phe Asp Gln Ile Val 85 90 95 Ala Asn Leu Ala Lys Leu Arg Ala Arg Thr Arg Gly Ala Val Pro Met 100 105 110 Pro Val Thr Ile Arg Ile Pro Phe Gly Gly Gly Ile Gly Ser Pro Glu 115 120 125 His His Ser Glu Ser Pro Glu Ala Tyr Phe 
Leu His Thr Ala Gly Leu 130 135 140 Arg Val Val Ser Pro Ser Ser Pro Gln Glu Gly Tyr Asp Leu Ile Arg 145 150 155 160 Ala Ala Ile Ala Ser Glu Asp Pro Val Val Tyr Leu Glu Pro Lys Arg 165 170 175 Arg Tyr His Asp Lys Gly Asp Val Asp Leu Gly Val Ala Ile Pro Pro 180 185 190 Met Ser Pro Ala Arg Ile Leu Arg Glu Gly Arg Asp Ala Thr Leu Val 195 200 205 Ala Tyr Gly Pro Leu Val Lys Thr Ala Leu Gln Ala Ala Glu Val Ala 210 215 220 Ala Glu Glu Gly Val Glu Val Glu Val Val Asp Leu Arg Ser Leu Ser 225 230 235 240 Pro Leu Asp Thr Gly Leu Val Glu Ser Ser Val Arg Arg Thr Gly Arg 245 250 255 Leu Val Val Ala His Glu Ala Ser Arg Thr Gly Gly Leu Gly Ala Glu 260 265 270 Leu Val Ala Thr Val Ala Glu Arg Ala Phe His Trp Leu Glu Ala Pro 275 280 285 Pro Val Arg Val Thr Gly Met Asp Val Pro Tyr Pro Pro Ser Lys Leu 290 295 300 Glu His Leu His Leu Pro Asp Leu Asp Arg Ile Leu Asp Gly Leu Asp 305 310 315 320 Arg Ala Leu Gly Arg Pro Asn Ser Leu Asp Ser Val Asp Ala Phe Ala 325 330 335 Ala Pro Glu Thr Ala Glu Gln Phe Leu Ala Ala Gln Asn Ala Gly Glu 340 345 350 Glu Thr Arg 355 351068DNAMicrococcus luteus 35gtgagcgagc gcatgacctt cggccgtgcg atcaaccgcg gcctgcaccg tgccctggcc 60gacgacccca aggtcctgct catgggcgag gacatcggcg ccctcggcgg cgtgttccgc 120atcaccgacg gcctgcaggc cgagttcggc gaggaccggg tgctcgacac cccgctggcc 180gagtccggca tcgtgggcac ggccatcggc ctggcgatgc gcggctaccg gcccgtcgtc 240gagatccagt tcgacggctt cgtgtacccg gcgttcgacc agatcgtggc gaacctggcc 300aagctgcgcg cccgcacccg cggcgccgtg ccgatgccgg tgaccatccg catccccttc 360ggcggcggca tcggctcccc ggagcaccac tccgagtcgc ccgaggccta cttcctgcac 420accgcgggtc tgcgcgtggt ctccccgtcc tccccgcagg aggggtacga cctcatccgc 480gccgcgatcg cctcggagga cccggtggtc tacctcgagc ccaagcgtcg ctaccacgac 540aagggcgacg tggacctggg cgtcgcgatc ccgccgatga gcccggcccg catcctgcgc 600gagggccgtg acgccacgct cgtggcctac ggcccgctcg tgaagaccgc cctgcaggcc 660gccgaggtgg cggccgagga gggtgtcgag gtcgaggtgg tcgacctgcg cagcctgtcc 720ccgctggaca ccggcctcgt cgagtcctcg gtgcggcgca ccggtcggct cgtcgtggcg 780cacgaggcct cccgcacggg 
cggcctcggc gccgagctcg tggccacggt ggccgagcgc 840gcgttccatt ggctcgaggc cccgccggtg cgcgtcaccg gcatggacgt gccctacccg 900ccgtccaagc tcgagcacct gcacctgccg gacctcgacc gcatcctcga cggcctggac 960cgtgctctgg gccggccgaa ttcgctggac tccgtggacg cgttcgccgc ccccgagacc 1020gccgagcagt tcctcgccgc ccagaacgcc ggggaggaga cccgatga 106836327PRTStaphylococcus aureus 36Met Ala Lys Leu Ser Tyr Leu Glu Ala Ile Arg Gln Ala Gln Asp Leu 1 5 10 15 Ala Leu Gln Gln Asn Lys Asp Val Phe Ile Leu Gly Glu Asp Val Gly 20 25 30 Lys Lys Gly Gly Val Phe Gly Thr Thr Gln Gly Leu Gln Gln Gln Tyr 35 40 45 Gly Glu Asp Arg Val Ile Asp Thr Pro Leu Ala Glu Ser Asn Ile Val 50 55 60 Gly Thr Ala Ile Gly Ala Ala Met Val Gly Lys Arg Pro Ile Ala Glu 65 70 75 80 Ile Gln Phe Ala Asp Phe Ile Leu Pro Ala Thr Asn Gln Ile Ile Ser 85 90 95 Glu Ala Ala Lys Met Arg Tyr Arg Ser Asn Asn Asp Trp Gln Cys Pro 100 105 110 Leu Thr Ile Arg Ala Pro Phe Gly Gly Gly Val His Gly Gly Leu Tyr 115 120 125 His Ser Gln Ser Ile Glu Ser Ile Phe Ala Ser Ser Pro Gly Leu Thr 130 135 140 Ile Val Ile Pro Ser Thr Pro Tyr Asp Ala Lys Gly Leu Leu Leu Ser 145 150 155 160 Ser Ile Glu Ser Asn Asp Pro Val Leu Tyr Phe Glu His Lys Lys Ala 165 170 175 Tyr Arg Phe Leu Lys Glu Glu Val Pro Glu Glu Tyr Tyr Thr Val Pro 180 185 190 Leu Gly Lys Ala Asp Val Lys Arg Glu Gly Glu Asp Leu Thr Val Phe 195 200 205 Cys Tyr Gly Leu Met Val Asn Tyr Cys Leu Gln Ala Ala Asp Ile Leu 210 215 220 Ala Ala Asp Gly Ile Asn Val Glu Val Val Asp Leu Arg Thr Val Tyr 225 230 235 240 Pro Leu Asp Lys Glu Thr Ile Ile Asp Arg Ala Lys Asn Thr Gly Lys 245 250 255 Val Leu Leu Val Thr Glu Asp Asn Leu Glu Gly Ser Ile Met Ser Glu 260 265 270 Val Ser Ala Ile Ile Ala Glu His Cys Leu Phe Asp Leu Asp Thr Pro 275 280 285 Ile Met Arg Leu Ala Ala Pro Asp Val Pro Ser Met Pro Phe Ser Pro 290 295 300 Val Leu Glu Asn Glu Ile Met Met Asn Pro Glu Lys Ile Leu Asn Lys 305 310 315 320 Met Arg Glu Leu Ala Glu Phe 325 37984DNAStaphylococcus aureus 37ctagaattct gctaattcac gcattttatt taagattttt tctggattca tcataatttc 
60attttctaat acaggagaaa atggcataga tggtacatct ggagcagcta aacgcatgat 120tggtgtatct aaatcgaaca agcaatgctc tgcaataatc gctgacactt ctgacataat 180actaccttct aaattatctt cagttacaag taaaacttta cctgtatttt tagcacgatc 240aataattgtt tctttatcta atggataaac agttcgtaaa tcaacgactt caacgttgat 300accgtctgca gctaaaatat ccgctgcttg taaacaataa ttgaccatta atccataaca 360aaatactgtt aaatcttcac cttcacgttt aacatctgct tttcctaaag gtacagtgta 420atattcttct ggcacttctt cctttaagaa acgataagct tttttatgct caaagtacaa 480tactggatca tttgattcga tagatgataa taaaagccct ttagcatcat acggtgtgga 540aggaataaca attgttaaac ctggtgatga agcaaatata ctttcaatac tttgtgaatg 600atatagtcct ccgtgaacac cgccaccaaa tggtgcacga atcgttaatg ggcattgcca 660atcattattt gaacgataac gcattttcgc agcttcacta ataatttgat ttgtcgcagg 720taaaataaaa tctgcaaatt gaatttctgc aattggtctt ttacctacca tagctgcacc 780aatggcagtt ccaacaatat ttgactcagc taatggcgta tcgataactc tgtcttcacc 840atattgttgt tgtagtcctt gagtagtacc aaatacgcca ccttttttac caacatcttc 900accaagaata aacacatctt tattttgttg taatgctaag tcttgtgcct ggcgtatcgc 960ctctaaataa gataatttag ccat 98438337PRTStreptococcus mutans 38Met Arg Arg Lys Arg Tyr Met Ser Glu Thr Lys Val Val Ala Leu Arg 1 5 10 15 Glu Ala Ile Asn Leu Ala Met Ser Glu Glu Met Arg Lys Asp Glu Lys 20 25 30 Ile Ile Leu Met Gly Glu Asp Val Gly Ile Tyr Gly Gly Asp Phe Gly 35 40 45 Thr Ser Val Gly Met Leu Ala Glu Phe Gly Glu Lys Arg Val Lys Asp 50 55 60 Thr Pro Ile Ser Glu Ala Ala Ile Ala Gly Ser Ala Val Gly Ala Ala 65 70 75 80 Gln Thr Gly Leu Arg Pro Ile Val Asp Leu Thr Phe Met Asp Phe Val 85 90 95 Thr Ile Ala Met Asp Ala Ile Val Asn Gln Gly Ala Lys Ala Asn Tyr 100 105 110 Met Phe Gly Gly Gly Leu Lys Thr Pro Val Thr Phe Arg Val Ala Ser 115 120 125 Gly Ser Gly Ile Gly Ser Ala Ala Gln His Ser Gln Ser Leu Glu Ala 130 135 140 Trp Leu Thr His Ile Pro Gly Ile Lys Val Val Ala Pro Gly Thr Val 145 150 155 160 Asn Asp Ala Lys Ala Leu Leu Lys Ser Ala Ile Arg Asp Asn Asn Ile 165 170 175 Val Ile Phe Met Glu Pro Lys Ala Leu Tyr Gly Lys Lys Glu Glu Val 180 185 190 
Asn Leu Asp Pro Asp Phe Tyr Ile Pro Leu Gly Lys Gly Glu Ile Lys 195 200 205 Arg Glu Gly Thr Asp Val Thr Ile Val Ser Tyr Gly Arg Met Leu Glu 210 215 220 Arg Val Leu Lys Ala Ala Glu Glu Val Ala Ala Glu Asp Ile Ser Val 225 230 235 240 Glu Val Val Asp Pro Arg Thr Leu Ile Pro Leu Asp Lys Asp Leu Ile 245 250 255 Ile Asn Ser Val Lys Lys Thr Gly Lys Val Ile Leu Val Asn Asp Ala 260 265 270 Tyr Lys Thr Gly Gly Phe Ile Gly Glu Ile Ala Ser Val Ile Thr Glu 275 280 285 Ser Glu Ala Phe Asp Tyr Leu Asp Ala Pro Val Leu Arg Leu Ala Ser 290 295 300 Glu Asp Val Pro Val Pro Tyr Ser His Val Leu Glu Thr Ala Ile Leu 305 310 315 320 Pro Asp Val Ala Lys Ile Lys Glu Ala Ile Tyr Lys Gln Val Arg Lys 325 330 335 Arg 391014DNAStreptococcus mutans 39atgaggagaa agagatatat gtcagaaaca aaagtagtag ccttacggga agctatcaat 60cttgctatga gcgaggaaat gcgtaaggac gaaaaaatta ttttgatggg tgaagatgtc 120ggtatttatg gtggtgactt tggaacttct gttggtatgc tggctgaatt tggtgaaaag 180cgtgttaaag atacccctat ttcagaagca gccattgcag gatctgcagt aggtgccgct 240caaactggac ttcgtcctat tgttgatttg acctttatgg actttgtgac tattgccatg 300gatgctattg ttaatcaagg tgctaaagcc aattatatgt ttggcggcgg acttaaaacg 360cctgtaacct ttcgtgtggc ctcaggctca ggtatcggct cagcagcgca gcattctcag 420tcactagaag cttggttaac tcatattccg ggaatcaagg tggttgcgcc tggcacagtc 480aatgatgcta aagccttgct caaatctgct attcgtgata ataatatcgt tattttcatg 540gaaccaaaag cgctttatgg caaaaaagaa gaggtcaatt tagatcctga tttttatatt 600ccgcttggta aaggcgaaat taagcgcgag ggaacagatg ttaccattgt gtcttatggt 660cgtatgctgg aacgcgttct caaagccgct gaggaagtgg cggctgaaga tatcagtgtt 720gaagttgttg acccgcgtac ccttattccg cttgataaag acttaattat taattctgtg 780aaaaagacgg gtaaggttat cctagttaat gatgcttata aaacaggtgg tttcattggt 840gaaatagcat cagtgattac tgaaagcgaa gcatttgatt atttagatgc accagtgctt 900cgtctcgctt ctgaggatgt gcctgttccc tattctcatg ttctcgaaac agccatttta 960ccagatgtgg caaaaattaa agaagctatc tataaacaag tcaggaaaag atag 10144021PRTArtificial sequenceSynthetic polypeptide 40Val Xaa Xaa Xaa Gly Xaa Asp Val Gly Xaa Xaa Gly Gly 
Val Phe Xaa 1 5 10 15 Xaa Thr Xaa Gly Ile 20 4116PRTArtificial sequenceSynthetic polypeptide 41Xaa Gly Xaa Xaa Arg Xaa Xaa Asp Xaa Pro Xaa Xaa Glu Xaa Xaa Ile 1 5 10 15 4215PRTArtificial sequenceSynthetic polypeptide 42Gly Thr Xaa Xaa Xaa Gly Xaa Arg Pro Xaa Xaa Glu Xaa Gln Phe 1 5 10 15 4319PRTArtificial sequenceSynthetic polypeptide 43Pro Xaa Gly Gly Xaa Xaa Xaa Xaa Xaa Xaa His Ser Xaa Ser Xaa Glu 1 5 10 15 Ala Xaa Xaa 4413PRTArtificial sequenceSynthetic polypeptide 44Xaa Asp Pro Val Xaa Xaa Xaa Glu Xaa Lys Arg Xaa Tyr 1 5 10 4512PRTArtificial sequenceSynthetic polypeptide 45Xaa Val Xaa Asp Leu Arg Xaa Xaa Xaa Pro Xaa Asp 1 5 10 4619PRTArtificial sequenceSynthetic polypeptide 46Glu Xaa Cys Xaa Xaa Xaa Leu Xaa Ala Pro Xaa Xaa Arg Xaa Xaa Gly 1 5 10 15 Xaa Xaa Pro 47424PRTBacillus subtilis 47Met Ala Ile Glu Gln Met Thr Met Pro Gln Leu Gly Glu Ser Val Thr 1 5 10 15 Glu Gly Thr Ile Ser Lys Trp Leu Val Ala Pro Gly Asp Lys Val Asn 20 25 30 Lys Tyr Asp Pro Ile Ala Glu Val Met Thr Asp Lys Val Asn Ala Glu 35 40 45 Val Pro Ser Ser Phe Thr Gly Thr Ile Thr Glu Leu Val Gly Glu Glu 50 55 60 Gly Gln Thr Leu Gln Val Gly Glu Met Ile Cys Lys Ile Glu Thr Glu 65 70 75 80 Gly Ala Asn Pro Ala Glu Gln Lys Gln Glu Gln Pro Ala Ala Ser Glu 85 90 95 Ala Ala Glu Asn Pro Val Ala Lys Ser Ala Gly Ala Ala Asp Gln Pro 100 105 110 Asn Lys Lys Arg Tyr Ser Pro Ala Val Leu Arg Leu Ala Gly Glu His 115 120 125 Gly Ile Asp Leu Asp Gln Val Thr Gly Thr Gly Ala Gly Gly Arg Ile 130 135 140 Thr Arg Lys Asp Ile Gln Arg Leu Ile Glu Thr Gly Gly Val Gln Glu 145 150 155 160 Gln Asn Pro Glu Glu Leu Lys Thr Ala Ala Pro Ala Pro Lys Ser Ala 165 170 175 Ser Lys Pro Glu Pro Lys Glu Glu Thr Ser Tyr Pro Ala Ser Ala Ala 180 185 190 Gly Asp Lys Glu Ile Pro Val Thr Gly Val Arg Lys Ala Ile Ala Ser 195 200 205 Asn Met Lys Arg Ser Lys Thr Glu Ile Pro His Ala Trp Thr Met Met 210 215 220 Glu Val Asp Val Thr Asn Met Val Ala Tyr Arg Asn Ser Ile Lys Asp 225 230 235 240 Ser Phe Lys Lys Thr Glu Gly Phe Asn Leu Thr Phe Phe Ala Phe 
Phe 245 250 255 Val Lys Ala Val Ala Gln Ala Leu Lys Glu Phe Pro Gln Met Asn Ser 260 265 270 Met Trp Ala Gly Asp Lys Ile Ile Gln Lys Lys Asp Ile Asn Ile Ser 275 280 285 Ile Ala Val Ala Thr Glu Asp Ser Leu Phe Val Pro Val Ile Lys Asn 290 295 300 Ala Asp Glu Lys Thr Ile Lys Gly Ile Ala Lys Asp Ile Thr Gly Leu 305 310 315 320 Ala Lys Lys Val Arg Asp Gly Lys Leu Thr Ala Asp Asp Met Gln Gly 325 330 335 Gly Thr Phe Thr Val Asn Asn Thr Gly Ser Phe Gly Ser Val Gln Ser 340 345 350 Met Gly Ile Ile Asn Tyr Pro Gln Ala Ala Ile Leu Gln Val Glu Ser 355 360 365 Ile Val Lys Arg Pro Val Val Met Asp Asn Gly Met Ile Ala Val Arg 370 375 380 Asp Met Val Asn Leu Cys Leu Ser Leu Asp His Arg Val Leu Asp Gly 385 390 395 400 Leu Val Cys Gly Arg Phe Leu Gly Arg Val Lys Gln Ile Leu Glu Ser 405 410 415 Ile Asp Glu Lys Thr Ser Val Tyr 420

481275DNABacillus subtilis 48ttagtaaaca gatgtcttct cgtcaatcga ttctaaaatt tgtttcactc gtccgaggaa 60tcgtccgcac acgagaccgt caagcactct gtgatctaat gacaggcaca gattaaccat 120gtctctgaca gcaatcatgc cattgtccat gacaaccggg cgtttgacga tggattctac 180ttgaagaatc gcagcctgag ggtagttgat aatgcccatc gactgaacag acccgaacga 240acctgtgttg ttgacggtaa acgtgcctcc ctgcatgtca tctgcagtga gttttccgtc 300tcttactttt ttagctaggc cggtaatgtc tttcgcaatg cctttaattg ttttttcatc 360agcgttttta atcaccggaa caaataaaga atcctctgtg gcaactgcaa ttgaaatatt 420gatatccttt ttctgaataa ttttgtcccc cgcccacatg ctattcattt gcgggaattc 480ttttaacgcc tgagcgaccg cttttacaaa aaaggcgaag aacgttaaat taaagccttc 540tgtcttctta aaagaatctt ttatactgtt gcgatatgca accatatttg tgacgtcgac 600ttccatcatc gtccaagcat gcggaatttc tgttttgctt cgcttcatat tggaagcaat 660tgcttttctt acacctgtga cagggatttc tttatcaccg gctgcagacg caggatatga 720cgtctcttct tttggctcag gttttgatgc agacttcggt gcaggagctg ctgttttcag 780ctcctcagga ttctgttctt gcacgccgcc tgtttcaatt aagcgctgaa tatcttttcg 840tgtgatgcgc ccgccggcac cagttcctgt cacttgatcg aggtcaatgc cgtgctctcc 900ggccaaacgg agaacagctg gcgagtagcg ctttttattg ggctgatcgg ctgctccagc 960actttttgca acagggttct cagcggcttc tgatgctgct ggctgttctt gtttttgttc 1020agccggattc gcgccttctg tttcaatttt gcaaatcatt tctccgactt gcagggtttg 1080gccttcttct cccacaagct ctgttatcgt accagtaaaa gaagacggaa cctctgcatt 1140taccttatct gtcatgactt ccgcgatcgg atcgtatttg ttcactttat caccgggggc 1200gacaagccat ttgctgatcg tcccctctgt tacgctttct ccaagctgcg gcatcgtcat 1260ttgttcaatt gccat 127549462PRTStreptomyces avermitilis 49Met Thr Glu Ala Ser Val Arg Glu Phe Lys Met Pro Asp Val Gly Glu 1 5 10 15 Gly Leu Thr Glu Ala Glu Ile Leu Lys Trp Tyr Val Gln Pro Gly Asp 20 25 30 Thr Val Thr Asp Gly Gln Val Val Cys Glu Val Glu Thr Ala Lys Ala 35 40 45 Ala Val Glu Leu Pro Ile Pro Tyr Asp Gly Val Val Arg Glu Leu Arg 50 55 60 Phe Pro Glu Gly Thr Thr Val Asp Val Gly Gln Val Ile Ile Ala Val 65 70 75 80 Asp Val Ala Gly Asp Ala Pro Val Ala Glu Ile Pro Val Pro Ala Gln 85 90 95 Glu Ala Pro Val 
Gln Glu Glu Pro Lys Pro Glu Gly Arg Lys Pro Val 100 105 110 Leu Val Gly Tyr Gly Val Ala Glu Ser Ser Thr Lys Arg Arg Pro Arg 115 120 125 Lys Ser Ala Pro Ala Ser Glu Pro Ala Ala Glu Gly Thr Tyr Phe Ala 130 135 140 Ala Thr Val Leu Gln Gly Ile Gln Gly Glu Leu Asn Gly His Gly Ala 145 150 155 160 Val Lys Gln Arg Pro Leu Ala Lys Pro Pro Val Arg Lys Leu Ala Lys 165 170 175 Asp Leu Gly Val Asp Leu Ala Thr Ile Thr Pro Ser Gly Pro Asp Gly 180 185 190 Val Ile Thr Arg Glu Asp Val His Ala Ala Val Ala Pro Pro Pro Pro 195 200 205 Ala Pro Gln Pro Val Gln Thr Pro Ala Ala Pro Ala Pro Ala Pro Val 210 215 220 Ala Ala Tyr Asp Thr Ala Arg Glu Thr Arg Val Pro Val Lys Gly Val 225 230 235 240 Arg Lys Ala Thr Ala Ala Ala Met Val Gly Ser Ala Phe Thr Ala Pro 245 250 255 His Val Thr Glu Phe Val Thr Val Asp Val Thr Arg Thr Met Lys Leu 260 265 270 Val Glu Glu Leu Lys Gln Asp Lys Glu Phe Thr Gly Leu Arg Val Asn 275 280 285 Pro Leu Leu Leu Ile Ala Lys Ala Leu Leu Val Ala Ile Lys Arg Asn 290 295 300 Pro Asp Ile Asn Ala Ser Trp Asp Glu Ala Asn Gln Glu Ile Val Leu 305 310 315 320 Lys His Tyr Val Asn Leu Gly Ile Ala Ala Ala Thr Pro Arg Gly Leu 325 330 335 Ile Val Pro Asn Ile Lys Asp Ala His Ala Lys Thr Leu Pro Gln Leu 340 345 350 Ala Glu Ser Leu Gly Glu Leu Val Ser Thr Ala Arg Glu Gly Lys Thr 355 360 365 Ser Pro Thr Ala Met Gln Gly Gly Thr Val Thr Ile Thr Asn Val Gly 370 375 380 Val Phe Gly Val Asp Thr Gly Thr Pro Ile Leu Asn Pro Gly Glu Ser 385 390 395 400 Ala Ile Leu Ala Val Gly Ala Ile Lys Leu Gln Pro Trp Val His Lys 405 410 415 Gly Lys Val Lys Pro Arg Gln Val Thr Thr Leu Ala Leu Ser Phe Asp 420 425 430 His Arg Leu Val Asp Gly Glu Leu Gly Ser Lys Val Leu Ala Asp Val 435 440 445 Ala Ala Ile Leu Glu Gln Pro Lys Arg Leu Ile Thr Trp Ala 450 455 460 501389DNAStreptomyces avermitilis 50atgactgagg cgtccgtgcg tgagttcaag atgcccgatg tgggtgaggg actcaccgag 60gccgagatcc tcaagtggta cgtccagccc ggcgacaccg tcaccgacgg ccaggtcgtc 120tgcgaggtcg agaccgcgaa ggcggccgtg gaactcccca ttccgtacga cggtgtcgta 180cgcgaactcc 
gtttccccga ggggacgacg gtggacgtgg gacaggtgat catcgcggtg 240gacgtggccg gcgacgcacc ggtggcggag atccccgtgc ccgcgcagga ggctccggtc 300caggaggagc ccaagcccga gggccgcaag cccgtcctcg tgggctacgg ggtggccgag 360tcctccacca agcgccgtcc gcgcaagagc gcgccggcga gcgagcccgc tgcggagggc 420acgtacttcg cagcgaccgt tctccagggc atccagggcg agctgaacgg acacggcgcg 480gtgaagcagc gtccgctggc gaagccgccg gtgcgcaagc tggccaagga cctgggcgtc 540gacctcgcga cgatcacgcc gtcgggcccc gacggcgtca tcacgcgcga ggacgtgcac 600gcggcggtgg cgccaccgcc gccggcaccc cagcccgtgc agacgcccgc tgccccggcc 660ccggcgccgg tggccgcgta cgacacggct cgtgagaccc gtgtccccgt caagggcgtc 720cgcaaggcga cggcggcggc gatggtcggc tcggcgttca cggcgccgca cgtcacggag 780ttcgtgacgg tggacgtgac gcgcacgatg aagctggtcg aggagctgaa gcaggacaag 840gagttcaccg gcctgcgggt gaacccgctg ctcctcatcg ccaaggcgct cctggtcgcg 900atcaagcgga acccggacat caacgcgtcc tgggacgagg cgaaccagga gatcgtcctc 960aagcactatg tgaacctggg catcgcggcg gccaccccgc gcggtctgat cgtcccgaac 1020atcaaggacg cccacgccaa gacgctgccg caactggccg agtcactggg tgagttggtg 1080tcgacggccc gcgagggcaa gacgtccccg acggccatgc agggcggcac ggtcacgatc 1140acgaacgtcg gcgtcttcgg cgtcgacacg ggcacgccga tcctcaaccc cggcgagtcc 1200gcgatcctcg cggtcggcgc gatcaagctc cagccgtggg tccacaaggg caaggtcaag 1260ccccgacagg tcaccacgct ggcgctcagc ttcgaccatc gcctggtcga cggcgagctg 1320ggctccaagg tgctggccga cgtggcggcg atcctggagc agccgaagcg gctgatcacc 1380tgggcctag 138951423PRTPseudomonas putida 51Met Gly Thr His Val Ile Lys Met Pro Asp Ile Gly Glu Gly Ile Ala 1 5 10 15 Gln Val Glu Leu Val Glu Trp Phe Val Lys Val Gly Asp Ile Ile Ala 20 25 30 Glu Asp Gln Val Val Ala Asp Val Met Thr Asp Lys Ala Thr Val Glu 35 40 45 Ile Pro Ser Pro Val Ser Gly Lys Val Leu Ala Leu Gly Gly Gln Pro 50 55 60 Gly Glu Val Met Ala Val Gly Ser Glu Leu Ile Arg Ile Glu Val Glu 65 70 75 80 Gly Ser Gly Asn His Val Asp Val Pro Gln Pro Lys Pro Val Glu Ala 85 90 95 Pro Ala Ala Pro Ile Ala Ala Lys Pro Glu Pro Gln Lys Asp Val Lys 100 105 110 Pro Ala Val Tyr Gln Ala Pro Ala Asn His Glu Ala Ala Pro Ile 
Val 115 120 125 Pro Arg Gln Pro Gly Asp Lys Pro Leu Ala Ser Pro Ala Val Arg Lys 130 135 140 Arg Ala Leu Asp Ala Gly Ile Glu Leu Arg Tyr Val His Gly Ser Gly 145 150 155 160 Pro Ala Gly Arg Ile Leu His Glu Asp Leu Asp Ala Phe Met Ser Lys 165 170 175 Pro Gln Ser Asn Ala Gly Gln Ala Pro Asp Gly Tyr Ala Lys Arg Thr 180 185 190 Asp Ser Glu Gln Val Pro Val Ile Gly Leu Arg Arg Lys Ile Ala Gln 195 200 205 Arg Met Gln Asp Ala Lys Arg Arg Val Ala His Phe Ser Tyr Val Glu 210 215 220 Glu Ile Asp Val Thr Ala Leu Glu Ala Leu Arg Gln Gln Leu Asn Ser 225 230 235 240 Lys His Gly Asp Ser Arg Gly Lys Leu Thr Leu Leu Pro Phe Leu Val 245 250 255 Arg Ala Leu Val Val Ala Leu Arg Asp Phe Pro Gln Ile Asn Ala Thr 260 265 270 Tyr Asp Asp Glu Ala Gln Ile Ile Thr Arg His Gly Ala Val His Val 275 280 285 Gly Ile Ala Thr Gln Gly Asp Asn Gly Leu Met Val Pro Val Leu Arg 290 295 300 His Ala Glu Ala Gly Ser Leu Trp Ala Asn Ala Gly Glu Ile Ser Arg 305 310 315 320 Leu Ala Asn Ala Ala Arg Asn Asn Lys Ala Ser Arg Glu Glu Leu Ser 325 330 335 Gly Ser Thr Ile Thr Leu Thr Ser Leu Gly Ala Leu Gly Gly Ile Val 340 345 350 Ser Thr Pro Val Val Asn Thr Pro Glu Val Ala Ile Val Gly Val Asn 355 360 365 Arg Met Val Glu Arg Pro Val Val Ile Asp Gly Gln Ile Val Val Arg 370 375 380 Lys Met Met Asn Leu Ser Ser Ser Phe Asp His Arg Val Val Asp Gly 385 390 395 400 Met Asp Ala Ala Leu Phe Ile Gln Ala Val Arg Gly Leu Leu Glu Gln 405 410 415 Pro Ala Cys Leu Phe Val Glu 420 521272DNAPseudomonas putida 52tcactccacg aacaggcagg cgggttgttc gagcaggcca cgcacggcct ggatgaacag 60ggcggcgtcc atgccatcga ccacgcggtg gtcgaacgag ctggacaggt tcatcatctt 120gcgcacgacg atctggccat caatcaccac cggtcgttcg accatgcggt tgaccccgac 180gattgccact tccggggtgt tgaccaccgg cgtgctgaca atgccaccca aggcgccgag 240gctggtcagg gtgatggtcg agccggacag ctcctcgcgg ctggccttgt tgttacgtgc 300agcgttggcc aggcgcgaaa tctcgccggc attggcccac aggctgcccg cttcggcgtg 360gcgcagcacg ggtaccatca ggccgttgtc accctgggtg gcaatgccca catgcaccgc 420gccatggcgg gtgatgatct gcgcttcgtc gtcgtaggtc gcgttgatct 
gcgggaagtc 480acgcagcgcc acgacgaggg cgcgcaccag gaatggcagc aaggtcagtt tgccgcggct 540gtcgccgtgc ttgctgttga gttgctggcg cagggcttcc agggcggtga cgtcgatttc 600ctcgacataa ctgaagtgcg cgacccggcg tttggcgtcc tgcatgcgct gggcgatctt 660gcggcgcagg ccgatcaccg gcacctgctc gctgtcggtg cgcttggcat aaccatcagg 720tgcttgcccg gcattgcttt gcggcttgct catgaaggcg tcgaggtctt cgtgcagaat 780gcgcccggcc gggccgctac catgcacata acgcagttcg ataccggcgt ccagggcgcg 840tttgcgcacg gccggcgagg ccagcggctt gtcgcccggc tggcgcggca cgatgggcgc 900agcttcgtgg ttggcgggcg cctggtacac ggcgggtttt acgtctttct gcggttccgg 960cttggctgca atcggggcgg ccggggcctc taccggtttt ggctgaggca cgtccacatg 1020gttgccgctg ccttccactt cgatgcggat cagttcgcta ccgaccgcca tcacttcccc 1080gggctggcca cccagggcca acaccttgcc gctgaccggc gaggggattt ccacggtggc 1140cttgtcggtc atgacgtcgg ccaccacctg gtcctcggcg atgatgtcgc cgaccttgac 1200gaaccattcc accaactcga cctgcgcgat gccttcgcca atgtccggca tcttgatgac 1260gtgcgtgccc at 127253416PRTListeria monocytogenes 53Met Ala Val Glu Lys Ile Thr Met Pro Lys Leu Gly Glu Ser Val Thr 1 5 10 15 Glu Gly Thr Ile Ser Ser Trp Leu Val Lys Pro Gly Asp Thr Val Glu 20 25 30 Lys Tyr Asp Ala Ile Ala Glu Val Leu Thr Asp Lys Val Thr Ala Glu 35 40 45 Ile Pro Ser Ser Phe Ser Gly Thr Ile Lys Glu Ile Leu Ala Glu Glu 50 55 60 Asp Glu Thr Leu Glu Val Gly Glu Val Ile Cys Thr Ile Glu Thr Glu 65 70 75 80 Glu Ala Ser Ser Ser Glu Pro Val Val Glu Ala Glu Gln Thr Glu Pro 85 90 95 Lys Thr Pro Glu Lys Gln Glu Thr Lys Gln Val Lys Leu Ala Glu Ala 100 105 110 Pro Ala Ser Gly Arg Phe Ser Pro Ala Val Leu Arg Ile Ala Gly Glu 115 120 125 Asn Asn Ile Asp Leu Ser Thr Val Glu Gly Thr Gly Lys Gly Gly Arg 130 135 140 Ile Thr Arg Lys Asp Leu Leu Gln Val Ile Glu Asn Gly Pro Val Ala 145 150 155 160 Pro Lys Arg Glu Glu Val Lys Ser Ala Pro Gln Glu Lys Glu Ala Thr 165 170 175 Pro Asn Pro Val Arg Ser Ala Ala Gly Asp Arg Glu Ile Pro Ile Asn 180 185 190 Gly Val Arg Lys Ala Ile Ala Lys His Met Ser Val Ser Lys Gln Glu 195 200 205 Ile Pro His Ala Trp Met Met Val Glu Val Asp Ala Thr 
Gly Leu Val 210 215 220 Arg Tyr Arg Asn Thr Val Lys Asp Ser Phe Lys Lys Glu Glu Gly Tyr 225 230 235 240 Ser Leu Thr Tyr Phe Ala Phe Phe Ile Lys Ala Val Ala Gln Ala Leu 245 250 255 Lys Glu Phe Pro Gln Leu Asn Ser Thr Trp Ala Gly Asp Lys Ile Ile 260 265 270 Glu His Ala Asn Ile Asn Ile Ser Ile Ala Ile Ala Ala Gly Asp Leu 275 280 285 Leu Tyr Val Pro Val Ile Lys Asn Ala Asp Glu Lys Ser Ile Lys Gly 290 295 300 Ile Ala Arg Glu Ile Ser Glu Leu Ala Gly Lys Ala Arg Asn Gly Lys 305 310 315 320 Leu Ser Gln Ala Asp Met Glu Gly Gly Thr Phe Thr Val Asn Ser Thr 325 330 335 Gly Ser Phe Gly Ser Val Gln Ser Met Gly Ile Ile Asn His Pro Gln 340 345 350 Ala Ala Ile Leu Gln Val Glu Ser Ile Val Lys Arg Pro Val Ile Ile 355 360 365 Asp Asp Met Ile Ala Val Arg Asp Met Val Asn Leu Cys Leu Ser Ile 370 375 380 Asp His Arg Ile Leu Asp Gly Leu Leu Ala Gly Lys Phe Leu Gln Ala 385 390 395 400 Ile Lys Ala Asn Val Glu Lys Ile Ser Lys Glu Asn Thr Ala Leu Tyr 405 410 415 541251DNAListeria monocytogenes 54gtggcagttg aaaaaatcac catgcccaaa ttaggggaaa gtgtaacaga aggaacgatt 60agttcatggt tagttaaacc aggcgataca gtagaaaaat atgatgctat cgcggaagtt 120ttaacagata aagtaacagc tgaaatccca tcatccttta gtggcactat caaagaaatt 180ttagcagagg aagatgaaac actagaagta ggcgaagtta tttgtaccat cgaaacagaa 240gaggctagta gttcagagcc tgtagttgaa gcagaacaaa cagaaccaaa aactccagaa 300aaacaagaaa caaaacaagt gaaattagca gaagcaccag ccagtggaag attttcacca 360gcggtactgc gtattgctgg agaaaacaat attgatttat caaccgtaga aggcacaggt 420aaaggtggcc gaattacaag aaaagattta cttcaagtaa ttgaaaatgg tccagtagct 480ccgaaacgcg aggaagtgaa gtctgctcca caagaaaaag aagcgacgcc aaatcctgta 540cgttcagcag caggtgacag agaaatccca atcaatggtg taagaaaagc gattgctaaa 600catatgagcg tgagtaaaca agaaattccg catgcttgga tgatggtgga agtagatgca 660actggtcttg ttcgctatcg taatacagtt aaagacagct ttaaaaaaga agaaggttat 720tcattaactt atttcgcctt tttcatcaaa gccgttgcac aagcattgaa agaattcccg 780caacttaaca gcacgtgggc aggcgataaa attattgagc atgcgaatat caatatttcg 840attgcgattg cagctggcga tttattgtat gtgccagtta 
ttaaaaatgc ggacgaaaaa 900tccattaaag gtattgctcg cgaaataagt gaactagctg gaaaagcgcg taatggtaaa 960ctgagccaag ccgatatgga aggtgggact ttcactgtaa atagtactgg ttcatttggc 1020tctgttcaat caatggggat tattaaccac ccacaagccg ctattcttca agtggaatcc 1080attgttaagc gcccagtcat tattgacgat atgattgctg tacgagatat ggtcaaccta 1140tgtctatcca tcgatcatcg tattttagac ggcttactag caggtaaatt cttacaagca 1200attaaagcca atgtcgaaaa gatttctaaa gaaaatacag cgttgtatta a 125155455PRTStreptomyces avermitilis 55Met Ala Gln Val Leu Glu Phe Lys Leu Pro Asp Leu Gly Glu Gly Leu 1 5 10 15 Thr Glu Ala Glu Ile Val Arg Trp Leu Val Gln Val Gly Asp Val Val 20 25 30 Ala Ile Asp Gln Pro Val Val Glu Val Glu Thr Ala Lys Ala Met Val 35 40 45 Glu Val Pro Cys Pro Tyr Gly Gly Val Val Thr Ala Arg Phe Gly Glu 50 55 60 Glu Gly Thr Glu Leu Pro Val Gly Ser Pro Leu Leu Thr Val Ala Val 65 70 75 80 Gly Ala Pro Ser Ser Val Pro Ala Ala Ser Ser Leu Ser Gly Ala Thr 85 90 95 Ser Ala Ser Ser Ala Ser Ser Val Ser Ser Asp Asp Gly Glu Ser Ser 100 105 110 Gly Asn Val Leu Val Gly Tyr Gly Thr Ser Ala Ala Pro Ala Arg Arg 115 120 125 Arg Arg Val Arg Pro Gly Gln Ala Ala Pro Val Val Thr Ala Thr Ala 130 135 140 Ala Ala Ala Ala Thr Arg Val Ala Ala Pro Glu Arg Ser

Asp Gly Pro 145 150 155 160 Val Pro Val Ile Ser Pro Leu Val Arg Arg Leu Ala Arg Glu Asn Gly 165 170 175 Leu Asp Leu Arg Ala Leu Ala Gly Ser Gly Pro Asp Gly Leu Ile Leu 180 185 190 Arg Ser Asp Val Glu Gln Ala Leu Arg Ala Ala Pro Thr Pro Ala Pro 195 200 205 Thr Pro Thr Met Pro Pro Ala Pro Thr Pro Ala Pro Thr Pro Ala Ala 210 215 220 Ala Pro Arg Gly Thr Arg Ile Pro Leu Arg Gly Val Arg Gly Ala Val 225 230 235 240 Ala Asp Lys Leu Ser Arg Ser Arg Arg Glu Ile Pro Asp Ala Thr Cys 245 250 255 Trp Val Asp Ala Asp Ala Thr Ala Leu Met His Ala Arg Val Ala Met 260 265 270 Asn Ala Thr Gly Gly Pro Lys Ile Ser Leu Ile Ala Leu Leu Ala Arg 275 280 285 Ile Cys Thr Ala Ala Leu Ala Arg Phe Pro Glu Leu Asn Ser Thr Val 290 295 300 Asp Met Asp Ala Arg Glu Val Val Arg Leu Asp Gln Val His Leu Gly 305 310 315 320 Phe Ala Ala Gln Thr Glu Arg Gly Leu Val Val Pro Val Val Arg Asp 325 330 335 Ala His Ala Arg Asp Ala Glu Ser Leu Ser Ala Glu Phe Ala Arg Leu 340 345 350 Thr Glu Ala Ala Arg Thr Gly Thr Leu Thr Pro Gly Glu Leu Thr Gly 355 360 365 Gly Thr Phe Thr Leu Asn Asn Tyr Gly Val Phe Gly Val Asp Gly Ser 370 375 380 Thr Pro Ile Ile Asn His Pro Glu Ala Ala Met Leu Gly Val Gly Arg 385 390 395 400 Ile Ile Pro Lys Pro Trp Val His Glu Gly Glu Leu Ala Val Arg Gln 405 410 415 Val Val Gln Leu Ser Leu Thr Phe Asp His Arg Val Cys Asp Gly Gly 420 425 430 Thr Ala Gly Gly Phe Leu Arg Tyr Val Ala Asp Cys Val Glu Gln Pro 435 440 445 Ala Val Leu Leu Arg Thr Leu 450 455 561368DNAStreptomyces avermitilis 56atggcccagg tgctcgagtt caagctcccc gacctcgggg agggcctgac cgaggccgag 60atcgtccgct ggctggtgca ggtcggcgac gtcgtggcga tcgaccagcc ggtcgtcgag 120gtggagacgg ccaaggcgat ggtcgaggtg ccgtgcccct acgggggcgt ggtcaccgcc 180cgcttcggcg aggagggcac ggaactgccc gtgggctcac cgctgttgac ggtggctgtc 240ggagctccgt cctcggtgcc cgcggcgtcc tcgctgtccg gggcgacatc ggcgtcctcc 300gcgtcctcgg tgtcatcgga cgacggcgag tcgtccggca acgtcctggt cggatacggc 360acgtcggccg cgcccgcgcg ccggcggagg gtgcggccgg gccaggcggc acccgtggtg 420acggcaactg ccgccgcggc cgccacgcgc 
gtggcggctc ccgagcggag cgacggcccc 480gtgcccgtga tctccccgct ggtccgcagg ctcgcccggg agaacggcct ggatctgcgg 540gcgctggcgg gctccgggcc cgacgggctg atcctgaggt cggacgtcga gcaggcgctg 600cgcgccgcgc ccactcctgc ccccaccccg accatgcctc cggctcccac tcctgccccc 660acccccgccg cggcaccccg cggcacccgc atccccctcc gaggggtccg cggtgccgtc 720gccgacaaac tctcccgcag ccggcgtgag atccccgacg cgacctgctg ggtggacgcc 780gacgccacgg cactcatgca cgcgcgcgtg gcgatgaacg cgaccggcgg cccgaagatc 840tccctcatcg cgctgctcgc caggatctgc accgccgcac tggcccgctt ccccgagctc 900aactccaccg tcgacatgga cgcccgcgag gtcgtacggc tcgaccaggt gcacctgggc 960ttcgccgcgc agaccgaacg ggggctcgtc gtcccggtcg tgcgggacgc gcacgcgcgg 1020gacgccgagt cgctcagcgc cgagttcgcg cggctgaccg aggccgcccg gaccggcacc 1080ctcacacccg gggaactgac cggcggcacc ttcacgttga acaactacgg ggtgttcggc 1140gtcgacggtt ccacgccgat catcaaccac cccgaggcgg ccatgctggg cgtcggccgc 1200atcatcccca agccgtgggt gcacgagggc gagctggcgg tgcggcaggt cgtccagctc 1260tcgctcacct tcgaccaccg ggtgtgcgac ggcggcacgg caggcggttt cctgcgctac 1320gtggcggact gcgtggaaca gccggcggtg ctgctgcgca ccctgtag 136857355PRTMicrococcus luteus 57Met Ser Glu Arg Met Thr Phe Gly Arg Ala Ile Asn Arg Gly Leu His 1 5 10 15 Arg Ala Leu Ala Asp Asp Pro Lys Val Leu Leu Met Gly Glu Asp Ile 20 25 30 Gly Ala Leu Gly Gly Val Phe Arg Ile Thr Asp Gly Leu Gln Ala Glu 35 40 45 Phe Gly Glu Asp Arg Val Leu Asp Thr Pro Leu Ala Glu Ser Gly Ile 50 55 60 Val Gly Thr Ala Ile Gly Leu Ala Met Arg Gly Tyr Arg Pro Val Val 65 70 75 80 Glu Ile Gln Phe Asp Gly Phe Val Tyr Pro Ala Phe Asp Gln Ile Val 85 90 95 Ala Asn Leu Ala Lys Leu Arg Ala Arg Thr Arg Gly Ala Val Pro Met 100 105 110 Pro Val Thr Ile Arg Ile Pro Phe Gly Gly Gly Ile Gly Ser Pro Glu 115 120 125 His His Ser Glu Ser Pro Glu Ala Tyr Phe Leu His Thr Ala Gly Leu 130 135 140 Arg Val Val Ser Pro Ser Ser Pro Gln Glu Gly Tyr Asp Leu Ile Arg 145 150 155 160 Ala Ala Ile Ala Ser Glu Asp Pro Val Val Tyr Leu Glu Pro Lys Arg 165 170 175 Arg Tyr His Asp Lys Gly Asp Val Asp Leu Gly Val Ala Ile Pro Pro 180 185 190 Met 
Ser Pro Ala Arg Ile Leu Arg Glu Gly Arg Asp Ala Thr Leu Val 195 200 205 Ala Tyr Gly Pro Leu Val Lys Thr Ala Leu Gln Ala Ala Glu Val Ala 210 215 220 Ala Glu Glu Gly Val Glu Val Glu Val Val Asp Leu Arg Ser Leu Ser 225 230 235 240 Pro Leu Asp Thr Gly Leu Val Glu Ser Ser Val Arg Arg Thr Gly Arg 245 250 255 Leu Val Val Ala His Glu Ala Ser Arg Thr Gly Gly Leu Gly Ala Glu 260 265 270 Leu Val Ala Thr Val Ala Glu Arg Ala Phe His Trp Leu Glu Ala Pro 275 280 285 Pro Val Arg Val Thr Gly Met Asp Val Pro Tyr Pro Pro Ser Lys Leu 290 295 300 Glu His Leu His Leu Pro Asp Leu Asp Arg Ile Leu Asp Gly Leu Asp 305 310 315 320 Arg Ala Leu Gly Arg Pro Asn Ser Leu Asp Ser Val Asp Ala Phe Ala 325 330 335 Ala Pro Glu Thr Ala Glu Gln Phe Leu Ala Ala Gln Asn Ala Gly Glu 340 345 350 Glu Thr Arg 355 581068DNAMicrococcus luteus 58gtgagcgagc gcatgacctt cggccgtgcg atcaaccgcg gcctgcaccg tgccctggcc 60gacgacccca aggtcctgct catgggcgag gacatcggcg ccctcggcgg cgtgttccgc 120atcaccgacg gcctgcaggc cgagttcggc gaggaccggg tgctcgacac cccgctggcc 180gagtccggca tcgtgggcac ggccatcggc ctggcgatgc gcggctaccg gcccgtcgtc 240gagatccagt tcgacggctt cgtgtacccg gcgttcgacc agatcgtggc gaacctggcc 300aagctgcgcg cccgcacccg cggcgccgtg ccgatgccgg tgaccatccg catccccttc 360ggcggcggca tcggctcccc ggagcaccac tccgagtcgc ccgaggccta cttcctgcac 420accgcgggtc tgcgcgtggt ctccccgtcc tccccgcagg aggggtacga cctcatccgc 480gccgcgatcg cctcggagga cccggtggtc tacctcgagc ccaagcgtcg ctaccacgac 540aagggcgacg tggacctggg cgtcgcgatc ccgccgatga gcccggcccg catcctgcgc 600gagggccgtg acgccacgct cgtggcctac ggcccgctcg tgaagaccgc cctgcaggcc 660gccgaggtgg cggccgagga gggtgtcgag gtcgaggtgg tcgacctgcg cagcctgtcc 720ccgctggaca ccggcctcgt cgagtcctcg gtgcggcgca ccggtcggct cgtcgtggcg 780cacgaggcct cccgcacggg cggcctcggc gccgagctcg tggccacggt ggccgagcgc 840gcgttccatt ggctcgaggc cccgccggtg cgcgtcaccg gcatggacgt gccctacccg 900ccgtccaagc tcgagcacct gcacctgccg gacctcgacc gcatcctcga cggcctggac 960cgtgctctgg gccggccgaa ttcgctggac tccgtggacg cgttcgccgc ccccgagacc 1020gccgagcagt 
tcctcgccgc ccagaacgcc ggggaggaga cccgatga 106859424PRTStaphylococcus aureus 59Met Glu Ile Thr Met Pro Lys Leu Gly Glu Ser Val His Glu Gly Thr 1 5 10 15 Ile Glu Gln Trp Leu Val Ser Val Gly Asp His Ile Asp Glu Tyr Glu 20 25 30 Pro Leu Cys Glu Val Ile Thr Asp Lys Val Thr Ala Glu Val Pro Ser 35 40 45 Thr Ile Ser Gly Thr Ile Thr Glu Ile Leu Val Glu Ala Gly Gln Thr 50 55 60 Val Ala Ile Asp Thr Ile Ile Cys Lys Ile Glu Thr Ala Asp Glu Lys 65 70 75 80 Thr Asn Glu Thr Thr Glu Glu Ile Gln Ala Lys Val Asp Glu His Thr 85 90 95 Gln Lys Ser Thr Lys Lys Ala Ser Ala Thr Val Glu Gln Thr Phe Thr 100 105 110 Ala Lys Gln Asn Gln Pro Arg Asn Asn Gly Arg Phe Ser Pro Val Val 115 120 125 Phe Lys Leu Ala Ser Glu His Asp Ile Asp Leu Ser Gln Val Val Gly 130 135 140 Ser Gly Phe Glu Gly Arg Val Thr Lys Lys Asp Ile Met Ser Val Ile 145 150 155 160 Glu Asn Gly Gly Thr Thr Ala Gln Ser Asp Lys Gln Val Gln Thr Lys 165 170 175 Ser Thr Ser Val Asp Thr Ser Ser Asn Gln Ser Ser Glu Asp Asn Ser 180 185 190 Glu Asn Ser Thr Ile Pro Val Asn Gly Val Arg Lys Ala Ile Ala Gln 195 200 205 Asn Met Val Asn Ser Val Thr Glu Ile Pro His Ala Trp Met Met Ile 210 215 220 Glu Val Asp Ala Thr Asn Leu Val Asn Thr Arg Asn His Tyr Lys Asn 225 230 235 240 Ser Phe Lys Asn Lys Glu Gly Tyr Asn Leu Thr Phe Phe Ala Phe Phe 245 250 255 Val Lys Ala Val Ala Asp Ala Leu Lys Ala Tyr Pro Leu Leu Asn Ser 260 265 270 Ser Trp Gln Gly Asn Glu Ile Val Leu His Lys Asp Ile Asn Ile Ser 275 280 285 Ile Ala Val Ala Asp Glu Asn Lys Leu Tyr Val Pro Val Ile Lys His 290 295 300 Ala Asp Glu Lys Ser Ile Lys Gly Ile Ala Arg Glu Ile Asn Thr Leu 305 310 315 320 Ala Thr Lys Ala Arg Asn Lys Gln Leu Thr Thr Glu Asp Met Gln Gly 325 330 335 Gly Thr Phe Thr Val Asn Asn Thr Gly Thr Phe Gly Ser Val Ser Ser 340 345 350 Met Gly Ile Ile Asn His Pro Gln Ala Ala Ile Leu Gln Val Glu Ser 355 360 365 Ile Val Lys Lys Pro Val Val Ile Asn Asp Met Ile Ala Ile Arg Ser 370 375 380 Met Val Asn Leu Cys Ile Ser Ile Asp His Arg Ile Leu Asp Gly Leu 385 390 395 400 Gln Thr Gly Lys Phe 
Met Asn His Ile Lys Gln Arg Ile Glu Gln Tyr 405 410 415 Thr Leu Glu Asn Thr Asn Ile Tyr 420 601275DNAStaphylococcus aureus 60ctaatatata tttgtatttt ctaaagtata ctgttcgata cgctgtttaa tatgattcat 60aaatttacca gtttgtaaac catctaaaat gcgatgatct attgaaatac ataaatttac 120catacttcga attgcaatca tatcattaat tactactggc tttttaacga ttgattctac 180ttgtaatatc gctgcttgag ggtgatttat aatccccatt gatgatactg aaccaaatgt 240accagtatta ttaactgtaa atgttccacc ttgcatatct tcagttgtca attgcttatt 300acgcgctttt gttgctaaag tattaatttc tctagctata cctttgattg acttttcgtc 360tgcatgctta atcacaggta cgtataattt attttcatca gcaacagcaa ttgaaatatt 420aatgtcttta tgtaagacaa tttcatttcc ttgccagcta ctatttaata aaggatatgc 480ttttaaagca tctgctacag cttttacaaa gaaagcaaag aacgttagat tatatccttc 540tttattttta aagctgtttt tataatgatt tctcgtattc acaagatttg tagcatctac 600ttcaatcatc atccatgcat gtggaatctc tgttacacta ttaaccatat tttgcgcaat 660tgctttacgc acaccattta ctggtattgt gctgttttca ctattgtctt cagatgattg 720gttacttgat gtatctactg atgttgattt tgtttgaact tgtttgtcag attgagctgt 780ggtaccacca ttttcaataa ctgacattat atccttctta gttacacgac cttcaaatcc 840actacctaca acttgtgata aatcaatgtc atgctctgaa gcgagtttaa atacaacagg 900tgaaaagcga ccattattac gaggttgatt ttgtttagca gtaaatgtct gttccactgt 960tgcactagct tttttagtag atttctgagt atgctcatcc acttttgctt gtatctcttc 1020agttgtttca tttgtctttt catcagcagt ttcaatttta cagataattg tatcaatagc 1080tactgtctgc cccgcttcaa ctaaaatttc tgtaattgtt cctgatatcg tggaagggac 1140ttcagctgtc actttatctg taataacttc acataatggt tcatattcat caatatgatc 1200accaacagaa actaaccatt gttcaatggt accttcatga acactctcac ctaacttagg 1260cattgttatt tccat 127561455PRTStreptococcus mutans 61Met Ala Val Glu Ile Ile Met Pro Lys Leu Gly Val Asp Met Gln Glu 1 5 10 15 Gly Glu Ile Ile Glu Trp Lys Lys Gln Glu Gly Asp Glu Val Lys Glu 20 25 30 Gly Glu Ile Leu Leu Glu Ile Met Ser Asp Lys Thr Asn Met Glu Ile 35 40 45 Glu Ala Glu Asp Ser Gly Val Leu Leu Lys Ile Val Lys Gly Asn Gly 50 55 60 Gln Val Val Pro Val Thr Glu Val Ile Gly Tyr Ile Gly Gln Ala Gly 65 70 75 80 
Glu Val Leu Glu Ile Ala Asp Val Pro Ala Ser Thr Val Pro Lys Glu 85 90 95 Asn Ser Ala Ala Pro Ala Glu Lys Thr Lys Ala Met Ser Ser Pro Thr 100 105 110 Val Ala Ala Ala Pro Gln Gly Lys Ile Arg Ala Thr Pro Ala Ala Arg 115 120 125 Lys Ala Ala Arg Asp Leu Gly Val Asn Leu Asn Gln Val Ser Gly Thr 130 135 140 Gly Ala Lys Gly Arg Val His Lys Glu Asp Val Glu Ser Phe Lys Ala 145 150 155 160 Ala Gln Pro Lys Ala Thr Pro Leu Ala Arg Lys Ile Ala Ile Asp Lys 165 170 175 Gly Ile Asp Leu Ala Ser Val Ser Gly Thr Gly Phe Gly Gly Lys Ile 180 185 190 Ile Lys Glu Asp Ile Leu Asn Leu Phe Glu Ala Ala Gln Pro Val Asn 195 200 205 Asp Val Ser Asp Pro Ala Lys Glu Ala Ala Ala Leu Pro Glu Gly Val 210 215 220 Glu Val Ile Lys Met Ser Ala Met Arg Lys Ala Val Ala Lys Ser Met 225 230 235 240 Val Asn Ser Tyr Leu Thr Ala Pro Thr Phe Thr Leu Asn Tyr Asp Ile 245 250 255 Asp Met Thr Glu Met Ile Ala Leu Arg Lys Lys Leu Ile Asp Pro Ile 260 265 270 Met Glu Lys Thr Gly Phe Lys Val Ser Phe Thr Asp Leu Ile Gly Leu 275 280 285 Ala Val Val Lys Thr Leu Met Lys Pro Glu His Arg Tyr Leu Asn Ala 290 295 300 Ser Leu Ile Asn Asp Ala Thr Glu Ile Glu Leu His Gln Phe Val Asn 305 310 315 320 Leu Gly Ile Ala Val Gly Leu Asp Glu Gly Leu Leu Val Pro Val Val 325 330 335 His Gly Ala Asp Lys Met Ser Leu Ser Asp Phe Val Ile Ala Ser Lys 340 345 350 Asp Val Ile Lys Lys Ala Gln Thr Gly Lys Leu Lys Ala Thr Glu Met 355 360 365 Ser Gly Ser Thr Phe Ser Ile Thr Asn Leu Gly Met Phe Gly Thr Lys 370 375 380 Thr Phe Asn Pro Ile Ile Asn Gln Pro Asn Ser Ala Ile Leu Gly Val 385 390 395 400 Gly Ala Thr Ile Gln Thr Pro Thr Val Val Asp Gly Glu Ile Lys Ile 405 410 415 Arg Pro Ile Met Ala Leu Cys Leu Thr Ile Asp His Arg Leu Val Asp 420 425 430 Gly Met Asn Gly Ala Lys Phe Met Val Asp Leu Lys Lys Leu Met Glu 435 440 445 Asn Pro Phe Thr Leu Leu Ile 450 455 621368DNAStreptococcus mutans 62atggcagtcg aaattattat gcctaaactt ggtgttgata tgcaggaagg cgaaatcatc 60gagtggaaaa aacaagaagg tgatgaggtc aaagaagggg agatcctcct tgagattatg 120tctgacaaga ccaatatgga aatcgaagct 
gaggattcag gtgtcctgct taagattgtt 180aaaggaaatg gtcaagttgt tcctgtaact gaggtcattg gttatattgg tcaagcaggt 240gaagttcttg aaatagctga tgttcctgca agtacagttc ctaaagaaaa tagtgcagca 300cctgctgaaa aaacaaaagc aatgtcttct ccgacagttg cagcagcccc tcaaggaaag 360attcgagcaa caccagcagc tcgtaaggcg gctcgtgatc tgggagttaa cctgaatcag 420gtttcaggga caggcgctaa aggccgtgtt cacaaggaag atgttgaaag ctttaaagca 480gctcagccta aagcaacacc attagctagg aaaattgcta tagataaagg tattgatcta 540gccagtgtct caggaacagg ttttggcggc aaaattatca aggaagatat tttaaatctg 600tttgaggcag ctcagcctgt taatgatgtg tcagatcctg ctaaagaagc agctgcctta 660ccagagggtg ttgaagtcat taagatgtct gccatgcgta aggcagtggc taaaagcatg 720gtcaattctt acctgacagc tccaactttt actctcaatt atgacattga catgactgag 780atgattgcgt tgcgtaaaaa gttaattgat cctatcatgg aaaaaacagg ttttaaagtt 840agcttcacag atttgattgg tctggcagtc gtaaaaacct taatgaaacc agaacatcgt 900tacctcaatg cttcactcat taatgacgcg actgagattg aacttcatca atttgttaac 960cttggtatcg ccgttggact tgatgaagga ctgttagtac ctgttgttca tggtgcagat
1020aagatgagct tgtcagattt tgttatagct tcaaaggatg tcattaagaa agctcagacc 1080ggtaaattaa aagccactga aatgtctggt tcaacctttt ccattacaaa cttggggatg 1140tttggcacta agactttcaa ccccattatc aatcagccaa attcggctat tttgggtgta 1200ggagcaacta tccaaacgcc aactgttgtg gatggtgaaa ttaagattcg tccaatcatg 1260gcactgtgct tgaccatcga tcaccgcttg gttgatggca tgaacggcgc taagttcatg 1320gttgatctta aaaaactgat ggaaaatcca tttacattat tgatttga 13686314PRTArtificial sequenceSynthetic polypeptide 63Pro Xaa Val Xaa Xaa Xaa Ala Xaa Xaa Xaa Gly Xaa Xaa Leu 1 5 10 648PRTArtificial sequenceSynthetic polypeptide 64Xaa Xaa Gly Xaa Xaa Gly Xaa Ile 1 5 6516PRTArtificial sequenceSynthetic polypeptide 65Xaa Pro Xaa Xaa Gly Xaa Arg Xaa Xaa Ala Xaa Xaa Xaa Xaa Xaa Xaa 1 5 10 15 6613PRTArtificial sequenceSynthetic polypeptide 66Gly Xaa Thr Xaa Thr Xaa Xaa Xaa Xaa Gly Xaa Xaa Gly 1 5 10 6719PRTArtificial sequenceSynthetic polypeptide 67Asn Xaa Pro Glu Xaa Ala Xaa Xaa Xaa Val Xaa Xaa Xaa Xaa Xaa Xaa 1 5 10 15 Pro Xaa Val 6812PRTArtificial sequenceSynthetic polypeptide 68Leu Xaa Xaa Xaa Phe Xaa His Arg Xaa Xaa Asp Gly 1 5 10 69457PRTBacillus subtilis 69Met Ala Thr Glu Tyr Asp Val Val Ile Leu Gly Gly Gly Thr Gly Gly 1 5 10 15 Tyr Val Ala Ala Ile Arg Ala Ala Gln Leu Gly Leu Lys Thr Ala Val 20 25 30 Val Glu Lys Glu Lys Leu Gly Gly Thr Cys Leu His Lys Gly Cys Ile 35 40 45 Pro Ser Lys Ala Leu Leu Arg Ser Ala Glu Val Tyr Arg Thr Ala Arg 50 55 60 Glu Ala Asp Gln Phe Gly Val Glu Thr Ala Gly Val Ser Leu Asn Phe 65 70 75 80 Glu Lys Val Gln Gln Arg Lys Gln Ala Val Val Asp Lys Leu Ala Ala 85 90 95 Gly Val Asn His Leu Met Lys Lys Gly Lys Ile Asp Val Tyr Thr Gly 100 105 110 Tyr Gly Arg Ile Leu Gly Pro Ser Ile Phe Ser Pro Leu Pro Gly Thr 115 120 125 Ile Ser Val Glu Arg Gly Asn Gly Glu Glu Asn Asp Met Leu Ile Pro 130 135 140 Lys Gln Val Ile Ile Ala Thr Gly Ser Arg Pro Arg Met Leu Pro Gly 145 150 155 160 Leu Glu Val Asp Gly Lys Ser Val Leu Thr Ser Asp Glu Ala Leu Gln 165 170 175 Met Glu Glu Leu Pro Gln Ser Ile Ile Ile Val Gly Gly Gly Val 
Ile 180 185 190 Gly Ile Glu Trp Ala Ser Met Leu His Asp Phe Gly Val Lys Val Thr 195 200 205 Val Ile Glu Tyr Ala Asp Arg Ile Leu Pro Thr Glu Asp Leu Glu Ile 210 215 220 Ser Lys Glu Met Glu Ser Leu Leu Lys Lys Lys Gly Ile Gln Phe Ile 225 230 235 240 Thr Gly Ala Lys Val Leu Pro Asp Thr Met Thr Lys Thr Ser Asp Asp 245 250 255 Ile Ser Ile Gln Ala Glu Lys Asp Gly Glu Thr Val Thr Tyr Ser Ala 260 265 270 Glu Lys Met Leu Val Ser Ile Gly Arg Gln Ala Asn Ile Glu Gly Ile 275 280 285 Gly Leu Glu Asn Thr Asp Ile Val Thr Glu Asn Gly Met Ile Ser Val 290 295 300 Asn Glu Ser Cys Gln Thr Lys Glu Ser His Ile Tyr Ala Ile Gly Asp 305 310 315 320 Val Ile Gly Gly Leu Gln Leu Ala His Val Ala Ser His Glu Gly Ile 325 330 335 Ile Ala Val Glu His Phe Ala Gly Leu Asn Pro His Pro Leu Asp Pro 340 345 350 Thr Leu Val Pro Lys Cys Ile Tyr Ser Ser Pro Glu Ala Ala Ser Val 355 360 365 Gly Leu Thr Glu Asp Glu Ala Lys Ala Asn Gly His Asn Val Lys Ile 370 375 380 Gly Lys Phe Pro Phe Met Ala Ile Gly Lys Ala Leu Val Tyr Gly Glu 385 390 395 400 Ser Asp Gly Phe Val Lys Ile Val Ala Asp Arg Asp Thr Asp Asp Ile 405 410 415 Leu Gly Val His Met Ile Gly Pro His Val Thr Asp Met Ile Ser Glu 420 425 430 Ala Gly Leu Ala Lys Val Leu Asp Ala Thr Pro Trp Glu Val Gly Gln 435 440 445 Thr Ile Ser Pro Ala Ser Asn Ala Phe 450 455 701374DNABacillus subtilis 70tcagaaagcg ttggatgcgg gtgaaatcgt ttgcccgacc tcccacggtg ttgcgtccag 60cactttggca agacccgctt cagaaatcat gtcggtgaca tgcgggccaa tcatatgaac 120gccgagaata tcatctgtat ctcggtcagc cacgattttg acaaaaccgt cgctttcacc 180gtatacaagc gcttttccaa tcgccataaa tgggaacttg ccgattttga cattatgccc 240gttcgccttt gcttcgtctt cggttaagcc gacactggca gcttcagggc ttgagtaaat 300gcacttcggc acaagcgtcg gatcaagcgg atgcggattg agacctgcaa aatgctcaac 360agcaataatt ccctcatgtg aagcaacgtg agctaactgc aggccaccga ttacgtctcc 420gattgcataa atatgagatt ccttcgtttg gcagctttca ttgactgaaa tcatgccatt 480ttcagtaaca atatcggtgt tctctaggcc gatgccttcg atatttgcct gtctgccgat 540ggaaacaagc attttctcag cagaataggt aacggtttct ccgtcttttt 
ccgcttgtat 600gctgatatcg tctgatgttt ttgtcattgt gtcaggcagc acttttgccc ctgttatgaa 660ctggatgcct tttttcttaa gaagactttc catttctttt gaaatctcta gatcttcagt 720cggcaatatg cgatccgcgt attcaataac cgttacctta acgccaaaat catgaagcat 780agacgcccat tcgataccga taacccctcc gccgacaatg atgattgact gtggcagctc 840ctccatttgg agcgcctcat ctgaagtcag tacagactta ccgtccactt caagacccgg 900aagcattctc ggtcttgatc ctgttgcaat gatcacttgt ttcgggatca gcatgtcatt 960ttcttcgcca tttccccgct caacagaaat tgttcccggc agcggagaga agattgacgg 1020tccaaggata cgtccatatc cggtgtacac gtcaattttt ccttttttca ttaaatgatt 1080tacacccgct gcaagcttat caacaacggc ttgcttacgc tgctgcactt tttcaaagtt 1140gagggacacg ccagccgttt ccactccgaa ttgatcggct tcacgagctg tccggtatac 1200ctctgcgctt ctaagcagcg ctttactcgg gatacagcct ttatgcagac atgttccccc 1260gagtttttcc ttttccacaa cggctgtttt taagccgagc tgagcggctc tgatggccgc 1320aacataaccg ccggtaccgc cgcccagaat gactacgtca tactcagttg ccat 137471462PRTStreptomyces avermitilis 71Met Ala Asn Asp Ala Ser Thr Val Phe Asp Leu Val Ile Leu Gly Gly 1 5 10 15 Gly Ser Gly Gly Tyr Ala Ala Ala Leu Arg Gly Ala Gln Leu Gly Leu 20 25 30 Asp Val Ala Leu Ile Glu Lys Asp Lys Val Gly Gly Thr Cys Leu His 35 40 45 Arg Gly Cys Ile Pro Thr Lys Ala Leu Leu His Ala Gly Glu Ile Ala 50 55 60 Asp Gln Ala Arg Glu Ser Glu Gln Phe Gly Val Lys Ala Thr Phe Glu 65 70 75 80 Gly Ile Asp Val Pro Ala Val His Lys Tyr Lys Asp Gly Val Ile Ser 85 90 95 Gly Leu Tyr Lys Gly Leu Gln Gly Leu Ile Ala Ser Arg Lys Val Thr 100 105 110 Tyr Ile Glu Gly Glu Gly Arg Leu Ser Ser Pro Thr Ser Val Asp Val 115 120 125 Asn Gly Gln Arg Val Gln Gly Arg His Val Leu Leu Ala Thr Gly Ser 130 135 140 Val Pro Lys Ser Leu Pro Gly Leu Ala Ile Asp Gly Asn Arg Ile Ile 145 150 155 160 Ser Ser Asp His Ala Leu Val Leu Asp Arg Val Pro Glu Ser Ala Ile 165 170 175 Val Leu Gly Gly Gly Val Ile Gly Val Glu Phe Ala Ser Ala Trp Lys 180 185 190 Ser Phe Gly Ala Asp Val Thr Val Ile Glu Gly Leu Lys His Leu Val 195 200 205 Pro Val Glu Asp Glu Asn Ser Ser Lys Leu Leu Glu Arg Ala Phe Arg 210 215 
220 Lys Arg Gly Ile Lys Phe Asn Leu Gly Thr Phe Phe Ser Lys Ala Glu 225 230 235 240 Tyr Thr Gln Asn Gly Val Lys Val Thr Leu Ala Asp Gly Lys Glu Phe 245 250 255 Glu Ala Glu Val Leu Leu Val Ala Val Gly Arg Gly Pro Val Ser Gln 260 265 270 Gly Leu Gly Tyr Glu Glu Gln Gly Val Ala Met Asp Arg Gly Tyr Val 275 280 285 Leu Val Asp Glu Tyr Met Arg Thr Asn Val Pro Thr Ile Ser Ala Val 290 295 300 Gly Asp Leu Val Pro Thr Leu Gln Leu Ala His Val Gly Phe Ala Glu 305 310 315 320 Gly Ile Leu Val Ala Glu Arg Leu Ala Gly Leu Lys Thr Val Pro Ile 325 330 335 Asp Tyr Asp Gly Val Pro Arg Val Thr Tyr Cys His Pro Glu Val Ala 340 345 350 Ser Val Gly Ile Thr Glu Ala Lys Ala Lys Glu Ile Tyr Gly Ala Asp 355 360 365 Lys Val Val Ala Leu Lys Tyr Asn Leu Ala Gly Asn Gly Lys Ser Lys 370 375 380 Ile Leu Asn Thr Ala Gly Glu Ile Lys Leu Val Gln Val Lys Asp Gly 385 390 395 400 Ala Val Val Gly Val His Met Val Gly Asp Arg Met Gly Glu Gln Val 405 410 415 Gly Glu Ala Gln Leu Ile Tyr Asn Trp Glu Ala Leu Pro Ala Glu Val 420 425 430 Ala Gln Leu Ile His Ala His Pro Thr Gln Asn Glu Ala Met Gly Glu 435 440 445 Ala His Leu Ala Leu Ala Gly Lys Pro Leu His Ser His Asp 450 455 460 721389DNAStreptomyces avermitilis 72tcagtcgtgc gagtgcagcg gcttgcccgc gagggccagg tgggcctcgc ccatcgcttc 60gttctgcgtc gggtgggcgt ggatgagctg ggcgacctcg gccggcagcg cctcccagtt 120gtagatcagc tgggcttcgc cgacctgctc gcccatacgg tcaccgacca tgtggacgcc 180gaccacggca ccgtccttca cctggacgag cttgatctcg cccgcggtgt tgaggatctt 240gctcttgccg ttgcccgcca ggttgtactt cagagcgacg accttgtccg cgccgtagat 300ctccttggcc ttggcctcgg tgatgcccac ggaggcgacc tcggggtggc agtacgtcac 360ccgcggcacg ccgtcgtagt cgatcgggac ggtcttcaga ccggccagac gctccgccac 420caggatgccc tcggcgaagc cgacgtgcgc gagctggagc gtcgggacca ggtcaccgac 480ggcggagatg gtcgggacgt tcgtccgcat gtactcgtcg accaggacgt agccgcggtc 540catggcgacg ccctgctcct cgtagccgag gccctgcgag accgggccgc ggccgacggc 600gacgagcagg acctcggcct cgaactcctt gccgtcggcg agggtgacct tgacaccgtt 660ctgggtgtac tcggccttcg agaagaaggt gcccaggttg aacttgatgc 
cgcgcttgcg 720gaacgcgcgc tcaagaagct tggaggagtt ctcgtcctcg accgggacga ggtgcttgag 780gccctcgatc accgtcacgt cggctccgaa ggacttccac gcggaggcga actcgacgcc 840gatgacgccg ccgccgagca cgatcgcgga ctccgggacg cggtccagga ccagcgcgtg 900gtcggaggag atgatgcggt tgccgtcgat cgccaggccc ggcagcgact tcggcacgga 960gccggtcgcc aggagcacgt ggcggccctg gacgcgctgg ccgttcacgt cgacggaggt 1020cggggaggac agacggccct caccctcgat gtacgtcacc ttgcgggagg cgatcagccc 1080ctgcagaccc ttgtacaggc ccgagatgac cccgtccttg tacttgtgga cggccggtac 1140gtcgatgccc tcgaaggtgg ccttgacgcc gaactgctcg ctctcgcggg cctggtcggc 1200gatctcgccc gcgtgcagca gcgccttggt ggggatgcac ccacggtgca ggcaggtacc 1260gccgaccttg tccttctcga tcagggcgac gtccaggccc agctgcgctc cgcgcagggc 1320cgcggcgtaa ccaccgctac caccgccgag gatcactagg tcgaaaacgg tgctggcgtc 1380gttcgccac 138973459PRTPseudomonas putida 73Met Gln Gln Ile Ile Gln Thr Thr Leu Leu Ile Ile Gly Gly Gly Pro 1 5 10 15 Gly Gly Tyr Val Ala Ala Ile Arg Ala Gly Gln Leu Gly Ile Pro Thr 20 25 30 Val Leu Val Glu Gly Gln Ala Leu Gly Gly Thr Cys Leu Asn Ile Gly 35 40 45 Cys Ile Pro Ser Lys Ala Leu Ile His Val Ala Glu Gln Phe His Gln 50 55 60 Ala Ser Arg Phe Thr Glu Pro Ser Pro Leu Gly Ile Ser Val Ala Ser 65 70 75 80 Pro Arg Leu Asp Ile Gly Gln Ser Val Thr Trp Lys Asp Gly Ile Val 85 90 95 Asp Arg Leu Thr Thr Gly Val Ala Ala Leu Leu Lys Lys His Gly Val 100 105 110 Lys Val Val His Gly Trp Ala Lys Val Leu Asp Gly Lys Gln Val Glu 115 120 125 Val Asp Gly Gln Arg Ile Gln Cys Glu His Leu Leu Leu Ala Thr Gly 130 135 140 Ser Thr Ser Val Glu Leu Pro Met Leu Pro Leu Gly Gly Pro Val Ile 145 150 155 160 Ser Ser Thr Glu Ala Leu Ala Pro Lys Ala Leu Pro Gln His Leu Val 165 170 175 Val Val Gly Gly Gly Tyr Ile Gly Leu Glu Leu Gly Ile Ala Tyr Arg 180 185 190 Lys Leu Gly Ala Gln Val Ser Val Val Glu Ala Arg Glu Arg Ile Leu 195 200 205 Pro Thr Tyr Asp Ser Glu Leu Thr Ala Pro Val Ala Glu Ser Leu Lys 210 215 220 Lys Leu Gly Ile Ala Leu His Leu Gly His Ser Val Glu Gly Tyr Glu 225 230 235 240 Asn Gly Cys Leu Leu Ala Ser Asp Gly Lys 
Gly Gly Gln Leu Arg Leu 245 250 255 Glu Ala Asp Gln Val Leu Val Ala Val Gly Arg Arg Pro Arg Thr Lys 260 265 270 Gly Phe Asn Leu Glu Cys Leu Asp Leu Lys Met Asn Gly Thr Ala Ile 275 280 285 Ala Ile Asp Glu Arg Cys His Thr Ser Met His Asn Val Trp Ala Ile 290 295 300 Gly Asp Val Ala Gly Glu Pro Met Leu Ala His Arg Ala Met Ala Gln 305 310 315 320 Gly Glu Met Val Ala Glu Ile Ile Ala Gly Lys Ala Arg Arg Phe Glu 325 330 335 Pro Ala Ala Ile Ala Ala Val Cys Phe Thr Asp Pro Glu Val Val Val 340 345 350 Val Gly Lys Thr Pro Glu Gln Ala Ser Gln Gln Ala Leu Asp Cys Ile 355 360 365 Val Ala Gln Phe Pro Phe Ala Ala Asn Gly Arg Ala Met Ser Leu Glu 370 375 380 Ser Lys Ser Gly Phe Val Arg Val Val Ala Arg Arg Asp Asn His Leu 385 390 395 400 Ile Val Gly Trp Gln Ala Val Gly Val Ala Val Ser Glu Leu Ser Thr 405 410 415 Ala Phe Ala Gln Ser Leu Glu Met Gly Ala Cys Leu Glu Asp Val Ala 420 425 430 Gly Thr Ile His Ala His Pro Thr Leu Gly Glu Ala Val Gln Glu Ala 435 440 445 Ala Leu Arg Ala Leu Gly His Ala Leu His Ile 450 455 741380DNAPseudomonas putida 74tcagatatgc aaggcgtggc ccaacgcgcg tagggcggct tcttgcaccg cttcacccaa 60cgtggggtgg gcatgaatgg tgccggccac atcttccagg cacgcgccca tctccagcga 120ttgggcaaac gcggtggaca gctcggagac cgccacgcca accgcctgcc aacccacgat 180caggtggttg tcacggcgcg ccaccacccg cacgaaaccg cttttcgact ccaggctcat 240ggcccggcca ttggcggcaa acgggaactg cgcgacgatg cagtccaggg cctgctggct 300ggcttgttcc ggggtcttgc cgaccaccac cacttccggg tcggtaaagc acacggcggc 360aatcgctgcc ggctcgaagc gtcgggcctt gccggcgatg atttcggcga ccatctcgcc 420ttgggccatg gcccggtgcg ccagcatcgg ttcgccagcg acgtcgccaa tggcccagac 480gttgtgcatg ctggtatgac agcgctcgtc gatggcaatg gcggtgccgt tcatcttcag 540gtccaggcat tccaggttga agcccttggt gcgtggccgg cggcccacgg ccaccagtac 600ctgatcggct tcaagacgca gttgcccacc cttgccgtcg ctggccagca ggcagccatt 660ttcgtagccc tcgacgctgt ggcccaggtg caacgcgatg cccagtttct tcagcgactc 720ggccaccggg gcggtcaatt cgctgtcgta ggtcggcagg atgcgttcgc gcgcttccac 780cacactcacc tgtgcaccca gcttgcgata ggcaatgccc agctccaggc cgatatagcc 
840accgccgacc accaccaggt gttgcggcag ggctttcggc gccagggctt cggtcgagga 900aatcaccggg ccacccagcg gcagcatcgg cagttcgaca ctggtggaac cggtcgccag 960caacagatgc tcgcactgga tacgctggcc atcgacctcg acctgcttgc cgtccagtac 1020cttggcccag ccatgcacca ctttcacccc gtgctttttc agcaaggcgg caacaccggt 1080ggtcagacgg tcgacaatgc cgtccttcca ggtgacgctc tggccgatgt ccaggcgcgg 1140cgaagccacg ctgatgccca gcggcgaggg ttcggtaaag cgcgaggctt ggtgaaactg 1200ctcggccacg tggatcagcg ccttggacgg gatgcagccg atgttcaggc aggtgccgcc 1260cagtgcctgg ccttccacca gtacggtagg aatgcccagt tgcccggcgc ggatggctgc 1320tacatagccg ccagggccgc cgccgatgat caacagggta gtctggataa tctgttgcat 138075475PRTListeria monocytogenes 75Met Ala Lys Glu Tyr Asp Val Val Ile Leu Gly Gly Gly Thr Gly Gly 1 5 10 15 Tyr Val Ala Ala Ile Gln Ala Ala Lys Asn Gly Gln Lys Val Ala Val 20 25 30 Val Glu Lys Gly Lys Val Gly Gly Thr Cys Leu His Arg Gly Cys Ile 35 40 45 Pro Thr Lys Ala Leu Leu Arg Ser Ala Glu Val Leu Gln Thr Val Lys 50 55 60 Lys Ala Ser Glu Phe Gly Ile Ser Val Glu Gly Thr Ala Gly Ile Asn 65 70
75 80 Phe Leu Gln Ala Gln Glu Arg Lys Gln Ala Ile Val Asp Gln Leu Glu 85 90 95 Lys Gly Ile His Gln Leu Phe Lys Gln Gly Lys Ile Asp Leu Phe Val 100 105 110 Gly Thr Gly Thr Ile Leu Gly Pro Ser Ile Phe Ser Pro Thr Ala Gly 115 120 125 Thr Ile Ser Val Glu Phe Glu Asp Gly Ser Glu Asn Glu Met Leu Ile 130 135 140 Pro Lys Asn Leu Ile Ile Ala Thr Gly Ser Lys Pro Arg Thr Leu Ser 145 150 155 160 Gly Leu Thr Ile Asp Glu Glu His Val Leu Ser Ser Asp Gly Ala Leu 165 170 175 Asn Leu Glu Thr Leu Pro Lys Ser Ile Ile Ile Val Gly Gly Gly Val 180 185 190 Ile Gly Met Glu Trp Ala Ser Met Met His Asp Phe Gly Val Glu Val 195 200 205 Thr Val Leu Glu Tyr Ala Asp Arg Ile Leu Pro Thr Glu Asp Lys Glu 210 215 220 Val Ala Lys Glu Leu Ala Arg Leu Tyr Lys Lys Lys Lys Leu Asn Met 225 230 235 240 His Thr Ser Ala Glu Val Gln Ala Ala Ser Tyr Lys Lys Thr Asp Thr 245 250 255 Gly Val Glu Ile Lys Ala Ile Ile Lys Gly Glu Glu Gln Thr Phe Thr 260 265 270 Ala Asp Lys Ile Leu Val Ser Val Gly Arg Ser Ala Thr Thr Glu Asn 275 280 285 Ile Gly Leu Gln Asn Thr Asp Ile Ala Thr Glu Asn Gly Phe Ile Gln 290 295 300 Val Asn Asp Phe Tyr Gln Thr Lys Glu Ser His Ile Tyr Ala Ile Gly 305 310 315 320 Asp Cys Ile Pro Thr Ile Gln Leu Ala His Val Ala Met Glu Glu Gly 325 330 335 Thr Ile Ala Ala Asn His Ile Ala Gly Lys Ala Ala Glu Lys Leu Asp 340 345 350 Tyr Asp Leu Val Pro Arg Cys Ile Tyr Thr Ser Thr Glu Ile Ala Ser 355 360 365 Val Gly Ile Thr Glu Glu Gln Ala Lys Glu Arg Gly His Glu Val Lys 370 375 380 Lys Gly Lys Phe Phe Phe Arg Gly Ile Gly Lys Ala Leu Val Tyr Gly 385 390 395 400 Glu Ser Asp Gly Phe Ile Lys Ile Ile Ala Asp Lys Lys Thr Asp Asp 405 410 415 Ile Leu Gly Val Ser Met Ile Gly Pro His Val Thr Asp Met Ile Ser 420 425 430 Glu Ala Ala Leu Ala Gln Val Leu Asn Ala Thr Pro Trp Glu Val Gly 435 440 445 Asn Thr Ile His Pro His Pro Thr Leu Ser Glu Ser Phe Arg Glu Ala 450 455 460 Ala Leu Ala Val Asp Gly Asn Ala Ile His Gly 465 470 475 761428DNAListeria monocytogenes 76gtggcaaaag aatatgatgt agttattctt ggcggaggaa ctggcggtta cgttgcagca 
60attcaagcag ctaagaatgg ccagaaagta gccgtcgttg aaaaagggaa agttggagga 120acgtgtcttc accgtgggtg tattccaacg aaagcgttat tacgttcagc ggaagttcta 180caaacggtaa aaaaagcaag tgaatttggt atttctgtag aaggaactgc cggaatcaat 240tttttacaag cacaagaacg aaaacaagca atagtagatc aattagaaaa aggtattcac 300caattattta aacaagggaa aattgacttg tttgttggaa cgggaactat tttgggacca 360tcaatttttt caccaacagc tggaacaatt tcagttgaat tcgaagatgg ttctgaaaat 420gaaatgctaa ttcctaaaaa cttaattatc gcaactgggt ccaaaccgcg cacattaagc 480ggtttaacaa tcgatgagga acatgtttta tcatctgacg gcgcgcttaa cctagaaact 540ttaccaaaat caattattat tgttggcggt ggggttatcg gaatggaatg ggcttcgatg 600atgcatgatt tcggtgtaga agttacggtg ctagaatatg cagaccgaat tttgccaaca 660gaagataaag aagtggccaa agaattagca agactttata aaaagaaaaa attaaacatg 720catacatctg ctgaagttca agcagctagt tataaaaaaa cagatactgg tgtggaaatt 780aaagcaatca ttaaaggcga agagcagact ttcacagcag ataaaattct tgtttcagtt 840ggtcgttctg ctactacaga aaacatcggc ttacaaaata cagatatcgc gaccgaaaac 900ggctttatcc aagtaaatga tttttaccaa acaaaagaaa gtcacatcta tgcgattgga 960gactgcattc caacgattca actcgcgcac gttgcaatgg aagaaggaac aattgcagcc 1020aaccatattg ccggaaaagc agccgaaaaa cttgactacg acttagttcc ccgctgtatt 1080tatacttcta cagaaatcgc aagtgtcggt atcacagaag aacaagcaaa agaacggggt 1140catgaagtga aaaaaggcaa attcttcttc cgtggtatcg ggaaagcgct cgtttacgga 1200gaatcagatg gcttcattaa aattattgca gataaaaaaa cagacgatat cttaggcgtg 1260agcatgattg gaccacacgt tacggacatg attagcgaag ccgctttagc acaagtttta 1320aatgcaacgc cgtgggaagt gggcaacacg attcacccgc acccaacttt atcagaaagt 1380tttagagaag ctgcccttgc tgtggatggc aatgcaattc acggttaa 142877478PRTStreptomyces avermitilis 77Met Glu Asn Met Asn Thr Pro Asp Val Ile Val Ile Gly Gly Gly Thr 1 5 10 15 Gly Gly Tyr Ser Ala Ala Leu Arg Ala Ala Ala Leu Gly Leu Thr Val 20 25 30 Val Leu Ala Glu Arg Asp Lys Val Gly Gly Thr Cys Leu His Arg Gly 35 40 45 Cys Ile Pro Ser Lys Ala Met Leu His Ala Ala Glu Leu Val Asp Gly 50 55 60 Ile Ala Glu Ala Arg Glu Arg Trp Gly Val Lys Ala Thr Leu Asp Asp 65 70 75 80 Ile Asp Trp 
Pro Ala Leu Val Ala Thr Arg Asp Asp Ile Val Thr Arg 85 90 95 Asn His Arg Gly Val Glu Ala His Leu Ala His Ala Arg Val Arg Val 100 105 110 Val Arg Gly Ser Ala Arg Leu Thr Gly Pro Arg Ser Val Arg Val Glu 115 120 125 Gly Ala Pro Asp Asp Leu Pro Gly Gly Ala Gly Asp Phe Thr Ala Arg 130 135 140 Arg Gly Ile Val Leu Ala Thr Gly Ser Arg Pro Arg Thr Leu Pro Gly 145 150 155 160 Leu Val Pro Asp Gly Arg Arg Val Val Thr Ser Asp Asp Ala Leu Phe 165 170 175 Ala Pro Gly Leu Pro Arg Ser Val Leu Val Leu Gly Gly Gly Ala Ile 180 185 190 Gly Val Glu Tyr Ala Ser Phe His Arg Ser Met Gly Ala Glu Val Thr 195 200 205 Leu Val Glu Ala Ala Asp Arg Ile Val Pro Leu Glu Asp Val Asp Val 210 215 220 Ser Arg His Leu Thr Arg Gly Leu Lys Lys Arg Gly Ile Asp Val Arg 225 230 235 240 Ala Gly Ala Arg Leu Leu Asp Ala Glu Leu Leu Glu Ala Gly Val Arg 245 250 255 Ala Arg Val Arg Thr Val Arg Gly Glu Ile Arg Thr Leu Glu Ala Glu 260 265 270 Arg Leu Leu Val Ala Val Gly Arg Ala Pro Val Thr Asp Gly Leu Asp 275 280 285 Leu Ala Ala Ala Gly Leu Ala Thr Asp Glu Arg Gly Phe Val Thr Pro 290 295 300 Ser Asp Trp Asp Arg Leu Glu Thr Ala Val Pro Gly Ile His Val Val 305 310 315 320 Gly Asp Leu Leu Pro Pro Pro Ser Leu Gly Leu Ala His Ala Ser Phe 325 330 335 Ala Glu Gly Leu Ser Val Ala Glu Thr Leu Ala Gly Leu Pro Ser Ala 340 345 350 Pro Val Asp Tyr Ala Ala Val Pro Arg Val Thr Tyr Ser Ser Pro Gln 355 360 365 Thr Ala Ser Val Gly Leu Gly Glu Ala Glu Ala Arg Ala Arg Gly His 370 375 380 Glu Val Asp Val Asn Thr Met Pro Leu Thr Ala Val Ala Lys Gly Met 385 390 395 400 Val His Gly Arg Gly Gly Met Val Lys Val Val Ala Glu Glu Gly Gly 405 410 415 Gly Gln Val Leu Gly Val His Leu Val Gly Pro His Val Ser Glu Met 420 425 430 Ile Ala Glu Ser Gln Leu Ile Val Gly Trp Asp Ala Gln Pro Ser Asp 435 440 445 Val Ala Arg His Ile His Ala His Pro Thr Leu Ser Glu Ala Val Gly 450 455 460 Glu Thr Phe Leu Thr Leu Ala Gly Arg Gly Leu His Gln Gln 465 470 475 781437DNAStreptomyces avermitilis 78gtggagaaca tgaacacacc ggacgtcatc gtcatcggag gcggcaccgg cggctacagc 
60gccgccctgc gcgccgccgc cctcggtctg accgtggtgc tcgccgagcg ggacaaggtc 120ggcggaacct gtctgcaccg tggctgcatt ccgagcaagg cgatgctgca cgccgcagaa 180ctggtcgacg gcatcgccga ggcgcgcgag cgctgggggg tgaaggccac gctggacgac 240atcgactggc ctgcgctcgt cgccacgcgc gacgacatag tgacgcgcaa ccaccgcggc 300gtggaggcgc acctcgccca cgcgcgcgtg cgcgtcgtcc ggggcagtgc ccggctgacc 360ggtccgcgca gcgtccgcgt cgagggtgct ccggacgacc tgccgggcgg cgcgggcgac 420ttcaccgcgc gccggggcat cgtcctggcg accggctcac ggccgcgtac gctcccgggg 480ctcgtgccgg acgggcggcg cgtggtgacg agcgacgacg cgctgttcgc gcccggcctc 540ccccgctccg tgctggtcct gggcggcggt gcgatcgggg tcgagtacgc ctcgttccac 600cgctccatgg gtgcggaggt cactctcgtc gaggccgccg accggatcgt gccgctcgaa 660gacgtcgacg tcagccgtca tctgacgcgc ggtctgaaga agcgcggcat cgatgtgcgg 720gcgggggcgc ggctgctcga cgccgaactc ctggaggcgg gggtacgcgc gcgcgtacgc 780accgtgcggg gcgagatccg cacactggag gccgagcggc tcctggtggc ggtcgggcgg 840gcgccggtca ccgacgggct ggacctggcc gccgcgggcc tggcgacgga cgagcggggt 900tttgtgacgc cgtccgactg ggaccgtctg gagaccgcgg tgcccggcat ccacgtggtg 960ggcgacctgc tgccaccgcc gtccctggga ctggcccacg cgtcgttcgc cgagggcctg 1020tcggtggccg agacgctggc cgggctgccg tccgcgcccg tggactacgc ggccgtgccc 1080cgggtcacgt actcgtcgcc gcagaccgcc tccgtggggc tgggcgaggc ggaggcacgc 1140gcgcgtggac acgaggtgga cgtcaacacg atgccgctga ccgccgtcgc caagggcatg 1200gtccacggcc ggggcgggat ggtgaaggtc gtcgccgagg agggcggcgg gcaggtgctc 1260ggcgtgcatc tggtgggccc ccacgtgtcc gagatgatcg ccgagagcca gctgatcgtc 1320ggctgggacg cacagccctc cgacgtggcc cggcacatcc acgcgcaccc cacgctgtcc 1380gaggcggtcg gcgaaacgtt tctcacgctc gcgggacggg ggctgcatca gcagtga 143779476PRTMicrococcus luteus 79Met Thr Glu Glu Asn Ser Thr Phe Ile Pro Ser Leu Thr Ile Ile Gly 1 5 10 15 Gly Gly Pro Gly Gly Tyr Glu Ala Ala Met Val Ala Ala Lys Leu Gly 20 25 30 Ala Arg Val Thr Leu Val Glu Arg Gln Gly Val Gly Gly Ala Ala Val 35 40 45 Leu Thr Asp Val Val Pro Ser Lys Thr Leu Ile Ala Ala Ala Asp Ser 50 55 60 Met Arg Arg Val Gly Ala Ser Val Asp Leu Gly Val Asp Leu Gly Gly 65 70 75 80 Ala Glu 
Val His Ala Asp Met Gly Arg Val Gly His Arg Ile Leu Asn 85 90 95 Leu Ala His Glu Gln Ser Ser Asp Ile Arg Ala Gly Leu Glu Arg Val 100 105 110 Gly Val Arg Val Ile Asp Gly Val Gly Arg Val Val Gly Pro His Glu 115 120 125 Val Ser Val Arg Ala Leu Asp Asp Ala Asp Ala Gly Ala Glu Pro Glu 130 135 140 Ile Ile Thr Ser Asp Ala Ile Leu Val Ala Val Gly Ala Ser Pro Arg 145 150 155 160 Glu Leu Pro Thr Ala Val Pro Asp Gly Glu Arg Ile Phe Asn Trp Lys 165 170 175 Gln Val Tyr Asn Leu Lys Glu Leu Pro Glu His Leu Ile Val Val Gly 180 185 190 Ser Gly Val Thr Gly Ala Glu Phe Ala Ser Ala Tyr Asn Arg Leu Gly 195 200 205 Ala Lys Val Thr Leu Val Ser Ser Arg Asp Arg Val Leu Pro Gly Glu 210 215 220 Asp Ala Asp Ala Ala Glu Leu Leu Glu Lys Val Phe Glu Gly Asn Gly 225 230 235 240 Leu Arg Val Val Ser Arg Ser Arg Ala Glu Ser Val Glu Arg Thr Glu 245 250 255 Thr Gly Val Arg Val His Leu Ser Gly Glu Gly Ala Glu Asp Thr Pro 260 265 270 Ser Ile Glu Gly Ser His Ala Leu Val Ala Val Gly Gly Val Pro Asn 275 280 285 Thr Ala Gly Leu Gly Leu Asp Asp Val Gly Val Lys Leu Ala Asp Ser 290 295 300 Gly His Val Leu Val Asp Gly Val Ser Arg Thr Ser Val Pro Ser Ile 305 310 315 320 Tyr Ala Ala Gly Asp Cys Thr Gly Lys Leu Ala Leu Ala Ser Val Ala 325 330 335 Ala Met Gln Gly Arg Ile Ala Val Ala His Leu Leu Gly Asp Ala Leu 340 345 350 Lys Pro Leu Arg Pro His Leu Leu Ala Ser Asn Ile Phe Thr Ser Pro 355 360 365 Glu Ile Ala Thr Val Gly Val Ser Gln Ala Gln Val Asp Ser Gly Gln 370 375 380 Tyr Gln Ala Asp Val Leu Arg Leu Asp Phe His Thr Asn Pro Arg Ala 385 390 395 400 Lys Met Ser Gly Ala Glu Glu Gly Phe Val Lys Ile Phe Ala Arg Gln 405 410 415 Gly Ser Gly Thr Val Ile Gly Gly Val Val Val Ser Pro Arg Ala Ser 420 425 430 Glu Leu Ile Tyr Ala Leu Ala Leu Ala Val Thr His Lys Leu His Val 435 440 445 Asp Asp Leu Ala Asp Thr Phe Thr Val Tyr Pro Ser Met Ser Gly Ser 450 455 460 Ile Ala Glu Ala Ala Arg Arg Leu His Val Arg Val 465 470 475 801431DNAMicrococcus luteus 80gtgaccgagg aaaacagcac cttcatcccg tccctgacca tcatcggcgg cggccccggc 60ggctacgagg 
ccgccatggt ggccgcgaag ctgggcgccc gcgtgaccct ggtcgagcgc 120cagggggtgg gcggcgcggc cgtcctcacg gacgtggtcc cctccaagac gctgatcgcc 180gccgccgact cgatgcgccg cgtgggcgcc tccgtggacc tgggggtcga cctcggcggg 240gccgaggtcc acgcggacat gggccgggtc ggccaccgca tcctgaacct ggcccacgag 300cagtcctcgg acatccgcgc gggcctcgag cgggtcggtg tccgggtgat cgacggcgtg 360ggccgcgtcg tcggccccca cgaggtgtcc gtccgcgccc tcgacgacgc cgacgccggc 420gccgagcccg agatcatcac ctcggacgcg atcctcgtgg ccgtcggcgc gagtccccgg 480gagctgccca ccgccgtccc ggacggcgag cggatcttca actggaagca ggtctacaac 540ctcaaggagc tgcccgagca cctgatcgtc gtgggctccg gcgtcaccgg cgccgagttc 600gcctcggcct acaaccgcct cggcgccaag gtcaccctcg tctcctcgcg cgaccgcgtg 660ctccccggcg aggacgccga cgccgcagag ctgctcgaga aggtcttcga gggcaacggc 720ctcagggttg tctcccgctc ccgggccgag tcggtcgagc ggaccgagac cggcgtgcgc 780gtgcacctct ccggcgaggg ggccgaagac accccgtcga tcgagggctc ccacgcgctg 840gtggccgtcg gcggcgtgcc gaacacggcg ggcctcggcc tcgacgacgt gggcgtgaag 900ctggccgact ccggccacgt gctcgtggac ggcgtctccc gcacgtccgt gccgagcatc 960tacgcggcgg gcgactgcac gggcaagctc gccctcgcct cggtggcggc catgcagggg 1020cgcatcgccg tggcccacct gctcggcgac gccctcaagc cgctgcgccc gcacctgctg 1080gcctcgaaca tcttcacctc gccggagatc gccaccgtgg gcgtctcgca ggcgcaggtg 1140gactccggcc agtaccaggc ggacgtgctg cgactggact tccacaccaa cccccgcgcc 1200aagatgtccg gcgcggagga ggggttcgtg aagatcttcg cgcgtcaggg ctccggcacc 1260gtgatcggcg gcgtggtggt ctcgccgcgc gcctccgagc tgatctacgc gctcgcgctc 1320gcggtcacgc acaagttgca cgtggacgac ctcgcggaca ccttcaccgt gtacccgtcc 1380atgtccgggt cgatcgcgga ggcggcgcgc cgcctccatg tgcgggtgtg a 143181473PRTStaphylococcus aureus 81Met Ser Glu Lys Gln Tyr Asp Leu Val Val Leu Gly Gly Gly Thr Ala 1 5 10 15 Gly Tyr Val Ala Ala Ile Arg Ala Ser Gln Leu Gly Lys Lys Val Ala 20 25 30 Ile Val Glu Arg Gln Leu Leu Gly Gly Thr Cys Leu His Lys Gly Cys 35 40 45 Ile Pro Thr Lys Ser Leu Leu Lys Ser Ala Glu Val Phe Gln Thr Val 50 55 60 Lys Gln Ala Ala Met Phe Gly Val Asp Val Lys Asp Ala Asn Val Asn 65 70 75 80 Phe Glu Asn Met Leu Ala 
Arg Lys Glu Asp Ile Ile Asn Gln Met Tyr 85 90 95 Gln Gly Val Lys His Leu Met Gln His Asn His Ile Asp Ile Tyr Asn 100 105 110 Gly Thr Gly Arg Ile Leu Gly Thr Ser Ile Phe Ser Pro Gln Ser Gly 115 120 125 Thr Ile Ser Val Glu Tyr Glu Asp Gly Glu Ser Asp Leu Leu Pro Asn 130 135 140 Gln Phe Val Leu Ile Ala Thr Gly Ser Ser Pro Ala Glu Leu Pro Phe 145 150 155 160 Leu Ser Phe Asp His Asp Lys Ile Leu Ser Ser Asp Asp Ile Leu Ser 165 170 175 Leu Lys Thr Leu Pro Ser Ser Ile Gly Ile Ile Gly Gly Gly Val Ile 180 185 190 Gly Met Glu Phe Ala Ser Leu Met Ile Asp Leu Gly Val Asp Val Thr 195 200 205 Val Ile Glu Ala Gly Glu Arg Ile Leu Pro Thr Glu Ser Lys Gln Ala 210 215 220 Ser Gln Leu Leu Lys Lys Ser Leu Ser Ala Arg Gly Val Lys Phe Tyr 225 230 235 240 Glu Gly Ile Lys Leu Ser Glu Asn Asp Ile Asn Val Asn Glu Asp Gly 245 250 255 Val Thr Phe Glu Ile Ser Ser Asp Ile Ile Lys Val Asp Lys Val Leu

260 265 270 Leu Ser Ile Gly Arg Lys Pro Asn Thr Ser Asp Ile Gly Leu Asn Asn 275 280 285 Thr Lys Ile Lys Leu Ser Thr Ser Gly His Ile Leu Thr Asn Glu Phe 290 295 300 Gln Gln Thr Glu Asp Lys His Ile Tyr Ala Ala Gly Asp Cys Ile Gly 305 310 315 320 Lys Leu Gln Leu Ala His Val Gly Ser Lys Glu Gly Val Val Ala Val 325 330 335 Asp His Met Phe Glu Gly Asn Pro Ile Pro Val Asn Tyr Asn Met Met 340 345 350 Pro Lys Cys Ile Tyr Ser Gln Pro Glu Ile Ala Ser Ile Gly Leu Asn 355 360 365 Ile Glu Gln Ala Lys Ala Glu Gly Met Lys Val Lys Ser Phe Lys Val 370 375 380 Pro Phe Lys Ala Ile Gly Lys Ala Val Ile Asp Ser His Asp Ala Asn 385 390 395 400 Glu Gly Tyr Ser Glu Met Val Ile Asp Gln Ser Thr Glu Glu Ile Val 405 410 415 Gly Ile Asn Met Ile Gly Pro His Val Thr Glu Leu Ile Asn Glu Ala 420 425 430 Ser Leu Leu Gln Phe Met Asn Gly Ser Ala Leu Glu Leu Gly Leu Thr 435 440 445 Thr His Ala His Pro Ser Ile Ser Glu Val Leu Met Glu Leu Gly Leu 450 455 460 Lys Ala Glu Ser Arg Ala Ile His Val 465 470 821422DNAStaphylococcus aureus 82ttatacgtga atagctctac tttctgcttt caatcctaat tccatcaaca cttcagagat 60ggaaggatgt gcgtgtgttg ttagtcctaa ttctaatgcc gagccattca tgaactgtaa 120cagtgatgcc tcattaatca attctgttac atgtggacca atcatattaa tacccacaat 180ttcttcagtt gattgatcaa tcaccatttc gctataccct tcgtttgcgt catggctatc 240aatcactgct ttaccaattg ctttaaatgg tactttaaaa cttttaactt tcattccctc 300tgcctttgct tgttcaatgt ttaaaccgat agaagcaatt tcaggttgtg aataaataca 360cttaggcatc atgttatagt ttactgggat tgggttcccc tcaaacatat gatcaacagc 420cacaacacct tcttttgatc caacatgtgc caattgtaat tttcctatac aatcaccagc 480tgcataaata tgtttatctt cagtttgttg aaattcgttc gttaaaatat gtcctgatgt 540agaaagtttt attttagtgt tgtttaaacc aatatctgat gtgttaggtt ttctaccaat 600cgatagcaac actttatcta ctttaattat gtctgaagaa atttcaaacg taacaccatc 660ttcgttaaca tttatatcat tttcagaaag ttttattccc tcatagaatt taacaccacg 720tgctgacaat gattttttta atagttgtga agcttgttta ctttcagttg gtaaaattct 780ttcacctgct tctataactg ttacgtcaac acctaaatct atcatcaatg atgcaaattc 840cattccgata acaccaccac 
caataatacc aatacttgat ggtaacgtct ttaatgataa 900tatatcatcg ctagataaaa ttttatcatg atcaaatgat aagaatggca actctgcagg 960cgaagaacca gttgcaatta atacaaattg gttgggtaat aagtctgatt caccatcttc 1020atattcgaca gaaattgtgc cactttgagg tgaaaatata gatgtaccta gaatacgtcc 1080cgtgccatta taaatgtcaa tgtgattgtg ttgcattaaa tgctttacac cttgatacat 1140ttgattaata atgtcttctt ttcgtgccaa catattttca aaattaacat tagcatcttt 1200gacatcaacg ccaaacattg ctgcctgttt tactgtttga aatacttcag cagatttaag 1260cagcgattta gtaggaatac aacctttatg gagacaagta cctcctaata gttgtcgttc 1320tactattgcc actttcttac ctaattgaga cgcacgtatc gcagcaacat atcctgcagt 1380acctccaccg agaacgacta aatcatattg tttctctgac at 142283581PRTStreptococcus mutans 83Met Ala Val Glu Ile Ile Met Pro Lys Leu Gly Val Asp Met Gln Glu 1 5 10 15 Gly Glu Ile Ile Glu Trp Lys Lys Gln Glu Gly Asp Glu Val Lys Glu 20 25 30 Gly Asp Ile Leu Leu Glu Ile Met Ser Asp Lys Thr Asn Met Glu Ile 35 40 45 Glu Ala Glu Asp Ser Gly Val Leu Leu Lys Ile Val Lys Gly Asn Gly 50 55 60 Gln Val Val Pro Val Thr Glu Val Ile Gly Tyr Ile Gly Ser Ala Gly 65 70 75 80 Glu Thr Ile Glu Thr Asn Ala Ala Pro Ala Ala Ser Ala Asp Asp Leu 85 90 95 Lys Ala Ala Gly Leu Glu Val Pro Asp Thr Leu Gly Glu Ser Ala Ala 100 105 110 Pro Ala Ala Gln Lys Thr Pro Leu Ala Asp Asp Glu Tyr Asp Met Ile 115 120 125 Val Val Gly Gly Gly Pro Ala Gly Tyr Tyr Ala Ala Ile Arg Gly Ala 130 135 140 Gln Leu Gly Gly Lys Val Ala Ile Val Glu Lys Ser Glu Phe Gly Gly 145 150 155 160 Thr Cys Leu Asn Lys Gly Cys Ile Pro Thr Lys Thr Tyr Leu Lys Asn 165 170 175 Ala Glu Ile Leu Asp Gly Ile Lys Ile Ala Ala Gly Arg Gly Ile Asn 180 185 190 Phe Ala Ser Thr Asn Tyr Thr Ile Asp Met Asp Lys Thr Val Ala Phe 195 200 205 Lys Asp Thr Val Val Lys Thr Leu Thr Ser Gly Val Gln Gly Leu Leu 210 215 220 Lys Ala Asn Lys Val Thr Ile Phe Asn Gly Leu Gly Gln Val Asn Pro 225 230 235 240 Asp Lys Thr Val Thr Val Gly Ser Glu Thr Ile Lys Gly His Asn Ile 245 250 255 Ile Leu Ala Thr Gly Ser Lys Val Ser Arg Ile Asn Ile Pro Gly Ile 260 265 270 Asp Ser Pro Leu Val Leu 
Thr Ser Asp Asp Ile Leu Asp Leu Arg Glu 275 280 285 Ile Pro Lys Ser Leu Ala Val Met Gly Gly Gly Val Val Gly Ile Glu 290 295 300 Leu Gly Leu Val Tyr Ala Ser Tyr Gly Thr Glu Val Thr Val Ile Glu 305 310 315 320 Met Ala Asp Arg Ile Ile Pro Ala Met Asp Lys Glu Val Ser Leu Glu 325 330 335 Leu Gln Lys Ile Leu Ser Lys Lys Gly Met Asn Ile Lys Thr Ser Val 340 345 350 Gly Val Ala Glu Ile Val Glu Ala Asn Asn Gln Leu Thr Leu Lys Leu 355 360 365 Asn Asp Gly Ser Glu Val Val Ala Glu Lys Ala Leu Leu Ser Ile Gly 370 375 380 Arg Val Pro Gln Leu Ser Gly Leu Glu Asn Leu Asn Leu Glu Leu Glu 385 390 395 400 Arg Gly Arg Ile Lys Val Asp Asp Tyr Gln Glu Thr Ser Ile Ser Gly 405 410 415 Ile Tyr Ala Pro Gly Asp Val Asn Gly Arg Lys Met Leu Ala His Ala 420 425 430 Ala Tyr Arg Met Gly Glu Val Ala Ala Glu Asn Ala Ile Trp Gly Asn 435 440 445 Val Arg Lys Ala Asn Leu Lys Tyr Thr Pro Ala Ala Val Tyr Thr His 450 455 460 Pro Glu Val Ala Met Cys Gly Ile Thr Glu Glu Gln Ala Arg Gln Glu 465 470 475 480 Tyr Gly Asn Val Leu Val Gly Lys Ser Ser Phe Ser Gly Asn Gly Arg 485 490 495 Ala Ile Ala Ser Asn Glu Ala Gln Gly Phe Val Lys Val Val Ala Asp 500 505 510 Ala Lys Tyr His Glu Ile Leu Gly Val His Ile Ile Gly Pro Ala Ala 515 520 525 Ala Glu Met Ile Asn Glu Ala Ser Thr Ile Met Glu Asn Glu Leu Thr 530 535 540 Val Asp Glu Leu Leu Arg Ser Ile His Gly His Pro Thr Phe Ser Glu 545 550 555 560 Val Met Tyr Glu Ala Phe Ala Asp Val Leu Gly Glu Ala Ile His Asn 565 570 575 Pro Pro Lys Arg Arg 580 841746DNAStreptococcus mutans 84atggcagtcg aaattattat gcctaaactc ggtgttgata tgcaggaagg cgaaatcatc 60gagtggaaaa aacaagaagg tgatgaggtc aaagaagggg atatcctcct tgaaatcatg 120tctgacaaga ccaatatgga aattgaagct gaggattcag gtgtcctgct caaaattgtt 180aaaggaaatg gtcaagttgt ccctgtgact gaggtcattg gttatattgg ttctgctggt 240gaaacgattg aaacaaatgc agcgccagca gcttcagctg atgatctcaa agcagcgggt 300cttgaagttc ctgatacttt aggcgagtca gcagcaccag cagctcaaaa aactccgctt 360gctgatgatg agtatgatat gattgtcgtt ggtggtggtc ctgctggtta ttatgctgct 420attcgcggtg cacaattggg 
cggcaaggtt gctatcgtcg aaaaatcaga atttggaggg 480acttgtttaa ataaaggctg cattccaact aaaacttatc ttaagaatgc tgaaatcctt 540gatggcatca aaattgcagc gggtcgcggt attaattttg cttcaaccaa ctataccatt 600gacatggaca aaacggttgc ctttaaagat accgttgtta aaacattgac aagtggggtt 660cagggtcttc ttaaagccaa taaagtgact attttcaatg gtctcggtca ggttaatcct 720gataagacag tgactgtcgg ttcggaaacg attaaaggac ataatattat ccttgcaaca 780ggttcaaaag tgtctcgtat taatattccg ggaattgatt cacctcttgt tttaacatcg 840gatgatattc ttgatcttcg tgaaattcca aagtcacttg ctgttatggg cggtggtgtt 900gtcggcattg aactcggtct tgtttacgct tcctatggta cagaagtgac tgttattgaa 960atggctgatc gcattattcc tgctatggac aaggaagtat cgcttgaact gcaaaaaatt 1020ctatccaaga aaggaatgaa cattaagact tctgttggtg tggctgaaat tgttgaagct 1080aacaatcaat taacgctgaa actcaatgac ggctctgaag ttgtggctga aaaggccctg 1140ctttctattg gtcgtgtccc acaattaagc ggtttagaaa atcttaatct ggaacttgaa 1200cgcggtcgca tcaaagtgga cgattatcag gaaacctcta tttcaggtat ttatgccccg 1260ggtgatgtta atggaagaaa gatgttagcg catgctgcct atcgtatggg tgaagtagct 1320gccgaaaatg ctatctgggg aaatgttcgt aaggctaacc tgaaatatac accagcagct 1380gtttacaccc atccagaggt tgctatgtgc ggtattactg aagaacaagc ccgtcaagaa 1440tatggaaacg tcttagttgg gaaatcctct ttttcaggaa atggacgtgc gatcgcttct 1500aatgaagcac aaggatttgt caaagttgtc gcagatgcta aataccatga aattcttgga 1560gtccatatta ttggaccagc agctgctgag atgattaatg aagcctcaac gattatggaa 1620aatgagttga cggttgatga gctgctacgt tctattcatg gccatcctac cttctcggag 1680gttatgtatg aagcctttgc agacgtcctt ggcgaagcta tccataaccc gccaaagcgt 1740cgttaa 17468517PRTArtificial sequenceSynthetic polypeptide 85Xaa Gly Gly Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Pro Xaa Lys Xaa Xaa 1 5 10 15 Xaa 8617PRTArtificial sequenceSynthetic polypeptide 86Xaa Ala Thr Gly Xaa Xaa Xaa Xaa Xaa Leu Xaa Xaa Xaa Xaa Xaa Xaa 1 5 10 15 Gly 8712PRTArtificial sequenceSynthetic polypeptide 87Xaa Xaa Gly Xaa Gly Xaa Xaa Gly Xaa Glu Xaa Xaa 1 5 10 8814PRTArtificial sequenceSynthetic polypeptide 88Thr Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gly Asp Xaa Xaa Xaa 1 5 10 
8911PRTArtificial sequenceSynthetic polypeptide 89Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Pro Xaa Xaa 1 5 10 90317PRTEscherichia coli 90Met Tyr Thr Lys Ile Ile Gly Thr Gly Ser Tyr Leu Pro Glu Gln Val 1 5 10 15 Arg Thr Asn Ala Asp Leu Glu Lys Met Val Asp Thr Ser Asp Glu Trp 20 25 30 Ile Val Thr Arg Thr Gly Ile Arg Glu Arg His Ile Ala Ala Gln Asn 35 40 45 Glu Thr Val Ser Thr Met Gly Phe Glu Ala Ala Thr Arg Ala Ile Glu 50 55 60 Met Ala Gly Ile Glu Lys Asp Gln Ile Gly Leu Ile Val Val Ala Thr 65 70 75 80 Thr Ser Ala Thr His Ala Phe Pro Ser Ala Ala Cys Gln Ile Gln Ser 85 90 95 Met Leu Gly Ile Lys Gly Cys Pro Ala Phe Asp Val Ala Ala Ala Cys 100 105 110 Ala Gly Phe Thr Tyr Ala Leu Ser Val Ala Asp Gln Tyr Val Lys Ser 115 120 125 Gly Ala Val Lys Tyr Ala Leu Val Val Gly Ser Asp Val Leu Ala Arg 130 135 140 Thr Cys Asp Pro Thr Asp Arg Gly Thr Ile Ile Ile Phe Gly Asp Gly 145 150 155 160 Ala Gly Ala Ala Val Leu Ala Ala Ser Glu Glu Pro Gly Ile Ile Ser 165 170 175 Thr His Leu His Ala Asp Gly Ser Tyr Gly Glu Leu Leu Thr Leu Pro 180 185 190 Asn Ala Asp Arg Val Asn Pro Glu Asn Ser Ile His Leu Thr Met Ala 195 200 205 Gly Asn Glu Val Phe Lys Val Ala Val Thr Glu Leu Ala His Ile Val 210 215 220 Asp Glu Thr Leu Ala Ala Asn Asn Leu Asp Arg Ser Gln Leu Asp Trp 225 230 235 240 Leu Val Pro His Gln Ala Asn Leu Arg Ile Ile Ser Ala Thr Ala Lys 245 250 255 Lys Leu Gly Met Ser Met Asp Asn Val Val Val Thr Leu Asp Arg His 260 265 270 Gly Asn Thr Ser Ala Ala Ser Val Pro Cys Ala Leu Asp Glu Ala Val 275 280 285 Arg Asp Gly Arg Ile Lys Pro Gly Gln Leu Val Leu Leu Glu Ala Phe 290 295 300 Gly Gly Gly Phe Thr Trp Gly Ser Ala Leu Val Arg Phe 305 310 315 91954DNAEscherichia coli 91atgtatacga agattattgg tactggcagc tatctgcccg aacaagtgcg gacaaacgcc 60gatttggaaa aaatggtgga cacctctgac gagtggattg tcactcgtac cggtatccgc 120gaacgccaca ttgccgcgca aaacgaaacc gtttcaacca tgggctttga agcggcgaca 180cgcgcaattg agatggcggg cattgagaaa gaccagattg gcctgatcgt tgtggcaacg 240acttctgcta cgcacgcttt cccgagcgca gcttgtcaga ttcaaagcat gctgggcatt 
300aaaggttgcc cggcatttga cgttgcagca gcctgcgcag gtttcaccta tgcattaagc 360gtagccgatc aatacgtgaa atctggggcg gtgaagtatg ctctggtcgt cggttccgat 420gtactggcgc gcacctgcga tccaaccgat cgtgggacta ttattatttt tggcgatggc 480gcgggcgctg cggtgctggc tgcctctgaa gagccgggaa tcatctccac ccatctgcat 540gccgacggta gctatggtga gttgctgacg ctgcctaatg ctgaccgtgt gaatccagag 600aattcaattc atctgacgat ggcgggcaac gaagtcttca aggttgcggt aacggaactg 660gcgcacatcg ttgatgagac gctggcggca aataatcttg accgttctca actggactgg 720ctggttccgc atcaggctaa cctgcgtatt atcagtgcaa cggcgaaaaa actcggtatg 780tctatggaca atgtcgtggt gacgctggat cgccacggta atacctctgc ggcctctgtc 840ccgtgcgcgc tggatgaagc tgtacgcgac gggcgcatta agccggggca gttggttctg 900cttgaagcct ttggcggtgg attcacctgg ggctccgcgc tggttcgttt ctag 95492317PRTEscherichia coli 92Met Tyr Thr Lys Ile Ile Gly Thr Gly Ser Tyr Leu Pro Glu Gln Val 1 5 10 15 Arg Thr Asn Ala Asp Leu Glu Lys Met Val Asp Thr Ser Asp Glu Trp 20 25 30 Ile Val Thr Arg Thr Gly Ile Arg Glu Arg His Ile Ala Ala Pro Asn 35 40 45 Glu Thr Val Ser Thr Met Gly Phe Glu Ala Ala Thr Arg Ala Ile Glu 50 55 60 Met Ala Gly Ile Glu Lys Asp Gln Ile Gly Leu Ile Val Val Ala Thr 65 70 75 80 Thr Ser Ala Thr His Ala Phe Pro Ser Ala Ala Cys Gln Ile Gln Ser 85 90 95 Met Leu Gly Ile Lys Gly Cys Pro Ala Phe Asp Val Ala Ala Ala Cys 100 105 110 Ala Gly Phe Thr Tyr Ala Leu Ser Val Ala Asp Gln Tyr Val Lys Ser 115 120 125 Gly Ala Val Lys Tyr Ala Leu Val Val Gly Ser Asp Val Leu Ala Arg 130 135 140 Thr Cys Asp Pro Thr Asp Arg Gly Thr Ile Ile Ile Phe Gly Asp Gly 145 150 155 160 Ala Gly Ala Ala Val Leu Ala Ala Ser Glu Glu Pro Gly Ile Ile Ser 165 170 175 Thr His Leu His Ala Asp Gly Ser Tyr Gly Glu Leu Leu Thr Leu Pro 180 185 190 Asn Ala Asp Arg Val Asn Pro Glu Asn Ser Ile His Leu Thr Met Ala 195 200 205 Gly Asn Glu Val Phe Lys Val Ala Val Thr Glu Leu Ala His Ile Val 210 215 220 Asp Glu Thr Leu Thr Ala Asn Asn Leu Asp Arg Ser Gln Leu Asp Trp 225 230 235 240 Leu Val Pro His Gln Ala Asn Leu Arg Ile Ile Ser Ala Thr Ala Lys 245 250 255 Lys 
Leu Gly Met Ser Met Asp Asn Val Val Val Thr Leu Asp Arg His 260 265 270 Gly Asn Thr Ser Ala Ala Ser Val Pro Cys Ala Leu Asp Glu Ala Val 275 280 285 Arg Asp Gly Arg Ile Lys Pro Gly Gln Leu Val Leu Leu Glu Ala Phe 290 295 300 Gly Gly Gly Phe Thr Trp Gly Ser Ala Leu Val Arg Phe 305 310 315 93954DNAEscherichia coli 93atgtatacga agattattgg tactggcagc tatctgcccg aacaagtgcg gacaaacgcc 60gatttggaaa aaatggtgga cacctctgac gagtggattg tcactcgtac cggtatccgc 120gaacgccaca ttgccgcgcc aaacgaaacc gtttcaacca tgggctttga agcggcgaca 180cgcgcaattg agatggcggg cattgagaaa gaccagattg gcctgatcgt tgtggcaacg 240acttctgcta cgcacgcttt cccgagcgca gcttgtcaga ttcaaagcat gttgggcatt 300aaaggttgcc cggcatttga cgttgcagca gcctgcgcag gtttcaccta tgcattaagc 360gtagccgatc aatacgtgaa atctggggcg gtgaagtatg ctctggtcgt cggttccgat 420gtactggcgc gcacctgcga tccaaccgat cgtgggacta ttattatttt tggcgatggc 480gcgggcgctg cggtgctggc tgcctctgaa gagccgggaa tcatctccac ccatctgcat 540gccgacggta gttatggtga attgctgacg ctgccaaacg ccgaccgcgt gaatccagag 600aattcaattc atctgacgat ggcgggcaac gaagtcttca aggttgcggt aacggaactg 660gcgcacatcg ttgatgagac gctgacggcg aataatcttg

accgttctca actggactgg 720ctggttccgc atcaggctaa cctgcgtatt atcagtgcaa cggcgaaaaa actcggtatg 780tcgatggaca atgtcgtggt gacgctggat cgccacggta atacctctgc ggcctctgtc 840ccgtgcgcgc tggatgaagc tgtacgcgac gggcgcatta agccggggca gttggttctg 900cttgaagcct ttggcggtgg attcacctgg ggctccgcgc tggttcgttt ctag 95494312PRTBacillus subtilis 94Met Lys Ala Gly Ile Leu Gly Val Gly Arg Tyr Ile Pro Glu Lys Val 1 5 10 15 Leu Thr Asn His Asp Leu Glu Lys Met Val Glu Thr Ser Asp Glu Trp 20 25 30 Ile Arg Thr Arg Thr Gly Ile Glu Glu Arg Arg Ile Ala Ala Asp Asp 35 40 45 Val Phe Ser Ser His Met Ala Val Ala Ala Ala Lys Asn Ala Leu Glu 50 55 60 Gln Ala Glu Val Ala Ala Glu Asp Leu Asp Met Ile Leu Val Ala Thr 65 70 75 80 Val Thr Pro Asp Gln Ser Phe Pro Thr Val Ser Cys Met Ile Gln Glu 85 90 95 Gln Leu Gly Ala Lys Lys Ala Cys Ala Met Asp Ile Ser Ala Ala Cys 100 105 110 Ala Gly Phe Met Tyr Gly Val Val Thr Gly Lys Gln Phe Ile Glu Ser 115 120 125 Gly Thr Tyr Lys His Val Leu Val Val Gly Val Glu Lys Leu Ser Ser 130 135 140 Ile Thr Asp Trp Glu Asp Arg Asn Thr Ala Val Leu Phe Gly Asp Gly 145 150 155 160 Ala Gly Ala Ala Val Val Gly Pro Val Ser Asp Asp Arg Gly Ile Leu 165 170 175 Ser Phe Glu Leu Gly Ala Asp Gly Thr Gly Gly Gln His Leu Tyr Leu 180 185 190 Asn Glu Lys Arg His Thr Ile Met Asn Gly Arg Glu Val Phe Lys Phe 195 200 205 Ala Val Arg Gln Met Gly Glu Ser Cys Val Asn Val Ile Glu Lys Ala 210 215 220 Gly Leu Ser Lys Glu Asp Val Asp Phe Leu Ile Pro His Gln Ala Asn 225 230 235 240 Ile Arg Ile Met Glu Ala Ala Arg Glu Arg Leu Glu Leu Pro Val Glu 245 250 255 Lys Met Ser Lys Thr Val His Lys Tyr Gly Asn Thr Ser Ala Ala Ser 260 265 270 Ile Pro Ile Ser Leu Val Glu Glu Leu Glu Ala Gly Lys Ile Lys Asp 275 280 285 Gly Asp Val Val Val Met Val Gly Phe Gly Gly Gly Leu Thr Trp Gly 290 295 300 Ala Ile Ala Ile Arg Trp Gly Arg 305 310 95939DNABacillus subtilis 95atgaaagctg gaatacttgg tgttggacgt tacattcctg agaaggtttt aacaaatcat 60gatcttgaaa aaatggttga aacttctgac gagtggattc gtacaagaac aggaatagaa 120gaaagaagaa tcgcagcaga tgatgtgttt 
tcatcacata tggctgttgc agcagcgaaa 180aatgcgctgg aacaagctga agtggctgct gaggatctgg atatgatctt ggttgcaact 240gttacacctg atcagtcatt ccctacggtc tcttgtatga ttcaagaaca actcggcgcg 300aagaaagcgt gtgctatgga tatcagcgcg gcttgtgcgg gcttcatgta cggggttgta 360accggtaaac aatttattga atccggaacc tacaagcatg ttctagttgt tggtgtagag 420aagctctcaa gcattaccga ctgggaagac cgcaatacag ccgttctgtt tggagacgga 480gcaggcgctg cggtagtcgg gccagtcagt gatgacagag gaatcctttc atttgaacta 540ggagccgacg gcacaggcgg tcagcacttg tatctgaatg aaaaacgaca tacaatcatg 600aatggacgag aagttttcaa atttgcagtc cgccaaatgg gagaatcatg cgtaaatgtc 660attgaaaaag ccggactttc aaaagaggat gtcgactttt tgattccgca tcaggcgaac 720atccgtatca tggaagctgc tcgcgagcgt ttagagcttc ctgtcgaaaa gatgtctaaa 780actgttcata aatatggaaa tacttctgcc gcatccattc cgatctctct tgtagaagaa 840ttggaagccg gtaaaatcaa agacggcgat gtggtcgtta tggtagggtt cggcggagga 900ctaacatggg gcgccattgc aatccgctgg ggccgataa 93996335PRTStreptomyces avermitilis 96Met Ser Gly Gly Arg Ala Ala Val Ile Thr Gly Ile Gly Gly Tyr Val 1 5 10 15 Pro Pro Asp Leu Val Thr Asn Asp Asp Leu Ala Gln Arg Leu Asp Thr 20 25 30 Ser Asp Ala Trp Ile Arg Ser Arg Thr Gly Ile Ala Glu Arg His Val 35 40 45 Ile Ala Pro Gly Thr Ala Thr Ser Asp Leu Ala Val Glu Ala Gly Leu 50 55 60 Arg Ala Leu Lys Ser Ala Gly Asp Glu His Val Asp Ala Val Val Leu 65 70 75 80 Ala Thr Thr Thr Pro Asp Gln Pro Cys Pro Ala Thr Ala Pro Gln Val 85 90 95 Ala Ala Arg Leu Gly Leu Gly Gln Val Pro Ala Phe Asp Val Ala Ala 100 105 110 Val Cys Ser Gly Phe Leu Phe Gly Leu Ala Thr Ala Ser Gly Leu Ile 115 120 125 Ala Ala Gly Val Ala Asp Lys Val Leu Leu Val Ala Ala Asp Ala Phe 130 135 140 Thr Thr Ile Ile Asn Pro Glu Asp Arg Thr Thr Ala Val Ile Phe Ala 145 150 155 160 Asp Gly Ala Gly Ala Val Val Leu Arg Ala Gly Ala Ala Asp Glu Pro 165 170 175 Gly Ala Val Gly Pro Leu Val Leu Gly Ser Asp Gly Glu Leu Ser His 180 185 190 Leu Ile Glu Val Pro Ala Gly Gly Ser Arg Gln Arg Ser Ser Gly Pro 195 200 205 Thr Thr Asp Pro Asp Asp Gln Tyr Phe Arg Met Leu Gly Arg Asp Thr 210 215 220 
Tyr Arg His Ala Val Glu Arg Met Thr Asp Ala Ser Gln Arg Ala Ala 225 230 235 240 Glu Leu Ala Asp Trp Arg Ile Asp Asp Val Asp Arg Phe Ala Ala His 245 250 255 Gln Ala Asn Ala Arg Ile Leu Asp Ser Val Ala Glu Arg Leu Gly Val 260 265 270 Pro Ala Glu Arg Gln Leu Thr Asn Ile Ala Arg Val Gly Asn Thr Gly 275 280 285 Ala Ala Ser Ile Pro Leu Leu Leu Ser Gln Ala Ala Ala Ala Gly Arg 290 295 300 Leu Gly Ala Gly His Arg Val Leu Leu Thr Ala Phe Gly Gly Gly Leu 305 310 315 320 Ser Trp Gly Ala Gly Thr Leu Val Trp Pro Glu Val Gln Pro Val 325 330 335 971008DNAStreptomyces avermitilis 97atgagcggcg gacgcgcggc ggtgatcacc gggatcgggg gctatgtgcc tcccgatctg 60gtgaccaacg acgatctggc ccagcggctc gacacctccg acgcgtggat ccgctcgcgc 120accgggatcg ccgagcggca tgtgatcgcg cccggcaccg cgacctccga cctggcggtg 180gaggccggac tgcgggccct gaagtcggcg ggcgacgagc acgtggacgc ggtcgtcctg 240gccaccacga cgcccgacca gccctgcccg gcgaccgccc cgcaggtggc cgcacggctg 300ggactcgggc aggtgccggc gttcgacgtg gccgccgtct gctccggctt cctgttcggc 360ctcgccaccg cgtccgggct gatcgcggcc ggggtggcgg acaaggtcct gctggtcgcc 420gccgacgcgt tcaccacgat catcaacccc gaggaccgca ccacggccgt catcttcgcg 480gacggcgcgg gcgcggtggt gctgcgcgcg ggcgccgccg acgagccggg ggccgtcggc 540ccgctggtgc tcggcagcga cggcgagctg agccatctca tcgaggtgcc ggcgggcggc 600tcgcgccagc gctcgtccgg ccccacgacc gacccggacg accagtactt ccggatgctc 660ggccgggaca cctaccggca cgcggtggag cggatgaccg atgcgtccca gcgggcggcc 720gaactggccg actggcggat cgacgacgtc gaccggttcg cggcgcacca ggccaacgcc 780cgcatcctcg actcggtcgc ggaacgtctc ggggtccccg ccgaacggca gttgaccaac 840atcgcccggg tcggcaacac cggcgccgcc tcgatcccgc tgcttctgtc gcaggcggcc 900gcggccggcc ggctcggcgc cgggcaccgg gtgctcctga ccgcgttcgg cgggggcctg 960tcctggggcg cggggactct ggtctggccg gaggtccagc cggtctga 100898313PRTStaphylococcus aureus 98Met Asn Val Gly Ile Lys Gly Phe Gly Ala Tyr Ala Pro Glu Lys Ile 1 5 10 15 Ile Asp Asn Ala Tyr Phe Glu Gln Phe Leu Asp Thr Ser Asp Glu Trp 20 25 30 Ile Ser Lys Met Thr Gly Ile Lys Glu Arg His Trp Ala Asp Asp Asp 35 40 45 Gln Asp Thr 
Ser Asp Leu Ala Tyr Glu Ala Ser Val Lys Ala Ile Ala 50 55 60 Asp Ala Gly Ile Gln Pro Glu Asp Ile Asp Met Ile Ile Val Ala Thr 65 70 75 80 Ala Thr Gly Asp Met Pro Phe Pro Thr Val Ala Asn Met Leu Gln Glu 85 90 95 Arg Leu Gly Thr Gly Lys Val Ala Ser Met Asp Gln Leu Ala Ala Cys 100 105 110 Ser Gly Phe Met Tyr Ser Met Ile Thr Ala Lys Gln Tyr Val Gln Ser 115 120 125 Gly Asp Tyr His Asn Ile Leu Val Val Gly Ala Asp Lys Leu Ser Lys 130 135 140 Ile Thr Asp Leu Thr Asp Arg Ser Thr Ala Val Leu Phe Gly Asp Gly 145 150 155 160 Ala Gly Ala Val Ile Ile Gly Glu Val Ser Glu Gly Arg Gly Ile Ile 165 170 175 Ser Tyr Glu Met Gly Ser Asp Gly Thr Gly Gly Lys His Leu Tyr Leu 180 185 190 Asp Lys Asp Thr Gly Lys Leu Lys Met Asn Gly Arg Glu Val Phe Lys 195 200 205 Phe Ala Val Arg Ile Met Gly Asp Ala Ser Thr Arg Val Val Glu Lys 210 215 220 Ala Asn Leu Thr Ser Asp Asp Ile Asp Leu Phe Ile Pro His Gln Ala 225 230 235 240 Asn Ile Arg Ile Met Glu Ser Ala Arg Glu Arg Leu Gly Ile Ser Lys 245 250 255 Asp Lys Met Ser Val Ser Val Asn Lys Tyr Gly Asn Thr Ser Ala Ala 260 265 270 Ser Ile Pro Leu Ser Ile Asp Gln Glu Leu Lys Asn Gly Lys Leu Lys 275 280 285 Asp Asp Asp Thr Ile Val Leu Val Gly Phe Gly Gly Gly Leu Thr Trp 290 295 300 Gly Ala Met Thr Ile Lys Trp Gly Lys 305 310 99942DNAStaphylococcus aureus 99ctattttccc cattttattg tcattgcgcc ccaagttagg ccgccaccga atccgacaag 60aacaattgta tcatcatctt tgagtttacc attttttaat tcttgatcga tacttaaagg 120tattgacgca gctgaagtat ttccatattt atttacagaa acactcattt tgtcttttga 180aatacctaag cgttctctag ctgattccat aattctaata ttagcttgat gaggaataaa 240taaatctata tcatctgatg ttaaattcgc tttttcaact acacgtgttg atgcatcacc 300cataattcta acagcaaatt taaatacttc tcgaccattc attttcagtt taccagtatc 360tttatctaaa tataaatgtt taccacctgt gccatcagaa cccatttcat aacttataat 420acctctgcct tctgaaactt caccgatgat aaccgcacct gcaccatctc caaatagaac 480tgcagtagaa cggtcagtta aatctgttat tttagataat ttatctgcac cgacaactaa 540aatattatga taatctccag attgaacata ttgtttagct gtaatcattg aatacataaa 600tccagaacat gctgcaagtt 
gatccataga ggcaactttg cccgtcccta aacgttcttg 660taacatattt gcgacagttg gaaatggcat atctccagtt gctgtggcaa caattatcat 720atctatatct tcgggctgaa taccagcgtc agcgattgct tttacacttg cttcatatgc 780taaatctgaa gtatcttgat cgtcatctgc ccaatgtctt tctttaattc cagtcatctt 840agaaatccat tcatcagatg tatctaaaaa ttgctcaaaa taggcattgt caataatctt 900ttctggtgca tatgcaccaa aacctttaat acccacgttc at 942100325PRTStreptococcus mutans 100Met Thr Phe Ala Lys Ile Ser Gln Ala Ala Tyr Tyr Val Pro Ser Gln 1 5 10 15 Val Val Thr Asn Asp Asp Leu Ser Lys Ile Met Asp Thr Ser Asp Glu 20 25 30 Trp Ile Thr Ser Arg Thr Gly Ile Arg Glu Arg Arg Ile Ser Gln Ser 35 40 45 Glu Asp Thr Ser Asp Leu Ala Ser Gln Val Ala Lys Glu Leu Leu Lys 50 55 60 Lys Ala Ser Leu Lys Ala Lys Glu Ile Asp Phe Ile Ile Val Ala Thr 65 70 75 80 Ile Thr Pro Asp Ala Met Met Pro Ser Thr Ala Ala Cys Val Gln Ala 85 90 95 Lys Ile Gly Ala Val Asn Ala Phe Ala Phe Asp Leu Thr Ala Ala Cys 100 105 110 Ser Gly Phe Ile Phe Ala Leu Ser Ala Ala Glu Lys Met Ile Lys Ser 115 120 125 Gly Gln Tyr Gln Lys Gly Leu Val Ile Gly Ala Glu Val Leu Ser Lys 130 135 140 Ile Ile Asp Trp Ser Asp Arg Thr Thr Ala Val Leu Phe Gly Asp Gly 145 150 155 160 Ala Gly Gly Val Leu Leu Glu Ala Asp Ser Ser Glu His Phe Leu Phe 165 170 175 Glu Ser Ile His Ser Asp Gly Ser Arg Gly Glu Ser Leu Thr Ser Gly 180 185 190 Glu His Ala Val Ser Ser Pro Phe Ser Gln Val Asp Lys Lys Asp Asn 195 200 205 Cys Phe Leu Lys Met Asp Gly Arg Ala Ile Phe Asp Phe Ala Ile Arg 210 215 220 Asp Val Ser Lys Ser Ile Ser Met Leu Ile Arg Lys Ser Asp Met Pro 225 230 235 240 Val Glu Ala Ile Asp Tyr Phe Leu Leu His Gln Ala Asn Ile Arg Ile 245 250 255 Leu Asp Lys Met Ala Lys Lys Ile Gly Ala Asp Arg Glu Lys Phe Pro 260 265 270 Ala Asn Met Met Lys Tyr Gly Asn Thr Ser Ala Ala Ser Ile Pro Ile 275 280 285 Leu Leu Ala Glu Cys Val Glu Asn Gly Thr Ile Glu Leu Asn Gly Ser 290 295 300 His Thr Val Leu Leu Ser Gly Phe Gly Gly Gly Leu Thr Trp Gly Ser 305 310 315 320 Leu Ile Val Lys Ile 325 101978DNAStreptococcus mutans 101atgacttttg 
caaagattag tcaagcagca tattatgtac catcacaggt tgtcaccaat 60gatgatttat ctaaaataat ggataccagt gatgaatgga ttacaagtcg tacgggaata 120agagagcgcc gtattagtca atccgaagat accagtgact tagccagtca ggtggccaaa 180gaacttttaa aaaaagcctc attaaaggcg aaagagattg attttattat tgttgctaca 240attactccgg atgcaatgat gccatcaaca gctgcttgtg tccaagcgaa aattggtgca 300gtgaatgctt ttgctttcga tttaactgcc gcctgcagtg gatttatttt tgcactttca 360gctgcggaaa aaatgattaa atccggtcag taccagaaag gtttagttat cggtgcagaa 420gttctatcta aaatcatcga ttggtcggat cgaacaacag ctgttctttt tggagatgga 480gctggcggtg ttcttttaga agcagattct tctgaacatt ttttatttga atctattcat 540tcagatggca gtcgtggtga aagtttgaca tcaggtgaac acgctgtttc gtcacccttt 600tcacaggttg ataaaaaaga taactgtttt ctaaaaatgg atggtcgagc tatatttgac 660tttgctattc gtgatgtgtc aaaaagtatt tcgatgctca ttaggaagtc agatatgcct 720gtagaagcga ttgattattt cttattacat caggctaata ttcgtatttt ggataaaatg 780gctaaaaaaa ttggcgctga tagagaaaaa tttcctgcta atatgatgaa gtatggtaat 840accagtgcag caagtattcc tattttatta gccgaatgtg tcgaaaatgg aactatagag 900ctaaatggtt cacacactgt tctcctgagc gggttcggtg ggggtttgac atggggcagt 960ttaattgtta aaatttag 978102325PRTLactococcus lactis 102Met Thr Phe Ala Lys Ile Thr Gln Val Ala His Tyr Val Pro Glu Asn 1 5 10 15 Val Val Ser Asn Asp Asp Leu Ser Lys Ile Met Asp Thr Asn Asp Glu 20 25 30 Trp Ile Tyr Ser Arg Thr Gly Ile Lys Asn Arg His Ile Ser Thr Gly 35 40 45 Glu Asn Thr Ser Asp Leu Ala Ala Lys Val Ala Lys Gln Leu Ile Ser 50 55 60 Asp Ser Asn Leu Ser Pro Glu Thr Ile Asp Phe Ile Ile Val Ala Thr 65 70 75 80 Val Thr Pro Asp Ser Leu Met Pro Ser Thr Ala Ala Arg Val Gln Ala 85 90 95 Gln Val Gly Ala Val Asn Ala Phe Ala Tyr Asp Leu Thr Ala Ala Cys 100 105 110 Ser Gly Phe Val Phe Ala Leu Ser Thr Ala Glu Lys Leu Ile Ser Ser 115 120 125 Gly Ala Tyr Gln Arg Gly Leu Val Ile Gly Ala Glu Val Phe Ser Lys 130 135 140 Val Ile Asp Trp Ser Asp Arg Ser Thr Ala Val Leu Phe Gly Asp Gly 145 150 155 160 Ala Ala Gly Val Leu Ile Glu Ala Gly Ala Ser Gln Pro Leu Ile Ile 165 170 175 Ala Glu Lys Met Gln Thr Asp 
Gly Ser Arg Gly Asn Ser Leu Leu Ser 180 185 190 Ser Tyr Ala Asp Ile Gln Thr Pro Phe Ala Ser Val Ser Tyr Glu Ser 195 200 205 Ser Asn Leu Ser Met Glu Gly Arg Ala Ile Phe Asp Phe Ala Val Arg 210 215 220 Asp Val Pro Lys Asn Ile Gln Ala Thr Leu Glu Lys Ala Asn Leu Ser 225 230 235 240 Ala Glu Glu Val Asp Tyr Tyr Leu Leu His Gln Ala Asn Ser Arg Ile 245 250 255 Leu Asp Lys Met Ala Lys Lys Leu Gly Val Thr Arg Gln Lys Phe Leu 260 265 270 Gln Asn Met Gln Glu Tyr Gly Asn Thr Ser Ala Ala Ser Ile Pro Ile 275 280 285 Leu Leu Ser Glu Ser Val Lys Asn Gly Ile Phe Ser Leu Asp Gly Gln 290 295 300 Thr Lys Val Val Leu Thr Gly Phe Gly Gly Gly Leu Thr Trp Gly Thr 305 310 315 320 Ala Ile Ile Asn Leu 325 103978DNALactococcus lactis 103atgacttttg cgaaaattac gcaagtggca

cactatgtgc ctgaaaatgt ggtatctaat 60gatgacttgt ccaaaataat ggatactaat gatgaatgga tttacagtcg gacagggatt 120aaaaatcgcc atatttcaac tggagagaac acctcagact tagcagctaa agttgctaag 180cagttgatta gcgattcaaa tttaagccca gaaacgattg acttcatcat tgttgctaca 240gtaactccgg actcattgat gccttcaacc gcggcacggg ttcaagctca agtaggagca 300gttaatgctt ttgcttacga tttgactgcg gcttgttcag gctttgtctt tgctctatca 360acagcggaaa aattaatttc ctcaggagca tatcaacgag ggcttgtcat tggcgcagaa 420gtcttttcaa aagtaattga ttggtcagac cgatcaactg ctgttctttt cggagatgga 480gctgctggtg tgcttattga agctggcgcg agtcaacctc tgattattgc tgaaaaaatg 540caaacagatg gaagtcgtgg gaacagttta ctttctagtt atgctgacat ccaaactcca 600tttgcctctg tttcatacga aagttcaaac ttgagtatgg aagggcgagc aatttttgat 660tttgccgtac gtgatgttcc taaaaatatc caggcaactt tagaaaaagc taatttgtct 720gctgaagaag tagattatta tctccttcat caagcgaatt caagaatcct tgataaaatg 780gctaaaaagc ttggtgtgac gcgccaaaag ttccttcaaa atatgcaaga atatggtaac 840acatcggcag caagtatccc tatattgttg tcagaatccg taaaaaatgg tatatttagt 900ttggacggtc aaacaaaagt cgtcttgaca ggatttggcg gtggcctcac ttggggtaca 960gcaattatta atttataa 978104317PRTLegionella pneumophila 104Met Lys Asn Ala Val Ile Ser Gly Thr Gly Ser Tyr Ser Pro Glu Arg 1 5 10 15 Gln Met Thr Asn Ala Glu Leu Glu Thr Met Leu Asp Thr Ser Asp Glu 20 25 30 Trp Ile Val Thr Arg Thr Gly Ile Ser Ser Arg Ser Val Ala Gln Glu 35 40 45 His Glu Thr Thr Ser Tyr Met Ala Ser Arg Ala Ala Glu Gln Ala Leu 50 55 60 Glu Ala Ser Gly Leu Asp Ala Glu Glu Ile Asp Leu Ile Leu Val Ala 65 70 75 80 Thr Cys Thr Pro Asp Tyr Phe Phe Pro Ser Val Ala Cys His Val Gln 85 90 95 His Ala Leu Gly Ile Lys Arg Pro Ile Pro Ala Phe Asp Ile Gly Ala 100 105 110 Ala Cys Ser Gly Phe Val Tyr Ala Met Asp Val Ala Lys Gln Tyr Ile 115 120 125 Ala Thr Gly Ala Ala Lys His Val Leu Val Val Gly Ser Glu Ser Met 130 135 140 Ser Arg Ala Val Asp Trp Thr Asp Arg Ser Ile Cys Val Leu Phe Gly 145 150 155 160 Asp Gly Ala Gly Ala Val Val Leu Ser Ala Ser Asp Arg Gln Gly Ile 165 170 175 Met Gly Ser Val Leu His Ser Ala Tyr Asp Ser 
Asp Lys Leu Leu Val 180 185 190 Leu Arg Asn Ser Thr Phe Glu Gln Asp Arg Ala Thr Ile Gly Met Arg 195 200 205 Gly Asn Glu Val Phe Lys Ile Ala Val Asn Ile Met Gly Asn Ile Val 210 215 220 Asp Glu Val Leu Glu Ala Ser His Leu Lys Lys Ser Asp Ile Asp Trp 225 230 235 240 Leu Ile Pro His Gln Ala Asn Ile Arg Ile Ile Gln Ala Ile Ala Lys 245 250 255 Lys Leu Ser Leu Pro Met Ser His Val Ile Val Thr Ile Gly Asn Gln 260 265 270 Gly Asn Thr Ser Ala Ala Ser Ile Pro Leu Ala Leu Asp Tyr Ser Ile 275 280 285 Lys Asn Asn Gln Ile Lys Arg Asp Glu Ile Leu Leu Ile Glu Ser Phe 290 295 300 Gly Gly Gly Met Thr Trp Gly Ala Met Val Ile Arg Tyr 305 310 315 105954DNALegionella pneumophila 105atgaaaaatg ctgttattag tggcactgga agttactctc cagagagaca aatgactaat 60gctgaactgg aaaccatgct tgatactagc gatgaatgga tagttaccag gactggtatt 120agtagtcgta gtgttgctca agaacatgaa acaacatctt atatggcctc cagagcagca 180gagcaagcac tagaggcatc aggccttgat gctgaagaaa ttgatttgat attagtagca 240acatgtaccc cggattattt ttttcctagc gttgcctgtc acgtacaaca tgctttagga 300atcaaaagac ctattccggc ttttgacatt ggagctgcat gcagcggttt tgtttatgcg 360atggatgtag cgaaacaata cattgctaca ggggctgcca aacacgttct tgtcgtaggc 420agcgagagca tgtcaagagc ggtagattgg actgatcgtt ctatttgtgt cttattcgga 480gatggcgcag gcgctgttgt tttaagcgca agtgatcgcc aagggattat gggtagtgtt 540ttacattctg cctatgactc tgataaatta ctagtccttc gtaattcaac ttttgaacaa 600gatcgtgcaa cgattggaat gcgaggtaat gaggtattta aaattgctgt taatattatg 660ggtaatattg ttgatgaagt gttagaagca agtcatttaa aaaaatctga tattgattgg 720ctgatacctc atcaagccaa tatacgcatt atacaagcca tagctaaaaa attatctctt 780cctatgtcac atgttattgt tacaattggt aaccaaggca acacatcggc tgcttctatt 840cccttagcac ttgattattc tattaaaaat aatcagatta aaagggatga aatattatta 900attgaatcct ttggtggtgg aatgacctgg ggcgctatgg ttattcgtta ctaa 954106312PRTListeria monocytogenes 106Met Asn Ala Gly Ile Leu Gly Val Gly Lys Tyr Val Pro Glu Lys Ile 1 5 10 15 Val Thr Asn Phe Asp Leu Glu Lys Ile Met Asp Thr Ser Asp Glu Trp 20 25 30 Ile Arg Thr Arg Thr Gly Ile Glu Glu Arg Arg Ile Ala 
Arg Asp Asp 35 40 45 Glu Tyr Thr His Asp Leu Ala Tyr Glu Ala Ala Lys Val Ala Ile Glu 50 55 60 Asn Ala Gly Leu Thr Pro Asp Asp Ile Asp Leu Phe Ile Val Ala Thr 65 70 75 80 Val Thr Gln Glu Ala Thr Phe Pro Ser Val Ala Asn Ile Ile Gln Asp 85 90 95 Arg Leu Gly Ala Thr Asn Ala Ala Gly Met Asp Val Glu Ala Ala Cys 100 105 110 Ala Gly Phe Thr Phe Gly Val Val Thr Ala Ala Gln Phe Ile Lys Thr 115 120 125 Gly Ala Tyr Lys Asn Ile Val Val Val Gly Ala Asp Lys Leu Ser Lys 130 135 140 Ile Thr Asn Trp Asp Asp Arg Ala Thr Ala Val Leu Phe Gly Asp Gly 145 150 155 160 Ala Gly Ala Val Val Met Gly Pro Val Ser Asp Asp His Gly Leu Leu 165 170 175 Ser Phe Asp Leu Gly Ser Asp Gly Ser Gly Gly Lys Tyr Leu Asn Leu 180 185 190 Asp Glu Asn Lys Lys Ile Tyr Met Asn Gly Arg Glu Val Phe Arg Phe 195 200 205 Ala Val Arg Gln Met Gly Glu Ala Ser Leu Arg Val Leu Glu Arg Ala 210 215 220 Gly Leu Glu Lys Glu Glu Leu Asp Leu Leu Ile Pro His Gln Ala Asn 225 230 235 240 Ile Arg Ile Met Glu Ala Ser Arg Glu Arg Leu Asn Leu Pro Glu Glu 245 250 255 Lys Leu Met Lys Thr Val His Lys Tyr Gly Asn Thr Ser Ser Ser Ser 260 265 270 Ile Ala Leu Ala Leu Val Asp Ala Val Glu Glu Gly Arg Ile Lys Asp 275 280 285 Asn Asp Asn Val Leu Leu Val Gly Phe Gly Gly Gly Leu Thr Trp Gly 290 295 300 Ala Leu Ile Ile Arg Trp Gly Lys 305 310 107939DNAListeria monocytogenes 107atgaacgcag gaattttagg agtaggtaaa tacgtacctg aaaaaatagt aacaaatttt 60gatttagaaa aaataatgga tacatccgat gagtggattc gtactcgaac tggtattgaa 120gaaagaagaa ttgctcgtga tgacgaatat acgcacgact tagcatacga agcagcaaag 180gtagctattg agaatgctgg gcttacacca gatgacattg acttatttat tgttgccact 240gtgacgcagg aagcgacttt tccatccgtt gcgaatatta ttcaagaccg tttaggagca 300acaaatgctg cgggtatgga cgtggaagcg gcatgtgccg gttttacttt tggcgtagta 360actgcagcac aatttattaa aacaggggca tacaagaata tcgtcgtagt tggtgcggat 420aaattatcta aaatcactaa ctgggatgat cgcgcaacag ccgtattatt tggtgatgga 480gcgggagccg ttgttatggg tccggtttct gatgaccatg gactactttc gtttgactta 540ggctcagatg gatctggcgg caaatacttg aacttagatg aaaataagaa gatttatatg 
600aatggacgtg aagtgttccg ttttgcagtt cgccaaatgg gagaagcttc gttacgagta 660cttgaacgtg ctggacttga aaaagaagaa ttggatttac taattcctca ccaagcaaat 720atccgtatca tggaagcttc tcgcgagcgt ttgaatttac cggaagaaaa actgatgaaa 780acagtgcata aatacggtaa tacttcgtca tcttctattg ctcttgcgct agttgatgca 840gtcgaagaag gacgcattaa agataatgac aatgtcctgc ttgttggctt tggcggcgga 900ctaacatggg gcgccctaat cattcgttgg ggtaagtaa 939108325PRTBacillus subtilis subsp. subtilis str. 168 108Met Ser Lys Ala Lys Ile Thr Ala Ile Gly Thr Tyr Ala Pro Ser Arg 1 5 10 15 Arg Leu Thr Asn Ala Asp Leu Glu Lys Ile Val Asp Thr Ser Asp Glu 20 25 30 Trp Ile Val Gln Arg Thr Gly Met Arg Glu Arg Arg Ile Ala Asp Glu 35 40 45 His Gln Phe Thr Ser Asp Leu Cys Ile Glu Ala Val Lys Asn Leu Lys 50 55 60 Ser Arg Tyr Lys Gly Thr Leu Asp Asp Val Asp Met Ile Leu Val Ala 65 70 75 80 Thr Thr Thr Ser Asp Tyr Ala Phe Pro Ser Thr Ala Cys Arg Val Gln 85 90 95 Glu Tyr Phe Gly Trp Glu Ser Thr Gly Ala Leu Asp Ile Asn Ala Thr 100 105 110 Cys Ala Gly Leu Thr Tyr Gly Leu His Leu Ala Asn Gly Leu Ile Thr 115 120 125 Ser Gly Leu His Gln Lys Ile Leu Val Ile Ala Gly Glu Thr Leu Ser 130 135 140 Lys Val Thr Asp Tyr Thr Asp Arg Thr Thr Cys Val Leu Phe Gly Asp 145 150 155 160 Ala Ala Gly Ala Leu Leu Val Glu Arg Asp Glu Glu Thr Pro Gly Phe 165 170 175 Leu Ala Ser Val Gln Gly Thr Ser Gly Asn Gly Gly Asp Ile Leu Tyr 180 185 190 Arg Ala Gly Leu Arg Asn Glu Ile Asn Gly Val Gln Leu Val Gly Ser 195 200 205 Gly Lys Met Val Gln Asn Gly Arg Glu Val Tyr Lys Trp Ala Ala Arg 210 215 220 Thr Val Pro Gly Glu Phe Glu Arg Leu Leu His Lys Ala Gly Leu Ser 225 230 235 240 Ser Asp Asp Leu Asp Trp Phe Val Pro His Ser Ala Asn Leu Arg Met 245 250 255 Ile Glu Ser Ile Cys Glu Lys Thr Pro Phe Pro Ile Glu Lys Thr Leu 260 265 270 Thr Ser Val Glu His Tyr Gly Asn Thr Ser Ser Val Ser Ile Val Leu 275 280 285 Ala Leu Asp Leu Ala Val Lys Ala Gly Lys Leu Lys Lys Asp Gln Ile 290 295 300 Val Leu Leu Phe Gly Phe Gly Gly Gly Leu Thr Tyr Thr Gly Leu Leu 305 310 315 320 Ile Lys Trp Gly Met 325 
109978DNABacillus subtilis subsp. subtilis str. 168 109atgtcaaaag caaaaattac agctatcggc acctatgcgc cgagcagacg tttaaccaat 60gcagatttag aaaagatcgt tgatacctct gatgaatgga tcgttcagcg cacaggaatg 120agagaacgcc ggattgcgga tgaacatcaa tttacctctg atttatgcat agaagcggtg 180aagaatctca agagccgtta taaaggaacg cttgatgatg tcgatatgat cctcgttgcc 240acaaccacat ccgattacgc ctttccgagt acggcatgcc gcgtacagga atatttcggc 300tgggaaagca ccggcgcgct ggatattaat gcgacatgcg ccgggctgac atacggcctc 360catttggcaa atggattgat cacatctggc cttcatcaaa aaattctcgt catcgccgga 420gagacgttat caaaggtaac cgattatacc gatcgaacga catgcgtact gttcggcgat 480gccgcgggtg cgctgttagt agaacgagat gaagagacgc cgggatttct tgcgtctgta 540caaggaacaa gcgggaacgg cggcgatatt ttgtatcgtg ccggactgcg aaatgaaata 600aacggtgtgc agcttgtcgg ttccggaaaa atggtgcaaa acggacgcga ggtatataaa 660tgggccgcaa gaaccgtccc tggcgaattt gaacggcttt tacataaagc aggactcagc 720tccgatgatc tcgattggtt tgttcctcac agcgccaact tgcgcatgat cgagtcaatt 780tgtgaaaaaa caccgttccc gattgaaaaa acgctcacta gtgttgagca ctacggaaac 840acgtcttcgg tttcaattgt tttggcgctc gatctcgcag tgaaagccgg gaagctgaaa 900aaagatcaaa tcgttttgct tttcgggttt ggcggcggat taacctatac aggattgctt 960attaaatggg ggatgtaa 978110331PRTMyxococcus xanthus 110Met Arg Tyr Ala Gln Ile Leu Ser Thr Gly Arg Tyr Val Pro Glu Lys 1 5 10 15 Val Leu Thr Asn Ala Asp Val Glu Lys Ile Leu Gly Glu Lys Val Asp 20 25 30 Glu Trp Leu Gln Gln Asn Val Gly Ile Arg Glu Arg His Met Met Ala 35 40 45 Asp Asp Gln Ala Thr Ser Asp Leu Cys Val Gly Ala Ala Arg Gln Ala 50 55 60 Leu Glu Arg Ala Gly Thr Lys Pro Glu Glu Leu Asp Leu Ile Ile Ile 65 70 75 80 Ala Thr Asp Thr Pro Asp Tyr Leu Ser Pro Ala Thr Ala Ser Val Val 85 90 95 Gln Ala Lys Leu Gly Ala Val Asn Ala Gly Thr Tyr Asp Leu Asn Cys 100 105 110 Ala Cys Ala Gly Trp Val Thr Ala Leu Asp Val Gly Ser Lys Thr Ile 115 120 125 Ala Ala Asp Asp Ser Tyr Gln Arg Ile Leu Val Val Gly Ala Tyr Gly 130 135 140 Met Ser Arg Tyr Ile Asn Trp Lys Asp Lys Lys Thr Ala Thr Leu Phe 145 150 155 160 Ala Asp Gly Ala Gly Ala Val Val Leu 
Gly Ala Gly Asp Thr Pro Gly 165 170 175 Phe Met Gly Ala Lys Leu Leu Ala Asn Gly Glu Tyr His Asp Ala Leu 180 185 190 Gly Val Tyr Thr Gly Gly Thr Asn Arg Pro Ala Thr Ala Glu Ser Leu 195 200 205 Glu Leu Thr Gly Gly Lys Pro Ala Val Gln Phe Val Arg Lys Phe Pro 210 215 220 Ala Thr Phe Asn Thr Glu Arg Trp Pro Met Leu Leu Asp Gln Leu Leu 225 230 235 240 Lys Arg Gln Asn Leu Lys Leu Asp Asp Val Lys Gln Phe Val Phe Thr 245 250 255 Gln Leu Asn Leu Arg Thr Ile Glu Ala Thr Met Lys Ile Leu Gly Gln 260 265 270 Pro Met Glu Lys Ala His Tyr Thr Met Asp Lys Trp Gly Tyr Thr Gly 275 280 285 Ser Ala Cys Ile Pro Met Thr Leu Asp Asp Ala Val Val Gln Gly Lys 290 295 300 Val Gln Arg Gly Asp Leu Val Ala Leu Cys Ala Ser Gly Gly Gly Leu 305 310 315 320 Ala Met Ala Ser Ala Leu Tyr Arg Trp Thr Ala 325 330 111996DNAMyxococcus xanthus 111atgcgatacg cccagattct ctccactggc cgctacgtcc ccgagaaggt cctcaccaac 60gctgacgtcg agaagattct cggtgagaag gtggatgagt ggctccagca gaacgtgggc 120attcgcgaac gccacatgat ggcggatgac caggccacct ccgacctctg cgtgggcgcc 180gcccgccagg cgctggagcg cgcgggcacg aagccggagg aactggacct catcatcatc 240gccaccgata ccccggacta tctcagcccc gccacggcct ccgtggtgca ggcgaagctg 300ggcgcggtga acgccggcac ctacgacctc aactgcgcgt gcgcgggctg ggtgacggcg 360ctggacgtgg gctcgaagac gattgccgcg gatgacagct accagcgcat cctcgtcgtg 420ggcgcctacg gcatgtcgcg ctacatcaac tggaaggaca agaagaccgc caccctgttc 480gcggacggcg cgggcgcggt cgtgctgggc gcgggtgaca cgcccggctt catgggcgcg 540aagctgctgg ccaacggcga gtaccacgac gcgctgggtg tctacaccgg cggtacgaac 600cgcccggcca ccgcggagtc gctggagctc acgggcggca agcccgcggt gcagttcgtc 660cgcaagttcc cggcgacgtt caacaccgag cgctggccca tgctgctgga ccagctcctc 720aagcggcaga acctgaagct ggacgacgtg aagcagttcg tcttcacgca gctcaacctg 780cgcaccatcg aagccaccat gaagatcctg ggccagccga tggagaaggc ccactacacc 840atggacaagt ggggctacac cggttcggcc tgcatcccga tgacgctgga tgacgcggtg 900gtgcagggca aggtgcagcg cggcgacctg gtggccctgt gtgccagcgg cggcgggctc 960gccatggcct ccgccctcta ccgctggacg gcctga 996112325PRTStenotrophomonas maltophilia 
112Met Ser Lys Arg Ile Tyr Ser Arg Ile Ala Gly Thr Gly Ser Tyr Leu 1 5 10 15 Pro Glu Lys Val Leu Thr Asn Ala Asp Leu Glu Lys Met Val Glu Thr 20 25 30 Ser Asp Glu Trp Ile Gln Ser Arg Thr Gly Ile Arg Glu Arg His Ile 35 40 45 Ala Ala Glu Gly Glu Thr Thr Ser Asp Leu Gly Tyr Asn Ala Ala Leu 50 55 60 Arg Ala Leu Glu Ala Ala Gly Ile Asp Ala Ser Gln Leu Asp Met Ile 65 70 75 80 Val Val Gly Thr Thr Thr Pro Asp Leu Ile Phe Pro Ser Thr Ala Cys 85 90 95 Leu Ile Gln Ala Lys Leu Gly Val Ala Gly Cys Pro Ala Phe Asp Val 100 105 110 Asn Ala Ala Cys Ser Gly Phe Val Phe Ala Leu Gly Val Ala Asp Lys 115 120 125 Phe Ile Arg Ser Gly Asp Cys Arg Tyr Val Leu Val Ile Gly Ala Glu 130 135 140 Thr Leu Thr Arg Met Val Asp Trp Asn Asp Arg Thr Thr Cys Val Leu 145 150 155 160 Phe Gly Asp Gly Ala Gly Ala Val Val Leu Lys Ala Asp Glu Glu Thr 165 170 175 Gly Ile Leu Ser Thr His Leu His Ser Asp Gly Ser Lys Lys Glu Leu 180 185 190 Leu Trp Asn Pro Val Gly Val Ser Thr Gly Phe Lys Gly Gly Ala Asn 195 200 205 Gly Gly Gly Thr Ile Asn Met Lys Gly Asn Asp Val Phe
Lys Tyr Ala 210 215 220 Val Lys Ala Leu Asp Ser Val Val Asp Glu Thr Leu Ala Ala Asn Gly 225 230 235 240 Leu Asp Lys Ser Asp Leu Asp Trp Leu Ile Pro His Gln Ala Asn Leu 245 250 255 Arg Ile Ile Glu Ala Thr Ala Lys Arg Leu Asp Met Ser Met Glu Gln 260 265 270 Val Val Val Thr Val Asp Gln His Gly Asn Thr Ser Ser Gly Ser Val 275 280 285 Pro Leu Ala Leu Asp Ala Ala Val Arg Ser Gly Lys Val Glu Arg Gly 290 295 300 Gln Leu Leu Leu Leu Glu Ala Phe Gly Gly Gly Phe Thr Trp Gly Ser 305 310 315 320 Ala Leu Leu Arg Tyr 325 113978DNAStenotrophomonas maltophilia 113atgagcaagc ggatctattc gaggatcgcg ggcaccggta gctatttgcc ggaaaaagtc 60ctgaccaacg ccgacctgga aaaaatggtc gaaacctcgg atgagtggat ccagtcgcgc 120accggcattc gtgaacggca catcgcggcc gaaggcgaaa ccaccagcga tctcggctac 180aacgccgcgc tgcgcgcact tgaagcggcc ggcatcgacg cttcgcagct cgacatgatc 240gtggtcggta cgaccacccc tgaccttatt ttcccgtcca ccgcgtgcct gatccaggcc 300aagctcggtg tggccggatg ccccgccttc gacgtcaacg cggcctgttc gggtttcgtg 360ttcgcgctgg gcgtggccga caaattcatc cgttccggcg actgccggta cgtgctggtg 420atcggcgccg aaacgctgac ccgcatggtt gactggaacg atcgcaccac ctgcgtgctg 480ttcggtgatg gtgccggcgc cgtcgtgctc aaggccgacg aagagaccgg catcctcagc 540acccacctgc attccgatgg cagcaagaag gagctgttgt ggaacccggt gggtgtctcg 600accggtttca agggcggcgc caacggtggt ggcactatca acatgaaggg caacgatgtg 660ttcaagtacg ccgtcaaggc gctggactcg gtcgtggacg agaccttggc tgcgaacggc 720ctggacaagt ccgacctgga ttggctgatt ccgcaccagg ccaacctacg catcatcgaa 780gccacggcca agcgcctgga catgtcgatg gaacaggtcg tggtcacggt tgatcagcac 840ggcaacacct cgtccggctc ggtgccgctg gcgctggacg ctgcagtgcg atcgggcaag 900gtcgagcgcg gccagctgct gttgctggaa gccttcggcg gcggcttcac ctggggttcg 960gccctgctgc gctattga 978114334PRTBacteroides vulgatus 114Met Glu Lys Ile Asn Ala Val Ile Thr Gly Val Gly Gly Tyr Val Pro 1 5 10 15 Asp Tyr Val Leu Thr Asn Glu Glu Ile Ser Arg Met Val Asp Thr Asn 20 25 30 Asp Glu Trp Ile Met Thr Arg Ile Gly Val Lys Glu Arg Arg Ile Leu 35 40 45 Asn Glu Glu Gly Leu Gly Thr Ser Tyr Met Ala Arg Lys Ala Ala Lys 50 
55 60 Gln Leu Met Gln Lys Thr Ala Ser Asn Pro Asp Asp Ile Asp Ala Val 65 70 75 80 Ile Val Ala Thr Thr Thr Pro Asp Tyr His Phe Pro Ser Thr Ala Ser 85 90 95 Ile Leu Cys Asp Lys Leu Gly Leu Lys Asn Ala Phe Ala Phe Asp Leu 100 105 110 Gln Ala Ala Cys Cys Gly Phe Leu Tyr Leu Met Glu Thr Ala Ala Ser 115 120 125 Leu Ile Ala Ser Gly Arg His Lys Lys Ile Ile Ile Val Gly Ala Asp 130 135 140 Lys Met Ser Ser Met Val Asn Tyr Gln Asp Arg Ala Thr Cys Pro Ile 145 150 155 160 Phe Gly Asp Gly Ala Ala Ala Cys Met Val Glu Ala Thr Thr Glu Asp 165 170 175 Tyr Gly Ile Met Asp Ser Ile Leu Arg Thr Asp Gly Lys Gly Leu Pro 180 185 190 Phe Leu His Met Lys Ala Gly Gly Ser Val Cys Pro Pro Ser Tyr Phe 195 200 205 Thr Val Asp His Lys Met His Tyr Leu Tyr Gln Glu Gly Arg Thr Val 210 215 220 Phe Lys Tyr Ala Val Ser Asn Met Ser Asp Ile Thr Ala Thr Ile Ala 225 230 235 240 Glu Lys Asn Gly Leu Asn Lys Asp Asn Ile Asp Trp Val Ile Pro His 245 250 255 Gln Ala Asn Leu Arg Ile Ile Asp Ala Val Ala Ser Arg Leu Glu Val 260 265 270 Pro Leu Glu Lys Val Met Ile Asn Ile Gln Arg Tyr Gly Asn Thr Ser 275 280 285 Gly Ala Thr Leu Pro Leu Cys Leu Trp Asp Tyr Glu Lys Gln Leu Lys 290 295 300 Lys Gly Asp Asn Leu Ile Phe Thr Ala Phe Gly Ala Gly Phe Thr Tyr 305 310 315 320 Gly Ala Val Tyr Val Lys Trp Gly Tyr Asp Gly Ser Lys Arg 325 330 1151005DNABacteroides vulgatus 115atggaaaaaa taaatgcagt aataacagga gtcggtggat atgtaccaga ttatgtcttg 60actaacgaag agatttcaag aatggtagat accaatgatg aatggattat gactcgaatc 120ggagttaaag aaagacgtat tctgaatgaa gaaggattag gtacatcgta tatggcgcgt 180aaggctgcca aacaactgat gcagaaaaca gcttctaatc cggatgacat tgatgcagta 240atcgtagcaa ctactactcc tgactatcat ttcccttcca ctgcttctat cctgtgtgat 300aagctgggat tgaaaaatgc atttgcattt gatttgcagg ctgcctgctg cggctttttg 360tatttaatgg aaactgctgc ttcacttatc gcatcgggaa gacataaaaa gattattatt 420gtcggtgcag ataagatgtc atctatggta aactaccagg atcgtgcaac ttgccctatc 480tttggtgatg gtgcagcagc atgtatggtg gaagctacta cagaagatta tggtattatg 540gattctattc ttcgtacaga tggtaaggga cttccttttc 
ttcacatgaa agccggtggt 600tctgtatgtc ctccttctta tttcactgtt gatcataaga tgcattatct ttatcaggaa 660ggaagaacag tatttaaata tgctgtttcc aatatgtcgg atattacagc gactattgcc 720gaaaagaatg gtttgaataa agataatatc gactgggtaa ttcctcatca ggctaatctg 780cgtattattg atgcggtagc ctctcgcttg gaagttccct tggaaaaggt aatgattaat 840attcagcgat atggtaatac cagtggtgct acacttccgt tgtgtctttg ggattacgaa 900aagcagctga agaaaggaga taacctgata tttacagctt tcggcgcagg ttttacctat 960ggagccgttt atgtgaaatg gggttacgat ggtagtaaga gataa 1005116325PRTClostridium acetobutylicum 116Met Asn Ser Val Glu Ile Ile Gly Thr Gly Ser Tyr Val Pro Glu Lys 1 5 10 15 Ile Val Thr Asn Glu Asp Met Ser Lys Ile Val Asp Thr Ser Asp Glu 20 25 30 Trp Ile Ser Ser Arg Thr Gly Ile Lys Glu Arg Arg Ile Ser Ile Asn 35 40 45 Glu Asn Thr Ser Asp Leu Gly Ala Lys Ala Ala Leu Arg Ala Ile Glu 50 55 60 Asp Ser Asn Ile Lys Pro Glu Glu Ile Asp Leu Ile Ile Val Ala Thr 65 70 75 80 Thr Ser Pro Asp Ser Tyr Thr Pro Ser Val Ala Cys Ile Val Gln Glu 85 90 95 Lys Ile Gly Ala Lys Asn Ala Ala Cys Phe Asp Leu Asn Ala Ala Cys 100 105 110 Thr Gly Phe Ile Phe Ala Leu Asn Thr Ala Ser Gln Phe Ile Lys Thr 115 120 125 Gly Glu Tyr Lys Thr Ala Leu Val Val Gly Thr Glu Val Leu Ser Lys 130 135 140 Ile Leu Asp Trp Gln Asp Arg Gly Thr Cys Val Leu Phe Gly Asp Gly 145 150 155 160 Ala Gly Ala Val Ile Ile Arg Gly Gly Asp Glu Asn Gly Ile Ile Lys 165 170 175 Ala Cys Leu Gly Ser Asp Gly Thr Gly Lys Asp Phe Leu His Cys Pro 180 185 190 Ala Thr Asn Val Ile Asn Pro Phe Ser Asp Glu Lys Gly Leu Ala Ser 195 200 205 Ser Lys Ile Ser Met Asn Gly Arg Glu Val Phe Lys Phe Ala Val Lys 210 215 220 Val Met Val Ser Ser Val Lys Lys Val Ile Glu Asp Ser Gly Leu Asn 225 230 235 240 Ile Glu Asp Ile Asp Tyr Ile Val Pro His Gln Ala Asn Ile Arg Ile 245 250 255 Ile Glu Phe Ala Ala Lys Lys Leu Gly Leu Ser Met Asp Lys Phe Phe 260 265 270 Ile Asn Leu Gln Asn Tyr Gly Asn Thr Ser Gly Ala Thr Ile Pro Leu 275 280 285 Ala Ile Asp Glu Met Asn Lys Lys Gly Leu Leu Lys Arg Gly Ala Lys 290 295 300 Ile Val Val Val Gly Phe Gly 
Gly Gly Leu Thr Trp Gly Ser Met Val 305 310 315 320 Leu Lys Trp Thr Lys 325 117978DNAClostridium acetobutylicum 117gtgaatagtg ttgagattat agggactgga agctatgtcc cagaaaaaat agttactaat 60gaagatatgt ctaagatagt tgatactagt gatgagtgga tatcatcaag aacaggtata 120aaggaaagaa gaatatctat aaacgaaaat acatcagatt taggtgctaa agctgcctta 180agggcaatag aggactcaaa cataaaacca gaagaaatag atttaataat agttgcaact 240acaagtccag actcatatac tccatccgta gcttgtattg ttcaggagaa gataggtgcc 300aaaaatgctg cctgttttga tttgaatgcg gcatgtactg gatttatatt tgctcttaat 360acggcatctc agtttataaa aacaggagag tataaaacag ctcttgtagt aggaacagag 420gtactatcaa agatacttga ttggcaagat agaggtacat gtgtactttt tggagatggt 480gcaggtgcgg taattataag aggcggagat gaaaacggaa ttattaaagc atgtcttggt 540tcagatggta cgggaaaaga cttcttgcat tgtccagcga ctaatgtgat aaatccattt 600tcggatgaaa aaggtttagc aagcagtaag atttctatga atggaagaga agtctttaaa 660tttgcagtta aggtaatggt aagctcagtt aaaaaggtta tagaagatag tggactaaat 720atagaagaca ttgattatat agtacctcat caggctaaca ttagaataat agagtttgca 780gctaaaaaac ttggattaag tatggacaaa ttttttataa acctacaaaa ctatggaaat 840acatctggag cgactatacc actggcaata gatgaaatga ataaaaaagg cttgcttaaa 900agaggtgcta aaatagttgt agttggtttt ggtggaggac ttacttgggg ttccatggtt 960cttaaatgga ctaaataa 978118332PRTFlavobacterium johnsoniae 118Met Asn Thr Ile Thr Ala Ala Ile Thr Ala Val Gly Gly Tyr Val Pro 1 5 10 15 Asp Phe Val Leu Ser Asn Lys Val Leu Glu Thr Met Val Asp Thr Asn 20 25 30 Asp Glu Trp Ile Thr Thr Arg Thr Gly Ile Lys Glu Arg Arg Ile Leu 35 40 45 Lys Asp Ala Asp Lys Gly Thr Ser Tyr Leu Ala Ile Gln Ala Ala Gln 50 55 60 Asp Leu Ile Ala Lys Ala Asn Ile Asp Pro Leu Glu Ile Asp Met Val 65 70 75 80 Ile Met Ala Thr Ala Thr Pro Asp Met Met Val Ala Ser Thr Gly Val 85 90 95 Tyr Val Ala Thr Glu Ile Gly Ala Val Asn Ala Phe Ala Tyr Asp Leu 100 105 110 Gln Ala Ala Cys Ser Ser Phe Leu Tyr Gly Met Ser Thr Ala Ala Ala 115 120 125 Tyr Val Gln Ser Gly Arg Tyr Lys Lys Val Leu Leu Ile Gly Ala Asp 130 135 140 Lys Met Ser Ser Ile Val Asp Tyr Thr Asp Arg Ala 
Thr Cys Ile Ile 145 150 155 160 Phe Gly Asp Gly Ala Gly Ala Val Leu Phe Glu Pro Asn Tyr Glu Gly 165 170 175 Leu Gly Leu Gln Asp Glu Tyr Leu Arg Ser Asp Gly Val Gly Arg Asp 180 185 190 Phe Leu Lys Ile Pro Ala Gly Gly Ser Leu Ile Pro Ala Ser Glu Asp 195 200 205 Thr Val Lys Asn Arg Gln His Asn Ile Met Gln Asp Gly Lys Thr Val 210 215 220 Phe Lys Tyr Ala Val Thr Asn Met Ala Asp Ala Ser Glu Leu Ile Leu 225 230 235 240 Gln Arg Asn Asn Leu Thr Asn Gln Asp Val Asp Trp Leu Val Pro His 245 250 255 Gln Ala Asn Lys Arg Ile Ile Asp Ala Thr Ala Gly Arg Leu Glu Leu 260 265 270 Glu Glu Ser Lys Val Leu Val Asn Ile Glu Arg Tyr Gly Asn Thr Thr 275 280 285 Ser Gly Thr Leu Pro Leu Val Leu Ser Asp Phe Glu Asn Gln Phe Lys 290 295 300 Lys Gly Asp Asn Ile Ile Leu Ala Ala Phe Gly Gly Gly Phe Thr Trp 305 310 315 320 Gly Ser Ile Tyr Leu Lys Trp Ala Tyr Asp Lys Lys 325 330 119999DNAFlavobacterium johnsoniae 119atgaatacaa tcacagccgc aattaccgct gttggaggct acgttccaga ctttgtgctt 60tcaaacaaag tgttggaaac aatggtagat accaatgacg aatggattac cactcgtaca 120ggaattaaag aaagaagaat tcttaaagat gctgataaag gtacatctta ccttgccata 180caagcagcac aggatttaat agcaaaagct aatattgatc ctcttgaaat tgatatggtt 240attatggcaa ctgcaacacc agatatgatg gtagcttcaa caggagttta tgttgcaaca 300gaaattggag ctgttaatgc atttgcatac gatttgcagg cagcttgttc aagtttctta 360tacggaatgt ctactgctgc ggcttatgta caatctggaa gatataaaaa agttctttta 420attggtgccg ataaaatgtc atcaattgta gattacacag acagagcaac ttgtattatt 480tttggtgatg gagcaggggc agttttgttt gagccaaatt acgaaggtct tggtctgcaa 540gacgaatatt taagaagtga tggtgtagga cgcgattttc ttaaaatacc agctggagga 600tctttaattc cagcttcaga agatactgta aaaaacagac aacacaatat tatgcaggat 660ggtaaaacag tttttaaata tgctgtaacc aatatggctg atgccagcga actaatcttg 720caaagaaaca atttaactaa tcaggatgtt gattggttag tgcctcacca ggcaaacaaa 780cgcatcatcg atgcaactgc aggaagacta gagttagaag agtctaaagt actagttaat 840atcgaaagat atggtaatac aacttcagga acattacctt tggtattaag cgattttgaa 900aatcaattca aaaaaggaga taatattatt ttagcagcat ttggaggtgg attcacttgg 
960ggatctattt acctaaaatg ggcttacgat aagaaataa 999120350PRTMicrococcus luteus 120Met Thr Val Thr Leu Lys Gln His Glu Arg Pro Ala Ala Ser Arg Ile 1 5 10 15 Val Ala Val Gly Ala Tyr Arg Pro Ala Asn Leu Val Pro Asn Glu Asp 20 25 30 Leu Ile Gly Pro Ile Asp Ser Ser Asp Glu Trp Ile Arg Gln Arg Thr 35 40 45 Gly Ile Val Thr Arg Gln Arg Ala Thr Ala Glu Glu Thr Val Pro Val 50 55 60 Met Ala Val Gly Ala Ala Arg Glu Ala Leu Glu Arg Ala Gly Leu Gln 65 70 75 80 Gly Ser Asp Leu Asp Ala Val Ile Val Ser Thr Val Thr Phe Pro His 85 90 95 Ala Thr Pro Ser Ala Ala Ala Leu Val Ala His Glu Ile Gly Ala Thr 100 105 110 Pro Ala Pro Ala Tyr Asp Val Ser Ala Ala Cys Ala Gly Tyr Cys Tyr 115 120 125 Gly Val Ala Gln Ala Asp Ala Leu Val Arg Ser Gly Thr Ala Arg His 130 135 140 Val Leu Val Val Gly Val Glu Arg Leu Ser Asp Val Val Asp Pro Thr 145 150 155 160 Asp Arg Ser Ile Ser Phe Leu Leu Gly Asp Gly Ala Gly Ala Val Ile 165 170 175 Val Ala Ala Ser Asp Glu Pro Gly Ile Ser Pro Ser Val Trp Gly Ser 180 185 190 Asp Gly Glu Arg Trp Ser Thr Ile Ser Met Thr His Ser Gln Leu Glu 195 200 205 Leu Arg Asp Ala Val Glu His Ala Arg Thr Thr Gly Asp Ala Ser Ala 210 215 220 Ile Thr Gly Ala Glu Gly Met Leu Trp Pro Thr Leu Arg Gln Asp Gly 225 230 235 240 Pro Ser Val Phe Arg Trp Ala Val Trp Ser Met Ala Lys Val Ala Arg 245 250 255 Glu Ala Leu Asp Ala Ala Gly Val Glu Pro Glu Asp Leu Ala Ala Phe 260 265 270 Ile Pro His Gln Ala Asn Met Arg Ile Ile Asp Glu Phe Ala Lys Gln 275 280 285 Leu Lys Leu Pro Glu Ser Val Val Val Ala Arg Asp Ile Ala Asp Ala 290 295 300 Gly Asn Thr Ser Ala Ala Ser Ile Pro Leu Ala Met His Arg Leu Leu 305 310 315 320 Glu Glu Asn Pro Glu Leu Ser Gly Gly Leu Ala Leu Gln Ile Gly Phe 325 330 335 Gly Ala Gly Leu Val Tyr Gly Ala Gln Val Val Arg Leu Pro 340 345 350 121979DNAMicrococcus luteus 121atgaccgtca ccctgaagca gcacgagcgc cccgcggcca gccgcatcgt ggccgtgggc 60gcctaccgcc cggcgaacct ggtcccgaac gaggacctca tcggccccat cgactcgtcg 120gacgagtgga tccgccagcg caccggcatc gtcacacgcc agcgcgccac ggcggaggag 180accgtgcccg 
tcatggccgt gggcgccgcc cgggaggccc tcgagcgggc cggcctgcag 240ggctcggacc tggacgccgt gatcgtctcg accgtcacct tcccgcacgc caccccctcg 300gccgcggccc tcgtggcgca cgagatcggc gccaccccgg cgcccgccta cgacgtctcc 360gccgcgtgcg ccggctactg ctacggcgtg gcccaggccg acgcgctcgt gcgctccggc 420accgcgcggc acgtgctcgt ggtcggcgtc gagcgcctct ccgacgtcgt ggatcccacg 480gaccgctcca tctccttcct gctgggcgac ggcgcgggcg ccgtgatcgt cgcggcctcg 540gacgagccgg gcatctcccc ctcggtgtgg ggctcggacg gggagcgctg gtccacgatc 600tccatgacgc actcgcagct ggagctgcgc gatgccgtgg agcacgcccg caccacgggc 660gacgcctcgg cgatcaccgg cgcagagggg atgctctggc ccacgctgcg ccaggacggg 720ccctccgtct tccgttgggc cgtgtggtcg atggcgaagg tggcccgcga ggcccttgac 780gccgcgggcg tggagcccga ggacctcgcc gcgttcatcc cgcaccaggc caacatgcgg 840atcatcgacg agttcgccaa gcagctgaag ctgccggagt ccgtcgtcgt ggcccgggac 900atcgcggacg ccggcaacac gtcggccgcg tccatcccgc tggccatgca ccggctgctg 960gaggagaacc ccgagctct 97912217PRTArtificial sequenceSynthetic polypeptide 122Asp Thr Xaa Asp Xaa Trp Ile Xaa Xaa Xaa Thr Gly Ile Xaa Xaa Arg 1 5 10 15 Xaa 12318PRTArtificial
sequenceSynthetic polypeptide 123Xaa Xaa Asp Xaa Xaa Ala Xaa Cys Xaa Gly Phe Xaa Xaa Xaa Xaa Xaa 1 5 10 15 Xaa Ala 12415PRTArtificial sequenceSynthetic polypeptide 124Asp Arg Xaa Thr Xaa Xaa Xaa Phe Xaa Asp Gly Ala Xaa Xaa Xaa 1 5 10 15 1258PRTArtificial sequenceSynthetic polypeptide 125His Gln Ala Asn Xaa Arg Ile Xaa 1 5 12619PRTArtificial sequenceSynthetic polypeptide 126Gly Asn Thr Xaa Ala Ala Ser Xaa Pro Xaa Xaa Xaa Xaa Xaa Xaa Xaa 1 5 10 15 Xaa Xaa Gly 12713PRTArtificial sequenceSynthetic polypeptide 127Xaa Xaa Leu Xaa Xaa Phe Gly Gly Gly Xaa Xaa Trp Gly 1 5 10 1284559DNAArtificial sequencepDG2 plasmid 128ggggaattgt gagcggataa caattcccct gtagaaataa ttttgtttaa ctttaataag 60gagatatacc atggcgcaac tcactcttct tttagtcggc aattccgacg ccatcacgcc 120attacttgct aaagctgact ttgaacaacg ttcgcgtctg cagattattc ctgcgcagtc 180agttatcgcc agtgatgccc ggccttcgca agctatccgc gccagtcgtg ggagttcaat 240gcgcgtggcc ctggagctgg tgaaagaagg tcgagcgcaa gcctgtgtca gtgccggtaa 300taccggggcg ctgatggggc tggcaaaatt attactcaag cccctggagg ggattgagcg 360tccggcgctg gtgacggtat taccacatca gcaaaagggc aaaacggtgg tccttgactt 420aggggccaac gtcgattgtg acagcacaat gctggtgcaa tttgccatta tgggctcagt 480tctggctgaa gaggtggtgg aaattcccaa tcctcgcgtg gcgttgctca atattggtga 540agaagaagta aagggtctcg acagtattcg ggatgcctca gcggtgctta aaacaatccc 600ttctatcaat tatatcggct atcttgaagc caatgagttg ttaactggca agacagatgt 660gctggtttgt gacggcttta caggaaatgt cacattaaag acgatggaag gtgttgtcag 720gatgttcctt tctctgctga aatctcaggg tgaagggaaa aaacggtcgt ggtggctact 780gttattaaag cgttggctac aaaagagcct gacgaggcga ttcagtcacc tcaaccccga 840ccagtataac ggcgcctgtc tgttaggatt gcgcggcacg gtgataaaaa gtcatggtgc 900agccaatcag cgagcttttg cggtcgcgat tgaacaggca gtgcaggcgg tgcagcgaca 960agttcctcag cgaattgccg ctcgcctgga atctgtatac ccagctggtt ttgagctgct 1020ggacggtggc aaaagcggaa ctctgcggta gcaggacgct gccagcgaac tcgcagtttg 1080caagtgacgg tatataaccg aaaagtgact gagcgcatat gtatacgaag actcgagtct 1140ggtaaagaaa ccgctgctgc gaaatttgaa cgccagcaca tggactcgtc tactagcgca 
1200gcttaattaa cctaggctgc tgccaccgct gagcaataac tagcataacc ccttggggcc 1260tctaaacggg tcttgagggg ttttttgctg aaacctcagg catttgagaa gcacacggtc 1320acactgcttc cggtagtcaa taaaccggta aaccagcaat agacataagc ggctatttaa 1380cgaccctgcc ctgaaccgac gaccgggtca tcgtggccgg atcttgcggc ccctcggctt 1440gaacgaattg ttagacatta tttgccgact accttggtga tctcgccttt cacgtagtgg 1500acaaattctt ccaactgatc tgcgcgcgag gccaagcgat cttcttcttg tccaagataa 1560gcctgtctag cttcaagtat gacgggctga tactgggccg gcaggcgctc cattgcccag 1620tcggcagcga catccttcgg cgcgattttg ccggttactg cgctgtacca aatgcgggac 1680aacgtaagca ctacatttcg ctcatcgcca gcccagtcgg gcggcgagtt ccatagcgtt 1740aaggtttcat ttagcgcctc aaatagatcc tgttcaggaa ccggatcaaa gagttcctcc 1800gccgctggac ctaccaaggc aacgctatgt tctcttgctt ttgtcagcaa gatagccaga 1860tcaatgtcga tcgtggctgg ctcgaagata cctgcaagaa tgtcattgcg ctgccattct 1920ccaaattgca gttcgcgctt agctggataa cgccacggaa tgatgtcgtc gtgcacaaca 1980atggtgactt ctacagcgcg gagaatctcg ctctctccag gggaagccga agtttccaaa 2040aggtcgttga tcaaagctcg ccgcgttgtt tcatcaagcc ttacggtcac cgtaaccagc 2100aaatcaatat cactgtgtgg cttcaggccg ccatccactg cggagccgta caaatgtacg 2160gccagcaacg tcggttcgag atggcgctcg atgacgccaa ctacctctga tagttgagtc 2220gatacttcgg cgatcaccgc ttccctcata ctcttccttt ttcaatatta ttgaagcatt 2280tatcagggtt attgtctcat gagcggatac atatttgaat gtatttagaa aaataaacaa 2340atagctagct cactcggtcg ctacgctccg ggcgtgagac tgcggcgggc gctgcggaca 2400catacaaagt tacccacaga ttccgtggat aagcagggga ctaacatgtg aggcaaaaca 2460gcagggccgc gccggtggcg tttttccata ggctccgccc tcctgccaga gttcacataa 2520acagacgctt ttccggtgca tctgtgggag ccgtgaggct caaccatgaa tctgacagta 2580cgggcgaaac ccgacaggac ttaaagatcc ccaccgtttc cggcgggtcg ctccctcttg 2640cgctctcctg ttccgaccct gccgtttacc ggatacctgt tccgcctttc tcccttacgg 2700gaagtgtggc gctttctcat agctcacaca ctggtatctc ggctcggtgt aggtcgttcg 2760ctccaagctg ggctgtaagc aagaactccc cgttcagccc gactgctgcg ccttatccgg 2820taactgttca cttgagtcca acccggaaaa gcacggtaaa acgccactgg cagcagccat 2880tggtaactgg gagttcgcag aggatttgtt 
tagctaaaca cgcggttgct cttgaagtgt 2940gcgccaaagt ccggctacac tggaaggaca gatttggttg ctgtgctctg cgaaagccag 3000ttaccacggt taagcagttc cccaactgac ttaaccttcg atcaaaccac ctccccaggt 3060ggttttttcg tttacagggc aaaagattac gcgcagaaaa aaaggatctc aagaagatcc 3120tttgatcttt tctactgaac cgctctagat ttcagtgcaa tttatctctt caaatgtagc 3180acctgaagtc agccccatac gatataagtt gtaattctca tgttagtcat gccccgcgcc 3240caccggaagg agctgactgg gttgaaggct ctcaagggca tcggtcgaga tcccggtgcc 3300taatgagtga gctaacttac attaattgcg ttgcgctcac tgcccgcttt ccagtcggga 3360aacctgtcgt gccagctgca ttaatgaatc ggccaacgcg cggggagagg cggtttgcgt 3420attgggcgcc agggtggttt ttcttttcac cagtgagacg ggcaacagct gattgccctt 3480caccgcctgg ccctgagaga gttgcagcaa gcggtccacg ctggtttgcc ccagcaggcg 3540aaaatcctgt ttgatggtgg ttaacggcgg gatataacat gagctgtctt cggtatcgtc 3600gtatcccact accgagatgt ccgcaccaac gcgcagcccg gactcggtaa tggcgcgcat 3660tgcgcccagc gccatctgat cgttggcaac cagcatcgca gtgggaacga tgccctcatt 3720cagcatttgc atggtttgtt gaaaaccgga catggcactc cagtcgcctt cccgttccgc 3780tatcggctga atttgattgc gagtgagata tttatgccag ccagccagac gcagacgcgc 3840cgagacagaa cttaatgggc ccgctaacag cgcgatttgc tggtgaccca atgcgaccag 3900atgctccacg cccagtcgcg taccgtcttc atgggagaaa ataatactgt tgatgggtgt 3960ctggtcagag acatcaagaa ataacgccgg aacattagtg caggcagctt ccacagcaat 4020ggcatcctgg tcatccagcg gatagttaat gatcagccca ctgacgcgtt gcgcgagaag 4080attgtgcacc gccgctttac aggcttcgac gccgcttcgt tctaccatcg acaccaccac 4140gctggcaccc agttgatcgg cgcgagattt aatcgccgcg acaatttgcg acggcgcgtg 4200cagggccaga ctggaggtgg caacgccaat cagcaacgac tgtttgcccg ccagttgttg 4260tgccacgcgg ttgggaatgt aattcagctc cgccatcgcc gcttccactt tttcccgcgt 4320tttcgcagaa acgtggctgg cctggttcac cacgcgggaa acggtctgat aagagacacc 4380ggcatactct gcgacatcgt ataacgttac tggtttcaca ttcaccaccc tgaattgact 4440ctcttccggg cgctatcatg ccataccgcg aaaggttttg cgccattcga tggtgtccgg 4500gatctcgacg ctctccctta tgcgactcct gcattaggaa attaatacga ctcactata 45591295502DNAArtificial sequencepDG6 plasmid 129ggggaattgt gagcggataa 
caattcccct gtagaaataa ttttgtttaa ctttaataag 60gagatatacc atggcgcaac tcactcttct tttagtcggc aattccgacg ccatcacgcc 120attacttgct aaagctgact ttgaacaacg ttcgcgtctg cagattattc ctgcgcagtc 180agttatcgcc agtgatgccc ggccttcgca agctatccgc gccagtcgtg ggagttcaat 240gcgcgtggcc ctggagctgg tgaaagaagg tcgagcgcaa gcctgtgtca gtgccggtaa 300taccggggcg ctgatggggc tggcaaaatt attactcaag cccctggagg ggattgagcg 360tccggcgctg gtgacggtat taccacatca gcaaaagggc aaaacggtgg tccttgactt 420aggggccaac gtcgattgtg acagcacaat gctggtgcaa tttgccatta tgggctcagt 480tctggctgaa gaggtggtgg aaattcccaa tcctcgcgtg gcgttgctca atattggtga 540agaagaagta aagggtctcg acagtattcg ggatgcctca gcggtgctta aaacaatccc 600ttctatcaat tatatcggct atcttgaagc caatgagttg ttaactggca agacagatgt 660gctggtttgt gacggcttta caggaaatgt cacattaaag acgatggaag gtgttgtcag 720gatgttcctt tctctgctga aatctcaggg tgaagggaaa aaacggtcgt ggtggctact 780gttattaaag cgttggctac aaaagagcct gacgaggcga ttcagtcacc tcaaccccga 840ccagtataac ggcgcctgtc tgttaggatt gcgcggcacg gtgataaaaa gtcatggtgc 900agccaatcag cgagcttttg cggtcgcgat tgaacaggca gtgcaggcgg tgcagcgaca 960agttcctcag cgaattgccg ctcgcctgga atctgtatac ccagctggtt ttgagctgct 1020ggacggtggc aaaagcggaa ctctgcggta gcaggacgct gccagcgaac tcgcagtttg 1080caagtgacgg tatataaccg aaaagtgact gagcgcatat gaaagctggc attcttggtg 1140ttggacgtta cattcctgag aaggttttaa caaatcatga tcttgaaaaa atggttgaaa 1200cttctgacga gtggattcgt acaagaacag gaatagaaga aagaagaatc gcagcagatg 1260atgtgttttc atcacacatg gctgttgcag cagcgaaaaa tgcgctggaa caagctgaag 1320tggctgctga ggatctggat atgatcttgg ttgcaactgt tacacctgat cagtcattcc 1380ctacggtgtc ttgtatgatt caagaacaac tcggcgcgaa gaaagcgtgt gctatggata 1440tcagcgcggc ttgtgcgggc ttcatgtacg gggttgtaac cggtaaacaa tttattgaat 1500ccggaaccta caagcatgtt ctagttgttg gtgtagagaa gctctcaagc attaccgact 1560gggaagaccg caatacagcc gttctgtttg gagacggagc aggcgctgcg gtagtcgggc 1620cagtcagtga tgacagagga atcctttcat ttgaactagg agccgacggc acaggcggtc 1680agcacttgta tctgaatgaa aaacgacata caatcatgaa tggacgagaa gttttcaaat 
1740ttgcagtccg ccaaatggga gaatcatgcg taaatgtcat tgaaaaagcc ggactttcaa 1800aagaggatgt ggactttttg attccgcatc aggcgaacat ccgtatcatg gaagctgctc 1860gcgagcgttt agagcttcct gtcgaaaaga tgtctaaaac tgttcataaa tatggaaata 1920cttctgccgc atccattccg atctctcttg tagaagaatt ggaagccggt aaaatcaaag 1980acggcgatgt ggtcgttatg gtagggttcg gcggaggact aacatggggc gccattgcaa 2040tccgctgggg ccgataaaaa aaaggtgagg tgcactcgag tctggtaaag aaaccgctgc 2100tgcgaaattt gaacgccagc acatggactc gtctactagc gcagcttaat taacctaggc 2160tgctgccacc gctgagcaat aactagcata accccttggg gcctctaaac gggtcttgag 2220gggttttttg ctgaaacctc aggcatttga gaagcacacg gtcacactgc ttccggtagt 2280caataaaccg gtaaaccagc aatagacata agcggctatt taacgaccct gccctgaacc 2340gacgaccggg tcatcgtggc cggatcttgc ggcccctcgg cttgaacgaa ttgttagaca 2400ttatttgccg actaccttgg tgatctcgcc tttcacgtag tggacaaatt cttccaactg 2460atctgcgcgc gaggccaagc gatcttcttc ttgtccaaga taagcctgtc tagcttcaag 2520tatgacgggc tgatactggg ccggcaggcg ctccattgcc cagtcggcag cgacatcctt 2580cggcgcgatt ttgccggtta ctgcgctgta ccaaatgcgg gacaacgtaa gcactacatt 2640tcgctcatcg ccagcccagt cgggcggcga gttccatagc gttaaggttt catttagcgc 2700ctcaaataga tcctgttcag gaaccggatc aaagagttcc tccgccgctg gacctaccaa 2760ggcaacgcta tgttctcttg cttttgtcag caagatagcc agatcaatgt cgatcgtggc 2820tggctcgaag atacctgcaa gaatgtcatt gcgctgccat tctccaaatt gcagttcgcg 2880cttagctgga taacgccacg gaatgatgtc gtcgtgcaca acaatggtga cttctacagc 2940gcggagaatc tcgctctctc caggggaagc cgaagtttcc aaaaggtcgt tgatcaaagc 3000tcgccgcgtt gtttcatcaa gccttacggt caccgtaacc agcaaatcaa tatcactgtg 3060tggcttcagg ccgccatcca ctgcggagcc gtacaaatgt acggccagca acgtcggttc 3120gagatggcgc tcgatgacgc caactacctc tgatagttga gtcgatactt cggcgatcac 3180cgcttccctc atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct 3240catgagcgga tacatatttg aatgtattta gaaaaataaa caaatagcta gctcactcgg 3300tcgctacgct ccgggcgtga gactgcggcg ggcgctgcgg acacatacaa agttacccac 3360agattccgtg gataagcagg ggactaacat gtgaggcaaa acagcagggc cgcgccggtg 3420gcgtttttcc ataggctccg ccctcctgcc 
agagttcaca taaacagacg cttttccggt 3480gcatctgtgg gagccgtgag gctcaaccat gaatctgaca gtacgggcga aacccgacag 3540gacttaaaga tccccaccgt ttccggcggg tcgctccctc ttgcgctctc ctgttccgac 3600cctgccgttt accggatacc tgttccgcct ttctccctta cgggaagtgt ggcgctttct 3660catagctcac acactggtat ctcggctcgg tgtaggtcgt tcgctccaag ctgggctgta 3720agcaagaact ccccgttcag cccgactgct gcgccttatc cggtaactgt tcacttgagt 3780ccaacccgga aaagcacggt aaaacgccac tggcagcagc cattggtaac tgggagttcg 3840cagaggattt gtttagctaa acacgcggtt gctcttgaag tgtgcgccaa agtccggcta 3900cactggaagg acagatttgg ttgctgtgct ctgcgaaagc cagttaccac ggttaagcag 3960ttccccaact gacttaacct tcgatcaaac cacctcccca ggtggttttt tcgtttacag 4020ggcaaaagat tacgcgcaga aaaaaaggat ctcaagaaga tcctttgatc ttttctactg 4080aaccgctcta gatttcagtg caatttatct cttcaaatgt agcacctgaa gtcagcccca 4140tacgatataa gttgtaattc tcatgttagt catgccccgc gcccaccgga aggagctgac 4200tgggttgaag gctctcaagg gcatcggtcg agatcccggt gcctaatgag tgagctaact 4260tacattaatt gcgttgcgct cactgcccgc tttccagtcg ggaaacctgt cgtgccagct 4320gcattaatga atcggccaac gcgcggggag aggcggtttg cgtattgggc gccagggtgg 4380tttttctttt caccagtgag acgggcaaca gctgattgcc cttcaccgcc tggccctgag 4440agagttgcag caagcggtcc acgctggttt gccccagcag gcgaaaatcc tgtttgatgg 4500tggttaacgg cgggatataa catgagctgt cttcggtatc gtcgtatccc actaccgaga 4560tgtccgcacc aacgcgcagc ccggactcgg taatggcgcg cattgcgccc agcgccatct 4620gatcgttggc aaccagcatc gcagtgggaa cgatgccctc attcagcatt tgcatggttt 4680gttgaaaacc ggacatggca ctccagtcgc cttcccgttc cgctatcggc tgaatttgat 4740tgcgagtgag atatttatgc cagccagcca gacgcagacg cgccgagaca gaacttaatg 4800ggcccgctaa cagcgcgatt tgctggtgac ccaatgcgac cagatgctcc acgcccagtc 4860gcgtaccgtc ttcatgggag aaaataatac tgttgatggg tgtctggtca gagacatcaa 4920gaaataacgc cggaacatta gtgcaggcag cttccacagc aatggcatcc tggtcatcca 4980gcggatagtt aatgatcagc ccactgacgc gttgcgcgag aagattgtgc accgccgctt 5040tacaggcttc gacgccgctt cgttctacca tcgacaccac cacgctggca cccagttgat 5100cggcgcgaga tttaatcgcc gcgacaattt gcgacggcgc gtgcagggcc agactggagg 
5160tggcaacgcc aatcagcaac gactgtttgc ccgccagttg ttgtgccacg cggttgggaa 5220tgtaattcag ctccgccatc gccgcttcca ctttttcccg cgttttcgca gaaacgtggc 5280tggcctggtt caccacgcgg gaaacggtct gataagagac accggcatac tctgcgacat 5340cgtataacgt tactggtttc acattcacca ccctgaattg actctcttcc gggcgctatc 5400atgccatacc gcgaaaggtt ttgcgccatt cgatggtgtc cgggatctcg acgctctccc 5460ttatgcgact cctgcattag gaaattaata cgactcacta ta 55021305541DNAArtificial sequencepDG7 plasmid 130ggggaattgt gagcggataa caattcccct gtagaaataa ttttgtttaa ctttaataag 60gagatatacc atggcgcaac tcactcttct tttagtcggc aattccgacg ccatcacgcc 120attacttgct aaagctgact ttgaacaacg ttcgcgtctg cagattattc ctgcgcagtc 180agttatcgcc agtgatgccc ggccttcgca agctatccgc gccagtcgtg ggagttcaat 240gcgcgtggcc ctggagctgg tgaaagaagg tcgagcgcaa gcctgtgtca gtgccggtaa 300taccggggcg ctgatggggc tggcaaaatt attactcaag cccctggagg ggattgagcg 360tccggcgctg gtgacggtat taccacatca gcaaaagggc aaaacggtgg tccttgactt 420aggggccaac gtcgattgtg acagcacaat gctggtgcaa tttgccatta tgggctcagt 480tctggctgaa gaggtggtgg aaattcccaa tcctcgcgtg gcgttgctca atattggtga 540agaagaagta aagggtctcg acagtattcg ggatgcctca gcggtgctta aaacaatccc 600ttctatcaat tatatcggct atcttgaagc caatgagttg ttaactggca agacagatgt 660gctggtttgt gacggcttta caggaaatgt cacattaaag acgatggaag gtgttgtcag 720gatgttcctt tctctgctga aatctcaggg tgaagggaaa aaacggtcgt ggtggctact 780gttattaaag cgttggctac aaaagagcct gacgaggcga ttcagtcacc tcaaccccga 840ccagtataac ggcgcctgtc tgttaggatt gcgcggcacg gtgataaaaa gtcatggtgc 900agccaatcag cgagcttttg cggtcgcgat tgaacaggca gtgcaggcgg tgcagcgaca 960agttcctcag cgaattgccg ctcgcctgga atctgtatac ccagctggtt ttgagctgct 1020ggacggtggc aaaagcggaa ctctgcggta gcaggacgct gccagcgaac tcgcagtttg 1080caagtgacgg tatataaccg aaaagtgact gagcgcatat gtcaaaagca aaaattacag 1140ctatcggcac ctatgcgccg agcagacgtt taaccaatgc agatttagaa aagatcgttg 1200atacctctga tgaatggatc gttcagcgca caggaatgag agaacgccgg attgcggatg 1260aacatcaatt tacctctgat ttatgcatag aagcggtgaa gaatctcaag agccgttata 1320aaggaacgct tgatgatgtc 
gatatgatcc tcgttgccac aaccacatcc gattacgcct 1380ttccgagtac ggcatgccgc gtacaggaat atttcggctg ggaaagcacc ggcgcgctgg 1440atattaatgc gacatgcgcc gggctgacat acggcctcca tttggcaaat ggattgatca 1500catctggcct tcatcaaaaa attctcgtca tcgccggaga gacgttatca aaggtaaccg 1560attatacgga tcgaacgaca tgcgtactgt tcggcgatgc cgcgggtgcg ctgttagtag 1620aacgagatga agagacgccg ggatttcttg cgtctgtaca aggaacaagc gggaacggcg 1680gcgatatttt gtatcgtgcc ggactgcgaa atgaaataaa cggtgtgcag cttgtcggtt 1740ccggaaaaat ggtgcaaaac ggacgcgagg tatataaatg ggccgcaaga accgtccctg 1800gcgaatttga acggctttta cataaagcag gactcagctc cgatgatctc gattggtttg 1860ttcctcacag cgccaacttg cgcatgatcg agtcaatttg tgaaaaaaca ccgttcccga 1920ttgaaaaaac gctcactagc gttgagcact acggaaacac gtcttcggtt tcaattgttt 1980tggcgctcga tctcgcagtg aaagccggga agctgaaaaa agatcaaatc gttttgcttt 2040tcgggtttgg cggcggatta acctatacag gattgcttat taaatggggg atgtaaagat 2100ctcctaggcg tcactcgagt ctggtaaaga aaccgctgct gcgaaatttg aacgccagca 2160catggactcg tctactagcg cagcttaatt aacctaggct gctgccaccg ctgagcaata 2220actagcataa ccccttgggg cctctaaacg ggtcttgagg ggttttttgc tgaaacctca 2280ggcatttgag aagcacacgg tcacactgct tccggtagtc aataaaccgg taaaccagca 2340atagacataa gcggctattt aacgaccctg ccctgaaccg acgaccgggt catcgtggcc 2400ggatcttgcg gcccctcggc ttgaacgaat tgttagacat tatttgccga ctaccttggt 2460gatctcgcct ttcacgtagt ggacaaattc ttccaactga tctgcgcgcg aggccaagcg 2520atcttcttct tgtccaagat aagcctgtct agcttcaagt atgacgggct gatactgggc 2580cggcaggcgc tccattgccc agtcggcagc gacatccttc ggcgcgattt tgccggttac 2640tgcgctgtac caaatgcggg acaacgtaag cactacattt cgctcatcgc cagcccagtc 2700gggcggcgag ttccatagcg ttaaggtttc atttagcgcc tcaaatagat cctgttcagg 2760aaccggatca aagagttcct ccgccgctgg acctaccaag gcaacgctat gttctcttgc 2820ttttgtcagc aagatagcca gatcaatgtc gatcgtggct ggctcgaaga tacctgcaag 2880aatgtcattg cgctgccatt ctccaaattg cagttcgcgc ttagctggat aacgccacgg 2940aatgatgtcg tcgtgcacaa caatggtgac ttctacagcg cggagaatct cgctctctcc 3000aggggaagcc gaagtttcca aaaggtcgtt gatcaaagct cgccgcgttg 
tttcatcaag 3060ccttacggtc accgtaacca gcaaatcaat atcactgtgt ggcttcaggc cgccatccac 3120tgcggagccg tacaaatgta cggccagcaa cgtcggttcg agatggcgct cgatgacgcc 3180aactacctct gatagttgag tcgatacttc ggcgatcacc gcttccctca tactcttcct 3240ttttcaatat tattgaagca tttatcaggg ttattgtctc atgagcggat acatatttga 3300atgtatttag aaaaataaac aaatagctag ctcactcggt cgctacgctc cgggcgtgag 3360actgcggcgg gcgctgcgga cacatacaaa gttacccaca gattccgtgg ataagcaggg 3420gactaacatg tgaggcaaaa cagcagggcc gcgccggtgg cgtttttcca taggctccgc 3480cctcctgcca gagttcacat aaacagacgc ttttccggtg catctgtggg agccgtgagg 3540ctcaaccatg aatctgacag tacgggcgaa acccgacagg acttaaagat ccccaccgtt 3600tccggcgggt cgctccctct tgcgctctcc tgttccgacc ctgccgttta ccggatacct 3660gttccgcctt tctcccttac gggaagtgtg gcgctttctc atagctcaca cactggtatc 3720tcggctcggt gtaggtcgtt cgctccaagc tgggctgtaa gcaagaactc cccgttcagc 3780ccgactgctg cgccttatcc ggtaactgtt cacttgagtc caacccggaa aagcacggta 3840aaacgccact ggcagcagcc attggtaact gggagttcgc agaggatttg tttagctaaa 3900cacgcggttg ctcttgaagt gtgcgccaaa gtccggctac actggaagga cagatttggt 3960tgctgtgctc tgcgaaagcc agttaccacg gttaagcagt tccccaactg acttaacctt 4020cgatcaaacc acctccccag gtggtttttt cgtttacagg gcaaaagatt acgcgcagaa 4080aaaaaggatc tcaagaagat cctttgatct tttctactga accgctctag atttcagtgc 4140aatttatctc ttcaaatgta gcacctgaag tcagccccat
acgatataag ttgtaattct 4200catgttagtc atgccccgcg cccaccggaa ggagctgact gggttgaagg ctctcaaggg 4260catcggtcga gatcccggtg cctaatgagt gagctaactt acattaattg cgttgcgctc 4320actgcccgct ttccagtcgg gaaacctgtc gtgccagctg cattaatgaa tcggccaacg 4380cgcggggaga ggcggtttgc gtattgggcg ccagggtggt ttttcttttc accagtgaga 4440cgggcaacag ctgattgccc ttcaccgcct ggccctgaga gagttgcagc aagcggtcca 4500cgctggtttg ccccagcagg cgaaaatcct gtttgatggt ggttaacggc gggatataac 4560atgagctgtc ttcggtatcg tcgtatccca ctaccgagat gtccgcacca acgcgcagcc 4620cggactcggt aatggcgcgc attgcgccca gcgccatctg atcgttggca accagcatcg 4680cagtgggaac gatgccctca ttcagcattt gcatggtttg ttgaaaaccg gacatggcac 4740tccagtcgcc ttcccgttcc gctatcggct gaatttgatt gcgagtgaga tatttatgcc 4800agccagccag acgcagacgc gccgagacag aacttaatgg gcccgctaac agcgcgattt 4860gctggtgacc caatgcgacc agatgctcca cgcccagtcg cgtaccgtct tcatgggaga 4920aaataatact gttgatgggt gtctggtcag agacatcaag aaataacgcc ggaacattag 4980tgcaggcagc ttccacagca atggcatcct ggtcatccag cggatagtta atgatcagcc 5040cactgacgcg ttgcgcgaga agattgtgca ccgccgcttt acaggcttcg acgccgcttc 5100gttctaccat cgacaccacc acgctggcac ccagttgatc ggcgcgagat ttaatcgccg 5160cgacaatttg cgacggcgcg tgcagggcca gactggaggt ggcaacgcca atcagcaacg 5220actgtttgcc cgccagttgt tgtgccacgc ggttgggaat gtaattcagc tccgccatcg 5280ccgcttccac tttttcccgc gttttcgcag aaacgtggct ggcctggttc accacgcggg 5340aaacggtctg ataagagaca ccggcatact ctgcgacatc gtataacgtt actggtttca 5400cattcaccac cctgaattga ctctcttccg ggcgctatca tgccataccg cgaaaggttt 5460tgcgccattc gatggtgtcc gggatctcga cgctctccct tatgcgactc ctgcattagg 5520aaattaatac gactcactat a 55411315582DNAArtificial sequencepDG8 plasmid 131ggggaattgt gagcggataa caattcccct gtagaaataa ttttgtttaa ctttaataag 60gagatatacc atggcgcaac tcactcttct tttagtcggc aattccgacg ccatcacgcc 120attacttgct aaagctgact ttgaacaacg ttcgcgtctg cagattattc ctgcgcagtc 180agttatcgcc agtgatgccc ggccttcgca agctatccgc gccagtcgtg ggagttcaat 240gcgcgtggcc ctggagctgg tgaaagaagg tcgagcgcaa gcctgtgtca gtgccggtaa 300taccggggcg 
ctgatggggc tggcaaaatt attactcaag cccctggagg ggattgagcg 360tccggcgctg gtgacggtat taccacatca gcaaaagggc aaaacggtgg tccttgactt 420aggggccaac gtcgattgtg acagcacaat gctggtgcaa tttgccatta tgggctcagt 480tctggctgaa gaggtggtgg aaattcccaa tcctcgcgtg gcgttgctca atattggtga 540agaagaagta aagggtctcg acagtattcg ggatgcctca gcggtgctta aaacaatccc 600ttctatcaat tatatcggct atcttgaagc caatgagttg ttaactggca agacagatgt 660gctggtttgt gacggcttta caggaaatgt cacattaaag acgatggaag gtgttgtcag 720gatgttcctt tctctgctga aatctcaggg tgaagggaaa aaacggtcgt ggtggctact 780gttattaaag cgttggctac aaaagagcct gacgaggcga ttcagtcacc tcaaccccga 840ccagtataac ggcgcctgtc tgttaggatt gcgcggcacg gtgataaaaa gtcatggtgc 900agccaatcag cgagcttttg cggtcgcgat tgaacaggca gtgcaggcgg tgcagcgaca 960agttcctcag cgaattgccg ctcgcctgga atctgtatac ccagctggtt ttgagctgct 1020ggacggtggc aaaagcggaa ctctgcggta gcaggacgct gccagcgaac tcgcagtttg 1080caagtgacgg tatataaccg aaaagtgact gagcgcatat gtctaagatc aagccaagca 1140agggcgctcc gtacgcgcgc atcctgggcg tcggcggtta ccgtccgacc cgtgtggtgc 1200cgaacgaggt gatcctggag aagatcgact cttccgacga gtggattcgc tctcgctccg 1260gcatcgaaac gcgtcactgg gcgggtccgg aagaaaccgt cgcggcgatg tctgtggagg 1320cctccggcaa ggcactggcc gacgccggta tcgacgcctc tcgtatcggt gccgtggtag 1380tctctaccgt gtctcacttc agccagaccc cggccatcgc caccgagatc gccgaccgcc 1440tgggcacgga caaggccgca gccttcgaca tctctgccgg ctgcgcgggc ttcggctacg 1500gtctgaccct ggccaagggc atggtcgtcg aaggttctgc ggagtacgtg ctggtcatcg 1560gcgtggagcg tctgtccgac ctgaccgacc tggaggaccg tgccacggcc ttcctgttcg 1620gcgacggcgc tggtgcggtc gtggtcggcc cgtcccagga gccggcaatc ggcccgacgg 1680tctggggctc tgagggcgac aaggccgaaa cgatcaagca gaccgtttcc tgggaccgct 1740tccgtatcgg cgatgtctcc gaactgccgc tggactccga gggcaacgtc aagtttcctg 1800cgatcacgca ggagggccag gcggtgttcc gctgggccgt gttcgagatg gcgaaggtcg 1860cgcagcaggc gctggacgcg gcgggtatca gcccggacga cctggacgtc tttatcccgc 1920accaggccaa tgtgcgtatc atcgactcta tggtgaaaac cctgaagctg ccggagcacg 1980tcacggtcgc ccgtgacatc cgcaccaccg gcaacacctc tgccgcctct 
attccgctgg 2040cgatggagcg tctgctggcg accggcgacg cgcgtagcgg cgacaccgcg ctggtcatcg 2100gcttcggtgc gggtctggtc tacgccgcga cggtcgttac cctgccgtaa ccacctcgag 2160tctggtaaag aaaccgctgc tgcgaaattt gaacgccagc acatggactc gtctactagc 2220gcagcttaat taacctaggc tgctgccacc gctgagcaat aactagcata accccttggg 2280gcctctaaac gggtcttgag gggttttttg ctgaaacctc aggcatttga gaagcacacg 2340gtcacactgc ttccggtagt caataaaccg gtaaaccagc aatagacata agcggctatt 2400taacgaccct gccctgaacc gacgaccggg tcatcgtggc cggatcttgc ggcccctcgg 2460cttgaacgaa ttgttagaca ttatttgccg actaccttgg tgatctcgcc tttcacgtag 2520tggacaaatt cttccaactg atctgcgcgc gaggccaagc gatcttcttc ttgtccaaga 2580taagcctgtc tagcttcaag tatgacgggc tgatactggg ccggcaggcg ctccattgcc 2640cagtcggcag cgacatcctt cggcgcgatt ttgccggtta ctgcgctgta ccaaatgcgg 2700gacaacgtaa gcactacatt tcgctcatcg ccagcccagt cgggcggcga gttccatagc 2760gttaaggttt catttagcgc ctcaaataga tcctgttcag gaaccggatc aaagagttcc 2820tccgccgctg gacctaccaa ggcaacgcta tgttctcttg cttttgtcag caagatagcc 2880agatcaatgt cgatcgtggc tggctcgaag atacctgcaa gaatgtcatt gcgctgccat 2940tctccaaatt gcagttcgcg cttagctgga taacgccacg gaatgatgtc gtcgtgcaca 3000acaatggtga cttctacagc gcggagaatc tcgctctctc caggggaagc cgaagtttcc 3060aaaaggtcgt tgatcaaagc tcgccgcgtt gtttcatcaa gccttacggt caccgtaacc 3120agcaaatcaa tatcactgtg tggcttcagg ccgccatcca ctgcggagcc gtacaaatgt 3180acggccagca acgtcggttc gagatggcgc tcgatgacgc caactacctc tgatagttga 3240gtcgatactt cggcgatcac cgcttccctc atactcttcc tttttcaata ttattgaagc 3300atttatcagg gttattgtct catgagcgga tacatatttg aatgtattta gaaaaataaa 3360caaatagcta gctcactcgg tcgctacgct ccgggcgtga gactgcggcg ggcgctgcgg 3420acacatacaa agttacccac agattccgtg gataagcagg ggactaacat gtgaggcaaa 3480acagcagggc cgcgccggtg gcgtttttcc ataggctccg ccctcctgcc agagttcaca 3540taaacagacg cttttccggt gcatctgtgg gagccgtgag gctcaaccat gaatctgaca 3600gtacgggcga aacccgacag gacttaaaga tccccaccgt ttccggcggg tcgctccctc 3660ttgcgctctc ctgttccgac cctgccgttt accggatacc tgttccgcct ttctccctta 3720cgggaagtgt ggcgctttct 
catagctcac acactggtat ctcggctcgg tgtaggtcgt 3780tcgctccaag ctgggctgta agcaagaact ccccgttcag cccgactgct gcgccttatc 3840cggtaactgt tcacttgagt ccaacccgga aaagcacggt aaaacgccac tggcagcagc 3900cattggtaac tgggagttcg cagaggattt gtttagctaa acacgcggtt gctcttgaag 3960tgtgcgccaa agtccggcta cactggaagg acagatttgg ttgctgtgct ctgcgaaagc 4020cagttaccac ggttaagcag ttccccaact gacttaacct tcgatcaaac cacctcccca 4080ggtggttttt tcgtttacag ggcaaaagat tacgcgcaga aaaaaaggat ctcaagaaga 4140tcctttgatc ttttctactg aaccgctcta gatttcagtg caatttatct cttcaaatgt 4200agcacctgaa gtcagcccca tacgatataa gttgtaattc tcatgttagt catgccccgc 4260gcccaccgga aggagctgac tgggttgaag gctctcaagg gcatcggtcg agatcccggt 4320gcctaatgag tgagctaact tacattaatt gcgttgcgct cactgcccgc tttccagtcg 4380ggaaacctgt cgtgccagct gcattaatga atcggccaac gcgcggggag aggcggtttg 4440cgtattgggc gccagggtgg tttttctttt caccagtgag acgggcaaca gctgattgcc 4500cttcaccgcc tggccctgag agagttgcag caagcggtcc acgctggttt gccccagcag 4560gcgaaaatcc tgtttgatgg tggttaacgg cgggatataa catgagctgt cttcggtatc 4620gtcgtatccc actaccgaga tgtccgcacc aacgcgcagc ccggactcgg taatggcgcg 4680cattgcgccc agcgccatct gatcgttggc aaccagcatc gcagtgggaa cgatgccctc 4740attcagcatt tgcatggttt gttgaaaacc ggacatggca ctccagtcgc cttcccgttc 4800cgctatcggc tgaatttgat tgcgagtgag atatttatgc cagccagcca gacgcagacg 4860cgccgagaca gaacttaatg ggcccgctaa cagcgcgatt tgctggtgac ccaatgcgac 4920cagatgctcc acgcccagtc gcgtaccgtc ttcatgggag aaaataatac tgttgatggg 4980tgtctggtca gagacatcaa gaaataacgc cggaacatta gtgcaggcag cttccacagc 5040aatggcatcc tggtcatcca gcggatagtt aatgatcagc ccactgacgc gttgcgcgag 5100aagattgtgc accgccgctt tacaggcttc gacgccgctt cgttctacca tcgacaccac 5160cacgctggca cccagttgat cggcgcgaga tttaatcgcc gcgacaattt gcgacggcgc 5220gtgcagggcc agactggagg tggcaacgcc aatcagcaac gactgtttgc ccgccagttg 5280ttgtgccacg cggttgggaa tgtaattcag ctccgccatc gccgcttcca ctttttcccg 5340cgttttcgca gaaacgtggc tggcctggtt caccacgcgg gaaacggtct gataagagac 5400accggcatac tctgcgacat cgtataacgt tactggtttc acattcacca 
ccctgaattg 5460actctcttcc gggcgctatc atgccatacc gcgaaaggtt ttgcgccatt cgatggtgtc 5520cgggatctcg acgctctccc ttatgcgact cctgcattag gaaattaata cgactcacta 5580ta 55821325678DNAArtificial sequencepDG10 plasmid 132agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240ttaggtgacg cgttagaata ctcaagctat gcatcaagct tggtaccgag ctcggatcca 300ctagtaacgg ccgccagtgt gctggaattc aggcttaact tcatgtgaaa agtttgttaa 360aatataaatg agcacgttaa tcatttaaca tagataatta aatagtaaaa gggagtgtac 420gaccagtgat taagagtttt aatgaaatta tcatgaaggt aaagagcaaa gaaatgaaaa 480aagttgctgt tgctgtagca caagacgagc cagtacttga agcagtaaga gatgctaaga 540aaaatggtat tgcagatgct attcttgttg gagaccatga cgaaatcgtg tcaatcgcgc 600ttaaaatagg aatggatgta aatgattttg aaatagtaaa cgagcctaac gttaagaaag 660ctgctttaaa ggcagtagag cttgtatcaa ctggaaaagc tgatatggta atgaagggac 720ttgtaaatac agcaactttc ttaagatctg tattaaacaa agaagttgga cttagaacag 780gaaaaactat gtctcacgtt gcagtatttg aaactgagaa atttgataga ctattatttt 840taacagatgt tgctttcaat acttatcctg aattaaagga aaaaattgat atagtaaaca 900attcagttaa ggttgcacat gcaataggaa ttgaaaatcc aaaggttgct ccaatttgtg 960cagttgaggt tataaaccct aaaatgccat caacacttga tgcagcaatg ctttcaaaaa 1020tgagtgacag aggacaaatt aaaggttgtg tagttgacgg acctttagca cttgatatag 1080ctttatcaga agaagcagca catcataagg gagtaacagg agaagttgct ggaaaagctg 1140atatcttctt aatgccaaac atagaaacag gaaatgtaat gtataagact ttaacatata 1200caactgattc aaaaaatgga ggaatcttag ttggaacttc tgcaccagtt gttttaactt 1260caagagctga cagccatgaa acaaaaatga actctatagc acttgcagct ttagttgcag 1320gcaataaata aattaaagtt aagtggagga atgttaacat gtatagatta ctaataatca 1380atcctggctc gacctcaact aaaattggta tttatgacga tgaaaaagag atatttgaga 1440agactttaag acattcagct gaagagatag aaaaatataa cactatattt gatcaatttc 1500aattcagaaa gaatgtaatt ttagatgcgt taaaagaagc aaacatagaa 
gtaagttctt 1560taaatgctgt agttggaaga ggcggactct taaagccaat agtaagtgga acttatgcag 1620taaatcaaaa aatgcttgaa gaccttaaag taggagttca aggtcagcat gcgtcaaatc 1680ttggtggaat tattgcaaat gaaatagcaa aagaaataaa tgttccagca tacatagttg 1740atccagttgt tgtggatgag cttgatgaag tttcaagaat atcaggaatg gctgacattc 1800caagaaaaag tatattccat gcattaaatc aaaaagcagt tgctagaaga tatgcaaaag 1860aagttggaaa aaaatacgaa gatcttaatt taatcgtagt ccacatgggt ggaggtactt 1920cagtaggtac tcataaagat ggtagagtaa tagaagttaa taatacactt gatggagaag 1980gtccattctc accagaaaga agtggtggag ttccaatagg agatcttgta agattgtgct 2040tcagcaacaa atatacttat gaagaagtaa tgaaaaagat aaacggcaaa ggcggagttg 2100ttagttactt aaatactatc gattttaagg ctgtagttga taaagctctt gaaggagata 2160agaaatgtgc acttatatat gaagctttca cattccaggt agcaaaagag ataggaaaat 2220gttcaaccgt tttaaaagga aatgtagatg caataatctt aacaggcgga attgcgtaca 2280acgagcatgt atgtaatgcc atagaggata gagtaaaatt catagcacct gtagttagat 2340atggtggaga agatgaactt cttgcacttg cagaaggtgg acttagagtt ttaagaggag 2400aagaaaaagc taaggaatac aaataataaa gtcataaata atataatata accagtaccc 2460atgtttataa aacttttgcc ctataaacat gggtattgtc ctgaattctg cagatatcca 2520tcacactggc ggccgctcga gcatgcatct agagggccca attcgcccta tagtgagtcg 2580tattacaatt cactggccgt cgttttacaa cgtcgtgact gggaaaaccc tggcgttacc 2640caacttaatc gccttgcagc acatccccct ttcgccagct ggcgtaatag cgaagaggcc 2700cgcaccgatc gcccttccca acagttgcgc agcctatacg tacggcagtt taaggtttac 2760acctataaaa gagagagccg ttatcgtctg tttgtggatg tacagagtga tattattgac 2820acgccggggc gacggatggt gatccccctg gccagtgcac gtctgctgtc agataaagtc 2880tcccgtgaac tttacccggt ggtgcatatc ggggatgaaa gctggcgcat gatgaccacc 2940gatatggcca gtgtgccggt ctccgttatc ggggaagaag tggctgatct cagccaccgc 3000gaaaatgaca tcaaaaacgc cattaacctg atgttctggg gaatataaat gtcaggcatg 3060agattatcaa aaaggatctt cacctagatc cttttcacgt agaaagccag tccgcagaaa 3120cggtgctgac cccggatgaa tgtcagctac tgggctatct ggacaaggga aaacgcaagc 3180gcaaagagaa agcaggtagc ttgcagtggg cttacatggc gatagctaga ctgggcggtt 3240ttatggacag caagcgaacc 
ggaattgcca gctggggcgc cctctggtaa ggttgggaag 3300ccctgcaaag taaactggat ggctttcttg ccgccaagga tctgatggcg caggggatca 3360agctctgatc aagagacagg atgaggatcg tttcgcatga ttgaacaaga tggattgcac 3420gcaggttctc cggccgcttg ggtggagagg ctattcggct atgactgggc acaacagaca 3480atcggctgct ctgatgccgc cgtgttccgg ctgtcagcgc aggggcgccc ggttcttttt 3540gtcaagaccg acctgtccgg tgccctgaat gaactgcaag acgaggcagc gcggctatcg 3600tggctggcca cgacgggcgt tccttgcgca gctgtgctcg acgttgtcac tgaagcggga 3660agggactggc tgctattggg cgaagtgccg gggcaggatc tcctgtcatc tcaccttgct 3720cctgccgaga aagtatccat catggctgat gcaatgcggc ggctgcatac gcttgatccg 3780gctacctgcc cattcgacca ccaagcgaaa catcgcatcg agcgagcacg tactcggatg 3840gaagccggtc ttgtcgatca ggatgatctg gacgaagagc atcaggggct cgcgccagcc 3900gaactgttcg ccaggctcaa ggcgagcatg cccgacggcg aggatctcgt cgtgacccat 3960ggcgatgcct gcttgccgaa tatcatggtg gaaaatggcc gcttttctgg attcatcgac 4020tgtggccggc tgggtgtggc ggaccgctat caggacatag cgttggctac ccgtgatatt 4080gctgaagagc ttggcggcga atgggctgac cgcttcctcg tgctttacgg tatcgccgct 4140cccgattcgc agcgcatcgc cttctatcgc cttcttgacg agttcttctg aattattaac 4200gcttacaatt tcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc 4260atcaggtggc acttttcggg gaaatgtgcg cggaacccct atttgtttat ttttctaaat 4320acattcaaat atgtatccgc tcatgagaca ataaccctga taaatgcttc aataatagca 4380cgtgaggagg gccaccatgg ccaagttgac cagtgccgtt ccggtgctca ccgcgcgcga 4440cgtcgccgga gcggtcgagt tctggaccga ccggctcggg ttctcccggg acttcgtgga 4500ggacgacttc gccggtgtgg tccgggacga cgtgaccctg ttcatcagcg cggtccagga 4560ccaggtggtg ccggacaaca ccctggcctg ggtgtgggtg cgcggcctgg acgagctgta 4620cgccgagtgg tcggaggtcg tgtccacgaa cttccgggac gcctccgggc cggccatgac 4680cgagatcggc gagcagccgt gggggcggga gttcgccctg cgcgacccgg ccggcaactg 4740cgtgcacttc gtggccgagg agcaggactg acacgtgcta aaacttcatt tttaatttaa 4800aaggatctag gtgaagatcc tttttgataa tctcatgacc aaaatccctt aacgtgagtt 4860ttcgttccac tgagcgtcag accccgtaga aaagatcaaa ggatcttctt gagatccttt 4920ttttctgcgc gtaatctgct gcttgcaaac aaaaaaacca ccgctaccag 
cggtggtttg 4980tttgccggat caagagctac caactctttt tccgaaggta actggcttca gcagagcgca 5040gataccaaat actgttcttc tagtgtagcc gtagttaggc caccacttca agaactctgt 5100agcaccgcct acatacctcg ctctgctaat cctgttacca gtggctgctg ccagtggcga 5160taagtcgtgt cttaccgggt tggactcaag acgatagtta ccggataagg cgcagcggtc 5220gggctgaacg gggggttcgt gcacacagcc cagcttggag cgaacgacct acaccgaact 5280gagataccta cagcgtgagc tatgagaaag cgccacgctt cccgaaggga gaaaggcgga 5340caggtatccg gtaagcggca gggtcggaac aggagagcgc acgagggagc ttccaggggg 5400aaacgcctgg tatctttata gtcctgtcgg gtttcgccac ctctgacttg agcgtcgatt 5460tttgtgatgc tcgtcagggg ggcggagcct atggaaaaac gccagcaacg cggccttttt 5520acggttcctg gccttttgct ggccttttgc tcacatgttc tttcctgcgt tatcccctga 5580ttctgtggat aaccgtatta ccgcctttga gtgagctgat accgctcgcc gcagccgaac 5640gaccgagcgc agcgagtcag tgagcgagga agcggaag 56781333651DNAArtificial sequencepLS9-111 plasmid 133tagaaaaact catcgagcat caaatgaaac tgcaatttat tcatatcagg attatcaata 60ccatattttt gaaaaagccg tttctgtaat gaaggagaaa actcaccgag gcagttccat 120aggatggcaa gatcctggta tcggtctgcg attccgactc gtccaacatc aatacaacct 180attaatttcc cctcgtcaaa aataaggtta tcaagtgaga aatcaccatg agtgacgact 240gaatccggtg agaatggcaa aagtttatgc atttctttcc agacttgttc aacaggccag 300ccattacgct cgtcatcaaa atcactcgca tcaaccaaac cgttattcat tcgtgattgc 360gcctgagcga ggcgaaatac gcgatcgctg ttaaaaggac aattacaaac aggaatcgag 420tgcaaccggc gcaggaacac tgccagcgca tcaacaatat tttcacctga atcaggatat 480tcttctaata cctggaacgc tgtttttccg gggatcgcag tggtgagtaa ccatgcatca 540tcaggagtac ggataaaatg cttgatggtc ggaagtggca taaattccgt cagccagttt 600agtctgacca tctcatctgt aacatcattg gcaacgctac ctttgccatg tttcagaaac 660aactctggcg catcgggctt cccatacaag cgatagattg tcgcacctga ttgcccgaca 720ttatcgcgag cccatttata cccatataaa tcagcatcca tgttggaatt taatcgcggc 780ctcgacgttt cccgttgaat atggctcata ttcttccttt ttcaatatta ttgaagcatt 840tatcagggtt attgtctcat gagcggatac atatttgaat gtatttagaa aaataaacaa 900ataggggtca gtgttacaac caattaacca attctgaaca ttatcgcgag cccatttata 960cctgaatatg 
gctcataaca ccccttgttt gcctggcggc agtagcgcgg tggtcccacc 1020tgaccccatg ccgaactcag aagtgaaacg ccgtagcgcc gatggtagtg tggggactcc 1080ccatgcgaga gtagggaact gccaggcatc aaataaaacg aaaggctcag tcgaaagact 1140gggcctttcg cccgggctaa ttagggggtg tcgcccttac gtacgtactc gattgacgcc 1200taggagatct ttacatcccc catttaataa gcaatcctgt ataggttaat ccgccgccaa 1260acccgaaaag caaaacgatt tgatcttttt tcagcttccc ggctttcact gcgagatcga 1320gcgccaaaac aattgaaacc gaagacgtgt ttccgtagtg ctcaacgcta gtgagcgttt 1380tttcaatcgg gaacggtgtt ttttcacaaa ttgactcgat catgcgcaag ttggcgctgt 1440gaggaacaaa ccaatcgaga tcatcggagc tgagtcctgc tttatgtaaa agccgttcaa 1500attcgccagg gacggttctt gcggcccatt tatatacctc gcgtccgttt tgcaccattt 1560ttccggaacc gacaagctgc acaccgttta tttcatttcg cagtccggca cgatacaaaa 1620tatcgccgcc gttcccgctt gttccttgta cagacgcaag aaatcccggc gtctcttcat 1680ctcgttctac taacagcgca cccgcggcat cgccgaacag tacgcatgtc gttcgatccg 1740tataatcggt tacctttgat aacgtctctc cggcgatgac gagaattttt tgatgaaggc 1800cagatgtgat caatccattt gccaaatgga ggccgtatgt cagcccggcg catgtcgcat 1860taatatccag cgcgccggtg ctttcccagc cgaaatattc ctgtacgcgg catgccgtac 1920tcggaaaggc gtaatcggat gtggttgtgg caacgaggat catatcgaca tcatcaagcg 1980ttcctttata acggctcttg agattcttca ccgcttctat gcataaatca gaggtaaatt 2040gatgttcatc cgcaatccgg cgttctctca ttcctgtgcg ctgaacgatc cattcatcag 2100aggtatcaac gatcttttct aaatctgcat tggttaaacg tctgctcggc gcataggtgc 2160cgatagctgt aatttttgct tttgacatat gtcagcgaaa
gggcgacaca aaatttattc 2220taaatgcata ataaatactg ataacatctt atagtttgta ttatattttg tattatcgtt 2280gacatgtata attttgatat caaaaactga ttttcccttt attattttcg agatttattt 2340tcttaattct ctttaacaaa ctagaaatat tgtatataca aaaaatcata aataatagat 2400gaatagttta attataggtg ttcatcaatc gaaaaagcaa cgtatcttat ttaaagtgcg 2460ttgctttttt ctcatttata aggttaaata attctcatat atcaagcaaa gtgacaggcg 2520cccttaaata ttctgacaaa tgctctttcc ctaaactccc cccataaaaa aacccgccga 2580agcgggtttt tacgttattt gcggattaac gattactcgt tatcagaacc gcccaggggg 2640cccgagctta agactggccg tcgttttaca acacagaaag agtttgtaga aacgcaaaaa 2700ggccatccgt caggggcctt ctgcttagtt tgatgcctgg cagttcccta ctctcgcctt 2760ccgcttcctc gctcactgac tcgctgcgct cggtcgttcg gctgcggcga gcggtatcag 2820ctcactcaaa ggcggtaata cggttatcca cagaatcagg ggataacgca ggaaagaaca 2880tgtgagcaaa aggccagcaa aaggccagga accgtaaaaa ggccgcgttg ctggcgtttt 2940tccataggct ccgcccccct gacgagcatc acaaaaatcg acgctcaagt cagaggtggc 3000gaaacccgac aggactataa agataccagg cgtttccccc tggaagctcc ctcgtgcgct 3060ctcctgttcc gaccctgccg cttaccggat acctgtccgc ctttctccct tcgggaagcg 3120tggcgctttc tcatagctca cgctgtaggt atctcagttc ggtgtaggtc gttcgctcca 3180agctgggctg tgtgcacgaa ccccccgttc agcccgaccg ctgcgcctta tccggtaact 3240atcgtcttga gtccaacccg gtaagacacg acttatcgcc actggcagca gccactggta 3300acaggattag cagagcgagg tatgtaggcg gtgctacaga gttcttgaag tggtgggcta 3360actacggcta cactagaaga acagtatttg gtatctgcgc tctgctgaag ccagttacct 3420tcggaaaaag agttggtagc tcttgatccg gcaaacaaac caccgctggt agcggtggtt 3480tttttgtttg caagcagcag attacgcgca gaaaaaaagg atctcaagaa gatcctttga 3540tcttttctac ggggtctgac gctcagtgga acgacgcgcg cgtaactcac gttaagggat 3600tttggtcatg agcttgcgcc gtcccgtcaa gtcagcgtaa tgctctgctt t 36511347310DNAArtificial sequencepLS9-114 plasmid 134ttaataagat gatcttcttg agatcgtttt ggtctgcgcg taatctcttg ctctgaaaac 60gaaaaaaccg ccttgcaggg cggtttttcg aaggttctct gagctaccaa ctctttgaac 120cgaggtaact ggcttggagg agcgcagtca ccaaaacttg tcctttcagt ttagccttaa 180ccggcgcatg acttcaagac taactcctct aaatcaatta 
ccagtggctg ctgccagtgg 240tgcttttgca tgtctttccg ggttggactc aagacgatag ttaccggata aggcgcagcg 300gtcggactga acggggggtt cgtgcataca gtccagcttg gagcgaactg cctacccgga 360actgagtgtc aggcgtggaa tgagacaaac gcggccataa cagcggaatg acaccggtaa 420accgaaaggc aggaacagga gagcgcacga gggagccgcc agggggaaac gcctggtatc 480tttatagtcc tgtcgggttt cgccaccact gatttgagcg tcagatttcg tgatgcttgt 540caggggggcg gagcctatgg aaaaacggct ttgccgcggc cctctcactt ccctgttaag 600tatcttcctg gcatcttcca ggaaatctcc gccccgttcg taagccattt ccgctcgccg 660cagtcgaacg accgagcgta gcgagtcagt gagcgaggaa gcggaatata tcctgtatca 720catattctgc tgacgcaccg gtgcagcctt ttttctcctg ccacatgaag cacttcactg 780acaccctcat cagtgccaac atagtaagcc agtatacact ccgctagcgc tgaggtcccg 840cagccgaacg accgagcgca gcggcgagag tagggaactg ccaggcatcc tgggcggttc 900tgataacgag taatcgttaa tccgcaaata acgtaaaaac ccgcttcggc gggttttttt 960atggggggag tttagggaaa gagcatttgt cagaatattt aagggcgcct gtcactttgc 1020ttgatatatg agaattattt aaccttataa atgagaaaaa agcaacgcac tttaaataag 1080atacgttgct ttttcgattg atgaacacct ataattaaac tattcatcta ttatttatga 1140ttttttgtat atacaatatt tctagtttgt taaagagaat taagaaaata aatctcgaaa 1200ataataaagg gaaaatcagt ttttgatatc aaaattatac atgtcaacga taatacaaaa 1260tataatacaa actataagat gttatcagta tttattatgc atttagaata ccttttgtgt 1320cgcccttggg gcatatgaaa gctggcattc ttggtgttgg acgttacatt cctgagaagg 1380ttttaacaaa tcatgatctt gaaaaaatgg ttgaaacttc tgacgagtgg attcgtacaa 1440gaacaggaat agaagaaaga agaatcgcag cagatgatgt gttttcatca cacatggctg 1500ttgcagcagc gaaaaatgcg ctggaacaag ctgaagtggc tgctgaggat ctggatatga 1560tcttggttgc aactgttaca cctgatcagt cattccctac ggtgtcttgt atgattcaag 1620aacaactcgg cgcgaagaaa gcgtgtgcta tggatatcag cgcggcttgt gcgggcttca 1680tgtacggggt tgtaaccggt aaacaattta ttgaatccgg aacctacaag catgttctag 1740ttgttggtgt agagaagctc tcaagcatta ccgactggga agaccgcaat acagccgttc 1800tgtttggaga cggagcaggc gctgcggtag tcgggccagt cagtgatgac agaggaatcc 1860tttcatttga actaggagcc gacggcacag gcggtcagca cttgtatctg aatgaaaaac 1920gacatacaat catgaatgga 
cgagaagttt tcaaatttgc agtccgccaa atgggagaat 1980catgcgtaaa tgtcattgaa aaagccggac tttcaaaaga ggatgtggac tttttgattc 2040cgcatcaggc gaacatccgt atcatggaag ctgctcgcga gcgtttagag cttcctgtcg 2100aaaagatgtc taaaactgtt cataaatatg gaaatacttc tgccgcatcc attccgatct 2160ctcttgtaga agaattggaa gccggtaaaa tcaaagacgg cgatgtggtc gttatggtag 2220ggttcggcgg aggactaaca tggggcgcca ttgcaatccg ctggggccga taaaaaaaag 2280gtgaggtgca cacaagatga ctaaaaaacg tgtagttgtt acaggtcttg gagcattatc 2340tccacttggc aacgacgtcg atacaagttg gaataacgca atcaacggtg tgtccggaat 2400cggtccgatc actcgtgttg acgctgaaga atatccggca aaagttgccg ctgaattaaa 2460agattttaat gttgaagatt atatggataa aaaagaagcc agaaaaatgg accgctttac 2520acaatatgcg gttgtggctg cgaaaatggc ggttgaagat gctgatctta acattaccga 2580tgagatcgcg ccgagagtcg gtgtttgggt aggctccggt atcggaggac ttgaaacact 2640agagtctcaa tttgaaatct tcttaacaaa aggcccaaga cgggtaagcc cgtttttcgt 2700gccaatgatg attcctgaca tggcgacagg ccagatttct attgcattag gagcaaaagg 2760ggtgaactct tgtacggtta cagcatgtgc tacaggaacg aactccatcg gtgacgcgtt 2820taaggttatt cagcgcggtg atgcagacgt gatggtcaca ggcggaacag aagcgctgct 2880gacaagaatg tcattcgccg gctttagtgc caacaaagcg ctgtctacta atccagatcc 2940gaaaacagcg agccgcccgt ttgataaaaa ccgtgatggc tttgtcatgg gggaaggtgc 3000agggattatc gttcttgaag aacttgagca tgccctggcc cgcggcgcta aaatttacgg 3060agaaattgtc ggctacggct caaccggaga cgcttatcat atcacagcgc cggcccaaga 3120cggtgaaggc ggagcgagag cgatgcaaga agccattaaa gatgcaggca ttgcacctga 3180agaaattgat tacatcaatg ctcacgggac aagcacgtat tacaacgaca aatacgaaac 3240aatggcgatt aagaccgttt ttggcgagca tgcgcataaa cttgcggtaa gctctacaaa 3300atcgatgaca ggccacctct taggagcagc cggcggtatt gaagccattt tctctatcct 3360ggccattaaa gaaggcgtga ttccgccgac aatcaatatt caaacacctg acgaagaatg 3420tgatttggat tatgtgcctg atgaagcccg cagacaggaa cttaattatg ttctcagcaa 3480ctcattagga ttcggcggac acaacgcaac attaatcttt aaaaaatatc aatcataagt 3540tttttctcga aaatttcatc gtagtttctc tagtttttta aaaacgaatc cactataata 3600cttgagggga ggtgaattgc tatggcagac acattagagc gtgtaacgaa 
aatcatcgta 3660gatcgccttg gcgttgatga agcagacgtc aaacttgaag catctttcaa ggaagactta 3720ggtgctgatt ccctagatgt agttgagctt gttatggaac ttgaagacga gtttgatatg 3780gagatttctg acgaagatgc tgaaaagatt gcaacagtcg gcgacgctgt gaactacata 3840caaaaccagc aataattaat taacctagga aaaaagggcg acacccctca attagcccgg 3900gcgaaaggcc cagtctttcg actgagcctt tcgttttatt tgatgcctgg cagttcccta 3960ctctcgcatg gggagtcccc acactaccat cggcgctacg gcgtttcact tctgagttcg 4020gcatggggtc aggtgggacc accgcgctac tgccgccagg caaacaaggg gtgttatgag 4080ccatattcag gtataaatgg gctcgcgata atgttcagaa ttggttaatt ggttgtaaca 4140ctgaccccta tttgtttatt tttctaaata cattcaaata tgtatccgct catgagacaa 4200taaccctgat aaatgcttca ataatattga aaaaggaaga atatgagtat tcaacatttc 4260cgtgtcgccc ttattccctt ttttgcggca ttttgccttc ctgtttttgc tcacccagaa 4320acgctggtga aagtaaaaga tgctgaagat cagttgggtg cacgagtggg ttacatcgaa 4380ctggatctca acagcggtaa gatccttgag agttttcgcc ccgaagaacg ttttccaatg 4440atgagcactt ttaaagttct gctatgtggc gcggtattat cccgtattga cgccgggcaa 4500gagcaactcg gtcgccgcat acactattct cagaatgact tggttgagta ctcaccagtc 4560acagaaaagc atcttacgga tggcatgaca gtaagagaat tatgcagtgc tgccataacc 4620atgagtgata acactgcggc caacttactt ctgacaacga tcggaggacc gaaggagcta 4680accgcttttt tgcacaacat gggggatcat gtaactcgcc ttgatcgttg ggaaccggag 4740ctgaatgaag ccataccaaa cgacgagcgt gacaccacga tgcctgtagc gatggcaaca 4800acgttgcgca aactattaac tggcgaacta cttactctag cttcccggca acaattaata 4860gactggatgg aggcggataa agttgcagga ccacttctgc gctcggccct tccggctggc 4920tggtttattg ctgataaatc cggagccggt gagcgtggtt ctcgcggtat catcgcagcg 4980ctggggccag atggtaagcc ctcccgtatc gtagttatct acacgacggg gagtcaggca 5040actatggatg aacgaaatag acagatcgct gagataggtg cctcactgat taagcattgg 5100taaaggagga aaaaaaaatg agccatattc aacgggaaac gtcgaggccg cgattaaatt 5160ccaacatgga tgctgattta tatgggtata aatgggctcg cgataatgtc gggcaatcag 5220gtgcgacaat ctatcgcttg tatgggaagc ccgatgcgcc agagttgttt ctgaaacatg 5280gcaaaggtag cgttgccaat gatgttacag atgagatggt cagactaaac tggctgacgg 5340aatttatgcc acttccgacc 
atcaagcatt ttatccgtac tcctgatgat gcatggttac 5400tcaccactgc gatccccgga aaaacagcgt tccaggtatt agaagaatat cctgattcag 5460gtgaaaatat tgttgatgcg ctggcagtgt tcctgcgccg gttgcactcg attcctgttt 5520gtaattgtcc ttttaacagc gatcgcgtat ttcgcctcgc tcaggcgcaa tcacgaatga 5580ataacggttt ggttgatgcg agtgattttg atgacgagcg taatggctgg cctgttgaac 5640aagtctggaa agaaatgcat aaacttttgc cattctcacc ggattcagtc gtcactcatg 5700gtgatttctc acttgataac cttatttttg acgaggggaa attaataggt tgtattgatg 5760ttggacgagt cggaatcgca gaccgatacc aggatcttgc catcctatgg aactgcctcg 5820gtgagttttc tccttcatta cagaaacggc tttttcaaaa atatggtatt gataatcctg 5880atatgaataa attgcagttt catttgatgc tcgatgagtt tttctaaagg aggaaaaaaa 5940aatggagaaa aaaatcactg gatataccac cgttgatata tcccaatggc atcgtaaaga 6000acattttgag gcatttcagt cagttgctca atgtacctat aaccagaccg ttcagctgga 6060tattacggcc tttttaaaga ccgtaaagaa aaataagcac aagttttatc cggcctttat 6120tcacattctt gcccgcctga tgaatgctca tccggagttc cgtatggcaa tgaaagacgg 6180tgagctggtg atatgggata gtgttcaccc ttgttacacc gttttccatg agcaaactga 6240aacgttttca tcgctctgga gtgaatacca cgacgatttc cggcagtttc tacacatata 6300ttcgcaagat gtggcgtgtt acggtgaaaa cctggcctat ttccctaaag ggtttattga 6360gaatatgttt ttcgtcagcg ccaatccctg ggtgagtttc accagttttg atttaaacgt 6420ggccaatatg gacaacttct tcgcccccgt tttcactatg ggcaaatatt atacgcaagg 6480cgacaaggtg ctgatgccgc tggcgattca ggttcatcat gccgtctgtg atggcttcca 6540tgtcggcaga atgcttaatg aattacaaca gtactgcgat gagtggcagg gcggggcgta 6600aacgccgagg aggaaaaaaa aatgcgctca cgcaactggt ccagaacctt gaccgaacgc 6660agcggtggta acggcgcagt ggcggttttc atggcttgtt atgactgttt ttttgtacag 6720tctatgcctc gggcatccaa gcagcaagcg cgttacgccg tgggtcgatg tttgatgtta 6780tggagcagca acgatgttac gcagcagggc agtcgcccta aaacaaagtt aggtggctca 6840agtatgggca tcattcgcac atgtaggctc ggccctgacc aagtcaaatc catgcgggct 6900gctcttgatc ttttcggtcg tgagttcggt gacgtagcca cctactccca acatcagccg 6960gactccgatt acctcgggaa cttgctccgt agtaagacat tcatcgcgct tgctgccttc 7020gaccaagaag cggttgttgg cgctctcgcg gcttacgttc tgcccaagtt 
tgagcagccg 7080cgtagtgaga tctatatcta tgatctcgca gtctccggcg agcaccggag gcagggcatt 7140gccaccgcgc tcatcaatct cctcaagcat gaggccaacg cgcttggtgc ttatgtgatc 7200tacgtgcaag cagattacgg tgacgatccc gcagtggctc tctatacaaa gttgggcata 7260cgggaagaag tgatgcactt tgatatcgac ccaagtaccg ccacctaagc 7310
135 7804 DNA Artificial sequence pLS9-115 plasmid 135
ggtggcggta cttgggtcga tatcaaagtg catcacttct tcccgtatgc ccaactttgt 60atagagagcc actgcgggat cgtcaccgta atctgcttgc acgtagatca cataagcacc 120aagcgcgttg gcctcatgct tgaggagatt gatgagcgcg gtggcaatgc cctgcctccg 180gtgctcgccg gagactgcga gatcatagat atagatctca ctacgcggct gctcaaactt 240gggcagaacg taagccgcga gagcgccaac aaccgcttct tggtcgaagg cagcaagcgc 300gatgaatgtc ttactacgga gcaagttccc gaggtaatcg gagtccggct gatgttggga 360gtaggtggct acgtcaccga actcacgacc gaaaagatca agagcagccc gcatggattt 420gacttggtca gggccgagcc tacatgtgcg aatgatgccc atacttgagc cacctaactt 480tgttttaggg cgactgccct gctgcgtaac atcgttgctg ctccataaca tcaaacatcg 540acccacggcg taacgcgctt gctgcttgga tgcccgaggc atagactgta caaaaaaaca 600gtcataacaa gccatgaaaa ccgccactgc gccgttacca ccgctgcgtt cggtcaaggt 660tctggaccag ttgcgtgagc gcattttttt ttcctcctcg gcgtttacgc cccgccctgc 720cactcatcgc agtactgttg taattcatta agcattctgc cgacatggaa gccatcacag 780acggcatgat gaacctgaat cgccagcggc atcagcacct tgtcgccttg cgtataatat 840ttgcccatag tgaaaacggg ggcgaagaag ttgtccatat tggccacgtt taaatcaaaa 900ctggtgaaac tcacccaggg attggcgctg acgaaaaaca tattctcaat aaacccttta 960gggaaatagg ccaggttttc accgtaacac gccacatctt gcgaatatat gtgtagaaac 1020tgccggaaat cgtcgtggta ttcactccag agcgatgaaa acgtttcagt ttgctcatgg 1080aaaacggtgt aacaagggtg aacactatcc catatcacca gctcaccgtc tttcattgcc 1140atacggaact ccggatgagc attcatcagg cgggcaagaa tgtgaataaa ggccggataa 1200aacttgtgct tatttttctt tacggtcttt aaaaaggccg taatatccag ctgaacggtc 1260tggttatagg tacattgagc aactgactga aatgcctcaa aatgttcttt acgatgccat 1320tgggatatat caacggtggt atatccagtg atttttttct ccattttttt ttcctccttt 1380agaaaaactc atcgagcatc aaatgaaact gcaatttatt catatcagga ttatcaatac 
1440catatttttg aaaaagccgt ttctgtaatg aaggagaaaa ctcaccgagg cagttccata 1500ggatggcaag atcctggtat cggtctgcga ttccgactcg tccaacatca atacaaccta 1560ttaatttccc ctcgtcaaaa ataaggttat caagtgagaa atcaccatga gtgacgactg 1620aatccggtga gaatggcaaa agtttatgca tttctttcca gacttgttca acaggccagc 1680cattacgctc gtcatcaaaa tcactcgcat caaccaaacc gttattcatt cgtgattgcg 1740cctgagcgag gcgaaatacg cgatcgctgt taaaaggaca attacaaaca ggaatcgagt 1800gcaaccggcg caggaacact gccagcgcat caacaatatt ttcacctgaa tcaggatatt 1860cttctaatac ctggaacgct gtttttccgg ggatcgcagt ggtgagtaac catgcatcat 1920caggagtacg gataaaatgc ttgatggtcg gaagtggcat aaattccgtc agccagttta 1980gtctgaccat ctcatctgta acatcattgg caacgctacc tttgccatgt ttcagaaaca 2040actctggcgc atcgggcttc ccatacaagc gatagattgt cgcacctgat tgcccgacat 2100tatcgcgagc ccatttatac ccatataaat cagcatccat gttggaattt aatcgcggcc 2160tcgacgtttc ccgttgaata tggctcattt ttttttcctc ctttaccaat gcttaatcag 2220tgaggcacct atctcagcga tctgtctatt tcgttcatcc atagttgcct gactccccgt 2280cgtgtagata actacgatac gggagggctt accatctggc cccagcgctg cgatgatacc 2340gcgagaacca cgctcaccgg ctccggattt atcagcaata aaccagccag ccggaagggc 2400cgagcgcaga agtggtcctg caactttatc cgcctccatc cagtctatta attgttgccg 2460ggaagctaga gtaagtagtt cgccagttaa tagtttgcgc aacgttgttg ccatcgctac 2520aggcatcgtg gtgtcacgct cgtcgtttgg tatggcttca ttcagctccg gttcccaacg 2580atcaaggcga gttacatgat cccccatgtt gtgcaaaaaa gcggttagct ccttcggtcc 2640tccgatcgtt gtcagaagta agttggccgc agtgttatca ctcatggtta tggcagcact 2700gcataattct cttactgtca tgccatccgt aagatgcttt tctgtgactg gtgagtactc 2760aaccaagtca ttctgagaat agtgtatgcg gcgaccgagt tgctcttgcc cggcgtcaat 2820acgggataat accgcgccac atagcagaac tttaaaagtg ctcatcattg gaaaacgttc 2880ttcggggcga aaactctcaa ggatcttacc gctgttgaga tccagttcga tgtaacccac 2940tcgtgcaccc aactgatctt cagcatcttt tactttcacc agcgtttctg ggtgagcaaa 3000aacaggaagg caaaatgccg caaaaaaggg aataagggcg acacggaaat gttgaatact 3060catattcttc ctttttcaat attattgaag catttatcag ggttattgtc tcatgagcgg 3120atacatattt gaatgtattt agaaaaataa 
acaaataggg gtcagtgtta caaccaatta 3180accaattctg aacattatcg cgagcccatt tatacctgaa tatggctcat aacacccctt 3240gtttgcctgg cggcagtagc gcggtggtcc cacctgaccc catgccgaac tcagaagtga 3300aacgccgtag cgccgatggt agtgtgggga ctccccatgc gagagtaggg aactgccagg 3360catcaaataa aacgaaaggc tcagtcgaaa gactgggcct ttcgcccggg ctaattgagg 3420ggtgtcgccc ttattcgact ctatagtgaa gttcctattc tctagaaagt ataggaactt 3480ctgaagtggg gcatatgtct aagatcaagc caagcaaggg cgctccgtac gcgcgcatcc 3540tgggcgtcgg cggttaccgt ccgacccgtg tggtgccgaa cgaggtgatc ctggagaaga 3600tcgactcttc cgacgagtgg attcgctctc gctccggcat cgaaacgcgt cactgggcgg 3660gtccggaaga aaccgtcgcg gcgatgtctg tggaggcctc cggcaaggca ctggccgacg 3720ccggtatcga cgcctctcgt atcggtgccg tggtagtctc taccgtgtct cacttcagcc 3780agaccccggc catcgccacc gagatcgccg accgcctggg cacggacaag gccgcagcct 3840tcgacatctc tgccggctgc gcgggcttcg gctacggtct gaccctggcc aagggcatgg 3900tcgtcgaagg ttctgcggag tacgtgctgg tcatcggcgt ggagcgtctg tccgacctga 3960ccgacctgga ggaccgtgcc acggccttcc tgttcggcga cggcgctggt gcggtcgtgg 4020tcggcccgtc ccaggagccg gcaatcggcc cgacggtctg gggctctgag ggcgacaagg 4080ccgaaacgat caagcagacc gtttcctggg accgcttccg tatcggcgat gtctccgaac 4140tgccgctgga ctccgagggc aacgtcaagt ttcctgcgat cacgcaggag ggccaggcgg 4200tgttccgctg ggccgtgttc gagatggcga aggtcgcgca gcaggcgctg gacgcggcgg 4260gtatcagccc ggacgacctg gacgtcttta tcccgcacca ggccaatgtg cgtatcatcg 4320actctatggt gaaaaccctg aagctgccgg agcacgtcac ggtcgcccgt gacatccgca 4380ccaccggcaa cacctctgcc gcctctattc cgctggcgat ggagcgtctg ctggcgaccg 4440gcgacgcgcg tagcggcgac accgcgctgg tcatcggctt cggtgcgggt ctggtctacg 4500ccgcgacggt cgttaccctg ccgtaaccac tccgtgccgg atcaccccgg tccggaacgg 4560agagcagcac cgcccgccgc cgacgcggcg ggccgccaca ccctctggac aacaaagaag 4620gagcgccgtc atggccgcca ctcaggaaga gatcgtcgcc ggtctggcgg agatcgtgaa 4680cgagatcgcc ggcatcccgg tcgaggacgt caagctggac aagtccttca ccgacgacct 4740ggacgtagac tctctgagca tggtcgaggt cgtcgtcgcc gccgaagagc gcttcgacgt 4800caagatcccg gacgacgacg tcaagaacct gaaaacggtc ggcgacgcga cgaagtacat 
4860cctggaccac caggcctgat ccgccgatac tcgggcatga cccgcgtacc gggcagatcc 4920gggcagactg ccccgccgcc cggcggtggc gccgtacgaa tccgtatccc gttggagaaa 4980gaattcccat gagcagcacc aatcgcaccg tggtcgtcac cggtatcggc gcaaccaccc 5040cgctgggtgg cgacgcagcc tctacctggg agggtctggt cgcgggtcgt tccggcgtcc 5100gtccgctgga gcaggagtgg gctgccgacc aggcggtccg tatcgcagcg ccggcagccg 5160tagacccgtc cgaggtcatc ccgcgtccgc aggcacgccg tctggaccgc tctgcgcagt 5220tcgcgctgat cgcggcgcag gaggcctgga aggacgccgg ttacgccggc aaggcgggcg 5280agtctccggc ggaggacggt gcggctcacg tagacccgga ccgtctgggt gcggtcatcg 5340cctccggcat cggcggcgtg accacgctgc tggaccagta cgacgtgctg aaggagaagg 5400gcgtccgccg cgtttccccg cacaccgtcc cgatgctgat gccgaacggt ccgtccgcca 5460acgtcggcct ggccgtgggt gcccgtgcgg gcgtgcacac cccggtgtct gcctgcgcgt 5520ctggcgccga ggccatcggc tacgccatcg agatgatccg cactggccgt gcggacgtcg 5580tcgtcgcggg tggcacggag gcggcgatcc acccgctgcc gattgccgcg ttcggcaaca 5640tgatggcgat gtccaagaac aacgacgacc cgcagggcgc ctcccgcccg ttcgacacgg 5700cgcgtgacgg cttcgtcctg ggcgaaggtg ccggcgtcct ggtcctggag tccgccgagc 5760atgcggcagc gcgcggtgcc cgcgtctacg cggaggcggt cggccagggc atctccgccg 5820acagccacga catcgtgcag ccggagccgg agggccgtgg catctccgca gcgctgcaaa 5880acctgctgga cggcaacgac ctggacccgg ccgagatcgt gcacgtcaac gcgcacgcca 5940cctctacccc ggcaggtgac atcgccgagc tgaaggcgct gcgcaaggtc ctgggcgacg 6000acgtagacca catggccgtc agcggcacca agtctatgac cggtcacctg ctgggtggcg 6060ctggcggcgt ggagtccgtg gcgaccgtgc tggcgctgta ccaccgtgtg gctccgccga 6120ccatcaacgt cgagaacctg gacccggagg ccgaggccaa cgcggacatc gtccgcggtg 6180aggcccgcaa
gctgccggtg gagggccgta tcgccgcgct gaacgactct ttcggcttcg 6240gcggtcacaa cgtggtgctg gcgttccgtt ctgtctgatt aattaaccta ggaaaatgaa 6300gtgaagttcc tatactttct agagaatagg aacttctata gtgagtcgaa taagggcgac 6360acaaaattta ttctaaatgc ataataaata ctgataacat cttatagttt gtattatatt 6420ttgtattatc gttgacatgt ataattttga tatcaaaaac tgattttccc tttattattt 6480tcgagattta ttttcttaat tctctttaac aaactagaaa tattgtatat acaaaaaatc 6540ataaataata gatgaatagt ttaattatag gtgttcatca atcgaaaaag caacgtatct 6600tatttaaagt gcgttgcttt tttctcattt ataaggttaa ataattctca tatatcaagc 6660aaagtgacag gcgcccttaa atattctgac aaatgctctt tccctaaact ccccccataa 6720aaaaacccgc cgaagcgggt ttttacgtta tttgcggatt aacgattact cgttatcaga 6780accgcccagg gggcccgagc ttaagactgg ccgtcgtttt acaacacaga aagagtttgt 6840agaaacgcaa aaaggccatc cgtcaggggc cttctgctta gtttgatgcc tggcagttcc 6900ctactctcgc cttccgcttc ctcgctcact gactcgctgc gctcggtcgt tcggctgcgg 6960cgagcggtat cagctcactc aaaggcggta atacggttat ccacagaatc aggggataac 7020gcaggaaaga acatgtgagc aaaaggccag caaaaggcca ggaaccgtaa aaaggccgcg 7080ttgctggcgt ttttccatag gctccgcccc cctgacgagc atcacaaaaa tcgacgctca 7140agtcagaggt ggcgaaaccc gacaggacta taaagatacc aggcgtttcc ccctggaagc 7200tccctcgtgc gctctcctgt tccgaccctg ccgcttaccg gatacctgtc cgcctttctc 7260ccttcgggaa gcgtggcgct ttctcatagc tcacgctgta ggtatctcag ttcggtgtag 7320gtcgttcgct ccaagctggg ctgtgtgcac gaaccccccg ttcagcccga ccgctgcgcc 7380ttatccggta actatcgtct tgagtccaac ccggtaagac acgacttatc gccactggca 7440gcagccactg gtaacaggat tagcagagcg aggtatgtag gcggtgctac agagttcttg 7500aagtggtggg ctaactacgg ctacactaga agaacagtat ttggtatctg cgctctgctg 7560aagccagtta ccttcggaaa aagagttggt agctcttgat ccggcaaaca aaccaccgct 7620ggtagcggtg gtttttttgt ttgcaagcag cagattacgc gcagaaaaaa aggatctcaa 7680gaagatcctt tgatcttttc tacggggtct gacgctcagt ggaacgacgc gcgcgtaact 7740cacgttaagg gattttggtc atgagcttgc gccgtcccgt caagtcagcg taatgctctg 7800ctta 7804
136 10460 DNA Artificial sequence pKZ4 plasmid 136
tcgcgacgcg aggctggatg gccttcccca ttatgattct tctcgcttcc ggcggcatcg 
60ggatgcccgc gttgcaggcc atgctgtcca ggcaggtaga tgacgaccat cagggacagc 120ttcaaggatc gctcgcggct cttaccagcc taacttcgat cactggaccg ctgatcgtca 180cggcgattta tgccgcctcg gcgagcacat ggaacgggtt ggcatggatt gtaggcgccg 240ccctatacct tgtctgcctc cccgcgttgc gtcgcggtgc atggagccgg gccacctcga 300cctgaatgga agccggcggc acctcgctaa cggattcacc actccaagaa ttggagccaa 360tcaattcttg cggagaactg tgaatgcgca aaccaaccct tggcagaaca tatccatcgc 420gtccgccatc tccagcagcc gcacgcggcg catctcgggc agcgttgggt cctggccacg 480ggtgcgcatg atcgtgctcc tgtcgttgag gacccggcta ggctggcggg gttgccttac 540tggttagcag aatgaatcac cgatacgcga gcgaacgtga agcgactgct gctgcaaaac 600gtctgcgacc tgagcaacaa catgaatggt cttcggtttc cgtgtttcgt aaagtctgga 660aacgcggaag tcagcgccct gcaccattat gttccggatc tgcatcgcag gatgctgctg 720gctaccctgt ggaacaccta catctgtatt aacgaagcgc tggcattgac cctgagtgat 780ttttctctgg tcccgccgca tccataccgc cagttgttta ccctcacaac gttccagtaa 840ccgggcatgt tcatcatcag taacccgtat cgtgagcatc ctctctcgtt tcatcggtat 900cattaccccc atgaacagaa atccccctta cacggaggca tcagtgacca aacaggaaaa 960aaccgccctt aacatggccc gctttatcag aagccagaca ttaacgcttc tggagaaact 1020caacgagctg gacgcggatg aacaggcaga catctgtgaa tcgcttcacg accacgctga 1080tgagctttac cgcagctgcc tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat 1140gcagctcccg gagacggtca cagcttgtct gtaagcggat gccgggagca gacaagcccg 1200tcagggcgcg tcagcgggtg ttggcgggtg tcggggcgca gccatgaccc agtcacgtag 1260cgatagcgga gtgtatactg gcttaactat gcggcatcag agcagattgt actgagagtg 1320caccatatgc ggtgtgaaat accgcacaga tgcgtaagga gaaaataccg catcaggcgc 1380tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg gcgagcggta 1440tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa cgcaggaaag 1500aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc gttgctggcg 1560tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg 1620tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg 1680cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga 1740agcgtggcgc tttctcatag ctcacgctgt aggtatctca 
gttcggtgta ggtcgttcgc 1800tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt 1860aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact 1920ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg 1980cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct gaagccagtt 2040accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt 2100ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct 2160ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg 2220gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa atgaagtttt 2280aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg cttaatcagt 2340gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg actccccgtc 2400gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc aatgataccg 2460cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc cggaagggcc 2520gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa ttgttgccgg 2580gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc cattgctgca 2640ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg ttcccaacga 2700tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc cttcggtcct 2760ccgatcgttg tcagaagtaa gttggccgca gtgttatcac tcatggttat ggcagcactg 2820cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg tgagtactca 2880accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc ggcgtcaaca 2940cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg aaaacgttct 3000tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat gtaacccact 3060cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg gtgagcaaaa 3120acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg ttgaatactc 3180atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct catgagcgga 3240tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac atttccccga 3300aaagtgccac ctgacgtctt aattaatcag gagagcgttc accgacaaac aacagataaa 3360acgaaaggcc cagtctttcg actgagcctt tcgttttatt tgatgcctgg cagttcccta 3420ctctcgcatg gggagacccc acactaccat cggcgctacg gcgtttcact tctgagttcg 3480gcatggggtc 
aggtgggacc accgcgctac tgccgccagg caaattctgt tttatcagac 3540cgcttctgcg ttctgattta atctgtatca ggctgaaaat cttctctcat ccgccaaaac 3600agccaagctg gagaccgttt aaactcaatg atgatgatga tgatggtcga cggcgctatt 3660cagatcctct tctgagatga gtttttgttc gggcccaagc ttcgaattct cagatatgca 3720aggcgtggcc caacgcgcgt agggcggctt cttgcaccgc ttcacccaac gtggggtggg 3780catgaatggt gccggccaca tcttccaggc acgcgcccat ctccagcgat tgggcaaacg 3840cggtggacag ctcggagacc gccacgccaa ccgcctgcca acccacgatc aggtggttgt 3900cacggcgcgc caccacccgc acgaaaccgc ttttcgactc caggctcatg gcccggccat 3960tggcggcaaa cgggaactgc gcgacgatgc agtccagggc ctgctggctg gcttgttccg 4020gggtcttgcc gaccaccacc acttccgggt cggtaaagca cacggcggca atcgctgccg 4080gctcgaagcg tcgggccttg ccggcgatga tttcggcgac catctcgcct tgggccatgg 4140cccggtgcgc cagcatcggt tcgccagcga cgtcgccaat ggcccagacg ttgtgcatgc 4200tggtatgaca gcgctcgtcg atggcaatgg cggtgccgtt catcttcagg tccaggcatt 4260ccaggttgaa gcccttggtg cgtggccggc ggcccacggc caccagtacc tgatcggctt 4320caagacgcag ttgcccaccc ttgccgtcgc tggccagcag gcagccattt tcgtagccct 4380cgacgctgtg gcccaggtgc aacgcgatgc ccagtttctt cagcgactcg gccaccgggg 4440cggtcaattc gctgtcgtag gtcggcagga tgcgttcgcg cgcttccacc acactcacct 4500gtgcacccag cttgcgatag gcaatgccca gctccaggcc gatatagcca ccgccgacca 4560ccaccaggtg ttgcggcagg gctttcggcg ccagggcttc ggtcgaggaa atcaccgggc 4620cacccagcgg cagcatcggc agttcgacac tggtggaacc ggtcgccagc aacagatgct 4680cgcactggat acgctggcca tcgacctcga cctgcttgcc gtccagtacc ttggcccagc 4740catgcaccac tttcaccccg tgctttttca gcaaggcggc aacaccggtg gtcagacggt 4800cgacaatgcc gtccttccag gtgacgctct ggccgatgtc caggcgcggc gaagccacgc 4860tgatgcccag cggcgagggt tcggtaaagc gcgaggcttg gtgaaactgc tcggccacgt 4920ggatcagcgc cttggacggg atgcagccga tgttcaggca ggtgccgccc agtgcctggc 4980cttccaccag tacggtagga atgcccagtt gcccggcgcg gatggctgct acatagccgc 5040cagggccgcc gccgatgatc aacagggtag tctggataat ctgttgcatg ctcactccac 5100gaacaggcag gcgggttgtt cgagcaggcc acgcacggcc tggatgaaca gggcggcgtc 5160catgccatcg accacgcggt ggtcgaacga gctggacagg 
ttcatcatct tgcgcacgac 5220gatctggcca tcaatcacca ccggtcgttc gaccatgcgg ttgaccccga cgattgccac 5280ttccggggtg ttgaccaccg gcgtgctgac aatgccaccc aaggcgccga ggctggtcag 5340ggtgatggtc gagccggaca gctcctcgcg gctggccttg ttgttacgtg cagcgttggc 5400caggcgcgaa atctcgccgg cattggccca caggctgccc gcttcggcgt ggcgcagcac 5460gggtaccatc aggccgttgt caccctgggt ggcaatgccc acatgcaccg cgccatggcg 5520ggtgatgatc tgcgcttcgt cgtcgtaggt cgcgttgatc tgcgggaagt cacgcagcgc 5580cacgacgagg gcgcgcacca ggaatggcag caaggtcagt ttgccgcggc tgtcgccgtg 5640cttgctgttg agttgctggc gcagggcttc cagggcggtg acgtcgattt cctcgacata 5700actgaagtgc gcgacccggc gtttggcgtc ctgcatgcgc tgggcgatct tgcggcgcag 5760gccgatcacc ggcacctgct cgctgtcggt gcgcttggca taaccatcag gtgcttgccc 5820ggcattgctt tgcggcttgc tcatgaaggc gtcgaggtct tcgtgcagaa tgcgcccggc 5880cgggccgcta ccatgcacat aacgcagttc gataccggcg tccagggcgc gtttgcgcac 5940ggccggcgag gccagcggct tgtcgcccgg ctggcgcggc acgatgggcg cagcttcgtg 6000gttggcgggc gcctggtaca cggcgggttt tacgtctttc tgcggttccg gcttggctgc 6060aatcggggcg gccggggcct ctaccggttt tggctgaggc acgtccacat ggttgccgct 6120gccttccact tcgatgcgga tcagttcgct accgaccgcc atcacttccc cgggctggcc 6180acccagggcc aacaccttgc cgctgaccgg cgaggggatt tccacggtgg ccttgtcggt 6240catgacgtcg gccaccacct ggtcctcggc gatgatgtcg ccgaccttga cgaaccattc 6300caccaactcg acctgcgcga tgccttcgcc aatgtccggc atcttgatga cgtgcgtgcc 6360cattcagacc tccatgacct ttttcagtgc cgcacctacc cgcgaagggc cggggaagta 6420agcccattcc tgtgcgtgag ggtagggggt gtcccagccg gtgacgcgct cgatcggcgc 6480ctccaggtgg tggaagcagt gctcctgcac cagcgacacc agctcggcac cgaagccgca 6540ggtgcgggtg gcctcgtgca ccaccacgca acggccagtc tttttcaccg actcgacgat 6600agtgtccagg tccagcggcc acaggctgcg caggtcgatc acttcggcat cgacgccgct 6660ttcttcggcg gccacctggg ccacgtacac cgtggtgccg taagtcagta cggtcacgtc 6720attgccaggg cgggtaatgg cggccttgtc cagcggtacg gtgtaatagc cgtcgggcac 6780ggcgctgtgc gggtgcttcg accatggggt tacagggcgg tcgtggtggc catcgaacgg 6840gccgttgtac agacgtttgg gctccaggaa gattaccggg tcgtcgcatt cgatcgaggc 6900aatcagcagg 
cctttggcgt cataagggtt ggacggcatc acggtgcgca ggccgcagac 6960ctgggtgaac atcgcttccg ggctctggct gtgagtctgg ccgccataga tgccgccgcc 7020gcaaggcatg cgcagggtca gcggggcaat gaactcgccg gccgaccggt aacgcaggcg 7080ggccagctcg gagacgatct ggtcggaggc cgggtagaag tagtcggcga actggatctc 7140caccaccggg cgcaggccat aggcgcccat gcctacggcg gtaccgacga tgccgctctc 7200ggagatgggc gcgtcgaaca cgcgcgattt gccgtacttg ttctgcaggc cttcggtgca 7260gcggaacacg ccgccgaagt aaccgacgtc ctggccgtac accaccacat tgtcgtcgcg 7320ctcaagcatg acatccatgg ccgagcgcag ggcctggatc atggtcatgg tagtggtggc 7380catggcggtt tccgggttga tgctgttgtt gtggtcgttc atctcaaacc cccagttcct 7440ggcgttgacg gcgcaggtgt tcgggcatct ccttgtacac atcctcgaac atcgaggcgg 7500cgctcgggat gtgcccgtta gccagggtgc cgtactgctc ggcttctttc tgtgcggcaa 7560tcaccgcagc ttcgagctcg gccgtgacgg cttggtgttc ttcttcggac cagtggccga 7620tcttgatcag gtgctgcttc aggcgggcga tcgggtcacc cagcgggaag tggctccagt 7680catcggcagg gcggtacttg gaggggtcgt ccgacgtcga gtgcgggccg gcacggtagg 7740tgacccactc gatcaggctt gggcccaggc cgcggcgggc gcgctcggca gcccagcgcg 7800aggcggcgta cacggcgacg aagtcgttgc cgtcaacccg cagcgaggca atgccgcagc 7860ccacgccacg gccggcgaag gtggtcgact cgccaccggc gatggcctgg aaggtagaaa 7920tcgcccactg gttgttgacc acattgagga tcaccggggc gcggtaaacg tgggcaaagg 7980tgagggcggt gtggaagtcc gactcggcgg tggctccgtc accgatccac gccgaagcaa 8040tcttggtatc gcccttgatc gccgaggcca tggcccagcc gactgcctgc acgaactggg 8100tcgccaggtt gccgctgatg gtgaagaagc cggcttcgcg caccgagtac atgatcggca 8160actggcggcc cttgaggggg tcgcgctcgt tggacagcag ttggcagatc atctcgacca 8220gcgatacgtc gcgggccatc aggatgcttt gctggcggta ggtcgggaag cacatgtcgg 8280tgcggttcag cgccagcgcc tggccactgc cgatggcttc ttcgcccagg ctttgcatgt 8340agaaggacat cttcttctgg cgctgggcaa ccaccatgcg gctgtcgaag atccgcgtct 8400tgagcatggc gcgcatgcct tgacgaagga tctgtgggtc gatgtcttcg gcccaggggc 8460cttgcgcatc accttgctcg tcgagcacgc ggaccaggct gtaggacagg tcggcagtgt 8520cggcagcatc gacatcgatc gcgggtttac gggcttgacc tgcatcgttg aggcgcaggt 8580aggaaaaatc ggtctggcag cctggccggc cggtgggctc 
gggcacatgc aaacgcaggg 8640gggcgtactc gttcatggat ccatggttta ttcctcctta tttaatcgat acattaatat 8700atacctcttt aatttttaat aataaagtta atcgataatt ccggtcgagt gcccacacag 8760attgtctgat aaattgttaa agagcagtgc cgcttcgctt tttctcagcg gcgctgtttc 8820ctgtgtgaaa ttgttatccg ctcacaattc cacacattat acgagccgga tgattaattg 8880tcaacagctc atttcagaat atttgccaga accgttatga tgtcggcgca aaaaacatta 8940tccagaacgg gagtgcgcct tgagcgacac gaattatgca gtgatttacg acctgcacag 9000ccataccaca gcttccgatg gctgcctgac gccagaagca ttggtgcacc gtgcagtcga 9060tgataagctg tcaaaccaga tcaattcgcg ctaactcaca ttaattgcgt tgcgctcact 9120gcccgctttc cagtcgggaa acctgtcgtg ccagctgcat taatgaatcg gccaacgcgc 9180ggggagaggc ggtttgcgta ttgggcgcca gggtggtttt tcttttcacc agtgagacgg 9240gcaacagctg attgcccttc accgcctggc cctgagagag ttgcagcaag cggtccacgc 9300tggtttgccc cagcaggcga aaatcctgtt tgatggtggt tgacggcggg atataacatg 9360agctgtcttc ggtatcgtcg tatcccacta ccgagatatc cgcaccaacg cgcagcccgg 9420actcggtaat ggcgcgcatt gcgcccagcg ccatctgatc gttggcaacc agcatcgcag 9480tgggaacgat gccctcattc agcatttgca tggtttgttg aaaaccggac atggcactcc 9540agtcgccttc ccgttccgct atcggctgaa tttgattgcg agtgagatat ttatgccagc 9600cagccagacg cagacgcgcc gagacagaac ttaatgggcc cgctaacagc gcgatttgct 9660ggtgacccaa tgcgaccaga tgctccacgc ccagtcgcgt accgtcttca tgggagaaaa 9720taatactgtt gatgggtgtc tggtcagaga catcaagaaa taacgccgga acattagtgc 9780aggcagcttc cacagcaatg gcatcctggt catccagcgg atagttaatg atcagcccac 9840tgacgcgttg cgcgagaaga ttgtgcaccg ccgctttaca ggcttcgacg ccgcttcgtt 9900ctaccatcga caccaccacg ctggcaccca gttgatcggc gcgagattta atcgccgcga 9960caatttgcga cggcgcgtgc agggccagac tggaggtggc aacgccaatc agcaacgact 10020gtttgcccgc cagttgttgt gccacgcggt tgggaatgta attcagctcc gccatcgccg 10080cttccacttt ttcccgcgtt ttcgcagaaa cgtggctggc ctggttcacc acgcgggaaa 10140cggtctgata agagacaccg gcatactctg cgacatcgta taacgttact ggtttcacat 10200tcaccaccct gaattgactc tcttccgggc gctatcatgc cataccgcga aaggttttgc 10260accattcgat ggtgtcaacg taaatgcatg ccgcttcgcc ttcgcgcgcg aattgatctg 
10320ctgcctcgcg cgtttcggtg atgacggtga aaacctctga cacatgcagc tcccggagac 10380ggtcacagct tgtctgtaag cggatgccgg gagcagacaa gcccgtcagg gcgcgtcagc 10440gggtgttggc ggggccggcc 10460
137 5544 DNA Artificial sequence pGL10.173b vector backbone 137
tcgcgacgcg aggctggatg gccttcccca ttatgattct tctcgcttcc ggcggcatcg 60ggatgcccgc gttgcaggcc atgctgtcca ggcaggtaga tgacgaccat cagggacagc 120ttcaaggatc gctcgcggct cttaccagcc taacttcgat cactggaccg ctgatcgtca 180cggcgattta tgccgcctcg gcgagcacat ggaacgggtt ggcatggatt gtaggcgccg 240ccctatacct tgtctgcctc cccgcgttgc gtcgcggtgc atggagccgg gccacctcga 300cctgaatgga agccggcggc acctcgctaa cggattcacc actccaagaa ttggagccaa 360tcaattcttg cggagaactg tgaatgcgca aaccaaccct tggcagaaca tatccatcgc 420gtccgccatc tccagcagcc gcacgcggcg catctcgggc agcgttgggt cctggccacg 480ggtgcgcatg atcgtgctcc tgtcgttgag gacccggcta ggctggcggg gttgccttac 540tggttagcag aatgaatcac cgatacgcga gcgaacgtga agcgactgct gctgcaaaac 600gtctgcgacc tgagcaacaa catgaatggt cttcggtttc cgtgtttcgt aaagtctgga 660aacgcggaag tcagcgccct gcaccattat gttccggatc tgcatcgcag gatgctgctg 720gctaccctgt ggaacaccta catctgtatt aacgaagcgc tggcattgac cctgagtgat 780ttttctctgg tcccgccgca tccataccgc cagttgttta ccctcacaac gttccagtaa 840ccgggcatgt tcatcatcag taacccgtat cgtgagcatc ctctctcgtt tcatcggtat 900cattaccccc atgaacagaa atccccctta cacggaggca tcagtgacca aacaggaaaa 960aaccgccctt aacatggccc gctttatcag aagccagaca ttaacgcttc tggagaaact 1020caacgagctg gacgcggatg aacaggcaga catctgtgaa tcgcttcacg accacgctga 1080tgagctttac cgcagctgcc tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat 1140gcagctcccg gagacggtca cagcttgtct gtaagcggat gccgggagca gacaagcccg 1200tcagggcgcg tcagcgggtg ttggcgggtg tcggggcgca gccatgaccc agtcacgtag 1260cgatagcgga gtgtatactg gcttaactat gcggcatcag agcagattgt actgagagtg 1320caccatatgc ggtgtgaaat accgcacaga tgcgtaagga gaaaataccg catcaggcgc 1380tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg gcgagcggta 1440tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa cgcaggaaag 1500aacatgtgag caaaaggcca gcaaaaggcc 
aggaaccgta aaaaggccgc gttgctggcg 1560tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg 1620tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg 1680cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga 1740agcgtggcgc tttctcatag ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc 1800tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt 1860aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact 1920ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg 1980cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct gaagccagtt 2040accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt 2100ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct 2160ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg 2220gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa atgaagtttt 2280aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg cttaatcagt 2340gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg actccccgtc 2400gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc aatgataccg 2460cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc cggaagggcc 2520gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa ttgttgccgg 2580gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc cattgctgca 2640ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg ttcccaacga 2700tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc cttcggtcct 2760ccgatcgttg tcagaagtaa gttggccgca
gtgttatcac tcatggttat ggcagcactg 2820cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg tgagtactca 2880accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc ggcgtcaaca 2940cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg aaaacgttct 3000tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat gtaacccact 3060cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg gtgagcaaaa 3120acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg ttgaatactc 3180atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct catgagcgga 3240tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac atttccccga 3300aaagtgccac ctgacgtctt aattaatcag gagagcgttc accgacaaac aacagataaa 3360acgaaaggcc cagtctttcg actgagcctt tcgttttatt tgatgcctgg cagttcccta 3420ctctcgcatg gggagacccc acactaccat cggcgctacg gcgtttcact tctgagttcg 3480gcatggggtc aggtgggacc accgcgctac tgccgccagg caaattctgt tttatcagac 3540cgcttctgcg ttctgattta atctgtatca ggctgaaaat cttctctcat ccgccaaaac 3600agccaagctg gagaccgttt aaactcaatg atgatgatga tgatggtcga cggcgctatt 3660cagatcctct tctgagatga gtttttgttc gggcccaagc ttcgaattcc catatggtac 3720cagctgcaga tctcgagctc ggatccatgg tttattcctc cttatttaat cgatacatta 3780atatatacct ctttaatttt taataataaa gttaatcgat aattccggtc gagtgcccac 3840acagattgtc tgataaattg ttaaagagca gtgccgcttc gctttttctc agcggcgctg 3900tttcctgtgt gaaattgtta tccgctcaca attccacaca ttatacgagc cggatgatta 3960attgtcaaca gctcatttca gaatatttgc cagaaccgtt atgatgtcgg cgcaaaaaac 4020attatccaga acgggagtgc gccttgagcg acacgaatta tgcagtgatt tacgacctgc 4080acagccatac cacagcttcc gatggctgcc tgacgccaga agcattggtg caccgtgcag 4140tcgatgataa gctgtcaaac cagatcaatt cgcgctaact cacattaatt gcgttgcgct 4200cactgcccgc tttccagtcg ggaaacctgt cgtgccagct gcattaatga atcggccaac 4260gcgcggggag aggcggtttg cgtattgggc gccagggtgg tttttctttt caccagtgag 4320acgggcaaca gctgattgcc cttcaccgcc tggccctgag agagttgcag caagcggtcc 4380acgctggttt gccccagcag gcgaaaatcc tgtttgatgg tggttgacgg cgggatataa 4440catgagctgt cttcggtatc gtcgtatccc actaccgaga tatccgcacc aacgcgcagc 
4500ccggactcgg taatggcgcg cattgcgccc agcgccatct gatcgttggc aaccagcatc 4560gcagtgggaa cgatgccctc attcagcatt tgcatggttt gttgaaaacc ggacatggca 4620ctccagtcgc cttcccgttc cgctatcggc tgaatttgat tgcgagtgag atatttatgc 4680cagccagcca gacgcagacg cgccgagaca gaacttaatg ggcccgctaa cagcgcgatt 4740tgctggtgac ccaatgcgac cagatgctcc acgcccagtc gcgtaccgtc ttcatgggag 4800aaaataatac tgttgatggg tgtctggtca gagacatcaa gaaataacgc cggaacatta 4860gtgcaggcag cttccacagc aatggcatcc tggtcatcca gcggatagtt aatgatcagc 4920ccactgacgc gttgcgcgag aagattgtgc accgccgctt tacaggcttc gacgccgctt 4980cgttctacca tcgacaccac cacgctggca cccagttgat cggcgcgaga tttaatcgcc 5040gcgacaattt gcgacggcgc gtgcagggcc agactggagg tggcaacgcc aatcagcaac 5100gactgtttgc ccgccagttg ttgtgccacg cggttgggaa tgtaattcag ctccgccatc 5160gccgcttcca ctttttcccg cgttttcgca gaaacgtggc tggcctggtt caccacgcgg 5220gaaacggtct gataagagac accggcatac tctgcgacat cgtataacgt tactggtttc 5280acattcacca ccctgaattg actctcttcc gggcgctatc atgccatacc gcgaaaggtt 5340ttgcaccatt cgatggtgtc aacgtaaatg catgccgctt cgccttcgcg cgcgaattga 5400tctgctgcct cgcgcgtttc ggtgatgacg gtgaaaacct ctgacacatg cagctcccgg 5460agacggtcac agcttgtctg taagcggatg ccgggagcag acaagcccgt cagggcgcgt 5520cagcgggtgt tggcggggcc ggcc 5544
138 1026 DNA Synechococcus elongatus 138
atgttcggtc ttatcggtca tctcaccagt ttggagcagg cccgcgacgt ttctcgcagg 60atgggctacg acgaatacgc cgatcaagga ttggagtttt ggagtagcgc tcctcctcaa 120atcgttgatg aaatcacagt caccagtgcc acaggcaagg tgattcacgg tcgctacatc 180gaatcgtgtt tcttgccgga aatgctggcg gcgcgccgct tcaaaacagc cacgcgcaaa 240gttctcaatg ccatgtccca tgcccaaaaa cacggcatcg acatctcggc cttggggggc 300tttacctcga ttattttcga gaatttcgat ttggccagtt tgcggcaagt gcgcgacact 360accttggagt ttgaacggtt caccaccggc aatactcaca cggcctacgt aatctgtaga 420caggtggaag ccgctgctaa aacgctgggc atcgacatta cccaagcgac agtagcggtt 480gtcggcgcga ctggcgatat cggtagcgct gtctgccgct ggctcgacct caaactgggt 540gtcggtgatt tgatcctgac ggcgcgcaat caggagcgtt tggataacct gcaggctgaa 600ctcggccggg gcaagattct gcccttggaa gccgctctgc 
cggaagctga ctttatcgtg 660tgggtcgcca gtatgcctca gggcgtagtg atcgacccag caaccctgaa gcaaccctgc 720gtcctaatcg acgggggcta ccccaaaaac ttgggcagca aagtccaagg tgagggcatc 780tatgtcctca atggcggggt agttgaacat tgcttcgaca tcgactggca gatcatgtcc 840gctgcagaga tggcgcggcc cgagcgccag atgtttgcct gctttgccga ggcgatgctc 900ttggaatttg aaggctggca tactaacttc tcctggggcc gcaaccaaat cacgatcgag 960aagatggaag cgatcggtga ggcatcggtg cgccacggct tccaaccctt ggcattggca 1020atttga 1026139341PRTSynechococcus elongatus 139Met Phe Gly Leu Ile Gly His Leu Thr Ser Leu Glu Gln Ala Arg Asp 1 5 10 15 Val Ser Arg Arg Met Gly Tyr Asp Glu Tyr Ala Asp Gln Gly Leu Glu 20 25 30 Phe Trp Ser Ser Ala Pro Pro Gln Ile Val Asp Glu Ile Thr Val Thr 35 40 45 Ser Ala Thr Gly Lys Val Ile His Gly Arg Tyr Ile Glu Ser Cys Phe 50 55 60 Leu Pro Glu Met Leu Ala Ala Arg Arg Phe Lys Thr Ala Thr Arg Lys 65 70 75 80 Val Leu Asn Ala Met Ser His Ala Gln Lys His Gly Ile Asp Ile Ser 85 90 95 Ala Leu Gly Gly Phe Thr Ser Ile Ile Phe Glu Asn Phe Asp Leu Ala 100 105 110 Ser Leu Arg Gln Val Arg Asp Thr Thr Leu Glu Phe Glu Arg Phe Thr 115 120 125 Thr Gly Asn Thr His Thr Ala Tyr Val Ile Cys Arg Gln Val Glu Ala 130 135 140 Ala Ala Lys Thr Leu Gly Ile Asp Ile Thr Gln Ala Thr Val Ala Val 145 150 155 160 Val Gly Ala Thr Gly Asp Ile Gly Ser Ala Val Cys Arg Trp Leu Asp 165 170 175 Leu Lys Leu Gly Val Gly Asp Leu Ile Leu Thr Ala Arg Asn Gln Glu 180 185 190 Arg Leu Asp Asn Leu Gln Ala Glu Leu Gly Arg Gly Lys Ile Leu Pro 195 200 205 Leu Glu Ala Ala Leu Pro Glu Ala Asp Phe Ile Val Trp Val Ala Ser 210 215 220 Met Pro Gln Gly Val Val Ile Asp Pro Ala Thr Leu Lys Gln Pro Cys 225 230 235 240 Val Leu Ile Asp Gly Gly Tyr Pro Lys Asn Leu Gly Ser Lys Val Gln 245 250 255 Gly Glu Gly Ile Tyr Val Leu Asn Gly Gly Val Val Glu His Cys Phe 260 265 270 Asp Ile Asp Trp Gln Ile Met Ser Ala Ala Glu Met Ala Arg Pro Glu 275 280 285 Arg Gln Met Phe Ala Cys Phe Ala Glu Ala Met Leu Leu Glu Phe Glu 290 295 300 Gly Trp His Thr Asn Phe Ser Trp Gly Arg Asn Gln Ile Thr Ile Glu 305 310 
315 320 Lys Met Glu Ala Ile Gly Glu Ala Ser Val Arg His Gly Phe Gln Pro 325 330 335 Leu Ala Leu Ala Ile 340 14010097DNAArtificial sequencepCL-Ptrc-carB_'tesA plasmid 140cactatacca attgagatgg gctagtcaat gataattact agtccttttc ctttgagttg 60tgggtatctg taaattctgc tagacctttg ctggaaaact tgtaaattct gctagaccct 120ctgtaaattc cgctagacct ttgtgtgttt tttttgttta tattcaagtg gttataattt 180atagaataaa gaaagaataa aaaaagataa aaagaataga tcccagccct gtgtataact 240cactacttta gtcagttccg cagtattaca aaaggatgtc gcaaacgctg tttgctcctc 300tacaaaacag accttaaaac cctaaaggcg tcggcatccg cttacagaca agctgtgacc 360gtctccggga gctgcatgtg tcagaggttt tcaccgtcat caccgaaacg cgcgaggcag 420cagatcaatt cgcgcgcgaa ggcgaagcgg catgcattta cgttgacacc atcgaatggt 480gcaaaacctt tcgcggtatg gcatgatagc gcccggaaga gagtcaattc agggtggtga 540atgtgaaacc agtaacgtta tacgatgtcg cagagtatgc cggtgtctct tatcagaccg 600tttcccgcgt ggtgaaccag gccagccacg tttctgcgaa aacgcgggaa aaagtggaag 660cggcgatggc ggagctgaat tacattccca accgcgtggc acaacaactg gcgggcaaac 720agtcgttgct gattggcgtt gccacctcca gtctggccct gcacgcgccg tcgcaaattg 780tcgcggcgat taaatctcgc gccgatcaac tgggtgccag cgtggtggtg tcgatggtag 840aacgaagcgg cgtcgaagcc tgtaaagcgg cggtgcacaa tcttctcgcg caacgcgtca 900gtgggctgat cattaactat ccgctggatg accaggatgc cattgctgtg gaagctgcct 960gcactaatgt tccggcgtta tttcttgatg tctctgacca gacacccatc aacagtatta 1020ttttctccca tgaagacggt acgcgactgg gcgtggagca tctggtcgca ttgggtcacc 1080agcaaatcgc gctgttagcg ggcccattaa gttctgtctc ggcgcgtctg cgtctggctg 1140gctggcataa atatctcact cgcaatcaaa ttcagccgat agcggaacgg gaaggcgact 1200ggagtgccat gtccggtttt caacaaacca tgcaaatgct gaatgagggc atcgttccca 1260ctgcgatgct ggttgccaac gatcagatgg cgctgggcgc aatgcgcgcc attaccgagt 1320ccgggctgcg cgttggtgcg gatatctcgg tagtgggata cgacgatacc gaagacagct 1380catgttatat cccgccgtta accaccatca aacaggattt tcgcctgctg gggcaaacca 1440gcgtggaccg cttgctgcaa ctctctcagg gccaggcggt gaagggcaat cagctgttgc 1500ccgtctcact ggtgaaaaga aaaaccaccc tggcgcccaa tacgcaaacc gcctctcccc 1560gcgcgttggc cgattcatta 
atgcagctgg cacgacaggt ttcccgactg gaaagcgggc 1620agtgagcgca acgcaattaa tgtaagttag cgcgaattga tctggtttga cagcttatca 1680tcgactgcac ggtgcaccaa tgcttctggc gtcaggcagc catcggaagc tgtggtatgg 1740ctgtgcaggt cgtaaatcac tgcataattc gtgtcgctca aggcgcactc ccgttctgga 1800taatgttttt tgcgccgaca tcataacggt tctggcaaat attctgaaat gagctgttga 1860caattaatca tccggctcgt ataatgtgtg gaattgtgag cggataacaa tttcacacag 1920gaaacagcgc cgctgagaaa aagcgaagcg gcactgctct ttaacaattt atcagacaat 1980ctgtgtgggc actcgaccgg aattatcgat taactttatt attaaaaatt aaagaggtat 2040atattaatgt atcgattaaa taaggaggaa taaaccatga ccagcgatgt tcacgacgcc 2100acagacggcg tcaccgaaac cgcactcgac gacgagcagt cgacccgccg catcgccgag 2160ctgtacgcca ccgatcccga gttcgccgcc gccgcaccgt tgcccgccgt ggtcgacgcg 2220gcgcacaaac ccgggctgcg gctggcagag atcctgcaga ccctgttcac cggctacggt 2280gaccgcccgg cgctgggata ccgcgcccgt gaactggcca ccgacgaggg cgggcgcacc 2340gtgacgcgtc tgctgccgcg gttcgacacc ctcacctacg cccaggtgtg gtcgcgcgtg 2400caagcggtcg ccgcggccct gcgccacaac ttcgcgcagc cgatctaccc cggcgacgcc 2460gtcgcgacga tcggtttcgc gagtcccgat tacctgacgc tggatctcgt atgcgcctac 2520ctgggcctcg tgagtgttcc gctgcagcac aacgcaccgg tcagccggct cgccccgatc 2580ctggccgagg tcgaaccgcg gatcctcacc gtgagcgccg aatacctcga cctcgcagtc 2640gaatccgtgc gggacgtcaa ctcggtgtcg cagctcgtgg tgttcgacca tcaccccgag 2700gtcgacgacc accgcgacgc actggcccgc gcgcgtgaac aactcgccgg caagggcatc 2760gccgtcacca ccctggacgc gatcgccgac gagggcgccg ggctgccggc cgaaccgatc 2820tacaccgccg accatgatca gcgcctcgcg atgatcctgt acacctcggg ttccaccggc 2880gcacccaagg gtgcgatgta caccgaggcg atggtggcgc ggctgtggac catgtcgttc 2940atcacgggtg accccacgcc ggtcatcaac gtcaacttca tgccgctcaa ccacctgggc 3000gggcgcatcc ccatttccac cgccgtgcag aacggtggaa ccagttactt cgtaccggaa 3060tccgacatgt ccacgctgtt cgaggatctc gcgctggtgc gcccgaccga actcggcctg 3120gttccgcgcg tcgccgacat gctctaccag caccacctcg ccaccgtcga ccgcctggtc 3180acgcagggcg ccgacgaact gaccgccgag aagcaggccg gtgccgaact gcgtgagcag 3240gtgctcggcg gacgcgtgat caccggattc gtcagcaccg caccgctggc 
cgcggagatg 3300agggcgttcc tcgacatcac cctgggcgca cacatcgtcg acggctacgg gctcaccgag 3360accggcgccg tgacacgcga cggtgtgatc gtgcggccac cggtgatcga ctacaagctg 3420atcgacgttc ccgaactcgg ctacttcagc accgacaagc cctacccgcg tggcgaactg 3480ctggtcaggt cgcaaacgct gactcccggg tactacaagc gccccgaggt caccgcgagc 3540gtcttcgacc gggacggcta ctaccacacc ggcgacgtca tggccgagac cgcacccgac 3600cacctggtgt acgtggaccg tcgcaacaac gtcctcaaac tcgcgcaggg cgagttcgtg 3660gcggtcgcca acctggaggc ggtgttctcc ggcgcggcgc tggtgcgcca gatcttcgtg 3720tacggcaaca gcgagcgcag tttccttctg gccgtggtgg tcccgacgcc ggaggcgctc 3780gagcagtacg atccggccgc gctcaaggcc gcgctggccg actcgctgca gcgcaccgca 3840cgcgacgccg aactgcaatc ctacgaggtg ccggccgatt tcatcgtcga gaccgagccg 3900ttcagcgccg ccaacgggct gctgtcgggt gtcggaaaac tgctgcggcc caacctcaaa 3960gaccgctacg ggcagcgcct ggagcagatg tacgccgata tcgcggccac gcaggccaac 4020cagttgcgcg aactgcggcg cgcggccgcc acacaaccgg tgatcgacac cctcacccag 4080gccgctgcca cgatcctcgg caccgggagc gaggtggcat ccgacgccca cttcaccgac 4140ctgggcgggg attccctgtc ggcgctgaca ctttcgaacc tgctgagcga tttcttcggt 4200ttcgaagttc ccgtcggcac catcgtgaac ccggccacca acctcgccca actcgcccag 4260cacatcgagg cgcagcgcac cgcgggtgac cgcaggccga gtttcaccac cgtgcacggc 4320gcggacgcca ccgagatccg ggcgagtgag ctgaccctgg acaagttcat cgacgccgaa 4380acgctccggg ccgcaccggg tctgcccaag gtcaccaccg agccacggac ggtgttgctc 4440tcgggcgcca acggctggct gggccggttc ctcacgttgc agtggctgga acgcctggca 4500cctgtcggcg gcaccctcat cacgatcgtg cggggccgcg acgacgccgc ggcccgcgca 4560cggctgaccc aggcctacga caccgatccc gagttgtccc gccgcttcgc cgagctggcc 4620gaccgccacc tgcgggtggt cgccggtgac atcggcgacc cgaatctggg cctcacaccc 4680gagatctggc accggctcgc cgccgaggtc gacctggtgg tgcatccggc agcgctggtc 4740aaccacgtgc tcccctaccg gcagctgttc ggccccaacg tcgtgggcac ggccgaggtg 4800atcaagctgg ccctcaccga acggatcaag cccgtcacgt acctgtccac cgtgtcggtg 4860gccatgggga tccccgactt cgaggaggac ggcgacatcc ggaccgtgag cccggtgcgc 4920ccgctcgacg gcggatacgc caacggctac ggcaacagca agtgggccgg cgaggtgctg 4980ctgcgggagg cccacgatct 
gtgcgggctg cccgtggcga cgttccgctc ggacatgatc 5040ctggcgcatc cgcgctaccg cggtcaggtc aacgtgccag acatgttcac gcgactcctg 5100ttgagcctct tgatcaccgg cgtcgcgccg cggtcgttct acatcggaga cggtgagcgc 5160ccgcgggcgc actaccccgg cctgacggtc gatttcgtgg ccgaggcggt cacgacgctc 5220ggcgcgcagc agcgcgaggg atacgtgtcc tacgacgtga tgaacccgca cgacgacggg 5280atctccctgg atgtgttcgt ggactggctg atccgggcgg gccatccgat cgaccgggtc 5340gacgactacg acgactgggt gcgtcggttc gagaccgcgt tgaccgcgct tcccgagaag 5400cgccgcgcac agaccgtact gccgctgctg cacgcgttcc gcgctccgca ggcaccgttg 5460cgcggcgcac ccgaacccac ggaggtgttc cacgccgcgg tgcgcaccgc gaaggtgggc 5520ccgggagaca tcccgcacct cgacgaggcg ctgatcgaca agtacatacg cgatctgcgt 5580gagttcggtc tgatctgaga attctagatc tgatcgttgc gggcggggcg agagtctcgc 5640cccgcccgcg accgcggtga aaatacgaga atattatttg tattgatctc ctaggcgggg 5700taccgtattt tggatgataa cgaggcgcaa aaaatggcgg acacgttatt gattctgggt 5760gatagcctga gcgccgggta tcgaatgtct gccagcgcgg cctggcctgc cttgttgaat 5820gataagtggc agagtaaaac gtcggtagtt aatgccagca tcagcggcga cacctcgcaa 5880caaggactgg cgcgccttcc ggctctgctg aaacagcatc agccgcgttg ggtgctggtt 5940gaactgggcg gcaatgacgg tttgcgtggt tttcagccac agcaaaccga gcaaacgctg 6000cgccagattt tgcaggatgt caaagccgcc aacgctgaac cattgttaat gcaaatacgt 6060ctgcctgcaa actatggtcg ccgttataat gaagccttta gcgccattta ccccaaactc 6120gccaaagagt ttgatgttcc gctgctgccc ttttttatgg aagaggtcta cctcaagcca 6180caatggatgc aggatgacgg tattcatccc aaccgcgacg cccagccgtt tattgccgac 6240tggatggcga agcagttgca gcctttagta aatcatgact cataacctag gggtaccgct 6300agcgagctct ctagagaagc ttgggcccga acaaaaactc atctcagaag aggatctgaa 6360tagcgccgtc gaccatcatc atcatcatca ttgagtttaa acggtctcca gcttggctgt 6420tttggcggat gagagaagat tttcagcctg atacagatta aatcagaacg cagaagcggt 6480ctgataaaac agaatttgcc tggcggcagt agcgcggtgg tcccacctga ccccatgccg 6540aactcagaag tgaaacgccg tagcgccgat ggtagtgtgg ggtctcccca tgcgagagta 6600gggaactgcc aggcatcaaa taaaacgaaa ggctcagtcg aaagactggg cctttcgttt 6660tatctgttgt ttgtcggtga acgctctcct gacgcctgat gcggtatttt 
ctccttacgc 6720atctgtgcgg tatttcacac cgcatatggt gcactctcag tacaatctgc tctgatgccg 6780catagttaag ccagccccga cacccgccaa cacccgctga cgagcttagt aaagccctcg 6840ctagatttta atgcggatgt tgcgattact tcgccaacta ttgcgataac aagaaaaagc 6900cagcctttca tgatatatct cccaatttgt gtagggctta ttatgcacgc ttaaaaataa 6960taaaagcaga cttgacctga tagtttggct gtgagcaatt atgtgcttag tgcatctaac 7020gcttgagtta agccgcgccg cgaagcggcg tcggcttgaa cgaattgtta gacattattt 7080gccgactacc ttggtgatct cgcctttcac gtagtggaca aattcttcca actgatctgc 7140gcgcgaggcc aagcgatctt cttcttgtcc aagataagcc tgtctagctt caagtatgac 7200gggctgatac tgggccggca ggcgctccat tgcccagtcg gcagcgacat ccttcggcgc 7260gattttgccg gttactgcgc tgtaccaaat gcgggacaac gtaagcacta catttcgctc 7320atcgccagcc cagtcgggcg gcgagttcca tagcgttaag gtttcattta gcgcctcaaa 7380tagatcctgt tcaggaaccg gatcaaagag ttcctccgcc gctggaccta ccaaggcaac 7440gctatgttct cttgcttttg tcagcaagat agccagatca atgtcgatcg tggctggctc 7500gaagatacct gcaagaatgt cattgcgctg ccattctcca aattgcagtt cgcgcttagc 7560tggataacgc cacggaatga tgtcgtcgtg cacaacaatg gtgacttcta cagcgcggag 7620aatctcgctc tctccagggg aagccgaagt ttccaaaagg tcgttgatca aagctcgccg 7680cgttgtttca tcaagcctta cggtcaccgt aaccagcaaa tcaatatcac tgtgtggctt 7740caggccgcca tccactgcgg agccgtacaa atgtacggcc agcaacgtcg gttcgagatg 7800gcgctcgatg acgccaacta cctctgatag ttgagtcgat acttcggcga tcaccgcttc 7860cctcatgatg tttaactttg ttttagggcg actgccctgc tgcgtaacat cgttgctgct 7920ccataacatc aaacatcgac ccacggcgta acgcgcttgc tgcttggatg cccgaggcat 7980agactgtacc ccaaaaaaac agtcataaca agccatgaaa accgccactg cgccgttacc 8040accgctgcgt tcggtcaagg ttctggacca gttgcgtgag cgcatacgct acttgcatta 8100cagcttacga accgaacagg cttatgtcca ctgggttcgt gccttcatcc gtttccacgg 8160tgtgcgtcac ccggcaacct tgggcagcag cgaagtcgag gcatttctgt cctggctggc 8220gaacgagcgc aaggtttcgg tctccacgca tcgtcaggca ttggcggcct tgctgttctt 8280ctacggcaag gtgctgtgca cggatctgcc ctggcttcag gagatcggaa gacctcggcc 8340gtcgcggcgc ttgccggtgg tgctgacccc ggatgaagtg gttcgcatcc tcggttttct 8400ggaaggcgag catcgtttgt 
tcgcccagct tctgtatgga acgggcatgc ggatcagtga 8460gggtttgcaa ctgcgggtca aggatctgga tttcgatcac ggcacgatca tcgtgcggga 8520gggcaagggc tccaaggatc gggccttgat gttacccgag agcttggcac ccagcctgcg 8580cgagcagggg aattaattcc cacgggtttt gctgcccgca aacgggctgt tctggtgttg 8640ctagtttgtt atcagaatcg cagatccggc ttcagccggt ttgccggctg aaagcgctat 8700ttcttccaga attgccatga ttttttcccc acgggaggcg tcactggctc ccgtgttgtc 8760ggcagctttg attcgataag cagcatcgcc tgtttcaggc tgtctatgtg tgactgttga 8820gctgtaacaa gttgtctcag gtgttcaatt
tcatgttcta gttgctttgt tttactggtt 8880tcacctgttc tattaggtgt tacatgctgt tcatctgtta cattgtcgat ctgttcatgg 8940tgaacagctt tgaatgcacc aaaaactcgt aaaagctctg atgtatctat cttttttaca 9000ccgttttcat ctgtgcatat ggacagtttt ccctttgata tgtaacggtg aacagttgtt 9060ctacttttgt ttgttagtct tgatgcttca ctgatagata caagagccat aagaacctca 9120gatccttccg tatttagcca gtatgttctc tagtgtggtt cgttgttttt gcgtgagcca 9180tgagaacgaa ccattgagat catacttact ttgcatgtca ctcaaaaatt ttgcctcaaa 9240actggtgagc tgaatttttg cagttaaagc atcgtgtagt gtttttctta gtccgttatg 9300taggtaggaa tctgatgtaa tggttgttgg tattttgtca ccattcattt ttatctggtt 9360gttctcaagt tcggttacga gatccatttg tctatctagt tcaacttgga aaatcaacgt 9420atcagtcggg cggcctcgct tatcaaccac caatttcata ttgctgtaag tgtttaaatc 9480tttacttatt ggtttcaaaa cccattggtt aagcctttta aactcatggt agttattttc 9540aagcattaac atgaacttaa attcatcaag gctaatctct atatttgcct tgtgagtttt 9600cttttgtgtt agttctttta ataaccactc ataaatcctc atagagtatt tgttttcaaa 9660agacttaaca tgttccagat tatattttat gaattttttt aactggaaaa gataaggcaa 9720tatctcttca ctaaaaacta attctaattt ttcgcttgag aacttggcat agtttgtcca 9780ctggaaaatc tcaaagcctt taaccaaagg attcctgatt tccacagttc tcgtcatcag 9840ctctctggtt gctttagcta atacaccata agcattttcc ctactgatgt tcatcatctg 9900agcgtattgg ttataagtga acgataccgt ccgttctttc cttgtagggt tttcaatcgt 9960ggggttgagt agtgccacac agcataaaat tagcttggtt tcatgctccg ttaagtcata 10020gcgactaatc gctagttcat ttgctttgaa aacaactaat tcagacatac atctcaattg 10080gtctaggtga ttttaat 10097141471PRTEuglena gracilis 141Val Pro Gln Met Ala Glu Gly Phe Ser Gly Glu Ala Thr Ser Ala Trp 1 5 10 15 Ala Ala Ala Gly Pro Gln Trp Ala Ala Pro Leu Val Ala Ala Ala Ser 20 25 30 Ser Ala Leu Ala Leu Trp Trp Trp Ala Ala Arg Arg Ser Val Arg Arg 35 40 45 Pro Leu Ala Ala Leu Ala Glu Leu Pro Thr Ala Val Thr His Leu Ala 50 55 60 Pro Pro Met Ala Met Phe Thr Thr Thr Ala Lys Val Ile Gln Pro Lys 65 70 75 80 Ile Arg Gly Phe Ile Cys Thr Thr Thr His Pro Ile Gly Cys Glu Lys 85 90 95 Arg Val Gln Glu Glu Ile Ala Tyr Ala Arg Ala His Pro Pro Thr Ser 
100 105 110 Pro Gly Pro Lys Arg Val Leu Val Ile Gly Cys Ser Thr Gly Tyr Gly 115 120 125 Leu Ser Thr Arg Ile Thr Ala Ala Phe Gly Tyr Gln Ala Ala Thr Leu 130 135 140 Gly Val Phe Leu Ala Gly Pro Pro Thr Lys Gly Arg Pro Ala Ala Ala 145 150 155 160 Gly Trp Tyr Asn Thr Val Ala Phe Glu Lys Ala Ala Leu Glu Ala Gly 165 170 175 Leu Tyr Ala Arg Ser Leu Asn Gly Asp Ala Phe Asp Ser Thr Thr Lys 180 185 190 Ala Arg Thr Val Glu Ala Ile Lys Arg Asp Leu Gly Thr Val Asp Leu 195 200 205 Val Val Tyr Ser Ile Ala Ala Pro Lys Arg Thr Asp Pro Ala Thr Gly 210 215 220 Val Leu His Lys Ala Cys Leu Lys Pro Ile Gly Ala Thr Tyr Thr Asn 225 230 235 240 Arg Thr Val Asn Thr Asp Lys Ala Glu Val Thr Asp Val Ser Ile Glu 245 250 255 Pro Ala Ser Pro Glu Glu Ile Ala Asp Thr Val Lys Val Met Gly Gly 260 265 270 Glu Asp Trp Glu Leu Trp Ile Gln Ala Leu Ser Glu Ala Gly Val Leu 275 280 285 Ala Glu Gly Ala Lys Thr Val Ala Tyr Ser Tyr Ile Gly Pro Glu Met 290 295 300 Thr Trp Pro Val Tyr Trp Ser Gly Thr Ile Gly Glu Ala Lys Lys Asp 305 310 315 320 Val Glu Lys Ala Ala Lys Arg Ile Thr Gln Gln Tyr Gly Cys Pro Ala 325 330 335 Tyr Pro Val Val Ala Lys Ala Leu Val Thr Gln Ala Ser Ser Ala Ile 340 345 350 Pro Val Val Pro Leu Tyr Ile Cys Leu Leu Tyr Arg Val Met Lys Glu 355 360 365 Lys Gly Thr His Glu Gly Cys Ile Glu Gln Met Val Arg Leu Leu Thr 370 375 380 Thr Lys Leu Tyr Pro Glu Asn Gly Ala Pro Ile Val Asp Glu Ala Gly 385 390 395 400 Arg Val Arg Val Asp Asp Trp Glu Met Ala Glu Asp Val Gln Gln Ala 405 410 415 Val Lys Asp Leu Trp Ser Gln Val Ser Thr Ala Asn Leu Lys Asp Ile 420 425 430 Ser Asp Phe Ala Gly Tyr Gln Thr Glu Phe Leu Arg Leu Phe Gly Phe 435 440 445 Gly Ile Asp Gly Val Asp Tyr Asp Gln Pro Val Asp Val Glu Ala Asp 450 455 460 Leu Pro Ser Ala Ala Gln Gln 465 470 1421164DNAEscherichia coli 142atggaacagg ttgtcattgt cgatgcaatt cgcaccccga tgggccgttc gaagggcggt 60gcttttcgta acgtgcgtgc agaagatctc tccgctcatt taatgcgtag cctgctggcg 120cgtaacccgg cgctggaagc ggcggccctc gacgatattt actggggttg tgtgcagcag 180acgctggagc agggttttaa 
tatcgcccgt aacgcggcgc tgctggcaga agtaccacac 240tctgtcccgg cggttaccgt taatcgcttg tgtggttcat ccatgcaggc actgcatgac 300gcagcacgaa tgatcatgac tggcgatgcg caggcatgtc tggttggcgg cgtggagcat 360atgggccatg tgccgatgag tcacggcgtc gattttcacc ccggcctgag ccgcaatgtc 420gccaaagcgg cgggcatgat gggcttaacg gcagaaatgc tggcgcgtat gcacggtatc 480agccgtgaaa tgcaggatgc ctttgccgcg cggtcacacg cccgcgcctg ggccgccacg 540cagtcggccg catttaaaaa tgaaatcatc ccgaccggtg gtcacgatgc cgacggcgtc 600ctgaagcagt ttaattacga cgaagtgatt cgcccggaaa ccaccgtgga agccctcgcc 660acgctgcgtc cggcgtttga tccagtaaac ggtatggtaa cggcgggcac atcttctgca 720ctttccgatg gcgcagctgc catgctggtg atgagtgaaa gccgcgccca tgaattaggt 780cttaagccgc gcgctcgtgt gcgttcgatg gcggtcgttg gttgtgaccc atcgattatg 840ggttacggcc cggttccggc ctcgaaactg gcgctgaaaa aagcggggct ttctgccagc 900gatatcggcg tgtttgaaat gaacgaagcc tttgccgcgc agatcctgcc atgtattaaa 960gatctgggac taattgagca gattgacgag aagatcaacc tcaacggtgg cgcgatcgcg 1020ctgggtcatc cgctgggttg ttccggtgcg cgtatcagca ccacgctgct gaatctgatg 1080gaacgcaaag acgttcagtt tggtctggcg acgatgtgta tcggtctggg tcagggtatt 1140gcgacggtgt ttgagcgggt ttaa 1164143387PRTEscherichia coli 143Met Glu Gln Val Val Ile Val Asp Ala Ile Arg Thr Pro Met Gly Arg 1 5 10 15 Ser Lys Gly Gly Ala Phe Arg Asn Val Arg Ala Glu Asp Leu Ser Ala 20 25 30 His Leu Met Arg Ser Leu Leu Ala Arg Asn Pro Ala Leu Glu Ala Ala 35 40 45 Ala Leu Asp Asp Ile Tyr Trp Gly Cys Val Gln Gln Thr Leu Glu Gln 50 55 60 Gly Phe Asn Ile Ala Arg Asn Ala Ala Leu Leu Ala Glu Val Pro His 65 70 75 80 Ser Val Pro Ala Val Thr Val Asn Arg Leu Cys Gly Ser Ser Met Gln 85 90 95 Ala Leu His Asp Ala Ala Arg Met Ile Met Thr Gly Asp Ala Gln Ala 100 105 110 Cys Leu Val Gly Gly Val Glu His Met Gly His Val Pro Met Ser His 115 120 125 Gly Val Asp Phe His Pro Gly Leu Ser Arg Asn Val Ala Lys Ala Ala 130 135 140 Gly Met Met Gly Leu Thr Ala Glu Met Leu Ala Arg Met His Gly Ile 145 150 155 160 Ser Arg Glu Met Gln Asp Ala Phe Ala Ala Arg Ser His Ala Arg Ala 165 170 175 Trp Ala Ala Thr Gln Ser Ala 
Ala Phe Lys Asn Glu Ile Ile Pro Thr 180 185 190 Gly Gly His Asp Ala Asp Gly Val Leu Lys Gln Phe Asn Tyr Asp Glu 195 200 205 Val Ile Arg Pro Glu Thr Thr Val Glu Ala Leu Ala Thr Leu Arg Pro 210 215 220 Ala Phe Asp Pro Val Asn Gly Met Val Thr Ala Gly Thr Ser Ser Ala 225 230 235 240 Leu Ser Asp Gly Ala Ala Ala Met Leu Val Met Ser Glu Ser Arg Ala 245 250 255 His Glu Leu Gly Leu Lys Pro Arg Ala Arg Val Arg Ser Met Ala Val 260 265 270 Val Gly Cys Asp Pro Ser Ile Met Gly Tyr Gly Pro Val Pro Ala Ser 275 280 285 Lys Leu Ala Leu Lys Lys Ala Gly Leu Ser Ala Ser Asp Ile Gly Val 290 295 300 Phe Glu Met Asn Glu Ala Phe Ala Ala Gln Ile Leu Pro Cys Ile Lys 305 310 315 320 Asp Leu Gly Leu Ile Glu Gln Ile Asp Glu Lys Ile Asn Leu Asn Gly 325 330 335 Gly Ala Ile Ala Leu Gly His Pro Leu Gly Cys Ser Gly Ala Arg Ile 340 345 350 Ser Thr Thr Leu Leu Asn Leu Met Glu Arg Lys Asp Val Gln Phe Gly 355 360 365 Leu Ala Thr Met Cys Ile Gly Leu Gly Gln Gly Ile Ala Thr Val Phe 370 375 380 Glu Arg Val 385 1442190DNAEscherichia coli 144atgctttaca aaggcgacac cctgtacctt gactggctgg aagatggcat tgccgaactg 60gtatttgatg ccccaggttc agttaataaa ctcgacactg cgaccgtcgc cagcctcggc 120gaggccatcg gcgtgctgga acagcaatca gatctaaaag ggctgctgct gcgttcgaac 180aaagcagcct ttatcgtcgg tgctgatatc accgaatttt tgtccctgtt cctcgttcct 240gaagaacagt taagtcagtg gctgcacttt gccaatagcg tgtttaatcg cctggaagat 300ctgccggtgc cgaccattgc tgccgtcaat ggctatgcgc tgggcggtgg ctgcgaatgc 360gtgctggcga ccgattatcg tctggcgacg ccggatctgc gcatcggtct gccggaaacc 420aaactgggca tcatgcctgg ctttggcggt tctgtacgta tgccacgtat gctgggcgct 480gacagtgcgc tggaaatcat tgccgccggt aaagatgtcg gcgcggatca ggcgctgaaa 540atcggtctgg tggatggcgt agtcaaagca gaaaaactgg ttgaaggcgc aaaggcggtt 600ttacgccagg ccattaacgg cgacctcgac tggaaagcaa aacgtcagcc gaagctggaa 660ccactaaaac tgagcaagat tgaagccacc atgagcttca ccatcgctaa agggatggtc 720gcacaaacag cggggaaaca ttatccggcc cccatcaccg cagtaaaaac cattgaagct 780gcggcccgtt ttggtcgtga agaagcctta aacctggaaa acaaaagttt tgtcccgctg 840gcgcatacca 
acgaagcccg cgcactggtc ggcattttcc ttaacgatca atatgtaaaa 900ggcaaagcga agaaactcac caaagacgtt gaaaccccga aacaggccgc ggtgctgggt 960gcaggcatta tgggcggcgg catcgcttac cagtctgcgt ggaaaggcgt gccggttgtc 1020atgaaagata tcaacgacaa gtcgttaacc ctcggcatga ccgaagccgc gaaactgctg 1080aacaagcagc ttgagcgcgg caagatcgat ggtctgaaac tggctggcgt gatctccaca 1140atccacccaa cgctcgacta cgccggattt gaccgcgtgg atattgtggt agaagcggtt 1200gttgaaaacc cgaaagtgaa aaaagccgta ctggcagaaa ccgaacaaaa agtacgccag 1260gataccgtgc tggcgtctaa cacttcaacc attcctatca gcgaactggc caacgcgctg 1320gaacgcccgg aaaacttctg cgggatgcac ttctttaacc cggtccaccg aatgccgttg 1380gtagaaatta ttcgcggcga gaaaagctcc gacgaaacca tcgcgaaagt tgtcgcctgg 1440gcgagcaaga tgggcaagac gccgattgtg gttaacgact gccccggctt ctttgttaac 1500cgcgtgctgt tcccgtattt cgccggtttc agccagctgc tgcgcgacgg cgcggatttc 1560cgcaagatcg acaaagtgat ggaaaaacag tttggctggc cgatgggccc ggcatatctg 1620ctggacgttg tgggcattga taccgcgcat cacgctcagg ctgtcatggc agcaggcttc 1680ccgcagcgga tgcagaaaga ttaccgcgat gccatcgacg cgctgtttga tgccaaccgc 1740tttggtcaga agaacggcct cggtttctgg cgttataaag aagacagcaa aggtaagccg 1800aagaaagaag aagacgccgc cgttgaagac ctgctggcag aagtgagcca gccgaagcgc 1860gatttcagcg aagaagagat tatcgcccgc atgatgatcc cgatggtcaa cgaagtggtg 1920cgctgtctgg aggaaggcat tatcgccact ccggcggaag cggatatggc gctggtctac 1980ggcctgggct tccctccgtt ccacggcggc gcgttccgct ggctggacac cctcggtagc 2040gcaaaatacc tcgatatggc acagcaatat cagcacctcg gcccgctgta tgaagtgccg 2100gaaggtctgc gtaataaagc gcgtcataac gaaccgtact atcctccggt tgagccagcc 2160cgtccggttg gcgacctgaa aacggcttaa 2190145729PRTEscherichia coli 145Met Leu Tyr Lys Gly Asp Thr Leu Tyr Leu Asp Trp Leu Glu Asp Gly 1 5 10 15 Ile Ala Glu Leu Val Phe Asp Ala Pro Gly Ser Val Asn Lys Leu Asp 20 25 30 Thr Ala Thr Val Ala Ser Leu Gly Glu Ala Ile Gly Val Leu Glu Gln 35 40 45 Gln Ser Asp Leu Lys Gly Leu Leu Leu Arg Ser Asn Lys Ala Ala Phe 50 55 60 Ile Val Gly Ala Asp Ile Thr Glu Phe Leu Ser Leu Phe Leu Val Pro 65 70 75 80 Glu Glu Gln Leu Ser Gln Trp Leu His 
Phe Ala Asn Ser Val Phe Asn 85 90 95 Arg Leu Glu Asp Leu Pro Val Pro Thr Ile Ala Ala Val Asn Gly Tyr 100 105 110 Ala Leu Gly Gly Gly Cys Glu Cys Val Leu Ala Thr Asp Tyr Arg Leu 115 120 125 Ala Thr Pro Asp Leu Arg Ile Gly Leu Pro Glu Thr Lys Leu Gly Ile 130 135 140 Met Pro Gly Phe Gly Gly Ser Val Arg Met Pro Arg Met Leu Gly Ala 145 150 155 160 Asp Ser Ala Leu Glu Ile Ile Ala Ala Gly Lys Asp Val Gly Ala Asp 165 170 175 Gln Ala Leu Lys Ile Gly Leu Val Asp Gly Val Val Lys Ala Glu Lys 180 185 190 Leu Val Glu Gly Ala Lys Ala Val Leu Arg Gln Ala Ile Asn Gly Asp 195 200 205 Leu Asp Trp Lys Ala Lys Arg Gln Pro Lys Leu Glu Pro Leu Lys Leu 210 215 220 Ser Lys Ile Glu Ala Thr Met Ser Phe Thr Ile Ala Lys Gly Met Val 225 230 235 240 Ala Gln Thr Ala Gly Lys His Tyr Pro Ala Pro Ile Thr Ala Val Lys 245 250 255 Thr Ile Glu Ala Ala Ala Arg Phe Gly Arg Glu Glu Ala Leu Asn Leu 260 265 270 Glu Asn Lys Ser Phe Val Pro Leu Ala His Thr Asn Glu Ala Arg Ala 275 280 285 Leu Val Gly Ile Phe Leu Asn Asp Gln Tyr Val Lys Gly Lys Ala Lys 290 295 300 Lys Leu Thr Lys Asp Val Glu Thr Pro Lys Gln Ala Ala Val Leu Gly 305 310 315 320 Ala Gly Ile Met Gly Gly Gly Ile Ala Tyr Gln Ser Ala Trp Lys Gly 325 330 335 Val Pro Val Val Met Lys Asp Ile Asn Asp Lys Ser Leu Thr Leu Gly 340 345 350 Met Thr Glu Ala Ala Lys Leu Leu Asn Lys Gln Leu Glu Arg Gly Lys 355 360 365 Ile Asp Gly Leu Lys Leu Ala Gly Val Ile Ser Thr Ile His Pro Thr 370 375 380 Leu Asp Tyr Ala Gly Phe Asp Arg Val Asp Ile Val Val Glu Ala Val 385 390 395 400 Val Glu Asn Pro Lys Val Lys Lys Ala Val Leu Ala Glu Thr Glu Gln 405 410 415 Lys Val Arg Gln Asp Thr Val Leu Ala Ser Asn Thr Ser Thr Ile Pro 420 425 430 Ile Ser Glu Leu Ala Asn Ala Leu Glu Arg Pro Glu Asn Phe Cys Gly 435 440 445 Met His Phe Phe Asn Pro Val His Arg Met Pro Leu Val Glu Ile Ile 450 455 460 Arg Gly Glu Lys Ser Ser Asp Glu Thr Ile Ala Lys Val Val Ala Trp 465 470 475 480 Ala Ser Lys Met Gly Lys Thr Pro Ile Val Val Asn Asp Cys Pro Gly 485 490 495 Phe Phe Val Asn Arg Val Leu Phe Pro Tyr 
Phe Ala Gly Phe Ser Gln 500 505 510 Leu Leu Arg Asp Gly Ala Asp Phe Arg Lys Ile Asp Lys Val Met Glu 515 520 525 Lys Gln Phe Gly Trp Pro Met Gly Pro Ala Tyr Leu Leu Asp Val Val 530 535 540 Gly Ile Asp Thr Ala His His Ala Gln Ala Val Met Ala Ala Gly Phe 545 550 555 560 Pro Gln Arg Met Gln Lys Asp Tyr Arg Asp Ala Ile Asp Ala Leu Phe 565 570 575 Asp Ala Asn Arg Phe Gly Gln Lys Asn Gly Leu Gly Phe Trp Arg Tyr 580 585 590 Lys Glu Asp Ser Lys Gly Lys Pro Lys Lys Glu Glu Asp Ala Ala Val 595 600 605 Glu Asp Leu Leu Ala Glu Val Ser Gln Pro Lys Arg Asp Phe Ser Glu 610 615 620 Glu Glu Ile Ile Ala Arg Met Met Ile Pro Met Val Asn Glu Val Val 625 630 635 640 Arg Cys Leu Glu Glu Gly Ile Ile Ala Thr Pro Ala Glu Ala Asp Met 645 650 655 Ala Leu Val Tyr Gly Leu Gly Phe Pro Pro Phe His Gly Gly Ala Phe 660 665 670 Arg Trp Leu Asp Thr Leu Gly Ser Ala Lys Tyr Leu Asp Met Ala Gln 675 680 685 Gln Tyr Gln His Leu Gly Pro Leu Tyr Glu Val Pro Glu Gly Leu Arg 690 695 700 Asn Lys Ala Arg His Asn Glu Pro
Tyr Tyr Pro Pro Val Glu Pro Ala 705 710 715 720 Arg Pro Val Gly Asp Leu Lys Thr Ala 725 1461311DNAEscherichia coli 146atgggtcagg ttttaccgct ggttacccgc cagggcgatc gtatcgccat tgttagcggt 60ttacgtacgc cttttgcccg tcaggcgacg gcttttcatg gcattcccgc ggttgattta 120gggaagatgg tggtaggcga actgctggca cgcagcgaga tccccgccga agtgattgaa 180caactggtct ttggtcaggt cgtacaaatg cctgaagccc ccaacattgc gcgtgaaatt 240gttctcggta cgggaatgaa tgtacatacc gatgcttaca gcgtcagccg cgcttgcgct 300accagtttcc aggcagttgc aaacgtcgca gaaagcctga tggcgggaac tattcgagcg 360gggattgccg gtggggcaga ttcctcttcg gtattgccaa ttggcgtcag taaaaaactg 420gcgcgcgtgc tggttgatgt caacaaagct cgtaccatga gccagcgact gaaactcttc 480tctcgcctgc gtttgcgcga cttaatgccc gtaccacctg cggtagcaga atattctacc 540ggcttgcgga tgggcgacac cgcagagcaa atggcgaaaa cctacggcat cacccgagaa 600cagcaagatg cattagcgca ccgttcgcat cagcgtgccg ctcaggcatg gtcagacgga 660aaactcaaag aagaggtgat gactgccttt atccctcctt ataaacaacc gcttgtcgaa 720gacaacaata ttcgcggtaa ttcctcgctt gccgattacg caaagctgcg cccggcgttt 780gatcgcaaac acggaacggt aacggcggca aacagtacgc cgctgaccga tggcgcggca 840gcggtgatcc tgatgactga atcccgggcg aaagaattag ggctggtgcc gctggggtat 900ctgcgcagct acgcatttac tgcgattgat gtctggcagg acatgttgct cggtccagcc 960tggtcaacac cgctggcgct ggagcgtgcc ggtttgacga tgagcgatct gacattgatc 1020gatatgcacg aagcctttgc agctcagacg ctggcgaata ttcagttgct gggtagtgaa 1080cgttttgctc gtgaagcact ggggcgtgca catgccactg gcgaagtgga cgatagcaaa 1140tttaacgtgc ttggcggttc gattgcttac gggcatccct tcgcggcgac cggcgcgcgg 1200atgattaccc agacattgca tgaacttcgc cgtcgcggcg gtggatttgg tttagttacc 1260gcctgtgctg ccggtgggct tggcgcggca atggttctgg aggcggaata a 1311147436PRTEscherichia coli 147Met Gly Gln Val Leu Pro Leu Val Thr Arg Gln Gly Asp Arg Ile Ala 1 5 10 15 Ile Val Ser Gly Leu Arg Thr Pro Phe Ala Arg Gln Ala Thr Ala Phe 20 25 30 His Gly Ile Pro Ala Val Asp Leu Gly Lys Met Val Val Gly Glu Leu 35 40 45 Leu Ala Arg Ser Glu Ile Pro Ala Glu Val Ile Glu Gln Leu Val Phe 50 55 60 Gly Gln Val Val Gln Met Pro Glu Ala Pro Asn 
Ile Ala Arg Glu Ile 65 70 75 80 Val Leu Gly Thr Gly Met Asn Val His Thr Asp Ala Tyr Ser Val Ser 85 90 95 Arg Ala Cys Ala Thr Ser Phe Gln Ala Val Ala Asn Val Ala Glu Ser 100 105 110 Leu Met Ala Gly Thr Ile Arg Ala Gly Ile Ala Gly Gly Ala Asp Ser 115 120 125 Ser Ser Val Leu Pro Ile Gly Val Ser Lys Lys Leu Ala Arg Val Leu 130 135 140 Val Asp Val Asn Lys Ala Arg Thr Met Ser Gln Arg Leu Lys Leu Phe 145 150 155 160 Ser Arg Leu Arg Leu Arg Asp Leu Met Pro Val Pro Pro Ala Val Ala 165 170 175 Glu Tyr Ser Thr Gly Leu Arg Met Gly Asp Thr Ala Glu Gln Met Ala 180 185 190 Lys Thr Tyr Gly Ile Thr Arg Glu Gln Gln Asp Ala Leu Ala His Arg 195 200 205 Ser His Gln Arg Ala Ala Gln Ala Trp Ser Asp Gly Lys Leu Lys Glu 210 215 220 Glu Val Met Thr Ala Phe Ile Pro Pro Tyr Lys Gln Pro Leu Val Glu 225 230 235 240 Asp Asn Asn Ile Arg Gly Asn Ser Ser Leu Ala Asp Tyr Ala Lys Leu 245 250 255 Arg Pro Ala Phe Asp Arg Lys His Gly Thr Val Thr Ala Ala Asn Ser 260 265 270 Thr Pro Leu Thr Asp Gly Ala Ala Ala Val Ile Leu Met Thr Glu Ser 275 280 285 Arg Ala Lys Glu Leu Gly Leu Val Pro Leu Gly Tyr Leu Arg Ser Tyr 290 295 300 Ala Phe Thr Ala Ile Asp Val Trp Gln Asp Met Leu Leu Gly Pro Ala 305 310 315 320 Trp Ser Thr Pro Leu Ala Leu Glu Arg Ala Gly Leu Thr Met Ser Asp 325 330 335 Leu Thr Leu Ile Asp Met His Glu Ala Phe Ala Ala Gln Thr Leu Ala 340 345 350 Asn Ile Gln Leu Leu Gly Ser Glu Arg Phe Ala Arg Glu Ala Leu Gly 355 360 365 Arg Ala His Ala Thr Gly Glu Val Asp Asp Ser Lys Phe Asn Val Leu 370 375 380 Gly Gly Ser Ile Ala Tyr Gly His Pro Phe Ala Ala Thr Gly Ala Arg 385 390 395 400 Met Ile Thr Gln Thr Leu His Glu Leu Arg Arg Arg Gly Gly Gly Phe 405 410 415 Gly Leu Val Thr Ala Cys Ala Ala Gly Gly Leu Gly Ala Ala Met Val 420 425 430 Leu Glu Ala Glu 435 1482145DNAEscherichia coli 148atggaaatga catcagcgtt tacccttaat gttcgtctgg acaacattgc cgttatcacc 60atcgacgtac cgggtgagaa aatgaatacc ctgaaggcgg agtttgcctc gcaggtgcgc 120gccattatta agcaactccg tgaaaacaaa gagttgcgag gcgtggtgtt tgtctccgct 180aaaccggaca acttcattgc 
tggcgcagac atcaacatga tcggcaactg caaaacggcg 240caagaagcgg aagctctggc gcggcagggc caacagttga tggcggagat tcatgctttg 300cccattcagg ttatcgcggc tattcatggc gcttgcctgg gtggtgggct ggagttggcg 360ctggcgtgcc acggtcgcgt ttgtactgac gatcctaaaa cggtgctcgg tttgcctgaa 420gtacaacttg gattgttacc cggttcaggc ggcacccagc gtttaccgcg tctgataggc 480gtcagcacag cattagagat gatcctcacc ggaaaacaac ttcgggcgaa acaggcatta 540aagctggggc tggtggatga cgttgttccg cactccattc tgctggaagc cgctgttgag 600ctggcaaaga aggagcgccc atcttcccgc cctctacctg tacgcgagcg tattctggcg 660gggccgttag gtcgtgcgct gctgttcaaa atggtcggca agaaaacaga acacaaaact 720caaggcaatt atccggcgac agaacgcatc ctggaggttg ttgaaacggg attagcgcag 780ggcaccagca gcggttatga cgccgaagct cgggcgtttg gcgaactggc gatgacgcca 840caatcgcagg cgctgcgtag tatctttttt gccagtacgg acgtgaagaa agatcccggc 900agtgatgcgc cgcctgcgcc attaaacagc gtggggattt taggtggtgg cttgatgggc 960ggcggtattg cttatgtcac tgcttgtaaa gcggggattc cggtcagaat taaagatatc 1020aacccgcagg gcataaatca tgcgctgaag tacagttggg atcagctgga gggcaaagtt 1080cgccgtcgtc atctcaaagc cagcgaacgt gacaaacagc tggcattaat ctccggaacg 1140acggactatc gcggctttgc ccatcgcgat ctgattattg aagcggtgtt tgaaaatctc 1200gaattgaaac aacagatggt ggcggaagtt gagcaaaatt gcgccgctca taccatcttt 1260gcttcgaata cgtcatcttt accgattggt gatatcgccg ctcacgccac gcgacctgag 1320caagttatcg gcctgcattt cttcagtccg gtggaaaaaa tgccgctggt ggagattatt 1380cctcatgcgg ggacatcggc gcaaaccatc gctaccacag taaaactggc gaaaaaacag 1440ggtaaaacgc caattgtcgt gcgtgacaaa gccggttttt acgtcaatcg catcttagcg 1500ccttacatta atgaagctat ccgcatgttg acccaaggtg aacgggtaga gcacattgat 1560gccgcgctag tgaaatttgg ttttccggta ggcccaatcc aacttttgga tgaggtagga 1620atcgacaccg ggactaaaat tattcctgta ctggaagccg cttatggaga acgttttagc 1680gcgcctgcaa atgttgtttc ttcaattttg aacgacgatc gcaaaggcag aaaaaatggc 1740cggggtttct atctttatgg tcagaaaggg cgtaaaagca aaaaacaggt cgatcccgcc 1800atttacccgc tgattggcac acaagggcag gggcgaatct ccgcaccgca ggttgctgaa 1860cggtgtgtga tgttgatgct gaatgaagca gtacgttgtg ttgatgagca ggttatccgt 
1920agcgtgcgtg acggggatat tggcgcggta tttggcattg gttttccgcc atttctcggt 1980ggaccgttcc gctatatcga ttctctcggc gcgggcgaag tggttgcaat aatgcaacga 2040cttgccacgc agtatggttc ccgttttacc ccttgcgagc gtttggtcga gatgggcgcg 2100cgtggggaaa gtttttggaa aacaactgca actgacctgc aataa 2145149714PRTEscherichia coli 149Met Glu Met Thr Ser Ala Phe Thr Leu Asn Val Arg Leu Asp Asn Ile 1 5 10 15 Ala Val Ile Thr Ile Asp Val Pro Gly Glu Lys Met Asn Thr Leu Lys 20 25 30 Ala Glu Phe Ala Ser Gln Val Arg Ala Ile Ile Lys Gln Leu Arg Glu 35 40 45 Asn Lys Glu Leu Arg Gly Val Val Phe Val Ser Ala Lys Pro Asp Asn 50 55 60 Phe Ile Ala Gly Ala Asp Ile Asn Met Ile Gly Asn Cys Lys Thr Ala 65 70 75 80 Gln Glu Ala Glu Ala Leu Ala Arg Gln Gly Gln Gln Leu Met Ala Glu 85 90 95 Ile His Ala Leu Pro Ile Gln Val Ile Ala Ala Ile His Gly Ala Cys 100 105 110 Leu Gly Gly Gly Leu Glu Leu Ala Leu Ala Cys His Gly Arg Val Cys 115 120 125 Thr Asp Asp Pro Lys Thr Val Leu Gly Leu Pro Glu Val Gln Leu Gly 130 135 140 Leu Leu Pro Gly Ser Gly Gly Thr Gln Arg Leu Pro Arg Leu Ile Gly 145 150 155 160 Val Ser Thr Ala Leu Glu Met Ile Leu Thr Gly Lys Gln Leu Arg Ala 165 170 175 Lys Gln Ala Leu Lys Leu Gly Leu Val Asp Asp Val Val Pro His Ser 180 185 190 Ile Leu Leu Glu Ala Ala Val Glu Leu Ala Lys Lys Glu Arg Pro Ser 195 200 205 Ser Arg Pro Leu Pro Val Arg Glu Arg Ile Leu Ala Gly Pro Leu Gly 210 215 220 Arg Ala Leu Leu Phe Lys Met Val Gly Lys Lys Thr Glu His Lys Thr 225 230 235 240 Gln Gly Asn Tyr Pro Ala Thr Glu Arg Ile Leu Glu Val Val Glu Thr 245 250 255 Gly Leu Ala Gln Gly Thr Ser Ser Gly Tyr Asp Ala Glu Ala Arg Ala 260 265 270 Phe Gly Glu Leu Ala Met Thr Pro Gln Ser Gln Ala Leu Arg Ser Ile 275 280 285 Phe Phe Ala Ser Thr Asp Val Lys Lys Asp Pro Gly Ser Asp Ala Pro 290 295 300 Pro Ala Pro Leu Asn Ser Val Gly Ile Leu Gly Gly Gly Leu Met Gly 305 310 315 320 Gly Gly Ile Ala Tyr Val Thr Ala Cys Lys Ala Gly Ile Pro Val Arg 325 330 335 Ile Lys Asp Ile Asn Pro Gln Gly Ile Asn His Ala Leu Lys Tyr Ser 340 345 350 Trp Asp Gln Leu Glu Gly Lys Val 
Arg Arg Arg His Leu Lys Ala Ser 355 360 365 Glu Arg Asp Lys Gln Leu Ala Leu Ile Ser Gly Thr Thr Asp Tyr Arg 370 375 380 Gly Phe Ala His Arg Asp Leu Ile Ile Glu Ala Val Phe Glu Asn Leu 385 390 395 400 Glu Leu Lys Gln Gln Met Val Ala Glu Val Glu Gln Asn Cys Ala Ala 405 410 415 His Thr Ile Phe Ala Ser Asn Thr Ser Ser Leu Pro Ile Gly Asp Ile 420 425 430 Ala Ala His Ala Thr Arg Pro Glu Gln Val Ile Gly Leu His Phe Phe 435 440 445 Ser Pro Val Glu Lys Met Pro Leu Val Glu Ile Ile Pro His Ala Gly 450 455 460 Thr Ser Ala Gln Thr Ile Ala Thr Thr Val Lys Leu Ala Lys Lys Gln 465 470 475 480 Gly Lys Thr Pro Ile Val Val Arg Asp Lys Ala Gly Phe Tyr Val Asn 485 490 495 Arg Ile Leu Ala Pro Tyr Ile Asn Glu Ala Ile Arg Met Leu Thr Gln 500 505 510 Gly Glu Arg Val Glu His Ile Asp Ala Ala Leu Val Lys Phe Gly Phe 515 520 525 Pro Val Gly Pro Ile Gln Leu Leu Asp Glu Val Gly Ile Asp Thr Gly 530 535 540 Thr Lys Ile Ile Pro Val Leu Glu Ala Ala Tyr Gly Glu Arg Phe Ser 545 550 555 560 Ala Pro Ala Asn Val Val Ser Ser Ile Leu Asn Asp Asp Arg Lys Gly 565 570 575 Arg Lys Asn Gly Arg Gly Phe Tyr Leu Tyr Gly Gln Lys Gly Arg Lys 580 585 590 Ser Lys Lys Gln Val Asp Pro Ala Ile Tyr Pro Leu Ile Gly Thr Gln 595 600 605 Gly Gln Gly Arg Ile Ser Ala Pro Gln Val Ala Glu Arg Cys Val Met 610 615 620 Leu Met Leu Asn Glu Ala Val Arg Cys Val Asp Glu Gln Val Ile Arg 625 630 635 640 Ser Val Arg Asp Gly Asp Ile Gly Ala Val Phe Gly Ile Gly Phe Pro 645 650 655 Pro Phe Leu Gly Gly Pro Phe Arg Tyr Ile Asp Ser Leu Gly Ala Gly 660 665 670 Glu Val Val Ala Ile Met Gln Arg Leu Ala Thr Gln Tyr Gly Ser Arg 675 680 685 Phe Thr Pro Cys Glu Arg Leu Val Glu Met Gly Ala Arg Gly Glu Ser 690 695 700 Phe Trp Lys Thr Thr Ala Thr Asp Leu Gln 705 710 150789DNAEscherichia coli 150atgggttttc tttccggtaa gcgcattctg gtaaccggtg ttgccagcaa actatccatc 60gcctacggta tcgctcaggc gatgcaccgc gaaggagctg aactggcatt cacctaccag 120aacgacaaac tgaaaggccg cgtagaagaa tttgccgctc aattgggttc tgacatcgtt 180ctgcagtgcg atgttgcaga agatgccagc atcgacacca tgttcgctga 
actggggaaa 240gtttggccga aatttgacgg tttcgtacac tctattggtt ttgcacctgg cgatcagctg 300gatggtgact atgttaacgc cgttacccgt gaaggcttca aaattgccca cgacatcagc 360tcctacagct tcgttgcaat ggcaaaagct tgccgctcca tgctgaatcc gggttctgcc 420ctgctgaccc tttcctacct tggcgctgag cgcgctatcc cgaactacaa cgttatgggt 480ctggcaaaag cgtctctgga agcgaacgtg cgctatatgg cgaacgcgat gggtccggaa 540ggtgtgcgtg ttaacgccat ctctgctggt ccgatccgta ctctggcggc ctccggtatc 600aaagacttcc gcaaaatgct ggctcattgc gaagccgtta ccccgattcg ccgtaccgtt 660actattgaag atgtgggtaa ctctgcggca ttcctgtgct ccgatctctc tgccggtatc 720tccggtgaag tggtccacgt tgacggcggt ttcagcattg ctgcaatgaa cgaactcgaa 780ctgaaataa 789151262PRTEscherichia coli 151Met Gly Phe Leu Ser Gly Lys Arg Ile Leu Val Thr Gly Val Ala Ser 1 5 10 15 Lys Leu Ser Ile Ala Tyr Gly Ile Ala Gln Ala Met His Arg Glu Gly 20 25 30 Ala Glu Leu Ala Phe Thr Tyr Gln Asn Asp Lys Leu Lys Gly Arg Val 35 40 45 Glu Glu Phe Ala Ala Gln Leu Gly Ser Asp Ile Val Leu Gln Cys Asp 50 55 60 Val Ala Glu Asp Ala Ser Ile Asp Thr Met Phe Ala Glu Leu Gly Lys 65 70 75 80 Val Trp Pro Lys Phe Asp Gly Phe Val His Ser Ile Gly Phe Ala Pro 85 90 95 Gly Asp Gln Leu Asp Gly Asp Tyr Val Asn Ala Val Thr Arg Glu Gly 100 105 110 Phe Lys Ile Ala His Asp Ile Ser Ser Tyr Ser Phe Val Ala Met Ala 115 120 125 Lys Ala Cys Arg Ser Met Leu Asn Pro Gly Ser Ala Leu Leu Thr Leu 130 135 140 Ser Tyr Leu Gly Ala Glu Arg Ala Ile Pro Asn Tyr Asn Val Met Gly 145 150 155 160 Leu Ala Lys Ala Ser Leu Glu Ala Asn Val Arg Tyr Met Ala Asn Ala 165 170 175 Met Gly Pro Glu Gly Val Arg Val Asn Ala Ile Ser Ala Gly Pro Ile 180 185 190 Arg Thr Leu Ala Ala Ser Gly Ile Lys Asp Phe Arg Lys Met Leu Ala 195 200 205 His Cys Glu Ala Val Thr Pro Ile Arg Arg Thr Val Thr Ile Glu Asp 210 215 220 Val Gly Asn Ser Ala Ala Phe Leu Cys Ser Asp Leu Ser Ala Gly Ile 225 230 235 240 Ser Gly Glu Val Val His Val Asp Gly Gly Phe Ser Ile Ala Ala Met 245 250 255 Asn Glu Leu Glu Leu Lys 260 152861DNAEscherichia coli 152atgagtcagg cgctaaaaaa tttactgaca ttgttaaatc tggaaaaaat 
tgaggaagga 60ctctttcgcg gccagagtga agatttaggt ttacgccagg tgtttggcgg ccaggtcgtg 120ggtcaggcct tgtatgctgc aaaagagacc gtccctgaag agcggctggt acattcgttt 180cacagctact ttcttcgccc tggcgatagt aagaagccga ttatttatga tgtcgaaacg 240ctgcgtgacg gtaacagctt cagcgcccgc cgggttgctg ctattcaaaa cggcaaaccg 300attttttata tgactgcctc tttccaggca ccagaagcgg gtttcgaaca tcaaaaaaca 360atgccgtccg cgccagcgcc tgatggcctc ccttcggaaa cgcaaatcgc ccaatcgctg 420gcgcacctgc tgccgccagt gctgaaagat aaattcatct gcgatcgtcc gctggaagtc 480cgtccggtgg agtttcataa cccactgaaa ggtcacgtcg cagaaccaca tcgtcaggtg 540tggatccgcg caaatggtag cgtgccggat gacctgcgcg ttcatcagta tctgctcggt 600tacgcttctg atcttaactt cctgccggta gctctacagc cgcacggcat cggttttctc 660gaaccgggga ttcagattgc caccattgac cattccatgt ggttccatcg cccgtttaat 720ttgaatgaat ggctgctgta tagcgtggag agcacctcgg cgtccagcgc acgtggcttt 780gtgcgcggtg agttttatac ccaagacggc gtactggttg cctcgaccgt tcaggaaggg 840gtgatgcgta atcacaatta a 861153286PRTEscherichia coli 153Met Ser Gln Ala Leu Lys Asn Leu Leu Thr Leu Leu Asn Leu Glu Lys 1 5 10 15 Ile Glu Glu Gly Leu Phe Arg Gly Gln Ser Glu Asp Leu Gly Leu Arg 20 25 30 Gln Val Phe Gly Gly Gln Val Val Gly Gln Ala Leu Tyr Ala Ala Lys 35
40 45 Glu Thr Val Pro Glu Glu Arg Leu Val His Ser Phe His Ser Tyr Phe 50 55 60 Leu Arg Pro Gly Asp Ser Lys Lys Pro Ile Ile Tyr Asp Val Glu Thr 65 70 75 80 Leu Arg Asp Gly Asn Ser Phe Ser Ala Arg Arg Val Ala Ala Ile Gln 85 90 95 Asn Gly Lys Pro Ile Phe Tyr Met Thr Ala Ser Phe Gln Ala Pro Glu 100 105 110 Ala Gly Phe Glu His Gln Lys Thr Met Pro Ser Ala Pro Ala Pro Asp 115 120 125 Gly Leu Pro Ser Glu Thr Gln Ile Ala Gln Ser Leu Ala His Leu Leu 130 135 140 Pro Pro Val Leu Lys Asp Lys Phe Ile Cys Asp Arg Pro Leu Glu Val 145 150 155 160 Arg Pro Val Glu Phe His Asn Pro Leu Lys Gly His Val Ala Glu Pro 165 170 175 His Arg Gln Val Trp Ile Arg Ala Asn Gly Ser Val Pro Asp Asp Leu 180 185 190 Arg Val His Gln Tyr Leu Leu Gly Tyr Ala Ser Asp Leu Asn Phe Leu 195 200 205 Pro Val Ala Leu Gln Pro His Gly Ile Gly Phe Leu Glu Pro Gly Ile 210 215 220 Gln Ile Ala Thr Ile Asp His Ser Met Trp Phe His Arg Pro Phe Asn 225 230 235 240 Leu Asn Glu Trp Leu Leu Tyr Ser Val Glu Ser Thr Ser Ala Ser Ser 245 250 255 Ala Arg Gly Phe Val Arg Gly Glu Phe Tyr Thr Gln Asp Gly Val Leu 260 265 270 Val Ala Ser Thr Val Gln Glu Gly Val Met Arg Asn His Asn 275 280 285 154912DNAAcinetobacter sp. 
ADP1 154ttgatatcaa tcagggaaaa acgcgtgaac aaaaaacttg aagctctctt ccgagagaat 60gtaaaaggta aagtggcttt gatcactggt gcatctagtg gaatcggttt gacgattgca 120aaaagaattg ctgcggcagg tgctcatgta ttattggttg cccgaaccca agaaacactg 180gaagaagtga aagctgcaat tgaacagcaa gggggacagg cctctatttt tccttgtgac 240ctgactgaca tgaatgcgat tgaccagtta tcacaacaaa ttatggccag tgtcgatcat 300gtcgatttcc tgatcaataa tgcagggcgt tcgattcgcc gtgccgtaca cgagtcgttt 360gatcgcttcc atgattttga acgcaccatg cagctgaatt actttggtgc ggtacgttta 420gtgttaaatt tactgccaca tatgattaag cgtaaaaatg gccagatcat caatatcagc 480tctattggtg tattggccaa tgcgacccgt ttttctgctt atgtcgcgtc taaagctgcg 540ctggatgcct tcagtcgctg tctttcagcc gaggtactca agcataaaat ctcaattacc 600tcgatttata tgccattggt gcgtacccca atgatcgcac ccaccaaaat ttataaatac 660gtgcccacgc tttccccaga agaagccgca gatctcattg tctacgccat tgtgaaacgt 720ccaaaacgta ttgcgacgca cttgggtcgt ctggcgtcaa ttacctatgc catcgcacca 780gacatcaata atattctgat gtcgattgga tttaacctat tcccaagctc aacggctgca 840ctgggtgaac aggaaaaatt gaatctgcta caacgtgcct atgcccgctt gttcccaggc 900gaacactggt aa 912155303PRTAcinetobacter sp. 
ADP1 155Met Ile Ser Ile Arg Glu Lys Arg Val Asn Lys Lys Leu Glu Ala Leu 1 5 10 15 Phe Arg Glu Asn Val Lys Gly Lys Val Ala Leu Ile Thr Gly Ala Ser 20 25 30 Ser Gly Ile Gly Leu Thr Ile Ala Lys Arg Ile Ala Ala Ala Gly Ala 35 40 45 His Val Leu Leu Val Ala Arg Thr Gln Glu Thr Leu Glu Glu Val Lys 50 55 60 Ala Ala Ile Glu Gln Gln Gly Gly Gln Ala Ser Ile Phe Pro Cys Asp 65 70 75 80 Leu Thr Asp Met Asn Ala Ile Asp Gln Leu Ser Gln Gln Ile Met Ala 85 90 95 Ser Val Asp His Val Asp Phe Leu Ile Asn Asn Ala Gly Arg Ser Ile 100 105 110 Arg Arg Ala Val His Glu Ser Phe Asp Arg Phe His Asp Phe Glu Arg 115 120 125 Thr Met Gln Leu Asn Tyr Phe Gly Ala Val Arg Leu Val Leu Asn Leu 130 135 140 Leu Pro His Met Ile Lys Arg Lys Asn Gly Gln Ile Ile Asn Ile Ser 145 150 155 160 Ser Ile Gly Val Leu Ala Asn Ala Thr Arg Phe Ser Ala Tyr Val Ala 165 170 175 Ser Lys Ala Ala Leu Asp Ala Phe Ser Arg Cys Leu Ser Ala Glu Val 180 185 190 Leu Lys His Lys Ile Ser Ile Thr Ser Ile Tyr Met Pro Leu Val Arg 195 200 205 Thr Pro Met Ile Ala Pro Thr Lys Ile Tyr Lys Tyr Val Pro Thr Leu 210 215 220 Ser Pro Glu Glu Ala Ala Asp Leu Ile Val Tyr Ala Ile Val Lys Arg 225 230 235 240 Pro Lys Arg Ile Ala Thr His Leu Gly Arg Leu Ala Ser Ile Thr Tyr 245 250 255 Ala Ile Ala Pro Asp Ile Asn Asn Ile Leu Met Ser Ile Gly Phe Asn 260 265 270 Leu Phe Pro Ser Ser Thr Ala Ala Leu Gly Glu Gln Glu Lys Leu Asn 275 280 285 Leu Leu Gln Arg Ala Tyr Ala Arg Leu Phe Pro Gly Glu His Trp 290 295 300 156296PRTClostridium acetobutylicum 156Met Ile Lys Ser Phe Asn Glu Ile Ile Met Lys Val Lys Ser Lys Glu 1 5 10 15 Met Lys Lys Val Ala Val Ala Val Ala Gln Asp Glu Pro Val Leu Glu 20 25 30 Ala Val Arg Asp Ala Lys Lys Asn Gly Ile Ala Asp Ala Ile Leu Val 35 40 45 Gly Asp His Asp Glu Ile Val Ser Ile Ala Leu Lys Ile Gly Met Asp 50 55 60 Val Asn Asp Phe Glu Ile Val Asn Glu Pro Asn Val Lys Lys Ala Ala 65 70 75 80 Leu Lys Ala Val Glu Leu Val Ser Thr Gly Lys Ala Asp Ile Leu Met 85 90 95 Asn Gly Leu Val Asn Thr Ala Thr Phe Leu Lys Ile Cys Ile Leu Asn 100 105 110 
Lys Glu Val Gly Leu Arg Thr Gly Lys Thr Met Ser His Val Ala Val 115 120 125 Phe Glu Thr Glu Thr Ser Asp Arg Leu Ser Phe Leu Thr Asp Val Ala 130 135 140 Phe Asn Thr Tyr Pro Glu Leu Lys Glu Lys Ile Asp Ile Val Asn Asn 145 150 155 160 Ser Val Lys Val Ala His Ala Ile Gly Ile Val Asn Pro Lys Val Ala 165 170 175 Pro Ile Cys Ala Val Glu Val Ile Asn Pro Lys Met Pro Ser Thr Leu 180 185 190 Asp Ala Ala Met Leu Ser Lys Met Ser Asp Arg Gly Gln Ile Lys Gly 195 200 205 Cys Val Val Asp Gly Pro Leu Ala Leu Asp Ile Ala Leu Ser Glu Glu 210 215 220 Ala Ala His His Lys Gly Val Thr Gly Glu Val Ala Gly Lys Ala Asp 225 230 235 240 Ile Phe Leu Met Pro Asn Ile Glu Thr Gly Asn Val Met Tyr Lys Thr 245 250 255 Leu Thr Tyr Thr Thr Asp Ser Lys Asn Gly Gly Ile Leu Val Gly Thr 260 265 270 Ser Ala Pro Val Val Leu Thr Ser Arg Ala Asp Ser His Glu Thr Lys 275 280 285 Met Asn Ser Ile Ala Leu Ala Ala 290 295 157355PRTClostridium acetobutylicum 157Met Ser Tyr Lys Leu Leu Ile Ile Asn Pro Gly Ser Thr Ser Thr Lys 1 5 10 15 Ile Gly Val Tyr Glu Gly Glu Lys Glu Leu Phe Glu Glu Thr Leu Arg 20 25 30 His Thr Asn Glu Glu Ile Lys Arg Tyr Asp Thr Ile Tyr Asp Gln Phe 35 40 45 Glu Phe Arg Lys Glu Val Ile Leu Asn Val Leu Lys Glu Lys Asn Phe 50 55 60 Asp Ile Lys Thr Leu Ser Ala Ile Val Gly Arg Gly Gly Met Leu Arg 65 70 75 80 Pro Val Glu Gly Gly Thr Tyr Ala Val Asn Asp Ala Met Val Glu Asp 85 90 95 Leu Lys Val Gly Val Gln Gly Pro His Ala Ser Asn Leu Gly Gly Ile 100 105 110 Ile Ala Lys Ser Ile Gly Asp Glu Leu Asn Ile Pro Ser Phe Ile Val 115 120 125 Asp Pro Val Val Thr Asp Glu Leu Ala Asp Val Ala Arg Leu Ser Gly 130 135 140 Val Pro Glu Leu Pro Arg Lys Ser Lys Phe His Ala Leu Asn Gln Lys 145 150 155 160 Ala Val Ala Lys Arg Tyr Gly Lys Glu Ser Gly Gln Gly Tyr Glu Asn 165 170 175 Leu Asn Leu Val Val Val His Met Gly Gly Gly Val Ser Val Gly Ala 180 185 190 His Asn His Gly Lys Val Val Asp Val Asn Asn Ala Leu Asp Gly Asp 195 200 205 Gly Pro Phe Ser Pro Glu Arg Ala Gly Ser Val Pro Ile Gly Asp Leu 210 215 220 Val Lys Met Cys Phe 
Ser Gly Lys Tyr Ser Glu Ala Glu Val Tyr Gly 225 230 235 240 Lys Ala Val Gly Lys Gly Gly Phe Val Gly Tyr Leu Asn Thr Asn Asp 245 250 255 Val Lys Gly Val Ile Asp Lys Met Glu Glu Gly Asp Lys Glu Cys Glu 260 265 270 Ser Ile Tyr Lys Ala Phe Val Tyr Gln Ile Ser Lys Ala Ile Gly Glu 275 280 285 Met Ser Val Val Leu Glu Gly Lys Val Asp Gln Ile Ile Phe Thr Gly 290 295 300 Gly Ile Ala Tyr Ser Pro Thr Leu Val Pro Asp Leu Lys Ala Lys Val 305 310 315 320 Glu Trp Ile Ala Pro Val Thr Val Tyr Pro Gly Glu Asp Glu Leu Leu 325 330 335 Ala Leu Ala Gln Gly Ala Ile Arg Val Leu Asp Gly Glu Glu Gln Ala 340 345 350 Lys Val Tyr 355 15870DNAArtificial sequenceSynthetic primer 158aaaaacagca acaatgtgag ctttgttgta attatattgt aaacatattg attccgggga 60tccgtcgacc 7015968DNAArtificial sequenceSynthetic primer 159aaacggagcc tttcggctcc gttattcatt tacgcggctt caactttcct gtaggctgga 60gctgcttc 6816023DNAArtificial sequenceSynthetic primer 160cgggcaggtg ctatgaccag gac 2316123DNAArtificial sequenceSynthetic primer 161cgcggcgttg accggcagcc tgg 2316270DNAArtificial sequenceSynthetic primer 162atcattctcg tttacgttat cattcacttt acatcagaga tataccaatg attccgggga 60tccgtcgacc 7016369DNAArtificial sequenceSynthetic primer 163gcacggaaat ccgtgcccca aaagagaaat tagaaacgga aggttgcggt tgtaggctgg 60agctgcttc 6916421DNAArtificial sequenceSynthetic primer 164caacagcaac ctgctcagca a 2116521DNAArtificial sequenceSynthetic primer 165aagctggagc agcaaagcgt t 2116632DNAArtificial sequenceSynthetic primer 166ataaaccatg gatccatgaa cgagtacgcc cc 3216733DNAArtificial sequenceSynthetic primer 167ccaagcttcg aattctcaga tatgcaaggc gtg 3316836DNAArtificial sequenceSynthetic primer 168tgaattccat ggcgcaactc actcttcttt tagtcg 3616939DNAArtificial sequenceSynthetic primer 169cagtacctcg agtcttcgta tacatatgcg ctcagtcac 3917021DNAArtificial sequenceSynthetic primer 170ccttggggca tatgaaagct g 2117129DNAArtificial sequenceSynthetic primer 171tttagtcatc tcgagtgcac ctcaccttt 2917270DNAArtificial sequenceSynthetic primer 172gccacattgc cgcgccaaac 
gaaaccgttt caaccatggc atatgaatat cctccttagt 60 tcctattccg 70
173 61 DNA Artificial sequence Synthetic primer
173 cgccccagat ttcacgtatt gatcggctac gcttaatgca tgtgtaggct ggagctgctt 60 c 61
174 20 DNA Artificial sequence Synthetic primer
174 ttgacacgtc taaccctggc 20
175 21 DNA Artificial sequence Synthetic primer
175 ctgtccaggg aacacaaatg c 21
176 19 DNA Artificial sequence Synthetic primer
176 ttgtgtcgcc ctttcgctg 19
177 24 DNA Artificial sequence Synthetic primer
177 cttacgtacg tactcgagtg acgc 24
178 23 DNA Artificial sequence Synthetic primer
178 aagtggggca tatgtctaag atc 23
179 23 DNA Artificial sequence Synthetic primer
179 gtgatccggc tcgaggtggt tac 23
180 24 DNA Artificial sequence Synthetic primer
180 cttaacttca tgtgaaaagt ttgt 24
181 24 DNA Artificial sequence Synthetic primer
181 acaataccca tgtttatagg gcaa 24

* * * * *
