Selection of host cells expressing protein at high levels Otte; Arie Pieter ; et al. [ChromaGenics B.V.]

Patent Application Summary

U.S. patent application number 13/135966 was filed with the patent office on 2011-12-08 for selection of host cells expressing protein at high levels. This patent application is currently assigned to ChromaGenics B.V. Invention is credited to Theodorus Hendrikus Jacobus Kwaks, Arie Pieter Otte, Richard George Antonius Bernardus Sewalt, Henricus Johannes Maria Van Blokland.

Application Number	20110300580 13/135966
Family ID	46324389
Filed Date	2011-12-08

United States Patent Application	20110300580
Kind Code	A1
Otte; Arie Pieter ; et al.	December 8, 2011

Selection of host cells expressing protein at high levels

Abstract

Described is a DNA molecule comprising an open reading frame sequence that encodes a selectable marker polypeptide, wherein the DNA molecule in the coding strand comprises a translation start sequence for the selectable marker polypeptide having a GTG start codon or a TTG start codon, and wherein the ORF sequence that encodes the selectable marker protein has been mutated to replace at least half of its CpG dinucleotides as compared to the native ORF sequence that encodes the selectable marker protein. Further provided are such DNA molecules wherein the ORF sequence that encodes a selectable marker polypeptide is part of a multicistronic transcription unit that further comprises an open reading frame sequence encoding a polypeptide of interest. Also described are methods for obtaining host cells expressing a polypeptide of interest, wherein the host cells comprise the DNA molecules described herein. Further provided is the production of polypeptides of interest, comprising culturing host cells comprising the DNA molecules described herein.

Inventors:	Otte; Arie Pieter; (Amersfoort, NL) ; Van Blokland; Henricus Johannes Maria; (Wijdewormer, NL) ; Kwaks; Theodorus Hendrikus Jacobus; (Amsterdam, NL) ; Sewalt; Richard George Antonius Bernardus; (Arnhem, NL)
Assignee:	ChromaGenics B.V.
Family ID:	46324389
Appl. No.:	13/135966
Filed:	July 18, 2011

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
11416490	May 2, 2006
13135966
11269525	Nov 7, 2005
11416490
11359953	Feb 21, 2006
11269525
11269525	Nov 7, 2005
11359953
12226706	Oct 24, 2008	8039230
PCT/EP2007/053984	Apr 24, 2007
11269525
60626301	Nov 8, 2004
60696610	Jul 5, 2005

Current U.S. Class:	435/70.3 ; 435/243; 435/252.33; 435/320.1; 435/352; 435/354; 435/358; 435/366; 435/369; 435/455; 435/471; 435/71.2; 536/23.5
Current CPC Class:	C12N 2830/46 20130101; C12N 15/67 20130101; C12N 2840/50 20130101; C12Y 207/01095 20130101; C12N 9/1205 20130101; C12N 15/85 20130101; C12N 2840/203 20130101; C07K 14/505 20130101; C12N 2840/20 20130101; C12N 2840/206 20130101
Class at Publication:	435/70.3 ; 536/23.5; 435/320.1; 435/352; 435/455; 435/358; 435/369; 435/354; 435/366; 435/243; 435/252.33; 435/471; 435/71.2
International Class:	C12P 21/02 20060101 C12P021/02; C12N 15/85 20060101 C12N015/85; C12N 15/63 20060101 C12N015/63; C12N 1/00 20060101 C12N001/00; C12N 1/21 20060101 C12N001/21; C07H 21/04 20060101 C07H021/04; C12N 5/10 20060101 C12N005/10

Foreign Application Data

Date	Code	Application Number
Nov 8, 2004	EP	04105593.0
May 2, 2006	EP	06113354.2

Claims

1.-92. (canceled)

93. A DNA molecule comprising an open reading frame encoding a selectable marker polypeptide, wherein the DNA molecule in the coding strand comprises a translation start sequence for the selectable marker polypeptide selected from the group consisting of: a) a GTG start codon; and b) a TTG start codon; and wherein the open reading frame encoding the selectable marker protein has been mutated to replace at least half of its CpG dinucleotides as in comparison to the native open reading frame encoding the selectable marker protein.

94. The DNA molecule of claim 93, wherein the translation start sequence for the selectable marker polypeptide comprises a TTG start codon.

95. The DNA molecule of claim 94, wherein the open reading frame encoding the selectable marker polypeptide has no ATG sequence in the coding strand.

96. The DNA molecule of claim 95, wherein the selectable marker polypeptide provides resistance against ZEOCIN® antibiotic or against neomycin.

97. The DNA molecule of claim 96, comprising an open reading frame encoding a polypeptide that provides resistance against ZEOCIN® antibiotic, wherein the DNA molecule comprises a sequence selected from the group consisting of: a) SEQ ID NO:92, with the proviso that at least half of the CpG dinucleotides has been replaced without mutating the amino acid sequence that is encoded, and with the further proviso that the start codon is either GTG or TTG; and b) SEQ ID NO:92 wherein nucleotide A at position 280 is replaced by T, and with the proviso that at least half of the CpG dinucleotides has been replaced without mutating the amino acid sequence that is encoded, and with the further proviso that the start codon is either GTG or TTG.

98. (canceled)

99. The DNA molecule of claim 96, comprising an open reading frame encoding a polypeptide that provides resistance against neomycin, wherein the DNA molecule comprises a sequence selected from the group consisting of: a) SEQ ID NO:128, with the proviso that at least half of the CpG dinucleotides has been replaced without mutating the amino acid sequence that is encoded, and with the further proviso that the start codon is either GTG or TTG; and b) SEQ ID NO:118, with the proviso that at least half of the CpG dinucleotides of the coding strand has been replaced without mutating the amino acid sequence that is encoded, and with the further proviso that the start codon is either GTG or TTG; and c) SEQ ID NO:128 or SEQ ID NO:118, with the proviso that it contains a mutation to encode either of the following polypeptide variants as in comparison to the polypeptide encoded by the native sequences: (i) substitution of valine at position 201 into glycine (201V>G), or (ii) substitution of glutamic acid at position 185 into aspartic acid (185E>D), or (iii) a combination of both mutations (i) and (ii) (185E>D and 201V>G), with the further proviso that at least half of the CpG dinucleotides of the coding strand has been replaced without further mutating the amino acid sequence that is encoded beyond the mutation indicated under (i)-(iii), and with the further proviso that the start codon is either GTG or TTG.

100. The DNA molecule of claim 99, comprising SEQ ID NO:130, with the proviso that nucleotide A at position 555 is replaced by C, and that nucleotide T at position 602 is replaced by G and that nucleotide G at position 603 is replaced by T, and with the further proviso that the start codon is either GTG or TTG.

101. The DNA molecule of claim 93, wherein the open reading frame encoding a selectable marker polypeptide is part of a multicistronic transcription unit that further comprises an open reading frame polynucleotide encoding a polypeptide of interest.

102. The DNA molecule of claim 101, wherein the open reading frame encoding the selectable marker polypeptide is upstream of the open reading frame encoding the polypeptide of interest, and wherein the open reading frame encoding the selectable marker polypeptide has no ATG sequence in the coding strand.

103. The DNA molecule of claim 101, wherein the open reading frame encoding the polypeptide of interest is upstream of the open reading frame encoding the selectable marker polypeptide, and wherein the open reading frame encoding the selectable marker polypeptide is operably linked to an internal ribosome entry site (IRES).

104. An expression cassette comprising the DNA molecule of claim 103, the expression cassette comprising a promoter upstream of the multicistronic expression unit and a transcription termination sequence downstream of the multicistronic expression unit, wherein the expression cassette is functional in a eukaryotic host cell for initiating transcription of the multicistronic expression unit.

105. The expression cassette of claim 104, further comprising at least one chromatin control element selected from the group consisting of matrix or scaffold attachment regions (MAR/SAR), and anti-repressor (STAR) sequences.

106. The expression cassette of claim 105, wherein the at least one chromatin control element is an anti-repressor molecule selected from the group consisting of any one of SEQ ID NO:1 through SEQ ID NO:66 and the complement of any one of SEQ ID NO:1 through SEQ ID NO:66.

107. The expression cassette of claim 106, wherein the expression cassette comprises SEQ ID NO:66 positioned upstream of the promoter that drives transcription of the multicistronic expression unit.

108. The expression cassette of claim 107, wherein the multicistronic expression unit is flanked on both sides by at least one anti-repressor molecule selected from the group consisting of any one of SEQ ID NO:1 through SEQ ID NO:65 and the complement of any one of SEQ ID NO:1 through SEQ ID NO:65.

109. A host cell comprising the DNA molecule of claim 103.

110. A host cell comprising the expression cassette of claim 108.

111. A method of generating a host cell able to express a polypeptide of interest, the method comprising the steps of: a) introducing into a plurality of precursor cells a DNA molecule according to claim 103, and b) culturing the plurality of precursor cells under conditions suitable for expression of the selectable marker polypeptide, and c) selecting at least one host cell expressing the polypeptide of interest.

112. A method of generating a host cell able to express a polypeptide of interest, the method comprising the steps of: a) introducing into a plurality of precursor cells an expression cassette according to claim 104, and b) culturing the plurality of precursor cells under conditions suitable for expression of the selectable marker polypeptide, and c) selecting at least one host cell expressing the polypeptide of interest.

113. A method of expressing a polypeptide of interest, comprising culturing a host cell comprising the expression cassette of claim 104, and expressing the polypeptide of interest from the expression cassette.

114. The method according to claim 113, further comprising harvesting the polypeptide of interest.

115.-140. (canceled)

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of co-pending U.S. patent application Ser. No. 11/416,490, filed May 2, 2006, which is a continuation-in-part of co-pending U.S. patent application Ser. No. 11/269,525, filed Nov. 7, 2005, which application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application Ser. No. 60/626,301, filed Nov. 8, 2004, and U.S. Provisional Patent Application Ser. No. 60/696,610, filed Jul. 5, 2005. U.S. patent application Ser. No. 11/269,525 also claims the benefit of EP 04105593.0, filed Nov. 8, 2004. This application is further a continuation-in-part of co-pending U.S. patent application Ser. No. 11/359,953, filed Feb. 21, 2006, and which itself is a continuation-in-part of the aforementioned co-pending U.S. patent application Ser. No. 11/269,525, filed Nov. 7, 2005. This application is also a continuation of co-pending U.S. Ser. No. 12/226,706, filed Oct. 24, 2008, which is the national stage of PCT International Patent Application No. PCT/EP2007/053984, filed on Apr. 24, 2007, designating the United States of America, and published, in English, as PCT International Publication No. WO 2007/128685 A1 on Nov. 15, 2007, and claims priority to U.S. Ser. No. 11/416,490, filed May 2, 2006, and EP 06113354.2, also filed on May 2, 2006. The contents of each of the preceding applications are incorporated herein by this reference.

STATEMENT ACCORDING TO 37 C.F.R. §1.52(e)(5)-SEQUENCE LISTING SUBMITTED ON COMPACT DISC

[0002] Pursuant to 37 C.F.R. §1.52(e)(1)(ii), a compact disc containing an electronic version of the Sequence Listing has been submitted concomitant with this application, the contents of which are hereby incorporated by reference. A second compact disc is submitted and is an identical copy of the first compact disc. The discs are labeled "copy 1" and "copy 2," respectively, and each disc contains one file entitled "2578-7784US seq list.txt" which is 239 KB and created on May 2, 2006.

TECHNICAL FIELD

[0003] The invention relates to the field of molecular biology and biotechnology. More specifically the invention relates to means and methods for improving the selection of host cells that express proteins at high levels.

BACKGROUND

[0004] Proteins can be produced in various host cells for a wide range of applications in biology and biotechnology, for instance as biopharmaceuticals. Eukaryotic and particularly mammalian host cells are preferred for this purpose for expression of many proteins, for instance when such proteins have certain posttranslational modifications such as glycosylation. Methods for such production are well established, and generally entail the expression in a host cell of a nucleic acid (also referred to as "transgene") encoding the protein of interest. In general, the transgene together with a selectable marker gene is introduced into a precursor cell, cells are selected for the expression of the selectable marker gene, and one or more clones that express the protein of interest at high levels are identified, and used for the expression of the protein of interest.

[0005] One problem associated with the expression of transgenes is that it is unpredictable, stemming from the high likelihood that the transgene will become inactive due to gene silencing (McBurney et al., 2002), and therefore many host cell clones have to be tested for high expression of the transgene.

[0006] Methods to select recombinant host cells expressing relatively high levels of desired proteins are known.

[0007] One method describes the use of selectable marker proteins with mutations in their coding sequence that diminish, but do not destroy, the function of the marker (e.g., WO 01/32901). The rationale is that higher levels of the mutant marker expression are required when selection conditions are employed and therefore selection for high expression of the marker is achieved, thereby concomitantly selecting host cells that also express the gene of interest at high levels.

[0008] Another method makes use of a selection marker gene under control of a promoter sequence that has been mutated such that the promoter has an activity level substantially below that of its corresponding wild type (U.S. Pat. No. 5,627,033).

[0009] Another method describes the use of an impaired dominant selectable marker sequence, such as neomycin phosphotransferase with an impaired consensus Kozak sequence, to decrease the number of colonies to be screened and to increase the expression levels of a gene of interest that is co-linked to the dominant selectable marker (U.S. Pat. Nos. 5,648,267 and 5,733,779). In certain embodiments therein, the gene of interest is placed within an (artificial) intron in the dominant selectable marker. The gene of interest and the dominant selectable marker are in different transcriptional cassettes and each contains its own eukaryotic promoter in this method (U.S. Pat. Nos. 5,648,267 and 5,733,779).

[0010] Another method uses the principle of a selectable marker gene containing an intron that does not naturally occur within the selectable gene, wherein the intron is capable of being spliced in a host cell to provide mRNA encoding a selectable protein and wherein the intron in the selectable gene reduces the level of selectable protein produced from the selectable gene in the host cell (European Patent 0724639 B1).

[0011] In yet another method, DNA constructs are used comprising a selectable gene positioned within an intron defined by a 5' splice donor site comprising an efficient splice donor sequence such that the efficiency of splicing an mRNA having the splice donor site is between about 80-99%, and a 3' splice acceptor site, and a product gene encoding a product of interest downstream of 3' splice acceptor site, the selectable gene and the product gene being controlled by the same transcriptional regulatory region (U.S. Pat. No. 5,561,053).

[0012] In certain methods, use is made of polycistronic expression vector constructs. An early report of use of this principle describes a polycistronic expression vector, containing sequences coding for both the desired protein and a selectable protein, which coding sequences are governed by the same promoter and separated by a translational stop and start signal codons (U.S. Pat. No. 4,965,196). In certain embodiments in U.S. Pat. No. 4,965,196, the selectable marker is the amplifiable DHFR gene. In a particularly preferred embodiment of the system described in U.S. Pat. No. 4,965,196, the sequence coding for the selectable marker is downstream from that coding for the desired polypeptide, such that procedures designed to select for the cells transformed by the selectable marker will also select for particularly enhanced production of the desired protein.

[0013] In further improvements based on the concept of multicistronic expression vectors, bicistronic vectors have been described for the rapid and efficient creation of stable mammalian cell lines that express recombinant protein. These vectors contain an internal ribosome entry site (IRES) between the upstream coding sequence for the protein of interest and the downstream coding sequence of the selection marker (Rees et al., 1996). Such vectors are commercially available, for instance the pIRES1 vectors from Clontech (CLONTECHniques, October 1996). Using such vectors for introduction into host cells, selection of sufficient expression of the downstream marker protein then automatically selects for high transcription levels of the multicistronic mRNA, and hence a strongly increased probability of high expression of the protein of interest is envisaged using such vectors.

[0014] Preferably in such methods, the IRES used is an IRES which gives a relatively low level of translation of the selection marker gene, to further improve the chances of selecting for host cells with a high expression level of the protein of interest by selecting for expression of the selection marker protein (see, e.g., PCT International Publication WO 03/106684).

[0015] The invention aims at providing improved means and methods for selection of host cells expressing high levels of proteins of interest.

DISCLOSURE

[0016] U.S. patent application Ser. No. 11/269,525 (hereinafter the '525 application) and International Patent Application No. PCT/EP2005/055794, both incorporated in their entirety by reference herein, disclose a concept for selecting host cells expressing high levels of polypeptides of interest, the concept referred to therein as "reciprocal interdependent translation." In that concept, a multicistronic transcription unit is used wherein a sequence encoding a selectable marker polypeptide is upstream of a sequence encoding a polypeptide of interest, and wherein the translation of the selectable marker polypeptide is impaired by mutations therein, whereas translation of the polypeptide of interest is very high (see, e.g., FIG. 2 herein for a schematic view).

[0017] U.S. patent application Ser. No. 11/359,953 (hereinafter the '953 application), incorporated in its entirety by reference herein, discloses alternative means and methods for selecting host cells expressing high levels of polypeptide. The '953 application is based on a similar principle as the '525 application, this principle also using multicistronic transcription units and impairment of the translation initiation of the selectable marker polypeptide by mutation of the start codon thereof. The main difference between the means and methods disclosed in the '525 application and the '953 application is in the order of the sequences encoding the selectable marker polypeptide and the sequence encoding the polypeptide of interest in the multicistronic transcription units.

[0018] Both the '525 application and the '953 application thus provide means and methods for selecting host cells with very high expression levels of a polypeptide of interest. The invention provides further advantageous embodiments and improvements to the means and methods disclosed in the incorporated '525 and '953 applications.

[0019] In one aspect, provided is a DNA molecule comprising an open reading frame sequence that encodes a selectable marker polypeptide, wherein the DNA molecule in the coding strand comprises a translation start sequence for the selectable marker polypeptide chosen from the group consisting of: a) a GTG start codon; and b) a TTG start codon; and wherein the open reading frame sequence that encodes the selectable marker protein has been mutated to replace at least half of its CpG dinucleotides as compared to the native open reading frame sequence that encodes the selectable marker protein.
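
By way of illustration only, the two conditions of this aspect (a GTG or TTG start codon, and replacement of a fraction of the native CpG dinucleotides) can be checked with a short Python sketch. The sketch assumes that both sequences are supplied as coding-strand strings of equal length and that a native CpG counts as replaced when the same position in the candidate no longer reads CG. The function names and the short sequences are hypothetical and are not taken from the Sequence Listing. The threshold is passed in as a parameter; 0.5 corresponds to the "at least half" recited in the abstract and claims.

```python
def cpg_positions(seq):
    """Return 0-based indices at which a CG dinucleotide starts in the coding strand."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def fraction_cpg_replaced(native, candidate):
    """Fraction of native CpG dinucleotides that no longer read CG in the candidate."""
    if len(native) != len(candidate):
        raise ValueError("sequences must have equal length")
    native_sites = cpg_positions(native)
    if not native_sites:
        return 0.0
    cand = candidate.upper()
    replaced = sum(1 for i in native_sites if cand[i:i + 2] != "CG")
    return replaced / len(native_sites)


def meets_criteria(native, candidate, min_fraction):
    """True if the candidate starts with GTG or TTG and enough native CpGs are replaced."""
    has_start = candidate.upper()[:3] in ("GTG", "TTG")
    return has_start and fraction_cpg_replaced(native, candidate) >= min_fraction


# Hypothetical 18-nucleotide sequences, made up for this example only.
native = "ATGGCGCCGACCTCGGAA"     # ATG GCG CCG ACC TCG GAA, three CpGs
candidate = "TTGGCCCCTACCTCGGAA"  # TTG GCC CCT ACC TCG GAA, one CpG left

print(cpg_positions(native))                   # [4, 7, 13]
print(meets_criteria(native, candidate, 0.5))  # True
```

In this made-up example two of the three native CpG dinucleotides are replaced by synonymous codon changes (GCG to GCC, CCG to CCT), so the encoded amino acids after the start codon are unchanged.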

[0020] The translation start sequence in the coding strand for the selectable marker polypeptide may comprise a GTG or TTG start codon, most preferably a TTG start codon, flanked by sequences providing for relatively good recognition of the non-ATG sequences as start codons, such that at least some ribosomes start translation from these start codons, i.e., the translation start sequence may comprise the sequence ACC[GTG or TTG start codon]G or GCC[GTG or TTG start codon]G.
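
As a minimal sketch of how the two start contexts named above, ACC[GTG or TTG]G and GCC[GTG or TTG]G, might be located in a coding-strand sequence, the following Python snippet uses a regular expression. The function name and the example strings are hypothetical.

```python
import re

# The two contexts named in the text: ACC[GTG or TTG]G and GCC[GTG or TTG]G.
START_CONTEXT = re.compile(r"[AG]CC(GTG|TTG)G")


def find_start_contexts(seq):
    """Return (0-based index of the start codon, start codon) for each matching context."""
    return [(m.start(1), m.group(1)) for m in START_CONTEXT.finditer(seq.upper())]


print(find_start_contexts("AAACCTTGGCT"))  # [(5, 'TTG')]
print(find_start_contexts("GCCGTGGAC"))    # [(3, 'GTG')]
print(find_start_contexts("TCCGTGGAC"))    # []
```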

[0021] In certain embodiments, the selectable marker protein provides resistance against lethal and/or growth-inhibitory effects of a selection agent, such as an antibiotic. In certain embodiments, the selectable marker polypeptide provides resistance against ZEOCIN™ antibiotic or against neomycin.

[0022] In certain embodiments, the DNA molecule comprises an open reading frame sequence that encodes a polypeptide that provides resistance against neomycin, wherein the DNA molecule comprises a sequence chosen from the group consisting of: a) SEQ ID NO:128, with the proviso that at least half of the CpG dinucleotides has been replaced without mutating the amino acid sequence that is encoded, and with the further proviso that the start codon is either GTG or TTG; and b) SEQ ID NO:118, with the proviso that at least half of the CpG dinucleotides of the coding strand has been replaced without mutating the amino acid sequence that is encoded, and with the further proviso that the start codon is either GTG or TTG; and c) SEQ ID NO:128 or SEQ ID NO:118, with the proviso that it contains a mutation to encode either of the following polypeptide variants as compared to the polypeptide encoded by the native sequences: (i) substitution of valine at position 201 into glycine (201V>G), or (ii) substitution of glutamic acid at position 185 into aspartic acid (185E>D), or (iii) a combination of both mutations (i) and (ii) (185E>D and 201V>G), with the further proviso that at least half of the CpG dinucleotides of the coding strand has been replaced without further mutating the amino acid sequence that is encoded beyond the mutation indicated under (i)-(iii), and with the further proviso that the start codon is either GTG or TTG. In one advantageous embodiment hereof, the DNA molecule comprises SEQ ID NO:130, with the proviso that nucleotide A at position 555 is replaced by C to encode the 185E>D mutation, and that nucleotide T at position 602 is replaced by G and that nucleotide G at position 603 is replaced by T to encode the 201V>G mutation, and with the further proviso that the start codon is either GTG or TTG.
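
The correspondence between the nucleotide substitutions recited for SEQ ID NO:130 and the amino acid changes 185E>D and 201V>G can be checked with a small worked example in Python. The sketch assumes that nucleotide 1 is the first base of the start codon and that the start codon is counted as amino acid position 1. The starting codons GAA and GTG are inferences from the stated bases and amino acids under the standard genetic code; they are not read from the Sequence Listing. The helper names are hypothetical.

```python
# Subset of the standard genetic code needed for this example.
CODON_TO_AA = {"GAA": "E", "GAG": "E", "GAC": "D", "GAT": "D",
               "GTG": "V", "GGT": "G"}


def codon_of(nt_position):
    """Map a 1-based nucleotide position to (1-based codon number, 1-based offset in codon)."""
    return (nt_position - 1) // 3 + 1, (nt_position - 1) % 3 + 1


def substitute(codon, offset, new_base):
    """Return the codon with the base at the 1-based offset replaced."""
    i = offset - 1
    return codon[:i] + new_base + codon[i + 1:]


print(codon_of(555), codon_of(602), codon_of(603))  # (185, 3) (201, 2) (201, 3)

# Codon 185: a glutamic acid codon whose third base is A must be GAA (inferred).
codon_185 = substitute("GAA", 3, "C")
# Codon 201: a valine codon with T at offset 2 and G at offset 3 must be GTG (inferred).
codon_201 = substitute(substitute("GTG", 2, "G"), 3, "T")

print(codon_185, CODON_TO_AA[codon_185])  # GAC D
print(codon_201, CODON_TO_AA[codon_201])  # GGT G
```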

[0023] In certain embodiments, the DNA molecule comprises an open reading frame sequence that encodes a polypeptide that provides resistance against zeocin, wherein the DNA molecule comprises a sequence chosen from the group consisting of: a) SEQ ID NO:92, with the proviso that at least half of the CpG dinucleotides has been replaced without mutating the amino acid sequence that is encoded, and with the further proviso that the start codon is either GTG or TTG; and b) SEQ ID NO:92 wherein nucleotide A at position 280 is replaced by T, and with the proviso that at least half of the CpG dinucleotides has been replaced without mutating the amino acid sequence that is encoded, and with the further proviso that the start codon is either GTG or TTG. In one advantageous embodiment hereof, the DNA sequence comprises SEQ ID NO:132.

[0024] In another aspect, provided is a DNA molecule comprising an open reading frame sequence that encodes a selectable marker polypeptide, wherein the selectable marker polypeptide is chosen from the group consisting of: (i) tryptophan synthesizing enzyme (trp); (ii) histidine synthesizing enzyme (his); and (iii) 5,6,7,8 tetrahydrofolate synthesizing enzyme (dhfr); and wherein the DNA molecule in the coding strand comprises a translation start sequence for the selectable marker polypeptide chosen from the group consisting of: a) a GTG start codon; and b) a TTG start codon.

[0025] In certain embodiments, the DNA molecule comprises an open reading frame sequence that encodes trp, wherein the DNA molecule comprises a sequence chosen from the group consisting of SEQ ID NO:134 and SEQ ID NO:136, with the proviso that the first three nucleotides (the start codon) are either GTG or TTG.

[0026] In certain embodiments, the DNA molecule comprises an open reading frame sequence that encodes his, wherein the DNA molecule comprises a sequence chosen from the group consisting of SEQ ID NO:138 and SEQ ID NO:140, with the proviso that the first three nucleotides (the start codon) are either GTG or TTG.

[0027] In certain embodiments, the DNA molecule comprises an open reading frame sequence that encodes dhfr, wherein the DNA molecule comprises a sequence chosen from the group consisting of SEQ ID NO:98 and SEQ ID NO:122, with the proviso that the first three nucleotides (the start codon) are either GTG or TTG.

[0028] The coding sequence of the polypeptide of interest may comprise an optimal translation start sequence.

[0029] In certain embodiments, the open reading frame sequence that encodes the selectable marker polypeptide has no ATG sequence in the coding strand.
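
A minimal Python sketch of a check for this condition follows. Because the text refers to the sequence ATG in the coding strand rather than to methionine codons, the sketch reports every ATG regardless of reading frame. The function names and the example sequence are hypothetical.

```python
def atg_positions(seq):
    """Return 1-based positions of every ATG in the coding strand, in any reading frame."""
    s = seq.upper()
    return [i + 1 for i in range(len(s) - 2) if s[i:i + 3] == "ATG"]


def is_atg_free(seq):
    """True if the coding strand contains no ATG at all."""
    return not atg_positions(seq)


# Hypothetical sequence: codons TTG GCC AAT GGA contain an out-of-frame ATG.
print(atg_positions("TTGGCCAATGGA"))  # [8]
print(is_atg_free("TTGGCCAACGGA"))    # True
```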

[0030] In certain embodiments, the open reading frame sequence that encodes a selectable marker polypeptide is part of a multicistronic transcription unit that further comprises an open reading frame sequence encoding a polypeptide of interest.

[0031] In certain embodiments thereof, the open reading frame that encodes the selectable marker polypeptide is upstream of the open reading frame encoding the polypeptide of interest, and the open reading frame that encodes the selectable marker polypeptide has no ATG sequence in the coding strand. In alternative embodiments, the open reading frame that encodes the polypeptide of interest is upstream of the open reading frame that encodes the selectable marker polypeptide, and the open reading frame that encodes the selectable marker polypeptide is operably linked to an internal ribosome entry site (IRES).

[0032] Further provided are expression cassettes comprising a DNA molecule hereof, which expression cassettes further comprise a promoter upstream of the multicistronic expression unit and being functional in a eukaryotic host cell for initiating transcription of the multicistronic expression unit, and the expression cassettes further comprising a transcription termination sequence downstream of the multicistronic expression unit.

[0033] In certain embodiments thereof, such expression cassettes further comprise at least one chromatin control element chosen from the group consisting of a matrix or scaffold attachment region (MAR/SAR), an insulator sequence, a ubiquitous chromatin opener element (UCOE), and an anti-repressor sequence. Anti-repressor sequences are most preferred in this aspect, and in certain embodiments, the anti-repressor sequences are chosen from the group consisting of: a) any one of SEQ ID NO:1 through SEQ ID NO:66; b) fragments of any one of SEQ ID NO:1 through SEQ ID NO:66, wherein the fragments have anti-repressor activity; c) sequences that are at least 70% identical in nucleotide sequence to a) or b) wherein the sequences have anti-repressor activity; and d) the complement to any one of a) to c). In certain embodiments, the anti-repressor sequences are chosen from the group consisting of: STAR67 (SEQ ID NO:66), STAR7 (SEQ ID NO:7), STAR9 (SEQ ID NO:9), STAR17 (SEQ ID NO:17), STAR27 (SEQ ID NO:27), STAR29 (SEQ ID NO:29), STAR43 (SEQ ID NO:43), STAR44 (SEQ ID NO:44), STAR45 (SEQ ID NO:45), STAR47 (SEQ ID NO:47), STAR61 (SEQ ID NO:61), and functional fragments or derivatives of these STAR sequences. In certain embodiments, the expression cassette comprises STAR67, or a functional fragment or derivative thereof, positioned upstream of the promoter driving expression of the multicistronic gene. In certain embodiments, the multicistronic gene is flanked on both sides by at least one anti-repressor sequence.
In certain embodiments, expression cassettes are provided according to the invention, comprising in 5' to 3' order: anti-repressor sequence A-anti-repressor sequence B-[promoter-multicistronic transcription unit hereof (encoding the functional selectable marker protein {from a sequence with a GTG or TTG start codon} and upstream or downstream thereof the polypeptide of interest)-transcription termination sequence]-anti-repressor sequence C, wherein A, B and C may be the same or different.
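
The 5' to 3' element order just described can be written out as a simple ordered list, sketched here in Python. The function name and the element labels are hypothetical. Placing an IRES in front of a downstream selectable marker follows the alternative embodiment of paragraph [0031] and is an assumption of this sketch, as is the use of plain strings for the elements.

```python
def build_cassette(marker_position):
    """Return the cassette elements in 5' to 3' order for the layout described in the text."""
    marker = "selectable marker ORF (GTG or TTG start codon)"
    poi = "polypeptide of interest ORF"
    if marker_position == "upstream":
        unit = [marker, poi]
    elif marker_position == "downstream":
        # Assumption: a downstream marker is operably linked to an IRES, as in [0031].
        unit = [poi, "IRES", marker]
    else:
        raise ValueError("marker_position must be 'upstream' or 'downstream'")
    return (["anti-repressor sequence A", "anti-repressor sequence B", "promoter"]
            + unit
            + ["transcription termination sequence", "anti-repressor sequence C"])


for element in build_cassette("upstream"):
    print(element)
```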

[0034] In certain embodiments, the polypeptide of interest is a part of a multimeric protein, for example a heavy or light chain of an immunoglobulin.

[0035] Also provided are host cells comprising DNA molecules hereof.

[0036] Further provided are methods for generating host cells expressing a polypeptide of interest, such a method comprising the steps of: introducing into a plurality of precursor host cells an expression cassette hereof, culturing the cells under conditions selecting for expression of the selectable marker polypeptide, and selecting at least one host cell producing the polypeptide of interest.

[0037] Further provided are methods for producing a polypeptide of interest, the methods comprising culturing a host cell, the host cell comprising an expression cassette hereof, and expressing the polypeptide of interest from the expression cassette. In certain embodiments, the polypeptide of interest is harvested from the host cells and/or from the host cell culture medium.

[0038] In certain embodiments, if the selectable marker polypeptide is trp, the host cell in advantageous embodiments is cultured in a culture medium that contains indole and which culture medium is essentially devoid of tryptophan. In other embodiments, if the selectable marker polypeptide is his, the host cell in advantageous embodiments is cultured in a culture medium that contains histidinol and which culture medium is essentially devoid of histidine. In other embodiments, if the selectable marker polypeptide is dhfr, the host cell in advantageous embodiments is cultured in a culture medium that contains folate and which culture medium is essentially devoid of glycine, hypoxanthine and thymidine.

[0039] In further aspects, provided are RNA molecules having the sequence of a transcription product of a DNA molecule hereof. Further provided are selectable marker polypeptides that are the translation product of a DNA molecule of the invention.

[0040] In another aspect, further provided is a DNA molecule comprising an expression cassette comprising a multicistronic transcription unit, the multicistronic transcription unit comprising a sequence coding for a polypeptide of interest, a sequence coding for a first selectable marker polypeptide, and a sequence coding for a second selectable marker polypeptide, wherein the sequence encoding the first selectable marker polypeptide in the coding strand comprises a translation start sequence chosen from the group consisting of a GTG start codon and a TTG start codon, and wherein the second selectable marker polypeptide is chosen from the group consisting of: (i) tryptophan synthesizing enzyme (trp); (ii) histidine synthesizing enzyme (his); and (iii) 5,6,7,8 tetrahydrofolate synthesizing enzyme (dhfr), and wherein the expression cassette further comprises a promoter upstream of the multicistronic expression unit and a transcription termination sequence downstream of the multicistronic expression unit, wherein the expression cassette is functional in a eukaryotic host cell for initiating transcription of the multicistronic expression unit, and wherein the DNA molecule further comprises at least one chromatin control element selected from the group consisting of matrix attachment regions (MAR), and anti-repressor (STAR) sequences.

[0041] In one embodiment thereof, the sequence encoding the first selectable marker polypeptide is upstream of the sequence encoding the polypeptide of interest and the sequence encoding the first selectable marker polypeptide in the coding strand is devoid of the sequence ATG, and the sequence encoding the second selectable marker polypeptide is downstream of the polypeptide of interest and is operably linked to an IRES.

[0042] In another embodiment, the sequence encoding the polypeptide of interest is upstream of the sequences encoding the first and second selectable marker polypeptide, and the sequence encoding the first selectable marker polypeptide is operably linked to an IRES, and the sequence encoding the second selectable marker polypeptide is operably linked to an IRES.

[0043] In certain embodiments, the first selectable marker polypeptide confers resistance against lethal or growth-inhibitory effects of a selection agent chosen from the group consisting of ZEOCIN® and neomycin antibiotics.

[0044] In certain embodiments, a chromatin control element is an anti-repressor sequence chosen from the group consisting of any one of SEQ ID NO:1 through SEQ ID NO:66, and the complement of any of these.

[0045] Further provided are host cells comprising such DNA molecules.

[0046] Further provided is a method for expressing a polypeptide of interest, comprising culturing a host cell that comprises a DNA molecule of the invention, and expressing the polypeptide of interest from the expression cassette, and wherein: a) if the second selectable marker polypeptide is trp, the host cell is cultured in a culture medium that contains indole and which culture medium is essentially devoid of tryptophan; b) if the second selectable marker polypeptide is his, the host cell is cultured in a culture medium that contains histidinol and which culture medium is essentially devoid of histidine; c) if the second selectable marker polypeptide is dhfr, the host cell is cultured in a culture medium that contains folate and which culture medium is essentially devoid of glycine, hypoxanthine and thymidine. In certain embodiments, the method further comprises harvesting the polypeptide of interest, from the host cell, from the culture medium, or from both the host cell and the culture medium.
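By way of illustration only, the medium requirements listed under a) to c) can be written down as a small lookup table. The following Python sketch is not part of the described method: the names SELECTION_MEDIUM and medium_is_suitable are hypothetical, and "essentially devoid of" is simplified here to mean that the component is not listed among the medium components at all.

```python
# Medium requirements per metabolic selectable marker, as listed in the text.
SELECTION_MEDIUM = {
    "trp": {"contains": ["indole"], "devoid_of": ["tryptophan"]},
    "his": {"contains": ["histidinol"], "devoid_of": ["histidine"]},
    "dhfr": {
        "contains": ["folate"],
        "devoid_of": ["glycine", "hypoxanthine", "thymidine"],
    },
}


def medium_is_suitable(marker, components):
    """Return True if the listed medium components satisfy the marker's rule.

    Simplification (assumption): a component counts as absent when it is
    simply not present in `components`.
    """
    rule = SELECTION_MEDIUM[marker]
    present = {c.lower() for c in components}
    has_required = all(c in present for c in rule["contains"])
    has_forbidden = any(c in present for c in rule["devoid_of"])
    return has_required and not has_forbidden


# Hypothetical media, for illustration only.
print(medium_is_suitable("trp", ["indole", "glucose"]))
print(medium_is_suitable("dhfr", ["folate", "thymidine"]))
```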

BRIEF DESCRIPTION OF THE DRAWINGS

[0047] FIG. 1. Schematic representation of the use of a selection marker gene (ZEOCIN®-resistance gene) of the incorporated '525 application. A. wild-type zeocin-resistance gene, having its normal translation initiation site (ATG start codon) and one internal ATG codon, which codes for methionine. B. mutant zeocin-resistance gene, wherein the internal ATG has been mutated into a codon for leucine; this mutant is a functional ZEOCIN®-resistance gene. C. same as B, but comprising a mutated translation initiation site, wherein the context of the ATG start codon has been mutated to decrease the translation initiation. D. same as B, but comprising a mutated start codon (GTG). E. same as B, but with a TTG start codon. The numbers under the Figures C-E schematically indicate a relative amount of initiation frequency (under the start codon) and "scan-through" frequency (under the coding sequence) by the ribosomes, but only in a semi-quantitative manner, i.e., they indicate the efficiency of translation initiation compared to each other, but the actual quantitative values may differ completely: the numbers only serve to explain the invention. See, Example 1 for details.

[0048] FIG. 2. Schematic representation of a multicistronic transcription unit according to the invention of the incorporated '525 application, with more or less reciprocal interdependent translation efficiency. Explanation as for FIG. 1, but now a dEGFP gene (here exemplifying a gene of interest) has been placed downstream of the selectable marker polypeptide coding sequence. The ZEOCIN®-resistance gene comprises the internal Met→Leu mutation (see FIG. 1B). See, Example 2 for details.

[0049] FIG. 3. Results of selection systems according to the invention of the incorporated '525 application, with and without STAR elements. A. ZEOCIN®-resistance gene with ATG start codon in bad context (referred to as "ATGmut" in the picture, but including a spacer sequence behind the ATG in the bad context, so in the text generally referred to as "ATGmut/space"). B. ZEOCIN®-resistance gene with GTG start codon. C. ZEOCIN®-resistance gene with TTG start codon. d2EGFP signal for independent colonies is shown on the vertical axis. See, Example 2 for details.

[0050] FIG. 4. Results of selection system according to the invention of the incorporated '525 application in upscaled experiment (A), and comparison with selection system according to prior art using an IRES (B). d2EGFP signal for independent colonies is shown on the vertical axis. See, Example 3 for details.

[0051] FIG. 5. Results of selection system with multicistronic transcription unit according to the invention of the incorporated '525 application, using blasticidin as a selectable marker. A. blasticidin resistance gene mutated to comprise a GTG start codon. B. blasticidin resistance gene mutated to comprise a TTG start codon. The blasticidin resistance gene has further been mutated to remove all internal ATG sequences. d2EGFP signal for independent colonies is shown on the vertical axis. See, Example 4 for details.

[0052] FIG. 6. Stability of expression of several clones with a multicistronic transcription unit according to the invention (including a ZEOCIN® resistance gene with a TTG start codon) of the incorporated '525 application. Selection pressure (100 µg/ml zeocin) was present during the complete experiment. d2EGFP signal for independent colonies is shown on the vertical axis. See, Example 5 for details.

[0053] FIG. 7. As FIG. 6, but ZEOCIN® antibiotic concentration was lowered to 20 µg/ml after establishment of clones.

[0054] FIG. 8. As FIG. 6, but ZEOCIN® antibiotic was absent from culture medium after establishment of clones.

[0055] FIG. 9. Expression of an antibody (anti-EpCAM) using the selection system with the multicistronic transcription unit according to the invention of the incorporated '525 application. The heavy chain (HC) and light chain (LC) are the polypeptides of interest in this example. Each of these is present in a separate transcription unit, which are both on a single nucleic acid molecule in this example. The HC is preceded by the zeocin-resistance gene coding for a selectable marker polypeptide, while the LC is preceded by the blasticidin resistance gene coding for a selectable marker polypeptide. Both resistance genes have been mutated to comprise an ATG start codon in a non-optimal context ("mutATG" in Figure, but including a spacer sequence, and hence in the text generally referred to as "ATGmut/space"). Each of the multicistronic transcription units is under control of a CMV promoter. Constructs with STAR sequences as indicated were compared to constructs without STAR sequences. The antibody levels obtained when these constructs were introduced into host cells are given on the vertical axis in pg/cell/day for various independent clones. See, Example 6 for details.

[0056] FIG. 10. As FIG. 9, but both the selection marker genes have been provided with a GTG start codon. See, Example 6 for details.

[0057] FIG. 11. As FIG. 9, but both the selection marker genes have been provided with a TTG start codon. See, Example 6 for details.

[0058] FIG. 12. Stability of expression in sub-clones in the absence of selection pressure (after establishing colonies under selection pressure, some colonies were sub-cloned in medium containing no zeocin). See, Example 5 for details.

[0059] FIG. 13. Copy-number dependency of expression levels of an embodiment of the invention of the incorporated '525 application. See, Example 5 for details.

[0060] FIG. 14. As FIG. 1, but for the blasticidin resistance gene. None of the 4 internal ATGs in this gene are in frame coding for a methionine, and therefore the redundancy of the genetic code was used to mutate these ATGs without mutating the internal amino acid sequence of the encoded protein.

[0061] FIG. 15. Coding sequence of the wild-type zeocin-resistance gene (SEQ ID NO:92). Bold ATGs code for methionine. The first bold ATG is the start codon.

[0062] FIG. 16. Coding sequence of the wild-type blasticidin resistance gene (SEQ ID NO:94). Bold ATGs code for methionine. The first bold ATG is the start codon. Other ATGs in the sequence are underlined: these internal ATGs do not code for methionine, because they are not in frame.

[0063] FIG. 17. Coding sequence of the wild-type puromycin resistance gene (SEQ ID NO:96). Bold ATGs code for methionine. The first bold ATG is the start codon.

[0064] FIG. 18. Coding sequence of the wild-type mouse DHFR gene (SEQ ID NO:98). Bold ATGs code for methionine. The first bold ATG is the start codon. Other ATGs in the sequence are underlined: these internal ATGs do not code for methionine, because they are not in frame.

[0065] FIG. 19. Coding sequence of the wild-type hygromycin resistance gene (SEQ ID NO:100). Bold ATGs code for methionine. The first bold ATG is the start codon. Other ATGs in the sequence are underlined: these internal ATGs do not code for methionine, because they are not in frame.

[0066] FIG. 20. Coding sequence of the wild-type neomycin resistance gene (SEQ ID NO:102). Bold ATGs code for methionine. The first bold ATG is the start codon. Other ATGs in the sequence are underlined: these internal ATGs do not code for methionine, because they are not in frame.

[0067] FIG. 21. Coding sequence of the wild-type human glutamine synthetase (GS) gene (SEQ ID NO:104). Bold ATGs code for methionine. The first bold ATG is the start codon. Other ATGs in the sequence are underlined: these internal ATGs do not code for methionine, because they are not in frame.

[0068] FIG. 22. Schematic representation of some further modified zeocin-resistance selection marker genes with a GTG start codon according to the invention, allowing for further fine-tuning of the selection stringency. See, Example 7 for details.

[0069] FIG. 23. Results with expression systems containing the further modified zeocin-resistance selection marker genes. See, Example 7 for details. Dots indicate individual data points; lines indicate the average expression levels; used constructs (see also FIG. 22) are indicated on the horizontal axis (the addition of 7/67/7 at the end of the construct name indicates the presence of STAR sequences 7 and 67 upstream of the promoter and STAR7 downstream of the transcription termination site), and schematically depicted above the graph; vertical axis indicates d2EGFP signal.

[0070] FIG. 24. Schematic representation of some further modified zeocin-resistance selection marker genes with a TTG start codon according to the invention, allowing for further fine-tuning of the selection stringency. See, Example 8 for details.

[0071] FIG. 25. Results with expression systems containing the further modified zeocin-resistance selection marker genes. See, Example 8 for details. Dots indicate individual data points; lines indicate the average expression levels; used constructs are indicated on the horizontal axis, and schematically depicted above the graph; vertical axis indicates d2EGFP signal.

[0072] FIG. 26. As FIG. 1, but for the puromycin resistance gene. All three internal ATGs code for methionine (panel A), and are replaced by CTG sequences coding for leucine (panel B). See, Example 9 for details.

[0073] FIG. 27. Results with expression constructs containing the puromycin resistance gene with a TTG start codon and no internal ATG codons. See, Example 9 for details. Dots indicate individual data points; lines indicate the average expression levels; used constructs are indicated on the horizontal axis, and schematically depicted above the graph; vertical axis indicates d2EGFP signal.

[0074] FIG. 28. As FIG. 1, but for the neomycin resistance gene. See, Example 10 for details. A. wild-type neomycin resistance gene; ATG sequences are indicated, ATGs coding for methionine are indicated by Met above the ATG. B. neomycin resistance gene without ATG sequences, and with a GTG start codon. C. neomycin resistance gene without ATG sequences, and with a TTG start codon.

[0075] FIG. 29. As FIG. 1, but for the dhfr gene. See, Example 11 for details. A. wild-type dhfr gene; ATG sequences are indicated, ATGs coding for methionine are indicated by Met above the ATG. B. dhfr gene without ATG sequences, and with a GTG start codon. C. dhfr gene without ATG sequences, and with a TTG start codon.

[0076] FIG. 30. Results with expression constructs (zeocin-selectable marker) according to the invention of the incorporated '525 application in PER.C6® cells. See, Example 12 for details. Dots indicate individual data points; lines indicate the average expression levels; used constructs are indicated on the horizontal axis, and schematically depicted above the graph; vertical axis indicates d2EGFP signal.

[0077] FIG. 31. Results with expression constructs (blasticidin selectable marker) according to the invention of the incorporated '525 application in PER.C6® cells. See, Example 12 for details. Dots indicate individual data points; lines indicate the average expression levels; used constructs are indicated on the horizontal axis, and schematically depicted above the graph; vertical axis indicates d2EGFP signal.

[0078] FIG. 32. Results with expression constructs according to the invention of the incorporated '525 application, further comprising a transcription pause (TRAP) sequence. See, Example 13 for details. Dots indicate individual data points; lines indicate the average expression levels; used constructs are indicated on the horizontal axis, and schematically depicted above the graph; vertical axis indicates d2EGFP signal.

[0079] FIG. 33. Copy-number dependency of expression of an antibody using transcription units according to the invention of the incorporated '525 application. See, Example 14 for details.

[0080] FIG. 34. Antibody expression from colonies containing expression constructs according to the invention of the incorporated '525 application, wherein the copy number of the expression constructs is amplified by methotrexate. See, Example 15 for details. White bars: selection with ZEOCIN® and blasticidin; black bars: selection with zeocin, blasticidin and methotrexate (MTX). Numbers of tested colonies are depicted on the horizontal axis.

[0081] FIG. 35. Results with different promoters. See, Example 16 for details. Dots indicate individual data points; lines indicate the average expression levels; used constructs are indicated on the horizontal axis, and schematically depicted above the graph; vertical axis indicates d2EGFP signal.

[0082] FIG. 36. Results with different STAR elements. See, Example 17 for details. Dots indicate individual data points; lines indicate the average expression levels; used constructs are indicated on the horizontal axis, and schematically depicted above the graph; vertical axis indicates d2EGFP signal.

[0083] FIG. 37. Results with other chromatin control elements. See, Example 18 for details. Dots indicate individual data points; lines indicate the average expression levels; used constructs are indicated on the horizontal axis, and schematically depicted above the graph (black triangles indicate different tested chromatin control elements); vertical axis indicates d2EGFP signal.

[0084] FIG. 38. Results with expression constructs according to the invention of the incorporated '953 application. The expression construct contains the sequence encoding the polypeptide of interest (exemplified here by d2EGFP) upstream of an IRES, which is upstream of the sequence encoding the selectable marker according to the invention (exemplified here by the zeocin-resistance gene with a TTG start codon (TTG Zeo) or, in controls, with its normal ATG start codon (ATG Zeo)). See, Example 19 for details. Dots indicate individual data points; lines indicate the average expression levels; used constructs are indicated on the horizontal axis, and schematically depicted above the graph; vertical axis indicates d2EGFP signal.

[0085] FIG. 39. Erythropoietin (EPO) expression with expression constructs of the invention. See, Example 20 for details.

[0086] FIG. 40. Results with different STAR elements in the CHO-DG44 cell line. Dots indicate individual data points; lines indicate the average expression levels; vertical axis indicates d2EGFP signal. The construct is schematically shown above the graph, while the STAR elements tested in the construct are indicated below the horizontal axis. See, Example 21 for details.

[0087] FIG. 41. Results with a zeocin-resistance marker with reduced CpG content in CHO-K1 cells. Dots indicate individual data points; lines indicate the average expression levels; vertical axis indicates d2EGFP signal. See, Example 22 for details.

[0088] FIG. 42. As in FIG. 41, but now in CHO-DG44 cells. See, Example 22 for details.

[0089] FIG. 43. Results with "CpG poor" neomycin resistance marker having different mutations. Dots indicate individual data points; lines indicate the average expression levels; vertical axis indicates d2EGFP signal. See, Example 23 for details.

[0090] FIG. 44. Schematic drawing of constructs with tryptophan synthesizing enzyme (trp) as selectable marker polypeptide according to the invention. See, Example 24 for details.

[0091] FIG. 45. Schematic drawing of constructs with histidine synthesizing enzyme (his) as selectable marker polypeptide according to the invention. See, Example 25 for details.

[0092] FIG. 46. Schematic drawing of constructs with dhfr as selectable marker polypeptide according to the invention. See, Example 26 for details.

[0093] FIG. 47. Schematic drawing of constructs having multicistronic transcription units with two selectable marker polypeptides and one polypeptide of interest (HC: heavy chain; LC: light chain), the first selectable marker polypeptide providing resistance to an antibiotic and having a TTG (or GTG, not shown) start codon in the coding sequence and the second selectable marker polypeptide being trp or dhfr and being under control of an IRES. See, Example 27 for details.

DETAILED DESCRIPTION

[0094] In one aspect, provided is a DNA molecule comprising an open reading frame sequence that encodes a selectable marker polypeptide, wherein the DNA molecule in the coding strand comprises a translation start sequence for the selectable marker polypeptide chosen from the group consisting of: a) a GTG start codon; and b) a TTG start codon; and wherein the open reading frame sequence that encodes the selectable marker protein has been mutated to replace at least 10% of its CpG dinucleotides (any "CG" in the sequence) as compared to the native open reading frame sequence that encodes the selectable marker protein. Such a DNA molecule can be used according to the invention for obtaining eukaryotic host cells expressing high levels of the polypeptide of interest, by selecting for the expression of the selectable marker polypeptide. Subsequently or simultaneously, one or more host cell(s) expressing the polypeptide of interest can be identified, and further used for expression of high levels of the polypeptide of interest.

[0095] It is shown herein that the reduction of the CpG content of the selectable marker gene of the invention, i.e., having a TTG or GTG start codon, can lead to improved expression of a polypeptide of interest that is translated from a multicistronic transcription unit from which also the selectable marker polypeptide is translated. Without wishing to be bound by theory, it is believed that reduction of the CpG content may reduce the possibility for silencing of transcription, because CpG dinucleotides can be methylated in eukaryotes, which can lead to silencing. Selectable marker polypeptides that are encoded by genes with a relatively high CpG content, often derived from bacterial sequences, for instance those providing resistance to ZEOCIN™ antibiotic and neomycin, may benefit from the reduction of the CpG content. In certain embodiments, CpG dinucleotides are removed from a sequence encoding a selectable marker polypeptide without changing the encoded amino acid sequence. This can be done by taking advantage of the redundancy of the genetic code, as is well known and routine to the person skilled in the art of molecular biology.
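As an illustrative sketch only, the use of codon redundancy to remove CpG dinucleotides might be automated as follows. The greedy left-to-right strategy, the function names and the toy sequence are assumptions made for illustration and are not taken from the examples herein. The sketch leaves the first codon (the GTG or TTG start codon) untouched. It does not check whether recoding creates new ATG sequences, which would need a separate check where the coding sequence must be devoid of ATG.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def count_cpg(seq):
    """Count CG dinucleotides in a DNA sequence."""
    return seq.upper().count("CG")


def translate(orf):
    """Translate complete codons of an ORF with the standard code."""
    orf = orf.upper()
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))


def reduce_cpg(orf):
    """Greedily swap in synonymous codons that lower the local CpG count.

    The first codon is kept as is, so a GTG or TTG start codon survives.
    """
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of three")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    for i in range(1, len(codons)):
        left = codons[i - 1][-1]
        right = codons[i + 1][0] if i + 1 < len(codons) else ""

        def local_cpg(codon):
            # Include flanking bases so CpGs spanning codon borders count.
            return count_cpg(left + codon + right)

        wanted = CODON_TABLE[codons[i]]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == wanted]
        best = min(synonyms, key=local_cpg)
        if local_cpg(best) < local_cpg(codons[i]):
            codons[i] = best
    return "".join(codons)


# Toy ORF for illustration only; it is not one of the SEQ ID NOs.
toy = "GTGGCGCGATCGTAA"
recoded = reduce_cpg(toy)
print(count_cpg(toy), count_cpg(recoded))
```

Because a candidate codon is only accepted when it strictly lowers the local count, the total number of CpG dinucleotides never increases, and the encoded amino acid sequence downstream of the start codon is unchanged.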

[0096] In certain embodiments, in particular when the selectable marker polypeptide coding sequence is to be used upstream of the coding sequence of a polypeptide of interest in a multicistronic transcription unit described herein, the coding sequence of the selectable marker polypeptide is devoid of ATG sequences.
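A check for whether a coding strand is devoid of ATG sequences can be sketched as below. This is a hypothetical helper for illustration, not a tool referred to herein: it reports every ATG in the coding strand together with whether it lies in frame with the first nucleotide of the sequence, which corresponds to the distinction between ATGs that code for methionine and those that do not.

```python
def find_atgs(coding_strand):
    """Return (1-based position, in_frame) for every ATG in the sequence.

    `in_frame` is True when the ATG starts at a codon boundary counted
    from the first nucleotide of the sequence.
    """
    seq = coding_strand.upper()
    hits = []
    pos = seq.find("ATG")
    while pos != -1:
        hits.append((pos + 1, pos % 3 == 0))
        pos = seq.find("ATG", pos + 1)
    return hits


def is_devoid_of_atg(coding_strand):
    """True if no ATG occurs anywhere in the coding strand, in any frame."""
    return not find_atgs(coding_strand)


# Toy sequence for illustration only: TTG AAT GCC ATG TAA
toy = "TTGAATGCCATGTAA"
print(find_atgs(toy))
```

In the toy sequence the ATG at position 5 spans two codons (AAT and GCC) and is therefore out of frame, whereas the ATG at position 10 is an in-frame methionine codon.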

[0097] It is expected that a positive effect of removing CpG dinucleotides will be apparent when at least 10% of the CpG dinucleotides in the coding sequence of the selectable marker gene have been replaced. It is expected that removal of more CpG dinucleotides will increase the effect, and hence in certain embodiments, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70% or at least 80% of the CpG dinucleotides are mutated compared to the native open reading frame sequence that encodes the selectable marker protein. In certain advantageous embodiments, at least half of the CpG dinucleotides of the open reading frame sequence that encodes the selectable marker polypeptide have been replaced as compared to the native open reading frame sequence that encodes the selectable marker polypeptide.
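The percentage thresholds above can be made concrete with a short calculation. The following sketch assumes, for illustration, that a CpG dinucleotide counts as "replaced" when the native sequence has CG at a given position and the mutated sequence of the same length no longer has CG at that position; the function names and toy sequences are hypothetical.

```python
def cpg_positions(seq):
    """Return the set of 0-based positions at which a CG dinucleotide starts."""
    s = seq.upper()
    return {i for i in range(len(s) - 1) if s[i:i + 2] == "CG"}


def fraction_cpg_replaced(native, mutated):
    """Fraction of native CpG sites that are no longer CpG in the mutant."""
    if len(native) != len(mutated):
        raise ValueError("sequences must have the same length")
    native_sites = cpg_positions(native)
    if not native_sites:
        return 0.0
    kept = native_sites & cpg_positions(mutated)
    return 1 - len(kept) / len(native_sites)


def meets_threshold(native, mutated, minimum):
    """True if at least `minimum` (e.g. 0.5 for half) of the CpGs are replaced."""
    return fraction_cpg_replaced(native, mutated) >= minimum


# Toy sequences for illustration only; they are not SEQ ID NOs.
native = "ATGGCGCGATCGTAA"   # three CpG sites
mutated = "GTGGCTCGATCTTAA"  # one of the three sites retained
print(round(fraction_cpg_replaced(native, mutated), 2))
```

Here two of the three native CpG sites have been replaced, so the toy mutant satisfies the "at least half" embodiment but not the "at least 70%" embodiment.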

[0098] A native open reading frame sequence that encodes the selectable marker polypeptide that provides resistance to neomycin is given as SEQ ID NO:128 (containing internal ATGs) and as SEQ ID NO:118 (lacking internal ATGs). In advantageous embodiments, these sequences may contain one or more further mutations so that the encoded polypeptide has a mutation of valine at position 201 to glycine (201V>G), of glutamic acid at position 185 to aspartic acid (185E>D), or both (185E>D, 201V>G).

[0099] A native open reading frame sequence that encodes the selectable marker polypeptide that provides resistance to ZEOCIN™ antibiotic is given as SEQ ID NO:92 (containing internal ATGs), and mutation of A at position 280 into T in this sequence gives a sequence lacking internal ATGs, and wherein the internally encoded methionine at position 94 is replaced by leucine. For the DNA sequences of the invention, the start codon (first three nucleotides of the DNA sequences) is mutated into a GTG or into a TTG start codon.
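The relation between nucleotide position 280 and amino acid position 94 follows from simple codon arithmetic (codon 94 spans nucleotides 280-282). A minimal sketch, using hypothetical helper names and a short toy sequence rather than SEQ ID NO:92, is given below.

```python
def residue_number(nt_position):
    """Map a 1-based nucleotide position in an ORF to its 1-based codon number."""
    return (nt_position - 1) // 3 + 1


def point_mutation(orf, nt_position, expected, new):
    """Replace one nucleotide (1-based position), checking the original base."""
    idx = nt_position - 1
    if orf[idx].upper() != expected.upper():
        raise ValueError(f"expected {expected} at position {nt_position}")
    return orf[:idx] + new + orf[idx + 1:]


def set_start_codon(orf, start):
    """Replace the first three nucleotides by a GTG or TTG start codon."""
    if start not in ("GTG", "TTG"):
        raise ValueError("start codon must be GTG or TTG")
    return start + orf[3:]


print(residue_number(280))  # nucleotide 280 lies in codon 94

# Toy ORF for illustration only: ATG AAA ATG TAA, internal ATG at codon 3.
toy = "ATGAAAATGTAA"
no_internal_atg = point_mutation(toy, 7, "A", "T")   # ATG -> TTG (Met -> Leu)
final = set_start_codon(no_internal_atg, "GTG")
print(final)
```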

[0100] In certain advantageous embodiments, the selectable marker polypeptide provides resistance against ZEOCIN™ antibiotic. In certain embodiments thereof, the DNA molecule comprises SEQ ID NO:92, wherein at least half of the CpG dinucleotides have been replaced without mutating the amino acid sequence that is encoded, with the proviso that the start codon (first three nucleotides in the sequence) is replaced by a start codon chosen from GTG or TTG. In an alternative embodiment, the DNA molecule comprises SEQ ID NO:92 wherein nucleotide A at position 280 is replaced by T, such that encoded amino acid 94 (methionine) is replaced by leucine, and wherein at least half of the CpG dinucleotides have been replaced without further mutating the amino acid sequence that is encoded, with the proviso that the start codon (first three nucleotides in the sequence) is replaced by a start codon chosen from GTG or TTG. This embodiment lacks ATG sequences in the coding sequence for the ZEOCIN™ antibiotic-resistance gene, and is therefore suitable in the multicistronic transcription units of the invention wherein the coding sequence for the selectable marker polypeptide is upstream of the coding sequence for the polypeptide of interest. In one preferred embodiment hereof, the DNA molecule comprises SEQ ID NO:132.

[0101] In other advantageous embodiments, the selectable marker polypeptide provides resistance against neomycin. In certain embodiments thereof, the DNA molecule comprises a sequence chosen from the group consisting of any one of: a) SEQ ID NO:128, with the proviso that at least half of the CpG dinucleotides have been replaced without mutating the amino acid sequence that is encoded, and with the further proviso that the start codon (the first ATG sequence) is replaced by either GTG or TTG; b) SEQ ID NO:118, with the proviso that at least half of the CpG dinucleotides have been replaced without mutating the amino acid sequence that is encoded, and with the further proviso that the start codon (the first ATG sequence) is replaced by either GTG or TTG; and c) SEQ ID NO:128 or SEQ ID NO:118, containing a mutation to encode a neomycin resistance protein variant as compared to the sequences encoded by the indicated sequences, the variant having glycine at position 201 in the encoded protein (201G variant), or aspartic acid at position 185 (185D variant), or both glycine at position 201 and aspartic acid at position 185 (185D, 201G variant), with the proviso that at least half of the CpG dinucleotides in the given DNA sequence have been replaced without further mutating the amino acid sequence that is encoded, and with the further proviso that the start codon (the first ATG sequence) is replaced by either GTG or TTG. The 185D variant is for instance obtained by replacing the codon from position 553-555 in the provided nucleic acid sequences with the sequence GAC, and the 201G variant is for instance obtained by replacing the codon from position 601-603 in the provided nucleic acid sequence with GGT.
In one preferred embodiment, the DNA molecule comprises SEQ ID NO:130, with the proviso that nucleotide A at position 555 is replaced by C (to encode the 185E>D variant), and that nucleotide T at position 602 is replaced by G and that nucleotide G at position 603 is replaced by T (to encode the 201V>G variant), and with the further proviso that the start codon (ATG at positions 1-3) is replaced by either GTG or TTG. It will be clear to the skilled person that further variations can be prepared without departing from the teaching of the invention, and such further variations are encompassed within the invention as long as the start codon is not ATG and the encoded protein provides resistance against neomycin (or G418). The 185D and 201G variants further improve the selection stringency according to the invention.
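The codon-level description (replacing codons 553-555 and 601-603) and the nucleotide-level description (changes at positions 555, 602 and 603) given above describe the same sequence changes. The sketch below illustrates this on a toy open reading frame that merely has GAA at codon 185 and GTG at codon 201; it is not SEQ ID NO:130 or any other sequence of the invention, and the helper names are hypothetical.

```python
def codon_span(residue):
    """Return the 1-based (first, last) nucleotide positions of a codon."""
    last = 3 * residue
    return last - 2, last


def replace_codon(orf, residue, new_codon):
    """Return the ORF with the codon of the given 1-based residue replaced."""
    if len(new_codon) != 3:
        raise ValueError("a codon has exactly three nucleotides")
    first, last = codon_span(residue)
    if last > len(orf):
        raise ValueError("residue lies outside the ORF")
    return orf[:first - 1] + new_codon + orf[last:]


# Toy ORF (assumption, for illustration): 202 codons, mostly GCT.
codons = ["GCT"] * 202
codons[0] = "ATG"    # start codon, positions 1-3
codons[184] = "GAA"  # residue 185, glutamic acid
codons[200] = "GTG"  # residue 201, valine
codons[201] = "TAA"  # stop codon
toy = "".join(codons)

# Codon-level edits: 185E>D (GAC) and 201V>G (GGT), then a TTG start codon.
double_variant = replace_codon(replace_codon(toy, 185, "GAC"), 201, "GGT")
final = "TTG" + double_variant[3:]

print(codon_span(185), codon_span(201))
```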

[0102] The term "monocistronic gene" is defined as a gene capable of providing an RNA molecule that encodes one polypeptide. A "multicistronic transcription unit," also referred to as multicistronic gene, is defined as a gene capable of providing an RNA molecule that encodes at least two polypeptides. The term "bicistronic gene" is defined as a gene capable of providing an RNA molecule that encodes two polypeptides. A bicistronic gene is therefore encompassed within the definition of a multicistronic gene. A "polypeptide" as used herein comprises at least five amino acids linked by peptide bonds, and can for instance be a protein or a part, such as a subunit, thereof. Mostly, the terms polypeptide and protein are used interchangeably herein. A "gene" or a "transcription unit" as used in the invention can comprise chromosomal DNA, cDNA, artificial DNA, combinations thereof, and the like. Transcription units comprising several cistrons are transcribed as a single mRNA.

[0103] A multicistronic transcription unit according to the invention can for instance be a bicistronic transcription unit coding from 5' to 3' for a selectable marker polypeptide and for a polypeptide of interest, or for instance a bicistronic transcription unit coding from 5' to 3' for a polypeptide of interest and for a selectable marker polypeptide. In the former case, the coding sequence for the selectable marker polypeptide is preferably devoid of ATG sequences in the coding strand. In the latter case, the polypeptide of interest is encoded upstream from the coding sequence for the selectable marker polypeptide and an internal ribosome entry site (IRES) is operably linked to the sequence encoding the selectable marker polypeptide, and hence the selectable marker polypeptide is dependent on (also referred to as "operably linked to") the IRES for its translation.

[0104] One may use separate transcription units for the expression of different polypeptides of interest, also when these form part of a multimeric protein (see, e.g., Example 6: the heavy and light chain of an antibody each are encoded by a separate transcription unit, each of these expression units being a bicistronic expression unit).

[0105] The DNA molecules described herein can be present in the form of double stranded DNA, having with respect to the selectable marker polypeptide and the polypeptide of interest a coding strand and a non-coding strand, the coding strand being the strand with the same sequence as the translated RNA, except for the presence of T instead of U. Hence, an AUG start codon is coded for in the coding strand by an ATG sequence, and the strand containing this ATG sequence corresponding to the AUG start codon in the RNA is referred to as the coding strand of the DNA. It will be clear to the skilled person that start codons or translation initiation sequences are in fact present in an RNA molecule, but that these can be considered equally embodied in a DNA molecule coding for such an RNA molecule; hence, wherever the invention refers to a start codon or translation initiation sequence, the corresponding DNA molecule having the same sequence as the RNA sequence but for the presence of a T instead of a U in the coding strand of the DNA molecule is meant to be included, and vice versa, except where explicitly specified otherwise. In other words, a start codon is for instance an AUG sequence in RNA, but the corresponding ATG sequence in the coding strand of the DNA is referred to as start codon as well in the invention. The same applies to references to "in frame" coding sequences, meaning triplets (3 bases) in the RNA molecule that are translated into an amino acid, but also to be interpreted as the corresponding trinucleotide sequences in the coding strand of the DNA molecule.
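The relation between the coding strand, the non-coding strand and the RNA can be illustrated with a short sketch. The function names and the toy sequence below are hypothetical and serve only to show that the RNA has the sequence of the coding strand with U in place of T, so that a GTG or TTG start codon in the coding strand corresponds to GUG or UUG in the RNA.

```python
# Watson-Crick complement table for DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcript(coding_strand):
    """RNA sequence corresponding to a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def non_coding_strand(coding_strand):
    """Reverse complement of the coding strand, written 5' to 3'."""
    return coding_strand.upper().translate(_COMPLEMENT)[::-1]


# Toy coding strand for illustration only: TTG GCC TAA
toy = "TTGGCCTAA"
print(transcript(toy))         # start codon appears as UUG in the RNA
print(non_coding_strand(toy))
```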

[0106] The selectable marker polypeptide and the polypeptide of interest encoded by the multicistronic gene each have their own translation initiation sequence, and therefore each have their own start codon (as well as stop codon), i.e., they are encoded by separate open reading frames.

[0107] The term "selection marker" or "selectable marker" is typically used to refer to a gene and/or protein whose presence can be detected directly or indirectly in a cell, for example a polypeptide that inactivates a selection agent and protects the host cell from the agent's lethal or growth-inhibitory effects (e.g., an antibiotic resistance gene and/or protein). Another possibility is that the selection marker induces fluorescence or a color deposit (e.g., green fluorescent protein (GFP) and derivatives (e.g., d2EGFP), luciferase, lacZ, alkaline phosphatase, etc.), which can be used for selecting cells expressing the polypeptide inducing the color deposit, e.g., using a fluorescence activated cell sorter (FACS) for selecting cells that express GFP. Preferably, the selectable marker polypeptide according to the invention provides resistance against lethal and/or growth-inhibitory effects of a selection agent. The selectable marker polypeptide is encoded by the DNA described herein. The selectable marker polypeptide described herein is functional in a eukaryotic host cell, and thus able to be selected for in eukaryotic host cells. Any selectable marker polypeptide fulfilling this criterion can in principle be used. Such selectable marker polypeptides are well known in the art and routinely used when eukaryotic host cell clones are to be obtained, and several examples are provided herein. In certain embodiments, a selection marker used is ZEOCIN® antibiotic. In other embodiments, blasticidin is used. The person skilled in the art will know that other selection markers are available and can be used, e.g., neomycin, puromycin, bleomycin, hygromycin, etc. In other embodiments, kanamycin is used. In yet other embodiments, the DHFR gene is used as a selectable marker, which can be selected for by methotrexate; in particular, by increasing the concentration of methotrexate, cells can be selected for increased copy numbers of the DHFR gene.
Similarly, the glutamine synthetase (GS) gene can be used, for which selection is possible in cells having insufficient GS (e.g., NS-0 cells) by culturing in media without glutamine, or alternatively in cells having sufficient GS (e.g., CHO cells) by adding an inhibitor of GS, methionine sulphoximine (MSX). Other selectable marker genes that could be used, and their selection agents, are for instance described in table 1 of U.S. Pat. No. 5,561,053, incorporated by reference herein; see also Kaufman, Methods in Enzymology, 185:537-566 (1990), for a review of these.

[0108] Other selectable marker polypeptides that can be used are enzymes involved in metabolic pathways. For instance, mammalian cells lack enzymes that are part of the metabolic pathway to create the amino acids tryptophan or histidine. Hence, these amino acids need to be present in the culture medium when mammalian cell lines are to be cultured. However, providing the mammalian cells with the genetic information (which can be derived from the sequences present in bacteria) encoding the enzymes that are essential for the synthesis of the respective amino acid can be used for selection purposes, by growing the cells in a culture medium lacking the respective amino acid and containing certain precursors for the amino acid, which precursors can then be converted into the amino acid by the encoded metabolic enzyme, if this is expressed in the mammalian cell. For example, tryptophan synthesizing enzyme (trp) can be used as a selection marker, by omitting tryptophan from the culture medium and including indole in the culture medium (Hartman and Mulligan, 1988). The trp (trpB) gene can be derived from E. coli, and can be used according to the invention, preferably by providing it with a GTG or TTG start codon (see SEQ ID NO:134 for the sequence of the trp gene, and SEQ ID NO:136 for the sequence of the trp gene wherein all internal ATG sequences have been removed). As another example, histidine synthesizing enzyme (his) can be used as a selection marker, by omitting histidine from the culture medium and including histidinol in the culture medium (Hartman and Mulligan, 1988). The his gene can be derived from S. typhimurium, and can be used according to the invention, preferably by providing it with a GTG or TTG start codon (see SEQ ID NO:138 for the sequence of the his gene, and SEQ ID NO:140 for the sequence of the his gene wherein all internal ATG sequences have been removed). 
As another example, the mammalian 5,6,7,8-tetrahydrofolate synthesizing enzyme dihydrofolate reductase (dhfr) can be used as a selection marker in cells that have a dhfr⁻ phenotype (e.g., CHO-DG44 cells), by omitting glycine, hypoxanthine and thymidine from the culture medium and including folate (or (dihydro)folic acid) in the culture medium (Simonsen et al., 1988). The dhfr gene can for instance be derived from the mouse genome or mouse cDNA and can be used, preferably by providing it with a GTG or TTG start codon (see SEQ ID NO:98 for the sequence of the dhfr gene, and SEQ ID NO:122 for the sequence of the dhfr gene wherein all internal ATG sequences have been removed). In all these embodiments, by "omitting from the culture medium" is meant that the culture medium has to be essentially devoid of the indicated component(s), meaning that the indicated component is present in an amount insufficient to sustain growth of the cells in the culture medium, so that a good selection is possible when the genetic information for the indicated enzyme is expressed in the cells and the indicated precursor component is present in the culture medium. For instance, the indicated component is present at a concentration of less than 0.1% of the concentration of that component that is normally used in the culture medium for a certain cell type. Preferably, the indicated component is absent from the culture medium. A culture medium lacking the indicated component can be prepared according to standard methods by the skilled person or can be obtained from commercial media suppliers. A potential advantage of the use of these types of metabolic enzymes as selectable marker polypeptides is that they can be used to keep the multicistronic transcription units under continuous selection, which may result in higher expression of the polypeptide of interest.

[0109] In another aspect, the invention uses the trp, his, or dhfr metabolic selection markers as an additional selection marker in a multicistronic transcription unit hereof. In such embodiments, selection of host cell clones with high expression is first established by use of, for instance, an antibiotic selection marker, e.g., ZEOCIN™ antibiotic, neomycin, etc., the coding sequences of which will have a GTG or TTG start codon according to the invention. After the selection of suitable clones, the antibiotic selection is discontinued, and now continuous or intermittent selection using the metabolic enzyme selection marker can be performed by culturing the cells in the medium lacking the appropriate identified components described supra and containing the appropriate precursor components described supra. In this aspect, the metabolic selection markers are operably linked to an IRES, and can have their normal ATG content, and the start codon can be suitably chosen from ATG, GTG or TTG. The multicistronic transcription units in this aspect are at least tricistronic.

[0110] When two multicistronic transcription units are to be selected for in a single host cell, each one preferably contains the coding sequence for a different selectable marker, to allow selection for both multicistronic transcription units. Of course, both multicistronic transcription units may be present on a single nucleic acid molecule or alternatively each one may be present on a separate nucleic acid molecule.

[0111] The term "selection" is typically defined as the process of using a selection marker/selectable marker and a selection agent to identify host cells with specific genetic properties (e.g., that the host cell contains a transgene integrated into its genome). It is clear to a person skilled in the art that numerous combinations of selection markers are possible. One antibiotic that is particularly advantageous is ZEOCIN™, because the ZEOCIN® antibiotic-resistance protein (zeocin-R) acts by binding the drug and rendering it harmless. Therefore, it is easy to titrate the amount of drug that kills cells with low levels of zeocin-R expression, while allowing the high-expressors to survive. All other antibiotic-resistance proteins in common use are enzymes, and thus act catalytically (not 1:1 with the drug). Hence, the antibiotic ZEOCIN™ is a preferred selection marker. However, the invention also works with other selection markers.

[0112] A selectable marker polypeptide described herein is the protein that is encoded by the nucleic acid of the invention, which polypeptide can be detected, for instance because it provides resistance to a selection agent such as an antibiotic. Hence, when an antibiotic is used as a selection agent, the DNA encodes a polypeptide that confers resistance to the selection agent, which polypeptide is the selectable marker polypeptide. DNA sequences coding for such selectable marker polypeptides are known, and several examples of wild-type sequences of DNA encoding selectable marker proteins are provided herein (FIGS. 15-21). It will be clear that mutants or derivatives of selectable markers can also be suitably used according to the invention, and are therefore included within the scope of the term "selectable marker polypeptide," as long as the selectable marker protein is still functional.

[0113] For convenience and as generally accepted by the skilled person, in many publications as well as herein, the gene and protein conferring resistance to a selection agent are often referred to as the "selection agent (resistance) gene" or "selection agent (resistance) protein," respectively, although the official names may be different, e.g., the gene coding for the protein conferring resistance to neomycin (as well as to G418 and kanamycin) is often referred to as neomycin (resistance) (or neoʳ) gene, while the official name is aminoglycoside 3'-phosphotransferase gene.

[0114] It is beneficial to have low levels of expression of the selectable marker polypeptide, so that stringent selection is possible. In the invention this is brought about by using a selectable marker coding sequence with a non-optimal translation efficiency. Upon selection, only cells that have nevertheless sufficient levels of selectable marker polypeptide will be selected, meaning that such cells must have sufficient transcription of the multicistronic transcription unit and sufficient translation of the selectable marker polypeptide, which provides a selection for cells where the multicistronic transcription unit has been integrated or otherwise present in the host cells at a place where expression levels from this transcription unit are high.

[0115] In certain embodiments, the DNA molecules hereof have the coding sequence for the selectable marker polypeptide upstream of the coding sequence for the polypeptide of interest, to provide for a multicistronic transcript (disclosed in detail in the incorporated '525 application). Hence, such a multicistronic transcription unit comprises in the 5' to 3' direction (both in the transcribed strand of the DNA and in the resulting transcribed RNA) the coding sequence for the selectable marker polypeptide and the sequence encoding the polypeptide of interest. In such embodiments, the open reading frame sequence that encodes the selectable marker polypeptide has no ATG sequences in the coding strand.

[0116] In alternative embodiments (disclosed in detail in the incorporated '953 application), the DNA molecules according to the invention have the coding sequence for the selectable marker polypeptide downstream of the coding sequence for the polypeptide of interest. Hence, the multicistronic transcription unit comprises in the 5' to 3' direction (both in the transcribed strand of the DNA and in the resulting transcribed RNA) the sequence encoding the polypeptide of interest and the coding sequence for the selectable marker polypeptide. In such embodiments, an IRES is upstream of and operably linked to the coding sequence for the selectable marker polypeptide.

[0117] To decrease translation of the selectable marker cistron, according to the invention the nucleic acid sequence coding for the selectable marker polypeptide comprises a mutation in the start codon (or in the context thereof) that decreases the translation initiation efficiency of the selectable marker polypeptide in a eukaryotic host cell. Preferably, a GTG start codon or more preferably a TTG start codon is engineered into the coding sequence for the selectable marker polypeptide. The translation efficiency is lower than that of the corresponding wild-type sequence in the same cell, i.e., the mutation results in less polypeptide per cell per time unit, and hence less selectable marker polypeptide. This can be detected using routine methods known to the person skilled in the art. For instance, in the case of antibiotic selection, the mutation will result in less resistance than obtained with the sequence having no such mutation and hence normal translation efficiency, which difference can easily be detected by determining the number of surviving colonies after a normal selection period, which will be lower when a translation efficiency decreasing mutation is present. As is well known to the person skilled in the art, a number of parameters indicate the expression level of the marker polypeptide, such as the maximum concentration of selection agent to which cells are still resistant, the number of surviving colonies at a given concentration, the growth speed (doubling time) of the cells in the presence of selection agent, combinations of the above, and the like.

[0118] The mutation that decreases the translation initiation efficiency according to the invention is established by providing the selectable marker polypeptide coding sequence with a non-optimal translation start sequence.

[0119] For example, the translation initiation efficiency of the selectable marker gene in eukaryotic cells can be suitably decreased according to the invention by mutating the start codon and/or the nucleotides in positions -3 to -1 and +4 (where the A of the ATG start codon is nt +1), for instance in the coding strand of the corresponding DNA sequence, to provide a non-optimal translation start sequence. A translation start sequence is often referred to in the field as "Kozak sequence," and an optimal Kozak sequence is RCCATGG, the start codon being the ATG at positions +1 to +3 and R being a purine, i.e., A or G (see Kozak M, 1986, 1987, 1989, 1990, 1997, 2002). Hence, besides the start codon itself, the context thereof, in particular nucleotides -3 to -1 and +4, is relevant, and an optimal translation start sequence comprises an optimal start codon (i.e., ATG) in an optimal context (i.e., the ATG directly preceded by RCC and directly followed by G). A non-optimal translation start sequence is defined herein as any sequence that gives at least some detectable translation in a eukaryotic cell (detectable because the selection marker polypeptide is detectable), and not having the consensus sequence RCCATGG (start codon ATG at positions +1 to +3). Translation by the ribosomes is most efficient when an optimal Kozak sequence is present (see Kozak M, 1986, 1987, 1989, 1990, 1997, 2002). However, in a small percentage of events, non-optimal translation initiation sequences are recognized and used by the ribosome to start translation. The invention makes use of this principle, and allows for decreasing and even fine-tuning of the amount of translation and hence expression of the selectable marker polypeptide, which can therefore be used to increase the stringency of the selection system.

[0120] In a first embodiment of the invention, the ATG start codon of the selectable marker polypeptide (in the coding strand of the DNA, coding for the corresponding AUG start codon in the RNA transcription product) is left intact, but the positions at -3 to -1 and +4 are mutated such that they do not fulfill the optimal Kozak sequence any more, e.g., by providing the sequence TTTATGT as the translation start site (the ATG start codon being at positions +1 to +3). It will be clear that other mutations around the start codon at positions -3 to -1 and/or +4 could be used with similar results using the teaching of the invention, as can be routinely and easily tested by the person skilled in the art. The idea of this first embodiment is that the ATG start codon is placed in a "non-optimal" context for translation initiation.

[0121] In a second and preferred embodiment, the ATG start codon itself of the selectable marker polypeptide is mutated. This will in general lead to even lower levels of translation initiation than the first embodiment. The ATG start codon in the second embodiment is mutated into another codon, which has been reported to provide some translation initiation, for instance to GTG, TTG, CTG, ATT, or ACG (collectively referred to herein as "non-optimal start codons"). In certain embodiments, the ATG start codon is mutated into a GTG start codon. This provides still lower expression levels (lower translation) than with the ATG start codon intact but in a non-optimal context. More preferably, the ATG start codon is mutated to a TTG start codon, which provides even lower expression levels of the selectable marker polypeptide than with the GTG start codon (Kozak M, 1986, 1987, 1989, 1990, 1997, 2002; see also Examples 2-6 herein). The use of non-ATG start codons in the coding sequence for a selectable marker polypeptide in a multicistronic transcription unit according to the invention was not disclosed nor suggested in the prior art and, preferably in combination with chromatin control elements, leads to very high levels of expression of the polypeptide of interest, as also shown in the incorporated '525 application.

[0122] For the second embodiment, i.e., where a non-ATG start codon is used, it is strongly preferred to provide an optimal context for such a start codon, i.e., the non-optimal start codons are preferably directly preceded by nucleotides RCC in positions -3 to -1 and directly followed by a G nucleotide (position +4). However, it has been reported that using the sequence TTTGTGG (the GTG start codon being at positions +1 to +3), some initiation is observed at least in vitro, so although strongly preferred it may not be absolutely required to provide an optimal context for the non-optimal start codons.
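As a non-limiting illustration, the comparison of a candidate translation start sequence with the consensus RCCATGG discussed in paragraphs [0119] to [0122] can be sketched in Python. The function name `classify_start` and the returned field names are hypothetical and chosen for illustration only. The sketch checks only the sequence pattern; whether a given sequence gives detectable translation in a eukaryotic cell has to be determined experimentally.

```python
def classify_start(context):
    """Classify a 7-nt coding-strand start sequence covering positions -3 to +4.

    Index 0..2 are positions -3..-1, index 3..5 are the start codon
    (positions +1..+3) and index 6 is position +4.
    """
    context = context.upper()
    if len(context) != 7 or set(context) - set("ACGT"):
        raise ValueError("expected 7 nucleotides (positions -3 to +4)")

    start_codon = context[3:6]
    # Optimal context: a purine (R = A or G) at -3, CC at -2/-1, G at +4.
    optimal_context = (
        context[0] in "AG" and context[1:3] == "CC" and context[6] == "G"
    )
    return {
        "start_codon": start_codon,
        "optimal_context": optimal_context,
        # Only ATG in the optimal context matches the consensus RCCATGG.
        "optimal_start_sequence": start_codon == "ATG" and optimal_context,
    }


if __name__ == "__main__":
    for seq in ("ACCATGG", "TTTATGT", "ACCGTGG", "ACCTTGG", "TTTGTGG"):
        print(seq, classify_start(seq))
```

Under these assumptions, TTTATGT is reported as an ATG start codon in a non-optimal context, ACCTTGG as a TTG start codon in an optimal context, and TTTGTGG as a GTG start codon in a non-optimal context.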

[0123] ATG sequences within the coding sequence for a polypeptide, but excluding the ATG start codon, are referred to as "internal ATGs," and if these are in frame with the ORF and therefore code for methionine, the resulting methionine in the polypeptide is referred to as an "internal methionine." It is strongly preferred according to certain embodiments (those of the incorporated '525 application, i.e., those where the sequence encoding the selectable marker polypeptide is upstream of the sequence encoding the polypeptide of interest) that the coding region (following the start codon, not necessarily including the start codon) coding for the selectable marker polypeptide is devoid of any ATG sequence in the coding strand of the DNA, up to (but not including) the start codon of the polypeptide of interest (obviously, the start codon of the polypeptide of interest may be, and in fact preferably is, an ATG start codon). This can be established by mutating any such ATG sequence within the coding sequence of the selectable marker polypeptide, following the start codon thereof (as is clear from the teaching above, the start codon of the selectable marker polypeptide itself may be an ATG sequence, but not necessarily so). To this purpose preferably, the degeneracy of the genetic code is used to avoid mutating amino acids in the selectable marker polypeptide wherever possible. Hence, wherever an ATG is present in the coding strand of the DNA sequence encoding the selectable marker polypeptide, which ATG is not in frame with the selectable marker polypeptide ORF, and therefore does not code for an internal methionine in the selectable marker polypeptide, the ATG can be mutated such that the resulting polypeptide has no mutations in its internal amino acid sequence. Where the ATG is an in-frame codon coding for an internal methionine, the codon can be mutated, and the resulting mutated polypeptide can be routinely checked for activity of the selectable marker polypeptide. 
In this way a mutation can be chosen which leads to a mutated selectable marker polypeptide that is still active as such (quantitative differences may exist, but those are less relevant, and in fact it could even be beneficial to have less active variants for the purpose of the invention; the minimum requirement is that the selectable marker polypeptide can still be selected for in eukaryotic cells). Amino acids valine, threonine, isoleucine and leucine are structurally similar to methionine, and therefore codons that code for one of these amino acids are good starting candidates to be tested in place of methionine within the coding sequence after the start codon. Of course, using the teachings of the invention, the skilled person may test other amino acids as well in place of internal methionines, using routine molecular biology techniques for mutating the coding DNA, and routine testing for functionality of the selectable marker polypeptide. Besides routine molecular biology techniques for mutating DNA, it is at present also possible to synthesise at will (if required using subcloning steps) DNA sequences that have sufficient length for an ORF of a selectable marker polypeptide, and such synthetic DNA sequences can nowadays be ordered commercially from various companies. Hence, using the teachings of the invention, the person skilled in the art may design appropriate sequences according to the invention encoding a selectable marker polypeptide (with a mutation decreasing translation initiation, and preferably having no internal ATGs), have this sequence synthesized, and test the DNA molecule for functionality of the encoded selectable marker by introducing the DNA molecule in eukaryotic host cells and testing for expression of functional selectable marker polypeptide. 
The commercial availability of such sequences also makes it feasible to provide, without undue burden, selection marker coding sequences lacking internal ATG sequences, where the wild-type coding sequence of the selection marker polypeptide comprises several such internal ATGs.
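By way of a hedged illustration, the identification of internal ATGs and the use of the degeneracy of the genetic code to remove those that are not in frame can be sketched in Python as follows. All function names are hypothetical, the standard genetic code is assumed, and the sequence used is a toy sequence that does not correspond to any SEQ ID NO herein. In-frame ATGs (internal methionines) are only reported, because replacing them changes the amino acid sequence and requires testing the resulting polypeptide for activity.

```python
from itertools import product

# Standard genetic code, built in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def internal_atgs(orf):
    """Return (index, in_frame) for every ATG located after the start codon."""
    hits = []
    i = orf.find("ATG", 1)  # index 0 is the start codon and is skipped
    while i != -1:
        hits.append((i, i % 3 == 0))
        i = orf.find("ATG", i + 1)
    return hits


def translate(orf):
    """Translate complete codons of the coding strand with the standard code."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))


def remove_out_of_frame_atgs(orf):
    """Remove out-of-frame ATGs by synonymous codon swaps where one exists."""
    seq = orf
    skipped = set()  # positions for which no silent fix was found
    while True:
        targets = [p for p, in_frame in internal_atgs(seq)
                   if not in_frame and p not in skipped]
        if not targets:
            return seq
        pos = targets[0]
        current = {p for p, _ in internal_atgs(seq)}
        replacement = None
        # An out-of-frame ATG overlaps two neighbouring codons.
        for start in (pos // 3 * 3, pos // 3 * 3 + 3):
            codon = seq[start:start + 3]
            if start == 0 or codon not in CODON_TABLE:
                continue  # never touch the start codon or a partial codon
            for alt, aa in CODON_TABLE.items():
                if aa != CODON_TABLE[codon] or alt == codon:
                    continue
                candidate = seq[:start] + alt + seq[start + 3:]
                remaining = {p for p, _ in internal_atgs(candidate)}
                # Accept only if an ATG disappears and no new ATG is created.
                if remaining < current:
                    replacement = candidate
                    break
            if replacement is not None:
                break
        if replacement is None:
            skipped.add(pos)
        else:
            seq = replacement


if __name__ == "__main__":
    toy = "TTGGCCATGAATGACTAA"  # hypothetical ORF with a TTG start codon
    fixed = remove_out_of_frame_atgs(toy)
    print(internal_atgs(toy), fixed, internal_atgs(fixed))
```

In this toy example the out-of-frame ATG spanning the codons AAT and GAC is removed by the synonymous change AAT to AAC, while the in-frame ATG remains and would have to be handled by an amino acid substitution followed by testing for activity.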

[0124] By providing a coding sequence for a selectable marker polypeptide lacking any internal ATG sequence, the chance of inadvertent translation initiation at a subsequent internal ATG trinucleotide by ribosomes that passed the (first, non-optimal) translation start sequence of the selectable marker polypeptide is diminished, so that the ribosomes will continue to scan for the first optimal translation start sequence, i.e., that of the polypeptide of interest.

[0125] For alternative embodiments, i.e., those where the sequence encoding the polypeptide of interest is upstream of the sequence encoding the selectable marker polypeptide and the latter is operably linked to an IRES (disclosed in the incorporated '953 application), internal ATGs in the sequence encoding the selectable marker polypeptide can remain intact.

[0126] The translation start sequence of the polypeptide of interest may comprise an optimal translation start sequence, i.e., having the consensus sequence RCCATGG (start codon ATG at positions +1 to +3). This will result in a very efficient translation of the polypeptide of interest.

[0127] By providing the coding sequence of the marker with different mutations leading to several levels of decreased translation efficiency, the stringency of selection can be increased. Fine-tuning of the selection system is thus possible using the multicistronic transcription units according to the invention: for instance, using a GTG start codon for the selection marker polypeptide, only a few ribosomes will translate from this start codon, resulting in low levels of selectable marker protein, and hence a high stringency of selection; using a TTG start codon even further increases the stringency of selection because even fewer ribosomes will translate the selectable marker polypeptide from this start codon.

[0128] It is demonstrated in the incorporated '525 application that the multicistronic expression units disclosed therein can be used in a very robust selection system, leading to a very large percentage of clones that express the polypeptide of interest at high levels, as desired. In addition, the expression levels obtained for the polypeptide of interest appear to be significantly higher than those obtained when an even larger number of colonies are screened using selection systems hitherto known.

[0129] In addition to a decreased translation initiation efficiency, it could be beneficial to also provide for decreased translation elongation efficiency of the selectable marker polypeptide, e.g., by mutating the coding sequence thereof so that it comprises several non-preferred codons of the host cell, in order to further decrease the translation levels of the marker polypeptide and allow still more stringent selection conditions, if desired. In certain embodiments, besides the mutation(s) that decrease the translation efficiency according to the invention, the selectable marker polypeptide further comprises a mutation that reduces the activity of the selectable marker polypeptide compared to its wild-type counterpart. This may be used to increase the stringency of selection even further. As non-limiting examples, proline at position 9 in the ZEOCIN™ antibiotic-resistance polypeptide may be mutated, e.g., to Thr or Phe, and for the neomycin resistance polypeptide, amino acid residue 182 or 261 or both may further be mutated (see, e.g., WO 01/32901).

[0130] In certain embodiments, for the neomycin resistance polypeptide encoded by the sequences provided herein, amino acid residue 185 (glutamic acid) is mutated to aspartic acid and/or amino acid residue 201 (valine) is mutated into glycine (Sautter et al., 2005).

[0131] In some embodiments, a so-called spacer sequence is placed downstream of the sequence encoding the start codon of the selectable marker polypeptide, which spacer sequence preferably is a sequence in frame with the start codon and encoding a few amino acids, and that does not contain a secondary structure (Kozak, 1990), and does not contain the sequence ATG. Such a spacer sequence can be used to further decrease the translation initiation frequency if a secondary structure is present in the RNA (Kozak, 1990) of the selectable marker polypeptide (e.g., for zeocin, possibly for blasticidin), and hence increase the stringency of the selection system according to the invention.

[0132] The invention also provides a DNA molecule comprising the sequence encoding a selectable marker protein according to the invention, which DNA molecule has been provided with a mutation that decreases the translation efficiency of the functional selectable marker polypeptide in a eukaryotic host cell. In certain embodiments hereof, the DNA molecule in the coding strand has been mutated compared to the wild-type sequence encoding the selectable marker polypeptide, such that the sequence ATG of the start codon is mutated into GTG (encoding Valine) or into TTG (encoding Leucine), and wherein the selectable marker polypeptide is still functional in a eukaryotic host cell. Such DNA molecules encompass a useful intermediate product according to the invention. These molecules can be prepared first, introduced into eukaryotic host cells and tested for functionality (for some markers this is even possible in prokaryotic host cells), if desired in a (semi-) quantitative manner, of the selectable marker polypeptide. They may then be further used to prepare a DNA molecule according to the invention, comprising the multicistronic transcription unit.

[0133] In one embodiment thereof, provided is a DNA molecule comprising a DNA sequence encoding a protein that confers resistance to zeocin, the DNA sequence comprising SEQ ID NO:92, with the proviso that the first ATG (the start codon, encoding Methionine) is replaced by either a GTG (encoding Valine) or a TTG (encoding Leucine) start codon.

[0134] In another embodiment thereof, provided is a DNA molecule comprising a DNA sequence encoding a protein that confers resistance to blasticidin, the DNA sequence comprising SEQ ID NO:94, with the proviso that the first ATG (the start codon, encoding Methionine) is replaced by either a GTG (encoding Valine) or a TTG (encoding Leucine) start codon.

[0135] In another embodiment thereof, provided is a DNA molecule comprising a DNA sequence encoding a protein that confers resistance to neomycin, the DNA sequence comprising SEQ ID NO:102, with the proviso that the first ATG (the start codon, encoding Methionine) is replaced by either a GTG (encoding Valine) or a TTG (encoding Leucine) start codon.

[0136] In another embodiment thereof, provided is a DNA molecule comprising a DNA sequence encoding a protein that confers resistance to puromycin, the DNA sequence comprising SEQ ID NO:96, with the proviso that the first ATG (the start codon, encoding Methionine) is replaced by either a GTG (encoding Valine) or a TTG (encoding Leucine) start codon.

[0137] In another embodiment thereof, provided is a DNA molecule comprising a DNA sequence encoding a protein that confers resistance to hygromycin, the DNA sequence comprising SEQ ID NO:100, with the proviso that the first ATG (the start codon, encoding Methionine) is replaced by either a GTG (encoding Valine) or a TTG (encoding Leucine) start codon.

[0138] In another embodiment thereof, provided is a DNA molecule comprising a DNA sequence encoding a protein with dihydrofolate reductase (dhfr) activity (conferring resistance to methotrexate), the DNA sequence comprising SEQ ID NO:98, with the proviso that the first ATG (the start codon, encoding Methionine) is replaced by either a GTG (encoding Valine) or a TTG (encoding Leucine) start codon.

[0139] In another embodiment thereof, provided is a DNA molecule comprising a DNA sequence encoding a protein with glutamine synthetase (GS) activity, the DNA sequence comprising SEQ ID NO:104, with the proviso that the first ATG (the start codon, encoding Methionine) is replaced by either a GTG (encoding Valine) or a TTG (encoding Leucine) start codon.

[0140] It will be clear that for these embodiments, any DNA molecules as described but having mutations in the sequence downstream of the first ATG (start codon) coding for the selectable marker protein are also encompassed in the invention, as long as the respective encoded selectable marker protein still has activity. For instance any silent mutations that do not alter the encoded protein because of the redundancy of the genetic code are also encompassed. Further mutations that lead to conservative amino acid mutations or to other mutations are also encompassed, as long as the encoded protein still has activity, which may or may not be lower than that of the wild-type protein as encoded by the indicated sequences. In particular, it is preferred that the encoded protein is at least 70%, preferably at least 80%, more preferably at least 90%, still more preferably at least 95% identical to the proteins encoded by the respective indicated sequences. Testing for activity of the selectable marker proteins can be done by routine methods.

[0141] Also provided are selectable marker proteins encoded by these embodiments.
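As a minimal sketch of the proviso recited in the foregoing embodiments, the replacement of the first ATG of an ORF by a GTG or TTG start codon can be expressed in Python as shown below. The function name `replace_start_codon` is hypothetical, and the ORF used is a toy sequence, not any of the SEQ ID NOs referred to herein.

```python
ALLOWED_STARTS = ("GTG", "TTG")


def replace_start_codon(orf, new_start="TTG"):
    """Return the coding strand with its ATG start codon replaced.

    Only the first three nucleotides are changed; the remainder of the
    ORF, including any internal ATGs, is left untouched.
    """
    orf = orf.upper()
    if new_start not in ALLOWED_STARTS:
        raise ValueError("new start codon must be GTG or TTG")
    if not orf.startswith("ATG"):
        raise ValueError("ORF is expected to begin with an ATG start codon")
    return new_start + orf[3:]


if __name__ == "__main__":
    toy_orf = "ATGGCTAAACCCTAA"  # hypothetical ORF for illustration only
    print(replace_start_codon(toy_orf, "GTG"))
    print(replace_start_codon(toy_orf, "TTG"))
```

The resulting sequences would still have to be tested for functionality of the encoded selectable marker in host cells, as described above.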

[0142] In one aspect, provided is an expression cassette comprising the DNA molecule hereof, having the multicistronic transcription unit. Such an expression cassette is useful to express sequences of interest, for instance, in host cells. An "expression cassette" as used herein is a nucleic acid sequence comprising at least a promoter functionally linked to a sequence of which expression is desired. Preferably, an expression cassette further contains transcription termination and polyadenylation sequences. Other regulatory sequences such as enhancers may also be included. Hence, provided is an expression cassette comprising in the following order: 5'-promoter-multicistronic transcription unit according to the invention, coding for either (i) {a polypeptide of interest and downstream thereof a selectable marker polypeptide} or (ii) {a selectable marker polypeptide and downstream thereof a polypeptide of interest}-transcription termination sequence-3'. The promoter is capable of functioning in a eukaryotic host cell, i.e., it is capable of driving transcription of the multicistronic transcription unit. The promoter is thus operably linked to the multicistronic transcription unit. The expression cassette may optionally further contain other elements known in the art, e.g., splice sites to comprise introns, and the like. In some embodiments, an intron is present behind the promoter and before the sequence encoding the polypeptide of interest. In the embodiments where the selectable marker polypeptide is encoded downstream of the polypeptide of interest, an IRES is operably linked to the cistron that contains the selectable marker polypeptide coding sequence. In the embodiments where the selectable marker polypeptide is encoded upstream of the polypeptide of interest, the sequence encoding the selectable marker polypeptide is devoid of ATG sequences in the coding strand.

[0143] To obtain expression of nucleic acid sequences encoding protein, it is well known to those skilled in the art that sequences capable of driving such expression can be functionally linked to the nucleic acid sequences encoding the protein, resulting in recombinant nucleic acid molecules encoding a protein in expressible format. In the invention, the expression cassette comprises a multicistronic transcription unit. In general, the promoter sequence is placed upstream of the sequences that should be expressed. Widely used expression vectors are available in the art, e.g., the pcDNA and pEF vector series of Invitrogen, pMSCV and pTK-Hyg from BD Sciences, pCMV-Script from Stratagene, etc., which can be used to obtain suitable promoters and/or transcription terminator sequences, polyA sequences, and the like.

[0144] Where the sequence encoding the polypeptide of interest is properly inserted with reference to sequences governing the transcription and translation of the encoded polypeptide, the resulting expression cassette is useful to produce the polypeptide of interest, referred to as expression. Sequences driving expression may include promoters, enhancers and the like, and combinations thereof. These should be capable of functioning in the host cell, thereby driving expression of the nucleic acid sequences that are functionally linked to them. The person skilled in the art is aware that various promoters can be used to obtain expression of a gene in host cells. Promoters can be constitutive or regulated, and can be obtained from various sources, including viruses, prokaryotic, or eukaryotic sources, or artificially designed. Expression of nucleic acids of interest may be from the natural promoter or derivative thereof or from an entirely heterologous promoter (Kaufman, 2000). Some well-known and much used promoters for expression in eukaryotic cells comprise promoters derived from viruses, such as adenovirus, e.g., the E1A promoter, promoters derived from cytomegalovirus (CMV), such as the CMV immediate early (IE) promoter (referred to herein as the CMV promoter) (obtainable for instance from pcDNA, Invitrogen), promoters derived from Simian Virus 40 (SV40) (Das et al., 1985), and the like. Suitable promoters can also be derived from eukaryotic cells, such as metallothionein (MT) promoters, elongation factor 1α (EF-1α) promoter (Gill et al., 2001), ubiquitin C or UB6 promoter (Gill et al., 2001; Schorpp et al., 1996), actin promoter, an immunoglobulin promoter, heat shock promoters, and the like. 
Some preferred promoters for obtaining expression in eukaryotic cells, which are suitable promoters in the invention, are the CMV-promoter, a mammalian EF1-alpha promoter, a mammalian ubiquitin promoter such as a ubiquitin C promoter, or an SV40 promoter (e.g., obtainable from pIRES, cat.no. 631605, BD Sciences). Testing for promoter function and strength of a promoter is a matter of routine for a person skilled in the art, and in general may for instance encompass cloning a test gene such as lacZ, luciferase, GFP, etc., behind the promoter sequence, and testing for expression of the test gene. Of course, promoters may be altered by deletion, addition, mutation of sequences therein, and tested for functionality, to find new, attenuated, or improved promoter sequences. According to the invention, strong promoters that give high transcription levels in the eukaryotic cells of choice are preferred.

[0145] In certain embodiments, a DNA molecule hereof is part of a vector, e.g., a plasmid. Such vectors can easily be manipulated by methods well known to the person skilled in the art, and can for instance be designed for being capable of replication in prokaryotic and/or eukaryotic cells. In addition, many vectors can directly or in the form of isolated desired fragment therefrom be used for transformation of eukaryotic cells and will integrate in whole or in part into the genome of such cells, resulting in stable host cells comprising the desired nucleic acid in their genome.

[0146] Conventional expression systems are DNA molecules in the form of a recombinant plasmid or a recombinant viral genome. The plasmid or the viral genome is introduced into (eukaryotic host) cells and preferably integrated into their genomes by methods known in the art. In certain embodiments, the invention also uses these types of DNA molecules to deliver its improved transgene expression system. A preferred embodiment of the invention is the use of plasmid DNA for delivery of the expression system. A plasmid contains a number of components: conventional components, known in the art, are an origin of replication and a selectable marker for propagation of the plasmid in bacterial cells; a selectable marker that functions in eukaryotic cells to identify and isolate host cells that carry an integrated transgene expression system; the protein of interest, whose high-level transcription is brought about by a promoter that is functional in eukaryotic cells (e.g., the human cytomegalovirus major immediate early promoter/enhancer, pCMV (Boshart et al., 1985)); and viral transcriptional terminators (e.g., the SV40 polyadenylation site (Kaufman & Sharp, 1982)) for the transgene of interest and the selectable marker.

[0147] The vector used can be any vector that is suitable for cloning DNA and that can be used for transcription of a nucleic acid of interest. When host cells are used it is preferred that the vector is an integrating vector. Alternatively, the vector may be an episomally replicating vector.

[0148] It is widely appreciated that chromatin structure and other epigenetic control mechanisms may influence the expression of transgenes in eukaryotic cells (e.g., Whitelaw et al., 2001). The multicistronic expression units according to the invention form part of a selection system with a rather rigorous selection regime. This generally requires high transcription levels in the host cells of choice. To increase the chance of finding clones of host cells that survive the rigorous selection regime, and possibly to increase the stability of expression in obtained clones, it will generally be preferable to increase the predictability of transcription. Therefore, in certain embodiments, an expression cassette according to the invention further comprises at least one chromatin control element. A "chromatin control element" as used herein is a collective term for DNA sequences that may somehow have an effect on the chromatin structure and therewith on the expression level and/or stability of expression of transgenes in their vicinity (they function "in cis," and hence are placed preferably within 5 kb, more preferably within 2 kb, still more preferably within 1 kb from the transgene) within eukaryotic cells. Such elements have sometimes been used to increase the number of clones having desired levels of transgene expression. The mechanisms by which these elements work may differ for and even within different classes of such elements, and are not completely known for all types of such elements. 
However, such elements have been described, and for the purpose of the invention chromatin control elements are chosen from the group consisting of matrix or scaffold attachment regions (MARs/SARs) (e.g., Phi-Van et al., 1990; WO 02/074969, WO 2005/040377), insulators (West et al., 2002) such as the beta-globin insulator element (5' HS4 of the chicken beta-globin locus), scs, scs', and the like (e.g., Chung et al., 1993, 1997; Kellum and Schedl, 1991; WO 94/23046, WO 96/04390, WO 01/02553, WO 2004/027072), a ubiquitous chromatin opening element (UCOE) (WO 00/05393, WO 02/24930, WO 02/099089, WO 02/099070), and anti-repressor sequences (also referred to as "STAR" sequences) (Kwaks et al., 2003; WO 03/004704). Non-limiting examples of MAR/SAR sequences that could be used in the current invention are the chicken lysozyme 5' MAR (Phi-Van et al., 1990) or fragments thereof (e.g., the B, K and F regions as described in WO 02/074969); DNA sequences comprising at least one bent DNA element and at least one binding site for a DNA binding protein, preferably containing at least 10% of dinucleotide TA, and/or at least 12% of dinucleotide AT on a stretch of 100 contiguous base pairs, such as a sequence selected from the group comprising the sequences SEQ ID Nos 1 to 27 in WO 2005/040377, fragments of any one of SEQ ID Nos 1 to 27 in WO 2005/040377 being at least 100 nucleotides in length and having MAR activity, sequences that are at least 70% identical in nucleotide sequence to any one of SEQ ID Nos 1 to 27 in WO 2005/040377 or fragments thereof and having MAR activity, wherein MAR activity is defined as being capable of binding to nuclear matrices/scaffolds in vitro and/or of altering the expression of coding sequences operably linked to a promoter; sequences chosen from any one of SEQ ID NO: 1 to 5 in WO 02/074969, fragments of any one of SEQ ID NO: 1 to 5 in WO 02/074969 and having MAR activity, sequences that are at least 70% identical in nucleotide sequence 
to any one of SEQ ID NO: 1 to 5 in WO 02/074969 or fragments thereof and having MAR activity; sequences chosen from SEQ ID NO: 1 and SEQ ID NO: 2 in WO 2004/027072, functional fragments thereof and sequences being at least 70% identical thereto. A non-limiting example of insulator sequences that could be used in the invention is a sequence that comprises SEQ ID NO:1 of WO 01/02553. Non-limiting examples of UCOEs that could be used in the invention are sequences depicted in FIGS. 2 and 7 of WO 02/24930, functional fragments thereof and sequences being at least 70% identical thereto while still retaining activity; sequences comprising SEQ ID NO: 28 of US 2005/181428, functional fragments thereof and sequences being at least 70% identical thereto while still retaining activity.
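The dinucleotide criterion mentioned above for certain MAR sequences (at least 10% TA and/or at least 12% AT on a stretch of 100 contiguous base pairs) can be illustrated with a short sliding-window scan. This Python sketch is illustrative only. The function names are hypothetical, and the way the percentage is computed here (overlapping dinucleotide counts divided by the 99 dinucleotide positions in a 100 bp window) is an assumption; WO 2005/040377 may define the percentage differently.

```python
def dinucleotide_fraction(window: str, dinucleotide: str) -> float:
    """Fraction of the dinucleotide positions in `window` occupied by `dinucleotide`.

    Assumption: overlapping positions are counted, so a window of n bases
    has n - 1 dinucleotide positions.
    """
    positions = len(window) - 1
    hits = sum(1 for i in range(positions) if window[i:i + 2] == dinucleotide)
    return hits / positions


def has_ta_at_rich_stretch(seq: str, window: int = 100,
                           min_ta: float = 0.10, min_at: float = 0.12) -> bool:
    """True if any `window`-bp stretch meets the TA threshold or the AT threshold."""
    seq = seq.upper()
    for start in range(len(seq) - window + 1):
        stretch = seq[start:start + window]
        if (dinucleotide_fraction(stretch, "TA") >= min_ta
                or dinucleotide_fraction(stretch, "AT") >= min_at):
            return True
    return False


# Toy inputs for illustration only.
print(has_ta_at_rich_stretch("TA" * 50))   # alternating T/A, 100 bp
print(has_ta_at_rich_stretch("G" * 100))   # no TA or AT at all
```

Sequences shorter than the window are reported as not meeting the criterion, because no 100 bp stretch exists to evaluate.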

[0149] Preferably, the chromatin control element is an anti-repressor sequence, preferably chosen from the group consisting of: a) any one of SEQ ID NO:1 through SEQ ID NO:66; b) fragments of any one of SEQ ID NO:1 through SEQ ID NO:66, wherein the fragments have anti-repressor activity ("functional fragments"); c) sequences that are at least 70% identical in nucleotide sequence to a) or b) wherein the sequences have anti-repressor activity ("functional derivatives"); and d) the complement to any one of a) to c). Preferably, the chromatin control element is chosen from the group consisting of STAR67 (SEQ ID NO:66), STAR7 (SEQ ID NO:7), STAR9 (SEQ ID NO:9), STAR17 (SEQ ID NO:17), STAR27 (SEQ ID NO:27), STAR29 (SEQ ID NO:29), STAR43 (SEQ ID NO:43), STAR44 (SEQ ID NO:44), STAR45 (SEQ ID NO:45), STAR47 (SEQ ID NO:47), STAR61 (SEQ ID NO:61), or a functional fragment or derivative of the STAR sequences. In a particularly preferred embodiment, the STAR sequence is STAR67 (SEQ ID NO:66) or a functional fragment or derivative thereof. In certain embodiments, STAR67 or a functional fragment or derivative thereof is positioned upstream of a promoter driving expression of the multicistronic transcription unit. In other embodiments, the expression cassettes according to the invention are flanked on both sides by at least one anti-repressor sequence.

[0150] Sequences having anti-repressor activity as used herein are sequences that are capable of at least in part counteracting the repressive effect of HP1 or HPC2 proteins when these proteins are tethered to DNA. Sequences having anti-repressor activity (sometimes also referred to as anti-repressor sequences or anti-repressor elements herein) suitable for the invention, have been disclosed in WO 03/004704, incorporated herein by reference, and were coined "STAR" sequences therein (wherever a sequence is referred to as a STAR sequence herein, this sequence has anti-repressor activity according to the invention). As a non-limiting example, the sequences of 66 anti-repressor elements, named STAR1-65 (see WO 03/004704) and STAR67 (see WO 2006/005718), are presented herein as SEQ ID NOS:1-65 and 66, respectively.

[0151] A functional fragment or derivative of a given anti-repressor element is considered equivalent to the anti-repressor element, when it still has anti-repressor activity. The presence of such anti-repressor activity can easily be checked by the person skilled in the art, for instance by the assay described below. Functional fragments or derivatives can easily be obtained by a person skilled in the art of molecular biology, by starting with a given anti-repressor sequence and making deletions, additions, substitutions, inversions and the like (see, e.g., WO 03/004704). A functional fragment or derivative also comprises orthologs from other species, which can be found using the known anti-repressor sequences by methods known by the person skilled in the art (see, e.g., WO 03/004704). Hence, the invention encompasses fragments of the anti-repressor sequences, wherein the fragments still have anti-repressor activity. The invention also encompasses sequences that are at least 70% identical in nucleotide sequence to the sequences having anti-repressor activity or to functional fragments thereof having anti-repressor activity, as long as these sequences that are at least 70% identical still have the anti-repressor activity according to the invention. Preferably, the sequences are at least 80% identical, more preferably at least 90% identical and still more preferably at least 95% identical to the reference native sequence or functional fragment thereof. For fragments of a given sequence, percent identity refers to that portion of the reference native sequence that is found in the fragment.
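As an illustration of the identity thresholds above (70%, 80%, 90%, 95%), the following Python sketch computes percent identity for two sequences that have already been aligned. It is a simplified illustration only: the function names are hypothetical, the alignment itself is assumed to have been produced by an alignment tool of the reader's choice, gaps are written as `-`, and for a fragment the input is assumed to cover only the portion of the reference sequence found in the fragment. Counting matches over all alignment columns is one common convention and is an assumption here, not a definition taken from this document.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Assumes the two strings are pre-aligned, of equal length, with '-' for gaps.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    columns = [(r, q) for r, q in zip(aligned_ref.upper(), aligned_query.upper())
               if not (r == "-" and q == "-")]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    matches = sum(1 for r, q in columns if r == q and r != "-")
    return 100.0 * matches / len(columns)


def identity_tier(pct: float) -> str:
    """Map a percent identity to the highest threshold named in the text that it meets."""
    for threshold in (95, 90, 80, 70):
        if pct >= threshold:
            return f"at least {threshold}% identical"
    return "below 70% identical"


# Toy aligned sequences for illustration only.
pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(pct, identity_tier(pct))
```

Meeting an identity threshold is only the structural half of the definition; a candidate sequence must still show anti-repressor activity in the functional assay.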

[0152] Sequences having anti-repressor activity according to the invention can be obtained by various methods, including but not limited to the cloning from the human genome or from the genome of another organism, or by for instance amplifying known anti-repressor sequences directly from such a genome by using the knowledge of the sequences, e.g., by PCR, or can in part or wholly be chemically synthesized.

[0153] Sequences having anti-repressor activity, and functional fragments or derivatives thereof, are structurally defined herein by their sequence and in addition are functionally defined as sequences having anti-repressor activity, which can be determined with the assay described below.

[0154] Any sequence having anti-repressor activity according to the invention should at least be capable of surviving the following functional assay (see WO 03/004704, example 1, incorporated herein by reference).

[0155] Human U-2 OS cells (ATCC HTB-96) are stably transfected with the pTet-Off plasmid (Clontech K1620-A) and with nucleic acid encoding a LexA-repressor fusion protein containing the LexA DNA binding domain and the coding region of either HP1 or HPC2 (Drosophila Polycomb group proteins that repress gene expression when tethered to DNA; the assay works with either fusion protein) under control of the Tet-Off transcriptional regulatory system (Gossen and Bujard, 1992). These cells are referred to below as the reporter cells for the anti-repressor activity assay. A reporter plasmid, which provides hygromycin resistance, contains a polylinker sequence positioned between four LexA operator sites and the SV40 promoter that controls the zeocin-resistance gene. The sequence to be tested for anti-repressor activity can be cloned in the polylinker. Construction of a suitable reporter plasmid, such as pSelect, is described in Example 1 and FIG. 1 of WO 03/004704. The reporter plasmid is transfected into the reporter cells, and the cells are cultured under hygromycin selection (25 µg/ml; selection for presence of the reporter plasmid) and tetracycline repression (doxycycline, 10 ng/ml; prevents expression of the LexA-repressor fusion protein). After 1 week of growth under these conditions, the doxycycline concentration is reduced to 0.1 ng/ml to induce the LexA-repressor gene, and after 2 days ZEOCIN® is added to 250 µg/ml. The cells are cultured for 5 weeks, until the control cultures (transfected with empty reporter plasmid, i.e., lacking a cloned anti-repressor sequence in the polylinker) are killed by the ZEOCIN® (in this control plasmid, the SV40 promoter is repressed by the LexA-repressor fusion protein that is tethered to the LexA operator sites, resulting in insufficient expression of the zeocin-resistance gene in such cells to survive ZEOCIN® selection). 
A sequence has anti-repressor activity according to the invention if, when the sequence is cloned in the polylinker of the reporter plasmid, the reporter cells survive the 5-week selection under ZEOCIN®. Cells from such colonies can still be propagated onto new medium containing ZEOCIN® after the 5-week ZEOCIN® selection, whereas cells transfected with reporter plasmids lacking anti-repressor sequences cannot be propagated onto new medium containing ZEOCIN®. Any sequence not capable of conferring such growth after 5 weeks on ZEOCIN® in this assay does not qualify as a sequence having anti-repressor activity, or functional fragment or functional derivative thereof according to the invention. As an example, other known chromatin control elements such as those tested by Van der Vlag et al. (2000), including Drosophila scs (Kellum and Schedl, 1991), 5'-HS4 of the chicken β-globin locus (Chung et al., 1993, 1997) or Matrix Attachment Regions (MARs) (Phi-Van et al., 1990), do not survive this assay.

[0156] In addition, it is preferred that the anti-repressor sequence or functional fragment or derivative thereof confers a higher proportion of reporter over-expressing clones when flanking a reporter gene (e.g., luciferase, GFP) which is integrated into the genome of U-2 OS or CHO cells, compared to when the reporter gene is not flanked by anti-repressor sequences, or flanked by weaker repression blocking sequences such as Drosophila scs. This can be verified using for instance the pSDH vector, or similar vectors, as described in Example 1 and FIG. 2 of WO 03/004704.

[0157] Anti-repressor elements can have at least one of three consequences for production of protein: (1) they increase the predictability of identifying host cell lines that express a protein at industrially acceptable levels (they impair the ability of adjacent heterochromatin to silence the transgene, so that the position of integration has a less pronounced effect on expression); (2) they result in host cell lines with increased protein yields; and/or (3) they result in host cell lines that exhibit more stable protein production during prolonged cultivation.

[0158] Any STAR sequence can be used in the expression cassettes according to the invention, but the following STAR sequences are particularly useful: STAR67 (SEQ ID NO:66), STAR7 (SEQ ID NO:7), STAR9 (SEQ ID NO:9), STAR17 (SEQ ID NO:17), STAR27 (SEQ ID NO:27), STAR29 (SEQ ID NO:29), STAR43 (SEQ ID NO:43), STAR44 (SEQ ID NO:44), STAR45 (SEQ ID NO:45), STAR47 (SEQ ID NO:47), STAR61 (SEQ ID NO:61), or functional fragments or derivatives of these STAR sequences.

[0159] In certain embodiments, the anti-repressor sequence, preferably STAR67, is placed upstream of the promoter, preferably such that less than 2 kb are present between the 3' end of the anti-repressor sequence and the start of the promoter sequence. In certain embodiments, less than 1 kb, more preferably less than 500 nucleotides (nt), still more preferably less than about 200, less than about 100, less than about 50, or less than about 30 nt are present between the 3' end of the anti-repressor sequence and the start of the promoter sequence. In certain embodiments, the anti-repressor sequence is cloned directly upstream of the promoter, resulting in only about 0-20 nt between the 3' end of the anti-repressor sequence and the start of the promoter sequence.
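The spacing preferences above can be expressed as a small calculation on construct coordinates. The following Python sketch is illustrative only; the function names and the example coordinates are hypothetical, and it assumes 1-based inclusive annotation coordinates on the coding strand, with the anti-repressor sequence lying upstream of the promoter.

```python
def spacer_length(star_end: int, promoter_start: int) -> int:
    """Number of nucleotides strictly between the last nucleotide of the
    anti-repressor sequence and the first nucleotide of the promoter.

    Assumes 1-based inclusive coordinates with the anti-repressor upstream.
    """
    if promoter_start <= star_end:
        raise ValueError("promoter must start downstream of the anti-repressor 3' end")
    return promoter_start - star_end - 1


def narrowest_spacing_class(spacer_nt: int) -> str:
    """Return the tightest spacing limit from the text that the spacer satisfies."""
    for limit in (30, 50, 100, 200, 500, 1000, 2000):
        if spacer_nt < limit:
            return f"less than {limit} nt"
    return "2 kb or more"


# Hypothetical annotation: anti-repressor ends at position 1200, promoter starts at 1216.
gap = spacer_length(1200, 1216)
print(gap, narrowest_spacing_class(gap))
```

A spacer of 0 nt corresponds to an anti-repressor sequence cloned directly against the promoter, which falls within the "about 0-20 nt" case described above.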

[0160] For the production of multimeric proteins, two or more expression cassettes can be used. Preferably, both expression cassettes are multicistronic expression cassettes according to the invention, each coding for a different selectable marker protein, so that selection for both expression cassettes is possible. This embodiment has proven to give good results, e.g., for the expression of the heavy and light chain of antibodies. It will be clear that both expression cassettes may be placed on one nucleic acid molecule or both may be present on a separate nucleic acid molecule, before they are introduced into host cells. An advantage of placing them on one nucleic acid molecule is that the two expression cassettes are present in a single predetermined ratio (e.g., 1:1) when introduced into host cells. On the other hand, when present on two different nucleic acid molecules, this allows the possibility to vary the molar ratio of the two expression cassettes when introducing them into host cells, which may be an advantage if the preferred molar ratio is different from 1:1 or when it is unknown beforehand what is the preferred molar ratio, so that variation thereof and empirically finding the optimum can easily be performed by the skilled person. According to the invention, preferably at least one of the expression cassettes, but more preferably each of them, comprises a chromatin control element, more preferably an anti-repressor sequence.
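When the two expression cassettes are on separate nucleic acid molecules, varying their molar ratio means adjusting the mass of each molecule used, because molecules of different length contribute different numbers of copies per unit mass. The following Python sketch shows the arithmetic. It is illustrative only: the function name, plasmid sizes and DNA amounts are hypothetical, and it assumes double-stranded DNA whose mass is proportional to its length in base pairs.

```python
from typing import Tuple


def masses_for_molar_ratio(len_a_bp: int, len_b_bp: int,
                           ratio_a_to_b: float,
                           total_mass_ug: float) -> Tuple[float, float]:
    """Split `total_mass_ug` of DNA between molecules A and B so that
    moles(A) : moles(B) equals `ratio_a_to_b` : 1.

    Assumption: mass per mole is proportional to length in bp, so
    mass(A) : mass(B) = ratio * len_a : len_b.
    """
    if min(len_a_bp, len_b_bp) <= 0 or ratio_a_to_b <= 0 or total_mass_ug <= 0:
        raise ValueError("lengths, ratio and total mass must be positive")
    weight_a = ratio_a_to_b * len_a_bp
    weight_b = float(len_b_bp)
    mass_a = total_mass_ug * weight_a / (weight_a + weight_b)
    return mass_a, total_mass_ug - mass_a


# Hypothetical example: a 6 kb plasmid (A) and an 8 kb plasmid (B), 14 ug DNA in total.
print(masses_for_molar_ratio(6000, 8000, 1.0, 14.0))  # equimolar
print(masses_for_molar_ratio(6000, 8000, 2.0, 14.0))  # twice as many copies of A
```

For an equimolar mix the larger plasmid takes the larger share of the mass, which is why equal masses of two differently sized plasmids do not give a 1:1 molar ratio.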

[0161] In another embodiment, the different subunits or parts of a multimeric protein are present on a single expression cassette.

[0162] Instead of or in addition to the presence of a STAR sequence placed upstream of a promoter in an expression cassette, it has proven highly beneficial to provide a STAR sequence on both sides of an expression cassette, such that the expression cassette comprising the transgene is flanked by two STAR sequences, which in certain embodiments are essentially identical to each other.

[0163] It is shown herein that the combination of a first anti-repressor element upstream of a promoter and flanking the expression cassette by two other anti-repressor sequences provides superior results.

[0164] As at least some anti-repressor sequences can be directional (WO 03/004704), the anti-repressor sequences flanking the expression cassette (anti-repressor A and B) may beneficially be placed in opposite direction with respect to each other, such that the 3' end of each of these anti-repressor sequences is facing inwards to the expression cassette (and to each other). Hence, in certain embodiments, the 5' side of an anti-repressor element faces the DNA/chromatin of which the influence on the transgene is to be diminished by the anti-repressor element. For an anti-repressor sequence upstream of a promoter in an expression cassette, the 3' end faces the promoter. The sequences of the anti-repressor elements in the sequence listing (SEQ ID NOS:1-66) are given in 5' to 3' direction, unless otherwise indicated.
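The inward-facing arrangement can be made concrete at the sequence level: if both flanking elements are given 5' to 3', the upstream element is used as written and the downstream element is inserted as its reverse complement, so that the 3' end of each element lies next to the cassette. The following Python sketch illustrates this with toy strings. The function names and sequences are hypothetical, and only unambiguous A, C, G and T bases are handled.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G and T."""
    return seq.translate(_COMPLEMENT)[::-1]


def flank_cassette(star_a: str, cassette: str, star_b: str) -> str:
    """Top-strand sequence of a cassette flanked by two anti-repressor elements
    whose 3' ends both face inward.

    All inputs are given 5' to 3'. The upstream element is kept as written;
    the downstream element is reverse-complemented so that its 3' end abuts
    the cassette.
    """
    return star_a + cassette + reverse_complement(star_b)


# Toy sequences for illustration only (not real STAR or cassette sequences).
print(reverse_complement("AACG"))
print(flank_cassette("AACG", "TTTT", "AACG"))
```

Using the same element on both sides, as in the toy example, mirrors the embodiments in which the two flanking sequences are essentially identical to each other.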

[0165] In certain embodiments, transcription units or expression cassettes according to the invention are provided, further comprising: a) a transcription pause (TRAP) sequence upstream of the promoter that drives transcription of the multicistronic transcription unit, the TRAP being in a 5' to 3' direction; or b) a TRAP sequence downstream of the open reading frame of the polypeptide of interest and preferably downstream of the transcription termination sequence of the multicistronic transcription unit, the TRAP being in a 3' to 5' orientation; or c) both a) and b); wherein a TRAP sequence is functionally defined as a sequence which when placed into a transcription unit, results in a reduced level of transcription in the nucleic acid present on the 3' side of the TRAP when compared to the level of transcription observed in the nucleic acid on the 5' side of the TRAP. Non-limiting examples of TRAP sequences are transcription termination and/or polyadenylation signals. One non-limiting example of a TRAP sequence is given in SEQ ID NO:126. Examples of other TRAP sequences, methods to find these, and uses thereof have been described in WO 2004/055215.

[0166] DNA molecules comprising multicistronic transcription units and/or expression cassettes according to the invention can be used for improving expression of nucleic acid, preferably in host cells. The terms "cell"/"host cell" and "cell line"/"host cell line" are typically defined, respectively, as a cell and homogeneous populations thereof that can be maintained in cell culture by methods known in the art, and that have the ability to express heterologous or homologous proteins.

[0167] Prokaryotic host cells can be used to propagate and/or perform genetic engineering with the DNA molecules of the invention, especially when present on plasmids capable of replicating in prokaryotic host cells such as bacteria.

[0168] A host cell according to the invention preferably is a eukaryotic cell, more preferably a mammalian cell, such as a rodent cell or a human cell or fusion between different cells. In certain non-limiting embodiments, the host cell is a U-2 OS osteosarcoma, CHO (Chinese hamster ovary), HEK 293, HuNS-1 myeloma, WERI-Rb-1 retinoblastoma, BHK, Vero, non-secreting mouse myeloma Sp2/0-Ag14, non-secreting mouse myeloma NS0, NCI-H295R adrenal gland carcinoma or a PER.C6® cell.

[0169] In certain embodiments, a host cell is a cell expressing at least E1A, and preferably also E1B, of an adenovirus. As non-limiting examples, such a cell can be derived from for instance human cells, for instance from a kidney (example: HEK 293 cells, see Graham et al., 1977), lung (e.g., A549, see, e.g., WO 98/39411) or retina (example: HER cells marketed under the trade mark PER.C6®, see U.S. Pat. No. 5,994,128), or from amniocytes (e.g., N52.E6, described in U.S. Pat. No. 6,558,948), and similarly from other cells. Methods for obtaining such cells are described for instance in U.S. Pat. Nos. 5,994,128 and 6,558,948. PER.C6® cells for the purpose of the invention means cells from an upstream or downstream passage or a descendant of an upstream or downstream passage of cells as deposited under ECACC no. 96022940, i.e., having the characteristics of those cells. It has been previously shown that such cells are capable of expression of proteins at high levels (e.g., WO 00/63403, and Jones et al., 2003). In other embodiments, the host cells are CHO cells, for instance CHO-K1, CHO-S, CHO-DG44, CHO-DUKXB11, and the like. In certain embodiments, the CHO cells have a dhfr⁻ phenotype.

[0170] Such eukaryotic host cells can express desired polypeptides, and are often used for that purpose. They can be obtained by introduction of a DNA molecule of the invention, preferably in the form of an expression cassette, into the cells. Preferably, the expression cassette is integrated in the genome of the host cells, which can be in different positions in various host cells, and selection will provide for a clone where the transgene is integrated in a suitable position, leading to a host cell clone with desired properties in terms of expression levels, stability, growth characteristics, and the like. Alternatively the multicistronic transcription unit may be targeted or randomly selected for integration into a chromosomal region that is transcriptionally active, e.g., behind a promoter present in the genome. Selection for cells containing the DNA described herein can be performed by selecting for the selectable marker polypeptide, using routine methods known by the person skilled in the art. When such a multicistronic transcription unit is integrated behind a promoter in the genome, an expression cassette according to the invention can be generated in situ, i.e., within the genome of the host cells.

[0171] The host cells may be from a stable clone that can be selected and propagated according to standard procedures known to the person skilled in the art. A culture of such a clone is capable of producing polypeptide of interest, if the cells comprise the multicistronic transcription unit of the invention. Cells according to the invention preferably are able to grow in suspension culture in serum-free medium.

[0172] In certain embodiments, the DNA molecule comprising the multicistronic transcription unit of the invention, preferably in the form of an expression cassette, is integrated into the genome of the eukaryotic host cell according to the invention. This will provide for stable inheritance of the multicistronic transcription unit.

[0173] Selection for the presence of the selectable marker polypeptide, and hence for expression, can be performed during the initial obtaining of the cells, and could be lowered or stopped altogether after stable clones have been obtained. It is however also possible to apply the selection agent during later stages continuously, or only occasionally, possibly at lower levels than during initial selection of the host cells.

[0174] A polypeptide of interest according to the invention can be any protein, and may be a monomeric protein or a (part of a) multimeric protein. A multimeric protein comprises at least two polypeptide chains. Non-limiting examples of a protein of interest according to the invention are enzymes, hormones, immunoglobulin chains, therapeutic proteins like anti-cancer proteins, blood coagulation proteins such as Factor VIII, multi-functional proteins, such as erythropoietin, diagnostic proteins, or proteins or fragments thereof useful for vaccination purposes, all known to the person skilled in the art.

[0175] In certain embodiments, an expression cassette hereof encodes an immunoglobulin heavy or light chain or an antigen binding part, derivative and/or analogue thereof. In one embodiment, a protein expression unit is provided, wherein the protein of interest is an immunoglobulin heavy chain. In yet another preferred embodiment, a protein expression unit is provided, wherein the protein of interest is an immunoglobulin light chain. When these two protein expression units are present within the same (host) cell a multimeric protein and more specifically an immunoglobulin, is assembled. Hence, in certain embodiments, the protein of interest is an immunoglobulin, such as an antibody, which is a multimeric protein. Preferably, such an antibody is a human or humanized antibody. In certain embodiments thereof, it is an IgG, IgA, or IgM antibody. An immunoglobulin may be encoded by the heavy and light chains on different expression cassettes, or on a single expression cassette. Preferably, the heavy and light chain are each present on a separate expression cassette, each having its own promoter (which may be the same or different for the two expression cassettes), each comprising a multicistronic transcription unit according to the invention, the heavy and light chain being the polypeptide of interest, and preferably each coding for a different selectable marker protein, so that selection for both heavy and light chain expression cassette can be performed when the expression cassettes are introduced and/or present in a eukaryotic host cell.

[0176] The polypeptide of interest may be from any source, and in certain embodiments is a mammalian protein, an artificial protein (e.g., a fusion protein or mutated protein), and preferably is a human protein.

[0177] The configurations of the expression cassettes hereof may also be used when the ultimate goal is not the production of a polypeptide of interest, but the RNA itself, for instance for producing increased quantities of RNA from an expression cassette, which may be used for purposes of regulating other genes (e.g., RNAi, antisense RNA), gene therapy, in vitro protein production, etc.

[0178] In one aspect, provided is a method for generating a host cell expressing a polypeptide of interest, the method comprising the steps of: a) introducing into a plurality of precursor cells an expression cassette according to the invention, and b) culturing the generated cells under conditions selecting for expression of the selectable marker polypeptide, and c) selecting at least one host cell producing the polypeptide of interest. This novel method provides a very good result in terms of the ratio of obtained clones versus clones with high expression of the desired polypeptide. Using the most stringent conditions, i.e., the weakest translation efficiency for the selectable marker polypeptide (using the weakest translation start sequence), far fewer colonies are obtained using the same concentration of selection agent than with known selection systems, and a relatively high percentage of the obtained clones produces the polypeptide of interest at high levels. In addition, the obtained levels of expression appear higher than those obtained when an even larger number of clones using the known selection systems are used.

[0179] The selection system is swift because it does not require copy number amplification of the transgene. Hence, cells with low copy numbers of the multicistronic transcription units already provide high expression levels. High copy numbers of the transgene may be prone to genetic instability and repeat-induced silencing (e.g., Kim et al., 1998; McBurney et al., 2002). Therefore, an additional advantage of the embodiments with relatively low transgene copy numbers is that lower copy numbers are anticipated to be less prone to recombination and to repeat-induced silencing, and therefore fewer problems in this respect are anticipated when using host cells with a limited number of copies of the transgene compared to host cells obtained using an amplification system where hundreds or even thousands of copies of the selectable marker and protein of interest coding sequences may be present in the genome of the cell. Also provided are examples of high expression levels, using the multicistronic transcription unit selection system, while the copy number of the transgene is relatively low, i.e., less than 30 copies per cell, or even less than 20 copies per cell. Hence, the invention allows the generation of host cells according to the invention, comprising less than 30 copies of the multicistronic transcription unit in the genome of the host cells, preferably less than 25, more preferably less than 20 copies, while at the same time providing sufficient expression levels of the polypeptide of interest for commercial purposes, e.g., more than 15, preferably more than 20 pg/cell/day of an antibody.

[0180] While clones having relatively low copy numbers of the multicistronic transcription units and high expression levels can be obtained, the selection system of the invention nevertheless can be combined with amplification methods to even further improve expression levels. This can, for instance, be accomplished by amplification of a co-integrated dhfr gene using methotrexate, for instance by placing dhfr on the same nucleic acid molecule as the multicistronic transcription unit of the invention, or by cotransfection when dhfr is on a separate DNA molecule.

[0181] In one aspect, provided is a method for producing a polypeptide of interest, the method comprising culturing a host cell, the host cell comprising a DNA molecule comprising a multicistronic expression unit or an expression cassette according to the invention, and expressing the polypeptide of interest from the coding sequence for the polypeptide of interest.

[0182] The host cell for this aspect is a eukaryotic host cell, preferably a mammalian cell, such as a CHO cell, further as described above.

[0183] Introduction of nucleic acid that is to be expressed in a cell, can be done by one of several methods, which as such are known to the person skilled in the art, also dependent on the format of the nucleic acid to be introduced. The methods include but are not limited to transfection, infection, injection, transformation, and the like. Suitable host cells that express the polypeptide of interest can be obtained by selection as described above.

[0184] In certain embodiments, selection agent is present in the culture medium at least part of the time during the culturing, either in sufficient concentrations to select for cells expressing the selectable marker polypeptide or in lower concentrations. In certain embodiments, selection agent is no longer present in the culture medium during the production phase when the polypeptide is expressed. In certain embodiments metabolic selection marker proteins such as trp, his, or dhfr, are used, and selection can be easily continued during the production phase by culturing in the suitable culture medium described supra.

[0185] Culturing a cell is done to enable it to metabolize, and/or grow and/or divide and/or produce recombinant proteins of interest. This can be accomplished by methods well known to persons skilled in the art, and includes but is not limited to providing nutrients for the cell. The methods comprise growth adhering to surfaces, growth in suspension, or combinations thereof. Culturing can be done for instance in dishes, roller bottles or in bioreactors, using batch, fed-batch, continuous systems such as perfusion systems, and the like. In order to achieve large scale (continuous) production of recombinant proteins through cell culture it is preferred in the art to have cells capable of growing in suspension, and it is preferred to have cells capable of being cultured in the absence of animal- or human-derived serum or animal- or human-derived serum components.

[0186] The conditions for growing or multiplying cells (see, e.g., Tissue Culture, Academic Press, Kruse and Paterson, editors (1973)) and the conditions for expression of the recombinant product are known to the person skilled in the art. In general, principles, protocols, and practical techniques for maximizing the productivity of mammalian cell cultures can be found in Mammalian Cell Biotechnology: a Practical Approach (M. Butler, ed., IRL Press, 1991).

[0187] In a preferred embodiment, the expressed protein is collected (isolated), either from the cells or from the culture medium or from both. It may then be further purified using known methods, e.g., filtration, column chromatography, etc, by methods generally known to the person skilled in the art.

[0188] The selection method according to the invention works in the absence of chromatin control elements, but improved results are obtained when the multicistronic expression units are provided with such elements. The selection method according to the invention works particularly well when an expression cassette according to the invention, comprising at least one anti-repressor sequence is used. Depending on the selection agent and conditions, the selection can in certain cases be made so stringent, that only very few or even no host cells survive the selection, unless anti-repressor sequences are present. Hence, the combination of the novel selection method and anti-repressor sequences provides a very attractive method to obtain only limited numbers of colonies with a greatly improved chance of high expression of the polypeptide of interest therein, while at the same time the obtained clones comprising the expression cassettes with anti-repressor sequences provide for stable expression of the polypeptide of interest, i.e., they are less prone to silencing or other mechanisms of lowering expression than conventional expression cassettes.

[0189] In certain embodiments, almost no clones are obtained when no anti-repressor sequence is present in the expression cassette according to the invention, providing for very stringent selection. The novel selection system disclosed herein therefore also provides the possibility to test parts of anti-repressor elements for functionality, by analyzing the effects of such sequences when present in expression cassettes of the invention under selection conditions. This easy screen, which provides an almost or even complete black and white difference in many cases, therefore can contribute to identifying functional parts or derivatives from anti-repressor sequences. When known anti-repressor sequences are tested, this assay can be used to characterize them further. When fragments of known anti-repressor sequences are tested, the assay will provide functional fragments of such known anti-repressor sequences.

[0190] The invention disclosed in the incorporated '953 application provides a multicistronic transcription unit having an alternative configuration compared to the configuration disclosed in the incorporated '525 application: in the alternative configuration of the '953 invention, the sequence coding for the polypeptide of interest is upstream of the sequence coding for the selectable marker polypeptide, and the selectable marker polypeptide is operably linked to a cap-independent translation initiation sequence, preferably an internal ribosome entry site (IRES). Such multicistronic transcription units as such were known (e.g., Rees et al., 1996, WO 03/106684), but had not been combined with a non-optimal start codon. According to the alternative of the '953 invention, the start codon (or the context thereof) of the selectable marker polypeptide is changed into a non-optimal start codon, to further decrease the translation initiation rate for the selectable marker. This therefore leads to a desired decreased level of expression of the selectable marker polypeptide, and can result in highly effective selection of host cells expressing high levels of the polypeptide of interest, as with the embodiments disclosed in the incorporated '525 application. One potential advantage of this alternative aspect of the '953 invention, compared to the embodiments outlined in the '525 application, is that the coding sequence of the selectable marker polypeptide needs no further modification of internal ATG sequences, because any internal ATG sequences therein can remain intact since they are no longer relevant for translation of further downstream polypeptides.
This may be especially advantageous if the coding sequence for the selectable marker polypeptide contains several internal ATG sequences, because the task of changing these and testing the resulting construct for functionality does not have to be performed for the invention: only mutation of the ATG start codon (or its context) suffices in this case. As will be understood by the person skilled in the art after reading the description, this aspect can further be advantageously combined with the embodiments outlined above for the multicistronic transcription units. For instance expression cassettes comprising the multicistronic transcription unit can further in certain embodiments comprise at least one chromatin control element. It is shown hereinbelow (example 19) that this alternative provided by the invention of the '953 application also leads to very good results.

[0191] In this alternative embodiment (disclosed first in the '953 application), the coding sequence for the polypeptide of interest comprises a stop codon, so that translation of the first cistron (encoding the polypeptide of interest) ends upstream of the IRES, which IRES is operably linked to the second cistron (encoding the selectable marker polypeptide). In these embodiments, the IRES is required for the translation of the selectable marker polypeptide.

[0192] As used herein, an "internal ribosome entry site" or "IRES" refers to an element that promotes direct internal ribosome entry to the initiation codon, such as normally an ATG, but in this invention preferably GTG or TTG, of a cistron (a protein encoding region), thereby leading to the cap-independent translation of the gene. See, e.g., Jackson R J, Howell M T, Kaminski A (1990) Trends Biochem Sci 15 (12): 477-83 and Jackson R J and Kaminski, A. (1995) RNA 1 (10): 985-1000. The invention encompasses the use of any IRES element, which is able to promote direct internal ribosome entry to the initiation codon of a cistron. "Under translational control of an IRES" (also referred to as "operably linked to an IRES") as used herein means that translation is associated with the IRES and proceeds in a cap-independent manner. As used herein, the term "IRES" encompasses functional variations of IRES sequences as long as the variation is able to promote direct internal ribosome entry to the initiation codon of a cistron. As used herein, "cistron" refers to a polynucleotide sequence, or gene, of a protein, polypeptide, or peptide of interest. "Operably linked" refers to a situation where the components described are in a relationship permitting them to function in their intended manner. Thus, for example, a promoter "operably linked" to a cistron is ligated in such a manner that expression of the cistron is achieved under conditions compatible with the promoter. Similarly, a nucleotide sequence of an IRES operably linked to a cistron is ligated in such a manner that translation of the cistron is achieved under conditions compatible with the IRES.

[0193] Internal ribosome binding site (IRES) elements are known from viral and mammalian genes (Martinez-Salas, 1999), and have also been identified in screens of small synthetic oligonucleotides (Venkatesan & Dasgupta, 2001). The IRES from the encephalomyocarditis virus has been analyzed in detail (Mizuguchi et al., 2000). An IRES is an element encoded in DNA that results in a structure in the transcribed RNA at which eukaryotic ribosomes can bind and initiate translation. An IRES permits two or more proteins to be produced from a single RNA molecule (the first protein is translated by ribosomes that bind the RNA at the cap structure of its 5' terminus, (Martinez-Salas, 1999)). Translation of proteins from IRES elements is less efficient than cap-dependent translation: the amount of protein from IRES-dependent open reading frames (ORFs) ranges from less than 20% to 50% of the amount from the first ORF (Mizuguchi et al., 2000). The reduced efficiency of IRES-dependent translation provides an advantage that is exploited by this embodiment of the current invention. Furthermore, mutation of IRES elements can attenuate their activity, and lower the expression from the IRES-dependent ORFs to below 10% of the first ORF (Lopez de Quinto & Martinez-Salas, 1998, Rees et al., 1996). The advantage exploited by the invention is as follows: when the IRES-dependent ORF encodes a selectable marker protein, its low relative level of translation means that high absolute levels of transcription must occur in order for the recombinant host cell to be selected. Therefore, selected recombinant host cell isolates will by necessity express high amounts of the transgene mRNA. Since the recombinant protein is translated from the cap-dependent ORF, it can be produced in abundance resulting in high product yields. 
On top of this, the non-optimal (i.e., non-ATG) start codon for the selectable marker polypeptide according to the invention, further improves the chances of obtaining a preferred host cell, i.e., a host cell expressing high levels of recombinant protein of interest.
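The reasoning of paragraph [0193] can be illustrated with a deliberately simple toy model. The sketch below assumes, purely for illustration, that the amount of selectable marker protein is proportional to the transcript level multiplied by the relative translation efficiency of the marker ORF, and that a cell survives selection once the marker protein reaches a fixed threshold. Neither the linear relationship nor the threshold value is taken from the description; the function name and the arbitrary units are hypothetical. Only the relative efficiencies (50%, 20% and 10% of the first ORF) echo the ranges cited above.

```python
def min_transcript_level(threshold: float, efficiency: float) -> float:
    """Transcript level needed for the marker protein to reach the survival threshold.

    Assumes (illustration only) marker_protein = transcript_level * efficiency,
    where efficiency is the translation efficiency relative to the first ORF.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return threshold / efficiency


# Arbitrary threshold of 1.0 unit of marker protein (hypothetical value).
THRESHOLD = 1.0

# Relative efficiencies echoing the ranges cited in the text.
scenarios = {
    "cap-dependent (reference)": 1.0,
    "IRES, upper range (50%)": 0.5,
    "IRES, lower range (20%)": 0.2,
    "attenuated IRES (10%)": 0.1,
}

required = {name: min_transcript_level(THRESHOLD, eff) for name, eff in scenarios.items()}

for name, level in required.items():
    print(f"{name}: transcript level >= {level:.1f} (arbitrary units)")
```

Under these assumptions, lowering the translation efficiency of the marker raises the transcript level a surviving cell must reach, which is the qualitative point made in the text.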

[0194] It is clear to a person skilled in the art that changes to the IRES can be made without altering the essence of the function of the IRES (hence, providing a protein translation initiation site with a reduced translation efficiency), resulting in a modified IRES. Use of a modified IRES which is still capable of providing a small percentage of translation (compared to a 5' cap translation) is therefore also included in this invention.

[0195] The practice of this invention will employ, unless otherwise indicated, conventional techniques of immunology, molecular biology, microbiology, cell biology, and recombinant DNA, which are within the skill of the art. See, e.g., Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual, 2nd edition, 1989; Current Protocols in Molecular Biology, Ausubel F M, et al., eds, 1987; the series Methods in Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach, MacPherson M J, Hames B D, Taylor G R, eds, 1995; Antibodies: A Laboratory Manual, Harlow and Lane, eds, 1988.

[0196] The invention is further described with the aid of the following illustrative Examples.

EXAMPLES

[0197] Examples 1-18 describe details of several embodiments of the incorporated '525 application. Example 19 describes the selection system with the multicistronic transcription unit of the invention, and it will be clear that the variations described in Examples 1-18 can also be applied and tested for the multicistronic transcription units of the present application.

Example 1

Construction and Testing of a Zeocin-Resistance Gene Product with No Internal Methionine

[0198] The basic idea behind the development of the novel selection system of the incorporated '525 application is to place the resistance gene upstream of a gene of interest, with one promoter driving the expression of this bicistronic mRNA. The translation of the bicistronic mRNA is such that the resistance gene is translated into protein in only a small percentage of translation events, and that most of the time the downstream gene of interest is translated into protein. Hence the translation efficiency of the upstream resistance gene is severely hampered in comparison to the translation efficiency of the downstream gene of interest. To achieve this, three steps can be taken according to the invention of the '525 application:

[0199] 1) within the resistance gene on the mRNA, the searching ribosome preferably should not meet another AUG, since any downstream AUG may serve as translation start codon, resulting in a lower translation efficiency of the second, downstream gene of interest. Hence, preferably any AUG in the resistance gene mRNA will have to be replaced. In case this AUG is a functional codon that encodes a methionine, this amino acid will have to be replaced by a different amino acid, for instance by a leucine (FIGS. 1A and B);

[0200] 2) the start codon of the resistance gene must have a bad context (be part of a non-optimal translation start sequence); i.e., the ribosomes must start translation at this start codon only in a limited number of events, and hence in most events continue to search for a better, more optimal start codon (FIG. 1C-E). Three different stringencies can be distinguished: a) the normal ATG start codon, but placed in a bad context (TTTATGT) (called ATGmut) (FIG. 1C), b) preferably when placed in an optimal context, GTG can serve as start codon (ACCGTGG) (FIG. 1D) and c) preferably when placed in an optimal context, TTG can serve as start codon (ACCTTGG) (FIG. 1E). The most stringent translation condition is the TTG codon, followed by the GTG codon (FIG. 1). The Zeo mRNA with a TTG as start codon is expected to produce the least Zeocin-resistance protein and will hence convey the lowest functional ZEOCIN® resistance to cells (FIGS. 1, 2).

[0201] 3) preferably, the normal start codon (ATG) of the downstream gene of interest should have an optimal translation context (e.g., ACCATGG) (FIG. 2A-D). This ensures that, after steps 1 and 2 have been taken, in most events the start codon of the gene of interest will function as start codon of the bicistronic mRNA.
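Steps 1 and 2 above lend themselves to a small sequence check. The following is a minimal sketch, not part of the described method: it classifies a 7-nucleotide translation start window into the stringency classes named in the text, and lists every ATG triplet downstream of the start codon of a marker ORF. The rule used for an "optimal" ATG context (a purine at position -3 and a G at position +4) is the commonly cited Kozak rule and is an assumption here, as are the function names and the toy ORF, which is not a real resistance gene.

```python
# Stringency order as described in the text, from least to most stringent.
STRINGENCY_ORDER = ["ATG (optimal)", "ATGmut", "GTG", "TTG"]


def classify_start(context: str) -> str:
    """Classify a 7-nt window: 3 nt upstream, the start codon, 1 nt downstream."""
    context = context.upper()
    if len(context) != 7:
        raise ValueError("expected a 7-nucleotide window")
    codon = context[3:6]
    if codon == "ATG":
        # Assumed rule: purine at -3 and G at +4 counts as an optimal context.
        optimal = context[0] in "AG" and context[6] == "G"
        return "ATG (optimal)" if optimal else "ATGmut"
    if codon in ("GTG", "TTG"):
        return codon
    return "no start codon"


def find_atgs(orf: str) -> list[tuple[int, int]]:
    """Return (0-based position, reading frame) of each ATG after the start codon."""
    orf = orf.upper()
    return [(i, i % 3) for i in range(3, len(orf) - 2) if orf[i:i + 3] == "ATG"]


# Windows quoted in the text for the three stringency classes and the optimal ATG.
for window in ("ACCATGG", "TTTATGT", "ACCGTGG", "ACCTTGG"):
    print(window, "->", classify_start(window))

# Toy ORF (hypothetical): TTG start codon, one in-frame and one out-of-frame ATG.
toy_orf = "TTGGCCATGACCGAGCATGGTTGA"
print(find_atgs(toy_orf))
```

The scan reports ATG triplets in all three reading frames, since step 1 concerns any AUG a scanning ribosome could meet; the frame value indicates whether a replacement also changes an encoded methionine (frame 0) or has to be designed around the neighbouring codons.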

[0202] In this example, step 1 is performed, that is, in the Zeocin-resistance gene one existing internal methionine is replaced by another amino acid (FIG. 1B-E). It is important that after such a change the Zeo protein still confers ZEOCIN® resistance to the transfected cells. Since it is not known beforehand which amino acid will fulfill this criterion, three different amino acids have been tried: leucine, threonine and valine. The different constructs with distinct amino acids have then been tested for their ability to still confer ZEOCIN® resistance to the transfected cells.

Materials and Methods

Construction of the Plasmids

[0203] The original Zeo open reading frame has the following sequence around the start codon: AAACCATGGCC (start codon ATG; SEQ ID NO:67). This is a start codon with an optimal translational context (FIG. 1A). First the optimal context of the start codon of the Zeo open reading frame was changed through amplification from plasmid pCMV-zeo (Invitrogen; V50120), with primer pair ZEOforwardMUT (SEQ ID NO:68): GATCTCGCGATACAGGATTTTTGGCCAAGTTGACCAGTGCCGTTCCG and ZEO-WTreverse (WT=Wild type; SEQ ID NO:69): AGGCGAATTCAGTCCTGCTCCTCGGC. The amplified product was cut with NruI-EcoRI, and ligated into pcDNA3, resulting in pZEOATGmut.

[0204] The original Zeo open reading frame contains an in frame ATG, encoding methionine at amino acid position 94 (out of 124). This internal ATG, encoding the methionine at position 94 was changed in such a way that the methionine was changed into leucine, threonine or valine respectively:

[0205] 1) To replace the internal codon for methionine in the Zeo open reading frame with the codon for leucine (FIG. 1B), part of the Zeo open reading frame was amplified using primer pair ZEOforwardMUT (SEQ ID NO:68) and ZEO-LEUreverse (SEQ ID NO:70): AGGCCCCGCCCCCACGGCTGCTCGCCGATCTCGGTCAAGGCCGGC. The PCR product was cut with BamHI-BglI and ligated into pZEOATGmut. This resulted in pZEO(leu). 2) To replace the internal codon for methionine in the Zeo open reading frame with the codon for threonine (not shown, but as in FIG. 1B), part of the Zeo open reading frame was amplified using primer pair ZEOforwardMUT (SEQ ID NO:68) and ZEO-THRreverse (SEQ ID NO:71): AGGCCCCGCCCCCACGGCTGCTCGCCGATCTCGGTGGTGGCCGGC. The PCR product was cut with BamHI-BglI and ligated into pZEOATGmut. This resulted in pZEO(thr). 3) To replace the internal codon for methionine in the Zeo open reading frame with the codon for valine (GTG) (not shown, but as in FIG. 1B), part of the Zeo open reading frame was amplified using primer pair ZEOforwardMUT (SEQ ID NO:68) and ZEO-VALreverse (SEQ ID NO:72): AGGCCCCGCCCCCACGGCTGCTCGCCGATCTCGGTCCACGCCGG. The PCR product was cut with BamHI-BglI and ligated into pZEOATGmut. This resulted in pZEO(val).
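The three substitutions described above amount to swapping one codon at a known amino acid position. A minimal in silico sketch of that operation follows; it is an illustration and not the cloning procedure itself, which was carried out by PCR with the mutagenic reverse primers. The function name, the toy ORF, and the leucine and threonine codons chosen here are hypothetical (the text names only GTG for valine), and the position is assumed to be counted with the start codon as amino acid 1.

```python
def replace_codon(orf: str, aa_position: int, new_codon: str) -> str:
    """Swap the codon at a 1-based amino acid position (start codon = position 1)."""
    if len(new_codon) != 3:
        raise ValueError("a codon has exactly three nucleotides")
    start = (aa_position - 1) * 3  # 0-based index of the first nucleotide of the codon
    if aa_position < 1 or start + 3 > len(orf):
        raise ValueError("position lies outside the ORF")
    return orf[:start] + new_codon.upper() + orf[start + 3:]


# Hypothetical codon choices; only GTG (valine) is named in the text.
REPLACEMENTS = {"leu": "CTG", "thr": "ACC", "val": "GTG"}

# Toy ORF (not the Zeo sequence): Met codons at amino acid positions 1 and 4.
toy_orf = "ATGGCCAAGATGACCTGA"

variants = {name: replace_codon(toy_orf, 4, codon) for name, codon in REPLACEMENTS.items()}

for name, seq in variants.items():
    print(name, seq)

# For a methionine at amino acid position 94, the codon would start at
# 0-based nucleotide index (94 - 1) * 3 = 279 under the counting assumed above.
print((94 - 1) * 3)
```

In each toy variant the internal ATG is gone while the rest of the reading frame is unchanged, which mirrors the intent of step 1.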

Transfection and Culturing of Cells

[0206] The Chinese Hamster Ovary cell line CHO-K1 (ATCC CCL-61) was cultured in HAMS-F12 medium+10% Fetal Calf Serum containing 2 mM glutamine, 100 U/ml penicillin, and 100 micrograms/ml streptomycin at 37° C./5% CO2. Cells were transfected with the plasmids using Lipofectamine 2000 (Invitrogen) as described by the manufacturer. Briefly, cells were seeded to culture vessels and grown overnight to 70-90% confluence. Lipofectamine reagent was combined with plasmid DNA at a ratio of 6 microliters per microgram (e.g., for a 10 cm Petri dish, 20 micrograms DNA and 120 microliters Lipofectamine) and added to the cells. After overnight incubation the transfection mixture was replaced with fresh medium, and the transfected cells were incubated further. After overnight cultivation, cells were trypsinized and seeded into fresh culture vessels with fresh medium containing ZEOCIN® (100 µg/ml). When individual colonies became visible (approximately ten days after transfection) colonies were counted.

Results

[0207] Four plasmids were transfected into CHO-K1 cells: 1) pZEO(WT), 2) pZEO(leu), 3) pZEO(thr), and 4) pZEO(val). The cells were selected on 100 µg/ml ZEOCIN®. Transfection of pZEO(leu) resulted in an equal number of zeocin-resistant colonies in comparison with the control pZEO(WT). pZEO(thr) and pZEO(val) gave fewer colonies, but the differences were less than an order of magnitude. Hence it was concluded that changes of the internal methionine into leucine, threonine or valine all resulted in a Zeocin-resistance protein that is still able to confer ZEOCIN® resistance to the transfected cells. Rather arbitrarily, pZEO(leu) was chosen as starting point for creating different start codons on the Zeo open reading frame. Hence in the examples below the start as well as internal methionines are always replaced by leucine, for zeocin, but also for other selectable marker genes, as will be clear from further examples.

Example 2

Creation and Testing of Zeocin-d2EGFP Bicistronic Constructs with Differential Translation Efficiencies

[0208] To create a bicistronic mRNA encompassing a mutated Zeocin-resistance mRNA with less translational efficiency, and the d2EGFP gene as downstream gene of interest, the start codon of the d2EGFP gene was first optimized (step 3 in Example 1). After that, the different versions of the Zeocin-resistance gene were created. The differences between these versions are that they have different start codons, with distinct translational efficiency (step 2 in Example 1, FIG. 1C-E). These different ZEOCIN®-resistance gene versions were cloned upstream of the modified d2EGFP gene (FIG. 2).

Materials and Methods

Creation of Plasmids

[0209] The d2EGFP reporter ORF was introduced into pcDNA3. The sequence around the start codon of this d2EGFP cDNA is GAATTCGG (start codon in bold; SEQ ID NO:73), which is not optimal. As a first step, d2EGFP was amplified from pd2EGFP (Clontech 6010-1) with primers d2EGFPforwardBamHI (SEQ ID NO:74): GATCGGATCCTATGAGGAATTCGCCACCGTGAGCAAGGGCGAGGAG and d2EGFPreverseNotI (SEQ ID NO:75): AAGGAAAAAAGCGGCCGCCTACACATTGATCCTAGCAGAAG. This product now contains a start codon with an optimal translational context (ACCATGG). This created pd2EGFP and subsequently, the Zeo open reading frame was ligated into pd2EGFP, resulting in pZEO-d2EGFP. It is pointed out here that the optimization of the translational start sequence of the gene of interest (here: EGFP as a model gene) is not essential but preferred in order to skew the translation initiation frequency towards the gene of interest still further.

[0210] Now three classes of constructs were made:

[0211] 1) ATG as a start codon in the Zeo resistance gene, but in a bad context (TTTATGT) (not shown, but as in FIG. 2B) and followed by a spacer sequence, instead of the optimal ATG (FIG. 2A). The spacer sequence is placed downstream of the ATG sequence. In the ZEOCIN® (and possibly in the blasticidin) RNA, a secondary structure is present, causing the ribosome to be temporarily delayed. Because of this, a poor start codon can in some cases be used by the ribosome, despite being a bad start codon or being in a non-optimal context for translation initiation. This causes the chance of translation to increase, and in case of the current invention therefore renders the stringency for selection lower. To decrease this effect, and hence to further decrease the translation initiation efficiency, a spacer sequence is introduced that does not contain a secondary structure (Kozak, 1990). Hence, the term "space" is introduced, and used in the plasmid and primer names to indicate the presence of such a spacer sequence. The spacer removes the "ribosome delaying sequence" from the neighbourhood of the initiation codon, therewith causing the ribosome to start translating less frequently, and hence increasing the stringency of the selection according to the invention. The spacer introduces some extra amino acids in the coding sequence. This has been done in some cases for both ZEOCIN® and for blasticidin, as will be apparent from the examples.
The nomenclature of the plasmids and primers in general in the following is along these lines: the name of the selectable marker polypeptide is referred to by abbreviation (e.g., Zeo, Blas, etc); the start codon is mentioned (e.g., ATG, GTG, TTG); when this start codon is placed in a non-optimal context for translation initiation, the addition "mut" is used (this is usually only done for ATG start codons, as combining a non-optimal context with a non-ATG start codon usually does not result in sufficient translation initiation to allow for selection); when a spacer sequence is used behind the start codon, the addition "space" is used (this is done usually for "ATGmut" start codons for Zeo or Blas selectable markers). The Zeo open reading frame was amplified with primer pair ZEOforwardBamHI-ATGmut/space (SEQ ID NO:77): GATCGGATCCTTGGTTTTCGATCCAAAGACTGCCAAATCTAGATCCGAGATTTTCAGGAGCTAAGGAAGCTAAAGCCAAGTTGACCAGTGAAGTTC (wherein the sequence following the underlined sequence comprises the spacer sequence), and ZEOWTreverse (SEQ ID NO:69); the PCR product was cut with EcoRI-BamHI, and ligated into pd2EGFP, cut with EcoRI-BamHI, creating pZEO-ATGmut/space-d2EGFP.

[0212] 2) GTG as a start codon in the Zeo resistance gene, instead of ATG (FIG. 2C). The Zeo open reading frame was amplified with primer pair ZEOforwardBamHI-GTG (SEQ ID NO:78): GATCGGATCCACCGCCAAGTTGACCAGTGCCGTTC and ZEOWTreverse (SEQ ID NO:69); the PCR product was cut with EcoRI-BamHI, and ligated into pd2EGFP, cut with EcoRI-BamHI, creating pZEO-GTG-d2EGFP.

[0213] 3) TTG as a start codon in the Zeo resistance gene, instead of ATG (FIG. 2D). The Zeo open reading frame was amplified with primer pair ZEOforwardBamHI-TTG: GATCGGATCCACCGCCAAGTTGACCAGTGCCGTTC (SEQ ID NO:79) and ZEOWTreverse (SEQ ID NO:69), the PCR product was cut with EcoRI-BamHI, and ligated into pd2EGFP, cut with EcoRI-BamHI, creating pZEO-TTG-d2EGFP.

Transfection, Culturing and Analysis of CHO Cells

[0214] The Chinese Hamster Ovary cell line CHO-K1 (ATCC CCL-61) was cultured in HAMS-F12 medium+10% Fetal Calf Serum containing 2 mM glutamine, 100 U/ml penicillin, and 100 micrograms/ml streptomycin at 37° C./5% CO2. Cells were transfected with the plasmids using Lipofectamine 2000 (Invitrogen) as described by the manufacturer. Briefly, cells were seeded to culture vessels and grown overnight to 70-90% confluence. Lipofectamine reagent was combined with plasmid DNA at a ratio of 15 microliters per 3 microgram (e.g., for a 10 cm Petri dish, 20 micrograms DNA and 120 microliters Lipofectamine) and added after 30 minutes incubation at 25° C. to the cells. After overnight incubation the transfection mixture was replaced with fresh medium, and the transfected cells were incubated further. After overnight cultivation, cells were trypsinized and seeded into fresh culture vessels with fresh medium. After another overnight incubation, ZEOCIN® was added to a concentration of 50 µg/ml and the cells were cultured further. After another three days the medium was replaced by fresh medium containing ZEOCIN® (100 µg/ml) and cultured further. When individual colonies became visible (approximately ten days after transfection) medium was removed and replaced with fresh medium without zeocin. Individual clones were isolated and transferred to 24-well plates in medium without zeocin. One day after isolation of the colonies, ZEOCIN® was added to the medium. Expression of the d2EGFP reporter gene was assessed approximately 3 weeks after transfection. d2EGFP expression levels in the colonies were measured after periods of two weeks.

Results

[0215] CHO-K1 cells were transfected with constructs that contain the ATGmut/space Zeo (FIG. 2B), GTG Zeo (FIG. 2C) and TTG Zeo (FIG. 2D) genes as selection gene, all being cloned upstream of the d2EGFP reporter gene. These three constructs were without STAR elements (Control) or with STAR elements 7 and 67 upstream of the CMV promoter and STAR 7 downstream from the d2EGFP gene (FIG. 3). FIG. 3 shows that both the control (without STAR elements) constructs with ATGmut/space Zeo (A) and GTG Zeo (B) gave colonies that expressed d2EGFP protein. The average d2EGFP expression level of 24 ATGmut/space Zeo colonies was 46 and of GTG Zeo colonies was 75. This higher average expression level in GTG Zeo colonies may reflect the higher stringency of GTG, in comparison with ATGmut/space (Example 1). Addition of STAR elements 7 and 67 to the constructs resulted in colonies that had higher average d2EGFP expression levels. Transfection of the ATGmut/space Zeo STAR 7/67/7 construct resulted in colonies with an average d2EGFP expression level of 118, which is a factor 2.6 higher than the average in the control cells (46). Addition of STAR elements to the GTG Zeo construct resulted in an average d2EGFP expression level of 99, which is a factor 1.3 higher than the average in the control cells (75).

[0216] Importantly, no colonies were established when the TTG Zeo construct was transfected. However, the construct with TTG Zeo, flanked with STARs 7 and 67, resulted in the establishment of 6 colonies, with an average d2EGFP expression level of 576 (FIG. 3C). Thus the highest translation stringency, brought about by the TTG start codon (FIG. 1), yields the highest d2EGFP expression levels, as predicted in FIG. 2. The results also indicate that the stringency of the TTG Zeo alone (without STAR elements) is at least in some experiments too high for colonies to survive. However, in later independent experiments (see below), some colonies were found with this construct without STAR elements, indicating that the stringency of the selection system with the TTG start codon in the ZEOCIN® selection marker does not necessarily preclude the finding of colonies when no STAR elements are present, and that the number of colonies obtained may vary between experiments.

[0217] It is concluded that the use of STAR elements in combination with the stringent selection system according to the invention allows high producers of the gene of interest to be readily identified.

Example 3

Establishment of a Higher Number of TTG Zeo STAR Colonies and Comparison with an IRES-Zeo Construct

[0218] The results in Example 2 indicate that the TTG Zeo has an extremely stringent translation efficiency, which might be too high to convey ZEOCIN.RTM. resistance to the cells. The transfection was scaled up to test whether there would be some colonies that have such high expression levels that they survive. Scaling up the experiment could also address the question whether the high average of TTG Zeo STAR 7/67/7 would become higher when more colonies were analyzed.

Materials and Methods

[0219] CHO-K1 cells were transfected with the constructs that have the TTG Zeo gene as selection marker, with and without STAR elements 7 and 67 (FIG. 4). Transfections, selection, culturing, etc. were as in Example 2, except that 6 times more cells, DNA and Lipofectamine 2000 were used. Transfections and selection were done in Petri dishes.

Results

[0220] FIG. 4A shows that transfection with the TTG Zeo STAR 7/67/7 construct resulted in the generation of many colonies with an average d2EGFP signal of 560. This is as high as in Example 2, except that now 58 colonies were analyzed. For comparison, a construct with the Zeocin-resistance gene placed behind an IRES sequence (FIG. 4B) gave an average d2EGFP expression level of 61, and when STAR elements 7 and 67 were added to such a construct, the average d2EGFP expression level was 125, a factor 2 above the control (FIG. 4B). The average of the TTG Zeo STAR 7/67/7 colonies was therefore a factor 9.2 higher than the STAR-less IRES-Zeo colonies and a factor 4.5 higher than the STAR7/67/7 IRES Zeo colonies.

[0221] An observation is that the form of the curve of all expressing colonies differs between the TTG Zeo STAR7/67/7 and IRES-Zeo STAR 7/67/7. In the first case (TTG Zeo) the curve levels off, whereas in the second case (IRES-Zeo) the curve has a more "exponential" shape. The plateau in the TTG Zeo curve could indicate that the cells have reached a maximum d2EGFP expression level, above which the d2EGFP expression levels become toxic and the cells die. However, it later appeared that the high values were close to the maximum value that could be detected with the settings of the detector of the FACS analyser. In later experiments, the settings of the FACS analyser were changed to allow for detection of higher values, and indeed in some instances higher values than obtained here were measured in later independent experiments (see below).

[0222] Due to up-scaling of the transfections three colonies with the STAR-less TTG Zeo construct could be picked. The d2EGFP expression levels of these colonies were 475, 158 and 43. The last colony died soon after the first measurement. This result indicates that the TTG Zeo construct can convey ZEOCIN.RTM. resistance, resulting in colonies that also can give high expression levels in some instances. Hence, the novel selection method according to the invention can be applied with expression cassettes that do not contain chromatin control elements, although it is clearly preferred to use expression cassettes comprising at least one such element, preferably a STAR element.

[0223] The results indicate that STAR elements allow a more stringent selection system according to the invention, such as exemplified in this example, resulting in the picking of colonies that have a very high average protein expression level.

Example 4

Creation and Testing of Blasticidin-d2EGFP Bicistronic Constructs with Differential Translation Efficiencies

[0224] There are four internal ATGs in the blasticidine resistance gene, none of which codes for a methionine (FIG. 14A). These ATGs have to be eliminated though (FIG. 14B), since they will serve as start codon when the ATG start codon (or the context thereof) has been modified, and this will result in peptides that do not resemble blasticidine resistance protein. More importantly, these ATGs will prevent efficient translation of the gene of interest, as represented by d2EGFP in this example for purposes of illustration. To eliminate the internal ATGs, the blasticidine resistance protein open reading frame was first amplified with 4 primer pairs, generating 4 blasticidine resistance protein fragments. The primer pairs were:

TABLE-US-00001
A) BSDBamHIforward (SEQ ID NO: 80): GATCGGATCCACCATGGCCAAGCCTTTGTCTCAAG
   BSD150reverse (SEQ ID NO: 81): GTAAAATGATATACGTTGACACCAG
B) BSD150forward (SEQ ID NO: 82): CTGGTGTCAACGTATATCATTTTAC
   BSD250reverse (SEQ ID NO: 83): GCCCTGTTCTCGTTTCCGATCGCG
C) BSD250forward (SEQ ID NO: 84): CGCGATCGGAAACGAGAACAGGGC
   BSD350reverse (SEQ ID NO: 85): GCCGTCGGCTGTCCGTCACTGTCC
D) BSD350forward (SEQ ID NO: 86): GGACAGTGACGGACAGCCGACGGC
   BSD399reverse (SEQ ID NO: 87): GATCGAATTCTTAGCCCTCCCACACGTAACCAGAGGGC

[0225] Fragments A to D were isolated from an agarose gel and mixed together. Next, only primers BSDBamHIforward and BSD399reverse were used to create the full length blasticidine resistance protein cDNA, but with all internal ATGs replaced. The reconstituted blasticidine was then cut with EcoRI-BamHI, and cloned into pZEO-GTG-d2EGFP, cut with EcoRI-BamHI (which releases Zeo), resulting in pBSDmut-d2EGFP. The entire blasticidine resistance protein open reading frame was sequenced to verify that all ATGs were replaced.
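Reassembly of fragments A to D with only the outer primers relies on adjacent fragments sharing overlapping ends. As an illustrative check (the helper names are assumptions introduced here; the primer sequences are those listed in the table above), the following sketch tests whether each internal reverse primer is the exact reverse complement of the forward primer of the next fragment, and whether the sense-strand junction sequences are free of ATG:

```python
# Internal junction primers from TABLE-US-00001 (SEQ ID NOs: 81-86).
# Helper names are illustrative assumptions, not taken from the application.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


junction_primers = {
    # junction: (forward primer of next fragment, reverse primer of previous fragment)
    "BSD150": ("CTGGTGTCAACGTATATCATTTTAC", "GTAAAATGATATACGTTGACACCAG"),
    "BSD250": ("CGCGATCGGAAACGAGAACAGGGC", "GCCCTGTTCTCGTTTCCGATCGCG"),
    "BSD350": ("GGACAGTGACGGACAGCCGACGGC", "GCCGTCGGCTGTCCGTCACTGTCC"),
}

# True where the two primers at a junction anneal to each other end to end.
overlaps = {
    name: reverse_complement(rev) == fwd
    for name, (fwd, rev) in junction_primers.items()
}

# True where the sense-strand junction sequence contains no ATG.
atg_free = {name: "ATG" not in fwd for name, (fwd, _) in junction_primers.items()}

print(overlaps)
print(atg_free)
```

With the sequences as listed, all three junctions are fully complementary, which is consistent with the fragments being joined in a single amplification with BSDBamHIforward and BSD399reverse.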

[0226] With this mutated gene encoding blasticidine resistance protein (Blas), three classes of constructs were made (FIG. 14C-E):

[0227] 1) ATG as a start codon, but in a bad context and followed by spacer sequence. The mutated blasticidine resistance protein open reading frame in pBSD-d2EGFP was amplified using primers BSDforwardBamHIAvrII-ATGmut/space (SEQ ID NO:88): GATCGGATCCTAGGTTGGTTTTCGATCCAAAGACTGCCAAATCTAGATCCGAGATTTTCAGGAGCTAAGGAAGCTAAAGCCAAGCCTTTGTCTCAAGAAG,

[0228] and BSD399reverseEcoRIAvrII (SEQ ID NO:89): GATCGAATTCCCTAGGTTAGCCCTCCCACACGTAACCAGAGGGC, the PCR product was cut with BamHI-EcoRI, and ligated into pZEO-GTG-d2EGFP, cut with EcoRI-BamHI. This results in pBSD-ATGmut/space-d2EGFP.

[0229] 2) GTG as a start codon instead of ATG. The mutated blasticidine resistance protein open reading frame in pBSD-d2EGFP was amplified using primers BSDforwardBamHIAvrII-GTG (SEQ ID NO:90): GATCGGATCCTAGGACCGCCAAGCCTTTGTCTCAAGAAG and BSD399reverseEcoRIAvrII (SEQ ID NO:89), the PCR product was cut with BamHI-EcoRI, and ligated into pZEO-GTG-d2EGFP, cut with EcoRI-BamHI. This results in pBSD-GTG-d2EGFP.

[0230] 3) TTG as a start codon instead of ATG. The mutated blasticidine open reading frame in pBSD-d2EGFP was amplified using primers BSDforwardBamHIAvrII-TTG (SEQ ID NO:91): GATCGGATCCTAGGACCGCCAAGCCTTTGTCTCAAGAAG and BSD399reverseEcoRIAvrII (SEQ ID NO:89), the PCR product was cut with BamHI-EcoRI, and ligated into pZEO-GTG-d2EGFP, cut with EcoRI-BamHI. This results in pBSD-TTG-d2EGFP.

Results

[0231] CHO-K1 cells were transfected with constructs that contain the GTG Blas (FIG. 5A) and TTG Blas (FIG. 5B) genes as selection gene, all being cloned upstream of the d2EGFP reporter gene. Selection took place in the presence of 20 .mu.g/ml Blasticidine. The two constructs were without STAR elements (Control) or with STAR elements 7 and 67 upstream of the CMV promoter and STAR7 downstream from the d2EGFP gene (FIG. 5). FIG. 5 shows that both the control (without STAR elements) constructs with GTG Blas (A) and TTG Blas (B) gave colonies that expressed d2EGFP protein. The average d2EGFP signal of 24 GTG Blas colonies was 14.0 (FIG. 5A) and of TTG Blas colonies was 81 (FIG. 5B). This higher average expression level in TTG Blas colonies may reflect the higher stringency of TTG, in comparison with GTG (see also Example 2). However, only 8 colonies survived under the more stringent TTG conditions.

[0232] Addition of STAR elements 7 and 67 to the constructs resulted in colonies that had higher average d2EGFP expression levels. Transfection of the GTG Blas STAR 7/67/7 construct resulted in colonies with an average d2EGFP expression level of 97.2 (FIG. 5A), which is a factor 6.9 higher than the average in the control cells (14.0). Addition of STAR elements to the TTG Blas construct resulted in an average d2EGFP signal of 234.2 (FIG. 5B), which is a factor 2.9 higher than the average in the control cells (81). However, note again that only 8 colonies survived the harsh selection conditions of TTG Blas, whereas 48 colonies survived with TTG Blas STAR 7/67/7. When only the five highest values are compared, the average of the five highest TTG Blas was 109.1 and the average of the five highest TTG Blas STAR 7/67/7 was 561.2, which is a factor 5.1 higher.

[0233] The results indicate that STAR elements allow a more stringent selection system, resulting in the picking of colonies that have a very high average protein expression level. They also show that this selection is not restricted to the Zeocin-resistance protein alone, but that also other selection marker polypeptides, in this case the blasticidine resistance protein, can be used.

Example 5

Stability of d2EGFP Expression in the Novel Selection System

[0234] Colonies described in Example 3 were further cultured under several conditions to assess the stability of d2EGFP expression over an extended time period.

Results

[0235] The TTG Zeo STAR 7/67/7 containing colonies in FIG. 4A were cultured for an additional 70 days in the presence of 100 .mu.g/ml Zeocin. As shown in FIG. 6, the average d2EGFP signal rose from 560.2 after 35 days to 677.2 after 105 days. Except for some rare colonies all colonies had a higher d2EGFP expression level.

[0236] When the level of ZEOCIN.RTM. was lowered to 20 .mu.g/ml ZEOCIN.RTM., there was still an increase in the average d2EGFP expression level, from 560.2 after 35 days to 604.5 after 105 days (FIG. 7).

[0237] When no selection pressure was present at all due to removal of the ZEOCIN.RTM. from the culture medium, approximately 50% of the colonies became mosaic, that is, within one colony non-d2EGFP expressing cells became apparent. This resulted in lowering of d2EGFP expression levels to less than 50% of the original levels. If the signal fell below 67% of the original signal (a decrease of at least one-third), the colony was considered to be unstable with respect to d2EGFP expression. Of the 57 original colonies 27 colonies remained stable according to this criterion; the average d2EGFP signal of these colonies after 35 days (while still under selection pressure) was 425.6, whereas the average d2EGFP signal without selection pressure after 65 days was 290.0. When measured after 105 days, the average signal in the 27 colonies was 300.9. Hence, after an initial decrease, the expression levels in the 27 colonies remained stable according to this criterion (FIG. 8).
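The stability criterion above is a simple threshold on the ratio of later to original signal. A minimal sketch follows, assuming the 67% cut-off as stated; the function name `is_stable` is hypothetical, and the check is applied here to the reported averages of the 27 stable colonies rather than to individual colony measurements, which are not listed in the text:

```python
# Stability criterion from paragraph [0237]: a colony counts as unstable when
# its signal falls below 67% of the original signal. Names are hypothetical.
STABILITY_THRESHOLD = 0.67


def is_stable(original: float, later: float,
              threshold: float = STABILITY_THRESHOLD) -> bool:
    """Return True if the later signal is at least `threshold` of the original."""
    return later >= threshold * original


# Reported averages for the 27 stable colonies (not per-colony data).
average_day_35 = 425.6   # under selection pressure
average_day_65 = 290.0   # without selection pressure
average_day_105 = 300.9  # without selection pressure

for label, value in (("day 65", average_day_65), ("day 105", average_day_105)):
    fraction = value / average_day_35
    print(f"{label}: {fraction:.0%} of original, stable={is_stable(average_day_35, value)}")
```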

[0238] Six of the colonies were subjected to one round of sub-cloning. Cells were sown in 96-wells plates such that each well contained on average approximately 0.3 cells. No ZEOCIN.RTM. was present in the medium so that from the start the sub clones grew without selection pressure. Of each original colony six sub clones were randomly isolated and grown in 6-wells plates till analysis. In FIG. 12 we compared the original values of the original clones, as already shown in FIG. 4A, with one of the sub clones. In one of the six clones (clone 25), no sub clone was present with d2EGFP signal in the range of the original clone. However, in five out of six cases at least one of the sub clones had equal d2EGFP expression levels as the parent clone. These expression levels were determined after 50 days without selection pressure. We conclude that one round of sub cloning is sufficient to obtain a high number of colonies that remain stable for high expression in the absence of selection pressure. This has been confirmed in a similar experiment (not shown).
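Seeding at approximately 0.3 cells per well is a limiting-dilution approach. Under the assumption, which is not stated in the text, that the number of cells landing in a well follows a Poisson distribution with mean 0.3, the fraction of wells expected to be empty, to hold one cell, or to hold more than one cell can be sketched as:

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k cells in a well for a Poisson-distributed count."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Seeding density from paragraph [0238]; the Poisson model is an assumption.
mean_cells_per_well = 0.3

p_empty = poisson_pmf(0, mean_cells_per_well)
p_single = poisson_pmf(1, mean_cells_per_well)
p_multiple = 1.0 - p_empty - p_single

# Among wells that received at least one cell, the fraction started by one cell.
p_clonal_given_growth = p_single / (1.0 - p_empty)

print(f"empty wells:        {p_empty:.3f}")
print(f"single-cell wells:  {p_single:.3f}")
print(f"multi-cell wells:   {p_multiple:.3f}")
print(f"clonal, given growth: {p_clonal_given_growth:.3f}")
```

Under this model most wells remain empty, and the large majority of wells that do show growth derive from a single cell, which is the usual rationale for choosing a seeding density well below one cell per well.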

[0239] We compared the number of copies that integrated in the TTG Zeo STAR 7/67/7 colonies. DNA was isolated when colonies had been under ZEOCIN.RTM. selection pressure for 105 days (see FIG. 6). As shown in FIG. 13, two populations could be distinguished. In FIG. 13 the cut-off was made at 20 copies and the R.sup.2 value is calculated and shown. The R.sup.2 value from data with more than 20 copies is also shown. In the range from 100 to 800 d2EGFP signal there was a high degree of copy number dependency, as signified by a relatively high R.sup.2 of 0.5685 (FIG. 13). However, in the population of colonies that fluctuate around a d2EGFP signal of 800 a high variation in copy number was observed (FIG. 13), as signified by a low R.sup.2 of 0.0328. Together the data show that in the novel selection system, in colonies that contain TTG Zeo STAR 7/67/7 constructs, there is copy number dependent d2EGFP expression up to .about.20 copies. Also, although copy number dependency is lost when >20 copies are present, a substantial proportion of the colonies with high (>800) d2EGFP signal still have no more than 30 copies (FIG. 13). This combination of high d2EGFP expression and a relatively low copy number (between 10 and 30) may be important for identifying colonies that remain relatively stable without selection pressure. It is an advantage to have clones with relatively low copy numbers (less than about 30, more preferably less than about 20) that give high expression levels, because such clones are believed to be less prone to genetic instability. The present selection system allows such clones to be generated, including from CHO cells.
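The R.sup.2 values above summarize how well d2EGFP signal tracks copy number on either side of the 20-copy cut-off. The following is a sketch of such an analysis, assuming a simple least-squares line fit (the fitting method used for FIG. 13 is not stated); the colony data in the example are invented placeholders, not the measurements underlying FIG. 13:

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Coefficient of determination of a least-squares line fit of y on x.

    For a straight-line fit this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical (copy number, d2EGFP signal) pairs, for illustration only.
colonies = [(3, 150), (6, 260), (9, 400), (14, 560), (18, 720),
            (25, 810), (40, 790), (60, 805), (90, 820)]
CUTOFF = 20  # copy-number cut-off used in FIG. 13

low = [(c, s) for c, s in colonies if c <= CUTOFF]
high = [(c, s) for c, s in colonies if c > CUTOFF]

r2_low = r_squared([c for c, _ in low], [s for _, s in low])
r2_high = r_squared([c for c, _ in high], [s for _, s in high])

print(f"R^2 up to {CUTOFF} copies:   {r2_low:.3f}")
print(f"R^2 above {CUTOFF} copies:   {r2_high:.3f}")
```

A high value in the low-copy group together with a low value in the high-copy group would correspond to the pattern described in the text, namely copy number dependent expression that levels off above roughly 20 copies.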

Example 6

Creation and Testing of Zeocin-Blasticidin-EpCAM Bicistronic Constructs with Differential Translation Efficiencies

[0240] To test the selection system on the production of an antibody, the anti-EpCAM antibody (see also Example 5 of the incorporated '525 application and of WO2006/005718) was taken as example.

Results

[0241] A plasmid was created on which both the heavy chain (HC) and light chain (LC) were placed, each in a separate transcription unit (FIG. 9-11). Expression of both chains was driven by the CMV promoter. Upstream of the EpCAM heavy chain the Zeocin-resistance gene was placed, either with the ATGmut/space (FIG. 9), GTG (FIG. 10) or TTG (FIG. 11) as start codon (see Example 2). Upstream of the EpCAM light chain the Blasticidine resistance gene was placed, either with the ATGmut/space (FIG. 9), GTG (FIG. 10) or TTG (FIG. 11) as start codon (see Example 4). Two types of constructs were made, one construct without STAR elements (Control) and one construct with a combination of STAR 7 and 67 elements. The STAR elements were placed as follows: upstream of each CMV promoter (i.e., one for the transcription unit comprising HC and one for the transcription unit comprising LC) STAR 67 was placed and the resulting construct was flanked with a 5' and 3' STAR 7 element (FIGS. 9-11). All constructs were transfected to CHO-K1 cells and selected on 100 .mu.g/ml ZEOCIN.RTM. and 20 .mu.g/ml Blasticidin (at the same time). After selection, independent colonies were isolated and propagated under continuous selection pressure (using 100 .mu.g/ml ZEOCIN.RTM. and 20 .mu.g/ml blasticidin). FIG. 9 shows that the STAR 7/67/7 combination had a beneficial effect on EpCAM production. The ATGmut/space Zeo and ATGmut/space Blas had no effect on the number of colonies that were formed with plasmids containing STAR elements or not. However, the average EpCAM expression levels of either 24 control versus STAR 7/67/7 colonies ranged from 0.61 pg/cell/day in the control to 3.44 pg/cell/day in the STAR7/67/7 construct (FIG. 9). This is a factor 5.6 increase. Since there were many colonies in the ATGmut/space control with 0 pg/cell/day, also the average EpCAM production in the highest five colonies was compared. In the control ATGmut/space this was 3.0 pg/cell/day, versus 7.8 pg/cell/day with the ATGmut/space STAR 7/67/7 construct, an increase of a factor 2.6.

[0242] FIG. 10 also shows that the STAR 7/67/7 combination had a beneficial effect on EpCAM production, using the GTG start codon for the markers. With the GTG Zeo and GTG Blas STAR 7/67/7 construct approximately 2 times more colonies were formed. Also, the average EpCAM expression levels of either 24 control versus STAR 7/67/7 colonies ranged from 2.44 pg/cell/day in the control to 6.51 pg/cell/day in the STAR7/67/7 construct (FIG. 10). This is a factor 2.7 increase. Also the average EpCAM production in the highest five colonies was compared. In the control GTG this was 5.7 pg/cell/day, versus 13.0 pg/cell/day with the GTG STAR 7/67/7 construct, an increase of a factor 2.3. Also note that the average EpCAM production mediated by the GTG start codon for the selection markers was significantly higher than with the ATGmut/space start codon.

[0243] FIG. 11 shows that with the TTG Zeo and TTG Blas control construct no colonies were formed, similar to Example 2. With the STAR 7/67/7 TTG construct colonies were formed. The average EpCAM expression level of the STAR 7/67/7 TTG colonies was 10.4 pg/cell/day (FIG. 11). This is again higher than with the ATGmut/space and GTG as start codon (see FIGS. 9, 10 for comparison). The average EpCAM production in the highest five TTG STAR 7/67/7 colonies was 22.5 pg/cell/day.

[0244] The results show that the selection system can also be applied to two simultaneously produced polypeptides, in this case two polypeptides of a multimeric protein, casu quo an antibody. The EpCAM production closely follows the results obtained with d2EGFP. The TTG as start codon is more stringent than the GTG start codon, which in turn is more stringent than the ATGmut/space (FIGS. 1 and 2). Higher stringency results in a decreasing number of colonies, with no colonies in the case of the TTG control that has no STAR elements, and higher stringency of the selection marker is coupled to higher expression of the protein of interest.

Example 7

Creation and Testing of Additional GTG Zeocin-d2EGFP Bicistronic Constructs with Differential Translation Efficiencies

[0245] Different versions of the Zeocin-resistance gene with mutated start codons were described in Example 1. Besides the described GTG codons (Example 1, FIG. 22A), additional modified start codons with distinct translational efficiency are possible. These different Zeocin-resistance gene versions were created (FIG. 22) and cloned upstream of the modified d2EGFP gene, as in Example 2.

Materials and Methods

Creation of Plasmids

[0246] Four additional GTG constructs were made:

[0247] 1) GTG as a start codon in the Zeo resistance gene (FIG. 22A), but followed by a spacer sequence (FIG. 22B). The mutspace-Zeo open reading frame was amplified with primer pair GTGspaceBamHIF (SEQ ID NO:106): GAATTCGGATCCACCGTGGCGATCCAAAGACTGCCAAATCTAG (wherein the sequence following the underlined sequence comprises the spacer sequence), and ZEOWTreverse (SEQ ID NO:69), the PCR product was cut with EcoRI-BamHI, and ligated into pd2EGFP, cut with EcoRI-BamHI, creating pZEO-GTGspace-d2EGFP.

[0248] 2) GTG as a start codon in the Zeo resistance gene, but in a bad context () (FIG. 22C). The Zeo open reading frame was amplified with primer pair ZEOTTTGTGBamHIF (SEQ ID NO:107): GAATTCGGATCCTTTGTGGCCAAGTTGACCAGTGCCGTTCCG and ZEOWTreverse (SEQ ID NO:69), the PCR product was cut with EcoRI-BamHI, and ligated into pd2EGFP, cut with EcoRI-BamHI, creating pZEO(leu)-TTTGTG-d2EGFP.

[0249] 3) GTG as a start codon in the Zeo resistance gene, instead of ATG (FIG. 22A), but with an additional mutation in the Zeo open reading frame at Pro9, which was replaced with threonine (Thr) (FIG. 22D). The Thr9 mutation was introduced by amplifying the Zeo open reading frame with primer pair ZEOForwardGTG-Thr9 (SEQ ID NO:108): AATTGGATCCACCGTGGCCAAGTTGACCAGTGCCGTTGTGCTC and ZEOWTreverse (SEQ ID NO:69), the PCR product was cut with EcoRI-BamHI, and ligated into pd2EGFP, cut with EcoRI-BamHI, creating pZEO-GTG-Thr9-d2EGFP.

[0250] 4) GTG as a start codon in the Zeo resistance gene, instead of ATG (FIG. 22A), but with an additional mutation in the Zeo open reading frame at Pro9, which was replaced with phenylalanine (Phe) (FIG. 22E). The Phe9 mutation was introduced by amplifying the Zeo open reading frame with primer pair ZEOForwardGTG-Phe9 (SEQ ID NO:109): AATTGGATCCACCGTGGCCAAGTTGACCAGTGCCGTTGTGCTC and ZEOWTreverse (SEQ ID NO:69), the PCR product was cut with EcoRI-BamHI, and ligated into pd2EGFP, cut with EcoRI-BamHI, creating pZEO-GTG-Phe9-d2EGFP.

Transfection, Culturing and Analysis of CHO Cells

[0251] Transfection, culturing and analysis of CHO-K1 cells was performed as in Example 1.

Results

[0252] CHO-K1 cells were transfected with constructs that contain the GTG Zeo (FIG. 22A), GTGspace Zeo (FIG. 22B), TTT GTG Zeo (also called: GTGmut Zeo) (FIG. 22C), GTG Thr9 Zeo(leu) (FIG. 22D) and GTG Phe9 Zeo(leu) (FIG. 22E) genes as selection gene, all being cloned upstream of the d2EGFP reporter gene. These five constructs were without STAR elements (Control) or with STAR elements 7 and 67 upstream of the CMV promoter and STAR 7 downstream from the d2EGFP gene (FIG. 22). FIG. 23 shows that of the control constructs without STAR elements only the GTG Zeo construct gave colonies that expressed d2EGFP protein. In contrast, all constructs containing STAR elements gave colonies that expressed d2EGFP protein. The mean d2EGFP fluorescence signal of 11 GTG Zeo Control colonies was 20.3, of 13 GTG Zeo colonies with STARs 7/67/7 104.9, of 24 GTG space Zeo 7/67/7 colonies 201.5, of 6 TTT GTG Zeo 7/67/7 colonies 310.5, of 22 GTG Thr9 Zeo 7/67/7 colonies 423, and of 16 GTG Phe9 Zeo colonies 550.2 (FIG. 23).

[0253] The higher stringencies of the novel GTG mutations correlate with higher mean fluorescence signals (FIG. 23). The TTT GTG Zeo 7/67/7, however, gave only two high expressing colonies and a few low expressing colonies. This may indicate that this mutation is at the brink of the stringency that these cells can bear with a fixed concentration of ZEOCIN.RTM. added to the culture medium.

[0254] The Thr9 and Phe9 mutations do not influence the translation efficiency of the Zeo mutants. Instead they reduce the functionality of the Zeocin-resistance protein, by preventing an optimal interaction between the two halves of the ZEOCIN.RTM.-resistance protein (Dumas et al., 1994). This implies that more of the protein has to be produced to achieve resistance against the ZEOCIN.RTM. in the culture medium. As a consequence, the entire cassette has to be transcribed at a higher level, eventually resulting in a higher d2EGFP expression level.

[0255] It is concluded that the use of the described translation efficiencies of the Zeocin-resistance mRNA results in higher expression levels of the d2EGFP protein, in combination with STAR elements.

[0256] This example further demonstrates the possibility to provide for fine-tuning of the stringency of the selection system of the invention, to achieve optimal expression levels of a protein of interest. Clearly, the person skilled in the art will be capable of combining these and other possibilities within the concepts disclosed herein (e.g., mutate the ZEOCIN.RTM. at position 9 to other amino acids, or mutate it in other positions; use a GTG or other start codon in a non-optimal translation initiation context for ZEOCIN.RTM. or other selection markers; or mutate other selection markers to reduce their functionality, for instance use a sequence coding for a neomycin resistance gene having a mutation at amino acid residue 182 or 261 or both, see, e.g., WO 01/32901), and the like, to provide for such fine-tuning, and by simply testing determine a suitable combination of features for the selection marker, leading to enhanced expression of the polypeptide of interest.

Example 8

Creation and Testing of Additional TTG Zeocin-d2EGFP Bicistronic Constructs with Differential Translation Efficiencies

[0257] Different versions of the Zeocin-resistance gene with mutated start codons were described in Example 1. Besides the described TTG codons (FIG. 24A) additional modified start codons with distinct translational efficiency are possible. These different Zeocin-resistance gene versions were created and cloned upstream of the modified d2EGFP gene (FIG. 24).

Materials and Methods

Creation of Plasmids

[0258] Three additional TTG constructs were made:

[0259] 1) TTG as a start codon in the Zeo resistance gene (FIG. 24A), but followed by a spacer sequence (FIG. 24B). The Zeo open reading frame (with the spacer sequence) was amplified with primer pair TTGspaceBamHIF (SEQ ID NO:110): GAATTCGGATCCACCTTGGCGATCCAAAGACTGCCAAATCTAG and ZEOWTreverse (SEQ ID NO:69), the PCR product was cut with EcoRI-BamHI, and ligated into pd2EGFP, cut with EcoRI-BamHI, creating pZEO-TTGspace-d2EGFP.

[0260] 2) TTG as a start codon in the Zeo resistance gene, instead of ATG (FIG. 24A), but with an additional mutation in the Zeo open reading frame at Pro9, which was replaced with threonine (Thr) (FIG. 24C). The Thr9 mutation was introduced by amplifying the Zeo open reading frame with primer pair ZEOForwardTTG-Thr9 (SEQ ID NO:111): AATTGGATCCACCTTGGCCAAGTTGACCAGTGCCGTTGTGCTC and ZEOWTreverse (SEQ ID NO:69), the PCR product was cut with EcoRI-BamHI, and ligated into pd2EGFP, cut with EcoRI-BamHI, creating pZEO-TTG-Thr9-d2EGFP.

[0261] 3) TTG as a start codon in the Zeo resistance gene, instead of ATG (FIG. 24A), but with an additional mutation in the Zeo open reading frame at Pro9, which was replaced with phenylalanine (Phe) (FIG. 24D). The Phe9 mutation was introduced by amplifying the Zeo open reading frame with primer pair ZEOForwardTTG-Phe9 (SEQ ID NO:112): AATTGGATCCACCTTGGCCAAGTTGACCAGTGCCGTTGTGCTC and ZEOWTreverse (SEQ ID NO:69), the PCR product was cut with EcoRI-BamHI, and ligated into pd2EGFP, cut with EcoRI-BamHI, creating pZEO-TTG-Phe9-d2EGFP.

Results

[0262] CHO-K1 cells were transfected with constructs that contain the TTG Zeo (FIG. 24A), TTGspace Zeo (FIG. 24B), TTG Thr9 Zeo (FIG. 24C) and TTG Phe9 Zeo (FIG. 24D) genes as selection gene, all being cloned upstream of the d2EGFP reporter gene. These four constructs were without STAR elements (Control) or with STAR elements 7 and 67 upstream of the CMV promoter and STAR 7 downstream from the d2EGFP gene (FIG. 24). FIG. 25 shows that of the control constructs without STAR elements only the TTG Zeo construct without STAR elements gave colonies that expressed d2EGFP protein. In contrast, all constructs containing STAR elements gave colonies that expressed d2EGFP protein. The mean d2EGFP fluorescence signal of 3 TTG Zeo Control colonies was 26.8, of 24 TTG Zeo colonies with STARs 7/67/7 426.8, of 24 TTGspace Zeo 7/67/7 colonies 595.7, of 2 TTG Thr9 Zeo 7/67/7 colonies 712.1, and of 3 TTG Phe9 Zeo colonies 677.1 (FIG. 25).

[0263] The higher stringencies of the novel TTG mutations correlate with higher mean fluorescence signals (FIG. 25). The TTG Thr9 Zeo 7/67/7 and TTG Phe9 Zeo 7/67/7 constructs, however, gave only two high expressing colonies each and a few low expressing colonies. This may indicate that these mutations are at the brink of the stringency that the cells can bear with a fixed concentration of ZEOCIN.RTM. added to the culture medium.

[0264] It is concluded that the use of the described translation efficiencies of the Zeocin-resistance mRNA results in higher expression levels of the d2EGFP protein, in combination with STAR elements.

Example 9

Creation and Testing of Puromycin-d2EGFP Bicistronic Constructs with Differential Translation Efficiencies

[0265] There are three internal ATGs in the puromycin resistance gene, each of which codes for a methionine (FIG. 17, FIG. 26A). These ATGs have to be eliminated (FIG. 26B,C), since they will serve as start codon when the ATG start codon (or the context thereof) has been modified, and this will result in peptides that do not resemble puromycin resistance protein. More importantly, these ATGs will prevent efficient translation of the gene of interest, as represented by d2EGFP in this example for purposes of illustration. The methionines were changed into leucine, as in the zeocin-resistance protein (Example 1). However, instead of using the TTG codon for leucine (for instance in ZEOCIN.RTM. in Example 1), now the CTG codon for leucine was chosen (in humans, for leucine the CTG codon is used more often than the TTG codon). To eliminate the internal ATGs, the puromycin resistance protein open reading frame was first amplified with 2 primer pairs, generating 2 puromycin resistance protein fragments. The primer pairs were: PURO BamHI F (SEQ ID NO:113): GATCGGATCCATGGTTACCGAGTACAAGCCCACGGT, PURO300 R LEU (SEQ ID NO:114): CAGCCGGGAACCGCTCAACTCGGCCAGGCGCGGGC; and PURO300FLEU (SEQ ID NO:115): CGAGTTGAGCGGTTCCCGGCTGGCCGCGCAGCAACAGCTGGAAGGCCTC, PURO600RLEU (SEQ ID NO:116): AAGCTTGAATTCAGGCACCGGGCTTGCGGGTCAGGCACCAGGTC.

[0266] This generates two PCR products, corresponding to the 5' and 3' part of the puromycin resistance gene. The two products were added together and amplified with PURO BamHI F (SEQ ID NO:113)-PURO600RLEU (SEQ ID NO:116). The resulting PCR product was cut with BamHI-EcoRI and ligated, creating pCMV-ATGPURO (leu). Sequencing of this clone verified that all three internal ATGs had been converted. The entire puromycin open reading frame was then amplified with PUROBamHI TTG1F (SEQ ID NO:117): GAATTCGGATCCACCTTGGTTACCGAGTACAAGCCCACGGTG and PURO600RLEU (SEQ ID NO:116). This primer introduces an extra codon (GTT) directly after the TTG start codon, because the "G" at nucleotide +4 is introduced for an optimal context, and hence two more nucleotides are introduced to preserve the reading frame.
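The codon change described in this example (internal in-frame ATG, coding for Met, replaced by CTG, coding for Leu) can be illustrated in silico. The sketch below is not the PCR-based procedure used here; the function name and the toy open reading frame are hypothetical and are not the puromycin resistance sequence:

```python
def replace_internal_met_codons(orf: str, replacement: str = "CTG") -> str:
    """Replace every in-frame ATG codon after the start codon with `replacement`.

    Only codons in the reading frame of the ORF are considered; ATG triplets
    in the other two frames are left untouched by this sketch.
    """
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    mutated = [codons[0]] + [replacement if c == "ATG" else c for c in codons[1:]]
    return "".join(mutated)


# Hypothetical toy ORF: Met-Ala-Met-Lys-Leu-Met-stop.
toy_orf = "ATGGCCATGAAACTGATGTGA"
mutated_orf = replace_internal_met_codons(toy_orf)

print(toy_orf)
print(mutated_orf)
```

In the toy sequence the two internal Met codons become Leu codons while the first codon is left unchanged; a modified start codon such as TTG would be introduced separately, as is done above with the PUROBamHI TTG1F primer.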

Results

[0267] CHO-K1 cells were transfected with the construct that contains the TTG Puro (FIG. 27) gene as selection gene, cloned upstream of the d2EGFP reporter gene. Selection was under 10 .mu.g/ml puromycin. The construct was without STAR elements (Control) or with STAR elements 7 and 67 upstream of the CMV promoter and STAR 7 downstream from the d2EGFP gene (FIG. 27). FIG. 27 shows that the average d2EGFP fluorescence signal of 24 TTG Puro Control colonies was 37.9, of 24 TTG Puro colonies with STARs 7/67/7 75.5. Moreover, when the average of the five highest values is taken, the d2EGFP fluorescence signal of TTG Puro Control colonies was 69.5, and of TTG Puro colonies with STARs 7/67/7 186.1, an almost three-fold increase in d2EGFP fluorescence signal. This shows that the described, modified translation efficiency of the Puromycin resistance mRNA results in higher expression levels of the d2EGFP protein, in combination with STAR elements.

[0268] This experiment demonstrates that the puromycin resistance gene can be mutated to remove the ATG sequences therefrom, while remaining functional. Moreover it is concluded that the selection method of the invention also works with yet another selection marker, puromycin.

Example 10

Creation and Testing of Neomycin Constructs with Differential Translation Efficiencies

[0269] There are sixteen internal ATGs in the neomycin resistance gene, five of which code for a methionine in the neomycin open reading frame (FIG. 20, FIG. 28A). All these sixteen ATGs have to be eliminated (FIG. 28B,C), since they will serve as start codon when the ATG start codon (or the context thereof) has been modified, and this will result in peptides that do not resemble neomycin resistance protein, and this will decrease the translation from the downstream open reading frame coding for the polypeptide of interest in the transcription units of the invention. To eliminate the internal ATGs, the neomycin resistance protein open reading frame was entirely synthesized by a commercial provider (GeneArt, Germany), wherein all internal coding ATGs (for Met) were replaced by CTGs (coding for Leu), and non-coding ATGs were replaced such that a degenerated codon was used and hence no mutations in the protein sequence resulted; the synthesised sequence of the neomycin is given in SEQ ID NO:118. In order to replace the ATG start codon with GTG (FIG. 28B) or TTG (FIG. 28C), the synthesized neomycin gene was amplified with primer pairs NEO-F-HindIII (SEQ ID NO:120): GATCAAGCTTTTGGATCGGCCATTGAAACAAGACGGATTG and NEO EcoRI 800R (SEQ ID NO:121): AAGCTTGAATTCTCAGAAGAACTCGTCAAGAAGGCG.
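The distinction drawn above between coding ATGs (in frame, specifying Met) and non-coding ATGs (in one of the other two frames of the coding strand) can be made explicit with a simple scan. This is a minimal sketch: the function name is hypothetical and the toy sequence is invented for illustration, not taken from SEQ ID NO:118:

```python
def internal_atgs(orf: str) -> list[tuple[int, bool]]:
    """List internal ATG triplets in the coding strand of an ORF.

    Returns (position, in_frame) pairs, where position is the 0-based index of
    the A and in_frame is True when the ATG is a codon in the reading frame of
    the ORF (coding for Met). The start codon at position 0 is skipped.
    """
    hits = []
    pos = orf.find("ATG", 1)
    while pos != -1:
        hits.append((pos, pos % 3 == 0))
        pos = orf.find("ATG", pos + 1)
    return hits


# Hypothetical toy ORF: ATG GAT GCC ATG AAT GTA TAA
toy_orf = "ATGGATGCCATGAATGTATAA"
hits = internal_atgs(toy_orf)

coding = [p for p, in_frame in hits if in_frame]
non_coding = [p for p, in_frame in hits if not in_frame]
print(f"coding ATGs at {coding}, non-coding ATGs at {non_coding}")
```

In-frame hits would be candidates for the Met to Leu change, whereas out-of-frame hits would need a synonymous codon substitution in the overlapping codons so that the protein sequence is unchanged, as described for the synthesized neomycin sequence.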

Results

[0270] E. coli bacteria were used to test the functionality of the neomycin resistance protein from which all ATGs were removed. E. coli bacteria were transformed with the constructs that contain the GTG Neo (FIG. 28B) or TTG Neo (FIG. 28C) gene as selection gene. Selection took place by growing the bacteria on kanamycin. Only a functional neomycin resistance gene can give resistance against kanamycin. Transformation with either modified Neo gene resulted in the formation of E. coli colonies, from which the plasmid containing the gene could be isolated. This shows that the described, modified translation efficiencies of the Neomycin resistance mRNAs, as well as the removal of all ATGs from the Neo open reading frame result in the production of functional neomycin resistance protein.

[0271] The mutated neomycin resistance genes are incorporated in a multicistronic transcription unit of the invention, and used for selection with G418 or neomycin in eukaryotic host cells.

Example 11

Creation and Testing of dhfr Constructs with Differential Translation Efficiencies

[0272] There are eight internal ATGs in the dhfr gene, six of which code for a methionine in the dhfr open reading frame (FIG. 18, FIG. 29A). All these ATGs have to be eliminated (FIGS. 29B,C), since they will serve as start codon when the ATG start codon (or the context thereof) has been modified, and this will result in peptides that do not resemble dhfr protein, and will decrease the translation from the downstream open reading frame coding for the polypeptide of interest in the transcription units of the invention. To eliminate the internal ATGs, the dhfr protein open reading frame was entirely synthesized (SEQ ID NO:122), as described above for neomycin. In order to replace the ATG start codon with GTG (FIG. 29B) or TTG (FIG. 29C), the synthesized DHFR gene was amplified with primers DHFR-F-HindIII (SEQ ID NO:124): GATCAAGCTTTTGTTCGACCATTGAACTGCATCGTC and DHFR-EcoRI-600-R (SEQ ID NO:125): AGCTTGAATTCTTAGTCTTTCTTCTCGTAGACTTC.
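As a non-authoritative illustration, the layout of the primers listed in Examples 10 and 11 can be inspected with a few lines of Python. The reading applied here is an assumption: the codon directly following the HindIII site in a forward primer is taken as the new start codon, and the codon directly following the EcoRI site in a reverse primer is taken as the reverse complement of the stop codon. The neomycin primer sequences are used with the spaces removed, on the assumption that those spaces are line-break artifacts. The helper names are hypothetical.

```python
# Recognition sequences of the two restriction enzymes named in the primer labels.
HINDIII = "AAGCTT"
ECORI = "GAATTC"
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Primer sequences as listed in paragraphs [0269] and [0272].
PRIMERS = {
    "NEO-F-HindIII": "GATCAAGCTTTTGGATCGGCCATTGAAACAAGACGGATTG",
    "NEO EcoRI 800R": "AAGCTTGAATTCTCAGAAGAACTCGTCAAGAAGGCG",
    "DHFR-F-HindIII": "GATCAAGCTTTTGTTCGACCATTGAACTGCATCGTC",
    "DHFR-EcoRI-600-R": "AGCTTGAATTCTTAGTCTTTCTTCTCGTAGACTTC",
}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def codon_after_site(primer, site):
    """Return the three nucleotides that directly follow the first copy of site."""
    start = primer.find(site)
    if start == -1:
        raise ValueError("restriction site not found in primer")
    end = start + len(site)
    return primer[end:end + 3]


# Forward primers: codon after the HindIII site (read as the new start codon).
forward_starts = {
    name: codon_after_site(seq, HINDIII)
    for name, seq in PRIMERS.items() if "-F-" in name
}

# Reverse primers: codon after the EcoRI site, converted to the coding strand.
reverse_stops = {
    name: reverse_complement(codon_after_site(seq, ECORI))
    for name, seq in PRIMERS.items() if "-F-" not in name
}
```

Under this reading, both listed forward primers carry a TTG directly after the HindIII site, and both reverse primers encode a stop codon directly before the EcoRI site on the coding strand.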

Results

[0273] E. coli bacteria were used to test the functionality of the dhfr protein from which all ATGs were removed. E. coli was transformed with the constructs that contain the GTG dhfr (FIG. 29B) or TTG dhfr (FIG. 29C) gene. Selection took place by growing the bacteria on trimethoprim (Sigma T7883-56). Only a functional dhfr gene can give resistance against trimethoprim. Transformation with either modified dhfr gene resulted in the formation of E. coli colonies, from which the plasmid containing the gene could be isolated. This shows that the described, modified translation efficiencies of the dhfr mRNAs, as well as the removal of all ATGs from the dhfr open reading frame, result in the production of functional dhfr protein.

[0274] The mutated dhfr genes are incorporated in a multicistronic transcription unit of the invention, and used for selection with methotrexate in eukaryotic host cells.

Example 12

Testing of Zeocin- and Blasticidin Constructs with Differential Translation Efficiencies in PER.C6.RTM. Cells

[0275] Various ZEOCIN.RTM. and blasticidin genes with mutated start codons, all cloned upstream of the d2EGFP gene were tested in the PER.C6.RTM. cell line.

Results

[0276] The GTG ZEOCIN.RTM. and GTGspace Zeocin-resistance gene modifications (see also Example 7; FIG. 30) and the GTG blasticidin and TTG blasticidin resistance gene modifications (see also Example 4; FIG. 31), all cloned upstream of the d2EGFP gene, were transfected to PER.C6.RTM. cells. As shown in FIG. 30, transfection with both the GTG ZEOCIN.RTM. and GTGspace ZEOCIN.RTM. gene resulted in colonies that expressed d2EGFP. The average d2EGFP fluorescence signal of 20 GTG Zeo colonies was 63.8, while the average d2EGFP signal of 20 GTGspace Zeo colonies was 185, demonstrating that also in PER.C6.RTM. cells the GTGspace Zeo has a higher translation stringency than the GTG Zeo mRNA.

[0277] As shown in FIG. 31, transfection with both the GTG Blasticidin and TTG Blasticidin gene resulted in colonies that expressed d2EGFP. The average d2EGFP fluorescence signal of 20 GTG Blasticidin colonies was 71.4, while the average d2EGFP fluorescence signal of 20 TTG Blasticidin colonies was 135, demonstrating that also in PER.C6.RTM. cells the TTG Blasticidin has a higher translation stringency than the GTG Blasticidin mRNA.

[0278] This example demonstrates that the selection system of the invention can also be used in other cells than CHO cells.

Example 13

Testing of the Addition of a Transcriptional Pause Signal to a TTG Zeocin-d2EGFP Construct

[0279] A TRAnscription Pause (TRAP) sequence is thought to, at least in part, prevent formation of antisense RNA or to, at least in part, prevent transcription from entering the protein expression unit (see WO 2004/055215). A TRAP sequence is functionally defined as a sequence which, when placed into a transcription unit, results in a reduced level of transcription in the nucleic acid present on the 3' side of the TRAP when compared to the level of transcription observed in the nucleic acid on the 5' side of the TRAP, and non-limiting examples of TRAP sequences are transcription termination signals. In order to prevent or decrease transcription from entering the transcription unit, the TRAP is to be placed upstream of a promoter driving expression of the transcription unit and the TRAP should be in a 5' to 3' direction. In order to prevent at least in part formation of antisense RNA, the TRAP should be located downstream of the open reading frame in a transcription unit and present in a 3' to 5' direction (that is, in an orientation opposite to the normal orientation of a transcriptional termination sequence that is usually present behind the open reading frame in a transcription unit). A combination of a TRAP upstream of the promoter in a 5' to 3' orientation and a TRAP downstream of the open reading frame in a 3' to 5' orientation is preferred. Adding a TRAP sequence to a STAR element improves the effects of STAR elements on transgene expression (see WO 2004/055215). Here we test the effects of the TRAP sequence in the context of the TTG Zeo resistance gene.

Results

[0280] The TTG Zeocin-d2EGFP cassette that was flanked with STAR7 elements (FIG. 32) was modified by the addition of the SPA/pause TRAP sequence (see WO 2004/055215; SEQ ID NO:126), both upstream of the 5' STAR7 (in 5' to 3' direction) and downstream of the 3' STAR7 (in 3' to 5' direction) (FIG. 32). Both STAR 7/7 and TRAP-STAR 7/7-TRAP containing vectors were transfected to CHO-K1. Stable colonies were isolated and the d2EGFP fluorescence intensities were measured. As shown in FIG. 43 the average d2EGFP fluorescence signal of 23 TTG Zeo STAR 7/7 colonies was 455.1, while the average d2EGFP fluorescence signal of 23 TTG Zeo TRAP-STAR 7/7-TRAP colonies was 642.3. The average d2EGFP fluorescence signal of the 5 highest TTG Zeo STAR 7/7 colonies was 705.1, while the average d2EGFP fluorescence signal of the 5 highest TTG Zeo TRAP-STAR 7/7-TRAP colonies was 784.7.

[0281] This result indicates that the addition of TRAPs does not enhance the d2EGFP fluorescence signal in the highest colonies, but that there is a significant rise in the number of high expressing colonies. Whereas only 5 TTG Zeo STAR 7/7 colonies had a d2EGFP fluorescence signal above 600, 17 TTG Zeo TRAP-STAR 7/7-TRAP colonies had a d2EGFP fluorescence signal above 600.
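The three summary figures used throughout these examples (the average over all colonies, the average of the five highest colonies, and the number of colonies above a cut-off) can be sketched as follows. The function name and the input values are hypothetical and serve only to show the arithmetic; they are not measurements from FIG. 32.

```python
from statistics import mean


def summarize_colonies(signals, top_n=5, threshold=600.0):
    """Summarize per-colony fluorescence signals.

    Returns the overall mean, the mean of the top_n highest colonies and the
    number of colonies whose signal lies strictly above threshold.
    """
    ranked = sorted(signals, reverse=True)
    return {
        "mean": mean(signals),
        "top_mean": mean(ranked[:top_n]),
        "n_above": sum(1 for s in signals if s > threshold),
    }


# Hypothetical signals for seven colonies (illustration only).
example = [700, 650, 610, 500, 400, 300, 200]
summary = summarize_colonies(example)
```

Reporting the top-five average next to the overall average separates two effects: a shift in the best colonies and a shift in how many colonies reach a high level.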

[0282] In the experiment 3 .mu.g DNA of each plasmid was transfected. However, whereas the transfection efficiency was similar, the total number of colonies with the TTG Zeo STAR 7/7 plasmid was 62, while the total number of colonies with the TTG Zeo TRAP-STAR 7/7-TRAP plasmid was 116, almost a doubling.

[0283] We conclude that addition of TRAP elements to the STAR containing plasmids with modified Zeocin-resistance gene translation codons results in a significantly higher overall number of colonies and that more colonies are present with the highest expression levels.

Example 14

Copy-Number Dependency of Expression

[0284] We analyzed the EpCAM antibody expression levels in relation to the number of integrated EpCAM DNA copies.

Results

[0285] The construct that was tested was TTG-Zeo-Light Chain (LC)-TTG-Blas-Heavy Chain (HC), both expression units being under the control of the CMV promoter (see FIG. 33). This construct contained STAR 7 and 67 (see FIG. 33). Selection conditions were such that with 200 .mu.g/ml ZEOCIN.RTM. and 20 .mu.g/ml Blasticidin in the culture medium no control colonies (no STARs) survived and only STAR 7/67/7 colonies survived.

[0286] DNA was isolated when colonies were 60 days under ZEOCIN.RTM. and Blasticidin selection pressure (see FIG. 33). The R.sup.2 value is calculated and shown. In the entire range from 5 to 40 pg/cell/day EpCAM there was a high degree of copy number dependency, as signified by a relatively high R.sup.2 of 0.5978 (FIG. 33). The data show that in the novel selection system, in colonies that contain TTG Zeo-TTG Blas EpCAM STAR 7/67/7 constructs there is copy number dependent EpCAM expression.
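For orientation, an R.sup.2 value of the kind reported above can be computed as the squared Pearson correlation between copy number and expression level. The text does not state how the value was calculated, so a simple linear least-squares fit is assumed here. The function name and the example numbers are hypothetical and are not data from FIG. 33.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear fit of y on x.

    For a one-variable least-squares line this equals the squared Pearson
    correlation coefficient.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical copy numbers and expression levels (illustration only).
copies = [1, 2, 3, 4]
expression = [1, 3, 2, 4]
value = r_squared(copies, expression)
```

A value of 1 would mean expression is fully explained by a straight-line dependence on copy number; a value near 0 would mean no linear relation.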

Example 15

Methotrexate Induction of Higher EpCAM Expression

[0287] We analyzed EpCAM antibody expression levels after incubation of clones with methotrexate (MTX). The purpose of this experiment was to determine whether amplification of a STAR-containing construct would result in higher EpCAM expression. MTX acts through inhibition of the dhfr gene product. While some CHO strains that are dhfr-deficient have been described, CHO-K1 is dhfr.sup.+. Therefore relatively high concentrations of MTX in the culture medium have to be present to select for amplification by increased MTX concentrations in CHO-K1 cells.

Results

[0288] The construct that was tested was TTG-Zeo-Heavy Chain (HC)-TTG-Blas-Light Chain (LC), both expression units being under the control of the CMV promoter. Upstream of each CMV promoter STAR67 was positioned and STAR7 was used to flank the entire cassette (see also Example 6, FIG. 11 for such a construct). This construct was further modified by placing an SV40-dhfr cassette (a mouse dhfr gene under control of an SV40 promoter) between the HC and LC cassettes, upstream of the second STAR67 (FIG. 34). CHO-K1 cells were transfected. Selection was done with 100 .mu.g/ml ZEOCIN.RTM. and 10 .mu.g/ml Blasticidin in the culture medium. No control colonies (without STAR elements) survived and only colonies with constructs containing the STAR elements survived. Colonies were isolated and propagated before measuring EpCAM expression levels. Six colonies that produced between 20 and 35 pg/cell/day were transferred to medium containing 100 nM MTX. This concentration was raised to 500 nM, 1000 nM and finally to 2000 nM, with two-week periods in between each step. After two weeks on 2000 nM MTX, EpCAM concentrations were measured. As shown in FIG. 34, four colonies showed enhanced EpCAM production. Colony 13: from 22 to 30; colony 14: from 28 to 42; colony 17: from 20 to 67 and colony 19: from 37 to 67 pg/cell/day. Colonies 4 and 16 showed no enhanced EpCAM expression. We conclude that addition of methotrexate to the culture medium of CHO-K1 colonies created with the selection system of the invention can result in enhanced protein expression. Hence, STAR elements and the selection method of the invention can be combined with and are compatible with MTX-induced enhancement of protein expression levels.
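The size of the MTX effect per colony can be expressed as a fold change. The short sketch below uses only the before and after values stated in the preceding paragraph; the variable and function names are introduced here for illustration.

```python
# Specific productivity in pg/cell/day, before and after the MTX series,
# for the four colonies reported as showing enhanced EpCAM production.
reported = {13: (22, 30), 14: (28, 42), 17: (20, 67), 19: (37, 67)}


def fold_change(before, after):
    """Ratio of the value after treatment to the value before treatment."""
    return after / before


fold_changes = {
    colony: fold_change(before, after)
    for colony, (before, after) in reported.items()
}

# Colony with the largest relative increase.
best_colony = max(fold_changes, key=fold_changes.get)
```

On these figures the increases range from roughly 1.4-fold (colony 13) to roughly 3.4-fold (colony 17).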

Example 16

TTG-Zeo Selection Operates in the Context of Different Promoters

[0289] We analyzed d2EGFP expression levels in the context of the TTG Zeo selection marker and different promoters. We compared the action of STAR elements in the context of the CMV enhancer/promoter, the SV40 enhancer/promoter and the CMV enhancer/.beta.-actin promoter.

Results

[0290] In FIG. 35 we indicate the promoters we tested in the context of the TTG Zeo selection marker. The tested plasmids consisted of the indicated control constructs with three different promoters and STAR constructs which were flanked with STAR 7 and STAR 67 at the 5' end and STAR 7 at the 3' end. The constructs were transfected to CHO-K1 cells and selection was performed with 200 .mu.g/ml ZEOCIN.RTM. in the culture medium. Up to 23 independent colonies were isolated and propagated before analysis of d2EGFP expression levels. As shown in FIG. 35, incorporation of STAR elements in constructs with the CMV enhancer/promoter, the SV40 enhancer/promoter or the CMV enhancer/.beta.-actin promoter all resulted in the formation of colonies with higher d2EGFP expression levels than with the corresponding control constructs. This shows that the selection system of the invention, in combination with STAR elements, operates well in the context of different promoters. Further analysis showed that the mean of CMV-driven d2EGFP values was significantly higher than the mean of SV40-driven d2EGFP values (p<0.05). In contrast, the mean of CMV-driven d2EGFP values did not significantly differ from CMV/.beta.-actin-driven d2EGFP values (p=0.2).

Example 17

Comparison of Different STAR Elements in the TTG-Zeo Selection System

[0291] We analyzed d2EGFP expression levels in the context of the CMV promoter-TTG Zeo selection marker and 53 different STAR elements, to obtain more insight in which STAR elements give the best results in this context.

Results

[0292] We cloned 53 STAR elements up- and downstream of the CMV promoter-TTG Zeo-d2EGFP cassette. The following STAR elements were tested in such constructs: STAR2-12, 14, 15, 17-20, 26-34, 36, 37, 39, 40, 42-49, 51, 52, 54, 55, 57-62, 64, 65, 67. The constructs were transfected to CHO-K1 cells and selection was performed with 200 .mu.g/ml ZEOCIN.RTM. in the culture medium. Up to 24 independent colonies were isolated and propagated before analysis of d2EGFP expression levels. Incorporation of STAR elements in the constructs resulted in different degrees of enhanced d2EGFP expression, as compared to the control. Incorporation of STAR elements 14, 18 and 55 in this experiment did not result in an increase of average d2EGFP expression over the control (no STAR element). Although some constructs (with STAR elements 2, 3, 10, 42, 48 and 49) in this experiment gave rise to only a few colonies, all tested STAR elements except 14, 18 and 55 resulted in average d2EGFP expression levels higher than for the control. It should be noted that some STAR elements may act in a more cell type specific manner and that it is well possible that STAR 14, 18 and 55 work better in other cell types, with other promoters, other selection markers, or in different context or configuration than in the particular set of conditions tested here. Addition of 10 STAR elements, namely STAR elements 7, 9, 17, 27, 29, 43, 44, 45, 47 and 61, induced average d2EGFP expression levels higher than 5 times the average d2EGFP expression level of the control. We retransformed the control and 7 constructs with STAR elements and repeated the experiment. The results are shown in FIG. 36. Incorporation of STAR elements in the constructs resulted in different degrees of enhanced d2EGFP expression, as compared to the control (FIG. 47). The average d2EGFP expression level in colonies transfected with the control construct was 29. The averages from d2EGFP expression levels in colonies with the 7 different STAR constructs ranged between 151 (STAR 67) and 297 (STAR 29). This is a factor of 5 to 10-fold higher than the average in the control colonies.

[0293] We conclude that a) the vast majority of STAR elements have a positive effect on gene expression levels, b) there is variation in the degree of positive effects induced by the different STAR elements, and c) 10 out of 53 tested STAR elements induce more than 5-fold average d2EGFP expression levels, as compared to the control, and that STAR elements can induce a 10-fold higher average d2EGFP expression level, as compared to the control.

Example 18

Other Chromatin Control Elements in the Context of a Selection System of the Invention

[0294] DNA elements such as the HS4 hypersensitive site in the locus control region of the chicken .beta.-globin locus (Chung et al., 1997), matrix attachment regions (MAR) (Stief et al., 1989) and a ubiquitous chromatin opening element (UCOE) (Williams et al., 2005) have been reported to have beneficial effects on gene expression when these DNA elements are incorporated in a vector. We combined these DNA elements with the selection system of the invention.

Results

[0295] The 1.25 kb HS4 element was cloned into the cassette encompassing the CMV promoter, TTG Zeo and d2EGFP by a three way ligation step to obtain a construct with a tandem of 2 HS4 elements (Chung et al., 1997). This step was done both for the 5' and 3' of the cassette encompassing the CMV promoter, TTG Zeo and d2EGFP. The 2959 bp long chicken lysozyme MAR (Stief et al., 1989) was cloned 5' and 3' of the cassette encompassing the CMV promoter, TTG Zeo and d2EGFP. The 2614 bp long UCOE (Williams et al., 2005) was a NotI-KpnI fragment, excised from a human BAC clone (RP11-93D5), corresponding to nucleotide 29449 to 32063. This fragment was cloned 5' of the CMV promoter. The STAR construct contained STAR7 and STAR67 5' of the CMV promoter and STAR7 3' of the cassette. These four constructs, as well as the control construct without flanking chromatin control DNA elements, were transfected to CHO-K1 cells. Selection was performed by 200 .mu.g/ml ZEOCIN.RTM. in the culture medium. Colonies were isolated, propagated and d2EGFP expression levels were measured. As shown in FIG. 37, constructs with all DNA elements resulted in the formation of d2EGFP expressing colonies. However, incorporation of 2.times. HS4 elements and the UCOE did not result in the formation of colonies that displayed higher d2EGFP expression levels, in comparison with the control colonies. In contrast, incorporation of the lysozyme MAR resulted in the formation of colonies that expressed d2EGFP at significantly higher levels. The mean expression level induced by MAR containing constructs was four-fold higher than in the control colonies. Best results were obtained, however, by incorporating STAR 7 and 67 in the construct. An almost ten-fold increase in the mean d2EGFP expression level was observed, as compared to the control colonies. We conclude that other chromatin control DNA elements such as MARs can be used in the context of the selection system of the invention. However, the best results were obtained when STAR elements were used as chromatin control elements.

Example 19

Stringent Selection by Placing a Modified ZEOCIN.RTM. Resistance Gene Behind an IRES Sequence

[0296] The previous examples (all from the incorporated '525 application) have shown a selection system where a sequence encoding a selectable marker protein is upstream of a sequence encoding a protein of interest in a multicistronic transcription unit, and wherein the translation initiation sequence of the selectable marker is non-optimal, and wherein further internal ATGs have been removed from the selectable marker coding sequence. This system results in a high stringency selection system. For instance the Zeo selection marker wherein the translation initiation codon is changed into TTG was shown to give very high selection stringency, and very high levels of expression of the protein of interest encoded downstream.

[0297] In another possible selection system the selection marker, e.g., Zeo, is placed downstream from an IRES sequence. This creates a multicistronic mRNA from which the Zeo gene product is translated by IRES-dependent initiation. In the usual d2EGFP-IRES-Zeo construct, the Zeo start codon is the optimal ATG. It is therefore possible that changing the Zeo ATG start codon into for instance TTG (referred to as IRES-TTG Zeo) may result in increased selection stringencies compared to the usual IRES-ATG Zeo.

Results

[0298] The used constructs are schematically shown in FIG. 38. The control construct consisted of a CMV promoter, the d2EGFP gene, an IRES sequence (the sequence of the used IRES (Rees et al., 1996) in this example was: GCCCCTCTCCCTCCCCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAATAAGGCCGGTGTGCGTTTGTC TATATGTGATTTTCCACCATATTGCCGTCTTTTGGCAATGTGAGGGCCCGGAAACCTGGCCCTGTCTTCTTG ACGAGCATTCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATGTCGTGAAGGAAG CAGTTCCTCTGGAAGCTTCTTGAAGACAAACAACGTCTGTAGCGACCCTTTGCAGGCAGCGGAACCCCCC ACCTGGCGACAGGTGCCTCTGCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAACC CCAGTGCCACGTTGTGAGTTGGATAGTTGTGGAAAGAGTCAAATGGCTCTCCTCAAGCGTATTCAACAAG GGGCTGAAGGATGCCCAGAAGGTACCCCATTGTATGGGATCTGATCTGGGGCCTCGGTGCACATGCTTTA CATGTGTTTAGTCGAGGTTAAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTTTTCCTTTGAAA AACACGATGATAAGCTTGCCACAACCCCGGGATA; SEQ ID NO:127), and a TTG Zeo selection marker, i.e., the zeocin-resistance gene with a TTG start codon ("d2EGFP-IRES-TTG Zeo"). The other construct was the same, but with a combination of STAR 7 and STAR 67 placed upstream of the expression cassette and STAR 7 downstream of the cassette ("STAR7/67 d2EGFP-IRES-TTG Zeo STAR7"). Both constructs were transfected to CHO-K1 cells and selection was performed with 100 .mu.g/ml ZEOCIN.RTM. in the culture medium. Four colonies emerged after transfection with the control construct and six with the STAR containing construct. These independent colonies were isolated and propagated before analysis of d2EGFP expression levels. As shown in FIG. 38, incorporation of STAR elements in the construct resulted in the formation of colonies with high d2EGFP expression levels. Of the control colonies without STAR elements ("d2EGFP-IRES-TTG Zeo") only one colony displayed some d2EGFP expression. The expression levels are also much higher than those obtained with other control constructs, containing the IRES with a normal Zeo with standard ATG start codon, either with or without STAR elements ("d2EGFP-IRES-ATG Zeo" and "STAR 7/67 d2EGFP-IRES-ATG Zeo STAR7"; also in these ATG Zeo constructs there was an enhancing effect of the STAR elements, but these are modest as compared to the novel TTG Zeo variant).

[0299] These results show that placing a Zeo selection marker with a TTG start codon downstream of an IRES sequence, in combination with STAR elements, operates well and establishes a stringent selection system.

[0300] From these data and the previous examples it will be clear that the marker can be varied along the same lines of the previous examples. For instance, instead of a TTG start codon, a GTG start codon can be used, and the marker can be changed from Zeo into a different marker, e.g., Neo, Blas, dhfr, puro, etc, all with either GTG or TTG as start codon. The STAR elements can be varied by using different STAR sequences or different placement thereof, or by substituting them for other chromatin control elements, e.g., MAR sequences. This leads to improvements over the prior art selection systems having an IRES with a marker with a normal ATG start codon.

[0301] As a non-limiting example, instead of the modified Zeo resistance gene (TTG Zeo) a modified Neomycin resistance gene is placed downstream of an IRES sequence. The modification consists of a replacement of the ATG translation initiation codon of the Neo coding sequence by a TTG translation initiation codon, creating TTG Neo. The CMV-d2EGFP-IRES-TTG Neo construct, either surrounded by STAR elements or not, is transfected to CHO-K1 cells. Colonies are picked, cells are propagated and d2EGFP values are measured. This ("IRES-TTG Neo") leads to improvement over the known selection system having Neo with an ATG start codon downstream of an IRES ("IRES-ATG Neo"). The improvement is especially apparent when the TTG Neo construct comprises STAR elements.

Example 20

Increased Expression of Erythropoietin Using a Selection System of the Invention

[0302] The previous examples (of the incorporated '525 and '953 applications) have shown selection systems based on altered translation initiation codons for the selectable marker gene, and have employed d2EGFP as test protein of interest, and antibodies as protein of interest. This example shows the applicability of a selection system of the invention for improving protein expression levels of a secreted single chain protein that has therapeutic significance, viz. erythropoietin (EPO). EPO expression levels were analysed in the context of the TTG Zeo selection marker, using the CMV promoter and STAR elements in CHO cells.

[0303] STAR 7 and 67 were cloned to shield a TTG Zeo EPO cassette. Human EPO cDNA was derived from the plasmid pORF-hEPO (Invivogen). As control construct the TTG Zeo EPO cassette was not flanked with STAR elements (FIG. 39). In another control construct we used the IRES Zeo (with normal ATG start codon) configuration as selection system, considered the method giving the best results prior to the invention, which control construct was either flanked with STARs 7 and 67 or not (FIG. 39). The constructs were transfected to CHO-K1 cells with Lipofectamine 2000 (Invitrogen) and selection was performed with 150 .mu.g/ml ZEOCIN.RTM. in the culture medium. The culture medium consisted of HAMF12:DMEM=1:1, +10% foetal bovine serum. Up to 24 independent colonies were isolated and propagated before analysis of EPO expression levels. Per independent colony, 10.sup.5 cells were seeded and cultured in 6-well dishes for two days before cells were counted and the medium was collected. The amount of secreted human recombinant erythropoietin was determined using an ELISA-kit (R&D systems).

[0304] We found that the TTG Zeo-EPO control construct (A in FIG. 39) generated far fewer clones (5) than the STAR containing TTG Zeo EPO construct (B in FIG. 39) (41 clones). Mean EPO expression levels increased from 3.3 pg/cell/day with the TTG Zeo-EPO control construct, to 17.7 pg/cell/day with the STAR containing TTG Zeo-EPO construct. The peak EPO expression level increased from 5.5 to 28.3 pg/cell/day, respectively (FIG. 39). Also in comparison with the STAR containing EPO-IRES-Zeo construct (D in FIG. 39; 300 clones) and with the IRES construct without STARs (C in FIG. 39; 164 clones) we again found that far fewer clones were formed with the STAR containing TTG Zeo-EPO construct of the invention (B in FIG. 39; 41 clones). Also, mean EPO expression levels increased from 9.0 pg/cell/day with the STAR containing EPO-IRES-Zeo control construct (D), to 17.7 pg/cell/day in the STAR containing TTG Zeo-EPO construct of the invention (B; see FIG. 39).

[0305] The obtained EPO expression levels with the construct of the invention are high in comparison to reported values of 12 pg/cell/day, which was achieved after gene amplification (Yoon et al., 2003, 2005). This result shows that the selection system of the invention can readily be applied for the production of important therapeutic proteins, such as EPO. As shown in FIG. 39 incorporation of STAR elements gave significantly higher EPO expression levels. The results further demonstrate that STAR elements are able to increase EPO expression levels.

[0306] In an alternative embodiment, the EPO sequence is cloned upstream of an IRES, which IRES is operably linked to a sequence encoding ZEOCIN.RTM. resistance having a TTG start codon, analogously to example 19, and STAR sequences are included in the expression construct as described above. It is expected that also this embodiment will improve expression of EPO compared to the situation where the sequence encoding ZEOCIN.RTM. resistance has a normal ATG start codon (such as in situation D in FIG. 39).

Example 21

STAR Sequences Operate Well in the Context of the Selection System of the Invention in CHO-DG44 Cells

[0307] Several previous examples show the selection system of the invention, with an impaired start codon for the selectable marker sequence, and preferably with the use of STAR sequences. In most cases in the examples above, CHO-K1 cells were used. CHO-DG44 is a different CHO cell line, which is dhfr.sup.-, and is a good suspension grower in contrast to CHO-K1, and hence has advantages for recombinant protein production on an industrial scale. Here it is shown that the selection system of the invention works well with several tested STAR sequences also in the CHO-DG44 cell line.

[0308] Seven different STAR elements were tested in a construct that encompasses the CMV promoter, upstream of the TTG Zeo selection marker and the d2EGFP gene. In all constructs STAR 67 was included, cloned immediately upstream of the CMV promoter (FIG. 40). As a control, a construct without STAR elements was included. The following STAR elements were tested in such constructs: STAR 7/67-7, 9/67-9, 17/67-17, 27/67-27, 43/67-43, 44/67-44 and 45/67-45. The constructs were transfected to CHO-DG44 cells with Lipofectamine 2000 (Invitrogen) and selection was performed with 150 .mu.g/ml ZEOCIN.RTM. in the culture medium. The culture medium consisted of HAMF12:DMEM=1:1, +10% foetal bovine serum. Up to 24 independent colonies were isolated and propagated before analysis of d2EGFP expression levels. As expected and as shown in FIG. 40, incorporation of the seven different STAR elements gave significantly higher d2EGFP expression levels, compared to the control without STAR elements. From the results it is clear that STAR elements are able to increase d2EGFP expression levels also in the CHO-DG44 cell line.

Example 22

Removing CpG Dinucleotides from the Selectable Marker Coding Sequence Improves Expression Using a Selection Method of the Invention

[0309] The selection methods of the invention, using different translation initiation codons for the selectable marker, such as GTG or TTG, can result in very stringent selection, and in very high levels of production for the polypeptide of interest, as shown in several examples above. In this example, the coding region of the selectable marker polypeptide gene itself was modified by removing CpG dinucleotides. The rationale is that the C nucleotide in the CpG dinucleotide may be prone to methylation, which might result in gene silencing of the selectable marker, and thus removing CpG dinucleotides might improve the results. The zeocin-resistance gene with a TTG start codon was taken as the marker, and as many CpG dinucleotides were removed as was possible, without changing the amino acid sequence of the zeocin-resistance protein, and further without introducing ATG sequences in the coding strand, to prevent undesired translation initiation within the coding region of the zeocin-resistance protein (as explained, e.g., in Examples 1 and 2). Hence, some CpGs were not removed. The CpG content of the native sequence (here: containing a TTG start codon, and a mutation to remove the internal ATG sequence, see, e.g., Examples 1 and 2) is 13.3%, whereas after mutating the CpGs, the CpG content was reduced to 1.8% [referred to as "TTG Zeo (CpG poor)"]. The zeocin-resistance gene with decreased CpG content was cloned upstream of the d2EGFP coding sequence to result in a multicistronic expression construct of the invention (see, e.g., Example 2). Expression levels of d2EGFP were measured.
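The constraints stated above (lower the number of CpG dinucleotides, keep the amino acid sequence, introduce no ATG in the coding strand) can be sketched as a simple greedy codon substitution. This is an illustration under stated assumptions, not the design procedure actually used for SEQ ID NO:132. In particular, the definition of CpG content as the number of CG dinucleotides divided by the sequence length is an assumption, since the text does not define the percentage. The function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(orf):
    """Translate an ORF whose length is a multiple of three."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def synonyms(codon):
    """Other codons encoding the same amino acid, in sorted order."""
    aa = CODON_TABLE[codon]
    return [c for c, a in sorted(CODON_TABLE.items()) if a == aa and c != codon]


def cpg_fraction(seq):
    """Assumed definition: CG dinucleotide count divided by sequence length."""
    return seq.upper().count("CG") / len(seq)


def reduce_cpg(orf):
    """Greedy sketch: one left-to-right pass of synonymous codon swaps.

    A swap is accepted only if it lowers the CG count of the whole sequence
    and does not raise the number of ATG triplets in the coding strand.
    The first codon (the start codon) is left unchanged.
    """
    orf = orf.upper()
    assert len(orf) % 3 == 0, "ORF length must be a multiple of three"
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    for idx in range(1, len(codons)):
        for alt in synonyms(codons[idx]):
            current = "".join(codons)
            trial = codons[:idx] + [alt] + codons[idx + 1:]
            trial_seq = "".join(trial)
            fewer_cpg = trial_seq.count("CG") < current.count("CG")
            no_new_atg = trial_seq.count("ATG") <= current.count("ATG")
            if fewer_cpg and no_new_atg:
                codons = trial
    return "".join(codons)


# Toy ORF with a TTG start codon (hypothetical, not the zeocin-resistance gene).
toy = "TTGCCGTGGGCGCGCTAA"
poor = reduce_cpg(toy)
```

In the toy sequence the second codon, CCG, cannot become CCA, because CCA followed by TGG would create an ATG across the codon boundary; the sketch therefore selects CCC. A real gene may retain some CpGs for the same reason, in line with the statement that some CpGs were not removed.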

[0310] Constructs were prepared containing STARs 7 and 67 upstream of the CMV promoter, followed by the TTG Zeo (CpG poor) selection marker (synthesized by GeneArt GmbH, Regensburg, Germany; see SEQ ID NO:132; see SEQ ID NO:92 for the ZEOCIN.TM. antibiotic-resistance coding sequence with its natural CpG content), the d2EGFP gene and STAR 7 (FIG. 41). The constructs were transfected to CHO-K1 cells. DNA was transfected using Lipofectamine 2000 (Invitrogen) and cells were grown in the presence of 150 .mu.g/ml ZEOCIN.TM. antibiotic in HAM-F12 medium (Invitrogen)+10% FBS (Invitrogen).

[0311] Eight colonies emerged after transfection with the control "CpG-rich" TTG Zeo construct (A in FIG. 41) and none with the "CpG-poor" TTG Zeo containing construct (C in FIG. 41). In contrast, with both "CpG-rich" TTG Zeo (B in FIG. 41) and "CpG-poor" TTG Zeo (D in FIG. 41) selection markers more than 24 colonies emerged when STARs 7/67-7 were included in the construct. With the "CpG-rich" TTG ZEOCIN.TM. antibiotic selection marker (A in FIG. 41), the average d2EGFP expression with the STAR-less control construct was 140, and with the STAR containing construct 1332 (B in FIG. 41). This is an increase due to the presence of the STAR elements. The average d2EGFP expression with the STAR containing construct and the "CpG-poor" Zeo was 2453 (D in FIG. 41), an almost two-fold increase in comparison with the "CpG-rich" TTG Zeo (B in FIG. 41). Furthermore, the highest d2EGFP value achieved with the "CpG-rich" TTG Zeo construct (B) was 2481 and with the "CpG-poor" TTG Zeo (D) 4308.

[0312] We conclude that lowering the CpG content of the ZEOCIN™ antibiotic marker gene raises the stringency of the selection system. This results in higher d2EGFP expression values when STAR elements are included in the construct and no colonies with the control construct.

[0313] The same constructs were also transfected to CHO-DG44 cells, under the same conditions as in Example 21. With the "CpG-rich" TTG ZEOCIN® selection marker, the average d2EGFP expression with the STAR-less control construct was 43 (A in FIG. 42), and the average d2EGFP expression with the STAR containing constructs was 586 (B in FIG. 42). This roughly 13.6-fold increase is due to the presence of the STAR elements. The average d2EGFP expression with the STAR constructs and the "CpG-poor" Zeo was 1152 (D in FIG. 42), an almost two-fold increase in comparison with the "CpG-rich" TTG Zeo (B in FIG. 42). Furthermore, the highest d2EGFP value achieved with the "CpG-rich" TTG Zeo construct was 1296 (B in FIG. 42) and with the "CpG-poor" TTG Zeo 2416 (D in FIG. 42). In contrast with CHO-K1, where no control colonies emerged with the "CpG-poor" TTG Zeo construct (C in FIG. 41), control colonies emerged with CHO-DG44, but the average d2EGFP value was 52 and the highest value in a colony was 115 (C in FIG. 42).

[0314] We conclude that also in CHO-DG44 addition of the "CpG-poor" TTG Zeo selection marker to the construct results in higher protein expression when STAR elements are employed.
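
The fold increases implied by the mean d2EGFP values reported for CHO-K1 and CHO-DG44 in this example can be checked with a few lines of Python. The numbers are copied from the text; the dictionary keys and the helper name are hypothetical labels chosen only for this illustration.

```python
# Mean d2EGFP values as reported in this example (units as in FIGS. 41 and 42).
reported_means = {
    "CHO-K1": {"control_cpg_rich": 140, "star_cpg_rich": 1332, "star_cpg_poor": 2453},
    "CHO-DG44": {"control_cpg_rich": 43, "star_cpg_rich": 586, "star_cpg_poor": 1152},
}


def fold(numerator, denominator):
    """Ratio of two mean expression values."""
    return numerator / denominator


fold_changes = {}
for cell_line, means in reported_means.items():
    fold_changes[cell_line] = {
        # Effect of adding STAR elements (CpG-rich marker in both constructs).
        "star_effect": fold(means["star_cpg_rich"], means["control_cpg_rich"]),
        # Effect of lowering the CpG content (STAR elements in both constructs).
        "cpg_effect": fold(means["star_cpg_poor"], means["star_cpg_rich"]),
    }
    print(cell_line, {k: round(v, 2) for k, v in fold_changes[cell_line].items()})
```

The CpG effect comes out somewhat below 2 in both cell lines, consistent with the "almost two-fold" wording used above.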

[0315] It will be clear that the configuration where a zeocin-resistance gene with decreased CpG content and with a GTG or TTG start codon could also be placed downstream from the coding sequence for the polypeptide of interest (here d2EGFP as a model) when the zeocin-resistance protein coding sequences are placed under control of an IRES (see, e.g., Example 19). In that case, no care needs to be taken that mutation of CpG dinucleotides would introduce ATG sequences (as explained in the incorporated '953 application). It is expected that also in such embodiments, similar results can be obtained, i.e., that reduction of the CpG content of the selectable marker protein coding sequence will improve expression levels.

Example 23

Modifications in the Neomycin Resistance Coding Sequence in the Selection System of the Invention

[0316] The selection system of the invention, in which a modified start codon is employed for the sequence encoding the selectable marker polypeptide, is used here for the neomycin resistance gene. As described in examples above (from the incorporated '525 and '953 applications), different stringencies for selection can be designed by using different translation initiation codons for the selectable marker coding sequence, such as GTG or TTG. In this example, also the coding region of the neomycin resistance gene itself was modified, by removing as many CpG dinucleotides of the (ATG-less, so already devoid of ATG sequences in the coding strand) neomycin resistance gene as possible, while not changing the amino acid sequence of the neomycin resistance protein (except for the Met>Leu mutations where the internal ATG sequences were in-frame and replaced by CTG as compared to the wild-type sequence: obviously this was done for reasons of removing ATG sequences from the coding strand and independent from the effort of reducing the CpG content), and without introducing new ATG sequences in the coding strand, analogously to what was done in example 22 for the zeocin-resistance gene. The CpG content of the "wild type" neomycin selection marker gene is 10.4% (SEQ ID NO:128), while after the changes the CpG content was reduced to 2.3% (SEQ ID NO:130). Constructs containing the sequences for the neomycin resistance gene in this example were ordered from GeneArt GmbH, Regensburg, Germany. As a start codon, TTG was used in this example. The sequences used therefore consisted of SEQ ID NO:130, with the proviso that the start codon (first three nucleotides, ATG) was replaced by a TTG start codon, and further in certain cases contained one of the mutations indicated below.
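
The removal of ATG sequences from the coding strand described here can be sketched in Python. The sketch only locates every ATG, reports its reading frame, replaces the start codon with TTG, and changes in-frame internal ATG codons to CTG (Met>Leu). The function names and the toy ORF are hypothetical and are not derived from SEQ ID NO:128 or SEQ ID NO:130.

```python
def find_atgs(orf):
    """Return (position, frame) for every ATG in the coding strand.

    Frame 0 means the ATG is in frame with the ORF, i.e. it encodes Met.
    """
    hits = []
    pos = orf.find("ATG")
    while pos != -1:
        hits.append((pos, pos % 3))
        pos = orf.find("ATG", pos + 1)
    return hits


def replace_in_frame_atgs(orf, start_codon="TTG"):
    """Set the start codon and change internal in-frame ATG codons to CTG."""
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    codons[0] = start_codon
    codons[1:] = ["CTG" if codon == "ATG" else codon for codon in codons[1:]]
    return "".join(codons)


# Toy ORF (hypothetical): one internal in-frame ATG and one out-of-frame ATG.
toy_orf = "ATGGCCATGAAGGATGCCTGA"
modified = replace_in_frame_atgs(toy_orf)
print(find_atgs(toy_orf))
print(modified, find_atgs(modified))
```

Out-of-frame ATGs, such as the one spanning two codons in the toy ORF, are not handled by this sketch. They would have to be removed by a synonymous change in one of the two overlapping codons, which can conflict with CpG reduction and has to be checked case by case.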

[0317] In the "CpG poor" neomycin resistance gene, some mutations were made to change amino acids in the neomycin resistance protein, to test whether these have influence on the expression levels of the polypeptide of interest when used in the multicistronic transcription units of the invention. The mutations (Sautter et al., 2005; it is noted that the neo sequence used in the present application encodes three additional amino acids immediately after the start codon as compared to the sequence used by (Sautter et al., 2005), and hence the amino acid numbering in the present application is three higher as compared to the numbering in (Sautter et al., 2005)) consisted of a change from amino acid valine 201 (198 in Sautter et al., 2005) to glycine 201 (TTG Neo 201V>G), glutamic acid 185 (182 in Sautter et al., 2005) to aspartic acid 185 (TTG Neo 185E>D) and a double mutation in which both amino acid valine 201 and glutamic acid 185 were changed to glycine 201 and aspartic acid 185, respectively (TTG Neo 185E>D/201V>G) (FIG. 43). These modifications were compared with the control Neomycin (CpG poor TTG Neo 185E/201V). In all cases constructs were prepared with and without STAR elements (FIG. 43).

[0318] The modified TTG Neo selection marker was incorporated in a construct containing STARs 7 and 67 upstream of the CMV promoter, followed by the TTG Neo selection marker, the d2EGFP gene and STAR 7 (FIG. 43). The constructs were transfected to CHO-K1 cells. DNA was transfected using Lipofectamine 2000 (Invitrogen) and cells were grown in the presence of 500 µg/ml G418 geneticin in HAM-F12 medium (Invitrogen)+10% FBS (Invitrogen).

[0319] With the control Neo construct (185E/201V) only a very limited effect of STAR elements was observed. This may at least in part be due to the numerous colonies that were generated under 500 µg/ml G418 geneticin, indicating that the stringency of the TTG neomycin modification is low. However, the neomycin with modifications of the invention is operational: in the TTG Neo 185E 201V construct all ATGs were removed from the coding strand of the neomycin resistance gene, and although d2EGFP values were low, it is clear that the removal of ATGs still allowed proper selection under Geneticin selection pressure. When the Neomycin resistance gene was further modified, a distinctive effect of the addition of STAR elements was observed. The mean of 21 TTG Neo 201V>G control colonies was 65 (A2 in FIG. 43), whereas the mean d2EGFP signal of the 24 TTG Neo 201V>G colonies with STAR elements was 150 (B2 in FIG. 43). The selection stringency with the TTG Neo 185E>D mutation was further increased, since no control colonies survived without STAR elements (A3 in FIG. 43), whereas the mean d2EGFP signal of 17 surviving TTG Neo 185E>D STAR colonies was 204 (B3 in FIG. 43). This mean GFP fluorescence is higher than with the TTG Neo 201V>G colonies (B2 in FIG. 43). Also the highest d2EGFP value in TTG Neo 185E>D colonies was 715, as compared to 433 in the TTG Neo 201V>G colonies (compare B3 and B2 in FIG. 43). The highest stringency was observed in the double Neo mutant, TTG Neo 185E>D 201V>G. No control colonies survived (A4 in FIG. 43) and the mean d2EGFP value of 7 surviving STAR TTG Neo 185E>D 201V>G colonies was 513, with as highest d2EGFP value 923 (B4 in FIG. 43).

[0320] It is concluded that the introduction of specific mutations raises the stringency of selection of the Neomycin resistance gene when used according to the invention. Some of these modifications convey such selection stringency to the Neomycin resistance gene that only after incorporation with STAR elements colonies are able to survive, due to higher expression values. This concomitantly results in higher d2EGFP expression values. Clearly, the advantageous embodiments described herein of the neomycin resistance gene further improve the suitability of this gene for use according to the invention.

[0321] It will be clear that the configuration where a neomycin resistance gene with decreased CpG content and with a GTG or TTG start codon, and with the indicated mutations (185E>D and/or 201V>G) could also be placed downstream from the coding sequence for the polypeptide of interest (here d2EGFP as a model) when the neomycin resistance protein coding sequences are placed under control of an IRES (see, e.g., Example 19). In that case, no care needs to be taken that mutation of CpG dinucleotides would introduce ATG sequences (as explained in the incorporated '953 application). It is expected that also in such embodiments, good results can be obtained, i.e., that reduction of the CpG content and specific mutation at the indicated positions of the selectable marker protein coding sequence will improve expression levels.

Example 24

Use of Tryptophan Synthesizing Enzyme as Selection Marker in the Selection System of the Invention

[0322] Enzymes that are part of metabolic pathways can be effectively used as a selection marker. For instance, mammalian cells lack enzymes that are part of the metabolic pathway to create the amino acids tryptophan or histidine. Hence these amino acids need to be present in our food or, in case of cell lines, in the culture medium. These amino acids are therefore called essential. When the amino acids are omitted from the culture medium, the cells will die, unless the cells are transfected with a plasmid that encodes the (bacterially derived) enzymes that the mammalian cell lacks and that are essential for the synthesis of the respective amino acid. In this and the following two examples we describe the use of three enzymes that can be used as selection marker. Specifically, these markers with a GTG or TTG start codon are used in the context of constructs containing STAR elements, and are incorporated in the selection systems of the invention.

[0323] In this example, the tryptophan synthesizing enzyme (trp) is used as a selectable marker polypeptide. The trp protein specifically converts indole and L-serine into L-tryptophan. For use of trp as a selectable marker, a culture medium that is essentially devoid of tryptophan and which contains the non-toxic substance indole is used (Hartman and Mulligan, 1988). Indole is used as substrate for the synthesis of tryptophan. Constructs are designed to contain the CMV promoter, the d2EGFP gene and the tryptophan synthesizing enzyme coding sequence (trp) in several configurations (FIG. 44).

[0324] The synthesized constructs are flanked by STAR elements 7 and 67. trp (the trpB gene) can be derived from E. coli by PCR. More conveniently, the desired trp gene is synthesized using standard DNA synthesis methods (e.g., by GeneArt GmbH, Regensburg, Germany).

[0325] In a first embodiment the trp gene is modified such that all ATGs are removed. These include 14 ATGs that encode methionine (SEQ ID NO:136). The translation initiation codon is either GTG or TTG. These modified trp genes are placed upstream of d2EGFP (FIG. 44A).

[0326] Alternatively the wild type trp gene (containing all internal ATGs; SEQ ID NO:134) is placed downstream of the d2EGFP gene, but separated by an IRES sequence (See, Example 19) (FIG. 44B). Translation initiation of the trp mRNA will start at the translation initiation codon of trp. The first ATG (start codon) is replaced by GTG or TTG as a start codon. As a control in this configuration, a construct is also prepared with the normal ATG start codon for trp.
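
For the IRES configuration just described, only the start codon differs between the test and control constructs, while internal ATGs are retained. A minimal Python sketch of generating the three start-codon variants from a wild-type coding sequence follows. The function name and the toy sequence are hypothetical and are not taken from SEQ ID NO:134.

```python
START_CODONS = ("ATG", "GTG", "TTG")


def start_codon_variants(orf):
    """Return the ORF with each start codon; internal ATGs are left untouched."""
    if not orf.startswith("ATG"):
        raise ValueError("expected a wild-type ORF beginning with ATG")
    return {start: start + orf[3:] for start in START_CODONS}


# Toy wild-type ORF (hypothetical) with one internal in-frame ATG.
toy_wild_type = "ATGACCATGCTGTAA"
variants = start_codon_variants(toy_wild_type)
for start, sequence in variants.items():
    print(start, sequence)
```

The ATG variant corresponds to the control construct with the normal start codon; the GTG and TTG variants correspond to the constructs in which the first ATG is replaced.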

[0327] The constructs are transfected to CHO-K1 cells that are cultured in HAMF12 medium that is devoid of the amino acid tryptophan (obtained from Invitrogen). The medium contains 0.3 mM of the tryptophan precursor indole.

Example 25

Use of Histidine Synthesizing Enzyme as Selection Marker in the Selection System of the Invention

[0328] In this example, the enzyme that is involved in the synthesis of the essential amino acid histidine, named histidinol dehydrogenase (hisD, herein referred to as his), is used as a selectable marker. The hisD protein specifically converts L-histidinol into L-histidine. For use of his as a selectable marker, a culture medium that is essentially devoid of histidine and which contains the substance histidinol is used (Hartman and Mulligan, 1988). Histidinol is used as substrate for the synthesis of histidine. Constructs are designed to contain the CMV promoter, the d2EGFP gene and the histidine synthesizing enzyme coding sequence (his) in several configurations (FIG. 45).

[0329] The synthesized constructs are flanked by STAR elements 7 and 67. his can be derived from Salmonella typhimurium by PCR. More conveniently, the desired his gene is synthesized using standard DNA synthesis methods (e.g., by GeneArt GmbH, Regensburg, Germany).

[0330] In a first embodiment the his gene is modified such that all ATGs are removed. These include 4 ATGs that encode methionine (SEQ ID NO:140). The translation initiation codon is either GTG or TTG. These modified his genes are placed upstream of d2EGFP (FIG. 45A).

[0331] Alternatively the wild type his gene (containing all internal ATGs; SEQ ID NO:138) is placed downstream of the d2EGFP gene, but separated by an IRES sequence (See, Example 19) (FIG. 45B). Translation initiation of the his mRNA will start at the translation initiation codon of his. The first ATG (start codon) is replaced by GTG or TTG as a start codon. As a control in this configuration, a construct is also prepared with the normal ATG start codon for his.

[0332] The constructs are transfected to CHO-K1 cells, that are cultured in HAMF12 medium that is devoid of the amino acid histidine (obtained from Invitrogen). The medium contains 0.125 mM of the histidine precursor histidinol.

Example 26

Use of dhfr Enzyme as Selection Marker in the Selection System of the Invention

[0333] In this example, the 5,6,7,8 tetrahydrofolate synthesizing enzyme dihydrofolate reductase (dhfr) is used as a selectable marker. The dhfr protein specifically converts folate into 5,6,7,8 tetrahydrofolate. For use of dhfr as a selectable marker according to this aspect of the invention, the non-toxic substance folate has to be present in the culture medium (Simonsen et al., 1988). Furthermore, the medium is essentially devoid of glycine, hypoxanthine and thymidine, since when these are available for the cell, the need for the dhfr enzyme is bypassed. Constructs are designed to contain the CMV promoter, the d2EGFP gene and the dhfr coding sequence in several configurations (FIG. 46).

[0334] The synthesized constructs are flanked by STAR elements 7 and 67. dhfr can be derived from mouse by PCR. More conveniently, the desired dhfr gene is synthesized using standard DNA synthesis methods (e.g., by GeneArt GmbH, Regensburg, Germany).

[0335] In a first embodiment the dhfr gene is modified such that all ATGs are removed. These include 6 ATGs that encode methionine, which are changed for codons that encode leucine (SEQ ID NO:122). The translation initiation codon is either GTG or TTG. These modified dhfr genes are placed upstream of d2EGFP (FIG. 46A).

[0336] Alternatively the wild type dhfr gene (containing all internal ATGs; SEQ ID NO:98) is placed downstream of the d2EGFP gene, but separated by an IRES sequence (See, Example 19) (FIG. 46B). Translation initiation of the dhfr mRNA will start at the translation initiation codon of dhfr. The first ATG (start codon) is replaced by GTG or TTG as a start codon. As a control in this configuration, a construct is also prepared with the normal ATG start codon for dhfr.

[0337] The constructs are transfected to CHO-DG44 cells that are cultured in DMEM:HAMF12 (1:1) medium (Gibco, cat no. 11320-074), supplemented with 2 mM L-glutamine (Gibco, 25030-024), which medium is essentially devoid of glycine, hypoxanthine and thymidine, and which medium contains 6 µM folic acid.

Example 27

Use of the trp and dhfr Enzymes as Additional Selection Markers Combined with the Selection System of the Invention

[0338] In certain embodiments, it may be beneficial to maintain (some) selection pressure during culturing of host cells for expression of polypeptides of interest from expression cassettes in the host cell. Although it is possible to do this using selectable marker polypeptides that confer resistance to antibiotics, it is more advantageous in view of costs and/or regulatory/safety issues to use for instance metabolic enzymes such as trp and/or dhfr, as described in Examples 24 and 26, respectively. The present example describes the use of trp and dhfr as additional selectable markers in combination with the selection system of the invention, to be able to continuously select for expression of the expression unit that also expresses the polypeptide of interest. This selection pressure during the stage of expression of the polypeptide of interest may increase the expression levels in this stage as compared to a situation wherein selection pressure is applied only initially (for the establishment of selected clones).

[0339] Constructs are designed to encompass the light (LC) and heavy chain (HC) of a monoclonal antibody, each under the control of the CMV promoter (FIG. 47A). The constructs are flanked by STAR elements 7 and 67. Also, between the expression cassettes for the LC and HC, STAR67 is placed. The cassette with the LC is placed upstream of the cassette with the HC, but of course the reverse order would also be possible, or alternatively the HC and LC expression cassettes could be on separate DNA molecules. The cassette with the LC is constructed as follows: the CMV promoter, the TTG Zeo selection marker (e.g., SEQ ID NO:132), the LC and an IRES sequence, followed by the trp gene (See, Example 24; SEQ ID NO:134). The trp gene is tested with an ATG, GTG or TTG translation initiation codon. The cassette with the HC is constructed as follows: the CMV promoter, the TTG Neo selection marker (See, Example 23; SEQ ID NO:130, but with a TTG start codon), the HC and an IRES sequence (see, e.g., Example 19), followed by the dhfr gene (See, Example 26; SEQ ID NO:98). The dhfr gene is tested with an ATG, GTG or TTG translation initiation codon (FIG. 47A).

[0340] Alternatively, a cassette can be constructed wherein the HC and/or LC are upstream of the two selectable marker sequences, wherein the selectable marker sequences each are preceded by an IRES (FIG. 47B).

[0341] It is clear that the same principle can be used for a single expression cassette, i.e., for expression of only one polypeptide of interest, for instance if that is not part of a multimeric protein. In that case only one of the two expression cassettes needs to be constructed (e.g., the one for HC, but with HC replaced by a sequence encoding another polypeptide of interest).

[0342] The constructs are transfected to CHO-DG44 cells that are cultured in DMEM:HAMF12 (1:1) medium. Selection is performed with 150 µg/ml ZEOCIN® and 500 µg/ml geneticin G418. Colonies are isolated and cells are propagated. After first measurements of secreted monoclonal antibody in the culture medium, the cells are changed to DMEM:HAMF12 (1:1) medium (without ZEOCIN® and geneticin G418) (Gibco, cat no. 11320-074), supplemented with 2 mM L-glutamine (Gibco, 25030-024), which medium is essentially devoid of glycine, hypoxanthine and thymidine, and which medium contains 6 µM folic acid, and/or to medium devoid of tryptophan, while containing 0.3 mM indole.

REFERENCES

[0343] Boshart, M, Weber, F, Jahn, G, Dorsch-Hasler, K, Fleckenstein, B, and Schaffner, W. (1985) A very strong enhancer is located upstream of an immediate early gene of human cytomegalovirus Cell 41, 521-530.

[0344] Chung J H, Whiteley M, and Felsenfeld G. (1993) A 5' element of the chicken beta-globin domain serves as an insulator in human erythroid cells and protects against position effect in Drosophila. Cell 74: 505-514.

[0345] Chung J H, Bell A C, Felsenfeld G. (1997). Characterization of the chicken beta-globin insulator. Proc Natl Acad Sci USA 94: 575-580.

[0346] Das, G C, Niyogi, S K, and Salzman, N P. (1985) SV40 promoters and their regulation Prog Nucleic Acid Res Mol Biol 32, 217-236.

[0347] Dumas, P, Bergdoll, M., Cagnon, C and Masson J M. 1994. Crystal structure and site-directed mutagenesis of a bleomycin resistance protein and their significance for drug sequestering. EMBO J 13, 2483-2492.

[0348] Gill D R, Smyth S E, Goddard C A, Pringle I A, Higgins C F, Colledge W H, and Hyde S C. (2001) Increased persistence of lung gene expression using plasmids containing the ubiquitin C or elongation factor 1α promoter. Gene Therapy 8: 1539-1546.

[0349] Gossen M, and Bujard H. (1992) Tight control of gene expression in mammalian cells by tetracycline-responsive promoters. Proc Natl Acad Sci USA 89: 5547-5551.

[0350] Graham F O, Smiley J, Russell W and Nairn R. (1977). Characteristics of a human cell line transformed by DNA from human adenovirus type 5. J. Gen. Virol. 36, 59-72.

[0351] Hartman S C, and Mulligan R C. 1988. Two dominant-acting selectable markers for gene transfer studies in mammalian cells. Proc Natl Acad Sci USA. 85, 8047-8051.

[0352] Huls G A, Heijnen I A F M, Cuomo M E, Koningsberger J C, Wiegman L, Boel E, van der Vuurst-de Vries A-R, Loyson S A J, Helfrich W, van Berge Henegouwen G P, van Meijer M, de Kruif J, Logtenberg T. (1999). A recombinant, fully human monoclonal antibody with antitumor activity constructed from phage-displayed antibody fragments. Nat Biotechnol. 17, 276-281.

[0353] Jones D, Kroos N, Anema R, Van Montfort B, Vooys A, Van Der Kraats S, Van Der Helm E, Smits S, Schouten J, Brouwer K, Lagerwerf F, Van Berkel P, Opstelten D-J, Logtenberg T, Bout A (2003) High-level expression of recombinant IgG in the human cell line PER.C6. Biotechnol. Prog. 19: 163-168.

[0354] Kaufman, R J. (2000) Overview of vector design for mammalian gene expression Mol Biotechnol 16, 151-160.

[0355] Kaufman, R J, and Sharp, P A. (1982) Construction of a modular dihydrofolate reductase cDNA gene: analysis of signals utilized for efficient expression Mol Cell Biol 2, 1304-1319.

[0356] Kellum R, and Schedl P. (1991) A position-effect assay for boundaries of higher order chromosomal domains. Cell 64: 941-950.

[0357] Kim S J, Kim Ns, Ryu C J, Hong H J, Lee G M. 1998. Characterization of chimeric antibody producing CHO cells in the course of dihydrofolate reductase-mediated gene amplification and their stability in the absence of selective pressure. Biotechnol Bioeng 58: 73-84.

[0358] Kozak M. (1986) Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44: 283-292.

[0359] Kozak M. (1987) An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res. 15: 8125-8148.

[0360] Kozak M. (1989) Context effects and inefficient initiation at non-AUG codons in eucaryotic cell-free translation systems. Mol Cell Biol. 9: 5073-5080.

[0361] Kozak M. (1990) Downstream secondary structure facilitates recognition of initiator codons by eukaryotic ribosomes. Proc Natl Acad Sci USA 87: 8301-8305.

[0362] Kozak M. (1997) Recognition of AUG and alternative initiator codons is augmented by G in position +4 but is not generally affected by the nucleotides in positions +5 and +6. EMBO J. 16: 2482-2492.

[0363] Kozak M. (2002) Pushing the limits of the scanning mechanism for initiation of translation. Gene 299: 1-34.

[0364] Kwaks T H, Barnett P, Hemrika W, Siersma T, Sewalt R G, Satijn D P, Brons J F, van Blokland R, Kwakman P, Kruckeberg A L, Kelder A, Otte A P. (2003) Identification of anti-repressor elements that confer high and stable protein production in mammalian cells. Nat Biotechnol 21, 553-558. Erratum in: Nat Biotechnol 21, 822 (2003).

[0365] Lopez de Quinto, S, and Martinez-Salas, E. (1998) Parameters influencing translational efficiency in aphthovirus IRES-based bicistronic expression vectors Gene 217, 51-6.

[0366] Phi-Van L, Von Kreis J P, Ostertag W, and Stratling W H. (1990) The chicken lysozyme 5' matrix attachment region increases transcription from a heterologous promoter in heterologous cells and dampens position effects on the expression of transfected genes. Mol. Cell. Biol. 10: 2302-2307.

[0367] Martinez-Salas, E. (1999) Internal ribosome entry site biology and its use in expression vectors Curr Opin Biotechnol 10, 458-64.

[0368] McBurney, M W, Mai, T, Yang, X, and Jardine, K. (2002) Evidence for repeat-induced gene silencing in cultured Mammalian cells: inactivation of tandem repeats of transfected genes Exp Cell Res 274, 1-8.

[0369] Mizuguchi, H, Xu, Z, Ishii-Watabe, A, Uchida, E, and Hayakawa, T. (2000) IRES-dependent second gene expression is significantly lower than cap-dependent first gene expression in a bicistronic vector Mol Ther 1, 376-82.

[0370] Rees, S, Coote, J, Stables, J, Goodson, S, Harris, S, and Lee, M G. (1996) Bicistronic vector for the creation of stable mammalian cell lines that predisposes all antibiotic-resistant cells to express recombinant protein Biotechniques 20, 102-104, 106, 108-110.

[0371] Sautter, K, Enenkel, B. 2005. Selection of high-producing CHO cells using NPT selection marker with reduced enzyme activity. Biotechnol Bioeng. 89, 530-538.

[0372] Schorpp, M, Jager, R, Schellander, K, Schenkel, J, Wagner, E F, Weiher, H, and Angel, P. (1996) The human ubiquitin C promoter directs high ubiquitous expression of transgenes in mice Nucleic Acids Res 24, 1787-8.

[0373] Simonsen, C. S., Waltter, M. and Levinson, A. D. 1988. Expression of the plasmid-encoded type I dihydrofolate reductase gene in cultured mammalian cells: a novel selectable marker. Nucleic acids Res. 16, 22355-22246.

[0374] Stief A, Winter D M, Stratling W H, Sippel A E (1989) A nuclear DNA attachment element mediates elevated and position-independent gene activity. Nature 341: 343-345.

[0375] Van der Vlag, J, den Blaauwen, J L, Sewalt, R G, van Driel, R, and Otte, A P. (2000) Transcriptional repression mediated by polycomb group proteins and other chromatin-associated repressors is selectively blocked by insulators. J Biol Chem 275, 697-704.

[0376] Venkatesan, A, and Dasgupta, A. (2001) Novel fluorescence-based screen to identify small synthetic internal ribosome entry site elements Mol Cell Biol 21, 2826-37.

[0377] West A G, Gaszner M, Felsenfeld G (2002) Insulators: many functions, many mechanisms. Genes Dev. 16: 271-288.

[0378] Whitelaw, E, Sutherland, H, Kearns, M, Morgan, H, Weaving, L, and Garrick, D. (2001) Epigenetic effects on transgene expression Methods Mol Biol 158, 351-68.

[0379] Williams S, Mustoe T, Mulcahy T, Griffiths M, Simpson D, Antoniou M, Ivine A, Mountain A, Crombie R (2005) CpG-island fragments from the HNRPA2B1/CBX3 genomic locus reduce silencing and enhance transgene expression from the hCMV promoter/enhancer in mammalian cells. BMC Biotechnol. 5:17.

[0380] Yoon S K, Song J Y, and Lee G M (2003) Effect of low culture temperature on specific productivity, transcription level, and heterogeneity of erythropoietin in Chinese hamster ovary cells. Biotechnol Bioeng. 82: 289-298.

[0381] Yoon S K, Hong J K, Choo S H, Song J Y, Park H W, and Lee G M (2006) Adaptation of Chinese hamster ovary cells to low culture temperature: Cell growth and recombinant protein production. J Biotechnol. 122: 463-472.

Sequence CWU 1

1

1411749DNAHomo sapiensmisc_featuresequence of STAR1 1atgcggtggg ggcgcgccag agactcgtgg gatccttggc ttggatgttt ggatctttct 60gagttgcctg tgccgcgaaa gacaggtaca tttctgatta ggcctgtgaa gcctcctgga 120ggaccatctc attaagacga tggtattgga gggagagtca cagaaagaac tgtggcccct 180ccctcactgc aaaacggaag tgattttatt ttaatgggag ttggaatatg tgagggctgc 240aggaaccagt ctccctcctt cttggttgga aaagctgggg ctggcctcag agacaggttt 300tttggccccg ctgggctggg cagtctagtc gaccctttgt agactgtgca cacccctaga 360agagcaacta cccctataca ccaggctggc tcaagtgaaa ggggctctgg gctccagtct 420ggaaaatctg gtgtcctggg gacctctggt cttgcttctc tcctcccctg cactggctct 480gggtgcttat ctctgcagaa gcttctcgct agcaaaccca cattcagcgc cctgtagctg 540aacacagcac aaaaagccct agagatcaaa agcattagta tgggcagttg agcgggaggt 600gaatatttaa cgcttttgtt catcaataac tcgttggctt tgacctgtct gaacaagtcg 660agcaataagg tgaaatgcag gtcacagcgt ctaacaaata tgaaaatgtg tatattcacc 720ccggtctcca gccggcgcgc caggctccc 7492883DNAHomo sapiensmisc_featuresequence of STAR2 2gggtgcttcc tgaattcttc cctgagaagg atggtggccg gtaaggtccg tgtaggtggg 60gtgcggctcc ccaggccccg gcccgtggtg gtggccgctg cccagcggcc cggcaccccc 120atagtccatg gcgcccgagg cagcgtgggg gaggtgagtt agaccaaaga gggctggccc 180ggagttgctc atgggctcca catagctgcc ccccacgaag acggggcttc cctgtatgtg 240tggggtccca tagctgccgt tgccctgcag gccatgagcg tgcgggtcat agtcgggggt 300gccccctgcg cccgcccctg ccgccgtgta gcgcttctgt gggggtggcg ggggtgcgca 360gctgggcagg gacgcagggt aggaggcggg gggcagcccg taggtaccct gggggggctt 420ggagaagggc gggggcgact ggggctcata cgggacgctg ttgaccagcg aatgcataga 480gttcagatag ccaccggctc cggggggcac ggggctgcga cttggagact ggccccccga 540tgacgttagc atgcccttgc ccttctgatc ctttttgtac ttcatgcggc gattctggaa 600ccagatcttg atctggcgct cagtgaggtt cagcagattg gccatctcca cccggcgcgg 660ccggcacagg tagcggttga agtggaactc tttctccagc tccaccagct gcgcgctcgt 720gtaggccgtg cgcgcgcgct tggacgaagc ctgccccggc gggctcttgt cgccagcgca 780gctttcgcct gcgaggacag agagaggaag agcggcgtca ggggctgccg cggccccgcc 840cagcccctga cccagcccgg cccctccttc caccaggccc caa 88332126DNAHomo 
sapiensmisc_featuresequence of STAR3 3atctcgagta ctgaaatagg agtaaatctg aagagcaaat aagatgagcc agaaaaccat 60gaaaagaaca gggactacca gttgattcca caaggacatt cccaaggtga gaaggccata 120tacctccact acctgaacca attctctgta tgcagattta gcaaggttat aaggtagcaa 180aagattagac ccaagaaaat agagaacttc caatccagta aaaatcatag caaatttatt 240gatgataaca attgtctcca aaggaacaag gcagagtcgt gctagcagag gaagcacgtg 300agctgaaaac agccaaatct gctttgtttt catgacacag gagcataaag tacacaccac 360caactgacct attaaggctg tggtaaaccg attcatagag agaggttcta aatacattgg 420tccctcacag gcaaactgca gttcgctccg aacgtagtcc ctggaaattt gatgtccagt 480atagaaaagc agagcagtca aaaaatatag ataaagctga accagatgtt gcctgggcaa 540tgttagcagc accacactta agatataacc tcaggctgtg gactccctcc ctggggagcg 600gtgctgccgg cggcgggcgg gctccgcaac tccccggctc tctcgcccgc cctcccgttc 660tcctcgggcg gcggcggggg ccgggactgc gccgctcaca gcggcggctc ttctgcgccc 720ggcctcggag gcagtggcgg tggcggccat ggcctcctgc gttcgccgat gtcagcattt 780cgaactgagg gtcatctcct tgggactggt tagacagtgg gtgcagccca cggagggcga 840gttgaagcag ggtggggtgt cacctccccc aggaagtcca gtgggtcagg gaactccctc 900ccctagccaa gggaggccgt gagggactgt gcccggtgag agactgtgcc ctgaggaaag 960gtgcactctg gcccagatac tacacttttc ccacggtctt caaaacccgc agaccaggag 1020attccctcgg gttcctacac caccaggacc ctgggtttca accacaaaac cgggccattt 1080gggcagacac ccagctagct gcaagagttg tttttttttt tatactcctg tggcacctgg 1140aacgccagcg agagagcacc tttcactccc ctggaaaggg ggctgaaggc agggaccttt 1200agctgcgggc tagggggttt ggggttgagt gggggagggg agagggaaaa ggcctcgtca 1260ttggcgtcgt ctgcagccaa taaggctacg ctcctctgct gcgagtagac ccaatccttt 1320cctagaggtg gagggggcgg gtaggtggaa gtagaggtgg cgcggtatct aggagagaga 1380aaaagggctg gaccaatagg tgcccggaag aggcggaccc agcggtctgt tgattggtat 1440tggcagtgga ccctcccccg gggtggtgcc ggaggggggg atgatgggtc gaggggtgtg 1500tttatgtgga agcgagatga ccggcaggaa cctgccccaa tgggctgcag agtggttagt 1560gagtgggtga cagacagacc cgtaggccaa cgggtggcct taagtgtctt tggtctcctc 1620caatggagca gcggcggggc gggaccgcga ctcgggttta atgagactcc attgggctgt 1680aatcagtgtc 
atgtcggatt catgtcaacg acaacaacag ggggacacaa aatggcggcg 1740gcttagtcct acccctggcg gcggcggcag cggtggcgga ggcgacggca ctcctccagg 1800cggcagccgc agtttctcag gcagcggcag cgcccccggc aggcgcggtg gcggtggcgc 1860gcagccaggt ctgtcaccca ccccgcgcgt tcccaggggg aggagactgg gcgggagggg 1920ggaacagacg gggggggatt caggggcttg cgacgcccct cccacaggcc tctgcgcgag 1980ggtcaccgcg gggccgctcg gggtcaggct gcccctgagc gtgacggtag ggggcggggg 2040aaaggggagg agggacaggc cccgcccctc ggcagggcct ctagggcaag ggggcggggc 2100tcgaggagcg gaggggggcg gggcgg 212641625DNAHomo sapiensmisc_featuresequence of STAR4 4gatctgagtc atgttttaag gggaggattc ttttggctgc tgagttgaga ttaggttgag 60ggtagtgaag gtaaaggcag tgagaccacg taggggtcat tgcagtaatc caggctggag 120atgatggtgg ttcagttgga atagcagtgc atgtgctgta acaacctcag ctgggaagca 180gtatatgtgg cgttatgacc tcagctggaa cagcaatgca tgtggtggtg taatgacccc 240agctgggtag ggtgcatgtg gtgtaacgac ctcagctggg tagcagtgtg tgtgatgtaa 300caacctcagc tgggtagcag tgtacttgat aaaatgttgg catactctag atttgttatg 360agggtagtgc cattaaattt ctccacaaat tggttgtcac gtatgagtga aaagaggaag 420tgatggaaga cttcagtgct tttggcctga ataaatagaa gacgtcattt ccagttaatg 480gagacaggga agactaaagg tagggtggga ttcagtagag caggtgttca gttttgaata 540tgatgaactc tgagagagga aaaacttttt ctacctctta gtttttgtga ctggacttaa 600gaattaaagt gacataagac agagtaacaa gacaaaaata tgcgaggtta tttaatattt 660ttacttgcag aggggaatct tcaaaagaaa aatgaagacc caaagaagcc attagggtca 720aaagctcata tgccttttta agtagaaaat gataaatttt aacaatgtga gaagacaaag 780gtgtttgagc tgagggcaat aaattgtggg acagtgatta agaaatatat gggggaaatg 840aaatgataag ttattttagt agatttattc ttcatatcta ttttggcttc aacttccagt 900ctctagtgat aagaatgttc ttctcttcct ggtacagaga gagcaccttt ctcatgggaa 960attttatgac cttgctgtaa gtagaaaggg gaagatcgat ctcctgtttc ccagcatcag 1020gatgcaaaca tttccctcca ttccagttct caaccccatg gctgggcctc atggcattcc 1080agcatcgcta tgagtgcacc tttcctgcag gctgcctcgg gtagctggtg cactgctagg 1140tcagtctatg tgaccaggag ctgggcctct gggcaatgcc agttggcagc ccccatccct 1200ccactgctgg gggcctccta tccagaaggg cttggtgtgc 
agaacgatgg tgcaccatca 1260tcattcccca cttgccatct ttcaggggac agccagctgc tttgggcgcg gcaaaaaaca 1320cccaactcac tcctcttcag gggcctctgg tctgatgcca ccacaggaca tccttgagtg 1380ctgggcagtc tgaggacagg gaaggagtga tgaccacaaa acaggaatgg cagcagcagt 1440gacaggagga agtcaaaggc ttgtgtgtcc tggccctgct gagggctggc gagggccctg 1500ggatggcgct cagtgcctgg tcggctgcaa gaggccagcc ctctgcccat gaggggagct 1560ggcagtgacc aagctgcact gccctggtgg tgcatttcct gccccactct ttccttctaa 1620gatcc 162551571DNAHomo sapiensmisc_featuresequence of STAR5 5cacctgattt aaatgatctg tctggtgagc tcactgggtc tttactcgca tgctgggtcc 60acagctccac tgtcctgcag ggtccgtgag tgtgggcccc ttatctattt catcatcata 120accctgcgtg tcctcaactc ctggcacata ttgggtggcc ccatccacac acggttgttg 180agtgaatcca tgagatgaca aaggctatga tgtagactat atcatgagcc agaaccaggc 240tttcctacct ccagacaatc aagggccttg atttgggatt gagggagaaa ggagtagaag 300ccaggaagga gaagagattg aggtttacca agggtgcaaa gtcctggccc ctgactgtag 360gctgaaaact atagaaatga tagaacaatt ttgcaatgaa atgcagaaga ccctgcatca 420actttaggtg ggacttcggg tatttttatg gccacagaac atcctcccat ttacctgcat 480ggcccagaca cagacttcaa aacagttgag gccagcaggc tccaggtaag tggtaggatt 540ccagaatgcc ctcagagtgt tgtgggaggc agcaggcgat tttcctggac ttctgagttt 600atgagaaccc caaaccccaa ttggcattaa cattgaggtc tcaatgtatc atggcaggaa 660gcttccgagt ggtgaaaagg aaagtgaaca tcaaagctcg gaagacaaga gggtggagtg 720atggcaacca agagcaagac ccttccctct cctgtgatgg ggtggctcta tgtgaagccc 780ccaaactgga cacaggtctg gcagaatgag gaacccactg agatttagcg ccaacatcca 840gcataaaagg gagactgaca tagaatttga gttagttaaa aataaggcac aatgcttttc 900atgtattcct gagttttgtg gactggtgtt caatttgcag cattcttagt tgattaaatc 960tgagatgaag aaagagtgtc caacactttc accttggaaa gctctggaaa agcaaaaggg 1020agagacaatt agcttcatcc attaactcac ttagtcatta tgcattcatt catgtaacta 1080ccaaacacgt actgagtgcc taacactcct gagacactga gaagtttctt gggaatacaa 1140agatgaataa aaaccacgcc aggcaggagt tggaggaagg ttctggatgc caccacgctc 1200tacctcctgg ctggacacca ggcaatgttg gtaaccttct gcctccaatt tctgcaaata 1260cataattaat aaacacaagg ttatcttcta 
aacagttctt aaaatgagtc aactttgttt 1320aaacttgttc tttttagaga aaaatgtatt tttgaaagag ttggttagtg ctaggggaaa 1380tgtctgggca cagctcagtc tggtgtgaga gcaggaagca gctctgtgtg tctggggtgg 1440gtacgtatgt aggacctgtg ggagaccagg ttgggggaag gcccctcctc atcaagggct 1500cctttgcttt ggtttgcttt ggcgtgggag gtgctgtgcc acaagggaat acgggaaata 1560agatctctgc t 157161173DNAHomo sapiensmisc_featuresequence of STAR6 6tgacccacca cagacatccc ctctggcctc ctgagtggtt tcttcagcac agcttccaga 60gccaaattaa acgttcactc tatgtctata gacaaaaagg gttttgacta aactctgtgt 120tttagagagg gagttaaatg ctgttaactt tttaggggtg ggcgagaggg atgacaaata 180acaacttgtc tgaatgtttt acatttctcc ccactgcctc aagaaggttc acaacgaggt 240catccatgat aaggagtaag acctcccagc cggactgtcc ctcggccccc agaggacact 300ccacagagat atgctaactg gacttggaga ctggctcaca ctccagagaa aagcatggag 360cacgagcgca cagagcaggg ccaaggtccc agggacagaa tgtctaggag ggagattggg 420gtgagggtaa tctgatgcaa ttactgtggc agctcaacat tcaagggagg gggaagaaag 480aaacagtccc tgtcaagtaa gttgtgcagc agagatggta agctccaaaa tttgaaactt 540tggctgctgg aaagttttag ggggcagaga taagaagaca taagagactt tgagggttta 600ctacacacta gacgctctat gcatttattt atttattatc tcttatttat tactttgtat 660aactcttata ataatcttat gaaaacggaa accctcatat acccatttta cagatgagaa 720aagtgacaat tttgagagca tagctaagaa tagctagtaa gtaaaggagc tgggacctaa 780accaaaccct atctcaccag agtacacact cttttttttt ttccagtgta atttttttta 840atttttattt tactttaagt tctgggatac atgtgcagaa ggtatggttt gttacatagg 900tatatgtgtg ccatagtgga ttgctgcacc tatcaacccg tcatctaggt ttaagcccca 960catgcattag ctatttgtcc tgatgctctc cctcccctcc ccacaccaga caggccttgg 1020tgtgtgatgt tcccctccct gtgtccatgt gttctcactg ttcagctccc acttatgagt 1080gagaacgtgt ggtatttggt tttctgttcc tgtgttagtt tgctgaggat gatggcttcc 1140agcttcatcc atgtccctgc aaaggacacg atc 117372101DNAHomo sapiensmisc_featuresequence of STAR7 7aggtgggtgg atcacccgag gtcaggagtt caagaccagc ctggccaaca tggtaaaacc 60tcgtctctac taaaaaatac gaaaaattag ctggttgtgg tggtgcgtgc ttgtaatccc 120agctactcgg gaggctgagg caggagaatc acttgaatct gggaggcaga ggttgcagtg 
180agctgagata gtgccattgc actccagcct gggcaacaga cggagactct gtctccaaaa 240aaaaaaaaaa aaatcttaga ggacaagaat ggctctctca aacttttgaa gaaagaataa 300ataaattatg cagttctaga agaagtaatg gggatatagg tgcagctcat gatgaggaag 360acttagctta actttcataa tgcatctgtc tggcctaaga cgtggtgagc tttttatgtc 420tgaaaacatt ccaatataga atgataataa taatcacttc tgacccccct tttttttcct 480ctccctagac tgtgaagcag aaaccccata tttttcttag ggaagtggct acgcactttg 540tatttatatt aacaactacc ttatcaggaa attcatattg ttgccctttt atggatgggg 600aaactggaca agtgacagag caaaatccaa acacagctgg ggatttccct cttttagatg 660atgattttaa aagaatgctg ccagagagat tcttgcagtg ttggaggaca tatatgacct 720ttaagatatt ttccagctca gagatgctat gaatgtatcc tgagtgcatg gatggacctc 780agttttgcag attctgtagc ttatacaatt tggtggtttt ctttagaaga aaataacaca 840tttataaata ttaaaatagg cccaagacct tacaagggca ttcatacaaa tgagaggctc 900tgaagtttga gtttgttcac tttctagtta attatctcct gcctgtttgt cataaatgcg 960tttagtaggg agctgctaat gacaggttcc tccaacagag tgtggaagaa ggagatgaca 1020gctggcttcc cctctgggac agcctcagag ctagtgggga aactatgtta gcagagtgat 1080gcagtgacca agaaaatagc actaggagaa agctggtcca tgagcagctg gtgagaaaag 1140gggtggtaat catgtatgcc ctttcctgtt ttatttttta ttgggtttcc ttttgcctct 1200caattccttc tgacaataca aaatgttggt tggaacatgg agcacctgga agtctggttc 1260attttctctc agtctcttga tgttctctcg ggttcactgc ctattgttct cagttctaca 1320cttgagcaat ctcctcaata gctaaagctt ccacaatgca gattttgtga tgacaaattc 1380agcatcaccc agcagaactt aggttttttt ctgtcctccg tttcctgacc tttttcttct 1440gagtgcttta tgtcacctcg tgaaccatcc tttccttagt catctaccta gcagtcctga 1500ttcttttgac ttgtctccct acaccacaat aaatcactaa ttactatgga ttcaatccct 1560aaaatttgca caaacttgca aatagattac gggttgaaac ttagagattt caaacttgag 1620aaaaaagttt aaatcaagaa aaatgacctt taccttgaga gtagaggcaa tgtcatttcc 1680aggaataatt ataataatat tgtgtttaat atttgtatgt aacatttgaa taccttcaat 1740gttcttattt gtgttatttt aatctcttga tgttactaac tcatttggta gggaagaaaa 1800catgctaaaa taggcatgag tgtcttatta aatgtgacaa gtgaatagat ggcagaaggt 1860ggattcatat tcagttttcc atcaccctgg aaatcatgcg 
gagatgattt ctgcttgcaa 1920ataaaactaa cccaatgagg ggaacagctg ttcttaggtg aaaacaaaac aaacacgcca 1980aaaaccttta ttctctttat tatgaatcaa atttttcctc tcagataatt gttttattta 2040tttattttta ttattattgt tattatgtcc agtctcactc tgtcgcctaa gctggcatga 2100t 210181821DNAHomo sapiensmisc_featuresequence of STAR8 8gagatcacct cgaagagagt ctaacgtccg taggaacgct ctcgggttca caaggattga 60ccgaacccca ggatacgtcg ctctccatct gaggcttgct ccaaatggcc ctccactatt 120ccaggcacgt gggtgtctcc cctaactctc cctgctctcc tgagcccatg ctgcctatca 180cccatcggtg caggtccttt ctgaagagct cgggtggatt ctctccatcc cacttccttt 240cccaagaaag aagccaccgt tccaagacac ccaatgggac attccccttc cacctccttc 300tccaaagttg cccaggtgtt catcacaggt tagggagaga agcccccagg tttcagttac 360aaggcatagg acgctggcat gaacacacac acacacacac acacacacac acacacacac 420acacgactcg aagaggtagc cacaagggtc attaaacact tgacgactgt tttccaaaaa 480cgtggatgca gttcatccac gccaaagcca agggtgcaaa gcaaacacgg aatggtggag 540agattccaga ggctcaccaa accctctcag gaatattttc ctgaccctgg gggcagaggt 600tggaaacatt gaggacattt cttgggacac acggagaagc tgaccgacca ggcattttcc 660tttccactgc aaatgaccta tggcgggggc atttcacttt cccctgcaaa tcacctatgg 720cgaggtacct ccccaagccc ccacccccac ttccgcgaat cggcatggct cggcctctat 780ccgggtgtca ctccaggtag gcttctcaac gctctcggct caaagaagga caatcacagg 840tccaagccca aagcccacac ctcttccttt tgttataccc acagaagtta gagaaaacgc 900cacactttga gacaaattaa gagtccttta tttaagccgg cggccaaaga gatggctaac 960gctcaaaatt ctctgggccc cgaggaaggg gcttgactaa cttctatacc ttggtttagg 1020aaggggaggg gaactcaaat gcggtaattc tacagaagta aaaacatgca ggaatcaaaa 1080gaagcaaatg gttatagaga gataaacagt tttaaaaggc aaatggttac aaaaggcaac 1140ggtaccaggt gcggggctct aaatccttca tgacacttag atataggtgc tatgctggac 1200acgaactcaa ggctttatgt tgttatctct tcgagaaaaa tcctgggaac ttcatgcact 1260gtttgtgcca gtatcttatc agttgattgg gctcccttga aatgctgagt atctgcttac 1320acaggtcaac tccttgcgga agggggttgg gtaaggagcc cttcgtgtct cgtaaattaa 1380ggggtcgatt ggagtttgtc cagcattccc agctacagag agccttattt acatgagaag 1440caaggctagg tgattaaaga gaccaacagg gaagattcaa 
agtagcgact tagagtaaaa 1500acaaggttag gcatttcact ttcccagaga acgcgcaaac attcaatggg agagaggtcc 1560cgagtcgtca aagtcccaga tgtggcgagc ccccgggagg aaaaaccgtg tcttccttag 1620gatgcccgga acaagagcta ggcttccgga gctaggcagc catctatgtc cgtgagccgg 1680cgggagggag accgccggga ggcgaagtgg ggcggggcca tccttctttc tgctctgctg 1740ctgccgggga gctcctggct ggcgtccaag cggcaggagg ccgccgtcct gcagggcgcc 1800gtagagtttg cggtgcagag t 182191929DNAHomo sapiensmisc_featuresequence of STAR9 9cacttcctgg gagtggagca gaggctctgc gtggagcatc catgtgcagt actcttaggt 60acggaaggga ttgggctaaa ccatggatgg gagctgggaa gggaagggac caacttcagg 120ccccactggg acactggagc tgccaccctt tagagccctc ctaaccctac accagaggct 180gagggggacc tcagacatca cacacatgct ttcccatgtt ttcagaaatc tggaaacgta 240gaacttcagg ggtgagagtg cctagatatt gaatacaagg ctagattggg cttctgtaat 300atcccaaagg accctccagc tttttcacca gcacctaatg cccatcagat accaaagaca 360cagcttagga gaggttcacc ctgaagctga ggaggaggca gccggattag agttgactga 420gcaaggatga ctgccttctc cacctgacga tttcagctgc tgcccttttc ttttcctggg 480aatgcctgtc gccatggcct tctgtgtcca caggagagtt tgacccagat actcatggac 540caggcaaagg tgctgttcct cccagcccag ggcccaccat gaagcatgcc tgggagcctg 600gtaaggaccc agccactcct gggctgttga cattggcttc tcttgcccag cattgtagcc 660acgccactgc attgtactgt gagataagtc aaggtgggct caccaggacc tgcactaaat 720tgtgaaattc agctccaaag aactttggaa attacccatg catttaagca aaatgaatga 780tacctgagca aaccctttca cattggcaca agttacaatc ctgtctcatc ctcttgatta 840caaattccat ccaggcaaga gctgtatcac cctgaggtct ccccattcat gttttggtca 900ataatattta gtttcctttt gaaaatagat ttttgtgtta ctccattatg atgggcagag 960gccagatgct tatattctat ttaaatgact atgtttttct atctgtaact gggtttgtgt 1020tcaggtggta aatgcttttt ttttgcagtc agaagattcc tggaaggcga ccagaaatta 1080gctggccgct gtcagacctg aagttacttc taaagggcct ttagaaatga attctttttt 1140atgccttctc tgaattctga gaagtaggct tgacttcccc taagtgtgga gttgggagtc 1200aactcttctg aaaagaaagt ttcagagcat tttccaaagc catggtcagc tgtgggaagg 1260gaagacgatg gatagtacag ttgccggaaa acactgatgg aggcggatgc tccagctcag 1320ccaaagacct ttgttctgcc 
caccccagaa atgccccttc ctcaatcgca gaaacgttgc 1380cccatggctc ctgatactca gaatgcagcc tctgaccagg accatctgca tcctccagga 1440gctcgtaaga aatgcagcat cgtgggacct gctggcacct ggtgaaccca aacctgcagg 1500gctcctgggt gtgcttgggg cggctgcagg ggaagaggga gtcagcagcc tcctcctgac 1560cttcccgggg gctgcttttc tgaggggcca gaatgcaccg gttgaccttg ttgcatcact 1620ggcccatgac tggctgcttt ggtcaggtgt aaaaaggtgt ttccagaggg tctgctcctc 1680tcactatcgg accaggtttc catggagagc tcagcctccc agcaaggata gagaacttca 1740aatggctcaa agaactgaga ggccacacat gtgtgacctg aatagtctct gctgcaaaac 1800aaagggtttc ttaatgtaaa acgttctctt cctcacagag gggttcccag ctgctagtgg 1860gcatgttgca ggcatttcct gggctgcatc aggttgtcat aagccagagg atcatttttg 1920ggggctcat 1929101167DNAHomo sapiensmisc_featuresequence of STAR10 10aggtcaggag ttcaagacca gcctggccaa catggtgaaa ccctgtccct acaaaaaata 60caaaaattag ccgggcgtgg tggggggcgc ctataatccc agctactcag gatgctgaga 120caggagaatt gtttgaaccc gggaggtgga
ggttgcagtg aactgagatc gcgccactgc 180actccagcct ggtgacagag agagactccg tctcaacaac agacaaacaa acaaacaaac 240aacaacaaaa atgtttactg acagctttat tgagataaaa ttcacatgcc ataaaggtca 300ccttctacag tatacaattc agtggattta gtatgttcac aaagttgtac gttgttcacc 360atctactcca gaacatttac atcaccccta aaagaagctc tttagcagtc acttctcatt 420ctccccagcc cctgccaacc acgaatctac tntctgtctc tattctgaat atttcatata 480aaggagtcct atcatatggg ccttttacgt ctaccttctt tcacttagca tcatgttttt 540aagattcatc cacagtgtag cacgtgtcag ttaattcatt tcatcttatg gctggataat 600gctctattgt atgcatatcc ctcactttgc ttatccattc atcaactgat tgacatttgg 660gttatttcta ctttttgact attatgagta atgctgctat gaacattcct gtaccaatcg 720ttacgtggac atatgctttc aattctcctg agtatgtaac tagggttgga gttgctgggt 780catatgttaa ctcagtgttt catttttttg aagaactacc aaatggtttt ccaaagtgga 840tgcaacactt tacattccca ccagcaagat atgaaggttc caatgtctct acatttttgc 900caacacttgt gattttcttt tatttattta tttatttatt tatttttgag atggagtctc 960actctgtcac ccaggctgga gtgcagtggc acaatttcag ctcactgcaa tctccacctc 1020tcgggctcaa gcgatactcc tgcctcaacc tcccgagtaa ctgggattac aggcgcccac 1080caccacacca agctaatttt ttgtattttt agtagagacg gggtttcatc atgtcggcca 1140ggntgtactc gaactctgac ctcaagt 1167111377DNAHomo sapiensmisc_featuresequence of STAR11 11aggatcactt gagcccagga gttcaagacc agcctgggca acatagcgag aacatgtctc 60aaaaaggaaa aaaatggggg aaaaaaccct cccagggaca gatatccaca gccagtcttg 120ataagctcca tcattttaaa gtgcaaggcg gtgcctccca tgtggatgat tatttaatcc 180tcttgtactt tgtttagtcc tttgtggaaa tgcccatctt ataaattaat agaattctag 240aatctaatta aaatggttca actctacatt ttactttagg ataatatcag gaccatcaca 300gaatgtctga gatgtggatt taccctatct gtagctcact tcttcaacca ttcttttagc 360aaggctagtt atcttcagtg acaacccctt gctgccctct actatctcct ccctcagatg 420gactactctg attaagcttg agctagaata agcatgttat cccgggattt catatggaat 480attttataca tgagtgagcc attatgagtt gtttgaaaat ttattatgtt gagggagggt 540aaccgctgta acaaccatca ccaaatctaa tcgactgaat acatttgacg tttatttctt 600gttcacctga cagttcagtg ttacctaaat ttacatgaag acccagaggc ccacgctcct 660tcattttggg 
ctccaccgac ctccaaggtt tcagggccct ctgccccgcc ttctgcaccc 720acaggggaag agagtggagg atgcacacgc ccaggcctgg aagtgacgca tgtggcttcc 780ccgtccacag acttcaccca cagtccattg gccttcttaa gtcatggact cctgctgagc 840tgccagggtg catgggaaat ccatgtgact gtgtgccctg gaggaagggg agcgtttcgg 900tgagcacaca ggagtctttg ccactagacg ctgatgagga ttccccacag gcgatgaagc 960atggagactc atcttgtaac aaacagatga gttgttgaca tctcttaagt ttactttgtg 1020tgcagttttt attcagatag gaaaggctgt taaaatctta acacctaact ggaagaaggg 1080ttttagagaa gtgtggtttt cagtaagcca gttctttcca caatccaaga aacgaaataa 1140atttccagca tggagcagtt ggcaggtaag gtttttgttg tggtctcgcc caggcttgag 1200tgtaaccggt gtggtcatag ctcactacat tctcaaactc ctggccttaa gtcatcctcc 1260tgcctcagcc tcccaaaggc aagtaaggtt aagaataggg gaaaggtgaa gtttcacagc 1320ttttctagaa ttctttttat tcaagggact ctcagatcat caaacccacc cagaatc 1377121051DNAHomo sapiensmisc_featuresequence of STAR12 12atcctgcttc tgggaagaga gtggcctccc ttgtgcaggt gactttggca ggaccagcag 60aaacccaggt ttcctgtcag gaggaagtgc tcagcttatc tctgtgaagg gtcgtgataa 120ggcacgagga ggcaggggct tgccaggatg ttgcctttct gtgccatatg ggacatctca 180gcttacgttg ttaagaaata tttggcaaga agatgcacac agaatttctg taacgaatag 240gatggagttt taagggttac tacgaaaaaa agaaaactac tggagaagag ggaagccaaa 300caccaccaag tttgaaatcg attttattgg acgaatgtct cactttaaat ttaaatggag 360tccaacttcc ttttctcacc cagacgtcga gaaggtggca ttcaaaatgt ttacacttgt 420ttcatctgcc tttttgctaa gtcctggtcc cctacctcct ttccctcact tcacatttgt 480cgtttcatcg cacacatatg ctcatcttta tatttacata tatataattt ttatatatgg 540cttgtgaaat atgccagacg agggatgaaa tagtcctgaa aacagctgga aaattatgca 600acagtgggga gattgggcac atgtacattc tgtactgcaa agttgcacaa cagaccaagt 660ttgttataag tgaggctggg tggtttttat tttttctcta ggacaacagc ttgcctggtg 720gagtaggcct cctgcagaag gcattttctt aggagcctca acttccccaa gaagaggaga 780gggcgagact ggagttgtgc tggcagcaca gagacaaggg ggcacggcag gactgcagcc 840tgcagagggg ctggagaagc ggaggctggc acccagtggc cagcgaggcc caggtccaag 900tccagcgagg tcgaggtcta gagtacagca aggccaaggt ccaaggtcag tgagtctaag 960gtccatggtc agtgaggctg 
agacccaggg tccaatgagg ccaaggtcca gagtccagta 1020aggccgagat ccagggtcca gggaggtcaa g 1051131291DNAHomo sapiensmisc_featuresequence of STAR13 13agccactgag gtcctaactg cagccaaggg gccgttctgc acatgtcgct caccctctgt 60gctctgttcc ccacagagca aacgcacatg gcaacgttgg tccgctcagc cactggttct 120gtggtggaac ggtggatgtc tgcactgtga catcagctga gtaagtaaca acgactgagg 180atgccgctga cccagggctg gggaagggga ctcccagctc agacaggctt ggctgtggtt 240tgctttggga ggagagtgaa catcacaggg aatggctcat gtcagcccca ggagggtggg 300ctggcccctg gtccccgggc tccttctggc cctgcaggcg atagagagcc tcaacctgct 360gccgcttctc cttggcccgg gtgatggccg tctggaagag cctgcagtag aggtgcacag 420ccagcggaga gtcgtcattg ccgggtacag ggtaggtgat gaggcagggg ttgcagttgg 480tgtccacgat gcccactgtg gggatgttca tcttggctgc gtctctcacg gccacgtgtg 540gctcaaagat gttgttgagc gtgtgcagga agatgatgag gtccggcagg cggaccgtgg 600ggccaaagag gaggcgcgcg ttggtcagca tgccgcccct gaagtagcga gtgtgggcgt 660actcgccaca gtcacgggcc atgttctcaa tcaggtacga gaactgccgg ttgcggctta 720taaacaagat gatgcccttg cggtaggcca tgtgggcggt gaagttcaag gccagctgga 780ggtgcgtggc tgtctgttcc aggtcgatga tgtcgtggtc caggcggctc ccaaagatgt 840acggctccat aaacctgcca gagaccccac caaggcaagg gggatgagag ttcacggggc 900catctccact ggctccttgc aggaacacag acgcccacca gggactcccg ggctcctctg 960tgggggcact atgggctggg aagcacaatt tgcaacgctc cccgtgtgca tggacagcag 1020tgcagaccca tccaggccac ccctctgcat gcctcgtctc gtggcttaac ccctcctacc 1080ctctacctct tcccgaagga atcctaatag aactgacccc atatggatgt gtggacatcc 1140aacatgacgc caaaaggaca ttctgccccg tgcagctcac agggcagccg cctccgtcac 1200tgtcctcttc ccgaggcttt gcggatgagg cccctctggg gttggactta gcggggtgct 1260ctgggccaaa agcattaagg gatcagggca g 129114711DNAHomo sapiensmisc_featuresequence of STAR14 14ccctggacca gggtccgtgg tcttggtggg cactggcttc ttcttgctgg gtgttttcct 60gtgggtctct ggcaaggcac tttttgtggc gctgcttgtg ctgtgtgcgg gaggggcagg 120tgctctttcc tcttggagct ggaccctctg gggcgggtcc ccgtcggcct ccttgtgtgt 180tttctgcacc tggtacagct ggatggcctc ctcaatgccg tcgtcgctgc tggagtcgga 240cgcctcgggc gcctgtacgg cgctcgtgac 
tcgctttccc ctccttgcgg tgctggcgtt 300ccttttaatc ccacttttat tctgtactgc ttctgaaggg cggtgggggt tgctggcttt 360gtgctgccct ccttctcctg cgtggtcgtg gtcgtgacct tggacctgag gcttctgggc 420tgcacgtttg tctttgctaa ccgggggagg tctgcagaag gcgaactcct tctggacgcc 480catcaggccc tgccggtgca ccacctttgt agccggctct tggtgggatt tcgagagtga 540cttcgccgaa ttttcatgtg tgtctggttt cttctccact gacccatcac atttttgggt 600ctcatgctgt cttttctcat tcagaaactg ttctatttct gccctgatgc tctgctcaaa 660ggagtctgct ctgctcatgc tgactgggga ggcagagccc tggtccttgc t 711151876DNAHomo sapiensmisc_featuresequence of STAR15 15gagtccaaga tcaaggtgcc agcatcttgt gagggccttc ttgttacgtc actccctagc 60gaaagggcaa agagagggtg agcaagagaa aggggggctg aactcgtcct tgtagaagag 120gcccattccc gagacaatgg cattcatcca ttcactccac cctcatggcc tcaccacctc 180tcatgaggct ccacctccca gccctggttt gttggggatt aaatttccaa cacatgcctt 240ttgggggaca tgttaaaatt atagcacccc aaatgttaca ctatcttttg atgagcggta 300gttctgattt taagtctagc tggcctactt tttcttgcac gtgggatgct ttctgcctgt 360tccagggcag gcagctcttc tctgtccctc tgctggcccc acctcatcct ctgttgtcct 420cttccctcct tctgtgccct ggggtcctgg tgggggtgtg actgtcaact gcgttgggct 480aacttttttc cctgctggtg gcccgtaatg aaagaaagct tcttgctccc aagttcctta 540aatccaagct catagacaac gcggtctcac agcaggcctg gggccagcct cacgtgagcc 600ccttccctgg tgtagtcact ggcatggggg aatgggattt cctgttgccc tactgtgtgg 660ctgaggtggg ggttgcttcc tggagccagg ccttgtggaa gggcagtgcc cactgcagtg 720gatgctgggc cctgaatctg accccagtgt tcattggctc tgtgagaccc agtgagggca 780gggagggaag tggagctggg gtgagaagta gaggccctgc agggcccacg tgccagccac 840caggcctcag actaggctca gatgacggag agctgcacac ctgcccaacc caggccctgc 900agtgcccaca tgccagccgc tggggcccag acttgctcca gagggcggag agctttacac 960cggcccaacc caggccatgg ctccaaatgc gtgacagttt tgctgttgct tcttttagtc 1020attgtcaagt tgatgcttgt tttgcagagg accaaggctt tatgaaccta ttaccctgtg 1080tgaagagttt caccaggtta tggaaatttc tttaaaacca taccacagtt ttttcattat 1140tcatgtatat ttttaaaaat aattactgca ctcagtagaa taacatgaaa atgttgcctg 1200ttagcccttt tccagtttgc cccgagaata ctgggggcac 
ttgtggctgc aatgtttatc 1260ctgcggcagc tttgccatga agtatctcac ttttattatt atttttgcat tgctcgagta 1320tattgacttt ggaaacaaaa gacatcattc tatttatagc attatgtttt tagtagtggt 1380atttccatat acaagataca gtaattttcc gtcaatgaaa atgtcaaatt ctagaaaatg 1440taacattcct atgcgtggtg ttaacatcgt tctctaacag ttgttggccg aagattcgtt 1500tgatgaatcc gatttttcca aaatagccga ttctgatgat tcagacgatt ctgatgttct 1560gtttagaaat aattccaaga acagttttta cattttattt tcacattgaa aatcagtcag 1620atttgcttca gcctcaaaga gcacgtttat gtaaaattaa atgagtgctg gcagccagct 1680gcgctttgtt tttctaaatg ggaaaagggt taaatttcac tcagctttta aatgacagcg 1740cacagcctgt gtcatagagg gttggaggag atgactttaa ctgcctgtgg ttaggatccc 1800tttcccccag gaatgtctgg gagcccactg ccgggtttgc tgtccgtctc gtttggactc 1860agttctgcat gtactg 1876161282DNAHomo sapiensmisc_featuresequence of STAR16 16cgcccacctc ggctttccaa agtgctggga ttacaggcat gagtcactgc gcccatcctg 60attccaagtc tttagataat aacttaactt tttcgaccaa ttgccaatca ggcaatcttt 120gaatctgcct atgacctagg acatccctct ccctacaagt tgccccgcgt ttccagacca 180aaccaatgta catcttacat gtattgattg aagttttaca tctccctaaa acatataaaa 240ccaagctata gtctgaccac ctcaggcacg tgttctcagg acctccctgg ggctatggca 300tgggtcctgg tcctcagatt tggctcagaa taaatctctt caaatatttt ccagaatttt 360actcttttca tcaccattac ctatcaccca taagtcagag ttttccacaa ccccttcctc 420agattcagta atttgctaga atggccacca aactcaggaa agtattttac ttacaattac 480caatttatta tgaagaactc aaatcaggaa tagccaaatg gaagaggcat agggaaaggt 540atggaggaag gggcacaaag cttccatgcc ctgtgtgcac accaccctct cagcatcttc 600atgtgttcac caactcagaa gctcttcaaa ctttgtcatt taggggtttt tatggcagtt 660ccactatgta ggcatggttg ataaatcact ggtcatcggt gatagaactc tgtctccagc 720tcctctctct ctcctcccca gaagtcctga ggtggggctg aaagtttcac aaggttagtt 780gctctgacaa ccagccccta tcctgaagct attgaggggt cccccaaaag ttaccttagt 840atggttggaa gaggcttatt atgaataaca aaagatgctc ctatttttac cactagggag 900catatccaag tcttgcggga acaaagcatg ttactggtag caaattcata caggtagata 960gcaatctcaa ttcttgcctt ctcagaagaa agaatttgac caagggggca taaggcagag 1020tgagggacca agataagttt 
tagagcagga gtgaaagttt attaaaaagt tttaggcagg 1080aatgaaagaa agtaaagtac atttggaaga gggccaagtg ggcgacatga gagagtcaaa 1140caccatgccc tgtttgatgt ttggcttggg gtcttatatg atgacatgct tctgagggtt 1200gcatccttct cccctgattc ttcccttggg gtgggctgtc cgcatgcaca atggcctgcc 1260agcagtaggg aggggccgca tg 128217793DNAHomo sapiensmisc_featuresequence of STAR17 17atccgagggg aggaggagaa gaggaaggcg agcagggcgc cggagcccga ggtgtctgcg 60agaactgttt taaatggttg gcttgaaaat gtcactagtg ctaagtggct tttcggattg 120tcttatttat tactttgtca ggtttcctta aggagagggt gtgttggggg tgggggagga 180ggtggactgg ggaaacctct gcgtttctcc tcctcggctg cacagggtga gtaggaaacg 240cctcgctgcc acttaacaat ccctctatta gtaaatctac gcggagactc tatgggaagc 300cgagaaccag tgtcttcttc cagggcagaa gtcacctgtt gggaacggcc cccgggtccc 360cctgctgggc tttccggctc ttctaggcgg cctgatttct cctcagccct ccacccagcg 420tccctcaggg acttttcaca cctccccacc cccatttcca ctacagtctc ccagggcaca 480gcacttcatt gacagccaca cgagccttct cgttctcttc tcctctgttc cttctctttc 540tcttctcctc tgttccttct ctttctctgt cataatttcc ttggtgcttt cgccacctta 600aacaaaaaag agaaaaaaat aaaataaaaa aaacccattc tgagccaaag tattttaaga 660tgaatccaag aaagcgaccc acatagccct ccccacccac ggagtgcgcc aagacgcacc 720caggctccat cacagggccg agagcagcgc cactctggtc gtacttttgg gtcaagagat 780cttgcaaaag agg 79318492DNAHomo sapiensmisc_featuresequence of STAR18 18atctttttgc tctctaaatg tattgatggg ttgtgttttt tttcccacct gctaataaat 60attacattgc aacattcttc cctcaacttc aaaactgctg aactgaaaca atatgcataa 120aagaaaatcc tttgcagaag aaaaaaagct attttctccc actgattttg aatggcactt 180gcggatgcag ttcgcaaatc ctattgccta ttccctcatg aacattgtga aatgaaacct 240ttggacagtc tgccgcattg cgcatgagac tgcctgcgca aggcaagggt atggttccca 300aagcacccag tggtaaatcc taacttatta ttcccttaaa attccaatgt aacaacgtgg 360gccataaaag agtttctgaa caaaacatgt catctttgtg gaaaggtgtt tttcgtaatt 420aatgatggaa tcatgctcat ttcaaaatgg aggtccacga tttgtggcca gctgatgcct 480gcaaattatc ct 492191840DNAHomo sapiensmisc_featuresequence of STAR19 19tcacttcctg atattttaca ttcaaggcta gctttatgca tatgcaacct gtgcagttgc 
60acagggcttt gtgttcagaa agactagctc ttggtttaat actctgttgt tgccatcttg 120agattcatta taatataatt tttgaatttg tgttttgaac gtgatgtcca atgggacaat 180ggaacattca cataacagag gagacaggtc aggtggcagc ctcaattcct tgccaccctt 240ttcacataca gcattggcaa tgccccatga gcacaaaatt tgggggaacc atgatgctaa 300gactcaaagc acatataaac atgttacctc tgtgactaaa agaagtggag gtgctgacag 360cccccagagg ccacagttta tgttcaaacc aaaacttgct tagggtgcag aaagaaggca 420atggcagggt ctaagaaaca gcccatcata tccttgttta ttcatgttac gtccctgcat 480gaactaatca cttacactga aaatattgac agaggaggaa atggaaagat agggcaaccc 540atagttcttt ttccttttag tctttcctta tcagtaaacc aaagatagta ttggtaaaat 600gtgtgtgagt taattaatga gttagtttta ggcagtgttt ccactgttgg ggtaagaaca 660aaatatatag gcttgtattg agctattaaa tgtaaattgt ggaatgtcag tgattccaag 720tatgaattaa atatccttgt atttgcattt aaaattggca ctgaacaaca aagattaaca 780gtaaaattaa taatgtaaaa gtttaatttt tacttagaat gacattaaat agcaaataaa 840agcaccatga taaatcaaga gagagactgt ggaaagaagg aaaacgtttt tattttagta 900tatttaatgg gactttcttc ctgatgtttt gttttgtttt gagagagagg gatgtggggg 960cagggaggtc tcattttgtt gcccaggctg gacttgaact cctgggctcc agctatcctg 1020ccttagcttc ttgagtagct gggactacag gcacacacca cagtgtctga cattttctgg 1080attttttttt tttttttatt ttttttgtga gacaggttct ggctctgtta ctcaggttgc 1140agtgcagtgg catgatagcg gctcactgca gcctcaacct cctcagctta agctactctc 1200ccacttcagc ctcctgagta gccaggacta cagttgtgtg ccaccacacc tgtggctaat 1260ttttgtagag atggggtctc tccacgttgc cgaggctggt ctccaactcc tggtctcaag 1320cgaacctcct gacttggcct cccgaagtgc tgggattaca ggcttgagcc actgcatcca 1380gcctgtcctc tgtgttaaac ctactccaat ttgtctttca tctctacata aacggctctt 1440ttcaaagttc ccatagacct cactgttgct aatctaataa taaattatct gccttttctt 1500acatggttca tcagtagcag cattagattg ggctgctcaa ttcttcttgg tatattttct 1560tcatttggct tctggggcat cacactctct ttgagttact cattcctcat tgatagcttc 1620ttcctagtct tctttactgg ttcttcctct tctccctgac tccttaatat tgtttttctc 1680cccaggcttt agttcttagt cctcttctgt tatctattta cacccaattc tttcagagtc 1740tcatccagag tcatgaactt aaacctgttt ctgtgcagat 
aattcacatt attatatctc 1800cagcccagac tctcccgcaa actgcagact gatcctactg 184020780DNAHomo sapiensmisc_featuresequence of STAR20 20gatctcaagt ttcaatatca tgttttggca aaacattcga tgctcccaca tccttaccta 60aagctaccag aaaggctttg ggaactgtca acagagctac agaaaagtca gtaaagacca 120atggacccct caaacaaaaa cagccaagct tttctgccaa aaagatgact gagaagactg 180ttaaagcaaa aaactctgtt cctgcctcag atgatggcta tccagaaata gaaaaattat 240ttcccttcaa tcctctaggc ttcgagagtt ttgacctgcc tgaagagcac cagattgcac 300atctcccctt gagtgaagtg cctctcatga tacttgatga ggagagagag cttgaaaagc 360tgtttcagct gggcccccct tcacctttga agatgccctc tccaccatgg aaatccaatc 420tgttgcagtc tcctttaagc attctgttga ccctggatgt tgaattgcca cctgtttgct 480ctgacataga tatttaaatt tcttagtgct ttagagtttg tgtatatttc tattaataaa 540gcattatttg tttaacagaa aaaaagatat atacttaaat cctaaaataa aataaccatt 600aaaaggaaaa acaggagtta taactaataa gggaacaaag gacataaaat gggataataa 660tgcttaatcc aaaataaagc agaaaatgaa gaaaaatgaa atgaagaaca gataaataga 720aaacaaatag caatatgaaa gacaaacttg accgggtgtg gtggctgatg cctgtaatcc 78021607DNAHomo sapiensmisc_featuresequence of STAR21 21gatcaataat ttgtaatagt cagtgaatac aaaggggtat atactaaatg ctacagaaat 60tccattcctg ggtataaatc ctagacatat ttatgcatat gtacaccaag atatatctgc 120aagaatgttc acagcaaatc tctttgtagt agcaaaaggc caaaaggtct atcaacaaga 180aaattaatac attgtggcac ataatggcat ccttatgcca ataaaaatgg atgaaattat 240agttaggttc aaaaggcaag cctccagata atttatatca tataattcca tgtacaacat 300tcaacaacaa gcaaaactaa acatatacaa atgtcaggga aaatgatgaa caaggttaga 360aaatgattaa tataaaaata ctgcacagtg ataacattta atgagaaaaa aagaaggaag 420ggcttaggga gggacctaca gggaactcca aagttcatgg taagtactaa atacataatc 480aaagcactca aaatagaaaa tattttagta atgttttagc tagttaatat cttacttaaa 540acaaggtcta ggccaggcac ggtggctcac acctgtaatc ccagcacttt gggaggctga 600ggcgggt 607221380DNAHomo sapiensmisc_featuresequence of STAR22 22cccttgtgat ccacccgcct tggcctccca aagtgctggg attacaggcg tgagtcacta 60cgcccggcca ccctccctgt atattatttc taagtatact attatgttaa aaaaagttta 120aaaatattga tttaatgaat tcccagaaac 
taggatttta catgtcacgt tttcttatta 180taaaaataaa aatcaacaat aaatatatgg taaaagtaaa aagaaaaaca aaaacaaaaa 240gtgaaaaaaa taaacaacac tcctgtcaaa aaacaacagt tgtgataaaa cttaagtgcc 300tgaaaattta gaaacatcct tctaaagaag ttctgaataa aataaggaat aaaataatca 360catagttttg gtcattggtt ctgtttatgt gatggattat gtttattgat ttgtgtatgt 420tgaacttatc tcaatagatg cagacaaggc cttgataaaa gtttttaaca ccttttcatg 480ttgaaaactc tcaatagact aggtattgat gaaacatatc tcaaaataat agaagctatt 540tatgataaac ccatagccaa tatcatactg agtgggcaaa agctggaagc attccctttg 600aaaactggca caagacaagg atgccctctc tcaccactcc tattaaatgt agtattggaa 660gttctggcca gagcaatcag gcaggagaaa gaaaaggtat taaaatagga agagaggaag 720tcaaattgtc tctgtttgca gtaaacatga ttgtatattt agaaaacccc attgtctcat 780cctaaaaact ccttaagctg ataaacaact tcagcaaagt ctcaggatac aaaatcaatg 840tgcaaaaatc acaagcattc ctatacaccg
ataatagaca gcagagagcc aaatcatgag 900tgaagtccca ttcacaattg cttcaaagaa aataaaatac ttaggaatac aactttcacg 960ggacatgaag gacattttca aggacaacta aaaaccactg ctcaaggaaa tgagagagga 1020cacaaagaaa tggaaaaaca ttccatgctc atggaagaat caatatcatg aaaatggcca 1080tactgcccaa agtaatttat agattcaatg ctaaccccat caagccacca ttgactttct 1140tcacagaact agaaaaaaac tattttaaaa ctcatatgta gtcaaaaaga gtcggtatag 1200ccaagacaat cctaagcata aagaacaaag ctggatgcat cacgctgact tcaaaccata 1260ctacaaggct acagtaacca aaacagcatg gtactggtac caaaacagat agatagaccg 1320atagaacaga acagaggcct cggaaataac accacacatc tacaaccctt tgatcttcaa 1380231246DNAHomo sapiensmisc_featuresequence of STAR23 23atcccctcat ccttcagggc agctgagcag ggcctcgagc agctggggga gcctcactta 60atgctcctgg gagggcagcc agggagcatg gggtctgcag gcatggtcca gggtcctgca 120ggcggcacgc accatgtgca gccgccccca cctgttgctc tgcctccgcc acctggccat 180gggcttcagc agccagccac aaagtctgca gctgctgtac atggacaaga agcccacaag 240cagctagagg accttgtgtt ccacgtgccc agggagcatg gcccacagcc caaagaccag 300tcaggagcag gcaggggctt ctggcaggcc cagctctacc tctgtcttca cacagatggg 360agatttctgt tgtgattttg agtgatgtgc ccctttggtg acatccaaga tagttgctga 420agcaccgctc taacaatgtg tgtgtattct gaaaacgaga acttctttat tctgaaataa 480ttgatgcaaa ataaattagt ttggatttga aattctattc atgtaggcat gcacacaaaa 540gtccaacatt gcatatgaca caaagaaaag aaaaagcttg cattccttaa atacaaatat 600ctgttaacta tatttgcaaa tatatttgaa tacacttcta ttatgttaca tataatatta 660tatgtatatg tatatataat atacatatat atgttacata taatatactt ctattatgtt 720acatataata tttatctata agtaaataca taaatataaa gatttgagta gctgtagaac 780attgtcttat gtgttatcag ctactactac aaaaatatct cttccactta tgccagtttg 840ccatataaat atgatcttct cattgatggc ccagggcaag agtgcagtgg gtacttattc 900tctgtgagga gggaggagaa aagggaacaa ggagaaagtc acaaagggaa aactctggtg 960ttgccaaaat gtcaagtttc acatattccg agacggaaaa tgacatgtcc cacagaagga 1020ccctgcccag ctaatgtgtc acagatatct caggaagctt aaatgatttt tttaaaagaa 1080aagagatggc attgtcactt gtttcttgta gctgaggctg tgggatgatg cagatttctg 1140gaaggcaaag agctcctgct ttttccacac 
cgagggactt tcaggaatga ggccagggtg 1200ctgagcacta caccaggaaa tccctggaga gtgtttttct tactta 124624939DNAHomo sapiensmisc_featuresequence of STAR24 24acgaggtcac gagttcgaga ccagcctggc caagatggtg aagccctgtc tctactaaaa 60atacaacaag tagccgggcg cggtgacggg cgcctgtaat cccagctact caggaggctg 120aagcaggaga atctctagaa cccaggaggc ggaggtgcag tgagctgaga ctgccccgct 180gcactctagc ctgggcaaca cagcaagact ctgtctcaaa taaataaata aataaataaa 240taaataaata aataaataaa tagaaaggga gagttggaag tagatgaaag agaagaaaag 300aaatcctaga tttcctatct gaaggcacca tgaagatgaa ggccacctct tctgggccag 360gtcctcccgt tgcaggtgaa ccgagttctg gcctccattg gagaccaaag gagatgactt 420tggcctggct cctagtgagg aagccatgcc tagtcctgtt ctgtttgggc ttgatcctgt 480atcacttgat tgtctctcct ggactttcca tggattccag ggatgcaact gagaagttta 540tttttaatgc acttacttga agtaagagtt attttaaaac attttagcaa aggaaatgaa 600ttctgacagg ttttgcactg aagacattca catgtgagga aaacaggaaa accactatgc 660tagaaaaagc aaatgctgtt gagattgtct cacaaacaca aattgcgtgc cagcaggtag 720gtttgagcct caggttgggc acattttacc ttaagcgcac tgttggtgga acttaaggtg 780actgtaggac ttatatatac atacatacat ataatatata tacatattta tgtgtatata 840cacacacaca cacacacaca cacacagggt cttgctatct tgcccagggt ggtctccaac 900tctgggtctc aagcgatcct ctgcctcccc ttcccaaag 939251067DNAHomo sapiensmisc_featuresequence of STAR25 25cagcccctct tgtgtttttc tttatttctc gtacacacac gcagttttaa gggtgatgtg 60tgtataatta aaaggaccct tggcccatac tttcctaatt ctttagggac tgggattggg 120tttgactgaa atatgttttg gtggggatgg gacggtggac ttccattctc cctaaactgg 180agttttggtc ggtaatcaaa actaaaagaa acctctggga gactggaaac ctgattggag 240cactgaggaa caagggaatg aaaaggcaga ctctctgaac gtttgatgaa atggactctt 300gtgaaaatta acagtgaata ttcactgttg cactgtacga agtctctgaa atgtaattaa 360aagtttttat tgagcccccg agctttggct tgcgcgtatt tttccggtcg cggacatccc 420accgcgcaga gcctcgcctc cccgctgccc tcagcctccg atgacttccc cgcccccgcc 480ctgctcggtg acagacgttc tactgcttcc aatcggaggc acccttcgcg ggagcggcca 540atcgggagct ccggcaggcg gggaggccgg gccagttaga tttggaggtt caacttcaac 600atggccgaag caagtagcgc 
caatctaggc agcggctgtg aggaaaaaag gcatgagggg 660tcgtcttcgg aatctgtgcc acccggcact accatttcga gggtgaagct cctcgacacc 720atggtggaca cttttcttca gaagctggtc gccgccggca ggtaaagtgg acgcagccgc 780ggtgggagtg tttgttggca ccgaagctca aatcccgcga ggtcaggacg gccgcaggct 840ggcgcgcggt gacgtgggtc cgcgttgggg gcggggcagt cggacgaggc gacccagtca 900aatcctgagc cttaggagtc agggtattca cgcactgata acctgtagcg gaccgggata 960gctagctact ccttcctaca ggaagccccg ttttcactaa aatttcaggt ggttgggagg 1020aaagatagag cctttgcaaa ttagagcagg gttttttatt tttttat 106726540DNAHomo sapiensmisc_featuresequence of STAR26 26ccccctgaca agccccagtg tgtgatgttc cccactctgt gtccatgcat tctcattgtt 60caactcccat ctgtgagtga gaacatgcag tgtttggttt tctgtccttg agatagtttg 120ctgagaatga tggtttccag cttcatccat gtccttgcaa aggaagtgaa cttatccttt 180tttatggctt catagtattc catggcacat atgtgccaca tttttttaat ccagtctatc 240attgatggac atttgggttg gttccaagtc tttgctattg tgaatagcac cacaattaac 300atatgtgtgc atgtatacat ctttatagta gcatgattta taatccttcg ggtatatacc 360ctgtaatggg atcgctgggt caaatggtat ttctagttct agatccttga ggaatcacca 420cactgctttc cacaatggtt gaactaattt acgctcccac cagcagtgta aaagcattcc 480tatttctcca cgtcctctcc agtatctgtt gtttcctgac tttttaatga tcatcattct 540271520DNAHomo sapiensmisc_featuresequence of STAR27 27cttggccctc acaaagcctg tggccaggga acaattagcg agctgcttat tttgctttgt 60atccccaatg ctgggcataa tgcctgccat tatgagtaat gccggtagaa gtatgtgttc 120aaggaccaaa gttgataaat accaaagaat ccagagaagg gagagaacat tgagtagagg 180atagtgacag aagagatggg aacttctgac aagagttgtg aagatgtact aggcaggggg 240aacagcttaa ggagagtcac acaggaccga gctcttgtca agccggctgc catggaggct 300gggtggggcc atggtagctt tcccttcctt ctcaggttca gagtgtcagc cttgaacttc 360taattcccag aggcatttat tcaatgtttt cttctagggg catacctgcc ctgctgtgga 420agactttctt ccctgtgggt cgccccagtc cccagatgag acggtttggg tcagggccag 480gtgcaccgtt gggtgtgtgc ttatgtctga tgacagttag ttactcagtc attagtcatt 540gagggaggtg tggtaaagat ggagatgctg ggtcacatcc ctagagaggt gttccagtat 600gggcacatgg gagggctgga aggataggtt actgctagac gtagagaagc cacatccttt 
660aacaccctgg cttttcccac tgccaagatc cagaaagtcc ttgtggtttc gctgctttct 720cctttttttt tttttttttt tttctgagat ggagtctggc tctgtcgccc aggctggagt 780gcagtggcac gatttcggct cactgcaagt tccgcctcct aggttcatac cattctccca 840cctcagcctc ccgagtagct gggactacag gcgccaccac acccagctaa ttttttgtat 900ttttagtaga gacggcgttt caccatgtta gccaggatgg tcttgatccg cctgcctcag 960cctcccaaag tgctgggatt acaggcgtga gccaccgcgc ccggcctgct ttcttctttc 1020atgaagcatt cagctggtga aaaagctcag ccaggctggt ctggaactct tgacctcaag 1080tgatctgcct gcctcagcct cccaaagtgc tgagattaca ggcatgagcc agtccgaatg 1140tggctttttt tgttttgttt tgaaacaagg tctcactgtt gcccaggctg cagtgcagtg 1200gcatacctca gctccactgc agcctcgacc tcctgggctc aagcaatcct cccaactgag 1260cctccccagt agctggggct acaagcgcat gccaccacgc ctggctattt tttttttttt 1320tttttttttt gagaaggagt ttcattcttg ttgcccaggc tggagtgcaa tggcacagtc 1380tcagctcact gcagcctccg cctcctgggt tcaagcgatt ctcctgcctc agcctcccga 1440gtagctggga ttataggcac ctgccaccat gcctggctaa tttttttgta tttttagtag 1500ggatggggtt tcaccatgtt 152028961DNAHomo sapiensmisc_featuresequence of STAR28 28aggaggttat tcctgagcaa atggccagcc tagtgaactg gataaatgcc catgtaagat 60ctgtttaccc tgagaagggc atttcctaac tctccctata aaatgccaag tggagcaccc 120cagatgaaat agctgatatg ctttctatac aagccatcta ggactggctt tatcatgacc 180aggatattca cccactgaat atggctatta cccaagttat ggtaaatgct gtagttaagg 240gggtcccttc cacatggaca ccccaggtta taaccagaaa gggttcccaa tctagactcc 300aagagagggt tcttagacct catgcaagaa agaacttggg gcaagtacat aaagtgaaag 360caagtttatt aagaaagtaa agaaacaaaa aaatggctac tccataagca aagttatttc 420tcacttatat gattaataag agatggatta ttcatgagtt ttctgggaaa ggggtgggca 480attcctggaa ctgagggttc ctcccacttt tagaccatat agggtatctt cctgatattg 540ccatggcatt tgtaaactgt catggcactg atgggagtgt cttttagcat tctaatgcat 600tataattagc atataatgag cagtgaggat gaccagaggt cacttctgtt gccatattgg 660tttcagtggg gtttggttgg cttttttttt tttttaacca caacctgttt tttatttatt 720tatttattta tttatttatt tatatttttt attttttttt agatggagtc ttgctctgtc 780acccaggtta gagtgcagtg gcaccatctc ggctcactgc 
aagctctgcc tccttggttc 840acgccattct gctgcctcag cctcccgagt agctgggact acaggtgcct gccaccatac 900ccggctaatt ttttctattt ttcagtagag acggggtttc accgtgttag ccaggatggt 960c 961292233DNAHomo sapiensmisc_featuresequence of STAR29 29agcttggaca cttgctgatg ccactttgga tgttgaaggg ccgccctctc ccacaccgct 60ggccactttt aaatatgtcc cctctgccca gaagggcccc agaggagggg ctggtgaggg 120tgacaggagt tgactgctct cacagcaggg ggttccggag ggaccttttc tccccattgg 180gcagcataga aggacctaga agggccccct ccaagcccag ctgggcgtgc agggccagcg 240attcgatgcc ttcccctgac tcaggtggcg ctgtcctaaa ggtgtgtgtg ttttctgttc 300gccagggggt ggcggataca gtggagcatc gtgcccgaag tgtctgagcc cgtggtaagt 360ccctggaggg tgcacggtct cctccgactg tctccatcac gtcaggcctc acagcctgta 420ggcaccgctc ggggaagcct ctggatgagg ccatgtggtc atccccctgg agtcctggcc 480tggcctgaag aggaggggag gaggaggcca gcccctccct agccccaagg cctgcgaggc 540tgcaagcccg gccccacatt ctagtccagg cttggctgtg caagaagcag attgcctggc 600cctggccagg cttcccagct aggatgtggt atggcagggg tgggggacat tgaggggctg 660ctgtagcccc cacaacctcc ccaggtaggg tggtgaacag taggctggac aagtggacct 720gttcccatct gagattcaag agcccacctc tcggaggttg cagtgagccg agatccctcc 780actgcactcc agcctgggca acagagcaag actctgtctc aaaaaaacag aacaacgaca 840acaaaaaacc cacctctggc ccactgccta actttgtaaa taaagtttta ttggcacata 900gacacaccca ttcatttaca tactgctgcg gctgcttttg cattaccctt gagtagacga 960cagaccacgt ggccatggaa gccaaaaata tttactgtct ggccctttac agaagtctgc 1020tctagaggga gaccccggcc catggggcag gaccactggg cgtgggcaga agggaggcct 1080cggtgcctcc acgggcctag ttgggtatct cagtgcctgt ttcttgcatg gagcaccagg 1140ggtcagggca agtacctgga ggaggcaggc tgttgcccgc ccagcactgg gacccaggag 1200accttgagag gctcttaacg aatgggagac aagcaggacc agggctccca ttggctgggc 1260ctcagtttcc ctgcctgtaa gtgagggagg gcagctgtga aggtgaactg tgaggcagag 1320cctctgctca gccattgcag gggcggctct gccccactcc tgttgtgcac ccagagtgag 1380gggcacgggg tgagatgtca ccatcagccc ataggggtgt cctcctggtg ccaggtcccc 1440aagggatgtc ccatcccccc tggctgtgtg gggacagcag agtccctggg gctgggaggg 1500ctccacactg ttttgtcagt ggtttttctg aactgttaaa 
tttcagtgga aaattctctt 1560tcccctttta ctgaaggaac ctccaaagga agacctgact gtgtctgaga agttccagct 1620ggtgctggac gtcgcccaga aagcccaggt actgccacgg gcgccggcca ggggtgtgtc 1680tgcgccagcc atgggcacca gccaggggtg tgtctacgcc ggccaggggt aggtctccgc 1740cggcctccgc tgctgcctgg ggagggccgt gcctgacact gcaggcccgg tttgtccgcg 1800gtcagctgac ttgtagtcac cctgcccttg gatggtcgtt acagcaactc tggtggttgg 1860ggaaggggcc tcctgattca gcctctgcgg acggtgcgcg agggtggagc tcccctccct 1920ccccaccgcc cctggccagg gttgaacgcc cctgggaagg actcaggccc gggtctgctg 1980ttgctgtgag cgtggccacc tctgccctag accagagctg ggccttcccc ggcctaggag 2040cagccgggca ggaccacagg gctccgagtg acctcagggc tgcccgacct ggaggccctc 2100ctggcgtcgc ggtgtgactg acagcccagg agcgggggct gttgtaattg ctgtttctcc 2160ttcacacaga accttttcgg gaagatggct gacatcctgg agaagatcaa gaagtaagtc 2220ccgcccccca ccc 2233301851DNAHomo sapiensmisc_featuresequence of STAR30 30gggtgcattt ccacccaggg gacacttggc aatggtggga gacattgctt gttgtcacaa 60ctgggcatgg gagtgctgct gcgtctagtg ggtagaggcc agagatgctc ctaatatcct 120acaaggcaca gaacagcccc ccacaacaga gaattatcca gcctgaaaat gtccacagtg 180ctgaggttgg gaaaccctat tctagagcca acaggctgtg aagcttgact catggttcca 240tcaccaatag ctgcgtgacc ttggtgagtt ccttagctgc tctgtgcctc ggattcatgg 300taggttttcc ttgttaggtt taaatgagtg aagttataca gagggcctga agtctcatgg 360tattttacta gagcctcatt gtgttttagt tataattaga aattgggtaa ggtaaggaca 420cagaagaagc catctgatct gggggcttca cacttagaag tgacctcgga gcaattgtat 480tggggtggaa agggactaac agccaggagc agagggcaca ttggaattgg ggccagaggg 540cacagactgc cttgtccatc aggcatagca atggacagag gaaggggaat gactagttat 600ggctgcaagg ccaagtacag gggacttatt tctcatatct atctatctat ctacctaccg 660tctatttatc tatcatctat ctacttattt atctatctat ttatgcatgt gtaccaaccg 720aaagttttag taaatgcaca aactgcgata taatgaaaat ggaaattttc aaaagaagag 780aaatcacctg ccacctgact accttaacaa atgagtggtt ttcatctctc cttccaggcc 840tgtcattttt acagtgcttt agtcataaaa caggtcctct attctattgt tttatgtcac 900atgaaattgt accataagca ttttccatga tgtgactcca ctgtttcatt ttccattttt 960ttccagaatg aagataacct 
cattgttttt ttcctgattg taaaaatgct ctgtgctctt 1020tttttttttt tttaacaatg caggcagtac caaaaagtat gaagaagaat gtaatagttc 1080ccatttccca tctcactctt taaggccagc attttggtga acatccatcc gaacaaatct 1140ccacgcgttt atcaatttgt tgacttactc cttcttttat gtaaatatga acatgattta 1200actgccagtc catttggaac cttaaagtga aggtttttta ttgttggggt ttgctatggt 1260ctgaatatgt gtgtcccccc aaaatttatg ttgaatccta acgcccaatg cgattaggag 1320gtggggccat taggaggtga ttaagtcatg aagtcatcag ccctaatgaa tgggatttgt 1380ggccttgaaa agggacccca gagagctgcc ttgccccttc tgccatgtaa ggacacagtg 1440aggagctagg aagggggcct cagcagagac caaatgtgat ggtgcctcga tattggactt 1500cccagcctcc agaatgtgag aaatgaattt ctgttgttta taagtcaccc agtctatagt 1560attttgttct agcagcccaa acagactaag tcagggttgt tgttttagga agtggggaat 1620ggggccatgc atgggtgtac gccagaacaa aggaagccag caagtcctga aagatactgg 1680aaaagggaat agtgggcacg tgcagtgtgt tagtttcctg aggctgctat aacaaagcac 1740cacaggttgg gtggcttaaa taacagaaat tcattctccc atcattctgg ggaccagacg 1800tctgaaatca agactcctat gccatgctcc ttctgaaggc tccaggggag g 1851311701DNAHomo sapiensmisc_featuresequence of STAR31 31cacccgcctt ggccccccag agtgctggga ttacaagtgt aaaccaccat tcctggctag 60atttaatttt ttaaaaaata aagagaagta ggaatagttc attttaggga gagcccctta 120actgggacag gggcaggaca ggggtgaggc ttcccttant tcaagctcac ctcaaaccca 180cccaggactg tgtgtcacat tctccaataa aggaaaggtt gctgcccccg cctgtgagtg 240ctgcagtgga gggtagaggg ccgtgggcag agtgcttcat ggactgctca tcaagaaagg 300cttcatgaca atcggcccag ctgctgtcat cccacattct acttccagct aggagaaggc 360ggcttgccca cagtcaccca gccggcaagt gtcacccctg ggttggaccc agagctatga 420tcctgcccag gggtccagct gagaatcagg cccacgttct aggcagaggg gctcacctac 480tgggactcca gtagctgtag tgcatggagg catcatggct gcagcagcct ggacctggtc 540tcacactggc tgtccctgtg ggcaggccat cctcaatgcc aggtcaggcc caagcatgta 600tcccagacaa tgacaatggg gtggaatcct ctcttgtccc agaagccact cctcactgtt 660ctacctgagg aaggcagggg catggtggaa tcctgaagcc tgctgtgagg gtctccagcg 720aacttgcaca tggtcagccc tgccttctcc tccctgaact agattgagcg agagcaagaa 780ggacattgaa ccagcaccca 
aagaattttg gggaacggcc tctcatccag gtcaggctca 840cctccttttt aaaatttaat taattaatta attaattttt ttttagagac agagtcttac 900tgtgtggccc aggctgtagt gcagtggcac aatcatagtt cactgcagcc tcaaactccc 960cacctcagcc tctggattag ctgagactac aggtgcacca ccaccacacc cagctaatat 1020ttttattttt gtagagagag ggtttcacca tcttgcccag gctggtctca aactcctggg 1080ctcaagtgat cccgcccagg tctgaaagcc cccaggctgg cctcagactg tggggttttc 1140catgcagcca cccgagggcg cccccaagcc agttcatctc ggagtccagg cctggccctg 1200ggagacagag tgaaaccagt ggtttttatg aacttaactt agagtttaaa agatttctac 1260tcgatcactt gtcaagatgc gccctctctg gggagaaggg aacgtgactg gattccctca 1320ctgttgtatc ttgaataaac gctgctgctt catcctgtgg gggccgtggc cctgtccctg 1380tgtgggtggg gcctcttcca tttccctgac ttagaaacca cagtccacct agaacagggt 1440ttgagaggct tagtcagcac tgggtagcgt tttgactcca ttctcggctt tcttcttttt 1500ctttccagga tttttgtgca gaaatggttc ttttgttgcc gtgttagtcc tccttggaag 1560gcagctcaga aggcccgtga aatgtcgggg gacaggaccc ccagggaggg aaccccaggc 1620tacgcacttt agggttcgtt ctccagggag ggcgacctga cccccgnatc cgtcggngcg 1680cgnngnnacn aannnnttcc c 170132771DNAHomo sapiensmisc_featuresequence of STAR32 32gatcacacag cttgtatgtg ggagctagga ttggaacccc agaagtctgg ccccaggttc 60atgctctcac ccactgcata caatggcctc tcataaatca atccagtata aaacattaga 120atctgcttta aaaccataga attagtagcg taagtaataa atgcagagac catgcagtga 180atggcattcc tggaaaaagc ccccagaagg aattttaaat cagctttcgt ctaatcttga 240gcagctagtt agcaaatatg agaatacagt tgttcccaga taatgcttta tgtctgacca 300tcttaaactg gcgctgtttt tcaaaaactt aaaaacaaaa tccatgactc ttttaattat 360aaaagtgata catgtctact tgggaggctg aggtggtggg aggatggctt gagtttgagg 420ctgcagtatg ctactatcat gcctataaat agccgctgca ttccagcttg ggcaacatac 480ccaggcccta tctcaaaaaa ataaaaagta atacatctac attgaagaaa attaatttta 540ttgggttttt ttgcattttt attatacaca gcacacacag cacatatgaa aaaatgggta 600tgaactcagg cattcaactg gaagaacagt actaaatcaa tgtccatgta gtcagcgtga 660ctgaggttgg tttgtttttt cttttttctt ctcttctctt ctcttttctt tttttttgag 720acggagcttt gctctttttg cccaggcttg attgcaatgg cgtgatctca g 
771331368DNAHomo sapiensmisc_featuresequence of STAR33 33gcttttatcc tccattcaca gctagcctgg cccccagagt acccaattct ccctaaaaaa 60cggtcatgct gtatagatgt gtgtggcttg gtagtgctaa agtggccaca tacagagctc 120tgacaccaaa cctcaggacc atgttcatgc cttctcactg agttctggct tgttcgtgac 180acattatgac attatgatta tgatgacttg tgagagcctc agtcttctat agcactttta 240gaatgcttta taaaaaccat ggggatgtca ttatattcta acctgttagc acttctgttc 300gtattaccca tcacatccca acatcaattc tcatatatgc aggtacctct tgtcacgcgc 360gtccatgtaa ggagaccaca aaacaggctt tgtttgagca acaaggtttt tatttcacct 420gggtgcaggt gggctgagtc tgaaaagaga gtcagtgaag ggagacaggg gtgggtccac 480tttataagat ttgggtaggt agtggaaaat tacaatcaaa gggggttgtt ctctggctgg 540ccagggtggg ggtcacaagg tgctcagtgg gagagccttt gagccaggat gagccagaag 600gaatttcaca aggtaatgtc atcagttaag gcagggactg gccattttca cttcttttgt 660ggtggaatgt catcagttaa ggcaggaacc ggccattttc acttcttttg tgattcttca 720cttgcttcag gccatctgga cgtataggtg caggtcacag tcacagggga taagatggca 780atggcatagc ttgggctcag aggcctgaca
cctctgagaa actaaagatt ataaaaatga 840tggtcgcttc tattgcaaat ctgtgtttat tgtcaagagg cacttatttg tcaattaaga 900acccagtggt agaatcgaat gtccgaatgt aaaacaaaat acaaaacctc tgtgtgtgtg 960tgtgtgtgag tgtgtgtgta tgtgtgtgtg tgtgtattag agaggaaaag cctgtatttg 1020gaggtgtgat tcttagattc taggttcttt cctgcccacc ccatatgcac ccaccccaca 1080aaagaacaaa caacaaatcc caggacatct tagcgcaaca tttcagtttg catattttac 1140atatttactt ttcttacata ttaaaaaact gaaaatttta tgaacacgct aagttagatt 1200ttaaattaag tttgttttta cactgaaaat aatttaatat ttgtgaagaa tactaataca 1260ttggtatatt tcattttctt aaaattctga acccctcttc ccttatttcc ttttgacccg 1320attggtgtat tggtcatgtg actcatggat ttgccttaag gcaggagg 136834755DNAHomo sapiensmisc_featuresequence of STAR34 34actgggcacc ctcctaggca ggggaatgtg agaactgccg ctgctctggg gctgggcgcc 60atgtcacagc aggagggagg acggtgttac accacgtggg aaggactcag ggtggtcagc 120cacaaagctg ctggtgatga ccaggggctt gtgtcttcac tctgcagccc taacacccag 180gctgggttcg ctaggctcca tcctgggggt gcagaccctg agagtgatgc cagtgggagc 240ctcccgcccc tccccttcct cgaaggccca ggggtcaaac agtgtagact cagaggcctg 300agggcacatg tttatttagc agacaaggtg gggctccatc agcggggtgg cctggggagc 360agctgcatgg gtggcactgt ggggagggtc tcccagctcc ctcaatggtg ttcgggctgg 420tgcggcagct ggcggcaccc tggacagagg tggatatgag ggtgatgggt ggggaaatgg 480gaggcacccg agatggggac agcagaataa agacagcagc agtgctgggg ggcaggggga 540tgagcaaagg caggcccaag acccccagcc cactgcaccc tggcctccca caagccccct 600cgcagccgcc cagccacact cactgtgcac tcagccgtcg atacactggt ctgttaggga 660gaaagtccgt cagaacaggc agctgtgtgt gtgtgtgcgt gtatgagtgt gtgtgtgtga 720tccctgactg ccaggtcctc tgcactgccc ctggg 755351193DNAHomo sapiensmisc_featuresequence of STAR35 35cgacttggtg atgcgggctc ttttttggtt ccatatgaac tttaaagtag tcttttccaa 60ttctgtgaag aaagtcattg gtaggttgat ggggatggca ttgaatctgt aaattacctt 120gggcagtatg gccattttca caatgttgat tcttcctatc catgatgatg gaatgttctt 180ccattagttt gtatcctctt ttatttcctt gagcagtggt ttgtagttct ccttgaagag 240gtccttcaca tcccttgtaa gttggattcc taggtatttt attctctttg aagcaaattg 300tgaatgggag tncactcacg 
atttggctct ctgtttgtct gctgggtgta taaanaatgt 360ngtgatnttn gtacattgat ttngtatccn tgagacttng ctgaatttgc ttnatcngct 420tnngggaacc ttttgggctg aaacnatggg attttctaaa tatacaatca tgtcgtctgc 480aaacagggaa caatttgact tcctcttttc ctaattgaat acactttatc tccttctcct 540gcctaattgc cctgggcaaa acttccaaca ctatgntngn aataggagnt ggtgagagag 600ggcatccctg ttcttgttgc cagnttttca aagggaatgc ttccagtttt ggcccattca 660gtatgatatg ggctgtgggt ngtgtcataa atagctctta tnattttgaa atgtgtccca 720tcaataccta atttattgaa agtttttagc atgaangcat ngttgaattt ggtcaaaggc 780tttttctgca tctatggaaa taatcatgtg gtttttgtct ttggctcntg tttatatgct 840ggatnacatt tattgatttg tgtatatnga acccagcctn ncatcccagg gatgaagccc 900acttgatcca agcttggcgc gcngnctagc tcgaggcagg caaaagtatg caaagcatgc 960atctcaatta gtcagcaccc atagtccgcc cctacctccg cccatccgcc cctaactcng 1020nccgttcgcc cattctcgcc catggctgac taatnttttt annatccaag cggngccgcc 1080ctgcttganc attcagagtn nagagnnttg gaggccnagc cttgcaaaac tccggacngn 1140ttctnnggat tgaccccnnt taaatatttg gttttttgtn ttttcanngg nga 1193361712DNAHomo sapiensmisc_featuresequence of STAR36 36gatcccatcc ttagcctcat cgatacctcc tgctcacctg tcagtgcctc tggagtgtgt 60gtctagccca ggcccatccc ctggaactca ggggactcag gactagtggg catgtacact 120tggcctcagg ggactcagga ttagtgagcc ccacatgtac acttggcctc agtggactca 180ggactagtga gccccacatg tacacttggc ctcaggggac tcaggattag tgagccccca 240catgtacact tggcctcagg ggactcagga ttagtgagcc ccacatgtac acttggcctc 300aggggactca ggactagtga gccccacatg tacacttggc ctcaggggac tcagaactag 360tgagccccac atgtacactt ggcttcaggg gactcaggat tagtgagccc cacatgtaca 420cttggacacg tgaaccacat cgatgtgctg cagagctcag ccctctgcag atgaaatgtg 480gtcatggcat tccttcacag tggcacccct cgttccctcc ccacctcatc tcccattctt 540gtctgtcttc agcacctgcc atgtccagcc ggcagattcc accgcagcat cttctgcagc 600acccccgacc acacacctcc ccagcgcctg cttggccctc cagcccagct cccgcctttc 660ttccttgggg aagctccctg gacagacacc ccctcctccc agccatggct ttttcctgct 720ctgccccacg cgggaccctg ccctggatgt gctacaatag acacatcaga tacagtcctt 780cctcagcagc cggcagaccc agggtggact 
gctcggggcc tgcctgtgag gtcacacagg 840tgtcgttaac ttgccatctc agcaactagt gaatatgggc agatgctacc ttccttccgg 900ttccctggtg agaggtactg gtggatgtcc tgtgttgccg gccacctttt gtccctggat 960gccatttatt tttttccaca aatatttccc aggtctcttc tgtgtgcaag gtattagggc 1020tgcagcgggg gccaggccac agatctctgt cctgagaaga cttggattct agtgcaggag 1080actgaagtgt atcacaccaa tcagtgtaaa ttgttaactg ccacaaggag aaaggccagg 1140aaggagtggg gcatggtggt gttctagtgt tacaagaaga agccagggag ggcttcctgg 1200atgaagtggc atctgacctg ggatctggag gaggagaaaa atgtcccaaa agagcagaga 1260gcccacccta ggctctgcac caggaggcaa cttgctgggc ttatggaatt cagagggcaa 1320gtgataagca gaaagtcctt gggggccaca attaggattt ctgtcttcta aagggcctct 1380gccctctgct gtgtgacctt gggcaagtta cttcacctct agtgctttgg ttgcctcatc 1440tgtaaagtgg tgaggataat gctatcacac tggttgagaa ttgaagtaat tattgctgca 1500aagggcttat aagggtgtct aatactagta ctagtaggta cttcatgtgt cttgacaatt 1560ttaatcatta ttattttgtc atcaccgtca ctcttccagg ggactaatgt ccctgctgtt 1620ctgtccaaat taaacattgt ttatccctgt gggcatctgg cgaggtggct aggaaagcct 1680ggagctgttt cctgttgacg tgccagacta gt 1712371321DNAHomo sapiensmisc_featuresequence of STAR37 37aggatcacat ttaaggaagt gtgtggggtc cctggatgac accagcaccc agtgcggctc 60tgtctggcaa ccgctcccaa ggtggcagga gtgggtgtcc cctgtgtgtc agtgggcagc 120tcctgctgag cctacagctc actggggagc ctgacagcgg ggccatgtgc ctgacactcc 180tctctgcttg tggacctggc aaggcaggga gcagaaaaca gagccacttg aaggctttct 240gtctgcgtct gtgtgcagtg tggatttagt tgtgcttttt tcttgctggg agagcacagc 300caccatttac aagcagtgtc accctcatgg gtggcgagga cagaacagga gcctctgctc 360tctgtaccta tctgggcccg gtgggctccc ttgtcctggc ttccatctct gtctcagcga 420ccattcagcc ctgcgcagga acacatgttg cttagaaaag ccaaattcag cccttgtctc 480tgcctcctct ggtctcatga tgtgcatctg ttaccttgaa actggaaacc agtctatcaa 540tgtctgtgcc aattttttat tccctcccca acctccttcc ccatacgact ttttatttat 600gtaggatgtg tgctgtctaa tgatgggatg accacatttt tccatgttct aaaagtgctc 660ctctcccgca gggtcccagg gctggtggtt gctttgggtc tacagctacg tcttacccgc 720ctcctgcctc aacagcctgt gtggtggcaa agccggtgtg gggctgggga 
acgcagcgtt 780ctccaggagg gggacccggc tctccttctg cagtgcaggc gaaggcctag atgccagtgt 840gacctcccac aaggcgtggc ttccagactc cccggctgga agtgatgctt ttttgcctcc 900ggccctgggt ttgaagcagc ctggctttct cttggtaagt ggctggtgtc ttagcagctg 960caatctgagc tcagccacct acacaccacc gtggccgaca ctttcattaa aaagtttcct 1020gagacgactt gcgtgcatgt tgacttcatg atcagcgccg ctgggaagaa cccctgagcc 1080ggtggggtgg ggctggaagc agcaggtgca gtgatggggc tgggtgccca ggaggcctca 1140gtgctcaatc aggccaaggt ggccaagccc aggctgcagg gaaggccggc ctgggggttg 1200tgggtgagca caggcaggca ccagctgggc agtgttagga tgctggagca gcatccgtaa 1260ccccactgag tggggtagtc tggttggggc agggaccgct gttgctttgg cagagagaga 1320t 1321381445DNAHomo sapiensmisc_featuresequence of STAR38 38gatctatggg agtagcttcc ttagtgagct ttcccttcaa atactttgca accaggtaga 60gaattttgga gtgaaggttt tgttcttcgt ttcttcacaa tatggatatg catcttcttt 120tgaaaatgtt aaagtaaatt acctctcttt tcagatactg tcttcatgcg aacttggtat 180cctgtttcca tcccagcctt ctataaccca gtaacatctt ttttgaaacc agtgggtgag 240aaagacacct ggtcaggaac gcggaccaca ggacaactca ggctcaccca cggcatcaga 300ctaaaggcaa acaaggactc tgtataaagt accggtggca tgtgtatnag tggagatgca 360gcctgtgctc tgcagacagg gagtcacaca gacacttttc tataatttct taagtgcttt 420gaatgttcaa gtagaaagtc taacattaaa tttgattgaa caattgtata ttcatggaat 480attttggaac ggaataccaa aaaatggcaa tagtggttct ttctggatgg aagacaaact 540tttcttgttt aaaataaatt ttattttata tatttgaggt tgaccacatg accttaagga 600tacatataga cagtaaactg gttactacag tgaagcaaat taacatatct accatcgtac 660atagttacat ttttttgtgt gacaggaaca gctaaaatct acgtatttaa caaaaatcct 720aaagacaata catttttatt aactatagcc ctcatgatgt acattagatc gtgtggttgt 780ttcttccgtc cccgccacgc cttcctcctg ggatggggat tcattcccta gcaggtgtcg 840gagaactggc gcccttgcag ggtaggtgcc ccggagcctg aggcgggnac tttaanatca 900gacgcttggg ggccggctgg gaaaaactgg cggaaaatat tataactgna ctctcaatgc 960cagctgttgt agaagctcct gggacaagcc gtggaagtcc cctcaggagg cttccgcgat 1020gtcctaggtg gctgctccgc ccgccacggt catttccatt gactcacacg cgccgcctgg 1080aggaggaggc tgcgctggac acgccggtgg cgcctttgcc tgggggagcg 
cagcctggag 1140ctctggcggc agcgctggga gcggggcctc ggaggctggg cctggggacc caaggttggg 1200cggggcgcag gaggtgggct cagggttctc cagagaatcc ccatgagctg acccgcaggg 1260cggccgggcc agtaggcacc gggcccccgc ggtgacctgc ggacccgaag ctggagcagc 1320cactgcaaat gctgcgctga ccccaaatgc tgtgtccttt aaatgtttta attaagaata 1380attaataggt ccgggtgtgg aggctcaagc cttaatcccc agcacctggc gaggccgagg 1440aggga 1445392331DNAHomo sapiensmisc_featuresequence of STAR39 39gtgaaataga tcactaaagc tgattcctct tgtctaaatg aaactttcta ccctttgatg 60gacagctatg ctttccccat cctctcccgt cccccagccc ttggtaacca tcatcctact 120ctctacttgt aggagttcaa cttgtttaga ttttgtgagt gagaacatgt ggtatttgcc 180tttagagtcc tctaggttta tccatattgt gttaaatgac aggattccct gcctttttaa 240ggctgaatag tatttcattg taatatatat acatacacac acacatatac acacacatat 300atatacatat atacatatat gtacatagat acatatatat gtacatatat acacacacat 360atacacacat atatacacat atatacatat acatatatac acatatatgt acatatatat 420aacttttttt catttatcca ttcacttaat acatatgatg gagggcttta tatatgccag 480gctctgtgat gaatgctgga aattcaatag tgagaaagac tcagtctctg cctccaaaga 540gcatcatggg ctaggtgctg caacgaggaa ttgccaactg ttgtcatgag agcacagaga 600agggactcaa ccagccttga agaatcaggg gaggcttcta agctaatggt gtgtgcctgg 660ggatcacatt gtttcaagca gcagtaacag gatgtgctca ggtccagatg tgagagagag 720agagagcata tgtcttcaag aaactaacag tagctcccta tagctgaagc aggagtacaa 780aatagtgagt ttaagtgatg aggcaagaga tatgaagaag cttgaccatg cagctacacc 840gggcagcatg ccctctgaga catctcatgg aagccggaaa tgggagtgcc ttgataccaa 900gccagagaaa ttataatact aagtagatag actgagcagc actcctcctg ggaagaatga 960gacaagccct gaatttggag gtaagttgtg gattggtgat tagaggagag gtaacaggca 1020ccaaagcaag aaatagtatt gatgcaaagc tgaggttaat tggatgacaa aatgaagagc 1080ataaggggct cagacacaga ctgagcagaa aacgagtagc atctgaacct agattgagtt 1140actaatggat gagaaagagt tcttaaagtt gatgaccacg ggatccatat ataagaatgt 1200ccaatctccc caaattgatc cacgagttca gtgcaatgcc aatcaaaatc ccactaacaa 1260gtttatttta aaatgtaaat gaaaatacaa aatttttaaa aagcaaagca atattgaaaa 1320cccaggaaaa attaggagga cttacacaac ctgatctcaa 
aacttaccat tatcaagaca 1380gagtgttatt gacacaagga gagacaaata gataaacgga atgtggtagt ctggagatgc 1440acccacatgt atgtggtcaa ttgatttttg gccaaggcac caagtcaatt caaaggagca 1500aggaaagtag tacagaaaca accaaatatt gttttggaaa ataatgacaa agggcttata 1560accagaatat aagcatataa atataattct ttcaaatcaa taataagaag gcaaatatct 1620aataaaaatg agcaaagact tgaaaagtca cttaaaaagg cttattaatt agaaatatgc 1680aaatgttatt agtcttcagt ggaatttaca ttaaaccaca agggatacta ttatatctta 1740tgcccactag aataaccaaa ggaaaaaaga cagacaaaac aaaatgctgg tgaggatgtg 1800aagcaactgg aactctcata cattattggt ggtaatgtaa aatttataca accattatga 1860ataaaggttt ggcagtttct tacaaagttg aatgcacttc tccacgatga ctaggctttt 1920cactcatagg cgtctggctc cctagaactg aaaacatatg ttcacaagaa gacttgcaaa 1980tatatattct cccacgtcag gagatatttg ctatgcattt aactgacata agattagtgc 2040tagagtttat aatgaggttc ttcaaatcta aaagaaaatg caaagcatat aatagtaagg 2100ggtgcaggcc aggcgcagtg gctcactctg taatcccagc actttgggag gccgaggtgg 2160gcggatcaca aggtcaggag ttcgagacca acctggccaa catagtgaaa ccctgtctct 2220actaaaaata caaaaactag ccaggtgcgg tgtcatgcac ctgtagtccc agctactcgg 2280gaggccgagg caggagaatc acttgaacct gggaggtgga ggttgcagtg a 2331401071DNAHomo sapiensmisc_featuresequence of STAR40 40gctgtgattc aaactgtcag cgagataagg cagcagatca agaaagcact ccgggctcca 60gaaggagcct tccaggccag ctttgagcat aagctgctga tgagcagtga gtgtcttgag 120tagtgttcag ggcagcatgt taccattcat gcttgacttc tagccagtgt gacgagaggc 180tggagtcagg tctctagaga gttgagcagc tccagcctta gatctcccag tcttatgcgg 240tgtgcccatt cgctttgtgt ctgcagtccc ctggccacac ccagtaacag ttctgggatc 300tatgggagta gcttccttag tgagctttcc cttcaaatac tttgcaacca ggtagagaat 360tttggagtga aggttttgtt cttcgtttct tcacaatatg gatatgcatc ttcttttgaa 420aatgttaaag taaattacct ctcttttcag atactgtctt catgcgaact tggtatcctg 480tttccatccc agccttctat aacccagtaa catctttttt gaaaccagtg ggtgagaaag 540acacctggtc aggaacgcgg accacaggac aactcaggct cacccacggc atcagactaa 600aggcaaacaa ggactctgta taaagtaccg gtggcatgtg tattagtgga gatgcagcct 660gtgctctgca gacagggagt cacacagaca cttttctata 
atttcttaag tgctttgaat 720gttcaagtag aaagtctaac attaaatttg attgaacaat tgtatattca tggaatattt 780tggaacggaa taccaaaaaa tggcaatagt ggttctttct ggatggaaga caaacttttc 840ttgtttaaaa taaattttat tttatatatt tgaggttgac cacatgacct taaggataca 900tatagacagt aaactggtta ctacagtgaa gcaaattaac atatctacca tcgtacatag 960ttacattttt ttgtgtgaca ggaacagcta aaatctacgt atttaacaaa aatcctaaag 1020acaatacatt tttattaact atagccctca tgatgtacat tagatctcta a 1071411135DNAHomo sapiensmisc_featuresequence of STAR41 41cgtgtgcagt ccacggagag tgtgttctcc tcatcctcgt tccggtggtt gtggcgggaa 60acgtggcgct gcaggacacc aacatcagtc acgtatttca ttctggaaaa aaaagtagca 120caagcctcgg ctggttccct ccagctctta ccaggcagcc taagcctagg ctccattccc 180gctcaaggcc ttcctcaggg gcctgctcac cacaggagct gttcccatgc agggactaag 240gacatgcagc ctgcatagaa accaagcacc caggaaaaca tgattggatg gagcgggggg 300gtgtggtctc tagccttgtc cacctccggt cctcatgggt ctcacacctc ctgagaatgg 360gcaccgcaga ggccacagcc catacagcca agatgacaga ctccgtaagt gacagggatc 420cacagcagag tgggtgaaat gttccctata aactttacaa aattaatgag ggcaggggga 480ggggagaaat gaaaatgaac ccagctcgca gcacatcagc atcagtcact aggtcggcgt 540gctctctgac tgcttcctcg tagctgcttg gtgtctcatt gcctcagaag catgtagacc 600ctgtcacaag attgtagttc ccctaactgc tccgtagatc acaacttgaa ccttaggaaa 660tgctgttttc cctttgagat attcctttgg gtcctgtata ctgatggagc tactgactga 720gctgctccga aggaccccac gaggagctga ctaaaccaag agtgcagttt gtacaccctg 780atgattacat cccccttgcc ccaccaatca actctcccaa ttttccagcc cctcaccctc 840cagtcccctt aaaagcccca gcccaggccg ggcacagtgg ctcatgcctg taatcccagc 900actttgggag gccaaggtgg gcagatcacc tgagggcagg aatttgagac cagcctgacc 960aacatgaaga aaccccgtct ctattacaaa tacaaaatta gccgggcgtg ttgctgcata 1020ctggtaatcc cagctacttg ggagggtgag gcaggagaat cacttgaatc tgggaggcgg 1080aggttgcgat gagccgagac agcgccattg cactgcagcc tgggcaacaa gagca 113542735DNAHomo sapiensmisc_featuresequence of STAR42 42aagggtgaga tcactaggga gggaggaagg agctataaaa gaaagaggtc actcatcaca 60tcttacacac tttttaaaac cttggttttt taatgtccgt gttcctcatt agcagtaagc 120cctgtggaag 
caggagtctt tctcattgac caccatgaca agaccctatt tatgaaacat 180aatagacaca caaatgttta tcggatattt attgaaatat aggaattttt cccctcacac 240ctcatgacca cattctggta cattgtatga atgaatatac cataatttta cctatggctg 300tatatttagg tcttttcgtg caggctataa aaatatgtat gggccggtca cagtgactta 360cgcccgtagt cccagaactt tgggaggccg aggcgggtgg atcacctgag gtcgggagtt 420caaaaccagc ctgaccaaca tggagaaacc ccgtctctgc taaaaataca aaaattaact 480ggacacggtg gcgtatgcct gtaatcccag ctactcggga agctgaggca ggagaactgc 540ttgaacccag gaggcggagg ttgtggtgag tcgagattgc gccattgcac tccagcctgg 600gcaacaagag cgaaattcca tctcaaaaaa aagaaaaaag tatgactgta tttagagtag 660tatgtggatt tgaaaaatta ataagtgttg ccaacttacc ttagggttta taccatttat 720gagggtgtcg gtttc 735431227DNAHomo sapiensmisc_featuresequence of STAR43 43caaatagatc tacacaaaac aagataatgt ctgcccattt ttccaaagat aatgtggtga 60agtgggtaga gagaaatgca tccattctcc ccacccaacc tctgctaaat tgtccatgtc 120acagtactga gaccaggggg cttattccca gcgggcagaa tgtgcaccaa gcacctcttg 180tctcaatttg cagtctaggc cctgctattt gatggtgtga aggcttgcac ctggcatgga 240aggtccgttt tgtacttctt gctttagcag ttcaaagagc agggagagct gcgagggcct 300ctgcagcttc agatggatgt ggtcagcttg ttggaggcgc cttctgtggt ccattatctc 360cagcccccct gcggtgttgc tgtttgcttg gcttgtctgg ctctccatgc cttgttggct 420ccaaaatgtc atcatgctgc accccaggaa gaatgtgcag gcccatctct tttatgtgct 480ttgggctatt ttgattcccc gttgggtata ttccctaggt aagacccaga agacacagga 540ggtagttgct ttgggagagt ttggacctat gggtatgagg taatagacac agtatcttct 600ctttcatttg gtgagactgt tagctctggc cgcggactga attccacaca gctcacttgg 660gaaaacttta ttccaaaaca tagtcacatt gaacattgtg gagaatgagg gacagagaag 720aggccctaga tttgtacatc tgggtgttat gtctataaat agaatgcttt ggtggtcaac 780tagacttgtt catgttgaca tttagtcttg ccttttcggt ggtgatttaa aaattatgta 840tatcttgttt ggaatatagt ggagctatgg tgtggcattt tcatctggct ttttgtttag 900ctcagcccgt cctgttatgg gcagccttga agctcagtag ctaatgaaga ggtatcctca 960ctccctccag agagcggtcc cctcacggct cattgagagt ttgtcagcac cttgaaatga 1020gtttaaactt gtttattttt aaaacattct tggttatgaa tgtgcctata ttgaattact 
1080gaacaacctt atggttgtga agaattgatt tggtgctaag gtgtataaat ttcaggacca 1140gtgtctctga agagttcatt tagcatgaag tcagcctgtg gcaggttggg tggagccagg 1200gaacaatgga gaagctttca tgggtgg 1227441586DNAHomo sapiensmisc_featuresequence of STAR44 44cacctgcctc agcctcccaa agtgctgaga ttcaaagaaa ttttcatgga gaggggacag 60atggagtcaa ttcttgtggg gtgaacatga gtaccacagt tagactgagg ttgggaaaga 120ttttccagac aattggaaga gcatgtgaaa gacacagatt ttgagaaatg ttaagtctag 180ggaactgcaa ggcttttggc acaagaaagc cactgtagac tatagaggca ggatgcctag 240attcaaatcc caactgctac acttctaagc tttgtaattt tggcaagttt ttaccctcta 300ttttcttatc tataaaatat agattttata tatatagata tagatatata gatagataat 360aattgtgcat gcctaataaa gttgtcaaag attaaatgtt atatgtgaag tattttgtac 420ggtgatagga acccaggaag ggctctatga atattatgta ttattattat tctaaagtag 480ctggaataca atgttcaaag gagatagtgg caggagataa gtttgaattg aaagattgag 540gccagaacat aaagtgcctc ctatattata ttttacataa ttggaacatc attgaaaaat 600ttaagtatta tttatgtgtg tatgtgtgtt
ttatataatt aattctagtt catcatttta 660aaatatcttt ctgatgtcac tgtgaacaac agatgagaag aagtgaatcc tgagttaagg 720agaccagctc tctgattact gccataatcc agggagggta ccataaggat ttcaactgga 780agtgaatcca tcatgatgga gaggaaggac agggctgaaa aatacttagg aagtagtatc 840agtaggactg gttaagagag agcagaggca ggctacaggg gttggaggtg tcaatcacag 900agatagggaa aatgggagga gaagcaggct ttgaaaaagt ggcttgtctt gtaaaattat 960gtgctgttaa aacagtacaa gaaattaata tattcaatcc caaaatacag ggacaattct 1020ttttgaaaga gttacccaga tagtcttcct tgaagttttc agttaaagaa atttcttgtt 1080aacaaataat gtagtcatag aagaaaacac ttaaaacttt attgaataaa gctaataaat 1140catttaatat aatttatagg aaattgttac ataacacaca cattcaatac tttttgctaa 1200agtataaatt aatggaagga gagcacgcac acagaggttg aattatgttt atgactttat 1260tagtcaagaa tacaaaattg agtagctaca tcaagcagaa gcacatgctt tacaatccag 1320cacagaatcc cttgacatcc aaactcccga aacagacatg taaatacaga tgacattgtc 1380agaacaaaat agggtctcac ccgacctata atgttctttt cttgatataa atatgcacat 1440gaattgcata cggtcatatg gttccaatta ccattatttc ctctgggctt agctatccat 1500ctaaggggaa tttacaccaa cactgtactt ctacttgcaa gaatatatga aagcatagtt 1560aacttctggc ttaggacccc aactca 1586451981DNAHomo sapiensmisc_featuresequence of STAR45 45atggatcata gggtaaataa atttataatt tcttgagaaa gcttcgtact gttttccaag 60atggctgtac taatttccat tcctaccaac agtgtacagg gtttcttttt ctccacatcc 120tcaccaacac ttatcttcca tcttttttta taatagccct agtaaaatgt gtgaggtgat 180atctcattgt ggcattgatt tgcacttctc tgataattag gaatgtttat gattttttca 240tgtacctggt tggccttttg tatgatgtag gaaatgtcta ttctgattct ttgcttattt 300tttaataagc atagtttttt tcttattttt gagtaggttg agttgcttat atattattat 360atgagcccct tacctgatgt atggtttaaa aatattatcc catttgtggg ttctcttaat 420tctatcattg cttcttttcc tgtggaaaag ttttaagttt tatgcagtct catttgtgtg 480ttttgctttt gttgcctttt ggaataatct acagaaaatc atagctcagg ccaatgtcat 540acagtctcct tctatatttc cttgtagtag ttttacattt aaactttaat tttgatttga 600tgcttgtata aagagcaaaa taaaagtcaa attttattct tctgtatgtg gatagtcagt 660tttgtctaca ccatttattg aaaataattt tctttcttca ctgtgtattt ttagttattt 
720tatcaaaaaa tcaattgacc acagacacac ggatttattt acaggttcta tatccctttg 780tactgtttta catgtctgtt tttatgccat tgctatgctg ttttaattcc tatagctttg 840taatagagtt tggagtcagg tagtctgatg cctccagctt tgttcttttt gttcaagatt 900gctttggttg gtccaggtct tttgtggttc catacaaatt ttagcagtaa tttttctatt 960tctgtgaaga atgacattgg aatttgatag tggttgcatt taatctgtag attgctttgg 1020gtagcattga cacttttaca atactaattt ttgaatccat caatgaagga tgtttctcca 1080tttatttatg ccattttaat ttttttcatc aatgtgctat agttttcagt atgtaaatct 1140tttatggttt tgattaaatt tactcctgtc ttttatatat ttatatatct gttttgattc 1200tattataaat tgaattgcct ttatttttca ggtaatagtt tgtcattagt taatagaaac 1260aataatgata tttgtatgtt gattttgtaa ctattaactt tattgaattt cttcatcagc 1320tataaccatt tattttggtg gaatctttaa gattttctct atcttaagat tatattttca 1380aaaaacagaa acaatcttac ctcttccttc cctatgtgga tttcttttac gtctttgtct 1440tgtgtaactg ttctggctag gcaattacac ataatgtttt catcatttat aattttacat 1500cacatccatc tattgtggca cattgattgc tacttttcaa gttgtaaacc tggacattta 1560tcactactct tcctccaata caggagtcca tggcgtggtg tgggccctac tgtgccacag 1620tccagggcac ggctgggctg aggttctctt gtgcaagagt ccgtggctct gcggagcaag 1680agttctccag tgccttagtc cagggttagg caggggtggg gctccttcag tagcttagtc 1740cagtgcgccg ccctgcgagg gtcctcctga gcaggagtac acgatgaggc agggtcctac 1800tgtgccttag cccaggaagc ggggggctgg gtcctctggt gccatagtcc aggctgccgg 1860gagctgggtc ctctggtgcc atagctcagg ccggcgggag ctgggtcctc tggtgccgta 1920gtccagggtg cagcagaaca ggagtcctgc ggagcagtag tccagggcac gctggggcgt 1980g 1981461859DNAHomo sapiensmisc_featuresequence of STAR46 46attgtttttc tcgcccttct gcattttctg caaattctgt tgaatcattg cagttactta 60ggtttgcttc gtctccccca ttacaaacta cttactgggt ttttcaaccc tagttccctc 120atttttatga tttatgctca tttctttgta cacttcgtct tgctccatct cccaactcat 180ggcccctggc tttggattat tgttttggtc ttttattttt tgtcttcttc tacctcaaca 240cttatcttcc tctcccagtc tccggtaccc tatcaccaag gttgtcatta acctttcata 300ttattcctca ttatccatgt attcatttgc aaataagcgt atattaacaa aatcacaggt 360ttatggagat ataattcaca taccttaaaa ttcaggcttt taaagtgtac 
ctttcatgtg 420gtttttggta tattcacaaa gttatgcatt gatcaccacc atctgattcc ataacatgtt 480caatacctca aaaagaagtc tgtactcatt agtagtcatt tcacattcac cactccctct 540ggctctgggc agtcactgat ctttgtgtct ctatggattt gcctagtcta ggtattttta 600tgtaaatggc atcatacaac atgtgacctt ttgtttggct tttttcattt agcaaaatgt 660tatcaaggtc tgtccctgtt gtagcatgta ttagcacttc atttcttata tgctgaatga 720tatactttat ttgtccatca gttgttcatg ctttatttgt ccatcagttg atgaacattt 780gcgtttttgc cactttgggc tattaagaat aatgctactg tgaacaagtg tgtacaagtt 840cctctacaaa tttttgtgtg gacatatcct ttcagttctc tcaggtgtat atctgggaat 900tgaattgctg ggtcgtgtag tagctatgtt aaacactttg agaaactgct ataatgttct 960ccagagctgt accattttaa attctgtgta tgaggattcc acgttctcca cttcctcacc 1020agtgtatgga tttgggggta tactttttaa aaagtgggat taggctgggc acagtggctc 1080acacctgtaa tcccaacact tcaggaagct gaggtgggag gatcacttga gcctagtagt 1140ttgagaccag cctgggcaac atagggagac cctgtctcta caaaaaataa tttaaaataa 1200attagctggg cgttgtggca cacacctgta gtcccagcta catgggaggc tgaggtggaa 1260ggattccctg agcccagaag tttgaggttg cagtgagcca tgatggcagc actatactgt 1320agcctgggtg tcagagcaag actccgtttc agggaagaaa aaaaaaagtg ggatgatatt 1380tttgacactt ttcttcttgt tttcttaatt tcatacttct ggaaattcca ttaaattagc 1440tggtaccact ctaactcatt gtgtttcatg gctgcatagt aatattgcat aatataaata 1500taccattcat tcatcaaagt tagcagatat tgactgttag gtgccaggca ctgctctaag 1560cgttaaagaa aaacacacaa aaacttttgc attcttagag tttattttcc aatggagggg 1620gtggagggag gtaagaattt aggaaataaa ttaattacat atatagcata gggtttcacc 1680agtgagtgca gcttgaatcg ttggcagctt tcttagtagt ataaatacag tactaaagat 1740gaaattactc taaatggtgt tacttaaatt actggaatag gtattactat tagtcacttt 1800gcaggtgaaa gtggaaacac catcgtaaaa tgtaaaatag gaaacagctg gttaatgtt 1859471082DNAHomo sapiensmisc_featuresequence of STAR47 47atcattagtc attagggaaa tgcaaatgaa aaacacaagc agccaccaat atacacctac 60taggatgatt taaaggaaaa taagtgtgaa gaaggacgta aagaaattgt aaccctgata 120cattgatggt agaaatggat aaagttgcag ccactgtgaa aaacagtctg cagtggctca 180gaaggttaaa tatagaaccc ctgttggacc caggaactct actcttaggc 
accccaaaga 240atagagaaca gaaatcaaac agatgtttgt atactaatgt ttgtagcatc acttttcaca 300ggagccaaaa ggtggaaata atccaaccat cagtgaacaa atgaatgtaa taaaagcaag 360gtggtctgca tgcaatgcta catcatccat ctgtaaaaaa cgaacatcat tttgatagat 420gatacaacat gggtggacat tgagaacatt atgcttagtg aaataagcca gacacaaaag 480gaatatattg tataattgta attacatgaa gtgcctagaa tagtcaaatt catacaagag 540aaagtgggat aggaatcacc atgggctgga aataggggga aggtgctata ctgcttattg 600tggacaaggt ttcgtaagaa atcatcaaaa ttgtgggtgt agatagtggt gttggttatg 660caaccctgtg aatatattga atgccatgga gtgcacactt tggttaaaag gttcaaatga 720taaatattgt gttatatata tttccccacg atagaaaaca cgcacagcca agcccacatg 780ccagtcttgt tagctgcctt cctttacctt caagagtggg ctgaagcttg tccaatcttt 840caaggttgct gaagactgta tgatggaagt catctgcatt gggaaagaaa ttaatggaga 900gaggagaaaa cttgagaatc cacactactc accctgcagg gccaagaact ctgtctccca 960tgctttgctg tcctgtctca gtatttcctg tgaccacctc ctttttcaac tgaagacttt 1020gtacctgaag gggttcccag gtttttcacc tcggcccttg tcaggactga tcctctcaac 1080ta 1082481242DNAHomo sapiensmisc_featuresequence of STAR48 48atcatgtatt tgttttctga attaattctt agatacatta atgttttatg ttaccatgaa 60tgtgatatta taatataata tttttaattg gttgctactg tttataagaa tttcattttc 120tgtttacttt gccttcatat ctgaaaacct tgctgatttg attagtgcat ccacaaattt 180tcttggattt tctatgggta attacaaatc tccacacaat gaggttgcag tgagccaaga 240tcacaccact gtactccagc ctgggcgaca gagtgagaca ccatctcaca aaaacacata 300aacaaacaaa cagaaactcc acacaatgac aacgtatgtg ctttcttttt ttcttcctct 360ttctataata tttctttgtc ctatcttaac tgaactggcc agaaacccca ggacaatgat 420aaatacgagc agtgtcaaca gacatctcat tccctttcct agcttttata aaaataacga 480ttatgcttca acattacata tggtggtgtc gatggttttg ttatagataa gcttatcagg 540ttaagaaatt tgtctgcgtt tcctagtttg gtataaagat tttaatataa atgaatgttg 600tattttatca tcttattttt ttcctacatc tgctaaggta atcctgtgtt ttcccctttt 660caatctccta atgtggtgaa tgacattaaa ataccttcta ttgttaaaat attcttgcaa 720cgctgtatag aaccaatgcc tttattctgt attgctgatg gatttttgaa aaatatgtag 780gtggacttag ttttctaagg ggaatagaat ttctaatata tttaaaatat 
tttgcatgta 840tgttctgaag gacattggtg tgtcatttct ataccatctg gctactagag gagccgactg 900aaagtcacac tgccggagga ggggagaggt gctcttccgt ttctggtgtc tgtagccatc 960tccagtggta gctgcagtga taataatgct gcagtgccga cagttctgga aggagcaaca 1020acagtgattt cagcagcagc agtattgcgg gatccccacg atggagcaag ggaaataatt 1080ctggaagcaa tgacaatatc agctgtggct atagcagctg agatgtgagt tctcacggtg 1140gcagcttcaa ggacagtagt gatggtccaa tggcgcccag acctagaaat gcacatttcc 1200tcagcaccgg ctccagatgc tgagcttgga cagctgacgc ct 1242491015DNAHomo sapiensmisc_featuresequence of STAR49 49aaaccagaaa cccaaaacaa tgggagtgac atgctaaaac cagaaaccca aaacaatggg 60agggtcctgc taaaccagaa acccaaaaca atgggagtga agtgctaaaa ccagaaaccc 120aaaacaatgg gagtgtcctg ctacaccaga aacccaaaac gatgggagtg acgtgataaa 180accagacacc caaaacaatg ggagtgacgt gctaaaccag aaacccaaaa caatgggagt 240gacgtgctaa aacctggaaa cctaaaacaa tgcgagtgag gtgctaacac cagaatccat 300aacaatgtga gtgacgtgct aaaccagaac ccaaaacaat gggagtgacg tgctaaaaca 360ggaacccaaa acaatgagag tgacgtgcta aaccagaaac ccaaaacaat gggaatgacg 420tgctaaaacc ggaacccaaa acaatgggag tgatgtgcta aaccagaaac ccaaaacaat 480gggaatgaca tgctaaaact ggaacccaaa acaatggtaa ctaagagtga tgctaaggcc 540ctacattttg gtcacactct caactaagtg agaacttgac tgaaaaggag gatttttttt 600tctaagacag agttttggtc tgtcccccag agtggagtgc agtggcatga tctcggctca 660ctgcaagctc tgcctcccgg gttcaggcca ttctcctgcc tcagcctcct gagtagctgg 720gaatacaggc acccgccacc acacttggct aattttttgt atttttagta gagatggggt 780ttcaccatat tagcaaggat ggtctcaatc tcctgacctc gtgatctgcc cacctcaggc 840tcccaaagtg ctgggattac aggtgtgagc caccacaccc agcaaaaagg aggaattttt 900aaagcaaaat tatgggaggc cattgttttg aactaagctc atgcaatagg tcccaacaga 960ccaaaccaaa ccaaaccaaa atggagtcac tcatgctaaa tgtagcataa tcaaa 1015502355DNAHomo sapiensmisc_featuresequence of STAR50 50caaccatcgt tccgcaagag cggcttgttt attaaacatg aaatgaggga aaagcctagt 60agctccattg gattgggaag aatggcaaag agagacaggc gtcattttct agaaagcaat 120cttcacacct gttggtcctc acccattgaa tgtcctcacc caatctccaa cacagaaatg 180agtgactgtg tgtgcacatg cgtgtgcatg 
tgtgaaagta tgagtgtgaa tgtgtctata 240tgggaacata tatgtgattg tatgtgtgta actatgtgtg actggcagcg tggggagtgc 300tggttggagt gtggtgtgat gtgagtatgc atgagtggct gtgtgtatga ctgtggcggg 360aggcggaagg ggagaagcag caggctcagg tgtcgccaga gaggctggga ggaaactata 420aacctgggca atttcctcct catcagcgag cctttcttgg gcaatagggg cagagctcaa 480agttcacaga gatagtgcct gggaggcatg aggcaaggcg gaagtactgc gaggaggggc 540agagggtctg acacttgagg ggttctaatg ggaaaggaaa gacccacact gaattccact 600tagccccaga ccctgggccc agcggtgccg gcttccaacc ataccaacca tttccaagtg 660ttgccggcag aagttaacct ctcttagcct cagtttcccc acctgtaaaa tggcagaagt 720aaccaagctt accttcccgg cagtgtgtga ggatgaaaag agctatgtac gtgatgcact 780tagaagaagg tctagggtgt gagtggtact cgtctggtgg gtgtggagaa gacattctag 840gcaatgagga ctggggagag cctggcccat ggcttccact cagcaaggtc agtctcttgt 900cctctgcact cccagccttc cagagaggac cttcccaacc agcactcccc acgctgccag 960tcacacatag ttacacacat acaatcacat atatgttccc atatagacac attcacactc 1020ataccttcac acatgcacac gcatgtgcac acacagtcac tcatttctgt gttggagatt 1080gggtgaggac attcaatggg tgaggaccaa caggtgtgaa gattgctttc tagaaaatga 1140ctcctgtctc tctttgccat tcttcccaat ccgatggagc tactaggctt ttccctcatt 1200tcatgtttaa taaaccttcc caatggcgaa atgggctttc tcaagaagtg gtgagtgtcc 1260catccctgcg gtggggacag gggtggcagc ggacaagcct gcctggaggg aactgtcagg 1320ctgattccca gtccaactcc agcttccaac acctcatcct ccaggcagtc ttcattcttg 1380gctctaattt cgctcttgtt ttctttttta tttttatcga gaactgggtg gagagctttt 1440ggtgtcattg gggattgctt tgaaaccctt ctctgcctca cactgggagc tggcttgagt 1500caactggtct ccatggaatt tcttttttta gtgtgtaaac agctaagttt taggcagctg 1560ttgtgccgtc cagggtggaa agcagcctgt tgatgtggaa ctgcttggct cagatttctt 1620gggcaaacag atgccgtgtc tctcaactca ccaattaaga agcccagaaa atgtggcttg 1680gagaccacat gtctggttat gtctagtaat tcagatggct tcacctggga agccctttct 1740gaatgtcaaa gccatgagat aaaggacata tatatagtag ctagggtggt ccacttctta 1800ggggccatct ccggaggtgg tgagcactaa gtgccaggaa gagaggaaac tctgttttgg 1860agccaaagca taaaaaaacc ttagccacaa accactgaac atttgttttg tgcaggttct 1920gagtccaggg 
agggcttctg aggagagggg cagctggagc tggtaggagt tatgtgagat 1980ggagcaaggg ccctttaaga ggtgggagca gcatgagcaa aggcagagag gtggtaatgt 2040ataaggtatg tcatgggaaa gagtttggct ggaacagagt ttacagaata gaaaaattca 2100acactattaa ttgagcctct actacgtgct cgacattgtt ctagtcactg agataggttt 2160ggtatacaaa acaaaatcca tcctctatgg acattttagt gactaacaac aatataaata 2220ataaaagtga acaaaagctc aaaacatgcc aggcactatt atttatttat ttatttattt 2280atttatttat tttttgaaac agagtctcgc tctgttgccc aggctggagt gtagtggtgc 2340gatctcggct cactg 2355512289DNAHomo sapiensmisc_featuresequence of STAR51 51tcacaggtga caccaatccc ctgaccacgc tttgagaagc actgtactag attgactttc 60taatgtcagt cttcattttc tagctctgtt acagccatgg tctccatatt atctagtaca 120acacacatac aaatatgtgt gatacagtat gaatataata taaaaatatg tgttataata 180taaatataat attaaaatat gtctttatac tagataataa tacttaataa cgttgagtgt 240ttaactgctc taagcacttt acctgcagga aacagttttt tttttatttt ggtgaaatac 300aactaacata aatttattta caattttaag catttttaag tgtatagttt agtggagtta 360atatattcaa aatgttgtgc agccgtcacc atcatcagtc ttcataactc ttttcatatt 420gtaaaattaa aagtttatgc tcatttaaaa atgactccca atttcccccc tcctcaacct 480ctggaaacta ccattctatt ttctgcctcc gtagttttgc ccactctaag tacctcacat 540aagtggaatt tgtcttattt gcctgtttgt gaccggctga tttcatttag tataatgtcc 600tcaagtttta ttcacgttat atagcatatg tcataatttt cttcactttt aagcttgagt 660aatatttcat cgtatgtatc tcacattttg cttatccatt catctctcag tggacacttg 720agttgcttct acattttagc tgttgtgaat actgctgcta tgaacatggg tgtataaata 780tctcaagacc tttttatcag ttttttaaaa tatatactca gtagtagttt agctggatta 840tatggtaatt ttatttttaa tttttgagga actgtcctac ccttttattc aatagtagct 900ataccaattg acaattggca ttcctaccaa cagggcataa gggttctcaa ttctccacat 960attccctgat acttgttatt ttcaggtgtt tttttttttt tttttttttt atgggagcca 1020tgttaatggg tgtaaggtga tatttcatta tagttttgat ttgcatttcc ctaatgatta 1080gtgatgttaa gcatctcttc atgtgcctat tggccatttg tatatcttct ttaaaaatat 1140atatatactc attcctttgc ccatttttga attatgttta ttttttgtta ttgagtttca 1200atacttttct atataaccta ggtattaatc ctttatcaga cttaagattt 
gcaaatattc 1260tctttcattc cacaggttgc taattctctc tgttggtaat atcttttgat gctgttgtgt 1320ccagaattga ttcattcctg tgggttcttg gtctcactga cttcaagaat aaagctgcgg 1380accctagtgg tgagtgttac acttcttata gatggtgttt ccggagtttg ttccttcaga 1440tgtgtccaga gtttcttcct tccaatgggt tcatggtctt gctgacttca ggaatgaagc 1500cgcagacctt cgcagtgagg tttacagctc ttaaaggtgg cgtgtccaga gttgtttgtt 1560ccccctggtg ggttcgtggt cttgctgact tcaggaatga agccgcagac cctcgcagtg 1620agtgttacag ctcataaagg tagtgcggac acagagtgag ctgcagcaag atttactgtg 1680aagagcaaaa gaacaaagct tccacagcat agaaggacac cccagcgggt tcctgctgct 1740ggctcaggtg gccagttatt attcccttat ttgccctgcc cacatcctgc tgattggtcc 1800attttacaga gtactgattg gtccatttta cagagtgctg attggtgcat ttacaatcct 1860ttagctagac acagagtgct gattgctgca ttcttacaga gtgctgattg gtgcatttac 1920agtcctttag ctagatacag aacgctgatt gctgcgtttt ttacagagtg ctgattggtg 1980catttacaat cctttagcta gacacagtgc tgattggtgg gtttttacag agtgctgatt 2040ggtgcgtctt tacagagtgc tgattggtgc atttacaatc ctttagctag acacagagtg 2100ctgattggtg cgtttataat cctctagcta gacagaaaag ttttccaagt ccccacctga 2160ccgagaagcc ccactggctt cacctctcac tgttatactt tggacatttg tccccccaaa 2220atctcatgtt gaaatgtaac ccctaatgtt ggaactgagg ccagactgga tgtggctggg 2280ccatgggga 2289521184DNAHomo sapiensmisc_featuresequence of STAR52 52ctcttctttg tttttttatt ttggggtgtg tgggtacgtg taagatgaga aatgtacaaa 60cacaagtatt tcagaaactc caagtaatat tctgtctgtg agttcacggt aaataaataa 120aaagggcaaa gtgacagaaa tacaggatta ttaaaagcaa aataatgttc tttgaaatcc 180cccccttggt gtatttttta tcttaggatg cagcactttc agcatgccca agtattgaaa 240gcagtgtttt tacgctacca cggtaatttt atttagaaac cccatgttca cttttagttt 300taaaatggtc tttatgacat aaaattatca gcattcatat ttttgtgttt taatattcct 360ttggctactt attgaaacag taaacattac gaaaattagt aaacaaatct ttgatagttg 420cttatttttg tttaattgaa tgtttatttt attaggtaaa tatacaatca aatttattta 480aaaataatga ggaaaagaat acttttcttt cgctttgcga aagcaaagtg atttttcatt 540cttctccgtc cgattccttc tcttccagct gccacagccg actgacaggc tcccggcggc 600ctgaggagta gtatgcaaat tttggatgat 
tgacacctac agtagaagcc aatcacgtca 660aagtaggatg ctgattggtt gacaacaata ggcgtaaacc ttgacgtttt aaaaacctga 720cacccaatcc aggcgattca tgcaaataaa ggaagggagt cacattacca ggggccagag 780agacttgagt acgacctcac gtgttcagtg gtggatattg cacagacgtc tgcaaggtct 840atataaacgc tacataatgt tcaactcaat tgcttgcctt ggcctttccc aaacttgtca 900ctggaatata aattatccct tttttaaaaa taaaaaaata agaattatgt agtgcacata 960tatgatggtt catgtagaaa tctaaatgga cttccaacgc atggaatttt cctatttccc 1020cctttcttta aattaatcct cagtgaagga ggctgttttc ccctagattt caaaaggacg 1080agatttacag agcctttcct tggagaaacc cgctctaggc acagatggtc agtaaattta 1140gcttcttcag cgaagttcca catggcaccg ccagatggca taag 1184531431DNAHomo sapiensmisc_featuresequence of STAR53 53ccctgaggaa gatgacgagt aactccgtaa gagaaccttc cactcatccc ccacatccct 60gcagacgtgc tattctgtta tgatactggt atcccatctg tcacttgctc cccaaatcat 120tcccttctta caattttcta ctgtacagca ttgaggctga acgatgagag atttcccatg 180ctctttctac tccctgccct gtatatatcc ggggatcctc cctacccagg atgctgtggg 240gtcccaaacc ccaagtaagc cctgatatgc
gggccacacc tttctctagc ctaggaattg 300ataacccagg cgaggaagtc actgtggcat gaacagatgg ttcacttcga ggaaccgtgg 360aaggcgtgtg caggtcctga gatagggcag aatcggagtg tgcagggtct gcaggtcagg 420aggagttgag attgcgttgc cacgtggtgg gaactcactg ccacttattt ccttctctct 480tcttgcctca gcctcaggga tacgacacat gcccatgatg agaagcagaa cgtggtgacc 540tttcacgaac atgggcatgg ctgcggaccc ctcgtcatca ggtgcatagc aagtgaaagc 600aagtgttcac aacagtgaaa agttgagcgt catttttctt agtgtgccaa gagttcgatg 660ttagcgttta cgttgtattt tcttacactg tgtcattctg ttagatacta acattttcat 720tgatgagcaa gacatactta atgcatattt tggtttgtgt atccatgcac ctaccttaga 780aaacaagtat tgtcggttac ctctgcatgg aacagcatta ccctcctctc tccccagatg 840tgactactga gggcagttct gagtgtttaa tttcagattt tttcctctgc atttacacac 900acacgcacac aaaccacacc acacacacac acacacacac acacacacac acacacacac 960acacaccaag taccagtata agcatctgcc atctgctttt cccattgcca tgcgtcctgg 1020tcaagctccc ctcactctgt ttcctggtca gcatgtactc ccctcatccg attcccctgt 1080agcagtcact gacagttaat aaacctttgc aaacgttccc cagttgtttg ctcgtgccat 1140tattgtgcac acagctctgt gcacgtgtgt gcatatttct ttaggaaaga ttcttagaag 1200tggaattgct gtgtcaaagg agtcatttat tcaacaaaac actaatgagt gcgtcctcgt 1260gctgagcgct gttctaggtg ctggagcgac gtcagggaac aaggcagaca ggagttcctg 1320acccccgttc tagaggagga tgtttccagt tgttgggttt tgtttgtttg tttcttctag 1380agatggtggt cttgctctgt ccaggctaga gtgcagtggc atgatcatag c 143154975DNAHomo sapiensmisc_featuresequence of STAR54 54ccataaaagt gtttctaaac tgcagaaaaa tccccctaca gtcttacagt tcaagaattt 60tcagcatgaa atgcctggta gattacctga ctttttttgc caaaaataag gcacagcagc 120tctctcctga ctctgacttt ctatagtcct tactgaatta tagtccttac tgaattcatt 180cttcagtgtt gcagtctgaa ggacacccac attttctctt tgtctttgtc aattctttgt 240gttgtaaggg caggatgttt aaaagttgaa gtcattgact tgcaaaatga gaaatttcag 300agggcatttt gttctctaga ccatgtagct tagagcagtg ttcacactga ggttgctgct 360aatgtttctg cagttcttac caatagtatc atttacccag caacaggata tgatagagga 420cttcgaaaac cccagaaaat gttttgccat atatccaaag ccctttggga aatggaaagg 480aattgcgggc tcccattttt atatatggat agatagagac 
caagaaagac caaggcaact 540ccatgtgctt tacattaata aagtacaaaa tgttaacatg taggaagtct aggcgaagtt 600tatgtgagaa ttctttacac taattttgca acattttaat gcaagtctga aattatgtca 660aaataagtaa aaatttttac aagttaagca gagaataaca atgattagtc agagaaataa 720gtagcaaaat cttcttctca gtattgactt ggttgctttt caatctctga ggacacagca 780gtcttcgctt ccaaatccac aagtcacatc agtgaggaga ctcagctgag actttggcta 840atgttggggg gtccctcctg tgtctcccca ggcgcagtga gcctgcaggc cgacctcact 900cgtggcacac aactaaatct ggggagaagc aacccgatgc cagcatgatg cagatatctc 960agggtatgat cggcc 97555501DNAHomo sapiensmisc_featuresequence of STAR55 55cctgaactca tgatccgccc acctcagcct cctgaagtgc tgggattaca ggtgtgagcc 60accacaccca gccgcaacac actcttgagc aaccaatgtg tcataaaaga aataaaatgg 120aaatcagaaa gtatcttgag acagacaaaa atggaaacac aacataccaa aatttatggg 180acacagcaaa agcagtttta ggagggaagt ttatagtgat gaatacctac ctcaaaatca 240ttagcctgat tggatgacac tacagtgtat aaatgaattg aaaaccacat tgtgccccat 300acatatatac aatttttatt tgttaattaa aaataaaata aaactttaaa aaagaagaaa 360gagctcaaat aaacaaccta actttatacc tcaaggaaat agaagagcca gctaagccca 420aagttgacag aaggaaaaaa atattggcag aaagaaatga aacagagact agaaagacaa 480ttgaagagat cagcaaaact a 50156741DNAHomo sapiensmisc_featuresequence of STAR56 56acacaggaaa agatcgcaat tgttcagcag agctttgaac cggggatgac ggtctccctc 60gttgcccggc aacatggtgt agcagccagc cagttatttc tctggcgtaa gcaataccag 120gaaggaagtc ttactgctgt cgccgccgga gaacaggttg ttcctgcctc tgaacttgct 180gccgccatga agcagattaa agaactccag cgcctgctcg gcaagaaaac gatggaaaat 240gaactcctca aagaagccgt tgaatatgga cgggcaaaaa agtggatagc gcacgcgccc 300ttattgcccg gggatgggga gtaagcttag tcagccgttg tctccgggtg tcgcgtgcgc 360agttgcacgt cattctcaga cgaaccgatg actggatgga tggccgccgc agtcgtcaca 420ctgatgatac ggatgtgctt ctccgtatac accatgttat cggagagctg ccaacgtatg 480gttatcgtcg ggtatgggcg ctgcttcgca gacaggcaga acttgatggt atgcctgcga 540tcaatgccaa acgtgtttac cggatcatgc gccagaatgc gctgttgctt gagcgaaaac 600ctgctgtacc gccatcgaaa cgggcacata caggcagagt ggccgtgaaa gaaagcaatc 660agcgatggtg ctctgacggg 
ttcgagttct gctgtgataa cggagagaga ctgcgtgtca 720cgttcgcgct ggactgctgt g 741571365DNAHomo sapiensmisc_featuresequence of STAR57 57tccttctgta aataggcaaa atgtatttta gtttccacca cacatgttct tttctgtagg 60gcttgtatgt tggaaatttt atccaattat tcaattaaca ctataccaac aatctgctaa 120ttctggagat gtggcagtga ataaaaaagt tatagtttct gattttgtgg agcttggact 180ttaatgatgg acaaaacaac acattcttaa atatatattt catcaaaatt atagtgggtg 240aattatttat atgtgcattt acatgtgtat gtatacataa atgggcggtt actggctgca 300ctgagaatgt acacgtggcg cgaacgaggc tgggcggtca gagaaggcct cccaaggagg 360tggctttgaa gctgagtggt gcttccacgt gaaaaggctg gaaagggcat tccaagaaaa 420ggctgaggcc agcgggaaag aggttccagt gcgctctggg aacggaaagc gcacctgcct 480gaaacgaaaa tgagtgtgct gaaataggac gctagaaagg gaggcagagg ctggcaaaag 540cgaccgagga ggagctcaaa ggagcgagcg gggaaggccg ctgtggagcc tggaggaagc 600acttcggaag cgcttctgag cgggtaaggc cgctgggagc atgaactgct gagcaggtgt 660gtccagaatt cgtgggttct tggtctcact gacttcaaga atgaagaggg accgcggacc 720ctcgcggtga gtgttacagc tcttaaggtg gcgcgtctgg agtttgttcc ttctgatgtt 780cggatgtgtt cagagtttct tccttctggt gggttcgtgg tctcgctggc tcaggagtga 840agctgcagac cttcgcggtg agtgttacag ctcataaaag cagggtggac tcaaagagtg 900agcagcagca agatttattg caaagaatga aagaacaaag cttccacact gtggaagggg 960accccagcgg gttgccactg ctggctccgc agcctgcttt tattctctta tctggcccca 1020cccacatcct gctgattggt agagccgaat ggtctgtttt gacggcgctg attggtgcgt 1080ttacaatccc tgcgctagat acaaaggttc tccacgtccc caccagatta gctagataga 1140gtctccacac aaaggttctc caaggcccca ccagagtagc tagatacaga gtgttgattg 1200gtgcattcac aaaccctgag ctagacacag ggtgatgact ggtgtgttta caaaccttgc 1260ggtagataca gagtatcaat tggcgtattt acaatcactg agctaggcat aaaggttctc 1320caggtcccca ccagactcag gagcccagct ggcttcaccc agtgg 1365581401DNAHomo sapiensmisc_featuresequence of STAR58 58aagtttacct tagccctaaa ttatttcatt gtgattggca ttttaggaaa tatgtattaa 60ggaatgtctc ttaggagata aggataacat atgtctaaga aaattatatt gaaatattat 120tacatgaact aaaatgttag aactgaaaaa aaattattgt aactccttcc agcgtaggca 180ggagtatcta gataccaact ttaacaactc 
aactttaaca acttcgaacc aaccagatgg 240ctaggagatt cacctattta gcatgatatc ttttattgat aaaaaaatat aaaacttcca 300ttaaattttt aagctactac aatcctatta aattttaact taccagtgtt ctcaatgcta 360cataatttaa aatcattgaa atcttctgat tttaactcct cagtcttgaa atctacttat 420ttttagttac atatatatcc aatctactgc cgctagtaga agaagcttgg aatttgagaa 480aaaaatcaga cgttttgtat attctcatat tcactaattt attttttaaa tgagtttctg 540caatgcatca agcagtggca aaacaggaga aaaattaaaa ttggttgaaa agatatgtgt 600gccaaacaat cccttgaaat ttgatgaagt gactaatcct gagttattgt ttcaaatgtg 660tacctgttta tacaagggta tcacctttga aatctcaaca ttaaatgaaa ttttataagc 720aatttgttgt aacatgatta ttataaaatt ctgatataac attttttatt acctgtttag 780agtttaaaga gagaaaagga gttaagaata attacatttt cattagcatt gtccgggtgc 840aaaaacttct aacactatct tcaaatcttt ttctccattg ccttctgaac atacccactt 900gggtatctca ttagcactgc aaattcaaca ttttcgattg ctaatttttc tccctaaata 960tttatttgtt ttctcagctt tagccaatgt ttcactattg accatttgct caagtatagt 1020gacgcttcaa tgaccttcag agagctgttt cagtccttcc tggactactt gcatgcttcc 1080aacaaaatga agcactcttg atgtcagtca ctcaaataaa tggaaatggg cccatttact 1140aggaatgtta acagaataaa aagatagacg tgacaccagt tgcttcagtc catctccatt 1200tacttgctta aggcctggcc atatttctca cagttgatat ggcgcagggc acatgtttaa 1260atggctgttc ttgtaggatg gtttgactgt tggattcctc atcttccctc tccttaggaa 1320ggaaggttac agtagtactg ttggctcctg gaatatagat tcataaagaa ctaatggagt 1380atcatctccc actgctcttg t 140159866DNAHomo sapiensmisc_featuresequence of STAR59 59gagatcacgc cactgcactc cagcctgggg gacagagcaa gactccatct cagaaacaaa 60caaacacaca aagccagtca aggtgtttaa ttcgacggtg tcaggctcag gtctcttgac 120aggatacatc cagcacccgg gggaaacgtc gatgggtggg gtggaatcta ttttgtggcc 180tcaagggagg gtttgagagg tagtcccgca agcggtgatg gcctaaggaa gcccctccgc 240ccaagaagcg atattcattt ctagcctgta gccacccaag agggagaatc gggctcgcca 300cagaccccac aacccccaac ccaccccacc cccacccctc ccacctcgtg aaatgggctc 360tcgctccgtc aggctctagt cacaccgtgt ggttttggaa cctccagcgt gtgtgcgtgg 420gttgcgtggt ggggtggggc cggctgtgga cagaggaggg gataaagcgg cggtgtcccg 480cgggtgcccg 
ggacgtgggg cgtggggcgt gggtggggtg gccagagcct tgggaactcg 540tcgcctgtcg ggacgtctcc cctcctggtc ccctctctga cctacgctcc acatcttcgc 600cgttcagtgg ggaccttgtg ggtggaagtc accatccctt tggactttag ccgacgaagg 660ccgggctccc aagagtctcc ccggaggcgg ggccttgggc aggctcacaa ggatgctgac 720ggtgacggtt ggtgacggtg atgtacttcg gaggcctcgg gccaatgcag aggtatccat 780ttgacctcgg tgggacaggt cagctttgcg gagtcccgtg cgtccttcca gagactcatc 840cagcgctagc aagcatggtc ccgagg 866602067DNAHomo sapiensmisc_featuresequence of STAR60 60agcagtgcag aactggggaa gaagaagagt ccctacacca cttaatactc aaaagtactc 60gcaaaaaata acacccctca ccaggtggca tnattactct ccttcattga gaaaattagg 120aaactggact tcgtagaagc taattgcttt atccagagcc acctgcatac aaacctgcag 180cgccacctgc atacaaacct gtcagccgac cccaaagccc tcagtcgcac caagcctctg 240ctgcacaccc tcgtgccttc acactggccg ttccccaagc ctggggcata ctncccagct 300ctgagaaatg tattcatcct tcaaagccct gctcatgtgt cctnntcaac aggaaaatct 360cccatgagat gctctgctat ccccatctct cctgccccat agcttaggca nacttctgtg 420gtggtgagtc ctgggctgtg ctgtgatgtg ttcgcctgcn atgtntgttc ttccccacaa 480tgatgggccc ctgaattctc tatctctagc acctgtgctc agtaaaggct tgggaaacca 540ggctcaaagc ctggcccaga tgccaccttt tccagggtgc ttccgggggc caccaaccag 600agtgcagcct tctcctccac caggaactct tgcagcccca cccctgagca cctgcacccc 660attacccatc tttgtttctc cgtgtgatcg tattattaca gaattatata ctgtattctt 720aatacagtat ataattgtat aattattctt aatacagtat ataattatac aaatacaaaa 780tatgtgttaa tggaccgttt atgttactgg taaagcttta agtcaacagt gggacattag 840ttaggttttt ggcgaagtca aaagttatat gtgcattttc aacttcttga ggggtcggta 900cntctnaccc ccatgttgtt caanggtcaa ctgtctacac atatcatagc taattcacta 960cagaaatgtt agcttgtgtc actagtatct ccccttctca taagcttaat acacatacct 1020tgagagagct cttggccatc tctactaatg actgaagttt ttatttatta tagatgtcat 1080aataggcata aaactacatt acatcattcg agtgccaatt ttgccacctt gaccctcttt 1140tgcaaaacac caacgtcagt acacatatga agaggaaact gcccgagaac tgaagttcct 1200gagaccagga gctgcaggcg ttagatagaa tatggtgacg agagttacga ggatgacgag 1260agtaaatact tcatactcag tacgtgccaa gcactgctat aagcgctctg 
tatgtgtgaa 1320gtcatttaat cctcacagca tcccacggtg taattatttt cattatcccc atgagggaac 1380agaaactcag aacggttcaa cacatatgcg agaagtcgca gccggtcagt gagagagcag 1440gttcccgtcc aagcagtcag accccgagtg cacactctcg acccctgtcc agcagactca 1500ctcgtcataa ggcggggagt gntctgtttc agccagatgc tttatgcatc tcagagtacc 1560caaaccatga aagaatgagg cagtattcan gagcagatgg ngctgggcag taaggctggg 1620cttcagaata gctggaaagc tcaagtnatg ggacctgcaa gaaaaatcca ttgtttngat 1680aaatagccaa agtccctagg ctgtaagggg aaggtgtgcc aggtgcaagt ggagctctaa 1740tgtaaaatcg cacctgagtc tcctggtctt atgagtnctg ggtgtacccc agtgaaaggt 1800cctgctgcca ccaagtgggc catggttcag ctgtgtaagt gctgagcggc agccggaccg 1860cttcctctaa cttcacctcc aaaggcacag tgcacctggt tcctccagca ctcagctgcg 1920aggcccctag ccagggtccc ggcccccggc ccccggcagc tgctccagct tccttcccca 1980cagcattcag gatggtctgc gttcatgtag acctttgttt tcagtctgtg ctccgaggtc 2040actggcagca ctagccccgg ctcctgt 2067611470DNAHomo sapiensmisc_featuresequence of STAR61 61cagcccccac atgcccagcc ctgtgctcag ctctgcagcg gggcatggtg ggcagagaca 60cagaggccaa ggccctgctt cggggacggt gggcctggga tgagcatggc cttggccttc 120gccgagagtn ctcttgtgaa ggaggggtca ggaggggctg ctgcagctgg ggaggagggc 180gatggcactg tggcangaag tgaantagtg tgggtgcctn gcaccccagg cacggccagc 240ctggggtatg gacccggggc cntctgttct agagcaggaa ggtatggtga ggacctcaaa 300aggacagcca ctggagagct ccaggcagag gnacttgaga ggccctgggg ccatcctgtc 360tcttttctgg gtctgtgtgc tctgggcctg ggcccttcct ctgctccccc gggcttggag 420agggctggcc ttgcctcgtg caaaggacca ctctagactg gtaccaagtc tggcccatgg 480cctcctgtgg gtgcaggcct gtgcgggtga cctgagagcc agggctggca ggtcagagtc 540aggagaggga tggcagtgga tgccctgtgc aggatctgcc taatcatggt gaggctggag 600gaatccaaag tgggcatgca ctctgcactc atttctttat tcatgtgtgc ccatcccaac 660aagcagggag cctggccagg agggcccctg ggagaaggca ctgatgggct gtgttccatt 720taggaaggat ggacggttgt gagacgggta agtcagaacg ggctgcccac ctcggccgag 780agggccccgt ggtgggttgg caccatctgg gcctggagag ctgctcagga ggctctctag 840ggctgggtga ccaggnctgg ggtacagtag ccatgggagc aggtgcttac ctggggctgt 900ccctgagcag gggctgcatt 
gggtgctctg tgagcacaca cttctctatt cacctgagtc 960ccnctgagtg atgagnacac ccttgttttg cagatgaatc tgagcatgga gatgttaagt 1020ggcttgcctg agccacacag cagatggatg gtgtagctgg gacctgaggg caggcagtcc 1080cagcccgagg acttcccaag gttgtggcaa actctgacag catgacccca gggaacaccc 1140atctcagctc tggtcagaca ctgcggagtt gtgttgtaac ccacacagct ggagacagcc 1200accctagccc cacccttatc ctctcccaaa ggaacctgcc ctttcccttc attttcctct 1260tactgcattg agggaccaca cagtgtggca gaaggaacat gggttcagga cccagatgga 1320cttgcttcac agtgcagccc tcctgtcctc ttgcagagtg cgtcttccac tgtgaagttg 1380ggacagtcac accaactcaa tactgctggg cccgtcacac ggtgggcagg caacggatgg 1440cagtcactgg ctgtgggtct gcagaggtgg 1470621011DNAHomo sapiensmisc_featuresequence of STAR62 62agtgtcaaat agatctacac aaaacaagat aatgtctgcc catttttcca aagataatgt 60ggtgaagtgg gtagagagaa atgcatccat tctccccacc caacctctgc taaattgtcc 120atgtcacagt actgagacca gggggcttat tcccagcggg cagaatgtgc accaagcacc 180tcttgtctca atttgcagtc taggccctgc tatttgatgg tgtgaaggct tgcacctggc 240atggaaggtc cgttttgtac ttcttgcttt agcagttcaa agagcaggga gagctgcgag 300ggcctctgca gcttcagatg gatgtggtca gcttgttgga ggcgccttct gtggtccatt 360atctccagcc cccctgcggt gttgctgttt gcttggcttg tctggctctc catgccttgt 420tggctccaaa atgtcatcat gctgcacccc aggaagaatg tgcaggccca tctcttttat 480gtgctttggg ctattttgat tccccgttgg gtatattccc taggtaagac ccagaagaca 540caggaggtag ttgctttggg agagtttgga cctatgggta tgaggtaata gacacagtat 600cttctctttc atttggtgag actgttagct ctggccgcgg actgaattcc acacagctca 660cttgggaaaa ctttattcca aaacatagtc acattgaaca ttgtggagaa tgagggacag 720agaagaggcc ctagatttgt acatctgggt gttatgtcta taaatagaat gctttggtgg 780tcaactagac ttgttcatgt tgacatttag tcttgccttt tcggtggtga tttaaaaatt 840atgtatatct tgtttggaat atagtggagc tatggtgtgg cattttcatc tggctttttg 900tttagctcag cccgtcctgt tatgggcagc cttgaagctc agtagctaat gaagaggtat 960cctcactccc tccagagagc ggtcccctca cggctcattg agagtttgtc a 1011631410DNAHomo sapiensmisc_featuresequence of STAR63 63ccacagcctg atcgtgctgt cgatgagagg aatctgctct aagggtctga gcggagggag 60atgccgaagc 
tttgagcttt ttgtttctgg cttaaccttg gtggattttc accctctggg 120cattacctct tgtccagggg aggggctggg ggagtgcctg gagctgtagg gacagagggc 180tgagtggggg ggactgcttg ggctgaccac ataatattct gctgcgtatt aatttttttt 240tgagacagtc tttctctgtt gcccaggctg gagtgtaatg gcttgatagc tcactgccac 300ctccgcctcc tgggttcaag tgattctcct gcttcagctt ccggagtagc tgggactgca 360ggtgcccgcc accatggctg gctaattttt gtatttttat tagcaatggg gttttgctat 420gttgcccagg ccggtcccga actcctgccc tcaagtgata cacctgcctc ggcctcccaa 480agtgctggga ttagaggctt gagccactgc gcctggccag ctgcatattg ttaattagac 540ataaaatgca aaataagatg atataaacac aaaggtgtga aataagatgg acacctgctg 600agcgcgcctg tcctgaagca tcgcccctct gcaaaagcag gggtcagcat gtgttctccg 660gtccttgctc ttacagagga gtgagctgcc tatgcgtctt ccagccactt cctgggctgc 720tcagaggcct ctcacgggtg ttctgggttg ctgccacttg caggggtgct gaggcggggc 780tcctcccgtg cggggcatgt ccaggccgcc ctctctgaag gcttggcagg tacaggtggg 840agtgggggtc tctgggctgc tgtggggact gggcaggctc ctggaagacc tccctgtgtt 900tgggctgaaa gcgcagcccg aggggaggtc cccagggagg ccgctgtcgg gggtgggggc 960ttggaggagg gaggggccga ggagccggcg acactccgtg acggcccagg aacgtcccta 1020aacaaggcgc cgcgttctcg atggggtggg gtccgctttc ttttctcaaa agctgcagtt 1080actccatgct cggaggactg gcgtccgcgc cctgttccaa tgctgccccg gggccctggc 1140cttggggaat cggggccttg gactggaccc tgggggcttc gcggagccgg gcctggcggg 1200gcgagcggag cagaggctgg gcagccccgg ggaagcgctc gccaaagccg ggcgctgctc 1260ccagagcgcg aggtgcagaa ccagaggctg gtcccgcggc gctaacgaga gaagaggaag 1320cgcgctgtgt agagggcgcc caccccgtgg ggcgaacccc cttcctcaac tccatggacg 1380gggctcatgg gttcccagcg gctcagacgc 1410641414DNAHomo sapiensmisc_featuresequence of STAR64 64tggatcagat ttgttttata ccctcccttc tactgctctg agagttgtac atcacagtct 60actgtatctg tttcccatta ttataatttt tttgcactgt gcttgcctga agggagcctc 120aagttcatga gtctccctac cctcctccca aatgagacat ggacctttga atgctttcct 180gggaccacca ccccaccttt catgctgctg ttatccagga ttttagttca acagtgtttt 240aaccccccaa atgagtcatt tttattgttt cgtatagtga atgtgtattt gggtttgctt 300atatggtgac ctgtttattt gctcctcatt gtacctcatg 
ctctgctctt tccttctaga 360ttcagtctct ttcctaatga ggtgtctcgc agcaattctt tacaagacag ccaagatagg 420ccagctctca gagcacttgt tgtctgaaaa agtcttgtct tatttaattt ctttttctta 480gagatggggt ctcattatgt tacccacact ggtctcaaac ttctggctta aagcggtcct 540cccaccttgg cctcccaaag tgctaggatt acaggcgtga gcgacctcgt ccagcctgtc 600tgagaaagcg tttgttttgc ccttgctctc agatgacagt ttggggatag aattctaggt 660ggacggtttt tttccttcag ccctttgaag agtctgtatt ttcattatct ccctgcatta 720gatgttcttt tgcaagtaac gtgtcttttc tctctgggta ttcttaaggt tttctctttg 780cctttggtga gctgcagtgg atttgctttt ttcaagaggt caagagaaag gaaagtgtga 840ggtttctgtt ttttactgac aatttgtttg ttgatttgtt ttcccaccca gaggttcctt 900gccactttgc caggctggaa ggcagacttc ttctggtgtc ctgttcacag acggggcagc 960ctgcggaagg ccctgccaca tgcagggcct cggtcctcat tcccttgcat gtggacccgg 1020gcgtgactcc tgttcaggct ggcacttccc agagctgagc cccagcctga ccttcctccc 1080atactgtctt cacaccccct cctttcttct
gatacctgga ggttttcctt tctttcctgt 1140cacctccact tggattttaa atcctctgtc tgtggaattg tattcggcac aggaagatgc 1200ttgcaagggc caggctcatc agccctgtcc ctgctgctgg aagcagcaca gcagagcctc 1260atgctcaggc tgagatggag cagaggcctg cagacgagca cccagctcag ctggggttgg 1320cgccgatggt ggagggtcct cgaaagctct ggggacgatg gcagagctat tggcagggga 1380gccgcagggt cttttgagcc cttaaaagat ctct 1414651310DNAHomo sapiensmisc_featuresequence of STAR65 65gtgaatgttg atggatcaaa tatctttctg tgttgtttat caaagttaaa ataaatgtgg 60tcatttaaag gacaaaagat gaggggttgg agtctgttca agcaaagggt atattaggag 120aaaagcagaa ttctctccct gtgaagggac agtgactcct attttccacc tcatttttac 180taactctcct aactatctgc ttaggtagag atatatccat gtacatttat aaaccacagt 240gaatcatttg attttggaat aaagatagta taaaatgtgt cccagtgttg atatacatca 300tacattaaat atgtctggca gtgttctaat tttacagttg tccaaagata atgttagggc 360atactggcta tggatgaagc tccaatgttc agattgcaaa gaaacttaga attttactaa 420tgaaaccaaa tacatcccaa gaaatttttc agaagaaaaa aagagaaact agtagcaaag 480taaagaatca ccacaatatc atcagatttt ttttatatgt agaatattta ttcagttctt 540ttttcaagta caccttgtct tcattcattg tactttattt tttgtgaagg tttaaattta 600tttcttctat gtgtttagtg atatttaaaa tttttattta atcaagttta tcagaaagtt 660ctgttagaaa atatgacgag gctttaattc cgccatctat attttccgct attatataaa 720gataattgtt ttctcttttt aaaacaactt gaattgggat tttatatcat aattttttaa 780tgtctttttt tattatactt taagttctgg gatacatgtg cagaacgtgc aggtgtgtta 840catagatata cacgtgccat ggtggtttgc tgcacccact aacctgttat cgacattagg 900tatttctcct aatgctatca ccccctattt ccccaccccc cgagaggccc cagtgtgtga 960tgttctcctc cctgtgtcca tgtgttctca ttgttcatct cccacttatg gtatctacca 1020taaccttgaa attgtcttat gcattcactt gtttggttgt tatatagcct ccatcaggac 1080agggatattt gctgctgctt cttttttttt tctttttgag acagtcttgc tccgtcatcc 1140aggctggagt gcttctcggc tcaatgcaac ctccacctcc caggtttaag cgattctcca 1200acttcagcct cccaaatggc tgggactgca ggcatgcacc actacacctg gctaattttt 1260gtatttgtaa tagagacaat gtttcaccat gttggccagg ctggtctcga 1310661917DNAHomo sapiensmisc_featuresequence of STAR67 66aggatcctaa 
aattttgtga ccctagagca agtactaact atgaaagtga aatagagaat 60gaaggaatta tttaattaag tccagcaaaa cccaaccaaa tcatctgtaa aatatatttg 120ttttcaacat ccaggtattt tctgtgtaaa aggttgagtt gtatgctgac ttattgggaa 180aaataattga gttttcccct tcactttgcc agtgagagga aatcagtact gtaattgtta 240aaggttaccc atacctacct ctactaccgt ctagcatagg taaagtaatg tacactgtga 300agtttcctgc ttgactgtaa tgttttcagt ttcatcccat tgattcaaca gctatttatt 360cagcacttac tacaaccatg ctggaaaccc aagagtaaat aggctgtgtt actcaacagg 420actgaggtac agccgaactg tcaggcaagg ttgctgtcct ttggacttgc ctgctttctc 480tctatgtagg aagaagaaat ggacataccg tccaggaaat agatatatgt tacatttcct 540tattccataa ttaatattaa taaccctgga cagaaactac caagtttcta gacccttata 600gtaccacctt accctttctg gatgaatcct tcacatgttg atacatttta tccaaatgaa 660aattttggta ctgtaggtat aacagacaaa gagagaacag aaaactagag atgaagtttg 720ggaaaaggtc aagaaagtaa ataatgcttc tagaagacac aaaaagaaaa atgaaatggt 780aatgttggga aagttttaat acattttgcc ctaaggaaaa aaactacttg ttgaaattct 840acttaagact ggaccttttc tctaaaaatt gtgcttgatg tgaattaaag caacacaggg 900aaatttatgg gctccttcta agttctaccc aactcaccgc aaaactgttc ctagtaggtg 960tggtatactc tttcagattc tttgtgtgta tgtatatgtg tgtgtgtgtg tgtgtttgta 1020tgtgtacagt ctatatacat atgtgtacct acatgtgtgt atatataaat atatatttac 1080ctggatgaaa tagcatatta tagaatattc ttttttcttt aaatatatat gtgcatacat 1140atgtatatgc acatatatac ataaatgtag atatagctag gtaggcattc atgtgaaaca 1200aagaagccta ttacttttta atggttgcat gatattccat cataggagta tagtacaact 1260tatgtaacac acatttggct tgttgtaaaa ttttggtatt aataaaatag cacatatcat 1320gcaaagacac ccttgcatag gtctattcat tctttgattt ttaccttagg acaaaattta 1380aaagtagaat ttctgggtca agcagtatgc tcatttaaaa tgtcattgca tatttccaaa 1440ttgtcctcca gaaaagtagt aacagtaaca attgatggac tgcgtgtttt ctaaaacttg 1500catttttttc cttattggtg aggtttggca ttttccatat gtttattggc attttaattt 1560tttttggttc atgtctttta ttcccttcct gcaaatttgt ggtgtgtctc aactttattt 1620atactctcat tttcataatt ttctaaagga atttgacttt aaaaaaataa gacagccaat 1680gctttggttt aatttcattg ctgctttttg aagtgactgc tgtgttttta tatactttta 
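1740
tattttgttg ttttagcaaa ttcttctata ttataattgt gtatgctgga acaaaaagtt 1800
atatttctta atctagataa aatatttcaa gatgttgtaa ttacagtccc ctctaaaatc 1860
atataaatag acgcatagct gtgtgatttg taattagtta tgtccattga tagatcc 1917

SEQ ID NO 67; LENGTH: 11; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: sequence around startcodon of wild-type zeocin resistance gene
SEQUENCE: 67
aaaccatggc c 11

SEQ ID NO 68; LENGTH: 50; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer ZEOforwardMUT
SEQUENCE: 68
gatctcgcga tacaggattt atgttggcca agttgaccag tgccgttccg 50

SEQ ID NO 69; LENGTH: 26; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer ZEO-WTreverse
SEQUENCE: 69
aggcgaattc agtcctgctc ctcggc 26

SEQ ID NO 70; LENGTH: 45; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer ZEO-LEUreverse
SEQUENCE: 70
aggccccgcc cccacggctg ctcgccgatc tcggtcaagg ccggc 45

SEQ ID NO 71; LENGTH: 45; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer ZEO-THRreverse
SEQUENCE: 71
aggccccgcc cccacggctg ctcgccgatc tcggtggtgg ccggc 45

SEQ ID NO 72; LENGTH: 44; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer ZEO-VALreverse
SEQUENCE: 72
aggccccgcc cccacggctg ctcgccgatc tcggtccacg ccgg 44

SEQ ID NO 73; LENGTH: 11; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: sequence around startcodon of wt d2EGFP
SEQUENCE: 73
gaattcatgg g 11

SEQ ID NO 74; LENGTH: 49; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer d2EGFPforwardBamHI
SEQUENCE: 74
gatcggatcc tatgaggaat tcgccaccat ggtgagcaag ggcgaggag 49

SEQ ID NO 75; LENGTH: 41; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer d2EGFPreverseNotI
SEQUENCE: 75
aaggaaaaaa gcggccgcct acacattgat cctagcagaa g 41

SEQ ID NO 76; LENGTH: 57; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: spacer sequence
SEQUENCE: 76
tcgatccaaa gactgccaaa tctagatccg agattttcag gagctaagga agctaaa 57

SEQ ID NO 77; LENGTH: 99; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer ZEOforwardBamHI-ATGmut/space
SEQUENCE: 77
gatcggatcc ttggtttatg tcgatccaaa gactgccaaa tctagatccg agattttcag 60
gagctaagga agctaaagcc aagttgacca gtgaagttc 99

SEQ ID NO 78; LENGTH: 38; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer ZEOforwardBamHI-GTG
SEQUENCE: 78
gatcggatcc accgtggcca agttgaccag tgccgttc 38

SEQ ID NO 79; LENGTH: 38; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer ZEOforwardBamHI-TTG
SEQUENCE: 79
gatcggatcc accttggcca agttgaccag tgccgttc 38

SEQ ID NO 80; LENGTH: 35; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSDBamHIforward
SEQUENCE: 80
gatcggatcc accatggcca agcctttgtc tcaag 35

SEQ ID NO 81; LENGTH: 25; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSD150reverse
SEQUENCE: 81
gtaaaatgat atacgttgac accag 25

SEQ ID NO 82; LENGTH: 25; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSD150forward
SEQUENCE: 82
ctggtgtcaa cgtatatcat tttac 25

SEQ ID NO 83; LENGTH: 24; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSD250reverse
SEQUENCE: 83
gccctgttct cgtttccgat cgcg 24

SEQ ID NO 84; LENGTH: 24; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSD250forward
SEQUENCE: 84
cgcgatcgga aacgagaaca gggc 24

SEQ ID NO 85; LENGTH: 24; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSD350reverse
SEQUENCE: 85
gccgtcggct gtccgtcact gtcc 24

SEQ ID NO 86; LENGTH: 24; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSD350forward
SEQUENCE: 86
ggacagtgac ggacagccga cggc 24

SEQ ID NO 87; LENGTH: 38; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSD399reverse
SEQUENCE: 87
gatcgaattc ttagccctcc cacacgtaac cagagggc 38

SEQ ID NO 88; LENGTH: 103; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSDforwardBamHIAvrII-ATGmut/space
SEQUENCE: 88
gatcggatcc taggttggtt tatgtcgatc caaagactgc caaatctaga tccgagattt 60
tcaggagcta aggaagctaa agccaagcct ttgtctcaag aag 103

SEQ ID NO 89; LENGTH: 44; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSD399reverseEcoRIAvrII
SEQUENCE: 89
gatcgaattc cctaggttag ccctcccaca cgtaaccaga gggc 44

SEQ ID NO 90; LENGTH: 42; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSDforwardBamHIAvrII-GTG
SEQUENCE: 90
gatcggatcc taggaccgtg gccaagcctt tgtctcaaga ag 42

SEQ ID NO 91; LENGTH: 42; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: primer BSDforwardBamHIAvrII-TTG
SEQUENCE: 91
gatcggatcc taggaccttg gccaagcctt tgtctcaaga ag 42

SEQ ID NO 92; LENGTH: 375; TYPE: DNA; ORGANISM: Artificial; OTHER INFORMATION: wt zeocin resistance gene
SEQUENCE: 92
atg gcc aag ttg acc agt gcc gtt ccg gtg ctc acc gcg cgc gac gtc 48
Met Ala Lys Leu Thr Ser Ala Val Pro Val Leu Thr Ala Arg Asp Val
1               5                   10                  15
gcc gga gcg gtc gag ttc tgg acc gac cgg ctc ggg ttc tcc cgg gac 96
Ala Gly Ala Val Glu Phe Trp Thr Asp Arg Leu Gly Phe Ser Arg Asp
            20                  25                  30
ttc gtg gag gac gac ttc gcc ggt gtg gtc cgg gac gac gtg acc ctg 144
Phe Val Glu Asp Asp Phe Ala Gly Val Val Arg Asp Asp Val Thr Leu
        35                  40                  45
ttc atc agc gcg gtc cag gac cag gtg gtg ccg gac aac acc ctg gcc 192
Phe Ile Ser Ala Val Gln Asp Gln Val Val Pro Asp Asn Thr Leu Ala
    50                  55                  60
tgg gtg tgg gtg cgc ggc ctg gac gag ctg tac gcc gag tgg tcg gag 240
Trp Val Trp Val Arg Gly Leu Asp Glu Leu Tyr Ala Glu Trp Ser Glu
65                  70                  75                  80
gtc gtg tcc acg aac ttc cgg gac gcc tcc ggg ccg gcc atg acc gag 288
Val Val Ser Thr Asn Phe Arg Asp Ala Ser Gly Pro Ala Met Thr Glu
                85                  90                  95
atc ggc gag cag ccg tgg ggg cgg gag ttc gcc ctg cgc gac ccg gcc 336
Ile Gly Glu Gln Pro Trp Gly Arg Glu Phe Ala Leu Arg Asp Pro Ala
            100                 105                 110
ggc aac tgc gtg cac ttc gtg gcc gag gag cag gac tga 375
Gly Asn Cys Val His Phe Val Ala Glu Glu Gln Asp
        115                 120

SEQ ID NO 93; LENGTH: 124; TYPE: PRT; ORGANISM: Artificial; OTHER INFORMATION: Synthetic Construct
SEQUENCE: 93
Met Ala Lys Leu Thr Ser Ala Val Pro Val Leu Thr Ala Arg Asp Val
1               5                   10                  15
Ala Gly Ala Val Glu Phe Trp Thr Asp Arg Leu Gly Phe Ser Arg Asp
            20                  25                  30
Phe Val Glu Asp Asp Phe Ala Gly Val Val Arg Asp Asp Val Thr Leu
        35                  40                  45
Phe Ile Ser Ala Val Gln Asp Gln Val Val Pro Asp Asn Thr Leu Ala
    50                  55 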
60Trp Val Trp Val Arg Gly Leu Asp Glu Leu Tyr Ala Glu Trp Ser Glu65 70 75 80Val Val Ser Thr Asn Phe Arg Asp Ala Ser Gly Pro Ala Met Thr Glu 85 90 95Ile Gly Glu Gln Pro Trp Gly Arg Glu Phe Ala Leu Arg Asp Pro Ala 100 105 110Gly Asn Cys Val His Phe Val Ala Glu Glu Gln Asp 115 12094399DNAArtificialwt blasticidin resistance gene 94atg gcc aag cct ttg tct caa gaa gaa tcc acc ctc att gaa aga gca 48Met Ala Lys Pro Leu Ser Gln Glu Glu Ser Thr Leu Ile Glu Arg Ala1 5 10 15acg gct aca atc aac agc atc ccc atc tct gaa gac tac agc gtc gcc 96Thr Ala Thr Ile Asn Ser Ile Pro Ile Ser Glu Asp Tyr Ser Val Ala 20 25 30agc gca gct ctc tct agc gac ggc cgc atc ttc act ggt gtc aat gta 144Ser Ala Ala Leu Ser Ser Asp Gly Arg Ile Phe Thr Gly Val Asn Val 35 40 45tat cat ttt act ggg gga cct tgt gca gaa ctc gtg gtg ctg ggc act 192Tyr His Phe Thr Gly Gly Pro Cys Ala Glu Leu Val Val Leu Gly Thr 50 55 60gct gct gct gcg gca gct ggc aac ctg act tgt atc gtc gcg atc gga 240Ala Ala Ala Ala Ala Ala Gly Asn Leu Thr Cys Ile Val Ala Ile Gly65 70 75 80aat gag aac agg ggc atc ttg agc ccc tgc gga cgg tgc cga cag gtg 288Asn Glu Asn Arg Gly Ile Leu Ser Pro Cys Gly Arg Cys Arg Gln Val 85 90 95ctt ctc gat ctg cat cct ggg atc aaa gcc ata gtg aag gac agt gat 336Leu Leu Asp Leu His Pro Gly Ile Lys Ala Ile Val Lys Asp Ser Asp 100 105 110gga cag ccg acg gca gtt ggg att cgt gaa ttg ctg ccc tct ggt tat 384Gly Gln Pro Thr Ala Val Gly Ile Arg Glu Leu Leu Pro Ser Gly Tyr 115 120 125gtg tgg gag ggc taa 399Val Trp Glu Gly 13095132PRTArtificialSynthetic Construct 95Met Ala Lys Pro Leu Ser Gln Glu Glu Ser Thr Leu Ile Glu Arg Ala1 5 10 15Thr Ala Thr Ile Asn Ser Ile Pro Ile Ser Glu Asp Tyr Ser Val Ala 20 25 30Ser Ala Ala Leu Ser Ser Asp Gly Arg Ile Phe Thr Gly Val Asn Val 35 40 45Tyr His Phe Thr Gly Gly Pro Cys Ala Glu Leu Val Val Leu Gly Thr 50 55 60Ala Ala Ala Ala Ala Ala Gly Asn Leu Thr Cys Ile Val Ala Ile Gly65 70 75 80Asn Glu Asn Arg Gly Ile Leu Ser Pro Cys Gly Arg Cys Arg Gln Val 85 90 95Leu Leu Asp Leu His Pro Gly Ile Lys 
Ala Ile Val Lys Asp Ser Asp 100 105 110Gly Gln Pro Thr Ala Val Gly Ile Arg Glu Leu Leu Pro Ser Gly Tyr 115 120 125Val Trp Glu Gly 13096600DNAArtificialwt puromycin resistance gene 96atg acc gag tac aag ccc acg gtg cgc ctc gcc acc cgc gac gac gtc 48Met Thr Glu Tyr Lys Pro Thr Val Arg Leu Ala Thr Arg Asp Asp Val1 5 10 15ccc agg gcc gta cgc acc ctc gcc gcc gcg ttc gcc gac tac ccc gcc 96Pro Arg Ala Val Arg Thr Leu Ala Ala Ala Phe Ala Asp Tyr Pro Ala 20 25 30acg cgc cac acc gtc gat ccg gac cgc cac atc gag cgg gtc acc gag 144Thr Arg His Thr Val Asp Pro Asp Arg His Ile Glu Arg Val Thr Glu 35 40 45ctg caa gaa ctc ttc ctc acg cgc gtc ggg ctc gac atc ggc aag gtg 192Leu Gln Glu Leu Phe Leu Thr Arg Val Gly Leu Asp Ile Gly Lys Val 50 55 60tgg gtc gcg gac gac ggc gcc gcg gtg gcg gtc tgg acc acg ccg gag 240Trp Val Ala Asp Asp Gly Ala Ala Val Ala Val Trp Thr Thr Pro Glu65 70 75 80agc gtc gaa gcg ggg gcg gtg ttc gcc gag atc ggc ccg cgc atg gcc 288Ser Val Glu Ala Gly Ala Val Phe Ala Glu Ile Gly Pro Arg Met Ala 85 90 95gag ttg agc ggt tcc cgg ctg gcc gcg cag caa cag atg gaa ggc ctc 336Glu Leu Ser Gly Ser Arg Leu Ala Ala Gln Gln Gln Met Glu Gly Leu 100 105 110ctg gcg ccg cac cgg ccc aag gag ccc gcg tgg ttc ctg gcc acc gtc 384Leu Ala Pro His Arg Pro Lys Glu Pro Ala Trp Phe Leu Ala Thr Val 115 120 125ggc gtc tcg ccc gac cac cag ggc aag ggt ctg ggc agc gcc gtc gtg 432Gly Val Ser Pro Asp His Gln Gly Lys Gly Leu Gly Ser Ala Val Val 130 135 140ctc ccc gga gtg gag gcg gcc gag cgc gcc ggg gtg ccc gcc ttc ctg 480Leu Pro Gly Val Glu Ala Ala Glu Arg Ala Gly Val Pro Ala Phe Leu145 150 155 160gag acc tcc gcg ccc cgc aac ctc ccc ttc tac gag cgg ctc ggc ttc 528Glu Thr Ser Ala Pro Arg Asn Leu Pro Phe Tyr Glu Arg Leu Gly Phe 165 170 175acc gtc acc gcc gac gtc gag tgc ccg aag gac cgc gcg acc tgg tgc 576Thr Val Thr Ala Asp Val Glu Cys Pro Lys Asp Arg Ala Thr Trp Cys 180 185 190atg acc cgc aag ccc ggt gcc tga 600Met Thr Arg Lys Pro Gly Ala 19597199PRTArtificialSynthetic Construct 97Met Thr Glu Tyr Lys 
Pro Thr Val Arg Leu Ala Thr Arg Asp Asp Val1 5 10 15Pro Arg Ala Val Arg Thr Leu Ala Ala Ala Phe Ala Asp Tyr Pro Ala 20 25 30Thr Arg His Thr Val Asp Pro Asp Arg His Ile Glu Arg Val Thr Glu 35 40 45Leu Gln Glu Leu Phe Leu Thr Arg Val Gly Leu Asp Ile Gly Lys Val 50 55 60Trp Val Ala Asp Asp Gly Ala Ala Val Ala Val Trp Thr Thr Pro Glu65 70 75 80Ser Val Glu Ala Gly Ala Val Phe Ala Glu Ile Gly Pro Arg Met Ala 85 90 95Glu Leu Ser Gly Ser Arg Leu Ala Ala Gln Gln Gln Met Glu Gly Leu 100 105 110Leu Ala Pro His Arg Pro Lys Glu Pro Ala Trp Phe Leu Ala Thr Val 115 120 125Gly Val Ser Pro Asp His Gln Gly Lys Gly Leu Gly Ser Ala Val Val 130 135 140Leu Pro Gly Val Glu Ala Ala Glu Arg Ala Gly Val Pro Ala Phe Leu145 150 155 160Glu Thr Ser Ala Pro Arg Asn Leu Pro Phe Tyr Glu Arg Leu Gly Phe 165 170 175Thr Val Thr Ala Asp Val Glu Cys Pro Lys Asp Arg Ala Thr Trp Cys 180 185 190Met Thr Arg Lys Pro Gly Ala 19598564DNAArtificialwt DHFR gene (from mouse) 98atg gtt cga cca ttg aac tgc atc gtc gcc gtg tcc caa aat atg ggg 48Met Val Arg Pro Leu Asn Cys Ile Val Ala Val Ser Gln Asn Met Gly1 5 10 15att ggc aag aac gga gac cta ccc tgg cct ccg ctc agg aac gag ttc 96Ile Gly Lys Asn Gly Asp Leu Pro Trp Pro Pro Leu Arg Asn Glu Phe 20 25 30aag tac ttc caa aga atg acc aca acc tct tca gtg gaa ggt aaa cag 144Lys Tyr Phe Gln Arg Met Thr Thr Thr Ser Ser Val Glu Gly Lys Gln 35 40 45aat ctg gtg att atg ggt agg aaa acc tgg ttc tcc att cct gag aag 192Asn Leu Val Ile Met Gly Arg Lys Thr Trp Phe Ser Ile Pro Glu Lys 50 55 60aat cga cct tta aag gac aga att aat ata gtt ctc agt aga gaa ctc 240Asn Arg Pro Leu Lys Asp Arg Ile Asn Ile Val Leu Ser Arg Glu Leu65 70 75 80aaa gaa cca cca cga gga gct cat ttt ctt gcc aaa agt ttg gat gat 288Lys Glu Pro Pro Arg Gly Ala His Phe Leu Ala Lys Ser Leu Asp Asp 85 90 95gcc tta aga ctt att gaa caa ccg gaa ttg gca agt aaa gta gac atg 336Ala Leu Arg Leu Ile Glu Gln Pro Glu Leu

Ala Ser Lys Val Asp Met 100 105 110gtt tgg ata gtc gga ggc agt tct gtt tac cag gaa gcc atg aat caa 384Val Trp Ile Val Gly Gly Ser Ser Val Tyr Gln Glu Ala Met Asn Gln 115 120 125cca ggc cac ctc aga ctc ttt gtg aca agg atc atg cag gaa ttt gaa 432Pro Gly His Leu Arg Leu Phe Val Thr Arg Ile Met Gln Glu Phe Glu 130 135 140agt gac acg ttt ttc cca gaa att gat ttg ggg aaa tat aaa ctt ctc 480Ser Asp Thr Phe Phe Pro Glu Ile Asp Leu Gly Lys Tyr Lys Leu Leu145 150 155 160cca gaa tac cca ggc gtc ctc tct gag gtc cag gag gaa aaa ggc atc 528Pro Glu Tyr Pro Gly Val Leu Ser Glu Val Gln Glu Glu Lys Gly Ile 165 170 175aag tat aag ttt gaa gtc tac gag aag aaa gac taa 564Lys Tyr Lys Phe Glu Val Tyr Glu Lys Lys Asp 180 18599187PRTArtificialSynthetic Construct 99Met Val Arg Pro Leu Asn Cys Ile Val Ala Val Ser Gln Asn Met Gly1 5 10 15Ile Gly Lys Asn Gly Asp Leu Pro Trp Pro Pro Leu Arg Asn Glu Phe 20 25 30Lys Tyr Phe Gln Arg Met Thr Thr Thr Ser Ser Val Glu Gly Lys Gln 35 40 45Asn Leu Val Ile Met Gly Arg Lys Thr Trp Phe Ser Ile Pro Glu Lys 50 55 60Asn Arg Pro Leu Lys Asp Arg Ile Asn Ile Val Leu Ser Arg Glu Leu65 70 75 80Lys Glu Pro Pro Arg Gly Ala His Phe Leu Ala Lys Ser Leu Asp Asp 85 90 95Ala Leu Arg Leu Ile Glu Gln Pro Glu Leu Ala Ser Lys Val Asp Met 100 105 110Val Trp Ile Val Gly Gly Ser Ser Val Tyr Gln Glu Ala Met Asn Gln 115 120 125Pro Gly His Leu Arg Leu Phe Val Thr Arg Ile Met Gln Glu Phe Glu 130 135 140Ser Asp Thr Phe Phe Pro Glu Ile Asp Leu Gly Lys Tyr Lys Leu Leu145 150 155 160Pro Glu Tyr Pro Gly Val Leu Ser Glu Val Gln Glu Glu Lys Gly Ile 165 170 175Lys Tyr Lys Phe Glu Val Tyr Glu Lys Lys Asp 180 1851001143DNAArtificialwt hygromycin resistance gene 100atg aaa aag cct gaa ctc acc gcg acg tct gtc gag aag ttt ctg atc 48Met Lys Lys Pro Glu Leu Thr Ala Thr Ser Val Glu Lys Phe Leu Ile1 5 10 15gaa aag ttc gac agc gtc tcc gac ctg atg cag ctc tcg gag ggc gaa 96Glu Lys Phe Asp Ser Val Ser Asp Leu Met Gln Leu Ser Glu Gly Glu 20 25 30gaa tct cgt gct ttc agc ttc gat gta gga ggg cgt gga tat gtc 
ctg 144Glu Ser Arg Ala Phe Ser Phe Asp Val Gly Gly Arg Gly Tyr Val Leu 35 40 45cgg gta aat agc tgc gcc gat ggt ttc tac aaa gat cgt tat gtt tat 192Arg Val Asn Ser Cys Ala Asp Gly Phe Tyr Lys Asp Arg Tyr Val Tyr 50 55 60cgg cac ttt gca tcg gcc gcg ctc ccg att ccg gaa gtg ctt gac att 240Arg His Phe Ala Ser Ala Ala Leu Pro Ile Pro Glu Val Leu Asp Ile65 70 75 80ggg gaa ttc agc gag agc ctg acc tat tgc atc tcc cgc cgt gca cag 288Gly Glu Phe Ser Glu Ser Leu Thr Tyr Cys Ile Ser Arg Arg Ala Gln 85 90 95ggt gtc acg ttg caa gac ctg cct gaa acc gaa ctg ccc gct gtt ctg 336Gly Val Thr Leu Gln Asp Leu Pro Glu Thr Glu Leu Pro Ala Val Leu 100 105 110cag ccg gtc gcg gag gcc atg gat gcg atc gct gcg gcc gat ctt agc 384Gln Pro Val Ala Glu Ala Met Asp Ala Ile Ala Ala Ala Asp Leu Ser 115 120 125cag acg agc ggg ttc ggc cca ttc gga ccg caa gga atc ggt caa tac 432Gln Thr Ser Gly Phe Gly Pro Phe Gly Pro Gln Gly Ile Gly Gln Tyr 130 135 140act aca tgg cgt gat ttc ata tgc gcg att gct gat ccc cat gtg tat 480Thr Thr Trp Arg Asp Phe Ile Cys Ala Ile Ala Asp Pro His Val Tyr145 150 155 160cac tgg caa act gtg atg gac gac acc gtc agt gcg tcc gtc gcg cag 528His Trp Gln Thr Val Met Asp Asp Thr Val Ser Ala Ser Val Ala Gln 165 170 175gct ctc gat gag ctg atg ctt tgg gcc gag gac tgc ccc gaa gtc cgg 576Ala Leu Asp Glu Leu Met Leu Trp Ala Glu Asp Cys Pro Glu Val Arg 180 185 190cac ctc gtg cac gcg gat ttc ggc tcc aac aat gtc ctg acg gac aat 624His Leu Val His Ala Asp Phe Gly Ser Asn Asn Val Leu Thr Asp Asn 195 200 205ggc cgc ata aca gcg gtc att gac tgg agc gag gcg atg ttc ggg gat 672Gly Arg Ile Thr Ala Val Ile Asp Trp Ser Glu Ala Met Phe Gly Asp 210 215 220tcc caa tac gag gtc gcc aac atc ttc ttc tgg agg ccg tgg ttg gct 720Ser Gln Tyr Glu Val Ala Asn Ile Phe Phe Trp Arg Pro Trp Leu Ala225 230 235 240tgt atg gag cag cag acg cgc tac ttc gag cgg agg cat ccg gag ctt 768Cys Met Glu Gln Gln Thr Arg Tyr Phe Glu Arg Arg His Pro Glu Leu 245 250 255gca gga tcg ccg cgg ctc cgg gcg tat atg ctc cgc att ggt ctt gac 816Ala Gly 
Ser Pro Arg Leu Arg Ala Tyr Met Leu Arg Ile Gly Leu Asp 260 265 270caa ctc tat cag agc ttg gtt gac ggc aat ttc gat gat gca gct tgg 864Gln Leu Tyr Gln Ser Leu Val Asp Gly Asn Phe Asp Asp Ala Ala Trp 275 280 285gcg cag ggt cga tgc gac gca atc gtc cga tcc gga gcc ggg act gtc 912Ala Gln Gly Arg Cys Asp Ala Ile Val Arg Ser Gly Ala Gly Thr Val 290 295 300ggg cgt aca caa atc gcc cgc aga agc gcg gcc gtc tgg acc gat ggc 960Gly Arg Thr Gln Ile Ala Arg Arg Ser Ala Ala Val Trp Thr Asp Gly305 310 315 320tgt gta gaa gta ctc gcc gat agt gga aac cga cgc ccc agc act cgt 1008Cys Val Glu Val Leu Ala Asp Ser Gly Asn Arg Arg Pro Ser Thr Arg 325 330 335ccg gag gca aag gaa ttc ggg aga tgg ggg agg cta act gaa aca cgg 1056Pro Glu Ala Lys Glu Phe Gly Arg Trp Gly Arg Leu Thr Glu Thr Arg 340 345 350aag gag aca ata ccg gaa gga acc cgc gct atg acg gca ata aaa aga 1104Lys Glu Thr Ile Pro Glu Gly Thr Arg Ala Met Thr Ala Ile Lys Arg 355 360 365cag aat aaa acg cac ggg tgt tgg gtc gtt tgt tca taa 1143Gln Asn Lys Thr His Gly Cys Trp Val Val Cys Ser 370 375 380101380PRTArtificialSynthetic Construct 101Met Lys Lys Pro Glu Leu Thr Ala Thr Ser Val Glu Lys Phe Leu Ile1 5 10 15Glu Lys Phe Asp Ser Val Ser Asp Leu Met Gln Leu Ser Glu Gly Glu 20 25 30Glu Ser Arg Ala Phe Ser Phe Asp Val Gly Gly Arg Gly Tyr Val Leu 35 40 45Arg Val Asn Ser Cys Ala Asp Gly Phe Tyr Lys Asp Arg Tyr Val Tyr 50 55 60Arg His Phe Ala Ser Ala Ala Leu Pro Ile Pro Glu Val Leu Asp Ile65 70 75 80Gly Glu Phe Ser Glu Ser Leu Thr Tyr Cys Ile Ser Arg Arg Ala Gln 85 90 95Gly Val Thr Leu Gln Asp Leu Pro Glu Thr Glu Leu Pro Ala Val Leu 100 105 110Gln Pro Val Ala Glu Ala Met Asp Ala Ile Ala Ala Ala Asp Leu Ser 115 120 125Gln Thr Ser Gly Phe Gly Pro Phe Gly Pro Gln Gly Ile Gly Gln Tyr 130 135 140Thr Thr Trp Arg Asp Phe Ile Cys Ala Ile Ala Asp Pro His Val Tyr145 150 155 160His Trp Gln Thr Val Met Asp Asp Thr Val Ser Ala Ser Val Ala Gln 165 170 175Ala Leu Asp Glu Leu Met Leu Trp Ala Glu Asp Cys Pro Glu Val Arg 180 185 190His Leu Val His Ala Asp Phe 
Gly Ser Asn Asn Val Leu Thr Asp Asn 195 200 205Gly Arg Ile Thr Ala Val Ile Asp Trp Ser Glu Ala Met Phe Gly Asp 210 215 220Ser Gln Tyr Glu Val Ala Asn Ile Phe Phe Trp Arg Pro Trp Leu Ala225 230 235 240Cys Met Glu Gln Gln Thr Arg Tyr Phe Glu Arg Arg His Pro Glu Leu 245 250 255Ala Gly Ser Pro Arg Leu Arg Ala Tyr Met Leu Arg Ile Gly Leu Asp 260 265 270Gln Leu Tyr Gln Ser Leu Val Asp Gly Asn Phe Asp Asp Ala Ala Trp 275 280 285Ala Gln Gly Arg Cys Asp Ala Ile Val Arg Ser Gly Ala Gly Thr Val 290 295 300Gly Arg Thr Gln Ile Ala Arg Arg Ser Ala Ala Val Trp Thr Asp Gly305 310 315 320Cys Val Glu Val Leu Ala Asp Ser Gly Asn Arg Arg Pro Ser Thr Arg 325 330 335Pro Glu Ala Lys Glu Phe Gly Arg Trp Gly Arg Leu Thr Glu Thr Arg 340 345 350Lys Glu Thr Ile Pro Glu Gly Thr Arg Ala Met Thr Ala Ile Lys Arg 355 360 365Gln Asn Lys Thr His Gly Cys Trp Val Val Cys Ser 370 375 380102804DNAArtificialwt neomycin resistance gene 102atg gga tcg gcc att gaa caa gat gga ttg cac gca ggt tct ccg gcc 48Met Gly Ser Ala Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala1 5 10 15gct tgg gtg gag agg cta ttc ggc tat gac tgg gca caa cag aca atc 96Ala Trp Val Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile 20 25 30ggc tgc tct gat gcc gcc gtg ttc cgg ctg tca gcg cag ggg cgc ccg 144Gly Cys Ser Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro 35 40 45gtt ctt ttt gtc aag acc gac ctg tcc ggt gcc ctg aat gaa ctg cag 192Val Leu Phe Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln 50 55 60gac gag gca gcg cgg cta tcg tgg ctg gcc acg acg ggc gtt cct tgc 240Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys65 70 75 80gca gct gtg ctc gac gtt gtc act gaa gcg gga agg gac tgg ctg cta 288Ala Ala Val Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu 85 90 95ttg ggc gaa gtg ccg ggg cag gat ctc ctg tca tct cac ctt gct cct 336Leu Gly Glu Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro 100 105 110gcc gag aaa gta tcc atc atg gct gat gca atg cgg cgg ctg cat acg 384Ala Glu Lys Val Ser Ile Met Ala Asp Ala Met 
Arg Arg Leu His Thr 115 120 125ctt gat ccg gct acc tgc cca ttc gac cac caa gcg aaa cat cgc atc 432Leu Asp Pro Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile 130 135 140gag cga gca cgt act cgg atg gaa gcc ggt ctt gtc gat cag gat gat 480Glu Arg Ala Arg Thr Arg Met Glu Ala Gly Leu Val Asp Gln Asp Asp145 150 155 160ctg gac gaa gag cat cag ggg ctc gcg cca gcc gaa ctg ttc gcc agg 528Leu Asp Glu Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg 165 170 175ctc aag gcg cgc atg ccc gac ggc gat gat ctc gtc gtg acc cat ggc 576Leu Lys Ala Arg Met Pro Asp Gly Asp Asp Leu Val Val Thr His Gly 180 185 190gat gcc tgc ttg ccg aat atc atg gtg gaa aat ggc cgc ttt tct gga 624Asp Ala Cys Leu Pro Asn Ile Met Val Glu Asn Gly Arg Phe Ser Gly 195 200 205ttc atc gac tgt ggc cgg ctg ggt gtg gcg gac cgc tat cag gac ata 672Phe Ile Asp Cys Gly Arg Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile 210 215 220gcg ttg gct acc cgt gat att gct gaa gag ctt ggc ggc gaa tgg gct 720Ala Leu Ala Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala225 230 235 240gac cgc ttc ctc gtg ctt tac ggt atc gcc gct ccc gat tcg cag cgc 768Asp Arg Phe Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg 245 250 255atc gcc ttc tat cgc ctt ctt gac gag ttc ttc tga 804Ile Ala Phe Tyr Arg Leu Leu Asp Glu Phe Phe 260 265103267PRTArtificialSynthetic Construct 103Met Gly Ser Ala Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala1 5 10 15Ala Trp Val Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile 20 25 30Gly Cys Ser Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro 35 40 45Val Leu Phe Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln 50 55 60Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys65 70 75 80Ala Ala Val Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu 85 90 95Leu Gly Glu Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro 100 105 110Ala Glu Lys Val Ser Ile Met Ala Asp Ala Met Arg Arg Leu His Thr 115 120 125Leu Asp Pro Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile 130 135 140Glu Arg Ala Arg Thr Arg 
Met Glu Ala Gly Leu Val Asp Gln Asp Asp145 150 155 160Leu Asp Glu Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg 165 170 175Leu Lys Ala Arg Met Pro Asp Gly Asp Asp Leu Val Val Thr His Gly 180 185 190Asp Ala Cys Leu Pro Asn Ile Met Val Glu Asn Gly Arg Phe Ser Gly 195 200 205Phe Ile Asp Cys Gly Arg Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile 210 215 220Ala Leu Ala Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala225 230 235 240Asp Arg Phe Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg 245 250 255Ile Ala Phe Tyr Arg Leu Leu Asp Glu Phe Phe 260 2651041121DNAArtificialwt glutamine synthase gene (human) 104atg acc acc tca gca agt tcc cac tta aat aaa ggc atc aag cag gtg 48Met Thr Thr Ser Ala Ser Ser His Leu Asn Lys Gly Ile Lys Gln Val1 5 10 15tac atg tcc ctg cct cag ggt gag aaa gtc cag gcc atg tat atc tgg 96Tyr Met Ser Leu Pro Gln Gly Glu Lys Val Gln Ala Met Tyr Ile Trp 20 25 30atc gat ggt act gga gaa gga ctg cgc tgc aag acc cgg acc ctg gac 144Ile Asp Gly Thr Gly Glu Gly Leu Arg Cys Lys Thr Arg Thr Leu Asp 35 40 45agt gag ccc aag tgt gtg gaa gag ttg cct gag tgg aat ttc gat ggc 192Ser Glu Pro Lys Cys Val Glu Glu Leu Pro Glu Trp Asn Phe Asp Gly 50 55 60tcc agt act tta cag tct gag ggt tcc aac agt gac atg tat ctc gtg 240Ser Ser Thr Leu Gln Ser Glu Gly Ser Asn Ser Asp Met Tyr Leu Val65 70 75 80cct gct gcc atg ttt cgg gac ccc ttc cgt aag gac cct aac aag ctg 288Pro Ala Ala Met Phe Arg Asp Pro Phe Arg Lys Asp Pro Asn Lys Leu 85 90 95gtg tta tgt gaa gtt ttc aag tac aat cga agg cct gca gag acc aat 336Val Leu Cys Glu Val Phe Lys Tyr Asn Arg Arg Pro Ala Glu Thr Asn 100 105 110ttg agg cac acc tgt aaa cgg ata atg gac atg gtg agc aac cag cac 384Leu Arg His Thr Cys Lys Arg Ile Met Asp Met Val Ser Asn Gln His 115 120 125ccc tgg ttt ggc atg gag cag gag tat acc ctc atg ggg aca gat ggg 432Pro Trp Phe Gly Met Glu Gln Glu Tyr Thr Leu Met Gly Thr Asp Gly 130 135 140cac ccc ttt ggt tgg cct tcc aac ggc ttc cca ggg ccc cag ggt cca 480His Pro Phe Gly Trp Pro Ser Asn Gly Phe Pro Gly Pro Gln 
Gly Pro145 150 155 160tat tac tgt ggt gtg gga gca gac aga gcc tat ggc agg gac atc gtg 528Tyr Tyr Cys Gly Val Gly Ala Asp Arg Ala Tyr Gly Arg Asp Ile Val 165 170 175gag gcc cat tac cgg gcc tgc ttg tat gct gga gtc aag att gcg ggg 576Glu Ala His Tyr Arg Ala Cys Leu Tyr Ala Gly Val Lys Ile Ala Gly 180 185 190act aat gcc gag gtc atg cct gcc cag tgg gaa ttt cag att gga cct 624Thr Asn Ala Glu Val Met Pro Ala Gln Trp Glu Phe Gln Ile Gly Pro 195 200 205tgt gaa gga atc agc atg gga gat cat ctc tgg gtg gcc cgt ttc atc 672Cys Glu Gly Ile Ser Met Gly Asp His Leu Trp Val Ala Arg Phe Ile 210 215 220ttg cat cgt gtg tgt gaa gac ttt gga gtg ata gca acc ttt gat cct 720Leu His Arg Val Cys Glu Asp Phe Gly Val Ile Ala Thr Phe Asp Pro225 230 235 240aag ccc att cct ggg aac tgg aat ggt gca ggc tgc cat acc aac ttc 768Lys Pro Ile Pro Gly Asn Trp Asn Gly Ala Gly Cys His Thr Asn Phe 245 250 255agc acc aag gcc atg cgg gag gag aat ggt ctg aag tac atc gag gag 816Ser Thr Lys Ala Met

Arg Glu Glu Asn Gly Leu Lys Tyr Ile Glu Glu 260 265 270gcc att gag aaa cta agc aag cgg cac cag tac cac atc cgt gcc tat 864Ala Ile Glu Lys Leu Ser Lys Arg His Gln Tyr His Ile Arg Ala Tyr 275 280 285gat ccc aag gga ggc ctg gac aat gcc cga cgt cta act gga ttc cat 912Asp Pro Lys Gly Gly Leu Asp Asn Ala Arg Arg Leu Thr Gly Phe His 290 295 300gaa acc tcc aac atc aac gac ttt tct ggt ggt gta gcc aat cgt agc 960Glu Thr Ser Asn Ile Asn Asp Phe Ser Gly Gly Val Ala Asn Arg Ser305 310 315 320gcc agc ata cgc att ccc cgg act gtt ggc cag gag aag aag ggt tac 1008Ala Ser Ile Arg Ile Pro Arg Thr Val Gly Gln Glu Lys Lys Gly Tyr 325 330 335ttt gaa gat cgt cgc ccc tct gcc aac tgc gac ccc ttt tcg gtg aca 1056Phe Glu Asp Arg Arg Pro Ser Ala Asn Cys Asp Pro Phe Ser Val Thr 340 345 350gaa gcc ctc atc cgc acg tgt ctt ctc aat gaa acc ggc gat gag ccc 1104Glu Ala Leu Ile Arg Thr Cys Leu Leu Asn Glu Thr Gly Asp Glu Pro 355 360 365ttc cag tac aaa aat ta 1121Phe Gln Tyr Lys Asn 370105373PRTArtificialSynthetic Construct 105Met Thr Thr Ser Ala Ser Ser His Leu Asn Lys Gly Ile Lys Gln Val1 5 10 15Tyr Met Ser Leu Pro Gln Gly Glu Lys Val Gln Ala Met Tyr Ile Trp 20 25 30Ile Asp Gly Thr Gly Glu Gly Leu Arg Cys Lys Thr Arg Thr Leu Asp 35 40 45Ser Glu Pro Lys Cys Val Glu Glu Leu Pro Glu Trp Asn Phe Asp Gly 50 55 60Ser Ser Thr Leu Gln Ser Glu Gly Ser Asn Ser Asp Met Tyr Leu Val65 70 75 80Pro Ala Ala Met Phe Arg Asp Pro Phe Arg Lys Asp Pro Asn Lys Leu 85 90 95Val Leu Cys Glu Val Phe Lys Tyr Asn Arg Arg Pro Ala Glu Thr Asn 100 105 110Leu Arg His Thr Cys Lys Arg Ile Met Asp Met Val Ser Asn Gln His 115 120 125Pro Trp Phe Gly Met Glu Gln Glu Tyr Thr Leu Met Gly Thr Asp Gly 130 135 140His Pro Phe Gly Trp Pro Ser Asn Gly Phe Pro Gly Pro Gln Gly Pro145 150 155 160Tyr Tyr Cys Gly Val Gly Ala Asp Arg Ala Tyr Gly Arg Asp Ile Val 165 170 175Glu Ala His Tyr Arg Ala Cys Leu Tyr Ala Gly Val Lys Ile Ala Gly 180 185 190Thr Asn Ala Glu Val Met Pro Ala Gln Trp Glu Phe Gln Ile Gly Pro 195 200 205Cys Glu Gly Ile Ser Met Gly Asp 
His Leu Trp Val Ala Arg Phe Ile 210 215 220
Leu His Arg Val Cys Glu Asp Phe Gly Val Ile Ala Thr Phe Asp Pro 225 230 235 240
Lys Pro Ile Pro Gly Asn Trp Asn Gly Ala Gly Cys His Thr Asn Phe 245 250 255
Ser Thr Lys Ala Met Arg Glu Glu Asn Gly Leu Lys Tyr Ile Glu Glu 260 265 270
Ala Ile Glu Lys Leu Ser Lys Arg His Gln Tyr His Ile Arg Ala Tyr 275 280 285
Asp Pro Lys Gly Gly Leu Asp Asn Ala Arg Arg Leu Thr Gly Phe His 290 295 300
Glu Thr Ser Asn Ile Asn Asp Phe Ser Gly Gly Val Ala Asn Arg Ser 305 310 315 320
Ala Ser Ile Arg Ile Pro Arg Thr Val Gly Gln Glu Lys Lys Gly Tyr 325 330 335
Phe Glu Asp Arg Arg Pro Ser Ala Asn Cys Asp Pro Phe Ser Val Thr 340 345 350
Glu Ala Leu Ile Arg Thr Cys Leu Leu Asn Glu Thr Gly Asp Glu Pro 355 360 365
Phe Gln Tyr Lys Asn 370
106 43 DNA Artificial primer GTGspaceBamHIF
106 gaattcggat ccaccgtggc gatccaaaga ctgccaaatc tag 43
107 42 DNA Artificial primer ZEOTTTGTGBamHIF
107 gaattcggat cctttgtggc caagttgacc agtgccgttc cg 42
108 46 DNA Artificial primer ZEOForwardGTG-Thr9
108 aattggatcc accgtggcca agttgaccag tgccgttacc gtgctc 46
109 46 DNA Artificial primer ZEOForward GTG-Phe9
109 aattggatcc accgtggcca agttgaccag tgccgttttc gtgctc 46
110 43 DNA Artificial primer TTGspaceBamHIF
110 gaattcggat ccaccttggc gatccaaaga ctgccaaatc tag 43
111 46 DNA Artificial primer ZEOForwardTTG-Thr9
111 aattggatcc accttggcca agttgaccag tgccgttacc gtgctc 46
112 46 DNA Artificial primer ZEOForwardTTG-Phe9
112 aattggatcc accttggcca agttgaccag tgccgttttc gtgctc 46
113 37 DNA Artificial primer PURO BamHI F
113 gatcggatcc atggttaccg agtacaagcc cacggtg 37
114 35 DNA Artificial primer PURO300 R LEU
114 cagccgggaa ccgctcaact cggccaggcg cgggc 35
115 49 DNA Artificial primer PURO300FLEU
115 cgagttgagc ggttcccggc tggccgcgca gcaacagctg gaaggcctc 49
116 44 DNA Artificial primer PURO600RLEU
116 aagcttgaat tcaggcaccg ggcttgcggg tcaggcacca ggtc 44
117 42 DNA Artificial primer PUROBamHI TTG1F
117 gaattcggat ccaccttggt taccgagtac aagcccacgg tg 42
118 804 DNA Artificial modified neomycin resistance gene lacking internal ATG sequences
118
atg gga tcg gcc att gaa caa gac gga ttg
cac gca ggt tct ccg gcc 48Met Gly Ser Ala Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala1 5 10 15gct tgg gtg gag agg cta ttc ggc tac gac tgg gca caa cag aca atc 96Ala Trp Val Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile 20 25 30ggc tgc tct gac gcc gcc gtg ttc cgg ctg tca gcg cag ggg cgc ccg 144Gly Cys Ser Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro 35 40 45gtt ctt ttt gtc aag acc gac ctg tcc ggt gcc ctg aac gaa ctg cag 192Val Leu Phe Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln 50 55 60gac gag gca gcg cgg cta tcg tgg ctg gcc acg acg ggc gtt cct tgc 240Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys65 70 75 80gca gct gtg ctc gac gtt gtc act gaa gcg gga agg gac tgg ctg cta 288Ala Ala Val Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu 85 90 95ttg ggc gaa gtg ccg ggg cag gat ctc ctg tca tct cac ctt gct cct 336Leu Gly Glu Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro 100 105 110gcc gag aaa gta tcc atc ctg gct gac gca ctg cgg cgg ctg cat acg 384Ala Glu Lys Val Ser Ile Leu Ala Asp Ala Leu Arg Arg Leu His Thr 115 120 125ctt gat ccg gct acc tgc cca ttc gac cac caa gcg aaa cat cgc atc 432Leu Asp Pro Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile 130 135 140gag cga gca cgt act cgg ctg gaa gcc ggt ctt gtc gat cag gac gat 480Glu Arg Ala Arg Thr Arg Leu Glu Ala Gly Leu Val Asp Gln Asp Asp145 150 155 160ctg gac gaa gag cat cag ggg ctc gcg cca gcc gaa ctg ttc gcc agg 528Leu Asp Glu Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg 165 170 175ctc aag gcg cgc ctg ccc gac ggc gac gat ctc gtc gtg acc cac ggc 576Leu Lys Ala Arg Leu Pro Asp Gly Asp Asp Leu Val Val Thr His Gly 180 185 190gac gcc tgc ttg ccg aat atc ctg gtg gaa aac ggc cgc ttt tct gga 624Asp Ala Cys Leu Pro Asn Ile Leu Val Glu Asn Gly Arg Phe Ser Gly 195 200 205ttc atc gac tgt ggc cgg ctg ggt gtg gcg gac cgc tat cag gac ata 672Phe Ile Asp Cys Gly Arg Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile 210 215 220gcg ttg gct acc cgt gat att gct gaa gag ctt ggc ggc gag tgg gct 
720Ala Leu Ala Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala225 230 235 240gac cgc ttc ctc gtg ctt tac ggt atc gcc gct ccc gat tcg cag cgc 768Asp Arg Phe Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg 245 250 255atc gcc ttc tat cgc ctt ctt gac gag ttc ttc tga 804Ile Ala Phe Tyr Arg Leu Leu Asp Glu Phe Phe 260 265119267PRTArtificialSynthetic Construct 119Met Gly Ser Ala Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala1 5 10 15Ala Trp Val Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile 20 25 30Gly Cys Ser Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro 35 40 45Val Leu Phe Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln 50 55 60Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys65 70 75 80Ala Ala Val Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu 85 90 95Leu Gly Glu Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro 100 105 110Ala Glu Lys Val Ser Ile Leu Ala Asp Ala Leu Arg Arg Leu His Thr 115 120 125Leu Asp Pro Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile 130 135 140Glu Arg Ala Arg Thr Arg Leu Glu Ala Gly Leu Val Asp Gln Asp Asp145 150 155 160Leu Asp Glu Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg 165 170 175Leu Lys Ala Arg Leu Pro Asp Gly Asp Asp Leu Val Val Thr His Gly 180 185 190Asp Ala Cys Leu Pro Asn Ile Leu Val Glu Asn Gly Arg Phe Ser Gly 195 200 205Phe Ile Asp Cys Gly Arg Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile 210 215 220Ala Leu Ala Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala225 230 235 240Asp Arg Phe Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg 245 250 255Ile Ala Phe Tyr Arg Leu Leu Asp Glu Phe Phe 260 26512040DNAArtificialprimer NEO-F-HindIII 120gatcaagctt ttggatcggc cattgaaaca agacggattg 4012136DNAArtificialprimer NEO EcoRI 800R 121aagcttgaat tctcagaaga actcgtcaag aaggcg 36122564DNAArtificialmodified dhfr gene lacking internal ATG sequences 122atg gtt cga cca ttg aac tgc atc gtc gcc gtg tcc caa aat ctg ggg 48Met Val Arg Pro Leu Asn Cys Ile Val Ala Val Ser Gln Asn Leu Gly1 5 10 15att ggc 
aag aac gga gac cta ccc tgg cct ccg ctc agg aac gag ttc 96Ile Gly Lys Asn Gly Asp Leu Pro Trp Pro Pro Leu Arg Asn Glu Phe 20 25 30aag tac ttc caa aga ctg acc aca acc tct tca gtg gaa ggt aaa cag 144Lys Tyr Phe Gln Arg Leu Thr Thr Thr Ser Ser Val Glu Gly Lys Gln 35 40 45aat ctg gtg att ctg ggt agg aaa acc tgg ttc tcc att cct gag aag 192Asn Leu Val Ile Leu Gly Arg Lys Thr Trp Phe Ser Ile Pro Glu Lys 50 55 60aat cga cct tta aag gac aga att aat ata gtt ctc agt aga gaa ctc 240Asn Arg Pro Leu Lys Asp Arg Ile Asn Ile Val Leu Ser Arg Glu Leu65 70 75 80aaa gaa cca cca cga gga gct cat ttt ctt gcc aaa agt ttg gac gac 288Lys Glu Pro Pro Arg Gly Ala His Phe Leu Ala Lys Ser Leu Asp Asp 85 90 95gcc tta aga ctt att gaa caa ccg gaa ttg gca agt aaa gta gac ctg 336Ala Leu Arg Leu Ile Glu Gln Pro Glu Leu Ala Ser Lys Val Asp Leu 100 105 110gtt tgg ata gtc gga ggc agt tct gtt tac cag gaa gcc ctg aat caa 384Val Trp Ile Val Gly Gly Ser Ser Val Tyr Gln Glu Ala Leu Asn Gln 115 120 125cca ggc cac ctc aga ctc ttt gtg aca agg att ctg cag gaa ttt gaa 432Pro Gly His Leu Arg Leu Phe Val Thr Arg Ile Leu Gln Glu Phe Glu 130 135 140agt gac acg ttt ttc cca gaa att gat ttg ggg aaa tat aaa ctt ctc 480Ser Asp Thr Phe Phe Pro Glu Ile Asp Leu Gly Lys Tyr Lys Leu Leu145 150 155 160cca gaa tac cca ggc gtc ctc tct gag gtc cag gag gaa aaa ggc atc 528Pro Glu Tyr Pro Gly Val Leu Ser Glu Val Gln Glu Glu Lys Gly Ile 165 170 175aag tat aag ttt gaa gtc tac gag aag aaa gac taa 564Lys Tyr Lys Phe Glu Val Tyr Glu Lys Lys Asp 180 185123187PRTArtificialSynthetic Construct 123Met Val Arg Pro Leu Asn Cys Ile Val Ala Val Ser Gln Asn Leu Gly1 5 10 15Ile Gly Lys Asn Gly Asp Leu Pro Trp Pro Pro Leu Arg Asn Glu Phe 20 25 30Lys Tyr Phe Gln Arg Leu Thr Thr Thr Ser Ser Val Glu Gly Lys Gln 35 40 45Asn Leu Val Ile Leu Gly Arg Lys Thr Trp Phe Ser Ile Pro Glu Lys 50 55 60Asn Arg Pro Leu Lys Asp Arg Ile Asn Ile Val Leu Ser Arg Glu Leu65 70 75 80Lys Glu Pro Pro Arg Gly Ala His Phe Leu Ala Lys Ser Leu Asp Asp 85 90 95Ala Leu Arg Leu Ile 
Glu Gln Pro Glu Leu Ala Ser Lys Val Asp Leu 100 105 110Val Trp Ile Val Gly Gly Ser Ser Val Tyr Gln Glu Ala Leu Asn Gln 115 120 125Pro Gly His Leu Arg Leu Phe Val Thr Arg Ile Leu Gln Glu Phe Glu 130 135 140Ser Asp Thr Phe Phe Pro Glu Ile Asp Leu Gly Lys Tyr Lys Leu Leu145 150 155 160Pro Glu Tyr Pro Gly Val Leu Ser Glu Val Gln Glu Glu Lys Gly Ile 165 170 175Lys Tyr Lys Phe Glu Val Tyr Glu Lys Lys Asp 180 18512436DNAArtificialprimer DHFR-F-HindIII 124gatcaagctt ttgttcgacc attgaactgc atcgtc 3612536DNAArtificialprimer DHFR-EcoRI-600-R 125aagcttgaat tcttagtctt tcttctcgta gacttc 36126154DNAArtificialcombined synthetic polyadenylation sequence and pausing signal from the human alpha2 globin gene 126aataaaatat ctttattttc attacatctg tgtgttggtt ttttgtgtga atcgatagta 60ctaacatacg ctctccatca aaacaaaacg aaacaaaaca aactagcaaa ataggctgtc 120cccagtgcaa gtgcaggtgc cagaacattt ctct 154127596DNAArtificialIRES sequence 127gcccctctcc ctcccccccc cctaacgtta ctggccgaag ccgcttggaa taaggccggt 60gtgcgtttgt ctatatgtga ttttccacca tattgccgtc ttttggcaat gtgagggccc 120ggaaacctgg ccctgtcttc ttgacgagca ttcctagggg tctttcccct ctcgccaaag 180gaatgcaagg tctgttgaat gtcgtgaagg aagcagttcc tctggaagct tcttgaagac 240aaacaacgtc tgtagcgacc ctttgcaggc agcggaaccc cccacctggc gacaggtgcc 300tctgcggcca aaagccacgt gtataagata cacctgcaaa ggcggcacaa ccccagtgcc 360acgttgtgag ttggatagtt gtggaaagag tcaaatggct ctcctcaagc gtattcaaca 420aggggctgaa ggatgcccag aaggtacccc attgtatggg atctgatctg gggcctcggt 480gcacatgctt tacatgtgtt tagtcgaggt taaaaaaacg tctaggcccc ccgaaccacg 540gggacgtggt tttcctttga aaaacacgat gataagcttg ccacaacccc gggata 596128804DNAArtificialwild type neomycin (Neo) resistance sequence 128atg gga tcg gcc att gaa caa gat gga ttg cac gca ggt tct ccg gcc 48Met Gly Ser Ala Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala1 5 10 15gct tgg gtg gag agg cta ttc ggc tat gac tgg gca caa cag aca atc 96Ala Trp Val Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile 20 25 30ggc tgc tct gat gcc gcc gtg ttc cgg ctg tca gcg cag ggg 
cgc ccg 144Gly Cys Ser Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro 35 40 45gtt ctt ttt gtc aag acc gac ctg tcc ggt gcc ctg aat gaa ctg cag 192Val Leu Phe Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln 50 55 60gac gag gca gcg cgg cta tcg tgg ctg gcc acg acg ggc gtt cct tgc 240Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys65 70 75 80gca gct gtg ctc gac gtt gtc act gaa gcg gga agg gac tgg ctg cta 288Ala Ala Val Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu 85 90 95ttg ggc gaa gtg ccg ggg cag gat ctc ctg tca tct cac ctt gct cct 336Leu Gly Glu Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro 100 105 110gcc gag aaa gta tcc atc atg gct gat gca atg cgg cgg ctg cat acg 384Ala Glu Lys Val Ser Ile Met Ala Asp Ala Met Arg Arg Leu His Thr 115 120 125ctt gat ccg gct acc tgc cca ttc gac cac caa gcg aaa cat cgc atc 432Leu Asp Pro Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile 130 135 140gag cga gca cgt act cgg atg gaa
gcc ggt ctt gtc gat cag gat gat 480Glu Arg Ala Arg Thr Arg Met Glu Ala Gly Leu Val Asp Gln Asp Asp145 150 155 160ctg gac gaa gag cat cag ggg ctc gcg cca gcc gaa ctg ttc gcc agg 528Leu Asp Glu Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg 165 170 175ctc aag gcg cgc atg ccc gac ggc gag gat ctc gtc gtg acc cat ggc 576Leu Lys Ala Arg Met Pro Asp Gly Glu Asp Leu Val Val Thr His Gly 180 185 190gat gcc tgc ttg ccg aat atc atg gtg gaa aat ggc cgc ttt tct gga 624Asp Ala Cys Leu Pro Asn Ile Met Val Glu Asn Gly Arg Phe Ser Gly 195 200 205ttc atc gac tgt ggc cgg ctg ggt gtg gcg gac cgc tat cag gac ata 672Phe Ile Asp Cys Gly Arg Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile 210 215 220gcg ttg gct acc cgt gat att gct gaa gag ctt ggc ggc gaa tgg gct 720Ala Leu Ala Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala225 230 235 240gac cgc ttc ctc gtg ctt tac ggt atc gcc gct ccc gat tcg cag cgc 768Asp Arg Phe Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg 245 250 255atc gcc ttc tat cgc ctt ctt gac gag ttc ttc tga 804Ile Ala Phe Tyr Arg Leu Leu Asp Glu Phe Phe 260 265129267PRTArtificialSynthetic Construct 129Met Gly Ser Ala Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala1 5 10 15Ala Trp Val Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile 20 25 30Gly Cys Ser Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro 35 40 45Val Leu Phe Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln 50 55 60Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys65 70 75 80Ala Ala Val Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu 85 90 95Leu Gly Glu Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro 100 105 110Ala Glu Lys Val Ser Ile Met Ala Asp Ala Met Arg Arg Leu His Thr 115 120 125Leu Asp Pro Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile 130 135 140Glu Arg Ala Arg Thr Arg Met Glu Ala Gly Leu Val Asp Gln Asp Asp145 150 155 160Leu Asp Glu Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg 165 170 175Leu Lys Ala Arg Met Pro Asp Gly Glu Asp Leu Val Val Thr His Gly 180 185 
190Asp Ala Cys Leu Pro Asn Ile Met Val Glu Asn Gly Arg Phe Ser Gly 195 200 205Phe Ile Asp Cys Gly Arg Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile 210 215 220Ala Leu Ala Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala225 230 235 240Asp Arg Phe Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg 245 250 255Ile Ala Phe Tyr Arg Leu Leu Asp Glu Phe Phe 260 265130804DNAArtificialCpG poor Neo resistance sequence 130atg gga agt gcc att gaa caa gac gga ttg cac gca ggt tct cct gca 48Met Gly Ser Ala Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala1 5 10 15gct tgg gtg gag agg cta ttt ggc tac gac tgg gca caa cag aca ata 96Ala Trp Val Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile 20 25 30ggc tgc tct gac gca gca gtg ttc aga ctg tca gca cag ggg aga cca 144Gly Cys Ser Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro 35 40 45gtt ctt ttt gtc aag act gac ctg tca ggt gcc ctg aac gaa ctg cag 192Val Leu Phe Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln 50 55 60gac gag gca gca aga cta agt tgg ctg gcc act act ggt gtt cct tgt 240Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys65 70 75 80gca gct gtg ttg gac gtt gtc act gaa gca gga agg gac tgg ctg cta 288Ala Ala Val Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu 85 90 95ttg ggt gaa gtg cct ggg cag gat ctc ctg tca tct cac ctt gct cct 336Leu Gly Glu Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro 100 105 110gca gag aaa gta tcc atc ctg gct gac gca ctg aga aga ctg cat act 384Ala Glu Lys Val Ser Ile Leu Ala Asp Ala Leu Arg Arg Leu His Thr 115 120 125ctt gat cca gct acc tgc cca ttt gac cac caa gca aaa cat aga att 432Leu Asp Pro Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile 130 135 140gag aga gca cga act aga ctg gaa gca ggt ctt gta gat cag gac gat 480Glu Arg Ala Arg Thr Arg Leu Glu Ala Gly Leu Val Asp Gln Asp Asp145 150 155 160ctg gac gaa gag cat cag ggg ttg gca cca gca gaa ctg ttt gcc agg 528Leu Asp Glu Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg 165 170 175ctc aag gca aga ctg cct gac ggt gaa gat 
ttg gtt gtg acc cac ggt 576Leu Lys Ala Arg Leu Pro Asp Gly Glu Asp Leu Val Val Thr His Gly 180 185 190gac gcc tgc ttg cct aat atc ctg gtg gaa aac ggc aga ttt tct gga 624Asp Ala Cys Leu Pro Asn Ile Leu Val Glu Asn Gly Arg Phe Ser Gly 195 200 205ttc att gac tgt ggc aga ctg ggt gtg gca gac aga tat cag gac ata 672Phe Ile Asp Cys Gly Arg Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile 210 215 220gca ttg gct acc aga gat att gct gaa gag ctt ggt ggt gag tgg gct 720Ala Leu Ala Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala225 230 235 240gac aga ttc ttg gtg ctt tac ggt ata gcc gct cct gat tca cag aga 768Asp Arg Phe Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg 245 250 255ata gcc ttc tat aga ctt ctt gac gag ttc ttc tga 804Ile Ala Phe Tyr Arg Leu Leu Asp Glu Phe Phe 260 265131267PRTArtificialSynthetic Construct 131Met Gly Ser Ala Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala1 5 10 15Ala Trp Val Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile 20 25 30Gly Cys Ser Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro 35 40 45Val Leu Phe Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln 50 55 60Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys65 70 75 80Ala Ala Val Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu 85 90 95Leu Gly Glu Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro 100 105 110Ala Glu Lys Val Ser Ile Leu Ala Asp Ala Leu Arg Arg Leu His Thr 115 120 125Leu Asp Pro Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile 130 135 140Glu Arg Ala Arg Thr Arg Leu Glu Ala Gly Leu Val Asp Gln Asp Asp145 150 155 160Leu Asp Glu Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg 165 170 175Leu Lys Ala Arg Leu Pro Asp Gly Glu Asp Leu Val Val Thr His Gly 180 185 190Asp Ala Cys Leu Pro Asn Ile Leu Val Glu Asn Gly Arg Phe Ser Gly 195 200 205Phe Ile Asp Cys Gly Arg Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile 210 215 220Ala Leu Ala Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala225 230 235 240Asp Arg Phe Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg 245 
250 255Ile Ala Phe Tyr Arg Leu Leu Asp Glu Phe Phe 260 265132375DNAArtificialCpG poor and ATG-less zeocin (Zeo) resistance sequence 132ttg gcc aag ttg acc agt gct gtc cca gtg ctc aca gcc agg gac gtg 48Leu Ala Lys Leu Thr Ser Ala Val Pro Val Leu Thr Ala Arg Asp Val1 5 10 15gct gga gct gtt gag ttc tgg act gac agg ttg ggg ttc tcc aga gat 96Ala Gly Ala Val Glu Phe Trp Thr Asp Arg Leu Gly Phe Ser Arg Asp 20 25 30ttt gtg gag gac gac ttt gca ggt gtg gtc aga gac gac gtc acc ctg 144Phe Val Glu Asp Asp Phe Ala Gly Val Val Arg Asp Asp Val Thr Leu 35 40 45ttc atc tca gca gtc cag gac cag gtg gtg cct gac aac acc ctg gct 192Phe Ile Ser Ala Val Gln Asp Gln Val Val Pro Asp Asn Thr Leu Ala 50 55 60tgg gtg tgg gtg aga gga ctg gac gag ctg tac gct gag tgg agt gag 240Trp Val Trp Val Arg Gly Leu Asp Glu Leu Tyr Ala Glu Trp Ser Glu65 70 75 80gtg gtc tcc acc aac ttc agg gac gcc agt ggc cct gcc ttg aca gag 288Val Val Ser Thr Asn Phe Arg Asp Ala Ser Gly Pro Ala Leu Thr Glu 85 90 95att gga gag cag ccc tgg ggg aga gag ttt gcc ctg aga gac cca gca 336Ile Gly Glu Gln Pro Trp Gly Arg Glu Phe Ala Leu Arg Asp Pro Ala 100 105 110ggc aac tgt gtg cac ttt gtg gca gag gag cag gac tga 375Gly Asn Cys Val His Phe Val Ala Glu Glu Gln Asp 115 120133124PRTArtificialSynthetic Construct 133Leu Ala Lys Leu Thr Ser Ala Val Pro Val Leu Thr Ala Arg Asp Val1 5 10 15Ala Gly Ala Val Glu Phe Trp Thr Asp Arg Leu Gly Phe Ser Arg Asp 20 25 30Phe Val Glu Asp Asp Phe Ala Gly Val Val Arg Asp Asp Val Thr Leu 35 40 45Phe Ile Ser Ala Val Gln Asp Gln Val Val Pro Asp Asn Thr Leu Ala 50 55 60Trp Val Trp Val Arg Gly Leu Asp Glu Leu Tyr Ala Glu Trp Ser Glu65 70 75 80Val Val Ser Thr Asn Phe Arg Asp Ala Ser Gly Pro Ala Leu Thr Glu 85 90 95Ile Gly Glu Gln Pro Trp Gly Arg Glu Phe Ala Leu Arg Asp Pro Ala 100 105 110Gly Asn Cys Val His Phe Val Ala Glu Glu Gln Asp 115 1201341194DNAEscherichia coliwt trp sequence(1)..(1194)CDS(1)..(1194) 134atg aca aca tta ctt aac ccc tat ttt ggt gag ttt ggc ggc atg tac 48Met Thr Thr Leu Leu Asn Pro Tyr Phe 
Gly Glu Phe Gly Gly Met Tyr1 5 10 15gtg cca caa atc ctg atg cct gct ctg cgc cag ctg gaa gaa gct ttt 96Val Pro Gln Ile Leu Met Pro Ala Leu Arg Gln Leu Glu Glu Ala Phe 20 25 30gtc agt gcg caa aaa gat cct gaa ttt cag gct cag ttc aac gac ctg 144Val Ser Ala Gln Lys Asp Pro Glu Phe Gln Ala Gln Phe Asn Asp Leu 35 40 45ctg aaa aac tat gcc ggg cgt cca acc gcg ctg acc aaa tgc cag aac 192Leu Lys Asn Tyr Ala Gly Arg Pro Thr Ala Leu Thr Lys Cys Gln Asn 50 55 60att aca gcc ggg acg aac acc acg ctg tat ctc aag cgt gaa gat ttg 240Ile Thr Ala Gly Thr Asn Thr Thr Leu Tyr Leu Lys Arg Glu Asp Leu65 70 75 80ctg cac ggc ggc gcg cat aaa act aac cag gtg ctg ggg cag gcg ttg 288Leu His Gly Gly Ala His Lys Thr Asn Gln Val Leu Gly Gln Ala Leu 85 90 95ctg gcg aag cgg atg ggt aaa acc gaa atc atc gcc gaa acc ggt gcc 336Leu Ala Lys Arg Met Gly Lys Thr Glu Ile Ile Ala Glu Thr Gly Ala 100 105 110ggt cag cat ggc gtg gcg tcg gcc ctg gcc agc gcc ctg ctc ggc ctg 384Gly Gln His Gly Val Ala Ser Ala Leu Ala Ser Ala Leu Leu Gly Leu 115 120 125aaa tgc cgt att tat atg ggt gcc aaa gac gtt gaa cgc cag tcg cct 432Lys Cys Arg Ile Tyr Met Gly Ala Lys Asp Val Glu Arg Gln Ser Pro 130 135 140aac gtt ttt cgt atg cgc tta atg ggt gcg gaa gtg atc ccg gtg cat 480Asn Val Phe Arg Met Arg Leu Met Gly Ala Glu Val Ile Pro Val His145 150 155 160agc ggt tcc gcg acg ctg aaa gat gcc tgt aac gag gcg ctg cgc gac 528Ser Gly Ser Ala Thr Leu Lys Asp Ala Cys Asn Glu Ala Leu Arg Asp 165 170 175tgg tcc ggt agt tac gaa acc gcg cac tat atg ctg ggc acc gca gct 576Trp Ser Gly Ser Tyr Glu Thr Ala His Tyr Met Leu Gly Thr Ala Ala 180 185 190ggc ccg cat cct tat ccg acc att gtg cgt gag ttt cag cgg atg att 624Gly Pro His Pro Tyr Pro Thr Ile Val Arg Glu Phe Gln Arg Met Ile 195 200 205ggc gaa gaa acc aaa gcg cag att ctg gaa aga gaa ggt cgc ctg ccg 672Gly Glu Glu Thr Lys Ala Gln Ile Leu Glu Arg Glu Gly Arg Leu Pro 210 215 220gat gcc gtt atc gcc tgt gtt ggc ggc ggt tcg aat gcc atc ggc atg 720Asp Ala Val Ile Ala Cys Val Gly Gly Gly Ser Asn Ala Ile Gly 
Met225 230 235 240ttt gct gat ttc atc aat gaa acc aac gtc ggc ctg att ggt gtg gag 768Phe Ala Asp Phe Ile Asn Glu Thr Asn Val Gly Leu Ile Gly Val Glu 245 250 255cca ggt ggt cac ggt atc gaa act ggc gag cac ggc gca ccg cta aaa 816Pro Gly Gly His Gly Ile Glu Thr Gly Glu His Gly Ala Pro Leu Lys 260 265 270cat ggt cgc gtg ggt atc tat ttc ggt atg aaa gcg ccg atg atg caa 864His Gly Arg Val Gly Ile Tyr Phe Gly Met Lys Ala Pro Met Met Gln 275 280 285acc gaa gac ggg cag att gaa gaa tct tac tcc atc tcc gcc gga ctg 912Thr Glu Asp Gly Gln Ile Glu Glu Ser Tyr Ser Ile Ser Ala Gly Leu 290 295 300gat ttc ccg tct gtc ggc cca caa cac gcg tat ctt aac agc act gga 960Asp Phe Pro Ser Val Gly Pro Gln His Ala Tyr Leu Asn Ser Thr Gly305 310 315 320cgc gct gat tac gtg tct att acc gat gat gaa gcc ctt gaa gcc ttc 1008Arg Ala Asp Tyr Val Ser Ile Thr Asp Asp Glu Ala Leu Glu Ala Phe 325 330 335aaa acg ctg tgc ctg cac gaa ggg atc atc ccg gcg ctg gaa tcc tcc 1056Lys Thr Leu Cys Leu His Glu Gly Ile Ile Pro Ala Leu Glu Ser Ser 340 345 350cac gcc ttg gcc cat gcg ttg aaa atg atg cgc gaa aac ccg gat aaa 1104His Ala Leu Ala His Ala Leu Lys Met Met Arg Glu Asn Pro Asp Lys 355 360 365gag cag cta ctg gtg gtt aac ctt tcc ggt cgc ggc gat aaa gac atc 1152Glu Gln Leu Leu Val Val Asn Leu Ser Gly Arg Gly Asp Lys Asp Ile 370 375 380ttc acc gtt cac gat att ttg aaa gca cga ggg gaa atc tga 1194Phe Thr Val His Asp Ile Leu Lys Ala Arg Gly Glu Ile385 390 395135397PRTEscherichia coli 135Met Thr Thr Leu Leu Asn Pro Tyr Phe Gly Glu Phe Gly Gly Met Tyr1 5 10 15Val Pro Gln Ile Leu Met Pro Ala Leu Arg Gln Leu Glu Glu Ala Phe 20 25 30Val Ser Ala Gln Lys Asp Pro Glu Phe Gln Ala Gln Phe Asn Asp Leu 35 40 45Leu Lys Asn Tyr Ala Gly Arg Pro Thr Ala Leu Thr Lys Cys Gln Asn 50 55 60Ile Thr Ala Gly Thr Asn Thr Thr Leu Tyr Leu Lys Arg Glu Asp Leu65 70 75 80Leu His Gly Gly Ala His Lys Thr Asn Gln Val Leu Gly Gln Ala Leu 85 90 95Leu Ala Lys Arg Met Gly Lys Thr Glu Ile Ile Ala Glu Thr Gly Ala 100 105 110Gly Gln His Gly Val Ala Ser Ala Leu 
Ala Ser Ala Leu Leu Gly Leu 115 120 125Lys Cys Arg Ile Tyr Met Gly Ala Lys Asp Val Glu Arg Gln Ser Pro 130 135 140Asn Val Phe Arg Met Arg Leu Met Gly Ala Glu Val Ile Pro Val His145 150 155 160Ser Gly Ser Ala Thr Leu Lys Asp Ala Cys Asn Glu Ala Leu Arg Asp 165 170 175Trp Ser Gly Ser Tyr Glu Thr Ala His Tyr Met Leu Gly Thr Ala Ala 180 185 190Gly Pro His Pro Tyr Pro Thr Ile Val Arg Glu Phe Gln Arg Met Ile 195 200 205Gly Glu Glu Thr Lys Ala Gln Ile Leu Glu Arg Glu Gly Arg Leu Pro 210 215 220Asp Ala Val Ile Ala Cys Val Gly Gly Gly Ser Asn Ala Ile Gly Met225 230 235 240Phe Ala Asp Phe Ile Asn Glu Thr Asn Val Gly Leu Ile Gly Val Glu 245 250 255Pro Gly Gly His Gly Ile Glu Thr Gly Glu His Gly Ala Pro Leu Lys 260 265 270His Gly Arg Val Gly Ile Tyr Phe Gly Met Lys Ala Pro Met Met Gln 275 280 285Thr Glu Asp Gly Gln Ile Glu Glu Ser Tyr Ser Ile
Ser Ala Gly Leu 290 295 300Asp Phe Pro Ser Val Gly Pro Gln His Ala Tyr Leu Asn Ser Thr Gly305 310 315 320Arg Ala Asp Tyr Val Ser Ile Thr Asp Asp Glu Ala Leu Glu Ala Phe 325 330 335Lys Thr Leu Cys Leu His Glu Gly Ile Ile Pro Ala Leu Glu Ser Ser 340 345 350His Ala Leu Ala His Ala Leu Lys Met Met Arg Glu Asn Pro Asp Lys 355 360 365Glu Gln Leu Leu Val Val Asn Leu Ser Gly Arg Gly Asp Lys Asp Ile 370 375 380Phe Thr Val His Asp Ile Leu Lys Ala Arg Gly Glu Ile385 390 3951361194DNAArtificialATG-less trp sequence 136atg aca aca tta ctt aac ccc tat ttt ggt gag ttt ggc ggc cag tac 48Met Thr Thr Leu Leu Asn Pro Tyr Phe Gly Glu Phe Gly Gly Gln Tyr1 5 10 15gtg cca caa atc ctg gtc cct gct ctg cgc cag ctg gaa gag gct ttt 96Val Pro Gln Ile Leu Val Pro Ala Leu Arg Gln Leu Glu Glu Ala Phe 20 25 30gtc agt gcc caa aaa gat cct gaa ttt caa gct cag ttc aac gac ctg 144Val Ser Ala Gln Lys Asp Pro Glu Phe Gln Ala Gln Phe Asn Asp Leu 35 40 45ctg aaa aac tac gcc ggg cgt cca acc gcg ctg acc aag tgc cag aac 192Leu Lys Asn Tyr Ala Gly Arg Pro Thr Ala Leu Thr Lys Cys Gln Asn 50 55 60att acc gcc ggg acg aac acc acg ctg tat ctc aag cgt gaa gat ttg 240Ile Thr Ala Gly Thr Asn Thr Thr Leu Tyr Leu Lys Arg Glu Asp Leu65 70 75 80ctg cac ggc ggc gcg cat aaa act aac cag gtg ctg ggg cag gcg ttg 288Leu His Gly Gly Ala His Lys Thr Asn Gln Val Leu Gly Gln Ala Leu 85 90 95ctg gcg aag cgg ctg ggt aaa acc gaa atc atc gcc gaa act ggt gcc 336Leu Ala Lys Arg Leu Gly Lys Thr Glu Ile Ile Ala Glu Thr Gly Ala 100 105 110ggt cag cac ggc gtg gcg tcg gcc ctt gcc agc gcc ctg ctc ggc ctg 384Gly Gln His Gly Val Ala Ser Ala Leu Ala Ser Ala Leu Leu Gly Leu 115 120 125aag tgc cgt att tat ctg ggt gcc aaa gac gtt gaa cgc cag tcg cct 432Lys Cys Arg Ile Tyr Leu Gly Ala Lys Asp Val Glu Arg Gln Ser Pro 130 135 140aac gtt ttt cgt ctg cgc tta ctg ggt gcg gaa gtg atc ccg gtg cat 480Asn Val Phe Arg Leu Arg Leu Leu Gly Ala Glu Val Ile Pro Val His145 150 155 160agc ggt tcc gcg acg ctg aaa gac gcc tgt aac gag gcg ctg cgc gac 528Ser Gly Ser Ala 
Thr Leu Lys Asp Ala Cys Asn Glu Ala Leu Arg Asp 165 170 175tgg tcc ggt agt tac gaa acc gcg cac tat ctg ctg ggc acc gca gct 576Trp Ser Gly Ser Tyr Glu Thr Ala His Tyr Leu Leu Gly Thr Ala Ala 180 185 190ggc ccg cat cct tat ccg acc att gtg cgt gag ttt caa cgg atc att 624Gly Pro His Pro Tyr Pro Thr Ile Val Arg Glu Phe Gln Arg Ile Ile 195 200 205ggc gaa gaa acc aaa gcg cag att ctg gaa aga gaa ggt cgc ctg ccg 672Gly Glu Glu Thr Lys Ala Gln Ile Leu Glu Arg Glu Gly Arg Leu Pro 210 215 220gac gcc gtt atc gcc tgt gtt ggc ggc ggt tct aac gcc atc ggc atc 720Asp Ala Val Ile Ala Cys Val Gly Gly Gly Ser Asn Ala Ile Gly Ile225 230 235 240ttt gct gat ttc atc aac gaa acc aac gtc ggc ctg att ggt gtg gag 768Phe Ala Asp Phe Ile Asn Glu Thr Asn Val Gly Leu Ile Gly Val Glu 245 250 255cca ggt ggt cac ggt atc gaa act ggc gag cac ggc gca ccg cta aaa 816Pro Gly Gly His Gly Ile Glu Thr Gly Glu His Gly Ala Pro Leu Lys 260 265 270cac ggt cgc gtg ggt atc tat ttc ggt ctg aaa gcg ccg atc ctg caa 864His Gly Arg Val Gly Ile Tyr Phe Gly Leu Lys Ala Pro Ile Leu Gln 275 280 285acc gaa gac ggg cag att gaa gaa tct tac tcc atc tcc gcc gga ctg 912Thr Glu Asp Gly Gln Ile Glu Glu Ser Tyr Ser Ile Ser Ala Gly Leu 290 295 300gat ttc ccg tct gtc ggc cca caa cac gcc tat ctt aac agc act gga 960Asp Phe Pro Ser Val Gly Pro Gln His Ala Tyr Leu Asn Ser Thr Gly305 310 315 320cgc gct gat tac gtg tct att acc gac gac gaa gcc ctt gaa gcc ttc 1008Arg Ala Asp Tyr Val Ser Ile Thr Asp Asp Glu Ala Leu Glu Ala Phe 325 330 335aaa acg ctg tgc ctg cac gaa ggg atc atc ccg gcg ctg gaa tcc tcc 1056Lys Thr Leu Cys Leu His Glu Gly Ile Ile Pro Ala Leu Glu Ser Ser 340 345 350cac gcc ctg gcc cac gcc ttg aaa ctg gct cgc gaa aac ccg gat aaa 1104His Ala Leu Ala His Ala Leu Lys Leu Ala Arg Glu Asn Pro Asp Lys 355 360 365gag cag cta ctg gtg gtc aac ctt tcc ggt cgc ggc gat aaa gac atc 1152Glu Gln Leu Leu Val Val Asn Leu Ser Gly Arg Gly Asp Lys Asp Ile 370 375 380ttc acc gtt cac gat att ttg aaa gca cga ggg gaa atc tga 1194Phe Thr Val His Asp Ile 
Leu Lys Ala Arg Gly Glu Ile385 390 395137397PRTArtificialSynthetic Construct 137Met Thr Thr Leu Leu Asn Pro Tyr Phe Gly Glu Phe Gly Gly Gln Tyr1 5 10 15Val Pro Gln Ile Leu Val Pro Ala Leu Arg Gln Leu Glu Glu Ala Phe 20 25 30Val Ser Ala Gln Lys Asp Pro Glu Phe Gln Ala Gln Phe Asn Asp Leu 35 40 45Leu Lys Asn Tyr Ala Gly Arg Pro Thr Ala Leu Thr Lys Cys Gln Asn 50 55 60Ile Thr Ala Gly Thr Asn Thr Thr Leu Tyr Leu Lys Arg Glu Asp Leu65 70 75 80Leu His Gly Gly Ala His Lys Thr Asn Gln Val Leu Gly Gln Ala Leu 85 90 95Leu Ala Lys Arg Leu Gly Lys Thr Glu Ile Ile Ala Glu Thr Gly Ala 100 105 110Gly Gln His Gly Val Ala Ser Ala Leu Ala Ser Ala Leu Leu Gly Leu 115 120 125Lys Cys Arg Ile Tyr Leu Gly Ala Lys Asp Val Glu Arg Gln Ser Pro 130 135 140Asn Val Phe Arg Leu Arg Leu Leu Gly Ala Glu Val Ile Pro Val His145 150 155 160Ser Gly Ser Ala Thr Leu Lys Asp Ala Cys Asn Glu Ala Leu Arg Asp 165 170 175Trp Ser Gly Ser Tyr Glu Thr Ala His Tyr Leu Leu Gly Thr Ala Ala 180 185 190Gly Pro His Pro Tyr Pro Thr Ile Val Arg Glu Phe Gln Arg Ile Ile 195 200 205Gly Glu Glu Thr Lys Ala Gln Ile Leu Glu Arg Glu Gly Arg Leu Pro 210 215 220Asp Ala Val Ile Ala Cys Val Gly Gly Gly Ser Asn Ala Ile Gly Ile225 230 235 240Phe Ala Asp Phe Ile Asn Glu Thr Asn Val Gly Leu Ile Gly Val Glu 245 250 255Pro Gly Gly His Gly Ile Glu Thr Gly Glu His Gly Ala Pro Leu Lys 260 265 270His Gly Arg Val Gly Ile Tyr Phe Gly Leu Lys Ala Pro Ile Leu Gln 275 280 285Thr Glu Asp Gly Gln Ile Glu Glu Ser Tyr Ser Ile Ser Ala Gly Leu 290 295 300Asp Phe Pro Ser Val Gly Pro Gln His Ala Tyr Leu Asn Ser Thr Gly305 310 315 320Arg Ala Asp Tyr Val Ser Ile Thr Asp Asp Glu Ala Leu Glu Ala Phe 325 330 335Lys Thr Leu Cys Leu His Glu Gly Ile Ile Pro Ala Leu Glu Ser Ser 340 345 350His Ala Leu Ala His Ala Leu Lys Leu Ala Arg Glu Asn Pro Asp Lys 355 360 365Glu Gln Leu Leu Val Val Asn Leu Ser Gly Arg Gly Asp Lys Asp Ile 370 375 380Phe Thr Val His Asp Ile Leu Lys Ala Arg Gly Glu Ile385 390 3951381305DNASalmonella typhimuriumwt his 
sequence(1)..(1305)CDS(1)..(1305) 138atg agc ttc aat acc ctg att gac tgg aac agc tgt agc cct gaa cag 48Met Ser Phe Asn Thr Leu Ile Asp Trp Asn Ser Cys Ser Pro Glu Gln1 5 10 15cag cgt gcg ctg ctg acg cgt ccg gcg att tcc gcc tct gac agt att 96Gln Arg Ala Leu Leu Thr Arg Pro Ala Ile Ser Ala Ser Asp Ser Ile 20 25 30acc cgg acg gtc agc gat att ctg gat aat gta aaa acg cgc ggt gac 144Thr Arg Thr Val Ser Asp Ile Leu Asp Asn Val Lys Thr Arg Gly Asp 35 40 45gat gcc ctg cgt gaa tac agc gct aaa ttt gat aaa aca gaa gtg aca 192Asp Ala Leu Arg Glu Tyr Ser Ala Lys Phe Asp Lys Thr Glu Val Thr 50 55 60gcg cta cgc gtc acc cct gaa gag atc gcc gcc gcc ggc gcg cgt ctg 240Ala Leu Arg Val Thr Pro Glu Glu Ile Ala Ala Ala Gly Ala Arg Leu65 70 75 80agc gac gaa tta aaa cag gcg atg acc gct gcc gtc aaa aat att gaa 288Ser Asp Glu Leu Lys Gln Ala Met Thr Ala Ala Val Lys Asn Ile Glu 85 90 95acg ttc cat tcc gcg cag acg cta ccg cct gta gat gtg gaa acc cag 336Thr Phe His Ser Ala Gln Thr Leu Pro Pro Val Asp Val Glu Thr Gln 100 105 110cca ggc gtg cgt tgc cag cag gtt acg cgt ccc gtc tcg tct gtc ggt 384Pro Gly Val Arg Cys Gln Gln Val Thr Arg Pro Val Ser Ser Val Gly 115 120 125ctg tat att ccc ggc ggc tcg gct ccg ctc ttc tca acg gtg ctg atg 432Leu Tyr Ile Pro Gly Gly Ser Ala Pro Leu Phe Ser Thr Val Leu Met 130 135 140ctg gcg acg ccg gcg cgc att gcg gga tgc cag aag gtg gtt ctg tgc 480Leu Ala Thr Pro Ala Arg Ile Ala Gly Cys Gln Lys Val Val Leu Cys145 150 155 160tcg ccg ccg ccc atc gct gat gaa atc ctc tat gcg gcg caa ctg tgt 528Ser Pro Pro Pro Ile Ala Asp Glu Ile Leu Tyr Ala Ala Gln Leu Cys 165 170 175ggc gtg cag gaa atc ttt aac gtc ggc ggc gcg cag gcg att gcc gct 576Gly Val Gln Glu Ile Phe Asn Val Gly Gly Ala Gln Ala Ile Ala Ala 180 185 190ctg gcc ttc ggc agc gag tcc gta ccg aaa gtg gat aaa att ttt ggc 624Leu Ala Phe Gly Ser Glu Ser Val Pro Lys Val Asp Lys Ile Phe Gly 195 200 205ccc ggc aac gcc ttt gta acc gaa gcc aaa cgt cag gtc agc cag cgt 672Pro Gly Asn Ala Phe Val Thr Glu Ala Lys Arg Gln Val Ser Gln Arg 
210 215 220ctc gac ggc gcg gct atc gat atg cca gcc ggg ccg tct gaa gta ctg 720Leu Asp Gly Ala Ala Ile Asp Met Pro Ala Gly Pro Ser Glu Val Leu225 230 235 240gtg atc gca gac agc ggc gca aca ccg gat ttc gtc gct tct gac ctg 768Val Ile Ala Asp Ser Gly Ala Thr Pro Asp Phe Val Ala Ser Asp Leu 245 250 255ctc tcc cag gct gag cac ggc ccg gat tcc cag gtg atc ctg ctg acg 816Leu Ser Gln Ala Glu His Gly Pro Asp Ser Gln Val Ile Leu Leu Thr 260 265 270cct gat gct gac att gcc cgc aag gtg gcg gag gcg gta gaa cgt caa 864Pro Asp Ala Asp Ile Ala Arg Lys Val Ala Glu Ala Val Glu Arg Gln 275 280 285ctg gcg gaa ctg ccg cgc gcg gac acc gcc cgg cag gcc ctg agc gcc 912Leu Ala Glu Leu Pro Arg Ala Asp Thr Ala Arg Gln Ala Leu Ser Ala 290 295 300agt cgt ctg att gtg acc aaa gat tta gcg cag tgc gtc gcc atc tct 960Ser Arg Leu Ile Val Thr Lys Asp Leu Ala Gln Cys Val Ala Ile Ser305 310 315 320aat cag tat ggg ccg gaa cac tta atc atc cag acg cgc aat gcg cgc 1008Asn Gln Tyr Gly Pro Glu His Leu Ile Ile Gln Thr Arg Asn Ala Arg 325 330 335gat ttg gtg gat gcg att acc agc gca ggc tcg gta ttt ctc ggc gac 1056Asp Leu Val Asp Ala Ile Thr Ser Ala Gly Ser Val Phe Leu Gly Asp 340 345 350tgg tcg ccg gaa tcc gcc ggt gat tac gct tcc gga acc aac cat gtt 1104Trp Ser Pro Glu Ser Ala Gly Asp Tyr Ala Ser Gly Thr Asn His Val 355 360 365tta ccg acc tat ggc tat act gct acc tgt tcc agc ctt ggg tta gcg 1152Leu Pro Thr Tyr Gly Tyr Thr Ala Thr Cys Ser Ser Leu Gly Leu Ala 370 375 380gat ttc cag aaa cgg atg acc gtt cag gaa ctg tcg aaa gcg ggc ttt 1200Asp Phe Gln Lys Arg Met Thr Val Gln Glu Leu Ser Lys Ala Gly Phe385 390 395 400tcc gct ctg gca tca acc att gaa aca ttg gcg gcg gca gaa cgt ctg 1248Ser Ala Leu Ala Ser Thr Ile Glu Thr Leu Ala Ala Ala Glu Arg Leu 405 410 415acc gcc cat aaa aat gcc gtg acc ctg cgc gta aac gcc ctc aag gag 1296Thr Ala His Lys Asn Ala Val Thr Leu Arg Val Asn Ala Leu Lys Glu 420 425 430caa gca tga 1305Gln Ala139434PRTSalmonella typhimurium 139Met Ser Phe Asn Thr Leu Ile Asp Trp Asn Ser Cys Ser Pro Glu Gln1 5 
10 15Gln Arg Ala Leu Leu Thr Arg Pro Ala Ile Ser Ala Ser Asp Ser Ile 20 25 30Thr Arg Thr Val Ser Asp Ile Leu Asp Asn Val Lys Thr Arg Gly Asp 35 40 45Asp Ala Leu Arg Glu Tyr Ser Ala Lys Phe Asp Lys Thr Glu Val Thr 50 55 60Ala Leu Arg Val Thr Pro Glu Glu Ile Ala Ala Ala Gly Ala Arg Leu65 70 75 80Ser Asp Glu Leu Lys Gln Ala Met Thr Ala Ala Val Lys Asn Ile Glu 85 90 95Thr Phe His Ser Ala Gln Thr Leu Pro Pro Val Asp Val Glu Thr Gln 100 105 110Pro Gly Val Arg Cys Gln Gln Val Thr Arg Pro Val Ser Ser Val Gly 115 120 125Leu Tyr Ile Pro Gly Gly Ser Ala Pro Leu Phe Ser Thr Val Leu Met 130 135 140Leu Ala Thr Pro Ala Arg Ile Ala Gly Cys Gln Lys Val Val Leu Cys145 150 155 160Ser Pro Pro Pro Ile Ala Asp Glu Ile Leu Tyr Ala Ala Gln Leu Cys 165 170 175Gly Val Gln Glu Ile Phe Asn Val Gly Gly Ala Gln Ala Ile Ala Ala 180 185 190Leu Ala Phe Gly Ser Glu Ser Val Pro Lys Val Asp Lys Ile Phe Gly 195 200 205Pro Gly Asn Ala Phe Val Thr Glu Ala Lys Arg Gln Val Ser Gln Arg 210 215 220Leu Asp Gly Ala Ala Ile Asp Met Pro Ala Gly Pro Ser Glu Val Leu225 230 235 240Val Ile Ala Asp Ser Gly Ala Thr Pro Asp Phe Val Ala Ser Asp Leu 245 250 255Leu Ser Gln Ala Glu His Gly Pro Asp Ser Gln Val Ile Leu Leu Thr 260 265 270Pro Asp Ala Asp Ile Ala Arg Lys Val Ala Glu Ala Val Glu Arg Gln 275 280 285Leu Ala Glu Leu Pro Arg Ala Asp Thr Ala Arg Gln Ala Leu Ser Ala 290 295 300Ser Arg Leu Ile Val Thr Lys Asp Leu Ala Gln Cys Val Ala Ile Ser305 310 315 320Asn Gln Tyr Gly Pro Glu His Leu Ile Ile Gln Thr Arg Asn Ala Arg 325 330 335Asp Leu Val Asp Ala Ile Thr Ser Ala Gly Ser Val Phe Leu Gly Asp 340 345 350Trp Ser Pro Glu Ser Ala Gly Asp Tyr Ala Ser Gly Thr Asn His Val 355 360 365Leu Pro Thr Tyr Gly Tyr Thr Ala Thr Cys Ser Ser Leu Gly Leu Ala 370 375 380Asp Phe Gln Lys Arg Met Thr Val Gln Glu Leu Ser Lys Ala Gly Phe385 390 395 400Ser Ala Leu Ala Ser Thr Ile Glu Thr Leu Ala Ala Ala Glu Arg Leu 405 410 415Thr Ala His Lys Asn Ala Val Thr Leu Arg Val Asn Ala Leu Lys Glu 420 425 430Gln Ala1401305DNAArtificialATG-less his 
sequence 140atg agc ttc aat acc ctg att gac tgg aac agc tgt agc cct gaa cag 48Met Ser Phe Asn Thr Leu Ile Asp Trp Asn Ser Cys Ser Pro Glu Gln1 5 10 15cag cgt gcg ctg ctg acg cgt ccg gcg att tcc gcc tct gac agt att 96Gln Arg Ala Leu Leu Thr Arg Pro Ala Ile Ser Ala Ser Asp Ser Ile 20 25 30acc cgg acg gtc agc gat att ctg gat aac gta aaa acg cgc ggt gac 144Thr Arg Thr Val Ser Asp Ile Leu Asp Asn Val Lys Thr Arg Gly Asp 35 40 45gac gcc ctg cgt gaa tac agc gct aaa ttt gat aaa aca gaa gtg aca 192Asp Ala Leu Arg Glu Tyr Ser Ala Lys Phe Asp Lys Thr Glu Val Thr 50 55 60gcg cta cgc gtc acc cct gaa gag atc gcc gcc gcc ggc gcg cgt ctg 240Ala Leu Arg Val Thr Pro Glu Glu Ile Ala Ala Ala Gly Ala Arg Leu65 70 75 80agc gac gaa tta aaa cag gcg att acc gct gcc gtc aaa aat att gaa 288Ser Asp Glu Leu Lys Gln Ala Ile Thr Ala Ala Val Lys Asn Ile Glu 85 90 95acg ttc cat tcc gcg cag acg cta ccg cct gta gac gtg gaa acc cag 336Thr Phe
His Ser Ala Gln Thr Leu Pro Pro Val Asp Val Glu Thr Gln 100 105 110cca ggc gtg cgt tgc cag cag gtt acg cgt ccc gtc tcg tct gtc ggt 384Pro Gly Val Arg Cys Gln Gln Val Thr Arg Pro Val Ser Ser Val Gly 115 120 125ctg tat att ccc ggc ggc tcg gct ccg ctc ttc tca acg gtg ctg ctg 432Leu Tyr Ile Pro Gly Gly Ser Ala Pro Leu Phe Ser Thr Val Leu Leu 130 135 140ctg gcg acg ccg gcg cgc att gcg ggt tgc cag aag gtg gtt ctg tgc 480Leu Ala Thr Pro Ala Arg Ile Ala Gly Cys Gln Lys Val Val Leu Cys145 150 155 160tcg ccg ccg ccc atc gct gac gaa atc ctc tac gcg gcg caa ctg tgt 528Ser Pro Pro Pro Ile Ala Asp Glu Ile Leu Tyr Ala Ala Gln Leu Cys 165 170 175ggc gtg cag gaa atc ttt aac gtc ggc ggc gcg cag gcg att gcc gct 576Gly Val Gln Glu Ile Phe Asn Val Gly Gly Ala Gln Ala Ile Ala Ala 180 185 190ctg gcc ttc ggc agc gag tcc gta ccg aaa gtg gat aaa att ttt ggc 624Leu Ala Phe Gly Ser Glu Ser Val Pro Lys Val Asp Lys Ile Phe Gly 195 200 205ccc ggc aac gcc ttt gta acc gaa gcc aaa cgt cag gtc agc cag cgt 672Pro Gly Asn Ala Phe Val Thr Glu Ala Lys Arg Gln Val Ser Gln Arg 210 215 220ctc gac ggc gcg gct atc gat att cca gcc ggg ccg tct gaa gta ctg 720Leu Asp Gly Ala Ala Ile Asp Ile Pro Ala Gly Pro Ser Glu Val Leu225 230 235 240gtg atc gca gac agc ggc gca aca ccg gat ttc gtc gct tct gac ctg 768Val Ile Ala Asp Ser Gly Ala Thr Pro Asp Phe Val Ala Ser Asp Leu 245 250 255ctc tcc cag gct gag cac ggc ccg gat tcc cag gtg atc ctg ctg acg 816Leu Ser Gln Ala Glu His Gly Pro Asp Ser Gln Val Ile Leu Leu Thr 260 265 270cct gac gct gac att gcc cgc aag gtg gcg gag gcg gta gaa cgt caa 864Pro Asp Ala Asp Ile Ala Arg Lys Val Ala Glu Ala Val Glu Arg Gln 275 280 285ctg gcg gaa ctg ccg cgc gcg gac acc gcc cgg cag gcc ctg agc gcc 912Leu Ala Glu Leu Pro Arg Ala Asp Thr Ala Arg Gln Ala Leu Ser Ala 290 295 300agt cgt ctg att gtg acc aaa gat tta gcg cag tgc gtc gcc atc tct 960Ser Arg Leu Ile Val Thr Lys Asp Leu Ala Gln Cys Val Ala Ile Ser305 310 315 320aat cag tac ggg ccg gaa cac tta atc atc cag acg cgc aac gcg cgc 1008Asn Gln 
Tyr Gly Pro Glu His Leu Ile Ile Gln Thr Arg Asn Ala Arg 325 330 335gat ttg gtg gac gcg att acc agc gca ggc tcg gta ttt ctc ggc gac 1056Asp Leu Val Asp Ala Ile Thr Ser Ala Gly Ser Val Phe Leu Gly Asp 340 345 350tgg tcg ccg gaa tcc gcc ggt gat tac gct tcc gga acc aac cac gtt 1104Trp Ser Pro Glu Ser Ala Gly Asp Tyr Ala Ser Gly Thr Asn His Val 355 360 365tta ccg acc tac ggc tat act gct acc tgt tcc agc ctt ggg tta gcg 1152Leu Pro Thr Tyr Gly Tyr Thr Ala Thr Cys Ser Ser Leu Gly Leu Ala 370 375 380gat ttc cag aaa cgg att acc gtt cag gaa ctg tcg aaa gcg ggc ttt 1200Asp Phe Gln Lys Arg Ile Thr Val Gln Glu Leu Ser Lys Ala Gly Phe385 390 395 400tcc gct ctg gca tca acc att gaa aca ttg gcg gcg gca gaa cgt ctg 1248Ser Ala Leu Ala Ser Thr Ile Glu Thr Leu Ala Ala Ala Glu Arg Leu 405 410 415acc gcc cat aaa aac gcc gtg acc ctg cgc gta aac gcc ctc aag gag 1296Thr Ala His Lys Asn Ala Val Thr Leu Arg Val Asn Ala Leu Lys Glu 420 425 430caa gca taa 1305Gln Ala141434PRTArtificialSynthetic Construct 141Met Ser Phe Asn Thr Leu Ile Asp Trp Asn Ser Cys Ser Pro Glu Gln1 5 10 15Gln Arg Ala Leu Leu Thr Arg Pro Ala Ile Ser Ala Ser Asp Ser Ile 20 25 30Thr Arg Thr Val Ser Asp Ile Leu Asp Asn Val Lys Thr Arg Gly Asp 35 40 45Asp Ala Leu Arg Glu Tyr Ser Ala Lys Phe Asp Lys Thr Glu Val Thr 50 55 60Ala Leu Arg Val Thr Pro Glu Glu Ile Ala Ala Ala Gly Ala Arg Leu65 70 75 80Ser Asp Glu Leu Lys Gln Ala Ile Thr Ala Ala Val Lys Asn Ile Glu 85 90 95Thr Phe His Ser Ala Gln Thr Leu Pro Pro Val Asp Val Glu Thr Gln 100 105 110Pro Gly Val Arg Cys Gln Gln Val Thr Arg Pro Val Ser Ser Val Gly 115 120 125Leu Tyr Ile Pro Gly Gly Ser Ala Pro Leu Phe Ser Thr Val Leu Leu 130 135 140Leu Ala Thr Pro Ala Arg Ile Ala Gly Cys Gln Lys Val Val Leu Cys145 150 155 160Ser Pro Pro Pro Ile Ala Asp Glu Ile Leu Tyr Ala Ala Gln Leu Cys 165 170 175Gly Val Gln Glu Ile Phe Asn Val Gly Gly Ala Gln Ala Ile Ala Ala 180 185 190Leu Ala Phe Gly Ser Glu Ser Val Pro Lys Val Asp Lys Ile Phe Gly 195 200 205Pro Gly Asn Ala Phe Val Thr Glu Ala Lys Arg 
Gln Val Ser Gln Arg 210 215 220Leu Asp Gly Ala Ala Ile Asp Ile Pro Ala Gly Pro Ser Glu Val Leu225 230 235 240Val Ile Ala Asp Ser Gly Ala Thr Pro Asp Phe Val Ala Ser Asp Leu 245 250 255Leu Ser Gln Ala Glu His Gly Pro Asp Ser Gln Val Ile Leu Leu Thr 260 265 270Pro Asp Ala Asp Ile Ala Arg Lys Val Ala Glu Ala Val Glu Arg Gln 275 280 285Leu Ala Glu Leu Pro Arg Ala Asp Thr Ala Arg Gln Ala Leu Ser Ala 290 295 300Ser Arg Leu Ile Val Thr Lys Asp Leu Ala Gln Cys Val Ala Ile Ser305 310 315 320Asn Gln Tyr Gly Pro Glu His Leu Ile Ile Gln Thr Arg Asn Ala Arg 325 330 335Asp Leu Val Asp Ala Ile Thr Ser Ala Gly Ser Val Phe Leu Gly Asp 340 345 350Trp Ser Pro Glu Ser Ala Gly Asp Tyr Ala Ser Gly Thr Asn His Val 355 360 365Leu Pro Thr Tyr Gly Tyr Thr Ala Thr Cys Ser Ser Leu Gly Leu Ala 370 375 380Asp Phe Gln Lys Arg Ile Thr Val Gln Glu Leu Ser Lys Ala Gly Phe385 390 395 400Ser Ala Leu Ala Ser Thr Ile Glu Thr Leu Ala Ala Ala Glu Arg Leu 405 410 415Thr Ala His Lys Asn Ala Val Thr Leu Arg Val Asn Ala Leu Lys Glu 420 425 430Gln Ala
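The "CpG poor" and "ATG-less" annotations in the listing above can be checked mechanically by counting dinucleotide and trinucleotide motifs on the coding strand. The following is a minimal sketch only, not a method taken from the application: the function names (normalize, count_motif, summarize) are hypothetical, and the two sample strings are just the first 48 nucleotides of SEQ ID NO: 128 (wild type Neo) and SEQ ID NO: 130 (CpG poor Neo) as printed in this listing, so the counts say nothing about the full open reading frames.

```python
# Illustrative sketch: count CpG dinucleotides and ATG triplets on the
# coding strand of a sequence written in the codon-spaced listing format.
# All names here are hypothetical helpers, not part of the application.


def normalize(seq: str) -> str:
    """Drop whitespace (codon spacing) and lower-case the sequence."""
    return "".join(seq.split()).lower()


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif on the given strand, in any reading frame."""
    s = normalize(seq)
    m = motif.lower()
    return sum(1 for i in range(len(s) - len(m) + 1) if s.startswith(m, i))


def summarize(seq: str) -> dict:
    """Return length, CpG count and ATG count for one sequence."""
    return {
        "length": len(normalize(seq)),
        "cpg": count_motif(seq, "cg"),
        "atg": count_motif(seq, "atg"),  # includes the start codon, if any
    }


# First 48 nt of SEQ ID NO: 128 and SEQ ID NO: 130, copied from the listing.
WT_NEO_HEAD = "atg gga tcg gcc att gaa caa gat gga ttg cac gca ggt tct ccg gcc"
CPG_POOR_NEO_HEAD = "atg gga agt gcc att gaa caa gac gga ttg cac gca ggt tct cct gca"

if __name__ == "__main__":
    for name, seq in (("wt Neo (excerpt)", WT_NEO_HEAD),
                      ("CpG poor Neo (excerpt)", CPG_POOR_NEO_HEAD)):
        print(name, summarize(seq))
```

Motifs are counted across codon boundaries as well as within codons, because a CpG or an ATG can be formed by the last base of one codon and the first bases of the next. To check a full entry, paste its complete nucleotide rows (without the position numbers and amino acid rows) into summarize.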

* * * * *