U.S. patent application number 10/740714, for cloning vectors and vector components, was filed with the patent office on 2003-12-19 and published on 2004-11-18.
The invention is credited to Godiska, Ronald; Mead, David Alan.
Application Number | 10/740714 |
Publication Number | 20040229359 |
Family ID | 26668480 |
Publication Date | 2004-11-18 |
United States Patent Application | 20040229359 |
Kind Code | A1 |
Mead, David Alan; et al. | November 18, 2004 |
Cloning vectors and vector components
Abstract
The present invention relates to systems, methods, and
compositions for cloning and sequencing insert nucleic acid
sequences. In particular, the present invention provides vectors
and vector components configured for multiplex cloning, multiplex
sequencing, and fixed orientation cloning. The present invention
also provides vectors and vector components that allow insert
sequences that are deleterious to a host cell to be successfully
cloned.
Inventors: | Mead, David Alan (Middleton, WI); Godiska, Ronald (Verona, WI) |
Correspondence Address: | Jason R. Bond, MEDLEN & CARROLL, LLP, 101 Howard Street, Suite 350, San Francisco, CA 94105, US |
Family ID: | 26668480 |
Appl. No.: | 10/740714 |
Filed: | December 19, 2003 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10740714 | Dec 19, 2003 | |
10001052 | Nov 15, 2001 | 6709861 |
60249594 | Nov 17, 2000 | |
Current U.S. Class: | 435/455; 435/320.1 |
Current CPC Class: | C12N 15/66 20130101; C12N 15/64 20130101 |
Class at Publication: | 435/455; 435/320.1 |
International Class: | C12N 015/85 |
Government Interests
[0002] The present application was funded in part with government
support under grant number HG01800-03 from the National Human
Genome Research Institute of the National Institutes of Health. The
government has certain rights in this invention.
Claims
1. A composition comprising X+1 vector components, wherein each of
said X+1 vector components is configured for combining in the
presence of X+1 insert sequences to form a circular recombinant
vector such that said X+1 vector components are non-contiguous
within said circular recombinant vector.
2. The composition of claim 1, wherein each of said X+1 vector
components comprises: i) first and second free ends, and ii) a
selectable marker region comprising at least one selectable marker
sequence unique among said X+1 vector components.
3. The composition of claim 2, wherein each of said X+1 vector
components further comprises: iii) a first transcriptional
terminator between said first free end and said selectable marker
region, and iv) a second transcriptional terminator between said
second free end and said selectable marker region.
4. The composition of claim 3, wherein said first transcriptional
terminator is configured to terminate RNA transcripts entering said
selectable marker region from said first free end.
5. The composition of claim 3, wherein said second transcriptional
terminator is configured to terminate RNA transcripts entering said
selectable marker region from said second free end.
6. The composition of claim 2, wherein said selectable marker
region in each of said X+1 vector components comprises a
transcriptional terminator configured to terminate RNA transcripts
encoded by at least one selectable marker sequence in said
selectable marker region.
7. The composition of claim 2, wherein each of said X+1 vector
components comprises a first non-promoter sequence between said
first free end and said selectable marker region, and a second
non-promoter sequence between said second free end and said
selectable marker region, wherein said first and second
non-promoter sequences are unable to serve as operable promoters
in a host cell.
8. The composition of claim 2, wherein at least one of said X+1
vector components comprises a promoter sequence between at least
one of said first or second free ends and said selectable marker
region, wherein said promoter sequence is capable of serving as an
operable promoter in a host cell.
9. The composition of claim 2, wherein said first and second free
ends are non-compatible free ends.
10. The composition of claim 1, wherein each of said X+1 vector
components comprises two primer binding sites.
11. The composition of claim 1, wherein each of said X+1 insert
sequences comprises two identical sticky free ends that are unique
among said X+1 insert sequences, wherein each of said X+1 vector
components comprises two different sticky free ends, and wherein
each of said two different sticky free ends binds one of said X+1
insert sequences.
12-28. (cancelled).
Description
[0001] The present Application claims priority to U.S. Provisional
Application Serial No. 60/249,594 filed Nov. 17, 2000, hereby
incorporated by reference in its entirety.
FIELD OF THE INVENTION
[0003] The present invention relates to systems, methods, and
compositions for cloning and sequencing insert nucleic acid
sequences. In particular, the present invention provides vectors
and vector components configured for multiplex cloning, multiplex
sequencing, and fixed orientation cloning. The present invention
also provides vectors and vector components that allow insert
sequences that are deleterious to a host cell to be successfully
cloned.
BACKGROUND OF THE INVENTION
[0004] Prior to the 1990s, DNA sequencing was a time-consuming,
labor-intensive, manual protocol by which individual researchers
read hundreds of bases per day from a single DNA template. It has
since evolved into an automated, robotic process by which major
genome sequencing centers read tens of millions of bases from tens
of thousands of DNA templates per day. This vast increase in
sequencing capacity has broadened the scope of DNA sequencing to
entire genomes rather than individual genes. It has likewise
created a need to increase the rate of throughput in all stages of
the sequencing process.
[0005] The most prominent example of large scale sequencing to date
is the Human Genome Initiative, an effort to sequence all 3.3
billion bases of the human genome. Begun in 1990, the Human Genome
Initiative was declared "finished" on Jun. 26, 2000, by the major
genome centers involved. The public draft genome released by the
National Institutes of Health consortia was 85% assembled, with 97%
of the genome covered by clones whose location is known. This
project required reading some 25 million DNA sequences. In a
completely independent effort, Celera Corporation claimed to have
99% of the genome sequence assembled at a 3× redundancy
level, which required 27 million DNA sequencing reads.
[0006] The public effort for "complete and accurate" sequencing,
typically defined as 5× coverage and an accuracy of not more
than 1 mistake every 10,000 bases, will require sequencing millions
of additional plasmid clones over several more years to obtain high
quality data on the entire genome. Because so much of the human
genome is not characterized, a more complete understanding of it
will be facilitated by sequencing the genomes of other organisms
for comparison, such as the mouse, rat, dog, and chimpanzee. In
fact, Celera claims to have sequenced three mouse genomes during
the year 2000, while the NIH consortia of university and
international genome centers have begun work on the mouse and rat
genome. The NIH has also initiated funding of pilot sequencing
projects for the chicken, puffer fish, and zebra fish.
[0007] At the 12th International Genome Sequencing and
Analysis Conference in Miami, Fla. (Sep. 12-15, 2000), Celera
presented data showing that over 200,000 plasmid template
purifications a day are required to sustain their ongoing
sequencing efforts. The NIH consortia purify a similar number of
templates on a daily basis. Genome sequencing facilities at other
large corporations, overseas national genome projects, and smaller
academic labs sequence an additional 500,000 plasmid templates per
day. Thus, the worldwide rate of sequencing is rapidly approaching
1,000,000 templates per day.
[0008] The generation of clone banks, or libraries, of DNA is an
important intermediate step in sequence analysis of whole genomes.
In a process called shotgun cloning and sequencing, large molecules
of DNA, often more than 100,000 bases (100 kb) in length, are
fragmented and reduced to libraries of numerous sub-clones of
approximately 1-4 kb for propagation and sequence analysis. Most
large-scale DNA sequencing strategies depend on a multi-step
process to randomly fragment the target molecule into these smaller
pieces, which are then enzymatically joined (ligated) into a
cloning vector in a reaction that inserts one or more DNA fragments
into a single site in each vector molecule (Fitzgerald et al.,
Nucleic Acids Res. 14:3753 [1992]). This ligation mixture is
introduced into specific strains of Escherichia coli (E. coli), with
each bacterial cell propagating one vector along with any DNA
fragments it carries. The vector DNA, which may or may not contain
an insert, is purified from each cell line and used as a template
in an enzymatic sequencing reaction (Sanger et al., Proc Natl Acad
Sci USA 74:5463 [1977]; Prober et al., Science 238:336 [1987];
Tabor and Richardson, Proc Natl Acad Sci USA 92:6339 [1995], all
of which are hereby incorporated by reference). The reaction
product is analyzed by automated sequencing instruments to
determine the linear sequence of the sub-cloned DNA fragments
(Smith et al., Nature 321:674 [1986], hereby incorporated by
reference). Computer algorithms are used to assemble the data from
the library of sub-fragments, typically producing sequence
information for 80-95% of the original DNA molecule. "Gap filling"
techniques are used to determine the remaining 5-20% of the target
DNA.
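As a rough illustration of the fragmentation and size-selection step described above, the following Python sketch breaks a 100 kb molecule at random positions and keeps fragments in the 1-4 kb sub-cloning window. It assumes uniformly random, distinct break points, which real fragmentation methods only approximate; the function names shotgun_fragment and size_select are hypothetical.

```python
import random

def shotgun_fragment(length, n_breaks, seed=0):
    """Break a linear molecule of `length` bases at `n_breaks` random,
    distinct positions and return the fragment lengths in order.
    (Assumed model: every internal position is equally likely to break.)"""
    rng = random.Random(seed)  # seeded so the example is reproducible
    cuts = sorted(rng.sample(range(1, length), n_breaks))
    bounds = [0] + cuts + [length]
    return [end - start for start, end in zip(bounds, bounds[1:])]

def size_select(fragments, lo=1_000, hi=4_000):
    """Keep only fragments inside the sub-cloning size window."""
    return [f for f in fragments if lo <= f <= hi]

# A 100 kb target cut at 49 random points gives 50 fragments (mean 2 kb).
fragments = shotgun_fragment(100_000, 49, seed=1)
clonable = size_select(fragments)
print(f"{len(fragments)} fragments, {len(clonable)} in the 1-4 kb window")
```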
[0009] Although most DNA sequencing methods utilize one template or
primer per sequencing reaction, there are exceptions to this
pattern. In early examples, Church et al. (Science 240: 185 [1988])
and Creasey et al. (BioTechniques 11: 102 [1991]) performed
multiple Sanger dideoxy sequencing reactions in a single set of
four tubes, using vectors containing unique sequence tags. The
reactions from each set of tubes were run on a sequencing gel and
transferred to a nylon membrane. Each sequence reaction was then
detected by sequentially probing the membrane with an
oligonucleotide specific for the tag on each vector. Other
variations on this theme have also been developed (Cherry et al.,
Genomics 20: 68 [1994]).
[0010] Subsequently, Wiemann et al. (Anal. Biochem. 224: 117
[1995]; Anal. Biochem. 234: 166 [1996]) showed that fluorescently
labeled sequencing primers could be used to simultaneously sequence
both strands of a dsDNA template. Recent examples have demonstrated
multiplex co-sequencing using the four-color dye terminator
reaction chemistry pioneered by Prober et al. (Science 238: 336
[1987]). At the 10th International Genome Sequencing and Analysis
Conference (Sep. 17-20, 1998, Miami Beach, Fla.), Uhlen (Royal
Institute of Technology) and Chiesa (PE Biosystems) independently
showed that biotinylated oligomers could be used to specifically
capture an individual sequencing reaction from a pool of multiple
reactions in a single tube.
[0011] Numerous vectors are available for cloning DNA into E. coli.
Conventional plasmid vectors are normally double-stranded circular
DNA molecules containing restriction enzyme recognition sites
suitable for inserting exogenous DNA sequences, an antibiotic
resistance gene for selection, an origin of replication for autonomous
propagation in the host cell, and a gene for the discrimination or
selection of clones that contain recombinant insert DNA.
[0012] One of the first recombinant DNA cloning systems used a dual
antibiotic resistant plasmid such as pBR322 (Bolivar et al., Gene
2:95 [1977]). One of the resistance genes served to select for
those cells taking up plasmid DNA. This gene was typically the
beta-lactamase gene (Amp or ampR), which confers resistance to
ampicillin (amp). The other resistance gene, Tet or tetR, encoding
resistance to tetracycline (tet), was used indirectly as the
indicator for recombinant clones. The foreign DNA fragment was
inserted into any of a number of restriction sites within the Tet
gene, resulting in inactivation of the Tet gene and sensitivity of
the transformed cell to killing by tetracycline.
[0013] Thus, to find those clones that might have contained foreign
insert DNA, the transformed cells were first spread onto
ampicillin-containing plates. Those colonies that grew were replica
plated onto tetracycline-containing plates. The colonies growing on
the ampicillin but not on the tetracycline plates were likely
candidates for further analysis. This screening method required
additional labor and time compared to newer methods and is rarely
used now.
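The replica-plating logic of this pBR322-style screen reduces to a simple filter. The following Python sketch is illustrative only; the Colony record and its fields are hypothetical names for the growth observed on the two kinds of plates.

```python
from dataclasses import dataclass

@dataclass
class Colony:
    name: str
    grows_on_amp: bool  # growth on the ampicillin master plate
    grows_on_tet: bool  # growth on the tetracycline replica plate

def candidate_recombinants(colonies):
    """An insert in the Tet gene inactivates it, so likely recombinants
    grow on ampicillin but not on tetracycline."""
    return [c.name for c in colonies if c.grows_on_amp and not c.grows_on_tet]

plate = [
    Colony("c1", grows_on_amp=True, grows_on_tet=True),    # parental vector
    Colony("c2", grows_on_amp=True, grows_on_tet=False),   # insert in Tet
    Colony("c3", grows_on_amp=False, grows_on_tet=False),  # no plasmid
]
print(candidate_recombinants(plate))  # ['c2']
```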
[0014] The predominant cloning system in use for the last two
decades is the "blue screen" method. Blue screen vectors contain a
selectable marker such as the ampicillin resistance gene described
above. However, the tetracycline screen is replaced by a color
discrimination technique based on insertional inactivation of a
genetically engineered gene that encodes β-galactosidase
(βGal). The bacteriophage M13mp series and plasmid pUC series
of cloning vehicles are ubiquitous examples of this screening
method. These vectors encode the N-terminal 60 amino acids of
βGal, the so-called lacZα peptide, which is inactive on its own.
Another inactive, truncated portion of lacZ (the lacZΔM15 allele)
is carried on an F' episome of the host bacteria, and it can
complement the lacZα peptide to restore βGal activity. Cells
containing non-recombinant vectors therefore produce functional
βGal, which can hydrolyze the indicator chemical XGAL
(5-bromo-4-chloro-3-indolyl-β-D-galactoside) to produce a
blue-colored product.
[0015] The lacZα fragment in the vector also contains a
series of cloning sites, termed the multiple cloning site, situated
such that insertion of foreign DNA into any one site disrupts the
lacZα coding sequence. An insertion into a site generally, but not
always, inactivates the lacZα fragment. Thus, cells
containing an insert in the vector generally do not produce active
βGal. These recombinant clones therefore remain white.
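The color outcome of the blue screen can be written as a small truth table. This Python sketch is a simplification that treats βGal activity as all-or-nothing (it ignores the "light blue" phenotypes discussed later) and uses hypothetical function and parameter names. It also shows how loss of the F' episome makes a parental vector look like a recombinant.

```python
def colony_color(alpha_intact: bool, host_has_lacZ_dM15: bool) -> str:
    """Colony color on XGAL plates under a simplified model of
    alpha-complementation."""
    # Active beta-Gal needs the vector's lacZ-alpha peptide AND the
    # host's lacZ-deltaM15 fragment (carried on the F' episome).
    beta_gal_active = alpha_intact and host_has_lacZ_dM15
    # Active beta-Gal hydrolyzes XGAL to a blue product.
    return "blue" if beta_gal_active else "white"

# Parental vector in a proper host: blue (non-recombinant).
parental = colony_color(alpha_intact=True, host_has_lacZ_dM15=True)
# Insert disrupts lacZ-alpha: white (recombinant).
recombinant = colony_color(alpha_intact=False, host_has_lacZ_dM15=True)
# Parental vector, but the host has lost the F' episome: white anyway,
# i.e. a false positive.
lost_episome = colony_color(alpha_intact=True, host_has_lacZ_dM15=False)
print(parental, recombinant, lost_episome)  # blue white white
```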
[0016] The advantage of the blue screen is that it is a visual
assay to discriminate recombinant clones from non-recombinants.
However, there are a number of disadvantages to this cloning
strategy. One disadvantage is that the substrate XGAL is expensive,
unstable, and awkward to use. Another chemical compound, IPTG
(isopropyl-β-D-thiogalactoside), a gratuitous inducer of the
lac promoter that drives lacZα in these vectors, is also
often required for this cloning system. Another disadvantage is
that a high percentage of non-recombinant (blue) colonies compete
for nutrients and space with the desired recombinant colonies. A
need exists for cloning systems that eliminate the requirement of
exogenous chemical additives for screening.
[0017] A more significant problem with blue screen cloning
technology is the issue of false negative and false positive
results, as well as results that cannot be easily classified
(Slilaty et al., Gene 213:83 [1998]). False positive results are
colonies or plaques that appear white or uncolored, yet do not
contain a foreign DNA insert in the lacZα cloning vectors.
Among the external factors responsible for generating false
positives are: (1) contamination of the restriction or modifying
enzymes used to process the vector (e.g., exonucleases that remove
bases from the termini of the lacZα fragment, creating
frame-shifts that inactivate the fragment), (2) spontaneous
mutations in the lacZα fragment or in the lacZΔM15
allele, and (3) loss of the F' episome carrying the lacZΔM15
allele. False positive results are carried forward and analyzed as
real positive clones, eventually being detected as empty, deleted,
or otherwise mutated vector DNA when further analyzed.
[0018] False negative results are blue colonies or plaques that
actually do contain foreign DNA inserted in the lacZα-based
vector. There are two principal causes of false negative results
using blue screen vectors: (1) in-frame insertion of DNA fragments
containing one or more open reading frames, and (2) reinitiation of
translation within the mRNA transcribed from the inserted DNA
fragment. Either event results in the synthesis of the lacZα
peptide fused to a foreign peptide, which often does not
impair its activity. Because the fusion peptide restores βGal
activity, these clones produce the blue color and are erroneously
discarded as non-recombinants.
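The in-frame insertion case can be checked roughly from the insert sequence alone. The Python sketch below assumes the insert is ligated exactly at a codon boundary of lacZα and is read in one orientation; it ignores translational reinitiation, so it covers only the first of the two causes listed above. The function name may_preserve_frame is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def may_preserve_frame(insert: str) -> bool:
    """True if an insert placed at a codon boundary could yield a
    lacZ-alpha fusion: no change to the downstream reading frame and
    no in-frame stop codon inside the insert (simplified model)."""
    seq = insert.upper()
    if len(seq) % 3 != 0:
        return False  # shifts the downstream lacZ-alpha reading frame
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)

print(may_preserve_frame("GCTGCAAAA"))  # True: open and in frame
print(may_preserve_frame("GCTTAAGCA"))  # False: in-frame TAA stop
print(may_preserve_frame("GCTGCAA"))    # False: frameshift
```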
[0019] Another problem is the hypersensitivity of the XGAL assay
system. Because very little beta-galactosidase activity is required
to produce a color reaction, inserts in blue screen vectors often
result in "light blue" and "dark white" colony phenotypes that
complicate the interpretation of cloning results. These blue false
negatives are rarely carried forward for analysis and can lead to
erroneous conclusions that the DNA fragments they carry are
"non-clonable." This bias against certain sequences may lead to
excessive gaps in shotgun DNA sequencing results as well. Thus, a
need exists for cloning systems that do not rely on the blue screen
technology.
[0020] A cloning procedure that selectively eliminates the
background of parental non-recombinant vector would be advantageous
in any DNA library construction or sub-cloning experiment. It would
also eliminate the screening process, as well as the need to buy,
weigh, and mix the required screening chemicals. Various cloning
vectors permitting direct selection of recombinant clones have been
described in the scientific literature.
[0021] Most positive selection vectors (or "suicide" vectors) are
based on the insertional inactivation of a lethal gene product
(Henrich & Plapp, Gene 42:345 [1986]). Insertion of a foreign
DNA fragment disrupts the lethal gene, allowing recombinant cells
to grow. Bacterial clones that carry a parental vector do not
survive, resulting in selection for clones that carry foreign DNA
fragments. The use of suicide vectors for positive selection is an
efficient strategy to suppress an undesired background of
non-recombinant clones that do not carry the desired DNA
insert.
[0022] Other examples of positive selection are based on abolition
of a particular sensitivity towards metabolites, selection by means
of DNA-degrading or RNA-degrading enzymes, or selection by means of
unstable long palindromic DNA sequences. Several problems can arise
when using the available direct selection cloning vectors. One
problem is a high number of false positive clones, i.e., viable
clones without an insert. False positives may arise from mutations
in the selection genes or their controlling genetic elements (so
called revertants), or by inadequate expression of the toxic gene
using an inducible genetic system (Bernhard et al., Gene 148:71
[1994]). False positive clones are typically carried forward as
real positives and are only detected as false positives after
analysis of their sequence. Thus, a need exists for a positive
selection cloning system that minimizes the number of false
positive clones.
[0023] Another problem with available direct selection vectors is a
high number of false negative clones, i.e., clones with inserts
that do not grow or grow very slowly. Similar to the situation
described above for the blue screen method, certain DNA fragments may
not completely inactivate the function of the toxic gene product,
which can result in a functionally diminished but nevertheless
toxic protein. In other cases, insertion of a particular DNA
fragment may not in any way adversely affect the lethal properties
of the selection gene. Thus, no clones with the desired insert are
obtained. This may occur in particular with small DNA fragments
and/or those fragments whose nucleotide sequence is in frame with
the selection gene. False negative clones are rarely detected,
because they cannot grow on the plating media. Thus, a need exists
for a direct selection cloning system that minimizes the number of
false negative clones.
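The outcomes of a suicide-vector selection, including the revertant false positives and the false negatives described in the preceding paragraphs, can be tabulated with a few lines of Python. The cell records and field names here are hypothetical, and the model assumes a cell grows if and only if the lethal gene on its vector is inactive (slow-growing clones are not modeled).

```python
def positive_selection(cells):
    """Return (survivor ids, false-positive ids, false-negative ids) for a
    selection in which only cells whose lethal gene is inactive grow."""
    survivors = [c for c in cells if not c["lethal_gene_active"]]
    # Viable clones without an insert (e.g. revertants).
    false_pos = [c["id"] for c in survivors if not c["has_insert"]]
    # Clones with an insert that still express an active lethal product.
    false_neg = [c["id"] for c in cells
                 if c["has_insert"] and c["lethal_gene_active"]]
    return [c["id"] for c in survivors], false_pos, false_neg

cells = [
    {"id": 1, "has_insert": False, "lethal_gene_active": True},   # parental vector
    {"id": 2, "has_insert": True,  "lethal_gene_active": False},  # true recombinant
    {"id": 3, "has_insert": False, "lethal_gene_active": False},  # revertant
    {"id": 4, "has_insert": True,  "lethal_gene_active": True},   # insert fails to inactivate
]
print(positive_selection(cells))  # ([2, 3], [3], [4])
```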
[0024] Yet another disadvantage of direct selection vectors is
that, as in the blue screen vectors, the vector contains a promoter
that actively transcribes the region into which the insert DNA is
to be cloned. Therefore, insert DNA that encodes toxic or
deleterious peptides or proteins will be harmful to the bacterial
host cell in which it is carried. Thus, a need exists for a
low-background vector that does not transcribe the inserted DNA
fragment.
[0025] A further disadvantage in some positive selection schemes is
the need to make up complex nutrient media to utilize the selection
mechanism. Thus, a need exists for direct selection cloning systems
that do not require the use of exogenous chemical compounds.
[0026] Despite the rapid evolution of sequencing, the process is
still constrained by the significant effort needed to generate
libraries of DNA templates, identify recombinant clones, and purify
the DNA from those clones. The process of constructing a random
clone library is technically challenging, inefficient, and involves
numerous steps. The present paradigm for shotgun cloning requires
one cloning reaction to generate a library of several thousand
templates, each template containing 1 or 2 primer extension sites,
which are anchor sequences for the enzymatic method of dideoxy
sequencing typically used today. Once a library is made, a vast
number of DNA templates must be grown, purified, and sequenced to
deduce the sequence of a large genome. For the human genome
project, two approaches were used to determine this genetic
blueprint. One method was the whole genome shotgun cloning approach
used by Celera Corporation. A few shotgun libraries were
constructed, but tens of millions of random clones were sequenced
using this approach. The other approach, used by the NIH consortia,
was to create an ordered array of cosmid, BAC and P1 clone
libraries, with average clone sizes of 40-100 kb. An arrayed
library covering the entire genome requires approximately 100,000
cosmid clones or 40,000 BAC or P1 clones, assuming a 20% clone
overlap. Thus, a minimum of 40,000 to 100,000 shotgun libraries are
required to sequence the human genome with this approach. Assuming
400 templates are needed to sequence a 40 kb cosmid clone, or 1000
templates per 100 kb BAC or P1 clone, approximately 40 million
templates will be grown, purified, and sequenced. An alternative
strategy using large insert BAC clones (150 kb average inserts) and
minimal overlap predicts that 20,000 BAC clones will be sufficient
to sequence the genome. If 1500 templates are needed to sequence
each of these large insert BAC clones, then a minimum of 30 million
templates will be grown, purified, and sequenced. Additional genome
projects and failed reactions can be expected to double or triple
the number of libraries, as well as templates, required for this
undertaking. Such high-throughput demands of large-scale sequencing
necessitate improvements that will minimize rate-limiting steps.
The growth, purification, and sequencing of tens of millions of
templates are significant rate-limiting steps in the sequencing of
any large genome. What is needed are methods, compositions and
systems for cloning and sequencing insert DNA sequences that are
faster, more economical, produce very low levels of non-recombinant
vector background, and exhibit less discrimination against
fragments containing promoter-like sequences or open reading
frames.
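The template estimates in this paragraph are products of clone count and templates per clone. The Python sketch below reproduces them. The clones_to_tile helper is an added, hypothetical model that assumes each clone contributes only (100 - overlap)% of its length as new sequence; it is included only to show how the approximate clone counts relate to genome size and overlap.

```python
def clones_to_tile(genome_bp: int, clone_bp: int, overlap_pct: int) -> int:
    """Clones needed to tile a genome if each clone adds only
    (100 - overlap_pct)% of its length as new sequence (assumed model)."""
    new_bp_x100 = clone_bp * (100 - overlap_pct)
    return -(-genome_bp * 100 // new_bp_x100)  # integer ceiling division

def templates_needed(n_clones: int, templates_per_clone: int) -> int:
    return n_clones * templates_per_clone

GENOME_BP = 3_300_000_000  # human genome size used in the text

print(clones_to_tile(GENOME_BP, 40_000, 20))   # cosmids
print(clones_to_tile(GENOME_BP, 100_000, 20))  # BAC or P1 clones

# Scenarios as stated in the text: (clones, templates per clone).
scenarios = {
    "cosmid, 40 kb": (100_000, 400),
    "BAC or P1, 100 kb": (40_000, 1_000),
    "large-insert BAC, 150 kb": (20_000, 1_500),
}
for name, (clones, per_clone) in scenarios.items():
    total = templates_needed(clones, per_clone)
    # Additional projects and failed reactions may double or triple this.
    print(f"{name}: {total:,} templates ({2 * total:,} to {3 * total:,})")
```

The tiling model gives about 103,000 cosmids and 41,000 BAC or P1 clones, in line with the rounded figures of 100,000 and 40,000 used in the text.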
SUMMARY OF THE INVENTION
[0027] The present invention relates to systems, methods, and
compositions for cloning and sequencing insert nucleic acid
sequences. In particular, the present invention provides vectors
and vector components configured for multiplex cloning and
multiplex sequencing. The present invention also provides vectors
and vector components configured to reduce or minimize
transcription into and out of insert sequences.
[0028] In some embodiments, a circular vector (e.g. recombinant
plasmid) is formed from at least two vector components containing
selectable marker sequences. In particular embodiments, this vector
(e.g. recombinant plasmid) is formed from at least two vector
components containing selectable marker sequences and at least two
insert DNA sequences. The formation of a vector (e.g. recombinant
plasmid) may occur, for example, in a single ligation reaction
(e.g. the two vector components and insert sequences, all separate,
are joined together in a single ligation reaction). In some
embodiments, the compositions of the present invention permit
multiplex sequencing (e.g. from a single vector constructed from at
least two vector components and at least two insert sequences). In
preferred embodiments, the source nucleic acids used to form the
vectors of the present invention are at least two separate source
nucleic acid molecules (e.g. neither of which has all of the
selectable markers contained in the final vector that is
formed).
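To illustrate how one recombinant vector can support multiplex sequencing, the following Python sketch takes a circular arrangement that alternates vector components and inserts and lists the two vector-borne primer binding sites flanking each insert. It assumes each vector component carries one primer binding site near each free end, priming toward the adjacent insert; the part names and site names are hypothetical.

```python
def flanking_primer_sites(ring):
    """For each insert in a circular list of parts, return the primer
    site on the preceding vector component and the one on the
    following vector component (hypothetical site names)."""
    n = len(ring)
    pairs = {}
    for i, part in enumerate(ring):
        if part.startswith("I"):  # naming convention: inserts start with "I"
            before = ring[i - 1]       # i - 1 wraps to the end of the circle
            after = ring[(i + 1) % n]  # wraps from the last part to the first
            pairs[part] = (f"{before}.right_site", f"{after}.left_site")
    return pairs

# Two vector components (V0, V1) and two inserts (I0, I1) closed into one circle.
ring = ["V0", "I0", "V1", "I1"]
for insert, sites in flanking_primer_sites(ring).items():
    print(insert, sites)
```

If each primer site has a unique sequence (an assumption here), four sequencing reactions on a single template read both ends of both inserts.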
[0029] In some embodiments, the present invention provides systems,
kits, and compositions for cloning nucleic acid comprising at least
two separate source nucleic acid molecules capable of supplying X+1
vector components, wherein the vector components are configured for
combining in the presence of X+1 insert sequences to form a closed
circular recombinant vector (e.g. recombinant plasmid). In certain
embodiments, the present invention provides systems, kits, and
compositions for cloning nucleic acid comprising at least two
different source nucleic acid molecules capable of supplying X+1
vector components, wherein the vector components are configured for
combining in the presence of X+1 insert sequences to form a closed
circular recombinant vector (e.g. recombinant plasmid). In
particular embodiments, the present invention provides systems,
kits, and compositions for cloning nucleic acid comprising at least
two separate source nucleic acid molecules capable of supplying X+1
vector components, wherein the X+1 vector components are configured
for combining in the presence of X+1 insert sequences to form a
circular vector (e.g. recombinant plasmid).
[0030] In some embodiments, the present invention provides systems,
kits, and compositions for cloning nucleic acid comprising at least
two separate source nucleic acid molecules capable of supplying X+1
vector components, wherein the X+1 vector components are configured
for combining in the presence of X+1 insert sequences to form a
circular vector (e.g. recombinant plasmid), and wherein the vector
components are non-contiguous within the circular vector. In some
embodiments, X is a positive integer (e.g. 1-50). In particular
embodiments, X is selected from 1, 2, 3, 4, 5, and 6.
[0031] In other embodiments, the present invention provides
systems, kits, and compositions for cloning nucleic acid comprising
at least two separate source nucleic acid molecules capable of
supplying two vector components, wherein the vector components are
configured for combining in the presence of two insert sequences to
form a circular vector (e.g. recombinant plasmid), and wherein the
vector components are non-contiguous within the circular vector. In
some embodiments, the present invention provides systems, kits, and
compositions for cloning nucleic acid comprising at least two
separate source nucleic acid molecules capable of supplying three
vector components, wherein the vector components are configured for
combining in the presence of three insert sequences to form a
circular vector (e.g. recombinant plasmid), and wherein the vector
components are non-contiguous within the circular vector.
[0032] In some embodiments, the present invention provides systems,
compositions, and kits, comprising at least two separate source
nucleic acid molecules configured for supplying X+1 vector
components, wherein the X+1 vector components are configured for
combining in the presence of X+1 insert sequences to form a
circular vector such that the X+1 vector components are
non-contiguous within the circular vector. In certain embodiments,
the systems, compositions, and kits further comprise the X+1 insert
sequences.
[0033] In particular embodiments, the present invention provides
systems, compositions, and kits comprising X+1 vector components,
wherein each of the X+1 vector components is configured for
combining in the presence of X+1 insert sequences to form a
circular vector such that the X+1 vector components are
non-contiguous within the circular vector. In certain embodiments,
the systems, compositions, and kits further comprise the X+1 insert
sequences.
[0034] In certain embodiments, the present invention provides
compositions, kits, and systems for fixed orientation cloning. In
certain embodiments, vector components with selectable marker
sequences (e.g. all the same selectable marker sequences, or
different selectable marker sequences) are utilized for fixed
orientation cloning. In other embodiments, vector components
without selectable marker sequences are utilized for fixed
orientation cloning. In further embodiments, some vector components
with selectable marker sequences and some vector components without
selectable marker sequences are utilized for fixed orientation
cloning. In some embodiments, the present invention provides kits,
systems, and compositions for fixed orientation cloning comprising
X+1 vector components, wherein each of the X+1 vector components
comprises two different sticky free ends and is configured for
combining in the presence of X+1 insert sequences to form a
circular recombinant vector, wherein each of the X+1 insert
sequences comprises two identical sticky free ends that are unique
among the X+1 insert sequences. In preferred embodiments, each of
the two different sticky free ends (of the vector components) binds
one of the X+1 insert sequences. In other preferred embodiments,
the X+1 vector components are non-contiguous within the circular
recombinant vector.
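The end design in this paragraph can be checked with a small Python model. In this sketch, sticky ends are represented by hypothetical labels s0, s1, and so on: insert j carries s_j on both ends, vector component k carries s_(k-1) and s_k (indices taken around the circle), and two ends join only when their labels match. Walking the junctions from one component recovers a single circular order in which vector components and inserts alternate, so no two components are adjacent. Real ligations also produce side products that this model ignores.

```python
def make_parts(n):
    """Hypothetical end design for n = X+1 vector components and n inserts."""
    # Insert j: the same sticky end s_j on both ends, unique among inserts.
    inserts = {f"I{j}": (f"s{j}", f"s{j}") for j in range(n)}
    # Component k: two different ends, s_(k-1) on the left and s_k on the right.
    vectors = {f"V{k}": (f"s{(k - 1) % n}", f"s{k}") for k in range(n)}
    return vectors, inserts

def assemble(vectors, inserts):
    """Follow matching ends from V0 around the circle; return the part order."""
    insert_by_end = {ends[0]: name for name, ends in inserts.items()}
    vector_by_left_end = {ends[0]: name for name, ends in vectors.items()}
    order, current = [], "V0"
    while True:
        order.append(current)
        right_end = vectors[current][1]
        order.append(insert_by_end[right_end])   # the one insert with this end
        current = vector_by_left_end[right_end]  # its other end meets the next component
        if current == "V0":                      # circle closed
            return order

def non_contiguous(order):
    """True if no two vector components sit next to each other in the circle."""
    n = len(order)
    return not any(order[i].startswith("V") and order[(i + 1) % n].startswith("V")
                   for i in range(n))

vectors, inserts = make_parts(3)  # X = 2
circle = assemble(vectors, inserts)
print(circle, non_contiguous(circle))
```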
[0035] In certain embodiments, each of the X+1 vector components
comprises: i) first and second free ends, and ii) a selectable
marker region comprising at least one selectable marker sequence
unique among the X+1 vector components. In particular embodiments,
each of the X+1 vector components further comprises: iii) a first
transcriptional terminator between the first free end and the
selectable marker region, and iv) a second transcriptional
terminator between the second free end and the selectable marker
region. In some embodiments, the first transcriptional terminator
is configured to terminate RNA transcripts entering the selectable
marker region from the first free end. In other embodiments, the
second transcriptional terminator is configured to terminate RNA
transcripts entering the selectable marker region from the second
free end.
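The role of the flanking terminators can be illustrated by following a transcript around the circle. In this simplified Python model, the element names are hypothetical, the first free end is taken to be on the left, "term+" stands for a terminator that stops rightward transcripts (entering the marker region from the first free end), and "term-" for one that stops leftward transcripts (entering from the second free end). Termination is treated as 100% efficient, which real terminators are not. A transcript starting inside an insert, for example from a promoter-like sequence in the insert, then stops before reaching any selectable marker region.

```python
def component(label):
    """One vector component between its two free ends:
    terminator | selectable marker region | terminator."""
    return [("term+", f"{label}.T1"), ("marker", f"{label}.marker"),
            ("term-", f"{label}.T2")]

def markers_reached(ring, start, direction):
    """Follow a transcript that starts in ring[start] and moves `direction`
    (+1 rightward, -1 leftward) around the circle; return the marker
    regions it enters before being terminated."""
    stop_kind = "term+" if direction == 1 else "term-"
    reached, i = [], start
    for _ in range(len(ring) - 1):
        i = (i + direction) % len(ring)
        kind, label = ring[i]
        if kind == stop_kind:
            break  # transcript terminated
        if kind == "marker":
            reached.append(label)
    return reached

# Circle: V0 | I0 | V1 | I1, with inserts between the components' free ends.
ring = component("V0") + [("insert", "I0")] + component("V1") + [("insert", "I1")]
i0 = ring.index(("insert", "I0"))
print(markers_reached(ring, i0, +1), markers_reached(ring, i0, -1))  # [] []

# The same circle without terminators: the transcript runs into both markers.
bare = [("marker", "V0.marker"), ("insert", "I0"),
        ("marker", "V1.marker"), ("insert", "I1")]
print(markers_reached(bare, 1, +1))  # ['V1.marker', 'V0.marker']
```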
[0036] In some embodiments, each of the X+1 vector components
comprises a non-promoter sequence between the first free end and
the selectable marker region, wherein the non-promoter sequence is
unable to serve as an operable promoter in a bacterial host cell.
In preferred embodiments, the bacterial host cell is Escherichia
coli. In other embodiments, each of the X+1 vector components
comprises a non-promoter sequence between the second free end and
the selectable marker region, wherein the non-promoter sequence is
unable to serve as an operable promoter in a bacterial host cell.
In preferred embodiments, the bacterial host cell is Escherichia
coli. In certain embodiments, there is a selectable marker after
the selectable marker region.
[0037] In certain embodiments, one of the X+1 vector components
comprises SEQ ID NO:85 or a sequence that is at least 90% identical
to SEQ ID NO:85 (e.g. at least 95% or at least 98% identical to SEQ
ID NO:85). In some embodiments, one of the X+1 vector components
comprises SEQ ID NO:86 or a sequence that is at least 90% identical
to SEQ ID NO:86 (e.g. at least 95% or at least 98% identical to SEQ
ID NO:86). In preferred embodiments, at least one of the X+1 insert
sequences is a lethal or toxic sequence (e.g. one that will not allow
the host cell to form a colony if the insert sequence is
transcribed).
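As a rough illustration of the identity thresholds above (at least 90%, 95%, or 98%), the following Python sketch computes percent identity over an ungapped, position-by-position comparison of two equal-length sequences. This is a simplifying assumption: identity to SEQ ID NO:85 or SEQ ID NO:86 would ordinarily be assessed with a gapped alignment, the sequences themselves are not reproduced here, and the function name is hypothetical.

```python
def percent_identity_ungapped(a: str, b: str) -> float:
    """Percent of positions at which two equal-length sequences match.

    Simplifying assumption: no gaps. Comparisons against a reference
    sequence normally use a gapped alignment instead.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Hypothetical 10-base example: one mismatch in ten positions.
identity = percent_identity_ungapped("ACGTACGTAC", "ACGTACGTAA")
print(identity >= 90.0)
```

A candidate sequence would then be compared against the chosen threshold (90, 95, or 98).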
[0038] In some embodiments, the first and second free ends are
configured such that they will not bind to each other. In certain
embodiments, the first and second free ends comprise 5' ends
lacking terminal phosphate groups. In other embodiments, the first
and second free ends are blunt free ends or sticky free ends. In
particular embodiments, at least one of the X+1 insert sequences is
of unknown sequence. In preferred embodiments, each of the X+1
vector components comprises two primer binding sites (e.g. such
that the circular vector formed has a pair of primer binding sites
for sequencing each of the X+1 insert sequences). In certain
embodiments, the circular vector is a low copy number circular
vector (e.g. contains a gene causing a low copy number or an origin
of replication causing a low copy number). In other embodiments,
the low copy number circular vector is configured such that no more
than 200 copies are produced in a host cell (e.g. no more than 100
or no more than 20 copies per host cell).
[0039] In some embodiments, the present invention provides fixed
orientation cloning. In particular embodiments, each of the X+1
insert sequences comprises two identical sticky free ends that are
unique among the X+1 insert sequences, wherein each of the X+1
vector components comprises two different sticky free ends, and
wherein each of the two different sticky free ends binds one of the
X+1 insert sequences.
[0040] In other embodiments, at least one of the X+1 vector
components comprises an ampicillin resistance gene and an Origin of
replication. In some embodiments, the ampicillin resistance
sequence is a mutated ampicillin resistance sequence configured to
reduce feeder colonies. In some embodiments, the mutated ampicillin
resistance gene (e.g. derived from pUC19) comprises at least one
mutation selected from: T to A at position 174; T to C at position
333; A to G at position 412; C to T at position 648; T to C at
position 668; T to C at position 764; and combinations thereof. In
preferred embodiments, the circular vector is a recombinant
plasmid. In other embodiments, the promoter of the ampicillin
resistance gene is replaced by a less active promoter (e.g. CamR
promoter).
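The listed ampicillin resistance mutations can be represented as (position, reference base, substituted base) triples. The Python sketch below applies any subset of them to a sequence string, checking the reference base at each position first. It assumes 1-based positions in the same coordinate frame as the listed mutations; the pUC19-derived sequence is not reproduced here, and the names `AMPR_MUTATIONS` and `apply_point_mutations` are hypothetical.

```python
# Listed mutations as (1-based position, reference base, substituted base).
# Assumption: positions use the same coordinate frame as the text.
AMPR_MUTATIONS = [
    (174, "T", "A"),
    (333, "T", "C"),
    (412, "A", "G"),
    (648, "C", "T"),
    (668, "T", "C"),
    (764, "T", "C"),
]


def apply_point_mutations(seq, mutations):
    """Return a copy of seq with each (pos, ref, alt) substitution applied."""
    bases = list(seq.upper())
    for pos, ref, alt in mutations:
        if not 1 <= pos <= len(bases):
            raise ValueError(f"position {pos} is outside the sequence")
        if bases[pos - 1] != ref:
            raise ValueError(
                f"expected {ref} at position {pos}, found {bases[pos - 1]}"
            )
        bases[pos - 1] = alt
    return "".join(bases)
```

Because "combinations thereof" are contemplated, any slice or subset of the list can be passed, e.g. `apply_point_mutations(ampr_seq, AMPR_MUTATIONS[:2])`, where `ampr_seq` stands for the reference sequence in use. The reference-base check raises an error if the coordinate frame does not match that sequence.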
[0041] In certain embodiments, each of the source nucleic acid
molecules is configured to supply no more than X of the X+1 vector
components. In some embodiments, at least one of the source nucleic
acid molecules comprises at least one of the X+1 vector components.
In particular embodiments, at least one of the source nucleic acid
molecules comprises a template for generating at least one of the
X+1 vector components.
[0042] In some embodiments, the present invention provides kits
comprising at least two separate source nucleic acid molecules
configured for supplying X+1 vector components, and one other
component (e.g., buffer, product insert, sequencing primers,
ligase, etc.). In other embodiments, the present invention provides
kits comprising X+1 vector components, wherein the X+1 vector
components are configured for combining in the presence of X+1
insert sequences to form a circular vector such that the X+1 vector
components are non-contiguous within the circular vector, and one
other component (e.g., buffer, product insert, sequencing primers,
ligase, etc.). In additional embodiments, the kits further comprise
an insert DNA end repair kit (e.g. comprising a polymerase and
kinase). In certain embodiments, the kits of the present invention
further comprise a written insert component (e.g. comprising
written instructions for using the kit).
[0043] In certain embodiments, the present invention provides
compositions comprising a vector component, wherein the vector
component comprises: i) first and second free ends; ii) a
selectable marker region; iii) a first transcriptional terminator
between the first free end and the selectable marker region; and
iv) a second transcriptional terminator between the second free end
and the selectable marker region, and wherein the vector component
is configured to form a circular vector when combined with an
insert sequence. In preferred embodiments, the insert sequence is a
lethal or toxic insert sequence (e.g. will not allow the host cell
to form a colony if the insert sequence is transcribed). In certain
embodiments, the insert sequence has at least 65% A/T content (e.g.
at least 65%, 75%, 80%, or 85% A/T content).
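A minimal Python sketch for screening a candidate insert against the A/T content thresholds above. It assumes that ambiguous symbols (e.g. N) are excluded from the denominator; the function name `at_fraction` and the example fragment are hypothetical.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A and T among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are excluded from the denominator.
    """
    s = seq.upper()
    unambiguous = sum(s.count(base) for base in "ACGT")
    if unambiguous == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (s.count("A") + s.count("T")) / unambiguous


insert = "AAATTTTGCA"  # hypothetical insert fragment
print(at_fraction(insert) >= 0.65)
```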
[0044] In some embodiments, the vector component comprises a
non-promoter sequence between the first free end and the selectable
marker region, wherein the non-promoter sequence is unable to serve
as an operable promoter in a bacterial host cell. In preferred
embodiments, the bacterial host cell is Escherichia coli.
[0045] In certain embodiments, the vector component comprises a
non-promoter sequence between the second free end and the
selectable marker region, wherein the non-promoter sequence is
unable to serve as an operable promoter in a bacterial host cell.
In preferred embodiments, the bacterial host cell is Escherichia
coli. In some embodiments, the first and second free ends comprise
5' ends lacking terminal phosphate groups. In other embodiments,
the first and second free ends are blunt free ends. In certain
embodiments, the selectable marker region comprises first and
second selectable marker sequences. In some embodiments, the
selectable marker region further comprises a transcriptional
terminator. In particular embodiments, the transcriptional
terminator is between the first and second selectable marker
sequences. In other embodiments, the first selectable marker
sequence is an Origin of Replication. In certain embodiments, the
second selectable marker sequence is an antibiotic resistance gene
comprising a promoter sequence and a protein encoding sequence. In
preferred embodiments, the promoter sequence is closer to the first
or second free ends than the protein encoding sequence (e.g.
transcription of the selectable marker sequence proceeds "away"
from the free ends).
[0046] In certain embodiments, the present invention provides
compositions comprising a circular vector, wherein said circular
vector comprises: i) a cloning site comprising at least one unique
restriction site for insertion of exogenous DNA; ii) a selectable
marker region; iii) a transcriptional terminator following the
selectable marker region, oriented so as to terminate any RNA
transcript initiated from the selectable marker region; iv) a
["5'-end"] transcriptional terminator between the cloning site and
the 5' end of the selectable marker region, oriented so as to
terminate RNA transcripts entering the 5' end of said selectable
marker region from the cloning site, and v) a ["3'-end"]
transcriptional terminator between the cloning site and the 3' end
of said selectable marker region, oriented so as to terminate RNA
transcripts entering the 3' end of the selectable marker region
from the cloning site. In other embodiments, the circular vector is
configured such that it may be cleaved to generate a linear
fragment. In some embodiments, the circular vector further
comprises: i) a gene that is toxic when expressed in a host cell,
and ii) restriction sites that allow excision of the toxic gene,
wherein the circular vector is configured to generate a linear
fragment (e.g. by excision of said toxic gene or by PCR amplification).
In some embodiments, the present invention provides circular
vectors comprising i) a gene that is toxic when expressed in a host
cell, and ii) one or more unique restriction sites within the toxic
gene, and wherein insertion of exogenous DNA into any of the one or
more unique restriction sites is likely to result in disruption of
expression of the toxic gene, allowing maintenance of the resulting
recombinant vector in host cells.
[0047] In some embodiments, the present invention provides
compositions comprising a circular vector, wherein the circular
vector comprises: i) a toxic gene sequence, and ii) a nucleic acid
sequence, wherein the nucleic acid sequence comprises: a) first and
second ends, b) a selectable marker region, c) a first
transcriptional terminator between the first end and the selectable
marker region, and d) a second transcriptional terminator between
the second end and the selectable marker region. In certain
embodiments, the circular vector is configured to generate a vector
component having first and second free ends upon removal of the
toxic gene sequence from the circular vector. In other embodiments,
the first transcriptional terminator is configured to
terminate RNA transcripts entering the selectable marker region
from the first end. In particular embodiments, the second
transcriptional terminator is configured to terminate RNA
transcripts entering the selectable marker region from the second
end.
[0048] In some embodiments, the selectable marker region comprises
a transcriptional terminator configured to terminate RNA
transcripts encoded by at least one selectable marker sequence in
the selectable marker region. In other embodiments, the nucleic
acid sequence comprises a first non-promoter sequence between the
first end and the selectable marker region, and a second
non-promoter sequence between the second end and the selectable
marker region, wherein each of the first and second non-promoter
sequences is unable to serve as an operable promoter in a host
cell. In preferred embodiments, the host cell is Escherichia
coli.
[0049] In certain embodiments, the selectable marker region
comprises first and second selectable marker sequences. In other
embodiments, the selectable marker region further comprises a
transcriptional terminator configured to terminate transcription of
at least one of the first and second selectable marker sequences.
In further embodiments, the nucleic acid sequence further comprises
two primer binding sites. In some embodiments, expression of the
toxic gene sequence prevents growth of a host cell. In particular
embodiments, the circular vector further comprises a cloning site
positioned such that introduction of an insert sequence into the
cloning site diminishes or prevents expression of the toxic gene
sequence. In other embodiments, the nucleic acid sequence comprises
a promoter sequence between the first or second end and the
selectable marker region.
[0050] In some embodiments, the selectable marker region comprises
an ampicillin resistance sequence. In preferred embodiments, the
ampicillin resistance sequence is a mutated ampicillin resistance
sequence configured to reduce feeder colonies. In some embodiments,
the mutated ampicillin resistance gene (e.g. derived from pUC19)
comprises at least one mutation selected from: T to A at position
174; T to C at position 333; A to G at position 412; C to T at
position 648; T to C at position 668; T to C at position 764; and
combinations thereof. In certain embodiments, the natural promoter
of the ampicillin resistance gene is replaced with a weaker
promoter.
[0051] In certain embodiments, the circular vector is a recombinant
plasmid. In preferred embodiments, the circular vector is a low copy
number vector (e.g. produces less than 300, or less than 200, or
less than 100, or less than 50, or less than 20 copies per cell). In
some embodiments, the vector component further comprises two primer
binding sites. In preferred embodiments, the vector component
comprises SEQ ID NO:85 or a sequence that is at least 90% identical
to SEQ ID NO:85 (e.g. at least 95% or at least 98% identical to SEQ
ID NO:85).
[0052] In some embodiments, the present invention provides kits
comprising: a) a vector component, wherein the vector component
comprises: i) first and second free ends; ii) a selectable marker
region; iii) a first transcriptional terminator between the first
free end and the selectable marker region, and iv) a second
transcriptional terminator between the second free end and the
selectable marker region, and wherein the vector component is
configured to form a circular vector when combined with an insert
sequence; and b) one other component (e.g., buffer, product insert,
sequencing primers, ligase, etc.). In certain embodiments, there is
a transcriptional terminator after the selectable marker region. In
additional embodiments, the kits further comprise an insert DNA end
repair component (e.g. comprising a polymerase and kinase). In
certain embodiments, the kits of the present invention further
comprise a written insert component (e.g. comprising written
instructions). In certain embodiments, the selectable marker region
comprises at least one selectable marker sequence.
[0053] In certain embodiments, the vector components of the present
invention comprise at least one selectable marker sequence selected
from an ampicillin selectable marker, a chloramphenicol selectable
marker, a kanamycin selectable marker, a gentamycin selectable
marker, and a plasmid origin of replication (e.g. serving as a
selectable marker). In certain embodiments, the vector components
comprise at least one transcriptional terminator. In some
embodiments, the vector components comprise at least two, or at
least three, transcriptional terminators (e.g. flanking a
selectable marker). In certain embodiments, each selectable marker,
including Ori as a selectable marker, is flanked by transcriptional
terminators (e.g. strong transcriptional terminators). In
particular embodiments, each of the X+1 vector components comprises
at least one transcriptional terminator that is downstream of the
selectable marker sequence (i.e. the transcriptional terminator is
3' of the stop codon in the selectable marker sequence, see Amp
selectable marker sequence in FIG. 12B). In other embodiments, at
least one of the X+1 vector components comprises first and second
transcriptional terminators, wherein the first transcriptional
terminator is downstream of a selectable marker sequence, and
wherein the second transcriptional terminator is upstream of a
selectable marker sequence (i.e. 5' of the start codon of the
selectable marker sequence and oriented to terminate transcripts
entering the selectable marker sequence).
[0054] In particular embodiments of the present invention, at least
one of the vector components comprises at least a portion of one of
the at least two separate source nucleic acid molecules. In other
embodiments, at least one of the vector components is amplified
(e.g. using PCR) from at least a portion of one of the at least two
separate source nucleic acid molecules (e.g. one of the separate
source nucleic acid molecules is exposed to primers that amplify at
least a portion of the sequence of the source nucleic acid
molecule). In preferred embodiments, the vector components are
linear (e.g. the vector components have ends that are not connected
to each other). In other preferred embodiments, each of the vector
components comprises at least two primer binding sites (e.g. to
allow insert DNA adjacent to the vector components to be
sequenced).
[0055] In some embodiments, the present invention provides systems,
kits, and compositions for cloning nucleic acid comprising at least
two separate source nucleic acid molecules capable of supplying X+1
vector components, wherein each of the source nucleic acid
molecules is configured to supply no more than X of the vector
components, and wherein the vector components are configured for
combining in the presence of X+1 insert sequences to form a
circular vector (e.g. recombinant plasmid) such that the X+1 vector
components are non-contiguous within the circular vector. In
particular embodiments, at least one of the at least two separate
source nucleic acid molecules is a replicable vector (e.g. a vector
that has an origin of replication and is therefore capable of being
copied by a host cell). In some embodiments, the replicable vector
is selected from a plasmid, a BAC, a cosmid, or a viral vector
(e.g. bacteriophage).
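The supply constraint in this paragraph (X+1 vector components supplied by separate source molecules, none of which supplies more than X of them) can be expressed as a simple check. In this hypothetical Python sketch, each source molecule is mapped to the set of component indices 0..X that it can supply; the function and source names are illustrative only.

```python
def valid_supply(sources, x):
    """Check the supply constraint for X+1 vector components.

    sources maps a (hypothetical) source-molecule name to the set of
    vector component indices, 0..x, that it can supply.
    """
    required = set(range(x + 1))
    supplied = set().union(*sources.values())
    # No single source may supply more than X of the X+1 components.
    no_source_too_large = all(len(comps) <= x for comps in sources.values())
    return no_source_too_large and supplied == required


# X = 2: two hypothetical source plasmids together supply components 0, 1, 2.
print(valid_supply({"source_A": {0, 1}, "source_B": {2}}, 2))
```

Because no single source may supply more than X components, covering all X+1 components already forces at least two separate sources, so the two-source condition needs no separate test.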
[0056] In some embodiments, at least one of the at least two
separate source nucleic acid molecules is a direct selection vector
(e.g. a vector with a lethal gene that has a cloning site in it).
In other embodiments, at least one of the at least two separate
source nucleic acid molecules is a conditional replication vector.
In particular embodiments, at least one of the source nucleic acid
molecules comprises at least one of the vector components. In
certain embodiments, at least one of the source nucleic acid
molecules is a vector component. In other embodiments, all of the
source nucleic acid molecules are vector components. In certain
embodiments, at least one of the source nucleic acid molecules
comprises a template for generating at least one of the vector
components (e.g., by amplification of the template by PCR).
[0057] In certain embodiments, the vector components are linear
with free 5' and 3' ends (e.g. in a double stranded vector
component, both 5' ends and both 3' ends are not linked to other
nucleic acid sequences). In some embodiments, each of the vector
components comprises free ends not compatible with the free ends of
the other vector components (e.g. the 5' end of each vector
component is not able to bind to either 3' end of another vector
component, or to its own 3' end). In preferred embodiments, the
free 5' ends of the vector components lack terminal phosphate
groups. In some embodiments, the ends of the vector components
comprise blunt free ends.
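The end configurations described above can be modeled with a simple ligation rule: two ends join only if their overhangs pair (both blunt, or 5' overhangs that are reverse complements), and at least one of the two ends carries a 5' phosphate. The Python sketch below is a simplification that ignores 5'- versus 3'-overhang polarity and ligation efficiency; the `End` type and `can_ligate` function are hypothetical.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class End:
    overhang: str         # single-stranded 5' overhang, "" for a blunt end
    phosphorylated: bool  # True if the 5' terminus carries a phosphate


def can_ligate(a: End, b: End) -> bool:
    # Overhangs must base-pair; two blunt ends ("" and "") trivially match.
    if a.overhang.upper() != reverse_complement(b.overhang):
        return False
    # Ligase needs a 5' phosphate to seal a nick, so at least one end must
    # carry one for either strand of the junction to be joined.
    return a.phosphorylated or b.phosphorylated


vector_end = End("", phosphorylated=False)  # dephosphorylated blunt end
insert_end = End("", phosphorylated=True)   # phosphorylated blunt insert end
print(can_ligate(vector_end, vector_end), can_ligate(vector_end, insert_end))
```

Under this model, dephosphorylated vector components cannot join one another, while a phosphorylated insert can still bridge them; a non-palindromic overhang such as "CAG" also cannot pair with a copy of itself.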
[0058] In some embodiments, at least one of the insert sequences is
of unknown sequence. In particular embodiments, each of the insert
sequences is of unknown sequence. In preferred embodiments, at
least one of the X+1 insert sequences is a lethal or toxic insert
sequence (e.g. one that will not allow the host cell to form a colony if the
insert sequence is transcribed, which may be determined by also
cloning the insert sequence in a conventional vector, such as
pUC19, to see if the insert sequence when transcribed is toxic or
lethal). In certain embodiments, the circular vector is capable of
being maintained by a host cell when the insert sequence has at
least 65% A/T content (e.g. at least 65%, 75%, 80%, or 85% A/T
content). In particular embodiments, the sequence of at least one
of the insert sequences is known. In particular embodiments, the
sequence of at least two insert sequences is known.
[0059] In certain embodiments, at least a portion of the sequence
of at least one of the insert sequences is known (e.g. 5, 10, 15,
20, or 25 bases are known). In other embodiments, the sequence of at
least one of the insert sequences is unknown. In particular
embodiments, the sequence of at least two of the X+1 insert
sequences is the same (e.g. the circular vector formed has at least
two insert sequences that have the same sequence). In some
embodiments, each of the insert sequences is at least 20 base pairs
in length. In other embodiments, each of the insert sequences is at
least 100 base pairs in length. In yet other embodiments, each of
the insert sequences is at least 50, or at least 200, or at least
500, or at least 750, or at least 1000 base pairs in length. In
other embodiments, the insert sequences are from a shotgun cloning
library. In other embodiments, the insert sequences are greater
than 1000 base pairs in length (e.g. between 1001 and 7000). In
some embodiments, the insert sequences are between 2000 and 6000
base pairs in length. In further embodiments, the insert sequences
are greater than 7000 base pairs in length. In particular
embodiments, the insert sequences are identical (e.g. all of the
X+1 insert sequences have the same sequence).
[0060] In certain embodiments, each of the insert sequences is
linear (e.g. its ends are not ligated to each other to form a
closed loop). In particular embodiments, each of the insert
sequences is double stranded. In some embodiments, each of the
insert sequences is configured to bind two of the vector
components. In certain embodiments, each of the insert sequences is
capable of binding to: i) one of the vector components, and ii) one
other of the insert sequences. In particular embodiments, at least
one of the at least X+1 insert sequences comprises a DNA library.
In particular embodiments, none of the at least X+1 insert
sequences comprises a DNA library. In other embodiments, the insert
sequences comprise DNA. In particular embodiments, the insert
sequences comprise RNA.
[0061] In some embodiments, the termini of the vector components
are configured to provide fixed orientation multiplex cloning
vectors, in which the vector components can assemble only in a
fixed orientation relative to each other upon ligation to insert
DNA fragments. For example, in some embodiments, each of the X+1
insert sequences i) is configured to bind only two of the X+1
vector components, but not to itself or to any other insert
sequence, and ii) is combined with X+1 vector components, each of
the vector components being configured to bind only two of the X+1
insert sequences, but not to itself or to any other vector
component (e.g. the 5' end of the vector component is not able to
bind to the 3' end of another vector component or to its own 3'
end; see FIG. 16). As such, the vector components can be assembled
by ligation to the insert DNAs only in a fixed orientation relative
to each other. This arrangement allows for "paired-end" sequencing,
in which the ends of a given insert fragment are adjacent to a
defined pair of sequencing primers. The vector components may be
configured such that specific desired ends are generated by
restriction digestion, by PCR amplification, or by ligation of
oligonucleotide linkers. Specific desired ends of the insert DNAs
may be generated by ligating oligonucleotide linkers onto each of
X+1 pools of insert DNAs. In addition to providing fixed
orientation of the vector fragments, this method of multiplex
cloning eliminates the possibility of cloning multiple insert
fragments into a single cloning site.
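The fixed-orientation scheme can be checked by brute force. In this hypothetical Python model, each vector component is a pair of end labels (left, right), and each insert pool carries two identical ends with its own label, so an insert can only bridge two vector ends bearing that label. As a simplifying assumption, only vector-insert-vector junctions are allowed (inserts are taken not to join one another), and the function name `assemblies` is illustrative.

```python
from itertools import permutations, product


def assemblies(components, insert_labels):
    """Enumerate circular arrangements of vector components joined by inserts.

    components: list of (left_end, right_end) labels, one per vector component.
    insert_labels: one label per insert pool; each insert has two identical
    ends with that label. Component 0 is fixed, forward, at the start of the
    ring so that rotations and mirror images are not counted twice.
    Returns a list of arrangements as [(component index, flipped), ...].
    """
    n = len(components)
    found = []
    for order in permutations(range(1, n)):
        for flips in product((False, True), repeat=n - 1):
            ring = [(0, components[0])]
            for idx, flip in zip(order, flips):
                left, right = components[idx]
                ring.append((idx, (right, left) if flip else (left, right)))
            # Junction k: right end of ring[k] meets left end of ring[k + 1]
            # through one insert whose ends carry the same label.
            junctions = []
            for k in range(n):
                right = ring[k][1][1]
                left = ring[(k + 1) % n][1][0]
                if right != left or right not in insert_labels:
                    break
                junctions.append(right)
            else:
                # Every insert pool must be used exactly once.
                if sorted(junctions) == sorted(insert_labels):
                    found.append([(0, False)] + list(zip(order, flips)))
    return found


directional = [("e0", "e1"), ("e1", "e2"), ("e2", "e0")]
print(assemblies(directional, ["e0", "e1", "e2"]))
```

With three components whose ends form the chain e0-e1, e1-e2, e2-e0, exactly one arrangement survives (components 0, 1, 2, all forward). If every end is given the same label, as with undifferentiated blunt ends, all eight candidate arrangements are accepted, which is the ambiguity that fixed orientation cloning avoids.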
[0062] In some embodiments, the present invention provides kits for
cloning nucleic acid comprising at least two separate source
nucleic acid molecules capable of supplying X+1 vector components,
wherein the vector components are configured for combining in the
presence of X+1 insert sequences to form a closed vector (e.g.
recombinant vector). In particular embodiments, the present
invention provides kits for cloning nucleic acid comprising at
least two separate source nucleic acid molecules capable of
supplying X+1 vector components, wherein the X+1 vector components
are configured for combining in the presence of X+1 insert
sequences to form a circular vector (e.g. recombinant plasmid).
[0063] In some embodiments, the present invention provides kits for
cloning nucleic acid comprising at least two separate source
nucleic acid molecules capable of supplying X+1 vector components,
wherein the vector components are configured for combining in the
presence of X+1 insert sequences to form a circular vector, and
wherein the vector components are non-contiguous within the
circular vector. In some embodiments, X is a positive integer (e.g.
1-100). In particular embodiments, X is selected from 1, 2, 3, 4,
5, and 6. In other embodiments, the present invention provides kits
for cloning nucleic acid comprising at least two separate source
nucleic acid molecules capable of supplying at least two vector
components, wherein the two vector components are configured for
combining in the presence of two insert sequences to form a
circular vector, and wherein the two vector components are
non-contiguous within the circular vector. In some embodiments, the
present invention provides kits for cloning nucleic acid comprising
at least two separate source nucleic acid molecules capable of
supplying at least three vector components, wherein the three
vector components are configured for combining in the presence of
at least three insert sequences to form a circular vector, and
wherein the vector components are non-contiguous within the
circular vector.
[0064] In some embodiments, the present invention provides
compositions for cloning nucleic acid comprising at least two
separate source nucleic acid molecules capable of supplying X+1
vector components, wherein the X+1 vector components are configured
for combining in the presence of X+1 insert sequences to form a
closed vector (e.g. recombinant plasmid). In particular
embodiments, the present invention provides compositions for
cloning nucleic acid comprising at least two separate source
nucleic acid molecules capable of supplying X+1 vector components,
wherein the vector components are configured for combining in the
presence of X+1 insert sequences to form a circular vector (e.g.
recombinant plasmid).
[0065] In some embodiments, the present invention provides
compositions for cloning nucleic acid comprising at least two
separate source nucleic acid molecules capable of supplying X+1
vector components, wherein the vector components are configured for
combining in the presence of X+1 insert sequences to form a
circular vector, and wherein the vector components are
non-contiguous within the circular vector. In some embodiments, X
is a positive integer. In particular embodiments, X is selected
from 1, 2, 3, 4, 5, and 6. In other embodiments, the present
invention provides compositions for cloning nucleic acid comprising
at least two separate source nucleic acid molecules capable of
supplying at least two vector components, wherein the vector
components are configured for combining in the presence of at least
two insert sequences to form a circular recombinant plasmid, and
wherein the vector components are non-contiguous within the circular
recombinant plasmid. In some embodiments, the present invention
provides compositions for cloning nucleic acid comprising at least
two separate source nucleic acid molecules capable of supplying at
least three vector components, wherein the vector components are
configured for combining in the presence of at least three insert
sequences to form a circular recombinant plasmid, and wherein the
vector components are non-contiguous within the circular recombinant
plasmid.
[0066] In some embodiments, the present invention provides
compositions comprising a vector, wherein the vector comprises: i)
X+1 vector components, and ii) X+1 insert sequences; and wherein
the vector components are non-contiguous within the vector. In
particular embodiments, the vector is a circular
vector. In other embodiments, the vector is a linear vector. In
certain embodiments, the vector components are derived from at
least two separate source nucleic acid molecules. In certain
embodiments, the vector components of the present invention
comprise at least one selectable marker sequence. In other
embodiments, the vector components comprise at least two selectable
marker sequences. In preferred embodiments, the vector components
comprise at least one unique selectable marker sequence (e.g. each
vector component has at least one selectable marker sequence not
found on the other vector components that make up the circular
vector). In certain embodiments, the vector components comprise at
least one selectable marker sequence selected from an ampicillin
selectable marker, a chloramphenicol selectable marker, a kanamycin
selectable marker, a gentamycin selectable marker, and a plasmid
origin of replication (e.g. serving as a selectable marker).
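The preference that each vector component carry at least one selectable marker sequence not found on the other components can be checked with set differences. A minimal Python sketch, with each component represented as a set of marker names; the labels "ampR", "kanR", "camR", and "ori" are hypothetical shorthand.

```python
def each_has_unique_marker(components):
    """True if every component carries a marker absent from all other components.

    components: list of sets of selectable-marker names (hypothetical labels).
    """
    for i, markers in enumerate(components):
        others = set().union(*(m for j, m in enumerate(components) if j != i))
        if not markers - others:
            return False
    return True


print(each_has_unique_marker([{"ampR", "ori"}, {"kanR"}, {"camR"}]))
```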
[0067] In particular embodiments of the compositions of the present
invention, at least one of the vector components comprises at least
a portion of one of the at least two separate source nucleic acid
molecules. In other embodiments, at least one of the vector
components is amplified (e.g. by PCR) from at least a portion of
one of the at least two separate source nucleic acid molecules
(e.g. one of the separate source nucleic acid molecules is exposed
to primers that amplify at least a portion of the sequence of the
source nucleic acid molecule). In preferred embodiments, the vector
components are linear (e.g. they have ends that are not connected
to each other). In other preferred embodiments, the vector
components comprise at least two primer binding sites (e.g. to
allow insert DNA adjacent to the vector components to be
sequenced).
[0068] In some embodiments, the present invention provides
compositions for cloning nucleic acid comprising at least two
separate source nucleic acid molecules capable of supplying X+1
vector components, wherein each of the source nucleic acid
molecules is configured to supply no more than X of the vector
components, and wherein the vector components are configured for
combining in the presence of X+1 insert sequences to form a
circular vector (e.g. recombinant plasmid), and wherein the vector
components are non-contiguous within the circular vector. In
particular embodiments, at least one of the at least two separate
source nucleic acid molecules is a replicable vector (e.g. a vector
that has an origin of replication, and is capable of being copied
by a host cell). In some embodiments, the replicable vector is
selected from a plasmid, a BAC, a cosmid, or a viral vector (e.g.
bacteriophage).
[0069] In some embodiments, at least one of the at least two
separate source nucleic acid molecules is a direct selection vector
(e.g. a vector with a lethal gene that has a cloning site in it).
In other embodiments, at least one of the at least two separate
source nucleic acid molecules is a conditional replication vector.
In particular embodiments, at least one of the source nucleic acid
molecules comprises at least one of the vector components. In
certain embodiments, at least one of the source nucleic acid
molecules comprises a template for generating at least one of the
vector components by amplification.
[0070] In certain embodiments, the vector components are linear
with free 5' and 3' ends (e.g. in a double stranded vector
component, both 5' ends and both 3' ends are not linked to other
nucleic acid sequences). In some embodiments, each of the vector
components comprises free ends not compatible with the free ends of
the other vector components (e.g. the 5' end of the vector
components is not able to bind to either end of another vector
component, or to its own 3' end). In preferred embodiments, the
free ends of the vector components lack terminal 5' phosphate
groups.
[0071] In some embodiments, at least one of the insert sequences is
of unknown sequence. In particular embodiments, each of the insert
sequences is of unknown sequence. In particular embodiments, the
sequence of at least one of the insert sequences is known. In
particular embodiments, the sequence of at least two of the insert
sequences is known. In certain embodiments, at least a portion of
the sequence of at least one of the insert sequences is known (e.g.
5, 10, 15, 20, or 25 bases are known). In other embodiments, the
sequence of at least one of the insert sequences is unknown. In
some embodiments, each of the insert sequences is at least 20 base
pairs in length. In other embodiments, each of the insert sequences
is at least 100 base pairs in length. In yet other embodiments,
each of the insert sequences is at least 50, or at least 200, or at
least 500, or at least 750, or at least 1000 base pairs in length.
In other embodiments, the insert sequences are from a shotgun
cloning library. In other embodiments, the insert sequences are
between 1000 and 7000 base pairs in length. In some embodiments,
the insert sequences are between 7000 and 12000 base pairs in
length. In particular embodiments, the insert sequences are
identical (e.g. all of the X+1 insert sequences have the same
sequence).
[0072] In certain embodiments, each of the insert sequences is
linear (e.g. its ends are not ligated to each other to form a
closed loop). In particular embodiments, each of the insert
sequences is double stranded. In some embodiments, each of the
insert sequences is configured to bind two of the vector
components. In certain embodiments, each of the insert sequences
is capable of binding to: i) one of the vector components, and ii)
one other of the insert sequences. In particular embodiments, at
least one of the X+1 insert sequences comprises a DNA library. In
other embodiments, the insert sequences comprise DNA. In particular
embodiments, the insert sequences comprise RNA. In some
embodiments, the insert sequences comprise ends that are
phosphorylated.
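The interplay between dephosphorylated vector-component ends (paragraph [0070]) and phosphorylated insert ends can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not a protocol: every end is treated as blunt and therefore compatible, and only the standard requirement that DNA ligase seals a nick where a 5' phosphate meets a 3' hydroxyl is encoded. The `End` class and `can_join` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class End:
    """One free end of a linear double-stranded fragment (hypothetical model)."""
    owner: str             # "vector" or "insert"
    phosphorylated: bool   # True if the 5' terminus carries a phosphate


def can_join(a: End, b: End) -> bool:
    """Return True if ligase can covalently seal at least one strand of the a/b junction.

    Simplification: all ends are assumed blunt (always compatible), so only the
    5' phosphate requirement is modelled. Each strand of the junction needs a
    5' phosphate from one partner, so a junction with no phosphate on either
    side cannot be sealed at all.
    """
    return a.phosphorylated or b.phosphorylated


vector_end = End("vector", phosphorylated=False)  # e.g. after phosphatase treatment
insert_end = End("insert", phosphorylated=True)

print(can_join(vector_end, vector_end))  # False: vector components cannot join each other
print(can_join(vector_end, insert_end))  # True: the insert's 5' phosphate seals one strand
print(can_join(insert_end, insert_end))  # True: one insert may join another insert
```

In this model the strand that lacks a 5' phosphate at a vector/insert junction is left nicked in vitro; what happens to that nick after transformation is outside the sketch.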
[0073] In some embodiments, each of the X+1 insert sequences i) is
configured to bind two of the vector components, but not to itself
or to any other insert sequence, and ii) is combined with X+1
vector components, each of the vector components comprising one
free end compatible with one of the insert ends and one free end
compatible with another insert end, but not compatible with the
free ends of the other vector components (e.g. the 5' end of the
vector components is not able to bind to either 3' end of another
vector component, or to its own 3' end) (see FIG. 16).
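Paragraph [0073] asks for insert ends that pair with vector-component ends but not with each other, and for vector-component ends that do not pair with each other. One way to picture this, assuming 4-nt 5' overhangs, is the check below: two 5' overhangs anneal when one is the reverse complement of the other. The overhang sequences AAGG and CCTT are hypothetical, chosen only because they are not palindromic.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def compatible(overhang_a: str, overhang_b: str) -> bool:
    """Two 5' overhangs (each read 5'->3') anneal if one is the reverse complement of the other."""
    return overhang_a == revcomp(overhang_b)


INSERT_END = "AAGG"   # hypothetical overhang carried on both ends of every insert
VECTOR_END = "CCTT"   # hypothetical overhang on both ends of every vector component

pairs = {
    "insert-insert": compatible(INSERT_END, INSERT_END),
    "vector-vector": compatible(VECTOR_END, VECTOR_END),
    "vector-insert": compatible(VECTOR_END, INSERT_END),
}
print(pairs)  # {'insert-insert': False, 'vector-vector': False, 'vector-insert': True}
```

A palindromic overhang, one equal to its own reverse complement (e.g. ACGT), would not work in this design, because such an end can pair with an identical end.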
[0074] In some embodiments, the present invention provides
compositions comprising a circular vector, wherein the circular
vector comprises a plurality of cloning sites, each separated by at
least one selectable marker sequence. In certain embodiments, the
circular vector is a direct selection vector. In other embodiments,
the circular vector is a conditional replication vector. In
particular embodiments, the plurality of cloning sites comprises at
least three cloning sites. In additional embodiments, the plurality
of cloning sites comprises at least four (or five, or six, or
seven) cloning sites. In some embodiments, at least one selectable
marker sequence comprises two selectable marker sequences. In other
embodiments, the selectable marker sequences comprise at least two
primer binding sites. In particular embodiments, at least one
selectable marker sequence is selected from ampicillin,
chloramphenicol, kanamycin, and gentamycin resistance markers and a
plasmid origin of replication. In some embodiments, the circular vector is a
plasmid.
[0075] In some embodiments, the present invention provides
compositions comprising a circular vector, wherein the circular
vector comprises at least two selectable marker sequences, wherein
each of the selectable marker sequences is flanked by cloning
sites.
[0076] In other embodiments, the present invention provides
compositions comprising a circular vector, wherein the circular
vector comprises at least two vector components, wherein each of
the vector components comprises at least one selectable marker
sequence, and wherein each of the vector components is flanked by
cloning sites.
[0077] In certain embodiments, the present invention provides
methods for cloning nucleic acid comprising: a) providing; i) at
least two separate source nucleic acid molecules, and ii) at least
X+1 insert sequences; and b) treating the at least two separate
source nucleic acid molecules under conditions such that at least
X+1 vector components are generated; and c) combining the at least
X+1 insert sequences with the at least X+1 vector components under
conditions such that a circular recombinant vector is generated,
wherein the vector components are non-contiguous within the
circular vector. In some embodiments, the method further comprises:
providing; iii) host cells, and step d) transfecting the host cells
with the circular vector (e.g., recombinant plasmid) generating
transfected cells. In other embodiments, the method further
comprises; providing iv) selective growth media, and step e)
treating the transfected cells with the selective media to select
cells containing X+1 insert sequences.
[0078] In particular embodiments, step c) generates a plurality of
circular vectors (e.g. recombinant plasmids), and the method
further comprises step f) identifying the cells containing X+1
insert sequences, wherein the identifying is at least 95% accurate
(e.g. 5% or fewer of the identified cells are false positives). In preferred
embodiments, the identifying is at least 98% accurate. In
particularly preferred embodiments, the identifying is at least 99%
accurate. In most preferred embodiments, the identifying is
approximately 100% accurate (e.g. 99.5% or greater). In certain
embodiments, the selective growth media comprises at least X+1
selective agents. In different embodiments, the selective growth
media comprises X selective agents (e.g. an origin of replication
being employed as a selective marker). In some embodiments, the
selective agents are selected from ampicillin, chloramphenicol,
kanamycin, and gentamycin.
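The selection and accuracy figures in paragraph [0078] can be illustrated with a small sketch. It assumes that each selectable marker confers resistance to exactly one agent and that a cell grows only if the markers it carries cover every agent in the medium. `survives` and `identification_accuracy` are hypothetical helpers, and the colony counts are illustrative, not data from this application.

```python
def survives(markers_in_cell: set[str], agents_in_medium: set[str]) -> bool:
    """A cell grows only if it carries a resistance marker for every agent in the medium."""
    return agents_in_medium <= markers_in_cell


def identification_accuracy(true_positives: int, false_positives: int) -> float:
    """Fraction of selected clones that actually carry all X+1 insert sequences."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no clones were selected")
    return true_positives / total


# Duplex case (X = 1): the medium contains two agents.
medium = {"ampicillin", "kanamycin"}
print(survives({"ampicillin", "kanamycin"}, medium))  # True: both vector components present
print(survives({"ampicillin"}, medium))               # False: one component missing

# Illustrative counts only: 995 correct clones and 5 false positives.
print(identification_accuracy(995, 5))                # 0.995, i.e. 99.5% accurate
```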
[0079] In some embodiments, the method further comprises providing
multiplex sequencing reagents, and step d) mixing the multiplex
sequencing reagents with the circular vector (e.g. recombinant
plasmid) under conditions such that at least a portion of each of
the X+1 insert sequences is sequenced (e.g. at least 5, 10, 15,
20, 25, or 100 bases are determined from each of the insert
sequences). In preferred embodiments, at least 400 or at least 500 bases
are determined from each of the insert sequences. In particularly
preferred embodiments, at least 500 or at least 700 bases are
determined from each of the insert sequences. In some embodiments,
the multiplex sequencing reagents comprise: i) at least two primers
for each of the X+1 insert sequences, ii) a nucleic acid
polymerizing agent, and iii) nucleotides, wherein a portion of the
nucleotides are di-deoxy nucleotides.
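Paragraph [0079] calls for at least two primers per insert. When the X+1 vector components alternate with the X+1 inserts around the circle, each insert is flanked by two different components, so one outward-reading primer site near each end of every component is enough to read every insert from both sides. The sketch below assumes a hypothetical numbering in which component k carries primers P(2k+1) and P(2k+2), loosely in the spirit of the P1-8 labels of FIG. 1; the actual assignments in the figures may differ.

```python
def primer_pairs(n_components: int) -> dict[str, tuple[str, str]]:
    """Return, for each insert, the two primers that read into it from either side.

    Hypothetical layout: components and inserts alternate around the circle
    (component 0, insert 1, component 1, insert 2, ...). Component k carries a
    leftward-reading primer P(2k+1) and a rightward-reading primer P(2k+2), and
    insert k+1 sits between component k and component (k+1) mod n.
    """
    pairs = {}
    for k in range(n_components):
        rightward_primer = f"P{2 * k + 2}"             # on component k
        next_component = (k + 1) % n_components         # wraps around the circle
        leftward_primer = f"P{2 * next_component + 1}"  # on the next component
        pairs[f"insert {k + 1}"] = (rightward_primer, leftward_primer)
    return pairs


for insert, primers in primer_pairs(4).items():  # quadruplex case, X = 3
    print(insert, primers)
```

For four inserts this uses eight distinct primers, two per insert, with the last insert read by P8 and P1 across the point where the circle closes.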
[0080] In certain embodiments, the present invention provides
methods for cloning nucleic acid comprising: a) providing; i) at
least two separate source nucleic acid molecules, and ii) at least
X+1 insert sequences; and b) treating the at least two separate
source nucleic acid molecules under conditions such that at least
X+1 vector components are generated; and c) combining the at least
X+1 insert sequences with the at least X+1 vector components under
conditions such that a circular vector (e.g. recombinant plasmid)
is generated. In certain embodiments, the treating comprises
exposing the at least two separate source nucleic acid molecules to
restriction enzymes and/or alkaline phosphatase. In other
embodiments, the treating comprises employing at least a portion of
one of the at least two separate source nucleic acid molecules as a
template for PCR.
[0081] In certain embodiments, the X+1 vector components of the
present invention comprise at least one selectable marker sequence.
In some embodiments, the vector components comprise: i) first and
second free ends, and ii) a selectable marker region comprising at
least one selectable marker sequence unique among the X+1 vector
components. In further embodiments, the X+1 vector components
further comprise a first transcriptional terminator between the
first free end and the selectable marker region. In other
embodiments, the X+1 vector components comprise a second
transcriptional terminator between the second free end and the
selectable marker region. In other embodiments, the vector
components comprise at least two selectable marker sequences. In
preferred embodiments, each of the vector components comprises at least one
unique selectable marker sequence (e.g. each vector component has
at least one selectable marker sequence not found on the other
vector components that make up the circular vector). In certain
embodiments, the vector components comprise at least one selectable
marker sequence selected from an ampicillin selectable marker, a
chloramphenicol selectable marker, a kanamycin selectable marker, a
gentamycin selectable marker, a tetracycline selectable marker, and a plasmid origin of
replication (e.g. serving as a selectable marker). In some
embodiments, the selectable marker sequences are antibiotic
resistance genes. In certain embodiments, there is a
transcriptional terminator after the selectable marker
sequence.
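The requirement in paragraph [0081] that each vector component carry at least one selectable marker not found on the other components, flanked by transcriptional terminators, can be checked mechanically. The sketch assumes a hypothetical `VectorComponent` record whose marker region is a tuple of marker names (an origin of replication may be listed as a marker, as the paragraph allows) plus flags for the two flanking terminators; component names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorComponent:
    name: str
    markers: tuple[str, ...]        # selectable marker sequences in the marker region
    terminator_first: bool = True   # terminator between first free end and marker region
    terminator_second: bool = True  # terminator between second free end and marker region


def lacking_unique_marker(components: list[VectorComponent]) -> list[str]:
    """Names of components that have no marker absent from every other component."""
    failing = []
    for comp in components:
        other_markers = {m for other in components if other is not comp for m in other.markers}
        if not set(comp.markers) - other_markers:
            failing.append(comp.name)
    return failing


def lacking_terminators(components: list[VectorComponent]) -> list[str]:
    """Names of components missing either flanking terminator."""
    return [c.name for c in components if not (c.terminator_first and c.terminator_second)]


duplex = [
    VectorComponent("Amp-ori", ("ampicillin resistance", "origin of replication")),
    VectorComponent("Kan", ("kanamycin resistance",)),
]
print(lacking_unique_marker(duplex), lacking_terminators(duplex))  # [] []
```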
[0082] In particular embodiments of the methods of the present
invention, at least one of the vector components comprises at least
a portion of one of the at least two separate source nucleic acid
molecules. In other embodiments, at least one of the vector
components is PCR generated from at least a portion of one of the
at least two separate source nucleic acid molecules (e.g. one of
the separate source nucleic acid molecules is exposed to primers
that amplify at least a portion of the sequence of the source
nucleic acid molecule). In preferred embodiments, the vector
components are linear (e.g. they have ends that are not connected to
each other). In other preferred embodiments, the vector components
comprise at least two primer binding sites (e.g. to allow insert
DNA adjacent to the vector components to be sequenced).
[0083] In particular embodiments, at least one of the at least two
separate source nucleic acid molecules is a replicable vector (e.g.
a vector that has an origin of replication, and is capable of being
copied by a host cell). In some embodiments, the replicable vector
is selected from a plasmid, a BAC, a cosmid, and a viral vector (e.g.
a bacteriophage).
[0084] In some embodiments, at least one of the at least two
separate source nucleic acid molecules is a direct selection vector
(e.g. a vector with a lethal gene that has a cloning site in it).
In other embodiments, at least one of the at least two separate
source nucleic acid molecules is a conditional replication vector.
In particular embodiments, at least one of the source nucleic acid
molecules comprises at least one of the vector components. In
certain embodiments, at least one of the source nucleic acid
molecules comprises a template for generating at least one of the
vector components (e.g. by amplification).
[0085] In certain embodiments, the vector components are linear
with free 5' and 3' ends (e.g. in a double stranded vector
component, both 5' ends and both 3' ends are not linked to other
nucleic acid sequences). In some embodiments, each of the vector
components comprises free ends not compatible with the free ends of
the other vector components (e.g. the 5' end of the vector
components is not able to bind to either end of another vector
component, or to its own 3' end). In preferred embodiments, the
free ends of the vector components lack terminal phosphate
groups.
[0086] In some embodiments, at least one of the insert sequences is
of unknown sequence. In particular embodiments, each of the insert
sequences is of unknown sequence. In particular embodiments, the
sequence of at least one of the insert sequences is known. In
particular embodiments, the sequence of both of the insert
sequences is known. In certain embodiments, at least a portion of
the sequence of at least one of the insert sequences is known (e.g.
5, 10, 15, 20, or 25 bases are known). In other embodiments, the
sequence of at least one of the insert sequences is unknown. In
some embodiments, each of the insert sequences is at least 20 base
pairs in length. In other embodiments, each of the insert sequences
is at least 100 base pairs in length. In yet other embodiments,
each of the insert sequences is at least 50, or at least 200, or at
least 500, or at least 750, or at least 1000 base pairs in length.
In other embodiments, the insert sequences are from a shotgun
cloning library. In other embodiments, the insert sequences are
between 1000 and 7000 base pairs in length. In some embodiments,
the insert sequences are between 7000 and 12000 base pairs in
length. In particular embodiments, the insert sequences are
identical (e.g. all of the X+1 insert sequences have the same
sequence).
[0087] In certain embodiments, each of the insert sequences is
linear (e.g. its ends are not ligated to each other to form a
closed loop). In particular embodiments, each of the insert
sequences is double stranded. In some embodiments, each of the
insert sequences is configured to bind two of the vector
components. In certain embodiments, each of the insert sequences is
capable of binding to: i) one of the vector components, and ii) one
other of the insert sequences. In particular embodiments, at least
one of the at least X+1 insert sequences comprises a DNA library. In
other embodiments, the insert sequences comprise DNA. In particular
embodiments, the insert sequences comprise RNA.
[0088] In some embodiments, each of the X+1 insert sequences i) is
configured to bind two of the vector components, but not to itself
or to any other insert sequence, and ii) is combined with X+1
vector components, each of the vector components comprising one
free end compatible with one of the insert ends and one free end
compatible with another insert end, but not compatible with the
free ends of the other vector components (e.g. the 5' end of the
vector components is not able to bind to the 3' end of another
vector component, or to its own 3' end) (see, e.g., FIG. 16).
[0089] In certain embodiments, the present invention provides
methods for cloning nucleic acid comprising: a) providing; i) at least
X+1 vector components, and ii) at least X+1 insert sequences; and
b) combining the at least X+1 insert sequences with the at least
X+1 vector components under conditions such that a circular
recombinant plasmid is generated, wherein the vector components are
non-contiguous within the circular recombinant plasmid.
[0090] In other embodiments, the present invention provides methods
for sequencing nucleic acid comprising: a) providing; i) a circular
vector comprising; A) X+1 vector components, and B) X+1 insert
sequences; and wherein the vector components are non-contiguous
within the circular recombinant plasmid, and ii) multiplex
sequencing reagents; and b) mixing the multiplex sequencing
reagents with the circular vector under conditions such that at
least a portion of each of the X+1 insert sequences is sequenced.
In some embodiments, the multiplex sequencing reagents comprise: i)
at least two primers for each of the X+1 insert sequences, ii) a
nucleic acid polymerizing agent, and iii) nucleotides, wherein a
portion of the nucleotides are di-deoxy nucleotides.
[0091] In certain embodiments, the present invention provides
methods comprising combining a plurality of vector components and a
plurality of insert sequences under conditions such that a circular
recombinant plasmid containing two or more of the insert sequences
is formed (in some embodiments the vector components are
non-contiguous). In some embodiments, the circular recombinant
plasmid contains three or more of the insert sequences. In
particular embodiments, the circular recombinant plasmid contains
four or more of the insert sequences.
[0092] In some embodiments, the present invention provides
compositions comprising a direct selection vector, wherein the
direct selection vector comprises; i) a plasmid origin of
replication, and ii) a bacteriophage T7 1.2 gene sequence (or a
sequence encoding a protein identical to the T7 1.2 gene product,
or a sequence encoding a protein that has the same biological
activity as the T7 1.2 gene product, e.g. the amino acid sequence for T7
1.2 with minor deletions, substitutions, or additions, that do not
alter the biological activity of the peptide). In particular
embodiments, the direct selection vector further comprises at least
one selectable marker sequence. In other embodiments, the direct
selection vector further comprises a multiple cloning site. In
certain embodiments, the multiple cloning site is derived from
pUC19. In yet other embodiments, the multiple cloning site is
located between the first and second codon of the bacteriophage T7
1.2 gene sequence. In yet other embodiments, the multiple cloning
site is located between two other adjacent codons of the
bacteriophage T7 1.2 gene sequence. In particular embodiments, the
multiple cloning site comprises SEQ ID NO:29. In additional
embodiments, the multiple cloning site comprises SEQ ID NO:30. In
preferred embodiments, the direct selection vector is pT71.2. In
other embodiments, the direct selection vector is pTM2. In some
embodiments, the vector generated by the above method is pCTA1. In
other embodiments, the vector generated by the above method is
pCTAB4.3. In still other embodiments, the vector generated by the
above method is pCTH1.4. In other embodiments, the vector generated
by the above method is pATH. In other embodiments, the vector
generated by the above method is pATBAG. In still other
embodiments, the vector generated by the above method is pATR-G. In
certain embodiments, the vector generated by the above method is
pAT6-6. In other embodiments, the vector generated by the above
method is pARG. In certain embodiments, the bacteriophage T7 1.2
gene is lethal in F' E. coli cells.
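The direct selection logic of paragraph [0092] (the T7 1.2 gene is lethal in F' E. coli, and the multiple cloning site sits inside the gene) can be summarized as a filter on ligation products. The sketch rests on a simplifying assumption the paragraph does not state: that any insertion into the multiple cloning site inactivates the gene, while an empty, re-ligated vector keeps it intact. The function name and insert lengths are hypothetical.

```python
def survivors_in_f_prime_host(insert_lengths: list[int]) -> list[int]:
    """Keep the clones expected to grow in an F' host.

    Each entry is the length (bp) of DNA inserted into the multiple cloning
    site; 0 means the vector re-ligated without an insert, leaving the lethal
    T7 1.2 gene intact. Assumption: any insertion (length > 0) disrupts the gene.
    """
    return [length for length in insert_lengths if length > 0]


ligation_products = [0, 1500, 0, 3200, 800, 0]  # hypothetical outcomes
kept = survivors_in_f_prime_host(ligation_products)
print(kept)  # [1500, 3200, 800]: only recombinant clones remain
```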
[0093] In certain embodiments, the present invention provides
methods for generating a vector comprising: a) providing; i) a
direct selection vector comprising; A) a plasmid origin of
replication, and B) a bacteriophage T7 1.2 gene sequence; ii) a
composition comprising at least one type of restriction enzyme; and
iii) in certain embodiments a composition comprising a phosphatase
(e.g. calf intestinal phosphatase); and b) exposing the direct
selection vector to the composition under conditions such that the
bacteriophage T7 1.2 gene is removed from the direct selection
vector. In some embodiments, the exposing step generates a cloning
vector, or vector component, lacking the bacteriophage T7 1.2 gene
sequence. In further embodiments, the present invention provides
compositions comprising the vector lacking the bacteriophage T7 1.2
gene, generated by the above method.
[0094] In some embodiments, the present invention provides methods
for generating a vector component comprising; a) providing; i) a
circular vector comprising; A) a selectable marker region, B) a
direct selection sequence (e.g. T7 1.2 gene or barnase), C) a
first transcriptional terminator upstream of the direct selection
sequence, wherein the first transcriptional terminator is between
the selectable marker region and the direct selection sequence, and
D) a second transcriptional terminator downstream of the direct
selection sequence, wherein the second transcriptional terminator
is between the selectable marker region and the direct selection
sequence; and ii) a composition comprising at least one type of
restriction enzyme; and iii) in certain embodiments a composition
comprising a phosphatase (e.g. calf intestinal phosphatase); and b)
exposing the circular vector to the composition under conditions
such that the direct selection sequence is removed from the
circular vector, thereby generating a vector component with first
and second free ends (e.g. blunt free ends). In certain
embodiments, the method further comprises step c) exposing the
vector component to a phosphatase (e.g. calf intestinal
phosphatase), such that the free ends are dephosphorylated. In
certain embodiments, the selectable marker region comprises at
least one selectable marker followed by a transcriptional
terminator.
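The excision in paragraph [0094] can be mimicked on a toy circular sequence. The sketch assumes SmaI sites (CCC^GGG, which leave blunt ends) flanking a stand-in for the direct selection sequence. The sequences are invented placeholders, the terminators and marker region are collapsed into one backbone string, and the optional phosphatase step is not modelled. Cutting the circle at both sites and discarding the fragment carrying the lethal sequence leaves a linear vector component with two blunt free ends.

```python
SMAI_SITE = "CCCGGG"   # SmaI recognition site
SMAI_CUT_OFFSET = 3    # SmaI cuts CCC^GGG, leaving blunt ends


def cut_circular(seq: str, site: str, offset: int) -> list[str]:
    """Return the linear fragments from cutting a circular sequence at every site."""
    n = len(seq)
    wrapped = seq + seq[: len(site) - 1]  # lets us find sites that span the origin
    cuts = sorted({(i + offset) % n for i in range(n) if wrapped.startswith(site, i)})
    if not cuts:
        return []  # no site: the circle stays intact, so no linear fragment
    fragments = []
    for j, start in enumerate(cuts):
        end = cuts[(j + 1) % len(cuts)]
        # A fragment that runs past the end of the string wraps to the beginning.
        fragments.append(seq[start:end] if end > start else seq[start:] + seq[:end])
    return fragments


def remove_lethal_gene(circle: str, lethal: str) -> list[str]:
    """Keep only the fragments that do not carry the direct selection sequence."""
    return [f for f in cut_circular(circle, SMAI_SITE, SMAI_CUT_OFFSET) if lethal not in f]


BACKBONE = "ATGCATGCATGCATGC"   # placeholder for marker region plus terminators
LETHAL = "TTTAAATTTAAA"         # placeholder for the direct selection sequence
CIRCLE = BACKBONE + SMAI_SITE + LETHAL + SMAI_SITE  # closes back onto BACKBONE

print(remove_lethal_gene(CIRCLE, LETHAL))  # ['GGGATGCATGCATGCATGCCCC']
```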
[0095] In certain embodiments, the present invention provides
methods comprising, a) providing; i) X+1 vector components, and ii)
X+1 insert sequences; and b) combining the X+1 vector components
and the X+1 insert sequences under conditions such that a circular
vector is formed, wherein the X+1 vector components are
non-contiguous within the circular vector. In some embodiments, each
of the X+1 vector components comprises; i) first and second free
ends, and ii) a selectable marker region comprising at least one
selectable marker sequence unique among the X+1 vector components.
In other embodiments, each of the X+1 vector components further
comprises; iii) a first transcriptional terminator between the
first free end and the selectable marker region, and iv) a second
transcriptional terminator between the second free end and the
selectable marker region. In particular embodiments, each of the
X+1 vector components comprises a non-promoter sequence between the
first free end and the selectable marker region, wherein the
non-promoter sequence is unable to serve as an operable promoter in
a bacterial host cell (e.g., Escherichia coli). In other
embodiments, each of the X+1 vector components comprises a
non-promoter sequence between the second free end and the
selectable marker region, wherein the non-promoter sequence is
unable to serve as an operable promoter in a bacterial host cell.
In certain embodiments, the selectable marker region comprises at
least one selectable marker followed by a transcriptional
terminator.
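Paragraph [0095] requires the X+1 vector components to be non-contiguous in the circular product, so that every component is separated from the next by an insert. A minimal sketch, assuming components and inserts are simply named strings (all names hypothetical), builds the alternating circle and checks that no two components are adjacent, including across the point where the circle closes.

```python
def assemble_circle(components: list[str], inserts: list[str]) -> list[str]:
    """Alternate X+1 vector components with X+1 inserts around a circle."""
    if len(components) != len(inserts):
        raise ValueError("expected the same number (X+1) of components and inserts")
    circle = []
    for component, insert in zip(components, inserts):
        circle.extend([component, insert])
    return circle


def components_noncontiguous(circle: list[str], components: list[str]) -> bool:
    """True if no two vector components sit next to each other on the circle."""
    component_set = set(components)
    n = len(circle)
    return not any(circle[i] in component_set and circle[(i + 1) % n] in component_set
                   for i in range(n))


parts = ["Amp component", "Kan component"]             # duplex case, X = 1
circle = assemble_circle(parts, ["insert A", "insert B"])
print(circle)
print(components_noncontiguous(circle, parts))         # True
```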
[0096] In some embodiments, the method further comprises; providing
iii) host cells, and step c) transfecting the host cells with the
circular vector (e.g., recombinant plasmid) generating transfected
cells. In other embodiments, the method further comprises;
providing iv) selective growth media, and step d) treating the
transfected cells with the selective media to select cells
containing X+1 insert sequences.
[0097] In particular embodiments, step b) generates a plurality of
circular vectors (e.g. recombinant plasmids), and the method
further comprises step e) identifying the cells containing X+1
insert sequences, wherein the identifying is at least 95% accurate
(e.g. 5% or fewer of the identified cells are false positives). In
preferred embodiments, the identifying is at least 98% accurate. In
particularly preferred embodiments, the identifying is at least 99%
accurate. In most preferred embodiments, the identifying is
approximately 100% accurate (e.g. 99.5% or greater).
[0098] In some embodiments, the present invention provides methods
comprising, a) providing; i) a vector component, wherein the vector
component comprises: A) first and second free ends; B) a selectable
marker region, C) a first transcriptional terminator between the
first free end and the selectable marker region, and D) a second
transcriptional terminator between the second free end and the
selectable marker region, and ii) an insert sequence; and b)
combining the vector component and the insert sequence under
conditions such that a circular vector is formed. In certain
embodiments, the vector component further comprises a non-promoter
sequence between the first free end and the selectable marker
region, wherein the non-promoter sequence is unable to serve as an
operable promoter in a bacterial host cell (e.g. Escherichia coli).
In particular embodiments, the vector component comprises a
non-promoter sequence between the second free end and the
selectable marker region, wherein the non-promoter sequence is
unable to serve as an operable promoter in a bacterial host cell
(e.g. Escherichia coli). In some embodiments, the vector component
comprises a third transcriptional terminator (e.g. after at least
one selectable marker sequence).
[0099] In some embodiments, the method further comprises;
providing iii) host cells, and step c) transfecting the host cells
with the circular vector (e.g., recombinant plasmid) generating
transfected cells. In other embodiments, the method further
comprises; providing iv) selective growth media, and step d)
treating the transfected cells with the selective media to select
cells containing X+1 insert sequences.
[0100] In particular embodiments, step b) generates a plurality of
circular vectors (e.g. recombinant plasmids), and the method
further comprises step e) identifying the cells containing X+1
insert sequences, wherein the identifying is at least 95% accurate
(e.g. 5% or fewer of the identified cells are false positives). In
preferred embodiments, the identifying is at least 98% accurate. In
particularly preferred embodiments, the identifying is at least 99%
accurate. In most preferred embodiments, the identifying is
approximately 100% accurate (e.g. 99.5% or greater).
[0101] In certain embodiments, the present invention provides
methods for fixed orientation cloning comprising; a) providing; i)
X+1 vector components, wherein each of the X+1 vector components
comprises two different sticky free ends, and ii) X+1 insert
sequence pools, wherein each of the X+1 insert sequence pools
comprises a plurality of insert sequences, and b) treating each of
the X+1 insert sequence pools under conditions such that the
plurality of insert sequences in each of the X+1 insert sequence
pools comprise two identical sticky free ends that are unique among
the X+1 insert sequence pools, and c) combining the X+1 vector
components and the X+1 insert sequence pools under conditions such that
each of the two different sticky free ends, of each of the X+1
vector components, binds one of the plurality of insert sequences
from one of the X+1 insert sequence pools. In some embodiments, the
treating step comprises exposing the plurality of insert sequences
in each of the X+1 insert sequence pools to a plurality of one type
of linker (e.g. CCCC linkers and ligase are added to one of the
pools, and TTTT linkers and ligase are added to a different pool).
The present invention is not limited to the length or sequence of
the linkers employed. Indeed, any type of linker oligonucleotide
may be used. In preferred embodiments, each of the X+1 pools is
exposed to a different type of linker. In certain embodiments, the
treating step comprises exposing the plurality of insert sequences
in each of the X+1 insert sequence pools to a plurality of one type
of restriction enzyme (e.g. to generate sticky ends).
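Following the arrangement described for FIG. 16A (components AB, BC, and CA joined to inserts from pools A, B, and C), fixed orientation cloning as in paragraph [0101] can be modelled by labelling each vector component with the pools its two different sticky ends accept. Because each pool's inserts carry two identical ends unique to that pool, only one circular order of components is possible. The sketch assumes each pool label appears exactly once as a first end and once as a second end; the function name is hypothetical.

```python
def fixed_orientation_circle(components: list[tuple[str, str]]) -> list[str]:
    """Return the only circular order allowed by the components' sticky ends.

    Each component is (first_end_pool, second_end_pool); its second end binds an
    insert from that pool, whose other (identical) end binds the one component
    whose first end accepts the same pool.
    """
    by_first_end = {first: (first, second) for first, second in components}
    if len(by_first_end) != len(components):
        raise ValueError("each pool label must appear once as a first end")
    start = current = components[0]
    circle = []
    while True:
        first, second = current
        circle += [f"component {first}{second}", f"insert from pool {second}"]
        if second not in by_first_end or len(circle) > 2 * len(components):
            raise ValueError("components cannot close into a single circle")
        current = by_first_end[second]
        if current == start:
            break
    if len(circle) != 2 * len(components):
        raise ValueError("components form more than one circle")
    return circle


print(fixed_orientation_circle([("A", "B"), ("B", "C"), ("C", "A")]))
```

The model fixes the order and orientation of the vector components only; because both ends of an insert are identical, each insert can still enter its slot in either orientation.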
[0102] In particular embodiments, the present invention provides
methods comprising; a) providing; i) X+1 vectors (e.g. circular or
linearized), wherein each of the vectors comprises; A) an identical
origin of replication (i.e. each of the X+1 vectors
comprises the same origin of replication), and B) at least one
selectable marker sequence unique among the X+1 vectors, ii) a
plurality of insert sequences, and iii) host cells; and b)
combining the X+1 vectors and the plurality of insert sequences
under conditions such that X+1 recombinant vectors are generated;
and c) transforming the host cells with the X+1 recombinant vectors
(e.g. transforming the host cells with each of the X+1 vectors at
approximately the same time) to generate transformed host cells. In
further embodiments, the methods further comprise; providing iv)
selective growth media, and step d) treating the transformed host
cells with the selective media to select cells containing X+1
recombinant vectors.
[0103] In certain embodiments, the selective growth media comprises
at least X+1 selective agents. In different embodiments, the
selective growth media comprises X selective agents (e.g. an origin
of replication being employed as a selective marker). In some
embodiments, the selective agents are selected from ampicillin,
chloramphenicol, kanamycin, and gentamycin.
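Paragraphs [0102] and [0103] describe transforming host cells with X+1 vectors that share an origin of replication but carry different markers, then selecting for all of them. The toy simulation below assumes, purely for illustration, that each cell takes up each vector independently with the same probability; the uptake probability, cell count, and function name are hypothetical.

```python
import random


def count_selected_cells(markers: list[str], n_cells: int, uptake_prob: float,
                         seed: int = 0) -> int:
    """Count cells that take up every vector and so grow on medium with all X+1 agents.

    Toy assumption: each vector (identified by its unique marker) enters each
    cell independently with probability `uptake_prob`.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    selected = 0
    for _ in range(n_cells):
        taken_up = {m for m in markers if rng.random() < uptake_prob}
        if taken_up == set(markers):
            selected += 1
    return selected


markers = ["ampicillin", "chloramphenicol", "kanamycin"]  # X + 1 = 3 vectors
print(count_selected_cells(markers, n_cells=10_000, uptake_prob=0.3))
```

Under this independence assumption the expected fraction of selected cells is uptake_prob raised to the power X+1 (0.3^3 = 0.027 here), so in the model the selected population shrinks quickly as vectors are added.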
[0104] In some embodiments, the present invention provides methods
comprising; a) providing; i) X+1 vectors (e.g. circular or
linearized), wherein each of the vectors comprises; A) an identical
origin of replication (i.e. each of the X+1 vectors
comprises the same origin of replication), and B) at least one
selectable marker sequence unique among the X+1 vectors, ii)
X+1 insert sequence pools, and iii) host cells; and b) combining each of the insert
sequence pools with one of the X+1 vectors such that X+1
recombinant vector pools comprising recombinant vectors are
generated, and c) contacting the host cells with the X+1
recombinant vector pools (e.g. transforming the host cells with
each of the X+1 vector pools at approximately the same time) to
generate transformed host cells. In further embodiments, the
methods further comprise; providing iv) selective growth media, and
step d) treating the transformed host cells with the selective
media to select cells containing X+1 recombinant vectors.
[0105] In certain embodiments, the present invention provides
compositions, systems, and kits comprising a circular vector (e.g.
plasmid), wherein the circular vector comprises a barnase encoding
nucleic acid sequence, and wherein the circular vector does not
contain an operable barstar encoding nucleic acid sequence. In some
embodiments, the present invention provides cells comprising a
circular vector (e.g. plasmid), wherein the circular vector
comprises a barnase encoding nucleic acid sequence, and wherein the
circular vector does not contain an operable barstar encoding
nucleic acid sequence. In other embodiments, the present invention
provides cells comprising i) a first circular vector (e.g.
plasmid), wherein the first circular vector comprises a barnase
encoding nucleic acid sequence, and wherein the first circular
vector does not contain an operable barstar encoding nucleic acid
sequence, and ii) a second circular vector comprising a barstar
encoding nucleic acid sequence.
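Paragraph [0105] relies on barstar, the inhibitor of barnase, to keep host cells alive while a vector with an intact barnase sequence is propagated. As a simple truth table (a sketch; the function name is hypothetical and expression levels are ignored), a cell survives unless it carries an intact barnase sequence without barstar, so disrupting barnase with an insert also rescues the cell.

```python
def cell_survives(intact_barnase: bool, barstar_present: bool) -> bool:
    """Barnase kills the cell unless it is disrupted or inhibited by barstar."""
    return (not intact_barnase) or barstar_present


cases = {
    "barnase vector, no barstar": (True, False),
    "barnase vector + barstar vector": (True, True),
    "barnase disrupted by an insert": (False, False),
}
for label, (barnase, barstar) in cases.items():
    print(f"{label}: {'survives' if cell_survives(barnase, barstar) else 'dies'}")
```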
[0106] In certain embodiments, the present invention provides
methods comprising; a) providing; i) a plurality of circular
vectors (e.g. plasmids), wherein the circular vectors comprise a
barnase encoding nucleic acid sequence, and wherein the circular
vectors do not contain an operable barstar encoding nucleic acid
sequence, ii) host cells that do not contain a nucleic acid
sequence encoding barnase, and iii) a plurality of insert
sequences; b) combining the plurality of circular vectors and the
plurality of insert sequences such that a plurality of recombinant
vectors are generated, c) transforming the host cells with the
plurality of recombinant vectors to generate a plurality of
transformed cells, and d) plating the plurality of transformed
cells on selective media such that transformed cells containing
recombinant circular vectors with disrupted barnase encoding
nucleic acid sequences are identified.
[0107] In certain embodiments, the present invention provides
compositions comprising X+1 vector components configured for
cloning X+1 insert sequences with a false positive background of
less than 5%, less than 2%, or less than 1% (e.g. 0.5% false
positives). In certain embodiments, the present invention provides
compositions comprising a plurality of circular vectors configured
to yield at least 98% recombinant clones when grown on selective
media (e.g., approximately 99% or 99.5% or greater recombinant
clones), wherein at least a portion of the circular vectors
comprise at least two insert sequences. In some embodiments, the
present invention provides compositions comprising a vector
configured to clone at least one insert (e.g. one insert) without
transcription of the insert sequence when transformed into a host
cell. In other embodiments, the present invention provides
compositions comprising a vector configured to clone at least two
insert sequences without transcription of the insert sequences when
transformed into a host cell.
DESCRIPTION OF THE FIGURES
[0108] FIG. 1 shows a schematic diagram illustrating certain
differences between conventional single-fragment cloning vectors
and a multiplex vector of the present invention (e.g. with
dispersed restriction sites) capable of co-cloning independent
insert sequences (e.g. four independent insert sequences are shown
in this embodiment). The hash marks indicate restriction sites.
P1-P8 indicate primer-binding sites. Amp, ampicillin resistance
gene; Cam, chloramphenicol resistance gene; Kan, kanamycin
resistance gene; lacZα, alpha fragment of the lacZ gene; Ori,
origin of replication; SmaI, recognition site for SmaI restriction
endonuclease.
[0109] FIG. 2 shows a schematic diagram illustrating the
construction of a duplex cloning vector (pUC19Kan is shown) and a
duplex shotgun cloning library. The hash marks indicate restriction
sites.
[0110] FIG. 3 shows the construction of a duplex clone library
using two sources to supply the vector components.
[0111] FIG. 4 shows the construction of a triplex clone library
using PCR-amplified selectable markers from independent plasmid
vectors. Amp, ampicillin resistance gene; Cam, chloramphenicol
resistance gene; Tet, tetracycline resistance gene; lacZα,
alpha fragment of the lacZ gene; Ori, origin of replication; SmaI,
recognition site for SmaI restriction endonuclease.
[0112] FIG. 5 shows the construction of a quadraplex clone library
using two independent plasmid vectors.
[0113] FIG. 6 shows the construction of a pentaplex clone library
using two independent plasmid vectors.
[0114] FIG. 7 shows the construction of T7 gene 1.2-based direct
selection vectors.
[0115] FIG. 8 shows the construction of second generation direct
selection cloning vectors.
[0116] FIG. 9 shows the construction of conditional replication
plasmids.
[0117] FIG. 10 shows the structure of certain recombinant duplex
plasmid clones.
[0118] FIG. 11 shows the nucleic acid sequence for SEQ ID
NO:41.
[0119] FIG. 12A shows a schematic of a vector component (designated
pSMART).
[0120] FIG. 12B shows a schematic of two vector components
(together designated pLEXX-AK) configured to form a circular
plasmid upon combining with two insert sequences.
[0121] FIG. 13 shows the sequence of the primers (KanL1, SEQ ID
NO:114; KanR1, SEQ ID NO:115; AmpL1, SEQ ID NO:116; and AmpR1, SEQ
ID NO:117) configured for use with the vector components shown in
FIGS. 12A and 12B.
[0122] FIG. 14A shows the sequence (SEQ ID NO:85) of the vector
component shown in FIGS. 12A and 12B, and FIG. 14B shows the
sequence (SEQ ID NO:86) of the vector component shown in FIG.
12B.
[0123] FIG. 15 shows construction of a third generation type direct
selection vector.
[0124] FIG. 16A shows a schematic diagram illustrating one
embodiment of Fixed Orientation Multiplex Cloning in which vector
components may be assembled only in a defined orientation relative
to each other. Vector components AB, BC, and CA are ligated to
insert DNA fragments A, B, and C. The termini of the inserts,
labeled "a," "b," and "c," are compatible only with the termini
labeled "a'," "b'," and "c'," respectively, which are present on
the vector components.
[0125] FIG. 16B shows Fixed Orientation Multiplex Cloning as
described in Example 15. Vector components pATBAG and pKfBAG were
digested with BstXI to generate termini of AAAA-3' and GGGG-3' on
each component. Insert fragment pools #1 and #2 were ligated with
linkers to generate termini of CCCC-3' or TTTT-3',
respectively.
DEFINITIONS
[0126] To facilitate an understanding of the invention, a number of
terms are defined below.
[0127] As used herein, the term "nucleotide" refers to a monomeric
unit of nucleic acid (e.g. DNA or RNA) consisting of a sugar moiety
(pentose), a phosphate group, and a nitrogenous heterocyclic base.
The base is linked to the sugar moiety via the glycosidic carbon
(1' carbon of the pentose) and that combination of base and sugar
is called a nucleoside. When the nucleoside contains a phosphate
group bonded to the 3' or 5' position of the pentose, it is referred
to as a nucleotide. A sequence of operatively linked nucleotides is
typically referred to herein as a "base sequence" or "nucleotide
sequence" or "nucleic acid sequence," and is represented herein by
a formula whose left to right orientation is in the conventional
direction of 5'-terminus to 3'-terminus.
[0128] As used herein, the term "base pair" refers to the hydrogen
bonded nucleotides of, for example, adenine (A) with thymine (T),
or of cytosine (C) with guanine (G) in a double stranded DNA
molecule. In RNA, uracil (U) is substituted for thymine. The term
"base pair" is also used generally as a unit of measure for DNA
length. Base pairs are said to be "complementary" when their
component bases pair up normally by hydrogen bonding, such as when
a DNA or RNA molecule adopts a double stranded configuration.
[0129] As used herein, the terms "nucleic acid" and "nucleic acid
molecule" refer to any nucleic acid-containing molecule including,
but not limited to, DNA or RNA. The term encompasses sequences that
include any of the known base analogs of DNA and RNA including, but
not limited to, 4-acetylcytosine, 8-hydroxy-N6-methyladenosine,
aziridinylcytosine, pseudoisocytosine, 5-(carboxyhydroxylmethyl)
uracil, 5-fluorouracil, 5-bromouracil,
5-carboxymethylaminomethyl-2-thiouracil,
5-carboxymethylaminomethyluracil, dihydrouracil, inosine,
N6-isopentenyladenine, 1-methyladenine, 1-methylpseudouracil,
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine,
5-methylcytosine, N6-methyladenine, 7-methylguanine,
5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarbonylmethyluracil,
5-methoxyuracil, 2-methylthio-N6-isopentenyladenine,
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid,
oxybutoxosine, pseudouracil, queosine, 2-thiocytosine,
5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil,
N-uracil-5-oxyacetic acid methylester, and 2,6-diaminopurine.
[0130] DNA molecules are said to have "5' ends" and "3' ends"
because mononucleotides are joined to make oligonucleotides in a
manner such that the 5' phosphate of one mononucleotide pentose
ring is attached to the 3' oxygen of its neighbor in one direction
via a phosphodiester linkage. Therefore, an end of an
oligonucleotide is referred to as the "5' end" if its 5' phosphate
is not linked to the 3' oxygen of a mononucleotide pentose ring and
as the "3' end" if its 3' oxygen is not linked to a 5' phosphate of
a subsequent mononucleotide pentose ring. A double stranded nucleic
acid molecule may also be said to have a 5' and 3' end, wherein the
"5'" refers to the end containing the accepted beginning of the
particular region, gene, or structure. A nucleic acid sequence,
even if internal to a larger oligonucleotide, may also be said to
have 5' and 3' ends (these ends are not "free"). In such a case,
the 5' and 3' ends of the internal nucleic acid sequence refer to
the 5' and 3' ends that said fragment would have were it isolated
from the larger oligonucleotide. In either a linear or circular DNA
molecule, discrete elements may be referred to as being "upstream"
or 5' of the "downstream" or 3' elements. Ends are said to be
"compatible" if a) they are both blunt or contain complementary
single-strand extensions (such as those created by digestion with
a restriction endonuclease) and b) at least one of the ends
contains a 5' phosphate group. Compatible ends are therefore
capable of being ligated by a double stranded DNA ligase (e.g. T4
DNA ligase) under standard conditions.
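The two-part compatibility test above can be written as a short check. The sketch below is illustrative only. The `End` class, its field names and the `compatible` function are hypothetical. Each overhang is written 5' to 3', and two sticky ends count as complementary when they have the same overhang type (5' or 3') and one overhang is the reverse complement of the other.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class End:
    """Hypothetical model of one end of a double-stranded DNA molecule."""
    overhang: str = ""            # single-strand extension, 5'->3'; "" for a blunt end
    overhang_type: str = "blunt"  # "blunt", "5'" or "3'"
    phosphate_5: bool = True      # does this end carry a 5' phosphate group?


def compatible(x: End, y: End) -> bool:
    """Condition a): both ends blunt, or complementary extensions of the same type.
    Condition b): at least one of the two ends carries a 5' phosphate."""
    if x.overhang_type == "blunt" and y.overhang_type == "blunt":
        shapes_match = True
    else:
        shapes_match = (
            x.overhang_type == y.overhang_type
            and x.overhang != ""
            and x.overhang.upper() == reverse_complement(y.overhang)
        )
    return shapes_match and (x.phosphate_5 or y.phosphate_5)
```

Under these rules, two EcoRI ends (`End("AATT", "5'")`) are compatible unless both have been dephosphorylated. With the 3' extensions described for FIG. 16B, an AAAA-3' end pairs with a TTTT-3' end but not with a GGGG-3' end or with another AAAA-3' end.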
[0131] As used herein, the term "hybridization" or "annealing"
refers to the pairing of complementary nucleotide sequences
(strands of nucleic acid) to form a duplex, heteroduplex, or
complex containing more than two single-stranded nucleic acids, by
establishing hydrogen bonds between/among complementary base pairs.
Hybridization is a specific, i.e. non-random, interaction
between/among complementary polynucleotides that can be
competitively inhibited.
[0132] As used herein, the term "primer binding site" refers to the
complementary sequence of a vector or other nucleic acid sequence to
which an oligonucleotide primer can hybridize.
[0133] As used herein, the terms "insert sequence," "insert DNA,"
or "foreign DNA" refer to any nucleic acid sequences that are
capable of being placed in a vector. Examples include, but are not
limited to, random DNA libraries and known nucleic acid sequences.
A particular "insert sequence," "insert DNA," or "foreign DNA" may
refer to a pool or a member of a pool of identical nucleic acid
molecules, a pool or a member of a pool of non-identical nucleic
acid molecules, or a specific individual nucleic acid molecule.
[0134] As used herein, the term "circular vector" refers to a
closed circular nucleic acid sequence capable of replicating in a
host.
[0135] As used herein, the terms "vector" and "plasmid" are used in
reference to extra-chromosomal nucleic acid molecules capable of
replication in a cell and to which an insert sequence can be
operatively linked so as to bring about replication of the insert
sequence. Examples include, but are not limited to, circular DNA
molecules such as plasmid constructs, phage constructs, cosmid
vectors, etc., as well as linear nucleic acid constructs (e.g.,
lambda phage constructs, bacterial artificial chromosomes (BACs),
etc.). A vector may include expression signals such as a promoter
and/or a terminator, a selectable marker such as a gene conferring
resistance to an antibiotic, and one or more restriction sites into
which insert sequences can be cloned. Vectors can have other unique
features (such as the size of DNA insert they can accommodate).
[0136] As used herein, the term "bacterial artificial chromosome"
("BAC") refers to a linear vector designed to propagate large
insert sequences (e.g. approximately 50,000 to several hundred
thousand bases in length) in host bacteria.
[0137] As used herein, the term "origin of replication" refers to a
DNA sequence conferring functional replication capabilities in a
host cell. Examples include, but are not limited to, normal or
non-conditional origins of replication such as the ColE1 origin
and its derivatives, which are functional in a broad range of host
cells.
[0138] As used herein, the term "conditional origin of replication"
refers to an origin of replication that requires the presence of a
functional trans-acting factor (e.g., a replication factor) in a
prokaryotic host cell. Examples of conditional origins of
replication include, but are not limited to, plasmid/bacteriophage
fd hybrid replicons such as that in the plasmid pKf2, which
contains the fd origin of replication. Replication of this type of
plasmid requires the presence of the bacteriophage fd gene II
protein. In conjunction with the host strain BHB2600, which was
constructed to express the bacteriophage fd gene II protein, the fd
origin is capable of autonomous replication and propagation. In any
host lacking the trans-acting gene II protein, replication fails to
occur. As used herein, a "conditional replication vector" means a
vector that has a conditional origin of replication.
[0139] As used herein, the term "unique restriction enzyme site"
refers to the recognition sequence for a given restriction enzyme
that appears once within a nucleic acid molecule. For example, the
EcoRI site is a unique restriction enzyme site within the plasmid
pUC19.
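Whether a recognition sequence is unique can be checked by counting its occurrences on both strands. The sketch below is a simplified illustration, not a replacement for dedicated sequence-analysis software. `count_sites` is a hypothetical helper. It assumes the recognition sequence contains only A, C, G and T, with no degenerate bases such as N. A circular molecule is handled by letting the search wrap across the junction.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_sites(seq: str, site: str, circular: bool = True) -> int:
    """Count occurrences of a recognition sequence on either strand."""
    seq, site = seq.upper(), site.upper()
    if circular:
        # Append the first len(site) - 1 bases so a site spanning the
        # end/start junction of a circular molecule is still found.
        seq = seq + seq[: len(site) - 1]
    # A palindromic site equals its reverse complement, so the set collapses
    # to one pattern and the site is not counted twice.
    patterns = {site, reverse_complement(site)}
    # Zero-width lookahead matches allow overlapping occurrences.
    return sum(len(re.findall(f"(?={re.escape(p)})", seq)) for p in patterns)


def is_unique_site(seq: str, site: str, circular: bool = True) -> bool:
    return count_sites(seq, site, circular) == 1
```

For a plasmid, `is_unique_site(plasmid_sequence, "GAATTC")` would test the EcoRI site. Here `plasmid_sequence` stands for a sequence string the reader supplies.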
[0140] As used herein, the terms "polylinker" or "multiple cloning
site" refer to a cluster of restriction enzyme sites on a nucleic
acid construct that are used for the insertion and/or
excision of nucleic acid sequences.
[0141] As used herein, the term "host cell" refers to any cell that
can be transformed with heterologous DNA (such as a vector).
Examples of host cells include, but are not limited to, E. coli
strains that contain the F or F' factor (e.g., DH5αF or
DH5αF') or E. coli strains that lack the F or F' factor (e.g.
DH10B).
[0142] As used herein, the term "direct selection vector" refers to
a cloning vector that carries within it a toxic gene sequence whose
effect can be suppressed (e.g. by insertion of a DNA fragment into
a cloning site in the toxic gene, thereby inactivating the toxic
activity of the toxic gene). When lacking a DNA insert in its
cloning site, such a direct selection vector is generally lethal to
a host bacterial strain. A direct selection vector containing a DNA
insert within its cloning site is generally not lethal to a host
bacterial strain.
[0143] The terms "nucleic acid molecule encoding," "DNA sequence
encoding," and "DNA encoding" refer to a sequence of nucleotides,
which upon transcription into RNA and subsequent translation into
protein, would lead to the synthesis of a given peptide. Such
transcription and translation may actually occur in vitro or in
vivo, or it may be strictly theoretical, based on the standard
genetic code.
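As an illustration of the "theoretical" case, the sketch below reads a coding sequence codon by codon using the standard genetic code. The `translate` function is hypothetical. It assumes a DNA coding strand read 5' to 3' that starts in frame, and it stops at the first stop codon (shown as `*` in the table). A codon containing any base other than A, C, G or T raises a `KeyError`.

```python
BASES = "TCAG"
# Standard genetic code in the conventional compact order: the codon
# (BASES[i], BASES[j], BASES[k]) maps to AMINO_ACIDS[16 * i + 4 * j + k].
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_seq: str) -> str:
    """Translate an in-frame DNA coding sequence up to the first stop codon."""
    seq = coding_seq.upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate("ATGGCCTAA")` returns `"MA"` (Met, Ala, then stop).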
[0144] The term "gene" refers to a nucleic acid (e.g., DNA)
sequence that comprises coding sequences necessary for the
production of a polypeptide or precursor. The polypeptide can be
encoded by a full length coding sequence or by any portion of the
coding sequence so long as the desired activity or functional
properties (e.g., enzymatic activity, ligand binding, signal
transduction, etc.) of the full-length or fragment are retained.
The term also encompasses the coding region of a structural gene
and the sequences located adjacent to the coding region on both the
5' and 3' ends for a distance of about 1 kb or more on either end,
such that the gene is capable of being transcribed into a
full-length mRNA. The sequences which are located 5' of the coding
region and which are present on the mRNA are referred to as 5'
non-translated sequences. The sequences which are located 3' or
downstream of the coding region and which are present on the mRNA
are referred to as 3' non-translated sequences. The term "gene"
encompasses both cDNA and genomic forms of a gene. A genomic form
or clone of a gene contains the coding region interrupted with
non-coding sequences termed "introns" or "intervening regions" or
"intervening sequences." Introns are segments of a gene which are
transcribed into heterogeneous nuclear RNA (hnRNA); introns may contain
regulatory elements such as enhancers. Introns are removed or
"spliced out" from the nuclear or primary transcript; introns
therefore are absent in the messenger RNA (mRNA) transcript. The
mRNA functions during translation to specify the sequence or order
of amino acids in a nascent polypeptide.
[0145] The term "expression" as used herein is intended to mean the
transcription (e.g. from a gene) and, in some cases, translation to
gene product. In the process of expression, a DNA chain coding for
the sequence of gene product is first transcribed to a
complementary RNA, which is often a messenger RNA, and, in some
cases, the transcribed messenger RNA is then translated into the
gene protein product.
[0146] The term "prokaryotic termination sequence,"
"transcriptional terminator," or "terminator" refers to a nucleic
acid sequence, recognized by an RNA polymerase, that results in the
termination of transcription. Prokaryotic termination sequences
commonly comprise a GC-rich region that has a twofold symmetry
followed by an AT-rich sequence. A commonly used prokaryotic
termination sequence is the T7 termination sequence. A variety of
termination sequences are known in the art and may be employed in
the nucleic acid constructs of the present invention, including the
T_INT, T_L1, T_L2, T_L3, T_R1, T_R2, and
T_6S termination signals derived from bacteriophage lambda,
and termination signals derived from bacterial genes such as the
trp gene of E. coli.
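The structural description above can be turned into a deliberately crude pattern check: a GC-rich stem, a short loop, the stem's reverse complement, then a run of T residues. This is not a validated terminator predictor; practical tools also weigh hairpin free energy. In the hypothetical `looks_like_terminator` helper, the stem length, loop range, GC threshold and T-run length are all illustrative assumptions. The AT-rich tail is modelled as a T run on the sense strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def looks_like_terminator(seq: str, stem: int = 6, min_loop: int = 3,
                          max_loop: int = 8, t_run: int = 4,
                          min_gc: float = 0.5) -> bool:
    """Heuristic: GC-rich inverted repeat (twofold symmetry) followed by T's."""
    seq = seq.upper()
    for i in range(len(seq) - 2 * stem - min_loop - t_run + 1):
        left_arm = seq[i:i + stem]
        gc = (left_arm.count("G") + left_arm.count("C")) / stem
        if gc < min_gc:
            continue  # the stem of a terminator hairpin is GC-rich
        right_arm = reverse_complement(left_arm)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            # Right arm pairs with the left arm; a T-run must follow it.
            if seq[j:j + stem] == right_arm and seq[j + stem:j + stem + t_run] == "T" * t_run:
                return True
    return False
```

For example, `looks_like_terminator("GCCGCGAAAACGCGGCTTTT")` finds a six-base GC stem, a four-base loop and a TTTT tail.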
[0147] As used herein, the terms "selectable marker," "selectable
marker sequence," or "selectable marker gene" refer to a gene, or
other DNA fragment, which encodes or provides an activity that
confers the ability to grow or survive in what would otherwise be a
deleterious environment. For example, a selectable marker may
confer resistance to an antibiotic or drug upon the cell in which
the selectable marker is expressed. An origin of replication (Ori)
may also be used as a selectable marker enabling propagation of a
plasmid vector.
[0148] As used herein, the phrase "selectable marker region," in
reference to vector sequence, refers to the portion of a vector
component that contains all of the selectable marker sequences
present on a particular vector component. In other words, the outermost
ends of the selectable marker sequences present define the selectable marker
region. For example, if a particular vector component only had one
selectable marker sequence, the selectable marker region would be
defined by the beginning of the selectable marker sequence and the
end of the selectable marker sequence (see FIG. 16B, where the
arrow tip of the Kan sequence is one end of the selectable marker
region, and the other flat (non-arrow) end of the Kan sequence is
the other end of the selectable marker region). If a particular
vector component had, for example, two selectable marker sequences,
the selectable marker region is the nucleic acid sequence between
the beginning of the first selectable marker sequence and the end
of the second selectable marker sequence (see FIG. 16B, where the
arrow tip of the Ori sequence is one end of the selectable marker
region, and the flat (non-arrow) end of the Amp sequence is the
other end of the selectable marker region for this particular
vector component).
[0149] As used herein, the phrase "at least one selectable marker
sequence unique among said X+1 vector components," when used to
describe what a particular vector component contains, indicates
that a particular vector component, out of the total X+1 vector
components, contains at least one selectable marker sequence that
is not present on any of the other vector components (i.e. not
present on any of the other X vector components). Likewise, as used
herein, the phrase "two identical free ends that are unique among
said X+1 insert sequences" when used to describe the identical ends
of a particular insert sequence, indicates that a particular insert
sequence, out of the total X+1 insert sequences, has identical ends
that are not present on any of the other insert sequences (i.e. not
present on any of the other X insert sequences).
[0150] As used herein, the term "unique selectable marker sequence"
refers to a selectable marker that is present only on one of the
vector components that are combined to form a circular vector (e.g.
when a circular vector is formed having X+1 insert sequences and
X+1 vector components, each of the vector components has at least
one selectable marker that is not found on the other vector
components making up the circular vector).
[0151] As used herein, the phrase "two different free ends that are
non-unique among said X+1 vector components" when used to describe
the different ends of a particular vector component, indicates that
each of the two different free ends of a particular vector
component, out of the total X+1 vector components, is identical,
or nearly identical (e.g. differs by one or two bases), to at least
one of the ends of another vector component. For example, FIG. 16A
shows vector component "AB" that has a free end "b'". This b' free
end is non-unique because it is the same as one of the ends on the
vector component "BC". In preferred embodiments, each of the free
ends of the vector components is only the same as the free end of
one other vector component (e.g. b' in FIG. 16A only appears
twice).
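The end bookkeeping described above, with each vector-component end shared by exactly two components, can be checked by walking around the circle that the components and inserts would form. In the sketch below, `assemble_ring` and its label conventions are hypothetical. Ends are symbolic labels rather than sequences. An insert whose two identical ends are labeled "b" is assumed to ligate only to vector-component ends labeled "b'", as in FIG. 16A.

```python
from collections import Counter


def assemble_ring(components: dict[str, tuple[str, str]],
                  inserts: dict[str, str]) -> list[str]:
    """Return the circular order of vector components and inserts.

    components: component name -> its two free-end labels, e.g. ("a'", "b'").
    inserts: insert name -> label of its two identical ends, e.g. "a".
    """
    counts = Counter(label for ends in components.values() for label in ends)
    # Preferred embodiment: every component end occurs on exactly two components.
    if any(n != 2 for n in counts.values()):
        raise ValueError("each vector-component end must occur on exactly two components")
    # An insert with ends "a" joins vector-component ends labeled "a'".
    insert_for = {label + "'": name for name, label in inserts.items()}

    start = next(iter(components))
    entry = components[start][0]  # treat the first listed end as the entry end
    order, visited, current = [], set(), start
    while current not in visited:
        visited.add(current)
        order.append(current)
        left, right = components[current]
        exit_end = right if left == entry else left
        if exit_end not in insert_for:
            raise ValueError(f"no insert has ends compatible with {exit_end}")
        order.append(insert_for[exit_end])
        # The next component is the other one carrying the same end label.
        current = next(name for name, ends in components.items()
                       if exit_end in ends and name != current)
        entry = exit_end
    if current != start or len(visited) != len(components):
        raise ValueError("components do not close into a single circle")
    return order
```

With the FIG. 16A components, `assemble_ring({"AB": ("a'", "b'"), "BC": ("b'", "c'"), "CA": ("c'", "a'")}, {"A": "a", "B": "b", "C": "c"})` returns `["AB", "B", "BC", "C", "CA", "A"]`. That is a single circle in which each vector component is separated from the next by one insert.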
[0152] As used herein, the term "replicable vector" means a vector
that is capable of replicating in a host cell.
[0153] The term "expression vector" as used herein refers to a
recombinant DNA molecule containing a desired coding sequence and
appropriate nucleic acid sequences necessary for expression of the
operably linked coding sequence (e.g. insert sequence that codes
for a product) in a particular host organism. Nucleic acid
sequences necessary for expression in prokaryotes usually include a
promoter, an operator (optional), and a ribosome binding site,
often along with other sequences.
[0154] As used herein, the terms "restriction endonucleases" and
"restriction enzymes" refer to enzymes (e.g. bacterial), each of
which cuts double-stranded DNA at or near a specific nucleotide
sequence. Examples include, but are not limited to, AvaII, BamHI,
EcoRI, HindIII, HincII, NcoI, SmaI, and RsaI.
[0155] As used herein, the term "restriction" refers to cleavage of
DNA by a restriction enzyme at its restriction site.
[0156] As used herein, the term "restriction site" refers to a
particular DNA sequence recognized by its cognate restriction
endonuclease.
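Restriction at a site can be sketched as cutting a sequence at a fixed offset within each occurrence of the recognition sequence. The `digest_linear` helper below is hypothetical and simplified. It reports only top-strand fragments of a linear molecule, so the single-stranded overhangs produced by staggered cuts are not represented. The user must supply `cut_offset`, the cut position counted from the start of the site (for example 1 for EcoRI, G^AATTC, or 3 for the blunt-cutting SmaI, CCC^GGG).

```python
import re


def digest_linear(seq: str, site: str, cut_offset: int) -> list[str]:
    """Top-strand fragments of a linear sequence cut within every site."""
    seq, site = seq.upper(), site.upper()
    # Lookahead finds every occurrence, including overlapping ones.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    fragments, previous = [], 0
    for cut in cuts:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])  # remainder after the last cut
    return fragments
```

For example, `digest_linear("AAGAATTCTT", "GAATTC", 1)` gives `["AAG", "AATTCTT"]`, and a sequence without the site comes back as a single fragment.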
[0157] As used herein, the term "purified" or "to purify" refers to
the removal of contaminants from a sample. For example, plasmids
are grown in bacterial host cells and the plasmids are purified by
the removal of host cell proteins, bacterial genomic DNA, and other
contaminants, thereby increasing the percentage of plasmid DNA
in the sample. In the case of nucleic acid sequences, "purify"
refers to isolation of the individual nucleic acid sequences from
each other.
[0158] As used herein, the terms "sequencing" or "DNA sequence
analysis" refer to the process of determining the linear order of
nucleotide bases in a nucleic acid sequence (e.g. insert sequence)
or clone. These units are the C, T, A, and G bases. Generally, to
sequence a section of DNA, the sequence of a short flanking region,
i.e., a primer binding site, must be known. One method for
sequencing is called dideoxy sequencing. One example for performing
dideoxy sequencing uses the following reagents: 1) the DNA that
will be used as a template (e.g. insert sequence), 2) a primer that
corresponds to a known sequence that flanks the unknown sequence,
3) DNA nucleotides, to synthesize and elongate a new DNA strand, 4)
dideoxynucleotides that mimic the G, A, T and C building blocks to
incorporate into DNA, but that prevent chain elongation, thus
acting as termination bases for a DNA polymerase (the four
different dideoxynucleotides also may be labeled with different
fluorescent dyes for automated DNA sequence analysis), and 5) a
nucleic acid polymerizing agent (e.g., DNA polymerase or Taq
polymerase, which are enzymes that catalyze synthesis of a DNA
strand from another DNA template strand). When these reagents are
mixed, the primer aligns with and binds the template at the primer
binding site. The polymerizing agent then starts DNA elongation by
adding the nucleotide building blocks to the 3' end of the primer.
Randomly, a dideoxynucleotide will integrate into a growing chain.
When this happens chain elongation stops, and if the
dideoxynucleotide is fluorescently labeled, the label will also
be attached to the newly generated DNA strand. Multiple strands are
generated from each template, each strand terminating at a
different base of the template. Thus, a population is produced with
strands of different sizes and different fluorescent labels,
depending on the terminal dideoxynucleotide incorporated as the
final base. This entire mix may, for example, be loaded onto a DNA
sequencing instrument that separates DNA strands based on size and
simultaneously uses a laser to detect the fluorescent label on each
strand, beginning with the shortest. The sequence of the
fluorescent labels, read from the shortest fragment to the longest
fragment, corresponds to the sequence of the template. The reading
may be done automatically, and the sequence may be captured and
analyzed using appropriate software. The term "shotgun cloning"
refers to the multi-step process of randomly fragmenting target DNA
into smaller pieces and cloning them en masse into plasmid or phage
vectors.
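The size-ordering step of dideoxy sequencing described above can be illustrated with an idealized simulation, in which the reaction yields exactly one terminated strand for every position past the primer. The `dideoxy_read` function is hypothetical. The template is written 5' to 3', and the primer is assumed to anneal perfectly at the template's 3' end. The returned read is the newly synthesized strand, that is, the complement of the template region. This is the sense in which the order of labels "corresponds to" the template.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def dideoxy_read(template: str, primer: str, seed: int = 0) -> str:
    """Idealized chain-termination read from a template written 5'->3'."""
    template, primer = template.upper(), primer.upper()
    # The new strand grows 5'->3', antiparallel and complementary to the
    # template, so if fully extended it is the template's reverse complement.
    full_new_strand = reverse_complement(template)
    if not full_new_strand.startswith(primer):
        raise ValueError("primer does not anneal at the 3' end of the template")
    # One chain terminated at every position past the primer, each carrying
    # the label of the dideoxynucleotide incorporated last.
    fragments = [
        (length, full_new_strand[length - 1])
        for length in range(len(primer) + 1, len(full_new_strand) + 1)
    ]
    random.Random(seed).shuffle(fragments)  # the reaction mixture is unordered
    fragments.sort()  # size separation: shortest fragment first
    return "".join(label for _, label in fragments)
```

For example, `dideoxy_read("CATGCAAAGG", "CCTT")` returns `"TGCATG"`, the reverse complement of the template region `CATGCA` lying 5' of the primer-binding site.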
[0159] The term "shotgun sequencing" refers to sequencing the
nucleic acid templates produced in a shotgun cloning reaction.
[0160] As used herein, the term "to clone" when used in reference
to an insert sequence and vector means ligation of the insert
sequence into a vector capable of replicating in a host. The term
"to clone" when used in reference to an insert sequence, a vector,
and a host cell refers generally to making copies of a given insert
sequence. In this regard, to clone a piece of DNA (e.g., insert
sequence), one would insert it into a vector (e.g., a plasmid)
which may then be put into a host (usually a bacterium) so that the
plasmid and insert replicate with the host. An individual bacterium
is grown until visible as a single colony on nutrient media, the
colony is picked and grown in liquid culture, and the plasmid
containing the "cloned" DNA is re-isolated from the bacteria, at
which point there will be many millions of copies of the DNA. The
term "clone" can also refer either to a bacterium carrying a cloned
DNA, or to the cloned DNA itself.
[0161] As used herein, the terms "clone bank" or "library" refer
to a collection of insert sequences residing in transfected cells,
each of which contains a single insert sequence from a cosmid, BAC,
virus, genome, or other source, sub-cloned into a vector.
[0162] The term "electrophoresis" refers to the use of electrical
fields to separate charged biomolecules such as DNA, RNA, and
proteins. DNA and RNA carry a net negative charge because of the
numerous phosphate groups in their structure. Proteins carry a
charge that changes with pH, but becomes negative in the presence
of certain chemical detergents. In the process of "gel
electrophoresis," biomolecules are put into wells of a solid matrix
typically made of an inert porous substance such as agarose. When
this gel is placed in a buffer bath and a voltage is applied
across the gel, the biomolecules migrate and separate according to
their size and charge; because DNA and RNA carry a roughly constant
charge per unit length, nucleic acids separate mainly by size. The
biomolecules can be stained for viewing and isolated and purified
from the gels for further analysis. Electrophoresis can be used to
isolate pure biomolecules from a mixture or to analyze biomolecules
(such as for DNA sequencing).
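DNA fragment size is commonly estimated from migration distance by running a size standard (ladder) alongside the sample. This uses the empirical approximation that log10(size) falls roughly linearly with distance over the gel's resolving range. The sketch below fits that line by least squares. `fit_ladder` and `estimate_size` are hypothetical names. The linear approximation is not expected to hold for fragments outside the gel's resolving range.

```python
import math


def fit_ladder(distances, sizes_bp):
    """Least-squares fit of log10(size) = intercept + slope * distance."""
    if len(distances) != len(sizes_bp) or len(distances) < 2:
        raise ValueError("need at least two ladder bands with known sizes")
    logs = [math.log10(s) for s in sizes_bp]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance, slope, intercept):
    """Estimated fragment size (bp) for a band at the given distance."""
    return 10 ** (intercept + slope * distance)
```

Distances may be in any consistent unit, for example millimetres from the well. A band midway (on the log scale) between 1,000 bp and 100 bp ladder bands is estimated at roughly 316 bp.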
[0163] As used herein, the term "PCR" refers to the polymerase
chain reaction method of enzymatically amplifying a region of DNA.
This exponential amplification procedure is based on repeated
cycles of denaturation, oligonucleotide primer annealing, and
primer extension by a DNA polymerizing agent such as a thermostable
DNA polymerase (e.g. the Taq or Tfl DNA polymerase enzymes isolated
from Thermus aquaticus or Thermus flavus, respectively).
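The exponential nature of PCR can be made concrete with the idealized relation N = N0 (1 + E)^n. Here N0 is the starting copy number, n the number of cycles, and E the per-cycle efficiency (E = 1 means perfect doubling). This is a textbook simplification rather than a model of any particular reaction. It ignores the plateau reached once primers, nucleotides or polymerase become limiting. `pcr_copies` is a hypothetical helper.

```python
def pcr_copies(start_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a number of PCR cycles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    # Each cycle multiplies the number of target copies by (1 + efficiency).
    return start_copies * (1.0 + efficiency) ** cycles
```

For example, 1,000 starting copies amplified for 10 perfectly efficient cycles give 1,000 × 2^10 = 1,024,000 copies.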
[0164] As used herein, the term "dispersed restriction site cloning
vector," refers to a vector (e.g. plasmid), or a collection of DNA
fragments that may be assembled into a vector, with two or more
restriction endonuclease sites intentionally dispersed throughout
the sequence of the plasmid so as to be useful for the ligation of
multiple independent DNA fragments. A dispersed restriction site
cloning vector may exist only as a collection of its individual
parts, i.e., the sum of the parts, alone, may in fact not be
capable of being maintained as a single entity.
[0165] The term "multiplex cloning vector," refers to a vector, or
a collection of DNA fragments that may be assembled into a vector,
capable of co-cloning more than one independent DNA fragment at
more than one cloning site (e.g. restriction site). In preferred
embodiments, a multiplex cloning vector is intentionally designed
with selectable markers flanked by restriction sites useful for
releasing the selectable marker as a functionally intact unit after
endonuclease digestion. This design facilitates the selection
process for achieving multiple independent DNA fragment inserts at
multiple independent insertion sites. A multiplex cloning vector
may exist as a collection of its individual parts, i.e., the sum of
the parts, alone, may in fact not be capable of being maintained as
a single entity.
[0166] As used herein, the terms "complementary" or
"complementarity" are used in reference to polynucleotides (i.e., a
sequence of nucleotides) related by the base-pairing rules. For
example, the sequence "5'-A-G-T-3'" is complementary to the
sequence "3'-T-C-A-5'." Complementarity may be "partial," in which
only some of the nucleic acids' bases are matched according to the
base pairing rules. Or, there may be "complete" or "total"
complementarity between the nucleic acids. The degree of
complementarity between nucleic acid strands has significant
effects on the efficiency and strength of hybridization between
nucleic acid strands. This is of particular importance in
amplification reactions, as well as detection methods which depend
upon binding between nucleic acids.
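The degree of complementarity can be expressed as the fraction of positions that obey the base-pairing rules when two strands are aligned antiparallel. In the hypothetical `fraction_complementary` helper below, the first strand is written 5' to 3' and the second 3' to 5', as in the example above, so paired bases line up position by position. Only the DNA pairs A-T and G-C are scored.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def fraction_complementary(top_5_to_3: str, bottom_3_to_5: str) -> float:
    """Fraction of aligned positions forming Watson-Crick base pairs."""
    if not top_5_to_3 or len(top_5_to_3) != len(bottom_3_to_5):
        raise ValueError("strands must be non-empty and of equal length")
    matches = sum((a, b) in PAIRS
                  for a, b in zip(top_5_to_3.upper(), bottom_3_to_5.upper()))
    return matches / len(top_5_to_3)
```

A value of 1.0 corresponds to "complete" complementarity. For example, `fraction_complementary("AGT", "TCA")` is 1.0, while `fraction_complementary("AGT", "TCT")` is 2/3, a case of "partial" complementarity.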
[0167] As used herein, the term "oligonucleotide," refers to a
short length of single-stranded polynucleotide chain.
Oligonucleotides are typically less than 100 residues long (e.g.,
between 15 and 50); however, as used herein, the term is also
intended to encompass longer polynucleotide chains.
Oligonucleotides are often referred to by their length. For example,
a 24-residue oligonucleotide is referred to as a "24-mer".
Oligonucleotides can form secondary and tertiary structures by
self-hybridizing or by hybridizing to other polynucleotides. Such
structures can include, but are not limited to, duplexes, hairpins,
cruciforms, bends, and triplexes.
[0168] As used herein, the term "primer" refers to an
oligonucleotide, whether occurring naturally as in a purified
restriction digest or produced synthetically, which is capable of
acting as a point of initiation of nucleic acid synthesis when
placed under conditions in which synthesis of a primer extension
product which is complementary to a nucleic acid strand is induced,
(i.e., in the presence of nucleotides and an inducing agent such as
DNA polymerase and at a suitable temperature and pH). The primer is
preferably single stranded for maximum efficiency in amplification,
but may alternatively be double stranded. If double stranded, the
primer is first treated to separate its strands before being used
to prepare extension products. Preferably, the primer is an
oligodeoxyribonucleotide. The primer must be sufficiently long to
prime the synthesis of extension products in the presence of the
inducing agent. The exact lengths of the primers will depend on
many factors, including temperature, source of primer and the use
of the method.
[0169] As used herein, the term "target," in regard to PCR, refers
to the region of nucleic acid bounded by the primers. The "target"
is thus the sequence to be selectively amplified from among other
nucleic acid sequences. A "segment" is defined as a region of nucleic acid
within the target sequence.
[0170] As used herein, the terms "PCR product," "PCR fragment," and
"amplification product" refer to the resultant mixture of compounds
after two or more cycles of the PCR steps of denaturation,
annealing and extension are complete. These terms encompass the
case where there has been amplification of one or more segments of
one or more target sequences.
[0171] The term "transformation" or "transfection" as used herein
refers to the introduction of foreign DNA into cells (e.g.
prokaryotic cells). Transformation may be accomplished by a variety
of means known in the art, including calcium phosphate-DNA
co-precipitation, DEAE-dextran-mediated transfection,
polybrene-mediated transfection, electroporation, microinjection,
liposome fusion, lipofection, protoplast fusion, retroviral
infection, and biolistics.
[0172] As used herein, the term "vector component" refers to any
linear nucleic acid sequence that is capable of forming part of a
circular vector when combined with at least one other vector
component (e.g. in the presence of multiple insert sequences), or
when combined with at least one insert sequence (e.g. SEQ ID
NO:85). In preferred embodiments, a vector component comprises at
least one selectable marker sequence or other features (e.g. sticky
ends for the formation of a circular vector when combined with other
vector components or insert fragments, primer binding sites,
transcriptional terminators, etc), which contribute to the
maintenance or function of the resulting circular vector.
[0173] As used herein, the term "comprising free ends" or "having
free ends" in reference to a double stranded nucleic acid molecule
having blunt ends, indicates that the nucleic acid molecule is
linear (the ends are not bound to additional nucleotides), with
each "free end" being the position occupied by the terminal 5' and
3' bases of the nucleic acid molecule that are hybridized to each
other. A linear, double stranded, blunt ended nucleic acid molecule
will have two "free ends" (referred to as "blunt free ends"). As
used herein, the term "free ends" in reference to a double stranded
nucleic acid molecule having overhang (sticky) ends, indicates that
the nucleic acid molecule is linear (the ends are not bound to
additional nucleotides), with each "free end" being the positions
occupied by the single stranded (overhang) region. A linear, double
stranded, sticky-ended nucleic acid molecule will have two "free
ends" (referred to as "sticky free ends"). A double stranded,
linear nucleic acid molecule may also have one "blunt free end" and
one "sticky free end". Further, when a vector component or insert
sequence is said to have "free ends," this indicates, for double
stranded vector components and insert sequences, that the molecule
is linear and that the terminal 3' base and the terminal 5' base
at each end of the molecule are not bound to other
oligonucleotides.
[0174] As used herein, the term "source nucleic acid molecule"
refers to any nucleic acid sequence, either linear or closed
circular, that is capable of supplying at least one vector
component. For example, a source nucleic acid molecule may itself
be a vector component, or may become a vector component upon
digestion with restriction enzymes, or may serve as a target
sequence such that a portion of the source nucleic acid molecule
may be subjected to PCR to create a vector component. As used herein,
the term "separate" in reference to at least two source nucleic
acid molecules indicates that the at least two source nucleic acid
molecules are not physically linked (e.g. ligated) together, and do
not have the same nucleic acid sequence. In other words, the at
least two source nucleic acid molecules are separate molecules that
have different nucleic acid sequences.
[0175] As used herein, the phrase "wherein said vector components
are non-contiguous within said circular vector" refers to the
arrangement of vector components within a circular vector such that
there is at least one insert sequence between the ends of each
vector component present on the circular vector, such that no end
of a vector component is joined (e.g. ligated) to an end of a
vector component.
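The non-contiguity condition can be checked mechanically. The sketch below is a minimal illustration, not part of the original disclosure: it assumes a circular vector is represented as an ordered list of hypothetical element labels, "V" for a vector component and "I" for an insert sequence, with the last element joined back to the first.

```python
from typing import Sequence


def is_noncontiguous(circle: Sequence[str]) -> bool:
    """Return True if no vector component ("V") end is joined to another "V" end.

    The list is read in order and closed by joining the last element to the
    first, so a lone "V" (a component ligated to itself) counts as contiguous.
    """
    if not circle:
        raise ValueError("a circular vector needs at least one element")
    n = len(circle)
    for i in range(n):
        # Junction between element i and the next element around the circle.
        if circle[i] == "V" and circle[(i + 1) % n] == "V":
            return False
    return True
```

Under these assumptions, any alternating arrangement of vector components and insert sequences, such as `["V", "I", "V", "I"]`, passes the check, while `["V", "I", "I", "V"]` fails because the last and first components meet when the circle closes.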
[0176] As used herein, the term "selective growth media" refers to
growth media, used to grow cells, that have been supplemented with
one or more selective agents (antibiotics).
[0177] As used herein, the term "non-promoter sequence" refers to
any nucleic acid sequence that is unable to serve as an operable
promoter element for initiating transcription in a given host cell,
such as a bacterial host cell, or a eukaryotic host cell. In
preferred embodiments, the host cell in which the non-promoter
sequence is unable to serve as an operable promoter is an E. coli
host cell.
[0178] As used herein, the phrase "wherein said identifying is at
least 95% accurate" refers to the visual, chemical, mechanical, or
biological identification of cells (or colonies of cells) as
containing the desired number of insert sequences, wherein this
identification is correct at least 95% of the time (i.e. no more
than 5% of the cells or colonies identified as containing the
desired number of insert sequences are false positives). The "95%"
in this phrase may be replaced by other numbers (e.g. 80%, 90%, 98%,
99%, etc.) to indicate the percent correct.
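The accuracy definition above reduces to a simple ratio: correct identifications divided by all identifications. The following is a minimal sketch under the assumption that the number of correctly identified and falsely identified cells or colonies has already been determined (e.g. by restriction analysis or sequencing); the function names are hypothetical.

```python
def identification_accuracy(correct: int, false_positives: int) -> float:
    """Percent of identified cells/colonies that truly carry the desired inserts."""
    identified = correct + false_positives
    if identified == 0:
        raise ValueError("no cells or colonies were identified")
    return 100.0 * correct / identified


def meets_accuracy(correct: int, false_positives: int, threshold: float = 95.0) -> bool:
    """True if the identification is at least `threshold` percent accurate."""
    return identification_accuracy(correct, false_positives) >= threshold
```

For example, 96 correct calls and 4 false positives give 96% accuracy, which satisfies the 95% phrase; the `threshold` argument stands in for the substitute values (80%, 90%, 98%, 99%) mentioned in the definition.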
[0179] As used herein, the term "multiplex sequencing reagents"
includes, but is not limited to, appropriate primers, DNA
nucleotides, dideoxynucleotides, and a DNA polymerizing agent (e.g.
Taq polymerase). In some embodiments, the four different
dideoxynucleotides are labeled with different fluorescent dyes
(e.g., for automated DNA sequence analysis).
[0180] As used herein, the phrase "wherein each of said vector
components is flanked by cloning sites" means that, when the vector
(e.g. circular vector) is formed, each of the vector components
incorporated into it has a cloning site at each of its two ends
(e.g. the cloning sites may be part of the vector component itself,
or may be formed only when the vector component is joined to another
vector component).
[0181] As used herein, the symbol "X" denotes a positive integer
(i.e. an integer greater than or equal to one).
DESCRIPTION OF THE INVENTION
[0182] The present invention relates to systems, methods, and
compositions for cloning and sequencing insert nucleic acid
sequences. In particular, the present invention provides vectors
and vector components configured for multiplex cloning and
multiplex sequencing. The present invention also provides vectors
and vector components configured to reduce transcription of insert
sequences.
[0183] In some embodiments, the present invention provides systems,
methods, and compositions for cloning multiple insert sequences in
a single vector. In particular embodiments, this vector is formed
from at least two vector components containing selectable marker
sequences and at least two insert DNA sequences. The formation of
this vector may occur, for example, in a single ligation reaction
(e.g. the two vector components and insert sequences, all separate,
are joined together in a single ligation reaction). In some
embodiments, the compositions of the present invention permit
multiplex sequencing (e.g. from a single vector constructed from at
least two vector components and at least two insert sequences). In
preferred embodiments, the vectors of the present invention are
formed from at least two separate source nucleic acid molecules
(e.g. neither of which contains all of the selectable markers
present in the final vector that is formed).
[0184] The present invention provides systems that facilitate
multiplex DNA cloning and sequencing. In these systems, multiple
DNA fragments are simultaneously and independently cloned into
dispersed sites of a cloning vector, and in some embodiments, the
fragments are subsequently sequenced simultaneously in a single DNA
sequencing reaction. This multiplex cloning system encompasses a
very low-background direct selection vector and requisite exogenous
selectable DNA fragments, as well as enzymatic and physical
processing of said vector and selectable fragments. This invention
further provides prokaryotic host cells, cell lines, and methods for
preparing these cells for transformation by the cloning vectors
of the present invention. The present invention also provides
methods of simultaneously sequencing multiple cloned DNA
fragments.
[0185] The present invention provides systems and methods for
multiplex DNA cloning and sequencing. For example, the systems and
methods of the present invention allow multiple insert sequences to
be simultaneously and independently cloned into dispersed sites of
a cloning vector, and allow the insert sequences to be
subsequently sequenced simultaneously in a single DNA sequencing
reaction. This multiplex cloning system, in some embodiments, has a
very low background signal. The invention also provides multiplex
cloning vector preparations with a background of less than 0.5%
empty vector. The invention further provides methods for making
multiplex cloning vector preparations with at least 99.5%
recombinant insertion frequency at each restriction site and a
background of less than 0.5% empty vector.
[0186] Commonly used cloning vectors are designed with one or more
restriction sites clustered in one small region of the vector (FIG.
1A). This polylinker design usually limits the number of DNA
fragment insertion sites to one per vector. As described above, it
is possible to clone more than one fragment into the same site, but
no advantage is gained for sequencing purposes, as there are only
two flanking primer extension sites and sequence reads are
generally limited to approximately 500-700 bases.
[0187] Nearly all of the commonly used vectors are designed such
that the restriction sites suitable for cloning are located within
the reading frame of a selectable marker or indicator gene. The
polylinker engineered into the lacZα gene, or any of numerous
other endogenous or engineered restriction sites in any antibiotic
resistance gene, exemplify the predominant cloning strategy in use
today.
[0188] The central dogma of cloning as it is practiced today is
that one plasmid vector can propagate one DNA fragment for genomic
sequence analysis. Because of the very large number of plasmid
purification reactions needed to sustain genomic scale sequencing,
the need exists for a method that can reduce the number of
templates required for this process. True multiplex cloning vectors
and systems capable of propagating multiple independent DNA
fragments at multiple different sites within a single vector
molecule are provided by the present invention. The benefits of the
present invention are illustrated by the following example:
purifying two co-cloned DNA fragments carried by a single vector
molecule is approximately twice as efficient as the current method
of purifying two separate vectors, each carrying one cloned
fragment. Similarly, a quadruplex cloning vector, for example,
could improve the template purification rate approximately
four-fold compared to purifying four single-fragment cloning
vectors. When applied to a large-scale sequencing effort (e.g.
sequencing more than 10,000 templates per day), the advantages of
multiplex cloning and sequencing become extraordinary.
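The efficiency argument is a ceiling division: the number of template preparations needed is the number of fragments divided by the number of fragments each vector carries, rounded up. This is a minimal sketch assuming that purification effort scales with the number of templates and that every fragment on a multiplex template is recovered; the function name and the 10,000-fragment workload are illustrative only.

```python
def templates_needed(fragments: int, plex: int) -> int:
    """Number of template preparations if every vector carries `plex` fragments."""
    if fragments < 0 or plex < 1:
        raise ValueError("fragments must be >= 0 and plex must be >= 1")
    return -(-fragments // plex)  # integer ceiling division


if __name__ == "__main__":
    daily_fragments = 10_000  # illustrative workload based on the text
    for plex in (1, 2, 3, 4):
        print(f"{plex}-plex vector: {templates_needed(daily_fragments, plex)} templates")
```

Under these assumptions a duplex vector halves the 10,000 preparations to 5,000 and a quadruplex vector reduces them to 2,500, matching the two-fold and four-fold figures above.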
[0189] The multiplex cloning vectors of the present invention also
lend themselves to a simple multiplex sequencing strategy,
realizing additional advantages. Each of the unique primer binding
sites in a multiplex vector has an associated unique capture
sequence for affinity purification. In one embodiment of the
present invention, a sextuplex sequencing scheme is carried out, in
which all six primers cognate to a triplex cloning vector are added
to the same reaction tube. All six sets of sequencing reactions are
conducted in a single tube, extending the concept demonstrated by
Creasey et al. (BioTechniques 11: 102 [1991]) and Wiemann et al.
(Anal. Biochem. 224: 117 [1995]; Anal. Biochem. 234: 166 [1996]).
To separate the six sequence ladders, biotin tagged peptide nucleic
acid (PNA) oligomers, each specific for one of the unique capture
sequences, are sequentially added to the reaction pool and removed
by binding to streptavidin-coated magnetic particles. Each affinity
purified reaction is analyzed on an automated DNA sequencing
machine.
[0190] In a preferred embodiment, PNA oligomers are used instead of
DNA oligomers because of their inherently higher affinity for DNA,
even in low salt conditions (Egholm et al., Nature 365: 566
[1993]), which allows the use of shorter capture oligomers and
permits higher capture efficiencies than equivalent DNA oligomers.
Because the streptavidin-biotin-PNA particles can be reused 5-10
times, just as with the DNA equivalent, the additional reagent cost
for this affinity purification scheme is minimal. The
biotin-streptavidin purification step does increase the reagent
costs slightly, but also serves to remove contaminating dye
terminators, template DNA, and polymerase. This last point is
particularly important as the newest generation of 96-capillary
sequencing instruments is integrated into high throughput genome
centers. The small bore of capillary-based instruments (50 micron
diameter) makes these systems especially prone to failure caused by
macromolecular contaminants, which readily occlude the injection
interface.
[0191] As discussed previously, the present invention overcomes
many of the problems associated with the current blue screen
technology. One such problem with the blue screen technology is
plasmid instability due to vector-driven transcription of the
insert DNA. The lac promoter that drives transcription of the lacZα
peptide in pUC type plasmids must be active (either
constitutively or through induction, e.g., by IPTG) for the blue
screen to function. Because insert DNA fragments are cloned into
the lacZα peptide coding sequence, the lac promoter will cause
transcription of the inserted sequences as well. Consequently, recombinant
proteins or peptides encoded by the insert sequences may be
expressed. Clones that encode proteins or peptides that are toxic
or deleterious to the host bacterium may result in death or slow
growth of the host, likewise leading to difficulty in recovering
such fragments. The present invention addresses the problem of
promoter driven transcription of insert sequences, for example, by
eliminating promoter elements near cloning sites, and providing
terminators after selectable markers, which is made possible by the
systems and methods of the present invention.
[0192] Another problem with cloning insert sequences is that
transcription may be initiated from within the cloned insert DNA,
particularly if the insert contains authentic transcriptional
promoters or regions that behave as promoters in bacteria. In most
conventional cloning vectors, including the pUC type vectors, such
insert-driven transcription may proceed unimpeded into the vector
portion of the plasmid. This transcription may interfere with
transcription of the antibiotic resistance gene(s) encoded by the
vector or with the functionality of the origin of replication.
Either type of interference is likely to cause instability of the
recombinant clone, leading to difficulty in cloning such fragments.
In particular, inserts that are high in A-T content (e.g., more
than 60% of the bases are either A or T) have an increased tendency
to behave as bacterial promoters. The genomic DNA of several
organisms that are highly enriched in A-T content is difficult to
clone (e.g., Lactobacillus, Dictyostelium, Oxytricha, Tetrahymena,
Paramecium). The present invention blocks or minimizes insert
transcription, for example, by providing transcriptional
terminators before and after insert sequences in vectors formed
from the vector components of the present invention.
[0193] In certain embodiments, the present invention provides
compositions comprising a vector component, wherein the vector
component comprises: i) first and second free ends; ii) a
selectable marker region, iii) a first transcriptional terminator
between the first free end and the selectable marker region, and
iv) a second transcriptional terminator between the second free end
and the selectable marker region, and wherein the vector component
is configured to form a circular vector when combined with an
insert sequence. In certain embodiments, the insert sequence has at
least 65% A/T content (e.g. at least 65%, 75%, 80%, or 85% A/T
content).
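The A/T thresholds used here (more than 60% in the discussion of promoter-like inserts, and at least 65% in this embodiment) can be evaluated directly from an insert sequence. The sketch below is a minimal illustration; it assumes that ambiguous bases such as N are excluded from the count, and the function names are hypothetical.

```python
def at_content(seq: str) -> float:
    """Percent of unambiguous bases (A, C, G, T) that are A or T."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * sum(b in "AT" for b in bases) / len(bases)


def is_at_rich(seq: str, threshold: float = 65.0) -> bool:
    """True if the A/T content is at least `threshold` percent."""
    return at_content(seq) >= threshold
```

For example, `ATATATATGC` has 80% A/T content and would fall within the 65%, 75%, and 80% embodiments but not the 85% one. A genome-scale screen would more likely apply the same function over sliding windows, which is outside this sketch.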
[0194] High copy numbers of vectors carrying toxic or lethal inserts
may also promote cell death, and thus prevent recovery of the
deleterious or toxic sequence from the host cell. Most cloning
vectors in use today employ a derivative of
the ColE1 origin of replication that is present in the pUC vectors.
This origin of replication results in a high copy number (typically
300-500 copies) of the plasmid in which it is contained. Plasmid
replication to high copy numbers is advantageous for recovery of
increased amounts of plasmid DNA from cell cultures or for
increased production of recombinant proteins encoded by such
plasmids. However, plasmids that contain DNA that is deleterious to
the host cell may result in slow growth or death of the cell if
they are present at high copy number; hence, they may be difficult
to clone in a high copy number vector. The present invention, in
some embodiments, further minimizes the deleterious effects of
toxic insert sequences by employing vectors and vector components
configured to have a low copy number in a host cell.
[0195] As mentioned above, the present invention also provides
vectors and vector components that minimize insert sequence
transcription (e.g. minimize vector-driven transcription into the
insert DNA and insert-driven transcription into the vector). In
preferred embodiments, the reduced amount of transcription allows
cloning of sequences that are toxic to the host cells (thereby
allowing the sequence to be cloned when otherwise the host cell
would be killed and the sequence could not be cloned). In certain
embodiments, the present invention provides compositions comprising
a vector component, wherein the vector component comprises: i) a
selectable marker region, ii) a transcriptional terminator after
the selectable marker region, and wherein the vector component is
configured to form a circular vector when combined with an insert
sequence. In preferred embodiments, the vector is capable of
maintaining an insert sequence that is a lethal or toxic insert
sequence (e.g. will not allow the host cell to form a colony if the
insert sequence is transcribed). In certain embodiments, the insert
sequence has at least 65% A/T content (e.g. at least 65%, 75%, 80%,
or 85% A/T content).
[0196] The ampicillin resistance gene product, beta-lactamase,
found in all of the pUC series of blue screen plasmids, also
contributes to the problem of false positives. Beta-lactamase can
leak out of the cell to generate an antibiotic free zone
surrounding the ampicillin resistant colony. This antibiotic free
zone enables the growth of so called "feeder cells," which do not
have a plasmid but are nonetheless capable of growing in the
vicinity of ampicillin resistant colonies. These cells will be
white in a blue screen system, and they are readily confused as
being recombinant clones. False negative or false positive results
are, in general, present with any cloning system. The degree to
which a cloning strategy circumvents these issues will impact the
final desired result. The present invention addresses this problem,
for example, by providing a mutated ampicillin resistance sequence
configured to reduce feeder colonies. In some embodiments, the
mutated ampicillin resistance gene (e.g. derived from pUC19)
comprises at least one mutation selected from: T to A at position
174; T to C at position 333; A to G at position 412; C to T at
position 648; T to C at position 668; T to C at position 764; and
combinations thereof. In preferred embodiments, the circular vector
is a recombinant plasmid. In some embodiments, the native promoter
of the ampicillin resistance gene is replaced with a less active
promoter (e.g. chloramphenicol promoter).
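The listed point mutations can be encoded as (reference base, position, new base) triples and applied to a reference sequence with a check that each expected reference base is present. This is a minimal sketch: it assumes the positions are 1-based coordinates in the reference ampicillin resistance sequence the text refers to, which is not reproduced here, and the names `AMP_MUTATIONS` and `apply_substitutions` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Substitutions listed above as (reference base, 1-based position, new base).
# Assumption: positions refer to the reference sequence the text describes.
AMP_MUTATIONS: List[Tuple[str, int, str]] = [
    ("T", 174, "A"), ("T", 333, "C"), ("A", 412, "G"),
    ("C", 648, "T"), ("T", 668, "C"), ("T", 764, "C"),
]


def apply_substitutions(seq: str, mutations: Iterable[Tuple[str, int, str]]) -> str:
    """Return a copy of `seq` with each substitution applied after checking the reference base."""
    bases = list(seq.upper())
    for ref, pos, alt in mutations:
        index = pos - 1  # convert 1-based position to 0-based index
        if not 0 <= index < len(bases):
            raise ValueError(f"position {pos} is outside the sequence")
        if bases[index] != ref:
            raise ValueError(f"expected {ref} at position {pos}, found {bases[index]}")
        bases[index] = alt
    return "".join(bases)
```

Because the text allows "combinations thereof," any subset can be applied, e.g. `apply_substitutions(reference, AMP_MUTATIONS[:2])`; the reference-base check guards against using a sequence numbered in a different coordinate system.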
EXPERIMENTAL
[0197] The following examples are provided in order to demonstrate
and further illustrate certain preferred embodiments and aspects of
the present invention and are not to be construed as limiting the
scope thereof.
[0198] In the experimental disclosure which follows, the following
abbreviations apply: N (normal); M (molar); mM (millimolar); µM
(micromolar); mol (moles); mmol (millimoles); µmol (micromoles);
nmol (nanomoles); pmol (picomoles); g (grams); mg (milligrams);
µg (micrograms); ng (nanograms); l or L (liters); ml
(milliliters); µl (microliters); cm (centimeters); mm
(millimeters); µm (micrometers); nm (nanometers); DS (dextran
sulfate); C (degrees Centigrade); and Sigma (Sigma Chemical Co.,
St. Louis, Mo.).
[0199] In the experimental disclosure which follows, the following
reagents and protocols were used:
[0200] Enzymes
[0201] AvaII, BamHI, EcoRI, HindIII, HincII, NcoI, SmaI, RsaI, T4
DNA ligase, T4 DNA polymerase, Vent DNA polymerase, and T4
polynucleotide kinase were obtained from New England Biolabs
(Beverly, Mass.). Where indicated, T4 DNA ligase was obtained from
Epicentre Technologies (Madison, Wis.). Taq and Tfl DNA polymerases
and calf intestinal phosphatase were obtained from Promega
(Madison, Wis.). Calf intestinal alkaline phosphatase and
thermosensitive alkaline phosphatase were obtained from Life
Technologies (Rockville, Md.). Enzymatic reactions were performed
according to the manufacturers' instructions.
[0202] Media
[0203] Terrific Broth (TB) medium contained Bacto tryptone (11.8 g/l),
yeast extract (23.6 g/l), dipotassium hydrogen phosphate
(9.4 g/l), and potassium dihydrogen phosphate (2.2 g/l). TY
plates contained Bacto tryptone (8 g/l), yeast extract (5 g/l),
NaCl (5 g/l), and agar (15 g/l). Plates were supplemented with
ampicillin or carbenicillin at 100 µg/ml, chloramphenicol at 10
µg/ml, kanamycin at 20 µg/ml, or gentamycin at 15 µg/ml. Media
components and antibiotics were obtained from Sigma (St. Louis,
Mo.).
[0204] Strains, Plasmids, and Bacteriophage DNA
[0205] Plasmid pZErO-2 is commercially available from Invitrogen
Corp. (Carlsbad, Calif.). Plasmid pACYC184 is available from the
American Type Culture Collection, #37033 (Chang A C and Cohen S N,
J. Bacteriol. 134: 1141-1156, 1978; Rose R E, Nucleic Acids Res.
16: 355, 1988). pACYC177 is available from the American Type
Culture Collection, #37031 (Chang A C and Cohen S N, supra; Rose R
E, supra). pUC19 is available from the American Type Culture
Collection, #37254 (Vieira J and Messing J, Gene 19: 259-268, 1982).
Bacteriophage fd is available from the American Type Culture
Collection, #15669-B2 (Hoffmann-Berling H, Virology 22: 305, 1964).
pJQ200 is available from the American Type Culture Collection,
#77482 (Quandt J and Hynes M F, Gene 127: 15-21, 1993). The cell
strain E. coli BHB2600, supE, supF, lambdaCH616, is available from
the American Type Culture Collection, #47004 (Geider K et al., Gene
33: 341-349, 1985). E. coli DH5αF', F' φ80dlacZ
Δ(lacZYA-argF)U169 deoR recA1 endA1 hsdR17(rk- mk+) phoA supE44
λ- thi-1 gyrA96 relA1, was obtained from Life Technologies
(Rockville, Md.). E. coli DH5αFT, F' φ80dlacZ Δ(lacZYA-argF)U169
deoR recA1 endA1 hsdR17(rk- mk+) phoA supE44 λ- thi-1 gyrA96
relA1/F' proAB+ lacIqZΔM15 Tn10(tetr), was obtained from Life
Technologies (Rockville, Md.). E. coli DH5αF'IQ, F' φ80dlacZ
Δ(lacZYA-argF)U169 deoR recA1 endA1 hsdR17(rk- mk+) phoA supE44
λ- thi-1 gyrA96 relA1/F' proAB+ lacIqZΔM15 zzf::Tn5[Kmr], was
obtained from Life Technologies (Rockville, Md.). E. coli DH10B,
F- mcrA Δ(mrr-hsdRMS-mcrBC) φ80dlacZΔM15 ΔlacX74 deoR recA1 endA1
araD139 Δ(ara, leu)7697 galU galK λ- rpsL nupG, was obtained from
Life Technologies. Bacteriophage T7 DNA was purchased from Sigma
(St. Louis, Mo.) and bacteriophage lambda DNA was obtained from
Promega Corporation (Madison, Wis.).
[0206] Plasmid Purification
[0207] Mini-prep, midi-prep, and large-scale plasmid DNA preparations
were purified by the alkaline lysis method (3) or with the Quantum
Prep Kit (Bio-Rad Laboratories, Hercules, Calif.).
[0208] PCR Reactions
[0209] Taq or Tfl PCR reactions were performed in 50-100 µl of
1× Taq or Tfl polymerase buffer with 200 pmol of each primer,
100 nM dNTP, approx. 10 ng template DNA, and 2.5 units of Taq or
Tfl DNA polymerase. PCR cycle conditions were 30 seconds at 94
degrees C., 30 seconds at 60 degrees C., and 2 minutes at 72 degrees C.
for 30-35 cycles, followed by 10 minutes at 72 degrees C.
[0210] Vent PCR reactions were performed in 50-100 µl of
1× Vent polymerase buffer with 200 pmol of each primer, 100 nM
dNTP, approx. 10-50 ng template DNA, and 2.5 units of Vent DNA
polymerase. PCR cycle conditions were 15 seconds at 94 degrees C.,
15 seconds at 50 degrees C., and 2 minutes at 72 degrees C. for 25-30
cycles, followed by 10 minutes at 72 degrees C.
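The two thermal cycling programs can be written down as data, which makes them easy to compare or to transfer to a cycler. This is a minimal sketch with a hypothetical `CyclingProgram` class; it sums only the programmed hold times and assumes no initial denaturation step (none is stated) and ignores ramp times, so real run times will be longer.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CyclingProgram:
    name: str
    cycle_steps: Tuple[Tuple[int, int], ...]  # (temperature in degrees C, hold in seconds)
    final_extension: Tuple[int, int]          # (temperature in degrees C, hold in seconds)

    def total_hold_seconds(self, cycles: int) -> int:
        """Sum of programmed hold times; ramp times are not modeled."""
        per_cycle = sum(seconds for _, seconds in self.cycle_steps)
        return cycles * per_cycle + self.final_extension[1]


# Programs transcribed from the Taq/Tfl and Vent conditions above.
TAQ_TFL = CyclingProgram("Taq/Tfl", ((94, 30), (60, 30), (72, 120)), (72, 600))
VENT = CyclingProgram("Vent", ((94, 15), (50, 15), (72, 120)), (72, 600))

if __name__ == "__main__":
    for program, cycle_range in ((TAQ_TFL, (30, 35)), (VENT, (25, 30))):
        for cycles in cycle_range:
            minutes = program.total_hold_seconds(cycles) / 60
            print(f"{program.name}, {cycles} cycles: {minutes:.1f} min of hold time")
```

Under these assumptions the Taq/Tfl program spends 100 to 115 minutes in programmed holds and the Vent program 72.5 to 85 minutes, the difference coming mainly from the shorter denaturation and annealing steps.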
[0211] Sequencing Reactions and Analysis
[0212] The Prism Big Dye Terminator Cycle Sequencing Kit with
AmpliTaq DNA Polymerase, FS, and the Prism 310 Genetic Analyzer
capillary sequencing instrument were obtained from Applied
Biosystems (Foster City, Calif.). The cycle sequencing reactions
were performed in 10 µl of 0.5× buffer with 250 ng plasmid
template and 3.2 pmol of oligonucleotide primer. Cycle
sequencing conditions were 10 seconds at 95 degrees C., 5 seconds
at 50 degrees C., and 4 minutes at 60 degrees C. for 25 cycles. The
reactions were purified by Sephadex G-50 centrifugal filtration for
2 minutes at 3,000 RPM. The eluate was dried at 75 degrees C. and
resuspended in 25 µl formamide. This mixture was heated at 95
degrees C. for 2 minutes and placed on the auto-loading tray of the
sequencing instrument for injection, electrophoresis, detection,
and automated base calling.
[0213] Ligation Reactions
[0214] The ligation reactions were performed in 10 µl of 1×
buffer containing DNA, 0.5 mM ATP, and 2-10 units T4 DNA ligase for
2-3 hours at room temperature. Ligation reactions were then
heat-treated at 65°C for 15 minutes to inactivate the
ligase.
[0215] For the examples shown below, three separate ligation
reactions were prepared as follows: 1) A "no ligase, no insert"
control reaction to test for the level of contaminating empty
vector; 2) A "plus ligase, no insert" control reaction to check for
the efficiency of 5' phosphate removal; and 3) A "plus ligase, plus
insert" reaction to test the cloning efficiency. The insert in this
control is phage lambda DNA digested with RsaI (λRsaI) or
HincII (λHincII).
[0216] Transformation Procedure
[0217] Frozen electroporation-competent E. coli cells were thawed,
and 50 µl were combined with 1-2 µl of ligated, heat-treated
DNA. This mixture was added to a chilled 0.1 cm gap electroporation
cuvette. Electroporation was performed at 1.8 kV using the Bio-Rad
(Hercules, Calif.) E. coli Pulser™ apparatus. The cells were
transferred to 950 µl of TB medium and placed in a shaking incubator
at 225 rpm for 1 hour at 37°C. Varying amounts of these cells were
spread on TY plates containing the appropriate concentration of
antibiotic or indicator chemicals and incubated overnight at
37°C.
[0218] Transformation Results
[0219] The efficiency of the competent cells was determined by
transformation with supercoiled plasmid pUC19. Typically, 10 pg of
pUC19 was mixed with the competent cells, electroporated, and
brought up to 1 ml in growth medium for recovery at 37°C for 1 hr.
This solution was diluted tenfold, and a one-tenth aliquot was
spread on TY plates containing ampicillin. The number of colonies
was counted to calculate the efficiency in colony forming units
(CFU) per µg of pUC19. The efficiency of the electrocompetent cells
used in the following examples ranged from 5 × 10^8 to 5 × 10^9
CFU/µg pUC19.
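The efficiency calculation can be made explicit. This is a minimal sketch that assumes "a one-tenth aliquot" means 0.1 ml of the tenfold-diluted 1 ml recovery culture, so that 1/100 of the transformed cells (and therefore 0.1 pg of the 10 pg of pUC19) is plated; the function names are hypothetical.

```python
def plated_fraction(dilution_factor: float, plated_ml: float, recovery_ml: float = 1.0) -> float:
    """Fraction of the original recovery culture that ends up on the plate."""
    return plated_ml / (recovery_ml * dilution_factor)


def cfu_per_ug(colonies: int, dna_pg: float, fraction_plated: float) -> float:
    """Transformation efficiency in colony forming units per microgram of DNA."""
    dna_ug_plated = dna_pg * 1e-6 * fraction_plated  # 1 pg = 1e-6 ug
    return colonies / dna_ug_plated


if __name__ == "__main__":
    fraction = plated_fraction(dilution_factor=10, plated_ml=0.1)
    for colonies in (50, 100, 500):
        print(colonies, f"{cfu_per_ug(colonies, dna_pg=10, fraction_plated=fraction):.1e} CFU/ug")
```

Under this reading each colony on the plate corresponds to 1 × 10^7 CFU/µg, so the reported range of 5 × 10^8 to 5 × 10^9 CFU/µg corresponds to roughly 50 to 500 colonies per plate.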
[0220] Although transformation efficiencies are typically presented
in terms of transformants per microgram of a supercoiled vector,
these values are not directly applicable when comparing vectors of
different sizes. To compensate for different sizes, a more accurate
value is the number of transformants per femtomole of vector.
However, because of the unknown size of inserts in a particular
recombinant clone, and the large variation in the amount of DNA
used between experiments, the transformation results from the
cloning experiments are presented as the number of colonies per ml
of transformed cells. Thus, the number of colonies is counted and
divided by the fraction of the original 1 ml of recovery medium
that was plated. The amount of DNA used in a particular ligation and
the fraction of the ligation used to transform the competent cells
are also reported. Using this method of calculation, a transformation
efficiency of 1 × 10^9 CFU/µg of pUC19 corresponds to
approximately 10,000 colonies/ml of cells transformed with 10 pg of
pUC19.
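The conversions in this paragraph (colonies per ml, expected colonies from a known efficiency, and the size-normalized transformants per femtomole) can be sketched as follows. The sketch assumes an average mass of 650 g/mol per base pair, uses the 2,686 bp length of pUC19, and treats all names as hypothetical.

```python
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair of double-stranded DNA
PUC19_BP = 2686            # length of pUC19 in base pairs


def colonies_per_ml(colonies_counted: int, fraction_of_1ml_plated: float) -> float:
    """Scale a plate count back to the full 1 ml of recovery medium."""
    return colonies_counted / fraction_of_1ml_plated


def expected_colonies_per_ml(cfu_per_ug: float, dna_pg: float) -> float:
    """Colonies per ml expected when `dna_pg` of DNA is transformed into 1 ml of recovery medium."""
    return cfu_per_ug * dna_pg * 1e-6  # 1 pg = 1e-6 ug


def fmol(dna_pg: float, length_bp: int) -> float:
    """Femtomoles of a double-stranded DNA molecule of the given length."""
    grams = dna_pg * 1e-12
    return grams / (length_bp * BP_MASS_G_PER_MOL) * 1e15


def transformants_per_fmol(colonies: float, dna_pg: float, length_bp: int) -> float:
    """Size-independent efficiency: transformants per femtomole of vector."""
    return colonies / fmol(dna_pg, length_bp)


if __name__ == "__main__":
    per_ml = expected_colonies_per_ml(1e9, dna_pg=10)
    print(f"{per_ml:.0f} colonies/ml")
    print(f"{transformants_per_fmol(per_ml, 10, PUC19_BP):.2e} transformants/fmol")
```

With these assumptions, 1 × 10^9 CFU/µg and 10 pg of pUC19 give the 10,000 colonies/ml stated above, or about 1.7 × 10^6 transformants per femtomole; the per-femtomole figure is the one that stays comparable across vectors of different sizes.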
[0221] Nucleic Acid Sequences
[0222] In order to minimize extraneous sequence elements, in some
examples, individual selectable markers were removed from their
original context via polymerase chain reaction amplification.
[0223] PCR primers were designed to amplify a plasmid origin of
replication (Ori) and four antibiotic resistance genes: ampicillin
(Amp), chloramphenicol (Cam), kanamycin (Kan), and gentamycin
(Gen). The PCR primers were designed to append a SmaI site to the
5' and 3' end of each selectable marker. The primer corresponding
to the 5' end of each marker also contained a unique eight-base
restriction site, and the primer corresponding to the 3' end of each
antibiotic gene contained a strong transcriptional terminator. SmaI
recognizes the sequence CCCGGG and cleaves between the C and G to
leave blunt ends. The selectable markers were amplified from
various plasmid sources, restricted with SmaI, ligated to one
another or to other known plasmids, and transformed into E. coli
cells. All five PCR fragments were checked for their respective
biological function, associated restriction sites, and size.
[0224] The test insert DNA used throughout the Examples was
bacteriophage lambda (λ) DNA restricted to completion with
RsaI (113 fragments) or with HincII (35 fragments). RsaI recognizes
the sequence GTAC and cleaves between the T and A to leave blunt
ends. HincII recognizes the sequence GT(T/C)(A/G)AC and cleaves
between the (T/C) and (A/G) to leave blunt ends. The λHincII
DNA was precipitated with PEG8000/MgCl2, which results in loss
of fragments less than approximately 300 bp in length. In the
examples below, λ fragments were present at approximately
3-4 fold molar excess of DNA ends over the selectable marker
fragments.
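Two calculations from this paragraph can be sketched together: matching the degenerate HincII site GT(T/C)(A/G)AC with IUPAC codes (Y = C or T, R = A or G) alongside the RsaI site, and estimating picomoles of DNA ends to set the molar excess of insert ends over marker ends. This is a minimal sketch that assumes 650 g/mol per base pair and that every fragment is linear with two ends; for a λ digest, `total_bp` would be the 48,502 bp λ genome and `fragments` the fragment count (113 for RsaI). All names are hypothetical.

```python
import re
from typing import List

# IUPAC codes needed for the sites discussed above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

# enzyme: (recognition site, cut offset from the 5' end of the site on the top strand)
ENZYMES = {
    "RsaI": ("GTAC", 2),      # GT^AC
    "HincII": ("GTYRAC", 3),  # GTY^RAC
}


def cut_positions(seq: str, enzyme: str) -> List[int]:
    site, offset = ENZYMES[enzyme]
    regex = "".join(IUPAC[base] for base in site)
    # Zero-width lookahead so that overlapping sites are all reported.
    return [m.start() + offset for m in re.finditer(f"(?={regex})", seq.upper())]


def digest(seq: str, enzyme: str) -> List[str]:
    fragments, previous = [], 0
    for cut in cut_positions(seq, enzyme):
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


def pmol_ends(ng: float, total_bp: int, fragments: int = 1, g_per_mol_per_bp: float = 650.0) -> float:
    """Picomoles of DNA ends in `ng` of DNA totalling `total_bp`, cut into `fragments` linear pieces."""
    pmol_molecules = ng * 1000.0 / (total_bp * g_per_mol_per_bp)
    return 2 * fragments * pmol_molecules


def molar_end_excess(insert_ends_pmol: float, marker_ends_pmol: float) -> float:
    """Ratio of insert ends to marker ends (the text targets roughly 3 to 4)."""
    return insert_ends_pmol / marker_ends_pmol
```

For instance, `digest("AAGTTAACAA", "HincII")` splits the GTTAAC site into `AAGTT` and `AACAA`, and the same pattern also matches GTCGAC because Y and R are expanded to character classes.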
EXAMPLE 1
Conventional Cloning with a Blue Screen Vector
[0225] Blue screen cloning with the vector pUC19 is commonly used
for cloning experiments, including the construction of template
libraries for genomic sequencing. The blue screen vector pUC19,
shown schematically in FIG. 1A, was restricted with SmaI and
treated with alkaline phosphatase (AP) to remove the 5' terminal
phosphate groups (pUC19 SmaI/AP). Since ligase requires a 5'
phosphate group on at least one of the DNA termini in a ligation
reaction, removal of the 5' phosphates of the vector inhibits
rejoining of the ends of the vector. This type of dephosphorylation
is commonly used to decrease the vector background in cloning
strategies. Three separate ligation reactions were prepared as
follows: 1) A no ligase, no insert control reaction to test for the
level of undigested, empty vector background
(pUC19/SmaI/AP-ligase); 2) A plus ligase, no insert control
reaction to check for the efficiency of 5' phosphate removal
(pUC19/SmaI/AP+ligase); and 3) A plus ligase, plus insert reaction
to test the cloning efficiency of this vector system
(pUC19/SmaI/AP+λRsaI+ligase). The ligation reaction
contained 130 ng pUC19 SmaI/AP, and approximately one fifth of this
reaction was transformed into E. coli DH5αF'. An aliquot of
the transformed cells was spread onto TY agar plates containing
ampicillin plus XGAL. The transformation results are presented in
Table 1.
TABLE 1
Efficiency of blue screen cloning
Ligation reaction | Antibiotic plate | Blue colonies/ml transformed cells | White colonies/ml transformed cells
pUC19/SmaI/AP - ligase | amp + XGAL | 667,000 | 0
pUC19/SmaI/AP + ligase | amp + XGAL | 40,000 | 3,300
pUC19/SmaI/AP + λRsaI + ligase | amp + XGAL | 43,000 | 53,000
[0226] The pUC19/SmaI/AP-ligase control reaction resulted in a high
background of colonies in the absence of ligase. As expected, the
colonies resulting from this reaction had a blue phenotype. The
pUC19/SmaI/AP+ligase reaction also produced a large number of
colonies in the absence of insert DNA. Approximately 8%
(3,300/[3,300+40,000]×100) of the resulting colonies had a
white phenotype, indicating disruption of the lacZα gene even
in the absence of insert DNA. In the presence of insert DNA, this
cloning experiment resulted in approximately 53,000 white colonies,
or putative recombinant clones, per ml of transformed cells
(pUC19/SmaI/AP+λRsaI+ligase). The background of true
negative clones, i.e., the proportion of empty vector (blue) colonies
among all colonies, was approximately 44.8%
(43,000/[43,000+53,000]×100). The frequency of putative false
positive clones, i.e., white colonies obtained in the absence of
insert DNA relative to white colonies obtained with insert, was
approximately 6.2% (3,300/53,000×100). This high level of empty
vector background and high frequency of false positives are a common
problem when using the blue screen system to clone blunt-ended
fragments.
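The percentages in paragraph [0226] follow directly from the Table 1 counts. The minimal Python sketch below reproduces that arithmetic; the dictionary and function names are illustrative and do not come from the source.

```python
# Colony counts per ml of transformed cells, copied from Table 1.
NO_INSERT = {"blue": 40_000, "white": 3_300}      # pUC19/SmaI/AP + ligase
WITH_INSERT = {"blue": 43_000, "white": 53_000}   # pUC19/SmaI/AP + λRsaI + ligase


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def blue_white_metrics(no_insert: dict, with_insert: dict) -> dict:
    """Background metrics for a blue/white (lacZα) screen, as defined in [0226]."""
    return {
        # white colonies that appear although no insert DNA was added
        "white_without_insert": percent(no_insert["white"], no_insert["white"] + no_insert["blue"]),
        # blue (empty vector) colonies as a share of all colonies in the insert reaction
        "empty_vector": percent(with_insert["blue"], with_insert["blue"] + with_insert["white"]),
        # insert-free white colonies relative to white colonies from the insert reaction
        "false_positive": percent(no_insert["white"], with_insert["white"]),
    }


if __name__ == "__main__":
    for name, value in blue_white_metrics(NO_INSERT, WITH_INSERT).items():
        print(f"{name}: {value:.1f}%")
```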
EXAMPLE 2
Standard Direct Selection Cloning
[0227] This example describes a standard direct selection cloning
procedure. The pZErO-2 cloning vector, commercially available from
Invitrogen (catalog number K2600-01, Carlsbad, Calif.), allows
direct selection of inserts via disruption of the lethal gene ccdB.
The CcdB protein acts by inhibiting the essential enzyme
topoisomerase II (DNA gyrase) of the host bacteria. The ccdB gene
is fused in-frame to the C-terminus of the lacZα gene in the
pZErO construct, putting it under control of the lac promoter.
Thus, the chemical IPTG is required to induce its expression in
cells that carry the over-expressing lacIq allele of the lac
repressor.
[0228] pZErO-2 was restricted with SmaI and treated with alkaline
phosphatase. Three separate ligation reactions were prepared as
follows: 1) A no ligase, no insert control reaction to test for the
level of uncut empty vector (pZErO-2/SmaI/AP-ligase); 2) A plus
ligase, no insert control reaction to check for the efficiency of
5' phosphate removal (pZErO-2/SmaI/AP+ligase); and 3) A plus
ligase, plus insert reaction to test the cloning efficiency
(pZErO-2/SmaI/AP+λRsaI+ligase). The ligation reaction
contained 10 ng pZErO-2 SmaI/AP. One twentieth of this reaction
was transformed into E. coli DH5αF', and an aliquot was
spread on TY agar plates containing kanamycin plus or minus IPTG.
The transformation results are presented in Table 2.
TABLE 2 Efficiency of direct selection cloning with pZErO-2
Ligation reaction | Antibiotic plate | # colonies/ml transformed cells
pZErO-2/SmaI/AP - ligase | kan | 10,000
pZErO-2/SmaI/AP + ligase | kan | >200,000
pZErO-2/SmaI/AP + λRsaI + ligase | kan | >200,000
pZErO-2/SmaI/AP - ligase | kan + IPTG | 0
pZErO-2/SmaI/AP + ligase | kan + IPTG | 3245
pZErO-2/SmaI/AP + λRsaI + ligase | kan + IPTG | 88,600
[0229] As seen in Table 2, the pZErO-2/SmaI/AP-ligase and
pZErO-2/SmaI/AP+ligase control reactions produced very high
backgrounds of colonies when plated on kan plates lacking IPTG. This
result is expected and demonstrates that it is essential to include
the chemical inducer IPTG when using this vector. When plated on
kan plus IPTG plates, the pZErO-2/SmaI/AP-ligase control reaction
did not produce a background of colonies. However, the
pZErO-2/SmaI/AP+ligase reaction resulted in approx. 3,200 colonies
(3,245 colonies/ml) in the absence of insert DNA. In the presence of
insert DNA, 88,600 colonies/ml were observed
(pZErO-2/SmaI/AP+λRsaI+ligase). Thus, the background of empty
vector to putative recombinants was approximately 3.7%
(3,245/88,600×100). The frequency of false positive or false
negative results using this system cannot be estimated without
significant additional analysis.
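The pZErO-2 selection logic of paragraphs [0227] to [0229] can be summarized as a small truth table. The sketch below is a simplified model, assuming that any insert in the SmaI site inactivates the lacZα-ccdB fusion and that CcdB is produced only when IPTG relieves LacIq repression; it ignores leaky expression and in-frame inserts that might leave CcdB active. Function names are illustrative.

```python
def colony_forms(has_insert: bool, iptg: bool) -> bool:
    """Simplified model of pZErO-2 on kanamycin plates in a lacIq host.

    The lacZα-ccdB fusion is transcribed from the lac promoter, which LacIq keeps
    repressed unless IPTG is present; an insert in the SmaI site is assumed to
    inactivate the fusion.
    """
    ccdb_active = iptg and not has_insert  # active CcdB inhibits DNA gyrase and kills the cell
    return not ccdb_active


def percent(part: float, whole: float) -> float:
    return 100.0 * part / whole


if __name__ == "__main__":
    for iptg in (False, True):
        for has_insert in (False, True):
            print(f"IPTG={iptg}, insert={has_insert}: colony={colony_forms(has_insert, iptg)}")
    # kan + IPTG plates in Table 2: empty-vector colonies relative to the insert reaction
    print(f"empty-vector background: {percent(3_245, 88_600):.1f}%")
```

The model explains why both control reactions give high backgrounds on kan alone: without IPTG every plasmid survives, whether or not it carries an insert.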
EXAMPLE 3
Multiplex Cloning Using Standard Methods
[0230] The quadraplex cloning concept illustrated in FIG. 1B shows
a plasmid vector with multiple selectable markers separated by the
restriction enzyme SmaI. Using this same model, a duplex cloning
system using conventional vector fragments and processing methods
was tested. FIG. 2 shows construction of the duplex cloning vector
pUC19Kan, created from pUC19 restricted with SmaI and a kanamycin
selectable marker with SmaI restricted ends. The kanamycin gene was
initially amplified from pACYC177 using the flanking
oligonucleotide primers KML2 (5'-CAC TGT TAA CCC GGG TTT AAA CGT
TGT GTC TCA AAA TAT CTG ATG T-3', SEQ ID NO:1) and KMR2 (5'-CAC TGT
TCC CGG GAG TCA AAA GCC TCC GG T CGG AGG CTT TTG ACT TTC TGC TTA
GAA AAA CTC ATC GAG CAT CAA ATG-3', SEQ ID NO:2) to generate the
plasmid pKO1.2. The kan gene in pKO1.2 was modified to silently
mutate an internal SmaI site and add a tonB transcriptional
terminator (Reynolds et al. J. Mol. Biol., 224:31 [1992]) to the 3'
end of the gene, using the PCR primers KDSL2: TGG GAT CGC AGT GGT
GAG TAA CCA TGC ATC A (SEQ ID NO:27) and KDSR2: GGG AAA ACA GCA TTC
CAG GTA TTA GAA (SEQ ID NO:28). The resulting plasmid was
designated pKO2.3. The primers KML2 and KMR2 were then used to
amplify the kanamycin gene from the plasmid pKO2.3.
[0231] The vector pUC19Kan was restricted with SmaI and treated
with alkaline phosphatase to generate two separate selectable
markers from one vector (note that Ori was not tested as a
selectable marker in this Example). A duplex cloning experiment was
set up with 3 separate reactions as follows: 1) A no ligase, no
insert control reaction to test for the level of uncut empty vector
background (pUC19Kan/SmaI/AP-ligase); 2) A plus ligase, no insert
control reaction to check for the efficiency of 5' phosphate
removal (pUC19Kan/SmaI/AP+ligase); and 3) A plus ligase, plus insert
reaction to test the overall efficiency of duplex cloning
(pUC19Kan/SmaI/AP+λRsaI+ligase). The ligation reaction
contained 130 ng pUC19Kan/SmaI/AP. One tenth of this reaction was
transformed into E. coli DH5αF', and an aliquot was spread
onto TY agar plates containing ampicillin plus kanamycin. The
transformation results are presented in Table 3.
TABLE 3 Efficiency of duplex cloning using the pUC19Kan vector preparation
Ligation reaction | Antibiotic plate | # colonies/ml transformed cells
pUC19Kan/SmaI/AP - ligase | amp + kan | 49,000
pUC19Kan/SmaI/AP + ligase | amp + kan | 21,000
pUC19Kan/SmaI/AP + λRsaI + ligase | amp + kan | 75,000
[0232] The pUC19Kan/SmaI/AP-ligase control reaction resulted in a
high background of transformants (approx. 49,000 colonies/ml). The
plus ligase reaction also resulted in a large number of colonies
(approx. 21,000/ml) in the absence of insert DNA. This duplex
cloning experiment resulted in approximately 75,000 putative
recombinant clones/ml (pUC19Kan/SmaI/AP+λRsaI+ligase). The
frequency of empty vector versus ligation events containing insert
DNA was 28% (21,000/75,000×100). It is not uncommon to find
such a high level of empty vector background using conventional
cloning vectors.
EXAMPLE 4
Multiplex Cloning with Partial Source Nucleic Acid
[0233] This Example describes multiplex cloning with partial source
nucleic acid. In particular, to reduce the background due to
denatured plasmid DNA the two selectable markers are purified from
different plasmid `partial sources` and combined in one ligation
reaction (as opposed to obtaining the two selectable markers from a
single plasmid backbone containing both selectable markers, e.g.,
pUC19Kan in Example 3). Because neither source contains both
selectable markers (i.e. both sources are `partial sources`),
intact partial source DNA in the ligation and transformation
reaction is selected against under dual selection. Multiplex
cloning reactions with three combinations of partial source DNA
were tested. The first is the combination of two different
plasmids, with different selectable markers, which have been
processed for ligation and cloning. The second is the combination
of selectable marker fragments amplified by PCR from two separate
vector backbones. The third combination is a plasmid with one
selectable marker and a PCR amplified selectable marker from a
separate vector.
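The principle behind partial source selection can be modeled with sets. In the hypothetical sketch below, each DNA species is reduced to the functional elements it carries, and a transformant is assumed to form a colony only if its plasmid carries an origin ("ori") and a resistance gene for every antibiotic on the plate. The species listed are those of the quadraplex experiment in Example 4C (uncut pACO3, uncut pKO2.3, and the intended recombinant); names are illustrative.

```python
# Hypothetical model: a transformant forms a colony only if its plasmid can
# replicate ("ori") and carries a resistance gene for every drug on the plate.
def forms_colony(elements: set[str], plate: set[str]) -> bool:
    return "ori" in elements and plate <= elements


# DNA species that can transform cells in the quadraplex experiment (Example 4C).
SPECIES = {
    "uncut pACO3": {"amp", "cam", "ori"},
    "uncut pKO2.3": {"kan", "ori"},
    "quadraplex recombinant": {"amp", "cam", "kan", "ori"},
}
PLATES = [{"amp", "cam"}, {"kan"}, {"amp", "cam", "kan"}]

if __name__ == "__main__":
    for plate in PLATES:
        survivors = [name for name, elements in SPECIES.items() if forms_colony(elements, plate)]
        print(" + ".join(sorted(plate)), "->", survivors)
```

Because neither partial source carries every marker, only the assembled product survives the combined selection, which is the behavior reported for the amp + cam + kan plates in Table 6.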
[0234] A. Duplex Cloning with Partial Source Nucleic Acid from
Plasmid and Partial Source Nucleic Acid from PCR
[0235] A duplex clone library was constructed using two selectable
markers, one obtained by PCR amplification of the Kan gene and the
other from the plasmid pUC19. FIG. 3 shows the origin of the
selectable markers and how they were processed to make the duplex
shotgun library. This duplex cloning experiment combined SmaI
restricted and alkaline phosphatase treated pUC19 with the PCR
amplified Kan fragment treated with T4 DNA polymerase to make the
ends blunt. These components were combined in 3 separate reactions
as follows: 1) A no ligase, no insert control reaction to test for
the level of empty vector background (pUC19/SmaI/AP+Kan-ligase); 2)
A plus ligase, no insert control reaction to check for the level of
self-ligation (pUC19/SmaI/AP+Kan+ligase); and 3) A plus ligase, plus
insert reaction to test the overall efficiency of duplex cloning
(pUC19/SmaI/AP+Kan+λRsaI+ligase). The ligation reactions
contained a total of 200 ng of pUC19 SmaI/AP and Kan PCR product in
equimolar amounts. Approximately one fifth of each reaction was
transformed into E. coli DH5αF', and an aliquot was spread on TY agar
plates containing ampicillin or ampicillin plus kanamycin. The
transformation results are presented in Table 4.
TABLE 4 Efficiency of duplex cloning combining a plasmid and PCR amplified selectable marker from partial sources.
Ligation reaction | Antibiotic plate | # colonies/ml transformed cells
pUC19/SmaI/AP + Kan - ligase | amp | 56,000
pUC19/SmaI/AP + Kan + ligase | amp | 66,000
pUC19/SmaI/AP + Kan + λRsaI + ligase | amp | 120,000
pUC19/SmaI/AP + Kan - ligase | amp + kan | 0
pUC19/SmaI/AP + Kan + ligase | amp + kan | 0
pUC19/SmaI/AP + Kan + λRsaI + ligase | amp + kan | 360
[0236] The pUC19/SmaI/AP+Kan-ligase reaction is a control
containing only selectable marker fragments to test the degree of
background due to intact source DNA. The high background of intact
pUC19 vector alone is readily seen when plated on amp agar plates
(approx. 56,000 colonies). However, plating the same mixture on the
combination of amp+kan antibiotic prevented this background event
in this Example. Thus, using separate partial sources to provide
the selectable components of the duplex vector, pUC19 for the Amp
gene and pKO2.3 to provide the Kan gene by PCR amplification,
eliminated the background associated with conventional plasmid
cloning methods.
[0237] The pUC19/SmaI/AP+Kan+ligase reaction is a control
containing the selectable marker DNAs in the presence of ligase. On
amp+kan agar plates no colonies were detectable, demonstrating that
the level of dephosphorylation was sufficiently high. The
background of intact plus religated pUC19 vector is readily seen
when plated on amp (approx. 66,000 colonies/ml). This duplex cloning
experiment resulted in approximately 360 recombinant clones/ml on
amp+kan plates.
[0238] B. Triplex Cloning with Partial Source Nucleic Acid from
PCR
[0239] A triplex clone library was constructed by combining three
selectable marker fragments, obtained by three PCR amplifications
from two separate partial source plasmid templates, in a single
ligation reaction. FIG. 4 shows the origin of the selectable
markers and how they were processed to make the triplex shotgun
clone library. The chloramphenicol-resistance gene (camR or Cam)
from plasmid pACYC184 was amplified by PCR using the flanking
oligonucleotide primers CML2 (5'-TGG ACG TTA ACC CGG GCC TAC TAG
GCC TTG ATC GGC ACG TAA GAG GTT CCA-3', SEQ ID NO:3) and CMR2
(5'-TTA CGC CCC GCC CTG CCA CTC A-3', SEQ ID NO:4). The
ampicillin-resistance gene (ampR) was obtained from pUC19 using the
primers APL2 (5'-CTG TTA ACC CGG GCG CGC CTG TGC GCG GAA CCC CTA
TTT GTT TAT TTT C-3', SEQ ID NO:5) and APR2 (5'-TGG ACG TAC CCG GGC
GCA GAA AGG CCA CCC GAA GGT GAG CCA GTG TGA TTA CAT TTA CCA ATG CTT
AAT CAG TGA GGC ACC T-3', SEQ ID NO:6). A minimal origin of
replication from pUC19 was amplified by PCR using the primers ORIL2
(5'-CTG TTA ACC CGG GAT TTA AAT CGT TGC TGG CGT TTT TCC ATA GGC TC
-3', SEQ ID NO:7) and ORR1 (5'-TGG ACG TTA ACC CGG GTA GAA AAG ATC
AAA GGA TCT-3', SEQ ID NO:8). After PCR amplification, each of the
fragments was restricted with SmaI and ligated in equimolar amounts
to DNA prepared from RsaI digested lambda DNA. No attempt was made
to dephosphorylate the selectable marker fragments after cleavage
with SmaI in this Example.
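Besides the SmaI site, each left primer carries an HpaI site and one additional rare-cutter site. The short Python scan below is a sketch that locates these sites in the primer sequences given above and in Example 3 (codon spacing removed). The site table uses the standard recognition sequence for each enzyme; the dictionary and function names are illustrative. Run as written, it reports PmeI in KML2, AscI in APL2, SfiI in CML2, and SwaI in ORIL2.

```python
import re

# Standard recognition sequences (5'->3'); N = any base. All are palindromic,
# so scanning the top strand alone finds sites on either strand.
SITES = {
    "SmaI": "CCCGGG",
    "HpaI": "GTTAAC",
    "AscI": "GGCGCGCC",
    "SwaI": "ATTTAAAT",
    "PmeI": "GTTTAAAC",
    "SfiI": "GGCCNNNNNGGCC",
}

# Left (5') primers from Examples 3 and 4B, with the codon spacing removed.
PRIMERS = {
    "KML2 (Kan)": "CACTGTTAACCCGGGTTTAAACGTTGTGTCTCAAAATATCTGATGT",
    "APL2 (Amp)": "CTGTTAACCCGGGCGCGCCTGTGCGCGGAACCCCTATTTGTTTATTTTC",
    "CML2 (Cam)": "TGGACGTTAACCCGGGCCTACTAGGCCTTGATCGGCACGTAAGAGGTTCCA",
    "ORIL2 (Ori)": "CTGTTAACCCGGGATTTAAATCGTTGCTGGCGTTTTTCCATAGGCTC",
}


def sites_in(seq: str) -> list[str]:
    """Sorted names of enzymes whose recognition sequence occurs in seq."""
    found = []
    for enzyme, site in SITES.items():
        pattern = site.replace("N", "[ACGT]")  # expand degenerate positions
        if re.search(pattern, seq):
            found.append(enzyme)
    return sorted(found)


if __name__ == "__main__":
    for primer, seq in PRIMERS.items():
        print(primer, sites_in(seq))
```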
[0240] To demonstrate feasibility of triplex cloning, SmaI
restricted, PCR amplified Amp, Cam, and Ori selectable marker DNA
fragments were combined in 3 separate reactions as follows: 1) A no
ligase, no insert control reaction to test for the level of empty
vector background (Amp+Cam+Ori-ligase); 2) A plus ligase, no insert
control reaction to check for multiple marker insertion events and
the size of empty vector DNA (Amp+Cam+Ori+ligase); 3) A plus
ligase, plus insert reaction to test the overall efficiency of
triplex cloning (Amp+Cam+Ori+λRsaI+ligase+SmaI). To enrich
for the desired final result of 100% recombinants, the last
ligation reaction also included SmaI to recleave any junctions
between selectable marker fragments, which recreate a SmaI site, and
thereby force the cloning of insert DNA (Liu and Schwartz,
Biotechniques 12:28-30, 1992). A total of 250 ng of selectable
marker DNA was used in each reaction. Approximately one tenth of the
ligation reaction was transformed into E. coli DH5αF'IQ, and an
aliquot was spread on TY agar
plates containing ampicillin plus chloramphenicol. The cell
transformation results are presented in Table 5.
TABLE 5 Efficiency of cloning three DNA fragments using PCR amplified selectable markers.
Ligation reaction | Antibiotic plate | # colonies/ml transformed cells
SmaI Amp + Cam + Ori - ligase | amp | 8000
SmaI Amp + Cam + Ori - ligase | cam | 8000
SmaI Amp + Cam + Ori - ligase | amp + cam | 0
SmaI Amp + Cam + Ori + ligase | amp + cam | 400
SmaI Amp + Cam + Ori + λRsaI + ligase + SmaI | amp + cam | 3000
[0241] The Amp+Cam+Ori-ligase reaction is a control containing only
selectable marker fragments to test the degree of background from
intact vector source DNA. Significantly, the use of partial source
nucleic acid, namely PCR fragments from pUC19 for Amp and Ori and
from pACYC184 for Cam, to provide the components of the complete
triplex vector eliminated the background associated with
conventional plasmid cloning methods: no colonies were recovered on
amp + cam plates (Table 5).
[0242] The Amp+Cam+Ori+ligase reaction is a control containing
selectable marker DNA only. As expected, the three selectable
markers generated viable clones in the presence of ligase, as seen
by the 400 colonies/ml of transformed cells. To study the
possibility of multiple marker insert events (e.g. two or more
copies of the Amp, or Cam, or Ori markers in any one plasmid), six
colonies were picked and grown, and the plasmid DNA was extracted
for size analysis by agarose gel electrophoresis. All 6 plasmids
migrated equally and were the size predicted for the 3 fragments
being correctly joined as one.
[0243] This triplex cloning Example resulted in approximately 3000
recombinant clones/ml of transformed cells using 25 ng of vector
fragment DNAs (10% of the 250 ng of starting material) to transform
electrocompetent cells. Restriction analysis of 12 recombinant
clones showed that all 12 were larger than the predicted empty
vector. In addition, the number of inserts in each clone could be
estimated by SmaI restriction analysis, as each insertion
eliminates the SmaI site that would be recreated by joining of
vector fragments. This analysis indicated that 8/12
clones had inserts in all 3 sites, 2/12 had two
inserts, and 2/12 had 1 insert. The four clones that
showed evidence of restriction by SmaI apparently escaped the
selective pressure of SmaI in the ligation reaction.
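The SmaI restriction analysis rests on simple bookkeeping: a triplex vector has three marker-to-marker junctions, and each junction left without an insert re-forms a blunt CCC|GGG SmaI site. The hypothetical helpers below convert a SmaI digest into an insert count, assuming that no insert carries an internal SmaI site and that all digest fragments are resolved on the gel.

```python
def smai_sites_from_digest(n_fragments: int, linearized: bool) -> int:
    """A circular plasmid cut at k >= 1 sites yields k linear fragments; an uncut plasmid has no sites."""
    return n_fragments if linearized else 0


def count_inserts(n_junctions: int, smai_sites: int) -> int:
    """Each empty junction recreates one SmaI site, so filled junctions = junctions - sites."""
    if not 0 <= smai_sites <= n_junctions:
        raise ValueError("SmaI site count must lie between 0 and the number of junctions")
    return n_junctions - smai_sites


if __name__ == "__main__":
    # Triplex vector (three junctions): uncut, cut once, cut into two fragments
    for fragments, linearized in [(1, False), (1, True), (2, True)]:
        sites = smai_sites_from_digest(fragments, linearized)
        print(f"{sites} SmaI site(s) -> {count_inserts(3, sites)} insert(s)")
```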
[0244] C. Quadraplex Cloning with Partial Source Plasmid Nucleic
Acid
[0245] This Example describes quadraplex cloning with partial
source plasmid nucleic acid. Two partial source plasmids were
constructed as follows: 1) Amp+Cam+Ori, or pACO3, and 2) Kan+Ori,
or pKO2.3 (described in Example 3).
[0246] FIG. 5 shows the origin of the selectable markers and how
they were processed to make the quadraplex shotgun clone library.
The chloramphenicol gene from pACYC184 and the ampicillin gene and
origin of replication from pUC19 were amplified by PCR using the
primers described in Example 4B. After PCR amplification, each of the
fragments was restricted with SmaI, and the fragments were ligated in
equimolar amounts to create pACO3. To generate the fragments for
quadraplex cloning, pACO3 and pKO2.3 DNAs were purified and
restricted with SmaI, and the resulting fragments (SmaI pACO+pKO)
were ligated to DNA prepared from RsaI digested lambda DNA. To
demonstrate quadraplex cloning, a total of 200 ng of selectable
marker DNA was combined, with or without ligase or test insert DNA.
Approximately one tenth of the ligation reaction was transformed into
E. coli DH5αF'IQ, and an aliquot was spread on various antibiotic
containing agar plates, with cell transformation results shown in
Table 6.
TABLE 6 Efficiency of cloning four DNA fragments using two partial source vectors.
Ligation reaction | Antibiotic plate | # colonies/ml transformed cells
SmaI pACO + pKO - ligase | amp + cam | 20,480
SmaI pACO + pKO - ligase | kan | 18,720
SmaI pACO + pKO - ligase | amp + cam + kan | 0
SmaI pACO + pKO + ligase | amp + cam + kan | 40,000
SmaI pACO + pKO + λRsaI + ligase + SmaI | amp + cam + kan | 1260
[0247] The SmaI pACO+pKO-ligase reaction is a control containing
only selectable marker fragments to test the degree of background
contamination due to uncut vector commonly seen in conventional
cloning. The use of separate starting vectors, pACO and pKO2.3 in
this instance, which are restricted and mixed to provide the
components of the complete quadraplex vector, combined with
selection against each of the individual starting plasmids,
eliminated background problems due to uncut vectors. The background
of pACO vector or pKO2.3 vector alone is readily seen when the
transformants were plated on amp+cam or kan, respectively. However,
plating the same mixture on the combination of amp+cam+kan
antibiotics prevented this unwanted background event.
Significantly, this Example demonstrates the co-cloning of four DNA
fragments with four selectable markers in one plasmid vector. The
quadraplex cloning experiment resulted in approximately 1260
recombinant clones/ml transformed cells using 10% of the ligation
reaction (100%=200 ng). As with the triplex experiment above, SmaI
was included in the reaction with lambda DNA fragments to force the
co-cloning of insert DNA. The number of inserts in each clone was
estimated by SmaI restriction analysis (the lack of SmaI
restriction indicating foreign DNA insertion), with the following
results: 5/12 had inserts in all 4 sites, none of the
clones contained three inserts, 2/12 had 2 inserts,
3/12 had one insert, and 2/12 had no
inserts.
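The insert-count tallies reported for the triplex (Example 4B) and quadraplex experiments can be condensed into a mean number of inserts per clone and the fraction of insertion sites filled. This summary is derived here and is not reported in the source; the sketch simply reweights the published 12-clone tallies, and its names are illustrative.

```python
from fractions import Fraction

# Number of clones (of 12 analyzed) carrying k inserts, from Examples 4B and 4C.
TRIPLEX = {3: 8, 2: 2, 1: 2}
QUADRAPLEX = {4: 5, 3: 0, 2: 2, 1: 3, 0: 2}


def insert_summary(distribution: dict[int, int], n_sites: int) -> tuple[Fraction, Fraction]:
    """Return (mean inserts per clone, fraction of insertion sites filled)."""
    clones = sum(distribution.values())
    inserts = sum(k * count for k, count in distribution.items())
    return Fraction(inserts, clones), Fraction(inserts, clones * n_sites)


if __name__ == "__main__":
    for label, dist, sites in [("triplex", TRIPLEX, 3), ("quadraplex", QUADRAPLEX, 4)]:
        mean, filled = insert_summary(dist, sites)
        print(f"{label}: {float(mean):.2f} inserts/clone, {float(filled):.0%} of sites filled")
```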
[0248] D. Pentaplex Cloning with Partial Source Nucleic Acid from a
Plasmid
[0249] A pentaplex clone library was constructed using two plasmids
to supply the necessary selectable components. The plasmids were
constructed as follows: 1) Amp+Cam+Ori, or pACO3 (described in
Example 4C), and 2) Kan+Gen+Ori, or pKGO. FIG. 6 shows the origin
of the selectable markers and how they were processed to make the
pentaplex shotgun clone library. The gentamicin gene from plasmid
pJQ200 was amplified by PCR using the flanking oligonucleotide
primers GML1 (5'-CACTGTTAACCCGGGAATTGACATAAGCCTGTTCGGTTCGTAAACT-3',
SEQ ID NO:9) and GMR1
(5'-GTGACAACCCGGGCAGATTAAAACGAAAGGCCCAGTCTTTCGACTGAGCCTTTCGTTTTATTTGTTTAGGTGGCGGTACTTGGGTCGATATCA-3',
SEQ ID NO:10). After PCR
amplification, the Gen fragment was restricted with SmaI and
ligated to SmaI restricted pKO2.3 (described in Example 3) to
generate pKGO. For pentaplex cloning, pACO and pKGO were separately
restricted with SmaI to liberate the individual selectable marker
fragments. In this Example, a total of 200 ng of the selectable
fragments was used in each reaction, with or without ligase or test
insert. Approximately 80% of the ligation reaction was transformed
into DH5αF'IQ, and an aliquot was spread on various
antibiotic containing agar plates, with cell transformation results
shown in Table 7.
TABLE 7 Efficiency of cloning five DNA fragments using two partial source vectors.
Ligation reaction | Antibiotic plate | # colonies/ml transformed cells
SmaI pACO + pKGO - ligase | amp + cam | 20,000
SmaI pACO + pKGO - ligase | kan + gen | 4080
SmaI pACO + pKGO - ligase | amp + cam + kan + gen | 0
SmaI pACO + pKGO + ligase | amp + cam + kan + gen | 8000
SmaI pACO + pKGO + λRsaI + ligase + SmaI | amp + cam + kan + gen | 70
[0250] The SmaI pACO+pKGO-ligase reaction is a control containing
only selectable marker fragments to test the background
contamination due to uncut vector commonly seen in conventional
cloning. The use of separate starting vectors, pACO and pKGO in
this instance, to provide the components of the complete pentaplex
vector eliminated the intact vector background (commonly seen in
cloning experiments). The background of pACO vector or pKGO vector
alone is readily seen on amp+cam or kan+gen, respectively. However,
plating the same mixture on the combination of amp+cam+kan+gen
antibiotic plates prevented this unwanted background event.
[0251] Significantly, this Example demonstrates the co-cloning of
five DNA fragments with five selectable markers in one plasmid
vector. This pentaplex cloning reaction resulted in approximately
70 recombinant clones/ml transformed cells. A higher concentration
of SmaI than that added in the quadraplex experiment above was used
to force the co-cloning of insert DNA. Estimating the number of
inserts by SmaI restriction analysis revealed that all 10 clones
analyzed had inserts at all 5 sites. Although only 70 pentaplex
clones were recovered in this experiment, it is notable that this
number would be sufficient to sequence a 100 kb BAC with more than
3-fold redundancy, assuming 2 kb inserts at all five sites.
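The redundancy estimate can be reproduced under stated assumptions. The sketch below takes the clone count, insert size, and BAC size from paragraph [0251] and assumes two reads per insert (one from each flanking primer site, consistent with the six primer extension sites per triplex clone described in Example 5). The 500 bp usable read length is an assumption of this sketch; the source does not state a read length.

```python
BAC_BP = 100_000        # target size from paragraph [0251]
CLONES = 70             # pentaplex clones/ml recovered
SITES_PER_CLONE = 5     # insertion sites in the pentaplex vector
INSERT_BP = 2_000       # insert size assumed in paragraph [0251]
READS_PER_INSERT = 2    # one read from each primer site flanking an insert
READ_BP = 500           # ASSUMPTION: usable read length, not stated in the source


def fold_coverage(total_bp: int, target_bp: int) -> float:
    """Total bases divided by target size."""
    return total_bp / target_bp


insert_coverage = fold_coverage(CLONES * SITES_PER_CLONE * INSERT_BP, BAC_BP)
read_coverage = fold_coverage(CLONES * SITES_PER_CLONE * READS_PER_INSERT * READ_BP, BAC_BP)

if __name__ == "__main__":
    print(f"insert DNA: {insert_coverage:.1f}x, sequence reads: {read_coverage:.1f}x")
```

Under these assumptions the cloned insert DNA covers the BAC about 7-fold and the sequence reads about 3.5-fold, in line with the "more than 3-fold" statement.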
EXAMPLE 5
Triplex Cloning with Foreign Fragment Insertion Approaching
100%
[0252] This Example describes triplex cloning with the efficiency
of foreign fragment insertion approaching 100%. The use of a
restriction endonuclease, such as SmaI in the above Examples, to
lower the probability of empty insertion sites is not a desirable
option for generation of random shotgun libraries, because this
approach, while selecting against empty insertion sites, also
selects against recombinant clones with internal SmaI sites.
Another approach to inhibit self-ligation of vector DNA, or DNA
fragments with selectable markers, is to eliminate their 5'
phosphate groups, thereby forcing ligation to the insert DNA, which
does contain 5' phosphate groups. There are several methods for
achieving this goal. One is to directly ligate synthetic
oligonucleotides, which normally lack terminal phosphate groups, to
the ends of the vector DNA. Another method is to use PCR amplified
fragments, which likewise lack 5' phosphates due to the
incorporation of the synthetic oligonucleotide primers at their 5'
termini. Another method is to dephosphorylate the DNA with alkaline
phosphatase (AP).
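The phosphate rule in paragraph [0252] can be expressed as a constraint on which blunt junctions may form in a circular product. The sketch assumes that T4 DNA ligase seals a nick only where a 5'-phosphate abuts a 3'-hydroxyl, and, as a simplifying assumption, that a junction sealed on one strand is enough (the remaining nick is taken to be repaired in the host). Fragment and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    phosphorylated: bool  # True if both 5' ends carry a phosphate


def junction_ligatable(left: Fragment, right: Fragment) -> bool:
    # Each strand of a blunt junction is sealed only if its 5' end carries a phosphate;
    # at least one sealed strand is assumed to be sufficient.
    return left.phosphorylated or right.phosphorylated


def circle_ligatable(order: list[Fragment]) -> bool:
    """True if every junction of the circular arrangement can be ligated."""
    neighbours = order[1:] + order[:1]
    return all(junction_ligatable(a, b) for a, b in zip(order, neighbours))


amp, cam, ori = (Fragment(n, False) for n in ("Amp", "Cam", "Ori"))  # PCR products lack 5' P
insert = Fragment("lambda RsaI", True)                                # restriction fragments keep 5' P

if __name__ == "__main__":
    print("empty vector:", circle_ligatable([amp, cam, ori]))
    print("all sites filled:", circle_ligatable([amp, insert, cam, insert, ori, insert]))
    print("Cam-Ori junction empty:", circle_ligatable([amp, insert, cam, ori, insert]))
```

In this model, unphosphorylated markers can only circularize when an insert separates every pair of markers, which is why end-repaired PCR markers force insertion at each site.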
[0253] In this Example, triplex cloning with high efficiency
fragment insertion was achieved using the Amp, Cam, and Ori
selectable markers that were PCR amplified as described in Example
4B above. However, the fragments were end-repaired with T4 DNA
polymerase rather than SmaI restricted. Thus, the selectable marker
DNAs lacked phosphate groups at their ends, forcing ligation to the
insert DNA, which bears phosphate groups. This Example demonstrates
triplex cloning by combining the selectable marker DNA fragments in
3 separate reactions as follows: 1) A no ligase, no insert control
reaction to test for empty vector background (Amp+Cam+Ori-ligase);
2) A plus ligase, no insert control reaction to assay self-ligation
using end-repaired PCR fragments (Amp+Cam+Ori+ligase); and 3) A
plus ligase, plus insert reaction to test the overall efficiency of
triplex cloning (Amp+Cam+Ori+λRsaI+ligase). A total of 177
ng of selectable marker DNA was used in each reaction, with or
without ligase or test insert. Approximately one fifth of the
ligation reaction was transformed into DH5αF'IQ, and an
aliquot was spread onto antibiotic containing agar plates, with
cell transformation results shown in Table 8.
TABLE 8 Triplex cloning using end-repaired PCR amplified selectable markers.
Ligation reaction | Antibiotic plate | # colonies/ml transformation
Amp + Cam + Ori - ligase | amp + cam | 5
Amp + Cam + Ori + ligase | amp + cam | 15
Amp + Cam + Ori + λRsaI + ligase | amp + cam | 1895
[0254] Because the partial source backbone vectors used to generate
the selectable markers have compatible origins of replication, they
can survive together in a single cell. This triplex cloning
experiment resulted in approximately 1895 recombinant clones/ml, with
only 15 clones/ml recovered from ligations lacking insert DNA. Thus,
the efficiency of recovering recombinant clones is over 99%, as the
background of empty vector transformation is 0.79%
(15/1895×100).
[0255] To facilitate analysis of the number of copies of each
selectable marker, a unique eight base restriction site was
incorporated into the 5' end of each marker as follows: Amp, AscI;
Cam, SfiI; and Ori, SwaI. Digestion of a triplex vector with each
of these enzymes yields a single restriction fragment if one copy
of each marker is present. Each additional copy of a selectable
marker will result in one additional fragment from digestion with
the respective enzyme (the random presence of any of these
restriction sites in the cloned insert DNA will be uncommon due to
the rare occurrence of these sites). By this analysis, it was
estimated that multiple insertions occurred in 12/87
clones, or 13.8%. However, the unusable fraction of sequencing
reactions possible from the recombinant triplex clones is actually
lower than 13.8%. There are 6 primer extension sites in each
template in the triplex cloning situation, only two of which are
unusable when a marker is inserted more than once. The 87 clones
will yield a total of 87×6, or 522, DNA sequence reactions. The 12
clones with multiple insertions will each yield 2 unreadable DNA
sequence reactions, or 24 total. Therefore, 24/522, or 4.6%, of
the reactions will be unreadable due to multiple insertions of a
given selectable marker.
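The arithmetic of paragraph [0255] generalizes to any multiplex vector read from two primer extension sites per insertion site. The minimal sketch below, with an illustrative function name, computes the triplex figure (24/522) and the duplex figure given in Example 7 (6/240); the rule of two spoiled reactions per multiple-insertion clone is taken from the text.

```python
from fractions import Fraction


def unreadable_fraction(clones: int, multi_insert_clones: int, n_sites: int,
                        bad_reads_per_multi: int = 2) -> Fraction:
    """Share of sequencing reactions lost to duplicated selectable markers.

    Each insertion site is read from two flanking primer extension sites, so a
    clone with n_sites insertion sites supports 2 * n_sites reactions; each
    clone with a duplicated marker is taken to spoil bad_reads_per_multi of them.
    """
    total_reactions = clones * 2 * n_sites
    return Fraction(multi_insert_clones * bad_reads_per_multi, total_reactions)


if __name__ == "__main__":
    triplex = unreadable_fraction(clones=87, multi_insert_clones=12, n_sites=3)
    duplex = unreadable_fraction(clones=60, multi_insert_clones=3, n_sites=2)
    print(f"triplex (Example 5): {float(triplex):.1%}")
    print(f"duplex (Example 7): {float(duplex):.1%}")
```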
EXAMPLE 6
Construction of a Direct Selection Cloning Vector
[0256] Using antibiotic selectable markers from partial source
plasmid backbones is an efficient method of reducing the
contaminating background of empty vector in a multiplex cloning
experiment. It was determined, however, that this method is not
always 100% efficient, as a low but measurable number of clones
(approximately 0.1-0.5%) contained two separate plasmids in a
single cell. Size analysis revealed that the presence of two
plasmids was the result of a double transformation event with two
empty vectors. This result is not unexpected, given the mixture of
two or more plasmids in the multiplex ligation reactions, the high
transformation efficiency of denatured but un-cleaved DNA, and the
low transformation efficiency of restricted, re-ligated DNA.
[0257] This Example describes the construction of a direct selection
vector based on the bacteriophage T7 gene 1.2, in addition to its
use in a conventional (single insert) cloning reaction. Three
potential positive selection cloning systems, based on the sacB
gene, the Bax gene, and the bacteriophage T7 1.2 gene (Schmitt et
al., J Bacteriol., 173:1536-43, 1991), were designed and tested as
adjuncts for low background cloning. The sacB gene, which mediates
sucrose sensitivity, had been developed previously by other
researchers as a direct selection cloning scheme for use in E.
coli; for unknown reasons, we were not able to readily reproduce
these results in this Example. The Bax gene has been shown
previously to be highly toxic when expressed in E. coli, but for
unknown reasons we were not able to readily control the expression
of this gene. Expression of bacteriophage T7 gene 1.2 is lethal to
F'-containing E. coli but not to F minus strains. Thus,
plasmid-based expression of this gene product should be lethal in
male E. coli cells but not in female strains.
[0258] Of the three systems tested, only the bacteriophage T7 1.2
gene product provided sufficient control of the background
transformants. FIG. 7 diagrams the construction of pT71.2 and pTM2,
vectors employed in this Example. Combining most of pUC19 with the
T7 gene 1.2 coding sequence resulted in the initial positive
selection vector pT71.2. The majority of pUC19, except for the
multiple cloning site and the first 7 amino acids of the
lacZα gene, was amplified using the primers LACZNCOL
(5'-CAGTGTCACTCCATGGCCATGATTACGCCAAGCTTGCATGCCTG-3', SEQ ID NO:11)
and LACZNCOR (5'-CAGTGTCACTCCCATGGCTGTTTCCTGTGTGAAATTGTTATCCGCT-3',
SEQ ID NO:12). Gene 1.2 was amplified from bacteriophage T7 (J
Bacteriol., 173:1536-43, 1991) using the oligonucleotides T71.2L
(5'-TGTCACTCCATGGGACGTTTATATAGTGGTAATCTGGCAGCA-3', SEQ ID NO:13)
and T71.2R (5'-CTGACTCGAATTCTTACTTCCAGTCCTTCAACTGGTCATACATA-3',
SEQ ID NO:14) and cloned in frame with the lacZα start codon in
pUC19.
[0259] The correct pT71.2 construct, confirmed by restriction and
size analysis, was tested for lethality in two strains of E. coli:
the F plus strain DH5αF'IQ and the F minus strain DH10B. Because
the T7 1.2 gene was inserted behind the ATG start codon of the lacZ
gene in pT71.2, its expression is controlled by the lac
promoter. The gratuitous inducer of the lac promoter, IPTG, is
often used to increase the level of expression from this regulatory
element. In the presence of IPTG, approximately 2000-fold fewer
colonies were observed when supercoiled pT71.2 DNA was transformed
into the F plus strain DH5αF'IQ rather than the F minus
strain DH10B. In the absence of IPTG, no difference in colony
forming units was observed between the two strains.
[0260] The T7 1.2 gene lacks useful restriction sites for cloning
within its short coding sequence. In order to make a more
functional direct selection cloning vector, a multiple cloning site
identical to that found in pUC19 was inserted between the first and
second codons of the T7 gene 1.2 in pT71.2, as shown in FIG. 7,
resulting in the plasmid pTM2. Two synthetic oligonucleotides, T7
MCS TOP (5'-CATGCAAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTCTAG-3',
SEQ ID NO:29) and T7 MCS Bottom
(5'-CATGCTAGAATTCGAGCTCGGTACCCGGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTTG-3',
SEQ ID NO:30), were annealed to yield a double stranded
multiple cloning site fragment with NcoI overhanging ends. This
sequence was ligated to NcoI digested pT71.2.
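The two MCS oligonucleotides can be checked in silico. When annealed, they should pair over their full length apart from a 5'-CATG on each strand (the overhang left by NcoI, C^CATGG), and the duplex should carry the pUC19 polylinker sites. The short Python check below works under those assumptions; sequences are as given above with line-break spaces removed, recognition sequences are the standard ones for each enzyme, and function names are illustrative.

```python
TOP = "CATGCAAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTCTAG"      # SEQ ID NO:29
BOTTOM = "CATGCTAGAATTCGAGCTCGGTACCCGGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTTG"   # SEQ ID NO:30

# Standard recognition sequences for the pUC19 polylinker enzymes
MCS_SITES = {
    "HindIII": "AAGCTT", "SphI": "GCATGC", "PstI": "CTGCAG", "SalI": "GTCGAC",
    "XbaI": "TCTAGA", "BamHI": "GGATCC", "SmaI": "CCCGGG", "KpnI": "GGTACC",
    "SacI": "GAGCTC", "EcoRI": "GAATTC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def five_prime_overhangs(top: str, bottom: str, overhang: int = 4) -> tuple[str, str]:
    """Return both 5' overhangs if the oligos pair over everything except their first `overhang` bases."""
    if top[overhang:] != reverse_complement(bottom[overhang:]):
        raise ValueError("oligos do not anneal into the expected duplex")
    return top[:overhang], bottom[:overhang]


if __name__ == "__main__":
    print("5' overhangs:", five_prime_overhangs(TOP, BOTTOM))  # NcoI ends carry 5'-CATG
    print("sites in duplex:", [name for name, site in MCS_SITES.items() if site in TOP])
```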
[0261] The correct pTM2 construct, confirmed by restriction and
size analysis, was tested for lethality in DH5αF'IQ and
DH10B. pTM2 also showed a 2000-fold differential plating efficiency
in DH10B versus DH5αF'IQ using supercoiled plasmid DNA, again
only in the presence of IPTG.
[0262] To test the efficiency of pTM2 as a direct selection cloning
vector, the plasmid was restricted with SmaI, dephosphorylated with
alkaline phosphatase (AP), and incubated with or without ligase and
insert DNA. The DNA was used to transform DH5αF'IQ or DH10B
and plated on ampicillin plus IPTG agar.
TABLE 9 Efficiency of direct selection cloning using an engineered T7 1.2 gene construct.
Ligation reaction | DH5αF'IQ # colonies/ml transformation | DH10B # colonies/ml transformation
SmaI pTM2/AP - ligase | 1700 | 5200
SmaI pTM2/AP + ligase | 2600 | 12,300
SmaI pTM2/AP + λRsaI + ligase | 416,000 | 142,000
[0263] In this Example, direct selection cloning resulted in
approximately 416,000 recombinant clones/ml when transformed into
DH5αF'IQ. The frequency of empty vector versus insertion
ligation events was 0.6% (2,600/416,000×100), or 160-fold more
colonies when insert DNA was present. The same DNA transformed into
DH10B resulted in an empty parental vector background of
approximately 8.7% (12,300/142,000×100). The transformation
efficiency of the two strains, DH10B and DH5αF'IQ, was the
same using a pUC19 control plasmid, confirming that the differences
seen with pTM2 reflect selection against this plasmid in
DH5αF'IQ, rather than simply a lower transformation
efficiency of this strain.
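The two figures of merit in paragraph [0263], background percentage and fold enrichment, can be computed for both strains from Table 9. A minimal sketch; the dictionary and function names are illustrative.

```python
# Colonies/ml transformation from Table 9.
TABLE_9 = {
    "DH5αF'IQ": {"no_insert": 2_600, "with_insert": 416_000},
    "DH10B": {"no_insert": 12_300, "with_insert": 142_000},
}


def background_pct(no_insert: int, with_insert: int) -> float:
    """Empty-vector colonies as a percentage of colonies from the insert reaction."""
    return 100 * no_insert / with_insert


def fold_enrichment(no_insert: int, with_insert: int) -> float:
    """How many times more colonies appear when insert DNA is present."""
    return with_insert / no_insert


if __name__ == "__main__":
    for strain, counts in TABLE_9.items():
        print(f"{strain}: background {background_pct(**counts):.2f}%, "
              f"{fold_enrichment(**counts):.0f}-fold enrichment")
```

Comparing the two rows isolates the effect of T7 gene 1.2 selection, since the F minus strain DH10B does not select against empty pTM2.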
EXAMPLE 7
Multiplex Cloning with a Direct Selection Vector
[0264] This Example describes multiplex cloning employing a direct
selection vector. In particular, this Example describes duplex
cloning wherein pTM2/SmaI/AP was mixed with a PCR amplified Cam
gene (see e.g., Example 4B), which was end-repaired with T4 DNA
polymerase. Three separate reactions were performed as follows: 1)
A no ligase, no insert control reaction to test for the level of
empty vector background contamination (pTM2/AP+Cam-ligase); 2) A
plus ligase, no insert control reaction to check the level of
self-ligation (pTM2/AP+Cam+ligase); and 3) A plus ligase, plus insert
reaction to test the overall efficiency of duplex cloning
(pTM2/AP+Cam+λRsaI+ligase). In this Example, a total of 100
ng of selectable marker DNA was used in each reaction.
Approximately one fifth of the ligation reaction was transformed
into DH5αF'IQ, and an aliquot was spread on amp+cam
antibiotic plates, with cell transformation results shown in Table
10.
TABLE 10 Multiplex cloning using a direct selection vector and PCR-amplified selectable marker.
Ligation reaction | # colonies/ml transformation
pTM2/AP + Cam - ligase | 135
pTM2/AP + Cam + ligase | 130
pTM2/AP + Cam + λRsaI + ligase | 15,600
[0265] As seen in Table 10, the multiplex cloning reaction
illustrated by this Example resulted in 15,600 recombinant clones.
The frequency of empty vector versus insertion ligation events was
0.83% (130/15,600 × 100). The frequency of multiple selectable
marker insertion events was estimated by restriction analysis using
the unique eight-base restriction site associated with each marker,
as described above. Three of 60 clones analyzed, or 5.0%, had
multiple inserts. The three clones with multiple inserts will yield
6 unusable sequencing reactions. Compared to a total of 240
reactions from the 60 analyzed clones (four per clone), 2.5% (6/240)
of the sequencing reactions will be unreadable due to multiple
fragment insertions.
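The 2.5% figure follows from two counts stated in [0265]: each duplex clone is read with four primers (240 reactions from 60 clones), and each clone carrying multiple marker inserts loses two of its reactions (6 from 3 clones). A small Python sketch of that arithmetic; the function name and its default arguments are illustrative:

```python
def unreadable_fraction(clones_analyzed: int, multi_insert_clones: int,
                        reads_per_clone: int = 4, lost_reads_per_multi: int = 2) -> float:
    """Fraction of sequencing reactions expected to be unreadable because of
    multiple marker insertions, using the per-clone counts stated in the text."""
    total_reads = clones_analyzed * reads_per_clone
    lost_reads = multi_insert_clones * lost_reads_per_multi
    return lost_reads / total_reads


clone_rate = 3 / 60                      # 3 of 60 clones had multiple inserts
read_rate = unreadable_fraction(60, 3)   # 6 of 240 sequencing reactions
print(f"clones: {clone_rate:.1%}, reactions: {read_rate:.1%}")
```

Because only two of a clone's four reads are lost, the per-reaction loss is half the per-clone frequency.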
EXAMPLE 8
Construction of Second Generation Direct Selection Multiplex
Cloning Vectors
[0266] This Example describes the construction of second generation
direct selection multiplex cloning vectors (e.g., smaller or
amenable to excision of the direct selection fragment). FIG. 8
diagrams construction of the direct selection cloning vectors
pCTA1, pCTAB4.3, and pCTH1.4. The conditionally lethal
bacteriophage T7 1.2 gene, with its engineered multiple cloning
site, was amplified from pTM2 (described in Example 6) using
oligonucleotides LZL1 (5'-CATTAGGCACCCCAGGCTTTACACTTTATG-3', SEQ ID
NO:15) and T71.2R2 (5'-TTATTACTTCCAGTCCTTCAACTGGTCATACATATGGTTC-3',
SEQ ID NO:16). The chloramphenicol resistance gene was PCR
amplified from pACYC184 using the primers CML2 (SEQ ID NO:3) and
CMRT7 (5'-CAGACTGTGCAAGCTTTGCATTTACGCCCCGCCCTGCCACTCA-3', SEQ ID
NO:18). The T7 and Cam PCR fragments were made blunt by treatment
with T4 DNA polymerase, the ends were phosphorylated using T4
kinase, and both fragments were restricted with endonuclease
HindIII. A minimal origin of replication was PCR amplified using
the ORIL2 (SEQ ID NO:7) and ORR1 (SEQ ID NO:8) primers described
earlier. The Ori PCR fragment was digested with SmaI. All three
fragments were combined in a ligation reaction and transformed into
DH10B cells.
[0267] The ATG start codon of the T7 1.2 gene was joined
immediately after the TAA stop codon of the chloramphenicol
resistance gene to form a single operon by HindIII digestion and
subsequent ligation of the 3' Cam PCR fragment and 5' T7 1.2 PCR
fragment. A minimal origin of replication was added to form the 1.7
kb plasmid pCTMCS, which was confirmed by restriction and size
analysis and was functionally tested for direct selection
capabilities in DH5αF'IQ and DH10B. This plasmid has a single
promoter, from the Cam gene, driving the expression of both the Cam
and T7 1.2 genes. This design circumvents the need for two separate
promoters and results in constitutive expression of the T7 1.2
gene, eliminating the need for IPTG induction.
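The single-operon junction can be read directly from the CMRT7 primer (SEQ ID NO:18). Assuming CMRT7 acts as the reverse primer at the 3' end of the Cam gene, its reverse complement is the sense-strand sequence at that end of the Cam PCR fragment. The Python sketch below (helper names are illustrative) shows that this sequence carries the HindIII site (AAGCTT) used to make the junction, and identifies the codon sitting immediately before the T7 1.2 start codon, taken here as the ATG CAA AGC that also begins the gp1.2-F primer (SEQ ID NO:54):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
HINDIII = "AAGCTT"          # HindIII recognition site (cuts A^AGCTT)
T7_1_2_START = "ATGCAAAGC"  # first codons of T7 1.2, as in gp1.2-F (SEQ ID NO:54)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# CMRT7 primer as printed above (SEQ ID NO:18), 5'->3', space removed.
CMRT7 = "CAGACTGTGCAAGCTTTGCATTTACGCCCCGCCCTGCCACTCA"

sense = reverse_complement(CMRT7)   # sense strand at the 3' end of the Cam fragment
atg = sense.find(T7_1_2_START)
stop_codon = sense[atg - 3:atg]     # codon immediately upstream of the T7 1.2 ATG

print(sense)
print("codon before T7 1.2 ATG:", stop_codon)
print("HindIII site at sense index", sense.find(HINDIII))
```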
[0268] Additional vectors with alternative restriction sites were
constructed in this Example. The multiple cloning site of pCTMCS
was modified to add an AvaII restriction site, using PCR primers
AVAL (5'-TCCTCTAGAGTCGACCTGCAGGCA-3', SEQ ID NO:19) and AVAR
(5'-CCGGGTACCGAGCTCGAATTCTAGCA-3', SEQ ID NO:20), which were
designed so as not to disrupt the reading frame of the T7 1.2 gene.
The resulting plasmid was designated pCTA1. The enzyme AvaII was
chosen for its ability to leave a three base extension, which
alkaline phosphatase is expected to use very efficiently as a
substrate for dephosphorylation, decreasing the likelihood of
vector re-ligation. Further, filling in the three base extension
with T4 DNA polymerase and dNTPs results in generation of a triplet
codon, which will not disrupt the reading frame of the T7 1.2 gene
in those cases in which re-ligation of the vector does occur,
retaining the positive selection against the re-ligated,
non-recombinant vector.
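Why a filled-in, re-ligated AvaII site keeps the T7 1.2 reading frame can be checked with a short model. AvaII cuts G^GWCC (W = A or T), leaving a 3-nt 5' overhang on each end; filling both overhangs with T4 DNA polymerase and joining the blunt ends duplicates those three bases, adding exactly one codon. This Python sketch tracks the top strand only, assumes a single AvaII site, and uses a hypothetical test sequence:

```python
import re

AVAII = re.compile(r"GG[AT]CC")  # AvaII site G^GWCC


def fill_in_and_religate(seq: str) -> str:
    """Model AvaII cleavage, T4 DNA polymerase fill-in of both 5' overhangs,
    and blunt-end re-ligation of a sequence containing exactly one AvaII site."""
    sites = [m.start() for m in AVAII.finditer(seq)]
    if len(sites) != 1:
        raise ValueError(f"expected one AvaII site, found {len(sites)}")
    p = sites[0]
    overhang = seq[p + 1:p + 4]      # the 3-nt GWC extension
    left = seq[:p + 1] + overhang    # left end: top strand extended by fill-in
    right = seq[p + 1:]              # right end: top strand unchanged, bottom strand filled in
    return left + right


original = "ATGAAAGGACCTGCA"          # hypothetical in-frame sequence (ATG AAA GGA CCT GCA)
religated = fill_in_and_religate(original)
print(religated, len(religated) - len(original))
```

Here the product reads ATG AAA GGA CGA CCT GCA: one extra codon, with the downstream codons still in frame.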
[0269] The single AvaII site of pCTA1 is situated at codon 13 of
the MCS T7 1.2 gene hybrid construct. The authentic second codon of
T7 gene 1.2 is located 7 codons further downstream. Thus, it is
possible that a DNA insertion at the AvaII site could disrupt the
reading frame of the downstream T7 1.2 gene, but subsequent
translation re-initiation or frameshifting could result in an
intact toxic gene product. To circumvent this possibility, three
restriction endonuclease sites were added to facilitate removal of the
T7 1.2 gene from pCTA1, creating the plasmid pCTAB4.3 (FIG. 8). A
second AvaII site was added to the 3' end of the T7 1.2 gene using
PCR primers Ava2L: ACCAAAGATCTTATTACTTCCAGTCCTTCAACTGGTCA (SEQ ID
NO:31) and Ava2R: CCTGCAGGGAGCATTTAAATCGTTGCTGGCGTTTTTCCATAGGCT
(SEQ ID NO:32). The presence of AvaII sites at both ends of the T7
1.2 gene allows its complete removal upon digestion with AvaII.
[0270] In addition, two BglII sites were incorporated within the T7
1.2 gene at codons that could be mutated without changing the
amino acid sequence, using PCR primers T7BL3:
CTGTCCTCAATACGTAACCGTATGCAATCTTTTCTTGTA (SEQ ID NO:33), T7BR3:
ATCTGGAAACCTGATTGATACTAGCACCTTCTACCA (SEQ ID NO:34), T7BL4:
TCTGAGCTCGGTACCCGGTCCTCTAGAGTCGA (SEQ ID NO:35) and T7BR4:
TCTTAGCATGGGACGTTTATATAGTGGTAATCTGGCAGCA (SEQ ID NO:36). Following
liberation of the T7 1.2 gene fragment by AvaII digestion, further
digestion with BglII will cleave the T7 1.2 gene into segments less
than 200 bases in length. This cleavage facilitates purification of
the vector backbone away from the T7 1.2 sub-fragments by
fractionation, for example, with diatomaceous earth (DE) or
precipitation with 7% 8000 MW polyethylene glycol and 10 mM
magnesium chloride (PEG8000/MgCl2).
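Each BglII site (A^GATCT) is introduced by PCR mutagenesis. Assuming (the text does not spell out this step) that each primer pair amplifies the whole plasmid outward and the blunt product is re-circularized, the new site should appear at the ligation junction, where the reverse complement of the "L" primer meets the "R" primer. Because AGATCT is palindromic, the choice of which primer is treated as forward does not change the result. A minimal Python check under those assumptions; the helper names are illustrative:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BGLII = "AGATCT"  # BglII recognition site (cuts A^GATCT)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ligation_junction(left_primer: str, right_primer: str) -> tuple[str, int]:
    """Top-strand sequence across the blunt self-ligation junction of an
    inverse-PCR product, plus the index where the two primer sequences meet."""
    left_end = reverse_complement(left_primer)
    return left_end + right_primer, len(left_end)


# Primer sequences as printed above (SEQ ID NOs:33-36), written 5'->3'.
PAIRS = {
    "T7BL3/T7BR3": ("CTGTCCTCAATACGTAACCGTATGCAATCTTTTCTTGTA",
                    "ATCTGGAAACCTGATTGATACTAGCACCTTCTACCA"),
    "T7BL4/T7BR4": ("TCTGAGCTCGGTACCCGGTCCTCTAGAGTCGA",
                    "TCTTAGCATGGGACGTTTATATAGTGGTAATCTGGCAGCA"),
}

for name, (left, right) in PAIRS.items():
    junction, seam = ligation_junction(left, right)
    site = junction.find(BGLII)
    spans_seam = site != -1 and site < seam < site + len(BGLII)
    print(f"{name}: BglII at {site}, spans ligation junction: {spans_seam}")
```

Neither primer carries a complete BglII site on its own; in both pairs the site is completed only when the two ends are joined.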
[0271] Further PCR mutagenesis reactions were employed to add a
HincII restriction site to the 3' end of the T7 1.2 gene, using PCR
primers CHp1138R: TAT AGT TAA CGC TCC CTG CAG GAC CA (SEQ ID NO:37)
and CHp1138F: GGC AGT TAA CAT TTA AAT CGT TGC TGG CGT (SEQ ID
NO:38), and to remove an unwanted HincII site between the Cam gene
and Ori using PCR primers CAp29F: TAT TGG GCC CTG ATC GGC ACG TAA
GAGG (SEQ ID NO:39) and CAp1772R: TCA TGG GCC CAA AAG ATC AAA CGA
TCC TCT TGA GA (SEQ ID NO:40). These HincII sites provide an
alternative method to excise the T7 1.2 gene while simultaneously
generating a blunt-ended vector. As shown in FIG. 8, the resulting
direct selection construct is plasmid pCTH1.4 (sequence provided in
FIG. 11, SEQ ID NO:41).
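HincII recognizes the degenerate sequence GTYRAC (Y = C or T, R = A or G) and cuts between Y and R, leaving blunt ends, which is why it can excise the T7 1.2 gene and leave a blunt-ended vector in one step. A short IUPAC-aware search (a Python sketch; the helper name is illustrative) locates the HincII site carried by CHp1138R and CHp1138F, and also the AvaII site (GGWCC, W = A or T) present in CHp1138R:

```python
import re

# IUPAC nucleotide codes needed for these recognition sequences.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "N": "[ACGT]"}


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match of a
    degenerate recognition sequence. GTYRAC and GGWCC are palindromic,
    so scanning one strand finds sites on both."""
    pattern = "".join(IUPAC[base] for base in site)
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]


# Primers as printed above (SEQ ID NOs:37-38), spaces removed, 5'->3'.
CHP1138R = "TATAGTTAACGCTCCCTGCAGGACCA"
CHP1138F = "GGCAGTTAACATTTAAATCGTTGCTGGCGT"

for name, primer in [("CHp1138R", CHP1138R), ("CHp1138F", CHP1138F)]:
    print(name, "HincII:", find_sites(primer, "GTYRAC"),
          "AvaII:", find_sites(primer, "GGWCC"))
```

The zero-width lookahead lets overlapping matches be reported, which matters only for longer or less specific patterns.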
[0272] An indirect experiment was performed to measure the level of
false negative cloning results from the T7 1.2 based suicide
vectors. The plasmid vectors pCTA1 and pCTAB4.3 are nearly
identical in structure and sequence, the primary difference between
them being the additional AvaII restriction site in pCTAB4.3 that
allows the T7 1.2 gene to be excised completely.
[0273] pCTA1 and pCTAB4.3 were restricted with AvaII,
dephosphorylated with Thermosensitive Alkaline Phosphatase, and
treated with T4 DNA polymerase to generate blunt ends. pCTAB4.3 was
further digested with BglII and purified to completely remove the
T7 1.2 gene from the Cam+Ori plasmid backbone. A direct selection
clone library was constructed from each of these treated vectors to
determine the empty vector background and false negative cloning
results. Three separate ligation reactions were prepared as
follows: 1) A no ligase, no insert control reaction to test for the
level of contaminating empty vector (pCTA1 or
pCTAB4.3/AvaII/AP-ligase); 2) A plus ligase, no insert control
reaction to check for the efficiency of 5' phosphate removal (pCTA1
or pCTAB4.3/AvaII/AP+ligase); 3) A plus ligase, plus insert
reaction to test the cloning efficiency (pCTA1 or
pCTAB4.3/AvaII/AP+λRsaI+ligase). The ligation reactions
contained 100 ng of treated vector DNA. Approximately one fifth of
each reaction was transformed into E. coli DH5αF', and an
aliquot was spread onto TY agar plates containing chloramphenicol.
The transformation results are presented in Table 11.
TABLE 11 Cloning assay to assess false negative results with or without the intact T7 1.2 gene.
Ligation reaction | # colonies/ml transformation
pCTA1/AvaII/AP - ligase | 333
pCTA1/AvaII/AP + ligase | 22,000
pCTA1/AvaII/AP + λRsaI + ligase | 493,000
pCTAB4.3/AvaII/AP - ligase | 0
pCTAB4.3/AvaII/AP + ligase | 32,600
pCTAB4.3/AvaII/AP + λRsaI + ligase | 1,530,000
[0274] The background of empty vector was similar for both treated
plasmids: 4.5% using pCTA1 (22,000/493,000 × 100) and 2.1%
using pCTAB4.3 (32,600/1,530,000 × 100). However, complete
removal of the T7 1.2 gene in the processed pCTAB4.3 case resulted
in three times as many putative recombinant clones (1,530,000 vs.
493,000). The experiment was repeated four times using fresh
preparations of the processed material with similar results,
pCTAB4.3 consistently yielding 3-4 fold more recombinant clones
than pCTA1 while maintaining a similar level of background. The
decreased number of clones from pCTA1 indicates that the T7 1.2
gene of pCTA1 generates false negatives that cannot survive to form
visible colonies. These false negatives are eliminated by removing
the T7 1.2 gene from the final vector preparation, as in the pCTAB4.3
preparation. It is important to note that the direct selection
function provided by the T7 1.2 gene is useful to reduce the
background of uncut vector.
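The percentages and the roughly threefold difference in [0274] come straight from Table 11. A Python sketch, assuming (as in the text) that the "+ ligase" count is the empty-vector background and the "+ λRsaI + ligase" count is the total:

```python
# Colony counts (per ml of transformation) copied from Table 11.
TABLE_11 = {
    "pCTA1": {"plus_ligase": 22_000, "plus_insert": 493_000},
    "pCTAB4.3": {"plus_ligase": 32_600, "plus_insert": 1_530_000},
}


def background_percent(empty: int, total: int) -> float:
    """Empty-vector colonies as a percentage of colonies from the insert ligation."""
    return 100.0 * empty / total


for vector, c in TABLE_11.items():
    print(f"{vector}: background {background_percent(c['plus_ligase'], c['plus_insert']):.1f}%")

gain = TABLE_11["pCTAB4.3"]["plus_insert"] / TABLE_11["pCTA1"]["plus_insert"]
print(f"pCTAB4.3 / pCTA1 recombinant yield: {gain:.1f}x")
```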
[0275] Additional variants of pCTH1.4 were constructed to replace
the camR gene with other selectable markers. The primers pCmO-R TTT
AGC TTC CTT AGC TCC (SEQ ID NO:53) and gp1.2-F: ATG CAA AGC TTG CAT
GCC T (SEQ ID NO:54) were used in PCR, with pCTH1.4 as a template,
to amplify a fragment consisting of all of pCTH1.4 except for the
coding sequence of the camR gene. The camR promoter and translation
initiation signals were retained in this PCR fragment ("pCmO1.2
fragment"), along with the origin of replication and all except the
five 5'-terminal codons of the T7 1.2 gene. The ampR coding
region, beginning at the initiating ATG codon and lacking any
promoter or 5' non-translated sequences, was amplified from pUC19
by the primers AmpF: ATG AGT ATT CAA CAT TTC C (SEQ ID NO:55) and
Amp1.2R: ATG CAA GCT TTG CAT TTA CCA ATG CTT AAT CAG (SEQ ID
NO:56). The genR coding region was amplified from pKGO with the
primers Gen-F2: ATG TTA CGC AGC AGC AAC GAT GTT ACG CAG CAG GGC AGT
(SEQ ID NO:57) and Gen1.2-R ATG CAA GCT TTG CAT TTA GGT GGC GGT ACT
TGG (SEQ ID NO:58). The kanR coding region was amplified from pACYC
with the primers Kan-F: ATG AGC CAT ATT CAA CGG G (SEQ ID NO:59)
and K1.2Sph-R: CTG CAG GCA TGC AAG CTT TGC ATT TAG AAA AAC TCA TCG
AG (SEQ ID NO:60). Each of the resulting PCR products contained the
five N-terminal codons of the T7 1.2 gene fused to the 3' terminus
of the respective antibiotic gene. Each PCR fragment was treated
with T4 DNA polymerase to generate blunt ends. The pCmO1.2 fragment
was then ligated in the presence of T4 polynucleotide kinase and T4
DNA ligase to the ampR and genR PCR fragments to generate the
plasmids pATH1 and pGTH2. The pCmO1.2 fragment was similarly
ligated to kanR PCR fragment; however, no kanR clones were
recovered.
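Reading each reverse primer as its reverse complement shows the fusion described above: the C-terminal codons of the antibiotic gene, a stop codon, and then ATG CAA AGC TTG CAT, the same five codons that begin the gp1.2-F primer (SEQ ID NO:54). A Python sketch of that check; the helper names are illustrative:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Reverse primers as printed above (SEQ ID NOs:56, 58, 60), spaces removed, 5'->3'.
REVERSE_PRIMERS = {
    "Amp1.2R": "ATGCAAGCTTTGCATTTACCAATGCTTAATCAG",
    "Gen1.2-R": "ATGCAAGCTTTGCATTTAGGTGGCGGTACTTGG",
    "K1.2Sph-R": "CTGCAGGCATGCAAGCTTTGCATTTAGAAAAACTCATCGAG",
}
T7_1_2_FIRST_FIVE_CODONS = "ATGCAAAGCTTGCAT"  # as in gp1.2-F (SEQ ID NO:54)

for name, primer in REVERSE_PRIMERS.items():
    sense = reverse_complement(primer)
    start = sense.find(T7_1_2_FIRST_FIVE_CODONS)
    # Show the last resistance-gene codon and stop codon, then the T7 1.2 codons.
    print(f"{name}: ...{sense[start - 6:start]} | {sense[start:start + 15]}")
```

The same junction appears in all three primers, including K1.2Sph-R, so the failure to recover kanR clones is not explained by the primer design at this junction.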
[0276] Cells transformed with pUC19 or with other vectors
containing the native ampR gene express high levels of
β-lactamase, the product of the ampR gene that confers
resistance to ampicillin. Because β-lactamase is secreted by
host bacteria, it inactivates the ampicillin or carbenicillin in
the medium surrounding colonies of cells transformed with such
plasmids. Non-transformed bacteria present in this zone of
inactivated antibiotic are able to grow, resulting in "feeder" or
"satellite" colonies.
[0277] Placing the ampR coding sequence under control of the
promoter from the camR gene was expected to lower expression of
β-lactamase in transformants, thereby reducing the growth of
the surrounding ampicillin-sensitive cells. Following
transformation of cells with the putative pATH1 ligation reaction,
colonies surrounded by a low number of feeder colonies were selected
for further analysis. The plasmid contained in one of these
colonies was purified and designated pATH1. Sequence analysis of
pATH1 confirmed that the ampR coding sequence had been fused to the
camR promoter as desired. However, the sequencing data also
revealed that the ampR gene in pATH1 contained several point
mutations. Subsequent transformation of cells with purified pATH1
and its derivatives confirmed that it produced significantly fewer
feeder colonies than cells transformed with pUC19 (see Example
23).
EXAMPLE 9
Construction of Third Generation Direct Selection Multiplex Cloning
Vector
[0278] This Example describes construction of third generation
direct selection multiplex cloning vectors, which minimize
vector-driven transcription into the insert DNA and insert-driven
transcription into the vector. To avoid transcription of the insert
DNA, the vector is configured such that transcription of the ampR
coding sequence proceeds in a direction away from the cloning site.
In addition, the ampR coding sequence is followed by a
transcriptional terminator. No other promoters are present in the
vector. A transcriptional terminator has also been placed on either
side of the cloning site to block transcripts originating from
within the insert DNA. The third generation multiplex cloning
vector pAT3 was constructed by PCR (diagrammed in FIG. 15). The PCR
primers used in this construction are as follows:
TR1: CTG GCT CAC CTT CGG GTG GGC CTT TCT GCG TTG CTG GCG TTT TTC CAT (SEQ ID NO: 61)
TL1: TGT GAT TAC ATT TGG ACG CCT GTG AGC TTG AGG TTA ACG CTC CCT GCA GGA CCA (SEQ ID NO: 62)
TL2: CAC CTT CAC GGG TGG GCC TTT CTT CGG TAG AAA AGA TCA AAG GAT CTT CTT GAG (SEQ ID NO: 63)
TR2: AGC CAG TGA GTT GGT TAC AGT CCA GTT ACT CTC ACT GGA TGA TCG GCA CGT AAG AGG TTC CAA C (SEQ ID NO: 64)
TOT1435-F: GTA ATG AGG GCC CAA ATG TAA TCA CCT GG (SEQ ID NO: 65)
T7-1F: CCT GAA TGA TAT CAA GCT TGA ATT CGT TAA CGG CAC CCC AGG CTT TAC AC (SEQ ID NO: 66)
T7-422R: CTG ATT TAA ATG GTC AGT ATT GAG CGA TAT CTA GAG AAT TCG TCG ACT TAC TTC CAG TCC TTC AAC TGG (SEQ ID NO: 67)
TAmp423-F: TAC CTG ACC TCC ATA GCA GAA AGT CAA AAG CCT CCG ACC GGA GGC TTT TGA CTT GAT CGG CAC GTA AGA GGT TC (SEQ ID NO: 68)
Amp-1454R: CAT TTG GGC CCT CAT TAC CAA TGC TTA ATC AG (SEQ ID NO: 69)
TOT-1435F: GTA ATG AGG GCC CAA ATG TAA TCA CCT GG (SEQ ID NO: 70)
TOT-16R: CTT GAT ATC ATT CAG GAC GAG CCT CAG ACT CCA GTG AGC GTA ACT GGA CTG TAA TCA ACT CAC TGG (SEQ ID NO: 71)
TOT-16RD: CTT GAT ATC ATT CAG GAG GAG CC (SEQ ID NO: 72)
TAmp-423FD: TAC CTG ACC TCC ATA GCA GAA A (SEQ ID NO: 73)
T7-422RD: CTG ATT TAA ATG GTC AGT ATT G (SEQ ID NO: 74)
[0279] As the first step in the construction of pAT3, PCR was used
to insert the T3Te and T7Te transcriptional terminators into
pCTH1.4 by amplification of pCTH1.4 with the primers TR1 and TL2 in
one reaction and with the primers TR2 and TL1 in a second reaction.
The resulting TR1/TL2 fragment contains the origin of replication
from pCTH1.4, flanked by half of the T3Te terminator at one end and
half of the T7Te terminator at the other end. The TR2/TL1 fragment
contains the remaining portion of pCTH1.4, including the camR gene
and T7 1.2 gene, flanked by the remaining half of the T3Te
terminator at one end and the remaining half of the T7Te terminator
at the other end. The fragments were ligated to each other and
transformed into DH10B cells. A plasmid containing the fragments
ligated in the proper orientation to join the complementary
portions of each terminator was designated pCTTTO-6. Sequence
analysis of pCTTTO-6 revealed that it lacked a single base pair in
the T7Te terminator region; however, the deletion was not in the
stem-loop structure of the T7Te terminator that is considered
critical to its function.
[0280] The primers TOT-1435F and TOT-16R were used to amplify a DNA
fragment ("T-Ori-T" fragment) containing the T7 terminator, the
origin of replication, and the T3 terminator from the plasmid
pTTTO. This PCR was successful only upon lowering the annealing
temperature of the reaction to 40 °C. The T7 and T3
terminators in the T-Ori-T fragment are oriented such that they
terminate transcripts entering from either side of this fragment.
The pLac/T7 1.2 fragment, consisting of the lacZ promoter fused to
the T7 1.2 gene, was amplified from the plasmid pT7 1.2 by PCR with
the primers T7-1F and T7-422R. The primer T7-1F shares 16 bases of
homology with TOT-16R; thus, the 16 bp constituting the 3' end of
the T-Ori-T fragment are identical to the 16 bp at the 5' end of
the pLac/T7 1.2 fragment. The T-Ori-T and pLac/T7 1.2 fragments
were gel purified, mixed, and added to the primers TOT-1435F and
T7-422R in a PCR. The overlap present in these two fragments allows
them to anneal to each other in the PCR. The resulting fusion of
the two fragments is designated the "T-Ori-T-pLac/T7 1.2" fragment.
A fragment containing the ampR coding region ("ampR fragment") was
PCR amplified from pATH1 with the primers TAmp-423F and Amp-1454R.
Because the primers Amp-1454R and TOT-1435F share 19 bases of
homology, the 19 bp constituting the 3' end of the ampR fragment
are identical to the 19 bp at the 5' end of the T-Ori-T-pLac/T7 1.2
("TOT-T7") fragment. The TOT-T7 and ampR fragments were gel purified, mixed, and added to the
primers TAmp-423FD and T7-422RD in a PCR to create the fusion
fragment "Amp-T-Ori-T-pLac/T7 1.2". This fragment was present as a
faint band in the PCR products. It was gel purified and
re-amplified with the same primers to generate a more intense band,
which was gel purified, treated with T4 DNA polymerase, and
circularized by self-ligation in the presence of T4 polynucleotide
kinase and T4 DNA ligase. The T7 and ampR fragments each contained
a portion of the TonB terminator, so the intact TonB terminator was
formed at the junction of the two fragments. The ligated fragment
was transformed into DH10B cells, and plasmid DNA was isolated from
an ampicillin resistant colony.
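The 16-bp overlap that lets the T-Ori-T and pLac/T7 1.2 fragments prime on each other can be checked from the primer sequences: the 3' end of the T-Ori-T fragment is the reverse complement of TOT-16R, and the 5' end of the pLac/T7 1.2 fragment is T7-1F. A Python sketch; the function names are illustrative, not from the source:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def end_overlap(upstream_end: str, downstream_start: str) -> int:
    """Length of the longest suffix of upstream_end that equals a prefix of
    downstream_start, i.e. the shared sequence available for overlap extension."""
    for k in range(min(len(upstream_end), len(downstream_start)), 0, -1):
        if upstream_end[-k:] == downstream_start[:k]:
            return k
    return 0


# Primers from the list above (SEQ ID NOs:71 and 66), spaces removed, 5'->3'.
TOT_16R = "CTTGATATCATTCAGGACGAGCCTCAGACTCCAGTGAGCGTAACTGGACTGTAATCAACTCACTGG"
T7_1F = "CCTGAATGATATCAAGCTTGAATTCGTTAACGGCACCCCAGGCTTTACAC"

t_ori_t_3prime = reverse_complement(TOT_16R)   # sense strand at the T-Ori-T 3' end
k = end_overlap(t_ori_t_3prime, T7_1F)
print(k, T7_1F[:k])
```

The same function can be pointed at any pair of fragments intended for overlap-extension PCR before ordering primers.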
[0281] As a functional test of the T7 1.2 gene, approximately 200
pg of intact pAT3 was transformed into DH10B (F-minus) and MC12
(F') cells. The DH10B cells are expected to show no selection
against this plasmid, regardless of whether the T7 1.2 gene is
expressed, since they lack the F plasmid required for selection.
The MC12 cells are expected to show selection only when expression
of the T7 1.2 gene is induced (e.g., by IPTG). The results indicated
that the T7 1.2 gene functioned as expected. The DH10B cells
yielded approximately 5 × 10^9 colonies per µg of
plasmid transformed, regardless of the presence of IPTG, which is
the expected efficiency of transformation. The MC12 cells also gave
about 5 × 10^9 colonies per µg of plasmid transformed when
the cells were plated in the absence of IPTG, but only
2.5 × 10^7 colonies per µg transformed when the cells were
plated in the presence of IPTG. Moreover, the MC12 colonies that
grew in the presence of IPTG were significantly smaller than the
DH10B transformants or the MC12 transformants that grew in the
absence of IPTG, confirming the deleterious effects of expressing
the T7 1.2 gene product.
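The strength of the IPTG-dependent selection in MC12 follows from the two transformation efficiencies quoted above. A small Python calculation (the helper name is illustrative) gives the fold reduction and, assuming the whole transformation were plated, the number of colonies 200 pg would yield at 5 × 10^9 colonies per µg:

```python
def expected_colonies(colonies_per_ug: float, dna_pg: float) -> float:
    """Colonies expected from dna_pg picograms at a given efficiency (1 ug = 1e6 pg)."""
    return colonies_per_ug * dna_pg / 1e6


no_iptg = 5e9      # MC12 colonies per ug without IPTG (no selection)
with_iptg = 2.5e7  # MC12 colonies per ug with IPTG (T7 1.2 induced)

print(f"fold reduction with IPTG: {no_iptg / with_iptg:.0f}")
print(f"colonies expected from 200 pg without selection: {expected_colonies(no_iptg, 200):.0f}")
```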
[0282] Sequencing pAT3 revealed that the TonB terminator suffered a
6-bp deletion. The PCR primers LacO-F
(5'-GAGCTGATAACAATTTCAGACAGGAAACAGCCA, SEQ ID NO:101) and TonB-R
(5'-TCGGAGGCTTTTGACTTTCTGCTATGGAGGTCAGG, SEQ ID NO:108) were
designed to amplify a fragment of pAT3 containing a portion of the
lac promoter, the T7 1.2 gene, and a portion of the TonB
terminator. This fragment incorporated changes in the lac operator
that were expected to eliminate its function, resulting in
constitutive expression of the T7 1.2 gene. It also restored the
bases missing from the TonB terminator, returning it to its native
sequence. The primers LacO-R
(5'-ATAATTCCACACATTATACGAGCCGGAAGCATAAAG, SEQ ID NO:109) and TonB-F
(5'-CCGGAGGCTTTTGACTTGATCGGCACGTAAGA, SEQ ID NO:118) amplified the
remainder of the plasmid, incorporating additional mutations in the
lac operator and the remaining part of the TonB terminator. These
fragments were ligated to form the plasmid pAT4. Sequence analysis
indicated that pAT4 carried the repaired TonB terminator and a
mutated lac operator. Functional analysis of the T7 1.2 gene was
performed as it was for pAT3. MC12 and DH10B cells were transformed
with 200 pg of intact pAT4, and aliquots were spread on ampicillin
plates with or without IPTG. The expected high transformation
frequency of >10^9 cfu/µg was obtained in DH10B cells with or
without IPTG. MC12 cells gave a transformation frequency of
>10^9 cfu/µg without IPTG and approximately 10^7 cfu/µg
with IPTG, indicating that the T7 1.2 gene was expressed only in
the presence of IPTG, as in pAT3.
[0283] To create a constitutively active T7 1.2 gene, the primers
LACdO-F (5'-GGACTCGAGGGACGTTGCCTTACAGGAAACAGCCATGGGA, SEQ ID
NO:119) and LacO-R were used in a PCR to create a derivative of
pAT3 that deleted the entire lac operator and replaced it with an
XbaI restriction site. The resulting fragment was circularized to
form the plasmid pAT5. Functional analysis of pAT5 was performed as
it was for pAT3 and pAT4. The expected high transformation
frequency of >10^9 cfu/µg was obtained in DH10B cells with or
without IPTG. Likewise, MC12 cells gave a transformation
frequency of >10^9 cfu/µg with or without IPTG, indicating
that the T7 1.2 gene was expressed constitutively in pAT5. Since
pAT3 was the template for the PCR that created pAT5, the 6-bp
deletion of the TonB terminator from pAT3 is present in pAT5. The
primers TonB-F and TonB-R were used in a PCR to amplify a fragment
from pAT5 that contained the intact TonB terminator. The fragment
was re-circularized to form the plasmid pAT6-6. Functional testing
of pAT6-6 indicated that the T7 1.2 gene was constitutively
expressed, as it was in pAT5.
EXAMPLE 10
Construction and Use of Conditional Replication Vectors
[0284] As described in Example 3, the level of background colonies
in multiplex cloning is greatly reduced by isolating the selectable
markers from at least two independent partial source vectors.
Nonetheless, as discussed in Example 5, there remains a detectable
level of background colonies due to co-transformation with both of
the parental vectors. Example 5 illustrates that isolating
selectable markers from a direct selection vector provides one
means of decreasing this source of background. The present Example
demonstrates that the background from dual parental transformants
in a multiplex cloning reaction may be reduced by isolating at
least one of the selectable markers from a conditional replication
vector that cannot replicate in the host used for
transformation.
[0285] The replication origin of bacteriophage fd was used as the
basis of the conditional replication plasmids diagrammed in FIG. 9.
Geider et al. (Gene 33:341-349, 1985) showed that approximately 300
bp of DNA from the intergenic region of bacteriophage fd is
sufficient to act as an origin of replication in the presence of
the fd gene 2 protein, the only viral product required for phage
DNA replication. Plasmids containing the fd Ori can grow only in
those E. coli strains engineered to express the bacteriophage fd
gene 2 protein. One such strain is BHB2600 (ATCC # 47004).
[0286] The conditional replication vector pKf2 (FIG. 9) was
constructed using the PCR amplified kanamycin gene plus TonB
terminator from plasmid pKO2.3 (Example 3). The fd origin of
replication was amplified from bacteriophage fd using the flanking
oligonucleotide primers SSF1L2
(CTCTGAGAATTCATCTGCAGCTCGCCACGTTCGCCGGCTTTCCCCGTCA, SEQ ID NO:21)
and SSF1R2 (TGCACGAATTCTTGCTGCAGTTGTAAACGTTAATATTTTGTTAAAATTCGCGT,
SEQ ID NO:22). The PCR fragments were end repaired with T4 DNA
polymerase, phosphorylated using T4 kinase, ligated with T4 ligase,
and transformed into BHB2600. The correct construct was identified
by restriction analysis and by its ability to confer kanamycin
resistance on BHB2600 cells but not on DH10B cells, which lack the
gene 2 protein.
[0287] To minimize the amount of unessential vector DNA in the
final multiplex cloning preparation, a series of PCR mutagenesis
steps were used to incorporate five additional restriction sites
into pKf2. Two BamHI sites were sequentially added to the fd Ori in
a series of constructions using the PCR primers SSBL12
TCCGTAAAGCACTAAATCGGAACCCTAAAGGGAG (SEQ ID NO:42) and SSBR12
TCCTCGACCCCAAAAAACTTGATTAGGGTGATGGTTCA (SEQ ID NO:43) and PCR
primers SSBL4 CGAAAAACCGTCTATCAGGGCGATGGCCCA (SEQ ID NO:44) and
SSBR4 GATCCCTTTGACGTTGGATTCCACGTTCTTTAATAGTGGACTCTTGTTCCA (SEQ ID
NO:45). The resulting plasmid was designated pKfB4.8. These BamHI
sites were added as a means to cleave the Ori fragment into small
sub-fragments (less than 200 bp), to facilitate their removal from
the Kan fragment by digestion followed by DE fractionation or PEG
precipitation, as described in Example 8. Sequence analysis of
pKfB4.8 revealed several mutations that were corrected in a PCR
reaction using the primers K1203B-L: TCC GAA AAA CCG TCT ATC AGG
GCG ATG GCC CA (SEQ ID NO:46) and K1203B-R: TCC CTT TGA CGT TGG AGT
CCA CGT TGT TT (SEQ ID NO:47). An additional BamHI site was
incorporated into the Ori sequence using the PCR primers B1310B-L:
CTT TTG TCA TTT TCT GCT TAC TG (SEQ ID NO:48) and B1310-R: GAT CCT
TAT AAA TCA AAA GAA TAG GCC GA (SEQ ID NO:49). The resulting
plasmid was designated pKf7-1.
[0288] Subsequently, the PCR primers KHc1032R: TCA TGT TAA CCA GGA
ATC TGG ATC CTG CAG CGC C (SEQ ID NO:50); KHc1047F: TAT AGT TAA CGC
AGC TCG CCA CGT TCG CC (SEQ ID NO:51); KHp1399F: TAC TGT CGA CGC
ATA TCT GGA TCC TGC AGC CGA TAC (SEQ ID NO:52); and KHp1384R: GGA
GGT CGA CGC AGT TGT AAA CGT TAA TA (SEQ ID NO:17) were used to add
HincII restriction sites to allow the option of excising the fd Ori
from the Kan gene by HincII digestion, which leaves blunt ended
fragments. The resulting construct, designated pKfH1, was confirmed
by restriction analysis and DNA sequence analysis.
[0289] The TonB terminator is present in the pKf series of plasmids
(e.g., pKf2, pKO2.3, pKfB4.8, pKfH1) and the pAT series of plasmids
(e.g., pAT3, pAT4, pAT5, pAT6, pATBst, pATR-G, pAR-G, and
others). Consequently, duplex plasmids containing pKf and pAT
vectors would have two copies of the TonB terminator. Since
multiple copies of a DNA fragment within a plasmid may lead to
instability (e.g. rearrangement or deletion), the TonB terminator
of pKfH1 was replaced with the rrnB1 terminator. The primers
KfR-990R: TCT TTC GAC TGA GCC TTT CGT TTT ATT TGA TTA GAA AAA CTC
ATC GAG CAT C (SEQ ID NO:75) and KfR-991F: CTG AGC CTT TCG TTT TAA
TCT GGA AAA ACC ACC CTG GCG CTG CAG GTT CCA GAT TCC (SEQ ID NO:76)
were used in a PCR with pKfH1 as a template. The resulting fragment
was re-circularized to generate the plasmid pKfHR.
[0290] The conditional replication vector pAf4 (FIG. 9) was
constructed using the PCR amplified ampicillin gene from plasmid
pACO3 described above. The fd origin of replication was amplified
from bacteriophage fd using the flanking oligonucleotide primers
ssf1L2 (SEQ ID NO:21) and ssf1R2 (SEQ ID NO:22). The PCR fragments
were end repaired with T4 DNA polymerase, phosphorylated using T4
kinase, ligated with T4 ligase, and transformed into BHB2600. The
correct construct was identified by restriction analysis and the
inability to transform DH10B cells to ampicillin resistance.
[0291] The present Example illustrates a duplex cloning experiment
in which pCTAB4.3 (see FIG. 8) was digested with BglII and AvaII,
dephosphorylated using thermosensitive alkaline phosphatase, and
end-repaired using T4 DNA polymerase (abbreviated pCTAB/BATT in
Table 12). The conditional replication vector pKf7.1 (see FIG. 9)
was BamHI restricted, dephosphorylated, and end-repaired with T4
DNA polymerase (abbreviated pKf7.1/BTT in Table 12). These two
vector preparations were mixed in equimolar amounts in the
presence or absence of ligase and insert DNA. The total amount of
vector DNA in each ligation reaction was 100 ng, and approximately
one fifth of each reaction was used for transformation of E. coli
DH5αF' cells. An aliquot was plated on TY agar plates
containing cam plus kan to assay the background and efficiency of
duplex cloning. Another aliquot was plated on cam alone to assay
cloning into pCTAB alone. The results are shown in Table 12.
TABLE 12 Duplex cloning results using direct selection plus conditional replication vector preparations.
Ligation reaction | Antibiotic plate | # colonies/ml transformation
pCTAB/BATT + pKf7.1/BTT - ligase | cam + kan | 0
pCTAB/BATT + pKf7.1/BTT - ligase | cam | 14
pCTAB/BATT + pKf7.1/BTT + ligase | cam + kan | 0
pCTAB/BATT + pKf7.1/BTT + ligase | cam | 380
pCTAB/BATT + pKf7.1/BTT + λRsaI + ligase | cam + kan | 7,200
pCTAB/BATT + pKf7.1/BTT + λRsaI + ligase | cam | 440,000
[0292] The pCTAB/BATT+pKf7.1/BTT-ligase reaction is a control
containing only the selectable marker fragments to test the degree
of empty vector background contamination. The background of intact
pCTAB vector alone is observed on the cam only plates, whereas no
background is detectable on cam plus kan plates. The
pCTAB/BATT+pKf7.1/BTT+ligase is a control to test the efficiency of
dephosphorylation to inhibit direct ligation of the selectable
markers. Plating this reaction on cam alone reveals the low
background due to pCTAB self ligation, which is less than 0.1%
(380/440,000). The lack of colonies when plated on cam plus kan
demonstrates that neither partial source nucleic acid is capable of
producing background colonies. This Example demonstrates the use of
a direct selection vector and a conditional replication vector to
provide the components of a complete duplex cloning vector mix,
which is capable of reducing background transformation to an
undetectable level.
[0293] This duplex cloning experiment resulted in approximately
7,200 recombinant clones. The ratio of empty vector versus
recombinant plasmid colonies was at most approximately 0.014%
(1/7,200 × 100). Although no colonies were detected on the cam
plus kan plates from the empty vector control reaction
(pCTAB/BATT+pKf7.1/BTT+ligase), the number 1 was used in this
calculation to approximate the maximum likely frequency.
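The two percentages in [0292] and [0293] are ratios of Table 12 counts; where no colonies were seen, the text substitutes a count of 1 to bound the frequency. A Python sketch of that convention (the function name and flag are illustrative):

```python
def percent(background: int, recombinants: int, floor_at_one: bool = False) -> float:
    """Background colonies as a percentage of recombinant colonies. With
    floor_at_one, a zero count is replaced by 1 to give an upper bound."""
    if floor_at_one:
        background = max(background, 1)
    return 100.0 * background / recombinants


# Table 12 counts (colonies/ml of transformation).
self_ligation_cam = percent(380, 440_000)              # cam plates
duplex_bound = percent(0, 7_200, floor_at_one=True)    # cam + kan plates

print(f"pCTAB self-ligation on cam: {self_ligation_cam:.3f}%")
print(f"duplex empty-vector frequency, upper bound: {duplex_bound:.3f}%")
```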
EXAMPLE 11
Sequence Analysis of a Lambda DNA Multiplex Clone Library
[0294] The desired structure of recombinant plasmid clones produced
in a duplex cloning experiment, such as that described in Example
10, is a circular DNA molecule consisting of two segments of insert
DNA separated by the Kan selectable marker on one side and the Cam
plus Ori marker on the other side (FIG. 10). The multiplex
sequencing primers KanL4 (KAN-L4: ATC TTG TGC AAC GTG ACA TCA GAG,
SEQ ID NO:23) and KanR2 (KAN-R2: CAG AAA GTC AAA AGC CTC CGA C, SEQ
ID NO:24) are situated within the Kan marker such that they prime
sequencing reactions that read the DNA adjacent to this marker.
These primers are designated KanL and KanR in FIG. 10. Similarly,
the CamL (CamL: CAG TAC TGC GAT GAG TGG CAG, SEQ ID NO:25) and
C1178R (C1178R: GAT TTT TGT GAT GCT CGT CAG G, SEQ ID NO:26)
primers are situated within the Cam plus Ori marker such that they
prime sequencing reactions that read the DNA adjacent to this
marker. These primers are designated CamL and CamR in FIG. 10.
Therefore, in a recombinant plasmid assembled in the desired
manner, all four of these primers are expected to yield DNA
sequence reactions corresponding to insert DNA.
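The primer layout of FIG. 10 can be modelled as a circular list of segments, which makes explicit that each insert is read from both of its ends. This is a minimal Python sketch; it assumes a hypothetical convention in which each marker's "L" primer reads into the segment preceding the marker and its "R" primer into the segment following it. If marker orientation is not fixed during ligation, the L/R labels may swap, but each marker's two primers still read the two flanking segments. Segment names are placeholders.

```python
from typing import Dict, List, Tuple


def segments_read(plasmid: List[str], primers: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """Return the segment each primer reads into on a circular plasmid.

    plasmid: segment names in circular order.
    primers: primer name -> (marker segment, 'L' or 'R'); 'L' reads into the
    segment before the marker, 'R' into the segment after it.
    """
    n = len(plasmid)
    result = {}
    for name, (marker, side) in primers.items():
        i = plasmid.index(marker)
        step = -1 if side == "L" else 1
        result[name] = plasmid[(i + step) % n]  # modulo wraps around the circle
    return result


duplex = ["Kan", "insert 1", "Cam+Ori", "insert 2"]
primers = {
    "KanL": ("Kan", "L"),
    "KanR": ("Kan", "R"),
    "CamL": ("Cam+Ori", "L"),
    "CamR": ("Cam+Ori", "R"),
}
for primer, segment in segments_read(duplex, primers).items():
    print(primer, "->", segment)
```

In this model KanR and CamL both read insert 1, and KanL and CamR both read insert 2, so all four reactions land on insert DNA.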
[0295] To confirm that the multiplex cloning scheme generated the
desired recombinant plasmid constructs, 50 randomly picked clones
from Example 10 were sequenced with each of the four sequencing
primers described above. The colonies were grown in 2 ml of
Terrific Broth at 37 °C overnight, the DNA was purified by
alkaline lysis treatment, and each clone was subdivided into four
reactions, one for each of the sequencing primers. The four
resultant DNA sequences from each clone were compared to that of
the known intact lambda DNA (GenBank Accession Number J02459) using
the BLAST program of the NCBI (Lipman et al., PNAS, USA, 86:4412,
1989). Analysis of all 200 DNA sequences (50 clones sequenced using
4 flanking primers) revealed a 100% frequency of λ RsaI
inserts at each of the cloning sites, a 0% frequency of empty
insertion sites, and a 0% frequency of multiple marker inserts. As
expected, many of the insert DNA segments consisted of multiple
independent λ RsaI fragments ligated into a larger fragment,
reflecting the small size and large number of λ RsaI
fragments in the reaction. Significantly, vector DNA was not
detected in any of the inserts. Thus, the duplex cloning experiment
in Example 10 produced the desired experimental results of one
foreign insert in each of two cloning sites in 100% of the
recombinant clones.
EXAMPLE 12
Multiplex (Triplex) Cloning with Second Generation Direct Selection
Vectors and Conditional Replication Vectors
[0296] This Example describes triplex cloning with second
generation direct selection vectors and conditional replication
vectors. In particular, pCTA1 (see FIG. 8) was digested with
AvaII, dephosphorylated using thermosensitive alkaline phosphatase,
and end-repaired using T4 DNA polymerase (abbreviated pCTA1/ATT in
Table 13). pCTA1/ATT was mixed with the conditional replication
vectors pAf4 and pKf2 that had been BamHI restricted,
dephosphorylated, and end-repaired with T4 DNA polymerase (pAf4/BAT
and pKf2/BAT, respectively). The amount of vector DNA in the
ligation reaction was 200 ng, and approximately one fifth of the
reaction was used to transform E. coli DH5αF' cells. An
aliquot of the transformed cells was plated on TY agar plates
containing cam plus kan. The results of this assay are presented in
Table 13.
TABLE 13. Multiplex cloning using direct selection and conditional replication vectors.
Ligation reaction | # colonies/ml transformation
pCTA1/ATT + pAf4/BAT + pKf2/BAT + ligase | 0
pCTA1/ATT + pAf4/BAT + pKf2/BAT + λRsaI + ligase | 7600
[0297] The results presented in Table 13 indicate that this assay
resulted in 7600 putative recombinant clones with no detectable
background. Thus, the frequency of empty vector versus insertion
ligation events was no more than about 0.013% (1/7600 × 100).
Although no colonies were observed in the absence of insert DNA, the
number 1 was used to estimate the maximum likely frequency.
[0298] The frequency of multiple selectable marker insertion events
was estimated by restriction analysis using the unique eight-base
restriction site associated with each selectable marker. The
results of this analysis indicated that 11/64, or 17.2%,
of the clones had multiple inserts. These 11 multiple inserts would
render 22 sequence reactions unreadable, representing 5.7% of the
384 possible reactions (64 clones × 6 reactions per clone).
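The percentages in [0298] can be reproduced directly. This is a minimal Python sketch; the function name is hypothetical, the figure of two unreadable reactions per affected clone is inferred from the text's 11 clones and 22 reactions, and six reactions per clone matches the 64 × 6 = 384 total given above.

```python
def multiple_insert_impact(clones_tested: int, affected_clones: int,
                           reactions_per_clone: int,
                           unreadable_per_affected: int = 2):
    """Return (fraction of affected clones, unreadable reactions,
    total reactions, fraction of reactions unreadable)."""
    total = clones_tested * reactions_per_clone
    unreadable = affected_clones * unreadable_per_affected
    return affected_clones / clones_tested, unreadable, total, unreadable / total


# Triplex vector: 64 clones analysed, 11 with multiple markers, 6 reactions each.
frac_clones, lost, total, frac_lost = multiple_insert_impact(64, 11, 6)
print(f"{frac_clones:.1%} of clones; {lost}/{total} reactions ({frac_lost:.1%}) unreadable")
# prints: 17.2% of clones; 22/384 reactions (5.7%) unreadable
```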
EXAMPLE 13
Multiplex Sequencing from a Multiplex Cloning Vector
[0299] This Example describes multiplex sequencing from a
multiplex cloning vector. A 7.0 kb plasmid, pACR4 (see, e.g., Example
7) was isolated and combined with 6 primers in a fluorescent
dye-terminator cycle sequencing reaction. The table below lists the
primers used, the general location of their binding sites, and the
results obtained from automated sequence analysis. The results are
presented as either single peaks, meaning clear sequence data was
obtained, or multiple peaks, indicating the lack of interpretable
results.
TABLE 14. Multiplex sequencing primers and results from sequencing electropherograms.
Sample | Primer(s) | Streptavidin purified | Primer location | Sequence reaction peaks
Sample 1 | All 6 primers | No | Various locations | Multiple
Sample 2 | All 6 primers | Yes | Various locations | Single
Sample 3 | Biotin forward | No | 3' of pUC19 MCS | Single
Sample 4 | AMPL6 | No | 5' end of amp gene | Single
Sample 5 | PST CW | No | Middle of amp gene | Single
Sample 6 | PST CCW | No | Middle of amp gene | Single
Sample 7 | CML6 | No | 5' end of cam gene | Single
Sample 8 | Reverse | No | 5' of pUC19 MCS | Single
[0300] The Applied Biosystems AmpliTaq FS DNA polymerase and
rhodamine dye terminator chemistry were used in this experiment.
One of the 6 primers contained a biotin at the 5' end ("Biotin
forward" in Table 14). After the cycle sequencing reaction,
streptavidin-coated paramagnetic beads and high salt binding buffer
were added to bind the single modified primer. The reaction tube
was placed in a magnetic field, the unbound material was aspirated,
and the bound material was washed with a low salt buffer. The
purified material was analyzed on an Applied Biosystems 373A DNA
sequencer.
[0301] The mixture of six primers in a single sequencing reaction
resulted in numerous overlapping peaks and unintelligible data when
loaded onto a single lane. In contrast, the streptavidin-captured
product from the "Biotin forward" primer in the same six primer
reaction mix yielded well resolved peaks and intelligible data.
These results clearly demonstrate the feasibility of co-sequencing
multiple DNA fragments from a single multiplex vector.
EXAMPLE 14
Multiplex Cloning with Two Vector Components
[0302] This Example describes multiplex cloning using the direct
selection vector pAT6-6 and the conditional replication vector
pKfR. Vector components were prepared by digesting pAT6-6 with
restriction enzymes HincII and StyI and digesting pKfR with
restriction enzymes HincII and Sau96I. The reactions were extracted
with five volumes of 6M guanidine and 100 mM Tris pH 6.5, adsorbed
to diatomaceous earth, and washed with 0.4 M NaCl, 20 mM Tris pH
7.5, 0.5 mM EDTA, and 50% ethanol. The restriction fragments were
eluted with distilled water, and the vector components were
differentially precipitated with 7% PEG8000 and 10 mM MgCl2.
The fragments were dephosphorylated by treatment with calf
intestinal phosphatase, extracted with phenol and chloroform, and
precipitated again with PEG8000 and MgCl2. The processed
vector components are designated pAT66/HSC (SEQ ID NO:85) and
pKfR/HSC (SEQ ID NO:86), respectively (see FIGS. 12B and 14).
Insert DNA was prepared by digesting phage lambda DNA with HincII
and purifying with guanidine and diatomaceous earth. After
precipitation with PEG8000 and MgCl2, the fragments were
dissolved in distilled water.
[0303] Approximately equal molar amounts of the two vector
components (185 μg of pAT66/HSC and 115 μg of pKfR/HSC) were
ligated with 500 μg of lambda/HincII fragments. Control
reactions contained the vector components ligated without insert
DNA or mixed without ligase or insert DNA. One-tenth of the
ligation reactions were transformed into MC12 cells, and aliquots
were plated onto agar plates containing carbenicillin or ampicillin
plus kanamycin. The results are shown in Table 15.
TABLE 15. Duplex cloning results using direct selection plus conditional replication vector preparations.
Ligation reaction | Antibiotic plate | # Colonies/ml transformation
pAT66/HSC + pKfR/HSC - ligase | amp + kan | 0
pAT66/HSC + pKfR/HSC + ligase | amp + kan | 5
pAT66/HSC + pKfR/HSC + λHcII + ligase | amp + kan | 7,600
pAT66/HSC + pKfR/HSC - ligase | carb | 60
pAT66/HSC + pKfR/HSC + ligase | carb | 700
pAT66/HSC + pKfR/HSC + λHcII + ligase | carb | 1,500,000
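Equimolar vector mixes such as the one in [0303] are usually set by converting mass to moles with an average mass per base pair. This is a minimal Python sketch assuming about 650 g/mol per base pair of double-stranded DNA (a common approximation, not a value given in this Example); the function names are hypothetical. Because the fragment lengths are not stated here, only the length ratio implied by equimolar masses is computed.

```python
AVG_MASS_PER_BP = 650.0  # g/mol per base pair of dsDNA (common approximation; assumption)


def pmol_dsdna(mass_ng: float, length_bp: float) -> float:
    """Approximate picomoles of double-stranded DNA from mass (ng) and length (bp)."""
    return mass_ng * 1000.0 / (length_bp * AVG_MASS_PER_BP)


def implied_length_ratio(mass_a: float, mass_b: float) -> float:
    """If two fragments are equimolar, their length ratio equals their mass ratio."""
    return mass_a / mass_b


# Masses from [0303]: 185 (pAT66/HSC) and 115 (pKfR/HSC); units cancel in the ratio.
ratio = implied_length_ratio(185, 115)
print(f"pAT66/HSC would be about {ratio:.2f} times the length of pKfR/HSC")
```

For instance, pmol_dsdna(1000, 1000) gives about 1.54 pmol for 1 μg of a 1 kb fragment.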
[0304] This cloning experiment resulted in approximately 7600
recombinant duplex clones and 1,500,000 recombinant single-insert
clones. The pAT66/HSC+pKfR/HSC-ligase reaction is a control
containing only the selectable marker fragments to test the degree
of empty vector background contamination. The background of intact
pAT6-6 vector, which is observed on the carb-only plates, was
approximately 0.004% (60/1,500,000), whereas no background was detectable on
amp plus kan plates. The pAT66/HSC + pKfR/HSC + ligase reaction is a control to
test the efficiency of dephosphorylation in inhibiting direct
ligation of the selectable markers. Plating this reaction on carb
alone reveals the low background due to pAT6-6 self-ligation, which
is less than 0.05% (700/1,500,000). The small number of colonies
recovered on amp plus kan demonstrates that the level of pAT66/HSC
ligation to pKfR/HSC was less than 0.07% (5/7600 × 100). This
Example demonstrates the use of a direct selection vector and a
conditional replication vector to provide the components of a
complete duplex cloning vector mix, which is capable of reducing
background levels in transformation to extremely low levels.
EXAMPLE 15
Construction of a Fixed Orientation Multiplex Cloning Vector
[0305] The present Example describes fixed orientation multiplex
cloning, in which two vector fragments are assembled in a defined
orientation relative to each other upon ligation with two insert
DNA fragments. The vector pATBAG was constructed by first
amplifying the T7 1.2 gene from pAT6-6 with the primers BXTLGA: AAC
CAT AAA ATT GGC ACC CCA GGC TTT ACA CTT TAT GCT (SEQ ID NO:77) and
BXTRGG: GAC CCA CGG GGC TGG TTA CTT CCA GTC CTT CAA CTG GTC ATA CA
(SEQ ID NO:78). The resulting fragment, containing a T7 1.2 gene
flanked by BstXI cloning sites, was cloned into a preparation of
pAT66/HSC, generating the intact pATBAG vector. The vector pKfRBAG
was constructed by first amplifying the fd replication origin of
pKfHR with the primers KBst-1053F 5'-AACCCACGGGGATGGGCAGCTCGCCACGTTCGCCGGCTT
(SEQ ID NO:79) and KBst-1433R 5'-GACCATAAAACTGGGCAGTTGTAAACGTTAATATTTTG
(SEQ ID NO:80). The resulting
fragment, containing the fd replication origin flanked by BstXI
cloning sites, was cloned into a preparation of pKfR/HSC,
generating the intact vector pKfRBAG. Preparations of pATBAG and
pKfRBAG were digested with the restriction enzymes StyI and Sau96I,
respectively, and further digested with the restriction enzyme
BstXI. The resulting fragments were treated with calf intestinal
phosphatase to generate the vector components ATBAG/BSC and
KfBAG/BSC. The ATBAG/BSC and KfBAG/BSC vector components each have
a four-base extension of . . . GGGG-3' on one end and . . . AAAA-3'
on the other end (see FIG. 16B).
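The termini described here are chosen so that vector ends can base-pair only with insert ends, never with each other. This is a minimal Python sketch, assuming the usual rule that two extensions of the same polarity (both 3' here) can pair when one is the reverse complement of the other; it checks base-pairing only and does not model ligation efficiency, which [0308] notes may not be optimal for these termini. The names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def can_anneal(ext_a: str, ext_b: str) -> bool:
    """Two 3' extensions can base-pair when one is the reverse complement of the other."""
    return ext_a == revcomp(ext_b)


ENDS = {
    "vector GGGG-3'": "GGGG",
    "vector AAAA-3'": "AAAA",
    "C4 insert CCCC-3'": "CCCC",
    "T4 insert TTTT-3'": "TTTT",
}

for name_a, ext_a in ENDS.items():
    partners = [name_b for name_b, ext_b in ENDS.items() if can_anneal(ext_a, ext_b)]
    print(f"{name_a} pairs with: {', '.join(partners) or 'nothing'}")
```

Each vector end finds exactly one partner class among the insert ends, so neither vector-vector nor insert-insert joining is possible in this model.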
[0306] Insert DNA for fixed orientation multiplex cloning was
generated from bacteriophage lambda DNA. The DNA was fragmented by
hydrodynamic shearing and aliquotted into two pools. One pool of
DNA fragments was ligated to the "C4" double-stranded linker, which
has one blunt end and one 3' overhang of CCCC. The C4
double-stranded linker was generated by annealing the primers
NotC4-Lnk: AGC GGC CGC AGA CTT GCC TGA CCA TTG ACC CC (SEQ ID
NO:81) and Not-comp: TCA ATG GTC AGG CAA GTC TGC GGC CGC T (SEQ ID
NO:82). A second pool of DNA fragments was ligated to the "T4"
double-stranded linker, which has one blunt end and one 3' overhang
of TTTT. The T4 double stranded linker was generated by annealing
the primers Not4T-Lnk: AGC GGC CGC AGA CTT GCC TGA CCA TTG ATT TT
(SEQ ID NO:83) and Not-comp (SEQ ID NO:82).
[0307] After ligation to the linkers, insert DNA fragments were
fractionated by agarose gel electrophoresis to purify fragments of
2-4 kb and to remove fragments of other sizes, including un-ligated
and self-ligated linkers. The insert fragments were purified from
the agarose gel and ligated to the ATBAG/BSC and KfBAG/BSC vector
components. The ligation reactions were transformed into MC12
cells, and transformants were selected on plates containing
ampicillin and kanamycin. A total of approximately 650
transformants per ml of transformed cells were recovered. Analysis
of 11 clones indicated that all had inserts of the same size.
Sequence analysis of 2 of these inserts showed that the inserts
were in fact identical to the sequence of the fd origin portion of
the pKfRBAG vector. These clones were therefore likely to be
derived from incomplete digestion of this vector, rather than from
actual ligation of the fd origin segment to the ATBAG/BSC and
KfBAG/BSC vector components.
[0308] This demonstration illustrates the use of a particular set
of vector termini (i.e. AAAA-3' and GGGG-3') and insert termini
(i.e. TTTT-3' and CCCC-3'), which may not be optimal for efficient
ligation. A wide variety of other termini may be used, which
conform to the general configuration for fixed orientation
multiplex cloning depicted in FIG. 16B of this Example or in FIG.
16A. Such termini need not be limited to 3' extensions, to
extensions of exactly four bases, nor to poly-A, poly-T, poly-C, or
poly-G extensions. The following vector components and insert
fragments were created to demonstrate the use of alternate termini
for fixed orientation multiplex cloning.
[0309] The vector component ATBbs was constructed by amplifying the
ampR gene, replication origin, and terminators of ATBAG/BSC with
the primers ATBB-1F: GCACCTGACCTCCTGTGTCTTCGACGAATTCTCTAGATATCGCTCAA
(SEQ ID NO:120) and
ATBB-1845R: GCAATGGTCTGTCGCCGTCTTCAACGAATTCAAGCTTGATATCATTCAGGA
(SEQ ID NO:121). The resulting fragment was
digested with the restriction enzyme BbsI, generating ATBbs. ATBbs
is analogous to ATBAG/BSC and pAT66/HSC, except that the termini of
ATBbs have an extension of 5'-TCCT on one end and 5'-GTCG on the
other end. The vector component KBsa was constructed by amplifying
the kanR gene and terminator of KfBAG/BSC with the primers KBS-1F:
GGACCTGCAAGTCGGGAGACCGACGCATATCTGGATCCTGCAGCCGATAC (SEQ ID NO:122)
and KBS-1073R: GGAATCCTGGTCCTCGAGACCAACCAGGAATCTGGAACCTGCAGCGCCA
(SEQ ID NO:123). The resulting fragment was digested with the
restriction enzyme BsaI, generating the vector component KBsa. KBsa
is analogous to KfBAG/BSC and pKfR/HSC, except that the termini of
KBsa have an extension of 5'-TCCT on one end and 5'-GTCG on the
other end. These termini are the same as those on ATBbs, but they
are not compatible with each other or with those of ATBbs,
preventing the vector components from being ligated to each
other.
[0310] One of the insert DNA fragments for fixed orientation
multiplex cloning was generated by PCR amplification of the lacZ
gene fragment of pUC19 DNA, using the primers LacBS-1F:
GGTACTTATCAGGACGAGACCCATTAGGCACCCCAGGCTTTAC (SEQ ID NO:124) and
LacBS-340R: GGTCTATTAGAGGACGAGACCTTAGCGCCATTCGCCATTCAGGCT (SEQ ID
NO:125). The resulting fragment was digested with the restriction
enzyme BsaI, leaving an extension of 5'-AGGA on each end. This
fragment, designated LacBS, can therefore be ligated to the 5'-TCCT
extension on one end of either ATBbs or KBsa.
[0311] A second pool of DNA insert fragments was generated by
amplification of the gentR gene of pKGO (see FIG. 6) with the
primers GenBS-1F: GGAACTTCGACGACCGAGACCAATTGACATAAGCCTGTTCGGTT
(SEQ ID NO:126) and
GenBS-765R: GTGTACAATGCGACCGAGACCTTAGGTGGCGGTACTTGGGTCGAT
(SEQ ID NO:127). The resulting fragment was digested
with the restriction enzyme BsaI, leaving an extension of 5'-CGAC
on each end. This fragment, designated GenBS, can therefore be
ligated to the 5'-GTCG extension on one end of either ATBbs or
KBsa.
[0312] A 10-μl ligation reaction was performed containing
approximately equal molar amounts of ATBbs, KBsa, LacBS, and GenBS.
The amount of each DNA fragment in the ligation reaction was
approximately 270 ng, 150 ng, 105 ng, and 45 ng, respectively. One
tenth of the reaction was used to transform MC12 cells, which were
spread on agar plates containing ampicillin and kanamycin
(amp+kan); or onto plates containing kanamycin, gentamycin, X-Gal,
IPTG, and ampicillin (KGXIA); or onto plates containing kanamycin,
gentamycin, X-Gal, and IPTG (KGXI). The amp+kan plates select for
any plasmids containing both vector fragments. The KGXIA plates
select for plasmids containing both vector components as well as
the GenBS insert. They allow for visual screening of the presence
of the LacBS insert. The KGXI plates select for the KBsa vector
component as well as for the GenBS insert, but they do not select
for the ATBbs component. They also allow for visual screening of
the presence of the LacBS insert. A control ligation reaction
containing only the vector components was also performed. Following
overnight incubation at 37 °C, the results shown in Table 16 were
observed:
TABLE 16. Fixed orientation multiplex cloning.
Ligation reaction | Plate | # Colonies/ml transformation
ATBbs, KBsa + ligase | amp + kan | 0
ATBbs, KBsa, LacBS, GenBS + ligase | amp + kan | 350,000
ATBbs, KBsa, LacBS, GenBS + ligase | KGXIA | 150,000 (blue)
ATBbs, KBsa, LacBS, GenBS + ligase | KGXI | 150,000 (blue)
[0313] The results of the ATBbs, KBsa+ligase reaction presented in
Table 16 indicate that there was no background due to self-ligation
of the vector components. In contrast, ligation of all four
fragments (ATBbs, KBsa, LacBS, GenBS), produced 350,000 colonies
per ml of transformed cells. This reaction was therefore over
46-fold more efficient than the previously described multiplex
cloning reaction employing blunt dephosphorylated ends (Example
14). Plating on KGXIA resulted in 150,000 colonies per ml. All of these
colonies must have contained both the LacBS and the GenBS
insert fragments, since they were blue and resistant to gentamycin.
The reduced number of colonies relative to the amp+kan result may
reflect a deleterious effect of selection for triple antibiotic
resistance in addition to expression of the lacZα peptide.
Plating on KGXI likewise produced 150,000 blue colonies per ml,
indicating that selection for both vector components was not
necessary. The configuration of the ends of the vector components
and the insert fragments allowed formation of a circular plasmid
only by ligation of all four DNA sequences in the order
ATBbs-GenBS-KBsa-LacBS, with the LacBS fragment being further
ligated to the ATBbs component to close the circle.
[0314] Since inclusion of the vector components is necessitated by
the configuration of the ends of the insert fragments and vector
components, the vector components serve to supply sequencing primer
binding sites and to separate individual insert DNA fragments from
each other.
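The statement that the termini permit only one circular arrangement can be checked by brute force. This is a minimal Python sketch, assuming each fragment is reduced to the two 4-base 5' extensions given in [0309] to [0311] (which end is called "left" is arbitrary), that two ends join only when their extensions are reverse complements, and that each fragment is used at most once (multimeric circles and mismatched ligation are not modelled). The names are hypothetical.

```python
from itertools import combinations, permutations, product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Each fragment as (left end, right end) 5' extensions, read 5'->3'.
FRAGMENTS = {
    "ATBbs": ("TCCT", "GTCG"),
    "KBsa": ("TCCT", "GTCG"),
    "LacBS": ("AGGA", "AGGA"),
    "GenBS": ("CGAC", "CGAC"),
}


def joins(right_end: str, left_end: str) -> bool:
    """Two 5' extensions anneal when one is the reverse complement of the other."""
    return right_end == revcomp(left_end)


def circular_orders(names):
    """Yield fragment orders that can close into a circle.

    The first fragment is held fixed in position and orientation, which removes
    rotations and mirror images; every other fragment may be flipped.
    """
    first, rest = names[0], names[1:]
    for order in permutations(rest):
        for flips in product((False, True), repeat=len(order)):
            ends = [FRAGMENTS[first]] + [
                FRAGMENTS[n][::-1] if flip else FRAGMENTS[n]  # flipping swaps the ends
                for n, flip in zip(order, flips)
            ]
            n_ends = len(ends)
            if all(joins(ends[i][1], ends[(i + 1) % n_ends][0]) for i in range(n_ends)):
                yield (first,) + order


# Try every subset of one to four fragments.
assemblies = {
    order
    for k in range(1, len(FRAGMENTS) + 1)
    for subset in combinations(FRAGMENTS, k)
    for order in circular_orders(list(subset))
}
print(assemblies)  # prints {('ATBbs', 'GenBS', 'KBsa', 'LacBS')}
```

No single fragment, pair, or triple closes in this model, and KBsa must be flipped relative to ATBbs, which corresponds to the fixed relative orientation of the two vector components. The two inserts can still close in either orientation because each carries identical extensions on both ends.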
[0315] PCR amplification plus BsaI digestion of specific insert
fragments was employed to generate the sticky ends in the present
demonstration. For more general applications (e.g., construction of
shotgun libraries), this limitation may be circumvented by
appending double-stranded oligonucleotide linkers to blunt-ended
insert pools, similar to addition of C4 and T4 linkers to sheared
lambda DNA in the initial ligations described in this Example. For
example, insert DNA is fragmented by hydrodynamic shearing and
aliquotted into two pools. One pool of DNA fragments is ligated to
the "AGGA" double-stranded linker, which has one blunt end and one
5' overhang of AGGA. The AGGA double-stranded linker is generated
by annealing the primers AGGA-Lnk: AGC GGC CGC AGA CTT GCC TGA CCA
TTG AAG GA (SEQ ID NO:128) and Not-comp (SEQ ID NO:82). A second
pool of DNA fragments is ligated to the "CGAC" double-stranded
linker, which has one blunt end and one 5' overhang of CGAC. The
CGAC double stranded linker is generated by annealing the primers
CGAC-Lnk: AGC GGC CGC AGA CTT GCC TGA CCA TTG ACG AC (SEQ ID
NO:129) and Not-comp (SEQ ID NO:82). After ligation to the linkers,
insert DNA fragments are fractionated by agarose gel
electrophoresis to purify fragments of a desired size range (e.g.,
2-4 kb) and to remove fragments of other sizes, including
un-ligated and self-ligated linkers. The insert fragments are
purified from the agarose gel and ligated to vector components
(e.g., ATBbs and KBsa) that have one end compatible with each pool of
insert DNA fragments. The ligation reactions are transformed into
MC12 cells, and transformants are selected on plates containing
ampicillin and kanamycin. The number of insert fragment pools and
vector components is not limited by the availability of selectable
markers, since the vector components and insert fragments can be
configured to permit formation of a closed circular plasmid only by
ligation of a particular insert fragment between two particular
vector components, with that insert fragment acting as a "bridge"
between the two vector components.
[0316] For cloning fragments of known sequence, the fixed
orientation multiplex cloning vector can be further adapted to
clone the inserts as well as the vector components in a defined
orientation. For example, PCR may be used to append a unique 5'
extension onto one end of an insert fragment and a different unique
5' extension onto the other end of the insert fragment. Likewise, a
complementary unique 5' extension can be appended onto each end of
two vector components, such that the insert ends can bind only to
these two vector components in a defined orientation. Additional
inserts and vector components are likewise configured to allow
assembly in a defined order, with each vector component and insert
fragment in a defined orientation. After ligation and
transformation, all the resulting recombinant plasmids will have an
identical structure. Such a set of vector components would be
particularly useful for fixed orientation expression multiplex
cloning, as the vector components would have promoter regions near
their termini to drive expression of insert fragments. Therefore,
insert fragments would need to be ligated in the proper
orientation for expression. Further, if multiple different
promoters are present in the circular recombinant plasmid,
the insert fragments would need to be ligated adjacent to a
particular promoter of a particular vector component, to allow one
to know which promoter will drive expression of that insert
fragment. In addition, the final recombinant plasmid should be such
that all promoters are oriented in the same direction (e.g., such
that all transcription proceeds clockwise around the plasmid).
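A set of unique junction extensions, as proposed in [0316], can be screened for ambiguous pairings before the primers are ordered. This is a minimal Python sketch; the three rules (reject self-complementary extensions, extensions repeated at two junctions, and extensions that are reverse complements of another junction's extension) are common-sense assumptions rather than a procedure stated in this application. Each junction is listed by the extension on one of its two ends, the other end being assumed to carry the reverse complement. ACAG, TTGC, and GATC are hypothetical extensions chosen only for illustration.

```python
from itertools import combinations
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overhang_problems(junctions: List[str]) -> List[str]:
    """List ambiguous pairings in a set of junction extensions (one per junction)."""
    problems = []
    for oh in junctions:
        if oh == revcomp(oh):  # palindromic: an end could pair with a copy of itself
            problems.append(f"{oh} is its own reverse complement")
    for a, b in combinations(junctions, 2):
        if a == b:  # same extension at two junctions: fragments could swap places
            problems.append(f"{a} is used at two junctions")
        elif a == revcomp(b):  # ends belonging to different junctions could pair
            problems.append(f"{a} and {b} can cross-pair")
    return problems


good = ["TCCT", "GTCG", "ACAG", "TTGC"]  # ACAG and TTGC are hypothetical
bad = ["TCCT", "GTCG", "GATC", "AGGA"]   # GATC is hypothetical
print(overhang_problems(good))
print(overhang_problems(bad))
```

An empty list means every insert end can pair with only one vector end, which is the condition for the defined order and orientation described above.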
EXAMPLE 16
Multiplex Cloning by Dual Transformation with Independent
Vectors
[0317] This Example describes the use of multiple independent
vectors to effect multiplex cloning. In this case, two vectors
function as conventional single-insert cloning vectors that are
co-transformed into competent cells. The vectors contain different
antibiotic resistance genes, as well as identical origins of
replication which are functional in the cells to be transformed.
Insert DNA is cloned into each vector to form pools of recombinant
circular plasmids. The recombinant plasmids formed from each vector
are simultaneously transformed into competent cells. Cells are
plated on media containing two antibiotics, thereby selecting for
transformants that took up both vectors. Typically less than 1% of
the intact DNA molecules capable of transformation are successful
in generating a transformant, so the frequency of the desired dual
transformation is extremely small. Further, it is generally
accepted that plasmids with similar origins of replication are
"incompatible," i.e. cannot co-exist stably within the same
cell.
[0318] The vectors pCTAB and pATH were used to demonstrate the
feasibility of dual transformation. Approximately 10 ng each of the
intact vectors were mixed and used to transform MC12 cells.
Transformants were selected by plating on cam alone to measure the
frequency of transformation with pCTAB or on cam plus carb to
measure the frequency of dual transformation. Approximately
1,000,000 pCTAB transformants were obtained on cam plates, and
about 80,000 dual transformants were obtained on cam plus carb.
Therefore, with intact plasmids the frequency of dual
transformation was approximately 12-fold lower than the frequency
of single transformation.
[0319] Multiplex cloning with recombinant libraries was
demonstrated by separately ligating lambda/HincII DNA to either
pCTAB/BATT or pATH/HSC. Approximately 500 ng of lambda/HincII DNA
was ligated to approximately 100 ng of each vector. The ligations
were mixed, and 1 .mu.l was transformed into MC12 cells. Aliquots
were spread on cam, carb, or cam+carb plates. The number of
colonies on the cam plate and on the carb plate each corresponded
to 1,800,000 colonies per ml transformed cells. The cam+carb plate
represented 2300 colonies per ml, approximately 800-fold lower than
the number of single-plasmid transformants. This Example
illustrates that dual transformation can be used to achieve
multiplex cloning, although the frequency is significantly lower
with ligation reactions than with intact purified plasmids. Dual
transformation has the disadvantage that the relative plasmid copy
number of the two plasmids may vary among different recombinant
clones or among various cultures of a single clone.
TABLE 17. Multiplex cloning by dual transformation of MC12 cells (# colonies/ml transformation).
Transformation reaction | cam | carb | cam + carb
pCTAB + pATH | 1,000,000 | n.d. | 80,000
pCTAB/BATT + λHcII + ligase and pAT66/HSC + λHcII + ligase | 1,860,000 | 1,800,000 | 2300
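The fold reductions quoted in [0318] and [0319] are ratios of single-selection to dual-selection colony counts. This is a minimal Python sketch using the counts given in the text (the 1,800,000 figure from [0319] is used for the ligation case); the function name is hypothetical.

```python
def fold_reduction(single_colonies: float, dual_colonies: float) -> float:
    """How many times fewer colonies survive double selection than single selection."""
    return single_colonies / dual_colonies


# Colonies per ml of transformed cells from [0318] and [0319].
intact = fold_reduction(1_000_000, 80_000)   # intact pCTAB + pATH: cam vs cam + carb
ligated = fold_reduction(1_800_000, 2_300)   # ligation reactions: single vs cam + carb
print(f"intact plasmids: {intact:.1f}-fold; ligation reactions: {ligated:.0f}-fold")
# prints: intact plasmids: 12.5-fold; ligation reactions: 783-fold
```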
EXAMPLE 17
Use of pAT6-6 as a Single-Insert Cloning Vector
[0320] This Example describes the use of pAT6-6 as a
low-background, highly efficient vector for cloning one fragment of
DNA per vector. Further, it illustrates the advantages of using a
vector containing terminators flanking the cloning site, such as
pAT6-6, for cloning genomic DNA of Lactobacillus helveticus, which
is AT-rich (i.e. has a relatively low GC-content). A preparation of
pAT6-6 was digested with HincII and StyI restriction enzymes, which
excise the T7 1.2 gene as two fragments. The T7 1.2 gene fragments
were removed by differential precipitation of the larger vector
fragment with 7% PEG8000 and 10 mM MgCl2. The vector fragment
was then treated with calf intestinal phosphatase and purified by
phenol/chloroform extraction and ethanol precipitation. Fifty ng of
this vector preparation (designated pAT66/HSC-a, SEQ ID NO:85, see
FIG. 12A and FIG. 14A) was ligated to 500 ng of HincII-digested
lambda DNA, self-ligated in the absence of insert DNA, or added to
ligase buffer without insert DNA or ligase. One-tenth of each
reaction was transformed into MC12 cells or DH10B cells, and
aliquots were plated on carbenicillin plates. The experiment was
repeated with two separate preparations of pAT66/HSC-a, with very
similar results. The average of the two results is shown in Table
18.
TABLE 18. Single-insert cloning with pAT66/HSC in MC12 or DH10B cells.
Ligation reaction | Cell type | # Colonies/ml transformation
pAT66/HSC-a - ligase | MC12 | 5
pAT66/HSC-a + ligase | MC12 | 0
pAT66/HSC-a + λHcII + ligase | MC12 | 181,000
pAT66/HSC-a - ligase | DH10B | 2800
pAT66/HSC-a + ligase | DH10B | 100
pAT66/HSC + λHcII + ligase | DH10B | 100,000
[0321] The results shown in Table 18 indicate that the pAT66/HSC-a
preparations gave extremely low backgrounds of uncut vector
(5/181,000 or 0.003%) or self-ligated vector (0/181,000 or 0%) when
transformed into MC12 cells, which provide selection against intact
pAT66 vector molecules. When transformed into DH10B cells, which do
not provide selection against intact pAT66 vector molecules, these
pAT66/HSC-a preparations yielded low, but significantly higher,
backgrounds of uncut vector (2800/100,000 or 2.8%) or self-ligated
vector (100/100,000 or 0.1%). The higher backgrounds seen with
transformation of DH10B cells demonstrate the utility of the T7
1.2 gene in selecting against uncut vector molecules. Observing
fewer colonies from self-ligated vector than from unligated vector
implies that the presence of ligase decreases the efficiency of
transformation. This effect has been observed previously, but the
reasons for the decrease are not known.
[0322] The preparation of pAT66/HSC-a was further treated with
HincII, SnaBI (which cuts within the T7 1.2 gene, near the StyI
site), and calf intestinal phosphatase. Fifty ng of this
preparation, designated pAT66/HSC (SEQ ID NO:85), was tested by
ligation to lambda/HcII DNA, by self-ligation, or with no ligation.
Ligation and transformation conditions were similar to those
employed previously with pAT66/HSC-a (See Table 18 above).
One-tenth of each reaction was used to transform DH10B cells, and
aliquots were spread on plates containing ampicillin or
carbenicillin. The self-ligated and un-ligated background levels
were greatly reduced, although the background level due to
self-ligated vector was still significantly higher than that
observed previously with MC12 cells. The results are listed in
Table 19.
TABLE 19. Transformation of DH10B cells with extensively processed pAT66 vector.
Ligation reaction | Cell type | # Colonies/ml transformation
pAT66/HSC - ligase | DH10B | 0
pAT66/HSC + ligase | DH10B | 80
pAT66/HSC + λHcII + ligase | DH10B | 200,000
[0323] The results shown in Table 19 indicate the background due to
uncut vector was 0% and due to self-ligated vector was 0.04%
(80/200,000).
[0324] AT-rich DNA fragments can act as transcriptional promoters
in bacteria, initiating transcription into the vector sequence,
which may interfere with vector replication or expression of drug
resistance from the vector. The AT content of the Lactobacillus
genome is approximately 65%; therefore, it is possible that the
lower transformation efficiency observed with the pUC/HC vector is
due to plasmid instability caused by transcription initiated by the
L.h. gDNA fragments. The terminators flanking the cloning sites of
pAT6-6 are employed to block such transcription.
[0325] The vector preparation pAT66/HSC was used to generate a
library of genomic DNA from the bacterium Lactobacillus helveticus
(L.h. gDNA). The genomic DNA was hydrodynamically sheared with the
HydroShear device (GeneMachines, Inc.) and repaired with the
DNATerminator Kit containing T4 DNA polymerase and T4
polynucleotide kinase (LUCIGEN, Madison Wis.). Agarose gel
electrophoresis was used to fractionate the sheared DNA. Those
fragments of 2-3 kb in length were excised from the gel and
purified. Approximately 200 ng of this genomic DNA preparation was
ligated to 50 ng of pAT66/HSC. An equal amount of the genomic DNA
was ligated to 50 ng of a preparation of pUC19 that had been
extensively treated with HincII and CIP ("pUC/HC"). One tenth of
each ligation was transformed into DH10B cells, and aliquots were
plated on carbenicillin plates. The results of plating are shown in
Table 20.
TABLE 20. L.h. gDNA library construction in pAT66/HSC and pUC19/HC.
Ligation reaction | Cell type | # Colonies/ml | Intact clones/# tested (%)
pAT66/HSC + L.h. gDNA + ligase | DH10B | 11.5 × 10^7 | 17/36 (47%)
pUC/HC + L.h. gDNA + ligase | DH10B | 0.5 × 10^7 | 27/55 (49%)
[0326] The results presented in Table 20 indicate that the
efficiency of cloning L.h. gDNA with the pAT66 vector was 23-fold
greater than that with the pUC vector. To assess the integrity of
the cloned DNA, plasmid DNA was isolated from transformants from
each vector, and its size was analyzed by agarose gel
electrophoresis. Both vectors resulted in approximately half of the
clones having inserts that were significantly smaller than the size
of the fragments in the ligation reactions. Therefore, this genomic
DNA appeared to be unstable in both vectors.
EXAMPLE 18
Construction and Use of a Low Copy Number Cloning Vector
[0327] This Example describes the construction and use of a low
copy number derivative of pAT6-6 for use in multiplex cloning or as
a single-insert vector. The origin of replication present in pAT6-6
is nearly identical to the origin of replication in pUC19, which
maintains a high plasmid copy number of about 300-500 copies per
cell. DNA fragments that are deleterious to the cell or that are
difficult to replicate may be particularly difficult to clone or
maintain in a high copy number plasmid. Such problems may be
compounded by the presence of more than one such fragment per vector,
as in the case of multiplex cloning.
[0328] The copy number of plasmids containing the pUC origin of
replication may be substantially reduced by expressing the product
of the ROP (Repressor of Primer) gene of pBR322 in the host cell.
Therefore, the ROP gene was inserted into the vector pAT6-6. The
ROP gene was amplified from pBR322 with the primers ATR-1646R: CAT
TTG GGC CCT CAT CAG AGG TTT TCA CCG TCA TCA CC (SEQ ID NO:87) and
ATR-1441G: GTG ACC AAA CAG GAA AAA ACC GCC CT (SEQ ID NO:88). The
resulting fragment was digested with ApaI and treated with T4
polynucleotide kinase. The primers ATR-1626F: CCT CTG ATG AGG GCC
CAA ATG TAA TCA CCT GG (SEQ ID NO:89) and Amp-964R: TTA CCA ATG CTT
AAT CAG TGA G (SEQ ID NO:90) were used to amplify pAT6-6. The
resulting fragment was digested with ApaI and ligated to the
ApaI/kinase-treated ROP fragment to create the vector pATR-G, which
was transformed into DH10B cells. This vector uses a GTG initiation
codon for the ROP gene. A nearly identical vector, pATR-A, differs
by only one base pair, incorporating an ATG initiation codon for
the ROP gene. pATR-A was created in a similar manner as pATR-G,
using the PCR primer ATR-1441A: ATG ACC AAA CAG GAA AAA ACC GCC CT
(SEQ ID NO:91) in place of the primer ATR-1441G. Plasmid DNA was
isolated from colonies transformed with pATR-A or pATR-G. Both of
these vectors yielded approximately 15- to 30-fold less plasmid DNA
than the parental plasmid pAT6-6.
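The one-base difference between ATR-1441G and ATR-1441A, and the ApaI recognition sequence (GGGCCC) carried by the primers used to join the ROP and pAT6-6 fragments, can be checked directly from the sequences given above. A minimal sketch; the helper names `mismatches` and `find_sites` are hypothetical, and positions are 0-based.

```python
# Primer sequences as listed in this Example.
ATR_1441G = "GTGACCAAACAGGAAAAAACCGCCCT"  # SEQ ID NO:88
ATR_1441A = "ATGACCAAACAGGAAAAAACCGCCCT"  # SEQ ID NO:91
ATR_1646R = "CATTTGGGCCCTCATCAGAGGTTTTCACCGTCATCACC"  # SEQ ID NO:87
ATR_1626F = "CCTCTGATGAGGGCCCAAATGTAATCACCTGG"  # SEQ ID NO:89
APAI_SITE = "GGGCCC"  # ApaI recognition sequence


def mismatches(a, b):
    """Return (position, base_in_a, base_in_b) for every differing position of two equal-length primers."""
    if len(a) != len(b):
        raise ValueError("primers must be the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def find_sites(seq, site):
    """0-based start positions of every exact occurrence of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


diffs = mismatches(ATR_1441G, ATR_1441A)
print("mismatches:", diffs)
print("start codons:", ATR_1441G[:3], ATR_1441A[:3])
print("ApaI in ATR-1646R:", find_sites(ATR_1646R, APAI_SITE))
print("ApaI in ATR-1626F:", find_sites(ATR_1626F, APAI_SITE))
```

The only mismatch is at the first base, which turns the GTG initiation codon into ATG; each joining primer carries a single ApaI site near its 5' end.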
[0329] Because of the low copy number of these plasmids, isolation
of large quantities of plasmid DNA proved difficult. Therefore, PCR
was used to generate a fragment for use as a cloning vector. The
primers AT5-381F: GAC GAA TTC TCT AGA TAT CGC TCA (SEQ ID NO:92)
and AT5-28R: AAC GAA TTC AAG CTT GAT ATC ATT C (SEQ ID NO:93) were
used to amplify a fragment from a preparation of pATR-G that had
been digested with the restriction enzymes HincII and SnaBI and
subsequently treated with CIP. The PCR product was purified and
treated with CIP, generating a vector fragment designated
ATR-G.
[0330] L.h. gDNA was hydrodynamically sheared, repaired with the
DNA Terminator Kit (Lucigen, Madison Wis.), and fractionated by
agarose gel electrophoresis. Fragments of 2-3 kb in length were
excised and purified, and 200 ng was ligated to 50 ng of the ATR-G
vector fragment. One tenth of the reaction was transformed into
DH10B cells, and aliquots were grown on plates containing
carbenicillin.
TABLE 21
L.h. gDNA library construction in the low copy number vector ATR-G.
Ligation reaction | Cell type | # Colonies/ml | Intact clones/# tested (%)
ATR-G + L.h. gDNA + ligase | DH10B | 8 × 10^7 | 58/60 (97%)
[0331] The results shown in Table 21 indicate that propagation of
L.h. gDNA fragments of 2-3 kb in the vector ATR-G resulted in
approximately 16-fold more colonies than obtained previously with
the vector pUC/HC. Further, the frequency of intact clones made
with ATR-G was approximately 2-fold greater than that observed in
clones made with pUC/HC (97% vs. 49%). Taken together (16-fold more
colonies × 2-fold higher intact frequency), the total number of
intact clones was approximately 30-fold greater with the vector
ATR-G than with pUC/HC.
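The approximately 30-fold figure follows from multiplying the colony-yield ratio by the ratio of intact-clone frequencies. A minimal sketch of that arithmetic using the counts in Tables 20 and 21; treating each screened sample as representative of all colonies on the plate is an assumption of this estimate, and the names are illustrative.

```python
# Values transcribed from Table 20 (pUC/HC) and Table 21 (ATR-G).
atr_g = {"colonies_per_ml": 8e7, "intact": 58, "tested": 60}
puc_hc = {"colonies_per_ml": 0.5e7, "intact": 27, "tested": 55}


def intact_per_ml(row):
    """Estimated intact clones per ml: colony yield times the intact fraction of the screened sample."""
    return row["colonies_per_ml"] * row["intact"] / row["tested"]


colony_fold = atr_g["colonies_per_ml"] / puc_hc["colonies_per_ml"]
intact_fraction_fold = (atr_g["intact"] / atr_g["tested"]) / (puc_hc["intact"] / puc_hc["tested"])
total_fold = intact_per_ml(atr_g) / intact_per_ml(puc_hc)

print(f"colonies: {colony_fold:.0f}x")
print(f"intact fraction: {intact_fraction_fold:.1f}x")
print(f"intact clones: {total_fold:.1f}x")
```

This gives 16-fold more colonies, a 2.0-fold higher intact frequency, and about 31.5-fold more intact clones, consistent with the approximately 30-fold figure above.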
EXAMPLE 19
Construction and Use of a Barnase Direct Selection Cloning
Vector
[0332] This Example describes the construction and use of a direct
selection cloning system that incorporates the barnase lethal gene
from Bacillus amyloliquefaciens to provide selection against
intact vector molecules. The barnase gene encodes an RNase, which
is lethal to host bacteria that carry it. Protection from barnase
can be provided by expression of barstar, an inhibitor of barnase.
To create a direct selection cloning system based on selection
provided by barnase, the barnase and barstar genes were amplified
by PCR from Bacillus amyloliquefaciens genomic DNA.
[0333] Barstar was amplified with the primers BSL: AAG CAG TGA TCA
ACG GGG AAC AAA TCA GAA GTA TCA GCG ACC TC (SEQ ID NO:94) and BSR:
ATC ACC TGC AGT TAT TAA GAA AGT ATG ATG GTG ATG TCG CAG CCT (SEQ ID
NO:95). The primers GBSR: CGC TCC CTG CAG AGC CTG ATC ACT GCT TTT
TTC ATT TAG GTG GCG GTA CTT GGG TCG ATA TC (SEQ ID NO:96) and THL:
CAG GCT CTG CAG GGA GCG TTA ACA TTT AAA TCG TTG CTG (SEQ ID NO:97) were used
to amplify a fragment of pGTH encompassing the gentamycin
resistance gene and replication origin, but lacking the T7 1.2
gene. The PCR primers are designed such that the resulting Barstar
and GTH fragments each contain a PstI site on one end and a BclI
site on the other end. The fragments were digested with PstI and
BclI and ligated to form the plasmid pGSTAR, which was transformed
into MC12 cells. pGSTAR DNA was isolated from a transformed colony.
The colony was further grown and treated by standard procedures to
render the cells competent for electroporation (designated "MC/GS"
cells).
[0334] Barnase was amplified with the primers BNL: 5'-GCA CAG GTG
ATC AAC ACG TTT GAC GGG GTG CGG ATT ATC T (SEQ ID NO:98) and BNR:
5'-ATC ACC TGC AGT TAT TAT CTG ATT TTT GTA AAG GTC TGA TAA TGG TCC
GTT (SEQ ID NO:99). The primers CBNL: CGC TCC CTG CAG GTG ATC ACC
TGT GCC ATT TAC GCC CCG CCC TGC CAC TCA TCG CAG TAC TG (SEQ ID
NO:100) and THL were used to amplify a fragment of pCTH
encompassing the chloramphenicol resistance gene and replication
origin, but lacking the T7 1.2 gene. The PCR primers are designed
such that the resulting Barnase and CTH fragments each contain a
PstI site on one end and a BclI site on the other end. The
fragments were digested with PstI and BclI, ligated, and
transformed into MC/GS cells. As a control, MC/GS cells were transformed
with 200 pg of a pCTAB-based plasmid containing an uncharacterized
HincII fragment of lambda DNA. Aliquots of the cells were plated on
YT agarose containing cam or cam plus gent. The results are shown
in Table 22.
TABLE 22
Transformation of MC/GS cells with a Barnase ligation reaction.
Ligation reaction | Cell type | Antibiotic | # Colonies/ml
BN PCR + CTH PCR + ligase | MC/GS | cam | 0
BN PCR + CTH PCR + ligase | MC/GS | cam + gent | 0
pCTAB/lambdaHc | MC/GS | cam | >15,000
pCTAB/lambdaHc | MC/GS | cam + gent | >15,000
"BN" = Barnase.
[0335] The results from Table 22 show that the Barnase+CTH ligation
produced no transformants capable of surviving in the MC/GS cells.
The pCTAB/lambdaHc transformation confirmed that the MC/GS cells
were competent for transformation. In addition, since the
transformation efficiency was not decreased by the presence of
gentamycin, most of the MC/GS competent cells must have retained
their gentamycin resistance plasmid pGSTAR. Further, they were
capable of expressing resistance simultaneously to both
antibiotics.
[0336] To generate a plasmid encoding a secreted Barnase gene
product, the barnase gene was fused to the phoA secretion signal
sequence and expressed under control of the inducible lacZ promoter
of pAT3. A PCR was performed to amplify the barnase gene from
Bacillus amyloliquefaciens genomic DNA and simultaneously attach
the 3' portion of the phoA signal sequence to the 5' terminus of
the gene. The primers were Pho2BN-F: CCG TTA CTG TTT ACC CCT GTG
ACA AAA GCC GCA CAG GTT ATC AAC ACG TTT G (SEQ ID NO:102) and
ABN-533R: TAT CTA GAG AAT TCG TCG ACT TAT CTG ATT TTT GTA AAG GTC T
(SEQ ID NO:103). The PCR product is designated pho-barnase. A second
PCR was performed to append the 5' portion of the phoA signal
sequence to the lacZ promoter by amplifying pAT3 with the primers
Pho1-R: TAA GAG TGC CAG TGC AAT AGT GCT TTG TTT CAT GGC TGT TTC CTG
TGT GAA A (SEQ ID NO:104) and ABN-493F: CCT TTA CAA AAA TCA GAT AAG
TCG ACG AAT TCT CTA GAT ATC GCT C (SEQ ID NO:105). This PCR product
is designated AT3-pho. The primers ABN-493F and ABN-533R share 40
bases of complementarity; therefore, the AT3-pho and pho-barnase
PCR products were capable of annealing to each other in a PCR to
generate a fusion fragment consisting of the AT3 vector sequence
containing the pho signal sequence joined to the barnase coding
region. The AT3-pho and pho-barnase PCR fragments were mixed and
amplified with the primers Pho1-R and Pho2BN-F to generate the
fusion fragment, which was self-ligated to generate the plasmid
pAPBN. The ligation reaction was transformed into MC/GS cells.
Transformants were plated on carbenicillin plus gentamycin to
select for cells containing pAPBN in addition to the pGSTAR
previously transformed into the MC/GS cells. Sequencing the plasmid
DNAs showed that several of the clones contained the expected phoA
signal fused to the barnase sequence (e.g. clones pAPBN-1 and -6),
whereas others lacked a single base (#A150) corresponding to the
5'-terminal base of the Pho1-R primer (e.g. clones pAPBN-14 and
-21). This deletion results in a frameshift within the phoA signal
sequence. While the frameshift is expected to prevent expression of
barnase from the initiation codon of the phoA signal, it is
possible to re-initiate translation from a GTG codon at base
169.
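The 40-base overlap that allows the AT3-pho and pho-barnase products to anneal can be confirmed by comparing ABN-493F with the reverse complement of ABN-533R. A minimal sketch; the helper names are hypothetical, and the longest-common-substring search is a generic technique, not a procedure described in the source.

```python
# Primer sequences from SEQ ID NO:105 and SEQ ID NO:103.
ABN_493F = "CCTTTACAAAAATCAGATAAGTCGACGAATTCTCTAGATATCGCTC"
ABN_533R = "TATCTAGAGAATTCGTCGACTTATCTGATTTTTGTAAAGGTCT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_run(a, b):
    """Length and 0-based start in `a` of the longest exact substring shared by a and b.

    Standard O(len(a) * len(b)) dynamic programming over matching suffixes.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return best_len, best_end - best_len


length, start = longest_common_run(ABN_493F, reverse_complement(ABN_533R))
print(length, ABN_493F[start:start + length])
```

The search reports a 40-base match beginning at the first base of ABN-493F, in agreement with the 40 bases of complementarity stated above.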
[0337] Approximately 200 pg of plasmid DNA from various pAPBN
transformants was used to transform MC12 or MC/GS cells. Because
the pAPBN transformants were MC/GS cells, which already carried
pGSTAR, these plasmid preparations contained both pGSTAR and
pAPBN. The resulting transformants were
plated on gent, carb, or amp+gent plates. Plating on gentamycin
selects for clones containing pGSTAR, plating on carb selects for
clones containing pAPBN, and plating on amp+gent selects for clones
containing both plasmids. The results are shown in Table 23. Values
represent the average number of colonies obtained from the pairs of
plasmids listed.
TABLE 23
Transformation of MC12 and MC/GS cells with pAPBN plasmid preparations that also contain pGSTAR (# Colonies/ml).
Plasmids | Cell type | gent | carb | amp + gent
pAPBN-1, -6 | MC12 | 60,000 | 0 | 75
pAPBN-14, -21 | MC12 | 34,000 | 100* | 90
pAPBN-1, -6 | MC/GS | Lawn (>10^7) | 1,000,000 | 1,000,000
pAPBN-14, -21 | MC/GS | Lawn (>10^7) | 600,000 | 280,000
*Extrapolated from a single clone recovered from pAPBN-21.
[0338] The results shown in Table 23 indicate that all four clones
tested show lethality mediated by the barnase gene. The pGSTAR
plasmid appeared to transform MC12 cells readily, as illustrated by
the high number of gent-resistant clones. Because all the MC/GS
cells contained pGSTAR before transformation, plating on gent does
not select against non-transformed cells. The MC/GS cells grew as a
confluent lawn on gent plates, representing at least 10^7 cells. It
is not possible to determine what fraction of these cells were
transformed by the added pGSTAR plasmid. Very few MC12 clones
survived selection for the pAPBN plasmid, demonstrating the
toxicity of the pAPBN clones. Several clones were recovered on carb
or amp+gent plates; these were likely the result of transformation
with both plasmids or mutations that rendered the barnase gene
ineffective. The transformation efficiency of pAPBN in MC/GS was
vastly greater than in MC12, demonstrating the protective effect of
pGSTAR against barnase lethality. The presence of pGSTAR increased
survival by at least 3000-fold for the pAPBN-14 and -21 clones
(280,000 in MC/GS vs. 90 in MC12) and possibly much more for the
pAPBN-1 and -6 clones (1,000,000 in MC/GS vs. 0 in MC12).
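The protection factors quoted above are ratios of MC/GS to MC12 colony counts from Table 23. A minimal sketch of that calculation; the text takes the pAPBN-14/-21 figure from the amp + gent column and the pAPBN-1/-6 figure from the carb column, so both columns are computed here. Reporting a zero MC12 count as an unbounded ratio is a presentation choice of this sketch, and the names are illustrative.

```python
import math

# (MC12, MC/GS) colonies/ml transcribed from Table 23.
# The MC12 carb value of 100 for pAPBN-14/-21 was extrapolated from a single clone.
table23 = {
    "pAPBN-1, -6": {"carb": (0, 1_000_000), "amp+gent": (75, 1_000_000)},
    "pAPBN-14, -21": {"carb": (100, 600_000), "amp+gent": (90, 280_000)},
}


def protection_fold(mc12, mc_gs):
    """Survivors with pGSTAR (MC/GS) divided by survivors without it (MC12); infinite if no MC12 colonies grew."""
    return math.inf if mc12 == 0 else mc_gs / mc12


for clones, selections in table23.items():
    for medium, (mc12, mc_gs) in selections.items():
        print(f"{clones} on {medium}: {protection_fold(mc12, mc_gs):,.0f}-fold")
```

This gives about 3,100-fold for pAPBN-14/-21 on amp + gent and an unbounded ratio for pAPBN-1/-6 on carb, consistent with the "at least 3000-fold" and "possibly much more" statements above.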
EXAMPLE 20
Construction and Use of a Low-Copy-Number Blue Screen Cloning
Vector
[0339] This Example describes the construction of a derivative of
pATRG, designated pZLC, that incorporates the lacZ-.alpha. gene
fragment to provide blue/white color selection to identify
recombinant clones. The pZLC vector retains important features of
pATRG, including low copy number, small size, and the presence of
transcriptional terminators flanking the cloning site and following
the ampR gene. However, a significant difference is that the T7 1.2
gene of pATRG is absent from pZLC, and it is replaced by the
lacZ-.alpha. gene fragment. pZLC therefore lacks the direct
selection attribute of pATRG. In addition, DNA fragments inserted
into pZLC will be under the control of the lacZ transcriptional
promoter.
[0340] To create pZLC, a preparation of pATRG was digested with the
restriction enzyme HincII to excise the T7 1.2 gene. The
lacZ-.alpha. gene was amplified from pUC19 with the primers LZL
(5'-CATTAGGCACCCCAGGCTTTACACTTTATGCT, SEQ ID NO:106) and LZR
(5'-TTATTAGCGCCATTCGCCATTCAGGCTGCGCAACTGT, SEQ ID NO:107). The
resulting lacZ-.alpha. gene fragment was ligated to the
HincII-digested pATRG vector fragment and transformed into MC12
cells. The cells were spread onto plates containing ampicillin,
XGAL, and IPTG. pZLC plasmid DNA was isolated from a blue colony
and the lacZ insert confirmed by sequence analysis.
EXAMPLE 21
Construction and Use of a High-Copy-Number Blue Screen Cloning
Vector
[0341] This Example describes the construction of a derivative of
pAT6-6, designated pZHC, that incorporates the lacZ-.alpha. gene
fragment to provide blue/white color selection to identify
recombinant clones. The pZHC vector retains important features of
pAT6-6, including high copy number, small size, reduced number of
feeder colonies, and the presence of transcriptional terminators
flanking the cloning site and following the ampR gene. However, a
significant difference is that the T7 1.2 gene of pAT6-6 is absent
from pZHC, and it is replaced by the lacZ-.alpha. gene fragment.
pZHC therefore lacks the direct selection attribute of pAT6-6. In
addition, DNA fragments inserted into pZHC will be under the
influence of the lacZ transcriptional promoter.
[0342] To create pZHC, the lacZ-.alpha. gene was amplified from
pUC19 with primers LZL and LZR. The resulting lacZ-.alpha. gene
fragment was ligated to an aliquot of pAT66/HSC and transformed
into MC12 cells. Cells were spread onto agar plates containing
ampicillin, XGAL, and IPTG. The plasmid pZHC was isolated from a
blue transformant.
EXAMPLE 22
Construction and Use of a Multiplex Expression Cloning Vector
[0343] This Example contemplates derivatives of the multiplex
cloning vector preparation described in, for example, Example 14,
such derivatives being designed to effect expression of the cloned
genes. By positioning a transcriptional promoter adjacent to each
of the cloning sites in a multiplex cloning vector preparation,
expression of two exogenous genes can be induced in a single
bacterial cell. Further, positioning different inducible promoters
adjacent to each cloning site would allow production of either or
both proteins encoded by the two insert DNAs, their expression
depending on which inducers were added to the cells. Various
examples of the utility of simultaneously cloning and expressing
two genes or two libraries of genes have been described in
scientific literature. For example, a dual-expression multiplex
cloning vector would be useful i) for production of dual-subunit
molecules, e.g. the heavy chain and light chain of an antibody; ii)
for analyzing the interaction between two known proteins, e.g. a
known receptor and its known ligand, particularly if the
interaction would result in a predictable or measurable response;
iii) for analyzing the interaction between a known protein and a
library of genes suspected to encode one or more interacting
proteins, e.g. a known substrate and a cDNA library suspected of
encoding enzymes specific for the known substrate, particularly if
the interaction would result in a predictable or measurable
response; or iv) for analyzing the interaction between two
libraries of genes suspected to encode interacting proteins, e.g.
cDNA libraries suspected of encoding enzymes and their substrates,
particularly if the interaction would result in a predictable or
measurable response.
[0344] Various other examples of the utility of a Multiplex
Expression Cloning Vector are contemplated. The vector components
of a fixed orientation multiplex cloning vector may be configured
as described below to append promoters to the vector components.
Configuring the fixed orientation multiplex cloning system as an
expression vector would allow, for example, the insertion of
particular genes adjacent to defined vector fragments. Large scale
analysis of gene expression in normal and diseased tissue has
identified numerous genes whose expression varies according to the
disease state (see, Genome Sequencing and Analysis Conference, San
Diego, Calif., 25-28 Oct. 2001). Cloning of such genes in a
multiplex expression vector would allow expression of a group of
proteins, which would facilitate analysis or determination of the
function of the individual proteins or of the proteins as a
group.
[0345] In this example, the vector pAT4 carries an IPTG inducible
lacZ promoter that drives expression of the T7 1.2 gene. A dual
expression multiplex cloning vector preparation is prepared in a
PCR by amplifying pAT4 with the primers LacProm-R: TCC ACA CAT TAT
ACG AGC CGG AAG CAT AAA GTG TAA AGC CTG GGG TGC CGT TAG CGA ATT CAA
GCT TGA TAT CAT TCA G (SEQ ID NO:110) and LacHc-F: ATT ATG GAC TCG
AGG GAC GTT GCC TTA CAG GAA ACA GCC ATG GTT AAC GGA CGT TTA TAT AGT
GGT AAT CTG (SEQ ID NO:111). The resulting fragment is self-ligated
to form the vector pATprom, which is nearly identical to pAT4, the
only difference being that the HincII site immediately preceding
the lac promoter in pAT4 is destroyed, and another HincII site is
created just after the translation initiation codon. Hence,
digestion with HincII will excise the entire coding region of the
T7 1.2 gene, except for the initiating ATG codon and a GTT codon
that corresponds to half the HincII site. This GTT codon may be
removed by digesting the vector with StyI or NcoI prior to
digesting with HincII. Following such digestion of the pATprom
vector, the lac promoter will drive expression of DNAs inserted
into the cloning site.
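The restriction sites described for pATprom can be located in the LacHc-F primer (SEQ ID NO:111), which introduces them. A minimal sketch, assuming the standard recognition sequences HincII GTYRAC, NcoI CCATGG and StyI CCWWGG (IUPAC codes: Y = C/T, R = A/G, W = A/T); the helper name `site_positions` is hypothetical and positions are 0-based.

```python
import re

# IUPAC codes needed for the recognition sites below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "Y": "[CT]", "R": "[AG]"}

SITES = {
    "HincII": "GTYRAC",
    "NcoI": "CCATGG",
    "StyI": "CCWWGG",
}

# LacHc-F primer, SEQ ID NO:111 (72 nt).
LACHC_F = ("ATTATGGACTCGAGGGACGTTGCCTTACAGGAAACAGCCATGGTTAACGGACGTTTATATAGT"
           "GGTAATCTG")


def site_positions(seq, site):
    """0-based start of every (possibly overlapping) match of an IUPAC recognition site on the given strand."""
    pattern = "".join(IUPAC[base] for base in site)
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]


for enzyme, site in SITES.items():
    print(enzyme, site, site_positions(LACHC_F, site))
```

The NcoI/StyI site starts at base 37 and the HincII site at base 42, so the ATG at bases 39-41 is followed directly by the GTT half of the HincII site, as described above. All three recognition sequences are palindromic, so scanning one strand is sufficient.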
[0346] The vector pKfprom is a derivative of pKfR designed to
transcribe sequences inserted into the cloning site of pKfR. The
primers pAra-F: AAG AAA CCA ATT GTC CAT ATT GCA TCA G (SEQ ID
NO:112) and pAra-R: AAC CAT CGT TTC ACT CCA TCC AAA (SEQ ID NO:113)
are used to amplify the arabinose BAD promoter from E. coli strain
K-12. The resulting fragment is cloned into the unique HpaI site of
pKfR. When inserted in the proper orientation, the terminus of the
promoter fragment corresponding to the pAra-F primer is adjacent to
the transcriptional terminator at the 3' end of the kanR gene of
the pKfR fragment. As such, a restriction site recognized by the
enzymes HpaI and HincII will be recreated at the junction of the 3'
terminus of the promoter fragment and the replication origin region
of the vector. In this orientation, the arabinose promoter will
drive expression of DNA fragments inserted into the HpaI/HincII
site.
[0347] Dual expression multiplex cloning can be achieved, for
example, by processing pATprom and pKfRprom in a manner similar to
that described for pAT6-6 and pKfR in Example 14. Briefly, pATprom
is digested with HincII and StyI, and pKfR is digested with HincII
and Sau96I. The vector components for multiplex cloning are
purified from the T7 1.2 gene fragments or fd origin fragments by
precipitation with 7% PEG8000 and 10 mM MgCl2, treated with
alkaline phosphatase, and purified by guanidine extraction and
adsorption to diatomaceous earth. cDNAs encoding the two subunits
of a protein of interest (e.g. the p35 and p40 subunits of
interleukin-12) are mixed with the processed vector components,
ligated, and transformed into MC12 cells. The cells are plated on
agar plates containing carbenicillin and kanamycin, and they are
incubated overnight at 37° C. Plasmid DNA is isolated from
transformants and screened (e.g. by sequencing) to identify those
clones that contain a copy of each subunit cDNA in each cloning
site in the proper orientation for expression (approximately 25% of
the clones are expected to be correctly assembled). Production of
both recombinant IL-12 subunits is induced by growth of such clones
in 1 mM IPTG and 0.02% arabinose.
EXAMPLE 23
Feeder Colony Reducing Vectors
[0348] This Example demonstrates the reduction in feeder colonies
surrounding cells transformed with pATH and its derivatives
relative to pUC19. Among the derivatives of pATH are the plasmids
pAT3, pAT4, pAT5, pAT6-6, pATRG, pZHC, pZLC, and others. Sequence
analysis of the ampR gene of pATH, pAT3, pAT6, and pZHC revealed
the presence of several mutations relative to the ampR gene of
pUC19. Table 24 shows the nucleotides present in pUC19 and the
mutations in the corresponding positions of the AmpR gene of pATH
and its derivatives. The position of the mutation refers to the
base number within the ampR gene, with the first base of the ampR
coding sequence designated as base #1.
TABLE 24
Mutations in the AmpR genes of pATH and plasmids derived from pATH.
Vector | Promoter | pos. 174 | pos. 333 | pos. 412 | pos. 648 | pos. 668 | pos. 764
pUC19 | AmpR | T | T | A | C | T | T
pATH | CamR | A | C | A | C | T | T
pAT3 | CamR | n.d. | C | A | C | C | C
pAT66 | CamR | A | C | G | T | C | C
pZHC | CamR | A | C | G | T | C | C
A.a. change | | Phe→Leu | n.c. | Thr→Ala | Pro→Ser | n.c. | n.c.
Positions refer to the base number within the AmpR gene. n.d., not determined; n.c., no change.
[0349] The vectors are ordered in Table 24 such that each vector
was derived from the vector listed above it. There appears to be an
accumulation of mutations with successive derivatives, consistent
with the mutations being caused by mis-incorporation of bases
during PCR. It is possible that the reduction in feeder colonies is
primarily due to the camR promoter used in the pATH-derived
plasmids. However, the low background of feeder colonies may also
be related to the mutations that result in changes in the amino
acid sequence of the AmpR gene, i.e. A174, G412, and T648.
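The accumulation of mutations can be made concrete by counting, for each vector in Table 24, how many of the six listed positions differ from pUC19. A minimal sketch; the data layout and helper name are illustrative, and because the pAT3 base at position 174 was not determined it is skipped, so the pAT3 count is a lower bound.

```python
# Bases at six AmpR-gene positions, transcribed from Table 24 (None = not determined).
POSITIONS = [174, 333, 412, 648, 668, 764]
TABLE24 = {
    "pUC19": ["T", "T", "A", "C", "T", "T"],
    "pATH": ["A", "C", "A", "C", "T", "T"],
    "pAT3": [None, "C", "A", "C", "C", "C"],
    "pAT66": ["A", "C", "G", "T", "C", "C"],
    "pZHC": ["A", "C", "G", "T", "C", "C"],
}


def differences_from_reference(bases, reference):
    """AmpR positions where a vector differs from the reference; undetermined bases are skipped."""
    return [pos for pos, b, r in zip(POSITIONS, bases, reference) if b is not None and b != r]


reference = TABLE24["pUC19"]
for vector in ["pATH", "pAT3", "pAT66", "pZHC"]:
    diffs = differences_from_reference(TABLE24[vector], reference)
    print(vector, len(diffs), diffs)
```

The counts rise from 2 (pATH) to at least 3 (pAT3) to 6 (pAT66 and pZHC), in line with the accumulation described above.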
[0350] All of the plasmids that have been derived from pATH show a
reduction of approximately 50% in the number of feeder colonies
that arise on ampicillin plates following extended growth of the
transformants (e.g., 16 hrs of growth at 37° C followed by further
incubation at room temperature or 37° C). In addition, the feeder
colonies surrounding the pUC19 transformants grew more robustly
than those that arose from the pATH-derived transformants.
[0351] All publications and patents mentioned in the above
specification are herein incorporated by reference. Various
modifications and variations of the described method and system of
the invention will be apparent to those skilled in the art without
departing from the scope and spirit of the invention. Although the
invention has been described in connection with specific preferred
embodiments, it should be understood that the invention as claimed
should not be unduly limited to such specific embodiments. Indeed,
various modifications of the described modes for carrying out the
invention which are obvious to those skilled in chemistry,
molecular biology, or related fields are intended to be within
the scope of the following claims.
SEQUENCE LISTING (128 sequences)
SEQ ID NO:1 (DNA, 46 nt, Artificial Sequence, Synthetic): cactgttaac ccgggtttaa acgttgtgtc tcaaaatatc tgatgt
SEQ ID NO:2 (DNA, 78 nt, Artificial Sequence, Synthetic): cactgttccc gggagtcaaa agcctccggt cggaggcttt tgactttctg cttagaaaaa ctcatcgagc atcaaatg
SEQ ID NO:3 (DNA, 51 nt, Artificial Sequence, Synthetic): tggacgttaa cccgggccta ctaggccttg atcggcacgt aagaggttcc a
SEQ ID NO:4 (DNA, 22 nt, Artificial Sequence, Synthetic): ttacgccccg ccctgccact ca
SEQ ID NO:5 (DNA, 49 nt, Artificial Sequence, Synthetic): ctgttaaccc gggcgcgcct gtgcgcggaa cccctatttg tttattttc
SEQ ID NO:6 (DNA, 82 nt, Artificial Sequence, Synthetic): tggacgtacc cgggcgcaga aaggccaccc gaaggtgagc cagtgtgatt acatttacca atgcttaatc agtgaggcac ct
SEQ ID NO:7 (DNA, 47 nt, Artificial Sequence, Synthetic): ctgttaaccc gggatttaaa tcgttgctgg cgtttttcca taggctc
SEQ ID NO:8 (DNA, 36 nt, Artificial Sequence, Synthetic): tggacgttaa cccgggtaga aaagatcaaa ggatct
SEQ ID NO:9 (DNA, 46 nt, Artificial Sequence, Synthetic): cactgttaac ccgggaattg acataagcct gttcggttcg taaact
SEQ ID NO:10 (DNA, 93 nt, Artificial Sequence, Synthetic): gtgacaaccc gggcagatta aaacgaaagg cccagtcttt cgactgagcc tttcgtttta tttgtttagg tggcggtact tgggtcgata tca
SEQ ID NO:11 (DNA, 44 nt, Artificial Sequence, Synthetic): cagtgtcact ccatggccat gattacgcca agcttgcatg cctg
SEQ ID NO:12 (DNA, 46 nt, Artificial Sequence, Synthetic): cagtgtcact cccatggctg tttcctgtgt gaaattgtta tccgct
SEQ ID NO:13 (DNA, 42 nt, Artificial Sequence, Synthetic): tgtcactcca tgggacgttt atatagtggt aatctggcag ca
SEQ ID NO:14 (DNA, 44 nt, Artificial Sequence, Synthetic): ctgactcgaa ttcttacttc cagtccttca actggtcata cata
SEQ ID NO:15 (DNA, 30 nt, Artificial Sequence, Synthetic): cattaggcac cccaggcttt acactttatg
SEQ ID NO:16 (DNA, 40 nt, Artificial Sequence, Synthetic): ttattacttc cagtccttca actggtcata catatggttc
SEQ ID NO:17 (DNA, 29 nt, Artificial Sequence, Synthetic): ggaggtcgac gcagttgtaa acgttaata
SEQ ID NO:18 (DNA, 43 nt, Artificial Sequence, Synthetic): cagactgtgc aagctttgca tttacgcccc gccctgccac tca
SEQ ID NO:19 (DNA, 24 nt, Artificial Sequence, Synthetic): tcctctagag tcgacctgca ggca
SEQ ID NO:20 (DNA, 26 nt, Artificial Sequence, Synthetic): ccgggtaccg agctcgaatt ctagca
SEQ ID NO:21 (DNA, 49 nt, Artificial Sequence, Synthetic): ctctgagaat tcatctgcag ctcgccacgt tcgccggctt tccccgtca
SEQ ID NO:22 (DNA, 53 nt, Artificial Sequence, Synthetic): tgcacgaatt cttgctgcag ttgtaaacgt taatattttg ttaaaattcg cgt
SEQ ID NO:23 (DNA, 24 nt, Artificial Sequence, Synthetic): atcttgtgca acgtgacatc agag
SEQ ID NO:24 (DNA, 22 nt, Artificial Sequence, Synthetic): cagaaagtca aaagcctccg ac
SEQ ID NO:25 (DNA, 21 nt, Artificial Sequence, Synthetic): cagtactgcg atgagtggca g
SEQ ID NO:26 (DNA, 22 nt, Artificial Sequence, Synthetic): gatttttgtg atgctcgtca gg
SEQ ID NO:27 (DNA, 31 nt, Artificial Sequence, Synthetic): tgggatcgca gtggtgagta accatgcatc a
SEQ ID NO:28 (DNA, 27 nt, Artificial Sequence, Synthetic): gggaaaacag cattccaggt attagaa
SEQ ID NO:29 (DNA, 66 nt, Artificial Sequence, Synthetic): catgcaaagc ttgcatgcct gcaggtcgac tctagaggat ccccgggtac cgagctcgaa ttctag
SEQ ID NO:30 (DNA, 66 nt, Artificial Sequence, Synthetic): catgctagaa ttcgagctcg gtacccgggg atcctctaga gtcgacctgc aggcatgcaa gctttg
SEQ ID NO:31 (DNA, 38 nt, Artificial Sequence, Synthetic): accaaagatc ttattacttc cagtccttca actggtca
SEQ ID NO:32 (DNA, 45 nt, Artificial Sequence, Synthetic): cctgcaggga gcatttaaat cgttgctggc gtttttccat aggct
SEQ ID NO:33 (DNA, 39 nt, Artificial Sequence, Synthetic): ctgtcctcaa tacgtaaccg tatgcaatct tttcttgta
SEQ ID NO:34 (DNA, 36 nt, Artificial Sequence, Synthetic): atctggaaac ctgattgata ctagcacctt ctacca
SEQ ID NO:35 (DNA, 32 nt, Artificial Sequence, Synthetic): tctgagctcg gtacccggtc ctctagagtc ga
SEQ ID NO:36 (DNA, 40 nt, Artificial Sequence, Synthetic): tcttagcatg ggacgtttat atagtggtaa tctggcagca
SEQ ID NO:37 (DNA, 26 nt, Artificial Sequence, Synthetic): tatagttaac gctccctgca ggacca
SEQ ID NO:38 (DNA, 30 nt, Artificial Sequence, Synthetic): ggcagttaac atttaaatcg ttgctggcgt
SEQ ID NO:39 (DNA, 28 nt, Artificial Sequence, Synthetic): tattgggccc tgatcggcac gtaagagg
SEQ ID NO:40 (DNA, 35 nt, Artificial Sequence, Synthetic): tcatgggccc aaaagatcaa acgatcctct tgaga
SEQ ID NO:41 (DNA, 1750 nt, Artificial Sequence, Synthetic):
tgatcggcac gtaagaggtt ccaactttca ccataatgaa ataagatcac taccgggcgt
60 attttttgag ttatcgagat tttcaggagc taaggaagct aaaatggaga
aaaaaatcac 120 tggatatgcc accgttgata tatcccaatg gcatcgtaaa
gaacattttg aggcatttca 180 gtcagttgct caatgtacct ataaccagac
cgttcagctg gatactacgg cctttttaaa 240 gaccgtaaag aaaaataagc
acaagtttta tccggccttt attcacattc ttgcccgcct 300 gatgaatgct
catccggaat tccgtatggc agtgaaagac ggtgagctgg tgatatggga 360
tagtgttcac ccttgttaca ccgttttcca tgagcaaact gaaacgtttt catcgctctg
420 gagtgaatac cacgacgatt tccggcagtt tctacacata tattcgcaag
atgtggcgtg 480 ttacggtgaa aacctggcct atttccctaa agggtttatt
gagaatatgt ttttcgtctc 540 agccaatccc tgggtgagtt tcaccagttt
tgatttaaac gtggccaata tggacaactt 600 cttcgccccc gttttcacca
tgggcaaata ttatacgcaa ggcgacaagg tgctgatgcc 660 gctggcgatt
caggttcatc atgccgtttg tgatggcttc catgtcggca gaatgcttaa 720
tgaattacaa cagtactgcg atgagtggca gggcggggcg taaatgcaaa gcttgcatgc
780 ctgcaggtcg actctagagg accgggtacc gagctcagat cttagcatgg
gacgtttata 840 tagtggtaat ctggcagcat tcaaggcagc aacaaacaag
ctgttccagt tagacttagc 900 ggtcatttat gatgactggt ataatgccta
tacaagaaaa gattgcatac ggttacgtat 960 tgaggacaga tctggaaacc
tgattgatac tagcaccttc taccaccacg acgaggacgt 1020 tctgttcaat
atgtgtactg attggttgaa ccatatgtat gaccagttga aggactggaa 1080
gtaataagat ctttggtcct gcagggagcg ttaacattta aatcgttgct ggcgtttttc
1140 cataggctcc gcccccctga cgagcatcac aaaaatcgac gctcaagtca
gaggtggcga 1200 aacccgacag gactataaag ataccaggcg tttccccctg
gaagctccct cgtgcgctct 1260 cctgttccga ccctgccgct taccggatac
ctgtccgcct ttctcccttc gggaagcgtg 1320 gcgctttctc atagctcacg
ctgtaggtat ctcagttcgg tgtaggtcgt tcgctccaag 1380 ctgggctgtg
tgcacgaacc ccccgttcag cccgaccgct gcgccttatc cggtaactat 1440
cgtcttgagt ccaacccggt aagacacgac ttatcgccac tggcagcagc cactggtaac
1500 aggattagca gagcgaggta tgtaggcggt gctacagagt tcttgaagtg
gtggcctaac 1560 tacggctaca ctagaagaac agtatttggt atctgcgctc
tgctgaagcc agttaccttc 1620 ggaaaaagag ttggtagctc ttgatccggc
aaacaaacca ccgctggtag cggtggtttt 1680 tttgtttgca agcagcagat
tacgcgcaga aaaaaaggat ctcaagagga tcgtttgatc 1740 ttttgggccc 1750
SEQ ID NO:42 (DNA, 34 nt, Artificial Sequence, Synthetic): tccgtaaagc actaaatcgg aaccctaaag ggag
SEQ ID NO:43 (DNA, 38 nt, Artificial Sequence, Synthetic): tcctcgaccc caaaaaactt gattagggtg atggttca
SEQ ID NO:44 (DNA, 30 nt, Artificial Sequence, Synthetic): cgaaaaaccg tctatcaggg cgatggccca
SEQ ID NO:45 (DNA, 51 nt, Artificial Sequence, Synthetic): gatccctttg acgttggatt ccacgttctt taatagtgga ctcttgttcc a
SEQ ID NO:46 (DNA, 32 nt, Artificial Sequence, Synthetic): tccgaaaaac cgtctatcag ggcgatggcc ca
SEQ ID NO:47 (DNA, 29 nt, Artificial Sequence, Synthetic): tccctttgac gttggagtcc acgttgttt
SEQ ID NO:48 (DNA, 23 nt, Artificial Sequence, Synthetic): cttttgtcat tttctgctta ctg
SEQ ID NO:49 (DNA, 29 nt, Artificial Sequence, Synthetic): gatccttata aatcaaaaga ataggccga
SEQ ID NO:50 (DNA, 34 nt, Artificial Sequence, Synthetic): tcatgttaac caggaatctg gatcctgcag cgcc
SEQ ID NO:51 (DNA, 29 nt, Artificial Sequence, Synthetic): tatagttaac gcagctcgcc acgttcgcc
SEQ ID NO:52 (DNA, 36 nt, Artificial Sequence, Synthetic): tactgtcgac gcatatctgg atcctgcagc cgatac
SEQ ID NO:53 (DNA, 18 nt, Artificial Sequence, Synthetic): tttagcttcc ttagctcc
SEQ ID NO:54 (DNA, 19 nt, Artificial Sequence, Synthetic): atgcaaagct tgcatgcct
SEQ ID NO:55 (DNA, 19 nt, Artificial Sequence, Synthetic): atgagtattc aacatttcc
SEQ ID NO:56 (DNA, 33 nt, Artificial Sequence, Synthetic): atgcaagctt tgcatttacc aatgcttaat cag
SEQ ID NO:57 (DNA, 39 nt, Artificial Sequence, Synthetic): atgttacgca gcagcaacga tgttacgcag cagggcagt
SEQ ID NO:58 (DNA, 33 nt, Artificial Sequence, Synthetic): atgcaagctt tgcatttagg tggcggtact tgg
SEQ ID NO:59 (DNA, 19 nt, Artificial Sequence, Synthetic): atgagccata ttcaacggg
SEQ ID NO:60 (DNA, 41 nt, Artificial Sequence, Synthetic): ctgcaggcat gcaagctttg catttagaaa aactcatcga g
SEQ ID NO:61 (DNA, 48 nt, Artificial Sequence, Synthetic): ctggctcacc ttcgggtggg cctttctgcg ttgctggcgt ttttccat
SEQ ID NO:62 (DNA, 54 nt, Artificial Sequence, Synthetic): tgtgattaca tttggacgcc tgtgagcttg aggttaacgc tccctgcagg acca
SEQ ID NO:63 (DNA, 54 nt, Artificial Sequence, Synthetic): caccttcacg ggtgggcctt tcttcggtag aaaagatcaa aggatcttct tgag
SEQ ID NO:64 (DNA, 64 nt, Artificial Sequence, Synthetic): agccagtgag ttggttacag tccagttact ctcactggat gatcggcacg taagaggttc caac
SEQ ID NO:65 (DNA, 29 nt, Artificial Sequence, Synthetic): gtaatgaggg cccaaatgta atcacctgg
SEQ ID NO:66 (DNA, 50 nt, Artificial Sequence, Synthetic): cctgaatgat atcaagcttg aattcgttaa cggcacccca ggctttacac
SEQ ID NO:67 (DNA, 69 nt, Artificial Sequence, Synthetic): ctgatttaaa tggtcagtat tgagcgatat ctagagaatt cgtcgactta cttccagtcc ttcaactgg
SEQ ID NO:68 (DNA, 74 nt, Artificial Sequence, Synthetic): tacctgacct ccatagcaga aagtcaaaag cctccgaccg gaggcttttg acttgatcgg cacgtaagag gttc
SEQ ID NO:69 (DNA, 32 nt, Artificial Sequence, Synthetic): catttgggcc ctcattacca atgcttaatc ag
SEQ ID NO:70 (DNA, 29 nt, Artificial Sequence, Synthetic): gtaatgaggg cccaaatgta atcacctgg
SEQ ID NO:71 (DNA, 66 nt, Artificial Sequence, Synthetic): cttgatatca ttcaggacga gcctcagact ccagtgagcg taactggact gtaatcaact cactgg
SEQ ID NO:72 (DNA, 23 nt, Artificial Sequence, Synthetic): cttgatatca ttcaggacga gcc
SEQ ID NO:73 (DNA, 22 nt, Artificial Sequence, Synthetic): tacctgacct ccatagcaga aa
SEQ ID NO:74 (DNA, 22 nt, Artificial Sequence, Synthetic): ctgatttaaa tggtcagtat tg
SEQ ID NO:75 (DNA, 52 nt, Artificial Sequence, Synthetic): tctttcgact gagcctttcg ttttatttga ttagaaaaac tcatcgagca tc
SEQ ID NO:76 (DNA, 57 nt, Artificial Sequence, Synthetic): ctgagccttt cgttttaatc tggaaaaacc accctggcgc tgcaggttcc agattcc
SEQ ID NO:77 (DNA, 39 nt, Artificial Sequence, Synthetic): aaccataaaa ttggcacccc aggctttaca ctttatgct
SEQ ID NO:78 (DNA, 44 nt, Artificial Sequence, Synthetic): gacccacggg gctggttact tccagtcctt caactggtca taca
SEQ ID NO:79 (DNA, 39 nt, Artificial Sequence, Synthetic): aacccacggg gatgggcagc tcgccacgtt cgccggctt
SEQ ID NO:80 (DNA, 38 nt, Artificial Sequence, Synthetic): gaccataaaa ctgggcagtt gtaaacgtta atattttg
SEQ ID NO:81 (DNA, 32 nt, Artificial Sequence, Synthetic): agcggccgca gacttgcctg accattgacc cc
SEQ ID NO:82 (DNA, 28 nt, Artificial Sequence, Synthetic): tcaatggtca ggcaagtctg cggccgct
SEQ ID NO:83 (DNA, 32 nt, Artificial Sequence, Synthetic): agcggccgca gacttgcctg accattgatt tt
SEQ ID NO:84 (DNA, 32 nt, Artificial Sequence, Synthetic): agcggccgca gacttgcctg accattgacg ac
SEQ ID NO:85 (DNA, 1833 nt, Artificial Sequence, Synthetic):
gacgaattct ctagatatcg
ctcaatactg accatttaaa tcatacctga cctccatagc 60 agaaagtcaa
aagcctccga ccggaggctt ttgacttgat cggcacgtaa gaggttccaa 120
ctttcaccat aatgaaataa gatcactacc gggcgtattt tttgagttat cgagattttc
180 aggagctaag gaagctaaaa tgagtattca acatttccgt gtcgccctta
ttcccttttt 240 tgcggcattt tgccttcctg tttttgctca cccagaaacg
ctggtgaaag taaaagatgc 300 tgaagatcag ttgggtgcac gagtgggtta
catcgaactg gatctcaaca gcggtaagat 360 ccttgagagt ttacgccccg
aagaacgttt tccaatgatg agcactttta aagttctgct 420 atgtggcgcg
gtattatccc gtattgacgc cgggcaagag caactcggtc gccgcataca 480
ctattctcag aatgacttgg ttgagtactc accagtcaca gaaaagcatc tcacggatgg
540 catgacagta agagaattat gcagtgctgc cataaccatg agtgataaca
ctgcggccaa 600 cttacttctg gcaacgatcg gaggaccgaa ggagctaacc
gcttttttgc acaacatggg 660 ggatcatgta actcgccttg atcgttggga
accggagctg aatgaagcca taccaaacga 720 cgagcgtgac accacgatgc
ctgtagcaat ggcaacaacg ttgcgcaaac tattaactgg 780 cgaactactt
actctagctt cccggcaaca attaatagac tggatggagg cggataaagt 840
tgcaggatca cttctgcgct cggccctccc ggctggctgg tttattgctg ataaatctgg
900 agccggtgag cgtgggtctc gcggtatcat tgcagcactg gggccagatg
gtaagccctc 960 ccgcatcgta gttatctaca cgacggggag tcaggcaact
atggatgaac gaaatagaca 1020 gatcgctgag ataggtgcct cactgattaa
gcattggtaa tgagggccca aatgtaatca 1080 cctggctcac cttcgggtgg
gcctttctgc gttgctggcg tttttccata ggctccgccc 1140 ccctgacgag
catcacaaaa atcgatgctc aagtcagagg tggcgaaacc cgacaggact 1200
ataaagatac caggcgtttc cccctggaag ctccctcgtg cgctctcctg ttccgaccct
1260 gccgcttacc ggatacctgt ccgcctttct cccttcggga agcgtggcgc
tttctcatag 1320 ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc
tccaagctgg gctgtgtgca 1380 cgaacccccc gttcagcccg accgctgcgc
cttatccggt aactatcgtc ttgagtccaa 1440 cccggtaaga cacgacttat
cgccactggc agcagccact ggtaacagga ttagcagagc 1500 gaggtatgta
ggcggtgcta cagagttctt gaagtggtgg cctaactacg gctacactag 1560
aagaacagta tttggtatct gcgctctgct gaagccagtt acctcggaaa aagagttggt
1620 agctcttgat ccggcaaaca aaccaccgct ggtagcggtg gtttttttgt
ttgcaagcag 1680 cagattacgc gcagaaaaaa aggatctcaa gaagatcctt
tgattttcta ccgaagaaag 1740 gcccacccgt gaaggtgagc cagtgagttg
attgcagtcc agttacgctg gagtctgagg 1800 ctcgtcctga atgatatcaa
gcttgaattc gtt 1833 86 1058 DNA Artificial Sequence Synthetic 86
gacgcatatc tggatcctgc agccgatacg gtcgtcgtcc gtttaaacgt tgtgtctcaa
60 aatctctgat gtcacgttgc acaagataaa aatatatcat catgaacaat
aaaaccgtct 120 gcttacataa acagtaatac aaggggtgtt atgagccata
ttcaacggga aacgtcttgc 180 tcgaggccgc gattaaattc caacatggat
gctgatttat atgggtataa atgggctcgc 240 gataatgtcg ggcaatcagg
tgcgacaatc tatcgattgt atgggaagcc cgatgcgcca 300 gagttgtttc
tgaaacatgg caaaggtagc gttgccaatg atgttacaga tgagatggtc 360
aggctaaact ggctgacgga atttatgcct cttccgacca tcaagcattt tatccgtact
420 cctgatgatg catggttact caccactgcg atcccaggga aaacagcatt
ccaggtatta 480 gaagaatatc ctgattcagg tgaaaatatt gttgatgcgc
tggcagtgtt cctgcgccgg 540 ttgcattcga ttcctgtttg taattgtcct
tttaacggcg atcgcgtatt tcgtctcgct 600 caggcgcaat cacgaatgaa
taacggtttg gttggtgcga gtgattttga tgacgagcgt 660 aatggctggc
ctgttgaaca agtctggaaa gaaatgcata agcttttgcc attctcaccg 720
gattcagtcg tcactcatgg tgatttctca cttgataacc ttatttttga cgaggggaaa
780 ttaataggtt gtattgatgt tggacgagtc ggaatcgcag accgatacca
ggatcttgcc 840 atcctatgga actgcctcgg tgagttttct ccttcattac
agaaacggct ttttcaaaaa 900 tatggtattg ataatcctga tatgaataaa
ttgcagtttc acttgatgct cgatgagttt 960 ttctaatcaa ataaaacgaa
aggctcagtc gaaagactga gcctttcgtt ttaatctgga 1020 aaaaccaccc
tggcgctgca ggttccagat tcctggtt 1058 87 38 DNA Artificial Sequence
Synthetic 87 catttgggcc ctcatcagag gttttcaccg tcatcacc 38 88 26 DNA
Artificial Sequence Synthetic 88 gtgaccaaac aggaaaaaac cgccct 26 89
32 DNA Artificial Sequence Synthetic 89 cctctgatga gggcccaaat
gtaatcacct gg 32 90 22 DNA Artificial Sequence Synthetic 90
ttaccaatgc ttaatcagtg ag 22 91 26 DNA Artificial Sequence Synthetic
91 atgaccaaac aggaaaaaac cgccct 26 92 24 DNA Artificial Sequence
Synthetic 92 gacgaattct ctagatatcg ctca 24 93 25 DNA Artificial
Sequence Synthetic 93 aacgaattca agcttgatat cattc 25 94 44 DNA
Artificial Sequence Synthetic 94 aagcagtgat caacggggaa caaatcagaa
gtatcagcga cctc 44 95 45 DNA Artificial Sequence Synthetic 95
atcacctgca gttattaaga aagtatgatg gtgatgtcgc agcct 45 96 62 DNA
Artificial Sequence Synthetic 96 cgctccctgc agagcctgat cactgctttt
ttcatttagg tggcggtact tgggtcgata 60 tc 62 97 39 DNA Artificial
Sequence Synthetic 97 caggctctgc agggagcgtt aacatttaaa tcgttgctg 39
98 40 DNA Artificial Sequence Synthetic 98
gcacaggtga tcaacacgtt tgacggggtg cggattatct 40
99 51 DNA Artificial Sequence Synthetic 99
atcacctgca gttattatct gatttttgta aaggtctgat aatggtccgt t 51
100 62 DNA Artificial Sequence Synthetic 100
cgctccctgc aggtgatcac ctgtgccatt tacgccccgc cctgccactc atcgcagtac 60
tg 62
101 33 DNA Artificial Sequence Synthetic 101
gagctgataa caatttcaga caggaaacag cca 33
102 52 DNA Artificial Sequence Synthetic 102
ccgttactgt ttacccctgt gacaaaagcc gcacaggtta tcaacacgtt tg 52
103 43 DNA Artificial Sequence Synthetic 103
tatctagaga attcgtcgac ttatctgatt tttgtaaagg tct 43
104 52 DNA Artificial Sequence Synthetic 104
taagagtgcc agtgcaatag tgctttgttt catggctgtt tcctgtgtga aa 52
105 46 DNA Artificial Sequence Synthetic 105
cctttacaaa aatcagataa gtcgacgaat tctctagata tcgctc 46
106 32 DNA Artificial Sequence Synthetic 106
cattaggcac cccaggcttt acactttatg ct 32
107 37 DNA Artificial Sequence Synthetic 107
ttattagcgc cattcgccat tcaggctgcg caactgt 37
108 35 DNA Artificial Sequence Synthetic 108
tcggaggctt ttgactttct gctatggagg tcagg 35
109 36 DNA Artificial Sequence Synthetic 109
ataattccac acattatacg agccggaagc ataaag 36
110 79 DNA Artificial Sequence Synthetic 110
tccacacatt atacgagccg gaagcataaa gtgtaaagcc tggggtgccg ttagcgaatt 60
caagcttgat atcattcag 79
111 72 DNA Artificial Sequence Synthetic 111
attatggact cgagggacgt tgccttacag gaaacagcca tggttaacgg acgtttatat 60
agtggtaatc tg 72
112 28 DNA Artificial Sequence Synthetic 112
aagaaaccaa ttgtccatat tgcatcag 28
113 24 DNA Artificial Sequence Synthetic 113
aaccatcgtt tcactccatc caaa 24
114 87 DNA Artificial Sequence Synthetic 114
atcttgtgca acgtgacatc agagattttg agacacaacg tttaaacgga cgacgaccgt 60
atcggctgca ggatccagat atgcgtc 87
115 54 DNA Artificial Sequence Synthetic 115
ttcgttttaa tctggaaaaa ccaccctggc gctgcaggtt ccagattcct ggtt 54
116 59 DNA Artificial Sequence Synthetic 116
cagtccagtt acgctggagt ctgaggctcg tcctgaatga tatcaagctt gaattcgtt 59
117 66 DNA Artificial Sequence Synthetic 117
ctttctgcta tggaggtcag gtatgattta aatggtcagt attgagcgat atctagagaa 60
ttcgtc 66
118 32 DNA Artificial Sequence Synthetic 118
ccggaggctt ttgacttgat cggcacgtaa ga 32
119 40 DNA Artificial Sequence Synthetic 119
ggactcgagg gacgttgcct tacaggaaac agccatggga 40
120 47 DNA Artificial Sequence Synthetic 120
gcacctgacc tcctgtgtct tcgacgaatt ctctagatat cgctcaa 47
121 51 DNA Artificial Sequence Synthetic 121
gcaatggtct gtcgccgtct tcaacgaatt caagcttgat atcattcagg a 51
122 50 DNA Artificial Sequence Synthetic 122
ggacctgcaa gtcgggagac cgacgcatat ctggatcctg cagccgatac 50
123 49 DNA Artificial Sequence Synthetic 123
ggaatcctgg tcctcgagac caaccaggaa tctggaacct gcagcgcca 49
124 43 DNA Artificial Sequence Synthetic 124
ggtacttatc aggacgagac ccattaggca ccccaggctt tac 43
125 45 DNA Artificial Sequence Synthetic 125
ggtctattag aggacgagac cttagcgcca ttcgccattc aggct 45
126 44 DNA Artificial Sequence Synthetic 126
ggaacttcga cgaccgagac caattgacat aagcctgttc ggtt 44
127 45 DNA Artificial Sequence Synthetic 127
gtgtacaatg cgaccgagac cttaggtggc ggtacttggg tcgat 45
128 32 DNA Artificial Sequence Synthetic 128
agcggccgca gacttgcctg accattgaag ga 32