U.S. patent application number 10/052582, for computer-directed assembly of a polynucleotide encoding a target polypeptide, was filed with the patent office on 2002-01-18 and published on 2003-07-24.
Invention is credited to Evans, Glen A.
Publication Number | 20030138782 |
Application Number | 10/052582 |
Family ID | 22998593 |
Publication Date | 2003-07-24 |
United States Patent Application | 20030138782 |
Kind Code | A1 |
Evans, Glen A. | July 24, 2003 |
Computer-directed assembly of a polynucleotide encoding a target
polypeptide
Abstract
The present invention outlines a novel approach to utilizing the
results of genomic sequence information by computer-directed
polynucleotide assembly based upon information available in
databases such as the human genome database. Specifically, the
present invention may be used to select, synthesize and assemble a
novel, synthetic target polynucleotide sequence encoding a target
polypeptide. The target polynucleotide may encode a target
polypeptide that exhibits enhanced or altered biological activity
as compared to a model polypeptide encoded by a natural (wild-type)
or model polynucleotide sequence.
Inventors: | Evans, Glen A.; (San Marcos, CA) |
Correspondence Address: | CAMPBELL & FLORES LLP, 4370 LA JOLLA VILLAGE DRIVE, 7TH FLOOR, SAN DIEGO, CA 92122, US |
Family ID: | 22998593 |
Appl. No.: | 10/052582 |
Filed: | January 18, 2002 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60262693 | Jan 19, 2001 | |
Current U.S. Class: | 435/6.16; 435/91.2; 702/20 |
Current CPC Class: | C40B 40/06 20130101; C12N 15/66 20130101; C07K 1/00 20130101; B01J 2219/007 20130101; B01J 2219/00722 20130101; B01J 2219/00689 20130101; G16B 30/00 20190201; C07B 2200/11 20130101; B01J 2219/00659 20130101; G16B 30/20 20190201; B01J 2219/00695 20130101; C12N 15/10 20130101 |
Class at Publication: | 435/6; 435/91.2; 702/20 |
International Class: | C12Q 001/68; C12P 019/34; G06F 019/00; G01N 033/48; G01N 033/50 |
Claims
What is claimed is:
1. A method of synthesizing a target polynucleotide comprising: a)
providing a target polynucleotide sequence; b) identifying at least
one initiating polynucleotide present in the target polynucleotide
of a), wherein the initiating polynucleotide comprises at least one
plus strand oligonucleotide annealed to at least one minus strand
oligonucleotide resulting in a partially double-stranded
polynucleotide comprised of a 5' overhang and a 3' overhang; c)
identifying a second polynucleotide present in the target
polynucleotide of a), wherein the second polynucleotide is
contiguous with the initiating polynucleotide and comprises at
least one plus strand oligonucleotide annealed to at least one
minus strand oligonucleotide resulting in a partially
double-stranded polynucleotide comprised of a 5' overhang, a 3'
overhang, or a 5' overhang and a 3' overhang, wherein at least one
overhang of the second polynucleotide is complementary to at least
one overhang of the initiating polynucleotide; d) identifying a
third polynucleotide present in the target polynucleotide of a),
wherein the third polynucleotide is contiguous with the initiating
sequence and comprises at least one plus strand oligonucleotide
annealed to at least one minus strand oligonucleotide resulting in
a partially double-stranded polynucleotide comprised of a 5'
overhang, a 3' overhang, or a 5' overhang and a 3' overhang,
wherein at least one overhang of the third polynucleotide is
complementary to at least one overhang of the initiating
polynucleotide which is not complementary to an overhang of the
second polynucleotide; e) contacting the initiating polynucleotide
of b) with the second polynucleotide of c) and the third
polynucleotide of d) under conditions and for such time suitable
for annealing, the contacting resulting in a contiguous
double-stranded polynucleotide, wherein the initiating sequence is
extended bi-directionally; f) in the absence of primer extension,
optionally contacting the mixture of e) with a ligase under
conditions suitable for ligation; and g) optionally repeating b)
through f) to sequentially add double-stranded polynucleotides to
the extended initiating polynucleotide through repeated cycles of
annealing and ligation, whereby a target polynucleotide is
synthesized.
2. The method of claim 1, wherein the target polynucleotide
sequence encodes a target polypeptide.
3. The method of claim 2, wherein the target polypeptide is a
protein.
4. The method of claim 3, wherein the protein is an enzyme.
5. The method of claim 1, wherein the initiating polynucleotide
sequence is identified by a computer program.
6. The method of claim 5, wherein the computer program comprises
the following algorithm:
7. The method of claim 1, wherein the plus strand of the
initiating, second or third polynucleotide is about 15 to 1000
nucleotides in length.
8. The method of claim 1, wherein the plus strand of the
initiating, second or third polynucleotide is about 20 to 500
nucleotides in length.
9. The method of claim 1, wherein the plus strand of the
initiating, second or third polynucleotide is about 25 to 100
nucleotides in length.
10. The method of claim 1, wherein the minus strand of the
initiating, second or third polynucleotide is about 15 to 1000
nucleotides in length.
11. The method of claim 1, wherein the minus strand of the
initiating, second or third polynucleotide is about 20 to 500
nucleotides in length.
12. The method of claim 1, wherein the minus strand of the
initiating, second or third polynucleotide is about 25 to 100
nucleotides in length.
13. The method of claim 1, wherein the initiating polynucleotide is
attached to a solid support.
14. A method of synthesizing a target polynucleotide comprising: a)
providing a target polynucleotide sequence derived from a model
sequence; b) identifying at least one initiating polynucleotide
sequence present in the target polynucleotide sequence of a),
wherein the initiating polynucleotide comprises: 1) a first plus
strand oligonucleotide; 2) a second plus strand oligonucleotide
contiguous with the first plus strand oligonucleotide; and 3) a
minus strand oligonucleotide comprising a first contiguous sequence
that is at least partially complementary to the first plus strand
oligonucleotide and second contiguous sequence which is at least
partially complementary to the second plus strand oligonucleotide;
c) annealing the first plus strand oligonucleotide and the second
plus strand oligonucleotide to the minus strand oligonucleotide of
b) resulting in a partially double-stranded initiating
polynucleotide comprised of a 5' overhang and a 3' overhang; d)
identifying a second polynucleotide sequence present in the target
polynucleotide sequence of a), wherein the second polynucleotide
sequence is contiguous with the initiating polynucleotide sequence
and comprises: 1) a first plus strand oligonucleotide; 2) a second
plus strand oligonucleotide contiguous with the first plus strand
oligonucleotide; and 3) a minus strand oligonucleotide comprising a
first contiguous sequence which is at least partially complementary
to the first plus strand oligonucleotide and second contiguous
sequence which is at least partially complementary to the second
plus strand oligonucleotide; e) annealing the first plus strand
oligonucleotide and the second plus strand oligonucleotide to the
minus strand oligonucleotide of d) resulting in a partially
double-stranded second polynucleotide, wherein at least one
overhang of the second polynucleotide is complementary to at least
one overhang of the initiating polynucleotide; f) identifying a
third polynucleotide present in the target polynucleotide of a),
wherein the third polynucleotide is contiguous with the initiating
sequence and comprises: 1) a first plus strand oligonucleotide; 2)
a second plus strand oligonucleotide contiguous with the first plus
strand oligonucleotide; and 3) a minus strand oligonucleotide
comprising a first contiguous sequence which is at least partially
complementary to the first plus strand oligonucleotide and second
contiguous sequence which is at least partially complementary to
the second plus strand oligonucleotide; g) annealing the first plus
strand oligonucleotide and the second plus strand oligonucleotide
to the minus strand oligonucleotide of f) resulting in a partially
double-stranded third polynucleotide, wherein at least one
overhang of the third polynucleotide is complementary to at least
one overhang of the initiating polynucleotide and not complementary
to an overhang of the second polynucleotide; h) contacting the
initiating polynucleotide of c) with the second polynucleotide of
e) and the third polynucleotide of g) under conditions and for such
time suitable for annealing, the contacting resulting in a
contiguous double-stranded polynucleotide, wherein the initiating
sequence is extended bi-directionally; i) in the absence of primer
extension, optionally contacting the mixture of h) with a ligase
under conditions suitable for ligation; and j) optionally repeating
b) through i) to sequentially add double-stranded polynucleotides
to the extended initiating polynucleotide through repeated cycles
of annealing and ligation, whereby a target polynucleotide is
synthesized.
15. A method for synthesizing a target polynucleotide, comprising:
a) providing a target polynucleotide sequence; b) identifying at
least one initiating polynucleotide present in the target
polynucleotide of a), wherein the initiating polynucleotide
comprises at least one plus strand oligonucleotide annealed to at
least one minus strand oligonucleotide; c) contacting the
initiating polynucleotide under conditions suitable for primer
annealing with a first oligonucleotide having partial
complementarity to the 3' portion of the plus strand of the
initiating polynucleotide, and a second oligonucleotide having
partial complementarity to the 3' portion of the minus strand of
the initiating polynucleotide; d) catalyzing under conditions
suitable for primer extension: 1) polynucleotide synthesis from the
3'-hydroxyl of the plus strand of the initiating polynucleotide; 2)
polynucleotide synthesis from the 3'-hydroxyl of the annealed first
oligonucleotide; 3) polynucleotide synthesis from the 3'-hydroxyl
of the minus strand of the initiating polynucleotide; and 4)
polynucleotide synthesis from the 3'-hydroxyl of the annealed
second oligonucleotide, wherein the initiating sequence is extended
bi-directionally thereby forming a nascent extended initiating
polynucleotide; e) contacting the extended initiating
polynucleotide of d) under conditions suitable for primer annealing
with a third oligonucleotide having partial complementarity to the
3' portion of the plus strand of the extended initiating
polynucleotide, and a fourth oligonucleotide having partial
complementarity to the 3' portion of the minus strand of the
extended initiating polynucleotide; f) catalyzing under conditions
suitable for primer extension: 1) polynucleotide synthesis from the
3'-hydroxyl of the plus strand of the extended initiating
polynucleotide; 2) polynucleotide synthesis from the 3'-hydroxyl of
the annealed third oligonucleotide; 3) polynucleotide synthesis
from the 3'-hydroxyl of the minus strand of the extended initiating
polynucleotide; and 4) polynucleotide synthesis from the
3'-hydroxyl of the annealed fourth oligonucleotide, wherein the
extended initiating sequence is extended bi-directionally thereby
forming a nascent extended initiating polynucleotide; and g)
optionally repeating e) through f) as desired, resulting in
formation of the target polynucleotide sequence.
16. The method of claim 15, wherein the target polynucleotide
sequence encodes a target polypeptide.
17. The method of claim 16, wherein the target polypeptide is a
protein.
18. The method of claim 17, wherein the protein is an enzyme.
19. The method of claim 15, wherein the initiating polynucleotide
is identified by an algorithm.
20. A method of synthesizing a target polynucleotide comprising: a)
providing a target polynucleotide sequence; b) identifying at least
one initiating polynucleotide present in the target polynucleotide
of a), wherein the initiating polynucleotide comprises at least one
plus strand oligonucleotide annealed to at least one minus strand
oligonucleotide resulting in a partially double-stranded
polynucleotide comprised of at least a 5' overhang or a 3'
overhang; c) identifying a second polynucleotide present in the
target polynucleotide of a), wherein the second polynucleotide is
contiguous with the initiating polynucleotide and comprises at
least one plus strand oligonucleotide annealed to at least one
minus strand oligonucleotide resulting in a partially
double-stranded polynucleotide comprised of a 5' overhang, a 3'
overhang, or a 5' overhang and a 3' overhang, wherein at least one
overhang of the second polynucleotide is complementary to the
overhang of the initiating polynucleotide; d) contacting the
initiating polynucleotide of b) with the second polynucleotide of
c) under conditions and for such time suitable for annealing, the
contacting resulting in a contiguous double-stranded
polynucleotide, wherein the initiating sequence is extended
uni-directionally; e) in the absence of primer extension,
optionally contacting the mixture of d) with a ligase under
conditions suitable for ligation; and f) optionally repeating b)
through e) to sequentially add double-stranded polynucleotides to
the extended initiating polynucleotide through repeated cycles of
annealing and ligation, whereby a target polynucleotide is
synthesized.
21. The method of claim 15, wherein the plus strand of the
initiating, second or third polynucleotide is about 15 to 1000
nucleotides in length.
22. The method of claim 15, wherein the plus strand of the
initiating, second or third polynucleotide is about 20 to 500
nucleotides in length.
23. The method of claim 15, wherein the plus strand of the
initiating, second or third polynucleotide is about 25 to 100
nucleotides in length.
24. The method of claim 15, wherein the minus strand of the
initiating, second or third polynucleotide is about 15 to 1000
nucleotides in length.
25. The method of claim 15, wherein the minus strand of the
initiating, second or third polynucleotide is about 20 to 500
nucleotides in length.
26. The method of claim 15, wherein the minus strand of the
initiating, second or third polynucleotide is about 25 to 100
nucleotides in length.
27. The method of claim 15, wherein the initiating polynucleotide
is attached to a solid support.
28. A method for isolating a target polypeptide encoded by a target
polynucleotide, comprising: a) providing a target polynucleotide
sequence derived from a model sequence; b) identifying at least one
initiating polynucleotide present in the target polynucleotide of
a), wherein the initiating polynucleotide comprises at least one
plus strand oligonucleotide annealed to at least one minus strand
oligonucleotide resulting in a partially double-stranded
polynucleotide comprised of a 5' overhang and a 3' overhang; c)
identifying a second polynucleotide present in the target
polynucleotide of a), wherein the second polynucleotide is
contiguous with the initiating sequence and comprises at least one
plus strand oligonucleotide annealed to at least one minus strand
oligonucleotide resulting in a partially double-stranded
polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5'
overhang and a 3' overhang, wherein at least one overhang of the
second polynucleotide is complementary to at least one overhang of
the initiating sequence; d) identifying a third polynucleotide
present in the target polynucleotide of a), wherein the third
polynucleotide is contiguous with the initiating sequence and
comprises at least one plus strand oligonucleotide annealed to at
least one minus strand oligonucleotide resulting in a partially
double-stranded polynucleotide comprised of a 5' overhang, a 3'
overhang, or a 5' overhang and a 3' overhang, wherein at least one
overhang of the third polynucleotide is complementary to at least
one overhang of the initiating sequence which is not complementary
to an overhang of the second polynucleotide; e) contacting the
initiating polynucleotide of b) with the second polynucleotide of
c) and the third polynucleotide of d) under conditions and for such
time suitable for annealing, the contacting resulting in a
contiguous double-stranded polynucleotide, wherein the initiating
sequence is extended bi-directionally; f) in the absence of primer
extension, optionally contacting the mixture of e) with a ligase
under conditions suitable for ligation; g) optionally repeating b)
through f) to sequentially add double-stranded polynucleotides to
the extended initiating sequence through repeated cycles of
annealing and ligation, whereby a target polynucleotide is
synthesized; h) incorporating the target polynucleotide of g) in an
expression vector; i) introducing the expression vector of h) into
a suitable host cell; j) culturing the cell of i) under conditions
and for such time as to promote the expression of the target
polypeptide encoded by the target polynucleotide; and k) isolating
the target polypeptide.
29. The method of claim 28, wherein the target polypeptide is a
chimeric protein.
30. The method of claim 28, wherein the target polypeptide is a
fusion protein.
31. The method of claim 28, wherein the expression vector is a
bacterial expression vector.
32. The method of claim 29, wherein the expression vector is an
animal cell expression vector.
33. The method of claim 28, wherein the expression vector is an
insect cell expression vector.
34. The method of claim 28, wherein the expression vector is a
retroviral vector.
35. The method of claim 29, wherein the expression vector is
contained in a host cell.
36. The method of claim 35, wherein the host cell is a prokaryotic
cell.
37. The method of claim 35, wherein the host cell is a eukaryotic
cell.
38. The method of claims 1, 14, 15 or 27, wherein the
oligonucleotides are produced by synthesis on an automated DNA
synthesizer.
39. A method of synthesizing a target polynucleotide comprising: a)
providing a target polynucleotide sequence derived from a model
sequence; b) chemically synthesizing a plurality of single-stranded
oligonucleotides each of which is partially complementary to at
least one oligonucleotide present in the plurality, wherein the
sequence of the plurality of oligonucleotides is a contiguous
sequence of the target polynucleotide; c) contacting the partially
complementary oligonucleotides of b) under conditions and for such
time suitable for annealing, the contacting resulting in a
plurality of partially double-stranded polynucleotides, wherein
each double-stranded polynucleotide is comprised of a 5' overhang
and a 3' overhang; d) identifying at least one initiating
polynucleotide derived from the model sequence, wherein the
initiating polynucleotide is present in the plurality of
double-stranded polynucleotides set forth in c); e) in the absence
of primer extension, subjecting a mixture comprising the initiating
polynucleotide and 1) a double-stranded polynucleotide that will
anneal to the 5' portion of said initiating polynucleotide; 2) a
double-stranded polynucleotide that will anneal to the 3' portion
of the initiating polynucleotide; and 3) a DNA ligase under
conditions suitable for annealing and ligation, wherein the
initiating polynucleotide is extended bi-directionally; and f)
sequentially annealing double-stranded polynucleotides to the
extended initiating polynucleotide through repeated cycles of
annealing, whereby the target polynucleotide is produced.
40. The method of claim 39, wherein the oligonucleotides are
produced by synthesis on an automated DNA synthesizer.
41. A computer program, stored on a computer-readable medium, for
generating a target polynucleotide sequence, the computer program
comprising instructions for causing a computer system to: a)
identify an initiating polynucleotide sequence contained in the
target polynucleotide sequence; b) parse the target polynucleotide
sequence into multiple distinct, partially complementary
oligonucleotides; c) control assembly of the target polynucleotide
sequence by controlling the bi-directional extension of the
initiating polynucleotide sequence by the sequential addition of
partially complementary oligonucleotides resulting in a contiguous
double-stranded polynucleotide.
42. The computer program of claim 41, wherein the parsing is
performed by an algorithm.
43. The computer program of claim 42, wherein the algorithm
comprises:
$Overlap = <STDIN>;
$seqlen = length ($sequence);
}
$revcomp = " ";
for ($i = $seqlen-1; $i >= 0; $i--) {
    $base = substr ($sequence, $i, 1);
    if ($base eq "a") {$comp = "T";}
    elsif ($base eq "t") {$comp = "A";}
    elsif ($base eq "g") {$comp = "C";}
    elsif ($base eq "c") {$comp = "G";}
    elsif ($base eq "A") {$comp = "T";}
    elsif ($base eq "T") {$comp = "A";}
    elsif ($base eq "G") {$comp = "C";}
    elsif ($base eq "C") {$comp = "G";}
    else {$comp = "X"};
    $revcomp = $revcomp.$comp;
}
print OUT "Forward oligos\n";
print "Forward oligos\n";
$r = 1;
for ($i = 0; $i <= $seqlen -1; $i+=$OL) {
    $oligo = substr ($sequence, $i, $OL);
    print OUT "$oligname F- $r $oligo\n";
    print "$oligname F- $r $oligo\n";
    $r = $r + 1;
}
$r = 1;
for ($i = $seqlen - $Overlap - $OL; $i >= 0; $i-=$OL) {
    print OUT "\n";
    print "\n";
    $oligo = substr ($revcomp, $i, $OL);
    print OUT "$oligname R- $r $oligo";
    print "$oligname R- $r $oligo";
    $r = $r + 1;
}
$oligo = substr ($revcomp, 1, $Overlap);
print OUT "$oligo\n";
print "$oligo\n";
44. The computer program of claim 43, wherein the forward sequence
is optionally converted to upper case using an algorithm
comprising:
$forseq = "";
for ($j = 0; $j <= $seqlen-1; $j++) {
    $bas = substr ($sequence, $j, 1);
    if ($bas eq "a") {$cfor = "A";}
    elsif ($bas eq "t") {$cfor = "T";}
    elsif ($bas eq "c") {$cfor = "C";}
    elsif ($bas eq "g") {$cfor = "G";}
    elsif ($bas eq "A") {$cfor = "A";}
    elsif ($bas eq "T") {$cfor = "T";}
    elsif ($bas eq "C") {$cfor = "C";}
    elsif ($bas eq "G") {$cfor = "G";}
    else {$cfor = "X"};
    $forseq = $forseq.$cfor;
    print OUT "$j \n";
}
45. A computer-assisted method for synthesizing a target
polynucleotide encoding a target polypeptide derived from a model
sequence using a programmed computer including a processor, an
input device, and an output device, comprising: a) inputting into
the programmed computer, through the input device, data including
at least a portion of the target polynucleotide sequence encoding a
target polypeptide; b) determining, using the processor, the
sequence of at least one initiating polynucleotide present in the
target polynucleotide sequence; c) selecting, using the processor, a
model for synthesizing the target polynucleotide sequence based on
the position of the initiating sequence in the target
polynucleotide sequence using overall sequence parameters necessary
for expression of the target polypeptide in a biological system;
and d) outputting, to the output device, the results of the at
least one determination.
46. The method of claim 45, further comprising predicting, using
the processor, whether changing the model sequence to the target
polynucleotide will have an effect on the target polypeptide
encoded by the target polynucleotide based on at least one
physical, structural or phylogenetic characteristic of the model
sequence.
47. A method for automated synthesis of a target polynucleotide
sequence, comprising: a) providing a user with an opportunity to
communicate a desired target polynucleotide sequence; b) allowing
the user to transmit the desired target polynucleotide sequence to
a server; c) providing the user with a unique designation; and d)
obtaining the transmitted target polynucleotide sequence provided
by the user.
48. The method of claim 47, further comprising: f) identifying at
least one initiating polynucleotide present in the target
polynucleotide of e), wherein the initiating polynucleotide
comprises at least one plus strand oligonucleotide annealed to at
least one minus strand oligonucleotide resulting in a partially
double-stranded polynucleotide comprised of a 5' overhang and a 3'
overhang; g) identifying a second polynucleotide present in the
target polynucleotide of e), wherein the second polynucleotide is
contiguous with the initiating polynucleotide and comprises at
least one plus strand oligonucleotide annealed to at least one
minus strand oligonucleotide resulting in a partially
double-stranded polynucleotide comprised of a 5' overhang, a 3'
overhang, or a 5' overhang and a 3' overhang, wherein at least one
overhang of the second polynucleotide is complementary to at least
one overhang of the initiating polynucleotide; h) identifying a
third polynucleotide present in the target polynucleotide of e),
wherein the third polynucleotide is contiguous with the initiating
sequence and comprises at least one plus strand oligonucleotide
annealed to at least one minus strand oligonucleotide resulting in
a partially double-stranded polynucleotide comprised of a 5'
overhang, a 3' overhang, or a 5' overhang and a 3' overhang,
wherein at least one overhang of the third polynucleotide is
complementary to at least one overhang of the initiating
polynucleotide which is not complementary to an overhang of the
second polynucleotide; i) contacting the initiating polynucleotide
of f) with the second polynucleotide of g) and the third
polynucleotide of h) under conditions and for such time suitable
for annealing, the contacting resulting in a contiguous
double-stranded polynucleotide, wherein the initiating sequence is
extended bi-directionally; j) in the absence of primer extension,
optionally contacting the mixture of i) with a ligase under
conditions suitable for ligation; and k) optionally repeating f)
through j) to sequentially add double-stranded polynucleotides to
the extended initiating polynucleotide through repeated cycles of
annealing and ligation, whereby a target polynucleotide is
synthesized.
49. The method of claim 47, further comprising: f) identifying at
least one initiating polynucleotide present in the target
polynucleotide of e), wherein the initiating polynucleotide
comprises at least one plus strand oligonucleotide annealed to at
least one minus strand oligonucleotide; g) contacting the
initiating polynucleotide under conditions suitable for primer
annealing with a first oligonucleotide having partial
complementarity to the 3' portion of the plus strand of the
initiating polynucleotide, and a second oligonucleotide having
partial complementarity to the 3' portion of the minus strand of
the initiating polynucleotide; h) catalyzing under conditions
suitable for primer extension: 1) polynucleotide synthesis from the
3'-hydroxyl of the plus strand of the initiating polynucleotide; 2)
polynucleotide synthesis from the 3'-hydroxyl of the annealed first
oligonucleotide; 3) polynucleotide synthesis from the 3'-hydroxyl
of the minus strand of the initiating polynucleotide; and 4)
polynucleotide synthesis from the 3'-hydroxyl of the annealed
second oligonucleotide, wherein the initiating sequence is extended
bi-directionally thereby forming a nascent extended initiating
polynucleotide; i) contacting the extended initiating
polynucleotide of h) under conditions suitable for primer annealing
with a third oligonucleotide having partial complementarity to the
3' portion of the plus strand of the extended initiating
polynucleotide, and a fourth oligonucleotide having partial
complementarity to the 3' portion of the minus strand of the
extended initiating polynucleotide; j) catalyzing under conditions
suitable for primer extension: 1) polynucleotide synthesis from the
3'-hydroxyl of the plus strand of the extended initiating
polynucleotide; 2) polynucleotide synthesis from the 3'-hydroxyl of
the annealed third oligonucleotide; 3) polynucleotide synthesis
from the 3'-hydroxyl of the minus strand of the extended initiating
polynucleotide; and 4) polynucleotide synthesis from the
3'-hydroxyl of the annealed fourth oligonucleotide, wherein the
extended initiating sequence is extended bi-directionally thereby
forming a nascent extended initiating polynucleotide; and k)
optionally repeating f) through j) as desired, resulting in
formation of the target polynucleotide sequence.
50. A method for automated synthesis of a polynucleotide,
comprising: a) providing a user with a mechanism for communicating
a model polynucleotide sequence; b) optionally providing the user
with an opportunity to communicate at least one desired
modification to the model sequence if desired; c) allowing the user
to transmit the model sequence and desired modification to a
server; d) providing the user with a unique designation; e) obtaining
the transmitted model sequence and desired modification provided by
the user; f) inputting into a programmed computer, through an input
device, data including at least a portion of the model
polynucleotide sequence; g) determining, using the processor, the
sequence of the model polynucleotide sequence containing the
desired modification; h) further determining, using the processor,
at least one initiating polynucleotide sequence present in the
model polynucleotide sequence; i) selecting, using the processor, a
model for synthesizing the modified model polynucleotide sequence
based on the position of the initiating sequence in the model
polynucleotide sequence; and j) outputting, to the output device,
the results of the at least one determination.
51. An isolated polynucleotide composition comprising: a) an
initiating polynucleotide comprising a plus strand and a minus
strand, wherein the plus or minus strand is modified to incorporate
a moiety that binds to a solid support; b) a first primer suitable
for primer extension having partial complementarity to the 3'
portion of the plus strand of the initiating polynucleotide; c) a
second primer suitable for primer extension having partial
complementarity to the 3' portion of the minus strand of the
initiating polynucleotide; and d) a solid support matrix, wherein
each of the first and second primers consists of about 25 to 1000
nucleotides.
52. An isolated polynucleotide composition comprising: a) an
initiating polynucleotide comprising a plus strand and a minus
strand, wherein the plus or minus strand is modified to incorporate
a moiety that binds to a solid support; b) a first primer suitable
for primer extension having partial complementarity to the 3'
portion of the plus strand of the initiating polynucleotide; c) a
second primer suitable for primer extension having partial
complementarity to the 3' portion of the minus strand of the
initiating polynucleotide; and d) a solid support matrix, wherein
each of the first and second primers consists of about 25 to 1000
nucleotides.
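The parsing step recited in claims 41 to 44 (splitting a target sequence into plus-strand and minus-strand oligonucleotides whose frames are offset so that each minus-strand oligonucleotide bridges two adjacent plus-strand oligonucleotides) can be illustrated with a minimal sketch. It is written in Python rather than the Perl of claim 43 and is not the program recited in the claims. The function names, the tuple output format, and the reading of the overlap parameter as the offset of the minus-strand frame are all assumptions made for illustration.

```python
"""Sketch: parse a target sequence into partially complementary oligos.

All names here are hypothetical; this is not the program in the claims.
"""

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def normalize(seq):
    """Upper-case the sequence; any base other than A/C/G/T becomes 'X'."""
    return "".join(b if b in _COMPLEMENT else "X" for b in seq.upper())


def reverse_complement(seq):
    """Return the upper-case reverse complement; unknown bases become 'X'."""
    return "".join(_COMPLEMENT.get(b, "X") for b in reversed(seq.upper()))


def parse_oligos(seq, oligo_len, overlap):
    """Tile both strands of `seq` into oligos of at most `oligo_len` bases.

    Plus-strand oligos start at 0, oligo_len, 2*oligo_len, ...
    Minus-strand oligos cover the frames starting at `overlap`,
    overlap + oligo_len, ... (assumed meaning of the overlap parameter),
    so each one spans the junction between two plus-strand oligos.
    Each oligo is returned as (plus-strand start position, sequence 5'->3').
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must satisfy 0 < overlap < oligo_len")
    seq = normalize(seq)
    plus = [(i, seq[i:i + oligo_len]) for i in range(0, len(seq), oligo_len)]
    minus = [
        (i, reverse_complement(seq[i:i + oligo_len]))
        for i in range(overlap, len(seq), oligo_len)
    ]
    return plus, minus


def junctions_bridged(seq_len, oligo_len, minus):
    """Check that every plus-strand junction lies inside a minus-strand oligo."""
    for junction in range(oligo_len, seq_len, oligo_len):
        if not any(s < junction < s + len(o) for s, o in minus):
            return False
    return True


if __name__ == "__main__":
    plus, minus = parse_oligos("atgaccgttagc", oligo_len=4, overlap=2)
    for start, oligo in plus:
        print("F", start, oligo)
    for start, oligo in minus:
        print("R", start, oligo)
    print("junctions bridged:", junctions_bridged(12, 4, minus))
```

In this sketch the first `overlap` bases of the plus strand have no minus-strand partner and remain as a single-stranded overhang. A production design would also need to choose oligonucleotide lengths within the ranges recited in the claims.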
Description
[0001] This application is based on, and claims the benefit of,
U.S. Provisional Application No. 60/262,693, filed Jan. 19, 2001,
and entitled COMPUTER-DIRECTED ASSEMBLY OF A POLYNUCLEOTIDE
ENCODING A TARGET POLYPEPTIDE, which is incorporated herein by
reference.
TECHNICAL FIELD
[0002] The present invention relates generally to the area of
bioinformatics and more specifically to methods, algorithms and
apparatus for computer-directed polynucleotide assembly. The
invention further relates to the production of polypeptides encoded
by polynucleotides assembled by the invention.
BACKGROUND
[0003] Enzymes, antibodies, receptors and ligands are polypeptides
that have evolved by selective pressure to perform very specific
biological functions within the milieu of a living organism. The
use of a polypeptide for specific technological applications may
require the polypeptide to function in environments or on
substrates for which it was not evolutionarily selected.
Polypeptides isolated from microorganisms that thrive in extreme
environments provide ample evidence that these molecules are, in
general, malleable with regard to structure and function. However,
the process for isolating a polypeptide from its native environment
is expensive and time consuming. Thus, new methods for
synthetically evolving genetic material encoding a polypeptide
possessing a desired activity are needed.
[0004] There are two ways to obtain genetic material for genetic
engineering manipulations: (1) isolation and purification of a
polynucleotide in the form of DNA or RNA from natural sources or
(2) the synthesis of a polynucleotide using various
chemical-enzymatic approaches. The former approach is limited to
naturally-occurring sequences that do not easily lend themselves to
specific modification. The latter approach is much more complicated
and labor-intensive. However, the chemical-enzymatic approach has
many attractive features including the possibility of preparing,
without any significant limitations, any desirable polynucleotide
sequence.
[0005] Two general methods currently exist for the synthetic
assembly of oligonucleotides into long polynucleotide fragments.
In the first, oligonucleotides covering the entire sequence to be
synthesized are allowed to anneal, and then the nicks are
repaired with ligase. The fragment is then cloned directly, or
cloned after amplification by the polymerase chain reaction (PCR).
The polynucleotide is subsequently used for in vitro assembly into
longer sequences. The second general method for gene synthesis
utilizes polymerase to fill in single-stranded gaps in the annealed
pairs of oligonucleotides. After the polymerase reaction,
single-stranded regions of oligonucleotides become double-stranded,
and after digestion with restriction endonuclease, can be cloned
directly or used for further assembly of longer sequences by
ligating different double-stranded fragments. Typically, subsequent
to the polymerase reaction, each segment must be cloned, which
significantly delays the synthesis of long DNA fragments and
greatly decreases the efficiency of this approach.
[0006] The creation of entirely novel polynucleotides, or the
substantial modification of existing polynucleotides, is extremely
time consuming and expensive, requires multiple complex steps, and
in some cases is impossible. Therefore, there exists a great need
for an efficient means to assemble synthetic polynucleotides of any
desired sequence. Such a method could be universally applied. For
example, the method could be used to efficiently make an array of
polynucleotides having specific substitutions in a known sequence
that is expressed and screened for improved function. The present
invention satisfies these needs by providing efficient and powerful
methods and compositions for the synthesis of a target
polynucleotide encoding a target polypeptide.
SUMMARY
[0007] The present invention addresses the limitations in present
recombinant nucleic acid manipulations by providing a fast,
efficient means for generating a nucleic acid sequence, including
entire genes, chromosomal segments, chromosomes and genomes.
Because this approach is completely synthetic,
there are no limitations, such as the availability of existing
nucleic acids, to hinder the construction of even very large
segments of nucleic acid.
[0008] In one embodiment, the invention provides a method of
synthesizing a target polynucleotide sequence including: a)
providing a target polynucleotide sequence; b) identifying at least
one initiating polynucleotide present in the target polynucleotide
which includes at least one plus strand oligonucleotide annealed to
at least one minus strand oligonucleotide resulting in a partially
double-stranded polynucleotide comprised of a 5' overhang and a 3'
overhang; c) identifying a second polynucleotide present in the
target polynucleotide which is contiguous with the initiating
polynucleotide and includes at least one plus strand
oligonucleotide annealed to at least one minus strand
oligonucleotide resulting in a partially double-stranded
polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5'
overhang and a 3' overhang, where at least one overhang of the
second polynucleotide is complementary to at least one overhang of
the initiating polynucleotide; d) identifying a third
polynucleotide present in the target polynucleotide which is
contiguous with the initiating sequence and includes at least one
plus strand oligonucleotide annealed to at least one minus strand
oligonucleotide resulting in a partially double-stranded
polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5'
overhang and a 3' overhang, where at least one overhang of the
third polynucleotide is complementary to at least one overhang of
the initiating polynucleotide which is not complementary to an
overhang of the second polynucleotide; e) contacting the initiating
polynucleotide with the second polynucleotide and the third
polynucleotide under conditions and for such time suitable for
annealing, the contacting resulting in a contiguous double-stranded
polynucleotide, resulting in the bi-directional extension of the
initiating polynucleotide; f) in the absence of primer extension,
optionally contacting the mixture of e) with a ligase under
conditions suitable for ligation; and g) optionally repeating b)
through f) to sequentially add double-stranded polynucleotides to
the extended initiating polynucleotide through repeated cycles of
annealing and ligation, whereby a target polynucleotide is
synthesized.
[0009] The invention further provides a method of synthesizing a
target polynucleotide including: a) providing a target
polynucleotide sequence derived from a model sequence; b)
identifying at least one initiating polynucleotide sequence present
in the target polynucleotide sequence of a), wherein the initiating
polynucleotide includes: 1) a first plus strand oligonucleotide;
2) a second plus strand oligonucleotide contiguous with the first
plus strand oligonucleotide; and 3) a minus strand oligonucleotide
including a first contiguous sequence which is at least partially
complementary to the first plus strand oligonucleotide and second
contiguous sequence which is at least partially complementary to
the second plus strand oligonucleotide; c) annealing the first plus
strand oligonucleotide and the second plus strand oligonucleotide
to the minus strand oligonucleotide of b) resulting in a partially
double-stranded initiating polynucleotide including a 5' overhang
and a 3' overhang; d) identifying a second polynucleotide sequence
present in the target polynucleotide sequence of a), wherein the
second polynucleotide sequence is contiguous with the initiating
polynucleotide sequence and includes: 1) a first plus strand
oligonucleotide; 2) a second plus strand oligonucleotide contiguous
with the first plus strand oligonucleotide; and 3) a minus strand
oligonucleotide comprising a first contiguous sequence which is at
least partially complementary to the first plus strand
oligonucleotide and second contiguous sequence which is at least
partially complementary to the second plus strand oligonucleotide;
e) annealing the first plus strand oligonucleotide and the second
plus strand oligonucleotide to the minus strand oligonucleotide of
d) resulting in a partially double-stranded second polynucleotide,
wherein at least one overhang of the second polynucleotide is
complementary to at least one overhang of the initiating
polynucleotide; f) identifying a third polynucleotide present in
the target polynucleotide of a), wherein the third polynucleotide
is contiguous with the initiating sequence and comprises: 1) a
first plus strand oligonucleotide; 2) a second plus strand
oligonucleotide contiguous with the first plus strand
oligonucleotide; and 3) a minus strand oligonucleotide comprising a
first contiguous sequence which is at least partially complementary
to the first plus strand oligonucleotide and second contiguous
sequence which is at least partially complementary to the second
plus strand oligonucleotide; g) annealing the first plus strand
oligonucleotide and the second plus strand oligonucleotide to the
minus strand oligonucleotide of f) resulting in a partially
double-stranded third polynucleotide, wherein at least one
overhang of the third polynucleotide is complementary to at least
one overhang of the initiating polynucleotide and not complementary
to an overhang of the second polynucleotide; h) contacting the
initiating polynucleotide of c) with the second polynucleotide of
e) and the third polynucleotide of g) under conditions and for such
time suitable for annealing, the contacting resulting in a
contiguous double-stranded polynucleotide, wherein the initiating
sequence is extended bi-directionally; i) in the absence of primer
extension, optionally contacting the mixture of h) with a ligase
under conditions suitable for ligation; and j) optionally repeating
b) through i) to sequentially add double-stranded polynucleotides
to the extended initiating polynucleotide through repeated cycles
of annealing and ligation, whereby a target polynucleotide is
synthesized.
[0010] In another embodiment, the invention provides a method
for synthesizing a target polynucleotide, including: a)
providing a target polynucleotide sequence derived from a model
sequence; b) identifying at least one initiating polynucleotide
present in the target polynucleotide which includes at least one
plus strand oligonucleotide annealed to at least one minus strand
oligonucleotide; c) contacting the initiating polynucleotide under
conditions suitable for primer annealing with a first
oligonucleotide having partial complementarity to the 3' portion of
the plus strand of the initiating polynucleotide, and a second
oligonucleotide having partial complementarity to the 3' portion of
the minus strand of the initiating polynucleotide; d) catalyzing
under conditions suitable for primer extension: 1) polynucleotide
synthesis from the 3'-hydroxyl of the plus strand of the initiating
polynucleotide; 2) polynucleotide synthesis from the 3'-hydroxyl of
the annealed first oligonucleotide; 3) polynucleotide synthesis
from the 3'-hydroxyl of the minus strand of the initiating
polynucleotide; and 4) polynucleotide synthesis from the
3'-hydroxyl of the annealed second oligonucleotide, resulting in
the bi-directional extension of the initiating sequence thereby
forming a nascent extended initiating polynucleotide; e) contacting
the extended initiating polynucleotide of d) under conditions
suitable for primer annealing with a third oligonucleotide having
partial complementarity to the 3' portion of the plus strand of the
extended initiating polynucleotide, and a fourth oligonucleotide
having partial complementarity to the 3' portion of the minus
strand of the extended initiating polynucleotide; f) catalyzing
under conditions suitable for primer extension: 1) polynucleotide
synthesis from the 3'-hydroxyl of the plus strand of the extended
initiating polynucleotide; 2) polynucleotide synthesis from the
3'-hydroxyl of the annealed third oligonucleotide; 3)
polynucleotide synthesis from the 3'-hydroxyl of the minus strand
of the extended initiating polynucleotide; and 4) polynucleotide
synthesis from the 3'-hydroxyl of the annealed fourth
oligonucleotide, resulting in the bi-directional extension of the
initiating sequence thereby forming a nascent extended initiating
polynucleotide; and g) optionally repeating e) through f) as
desired, resulting in formation of the target polynucleotide
sequence.
[0011] The invention further provides a method for isolating a
target polypeptide encoded by a target polynucleotide generated by
a method of the invention by: a) incorporating the target
polynucleotide in an expression vector; b) introducing the
expression vector into a suitable host cell; c) culturing the cell
under conditions and for such time as to promote the expression of
the target polypeptide encoded by the target polynucleotide; and d)
isolating the target polypeptide.
[0012] The invention further provides a method of synthesizing a
target polynucleotide including: a) providing a target
polynucleotide sequence derived from a model sequence; b)
chemically synthesizing a plurality of single-stranded
oligonucleotides each of which is partially complementary to at
least one oligonucleotide present in the plurality, where the
sequence of the plurality of oligonucleotides is a contiguous
sequence of the target polynucleotide; c) contacting the partially
complementary oligonucleotides under conditions and for such time
suitable for annealing, the contacting resulting in a plurality of
partially double-stranded polynucleotides, where each
double-stranded polynucleotide includes a 5' overhang and a 3'
overhang; d) identifying at least one initiating polynucleotide
derived from the model sequence present in the plurality of
double-stranded polynucleotides; e) in the absence of primer
extension, subjecting a mixture including the initiating
polynucleotide and 1) a double-stranded polynucleotide that will
anneal to the 5' portion of the initiating polynucleotide; 2) a
double-stranded polynucleotide that will anneal to the 3' portion
of the initiating polynucleotide; and 3) a DNA ligase under
conditions suitable for annealing and ligation, wherein the
initiating polynucleotide is extended bi-directionally; f)
sequentially annealing double-stranded polynucleotides to the
extended initiating polynucleotide through repeated cycles of
annealing, whereby the target polynucleotide is produced.
[0013] The invention further provides a computer program, stored on
a computer-readable medium, for generating a target polynucleotide
sequence derived from a model sequence, the computer program
comprising instructions for causing a computer system to: a)
identify an initiating polynucleotide sequence contained in the
target polynucleotide sequence; b) parse the target polynucleotide
sequence into multiple distinct, partially complementary,
oligonucleotides; c) control assembly of the target polynucleotide
sequence by controlling the bi-directional extension of the
initiating polynucleotide sequence by the sequential addition of
partially complementary oligonucleotides resulting in a contiguous
double-stranded polynucleotide.
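The parsing step b) can be illustrated with a minimal sketch. It is not the program of the invention; it only assumes one simple parsing scheme, in which plus strand oligonucleotides tile the target end to end and minus strand oligonucleotides are offset by half an oligonucleotide length so that each minus strand oligonucleotide bridges two adjacent plus strand oligonucleotides. The function names `reverse_complement` and `parse_target` and the default length of 50 nucleotides are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Watson-Crick complement table for DNA (upper-case A, C, G, T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_target(target: str, oligo_len: int = 50) -> Tuple[List[str], List[str]]:
    """Parse a target sequence into partially complementary oligonucleotides.

    Hypothetical scheme: plus strand oligos tile the target from position 0;
    minus strand oligos cover the opposite strand, shifted by half an oligo.
    All oligos are returned written 5' to 3'. The last oligo of either set
    may be shorter than oligo_len.
    """
    if oligo_len < 2 or oligo_len % 2 != 0:
        raise ValueError("oligo_len must be a positive even number")
    half = oligo_len // 2
    plus = [target[i:i + oligo_len] for i in range(0, len(target), oligo_len)]
    minus = [
        reverse_complement(target[i:i + oligo_len])
        for i in range(half, len(target), oligo_len)
    ]
    return plus, minus
```

With this offset, every plus strand oligonucleotide shares half of its length with each of the two neighboring minus strand oligonucleotides, which is what allows the set to anneal into a contiguous, nicked duplex.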
[0014] The invention further provides a method for automated
synthesis of a target polynucleotide sequence, including: a)
providing the user with an opportunity to communicate a desired
target polynucleotide sequence; b) allowing the user to transmit
the desired target polynucleotide sequence to a server; c)
providing the user with a unique designation; d) obtaining the
transmitted target polynucleotide sequence provided by the
user.
[0015] The invention further provides a method for automated
synthesis of a polynucleotide sequence, including: a) providing a
user with a mechanism for communicating a model polynucleotide
sequence; b) optionally providing the user with an opportunity to
communicate at least one desired modification to the model sequence
if desired; c) allowing the user to transmit the model sequence and
desired modification to a server; d) providing the user with a unique
designation; e) obtaining the transmitted model sequence and
optional desired modification provided by the user; f) inputting
into a programmed computer, through an input device, data including
at least a portion of the model polynucleotide sequence; g)
determining, using the processor, the sequence of the model
polynucleotide sequence containing the desired modification; h)
further determining, using the processor, at least one initiating
polynucleotide sequence present in the model polynucleotide
sequence; i) selecting, using the processor, a model for
synthesizing the modified model polynucleotide sequence based on
the position of the initiating sequence in the model polynucleotide
sequence; and j) outputting, to the output device, the results of
the at least one determination.
[0016] Unless otherwise defined, all technical and scientific terms
used herein have the same meaning as commonly understood by one of
ordinary skill in the art to which this invention belongs. For
example, the one letter and three letter abbreviations for amino
acids and the one-letter abbreviations for nucleotides are commonly
understood. Although methods and materials similar or equivalent to
those described herein can be used in the practice or testing of
the present invention, suitable methods and materials are described
below. In addition, the materials, methods and examples are
illustrative only and not intended to be limiting. All
publications, patent applications, patents, and other references
mentioned herein are incorporated by reference in their entirety.
In case of conflict, the present specification, including
definitions, will control.
[0017] The details of one or more embodiments of the invention are
set forth in the accompanying drawings and the description below.
Other features, objects, and advantages of the invention will be
apparent from the description and drawings, and from the
claims.
DESCRIPTION OF DRAWINGS
[0018] Like reference symbols in the various drawings indicate like
elements.
[0019] FIG. 1 depicts 96-well plates for F (i.e., "forward" or
"plus strand") oligonucleotide synthesis, R (i.e., "reverse" or
"minus strand") oligonucleotide synthesis, and a T (i.e.,
"temperature") plate for the annealing of F and R
oligonucleotides.
[0020] FIG. 2 depicts the oligonucleotide pooling plan where F
oligonucleotides and R oligonucleotides are annealed to form a
contiguous polynucleotide.
[0021] FIG. 3 depicts the schematic of assembly of a target
polynucleotide sequence defining a gene, genome, set of genes or
polypeptide sequence. The sequence is designed by computer and used
to generate a set of parsed oligonucleotide fragments covering the
+ and - strand of a target polynucleotide sequence encoding a
target polypeptide.
[0022] FIG. 4 depicts a schematic of the polynucleotide synthesis
modules. A nanodispensing head with a plurality of valves will
deposit synthesis chemicals in assembly vessels. Chemical
distribution from the reagent reservoir can be controlled using a
syringe pump. Underlying the reaction chambers is a set of assembly
vessels linked to microchannels that will move fluids by
microfluidics.
[0023] FIG. 5 depicts that oligonucleotide synthesis,
oligonucleotide assembly by pooling and annealing, and ligation can
be accomplished using microfluidic mixing.
[0024] FIG. 6 depicts the sequential pooling of oligonucleotides
synthesized in arrays.
[0025] FIG. 7 depicts the pooling stage of the oligonucleotide
components through the manifold assemblies resulting in the
complete assembly of all oligonucleotides from the array.
[0026] FIG. 8 depicts an example of an assembly module comprising a
complete set of pooling manifolds produced using microfabrication
in a single unit. Various configurations of the pooling manifold
will allow assembly of increased numbers of well arrays of parsed
component oligonucleotides.
[0027] FIG. 9 depicts the configuration for the assembly of
oligonucleotides synthesized in a pre-defined array. Passage
through the assembly device in the presence of DNA ligase and other
appropriate buffer and chemical components will facilitate double
stranded polynucleotide assembly.
[0028] FIG. 10 depicts an example of the pooling device design.
Microgrooves or microfluidic channels are etched into the surface
of the pooling device. The device provides a microreaction vessel
at the junction of two channels for 1) mixing of the two streams,
2) controlled temperature maintenance or cycling at the site of the
junction and 3) expulsion of the ligated mixture from the exit
channel into the next set of pooling and ligation chambers.
[0029] FIG. 11 depicts the design of a polynucleotide synthesis
platform comprising microwell plates addressed with a plurality of
channels for microdispensing.
[0030] FIG. 12 depicts an example of a high capacity polynucleotide
synthesis platform using high density microwell microplates capable
of synthesizing in excess of 1536 component oligonucleotides per
plate.
[0031] FIG. 13 depicts a polynucleotide assembly format using
surface-bound oligonucleotide synthesis rather than soluble
synthesis. In this configuration, oligonucleotides are synthesized
with a linker that allows attachment to a solid support.
[0032] FIG. 14 depicts a diagram of systematic polynucleotide
assembly on a solid support. A set of parsed component
oligonucleotides are arranged in an array with a stabilizer
oligonucleotide attached. A set of ligation substrate
oligonucleotides are placed in the solution and systematic assembly
is carried out in the solid phase by sequential annealing, ligation
and melting.
[0033] FIG. 15 depicts polynucleotide assembly using component
oligonucleotides bound to a set of metal electrodes on a
microelectronic chip. Each electrode can be controlled
independently with respect to current and voltage.
[0034] FIG. 16 depicts generally a primer extension assembly method
of the invention.
[0035] FIG. 17 provides a system diagram of the invention.
[0036] FIG. 18 depicts a perspective view of an instrument of the
invention.
DETAILED DESCRIPTION
[0037] The complete sequences of complex genomes, including the
human genome, make large scale functional approaches to genetics
possible. The present invention outlines a novel approach to
utilizing the results of genomic sequence information by
computer-directed polynucleotide assembly based upon information
available in databases such as the human genome database.
Specifically, the present invention may be used to synthesize,
assemble and select a novel, synthetic target polynucleotide
sequence encoding a target polypeptide. The target polynucleotide
may encode a target polypeptide that exhibits enhanced or altered
biological activity as compared to a model polypeptide encoded by a
natural (wild-type) or model polynucleotide sequence. Subsequently,
standard assays may be used to survey the activity of an expressed
target polypeptide. For example, the expressed target polypeptide
can be assayed to determine its ability to carry out the function
of the corresponding model polypeptide or to determine whether a
target polypeptide exhibiting a new function has been produced.
Thus, the present invention provides a means for synthetically
evolving a model polypeptide by synthesizing, in a
computer-directed fashion, polynucleotides encoding a target
polypeptide derived from a model polypeptide.
[0038] In one embodiment, the invention provides a method of
synthesizing a target polynucleotide by providing a target
polynucleotide sequence and identifying at least one initiating
polynucleotide present in the target polynucleotide which includes
at least one plus strand oligonucleotide annealed to at least one
minus strand oligonucleotide resulting in a partially
double-stranded polynucleotide comprised of a 5' overhang and a 3'
overhang. As used herein, a "target polynucleotide sequence"
includes any nucleic acid sequence suitable for encoding a target
polypeptide that can be synthesized by a method of the invention. A
target polynucleotide sequence can be used to generate a target
polynucleotide using an apparatus capable of assembling nucleic acid
sequences. Generally, a target polynucleotide sequence is a linear
segment of DNA having a double-stranded region; the segment may be
of any length sufficiently long to be created by the hybridization
of at least two oligonucleotides having complementary regions. It is
contemplated that a target polynucleotide can be 100, 200, 300,
400, 800, 1000, 1500, 2000, 4000, 8000, 10,000, 12,000, 18,000, 20,000,
40,000, 80,000 or more base pairs in length. Indeed, it is
contemplated that the methods of the present invention will be able
to create entire artificial genomes of lengths comparable to known
bacterial, yeast, viral, mammalian, amphibian, reptilian, or avian
genomes. In more particular embodiments, the target polynucleotide
is a gene encoding a polypeptide of interest. The target
polynucleotide may further include non-coding elements such as
origins of replication, telomeres, promoters, enhancers,
transcription and translation start and stop signals, introns, exon
splice sites, chromatin scaffold components and other regulatory
sequences. The target polynucleotide may comprise multiple genes,
chromosomal segments, chromosomes and even entire genomes. A
polynucleotide of the invention may be derived from prokaryotic or
eukaryotic sequences, including those of bacteria, yeast, viruses,
mammals, amphibians, reptiles, birds, plants, archaebacteria and other
DNA-containing living organisms.
[0039] An "oligonucleotide", as used herein, is defined as a
molecule comprised of two or more deoxyribonucleotides or
ribonucleotides, preferably more than three. Its exact size will
depend on many factors, such as the reaction temperature, salt
concentration, the presence of denaturants such as formamide, and
the degree of complementarity with the sequence to which the
oligonucleotide is intended to hybridize.
[0040] The term "nucleotide" as used herein can refer to
nucleotides present in either DNA or RNA and thus includes
nucleotides which incorporate adenine, cytosine, guanine, thymine
and uracil as base, the sugar moiety being deoxyribose or ribose.
It will be appreciated however that other modified bases capable of
base pairing with one of the conventional bases, adenine, cytosine,
guanine, thymine and uracil, may be used in an oligonucleotide
employed in the present invention. Such modified bases include for
example 8-azaguanine and hypoxanthine. If desired the nucleotides
may carry a label or marker so that on incorporation into a primer
extension product, they augment the signal associated with the
primer extension product, for example for capture on to solid
phase.
[0041] A "plus strand" oligonucleotide, by convention, includes a
short, single-stranded DNA segment that starts with the 5' end to
the left as one reads the sequence. A "minus strand"
oligonucleotide includes a short, single-stranded DNA segment that
starts with the 3' end to the left as one reads the sequence.
Methods of synthesizing oligonucleotides are found in, for example,
Oligonucleotide Synthesis: A Practical Approach, Gait, ed., IRL
Press, Oxford (1984), incorporated herein by reference in its
entirety. Solid-phase synthesis techniques have been provided for
the synthesis of several peptide sequences on, for example, a
number of "pins" (See e.g., Geysen et al., J. Immun. Meth. (1987)
102:259-274, incorporated herein by reference in its entirety).
[0042] Additional methods of forming large arrays of
oligonucleotides and other polymer sequences in a short period of
time have been devised. Of particular note, Pirrung et al., U.S.
Pat. No. 5,143,854 (see also PCT Application No. WO 90/15070),
Fodor et al., PCT Publication No. WO 92/10092 and Winkler et al.,
U.S. Pat. No. 6,136,269, all incorporated herein by reference,
disclose methods of forming vast arrays of polymer sequences using,
for example, light-directed synthesis techniques. See also, Fodor
et al., Science (1991) 251:767-777, also incorporated herein by
reference in its entirety. Some work has been done to automate
synthesis of polymer arrays. For example, Southern, PCT Application
No. WO 89/10977, describes the use of a conventional pen plotter to
deposit three different monomers at twelve distinct locations on a
substrate.
[0043] An "initiating polynucleotide sequence," as used herein, is
a sequence contained in a target polynucleotide sequence and
identified by an algorithm of the invention. An "initiating
polynucleotide" is the physical embodiment of an initiating
polynucleotide sequence. For ligation assembly of a target
polynucleotide, an initiating polynucleotide begins assembly by
providing an anchor for hybridization of subsequent polynucleotides
contiguous with the initiating polynucleotide. Thus, for ligation
assembly, an initiating polynucleotide is partially double-stranded
nucleic acid thereby providing single-stranded overhang(s) for
annealing of a contiguous, double-stranded nucleic acid molecule.
For primer extension assembly of a target polynucleotide, an
initiating polynucleotide begins assembly by providing a template
for hybridization of subsequent oligonucleotides contiguous with
the initiating polynucleotide. Thus, for primer extension assembly,
an initiating polynucleotide can be partially double-stranded or
fully double-stranded.
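The role of the initiating polynucleotide as an anchor for bi-directional assembly can be sketched as follows. The sketch assumes, for illustration only, that the target has been parsed into contiguous segments numbered from 0 and that one segment is added on each side of the growing product per cycle; the function name `bidirectional_order` is hypothetical.

```python
from typing import Iterator, Optional, Tuple


def bidirectional_order(
    n_segments: int, initiating: int
) -> Iterator[Tuple[Optional[int], Optional[int]]]:
    """Yield, per assembly cycle, the (upstream, downstream) segment indices
    that are added next to the growing polynucleotide.

    None marks a side on which the end of the target has already been reached.
    """
    if not 0 <= initiating < n_segments:
        raise ValueError("initiating index must lie within the target")
    left, right = initiating - 1, initiating + 1
    while left >= 0 or right < n_segments:
        yield (
            left if left >= 0 else None,
            right if right < n_segments else None,
        )
        left -= 1
        right += 1
```

For a target parsed into five segments with segment 2 as the initiating polynucleotide, the first cycle adds segments 1 and 3 and the second cycle adds segments 0 and 4. An initiating polynucleotide at one end of the target reduces the scheme to uni-directional extension.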
[0044] In one embodiment, an initiating polynucleotide of the
invention can be bound to a solid support for improved efficiency.
The solid phase allows for the efficient separation of the
assembled target polynucleotide from other components of the
reaction. Different supports can be applied in the method. For
example, supports can be magnetic latex beads or magnetic controlled
pore glass beads that allow the desired product to be magnetically
separated from the reaction mixture. Binding the
initiating polynucleotide to such beads can be accomplished by a
variety of known methods, for example carbodiimide treatment
(Gilham, Biochemistry 7:2809-2813 (1968); Mizutani and Tachibana, J.
Chromatography 356:202-205 (1986); Wolf et al., Nucleic Acids Res.
15:2911-2926 (1987); Musso, Nucleic Acids Res. 15:5353-5372 (1987);
Lund et al., Nucleic Acids Res. 16:10861-10880 (1988)).
[0045] The initiating polynucleotide attached to the solid phase
can act as an anchor for the continued synthesis of the target
polynucleotide. Assembly can be accomplished by addition of
contiguous polynucleotides together with ligase for ligation
assembly or by addition of oligonucleotides together with
polymerase for primer extension assembly. After the appropriate
incubation time, unbound components of the method can be washed out
and the reaction can be repeated again to improve the efficiency of
template utilization. Alternatively, another set of polynucleotides
or oligonucleotides can be added to continue the assembly.
[0046] To be used efficiently for the synthesis, the solid phase can
contain pores with sufficient room for synthesis of the long
nucleic acid molecules. The solid phase can be composed of material
that does not non-specifically bind any undesired components of the
reaction. One way to solve the problem is to use controlled pore glass
beads appropriate for long DNA molecules. The initiating
polynucleotide can be attached to the beads through a long
connector. The role of the connector is to hold the initiating
polynucleotide at a desirable distance from the surface of the
solid support.
[0047] The method of the invention further includes identifying a
second polynucleotide sequence present in the target polynucleotide
which is contiguous with the initiating polynucleotide and includes
at least one plus strand oligonucleotide annealed to at least one
minus strand oligonucleotide resulting in a partially
double-stranded polynucleotide comprised of a 5' overhang, a 3'
overhang, or a 5' overhang and a 3' overhang, where at least one
overhang of the second polynucleotide is complementary to at least
one overhang of the initiating polynucleotide. Two or more
oligonucleotides having complementary regions, where they are
permitted, will "anneal" (i.e., base pair) under the appropriate
conditions, thereby producing a double-stranded region. In order to
anneal (i.e., hybridize), oligonucleotides must be at least
partially complementary. The term "complementary to" is used herein
in relation to nucleotides to mean a nucleotide that will base pair
with another specific nucleotide. Thus adenosine triphosphate is
complementary to uridine triphosphate or thymidine triphosphate and
guanosine triphosphate is complementary to cytidine
triphosphate.
[0048] As used herein, a 5' or 3' "overhang" means a region on the
5' or 3', or 5' and 3', end of a polynucleotide that is
single-stranded, i.e. not base paired. An overhang provides a means
for the subsequent annealing of a contiguous polynucleotide
containing an overhang that is complementary to the overhang of the
contiguous polynucleotide. Depending on the application envisioned,
one will desire to employ varying conditions of annealing to
achieve varying degrees of annealing selectivity.
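
The complementarity test implied by the definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `reverse_complement` and `overhangs_anneal` are hypothetical, DNA bases alone are considered, and perfect complementarity over the full overhang is assumed.

```python
# Watson-Crick pairing for DNA bases (illustrative; RNA and mismatches are ignored).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def overhangs_anneal(overhang_a: str, overhang_b: str) -> bool:
    """True if two single-stranded overhangs (each written 5'->3')
    are perfectly complementary over their full length."""
    return overhang_a.upper() == reverse_complement(overhang_b)
```

For example, `overhangs_anneal("AAGG", "CCTT")` is true, while `overhangs_anneal("AAGG", "TTCC")` is false because both overhangs must be read 5' to 3'.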
[0049] For applications requiring high selectivity, one typically
will desire to employ relatively stringent conditions to form the
hybrids, e.g., one will select relatively low salt and/or high
temperature conditions, such as provided by about 0.02 M to about
0.10 M NaCl at temperatures of about 50° C. to about
70° C. Such high stringency conditions tolerate little, if
any, mismatch between the oligonucleotide and the template or
target strand. It generally is appreciated that conditions can be
rendered more stringent by the addition of increasing amounts of
formamide.
[0050] For certain applications, for example, by analogy to
substitution of nucleotides by site-directed mutagenesis, it is
appreciated that lower stringency conditions may be used. Under
these conditions, hybridization may occur even though the sequences
of probe and target strand are not perfectly complementary, but are
mismatched at one or more positions. Conditions may be rendered
less stringent by increasing salt concentration and decreasing
temperature. For example, a medium stringency condition could be
provided by about 0.1 to 0.25 M NaCl at temperatures of about
37° C. to about 55° C., while a low stringency
condition could be provided by about 0.15 M to about 0.9 M salt, at
temperatures ranging from about 20° C. to about 55° C.
Thus, hybridization conditions can be readily manipulated
depending on the desired results.
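
The qualitative effect of salt, temperature and formamide on stringency can be illustrated with a melting temperature estimate. The sketch below uses a commonly cited empirical approximation for DNA:DNA hybrids, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 0.61(%formamide) - 500/L. This formula is an assumption brought in for illustration and is not part of the method described here; it is approximate, particularly for short oligonucleotides. The function name `estimate_tm` is hypothetical.

```python
import math


def estimate_tm(gc_percent: float, length: int, na_molar: float,
                formamide_percent: float = 0.0) -> float:
    """Approximate melting temperature (deg C) of a DNA:DNA hybrid.

    Empirical approximation (an assumption, not taken from the text):
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length)
```

Under this approximation, lowering the salt concentration or adding formamide lowers the estimated Tm, so a fixed incubation temperature sits closer to the Tm and the conditions become more stringent.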
[0051] In certain embodiments, it will be advantageous to determine
the hybridization of oligonucleotides by employing a label. A wide
variety of appropriate labels are known in the art, including
fluorescent, radioactive, enzymatic or other ligands, such as
avidin/biotin, which are capable of being detected. In preferred
embodiments, one may desire to employ a fluorescent label or an
enzyme tag such as urease, alkaline phosphatase or peroxidase,
instead of radioactive or other environmentally undesirable
reagents. In the case of enzyme tags, colorimetric indicator
substrates are known that can be employed to provide a means for
detection visible to the human eye or spectrophotometrically to
identify whether specific hybridization with complementary
oligonucleotide has occurred.
[0052] In embodiments involving a solid phase, for example, at
least one oligonucleotide of an initiating polynucleotide is
adsorbed or otherwise affixed to a selected matrix or surface. This
fixed, single-stranded nucleic acid is then subjected to
hybridization with the complementary oligonucleotides under desired
conditions. The selected conditions will also depend on the
particular circumstances based on the particular criteria required
(depending, for example, on the G+C content, type of target nucleic
acid, source of nucleic acid, size of hybridization probe, etc.).
Following washing of the hybridized surface to remove
non-specifically bound oligonucleotides, the hybridization may be
detected, or even quantified, by means of the label.
[0053] The method of the invention further provides a third
polynucleotide present in the target polynucleotide which is
contiguous with the initiating sequence and provides a 5' overhang,
a 3' overhang, or a 5' overhang and a 3' overhang, where at least
one overhang of the third polynucleotide is complementary to at
least one overhang of the initiating polynucleotide which is not
complementary to an overhang of the second polynucleotide.
[0054] The method further provides contacting the initiating
polynucleotide with the second polynucleotide and the third
polynucleotide under conditions and for such time suitable for
annealing, the contacting resulting in a contiguous double-stranded
polynucleotide, resulting in the bi-directional extension of the
initiating polynucleotide. The annealed polynucleotides are
optionally contacted with a ligase under conditions suitable for
ligation. The method discussed above is optionally repeated to
sequentially add double-stranded polynucleotides to the extended
initiating polynucleotide through repeated cycles of annealing and
ligation.
[0055] A target polynucleotide sequence can be designed de novo or
derived from a "model polynucleotide sequence". As used herein, a
"model polynucleotide sequence" includes any nucleic acid sequence
that encodes a model polypeptide sequence. A model polypeptide
sequence provides a basis for designing a modified polynucleotide
such that a target polynucleotide incorporating the desired
modification is synthesized.
[0056] The present invention also provides methods that
can be used to synthesize, de novo, polynucleotides that encode
sets of genes, either naturally occurring genes expressed from
natural or artificial promoter constructs or artificial genes
derived from synthetic DNA sequences, which encode elements of
biological systems that perform a specified function or attribute
of an artificial organism as well as entire genomes. In producing
such systems and genomes, the present invention provides the
synthesis of a replication-competent, double-stranded
polynucleotide, wherein the polynucleotide has an origin of
replication, a first coding region and a first regulatory element
directing the expression of the first coding region. By replication
competent, it is meant that the polynucleotide is capable of
directing its own replication. Thus, it is envisioned that the
polynucleotide will possess all the cis-acting signals required to
facilitate its own synthesis. In this respect, the polynucleotide
will be similar to a plasmid or a virus, such that once placed
within a cell, it is capable of replication by a combination of the
polynucleotide's and cellular functions.
[0057] A polynucleotide sequence defining a gene, genome, set of
genes or protein sequence can be designed in a computer-assisted
manner (discussed below) and used to generate a set of parsed
oligonucleotides covering the plus (+) and minus (-) strand of the
sequence. As used herein, "parsed" means that a target polynucleotide
sequence has been delineated in a computer-assisted manner such
that a series of contiguous oligonucleotide sequences are
identified. The oligonucleotide sequences are individually
synthesized and used in a method of the invention to generate a
target polynucleotide. The length of an oligonucleotide is quite
variable. Preferably, oligonucleotides used in the methods of the
invention are between about 15 and 100 bases and more preferably
between about 20 and 50 bases. Specific lengths include, but are
not limited to 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97, 98, 99 and 100 bases. Depending on the size, the overlap
between the oligonucleotides having partial complementarity may be
designed to be between 5 and 75 bases per oligonucleotide pair.
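
A minimal sketch of the parsing step, assuming fixed-length oligonucleotides and a fixed overlap, might look as follows. The function `parse_oligos` and its parameters are hypothetical names chosen for illustration; the actual computer-assisted parsing may apply additional criteria not modeled here.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def parse_oligos(target: str, oligo_len: int = 50, overlap: int = 25):
    """Split a target sequence into contiguous plus-strand oligonucleotides
    and minus-strand oligonucleotides offset so that each minus-strand
    oligonucleotide overlaps the end of one plus-strand oligonucleotide
    by `overlap` bases and bridges into the next.

    The final oligonucleotide of each strand may be shorter than oligo_len.
    """
    target = target.upper()
    offset = oligo_len - overlap  # start of the first minus-strand oligo
    plus = [target[i:i + oligo_len]
            for i in range(0, len(target), oligo_len)]
    minus = [reverse_complement(target[i:i + oligo_len])
             for i in range(offset, len(target), oligo_len)]
    return plus, minus
```

With the defaults, a 150-base target yields three 50-base plus-strand oligonucleotides and minus-strand oligonucleotides covering positions 26 to 75, 76 to 125 and 126 to 150.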
[0058] The oligonucleotides preferably are treated with
polynucleotide kinase, for example, T4 polynucleotide kinase. The
kinasing can be performed prior to, or after, mixing of the
oligonucleotide set, but before annealing. After
annealing, the oligonucleotides are treated with an enzyme having a
ligating function. For example, a DNA ligase typically will be
employed for this function. However, topoisomerase, which does not
require 5' phosphorylation, is rapid and operates at room
temperature, and may be used instead of ligase. For example, 50
base pair oligonucleotides overlapping by 25 bases can be
synthesized by an oligonucleotide array synthesizer (OAS). A 5' (+)
strand set of oligonucleotides is synthesized in one 96-well plate
and the second 3' or (-) strand set is synthesized in a second
96-well microtiter plate. Synthesis can be carried out using
phosphoramidite chemistry modified to miniaturize the reaction size
and generate small reaction volumes and yields in the range of 2 to
5 mmole. Synthesis is done on controlled pore glass beads (CPGs),
then the completed oligonucleotides are deblocked, deprotected and
removed from the beads. The oligonucleotides are lyophilized,
re-suspended in water and 5' phosphorylated using polynucleotide
kinase and ATP to enable ligation.
[0059] The set of arrayed oligonucleotide sequences in the plate
can be assembled using a mixed pooling strategy. For example,
systematic pooling of component oligonucleotides can be performed
using a modified Beckman Biomek automated pipetting robot, or
another automated lab workstation. The fragments can be combined
with buffer and enzyme (Taq I DNA ligase or Egea Assemblase™,
for example). Pooling can be performed in microwell plates. After
each step of pooling, the temperature is ramped to enable annealing
and ligation, then additional pooling carried out.
[0060] Target polynucleotide assembly involves forming a set of
intermediates. A set of intermediates can include a plus strand
oligonucleotide annealed to a minus strand oligonucleotide, as
described above. The annealed intermediate can be formed by
providing a single plus strand oligonucleotide annealed to a single
minus strand oligonucleotide.
[0061] Alternatively, two or more oligonucleotides may comprise the
plus strand or the minus strand. For example, in order to construct
a polynucleotide (e.g., an initiating polynucleotide) which can be
used to assemble a target polynucleotide of the invention, three or
more oligonucleotides can be annealed. Thus, a first plus strand
oligonucleotide, a second plus strand oligonucleotide contiguous
with the first plus strand oligonucleotide, and a minus strand
oligonucleotide having a first contiguous sequence which is at
least partially complementary to the first plus strand
oligonucleotide and second contiguous sequence which is at least
partially complementary to the second plus strand oligonucleotide
can be annealed to form a partially double-stranded polynucleotide.
The polynucleotide can include a 5' overhang, a 3' overhang, or a
5' overhang and a 3' overhang. The first plus strand
oligonucleotide and second plus strand oligonucleotide are
contiguous sequences such that they are ligatable. The minus strand
oligonucleotide is partially complementary to both plus strand
oligonucleotides and acts as a "bridge" or "stabilizer" sequence by
annealing to both oligonucleotides. Subsequent polynucleotides
comprised of more than two oligonucleotides annealed as previously
described, can be used to assemble a target polynucleotide in a
manner resulting in a contiguous double-stranded
polynucleotide.
[0062] An example of using two or more plus strand oligonucleotides
to assemble a polynucleotide is shown in FIG. 3. A triplex of three
oligonucleotides of about 50 bp each, which overlap by about 25 bp
form a "nicked" intermediate. Two of these oligonucleotides provide
a ligation substrate joined by ligase and the third oligonucleotide
is a stabilizer that brings together two specific sequences by
annealing resulting in the formation of a part of the final
polynucleotide construct. This intermediate provides a substrate
for DNA ligase which, through its nick sealing activity, joins the
two 50-base pair oligonucleotides into a single 100 base
single-stranded polynucleotide.
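
The role of the stabilizer in the triplex can be sketched as follows. This is a simplified model under stated assumptions: perfect complementarity is required, the overlap is exactly 25 bases on each side of the nick, and `ligate_with_bridge` is a hypothetical name.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def ligate_with_bridge(plus1: str, plus2: str, bridge: str, overlap: int = 25):
    """Join two contiguous plus-strand oligonucleotides if the minus-strand
    bridge is complementary to the last `overlap` bases of plus1 and the
    first `overlap` bases of plus2. Returns the ligated strand, or None
    if the bridge does not hold the two ends together."""
    junction = plus1[-overlap:].upper() + plus2[:overlap].upper()
    if reverse_complement(bridge) != junction:
        return None  # no nicked intermediate forms, so nothing to ligate
    return plus1.upper() + plus2.upper()
```

Two 50-base plus-strand oligonucleotides with a matching 50-base bridge return a single 100-base strand; a non-matching bridge returns `None`.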
[0063] Following initial pooling and formation of annealed
products, the products are assembled into increasingly larger
polynucleotides. For example, following triplex formation of
oligonucleotides, sets of triplexes are systematically joined,
ligated, and assembled. Each step can be mediated by robotic
pooling, ligation and thermal cycling to achieve annealing and
denaturation. The final step joins assembled pieces into a complete
sequence representing all of the fragments in the array. Since the
efficiency of yield at each step is less than 100%, the mass amount
of completed product in the final mixture may be very small.
Optionally, additional specific oligonucleotide primers, usually 15
to 20 bases and complementary to the extreme ends of the assembly,
can be annealed and PCR amplification carried out, thereby
amplifying and purifying the final full-length product.
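
The reason the final yield may be very small can be illustrated with simple arithmetic. The sketch below assumes, for illustration only, that fragments are joined pairwise in each round and that every round has the same efficiency; neither assumption is stated in the method, and the function names are hypothetical.

```python
def pooling_rounds(n_fragments: int) -> int:
    """Number of pairwise joining rounds needed to merge n fragments into one."""
    rounds = 0
    while n_fragments > 1:
        n_fragments = (n_fragments + 1) // 2  # an odd fragment carries over
        rounds += 1
    return rounds


def cumulative_yield(step_efficiency: float, n_steps: int) -> float:
    """Fraction of starting material expected to reach full length
    if every step has the same efficiency."""
    return step_efficiency ** n_steps
```

Under these assumptions, 96 fragments need 7 rounds, and at 50% efficiency per round less than 1% of the material would be full length, which is why a final PCR amplification step can be useful.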
[0064] The methods of the invention provide several improvements
over existing polynucleotide synthesis technology. For example,
synthesis can utilize microdispensing piezoelectric or
microsolenoid nanodispensers allowing very fast synthesis, much
smaller reaction volumes and higher density plates as synthesis
vessels. The instrument will use up to 1536 well plates giving a
very high capacity. Additionally, controlled pooling can be
performed by a microfluidic manifold that will move individual
oligonucleotides through microchannels and mix/ligate in a
controlled way. This will obviate the need for robotic pipetting
and increase speed and efficiency. Thus, an apparatus that
accomplishes a method of the invention will have a greater
capability for simultaneous reactions giving an overall larger
capacity for gene length.
[0065] Once target polynucleotides have been synthesized using a
method of the present invention, it may be necessary to screen the
sequences for analysis of function. Specifically contemplated by
the present inventor are chip-based DNA technologies. Briefly,
these techniques involve quantitative methods for analyzing large
numbers of genes rapidly and accurately. By tagging genes with
oligonucleotides or using fixed probe arrays, one can employ chip
technology to segregate target molecules as high-density arrays and
screen these molecules on the basis of hybridization.
[0066] The use of combinatorial synthesis and high throughput
screening assays is well known to those of skill in the art. For
example, U.S. Pat. Nos. 5,807,754; 5,807,683; 5,804,563; 5,789,162;
5,783,384; 5,770,358; 5,759,779; 5,747,334; 5,686,242; 5,198,346;
5,738,996; 5,733,743; 5,714,320; and 5,663,046 (each specifically
incorporated herein by reference) describe screening systems useful
for determining the activity of a target polypeptide. These
patents teach various aspects of the methods and compositions
involved in the assembly and activity analyses of high-density
arrays of different polysubunits (polynucleotides or polypeptides).
As such it is contemplated that the methods and compositions
described in the patents listed above may be useful in assaying the
activity profiles of the target polypeptides of the present
invention.
[0067] In another embodiment, the invention provides a method of
synthesizing a target polynucleotide by providing a target
polynucleotide sequence and identifying at least one initiating
polynucleotide sequence present in the target polynucleotide
sequence that includes at least one plus strand oligonucleotide
annealed to at least one minus strand oligonucleotide resulting in
a double-stranded polynucleotide. The initiating polynucleotide is
contacted under conditions suitable for primer annealing with a
first oligonucleotide having partial complementarity to the 3'
portion of the plus strand of the initiating polynucleotide, and a
second oligonucleotide having partial complementarity to the 3'
portion of the minus strand of the initiating polynucleotide.
Primer extension is subsequently performed using polynucleotide
synthesis from the 3'-hydroxyl of: 1) the plus strand of the
initiating polynucleotide; 2) the annealed first oligonucleotide;
3) the minus strand of the initiating polynucleotide; and 4) the
annealed second oligonucleotide. The synthesis results in the
initiating sequence being extended bi-directionally thereby forming
a nascent extended initiating polynucleotide. The extended
initiating sequence can be further extended by repeated cycles of
annealing and primer extension.
[0068] As previously noted, oligonucleotides can be used as
building blocks to assemble polynucleotides through annealing and
ligation reactions. Alternatively, oligonucleotides can be used as
primers to manufacture polynucleotides through annealing and primer
extension reactions. The term "primer" is used herein to refer to a
binding element which comprises an oligonucleotide, whether
occurring naturally as in a purified restriction digest or produced
synthetically, which is capable of acting as a point of initiation
of synthesis when placed under conditions in which synthesis of a
primer extension product which is complementary to a nucleic acid
strand is induced, i.e., in the presence of appropriate nucleotides
and an agent for polymerization such as a DNA polymerase in an
appropriate buffer ("buffer" includes pH, ionic strength,
cofactors, etc.) and at a suitable temperature.
[0069] The primer is preferably single stranded for maximum
efficiency in amplification, but may alternatively be double
stranded. If double stranded, the primer is first treated to
separate its strands before being used to prepare extension
products. Preferably, the primer is an oligodeoxyribonucleotide.
The primer must be sufficiently long to prime the synthesis of
extension products in the presence of the agent for polymerization.
The exact lengths of the primers will depend on many factors,
including temperature and source of primer and use of the method.
Primers having only short sequences capable of hybridization to the
target nucleotide sequence generally require lower temperatures to
form sufficiently stable hybrid complexes with the template.
The primers herein are selected to be "substantially"
complementary to the different strands of each specific sequence to
be amplified. This means that the primers must be sufficiently
complementary to hybridize with their respective strands.
Therefore, the primer sequence need not reflect the exact sequence
of the template. Commonly, however, the primers have exact
complementarity except with respect to analyses effected according
to the method described in Nucleic Acids Research 17 (7) 2503-2516
(1989) or a corresponding method employing linear amplification or
an amplification technique other than the polymerase chain
reaction.
[0070] The agent for primer extension of an oligonucleotide may be
any compound or system that will function to accomplish the
synthesis of primer extension products, including enzymes. Suitable
enzymes for this purpose include, for example, E. coli DNA
Polymerase I, Klenow fragment of E. coli DNA polymerase I, T4 DNA
polymerase, other available DNA polymerases, reverse transcriptase,
and other enzymes, including thermostable enzymes. The term
"thermostable enzyme" as used herein refers to any enzyme that is
stable to heat and is heat resistant and catalyses (facilitates)
combination of the nucleotides in the proper manner to form the
primer extension products which are complementary to each nucleic
acid strand. Generally, the synthesis will be initiated at the 3'
end of each primer and will proceed in the 5' direction along the
template strand, until synthesis terminates. A preferred
thermostable enzyme that may be employed in the process of the
present invention is that which can be extracted and purified from
Thermus aquaticus. Such an enzyme has a molecular weight of about
86,000-90,000 daltons. Thermus aquaticus strain YT1 is available
without restriction from the American Type Culture Collection,
12301 Parklawn Drive, Rockville, Md., U.S.A. as ATCC 25,104.
[0071] Processes for amplifying a desired target polynucleotide are
known and have been described in the literature. K. Kleppe et al in
J. Mol. Biol., (1971), 56, 341-361 disclose a method for the
amplification of a desired DNA sequence. The method involves
denaturation of a DNA duplex to form single strands. The
denaturation step is carried out in the presence of a sufficiently
large excess of two nucleic acid primers that hybridize to regions
adjacent to the desired DNA sequence. Upon cooling two structures
are obtained each containing the full length of the template strand
appropriately complexed with primer. DNA polymerase and a
sufficient amount of each required nucleoside triphosphate are
added whereby two molecules of the original duplex are obtained.
The above cycle of denaturation, primer addition and extension are
repeated until the appropriate number of copies of the desired
target polynucleotide is obtained.
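
The growth in copy number over repeated cycles can be sketched as follows. Ideal doubling per cycle corresponds to an efficiency of 1.0; the `efficiency` parameter and the function names are assumptions added for illustration and do not appear in the cited method.

```python
def copies_after(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after a given number of cycles.

    efficiency = 1.0 models perfect doubling in every cycle.
    """
    return initial * (1.0 + efficiency) ** cycles


def cycles_needed(initial: float, desired: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles giving at least `desired` copies."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    copies, cycles = float(initial), 0
    while copies < desired:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles
```

With perfect doubling, a single duplex gives 1024 copies after 10 cycles, and 10 cycles are needed to reach at least 1000 copies.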
[0072] The present invention further provides a method for the
expression and isolation of a target polypeptide encoded by a
target polynucleotide. The method includes incorporating a target
polynucleotide synthesized by a method of the invention into an
expression vector; introducing the expression vector into a
suitable host cell; culturing the host cell under conditions and
for such time as to promote the expression of the target
polypeptide encoded by the target polynucleotide; and isolating the
target polypeptide.
[0073] The invention can be used to modify certain functional,
structural, or phylogenic features of a model polynucleotide
encoding a model polypeptide resulting in an altered target
polypeptide. An input or model polynucleotide sequence encoding a
model polypeptide can be electronically manipulated to determine a
potential for an effect of an amino acid change (or variance) at a
particular site or multiple sites in the model polypeptide. Once
identified, a novel target polynucleotide sequence is assembled by
a method of the invention such that the target polynucleotide
encodes a target polypeptide possessing a characteristic different
from that of the model polypeptide.
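
Electronic manipulation of a model sequence to introduce an amino acid change can be sketched as below. The sketch assumes the standard genetic code and simply substitutes the first codon found for the new amino acid; codon preference, which a real design would likely consider, is not modeled. The names `translate` and `substitute_residue` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence (length a multiple of 3) to one-letter amino acids."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def substitute_residue(cds: str, position: int, new_aa: str) -> str:
    """Return a coding sequence in which the codon at the 1-based residue
    position is replaced by a codon for new_aa (the first one in the table)."""
    codon = next(c for c, a in CODON_TABLE.items() if a == new_aa)
    start = 3 * (position - 1)
    return cds[:start].upper() + codon + cds[start + 3:].upper()
```

For example, substituting residue 2 of the model sequence `ATGGCTAAA` (Met-Ala-Lys) with tryptophan yields a target sequence that translates to Met-Trp-Lys.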
[0074] The methods of the invention may rely on the use of public
sequence and structure databases. These databases become more
robust as more and more sequences and structures are added.
Information regarding the amino acid sequence of a target
polypeptide and the tertiary structure of the polypeptide can be
used to synthesize oligonucleotides that can be assembled into a
target polynucleotide encoding a target polypeptide. A model
polypeptide should have sufficient structural information to
analyze the amino acids involved in the function of the
polypeptide. The structural information can be derived from x-ray
crystallography, NMR, or some other technique for determining the
structure of a protein at the amino acid or atomic level. Once
selected, the sequence and structural information obtained from the
model polypeptide can be used to generate a plurality of
polynucleotides encoding a plurality of variant amino acid
sequences that comprise a target polypeptide. Thus, a model
polypeptide can be selected based on overall sequence similarity to
the target protein or based on the presence of a portion having
sequence similarity to a portion of the target polypeptide.
[0075] A "polypeptide", as used herein, is a polymer in which the
monomers are alpha amino acids and are joined together through
amide bonds. Amino acids may be the L-optical isomer or the
D-optical isomer. Polypeptides are two or more amino acid monomers
long and are often more than 20 amino acid monomers long. Standard
abbreviations for amino acids are used (e.g., P for proline). These
abbreviations are included in Stryer, Biochemistry, Third Ed.,
1988, which is incorporated herein by reference for all purposes.
With respect to polypeptides, "isolated" refers to a polypeptide
that constitutes the major component in a mixture of components,
e.g., 50% or more, 60% or more, 70% or more, 80% or more, 90% or
more, or 95% or more by weight. Isolated polypeptides typically are
obtained by purification from an organism in which the polypeptide
has been produced, although chemical synthesis is also possible.
Methods of polypeptide purification include, for example,
chromatography or immunoaffinity techniques.
[0076] Polypeptides of the invention may be detected by sodium
dodecyl sulphate (SDS)-polyacrylamide gel electrophoresis followed
by Coomassie Blue-staining or Western blot analysis using
monoclonal or polyclonal antibodies that have binding affinity for
the polypeptide to be detected.
[0077] A "chimeric polypeptide," as used herein, is a polypeptide
containing portions of amino acid sequence derived from two or more
different proteins, or two or more regions of the same protein that
are not normally contiguous.
[0078] A "ligand", as used herein, is a molecule that is recognized
by a receptor. Examples of ligands that can be investigated by this
invention include, but are not restricted to, agonists and
antagonists for cell membrane receptors, toxins and venoms, viral
epitopes, hormones, opiates, steroids, peptides, enzyme substrates,
cofactors, drugs, lectins, sugars, oligonucleotides, nucleic acids,
oligosaccharides, and proteins.
[0079] A "receptor", as used herein, is a molecule that has an
affinity for a ligand. Receptors may be naturally-occurring or
manmade molecules. They can be employed in their unaltered state or
as aggregates with other species. Receptors may be attached,
covalently or noncovalently, to a binding member, either directly
or via a specific binding substance. Examples of receptors which
can be employed by this invention include, but are not restricted
to, antibodies, cell membrane receptors, monoclonal antibodies and
antisera reactive with specific antigenic determinants, viruses,
cells, drugs, polynucleotides, nucleic acids, peptides, cofactors,
lectins, sugars, polysaccharides, cellular membranes, and
organelles. A "ligand receptor pair" is formed when two molecules
have combined through molecular recognition to form a complex.
[0080] Specific examples of polypeptides which can be synthesized by
this invention include but are not restricted to:
[0081] a) Microorganism Receptors:
[0082] Determination of ligands that bind to microorganism
receptors such as specific transport proteins or enzymes essential
to survival of microorganisms would be a useful tool for
discovering new classes of antibiotics. Of particular value would
be antibiotics against opportunistic fungi, protozoa, and bacteria
resistant to antibiotics in current use.
[0083] b) Enzymes:
[0084] For instance, a receptor can comprise a binding site of an
enzyme such as an enzyme responsible for cleaving a
neurotransmitter; determination of ligands for this type of
receptor to modulate the action of an enzyme that cleaves a
neurotransmitter is useful in developing drugs that can be used in
the treatment of disorders of neurotransmission.
[0085] c) Antibodies:
[0086] For instance, the invention may be useful in investigating a
receptor that comprises a ligand-binding site on an antibody
molecule which combines with an epitope of an antigen of interest;
determining a sequence that mimics an antigenic epitope may lead to
the development of vaccines in which the immunogen is based on one
or more of such sequences or lead to the development of related
diagnostic agents or compounds useful in therapeutic treatments
such as for autoimmune diseases (e.g., by blocking the binding of
the "self" antibodies).
[0087] d) Polynucleotides:
[0088] Sequences of polynucleotides may be synthesized to establish
DNA or RNA binding sequences that act as receptors for synthesized
sequence.
e) Catalytic Polypeptides:
Polymers, preferably
antibodies, which are capable of promoting a chemical reaction
involving the conversion of one or more reactants to one or more
products. Such polypeptides generally include a binding site
specific for at least one reactant or reaction intermediate and an
active functionality proximate to the binding site, which
functionality is capable of chemically modifying the bound
reactant. Catalytic polypeptides and others are described in, for
example, PCT Publication No. WO 90/05746, WO 90/05749, and WO
90/05785, which are incorporated herein by reference for all
purposes.
[0089] f) Hormone Receptors:
[0090] Identification of the ligands that bind with high affinity
to a receptor such as the receptors for insulin and growth hormone
is useful in the development of, for example, an oral replacement
of the daily injections which diabetics must take to relieve the
symptoms of diabetes or a replacement for growth hormone. Other
examples of hormone receptors include the vasoconstrictive hormone
receptors; determination of ligands for these receptors may lead to
the development of drugs to control blood pressure.
[0091] g) Opiate Receptors:
[0092] Determination of ligands which bind to the opiate receptors
in the brain is useful in the development of less-addictive
replacements for morphine and related drugs.
[0093] In the context of a polypeptide, the term "structure" refers
to the three dimensional arrangement of atoms in the protein.
"Function" refers to any measurable property of a protein. Examples
of protein function include, but are not limited to, catalysis,
binding to other proteins, binding to non-protein molecules (e.g.,
drugs), and isomerization between two or more structural forms.
"Biologically relevant protein" refers to any protein playing a
role in the life of an organism.
[0094] To identify significant structural motifs, the sequence of
the model polypeptide is examined for matches to the entries in one
or more databases of recognized domains, e.g., the PROSITE database
domains (Bairoch, Nucl. Acids. Res. 24:217, 1997) or the Pfam HMM
database (Bateman et al., (2000) Nucl. Acids. Res. 28:263). The
PROSITE database is a compilation of two types of sequence
signatures: profiles, typically representing whole protein domains,
and patterns, typically representing just the most highly conserved
functional or structural aspects of protein domains.
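
Matching a model polypeptide against a PROSITE-style pattern can be sketched by converting the pattern to a regular expression. The converter below handles only the basic pattern elements (`x`, `[..]`, `{..}`, repeat counts in parentheses, and the `<` and `>` terminal anchors) and is an illustrative sketch rather than a complete implementation of the PROSITE syntax; profiles and HMM searches are not covered. The function names are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE-style pattern, e.g. 'N-{P}-[ST]-{P}',
    into a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")  # {P}    -> [^P]
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
    regex = regex.replace("x", ".")                     # x      -> any residue
    regex = regex.replace("<", "^").replace(">", "$")   # terminal anchors
    return regex


def find_motifs(pattern: str, protein: str) -> list:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    regex = prosite_to_regex(pattern)
    return [m.start() + 1
            for m in re.finditer(f"(?=({regex}))", protein.upper())]
```

For example, the N-glycosylation pattern `N-{P}-[ST]-{P}` matches the sequence `AANGSAANPSA` once, at position 3; the second asparagine is followed by proline and is excluded.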
[0095] The methods of the invention can be used to generate
polypeptides containing polymorphisms that have an effect on a
catalytic activity of a target polypeptide or a non-catalytic
activity of the target polypeptide (e.g., structure, stability,
binding to a second protein or polypeptide chain, binding to a
nucleic acid molecule, binding to a small molecule, and binding to
a macromolecule that is neither a protein nor a nucleic acid). For
example, the invention provides a means for assembling any
polynucleotide sequence encoding a target polypeptide such that the
encoded polypeptide can be expressed and screened for a particular
activity. By altering particular amino acids at specific points in
the target polypeptide, the operating temperature, operating pH, or
any other characteristic of a polypeptide can be manipulated
resulting in a polypeptide with a unique activity. Thus, the
methods of the invention can be used to identify amino acid
substitutions that can be made to engineer the structure or
function of a polypeptide of interest (e.g., to increase or
decrease a selected activity or to add or remove a selective
activity).
[0096] In addition, the methods of the invention can be used in the
identification and analysis of candidate polymorphisms for
polymorphism-specific targeting by pharmaceutical or diagnostic
agents, for the identification and analysis of candidate
polymorphisms for pharmacogenomic applications, and for
experimental biochemical and structural analysis of pharmaceutical
targets that exhibit amino acid polymorphism.
[0097] A library of target polynucleotides encoding a plurality of
target polypeptides can be prepared by the present invention. Host
cells are transformed by artificial introduction of the vectors
containing the target polynucleotide by inoculation under
conditions conducive for such transformation. The resultant
libraries of transformed clones are then screened for clones which
display activity for the polypeptide of interest in a phenotypic
assay for activity.
[0098] A target polynucleotide of the invention can be incorporated
(i.e., cloned) into an appropriate vector. For purposes of
expression, the target sequences encoding a target polypeptide of
the invention may be inserted into a recombinant expression vector.
The term "recombinant expression vector" refers to a plasmid,
virus, or other vehicle known in the art that has been manipulated
by insertion or incorporation of the polynucleotide sequence
encoding a target polypeptide of the invention. The expression
vector typically contains an origin of replication, a promoter, as
well as specific genes that allow phenotypic selection of the
transformed cells. Vectors suitable for use in the present
invention include, but are not limited to, the T7-based expression
vector for expression in bacteria (Rosenberg et al., Gene, 56:125,
1987), the pMSXND expression vector for expression in mammalian
cells (Lee and Nathans, J. Biol. Chem., 263:3521, 1988),
baculovirus-derived vectors for expression in insect cells, and
vectors derived from cauliflower mosaic virus (CaMV) or tobacco
mosaic virus (TMV).
[0099] Depending on the vector utilized, any of a number of
suitable transcription and translation elements, including
constitutive and inducible promoters, transcription enhancer
elements, transcription terminators, etc. may be used in the
expression vector (see, e.g., Bitter et al., Methods in Enzymology,
153:516-544, 1987). These elements are well known to one of skill
in the art.
[0100] The term "operably linked" or "operably associated" refers
to functional linkage between the regulatory sequence and the
polynucleotide sequence regulated by the regulatory sequence. The
operably linked regulatory sequence controls the expression of the
product expressed by the polynucleotide sequence. Alternatively,
the functional linkage also includes an enhancer element.
[0101] "Promoter" means a nucleic acid regulatory sequence
sufficient to direct transcription. Also included in the invention
are those promoter elements that are sufficient to render
promoter-dependent polynucleotide sequence expression cell-type
specific, tissue specific, or inducible by external
signals or agents; such elements may be located in the 5' or 3'
regions of the native gene, or in the introns.
[0102] "Gene expression" or "polynucleotide sequence expression"
means the process by which a nucleotide sequence undergoes
successful transcription and translation such that detectable
levels of the delivered nucleotide sequence are expressed in an
amount and over a time period so that a functional biological
effect is achieved.
[0103] In yeast, a number of vectors containing constitutive or
inducible promoters may be used. (Current Protocols in Molecular
Biology, Vol. 2, Ed. Ausubel et al., Greene Publish. Assoc. &
Wiley Interscience, Ch. 13, 1988; Grant et al., "Expression and
Secretion Vectors for Yeast," in Methods in Enzymology, Eds. Wu
& Grossman, Acad. Press, N.Y., Vol. 153, pp. 516-544, 1987;
Glover, DNA Cloning, Vol. II, IRL Press, Wash., D.C., Ch. 3, 1986;
Bitter, "Heterologous Gene Expression in Yeast," Methods in
Enzymology, Eds. Berger & Kimmel, Acad. Press, N.Y., Vol. 152,
pp. 673-684, 1987; and The Molecular Biology of the Yeast
Saccharomyces, Eds. Strathern et al., Cold Spring Harbor Press,
Vols. I and II, 1982). A constitutive yeast promoter, such as ADH
or LEU2, or an inducible promoter, such as GAL, may be used
("Cloning in Yeast," Ch. 3, R. Rothstein In: DNA Cloning Vol. II, A
Practical Approach, Ed. DM Glover, IRL Press, Wash., D.C., 1986).
Alternatively, vectors may be used which promote integration of
foreign DNA sequences into the yeast chromosome.
[0104] In certain embodiments, it may be desirable to include
specialized regions known as telomeres at the end of a target
polynucleotide sequence. Telomeres are repeated sequences found at
chromosome ends and it has long been known that chromosomes with
truncated ends are unstable, tend to fuse with other chromosomes
and are otherwise lost during cell division.
[0105] Some data suggest that telomeres interact with the
nucleoprotein complex and the nuclear matrix. One putative role for
telomeres includes stabilizing chromosomes and shielding the ends
from degradative enzymes.
[0106] Another possible role for telomeres is in replication.
According to present doctrine, replication of DNA starts from
short RNA primers annealed to the 3'-end of the template. The
result of this mechanism is an "end replication problem" in which
the region corresponding to the RNA primer is not replicated. Over
many cell divisions, this will result in the progressive truncation
of the chromosome. It is thought that telomeres may provide a
buffer against this effect, at least until they are themselves
eliminated by this effect. A further structure that may be included
in a target polynucleotide is a centromere.
[0107] In certain embodiments of the invention, the delivery of a
nucleic acid in a cell may be identified in vitro or in vivo by
including a marker in the expression construct. The marker would
result in an identifiable change to the transfected cell permitting
easy identification of expression.
[0108] An expression vector of the invention can be used to
transform a target cell. By "transformation" is meant a genetic
change induced in a cell following incorporation of new DNA (i.e.,
DNA exogenous to the cell). Where the cell is a mammalian cell, the
genetic change is generally achieved by introduction of the DNA
into the genome of the cell. By "transformed cell" is meant a cell
into which (or into an ancestor of which) a target polynucleotide
has been introduced by means of recombinant DNA techniques.
Transformation of a host cell
with recombinant DNA may be carried out by conventional techniques
as are well known to those skilled in the art. Where the host is
prokaryotic, such as E. coli, competent cells that are capable of
DNA uptake can be prepared from cells harvested after exponential
growth phase and subsequently treated by the CaCl2 method by
procedures well known in the art. Alternatively, MgCl2 or RbCl
can be used. Transformation can also be performed after forming a
protoplast of the host cell or by electroporation.
[0109] A target polypeptide of the invention can be produced in
prokaryotes by expression of nucleic acid encoding the polypeptide.
These include, but are not limited to, microorganisms, such as
bacteria transformed with recombinant bacteriophage DNA, plasmid
DNA, or cosmid DNA expression vectors encoding a polypeptide of the
invention. The constructs can be expressed in E. coli in large
scale for in vitro assays. Purification from bacteria is simplified
when the sequences include tags for one-step purification by
nickel-chelate chromatography. The construct can also contain a tag
to simplify isolation of the polypeptide. For example, a
polyhistidine tag of, e.g., six histidine residues, can be
incorporated at the amino terminal end, or carboxy terminal end, of
the protein. The polyhistidine tag allows convenient isolation of
the protein in a single step by nickel-chelate chromatography. The
target polypeptide of the invention can also be engineered to
contain a cleavage site to aid in protein recovery. Alternatively,
the polypeptides of the invention can be expressed directly in a
desired host cell for assays in situ.
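By way of a non-limiting illustration, the addition of a polyhistidine tag to a designed protein sequence might be sketched in Python as follows. The function name `add_his_tag`, its parameters, and the choice to place an amino-terminal tag after a leading methionine are assumptions made for this sketch only.

```python
def add_his_tag(protein: str, terminus: str = "N", n_his: int = 6) -> str:
    """Return `protein` (one-letter codes) with a polyhistidine tag added.

    For an amino-terminal tag the histidines are placed after a leading
    methionine, if present, so the start residue stays first (an
    assumption of this sketch, not a requirement stated in the text).
    """
    tag = "H" * n_his
    if terminus == "C":
        return protein + tag
    if terminus == "N":
        if protein.startswith("M"):
            return "M" + tag + protein[1:]
        return tag + protein
    raise ValueError("terminus must be 'N' or 'C'")


print(add_his_tag("MKV"))       # MHHHHHHKV
print(add_his_tag("MKV", "C"))  # MKVHHHHHH
```

The tagged protein sequence would then be converted to a polynucleotide sequence in the same way as any other target polypeptide.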
[0110] When the host is a eukaryote, such methods of transfection
of DNA as calcium phosphate co-precipitates, conventional
mechanical procedures, such as microinjection, electroporation or
biolistic techniques, insertion of a plasmid encased in liposomes,
or virus vectors may be used. Eukaryotic cells can also be
cotransfected with DNA sequences encoding a polypeptide of the
invention, and a second foreign DNA molecule encoding a selectable
phenotype, such as the herpes simplex thymidine kinase gene.
Another method is to use a eukaryotic viral vector, such as simian
virus 40 (SV40) or bovine papilloma virus, to transiently infect or
transform eukaryotic cells and express the protein. (Eukaryotic
Viral Vectors, Cold Spring Harbor Laboratory, Gluzman ed., 1982).
Preferably, a eukaryotic host is utilized as the host cell, as
described herein.
[0111] Eukaryotic systems, and preferably mammalian expression
systems, allow for proper post-translational modifications of
expressed mammalian proteins to occur. Eukaryotic cells that
possess the cellular machinery for proper processing of the primary
transcript, glycosylation, phosphorylation, and advantageously
secretion of the gene product should be used as host cells for the
expression of the polypeptide of the invention. Such host cell
lines may include, but are not limited to, CHO, VERO, BHK, HeLa,
COS, MDCK, Jurkat, HEK-293, and WI38.
[0112] For long-term, high-yield production of recombinant
proteins, stable expression is preferred. Rather than using
expression vectors that contain viral origins of replication, host
cells can be transformed with the cDNA encoding a target
polypeptide of the invention controlled by appropriate expression
control elements (e.g., promoter, enhancer sequences,
transcription terminators, polyadenylation sites, etc.), and a
selectable marker. The selectable marker in the recombinant plasmid
confers resistance to the selection and allows cells to stably
integrate the plasmid into their chromosomes and grow to form foci
that, in turn, can be cloned and expanded into cell lines. For
example, following the introduction of foreign DNA, engineered
cells may be allowed to grow for 1-2 days in an enriched media, and
then are switched to a selective media. A number of selection
systems may be used, including, but not limited to, the herpes
simplex virus thymidine kinase (Wigler et al., Cell, 11:223, 1977),
hypoxanthine-guanine phosphoribosyltransferase (Szybalska &
Szybalski, Proc. Natl. Acad. Sci. USA, 48:2026, 1962), and adenine
phosphoribosyltransferase (Lowy et al., Cell, 22:817, 1980) genes,
which can be employed in tk-, hgprt- or aprt- cells, respectively.
Also, antimetabolite resistance can be used as the basis of
selection for dhfr, which confers resistance to methotrexate
(Wigler et al., Proc. Natl. Acad. Sci. USA, 77:3567, 1980; O'Hare
et al., Proc. Natl. Acad. Sci. USA, 78:1527, 1981); gpt, which
confers resistance to mycophenolic acid (Mulligan & Berg, Proc.
Natl. Acad. Sci. USA, 78:2072, 1981); neo, which confers resistance
to the aminoglycoside G-418 (Colberre-Garapin et al., J. Mol. Biol.,
150:1, 1981); and hygro, which confers resistance to hygromycin
(Santerre et al., Gene, 30:147, 1984). Recently, additional
selectable genes have been described, namely trpB, which allows
cells to utilize indole in place of tryptophan; hisD, which allows
cells to utilize histidinol in place of histidine (Hartman &
Mulligan, Proc. Natl. Acad. Sci. USA, 85:8047, 1988); and ODC
(ornithine decarboxylase), which confers resistance to the
ornithine decarboxylase inhibitor, 2-(difluoromethyl)-DL-ornithine,
DFMO (McConlogue L., In: Current Communications in Molecular
Biology, Cold Spring Harbor Laboratory, ed., 1987).
[0113] Techniques for the isolation and purification of either
microbially or eukaryotically expressed polypeptides of the
invention may be by any conventional means, such as, for example,
preparative chromatographic separations and immunological
separations, such as those involving the use of monoclonal or
polyclonal antibodies or antigen.
[0114] A target polynucleotide, or expression construct containing
a target polynucleotide, may be entrapped in a liposome. Liposomes
are vesicular structures characterized by a phospholipid bilayer
membrane and an inner aqueous medium. Multilamellar liposomes have
multiple lipid layers separated by aqueous medium and form
spontaneously when phospholipids are suspended in an excess of
aqueous solution. The lipid components undergo self-rearrangement
before the formation of closed structures and entrap water and
dissolved solutes between the lipid bilayers. The liposome may be
complexed with a hemagglutinating virus (HVJ). This has been shown
to facilitate fusion with the cell membrane and promote cell entry
of liposome-encapsulated DNA. In other embodiments, the liposome
may be complexed or employed in conjunction with nuclear
non-histone chromosomal proteins (HMG-1). In yet further
embodiments, the liposome may be complexed or employed in
conjunction with both HVJ and HMG-1. Because such expression
constructs have been successfully employed in the transfer and
expression of nucleic acid in vitro and in vivo, they are
applicable to the present invention. Where a bacterial promoter is
employed in the DNA construct, it also will be desirable to include
within the liposome an appropriate bacterial polymerase.
[0115] The present invention describes methods for enabling the
creation of a target polynucleotide based upon information only,
i.e., without the requirement for existing genes, DNA molecules or
genomes. Generally, using computer software, it is possible to
construct a virtual polynucleotide in the computer. This
polynucleotide consists of a string of DNA bases, G, A, T or C,
comprising for example an entire artificial polynucleotide sequence
in a linear string. Following construction of a sequence, computer
software is then used to parse the target sequence breaking it down
into a set of overlapping oligonucleotides of specified length.
This results in a set of shorter DNA sequences that overlap to
cover the entire length of the target polynucleotide in overlapping
sets.
[0116] Typically, a gene of 1000 base pairs would be broken down
into 20 100-mers where 10 of these comprise one strand and 10 of
these comprise the other strand. They would be selected to overlap
on each strand by 25 to 50 base pairs.
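The arithmetic of this example can be sketched in Python. The function name `oligo_counts` and its parameters are illustrative assumptions and not part of the disclosed software; the sketch assumes only that each strand is tiled end to end with oligonucleotides of one fixed length.

```python
import math


def oligo_counts(gene_length: int, oligo_length: int) -> tuple[int, int]:
    """Return (oligos per strand, total oligos) for a double-stranded gene.

    Assumes each strand is tiled end to end by oligos of one fixed length;
    a partial final oligo is counted as one oligo.
    """
    per_strand = math.ceil(gene_length / oligo_length)
    return per_strand, 2 * per_strand


per_strand, total = oligo_counts(1000, 100)
print(per_strand, total)  # 10 20
```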
[0117] The degeneracy of the genetic code permits substantial
freedom in the choice of codons for any particular amino acid
sequence. Transgenic organisms such as plants frequently prefer
particular codons that, though they encode the same protein, may
differ from the codons in the organism from which the gene was
derived. For example, U.S. Pat. No. 5,380,831 to Adang et al.
describes the creation of insect resistant transgenic plants that
express the Bacillus thuringiensis (Bt) toxin gene. The Bt crystal
protein, an insect toxin, is encoded by a full-length gene that is
poorly expressed in transgenic plants. In order to improve
expression in plants, a synthetic gene encoding the protein
containing codons preferred in plants was substituted for the
natural sequence. The invention disclosed therein comprised a
chemically synthesized gene encoding an insecticidal protein which
is functionally equivalent to a native insecticidal protein of Bt.
The synthetic gene was designed to be expressed in plants at a
level higher than a native Bt gene.
[0118] In designing a target polynucleotide that encodes a
particular polypeptide, the hydropathic index of amino acids may be
considered. The importance of the hydropathic amino acid index in
conferring interactive biologic function on a protein is generally
understood in the art. Each amino acid has been assigned a
hydropathic index on the basis of its hydrophobicity and charge
characteristics; these are: isoleucine (+4.5); valine (+4.2);
leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5);
methionine (+1.9); alanine (+1.8); glycine (-0.4); threonine (-0.7);
serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6);
histidine (-3.2); glutamate (-3.5); glutamine (-3.5); aspartate
(-3.5); asparagine (-3.5); lysine (-3.9); and arginine (-4.5).
[0119] It is known in the art that certain amino acids may be
substituted by other amino acids having a similar hydropathic index
or score and still result in a protein with similar biological
activity, i.e., still obtain a biologically functionally equivalent
protein. In making such changes, the substitution of amino acids
whose hydropathic indices are within ±2 is preferred, those which
are within ±1 are particularly preferred, and those within ±0.5
are even more particularly preferred.
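A minimal Python sketch of this selection rule follows. The table holds the Kyte-Doolittle hydropathic index values, which is the scale the list above follows; the names `HYDROPATHY` and `candidate_substitutions` and the three-letter residue keys are assumptions made for illustration.

```python
# Hydropathic index values on the Kyte-Doolittle scale.
HYDROPATHY = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}


def candidate_substitutions(residue: str, tolerance: float = 2.0) -> list[str]:
    """List residues whose hydropathic index lies within +/- tolerance."""
    target = HYDROPATHY[residue]
    return sorted(
        other
        for other, value in HYDROPATHY.items()
        # Rounding guards against floating-point noise at the boundary.
        if other != residue and round(abs(value - target), 6) <= tolerance
    )


print(candidate_substitutions("Ile", tolerance=0.5))  # ['Val']
```

Calling the function with a tolerance of 2, 1 or 0.5 corresponds to the preferred, particularly preferred and even more particularly preferred ranges, respectively.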
[0120] It is also understood in the art that the substitution of
like amino acids can be made effectively on the basis of
hydrophilicity. U.S. Pat. No. 4,554,101, incorporated herein by
reference, states that the greatest local average hydrophilicity of
a protein, as governed by the hydrophilicity of its adjacent amino
acids, correlates with a biological property of the protein.
[0121] As detailed in U.S. Pat. No. 4,554,101, the following
hydrophilicity values have been assigned to amino acid residues:
arginine (+3.0); lysine (+3.0); aspartate (+3.0±1); glutamate
(+3.0±1); serine (+0.3); asparagine (+0.2); glutamine (+0.2);
glycine (0); threonine (-0.4); proline (-0.5±1); alanine (-0.5);
histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5);
leucine (-1.8); isoleucine (-1.8); tyrosine (-2.3); phenylalanine
(-2.5); tryptophan (-3.4).
[0122] It is understood that an amino acid can be substituted for
another having a similar hydrophilicity value and still obtain a
biologically equivalent and immunologically equivalent polypeptide.
In such changes, the substitution of amino acids whose
hydrophilicity values are within ±2 is preferred, those that are
within ±1 are particularly preferred, and those within ±0.5
are even more particularly preferred.
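The three preference ranges can be expressed as a simple classification, sketched here in Python. The values are those of the hydrophilicity scale of U.S. Pat. No. 4,554,101; the ±1 ranges attached to aspartate, glutamate and proline are ignored, and the names `HYDROPHILICITY` and `preference_tier`, as well as the label returned for differences greater than 2, are assumptions of this sketch.

```python
# Hydrophilicity values from U.S. Pat. No. 4,554,101 (central values only).
HYDROPHILICITY = {
    "Arg": 3.0, "Lys": 3.0, "Asp": 3.0, "Glu": 3.0, "Ser": 0.3,
    "Asn": 0.2, "Gln": 0.2, "Gly": 0.0, "Thr": -0.4, "Pro": -0.5,
    "Ala": -0.5, "His": -0.5, "Cys": -1.0, "Met": -1.3, "Val": -1.5,
    "Leu": -1.8, "Ile": -1.8, "Tyr": -2.3, "Phe": -2.5, "Trp": -3.4,
}


def preference_tier(original: str, substitute: str) -> str:
    """Classify a substitution by the difference in hydrophilicity values."""
    diff = round(abs(HYDROPHILICITY[original] - HYDROPHILICITY[substitute]), 6)
    if diff <= 0.5:
        return "even more particularly preferred"
    if diff <= 1.0:
        return "particularly preferred"
    if diff <= 2.0:
        return "preferred"
    return "outside the stated ranges"


print(preference_tier("Leu", "Val"))  # even more particularly preferred
print(preference_tier("Ser", "Cys"))  # preferred
```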
[0123] As outlined above, amino acid substitutions are generally
based on the relative similarity of the amino acid side-chain
substituents, for example, their hydrophobicity, hydrophilicity,
charge, size, and the like. Exemplary substitutions that take
various of the foregoing characteristics into consideration are
well known to those of skill in the art and include: arginine and
lysine; glutamate and aspartate; serine and threonine; glutamine
and asparagine; and valine, leucine and isoleucine.
[0124] Aspects of the invention may be implemented in hardware or
software, or a combination of both. However, preferably, the
algorithms and processes of the invention are implemented in one or
more computer programs executing on programmable computers each
comprising at least one processor, at least one data storage system
(including volatile and non-volatile memory and/or storage
elements), at least one input device, and at least one output
device. Program code is applied to input data to perform the
functions described herein and generate output information. The
output information is applied to one or more output devices, in
known fashion.
[0125] Each program may be implemented in any desired computer
language (including machine, assembly, high level procedural, or
object oriented programming languages) to communicate with a
computer system. In any case, the language may be a compiled or
interpreted language.
[0126] Each such computer program is preferably stored on a storage
medium or device (e.g., ROM, CD-ROM, tape, or magnetic diskette)
readable by a general or special purpose programmable computer, for
configuring and operating the computer when the storage media or
device is read by the computer to perform the procedures described
herein. The inventive system may also be considered to be
implemented as a computer-readable storage medium, configured with
a computer program, where the storage medium so configured causes a
computer to operate in a specific and predefined manner to perform
the functions described herein.
[0127] Thus, in another embodiment, the invention provides a
computer program, stored on a computer-readable medium, for
generating a target polynucleotide sequence. The computer program
includes instructions for causing a computer system to: 1) identify
an initiating polynucleotide sequence contained in the target
polynucleotide sequence; 2) parse the target polynucleotide
sequence into multiple distinct, partially complementary,
oligonucleotides; and 3) control assembly of the target
polynucleotide sequence by controlling the bi-directional extension
of the initiating polynucleotide sequence by the sequential
addition of partially complementary oligonucleotides resulting in a
contiguous double-stranded polynucleotide. The computer program
will contain an algorithm for parsing the sequence of the target
polynucleotide by generating a set of oligonucleotides
corresponding to a polypeptide sequence. The algorithm utilizes a
polypeptide sequence to generate a DNA sequence using a specified
codon table. The algorithm then generates a set of parsed
oligonucleotides corresponding to the (+) and (-) strands of the
DNA sequence in the following manner:
[0128] 1. The DNA sequence GENE[ ], an array of bases, is generated
from the protein sequence AA[ ], an array of amino acids, using a
specified codon table. An example of the codon table for E. coli
type II codons, is listed below.
[0129] a. parameters
[0130] i. N Length of protein in amino acid residues
[0131] ii. L=3N Length of gene in DNA bases
[0132] iii. Q Length of each component oligonucleotide
[0133] iv. X=Q/2 Length of overlap between oligonucleotides
[0134] v. W=3N/Q Number of oligonucleotides in the F set
[0135] vi. Z=3N/Q+1 Number of oligonucleotides in the R set
[0136] vii. F[1:W] set of (+) strand oligonucleotides
[0137] viii. R[1:Z] set of (-) strand oligonucleotides
[0138] ix. AA[1:N] array of amino acid residues
[0139] x. GENE[1:L] array of bases comprising the gene
[0140] b. Obtain or design a protein sequence AA[ ] consisting of a
list of amino acid residues.
[0141] c. Generate the DNA sequence, GENE[ ], from the protein
sequence, AA[ ]
[0142] i. I=1; J=1
[0143] ii. Translate AA[J] from codon table generating GENE[I:I+2]
[0144] iii. I=I+3
[0145] iv. J=J+1
[0146] v. If J<=N, go to ii
[0147] 2. Two sets of overlapping oligonucleotides are generated
from GENE[ ]; F[ ] covers the (+) strand and R[ ] is a
complementary, partially overlapping set covering the (-)
strand.
[0148] a. Generate the F[ ] set of oligos
[0149] i. J=1
[0150] ii. For I=1 to W: F[I]=GENE[J:J+Q-1]
[0151] iii. J=J+Q
[0152] iv. Go to ii
[0153] b. Generate the R[ ] set of oligos
[0154] i. J=L-X
[0155] ii. For I=1 to W-1
[0156] iii. R[I]=complement of GENE[J:J-Q+1]
[0157] iv. J=J-Q
[0158] v. Go to iii
[0159] c. Result is two sets of oligos, F[ ] and R[ ], of length Q
[0160] d. Generate the final two finishing oligos
[0161] i. S[1]=GENE [Q/2:1]
[0162] ii. S[2]=GENE [L-Q/2:L]
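One way to read step 2 is sketched below in Python. The sketch assumes that the (-) strand oligonucleotides are reverse complements offset from the (+) strand set by X=Q/2, so that each one bridges the junction between two forward oligonucleotides, and that the two finishing oligonucleotides are the half-length pieces that complete the (-) strand at each end. These are assumptions about the intent of the pseudocode, and the names `parse_gene` and `reverse_complement` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_gene(gene: str, q: int) -> tuple[list[str], list[str], list[str]]:
    """Split `gene` into forward (F), reverse (R) and finishing (S) oligos.

    Assumes q is even and len(gene) is a multiple of q. R and S are
    returned in (+) strand coordinate order, each written 5' to 3'.
    """
    if q % 2 or len(gene) % q:
        raise ValueError("q must be even and divide the gene length")
    x = q // 2
    length = len(gene)
    forward = [gene[i:i + q] for i in range(0, length, q)]
    # Offsetting by x makes every reverse oligo span a forward junction.
    reverse = [reverse_complement(gene[i:i + q])
               for i in range(x, length - x, q)]
    finishing = [reverse_complement(gene[:x]),
                 reverse_complement(gene[length - x:])]
    return forward, reverse, finishing


F, R, S = parse_gene("ATGGCTAAAGACGAATTCCTGCGT", 8)
print(len(F), len(R), len(S))  # 3 2 2
```

Joining the F set reproduces the (+) strand, and joining S[2], the R set in reverse order, and S[1] reproduces the full (-) strand.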
[0163] Subsequently, oligonucleotide set assembly is established by
the following algorithm:
[0164] Input: the sets of oligonucleotides F[1:W], R[1:Z] and S[1:2]
[0165] 3. Step 1
[0166] a. For I=1 to W
[0167] b. Ligate F[I], F[I+1], R[I]; place in T[I]
[0168] c. Ligate F[I+2], R[I+1], R[I+2]; place in T[I+1]
[0169] d. I=I+3
[0170] e. Go to b
[0171] 4. Step 2
[0172] a. Do the following until only a single reaction remains
[0173] i. For I=1 to W/3
[0174] ii. Ligate T[I], T[I+1]
[0175] iii. I=I+2
[0176] iv. Go to ii
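The repeated pairwise pooling of Step 2 can be illustrated with a short Python sketch. It models each reaction as a list of oligonucleotide names and merges adjacent reactions until a single one remains; the function name `pool_pairwise`, the placeholder names, and the handling of an odd leftover reaction (carried forward unchanged) are assumptions of this sketch.

```python
def pool_pairwise(pools):
    """Merge adjacent pools two at a time until one pool remains.

    Returns the list of stages; stage 0 is the input and the last stage
    holds a single pool containing every oligonucleotide in order.
    """
    current = [list(pool) for pool in pools]
    stages = [current]
    while len(current) > 1:
        merged = []
        for i in range(0, len(current), 2):
            pair = current[i:i + 2]  # two adjacent pools, or one leftover
            merged.append([oligo for pool in pair for oligo in pool])
        current = merged
        stages.append(current)
    return stages


# Four triplex reactions of three oligonucleotides each (placeholder names).
triplexes = [[f"oligo{3 * t + k + 1}" for k in range(3)] for t in range(4)]
stages = pool_pairwise(triplexes)
print([len(stage) for stage in stages])  # [4, 2, 1]
```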
CODON TABLE (E. coli Class II preferred usage)
PHE TTC    SER TCT    TYR TAC    CYS TGC
TER TGA    TRP TGG    ILE ATC    MET ATG
THR ACC    LEU CTG    PRO CCG    HIS CAC
GLN CAG    ARG CGT    VAL GTT    ALA GCG
ASN AAC    LYS AAA    ASP GAC    GLU GAA
GLY GGT
[0177] Algorithms of the invention useful for assembly of a target
polynucleotide can further be described as Perl script as set forth
below. ALGORITHM 1 provides a method for converting a protein
sequence into a polynucleotide sequence using E. Coli codons:
#translate one amino acid (one-letter code, lower case) into its
#preferred codon and append the codon to the growing DNA sequence
if ($aminoacid eq "m") {$codon = "ATG";}
elsif ($aminoacid eq "f") {$codon = "TTC";}
elsif ($aminoacid eq "l") {$codon = "CTG";}
elsif ($aminoacid eq "s") {$codon = "TCT";}
elsif ($aminoacid eq "y") {$codon = "TAC";}
elsif ($aminoacid eq "c") {$codon = "TGC";}
elsif ($aminoacid eq "w") {$codon = "TGG";}
elsif ($aminoacid eq "i") {$codon = "ATC";}
elsif ($aminoacid eq "t") {$codon = "ACC";}
elsif ($aminoacid eq "p") {$codon = "CCG";}
elsif ($aminoacid eq "q") {$codon = "CAG";}
elsif ($aminoacid eq "r") {$codon = "CGT";}
elsif ($aminoacid eq "v") {$codon = "GTT";}
elsif ($aminoacid eq "a") {$codon = "GCG";}
elsif ($aminoacid eq "n") {$codon = "AAC";}
elsif ($aminoacid eq "k") {$codon = "AAA";}
elsif ($aminoacid eq "d") {$codon = "GAC";}
elsif ($aminoacid eq "e") {$codon = "GAA";}
elsif ($aminoacid eq "g") {$codon = "GGT";}
elsif ($aminoacid eq "h") {$codon = "CAC";}
else {$codon = " "};
$DNAsequence = $DNAsequence.$codon;
[0178] The following list provides the class II codon preference in
Perl for E. coli
[0179] if ($aminoacid eq "m") {$codon="ATG";}
[0180] elsif ($aminoacid eq "f") {$codon="TTC";}
[0181] elsif ($aminoacid eq "l") {$codon="CTG";}
[0182] elsif ($aminoacid eq "s") {$codon="TCT";}
[0183] elsif ($aminoacid eq "y") {$codon="TAC";}
[0184] elsif ($aminoacid eq "c") {$codon="TGC";}
[0185] elsif ($aminoacid eq "w") {$codon="TGG";}
[0186] elsif ($aminoacid eq "i") {$codon="ATC";}
[0187] elsif ($aminoacid eq "t") {$codon="ACC";}
[0188] elsif ($aminoacid eq "p") {$codon="CCG";}
[0189] elsif ($aminoacid eq "q") {$codon="CAG";}
[0190] elsif ($aminoacid eq "r") {$codon="CGT";}
[0191] elsif ($aminoacid eq "v") {$codon="GTT";}
[0192] elsif ($aminoacid eq "a") {$codon="GCG";}
[0193] elsif ($aminoacid eq "n") {$codon="AAC";}
[0194] elsif ($aminoacid eq "k") {$codon="AAA";}
[0195] elsif ($aminoacid eq "d") {$codon="GAC";}
[0196] elsif ($aminoacid eq "e") {$codon="GAA";}
[0197] elsif ($aminoacid eq "g") {$codon="GGT";}
[0198] elsif ($aminoacid eq "h") {$codon="CAC";}
[0199] else {$codon=" "};
[0200] $DNAsequence=$DNAsequence.$codon;
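For comparison, the same codon preference can be held in a lookup table. The following Python sketch is an illustrative equivalent and not part of the disclosed program; the names `ECOLI_CLASS_II` and `back_translate` and the optional stop codon argument are assumptions, and, unlike the Perl fragment, the sketch rejects an unrecognized residue instead of emitting a blank.

```python
# Preferred codon for each amino acid (one-letter code), E. coli class II.
ECOLI_CLASS_II = {
    "M": "ATG", "F": "TTC", "L": "CTG", "S": "TCT", "Y": "TAC",
    "C": "TGC", "W": "TGG", "I": "ATC", "T": "ACC", "P": "CCG",
    "Q": "CAG", "R": "CGT", "V": "GTT", "A": "GCG", "N": "AAC",
    "K": "AAA", "D": "GAC", "E": "GAA", "G": "GGT", "H": "CAC",
}


def back_translate(protein: str, append_stop: bool = False) -> str:
    """Convert a protein sequence into DNA using the preferred codons."""
    codons = []
    for residue in protein.upper():
        if residue not in ECOLI_CLASS_II:
            raise ValueError(f"unknown residue: {residue!r}")
        codons.append(ECOLI_CLASS_II[residue])
    if append_stop:
        codons.append("TGA")  # TER codon from the codon table
    return "".join(codons)


print(back_translate("MKV"))  # ATGAAAGTT
```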
[0201] ALGORITHM 2 provides a method for parsing a polynucleotide
sequence into component forward and reverse oligonucleotides that
can be reassembled into a complete target polynucleotide encoding a
target polypeptide:
#$oligoname is the identifier name for the list and for each
#component oligonucleotide
#$OL is the length of each component oligonucleotide
#$Overlap is the length of the overlap in bases between each
#forward and each reverse oligonucleotide
#$sequence is the DNA sequence in bases
#$seqlen is the length of the DNA sequence in bases
#$bas is the individual base in a sequence
#$forseq is the sequence of a forward oligonucleotide
#$revseq is the sequence of a reverse oligonucleotide
#$revcomp is the reverse complemented sequence of the gene
#$oligonameF-[] is the list of parsed forward oligos
#$oligonameR-[] is the list of parsed reverse oligos
$Overlap = <STDIN>;
$seqlen = length ($sequence);
#convert forward sequence to upper case if lower case
$forseq = "";
for ($j = 0; $j <= $seqlen-1; $j++) {
  $bas = substr ($sequence, $j, 1);
  if ($bas eq "a") {$cfor = "A";} elsif ($bas eq "t") {$cfor = "T";}
  elsif ($bas eq "c") {$cfor = "C";} elsif ($bas eq "g") {$cfor = "G";}
  elsif ($bas eq "A") {$cfor = "A";} elsif ($bas eq "T") {$cfor = "T";}
  elsif ($bas eq "C") {$cfor = "C";} elsif ($bas eq "G") {$cfor = "G";}
  else {$cfor = "X"};
  $forseq = $forseq.$cfor;
  print OUT "$j \n";
}
[0202] The reverse complement of the sequence generated above is
identified by:
$revcomp = "";
for ($i = $seqlen-1; $i >= 0; $i--) {
  $base = substr ($sequence, $i, 1);
  if ($base eq "a") {$comp = "T";} elsif ($base eq "t") {$comp = "A";}
  elsif ($base eq "g") {$comp = "C";} elsif ($base eq "c") {$comp = "G";}
  elsif ($base eq "A") {$comp = "T";} elsif ($base eq "T") {$comp = "A";}
  elsif ($base eq "G") {$comp = "C";} elsif ($base eq "C") {$comp = "G";}
  else {$comp = "X"};
  $revcomp = $revcomp.$comp;
}
#now do the parsing
#generate the forward oligo list
print OUT "Forward oligos\n";
print "Forward oligos\n";
$r = 1;
for ($i = 0; $i <= $seqlen-1; $i+=$OL) {
  $oligo = substr ($sequence, $i, $OL);
  print OUT "$oligoname F- $r $oligo\n";
  print "$oligoname F- $r $oligo\n";
  $r = $r + 1;
}
#generate the reverse oligo list
$r = 1;
for ($i = $seqlen - $Overlap - $OL; $i >= 0; $i-=$OL) {
  print OUT "\n";
  print "\n";
  $oligo = substr ($revcomp, $i, $OL);
  print OUT "$oligoname R- $r $oligo";
  print "$oligoname R- $r $oligo";
  $r = $r + 1;
}
[0203] The invention further provides a computer-assisted method
for synthesizing a target polynucleotide encoding a target
polypeptide derived from a model sequence using a programmed
computer including a processor, an input device, and an output
device, by inputting into the programmed computer, through the
input device, data including at least a portion of the target
polynucleotide sequence encoding a target polypeptide.
Subsequently, the sequence of at least one initiating
polynucleotide present in the target polynucleotide sequence is
determined and a model for synthesizing the target polynucleotide
sequence is derived. The model is based on the position of the
initiating sequence in the target polynucleotide sequence using
overall sequence parameters necessary for expression of the target
polypeptide in a biological system. The information is outputted to
an output device which provides the means for synthesizing and
assembling the target polynucleotide.
[0204] It is understood that any apparatus suitable for
polynucleotide synthesis can be used in the present invention.
Various non-limiting examples of apparatus, components, assemblies
and methods are described below. For example, in one embodiment, it
is contemplated that a nanodispensing head with up to 16 valves can
be used to deposit synthesis chemicals in assembly vessels (FIG.
4). Chemicals can be controlled using a syringe pump from the
reagent reservoir. Because of the speed and capability of the
ink-jet dispensing system, synthesis can be performed at very small
scale and very rapidly. Underlying the reaction chambers is a set of assembly
vessels linked to microchannels that will move fluids by
microfluidics. The configuration of the channels will pool pairs
and triplexes of oligonucleotides systematically using, for
example, a robotic device. However, pooling can be accomplished
using fluidics and without moving parts.
[0205] As shown in FIG. 5, oligonucleotide synthesis,
oligonucleotide assembly by pooling and annealing, and ligation can
be done using microfluidic mixing, resulting in the same set of
critical triplex intermediates that serves as the substrate for
annealing, ligation and oligonucleotide joining. DNA ligase and
other components can be placed in the buffer fluid moving through
the instrument microchambers. Thus, synthesis and assembly can be
carried out in a highly controlled way in the same instrument.
[0206] As shown in FIG. 6, the pooling manifold can be produced
from non-porous plastic and designed to control sequential pooling
of oligonucleotides synthesized in arrays. Oligonucleotide parsing
from a gene sequence designed in the computer can be programmed for
synthesis where (+) and (-) strands are placed in alternating wells
of the array. Following synthesis in this format, the 12 row
sequences of the gene are directed into the pooling manifold that
systematically pools three wells into reaction vessels forming the
critical triplex structure. Following temperature cycling for
annealing and ligation, four sets of triplexes are pooled into 2
sets of 6 oligonucleotide products, then 1 set of 12
oligonucleotide products. Each row of the synthetic array is
associated with a similar manifold resulting in the first stage of
assembly of 8 sets of assembled oligonucleotides representing 12
oligonucleotides each. As shown in FIG. 7, the second manifold
pooling stage is controlled by a single manifold that pools the 8
row assemblies into a single complete assembly. Passage of the
oligonucleotide components through the two manifold assemblies (the
first 8 and the second single) results in the complete assembly of
all 96 oligonucleotides from the array. The assembly module (FIG.
8) of Genewriter™ can include a complete set of 7 pooling
manifolds produced using microfabrication in a single plastic block
that sits below the synthesis vessels. Various configurations of
the pooling manifold will allow assembly of 96-, 384- or 1536-well
arrays of parsed component oligonucleotides.
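The placement of (+) and (-) strand oligonucleotides in alternating wells of a 96-well array can be sketched in Python as follows. The well names (rows A through H, columns 1 through 12), the row-by-row fill order, and the function name `layout_plate` are assumptions made for illustration; the text does not specify a particular addressing scheme.

```python
from itertools import chain

ROWS = "ABCDEFGH"        # 8 rows, one pooling manifold per row
COLUMNS = range(1, 13)   # 12 wells per row


def layout_plate(forward: list[str], reverse: list[str]) -> dict[str, str]:
    """Assign (+) and (-) strand oligos to alternating wells, row by row."""
    if len(forward) != len(reverse):
        raise ValueError("expected equal numbers of (+) and (-) oligos")
    interleaved = list(chain.from_iterable(zip(forward, reverse)))
    if len(interleaved) > len(ROWS) * len(COLUMNS):
        raise ValueError("too many oligonucleotides for a 96-well array")
    wells = [f"{row}{col}" for row in ROWS for col in COLUMNS]
    return dict(zip(wells, interleaved))


plate = layout_plate([f"F{i}" for i in range(1, 49)],
                     [f"R{i}" for i in range(1, 49)])
print(plate["A1"], plate["A2"], plate["B1"])  # F1 R1 F7
```

With this layout each row holds 12 oligonucleotides, which matches the 12 wells per row that a single pooling manifold combines.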
[0207] The initial configuration is designed for the assembly of 96
oligonucleotides synthesized in a pre-defined array, composed of 48
pairs of overlapping 50 mers. Passage through the assembly device
in the presence of DNA ligase and other appropriate buffer and
chemical components, and with appropriate temperature controls on
the device, will assemble these into a single 2400 base double
stranded gene assembly (FIG. 9).
[0208] The basic pooling device design can be made of Plexiglas.TM.
or other type of co-polymer with microgrooves or microfluidic
channels etched into the surface and with a temperature control
element such as a Peltier circuit underlying the junction of the
channels. This results in a microreaction vessel at the junction of
two channels for 1) mixing of the two streams, 2) controlled
temperature maintenance or cycling at the site of the junction and
3) expulsion of the ligated mixture from the exit channel into the
next set of pooling and ligation chambers.
[0209] As shown in FIG. 11, the assembly platform design can
consist of 8 synthesis microwell plates in a 96 well configuration,
addressed with 16 channels of microdispensing. Below each plate is:
1) an evacuation manifold for removing synthesis components; and 2)
an assembly manifold based on the schematic in FIG. 9 for
assembling component oligonucleotides from each 96-well array. FIG.
12 shows a higher capacity assembly format using 1536-well
microplates and capable of synthesis of 1536 component
oligonucleotides per plate. Below each plate is: 1) an evacuation
manifold for removing synthesis components; and 2) an assembly
manifold assembly for assembling 1536 component oligonucleotides
from each 1536-well array. Pooling and assembly strategies can be
based on the concepts used for 96-well plates.
[0210] An alternative assembly format includes using surface-bound
oligonucleotide synthesis rather than soluble synthesis on CPG
glass beads (FIG. 13). In this configuration, oligonucleotides are
synthesized with a hydrocarbon linker that allows attachment to a
solid support. Following parsing of component sequences and
synthesis, the synthesized oligonucleotides are covalently attached
to a solid support such that the stabilizer is attached and the two
ligation substrates added to the overlying solution. Ligation
occurs as mediated by DNA ligase in the solution and increasing
temperature above the Tm removes the linked oligonucleotides by
thermal melting. As shown in FIG. 14 the systematic assembly on a
solid support of a set of parsed component oligonucleotides can be
arranged in an array with the set of stabilizer oligonucleotides
attached. The set of ligation substrate oligonucleotides is placed
in the solution, and systematic assembly is carried out in the
solid phase by sequential annealing, ligation and melting which
moves the growing DNA molecules across the membrane surface.
[0211] FIG. 15 shows an additional alternative means for
oligonucleotide assembly, by binding the component oligonucleotides
to a set of metal electrodes on a microelectronic chip, where each
electrode can be controlled independently with respect to current
and voltage. The array contains the set of minus strand
oligonucleotides. Placing a positive charge on the electrode will
move by electrophoresis the component ligase substrate
oligonucleotide onto the surface where annealing takes place. The
presence of DNA ligase mediates covalent joining or ligation of the
components. The electrode is then turned off or a negative charge
is applied and the DNA molecule is expelled from the electrode. The
next array element containing the next stabilizer oligonucleotide
from the parsed set is turned on with a positive charge and a
second annealing, joining and ligation with the next
oligonucleotide in the set carried out. Systematic and repetitive
application of voltage control, annealing, ligation and
denaturation will result in the movement of the growing chain
across the surface as well as assembly of the components into a
complete DNA molecule.
[0212] The invention further provides methods for the automated
synthesis of target polynucleotides. For example, a desired
sequence can be ordered by any means of communication available to
a user wishing to order such a sequence. A "user", as used herein,
is any entity capable of communicating a desired polynucleotide
sequence to a server. The sequence may be transmitted by any means
of communication available to the user and receivable by a server.
The user can be provided with a unique designation such that the
user can obtain information regarding the synthesis of the
polynucleotide during synthesis. Once obtained, the transmitted
target polynucleotide sequence can be synthesized by any method set
forth in the present invention.
[0213] The invention further provides a method for automated
synthesis of a polynucleotide, by providing a user with a mechanism
for communicating a model polynucleotide sequence and optionally
providing the user with an opportunity to communicate at least one
desired modification to the model sequence. The invention envisions
a user providing a model sequence and a desired modification to
that sequence which results in the alteration of the model
sequence. Any modification that alters the expression, function or
activity of a target polynucleotide or encoded target polypeptide
can be communicated by the user such that a modified polynucleotide
or polypeptide is synthesized or expressed according to a method of
the invention. For example, a model polynucleotide encoding a
polypeptide normally expressed in a eukaryotic system can be
altered such that the codons of the resulting target polynucleotide
are conducive for expression of the polypeptide in a prokaryotic
system. In addition, the user can indicate a desired modified
activity of a polypeptide encoded by a model polynucleotide. Once
provided, the algorithms and methods of the present invention can
be used to synthesize a target polynucleotide encoding a target
polypeptide believed to have the desired modified activity. The
methods of the invention can be further utilized to express the
target polypeptide and to screen for the desired activity. It is
understood that the methods of the invention provide a means for
synthetic evolution whereby any parameter of polynucleotide
expression and/or polypeptide activity can be altered as
desired.
[0214] Once the transmitted model sequence and desired modification
are provided by the user, the data including at least a portion of
the model polynucleotide sequence is inputted into a programmed
computer, through an input device. Once inputted, the algorithms of
the invention are used to determine the sequence of the model
polynucleotide sequence containing the desired modification and
resulting in a target polynucleotide containing the modification.
Subsequently, the processor and algorithms of the invention are used
to identify at least one initiating polynucleotide sequence present
in the polynucleotide sequence. A target polynucleotide (i.e., a
modified model polynucleotide) is identified and synthesized.
EXAMPLES
[0215] Nucleic Acid Synthesis Design Protocol
[0216] For the purposes of assembling a synthetic nucleic acid
sequence encoding a target polypeptide, a model polypeptide
sequence or nucleic acid sequence is obtained and analyzed using a
suitable DNA analysis package, such as, for example, MacVector or
DNA Star. If the target protein will be expressed in a bacterial
system, for example, the model sequence can be converted to a
sequence encoding a polypeptide utilizing E. coli preferred codons
(i.e., Type I, Type II or Type III codon preference). The present
invention provides the conversion programs Codon I, Codon II or
Codon III. A nucleic acid sequence of the invention can be designed
to accommodate any codon preference of any prokaryotic or
eukaryotic organism.
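The codon conversion step can be illustrated with a simple back-translation. This is a sketch under stated assumptions: the codon table below is an illustrative choice of one commonly used E. coli codon per amino acid, not the table used by the Codon I, Codon II or Codon III programs, and the function name `back_translate` is hypothetical.

```python
# Illustrative codon choices only (assumption): one frequently used
# E. coli codon per amino acid. Not the Codon I/II/III tables.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop codon
}

def back_translate(protein):
    """Return a DNA sequence encoding `protein` using the table above."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None
```

A different organism would be handled by swapping in a different table; the function itself does not change.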
[0217] In addition to the above codon preferences, specific
promoter, enhancer, replication or drug resistance sequences can be
included in a synthetic nucleic acid sequence of the invention. The
length of the construction can be adjusted by padding to give a
round number of bases based on about 25 to 100 bp synthesis. The
synthesis of sequences of about 25 to 100 bp in length can be
manufactured and assembled using the array synthesizer system and
may be used without further purification. For example, two 96-well
plates containing 100-mers could give a 9600 bp construction of a
target sequence.
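The length arithmetic in this paragraph can be checked with a short sketch. It assumes that padding rounds the construct up to a whole number of oligonucleotide lengths (one reading of "a round number of bases") and that the oligonucleotides are split evenly between the two strands; both function names are hypothetical.

```python
def padded_length(length_bp, oligo_len):
    """Round length_bp up to the next multiple of oligo_len."""
    remainder = length_bp % oligo_len
    if remainder == 0:
        return length_bp
    return length_bp + oligo_len - remainder

def duplex_length(n_oligos, oligo_len):
    """Duplex length covered when n_oligos are split evenly over both strands."""
    if n_oligos % 2:
        raise ValueError("expected an even number of oligonucleotides")
    return (n_oligos // 2) * oligo_len
```

With these definitions, two 96-well plates of 100-mers give `duplex_length(192, 100)`, which is 9600, and one plate of 50-mers gives 2400.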
[0218] Subsequent to the design of the oligonucleotides needed for
assembly of the target sequence, the oligonucleotides are parsed
using Parseoligo.TM., a proprietary computer program that optimizes
nucleic acid sequence assembly. Optional steps in sequence assembly
include identifying and eliminating sequences that may give rise to
hairpins, repeats or other difficult sequences. The parsed
oligonucleotide list is transferred to the Synthesizer driver
software. The individual oligonucleotides are pasted into the wells
and oligonucleotide synthesis is accomplished.
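The parsing step can be sketched as follows. This is not Parseoligo; it is a simplified illustration that assumes a linear sequence already padded to a multiple of the oligonucleotide length, (-) strand oligonucleotides offset by half an oligonucleotide, and a naive hairpin screen that only looks for an exact inverted repeat. All names and the default `stem` and `min_loop` values are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def parse_oligos(seq, oligo_len=50):
    """Split seq into (+) oligos and half-offset (-) oligos."""
    seq = seq.upper()
    if len(seq) % oligo_len:
        raise ValueError("pad the sequence to a multiple of oligo_len first")
    half = oligo_len // 2
    plus = [seq[i:i + oligo_len] for i in range(0, len(seq), oligo_len)]
    # (-) oligos bridge the junctions between adjacent (+) oligos;
    # on a linear sequence the last one is truncated to half length.
    minus = [reverse_complement(seq[i:i + oligo_len])
             for i in range(half, len(seq), oligo_len)]
    return plus, minus

def has_hairpin(oligo, stem=8, min_loop=3):
    """True if a stem-length stretch can pair with a downstream stretch."""
    for i in range(len(oligo) - 2 * stem - min_loop + 1):
        arm = oligo[i:i + stem]
        if reverse_complement(arm) in oligo[i + stem + min_loop:]:
            return True
    return False
```

Oligonucleotides flagged by `has_hairpin` would be candidates for redesign, for example by choosing synonymous codons in that region.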
[0219] Assembly of Parsed Oligonucleotides Using a Two-Step PCR
Reaction:
[0220] Obtain arrayed sets of parsed overlapping oligonucleotides,
50 bases each, with an overlap of about 25 base pairs (bp). The
oligonucleotide concentration is about 250 .mu.M (250 nmol/ml). 50
base oligos give T.sub.ms from 75 to 85 degrees C., 6 to 10
OD.sub.260, 11 to 15 nanomoles, 150 to 300 .mu.g. Resuspend in 50
to 100 .mu.l of H.sub.2O to make 250 nmol/ml. Combine equal amounts
of each oligonucleotide to a final concentration of 250 .mu.M (250
nmol/ml). Add 1 .mu.l of each to give 192 .mu.l. Add 8 .mu.l
dH.sub.2O to bring up to 200 .mu.l. Final concentration is 250
.mu.M mixed oligos. Dilute about 100-fold by adding 10 .mu.l of mixed
oligos to 1 ml of water (1/100; 2.5 .mu.M), then take 1 .mu.l
of this and add it to 24 .mu.l of 1.times. PCR mix. The PCR reaction
includes:
[0221] 10 mM TRIS-HCl, pH 9.0
[0222] 2.2 mM MgCl.sub.2
[0223] 50 mM KCl
[0224] 0.2 mM each dNTP
[0225] 0.1% Triton X-100
[0226] One U Taq1 polymerase is added to the reaction. The reaction
is thermocycled under the following conditions:
[0227] a. Assembly
[0228] i. 55 cycles of
[0229] 1. 94 degrees 30 s
[0230] 2. 52 degrees 30 s
[0231] 3. 72 degrees 30 s
[0232] Following assembly amplification, take 2.5 .mu.l of this
assembly mix and add to 100 .mu.l of PCR mix (40.times. dilution).
Prepare outside primers by taking 1 .mu.l of F1 (forward primer)
and 1 .mu.l of R96 (reverse primer) at 250 .mu.M (250 nmol/ml, or
0.250 nmol/.mu.l) and add to the 100 .mu.l PCR reaction. This gives a
final concentration of 2.5 .mu.M each oligo. Add 1 U Taq1 polymerase
and thermocycle under the following conditions:
[0233] 35 cycles (or original protocol 23 cycles)
94 degrees 30 s; 50 degrees 30 s; 72 degrees 60 s
[0234] Extract with phenol/chloroform. Precipitate with ethanol.
Resuspend in 10 .mu.l of dH.sub.2O and analyze on an agarose
gel.
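The dilutions in this protocol can be checked with a small calculation. The sketch below treats the pooled mixture as 250 .mu.M total oligonucleotide, as the text does, and uses a hypothetical helper `dilute`; volumes are in microliters and concentrations in micromolar.

```python
def dilute(stock_conc, stock_vol, diluent_vol):
    """Concentration after adding stock_vol of stock to diluent_vol (same volume units)."""
    return stock_conc * stock_vol / (stock_vol + diluent_vol)

pool = 250.0                       # uM, mixed oligonucleotides
working = dilute(pool, 10, 1000)   # 10 ul into 1 ml of water
in_pcr = dilute(working, 1, 24)    # 1 ul into 24 ul of 1x PCR mix
primer = dilute(250.0, 1, 100)     # 1 ul of 250 uM primer into 100 ul
```

Under these assumptions the working stock and each outside primer come out close to 2.5 .mu.M, and the assembly reaction contains roughly 0.1 .mu.M total oligonucleotide.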
[0235] Assembly of Parsed Oligonucleotides Using Taq1 Ligation
[0236] Arrayed sets of parsed overlapping oligonucleotides of about
25 to 150 bases in length each, with an overlap of about 12 to 75
base pairs (bp), are obtained. The oligonucleotide concentration is
about 250 .mu.M (250 nmol/ml). For example, 50 base oligos give
T.sub.ms from 75 to 85 degrees C., 6 to 10 OD.sub.260, 11 to 15
nanomoles, 150 to 300 .mu.g. Resuspend in 50 to 100 .mu.l of H.sub.2O
to make 250 nmol/ml.
[0237] Using a robotic workstation, equal amounts of forward and
reverse oligos are combined pairwise. Take 10 .mu.l of forward and
10 .mu.l of reverse oligo and mix in a new 96-well v-bottom plate.
This gives one array with sets of duplex oligonucleotides at
250 .mu.M, according to pooling scheme Step 1 in Table 1. Prepare
an assembly plate by taking 2 .mu.l of each oligomer pair and
adding to a fresh plate containing 100 .mu.l of ligation mix in
each well. This gives an effective concentration of 2.5 .mu.M or
2.5 nmol/ml. Transfer 20 .mu.l of each well to a fresh microwell
plate and add 1 .mu.l of T4 polynucleotide kinase and 1 .mu.l of 1
mM ATP to each well. Each reaction will have 50 pmoles of
oligonucleotide and 1 nmole ATP. Incubate at 37 degrees C. for 30
minutes.
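The molar amounts in the kinase reaction follow directly from concentration times volume. A minimal check, using a hypothetical helper `picomoles` and the identity that micromolar times microliters gives picomoles:

```python
def picomoles(conc_uM, vol_ul):
    """Amount in pmol: (umol/l) x (ul) = pmol."""
    return conc_uM * vol_ul

oligo_pmol = picomoles(2.5, 20)    # 20 ul of 2.5 uM oligonucleotide
atp_pmol = picomoles(1000.0, 1)    # 1 ul of 1 mM (1000 uM) ATP
atp_nmol = atp_pmol / 1000.0
```

This gives 50 pmol of oligonucleotide per reaction and 1 nmol of ATP, a 20-fold molar excess of ATP over oligonucleotide.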
[0238] Initiate assembly according to Steps 2-7 of Table 1. Carry
out pooling Step 2 mixing each successive well with the next. Add 1
.mu.l of Taq1 ligase to each mixed well. Cycle once at 94 degrees
for 30 sec; 52 degrees for 30 s; then 72 degrees for 10
minutes.
[0239] Carry out step 3 (Table 1) of pooling scheme and cycle
according to the temperature scheme above. Carry out steps 4 and 5
of the pooling scheme and cycle according to the temperature scheme
above. Carry out pooling scheme step 6 and take 10 .mu.l of each
mix into a fresh microwell. Carry out step 7 pooling scheme by
pooling the remaining three wells. Reaction volumes will be:
[0240] Initial plate has 20 ul per well.
[0241] Step 2 20 ul+20 ul=40 ul
[0242] Step 3 80 ul
[0243] Step 4 160 ul
[0244] Step 5 320 ul
[0245] Step 6 10 ul+10 ul=20 ul
[0246] Step 7 20+20+20=60 ul final reaction volume
[0247] A final PCR amplification is then performed by taking 2 ul
of the final ligation mix and adding it to 20 ul of PCR mix containing 10 mM
TRIS-HCl, pH 9.0, 2.2 mM MgCl.sub.2, 50 mM KCl, 0.2 mM each dNTP
and 0.1% Triton X-100.
[0248] Prepare outside primers by taking 1 .mu.l of F1 (forward
primer) and 1 .mu.l of R96 (reverse primer) at 250 .mu.M (250
nmol/ml, or 0.250 nmol/.mu.l) and add to the 100 .mu.l PCR reaction
giving a final concentration of 2.5 .mu.M each oligo. Add 1 U Taq1
polymerase and cycle for 35 cycles under the following conditions:
94 degrees for 30 s; 50 degrees for 30 s; and 72 degrees for 60 s.
Extract the mixture with phenol/chloroform. Precipitate with
ethanol. Resuspend in 10 .mu.l of dH.sub.2O and analyze on an
agarose gel.
TABLE 1 Pooling scheme for ligation assembly. Ligation method -
well pooling scheme. Entries are written FROM->TO.
STEP  FROM->TO
1     All F -> All R
2     A1->A2 A3->A4 A5->A6 A7->A8 A9->A10 A11->A12
      B1->B2 B3->B4 B5->B6 B7->B8 B9->B10 B11->B12
      C1->C2 C3->C4 C5->C6 C7->C8 C9->C10 C11->C12
      D1->D2 D3->D4 D5->D6 D7->D8 D9->D10 D11->D12
      E1->E2 E3->E4 E5->E6 E7->E8 E9->E10 E11->E12
      F1->F2 F3->F4 F5->F6 F7->F8 F9->F10 F11->F12
      G1->G2 G3->G4 G5->G6 G7->G8 G9->G10 G11->G12
      H1->H2 H3->H4 H5->H6 H7->H8 H9->H10 H11->H12
3     A2->A4 A6->A8 A10->A12 B2->B4 B6->B8 B10->B12
      C2->C4 C6->C8 C10->C12 D2->D4 D6->D8 D10->D12
      E2->E4 E6->E8 E10->E12 F2->F4 F6->F8 F10->F12
      G2->G4 G6->G8 G10->G12 H2->H4 H6->H8 H10->H12
4     A4->A8 A12->B4 B8->B12 C4->C8 C12->D4 D8->D12
      E4->E8 E12->F4 F8->F12 G4->G8 G12->H4 H8->H12
5     A8->B4 B12->C8 D4->D12 E8->F4 F12->G8 H4->H12
6     B4->C8 D12->F4 G8->H12
7     C8 F4 H12
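The pairwise pooling pattern of Table 1 can be generated programmatically. The sketch below is an illustration under the assumption that each step pools every well into its successor in row-major order and that the destination well carries the product forward; `plate_wells` and `pairwise_pooling` are hypothetical names.

```python
ROWS = "ABCDEFGH"

def plate_wells():
    """Well labels of a 96-well plate in row-major order (A1 ... H12)."""
    return [f"{r}{c}" for r in ROWS for c in range(1, 13)]

def pairwise_pooling(wells):
    """Pool each well into the next until an odd number of wells remains.

    Returns (steps, remaining): steps is a list of lists of (from, to)
    pairs, and remaining holds the wells left for the final pooling.
    """
    steps = []
    while len(wells) > 1 and len(wells) % 2 == 0:
        pairs = list(zip(wells[0::2], wells[1::2]))
        steps.append(pairs)
        # The destination well of each pair holds the pooled product.
        wells = [to for _, to in pairs]
    return steps, wells
```

For a 96-well plate this yields five pairwise rounds (Steps 2 to 6 of Table 1) followed by three remaining wells, which are pooled together in Step 7.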
[0249] Assembly of Parsed Oligonucleotides Using Taq I Synthesis
and Assembly
[0250] Arrayed sets of parsed overlapping oligonucleotides of about
25 to 150 bases in length each, with an overlap of about 12 to 75
base pairs (bp), are obtained. The oligonucleotide concentration is
about 250 .mu.M (250 nmol/ml). 50 base oligos give T.sub.ms from 75 to 85
degrees C., 6 to 10 OD.sub.260, 11 to 15 nanomoles, 150 to 300
.mu.g. Resuspend in 50 to 100 .mu.l of H.sub.2O to make 250 nmol/ml.
[0251] The invention envisions using a robotic workstation to
accomplish nucleic acid assembly. In the present example, two
working plates containing forward and reverse oligonucleotides in a
PCR mix at 2.5 .mu.M are prepared and 1 .mu.l of each oligo is added
to 100 .mu.l of PCR mix in a fresh microwell providing one plate of
forward and one of reverse oligos in an array. Cycling assembly is
then initiated as follows according to the pooling scheme outlined
in Table 2. In the present example, 96 cycles of assembly can be
accomplished according to this scheme.
[0252] Remove 2 .mu.l of well F-E1 to a fresh well; remove 2 .mu.l
of R-E1 to a fresh well; add 18 .mu.l of 1.times. PCR mix; add 1 U
of Taq1 polymerase;
[0253] Cycle once:
[0254] 94 degrees 30 s
[0255] 52 degrees 30 s
[0256] 72 degrees 30 s
[0257] Subsequently, remove 2 .mu.l of well F-E2 to the reaction
vessel; remove 2 .mu.l of well R-D12 to the reaction vessel. Cycle
once according to the temperatures above. Repeat the pooling and
cycling according to the scheme outlined in Table 2 for about 96
cycles.
[0258] A PCR amplification is then performed by taking 2 .mu.l of
final reaction mix and adding it to 20 .mu.l of a PCR mix
comprising:
[0259] 10 mM TRIS-HCl, pH 9.0
[0260] 2.2 mM MgCl2
[0261] 50 mM KCl
[0262] 0.2 mM each dNTP
[0263] 0.1% Triton X-100
[0264] Outside primers are prepared by taking 1 .mu.l of F1 and 1
.mu.l of R96 at 250 .mu.M (250 nmol/ml, or 0.250 nmol/.mu.l) and
adding them to the 100 .mu.l PCR reaction. This gives a final
concentration of 2.5 .mu.M each oligo. 1 U Taq1 polymerase is
subsequently added and the reaction is cycled for about 23 to 35
cycles under the following conditions:
94 degrees 30 s; 50 degrees 30 s; 72 degrees 60 s
[0265] The reaction is subsequently extracted with
phenol/chloroform, precipitated with ethanol and resuspended in 10 .mu.l
of dH.sub.2O for analysis on an agarose gel.
[0266] Equal amounts of forward and reverse oligos are combined
pairwise by taking 10 .mu.l of forward and 10 .mu.l of reverse oligo
and mixing them in a new 96-well v-bottom plate. This provides one
array with sets of duplex oligonucleotides at 250 .mu.M, according to
pooling scheme Step 1 in Table 1. An assembly plate is prepared by
taking 2 .mu.l of each oligomer pair and adding them to a plate
containing 100 .mu.l of ligation mix in each well. This gives an
effective concentration of 2.5 .mu.M or 2.5 nmol/ml. About 20 .mu.l of
each well is transferred to a fresh microwell plate together with
1 .mu.l of T4 polynucleotide kinase and 1 .mu.l of 1 mM ATP. Each
reaction will have 50 pmoles of oligonucleotide and 1 nmole ATP.
Incubate at 37 degrees C. for 30 minutes.
[0267] Nucleic acid assembly was initiated according to Steps 2-7
of Table 1. Step 2 pooling is carried out by mixing each well with
the next well in succession. 1 .mu.l of Taq1 ligase is added to
each mixed well and cycled once as follows:
[0268] 94 degrees 30 sec
[0269] 52 degrees 30 s
[0270] 72 degrees 10 minutes
[0271] Step 3 of pooling scheme is carried out and cycled according
to the temperature scheme above. Steps 4 and 5 of the pooling
scheme are carried out and cycled according to the temperature
scheme above. Carry out pooling scheme step 6 and take 10 .mu.l of
each mix into a fresh microwell. Step 7 pooling scheme is carried
out by pooling the remaining three wells. The reaction volumes will
be (initial plate has 20 .mu.l per well):
[0272] Step 2 20 .mu.l+20 .mu.l=40 .mu.l
[0273] Step 3 80 .mu.l
[0274] Step 4 160 .mu.l
[0275] Step 6 10 .mu.l+10 .mu.l=20 .mu.l
[0276] Step 7 20+20+20=60 .mu.l final reaction volume
[0277] A final PCR amplification is performed by taking 2 .mu.l of
the final ligation mix and adding it to 20 .mu.l of PCR mix
comprising:
[0278] 10 mM TRIS-HCl, pH 9.0
[0279] 2.2 mM MgCl2
[0280] 50 mM KCl
[0281] 0.2 mM each dNTP
[0282] 0.1% Triton X-100
[0283] Outside primers are prepared by taking 1 .mu.l of F1 and 1
.mu.l of R96 at 250 .mu.M (250 nmol/ml, or 0.250 nmol/.mu.l) and
adding them to the 100 .mu.l PCR reaction giving a final
concentration of 2.5 .mu.M for each oligo. Subsequently, 1 U of Taq1
polymerase is added and the reaction is cycled for about 23 to 35
cycles under the following conditions:
94 degrees 30 s; 50 degrees 30 s; 72 degrees 60 s
[0284] The product is extracted with phenol/chloroform, precipitated
with ethanol, resuspended in 10 .mu.l of dH.sub.2O and analyzed on an
agarose gel.
TABLE 2 Pooling scheme for assembly using Taq1 polymerase (also
topoisomerase II).
Step  Forward oligo + Reverse oligo
1     F-E1 + R-E1    Pause
2     F-E2 + R-D12   Pause
3     F-E3 + R-D11   Pause
4     F-E4 + R-D10   Pause
5     F-E5 + R-D9    Pause
6     F-E6 + R-D8    Pause
7     F-E7 + R-D7    Pause
8     F-E8 + R-D6    Pause
9     F-E9 + R-D5    Pause
10    F-E10 + R-D4   Pause
11    F-E11 + R-D3   Pause
12    F-E12 + R-D2   Pause
13    F-F1 + R-D1    Pause
14    F-F2 + R-C12   Pause
15    F-F3 + R-C11   Pause
16    F-F4 + R-C10   Pause
17    F-F5 + R-C9    Pause
18    F-F6 + R-C8    Pause
19    F-F7 + R-C7    Pause
20    F-F8 + R-C6    Pause
21    F-F9 + R-C5    Pause
22    F-F10 + R-C4   Pause
23    F-F11 + R-C3   Pause
24    F-F12 + R-C2   Pause
25    F-G1 + R-C1    Pause
26    F-G2 + R-B12   Pause
27    F-G3 + R-B11   Pause
28    F-G4 + R-B10   Pause
29    F-G5 + R-B9    Pause
30    F-G6 + R-B8    Pause
31    F-G7 + R-B7    Pause
32    F-G8 + R-B6    Pause
33    F-G9 + R-B5    Pause
34    F-G10 + R-B4   Pause
35    F-G11 + R-B3   Pause
36    F-G12 + R-B2   Pause
37    F-H1 + R-B1    Pause
38    F-H2 + R-A12   Pause
39    F-H3 + R-A11   Pause
40    F-H4 + R-A10   Pause
41    F-H5 + R-A9    Pause
42    F-H6 + R-A8    Pause
43    F-H7 + R-A7    Pause
44    F-H8 + R-A6    Pause
45    F-H9 + R-A5    Pause
46    F-H10 + R-A4   Pause
47    F-H11 + R-A3   Pause
48    F-H12 + R-A2   Pause
[0285]
TABLE 3 Alternate pooling scheme (initiating assembly from the 5'
or 3' end). Each step consists of: denature, anneal, polymerase
extension.
1. F-A1 -> R-A1
2. F-A2 -> R-H12
3. F-A3 -> R-H11
4. F-A4 -> R-H10
5. F-A5 -> R-H9
6. F-A6 -> R-H8
7. F-A7 -> R-H7
8. F-A8 -> R-H6
9. F-A9 -> R-H5
10. F-A10 -> R-H4
11. F-A11 -> R-H3
12. F-A12 -> R-H2
13. F-B1 -> R-H1
14. F-B2 -> R-G12
15. F-B3 -> R-G11
16. F-B4 -> R-G10
17. F-B5 -> R-G9
18. F-B6 -> R-G8
19. F-B7 -> R-G7
20. F-B8 -> R-G6
21. F-B9 -> R-G5
22. F-B10 -> R-G4
23. F-B11 -> R-G3
24. F-B12 -> R-G2
25. F-C1 -> R-G1
26. F-C2 -> R-F12
27. F-C3 -> R-F11
28. F-C4 -> R-F10
29. F-C5 -> R-F9
30. F-C6 -> R-F8
31. F-C7 -> R-F7
32. F-C8 -> R-F6
33. F-C9 -> R-F5
34. F-C10 -> R-F4
35. F-C11 -> R-F3
36. F-C12 -> R-F2
37. F-D1 -> R-F1
38. F-D2 -> R-E12
39. F-D3 -> R-E11
40. F-D4 -> R-E10
41. F-D5 -> R-E9
42. F-D6 -> R-E8
43. F-D7 -> R-E7
44. F-D8 -> R-E6
45. F-D9 -> R-E5
46. F-D10 -> R-E4
47. F-D11 -> R-E3
48. F-D12 -> R-E2
49. F-E1 -> R-E1
50. F-E2 -> R-D12
51. F-E3 -> R-D11
52. F-E4 -> R-D10
53. F-E5 -> R-D9
54. F-E6 -> R-D8
55. F-E7 -> R-D7
56. F-E8 -> R-D6
57. F-E9 -> R-D5
58. F-E10 -> R-D4
59. F-E11 -> R-D3
60. F-E12 -> R-D2
61. F-F1 -> R-D1
62. F-F2 -> R-C12
63. F-F3 -> R-C11
64. F-F4 -> R-C10
65. F-F5 -> R-C9
66. F-F6 -> R-C8
67. F-F7 -> R-C7
68. F-F8 -> R-C6
69. F-F9 -> R-C5
70. F-F10 -> R-C4
71. F-F11 -> R-C3
72. F-F12 -> R-C2
73. F-G1 -> R-C1
74. F-G2 -> R-B12
75. F-G3 -> R-B11
76. F-G4 -> R-B10
77. F-G5 -> R-B9
78. F-G6 -> R-B8
79. F-G7 -> R-B7
80. F-G8 -> R-B6
81. F-G9 -> R-B5
82. F-G10 -> R-B4
83. F-G11 -> R-B3
84. F-G12 -> R-B2
85. F-H1 -> R-B1
86. F-H2 -> R-A12
87. F-H3 -> R-A11
88. F-H4 -> R-A10
89. F-H5 -> R-A9
90. F-H6 -> R-A8
91. F-H7 -> R-A7
92. F-H8 -> R-A6
93. F-H9 -> R-A5
94. F-H10 -> R-A4
95. F-H11 -> R-A3
96. F-H12 -> R-A2
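The well pairing in Table 3 follows a regular pattern: the forward well advances from A1 to H12 while the reverse well starts at A1 and then counts down from H12 to A2. The sketch below reproduces that pattern; it is an illustration of the table only, assuming row-major well ordering, and `sequential_schedule` is a hypothetical name.

```python
ROWS = "ABCDEFGH"
WELLS = [f"{r}{c}" for r in ROWS for c in range(1, 13)]  # A1 ... H12

def sequential_schedule(first_step=1, last_step=96):
    """Yield (step, forward_well, reverse_well) following the Table 3 pattern."""
    n = len(WELLS)
    for step in range(first_step, last_step + 1):
        forward = WELLS[step - 1]
        # Reverse wells run A1, H12, H11, ..., A2 as the step increases.
        reverse = WELLS[(n + 1 - step) % n]
        yield step, f"F-{forward}", f"R-{reverse}"
```

Steps 49 to 96 of this schedule use the same well pairs, in the same order, as the 48 steps of Table 2, so `sequential_schedule(49, 96)` can also be used to drive the shorter scheme after renumbering.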
[0286] Assembly of Nucleic Acid Molecules
[0287] The nucleic acid molecules listed in Table 4 have been
produced using the methods described herein. The features and
characteristics of each nucleic acid molecule are also described in
Table 4.
[0288] As described in Table 4, a synthetic plasmid of 4800 bp in
length was assembled. The plasmid comprises 192 oligonucleotides
(two sets of 96 overlapping 50 mers; 25 bp overlap). The plasmid is
essentially pUC containing kanamycin resistance instead of
ampicillin resistance. The synthetic plasmid also contains lux A
and B genes from the Vibrio fischeri bacterial luciferase gene. The
SynPUC19 plasmid is 2700 bp in length comprising a sequence
essentially identical to pUC19 only shortened to precisely 2700 bp.
Two sets of 96 50 mers were used to assemble the plasmid. The
Synlux4 pUC19 plasmid was shortened and luxA gene was added. 54
100-mer oligonucleotides comprising two sets of 27 oligonucleotides
were used to assemble the plasmid. The miniQE10 plasmid comprising
2400 bp was assembled using 48 50 mer oligonucleotides. MiniQE10 is
an expression plasmid containing a 6.times. His tag and bacterial
promoter for high-level polypeptide expression. MiniQE10 was
assembled and synthesized using the Taql polymerase amplification
method of the invention. The microQE plasmid is a minimal plasmid
containing only an ampicillin gene, an origin of replication and a
linker of pQE plasmids. MicroQE was assembled using either
combinatoric ligation with 24 50-mers or with one tube PCR
amplification. The SynFib1, SynFibB and SynFibG nucleic acid
sequences are synthetic human fibrinogens manufactured using E.
coli codons to optimize expression in a prokaryotic expression
system.
TABLE 4 Synthetic nucleic acid molecules produced using the
methods of the invention.
Name               Length (bp)  Oligos  Oligo length          Form      Oligo names
Synthetic Plasmid  4800         192     50                    circular  F1-F96
SynPUC/19          2700         192     50                    circular  F01-F96
SynLux/4           2700         54      100                   circular  F1-27
MiniQE10           2400         48      50                    circular
MicroQE            1200         24      50                    circular  MQEF-1,24
Synfib1            1850         75      50                    linear    SFAF1-37
pQE25              2400         96      25                    circular  F1-F48
SynFibB            1500         60      59 50 mers, 1 25 mer  linear    FibbF1-30
SynFibG            1350         54      53 50 mers, 1 25 mer  linear    FibgF1-27
[0289] It is to be understood that while the invention has been
described in conjunction with the detailed description thereof, the
foregoing description is intended to illustrate and not limit the
scope of the invention, which is defined by the scope of the
appended claims. Other aspects, advantages, and modifications are
within the scope of the following claims.
* * * * *