U.S. patent application number 11/233178, for storing data encoded DNA in living organisms, was filed with the patent office on September 21, 2005 and published on 2006-02-02.
This patent application is currently assigned to Battelle Memorial Institute. Invention is credited to Harlan P. Foote, Kwong K. Wong, Pak C. Wong.
Application Number | 20060024811 11/233178 |
Document ID | / |
Family ID | 29549157 |
Filed Date | 2005-09-21 |
United States Patent Application | 20060024811 |
Kind Code | A1 |
Wong; Pak C.; et al. | February 2, 2006 |
Storing data encoded DNA in living organisms
Abstract
Current technologies allow the generation of artificial DNA
molecules and/or the ability to alter the DNA sequences of existing
DNA molecules. With a careful coding scheme and arrangement, it is
possible to encode important information as an artificial DNA
strand and store it in a living host safely and permanently. This
inventive technology can be used to identify origins and protect
R&D investments. It can also be used in environmental research
to track generations of organisms and observe the ecological impact
of pollutants. Today, there are microorganisms that can survive
under extreme conditions. It is also advantageous to consider
multicellular organisms as hosts for stored information. These
living organisms can provide memory housing and protection for
stored data or information. The present invention provides for
data storage in a living organism wherein at least one DNA sequence
is encoded to represent data and incorporated into a living
organism.
Inventors: | Wong; Pak C.; (Richland, WA); Wong; Kwong K.; (Sugar Land, TX); Foote; Harlan P.; (Richland, WA) |
Correspondence
Address: |
KLARQUIST SPARKMAN, LLP
121 SW SALMON STREET, SUITE 1600
ONE WORLD TRADE CENTER
PORTLAND
OR
97204
US
|
Assignee: |
Battelle Memorial Institute
|
Family ID: |
29549157 |
Appl. No.: |
11/233178 |
Filed: |
September 21, 2005 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10155761 | May 24, 2002 | |
11233178 | Sep 21, 2005 | |
Current U.S.
Class: |
435/252.1 ;
435/471; 435/6.11; 435/6.12; 435/6.13 |
Current CPC
Class: |
G06N 3/123 20130101;
G11C 13/0014 20130101; G11C 13/0019 20130101; B82Y 10/00
20130101 |
Class at
Publication: |
435/252.1 ;
435/006; 435/471 |
International
Class: |
C12Q 1/68 20060101
C12Q001/68; C12N 1/20 20060101 C12N001/20; C12N 15/74 20060101
C12N015/74 |
Government Interests
[0001] This invention was made with Government support under
Contract DE-AC0676RL01830 awarded by the U.S. Department of Energy.
The Government has certain rights in the invention.
Claims
1-34. (canceled)
35. A living organism, comprising: a. DNA encoded to represent data
to be decoded thereafter.
36. A living organism as in claim 35, wherein said organism is a
single-celled organism.
37. A living organism as in claim 36, wherein said single-celled
organism is a bacterial cell.
38. A living organism as in claim 37, wherein said bacterial cell
is Deinococcus radiodurans.
39. A living organism as in claim 37, wherein said bacterial cell
is Escherichia coli.
Description
FIELD OF THE INVENTION
[0002] The present invention relates generally to a method of
storing data. In particular, but not exclusively, the present
invention relates to storage of data as encoded DNA in living
organisms.
BACKGROUND OF THE INVENTION
[0003] A data preservation problem looms large behind today's
information superhighway. All current storage media (e.g., paper,
magnetic media, silicon chips) require constant attention to
maintain their information content. People or natural disasters can
easily destroy any of them, intentionally or accidentally. With the
large amount of information generated by our society every day, it
is time to think of a new generation of data memory.
[0004] The use of deoxyribonucleic acid (DNA) as a component of
memory storage has been proposed for a number of reasons. For
example, DNA as a memory medium is compact. One cubic centimeter of
DNA in solution could store 10^21 bits of information, whereas a
current conventional computer has a memory of at most 10^14 bits.
Also, most computers operate linearly, one block of data after
another, whereas biochemical reactions are highly parallel in
operation. That is, a single biochemical operation can affect
trillions of DNA strands in a test tube.
[0005] Heller et al. (U.S. Pat. No. 5,787,032) describe the use of
synthetic DNA polymers as an optical storage medium for memory.
Clelland et al. reported in Nature (Vol. 399, 10 Jun. 1999, pp.
533-34 or www.nature.com) that encoding meaningful information as
DNA sequences is possible. The authors conducted an experiment
in which an encoded DNA strand was hidden behind a period (i.e., a
dot) of a printed document. The document was then sealed and mailed
to its owners using the regular US Postal Service. The embedded
message was successfully recovered in a lab environment. This work
proved that a DNA strand can be a substitute for a piece of paper
in terms of information storage. However, a naked DNA molecule can
easily be destroyed when exposed to unfavorable environmental
conditions such as excessive temperature or
desiccation/rehydration. Even nucleases in the environment may
degrade the DNA molecules over time. Therefore, exploiting DNA as a
memory medium would require an effective protective storage
medium.
[0006] Establishing memory of stored information in a living
organism can provide adequate protection for the encoded DNA
strands. This requires a living host for the DNA, one that can
tolerate the addition of "artificial" gene sequences and survive
extreme environmental conditions. Perhaps more importantly, the
host needs to be able to grow and multiply with the embedded
information. Propagation of a host for memory embodied in DNA can
allow for preservation and continuation of the stored memory, as
well as protecting the integrity of the information contained in
the memory. There is also an opportunity to use this capability to
store purposeful information.
SUMMARY OF THE INVENTION
[0007] With a careful coding scheme and arrangement, applicants
have invented a process to encode data or information as an
artificial DNA strand and store it in a living host safely and
permanently. The instant invention can be used to identify origins
and protect R&D investments (i.e., DNA watermarking) such as
agricultural products and rare animals. For example, the present
invention allows for storage of data that comprises specific
information about the host organism. The agricultural industry can
use this invention to "label" crops. By storing various data
regarding the particular plant, including origin, type, generation,
etc., the agricultural industry can then rely on this information
at a later date (e.g., when produce hits the market). It can also
be used in environmental research to track generations of organisms
and observe the ecological impact of pollutants. Today, there are
microorganisms that can survive heavy radiation exposure, high
temperatures, and many other extreme conditions. These hardy
microorganisms can serve as memory hosts and protect the stored
data or information. There are living organisms such as weeds and
cockroaches that have existed on earth for hundreds of millions of
years. These organisms are excellent candidates as well for
preserving critical information for a future civilization.
[0008] Therefore, one embodiment of the present invention is a
method of storing data in a living organism wherein at least one
DNA sequence is encoded to represent data and incorporated into a
living organism.
[0009] Another embodiment of the present invention is to provide
sequences encoded to represent data together with other sequences
not specifically coded, and to incorporate them into a living
organism for the purpose of memory storage.
[0010] Yet another embodiment of the present invention is to
provide a method of storing programmed data into a living
organism.
[0011] Still another embodiment of the present invention is to
provide a memory storage system wherein DNA, encoded to represent
data, is stored in a living organism.
[0012] Yet another embodiment of the present invention is to
provide a method of storing editable data in a living organism.
[0013] Still another embodiment of the present invention is to
provide a method of storing programmed data that responds to a
stimulus into a living organism.
[0014] Yet another embodiment of the present invention is to
provide a method of storing information that responds to a stimulus
and reacts to specific encoded programming into a living
organism.
[0015] Still another embodiment of the present invention is to
provide a memory storage system wherein a living organism comprises
at least one DNA sequence encoded to represent data, which is
incorporated into the native DNA of a living organism.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] A more complete appreciation of the invention will be
readily obtained by reference to the following Description and the
accompanying drawings in which like numerals in different figures
represent the same structures or elements, wherein:
[0017] FIG. 1 is a simplified schematic diagram of the process of
the present invention.
[0018] FIG. 2 is an illustration of a plasmid vector with encoded
DNA inserted.
[0019] FIG. 3 is a graphical illustration of the encoded DNA
sequence and the decoded message.
[0020] FIG. 4 shows an example of a DNA sequence of a song
phrase.
DETAILED DESCRIPTION
[0021] The present invention comprises a method of storing data in
a living organism, wherein at least one DNA sequence is encoded to
represent data and is thereafter incorporated into the living
organism. The method of the present invention comprises encoding
DNA to represent specific data by selecting at least 2 of the four
DNA nucleotide bases to represent specific text and arranging the
nucleotide bases in a manner to represent the data. Encoding the
DNA bases to represent specific data can be achieved in numerous
and varied ways and the embodiments set forth herein are not meant
to be exclusive, but rather exemplify the broader aspects inherent
to the present invention.
[0022] The present invention comprises a method of storing data in
a living organism by incorporating encoded DNA into a viable cell
of said living organism. FIG. 1 shows a simplified schematic of one
aspect of the present invention. First, the data to be stored is
encoded into a DNA sequence. The four nucleotide bases associated
with a DNA sequence are assembled to represent the specific data by
correlation with a specific code. For example, a triplet of bases
drawn from the four basic nucleotide bases (Cytosine or C, Guanine
or G, Thymine or T, and Adenine or A) can be assigned to represent
a text character. A string of DNA nucleotide bases can then be
assembled to represent text information or data. Once the specific
data has been encoded into a DNA sequence, it is inserted into a
vector that serves as a "vehicle" for transport into a living
organism. A vector is a DNA molecule originating from a virus, a
plasmid, or the cell of a higher organism, or synthetically
assembled, into which another DNA fragment of appropriate size can
be integrated without loss of the vector's capacity for
self-replication; vectors introduce foreign DNA into host cells,
where it can be reproduced in large quantities. Vectors can be
selected from the group consisting of plasmids, cosmids, yeast
artificial chromosomes, and recombinant molecules containing DNA
sequences. The vector comprising the encoded DNA is then introduced
into a viable cell of a living organism. It is understood by those
skilled in the art that DNA bases can be incorporated into a living
cell in different ways and that the particular vectors used and the
specific methodology are dependent upon the type of host cell. Once
the vector is inside the cell of the host living organism, it can
reside and be stored indefinitely. The vector DNA, along with the
encoded DNA, will be regenerated and coexist with the host's
genomic DNA.
[0023] The stored data can then be retrieved by means currently
known to those skilled in the art. Stored data can be retrieved by
a standard PCR amplification method as a PCR product (DNA
fragment). A standard DNA sequencing method, such as the dideoxy
termination method, can then identify the stored information within
the amplified PCR product. Alternatively, stored information within
the PCR product can be determined by hybridization with a panel of
known oligonucleotides. Once the data is retrieved, it is then
decoded and the original message is obtained.
[0024] Another aspect of the invention is to encode the DNA to
represent data that is programmed data. The programmed data can
then be programmed to accomplish an activity, continue a
communication process, and even respond to a stimulus that will
then result in an action. For example, we can construct a gene
fusion between a hydrogen-peroxide-inducible promoter and a
lysozyme gene, which will kill the bacteria if we add hydrogen
peroxide to the engineered bacteria. We can also construct a gene
fusion with a regulatory gene, which will trigger a cascade of
genetic responses (in our case, information). Gene fusion is a very
common technique that has been used in studying bacterial gene
regulation, for example with green fluorescent protein.
[0025] The living organism utilized in the present invention can be
single-celled or multi-cellular, prokaryotic or eukaryotic.
Although bacterial cells serve well as host organisms to
demonstrate the present invention, it is understood that other
living cells can be utilized as well.
[0026] Another aspect of the present invention is the storage of
data in multicellular living organisms. This embodiment of the
present invention can be achieved by incorporating at least one DNA
sequence encoded to represent data into a germ cell (a precursor
cell that gives rise to gametes, which then serve as specialized
haploid cells, sperm or egg, in sexual reproduction) or a stem cell
(a relatively undifferentiated cell that will continue dividing
indefinitely, producing daughter cells that will undergo terminal
differentiation into particular cell types). The encoded DNA
sequence will then propagate into a multicellular living organism.
This embodiment of the invention is a memory storage system that
takes advantage of multicellular organisms (e.g., insect, rodent)
and serves to propagate the encoded DNA sequence in all daughter
cells stemming from the original host stem cell.
[0027] The present invention comprises a memory storage system
wherein a living organism comprises therein at least one DNA
sequence encoded to represent data. The stored data resides in a
living organism and remains there until recovery is desired. The
data is then retrieved and decoded so as to enable communication.
Like a computer memory device that can store data and programs, the
present invention comprises the same or similar items in a DNA
memory system. Unlike a compiled computer software program, a
program in a DNA memory system can comprise a set of rules,
options, or instructions that respond to specific circumstantial or
environmental conditions. In other words, the living organism will
detect stimulus conditions and react according to the
information or instructions encoded in the DNA sequence. The host
cell of the living organism should not express the non-native
encoded DNA (artificial to the genomic DNA of the organism) and
cause destructive consequences such as toxic effects. It is desired
to custom-design an encoded DNA sequence that will respond to
specific events and cause the host cell of the living organism to
react or change. Therefore, the present invention provides a unique
nano-scaled event detection tool that will detect and respond to a
plurality of stimuli based on the programming encoded into the DNA
that is incorporated into a host cell of a living organism.
[0028] For a clear and concise understanding of the specification
and claims, including the scope given to such terms, the following
definitions are provided:
[0029] As used herein, the word ENCODE means to express given data
or information by means of a code.
[0030] As used herein, the word DATA means information of any form
that is used for communication, analysis, and/or reasoning in
making decisions.
[0031] Cells to be used as carriers of the encoded DNA need to be
made competent using standard methods so that they take up the
encoded DNA molecules. This can be achieved by either chemical
transformation or electroporation methods.
EXAMPLE 1
[0032] DNA Host Identification--Two well-understood bacteria,
Escherichia coli (E. coli) and Deinococcus radiodurans (D.
radiodurans), were utilized for our experiment. We selected E. coli
and D. radiodurans because microorganisms, in general, grow very
rapidly and the embedded information can be inherited rapidly and
continuously. Deinococcus survives extreme conditions such as
ultraviolet radiation, desiccation, partial vacuum environments,
and ionizing radiation up to 1.6 million rad (about 0.1% of that
radiation dose would be fatal to human beings). Some strains of
Deinococcus can also tolerate high temperature. Although bacteria
were chosen as preferred embodiments, it is understood that any
living cell, whether from a single-celled or a multicellular
organism, can be used in the practice of this invention.
[0033] Information Encoding--A (Adenine), C (Cytosine), G
(Guanine), and T (Thymine) were used to assemble a DNA sequence
information stream to represent data. Table 1 depicts the encoding
key for a set of triplets--a DNA sequence with any 3 of the 4 basic
units. It is recognized that other types and methods of coding
information can be utilized and this example is not meant to be
exclusive to this invention.
TABLE 1 DNA encoding table
AAA - 0 | AAC - 1 | AAG - 2 | AAT - 3 |
ACA - 4 | ACC - 5 | ACG - 6 | ACT - 7 |
AGA - 8 | AGC - 9 | AGG - A | AGT - B |
ATA - C | ATC - D | ATG - E | ATT - F |
CAA - G | CAC - H | CAG - I | CAT - J |
CCA - K | CCC - L | CCG - M | CCT - N |
CGA - O | CGC - P | CGG - Q | CGT - R |
CTA - S | CTC - T | CTG - U | CTT - V |
GAA - W | GAC - X | GAG - Y | GAT - Z |
GCA - SP | GCC - : | GCG - , | GCT - - |
GGA - . | GGC - ! | GGG - ( | GGT - ) |
GTA - ` | GTC - ' | GTG - " | GTT - " |
TAA - ? | TAC - ; | TAG - / | TAT - [ |
TCA - ] | TCC - | TCG - | TCT - |
TGA - | TGC - | TGG - | TGT - |
TTA - | TTC - | TTG - | TTT - |
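By way of non-limiting illustration, the triplet code of Table 1 can be sketched in Python as follows. The names SYMBOLS, encode, and decode are hypothetical and are not part of the original disclosure. The sketch assumes that the 64 triplets are taken in alphabetical order (A, C, G, T), which matches the order of Table 1, that "SP" denotes a space, and that the double-quote character, which appears under both GTG and GTT, is encoded with the first of the two.

```python
from itertools import product

# Symbols in the order Table 1 assigns them to AAA, AAC, ..., TCA.
# Only the first 53 of the 64 triplets carry a symbol; the rest are unassigned.
SYMBOLS = list("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ") + [
    " ", ":", ",", "-", ".", "!", "(", ")", "`", "'", '"', '"',
    "?", ";", "/", "[", "]",
]

# All 64 triplets in alphabetical order: AAA, AAC, AAG, AAT, ACA, ...
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]

# zip stops at the shorter list, so unassigned triplets are simply absent.
DECODE = dict(zip(TRIPLETS, SYMBOLS))

# Assumption: when a symbol appears twice (the double quote), keep the first triplet.
ENCODE = {}
for triplet, symbol in DECODE.items():
    ENCODE.setdefault(symbol, triplet)


def encode(text: str) -> str:
    """Translate text into a DNA string, three bases per character."""
    return "".join(ENCODE[ch] for ch in text.upper())


def decode(dna: str) -> str:
    """Translate a DNA string back into text, reading one triplet at a time."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(encode("DNA"))        # ATCCCTAGG
print(decode("ATCCCTAGG"))  # DNA
```

Under this code each character costs three bases, so a message of n characters occupies 3n bases. A character outside Table 1, or an unassigned triplet, raises a KeyError in this sketch.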
[0034] Unique DNA Searching--The entire genomic sequences of E.
coli and D. radiodurans are known. A number of fixed-size sequences
(20 base pairs) were identified. Several criteria were used to
identify this set of 20-mers: 1. the sequences do not exist in
either the Deinococcus radiodurans or the Escherichia coli genome;
2. the 20-mer will not have a complementary sequence with more than
four bases at the 3' end, e.g., -AATT or -CCGG at the 3' end; 3.
the GC content of the 20-mer will be in the range of 40 to 60%; 4.
the 20-mer will have at least two of the TAG, TAA, or TGA stop
codons. Criteria 1 to 3 will provide unique tags for subsequent PCR
retrieval of encoded DNA, while criterion 4 will prevent the
formation of fusion proteins that may be detrimental to the host
bacterium. These sequences ensure that no unnecessary mutations or
damage to the bacteria result. The sequences will serve as
sentinels to tag the beginning and end of the embedded
messages--similar to the file header and footer on a magnetic
tape--for later identification and retrieval. Of the 10 billion
potential candidates in the bacterium Deinococcus, we found only 25
qualified sequences that are acceptable for our experiments. A
brute-force computational analysis was used to compute this set of
20-mers. There are 4^20 combinations of 20-mers. All 20-mers whose
GC content (% of G or C within the 20-mer) was not between 40 and
60% were eliminated, and then the 4.1 million 20-mers that exist in
Deinococcus radiodurans were eliminated. Finally, sequences with a
complementary 3' end
(-AATT, -TTAA, -GGCC, -CCGG, -ATAT, -TATA, -GCGC, -CGCG), SEQ ID.: 1
were eliminated. The remaining 20-mers were searched for the
presence of stop codons. The sequences shown in Table 2 are the
identified DNA sequences used to design the oligonucleotides
(oligos) used herein. Multiple stop codons (i.e., triplets such as
TAA, TGA, and TAG) are present in many of the sequences. These
codons discourage the host from "reading" the non-native DNA that
has been encoded to represent data, and subsequently producing
chimeric proteins that may be harmful to the bacteria.
TABLE 2 25 20-base Pair Sequences Utilized Herein.
SEQ ID NO.: 2 | AAGGTAGGTAGGTTAGTTAG |
SEQ ID NO.: 3 | AGGTTTGGTGGTATAGTTAG |
SEQ ID NO.: 4 | ATAGGAGTGTGTGTAGTTAG |
SEQ ID NO.: 5 | ATATTAGAGGGGGTAGTTAG |
SEQ ID NO.: 6 | GGAGTAGTGTGTATAGTTAG |
SEQ ID NO.: 7 | GGGAGTATGTAGTTAGTTAG |
SEQ ID NO.: 8 | GGTTAGATGAGTGTAGTTAG |
SEQ ID NO.: 9 | TAAGGGATGTGTGTAGTTAG |
SEQ ID NO.: 10 | TAGAGGAGGGATATAGTTAG |
SEQ ID NO.: 11 | TAGATGGGAGGTATAGTTAG |
SEQ ID NO.: 12 | TAGGAGAGATGTGTAGTTAG |
SEQ ID NO.: 13 | TATAGGGAGGGTATAGTTAG |
SEQ ID NO.: 14 | TGTGGGATAGTGATAGTTAG |
SEQ ID NO.: 15 | AGAGTAGTGAGGATAGTTAG |
SEQ ID NO.: 16 | ATAAGTAGTGGGGTAGTTAG |
SEQ ID NO.: 17 | ATAGGGGTATGGATAGTTAG |
SEQ ID NO.: 18 | ATGGGTGGATTGATAGTTAG |
SEQ ID NO.: 19 | GGGAATAGAGTGTTAGTTAG |
SEQ ID NO.: 20 | GGGATGATTGGTTTAGTTAG |
SEQ ID NO.: 21 | GTATGGGAATGGTTAGTTAG |
SEQ ID NO.: 22 | TAGAGAGAGTGTGTAGTTAG |
SEQ ID NO.: 23 | TAGAGTGGTGTGTTAGTTAG |
SEQ ID NO.: 24 | TAGATTGGATGGGTAGTTAG |
SEQ ID NO.: 25 | TAGGGTTGGTAGTTAGTTAG |
SEQ ID NO.: 26 | TATAGGGTAGGGTTAGTTAG |
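The screening criteria described above may be sketched, under stated assumptions, as a per-candidate filter in Python. This is an illustrative sketch only and not the applicants' actual brute-force program. The function and parameter names are hypothetical. The genome lookup is reduced to a caller-supplied set (genome_kmers). "At least two stop codons" is interpreted here as at least two occurrences of TAG, TAA, or TGA in any reading frame. The excluded 3' ends are the eight listed in the text.

```python
STOP_CODONS = ("TAG", "TAA", "TGA")

# The eight self-complementary 3' ends listed in the text as eliminated.
EXCLUDED_3P_ENDS = {"AATT", "TTAA", "GGCC", "CCGG", "ATAT", "TATA", "GCGC", "CGCG"}


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def count_stop_codons(seq: str) -> int:
    """Number of stop-codon occurrences in seq, counted in every reading frame."""
    return sum(seq[i:i + 3] in STOP_CODONS for i in range(len(seq) - 2))


def is_candidate_tag(seq: str, genome_kmers=frozenset()) -> bool:
    """Apply criteria 1 to 4 to a single 20-mer.

    genome_kmers is a hypothetical stand-in for the set of 20-mers
    found in the host genomes (criterion 1).
    """
    seq = seq.upper()
    return (
        len(seq) == 20
        and seq not in genome_kmers               # criterion 1
        and seq[-4:] not in EXCLUDED_3P_ENDS      # criterion 2
        and 0.40 <= gc_fraction(seq) <= 0.60      # criterion 3
        and count_stop_codons(seq) >= 2           # criterion 4
    )


# Size of the search space before filtering: 4 bases at each of 20 positions.
print(4 ** 20)                                   # 1099511627776
print(is_candidate_tag("AAGGTAGGTAGGTTAGTTAG"))  # True (SEQ ID NO.: 2)
```

Because the full space holds roughly 1.1 trillion 20-mers, a practical search would likely apply the inexpensive GC filter first and consult the genome set last, which is the order the text describes.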
[0035] Laboratory Procedures and Results
[0036] Two 46-mer complementary oligos were created, each
comprising two different 20-mer oligos connected by a 6-base pair
long restriction enzyme site. The two 20-mer oligos were created
from two different sequences listed in Table 2. The restriction
enzyme site was to prepare for later encoded DNA fragment
insertion. These two 46-mer long complementary oligos form a double
stranded 46-base pair DNA fragment. The DNA fragment was then
cloned into a recombinant plasmid as illustrated in FIG. 3 where
two 20-mer long oligos 104 serve as "sentinels" placed at the
beginning and end of the inserted encoded DNA 102, which was then
incorporated into plasmid vector 100. Because the two 20-mer oligos
do not exist in the genome of the host, they served as
identification markers for later message retrieval. The stop codons
in these two oligos also help protect the message as well as the
host from any potential damage. Table 3 shows the phrases
considered for insertion, along with their respectively coded
sequences. For this experiment, each phrase used (2, 3, 4, 5, 8, 9,
11) was inserted into a different single cell of D. radiodurans.
The present invention can be practiced such that all of the desired
phrases are inserted into the same single cell, or individual
phrases can be inserted into different cells.
[0037] Two complementary oligos
(5'AGAGTAGTGAGGATAGTTAGAGATCTCTCTAATCACACACATCTCA3', SEQ ID
NO.: 27, and 5'TGAGATGTGTGTGATTAGAGAGATCTCTAACTATCCTCACTACTCT3', SEQ
ID NO.: 28), containing two arbitrarily chosen 20-mer tags
(5'AGAGTAGTGAGGATAGTTAG3', SEQ ID NO.: 29, and
5'TGAGATGTGTGTGATTAGAG3', SEQ ID NO.: 30) arbitrarily selected from
Table 2, were chemically synthesized. These two chemically
synthesized oligos (46-mers) were allowed to anneal to each other
to form a 46 bp DNA fragment, which was cloned into a cloning
vector, pCR-blunt (Invitrogen Inc.). A BglII restriction enzyme
site, AGATCT, was built into the 46 bp DNA fragment. As a result,
an encoded DNA message can be cloned into the BglII site by a
standard cloning procedure, and the message can be retrieved with
that pair of tags or with primer pairs present within the plasmid
vector. (See FIG. 2)
TABLE 3 Stored Data Utilized Herein
1 | A WORLD OF TEARS, |
2 | AND A GOLDEN SUN, |
3 | AND A SMILE MEANS |
4 | AND A WORLD OF FEARS, |
5 | AND THE OCEANS ARE WIDE, |
6 | FRIENDSHIP TO EVERYONE, |
7 | IT IS TIME WE'RE AWARE. |
8 | IT'S A SMALL SMALL WORLD. |
9 | IT'S A SMALL WORLD AFTER ALL, |
10 | IT'S A WORLD OF HOPES |
11 | IT'S A WORLD OF LAUGHTER, |
12 | IT'S SMALL SMALL WORLD. |
13 | THERE IS JUST ONE MOON |
14 | THERE'S SO MUCH THAT WE SHARE, |
15 | THOUGH THE MOUNTAINS ARE HIGH, |
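As an illustrative check on the 46-mer design described in paragraph [0037], the following Python sketch confirms that the two synthesized oligos are reverse complements of each other and locates the BglII site between the two 20-mer tags. The function names are hypothetical. The sequences are SEQ ID NO.: 27 and SEQ ID NO.: 28 as given in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

OLIGO_A = "AGAGTAGTGAGGATAGTTAGAGATCTCTCTAATCACACACATCTCA"  # SEQ ID NO.: 27
OLIGO_B = "TGAGATGTGTGTGATTAGAGAGATCTCTAACTATCCTCACTACTCT"  # SEQ ID NO.: 28
BGLII_SITE = "AGATCT"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def describe_cassette(top: str, bottom: str, site: str = BGLII_SITE) -> dict:
    """Split an annealed oligo pair into tag A, the cloning site, and tag B."""
    if reverse_complement(top) != bottom:
        raise ValueError("oligos are not fully complementary")
    start = top.find(site)
    if start < 0:
        raise ValueError("restriction site not found")
    tag_b_length = len(top) - start - len(site)
    return {
        "tag_a": top[:start],              # 5' end of the top strand
        "site_start": start,               # 0-based index of the site
        "tag_b": bottom[:tag_b_length],    # 5' end of the bottom strand
    }


cassette = describe_cassette(OLIGO_A, OLIGO_B)
print(cassette["tag_a"])  # AGAGTAGTGAGGATAGTTAG
print(cassette["tag_b"])  # TGAGATGTGTGTGATTAGAG
```

The two values printed are the tags given as SEQ ID NO.: 29 and SEQ ID NO.: 30, each read 5' to 3' on its own strand, so the pair can serve directly as outward-facing PCR primers around the BglII site.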
[0038]
Phrase | SEQ ID NO. | Encoded DNA Sequence |
1 | 31 | AACGCAAGGGCAGAACGACGTCCCATCGCACGAATTGCACTCATGAGGCGTCTAGCG |
2 | 32 | AAGGCAAGGCCTATCGCAAGGGCACAACGACCCATCATGCCTGCACTACTGCCTGCG |
3 | 33 | AATGCAAGGCCTATCGCAAGGGCACTACCGCAGCCCATGGCACCGATGAGGCCTCTA |
4 | 34 | ACAGCAAGGCCTATCGCAAGGGCAGAACGACGTCCCATCGCACGAATTGCAATTATGAGGCGTCTAGCG |
5 | 35 | ACCGCAAGGCCTATCGCACTCCACATGGCACGAATAATGAGGCCTCTAGCAAGGCGTATGGCAGAACAGATCATGGCG |
6 | 36 | ACGGCAATTCGTCAGATGCCTATCCTACACCAGCGCGCACTCCGAGCAATGCTTATGCGTGAGCGACCTATGGCG |
7 | 37 | ACTGCACAGCTCGCACAGCTAGCACTCCAGCCGATGGCAGAAATGGTCCGTATGGCAAGGGAAAGGCGTATGGGA |
8 | 38 | AGAGCACAGCTCGTCCTAGCAAGGGCACTACCGAGGCCCCCCGCACTACCGAGGCCCCCCGCAGAACGACGTCCCATCGGA |
9 | 39 | AGCGCACAGCTCGTCCTAGCAAGGGCACTACCGAGGCCCCCCGCAGAACGACGTCCCATCGCAAGGATTCTCATGCGTGCAAGGCCCCCCGCG |
10 | 40 | AACAAAGCACAGCTCGTCCTAGCAAGGGCAGAACGACGTCCCATCGCACGAATTGCACACCGACGCATGCTA |
11 | 41 | AACAACGCACAGCTCGTCCTAGCAAGGGCAGAACGACGTCCCATCGCACGAATTGCACCCAGGCTGCAACACCTCATGCGTGCG |
12 | 42 | AACAAGGCACAGCTCGTCCTAGCACTACCGAGGCCCCCCGCACTACCGAGGCCCCCCGCAGAACGACGTCCCATCGGA |
13 | 44 | AACAATGCACTCCACATGCGTATGGCACAGCTAGCACATCTGCTACTCGCACGACCTATGGCACCGCGACGACCT |
14 | 45 | AACACAGCACTCCACATGCGTATGGTCCTAGCACTACGAGCACCGCTGATACACGCACTCCACAGGCTCGCAGAAATGGCACTACACAGGCGTATGGCG |
15 | 46 | AACACCGCACTCCACCGACTGCAACACGCACTCCACATGGCACCGCGACTGCCTCTCAGGCAGCCTCTAGCAAGGCGTATGGCACACCAGCAACACGCG |
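Reading the encoded sequences against the triplet code of Table 1 suggests that each one begins with the phrase number followed by a space, and then the phrase itself. The text does not state this convention, so it is an inference from the data. The short Python sketch below checks it by length alone for three rows, under the assumption of three bases per character. The names ROWS, expected_bases, and lengths_agree are hypothetical.

```python
def expected_bases(index: int, phrase: str) -> int:
    """Bases needed if '<index> <phrase>' is encoded at three bases per character."""
    return 3 * len(f"{index} {phrase}")


# Three rows copied from Table 3 and from the encoded-sequence table,
# written in groups of ten bases for easier comparison with the listing.
ROWS = {
    1: ("A WORLD OF TEARS,",
        "AACGCAAGGG" "CAGAACGACG" "TCCCATCGCA" "CGAATTGCAC" "TCATGAGGCG"
        "TCTAGCG"),
    5: ("AND THE OCEANS ARE WIDE,",
        "ACCGCAAGGC" "CTATCGCACT" "CCACATGGCA" "CGAATAATGA" "GGCCTCTAGC"
        "AAGGCGTATG" "GCAGAACAGA" "TCATGGCG"),
    10: ("IT'S A WORLD OF HOPES",
         "AACAAAGCAC" "AGCTCGTCCT" "AGCAAGGGCA" "GAACGACGTC" "CCATCGCACG"
         "AATTGCACAC" "CGACGCATGC" "TA"),
}


def lengths_agree(rows=ROWS) -> bool:
    """True if every sequence has exactly the length the convention predicts."""
    return all(
        len(dna) == expected_bases(index, phrase)
        for index, (phrase, dna) in rows.items()
    )


print(expected_bases(1, "A WORLD OF TEARS,"))  # 57
print(lengths_agree())                         # True
```

The lengths of 57, 78, and 72 bases agree with the lengths recorded for SEQ ID NOS.: 31, 35, and 40 in the sequence listing.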
[0039] The embedded DNA (Table 3) was then inserted into a plasmid
vector 100, shown in FIG. 3. The resultant vectors are then
transferred into 10 E. coli by electroporation (high-voltage
shocks). One of ordinary skill in the art will recognize that
vectors can be transferred by other means that may be better
suited to the specific host cell. For example, we have used
pCRblunt for cloning most of the specifically designed oligos. As
bacteria grow and divide, the recombinant plasmid vectors also
replicate to produce an enormous number of copies of DNA plasmid
vectors containing the encoded DNA. This produces multiple copies
of the encoded DNA fragment, allowing storage and continuation of
the stored data.
[0040] The stored data was then recovered by searching for the two
20-mer oligos (data markers) 104 (FIG. 3). The cells were harvested
and then lysed to obtain crude genomic DNA comprising the
incorporated encoded DNA. With standard procedures, the encoded DNA
was located and amplified with polymerase chain reaction (PCR)
techniques. Specific primers (M13 reverse,
TGAGCGGATAACAATTTCACACAG, SEQ ID NO.: 47, or M13 sequencing primer,
GTTTTCCCCAGTCACGACGTTG, SEQ ID NO.: 48) or a pair of tag primers
(FIG. 2) can be used to amplify the encoded DNA as a PCR DNA
fragment. Once the encoded information was obtained, it was then
decoded to reveal the original data (song phrases). FIG. 4 shows an
example of a DNA sequence of a song phrase recovered and the
decoded message revealed. We use a simple script to convert the DNA
sequence into words based on our assignment of each of the
triplets. We have data from E. coli only; we have tried once with
D. radiodurans, but that attempt has not yet been successful.
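The retrieval step can be illustrated with a short Python sketch that locates the two sentinel tags in a sequenced read and returns the encoded DNA lying between them. This is a sketch under stated assumptions and not the script used by the applicants. It assumes the read is given in the orientation of SEQ ID NO.: 27, and that the insert is flanked on both sides by a BglII site (AGATCT) after cloning. The function name extract_payload is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

TAG_A = "AGAGTAGTGAGGATAGTTAG"  # SEQ ID NO.: 29
TAG_B = "TGAGATGTGTGTGATTAGAG"  # SEQ ID NO.: 30
BGLII_SITE = "AGATCT"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extract_payload(read: str, tag_a: str = TAG_A, tag_b: str = TAG_B,
                    site: str = BGLII_SITE) -> str:
    """Return the encoded DNA found between the two sentinel tags."""
    read = read.upper()
    start = read.find(tag_a)
    if start < 0:
        raise ValueError("leading sentinel tag not found")
    body_start = start + len(tag_a)
    # On the top strand, tag B appears as its reverse complement.
    end = read.find(reverse_complement(tag_b), body_start)
    if end < 0:
        raise ValueError("trailing sentinel tag not found")
    payload = read[body_start:end]
    # Assumption: a BglII site remains on each side of the insert.
    if payload.startswith(site):
        payload = payload[len(site):]
    if payload.endswith(site):
        payload = payload[:-len(site)]
    return payload
```

The returned string can then be read three bases at a time against Table 1 to recover the stored phrase. In this sketch, a read in the opposite orientation would first have to be reverse complemented.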
[0041] Although the invention has been described with respect to
specific preferred embodiments, many variations and modifications
may become apparent to those skilled in the art. It is therefore
the intention that the appended claims be interpreted as broadly as
possible in view of the prior art to include all such variations
and modifications.
Sequence CWU 1
1
Number of SEQ ID NOS: 48
SEQ ID NO. | Length | Type | Organism | Other Information | Sequence |
1 | 14 | DNA | Artificial Sequence | Randomly-generated sequence for presentation in FIG. 1 for illustration purposes | atttaggctc gaat |
2 | 20 | DNA | Artificial Sequence | Synthesized Oligo A | aaggtaggta ggttagttag |
3 | 20 | DNA | Artificial Sequence | Synthesized Oligo B | aggtttggtg gtatagttag |
4 | 20 | DNA | Artificial Sequence | Synthesized Oligo C | ataggagtgt gtgtagttag |
5 | 20 | DNA | Artificial Sequence | Synthesized Oligo D | atattagagg gggtagttag |
6 | 20 | DNA | Artificial Sequence | Synthesized Oligo E | ggagtagtgt gtatagttag |
7 | 20 | DNA | Artificial Sequence | Synthesized Oligo F | gggagtatgt agttagttag |
8 | 20 | DNA | Artificial Sequence | Synthesized Oligo G | ggttagatga gtgtagttag |
9 | 20 | DNA | Artificial Sequence | Synthesized Oligo O | taagggatgt gtgtagttag |
10 | 20 | DNA | Artificial Sequence | Synthesized Oligo P | tagaggaggg atatagttag |
11 | 20 | DNA | Artificial Sequence | Synthesized Oligo Q | tagatgggag gtatagttag |
12 | 20 | DNA | Artificial Sequence | Synthesized Oligo R | taggagagat gtgtagttag |
13 | 20 | DNA | Artificial Sequence | Synthesized Oligo S | tatagggagg gtatagttag |
14 | 20 | DNA | Artificial Sequence | Synthesized Oligo T | tgtgggatag tgatagttag |
15 | 20 | DNA | Artificial Sequence | Synthesized Oligo H | agagtagtga ggatagttag |
16 | 20 | DNA | Artificial Sequence | Synthesized Oligo I | ataagtagtg gggtagttag |
17 | 20 | DNA | Artificial Sequence | Synthesized Oligo J | ataggggtat ggatagttag |
18 | 20 | DNA | Artificial Sequence | Synthesized Oligo K | atgggtggat tgatagttag |
19 | 20 | DNA | Artificial Sequence | Synthesized Oligo L | gggaatagag tgttagttag |
20 | 20 | DNA | Artificial Sequence | Synthesized Oligo M | gggatgattg gtttagttag |
21 | 20 | DNA | Artificial Sequence | Synthesized Oligo N | gtatgggaat ggttagttag |
22 | 20 | DNA | Artificial Sequence | Synthesized Oligo Z | tagagagagt gtgtagttag |
23 | 20 | DNA | Artificial Sequence | Synthesized Oligo U | tagagtggtg tgttagttag |
24 | 20 | DNA | Artificial Sequence | Synthesized Oligo V | tagattggat gggtagttag |
25 | 20 | DNA | Artificial Sequence | Synthesized Oligo W | tagggttggt agttagttag |
26 | 20 | DNA | Artificial Sequence | Synthesized Oligo X | tatagggtag ggttagttag |
27 | 46 | DNA | Artificial Sequence | Complementary Oligo A | agagtagtga ggatagttag agatctctct aatcacacac atctca |
28 | 46 | DNA | Artificial Sequence | Complementary Oligo B | tgagatgtgt gtgattagag agatctctaa ctatcctcac tactct |
29 | 20 | DNA | Artificial Sequence | Arbitrarily Chosen 20-mer Tag A | agagtagtga ggatagttag |
30 | 20 | DNA | Artificial Sequence | Arbitrarily Chosen 20-mer Tag B | tgagatgtgt gtgattagag |
31 | 57 | DNA | Artificial Sequence | Line 1 Encoded Information | aacgcaaggg cagaacgacg tcccatcgca cgaattgcac tcatgaggcg tctagcg |
32 | 57 | DNA | Artificial Sequence | Line 2 Encoded Information | aaggcaaggc ctatcgcaag ggcacaacga cccatcatgc ctgcactact gcctgcg |
33 | 57 | DNA | Artificial Sequence | Line 3 Encoded Information | aatgcaaggc ctatcgcaag ggcactaccg cagcccatgg caccgatgag gcctcta |
34 | 69 | DNA | Artificial Sequence | Line 4 Encoded Information | acagcaaggc ctatcgcaag ggcagaacga cgtcccatcg cacgaattgc aattatgagg cgtctagcg |
35 | 78 | DNA | Artificial Sequence | Line 5 Encoded Information | accgcaaggc ctatcgcact ccacatggca cgaataatga ggcctctagc aaggcgtatg gcagaacaga tcatggcg |
36 | 75 | DNA | Artificial Sequence | Line 6 Encoded Information | acggcaattc gtcagatgcc tatcctacac cagcgcgcac tccgagcaat gcttatgcgt gagcgaccta tggcg |
37 | 75 | DNA | Artificial Sequence | Line 7 Encoded Information | actgcacagc tcgcacagct agcactccag ccgatggcag aaatggtccg tatggcaagg gaaaggcgta tggga |
38 | 81 | DNA | Artificial Sequence | Line 8 Encoded Information | agagcacagc tcgtcctagc aagggcacta ccgaggcccc ccgcactacc gaggcccccc gcagaacgac gtcccatcgg a |
39 | 93 | DNA | Artificial Sequence | Line 9 Encoded Information | agcgcacagc tcgtcctagc aagggcacta ccgaggcccc ccgcagaacg acgtcccatc gcaaggattc tcatgcgtgc aaggcccccc gcg |
40 | 72 | DNA | Artificial Sequence | Line 10 Encoded Information | aacaaagcac agctcgtcct agcaagggca gaacgacgtc ccatcgcacg aattgcacac cgacgcatgc ta |
41 | 84 | DNA | Artificial Sequence | Line 11 Encoded Information | aacaacgcac agctcgtcct agcaagggca gaacgacgtc ccatcgcacg aattgcaccc aggctgcaac acctcatgcg tgcg |
42 | 78 | DNA | Artificial Sequence | Line 12 Encoded Information | aacaaggcac agctcgtcct agcactaccg aggccccccg cactaccgag gccccccgca gaacgacgtc ccatcgga |
43 | 75 | DNA | Artificial Sequence | Encoded Information Depicted in FIG. 4 in the Context of a Plasmid Vector | gcaaggccta tcgcactcca catggcacga ataatgaggc ctctagcaag gcgtatggca gaacagatca tggcg |
44 | 75 | DNA | Artificial Sequence | Line 13 Encoded Information | aacaatgcac tccacatgcg tatggcacag ctagcacatc tgctactcgc acgacctatg gcaccgcgac gacct |
45 | 99 | DNA | Artificial Sequence | Line 14 Encoded Information | aacacagcac tccacatgcg tatggtccta gcactacgag caccgctgat acacgcactc cacaggctcg cagaaatggc actacacagg cgtatggcg |
46 | 99 | DNA | Artificial Sequence | Line 15 Encoded Information | aacaccgcac tccaccgact gcaacacgca ctccacatgg caccgcgact gcctctcagg cagcctctag caaggcgtat ggcacaccag caacacgcg |
47 | 24 | DNA | Artificial Sequence | M13 Reverse Primer (PCR Amplification Primer) | tgagcggata acaatttcac acag |
48 | 22 | DNA | Artificial Sequence | M13 Sequence Primer (PCR Amplification Primer) | gttttcccca gtcacgacgt tg |
* * * * *