U.S. patent application number 11/463713 was filed with the patent office on 2006-08-10 and published on 2007-05-17 for method of storing and retrieving miniaturised data.
This patent application is currently assigned to Zentronix Pty Ltd. Invention is credited to John Archbold, Graziano Mele.
Application Number | 20070112721 11/463713 |
Family ID | 37873090 |
Filed Date | 2006-08-10 |
United States Patent Application | 20070112721 |
Kind Code | A1 |
Archbold; John; et al. | May 17, 2007 |
METHOD OF STORING AND RETRIEVING MINIATURISED DATA
Abstract
A method of storing data including the steps of providing a
first index of first location identifiers, a second index of second
location identifiers and a dictionary data base of data items,
wherein the first location identifiers are adapted to identify the
location of second location identifiers in the second index and the
second location identifiers are adapted to identify the location of
data items in the dictionary data base, receiving data and
separating the data into a plurality of data items and storing the
data items in a main data base, whereby at least one of the data
items is stored in the main data base as at least one first
location identifier, which identifies at least one second location
identifier, which identifies the or each data item in the
dictionary data base.
Inventors: | Archbold; John; (Hampton, AU); Mele; Graziano; (Southport, AU) |
Correspondence Address: |
TOWNSEND AND TOWNSEND AND CREW, LLP
TWO EMBARCADERO CENTER
EIGHTH FLOOR
SAN FRANCISCO
CA
94111-3834
US
|
Assignee: |
Zentronix Pty Ltd.
Hampton
AU
|
Family ID: |
37873090 |
Appl. No.: |
11/463713 |
Filed: |
August 10, 2006 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10168439 | Oct 11, 2002 | |
PCT/AU00/01594 | Dec 21, 2000 | |
11463713 | Aug 10, 2006 | |
Current U.S. Class: | 1/1; 707/999.001; 707/E17.038 |
Current CPC Class: | Y10S 707/99934 20130101; Y10S 707/99942 20130101; H03M 7/3084 20130101; Y10S 707/99943 20130101; G06F 16/902 20190101 |
Class at Publication: | 707/001 |
International Class: | G06F 17/30 20060101 G06F017/30 |
Foreign Application Data
Date | Code | Application Number |
Dec 23, 1999 | AU | PQ 4865 |
Claims
1-12. (canceled)
13. A system for storing data, the system including at least one
dictionary data base and at least two index data bases wherein the
dictionary data base comprises a plurality of data items, a first
one of the index data bases comprising a plurality of data item
location identifiers, which respectively identify the location of
at least one data item in the dictionary data base and a second one
of the index data bases including a plurality of first location
identifiers which respectively identify the location of at least
one data item location identifier in the first index data base and
wherein the system includes a processing means which is adapted to
receive data including data items and to store the data in a
compressed form by storing in place of each data item occurring in
the dictionary data base, each corresponding first location
identifier, whereby each data item occurring in the dictionary data
base can be retrieved by referencing the data item location
identifier identified by the first location identifier.
14. The system as claimed in claim 13 wherein the at least two
index data bases include separate lists of location identifiers in
one or more other data bases.
15. The system as claimed in claim 14 including a storage medium
having a sequence of instructions adapted to control a data
processor to set up the system.
16. The system as claimed in claim 15 wherein the first index data
base is part of the dictionary data base.
17. The system as claimed in claim 14 including one or more
additional index data bases each with location identifiers which
identify the location of another location identifier of another
index data base.
18. The system as claimed in claim 17 including a main data base
which is adapted to store a stream of data as a combination of data
items which are not represented in the dictionary data base and
first location identifiers.
19. The system as claimed in claim 18 wherein the stream of data
stored in the main data base may have data items and first location
identifiers which are stored in an order determined by a further
index data base and a reprocessing means which is adapted to
control the ordering of data in the main data base with reference
to the further index data base.
20. The system as claimed in claim 19 wherein the dictionary data
base has data items stored in a predetermined order which is
determined by how frequently each data item stored therein is
expected to occur in a data stream of data items.
21. The system as claimed in claim 20 wherein the most common data
items have a location in the dictionary data base that is
identified by a dictionary data base location identifier having
minimal bytes compared to an uncommon data item.
22. The system as claimed in claim 21 wherein the dictionary data
base index comprises dictionary data base location identifiers
arranged sequentially from lowest number to highest number of bytes
required to define them.
23. The system as claimed in claim 22 wherein each first location
identifier comprises a pointer having a number which identifies a
position of one data item location identifier in the dictionary
data base index.
24. The system as claimed in claim 23 wherein each data item
location identifier comprises a pointer having a number which
identifies the position of one data item in the dictionary data
base.
25. The system as claimed in claim 24 wherein the dictionary data
base is divided into different sections which have data items with
locations which are identified by data item location identifiers
from different dictionary data base indexes.
26. The system as claimed in claim 25 wherein the dictionary data
base includes storage space into which data items can be added.
27. A computer program comprising instructions which are adapted to
control a computer to provide at least one dictionary data base
and at least two index data bases, the computer program including
instructions to control the computer to provide the dictionary data
base with a plurality of data items, controlling the computer to
provide a first one of the index data bases with a plurality of
data item location identifiers, which respectively identify the
location of at least one data item in the dictionary data base and
controlling the computer to provide a second one of the index data
bases with a plurality of first location identifiers which
respectively identify the location of at least one data item
location identifier in the first index data base, controlling the
computer to receive data including data items and to store the data
in a compressed form by storing in place of each data item which
occurs in the dictionary data base, each corresponding first
location identifier, whereby each data item which occurs in the
dictionary data base can be retrieved by referencing the data item
location identifier identified by the first location
identifier.
28. The computer program as claimed in claim 27 wherein the at
least two index data bases include separate lists of location
identifiers in at least one other data base.
29. The computer program as claimed in claim 28 wherein the first
index data base is part of the dictionary data base.
30. The computer program as claimed in claim 29 including an
exception means for storing data items which do not occur in the
dictionary data base.
31. The computer program as claimed in claim 30 wherein the
exception means includes a predetermined part of the dictionary
data base.
32. The computer program as claimed in claim 31 wherein the
exception means is adapted to provide a dictionary index location
identifier for any new data items stored in the dictionary data
base.
33. The computer program as claimed in claim 32 wherein the
exception means includes an exceptions data base which is adapted
to store data items which do not occur in the dictionary data
base.
34. The computer program as claimed in claim 34 including a means
for storing different types of data items in different dictionary
data bases.
35. The computer program as claimed in claim 24 wherein each
dictionary data base has predetermined dictionary location
identifiers in the first one of the index data bases, which provide
the location of data items in that data base.
36. The computer program as claimed in claim 26 wherein each
dictionary data base is split into a plurality of data types each
having a plurality of data items of that type.
37. (canceled)
38. A system for retrieving data items stored in a miniaturised
form, the system including at least one dictionary data base and at
least two index data bases wherein the dictionary data base
comprises a plurality of data items and a first one of the index
data bases comprises a plurality of data item location identifiers
which respectively identify the location of at least one data item
in the dictionary data base and a second one of the index data bases
includes a plurality of first location identifiers which
respectively identify the locations of at least one data item
location identifier in the first index data base and a processing
means, wherein the processing means is adapted to receive a first
data stream including a plurality of first location identifiers and
produce a second data stream including the data items without the
first location identifiers and wherein first location identifiers
are replaced by corresponding data items.
39. The system as claimed in claim 38 wherein the first data stream
is adapted to be received by reading a data base.
40. The system as claimed in claim 39 wherein the data base is
located in a storage medium which is readable by a computer
hardware device.
41. The system as claimed in claim 40 wherein the data stream is
transmitted and received from a communications system.
42. The system as claimed in claim 41 wherein the first data stream
is received and stored in a data processor before being read and
compressed or decompressed.
43. The system as claimed in claim 42 including a scrambling means
for reordering data item location identifiers in the first index
data base and storing a method of reordering data item location
identifiers utilised by the scrambling means, whereby reordering
data item location identifiers can be reversed.
44. The system as claimed in claim 43, wherein the scrambling means
includes reversing means for reversing any reordering which has
taken place of the dictionary item location identifiers.
Description
CROSS REFERENCES TO RELATED APPLICATIONS
[0001] This application is a divisional of U.S. patent application
Ser. No. 10/168,439, filed Dec. 21, 2000, which claims domestic
priority from PCT/AU00/01594 filed Dec. 21, 2000 which claims
foreign priority from Australian Patent Application No. PQ 4865
filed Dec. 23, 1999, each of which is hereby incorporated by
reference, as if set forth in full in this document, for all
purposes.
FIELD OF THE INVENTION
[0002] The present invention relates primarily, although not
exclusively, to techniques used for storing electronic data in a
form which requires less storage space.
BACKGROUND OF THE INVENTION
[0003] A typical method for compressing data involves the use of a
dictionary data base which lists commonly occurring data and
replaces this commonly occurring data with a coded "token" which
effectively represents that data using a reduced number of data
bits.
[0004] Whenever an item of data occurs repeatedly this data item is
replaced by its equivalent "token" and accordingly that data item
is stored in a compressed form.
[0005] When data is stored in the compressed form, by using a
look-up table each token can be replaced by its equivalent data
item so that the original data can be reformed.
[0006] The above conventional compression technique has a number of
drawbacks. One is that the number of data bits required to represent
a token can itself be significant, with the result that significant
storage space is required to store each token. In addition, searching
a data base which includes tokens can be quite cumbersome because
tokens need to be reconverted to their original data items before a
search of the data items can be properly conducted.
SUMMARY OF THE INVENTION
[0007] The present invention provides an alternative to existing
methods of storing data in a miniaturised form and extends to
methods for encrypting data as well as systems for implementing the
method, computer programs and storage medium for storing electronic
data which is able to implement the method and system.
BRIEF DESCRIPTION OF THE DRAWING
[0008] FIG. 1 illustrates first location identifiers and second
location identifiers and other aspects of an embodiment of the
present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0009] According to the present invention there is provided a
method of storing data including the steps of providing a first
index of first location identifiers, a second index of second
location identifiers and a dictionary data base of data items,
wherein the first location identifiers are adapted to identify the
location of second location identifiers in the second index and the
second location identifiers are adapted to identify the location of
data items in the dictionary data base, receiving data and
separating the data into a plurality of data items and storing the
data items in a main data base, whereby at least one of the data
items is stored in the main data base as at least one first
location identifier, which identifies at least one second location
identifier, which identifies the or each data item in the
dictionary data base.
[0010] According to another aspect of the present invention there
is provided a method of retrieving data stored in a miniaturised
form in a main data base, including the steps of accessing the main
data base, retrieving one or more items of data including at least
one first location identifier from the main data base, using the
first location identifier to access and retrieve the location of a
second location identifier identified in the first index by the
first location identifier, accessing and retrieving from the second
location identifier in the second index the location of an item of
data in a dictionary data base.
[0011] It is preferred that the method of storing data includes the
step of searching the dictionary data base for at least one data
item and replacing the data item with one first location identifier
which indicates the location of one second location identifier in the
second index, which second location identifier indicates the
location of the data item in the dictionary data base.
[0012] It is preferred that the method includes the step of
searching the dictionary data base for each data item and
identifying if the data item occurs in the dictionary data base and
if the data item occurs in the dictionary data base, retrieving the
second location identifier in the second index that identifies the
location of the data item in the dictionary data base, retrieving
the first location identifier in the first index which identifies
the location of the second location identifier in the second index
and storing the first location identifier in a main data base in
place of the data item.
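The search-and-replace step described above can be sketched as
follows. This is a minimal illustration only, not the applicants'
implementation: the names dictionary, second_index and store are
hypothetical, the sample entries are invented, and a first location
identifier is modelled simply as an integer position in the second
index, so the first index (the list of those identifiers) is left
implicit.

```python
# Illustrative data only; none of these values come from the application.
dictionary = ["install", "the", "frontpage"]  # dictionary data base of data items
second_index = [1, 2, 0]  # second location identifiers: positions in `dictionary`


def store(items):
    """Return a main data base with known items replaced by first location identifiers."""
    main_db = []
    for item in items:
        if item in dictionary:
            dict_pos = dictionary.index(item)        # location of the item in the dictionary
            first_id = second_index.index(dict_pos)  # location of that identifier in the second index
            main_db.append(first_id)
        else:
            main_db.append(item)                     # not in the dictionary: kept as-is
    return main_db


main_db = store(["the", "frontpage", "install", "the"])
```

Repeated items such as "the" map to the same first location
identifier, which is where the saving in storage is expected to come
from.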
[0013] It is preferred that the data item includes a string of
data, a field of data or other group of data that can represent
information in a predetermined format.
[0014] The or each data item preferably represents a stream of data
which represents information which can be searched.
[0015] Each data item preferably represents a name, initial,
address, phone number or other words or numbers or initials or
characters or character strings or number strings.
[0016] Each first location identifier preferably includes a pointer
to the second index.
[0017] Each second location identifier may include a pointer to the
dictionary data base.
[0018] The first index may comprise a plurality of pointers.
[0019] Preferably the second index comprises a plurality of
pointers.
[0020] The first index may comprise a sequential list of
pointers.
[0021] The second index may comprise a sequential list of
pointers.
[0022] The dictionary data base preferably includes a plurality of
data bases each with unique addresses which are represented by the
location identifiers.
[0023] Each index may include a plurality of sub-indexes.
[0024] Preferably each second index is divided into different
sections representing locations of predetermined types of data
items.
[0025] Each first index is preferably divided into different
sections representing the location of second location identifiers
associated with predetermined types of data items.
[0026] The method may include providing additional indexes with
additional location identifiers.
[0027] According to another aspect of the present invention there
is provided a system for storing data, the system including at
least one dictionary data base and at least two index data bases
wherein the dictionary data base comprises a plurality of data
items, a first one of the index data bases comprising a plurality
of data item location identifiers, which respectively identify the
location of at least one data item in the dictionary data base and
a second one of the index data bases including a plurality of first
location identifiers which respectively identify the location of at
least one data item location identifier in the first index data
base, and wherein the system includes a processing means which is
adapted to receive data including data items and to store the data
in a compressed form by storing in place of each data item
occurring in the dictionary data base, each corresponding first
location identifier, whereby each data item occurring in the
dictionary data base can be retrieved by referencing the data item
location identifier identified by the first location
identifier.
[0028] Preferably the at least two index databases include separate
lists of location identifiers in a common data base or which are
part of other data bases.
[0029] Preferably there is provided a storage medium including a
sequence of instructions adapted to control a data processor to set
up the system.
[0030] The first index data base may be part of the dictionary data
base.
[0031] The system may include one or more additional index data
bases each with location identifiers which identify the location of
another location identifier of another index data base.
[0032] It is preferred that the system includes a main data base
which is adapted to store a stream of data as a combination of data
items which are not represented in the dictionary data base and
first location identifiers.
[0033] According to one embodiment the stream of data stored in the
main data base may have data items and first location identifiers
which are stored in an order determined by a further index data
base and a reprocessing means which is adapted to control the
ordering of data in the main data base with reference to the
further index data base.
[0034] Preferably the dictionary data base has data items stored in
a predetermined order which is determined by how commonly or
frequently each data item stored therein is expected to occur in a
data stream of data items.
[0035] It is preferred that the most common data items have a
location in the dictionary data base that is identified by a
dictionary data base location identifier having minimal bits
compared to an uncommon data item.
[0036] Preferably the dictionary data base index comprises
dictionary data base location identifiers arranged sequentially
from lowest number to highest number of bits required to define
them.
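One way to picture the frequency ordering and the resulting
identifier sizes is the sketch below. It assumes that expected
frequency is estimated by counting items in a sample stream, which
the application does not specify; build_dictionary and
bits_for_position are hypothetical helper names.

```python
from collections import Counter


def build_dictionary(sample_items):
    """Order distinct items so the most frequent ones get the smallest positions."""
    counts = Counter(sample_items)
    # Descending frequency, then alphabetical so the order is deterministic.
    return sorted(counts, key=lambda item: (-counts[item], item))


def bits_for_position(position):
    """Minimum number of bits needed to write a non-negative position."""
    return max(1, position.bit_length())


sample = ["the", "web", "the", "page", "the", "web"]
dictionary = build_dictionary(sample)
```

Because common items sit at small positions, their location
identifiers need fewer bits than those of items further down the
list.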
[0037] Each first location identifier may comprise a pointer having
a number which identifies a position of one data item location
identifier in the dictionary data base index.
[0038] Preferably each data item location identifier comprises a
pointer having a number which identifies the position of one data
item in the dictionary data base.
[0039] The dictionary data base may be divided into different
sections which have data items with locations which are identified
by data item location identifiers from different dictionary data
base indexes.
[0040] The dictionary data base preferably includes storage space
into which data items can be added.
[0041] According to another aspect of the present invention there
is provided a computer program which is adapted to control a
computer to provide at least one dictionary data base and at least
two index data bases, wherein the dictionary data base comprises a
plurality of data items, a first one of the index data bases
comprises a plurality of data item location identifiers, which
respectively identify the location of at least one data item in the
dictionary data base, and a second one of the index data bases
includes a plurality of first location identifiers which
respectively identify the location of at least one data item
location identifier in the first index data base, and wherein the
computer program includes instructions to control the computer to
receive data including data items and to store the data in a
compressed form by storing in place of each data item which occurs
in the dictionary data base, each corresponding first location
identifier, whereby each data item which occurs in the dictionary
data base can be retrieved by referencing the data item location
identifier identified by the first location identifier.
[0042] Preferably the at least two index data bases include
separate lists of location identifiers in a common data base or
which are part of other data bases.
[0043] The first index data base may be part of the dictionary data
base.
[0044] It is preferred that the computer program includes an
exception means for storing data items which do not occur in the
dictionary data base.
[0045] The exception means preferably includes a predetermined part
of the dictionary data base.
[0046] The exception means may be adapted to provide a dictionary
index location identifier for any new data item stored in the
dictionary data base.
[0047] According to another embodiment of the present invention the
exception means includes an exceptions data base which is adapted
to store data items which do not occur in the dictionary data
base.
[0048] The computer program may include a means for storing
different types of data items in different dictionary data
bases.
[0049] Each dictionary data base may have predetermined dictionary
location identifiers in the first one of the index data bases,
which provide the location of data items in that data base.
[0050] Each dictionary data base may be split into a plurality of
data types or fields each having a plurality of data items of that
type or field.
[0051] According to another aspect of the present invention there
is provided a storage medium having computer software stored
thereon which is adapted to control a computer to set up a system
according to any one of the previously described embodiments of the
invention.
[0052] According to another aspect of the present invention there
is provided a system for retrieving data items stored in a
miniaturised form, the system including at least one dictionary
data base and at least two index data bases wherein the dictionary
data base comprises a plurality of data items and a first one of
the index data bases comprises a plurality of data item location
identifiers which respectively identify the location of at least
one data item in the dictionary data base and a second one of the
index data bases includes a plurality of first location identifiers
which respectively identify the location of at least one data item
location identifier in the first index data base and a processing
means, wherein the processing means is adapted to receive a first
data stream including a plurality of first location identifiers and
produce a second data stream including the data items without first
location identifiers, and wherein first location identifiers are
replaced by corresponding data items.
[0053] It is preferred that the first data stream is adapted to be
received by reading a data base.
[0054] The data base may be in a storage medium which is readable
by a computer hardware device.
[0055] The data base may be stored in a computer memory.
[0056] The data stream preferably is transmitted and received from
a communication system.
[0057] The first data stream may be received and stored in a data
processor before being read and compressed or decompressed.
[0058] According to another aspect of the present invention there
is provided a system which includes the system for compressing data
and the system for retrieving data.
[0059] According to another embodiment of the present invention
there is provided a storage medium having a computer program stored
thereon which is adapted to control a computer to set up/implement
the combined system.
[0060] According to another aspect of the present invention there
is provided a method of encrypting data using the system for
compressing data.
[0061] According to one embodiment of the method of encrypting
data, the location of data items may be changed by using a coding
means for changing the data item location identifiers in a
reconvertible manner.
[0062] It is preferred that the data item location identifiers are
able to be reordered so that the first location identifiers
identify different data item location identifiers to those before
reordering.
[0063] According to another embodiment of the present invention any
one of the systems includes a scrambling means for reordering data
item location identifiers in the first index data base and for storing
the method of reordering whereby the reordering can be
reversed.
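A reversible reordering of this kind can be sketched as a stored
permutation. The sketch is an assumption about how the "method of
reordering" might be recorded; scramble and unscramble are
hypothetical names, and the seeded shuffle stands in for whatever
ordering technique an implementation would actually use.

```python
import random


def scramble(index, seed):
    """Return a reordered copy of the index and the permutation that produced it."""
    order = list(range(len(index)))
    # random.Random is used for illustration only; it is not a cryptographic generator.
    random.Random(seed).shuffle(order)
    scrambled = [index[i] for i in order]
    return scrambled, order


def unscramble(scrambled, order):
    """Undo scramble() using the stored permutation."""
    original = [None] * len(scrambled)
    for new_pos, old_pos in enumerate(order):
        original[old_pos] = scrambled[new_pos]
    return original


index = [10, 20, 30, 40, 50]
scrambled, order = scramble(index, seed=7)
restored = unscramble(scrambled, order)
```

After scrambling, a given first location identifier lands on a
different data item location identifier unless the stored
permutation is applied in reverse.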
[0064] According to another aspect of the present invention there
is provided a method of decryption using the system for retrieving
data.
[0065] It is preferred that the method of decryption includes the
scrambling means for reversing any reordering which has taken place
of the dictionary item location identifiers.
[0066] According to another embodiment of the present invention the
method of decryption includes a descrambling means which includes
means for reversing any reordering of dictionary item location
identifiers in the first index data base.
[0067] According to another embodiment of the present invention
there is provided a method of encrypting and decrypting data which
incorporates the combined system for compressing and retrieving
data, the method also including the step of at predetermined times
using a scrambling means to reorder dictionary item location
identifiers in accordance with a predetermined ordering technique
which is stored or able to be stored and received by a descrambling
means at a receiving end of the system.
[0068] A preferred embodiment of the present invention will now be
described by way of example only with reference to HTML script or
text.
[0069] As an example the following HTML text will be minimised in
accordance with the preferred embodiment of the invention:
[0070] The Frontpage install <!-webbot="PurpleText"
preview="This page is created in the root directory of your
FrontPage when FrontPage is installed. It contains information that
allows users to edit pages in your web using the Microsoft Web
Publishing Wizard or programs which use the Microsoft Web
Publishing Wizard such as FrontPad using the same username and
password they would use if they were authorising with Microsoft
FrontPage. If you do not want to allow users to edit files on this
web using tools other than Microsoft FrontPage, you can delete this
file".
[0071] The above text can be split into a number of groups which
for convenience will be referred to as data items. Thus the word
"the" constitutes one data item, the word "frontpage" constitutes
another data item and so on for the word "install", "<!-webbot
bot=" and "PurpleText". In a typical situation each of the above
data items would be stored in a data base and the space required to
store this data is accordingly no longer available to store other
data.
[0072] Using the miniaturisation technique in accordance with the
present invention two indexing lists are set up as shown in FIG.
1.
[0073] A first list 11 is set up which is effectively a data base
of pointers.
[0074] For convenience only some of the pointers are shown, being
those pointers required to identify text which is stored relating
to the sample of HTML text referred to above.
[0075] The first list 11 is generated by analysing the repetitive
structure of HTML text and script that exists as documents or data
transfer streams. This list has common HTML text type documents.
All items of a repetitive nature that can be identified exist in
this list. This text list could be a superset of other common
lists, for example, the English language list or the French
language list.
[0076] A second list 12 contains a dictionary of the HTML text
which is to be miniaturised. Each data item is located at a
specific position in the list 12 and this position is identified by
a number which is pointed to by a pointer from list 11.
[0077] The list 12 is effectively a dictionary data base which is
generated by coding the entries in the HTML text and script
list.
[0078] The 128 most common items are located first in the list and
are assigned first level representation (typically 8 bits) in
alphabetic sequence. The rest of the list is organised
alphabetically and is assigned the minimum number of bits to
uniquely identify the location of the original data in the list
11.
[0079] As an example, if the total number of data items (e.g.
characters) in the first list is 29,456 then 15 bits (0 to 32768)
would be needed to represent the unique location of the start of a
particular data item. The number of unique entries is then
calculated. If, for example, there are 3,128 unique entries in the
list 11, then 12 bits (0 to 4096) will be required to identify the
unique data items in the list.
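The bit counts in this example follow from taking the smallest
power of two that covers the number of entries. A short check, using
a hypothetical helper named bits_needed:

```python
def bits_needed(count):
    """Smallest number of bits that can give `count` entries distinct codes."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return max(1, (count - 1).bit_length())


location_bits = bits_needed(29456)  # total data items in the example
unique_bits = bits_needed(3128)     # unique entries in the example
```

With the figures quoted above this gives 15 bits and 12 bits
respectively.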
[0080] It follows from the above that by setting up the first list
11 a reduced number of pointers are required to represent the data
items in the second list, because data items that are repeated do
not need to have an associated pointer.
[0081] Accordingly if a data item occurs 1,000 times in the second
list or dictionary data base, a single pointer is all that is
required in the first list 11 and accordingly the single pointer is
all that needs to be stored in a general data base 13.
[0082] Thus referring back to the example of HTML text given above,
the word "the" is the first data item which is to be stored in the
general data base 13. Because the word "the" is a common word, it
therefore occurs in the most common section of the second list 12
and may be located at position 3406. The corresponding pointer from
the first list 11 may be located at position 8A. Accordingly the
word "the" does not need to be stored in the general data base 13
nor does the second list pointer 3406. Instead the first list
pointer 8A can be stored in the general data base 13 and this
obviously has a lower number of bits required to describe it and
accordingly requires less space for storage.
[0083] The next word in the HTML text is "Frontpage" which is not
as common as the word "the", but does exist many times in normal
HTML text. It therefore is located in the less common section of
the dictionary list 12 at a location 23456. In the first list 11
location 23456 is represented by pointer 2408.
[0084] It follows therefore that pointer 2408 is placed in the
general data base 13 straight after pointer 8A. The word "install"
is the next data item in the HTML text and is an uncommon word
which is located at position 26578. The corresponding pointer in
first list 11 is located at position 2458. Accordingly this pointer
2458 is stored in the general data base 13 after pointer 2408.
[0085] Finally the script string "<!-webbot bot=" is a very
common HTML script command and is therefore located at position
4987 in the second list. This location 4987 is represented by
pointer 8F in the first list 11 and accordingly is located in the
general data base 13 instead of the script "<!-webbot bot=".
[0086] The word "PurpleText" is not common in either HTML script or
text and therefore does not occur in the dictionary list 12. As a
result this word is represented by an exception flag "00" in the
general data base 13 and has no associated pointer. Similarly any
other script or text which is not represented in the dictionary
data base 12, is also classified as an exception and is copied
verbatim into the general data base 13.
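By way of illustration only, the storing steps of the worked example
may be sketched as follows. The pointer values and dictionary positions
are those given in the example; the table names, the encode function
and the representation of the general data base as a list of (pointer,
verbatim text) pairs are assumptions made for the sketch, since the
specification does not state how an exception is delimited.

```python
# Second list 12 (dictionary): position -> data item, as in the example.
SECOND_LIST = {
    3406: "the",
    23456: "Frontpage",
    26578: "install",
    4987: "<!-webbot bot=",
}
# First list 11: short pointer -> position in the second list.
FIRST_LIST = {"8A": 3406, "2408": 23456, "2458": 26578, "8F": 4987}
EXCEPTION_FLAG = "00"  # marks an item that is absent from the dictionary

# Inverse tables, built once, so that storing is a pair of look-ups.
ITEM_TO_POSITION = {item: pos for pos, item in SECOND_LIST.items()}
POSITION_TO_POINTER = {pos: ptr for ptr, pos in FIRST_LIST.items()}


def encode(items):
    """Replace each known item by its first-list pointer.

    Unknown items are kept verbatim behind the exception flag.
    """
    stored = []
    for item in items:
        position = ITEM_TO_POSITION.get(item)
        if position is None:
            stored.append((EXCEPTION_FLAG, item))
        else:
            stored.append((POSITION_TO_POINTER[position], None))
    return stored


general_data_base = encode(
    ["the", "Frontpage", "install", "<!-webbot bot=", "PurpleText"]
)
```

In this sketch a repeated data item always yields the same pointer, so
only the short pointer is written to the general data base however
often the item recurs.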
[0087] Reconstruction of the original data represented in the
general data base is simply achieved by using a reverse look-up
algorithm.
[0088] Thus if the pointer 8A is read, a look-up algorithm is used
to access the first list 11 which gives the location of the
corresponding data item at location 3406 in the second list 12.
[0089] At location 3406 the word "the" is located and this word is
then retrieved and substituted for the pointer 8A.
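By way of illustration only, the reverse look-up may be sketched as
follows. The pointer values and positions are taken from the example;
the decode function and the (pointer, verbatim text) pair layout are
assumptions of the sketch rather than features of the specification.

```python
# First list 11: pointer -> position in the second list (from the example).
FIRST_LIST = {"8A": 3406, "2408": 23456, "2458": 26578, "8F": 4987}
# Second list 12: position -> data item.
SECOND_LIST = {
    3406: "the",
    23456: "Frontpage",
    26578: "install",
    4987: "<!-webbot bot=",
}
EXCEPTION_FLAG = "00"  # the accompanying text was stored verbatim


def decode(stored):
    """Rebuild the original items from (pointer, verbatim) pairs."""
    items = []
    for pointer, verbatim in stored:
        if pointer == EXCEPTION_FLAG:
            # Exceptions were never indexed, so the text is copied back.
            items.append(verbatim)
        else:
            position = FIRST_LIST[pointer]       # first look-up
            items.append(SECOND_LIST[position])  # second look-up
    return items


restored = decode([("8A", None), ("00", "PurpleText"), ("8F", None)])
```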
[0090] The above example discloses what is in effect a double index
technique, utilising two pointers. However the present invention
may equally be applicable to any number of indexes and pointers,
depending on the data which is to be miniaturised. Thus one
application would be in miniaturising data located in telephone
white pages. In such a situation a number of dictionary lists would
be required, such as a names list, a streets list and a locations
list.
[0091] Each of these lists would have its own separate first and
second list pointers using the examples outlined above.
Furthermore, each list could have an associated list which would
also require a double index pointer system.
[0092] Thus a streets list having the names of various streets may
also require a sub-list of street types such as "ST", "PL", "CR"
etc.
[0093] According to another example image data may be represented
by multi-level indexing techniques. Thus the first level may be the
fact that the area is black, the second level may indicate the
shape, the third level may indicate the size. Similarly the levels
may relate to further deconstruction of the original data.
[0094] Clearly the above compression technique is not limited to
text based data, but is also able to be used in connection with
foreign languages, foreign character sets (e.g. Arabic and
Chinese), music and speech phonemes. The only requirement is that
the data has a repetitive nature that can be analysed and
represented as uniquely coded and identifiable items.
[0095] An important advantage of the miniaturisation technique
which is described above lies with the ability to search data items
in their miniaturised format. Thus instead of searching for the word
"the" in the preferred embodiment given above, a search could be
conducted for the pointer 8A. This is in contrast to conventional
searching techniques of compressed text, where it is necessary to
continually convert and reconvert text in order to complete the
search.
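By way of illustration only, a search conducted directly on the
miniaturised data may be sketched as follows. The search term is
converted to its pointer once and the stored pointers are then compared
without being expanded. The function find_positions, the sample
contents of the general data base and the (pointer, verbatim text) pair
layout are assumptions of the sketch.

```python
# Tables from the worked example: pointer -> position -> data item.
FIRST_LIST = {"8A": 3406, "2408": 23456, "2458": 26578, "8F": 4987}
SECOND_LIST = {
    3406: "the",
    23456: "Frontpage",
    26578: "install",
    4987: "<!-webbot bot=",
}
EXCEPTION_FLAG = "00"

ITEM_TO_POSITION = {item: pos for pos, item in SECOND_LIST.items()}
POSITION_TO_POINTER = {pos: ptr for ptr, pos in FIRST_LIST.items()}


def find_positions(stored, word):
    """Return the indexes at which word occurs, without decoding stored."""
    position = ITEM_TO_POSITION.get(word)
    if position is None:
        # Not in the dictionary: only verbatim exceptions can match.
        return [
            i
            for i, (pointer, verbatim) in enumerate(stored)
            if pointer == EXCEPTION_FLAG and verbatim == word
        ]
    target = POSITION_TO_POINTER[position]  # for example "8A" for "the"
    return [i for i, (pointer, _) in enumerate(stored) if pointer == target]


stored = [
    ("8A", None),
    ("2408", None),
    ("2458", None),
    ("8F", None),
    ("00", "PurpleText"),
    ("8A", None),
]
hits = find_positions(stored, "the")
```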
[0096] Although the main focus of the present invention is
miniaturisation of data, the invention is equally applicable to
encrypting/decrypting data. This is because the indexing system
described above in effect replaces common data items with
associated pointers which act as tokens.
[0097] Because each token and data item is easily retrievable, the
list of tokens/pointers can easily be manipulated in a reversible
manner to make unauthorised decryption more difficult.
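By way of illustration only, one reversible manipulation of the pointer
list is a keyed substitution, sketched below. The specification does
not name a particular manipulation, so the keyed shuffle, the function
names and the key value are all assumptions. Python's random module is
not cryptographically secure, and the sketch is intended to show
reversibility only, not a recommended cipher.

```python
import random


def make_substitution(pointers, key):
    """Build a key-dependent one-to-one mapping from pointers to pointers."""
    ordered = sorted(pointers)
    shuffled = ordered[:]
    # Seeding with the key makes the shuffle repeatable for the key holder.
    random.Random(key).shuffle(shuffled)
    return dict(zip(ordered, shuffled))


def invert(mapping):
    """Reverse a one-to-one mapping so the substitution can be undone."""
    return {new: old for old, new in mapping.items()}


def substitute(stored, mapping):
    """Replace every stored pointer by its image under the mapping."""
    return [mapping[pointer] for pointer in stored]


pointers = ["8A", "2408", "2458", "8F"]   # first-list pointers, as above
stored = ["8A", "2408", "2458", "8A"]     # pointers in the general data base

mapping = make_substitution(pointers, key="hypothetical key")
scrambled = substitute(stored, mapping)
recovered = substitute(scrambled, invert(mapping))
```

Because the mapping is one-to-one, applying the inverse mapping
restores the stored pointers exactly.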
[0098] The present invention is therefore applicable to any data
which includes repetitive elements. This is because these
repetitive elements can be represented in an index of
pointers/tokens which obviate the need for pointers for each
repeated element. It follows therefore that theoretically any data
stored, for example in computer memory can be stored in a
miniaturised form by eliminating the majority of repeated data
items.
* * * * *