U.S. patent application number 10/208606, for a data processing system, was filed with the patent office on 2002-07-30 and published on 2003-04-10.
Invention is credited to Athena Christodoulou, Richard Taylor, and Christopher Tofts.
Application Number | 20030069898 10/208606 |
Family ID | 9919502 |
Publication Date | 2003-04-10 |
United States Patent
Application |
20030069898 |
Kind Code |
A1 |
Christodoulou, Athena ; et
al. |
April 10, 2003 |
Data processing system
Abstract
A data processing system is provided in which a data acquisition
unit includes a data tag generator for generating data tags
associated with acquired data items. The generated data tags are
transmitted to a data store.
Inventors: |
Christodoulou, Athena (Bristol, GB); Taylor, Richard (Bristol, GB);
Tofts, Christopher (Bristol, GB) |
Correspondence
Address: |
HEWLETT-PACKARD COMPANY
Intellectual Property Administration
P.O. Box 272400
Fort Collins
CO
80527-2400
US
|
Family ID: |
9919502 |
Appl. No.: |
10/208606 |
Filed: |
July 30, 2002 |
Current U.S.
Class: |
1/1 ;
707/999.107; 707/E17.095 |
Current CPC
Class: |
G06F 16/38 20190101 |
Class at
Publication: |
707/104.1 |
International
Class: |
B61J 001/06 |
Foreign Application Data
Date |
Code |
Application Number |
Jul 31, 2001 |
GB |
0118603.0 |
Claims
1. A data processing system comprising at least one data item
acquisition unit and at least one data store, the data item
acquisition unit comprising a data tag generator for generating a
data tag associated with each data item, and the data item
acquisition unit being arranged to transmit at least the data tag
to the data store.
2. A data processing system according to claim 1, wherein said data
item acquisition unit is further arranged to transmit the data item
to the at least one data store.
3. A data processing system according to claim 2, wherein the at
least one data store comprises a further data tag generator.
4. A data processing system according to claim 3, wherein said data
item acquisition unit comprises a decision module arranged to
evaluate if the data tag should be generated at the data
acquisition unit or if the data item should be sent to the data
store and the data tag generated there.
5. A data processing system according to claim 4, wherein said
decision module is arranged to utilise data item size and
complexity information to generate an estimate of the processing
power required to generate an associated data tag and determine if
said estimated processing power exceeds the processing power
available.
6. A data processing system according to claim 5, wherein said
processing power available at said data item acquisition unit is
variable.
7. A data processing system according to claim 5, wherein said
decision module is further arranged to evaluate the efficiency of
transmitting the data item to the data store for data processing
based on the size of the data item and the quality and/or speed of
the connection between the data acquisition unit and the data
store.
8. A data processing system according to claim 5, wherein said
decision module is further arranged to include one or more
restrictions on data to be included in said data tag in said
evaluation.
9. A data processing system according to claim 1, wherein said data
tag generator is arranged to detect a failure to generate a data
tag for a data item.
10. A data processing system according to claim 9, wherein in
response to said failure the data item is transmitted to the data
store with a request for the data tag to be generated therein.
11. A data processing system according to claim 9, wherein in
response to said failure any portion of said data tag generated
prior to said failure is transmitted to said data store.
12. A data processing system according to claim 9, wherein in
response to said failure said data tag generator attempts to
generate a simplified data tag.
13. A data processing system according to claim 9, wherein further
configuration information held by the data tag generator at the
data store is transmitted to the data acquisition unit to allow
successful data tag generation to occur.
14. A data processing system according to claim 1, wherein said
data tag generator is arranged such that data tag configuration
information is defined in relation to a presented data item and the
data tag configuration information is transmitted to the data store
for inclusion in the data tag generator located therein.
15. A data processing system according to claim 1, wherein said
data store is connected to one or more other data stores so as to
provide a hierarchical arrangement of data stores.
16. A method of processing data including the step of generating a
data tag associated with a data item at a data acquisition unit and
transmitting at least said data tag to a data store.
17. A method of processing data according to claim 16, wherein the
data item is also transmitted to the data store.
18. A method of processing data according to claim 17, wherein said
data tag generation can also occur at the data store.
19. A method of processing data according to claim 18, wherein said
method includes an evaluation step for evaluating if it is
appropriate to generate the data tags at the or each acquisition
unit or to transmit the data item to a data store for data tag
generation to occur there.
20. A method of processing data according to claim 19, wherein the
evaluation step includes determining the size and complexity of the
data item to be processed, providing an estimate of the processing
power required to generate the associated data tag in response to
the determination, comparing the estimated processing power with
the processing power of the available data tag generator, and in
response to the comparison either generating the data tag locally
or transmitting the data item to a data store.
21. A method of processing data according to claim 19, wherein said
evaluation step includes evaluating the efficiency of transmitting
the data item to the data store for data processing based on the
size of the data item and the quality and/or speed of the
connection between the data acquisition unit and data store.
22. A method of processing data according to claim 16, wherein the
method further includes transmitting configuration information from
a data store to a data acquisition device and configuring a data
tag generator located at the data acquisition device in accordance
with the configuration information.
23. A method of processing data according to claim 16, wherein data
tag generation configuration information is transmitted from a data
item acquisition device to a data store, said configuration
information being defined in relation to a user presented data
item.
24. A method of processing data according to claim 19, wherein said
data store is connected to at least one further data store and said
evaluation step is performed at at least one of said data stores.
25. A data item acquisition device comprising a data tag generator
and arranged to transmit a generated data tag associated with an
acquired data item to a data store.
26. A data item acquisition device according to claim 25 further
comprising an evaluation module arranged to perform the evaluation
procedure of claim 19.
27. A data item acquisition device according to claim 25, wherein
said data item acquisition device forms an integral part of a
personal computing apparatus.
28. A data item acquisition device according to claim 25, wherein
one or more data capture devices selected from a list including an
electronic camera, scanner and microphone is connected to an input
of the data acquisition device.
29. A data item acquisition device according to claim 25, wherein
the data acquisition unit is integrated within a data capture or
data storage device.
30. A data store arranged to store a plurality of data tags
associated with respective data items, and arranged to export, upon
request, data tag generator configuration information for use by
data tag generators.
31. A data store according to claim 30, wherein said data store is
arranged to store the respective data items associated with the
data tags.
32. A data store according to claim 30, wherein said data store
comprises a data tag generator for generating data tags associated
with data items input to the store.
33. A data store according to claim 30, wherein said data store
comprises a search engine arranged to locate data tags conforming
to a user search request and cause the data items associated with
the located data tags to be output from the data store.
34. A data store according to claim 30, wherein said data store is
connected to one or more other data stores.
35. A data store according to claim 30, wherein said data store
further comprises an evaluation module that is arranged to
determine the processing power required to generate a data tag for
a data item, compare the required processing power to the
processing power available at the data tag generator, and in
response to the comparison either enable the data tag generator to
generate the data tag or enable the transmission of the data item
to a data store.
36. A computer program product arranged to cause a data processor
to execute the method according to claim 16.
37. A distributed metadata processing system comprising at least
one data capture device and at least one remote data store in
communication with the data capture device, the data capture device
comprising metadata generation means for generating metadata
associated with a captured data item, the data capture unit further
comprising communication means for sending the metadata to the at
least one remote data store.
38. A data processing system comprising at least one data item
acquisition unit and at least one remote data store, and in which
both the data item acquisition unit and the remote data store
include a metadata generator for generating metadata associated
with acquired data, the data item acquisition unit further
including a decision module arranged to evaluate if the metadata is
most efficiently generated at the data acquisition unit or at the
remote data store, the data acquisition unit being arranged to
transmit at least either the generated metadata or the data
item to the remote data store in response to said evaluation.
39. A data processing system comprising at least one data item
acquisition unit and at least one remote data store, the data item
acquisition unit comprising a data tag generator for generating a
data tag associated with each data item, and the data item
acquisition unit being arranged to transmit at least the data tag
to the remote data store, wherein said data tag generator is
arranged to detect a failure to generate a data tag for a data item
and in response to said failure to request the transmission of
further configuration information held at the remote data store to
allow successful data tag generation to occur.
40. A method of generating a data tag associated with a data item,
the method including the steps of: acquiring the data item using a
data capture unit; evaluating whether to generate the data tag at
the data capture unit or at a data store in communication with the
data capture unit; and in response to the evaluation, communicating
either the data item or generated data tag to the data store.
41. A method of generating metadata associated with a data item,
the method including: acquiring a data item at a data acquisition
unit having a metadata generator; determining if the metadata
generator has sufficient configuration information to successfully
generate the metadata for the data item; and if the outcome of said
determination is negative, requesting the transmission of further
configuration information from a data store in communication with
the data acquisition unit.
Description
BACKGROUND OF THE INVENTION
[0001] It is well known to provide databases for the storage of
individual data items. The data items may, for example, be
individual photographs or other images, text documents, items of
music or other audio information, personnel records, or any other
such data items. The purpose of such databases is both to provide
storage for the data items and to provide the facility to
search through the stored data items in accordance with one or more
criteria.
[0002] To allow such searching to be executed, the individual data
items are indexed in some manner. Early systems of indexing were
relatively simple and included such schemes as grouping a number of
data items together within a single category, such that when a
search was performed for items within that category the relevant
data items could be retrieved.
[0003] However, in time, more advanced indexing schemes have been
developed, with one such scheme involving the use of metadata.
Metadata describes data, that is, it is data about data. The
metadata associated with a data item may be as simple as a set of
natural language comments referring to one or more elements of the
data item. Alternatively, the metadata may equally be much more
complex comments, or tags, that may refer to the structure of the
data. For example, the metadata associated with a text document may
include a reference to the subject matter of the document, its
author, the number of words, the size of the data item etc. As a
further example, metadata associated with a graphical image such as
a photograph may include tags identifying different elements of the
image, for instance that the image is a sunrise or sunset, that it
includes people's faces, that it is a landscape, and so on.
[0004] The metadata that is generated for each data item stored
within a database is ordinarily much more compact than the raw data
itself. An analogy would be the use of classification cards in a
public library. Each card represents a book stored in the library
and contains certain items of information about the book, for
example author and title. The cards themselves occupy a relatively
small amount of space compared to the space occupied by the
complete library, and by searching the cards a set of books, for
example those by the same author, can be identified.
[0005] However, the metadata may in some circumstances be much more
sizeable than the original data item. Maintaining the library
analogy, a single novel, such as `Pride and Prejudice`, may have a
number of books of analysis associated with it. These volumes of
analysis are analogous to metadata for the original novel, yet
would take up more shelf space than the novel itself.
[0006] Powerful search tools that exploit metadata are used to
augment conventional search tools.
[0007] The increased functionality of databases and their
associated search engines is one of the factors in their increased
usage. Another factor is the increased usage of networked systems
with local or remote network stations being linked to a central
database. The link may be provided by a dedicated transmission
cable or a shared transmission cable or wireless connection, or
other such connection means. The quality and/or speed of the
connection may pose a serious restriction on the amount of data
that can be transmitted between the central database and the remote
stations. It is therefore a problem to provide a large database
with powerful search facilities, especially when dealing with
non-textual data, that is easily and quickly accessible from remote
stations. It may be equally problematic for data items to be
exchanged between the database and the remote stations.
[0008] A further problem is the cost in processing terms required
to generate the metadata. A large centralised database may require
a prohibitive amount of processing power to deal with the
generation of increasing amounts of metadata. Equally, local
computers, such as domestic PCs, or portable devices such as
personal digital assistants or cameras may not ordinarily have the
processing power available to perform the metadata generation under
all circumstances or in acceptable time frames.
[0009] At least some of the above problems apply to locally held
databases. The consideration then is whether to perform the
metadata generation locally or request an additional remote
facility to do it even though there is no requirement or intention
to transmit either the generated metadata or data item to a
centralised database.
SUMMARY OF THE INVENTION
[0010] According to the present invention there is provided a data
processing system comprising at least one data item acquisition
unit and at least one data store, the data item acquisition unit
including a data tag generator for generating a data tag associated
with each data item, and the data item acquisition unit being
arranged to transmit at least the data tag to the data store.
[0011] The data item acquisition unit may also transmit the data
item itself to the at least one data store.
[0012] Additionally, the at least one data store may also include a
data tag generator.
[0013] Each data store may be connected to one or more other data
stores so as to provide a hierarchical arrangement of data
stores.
[0014] The data item acquisition units may include a decision
module that evaluates if the data tag should be generated at the
data acquisition unit or if the data item should be sent to the
data store and the data tag generated there. The evaluation may
take into consideration the size and complexity of the data item
and hence the processing power required to generate the data tag,
and may also include an evaluation of the efficiency of
transmitting the data item to the data store for data processing
based on the size of the data item and the quality and/or speed of
the connection between data acquisition unit and data store. The
evaluation may also take into consideration any previously
stipulated privacy requirements.
[0015] One use of the metadata is to preserve the privacy of the
original data item. If the data item includes elements that it is
desired to keep secret, only the metadata associated with the
remaining elements need be transmitted to the data store. In a
similar manner, a search query may be transmitted to the data store
with only the metadata essential for the search without
transmitting the original data item itself.
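By way of illustration only, the privacy use described above might be modelled as dropping restricted fields from a metadata record before it is transmitted. The record layout, the field names and the `redact_metadata` helper are assumptions made for this sketch; the specification does not define a metadata format.

```python
from typing import Any, Dict, Iterable


def redact_metadata(metadata: Dict[str, Any], restricted: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of the metadata without the restricted fields.

    Hypothetical helper: the original record is left unchanged.
    """
    blocked = set(restricted)
    return {key: value for key, value in metadata.items() if key not in blocked}


# Illustrative metadata for a text document (values are made up).
tag = {"subject": "quarterly report", "author": "A. N. Other", "word_count": 1200}

# Only the non-restricted elements would be sent to the data store.
outgoing = redact_metadata(tag, restricted=["author"])
```

In this sketch the restricted element never leaves the acquisition unit, while the remaining metadata can still be stored or used in a search query.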
[0016] The data tag generator at the data acquisition unit is
preferably arranged to detect a failure to generate a data tag for
a data item. A failure to generate the metadata may occur due to
one or more of a number of reasons. It is normal practice not to
limit the amount of system memory required during metadata
generation. There is therefore the possibility for a failure to
occur due to the data acquisition unit running out of memory.
Equally, the processing power required to generate the metadata may
be greater than that available at the data acquisition unit at that
time. A further cause of failure may be that the data acquisition
unit is not appropriately configured with the relevant contextual
data for the data item being presented.
[0017] The failure may be a `hard` failure, in which case no
metadata is generated and the data acquisition unit may transmit
the data item to the data store, the data tag (metadata) generation
then occurring at the data store. Alternatively, appropriate
configuration information held by the data tag generator at the
data store may be transmitted to the data acquisition unit to allow
successful data tag generation to occur at the data acquisition
unit. The failure may alternatively be a `soft` failure, in which
case the metadata generated prior to the failure occurring may be
transmitted to the data store, or equally simplified metadata may
be generated instead and transmitted to the data store.
[0018] Alternatively or additionally, the data acquisition unit may
be arranged such that new data tag configuration information may be
defined in relation to a presented data item and the new data tag
configuration information transmitted to the data store for
inclusion in the data tag generator located therein.
[0019] According to a second aspect of the present invention there
is provided a method of processing data, the method comprising
generating a data tag associated with a data item, said data tag
generation occurring at a data acquisition unit, and transmitting
at least said data tag to a data store.
[0020] Advantageously the data item may also be transmitted to the
data store. Data tag generation may also occur at the data
store.
[0021] The method may further include evaluating if it is
appropriate to generate the data tags at the or each acquisition
unit or to transmit the data item to a data store for data tag
generation to occur there. The evaluation procedure may include
determining the size and complexity of the data item to be
processed, providing an estimate of the processing power required
to generate the associated data tag in response to the
determination, comparing the estimated processing power with the
processing power of the available data tag generator, and in
response to the comparison either generating the data tag locally
or transmitting the data item to a data store.
[0022] The method may further comprise transmitting configuration
information from a data store to a data acquisition device and
configuring a data tag generator located at the data acquisition
device in accordance with the configuration information.
[0023] Additionally or alternatively, data tag generator
configuration information may be transmitted from a data item
acquisition device to a data store, said configuration information
being defined in relation to a user presented data item.
[0024] According to a third aspect of the present invention there
is provided a data item acquisition device comprising a data tag
generator and being arranged to transmit a generated data tag
associated with an acquired data item to a data store.
[0025] The data item acquisition device may further comprise an
evaluation module that is arranged to determine the processing
power required to generate a data tag for a data item, compare the
required processing power to the processing power available at the
data tag generator, and in response to the comparison either enable
the data tag generator to generate the data tag or enable the
transmission of the data item to a data store.
[0026] The data item acquisition device may form an integral or
peripheral part of a personal computer. Additionally one or more
data capture devices, for example an electronic camera, scanner or
microphone, may be connected to an input of the data acquisition
device. Alternatively the data acquisition unit may be integrated
within a data capture or data storage device.
[0027] According to a fourth aspect of the present invention there
is provided a data store arranged to store a plurality of data tags
associated with respective data items, and arranged to export, upon
request, data tag generator configuration information for use by
data tag generators.
[0028] Preferably the data store is arranged to store the
respective data items associated with the data tags. The data store
may additionally comprise a data tag generator for generating data
tags associated with data items input to the store.
[0029] Additionally the data store may comprise a search engine
arranged to locate data tags conforming to a user search request
and cause the data items associated with the located data tags to
be output from the data store.
[0030] Preferably the data store is connected to one or more other
data stores. Additionally the data store may further comprise an
evaluation module that is arranged to determine the processing
power required to generate a data tag for a data item, compare the
required processing power to the processing power available at the
data tag generator, and in response to the comparison either enable
the data tag generator to generate the data tag or enable the
transmission of the data item to a data store.
[0031] According to a fifth aspect of the present invention, there
is provided a computer program product for causing a data processor
to execute the method according to the second aspect of the present
invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0032] The present invention will now be described, by way of
example, with reference to the accompanying drawings, in which:
[0033] FIG. 1 shows a schematic representation of a data processing
system according to an embodiment of the present invention
connected to a number of data input devices; and
[0034] FIG. 2 shows a further embodiment of the present invention
having a multi-layer structure.
DETAILED DESCRIPTION OF THE INVENTION
[0035] FIG. 1 shows a data acquisition device or unit 2 connected
to a data store 4. The data acquisition unit 2 is connected to one
or more data input devices. Examples of data input devices that are
shown are a discrete data storage unit 6, for example a hard disk,
a digital camera 8, and a document scanner 10. Other input devices
such as video or sound recorders could also be provided. Located in
the data acquisition unit 2 is a datatag generator 12, also known as
a metadata generator. The metadata generator is arranged to process
data items input from one or more of the data input devices to
generate datatags or metadata for each data item. A data store 4 is
connected to the data acquisition unit 2. The data store unit
includes one or more data storage devices 14, such as known hard
disk drives. Connected to the data storage devices 14 is a data
query and/or indexing unit that is arranged to perform conventional
data searching procedures. The data storage devices 14 are arranged
to store either a plurality of data tags, a plurality of individual
data items, or both data items and their associated datatags. The
data acquisition unit 2 and data store 4 are connected by any
suitable data transmission channel, for example by fibre optic
cable, or by wireless connections.
[0036] In use, data items will be input to the data acquisition
unit 2 from one of the data input devices 6-10. On receipt of the
data items the metadata generator 12 will perform data processing
to generate metadata associated with the input data items. The
metadata may then be transmitted from the data acquisition unit 2
to the data store 4, together with, for example, a request from the
data acquisition unit for the data store 4 to provide further data
items that have similar metadata associated with them. This method
of operation has the advantage that metadata generation is
performed locally at the data acquisition unit 2 and not at the
data store 4, thus freeing resources at the data store 4 that may
be applied more efficiently to searching the contents of the data
store 4 for requested data items. Additionally, having generated
the associated metadata at the data acquisition unit 2, both the
metadata and the associated data item may be transmitted to the
data store 4 to be added to the data items and metadata stored
there. In this way a database that is stored at the data store 4
may be expanded and updated relying solely on metadata and data
items provided by remote stations without utilising the central
resources at the central data store.
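The mode of operation described above can be sketched as follows, under loud assumptions: metadata is modelled as a set of keywords, `generate_tags` stands in for the metadata generator 12, and the `DataStore` class stands in for the data store 4. None of these names or representations come from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


def generate_tags(text: str) -> FrozenSet[str]:
    """Toy metadata generator run at the acquisition unit:
    the distinct lower-case words of a text item."""
    words = frozenset(word.strip(".,;:!?").lower() for word in text.split())
    return words - {""}


@dataclass
class DataStore:
    """Assumed store that holds only tags, keyed by item id."""
    tags: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def add(self, item_id: str, item_tags: FrozenSet[str]) -> None:
        self.tags[item_id] = item_tags

    def find_similar(self, query: FrozenSet[str], min_shared: int = 1) -> List[str]:
        """Return ids of stored items sharing at least min_shared tags with the query."""
        return sorted(
            item_id for item_id, stored in self.tags.items()
            if len(stored & query) >= min_shared
        )


store = DataStore()
store.add("doc-1", generate_tags("Sunrise over the harbour"))
store.add("doc-2", generate_tags("Annual personnel records"))

# The acquisition unit sends only the locally generated tags as the query.
matches = store.find_similar(generate_tags("Harbour at sunrise"), min_shared=2)
```

The store performs no metadata generation in this sketch; it only compares tags that were produced remotely, which mirrors the division of labour described in the paragraph above.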
[0037] In a further embodiment of the present invention the data
acquisition unit 2 may include a decision unit 18 connected to the
metadata generator 12. The function of the decision unit 18 is to
perform an evaluation of whether generation of the metadata for an
input data item would be better performed either locally at the
data acquisition unit 2 or centrally at the store 4. In these
embodiments, the data storage unit 4 also includes a metadata
generator 20. The evaluation of where to perform the metadata
processing may take into account a number of parameters, for
example the size and complexity of the data item(s) and therefore
the processing power required to perform the metadata processing,
or the size of the data item(s) in comparison with the transmission
abilities of the connection between the data acquisition unit 2 and
the data store 4. The evaluation may also take into consideration
any stipulated privacy requirements. For example, it may be
stipulated by a user that the author or originator of a data
item(s) is not included in the metadata. This will have an impact
on the complexity of the generated metadata. As a further example,
if the connection between the data acquisition unit 2 and the data
store 4 is of limited capacity, the decision unit 18 may evaluate
that it is more efficient to generate the metadata locally at the
data acquisition unit 2 using the metadata generator 12 rather than
attempt to transmit the relatively large amount of data down the
restricted transmission capacity of the connection between the data
acquisition unit 2 and the central store 4. Alternatively, an
evaluation may be made that it is more efficient to transmit the
data item(s) unprocessed to the data store 4 to be processed by the
more powerful metadata generator 20 located at the data store
4.
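One possible way to combine these parameters is sketched below. The cost model (work proportional to size multiplied by complexity), the time-based comparison and all names are assumptions chosen for illustration; the specification does not prescribe a formula for the decision unit 18.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    size_bytes: int
    complexity: float  # assumed unitless multiplier, 1.0 = baseline


def estimate_ops(item: DataItem) -> float:
    """Assumed cost model: required work scales with size times complexity."""
    return item.size_bytes * item.complexity


def generate_locally(item: DataItem, local_ops_per_s: float,
                     remote_ops_per_s: float, link_bytes_per_s: float) -> bool:
    """Return True if local generation is estimated to finish no later than
    transmitting the item and generating the metadata at the data store."""
    ops = estimate_ops(item)
    local_time = ops / local_ops_per_s
    remote_time = item.size_bytes / link_bytes_per_s + ops / remote_ops_per_s
    return local_time <= remote_time


item = DataItem(size_bytes=5_000_000, complexity=2.0)

# Slow link: sending the raw item costs more than processing it locally.
slow_link = generate_locally(item, 1e6, 1e7, link_bytes_per_s=1e5)

# Fast link: the more powerful generator at the store wins.
fast_link = generate_locally(item, 1e6, 1e7, link_bytes_per_s=1e7)
```

A fuller model could also reduce `local_ops_per_s` when the host computer is busy, or lower the estimated work when privacy stipulations exclude fields from the metadata.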
[0038] In certain embodiments of the present invention the metadata
generator 12 located at the data acquisition unit 2 is arranged to
report any failures to generate metadata for data items. A failure
to generate metadata for a data item may occur due to the metadata
generator 12 located at the data acquisition unit 2 not being
configured with appropriate contextual data to generate metadata
for a particular kind of data item. Alternative or additional
causes of failure may include running out of memory during
generation of the metadata or insufficient processing resources
being available. The available processing resources may vary
depending on other tasks being performed by the data acquisition
unit at any given time. When such a failure occurs, the decision
unit 18 may either transmit the particular data item to the data
store 4 in order for the metadata generator 20 located at the data
store 4 to generate the metadata, request revised configuration
information from the metadata generator 20 located at the data
store 4 to enable the metadata generator unit 12 located at the
data acquisition unit 2 to be reconfigured to enable metadata to be
generated for the data item at the data acquisition unit 2,
generate simplified metadata that requires fewer system resources or
configuration information, or simply transmit to the data store 4
the metadata that had been successfully generated prior to the
failure occurring.
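The fallback options listed above can be sketched as a small dispatch routine. The exception type, the reason strings and the order in which the options are tried are assumptions for this sketch; the specification leaves the choice between the options open, and the simplified-metadata option is omitted here for brevity.

```python
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class Action(Enum):
    SEND_TAG = auto()          # generation succeeded
    SEND_PARTIAL_TAG = auto()  # send what was generated before the failure
    SEND_ITEM = auto()         # let the data store generate the metadata
    REQUEST_CONFIG = auto()    # ask the data store for configuration


class TagFailure(Exception):
    """Hypothetical failure report; `partial` carries metadata produced so far."""

    def __init__(self, reason: str, partial: Optional[Dict[str, str]] = None):
        super().__init__(reason)
        self.reason = reason
        self.partial = partial or {}


def handle(item: bytes, generator) -> Tuple[Action, object]:
    """Try local generation and pick one of the fallbacks on failure."""
    try:
        return Action.SEND_TAG, generator(item)
    except TagFailure as failure:
        if failure.partial:                      # some metadata survived
            return Action.SEND_PARTIAL_TAG, failure.partial
        if failure.reason == "missing_config":   # generator not configured
            return Action.REQUEST_CONFIG, None
        return Action.SEND_ITEM, item            # nothing usable was generated


def out_of_memory(item: bytes) -> Dict[str, str]:
    # Simulated generator that fails after producing one field.
    raise TagFailure("out_of_memory", partial={"size": str(len(item))})


def unconfigured(item: bytes) -> Dict[str, str]:
    # Simulated generator lacking contextual data for this kind of item.
    raise TagFailure("missing_config")
```

Calling `handle(b"abc", out_of_memory)` selects the partial-metadata path, while `handle(b"abc", unconfigured)` selects the configuration request.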
[0039] It will be appreciated by those skilled in the art that the
data acquisition unit 2 may be a dedicated processing unit or may
be integrated as either hardware or software within a general
personal computer or the like. In the latter case, the decision
unit 18 may also take into account the demands on the computer's
processor when evaluating whether to generate metadata at the data
acquisition unit or not. It will be appreciated that metadata
processing may be performed at the data acquisition unit 2 during
what would otherwise be "idle" processing periods.
[0040] FIG. 2 shows an embodiment of the present invention
utilising a multi-layered, hierarchical arrangement of data storage
units. A number of "first layer" data stores 4 are provided. Each
data store includes a metadata generator 20. Connected to each
first layer data store 4 are one or more data acquisition units 2.
In FIG. 2 the data acquisition units 2 are schematically shown as
being connected to different data input sources. The sources shown
include a digital camera 30, a document source 32, audio source 34
and video source 36. Each data acquisition unit 2 also includes a
metadata processor 12. The data acquisition units 2 are arranged to
operate in the same manner as those described in FIG. 1.
[0041] Each of the first layer data stores 4 is connected to a
second layer data store 38. Preferably, but not necessarily, the
second layer data store 38 has increased storage capacity in
comparison to the first layer data stores 4. Although only two
layers of data stores are shown in FIG. 2, it will be appreciated
that any number of layers can be used. In use, the decision making
process that occurs at the data acquisition units 2, as described
with reference to FIG. 1, also takes place at each of the levels of
data stores. Thus the system is flexible enough to perform the
metadata processing at whichever layer is deemed most
appropriate.
[0042] In further embodiments, the data input sources, for example
the digital camera 30 shown in FIG. 2, may themselves include a
metadata generator. This allows a further sub-layer of metadata
processing and decision making to be performed.
[0043] By providing metadata generators at the various different
layers of the system the processing is distributed throughout the
system. This provides the advantage that both processing power
throughout the system and the use of transmission connections can
be optimised. The processing power of the metadata generators at
the different layers may either be identical, or may increase
towards the highest layer. In the latter case, only the more
complex or large data items would need to be processed by the more
powerful metadata generators, with the simpler data items being
processed at lower levels by individual metadata generators.
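As a minimal sketch of the latter case, an item might be routed to the lowest layer whose metadata generator is estimated to be powerful enough. The numeric capacities, the `choose_layer` name and the "lowest sufficient layer" policy are assumptions for illustration only.

```python
from typing import Optional, Sequence


def choose_layer(required_power: float, layer_capacity: Sequence[float]) -> Optional[int]:
    """Return the index of the lowest layer whose generator can handle the item.

    Index 0 is the acquisition unit; higher indices are higher data store layers.
    Returns None if no layer has enough capacity.
    """
    for index, capacity in enumerate(layer_capacity):
        if capacity >= required_power:
            return index
    return None


# Illustrative capacities that increase towards the highest layer.
capacities = [1.0, 10.0, 100.0]

simple_item_layer = choose_layer(0.5, capacities)    # handled at the acquisition unit
complex_item_layer = choose_layer(50.0, capacities)  # escalated to the top layer
```

A fuller model would also weigh the cost of each transmission hop, since every escalation sends the unprocessed item over another connection.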
[0044] In the same manner as described in relation to FIG. 1, the
lower level metadata generators may be "updated" with new
configuration information provided by metadata generators in the
higher levels.
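Such an update might look like the sketch below, in which configuration information is modelled as a mapping from an item kind to a handler function. This representation, and the `TagGenerator` class with its `export_config` and `update` methods, are hypothetical; the specification does not say what form the configuration information takes.

```python
from typing import Callable, Dict

Handler = Callable[[bytes], Dict[str, str]]


class TagGenerator:
    """Toy generator whose supported item kinds are set by configuration."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Handler] = {}

    def export_config(self) -> Dict[str, Handler]:
        """Configuration a higher-level generator can hand to lower levels."""
        return dict(self.handlers)

    def update(self, config: Dict[str, Handler]) -> None:
        """Merge new configuration information into this generator."""
        self.handlers.update(config)

    def can_tag(self, kind: str) -> bool:
        return kind in self.handlers

    def tag(self, kind: str, item: bytes) -> Dict[str, str]:
        return self.handlers[kind](item)


# Higher-level generator (at a data store) already configured for text items.
store_generator = TagGenerator()
store_generator.update({"text": lambda item: {"words": str(len(item.split()))}})

# Lower-level generator (at an acquisition unit) starts unconfigured.
unit_generator = TagGenerator()
if not unit_generator.can_tag("text"):
    unit_generator.update(store_generator.export_config())

result = unit_generator.tag("text", b"pride and prejudice")
```

The same `update` call could run in the opposite direction when new configuration information is defined at an acquisition unit and sent up to a data store.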
* * * * *