U.S. patent application number 15/961869 was filed with the patent office on 2018-04-24 and published on 2019-10-24 for fast read operation utilizing reduced storage of metadata in a distributed encoded storage system.
The applicant listed for this patent is Western Digital Technologies, Inc. Invention is credited to Stijn Blyweert, Mark Christiaens, Koen De Keyser, Frederik Jacqueline Luc De Schrijver.
Application Number | 15/961869 |
Publication Number | 20190324657 |
Family ID | 68237774 |
Filed Date | 2018-04-24 |
United States Patent Application | 20190324657 |
Kind Code | A1 |
De Keyser; Koen ; et al. | October 24, 2019 |
FAST READ OPERATION UTILIZING REDUCED STORAGE OF METADATA IN A
DISTRIBUTED ENCODED STORAGE SYSTEM
Abstract
A data object can be encoded into multiple encoded data
fragments and stored on the backend of a distributed encoded
storage system. The identifiers and metadata corresponding to each
fragment of the data object can be stored in a first section of a
metadata unit, and the initial part of the data object in a second
section. The metadata unit is encoded into multiple metadata
fragments, which are stored on the backend. The identifiers of the
metadata fragments can be associated with the data object and
stored on a fast frontend storage device. In response to a request
to access the data object, the identifiers are used to retrieve the
metadata fragments from the backend, and decode the metadata unit.
The initial part of the data object is retrieved from the metadata
unit and transmitted to the requesting client application to begin
processing the data object.
Inventors: | De Keyser; Koen (Sint-Denijs-Westrem, BE); De Schrijver; Frederik Jacqueline Luc (Wenduine, BE); Blyweert; Stijn (Huizingen, BE); Christiaens; Mark (Westrem, BE) |
Applicant: |
Name | City | State | Country | Type |
Western Digital Technologies, Inc. | San Jose | CA | US | |
Family ID: | 68237774 |
Appl. No.: | 15/961869 |
Filed: | April 24, 2018 |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06F 3/0619 20130101; G06F 3/067 20130101; G06F 3/0685 20130101; G06F 3/0611 20130101; G06F 3/065 20130101 |
International Class: | G06F 3/06 20060101 G06F003/06 |
Claims
1. A computer-implemented method, comprising: encoding a data
object into a plurality of encoded data fragments; transmitting
each one of the plurality of encoded data fragments to a backend of
a distributed encoded data storage system, for distribution across
a plurality of backend storage elements, wherein the distribution
of the plurality of encoded data fragments across the plurality of
backend storage elements provides a specific level of redundancy;
for each one of the plurality of encoded data fragments, storing a
corresponding fragment identifier and associated metadata in a
first section of a single metadata unit; storing an initial part of
the data object in a second section of the single metadata unit;
encoding the single metadata unit into a plurality of encoded
metadata fragments; transmitting each one of the plurality of
encoded metadata fragments to the backend of the distributed
encoded data storage system for distribution across the plurality
of backend storage elements, wherein the distribution of the
plurality of encoded metadata fragments across the plurality of
backend storage elements provides the specific level of redundancy;
associating fragment identifiers corresponding to the encoded
metadata fragments with the data object; storing the fragment
identifiers corresponding to the encoded metadata fragments in a
frontend storage element, wherein the frontend storage element
provides faster access to stored content than backend storage
elements; receiving a request from a client application to access
the data object; retrieving the fragment identifiers corresponding
to the encoded metadata fragments from the frontend storage
element, responsive to receiving the request to access the data
object from the client application; retrieving the encoded metadata
fragments from the backend of the distributed encoded data storage
system; decoding the single metadata unit using the retrieved
encoded metadata fragments; retrieving the initial part of the data
object from the second section of the single metadata unit; and
transmitting the initial part of the data object to the client
application to begin processing the data object.
2. The computer-implemented method of claim 1, wherein transmitting
the initial part of the data object to the client application to
begin processing the data object further comprises: transmitting
the initial part of the data object to the client application to
begin processing the data object, prior to or concurrently with
retrieving the encoded metadata fragments from the backend of the
distributed encoded data storage system.
3. (canceled)
4. The computer-implemented method of claim 1, wherein to begin
processing the data object further comprises one of: to display a
preview of the data object on a screen of a computing device; and
to launch an end-user application for opening the data object.
5. (canceled)
6. The computer-implemented method of claim 1, wherein the initial
part of the data object further comprises: a header of the data
object; and a subsection of content of the data object.
7. The computer-implemented method of claim 1, wherein the initial
part of the data object further comprises: information concerning a
format of the data object; and a subsection of content of the data
object.
8. (canceled)
9. The computer-implemented method of claim 1, further comprising:
retrieving, for each one of the plurality of encoded data
fragments, the corresponding fragment identifier and associated
metadata from the first section of the single metadata unit;
retrieving encoded data fragments from the backend of the
distributed encoded data storage system, utilizing corresponding
fragment identifiers and associated metadata; and transmitting
encoded data fragments to the client application to decode the data
object, subsequently to transmitting the initial part of the data
object to the client application to begin processing the data
object.
10. The computer-implemented method of claim 1, further comprising:
retrieving, for each one of the plurality of encoded data
fragments, the corresponding fragment identifier and associated
metadata from the first section of the single metadata unit;
retrieving encoded data fragments from the backend of the
distributed encoded data storage system, utilizing corresponding
fragment identifiers and associated metadata; decoding the data
object using the retrieved encoded data fragments; and transmitting
the data object to the client application, subsequently to
transmitting the initial part of the data object to the client
application to begin processing the data object.
11. The computer-implemented method of claim 1, wherein: the
backend storage elements further comprise electromechanical storage
devices.
12. The computer-implemented method of claim 1, wherein: the
frontend storage element further comprises a solid state storage
device.
13. (canceled)
14. (canceled)
15. The computer-implemented method of claim 1, wherein providing a
specific level of redundancy further comprises: providing a
predetermined level of storage redundancy associated with the
distributed encoded data storage system.
16. The computer-implemented method of claim 1, wherein: metadata
associated with each specific encoded data fragment of the
plurality of encoded data fragments further comprises a storage
location of the specific encoded data fragment and encoding
information concerning the specific encoded data fragment.
17. The computer-implemented method of claim 1, further comprising:
transmitting a copy of content of the frontend storage element to
the backend of the distributed encoded data storage system for
storage; retrieving a stored copy from the backend, responsive to a
loss of content of the frontend storage element; and storing the
retrieved stored copy on the frontend storage element.
18. (canceled)
19. The computer-implemented method of claim 1, further comprising:
in response to transmitting each one of the plurality of encoded
data fragments, receiving, for each one of the plurality of encoded
data fragments, a corresponding fragment identifier from the
backend of the distributed encoded data storage system; and in
response to transmitting each one of the plurality of encoded
metadata fragments, receiving, for each one of the plurality of
encoded metadata fragments, a corresponding fragment identifier
from the backend of the distributed encoded data storage
system.
20. (canceled)
21. A system, comprising: one or more processors; a storage manager
executable by the one or more processors to perform operations
comprising: encoding a single metadata unit storing a part of a
data object into a plurality of encoded metadata fragments;
transmitting the plurality of encoded metadata fragments to a
backend of a distributed encoded data storage system for
distributed storage across a plurality of backend storage elements;
responsive to receiving a request to access the data object,
retrieving a plurality of fragment identifiers corresponding to the
plurality of encoded metadata fragments from a frontend storage
element; retrieving the encoded metadata fragments from the backend
of the distributed encoded data storage system; decoding the single
metadata unit using the retrieved encoded metadata fragments;
retrieving the part of the data object from the single metadata
unit; and transmitting the part of the data object to a requesting
application to begin processing the data object.
22. The system of claim 21, further comprising: encoding the data
object into a plurality of encoded data fragments; transmitting the
plurality of encoded data fragments of the data object to a backend
of a distributed encoded data storage system for distributed
storage across a plurality of backend storage elements; storing the
plurality of fragment identifiers and associated metadata
corresponding to the plurality of encoded data fragments in a first
section of a single metadata unit; and storing the part of the data
object in a second section of the single metadata unit.
23. The system of claim 22, further comprising: retrieving the
plurality of fragment identifiers and associated metadata from the
first section of the single metadata unit; retrieving the plurality
of encoded data fragments from the backend of the distributed
encoded data storage system, utilizing the plurality of fragment
identifiers and associated metadata; and transmitting the plurality
of encoded data fragments to the requesting application subsequent
to transmitting the part of the data object to the requesting
application.
24. The system of claim 22, further comprising: retrieving the
plurality of fragment identifiers and associated metadata from the
first section of the single metadata unit; retrieving the plurality
of encoded data fragments from the backend of the distributed
encoded data storage system, utilizing the plurality of fragment
identifiers and associated metadata; decoding the data object using
the retrieved plurality of encoded data fragments; and transmitting
the data object to the requesting application subsequent to
transmitting the part of the data object to the requesting
application.
25. The system of claim 21, wherein transmitting the part of the
data object to the requesting application to begin processing the
data object further comprises: transmitting the part of the data
object to the requesting application to begin processing the data
object, prior to or concurrently with retrieving the encoded
metadata fragments from the backend of the distributed encoded data
storage system.
26. A system, comprising: means for encoding a single metadata unit
storing a part of a data object into a plurality of encoded
metadata fragments; means for transmitting the plurality of encoded
metadata fragments to a backend of a distributed encoded data
storage system for distributed storage across a plurality of
backend storage elements; means for retrieving a plurality of
fragment identifiers corresponding to the plurality of encoded
metadata fragments from a frontend storage element, responsive to
receiving a request to access the data object; means for retrieving
the encoded metadata fragments from the backend of the distributed
encoded data storage system; means for decoding the single metadata
unit using the retrieved encoded metadata fragments; means for
retrieving the part of the data object from the single metadata
unit; and means for transmitting the part of the data object to a
requesting application to begin processing the data object.
27. The system of claim 26, further comprising: means for encoding
the data object into a plurality of encoded data fragments; means
for transmitting the plurality of encoded data fragments of the
data object to a backend of a distributed encoded data storage
system for distributed storage across a plurality of backend
storage elements; means for storing the plurality of fragment
identifiers and associated metadata corresponding to the plurality
of encoded data fragments in a first section of a single metadata
unit; and means for storing the part of the data object in a second
section of the single metadata unit.
Description
CROSS-REFERENCES TO RELATED APPLICATION
[0001] This application is related to the U.S. patent application
filed with the U.S. Patent and Trademark Office under Attorney
Docket No. WDA-3569-US on Apr. 24, 2018, having the same assignee
and entitled Reduced Storage of Metadata in a Distributed Encoded
Storage System, the entire contents of which are hereby
incorporated herein by reference.
TECHNICAL FIELD
[0002] The present disclosure pertains generally to storage
systems, and more specifically to fast read operations utilizing
reduced storage of metadata in a distributed encoded storage
system.
BACKGROUND
[0003] The rise in electronic and digital device technology has
rapidly changed the way society communicates, interacts, and
consumes goods and services. Modern computing devices allow
organizations and users to have access to a variety of useful
applications in many locations. Using such applications results in
the generation of a large amount of data. Storing and retrieving
the produced data is a significant challenge associated with
providing useful applications and devices.
[0004] The data generated by online services and other applications
can be stored at data storage facilities. As the amount of data
grows, having a plurality of users sending and requesting data can
result in complications that reduce efficiency and speed. Quick and
reliable access in storage systems is important for good
performance.
[0005] Distributed encoded storage systems typically divide each
data object to be stored into a plurality of data pieces, each of
which is encoded into a plurality of encoded data fragments. The
encoded data fragments are spread across multiple backend storage
elements, thereby providing a given level of redundancy. A
distributed encoded storage system maintains metadata which
identifies each stored data object, specifies where in the system
each data object is stored, including where the encoded data
fragments have been distributed and hence from where they can
subsequently be retrieved, what type of encoding has been used,
etc. For each encoded fragment of the data object, an identifier,
location information and encoding information are maintained. Thus,
storage of a single data object generates a large amount of
associated metadata.
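To illustrate why the metadata volume grows with every stored object, the following minimal sketch models the per-fragment record described above (identifier, location information and encoding information). The `FragmentRecord` type, the field names, the identifier scheme, the pool of 16 storage elements and the piece and fragment counts are all assumptions introduced here for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentRecord:
    """Hypothetical per-fragment metadata record."""
    fragment_id: str  # identifier of one encoded data fragment
    location: str     # assumed label of the backend storage element holding it
    encoding: str     # assumed label of the encoding scheme that was used


def metadata_for_object(object_id: str, pieces: int, fragments_per_piece: int) -> list:
    """Build one record per encoded fragment of a single data object."""
    records = []
    for p in range(pieces):
        for f in range(fragments_per_piece):
            records.append(FragmentRecord(
                fragment_id=f"{object_id}/p{p}/f{f}",
                # Spread fragments over an assumed pool of 16 storage elements.
                location=f"element-{(p * fragments_per_piece + f) % 16}",
                encoding="example-erasure-code",
            ))
    return records


records = metadata_for_object("obj-1", pieces=4, fragments_per_piece=6)
print(len(records))  # 24 records for one object split into 4 pieces
```

Because the record count is the product of the number of data pieces and the number of fragments per piece, an object divided into many pieces carries a correspondingly large number of records.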
[0006] As noted above, a distributed encoded storage system stores
the encoded data fragments on storage elements in the backend.
However, because the corresponding metadata is accessed frequently
and needs to be provided with a high level of responsiveness, it is
typically stored on storage elements other than those of the
backend, since storing it on the backend would lead to unacceptable
delays. Typically, the backend storage elements on which data
objects are stored are in the form of hard disks, while the
metadata is stored on expensive, fast, low latency storage
elements, such as solid state disks ("SSDs"). The separate storage
of metadata on SSDs leads to higher cost and typically a reduced
level of durability.
[0007] In addition, although conventional systems make use of
expensive, fast, low latency storage elements, such as SSDs, for
storage of the metadata, the overall responsiveness of read
operations is still negatively impacted by the need to retrieve a
sufficient number of encoded data fragments from the slower backend
storage elements after retrieval of the metadata, before the client
application that made the read request can access the data object.
In such conventional systems, a large number of encoded fragments
must be downloaded from the backend in order to decode the data
object, before the client application can begin processing the data
object. This adds significant latency to read operations.
[0008] It would be desirable to address at least these issues.
SUMMARY
[0009] A data object can be encoded into a plurality of encoded
data fragments and stored on backend storage elements in a
distributed encoded storage system. The identifiers and metadata
corresponding to each encoded fragment of the data object can be
stored in a single metadata unit, which can be encoded into a
plurality of metadata fragments. The single metadata unit can
contain the identifiers and associated metadata for all of the
encoded data fragments of the data object. The metadata unit can
also contain an initial part of the data object itself. Each one of
the plurality of encoded metadata fragments can be transmitted to
the backend of the distributed encoded data storage system for
distribution across the plurality of backend storage elements,
wherein the distribution of the plurality of encoded metadata
fragments across the plurality of backend storage elements provides
the same specific level of redundancy as the encoded data
fragments, and hence the underlying data object. The identifiers of
the encoded metadata fragments can be associated with the data
object and stored on a low latency frontend storage device. These
fragment identifiers can be quickly retrieved, and used to retrieve
the encoded metadata fragments from the backend. The retrieved
encoded metadata fragments can be used to decode the single
metadata unit, and the initial part of the data object in the
metadata unit can be used to begin the fulfillment of a read
request targeting the data object, while the identifiers and
metadata in the metadata unit are being used to retrieve the
encoded data fragments from the backend and decode the data object.
This enables the requesting client application to begin processing
the contents of the data object, without the delays associated with
downloading sufficient encoded data fragments to allow for the
decoding operation. Therefore, the end-user perceived latency of
read operations is reduced considerably, and the end-user
perceived responsiveness of the distributed storage system
generally is improved.
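The write and read paths summarized in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the disclosure does not name an encoding scheme, so a toy single-parity XOR code (which tolerates the loss of one fragment) stands in for it; the in-memory dictionaries `backend` and `frontend`, the JSON layout with keys `section1` and `section2`, the identifier format and the 16-byte initial part are all hypothetical.

```python
import json

K = 3  # assumed number of data chunks per encoded blob

backend = {}   # stand-in for the slow backend storage elements
frontend = {}  # stand-in for the fast frontend storage element


def xor_all(chunks):
    """Byte-wise XOR of equally sized chunks."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


def encode(blob: bytes) -> list:
    """Toy erasure code: K data chunks plus one XOR parity chunk."""
    framed = len(blob).to_bytes(4, "big") + blob  # length prefix to undo padding
    size = -(-len(framed) // K)                   # ceiling division
    framed = framed.ljust(size * K, b"\0")
    chunks = [framed[i * size:(i + 1) * size] for i in range(K)]
    return chunks + [xor_all(chunks)]


def decode(fragments: list) -> bytes:
    """Rebuild a blob from K + 1 fragments, at most one of which is None."""
    fragments = list(fragments)
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    if missing:
        present = [frag for frag in fragments if frag is not None]
        fragments[missing[0]] = xor_all(present)
    framed = b"".join(fragments[:K])
    length = int.from_bytes(framed[:4], "big")
    return framed[4:4 + length]


def put_fragments(prefix: str, fragments: list) -> list:
    """Store fragments on the backend and return their identifiers."""
    ids = []
    for i, frag in enumerate(fragments):
        fragment_id = f"{prefix}-{i}"
        backend[fragment_id] = frag
        ids.append(fragment_id)
    return ids


def store_object(name: str, data: bytes, initial_len: int = 16) -> None:
    data_ids = put_fragments(f"{name}/data", encode(data))
    unit = json.dumps({
        # First section: identifier and metadata of every encoded data fragment.
        "section1": [{"id": fid, "encoding": "xor-parity"} for fid in data_ids],
        # Second section: the initial part of the data object itself.
        "section2": data[:initial_len].hex(),
    }).encode()
    # Only the metadata fragment identifiers are kept on the frontend.
    frontend[name] = put_fragments(f"{name}/meta", encode(unit))


def read_object(name: str):
    """Return (initial part, full object) for a stored data object."""
    unit = json.loads(decode([backend.get(fid) for fid in frontend[name]]))
    initial = bytes.fromhex(unit["section2"])  # available before any data fragment is read
    full = decode([backend.get(rec["id"]) for rec in unit["section1"]])
    return initial, full
```

In this sketch the frontend holds only a short list of identifiers per object, while the metadata unit enjoys the same redundancy as the data: deleting any one metadata fragment and any one data fragment from `backend` still lets `read_object` return both the initial part and the full object.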
[0010] A system of one or more computers can be configured to
perform particular operations or actions by virtue of having
software, firmware, hardware, or a combination of these installed
on the system, where the software, firmware and/or hardware
cause(s) the system to perform the actions. One or more computer
programs can be configured to perform particular operations or
actions by virtue of including instructions that, when executed by
data processing apparatus, cause the apparatus to perform the
actions.
[0011] One general aspect includes a computer-implemented method
comprising: encoding a data object into a plurality of encoded data
fragments; transmitting each one of the plurality of encoded data
fragments to a backend of a distributed encoded data storage
system, for distribution across a plurality of backend storage
elements, wherein the distribution of the plurality of encoded data
fragments across the plurality of backend storage elements provides
a specific level of redundancy; in response to transmitting each
one of the plurality of encoded data fragments, receiving, for each
one of the plurality of encoded data fragments, a corresponding
fragment identifier from the backend of the distributed encoded
data storage system (in another embodiment, the fragment
identifiers can be generated or selected on the front end, and,
e.g., provided to the backend for use); for each one of the
plurality of encoded data fragments, storing the corresponding
fragment identifier and associated metadata in a first section of a
single metadata unit; storing an initial part of the data object in
a second section of the single metadata unit; encoding the single
metadata unit into a plurality of encoded metadata fragments;
transmitting each one of the plurality of encoded metadata
fragments to the backend of the distributed encoded data storage
system for storage, wherein the distribution of the plurality of
encoded metadata fragments across the plurality of backend storage
elements provides the specific level of redundancy; in response to
transmitting each one of the plurality of encoded metadata
fragments, receiving, for each one of the plurality of encoded
metadata fragments a corresponding fragment identifier from the
backend of the distributed encoded data storage system (as noted
above, in another embodiment, the fragment identifiers can be
generated or selected on the front end); associating the fragment
identifiers corresponding to encoded metadata fragments with the
data object; storing the fragment identifiers corresponding to the
encoded metadata fragments in a frontend storage element, wherein
the frontend storage element provides faster access to stored
content than backend storage elements; receiving a request from a
client application to access the data object; retrieving the
fragment identifiers corresponding to the encoded metadata
fragments from the frontend storage element, responsive to
receiving the request to access the data object from the client
application; retrieving the encoded metadata fragments from the
backend of the distributed encoded data storage system; decoding
the single metadata unit using the retrieved encoded metadata
fragments; retrieving the initial part of the data object from the
second section of the single metadata unit; and transmitting the
initial part of the data object to the client application to begin
processing the data object.
[0012] Other embodiments of this aspect include corresponding
computer systems, system means, apparatus, and computer programs
recorded on one or more computer storage devices, each configured
to perform the actions of the methods.
[0013] Some implementations may optionally include one or more of
the following features: that transmitting the initial part of the
data object to the client application to begin processing the data
object further comprises transmitting the initial part of the data
object to the client application to begin processing the data
object, prior to retrieving the encoded data fragments from the
backend of the distributed encoded data storage system. That
transmitting the initial part of the data object to the client
application to begin processing the data object further comprises
transmitting the initial part of the data object to the client
application to begin processing the data object, concurrently with
retrieving the encoded data fragments from the backend of the
distributed encoded data storage system. That to begin processing
the data object further comprises displaying a preview of the data
object on a screen of a computing device and/or launching an
end-user application for opening the data object. That the initial
part of the data object further comprises a header and a subsection
of content of the data object, and/or information concerning a
format of the data object and a subsection of contents of the data
object. That receiving a request from a client application to
access the data object further comprises receiving a read request
targeting the data object from the client application. The features
retrieving, for each one of the plurality of encoded data
fragments, the corresponding fragment identifier and associated
metadata from the first section of the single metadata unit;
retrieving encoded data fragments from the backend of the
distributed encoded data storage system, utilizing corresponding
fragment identifiers and associated metadata; and transmitting
encoded data fragments to the client application to decode the data
object, subsequently to transmitting the initial part of the data
object to the client application to begin processing the data
object. The features retrieving, for each one of the plurality of
encoded data fragments, the corresponding fragment identifier and
associated metadata from the first section of the single metadata
unit; retrieving encoded data fragments from the backend of the
distributed encoded data storage system, utilizing corresponding
fragment identifiers and associated metadata; decoding the data
object using the retrieved encoded data fragments; and transmitting
the data object to the client application, subsequently to
transmitting the initial part of the data object to the client
application to begin processing the data object. That the backend
storage elements further comprise electromechanical storage
devices. That the frontend storage element further comprises a
solid state storage device, such as flash memory and/or dynamic
random access memory. That providing a specific level of redundancy
further comprises: providing a predetermined level of storage
redundancy associated with the distributed encoded data storage
system. That metadata associated with each specific one of the
plurality of encoded data fragments further comprises a storage
location of the specific encoded data fragment and encoding
information concerning the specific encoded data fragment. The
features transmitting a copy of content of the frontend storage
element to the backend of the distributed encoded data storage
system for storage; retrieving the stored copy from the backend,
responsive to a loss of content of the frontend storage element;
and storing the retrieved copy on the frontend storage element. The
features caching content of at least one electromechanical backend
storage element on the solid state frontend storage element. That
in response to transmitting each one of the plurality of encoded
data fragments, receiving, for each one of the plurality of encoded
data fragments, a corresponding fragment identifier from the
backend of the distributed encoded data storage system. That in
response to transmitting each one of the plurality of encoded
metadata fragments, receiving, for each one of the plurality of
encoded metadata fragments, a corresponding fragment
identifier from the backend of the distributed encoded data storage
system. The feature that each encoded metadata fragment is of a
same fragment format as the encoded data fragments.
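Of the optional features listed above, the backup and restoration of the frontend content can be sketched very simply. The following is a minimal illustration only: the dictionaries standing in for the frontend and backend, the JSON serialization and the `frontend-backup` key are assumptions made here. In an actual system the copy would presumably be encoded and distributed across backend storage elements like any other stored content.

```python
import json


def backup_frontend(frontend: dict, backend: dict, key: str = "frontend-backup") -> None:
    """Transmit a copy of the frontend content to the backend for storage."""
    backend[key] = json.dumps(frontend).encode()


def restore_frontend(frontend: dict, backend: dict, key: str = "frontend-backup") -> None:
    """Retrieve the stored copy from the backend and put it back on the frontend."""
    frontend.clear()
    frontend.update(json.loads(backend[key]))


# Usage: the frontend maps each data object to its metadata fragment identifiers.
frontend = {"report": ["report/meta-0", "report/meta-1"]}
backend = {}
backup_frontend(frontend, backend)
frontend.clear()                      # simulate a loss of frontend content
restore_frontend(frontend, backend)
print(frontend["report"])
```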
[0014] Note that the above list of features is not all-inclusive
and many additional features and advantages are contemplated and
fall within the scope of the present disclosure. Moreover, the
language used in the present disclosure has been principally
selected for readability and instructional purposes, and not to
limit the scope of the subject matter disclosed herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] FIG. 1 is a diagram of a distributed encoded storage system
in which a data object retrieval manager can operate, according to
one embodiment.
[0016] FIG. 2 is a diagram illustrating the operation of a data
object retrieval manager, according to one embodiment.
[0017] FIG. 3 is a flowchart illustrating the operation of a data
object retrieval manager, according to one embodiment.
[0018] The Figures depict various embodiments for purposes of
illustration only. One skilled in the art will readily recognize
from the following discussion that alternative embodiments of the
structures and methods illustrated herein may be employed without
departing from the principles described herein.
DETAILED DESCRIPTION
[0019] The present disclosure describes technology, which may
include methods, systems, apparatuses, computer program products,
and other implementations, for executing read operations of data
objects in a distributed encoded storage system, in which
identifiers and metadata corresponding to each distributed encoded
fragment of a data object are stored in a single metadata unit,
which can be encoded into a plurality of metadata fragments. The
single metadata unit can contain the identifiers and associated
metadata for all of the encoded data fragments of the data object.
The metadata unit can also contain an initial part of the data
object itself. Each one of the plurality of encoded metadata
fragments can be transmitted to the backend of the distributed
encoded data storage system for distribution across the plurality
of backend storage elements, wherein the distribution of the
plurality of encoded metadata fragments across the plurality of
backend storage elements provides the same specific level of
redundancy as the encoded data fragments, and hence the underlying
data object. The identifiers of the encoded metadata fragments can
be associated with the data object and stored on a low latency
frontend storage device. These fragment identifiers can be quickly
retrieved, and used to retrieve the encoded metadata fragments from
the backend. The retrieved encoded metadata fragments can be used
to decode the single metadata unit, and the initial part of the
data object in the metadata unit can be used to begin the
fulfillment of a read request targeting the data object, while the
identifiers and metadata in the metadata unit are being used to
retrieve the encoded data fragments and decode the data object.
This enables the requesting client application to begin processing
the contents of the data object, without the delays associated with
downloading sufficient encoded data fragments to allow for the
decoding operation. Therefore, the end-user perceived latency of
read operations is reduced considerably, and the end-user
perceived responsiveness of the distributed storage system
generally is improved.
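The ordering that produces this latency benefit can be sketched as follows: once the metadata unit has been decoded, the initial part is sent to the client while the encoded data fragments are still being retrieved in the background. This is a minimal sketch under stated assumptions; the function `serve_read`, its callback parameters, the dictionary layout of the metadata unit and the use of a thread pool are illustrative choices made here, not details of the disclosure.

```python
from concurrent.futures import ThreadPoolExecutor


def serve_read(metadata_unit: dict, fetch_fragment, decode_object, send) -> None:
    """Send the initial part first, then the full object once it is decoded.

    metadata_unit: assumed layout {"section1": [{"id": ...}, ...], "section2": bytes}
    fetch_fragment: callable retrieving one encoded data fragment by identifier
    decode_object: callable rebuilding the data object from the fragments
    send: callable delivering (kind, payload) to the client application
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Start retrieving the encoded data fragments in the background ...
        pending = [pool.submit(fetch_fragment, rec["id"])
                   for rec in metadata_unit["section1"]]
        # ... and hand over the initial part without waiting for them.
        send("initial", metadata_unit["section2"])
        fragments = [future.result() for future in pending]
    send("full", decode_object(fragments))


# Usage with stand-ins: a dict plays the backend, concatenation plays the decoder.
events = []
store = {"f0": b"HEADER+", "f1": b"BODY"}
unit = {"section1": [{"id": "f0"}, {"id": "f1"}], "section2": b"HEAD"}
serve_read(unit, store.__getitem__, b"".join,
           lambda kind, payload: events.append((kind, payload)))
print(events)  # [('initial', b'HEAD'), ('full', b'HEADER+BODY')]
```

The client therefore receives the initial part as soon as the metadata unit is available, regardless of how long the backend takes to return the data fragments.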
[0020] FIG. 1 illustrates an exemplary datacenter 109 in a
distributed encoded storage system 100 in which a data object
retrieval manager 101 can operate, according to one embodiment. In
the illustrated distributed encoded storage system 100, datacenter
109 comprises storage servers 105A, 105B and 105N, which are
communicatively coupled via a network 107. A data object retrieval
manager 101 is illustrated as residing on storage server 105A. It
is to be understood that the data object retrieval manager 101 can
reside on more, fewer or different computing devices, and/or can be
distributed between multiple computing devices, as desired. In FIG.
1, storage server 105A is further depicted as having storage
devices 160A(1)-(N) attached, storage server 105B is further
depicted as having storage devices 160B(1)-(N) attached, and
storage server 105N is depicted with storage devices 160N(1)-(N)
attached. It is to be understood that storage devices 160A(1)-(N),
160B(1)-(N) and 160N(1)-(N) can be instantiated as
electromechanical storage such as hard disks, solid state storage
such as flash memory, other types of storage media, and/or
combinations of these.
[0021] Although three storage servers 105A-N each coupled to three
devices 160(1)-(N) are illustrated for visual clarity, it is to be
understood that the storage servers 105A-N can be in the form of
rack mounted computing devices, and datacenters 109 can comprise
many large storage racks each housing a dozen or more storage
servers 105, hundreds of storage devices 160 and a fast network
107. It is further to be understood that although FIG. 1 only
illustrates a single datacenter 109, a distributed encoded storage
system can comprise multiple datacenters, including datacenters
located in different cities, countries and/or continents.
[0022] It is to be understood that although the embodiment
described in conjunction with FIGS. 2-3 is directed to object
storage, in other embodiments the data object retrieval manager 101
can operate in the context of other storage architectures. As an
example of another possible storage architecture according to some
embodiments, server 105A is depicted as also being connected to a
SAN fabric 170 which supports access to storage devices 180(1)-(N).
Intelligent storage array 190 is also shown as an example of a
specific storage device accessible via SAN fabric 170. As noted
above, SAN 170 is shown in FIG. 1 only as an example of another
possible architecture to which the data object retrieval manager
101 might be applied in another embodiment. In yet other
embodiments, shared storage can be implemented using FC and iSCSI
(not illustrated) instead of a SAN fabric 170.
[0023] Turning to FIG. 2, in one example embodiment, the data
object retrieval manager 101 executes a fast retrieval of a data
object 201 stored as a plurality of encoded data fragments 203 on
the backend 211 of the distributed encoded data storage system 100,
redundantly distributed across multiple backend storage elements
213. As described in detail in related application WDA-3569-US, the
data object retrieval manager 101 first divides a data object 201
(for example of size 256 MB, although data objects 201 can be of
different sizes in different embodiments) into a plurality of data
pieces 202. FIG. 2 illustrates the data object 201 being divided
into four data pieces (202A, 202B, 202C and 202D) for clarity of
illustration. Typically, a data object 201 would be divided into
(many) more data pieces 202 in practice. For example, each data
piece 202 could be of size 1 MB, although data pieces 202 can be of
different sizes in different embodiments.
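By way of non-limiting illustration, the division of a data object
201 into data pieces 202 can be sketched as follows. The function
names split_into_pieces and piece_count are hypothetical and are not
taken from this disclosure; the 1 MB default mirrors the example
piece size given above.

```python
from typing import List

MB = 1024 * 1024  # example piece size from the description


def split_into_pieces(data: bytes, piece_size: int = MB) -> List[bytes]:
    """Divide a data object into consecutive data pieces.

    Every piece is piece_size bytes long except possibly the last one.
    """
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_count(object_size: int, piece_size: int = MB) -> int:
    """Number of pieces produced for an object of object_size bytes."""
    return -(-object_size // piece_size)  # ceiling division


# Small stand-in values keep the example cheap to run.
pieces = split_into_pieces(b"abcdefghij", piece_size=4)
```

Under these assumptions, a 256 MB data object with 1 MB data pieces
yields 256 pieces, rather than the four shown in FIG. 2.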
[0024] The data object retrieval manager 101 encodes each one of
these data pieces 202 into a plurality of corresponding encoded
data fragments 203, which are distributed across multiple backend
storage elements 213 to provide a given level of redundancy. For
example, FIG. 2 illustrates each separate data piece 202A-D being
encoded into three separate encoded data fragments, e.g., 203A1,
203A2 and 203A3, etc. It is to be understood that three is just an
example number of data fragments 203, and data pieces 202 can be
divided into more (or fewer) data fragments 203 as desired. It is
to be further understood that the specific encoding format used to
encode the data pieces 202 into encoded data fragments 203 can vary
between embodiments, as can the size of the encoded data fragments
203.
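Because the encoding format is left open, the following is only one
possible sketch of encoding a data piece 202 into three fragments
203. It assumes a simple XOR parity scheme (two halves plus their
parity), which is not stated in this disclosure; any two of the
three fragments suffice to rebuild the piece. The names encode_piece
and decode_piece are hypothetical.

```python
from typing import List, Optional


def _xor(left: bytes, right: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(left, right))


def encode_piece(piece: bytes) -> List[bytes]:
    """Encode a piece into three fragments: two halves and their parity."""
    half = (len(piece) + 1) // 2
    first = piece[:half]
    second = piece[half:].ljust(half, b"\x00")  # pad to equal length
    return [first, second, _xor(first, second)]


def decode_piece(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the piece; at most one fragment may be missing (None)."""
    if sum(f is None for f in fragments) > 1:
        raise ValueError("at least two of the three fragments are required")
    first, second, parity = fragments
    if first is None:
        first = _xor(second, parity)
    elif second is None:
        second = _xor(first, parity)
    return (first + second)[:length]  # drop any padding


piece = b"hello world"
fragments = encode_piece(piece)
```

In this sketch the original piece length is passed to decode_piece,
which corresponds to the kind of encoding information that would be
kept in metadata 207.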
[0025] These encoded data fragments 203 are provided to the backend
211 of the distributed encoded data storage system 100, where they
are redundantly distributed across multiple backend storage
elements 213, thereby providing a specific level of redundancy
(e.g., a predetermined level of storage redundancy associated with
the distributed encoded data storage system 100). It is to be
understood that low latency frontend storage elements 209 may be in
the form of SSDs such as flash memory devices, whereas backend
storage elements 213 may be in the form of hard disks or other
forms of electromechanical storage devices. Because solid state
storage has significantly faster access times than the
electromechanical storage, frontend storage elements 209 typically
provide faster access to stored content than backend storage
elements 213. In addition, solid state storage is significantly
more expensive per megabyte than electromechanical storage.
[0026] For each encoded data fragment 203 transmitted to the
backend 211 for storage, the backend 211 returns a corresponding
fragment identifier 205 (as noted above, in other embodiments
fragment identifiers can be generated/selected on the frontend and
provided to the backend for use). In addition, metadata 207
concerning each encoded data fragment 203 stored on the backend 211
is generated, such as the storage location of the specific encoded
data fragment 203 (e.g., information enabling the subsequent
retrieval of the encoded data fragment 203), encoding information
concerning the specific encoded data fragment 203 (e.g., the
encoding format used and any relevant parameters), relationship
information between the specific encoded data fragment 203 and its
corresponding data piece 202, relationship information between the
specific encoded data fragment 203 and other encoded data fragments
203 and/or relationship information between the multiple data
pieces 202 of the data object 201 (e.g., information enabling the
reconstruction of the underlying data object 201 from the plurality
of fragments 203), etc. It is to be understood that the specific
format and/or content of metadata 207 concerning encoded data
fragments 203 can vary between embodiments.
[0027] For each encoded data fragment 203, the data object
retrieval manager 101 stores the corresponding fragment identifier
205 and associated metadata 207 in a single metadata unit 214,
which is then encoded into a plurality of encoded metadata
fragments, e.g., 215A, 215B and 215C. It is to be understood that
the metadata unit 214 is analogous to a data piece 202, and the
encoded metadata fragments 215 can be of the same fragment format
as the encoded data fragments 203. In addition, the metadata unit
214 can be the same size (e.g., 1 MB) as the data pieces 202. In
one embodiment, in order to construct the metadata unit 214, as
fragment identifiers 205 are returned from the backend 211, the
fragment identifiers 205 and corresponding metadata 207 are stored in
a temporary buffer. The contents of the temporary buffer are then
copied into the single metadata unit 214.
[0028] The fragment identifiers 205 and associated metadata 207 can
be stored in a first section of the metadata unit 214, and an
initial part of the data object 201 can be stored in a second
section of the metadata unit 214. For example, in an embodiment in
which the size of the metadata unit 214 is, e.g., 1 MB, a first
0.5 MB section of the metadata unit 214 could be used to store the fragment
identifiers 205 and metadata 207 associated with the various
encoded data fragments 203, and a second 0.5 MB section of the
metadata unit 214 could be used to store the initial part of the
data object 201. (One megabyte is just an example of a possible
size for the metadata unit 214, and other sizes can be used in
other embodiments). Continuing with the example described above,
the metadata unit 214 would contain both the identifiers 205 and
metadata 207 of all the encoded data fragments 203, and the initial
part of the data object 201. Note that the initial part of the data
object 201 may comprise a header and additional content. When a
client application 217 makes a request to access the data object
201 (e.g., a read operation), the data object retrieval manager 101
can quickly provide the initial part of the data object
201 (e.g., header and content) to the requesting client application
217, and the client application 217 can begin processing the
initial part of the data object 201 while the various encoded data
fragments 203 of the data object 201 are being retrieved from the
backend 211.
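The two-section layout described above can be sketched as follows.
This is a minimal illustration only: the JSON serialization, the
4-byte length prefixes, the zero padding and the names
build_metadata_unit and parse_metadata_unit are all assumptions made
for the example, as the disclosure does not fix a byte-level format.

```python
import json
import struct
from typing import Dict, List, Tuple

UNIT_SIZE = 1024 * 1024          # example size of the metadata unit
FIRST_SECTION_SIZE = 512 * 1024  # example: half for identifiers/metadata
_LEN = struct.Struct(">I")       # assumed 4-byte big-endian length prefix


def build_metadata_unit(entries: List[Dict[str, str]], data: bytes) -> bytes:
    """Pack identifiers/metadata and the initial part of the object."""
    index = json.dumps(entries).encode("utf-8")
    if _LEN.size + len(index) > FIRST_SECTION_SIZE:
        raise ValueError("identifiers and metadata exceed the first section")
    first = (_LEN.pack(len(index)) + index).ljust(FIRST_SECTION_SIZE, b"\x00")
    second_size = UNIT_SIZE - FIRST_SECTION_SIZE
    initial = data[:second_size - _LEN.size]  # as much as fits
    second = (_LEN.pack(len(initial)) + initial).ljust(second_size, b"\x00")
    return first + second


def parse_metadata_unit(unit: bytes) -> Tuple[List[Dict[str, str]], bytes]:
    """Return (identifiers/metadata entries, initial part of the object)."""
    (index_len,) = _LEN.unpack_from(unit, 0)
    index = unit[_LEN.size:_LEN.size + index_len]
    entries = json.loads(index.decode("utf-8"))
    (initial_len,) = _LEN.unpack_from(unit, FIRST_SECTION_SIZE)
    start = FIRST_SECTION_SIZE + _LEN.size
    return entries, unit[start:start + initial_len]


entries = [{"id": "frag-0", "location": "element-3", "piece": "0"}]
unit = build_metadata_unit(entries, b"HEADER" + b"x" * 100)
```

The unit built this way always has the fixed example size, so it can
be encoded into metadata fragments 215 in the same manner as a data
piece 202 of that size.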
[0029] More specifically, before or simultaneously with the
retrieval of the data object 201 from the backend 211, the initial
part of the data object 201 can be quickly retrieved from the
metadata unit 214 and provided to the requesting client application
217. The client application 217 can then use the, e.g., header and
initial content of the data object 201, to begin processing the
data object 201 as described in more detail below, for example in
order to show a suitable preview of the data object 201 on the
screen of a computing device, to launch a suitable end-user
application for opening the data object 201 (e.g., a word
processor, spreadsheet program, graphics program, database client
or other end-user application as desired), etc. It is to be
understood that as used herein, the language "first section" and
"second section" of the metadata unit 214 does not denote that the
sections are positioned within the metadata unit 214 in a specific
order, but only that these are two specific sections thereof. It is
also to be understood that although the above example describes the
two sections of the metadata unit 214 as being of equal size, in
some embodiments more or less space can be used for one section or
the other as desired. In addition, the exact format and content of
the initial part of the data object 201 can vary between
embodiments, but may contain a header and/or other information
concerning the format of the data object 201, as well as a subset
of the substantive contents of the data object 201 itself.
[0030] The data object retrieval manager 101 transmits the multiple
encoded metadata fragments 215 encoded from the metadata unit 214
to the backend 211 of the distributed encoded data storage system
100 for storage, as was done with the encoded data fragments 203.
In response to transmitting each encoded metadata fragment 215, a
corresponding fragment identifier 205 is received from the backend
211. In other words, the backend 211 returns a fragment identifier
205 for each encoded metadata fragment 215, as it did for the
encoded data fragments 203 (as noted above, in other embodiments
fragment identifiers can be generated/selected on the frontend and
provided to the backend for use). And like the encoded data
fragments 203, the backend 211 redundantly distributes the encoded
metadata fragments 215 across multiple backend storage elements
213, thereby providing the same level of redundancy for the encoded
metadata fragments 215 as for the encoded data fragments 203 (e.g.,
the predetermined level of storage redundancy associated with the
distributed encoded data storage system 100).
[0031] The metadata fragment identifiers 205 are associated with
the data object 201, and stored on a frontend storage element 209,
which as noted above provides faster access to stored content than
backend storage elements 213. It is to be understood that the
metadata fragment identifiers 205, which are just the fragment
identifiers 205 that identify the encoded metadata fragments 215,
are much smaller (for example, 16 bytes) than the identifiers 205
and associated metadata 207 (e.g., hundreds of kilobytes)
corresponding to all of the encoded data fragments 203. Thus, the
space consumed per data object 201 of the fast, low latency,
expensive frontend storage elements 209 is reduced.
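The saving can be made concrete with a short calculation. The
figures below are assumptions chosen for illustration: 16 bytes is
taken as the size of each metadata fragment identifier, three
metadata fragments 215 are assumed per data object, and 300 KB is
used as a stand-in for "hundreds of kilobytes".

```python
ID_SIZE = 16                     # example identifier size, in bytes
CONVENTIONAL_BYTES = 300 * 1024  # assumed stand-in for "hundreds of KB"


def frontend_bytes(num_metadata_fragments: int, id_size: int = ID_SIZE) -> int:
    """Frontend space per data object when only identifiers are kept."""
    return num_metadata_fragments * id_size


reduced = frontend_bytes(3)                      # three metadata fragments
reduction_factor = CONVENTIONAL_BYTES / reduced  # how many times smaller
```

With these assumed values, 48 bytes are held on the frontend per
data object instead of 307,200 bytes, a reduction by a factor of
6,400.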
[0032] Because the fragment identifiers 205 that identify the
metadata fragments 215 are stored on fast, low latency frontend
storage 209, they can be retrieved quickly. Because the metadata
fragment identifiers 205 are linked to the data object 201, they
can be used to retrieve the data object 201 from the backend 211.
More specifically, when the data object retrieval manager 101
receives a read (or other access) request targeting the data object
201 from a client application 217, the data object retrieval
manager 101 can retrieve the metadata fragment identifiers 205
associated with the data object 201 from a low latency frontend
storage element 209, and use the metadata fragment identifiers 205
to retrieve the corresponding metadata fragments 215 from the
backend 211. The data object retrieval manager 101 can then decode
the single metadata unit 214 using the retrieved encoded metadata
fragments 215. As explained above, this metadata unit 214 contains
the identifiers 205 and associated metadata 207 for all of the
encoded data fragments 203 of the data object 201.
[0033] The data object retrieval manager 101 can use the encoded
data fragment identifiers 205 and associated metadata 207 to
retrieve the encoded data fragments 203 of the data object 201 from
the backend 211. The metadata unit 214 also contains the initial
part of the data object 201. As described above, the data object
retrieval manager 101 can retrieve the initial part of the data
object 201 from the metadata unit 214, and transmit the initial
part of the data object 201 to the requesting client application
217 (e.g., the client application 217 that executed the read
operation targeting the data object 201). In other words, the data
object retrieval manager 101 can use the initial part of the data
object 201 to begin the fulfillment of an access request, before or
while the encoded data fragments 203 are being retrieved from the
backend 211. More specifically, during a retrieval operation of the
data object 201, the metadata unit 214 is retrieved, and the
initial part of the data object 201, which is contained in the
metadata unit 214, can be provided immediately to the requesting
client application 217. In this way the client application 217 can
begin processing the data object 201 with reduced latency, because
the initial part of the data object 201 can be provided to the
client application 217 before or concurrently with the retrieval of
the encoded data fragments 203.
[0034] As explained above, this initial portion of the data object
201 may comprise a header and/or other suitable information for
initiating the processing of the data object 201, as well as
content that can be used for, e.g., operations such as displaying a
preview or launching a suitable end-user application, etc. While or
after this occurs, the data object retrieval manager 101 can
retrieve the corresponding fragment identifiers 205 and associated
metadata 207 for the encoded data fragments 203 from the first
section of the single metadata unit 214, and use this information
to retrieve the encoded data fragments 203 from the backend 211 of
the distributed encoded data storage system 100. In due course, the
data object retrieval manager 101 can decode the data object 201
using the encoded data fragments 203, and provide the data object
201 to the requesting client application 217. In another
embodiment, the data object retrieval manager 101 transmits the
encoded data fragments 203 to the client application 217 to decode
the data object 201. Either way, the transmission of the data
object 201 or encoded data fragments 203 to the client application
217 can be subsequent to the transmission of the initial part of
the data object 201. Because the initial part of the data object
201 can be provided to the client application 217 after retrieving
the encoded metadata fragments 215 from the backend 211 and
decoding the metadata unit 214, the initial part of the data object
201 can be provided to and processed by the client application 217
much more quickly than the data object 201 as a whole.
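The ordering described above, in which the initial part is sent
while the data fragments are still being fetched, can be sketched
with a thread pool. This is an illustration under simplifying
assumptions: each fetched fragment is treated as an already decoded
piece, and serve_read, fetch_fragment and send are hypothetical
names rather than interfaces defined by this disclosure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def serve_read(
    initial_part: bytes,
    fragment_ids: List[str],
    fetch_fragment: Callable[[str], bytes],
    send: Callable[[bytes], None],
) -> None:
    """Send the initial part first, then the full object once fetched."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Start backend retrievals in the background.
        pending = [pool.submit(fetch_fragment, fid) for fid in fragment_ids]
        # The initial part comes from the decoded metadata unit, so it
        # can be sent without waiting for the retrievals to finish.
        send(initial_part)
        body = b"".join(future.result() for future in pending)
    send(body)


backend: Dict[str, bytes] = {"frag-0": b"HEADER", "frag-1": b"-rest of object"}
sent: List[bytes] = []
serve_read(b"HEADER", ["frag-0", "frag-1"], backend.__getitem__, sent.append)
```

In the usage shown, the list sent receives the initial part before
the complete object, regardless of how long the fetches take.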
[0035] Because the amount of frontend storage space utilized is
minimal, content stored on the backend 211 (e.g., encoded data
fragments 203) can be cached to frontend storage in one embodiment,
thereby improving overall performance. In other words, a given
amount of low latency frontend storage space can be used to cache
backend data such as encoded data fragments 203, thereby enabling
faster access times when cached data are retrieved, for example
during a read operation. The amount of frontend storage space to
utilize as a cache in this context is a variable design
parameter.
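A frontend cache of this kind can be sketched as follows. The
least-recently-used eviction policy and the name FragmentCache are
assumptions made for the example; the disclosure leaves both the
policy and the cache size open.

```python
from collections import OrderedDict
from typing import Callable


class FragmentCache:
    """Hypothetical LRU cache of backend fragments held on the frontend."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        self.capacity = capacity  # the variable design parameter
        self._fetch = fetch       # retrieves a fragment from the backend
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.backend_reads = 0

    def get(self, fragment_id: str) -> bytes:
        if fragment_id in self._entries:
            self._entries.move_to_end(fragment_id)  # mark as recently used
            return self._entries[fragment_id]
        value = self._fetch(fragment_id)
        self.backend_reads += 1
        self._entries[fragment_id] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value


backend = {"a": b"1", "b": b"2", "c": b"3"}
cache = FragmentCache(capacity=2, fetch=backend.__getitem__)
results = [cache.get(key) for key in ("a", "a", "b", "c", "a")]
```

In the usage shown, the second request for "a" is served from the
cache, while the final request goes back to the backend because "a"
was evicted when "c" was added.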
[0036] In one embodiment, low latency frontend storage elements 209
may be in the form of media other than flash memory, such as
dynamic random access memory (DRAM). DRAM is typically faster but
more expensive than flash memory. Unlike flash memory, DRAM is
volatile, quickly losing its data without power. In one embodiment
a copy of the contents of a frontend storage element 209 (e.g., the
metadata fragment identifiers 205, which are small in size) can be
provided to the backend 211 for storage. This enables the contents
of the frontend storage element 209 to be rebuilt from the copy
stored on the backend 211, for example in the case where the
contents of the frontend storage are lost due to, e.g., a lapse in
supplied power to volatile storage elements such as DRAM.
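A minimal sketch of such a rebuild is given below. It assumes the
frontend contents are a mapping from data object names to metadata
fragment identifiers and that the copy is kept under a single
backend key; the names snapshot_frontend, rebuild_frontend and
SNAPSHOT_KEY are hypothetical.

```python
import json
from typing import Dict, List

SNAPSHOT_KEY = "frontend-snapshot"  # hypothetical backend key


def snapshot_frontend(
    frontend: Dict[str, List[str]], backend: Dict[str, bytes]
) -> None:
    """Store a copy of the frontend contents on the backend."""
    backend[SNAPSHOT_KEY] = json.dumps(frontend).encode("utf-8")


def rebuild_frontend(backend: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Recreate the frontend contents from the backend copy."""
    return json.loads(backend[SNAPSHOT_KEY].decode("utf-8"))


frontend = {"object-1": ["meta-0", "meta-1", "meta-2"]}
backend: Dict[str, bytes] = {}
snapshot_frontend(frontend, backend)
frontend = {}  # simulate loss of volatile frontend contents
frontend = rebuild_frontend(backend)
```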
[0037] Note that even where the metadata fragment identifiers 205
are distributed across multiple frontend storage elements for
redundancy (e.g., multiple frontend storage elements 209 per
datacenter 109, or across multiple datacenters 109), the total
amount of low latency frontend storage space utilized is still
orders of magnitude less than in conventional systems in which all
of the metadata 207 is stored on the frontend 209.
[0038] In addition, the encoded metadata fragments 215 can be
processed by the backend 211 in the same or a similar way as any
other encoded fragment (e.g., the encoded data fragments 203).
Thus, the encoded metadata fragments 215 can be stored on the
backend 211 in a way that provides the same level of redundancy as
is provided for the data object 201.
[0039] Thus, the use of the data object retrieval manager 101 as
described above ensures that the storage capacity per data object
201 for metadata 207 on expensive low latency storage elements 209
with a high level of responsiveness is reduced, without
compromising system level responsiveness. The overall redundancy
level of the metadata 207 and scalability are also improved. The
client application 217 does not experience the delays
conventionally associated with downloading sufficient encoded data
fragments 203 to allow for a decoding operation in order to
initiate processing of the data object 201. Therefore, the end-user
perceived latency of a read operation is significantly reduced.
[0040] FIG. 3 is a flowchart illustrating steps that may be
performed by the data object retrieval manager 101, according to
one embodiment. The data object retrieval manager 101 encodes 301 a
data object 201 into a plurality of encoded data fragments 203. The
data object retrieval manager 101 transmits 303 the plurality of
encoded data fragments 203 to the backend 211 of the distributed
encoded data storage system 100, for distribution across a
plurality of backend storage elements 213, such that the
distribution provides a specific level of redundancy. The data
object retrieval manager 101 may receive 305 a corresponding
fragment identifier 205 for each one of the plurality of encoded
data fragments 203, from the backend 211 of the distributed encoded
data storage system 100 (as noted above, in other embodiments
fragment identifiers can be generated/selected on the frontend and
provided to the backend for use). The data object retrieval manager
101 stores 307 the corresponding fragment identifier 205 and
associated metadata 207 for each one of the plurality of encoded
data fragments 203 in a first section of a single metadata unit
214. The data object retrieval manager 101 also stores 309 an
initial part of the data object 201 in a second section of the
metadata unit 214. The data object retrieval manager 101 encodes
310 the single metadata unit 214 into a plurality of encoded
metadata fragments 215 (note that metadata fragments 215 may but
need not be of the same common fragment format as the encoded data
fragments 203). The data object retrieval manager 101 transmits 311
each one of the plurality of encoded metadata fragments 215 to the
backend 211 of the distributed encoded data storage system 100,
where it is stored so as to provide the same specific level of
redundancy as the encoded data fragments 203, and hence the
underlying data object 201. In response to transmitting each one of
the plurality of encoded metadata fragments 215, the data object
retrieval manager 101 receives 313 a corresponding fragment
identifier 205 from the backend 211 of the distributed encoded data
storage system 100 (once again, as noted above, in other
embodiments fragment identifiers can be generated/selected on the
frontend and provided to the backend for use). The data object
retrieval manager 101 associates 315 the fragment identifiers 205
corresponding to the encoded metadata fragments 215 with the data
object 201, and stores 317 these fragment identifiers 205 on a low
latency frontend storage element 209, from which they can be
quickly retrieved and used to obtain the data object 201. As
explained above, frontend storage elements 209 (e.g., SSDs) can
provide faster access to stored content than backend storage
elements 213 (e.g., hard disks). The data object retrieval manager
101 receives 319 a request from a client application 217 to access
the data object 201, and in response retrieves 321 the fragment
identifiers 205 from the frontend storage element 209. The data
object retrieval manager 101 proceeds to retrieve 323 the encoded
metadata fragments 215 from the backend 211 of the distributed
encoded data storage system 100. The data object retrieval manager
101 then decodes 324 the single metadata unit 214 using the
retrieved encoded metadata fragments 215, and retrieves 325 the
initial part of the data object 201 from the second section of the
single metadata unit 214. The data object retrieval manager 101
then transmits 327 the initial part of the data object 201 to the
client application 217 to begin processing the data object 201.
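The steps of FIG. 3 can be traced end to end in the following
sketch, with the reference numerals of the steps noted in comments.
It is an illustration under stated assumptions, not the claimed
implementation: plain replication stands in for the encoding, the
metadata unit is a JSON document, the frontend is a dictionary, and
the names Backend, store_object and read_object are hypothetical.

```python
import json
from typing import Dict, Iterator, List, Tuple

REPLICAS = 3  # assumption: replication stands in for the real encoding


class Backend:
    """Hypothetical backend that returns an identifier per fragment."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, fragment: bytes) -> str:
        fragment_id = "frag-%d" % len(self._store)
        self._store[fragment_id] = fragment
        return fragment_id

    def get(self, fragment_id: str) -> bytes:
        return self._store[fragment_id]


def store_object(name: str, data: bytes, backend: Backend,
                 frontend: Dict[str, List[str]],
                 piece_size: int = 8, initial_size: int = 6) -> None:
    # 301-305: encode, transmit, and receive identifiers per fragment.
    data_ids = []
    for offset in range(0, len(data), piece_size):
        piece = data[offset:offset + piece_size]
        data_ids.append([backend.put(piece) for _ in range(REPLICAS)])
    # 307-309: identifiers in one section, initial part in another.
    unit = json.dumps(
        {"fragments": data_ids, "initial": data[:initial_size].hex()}
    ).encode("utf-8")
    # 310-313: encode and transmit the metadata unit, receive identifiers.
    metadata_ids = [backend.put(unit) for _ in range(REPLICAS)]
    # 315-317: associate the identifiers with the object on the frontend.
    frontend[name] = metadata_ids


def read_object(name: str, backend: Backend,
                frontend: Dict[str, List[str]]) -> Iterator[Tuple[str, bytes]]:
    metadata_ids = frontend[name]                            # 319-321
    unit = json.loads(backend.get(metadata_ids[0]).decode("utf-8"))  # 323-324
    yield "initial", bytes.fromhex(unit["initial"])          # 325-327
    # Not a numbered step in FIG. 3: retrieval of the full data object.
    pieces = [backend.get(ids[0]) for ids in unit["fragments"]]
    yield "object", b"".join(pieces)


backend = Backend()
frontend: Dict[str, List[str]] = {}
payload = b"HEADER: body of the stored data object"
store_object("doc", payload, backend, frontend)
events = list(read_object("doc", backend, frontend))
```

In the usage shown, the frontend holds only three short identifiers
for the object, and the reader receives the initial part before the
complete data object.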
[0041] FIGS. 1-2 illustrate a data object retrieval manager 101
residing on a single storage server 105. It is to be understood
that this is just an example. The functionalities of the data
object retrieval manager 101 can be implemented on other computing
devices in other embodiments, or can be distributed between
multiple computing devices. It is to be understood that although
the data object retrieval manager 101 is illustrated in FIG. 1 as a
standalone entity, the illustrated data object retrieval manager
101 represents a collection of functionalities, which can be
instantiated as a single or multiple modules on one or more
computing devices as desired.
[0042] It is to be understood that the data object retrieval manager 101
can be instantiated as one or more modules (for example as object
code or executable images) within the system memory (e.g., RAM,
ROM, flash memory) of any computing device, such that when the
processor of the computing device processes a module, the computing
device executes the associated functionality. As used herein, the
terms "computer system," "computer," "client," "client computer,"
"server," "server computer" and "computing device" mean one or more
computers configured and/or programmed to execute the described
functionality. Additionally, program code to implement the
functionalities of the data object retrieval manager 101 can be
stored on computer-readable storage media. Any form of tangible
computer readable storage medium can be used in this context, such
as magnetic or optical storage media. As used herein, the term
"computer readable storage medium" does not mean an electrical
signal separate from an underlying physical medium.
[0043] Throughout this specification, plural instances may
implement components, operations, or structures described as a
single instance. Although individual operations of one or more
methods are illustrated and described as separate operations, one
or more of the individual operations may be performed concurrently,
and nothing requires that the operations be performed in the order
illustrated. Structures and functionality presented as separate
components in example configurations may be implemented as a
combined structure or component. Similarly, structures and
functionality presented as a single component may be implemented as
separate components. These and other variations, modifications,
additions, and improvements fall within the scope of the subject
matter herein.
[0044] The embodiments illustrated herein are described in enough
detail to enable the disclosed teachings to be practiced. Other
embodiments may be used and derived therefrom, such that structural
and logical substitutions and changes may be made without departing
from the scope of this disclosure. The Detailed Description,
therefore, is not to be taken in a limiting sense, and the scope of
various embodiments is defined by the below claims, along with the
full range of equivalents to which such claims are entitled.
[0045] As used herein, the term "or" may be construed in either an
inclusive or exclusive sense. Moreover, plural instances may be
provided for resources, operations, or structures described herein
as a single instance. Additionally, boundaries between various
resources, operations, modules, engines, and data stores are
somewhat arbitrary, and particular operations are illustrated in a
context of specific illustrative configurations. Other allocations
of functionality are envisioned and may fall within a scope of
various embodiments of the present disclosure. In general,
structures and functionality presented as separate resources in the
example configurations may be implemented as a combined structure
or resource. Similarly, structures and functionality presented as a
single resource may be implemented as separate resources. These and
other variations, modifications, additions, and improvements fall
within a scope of embodiments of the present disclosure as
represented by the appended claims. The specification and drawings
are, accordingly, to be regarded in an illustrative rather than a
restrictive sense.
[0046] The foregoing description, for the purpose of explanation,
has been described with reference to specific example embodiments.
The illustrative discussions above are not intended to be
exhaustive or to limit the possible example embodiments to the
precise forms disclosed. Many modifications and variations are
possible in view of the above teachings. The example embodiments
were chosen and described in order to best explain the principles
involved and their practical applications, to thereby enable others
to best utilize the various example embodiments with various
modifications as are suited to the particular use contemplated.
[0047] Note that, although the terms "first," "second," and so
forth may be used herein to describe various elements, these
elements are not to be limited by these terms. These terms are only
used to distinguish one element from another. For example, a first
contact could be termed a second contact, and, similarly, a second
contact could be termed a first contact, without departing from the
scope of the present example embodiments. The first contact and the
second contact are both contacts, but they are not the same
contact.
[0048] The terminology used in the description of the example
embodiments herein is for describing particular example embodiments
only and is not intended to be limiting. As used in the description
of the example embodiments and the appended claims, the singular
forms "a," "an," and "the" are intended to include the plural forms
as well, unless the context clearly indicates otherwise. Also note
that the term "and/or" as used herein refers to and encompasses any
and/or all possible combinations of one or more of the associated
listed items. Furthermore, the terms "comprises" and/or
"comprising," when used in this specification, specify the presence
of stated features, integers, blocks, steps, operations, elements,
and/or components, but do not preclude the presence or addition of
one or more other features, integers, blocks, steps, operations,
elements, components, and/or groups thereof.
[0049] As used herein, the term "if" may be construed to mean
"when" or "upon" or "in response to determining" or "in response to
detecting," depending on the context. Similarly, the phrase "if it
is determined" or "if [a stated condition or event] is detected"
may be construed to mean "upon determining" or "in response to
determining" or "upon detecting [the stated condition or event]" or
"in response to detecting [the stated condition or event],"
depending on the context.
[0050] As will be understood by those skilled in the art, the
invention may be embodied in other specific forms without departing
from the spirit or essential characteristics thereof. Likewise, the
particular naming and division of the portions, modules, servers,
managers, components, functions, procedures, actions, layers,
features, attributes, methodologies, data structures and other
aspects are not mandatory or significant, and the mechanisms that
implement the invention or its features may have different names,
divisions and/or formats. The foregoing description, for the
purpose of explanation, has been described with reference to
specific embodiments. However, the illustrative discussions above
are not intended to be exhaustive or limiting to the precise forms
disclosed. Many modifications and variations are possible in view
of the above teachings. The embodiments were chosen and described
in order to best explain relevant principles and their practical
applications, to thereby enable others skilled in the art to best
utilize various embodiments with or without various modifications
as may be suited to the particular use contemplated.
* * * * *