U.S. patent application number 12/499489 was filed with the patent office on 2009-07-08 for billing system for information dispersal system and published on 2010-03-11. This patent application is currently assigned to CLEVERSAFE, INC. Invention is credited to MATTHEW M. ENGLAND, S. CHRISTOPHER GLADWIN, ZACHARY J. MARK, SEJAL KUMARBHAI MODI, JOSHUA J. MULLIN, VANCE T. THORNTON.

Application Number: 20100063911 / 12/499489
Family ID: 38610029
Filed Date: 2009-07-08

United States Patent Application 20100063911
Kind Code: A1
GLADWIN; S. CHRISTOPHER; et al.
March 11, 2010
BILLING SYSTEM FOR INFORMATION DISPERSAL SYSTEM
Abstract
An apparatus includes a processing module and a network
interface. The processing module is operably coupled to: access
user level metadata based on an account identifier to identify a
plurality of files associated with a user and retrieve user level
metadata for the plurality of files; access file level metadata
associated with the plurality of files to retrieve, for each of the
plurality of files, file level metadata and determine, for each of
the plurality of files, a plurality of file slices associated with
a corresponding file of the plurality of files; and generate
billing transaction information based on the user level metadata
and the file level metadata of the plurality of files. The network
interface is operably coupled to: convert the billing transaction
information into a network billing transaction information message
and transmit the network billing transaction information message.
Inventors: GLADWIN; S. CHRISTOPHER; (CHICAGO, IL); ENGLAND; MATTHEW M.; (CHICAGO, IL); MARK; ZACHARY J.; (CHICAGO, IL); THORNTON; VANCE T.; (CHICAGO, IL); MULLIN; JOSHUA J.; (CHICAGO, IL); MODI; SEJAL KUMARBHAI; (CHICAGO, IL)

Correspondence Address:
Garlick Harrison & Markison (CS)
P.O. Box 160727
Austin, TX 78716
US

Assignee: CLEVERSAFE, INC.
CHICAGO, IL

Family ID: 38610029

Appl. No.: 12/499489

Filed: July 8, 2009
Related U.S. Patent Documents

Application Number  Filing Date   Patent Number
11403684            Apr 13, 2006  7574570        (parent of the present application, 12499489)
11241555            Sep 30, 2005                 (parent of 11403684)
Current U.S. Class: 705/34
Current CPC Class: G06F 21/6227 (2013.01); G06Q 20/102 (2013.01); G06Q 30/04 (2013.01)
Class at Publication: 705/34
International Class: G06Q 30/00 (2006.01); G06Q 50/00 (2006.01)
Claims
1-20. (canceled)
21. An apparatus comprises: a processing module operably coupled
to: access user level metadata based on an account identifier to:
identify a plurality of files associated with a user; and retrieve
user level metadata for the plurality of files; access file level
metadata associated with the plurality of files to: retrieve, for
each of the plurality of files, file level metadata; and determine,
for each of the plurality of files, a plurality of file slices
associated with a corresponding file of the plurality of files; and
generate billing transaction information based on the user level
metadata and the file level metadata of the plurality of files; and
a network interface operably coupled to: convert the billing
transaction information into a network billing transaction
information message; and transmit the network billing transaction
information message.
22. The apparatus of claim 21, wherein the file level metadata
comprises at least one of: a transaction data sources table; and an
applications table.
23. The apparatus of claim 21, wherein the user level metadata
comprises at least one of: a transaction context table; and a list
of files.
24. The apparatus of claim 21, wherein the processing module is
further operably coupled to: access data space level metadata to
determine a data space of the user based on an account
identifier; and access the user level metadata based on the data
space.
25. The apparatus of claim 24, wherein the data space level metadata
comprises at least one of: a data space directory map; a data space
volume map; and an account data space map.
26. The apparatus of claim 21, wherein the processing module is
further operably coupled to: for each of the plurality of files,
access file slice metadata associated with the plurality of file
slices to retrieve file slice metadata; and generate the billing
transaction information based on the user level metadata, the file
level metadata of the plurality of files, and the file slice
metadata.
27. The apparatus of claim 26, wherein the file slice
metadata comprises at least one of: a data sources table; and a
data space table.
28. The apparatus of claim 26, wherein the processing module is
further operably coupled to: for each of the plurality of files,
identify a plurality of storage nodes that stores the plurality of
file slices based on the file slice metadata, wherein a file slice
of the plurality of file slices includes a data slice and coded
subsets; and generate the billing transaction information based on
the user level metadata, the file level metadata of the plurality
of files, the file slice metadata, and the identity of the
plurality of storage nodes.
29. The apparatus of claim 21 further comprises at least one of: a
computer; a plurality of computers; an application running on the
computer; and the application running on the plurality of
computers.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of co-pending U.S.
Utility patent application Ser. No. 11/403,684, filed Apr. 13,
2006, which is a continuation-in-part of co-pending U.S. Utility
patent application Ser. No. 11/241,555, filed Sep. 30, 2005.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates to a billing system and method
for a distributed data storage system for storing data in subsets
and more particularly, to a billing system and method in which
information regarding the original file size and the times and
types of transactions are maintained and stored separately from the
stored data subsets and used to perform billing operations in a
commercial information dispersal data storage system.
[0004] 2. Description of the Prior Art
[0005] Various data storage systems are known for storing data.
Normally such data storage systems store all of the data associated
with a particular data set, for example, all the data of a
particular user or all the data associated with a particular
software application or all the data in a particular file, in a
single dataspace (i.e., single digital data storage device).
Critical data is known to be initially stored on redundant digital
data storage devices. Thus, if there is a failure of one digital
data storage device, a complete copy of the data is available on
the other digital data storage device. Examples of such systems
with redundant digital data storage devices are disclosed in U.S.
Pat. Nos.: 5,890,156; 6,058,454; and 6,418,539, hereby incorporated
by reference. Although such redundant digital data storage systems
are relatively reliable, there are other problems with such
systems. First, such systems essentially double or further increase
the cost of digital data storage. Second, all of the data in such
redundant digital data storage systems is in one place making the
data vulnerable to unauthorized access.
[0006] The use of information dispersal algorithms in data
storage systems is also described in various trade publications.
For example, "How to Share a Secret", by A. Shamir, Communications
of the ACM, Vol. 22, No. 11, November, 1979, describes a scheme for
sharing a secret, such as a cryptographic key, based on polynomial
interpolation. Another trade publication, "Efficient Dispersal of
Information for Security, Load Balancing, and Fault Tolerance", by
M. Rabin, Journal of the Association for Computing Machinery, Vol.
36, No. 2, April 1989, pgs. 335-348, also describes a method for
information dispersal using an information dispersal algorithm.
Unfortunately, these methods and other known information dispersal
methods are computationally intensive and are thus not applicable
for general storage of large amounts of data using the kinds of
computers in broad use by businesses, consumers and other
organizations today. Thus, there is a need for a data storage system
that is able to reliably and securely protect data without requiring
the use of computationally intensive algorithms.
[0007] Several companies offer commercial data storage servers
using data storage systems that store copies of data files together
with associated metadata. Many companies, such as Rackspace, Ltd,
offer data storage services as a part of general managed hosting
services. Other known companies, such as Iron Mountain
Incorporated, offer data storage services as a part of an online
backup service. These companies typically determine billing charges
in relation to the size of the data stored. The original file size
is stored together with the data as a metadata attribute associated
with the data file. Billing for such services is based on the
amount of data stored or transferred. In these cases, billing
amounts are derived from the metadata attributes associated with
each file. In some situations, it is necessary that the data being
stored or transmitted be changed in size, for example, by
compression, in order to reduce storage space or improve
transmission speed. In these situations, known information
dispersal storage systems are unable to keep track of the original
data file size. Since billing in such known systems is based upon
metadata attributes associated with the data being stored or
transferred, billing options in such situations are rather limited.
Thus, there is a need for more flexible billing options in such
information dispersal storage systems.
DESCRIPTION OF THE DRAWING
[0008] These and other advantages of the present invention will be
readily understood with reference to the following drawing and
attached specification wherein:
[0009] FIG. 1 is a block diagram of an exemplary data storage
system in accordance with the present invention which illustrates
how the original data is sliced into data subsets, coded and
transmitted to a separate digital data storage device or node.
[0010] FIG. 2 is similar to FIG. 1 but illustrates how the data
subsets from all of the exemplary six nodes are retrieved and
decoded to recreate the original data set.
[0011] FIG. 3 is similar to FIG. 2 but illustrates a condition of a
failure of one of the six digital data storage devices.
[0012] FIG. 4 is similar to FIG. 3 but for the condition of a failure
of three of the six digital data storage devices.
[0013] FIG. 5 is an exemplary table in accordance with the present
invention that can be used to recreate data which has been stored
on the exemplary six digital data storage devices.
[0014] FIG. 6 is an exemplary table that lists the decode equations
for an exemplary six node data storage system for a
condition of two node outages.
[0015] FIG. 7 is similar to FIG. 6 but for a condition with three
node outages.
[0016] FIG. 8 is a table that lists all possible storage node
outage states for an exemplary data storage system with nine
storage nodes for a condition with two node outages.
[0017] FIG. 9 is an exemplary diagram in accordance with the
present invention which illustrates the various functional elements
of a metadata management system for use with an information
dispersal storage system which provides flexible billing options in
accordance with the present invention.
[0018] FIG. 10 is an exemplary flow chart that shows the process
for maintaining metadata for data stored on the dispersed data
storage grid.
[0019] FIG. 11 shows the essential metadata components that are
used during user transactions and during user file set lookup.
[0020] FIGS. 12A and 12B illustrate the operation of the system.
[0021] FIG. 13 is an exemplary flow chart that shows a billing
process in accordance with the present invention.
DETAILED DESCRIPTION
[0022] The present invention relates to a billing system for an
information dispersal storage system or data storage system. The
information dispersal storage system is illustrated and described
in connection with FIGS. 1-8. FIGS. 9-12 illustrate a metadata
management system for managing the information dispersal storage
system. The billing system in accordance with the present invention
is illustrated and described in connection with FIG. 13. It is to
be understood that the principles of the billing system are
amenable to being utilized with all sorts of information dispersal
storage systems. The information dispersal storage system
illustrated in FIGS. 1-8 is merely exemplary of one type of
information dispersal storage system for use with the present
invention.
[0023] Information Dispersal Storage System
[0024] In order to protect the security of the original data, the
original data is separated into a number of data "slices" or
subsets. The amount of data in each slice is less usable or less
recognizable or completely unusable or completely unrecognizable by
itself except when combined with some or all of the other data
subsets. In particular, the system in accordance with the present
invention "slices" the original data into data subsets and uses a
coding algorithm on the data subsets to create coded data subsets.
Each data subset and its corresponding coded subset may be
transmitted separately across a communications network and stored
in a separate storage node in an array of storage nodes. In order
to recreate the original data, data subsets and coded subsets are
retrieved from some or all of the storage nodes or communication
channels, depending on the availability and performance of each
storage node and each communication channel. The original data is
recreated by applying a series of decoding algorithms to the
retrieved data and coded data.
[0025] As with other known data storage systems based upon
information dispersal methods, unauthorized access to one or more
data subsets only provides reduced or unusable information about
the source data. In accordance with an important aspect of the
invention, the system codes and decodes data subsets in a manner
that is computationally efficient relative to known systems in
order to enable broad use of this method using the types of
computers generally used by businesses, consumers and other
organizations currently.
[0026] In order to understand the invention, consider a string of N
characters d_0, d_1, . . . , d_(N-1) which could comprise a file or a
system of files. A typical computer file system may contain gigabytes
of data, which would mean N would be in the trillions. The following
example considers a much smaller string where the data string length,
N, equals the number of storage nodes, n. To store larger data
strings, computer files or entire file systems, these methods can be
applied repeatedly.
[0027] For this example, assume that the string contains the
characters O L I V E R, where the string contains ASCII character
codes as follows:
[0028] d_0 = O = 79
[0029] d_1 = L = 76
[0030] d_2 = I = 73
[0031] d_3 = V = 86
[0032] d_4 = E = 69
[0033] d_5 = R = 82
[0034] The string is broken into segments that are n characters
each, where n is chosen to provide the desired reliability and
security characteristics while maintaining the desired level of
computational efficiency--typically n would be selected to be below
100. In one embodiment, n may be chosen to be greater than four (4)
so that each subset of the data contains less than, for example,
1/4 of the original data, thus decreasing the recognizability of
each data subset.
[0035] In an alternate embodiment, n is selected to be six (6), so
that the first original data set is separated into six (6)
different data subsets as follows:
A=d_0, B=d_1, C=d_2, D=d_3, E=d_4, F=d_5
[0036] For example, where the original data is the starting string
of ASCII values for the characters of the text O L I V E R, the
values in the data subsets would be those listed below: [0037] A=79
[0038] B=76 [0039] C=73 [0040] D=86 [0041] E=69 [0042] F=82
[0043] In this embodiment, the coded data values are created by
adding data values from a subset of the other data values in the
original data set. For example, the coded values can be created by
adding the following data values:
c[x]=d[n_mod(x+1)]+d[n_mod(x+2)]+d[n_mod(x+4)]
where: [0044] c[x] is the xth coded data value in the segment array
of coded data values [0045] d[x+1] is the value in the position 1
greater than x in an array of data values [0046] d[x+2] is the value
in the position 2 greater than x in an array of data values [0047]
d[x+4] is the value in the position 4 greater than x in an array of
data values [0048] n_mod() is a function that performs a modulo
operation over the number space 0 to n-1
[0049] Using this equation, the following coded values are
created:
cA, cB, cC, cD, cE, cF
where cA, for example, is equal to B+C+E and represents the coded
value that will be communicated and/or stored along with the data
value, A.
[0050] For example, where the original data is the starting string
of ASCII values for the characters of the text O L I V E R, the
values in the coded data subsets would be those listed below:
[0051] cA=218 [0052] cB=241 [0053] cC=234 [0054] cD=227 [0055]
cE=234 [0056] cF=241
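To make the coding equation concrete, the following short Python sketch (an illustration added for clarity, not part of the patent text) computes the coded values for the O L I V E R example and reproduces the values listed in paragraphs [0051]-[0056]:

    # Encode n data values into n coded values using
    # c[x] = d[(x+1) mod n] + d[(x+2) mod n] + d[(x+4) mod n]
    def encode(d):
        n = len(d)
        return [d[(x + 1) % n] + d[(x + 2) % n] + d[(x + 4) % n]
                for x in range(n)]

    d = [79, 76, 73, 86, 69, 82]  # ASCII codes for O L I V E R
    assert encode(d) == [218, 241, 234, 227, 234, 241]  # cA..cF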
[0057] In accordance with the present invention, the original data
set 20, consisting of the exemplary data ABCDEF is sliced into, for
example, six (6) data subsets A, B, C, D, E and F. The data subsets
A, B, C, D, E and F are also coded as discussed below forming coded
data subsets cA, cB, cC, cD, cE and cF. The data subsets A, B, C,
D, E and F and the coded data subsets cA, cB, cC, cD, cE and cF are
formed into a plurality of slices 22, 24, 26, 28, 30 and 32 as
shown, for example, in FIG. 1. Each slice 22, 24, 26, 28, 30 and
32 contains a different data value A, B, C, D, E and F and a
different coded subset cA, cB, cC, cD, cE and cF. The slices 22,
24, 26, 28, 30 and 32 may be transmitted across a communications
network, such as the Internet, in a series of data transmissions,
each slice being stored in a different digital data storage device
or storage node 34, 36, 38, 40, 42 and 44.
[0058] In order to retrieve the original data (or receive it in the
case where the data is just transmitted, not stored), the data can
be reconstructed as shown in FIG. 2. Data values from each storage
node 34, 36, 38, 40, 42 and 44 are transmitted across a
communications network, such as the Internet, to a receiving
computer (not shown). As shown in FIG. 2, the receiving computer
receives the slices 22, 24, 26, 28, 30 and 32, each of which
contains a different data value A, B, C, D, E and F and a different
coded value cA, cB, cC, cD, cE and cF.
[0059] For a variety of reasons, such as the outage or slow
performance of a storage node 34, 36, 38, 40, 42 and 44 or a
communications connection, not all data slices 22, 24, 26, 28, 30
and 32 will always be available each time data is recreated. FIG. 3
illustrates a condition in which the present invention recreates
the original data set when one of the data slices 22, 24, 26, 28, 30
and 32, for example the data slice 22 containing the data value A
and the coded value cA, is not available. In this case, the original
data value A can be obtained as follows:
A=cC-D-E
where cC is a coded value and D and E are original data values,
available from the slices 26, 28 and 30, which are assumed to be
available from the nodes 38, 40 and 42, respectively. In this case
the missing data value can be determined by reversing the coding
equation that summed a portion of the data values to create a coded
value by subtracting the known data values from a known coded
value.
[0060] For example, where the original data is the starting string
of ASCII values for the characters of the text O L I V E R, the
data value of the A could be determined as follows:
A=234-86-69
Therefore A=79 which is the ASCII value for the character, O.
[0061] In other cases, determining the original data values
requires a more detailed decoding equation. For example, FIG. 4
illustrates a condition in which three (3) of the six (6) nodes 34,
36 and 42 which contain the original data values A, B and E and
their corresponding coded values cA, cB and cE are not available.
These missing data values A, B and E
can be restored by using the following sequence of equations:
1. B=(cD-F+cF-cC)/2
2. E=cD-F-B
3. A=cF-B-D
[0062] These equations are performed in the order listed in order
for the data values required for each equation to be available when
the specific equation is performed.
[0063] For example, where the original data is the starting string
of ASCII values for the characters of the text O L I V E R, the
data values of the B, E and A could be determined as follows:
1. B=(227-82+241-234)/2 B=76
2. E=227-82-76 E=69
3. A=241-76-86 A=79
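This decode sequence can be checked directly in code. The following Python sketch (illustrative only) reproduces the worked values above; the numerator of equation 1 always equals 2B, so integer division is exact:

    # Recover A, B and E when nodes 34, 36 and 42 are unavailable,
    # using the surviving data values D, F and coded values cC, cD, cF.
    D, F = 86, 82
    cC, cD, cF = 234, 227, 241

    B = (cD - F + cF - cC) // 2  # equation 1
    E = cD - F - B               # equation 2 (requires B)
    A = cF - B - D               # equation 3 (requires B)

    assert (A, B, E) == (79, 76, 69)  # ASCII 'O', 'L', 'E'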
[0064] In order to generalize the method for the recreation of all
original data ABCDEF when n=6 and up to three slices 22, 24, 26, 28
30 and 32 are not available at the time of the recreation, FIG. 5
contains a table that can be used to determine how to recreate the
missing data.
[0065] This table lists the 40 different outage scenarios where 1,
2, or 3 out of six storage nodes are not available or are performing
slowly enough to be considered not available. In the table in FIG.
5, an `X` in a row designates that data and coded values from that
node are not available. The `Type` column designates the number of
nodes not available. An `Offset` value for each outage scenario is
also indicated. The offset is the difference between the spatial
position of a particular outage scenario and that of the first
outage scenario of that Type.
[0066] The data values can be represented by the array d[x], where
x is the node number where that data value is stored. The coded
values can be represented by the array c[x].
[0067] In order to reconstruct missing data in an outage scenario
where one node is not available in a storage array where n=6, the
following equation can be used:

d[0+offset]=c3d(2, 3, 4, offset)

where c3d() is a function in pseudo computer software code as follows:

    c3d(coded_data_pos, known_data_a_pos, known_data_b_pos, offset)
    {
        unknown_data = c[n_mod(coded_data_pos + offset)]
                     - d[n_mod(known_data_a_pos + offset)]
                     - d[n_mod(known_data_b_pos + offset)];
        return unknown_data
    }

where n_mod() is the function defined previously.
[0068] In order to reconstruct missing data in an outage scenario
where two nodes are not available in a storage array where n=6, the
equations in the table in FIG. 6 can be used. In FIG. 6, the
`Outage Type Num` refers to the corresponding outage `Type` from
FIG. 5. The `Decode Operation` in FIG. 6 refers to the order in
which the decode operations are performed. The `Decoded Data`
column in FIG. 6 provides the specific decode operations which
produces each missing data value.
[0069] In order to reconstruct missing data in an outage scenario
where three nodes are not available in a storage array where n=6,
the equations in the table in FIG. 7 can be used. Note that in FIG.
7, the decode equation for the first decode for
outage type=3 has a different structure than the other decode
equations where n=6.
[0070] The example equations listed above are typical of the type
of coding and decoding equations that create efficient computing
processes using this method, but they only represent one of many
examples of how this method can be used to create efficient
information distribution systems. In the example above of
distributing original data on a storage array of 6 nodes where at
least 3 are required to recreate all the data, the computational
overhead of creating the coded data is only two addition operations
per byte. When data is decoded, no additional operations are
required if all storage nodes and communications channels are
available. If one or two of the storage nodes or communications
channels are not available when n=6, then only two additional
addition/subtraction operations are required to decode each missing
data value. If three storage nodes or communications channels are
missing when n=6, then just addition/subtraction operations are
required for each missing byte in 11 of 12 instances--in that
twelfth instance, only 4 computational operations are required (3
addition/subtractions and one division by an integer). This method
is more computationally efficient than known methods, such as those
described by Rabin and Shamir.
[0071] This method of selecting a computationally efficient method
for secure, distributed data storage by creating coded values to
store at storage nodes that also store data subsets can be used to
create data storage arrays generally for configurations where n=4
or greater. In each case decoding equations such as those detailed
above can be used to recreate missing data in a computationally
efficient manner.
[0072] Coding and decoding algorithms for varying grid sizes which
tolerate varying numbers of storage node outages without original
data loss can also be created using these methods. For example, to
create a 9 node grid that can tolerate the loss of 2 nodes, a
candidate coding algorithm is selected that uses a mathematical
function that incorporates at least two other nodes, such as:
c[x]=d[n_mod(x+1)]+d[n_mod(x+2)]
where: [0073] n=9, the number of storage nodes in the grid [0074]
c[x] is the xth coded data value in the segment array of coded data
values [0075] d[x+1] is the value in the position 1 greater than x
in an array of data values [0076] d[x+2] is the value in the
position 2 greater than x in an array of data values [0077] n_mod()
is a function that performs a modulo operation over the number space
0 to n-1
[0078] In this example embodiment, where n=9, the first data segment
is separated into different data subsets as follows:
A=d_0, B=d_1, C=d_2, D=d_3, E=d_4, F=d_5,
G=d_6, H=d_7, I=d_8
[0079] Using the candidate coding algorithm above, the
following coded values are created:
cA, cB, cC, cD, cE, cF, cG, cH, cI
[0080] The candidate coding algorithm is then tested against all
possible grid outage states of up to the desired number of storage
node outages that can be tolerated with complete data restoration
of all original data. FIG. 8 lists all possible storage grid cases
for a 9 storage node grid with 2 storage node outages. Although
there are 36 outage cases on a 9 node storage grid with 2 storage
node outages, these can be grouped into 4 Types as shown in FIG. 8.
Each of these 4 Types represent a particular spatial arrangement of
the 2 outages, such as the 2 storage node outages being spatially
next to each other in the grid (Type 1) or the 2 storage node
outages being separated by one operating storage node (Type 2). The
offset listed in FIG. 8 shows the spatial relationship of each
outage case within the same Type as they relate to the first outage
case of that Type listed in that table. For example, the first
instance of a Type 1 outage in FIG. 8 is the outage case where
Node0 and Node1 are out. This first instance of a Type 1 outage is
then assigned the Offset value of 0. The second instance of a Type
1 outage in FIG. 8 is the outage case where Node1 and Node2 are
out. Therefore, this second instance of a Type 1 outage is assigned
the Offset value of 1 since the two storage nodes outages occur at
storage nodes that are 1 greater than the location of the storage
node outages in the first case of Type 1 in FIG. 8.
[0081] The validity of the candidate coding algorithm can then be
tested by determining if there is a decoding equation or set of
decoding equations that can be used to recreate all the original
data in each outage Type and thus each outage case. For example, in
the first outage case in FIG. 8, Node0 and Node1 are out. This
means that the data values A and B are not directly available on
the storage grid. However, A can be recreated from cH as
follows:
cH=I+A
A=cH-I
[0082] The missing data value B can then be created from cI as
follows:
cI=A+B
B=cI-A
[0083] This type of validity testing can then be used to test if
all original data can be obtained in all other instances where 2
storage nodes on a 9 node storage grid are not operating. Next, all
instances where 1 storage node is not operating on a 9 node storage
grid are tested to verify whether that candidate coding algorithm
is valid. If the validity testing shows that all original data can
be obtained in every instance of 2 storage nodes not operating on a
9 node storage grid and every instance of 1 storage node not
operating on a 9 node storage grid, then that coding algorithm
would be valid to store data on a 9 node storage grid and then to
retrieve all original data from that grid if up to 2 storage nodes
were not operating.
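The validity testing of paragraphs [0081]-[0083] lends itself to automation. The following Python sketch (an illustration of the procedure under one reasonable formalization, not code from the patent) brute-force checks the candidate 9 node coding algorithm c[x]=d[n_mod(x+1)]+d[n_mod(x+2)] against every 1 and 2 node outage case, by testing that the data and coded values on the surviving nodes determine all nine original data values:

    from fractions import Fraction
    from itertools import combinations

    n = 9  # storage nodes in the candidate grid

    def coded_row(x):
        # Coefficients of c[x] = d[(x+1) mod n] + d[(x+2) mod n]
        row = [0] * n
        row[(x + 1) % n] = 1
        row[(x + 2) % n] = 1
        return row

    def recoverable(outage):
        # Each surviving node x contributes two known linear equations
        # in the data values: d[x] itself, and the coded value c[x].
        rows = []
        for x in range(n):
            if x not in outage:
                direct = [0] * n
                direct[x] = 1
                rows.append(direct)
                rows.append(coded_row(x))
        # All original data is recoverable exactly when these equations
        # have rank n; Gaussian elimination over the rationals:
        mat = [[Fraction(v) for v in row] for row in rows]
        rank = 0
        for col in range(n):
            pivot = next((r for r in range(rank, len(mat)) if mat[r][col]), None)
            if pivot is None:
                return False  # some data value is undetermined
            mat[rank], mat[pivot] = mat[pivot], mat[rank]
            for r in range(len(mat)):
                if r != rank and mat[r][col]:
                    factor = mat[r][col] / mat[rank][col]
                    mat[r] = [a - factor * b for a, b in zip(mat[r], mat[rank])]
            rank += 1
        return True

    # 9 single-node and 36 two-node outage cases, as counted in FIG. 8.
    assert all(recoverable(set(o))
               for k in (1, 2) for o in combinations(range(n), k))

A rank test of this kind only confirms that a decode exists; a practical implementation would additionally record an explicit substitution order, as FIGS. 6 and 7 do for the six node grid.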
[0084] These types of coding and decoding algorithms can be used by
those practiced in the art of software development to create
storage grids with varying numbers of storage nodes with varying
numbers of storage node outages that can be tolerated by the
storage grid while perfectly restoring all original data.
[0085] Metadata Management System
[0086] A metadata management system, illustrated in FIGS. 9-12, is
used to manage dispersal and storage of information that is
dispersed and stored in several storage nodes coupled to a common
communication network forming a grid, for example, as discussed
above in connection with FIGS. 1-8. In order to enhance the
reliability of the information dispersal system, metadata
attributes of the transactions on the grid are stored in a separate
dataspace from the dispersed data.
[0087] As discussed above, the information dispersal system
"slices" the original data into data subsets and uses a coding
algorithm on the data subsets to create coded data subsets. In
order to recreate the original data, data subsets and coded subsets
are retrieved from some or all of the storage nodes or
communication channels, depending on the availability and
performance of each storage node and each communication channel. As
with other known data storage systems based upon information
dispersal methods, unauthorized access to one or more data subsets
only provides reduced or unusable information about the source
data. For example as illustrated in FIG. 1, each slice 22, 24, 26,
28, 30 and 32, contains a different data value A, B, C, D, E and F
and a different "coded subset" (Coded subsets are generated by
algorithms and are stored with the data slices to allow for
restoration when restoration is done using part of the original
subsets) cA, cB, cC, cD, cE and cF. The slices 22, 24, 26, 28, 30
and 32 may be transmitted across a communications network, such as
the Internet, in a series of data transmissions, each slice being
stored in a different digital data storage device or storage
node 34, 36, 38, 40, 42 and 44. Each data subset and its
corresponding coded subset may be transmitted separately across a
communications network and stored in a separate storage node in an
array of storage nodes.
[0088] A "file stripe" is the set of data and/or coded subsets
corresponding to a particular file. Each file stripe may be stored
on a different set of data storage devices or storage nodes 57
within the overall grid as available storage resources or storage
nodes may change over time as different files are stored on the
grid.
[0089] A "dataspace" is a portion of a storage grid 49 that
contains the data of a specific client 64. A grid client may also
utilize more than one dataspace. The dataspaces table 106 in FIG. 11
shows all dataspaces associated with a particular client.
Typically, particular grid clients are not able to view the
dataspaces of other grid clients in order to provide data security
and privacy.
[0090] FIG. 9 shows the different components of a storage grid,
generally identified with the reference numeral 49. The grid 49
includes storage nodes 54 associated with a specific
grid client 64 as well as other storage nodes 56 associated with
other grid clients (collectively or individually "the storage nodes
57"), connected to a communication network, such as the Internet.
The grid 49 also includes applications for managing client backups
and restorations in terms of dataspaces and their associated
collections.
[0091] In general, a "director" is an application running on the
grid 49. The director serves various purposes, such as: [0092] 1.
Provide a centralized-but-duplicatable point of User-Client login.
The Director is the only grid application that stores User-login
information. [0093] 2. Autonomously provide a per-User list of
stored files. All User-Clients can acquire the entire list of
files stored on the Grid for each user by talking to one and only
one director. This file-list metadata is duplicated from one
Primary Director to several Backup Directors. [0094] 3. Track
which Sites contain User Slices. [0095] 4. Manage Authentication
Certificates for other Node personalities.
[0096] The applications on the grid form a metadata management
system and include a primary director 58, secondary directors 60
and other directors 62. Each dataspace is always associated at any
given time with one and only one primary director 58. Every time a
grid client 64 attempts any dataspace operation (save/retrieve),
the grid client 64 must reconcile the operation with the primary
director 58 associated with that dataspace. Among other things, the
primary director 58 manages exclusive locks for each dataspace.
Every primary director 58 has at least one or more secondary
directors 60. In order to enhance reliability of the system, any
dataspace metadata updates (especially lock updates) are
synchronously copied by the dataspace's primary director 58 to
all of its secondary or backup directors 60 before returning
acknowledgement status back to the requesting grid client 64. In
addition, for additional reliability, all other directors 62 on the
Grid may also asynchronously receive a copy of the metadata update.
In such a configuration, all dataspace metadata is effectively
copied across the entire grid 49.
[0097] As used herein, a primary director 58 and its associated
secondary directors 60 are also referred to as associated directors
66. The secondary directors 60 ensure that any acknowledged
metadata management updates are not lost in the event that a
primary director 58 fails in the midst of a grid client 64
dataspace update operation. There exists a trade-off between the
number of secondary directors 60 and the metadata access
performance of the grid 49. In general, the greater the number of
secondary directors 60, the higher the reliability of metadata
updates, but the slower the metadata update response time.
[0098] The associated directors 66 and other directors 62 do not
track which slices are stored on each storage node 57, but rather
keep track of the storage nodes 57 associated with each
grid client 64. Once the specific nodes are known for each client,
it is necessary to contact the various storage nodes 57 in order to
determine the slices associated with each grid client 64.
[0099] While the primary director 58 controls the majority of Grid
metadata, the storage nodes 57 have the following
responsibilities: [0100] 1. Store the user's slices. The storage
nodes 57 store the user slices in a file-system that mirrors the
user's file-system structure on the Client machine(s). [0101] 2.
Store a list of per-user files on the storage node 57 in a
database. The storage node 57 associates minimal metadata
attributes, such as Slice hash signatures (e.g., MD5s) with each
slice "row" in the database.
[0102] The Grid identifies each storage node 57 with a unique
storage volume serial number (volumeID) and as such can identify
the storage volume even when it is spread across multiple servers.
In order to recreate the original data, data subsets and coded
subsets are retrieved from some or all of the storage nodes 57 or
communication channels, depending on the availability and
performance of each storage node 57 and each communication channel.
Each primary director 58 keeps a list of all storage nodes 57 on
the grid 49 and therefore all the nodes available at each site.
[0103] Following is the list of key metadata attributes used during
backup/restore processes:
Attribute              Description
iAccountID             Unique ID number for each account; unique for each user.
iDataspaceID           Unique ID for each user on all the volumes; used to keep track of the user data on each volume.
iDirectorAppID         Grid-wide unique ID which identifies a running instance of the director.
iRank                  Used to ensure that the primary director always has accurate metadata.
iVolumeID              Unique ID for identifying each volume on the Grid; the director uses this to generate a volume map for a new user (first time) and track the volume map for existing users.
iTransactionContextID  Identifies a running instance of a client.
iApplicationID         Grid-wide unique ID which identifies a running instance of an application.
iDatasourceID          All content stored on the grid is in the form of data sources; each unique file on the disk is associated with this unique ID.
iRevision              Keeps track of the different revisions for a data source.
iSize                  Metadata to track the size of the data source.
sName                  Metadata to track the name of the data source.
iCreationTime          Metadata to track the creation time of the data source.
iModificationTime      Metadata to track the last modification time of the data source.
[0104] FIG. 10 describes a flow of data and a top level view of
what happens when a client interacts with the storage system. FIG.
11 illustrates the key metadata tables that are used to keep track
of user info in the process.
[0105] Referring to FIG. 10, initially in step 70, a grid client 64
starts with logging in to a director application running on a
server on the grid. After a successful log in, the director
application returns to the grid client 64 in step 72, a
DataspaceDirectorMap 92 (FIG. 11). The director application
includes an AccountDataspaceMap 93, a lookup table which maps
the grid client's AccountID to the corresponding DataspaceID.
The DataspaceID is then used to determine the grid client's primary
director (i.e., DirectorAppID) from the DataspaceDirectorMap
92.
[0106] Once the grid client 64 knows its primary director 58, the
grid client 64 can request a DataspaceVolumeMap 94 (FIG. 11) and
use the DataspaceID to determine the storage nodes associated with
that grid client 64 (i.e., VolumeID). The primary director 58 sets
up a TransactionContextID for the grid client 64 in a Transactions
table 102 (FIG. 11). The TransactionContextID is unique for each
transaction (i.e., for each running instance or session of the grid
client 64). In particular, the Dataspace ID from the
DataspaceDirectorMap 92 is used to create a unique transaction ID
in a TransactionContexts table 96. The transaction ID is stored in
the Transactions table 102 along with the TransactionContextID in order
to keep track of all transactions by all of the grid clients for
each session of a grid client with the grid 49.
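The chain of lookups in paragraphs [0105] and [0106] can be pictured as three small maps. The sketch below is a hypothetical Python illustration; only the map names come from FIG. 11, while the identifiers and table contents are invented for the example:

    # AccountDataspaceMap 93: AccountID -> DataspaceID
    account_dataspace_map = {"acct-001": "ds-17"}
    # DataspaceDirectorMap 92: DataspaceID -> DirectorAppID (primary director)
    dataspace_director_map = {"ds-17": "director-A"}
    # DataspaceVolumeMap 94: DataspaceID -> VolumeIDs (storage nodes)
    dataspace_volume_map = {"ds-17": ["vol-3", "vol-8", "vol-11"]}

    def login(account_id):
        dataspace_id = account_dataspace_map[account_id]
        primary_director = dataspace_director_map[dataspace_id]
        volumes = dataspace_volume_map[dataspace_id]
        return primary_director, volumes

    assert login("acct-001") == ("director-A", ["vol-3", "vol-8", "vol-11"])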
[0107] The "TransactionContextID" metadata attribute is a different
attribute than TransactionID in that a client can be involved with
more than one active transactions (not committed) but at all times
only one "Transaction context Id" is associated with one running
instance of the client. These metadata attributes allow management
of concurrent transactions by different grid clients.
[0108] As mentioned above, the primary director 58 maintains a list
of the storage nodes 57 associated with each grid client 64. This
list is maintained as a TransactionContexts table 96 which
maintains the identities of the storage nodes (i.e., DataspaceID)
and the identity of the grid client 64 (i.e., ID). The primary
director 58 contains the "Application" metadata (i.e., Applications
table 104) used by the grid client 64 to communicate with the
primary director 58. The Applications table 104 is used to record
the type of transaction (AppTypeID), for example adding or removing
data slices, and the storage nodes 57 associated with the transaction
(i.e., SiteID).
[0109] Before any data transfers begin, the grid client 64 files
metadata with the primary director 58 regarding the intended
transaction, such as the name and size of the file as well as its
creation date and modification date, for example. The metadata may
also include other metadata attributes, such as the various fields
illustrated in the Transaction Datasources table 98 (FIG. 11). The
Transaction Datasources metadata table 98 is used to keep control
over the transactions until the transactions are completed.
[0110] After the above information is exchanged between the grid
client 64 and the primary director 58, the grid client 64 connects
to the storage nodes in step 74 in preparation for transfer of the
file slices. Before any information is exchanged, the grid client
64 registers the metadata in its Datasources table 100 in step 76
in order to fill in the data fields in the Transaction Datasources
table 98.
[0111] Next in step 78, the data slices and coded subsets are
created in the manner discussed above by an application running on
the grid client 64. Any data scrambling, compression and/or
encryption of the data may be done before or after the data has
been dispersed into slices. The data slices are then uploaded to
the storage nodes 57 in step 80.
[0112] Once the upload starts, the grid client 64 uses the
transaction metadata (i.e., data from Transaction Datasources table
98) to update the file metadata (i.e., DataSources table 100). Once
the upload is complete, and only then, the datasource information
from the Transaction Datasources table 98 is moved to the Datasources
table 100 and removed from the Transaction Datasources table 98 in
steps 84, 86 and 88. This process is "atomic" in nature, that is,
no change is recorded if at any instance the transaction fails. The
Datasources table 100 includes revision numbers to maintain the
integrity of the user's file set.
[0113] A simple example, as illustrated in FIGS. 12A and 12B,
illustrates the operation of the metadata management system 50. The
example assumes that the client wants to save a file named
"Myfile.txt" on the grid 49.
[0114] Step 1: The grid client connects to the director application
running on the grid 49. Since the director application is not the
primary director 58 for this grid client 64, the director
application authenticates the grid client and returns the
DataspaceDirectorMap 92. Basically, the director uses the AccountID
to find its DataspaceID and return the corresponding DirectorAppID
(primary director ID for this client).
[0115] Step 2: Once the grid client 64 has the DataspaceDirectorMap
92, it now knows which director is its primary director. The grid
client 64 then connects to this director application and the
primary director creates a TransactionContextID, as explained
above, which is unique for the grid client session. The primary
director 58 also sends the grid client 64 its DataspaceVolumeMap 94
(i.e., the storage nodes 57 to which the grid client 64
needs a connection). The grid client 64 sends the file metadata
to the director (i.e., fields required in the Transaction
Datasources table).
[0116] Step 3: By way of an application running on the client, the
data slices and coded subsets of "Myfile.txt" are created using
storage algorithms as discussed above. The grid client 64 now
connects to the various storage nodes 57 on the grid 49, as per the
DataspaceVolumeMap 94. The grid client now pushes its data and
coded subsets to the various storage nodes 57 on the grid 49.
[0117] Step 4: When the grid client 64 is finished saving its file
slices on the various storage nodes 57, the grid client 64 notifies
the primary director application 58 to remove this transaction from
the TransactionDatasources Table 98 and add it to the Datasources
Table 100. The system is configured so that the grid client 64 is not
able to retrieve any file that is not on the Datasources Table 100. As
such, adding the file Metadata on the Datasources table 100
completes the file save/backup operation.
[0118] As should be clear from the above, the primary director 58
is an application that decides when a transaction begins or ends. A
transaction begins before a primary director 58 sends the storage
node 57 metadata to the grid client 64 and it ends after writing
the information about the data sources on the Datasources table
100. This configuration ensures completeness. As such, if a primary
director 58 reports a transaction as having completed, then any
application viewing that transaction will know that all the other
storage nodes have been appropriately updated for the transaction.
This concept of "Atomic Transactions" is important to maintain the
integrity of the storage system. For example, if the entire update
transaction does not complete, and all of the disparate storage
nodes are not appropriately "synchronized," then the storage system
is left in a state of disarray, at least for the Dataspace table
100 of the grid client 64 in question. Otherwise, if transactions
are interrupted for any reason (e.g., simply by powering off a
client PC in the middle of a backup process) and are otherwise left
in an incomplete state, the system's overall data integrity would
become compromised rather quickly.
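As a rough illustration of this atomic bookkeeping (a hypothetical sketch; the table names follow FIG. 11 and the field names follow the attribute list in paragraph [0103], but everything else is invented for the example), the commit step can be thought of as a single move between two tables:

    # In-flight uploads live in the Transaction Datasources table; a file
    # becomes retrievable only once its row moves to the Datasources table.
    transaction_datasources = {}  # DatasourceID -> file metadata
    datasources = {}              # committed, retrievable files

    def commit(datasource_id, upload_ok):
        if not upload_ok:
            # Abort: no change is recorded if the transaction fails.
            transaction_datasources.pop(datasource_id, None)
            return False
        datasources[datasource_id] = transaction_datasources.pop(datasource_id)
        return True

    transaction_datasources["ds-file-1"] = {"sName": "Myfile.txt", "iSize": 55312}
    commit("ds-file-1", upload_ok=True)
    assert "ds-file-1" in datasources and "ds-file-1" not in transaction_datasources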
[0119] Billing System for Information Dispersal Storage System
[0120] In accordance with an important aspect of the invention,
metadata tables that include information about the original files
are created and maintained separately from the file slices as
illustrated in FIGS. 9-12. These separate files are used to provide
information required to bill for commercial usage of the
information dispersal grid. Although the system is described and
illustrated for use with the information dispersal storage system,
illustrated in FIGS. 1-8, the principles of the present invention
are applicable to virtually any such system, such as systems
configured as Storage Area Networks (SAN), for example as disclosed
in U.S. Pat. Nos. 6,256,688 and 7,003,688 as well as US Patent
Application Publications US 2005/0125593 A1 and US 2006/0047907 A1,
hereby incorporated by reference.
[0121] As mentioned above, the metadata management system includes
a primary director 58 and one or more secondary directors 60
(collectively or individually "the associated directors 66"). These
directors 66 are used to create the metadata tables, illustrated in
FIG. 11, that are associated with each grid client 64. These
metadata tables include information regarding transactions of the
files that are stored on the storage nodes 57 and are maintained
separately from the dispersed files in the storage nodes 57.
[0122] In accordance with the present invention, each associated
director 66 generally stores a Storage Transaction Table with an
exemplary structure as illustrated below for each node:
Storage Transaction Table

TransactionID  AccountID   FileID    OriginalFileSize (Bytes)  Type    Completed  Date/Time
4218274        0031321123  06693142  55312                     Add     True       3/20/2005 14:32:05
4218275        0031321123  06774921  621921                    Add     True       3/20/2005 14:32:06
4218276        0019358233  04331131  4481                      Remove  True       3/20/2005 14:32:12
4218277        0019358233  05823819  8293100219                Add     False      3/20/2005 14:32:35
[0123] For each storage transaction, the storage transaction table
logs the file size prior to dispersal for storage on the dispersal
grid (OriginalFileSize) and optionally other information regarding
the transaction, for example, the date and time of the transaction;
a unique transaction identification number (TransactionID); an
account identification number associated with that transaction
(AccountID); a file identification number associated with that
transaction (FileID); a transaction type of add or remove; and a
completed flag for that transaction. As such, the storage
transaction table is able to maintain the original size of the
files before dispersal even though the file is dispersed into file
slices on the grid which may be different in size from the original
file size. These file slices may be further reduced in size by the
information dispersal system in order to reduce storage space or
improve transmission time. Accordingly, the storage transaction
table allows more flexible billing options, including billing for
file storage based upon the original file size even though the
files are dispersed and/or compressed.
[0124] In order to create a billing invoice, a separate Billing
Process requests information from the Grid using the process shown
in FIG. 13. First, a Billing Process logs onto a director 66 in
step 106. Next, in step 108, the Billing Process requests the amount
of original storage associated with each billing account.
Specifically, the Billing Process retrieves the account
identification numbers (AccountID) and the file size prior to
dispersal for storage on the dispersal grid (OriginalFileSize) for
each transaction. Then the Billing Process sums all the original
storage amounts associated with each Billing Account to create a
table as structured below:
Summary Billing Information Table

AccountID   TotalOriginalStorage (Bytes)
0031321123  1388239
0019358233  8457309384
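A minimal sketch of such a Billing Process in Python (illustrative only; the patent does not specify how removals or incomplete transactions affect the sum, so counting only completed "Add" transactions is an assumption of this example):

    from collections import defaultdict

    # Rows mirroring the Storage Transaction Table above:
    # (TransactionID, AccountID, OriginalFileSize, Type, Completed)
    transactions = [
        (4218274, "0031321123", 55312, "Add", True),
        (4218275, "0031321123", 621921, "Add", True),
        (4218276, "0019358233", 4481, "Remove", True),
        (4218277, "0019358233", 8293100219, "Add", False),
    ]

    def total_original_storage(transactions):
        # Sum the pre-dispersal file sizes per AccountID.
        totals = defaultdict(int)
        for _tid, account, size, ttype, completed in transactions:
            if completed and ttype == "Add":
                totals[account] += size
        return dict(totals)

    print(total_original_storage(transactions))
    # {'0031321123': 677233} for the sample rows shown above

Per-account totals computed this way feed directly into the Summary Billing Information Table, from which invoices are generated under whatever rate structure the service uses.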
[0125] With the information in the Summary Billing Information
Table, the Billing Process creates invoices for each Billing
Account. This method may be used for commercial dispersed data
storage services that bill an amount based on a rate per byte of
storage or that bill an amount based on an amount of data storage
within a range of storage amounts or that use some other method to
determine billing amounts based on storage amounts.
[0126] Obviously, many modifications and variations of the present
invention are possible in light of the above teachings. Thus, it is
to be understood that, within the scope of the appended claims, the
invention may be practiced otherwise than is specifically described
above.
* * * * *