U.S. patent application number 12/553199 was filed with the patent office on 2009-09-03 and published on 2010-03-18 for transferring or migrating portions of data objects, such as block-level data migration or chunk-based data migration.
Invention is credited to Kamleshkumar K. Lad.
Application Number | 20100070474 12/553199 |
Family ID | 42008109 |
Filed Date | 2009-09-03 |
United States Patent
Application |
20100070474 |
Kind Code |
A1 |
Lad; Kamleshkumar K. |
March 18, 2010 |
TRANSFERRING OR MIGRATING PORTIONS OF DATA OBJECTS, SUCH AS
BLOCK-LEVEL DATA MIGRATION OR CHUNK-BASED DATA MIGRATION
Abstract
A system and method for migrating data objects based on portions
of the data objects is described. The system may transfer portions
of files, folders, and other data objects from primary storage to
secondary storage based on certain criteria, such as time-based
criteria, age-based criteria, and so on. An increment may be one or
more blocks of a data object, or one or more chunks of a data
object, or other segments that combine to form or store a data
object. For example, the system identifies one or more blocks of a
data object that satisfy a certain criteria, and migrates the
identified blocks. The system may determine that a certain number
of blocks of a file have not been modified or called by a file
system in a certain time period, and migrate these blocks to
secondary storage.
Inventors: |
Lad; Kamleshkumar K.;
(Monroe, NJ) |
Correspondence
Address: |
PERKINS COIE LLP;PATENT-SEA
P.O. BOX 1247
SEATTLE
WA
98111-1247
US
|
Family ID: |
42008109 |
Appl. No.: |
12/553199 |
Filed: |
September 3, 2009 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61096587 | Sep 12, 2008 | |
Current U.S.
Class: |
707/624 ;
707/E17.005; 707/E17.01; 707/E17.044 |
Current CPC
Class: |
G06F 16/13 20190101;
G06F 11/1461 20130101; G06F 11/1469 20130101; G06F 16/119 20190101;
G06F 3/0605 20130101; G06F 3/067 20130101; G06F 11/1451 20130101;
G06F 3/0647 20130101; G06F 3/0649 20130101 |
Class at
Publication: |
707/624 ;
707/E17.005; 707/E17.01; 707/E17.044 |
International
Class: |
G06F 17/00 20060101
G06F017/00 |
Claims
1. A system for migrating data from a primary storage device to a
secondary storage device, wherein the system includes a file system
for transferring data to the primary storage device, and wherein
the system further includes a disk driver for at least writing data
received from the file system to the primary storage device and a
secondary driver for at least writing data to the secondary storage
device, the system comprising: a virtual disk driver that receives
data from the file system associated with the primary storage
device and provides data to the disk driver that writes data to the
primary storage device, wherein the virtual disk driver includes: a
data reception component, wherein the data reception component is
configured to receive data from the file system, wherein the
received data identifies multiple blocks of a file to be modified,
wherein the multiple blocks are a proper subset of the total number
of blocks for the file; a data interception component, wherein the
data interception component is configured to intercept the received
data and extract information associated with the received data,
wherein the extracted information includes information identifying
the multiple blocks to be modified; an index component, wherein the
index component is configured to update an index that associates
the extracted information with data blocks on the secondary storage
device that contain the received data; and a data transfer
component, wherein the data transfer component is configured to
transfer the received data to the secondary driver for storage to
the secondary storage device; a block-level data migration
component, wherein the block-level migration component is
configured to identify data blocks within the primary storage
device that satisfy one or more predetermined criteria; a data
management component, configured to communicate with the virtual
disk driver, the block-level data migration component and one or
more media agents, wherein the data management component includes a
storage policy that provides the one or more predetermined
criteria, the storage policy identifying a time period in which to
retain data within the primary storage device and identifying the
one or more media agents in which to transfer the data from the
file system to the disk driver, via the virtual disk driver; and a
media agent, wherein the media agent is one of the identified media
agents and is configured to: transfer data from the identified data
blocks to the secondary driver; and update an index that associates
the transferred data with the secondary storage device that stores
data from the secondary driver.
2. The system of claim 1, wherein the block-level data migration
component is configured to identify data blocks that have not been
accessed by the file system within a predetermined time period, and
wherein the secondary storage device includes a magnetic tape
drive.
3. The system of claim 1, wherein the block-level data migration
component is configured to identify data blocks that have not
changed after a predetermined time period, and wherein the multiple
blocks are written to the secondary storage device in a format that
is not native to a format for an application that created the
file.
4. The system of claim 1, wherein the block-level data migration
component further comprises: an allocation table including one or
more entries that associate data with data blocks that store the
data, the one or more entries including: first information that
identifies data blocks that store the data; and second information
that identifies a date and time of a most recent access to the data
blocks.
5. A method for storing a data object in two or more different data
stores, the method comprising: identifying data blocks representing
a data object in a first data store, wherein the data object is a
discrete data object managed by a file system; for the identified
data blocks: identifying a portion of the identified data blocks
that satisfies one or more data storage criteria, wherein the one
or more data storage criteria is associated with a recent access of
the identified data blocks; and transferring data stored by the
portion of the identified data blocks that satisfies the one or
more data storage criteria to a second data store, wherein the
second data store is associated with data that satisfies the one or
more data storage criteria.
6. The method of claim 5, further comprising: updating an index
associated with the data object to include information associating
the portion of the identified data blocks with the second data
store; and removing information from an allocation table associated
with the file system, wherein the removed information associates
the transferred data with the first data store.
7. The method of claim 5, further comprising: after transferring
the data stored by the portion of the identified data blocks to the
second data store: identifying, from a portion of the identified
data blocks that does not satisfy the one or more data storage
criteria, one or more data blocks that satisfy the data retention
criteria; and transferring the identified one or more blocks to the
second data store.
8. The method of claim 5, wherein the one or more data storage
criteria includes a time period in which to retain data in the
first data store.
9. The method of claim 5, wherein the one or more data storage
criteria defines a time period in which the recent access must
satisfy.
10. The method of claim 5, wherein the first data store includes a
disk drive associated with the file system and the second data
store includes removable media located in a different location than
a location of the disk drive.
11. A tangible computer-readable storage medium whose contents
cause a data storage system to perform a method of migrating data
from primary storage to secondary storage, the method comprising:
identifying no more than n-1 data blocks, located within primary
storage, that satisfy a criteria, wherein the n-1 data blocks
represent a portion of a data file consisting of n blocks and the n
blocks contain data written by a file system associated with the
primary storage; and transferring data contained by the identified
no more than n-1 data blocks from the primary storage to the
secondary storage.
12. The computer-readable medium of claim 11, further comprising:
updating a table to include information associating the transferred
data with information identifying blocks within the secondary
storage that contain the transferred data.
13. The computer-readable medium of claim 11, further comprising:
updating a table to include information associating the transferred
data with information identifying tape offsets for the secondary
storage that contain the transferred data.
14. The computer-readable medium of claim 11, further comprising:
removing information associated with the identified blocks from an
allocation table of the file system.
15. The computer-readable medium of claim 11, wherein the criteria
defines a time period in which the file system last accessed the
blocks of the primary storage.
16. The computer-readable medium of claim 11, wherein the criteria
defines a time period in which changes were made to data contained
by the primary storage.
17. The computer-readable medium of claim 11, wherein the criteria
defines a time period in which the data contained by the primary
storage was created.
18. The computer-readable medium of claim 11, wherein primary
storage includes a data file for a virtual machine.
19. A method in a data storage system for restoring a portion of a
file, the method comprising: receiving, at a file system, a request
from a user to modify a portion of a file, wherein the file is at
least partially stored in secondary storage on a storage device
located at a location geographically different than a location for
the file system; identifying one or more data blocks within the
storage device that contain data associated with the portion of the
file to be modified; retrieving the data contained by the
identified one or more data blocks without retrieving all data
blocks associated with the file; presenting the retrieved data to
the user; and upon receiving input from the user to modify the
portion of the file, transferring data associated with the received
input for storage by the storage device.
20. A system for restoring a portion of a file, the system
comprising: means, at a file system, for receiving a request from a
user to modify a portion of a file, wherein the file is at least
partially stored in secondary storage on a storage device located
at a location geographically different than a location for the file
system; means for identifying one or more data blocks within the
storage device that contain data associated with the portion of the
file to be modified; means for retrieving the data contained by the
identified one or more data blocks without retrieving all data
blocks associated with the file; means for presenting the retrieved
data to the user; and means for transferring data associated with
the received input for storage by the storage device upon receiving
input from the user to modify the portion of the file.
21. The system of claim 20, wherein the identified one or more data
blocks are a proper subset of a set of data blocks that contain
data associated with the file.
22. The system of claim 20, wherein the means for identifying one
or more data blocks identifies one or more chunks within the
storage device.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Patent Application
No. 61/096,587, filed on Sep. 12, 2008, entitled TRANSFERRING OR
MIGRATING PORTIONS OF DATA OBJECTS, SUCH AS BLOCK-LEVEL DATA
MIGRATION OR CHUNK-BASED DATA MIGRATION, which is incorporated by
reference in its entirety.
BACKGROUND
[0002] Data storage systems contain large amounts of data. This
data includes personal data, such as financial data,
customer/client/patient contact data, audio/visual data, and much
more. Computer systems often contain word processing documents,
engineering diagrams, spreadsheets, business strategy
presentations, email mailboxes, and so on. With the proliferation
of computer systems and the ease of creating content, the amount of
content in an organization has expanded rapidly. Even small offices
often have more information stored than any single employee can
know about or locate.
[0003] To that end, both companies and individuals rely on data
storage systems to store, protect, and/or hold old data, such as
data no longer actively needed. Often, these data storage systems
perform data migration, moving data from primary storage
(containing actively needed data) to secondary storage (such as
backup storage or archives). Typical data storage systems transfer
data in the forms of files, folders, and so on. For example, the
typical data storage system may transfer data from a data store
associated with a user to secondary storage while maintaining the
structure and application format of the files themselves.
[0004] To restore the data, these systems then require knowledge of
applications that create the data. Additionally, some files can be
very large, and restoring a large file can be costly, time
consuming, and resource intensive.
[0005] The need exists for a system that overcomes the above
problems, as well as one that provides additional benefits.
Overall, the examples herein of some prior or related systems and
their associated limitations are intended to be illustrative and
not exclusive. Other limitations of existing or prior systems will
become apparent to those of skill in the art upon reading the
following Detailed Description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] FIG. 1 is a block diagram illustrating components of a data
stream utilized by a suitable data storage system.
[0007] FIG. 2 is a block diagram illustrating an example of a data
storage system.
[0008] FIG. 3 is a block diagram illustrating an example of
components of a server used in data storage operations.
[0009] FIG. 4 is a block diagram illustrating a system for
performing increment-based data migration.
[0010] FIG. 5 is a block diagram illustrating the intermediate
component of FIG. 4.
[0011] FIGS. 6A and 6B are schematic diagrams illustrating a data
store before and after a block-based data migration,
respectively.
[0012] FIG. 7 is a flow diagram illustrating a routine for
performing block-level data migration.
[0013] FIG. 8 is a block diagram illustrating a system for
providing chunk-based data migration and/or restoration.
[0014] FIG. 9 is a flow diagram illustrating a routine for
performing chunk-level data migration.
[0015] FIG. 10 is a flow diagram illustrating a routine for
block-based or chunk-based data restoration and modification.
DETAILED DESCRIPTION
Overview
[0016] Described in detail herein is a system and method that
transfers or migrates data objects (such as files, folders, data
stores, and/or discrete data components) by migrating segments,
portions, increments, or proper subsets of the data objects. The
system may transfer increments of files, folders, and other data
objects from primary storage (or other sources) to secondary
storage based on certain criteria, such as time-based criteria,
age-based criteria, and so on. An increment may be one or more
blocks of a data object, or one or more chunks of a data object, or
other portions that combine to form, store, and/or contain a data
object, such as a file.
[0017] In some examples, the system performs block-based migration
of data. That is, the system identifies one or more blocks of a
data object that satisfy a certain criteria, and migrates the
identified blocks. For example, the system may determine that a
certain number of blocks of a file have not been modified or called
by a file system within a certain time period, and migrate these
blocks to secondary storage. The system then maintains the other
blocks of the file in primary storage. In some cases, the system
automatically migrates data without requiring user input.
Additionally, the migration may be transparent to a user.
[0018] In some examples, the system performs chunk-based migration
of data. A chunk is, for example, a group or set of blocks. One or
more chunks may comprise a file, folder, or other data object. The
system identifies one or more chunks of a data object that satisfy
a certain criteria, and migrates the identified chunks. For
example, the system may determine that a certain number of chunks
of a file have not been modified or called by a file system in a
certain time period, and migrate these chunks to secondary storage.
The system then maintains the other chunks of the file in primary
storage. Further details regarding chunks and chunk-based storage
may be found in U.S. Patent Application No. 61/180,791, entitled
BLOCK-LEVEL SINGLE INSTANCING, filed May 22, 2009.
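The chunk-based selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fixed chunk size, the helper names group_into_chunks and stale_chunks, and the rule that a chunk qualifies only when every one of its blocks is stale are hypothetical and are not taken from the description.

```python
def group_into_chunks(block_ids, chunk_size):
    """Split an ordered list of block ids into fixed-size chunks.

    A fixed chunk size is an assumption made for this sketch.
    """
    return [block_ids[i:i + chunk_size]
            for i in range(0, len(block_ids), chunk_size)]


def stale_chunks(chunks, stale_blocks):
    """Return the chunks in which every block is stale (an assumed rule)."""
    return [c for c in chunks if all(b in stale_blocks for b in c)]


# A file of eight blocks, grouped into two chunks of four blocks each.
chunks = group_into_chunks(list(range(8)), chunk_size=4)

# Blocks 0-3 and 6 have not been modified or called recently.
to_migrate = stale_chunks(chunks, stale_blocks={0, 1, 2, 3, 6})
print(to_migrate)
```

In this sketch the second chunk stays in primary storage because three of its four blocks are still in use, even though block 6 on its own would qualify.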
[0019] In some examples, the system leverages the block-based or
chunk-based data migration in order to restore portions of data
objects without restoring entire data objects. For example, the
system can restore one or more blocks of a file, present the data
contained by the blocks, receive modifications to the data, and
update the blocks, and hence the file.
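A minimal sketch of such a partial restore follows. The dictionaries standing in for primary storage, secondary storage, and the index, and the function name restore_and_modify, are hypothetical; writing the updated block back to primary storage is likewise an assumption of the sketch rather than something the description requires.

```python
def restore_and_modify(primary, secondary, index, file_id, block_no, edit):
    """Recall one block of a file, apply `edit`, and store the result.

    `index` maps (file_id, block_no) to a key in `secondary`. Only that
    key is read, so the rest of the file stays in secondary storage.
    """
    key = index[(file_id, block_no)]
    data = secondary[key]        # retrieve just the requested block
    updated = edit(data)         # stand-in for the user's modification
    primary[(file_id, block_no)] = updated
    return updated


# Two blocks of a hypothetical file "report" live in secondary storage.
secondary = {"t0": b"old text", "t1": b"untouched"}
index = {("report", 0): "t0", ("report", 1): "t1"}
primary = {}

result = restore_and_modify(primary, secondary, index,
                            "report", 0, bytes.upper)
print(result)
```

Only block 0 is read and rewritten; block 1 is never retrieved.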
[0020] The system will now be described with respect to various
examples. The following description provides specific details for a
thorough understanding of, and enabling description for, these
examples of the system. However, one skilled in the art will
understand that the system may be practiced without these details.
In other instances, well-known structures and functions have not
been shown or described in detail to avoid unnecessarily obscuring
the description of the examples of the system.
[0021] The terminology used in the description presented below is
intended to be interpreted in its broadest reasonable manner, even
though it is being used in conjunction with a detailed description
of certain specific examples of the system. Certain terms may even
be emphasized below; however, any terminology intended to be
interpreted in any restricted manner will be overtly and
specifically defined as such in this Detailed Description
section.
Suitable System
[0022] Referring to FIG. 1, a block diagram illustrating components
of a data stream utilized by a suitable data storage and recovery
system, such as a system that performs block-based and/or
chunk-based data migration, is shown. The stream 110 may include a
client 111, a media agent 112, and a secondary storage device 113.
For example, in storage operations, the system may store, receive
and/or prepare data, such as blocks or chunks, to be stored, copied
or backed up at a server or client 111. The system may then
transfer the data to be stored to media agent 112, which may then
refer to storage policies, schedule policies, and/or retention
policies (and other policies) to choose a secondary storage device
113. The media agent 112 may include or be associated with an
intermediate component, to be discussed herein.
[0023] The secondary storage device 113 receives the data from the
media agent 112 and stores the data as a secondary copy, such as a
backup copy. Secondary storage devices may be magnetic tapes,
optical disks, USB and other similar media, disk and tape drives,
and so on. Of course, the system may employ other configurations of
stream components not shown in the Figure.
[0024] Referring to FIG. 2, a block diagram illustrating an example
of a data storage and recovery system 200 is shown. Data storage
systems may contain some or all of the following components,
depending on the needs of the system. FIG. 2 and the following
discussion provide a brief, general description of a suitable
computing environment in which the system can be implemented.
Although not required, aspects of the system are described in the
general context of computer-executable instructions, such as
routines executed by a general-purpose computer, e.g., a server
computer, wireless device or personal computer. Those skilled in
the relevant art will appreciate that the system can be practiced
with other communications, data processing, or computer system
configurations, including: Internet appliances, network PCs,
mini-computers, mainframe computers, and the like. Indeed, the
terms "computer," "host," and "host computer" are generally used
interchangeably herein, and refer to any of the above devices and
systems, as well as any data processor.
[0025] Aspects of the system can be embodied in a special purpose
computer or data processor that is specifically programmed,
configured, or constructed to perform one or more of the
computer-executable instructions explained in detail herein.
Aspects of the system can also be practiced in distributed
computing environments where tasks or modules are performed by
remote processing devices, which are linked through a
communications network, such as a Local Area Network (LAN), Wide
Area Network (WAN), Storage Area Network (SAN), Fibre Channel, or
the Internet. In a distributed computing environment, program
modules may be located in both local and remote memory storage
devices.
[0026] Aspects of the system may be stored or distributed on
computer-readable media, including tangible storage media, such as
magnetically or optically readable computer discs, hard-wired or
preprogrammed chips (e.g., EEPROM semiconductor chips),
nanotechnology memory, biological memory, or other data storage
media. Indeed, computer implemented instructions, data structures,
screen displays, and other data under aspects of the system may be
distributed over the Internet or over other networks (including
wireless networks), on a propagated signal on a propagation medium
(e.g., an electromagnetic wave(s), a sound wave, etc.) over a
period of time, or they may be provided on any analog or digital
network (packet switched, circuit switched, or other scheme). Those
skilled in the relevant art will recognize that portions of the
system reside on a server computer, while corresponding portions
reside on a client computer, and thus, while certain hardware
platforms are described herein, aspects of the system are equally
applicable to nodes on a network.
[0027] For example, the data storage system 200 contains a storage
manager 210, one or more clients 111, one or more media agents 112,
and one or more storage devices 113. Storage manager 210 controls
media agents 112, which may be responsible for transferring data to
storage devices 113. Storage manager 210 includes a jobs agent 211,
a management agent 212, a database 213, and/or an interface module
214. Storage manager 210 communicates with client(s) 111. One or
more clients 111 may access data to be stored by the system from
database 222 via a data agent 221. The system uses media agents
112, which contain databases 231, to transfer and store data into
storage devices 113. Client databases 222 may contain data files
and other information, while media agent databases may contain
indices and other data structures that store the data at secondary
storage devices, for example.
[0028] The data storage and recovery system may include software
and/or hardware components and modules used in data storage
operations. The components may be storage resources that function
to copy data during storage operations. The components may perform
other storage operations (or storage management operations) other
than operations used in data stores. For example, some resources
may create, store, retrieve, and/or migrate primary or secondary
data copies of data. Additionally, some resources may create
indices and other tables relied upon by the data storage system and
other data recovery systems. The secondary copies may include
snapshot copies and associated indices, but may also include other
backup copies such as HSM copies, archive copies, auxiliary copies,
and so on. The resources may also perform storage management
functions that may communicate information to higher level
components, such as global management resources.
[0029] In some examples, the system performs storage operations
based on storage policies, as mentioned above. For example, a
storage policy includes a set of preferences or other criteria to
be considered during storage operations. The storage policy may
determine or define a storage location and/or set of preferences
about how the system transfers data to the location and what
processes the system performs on the data before, during, or after
the data transfer. In some cases, a storage policy may define a
logical bucket in which to transfer, store or copy data from a
source to a data store, such as storage media. Storage policies may
be stored in storage manager 210, or may be stored in other
resources, such as a global manager, a media agent, and so on.
Further details regarding storage management and resources for
storage management will now be discussed.
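One way to picture a storage policy is as a small record of preferences that the system consults before a transfer. The sketch below is illustrative only: the field names (retain_in_primary, destination, compress), the policy names, and the choose_policy helper are assumptions introduced here, not structures defined by the description.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical policy record; all field names are assumptions."""
    name: str
    retain_in_primary: timedelta   # how long data stays in primary storage
    destination: str               # logical target, e.g. a media agent name
    compress: bool = False         # example of a processing preference


def choose_policy(policies, data_class):
    """Pick the policy registered for a data class, else the default."""
    return policies.get(data_class, policies["default"])


policies = {
    "default": StoragePolicy("default", timedelta(days=90), "media-agent-1"),
    "mail": StoragePolicy("mail", timedelta(days=30), "media-agent-2", True),
}

print(choose_policy(policies, "mail").destination)
print(choose_policy(policies, "video").destination)
```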
[0030] Referring to FIG. 3, a block diagram illustrating an example
of components of a server used in data storage operations is shown.
A server, such as storage manager 210, may communicate with clients
111 to determine data to be copied to storage media. As described
above, the storage manager 210 may contain a jobs agent 211, a
management agent 212, a database 213, and/or an interface module.
Jobs agent 211 may manage and control the scheduling of jobs (such
as copying data files) from clients 111 to media agents 112.
Management agent 212 may control the overall functionality and
processes of the data storage system, or may communicate with
global managers. Database 213 or another data structure may store
storage policies, schedule policies, retention policies, or other
information, such as historical storage statistics, storage trend
statistics, and so on. Interface module 214 may interact with a
user interface, enabling the system to present information to
administrators and receive feedback or other input from the
administrators or with other components of the system (such as via
APIs).
[0031] In some examples, the system performs some or all the
operations described herein using an intermediate component,
virtual storage device, virtual device driver, virtual disk driver,
or other intermediary capable of mounting to a file system and
communicating with a storage device. That is, an intermediate
component may communicatively reside between a file system and a
primary data store that contains data created by the file system
and a secondary data store. The intermediate component enables
flexibility during data restoration, enabling a file system to
indirectly access a secondary copy of data in order to identify
information associated with data stored by the secondary copy,
among other benefits.
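The pass-through role of such an intermediary can be sketched as a toy class that sits in front of a disk driver, notes which blocks each write touches, and then forwards the write unchanged. The class name VirtualDiskDriver, its touched counter, and the use of a dictionary as the disk are hypothetical simplifications; an actual device driver would operate at the operating-system level.

```python
class VirtualDiskDriver:
    """Toy intermediary between a file system and a disk driver."""

    def __init__(self, disk_write):
        self._disk_write = disk_write   # callable standing in for the disk driver
        self.touched = {}               # block number -> number of writes seen

    def write(self, block_no, data):
        # Record which block is being modified, then pass the write through.
        self.touched[block_no] = self.touched.get(block_no, 0) + 1
        self._disk_write(block_no, data)


disk = {}                                      # stand-in for the primary disk
driver = VirtualDiskDriver(disk.__setitem__)
driver.write(3, b"hello")
driver.write(3, b"world")
driver.write(5, b"!")

print(disk)
print(driver.touched)
```

The file system sees ordinary writes, while the intermediary accumulates the per-block information that later migration decisions can draw on.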
Data Migration System
[0032] Referring to FIG. 4, a block diagram illustrating a system
for performing portion-based data migration is shown. The system
components include a data creation and/or modification component
410, an intermediate component 420, and a data storage component
430. The data creation and/or modification component 410 may
include a client portion 415, such as a client portion that receives
input from users. A file
system 417, as discussed herein, may organize and provide data to
applications, user interfaces, and so on to the user, among other
things. The file system creates, updates, modifies, and/or removes
data from a data store, based on input from users. The file system
417 may store the created data in one or more data stores, such as
a local database 418 that provides primary storage. For example,
the database 418 may be a hard drive or hard disk that stores data
produced by the file system as primary copies or production copies
of the data. The system components may also include an intermediate
component 420 (further described herein), such as a virtual disk
driver. The intermediate component 420 communicates with a disk
driver 435 and mounted disk 437, which together may act as the data
storage component 430. Additionally, the intermediate component 420
may be located between the file system 417 and database 418. The
data storage component provides secondary storage, and may store
secondary copies of data generated by the file system 417, such as
secondary copies of primary copies stored in database 418.
[0033] Referring to FIG. 5, a block diagram illustrating the
intermediate component 420 of FIG. 4 is shown. The intermediate
component 420 includes a restore module 510 that may contain its
own file system 515. The restore module 510 (or component,
sub-system, and so on), may communicate with a file system, such as
the file system 417. Further details with respect to the functionality
of the restore module 510 are described herein.
[0034] The intermediate component 420 may also include a storage
device module 520 that communicates with storage devices, such as
disk driver 435 and disk 437 (or other fixed or removable media).
The storage device module 520 may include an index 525 or
allocation table that identifies available media for data storage,
contains information associated with data stored via the
intermediate component 420, and so on.
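As a rough illustration, an index of this kind can be modeled as a mapping from a block of a file to the media location that now holds its data. The names MediaLocation and BlockIndex, the media_id and offset fields, and the sample values are all hypothetical; the description does not specify the layout of index 525.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MediaLocation:
    media_id: str   # hypothetical identifier of a tape or disk
    offset: int     # hypothetical byte offset on that media


@dataclass
class BlockIndex:
    """Maps (file id, block number) to where the block's data is stored."""
    entries: dict = field(default_factory=dict)

    def record(self, file_id, block_no, location):
        self.entries[(file_id, block_no)] = location

    def lookup(self, file_id, block_no):
        """Return the location, or None if no entry was recorded."""
        return self.entries.get((file_id, block_no))


index = BlockIndex()
index.record("mail.pst", 7, MediaLocation("tape-01", 4096))

print(index.lookup("mail.pst", 7))
print(index.lookup("mail.pst", 8))
```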
[0035] The intermediate component 420 may also include a cache 530
(or, a cache module or interface that communicates with an external
cache), and/or other agents or modules 540, such as modules that
index files, classify files, manage files or information, and so
on.
Block-Based Data Migration
[0036] Block-level migration, or block-based data migration,
involves migrating disk blocks from a primary data store (e.g., a
disk partition) to secondary media. Using block-level migration, a
data storage system transfers blocks on a disk partition that have
not been recently accessed to secondary storage, freeing up space
on the disk. In order to expand the database, the system moves data
from the database to other locations, such as other databases or
storage locations. Typically, such expansion requires knowledge of
the database, such as the database application, the database
schema, and so on. However, using block-level migration, the system
can expand or extend a database without any knowledge of the
applications or schema of the database, providing for transparent
migration and/or restoration of data from one storage location to
another. This can be helpful when migrating data from virtual
machines that contain large files (e.g., large files created by
applications such as VMware, Microsoft Virtual Server, and so on).
The system may implement block-level migration processes as
software device drivers, but may also implement block-level
migration in disk hardware.
[0037] As described herein, the system can transfer or migrate
certain blocks of a data object from one data store to another,
such as from primary storage that contains a primary copy of the
data object to secondary storage that contains or will contain a
secondary copy of the primary copy of the data object. Referring to
FIGS. 6A-6B, a schematic diagram illustrating contents of two data
stores before and after a block-based data migration is shown. In
FIG. 6A, a first data store 610 contains primary copies (i.e.,
production copies) of two data objects, a first data object 620 and
a second data object 630. The first data object comprises blocks A
and A', where blocks A are blocks that satisfy or meet certain
storage criteria (such as blocks that have not been modified since
creation or not been modified within a certain period of time) and
blocks A' are blocks that do not meet the criteria (such as blocks
that have been modified within the certain time period). The second
data object comprises blocks B and B', where blocks B satisfy the
criteria and blocks B' do not meet the criteria.
[0038] FIG. 6B depicts the first data store 610 after a block-based
data migration of the two data objects 620 and 630. In this
example, the system transfers only the data from blocks that
satisfy the criteria (blocks A and B) from the first data store 610
to a second data store 640, such as secondary storage 642, 644. The
secondary storage may include one or more magnetic tapes, one or
more optical disks, and so on. The system maintains data in the
remaining blocks (blocks A' and B') within the first data store
610.
[0039] The system can perform file system data migration at a block
level, unlike previous systems that only migrate data at the file
level (that is, they have a file-level granularity). By tracking
migrated blocks, the system can also restore data at the block
level, which may avoid cost and time problems associated with
restoring data at the file level or may assist in defragmenting a
storage device. Further details regarding the block-level
restoration of data are discussed herein.
[0040] Referring to FIG. 7, a flow diagram illustrating a routine
700 for performing block-level data migration is shown. In step
710, the system identifies data blocks within a data store that
satisfy certain criteria. The system may track data blocks and
access the blocks via APIs. The data store may be a database
associated with a file system, a SQL database, a Microsoft Exchange
mailbox, and so on. The system may compare some or all of the
blocks (or, information associated with the blocks) of the data
store with predetermined criteria. The predetermined criteria may
be time-based criteria within a storage policy or data retention
policy.
[0041] In some examples, the system identifies blocks set to be
"aged off" from the data store. That is, the system identifies
blocks created, changed, or last modified before a certain date and
time. For example, the system may review a data store for all data
blocks that satisfy a criterion or criteria. The data store may be
an electronic mailbox or personal folders (.pst) file for a
Microsoft Exchange user, and the criterion may define, for example,
all blocks or emails last modified or changed thirty days ago or
earlier. The system compares information associated with the
blocks, such as metadata associated with the blocks, to the
criteria, and identifies all blocks that satisfy the criteria. For
example, the system identifies all blocks in the .pst file not
modified within the past thirty days. The identified blocks may
include all the blocks for some emails and/or a portion of the
blocks for other emails. That is, for a given email (or data
object), a first portion of the blocks that include the email may
satisfy the criteria, while a second portion of the blocks that
include the same email may not satisfy the criteria. In other
words, a file or a data object can be divided into parts or
portions, and only some of the parts or portions change.
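The age-based selection described above can be sketched as follows.
This is only an illustration under stated assumptions: the
block_mtime mapping, the aged_off_blocks function, and the sample
dates are hypothetical and do not come from the system itself. The
sketch shows how one object ("Email2") can have some blocks that
satisfy a thirty-day criterion and others that do not.

```python
from datetime import datetime, timedelta

# Hypothetical block metadata: (object name, block number) -> last-modified time.
block_mtime = {
    ("Email1", 1): datetime(2008, 7, 1, 9, 0),
    ("Email1", 2): datetime(2008, 7, 1, 9, 0),
    ("Email2", 3): datetime(2008, 9, 8, 14, 30),  # body, recently modified
    ("Email2", 4): datetime(2008, 7, 15, 11, 0),  # attachment, untouched
}


def aged_off_blocks(mtimes, now, max_age):
    """Return the keys of blocks last modified at or before now - max_age."""
    cutoff = now - max_age
    return sorted(key for key, mtime in mtimes.items() if mtime <= cutoff)


now = datetime(2008, 9, 12, 12, 0)
selected = aged_off_blocks(block_mtime, now, timedelta(days=30))
```

Here all blocks of "Email1" are selected, while only block 4 of
"Email2" is selected; block 3 of the same object stays in place.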
[0042] To determine which blocks have changed, and when, the system
can monitor the activity of the file system via the intermediate
component 420 (e.g., the virtual device driver). The system may
store a data structure, such as a bitmap, table, log, and so on,
within the cache 530 or other memory of the intermediate component
420, and update the bitmap whenever the file system calls the
database 418 to access and update or change data blocks within the
database 418. The intermediate component 420 traps the command to
the disk driver, where that command identifies certain blocks on a
disk for access or modifications, and writes to the bitmap the
changed blocks and the time of the change. The bitmap may include
information such as an identification of changed blocks and a date
and a time the blocks were changed. The bitmap, which may be a
table, data structure, or group of pointers, such as a snapshot,
may also include other information, such as information that maps
file names to blocks, information that maps chunks to blocks and/or
file names, and so on. Table 1 provides entry information for a
bitmap tracking the activity of a file system with the "/users"
directory:
TABLE 1
Blocks                      Date and Time Modified
/users/blocks1-100          09.08.2008 @14:30
/users/blocks101-105        09.04.2008 @12:23
/users2/blocks106-110       09.04.2008 @11:34
/users3/blocks110-1000      08.05.2008 @10:34
[0043] Thus, if a storage policy identified the time 08.30.2008
@12:00 as a threshold time criterion, where data modified after the
time is to be retained, the system would identify, in step 710,
blocks110-1000 (last modified 08.05.2008 @10:34, before the
threshold) as having satisfied the criterion. Thus, the system,
via the intermediate component 420, can monitor what blocks are
requested by a file system, and act accordingly, as described
herein.
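A minimal sketch of such a change-tracking structure, using the
entries of Table 1, might look like the following. The ChangeTracker
class and its method names are hypothetical, and the sketch assumes
that the dates in Table 1 are written as MM.DD.YYYY. It is not the
intermediate component 420 itself, only an illustration of recording
writes and applying the threshold from the example above.

```python
from datetime import datetime


class ChangeTracker:
    """Records, per block range, the time of the most recent write."""

    def __init__(self):
        self.last_modified = {}  # block-range label -> datetime

    def record_write(self, block_range, when):
        # Stands in for trapping a write command bound for the disk driver.
        self.last_modified[block_range] = when

    def modified_before(self, threshold):
        """Block ranges whose last modification precedes the threshold."""
        return [r for r, t in self.last_modified.items() if t < threshold]


tracker = ChangeTracker()
tracker.record_write("/users/blocks1-100", datetime(2008, 9, 8, 14, 30))
tracker.record_write("/users/blocks101-105", datetime(2008, 9, 4, 12, 23))
tracker.record_write("/users2/blocks106-110", datetime(2008, 9, 4, 11, 34))
tracker.record_write("/users3/blocks110-1000", datetime(2008, 8, 5, 10, 34))

# Threshold from the example: 08.30.2008 @12:00.
candidates = tracker.modified_before(datetime(2008, 8, 30, 12, 0))
```

Only the last entry precedes the threshold, so it is the only
candidate for migration.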
[0044] In step 720, the system transfers data within the identified
blocks from the data store to a media agent, to be stored in a
different data store. The system may perform some or all of the
processes described with respect to FIGS. 1-3 when transferring the
data to the media agent. For example, before transferring data, the
system may review a storage policy as described herein to select a
media agent, such as media agent 112, based on instructions within
the storage policy. In step 725, the system optionally updates an
allocation table, such as a file allocation table (FAT) for a file
system associated with the data store, to indicate the data blocks
that no longer contain data and are now free to receive and store
data from the file system.
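Steps 720 and 725 can be sketched as below. The function
migrate_blocks, the in-memory dictionaries, and the
send_to_media_agent callable are hypothetical stand-ins; a real
allocation table such as a FAT is considerably more involved than
the set of free block numbers assumed here.

```python
def migrate_blocks(data_store, block_ids, send_to_media_agent, free_blocks):
    """Send each identified block to the media agent, then free it locally."""
    for block_id in block_ids:
        send_to_media_agent(block_id, data_store[block_id])  # step 720
        del data_store[block_id]
        free_blocks.add(block_id)  # step 725 (optional in the text)


data_store = {1: b"old", 2: b"old too", 3: b"recent"}
received = {}        # stands in for the media agent's destination
free_blocks = set()  # stands in for the allocation table
migrate_blocks(data_store, [1, 2], received.__setitem__, free_blocks)
```

After the call, blocks 1 and 2 are held by the stand-in media agent
and are marked free, while block 3 remains in the data store.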
[0045] In step 730, via the media agent, the system stores data
from the blocks to a different data store. In some cases, the
system, via the media agent, stores the data from the blocks to a
secondary storage device, such as a magnetic tape or optical disk.
For example, the system may store the data from the blocks in
secondary copies of the data store, such as a backup copy, an
archive copy, and so on. In some cases, the system stores the data
from the blocks to a storage device located near and/or associated
with the data store, such as to a quick recovery volume that
facilitates quick restores of data.
[0046] The system may create, generate, update, and/or include an
allocation table (such as a table for the data store) that tracks
the transferred data and the data that was not transferred. The
table may include information identifying the original data blocks
for the data, the name of the data object, the location of any
transferred data blocks, and so on. For example, Table 2 provides
entry information for an example .pst file:
TABLE 2
Name of Data Object          Location of Data
Email1                       C:/users/blocks1-100
Email2.1 (body of email)     C:/users/blocks101-120
Email2.2 (attachment)        X:/remov1/blocks1-250
Email3                       X:/remov2/blocks300-500
[0047] In the above example, the data for "Email2" is stored in two
locations, a local data store (C:/) and an off-site data store
(X:/). The system maintains the body of the email, recently
modified or accessed, at a location within a data store associated
with a file system, "C:/users/blocks101-120." The system stores the
attachment, not recently modified or accessed, in a separate data
store, "X:/remov1/blocks1-250." Of course, the table may include
other information, fields, or entries not shown. For example, when
the system stores data to tape, the table may include tape
identification information, tape offset information, and so on.
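As an illustration, the entries of Table 2 could be held in a
structure like the one below. The Location type, the parts_of and
is_migrated helpers, and the assumption that "C:" denotes the local
data store are hypothetical choices made for this sketch.

```python
from typing import NamedTuple


class Location(NamedTuple):
    store: str        # e.g. "C:" (local) or "X:" (off-site)
    path: str
    first_block: int
    last_block: int


# Entries taken from Table 2.
location_table = {
    "Email1": Location("C:", "/users", 1, 100),
    "Email2.1": Location("C:", "/users", 101, 120),
    "Email2.2": Location("X:", "/remov1", 1, 250),
    "Email3": Location("X:", "/remov2", 300, 500),
}


def parts_of(table, name):
    """All entries for a data object, including its numbered portions."""
    return {k: v for k, v in table.items()
            if k == name or k.startswith(name + ".")}


def is_migrated(location, local_store="C:"):
    """True when the portion no longer resides in the local data store."""
    return location.store != local_store


email2 = parts_of(location_table, "Email2")
migrated = sorted(k for k, loc in email2.items() if is_migrated(loc))
```

For "Email2", the lookup returns both portions and reports that only
the attachment (Email2.2) has been migrated.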
Chunk-Based Data Migration
[0048] Chunked file migration, or chunk-based data migration,
involves splitting a data object into two or more portions of the
data object, creating an index that tracks the portions, and
storing the data object to secondary storage via the two or more
portions. Among other things, the chunk-based migration provides
for fast and efficient storage of a data object. Additionally,
chunk-based migration facilitates fast and efficient recall of a
data object, such as the large files described herein. For example,
if a user modifies a migrated file, chunk-based migration enables a
data restore component to only retrieve from, and migrate back to,
secondary storage the chunk containing the modified portion of the
file, and not the entire file. In some cases, chunk-based migration
may collaborate with components that provide file format and/or
database schema information in order to facilitate data
recovery.
[0049] As described above, in some examples the system migrates
chunks of data (sets of blocks) that comprise a data object from
one data store to another. Referring to FIG. 8, a block diagram
illustrating a system 800 for providing chunk-based data migration
and/or restoration is shown. The system 800 includes a file system
810, a callback layer 820, which interacts with the file system,
and a device driver 830, which reads from and writes data to a data
store 840 such as removable media including magnetic tapes, optical
disks, and so on. Further details with respect to the callback
layer 820 will be described herein.
[0050] As described above, the system migrates data via one or more
chunks, such as sets of blocks. A data object, such as a file, may
comprise two or more chunks. A chunk may be a logical division of a
data object. For example, a .pst file may include two or more
chunks: a first chunk that stores data associated with an index of
a user's mailbox, and one or more chunks that store email,
attachments, and so on within the user's mailbox. A chunk is a
proper subset of all the blocks comprising a file. That is, for a
file consisting of n blocks, the largest chunk of the file
comprises at most n-1 blocks.
[0051] The system 800 may include a chunking component 815 that
divides data objects, such as files, into chunks. The chunking
component 815 may receive files to be stored in database 418,
divide the files into two or more chunks, and store the files as
two or more chunks in database 418. The chunking component 815 may
update an index that associates information about the files
with the chunks of the files, the data blocks of the chunks, and so
on.
[0052] The chunking component 815 may perform different processes
when determining how to divide a data object. For example, the
chunking component 815 may include indexing, header, and other
identifying information or metadata in a first chunk, and include
the payload in other chunks. The chunking component 815 may follow
a rules-based process when dividing a data object. The rules may
define a minimum or maximum data size for a chunk, a time of
creation for data within a chunk, a type of data within a chunk,
and so on.
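One possible rules-based division, placing identifying information
in a first chunk and splitting the payload by a maximum chunk size,
is sketched below. The chunk_object function and its parameters are
hypothetical; the chunking component 815 may apply other rules.

```python
def chunk_object(header, payload, max_chunk_size):
    """Put the header in the first chunk; split the payload into
    pieces of at most max_chunk_size bytes."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    chunks = [header]
    for start in range(0, len(payload), max_chunk_size):
        chunks.append(payload[start:start + max_chunk_size])
    return chunks


chunks = chunk_object(b"HDR", b"abcdefgh", 3)
```

Concatenating every chunk after the first reproduces the original
payload, so the division loses no data.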
[0053] For example, the chunking component 815 may divide a user
mailbox (such as a .pst file) into a number of chunks, based on
various rules that assign emails within the mailbox to chunks based
on the metadata associated with the emails. The chunking component
815 may place an index of the mailbox in a first chunk and the
emails in other chunks. The chunking component 815 may then divide
the other chunks based on dates of creation, deletion or reception
of the emails, size of the emails, sender of the emails, type of
emails, and so on. Thus, as an example, the chunking component may
divide a mailbox as follows:
User1/Chunk1    Index
User1/Chunk2    Sent emails
User1/Chunk3    Received emails
User1/Chunk4    Deleted emails
User1/Chunk5    All attachments
Of course, other divisions are possible. Chunks may not necessarily
fall within logical divisions. For example, the chunking component
may divide a data object based on information or instructions not
associated with the data object, such as information about data
storage resources, information about a target secondary storage
device, historical information about previous divisions, and so
on.
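The mailbox division listed above can be sketched as a mapping from
email metadata to chunk names. The dictionary keys ("id", "folder",
"attachments") and the chunk_mailbox function are hypothetical; only
the five chunk names come from the example division.

```python
from collections import defaultdict

INDEX_CHUNK = "User1/Chunk1"
CHUNK_FOR_FOLDER = {
    "sent": "User1/Chunk2",
    "received": "User1/Chunk3",
    "deleted": "User1/Chunk4",
}
ATTACHMENT_CHUNK = "User1/Chunk5"


def chunk_mailbox(emails):
    """Assign each email to a chunk by folder, collect attachments in
    their own chunk, and record every assignment in the index chunk."""
    chunks = defaultdict(list)
    for email in emails:
        chunk_name = CHUNK_FOR_FOLDER[email["folder"]]
        chunks[chunk_name].append(email["id"])
        for attachment in email.get("attachments", []):
            chunks[ATTACHMENT_CHUNK].append(attachment)
        chunks[INDEX_CHUNK].append((email["id"], chunk_name))
    return dict(chunks)


mailbox = [
    {"id": "e1", "folder": "sent"},
    {"id": "e2", "folder": "received", "attachments": ["a.pdf"]},
    {"id": "e3", "folder": "deleted"},
]
layout = chunk_mailbox(mailbox)
```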
[0054] The system may perform chunking at various times or in
different locations of a data storage system. For example, although
FIG. 8 shows the chunking component 815 at file system 810, the
system may locate the chunking component at the device driver 830,
at an intermediate component, or other locations. In some cases,
the system may utilize the chunking component 815 to divide data
already in secondary storage into chunks. For example, a data
storage system may retrieve data objects under management that were
transferred to secondary storage using file-based data migration,
divide the data objects into two or more chunks, and migrate the
data objects back to storage using the chunk-based data migration
discussed herein. Thus, future restoration of the data objects may
be faster and easier because the data objects are divided into
chunks.
[0055] Referring to FIG. 9, a flow diagram illustrating a routine
900 for performing chunk-level data migration is shown. In step
910, the system identifies chunks of data blocks within a data
store that satisfy one or more criteria. The data store may store
large files (>50 MB), such as databases associated with a file
system, SQL databases, Microsoft Exchange mailboxes, virtual
machine files, and so on. The system may compare some or all of the
chunks (or, information associated with the chunks) of the data
store with predetermined and/or dynamic criteria. The predetermined
criteria may be time-based criteria within a storage policy or data
retention policy. The system may review an index with the chunking
component 815 when comparing the chunks with applicable
criteria.
[0056] In step 920, the system transfers data within the identified
chunks from the data store to a media agent, to be stored in a
different data store. The system may perform some or all of the
processes described with respect to FIGS. 1-3 when transferring the
data to the media agent. For example, the system may review a
storage policy assigned to the data store and select a media agent
based on instructions within the storage policy. In step 925, the
system optionally updates an allocation table, such as a file
allocation table (FAT) for a file system associated with the data
store, to indicate the data blocks that no longer contain data and
are now free to receive and store data from the file system.
[0057] In some examples, the system monitors the transfer of data
from the file system to the data store via the callback layer 820.
The callback layer 820 may be a layer, or additional file system,
that resides on top of the file system 810. The callback layer
820 may intercept data requests from the file system 810, in order
to identify, track and/or monitor the chunks requested by the file
system 810 and store information associated with these requests in
a data structure, such as a bitmap similar to the one shown in
Table 1. Thus, the callback layer 820 stores information
identifying when chunks are accessed by tracking calls from the
file system 810 to the data store 840. For example, Table 3
provides entry information for a bitmap tracking calls to a data
store:
TABLE 3
Chunk of File1    Access Time
File1.1           09.05.2008 @12:00
File1.2           09.05.2008 @12:30
File1.3           09.05.2008 @13:30
File1.4           06.04.2008 @12:30
[0058] In this example, the file system 810 creates a data object
named "File1," using the chunking component to divide the file into
four chunks: "File1.1," "File1.2," "File1.3," and "File1.4." The
file system 810 stores the four chunks to data store 840 on
06.04.2008. According to the table, the file system has not
accessed File1.4 since its creation, and most recently accessed the
other chunks on Sep. 5, 2008. Of course, Table 3 may include other
or different information, such as information identifying a
location of the chunks, information identifying the type of media
storing the chunks, information identifying the blocks within the
chunk, and/or other information or metadata.
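A layer that records access times in the manner of Table 3 can be
sketched as a thin wrapper around reads. The CallbackLayer class,
its injected clock, and the idle_since method are hypothetical; for
simplicity the sketch treats the creation of File1.4 as its first
access and assumes the dates are written as MM.DD.YYYY.

```python
from datetime import datetime


class CallbackLayer:
    """Intercepts chunk reads and records when each chunk was accessed."""

    def __init__(self, read_chunk, clock=datetime.now):
        self._read_chunk = read_chunk
        self._clock = clock
        self.access_time = {}  # chunk id -> time of most recent access

    def read(self, chunk_id):
        self.access_time[chunk_id] = self._clock()
        return self._read_chunk(chunk_id)

    def idle_since(self, threshold):
        """Chunks whose most recent access precedes the threshold."""
        return sorted(c for c, t in self.access_time.items() if t < threshold)


store = {"File1.1": b"one", "File1.2": b"two",
         "File1.3": b"three", "File1.4": b"four"}
# A scripted clock that replays the access times of Table 3.
times = iter([
    datetime(2008, 6, 4, 12, 30),
    datetime(2008, 9, 5, 12, 0),
    datetime(2008, 9, 5, 12, 30),
    datetime(2008, 9, 5, 13, 30),
])
layer = CallbackLayer(store.__getitem__, clock=lambda: next(times))
for chunk_id in ("File1.4", "File1.1", "File1.2", "File1.3"):
    layer.read(chunk_id)

stale = layer.idle_since(datetime(2008, 8, 30, 12, 0))
```

With a threshold of 08.30.2008 @12:00, only File1.4 is reported as
idle and therefore as a candidate for migration.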
[0059] In step 930, via the media agent, the system stores the data
from the chunks to a different data store. In some cases, the
system, via the media agent, stores the data to a secondary storage
device, such as a magnetic tape or optical disk. For example, the
system may store the data in secondary copies of the data store,
such as a backup copy, an archive copy, and so on. In some cases,
the system stores the data to a storage device located near and/or
associated with the data store, such as to a quick recovery
volume.
Data Recovery
[0060] The system, using the block-based or chunk-based data
migration processes described herein, is able to restore portions
of files instead of entire files, such as individual blocks or
chunks that comprise portions of the files. Referring to FIG. 10, a
flow diagram illustrating a routine 1000 for block-based or
chunk-based data restoration and modification is shown. In step
1010, the system, via a restore or data recovery component,
receives a request to modify a file located in a data store. For
example, a user submits a request to a file system to provide an
old copy of a large PowerPoint presentation so the user can modify
a picture located on slide 5 of 300 of the presentation. For
example, the data recovery component 410 works with the file system
417 and the data store 430.
[0061] In step 1020, the system identifies one or more blocks or
one or more chunks associated with the request. For example, the
system looks to a table similar to Table 2, and identifies blocks
associated with page 5 of the presentation and blocks associated
with a table of contents of the presentation.
[0062] In step 1030, the system retrieves the identified blocks or
chunks and presents them to the user. For example, the system
retrieves only page 5 and the table of contents of the presentation
and presents the pages to the user.
[0063] In step 1040, the system, via the file system, modifies the
retrieved blocks or chunks. For example, the user updates the
PowerPoint presentation to include a different
picture. In step 1050, the system transfers data associated with
the modified blocks or chunks to the data store. For example, the
system transfers the modified page 5 to the data store. The system
may also update a table that tracks access to the data store, such
as Table 1 or Table 3.
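Steps 1020 through 1050 of routine 1000 can be sketched as follows.
The restore_and_modify function, the part names, the chunk
identifiers, and the in-memory dictionaries standing in for
secondary storage and the data store are all hypothetical.

```python
def restore_and_modify(parts, index, secondary, data_store, edit):
    """Retrieve only the chunks for the requested parts, apply an edit,
    and write back only the chunks that actually changed."""
    chunk_ids = {part: index[part] for part in parts}          # step 1020
    retrieved = {part: secondary[cid]                          # step 1030
                 for part, cid in chunk_ids.items()}
    for part, data in retrieved.items():
        new_data = edit(part, data)                            # step 1040
        if new_data != data:
            data_store[chunk_ids[part]] = new_data             # step 1050
    return retrieved


index = {"slide5": "chunk-07", "toc": "chunk-01"}
secondary = {"chunk-01": b"contents",
             "chunk-07": b"old picture",
             "chunk-42": b"slide 200"}
data_store = {}


def edit(part, data):
    # The user replaces the picture on slide 5 and leaves the rest alone.
    return b"new picture" if part == "slide5" else data


retrieved = restore_and_modify(["slide5", "toc"], index, secondary,
                               data_store, edit)
```

The chunk holding slide 200 is never read, and only the modified
chunk is transferred to the data store.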
[0064] Thus, the system, leveraging block-based or chunk-based data
migration during data storage, restores only portions of data
objects required by a file system. Such restoration can be, among
other benefits, advantageous over systems that perform file-based
restoration, because those systems restore entire files, which can
be expensive, time consuming, and so on. Some files, such as .pst
files, may contain large amounts of data. File-based restoration
can therefore be inconvenient and cumbersome, among other things,
especially when a user only requires a small portion of a large
file.
[0065] For example, a user submits a request to the system to
retrieve an old email stored in a secondary copy on removable
media. The system identifies a portion of a .pst file associated
with the user that contains a list of old emails, and retrieves the
list. That is, the system has knowledge of the chunk that includes
the list (e.g., a chunking component may always include the list in
a first chunk of a data object), accesses the chunk, and retrieves
the list. The other portions (e.g., all the emails within the .pst
file) are not retrieved from media. The user selects the desired
email from the list. The system, via an index that associates
chunks with data (such as an index similar to Table 2), identifies
the chunk that contains the email, and retrieves the chunk for
presentation to the user. The index may include information about
the chunks and information about the data objects (such as file
formats, database schemas, application-specific information, and so
on).
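The two-step retrieval in this example can be sketched as below,
assuming, as the text suggests may be the case, that the list of
emails is always kept in the first chunk. The ChunkedArchive class,
the list_emails and fetch_email functions, and the sample contents
are hypothetical.

```python
class ChunkedArchive:
    """A chunked .pst-like object on removable media; logs every read."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.reads = []  # chunk numbers actually retrieved from media

    def read_chunk(self, number):
        self.reads.append(number)
        return self._chunks[number]


def list_emails(archive):
    # Assumption: chunk 0 maps each email subject to its chunk number.
    return archive.read_chunk(0)


def fetch_email(archive, listing, subject):
    return archive.read_chunk(listing[subject])[subject]


archive = ChunkedArchive([
    {"Budget": 1, "Lunch": 2},   # chunk 0: the list of old emails
    {"Budget": "Q3 numbers"},    # chunk 1
    {"Lunch": "Noon?"},          # chunk 2
])
listing = list_emails(archive)
body = fetch_email(archive, listing, "Budget")
```

Only chunk 0 and chunk 1 are read; chunk 2 stays on the media.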
[0066] Thus, the system is able to restore the email without
restoring the entire mailbox (.pst file) associated with the user.
That is, although an entire data object is in storage, the system
is able to retrieve a portion of the entire data object by
leveraging the processes described herein.
CONCLUSION
[0067] From the foregoing, it will be appreciated that specific
examples of the data recovery system have been described herein for
purposes of illustration, but that various modifications may be
made without deviating from the spirit and scope of the system. For
example, although files have been described, other types of content
such as user settings, application data, emails, and other data
objects can be imaged by snapshots. Accordingly, the system is not
limited except as by the appended claims.
[0068] Unless the context clearly requires otherwise, throughout
the description and the claims, the words "comprise," "comprising,"
and the like are to be construed in an inclusive sense, as opposed
to an exclusive or exhaustive sense; that is to say, in the sense
of "including, but not limited to." The word "coupled", as
generally used herein, refers to two or more elements that may be
either directly connected, or connected by way of one or more
intermediate elements. Additionally, the words "herein," "above,"
"below," and words of similar import, when used in this
application, shall refer to this application as a whole and not to
any particular portions of this application. Where the context
permits, words in the above Detailed Description using the singular
or plural number may also include the plural or singular number
respectively. The word "or," in reference to a list of two or more
items, covers all of the following interpretations of the
word: any of the items in the list, all of the items in the list,
and any combination of the items in the list.
[0069] The above detailed description of embodiments of the system
is not intended to be exhaustive or to limit the system to the
precise form disclosed above. While specific embodiments of, and
examples for, the system are described above for illustrative
purposes, various equivalent modifications are possible within the
scope of the system, as those skilled in the relevant art will
recognize. For example, while processes or blocks are presented in
a given order, alternative embodiments may perform routines having
steps, or employ systems having blocks, in a different order, and
some processes or blocks may be deleted, moved, added, subdivided,
combined, and/or modified. Each of these processes or blocks may be
implemented in a variety of different ways. Also, while processes
or blocks are at times shown as being performed in series, these
processes or blocks may instead be performed in parallel, or may be
performed at different times.
[0070] The teachings of the system provided herein can be applied
to other systems, not necessarily the system described above. The
elements and acts of the various embodiments described above can be
combined to provide further embodiments.
[0071] These and other changes can be made to the system in light
of the above Detailed Description. While the above description
details certain embodiments of the system and describes the best
mode contemplated, no matter how detailed the above appears in
text, the system can be practiced in many ways. Details of the
system may vary considerably in implementation details, while still
being encompassed by the system disclosed herein. As noted above,
particular terminology used when describing certain features or
aspects of the system should not be taken to imply that the
terminology is being redefined herein to be restricted to any
specific characteristics, features, or aspects of the system with
which that terminology is associated. In general, the terms used in
the following claims should not be construed to limit the system to
the specific embodiments disclosed in the specification, unless the
above Detailed Description section explicitly defines such terms.
Accordingly, the actual scope of the system encompasses not only
the disclosed embodiments, but also all equivalent ways of
practicing or implementing the system under the claims.
[0072] While certain aspects of the system are presented below in
certain claim forms, the applicant contemplates the various aspects
of the system in any number of claim forms. For example, while only
one aspect of the system is recited as a means-plus-function claim
under 35 U.S.C. sec. 112, sixth paragraph, other aspects may
likewise be embodied as a means-plus-function claim, or in other
forms, such as being embodied in a computer-readable medium. (Any
claims intended to be treated under 35 U.S.C. § 112, ¶ 6 will
begin with the words "means for".) Accordingly, the applicant
reserves the right to add additional claims after filing the
application to pursue such additional claim forms for other aspects
of the system.
* * * * *