U.S. patent application number 16/886534 was filed with the patent office on 2020-05-28 and published on 2021-06-17 for system and method for generating file system and block-based incremental backups using enhanced dependencies and file system information of data blocks.
The applicant listed for this patent is EMC IP Holding Company LLC. Invention is credited to Aaditya Bansal, Shelesh Chopra, Manish Sharma, Sunil Yadav.
Application Number | 20210182160 16/886534 |
Family ID | 1000004881343 |
Filed Date | 2020-05-28 |
United States Patent Application | 20210182160 |
Kind Code | A1 |
Sharma; Manish; et al. | June 17, 2021 |
SYSTEM AND METHOD FOR GENERATING FILE SYSTEM AND BLOCK-BASED
INCREMENTAL BACKUPS USING ENHANCED DEPENDENCIES AND FILE SYSTEM
INFORMATION OF DATA BLOCKS
Abstract
A method for a backup operation includes obtaining, by a backup
agent, a backup request for an incremental backup of a file system,
and in response to the backup request: selecting a reference backup
from a backup storage system, obtaining a first hash value document
associated with the reference backup, generating a hash value for
an asset associated with the file system, making a first
determination that the hash value matches a second hash value
specified in the first hash value document, in response to the
first determination, populating an incremental backup with a copy
of data associated with the asset, initiating a transfer of the
incremental backup to the backup storage system, and storing a
second hash value document, wherein the second hash value document
comprises the hash value and a backup identifier of the incremental
backup.
Inventors: | Sharma; Manish (Bangalore, IN); Bansal; Aaditya (Bangalore, IN); Chopra; Shelesh (Bangalore, IN); Yadav; Sunil (Bangalore, IN) |
Applicant: |
Name | City | State | Country | Type |
EMC IP Holding Company LLC | Hopkinton | MA | US | |
Family ID: | 1000004881343 |
Appl. No.: | 16/886534 |
Filed: | May 28, 2020 |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06F 11/1469 20130101; G06F 16/137 20190101; G06F 11/1448 20130101; G06F 2201/835 20130101 |
International Class: | G06F 11/14 20060101 G06F011/14; G06F 16/13 20060101 G06F016/13 |
Foreign Application Data
Date | Code | Application Number |
Dec 16, 2019 | IN | 201941052138 |
Claims
1. A method for managing a persistent storage system, the method
comprising: obtaining, by a backup agent, a backup request for an
incremental backup of a file system; and in response to the backup
request: selecting a reference backup from a backup storage system;
obtaining a first hash value document associated with the reference
backup; generating a hash value for an asset associated with the
file system; making a first determination that the hash value
matches a second hash value specified in the first hash value
document; in response to the first determination, populating an
incremental backup with a copy of data associated with the asset;
initiating a transfer of the incremental backup to the backup
storage system; and storing a second hash value document, wherein
the second hash value document comprises the hash value and a
backup identifier of the incremental backup.
2. The method of claim 1, further comprising: prior to initiating
the transfer of the incremental backup to the backup storage
system: generating a third hash value for a second asset associated
with the file system; making a second determination that the third
hash value does not match a fourth hash value; and in response to
the second determination, not populating the incremental backup
with a second copy of second data associated with the second
asset.
3. The method of claim 2, wherein the second hash value document
further comprises the third hash value.
4. The method of claim 1, wherein the first hash value document
comprises a timestamp associated with the reference backup, a
second backup identifier associated with the reference backup, and
a plurality of hash values.
5. The method of claim 1, wherein the reference backup is not a
most recent backup of the file system.
6. The method of claim 1, wherein the asset is a file in the file
system.
7. The method of claim 1, further comprising: obtaining a second
backup request for an incremental block-based backup; and in
response to the second backup request: identifying a plurality of
data blocks changed since a most recent block-based backup;
performing a data block file analysis on the plurality of data
blocks to identify a plurality of modified files; generating the
incremental block-based backup using the plurality of data blocks;
generating a file change document based on the plurality of
modified files; updating the incremental block-based backup based
on the file change document; and initiating a transfer of the
incremental block-based backup to the backup storage system.
8. A system, comprising: a processor; and memory comprising
instructions which, when executed by the processor, perform a
method, the method comprising: obtaining, by a backup agent, a
backup request for an incremental backup of a file system; and in
response to the backup request: selecting a reference backup from a
backup storage system; obtaining a first hash value document
associated with the reference backup; generating a hash value for
an asset associated with the file system; making a first
determination that the hash value matches a second hash value
specified in the first hash value document; in response to the
first determination, populating an incremental backup with a copy
of data associated with the asset; initiating a transfer of the
incremental backup to the backup storage system; and storing a
second hash value document, wherein the second hash value document
specifies the hash value and a backup identifier of the incremental
backup.
9. The system of claim 8, the method further comprising: prior to
initiating the transfer of the incremental backup to the backup
storage system: generating a third hash value for a second asset
associated with the file system; making a second determination that
the third hash value does not match a fourth hash value; and in
response to the second determination, not populating the
incremental backup with a second copy of second data associated
with the second asset.
10. The system of claim 9, wherein the second hash value document
further comprises the third hash value.
11. The system of claim 8, wherein the first hash value document
comprises a time stamp associated with the reference backup, a
second backup identifier associated with the reference backup, and
a plurality of hash values.
12. The system of claim 8, wherein the reference backup is not a
most recent backup of the file system.
13. The system of claim 8, wherein the asset is a file in the file
system.
14. The system of claim 8, the method further comprising: obtaining
a second backup request for an incremental block-based backup; and
in response to the second backup request: identifying a plurality
of data blocks changed since a most recent block-based backup;
performing a data block file analysis on the plurality of data
blocks to identify a plurality of modified files; generating the
incremental block-based backup using the plurality of data blocks;
generating a file change document based on the plurality of
modified files; updating the incremental block-based backup based
on the file change document; and initiating a transfer of the
incremental block-based backup to the backup storage system.
15. A non-transitory computer readable medium comprising computer
readable program code, which when executed by a computer processor
enables the computer processor to perform a method for performing a
backup operation, the method comprising: obtaining, by a backup
agent, a backup request for an incremental backup of a file system;
and in response to the backup request: selecting a reference backup
from a backup storage system; obtaining a first hash value document
associated with the reference backup; generating a hash value for
an asset associated with the file system; making a first
determination that the hash value matches a second hash value
specified in the first hash value document; in response to the
first determination, populating an incremental backup with a copy
of data associated with the asset; initiating a transfer of the
incremental backup to the backup storage system; and storing a
second hash value document, wherein the second hash value document
comprises the hash value and a backup identifier of the incremental
backup.
16. The non-transitory computer readable medium of claim 15, the
method further comprising: prior to initiating the transfer of the
incremental backup to the backup storage system: generating a third
hash value for a second asset associated with the file system;
making a second determination that the third hash value does not
match a fourth hash value; and in response to the second
determination, not populating the incremental backup with a second
copy of second data associated with the second asset.
17. The non-transitory computer readable medium of claim 16,
wherein the second hash value document further comprises the third
hash value.
18. The non-transitory computer readable medium of claim 15,
wherein the first hash value document comprises a timestamp
associated with the reference backup, a second backup identifier
associated with the reference backup, and a plurality of hash
values.
19. The non-transitory computer readable medium of claim 15,
wherein the reference backup is not a most recent backup of the
file system.
20. The non-transitory computer readable medium of claim 15, the
method further comprising: obtaining a second backup request for an
incremental block-based backup; and in response to the second
backup request: identifying a plurality of data blocks changed
since a most recent block-based backup; performing a data block
file analysis on the plurality of data blocks to identify a
plurality of modified files; generating the incremental block-based
backup based on the plurality of data blocks; generating a file
change document based on the plurality of modified files; updating
the incremental block-based backup based on the file change
document; and initiating a transfer of the incremental block-based
backup to the backup storage system.
Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority to Indian Patent
Application No. 201941052138, filed Dec. 16, 2019, which is
incorporated by reference herein in its entirety.
BACKGROUND
[0002] Computing devices may include any number of internal
components such as processors, memory, and persistent storage. Each
of the internal components of a computing device may be used to
generate data and to execute functions. The process of generating,
storing, and sending data may utilize computing resources of the
computing devices such as processing and storage. The utilization
of the aforementioned computing resources to generate data and to
send data to other computing devices may impact the overall
performance of the computing resources.
SUMMARY
[0003] In general, in one aspect, the invention relates to a method
for performing backup operations. The method includes obtaining, by
a backup agent, a backup request for an incremental backup of a
file system, and in response to the backup request: selecting a
reference backup from a backup storage system, obtaining a first
hash value document associated with the reference backup,
generating a hash value for an asset associated with the file
system, making a first determination that the hash value matches a
second hash value specified in the first hash value document, in
response to the first determination, populating an incremental
backup with a copy of data associated with the asset, initiating a
transfer of the incremental backup to the backup storage system,
and storing a second hash value document, wherein the second hash
value document comprises the hash value and a backup identifier of
the incremental backup.
[0004] In general, in one aspect, the invention relates to a system
that includes a processor and memory that includes instructions
which, when executed by the processor, perform a method. The method
includes obtaining, by a backup agent, a backup request for an
incremental backup of a file system, and in response to the backup
request: selecting a reference backup from a backup storage system,
obtaining a first hash value document associated with the reference
backup, generating a hash value for an asset associated with the
file system, making a first determination that the hash value
matches a second hash value specified in the first hash value
document, in response to the first determination, populating an
incremental backup with a copy of data associated with the asset,
initiating a transfer of the incremental backup to the backup
storage system, and storing a second hash value document, wherein
the second hash value document comprises the hash value and a
backup identifier of the incremental backup.
[0005] In general, in one aspect, the invention relates to a
non-transitory computer readable medium that includes computer
readable program code, which when executed by a computer processor
enables the computer processor to perform a method for performing
backup operations. The method includes obtaining, by a backup
agent, a backup request for an incremental backup of a file system,
and in response to the backup request: selecting a reference backup
from a backup storage system, obtaining a first hash value document
associated with the reference backup, generating a hash value for
an asset associated with the file system, making a first
determination that the hash value matches a second hash value
specified in the first hash value document, in response to the
first determination, populating an incremental backup with a copy
of data associated with the asset, initiating a transfer of the
incremental backup to the backup storage system, and storing a
second hash value document, wherein the second hash value document
comprises the hash value and a backup identifier of the incremental
backup.
BRIEF DESCRIPTION OF DRAWINGS
[0006] Certain embodiments of the invention will be described with
reference to the accompanying drawings. However, the accompanying
drawings illustrate only certain aspects or implementations of the
invention by way of example and are not meant to limit the scope of
the claims.
[0007] FIG. 1 shows a diagram of a system in accordance with one or
more embodiments of the invention.
[0008] FIG. 2 shows a diagram of a hash value document repository
in accordance with one or more embodiments of the invention.
[0009] FIG. 3A shows a flowchart for performing an incremental
backup of a file system in accordance with one or more embodiments
of the invention.
[0010] FIG. 3B shows a flowchart for performing a block-based
incremental backup in accordance with one or more embodiments of
the invention.
[0011] FIGS. 4A-4B show an example in accordance with one or more
embodiments of the invention.
[0012] FIG. 5 shows a diagram of a computing device in accordance
with one or more embodiments of the invention.
DETAILED DESCRIPTION
[0013] Specific embodiments will now be described with reference to
the accompanying figures. In the following description, numerous
details are set forth as examples of the invention. It will be
understood by those skilled in the art that one or more embodiments
of the present invention may be practiced without these specific
details and that numerous variations or modifications may be
possible without departing from the scope of the invention. Certain
details known to those of ordinary skill in the art are omitted to
avoid obscuring the description.
[0014] In the following description of the figures, any component
described with regard to a figure, in various embodiments of the
invention, may be equivalent to one or more like-named components
described with regard to any other figure. For brevity,
descriptions of these components will not be repeated with regard
to each figure. Thus, each and every embodiment of the components
of each figure is incorporated by reference and assumed to be
optionally present within every other figure having one or more
like-named components. Additionally, in accordance with various
embodiments of the invention, any description of the components of
a figure is to be interpreted as an optional embodiment, which may
be implemented in addition to, in conjunction with, or in place of
the embodiments described with regard to a corresponding like-named
component in any other figure.
[0015] In general, one or more embodiments of the invention relate
to systems and methods for generating incremental backups of
applications in a production host environment. The incremental
backups may be generated using a file system or using block-based
backups. Embodiments of the invention may include using a hash
value document repository to select a reference backup and to
compare hash values of assets in the file system to hash values of
the assets at the point in time of the reference backup to
determine which assets have been modified since the reference
backup. An incremental backup may be generated based on these
determinations.
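The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the claimed method: it assumes SHA-256 as the hash function, represents the hash values of the reference backup as a plain dictionary keyed by asset name, and treats an asset as modified when its hash is absent from, or differs from, the reference. The names `hash_asset` and `select_modified_assets` are hypothetical.

```python
import hashlib


def hash_asset(data: bytes) -> str:
    """Return a hex digest of an asset's contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def select_modified_assets(assets: dict[str, bytes],
                           reference_hashes: dict[str, str]) -> dict[str, bytes]:
    """Return the assets whose hash is new or differs from the reference backup."""
    modified = {}
    for name, data in assets.items():
        if reference_hashes.get(name) != hash_asset(data):
            modified[name] = data
    return modified


# Hypothetical hash values recorded at the time of the reference backup.
reference = {
    "a.txt": hash_asset(b"alpha"),
    "b.txt": hash_asset(b"beta"),
}

# Current state: a.txt unchanged, b.txt edited, c.txt newly created.
current = {"a.txt": b"alpha", "b.txt": b"beta v2", "c.txt": b"gamma"}

incremental = select_modified_assets(current, reference)
print(sorted(incremental))  # ['b.txt', 'c.txt']
```

Only the edited and the newly created assets are selected; the unchanged asset is left to the reference backup.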
[0016] Embodiments of the invention further include generating a
block-based incremental backup by identifying which data blocks
have been modified, identifying which files (or assets) are
associated with each changed data block, and, after generating the
block-based backup, storing a file-change document with the
block-based backup that specifies the modified files.
[0017] FIG. 1 shows a diagram of a system in accordance with one or
more embodiments of the invention. The system may include one or
more clients (100), a production host environment (110), and a
backup storage system (150). The system may include additional,
fewer, and/or different components without departing from the
invention. Each component may be operably connected to any of the
other components via any combination of wired and/or wireless
connections. Each of the aforementioned components is discussed
below.
[0018] In one or more embodiments of the invention, the production
host environment (110) is a grouping of production hosts (110A,
110N) that each provide services to the clients (100). Each
production host (110A, 110N) in the production host environment
(110) includes applications (112), a backup agent (116), a
block-based write tracker (118), a hash value document repository
(119A), and file system storage information (119B). The production
hosts (110A, 110N) may include additional, fewer, and/or different
components without departing from the invention. Each of the
aforementioned components illustrated in FIG. 1 is discussed below.
[0019] In one or more embodiments of the invention, a production
host (110A, 110N) hosts one or more applications (112). In one or
more embodiments of the invention, the applications (112) perform
services for clients (e.g., 100). The services may include writing,
reading, and/or otherwise modifying data that is stored in the
production host (110A, 110N). The applications (112) may each
include functionality for writing data to the production host
(110A, 110N) and for notifying the block-based write tracker (118)
of data written to a persistent storage system in the production
host (110A, 110N). The applications may be, for example, instances
of databases, email servers, and/or other applications. The
production host (110A, 110N) may host other types of applications
without departing from the invention.
[0020] In one or more embodiments of the invention, each
application (112A, 112N) is implemented as computer instructions,
e.g., computer code, stored on a persistent storage that when
executed by a processor(s) of the production host (e.g., 110A,
110N) cause the production host (110A, 110N) to provide the
functionality of the applications (e.g., 112A, 112N) described
throughout this application.
[0021] In one or more embodiments of the invention, the production
host (110A, 110N) further includes a backup agent (116). The backup
agent (116) may include functionality for generating backups of a
file system. In one or more embodiments of the invention, a file
system is an organizational data structure that tracks how data is
stored and retrieved in a system (e.g., in persistent storage of a
production host (110A, 110N) or of the production host environment
(110)). The file system may specify references to assets and any
data blocks associated with each asset. An asset may be an
individual data object in the file system. An asset may be, for
example, a file. The backup generated may include a copy of the
assets for one or more specified applications associated with a
specified point in time. The backup of the file system may be
generated via the method illustrated in FIG. 3A. The backup of the
file system may be generated via any other method without departing
from the invention.
[0022] In one or more embodiments of the invention, the backup
agent (116) may further include functionality for generating
block-based backups. In one or more embodiments of the invention, a
block-based backup is a backup generated by copying data blocks in
a persistent storage system (not shown) of a production host (e.g.,
110A, 110N). The data blocks may be stored contiguously or
non-contiguously in the persistent storage system. In other words,
the data blocks may or may not be stored in portions of the
persistent storage system that are physically located near each
other (e.g., next to each other). The storage location of each data
block in the production host may be specified in the file system
storage information (119B) (discussed below). The block-based
backup may be generated via the
method illustrated in FIG. 3B. The block-based backup may be
generated via any other method without departing from the
invention.
[0023] In one or more embodiments of the invention, the backup
agent (116) may generate the backups based on backup policies
implemented by the backup agent (116). The backup policies may
specify a schedule in which the applications (e.g., 112A, 112N) are
to be backed up. The backup agent (116) may be triggered to execute
a backup in response to a backup policy. Alternatively, one or more
of the backups (152, 154) may be generated in response to a backup
request triggered by the client(s) (100). The backup request may
specify the applications to be backed up.
[0024] In one or more embodiments of the invention, the backup
agent (116) is a physical device. The physical device may include
circuitry. The physical device may be, for example, a
field-programmable gate array, application specific integrated
circuit, programmable processor, microcontroller, digital signal
processor, or other hardware processor. The physical device may be
adapted to provide the functionality of the backup agent (116)
described throughout this application.
[0025] In one or more embodiments of the invention, the backup
agent (116) is implemented as computer instructions, e.g., computer
code, stored on a persistent storage that when executed by a
processor of the production host (e.g., 110A, 110N) causes the
production host (110A, 110N) to provide the functionality of the
backup agent (116) described throughout this application.
[0026] In one or more embodiments of the invention, the production
host (110A, 110N) further includes a block-based write tracker
(e.g., 118). In one or more embodiments of the invention, the
block-based write tracker (118) tracks the changed portions of the
persistent storage system used in the production host (110A, 110N).
The block-based write tracker (118) tracks such changed portions by
maintaining a block-based change list that specifies each data
block in the persistent storage system that has been changed since
a most recent block-based backup.
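A block-based change list of this kind can be sketched as a small in-memory structure. The sketch below is illustrative only: the class name `BlockWriteTracker`, its method names, and the use of integer block identifiers are assumptions, and a real tracker would typically sit in the storage driver path and persist its state.

```python
class BlockWriteTracker:
    """Records which data blocks changed since the last block-based backup."""

    def __init__(self) -> None:
        self._changed: set[int] = set()

    def record_write(self, block_id: int) -> None:
        """Note that a block was written (e.g., on notification from an application)."""
        self._changed.add(block_id)

    def change_list(self) -> list[int]:
        """Return the changed block identifiers in ascending order."""
        return sorted(self._changed)

    def reset(self) -> None:
        """Clear the change list once a block-based backup has been generated."""
        self._changed.clear()


tracker = BlockWriteTracker()
for block in (7, 3, 7, 12):  # block 7 is written twice but listed once
    tracker.record_write(block)

changed = tracker.change_list()
tracker.reset()
print(changed)  # [3, 7, 12]
```

Using a set keeps the list free of duplicates when the same block is written repeatedly between backups.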
[0027] In one or more embodiments of the invention, the block-based
write tracker (118) is a physical device. The physical device may
include circuitry. The physical device may be, for example, a
field-programmable gate array, application specific integrated
circuit, programmable processor, microcontroller, digital signal
processor, or other hardware processor. The physical device may be
adapted to provide the functionality of the block-based write
tracker (118) described throughout this application.
[0028] In one or more embodiments of the invention, the block-based
write tracker (118) is implemented as computer instructions, e.g.,
computer code, stored on a persistent storage that when executed by
a processor of a production host (e.g., 110A, 110N) causes the
production host (110A, 110N) to provide the functionality of the
block-based write tracker (118) described throughout this
application.
[0029] In one or more embodiments of the invention, the hash value
document repository (119A) is a data structure that includes one or
more hash value documents. The hash value documents may each
specify information about assets in the file system at a point in
time in which a file-system backup of the file system was
generated. For additional details regarding the hash value document
repository, see, e.g., FIG. 2.
[0030] In one or more embodiments of the invention, the file system
storage information (119B) is a data structure that specifies each
asset in the file system and a storage location of the data blocks
associated with the asset in the persistent storage system. The
file system storage information (119B) may include entries that
each specify an asset of the file system, the data blocks
associated with the asset, and the physical or logical storage
location of each data block. The storage location may be, for
example, an address (e.g., physical, logical, etc.) associated with
a portion of a physical storage device.
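One way to use entries of this kind is a reverse lookup from changed data blocks to the assets that own them. The following sketch assumes each entry lists an asset name and the integer addresses of its data blocks; `StorageEntry` and `files_for_blocks` are hypothetical names, and the addresses are made-up values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageEntry:
    """One entry: an asset and the addresses of the data blocks that hold it."""
    asset: str
    block_addresses: tuple[int, ...]


def files_for_blocks(entries: list[StorageEntry],
                     changed_blocks: list[int]) -> list[str]:
    """Return the assets that own at least one of the changed blocks."""
    changed = set(changed_blocks)
    return sorted(e.asset for e in entries
                  if changed.intersection(e.block_addresses))


storage_info = [
    StorageEntry("db/table.dat", (0, 1, 2)),
    StorageEntry("logs/app.log", (3, 4)),
    StorageEntry("mail/inbox.mbx", (5,)),
]

# Block 9 belongs to no known asset and is ignored by the lookup.
modified_files = files_for_blocks(storage_info, [2, 5, 9])
print(modified_files)  # ['db/table.dat', 'mail/inbox.mbx']
```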
[0031] In one or more embodiments of the invention, the production
host (110A, 110N) is implemented as a computing device (see e.g.,
FIG. 5). The computing device may be, for example, a mobile phone,
a tablet computer, a laptop computer, a desktop computer, a server,
a distributed computing system, or a cloud resource. The computing
device may include one or more processors, memory (e.g., random
access memory), and persistent storage (e.g., disk drives, solid
state drives, etc.). The computing device may include instructions,
stored on the persistent storage, that when executed by the
processor(s) of the computing device cause the computing device to
perform the functionality of the production host (110A, 110N)
described throughout this application.
[0032] In one or more embodiments of the invention, the production
host (110A, 110N) is implemented as a logical device. The logical
device may utilize the computing resources of any number of
computing devices and thereby provide the functionality of the
production host (110A, 110N) described throughout this
application.
[0033] In one or more embodiments of the invention, the client(s)
(100) utilize services provided by the production host environment
(110). Specifically, the client(s) (100) may utilize the
applications (112) to obtain, modify, and/or store data. The data
may be generated by the applications (112).
[0034] In one or more embodiments of the invention, a client (100)
is implemented as a computing device (see e.g., FIG. 5). The
computing device may be, for example, a mobile phone, a tablet
computer, a laptop computer, a desktop computer, a server, a
distributed computing system, or a cloud resource. The computing
device may include one or more processors, memory (e.g., random
access memory), and persistent storage (e.g., disk drives, solid
state drives, etc.). The computing device may include instructions,
stored on the persistent storage, that when executed by the
processor(s) of the computing device cause the computing device to
perform the functionality of the client (100) described throughout
this application.
[0035] In one or more embodiments of the invention, the client(s)
(100) are implemented as a logical device. The logical device may
utilize the computing resources of any number of computing devices
and thereby provide the functionality of the client(s) (100)
described throughout this application.
[0036] In one or more embodiments of the invention, the backup
storage system (150) stores backups of a file system. The file
system may include application data of the applications (e.g.,
112). The backups may further include application dependency
information. In one or more embodiments of the invention, a backup
is a full or partial copy of one or more applications (e.g., 112A,
112N). The copy may include the application data and/or application
dependency information.
[0037] In one or more embodiments of the invention, a backup (152,
154) in the backup storage system (150) is an incremental backup.
In one or more embodiments of the invention, an incremental backup
is a backup that only stores changes in the persistent storage
system that were made after a previous backup in the backup storage
system. In contrast, a full backup may include all of the data in
the persistent storage system without taking into account when the
data was modified or otherwise written to the persistent storage
system.
[0038] In one or more embodiments of the invention, if the data in
the file system is to be restored to a point in time associated
with an incremental backup, the backups needed to perform the
restoration include at least: (i) the incremental backup, (ii) a
full backup, and (iii) the intermediate backups (if any) that are
associated with points in time between the full backup and the
incremental backup. In this manner, the required backups
collectively include all of the data of the persistent storage
system at the requested point in time.
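The set of backups needed for such a restoration can be sketched as a walk back from the target backup to the nearest earlier full backup. This sketch assumes a single linear chain in which each incremental backup depends on the backup immediately before it; if a reference backup is not the most recent one, the dependency would need to be recorded explicitly instead. The `Backup` type, its fields, and `required_backups` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    backup_id: str
    timestamp: int  # hypothetical: seconds since an arbitrary epoch
    is_full: bool


def required_backups(backups: list[Backup], target_id: str) -> list[Backup]:
    """Return the full backup, any intermediates, and the target, oldest first."""
    ordered = sorted(backups, key=lambda b: b.timestamp)
    idx = next(i for i, b in enumerate(ordered) if b.backup_id == target_id)
    chain = []
    # Walk backwards from the target until a full backup is reached.
    for b in reversed(ordered[: idx + 1]):
        chain.append(b)
        if b.is_full:
            break
    else:
        raise ValueError("no full backup precedes the target")
    chain.reverse()
    return chain


backups = [
    Backup("full-1", 100, True),
    Backup("inc-1", 200, False),
    Backup("inc-2", 300, False),
    Backup("inc-3", 400, False),
]

chain = required_backups(backups, "inc-2")
print([b.backup_id for b in chain])  # ['full-1', 'inc-1', 'inc-2']
```

The later backup `inc-3` is not part of the chain because it describes changes made after the requested point in time.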
[0039] In one or more embodiments of the invention, each backup
(152, 154) in the backup storage system (150) is either a
file-system backup or a block-based backup. In one or more
embodiments of the invention, a file-system backup is a backup
generated by identifying the assets in the file system and
generating a copy of all assets (or a portion thereof). In
contrast, a block-based backup is generated by identifying the data
blocks in the persistent storage system of a production host (e.g.,
110A, 110N) and generating copies of all data blocks (or a portion
thereof). The data in a file-system backup and in a block-based
backup may be similar or different without departing from the
invention.
[0040] In one or more embodiments of the invention, the backup
storage system (150) is implemented as a computing device (see
e.g., FIG. 5). The computing device may be, for example, a mobile
phone, a tablet computer, a laptop computer, a desktop computer, a
server, a distributed computing system, or a cloud resource. The
computing device may include one or more processors, memory (e.g.,
random access memory), and persistent storage (e.g., disk drives,
solid state drives, etc.). The computing device may include
instructions stored on the persistent storage, that when executed
by the processor(s) of the computing device cause the computing
device to perform the functionality of the backup storage system
(150) described throughout this application.
[0041] In one or more embodiments of the invention, the backup
storage system (150) is implemented as a logical device. The
logical device may utilize the computing resources of any number of
computing devices and thereby provide the functionality of the
backup storage system (150) described throughout this
application.
[0042] FIG. 2 shows a diagram of a hash value document repository
in accordance with one or more embodiments of the invention. The
hash value document repository (200) may be an embodiment of the
hash value document repository (119A) discussed above. In one or
more embodiments of the invention, the hash value document
repository (200) includes one or more hash value documents (210A,
210N). Each hash value document (210A, 210N) may include a backup
identifier (212), a timestamp (214), and one or more asset hash
values (216, 218). The hash value document repository (200) may
include additional, fewer, and/or different components without
departing from the invention. Each of the aforementioned components
illustrated in FIG. 2 is discussed below.
[0043] In one or more embodiments of the invention, the backup
identifier (212) of a hash value document (210A, 210N) is a
combination of letters, numbers, and/or symbols that uniquely
identifies a backup stored in a backup storage system. The hash
value document (210A, 210N) may be associated with the backup
identified by the corresponding backup identifier (212).
[0044] In one or more embodiments of the invention, the timestamp
(214) is a combination of letters, numbers, and/or symbols that
uniquely identifies a point in time associated with the backup
identified by the backup identifier (212). The point in time may be
the point in time at which the backup was generated and/or the
point in time at which the data stored in the backup existed.
[0045] In one or more embodiments of the invention, each asset hash
value (e.g., asset A hash value (216), asset M hash value (218)) is
a value that is generated by applying a hash function (e.g., a
cryptographic hash function) to the data of the asset. The hash value
changes when the data in the asset is modified. For example, a hash
value of the asset at a first point in time may be drastically
different from a second hash value of the asset at a second point
in time if any data in the asset is added, deleted, and/or
otherwise modified after the first point in time.
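The structure of FIG. 2 and the sensitivity of an asset hash value to modification may be sketched as follows. This is a minimal illustration only: the class name `HashValueDocument`, its field names, the string timestamp, and the choice of SHA-256 are assumptions made for the sketch and are not specified by the application.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HashValueDocument:
    """Hypothetical model of one hash value document (210A, 210N)."""
    backup_id: str   # backup identifier (212)
    timestamp: str   # timestamp (214); an ISO 8601 string is assumed here
    # asset name -> asset hash value (216, 218)
    asset_hashes: Dict[str, str] = field(default_factory=dict)


def hash_asset(data: bytes) -> str:
    """Return a hex digest of the asset data (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


doc = HashValueDocument(backup_id="backup-B", timestamp="2020-05-28T00:00:00Z")
doc.asset_hashes["file_a"] = hash_asset(b"hello world")

# Hashing the same data again gives the same value; adding one byte does not.
unchanged = hash_asset(b"hello world")
changed = hash_asset(b"hello world!")
```

Keying the asset hash values by asset name is a design choice of the sketch; it makes the later lookup of the hash value that corresponds to a selected asset a single dictionary access.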
[0046] FIGS. 3A-3B show flowcharts in accordance with one or more
embodiments of the invention. While the various steps in the
flowcharts are presented and described sequentially, one of
ordinary skill in the relevant art will appreciate that some or all
of the steps may be executed in different orders, may be combined
or omitted, and some or all steps may be executed in parallel. In
one embodiment of the invention, the steps shown in FIGS. 3A-3B may
be performed in parallel with any other steps shown in FIGS. 3A-3B
without departing from the scope of the invention.
[0047] FIG. 3A shows a flowchart for performing an incremental
backup of a file system in accordance with one or more embodiments
of the invention. The method shown in FIG. 3A may be performed by,
for example, a backup agent (116, FIG. 1). Other components of the
system illustrated in FIG. 1 may perform the method of FIG. 3A
without departing from the invention.
[0048] In step 300, a backup request for an incremental backup of a
file system is obtained. The backup request may be obtained from a
client managing the initiation of backups. Alternatively, the
backup request may be the result of the backup agent implementing
backup policies. As discussed above, the backup policies may
include schedules that specify when to perform a backup of the
persistent storage device. The backup request may specify the
applications to be backed up.
[0049] In step 302, a reference backup is selected from a backup
storage system. In one or more embodiments of the invention, the
reference backup is selected by sending a selection request to a
client. The client may be the client managing the initiation of
backups.
[0050] The client, in response to the request, may send a response
to the backup agent with a specified backup selected. In one or
more embodiments of the invention, the selected backup may be based
on whether the most recent backup associated with the file system
is available. In other words, the client may identify that the
default backup to be used as the reference backup is not available.
As such, the response may specify a different backup to be used as
the reference backup.
[0051] In one or more embodiments of the invention, the reference
backup is selected by the backup agent by identifying the available
backups in the backup storage system and selecting the most recent
available backup.
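The agent-side selection described above may be sketched as a filter followed by a maximum over timestamps. The record type `BackupRecord`, its `available` flag, and the integer timestamps are hypothetical; the application does not specify how availability is recorded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackupRecord:
    backup_id: str
    timestamp: int    # e.g., seconds since epoch (assumed representation)
    available: bool   # False for a failed or missing backup (assumed flag)


def select_reference_backup(backups: List[BackupRecord]) -> Optional[BackupRecord]:
    """Return the most recent available backup, or None if none is available."""
    candidates = [b for b in backups if b.available]
    return max(candidates, key=lambda b: b.timestamp, default=None)


catalog = [
    BackupRecord("A", 100, True),
    BackupRecord("B", 200, True),
    BackupRecord("C", 300, False),  # most recent, but not usable
]
reference = select_reference_backup(catalog)
```

With this catalog the function skips the unavailable backup and returns the record for backup B. Returning `None` leaves the fallback (for example, a full backup) to the caller.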
[0052] In step 304, a hash value document associated with the
reference backup is obtained. In one or more embodiments of the
invention, the hash value document is identified using the backup
identifier associated with the hash value document. The backup
agent may search the hash value document repository to identify a hash
value document that specifies the backup identifier associated with
the selected reference backup.
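One possible form of the lookup in step 304 is a linear search of the repository by backup identifier. The dictionary keys used below (`backup_id`, `timestamp`, `asset_hashes`) and the sample values are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical repository contents; each hash value document is a plain dict.
repository: List[Dict] = [
    {"backup_id": "A", "timestamp": 100, "asset_hashes": {"file_a": "11"}},
    {"backup_id": "B", "timestamp": 200, "asset_hashes": {"file_a": "22"}},
]


def find_hash_document(repo: List[Dict], backup_id: str) -> Optional[Dict]:
    """Return the document that specifies backup_id, or None if absent."""
    for document in repo:
        if document["backup_id"] == backup_id:
            return document
    return None


found = find_hash_document(repository, "B")
missing = find_hash_document(repository, "Z")
```

A repository holding many documents could instead be indexed by backup identifier; the linear search is used here only to keep the sketch short.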
[0053] In step 306, an asset in the file system is selected. In one
or more embodiments of the invention, the asset is an unprocessed
asset in the file system. The asset may be a file, a portion of a
file (e.g., a file segment), a collection of files, and/or any
other sub portion of the file system without departing from the
invention.
[0054] In step 308, a hash value of data associated with the asset
is generated. In one or more embodiments of the invention, the hash
value is generated by applying a hash function (e.g., a
cryptographic hash function) to the data associated with the asset.
[0055] In step 310, a determination is made about whether the
generated hash value matches a previous hash value of the reference
backup. In one or more embodiments of the invention, the generated
hash value is compared to an asset hash value stored in the hash
value document that corresponds to the selected asset. If the
generated hash value matches the previous hash value, the method
proceeds to step 314; otherwise, the method proceeds to step
312.
[0056] In step 312, following the determination that the generated
hash value does not match an asset hash value of the hash value
document, an incremental backup is populated with a copy of the
data associated with the asset. In one or more embodiments of the
invention, the incremental backup is generated if this is the first
asset to be selected. The copy of the asset
[0057] In step 314, a determination is made about whether all
assets in the file system are processed. If all assets in the file
system are processed, the method proceeds to step 316; otherwise,
the method proceeds to step 306.
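The loop formed by steps 306 through 314 may be sketched as below. The function name `build_incremental`, the in-memory dictionaries, and SHA-256 are assumptions. The sketch also assumes that an asset with no entry in the reference hash value document is treated as changed, which the application does not state.

```python
import hashlib
from typing import Dict, Tuple


def build_incremental(
    assets: Dict[str, bytes], reference_hashes: Dict[str, str]
) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Return (incremental backup contents, newly generated hash values)."""
    incremental: Dict[str, bytes] = {}
    new_hashes: Dict[str, str] = {}
    for name, data in assets.items():                # step 306: select an asset
        digest = hashlib.sha256(data).hexdigest()    # step 308: generate hash
        new_hashes[name] = digest
        if reference_hashes.get(name) != digest:     # step 310: compare
            incremental[name] = data                 # step 312: copy the data
    return incremental, new_hashes                   # step 314: all processed


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


reference = {"a.txt": sha(b"one"), "b.txt": sha(b"two")}
current = {"a.txt": b"one", "b.txt": b"TWO", "c.txt": b"three"}
incremental, hashes = build_incremental(current, reference)
```

Here `a.txt` is skipped because its hash value matches, while `b.txt` (modified) and `c.txt` (absent from the reference document) are copied. Hash values are kept for every asset, copied or not, so that they are available when the new hash value document is stored.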
[0058] In step 316, a transfer of the incremental backup to the
backup storage system is initiated. In one or more embodiments of
the invention, the transfer is initiated by sending the
incremental backup to the backup storage system.
[0059] In step 318, a hash value document is stored using the
generated hash value(s). In one or more embodiments of the
invention, the hash value document includes a backup identifier of
the incremental backup, a timestamp for the point in time associated
with the incremental backup, and the generated hash value(s) of the
asset(s) in the file system.
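Step 318 may be illustrated by writing the document as a JSON file. The JSON layout, the file naming scheme, and the function name `store_hash_document` are assumptions; the application does not prescribe a storage format.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Dict, Optional


def store_hash_document(
    directory: str,
    backup_id: str,
    asset_hashes: Dict[str, str],
    timestamp: Optional[str] = None,
) -> str:
    """Write a hash value document as JSON and return the path of the file."""
    document = {
        "backup_id": backup_id,
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "asset_hashes": dict(asset_hashes),
    }
    path = os.path.join(directory, f"{backup_id}.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(document, handle, indent=2, sort_keys=True)
    return path


# A temporary directory stands in for the hash value document repository.
with tempfile.TemporaryDirectory() as repo_dir:
    saved = store_hash_document(
        repo_dir,
        "backup-D",
        {"file_a": "ab12", "file_b": "cd34"},
        timestamp="2020-05-28T12:00:00+00:00",
    )
    with open(saved, encoding="utf-8") as handle:
        loaded = json.load(handle)
```

The `asset_hashes` argument is expected to hold the hash value of every processed asset, including assets that were not copied into the incremental backup, so that the stored document can serve as the reference for a later incremental backup.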
[0060] FIG. 3B shows a flowchart for performing a block-based
incremental backup in accordance with one or more embodiments of
the invention. The method shown in FIG. 3B may be performed by, for
example, a backup agent (116, FIG. 1). Other components of the
system illustrated in FIG. 1 may perform the method of FIG. 3B
without departing from the invention.
[0061] In step 320, a backup request for an incremental block-based
backup is obtained. The backup request may be obtained from a
client managing the initiation of block-based backups.
Alternatively, the backup request may be the result of the backup
agent implementing backup policies.
[0062] In step 322, one or more data blocks changed since a most
recent block-based backup are identified. In one or more embodiments
of the invention, the data blocks are identified using a
block-based write tracker that tracks the writes performed on a
persistent storage system. The tracked writes may be writes
performed after a most recent block-based backup is performed. The
identified data blocks may be the data blocks specified by the
block-based write tracker that were modified after the most recent
backup.
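A block-based write tracker of the kind referred to in step 322 might be sketched as a set of dirty block numbers. The class `BlockWriteTracker`, the 4096-byte block size, and the `reset` method are hypothetical; a real tracker would typically sit in a storage driver rather than in application code.

```python
from typing import List, Set


class BlockWriteTracker:
    """Hypothetical tracker of block numbers written since the last backup."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self._dirty: Set[int] = set()

    def record_write(self, offset: int, length: int) -> None:
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self._dirty.update(range(first, last + 1))

    def changed_blocks(self) -> List[int]:
        """Return the block numbers modified since the last reset."""
        return sorted(self._dirty)

    def reset(self) -> None:
        """Clear the tracked writes, e.g., after a block-based backup completes."""
        self._dirty.clear()


tracker = BlockWriteTracker(block_size=4096)
tracker.record_write(offset=0, length=10)        # touches block 0
tracker.record_write(offset=4090, length=20)     # spans blocks 0 and 1
tracker.record_write(offset=40960, length=4096)  # touches block 10
changed = tracker.changed_blocks()
```

For these three writes the tracker reports blocks 0, 1, and 10. Calling `reset` after each backup ensures that the next query returns only the blocks written after that backup.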
[0063] In step 324, a data block file analysis is performed on the
identified data blocks to identify modified files. In one or more
embodiments of the invention, the data block file analysis includes
obtaining the file system information, searching for the identified
data blocks, and identifying each file (or asset) associated with
each identified data block.
[0064] For example, if data blocks A, B, and C were identified to
have been changed after the most recent backup, the file system
information may be analyzed to determine which files are associated
with data blocks A, B, and C. The files determined to be associated
with blocks A, B, and C are the identified modified files.
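The data block file analysis in this example may be sketched as a set intersection between each file's blocks and the changed blocks. The file names and the mapping `file_system_info` are invented for illustration; actual file system information would come from file system metadata.

```python
from typing import Dict, Iterable, List, Set


def files_for_blocks(
    block_map: Dict[str, Set[str]], changed_blocks: Iterable[str]
) -> List[str]:
    """Return the files that occupy at least one of the changed blocks."""
    changed = set(changed_blocks)
    return sorted(name for name, blocks in block_map.items() if blocks & changed)


# Hypothetical file system information: file name -> data blocks it occupies.
file_system_info = {
    "report.doc": {"A", "D"},
    "notes.txt": {"B", "C"},
    "image.png": {"E", "F"},
}
modified_files = files_for_blocks(file_system_info, ["A", "B", "C"])
```

With blocks A, B, and C changed, `report.doc` and `notes.txt` are identified as modified, while `image.png` is not because none of its blocks changed.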
[0065] In step 326, an incremental block-based backup is generated
using the identified data blocks. In one or more embodiments of the
invention, the incremental block-based backup is generated by
copying the identified data blocks and storing the copies in the
incremental block-based backup.
[0066] In step 328, a file change document is generated based on
the identified modified files. In one or more embodiments of the
invention, the file change document specifies the identified
modified files. Further, the file change document specifies a
timestamp associated with the incremental block-based backup.
[0067] In step 330, the backup is updated based on the file change
document. In one or more embodiments of the invention, the backup
is updated by adding the file change document to the generated
block-based backup. In this manner, the block-based backup
specifies the modified files, and this information may be provided
to a client when selecting a block-based backup to restore a file
to a previous point in time.
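Steps 328 and 330, together with the client-side use of the resulting information, may be sketched as follows. The dictionary layout, the key `file_change_document`, and both function names are assumptions made for the sketch.

```python
from typing import Dict, Iterable, List


def attach_file_change_document(
    backup: Dict, modified_files: Iterable[str], timestamp: str
) -> Dict:
    """Return a copy of the backup that includes a file change document."""
    updated = dict(backup)
    updated["file_change_document"] = {
        "timestamp": timestamp,
        "modified_files": sorted(modified_files),
    }
    return updated


def backups_touching(backups: List[Dict], file_name: str) -> List[str]:
    """List the identifiers of backups whose file change document names the file."""
    return [
        b["backup_id"]
        for b in backups
        if file_name in b.get("file_change_document", {}).get("modified_files", [])
    ]


first = attach_file_change_document(
    {"backup_id": "blk-1", "blocks": {"A": b"\x00"}},
    ["report.doc"],
    "2020-05-28T01:00:00Z",
)
second = attach_file_change_document(
    {"backup_id": "blk-2", "blocks": {"B": b"\x01"}},
    ["notes.txt", "report.doc"],
    "2020-05-29T01:00:00Z",
)
candidates = backups_touching([first, second], "notes.txt")
```

A client that wants to restore `notes.txt` can use the result to narrow the choice to the backups in which that file was modified, which here is only `blk-2`.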
[0068] In step 332, a transfer of the backup to the backup storage
system is initiated. In one or more embodiments of the invention,
the transfer is initiated by sending the backup (which includes the
file change document) to the backup storage system.
Example
[0069] The following section describes an example. The example,
illustrated in FIGS. 4A-4B, is not intended to limit the invention.
Turning to the example, consider a scenario in which a production
host performs an incremental backup of a file system comprising
three files (file A, file B, file C).
[0070] FIG. 4A shows a first diagram of an example system. For the
sake of brevity, not all components of the example system are
illustrated in FIG. 4A. The example system includes a production
host (410), a client (400), and a backup storage system (420). The
production host (410) includes application A (412A), application B
(412B), a backup agent (414), and a hash value document (416).
[0071] The client (400) sends a backup request to the backup agent
(414) that specifies backing up a file system that includes
application data for applications A and B (412A, 412B) [1]. The
backup agent (414), in response to the backup request, follows the
method of FIG. 3A and analyzes the backup storage system (420) to
select a reference backup [2]. In the backup storage system, backup
A (422A) is a full backup of the file system. Backup B (422B) is an
incremental backup that depends on backup A (422A). In other words,
in order for the file system to be restored to a point in time
associated with backup B (422B), both backups A (422A) and B (422B)
will be required for the restoration. A third backup, backup C
(422C), is identified in the backup storage system (420). Backup C
(422C) is an incremental backup that depends on backup B (422B).
However, the backup agent determines that backup C (422C) is a
failed backup. Based on this determination, the backup agent (414)
selects the most recent backup that is available. The backup agent
selects backup B (422B) as the reference backup.
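The dependency relationships among backups A, B, and C may be sketched as a mapping from each backup to the backup it depends on. The mapping `depends_on` and the function `restore_chain` are illustrative assumptions; the application does not describe how dependencies are recorded.

```python
from typing import Dict, List, Optional

# Backup identifier -> identifier of the backup it depends on
# (None for a full backup). Mirrors backups A, B, and C of the example.
depends_on: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B"}


def restore_chain(backup_id: str, dependencies: Dict[str, Optional[str]]) -> List[str]:
    """Return the backups needed for a restoration, oldest first."""
    chain: List[str] = []
    current: Optional[str] = backup_id
    while current is not None:
        if current in chain:
            raise ValueError("dependency cycle detected")
        chain.append(current)
        current = dependencies[current]
    chain.reverse()
    return chain


needed_for_b = restore_chain("B", depends_on)
needed_for_c = restore_chain("C", depends_on)
```

Restoring to the point in time of backup B requires backups A and B, as stated above. Because backup C depends on backup B, a failed backup C does not affect the chain for backup B, which is why backup B remains usable as the reference backup.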
[0072] Following this determination, the backup agent (414)
identifies the hash value document associated with backup B (416).
The hash value document for backup B (416) includes a hash value of
each file in the file system regardless of whether a copy of the
file is in backup B. The backup agent (414) generates a first hash
value of the data in file A of application A (412A) [3]. The first
hash value is compared to a corresponding hash value in the hash
value document for backup B (416) [4]. The backup agent (414)
determines that the hash values match. This determination may
signify that after the generation of backup B, file A was not
modified.
[0073] The backup agent (414) generates a second hash value of the
data in file B of application A (412A) [5]. The second hash value
is compared to a corresponding hash value in the hash value
document for backup B (416) [6]. The backup agent (414) determines
that the hash values do not match. This determination may signify
that after the generation of backup B, file B was in some way
modified.
[0074] The backup agent (414) generates a third hash value of the
data in file C of application B (412B) [7]. The third hash value is
compared to a corresponding hash value in the hash value document
for backup B (416) [8]. The backup agent (414) determines that the
hash values do not match. This determination may signify that after
the generation of backup B, file C was in some way modified.
[0075] FIG. 4B shows a second diagram of the example system. For
the sake of brevity, not all components of the example system are
illustrated in FIG. 4B. At a later point in time, the backup agent
(414) generates an incremental backup of the file system (422D).
The incremental backup (422D) (also referred to as backup D) is
generated by copying the data of files B and C (i.e., the files
that were modified after the generation of backup B (422B)) and
storing the copies in the incremental backup (422D). The
incremental backup (422D) is stored in the backup storage system
(420) [9].
[0076] Further, the backup agent (414) generates a hash value
document for backup D (418) [10]. The hash value document for
backup D (418) includes a backup identifier for backup D (422D), a
timestamp associated with the backup (422D), and the first, second,
and third hash values generated when performing the method of FIG.
3A.
[0077] End of Example
[0078] As discussed above, embodiments of the invention may be
implemented using computing devices. FIG. 5 shows a diagram of a
computing device in accordance with one or more embodiments of the
invention. The computing device (500) may include one or more
computer processors (502), non-persistent storage (504) (e.g.,
volatile memory, such as random access memory (RAM), cache memory),
persistent storage (506) (e.g., a hard disk, an optical drive such
as a compact disk (CD) drive or digital versatile disk (DVD) drive,
a flash memory, etc.), a communication interface (512) (e.g.,
Bluetooth interface, infrared interface, network interface, optical
interface, etc.), input devices (510), output devices (508), and
numerous other elements (not shown) and functionalities. Each of
these components is described below.
[0079] In one embodiment of the invention, the computer
processor(s) (502) may be an integrated circuit for processing
instructions. For example, the computer processor(s) may be one or
more cores or micro-cores of a processor. The computing device
(500) may also include one or more input devices (510), such as a
touchscreen, keyboard, mouse, microphone, touchpad, electronic pen,
or any other type of input device. Further, the communication
interface (512) may include an integrated circuit for connecting
the computing device (500) to a network (not shown) (e.g., a local
area network (LAN), a wide area network (WAN) such as the Internet,
mobile network, or any other type of network) and/or to another
device, such as another computing device.
[0080] In one embodiment of the invention, the computing device
(500) may include one or more output devices (508), such as a
screen (e.g., a liquid crystal display (LCD), a plasma display,
touchscreen, cathode ray tube (CRT) monitor, projector, or other
display device), a printer, external storage, or any other output
device. One or more of the output devices may be the same as or
different from the input device(s). The input and output device(s)
may be locally or remotely connected to the computer processor(s)
(502), non-persistent storage (504), and persistent storage (506).
Many different types of computing devices exist, and the
aforementioned input and output device(s) may take other forms.
[0081] One or more embodiments of the invention may be implemented
using instructions executed by one or more processors of the data
management device. Further, such instructions may correspond to
computer readable instructions that are stored on one or more
non-transitory computer readable mediums.
[0082] One or more embodiments of the invention may improve the
operation of one or more computing devices. More specifically,
embodiments of the invention improve the backup operations for data
in a file system. Embodiments of the invention enable a file system
backup to reference any previous backup of the file system
regardless of whether the backup is the most recent. Performing a full
backup of the file system may utilize more computing
resources than an incremental backup. Further, an incremental
backup may be more desirable when not much data was changed between
the incremental backup and the previous backup. Rather than having
the best option be performing a full backup of the file system when
determining that the most recent backup is not available as a
reference backup, embodiments of the invention enable a backup
agent (or another entity) to utilize a different backup (i.e., an
alternate backup to the most recent backup) as a reference backup
for performing an incremental backup.
[0083] Further, embodiments of the invention provide clients
visibility of which files have been modified in a block-based
backup. While traditional block-based backups do not track which
files are modified in each iteration of the block-based backups,
embodiments of the invention utilize storage information of the
backups to determine which files have been modified for each
backup. In this manner, the client has the ability to select a
block-based backup based on the modifications performed on the
files at the specified points in time.
[0084] Thus, embodiments of the invention may address the problem
of inefficient use of computing resources. This problem arises due
to the technological nature of the environment in which backup
operations are performed.
[0085] The problems discussed above should be understood as being
examples of problems solved by embodiments of the invention
disclosed herein and the invention should not be limited to solving
the same/similar problems. The disclosed invention is broadly
applicable to address a range of problems beyond those discussed
herein.
[0086] While the invention has been described above with respect to
a limited number of embodiments, those skilled in the art, having
the benefit of this disclosure, will appreciate that other
embodiments can be devised which do not depart from the scope of
the invention as disclosed herein. Accordingly, the scope of the
invention should be limited only by the attached claims.
* * * * *