System And Method For Generating File System And Block-based Incremental Backups Using Enhanced Dependencies And File System Information Of Data Blocks Sharma; Manish ; et al. [EMC IP Holding Company LLC]

Patent Application Summary

U.S. patent application number 16/886534 was filed with the patent office on 2020-05-28 and published on 2021-06-17 for system and method for generating file system and block-based incremental backups using enhanced dependencies and file system information of data blocks. The applicant listed for this patent is EMC IP Holding Company LLC. Invention is credited to Aaditya Bansal, Shelesh Chopra, Manish Sharma, Sunil Yadav.

Application Number	20210182160 16/886534
Family ID	1000004881343
Filed Date	2021-06-17

United States Patent Application	20210182160
Kind Code	A1
Sharma; Manish ; et al.	June 17, 2021

SYSTEM AND METHOD FOR GENERATING FILE SYSTEM AND BLOCK-BASED INCREMENTAL BACKUPS USING ENHANCED DEPENDENCIES AND FILE SYSTEM INFORMATION OF DATA BLOCKS

Abstract

A method for a backup operation includes obtaining, by a backup agent, a backup request for an incremental backup of a file system, and in response to the backup request: selecting a reference backup from a backup storage system, obtaining a first hash value document associated with the reference backup, generating a hash value for an asset associated with the file system, making a first determination that the hash value matches a second hash value specified in the first hash value document, in response to the first determination, populating an incremental backup with a copy of data associated with the asset, initiating a transfer of the incremental backup to the backup storage system, and storing a second hash value document, wherein the second hash value document comprises the hash value and a backup identifier of the incremental backup.

Inventors:

Sharma; Manish; (Bangalore, IN) ; Bansal; Aaditya; (Bangalore, IN) ; Chopra; Shelesh; (Bangalore, IN) ; Yadav; Sunil; (Bangalore, IN)

Applicant:

Name	City	State	Country	Type
EMC IP Holding Company LLC	Hopkinton	MA	US

Family ID:

1000004881343

Appl. No.:

16/886534

Filed:

May 28, 2020

Current U.S. Class:	1/1
Current CPC Class:	G06F 11/1469 20130101; G06F 16/137 20190101; G06F 11/1448 20130101; G06F 2201/835 20130101
International Class:	G06F 11/14 20060101 G06F011/14; G06F 16/13 20060101 G06F016/13

Foreign Application Data

Date	Code	Application Number
Dec 16, 2019	IN	201941052138

Claims

1. A method for managing a persistent storage system, the method comprising: obtaining, by a backup agent, a backup request for an incremental backup of a file system; and in response to the backup request: selecting a reference backup from a backup storage system; obtaining a first hash value document associated with the reference backup; generating a hash value for an asset associated with the file system; making a first determination that the hash value matches a second hash value specified in the first hash value document; in response to the first determination, populating an incremental backup with a copy of data associated with the asset; initiating a transfer of the incremental backup to the backup storage system; and storing a second hash value document, wherein the second hash value document comprises the hash value and a backup identifier of the incremental backup.

2. The method of claim 1, further comprising: prior to initiating the transfer of the incremental backup to the backup storage system: generating a third hash value for a second asset associated with the file system; making a second determination that the third hash value does not match a fourth hash value; and in response to the second determination, not populating the incremental backup with a second copy of second data associated with the second asset.

3. The method of claim 2, wherein the second hash value document further comprises the third hash value.

4. The method of claim 1, wherein the first hash value document comprises a timestamp associated with the reference backup, a second backup identifier associated with the reference backup, and a plurality of hash values.

5. The method of claim 1, wherein the reference backup is not a most recent backup of the file system.

6. The method of claim 1, wherein the asset is a file in the file system.

7. The method of claim 1, further comprising: obtaining a second backup request for an incremental block-based backup; and in response to the second backup request: identifying a plurality of data blocks changed since a most recent block-based backup; performing a data block file analysis on the plurality of data blocks to identify a plurality of modified files; generating the incremental block-based backup using the plurality of data blocks; generating a file change document based on the plurality of modified files; updating the incremental block-based backup based on the file change document; and initiating a transfer of the incremental block-based backup to the backup storage system.

8. A system, comprising: a processor; and memory comprising instructions which, when executed by the processor, perform a method, the method comprising: obtaining, by a backup agent, a backup request for an incremental backup of a file system; and in response to the backup request: selecting a reference backup from a backup storage system; obtaining a first hash value document associated with the reference backup; generating a hash value for an asset associated with the file system; making a first determination that the hash value matches a second hash value specified in the first hash value document; in response to the first determination, populating an incremental backup with a copy of data associated with the asset; initiating a transfer of the incremental backup to the backup storage system; and storing a second hash value document, wherein the second hash value document specifies the hash value and a backup identifier of the incremental backup.

9. The system of claim 8, the method further comprising: prior to initiating the transfer of the incremental backup to the backup storage system: generating a third hash value for a second asset associated with the file system; making a second determination that the third hash value does not match a fourth hash value; and in response to the second determination, not populating the incremental backup with a second copy of second data associated with the second asset.

10. The system of claim 9, wherein the second hash value document further comprises the third hash value.

11. The system of claim 8, wherein the first hash value document comprises a time stamp associated with the reference backup, a second backup identifier associated with the reference backup, and a plurality of hash values.

12. The system of claim 8, wherein the reference backup is not a most recent backup of the file system.

13. The system of claim 8, wherein the asset is a file in the file system.

14. The system of claim 8, the method further comprising: obtaining a second backup request for an incremental block-based backup; and in response to the second backup request: identifying a plurality of data blocks changed since a most recent block-based backup; performing a data block file analysis on the plurality of data blocks to identify a plurality of modified files; generating the incremental block-based backup using the plurality of data blocks; generating a file change document based on the plurality of modified files; updating the incremental block-based backup based on the file change document; and initiating a transfer of the incremental block-based backup to the backup storage system.

15. A non-transitory computer readable medium comprising computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for performing a backup operation, the method comprising: obtaining, by a backup agent, a backup request for an incremental backup of a file system; and in response to the backup request: selecting a reference backup from a backup storage system; obtaining a first hash value document associated with the reference backup; generating a hash value for an asset associated with the file system; making a first determination that the hash value matches a second hash value specified in the first hash value document; in response to the first determination, populating an incremental backup with a copy of data associated with the asset; initiating a transfer of the incremental backup to the backup storage system; and storing a second hash value document, wherein the second hash value document comprises the hash value and a backup identifier of the incremental backup.

16. The non-transitory computer readable medium of claim 15, the method further comprising: prior to initiating the transfer of the incremental backup to the backup storage system: generating a third hash value for a second asset associated with the file system; making a second determination that the third hash value does not match a fourth hash value; and in response to the second determination, not populating the incremental backup with a second copy of second data associated with the second asset.

17. The non-transitory computer readable medium of claim 16, wherein the second hash value document further comprises the third hash value.

18. The non-transitory computer readable medium of claim 15, wherein the first hash value document comprises a timestamp associated with the reference backup, a second backup identifier associated with the reference backup, and a plurality of hash values.

19. The non-transitory computer readable medium of claim 15, wherein the reference backup is not a most recent backup of the file system.

20. The non-transitory computer readable medium of claim 15, the method further comprising: obtaining a second backup request for an incremental block-based backup; and in response to the second backup request: identifying a plurality of data blocks changed since a most recent block-based backup; performing a data block file analysis on the plurality of data blocks to identify a plurality of modified files; generating the incremental block-based backup based on the plurality of data blocks; generating a file change document based on the plurality of modified files; updating the incremental block-based backup based on the file change document; and initiating a transfer of the incremental block-based backup to the backup storage system.

Description

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims priority to Indian Patent Application No. 201941052138, filed Dec. 16, 2019, which is incorporated by reference herein in its entirety.

BACKGROUND

[0002] Computing devices may include any number of internal components such as processors, memory, and persistent storage. Each of the internal components of a computing device may be used to generate data and to execute functions. The process of generating, storing, and sending data may utilize computing resources of the computing devices such as processing and storage. The utilization of the aforementioned computing resources to generate data and to send data to other computing devices may impact the overall performance of the computing resources.

SUMMARY

[0003] In general, in one aspect, the invention relates to a method for performing backup operations. The method includes obtaining, by a backup agent, a backup request for an incremental backup of a file system, and in response to the backup request: selecting a reference backup from a backup storage system, obtaining a first hash value document associated with the reference backup, generating a hash value for an asset associated with the file system, making a first determination that the hash value matches a second hash value specified in the first hash value document, in response to the first determination, populating an incremental backup with a copy of data associated with the asset, initiating a transfer of the incremental backup to the backup storage system, and storing a second hash value document, wherein the second hash value document comprises the hash value and a backup identifier of the incremental backup.

[0004] In general, in one aspect, the invention relates to a system that includes a processor and memory that includes instructions which, when executed by the processor, perform a method. The method includes obtaining, by a backup agent, a backup request for an incremental backup of a file system, and in response to the backup request: selecting a reference backup from a backup storage system, obtaining a first hash value document associated with the reference backup, generating a hash value for an asset associated with the file system, making a first determination that the hash value matches a second hash value specified in the first hash value document, in response to the first determination, populating an incremental backup with a copy of data associated with the asset, initiating a transfer of the incremental backup to the backup storage system, and storing a second hash value document, wherein the second hash value document comprises the hash value and a backup identifier of the incremental backup.

[0005] In general, in one aspect, the invention relates to a non-transitory computer readable medium that includes computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for performing backup operations. The method includes obtaining, by a backup agent, a backup request for an incremental backup of a file system, and in response to the backup request: selecting a reference backup from a backup storage system, obtaining a first hash value document associated with the reference backup, generating a hash value for an asset associated with the file system, making a first determination that the hash value matches a second hash value specified in the first hash value document, in response to the first determination, populating an incremental backup with a copy of data associated with the asset, initiating a transfer of the incremental backup to the backup storage system, and storing a second hash value document, wherein the second hash value document comprises the hash value and a backup identifier of the incremental backup.

BRIEF DESCRIPTION OF DRAWINGS

[0006] Certain embodiments of the invention will be described with reference to the accompanying drawings. However, the accompanying drawings illustrate only certain aspects or implementations of the invention by way of example and are not meant to limit the scope of the claims.

[0007] FIG. 1 shows a diagram of a system in accordance with one or more embodiments of the invention.

[0008] FIG. 2 shows a diagram of a hash value document repository in accordance with one or more embodiments of the invention.

[0009] FIG. 3A shows a flowchart for performing an incremental backup of a file system in accordance with one or more embodiments of the invention.

[0010] FIG. 3B shows a flowchart for performing a block-based incremental backup in accordance with one or more embodiments of the invention.

[0011] FIGS. 4A-4B show an example in accordance with one or more embodiments of the invention.

[0012] FIG. 5 shows a diagram of a computing device in accordance with one or more embodiments of the invention.

DETAILED DESCRIPTION

[0013] Specific embodiments will now be described with reference to the accompanying figures. In the following description, numerous details are set forth as examples of the invention. It will be understood by those skilled in the art that one or more embodiments of the present invention may be practiced without these specific details and that numerous variations or modifications may be possible without departing from the scope of the invention. Certain details known to those of ordinary skill in the art are omitted to avoid obscuring the description.

[0014] In the following description of the figures, any component described with regard to a figure, in various embodiments of the invention, may be equivalent to one or more like-named components described with regard to any other figure. For brevity, descriptions of these components will not be repeated with regard to each figure. Thus, each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components. Additionally, in accordance with various embodiments of the invention, any description of the components of a figure is to be interpreted as an optional embodiment, which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.

[0015] In general, one or more embodiments of the invention relate to systems and methods for generating incremental backups of applications in a production host environment. The incremental backups may be generated using a file system or using block-based backups. Embodiments of the invention may include using a hash value document repository to select a reference backup and to compare hash values of assets in the file system to hash values of the assets at the point in time of the reference backup to determine which assets have been modified since the reference backup. An incremental backup may be generated based on these determinations.
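The comparison described in paragraph [0015] can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes that SHA-256 is the hash function, that a hash value document is a plain dictionary, and that an asset is copied into the incremental backup when its current hash differs from (or is absent in) the reference document, which is how paragraph [0015] describes detecting modification. The names hash_asset, select_reference, and build_incremental are hypothetical.

```python
import hashlib


def hash_asset(data: bytes) -> str:
    """Hash an asset's contents (SHA-256 is an assumption, not from the source)."""
    return hashlib.sha256(data).hexdigest()


def select_reference(documents, backup_id=None):
    """Pick a reference hash value document.

    If backup_id is given, use that backup (it need not be the most recent);
    otherwise fall back to the document with the latest timestamp.
    """
    if backup_id is not None:
        return next(d for d in documents if d["backup_id"] == backup_id)
    return max(documents, key=lambda d: d["timestamp"])


def build_incremental(file_system, reference_doc, new_backup_id, timestamp):
    """Return (incremental backup, new hash value document).

    file_system maps asset names to their current bytes.
    """
    incremental = {}
    new_hashes = {}
    for name, data in file_system.items():
        digest = hash_asset(data)
        new_hashes[name] = digest
        # Copy the asset only if it is new or changed since the reference.
        if reference_doc["hashes"].get(name) != digest:
            incremental[name] = data
    new_doc = {
        "backup_id": new_backup_id,
        "timestamp": timestamp,
        "hashes": new_hashes,
    }
    return incremental, new_doc


# Hypothetical file system state at the reference backup and now.
fs_then = {"a.txt": b"alpha", "b.txt": b"beta"}
ref_doc = {
    "backup_id": "B1",
    "timestamp": 1,
    "hashes": {name: hash_asset(data) for name, data in fs_then.items()},
}
fs_now = {"a.txt": b"alpha", "b.txt": b"beta v2", "c.txt": b"gamma"}

incremental, new_doc = build_incremental(fs_now, ref_doc, "B2", 2)
print(sorted(incremental))  # ['b.txt', 'c.txt']
```

In this sketch the new hash value document records a hash for every asset, including unchanged ones, so that it can serve as the reference for a later incremental backup.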

[0016] Embodiments of the invention further include generating a block-based incremental backup by identifying which data blocks have been modified, identifying which files (or assets) are associated with each changed data block, and, after generating the block-based backup, storing a file-change document with the block-based backup that specifies the modified files.

[0017] FIG. 1 shows a diagram of a system in accordance with one or more embodiments of the invention. The system may include one or more clients (100), a production host environment (110), and a backup storage system (150). The system may include additional, fewer, and/or different components without departing from the invention. Each component may be operably connected to any of the other components via any combination of wired and/or wireless connections. Each of the aforementioned components is discussed below.

[0018] In one or more embodiments of the invention, the production host environment (110) is a grouping of production hosts (110A, 110N) that each provide services to the clients (100). Each production host (110A, 110N) in the production host environment (110) includes applications (112), a backup agent (116), a block-based write tracker (118), a hash value document repository (119A), and file system storage information (119B). The production hosts (110A, 110N) may include additional, fewer, and/or different components without departing from the invention. Each of the aforementioned components illustrated in FIG. 1 is discussed below.

[0019] In one or more embodiments of the invention, a production host (110A, 110N) hosts one or more applications (112). In one or more embodiments of the invention, the applications (112) perform services for clients (e.g., 100). The services may include writing, reading, and/or otherwise modifying data that is stored in the production host (110A, 110N). The applications (112) may each include functionality for writing data to the production host (110A, 110N) and for notifying the block-based write tracker (118) of data written to a persistent storage system in the production host (110A, 110N). The applications may be, for example, instances of databases, email servers, and/or other applications. The applications (112A, 112N) may be other types of applications without departing from the invention.

[0020] In one or more embodiments of the invention, each application (112A, 112N) is implemented as computer instructions, e.g., computer code, stored on a persistent storage that when executed by a processor(s) of the production host (e.g., 110A, 110N) cause the production host (110A, 110N) to provide the functionality of the applications (e.g., 112A, 112N) described throughout this application.

[0021] In one or more embodiments of the invention, the production host (110A, 110N) further includes a backup agent (116). The backup agent (116) may include functionality for generating backups of a file system. In one or more embodiments of the invention, a file system is an organizational data structure that tracks how data is stored and retrieved in a system (e.g., in persistent storage of a production host (110A, 110N) or of the production host environment (110)). The file system may specify references to assets and any data blocks associated with each asset. An asset may be an individual data object in the file system. An asset may be, for example, a file. The backup generated may include a copy of the assets for one or more specified applications associated with a specified point in time. The backup of the file system may be generated via the method illustrated in FIG. 3A. The backup of the file system may be generated via any other method without departing from the invention.

[0022] In one or more embodiments of the invention, the backup agent (116) may further include functionality for generating block-based backups. In one or more embodiments of the invention, a block-based backup is a backup generated by copying data blocks in a persistent storage system (not shown) of a production host (e.g., 110A, 110N). The data blocks may be stored contiguously or non-contiguously in the persistent storage system. In other words, the data blocks may or may not be stored in portions of the persistent storage system that are physically located near each other (e.g., next to each other). The storage location of each data block in the production host may be specified in the file system storage information (119B) (discussed below). The block-based backup may be generated via the method illustrated in FIG. 3B. The block-based backup may be generated via any other method without departing from the invention.

[0023] In one or more embodiments of the invention, the backup agent (116) may generate the backups based on backup policies implemented by the backup agent (116). The backup policies may specify a schedule in which the applications (e.g., 112A, 112N) are to be backed up. The backup agent (116) may be triggered to execute a backup in response to a backup policy. Alternatively, one or more of the backups (152, 154) may be generated in response to a backup request triggered by the client(s) (100). The backup request may specify the applications to be backed up.

[0024] In one or more embodiments of the invention, the backup agent (116) is a physical device. The physical device may include circuitry. The physical device may be, for example, a field-programmable gate array, application specific integrated circuit, programmable processor, microcontroller, digital signal processor, or other hardware processor. The physical device may be adapted to provide the functionality of the backup agent (116) described throughout this application.

[0025] In one or more embodiments of the invention, the backup agent (116) is implemented as computer instructions, e.g., computer code, stored on a persistent storage that when executed by a processor of the production host (e.g., 110A, 110N) causes the production host (110A, 110N) to provide the functionality of the backup agent (116) described throughout this application.

[0026] In one or more embodiments of the invention, the production host (110A, 110N) further includes a block-based write tracker (e.g., 118). In one or more embodiments of the invention, the block-based write tracker (118) tracks the changed portions of the persistent storage system used in the production host (110A, 110N). The block-based write tracker (118) tracks such changed portions by maintaining a block-based change list that specifies each data block in the persistent storage system that has been changed since a most recent block-based backup.
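The block-based change list of paragraph [0026] can be pictured as a set of block indices that is cleared whenever a block-based backup completes. The following Python sketch is illustrative only. The class name BlockWriteTracker, its methods, and the use of integer block indices are assumptions rather than details taken from the application.

```python
class BlockWriteTracker:
    """Tracks which data blocks changed since the most recent block-based backup."""

    def __init__(self):
        # Block indices written since the last backup (hypothetical representation).
        self._changed = set()

    def record_write(self, block_index: int) -> None:
        """Called (e.g., by an application) whenever a block is written."""
        self._changed.add(block_index)

    def changed_blocks(self) -> list:
        """Return the change list in ascending block order."""
        return sorted(self._changed)

    def reset(self) -> None:
        """Clear the change list once a block-based backup has been taken."""
        self._changed.clear()


tracker = BlockWriteTracker()
for block in (7, 3, 7):       # block 7 is written twice
    tracker.record_write(block)

print(tracker.changed_blocks())  # [3, 7]
tracker.reset()
print(tracker.changed_blocks())  # []
```

Using a set keeps repeated writes to the same block from inflating the change list, so the list size is bounded by the number of distinct blocks touched.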

[0027] In one or more embodiments of the invention, the block-based write tracker (118) is a physical device. The physical device may include circuitry. The physical device may be, for example, a field-programmable gate array, application specific integrated circuit, programmable processor, microcontroller, digital signal processor, or other hardware processor. The physical device may be adapted to provide the functionality of the block-based write tracker (118) described throughout this application.

[0028] In one or more embodiments of the invention, the block-based write tracker (118) is implemented as computer instructions, e.g., computer code, stored on a persistent storage that when executed by a processor of a production host (e.g., 110A, 110N) causes the production host (110A, 110N) to provide the functionality of the block-based write tracker (118) described throughout this application.

[0029] In one or more embodiments of the invention, the hash value document repository (119A) is a data structure that includes one or more hash value documents. The hash value documents may each specify information about assets in the file system at a point in time in which a file-system backup of the file system was generated. For additional details regarding the hash value document repository, see, e.g., FIG. 2.

[0030] In one or more embodiments of the invention, the file system storage information (119B) is a data structure that specifies each asset in the file system and a storage location of the data blocks associated with the asset in the persistent storage system. The file system storage information (119B) may include entries that each specify an asset of the file system, the data blocks associated with the asset, and the physical or logical storage location of each data block. The storage location may be, for example, an address (e.g., physical, logical, etc.) associated with a portion of a physical storage device.
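One way to picture the file system storage information is as a mapping from each asset to the data blocks that hold it. Given such a mapping, the data block file analysis recited in the claims reduces to finding every asset that owns at least one changed block. The Python sketch below is a simplified illustration. The function names, the dictionary layout, and the shape of the file change document are all assumptions.

```python
def modified_files(storage_info, changed_blocks):
    """Return the assets that own at least one changed data block.

    storage_info maps an asset name to the list of block indices it occupies.
    """
    changed = set(changed_blocks)
    return sorted(
        name for name, blocks in storage_info.items()
        if changed.intersection(blocks)
    )


def file_change_document(backup_id, storage_info, changed_blocks):
    """Build a hypothetical file change document for a block-based backup."""
    return {
        "backup_id": backup_id,
        "modified_files": modified_files(storage_info, changed_blocks),
    }


# Hypothetical file system storage information: asset -> block indices.
storage_info = {
    "db.log": [0, 1, 2],
    "mail.box": [5, 6],
    "notes.txt": [9],
}

doc = file_change_document("BB2", storage_info, changed_blocks=[1, 9])
print(doc["modified_files"])  # ['db.log', 'notes.txt']
```

A production system would likely keep an inverse index from block to asset so that the lookup cost depends on the number of changed blocks rather than on the number of assets.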

[0031] In one or more embodiments of the invention, the production host (110A, 110N) is implemented as a computing device (see e.g., FIG. 5). The computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, a distributed computing system, or a cloud resource. The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The computing device may include instructions, stored on the persistent storage, that when executed by the processor(s) of the computing device cause the computing device to perform the functionality of the production host (110A, 110N) described throughout this application.

[0032] In one or more embodiments of the invention, the production host (110A, 110N) is implemented as a logical device. The logical device may utilize the computing resources of any number of computing devices and thereby provide the functionality of the production host (110A, 110N) described throughout this application.

[0033] In one or more embodiments of the invention, the client(s) (100) utilize services provided by the production host environment (110). Specifically, the client(s) (100) may utilize the applications (112A, 112N) to obtain, modify, and/or store data. The data may be generated by the applications (112).

[0034] In one or more embodiments of the invention, a client (100) is implemented as a computing device (see e.g., FIG. 5). The computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, a distributed computing system, or a cloud resource. The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The computing device may include instructions, stored on the persistent storage, that when executed by the processor(s) of the computing device cause the computing device to perform the functionality of the client (100) described throughout this application.

[0035] In one or more embodiments of the invention, the client(s) (100) are implemented as a logical device. The logical device may utilize the computing resources of any number of computing devices and thereby provide the functionality of the client(s) (100) described throughout this application.

[0036] In one or more embodiments of the invention, the backup storage system (150) stores backups of a file system. The file system may include application data of the applications (e.g., 112). The backups may further include application dependency information. In one or more embodiments of the invention, a backup is a full or partial copy of one or more applications (e.g., 112A, 112N). The copy may include the application data and/or application dependency information.

[0037] In one or more embodiments of the invention, a backup (152, 154) in the backup storage system (150) is an incremental backup. In one or more embodiments of the invention, an incremental backup is a backup that only stores changes in the persistent storage system that were made after a previous backup in the backup storage system. In contrast, a full backup may include all of the data in the persistent storage system (120) without taking into account when the data had been modified or otherwise written to the persistent storage system (120).

[0038] In one or more embodiments of the invention, if the data in the file system is to be restored to a point in time associated with an incremental backup, the backups required to perform the restoration include at least: (i) the incremental backup, (ii) a full backup, and (iii) the intermediate backups (if any) that are associated with points in time between the full backup and the incremental backup. In this manner, the required backups collectively include all of the data of the persistent storage system (120) at the requested point in time.
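The set of backups needed for such a restoration can be computed by walking back from the target incremental backup to the nearest preceding full backup. The Python sketch below assumes a simple linear chain in which each incremental backup depends on the backup immediately before it. The application also allows a reference backup that is not the most recent one, which this sketch does not model. The function name and record fields are hypothetical.

```python
def required_backups(backups, target_id):
    """Return the IDs of the backups needed to restore to target_id, oldest first.

    backups is a list of dicts with 'id', 'timestamp', and 'kind'
    ('full' or 'incremental'). A linear dependency chain is assumed.
    """
    ordered = sorted(backups, key=lambda b: b["timestamp"])
    target_pos = next(i for i, b in enumerate(ordered) if b["id"] == target_id)

    # Walk backwards until the nearest full backup at or before the target.
    start = target_pos
    while ordered[start]["kind"] != "full":
        start -= 1
        if start < 0:
            raise ValueError("no full backup precedes the target backup")

    return [b["id"] for b in ordered[start:target_pos + 1]]


catalog = [
    {"id": "F1", "timestamp": 1, "kind": "full"},
    {"id": "I1", "timestamp": 2, "kind": "incremental"},
    {"id": "I2", "timestamp": 3, "kind": "incremental"},
    {"id": "F2", "timestamp": 4, "kind": "full"},
    {"id": "I3", "timestamp": 5, "kind": "incremental"},
]

print(required_backups(catalog, "I2"))  # ['F1', 'I1', 'I2']
print(required_backups(catalog, "I3"))  # ['F2', 'I3']
```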

[0039] In one or more embodiments of the invention, each backup (152, 154) in the backup storage system (150) is either a file-system backup or a block-based backup. In one or more embodiments of the invention, a file-system backup is a backup generated by identifying the assets in the file system and generating a copy of all assets (or a portion thereof). In contrast, a block-based backup is generated by identifying the data blocks in the persistent storage system of a production host (e.g., 110A, 110N) and generating copies of all data blocks (or a portion thereof). The data in a file-system backup and in a block-based backup may be similar or different without departing from the invention.

[0040] In one or more embodiments of the invention, the backup storage system (150) is implemented as a computing device (see e.g., FIG. 5). The computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, a distributed computing system, or a cloud resource. The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The computing device may include instructions stored on the persistent storage, that when executed by the processor(s) of the computing device cause the computing device to perform the functionality of the backup storage system (150) described throughout this application.

[0041] In one or more embodiments of the invention, the backup storage system (150) is implemented as a logical device. The logical device may utilize the computing resources of any number of computing devices and thereby provide the functionality of the backup storage system (150) described throughout this application.

[0042] FIG. 2 shows a diagram of a hash value document repository in accordance with one or more embodiments of the invention. The hash value document repository (200) may be an embodiment of the hash value document repository (119A) discussed above. In one or more embodiments of the invention, the hash value document repository (200) includes one or more hash value documents (210A, 210N). Each hash value document (210A, 210N) may include a backup identifier (212), a timestamp (214), and one or more asset hash values (216, 218). The hash value document repository (200) may include additional, fewer, and/or different components without departing from the invention. Each of the aforementioned components illustrated in FIG. 2 is discussed below.

[0043] In one or more embodiments of the invention, the backup identifier (212) of a hash value document (210A, 210N) is a combination of letters, numbers, and/or symbols that uniquely identifies a backup stored in a backup storage system. The hash value document (210A, 210N) may be associated with the backup identified by the corresponding backup identifier (212).

[0044] In one or more embodiments of the invention, the timestamp (214) is a combination of letters, numbers, and/or symbols that uniquely identifies a point in time associated with the backup identified by the backup identifier (212). The point in time may be the point in time in which the backup was generated and/or the point in time in which the data stored in the backup existed.

[0045] In one or more embodiments of the invention, each asset hash value (e.g., asset A hash value (216), asset M hash value (218)) is a value that is generated by applying a hash function to the data of the asset. The hash value changes when the data in the asset is modified. For example, a hash value of the asset at a first point in time may be drastically different from a second hash value of the asset at a second point in time if any data in the asset is added, deleted, and/or otherwise modified after the first point in time.
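A minimal sketch of a hash value document and an asset hash value follows. The specification does not name a hash function; SHA-256 from the Python standard library is used here purely as an assumption. The `HashValueDocument` class, its field names, and the sample values are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


def asset_hash(data: bytes) -> str:
    """Hash the asset data (SHA-256 is an assumed choice of hash function)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class HashValueDocument:
    backup_id: str   # backup identifier (212)
    timestamp: str   # timestamp (214)
    asset_hashes: Dict[str, str] = field(default_factory=dict)  # (216, 218)


# Hypothetical document for a backup containing a single asset.
document = HashValueDocument(
    backup_id="backup-A",
    timestamp="2020-05-01T00:00:00Z",
    asset_hashes={"file_a": asset_hash(b"hello")},
)

# Appending a single byte to the asset yields a different hash value.
changed = asset_hash(b"hello!") != document.asset_hashes["file_a"]
```

Because any modification to the asset data produces a different hash value with overwhelming probability, comparing the stored and freshly generated values is enough to detect a change without comparing the data itself.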

[0046] FIGS. 3A-3B show flowcharts in accordance with one or more embodiments of the invention. While the various steps in the flowcharts are presented and described sequentially, one of ordinary skill in the relevant art will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all steps may be executed in parallel. In one embodiment of the invention, the steps shown in FIGS. 3A-3B may be performed in parallel with any other steps shown in FIGS. 3A-3B without departing from the scope of the invention.

[0047] FIG. 3A shows a flowchart for performing an incremental backup of a file system in accordance with one or more embodiments of the invention. The method shown in FIG. 3A may be performed by, for example, a backup agent (116, FIG. 1). Other components of the system illustrated in FIG. 1 may perform the method of FIG. 3A without departing from the invention.

[0048] In step 300, a backup request for an incremental backup of a file system is obtained. The backup request may be obtained from a client managing the initiation of backups. Alternatively, the backup request may be the result of the backup agent implementing backup policies. As discussed above, the backup policies may include schedules that specify when to perform a backup of the persistent storage device. The backup request may specify the applications to be backed up.

[0049] In step 302, a reference backup is selected from a backup storage system. In one or more embodiments of the invention, the reference backup is selected by sending a selection request to a client. The client may be the client managing the initiation of backups.

[0050] The client, in response to the request, may send the backup agent a response that specifies the selected backup. In one or more embodiments of the invention, the selection may be based on whether the most recent backup associated with the file system is available. For example, the client may identify that the default backup to be used as the reference backup is not available. As such, the response may specify a different backup to be used as the reference backup.

[0051] In one or more embodiments of the invention, the reference backup is selected by the backup agent by identifying the available backups in the backup storage and selecting a most recent available backup in the backup storage system.
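The agent-side selection described above can be sketched as follows, assuming each backup record carries a comparable timestamp and an availability flag. The `BackupRecord` class, the `available` field, and the sample records are hypothetical; how availability is actually determined is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BackupRecord:
    backup_id: str
    timestamp: int    # assumption: larger means more recent
    available: bool   # assumption: False for failed or missing backups


def select_reference_backup(backups: Iterable[BackupRecord]) -> Optional[BackupRecord]:
    """Pick the most recent available backup, or None if there is none."""
    candidates = [b for b in backups if b.available]
    return max(candidates, key=lambda b: b.timestamp, default=None)


# Hypothetical backup storage contents: the newest backup is unavailable.
stored = [
    BackupRecord("backup-A", 1, True),
    BackupRecord("backup-B", 2, True),
    BackupRecord("backup-C", 3, False),
]
reference = select_reference_backup(stored)
```

In this sketch the newest record is skipped because it is flagged unavailable, so the next most recent backup becomes the reference.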

[0052] In step 304, a hash value document associated with the reference backup is obtained. In one or more embodiments of the invention, the hash value document is identified using the backup identifier associated with the hash value document. The backup agent may analyze the hash value repository to identify a hash value document that specifies the backup identifier associated with the selected reference backup.
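As an illustration of step 304, the lookup might be a simple scan of the repository for a matching backup identifier. The dictionary layout, key names, and sample hash strings below are assumptions made for the sketch.

```python
from typing import Dict, List, Optional

# Assumed layout: each document is a dict with these three keys.
repository: List[Dict[str, object]] = [
    {"backup_id": "backup-A", "timestamp": "2020-05-01T00:00:00Z",
     "asset_hashes": {"file_a": "aa11"}},
    {"backup_id": "backup-B", "timestamp": "2020-05-08T00:00:00Z",
     "asset_hashes": {"file_a": "aa11", "file_b": "bb22"}},
]


def find_hash_value_document(
    documents: List[Dict[str, object]], backup_id: str
) -> Optional[Dict[str, object]]:
    """Return the document whose backup identifier matches, else None."""
    for document in documents:
        if document["backup_id"] == backup_id:
            return document
    return None


reference_document = find_hash_value_document(repository, "backup-B")
```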

[0053] In step 306, an asset in the file system is selected. In one or more embodiments of the invention, the asset is an unprocessed asset in the file system. The asset may be a file, a portion of a file (e.g., a file segment), a collection of files, and/or any other sub portion of the file system without departing from the invention.

[0054] In step 308, a hash value of data associated with the asset is generated. In one or more embodiments of the invention, the hash value is generated by performing a hash function on data associated with the asset.

[0055] In step 310, a determination is made about whether the generated hash value matches a previous hash value of the reference backup. In one or more embodiments of the invention, the generated hash value is compared to an asset hash value stored in the hash value document that corresponds to the selected asset. If the generated hash value matches the previous hash value, the method proceeds to step 314; otherwise, the method proceeds to step 312.

[0056] In step 312, following the determination that the generated hash value does not match an asset hash value of the hash value document, an incremental backup is populated with a copy of the data associated with the asset. In one or more embodiments of the invention, the incremental backup is generated if this is the first asset to be selected. The copy of the asset

[0057] In step 314, a determination is made about whether all assets in the file system are processed. If all assets in the file system are processed, the method proceeds to step 316; otherwise, the method proceeds to step 306.

[0058] In step 316, a transfer of the incremental backup to the backup storage system is initiated. In one or more embodiments of the invention, the transfer is initiated by sending the incremental backup to the backup storage system.

[0059] In step 318, a hash value document is stored using the generated hash value(s). In one or more embodiments of the invention, the hash value document includes a backup identifier of the incremental backup, a timestamp associated with the point in time associated with the incremental backup, and the generated hash value(s) of the asset(s) in the file system.
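The loop of steps 306 through 318 can be sketched as below. This is a simplified illustration, not the claimed method: it assumes SHA-256 as the hash function, holds assets in memory as a name-to-bytes mapping, treats an asset missing from the reference document as changed, and omits the transfer of step 316 and the handling of deleted assets. All function and variable names are hypothetical.

```python
import hashlib
from typing import Dict, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_incremental_backup(
    assets: Dict[str, bytes],
    reference_hashes: Dict[str, str],
    backup_id: str,
    timestamp: str,
) -> Tuple[Dict[str, bytes], Dict[str, object]]:
    incremental: Dict[str, bytes] = {}
    new_hashes: Dict[str, str] = {}
    for name, data in assets.items():             # steps 306 and 314
        digest = sha256_hex(data)                 # step 308
        new_hashes[name] = digest
        if reference_hashes.get(name) != digest:  # step 310
            incremental[name] = data              # step 312
    document = {                                  # step 318
        "backup_id": backup_id,
        "timestamp": timestamp,
        "asset_hashes": new_hashes,
    }
    return incremental, document


# Made-up data: file_a is unchanged, file_b and file_c were modified.
reference = {
    "file_a": sha256_hex(b"alpha"),
    "file_b": sha256_hex(b"beta"),
    "file_c": sha256_hex(b"gamma"),
}
current = {"file_a": b"alpha", "file_b": b"beta v2", "file_c": b"gamma v2"}
backup_d, document_d = build_incremental_backup(
    current, reference, "backup-D", "2020-05-29T00:00:00Z"
)
```

Note that the new hash value document records a hash for every asset, including unchanged ones, so that a later incremental backup can use this backup as its reference.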

[0060] FIG. 3B shows a flowchart for performing a block-based incremental backup in accordance with one or more embodiments of the invention. The method shown in FIG. 3B may be performed by, for example, a backup agent (116, FIG. 1). Other components of the system illustrated in FIG. 1 may perform the method of FIG. 3B without departing from the invention.

[0061] In step 320, a backup request for an incremental block-based backup is obtained. The backup request may be obtained from a client managing the initiation of block-based backups. Alternatively, the backup request may be the result of the backup agent implementing backup policies.

[0062] In step 322, one or more data blocks changed since a most recent block-based backup are identified. In one or more embodiments of the invention, the data blocks are identified using a block-based write tracker that tracks the writes performed on a persistent storage system. The tracked writes may be writes performed after a most recent block-based backup is performed. The identified data blocks may be the data blocks specified by the block-based write tracker that were modified after the most recent backup.

[0063] In step 324, a data block file analysis is performed on the identified data blocks to identify modified files. In one or more embodiments of the invention, the data block file analysis includes obtaining the file system information, searching for the identified data blocks, and identifying each file (or asset) associated with each identified data block.

[0064] For example, if data blocks A, B, and C were identified to have been changed after the most recent backup, the file system information may be analyzed to determine which files are associated with data blocks A, B, and C. The files determined to be associated with blocks A, B, and C are the identified modified files.
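The data block file analysis of step 324 might look like the following sketch, assuming the file system information can be read as a mapping from each file to the data blocks it occupies. That mapping, the file names, and the function name are hypothetical.

```python
from typing import Dict, List, Set


def identify_modified_files(
    file_system_info: Dict[str, List[str]], changed_blocks: Set[str]
) -> List[str]:
    """Return files that own at least one changed data block, sorted by name."""
    return sorted(
        path
        for path, blocks in file_system_info.items()
        if changed_blocks.intersection(blocks)
    )


# Assumed file system information: file name -> data blocks it occupies.
file_system_info = {
    "report.docx": ["A", "D"],
    "notes.txt": ["B", "C"],
    "photo.png": ["E"],
}
modified = identify_modified_files(file_system_info, {"A", "B", "C"})
```

Here data blocks A, B, and C map to two files, while the file stored only in block E is not reported as modified.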

[0065] In step 326, an incremental block-based backup is generated using the identified data blocks. In one or more embodiments of the invention, the incremental block-based backup is generated by copying the identified data blocks and storing the copies in the incremental block-based backup.

[0066] In step 328, a file change document is generated based on the identified modified files. In one or more embodiments of the invention, the file change document specifies the identified modified files. Further, the file change document specifies a timestamp associated with the incremental block-based backup.

[0067] In step 330, the backup is updated based on the file change document. In one or more embodiments of the invention, the backup is updated by adding the file change document to the generated block-based backup. In this manner, the block-based backup specifies the modified files, and this information may be provided to a client when selecting a block-based backup to restore a file to a previous point in time.

[0068] In step 332, a transfer of the backup to the backup storage system is initiated. In one or more embodiments of the invention, the transfer is initiated by sending the backup (which includes the file change document) to the backup storage system.
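Steps 326 through 330 can be illustrated by packaging the copied data blocks together with a file change document. The in-memory volume, the dictionary layout, and all names below are assumptions for the sketch; the transfer of step 332 is not shown.

```python
from typing import Dict, List


def build_block_backup(
    volume: Dict[int, bytes],
    changed_blocks: List[int],
    modified_files: List[str],
    timestamp: str,
) -> Dict[str, object]:
    # Step 326: copy only the identified data blocks.
    blocks = {number: volume[number] for number in changed_blocks}
    # Step 328: the file change document lists the modified files and a timestamp.
    file_change_document = {
        "timestamp": timestamp,
        "modified_files": sorted(modified_files),
    }
    # Step 330: the backup carries the file change document alongside the blocks.
    return {"blocks": blocks, "file_change_document": file_change_document}


# Hypothetical volume with four blocks, two of which changed.
volume = {0: b"aaaa", 1: b"bbbb", 2: b"cccc", 3: b"dddd"}
backup = build_block_backup(
    volume, [1, 3], ["notes.txt", "report.docx"], "2020-05-29T00:00:00Z"
)
```

Keeping the file change document inside the backup means a client can inspect which files a block-based backup touched without reading the block data.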

Example

[0069] The following section describes an example. The example, illustrated in FIGS. 4A-4B, is not intended to limit the invention. Turning to the example, consider a scenario in which a production host performs an incremental backup of a file system comprising three files (file A, file B, file C).

[0070] FIG. 4A shows a first diagram of an example system. For the sake of brevity, not all components of the example system are illustrated in FIG. 4A. The example system includes a production host (410), a client (400), and a backup storage system (420). The production host (410) includes application A (412A), application B (412B), a backup agent (414), and a hash value document (416).

[0071] The client (400) sends a backup request to the backup agent (414) that specifies backing up a file system that includes application data for applications A and B (412A, 412B) [1]. The backup agent (414), in response to the backup request, follows the method of FIG. 3A and analyzes the backup storage system (420) to select a reference backup [2]. In the backup storage system, backup A (422A) is a full backup of the file system. Backup B (422B) is an incremental backup that depends on backup A (422A). In other words, in order for the file system to be restored to a point in time associated with backup B (422B), both backups A (422A) and B (422B) will be required for the restoration. A third backup, backup C (422C), is identified in the backup storage system (420). Backup C (422C) is an incremental backup that depends on backup B (422B). However, the backup agent determines that backup C (422C) is a failed backup. Based on this determination, the backup agent (414) selects the most recent backup that is available. The backup agent selects backup B (422B) as the reference backup.

[0072] Following this determination, the backup agent (414) identifies the hash value document associated with backup B (416). The hash value document for backup B (416) includes a hash value of each file in the file system regardless of whether a copy of the file is in backup B. The backup agent (414) generates a first hash value of the data in file A of application A (412A) [3]. The first hash value is compared to a corresponding hash value in the hash value document for backup B (416) [4]. The backup agent (414) determines that the hash values match. This determination may signify that after the generation of backup B, file A was not modified.

[0073] The backup agent (414) generates a second hash value of the data in file B of application A (412A) [5]. The second hash value is compared to a corresponding hash value in the hash value document for backup B (416) [6]. The backup agent (414) determines that the hash values do not match. This determination may signify that after the generation of backup B, file B was in some way modified.

[0074] The backup agent (414) generates a third hash value of the data in file C of application B (412B) [7]. The third hash value is compared to a corresponding hash value in the hash value document for backup B (416) [8]. The backup agent (414) determines that the hash values do not match. This determination may signify that after the generation of backup B, file C was in some way modified.

[0075] FIG. 4B shows a second diagram of the example system. For the sake of brevity, not all components of the example system are illustrated in FIG. 4B. At a later point in time, the backup agent (414) generates an incremental backup of the file system (422D). The incremental backup (422D) (also referred to as backup D) is generated by copying the data of files B and C (i.e., the files that were modified after the generation of backup B (422B)) and storing the copies in the incremental backup (422D). The incremental backup (422D) is stored in the backup storage system (420) [9].

[0076] Further, the backup agent (414) generates a hash value document for backup D (418) [10]. The hash value document for backup D (418) includes a backup identifier for backup D (422D), a timestamp associated with the backup (422D), and the first, second, and third hash values generated when performing the method of FIG. 3A.

[0077] End of Example

[0078] As discussed above, embodiments of the invention may be implemented using computing devices. FIG. 5 shows a diagram of a computing device in accordance with one or more embodiments of the invention. The computing device (500) may include one or more computer processors (502), non-persistent storage (504) (e.g., volatile memory, such as random access memory (RAM), cache memory), persistent storage (506) (e.g., a hard disk, an optical drive such as a compact disk (CD) drive or digital versatile disk (DVD) drive, a flash memory, etc.), a communication interface (512) (e.g., Bluetooth interface, infrared interface, network interface, optical interface, etc.), input devices (510), output devices (508), and numerous other elements (not shown) and functionalities. Each of these components is described below.

[0079] In one embodiment of the invention, the computer processor(s) (502) may be an integrated circuit for processing instructions. For example, the computer processor(s) may be one or more cores or micro-cores of a processor. The computing device (500) may also include one or more input devices (510), such as a touchscreen, keyboard, mouse, microphone, touchpad, electronic pen, or any other type of input device. Further, the communication interface (512) may include an integrated circuit for connecting the computing device (500) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, mobile network, or any other type of network) and/or to another device, such as another computing device.

[0080] In one embodiment of the invention, the computing device (500) may include one or more output devices (508), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, touchscreen, cathode ray tube (CRT) monitor, projector, or other display device), a printer, external storage, or any other output device. One or more of the output devices may be the same or different from the input device(s). The input and output device(s) may be locally or remotely connected to the computer processor(s) (502), non-persistent storage (504), and persistent storage (506). Many different types of computing devices exist, and the aforementioned input and output device(s) may take other forms.

[0081] One or more embodiments of the invention may be implemented using instructions executed by one or more processors of the data management device. Further, such instructions may correspond to computer readable instructions that are stored on one or more non-transitory computer readable mediums.

[0082] One or more embodiments of the invention may improve the operation of one or more computing devices. More specifically, embodiments of the invention improve the backup operations for data in a file system. Embodiments of the invention enable a file system backup to reference any previous backup of the file system regardless of whether that backup is the most recent. Performing a full backup of the file system may utilize more computing resources than an incremental backup. Further, an incremental backup may be more desirable when little data has changed between the incremental backup and the previous backup. Rather than falling back to a full backup of the file system when the most recent backup is not available as a reference backup, embodiments of the invention enable a backup agent (or another entity) to utilize a different backup (i.e., an alternate backup to the most recent backup) as a reference backup for performing an incremental backup.

[0083] Further, embodiments of the invention provide clients visibility of which files have been modified in a block-based backup. While traditional block-based backups do not track which files are modified in each iteration of the block-based backups, embodiments of the invention utilize storage information of the backups to determine which files have been modified for each backup. In this manner, the client has the ability to select a block-based backup based on the modifications performed on the files at the specified points in time.

[0084] Thus, embodiments of the invention may address the problem of inefficient use of computing resources. This problem arises due to the technological nature of the environment in which backup operations are performed.

[0085] The problems discussed above should be understood as being examples of problems solved by embodiments of the invention disclosed herein and the invention should not be limited to solving the same/similar problems. The disclosed invention is broadly applicable to address a range of problems beyond those discussed herein.

[0086] While the invention has been described above with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.

* * * * *