Incremental forever backups for exchange Patent Grant Pradeep , et al. De [EMC IP Holding Company LLC]

Patent Grant 10146631

U.S. patent number 10,146,631 [Application Number 14/870,717] was granted by the patent office on 2018-12-04 for incremental forever backups for exchange. This patent grant is currently assigned to EMC IP Holding Company LLC. The grantee listed for this patent is EMC IP Holding Company LLC. Invention is credited to Matthew D Buchman, Vladimir Mandic, Anappa Pradeep, Suman Tokuri, Sunil Yadav.

United States Patent	10,146,631
Pradeep, et al.	December 4, 2018

Incremental forever backups for exchange

Abstract

An incremental backup of a database includes issuing a request to a copy service requesting a snapshot of a volume having the database, and identifying a writer of the database that should participate in creating the snapshot. From the snapshot, changes to the database since a last backup of the database and log files associated with the database that include data not yet committed to the database, are transmitted to a backup storage unit.

Inventors:

Pradeep; Anappa (Bangalore, IN), Yadav; Sunil (Bangalore, IN), Tokuri; Suman (Bangalore, IN), Mandic; Vladimir (San Jose, CA), Buchman; Matthew D (Seattle, WA)

Applicant:

Name	City	State	Country	Type
EMC IP Holding Company LLC	Hopkinton	MA	US

Assignee:

EMC IP Holding Company LLC (Hopkinton, MA)

Family ID:

64451973

Appl. No.:

14/870,717

Filed:

September 30, 2015

Current U.S. Class:	1/1
Current CPC Class:	G06F 11/1471 (20130101); G06F 16/27 (20190101); G06F 16/2365 (20190101); G06F 16/955 (20190101); G06F 11/1464 (20130101); G06F 11/1451 (20130101); G06F 2201/80 (20130101); G06F 2201/84 (20130101)
Current International Class:	G06F 17/30 (20060101); G06F 11/14 (20060101)
Field of Search:	707/645, 646

References Cited

U.S. Patent Documents


7549028	June 2009	Thompson
7707184	April 2010	Zhang
8635187	January 2014	Beatty
8645496	February 2014	Uhlmann
8706992	April 2014	Liu
9384256	July 2016	Banerjee
9703640	July 2017	Beatty
2006/0053333	March 2006	Uhlmann
2010/0077160	March 2010	Liu
2012/0179655	July 2012	Beatty
2013/0339302	December 2013	Zhang
2013/0339303	December 2013	Potter
2015/0142745	May 2015	Tekade
2016/0077928	March 2016	Dwivedi
2016/0210203	July 2016	Kumarasamy
2016/0314046	October 2016	Kumarasamy
2017/0024232	January 2017	Barve

Primary Examiner: Corrielus; Jean M
Attorney, Agent or Firm: Staniford Tomita LLP

Claims

What is claimed is:

1. A method for an incremental block-based backup of a database stored on a volume of a backup client comprising: registering the database with a driver to track changes to blocks on the volume that are associated with the database; issuing a request to a copy service requesting a snapshot of the volume having the database, and identifying a writer of the database that participates in creating the snapshot; receiving an indication from the copy service that the snapshot of the volume has been created on the backup client; receiving, from the driver, an identification of changed file blocks associated with the database, the changed file blocks comprising changes to the database since a last backup of the database; requesting a dump of a header of the database; reviewing the dump of the header to identify uncommitted log files associated with the database; based on the review and upon the snapshot having been created, identifying on the snapshot the uncommitted log files associated with the database, the uncommitted log files comprising data not yet committed to the database on the snapshot; and transmitting for the incremental block-based backup of the database the changed file blocks and the uncommitted log files to a backup storage unit, wherein the incremental block-based backup of the database is performed without backing up other file blocks of the database that have not changed since the last backup, without backing up other changed file blocks of the volume that are not associated with the database, and wherein the incremental block-based backup of the database allows the database to be backed up within an amount of time that is less than an amount of time required for a file-based backup of the database.

2. The method of claim 1 wherein the changes to the database are stored in a first container on the backup storage unit, and the uncommitted log files are stored in a second container on the backup storage unit, separate from the first container.

3. The method of claim 1 wherein a first container on the backup storage unit comprises a virtual hard disk file that stores a full backup of the database, a second container on the backup storage unit comprises a virtual hard disk differencing file that stores the changes to the database, and wherein a synthetic full backup of the database is created on the backup storage unit based on the virtual hard disk file and the virtual hard disk differencing file.

4. The method of claim 1 wherein a first container on the backup storage unit comprises a virtual hard disk file that stores a full backup of the database, a second container on the backup storage unit comprises a virtual hard disk differencing file that stores the changes to the database, and wherein the virtual hard disk file and the virtual hard disk differencing file are maintained on the backup storage unit as separate files.

5. The method of claim 1 wherein the transmitting for the incremental backup comprises: excluding from the transmission other log files associated with the database, wherein the other log files comprise data committed to the database on the snapshot.

6. The method of claim 1 wherein the database comprises a Microsoft Exchange Database.

7. The method of claim 1 wherein data in an uncommitted log file is committed to the database when a size of the uncommitted log file reaches a pre-determined size.

8. A system for an incremental block-based backup of a database stored on a volume of a backup client, the system comprising: a processor-based system executed on a computer system and configured to: issue a request to a copy service requesting a snapshot of the volume having the database, and identifying a writer of the database that participates in creating the snapshot; receive an indication from the copy service that the snapshot of the volume has been created on the backup client; identify on the snapshot changed file blocks associated with the database, the changed file blocks comprising changes to the database since a last backup of the database; upon the snapshot having been created, search the snapshot for uncommitted log files associated with the database, the uncommitted log files comprising data not yet committed to the database on the snapshot; and transmit for the incremental block-based backup of the database the changed file blocks and the uncommitted log files to a backup storage unit, wherein the incremental block-based backup of the database is performed without backing up other file blocks of the database that have not changed since the last backup, without backing up other changed file blocks of the volume that are not associated with the database, and wherein the incremental block-based backup of the database allows the database to be backed up within an amount of time that is less than an amount of time required for a file-based backup of the database.

9. The system of claim 8 wherein the changes to the database are stored in a first container on the backup storage unit, and the uncommitted log files are stored in a second container on the backup storage unit, separate from the first container.

10. The system of claim 8 wherein a first container on the backup storage unit comprises a virtual hard disk file that stores a full backup of the database, a second container on the backup storage unit comprises a virtual hard disk differencing file that stores the changes to the database, and wherein a synthetic full backup of the database is created on the backup storage unit based on the virtual hard disk file and the virtual hard disk differencing file.

11. The system of claim 8 wherein a first container on the backup storage unit comprises a virtual hard disk file that stores a full backup of the database, a second container on the backup storage unit comprises a virtual hard disk differencing file that stores the changes to the database, and wherein the virtual hard disk file and the virtual hard disk differencing file are maintained on the backup storage unit as separate files.

12. The system of claim 8 wherein the processor-based system is configured to: exclude from the transmission other log files associated with the database, wherein the other log files comprise data committed to the database on the snapshot.

13. The system of claim 8 wherein the processor-based system executed on the computer system is configured to: request a dump of a header of the database; and review the dump of the header to identify the uncommitted log files of the database.

14. A computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein, the computer-readable program code adapted to be executed by one or more processors to implement a method for an incremental block-based backup of a database stored on a volume of a backup client, the method comprising: issuing a request to a copy service requesting a snapshot of the volume having the database, and identifying a writer of the database that participates in creating the snapshot; receiving an indication from the copy service that the snapshot of the volume has been created on the backup client; identifying on the snapshot changed file blocks associated with the database, the changed file blocks comprising changes to the database since a last backup of the database; requesting a dump of a header of the database; reviewing the dump of the header to identify uncommitted log files associated with the database; based on the review and upon the snapshot having been created, searching the snapshot for the uncommitted log files associated with the database, the uncommitted log files comprising data not yet committed to the database on the snapshot; and transmitting for the incremental block-based backup of the database the changed file blocks and the uncommitted log files to a backup storage unit, wherein the incremental block-based backup of the database is performed without backing up other file blocks of the database that have not changed since the last backup, without backing up other changed file blocks of the volume that are not associated with the database, and wherein the incremental block-based backup of the database allows the database to be backed up within an amount of time that is less than an amount of time required for a file-based backup of the database.

15. The computer program product of claim 14 wherein the changes to the database are stored in a first container on the backup storage unit, and the uncommitted log files are stored in a second container on the backup storage unit, separate from the first container.

16. The computer program product of claim 14 wherein a first container on the backup storage unit comprises a virtual hard disk file that stores a full backup of the database, a second container on the backup storage unit comprises a virtual hard disk differencing file that stores the changes to the database, and wherein a synthetic full backup of the database is created on the backup storage unit based on the virtual hard disk file and the virtual hard disk differencing file.

17. The computer program product of claim 14 wherein a first container on the backup storage unit comprises a virtual hard disk file that stores a full backup of the database, a second container on the backup storage unit comprises a virtual hard disk differencing file that stores the changes to the database, and wherein the virtual hard disk file and the virtual hard disk differencing file are maintained on the backup storage unit as separate files.

18. The computer program product of claim 14 wherein the transmitting for the incremental backup comprises: excluding from the transmission other log files associated with the database, wherein the other log files comprise data committed to the database on the snapshot.

19. The computer program product of claim 14 wherein the database comprises a Microsoft Exchange Database.

20. The computer program product of claim 14 wherein the method comprises: registering the database with a driver to track changes to blocks on the volume that are associated with the database; and receiving, from the driver, an identification of the changed file blocks.

Description

TECHNICAL FIELD

Embodiments are generally directed to network-based data backup methods, and more specifically to backing up and restoring a database.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND

In today's digital society, organizations depend on having ready access to their data. Data, however, can be lost in a variety of ways such as through accidental deletion, data corruption, disasters and catastrophes (e.g., fires or flooding), media failures (e.g., disk crash), computer viruses, and so forth. Thus, it is important to back up data in the event that the data needs to be restored. An organization may have an immense amount of data that is critical to the organization's operation. Backing up data and subsequently recovering backed up data, however, can involve large amounts of computing resources such as network bandwidth, processing cycles, and storage due to the complexity and amount of data to be backed up and restored when needed.

For example, a database, such as a Microsoft Exchange database, can be several hundred gigabytes or many terabytes in size. Backing up such a large file can take a very long time. As a result, some organizations implement protection strategies involving a mix of full and partial backups. The database itself may be backed up weekly, such as on a Sunday or other day with low demand, while the associated database transaction logs are backed up daily. This process can help to speed the backup process because data in the database itself is not backed up as part of the daily backups. The restore procedure for such a protection strategy, however, is a multi-stage restore involving restoring a full backup followed by a number of partial backups. As such, restores can be very slow and complex.

Therefore, there is a need for data protection techniques that address both backup performance and recovery performance.

The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Data Domain, Data Domain Restorer, and Data Domain Boost are trademarks of EMC Corporation.

BRIEF DESCRIPTION OF THE FIGURES

In the following drawings like reference numerals designate like structural elements. Although the figures depict various examples, the one or more embodiments and implementations described herein are not limited to the examples depicted in the figures.

FIG. 1 is a diagram of a large-scale network implementing a data backup and recovery process that provides for backing up and recovering a database, under some embodiments.

FIG. 2 shows an overall architecture of a system for backup and recovery of a database according to a specific embodiment.

FIG. 3 shows a block diagram of a functional workflow for backing up a database according to a specific embodiment.

FIG. 4 shows a block diagram of an incremental backup saveset being chained to a full backup saveset according to a specific embodiment.

FIG. 5 shows a block diagram of a synthetic full backup based on a full backup saveset and one or more incremental backup savesets according to a specific embodiment.

FIG. 6 shows a flow of a full backup followed by an incremental backup according to a specific embodiment.

FIG. 7 shows a block diagram of a full backup according to a specific embodiment.

FIG. 8 shows a more detailed flow of an incremental backup according to a specific embodiment.

FIG. 9 shows a block diagram of an incremental backup according to a specific embodiment.

FIG. 10 shows a flow for recovering a database according to a specific embodiment.

DETAILED DESCRIPTION

A detailed description of one or more embodiments is provided below along with accompanying figures that illustrate the principles of the described embodiments. While aspects of the invention are described in conjunction with such embodiment(s), it should be understood that it is not limited to any one embodiment. On the contrary, the scope is limited only by the claims and the invention encompasses numerous alternatives, modifications, and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the described embodiments, which may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the embodiments has not been described in detail so that the described embodiments are not unnecessarily obscured.

It should be appreciated that the described embodiments can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer-readable medium such as a computer-readable storage medium containing computer-readable instructions or computer program code, or as a computer program product, comprising a computer-usable medium having a computer-readable program code embodied therein. In the context of this disclosure, a computer-usable medium or computer-readable medium may be any physical medium that can contain or store the program for use by or in connection with the instruction execution system, apparatus or device. For example, the computer-readable storage medium or computer-usable medium may be, but is not limited to, a random access memory (RAM), read-only memory (ROM), or a persistent store, such as a mass storage device, hard drives, CDROM, DVDROM, tape, erasable programmable read-only memory (EPROM or flash memory), or any magnetic, electromagnetic, optical, or electrical means or system, apparatus or device for storing information. Alternatively or additionally, the computer-readable storage medium or computer-usable medium may be any combination of these devices or even paper or another suitable medium upon which the program code is printed, as the program code can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. Applications, software programs or computer-readable instructions may be referred to as components or modules. Applications may be hardwired or hard coded in hardware or take the form of software executing on a general purpose computer or be hardwired or hard coded in hardware such that when the software is loaded into and/or executed by the computer, the computer becomes an apparatus for practicing the invention. 
Applications may also be downloaded, in whole or in part, through the use of a software development kit or toolkit that enables the creation and implementation of the described embodiments. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.

Disclosed herein are methods and systems for backing up and restoring data that can be used as part of a disaster recovery solution for large-scale networks. Some embodiments of the invention involve automated backup recovery techniques in a distributed system, such as a very large-scale wide area network (WAN), metropolitan area network (MAN), or cloud-based network system; however, those skilled in the art will appreciate that embodiments are not limited thereto, and may include smaller-scale networks, such as LANs (local area networks). Thus, aspects of the one or more embodiments described herein may be implemented on one or more computers executing software instructions, and the computers may be networked in a client-server arrangement or similar distributed computer network.

FIG. 1 illustrates a computer network system 100 that implements one or more embodiments of a system for backing up and restoring data. In system 100, a number of clients 104 are provided to serve as backup clients or nodes. A network or backup server computer 102 is coupled directly or indirectly to these clients through network 110, which may be a cloud network, LAN, WAN or other appropriate network. Network 110 provides connectivity to the various systems, components, and resources of system 100, and may be implemented using protocols such as Transmission Control Protocol (TCP) and/or Internet Protocol (IP), well known in the relevant arts. In a distributed network environment, network 110 may represent a cloud-based network environment in which applications, servers and data are maintained and provided through a centralized cloud computing platform. In an embodiment, system 100 may represent a multi-tenant network in which a server computer runs a single instance of a program serving multiple clients (tenants) in which the program is designed to virtually partition its data so that each client works with its own customized virtual application, with each virtual machine (VM) representing virtual clients that may be supported by one or more servers within each VM, or other type of centralized network server.

The data generated within system 100 may be stored in a backup media of a backup storage node 114. The backup media may be located at any number of persistent storage locations and devices, such as local client storage, server storage, or network storage, which may at least be partially implemented through storage device arrays, such as RAID components. In an embodiment network 100 may be implemented to provide support for various storage architectures such as storage area network (SAN), Network-attached Storage (NAS), or Direct-attached Storage (DAS) that make use of large-scale network accessible storage devices, such as large capacity tape or drive (optical or magnetic) arrays. In an embodiment, the target storage devices, such as tape or disk array may represent any practical storage device or set of devices, such as tape libraries, virtual tape libraries (VTL), fiber-channel (FC) storage area network devices, and OST (OpenStorage) devices. In a specific embodiment, however, the target storage devices represent disk-based targets implemented through virtual machine technology.

For the embodiment of FIG. 1, network system 100 includes backup server 102, one or more backup clients 104, and backup storage node 114. A backup client executes processes 112 for backing up data to the storage node, restoring the backed up data, and coordinating with backup server processes 120 on the backup server and processes 116 on the storage node. The backup server processes include processes to index the backups and identify which savesets reside on which backup devices or volumes. The backup storage node executes processes 116 for receiving backup information from the backup client, writing data to the backup devices or volumes, sending tracking information to the backup server to track the data written to the devices or volumes, and reading the data from the devices or volumes at the request of the client during a recovery.

In an embodiment, system 100 may represent a Data Domain Restorer (DDR)-based deduplication storage system, and a storage server or node having the backup media may be implemented as a DDR Deduplication Storage server provided by EMC Corporation. However, other similar backup and storage systems are also possible. System 100 may utilize certain protocol-specific namespaces that are the external interface to applications and include NFS (network file system) and CIFS (common internet file system) namespaces, as well as a virtual tape library (VTL) or DD Boost provided by EMC Corporation. In general, DD Boost (Data Domain Boost) is a system that distributes parts of the deduplication process to the backup server or application clients, enabling client-side deduplication for faster, more efficient backup and recovery. A data storage deployment may use any combination of these interfaces simultaneously to store and access data. Data Domain (DD) devices in system 100 may use the DD Boost backup protocol to provide access from servers to DD devices. The DD Boost library exposes APIs (application programming interfaces) to integrate with a Data Domain system using an optimized transport mechanism. These API interfaces exported by the DD Boost Library provide mechanisms to access or manipulate the functionality of a Data Domain file system, and DD devices generally support both NFS and CIFS protocol for accessing files.
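The idea of client-side deduplication mentioned above can be illustrated with a minimal sketch. This is not the DD Boost API; the `StorageNode` class, the `backup` function, the fixed-size chunking, and the use of SHA-256 fingerprints are all assumptions made for illustration. The client fingerprints its chunks, asks the storage node which fingerprints it lacks, and sends only those chunks.

```python
import hashlib


class StorageNode:
    """Hypothetical storage node that keeps one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def missing(self, fingerprints):
        """Return the fingerprints the node does not hold yet."""
        return [fp for fp in fingerprints if fp not in self.chunks]

    def store(self, fingerprint, data):
        self.chunks[fingerprint] = data


def backup(data, node, chunk_size=4):
    """Send only chunks the node lacks; return (recipe, chunks_sent).

    The recipe is the ordered fingerprint list needed to rebuild `data`.
    Fixed-size chunking is an assumption chosen to keep the sketch short.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    recipe = [hashlib.sha256(c).hexdigest() for c in chunks]
    by_fingerprint = dict(zip(recipe, chunks))
    needed = node.missing(by_fingerprint)  # iterating a dict yields its keys
    for fp in needed:
        node.store(fp, by_fingerprint[fp])
    return recipe, len(needed)
```

In this sketch a second backup that shares chunks with an earlier one transfers only the chunks that are new to the storage node, which is the property that reduces network traffic.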

FIG. 2 shows further detail of a system 203 for backing up and restoring data. This system includes a backup client 206, a backup storage node 209, and a backup server 212, each of which is interconnected by a network 215. The backup client includes data that is to be backed up to the backup storage node. The backup server is responsible for indexing the data backed up to the backup storage node.

In a specific embodiment, the backup client includes an Exchange or Microsoft Exchange server. In this specific embodiment, system 203 provides for the backup (and recovery) of a Microsoft Exchange database including log files that are associated with the database. Exchange is a Microsoft messaging system that includes a mail server, an e-mail program (e-mail client), and groupware applications. Exchange is a widely used Microsoft application in the market. Exchange databases grow rapidly. In many cases, the time given for a backup window is insufficient to back up an Exchange database because of the size of the database. Thus, traditionally Exchange databases (e.g., files having an *.edb extension) are backed up once and then the Exchange logs (but not the database itself) are subsequently backed up as incremental backups. Nevertheless, there are many problems with this process. In particular, restores can be very slow. As a result, the restoration of an Exchange database often fails to meet restore time objectives (RTO).

More specifically, an Exchange backup traditionally needs to be performed through a file-based backup process (.edb/log files) by scanning/reading the entire file system or Exchange writer for the backup list. However, the performance of a file-scanning backup can be a major bottleneck in the protection of a high-density Exchange database and logs, due to the amount of time required for a file system/Exchange writer walk. The amount of data to be read for each incremental backup is a major problem of previous solutions. During a recovery operation, a user would need to recover a full backup and apply all the corresponding incremental backups. As a result, recovery can be very time consuming and there are often performance issues.

In this specific embodiment, however, a process of the system shown in FIG. 2 provides for Exchange block-based backups (BBB) and recovery. This process allows using the block-based backup process to perform a backup of an Exchange database (e.g., an *.edb file) and logs at the file block level. Using this new process, users can take advantage of "forever" incremental backups for Exchange using a block-based process. In other words, after an initial or full backup of the Exchange database, which may be referred to as a level zero backup, each following backup may be an incremental backup in which the data is read from the volume block-by-block. This simplifies the restore and these subsequent backups can be performed very fast because they are incremental backups rather than full backups.

Block-based backup is a technology which allows the backup application to walk the volume(s)/disk(s) where the Exchange database and transaction logs reside and back up all the blocks used by the database and logs. In a specific embodiment, the backup agent coordinates with a changed block tracking (CBT) driver, service, or framework. In this specific embodiment, a block-based backup by the backup agent uses the CBT driver to back up only blocks that have changed. A changed block may include modifications to existing data (e.g., modifications to data previously backed up) or newly added data (e.g., data that has not been previously backed up).
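As an illustrative sketch only, changed block tracking can be modeled as recording which block numbers each write touches and handing that set to the backup agent on request. The class below is a hypothetical in-memory stand-in, not an actual CBT driver interface; the name `ChangedBlockTracker`, the 4096-byte block size, and the `drain` method are assumptions.

```python
class ChangedBlockTracker:
    """Hypothetical in-memory stand-in for a changed block tracking driver."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.changed = set()  # block numbers written since the last drain

    def record_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.changed.update(range(first, last + 1))

    def drain(self):
        """Return the sorted changed block numbers and reset tracking.

        A backup agent would call this once per backup so that the next
        backup sees only blocks changed after this point.
        """
        blocks = sorted(self.changed)
        self.changed.clear()
        return blocks
```

A write that straddles a block boundary marks both blocks, so the incremental backup copies whole blocks rather than individual bytes.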

In a specific embodiment, an initial or first backup of the Exchange database and logs residing on the volume uses a block-based process involving associated file blocks (e.g., .edb file extents or blocks on the volume). Subsequent backups back up only changed blocks on the volume for the database file (e.g., *.edb) and uncommitted logs associated with the database. In other words, a current incremental backup may include both portions of the database that have changed since a last backup (e.g., modified existing data or newly added data) and uncommitted logs as of the time of the current backup. Committed log files can be excluded from the current backup because the backup agent will be backing up the actual database (or portions of the actual database) itself. That is, the current backup will include the data stored (and thus committed) to the database that was not present in a previous backup.
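The selection described above can be sketched as a small planning function. The function name `plan_incremental`, the representation of database extents as `(start_block, block_count)` pairs, and the mapping of log file names to a committed flag are all assumptions made for illustration; the sketch keeps only changed blocks that fall inside the database file's extents and only log files not yet committed.

```python
def plan_incremental(changed_blocks, db_extents, log_files):
    """Choose what an incremental backup would transmit.

    changed_blocks: volume block numbers changed since the last backup.
    db_extents:     (start_block, block_count) pairs occupied by the database file.
    log_files:      mapping of log file name -> True if already committed.
    Returns (sorted database blocks to send, sorted uncommitted log names).
    """
    db_blocks = set()
    for start, count in db_extents:
        db_blocks.update(range(start, start + count))

    # Changed blocks outside the database extents belong to other files
    # on the volume and are skipped.
    blocks = sorted(b for b in set(changed_blocks) if b in db_blocks)

    # Committed logs are skipped because their data is already in the
    # database blocks being backed up.
    logs = sorted(name for name, committed in log_files.items() if not committed)
    return blocks, logs
```

For example, with changed blocks 5, 12, 100, and 101 and database extents covering blocks 10 through 14 and block 100, only blocks 12 and 100 are selected.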

In a specific embodiment, if the backup target includes Data Domain, full backups are synthesized from these backups. If, however, the backup target does not support artificially synthesizing a full backup, e.g., backup target is an Advanced File Type Device (AFTD), an incremental backup is chained to a previous full or incremental backup. In a specific embodiment, which may be referred to as "incremental forever," incremental backups are sparse. A backup may include only the blocks that are in use by the file system for Exchange. That is, only the Exchange database and logs file blocks may be backed up. A database recover operation can include recovering from backups taken in a "forever" incremental configuration as if recovering from a full backup (e.g., single step recovery).
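The two target behaviors can be contrasted in a minimal sketch, assuming each saveset is modeled as a sparse mapping from block number to block data. The function names `synthesize_full` and `read_block` are hypothetical. The first overlays incrementals on a full backup to produce a single synthetic full; the second leaves the savesets chained and resolves each block by searching from the newest saveset back to the full.

```python
def synthesize_full(full, incrementals):
    """Overlay incrementals (oldest first) on a full backup's block map.

    Later savesets win, so the result reflects the most recent backup and
    can be restored in a single step.
    """
    merged = dict(full)
    for incremental in incrementals:
        merged.update(incremental)
    return merged


def read_block(chain, block_no):
    """Resolve one block through a chain of savesets ordered oldest first.

    The newest saveset that contains the block supplies its data.
    """
    for saveset in reversed(chain):
        if block_no in saveset:
            return saveset[block_no]
    raise KeyError(block_no)
```

Both approaches return the same data for any block; they differ in whether the merge is done once on the backup target or at read time across the chain.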

Referring now to FIG. 2, the backup client, backup storage node, and backup server may be as shown in FIG. 1 and described in the discussion accompanying FIG. 1. For example, the backup client can be a general purpose computer with software and hardware. The hardware may include a processor 218, memory 221, storage 224 (e.g., hard disk), input/output (I/O) controller, network interface, and other computing components, each of which may be interconnected by a bus architecture or any interconnection scheme.

The software may include an operating system, application programs, daemons, drivers, file system device drivers, file system libraries, code modules, and other software components. Examples of operating systems include the Microsoft Windows® family of operating systems (e.g., Windows Server), Linux, HP-UX, UNIX, Sun OS, Solaris, Mac OS X (e.g., Macintosh), Alpha OS, AIX, IRIX32, or IRIX64. Other operating systems may be used. Communications among the components may be via a communication protocol interface or application program interface (API).

In a specific embodiment, the backup client includes a Microsoft Exchange application 227, a backup/restore agent 230, an Exchange writer 233, a copy service 236, and a change block tracking (CBT) driver 239.

As discussed above, Microsoft Exchange is a messaging platform that provides email, scheduling, and tools for collaboration and messaging service applications. In this specific embodiment, the backup client may be referred to as an Exchange server. The Exchange server provides various messaging services (e.g., email) to support clients connected to the server. Microsoft Exchange stores data in an Exchange database file 242 (e.g., files having the ".edb" file extension) and one or more log files 245.

Data, such as an email received at the Exchange server, may first be written from memory to a log file. Exchange may pre-allocate or pre-determine a size for the log files (e.g., 1 megabyte (MB)). When a size of a current log file reaches the pre-determined size, Exchange may create a new log file. The log files are then committed to the database. That is, the data stored in the log files are written to or incorporated into the database. The writing of data in a log file to the database may occur during an idle machine cycle such as during off-peak times so that computing resources of the server can be devoted to other tasks during peak times. Thus, at any given time there can be multiple, e.g., two or more, log files. For example, there can be 2, 5, 10, 20, 30, 40, 50, or more than 50 log files. The log files may include committed log files (e.g., log files having data already written to the database), uncommitted log files (e.g., log files having data not yet written to the database), or both.

In a specific embodiment, copy service 236 is referred to as a volume shadow copy service (VSS) as provided by Microsoft. VSS is an operating system (e.g., Microsoft Windows) resident service that allows the system to make snapshots of a volume even when the volume is in use. It is used in conjunction with a file system (e.g., NTFS) that is able to create and store shadow copies of volumes. A snapshot is a read-only copy of a volume at a particular point in time. A snapshot may be referred to as a volume snapshot. Snapshots allow for the creation of consistent backups of a volume and ensure that contents do not change and are not locked while the backup is in progress. Snapshots are typically the first step in any incremental or full backup session, and the VSS service initiates and manages the snapshot creation process.

A snapshot of a volume includes an area of space on a disk that may be referred to as "shadow storage." After the snapshot is created a change to any data block from that time forward cannot get written until a copy of that block's data before the change (as it was when the snapshot was created) is written to the differencing area in shadow storage. In this way the data on the disk at the time the snapshot was created is preserved, block by block, in the shadow storage area. The snapshot data is then available either from the original disk, if the data blocks requested have not changed, or from the differencing area if they have.
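The copy-on-write behavior described above can be illustrated with a minimal sketch. The Volume class, its method names, and the in-memory lists are hypothetical and are used only for illustration; they do not represent the VSS interface or its on-disk layout.

```python
class Volume:
    """Toy volume with a single copy-on-write snapshot (hypothetical model)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, one entry per block
        self.diff_area = None       # block index -> original contents; None means no snapshot

    def create_snapshot(self):
        # Start with an empty differencing area; nothing is copied up front.
        self.diff_area = {}

    def write(self, index, data):
        # Before the first overwrite of a block, preserve its pre-change contents.
        if self.diff_area is not None and index not in self.diff_area:
            self.diff_area[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        if self.diff_area is None:
            raise RuntimeError("no snapshot exists")
        # Changed blocks come from the differencing area, unchanged ones from the live disk.
        if index in self.diff_area:
            return self.diff_area[index]
        return self.blocks[index]
```

In this sketch, a block that is overwritten several times after the snapshot is copied to the differencing area only once, so the snapshot continues to present the data as it was when the snapshot was created.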

VSS uses a collection of VSS writers to facilitate the backup of various VSS-compatible applications. Each VSS-compatible application may have a corresponding writer. For example, the Exchange VSS writer corresponds to the Exchange application. A writer is responsible for functions such as quiescing the corresponding application and data stores to be backed up (e.g., temporarily freezing application I/O write requests).

When a backup is initiated by the backup agent, VSS informs the appropriate VSS writers of the backup. This allows the application corresponding to the VSS writer to flush memory-resident data to disk for the backup. The backup agent or application reads a snapshot created by VSS in response to a backup request to VSS. Specifically, VSS performs a copy-on-write (COW) operation on the volume and allows a backup application to read and backup the snapshot data.

The CBT driver is responsible for tracking changes to one or more particular files as of a last backup of the files. The CBT driver can track changes at the file or file block-level. Tracking changes at the file or file block-level provides a more granular process than tracking changes at the volume-level. Specifically, tracking changes at the file-level allows for an incremental backup of a particular file--and more specifically blocks of the file that have changed since a last backup--without having to backup other files on the volume, regardless of whether or not they have changed, and without having to backup other portions of the particular file that have not changed since the last backup.
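As a rough illustration of file block-level change tracking, the following sketch keeps one change block map per registered file and ignores writes to unregistered files. The ChangeBlockTracker class, its method names, and the 4096-byte block size are assumptions made for the example; an actual CBT driver would operate as a kernel filter driver rather than as application code.

```python
class ChangeBlockTracker:
    """Toy per-file change block map (hypothetical; not a real driver API)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.maps = {}  # registered file path -> set of changed block numbers

    def register(self, path):
        # Only registered files are tracked.
        self.maps.setdefault(path, set())

    def on_write(self, path, offset, length):
        # Writes to files that were never registered are ignored.
        if path not in self.maps or length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.maps[path].update(range(first, last + 1))

    def take_changes(self, path):
        # Return the changed blocks and reset the map for the next backup cycle.
        changed = sorted(self.maps[path])
        self.maps[path] = set()
        return changed
```

Because only the blocks recorded for a registered file are reported, other files on the volume and unchanged portions of the tracked file never enter the incremental backup.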

The backup agent is installed at the backup client and is responsible for backing up (and restoring) the Exchange database and associated log files. The backup agent coordinates 248A, B with VSS and the CBT driver to backup the Exchange database and associated Exchange log files to the backup storage node. In a specific embodiment, the backed up log files include uncommitted Exchange log files. Committed log files are excluded or omitted from the backup since the backup includes the changed portions of the database itself. When the database is to be restored, the uncommitted log files can be replayed against the restored database to write the data stored in the uncommitted log files to the restored database.

The backup storage node includes backup storage processes 251, and a target disk 254. The target disk stores the backed up database and uncommitted log files. Backups may be organized into savesets. A saveset refers to a set of source data that is to be backed up. A saveset may include a single file or multiple files. In a specific embodiment, the database and uncommitted log files are stored in separate containers. That is, the target disk includes a first container 257 and a second container 260. A backed up database 263 is stored in the first container. Backed up uncommitted log files 266 are stored in the second container. The second container is separate or different from the first container.

In a specific embodiment, the containers are virtual hard disk files. Thus, the backed up database may be stored in a first virtual hard disk file. The uncommitted log files associated with the database may be stored in a second virtual hard disk file, separate or different from the first virtual hard disk file. For example, a name of the first virtual hard disk file may be different from a name of the second virtual hard disk file. The first and second virtual hard disk files may be stored in different locations or folders on the backup storage node. The virtual hard disks may be files formatted in the VHD or VHDx file format as provided by Microsoft. The VHD or VHDx format is a container format, which can contain disk related information including a file system. VHD or VHDx files can be mounted and used as a regular disk. Volumes such as NTFS/ReFS/FAT32 or any file system which the OS supports on the mounted disk can also be created.

Storing the database and uncommitted log files in separate containers or separate virtual hard disks helps to facilitate managing the backup and restore processes. Storing the database and uncommitted log files in separate containers reflects the differences in the data that is being backed up. Specifically, for each given backup, the uncommitted log files will be different. The containers or virtual hard disks storing the database backup, however, may include a parent or base virtual hard disk and one or more related child virtual hard disks. The parent virtual hard disk may represent a full backup of the database. The child virtual hard disks may represent incremental backups for changes to the database and may point back to the parent virtual hard disk for unchanged data.
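The way a child virtual hard disk may point back to its parent for unchanged data can be sketched as follows. Each disk is modeled as a Python dictionary mapping a block number to its contents, which is an assumption made for illustration and not the VHD or VHDx on-disk format; the function name read_block is likewise hypothetical.

```python
def read_block(chain, block_no):
    """Read one block through a chain of disks.

    chain is a list of dicts ordered parent first, newest child last.
    Each child holds only the blocks that changed in its backup.
    """
    # Search from the newest child back toward the parent.
    for disk in reversed(chain):
        if block_no in disk:
            return disk[block_no]
    raise KeyError(block_no)


parent = {0: "p0", 1: "p1", 2: "p2"}  # full backup
child_1 = {1: "c1"}                   # first incremental: block 1 changed
child_2 = {2: "d2"}                   # second incremental: block 2 changed
```

With this chain, block 0 resolves to the parent while blocks 1 and 2 resolve to the child that last changed them.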

Storing the database separate from the log files can facilitate generating a synthetic full backup of the database. In other words, in some cases, a synthetic full backup of the database is created on the backend (e.g., backup storage node) by merging the parent virtual hard disk and one or more child virtual hard disks. Synthesizing full backups from incremental backups helps to improve recovery time objectives (RTO). For example, over time there may be an increasing number of incremental backups stored on the backup media. These incremental backups are dependent on previous backups and cannot be recovered separately or without the previous backup copy. Recovery performance is inversely related to the number of incremental backups. Thus, as the number of incremental backups increases, the restore performance decreases. Further, managing many separate incremental backups in the media (e.g., managing retention periods and expiration times and dates) can be very cumbersome.

The backup server includes backup server processes 269 and a media database 272. The media database provides a catalog or index that stores metadata associated with the database backups stored on the backup storage node. The metadata may include, for example, a time and date of the backup, device in which the backup or saveset is stored (e.g., location of the backup media or volume), an identification of the backup client (e.g., Exchange server name), and so forth.

FIG. 3 shows a sequence diagram 305 for an overall workflow of the backup system according to a specific embodiment. In this specific embodiment, the backup system is implemented in a product referred to as NetWorker from EMC Corporation. The NetWorker system includes a NetWorker Client Module for Microsoft Applications (NMM) client 310, a NetWorker server 315 including a media database 320, and a NetWorker storage node 320 including a target disk 325. This diagram shows the functional operations of an Exchange block-based backup.

The NMM client includes a process "nsrnmmsv.exe" 330. The NetWorker server includes processes "nsrd" 335, "nsrmmdb" 340, and "nsrindexd" 345. The NetWorker storage node includes a process "nsrmmd" 350. Bolded lines 355A and 355B indicate a data flow. Broken lines 360A, 360B, and 360C indicate inter-process communication.

A snapshot 365 corresponds to a shadow copy volume created by the VSS framework. The NMM client process 330 is responsible for reading the Exchange file data blocks from the shadow copy volume and sending them to the target disk. A direct file access (DFA) mechanism may be used to directly write data to the target disk.

Process 335 ("nsrd") is the main NetWorker server daemon. In order to catalog the backup in the media database for those Exchange files, process 330 sends requests to process 335 to create records in the media database. Process 340 manages the media database to track which backup saveset resides on which backup volume. More particularly, process 345 is responsible for insertion and deletion of backup saveset records into the indexes. Process 350 is the storage node daemon. This process is responsible for save and recovery operations. In particular, the daemon receives backup information from the NMM client, writes data to the devices (volumes), sends tracking information to the NetWorker server to track the data written to the devices (volumes), and reads the data from the devices (volumes) at the request of the client during a recovery.

There can be different types of target disk storing the backup data. FIG. 4 shows a simplified block diagram of a first type of target disk 405. This type of target disk may be referred to as an Advanced File Type Device (AFTD). In a specific embodiment, if the backup target is an AFTD, an incremental backup is chained to a previous full or incremental backup. More particularly, in the example shown in FIG. 4, target disk 405 includes a first saveset 410 and a second saveset 415. The second saveset is linked 420 to the first saveset. The first saveset corresponds to a backup at a first time. The second saveset corresponds to a backup at a second time, after the first time. For example, the first saveset may include a full backup of Exchange. The second saveset may include an incremental backup of Exchange that stores changes made after the full backup (e.g., after the first time).

Specifically, the first saveset may include a first database container 425A and a first uncommitted log files container 430A. First database container 425A stores a full backup of an Exchange database 435 as of the first time. The first database container may be formatted as a virtual hard disk file and may be referred to as a parent or base virtual hard disk. First uncommitted log files container 430A stores one or more uncommitted log files 437A associated with the Exchange database. Uncommitted log files 437A store data not yet committed to the database as of the first time or as of the current backup.

The second saveset may include a second database container 425B and a second uncommitted log files container 430B. Second database container 425B stores an incremental backup of the Exchange database. The second database container may be formatted as another virtual hard disk file and may be referred to as a differencing virtual hard disk. The incremental backup of the Exchange database is identified as 435' in FIG. 4 and includes changes to the database after the last backup. For example, the data may include changes to the database between the last and current backup (or changes between the first and second time). Data that has not changed (e.g., data already present in a previous backup) may be excluded from the incremental backup. Second uncommitted log files container 430B stores one or more uncommitted log files 437B associated with the Exchange database. Uncommitted log files 437B store data not yet committed to the database as of the second time or as of the current backup.

In this specific embodiment, the incremental backup is chained to the previous backup (e.g., previous full or incremental backup). The backup system associates the child container with the parent through incremental chaining. The incremental backups may be maintained as separate backups without being merged to create a synthetic full backup. More particularly, a virtual hard disk file corresponding to the full backup of Exchange database 435 and a virtual hard disk differencing file corresponding to the incremental backup of Exchange database 435' may not be merged.

FIG. 5 shows a simplified block diagram of a second type of target disk 505, different from the first type of target disk. This type of target disk may be referred to as a Data Domain disk which supports synthetic full backups. FIG. 5 is similar to FIG. 4. For example, target disk 505 includes a first saveset 510 and a second saveset 515. The second saveset is linked 520 to the first saveset. The first saveset corresponds to a backup at a first time. The second saveset corresponds to a backup at a second time, after the first time. For example, the first saveset may include a full backup of Exchange. The second saveset may include an incremental backup of Exchange that stores changes made after the full backup or after the last backup.

More particularly, the first saveset may include a first database container 525A and a first uncommitted log files container 530A. First database container 525A stores a full backup of an Exchange database 535 as of the first time. First database container 525A may be formatted as a virtual hard disk file and may be referred to as a parent or base virtual hard disk. First uncommitted log files container 530A stores one or more uncommitted log files 537A associated with the Exchange database. Uncommitted log files 537A store data not yet committed to the database as of the first time or as of the current backup.

The second saveset may include a second database container 525B and a second uncommitted log files container 530B. Second database container 525B stores an incremental backup of the Exchange database. Second database container 525B may be formatted as another virtual hard disk file and may be referred to as a differencing virtual hard disk. The incremental backup of the Exchange database is identified as 535' in FIG. 5 and includes changes to the database between the first and second time (e.g., changes between the last and current backup). Second uncommitted log files container 530B stores one or more uncommitted log files 537B associated with the Exchange database. Uncommitted log files 537B store data not yet committed to the database as of the second time or as of the current backup.

In the example of FIG. 5, however, a synthetic full backup 550 has been artificially created on the target disk based on the full and incremental backups. The synthetic full backup includes a third database container 525C and a third uncommitted log files container 530C. The third database container stores a merging 533 of the full backup of Exchange database 535 and the incremental backup of Exchange database 535'. More particularly, in a specific embodiment, the third database container is formatted as a virtual hard disk and corresponds to a merging of the parent virtual hard disk from the full backup of the database and the differencing virtual hard disk from the incremental backup of the database.

Copies of the parent and differencing virtual hard disks may be merged in order to maintain the full and incremental backups as separate backups. This allows recoveries to particular points in time. For example, the target disk may include a full backup of the database and two or more incremental backups of the database such as first, second, and third incremental backup, each incremental backup corresponding to a particular point in time. A synthetic full backup may be created based on the full backup, the first incremental backup, and the second incremental backup. The third incremental backup may be excluded from the synthetic full backup in order to recover the database as of the second incremental backup. Another synthetic full backup may be created based on the full backup, the first incremental backup, the second incremental backup, and the third incremental backup in order to recover the database as of the third incremental backup, and so forth.
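The point-in-time merging described above can be sketched as follows, assuming each backup is modeled as a dictionary mapping a block number to its contents. The function name synthesize_full and the up_to parameter are hypothetical and are not taken from any product interface.

```python
def synthesize_full(full, incrementals, up_to):
    """Merge a full backup with the first up_to incremental backups.

    full and each incremental are dicts of block number -> contents.
    The inputs are left unmodified, so the full and incremental backups
    remain available as separate backups.
    """
    merged = dict(full)  # work on a copy of the parent
    for incremental in incrementals[:up_to]:
        merged.update(incremental)  # later changes override earlier data
    return merged


full = {0: "a0", 1: "b0", 2: "c0"}
incrementals = [{1: "b1"}, {2: "c2"}, {0: "a3"}]
as_of_second = synthesize_full(full, incrementals, up_to=2)
as_of_third = synthesize_full(full, incrementals, up_to=3)
```

Here as_of_second excludes the third incremental backup and therefore represents the database as of the second incremental backup, while as_of_third includes all three.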

The uncommitted log files of the synthetic full backup may correspond to or be copies of the uncommitted log files from the last incremental backup upon which the synthetic full backup is based. When the database is recovered based on the synthetic full backup, the log files from the last incremental backup can be replayed against the recovered database so that the data stored in the log files can be written to the recovered database.

FIG. 6 shows an overall flow 605 of the backup system. Some specific flows are presented in this application, but it should be understood that the process is not limited to the specific flows and steps presented. For example, a flow may have additional steps (not necessarily described in this application), different steps which replace some of the steps presented, fewer steps or a subset of the steps presented, or steps in a different order than presented, or any combination of these. Further, the steps in other embodiments may not be exactly the same as the steps presented and may be modified or altered as appropriate for a particular process, application or based on the data.

In a step 610, the backup/restore agent initiates a discovery process to detect and identify the database. The discovery may include obtaining a location and name of the database such as the folder path on the backup client where the database file is stored. In a specific embodiment, the discovery process includes communicating with a Microsoft PowerShell Exchange Management Interface. The Exchange Management Shell provides an interface that the backup agent can use to discover the database, storage location, files associated with the database, the file layout, and other attributes and properties associated with the database.

In a step 615, the backup agent registers the discovered database with the changed block tracking (CBT) driver to track changes to the database file. In other words, the backup agent provides to the CBT driver an identification of the database files that the backup agent wishes to track changes to. This may include, for example, providing to the CBT driver the name of the database file and the location of the database file.

In a step 620, the backup agent performs a full backup of the database and uncommitted log files associated with the database as of the time of the backup. In a step 625, the database and log files are stored in separate containers on a backup storage node.

FIG. 7 shows a block diagram of a full backup of a database according to a specific embodiment. In this specific embodiment, the database includes an Exchange database, the backup client is an Exchange Server, the changed block tracking driver is referred to as a block-based backup (BBB) filter driver, the backup agent is referred to as the EMC NetWorker Module for Microsoft (NMM), and the backup storage node includes the EMC Data Domain system and Data Domain Storage Unit.

As shown in the example of FIG. 7, there is an Exchange server 710 having an Exchange database 715. There is a Data Domain storage node 720 having a Data Domain storage unit 725. A backup agent (e.g., NMM) 730 and a changed block or block-based backup (BBB) filter driver 735 are installed at the Exchange server. NMM includes transport logic 740. The transport logic includes a BBB transport module 745.

When a backup of the Exchange database is to be performed, NMM coordinates with the copy service (e.g., VSS) on the Exchange server to obtain a snapshot 750 of the volume storing the Exchange database. Upon completion of the snapshot, the BBB filter driver starts write tracking in order to capture subsequent changes to the database for a next backup of the database. In particular, the BBB filter driver maintains a change block map 755 that tracks changes to a file block associated with the particular file that was registered by the backup agent for change tracking.

The backup agent (NMM), or more particularly the BBB transport module, is responsible for reading from the snapshot volume an Exchange database file 760 and associated uncommitted log files 765, and transmitting 770, 775 the database and log files to the backup storage node. The BBB transport module is further responsible for creating on the backup storage node a first container (e.g., first virtual hard disk or first VHDx) 780 and a second container (e.g., second virtual hard disk or second VHDx) 785, and partitioning and formatting the volumes. The BBB transport module copies complete contents of Exchange EDB file 760 from snapshot 750 to first container or VHDx 780.

Similarly, the BBB transport module copies complete contents of uncommitted log files 765 from snapshot 750 to second container or VHDx 785, which is separate from first container or VHDx 780. As discussed above, the backups may be organized into savesets. For example, as shown in FIG. 7, there can be a database or EDB saveset 790 within the first container for the database. There can be a logs saveset 795 within the second container for the uncommitted log files associated with the database.

Referring back to FIG. 6, after the full backup is performed, in a step 630, the backup agent performs an incremental backup of changes to the database since a last backup. The last backup may be a full backup or a previous incremental backup. The incremental backup will include data of the database that has not been previously backed up. The data may include modifications to existing data in the database after the last backup, newly added data to the database after the last backup, or both.

In a step 635, the backup agent includes with the incremental backup of the database, new uncommitted log files associated with the database as of this current backup. In other words, the incremental backup includes both the changes to the database since the last backup and the new uncommitted log files. In a step 640, the changes to the database and the new uncommitted log files are stored in separate containers on the backup storage node. The incremental backup process shown in steps 630-640 can then be repeated 645 for a next or subsequent incremental backup.

FIG. 8 shows a more detailed flow 805 of an incremental backup. In a step 810, the backup agent issues a request to a copy service requesting a snapshot of the volume storing the database and identifying a writer of the database that should participate in creating the snapshot. In a specific embodiment, the database writer includes the Exchange VSS writer. Identifying the database writer allows the copy service (e.g., VSS) to coordinate with the corresponding application and writer to prepare for the backup. In this specific embodiment, the Exchange VSS writer prepares the Exchange database for a backup by, for example, flushing memory-resident data and freezing or stopping writes to the database and logs on disk.

In a step 815, the backup agent receives the volume snapshot or, more specifically, receives an indication that the volume snapshot has been created. Even though Exchange may be the only application being backed up by the backup agent, VSS creates a snapshot of the entire volume or disk where any Exchange data such as Exchange database files and log files are stored. As a result, portions of the snapshot may include data that should not be backed up by the backup agent. It is desirable to identify what particular data from the snapshot should be backed up. For example, data from the snapshot to not backup may include non-Exchange data, Exchange data that has previously been backed up (e.g., unchanged Exchange data), or both.

Thus, in a step 820, the backup agent receives from the CBT driver a listing of file blocks that should be backed up based on the files (e.g., Exchange database file) the backup agent had previously registered to be tracked. In a specific embodiment, the listing of file blocks includes Exchange database file blocks that have changed since a last backup of the database. In this specific embodiment, a block is the unit of data for read/write (R/W) operations. Generally, this may be referred to as "cluster size" and "cluster" in file system terminology. The list of changed blocks includes the addresses of these clusters or cluster numbers. The listing from the CBT driver may specify the changed file blocks by, for example, a starting offset and length.
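One way a listing of changed cluster numbers might be expressed as starting offsets and lengths is sketched below. The function name clusters_to_extents and the 4096-byte cluster size are assumptions chosen for the example; the actual listing format used by the CBT driver is not specified here.

```python
def clusters_to_extents(cluster_numbers, cluster_size=4096):
    """Convert changed cluster numbers into (starting offset, length) extents.

    Adjacent clusters are coalesced into a single extent so that the
    backup agent can read each changed region with one request.
    """
    extents = []
    for number in sorted(set(cluster_numbers)):
        offset = number * cluster_size
        if extents and extents[-1][0] + extents[-1][1] == offset:
            # This cluster directly follows the previous extent: extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + cluster_size)
        else:
            extents.append((offset, cluster_size))
    return extents
```

For example, under these assumptions, clusters 3, 4, 5, and 9 yield two extents: one covering three adjacent clusters starting at byte offset 12288 and one covering a single cluster at byte offset 36864.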

In a step 825, the backup agent accesses or reads from the snapshot the changed file blocks specified in the listing from the CBT driver. The backup agent transmits from the snapshot the changed file blocks of the database to a backup storage unit for storage in a first container.

In a step 830, the backup agent analyzes the database on the snapshot to identify or determine the uncommitted log files associated with the database that should be included with the incremental backup. For example, certain database applications, such as Exchange, track which logs have or have not been committed to the database. This information may be stored in a header of the database or other file. The backup agent reviews this information to identify the log files that should be included with the backup. In a specific embodiment, the backup agent issues a query to an Exchange Management tool to request a dump of a header of the database. The backup agent reviews the header information to determine the log range corresponding to the uncommitted log files. As discussed, these log files are required to recover the database as the log files store data that have yet to be committed to the database.
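Once a log range has been determined from the header, selecting the matching log files might look like the following sketch. The file naming pattern (the letter E, a two-digit base, an eight-digit hexadecimal generation number, and a .log extension) and the function name uncommitted_logs are assumptions made for illustration; the range values would come from the header dump and are simply passed in here.

```python
import re

# Assumed naming pattern: "E" + 2-digit base + 8 hex digits + ".log".
LOG_NAME = re.compile(r"^E[0-9A-Fa-f]{2}(?P<gen>[0-9A-Fa-f]{8})\.log$")


def uncommitted_logs(file_names, first_required, last_required):
    """Return the log files whose generation number lies in the required range."""
    selected = []
    for name in file_names:
        match = LOG_NAME.match(name)
        if match is None:
            continue  # not a numbered log file (e.g., a database or checkpoint file)
        generation = int(match.group("gen"), 16)
        if first_required <= generation <= last_required:
            selected.append((generation, name))
    # Order by generation so the logs can later be replayed in sequence.
    return [name for _, name in sorted(selected)]


names = ["E0000000019.log", "E000000001B.log", "E000000001A.log", "E00.chk", "db.edb"]
to_back_up = uncommitted_logs(names, 0x1A, 0x1B)
```

In this example the log with generation 0x19 falls outside the range and is treated as committed, so it is left out of the selection.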

In a step 835, the backup agent searches the snapshot for the uncommitted log files associated with the database on the snapshot and transmits the uncommitted log files from the snapshot to the backup storage unit for storage in a second container, separate from the first container. As discussed, VSS creates a snapshot of the entire volume or disk. Distinguishing between log files that have and have not been committed allows the backup agent to reduce the number of log files to include with the incremental backup.

In particular, committed log files can be excluded, omitted, or not transmitted to the backup storage unit as part of the backup or incremental backup. The committed log files can be excluded from the incremental backup because the backup includes the database or, more particularly, the changed portions of the database. There is no need to maintain the full series of transaction logs since each incremental backup can include the actual changed data stored in the database.
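Steps 825 through 835 can be summarized in a short sketch that assembles the two containers for one incremental backup. The snapshot is modeled as plain dictionaries and the function name incremental_backup is hypothetical; reading from a real snapshot volume and writing to virtual hard disk files are outside the scope of the example.

```python
def incremental_backup(snapshot_db_blocks, changed_block_numbers,
                       snapshot_logs, uncommitted_log_names):
    """Build the contents of the two containers for one incremental backup.

    snapshot_db_blocks: dict of block number -> contents of the database file
    snapshot_logs: dict of log file name -> contents
    """
    # First container: only the database blocks that changed since the last backup.
    db_container = {n: snapshot_db_blocks[n] for n in changed_block_numbers}
    # Second container: only the uncommitted logs; committed logs are omitted.
    log_container = {name: snapshot_logs[name] for name in uncommitted_log_names}
    return db_container, log_container


db_blocks = {0: "a", 1: "b-new", 2: "c"}
logs = {"log1": "committed", "log2": "pending"}
db_container, log_container = incremental_backup(db_blocks, [1], logs, ["log2"])
```

The committed log in this example is never copied, since its data is already reflected in the changed database block that is backed up.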

Incremental backup processes 810-835 can be repeated 840 for a subsequent or next incremental backup. The CBT driver coordinates with the copy service (e.g., VSS) to begin tracking changes as soon as the snapshot is created. For example, once the copy service releases the IO stack after the creation of the snapshot, the CBT driver integrates with the copy service event instructing the filter driver that an IO thaw has occurred. Upon completion of the snapshot, the CBT driver can clear or reset its change block map or start a new map and begin write tracking for a subsequent or next incremental backup.

An incremental backup may occur on a daily basis or any other frequency as desired. Further, as discussed above, an incremental backup may be accompanied by the creation of a synthetic full backup on the backup storage node. In a specific embodiment, creating a synthetic full backup includes obtaining a previous copy of the database file, creating a new virtual copy of the database, and replacing file blocks in the new virtual copy with the corresponding changed file blocks from a current backup. The net effect is that each incremental backup may result in a "full backup" or synthesized full backup. The synthesized full backup includes a "full backup" of the database and a set of uncommitted log files as of the last incremental backup associated with the synthesized full backup. As a result, recovery time objectives are improved because a restore of the database will not require a multi-stage restore involving a sequence of incremental backups to apply.

FIG. 9 shows a block diagram of an incremental backup of a database according to a specific embodiment. In this specific embodiment, the database includes an Exchange database. FIG. 9 is similar to FIG. 7. FIG. 7, however, shows a full backup and FIG. 9 shows the incremental backup that may follow the full backup shown in FIG. 7.

Referring now to FIG. 9, backup agent (e.g., NMM) 730 installed at Exchange server 710 obtains or receives a snapshot volume 915 of the Exchange server. The agent obtains or receives 925 a list of changed file blocks for an Exchange (e.g., *.edb) file 760' on the snapshot from changed block or block-based backup (BBB) filter driver 735. Exchange database file 760' (FIG. 9) corresponds to Exchange database file 760 (FIG. 7) and represents the changes made to the database after the backup shown in FIG. 7. As discussed, the filter driver maintains a change block map. The map tracks which blocks of a particular file registered with the filter driver have changed since a last backup of the particular file. In FIG. 9, the changed blocks for the Exchange EDB file are shown as solid black-filled squares in a change block map 755'. Change block map 755' (FIG. 9) corresponds to change block map 755 (FIG. 7). The map shown in FIG. 9, however, has been updated to show the file blocks of the Exchange database file that have changed since the backup shown in FIG. 7.

The backup agent coordinates 940 with storage node 720 to backup changed file blocks 945 and a new set of uncommitted log files 943 to storage unit 725 of the storage node. In particular, changed file blocks 945 of the Exchange database file are transmitted 950 from the snapshot volume on the Exchange server to the storage unit. The changed file blocks may be organized into a saveset (identified in FIG. 9 as an "EDB saveset") and are stored in a third container 953 created on the backup storage node or unit. In a specific embodiment, the third container is a virtual hard disk file or, more specifically, a differencing virtual hard disk file.

The new set of uncommitted log files 943 is transmitted 960 from the snapshot volume on the Exchange server to the storage unit. The new set of uncommitted log files may be organized into another saveset (identified in FIG. 9 as "Logs saveset") and stored in a fourth container 965 created on the backup storage node or unit, separate from the third container. The fourth container stores a full backup of the new set of uncommitted log files. In a specific embodiment, the fourth container is a virtual hard disk file.

In other words, during an incremental backup of a database file, the storage node receives a set of changed file blocks and a set of uncommitted log files. The changed file blocks correspond to changes to the database file since a last backup of the database file. The set of uncommitted log files stores data not yet committed to the database file as of the incremental backup.

As shown in the example of FIG. 5, a full backup of the database may be synthesized on the backup storage node. In a specific embodiment, the backup storage node uses a process referred to as Data Domain (DD) virtual synthetics to create a copy of the Exchange database (EDB) saveset from a previous backup. The process copies changed file blocks of the EDB file on the snapshot to replace the associated virtual hard disk file blocks in the virtual hard disk file of a new EDB saveset. The backup system, however, creates a new full (level 0) container for the logs saveset since the logs change with each backup.
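The block replacement step can be sketched in Python as follows. This is only a simplified in-memory model, assuming each saveset is represented as a mapping from block index to block contents; the function name is hypothetical, and the sketch does not model how Data Domain virtual synthetics operates on the storage node.

```python
def synthesize_full(previous_full: dict, changed_blocks: dict) -> dict:
    """Build a new full image from a previous full image plus changed blocks.

    Both arguments map a block index to that block's bytes (an assumed,
    simplified representation of a saveset). The previous image is left
    untouched so that it remains available as an earlier restore point.
    """
    new_full = dict(previous_full)   # start from a copy of the previous backup
    new_full.update(changed_blocks)  # replace only the blocks that changed
    return new_full


previous = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}
changed = {1: b"bbbb"}
print(synthesize_full(previous, changed))
```

Only the changed blocks travel from the client, yet the result on the storage side is a complete image that can be restored without walking a chain of incrementals.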

FIG. 10 shows an overall flow 1005 for restoring a database. In a step 1010, the containers storing the database file and uncommitted log files associated with the database are mounted. In a step 1015, the backup agent reads the database file and log files from their respective containers and copies the files to a local or remote server where the Exchange server is running. In a step 1020, the database is attached to the Exchange application. In a step 1025, the uncommitted log files are replayed against the restored database. In a specific embodiment, copying is not needed for a Granular Level Restore (GLR). A Granular Level Restore refers to restoring select user emails, folders, or both from the backup of an Exchange database. A user does not need to restore the entire Exchange database. Rather, the Exchange database backup can be mounted and there can be an application including other third party applications that can be used to browse and restore user mail boxes, emails, and so forth.
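The replay in step 1025 can be illustrated with a toy Python model. The record layout (a sequence number plus a key and value) and the dictionary standing in for the database are assumptions for illustration; actual Exchange transaction logs and their replay are considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log record: one write that is not yet in the database."""
    sequence: int
    key: str
    value: str


def replay_logs(database: dict, logs: list) -> dict:
    """Apply uncommitted log records, in sequence order, to a restored copy."""
    restored = dict(database)  # work on a copy of the restored database
    for record in sorted(logs, key=lambda r: r.sequence):
        restored[record.key] = record.value
    return restored


restored_db = {"mailbox/alice": "v1"}
uncommitted = [
    LogRecord(2, "mailbox/alice", "v3"),
    LogRecord(1, "mailbox/alice", "v2"),
    LogRecord(3, "mailbox/bob", "v1"),
]
print(replay_logs(restored_db, uncommitted))
```

Records are applied in sequence order regardless of the order in which they are read, which is why the later write to the same key wins in this sketch.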

It should be appreciated that while some embodiments are shown and described in conjunction with backing up a Microsoft Exchange database and log files, aspects and principles of the system can be applicable to backing up other types of data including other types of databases.

In a specific embodiment, a backup method includes identifying a changed block tracking (CBT) driver that tracks the changes for files, initializing the CBT for the Exchange databases, taking a snapshot of an Exchange database, performing a first full backup of the Exchange database (e.g., *.edb and logs), selecting a backup format that supports the chaining, creating separate backup containers for the edb and log files, and saving the edb and log files in their respective containers. The selected backup format may be a VHDx formatted file.

In another specific embodiment, a backup method for a next or subsequent incremental backup includes taking a snapshot, retrieving the changed file blocks from the driver (CBT) for the edb file, reading the changed file blocks from the snapshot, creating a container to hold the data on the target, associating the child container with the parent through incremental chaining, populating the container to hold the changed file blocks, and saving the uncommitted logs in another container.
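Incremental chaining between a child container and its parent can be sketched as below. The `Container` class and its fields are hypothetical and model only the lookup behavior of a differencing chain; they do not reproduce the VHDx on-disk format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Container:
    """Hypothetical backup container holding blocks keyed by block index."""
    name: str
    parent: Optional["Container"] = None
    blocks: dict = field(default_factory=dict)

    def read_block(self, index: int) -> bytes:
        """Return a block from this container, falling back to its ancestors."""
        node = self
        while node is not None:
            if index in node.blocks:
                return node.blocks[index]
            node = node.parent
        raise KeyError(f"block {index} not found in chain ending at {self.name}")


# The full backup is the parent; the incremental holds only changed blocks.
full = Container("full", blocks={0: b"AAAA", 1: b"BBBB"})
incremental = Container("incr-1", parent=full, blocks={1: b"bbbb"})

print(incremental.read_block(0))  # unchanged block, served by the parent
print(incremental.read_block(1))  # changed block, served by the child
```

The child stores only what changed, while reads through the child still present a complete, current view of the file; the parent continues to represent the earlier point in time.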

In a specific embodiment, a method for an incremental backup of a database stored on a volume of a backup client includes issuing a request to a copy service requesting a snapshot of the volume having the database, and identifying a writer of the database that should participate in creating the snapshot; receiving an indication from the copy service that the snapshot of the volume has been created on the backup client; identifying on the snapshot changed file blocks associated with the database, the changed file blocks including changes to the database since a last backup of the database; identifying on the snapshot log files associated with the database, the log files comprising data not yet committed to the database on the snapshot; and transmitting for the incremental backup of the database the changed file blocks and the log files to a backup storage unit.

The changes to the database may be stored in a first container on the backup storage unit, and the log files may be stored in a second container on the backup storage unit, separate from the first container. A first container on the backup storage unit may include a virtual hard disk file that stores a full backup of the database, and a second container on the backup storage unit may include a virtual hard disk differencing file that stores the changes to the database. A synthetic full backup of the database is created on the backup storage unit based on the virtual hard disk file and the virtual hard disk differencing file. The virtual hard disk file and the virtual hard disk differencing file may be maintained on the backup storage unit as separate files.

The method may further include after the receiving an indication that the snapshot has been created, requesting a database management utility to identify from the database on the snapshot a log range corresponding to uncommitted log files; and receiving the log range in response to the request, where the uncommitted log files are the log files to be included with the incremental backup of the database.

The transmitting for the incremental backup may include excluding from the transmission other log files associated with the database, where the other log files include data committed to the database on the snapshot. The database may include a Microsoft Exchange Database.
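Selecting the log files by log range, and excluding the rest, can be sketched as follows. The sketch assumes the log range arrives as a pair of inclusive generation numbers and that each log file on the snapshot is keyed by its generation number; the function name and the file names are hypothetical.

```python
def split_logs(log_files: dict, first_uncommitted: int, last_uncommitted: int):
    """Partition log files into those to transmit and those to exclude.

    `log_files` maps an assumed log generation number to a file name. Files
    whose generation falls inside the inclusive range are treated as
    uncommitted and included; all others are treated as already committed.
    """
    include, exclude = [], []
    for generation in sorted(log_files):
        name = log_files[generation]
        if first_uncommitted <= generation <= last_uncommitted:
            include.append(name)
        else:
            exclude.append(name)
    return include, exclude


logs_on_snapshot = {
    1: "log_0001.log",
    2: "log_0002.log",
    3: "log_0003.log",
    4: "log_0004.log",
}
to_send, skipped = split_logs(logs_on_snapshot, first_uncommitted=3, last_uncommitted=4)
print(to_send)
print(skipped)
```

Committed logs are skipped because their contents are already reflected in the database blocks being backed up, so sending them again would add transfer and storage cost without adding recoverable data.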

In another specific embodiment, there is a system for an incremental backup of a database stored on a volume of a backup client, the system including a processor-based system executed on a computer system and configured to: issue a request to a copy service requesting a snapshot of the volume having the database, and identify a writer of the database that should participate in creating the snapshot; receive an indication from the copy service that the snapshot of the volume has been created on the backup client; identify on the snapshot changed file blocks associated with the database, the changed file blocks including changes to the database since a last backup of the database; identify on the snapshot log files associated with the database, the log files comprising data not yet committed to the database on the snapshot; and transmit for the incremental backup of the database the changed file blocks and the log files to a backup storage unit.

In another specific embodiment, there is a computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein, the computer-readable program code adapted to be executed by one or more processors to implement a method including issuing a request to a copy service requesting a snapshot of the volume having the database, and identifying a writer of the database that should participate in creating the snapshot; receiving an indication from the copy service that the snapshot of the volume has been created on the backup client; identifying on the snapshot changed file blocks associated with the database, the changed file blocks including changes to the database since a last backup of the database; identifying on the snapshot log files associated with the database, the log files comprising data not yet committed to the database on the snapshot; and transmitting for the incremental backup of the database the changed file blocks and the log files to a backup storage unit.

In the description above and throughout, numerous specific details are set forth in order to provide a thorough understanding of an embodiment of this disclosure. It will be evident, however, to one of ordinary skill in the art, that an embodiment may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form to facilitate explanation. The description of the preferred embodiments is not intended to limit the scope of the claims appended hereto. Further, in the methods disclosed herein, various steps are disclosed illustrating some of the functions of an embodiment. These steps are merely examples, and are not meant to be limiting in any way. Other steps and functions may be contemplated without departing from this disclosure or the scope of an embodiment. Other embodiments include systems and non-volatile media products that execute, embody or store processes that implement the methods described above.

* * * * *