Method, Electronic Device, And Computer Program Product For Selecting Backup Destination

Jia; Zhen ;   et al.

Patent Application Summary

U.S. patent application number 17/072702 was filed with the patent office on 2020-10-16 and published on 2022-03-17 for method, electronic device, and computer program product for selecting backup destination. The applicant listed for this patent is EMC IP Holding Company LLC. Invention is credited to Zhen Jia, Qi Wang, Ren Wang, Jing Yu, Yun Zhang.

Application Number: 17/072702
Publication Number: 20220083431
Family ID: 1000006178496
Filing Date: 2020-10-16
Publication Date: 2022-03-17

United States Patent Application 20220083431
Kind Code A1
Jia; Zhen ;   et al. March 17, 2022

METHOD, ELECTRONIC DEVICE, AND COMPUTER PROGRAM PRODUCT FOR SELECTING BACKUP DESTINATION

Abstract

Implementations of the present disclosure provide a method, an electronic device, and a computer program product for selecting a backup destination. One method includes: receiving device information about storage devices in a storage device set, wherein a backup task is executed in the storage device set; receiving backup information about the backup task; acquiring a destination association relationship, wherein the destination association relationship describes an association relationship between a reference backup task in a reference storage device set and a reference backup destination of the reference backup task, the reference backup destination including a group of storage devices in a reference storage system; and selecting a backup destination for the backup task from the storage device set according to the destination association relationship and based on the device information and the backup information, the backup destination including a group of storage devices in the storage device set.


Inventors: Jia; Zhen; (Shanghai, CN) ; Wang; Qi; (Shanghai, CN) ; Zhang; Yun; (Shanghai, CN) ; Wang; Ren; (Shanghai, CN) ; Yu; Jing; (Shanghai, CN)
Applicant:
Name: EMC IP Holding Company LLC
City: Hopkinton
State: MA
Country: US
Family ID: 1000006178496
Appl. No.: 17/072702
Filed: October 16, 2020

Current U.S. Class: 1/1
Current CPC Class: G06F 2201/805 20130101; G06F 11/1461 20130101
International Class: G06F 11/14 20060101 G06F011/14

Foreign Application Data

Date Code Application Number
Sep 16, 2020 CN 202010972953.X

Claims



1. A method including: receiving device information about storage devices in a storage device set, wherein a backup task is executed in the storage device set; receiving backup information about the backup task; utilizing a machine learning system to determine at least one network model characterizing a destination association relationship, wherein the destination association relationship describes an association relationship between a reference backup task in a reference storage device set and a reference backup destination of the reference backup task, the reference backup destination including a group of storage devices in a reference storage system; selecting a backup destination for the backup task from the storage device set according to the destination association relationship and based on the device information and the backup information, the backup destination including a group of storage devices in the storage device set; and executing the backup task utilizing the selected backup destination.

2. The method according to claim 1, wherein receiving the device information and the backup information further includes: receiving the device information and the backup information that are within a preset time period.

3. The method according to claim 1, wherein the device information includes, for each of one or more of the storage devices in the storage device set, at least any one of the following: a position of the storage device; an available storage space of the storage device; a network bandwidth of the storage device; a CPU usage rate of the storage device; a memory usage rate of the storage device; and an exhaustion time of the storage device.

4. The method according to claim 1, wherein the backup information includes at least any one of the following: the number of backup copies specified by the backup task; a size of source data specified by the backup task; and a repetition rate of the source data.

5. The method according to claim 1, wherein utilizing the machine learning system to determine at least one network model characterizing the destination association relationship includes: determining reference backup information about the reference backup task executed in the reference storage device set; determining reference device information about each reference storage device in the reference storage device set; and training the destination association relationship based on the reference backup information, the reference device information, and the reference backup destination of the reference backup task.

6. The method according to claim 5, wherein the destination association relationship includes: a first network model based on a convolutional neural network, wherein the first network model is used to map the reference backup information and the reference device information to an internal feature vector; and a second network model based on a long short-term memory network, wherein the second network model is used to map the internal feature vector to the reference backup destination of the reference backup task.

7. The method according to claim 1, wherein determining the backup destination includes: mapping the backup information and the device information to an internal feature vector based on a first network model; and mapping the internal feature vector to the backup destination based on a second network model.

8. The method according to claim 1, wherein determining the backup destination further includes verifying the backup destination in response to the backup destination satisfying the following conditions: a distance between any two storage devices in the group of storage devices that are in the storage device set and included in the backup destination is greater than a threshold distance; an available resource amount of any storage device in the group of storage devices that are in the storage device set and included in the backup destination is greater than a threshold resource amount; and a global balance degree associated with the backup destination is higher than a threshold balance degree, wherein the global balance degree indicates a usage balance degree of the storage device set in a situation where a storage device in the backup destination is used for the backup task.

9. The method according to claim 1, wherein the storage device set and the reference storage device set satisfy at least any one of the following: having same or similar numbers of storage devices; and having same or similar device models.

10. The method according to claim 1, wherein a number of copies specified by the backup task is not higher than a number of copies specified by the reference backup task.

11. An electronic device, including: at least one processor; and at least one memory storing computer program instructions, wherein the at least one memory and the computer program instructions are configured to cause, together with the at least one processor, the electronic device to perform actions, the actions including: receiving device information about storage devices in a storage device set, wherein a backup task is executed in the storage device set; receiving backup information about the backup task; utilizing a machine learning system to determine at least one network model characterizing a destination association relationship, wherein the destination association relationship describes an association relationship between a reference backup task in a reference storage device set and a reference backup destination of the reference backup task, the reference backup destination including a group of storage devices in a reference storage system; selecting a backup destination for the backup task from the storage device set according to the destination association relationship and based on the device information and the backup information, the backup destination including a group of storage devices in the storage device set; and executing the backup task utilizing the selected backup destination.

12. The device according to claim 11, wherein receiving the device information and the backup information further includes: receiving the device information and the backup information that are within a preset time period.

13. The device according to claim 11, wherein the device information includes, for each of one or more of the storage devices in the storage device set, at least any one of the following: a position of the storage device; an available storage space of the storage device; a network bandwidth of the storage device; a CPU usage rate of the storage device; a memory usage rate of the storage device; and an exhaustion time of the storage device.

14. The device according to claim 11, wherein the backup information includes at least any one of the following: the number of backup copies specified by the backup task; a size of source data specified by the backup task; and a repetition rate of the source data.

15. The device according to claim 11, wherein utilizing the machine learning system to determine at least one network model characterizing the destination association relationship includes: determining reference backup information about the reference backup task executed in the reference storage device set; determining reference device information about each reference storage device in the reference storage device set; and training the destination association relationship based on the reference backup information, the reference device information, and the reference backup destination of the reference backup task.

16. The device according to claim 15, wherein the destination association relationship includes: a first network model based on a convolutional neural network, wherein the first network model is used to map the reference backup information and the reference device information to an internal feature vector; and a second network model based on a long short-term memory network, wherein the second network model is used to map the internal feature vector to the reference backup destination of the reference backup task.

17. The device according to claim 11, wherein determining the backup destination includes: mapping the backup information and the device information to an internal feature vector based on a first network model; and mapping the internal feature vector to the backup destination based on a second network model.

18. The device according to claim 11, wherein determining the backup destination further includes verifying the backup destination in response to the backup destination satisfying the following conditions: a distance between any two storage devices in the group of storage devices that are in the storage device set and included in the backup destination is greater than a threshold distance; an available resource amount of any storage device in the group of storage devices that are in the storage device set and included in the backup destination is greater than a threshold resource amount; and a global balance degree associated with the backup destination is higher than a threshold balance degree, wherein the global balance degree indicates a usage balance degree of the storage device set in a situation where a storage device in the backup destination is used for the backup task.

19. The device according to claim 11, wherein the storage device set and the reference storage device set satisfy at least any one of the following: having same or similar numbers of storage devices; and having same or similar device models; and further wherein a number of copies specified by the backup task is not higher than a number of copies specified by the reference backup task.

20. A computer program product tangibly stored on a non-volatile computer-readable medium and including machine-executable instructions, wherein the machine-executable instructions, when executed, cause a machine to perform steps of a method, the method including: receiving device information about storage devices in a storage device set, wherein a backup task is executed in the storage device set; receiving backup information about the backup task; utilizing a machine learning system to determine at least one network model characterizing a destination association relationship, wherein the destination association relationship describes an association relationship between a reference backup task in a reference storage device set and a reference backup destination of the reference backup task, the reference backup destination including a group of storage devices in a reference storage system; selecting a backup destination for the backup task from the storage device set according to the destination association relationship and based on the device information and the backup information, the backup destination including a group of storage devices in the storage device set; and executing the backup task utilizing the selected backup destination.
Description



RELATED APPLICATION(S)

[0001] The present application claims priority to Chinese Patent Application No. 202010972953.X, filed Sep. 16, 2020, and entitled "Method, Electronic Device, and Computer Program Product for Selecting Backup Destination," which is incorporated by reference herein in its entirety.

FIELD

[0002] The implementations of the present disclosure generally relate to storage systems, and more particularly to a method, an electronic device, and a computer program product for selecting a storage device as a backup destination.

BACKGROUND

[0003] Many companies or enterprises generate large amounts of data every day. For security of data, data protection becomes more and more important. In this regard, a backup storage system can provide data protection so as to copy data to be backed up to one or more storage devices, thereby obtaining one or more data copies stored in different storage devices.

[0004] At present, it has been proposed to select a storage device subset that can be used as backup destinations based on states of multiple optional storage devices in a storage device set. For example, a score can be set for the state of each storage device in the storage device set, and various combination modes (for example, the modes based on permutation and combination) for generating a storage device subset can be determined. However, when there is a large number (for example, dozens or more) of storage devices, there will be tens or even hundreds of thousands of combination modes based on the number of backup copies. At this moment, when a backup destination is selected, a huge amount of computation will be involved, and therefore, it is impossible to provide users with recommendations for backup destinations in an effective manner.

SUMMARY

[0005] Implementations of the present disclosure provide a technical solution for determining, in a storage device set, a storage device subset for data backup, and specifically provide a method, an electronic device, and a computer program product for storage management.

[0006] In a first aspect of the present disclosure, a method for selecting a backup destination for a backup task is provided. This method includes: receiving device information about storage devices in a storage device set, wherein the backup task is executed in the storage device set; receiving backup information about the backup task; acquiring a destination association relationship, wherein the destination association relationship describes an association relationship between a reference backup task in a reference storage device set and a reference backup destination of the reference backup task, the reference backup destination including a group of storage devices in a reference storage system; and selecting a backup destination for the backup task from the storage device set according to the destination association relationship and based on the device information and the backup information, the backup destination including a group of storage devices in the storage device set.

[0007] In a second aspect of the present disclosure, an electronic device is provided, including: at least one processor; and at least one memory storing computer program instructions, wherein the at least one memory and the computer program instructions are configured to cause, together with the at least one processor, the electronic device to perform an action for selecting a backup destination for a backup task. The action includes: receiving device information about storage devices in a storage device set, wherein the backup task is executed in the storage device set; receiving backup information about the backup task; acquiring a destination association relationship, wherein the destination association relationship describes an association relationship between a reference backup task in a reference storage device set and a reference backup destination of the reference backup task, the reference backup destination including a group of storage devices in a reference storage system; and selecting a backup destination for the backup task from the storage device set according to the destination association relationship and based on the device information and the backup information, the backup destination including a group of storage devices in the storage device set.

[0008] In a third aspect of the present disclosure, a computer program product is provided. The computer program product is tangibly stored on a non-volatile computer-readable medium and includes machine-executable instructions. The machine-executable instructions, when executed, cause a machine to execute steps of the method according to the first aspect.

[0009] It should be understood that the content described in this Summary is neither intended to limit key or essential features of the implementations of the present disclosure nor intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understandable through the following description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] The above and other objectives, features, and advantages of the implementations of the present disclosure will become readily understandable by reading the following detailed description with reference to the accompanying drawings. In the accompanying drawings, several implementations of the present disclosure are shown by way of example and not limitation.

[0011] FIG. 1 schematically shows a block diagram of an example application environment in which example implementations of the present disclosure can be implemented;

[0012] FIG. 2 schematically shows a block diagram of a process for selecting a backup destination according to an example implementation of the present disclosure;

[0013] FIG. 3 schematically shows a flowchart of a method for selecting a backup destination according to an example implementation of the present disclosure;

[0014] FIG. 4A schematically shows a block diagram of a data structure of device information according to an example implementation of the present disclosure;

[0015] FIG. 4B schematically shows a block diagram of a data structure of backup information according to an example implementation of the present disclosure;

[0016] FIG. 4C schematically shows a block diagram of a data structure of a feature vector according to an example implementation of the present disclosure;

[0017] FIG. 5 schematically shows a block diagram of acquiring a destination association relationship based on a feature vector according to an example implementation of the present disclosure;

[0018] FIG. 6A is a graphical plot related to changes in available storage space according to an example implementation of the present disclosure;

[0019] FIG. 6B is a graphical plot related to changes in network bandwidth according to an example implementation of the present disclosure;

[0020] FIG. 7 schematically shows a block diagram of acquiring a destination association relationship based on a feature vector according to an example implementation of the present disclosure; and

[0021] FIG. 8 schematically shows a block diagram of a device that can be used to implement the example implementations of the present disclosure.

[0022] Throughout all the accompanying drawings, the same or similar reference numerals are used to indicate the same or similar components.

DETAILED DESCRIPTION

[0023] The principles and spirit of the present disclosure will be described below with reference to several example implementations shown in the accompanying drawings. It should be understood that these implementations are described only for enabling a person skilled in the art to better understand and then implement the present disclosure, instead of limiting the scope of the present disclosure in any way. In the description and claims herein, unless otherwise defined, all technical and scientific terms used herein have meanings that are commonly understood by those of ordinary skill in the art to which the present disclosure belongs.

[0024] At present, the concept of a distributed storage system has been proposed, and a distributed storage system may include hundreds or even more storage devices. For example, these storage devices can be distributed all over the world. First, an application environment for example implementations of the present disclosure will be described with reference to FIG. 1. FIG. 1 schematically shows block diagram 100 of an example application environment in which example implementations of the present disclosure can be implemented. The storage system shown in FIG. 1 may include storage device set 110, which may include N storage devices, such as those shown with reference numerals 110-1, 110-2, 110-3, 110-4, 110-5, 110-6, 110-7, . . . , and 110-N.

[0025] Backup task 120 may specify the number of backup copies. For example, it may specify that 3 backup copies are required. The number of copies can be input by the user of the storage system, and the user can specify the number of copies in a service level agreement (SLA). For the convenience of description, hereinafter, the number of copies of 3 will be taken as an example for description. It should be understood that the implementations of the present disclosure can be applied to any other number of copies.

[0026] At this moment, 3 storage devices need to be selected from the N storage devices as the backup destinations. There may be multiple candidate subsets 130, and each candidate subset includes 3 storage devices. For example, candidate subset 130-1 may include storage devices 110-1, 110-2, and 110-3, . . . , and candidate subset 130-M may include storage devices 110-6, 110-7, and 110-N. Backup destination 140 may be selected from the multiple candidate subsets 130 so as to be used as the backup destination of backup task 120.

[0027] At present, it has been proposed to select a group of storage devices that can be used as backup destinations based on the states of multiple optional storage devices in the storage device set. For example, a score can be set for the state of each storage device in the storage device set, and the scores of various combination modes (for example, the modes based on permutation and combination) of generating a candidate subset of storage devices can be determined. However, when there is a large number (for example, dozens or more) of storage devices, there will be tens or even hundreds of thousands of combination modes based on the number of backup copies. Assuming that there are N storage devices and X copies are expected to be stored, there can be $C_N^X$ combination modes. For example, if N=1000 and X=3, the number of combination modes is

$$C_{1000}^{3} = \frac{1000 \times 999 \times 998}{3 \times 2 \times 1} = 166167000.$$

At this moment, when selecting a backup destination from a large number of combination modes, a huge amount of computation will be involved.
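
The count above can be checked directly. The following is a minimal Python sketch that uses the standard-library `math.comb` to compute the number of candidate subsets for N storage devices and X copies. The helper name `count_candidate_subsets` is illustrative only and does not appear in the disclosure.

```python
import math
from itertools import combinations


def count_candidate_subsets(num_devices: int, num_copies: int) -> int:
    """Number of ways to choose num_copies devices out of num_devices."""
    return math.comb(num_devices, num_copies)


# N = 1000 devices and X = 3 copies, as in the example above.
total = count_candidate_subsets(1000, 3)

# For a small set, exhaustive enumeration agrees with the closed form.
small = list(combinations(range(6), 3))
```

Enumerating and scoring every subset, as `combinations` does for the six-device case, is what becomes impractical at N = 1000.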

[0028] In view of the foregoing problems and other potential problems in the conventional solutions, the implementations of the present disclosure provide a technical solution for selecting a backup destination for a backup task from a storage device set. In the implementations of the present disclosure, a destination association relationship can be created based on the operation history of the current storage system or other similar storage systems. Then, when a backup task needs to be performed in the storage system, the current information about the storage system can be collected, and this current information can be input into the destination association relationship, so as to obtain a storage device that can be used as a backup destination.

[0029] Hereinafter, an overview of an example implementation 200 according to the present disclosure will be described with reference to FIG. 2. As shown in FIG. 2, device information 210 about storage devices in storage device set 110 is received, and backup information 220 about backup task 120 is received. Backup destination 140 is selected for backup task 120 from storage device set 110 according to destination association relationship 230 and based on device information 210 and backup information 220. According to an example implementation of the present disclosure, destination association relationship 230 here may be obtained based on historical operation state information about the storage system, or may be obtained based on historical operation state information about other storage systems similar to the current storage system.

[0030] With the example implementation 200 of the present disclosure, destination association relationship 230 can be established directly based on historical experience that has been verified as valid. In this way, it is not necessary to determine the score for each combination mode one by one, but instead, the current device information 210 and backup information 220 are directly input into destination association relationship 230 to obtain one or more backup destinations. In this way, the computing resource and time overheads in the process of selecting the backup destination can be greatly reduced, and the backup destination can be determined in a faster and more effective manner.

[0031] Hereinafter, more details of an example implementation according to the present disclosure will be described with reference to FIG. 3. FIG. 3 schematically shows a flowchart of method 300 for selecting a backup destination according to an example implementation of the present disclosure. At block 310, device information 210 about storage devices in storage device set 110 is received. Here, backup task 120 is executed in storage device set 110. It will be understood that storage device set 110 includes a large number of storage devices, and device information 210 at this moment may include information about each storage device in storage device set 110.

[0032] Device information 210 may include various aspects of contents. Hereinafter, more details about device information 210 will be described with reference to FIG. 4A. FIG. 4A schematically shows block diagram 400A of a data structure of device information 210 according to an example implementation of the present disclosure. As shown in FIG. 4A, device information 210 may include at least any one of the following: position 410 of the storage device, available storage space 412 of the storage device, network bandwidth 414 of the storage device, CPU usage rate 416 of the storage device, memory usage rate 418 of the storage device, exhaustion time 420 of the storage device, and so on.

[0033] In the context of the present disclosure, position 410 may be represented by the longitude and latitude or other coordinate information of the place where the storage device is located. Available storage space 412 represents the remaining storage space in the storage device. For example, it may be expressed as a size in gigabytes (GB). Alternatively and/or additionally, it may be expressed as the percentage of storage space that is available. Network bandwidth 414 of the storage device refers to the available bandwidth of the storage device, and this bandwidth may vary with the data transmission load of the storage device. CPU usage rate 416 and memory usage rate 418 of the storage device respectively represent the absolute value or percentage of the CPU and memory resources in the storage device that are in use. Exhaustion time 420 refers to the time remaining until available storage space 412 in the storage device is exhausted. Exhaustion time 420 may be determined based on the speed of data transmission to the storage device and available storage space 412.
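
As one possible illustration of the fields of FIG. 4A, the following Python sketch groups them into a record. The field names, the units, and the constant-write-rate formula used for the exhaustion time are assumptions made for this example; the disclosure states only that exhaustion time 420 may be determined from the data transmission speed and available storage space 412.

```python
from dataclasses import dataclass


@dataclass
class DeviceInfo:
    """Per-device fields following FIG. 4A; names and units are illustrative."""
    latitude: float            # position 410
    longitude: float           # position 410
    available_space_gb: float  # available storage space 412
    bandwidth_mb_per_s: float  # network bandwidth 414 (available bandwidth)
    cpu_usage: float           # CPU usage rate 416, as a fraction in [0, 1]
    memory_usage: float        # memory usage rate 418, as a fraction in [0, 1]

    def exhaustion_time_days(self, write_rate_gb_per_day: float) -> float:
        """Exhaustion time 420: days until the available space runs out.

        Assumes a constant write rate (an assumption of this sketch);
        returns infinity when no data is being written.
        """
        if write_rate_gb_per_day <= 0:
            return float("inf")
        return self.available_space_gb / write_rate_gb_per_day


device = DeviceInfo(31.23, 121.47, 2000.0, 800.0, 0.35, 0.60)
days_left = device.exhaustion_time_days(write_rate_gb_per_day=50.0)
```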

[0034] It will be understood that FIG. 4A only schematically shows an example of information related to the storage device. According to an example implementation of the present disclosure, device information 210 may include other information about the storage device, for example, the type of storage medium of the storage device (for example, a solid-state storage device or a conventional hard disk device). With the example implementation of the present disclosure, the device information about each storage device in storage device set 110 can be collected. In this way, various aspects of information about each of the storage devices can be fully considered in order to select a suitable backup destination.

[0035] Returning to FIG. 3, at block 320 of FIG. 3, backup information 220 about backup task 120 is received. More information about backup information 220 will be described with reference to FIG. 4B. FIG. 4B schematically shows block diagram 400B of a data structure of backup information 220 according to an example implementation of the present disclosure. As shown in FIG. 4B, backup information 220 may include the size of source data 430, indicating the size of the source data to be backed up as specified by backup task 120. It will be understood that the source data will generally become larger and larger, so backup information 220 may include source data growth rate 432, which is used to indicate the percentage (for example, 5%) or absolute value of the daily growth of the source data. Further, the source data will usually not be completely changed, but instead, the source data of two consecutive backups will have a certain degree of duplication. Therefore, backup information 220 may include source data repetition rate 434, which is used to indicate the repetition rate (for example, 50%) of the source data of two consecutive backups.

[0036] It will be understood that FIG. 4B only schematically shows an example of backup information 220 related to backup task 120. According to an example implementation of the present disclosure, backup information 220 may include other information about the backup task, for example, the transmission time taken to back up the source data to a certain storage device, and so on. The transmission time can be determined based on the available bandwidth of the storage device and the size of source data 430. With the example implementation of the present disclosure, backup information 220 about backup task 120 to be executed can be collected. Furthermore, information about various aspects of backup task 120 can be fully considered, so as to select a backup destination suitable for backup task 120 from storage device set 110.
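
The backup information of FIG. 4B and the transmission-time estimate mentioned above might be sketched as follows. The record layout, the units, and the simple size-divided-by-bandwidth formula are assumptions of this example rather than details given in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class BackupInfo:
    """Backup-task fields following FIG. 4B; names and units are illustrative."""
    source_size_gb: float     # size of source data 430
    daily_growth_rate: float  # source data growth rate 432, e.g. 0.05 for 5%
    repetition_rate: float    # source data repetition rate 434, e.g. 0.5 for 50%

    def transmission_time_s(self, bandwidth_mb_per_s: float) -> float:
        """Estimated seconds to send the full source data to one device.

        Uses 1 GB = 1000 MB and assumes the bandwidth stays constant.
        """
        if bandwidth_mb_per_s <= 0:
            raise ValueError("bandwidth must be positive")
        return self.source_size_gb * 1000.0 / bandwidth_mb_per_s


info = BackupInfo(source_size_gb=100.0, daily_growth_rate=0.05, repetition_rate=0.5)
seconds = info.transmission_time_s(bandwidth_mb_per_s=200.0)
```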

[0037] The specific contents of device information 210 and backup information 220 have been described above with reference to FIGS. 4A and 4B. Further, a feature vector representing an overall state associated with the execution of backup task 120 in the storage system may be generated based on device information 210 and backup information 220. Hereinafter, more details about the feature vector will be described with reference to FIG. 4C. FIG. 4C schematically shows block diagram 400C of a data structure of feature vector 440 according to an example implementation of the present disclosure.

[0038] As shown in FIG. 4C, feature vector 440 may include device information about each storage device: device information 512 about a first storage device, device information 514 about a second storage device, . . . , and device information 516 about an Nth storage device. Further, this feature vector 440 may include backup information 220 about backup task 120. According to an example implementation of the present disclosure, a multidimensional vector can be used to represent feature vector 440.
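One way to lay out such a multidimensional vector is to concatenate the per-device fields for devices 1 to N and then append the backup fields. The sketch below assumes a flat list of floats and an illustrative choice of fields; the function name `build_feature_vector` and the field order are hypothetical.

```python
from typing import List, Sequence


def build_feature_vector(device_infos: Sequence[Sequence[float]],
                         backup_info: Sequence[float]) -> List[float]:
    """Concatenate device information for devices 1..N, then the backup information."""
    if not device_infos:
        raise ValueError("at least one storage device is required")
    width = len(device_infos[0])
    vector: List[float] = []
    for fields in device_infos:
        if len(fields) != width:
            raise ValueError("every device must supply the same number of fields")
        vector.extend(fields)
    vector.extend(backup_info)
    return vector


# Assumed per-device fields: available space (GB), bandwidth (MB/s), CPU usage, memory usage.
devices = [
    [500.0, 100.0, 0.20, 0.35],
    [250.0, 80.0, 0.60, 0.50],
]
# Assumed backup fields: source size (GB), daily growth rate, repetition rate.
backup = [200.0, 0.05, 0.5]
feature_vector = build_feature_vector(devices, backup)
```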

[0039] How to acquire destination association relationship 230 will be described by returning to FIG. 3. At block 330, destination association relationship 230 is acquired. Destination association relationship 230 here may be a network model obtained based on machine learning technology, wherein this network model may describe an association relationship between a reference backup task in a reference storage device set and a reference backup destination of the reference backup task, the reference backup destination including a group of storage devices in the reference storage system. At block 340, a backup destination is selected for the backup task from the storage device set according to the destination association relationship and based on the device information and the backup information, the backup destination including a group of storage devices in the storage device set.

[0040] According to an example implementation of the present disclosure, the reference storage device set may be the storage device set in the current storage system. For example, destination association relationship 230 may be obtained based on the operation history data of the current storage system. Assume that the storage device set includes 1000 storage devices, and 500 backup tasks have been performed during the operation of the storage system. At this moment, training samples can be generated using feature vectors and historical backup destinations related to the execution of the 500 historical backup tasks, so as to obtain the association relationship between the backup destinations and the backup environment.

[0041] Specifically, feature vector 440 as shown in FIG. 4C can be generated for each historical backup task based on the manner described above, and the historical backup destination of each historical backup task can be acquired. It will be understood that there is no limitation on how to acquire the historical backup destination. According to an example implementation of the present disclosure, the historical backup destination can be selected based on manual operations of an administrator of the storage system. For another example, the historical backup destination can be selected based on a usage balance degree of each storage device.

[0042] Further, this destination association relationship 230 may be obtained based on feature vector 440 and the historical backup destination. According to an example implementation of the present disclosure, training operations can be performed based on various technologies currently known and/or to be developed in the future. According to an example implementation of the present disclosure, destination association relationship 230 can be obtained based on the convolutional neural network.

[0043] According to an example implementation of the present disclosure, reference backup information about a group of reference backup tasks in a reference storage device set can be determined, and reference device information about each reference storage device in the reference storage device set can be determined. According to the format shown in FIG. 4C above, a group of training samples can be generated based on the reference backup information about the group of reference backup tasks and the reference device information about each reference storage device. Further, destination association relationship 230 can be obtained based on the group of training samples and reference backup destinations of the group of reference backup tasks.

[0044] FIG. 5 schematically shows block diagram 500 of acquiring destination association relationship 230 based on a feature vector according to an example implementation of the present disclosure. As shown in FIG. 5, one training sample 510 may be generated for one historical backup task, and this training sample 510 may include reference feature vector 512 and reference backup destination 514. Specifically, the device information about each storage device and the backup information about the historical backup task may be received based on the method described above, so as to generate reference feature vector 512. Further, the backup destination of the historical backup task can be acquired to serve as reference backup destination 514. A similar operation can be performed for each historical backup task, so as to obtain a training sample corresponding to each historical backup task. With the example implementation of the present disclosure, based on past historical operations, a wealth of training samples can be obtained. The training samples at this moment will include the successful experience of selecting the backup destinations, which will help to select a suitable backup destination for a future backup operation.
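The pairing of a reference feature vector with a reference backup destination can be sketched as a simple record type. The names `TrainingSample` and `build_training_samples` are hypothetical, and representing a backup destination as a tuple of device indices is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class TrainingSample:
    reference_feature_vector: Tuple[float, ...]     # plays the role of 512
    reference_backup_destination: Tuple[int, ...]   # plays the role of 514 (device indices)


def build_training_samples(
    history: Iterable[Tuple[Sequence[float], Sequence[int]]]
) -> List[TrainingSample]:
    """Create one training sample per historical backup task."""
    return [TrainingSample(tuple(vector), tuple(destination))
            for vector, destination in history]


# Two made-up historical tasks: (feature vector, devices that received the copies).
history = [
    ([500.0, 100.0, 250.0, 80.0, 200.0, 0.05, 0.5], [0, 1]),
    ([480.0, 100.0, 240.0, 80.0, 210.0, 0.05, 0.5], [1, 0]),
]
samples = build_training_samples(history)
```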

[0045] According to an example implementation of the present disclosure, destination association relationship 230 can be obtained in an iterative manner using the training samples. For example, this destination association relationship 230 may be realized based on convolutional neural network 520. After the training phase is completed, when reference feature vector 512 is input to the trained destination association relationship 230, backup destination 530 that is output is consistent with reference backup destination 514 in training sample 510.

[0046] According to an example implementation of the present disclosure, the training samples in the training set can be used to obtain destination association relationship 230. Test samples in a test set can be used to test whether destination association relationship 230 can obtain a correct output result. Further, destination association relationship 230 can be adjusted so that this association relationship can better match the test set.
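A training set and a test set can be drawn from the same pool of samples. The sketch below assumes a random 80/20 split with a fixed seed; the split ratio, the seed, and the function name `split_samples` are choices made for this example and are not specified in the text.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the samples reproducibly and return (training set, test set)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# 500 placeholder samples, matching the count of historical backup tasks in the example above.
train_set, test_set = split_samples(range(500))
```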

[0047] The process of training destination association relationship 230 based on the historical data of the storage system itself has been described above. According to an example implementation of the present disclosure, destination association relationship 230 may also be obtained based on historical data of different storage systems. Assuming that there are two identical storage systems, historical data of one storage system can be used to obtain destination association relationship 230. Further, the obtained destination association relationship 230 may be used to select backup destinations in each of the two storage systems.

[0048] According to an example implementation of the present disclosure, the reference storage system used to provide training samples does not have to be identical to the current storage system, but instead, this reference storage system may be similar to the current storage system. For example, the reference storage device set included in the reference storage system may have a similar number of storage devices as the current storage device set. It can be specified that the ratio of the numbers of storage devices in the two storage device sets should fall within a threshold range. For example, this threshold range can be represented as [1-Δ, 1+Δ], where Δ can be set to 0.005 and/or other values. The smaller the value of Δ, the more similar the numbers of storage devices in the two storage device sets. At this moment, destination association relationship 230 obtained based on the historical data of the reference storage system is more suitable for the current storage system.
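The similarity test on device counts reduces to a ratio check against the threshold range. In the sketch below the tolerance is named `delta`, and the function name `device_counts_similar` is hypothetical.

```python
def device_counts_similar(reference_count: int, current_count: int,
                          delta: float = 0.005) -> bool:
    """Return True when reference_count / current_count lies in [1 - delta, 1 + delta]."""
    if reference_count <= 0 or current_count <= 0:
        raise ValueError("device counts must be positive")
    ratio = reference_count / current_count
    return 1.0 - delta <= ratio <= 1.0 + delta


# With 1000 devices in the current set, 1004 reference devices are close enough; 1010 are not.
close_enough = device_counts_similar(1004, 1000)
too_different = device_counts_similar(1010, 1000)
```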

[0049] It will be understood that the selection of the backup destination largely depends on the configurations of the storage devices of the storage system. Therefore, the reference storage system and the current storage system should have the same or similar device configurations. For example, it can be specified that the capacity of the reference storage device in the reference storage system should be similar to the capacity of the storage device of the current storage system, and it can be specified that the type of hard disk of the reference storage device is the same as that of the storage device, and so on. In this way, it can be ensured that all aspects of the configuration of the reference storage devices in the reference storage system that are used as the training basis are similar to those of the current storage system, so that destination association relationship 230 can be more suitable for the current storage system.

[0050] It will be understood that although the use of destination association relationship 230 from the reference storage system may lead to low accuracies in some cases, this destination association relationship 230 can output a more preferred backup destination in most cases. With the example implementation of the present disclosure, it is not necessary to train destination association relationship 230 respectively for each storage system, and thus the reusability of destination association relationship 230 can be greatly improved, and the time and computing resource overheads of the training phase can be reduced.

[0051] It will be understood that the number of storage devices included in the backup destination depends on the number of copies specified by the backup task. According to an example implementation of the present disclosure, the reference backup task used as the training sample and the backup task of the current storage system should specify the same number of copies. Assuming that the backup task of the current storage system specifies that 3 copies are needed, a historical backup task specifying 3 copies can be selected to generate a training sample.

[0052] According to an example implementation of the present disclosure, the number of backup copies of the reference backup task used as the training sample may be greater than the number of copies of the backup task of the current storage system. Assuming that the backup task of the current storage system specifies that 3 copies are needed, and assuming that no backup task specifying 3 copies has been performed in the past, a historical backup task specifying 4 copies can be selected to generate a training sample. The backup destination generated at this moment will involve 4 storage devices, and 3 storage devices can be selected from the 4 storage devices to serve as the backup destinations. Although the backup destination obtained at this moment may not be optimal, compared to the existing technical solutions of determining the backup destination in a completely manual selection manner and/or for each combination mode, this technical solution can make full use of the existing experience to serve future backup tasks.
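The copy-count rule in the two preceding paragraphs can be sketched as a two-step lookup: prefer historical tasks with the same number of copies, and otherwise fall back to tasks with more copies and trim the resulting destination. Falling back to the smallest larger copy count and keeping the first devices of the output are assumptions of this sketch; the text does not say how the 3 devices are chosen from the 4.

```python
from typing import Dict, List, Sequence


def pick_reference_tasks(history: Sequence[Dict], copies_needed: int) -> List[Dict]:
    """Prefer tasks with the exact copy count; otherwise use the smallest larger count."""
    exact = [task for task in history if task["copies"] == copies_needed]
    if exact:
        return exact
    larger_counts = [task["copies"] for task in history if task["copies"] > copies_needed]
    if not larger_counts:
        return []
    nearest = min(larger_counts)
    return [task for task in history if task["copies"] == nearest]


def trim_destination(destination: Sequence[str], copies_needed: int) -> List[str]:
    """Keep the first `copies_needed` devices (an assumed rule, see the note above)."""
    return list(destination[:copies_needed])


history = [
    {"task": "t1", "copies": 2},
    {"task": "t2", "copies": 4},
    {"task": "t3", "copies": 5},
]
chosen = pick_reference_tasks(history, copies_needed=3)
# Stand-in for a 4-device output of a model trained on 4-copy samples.
predicted = ["d3", "d5", "d7", "d9"]
destination = trim_destination(predicted, copies_needed=3)
```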

[0053] The example of acquiring device information 210 and backup information 220 for a certain point in time and generating feature vector 440 has been described above. According to an example implementation of the present disclosure, device information 210 and backup information 220 within a certain preset time period can be received. At this moment, the obtained device information 210 and backup information 220 are both represented by time sequence data.

[0054] FIG. 6A shows a graphical plot 600A related to changes in available storage space according to an example implementation of the present disclosure. In FIG. 6A, the abscissa represents time and the ordinate represents available storage space. It can be specified to obtain changes in available storage space that are within 1 hour (or other length of time). At this moment, the available storage space can be represented by a time sequence as shown by curve 610A. FIG. 6B shows a graphical plot 600B related to changes in network bandwidth according to an example implementation of the present disclosure. In FIG. 6B, the abscissa represents time and the ordinate represents network bandwidth. It can be specified to acquire changes in network bandwidth that are within 1 hour (or other length of time). At this moment, the network bandwidth can be represented by a time sequence shown by curve 610B.
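Collecting a metric over a preset period amounts to keeping the measurements whose timestamps fall inside the window. The sketch below assumes measurements arrive as `(timestamp_seconds, value)` pairs; the function name `window` and the sample values are made up for illustration.

```python
from typing import List, Sequence, Tuple


def window(samples: Sequence[Tuple[float, float]], now: float,
           period: float = 3600.0) -> List[float]:
    """Return the values measured in [now - period, now], ordered by time."""
    recent = [(t, v) for t, v in samples if now - period <= t <= now]
    return [v for _, v in sorted(recent)]


# Available storage space (GB) sampled every 30 minutes.
space_samples = [(0.0, 900.0), (1800.0, 880.0), (3600.0, 850.0), (5400.0, 820.0)]
last_hour = window(space_samples, now=5400.0)
```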

[0055] Similarly, corresponding backup information 220 may be generated based on backup tasks that are within a preset time period. At this moment, both the device information and the backup information in feature vector 440 will be represented in the form of time sequence. According to an example implementation of the present disclosure, in order to analyze the association relationship related to time sequence data in a more accurate manner, a long short-term memory network may be introduced into destination association relationship 230. Hereinafter, more details will be described with reference to FIG. 7. FIG. 7 schematically shows block diagram 700 of acquiring the destination association relationship based on a feature vector according to an example implementation of the present disclosure.

[0056] As shown in FIG. 7, destination association relationship 230 can be constructed based on convolutional neural network 520 and long short-term memory network 710. At this moment, the network model based on convolutional neural network 520 can map feature vector 512 (including the reference backup information and the reference device information) to an internal feature vector. The internal feature vector here may be a high-dimensional feature vector without physical meaning. Then, the network model based on long short-term memory network 710 can map the internal feature vector to backup destination 530.

[0057] It will be understood that although convolutional neural network 520 and long short-term memory network 710 exist at this moment, for external users, there is no need to know the internal details of destination association relationship 230, but instead, destination association relationship 230 can be used as a black box for training. That is, there is no need to train convolutional neural network 520 and long short-term memory network 710 independently, but instead, only training samples 510 are needed to enable destination association relationship 230 to receive the feature vector and output backup destination 530.

[0058] The details of the training process have been described above. After destination association relationship 230 has been obtained, a feature vector established using the device information and backup information about the current storage system can be input to this destination association relationship 230 to obtain a corresponding backup destination. At this moment, the backup destination output by destination association relationship 230 represents a group of storage devices that can be used as backup destinations. Specifically, when destination association relationship 230 as shown in FIG. 7 is used, convolutional neural network 520 can map the feature vector including the backup information and the device information to a high-dimensional internal feature vector. Then, long short-term memory network 710 can map the high-dimensional internal feature vector to the backup destination. With the example implementation of the present disclosure, convolutional neural network 520 can effectively extract various aspects of features of the storage system, and long short-term memory network 710 can fully mine the internal connections in the time sequence data. In this way, destination association relationship 230 can have a higher accuracy.

[0059] According to an example implementation of the present disclosure, multiple candidate backup destinations may be output based on destination association relationship 230. Further, the backup destination can be verified based on multiple indicators, and the multiple candidate backup destinations can be filtered based on preset performance requirements. For example, a preset performance requirement can be set based on the distance between storage devices. Specifically, the preset performance requirements may include: the distance between any two storage devices in the candidate backup destination is greater than a threshold distance (for example, 300 kilometers). Assuming that the candidate backup destination includes 3 storage devices, and the distance between any two of the devices is greater than the threshold distance, this candidate backup destination can be used as the backup destination. Otherwise, the candidate backup destination can be filtered out, and other suitable candidate backup destinations can be selected from the multiple candidate backup destinations.
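The distance requirement can be checked pairwise over the devices of each candidate backup destination. The sketch below assumes that each device position is a (latitude, longitude) pair in degrees and uses the haversine great-circle distance; both are choices of this example, since the text does not fix a position format or a distance measure. The device names and coordinates are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def satisfies_distance(candidate: Sequence[str],
                       positions: Dict[str, Tuple[float, float]],
                       threshold_km: float = 300.0) -> bool:
    """True when every pair of devices in the candidate is farther apart than the threshold."""
    return all(haversine_km(positions[x], positions[y]) > threshold_km
               for x, y in combinations(candidate, 2))


def filter_candidates(candidates: Sequence[Sequence[str]],
                      positions: Dict[str, Tuple[float, float]],
                      threshold_km: float = 300.0) -> List[Sequence[str]]:
    """Keep only the candidate backup destinations that meet the distance requirement."""
    return [c for c in candidates if satisfies_distance(c, positions, threshold_km)]


positions = {
    "dev_a": (31.23, 121.47),
    "dev_b": (39.90, 116.40),
    "dev_c": (23.13, 113.26),
    "dev_d": (31.30, 120.58),  # less than 100 km from dev_a
}
candidates = [("dev_a", "dev_b", "dev_c"), ("dev_a", "dev_d", "dev_b")]
kept = filter_candidates(candidates, positions)
```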

[0060] It will be understood that the threshold distance can ensure that the storage devices in each candidate backup destination have different physical environments, thereby reducing the possibility of simultaneous failures (e.g., power outages, floods, mechanical shocks, etc.) of different storage devices. It will be understood that the specific value of the threshold distance listed here is only illustrative and is not intended to limit the scope of the present disclosure in any way. In other implementations, the threshold distance may be set to any value according to specific technical environments and performance requirements.

[0061] According to an example implementation of the present disclosure, it can be specified that the available resource amount of any storage device in a group of storage devices included in the candidate backup destination should be greater than a threshold resource amount. Specifically, the preset performance requirements can be set based on the available resources in the storage device. The preset performance requirements may include: an available resource amount of any storage device in the candidate backup destination is greater than a threshold resource amount. In this way, it is ensured that any candidate backup destination can complete data backup. For example, the available resource amount here may include the computing resource amount, the memory resource amount, storage capacities, network bandwidths, etc. of the storage device. According to an example implementation of the present disclosure, the threshold resource amount can be set based on the resource amount required by the backup task. In other implementations, the threshold resource amount may also be predetermined according to specific technical environment and performance requirements.
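The resource requirement can be sketched as a per-device, per-resource comparison. The resource names, units, and the dictionary layout below are assumptions of this example, and `filter_by_resources` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def has_enough_resources(candidate: Sequence[str],
                         available: Dict[str, Dict[str, float]],
                         threshold: Dict[str, float]) -> bool:
    """True when every device in the candidate exceeds every threshold resource amount."""
    return all(available[device][name] > amount
               for device in candidate
               for name, amount in threshold.items())


def filter_by_resources(candidates: Sequence[Sequence[str]],
                        available: Dict[str, Dict[str, float]],
                        threshold: Dict[str, float]) -> List[Sequence[str]]:
    """Keep only the candidate backup destinations whose devices all have enough resources."""
    return [c for c in candidates if has_enough_resources(c, available, threshold)]


available = {
    "dev_a": {"storage_gb": 500.0, "bandwidth_mbps": 100.0},
    "dev_b": {"storage_gb": 150.0, "bandwidth_mbps": 80.0},
    "dev_c": {"storage_gb": 800.0, "bandwidth_mbps": 40.0},
}
# Threshold derived from what the backup task needs (values are illustrative).
threshold = {"storage_gb": 200.0, "bandwidth_mbps": 30.0}
usable = filter_by_resources([("dev_a", "dev_c"), ("dev_a", "dev_b")], available, threshold)
```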

[0062] According to an example implementation of the present disclosure, it may be specified that a global balance degree associated with the candidate backup destination should be higher than a threshold balance degree. Here, the global balance degree indicates a usage balance degree of the storage device set in the situation where a storage device in the candidate backup destination is used for the backup task. It will be understood that the "usage balance degree" may refer to the balance degree of the "usage" of multiple storage devices in any aspect. For example, the "usage balance degree" may refer to the "usage balance degree" of the available storage capacity of the multiple storage devices, the "usage balance degree" of the input network bandwidth of the multiple storage devices, the "usage balance degree" of the processing resources of the multiple storage devices, the "usage balance degree" of the memory resources of the multiple storage devices, and so on.

[0063] The global balance degree of each candidate backup destination can be determined respectively, and then the final backup destination can be determined based on the global balance degree. According to an example implementation of the present disclosure, the global balance degree can be determined based on various methods. For example, the global balance degree of the candidate backup destination may be determined based on the usage metric of each storage device in the candidate backup destination and the time required to transmit backup data to each storage device in the candidate backup destination.

[0064] According to an example implementation of the present disclosure, it is expected that the usage rate of the multiple storage devices will increase uniformly, but it is not desirable that a certain storage device will be exhausted prematurely. Therefore, the usage metric can be used to measure the time when the storage device is exhausted. For example, it is possible to determine when the storage device is exhausted based on the remaining storage capacity in the storage device, the size of the source data to be backed up, and the daily growth rate of the source data. For example, the time when the ith storage device is exhausted can be determined based on the following Formula 1 and Formula 2:

$$VE_i = \frac{\sum_{s=1}^{n} SDS_s \cdot DDI_i}{DR} \qquad \text{(Formula 1)}$$

where $VE_i$ represents the daily data growth of the ith storage device, i is a positive integer and $i \le N$, N being the number of storage devices, n represents the number of pieces of source data with backups, $SDS_s$ represents the sth source data, $DDI_i$ represents the daily data growth (for example, represented as a percentage) of the ith storage device, and $DR$ represents the data repetition rate.

$$ETFR_i = \frac{VE_i}{RC_i} \qquad \text{(Formula 2)}$$

where $ETFR_i$ represents the predicted exhaustion time of the ith storage device, $VE_i$ represents the daily data growth of the ith storage device, and $RC_i$ represents the available storage space on the ith storage device.

[0065] Further, the standard deviation related to the exhaustion time of each storage device can be determined based on the following Formula 3:

$$\sigma_1 = \sqrt{\frac{\sum_{i=1}^{N} \left( ETFR_i - \overline{ETFR} \right)^2}{N}} \qquad \text{(Formula 3)}$$

[0066] where $\sigma_1$ represents the standard deviation related to the exhaustion time, N represents the number of storage devices, $ETFR_i$ represents the predicted exhaustion time of the ith storage device, and $\overline{ETFR}$ represents an average value of exhaustion times of all the storage devices. It will be understood that the above Formulas 1 to 3 are only specific examples for determining the component of the global balance degree, which is related to the exhaustion time. According to an example implementation of the present disclosure, this component can be determined based on other formulas.
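Formulas 1 to 3 can be evaluated numerically as in the sketch below. It follows the formulas as stated, including the ratio of daily growth to available space in Formula 2, and uses the population standard deviation (division by N) for Formula 3. The function names, units, and input values are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def daily_growth(source_sizes: Sequence[float], ddi_i: float, dr: float) -> float:
    """Formula 1: sum of the source data sizes times DDI_i, divided by DR."""
    if dr <= 0:
        raise ValueError("data repetition rate must be positive")
    return sum(source_sizes) * ddi_i / dr


def exhaustion_metric(ve_i: float, rc_i: float) -> float:
    """Formula 2 as stated: VE_i divided by the available storage space RC_i."""
    if rc_i <= 0:
        raise ValueError("available storage space must be positive")
    return ve_i / rc_i


def population_std(values: Sequence[float]) -> float:
    """Formula 3: square root of the mean squared deviation from the average."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


source_sizes = [100.0, 300.0]   # two pieces of source data, in GB
ddi = [0.05, 0.05, 0.10]        # daily data growth per storage device
rc = [400.0, 800.0, 800.0]      # available space per storage device, in GB
ve: List[float] = [daily_growth(source_sizes, d, dr=0.5) for d in ddi]
etfr: List[float] = [exhaustion_metric(v, r) for v, r in zip(ve, rc)]
sigma1 = population_std(etfr)
```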

[0067] Hereinafter, more information about determining the transmission time will be introduced. According to an example implementation of the present disclosure, the time required to transmit source data to a certain storage device can be determined based on the bandwidth of each storage device. For example, the time for transmitting the source data to the ith storage device can be determined based on the following Formula 4:

$$ETC_i = \frac{VE_i}{NB_i} \qquad \text{(Formula 4)}$$

where $ETC_i$ represents the time for transmitting the source data to the ith storage device, $VE_i$ represents the daily data growth of the ith storage device, and $NB_i$ represents the bandwidth of the ith storage device.

[0068] The standard deviation related to the transmission time of each storage device can be further determined based on Formula 5:

$$\sigma_2 = \sqrt{\frac{\sum_{i=1}^{N} \left( ETC_i - \overline{ETC} \right)^2}{N}} \qquad \text{(Formula 5)}$$

where $\sigma_2$ represents the standard deviation related to the transmission time, N represents the number of storage devices, $ETC_i$ represents the predicted transmission time of the ith storage device, and $\overline{ETC}$ represents an average value of transmission times of all the storage devices. It will be understood that the above Formulas 4 to 5 are only specific examples for determining the component of the global balance degree, which is related to transmission time. According to an example implementation of the present disclosure, this component can be determined based on other formulas.
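The transmission-time component can be computed the same way. The sketch below evaluates Formula 4 per device and then takes the population standard deviation for Formula 5 with `statistics.pstdev` from the Python standard library. The input values and units are illustrative assumptions.

```python
import statistics
from typing import List, Sequence


def transmission_times(ve: Sequence[float], nb: Sequence[float]) -> List[float]:
    """Formula 4: ETC_i = VE_i / NB_i for each storage device."""
    if len(ve) != len(nb):
        raise ValueError("ve and nb must have one entry per storage device")
    if any(b <= 0 for b in nb):
        raise ValueError("bandwidth must be positive")
    return [v / b for v, b in zip(ve, nb)]


ve = [40.0, 40.0, 80.0]   # daily data growth per device (same unit for all devices)
nb = [10.0, 20.0, 20.0]   # bandwidth per device (data units per time unit)
etc = transmission_times(ve, nb)
# Formula 5: population standard deviation of the transmission times.
sigma2 = statistics.pstdev(etc)
```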

[0069] According to an example implementation of the present disclosure, the global balance degree function GE associated with each candidate backup destination can be determined based on the following Formula 6:

$$GE = \sigma_1 \cdot v_1 + \sigma_2 \cdot v_2 + v_3 \qquad \text{(Formula 6)}$$

where $v_1$ and $v_2$ respectively represent custom weights, $\sigma_1$ and $\sigma_2$ are components determined according to the formulas described above, and $v_3$ represents a custom offset value. It will be understood that Formula 6 here is only illustrative. According to an example implementation of the present disclosure, other formulas may also be used to determine the global balance degree function GE. For example, the global balance degree function GE can be determined based on the product of $\sigma_1$ and $\sigma_2$.

[0070] According to an example implementation of the present disclosure, the corresponding global balance degree function GE can be determined for multiple candidate backup destinations. The global balance degrees of the multiple candidate backup destinations can be ordered, and the candidate backup destination with the optimal global balance degree can be selected as the backup destination. According to an example implementation of the present disclosure, a threshold of the global balance degree can be specified, and candidate backup destinations whose global balance degree is higher than this threshold can be removed from the multiple candidate backup destinations. According to an example implementation of the present disclosure, this threshold can be set based on historical experience. According to an example implementation of the present disclosure, this threshold can be set based on the current state of each storage device.

[0071] It will be understood that the global balance degree here represents a difference between the usage of all the storage devices in the storage device set after a certain candidate backup destination is selected as the backup destination. The smaller the value of the global balance degree, the more helpful the selection of this candidate backup destination is to the usage balance of all the storage devices. With the example implementation of the present disclosure, it is possible to select, as much as possible, a candidate backup destination that is helpful to the usage balance of all the storage devices as the backup destination.
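Combining the two components according to Formula 6 and ordering the candidates can be sketched as follows. Since a smaller value indicates better balance, candidates are sorted in ascending order, and any candidate above an optional threshold is dropped first. The weights, the threshold, and the component values are made-up inputs, and `rank_candidates` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple


def global_balance(sigma1: float, sigma2: float,
                   v1: float = 1.0, v2: float = 1.0, v3: float = 0.0) -> float:
    """Formula 6: GE = sigma1 * v1 + sigma2 * v2 + v3."""
    return sigma1 * v1 + sigma2 * v2 + v3


def rank_candidates(components: Dict[str, Tuple[float, float]],
                    v1: float = 1.0, v2: float = 1.0, v3: float = 0.0,
                    threshold: Optional[float] = None) -> List[str]:
    """Return candidate names ordered from best (smallest GE) to worst."""
    scored = [(global_balance(s1, s2, v1, v2, v3), name)
              for name, (s1, s2) in components.items()]
    if threshold is not None:
        # Drop candidates whose global balance degree exceeds the threshold.
        scored = [item for item in scored if item[0] <= threshold]
    return [name for _, name in sorted(scored)]


# (sigma1, sigma2) per candidate backup destination.
components = {"A": (0.02, 0.9), "B": (0.10, 0.3), "C": (0.5, 2.0)}
ranked = rank_candidates(components, threshold=1.0)
best = ranked[0] if ranked else None
```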

[0072] The method for performing the example implementations according to the present disclosure has been described above with reference to FIGS. 2 to 7. According to an example implementation of the present disclosure, an apparatus for selecting a backup destination for a backup task is provided. The apparatus includes: a device information receiving module configured to receive device information about storage devices in a storage device set, wherein the backup task is executed in the storage device set; a backup information receiving module configured to receive backup information about the backup task; an acquisition module configured to acquire a destination association relationship, wherein the destination association relationship describes an association relationship between a reference backup task in a reference storage device set and a reference backup destination of the reference backup task, the reference backup destination including a group of storage devices in the reference storage system; and a selection module configured to select a backup destination for the backup task from the storage device set according to the destination association relationship and based on the device information and the backup information, the backup destination including a group of storage devices in the storage device set. According to an example implementation of the present disclosure, this apparatus may further include modules for performing other steps in method 300 described above.

[0073] FIG. 8 schematically shows a block diagram of device 800 that can be used to implement the example implementations of the present disclosure. According to an example implementation of the present disclosure, device 800 may be an electronic device, wherein example device 800 includes central processing unit (CPU) 801 that may perform various appropriate actions and processing according to computer program instructions stored in read-only memory device (ROM) 802 or computer program instructions loaded from storage unit 808 into random access memory device (RAM) 803. In RAM 803, various programs and data required for the operation of example device 800 may also be stored. CPU 801, ROM 802, and RAM 803 are connected to each other through bus 804. Input/output (I/O) interface 805 is also connected to bus 804.

[0074] Multiple components in example device 800 are connected to I/O interface 805, including: input unit 806, such as a keyboard and a mouse; output unit 807, such as various types of displays and speakers; storage unit 808, such as a magnetic disk and an optical disk; and communication unit 809, such as a network card, a modem, and a wireless communication transceiver. Communication unit 809 allows example device 800 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

[0075] The various processes and processing described above, such as example methods or example processes, may be performed by CPU 801. For example, according to an example implementation of the present disclosure, various example methods or example processes can be implemented as computer software programs, which are tangibly contained in a machine-readable medium, such as storage unit 808. According to an example implementation of the present disclosure, part or all of the computer program may be loaded and/or installed on example device 800 via ROM 802 and/or communication unit 809. When the computer program is loaded into RAM 803 and executed by CPU 801, one or more steps of the example method or example process described above may be executed.

[0076] According to an example implementation of the present disclosure, an electronic device is provided, including: at least one processor; and at least one memory storing computer program instructions, wherein the at least one memory and the computer program instructions are configured to cause, together with the at least one processor, the electronic device to perform an action for selecting a backup destination for a backup task. The action includes: receiving device information about storage devices in a storage device set, wherein the backup task is executed in the storage device set; receiving backup information about the backup task; acquiring a destination association relationship, wherein the destination association relationship describes an association relationship between a reference backup task in a reference storage device set and a reference backup destination of the reference backup task, the reference backup destination including a group of storage devices in a reference storage system; and selecting a backup destination for the backup task from the storage device set according to the destination association relationship and based on the device information and the backup information, the backup destination including a group of storage devices in the storage device set.

[0077] According to an example implementation of the present disclosure, receiving the device information and the backup information further includes: receiving the device information and the backup information that are within a preset time period.

[0078] According to an example implementation of the present disclosure, the device information includes at least any one of the following: a position of the storage device, an available storage space of the storage device, a network bandwidth of the storage device, a CPU usage rate of the storage device, a memory usage rate of the storage device, an exhaustion time of the storage device, and so on.

[0079] According to an example implementation of the present disclosure, the backup information includes at least one of the following: the number of backup copies specified by the backup task; a size of source data specified by the backup task; and a repetition rate of the source data.
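The device information and backup information listed above could, for instance, be held in simple records and flattened into one numeric vector before being passed to a model. The sketch below is only one possible encoding: the class names, field names, units, and the two-coordinate position are all assumptions and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class DeviceInfo:
    """Per-device attributes (hypothetical field names and units)."""
    position: tuple              # assumed (x, y) coordinates
    available_space_gb: float
    network_bandwidth_mbps: float
    cpu_usage: float             # fraction in [0, 1]
    memory_usage: float          # fraction in [0, 1]
    exhaustion_time_days: float


@dataclass
class BackupInfo:
    """Per-task attributes (hypothetical field names and units)."""
    num_copies: int
    source_size_gb: float
    repetition_rate: float       # fraction in [0, 1]


def to_features(backup, devices):
    """Flatten one backup task and its candidate devices into a flat list."""
    features = [float(backup.num_copies), backup.source_size_gb,
                backup.repetition_rate]
    for d in devices:
        features.extend([d.position[0], d.position[1], d.available_space_gb,
                         d.network_bandwidth_mbps, d.cpu_usage,
                         d.memory_usage, d.exhaustion_time_days])
    return features


backup = BackupInfo(num_copies=2, source_size_gb=40.0, repetition_rate=0.3)
devices = [
    DeviceInfo((0.0, 0.0), 600.0, 1000.0, 0.2, 0.4, 90.0),
    DeviceInfo((120.0, 35.0), 250.0, 500.0, 0.7, 0.6, 30.0),
]
features = to_features(backup, devices)
```

In practice the values would likely be normalized to comparable ranges first; that step is omitted here.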

[0080] According to an example implementation of the present disclosure, acquiring the destination association relationship includes: determining reference backup information about each reference backup task executed in the reference storage device set; determining reference device information about each reference storage device in the reference storage device set; and training the destination association relationship based on the reference backup information, the reference device information, and the reference backup destination of the reference backup task.
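One way to picture the training data described above is as a set of (input, label) pairs, where the input combines the reference backup information with the reference device information and the label marks which reference storage devices formed the reference backup destination. The following sketch assumes a dictionary layout, a multi-hot label encoding, and two illustrative device features; none of these details are stated in the disclosure.

```python
def build_training_pairs(records, device_ids):
    """Turn reference records into (input vector, multi-hot label) pairs."""
    pairs = []
    for rec in records:
        # Input: backup features followed by each device's features,
        # always in the same device order.
        x = list(rec["backup"])
        for dev in device_ids:
            x.extend(rec["devices"][dev])
        # Label: 1 for devices in the reference backup destination, else 0.
        y = [1 if dev in rec["destination"] else 0 for dev in device_ids]
        pairs.append((x, y))
    return pairs


device_ids = ["dev-a", "dev-b", "dev-c"]
records = [
    {
        "backup": [2, 40.0, 0.3],  # copies, size in GB, repetition rate
        "devices": {
            "dev-a": [0.6, 0.2],   # assumed: free-space ratio, CPU usage
            "dev-b": [0.1, 0.9],
            "dev-c": [0.8, 0.1],
        },
        "destination": {"dev-a", "dev-c"},
    },
]
pairs = build_training_pairs(records, device_ids)
```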

[0081] According to an example implementation of the present disclosure, the destination association relationship includes: a first network model based on a convolutional neural network, wherein the first network model is used to map the reference backup information and the reference device information to an internal feature vector; and a second network model based on a long short-term memory network, wherein the second network model is used to map the internal feature vector to the reference backup destination of the reference backup task.

[0082] According to an example implementation of the present disclosure, determining the backup destination includes: mapping the backup information and the device information to an internal feature vector based on the first network model; and mapping the internal feature vector to the backup destination based on the second network model.
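The two-stage mapping can be illustrated with a toy forward pass in plain Python: a one-dimensional convolution with ReLU stands in for the first network model and produces an internal feature vector, and a single LSTM cell stands in for the second network model and produces a score per device. This is a sketch under strong assumptions, not the disclosed architecture: the weights are random rather than trained, the devices are treated as the LSTM's input sequence, the destination is taken as the top-scoring devices, and all function and variable names are hypothetical.

```python
import math
import random


def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(x[i + j] * kernel[j] for j in range(k)) + bias)
            for i in range(len(x) - k + 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, W):
    """One LSTM cell step; W maps gate name -> (weight rows, bias vector)."""
    z = x + h  # concatenate the input with the previous hidden state

    def gate(name, act):
        rows, bias = W[name]
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(rows, bias)]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def random_lstm_weights(input_size, hidden_size, rng):
    """Untrained placeholder weights, for illustration only."""
    def mat():
        return [[rng.uniform(-0.5, 0.5)
                 for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    return {name: (mat(), [0.0] * hidden_size) for name in "ifog"}


def select_destination(backup_feats, device_feats, num_copies,
                       kernel, W, w_out):
    """Score each device in turn and return the top `num_copies` device ids."""
    h, c = [0.0] * len(w_out), [0.0] * len(w_out)
    scores = {}
    for dev_id, feats in device_feats.items():
        internal = conv1d_relu(backup_feats + feats, kernel)  # first model
        h, c = lstm_step(internal, h, c, W)                   # second model
        scores[dev_id] = sum(w * v for w, v in zip(w_out, h))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:num_copies]


rng = random.Random(0)
kernel = [0.5, -0.25, 0.25]
backup_feats = [0.2, 0.4, 0.3]           # normalized, assumed features
device_feats = {
    "dev-a": [0.6, 0.2, 0.1, 0.5],
    "dev-b": [0.1, 0.9, 0.8, 0.3],
    "dev-c": [0.8, 0.1, 0.2, 0.7],
}
# Convolution output length: (3 + 4) - 3 + 1 = 5, so the LSTM input size is 5.
W = random_lstm_weights(input_size=5, hidden_size=4, rng=rng)
w_out = [rng.uniform(-1.0, 1.0) for _ in range(4)]
destination = select_destination(backup_feats, device_feats, 2,
                                 kernel, W, w_out)
```

Because the weights are random, the selected devices carry no meaning; the sketch only shows how data could flow through the two stages.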

[0083] According to an example implementation of the present disclosure, determining the backup destination further includes verifying the backup destination in response to the backup destination satisfying the following conditions: a distance between any two storage devices in the group of storage devices included in the backup destination is greater than a threshold distance; an available resource amount of any storage device in the group of storage devices included in the backup destination is greater than a threshold resource amount; and a global balance degree associated with the backup destination is higher than a threshold balance degree, wherein the global balance degree indicates a usage balance degree of the storage device set in the situation where a storage device in the backup destination is used for the backup task.
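The three conditions above can be sketched as a single check. The snippet below assumes planar coordinates for device positions, takes available storage space as the resource amount, and defines the global balance degree as one minus the population standard deviation of per-device usage ratios after the backup is placed. That balance formula, the dictionary layout, and all names are assumptions for illustration; this passage does not define how the balance degree is computed.

```python
import math
from itertools import combinations
from statistics import pstdev


def verify_destination(destination, devices, backup_size,
                       min_distance, min_available, min_balance):
    """Return True only if all three conditions hold for the destination."""
    chosen = [devices[d] for d in destination]

    # Condition 1: every pair of selected devices is far enough apart.
    far_enough = all(
        math.dist(a["position"], b["position"]) > min_distance
        for a, b in combinations(chosen, 2)
    )

    # Condition 2: every selected device has enough available resources.
    enough_resources = all(d["available"] > min_available for d in chosen)

    # Condition 3: usage across the whole set stays balanced once the
    # backup is placed on the selected devices (assumed metric).
    usage = []
    for dev_id, d in devices.items():
        used = d["capacity"] - d["available"]
        if dev_id in destination:
            used += backup_size
        usage.append(used / d["capacity"])
    balance = 1.0 - pstdev(usage)

    return far_enough and enough_resources and balance > min_balance


devices = {
    "dev-a": {"position": (0.0, 0.0), "capacity": 1000, "available": 600},
    "dev-b": {"position": (100.0, 0.0), "capacity": 1000, "available": 500},
    "dev-c": {"position": (1.0, 0.0), "capacity": 1000, "available": 550},
}
ok = verify_destination(["dev-a", "dev-b"], devices, backup_size=100,
                        min_distance=50, min_available=200, min_balance=0.8)
too_close = verify_destination(["dev-a", "dev-c"], devices, backup_size=100,
                               min_distance=50, min_available=200,
                               min_balance=0.8)
```

Here `ok` is true, while `too_close` is false because `dev-a` and `dev-c` are only one unit apart.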

[0084] According to an example implementation of the present disclosure, the storage device set and the reference storage device set satisfy at least one of the following: having the same or similar numbers of storage devices; and having the same or similar device models.

[0085] According to an example implementation of the present disclosure, the number of copies specified by the backup task is not higher than the number of copies specified by the reference backup task.
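The similarity between the two storage device sets and the limit on the number of copies could be checked before reusing a trained destination association relationship. The sketch below is hypothetical: the 10% tolerance on device counts, the subset test on device models, and the function name are assumptions, since the disclosure does not quantify "similar."

```python
def relationship_is_applicable(n_devices, ref_n_devices, models, ref_models,
                               copies, ref_max_copies, count_tolerance=0.1):
    """Check whether a relationship trained on a reference set may be reused."""
    # "Same or similar numbers of storage devices" (assumed 10% tolerance).
    similar_count = (abs(n_devices - ref_n_devices)
                     <= count_tolerance * ref_n_devices)
    # "Same or similar device models" (assumed: no unseen models).
    similar_models = set(models) <= set(ref_models)
    # Copies requested must not exceed what the reference tasks covered.
    copies_ok = copies <= ref_max_copies
    # The two similarity conditions are alternatives ("at least one of").
    return (similar_count or similar_models) and copies_ok


usable = relationship_is_applicable(
    n_devices=19, ref_n_devices=20,
    models=["model-x"], ref_models=["model-x", "model-y"],
    copies=2, ref_max_copies=3,
)
```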

[0086] According to an example implementation of the present disclosure, a computer program product is provided, the computer program product being tangibly stored on a non-volatile computer-readable medium and including machine-executable instructions which, when executed, cause a machine to execute the methods described above.

[0087] According to an example implementation of the present disclosure, a computer-readable medium is provided, the medium including machine-executable instructions which, when executed, cause a machine to execute the methods described above.

[0088] Through the implementations of the present disclosure, the amount of computation for selecting a backup destination can be greatly reduced, thereby improving the automation level and performance of the storage system.

[0089] As used herein, the term "include" and similar terms thereof should be understood as open-ended inclusion, i.e., "including but not limited to." The term "based on" should be understood as "based at least in part on." The term "one implementation" or "this implementation" should be understood as "at least one implementation." The terms "first," "second," etc., may refer to different or the same objects. Other explicit and implicit definitions may also be included below.

[0090] As used herein, the term "determine" encompasses a variety of actions. For example, "determine" may include calculating, computing, processing, deriving, investigating, searching (for example, searching in a table, a database, or another data structure), identifying, and the like. In addition, "determine" may include receiving (for example, receiving information), accessing (for example, accessing data in a memory), and the like. In addition, "determine" may include parsing, selecting, choosing, establishing, and the like.

[0091] It should be noted that the implementations of the present disclosure may be implemented by hardware, software, or a combination of software and hardware. The hardware part can be implemented using dedicated logic; the software part can be stored in a memory and executed by an appropriate instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will understand that the above-mentioned devices and methods can be implemented using computer-executable instructions and/or processor control code, which is provided, for example, on a programmable memory or on a data carrier such as an optical or electronic signal carrier.

[0092] In addition, although the operations of the method of the present disclosure are described in a specific order in the drawings, this does not require or imply that these operations must be performed in the specific order, or that all the operations shown must be performed to achieve the desired result. Rather, the order of execution of the steps depicted in the flowchart can be changed. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution. It should also be noted that the features and functions of two or more apparatuses according to the present disclosure may be embodied in one apparatus. Conversely, the features and functions of one apparatus described above may be divided among and embodied by multiple apparatuses.

[0093] Although the present disclosure has been described with reference to several specific implementations, it should be understood that the present disclosure is not limited to the specific implementations disclosed. The present disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

* * * * *

