Reducing occupancy of digital storage devices Boldy, Manfred ; et al. [Boldy, Manfred]

Patent Application Summary

U.S. patent application number 11/019099 was filed with the patent office on 2004-12-22 and published on 2005-07-14 for reducing occupancy of digital storage devices. Invention is credited to Boldy, Manfred; Sander, Peter; Stamm-Wilbrandt, Hermann.

Application Number	20050152192 11/019099
Document ID	/
Family ID	34717239
Filed Date	2004-12-22

United States Patent Application	20050152192
Kind Code	A1
Boldy, Manfred ; et al.	July 14, 2005

Reducing occupancy of digital storage devices

Abstract

A digital data storage device physically stores blocks of identical data only once on its storage medium wherein a second or even further identical blocks are stored only as reference referring to the first block of these identical blocks. By this technique, storage of duplicate data is most effectively avoided on the lowest storage level of the disk storage device, even in cases where identical blocks are written by different operating systems. In the preferred embodiment, the underlying storage medium (magnetic hard disk, optical disk, tape, or M-RAM) is segmented into two areas, the first area particularly comprising a relatively small block reference table and the remaining physical storage area for storing real blocks of information.

Inventors:	Boldy, Manfred; (Horb a.N, DE) ; Sander, Peter; (Mannheim, DE) ; Stamm-Wilbrandt, Hermann; (Eberbach, DE)
Correspondence Address:	Whitham, Curtis & Christofferson, P.C. Suite 340 11491 Sunset Hills Road Reston VA 20190 US
Family ID:	34717239
Appl. No.:	11/019099
Filed:	December 22, 2004

Current U.S. Class:	365/189.05 ; G9B/20.015
Current CPC Class:	G11B 20/12 20130101
Class at Publication:	365/189.05
International Class:	G11C 005/00

Foreign Application Data

Date	Code	Application Number
Dec 22, 2003	EP	03104922.4

Claims

1. A digital data storage device storing information on a storage medium segmented into blocks, wherein said storage medium is segmented into two areas, wherein the first area comprises reference means and the remaining area of the storage medium is used for storing said information and wherein a second or further block being identical with a first block on block level is stored only as reference referring to the first block.

2. A digital data storage device according to claim 1, comprising at least one reference table containing at least one entry for each block, at least one fingerprint table containing fingerprint information for each block and at least one chain table containing, for each block, at least information about blocks having same fingerprints.

3. A digital data storage device according to claim 2, wherein the entries of said reference table are numbered consecutively.

4. A digital data storage device according to claim 2, wherein each of said entries consists of at least one field containing a unique identifier for identifying the physical sector where the real block is stored in the remaining physical storage area.

5. A digital data storage device according to claim 4, wherein the length of a reference field is defined as the maximum amount of required binary digits (bits) for real sector IDs.

6. A digital data storage device according to claim 1, wherein a real block stored in said remaining area of the storage medium comprises a reference counter for counting the number of references to that real block.

7. A digital data storage device according to claim 6, wherein said reference counter is used to identify how many times a block is referred to.

8. A digital data storage device according to claim 2, wherein said fingerprint table contains, for each fingerprint, the first unique identifier of a block corresponding to said fingerprint.

9. A digital data storage device according to claim 2, wherein said chain table contains, for a particular block, its preceding block in the linkage table having same fingerprint, its successive block in the linkage table having same fingerprint, its reference count, and its fingerprint.

10. A digital data storage device according to claim 1, wherein said storage medium is formatted so that the number of real blocks equals the number of entries of the reference table.

11. A digital data storage device according to claim 10, wherein the number of real blocks is adapted on a periodic time basis.

12. A digital data storage device according to claim 1, wherein a particular area of said storage medium is reserved for the reference table and thus cannot be occupied by real (user) data, wherein the real data are only stored in a real sector, and wherein occupation of the real sector advantageously can move from outer tracks to inner tracks of the storage medium.

13. A digital data storage device according to claim 1, wherein said reference means is stored outside the storage medium of the storage device, preferably in a Random-Access-Memory (RAM) being part of the storage device or a virtual RAM disk storage being part of the main storage of an underlying computer system.

14. A digital data storage device according to claim 13, further comprising a fail-over means for storing the reference table entries in a non-volatile storage, preferably an Electrically Erasable Programmable Read-Only Memory (EEPROM).

Description

FIELD OF THE INVENTION

[0001] The invention generally relates to digital data storage devices such as magnetic hard disk drives, optical disk drives, tape storage devices, semiconductor-based storages emulating or virtually realizing hard disk drives like solid hard disks or RAM disks storing information in continuous data blocks. More specifically, the invention concerns operation of such a digital storage device in order to reduce storage occupancy.

BACKGROUND OF THE INVENTION

[0002] In computer hardware technology it is well-known to use disk storage devices like hard disk drives (HDDs) or optical disk drives built-up of one or a stack of multiple hard disks (platters) on which data is stored on a concentric pattern of magnetic/optical tracks using read/write heads. These tracks are divided into equal arcs or sectors. Two kinds of sectors on such disks are known. The first and at the very lowest level is a servo sector. In the case of a magnetic storage device, when the hard disks are manufactured, a special binary digit (bit) pattern is written in a code called `gray code` on the surface of the disks, while the drive is open in a clean room, with a device called "servo writer".

[0003] This gray code consists of successive numbers that differ by only a single bit, like the three-bit code sequence `000`, `001`, `011`, `010`, `110`, etc. Although many gray codes are possible, one specific type of gray code is considered the gray code of choice because of its efficiency in computation. Although there are other schemes, the gray code is written in a wedge at the start of each sector. There are a fixed number of servo sectors per track and the sectors are adjacent to one another. This pattern is permanent and cannot be changed by writing normal data to the disk drive. It also cannot be changed by low-level formatting the drive.
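The single-bit-change property described above can be illustrated with a short sketch. It assumes the reflected binary gray code, which is one common choice; the text does not say which gray code a servo writer actually uses, and the function names are hypothetical.

```python
def gray_code(i: int) -> int:
    """Return the i-th value of the reflected binary gray code (an assumed variant)."""
    return i ^ (i >> 1)


def gray_sequence(bits: int) -> list[str]:
    """List all gray code words of the given width as bit strings, in order."""
    return [format(gray_code(i), f"0{bits}b") for i in range(2 ** bits)]


def differs_by_one_bit(a: int, b: int) -> bool:
    """True if a and b differ in exactly one bit position."""
    return bin(a ^ b).count("1") == 1


if __name__ == "__main__":
    print(gray_sequence(3))
```

For a width of three bits, the sequence begins with the same five code words quoted in paragraph [0003].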

[0004] Disk drive electronics use feedback from the heads which read the gray code pattern, to very accurately position and constantly correct the radial position of the appropriate head over the desired track, at the beginning of each sector, to compensate for variations in disk (platter) geometry caused by mechanical stress and thermal expansion or contraction.

[0005] At the end of the manufacturing process, the hard disk storage devices generally are low-level formatted. Afterwards, only high-level operations are performed such as known partitioning procedures, high-level formatting and read/write of data in the form of blocks as mentioned above. All high-level operations can be derived from only two base operations, namely a BlockRead and a BlockWrite operation. Thus even partitioning and formatting, the latter independently of the underlying formatting scheme like MS-DOS FAT, FAT32, NTFS or LINUX EXT2, are accomplished using the mentioned base operations.

[0006] When high-level formatting such a disk drive, each disk (platter) is arranged into blocks of fixed length by repeatedly writing a definite pattern like "$5A". After formatting, when storing data in such disk storage devices, these data are stored as continuous data segments on the disk (platter). These continuous data segments are also referred to as "data blocks" or, simply, "blocks" and such terminology will be used hereinafter.

[0007] It is to be noted that, in known tape storage devices, data are stored in the form of data blocks as well. The only difference between the above-described hard disk devices and these tape storage devices is that data stored on HDDs are directly accessible by means of the read/write head (so-called direct memory access DMA operation mode), whereas data stored on tapes are only accessible in a sequential manner, since the tape has to be wound to the location where the data of interest are stored before these data can be accessed.

[0008] In order to minimize storage occupancy in those storage devices, it is known to avoid duplicate data. A disk drive system comprising a sector buffer having a plurality of segments for storing data and reducing storage occupancy is disclosed in U.S. Pat. No. 6,092,145 assigned to the assignee of the present invention. Generally, HDD systems require a sector buffer memory to temporarily store data in the HDD system because the data transfer rate of the disk is not equal to the data transfer rate of a host computer and thus a sector buffer is provided in order to increase the data I/O rate of new high capacity HDD systems. The system described therein particularly includes a controller for classifying data to be stored in the sector buffer and for storing a portion of the classified data in a segment of the sector buffer such that the portion of classified data stored in the segment is not stored in any other segment in the sector buffer. Therefore, the sector buffer is handled more efficiently, and the computational load to check for duplicated data is reduced and the disk drive thus improves data transfer efficiency.

[0009] The subject matter of the U.S. Pat. No. 6,092,145, in other words, concerns an improved method for read-ahead and write-ahead operations using a sector buffer wherein duplicates are eliminated only in the sector buffer implemented on the hard disk or a separate Random Access Memory (RAM), in order to provide the improved transfer efficiency mentioned above.

[0010] Another approach for optimizing storage occupancy is disclosed in U.S. Pat. No. 5,732,265 assigned to Microsoft Corporation. Particularly disclosed is an encoder for use in CD-ROM pre-mastering software. The storage in the computer readable recording medium (CD-ROM) is optimized by eliminating redundant storage of identical data streams for duplicate files whereby two files having equivalent data streams are detected and encoded as a single data stream referenced by the respective directory entries of the files. More particularly addressed therein is the problem of data consistency that arises when multiple files are encoded as a single data stream and when these files are separately modified by an operating system or application program. In U.S. Pat. No. 5,732,265 it is further disclosed to implement such an encoder in an operating system or file system to dynamically optimize storage in the memory system of the computer wherein the above-described mechanism is applied at the time a file is created or saved on a data volume to detect whether the file is a duplicate of another existing file on the data volume.

[0011] The above-discussed prior art approaches, however, have a disadvantage in that they do not address reduction of storage occupancy of stored user data (e.g. within a file or between files) which is stored in an above identified data storage device. As an example, consider text or picture files, where blocks are frequently filled entirely with a single recurring data byte; such blocks are regarded as duplicate data in the present context. Moreover, as computer usage and the application programs supporting it have become more sophisticated, there is an increased likelihood that relatively large portions of individual (possibly large) files comprising many blocks of data may be duplicated in many stored files; letterhead and watermarks stored in documents, and portions of image files representing relatively large image areas having little detail therein, are only a few examples. Further, as the capacity of memory devices increases, it becomes even more clearly impractical to compare a block to be stored with all blocks which may have been previously stored in one or more memory devices to determine if an identical block has been previously stored.

SUMMARY OF THE INVENTION

[0012] It is therefore an object of the present invention to provide an improved mechanism for minimizing data occupancy in an above specified digital data storage device.

[0013] A further object is to provide such a data storage device with enhanced data access and transmission performance.

[0014] Another object is to provide a mechanism for minimizing data occupancy in such a data storage device that is transparent to an operating system of a computer using the data storage device.

[0015] The above objects are achieved by a digital data storage device and a method for operating same in accordance with the respective independent claims. Advantageous features are subject matter of the corresponding subclaims.

[0016] The underlying concept of the invention is to physically store blocks of identical data only once on the storage medium of the data storage device, wherein a second block or even further identical blocks are stored only as reference(s) referring to the first block of these identical blocks. As a consequence, storage of duplicate data is most effectively avoided at the lowest storage level of the disk storage device, even in cases where identical blocks are written by different operating systems. The proposed method thereby effectively avoids data duplicates being created on the sector level of the storage medium. The proposed mechanism is operating system independent, or fully transparent to an operating system, respectively, since it operates on the aforementioned block/sector level, which is not known by the operating system. In contrast to the above-discussed known approaches, the invention proposes, when writing an already existing block of information onto the storage medium, not to modify the real block itself but, rather, to modify only the relatively small reference table. Thus identical blocks of information are stored only once on the block level of the storage device and accessed or addressed only using reference information stored in the reference table.
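The concept of storing identical blocks once and keeping only references can be sketched in a few lines. This is a simplified illustration, not the tables of the preferred embodiment: the class `DedupStore` and its fields are hypothetical, and a Python dictionary keyed by block content stands in for the fingerprint-based lookup described later.

```python
class DedupStore:
    """Toy block store: each distinct block is kept once, sectors hold references."""

    def __init__(self):
        self.reference = {}   # sector number -> real block id
        self.blocks = {}      # real block id -> block data
        self.refcount = {}    # real block id -> number of referring sectors
        self.by_content = {}  # block data -> real block id (stand-in for fingerprints)
        self.next_id = 0

    def write(self, sector: int, data: bytes) -> None:
        old = self.reference.get(sector)
        if old is not None:
            if self.blocks[old] == data:
                return                      # identical block already in place
            self._release(old)              # sector no longer refers to old block
        block = self.by_content.get(data)
        if block is None:                   # first occurrence: store the real block
            block = self.next_id
            self.next_id += 1
            self.blocks[block] = data
            self.by_content[data] = block
            self.refcount[block] = 0
        self.refcount[block] += 1           # further occurrences: reference only
        self.reference[sector] = block

    def _release(self, block: int) -> None:
        self.refcount[block] -= 1
        if self.refcount[block] == 0:       # unreferenced block becomes free
            del self.by_content[self.blocks[block]]
            del self.blocks[block]
            del self.refcount[block]

    def read(self, sector: int) -> bytes:
        return self.blocks[self.reference[sector]]
```

Writing the same data to several sectors changes only the reference mapping and the reference count; the number of physically held blocks grows only when new content appears.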

[0017] In the preferred embodiment, the underlying storage medium (magnetic hard disk (platter), optical disk, tape, or M-RAM) is segmented into two areas, the first area comprising a relatively small block reference table (in the following briefly referred to as "reference table") and the remaining physical storage area for storing real blocks of information. Despite differences in the storage mechanism, it is emphasized that the present invention can also be applied to tape storage devices since it does not depend on the underlying data access mechanism.

[0018] The possible entries of the reference table, in another embodiment, are continuously numbered wherein the reference table contains, for each real block, at least one entry. This entry contains a unique identifier for identifying the physical sector where the real block is stored in the remaining physical storage area. The length of this entry is preferably defined as the maximum amount of required binary digits (bits) for real sector IDs.

[0019] In yet another embodiment, a real block stored in the second area of the storage medium comprises, in addition to other required information like a header, the stored data and a Cyclic Redundancy Checking (CRC), a reference counter. That counter counts the number of references to the present real block. The reference counter is preferably used to identify whether a block is used or not.

[0020] According to another aspect, as the result of a low-level formatting of the storage medium after manufacture/assembly of the storage device, the number of real blocks available for storing equals the number of entries of the reference table. Only later, during operation of the storage medium, when the second area of the storage medium is filled with blocks of real data, is the size of the reference table adapted or its optimum size determined. The optimum size can thus be recalculated on a periodic time basis.

[0021] According to still another aspect of the invention, during the above-described low-level formatting or a successive formatting step after the low-level formatting of a so-called "intermediate format" of the storage medium, three tables, the above mentioned reference table, a linkage or chain table and a fingerprint table are created. Implementation of the fingerprint table presumes that for each block to be written a "fingerprint" can be calculated. An exemplary fingerprint algorithm is a cyclic redundancy check (CRC) mechanism which preferably is used for calculation of the entries of the fingerprint table. CRC is a well-known mechanism of checking for errors in data that has been transmitted on a communications link. A sending device applies a 16- or 32-bit polynomial to a block of data that is to be transmitted and appends the resulting cyclic redundancy code to the block. The receiving end applies the same polynomial to the data and compares its result with the result appended by the sender. If they agree, the data has been received successfully. If not, the sender can be notified to resend the block of data.

[0022] In the preferred embodiment, the fingerprint table, for a given fingerprint value, contains the first block identified by a block identifier (BLOCK-ID) with that fingerprint. The chain table, in that embodiment, is bi-linked and contains, for each real block, its predecessor and successor in the list of blocks with equal fingerprint and the reference count of the corresponding block and the fingerprint of the block. The reference table, in that embodiment, is continuously numbered and contains at least an entry for each real block. That entry preferably consists of the mentioned BLOCK-ID.

[0023] In order to enable dynamic expansion of the reference table in accordance with the above-mentioned process for optimizing the storage area of the storage medium, in a further embodiment, a particular storage area on the storage medium is reserved for the reference table and thus cannot be occupied by real (user) data. The real data are only stored in a real sector, wherein occupation of the real sector advantageously can move from outer tracks to inner tracks of the storage medium.

[0024] According to yet another embodiment, the reference table is stored outside the storage medium of the storage device, preferably in an Electrically Erasable Programmable Read-Only Memory (EEPROM/Flash RAM) being part of the storage device or a virtual RAM disk storage being part of the main storage of an underlying computer system.

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] In the following, the present invention is described in more detail by way of preferred embodiments from which further features and advantages of the invention become evident wherein

[0026] FIGS. 1A and 1B depict schematic views of an available storage space of a storage device for illustrating segmentation of the storage medium into two different areas (FIG. 1A) and for illustrating the principle of expandable sector storage (FIG. 1B) in accordance with the invention;

[0027] FIG. 2 depicts a reference table according to the preferred embodiment of the invention;

[0028] FIG. 3 depicts a fingerprint table according to the preferred embodiment of the invention;

[0029] FIG. 4 depicts a LIFO stack of free blocks according to the preferred embodiment of the invention;

[0030] FIG. 5 depicts a linkage/chain table according to the preferred embodiment of the invention;

[0031] FIGS. 6A, 6B and 6C comprise a multiple-part flow diagram illustrating a BLOCK WRITE procedure conducted in an HDD device in accordance with the invention;

[0032] FIG. 7 is a flow diagram illustrating a BLOCK READ procedure conducted in an HDD device in accordance with the invention;

[0033] FIG. 8 is a flow diagram illustrating a HIGH-LEVEL FORMATTING procedure conducted in a Hard Disk Drive (HDD) in accordance with the invention;

[0034] FIG. 9A is a flow diagram illustrating a procedure for FINDING THE POSITION OF A BLOCK IN A LIST USING A FINGERPRINT conducted in an HDD in accordance with the invention;

[0035] FIG. 9B is a flow diagram illustrating a procedure for REMOVING A BLOCK FROM A LIST USING A FINGERPRINT conducted in an HDD in accordance with the invention;

[0036] FIG. 9C is a flow diagram illustrating a procedure for PREPENDING `B` TO A LIST WITH FINGERPRINT `fn` conducted in an HDD in accordance with the invention;

[0037] FIG. 10A is a flow diagram illustrating INITIALIZATION OF AN EMPTY STACK;

[0038] FIG. 10B is a flow diagram illustrating an operation of PUSHING AN ELEMENT ONTO A STACK; and

[0039] FIG. 10C is a flow diagram illustrating an operation of RETRIEVING THE LAST PUSHED ELEMENT FROM THE STACK.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0040] FIGS. 1A and 1B schematically show the available storage space of a storage medium of an underlying storage device, the storage space being arranged in accordance with the invention. The underlying storage device, as mentioned above, can be any storage device storing information in continuous data blocks like sector-oriented magnetic hard disk drives, optical disk drives or tape storage devices, and even semiconductor storage devices emulating or virtually realizing hard disk drives like solid hard disks or RAM disks.

[0041] FIG. 1A, more particularly, illustrates how the underlying storage medium is segmented into two different storage areas 100, 105, the first area 100 containing a sector directory (e.g. implemented as a table or the like) used for operational administration of the underlying storage device according to the mechanism described hereinafter and the second area (`Real Sector`) 105 representing physical storage space for physically storing data. In FIG. 1A it is further illustrated by the two arrows 110, 115, that the size of each of the two storage areas 100, 105 can be adapted dynamically during operation of the underlying storage device, mainly depending on the storage capacity requirements of the mentioned sector directory. The required storage size for storing the sector directory, again, mainly depends on the number of currently existing data duplicates on sector level to be administered by means of the sector directory.

[0042] FIG. 1B shows a similar segmentation according to another embodiment of the invention where a number of different storage devices or storage subunits are involved. In this scenario, the sector directory is stored on a storage medium 150 of a first storage device wherein the real blocks are stored on the storage media 155, 160, 165 of other devices. In this way, the sector storage area can be expanded nearly arbitrarily, as indicated by arrow 170.

[0043] In the following it is assumed that, for each block to be written into a sector of the underlying HDD, a fingerprint value (fn) can be calculated. A known example for a fingerprint used in storage media is the above mentioned mechanism of Cyclic Redundancy Checking (CRC). CRC is a method of checking for errors in data that has been transmitted on a communications link whereby a sending device applies a 16-bit or 32-bit polynomial to a block of data that is to be transmitted and appends the resulting cyclic redundancy code (CRC) to the block. The receiving end applies the same polynomial to the data and compares its result with the result appended by the sender. If they agree, the data has been received successfully. If not, the sender can be notified to resend the block of data.
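The CRC exchange described above, and its reuse as a block fingerprint, can be sketched as follows. The sketch assumes the 32-bit CRC provided by Python's standard `zlib` module; the helper names and the reduction of the CRC modulo the number of fingerprint values are illustrative assumptions, not details given in the text.

```python
import zlib


def append_crc(payload: bytes) -> bytes:
    """Sender side: append the 32-bit CRC of the payload to the block."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_crc(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the appended one."""
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


def fingerprint(block: bytes, num_fingerprints: int) -> int:
    """Map a block to a fingerprint value in range(num_fingerprints).

    Taking the CRC modulo the table size is an assumption made for this sketch.
    """
    return zlib.crc32(block) % num_fingerprints


if __name__ == "__main__":
    frame = append_crc(b"block of data")
    print(check_crc(frame))
```

Identical blocks always produce the same fingerprint, which is the property the fingerprint table relies on; different blocks may occasionally collide, so a fingerprint match alone does not prove that two blocks are equal.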

[0044] The mechanism for reducing storage occupancy in accordance with the invention, as illustrated in FIG. 1A, is based on segmentation of the storage area of the HDD or other storage device into two different areas, the first area containing a sector table and the second area intended for physically storing data. In that sector table area, there is stored a reference table R containing at least one entry for each real block of data. As illustrated in FIG. 2, in the preferred embodiment, the possible entries of that table are continuously numbered whereby each entry comprises a unique identifier (ID) of a stored block.

[0045] The sector table area also includes a fingerprint table `FP`. As illustrated by the preferred embodiment shown in FIG. 3, the FP table contains, for each possible fingerprint value A034, A035, . . . , the ID of the first block with that fingerprint. In addition, it comprises a LIFO (last in--first out) stack U (FIG. 4) of unused (real) blocks and a doubly-linked table L (FIG. 5) that comprises for a given block indicated by block number . . . , 14557, 14558, . . . the following information:

[0046] the block's predecessor or previous block (column `prev`) in the list of blocks with identical fingerprint value;

[0047] the block's successor or next block (column `next`) in the list of blocks with identical fingerprint value;

[0048] the block's reference count (column `.rc`); and

[0049] the block's fingerprint value (column `.fp`).
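The four structures R, FP, U and L described above can be sketched as plain in-memory containers. The class and field names below mirror the column names of the tables but are otherwise hypothetical, and the initial state (all sectors undefined, all blocks unused) is an assumption about what the tables look like directly after formatting.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChainEntry:
    """One row of the linkage/chain table L."""
    prev: Optional[int] = None   # previous block with the same fingerprint
    next: Optional[int] = None   # next block with the same fingerprint
    rc: int = 0                  # reference count
    fp: Optional[int] = None     # fingerprint value of the block


@dataclass
class SectorDirectory:
    num_sectors: int
    num_blocks: int
    num_fingerprints: int
    R: list = field(init=False)   # reference table: sector number -> block ID
    FP: list = field(init=False)  # fingerprint table: fingerprint -> first block ID
    U: list = field(init=False)   # LIFO stack of unused real blocks
    L: list = field(init=False)   # chain table: block ID -> ChainEntry

    def __post_init__(self):
        self.R = [None] * self.num_sectors
        self.FP = [None] * self.num_fingerprints
        # Assumed initial state: every block is unused; pop() yields block 0 first.
        self.U = list(range(self.num_blocks - 1, -1, -1))
        self.L = [ChainEntry() for _ in range(self.num_blocks)]
```

A Python list serves as the LIFO stack here, with `append` as push and `pop` as retrieval of the last pushed element.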

[0050] The number of available fingerprint values should be on the order of the number of real blocks available in the HDD. In the preferred embodiment, the number of fingerprints is equal to the number of blocks, which guarantees that the average number of blocks with equal fingerprint value does not exceed 1. If, for some fingerprint values, the list of blocks with identical fingerprint holds more than one block, then other fingerprint values are not realized (i.e. not present) at all; for a new block with such an unrealized fingerprint value, its inequality with all blocks already stored on the HDD is ascertained without a physical read of any block.

[0051] The following are examples for the calculation of the table sizes showing that the tables require less than 2% of memory:

[0052] Assume that `n` is the number of bytes required for storing block numbers. For example, four bytes (thirty-two bits) are sufficient up to a storage capacity of two terabytes of the underlying storage device if the block size is 512 bytes (2^32*512), and three bytes are sufficient for a storage capacity of 16 million blocks.

[0053] The resulting size of each of the above tables is:

[0054] size (R)=#sectors*n;

[0055] size (FP)=#fingerprints*n;

[0056] size (U)=#blocks*n;

[0057] size (L)=#blocks*4*n.

[0058] Thus, in case of #sectors=#blocks=#fingerprints the resulting table size is #blocks*7*n.

[0059] The above calculation shall now be illustrated by the following four different quantitative estimations a)-d):

[0060] a) 2 GB HDD: Provides 1 million blocks (<2^24) of block size 2048 bytes; therefore three bytes (n=3) are sufficient, i.e. 21*1,000,000=21 MB (can even be kept in an EEPROM disposed in the HDD);

[0061] b) 30 GB HDD: Provides 15 million blocks (<2^24) of block size 2048 bytes; therefore three bytes (n=3) are sufficient, i.e. 21*15,000,000=315 MB (is about 1.05% of the entire storage capacity of the HDD);

[0062] c) 100 GB HDD: Provides 50 million blocks (<2^32) of block size 2048 bytes; therefore four bytes (n=4) are sufficient, i.e. 28*50,000,000=1.4 GB (is about 1.4% of the entire storage capacity of the HDD);

[0063] d) 8 TB HDD: Provides 4 billion blocks (<2^32) of block size 2048 bytes; therefore four bytes (n=4) are sufficient, i.e. 28*4,000,000,000=112 GB (is about 1.4% of the entire storage capacity of the HDD).
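The four estimations a)-d) can be reproduced with a short calculation. The sketch assumes #sectors=#blocks=#fingerprints, as in paragraph [0058], and chooses `n` as the smallest number of bytes that can number all blocks; the function names are hypothetical.

```python
def bytes_per_block_number(num_blocks: int) -> int:
    """Smallest n such that n bytes can hold every block number."""
    n = 1
    while 256 ** n < num_blocks:
        n += 1
    return n


def table_size_bytes(num_blocks: int) -> int:
    """Total size of R, FP, U and L: (1 + 1 + 1 + 4) * n bytes per block."""
    return 7 * bytes_per_block_number(num_blocks) * num_blocks


def overhead_fraction(num_blocks: int, block_size: int = 2048) -> float:
    """Table size relative to the space taken by the real blocks themselves."""
    return table_size_bytes(num_blocks) / (num_blocks * block_size)


if __name__ == "__main__":
    for blocks in (1_000_000, 15_000_000, 50_000_000, 4_000_000_000):
        print(blocks, table_size_bytes(blocks), f"{overhead_fraction(blocks):.2%}")
```

With 2048-byte blocks the overhead per block is 7*n/2048, that is about 1.03% for n=3 and about 1.37% for n=4, which agrees with the rounded percentages above and stays below 2%.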

[0064] Statistical investigations have revealed that data stored on block-oriented server storage devices, on an average scale, contain up to 30% of duplicate files, and non-compressed picture formats like .bmp files often contain equally colored areas, which are stored as identical blocks on the storage device (e.g. black or white areas in these pictures), even for different pictures. In the following it is described how formatting or reading and writing blocks are performed or executed in the preferred embodiment, based on the above described storage device architecture. It should be noted that the necessary procedural steps do not depend on the underlying storage device technology and thus can be used either in a hard disk storage device or any other storage device where data are stored as data blocks.

[0065] Referring now to FIGS. 6 to 9 it is described in more detail by way of flow diagrams how the particular operations `BLOCK WRITE`, `BLOCK READ`, `HIGH-LEVEL FORMATTING`, `FINDING THE POSITION OF A BLOCK IN A LIST`, `REMOVING A BLOCK FROM A LIST` and `INSERTING A BLOCK INTO A LIST` (the last three operations by using a fingerprint) are performed in a sector-oriented storage in accordance with the invention. These operations and method of operating a storage device are sufficient to guarantee that any block is stored exactly once in the storage medium and that different sectors containing the same block only contain references to this one block, while limiting the processing overhead to do so. The mechanism and method in accordance with the invention must quickly check whether a block, blk, is already stored on the storage medium, which can be very large. The required reduction in processing time is achieved by calculating a fingerprint for block blk and then quickly searching the relatively short list of blocks already present with the same fingerprint, fn. It should be noted that blocks containing different data may, nevertheless, result in the same fingerprint being calculated. However, since the number of possible fingerprints which can result from calculation based on the data content of a block is very large, the list of blocks having different content which may have the same (or any given) fingerprint will be a very small fraction of the number of blocks stored, and the search can thus be performed very quickly on a list of blocks which will generally be very short.
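The search of the short list of blocks with equal fingerprint can be sketched as a walk along the `next` links, starting at the first block recorded in table FP. This is an illustrative sketch only, not a transcription of the flow diagram of FIG. 9A; the dictionary layout and the `read_block` callback are assumptions.

```python
def find_in_list(FP, L, fn, blk, read_block):
    """Return the ID of a stored block equal to blk, or None if there is none.

    FP maps a fingerprint to the first block ID of its list, L maps a block ID
    to a dict with a "next" link, and read_block(b) physically reads block b.
    Only blocks sharing the fingerprint fn are ever read.
    """
    b = FP.get(fn)
    while b is not None:
        if read_block(b) == blk:
            return b
        b = L[b]["next"]
    return None


if __name__ == "__main__":
    storage = {9: b"yyyy", 7: b"xxxx"}
    FP = {0xA034: 9}
    L = {9: {"prev": None, "next": 7}, 7: {"prev": 9, "next": None}}
    print(find_in_list(FP, L, 0xA034, b"xxxx", storage.__getitem__))
```

If no block carries the fingerprint at all, the loop body never runs and no physical read takes place.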

[0066] BLOCK WRITE Operation

[0067] For the present BlockWrite operation it is assumed that a data block `blk` is to be written at a position of the HDD designated with block number `s`. For that operation, procedural steps shown in FIGS. 6A-6C are performed. It is noted that the three parts of the entire flow diagram are linked at cardinal points `B` and `C`, respectively.

[0068] In first step 600 shown in FIG. 6A, for the bit pattern of the block `blk`, a fingerprint `fn` is calculated. An appropriate method for calculating the fingerprint is the above-mentioned known CRC mechanism although other appropriate and possible techniques for computing a fingerprint will be evident to those skilled in the art. Next, the HDD position number, s, at which the block is to be written is looked up 605 in the reference table R at block position `s` and the resulting ID entry `b` is checked in the next step 610 to determine if the entry `b` is undefined (`undef`). If this condition is fulfilled (i.e. b is not defined because nothing has been previously stored for sector s) then the procedure continues with step 655 shown in FIG. 6B (through linking cardinal point B). If the condition is not fulfilled (i.e. b is already defined) then it is checked in next step 615 by means of the linkage table L if the above calculated fingerprint `fn` is identical with the fingerprint value stored in table L for the present block entry `b`.

[0069] If condition 615 is fulfilled then in step 620 the whole bit pattern of `b` is read and stored in `orig`. In the following step 625 it is then checked if the bit pattern of block `blk` is identical with the bit pattern `orig`. If so then the procedure is terminated 630 because block `blk` is already in place (blk==orig) in storage. Otherwise it is further checked 635 if the reference count `rc` for the present block `b` contained in the linkage table L is equal to `1`. If so then the bit pattern `b` of block `blk` is physically written 640 to the HDD at block position `s` and the procedure terminated 630 accordingly. Otherwise, in step 645, in the linkage table L the reference count `rc` of `b` is decreased by `1`.

[0070] Referring now back to step 615, if the fingerprint `fn` calculated by means of the linkage table L is not identical with the fingerprint value stored in table L for the present block entry `b`, it is checked in step 650, if the reference count value `rc` contained in table L for entry `b` is equal to `1`. If so, the procedure is continued with the next step linked to point `B` shown in FIG. 6B. Otherwise the reference count `rc` is decreased by `1` in following step 645.

[0071] Now referring to FIG. 6B, it is described how the above BlockWrite procedure is continued at cardinal point `B` to make entry `b` available for writing with step 655 where the reference count value of entry `b` in the linkage table L is set `0`. In next step 660 the entry `b` is removed from the list contained in the fingerprint table FP for fingerprint `fn`. The underlying procedure for the removal of entry `b` is described in more detail referring to FIG. 9B.

[0072] The following steps 665-680 surrounded by line 690 relate to a mechanism for handling physically defective blocks in a HDD and thus represent an optional but further advantageous perfecting feature of the invention. In step 665 of that optional procedure, a gray code is physically written at block `b` of the HDD. In the following step 670, that block `b` is physically read and stored temporarily as variable `aux`. In step 675 it is then checked if the data pattern temporarily stored in `aux` is equal to the original gray code. If not, the present block can be assumed to be defective and thus in the following step 680 that block is marked as defective simply by setting the reference count `rc` of that block to `-1`. Otherwise, whether or not the optional procedures indicated by line 690 are performed, the procedure continues with step 685 where a stack operation push(U, x) with x=`b` in the present case is executed, making block `b` available as an unused block. The necessary stack operations are described in detail below with reference to FIGS. 10A-10C.

[0073] In FIG. 6C it is illustrated how the presently described procedure continues at cardinal point `C`. In the first step 695 the entry for block `b` in the reference table R is set `undef` (=undefined). In the following step 700, the position of a block `blk` with fingerprint value `fn` in the list with all blocks of fingerprint value `fn` (FP[fn]) is determined. The underlying procedure for finding that position is described in more detail hereinafter referring to FIG. 9A. In following step 705 it is checked if b is undefined (`undef`) indicating, that no block in the list is identical to `blk`. If `YES`, the above described pop(U) operation is performed with the LIFO stack U in step 710 to receive a free block for storing. In the next step 715 `b` is inserted into the fingerprint table FP with the above calculated fingerprint value fn. For the details of that insertion procedure it is referred to the following description of FIG. 9C.

[0074] Similarly to preceding step 640, in present step 720 the bit pattern of block `blk` is physically written to the HDD at real block `b` accordingly. Thereafter, the reference count `rc` of `b` is set 725 to the value `1` in the linkage table L, because the block `blk` is stored for the first time on the storage device. In addition, the fingerprint value of table L is set 730 with the above calculated value fn. In the last step 735, at the position s of the reference table R, the value b is entered. Then the present procedure is terminated by step 740.

[0075] However, if the check box 705 reveals `NO`, i.e. that the entry b of the reference table R is not undefined (`undef`), then the procedure continues with step 745 where the reference count `rc` of `b` in the linkage table L is increased by `1`. The reason for this alternating path is that an already existing block `b` with content identical to `blk` was found in the list, and the new reference to that block increases the number of blocks referring to it.

[0076] Thus, in summary, the block write operation in accordance with the invention first determines the fingerprint of the block to be written and then searches to determine if a block with the same fingerprint already exists in memory. This search is performed by looking up all blocks in a doubly linked list of blocks with fingerprint fn. The first element of this list is accessed in a constant time by the array FP. FP[fn] is either `undef` (i.e. an empty list of blocks with fingerprint fn) or holds the first physical block with fingerprint fn. If a stored physical block b of content identical to blk is found, all that has to be done is to set the reference for s (R[s]=b) and increment the reference count for block b by 1.

[0077] That is, if no block is already stored which has the same fingerprint, the block to be written is not a duplicate of any other previously written block. While blocks having different content could have the same fingerprint computed for them, this screening by fingerprints reduces the number of blocks which must be considered to a list which is generally very short (and, as will be demonstrated, will only be a relatively few blocks, on average) compared to the number of blocks which can be stored in a potentially very large memory.
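The overall effect of the block write operation summarized above can be sketched as follows. This is a condensed illustration under stated assumptions, not the flow of FIGS. 6A-6C step by step: the class name `DedupStore` and its methods are hypothetical, the fingerprint is assumed to be a truncated CRC-32, each fingerprint list is held as a plain Python list instead of the doubly linked list in table L, and an overwritten block is always released and re-allocated rather than rewritten in place as in step 640. Defective-block handling and the case of an exhausted stack U are omitted. A `read` helper is included only so that the result can be checked; it returns `None` for an undefined sector where the described device would return an arbitrary bit pattern.

```python
import zlib

UNDEF = None  # stands in for the `undef` marker of the tables


class DedupStore:
    def __init__(self, num_sectors, num_real_blocks, fp_bits=24):
        self.mask = (1 << fp_bits) - 1
        self.R = [UNDEF] * num_sectors          # reference table: sector -> real block
        self.FP = {}                            # fingerprint -> list of real blocks
        self.rc = [0] * num_real_blocks         # reference count per real block
        self.fn = [UNDEF] * num_real_blocks     # fingerprint per real block
        self.disk = [UNDEF] * num_real_blocks   # simulated physical storage
        self.U = list(range(num_real_blocks))   # LIFO stack of unused real blocks

    def _fingerprint(self, blk):
        return zlib.crc32(blk) & self.mask      # assumed fingerprint function

    def _release(self, s):
        """Drop sector s's reference; free the real block if it was the last one."""
        b = self.R[s]
        if b is UNDEF:
            return
        self.R[s] = UNDEF
        self.rc[b] -= 1
        if self.rc[b] == 0:
            self.FP[self.fn[b]].remove(b)
            self.fn[b] = UNDEF
            self.U.append(b)                    # push(U, b)

    def write(self, s, blk):
        b = self.R[s]
        if b is not UNDEF and self.disk[b] == blk:
            return                              # block is already in place
        self._release(s)
        fn = self._fingerprint(blk)
        for cand in self.FP.get(fn, []):        # short list with the same fingerprint
            if self.disk[cand] == blk:          # full comparison rules out collisions
                self.rc[cand] += 1
                self.R[s] = cand
                return
        b = self.U.pop()                        # pop(U): take a free real block
        self.disk[b] = blk
        self.rc[b] = 1
        self.fn[b] = fn
        self.FP.setdefault(fn, []).insert(0, b) # prepend to the fingerprint list
        self.R[s] = b

    def read(self, s):
        b = self.R[s]
        return UNDEF if b is UNDEF else self.disk[b]
```

Writing the same 2048-byte pattern to two different sectors of such a store leaves both entries of `R` pointing at one real block whose reference count is 2, while only one block is taken from `U`.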

[0078] BLOCK READ Operation

[0079] It is now assumed accordingly that a data block with block number `s` is to be read from the storage device. The following are the steps sufficient for that block read operation in accordance with the preferred embodiment of the present invention.

[0080] In step 800 (FIG. 7) it is checked if the entry at position `s` is undefined (`undef`). If so, in step 805, an arbitrary bit pattern is returned. Otherwise, in step 810, the block `blk` at the position `s` of the reference table R is physically read and returned.

[0081] HIGH-LEVEL FORMATTING Operation

[0082] Referring now to FIG. 8, a preferred embodiment of a procedure for high-level formatting a HDD is described in detail by way of the depicted flow diagram. This procedure serves to initialize an HDD for applying the HDD operation method according to the invention.

[0083] In a first step 900 (FIG. 8), for all sectors s of the HDD, the corresponding entries of the reference table R are set undefined (`undef`), namely all entries of R. Then, in the fingerprint table FP, for all possible fingerprint values fn, FP[fn] is set 905 undefined (`undef`). In step 910, the LIFO stack U is initialized as an empty stack. In the following step 915, for all remaining real blocks b contained in the area 105 shown in FIG. 1A, the above described push operation, as shown too in FIG. 8, is applied for x=b. In the final step 920 of the present formatting procedure, for all real blocks b, the corresponding entries for the parameters previous block `prev`, next block `next` and fingerprint value `fn` contained in the linkage table L are set undefined (`undef`) wherein the entry for the parameter reference count `rc` is set `0`.
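The formatting steps 900-920 can be sketched as a single initialization routine. The function name `high_level_format`, its parameters and the use of Python lists and dictionaries for the tables R, FP, U and L are assumptions made for illustration only.

```python
UNDEF = None  # stands in for the `undef` marker


def high_level_format(num_sectors, num_real_blocks, fp_bits):
    """Return freshly initialized tables R, FP, U and L."""
    R = [UNDEF] * num_sectors            # step 900: every sector is undefined
    FP = [UNDEF] * (1 << fp_bits)        # step 905: every fingerprint list is empty
    U = []                               # step 910: empty LIFO stack
    for b in range(num_real_blocks):     # step 915: push every real block as unused
        U.append(b)
    L = [                                # step 920: reset all linkage entries
        {"prev": UNDEF, "next": UNDEF, "fn": UNDEF, "rc": 0}
        for _ in range(num_real_blocks)
    ]
    return R, FP, U, L


R, FP, U, L = high_level_format(num_sectors=16, num_real_blocks=8, fp_bits=8)
```

A small fingerprint width is used in the call above only to keep the example table small; the width is a free parameter of the sketch.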

[0084] FINDING THE POSITION OF A BLOCK IN A LIST Operation

[0085] According to the preferred embodiment illustrated by way of the flow diagram depicted in FIG. 9A, the procedure starts with step 1000 where it is checked if a given fingerprint entry of the fingerprint table FP is undefined (`undef`). If `yes` then it is returned 1005 `undef` since the list of blocks with fingerprint fn is empty in this case. Otherwise, the first block of the list of blocks with fingerprint fn 1010 is denoted by `b`. In the following step 1015 the block `b` is physically read and temporarily stored as variable `orig`. Then it is checked 1020 if the bit patterns of blk and orig are identical. If so, then the block ID `b` is returned 1025. Otherwise, the next block stored in column `next` of the linkage table `L` for present block `b` is set 1030 as a new block `b`. Thereafter it is checked 1035 if the new block `b` is undefined (`undef`) (i.e. the list is completely traversed). If so `undef` is returned 1040. Otherwise it is jumped back to step 1015 and this step executed for a next block `b`. Thus the block ID of a block in the list identical to `blk` is returned, if one exists, and otherwise `undef` is returned as an indication of the non-existence of such a block.
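The traversal of FIG. 9A can be sketched as a short loop. The function name `find_block` and the dictionary-based stand-ins for the tables FP and L and for the physical storage (`disk`) are assumptions for illustration; the step numbers in the comments refer to the flow diagram described above.

```python
UNDEF = None  # stands in for the `undef` marker


def find_block(blk, fn, FP, L, disk):
    """Return the ID of a stored block identical to blk, or UNDEF if none exists."""
    b = FP.get(fn, UNDEF)        # steps 1000/1010: first block of the list, if any
    while b is not UNDEF:        # step 1035: stop once the list is fully traversed
        orig = disk[b]           # step 1015: physically read block b
        if orig == blk:          # step 1020: compare complete bit patterns
            return b             # step 1025
        b = L[b]["next"]         # step 1030: follow the `next` column of table L
    return UNDEF                 # steps 1005/1040


# Two real blocks, 5 and 7, assumed to share the (arbitrary) fingerprint 42.
disk = {5: b"x" * 2048, 7: b"y" * 2048}
L = {5: {"prev": UNDEF, "next": 7}, 7: {"prev": 5, "next": UNDEF}}
FP = {42: 5}
```

In this example the lookup for the pattern held in block 7 reads block 5 first, finds a mismatch, and then follows the `next` link.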

[0086] REMOVING A BLOCK FROM A LIST Operation

[0087] In the preferred embodiment illustrated in FIG. 9B, the procedure for removing a block `b` with a fingerprint value `fn` from the linkage table L starts with checking 1100 in the fingerprint table `FP` if block `b` is the first block in that list. If so then the fingerprint value `fn` of the next block contained in linkage table `L` is fetched and in the fingerprint table `FP` set 1105 as first block with that `fn` value. Thereafter in the linkage table `L` the corresponding entry with that `fn` value is fetched and the corresponding `prev` value set 1110 `undef`. In the following steps 1115 and 1120 the `next` and `prev` values of the present block `b` are both set `undef`.

[0088] Referring back to step 1100, if the current block `b` is not the first block in the list, it is jumped to the entry for the previous block `prev` of present block `b` in the linkage table `L` and the next block `next` entry for that entry is set 1125 as the next block `next` for the current block `b` in the `L` table. In the next step 1130 following in the path it is then checked if the status of the current entry set in step 1125 is `undef`. If so, this path is continued with step 1115 followed by step 1120 as described beforehand. Otherwise, an intermediate step 1135 is executed where it is jumped to the entry for the next block `next` of present block `b` in the linkage table `L` and the previous block `prev` entry for that entry is set 1125 as the previous block `prev` for the current block `b` in the `L` table.

[0089] PREPEND `B` TO LIST WITH FINGERPRINT FN

[0090] As illustrated by the preferred embodiment for this procedure for insertion of a block `b` having a fingerprint value `fn` into a linkage table `L` and a fingerprint table `FP` shown in FIG. 9C, it is first checked 1200 if the underlying entry for `fn` in the fingerprint table `FP` is `undef`. If so then the next block entry and the previous block entry of the `L` table for `b` are both set 1205, 1210 `undef`. After this `b` is inserted with its fingerprint value `fn` in the `FP` table and the procedure is terminated 1220, i.e. the list consists of block `b` in this case only.

[0091] If the condition in step 1200 is not fulfilled (the list of blocks with fingerprint `fn` is not empty) then the procedure continues with step 1225 of a second path where the present fingerprint value `fn` gathered from the fingerprint table `FP` is set for next block `next` contained in the linkage table `L`. Thereafter the previous block `prev` contained in the linkage table `L` for that fingerprint is set 1230 `b` (b is pre-pended to the list FP[fn]).
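The removal and prepend operations on a fingerprint list can be sketched with conventional doubly linked list handling. This is an illustrative reading of FIGS. 9B and 9C rather than a transcription of the individual steps; the function names `prepend`, `remove` and `chain`, and the dictionary stand-ins for the tables FP and L, are hypothetical. Both `prepend` and `remove` touch only a fixed number of table entries, so each runs in constant time.

```python
UNDEF = None  # stands in for the `undef` marker


def prepend(b, fn, FP, L):
    """Make block b the first element of the list for fingerprint fn."""
    head = FP.get(fn, UNDEF)
    L[b]["prev"] = UNDEF
    L[b]["next"] = head              # UNDEF when the list was empty
    if head is not UNDEF:
        L[head]["prev"] = b          # old first block now follows b
    FP[fn] = b


def remove(b, fn, FP, L):
    """Unlink block b from the list for fingerprint fn."""
    prev, nxt = L[b]["prev"], L[b]["next"]
    if FP.get(fn, UNDEF) == b:       # b is the first block of the list
        FP[fn] = nxt
    else:
        L[prev]["next"] = nxt
    if nxt is not UNDEF:
        L[nxt]["prev"] = prev
    L[b]["prev"] = L[b]["next"] = UNDEF


def chain(fn, FP, L):
    """Helper for inspection only: list the block IDs in list order."""
    out, b = [], FP.get(fn, UNDEF)
    while b is not UNDEF:
        out.append(b)
        b = L[b]["next"]
    return out


L = {b: {"prev": UNDEF, "next": UNDEF} for b in (1, 2, 3)}
FP = {}
for blk_id in (1, 2, 3):
    prepend(blk_id, 7, FP, L)        # 7 is an arbitrary fingerprint value
```

Because every insertion is a prepend, the most recently stored block with a given fingerprint is always the first one examined by a later search.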

[0092] It is important to note that, of all the above-described operations, only the block write operation requires more than a constant time and that the block read operation only has the small and constant additional processing burden of following the reference R[s]. Therefore, the effect of the invention on memory input/output rates is very slight while optimally reducing memory occupancy by eliminating all duplication of blocks of stored data with a granularity potentially much smaller than files.

[0093] It is also noteworthy that the above described tables R, FP, L and the LIFO stack U, in part or even all, can be implemented in a static approach with predefined size or in a dynamic approach where the size is dynamically adapted to the actual storage requirements for storing the corresponding necessary data. The above described search procedures for finding data duplicates on storage sector level can be implemented by way of a known indexing mechanism in order to enhance overall processing performance of the described storage management mechanism.

[0094] In summary, it is clearly seen that the invention provides for optimally reduced occupancy of memory devices with minimal penalty in processing burden or use of storage. For example in case c discussed above in regard to a 100 GB HDD, providing 50,000,000 blocks of 2048 byte storage, if working with 24-bit fingerprints, then even on a fully written memory of totally different blocks, the average length of a list of blocks having the same fingerprint will be 50,000,000/2^24=2.9802, or, on average, less than three blocks which must be read to determine if a block to be stored is a duplicate of a block previously written. This meritorious effect is increased with increasing memory capacity since the average number of blocks which must be read remains relatively small while the memory capacity may greatly increase and the difference between the number of blocks of storage and the number of blocks which must be read to identify or disprove the presence of a block blk becomes increasingly great. Further, although the invention has been described for a hard disk drive (HDD) only, it is understood hereby, that the invention can be applied accordingly in a tape storage or semiconductor storage or any CPU-based storage using block memory devices, if that storage comprises segmentation into blocks as described beforehand.
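The average list length quoted above follows from dividing the number of stored blocks by the number of possible fingerprint values. A minimal sketch of that arithmetic, assuming fingerprints are spread evenly over all values (the function name `average_list_length` is hypothetical):

```python
def average_list_length(num_blocks: int, fp_bits: int) -> float:
    """Mean number of blocks per fingerprint list for evenly spread fingerprints."""
    return num_blocks / (1 << fp_bits)


# 50,000,000 blocks of 2048 bytes with 24-bit fingerprints, as in the example.
avg = average_list_length(50_000_000, 24)   # about 2.98
```

Under the same assumption, each additional fingerprint bit halves the average list length, at the cost of doubling the number of entries in the fingerprint table FP.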

[0095] Further, the invention can also be implemented either in a storage area network (SAN) or a network attached storage (NAS) environment. A SAN is a high-speed special purpose digital network that interconnects different kinds of data storage devices with associated data servers on behalf of a larger network of users. Typically, a storage area network is part of the overall network of computing resources for an enterprise. A storage area network is usually clustered in close proximity to other computing resources such as IBM S/390 mainframe computers but may also extend to remote locations for backup and archival storage, using wide area network carrier technologies such as asynchronous transfer mode or synchronous optical networks.

[0096] A NAS is a hard disk storage that is set up with its own network address rather than being attached to the department computer that is serving applications to a network's workstation users. By removing storage access and its management from the department server, both application programming and files can be served faster because they are not competing for the same processor resources. The network-attached storage device is attached to a local area network (typically, an Ethernet network, the most widely-installed local area network (LAN) technology) and assigned an IP address. File requests are mapped by the main server to the NAS file server. A network-attached storage consists of hard disk storage, including multi-disk RAID (redundant array of independent disks) systems, and software for configuring and mapping file locations to the network-attached device. Network-attached storage can be a step toward and included as part of the above mentioned more sophisticated storage system known as SAN.

[0097] In these environments, as pointed out in FIG. 1b, the sector table (including the above described tables) is separated physically from the sector storage, i.e. both are implemented on different disk storage devices (e.g. HDDs). Hereby it is enabled to implement a large sector table that is used to access sector storages arranged in a stack of other HDDs. It is mentioned hereby that today's HDD controllers are able to manage 100 or even more HDDs. The mentioned stack of sector storage HDDs in case of need can be extended easily insofar as the sector table arranged on the first HDD has only to be enlarged.

[0098] It is further to be noted that the sector table, in another embodiment, can also be arranged in a solid-state random access memory (RAM) thus enhancing processing speed for managing the sector table.

[0099] Thereupon it is noteworthy that although the underlying storage device is an HDD storage in the present embodiment, the concepts and mechanisms described hereinafter can also be applied to other types of storage devices like semiconductor-based storages.

* * * * *