U.S. patent application number 16/670458, for bloom filter partitioning, was published by the patent office on 2020-02-27 as publication number 20200065306.
The applicant listed for this patent is Maginatics LLC. The invention is credited to Shrinand Javadekar, Julio Lopez, and Thomas Manville.

Publication Number | 20200065306
Application Number | 16/670458
Family ID | 68766407
Publication Date | 2020-02-27
Filed Date | 2019-10-31
United States Patent Application | 20200065306
Kind Code | A1
Manville; Thomas; et al. | February 27, 2020

BLOOM FILTER PARTITIONING
Abstract
A partitioned Bloom filter is disclosed. In various embodiments,
a representation of an item is received. The representation is used
to determine a partition with which the item is associated. A
partition-specific Bloom filter is used to determine at least in
part whether the item may be an element of a set with which the
partition is associated.
Inventors: | Manville; Thomas (Mountain View, CA); Lopez; Julio (Mountain View, CA); Javadekar; Shrinand (Sunnyvale, CA)
Applicant: | Maginatics LLC, Mountain View, CA, US
Family ID: | 68766407
Appl. No.: | 16/670458
Filed: | October 31, 2019
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
14675476 | Mar 31, 2015 | 10503737
16670458 | |
Current U.S. Class: | 1/1
Current CPC Class: | G06F 16/24554 20190101; G06F 16/2455 20190101
International Class: | G06F 16/2455 20060101 G06F016/2455
Claims
1. A method, comprising: receiving, by one or more processors, a
representation of an item; using, by one or more processors, the
representation to determine a particular partition with which the
item is associated, wherein the particular partition with which the
item is associated is one of a plurality of partitions
corresponding to a volume that comprises the item, and each of the
plurality of partitions has a corresponding partition-specific
Bloom filter; determining, by one or more processors, whether the
item is an element of a set with which the particular partition is
associated; dynamically determining to partition or resize the
partition-specific Bloom filter based at least in part on a
computed probability of the partition-specific Bloom filter
rendering a false positive; and in response to determining to
partition or resize the partition-specific Bloom filter,
partitioning or resizing the partition-specific Bloom filter
independent of one or more other partition-specific Bloom filters
corresponding to one or more other partitions of the plurality of
partitions.
2. The method of claim 1, wherein the determining of whether the
item is an element in the set comprises: checking a
partition-specific Bloom filter corresponding to the particular
partition to determine if the item is an element of the set; in
response to determining that the partition-specific Bloom filter
indicates that the item is not an element of the set, determining
that the item is not an element of the set; and in response to
determining that the partition-specific Bloom filter indicates that
the item is an element of the set, querying a table associated with
the set for the representation of the item, and determining that
the item is an element of the set in the event that the querying of
the table associated with the set indicates that the set includes
the item.
3. The method of claim 1, wherein the computed probability is based
at least in part on one or more of a filter size and a number of
elements in the particular partition corresponding to the
partition-specific Bloom filter.
4. The method of claim 1, wherein the Bloom filter comprises a
counting Bloom filter.
5. The method of claim 1, wherein the particular partition
comprises a subset of the set.
6. The method of claim 1, wherein the representation comprises a
hash.
7. The method of claim 1, wherein the item comprises a chunk
included in a set of one or more chunks into which a file has been
segmented.
8. The method of claim 1, wherein the item comprises a chunk of
data and the representation comprises a hash of the chunk of data.
9. The method of claim 1, further comprising determining a number
of partitions to associate with the set.
10. The method of claim 9, further comprising determining for one
or more of the plurality of partitions an initial size of a
corresponding partition-specific Bloom filter.
11. The method of claim 1, wherein the particular partition
comprises a first partition; and further comprising resizing the
partition-specific Bloom filter associated with the first partition
without affecting operation of one or more other partition-specific
Bloom filters associated with the one or more other partitions of the
plurality of partitions.
12. The method of claim 1, further comprising determining to
rebuild the particular partition based at least in part on a count
reflecting a number of items that have been removed from the
particular partition.
13. The method of claim 1, further comprising: dynamically
determining whether to partition or resize the partition-specific
Bloom filter based at least in part on a number of observed false
positive results with respect to the partition-specific Bloom
filter.
14. The method of claim 1, wherein: the one or more other
partition-specific Bloom filters corresponding to the one or more
other partitions of the plurality of partitions are responsive to
queries during the partitioning or resizing of the
partition-specific Bloom filter.
15. A system, comprising: a processor configured to: receive a
representation of an item; use the representation to determine a
particular partition with which the item is associated, wherein the
particular partition with which the item is associated is one of a
plurality of partitions corresponding to a volume that comprises
the item, and each of the plurality of partitions has a
corresponding partition-specific Bloom filter; determine whether
the item is an element of a set with which the particular partition
is associated, wherein the set comprises one or more objects stored
in a distributed file system; dynamically determine to partition or
resize the partition-specific Bloom filter based at least in part
on a computed probability of the partition-specific Bloom filter
rendering a false positive; in response to determining to partition
or resize the partition-specific Bloom filter, partition or resize
the partition-specific Bloom filter independent of one or more
other partition-specific Bloom filters corresponding to the one or
more other partitions of the plurality of partitions; and a storage
device coupled to the processor and configured to store the
partition-specific Bloom filter.
16. A computer program product embodied in a non-transitory
computer readable storage medium and comprising computer
instructions for: receiving a representation of an item; using the
representation to determine a particular partition with which the
item is associated, wherein the particular partition with which the
item is associated is one of a plurality of partitions
corresponding to a volume that comprises the item, and each of the
plurality of partitions has a corresponding partition-specific
Bloom filter; determining, by one or more processors, whether the
item is an element of a set with which the particular partition is
associated, wherein the set comprises one or more objects stored in
a distributed file system; dynamically determining to partition or
resize the partition-specific Bloom filter based at least in part
on a computed probability of the partition-specific Bloom filter
rendering a false positive; and in response to determining to
partition or resize the partition-specific Bloom filter,
partitioning or resizing the partition-specific Bloom filter
independent of one or more other partition-specific Bloom filters
corresponding to one or more other partitions of the plurality of
partitions.
Description
CROSS REFERENCE TO OTHER APPLICATIONS
[0001] This application is a continuation of co-pending U.S. patent
application Ser. No. 14/675,476, entitled BLOOM FILTER PARTITIONING,
filed Mar. 31, 2015, which is incorporated herein by reference for
all purposes.
BACKGROUND OF THE INVENTION
[0002] Bloom filters provide a space efficient way to store data
that can be used to test whether an element is a member of a set. A
Bloom filter may comprise a bit array of m bits. Some number k of hash
functions may be used to map a given item to a corresponding one or
more locations in the array. For example, an element A may be
mapped to a filter location by computing the hash of the element A
modulo the size of the array. As an element is added to the set,
the corresponding bits may be set, e.g., by changing an
initial/default value of "0" to "1".
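As a concrete illustration, a minimal Bloom filter along the lines described above can be sketched in Python. The salted-SHA-256 scheme for deriving the k positions is one illustrative choice, not a scheme specified by any particular embodiment:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m          # number of bits in the array
        self.k = k          # number of hash functions
        self.bits = [0] * m

    def _positions(self, item: str):
        # Derive k array positions by hashing the item with k different
        # salts and taking each digest modulo the array size.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item: str) -> None:
        # Adding an element sets its corresponding bits from 0 to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def may_contain(self, item: str) -> bool:
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))
```

For example, after `bf = BloomFilter(1024, 3); bf.add("chunk-a")`, a later `bf.may_contain("chunk-a")` is guaranteed to return True, while an empty filter returns False for everything.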
[0003] When a Bloom filter is used to determine membership in a
set, false positives are possible, since for two or more different
items the respective hash values modulo the array size may be the
same. However, false negatives are not possible, since if the
element is already a member of the set the corresponding bit(s) in
the filter would be found to have been set.
[0004] In some applications, a Bloom filter may be used to
determine whether an element is already in a set. If the filter
result is positive, a further query, e.g., of a database table, may
be performed to determine conclusively whether the element is in
the set. If the filter result is negative, the database query does
not need to be performed.
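The filter-then-query pattern just described might be sketched as follows. The `StubBloom` helper and the set-backed table are hypothetical stand-ins for a real filter and database table:

```python
def is_member(rep: str, bloom, table) -> bool:
    """Consult the Bloom filter first; only query the table on a positive."""
    if not bloom.may_contain(rep):
        return False      # negative: definitely not in the set, skip the query
    return rep in table   # positive: may be a false positive, so confirm

class StubBloom:
    """Hypothetical stand-in whose positives are listed explicitly."""
    def __init__(self, positives):
        self.positives = positives

    def may_contain(self, rep):
        return rep in self.positives
```

Here a false positive from the filter costs one extra table lookup, while a negative avoids the lookup entirely.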
[0005] Typically, for an array of a given size, the probability of
false positives increases the more elements that are added to the
set. Typically, the false positive probability increases at a
specific, calculable rate. The false positive rate can be reduced
by increasing the size of the array, but typically resizing
requires that the entire filter be rebuilt, e.g., by iterating over
the elements in the set to populate the newly-resized filter array.
For a set having a very large number of elements, the time,
computing, and other resources required to rebuild the filter after
resizing may be prohibitive.
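The "specific, calculable rate" referred to above is commonly approximated as p = (1 - e^(-kn/m))^k for an m-bit array, k hash functions, and n inserted elements; a short sketch:

```python
import math

def false_positive_probability(m: int, k: int, n: int) -> float:
    """Approximate false-positive rate of a Bloom filter with an m-bit
    array, k hash functions, and n elements inserted so far."""
    return (1.0 - math.exp(-k * n / m)) ** k
```

The probability climbs as n grows for a fixed m, and falls again if m is enlarged, which is exactly the resize that forces the expensive rebuild described above.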
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] Various embodiments of the invention are disclosed in the
following detailed description and the accompanying drawings.
[0007] FIG. 1 is a block diagram illustrating an embodiment of a
distributed file system and environment.
[0008] FIG. 2 is a block diagram illustrating an embodiment of a
client system.
[0009] FIG. 3 is a block diagram illustrating an embodiment of a
distributed file system.
[0010] FIG. 4 is a flow chart illustrating an embodiment of a
process to store a file or other file system object in a
distributed file system.
[0011] FIG. 5 is a flow chart illustrating an embodiment of a
process to handle a request to store a file or other file system
object in a distributed file system.
[0012] FIG. 6 is a flow chart illustrating an embodiment of a
process to store file segment or "chunk" data associated with a
distributed file system.
[0013] FIG. 7 is a flow chart illustrating an embodiment of a
process to access a file or other file system object stored in a
distributed file system.
[0014] FIG. 8 is a flow chart illustrating an embodiment of a
process to handle a request to access a file or other file system
object stored in a distributed file system.
[0015] FIG. 9 is a block diagram illustrating an example set of
file system metadata tables used in an embodiment of a distributed
file system.
[0016] FIG. 10 is a block diagram illustrating an example of a
Bloom filter used in an embodiment of a distributed file
system.
[0017] FIG. 11 is a flow chart illustrating an embodiment of a
process to use a Bloom filter to determine whether a chunk
comprising file data has already been stored.
[0018] FIG. 12 is a block diagram illustrating an embodiment of a
partitioned Bloom filter used in embodiments of a de-duplicating
file system.
[0019] FIG. 13 is a flow chart illustrating an embodiment of a
process to create and maintain a partitioned Bloom filter.
[0020] FIG. 14 is a flow chart illustrating an embodiment of a
process to use a partitioned Bloom filter to determine whether a
chunk comprising file data has already been stored.
[0021] FIG. 15 is a flow chart illustrating an embodiment of a
process to determine whether and/or when to resize/rebuild a
component filter of a partitioned Bloom filter.
[0022] FIG. 16 is a flow chart illustrating an embodiment of a
process to determine whether and/or when to split a component
filter of a partitioned Bloom filter.
DETAILED DESCRIPTION
[0023] The invention can be implemented in numerous ways, including
as a process; an apparatus; a system; a composition of matter; a
computer program product embodied on a computer readable storage
medium; and/or a processor, such as a processor configured to
execute instructions stored on and/or provided by a memory coupled
to the processor. In this specification, these implementations, or
any other form that the invention may take, may be referred to as
techniques. In general, the order of the steps of disclosed
processes may be altered within the scope of the invention. Unless
stated otherwise, a component such as a processor or a memory
described as being configured to perform a task may be implemented
as a general component that is temporarily configured to perform
the task at a given time or a specific component that is
manufactured to perform the task. As used herein, the term
`processor` refers to one or more devices, circuits, and/or
processing cores configured to process data, such as computer
program instructions.
[0024] A detailed description of one or more embodiments of the
invention is provided below along with accompanying figures that
illustrate the principles of the invention. The invention is
described in connection with such embodiments, but the invention is
not limited to any embodiment. The scope of the invention is
limited only by the claims and the invention encompasses numerous
alternatives, modifications and equivalents. Numerous specific
details are set forth in the following description in order to
provide a thorough understanding of the invention. These details
are provided for the purpose of example and the invention may be
practiced according to the claims without some or all of these
specific details. For the purpose of clarity, technical material
that is known in the technical fields related to the invention has
not been described in detail so that the invention is not
unnecessarily obscured.
[0025] Partitioning a set or set space into two or more partitions
and providing a separate, partition-specific Bloom filter for each
partition is disclosed. In various embodiments, set membership may
be determined at least in part by using the partition-specific
Bloom filters. In some embodiments, the computed false positive
probability or other criteria may be used to determine to resize
and rebuild a partition-specific Bloom filter. In various
embodiments, partition-specific Bloom filters may be resized and/or
rebuilt independently of other partition-specific Bloom filters
associated with other partitions, enabling such other
partition-specific Bloom filters to remain available for use. In
various embodiments, techniques disclosed herein may be used in
connection with a variety of different types of Bloom filter,
including without limitation a counting Bloom filter.
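One way the partitioned arrangement might look in code is sketched below. This is an illustrative sketch, not the patented implementation: the class and parameter names, the salted-SHA-256 hashing, the per-partition member sets standing in for the authoritative backing table, and the doubling resize policy are all assumptions made for the example.

```python
import hashlib
import math

def _hash(data: str) -> int:
    return int(hashlib.sha256(data.encode()).hexdigest(), 16)

class _Bloom:
    """Per-partition filter: an m-bit array probed at k salted hash positions."""

    def __init__(self, m: int, k: int = 3):
        self.m, self.k = m, k
        self.bits = [0] * m
        self.count = 0  # elements inserted, used for the computed FP probability

    def _positions(self, rep: str):
        return (_hash(f"{i}:{rep}") % self.m for i in range(self.k))

    def add(self, rep: str) -> None:
        for p in self._positions(rep):
            self.bits[p] = 1
        self.count += 1

    def may_contain(self, rep: str) -> bool:
        return all(self.bits[p] for p in self._positions(rep))

    def fp_probability(self) -> float:
        # Standard approximation: (1 - e^(-kn/m))^k.
        return (1.0 - math.exp(-self.k * self.count / self.m)) ** self.k

class PartitionedBloomFilter:
    def __init__(self, num_partitions: int, initial_bits: int,
                 fp_limit: float = 0.01):
        self.filters = [_Bloom(initial_bits) for _ in range(num_partitions)]
        # Per-partition member sets: a stand-in for the backing table, so a
        # single partition's filter can be rebuilt from its own elements.
        self.members = [set() for _ in range(num_partitions)]
        self.fp_limit = fp_limit

    def _partition(self, rep: str) -> int:
        # The item's representation (e.g., its hash) selects the partition.
        return _hash(rep) % len(self.filters)

    def add(self, rep: str) -> None:
        i = self._partition(rep)
        self.members[i].add(rep)
        self.filters[i].add(rep)
        # Resize only this partition's filter when its computed false
        # positive probability crosses the limit; the other partitions'
        # filters remain in service untouched.
        if self.filters[i].fp_probability() > self.fp_limit:
            self._resize(i)

    def _resize(self, i: int) -> None:
        bigger = _Bloom(self.filters[i].m * 2)
        for rep in self.members[i]:  # rebuild from this partition only
            bigger.add(rep)
        self.filters[i] = bigger

    def may_contain(self, rep: str) -> bool:
        return self.filters[self._partition(rep)].may_contain(rep)
```

The key point of the sketch is that `_resize` touches one partition's filter and one partition's elements; every other partition-specific filter stays available for queries throughout.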
[0026] FIG. 1 is a block diagram illustrating an embodiment of a
distributed file system and environment. In the example shown, the
distributed file system environment 100 includes a plurality of
client systems and/or devices, represented in FIG. 1 by clients
102, 104, and 106. In the example shown, the clients connect
(wireless or otherwise) to a network 108, e.g., one or more of a
local area network (LAN), a wide area network (WAN), the Internet,
and/or one or more other public and/or private networks. The
clients have access via network 108 to a file system metadata
server 110. Applications on the respective clients, such as clients
102, 104, and 106, make file system calls, which result in various
embodiments in corresponding remote calls being made to file system
metadata server 110. For example, a file system client, agent, or
other entity on the client may intercept or otherwise receive calls
by the application to a local (e.g., native) file system, and may
redirect such calls to an agent configured to make corresponding
remote calls to file system metadata server 110 (e.g.,
transparently to the application).
[0027] In the example shown, data comprising objects stored in the
file system, such as files, is stored in a cloud-based object store
112. In some embodiments, files may be segmented into a plurality
of segments or "chunks", each of which is stored in a corresponding
location in the cloud-based object store. File system calls are
made to file system metadata server 110, which stores file system
metadata in a file system metadata storage 114, e.g., in a database
or other data store. File system metadata server 110 may store in
file system metadata store 114, for example, a segment or "chunk"
map for each file or other object stored and represented in the
file system. For example, for each file name (e.g., pathname) the
file system metadata server 110 may store in a corresponding
segment map a hash or other representation of each segment, and for
each a corresponding location in which the segment is (or is to be)
stored in cloud-based object store 112. Other file system metadata,
such as metadata typically stored by a file system, may be stored
by file system metadata server 110 in file system metadata store
114. Examples include, without limitation, a directory, file, or
other node/object name; an identification of parent and/or child
nodes; a creation time; a user that created and/or owns the object;
a time last modified and/or other time; an end-of-file (EOF) or
other value indicative of object size; security attributes such as
a classification, access control list, etc.; and/or other file
system metadata.
[0028] While in the example shown in FIG. 1 the file system
metadata server 110 and the cloud-based object store 112 are shown
as separate systems, located in different networks and/or physical
locations, in other embodiments the file system metadata and file
system content data may be stored together, e.g., both on
cloud-based resources and/or both on enterprise or other network
servers, etc.
[0029] FIG. 2 is a block diagram illustrating an embodiment of a
client system. In the example shown, the client system/device 102
of FIG. 1 is shown to include an application 202 running in an
environment provided by an operating system 204. The operating
system 204 includes a kernel (not shown) and other components
configured to provide services and/or functionality to applications
such as application 202. For example, operating system 204 may
include and/or be configured to provide access to a native file
system (not shown) of client system 102. Application 202 may be
configured to make file system calls to the native file system,
e.g., to store files or other objects created by/using application
202, to modify, move, or delete such objects, etc. In the example
shown, file system calls made by application 202, represented in
FIG. 2 by the downward pointing arrow originating in the block
labeled "app" (202), are intercepted by a kernel module (or other
component) 206 and redirected to a file system client (or other
file system agent) 208. In some embodiments, file system agent 208
comprises a client application running in user space. In some
embodiments, file system agent 208 comprises a kernel or other
operating system component or module. File system client 208 in
this example has associated therewith a local cache 210. In various
embodiments, cache 210 may be used to buffer and/or otherwise stage
file data prior to its being sent to remote storage (e.g.,
cloud-based object store 112 of FIG. 1), and/or to facilitate
access to data stored previously but to which access may be
requested later.
[0030] The client system 102 includes a network communication
interface 212 that provides network connectivity, e.g., to a
network such as network 108 of FIG. 1. For example, a request from
app 202 to access a file stored remotely in various embodiments may
result in file system client 208 making a remote call, via network
communication interface 212, for example to a file system metadata
server such as server 110 of FIG. 1.
[0031] In various embodiments, file system client 208 may be
configured to store in a metadata write buffer comprising or
otherwise associated with file system client 208 and/or cache 210
one or more file system operations and/or requests affecting file
system metadata comprising a portion of the file system metadata
with respect to which a file system metadata write lease is held by
file system client 208. For example, file system operations
affecting metadata may be buffered as received, e.g., as a result
of local file system calls by applications such as application 202
of FIG. 2, and may be communicated to the remote file system
metadata server asynchronously and/or upon occurrence of an event,
e.g., receipt of an indication that a metadata write lease "break"
event has been received and/or has occurred. For example, a second
client system may indicate a desire and need to perform operations
affecting a portion of the file system metadata with respect to
which a first client system holds a lease, resulting in a "break"
communication being sent to the first client system, which in turn
"flushes" at least those operations in the buffer that affect the
portion of metadata with respect to which the lease had been
held.
[0032] FIG. 3 is a block diagram illustrating an embodiment of a
distributed file system. In the example shown, client 102
communicates via a secure session-based connection 302 with file
system metadata server 110. In addition, client 102 communicates
with cloud-based object store 112 via a TCP/IP or other connection
that enables client 102 to store objects (e.g., file segments or
"chunks") via HTTP "PUT" requests and to retrieve segments
("chunks") via HTTP "GET" requests. In various embodiments, client
102 (e.g., a file system client or other agent running on client
102) sends and receives distributed file system "control plane"
communications via secure connection 302 (e.g., file system
operations that change or require the processing and/or use of file
system metadata), whereas communications sent via connection 304 may
be considered to comprise a "data plane" via which file system
object data (i.e., segments or "chunks") may be stored and/or
retrieved. In the example shown, file system metadata server 110
has access to active directory 306, which in various embodiments
may comprise information usable to authenticate users of clients
such as client 102.
[0033] In various embodiments, file system objects, such as files,
may be stored by a client on which a distributed file system client
or other agent has been installed. Upon receiving a request to
store (or modify) a file system object, in various embodiments the
file system client segments the object into one or more segments or
"chunks" and computes a reference (e.g., a hash) for each. The
references are included in a file system request sent to the file
system metadata server, e.g., via a secure connection such as
connection 302 of FIG. 3. The file system metadata server returns
information to be used by the file system client to store
(non-duplicate) segments/chunks in the cloud-based object store by
sending the segment data directly to the cloud-based object store,
e.g., via PUT requests sent via a connection such as connection 304
of FIG. 3.
[0034] FIG. 4 is a flow chart illustrating an embodiment of a
process to store a file or other file system object in a
distributed file system. In various embodiments, the process of
FIG. 4 may be performed on a client system or device, e.g., by a
file system client or other agent running on the client
system/device, such as file system client 208 of FIG. 2. In the
example shown, a request is received, e.g., from an application, to
store a file (402). The file is segmented into one or more segments
(404). For each segment, a segment reference, e.g., a hash, is
computed (406). A file write request that includes the segment
references is sent to the file system metadata server (408). A set
of uniform resource identifiers (URIs) or other pointers is
received from the file system metadata server (410). In various
embodiments, the set of pointers may include pointers only for
those segments not already stored by the distributed file system.
The received pointers are used to store segments, e.g., via HTTP
"PUT" requests sent directly to the cloud-based object store
(412).
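The client-side flow of FIG. 4 might be sketched roughly as follows. The 4 MiB chunk size and the `write_request`/`put` interfaces are hypothetical placeholders for the metadata server and object store APIs, which are not specified here:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # illustrative fixed segment size

def segment(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split file data into fixed-size segments (step 404)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def references(chunks):
    """Compute a hash reference for each segment (step 406)."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]

def store_file(data: bytes, metadata_server, object_store):
    chunks = segment(data)
    refs = references(chunks)
    # Steps 408/410: send the references; the metadata server returns
    # pointers only for segments it has not already stored.
    uris = metadata_server.write_request(refs)   # hypothetical API
    by_ref = dict(zip(refs, chunks))
    for ref, uri in uris.items():
        object_store.put(uri, by_ref[ref])       # step 412: direct "PUT"
    return refs
```

Storing the same file a second time would yield an empty pointer set from the metadata server, so no segment data is uploaded again.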
[0035] FIG. 5 is a flow chart illustrating an embodiment of a
process to handle a request to store a file or other file system
object in a distributed file system. In various embodiments, the
process of FIG. 5 may be performed by a file system metadata
server, such as file system metadata server 110 of FIG. 1. In the
example shown, a request to store a file is received (502). A
segment ("chunk") map that associates the file system object name
and/or other identifier (e.g., file name, pathname) with a set of
one or more segment references (e.g., hash values) is created
(504). Segments that are not duplicates of segments already stored
by the distributed file system are identified, for example based on
the segment references (506). For each segment that is not a
duplicate, a storage location is computed (e.g., based at least in
part on all or part of the segment reference) and a URI or other
pointer usable to store the segment directly in the cloud-based
data store is generated (508). In various embodiments, the URI or
other pointer is signed cryptographically by the file system
metadata server. The URI may have an expiration time by which it
must be used to store the segment. The URI's are sent to the file
system client from which the request to store the file was received
(510).
[0036] FIG. 6 is a flow chart illustrating an embodiment of a
process to store file segment or "chunk" data associated with a
distributed file system. In various embodiments, the process of
FIG. 6 may be performed by a cloud-based object store, such as
object store 112 of FIG. 1. In the example shown, a "PUT" request
associated with a URI specified in the request is received (602). A
cryptographic signature associated with the URI and an expiration
time encoded in the URI are checked (604). For example, the
cloud-based object store may be provisioned to check that the URI
has been signed by a trusted file system metadata server and/or
that an expiration time of the URI has not elapsed. If the URI is
determined to be currently valid (606), payload data associated
with the PUT request, e.g., file system object segment or "chunk"
data, is stored in a location associated with the URI (608). If the
URI is determined to not be valid (606), the PUT request fails
(610), and the file system client receives a response indicating it
must obtain a new URI from the file system metadata server.
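The signing step of FIG. 5 and the validation step of FIG. 6 could be approximated with an HMAC over the path and an encoded expiry. This is a sketch only; the actual signature scheme, key distribution, and URI layout used in any embodiment are not specified here:

```python
import hashlib
import hmac

SECRET = b"metadata-server-signing-key"  # hypothetical key trusted by the store

def sign_uri(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Metadata server side: produce a URI carrying an expiry and a signature."""
    msg = f"{path}?expires={expires_at}".encode()
    sig = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&sig={sig}"

def validate_uri(uri: str, now: int, secret: bytes = SECRET) -> bool:
    """Object store side: check the signature, then check the expiry (604/606)."""
    base, _, sig = uri.rpartition("&sig=")
    expected = hmac.new(secret, base.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # not signed by a trusted metadata server
    expires_at = int(base.rpartition("expires=")[2])
    return now <= expires_at  # False once the URI has elapsed
```

Tampering with either the path or the expiry invalidates the signature, which corresponds to the failed-PUT branch (610).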
[0037] In various embodiments, file system objects, such as files,
may be retrieved by a client on which a distributed file system
client or other agent has been installed. Upon receiving a request
to access a file system object, in various embodiments the file
system client sends a file access request to the file system
metadata server, e.g., via a secure connection such as connection
302 of FIG. 3. The file system metadata server returns information
(e.g., one or more URI's or other pointers) to be used by the file
system client to retrieve segments/chunks directly from the
cloud-based object store, e.g., via GET requests sent via a
connection such as connection 304 of FIG. 3.
[0038] FIG. 7 is a flow chart illustrating an embodiment of a
process to access a file or other file system object stored in a
distributed file system. In various embodiments, the process of
FIG. 7 may be performed on a client system or device, e.g., by a
file system client or other agent running on the client
system/device, such as file system client 208 of FIG. 2. In the
example shown, a request to access a file system object, e.g., a
file identified by file name, is received from an application
(702). A request is sent to a file system metadata server to
retrieve the file (704). A set of segment references, and for each
a corresponding URI and encryption key, is received from the file
system metadata server (706). A local cache is checked to determine
whether any required segments are present in the cache (708). For
all segments not present in the cache, the associated URI is used
to send a GET request to retrieve the segment from the cloud-based
object store, and the associated key is used to decrypt the segment
once it has been received from the object store in encrypted form
(710). The segments are used to reconstruct the file and provide
access to the file to the application from which the access request
was received (712).
[0039] FIG. 8 is a flow chart illustrating an embodiment of a
process to handle a request to access a file or other file system
object stored in a distributed file system. In various embodiments,
the process of FIG. 8 may be performed by a file system metadata
server, such as file system metadata server 110 of FIG. 1. In the
example shown, a request to access a named file is received (802).
A segment map associated with the file is retrieved and used to
determine a set of segment references (e.g., hashes), and for each
a corresponding URI indicating where the segment is stored in the
cloud-based segment store and an encryption key usable to decrypt
the segment (804). The segment references, URI's, and keys are
returned to the file system client from which the file access
request was received (806).
[0040] FIG. 9 is a block diagram illustrating an example set of
file system metadata tables used in an embodiment of a distributed
file system. In various embodiments, the tables 902, 904, and 906
of FIG. 9 may be created and maintained by a file system metadata
server, such as file system metadata server 110 of FIGS. 1 and 3.
In the example shown, an inode table 902 is used to store data
associating each named file system object, e.g., directories,
files, or other objects, with a corresponding inode or other unique
number or identifier. Chunk map table 904 is used in various
embodiments to store for each file, and for each of one or more
segments (chunks) into which that file has been broken up to be
stored, an offset of the chunk within the file, a chunk identifier
(chunk id), and other metadata. For example, a file that has been
stored as three chunks would have three entries (rows) in table
904, one for each chunk. In various embodiments, the chunk id is a
monotonically increasing value, with each successively stored chunk
being assigned the next chunk id in alphanumeric order. In various
embodiments, chunks are immutable once stored. If file data is
modified, the affected data is stored as a new chunk and assigned
the next chunk id in order. As a result, a chunk with a higher
chunk id was by definition stored after a chunk with a lower chunk
id, and it can be assumed that neither was modified since it was
created and stored.
[0041] Finally, the chunk metadata table 906 includes a row for
each chunk, identified by chunk id (column 908 in the example
shown), and stores for each chunk metadata including a hash of (all
or a prescribed part of) the chunk contents (sometimes referred to
herein as a chunk or segment "reference") (column 910), the size of
the chunk (column 912), other metadata, and a reference count
(column 914) indicating how many currently live files (or other
file system objects) reference the chunk. For example, if a file is
created by copying another file, both files would reference the
chunks comprising the file that was copied.
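To make the relationships among the three tables concrete, the following is a minimal sketch in Python. The table names, field layouts, paths, and values are illustrative stand-ins and do not reflect the patent's actual schema.

```python
# Hypothetical sketch of the metadata tables of FIG. 9.

# Inode table (902): named file system object -> inode number.
inode_table = {
    "/docs/report.txt": 1001,
}

# Chunk map (904): one row per chunk of each file,
# here (inode, offset of chunk within the file, chunk id).
chunk_map = [
    (1001, 0,         1),
    (1001, 4_194_304, 2),
    (1001, 8_388_608, 3),  # a file stored as three chunks has three rows
]

# Chunk metadata (906): chunk id -> (content hash a.k.a. "reference",
# chunk size in bytes, reference count of currently live files).
chunk_metadata = {
    1: ("9f86d081...", 4_194_304, 1),
    2: ("60303ae2...", 4_194_304, 1),
    3: ("fd61a03a...", 2_097_152, 2),  # referenced by two live files
}
```

A file stored as three chunks contributes three rows to the chunk map, and copying a file increments the reference counts of the chunks it shares rather than duplicating them.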
[0042] In various embodiments, chunks are stored in an object
store, such as object store 112 of FIG. 1, in a de-duplicated
manner. Prior to storing a chunk, the file system checks to
determine whether the same chunk has already been stored. If so, a
reference to the previously stored chunk is associated with the
file that has been requested to be stored, and a reference count
associated with the chunk is incremented. If not, the chunk is
added to the object store and corresponding chunk metadata is
generated and stored.
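The store-or-reference logic described above can be sketched as follows; the function name, the dictionary-based stores, and the field names (`refcount`, `size`) are hypothetical, and SHA-256 stands in for whatever content hash the system uses as the chunk reference.

```python
import hashlib

def store_chunk(chunk: bytes, chunk_metadata: dict, object_store: dict) -> str:
    """De-duplicated store: if a chunk with the same content hash (its
    "reference") is already present, only increment its reference count;
    otherwise add the chunk and create its metadata. Illustrative sketch."""
    ref = hashlib.sha256(chunk).hexdigest()
    if ref in chunk_metadata:
        # Same chunk stored previously: reference it and bump the count.
        chunk_metadata[ref]["refcount"] += 1
    else:
        # New chunk: add to the object store and record its metadata.
        object_store[ref] = chunk
        chunk_metadata[ref] = {"size": len(chunk), "refcount": 1}
    return ref
```

Storing the same chunk twice leaves a single object in the store with a reference count of two.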
[0043] One way to determine whether a given chunk has already been
stored would be to query the chunk metadata table to determine
whether a chunk having the same hash is already among the chunks
represented in the chunk metadata table, such as chunk metadata
table 906 of FIG. 9. However, such a query may become
computationally expensive to perform, in particular as the number
of chunks represented in the metadata table, and as a result the
size of the table itself, becomes very large.
[0044] In various embodiments, a Bloom filter may be used to
facilitate determining whether a chunk has been stored already.
Given the characteristics of a Bloom filter, a "negative" result
can be relied upon to conclude that a given chunk has not yet been
stored, obviating the need to query the chunk metadata table to
make that determination.
[0045] FIG. 10 is a block diagram illustrating an example of a
Bloom filter used in an embodiment of a distributed file system. In
various embodiments, the Bloom filter of FIG. 10 may be used by a
file system metadata server, such as file system metadata server
110 of FIG. 1, to determine whether a chunk comprising file data
has already been stored. In the example shown, a Bloom filter 1002
having m bits is used to determine whether a chunk "A" 1004 might
already be present or, more definitively, is not present in a set
of chunks the file system has already stored. In the example shown,
three different hash functions are applied to the chunk and the
respective results mapped to corresponding bits in the Bloom filter
array, e.g., computing the hash modulo the array size. The array
locations (bits) to which the respective hash values are mapped
have been set in this example to the value "1". Subsequently, if a
request to store the same chunk "A" 1004 were received, the hash
functions modulo the array size would be computed and would map to
the same three locations as shown in FIG. 10, resulting in a
"positive" or "true" result indicating to the file system that the
chunk "A" might have been stored already. As noted above, due to
the possibility of false positives, in various embodiments the file
system metadata server is configured to query the chunk metadata
table (e.g., chunk metadata table 906 of FIG. 9) in the event of a
positive or "true" result from the Bloom filter.
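A minimal Bloom filter of the kind shown in FIG. 10 can be sketched as follows. The class and method names are illustrative, and SHA-256 with a per-function prefix byte stands in for the three independent hash functions; each hash is reduced modulo the array size m, as in the figure.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: k hash functions over an m-bit array,
    each hash mapped to a bit position by taking it modulo m."""

    def __init__(self, m: int, k: int = 3):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, initially all zero

    def _indices(self, item: bytes):
        # Derive k "independent" hashes by prefixing a counter byte.
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item: bytes) -> None:
        # Set the k bits the item maps to.
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item: bytes) -> bool:
        # True only if all k bits are set: "maybe present" (false positives
        # possible); False is definitive: "not present".
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indices(item))
```

A `False` answer is definitive; a `True` answer only means the item may be present and, as described above, must be confirmed against the chunk metadata table.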
[0046] FIG. 11 is a flow chart illustrating an embodiment of a
process to use a Bloom filter to determine whether a chunk
comprising file data has already been stored. In the example shown,
a hash of a chunk is received (or computed) (1102). A Bloom filter
is checked to determine if the chunk may already have been stored
in the object store (1104). A "negative" or "false" result from
the Bloom filter (1106) results in the return of an indication that
the chunk is not already present in the object store and needs to
be stored (1108). If the result is positive or "true" (1106), the
hash is used to query the chunk metadata table (1110). If the query
returns a result indicating the chunk is already represented in the
chunk metadata table (1112), a result indicating that the chunk
already has been added to the object store is returned (1114).
Conversely, if the hash is not found in the chunk metadata table
(1112), an indication that the chunk is not already present in the
object store and needs to be stored is returned (1108).
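The decision flow of FIG. 11 can be sketched as follows; the callables `bloom_might_contain` and `table_has` are hypothetical stand-ins for the Bloom filter check (1104) and the chunk metadata table query (1110).

```python
def chunk_already_stored(chunk_hash: str,
                         bloom_might_contain,
                         table_has) -> bool:
    """Sketch of the FIG. 11 flow: a negative Bloom filter answer is
    definitive, while a positive answer must be confirmed against the
    chunk metadata table before the chunk is treated as already stored."""
    if not bloom_might_contain(chunk_hash):
        # (1106) negative: chunk definitely not stored yet (1108).
        return False
    # (1106) positive: may be a false positive, so query the table (1110).
    return table_has(chunk_hash)  # (1112) -> (1114) or (1108)
```

Only the positive Bloom filter path ever touches the (potentially very large) chunk metadata table.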
[0047] Partitioning a Bloom filter into two or more partitions,
each having a relatively smaller partition-specific filter, and
distributing elements of a set among the partitions, is disclosed.
In various embodiments, the number of partitions and the initial
size of each may be determined statically, at least initially,
based on how many elements are and/or are expected to be included
in the overall set. In some embodiments, a decision to partition a
Bloom filter may be made dynamically, based for example on a
computed probability of a false positive (e.g., based on filter
size and number of elements in the set/partition) and/or based on a
count of how many elements have been removed from the
set/partition, e.g., by virtue of files having been modified and/or
deleted from the file system.
[0048] FIG. 12 is a block diagram illustrating an embodiment of a
partitioned Bloom filter used in embodiments of a de-duplicating
file system. In the example shown, the Bloom filter 1002 of FIG. 10
has been split into a set of partitions 1202. In this example, four
partitions are shown, one each corresponding to partition-specific
filters 1204, 1206, 1208, and 1210, respectively. In various
embodiments, the partitioning is based on key space (e.g., the hash
of the content of chunks). The object "A" in this example is
mapped, based on the hash of its chunk content k(A), to the
partition associated with the partition-specific Bloom filter 1206.
In some embodiments, the hash value modulo the number of partitions
is computed to determine which partition-specific filter to use. In
some embodiments, some prescribed number of bits and/or other
portion of the hash or other value may be used. In various
embodiments, assignment to a partition is based on a method
selected to achieve an even or nearly even distribution of elements
across the partitions and to always result in a given object being
mapped to the same partition and component filter. In various
embodiments, a chunk may be assigned to a filter partition based on
a hash of its chunk content, and within the filter partition the
chunk may be mapped to one or more filter locations (indices),
e.g., based on one or more (additional) hashes of the chunk
contents.
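One of the partition-assignment schemes described above, the hash value modulo the number of partitions, can be sketched as follows; the function name is illustrative, and a hex-encoded content hash is assumed.

```python
def partition_for(chunk_hash: str, num_partitions: int) -> int:
    """Map a chunk to a filter partition by reducing its content hash
    modulo the partition count. The mapping is deterministic, so a given
    chunk always lands in the same partition, and a uniform hash spreads
    chunks evenly across partitions. Illustrative sketch."""
    return int(chunk_hash, 16) % num_partitions
```

Because the assignment depends only on the chunk's content hash, every lookup and insertion for the same chunk consults the same partition-specific filter.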
[0049] In various embodiments, the partition-specific Bloom filter
may initially be set to a size smaller than what may ultimately be
required. If the partition-specific Bloom filter becomes too
saturated, for example as a result of the number of objects
associated with the partition becoming large relative to the filter
size, then in various embodiments the partition-specific filter may
be resized, as indicated by the dotted lines shown adjacent to each
of partition-specific filters 1204, 1206, 1208, and 1210. In
various embodiments, while a partition-specific filter is being
resized, the file system (or other system) may continue to use the
respective Bloom filters associated with other partitions to
determine whether chunks mapped to those partitions may have been
stored already. In addition, the amount of time a
partition-specific Bloom filter may be unavailable while it is
resized and rebuilt will be much less than if a single Bloom filter
for the entire set had to be resized and rebuilt, resulting in a
shorter window of time during which de-duplication decisions would
need to be made by querying the chunk metadata table alone, without
the benefit of the Bloom filter.
[0050] FIG. 13 is a flow chart illustrating an embodiment of a
process to create and maintain a partitioned Bloom filter. In
various embodiments, the process of FIG. 13 may be used to provide
a partitioned Bloom filter, such as the set of filters 1202 in the
example shown in FIG. 12. In the example shown, the expected
population of the entire set is determined (1302). For example, in
the case of a file system, the number of existing chunks may be
known from the chunk metadata table and/or other metadata. For a
forward-looking determination, a previously-observed rate of
increase in the number of chunks, or other statistical or
numerical techniques, may be used to project a future population of
the set.
determined (1304). For example, the number of partitions may be
determined based on one or more of the current and/or projected
overall set size, the filter size considered to be manageable or
desirable for each partition-specific Bloom filter, false positive
rates considered to be acceptable, etc. The initial size of the
partition-specific Bloom filters is determined (1306). For example,
the number of partitions may be determined based on projected or
expected set size at some time in the future (1304), whereas for
each partition the initial partition size may be computed based on
the current population of the partition (1306). The
partition-specific Bloom filters are created and configured (1308).
Individual partition-specific Bloom filters are resized and
rebuilt, independently of one another, as needed (1310).
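The patent does not prescribe specific sizing formulas, but the standard Bloom filter relations can be used to derive an initial filter size and hash count from an expected population n and a target false-positive rate p:

```python
import math

def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Standard Bloom filter sizing (not specific to the patent): for an
    expected population n and target false-positive rate p, the optimal
    bit-array size m and hash-function count k are
        m = -n * ln(p) / (ln 2)^2
        k = (m / n) * ln 2
    """
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k
```

For example, a partition expected to hold one million chunks with a 1% false-positive target needs roughly 9.6 million bits (about 1.2 MB) and 7 hash functions; per-partition filters sized this way stay small even when the overall set is very large.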
[0051] FIG. 14 is a flow chart illustrating an embodiment of a
process to use a partitioned Bloom filter to determine whether a
chunk comprising file data has already been stored. In various
embodiments, the process of FIG. 14 may be used by a file system
metadata server, such as file system metadata server 110 of FIG. 1,
to determine whether a chunk may already have been stored, e.g., in
an object store such as cloud-based object store 112 of FIG. 1. In
the example shown, a hash of a chunk is received (or computed)
(1402). A corresponding hash-range specific (or otherwise-defined)
partition is determined (1404). A partition-specific Bloom filter
is used to determine whether the chunk may already have been stored
(1406).
[0052] FIG. 15 is a flow chart illustrating an embodiment of a
process to determine whether and/or when to resize/rebuild a
component filter of a partitioned Bloom filter. In various
embodiments, step 1310 of the process of FIG. 13 may include the
process of FIG. 15. In the example shown, the probability of a
false positive result is computed with respect to the
partition-specific Bloom filter (1502). For example, the current
size of the partition-specific Bloom filter and the population
(number) of elements currently associated with the partition may be
used to compute the probability of a false positive result. If the
probability of a false positive exceeds a prescribed threshold
(1504), the partition-specific Bloom filter is resized and rebuilt
(1510). If the probability of a false positive result does not
exceed the threshold (1502, 1504), a count of the number of
elements that have been removed from the partition is compared to a
corresponding prescribed threshold (1506), and if the number of
deletions exceeds the threshold (1508), the partition-specific
Bloom filter is resized and rebuilt (1510). If not, the probability
of a false positive and/or the number of deletions continue to be
tracked (1512), unless/until a determination to resize and rebuild
the partition-specific Bloom filter is made or the process ends,
e.g., because the system is taken offline for maintenance.
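The computation at step 1502 can be sketched using the standard approximation for a Bloom filter's false-positive probability; the function names and the default threshold values below are illustrative, not taken from the patent.

```python
import math

def false_positive_probability(n: int, m: int, k: int) -> float:
    """Standard approximation for a Bloom filter with m bits, k hash
    functions, and n inserted elements: p = (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k

def needs_rebuild(n: int, m: int, k: int, removed: int,
                  p_max: float = 0.01, removed_max: int = 1000) -> bool:
    """Sketch of the FIG. 15 decision: resize/rebuild the partition's
    filter when either the computed false-positive probability (1504) or
    the count of removed elements (1508) crosses its threshold."""
    return (false_positive_probability(n, m, k) > p_max
            or removed > removed_max)
```

A heavily loaded filter (n large relative to m) drives the probability toward 1, triggering a rebuild, while deletions trigger a rebuild independently because a standard Bloom filter cannot unset bits for removed elements.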
[0053] FIG. 16 is a flow chart illustrating an embodiment of a
process to determine whether and/or when to split a component
filter of a partitioned Bloom filter. In various embodiments, step
1510 of the process of FIG. 15 may include the process of FIG. 16.
In the example shown, upon receiving an indication to resize a
partition-specific Bloom filter (e.g., based on the computed
probability of a false positive, observed false positives, number
of elements removed due to file deletion, etc.) a new size
S.sub.new to which the Bloom filter is to be resized is determined
(1602). For example, the new size may be computed based on a
prescribed increment by which the size is configured to be
increased, and/or a size determined dynamically based on throughput
and/or other observed conditions. If the new size determined to be
required exceeds a prescribed maximum size (1608), the partition is
further divided, for example into two sub-partitions, and a
separate sub-partition-specific Bloom filter is provided for each
(1610). If the required new size would not exceed the maximum
(1608), the single partition-specific Bloom filter is resized to
the computed new size and is rebuilt (1612).
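The resize-or-split decision of FIG. 16 can be sketched as follows; splitting into two equal sub-partitions is one illustrative choice, and the function name and return shape are hypothetical.

```python
def resize_or_split(new_size: int, max_size: int):
    """Sketch of the FIG. 16 decision: if the required new size S_new
    exceeds the prescribed maximum (1608), split the partition into two
    sub-partitions, each with its own filter (1610); otherwise resize the
    single partition-specific filter to the computed new size (1612)."""
    if new_size > max_size:
        # (1610) split: two sub-partition-specific filters.
        half = new_size // 2
        return ("split", [half, new_size - half])
    # (1612) resize the single partition-specific filter.
    return ("resize", [new_size])
```

For example, with a maximum filter size of 200 units, a required new size of 300 yields two sub-partition filters of 150 each, while a required size of 150 simply resizes the existing filter.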
[0054] In various embodiments, partitioned populations and
associated partition-specific Bloom filters may enable the presence
of an element in a set to be determined using space-efficient data
structures, without unacceptably high false positive rates. A
growing and/or very large population of elements may be managed,
including by resizing and/or further partitioning
partition-specific Bloom filters, as needed, independently of one
another, minimizing filter unavailability.
[0055] Although the foregoing embodiments have been described in
some detail for purposes of clarity of understanding, the invention
is not limited to the details provided. There are many alternative
ways of implementing the invention. The disclosed embodiments are
illustrative and not restrictive.
* * * * *