U.S. patent application number 14/044498 was filed with the patent office on 2013-10-02 and published on 2014-04-03 for regulating data storage based on popularity.
This patent application is currently assigned to NEXTBIT SYSTEMS INC. The applicant listed for this patent is NEXTBIT SYSTEMS INC. Invention is credited to Justin Quan.
Application Number | 20140095457 14/044498 |
Family ID | 50386182 |
Filed Date | 2013-10-02 |
United States Patent Application | 20140095457 |
Kind Code | A1 |
Quan; Justin | April 3, 2014 |
REGULATING DATA STORAGE BASED ON POPULARITY
Abstract
Technology is disclosed for regulating data storage based on a
popularity of data files ("the technology"). Various embodiments of
the technology include maintaining a fixed durability level of
data files stored in a storage system by regulating a number of
copies of the data files stored in the storage system. One
embodiment includes regulating the number of copies of a particular
data file based on popularity of the particular data file among
various users using the storage system. The number of copies stored
in the storage system is increased or decreased, including from/to
zero, based on the popularity of the particular data file. The
popularity is determined based on at least one of: a number of
computing devices of various users having the particular data file,
a latency, network bandwidth and/or availability with the computing
devices for reading the particular data file, or access pattern of
the particular data file.
Inventors: | Quan; Justin (San Francisco, CA) |
Applicant: |
Name | City | State | Country | Type |
NEXTBIT SYSTEMS INC. | San Francisco | CA | US | |
Assignee: | NEXTBIT SYSTEMS INC., San Francisco, CA |
Family ID: | 50386182 |
Appl. No.: | 14/044498 |
Filed: | October 2, 2013 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61708794 | Oct 2, 2012 | |
Current U.S. Class: | 707/697; 707/687 |
Current CPC Class: | H04M 2250/12 20130101; G06F 16/278 20190101; H04L 67/2842 20130101; G06F 16/125 20190101; G06F 16/20 20190101; H04L 67/1097 20130101; Y02D 10/45 20180101; G06F 16/275 20190101; H04L 67/26 20130101; G06F 16/273 20190101; H04B 7/26 20130101; H04L 67/1095 20130101; G06F 16/27 20190101; G06F 16/174 20190101; H04M 2250/10 20130101; G06F 16/93 20190101; H04L 67/06 20130101; H04M 1/7253 20130101; G06F 9/5038 20130101 |
Class at Publication: | 707/697; 707/687 |
International Class: | G06F 17/30 20060101 G06F017/30 |
Claims
1. A method of regulating data storage, the method comprising:
receiving, at a server, a data file from one or more users to
generate multiple copies of the data file; storing, by the server,
the copies of the data file at a storage system; determining, by
the server, a popularity value of the data file, the popularity
value indicating a popularity of the data file among the one or
more users; determining, by the server, a number of copies of the
data file to be stored at the storage system based on the
popularity value; and adjusting, by the server, the number of
copies of the data file stored at the storage system based on the
popularity value.
2. The method of claim 1, wherein adjusting the number of copies
stored at the storage system includes increasing the number of
copies stored at the storage system according to a popularity value
range the popularity value of the data file corresponds to.
3. The method of claim 1, wherein adjusting the number of copies
stored at the storage system includes decreasing the number of
copies stored at the storage system according to a popularity value
range the popularity value of the data file corresponds to.
4. The method of claim 1, wherein adjusting the number of copies
stored at the storage system includes not storing any of the copies
of the data file at the storage system if the popularity value
exceeds a threshold popularity value.
5. The method of claim 1, wherein receiving the data file from one
or more users includes receiving the data file from one or more
computing devices associated with each of the one or more
users.
6. The method of claim 1, wherein the copies of the data file
include copies of a portion of the data file.
7. The method of claim 1, wherein determining the popularity value
of the data file includes determining the popularity value as a
function of a number of computing devices associated with the one
or more users that contain the data file.
8. The method of claim 7, wherein determining the popularity value
of the data file includes determining the popularity value as a
function of a latency associated with reading the data file from
one or more of the computing devices that contain the data
file.
9. The method of claim 7, wherein determining the popularity value
of the data file includes determining the popularity value as a
function of a network bandwidth available for reading the data file
from one or more of the computing devices that contain the data
file.
10. The method of claim 7, wherein determining the popularity value
of the data file includes determining the popularity value as a
function of availability of a network connection with one or more
of the computing devices that contain the data file for reading the
data file.
11. The method of claim 1, wherein determining the popularity value
of the data file includes determining the popularity value as a
function of a number of the one or more users requiring storage for
the same data file at the storage system.
12. The method of claim 1, wherein determining the popularity value
of the data file includes determining the popularity value as a
function of access pattern of the data file for a specific user of
the one or more users.
13. The method of claim 1, wherein determining the popularity value
of the data file includes determining the popularity value as a
function of access pattern of the data file for a subset of the one
or more users.
14. A method comprising: receiving, at a server and from a first
computing device associated with a first user, a request to
retrieve a first data file of the first user from a storage system,
the storage system configured to store a plurality of data files of
a plurality of users based on a plurality of popularity values of
the corresponding data files; determining, by the server, whether
the storage system has a copy of the first data file; responsive to a
determination that the storage system does not have the copy of the
first data file, determining a plurality of computing devices
associated with the users that have a copy of the first data file;
retrieving, by the server, the copy of the first data file from one
of the computing devices; and serving, by the server, the copy of
the first data file to the first user.
15. The method of claim 14, wherein storing the data files of the
users in the storage system includes determining, by the server and
for a data file of the data files, a popularity value of the data
file, the popularity value indicating a popularity of the data file
among the users, determining, by the server and based on the
popularity value, a number of copies of the data file to be stored
at the storage system, and adjusting, by the server, the number of
copies of the data file stored at the storage system based on the
popularity value.
16. The method of claim 15, wherein determining the popularity
value of the data file includes determining the popularity value as
a function of a number of computing devices associated with the
users that contain the data file.
17. The method of claim 16, wherein determining the popularity
value of the data file includes determining the popularity value as
a function of at least one of: (a) a latency associated with
reading the data file from the computing devices that contain the
data file, (b) a network bandwidth available for reading the data
file from the computing devices that contain the data file or (c)
availability of a network connection with the computing devices
that contain the data file for reading the data file.
18. The method of claim 14, wherein adjusting the number of copies
stored at the storage system includes at least one of: (a) not
storing any of the copies of the data file at the storage system if
the popularity value exceeds a first threshold, (b) increasing the
number of copies stored at the storage system according to a popularity
value range the popularity value of the data file corresponds to or
(c) decreasing the number of copies stored at the storage system
according to a popularity value range the popularity value of the
data file corresponds to.
19. The method of claim 14, wherein determining a plurality of
computing devices associated with the users that have a copy of the
first data file includes determining, by the server, a checksum of
each of the data files received from the users to generate a
plurality of checksums, storing the checksums of the data files and
identifications of the computing devices having the copy of the
data files at the storage system, and comparing a first checksum of
the first data file with the checksums of the data files to
determine if any of the computing devices has the copy of the first
data file.
20. An apparatus comprising: a storage system configured to store a
plurality of data files received from a plurality of users based on
a popularity value of each of the data files; a popularity value
determination module to determine the popularity value for each of
the data files, the popularity value indicating a popularity of the
corresponding data file among the users; and a data file
replication management module to determine, based on the popularity
value, a number of copies of the corresponding data file to be
stored at the storage system, and to adjust, based on the
popularity value, the number of copies of the corresponding data
file stored at the storage system.
21. The apparatus of claim 20 further comprising: a request
receiving module to receive from a first computing device
associated with a first user a request to retrieve a first data
file of the first user from the storage system, the first data file
being one of the data files; and a data file serving module to
determine whether the storage system has a copy of the first data file,
responsive to a determination that the storage system does not have
the copy of the first data file, determine a plurality of computing
devices associated with the users that have the copy of the first
data file, retrieve the copy of the first data file from one of the
computing devices, and serve the copy of the first data file to the
first user.
22. A method comprising: receiving, at a server and from a first
computing device associated with a first user, a request to
retrieve a first data file of the first user from a storage system,
the storage system configured to store a plurality of data files of
a plurality of users based on a plurality of popularity values of
the corresponding data files, the storage system configured to
store portions of the data files; determining, by the server,
whether the storage system has the entire first data file or a portion of
the first data file; responsive to a determination that the storage
system has the portion of the first data file, determining a
plurality of computing devices associated with the users that have
a copy of remaining portions of the first data file; retrieving, by
the server, the copy of the remaining portions of the first data
file from one of the computing devices; and serving, by the server,
the copy of the entire first data file to the first user, the
entire first data file generated using the portion retrieved from
the storage system and the remaining portions retrieved from the
one of the computing devices.
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] This application claims the benefit of U.S. Provisional
Patent Application No. 61/708,794, entitled "CLOUD COMPUTING
INTEGRATED OPERATING SYSTEM", which was filed on Oct. 2, 2012,
which is incorporated by reference herein in its entirety.
TECHNICAL FIELD
[0002] Several of the disclosed embodiments relate to data storage,
and more particularly, to regulating a number of copies of a data
file to be stored in a storage system based on a popularity of the
data file.
BACKGROUND
[0003] Current storage services such as cloud storage services
allow users to store various multi-media content such as music
files, video files, images, documents, etc. in the cloud. In order
to provide for recovery from data loss, the cloud storage services
typically replicate the content and store various copies of the
content at different storage systems, and possibly at different
locations. This requires huge amounts of storage resources and
other associated infrastructure and maintenance resources to
maintain the data centers. This can result in increased costs.
Further, some of the data files stored for various users can be
identical. For example, a music file such as "Optimistic" by
"Radiohead" is the same for any user storing that music file with
a cloud storage service. The current cloud storage services store
multiple copies of the same file, e.g., one for each user who
uploaded the music file, which results in a significant amount of
space being used for storing identical files. Accordingly, the
current storage services are inefficient at least in terms of
managing the available storage space.
SUMMARY
[0004] Technology is disclosed for regulating data storage based on
a popularity of data files ("the technology"). Various embodiments
of the technology provide for maintaining a fixed durability level
of data files stored in a storage system by regulating a number of
copies of the data files stored in the storage system. One such
embodiment includes regulating the number of copies of a particular
data file stored in the storage system based on a popularity of the
particular data file among various users who use the storage
system. The number of copies stored in the storage system is
increased or decreased, including from/to zero copies, based on the
popularity of the particular data file. Further, the storage system
can store either a complete data file or a portion of the data
file. Accordingly, the technology is applicable to either the
complete data file or a portion of the data file.
[0005] In some embodiments, the popularity of the particular data
file is determined by computing a popularity value for the
particular data file. The popularity value of the particular data
file can be determined based on a number of factors, including one
or more of: (a) a number of computing devices associated with one
or more of the users that contain the particular data file, (b) a
latency associated with reading the particular data file from one
or more of the computing devices that contain the particular data
file, (c) a network bandwidth available for reading the particular
data file from one or more of the computing devices that contain
the particular data file, (d) availability of a network connection
with one or more of the computing devices that contain the
particular data file for reading the particular data file, (e) a
number of the users requiring storage for the same data file at the
storage system, or (f) access pattern of the particular data file
for a specific user or a subset of the users. In some embodiments,
one or more of the above factors can be weighted relative to each
other.
[0006] The popularity value can be determined in various units and
using various mathematical equations. One example expression of a
popularity value can include a percentage value, where a popularity
value of 100% can indicate that all the users serviced by the
storage system have a copy of the particular data file on all their
computing devices, the particular data file can be fetched from any
of the computing devices with a minimum latency, the particular
data file is accessed frequently etc. On the other hand, a
popularity value of 0% can indicate that none of the users have a
copy of the particular data file or it is not possible to retrieve
a copy within a maximum accepted latency, etc.
[0007] The number of copies stored in the storage system is
increased or decreased, including from/to zero copies, based on the
popularity value. For example, if the popularity value of a
particular data file is 100%, the storage system may not store any
copies of the particular data file since the particular data file
is available at all the computing devices of the users and can be
retrieved from any of the computing devices at any time. On the
other hand, if the popularity value of a particular data file is 0%,
the storage system may store one or more copies of the particular
data file since the particular data file is not available at any of
the computing devices or cannot be retrieved within a maximum
accepted latency etc. Generally, the higher the popularity of the
data file, the lower the number of copies of the data file that
need to be stored at the storage system. Further, various
popularity value ranges and number of copies that can be stored for
each of the ranges can be configured, e.g., by an entity such as an
administrator of the storage server.
[0008] When a user requests a particular data file, a server
determines whether the particular data file is available at the
storage system. If the particular data file is available at the
storage system, the server serves the request by fetching the file
from the storage system. On the other hand, if the particular data
file is not available at the storage system, the server serves the
request by fetching the file from any of the other computing
devices of the user and/or any of the computing devices of other
users that contain the particular data file.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 illustrates an environment where data storage
regulation for maintaining a specified durability level for the
data files at a storage system can be implemented.
[0010] FIG. 2 illustrates an example system that regulates a number
of copies of data files stored at a storage system based on a
popularity value of the corresponding data file, consistent with
various embodiments of the disclosed technology.
[0011] FIG. 3 illustrates an example of a system for serving a
particular data file from the storage system, consistent with
various embodiments of the disclosed technology.
[0012] FIG. 4 illustrates a block diagram of a server that
regulates the number of copies of the data files stored at the
storage system based on the popularity values of the corresponding
data files, consistent with various embodiments of the disclosed
technology.
[0013] FIG. 5 illustrates a flow diagram for regulating data
storage at a storage system based on a popularity value of data
files, consistent with various embodiments of the disclosed
technology.
[0014] FIG. 6 illustrates an example process for serving a
particular data file from the storage system, consistent with
various embodiments of the disclosed technology.
[0015] FIG. 7 is a block diagram of a computer system as may be
used to implement features of some embodiments of the disclosed
technology.
DETAILED DESCRIPTION
[0016] Technology is disclosed for regulating data storage based on
a popularity of data ("the technology"). Various embodiments of the
technology provide for maintaining a fixed durability level of data
files stored in a storage system by regulating a number of copies
of the data files stored in the storage system. One such embodiment
includes regulating the number of copies of a particular data file
stored in the storage system based on the popularity of the
particular data file among various users who use the storage
system. The number of copies stored in the storage system is
increased or decreased, including from/to zero copies, based on the
popularity of the particular data file. Further, the storage system
can store either a complete data file or a portion of the data
file. Accordingly, the technology is applicable to either the
complete data file or a portion of the data file.
[0017] In some embodiments, the popularity of the particular data
file is determined by computing a popularity value for the
particular data file. The popularity value of the particular data
file can be determined based on a number of factors, including one
or more of: (a) a number of computing devices associated with one
or more of the users that contain the particular data file, (b) a
latency associated with reading the particular data file from one
or more of the computing devices that contain the particular data
file, (c) a network bandwidth available for reading the particular
data file from one or more of the computing devices that contain
the particular data file, (d) availability of a network connection
with one or more of the computing devices that contain the
particular data file for reading the particular data file, (e) a
number of the users requiring storage for the same data file at the
storage system, or (f) access pattern of the particular data file
for a specific user or a subset of the users. In some embodiments,
one or more of the above factors can be weighted relative to each
other.
[0018] The popularity value can be determined in various units and
using various mathematical equations. One example expression of a
popularity value can include a percentage value, where a popularity
value of 100% can indicate that all the users serviced by the
storage system have a copy of the particular data file on all their
computing devices, the particular data file can be fetched from any
of the computing devices with a minimum latency, the particular
data file is accessed frequently etc. On the other hand, a
popularity value of 0% can indicate that none of the users have a
copy of the particular data file or it is not possible to retrieve
a copy within a maximum accepted latency, etc.
[0019] The number of copies stored in the storage system is
increased or decreased, including from/to zero copies, based on the
popularity value. For example, if the popularity value of a
particular data file is 100%, the storage system may not store any
copies of the particular data file since the particular data file
is available at all the computing devices of the users and can be
retrieved from any of the computing devices at any time. On the
other hand, if the popularity value of a particular data file is 0%,
the storage system may store one or more copies of the particular
data file since the particular data file is not available at any of
the computing devices or cannot be retrieved within a maximum
accepted latency etc. Generally, the higher the popularity of the
data file, the lower the number of copies of the data file that
need to be stored at the storage system. Further, various
popularity value ranges and number of copies that can be stored for
each of the ranges can be configured, e.g., by an entity such as an
administrator of the storage server.
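
The mapping from popularity value ranges to a number of stored copies can be sketched as a small lookup table. This is a minimal illustration only: the range bounds, the copy counts, and the names `COPY_POLICY`, `copies_for_popularity` and `adjust_copies` are assumptions made for the example and are not taken from the disclosure.

```python
from bisect import bisect_right

# Hypothetical, administrator-configurable policy. Each entry is
# (lower bound of a popularity range in percent, copies to keep).
# The bounds and counts below are illustrative assumptions.
COPY_POLICY = [(0, 3), (25, 2), (75, 1), (100, 0)]


def copies_for_popularity(popularity_pct, policy=COPY_POLICY):
    """Return the number of copies to keep for a popularity value (0-100)."""
    if not 0 <= popularity_pct <= 100:
        raise ValueError("popularity must be between 0 and 100")
    lower_bounds = [lower for lower, _ in policy]
    # Find the last range whose lower bound does not exceed the value.
    index = bisect_right(lower_bounds, popularity_pct) - 1
    return policy[index][1]


def adjust_copies(current_copies, popularity_pct):
    """Return how many copies to add (positive) or remove (negative)."""
    return copies_for_popularity(popularity_pct) - current_copies
```

With this example policy, a data file at 100% popularity maps to zero stored copies, and a data file at 0% maps to three, which follows the rule that higher popularity calls for fewer copies.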
[0020] When a user requests a particular data file, a server
determines whether the particular data file is available at the
storage system. If the particular data file is available at the
storage system, the server serves the request by fetching the file
from the storage system. On the other hand, if the particular data
file is not available at the storage system, the server serves the
request by fetching the file from any of the other computing
devices of the user and/or any of the computing devices of other
users that contain the particular data file.
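
The request handling described above can be sketched as follows. This is a simplified illustration under stated assumptions: the storage system is modeled as an in-memory dictionary, and `serve_file`, `device_index` and `fetch_from_device` are hypothetical names introduced for the example.

```python
def serve_file(file_id, storage, device_index, fetch_from_device):
    """Return the bytes of a data file, preferring the storage system.

    storage: dict mapping file_id -> bytes (stand-in for the storage system).
    device_index: dict mapping file_id -> list of device ids holding a copy.
    fetch_from_device: callable(device_id, file_id) -> bytes, or None if
        the device is unreachable or no longer has the file.
    """
    # Serve directly from the storage system when a copy is kept there.
    if file_id in storage:
        return storage[file_id]
    # Otherwise try each computing device known to contain the file.
    for device_id in device_index.get(file_id, []):
        data = fetch_from_device(device_id, file_id)
        if data is not None:
            return data
    raise FileNotFoundError(file_id)
```

Trying devices in order and skipping those that return nothing is one simple fallback strategy; a real server could instead rank the devices by latency or bandwidth.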
Environment
[0021] FIG. 1 illustrates an environment where data storage
regulation for maintaining a specified durability level for the
data files at a storage system can be implemented. The system 100
includes a storage system 105 for storing data files received from
computing devices 130-140 of users. The system 100 includes a cloud
server 110 configured to handle communications between the
computing devices 130-140 and the storage system 105. The
communications can include data storage or retrieval requests from
the computing devices 130-140. In one embodiment, the cloud server
110 can be a server cluster having computer nodes interconnected
with each other by a network. The server cluster can communicate
with the storage system 105 via the Internet or other communication networks. The
storage system 105 contains storage nodes 112. Each of the storage
nodes 112 contains one or more processors 114 and storage devices
116. The storage devices 116 can include optical disk storage, RAM,
ROM, EEPROM, flash memory, phase change memory, magnetic cassettes,
magnetic tapes, magnetic disk storage or any other computer storage
medium which can be used to store the desired information.
[0022] A cloud data interface 120 can also be included to receive
data from and send data to computing devices 130-140. The cloud
data interface 120 can include network communication hardware and
network connection logic to receive the information from computing
devices. The network can be a local area network (LAN), wide area
network (WAN) or the Internet. The cloud data interface 120 may
include a queuing mechanism to organize data updates received from
or sent to the computing devices 130-140.
[0023] Although FIG. 1 illustrates two computing devices 130-140, a
person having ordinary skill in the art will readily understand
that the technology disclosed herein can be applied to a single
computing device or more than two computing devices connected to
the cloud server 110.
[0024] The computing devices 130-140 include an operating system
132-142 to manage the hardware resources of the computing devices
130-140 and provide services for running computer applications
134-144 (e.g., mobile applications running on mobile devices). The
operating system 132-142 facilitates execution of the computer
applications 134-144 on the computing device 130-140. The computing
devices 130-140 include at least one local storage device 138-148
to store the computer applications 134-144 and user data. The
computing device 130 or 140 can be a desktop computer, a laptop
computer, a tablet computer, an automobile computer, a game
console, a smartphone, a personal digital assistant, or other
computing devices capable of running computer applications, as
contemplated by a person having ordinary skill in the art.
[0025] The computer applications 134-144 stored in the computing
devices 130-140 can include applications for general productivity
and information retrieval, including email, calendar, contacts, and
stock market and weather information. The computer applications
134-144 can also include applications in other categories, such as
mobile games, factory automation, GPS and location-based services,
banking, order-tracking, ticket purchases or any other categories
as contemplated by a person having ordinary skill in the art.
[0026] The operating system 132-142 of the computing devices
130-140 includes socket redirection modules 136-146 to redirect
network messages. The computer applications 134-144 generate and
maintain network connections directed to various remote servers
(not illustrated). The remote servers can include applications,
products or services such as social networking applications that
the users may interact with via the computer applications 134-144.
Instead of directly opening and maintaining the network connections
with these remote servers, the socket redirection modules 136-146
route all of the network messages for these connections of the
computer applications 134-144 to the cloud server 110. The cloud
server 110 is responsible for opening and maintaining network
connections with the remote servers.
[0027] All or some of the network connections of the computing
devices 130-140 are through the cloud server 110. The network
connections can include Transmission Control Protocol (TCP)
connections, User Datagram Protocol (UDP) connections, or other
types of network connections based on other protocols. When there
are multiple computer applications 134-144 that need network
connections to multiple remote servers, the computing devices
130-140 only need to maintain one network connection with the
cloud server 110. The cloud server 110 will in turn maintain
multiple connections with the remote servers on behalf of the
computer applications 134-144.
[0028] In various embodiments, the cloud server 110 maintains a
certain level of durability of the data files stored at the storage
system 105 by regulating a number of copies of the data files
stored at the storage system 105. In some embodiments, the cloud
server 110 regulates the number of copies of the data files based
on the popularity values of the data files. For example, the more
popular the data files are among the users, the fewer the number of
copies of the data files stored at the storage system. Additional
details with respect to regulating the number of copies of the data
files based on the popularity values are described at least with
reference to FIGS. 2-7.
[0029] FIG. 2 illustrates an example system that regulates a number
of copies of data files stored at a storage system based on a
popularity value of the corresponding data file, consistent with
various embodiments. In some embodiments, the system 200 can be
similar to a system such as system 100 of FIG. 1. In some
embodiments, the server 230 is similar to the cloud server 110 and
the storage system 235 can be similar to storage system 105. In the
figure, the storage system 235a is a non-regulated data storage,
and storage systems 235b-d are examples of regulated storage
systems in which the number of copies of data files is regulated
based on the popularity of data files. In an embodiment, the storage
systems 235a-d form a storage system 235 of the system 200.
[0030] The server 230 provides data storage services to a number of
users, including a first user, a second user and a third user to
store various data files. The data files can include files such as
images, videos, logs, application configuration files, computing
device configuration files etc. A user can upload data files from
one or more computing devices associated with the user to the
server 230 via a communication network 225. For example, a first
user can upload a data file, "File A," from a first computing device
205 and a second computing device 210. Similarly, the third
computing device 215 uploads data files "File A" and "File B," and
the fourth computing device 220 uploads "File A," "File B" and
"File C." Accordingly, the server 230 stores four copies of "File
A," two copies of "File B" and one copy of "File C" in the storage
system 235. The storage system 235a can have a number of storage
units across which the data files can be stored. Further, in some
embodiments, the storage units can be spread across various
geographical locations.
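
The copy counts in this example can be checked with a short tally. The dictionary keys below are hypothetical labels derived from the reference numerals of the computing devices; the function name `copies_per_file` is likewise an assumption for illustration.

```python
from collections import Counter

# Uploads per computing device, as described in the example above.
# Keys are hypothetical labels based on the reference numerals.
uploads = {
    "device_205": ["File A"],
    "device_210": ["File A"],
    "device_215": ["File A", "File B"],
    "device_220": ["File A", "File B", "File C"],
}


def copies_per_file(uploads_by_device):
    """Count how many copies of each file a non-regulated store would hold."""
    return Counter(
        name for files in uploads_by_device.values() for name in files
    )


counts = copies_per_file(uploads)
print(counts["File A"], counts["File B"], counts["File C"])
```

The tally corresponds to the non-regulated case, in which every uploaded copy is kept; the regulated storage systems would reduce these counts for popular files.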
[0031] Typically, a storage system keeps a number of copies of the
data files in order to improve the durability of data files, e.g.,
to minimize the impact due to data loss either at the user end or
at the storage system end. In some embodiments, the server 230
maintains a certain level of durability of the data files stored at
the storage system 235a by regulating a number of copies of the
data files stored at the storage system 235a based on the
popularity of the data files. The more popular the data files are
among the users, the fewer copies of the data files are stored at
the storage system.
[0032] The popularity of a particular data file is measured using a
popularity value. In some embodiments, the popularity value of the
particular data file is determined based on a number of factors,
including one or more of: (a) a number of computing devices
associated with one or more of the users that contain the
particular data file, (b) a latency associated with reading the
particular data file from one or more of the computing devices that
contain the particular data file, (c) a network bandwidth available
for reading the particular data file from one or more of the
computing devices that contain the particular data file, (d)
availability of a network connection with one or more of the
computing devices that contain the particular data file for reading
the particular data file, (e) a number of the users requiring
storage for the same data file at the storage system, or (f) access
pattern of the particular data file for a specific user or a subset
of the users. In some embodiments, one or more of the above factors
can be weighted relative to each other and an overall popularity
value of the particular data file can be determined as a function
of the popularity value for one or more of the above factors.
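One possible way to combine weighted factors into an overall popularity value is sketched below. The weighted average, the factor names, the 0.0 to 1.0 score scale and the function name `overall_popularity` are all assumptions for illustration; the specification leaves the combining function open.

```python
def overall_popularity(factor_scores, weights):
    """Combine per-factor popularity scores (0.0 to 1.0) into a percentage.

    Both arguments are dicts keyed by factor name. A weighted average is
    only one possible combining function and is assumed here.
    """
    total_weight = sum(weights[name] for name in factor_scores)
    if total_weight == 0:
        raise ValueError("at least one factor must have a non-zero weight")
    weighted = sum(factor_scores[name] * weights[name] for name in factor_scores)
    return 100.0 * weighted / total_weight


# Hypothetical scores and weights for four of the factors.
scores = {"device_count": 1.0, "latency": 0.5, "bandwidth": 0.5, "availability": 1.0}
weights = {"device_count": 2.0, "latency": 1.0, "bandwidth": 1.0, "availability": 1.0}
popularity = overall_popularity(scores, weights)
```

With these sample inputs the weighted sum is 4.0 over a total weight of 5.0, giving a popularity value of 80%.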
[0033] In some embodiments, the higher the number of computing
devices that contain the particular data file, the higher the
popularity value of the particular data file. Because the
particular data file is then available from many computing
devices, fewer copies, including zero, may be stored at the
storage system 235. When a user requests to
retrieve the particular data file, the server obtains the
particular data file from one of the computing devices and serves
the particular data file to the user.
[0034] In some embodiments, the higher the latency associated with
reading the particular data file from one or more of the computing
devices that contain the particular data file, the lower the
popularity value of the particular data file is. In some
embodiments, if the latency is above a maximum acceptable value,
the server may determine to store a higher number of copies at the
storage system 235. In some embodiments, an overall latency-based
popularity value may be determined as an average, or as any other
function, of the latency-based popularity values of the particular
data file for each of the computing devices that contain the
particular data file.
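A minimal sketch of a latency-based popularity value follows. The linear mapping from latency to score, the millisecond units and both function names are assumptions; the specification only requires that higher latency yield a lower value and allows any averaging or other function across devices.

```python
def latency_popularity(latency_ms, max_acceptable_ms):
    """Map one device's read latency to a score in [0.0, 1.0].

    Lower latency scores higher; at or above the maximum acceptable
    latency the score is 0.0. The linear shape is an assumption.
    """
    if latency_ms >= max_acceptable_ms:
        return 0.0
    return 1.0 - latency_ms / max_acceptable_ms


def overall_latency_popularity(latencies_ms, max_acceptable_ms):
    """Average the per-device scores; 0.0 when no device holds the file."""
    if not latencies_ms:
        return 0.0
    scores = [latency_popularity(latency, max_acceptable_ms) for latency in latencies_ms]
    return sum(scores) / len(scores)
```

For example, devices with latencies of 0 ms, 500 ms and 1000 ms against a 1000 ms maximum score 1.0, 0.5 and 0.0, for an average of 0.5.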
[0035] In some embodiments, the higher the network bandwidth
available for reading the particular data file from one or more of
the computing devices that contain the particular data file, the
higher the popularity value of the particular data file. In some
embodiments, an overall network-bandwidth-based popularity value
may be determined as an average, or as any other function, of the
network-bandwidth-based popularity values of the particular data
file for each of the computing devices that contain the particular
data file.
[0036] In some embodiments, the higher the availability of a
network connection with one or more of the computing devices that
contain the particular data file for reading the particular data
file, the higher the popularity value of the particular data file.
In some embodiments, an overall network-connection-availability-based
popularity value may be determined as an average, or as any other
function, of the network-connection-availability-based popularity
values of the particular data file for each of the computing devices
that contain the particular data file.
[0037] In some embodiments, the higher the number of the users
requiring storage for the same data file at the storage system, the
higher the popularity value of the particular data file.
[0038] In some embodiments, the access pattern of the particular
data file is considered for determining the popularity value. The
access pattern can be based on how frequently the particular data
file stored at the storage system 235a is accessed or requested by
a user who has uploaded the particular data file. The higher the
frequency of access, the higher the number of copies stored at the
storage system. If the frequency of access is high, the server 230
may determine to store one or more copies on the storage system
since it may be faster and more efficient to retrieve the data file
from the storage system rather than the computing devices of the
users that contain the copy of the particular data file.
Accordingly, the higher the frequency of access, the lower the
popularity value. Further, in some embodiments, the access pattern
of the particular data file may be considered not only for a
particular user but also for a subset of the users.
[0039] The popularity value can be determined in various units and
using various mathematical equations. One example expression of a
popularity value can include a percentage value, where a popularity
value of 100% can indicate that all the users serviced by the
storage system have a copy of the particular data file on all their
computing devices, the particular data file can be fetched from any
of the computing devices with a minimum latency, the particular
data file is accessed frequently etc. On the other hand, a
popularity value of 0% can indicate that none of the users have a
copy of the particular data file or that it is not possible to retrieve
a copy within a maximum accepted latency, etc.
[0040] The number of copies stored in the storage system is
increased or decreased, including from/to zero, based on the
popularity value. For example, if the popularity value of a
particular data file is 100% the storage system may not store any
copies of the particular data file since the particular data file
is available at all the computing devices of the users and can be
retrieved from any of the computing devices at any time. On the
other hand, if the popularity value of a particular data file is 0%
the storage system may store one or more copies of the particular
data file since the particular data file is not available at any of
the computing devices or cannot be retrieved within a maximum
accepted latency etc. Generally, the higher the popularity, the
lower the number of copies of the data file stored at the storage
system. Further, various popularity value ranges and number of
copies that can be stored for each of the ranges can be configured,
e.g., by an entity such as an administrator of the storage
server.
[0041] Referring back to the non-regulated storage system 235a, the
storage system 235a includes four copies of "File A," two copies of
"File B" and a copy of "File C." The server 230 may adjust the
number of copies of the above mentioned data files in one or more
of the following ways:
[0042] Regarding "File A," the server 230 may determine that "File
A" has a high popularity value, e.g., because each of the four
computing devices has a copy of "File A", the availability of
network connection with one or more of the computing devices is
high, etc. Accordingly, the server 230 may decrease the number of
copies of "File A" by half as shown in regulated storage systems
235b-c. In some embodiments, the server 230 may even determine not
to store any copy of "File A" in the storage system as shown by
example storage system 235d.
[0043] Regarding "File B," the server 230 may determine to retain
the same number of copies based on the popularity value of "File
B." Regarding "File C," in some embodiments, the popularity value
may indicate that one copy of "File C" is sufficient to be
stored at the storage system, e.g., because only one computing
device needs the file, the file is not accessed as frequently, etc.
Accordingly, the server 230 stores only one copy of "File C" as
shown in the example storage system 235b. However, in some
embodiments, the popularity value of "File C" may change even with
just one user, e.g., if the user is travelling, since the network
connectivity between the fourth computing device 220 and the
storage unit in the storage system 235a that contains the copy of
"File C" may change when the user is at another geographical
location. The popularity value of "File C" can change and therefore
can have an effect on the number of copies stored at the storage
system. The popularity value may indicate that two copies of the
file be maintained at the storage system. Accordingly, the server
230 may add another copy of "File C" at the storage system as shown
in regulated storage systems 235c-d. In some embodiments, the
server 230 may add another copy of the "File C" in the storage unit
of storage systems 235c-d that is closer to the location where the
user has travelled to.
[0044] In some embodiments, the server 230 determines whether
various data files uploaded by different users are similar by using
various file comparison techniques such as checksum, hash sum etc.
The server 230 generates a checksum for each of the files uploaded
to the server 230 for further storage at storage system 235 and
stores the checksum of each of the data files in the storage system
235 or in another storage system separate from the storage system
235. The checksums may be calculated for a portion of the data
file, e.g., a block of a file or a segment of file that has a
plurality of blocks, or a complete data file. Further, the server
230 also stores the identifications of at least one of the user and
the computing device which uploaded a particular data file. In some
embodiments, the checksums and the identifications of the users
and/or computing devices are stored in a data file availability
table (not illustrated). The server 230 may use the data file
availability table in determining the popularity value and also in
determining which of the computing devices has a particular data
file.
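The checksum-based comparison and the data file availability table can be sketched as follows. SHA-256, the class name `AvailabilityTable` and its methods are assumptions for illustration; the specification mentions only "checksum, hash sum etc." and does not illustrate the table.

```python
import hashlib
from collections import defaultdict


def checksum(data: bytes) -> str:
    """Return a SHA-256 hex digest; the choice of hash is an assumption."""
    return hashlib.sha256(data).hexdigest()


class AvailabilityTable:
    """Hypothetical table mapping a file checksum to the uploaders holding it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def record_upload(self, data: bytes, user_id: str, device_id: str) -> str:
        """Store the uploader's identifications under the file's checksum."""
        digest = checksum(data)
        self._holders[digest].add((user_id, device_id))
        return digest

    def devices_with(self, digest: str):
        """List the devices known to hold a file with this checksum."""
        return sorted(device for _, device in self._holders.get(digest, ()))
```

Two uploads with identical bytes produce the same digest, so the table can treat them as the same data file regardless of which user or device uploaded them.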
[0045] In some embodiments, the server 230 can use various storage
techniques to store data efficiently. One example storage technique
is compression of the data files, which reduces the space the data
files consume. The
computing devices can include devices such as a smart phone, a
digital media player, a laptop, a desktop, a tablet PC etc.
[0046] FIG. 3 illustrates an example of a system 300 for serving a
particular data file from the storage system, consistent with
various embodiments. In some embodiments, the system 300 can be
similar to the system 200 of FIG. 2, the server 330 can be similar to
the server 230, the computing devices 305-320 can be similar to the
computing devices 205-220, respectively, and the storage system 350
can be similar to the storage system 235d. The users associated
with the first computing device 305, the second computing device 310,
the third computing device 315 and the fourth computing device 320 have
uploaded one or more of data files "File A," "File B" and "File C"
to the server 330 for storage as illustrated with reference to FIG.
2.
[0047] The server 330 has adjusted the number of copies of the data
files stored at the storage system 350. For example, while the server
330 has stored two copies of "File B" and "File C," no copies of
"File A" are stored at the storage system 350, e.g., because the
"File A" has a high popularity value due to being available from a
number of computing devices.
[0048] A computing device such as the third computing device 315
requests the server 330 to retrieve "File A" that it had uploaded
earlier. The server 330 determines whether the storage system 350
has a copy of "File A." If the storage system 350 has a copy of
"File A," then the server 330 obtains the data file from the storage
system 350 and serves the data file to the third computing device
315. On the other hand, if the storage system 350 does not have a
copy of "File A," the server 330 determines which of the computing
devices has a copy of "File A." In some embodiments, the
availability table 325 includes data specifying which of the
computing devices has which of the data files and also data
specifying other attributes such as network bandwidth for the
computing devices, their network connection availability,
associated latency to obtain the data file, etc.
[0049] The server 330 checks with the availability table 325 to
determine which of the computing devices has a copy of "File A" and
identifies a particular computing device from which it can retrieve
a copy of "File A." In some embodiments, the server 330 may select
a computing device, e.g., first computing device 305, from which
the copy of "File A" can be retrieved with the least amount of latency.
The server 330 retrieves the copy of "File A" from the first
computing device 305 and serves the data file, "File A" to the
third computing device 315. The third computing device 315 would
not be aware of where the data file is retrieved from. From the
perspective of the third computing device 315, the data file, "File
A" is retrieved from the storage system 350.
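The device selection described above can be sketched as a lookup over availability table entries. The entry fields (`device`, `files`, `latency_ms`, `online`), the sample values and the function name `pick_source_device` are hypothetical; the specification states only that the table records which devices hold which files along with attributes such as latency and connection availability.

```python
def pick_source_device(availability, file_name):
    """Return the reachable device holding file_name with the lowest latency.

    Returns None when no reachable device holds the file.
    """
    candidates = [
        entry for entry in availability
        if file_name in entry["files"] and entry["online"]
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda entry: entry["latency_ms"])["device"]


# Hypothetical availability table entries.
availability_table = [
    {"device": "device_305", "files": {"File A"}, "latency_ms": 20, "online": True},
    {"device": "device_310", "files": {"File A"}, "latency_ms": 5, "online": False},
    {"device": "device_320", "files": {"File A", "File B", "File C"},
     "latency_ms": 80, "online": True},
]
```

In this sample, the second device has the lowest latency but is unreachable, so a request for "File A" is sourced from the first device.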
[0050] FIG. 4 illustrates a block diagram of a server 400 that
regulates the number of copies of the data files stored at the
storage system based on the popularity values of the corresponding
data files, consistent with various embodiments of the disclosed
technique. In some embodiments, the server 400 can be similar to
cloud server 110 of FIG. 1. The server 400 can be, e.g., a
dedicated standalone server, or implemented in a cloud computing
service. The server 400 includes a network component 410, a
processor 420, a memory 430, a request receiving module 440, a
popularity value determination module 450, data file replication
management module 460 and a data file serving module 470. The
memory 430 can include instructions which, when executed by the
processor 420, enable the server 400 to perform the functions
described with reference to cloud server 110. The network
component 410 is configured for network communications with
computing devices and remote servers (not illustrated). The
network component 410 establishes a device network connection
with a computing device, and a server network connection with the
storage system 105 in response to a request from the computing
device for connecting with the storage system 105. The request can
be generated by a computer application running at the computing
device.
[0051] As explained above, the server 400 facilitates storing of
data files of the users at a storage system such as storage system
105. The data files can be received from one or more users and also
from one or more computing devices of each of the users. For
example, a user can be associated with multiple computing devices
such as smartphones, digital media players, laptops, desktops,
tablet PCs etc. The data files can include files such as images,
videos, logs, application configuration files, computing device
configuration files etc. The server 400 maintains a certain level
of durability of the data files stored at the storage system 105 by
regulating a number of copies of the data files stored at the
storage system 105. In some embodiments, the more popular the data
files are among the users, the lower the number of copies of the
data files stored at the storage system 105.
[0052] The popularity of a particular data file is measured using a
popularity value. The popularity value determination module 450
determines the popularity of the data file based on a number of
factors, including one or more of: (a) a number of computing
devices associated with one or more of the users that contain the
particular data file, (b) a latency associated with reading the
particular data file from one or more of the computing devices that
contain the particular data file, (c) a network bandwidth available
for reading the particular data file from one or more of the
computing devices that contain the particular data file, (d)
availability of a network connection with one or more of the
computing devices that contain the particular data file for reading
the particular data file, (e) a number of the users requiring
storage for the same data file at the storage system, or (f) access
pattern of the particular data file for a specific user or a subset
of the users. In some embodiments, one or more of the above factors
can be weighted relative to each other. The popularity value can be
determined in various units and using various mathematical
equations. One example expression of a popularity value can include
a percentage value.
[0053] The data file replication management module 460 determines
the number of copies to be maintained at the storage system 105 for
a particular data file. Generally, the higher the popularity of the
data file, the lower the number of copies of the data file stored at
the storage system 105. Further, various popularity value ranges
and number of copies that can be stored for each of the ranges can
be configured, e.g., by an entity such as an administrator of the
storage server. Further, in some embodiments, the data file
replication management module 460 also maintains an availability
table that includes data specifying which of the computing devices
has copies of which of the data files, and also includes data
specifying other attributes such as network bandwidth for the
computing devices, their network connection availability,
associated latency to obtain the data file, etc.
[0054] Request receiving module 440 receives requests from the
users for storing or retrieving data files at/from the storage
system 105. In some embodiments, the request receiving module 440
receives the request via the network component that facilitates
communication with the computing devices of the users.
[0055] Data file serving module 470 responds to the requests from a
user for retrieving the data files from storage system 105 by
retrieving the data file and serving it to the user. The data file
serving module 470 serves the data file by either retrieving the
data file from the storage system 105 or from one of the computing
devices if the storage system does not have the requested data
file. In some embodiments, the data file serving module 470 checks
with the availability table to determine which of the computing
devices has a copy of the requested data file, and retrieves the
copy of data file from one of the identified computing devices.
[0056] FIG. 5 illustrates a flow diagram for regulating data
storage at a storage system based on a popularity value of data
files, consistent with various embodiments. The process 500 may be
executed in a system such as system 100 of FIG. 1. At step 505, the
server 110 receives a request to store a data file from one or more
users. In some embodiments, if more than one user uploads the same
data file, multiple copies of the data file are created. At step
510, the server 110 stores the multiple copies of the data file at
the storage system 105.
[0057] At step 515, the server determines a popularity of the data
file. In some embodiments, the popularity of a data file is
measured using a popularity value. The popularity value of the data
file is determined based on a number of factors, including one or
more of: (a) a number of computing devices associated with one or
more of the users that contain the particular data file, (b) a
latency associated with reading the particular data file from one
or more of the computing devices that contain the particular data
file, (c) a network bandwidth available for reading the particular
data file from one or more of the computing devices that contain
the particular data file, (d) availability of a network connection
with one or more of the computing devices that contain the
particular data file for reading the particular data file, (e) a
number of the users requiring storage for the same data file at the
storage system, or (f) access pattern of the particular data file
for a specific user or a subset of the users. In some embodiments,
one or more of the above factors can be weighted relative to each
other. The popularity value can be determined in various units and
using various mathematical equations. One example expression of a
popularity value can include a percentage value.
[0058] At step 520, the server 110 determines a number of copies of
the data file to be stored at the storage system 105 based on the
popularity value. In some embodiments, the higher the popularity of
the data file, the lower the number of copies of the data file stored
at the storage system. For example, if the popularity value of a
particular data file is 100%, the storage system may not store any
copies of the particular data file since the particular data file
is available at all the computing devices of the users and can be
retrieved from any of the computing devices at any time. On the
other hand, if the popularity value of a particular data file is 0%
the storage system may store one or more copies of the particular
data file since the particular data file is not available at any of
the computing devices or cannot be retrieved within a maximum
accepted latency etc. Further, various popularity value ranges and
number of copies that can be stored for each of the ranges can be
configured, e.g., by an entity such as an administrator of the
storage server. For example, a popularity range of 0-9% may have 5
copies, 10-40% may have 4 copies, 41-70% may have 3 copies, 71-95%
may have 2 copies and 96-100% may have 0 (Zero) copies.
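The example ranges above can be expressed as a small lookup table. This is a sketch only: the ranges and copy counts are taken from the example, while the table layout, the function name `copies_for_popularity` and the restriction to whole-number percentages are assumptions.

```python
# (lower bound %, upper bound %, copies) from the example ranges above.
POPULARITY_RANGES = [
    (0, 9, 5),
    (10, 40, 4),
    (41, 70, 3),
    (71, 95, 2),
    (96, 100, 0),
]


def copies_for_popularity(popularity_percent):
    """Look up the configured copy count for a whole-number percentage."""
    for low, high, copies in POPULARITY_RANGES:
        if low <= popularity_percent <= high:
            return copies
    raise ValueError("popularity must be a whole number between 0 and 100")
```

Because the ranges are data rather than code, an administrator could reconfigure them without changing the lookup logic.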
[0059] At step 525, the server 110 regulates or adjusts the number
of copies of the data file at the storage system by at least one
of: (a) not storing any copy of the data file at the storage system
if the popularity value exceeds a first threshold, (b) increasing
the number of copies stored at the storage system if the popularity
value is below a second threshold, or (c) decreasing the number of
copies stored at the storage system if the popularity value exceeds
a third threshold. In some embodiments, the number of copies of a
data file can be regulated either for a complete data file or for a
portion of the data file.
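A minimal sketch of the threshold-based adjustment of step 525 follows. The threshold values, the one-copy step size and the function name `regulate_copies` are illustrative assumptions; the specification names three thresholds without fixing their values or the size of each adjustment.

```python
def regulate_copies(current_copies, popularity, first=95.0, second=10.0, third=70.0):
    """Return the adjusted number of copies for one data file.

    first, second and third are the hypothetical threshold percentages
    for cases (a), (b) and (c) of step 525.
    """
    if popularity > first:
        return 0                           # (a) store no copy
    if popularity < second:
        return current_copies + 1          # (b) increase the number of copies
    if popularity > third:
        return max(current_copies - 1, 0)  # (c) decrease, never below zero
    return current_copies                  # otherwise leave unchanged
```

For instance, a file with four copies and a popularity value of 80% would drop to three copies, while one with a popularity value of 99% would have no copies retained.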
[0060] FIG. 6 illustrates an example process for serving a
particular data file from the storage system, consistent with
various embodiments. In an embodiment, the process 600 may be
implemented in a system such as system 100 of FIG. 1. At step 605,
the server 110 receives a request from a user to retrieve a data
file of the user from a storage system. In some embodiments, the
user can have one or more computing devices associated with the
user. The user may request using any of the computing devices. At
step 610, the server determines whether the storage system 105 has
an entire copy of the requested data file. Responsive to a
determination that the storage system 105 has the entire copy of
the requested data file, at step 645, the server 110 serves the
copy of the requested data file to the user from the storage system
105.
[0061] On the other hand, responsive to a determination that the
storage system 105 does not have a copy of the entire data file, at
step 615, the server 110 determines whether the storage system 105
has a portion of the requested data file. Responsive to a
determination that the storage system has a portion of the
requested file, e.g., a first block or segment etc., at step 620,
the server 110 determines which of the computing devices of other
users have a copy of the remaining portions of the requested data
file. In some embodiments, the server checks with the availability
table to determine which of the computing devices has a copy of the
requested data file.
[0062] In some embodiments, the server 110 generates a checksum for
each of the data files uploaded by the users to the server 110 for
storage of the data files. The checksums may be calculated for a
portion of the data file, e.g., a block of a file or a segment of
file that has a plurality of blocks, or a complete data file. The
server 110 stores the checksums of the data files in the
availability table. In some embodiments, the server 110 also stores
other attributes such as the names of the data file,
identifications of the computing devices from which the data files
are uploaded, a network bandwidth available for reading the copy of
files from the corresponding computing devices, a network
availability for connecting with the corresponding computing
devices, associated latency etc. Some of the foregoing attributes
may be updated periodically.
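Checksums over portions of a data file can be sketched as one digest per fixed-size block. The 4096-byte block size, the use of SHA-256 and the function name `block_checksums` are assumptions; the specification does not fix a block size or a checksum algorithm.

```python
import hashlib


def block_checksums(data: bytes, block_size: int = 4096):
    """Return one SHA-256 hex digest per fixed-size block of data."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]
```

Per-block digests allow the server to match individual portions of a file, which is what makes it possible to keep only some blocks at the storage system and fetch the rest from a computing device.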
[0063] In some embodiments, the server 110 compares a checksum of
the requested data file with the stored checksums of the data files
to determine if any of the computing devices has the copy of the
requested data file. The server 110 chooses, based on a predefined
criterion, one of the computing devices from which to retrieve a
copy of the requested data file. For example, the server 110 can choose a
computing device from which the copy of the data file can be read
with least latency.
[0064] At step 625, the server 110 retrieves the copy of the
remaining portions of the data file from one of the computing
devices. At step 630, the server 110 generates an entire copy of
the requested data file using the portions retrieved from the
identified computing device and the storage system 105. The server
110 can use various file joining techniques for generating a file
using various portions of the file. At step 645, the server 110
serves the copy of the data file to the user.
[0065] Referring back to step 615, responsive to a determination
that the storage system does not have a portion of the requested
file, at step 635, the server 110 determines which of the computing
devices of other users have a copy of the entire requested data
file. At step 640, the server 110 retrieves the copy of the entire
data file from one of the computing devices and, at step 645, the
server 110 serves the copy of the data file to the user.
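The serving flow of process 600 can be sketched as a single function. The in-memory dictionaries standing in for the storage system and the computing devices, the block-indexed layout and the function name `serve_file` are all hypothetical; a real system would retrieve blocks over a network and choose the source device by a criterion such as latency.

```python
def serve_file(name, total_blocks, storage, devices):
    """Assemble the requested file from storage and, if needed, a peer device.

    storage: dict mapping file name -> {block index: bytes} at the storage system.
    devices: dict mapping device id -> {file name: list of all block bytes}.
    """
    stored = storage.get(name, {})
    if len(stored) == total_blocks:
        # Step 610: the storage system has the entire file; serve it (step 645).
        return b"".join(stored[i] for i in range(total_blocks))

    # Steps 620/635: find a computing device that holds the file.
    source = next((held[name] for held in devices.values() if name in held), None)
    if source is None:
        raise FileNotFoundError(name)

    # Steps 625-640: keep stored blocks, fetch the remaining ones from the device.
    parts = [stored[i] if i in stored else source[i] for i in range(total_blocks)]
    return b"".join(parts)  # step 645
```

The caller receives the same bytes whether the blocks came from the storage system, a computing device or both, which matches the behavior in which the requesting user need not know the source.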
[0066] Regardless of whether the data file is retrieved from the
storage system 105 or from the computing devices of the users, from
the perspective of the user who requested the data file, the user
sees the data file as being served from the storage system 105. The
user may be unaware of the fact that the data file is retrieved
from a computing device of another user.
[0067] FIG. 7 is a block diagram of a computer system as may be
used to implement features of some embodiments of the disclosed
technology. The computing system 700 may be used to implement any
of the entities, components or services depicted in the examples of
FIGS. 1-6 (and any other components described in this
specification). The computing system 700 may include one or more
central processing units ("processors") 705, memory 710,
input/output devices 725 (e.g., keyboard and pointing devices,
display devices), storage devices 720 (e.g., disk drives), and
network adapters 730 (e.g., network interfaces) that are connected
to an interconnect 715. The interconnect 715 is illustrated as an
abstraction that represents any one or more separate physical
buses, point to point connections, or both connected by appropriate
bridges, adapters, or controllers. The interconnect 715, therefore,
may include, for example, a system bus, a Peripheral Component
Interconnect (PCI) bus or PCI-Express bus, a HyperTransport or
industry standard architecture (ISA) bus, a small computer system
interface (SCSI) bus, a universal serial bus (USB), IIC (I2C) bus,
or an Institute of Electrical and Electronics Engineers (IEEE)
standard 1394 bus, also called "Firewire".
[0068] The memory 710 and storage devices 720 are computer-readable
storage media that may store instructions that implement at least
portions of the described technology. In addition, the data
structures and message structures may be stored or transmitted via
a data transmission medium, such as a signal on a communications
link. Various communications links may be used, such as the
Internet, a local area network, a wide area network, or a
point-to-point dial-up connection. Thus, computer-readable media
can include computer-readable storage media (e.g., "non-transitory"
media) and computer-readable transmission media.
[0069] The instructions stored in memory 710 can be implemented as
software and/or firmware to program the processor(s) 705 to carry
out actions described above. In some embodiments, such software or
firmware may be initially provided to the computing system 700 by
downloading it from a remote system (e.g., via network adapter
730).
[0070] The technology introduced herein can be implemented by, for
example, programmable circuitry (e.g., one or more microprocessors)
programmed with software and/or firmware, or entirely in
special-purpose hardwired (non-programmable) circuitry, or in a
combination of such forms. Special-purpose hardwired circuitry may
be in the form of, for example, one or more ASICs, PLDs, FPGAs,
etc.
REMARKS
[0071] The above description and drawings are illustrative and are
not to be construed as limiting. Numerous specific details are
described to provide a thorough understanding of the disclosure.
However, in certain instances, well-known details are not described
in order to avoid obscuring the description.
[0072] Further, various modifications may be made without deviating
from the scope of the invention. Accordingly, the invention is not
limited except as by the appended claims.
[0073] Reference in this specification to "one embodiment" or "an
embodiment" means that a particular feature, structure, or
characteristic described in connection with the embodiment is
included in at least one embodiment of the disclosure. The
appearances of the phrase "in one embodiment" in various places in
the specification are not necessarily all referring to the same
embodiment, nor are separate or alternative embodiments mutually
exclusive of other embodiments. Moreover, various features are
described which may be exhibited by some embodiments and not by
others. Similarly, various requirements are described which may be
requirements for some embodiments but not for other
embodiments.
[0074] The terms used in this specification generally have their
ordinary meanings in the art, within the context of the disclosure,
and in the specific context where each term is used. Certain terms
that are used to describe the disclosure are discussed below, or
elsewhere in the specification, to provide additional guidance to
the practitioner regarding the description of the disclosure. For
convenience, certain terms may be highlighted, for example using
italics and/or quotation marks. The use of highlighting has no
influence on the scope and meaning of a term; the scope and meaning
of a term is the same, in the same context, whether or not it is
highlighted. It will be appreciated that the same thing can be said
in more than one way. One will recognize that "memory" is one form
of a "storage" and that the terms may on occasion be used
interchangeably.
[0075] Consequently, alternative language and synonyms may be used
for any one or more of the terms discussed herein, nor is any
special significance to be placed upon whether or not a term is
elaborated or discussed herein. Synonyms for certain terms are
provided. A recital of one or more synonyms does not exclude the
use of other synonyms. The use of examples anywhere in this
specification including examples of any term discussed herein is
illustrative only, and is not intended to further limit the scope
and meaning of the disclosure or of any exemplified term. Likewise,
the disclosure is not limited to various embodiments given in this
specification.
[0076] Those skilled in the art will appreciate that the logic
illustrated in each of the flow diagrams discussed above may be
altered in various ways. For example, the order of the logic may be
rearranged, substeps may be performed in parallel, illustrated
logic may be omitted, other logic may be included, etc.
[0077] Without intent to further limit the scope of the disclosure,
examples of instruments, apparatus, methods and their related
results according to the embodiments of the present disclosure are
given below. Note that titles or subtitles may be used in the
examples for convenience of a reader, which in no way should limit
the scope of the disclosure. Unless otherwise defined, all
technical and scientific terms used herein have the same meaning as
commonly understood by one of ordinary skill in the art to which
this disclosure pertains. In the case of conflict, the present
document, including definitions, will control.
* * * * *