U.S. patent application number 09/883064 was filed with the patent office on 2001-06-15 and published on 2002-04-25 for technique for accessing information in a peer-to-peer network.
This patent application is currently assigned to FLYCODE, Inc. Invention is credited to Ardron, S. Mitra; Scott, Adrian C.H.
Application Number | 20020049760 09/883064 |
Family ID | 26906851 |
Filed Date | 2001-06-15 |
United States Patent Application | 20020049760 |
Kind Code | A1 |
Scott, Adrian C.H. ; et al. | April 25, 2002 |
Technique for accessing information in a peer-to-peer network
Abstract
The present invention provides an improved technique for
accessing information in a peer-to-peer network. According to
specific embodiments of the present invention, each file accessible
in the peer-to-peer network is assigned a respective hash ID or
fingerprint ID which is used to describe the contents of that file.
Files in the peer-to-peer network may be identified and/or accessed
based upon their associated hash ID values. In this way it is
possible to identify identical files stored in the peer-to-peer
network which have different file names and/or other metadata
descriptors. Since the content of all files having the same hash ID
will be identical, an automated process may be used to retrieve the
desired content from one or more of the identified files.
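The fingerprinting idea in the abstract can be illustrated with a minimal Python sketch. It assumes SHA-1 from the standard `hashlib` module as the fingerprinting algorithm (the claims name MD5 and SHA1 as options); the function name `fingerprint_id` and the sample filenames and bytes are hypothetical.

```python
import hashlib


def fingerprint_id(content: bytes, algorithm: str = "sha1") -> str:
    """Return a hex digest that depends only on the content bytes."""
    return hashlib.new(algorithm, content).hexdigest()


# Hypothetical shared files: two names for identical bytes, one unrelated file.
shared = {
    "song_final.mp3": b"example audio bytes",
    "track01.mp3": b"example audio bytes",
    "notes.txt": b"different bytes",
}

# The ID is computed from the content, never from the filename.
ids = {name: fingerprint_id(data) for name, data in shared.items()}

# Group filenames by ID to find identical files stored under different names.
by_id = {}
for name, fid in ids.items():
    by_id.setdefault(fid, []).append(name)
```

Here `by_id` groups the two `.mp3` names under one ID, which is the property the abstract relies on to treat differently named copies as interchangeable sources.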
Inventors: | Scott, Adrian C.H. (San Francisco, CA); Ardron, S. Mitra (Fairfax, CA) |
Correspondence Address: | BEYER WEAVER & THOMAS LLP, P.O. BOX 778, BERKELEY, CA 94704-0778, US |
Assignee: | FLYCODE, Inc. |
Family ID: | 26906851 |
Appl. No.: | 09/883064 |
Filed: | June 15, 2001 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60212177 | Jun 16, 2000 | |
Current U.S. Class: | 1/1; 707/999.01; 707/E17.01; 707/E17.036 |
Current CPC Class: | G06F 16/9014 20190101; H04L 67/06 20130101; H04L 67/104 20130101; H04L 67/108 20130101; G06F 16/10 20190101; H04L 67/1063 20130101 |
Class at Publication: | 707/10 |
International Class: | G06F 007/00 |
Claims
1. A method for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and a
database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, the method comprising: selecting
a first information file; generating, using a fingerprinting
algorithm, a first fingerprint ID relating to the content of the
first information file; and identifying the first information file
using the first fingerprint ID.
2. The method of claim 1 wherein the fingerprinting algorithm
corresponds to an MD5 Message-Digest algorithm.
3. The method of claim 1 wherein the fingerprinting algorithm
corresponds to a Secure Hash Algorithm (SHA1).
4. The method of claim 1 wherein the first information file is
stored at a first peer device, and wherein the first information
file has an associated first filename, the method comprising:
storing the first filename and first fingerprint ID at the first
peer device.
5. The method of claim 4 further comprising: transmitting the first
filename and the first fingerprint ID to the database system for
storage therein.
6. The method of claim 5 wherein the database system corresponds to
a remote database system.
7. The method of claim 1 further comprising: selecting a second
information file having content identical to the first information
file; applying the fingerprinting algorithm to the content of the
second information file to thereby generate an identical first
fingerprint ID to that of the first information file; and
identifying both the first and the second information file using
the first unique fingerprint ID.
8. The method of claim 7 wherein the first information file is
stored at a first peer device, and has a first associated filename,
and wherein the second information file is stored at a second peer
device, and has a second associated filename, the method
further comprising: storing the first associated filename and first
fingerprint ID associated with the first information file in the
database system; and storing the second associated filename and
first fingerprint ID associated with the second information file in
the database system.
9. A method for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective fingerprint ID associated therewith
relating to its file content, the method comprising: transmitting a
first message to the database system, the first message including a
search request for locating files in the network which match a
first search string; and receiving a first response from the
database system, the first response including first information
relating to identified files stored in the network which match the
first search string; the first information further including an
associated fingerprint ID for each identified file.
10. The method of claim 9 further comprising: transmitting a second
message to the database system, the second message including a
first fingerprint ID selected from the first information; and
receiving a second response from the database system in response to
the second message; the second response including second
information, the second information including at least one network
address corresponding to at least one peer device that has been
identified as having access to at least one file corresponding to
the first fingerprint ID.
11. The method of claim 10 further comprising: transmitting a third
message to a first peer device of the at least one peer devices,
the third message corresponding to a request to retrieve a first
file identified by the first fingerprint ID.
12. The method of claim 11 wherein the third message includes the
first fingerprint ID.
13. The method of claim 11 further comprising: receiving at least a
portion of the file content of the first file from the first peer
device in response to the third message.
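The three-message exchange of claims 9 to 13 (search by string, look up peers by fingerprint ID, retrieve from a peer by fingerprint ID) can be sketched in memory as follows. This is only an illustration under stated assumptions: the `Peer` and `Database` classes, their method names, the addresses, and the use of SHA-1 are all hypothetical, and direct method calls stand in for network messages.

```python
import hashlib


def fingerprint_id(content: bytes) -> str:
    return hashlib.sha1(content).hexdigest()


class Peer:
    """Hypothetical peer that serves its stored files by fingerprint ID."""

    def __init__(self, address, files):
        self.address = address
        self.files = files  # filename -> content bytes
        self.by_id = {fingerprint_id(data): data for data in files.values()}

    def retrieve(self, fid):
        # Third message: the request carries the fingerprint ID, not a filename.
        return self.by_id[fid]


class Database:
    """Hypothetical stand-in for the database system."""

    def __init__(self, peers):
        self.peers = {p.address: p for p in peers}
        self.records = [(name, fingerprint_id(data), p.address)
                        for p in peers for name, data in p.files.items()]

    def search(self, text):
        # First message/response: matching files, each with its fingerprint ID.
        return sorted({(name, fid) for name, fid, _ in self.records
                       if text.lower() in name.lower()})

    def locate(self, fid):
        # Second message/response: addresses of peers holding that content.
        return sorted(addr for _, f, addr in self.records if f == fid)


peers = [Peer("10.0.0.1:7000", {"report.pdf": b"quarterly numbers"}),
         Peer("10.0.0.2:7000", {"Q3 report (copy).pdf": b"quarterly numbers"})]
db = Database(peers)

hits = db.search("report")
chosen_id = hits[0][1]
addresses = db.locate(chosen_id)
content = db.peers[addresses[0]].retrieve(chosen_id)
```

Both search hits carry the same fingerprint ID even though their filenames differ, so either address returned by `locate` is a valid source for the retrieval step.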
14. A method for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective fingerprint ID associated therewith
relating to its file content, the method comprising: transmitting a
first message to a first peer device, the first message
corresponding to a request to retrieve a first file identified by a
first fingerprint ID, wherein the first message includes the first
fingerprint ID, and wherein the first fingerprint ID is different
than a filename associated with the first file; and receiving a
first portion of the file content of the first file from the first
peer device in response to the first message.
15. The method of claim 14 further comprising: detecting a failure
in a file transfer process associated with the first peer device;
identifying a second portion of the first file content which has
not been received; and transmitting a second message to a second
peer device, the second message corresponding to a request to
retrieve the second portion of the first file content identified by
the first fingerprint ID, wherein the second message includes the
first fingerprint ID.
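Claims 14 and 15 describe requesting a file by fingerprint ID and, after a transfer failure, requesting the missing portion from a second peer using the same ID. A minimal sketch of that resume logic follows; the `Peer` class, the `fail_after` parameter that simulates a dropped transfer, and the offset-based request are assumptions made for illustration.

```python
import hashlib


class TransferFailed(Exception):
    pass


class Peer:
    """Hypothetical peer holding one file; may drop a transfer part-way."""

    def __init__(self, content, fail_after=None):
        self.fid = hashlib.sha1(content).hexdigest()
        self.content = content
        self.fail_after = fail_after  # bytes served before a simulated failure

    def request(self, fid, offset):
        """Return (data, completed) for the identified file, starting at offset."""
        if fid != self.fid:
            raise KeyError(fid)
        data = self.content[offset:]
        if self.fail_after is not None and len(data) > self.fail_after:
            return data[:self.fail_after], False
        return data, True


def download(fid, peers):
    received = bytearray()
    for peer in peers:
        # After a failure, only the portion not yet received is requested,
        # and the next peer is addressed with the same fingerprint ID.
        chunk, completed = peer.request(fid, offset=len(received))
        received += chunk
        if completed:
            return bytes(received)
    raise TransferFailed("no peer could complete the transfer")


payload = b"0123456789" * 10
flaky = Peer(payload, fail_after=37)
healthy = Peer(payload)
result = download(healthy.fid, [flaky, healthy])
```

Because both peers hold content with the same fingerprint ID, the bytes from the second peer can be appended directly to those already received from the first.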
16. A method for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective HASH ID associated therewith relating to
its file content, the HASH ID being different from a respective
filename associated with each file, the method comprising:
receiving file information from selected peer devices, the file
information relating to shared files stored at each of the selected
peer devices; the file information including a filename for each
shared file, and including a HASH ID for each shared file; storing
the file information in at least one data structure at the database
system; and identifying a desired shared file in the network using
its associated HASH ID.
17. The method of claim 16 further comprising identifying an
identity of a peer device using a selected HASH ID; wherein the
identified peer device has been identified as storing a file having
an associated HASH ID which matches the selected HASH ID.
18. The method of claim 16 further comprising identifying a network
address of a first peer device using a selected HASH ID; wherein
the first peer device has been identified as storing a file having
an associated HASH ID which matches the selected HASH ID.
19. A method for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective HASH ID associated therewith relating to
its file content, the HASH ID being different from a respective
filename associated with each file, the method comprising:
receiving a first message from a first peer device, the first
message including a search request for locating files in the
network which match a first search string; generating a first
response to the first message, the response including a first list
of file records relating to files stored in the network which match
the first search string, wherein each file record includes an
associated HASH ID and an associated filename; and providing the
first list of file records to the first peer device.
20. The method of claim 19 further comprising: excluding from the
first list of file records duplicate records in which multiple file
records have the same associated HASH ID and filename.
21. The method of claim 19 further comprising: receiving a second
message from the first peer device in response to the first
response, the second message including at least one HASH ID;
identifying, using said at least one HASH ID, at least one network
address corresponding to at least one peer device which has been
identified as storing at least one file corresponding to the at
least one HASH ID; and providing, to the first peer device, a
second response, the second response including address information
which includes at least a portion of the at least one identified
network addresses.
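The database-side behaviour of claims 16 to 21 (storing filename and HASH ID pairs per peer, answering a search with de-duplicated records, and mapping a HASH ID to network addresses) might be modelled as below. The `FileIndex` class, its method names, and the sample HASH ID strings are hypothetical; a real deployment would use a persistent database rather than in-memory structures.

```python
from collections import defaultdict


class FileIndex:
    """Hypothetical in-memory model of the database system's data structure."""

    def __init__(self):
        self.records = []  # (filename, hash_id, address)
        self.peers_by_hash = defaultdict(set)

    def register(self, address, files):
        """Store the (filename, HASH ID) pairs reported by one peer."""
        for filename, hash_id in files:
            self.records.append((filename, hash_id, address))
            self.peers_by_hash[hash_id].add(address)

    def search(self, text):
        """Return matching records, excluding repeats of the same HASH ID and filename."""
        seen, results = set(), []
        for filename, hash_id, _ in self.records:
            key = (hash_id, filename)
            if text.lower() in filename.lower() and key not in seen:
                seen.add(key)
                results.append({"filename": filename, "hash_id": hash_id})
        return results

    def addresses_for(self, hash_id):
        """Return the addresses of peers identified as storing this content."""
        return sorted(self.peers_by_hash.get(hash_id, ()))


index = FileIndex()
index.register("10.0.0.1:7000", [("demo.mp3", "abc123")])
index.register("10.0.0.2:7000", [("demo.mp3", "abc123"),
                                 ("demo (live).mp3", "abc123")])
index.register("10.0.0.3:7000", [("demo.mp3", "ffff00")])

results = index.search("demo")
sources = index.addresses_for("abc123")
```

Four records are registered, but the search returns three: the `demo.mp3`/`abc123` pair reported by two peers appears once, while the same filename with a different HASH ID is kept as a separate record.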
22. A method for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective HASH ID associated therewith relating to
its file content, the HASH ID being different from a respective
filename associated with each file, the method comprising:
identifying a first network address corresponding to a first peer
device which has been identified as storing a first information
file associated with a first HASH ID; identifying a second network
address corresponding to a second peer device which has been
identified as storing a second information file associated with the
first HASH ID; transmitting a first message to the first peer
device requesting a first portion of file content of the first
information file from the first peer device; and transmitting a
second message to the second peer device requesting a second
portion of file content of the second information file from the
second peer device.
23. The method of claim 22 wherein the first and second messages
each include the first HASH ID.
24. The method of claim 22 wherein the first and second messages
are initiated at substantially a same time.
25. The method of claim 22 wherein the requesting of the first
portion of file content from the first peer device occurs
concurrently with the requesting of the second portion of file
content from the second peer device.
26. The method of claim 22 further comprising: receiving the first
portion of file content from the first peer device; receiving the
second portion of file content from the second peer device;
generating a third information file which includes the first and
second portion of file content, wherein the file content of the
third information file is identical to the file content of the
first information file.
27. The method of claim 22 further comprising: detecting a failure
in a file transfer process associated with the first peer device;
identifying a third network address corresponding to a third peer
device which has been identified as storing a third information
file associated with the first HASH ID; transmitting a third
message to the third peer device, the third message corresponding
to a request to retrieve the first portion of file content from the
third information file.
28. The method of claim 22 wherein the first portion of file
content corresponds to a first chunk of bytes 1 to N of the first
information file; and wherein the second portion of file content
corresponds to a second chunk of bytes N+1 to 2N of the second
information file.
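Claims 22 to 28 describe requesting bytes 1 to N from one peer and bytes N+1 to 2N from another, concurrently, and combining them into a file identical to the original. The sketch below simulates this with threads; the `Peer` class, its `read_range` method (which takes 1-based inclusive byte positions to mirror the claim wording), and the filenames are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class Peer:
    """Hypothetical peer storing one file under its own filename."""

    def __init__(self, filename, content):
        self.filename = filename
        self.content = content
        self.hash_id = hashlib.sha1(content).hexdigest()

    def read_range(self, hash_id, first_byte, last_byte):
        """Serve bytes first_byte..last_byte (1-based, inclusive) of the identified file."""
        if hash_id != self.hash_id:
            raise KeyError(hash_id)
        return self.content[first_byte - 1:last_byte]


content = bytes(range(256)) * 4  # 1024 bytes of sample content
peer_a = Peer("a.bin", content)
peer_b = Peer("renamed.bin", content)  # same content, different filename
hash_id = peer_a.hash_id
n = len(content) // 2

# Both requests carry the same HASH ID and are issued at about the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    first = pool.submit(peer_a.read_range, hash_id, 1, n)
    second = pool.submit(peer_b.read_range, hash_id, n + 1, 2 * n)
    assembled = first.result() + second.result()
```

Checking that the SHA-1 of `assembled` equals `hash_id` is one way to confirm that the combined file matches the original content; that verification step is an addition for illustration, not something the claims require.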
29. A method for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective HASH ID associated therewith relating to
its file content, the HASH ID being different from a respective
filename associated with each file, the method comprising:
requesting from a first plurality of peer devices a respective
portion of file content from a respective information file, each
respective information file being identified as having identical
file content and having an identical first HASH ID being associated
therewith.
30. The method of claim 29 further comprising: receiving from at
least a portion of the first plurality of peer devices respective
portions of file content from the respective information file; and
reconstructing the respective portions of file content to assemble
a requested information file having file content identical to that
corresponding to the first HASH ID being associated therewith.
31. The method of claim 29 further comprising: before requesting a
respective portion, creating a content map of the file content
associated with the first HASH ID, said content map parceling the
file content into respective portions from 1 to M.
32. The method of claim 31 further comprising: assigning at least
one respective portion, from 1 to M, to a first peer device of the
first plurality of peer devices to request retrieval thereof.
33. The method of claim 32 further comprising: receiving from the
first peer device the one respective portion, from 1 to M, of file
content from the respective information file; and upon retrieval of
the entire one respective portion from the first peer device,
updating the content map corresponding to the retrieval
thereof.
34. The method of claim 33 further comprising: upon retrieval of
all respective portions, from 1 to M, of file content,
reconstructing the respective portions to assemble a requested
information file having file content identical to that
corresponding to the first HASH ID being associated therewith.
35. The method of claim 29 further comprising: identifying the
network addresses corresponding to a first plurality of peer devices,
from 1 to X, each of the first plurality of peer devices being
identified as storing a respective information file, each having
identical file content and having an identical first HASH ID being
associated therewith.
36. The method of claim 35 further comprising: before requesting a
respective portion, creating a content map of the file content
associated with the first HASH ID, said content map parceling the
file content into respective portions from 1 to M, where M > X.
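The content-map mechanism of claims 29 to 36 (parcel the content into portions 1 to M, assign portions to X peers with M greater than X, update the map as each portion arrives, then reassemble) can be sketched as follows. Everything here is an assumption made for illustration: the round-robin assignment, the retry cap, the callable peers built by `make_peer`, and the final SHA-1 check.

```python
import hashlib
from collections import deque


def make_peer(content, fail_first=False):
    """Hypothetical peer: a callable serving byte ranges of one stored file."""
    own_id = hashlib.sha1(content).hexdigest()
    state = {"fail": fail_first}

    def serve(hash_id, start, end):
        if hash_id != own_id or state["fail"]:
            state["fail"] = False
            return None  # unknown file or simulated dropped transfer
        return content[start:end]

    return serve


def build_content_map(size, portion_size):
    """Parcel the content into portions 1..M; data stays None until retrieved."""
    starts = range(0, size, portion_size)
    return {number: {"range": (start, min(start + portion_size, size)), "data": None}
            for number, start in enumerate(starts, start=1)}


def download(hash_id, size, peers, portion_size, max_attempts=100):
    content_map = build_content_map(size, portion_size)
    pending = deque(content_map)  # portion numbers not yet retrieved
    for attempt in range(max_attempts):
        if not pending:
            break
        number = pending.popleft()
        peer = peers[attempt % len(peers)]  # round-robin assignment (assumed policy)
        start, end = content_map[number]["range"]
        data = peer(hash_id, start, end)
        if data is None or len(data) != end - start:
            pending.append(number)  # reassign the portion on a later attempt
        else:
            content_map[number]["data"] = data  # update the map on retrieval
    if pending:
        raise RuntimeError("could not retrieve every portion")
    assembled = b"".join(content_map[n]["data"] for n in sorted(content_map))
    if hashlib.sha1(assembled).hexdigest() != hash_id:
        raise ValueError("assembled content does not match the HASH ID")
    return assembled, content_map


content = b"peer-to-peer " * 50  # 650 bytes
peers = [make_peer(content, fail_first=True), make_peer(content), make_peer(content)]
hash_id = hashlib.sha1(content).hexdigest()
assembled, content_map = download(hash_id, len(content), peers, portion_size=100)
```

With 650 bytes and 100-byte portions the map holds M = 7 portions spread over X = 3 peers, so each peer serves more than one portion and a portion that fails on one peer is simply queued again for another.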
37. A system for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, the system comprising: at least
one CPU; memory; at least one interface for communicating with other
devices in the peer-to-peer network; the system being configured or
designed to select a first information file; the system being
further configured or designed to apply a fingerprinting
algorithm to the content of the selected file to thereby generate a
first fingerprint ID relating to the content of the first
information file; and the system being further configured or
designed to identify the first information file using the first
fingerprint ID.
38. The system of claim 37 wherein the fingerprinting algorithm
corresponds to an MD5 Message-Digest algorithm.
39. The system of claim 37 wherein the fingerprinting algorithm
corresponds to a Secure Hash Algorithm (SHA1).
40. The system of claim 37 wherein the first information file is
stored at a first peer device, and wherein the first information
file has an associated first filename; and wherein the system is
further configured or designed to store the first filename and
first fingerprint ID at the first peer device.
41. The system of claim 40 being further configured or designed to
transmit the first filename and the first fingerprint ID to the
database system for storage therein.
42. The system of claim 41 wherein the database system corresponds
to a remote database system.
43. The system of claim 37 being further configured or designed to
select a second information file having content identical to the
first information file; the system being further configured or
designed to apply the fingerprinting algorithm to the content of
the second information file to thereby generate an identical first
fingerprint ID to that of the first information file; and the
system being further configured or designed to identify both the
first and the second information file using the first unique
fingerprint ID.
44. The system of claim 43 wherein the first information file is
stored at a first peer device, and has a first associated filename,
and wherein the second information file is stored at a second peer
device, and has a second associated filename; the system being
further configured or designed to store the first associated
filename and first fingerprint ID associated with the first
information file in the database system; and the system being
further configured or designed to store the second associated
filename and first fingerprint ID associated with the second
information file in the database system.
45. A system for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective fingerprint ID associated therewith
relating to its file content, the system comprising: at least one
CPU; memory; at least one interface for communicating with other
devices in the peer-to-peer network; the system being configured or
designed to transmit a first message to the database system, the
first message including a search request for locating files in the
network which match a first search string; and the system being
further configured or designed to receive a first response from the
database system, the first response including first information
relating to identified files stored in the network which match the
first search string; the first information further including an
associated fingerprint ID for each identified file.
46. The system of claim 45 being further configured or designed to
transmit a second message to the database system, the second
message including a first fingerprint ID selected from the first
information; and the system being further configured or designed to
receive a second response from the database system in response to
the second message; the second response including second
information, the second information including at least one network
address corresponding to at least one peer device that has been
identified as having access to at least one file corresponding to
the first fingerprint ID.
47. The system of claim 46 being further configured or designed to
transmit a third message to a first peer device of the at least one
peer devices, the third message corresponding to a request to
retrieve a first file identified by the first fingerprint ID.
48. The system of claim 47 wherein the third message includes the
first fingerprint ID.
49. The system of claim 47 being further configured or designed to
receive at least a portion of the file content of the first
file from the first peer device in response to the third
message.
50. A system for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective fingerprint ID associated therewith
relating to its file content, the system comprising: at least one
CPU; memory; at least one interface for communicating with other
devices in the peer-to-peer network; the system being configured or
designed to transmit a first message to a first peer device, the
first message corresponding to a request to retrieve a first file
identified by a first fingerprint ID, wherein the first message
includes the first fingerprint ID, and wherein the first
fingerprint ID is different than a filename associated with the
first file; and the system being further configured or designed to
receive a first portion of the file content of the first file from
the first peer device in response to the first message.
51. The system of claim 50 being further configured or designed to
detect a failure in a file transfer process associated with the
first peer device; the system being further configured or designed
to identify a second portion of the first file content which has
not been received; and the system being further configured or
designed to transmit a second message to a second peer device, the
second message corresponding to a request to retrieve the second
portion of the first file content identified by the first
fingerprint ID, wherein the second message includes the first
fingerprint ID.
52. A system for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective HASH ID associated therewith relating to
its file content, the HASH ID being different from a respective
filename associated with each file, the system comprising: at least
one CPU; memory; at least one interface for communicating with other
devices in the peer-to-peer network; the system being configured or
designed to receive file information from selected peer devices,
the file information relating to shared files stored at each of the
selected peer devices; the file information including a filename
for each shared file, and including a HASH ID for each shared file;
the system being further configured or designed to store the file
information in at least one data structure at the database system;
and the system being further configured or designed to identify a
desired shared file in the network using its associated HASH
ID.
53. The system of claim 52 being further configured or designed to
identify an identity of a peer device using a selected HASH ID;
wherein the identified peer device has been identified as storing a
file having an associated HASH ID which matches the selected HASH
ID.
54. The system of claim 52 being further configured or designed to
identify a network address of a first peer device using a selected
HASH ID; wherein the first peer device has been identified as
storing a file having an associated HASH ID which matches the
selected HASH ID.
55. A system for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective HASH ID associated therewith relating to
its file content, the HASH ID being different from a respective
filename associated with each file, the system comprising: at least
one CPU; memory; at least one interface for communicating with other
devices in the peer-to-peer network; the system being configured or
designed to receive a first message from a first peer device, the
first message including a search request for locating files in the
network which match a first search string; the system being further
configured or designed to generate a first response to the first
message, the response including a first list of file records
relating to files stored in the network which match the first
search string, wherein each file record includes an associated HASH
ID and an associated filename; and the system being further
configured or designed to provide the first list of file records to
the first peer device.
56. The system of claim 55 being further configured or
designed to exclude from the first list of file records duplicate
records in which multiple file records have the same associated
HASH ID and filename.
57. The system of claim 55 being further configured or designed to
receive a second message from the first peer device in response to
the first response, the second message including at least one HASH
ID; the system being further configured or designed to identify,
using said at least one HASH ID, at least one network address
corresponding to at least one peer device which has been identified
as storing at least one file corresponding to the at least one HASH
ID; and the system being further configured or designed to provide,
to the first peer device, a second response, the second response
including address information which includes at least a portion of
the at least one identified network addresses.
58. A system for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective HASH ID associated therewith relating to
its file content, the HASH ID being different from a respective
filename associated with each file, the system comprising: at least
one CPU; memory; at least one interface for communicating with other
devices in the peer-to-peer network; the system being configured or
designed to identify a first network address corresponding to a
first peer device which has been identified as storing a first
information file associated with a first HASH ID; the system being
further configured or designed to identify a second network
address corresponding to a second peer device which has been
identified as storing a second information file associated with the
first HASH ID; the system being further configured or designed to
transmit a first message to the first peer device requesting a first
portion of file content of the first information file from the
first peer device; and the system being further configured or
designed to transmit a second message to the second peer device
requesting a second portion of file content of the second information
file from the second peer device.
59. The system of claim 58 wherein the first and second messages
each include the first HASH ID.
60. The system of claim 58 wherein the first and second messages
are initiated at substantially a same time.
61. The system of claim 58 wherein the request of the first portion
of file content from the first peer device occurs concurrently with
the request of the second portion of file content from the second
peer device.
62. The system of claim 58 being further configured or designed to
receive the first portion of file content from the first peer
device; the system being further configured or designed to receive
the second portion of file content from the second peer device; and
the system being further configured or designed to generate a third
information file which includes the first and second portion of
file content, wherein the file content of the third information
file is identical to the file content of the first information
file.
63. The system of claim 58 being further configured or designed to
detect a failure in a file transfer process associated with the
first peer device; the system being further configured or designed
to identify a third network address corresponding to a third peer
device which has been identified as storing a third information
file associated with the first HASH ID; the system being further
configured or designed to transmit a third message to the third
peer device, the third message corresponding to a request to
retrieve the first portion of file content from the third
information file.
64. The system of claim 58 wherein the first portion of file
content corresponds to a first chunk of bytes 1 to N of the first
information file; and wherein the second portion of file content
corresponds to a second chunk of bytes N+1 to 2N of the second
information file.
65. A system for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective HASH ID associated therewith relating to
its file content, the HASH ID being different from a respective
filename associated with each file, the system comprising: at least
one CPU; memory; at least one interface for communicating with other
devices in the peer-to-peer network; the system being configured or
designed to request from a first plurality of peer devices a
respective portion of file content from a respective information
file, each respective information file being identified as having
identical file content and having an identical first HASH ID being
associated therewith.
66. The system of claim 65 being further configured or designed to
receive from at least a portion of the first plurality of peer
devices respective portions of file content from the respective
information file; and the system being further configured or
designed to reconstruct the respective portions of file content to
assemble a requested information file having file content identical
to that corresponding to the first HASH ID being associated
therewith.
67. The system of claim 65 being further configured or designed to
create, before requesting a respective portion, a content map of the
file content associated with the first HASH ID, said content map
parceling the file content into respective portions from 1 to
M.
68. The system of claim 67 being further configured or designed to
assign at least one respective portion, from 1 to M, to a first
peer device of the first plurality of peer devices to request
retrieval thereof.
69. The system of claim 68 being further configured or designed to
receive from the first peer device the one respective portion, from
1 to M, of file content from the respective information file; and
the system being further configured or designed to update, upon
retrieval of the entire one respective portion from the first peer
device, the content map corresponding to the retrieval thereof.
70. The system of claim 69 being further configured or designed to
reconstruct, upon retrieval of all respective portions, from 1 to
M, of file content, the respective portions to assemble a requested
information file having file content identical to that
corresponding to the first HASH ID being associated therewith.
71. The system of claim 65 being further configured or designed to
identify the network addresses corresponding to a first plurality of
peer devices, from 1 to X, each of the first plurality of peer
devices being identified as storing a respective information file,
each having identical file content and having an identical first
HASH ID being associated therewith.
72. The system of claim 71 being further configured or designed to
create, before requesting a respective portion, a content map of the
file content associated with the first HASH ID, said content map
parceling the file content into respective portions from 1 to M,
where M>X.
73. A computer program product for accessing information in a
peer-to-peer network, the peer-to-peer network including a
plurality of peer devices and a database system accessible by at
least a portion of the peer devices, each of the peer devices being
configured to store information files, and further being configured
to share content from selected information files with at least a
portion of the other peer devices in the network, the computer
program product comprising: a computer usable medium having
computer readable code embodied therein, the computer readable code
comprising: computer code for selecting a first information file;
computer code for generating, using a fingerprinting algorithm, a
first fingerprint ID relating to the content of the first
information file; and computer code for identifying the first
information file using the first fingerprint ID.
74. The computer program product of claim 73 wherein the
fingerprinting algorithm corresponds to an MD5 Message-Digest
algorithm.
75. The computer program product of claim 73 wherein the
fingerprinting algorithm corresponds to a Secure Hash Algorithm
(SHA1).
76. The computer program product of claim 73 wherein the first
information file is stored at a first peer device, and wherein the
first information file has an associated first filename, the
computer program product comprising: computer code for storing the
first filename and first fingerprint ID at the first peer
device.
77. The computer program product of claim 76 further comprising:
computer code for transmitting the first filename and the first
fingerprint ID to the database system for storage therein.
78. The computer program product of claim 77 wherein the database
system corresponds to a remote database system.
79. The computer program product of claim 73 further comprising:
computer code for selecting a second information file having
content identical to the first information file; computer code for
applying the fingerprinting algorithm to the content of the second
information file to thereby generate an identical first fingerprint
ID to that of the first information file; and computer code for
identifying both the first and the second information file using
the first unique fingerprint ID.
80. The computer program product of claim 79 wherein the first
information file is stored at a first peer device, and has a first
associated filename, and wherein the second information file is
stored at a second peer device, and has a second associated
filename, the computer program product further comprising: computer
code for storing the first associated filename and first
fingerprint ID associated with the first information file in the
database system; and computer code for storing the second
associated filename and first fingerprint ID associated with the
second information file in the database system.
81. A computer program product for accessing information in a
peer-to-peer network, the peer-to-peer network including a
plurality of peer devices and a database system accessible by at
least a portion of the peer devices, each of the peer devices being
configured to store information files, and further being configured
to share content from selected information files with at least a
portion of the other peer devices in the network, wherein each
shared file in the network has a respective fingerprint ID
associated therewith relating to its file content, the computer
program product comprising: a computer usable medium having
computer readable code embodied therein, the computer readable code
comprising: computer code for transmitting a first message to the
database system, the first message including a search request for
locating files in the network which match a first search string;
and computer code for receiving a first response from the database
system, the first response including first information relating to
identified files stored in the network which match the first search
string; the first information further including an associated
fingerprint ID for each identified file.
82. A computer program product for accessing information in a
peer-to-peer network, the peer-to-peer network including a
plurality of peer devices and a database system accessible by at
least a portion of the peer devices, each of the peer devices being
configured to store information files, and further being configured
to share content from selected information files with at least a
portion of the other peer devices in the network, wherein each
shared file in the network has a respective fingerprint ID
associated therewith relating to its file content, the computer
program product comprising: a computer usable medium having
computer readable code embodied therein, the computer readable code
comprising: computer code for transmitting a first message to a
first peer device, the first message corresponding to a request to
retrieve a first file identified by a first fingerprint ID, wherein
the first message includes the first fingerprint ID, and wherein
the first fingerprint ID is different than a filename associated
with the first file; and computer code for receiving a first
portion of the file content of the first file from the first peer
device in response to the first message.
83. A computer program product for accessing information in a
peer-to-peer network, the peer-to-peer network including a
plurality of peer devices and a database system accessible by at
least a portion of the peer devices, each of the peer devices being
configured to store information files, and further being configured
to share content from selected information files with at least a
portion of the other peer devices in the network, wherein each
shared file in the network has a respective HASH ID associated
therewith relating to its file content, the HASH ID being different
from a respective filename associated with each file, the computer
program product comprising: a computer usable medium having
computer readable code embodied therein, the computer readable code
comprising: computer code for receiving file information from
selected peer devices, the file information relating to shared
files stored at each of the selected peer devices; the file
information including a filename for each shared file, and
including a HASH ID for each shared file; computer code for storing
the file information in at least one data structure at the database
system; and computer code for identifying a desired shared file in
the network using its associated HASH ID.
84. A computer program product for accessing information in a
peer-to-peer network, the peer-to-peer network including a
plurality of peer devices and a database system accessible by at
least a portion of the peer devices, each of the peer devices being
configured to store information files, and further being configured
to share content from selected information files with at least a
portion of the other peer devices in the network, wherein each
shared file in the network has a respective HASH ID associated
therewith relating to its file content, the HASH ID being different
from a respective filename associated with each file, the computer
program product comprising: a computer usable medium having
computer readable code embodied therein, the computer readable code
comprising: computer code for receiving a first message from a
first peer device, the first message including a search request for
locating files in the network which match a first search string;
computer code for generating a first response to the first message,
the response including a first list of file records relating to
files stored in the network which match the first search string,
wherein each file record includes an associated HASH ID and an
associated filename; and computer code for providing the first list
of file records to the first peer device.
85. A computer program product for accessing information in a
peer-to-peer network, the peer-to-peer network including a
plurality of peer devices and a database system accessible by at
least a portion of the peer devices, each of the peer devices being
configured to store information files, and further being configured
to share content from selected information files with at least a
portion of the other peer devices in the network, wherein each
shared file in the network has a respective HASH ID associated
therewith relating to its file content, the HASH ID being different
from a respective filename associated with each file, the computer
program product comprising: a computer usable medium having
computer readable code embodied therein, the computer readable code
comprising: computer code for identifying a first network address
corresponding to a first peer device which has been identified as
storing a first information file associated with a first HASH ID;
computer code for identifying a second network address
corresponding to a second peer device which has been identified as
storing a second information file associated with the first HASH
ID; computer code for transmitting a first message to the first
peer device requesting a first portion of file content of the first
information file from the first peer device; and computer code for
transmitting a second message to the second peer device requesting
a second portion of file content of the second information file
from the second peer device.
86. A system for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective HASH ID associated therewith relating to
its file content, the HASH ID being different from a respective
filename associated with each file, the system comprising: means
for identifying a first network address corresponding to a first
peer device which has been identified as storing a first
information file associated with a first HASH ID; means for
identifying a second network address corresponding to a second
peer device which has been identified as storing a second
information file associated with the first HASH ID; means for
transmitting a first message to the first peer device requesting a
first portion of file content of the first information file from
the first peer device; and means for transmitting a second message
to the second peer device requesting a second portion of file
content of the second information file from the second peer
device.
87. A system for accessing information in a peer-to-peer network,
the peer-to-peer network including a plurality of peer devices and
a database system accessible by at least a portion of the peer
devices, each of the peer devices being configured to store
information files, and further being configured to share content
from selected information files with at least a portion of the
other peer devices in the network, wherein each shared file in the
network has a respective HASH ID associated therewith relating to
its file content, the HASH ID being different from a respective
filename associated with each file, the system comprising: means
for receiving a first message from a first peer device, the first
message including a search request for locating files in the
network which match a first search string; means for generating a
first response to the first message, the response including a first
list of file records relating to files stored in the network which
match the first search string, wherein each file record includes an
associated HASH ID and an associated filename; and means for
providing the first list of file records to the first peer device.
Description
RELATED APPLICATION DATA
[0001] This application claims priority under 35 U.S.C. Section
119(e) from U.S. Provisional Patent Application No. 60/212,177,
filed Jun. 16, 2000, attached hereto as Appendix E, which is
incorporated herein by reference in its entirety for all
purposes.
BACKGROUND OF THE INVENTION
[0002] Over the past decade, there has been an explosive growth in
computer network technology, which has dramatically changed the
degree and type of information available to users connected to
computer networks, such as, for example, the Internet. As
information becomes more accessible over local and wide area
networks, new techniques for file storage and distribution are
developed. Currently, most existing architectures for file
distribution in a network environment utilize centralized file
storage and transfer architecture, in which files are stored in
central servers and accessed by individual distributed client
programs. However, as the files increase in number and size, file
storage and distribution from these central servers often becomes
problematic.
[0003] One type of file sharing technology which addresses some of
the problems posed by centralized file storage systems relates to
distributed file storage systems, such as those implemented in
peer-to-peer networks. As commonly known to one having ordinary
skill in the art, peer-to-peer networks may be used for implementing
distributed file sharing systems wherein selected files stored on
each peer network device may be made accessible to other peer
network devices in the peer-to-peer network. Accordingly,
peer-to-peer network architectures are highly scalable, since files
may be retrieved from many locations rather than just one central
location (e.g., a central server).
[0004] In recent years, there have been significant advances in
peer-to-peer network technology, particularly with regard to the
Internet. For example, peer-to-peer file sharing applications such
as NAPSTER™ and GNUTELLA™ now provide the ability for
Internet users to configure their computer systems to function as
peer network devices in a peer-to-peer network implemented across
the Internet. In this way, an Internet user is able to access
desired files which are stored at the computer systems of other
Internet users.
[0005] While this first generation peer-to-peer architecture solved
some of the problems associated with centralized file storage, it
also introduced new problems such as, for example, file access,
reliability, speed, security, etc. For example, using peer-to-peer
file sharing applications such as NAPSTER™, shared files in the
peer-to-peer network were identified and retrieved based upon their
file names. Thus, for example, if a name of a file were misspelled,
there was no other way of identifying the file during a search.
Additionally, if a peer which was currently involved in one or more
file retrieval operations went offline, the file retrieval
operations would fail. The requesting user then had to discard the
partial file contents and pick a new peer to download from.
Consequently, very large files were virtually impossible to
retrieve since few peers remained online long enough to complete
such a large transfer.
[0006] It will be appreciated that there are numerous issues
relating to peer-to-peer network technology which remain to be
resolved. Accordingly, continuous efforts are being undertaken to
improve peer-to-peer networking technology in order to provide
improved file storage, access, and distribution techniques
implemented over a data network.
SUMMARY OF THE INVENTION
[0007] According to different embodiments of the present invention,
methods, systems, and computer program products are disclosed for
accessing information in a peer-to-peer network. The peer-to-peer
network includes a plurality of peer devices and a database
accessible by at least a portion of the peer devices. Each of the
peer devices is configured to store information files, and is
further configured to share content from selected information files
with at least a portion of the other peer devices in the network.
Each shared file in the network has a respective fingerprint ID
associated therewith relating to its file content.
[0008] According to specific embodiments, files in the peer-to-peer
network may be identified and/or accessed based upon their
associated hash ID values. In this way it is possible to identify
identical files stored in the peer-to-peer network which have
different file names and/or other metadata descriptors.
Additionally, since the content of all files having the same hash
ID will be identical, an automated process may be used to retrieve
the desired content from one or more of the identified files. For
example, a user may elect to retrieve a desired file (having an
associated hash ID) which may be stored at one or more remote
locations in the peer-to-peer network. Rather than the user having
to select a specific location for accessing and retrieving the
desired file, an automated process may use the hash ID (associated
with the desired file) to automatically select one or more remote
locations for retrieving the desired file. According to different
embodiments, the automated process may choose to retrieve the
entire file contents of the desired file from a specific remote
location, or may choose to receive selected portions of the file
contents of the desired file from different remote locations in the
peer-to-peer network. Further, if an error occurs during the file
transfer process, resulting in a partial file transfer, the
automated process may be configured to identify the portion(s) of
the desired file which were not retrieved, and automatically select
at least one different remote location for retrieving the remaining
contents of the desired file.
[0009] Additional objects, features and advantages of the various
aspects of the present invention will become apparent from the
following description of its preferred embodiments, which
description should be taken in conjunction with the accompanying
drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] FIG. 1 illustrates a block diagram of a peer-to-peer network
which may be used for implementing the technique of the present
invention in accordance with a specific embodiment of the present
invention.
[0011] FIG. 2 shows a block diagram of HASH coding of a file in
accordance with a specific embodiment of the present invention.
[0012] FIG. 3A shows a block diagram of directory data structures
in accordance with a specific embodiment of the present
invention.
[0013] FIG. 3B shows a block diagram of an example of specific data
stored in the directory data structures in accordance with a
specific embodiment of the present invention.
[0014] FIG. 4A shows a block diagram of an example of specific data
stored in the peer directory structures in accordance with a
specific embodiment of the present invention.
[0015] FIGS. 4B and 4C illustrate flow diagrams of the directory
synchronization process between a local peer directory and a
central directory in accordance with a specific embodiment of the
present invention.
[0016] FIG. 5 illustrates a trace diagram of the technique for
searching files in accordance with a specific embodiment of the
present invention.
[0017] FIGS. 6A-6C illustrate a trace diagram of a file retrieving
technique in accordance with a specific embodiment of the present
invention.
[0018] FIG. 7 shows a trace diagram of another file retrieving
technique from multiple peers in accordance with an alternative
embodiment of the present invention.
[0019] FIG. 8 shows a block diagram of a chunk map for the
management of the retrieval of "chunks" of a file for the file
retrieving technique in accordance with the alternative embodiment
of the present invention.
[0020] FIG. 9 illustrates a flow diagram of the chunk management
technique across multiple worker threads for the file retrieving
technique in accordance with the alternative embodiment of the
present invention.
[0021] FIG. 10 illustrates a flow diagram of the chunk management
technique when a peer is unresponsive, for the file retrieving
technique in accordance with the alternative embodiment of the
present invention.
[0022] FIG. 11 shows a specific embodiment of a peer network device
60 which may be used for implementing the technique of the present
invention.
[0023] FIG. 12 is a diagram of an example of a screen shot
illustrating a user interface on a peer device in accordance with a
specific embodiment of the present invention.
[0024] FIG. 13 is a diagram of another example of a screen shot
illustrating a user interface showing a search input field in
accordance with a specific embodiment of the present invention.
[0025] FIG. 14 is a diagram of the example of a screen shot of FIG.
13 illustrating a selection of the search input field in accordance
with a specific embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0026] The present invention provides an improved technique for
accessing information in a peer-to-peer network. According to
specific embodiments of the present invention, each file accessible
in the peer-to-peer network is assigned a respective hash ID or
fingerprint ID which is used to describe the contents of that file.
According to one embodiment, a conventional hash or fingerprinting
algorithm may be used to analyze the contents of a selected file,
and generate a unique hash ID or fingerprint ID which may be used
for identifying the specific contents of that file. The hashing
algorithm is designed such that no two files having different file
content will have the same hash ID. However, files having identical
file content will have the same hash ID. In one implementation, the
file name and metadata associated with a file are not included in
the computation of the hash ID for that file.
[0027] According to specific embodiments, files in the peer-to-peer
network may be identified and/or accessed based upon their
associated hash ID values. In this way it is possible to identify
identical files stored in the peer-to-peer network which have
different file names and/or other metadata descriptors.
Additionally, since the content of all files having the same hash
ID will be identical, an automated process may be used to retrieve
the desired content from one or more of the identified files. For
example, a user may elect to retrieve a desired file (having an
associated hash ID) which may be stored at one or more remote
locations in the peer-to-peer network. Rather than the user having
to select a specific location for accessing and retrieving the
desired file, an automated process may use the hash ID (associated
with the desired file) to automatically select one or more remote
locations for retrieving the desired file. According to different
embodiments, the automated process may choose to retrieve the
entire file contents of the desired file from a specific remote
location, or may choose to receive selected portions of the file
contents of the desired file from different remote locations in the
peer-to-peer network. Further, if an error occurs during the file
transfer process, resulting in a partial file transfer, the
automated process may be configured to identify the portion(s) of
the desired file which were not retrieved, and automatically select
at least one different remote location for retrieving the remaining
contents of the desired file.
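By way of illustration only, the automated retrieval described above might be sketched as follows. The sketch assumes that every listed peer holds a file with the same hash ID, so any of them can supply any byte range. The names plan_portions, retrieve, and fetch are hypothetical and do not appear in this specification, and the sketch requests portions one after another rather than concurrently in order to stay short.

```python
def plan_portions(file_size, portion_size):
    """Split the byte range [0, file_size) into (start, end) portions.

    The end offset of each portion is exclusive.
    """
    return [(start, min(start + portion_size, file_size))
            for start in range(0, file_size, portion_size)]


def retrieve(file_size, portion_size, peers, fetch):
    """Fetch every portion, moving to another peer when a transfer fails.

    fetch(peer, start, end) is a caller-supplied, hypothetical function that
    returns the requested bytes or raises OSError on a failed transfer.
    """
    if not peers:
        raise ValueError("at least one peer is required")
    portions = plan_portions(file_size, portion_size)
    received = {}
    for index, (start, end) in enumerate(portions):
        # Rotate the starting peer so portions are spread across locations.
        shift = index % len(peers)
        candidates = peers[shift:] + peers[:shift]
        for peer in candidates:
            try:
                received[index] = fetch(peer, start, end)
                break
            except OSError:
                continue  # this location failed; try the next one
        else:
            raise RuntimeError(f"no peer could supply bytes {start}-{end}")
    # Reassemble the portions in their original order.
    return b"".join(received[i] for i in range(len(portions)))
```

Because each portion is tracked separately, a failure at one remote location affects only the portion that location was supplying. Portions already received are kept rather than discarded.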
[0028] Referring now to FIG. 1, a high level view of a peer-to-peer
network 100 is illustrated in accordance with a specific embodiment
of the present invention. The network 100 includes a plurality of
peer network devices 102a-102n, and at least one central server
system 110. According to one specific implementation, the peer
devices 102a-102n are communicably connected to each other and to
the central server 110 via the Internet 104. The peers may
communicate with each other and the server via the HTTP protocol or
via a private protocol, the former being preferable to minimize the
effects of various firewalls that may exist between any given peer
device and the Internet 104.
[0029] The server 110 preferably includes software or firmware that
handles communication between the peer devices 102a-102n and the
central server 110, and that performs specific logical operations
on the directory 112 on the server. According to one specific
implementation, the directory 112 may be stored in a single
relational database, but permanent storage may also be accomplished
by an object database, a directory server, or multiple
databases.
[0030] Files 202 that are to be shared are only stored on the peer
devices 102a-102n. The directory 112 stores information about the
files (e.g., HASH ID, filename, metadata, size, type, etc.) but not
their contents. In addition, the directory 112 stores information
about the peer devices themselves (e.g., Peer ID) and most
importantly, which peer devices the files 202 exist on at any given
time.
[0031] Briefly, it will be appreciated that file 202 may contain
any type of stored information, including, for example, JPEG,
MPEG, and MP3 content.
[0032] Turning now to FIG. 2, the unique fingerprinting of a file 202
is illustrated in accordance with one specific embodiment of the
present invention. In particular, the process by which a file 202
may be uniquely identified is through the application of HASH code
204. Each peer device 102a-102n that wishes to publish a file on
the peer-to-peer network 100 first computes the HASH code 204 of
that file. The HASH code 204 may be computed via conventional
algorithms which generate a unique identifier based upon the
content of a particular file. Examples of fingerprinting or hash
code algorithms which may be used in conjunction with the technique
of the present invention include MD5 (described in RFC 1321, and
attached hereto as Appendix A), and Keyed SHA1 (described in RFC
2841, and attached hereto as Appendix B). Each of these references
is incorporated herein by reference in its entirety for all
purposes.
[0033] According to one specific implementation, MD5 (See Appendix
A) may be used to guarantee that the HASH code is unique for each
different file, even when two files differ by as little as one bit.
Other algorithms may also be used such as, for example, Keyed SHA1
(See Appendix B) which mathematically characterizes the audio or
visual waveforms of the content of the file and creates a unique
code representation of that file. According to a specific
embodiment, the filename, time/date stamps, and any other meta-data
about the file are not included in the computation of the HASH
code. Different peers, or even the same peer, can refer to or
describe what is in fact the same file content by different names.
The desired files are more likely to be found, as different users
may search for files using different names, descriptions, or other
characteristics, although the file content is identical. Such
unique ID association allows each user to name and describe a given
file in the manner most relevant to them.
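As a minimal sketch of the computation described above, the following example derives a HASH code from file content alone using MD5 (RFC 1321) as provided by Python's standard hashlib module. The function names compute_hash_id and demo, the block size, and the sample file names are assumptions made for illustration. The file name and time/date stamps are never passed to the digest, so two copies of the same content stored under different names receive the same code.

```python
import hashlib
import os
import tempfile


def compute_hash_id(path, block_size=65536):
    """Return a hex MD5 digest computed over the file's bytes only."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in blocks so large files need not fit in memory.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def demo():
    """Hash two identically filled files and one altered copy."""
    content = b"example file content"
    with tempfile.TemporaryDirectory() as folder:
        first = os.path.join(folder, "Hotel California.mp3")
        second = os.path.join(folder, "California, Hotel.mp3")
        altered = os.path.join(folder, "altered.mp3")
        samples = ((first, content), (second, content), (altered, content + b"!"))
        for path, data in samples:
            with open(path, "wb") as handle:
                handle.write(data)
        return (compute_hash_id(first),
                compute_hash_id(second),
                compute_hash_id(altered))
```

Calling demo() returns three codes: the first two are equal because the content is identical, and the third differs because one byte was appended.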
[0034] According to one specific implementation, the central
directory data 300 may be stored in a relational database. FIG. 3A
illustrates three of the central tables in the directory data
structures 300: the file directory table 310; the elements table
320; and the users table 330. The file directory table 310, for
example, contains information about the file contents only,
including the HASH code or ID, the size, and the type (video,
music, image, etc.). Other information specific to the
contents of the file may also be stored in this table. The elements
table 320, on the other hand, contains descriptive information or
meta data about each file as entered and maintained by each
individual user. This elements table 320 also includes foreign keys
Hash code ID and Peer device ID that point to the file directory
table 310 and the users table 330, respectively. The users table 330 contains
information about each user and his/her peer device 102a-102n. The
users table also maintains the system's current knowledge as to
which user is currently connected to the peer-to-peer network
100.
[0035] According to one specific embodiment, the relationship
between the three tables of FIG. 3A can be defined such that each
user can have many files and each file can be owned by many users.
In addition, each user can have their own individual description of
each file and may possibly describe the same file in more than one
way, although the file content, and thus the HASH code, is
identical.
[0036] In one embodiment, should a file be deleted by every user
that ever had it, there will be no record in the elements table
320 of that file but a record of that file's existence will remain
in the file directory table 310.
[0037] As best viewed in FIG. 3B, a specific example is diagrammed
which illustrates how a number of files, their descriptions, and
the users that own them might be represented in the directory
database 112. In file directory table 350, for instance, the HASH
ID 354, the file size 356 and file type 358 are categorized and
stored in the central server 110 for each particular file
351a-351c. Correspondingly, in the elements table 360, the HASH ID
364, the file name 362, the metadata 366 and the peer Id 368 are
categorized and stored as well. Finally, in user table 370, the
Peer ID 374, the address location 372 and the "online"
determination, are categorized and stored on the server.
[0038] File 351a, for example, refers to a specific file whose HASH
code is "A," whose length is "n" bytes, and whose type is "MP3."
Two users, "Peer 1" and "Peer 2" each have a copy of this file
351a. Peer 1 has named file 351a as "Hotel California" in the file
name 362 and described it as "Eagles" (in this case the name of the
artist that recorded that song) in the metadata 366. Peer 2, in
contrast, has named file 351a as "California, Hotel" in the file
name 362 and described it simply as "Music File" in the metadata
366. Peer 1 and Peer 2 are both currently online, so that any other
peer wishing to obtain either "Hotel California" or "California,
Hotel," both of which ultimately have HASH code "A", will have two
choices as to which machine to obtain it from.
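The example of FIG. 3B can be sketched as three relational tables.
The following is a minimal illustration only, assuming a SQLite
database accessed through Python's sqlite3 module. The table names,
column names, file size, and peer addresses are placeholders chosen
for illustration and do not appear in the figures.

```python
import sqlite3

# Illustrative schema; all names and values below are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (hash_id TEXT PRIMARY KEY, size INTEGER, type TEXT);
CREATE TABLE users (peer_id TEXT PRIMARY KEY, address TEXT, online INTEGER);
CREATE TABLE elements (
    hash_id  TEXT REFERENCES files(hash_id),
    peer_id  TEXT REFERENCES users(peer_id),
    filename TEXT,
    metadata TEXT
);
""")

# One file record (content only), two users, two per-user descriptions.
conn.execute("INSERT INTO files VALUES ('A', 4000000, 'MP3')")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [("Peer 1", "10.0.0.1", 1), ("Peer 2", "10.0.0.2", 1)])
conn.executemany("INSERT INTO elements VALUES (?, ?, ?, ?)",
                 [("A", "Peer 1", "Hotel California", "Eagles"),
                  ("A", "Peer 2", "California, Hotel", "Music File")])

# Every name under which HASH code "A" is known, with the address of
# each online peer that holds a copy.
rows = conn.execute(
    "SELECT e.filename, e.metadata, u.address "
    "FROM elements e JOIN users u ON u.peer_id = e.peer_id "
    "WHERE e.hash_id = ? AND u.online = 1 ORDER BY e.peer_id",
    ("A",)).fetchall()
```

Both rows returned by the query refer to the same content, which is
why either machine can serve a request made under either name.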
[0039] Each peer 102a-102n in the peer-to-peer network 100
maintains its own local directory of the files (e.g., 451a-451c)
it has available. In one particular embodiment, this directory may
be stored as a local XML file for easy exchange with the server 110
over internet 104. FIG. 4A illustrates the contents of a local
directory on a specific peer. The local file table 450 contains a
record for each file 451a-451c. Each file may be described by its
filename 452 (which may be subject to the constraints of the local
operating system as to permitted characters, format, and length), a
HASH ID 453, a file size 454, a file type 457, file meta data 458,
and a location or folder path 459. The latter allows the peer
device to quickly locate the file on the local disk when another
peer device requests it.
[0040] Other files may be added to the local peer directory
whenever the user drags them into one or more specific folders on
the local machine. Files may also be deleted from the local
directory when they are deleted from those folders or moved out of
the specific directories. Alternative implementations may use
different business rules and user interface processes to specify
when a file becomes available for sharing.
[0041] For the central directory 112 to reflect what files are
available and where they may be found, so that they can readily be
searched by any peer 102a-102n, a directory synchronization process
is necessary. Applying this technique, it may be the responsibility
of each peer 102a-102n in the network 100 to send its list of
available files, at least periodically, along with any changes to
that list as they occur, to the central directory 112.
[0042] Referring now to one specific implementation, FIG. 4B shows
the directory synchronization process between the local peer
directory 450 and the central directory 112 commencing at operation
400. Initially at operation 402, the selected local files to be
made available for file sharing are identified. The HASH code of
each file to be shared (if it has not already been done previously
or if a file has changed for any reason) is computed in operation
403. Once the HASH code for each new or changed file has been
computed, the entire local directory 450, or alternatively only
changes and additions to the local directory, are transmitted to
the central server 110 in operation 404 via the internet 104. The
central server 110 then proceeds to synchronize each individual
file with the central directory 112. The first central directory
operation 405 checks to see if a file with that particular HASH ID
already exists in the file directory 310 of data directory
structure 300 (FIG. 3A). If it doesn't, operation 406 is performed
to add the new file to the central directory 112. In either case,
the system now proceeds to operation 407, which is to check to see
if a record exists in the elements table 320 for this particular
file and user combination. If it does not, operation 408 is
performed to add the filename, meta data, and peer ID to the
elements table. If an element already existed for that user ID and
file ID in the elements table 320, that record is checked for any
required changes and updated as necessary. In any event, the new
data coming from the peer 102a-102n takes precedence over
corresponding existing data in the central directory 112.
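The server-side portion of this synchronization (operations 405
through 408) can be sketched as follows. This is a simplified
illustration that assumes the central directory is held in two
in-memory dictionaries; the function name, the dictionary layout,
and the record fields are hypothetical.

```python
def synchronize(central, peer_id, local_records):
    """Merge one peer's local directory records into the central directory.

    central["files"]    maps hash_id -> {"size", "type"}
    central["elements"] maps (hash_id, peer_id) -> {"filename", "metadata"}
    Both layouts are assumptions made for this sketch.
    """
    for rec in local_records:
        hash_id = rec["hash_id"]
        # Operations 405/406: add the file only if its HASH ID is new.
        if hash_id not in central["files"]:
            central["files"][hash_id] = {"size": rec["size"],
                                         "type": rec["type"]}
        # Operations 407/408: add the element for this file and user, or
        # overwrite it, since data from the peer takes precedence.
        central["elements"][(hash_id, peer_id)] = {
            "filename": rec["filename"],
            "metadata": rec["metadata"],
        }
    return central


central = {"files": {}, "elements": {}}
synchronize(central, "Peer 1", [{"hash_id": "A", "size": 100, "type": "MP3",
                                 "filename": "Hotel California",
                                 "metadata": "Eagles"}])
synchronize(central, "Peer 2", [{"hash_id": "A", "size": 100, "type": "MP3",
                                 "filename": "California, Hotel",
                                 "metadata": "Music File"}])
```

After both calls the sketch holds one file record and two element
records, one per user.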
[0043] FIG. 4C further illustrates an alternative embodiment
process of adding new files into the local directory 470. This file
addition process may be performed whenever a new location (a file,
several files, or a folder containing several files) is added to
the list of locations available for file sharing or whenever a new
file is added to an existing shared folder. Whenever a new file is
identified in operation 474, its HASH code is immediately
determined in operation 476. The local directory 450 is updated in
operation 478 with the particular information for that file,
including its filename, HASH ID, size, and location on the peer
device.
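Operations 476 and 478 can be sketched as follows. The specification
does not name a particular hash function here, so the use of SHA-256
from Python's hashlib module is an assumption, as are the function
names and the dictionary keys.

```python
import hashlib
import os
import tempfile


def hash_file(path, block_size=65536):
    """Compute a content hash by reading the file in fixed-size blocks."""
    digest = hashlib.sha256()  # assumed hash function, not from the source
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def directory_entry(path):
    """Build a local directory record for one shared file."""
    return {
        "filename": os.path.basename(path),
        "hash_id": hash_file(path),
        "size": os.path.getsize(path),
        "location": os.path.dirname(os.path.abspath(path)),
    }


# Two files with different names but identical content.
with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, "Hotel California.mp3")
    second = os.path.join(folder, "California, Hotel.mp3")
    for path in (first, second):
        with open(path, "wb") as f:
            f.write(b"identical content")
    entry_a, entry_b = directory_entry(first), directory_entry(second)
```

The two entries differ in filename but carry the same hash_id,
because the hash depends only on the bytes of the file.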
[0044] FIG. 5 illustrates the process of searching for files
according to one specific implementation. When a requesting peer1
(504) searches for a full or partial filename or a particular
keyword, the peer software sends the search request to the central
server 502 in operation (1). The central server 502 processes this
request and generates a list of matching files at operation (3),
returning it to the requesting peer1 (504) in operation (5). The
peer1 (504) displays only the relevant information to the user, who
then selects one or more files to retrieve in operation (7). The
requesting peer1 (504) then sends the HASH ID of one or more files
to be retrieved to the server 502 at operation (9). In operation
(11), the server identifies zero, one, or more "on-line" locations
(addresses) at which the requested file(s) may be found. The list
of HASH IDs and matching locations is returned to the requesting
peer1 (504) in operation (13). Finally, in operation (15), the
requesting peer initiates the procedure to retrieve the selected
file(s) from the locations provided by the server 502, the
operations of which are described below.
[0045] In one embodiment, the returned list in operation (13) may
be limited to a maximum number of locations for each file, since in
practice a file can usually be reliably retrieved from a relatively
small number of locations. In an alternative embodiment, the server
502 may have returned the possible locations of each file with the
results of operation (5). This would be beneficial by virtue of
saving the second query to the server in operation (9) but would be
expensive since locations would have to be found and transmitted,
even for files that are not ultimately desired. In addition, the
locations of an available file may change in the intervening time
between the original search and the selection of particular files
to retrieve.
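The two server queries described above, a search by name or keyword
followed by a lookup of locations by HASH ID, can be sketched as
follows. The in-memory lists stand in for the elements and users
tables, and the function names, the sample records, the addresses,
and the default limit of three locations are all assumptions made
for illustration.

```python
# Hypothetical in-memory stand-ins for the elements and users tables.
ELEMENTS = [
    # (hash_id, peer_id, filename, metadata)
    ("A", "Peer 1", "Hotel California", "Eagles"),
    ("A", "Peer 2", "California, Hotel", "Music File"),
    ("B", "Peer 2", "Holiday Photo", "Image"),
]
USERS = {
    "Peer 1": {"address": "10.0.0.1", "online": True},
    "Peer 2": {"address": "10.0.0.2", "online": False},
}


def search(term):
    """Operations (1)-(5): match a full or partial filename or a keyword."""
    term = term.lower()
    return [(hash_id, name, meta)
            for hash_id, _peer, name, meta in ELEMENTS
            if term in name.lower() or term in meta.lower()]


def locate(hash_ids, max_locations=3):
    """Operations (9)-(13): map each HASH ID to addresses of online peers."""
    result = {}
    for hash_id in hash_ids:
        addresses = [USERS[peer]["address"]
                     for h, peer, _name, _meta in ELEMENTS
                     if h == hash_id and USERS[peer]["online"]]
        result[hash_id] = addresses[:max_locations]  # cap the list size
    return result


matches = search("california")
locations = locate({hash_id for hash_id, _name, _meta in matches})
```

Keeping the location lookup separate means that addresses are only
computed for files the user actually selects, at the cost of a
second round trip to the server.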
[0046] Once a list of locations for a file has been identified, the
requesting peer1 (504) can choose a location from the list
(assuming there is more than one location) using any number of
different techniques. The requesting peer1 (504) can just pick the
first location from the list, pick a location at random, or use
some heuristic algorithm to find the "best" location to retrieve
from. This may involve "pinging" each location to determine its
relative distance on the network 100 from the requesting peer1
(504), or requesting a 1024 byte packet from each location to see
which one can deliver bytes fastest. Such heuristics may also be
useful to see which peers the requesting peer can complete a
connection to. It may be impractical for the server 502 to know
which peers can communicate efficiently with what other peers at
any given time, since the server may not have sufficient knowledge
of the topology of the network 100 or of the traffic loads that
exist on it at any particular time.
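The selection techniques mentioned above can be sketched as follows.
This is an illustration only: the function name, the strategy
labels, and the probe callable (which is assumed to return a
measured response time in seconds, or to raise OSError when no
connection can be completed) are hypothetical.

```python
import random


def pick_location(locations, strategy="first", probe=None):
    """Choose one location from the list returned by the server.

    probe(location) is an assumed callable that returns a measured
    response time in seconds or raises OSError if unreachable.
    """
    if not locations:
        return None
    if strategy == "first":
        return locations[0]
    if strategy == "random":
        return random.choice(locations)
    # strategy == "fastest": probe every location, skip unreachable ones.
    timings = {}
    for location in locations:
        try:
            timings[location] = probe(location)
        except OSError:
            continue
    return min(timings, key=timings.get) if timings else None


# Stand-in measurements; None marks a peer that cannot be reached.
measured = {"peer-a": 0.20, "peer-b": 0.05, "peer-c": None}


def fake_probe(location):
    if measured[location] is None:
        raise OSError("no connection")
    return measured[location]


best = pick_location(["peer-a", "peer-b", "peer-c"], "fastest", fake_probe)
```

Probing from the requesting peer also filters out locations it
cannot connect to, which the server may have no way of knowing.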
[0047] In FIG. 6A, in accordance with one specific embodiment, once
the requesting peer1 (504) has chosen a first location to start
retrieving the file from (e.g., peer2 (506) in this case), the
requesting peer1 makes a request at operation (17) to peer2 (506)
for a file that has the desired HASH ID. Peer2 (506) then
identifies which file in its local directory corresponds to the
desired HASH ID, for example, by performing a lookup or search in
its local directory 450 at operation (19). If the requesting peer1
(504) and peer2 (506) can reliably communicate, peer2 transmits the
contents of the requested file to the requesting peer1 at operation
(21). Once the file has been successfully retrieved, the file is
stored locally in operation (23).
[0048] The name and meta data attached to the file at this point
will be that which was originally selected in the search results in
operation (5) of FIG. 5. This name and meta data may not
necessarily be the same as the name and meta data attached to the
file by peer2 (506). Finally, in operation (24), the requesting
peer1 notifies the server 502 that it has a copy of the file, in
this way potentially becoming a fulfilling peer for a subsequent
request for this same file. Furthermore, this final message from
the requesting peer can be used by the server to log successful
file transfers, helping operators monitor the efficiency of the
network 100.
[0049] In some situations, such as the presence of firewalls, proxy
servers, or other network devices, or the fact that the fulfilling
peer is no longer online, the first fulfilling peer2 (506) may not
answer a request for a file. FIG. 6B illustrates this situation, in
one specific implementation. If the fulfilling peer2 (506) does not
respond within a nominal timeout interval, the requesting peer1
(504) will select the next location Peer3 (508) from the list of
available locations determined in operation (13) of FIG. 5, using
any one of a number of heuristic algorithms to do so. Assuming in
this case that Peer1 (504) and Peer3 (508) can communicate, Peer1
will send a request for the file with the specified HASH ID to
Peer3 in operation (27). Next, Peer3 will find that file on the
local peer device in operation (29), and will transmit the contents
of the requested file to Peer 1 in operation (31). Once the file
has been successfully retrieved, the file is stored locally in
operation (33). Finally, in operation (34), the requesting peer
(504) will notify the server 502 that it now has a copy of the
file.
[0050] According to one specific implementation, the retrieving of
a file from one fulfilling peer that is interrupted for any reason
may be resumed from another peer that is online and that has that
file. FIG. 6C illustrates the situation where a file may be
partially retrieved from the first fulfilling peer2 (506) in the
list of locations determined at operation (13) of FIG. 5. After a
time, the first fulfilling Peer2 (506) is no longer providing the
contents of the file to the requesting peer1 (504) at operation
(21a). The requesting Peer1 (504) detects a timeout in operation
(35) and then decides to proceed with retrieving the remainder of
the file from a second fulfilling Peer3 (508), the next peer in the
list of available locations. The requesting peer1 (504) in this
case makes a request to Peer3 (508) in operation (37) for the file
with HASH ID, starting at a position one byte greater than the
amount of the file retrieved so far.
[0051] It is crucial that the file contents on the first fulfilling
Peer2 (506) and the second fulfilling Peer3 (508) corresponding to
HASH ID be identical in every respect, since otherwise these parts
of files may not fit together correctly and result in a damaged or
corrupted final file. This is why it is essential to pick a HASH
function that will create a unique HASH code from a file's
contents.
[0052] Once the second fulfilling Peer3 (508) identifies which file
in its local directory corresponds to the desired HASH ID at
operation (39), the new fulfilling Peer3 (508) returns the
remainder of the file to the requesting peer1 (504) in operation
(41). The application in the requesting peer1 (504) then joins the
two chunks of the file together in operation (43) and stores the
file locally. In operation (44), the requesting peer1 (504) will
notify and update the central directory 112 of the server 502 that
it now has a copy of the file.
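The resume behavior of FIG. 6C can be sketched as follows. The
FakePeer class is a hypothetical stand-in for a remote peer, and the
sketch assumes the requesting peer already knows the file size from
the directory. Offsets are 0-based here, so the offset equal to the
number of bytes received corresponds to the position one byte
greater than the amount retrieved so far.

```python
class FakePeer:
    """Stand-in for a remote peer serving files keyed by HASH ID."""

    def __init__(self, files, limit=None):
        self.files = files
        self.limit = limit  # bytes delivered before the peer goes silent

    def read_from(self, hash_id, offset):
        data = self.files[hash_id][offset:]
        return data if self.limit is None else data[:self.limit]


def retrieve(hash_id, size, peers):
    """Retrieve a file, resuming from the next peer after an interruption."""
    received = b""
    for peer in peers:
        if len(received) == size:
            break
        # Ask for the remainder, starting at the first byte not yet held.
        received += peer.read_from(hash_id, len(received))
    if len(received) != size:
        raise OSError("file could not be completed from the listed peers")
    return received


content = bytes(range(256)) * 4
peer2 = FakePeer({"A": content}, limit=300)  # interrupted after 300 bytes
peer3 = FakePeer({"A": content})
result = retrieve("A", len(content), [peer2, peer3])
```

Joining the two pieces is only safe because both peers hold
byte-for-byte identical content under the same HASH ID.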
[0053] In an alternative embodiment, when the relative ability of
multiple peers to deliver files is not known, it can be
advantageous to retrieve different parts of a single file from
multiple peers 102a-102n simultaneously. This may be particularly
true if the requesting peer has a faster connection than most of
the fulfilling peers since the file can be retrieved faster than
any single fulfilling peer can deliver it.
[0054] According to one specific implementation shown in FIG. 7,
the requesting peer1 (504) requests different "chunks" of the
desired file from two different fulfilling peers--(506) and (508).
The partial file requests in operation (2a) and operation (2b)
take the form of the HASH ID of the desired file, the starting
position in the file, and the end position in the file. In this
example, the requesting peer1 (504) may request chunks of size
"n" bytes. The chunk size "n" may be statically or dynamically
determined based upon parameters such as the size of the file to be
retrieved and/or the number of peers which currently have the file
available. The first partial request in operation (2a), which may
be for the first "n" bytes, goes to peer3 (508), starting at byte
no. 1 and ending at byte no. n. The second partial request in
operation (2b), which may be for the second "n" bytes, goes to
peer2 (506), starting at byte no. n+1 and ending at byte no. 2n.
Each fulfilling peer
returns the requested part of the file to the requesting peer1
(504) in operations (8a) and (8b). In operation 43, the requesting
peer1 (504) reassembles the chunks of the file received from each
peer in the right order into the actual file. The requesting peer1
(504) will then notify and update the central directory 112 of the
server 502 that it now has a copy of the file (this operation is
not shown).
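The start and end positions used in such partial requests, and the
reassembly of the returned parts, can be sketched as follows. The
function names are hypothetical. Positions are 1-based and inclusive
to match the byte numbering used above.

```python
def chunk_ranges(file_size, n):
    """Return 1-based, inclusive (start, end) positions for n-byte chunks."""
    return [(start, min(start + n - 1, file_size))
            for start in range(1, file_size + 1, n)]


def reassemble(parts):
    """Join chunks, keyed by start position, in the right order."""
    return b"".join(parts[start] for start in sorted(parts))


data = b"abcdefghij"
ranges = chunk_ranges(len(data), 4)

# Simulate parts arriving out of order from different fulfilling peers.
parts = {}
for start, end in reversed(ranges):
    parts[start] = data[start - 1:end]  # convert to a 0-based slice
rebuilt = reassemble(parts)
```

Because each part is keyed by its start position, the order in which
the peers respond does not affect the reassembled file.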
[0055] A file may be retrieved from multiple fulfilling peers in
parts or "chunks." According to one specific embodiment, it may be
the responsibility of the requesting peer to keep track of what
chunks have already been retrieved, what chunks are currently being
retrieved, and what chunks remain to be retrieved. FIG. 8
illustrates a "chunk map" 800 constructed by a "chunk manager" tool
or application in which a file has been divided into "m" chunks.
Each chunk will typically have the same size, for example "n"
bytes, except for the last chunk which may have an odd size since
there may be no guarantee that the requested file can be divided
into a number of equal sized chunks. According to this specific
embodiment, each chunk exists in one of three possible states:
AR=Assigned and Retrieved; ANR=Assigned and Not Retrieved; and
NAR=Not Assigned and Not Retrieved. The chunk manager uses this
state information to determine which chunks are to be retrieved
next. The file is known to have been completely retrieved when
every chunk is in the "AR" state.
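A chunk map with these three states can be sketched as follows. The
class and method names are hypothetical, and the sketch tracks only
the state of each chunk, not its contents.

```python
AR, ANR, NAR = "AR", "ANR", "NAR"


class ChunkMap:
    """Tracks the state of each of the m chunks of one file."""

    def __init__(self, file_size, n):
        self.m = (file_size + n - 1) // n  # last chunk may be shorter
        self.state = [NAR] * self.m

    def next_unassigned(self):
        """Mark the first NAR chunk as ANR and return its index, or None."""
        for index, state in enumerate(self.state):
            if state == NAR:
                self.state[index] = ANR
                return index
        return None

    def mark_retrieved(self, index):
        self.state[index] = AR

    def release(self, index):
        """Return an assigned but unfinished chunk to the NAR state."""
        if self.state[index] == ANR:
            self.state[index] = NAR

    def complete(self):
        return all(state == AR for state in self.state)


chunk_map = ChunkMap(file_size=250, n=100)
first = chunk_map.next_unassigned()
chunk_map.mark_retrieved(first)
```

In this sketch a chunk moves from NAR to ANR when it is assigned, and
from ANR either forward to AR or back to NAR if the assignment is
withdrawn.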
[0056] In one specific example, a file whose size is 1.45 MB, for
instance, is being requested. If the system has been configured to
use a value of n=100 kb, there will be m=15 chunks, each of size
100 kb, except for the 15th chunk, whose size will be 50
kb.
[0057] According to one specific implementation, the chunk manager
can assign the retrieval of any one chunk of a file to a worker
thread (i.e. Peer1-PeerN). Multiple worker threads may be running
in parallel, each retrieving a distinct chunk of the file. The
chunk manager may employ a variety of techniques to assign chunks
to retrieve and peers to retrieve from to different threads. In one
embodiment, the chunk manager assigns "p" chunks sequentially to
"p" individual threads. Typically, p ≤ m, the number of chunks
available, although an alternative may be to use more threads than
there are chunks and to simply terminate the surplus threads that
are not finished when the entire file has been retrieved. The
number of threads that can be run in parallel may be constrained by
system resources available on the peer device, by operating system
constraints, or for any other reason.
[0058] FIG. 9 illustrates one specific embodiment where a chunk
manager executive thread at operation (900) manages the efforts of
multiple worker threads. The executive thread starts with initial
parameters at operation (902) including a list of peers that have
the file with HASH ID that is to be retrieved, as well as the chunk
size "n" to be used in retrieving. Alternatively, the chunk manager
may compute its own value of n based on the number of peers
available and the size of the file.
[0059] The executive thread's first task at operation (904) is to
launch a number "p" of worker threads. Each worker thread starts up
in operation (932) and waits for an assignment in operation
(934).
[0060] The executive thread assigns a chunk to each available
worker thread in operation (906). Each unit of work may be
characterized by the HASH ID of the file to be retrieved, the peer
to retrieve it from, and the start and end positions in the file.
Each worker thread accepts a work assignment in operation (936),
and then makes a request to the assigned peer for the assigned
chunk in operation (938). This request includes the HASH ID of the
file and the start and end positions in the file. Meanwhile, the
executive thread waits for chunks to be received in operation
(908).
[0061] A query is performed at operation (940) about whether or not
the desired chunk has been completely received. If "YES", when a
worker thread has completely received a chunk, it sends that chunk
or preferably, a reference to the location of the chunk in memory
or on a storage device, to the chunk manager in operation (942).
The chunk manager accepts the chunk at operation (910), and updates
its chunk map 800, marking the received chunk as having state
"AR" in operation (912).
[0062] If the query performed at operation (940) is answered with a
"NO", the failure is reported to the executive manager at operation
(944). This reported information will include the HASH ID of the
file to be retrieved, the peer to retrieve it from, and the start
and end positions in the file of the chunk not retrieved.
[0063] At this point, regardless of the query at operation (940),
the worker thread that has just finished returns to operation 934
and waits for another assignment. The executive thread examines its
updated chunk map in operation 914 to see if there are any
unassigned chunks (state "NAR") remaining. If there are, it selects
one unassigned chunk in operation (916) and assigns it to a free
worker thread in operation (906).
[0064] If there are no unassigned chunks, the executive thread may
decide to select an existing assigned, but not yet fully received
"ANR" chunk, for reassignment in operation (920). This decision may
be made based upon a variety of factors, including the current rate
of retrieving of unfinished chunks, the availability of additional
worker threads or peers, or the relative retrieve speed of
available peers. If an assigned, but unfinished chunk is selected
for reassignment, the existing worker thread assigned to that chunk
has its assignment terminated and becomes available for
reassignment at operation (922). The chunk it was working on is now
marked as "NAR" and is ready for reassignment in operation
(906).
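The interaction between the executive thread and the worker threads
of FIG. 9 can be sketched as follows, using Python's threading and
queue modules. This is a simplified illustration, not the described
implementation: each peer is represented by a hypothetical callable
taking 0-based start and end offsets, a peer that fails is dropped
from the list of available peers, and the reassignment of slow "ANR"
chunks in operations (920) and (922) is omitted.

```python
import queue
import threading


def retrieve_file(size, n, peers, p):
    """Retrieve `size` bytes in n-byte chunks using p worker threads.

    Each entry of `peers` is an assumed callable fetch(start, end) that
    returns bytes [start, end) of the file or raises OSError.
    """
    if not peers:
        raise ValueError("at least one peer is required")
    m = (size + n - 1) // n
    state, parts = ["NAR"] * m, [None] * m
    work, results = queue.Queue(), queue.Queue()

    def worker():
        while True:
            job = work.get()                       # operation (934)
            if job is None:                        # shutdown signal
                return
            index, peer_id = job                   # operation (936)
            start, end = index * n, min((index + 1) * n, size)
            try:
                data = peers[peer_id](start, end)  # operation (938)
            except OSError:
                data = None                        # reported, as in (944)
            results.put((index, peer_id, data))    # operation (942)

    threads = [threading.Thread(target=worker) for _ in range(p)]
    for t in threads:
        t.start()
    available = list(range(len(peers)))  # peers still considered responsive
    turn = 0

    def assign(index):                             # operation (906)
        nonlocal turn
        state[index] = "ANR"
        work.put((index, available[turn % len(available)]))
        turn += 1

    try:
        for index in range(min(p, m)):
            assign(index)
        while any(s != "AR" for s in state):
            index, peer_id, data = results.get()   # operations (908)-(910)
            if data is None:
                state[index] = "NAR"
                if peer_id in available:
                    available.remove(peer_id)
                if not available:
                    raise OSError("no responsive peers remain")
            else:
                parts[index], state[index] = data, "AR"  # operation (912)
            if "NAR" in state:                     # operations (914)-(916)
                assign(state.index("NAR"))
    finally:
        for _ in threads:
            work.put(None)
        for t in threads:
            t.join()
    return b"".join(parts)


data = bytes(range(256)) * 10


def good_peer(start, end):
    return data[start:end]


def silent_peer(start, end):
    raise OSError("peer not responding")


result = retrieve_file(len(data), 100, [silent_peer, good_peer], p=4)
```

In this sketch a free worker is given the next unassigned chunk each
time a result arrives, so a worker paired with a fast peer naturally
ends up retrieving more chunks.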
[0065] In one specific embodiment, a specific peer may be attached
to a specific worker thread for the duration of the process. Worker
threads that finish sooner may get new work assigned to them, with
that new work being targeted to peers that are faster at delivering
chunks of the file. Should a peer fail to deliver a chunk in a
timely manner, it may be removed from the list of available peers
and the thread may request a new peer to interact with. The
delivery speed of various peers may vary over time, so that what
was a fast peer at the beginning of the process becomes a slow peer
towards the end or vice-versa. Since new work tends to go to
threads that are finishing their work fastest, the system
self-optimizes the retrieval to deliver the file as fast as
possible.
[0066] By way of example, in the previously discussed file of FIG.
8 where the 1.45 MB file is divided into m=fifteen (15) chunks of
n=100 kb (the 15th chunk being 50 kb), ten (10) peers are
online and have the file with HASH ID available. The executive
manager decides to use p=eight (8) worker threads to retrieve the
file. If every chunk retrieval is successful and takes the same
amount of time, each thread will retrieve two 100 kb chunks, except
for the 8th thread, which will only retrieve one 100 kb chunk, and
for the 7th thread, which will retrieve a 100 kb chunk followed by
a 50 kb chunk.
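The distribution of chunks in this idealized case can be checked
with a short calculation. The sketch below assumes that, when every
retrieval takes the same time, chunks are handed to the threads in
strict rotation; the function name is hypothetical and sizes are in
kb.

```python
def sequential_assignment(chunk_sizes, p):
    """Give chunk i to thread i mod p, as when all retrievals take equally long."""
    threads = [[] for _ in range(p)]
    for index, size in enumerate(chunk_sizes):
        threads[index % p].append(size)
    return threads


# A 1.45 MB file in 100 kb chunks: fourteen full chunks and one of 50 kb.
chunk_sizes = [100] * 14 + [50]
plan = sequential_assignment(chunk_sizes, p=8)
```

Here plan[6], the 7th thread, holds a 100 kb chunk followed by the
50 kb chunk, and plan[7], the 8th thread, holds a single 100 kb
chunk.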
[0067] In practice, some threads will be assigned to unresponsive
peers and will fail to retrieve their chunks the first time. The
executive thread may mark these peers as unresponsive and assign
new peers to the available threads. Furthermore, some peers may be
much faster than others at retrieving chunks. In that case, they
will become available for retrieving new chunks earlier, the result
being that one thread may retrieve five (5) or six (6) chunks while
the remainder only retrieve one or two.
[0068] Referring now to FIG. 10, one specific embodiment of a
possible process for dealing with peers that fail to respond for
chunks of a file is illustrated. The executive thread receives a
failure notice from a worker thread in operation (1002). It then
sets the measured speed of the corresponding peer to zero (0) in
operation (1004). Subsequent peer assignments may use a ranking of
measured speeds to pick the fastest available peers rather than the
slower, or non-responsive ones. According to a specific embodiment,
each peer may be assigned an average or nominal speed prior to the
start of file retrieving. The speed of each peer would then be set
to the actual speed as chunks are actually delivered. Since
delivery speed may vary on a minute-by-minute basis, the most
recent measurement of peer speed may be deemed to be authoritative.
At operation (1006), the status of the peers identified as being
non-responsive is updated.
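The speed bookkeeping described above can be sketched as follows.
The class name, the method names, and the nominal starting speed are
hypothetical, and speeds are expressed in bytes per second purely
for illustration.

```python
class PeerSpeeds:
    """Tracks the most recent measured delivery speed of each peer."""

    def __init__(self, peers, nominal=1000.0):
        # Every peer starts at an assumed nominal speed.
        self.speed = {peer: nominal for peer in peers}

    def record_delivery(self, peer, num_bytes, seconds):
        # The most recent measurement replaces any earlier value.
        self.speed[peer] = num_bytes / seconds

    def record_failure(self, peer):
        # Operation (1004): a failed peer has its measured speed set to zero.
        self.speed[peer] = 0.0

    def ranked(self):
        """Return responsive peers, fastest first."""
        responsive = [peer for peer, s in self.speed.items() if s > 0]
        return sorted(responsive, key=lambda peer: self.speed[peer],
                      reverse=True)


speeds = PeerSpeeds(["peer-a", "peer-b", "peer-c"])
speeds.record_delivery("peer-a", 100000, 2.0)
speeds.record_failure("peer-b")
ranking = speeds.ranked()
```

A peer that has not yet been measured keeps its nominal speed, so it
still ranks above a peer that has failed to respond.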
[0069] FIG. 12 illustrates one example of a user interface 1200 on
the peer device according to a specific embodiment of the present
invention. The user interface, for instance, may be implemented as
an application running on the Microsoft Windows operating
system.
[0070] In one example, as viewed in FIG. 13, the user interface
1200 includes a search input field 1300 where the user can search
for files. In this example the search term "clinton" is entered in
field 1300. Files that are found are displayed in a search results
list 1301.
[0071] The user can then select one or more files from the list
1301 of search results, as shown in FIG. 14, to retrieve from other
peers. In this instance, the mpeg file "clintong.mpeg" 1400 is
highlighted.
[0072] It will be appreciated that the technique of the present
invention provides improved peer-to-peer networking technology for
enabling faster and more reliable downloads, using multiple peers
in a round-robin or simultaneous retrieving mode, and/or being able
to resume failed downloads from different peers. According to a
specific embodiment, at least a portion of these features may be
implemented by identifying files based on their contents rather
than their file names. In this way it is possible to identify and
retrieve file content from one or more identical files stored in
the peer-to-peer network which have different file names and/or
other metadata descriptors.
[0073] According to a specific embodiment, the peer-to-peer network
of the present invention includes a central directory, like
Napster, but unlike Gnutella which uses a distributed directory.
However, it will be appreciated that the technique of the present
invention may be applied both to central directory systems and to
peer-to-peer, distributed directory systems.
[0074] Other Embodiments
[0075] Generally, the peer-to-peer file sharing techniques of the
present invention may be implemented in software and/or hardware.
For example, they can be implemented in an operating system kernel,
in a separate user process, in a library package bound into network
applications, on a specially constructed machine, or on a network
interface card. In a specific embodiment of this invention, the
technique of the present invention is implemented in software such
as an operating system or in an application running on an operating
system.
[0076] A software or software/hardware hybrid implementation of the
peer-to-peer file sharing technique of this invention may be
implemented on a general-purpose programmable machine selectively
activated or reconfigured by a computer program stored in memory.
Such programmable machine may be a network device designed to
handle network traffic, such as, for example, a router or a switch.
Such network devices may have multiple network interfaces including
frame relay and ISDN interfaces, for example. Specific examples of
such network devices include routers and switches. For example, the
technique of the present invention may be implemented on specially
configured routers or servers such as specially configured router
models 1600, 2500, 2600, 3600, 4500, 4700, 7200, 7500, and 12000
available from Cisco Systems, Inc. of San Jose, Calif. A general
architecture for some of these machines will appear from the
description given below. In an alternative embodiment, the
peer-to-peer file sharing technique of this invention may be
implemented on a general-purpose network host machine such as a
personal computer or workstation. Further, the invention may be at
least partially implemented on a card (e.g., an interface card) for
a network device or a general-purpose computing device.
[0077] Referring now to FIG. 11, a network device 60 suitable for
implementing the peer-to-peer file sharing techniques of the
present invention includes a master central processing unit (CPU)
62, interfaces 68, and a bus 67 (e.g., a PCI bus). When acting
under the control of appropriate software or firmware, the CPU 62
may be responsible for implementing specific functions associated
with the functions of a desired network device. For example, when
configured as a server device, the CPU 62 may be responsible for
analyzing packets, encapsulating packets, forwarding packets to
appropriate network devices, processing file search requests,
maintaining shared file information across the peer-to-peer
network, etc. Alternatively, when configured as a peer network
device, the CPU 62 may be responsible for initiating file search
requests, retrieving file content information from peer devices,
performing hash coding operations on selected files, etc. The CPU
62 preferably accomplishes all these functions under the control of
software including an operating system (e.g. Windows NT), and any
appropriate applications software.
[0078] CPU 62 may include one or more processors 63 such as a
processor from the Motorola family of microprocessors or the MIPS
family of microprocessors. In an alternative embodiment, processor
63 is specially designed hardware for controlling the operations of
network device 60. In a specific embodiment, a memory 61 (such as
nonvolatile RAM and/or ROM) also forms part of CPU 62. However,
there are many different ways in which memory could be coupled to
the system. Memory block 61 may be used for a variety of purposes
such as, for example, caching and/or storing data, programming
instructions, etc.
[0079] The interfaces 68 are typically provided as interface cards
(sometimes referred to as "line cards"). Generally, they control
the sending and receiving of data packets over the network and
sometimes support other peripherals used with the network device
60. Among the interfaces that may be provided are Ethernet
interfaces, frame relay interfaces, cable interfaces, DSL
interfaces, token ring interfaces, and the like. In addition,
various very high-speed interfaces may be provided such as fast
Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces,
HSSI interfaces, POS interfaces, FDDI interfaces and the like.
Generally, these interfaces may include ports appropriate for
communication with the appropriate media. In some cases, they may
also include an independent processor and, in some instances,
volatile RAM. The independent processors may control such
communications intensive tasks as packet switching, media control
and management. By providing separate processors for the
communications intensive tasks, these interfaces allow the master
microprocessor 62 to efficiently perform routing computations,
network diagnostics, security functions, etc.
[0080] Although the system shown in FIG. 11 illustrates one
specific network device of the present invention, it is by no means
the only network device architecture on which the present invention
can be implemented. For example, an architecture having a single
processor that handles communications as well as routing
computations, etc. is often used. Further, other types of
interfaces and media could also be used with the network
device.
[0081] Regardless of the network device's configuration, it may employ
one or more memories or memory modules (such as, for example,
memory block 65) configured to store data, program instructions for
the general-purpose network operations and/or other information
relating to the functionality of the peer-to-peer file sharing
techniques described herein. The program instructions may control
the operation of an operating system and/or one or more
applications, for example. The memory or memories may also be
configured to include.
[0082] Because such information and program instructions may be
employed to implement the systems/methods described herein, the
present invention relates to machine readable media that include
program instructions, state information, etc. for performing
various operations described herein. Examples of machine-readable
media include, but are not limited to, magnetic media such as hard
disks, floppy disks, and magnetic tape; optical media such as
CD-ROM disks; magneto-optical media such as floptical disks; and
hardware devices that are specially configured to store and perform
program instructions, such as read-only memory devices (ROM) and
random access memory (RAM). The invention may also be embodied in a
carrier wave travelling over an appropriate medium such as
airwaves, optical lines, electric lines, etc. Examples of program
instructions include both machine code, such as produced by a
compiler, and files containing higher level code that may be
executed by the computer using an interpreter.
[0083] Additional embodiments of the present invention are
described in Appendix C and Appendix D to the present application,
each of which is incorporated herein by reference in its entirety
for all purposes. Appendix C is entitled, "FLYCODE DATABASE
SPECIFICATION", and Appendix D is entitled, "FLYCODE VERSION 2
ARCHITECTURE--SPECIFICATION".
[0084] Although several preferred embodiments of this invention
have been described in detail herein with reference to the
accompanying drawings, it is to be understood that the invention is
not limited to these precise embodiments, and that various changes
and modifications may be effected therein by one skilled in the art
without departing from the scope or spirit of the invention as
defined in the appended claims.
* * * * *