U.S. patent application number 13/040338 was filed with the patent office on 2011-09-08 for peer-to-peer live content delivery. Invention is credited to Evan Pedro Greenberg and Bo Yang.

Application Number: 20110219137 (13/040338)
Family ID: 44532232
Filed Date: 2011-09-08

United States Patent Application 20110219137
Kind Code: A1
Yang; Bo; et al.
September 8, 2011
PEER-TO-PEER LIVE CONTENT DELIVERY
Abstract
A peer-to-peer live content delivery system and method enables
peer-to-peer sharing of live content such as, for example,
streaming video or audio. Nodes receive broadcasts of available
data from neighboring nodes and determine which data blocks to
request. Nodes receiving requests for data determine whether or not
to accept the requests and provide the requested blocks when
accepted. To enable sharing of live content, sharing of data blocks
is constrained such that a node attempts to receive a particular
data block prior to a playback deadline for the data block. This
allows a node to continuously provide an output stream of the received
data such as, for example, an output of live video content to a
display.
Inventors: Yang; Bo (Palo Alto, CA); Greenberg; Evan Pedro (Palo Alto, CA)
Family ID: 44532232
Appl. No.: 13/040338
Filed: March 4, 2011
Related U.S. Patent Documents

Application Number: 61311141
Filing Date: Mar 5, 2010
Current U.S. Class: 709/231
Current CPC Class: H04L 29/12103 20130101; H04L 67/14 20130101; H04L 63/029 20130101; H04L 29/12528 20130101; H04L 61/1535 20130101; H04L 61/2575 20130101
Class at Publication: 709/231
International Class: G06F 15/16 20060101 G06F015/16
Claims
1. A method for distributing streaming data in a peer-to-peer
network, the method performed by a first node in the peer-to-peer
network, the method comprising: receiving, by the first node, a
data availability broadcast message from a first neighboring node,
the data availability broadcast message identifying one or more
data blocks that the first neighboring node has available for
transmission, the one or more data blocks comprising time-localized
portions of the streaming data; determining, by the first node, a
desired data block selected from the one or more data blocks
specified in the data availability broadcast message, the desired
data block selected such that the first node receives the desired
data block prior to a playback deadline for the desired data block;
transmitting, by the first node, a data request message to the
first neighboring node specifying the desired data block; and
receiving, by the first node, the desired block from the first
neighboring node.
2. The method of claim 1, wherein the streaming data comprises
streaming video data and wherein the first node prioritizes an
order of the data request message such that the first node
continuously outputs the streaming video data to a display.
3. The method of claim 1, wherein determining the desired data
block comprises: maintaining an incoming queue of incoming data
blocks scheduled for transmission to the first node within a time
window; and determining the desired data block from among the data
blocks within the time window that are absent from the incoming
queue.
4. The method of claim 1, further comprising: receiving directly
from a server, a pre-burst of a plurality of data blocks, the
pre-burst corresponding to a beginning of a new data stream.
5. The method of claim 1, further comprising: determining a desired
data block that is unavailable from neighboring nodes; transmitting
a request for the desired data block to a server; and receiving the
desired data block from the server.
6. The method of claim 1, further comprising: transmitting a data
availability broadcast message to a plurality of neighboring nodes,
the data availability broadcast message identifying one or more
data blocks available for transmission by the first node; receiving
from a second neighboring node, a data request specifying at least
one desired data block selected from among the one or more data
blocks available for transmission by the first node; determining
whether or not to accept the data request; and responsive to
determining to accept the data request, transmitting the desired
data block to the second neighboring node.
7. The method of claim 6, further comprising: rejecting the data
request responsive to determining that the first node is already
currently transmitting at least a first threshold number of data
blocks; and rejecting the data request responsive to determining
that the first node has previously transmitted the desired data
block to at least a second threshold number of nodes.
8. A computer-readable storage medium storing computer-executable
instructions for distributing streaming data in a peer-to-peer
network, the instructions when executed by a processor cause the
processor to perform steps including: receiving a data availability
broadcast message from a first neighboring node, the data
availability broadcast message identifying one or more data blocks
that the first neighboring node has available for transmission, the
one or more data blocks comprising time-localized portions of the
streaming data; determining a desired data block selected from the
one or more data blocks specified in the data availability
broadcast message, the desired data block selected such that the
first node receives the desired data block prior to a playback
deadline for the desired data block; transmitting a data request
message to the first neighboring node specifying the desired data
block; and receiving the desired block from the first neighboring
node.
9. The computer-readable storage medium of claim 8, wherein the
streaming data comprises streaming video data and wherein the first
node prioritizes an order of the data request message such that the
first node continuously outputs the streaming video data to a
display.
10. The computer-readable storage medium of claim 8, wherein
determining the desired data block comprises: maintaining an
incoming queue of incoming data blocks scheduled for transmission
to the first node within a time window; and determining the desired
data block from among the data blocks within the time window that
are absent from the incoming queue.
11. The computer-readable storage medium of claim 8, wherein the
instructions when executed further cause the processor to perform
steps including: receiving directly from a server, a pre-burst of a
plurality of data blocks, the pre-burst corresponding to a
beginning of a new data stream.
12. The computer-readable storage medium of claim 8, wherein the
instructions when executed further cause the processor to perform
steps including: determining a desired data block that is
unavailable from neighboring nodes; transmitting a request for the
desired data block to a server; and receiving the desired data
block from the server.
13. The computer-readable storage medium of claim 8, wherein the
instructions when executed further cause the processor to perform
steps including: transmitting a data availability broadcast message
to a plurality of neighboring nodes, the data availability
broadcast message identifying one or more data blocks available for
transmission by the first node; receiving from a second neighboring
node, a data request specifying at least one desired data block
selected from among the one or more data blocks available for
transmission by the first node; determining whether or not to
accept the data request; and responsive to determining to accept the
data request, transmitting the desired data block to the second
neighboring node.
14. The computer-readable storage medium of claim 13, wherein the
instructions when executed further cause the processor to perform
steps including: rejecting the data request responsive to
determining that the first node is already currently transmitting
at least a first threshold number of data blocks; and rejecting the
data request responsive to determining that the first node has
previously transmitted the desired data block to at least a second
threshold number of nodes.
15. A system for distributing streaming data in a peer-to-peer
network, the system comprising: one or more processors; and a
computer-readable storage medium storing computer-executable
instructions, the instructions when executed by the one or more
processors cause the one or more processors to perform steps
including: receiving a data availability broadcast message from a
first neighboring node, the data availability broadcast message
identifying one or more data blocks that the first neighboring node
has available for transmission, the one or more data blocks
comprising time-localized portions of the streaming data;
determining a desired data block selected from the one or more data
blocks specified in the data availability broadcast message, the
desired data block selected such that the first node receives the
desired data block prior to a playback deadline for the desired data
block; transmitting a data request message to the first neighboring
node specifying the desired data block; and receiving the desired
block from the first neighboring node.
16. The system of claim 15, wherein the streaming data comprises
streaming video data and wherein the first node prioritizes an
order of the data request message such that the first node
continuously outputs the streaming video data to a display.
17. The system of claim 15, wherein determining the desired data
block comprises: maintaining an incoming queue of incoming data
blocks scheduled for transmission to the first node within a time
window; and determining the desired data block from among the data
blocks within the time window that are absent from the incoming
queue.
18. The system of claim 15, wherein the instructions when executed
further cause the one or more processors to perform steps
including: receiving directly from a server, a pre-burst of a
plurality of data blocks, the pre-burst corresponding to a
beginning of a new data stream.
19. The system of claim 15, wherein the instructions when executed
further cause the one or more processors to perform steps
including: determining a desired data block that is unavailable
from neighboring nodes; transmitting a request for the desired data
block to a server; and receiving the desired data block from the
server.
20. The system of claim 15, wherein the instructions when executed
further cause the one or more processors to perform steps
including: transmitting a data availability broadcast message to a
plurality of neighboring nodes, the data availability broadcast
message identifying one or more data blocks available for
transmission by the first node; receiving from a second neighboring
node, a data request specifying at least one desired data block
selected from among the one or more data blocks available for
transmission by the first node; determining whether or not to
accept the data request; and responsive to determining to accept the
data request, transmitting the desired data block to the second
neighboring node.
Description
RELATED APPLICATIONS
[0001] This application claims priority from U.S. provisional
application No. 61/311,141 entitled "High Performance Peer-To-Peer
Assisted Live Content Delivery System and Method" filed on Mar. 5,
2010, the content of which is incorporated by reference herein in
its entirety.
BACKGROUND
[0002] 1. Field of the Invention
[0003] The invention relates generally to peer-to-peer networking
and more particularly to distributing data such as live content
over a network within some time constraint.
[0004] 2. Description of the Related Art
[0005] Peer-to-peer networking provides an efficient network
architecture for sharing information by creating direct connections
between "nodes" without requiring information to pass through a
centralized server. In a conventional peer-to-peer network, a node
receives different portions of a file from a plurality of different
neighboring nodes. Thus, when sharing a video file, for example, a
node may receive different segments of the video from different
nodes. Once all of the portions are received, the node can
reconstruct the file from the separate portions.
[0006] Conventional peer-to-peer networking systems are not adapted to sharing live or streaming media content such as live video or audio. Rather, these conventional networks operate only on discrete files, not continuous data streams, and their sharing protocols cannot handle the time constraints associated with delivery of streaming content. Conventional systems therefore do not provide any way to distribute data such as live content in a peer-to-peer network where portions of the data must be received within some time constraint.
SUMMARY
[0007] A system, method, and computer-readable storage medium
enable nodes in a peer-to-peer network to share streaming data
(e.g., video) and impose time constraints such that nodes are able
to continuously output the streaming data. A first node receives a
data availability broadcast message from a neighboring node. The
data availability broadcast message identifies one or more data
blocks that the neighboring node has available for sharing. The
data blocks each comprise a time-localized portion of the streaming
data. The first node determines a desired data block selected from
the one or more data blocks specified in the data availability
broadcast message. In one embodiment, the first node selects the desired data block to ensure that it will receive all data blocks in the stream prior to their playback deadlines, i.e., before the first node is scheduled to output each data block (e.g., when streaming a video to a display). The first node then transmits a
data request message to the neighboring node specifying the desired
data block. Assuming the neighboring node accepts the request, the
first node receives the desired block from the neighboring
node.
[0008] Beneficially, the data sharing system and method enables
sharing of live content such as video or audio. By constraining the
sharing of data blocks to occur within a limited time period, a
node can provide a continuous output stream of the received data
such as, for example, an output of live video content to a
display.
[0009] The features and advantages described in the specification
are not all inclusive and, in particular, many additional features
and advantages will be apparent to one of ordinary skill in the art
in view of the drawings, specification, and claims. Moreover, it
should be noted that the language used in the specification has
been principally selected for readability and instructional
purposes, and may not have been selected to delineate or
circumscribe the inventive subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The teachings of the embodiments of the present invention
can be readily understood by considering the following detailed
description in conjunction with the accompanying drawings.
[0011] FIG. 1 illustrates an example configuration of a
peer-to-peer network, in accordance with an embodiment of the
present invention.
[0012] FIG. 2 illustrates examples of data structures of streaming
data for sharing in the peer-to-peer network, in accordance with an
embodiment of the present invention.
[0013] FIG. 3 illustrates an example of a message passing
protocol for sharing data between nodes in a peer-to-peer network,
in accordance with an embodiment of the present invention.
[0014] FIG. 4 illustrates a distribution tree structure for
modeling distribution of data blocks in the peer-to-peer network,
in accordance with an embodiment of the present invention.
[0015] FIG. 5 illustrates an example architecture for a computing
device for use as a server or node in a peer-to-peer network, in
accordance with an embodiment of the present invention.
DETAILED DESCRIPTION
[0016] Reference in the specification to "one embodiment" or to "an
embodiment" means that a particular feature, structure, or
characteristic described in connection with the embodiments is
included in at least one embodiment of the invention. The
appearances of the phrase "in one embodiment" or "an embodiment" in
various places in the specification are not necessarily all
referring to the same embodiment.
[0017] Some portions of the detailed description that follows are
presented in terms of algorithms and symbolic representations of
operations on data bits within a computer memory. These algorithmic
descriptions and representations are the means used by those
skilled in the data processing arts to most effectively convey the
substance of their work to others skilled in the art. An algorithm
is here, and generally, conceived to be a self-consistent sequence
of steps (instructions) leading to a desired result. The steps are
those requiring physical manipulations of physical quantities.
Usually, though not necessarily, these quantities take the form of
electrical, magnetic or optical signals capable of being stored,
transferred, combined, compared and otherwise manipulated. It is
convenient at times, principally for reasons of common usage, to
refer to these signals as bits, values, elements, symbols,
characters, terms, numbers, or the like. Furthermore, it is also
convenient at times, to refer to certain arrangements of steps
requiring physical manipulations or transformation of physical
quantities or representations of physical quantities as modules or
code devices, without loss of generality.
[0018] However, all of these and similar terms are to be associated
with the appropriate physical quantities and are merely convenient
labels applied to these quantities. Unless specifically stated
otherwise as apparent from the following discussion, it is
appreciated that throughout the description, discussions utilizing
terms such as "processing" or "computing" or "calculating" or
"determining" or "displaying" or the like, refer to the action and
processes of a computer system, or similar electronic computing
device (such as a specific computing machine), that manipulates and
transforms data represented as physical (electronic) quantities
within the computer system memories or registers or other such
information storage, transmission or display devices.
[0019] Certain aspects of the present invention include process
steps and instructions described herein in the form of an
algorithm. It should be noted that the process steps and
instructions of the present invention could be embodied in
software, firmware or hardware, and when embodied in software,
could be downloaded to reside on and be operated from different
platforms used by a variety of operating systems. The invention can also be embodied in a computer program product which can be executed on a computing system.
[0020] The present invention also relates to an apparatus for
performing the operations herein. This apparatus may be specially
constructed for the purposes, e.g., a specific computer, or it may
comprise a general-purpose computer selectively activated or
reconfigured by a computer program stored in the computer. Such a
computer program may be stored in a computer readable storage
medium, such as, but not limited to, any type of disk including
floppy disks, optical disks, CD-ROMs, magnetic-optical disks,
read-only memories (ROMs), random access memories (RAMs), EPROMs,
EEPROMs, magnetic or optical cards, application specific integrated
circuits (ASICs), or any type of media suitable for storing
electronic instructions, and each coupled to a computer system bus.
Memory can include any of the above and/or other devices that can
store information/data/programs. Furthermore, the computers
referred to in the specification may include a single processor or
may be architectures employing multiple processor designs for
increased computing capability.
[0021] The algorithms and displays presented herein are not
inherently related to any particular computer or other apparatus.
Various general-purpose systems may also be used with programs in
accordance with the teachings herein, or it may prove convenient to
construct more specialized apparatus to perform the method steps.
The structure for a variety of these systems will appear from the
description below. In addition, the present invention is not
described with reference to any particular programming language. It
will be appreciated that a variety of programming languages may be
used to implement the teachings of the present invention as
described herein, and any references below to specific languages
are provided for disclosure of enablement and best mode of the
present invention.
[0022] In addition, the language used in the specification has been
principally selected for readability and instructional purposes,
and may not have been selected to delineate or circumscribe the
inventive subject matter. Accordingly, the disclosure of the
present invention is intended to be illustrative, but not limiting,
of the scope of the invention.
Overview
[0023] A peer-to-peer (P2P) type distributed architecture is
composed of participants, e.g., individual computing resources that
make a portion of their resources (e.g., processing power, disk
storage or network bandwidth) available to other participants. FIG.
1 illustrates an example configuration of a peer-to-peer network
100 for distributing streaming content. The peer-to-peer network
100 comprises a server 110 and a number of nodes 120 (e.g., nodes
A, B, C, D, and E). The server 110 is a computing device that
provides coordinating functionality for the network. In one
embodiment, the server 110 may be controlled by an administrator
and has specialized functionality as will be described below. The
nodes 120 are computing devices communicatively coupled to the
network via connections to the server 110 and/or to one or more
other nodes 120. Examples of computing devices that may act as a
server or a node in the peer-to-peer network 100 are described in
further detail below with reference to FIG. 5.
[0024] In this peer-to-peer network 100, each node 120 has a
neighboring relationship with and maintains information about a
bounded subset of neighboring nodes (alternatively referred to as
"peer nodes" or "peers" or "partners") to which it can directly
communicate data or from which it can directly receive data. A node
120 does not necessarily have a direct connection to every other
node 120 in the network 100. However, information can flow between
nodes 120 that are not directly connected via hops between
neighboring nodes. Each node 120 maintains a list of its
neighboring nodes in the network graph. In one embodiment, the
server 110 shares a direct connection with each of nodes 120.
[0025] The neighboring relationship between nodes 120 is determined
by a membership management protocol. An example of a membership
management protocol for a peer-to-peer network is described in U.S.
Patent Application No. ______ to Yang, et al. filed on Mar. 4, 2011
and entitled "Network Membership Management for Peer-to-Peer
Networking," which is incorporated by reference herein. In one
embodiment, the membership management protocol determines the
neighboring relationships between nodes 120 randomly according to a
probabilistic formula. Furthermore, the neighboring relationships
may change whenever a new node joins the network, when an existing
node 120 leaves the network, or during periodic re-subscriptions
which serve to rebalance the network graph. When operated in one
such implementation, the probabilistically expected average subset size is (c+1)·log₂(N), where c is a design parameter (e.g., a fixed value or a value configured by a network administrator, typically a small integer value) and N is the total number of nodes in the system. The protocol establishes that in such a network, for any constant k, sending a multicast message to log₂(N)+k nodes (which then recursively forward the message to log₂(N)+k other nodes that have not already seen the message) will reach every node in the network graph with theoretical probability e^(-e^(-k)). In one embodiment, the backend server
110 manages nodes according to a pod-based management scheme. An
example of a pod-based management scheme is described in U.S.
Patent Application No. ______ to Yang, et al. filed on Mar. 4,
2011, and entitled "Pod-Based Server Backend Infrastructure for
Peer-Assisted Applications," which is incorporated by reference
herein. Generally in the pod-based management scheme, each "pod"
comprises a plurality of nodes and only nodes within the same pod
can directly share data. The server dynamically allocates nodes to
pods and dynamically allocates computing resources for pushing data
to the pods based on characteristics of the incoming data stream
and performance of the peer-to-peer sharing. By dynamically
adjusting the pod structure and resources available to them based
on monitored characteristics, the server 110 can optimize
performance of the peer-to-peer network.
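The neighbor-set sizing and multicast reach formulas above can be checked numerically. In this sketch the function names and the choice c = 2 are illustrative assumptions, not values from the application:

```python
import math

def expected_neighbor_count(n_nodes: int, c: int = 2) -> float:
    """Probabilistically expected average neighbor-set size: (c + 1) * log2(N)."""
    return (c + 1) * math.log2(n_nodes)

def multicast_reach_probability(k: float) -> float:
    """Theoretical probability e^(-e^(-k)) that a multicast forwarded to
    log2(N) + k not-yet-reached nodes per hop reaches every node."""
    return math.exp(-math.exp(-k))
```

For N = 1024 and c = 2 this gives an expected 30 neighbors, and even a small overshoot such as k = 5 already pushes the reach probability above 99%.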
[0026] In one embodiment, the peer-to-peer network 100 distributes
streaming data (e.g., audio, video, or other time-based content) to
the nodes 120. The server 110 receives the streaming data 130 from
a streaming data source (not shown). The streaming data source may
be, for example, an external computing device or a storage device
local to the server 110. In one embodiment, the streaming data 130
comprises an ordered sequence of data blocks. Each data block
comprises a time-localized portion of the data stream 130. For
example, for an input video stream, a data block may comprise a
sequence of consecutive video frames from the input video stream
(e.g., a 0.5 second chunk of video). For an input audio stream, a
data block may comprise a time-localized portion of the audio.
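A time-localized data block might be modeled as below; the field names and the 0.5-second default are illustrative assumptions, since the application does not specify a block layout:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataBlock:
    """One time-localized portion of the stream (e.g., 0.5 s of video)."""
    slot: int          # position in the ordered sequence (time slot)
    duration_s: float  # length of the time-localized portion, in seconds
    payload: bytes     # encoded frames or samples for this slot

    def start_time_s(self) -> float:
        """Stream time at which playback of this block begins."""
        return self.slot * self.duration_s
```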
[0027] The server 110 then distributes a received data block to one
or more nodes 120. The receiving node(s) in turn may distribute the
data block to one or more additional nodes, and so on. In this
manner, the data block is distributed throughout the peer-to-peer
network 100. The number of nodes to which the server 110
distributes a data block may vary depending on the network
configuration, and may vary for different data blocks in the
input data stream 130. Similarly, the number of nodes to which a
particular node distributes a data block may vary depending on the
network configuration, and may also be different for different data
blocks.
[0028] In one embodiment, the data distribution protocol constrains
the timing of the distribution of data blocks in a manner optimized
for streaming data. Distribution of streaming data differs from
distribution of complete files in that the nodes generally do not
store the received data blocks indefinitely, but rather may store
them only temporarily until they are outputted. Thus, unlike file
sharing protocols where the order and timing of data block
distribution is not important, the distribution protocol for
streaming data should attempt to provide data blocks within a
specified time constraint such that they can be continuously
outputted.
[0029] The data distribution protocol may be useful, for example,
to distribute broadcasts of "live" video or other time-based
streams. As used herein, the term "live" does not necessarily
require that the video stream be distributed concurrently with its
capture (as in, for example, a live sports broadcast). Rather, the
term "live" refers to data for which it is desirable that all
participating nodes receive data blocks in a roughly synchronized
manner (e.g., within a time period T of each other). Thus, examples
of live data may include both live broadcasts (e.g. sports or
events) that are distributed as the data is captured, and
multicasts of previously stored video or other data.
[0030] In one embodiment, the distribution protocol attempts to
ensure delivery of a data block to each node 120 on the network 100
within a time period T seconds from when the server 110 initially
outputs the data block. For example, in different embodiments, T
could be a few seconds, 5 minutes, or one hour. In one embodiment,
if a node 120 cannot receive the data block within the time period T (e.g., due to bandwidth constraints or latency), the block is no longer considered useful to the node 120 and the node may drop
its request for the data block in favor of later blocks.
Furthermore, the order in which data blocks are requested and
distributed to various nodes 120 may be prioritized in order to
optimize the nodes' ability to meet the time constraints, with the
goal of enabling the nodes 120 to continuously output the streaming
data.
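The dropping of late requests described above can be sketched as a simple filter over pending time slots. The slot-based representation and function names are illustrative assumptions:

```python
def still_useful(block_slot: int, playback_slot: int) -> bool:
    """A block remains useful only if its time slot has not yet been played back."""
    return block_slot >= playback_slot

def prune_requests(pending_slots: list[int], playback_slot: int) -> list[int]:
    """Drop requests for blocks whose playback deadline has passed,
    freeing capacity for later blocks in the stream."""
    return [s for s in pending_slots if still_useful(s, playback_slot)]
```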
Operation
[0031] In one embodiment, each neighboring node pair has a pair of
connections: one for control and one for data. It is generally
undesirable to have data transfer delay any control message. It is
therefore helpful that control traffic be independent of data
traffic, because a node may need to quickly request data from a
neighboring node (e.g., because it is not able to get the data from
its originally selected neighboring node). To accommodate this in
one implementation, a node can open a pair of TCP connections and
multiplex control packets onto one connection, and data packets
onto the other. In this implementation, data and control packets
may be out-of-order with respect to one another (though still
in-order with respect to packets of the same type).
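One way to realize the control/data connection pair is to open two TCP sockets per neighbor, as sketched below. The port layout and the TCP_NODELAY option are assumptions for illustration, not details from the application:

```python
import socket

def open_peer_connections(host: str, control_port: int, data_port: int):
    """Open the control/data TCP connection pair to a neighboring node.

    Keeping control traffic on its own connection means a large data
    transfer in flight cannot delay a small, urgent control message
    (e.g., a re-request after a neighbor fails to deliver a block).
    """
    control = socket.create_connection((host, control_port))
    # Disable Nagle's algorithm so small control packets go out immediately.
    control.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    data = socket.create_connection((host, data_port))
    return control, data
```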
[0032] In one embodiment, the peer-to-peer sharing protocol
includes (a) exchanging data availability with a set of nodes, (b)
retrieving locally unavailable data from one or more nodes, and (c)
supplying locally available data to nodes, as will be described in
further detail below.
[0033] For each data block received by the server 110, the server
110 selects one or more "partner" nodes to which to distribute the
data block. For live streaming, each node should receive the data
block by a "playback deadline" such that the nodes can maintain a
continuous output stream of the data. For example, for video data,
the playback deadline refers to the point in time when a node
providing a live output of the video is scheduled to output a
particular time-localized block of video. Thus, distribution of
data is prioritized with the goal of attempting to ensure that each
node in the network receives data blocks by their respective
playback deadlines.
[0034] Although nodes attempt to receive each data block prior to
their respective playback deadline, the nodes do not necessarily
receive the data blocks in the exact order that they appear in the
streaming data. Rather, a node may receive a data block
corresponding to a number of different time slots within a
window of time past the current playback deadline. An example is
illustrated in FIG. 2. Streaming data 130 comprises a sequence of
data blocks with each data block corresponding to a particular time
slot. The server injection point 255 is a time slot corresponding
to the data block that the server 110 is currently pushing out to
the peer-to-peer network. Thus, in the illustrated example, the
server 110 has already pushed all data blocks up to the data block in
time slot t+W corresponding to the server injection point 255. Over
time, the server injection point 255 advances 256 as the server
continuously receives and outputs the streaming data. Received data
250 illustrates the data blocks already received by a node A. The
playback deadline 253 indicates the time slot corresponding to the
data block currently being output by the node A. The playback
deadline 253 advances 257 over time as node A continuously outputs
the received data blocks. Thus, to enable node A to continuously
output the streaming data (e.g., a streaming video), node A
attempts to ensure that it receives a data block before the
playback deadline 253 advances to the time slot corresponding to that
data block. Node A does not necessarily receive the data blocks in
the original data stream order. Thus, in the illustrated example,
node A has received data blocks corresponding to time slots t+1,
t+2, t+4, and t+7, but is still missing the remaining data blocks
in the window 259 between the current playback deadline 253 and the
server injection point 255. Node A attempts to ensure that it
receives each of these data blocks before the playback deadline 253
advances to the corresponding time slots. In one embodiment, nodes
cease sharing of data blocks once the playback deadline 253 has
passed the time slot corresponding to those data blocks. Thus, the
data blocks that are being shared in the peer-to-peer network at
any given moment correspond to the data blocks within a current
share window 259 between the current playback deadline 253 (at time
t) and the server injection point 255 (at time t+W).
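The share-window constraint described above can be sketched as follows. This is an illustrative sketch only; the function and variable names are assumptions, as the application does not specify an implementation.

```python
def shareable_slots(playback_deadline, injection_point, received):
    """Return the time slots a node still needs within the share window.

    Blocks at slots <= the playback deadline are no longer shared;
    blocks beyond the server injection point do not exist yet.
    """
    window = range(playback_deadline + 1, injection_point + 1)
    return {slot for slot in window if slot not in received}

# Example matching FIG. 2: node A holds t+1, t+2, t+4, and t+7
# out of a share window running from t+1 to t+W.
t, W = 100, 10
received = {t + 1, t + 2, t + 4, t + 7}
missing = shareable_slots(t, t + W, received)
```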
[0035] In one embodiment, each node maintains an "incoming queue"
and a "commit queue." The incoming queue identifies a set of data
blocks that the node expects to receive within a time window (e.g.,
the next N time slots in the data stream). The commit queue
identifies a set of data blocks the node expects to transmit to
other nodes within the time window. In one embodiment, the commit
queue can be implemented to impose a rate limit on transmission of
the data blocks. Specifically, an interval can be imposed between
transmissions of data blocks so that they are not sent out
back-to-back. The interval
imposed allows capacity to be reserved on the network link to allow
other data to go through. This rate control can be implemented at
the application level.
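The interval-based rate limiting of the commit queue might be sketched as below. This is a minimal illustration at the application level; the class and method names are assumptions, not part of the application.

```python
from collections import deque

class CommitQueue:
    """Sketch of a commit queue that spaces out block transmissions.

    `interval` seconds are imposed between sends so that data blocks
    are not pushed back-to-back, reserving link capacity for other
    traffic on the network link.
    """

    def __init__(self, interval):
        self.interval = interval
        self.queue = deque()
        self._last_send = 0.0

    def commit(self, block_id):
        """Add a block the node has agreed to transmit."""
        self.queue.append(block_id)

    def next_block(self, now):
        """Return the next block to transmit, or None if rate-limited."""
        if not self.queue or now - self._last_send < self.interval:
            return None
        self._last_send = now
        return self.queue.popleft()
```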
[0036] FIG. 3 illustrates an example of a message passing protocol
between two neighboring nodes in the peer-to-peer network (e.g.,
node 120-A and node 120-B). In the illustrated example, node 120-A
acts as a requesting node (also referred to herein as a "child
node") with respect to a particular data block, and node 120-B acts
as a transmitting node (also referred to herein as a "parent node").
However, each node can perform the functions attributed to both
node 120-A and node 120-B, i.e., any node can act as either a
parent node or a child node with respect to different data blocks.
Furthermore, although the illustrated example only shows two nodes,
similar message passing may occur concurrently between a node and
any of its neighboring nodes.
[0037] Initially, both nodes 120-A, 120-B optionally receive a
PARAMETERIZATION message (not shown) specifying various parameters
that will be used in the communication protocol. For example, in
one embodiment, the PARAMETERIZATION message includes: (a) a data
availability broadcast period (e.g., 0-10 min.) (in one embodiment,
corresponding to the size of window 259) and (b) a data block length
corresponding to the size of the data block (e.g., 0-5 min.). The
PARAMETERIZATION messages are optionally sent by the server 110
prior to data sharing between the nodes.
[0038] Node 120-B periodically sends a DATA AVAILABILITY BROADCAST
message 301 to its neighboring nodes (including node 120-A). The
message 301 may be sent, for example, once per data availability
broadcast period. This message identifies the data blocks that node
120-B has available for sharing. The DATA AVAILABILITY BROADCAST
message 301 may include, for example, (a) the number of free time
slots node 120-B has available for transmission over the next time
window (e.g., the next N time slots with each time slot
corresponding to a data block), (b) the earliest free time slot
node 120-B has, and (c) whether node 120-B has a specified data
block and whether it has been scheduled for transmission. Optionally, the
node 120-B may communicate a measurement of its current or average
bandwidth and transmission delay. In one embodiment, a node
maintains a data block availability structure corresponding to the
window of blocks it has available for sharing with other nodes. For
example, in one embodiment, the data block availability structure
comprises a maximum of X data blocks, where X is a fixed integer
value or a customizable parameter.
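The contents of the DATA AVAILABILITY BROADCAST message 301 listed above might be laid out as in the following sketch. The field names are assumptions; the application lists the message contents but not a wire format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class DataAvailabilityBroadcast:
    """Illustrative layout of the DATA AVAILABILITY BROADCAST message 301."""
    free_slots: int                # (a) free transmission slots in next window
    earliest_free_slot: int        # (b) earliest free time slot
    # (c) maps block_id -> whether that block is already scheduled
    available_blocks: Dict[int, bool] = field(default_factory=dict)
    bandwidth_estimate: Optional[float] = None  # optional measurement
    delay_estimate: Optional[float] = None      # optional measurement

msg = DataAvailabilityBroadcast(
    free_slots=3, earliest_free_slot=105,
    available_blocks={101: False, 102: True})
```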
[0039] Upon receiving a DATA AVAILABILITY BROADCAST message 301,
node 120-A determines 307 which, if any, of the available data
blocks it wants to request from node 120-B. Assuming node 120-A
wants to request one of the data blocks from node 120-B, node 120-A
requests the desired data block by sending a DATA REQUEST message
303 to node 120-B. In one embodiment, in order to alleviate
potential problems associated with upload bandwidth hogging, the
data protocol may limit a data request to at most one data block
from a neighboring node per request (or similarly, some small
number). This limitation may reduce the worst case delay of a data
block reaching every node in the network. Optionally, the protocol
may also prevent a node from requesting a data block that is too
old. For example, in one embodiment, a data block is considered too
old if it falls into the first N entries of a data map for some
parameter N. A data block may also be considered too old if the
node would not be able to receive the block before the current
playback point reaches the corresponding time slot for the
block.
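The second "too old" test described above can be sketched as a simple timing check. The names and the timing model (one slot per `slot_duration` seconds) are assumptions for illustration.

```python
def is_too_old(block_slot, playback_deadline, transfer_time, slot_duration):
    """Sketch of the 'too old' test: a block is too old if it could
    not arrive before the playback deadline advances to its slot.

    block_slot / playback_deadline are slot indices; transfer_time is
    the estimated seconds to receive the block.
    """
    seconds_until_needed = (block_slot - playback_deadline) * slot_duration
    return seconds_until_needed <= transfer_time
```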
[0040] When a node first joins the peer-to-peer network, the new
node may be limited to requesting only data blocks beyond the server
injection point 255 plus a configured advance window (e.g., an
additional N data blocks). This limitation may help prevent a node
from permanently falling behind trying to catch up with old data blocks.
In another embodiment, when a node has just joined the group, the
node does not move the window 259 ahead for B seconds, where B can
be, for example, 0-300 seconds. B can optionally be chosen by the
consumer of the data, which could be a video player in the video
streaming scenario.
[0041] In one embodiment, a node 120-A may receive DATA
AVAILABILITY BROADCAST messages 301 from a plurality of neighboring
nodes, and these messages may specify one or more of the same data
blocks needed by node 120-A. If node 120-A has a choice of
neighboring nodes from which to request a particular desired block,
node 120-A may first generate a candidate list for each desired
block. The list can be partitioned into different groups based on
previous responses from these neighboring nodes. Each group can be
further sorted using a number of factors, including bandwidth
available, observed performance, network distance, time of previous
rejection or acceptance, etc. The node 120-A may then use these
various factors to determine the neighboring node from which to
request the desired data block.
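The grouping and sorting of the candidate list described above might look like the following. The keys used ('accepted_before' as the grouping criterion, bandwidth and network distance as sort factors) are assumptions chosen from the factors the application enumerates.

```python
def rank_candidates(candidates):
    """Sketch of parent selection for a desired data block.

    `candidates` is a list of dicts. Nodes that previously accepted
    requests are grouped first; within each group, higher bandwidth
    and lower network distance rank earlier.
    """
    return sorted(candidates,
                  key=lambda c: (not c['accepted_before'],  # accepters first
                                 -c['bandwidth'],           # high bandwidth first
                                 c['distance']))            # near nodes first
```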
[0042] For rare data blocks that are not available from most
neighboring nodes, a node may be less likely to request the data
block (i.e., requests for rare data blocks are issued less often).
In one embodiment, a rare data block comprises a data block
available from less than a threshold number or threshold percentage
of neighboring nodes. In one embodiment, the decision to issue a
request for such a block can be determined probabilistically to
achieve a higher percentage of successful requests. In an
alternative embodiment, an entirely different approach may be used
in which a node pursues a rare data block more aggressively (i.e.,
requests it more frequently) than data blocks that are more widely
available.
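One possible probabilistic policy for rare blocks is sketched below. The rarity threshold and the probability model (request probability proportional to availability) are assumptions; the application leaves both open.

```python
import random

def should_request_rare_block(holders, neighbors,
                              rare_threshold=0.25, rng=random.random):
    """Sketch of the probabilistic request policy for rare blocks.

    If fewer than `rare_threshold` of neighbors hold the block,
    request it only with probability equal to its availability, so
    requests for rare blocks are issued less often.
    """
    availability = holders / neighbors
    if availability >= rare_threshold:
        return True               # common block: always request
    return rng() < availability   # rare block: request less often
```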
[0043] Upon receiving a request for data via a DATA REQUEST message
303, a node 120-B determines 309 whether or not to accept the request.
In one embodiment, a node can collectively commit at most a first
threshold X data blocks concurrently to all of its neighboring
nodes, where X is a parameterized constant equal to or less than
the window size of the data block availability data structure.
Furthermore, in one embodiment, a node may commit at most a second
threshold N total copies of the same data block to its neighboring
nodes over the lifetime of the data block, where N is some
parameterized constant. Thus once a node has committed N copies of
a data block to neighboring nodes, it will no longer accept
requests for that data block. This limitation provides a more even
fanout distribution across delivery trees of different data blocks.
In one embodiment, a node estimates whether a data
block can be transmitted to and received by another node by the
playback deadline 253 of the block before committing to transfer
the block. Furthermore, in deciding whether a data block request can
be accepted, a node may first check for any potential
conflict, for example, whether accepting a new request for a data block
would result in any of the data blocks already in the transmit queue
missing their playback deadlines.
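The accept/reject checks of paragraph [0043] can be sketched as follows. The function signature is an assumption, and the conflict check against other queued blocks' deadlines is omitted for brevity.

```python
def can_accept(block_id, commit_queue, copies_sent,
               max_concurrent, max_copies,
               estimated_arrival, deadline):
    """Sketch of the accept decision for a DATA REQUEST.

    Rejects if the node is already committed to `max_concurrent`
    blocks (threshold X), has sent `max_copies` copies of this block
    (threshold N), or could not deliver before the playback deadline.
    """
    if len(commit_queue) >= max_concurrent:
        return False
    if copies_sent.get(block_id, 0) >= max_copies:
        return False
    return estimated_arrival < deadline
```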
[0044] If node 120-B accepts a request, node 120-B sends a DATA
RESPONSE message 305 to the requesting node 120-A indicating which
data block(s) it approved for transmission. For accepted requests,
node 120-B adds the data block(s) to its commit queue and the
requesting node 120-A adds the data block(s) to its incoming queue.
Optionally, data blocks can be stamped by nodes each time they are
transmitted from one node to another, for example, by incrementing
a relay count tracking the number of times the block has been
transmitted. This information can be used to maximize
performance.
[0045] If node 120-B determines to reject node 120-A's request
for a data block, the requesting node 120-A may wait for a period
of time or immediately try to request the data block from another
node. If node 120-A cannot obtain the data block from any of its
neighboring nodes, it may request the data block directly from the
server 110.
[0046] In one embodiment, the server 110 can pre-burst data to
newly joined nodes (i.e., for the most recent N data blocks of the
streaming data 130 for some parameter N). The amount of data to be
pre-bursted can be adjusted based on the particular stream or
configuration. Furthermore, nodes can request missing data from the
server 110 at a later time if they are unable to obtain the data
from other nodes. For example, in one embodiment, a node determines
to request a data block from the server 110 when it would not
otherwise be able to obtain the data block in time to meet the
playback deadline. Such requests can be batched for a number of
blocks. A server 110 may also implement logic to prevent a node
from requesting too much data in this way.
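The batched server fallback of paragraph [0046] might be sketched as below. The names and the policy details (per-request batch limit, per-slot peer delivery estimates) are assumptions for illustration.

```python
def blocks_to_fetch_from_server(missing, playback_deadline,
                                peer_eta, slot_duration, batch_limit=8):
    """Sketch of batched server fallback.

    A block is requested from the server when no peer can deliver it
    before its deadline; requests are batched up to `batch_limit` to
    limit the load any one node places on the server.
    """
    urgent = []
    for slot in sorted(missing):
        deadline_in = (slot - playback_deadline) * slot_duration
        eta = peer_eta.get(slot, float('inf'))  # inf: no peer offers it
        if eta >= deadline_in:
            urgent.append(slot)
    return urgent[:batch_limit]
```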
[0047] Optionally, data blocks in the transmit queue can be sorted
based on a number of criteria such as, for example, their sequence
in the original data stream, deadline of delivery, urgency of
delivery, etc. Optionally, commitments may be prioritized based on
tree level, which is lowest at the server and incremented each time
the data block is delivered to the next level.
[0048] In the video streaming scenario, the following parameters
can be adjusted to balance data sharing efficiency
and video startup latency while maintaining the same playback point
among nodes: pod size, size of the data blocks, and number of
blocks in the window of active sharing among nodes. This scheme is
flexible to support different tradeoffs between peer-to-peer
sharing efficiency, small startup latency, and synchronized
playback.
Analysis
1. Modeling
[0049] The distribution of an individual data block can be modeled
using a distribution tree structure as illustrated in FIG. 4. For
any given data block, a distribution tree 400 illustrates the flow
of the data block between interconnected nodes. In the illustrated
example, a root node 401 receives a data block 403. The root node
401 may correspond to the server 110. The root node distributes the
data block 403 to one or more first level nodes 405, which may be
referred to herein as "partners" of the root node 401 with respect
to the particular data block 403. The first level nodes 405 then
distribute the data block 403 to one or more second level nodes
407, and so on for any number of levels.
[0050] Each data block may flow through the nodes in an entirely
different manner. Thus, for a window size of W data blocks, there
will be W such distribution trees, with each tree corresponding to
one of the data blocks. For a given data block, a physical node may
appear in a distribution tree multiple times. This occurs, for
example, if the node receives the data block from two or more
different neighboring nodes. To distinguish between a physical node
and a point in the distribution tree, the points in the
distribution tree are referred to herein as "s-nodes" while
physically separate nodes are referred to as "nodes." Thus, a node
may appear multiple times in a given distribution tree and may
therefore correspond to multiple s-nodes in the same tree.
[0051] The distribution tree may change for each data block
distributed by the server 110. Thus, a node does not always receive
data blocks from the same nodes and a node (or server) does not
always distribute data blocks to the same nodes. Thus, for each
data block distributed from the server, a distribution tree may
arise that looks totally different than the distribution tree for
previous or subsequent data blocks. As a result, a node may be
located at different levels in distribution trees of different data
blocks. A node affects more children in a tree where it is closer
to the server than in a tree where it is closer to the bottom.
Also, queuing delay introduced by a node adds to the overall delay
experienced by the leaves (nodes that only receive but do not
transmit the data block).
2. Tree Depth, Overlay Radius, and Coverage Ratio
[0052] Tree Depth, or Overlay Radius, for a particular group size
directly affects delay and robustness. Related to this is Coverage
Ratio, which is the percentage of nodes covered within a certain
depth.
2.1 Proof
[0053] In the distribution tree, the root node is at level 0. As
previously discussed, nodes may appear multiple times at different
levels or within the same level. In the following discussion, nodes
are numbered based on their appearances in the breadth-first
search.
[0054] For an s-node t, the identifier, or index, assigned to
it is denoted $P_t$. The root node's identifier is 1, the identifier
of the root node's first (e.g., leftmost) child is 2, and so on.
For simplicity, a homogeneous link bandwidth condition may be
assumed for all nodes. For an individual node, the depth of its
first appearance in the distribution tree is an important factor.
The expected tree depth of a group with N nodes is thus the average
tree depth of all N nodes.
[0055] An auxiliary function $\delta(t)$ is defined as:

$$\delta(t) = \begin{cases} 1, & \text{if } P_t \neq P_{t'} \text{ for all } 0 < t' < t \\ 0, & \text{otherwise} \end{cases}$$

[0056] In other words, $\delta(t)$ is 1 if and only if the s-node t
corresponds to the first appearance of its node in the tree.
[0057] Another auxiliary function is defined as:
[0058] f(t)=total number of unique nodes associated with s-nodes 1
through t.
[0059] Because membership and partnership are formed randomly among
all nodes, the probability of an s-node t being the first
appearance of a node is given by:
$$\Pr[\delta(t) = 1] = \frac{N - f(t-1)}{N} \qquad (1)$$
[0060] Note that:
$$f(t) - f(t-1) = \delta(t) \qquad (2)$$
[0061] Taking expectations of (1) and (2), yields:
$$E[f(t) - f(t-1)] = E[\delta(t)] = \frac{N - E[f(t-1)]}{N}$$
[0062] Thus:
$$E[f(t)] = E[f(t-1)] + \frac{N - E[f(t-1)]}{N}, \qquad E[f(t)] = 1 + \frac{N-1}{N}E[f(t-1)] \qquad (3)$$
[0063] which gives the iteration for the expected number of unique
nodes from one s-node to the next. Note that:
$$f(1) = 1 \qquad (4)$$
[0064] Also, note that:
$$E[f(2)] = 1 + \frac{N-1}{N} = N\left(\frac{2N-1}{N^2}\right) = N\left(1 - \left(\frac{N-1}{N}\right)^2\right) \qquad (5)$$
[0065] This forms the iteration base (t=2). Assume (6) holds up to
t-1:

$$E[f(t-1)] = N\left[1 - \left(\frac{N-1}{N}\right)^{t-1}\right] \qquad (6)$$
[0066] Thus:

$$E[f(t)] = 1 + \frac{N-1}{N} \cdot N\left(1 - \left(\frac{N-1}{N}\right)^{t-1}\right) = 1 + (N-1)\left(1 - \left(\frac{N-1}{N}\right)^{t-1}\right) = N - \frac{(N-1)^t}{N^{t-1}} = N\left(1 - \left(\frac{N-1}{N}\right)^t\right)$$
[0067] Thus:

$$E[f(t)] = N\left(1 - \left(\frac{N-1}{N}\right)^t\right) \qquad (7)$$
[0068] Note that:

$$\left(\frac{N-1}{N}\right)^t < e^{-t/N} \qquad (8)$$
[0069] Thus:

$$E[f(t)] > N\left(1 - e^{-t/N}\right) \qquad (9)$$
$t_k$ denotes the identifier of the last s-node at level
k. The number of new unique nodes at level k, but not at levels
0 through (k-1), is then:

$$f(t_k) - f(t_{k-1}) \qquad (10)$$
[0071] The expected distance (depth) d of all nodes is then:

$$d = \frac{1}{N}\sum_{k=1}^{\infty} k \cdot E[f(t_k) - f(t_{k-1})] \qquad (11)$$
[0072] Note:

$$\lim_{k \to \infty} E[f(t_k)] = N \qquad (12)$$
[0073] Thus:

$$\lim_{k \to \infty} k\left(1 - \frac{E[f(t_k)]}{N}\right) = 0 \qquad (13)$$
[0074] From (11), taking N into the summation yields:

$$d = \sum_{k=1}^{\infty} k\left(\frac{E[f(t_k)]}{N} - \frac{E[f(t_{k-1})]}{N}\right) = \sum_{k=1}^{\infty} k\left(\left(1 - \frac{E[f(t_{k-1})]}{N}\right) - \left(1 - \frac{E[f(t_k)]}{N}\right)\right) \qquad (14)$$
[0075] Another way to reach (14) is by noting the following
equivalent of (10) in expectation:

$$\frac{E[f(t_k)] - E[f(t_{k-1})]}{N} = \frac{N - E[f(t_{k-1})]}{N} - \frac{N - E[f(t_k)]}{N} = \left(1 - \frac{E[f(t_{k-1})]}{N}\right) - \left(1 - \frac{E[f(t_k)]}{N}\right) \qquad (15)$$

This is because the probability of a unique new node appearing after
level (k-1) is:

$$\frac{N - E[f(t_{k-1})]}{N} \qquad (16)$$

and the probability of a unique new node appearing after level k
is:

$$\frac{N - E[f(t_k)]}{N} \qquad (17)$$
[0076] Eq. (14) may be solved by expanding each term in the
summation. Expansion of the term with k = i "offsets" (i-1) copies
of the right-hand part of the expansion of the term with k = i-1, so
the sum telescopes (using (13) for the vanishing limit term):

$$d = \sum_{k=1}^{\infty} k\left(\left(1 - \frac{E[f(t_{k-1})]}{N}\right) - \left(1 - \frac{E[f(t_k)]}{N}\right)\right) = \sum_{k=0}^{\infty}\left(1 - \frac{E[f(t_k)]}{N}\right) \qquad (18)$$
[0077] Combining (7) and (18) yields:

$$d = \sum_{k=0}^{\infty}\left(1 - \frac{N\left(1 - \left(\frac{N-1}{N}\right)^{t_k}\right)}{N}\right) = \sum_{k=0}^{\infty}\left(\frac{N-1}{N}\right)^{t_k}$$
[0078] Then, combining (7), (8) and (18) yields:

$$d < \sum_{k=0}^{\infty} e^{-t_k/N} \qquad (19)$$
[0079] Before solving this summation, note that:

$$t_k = 1 + M + M(M-1) + M(M-1)^2 + \dots + M(M-1)^{k-1} = 1 + M\sum_{i=0}^{k-1}(M-1)^i = 1 + M\,\frac{(M-1)^k - 1}{M-2} = \frac{M(M-1)^k - 2}{M-2}$$

[0080] Thus:

$$t_k = \frac{M(M-1)^k - 2}{M-2} \qquad (20)$$
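The closed form (20) can be checked numerically against its defining sum. This is a small verification sketch; $t_k$ counts s-nodes through level k with fanout M at the root and M-1 at inner nodes, as in the derivation above.

```python
def t_k_sum(M, k):
    """t_k as the explicit sum: 1 + M * sum_{i=0}^{k-1} (M-1)^i."""
    return 1 + M * sum((M - 1) ** i for i in range(k))

def t_k_closed(M, k):
    """Closed form (20): (M(M-1)^k - 2) / (M - 2), for M > 2.

    The numerator is always divisible by M-2, so integer division
    is exact.
    """
    return (M * (M - 1) ** k - 2) // (M - 2)
```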
[0081] Dividing (19) into two parts, the first part from k=0 to
$k=\log_{M-1} N$, the second part from $k=\log_{M-1} N + 1$ to
$k=\infty$:

$$d < \sum_{k=0}^{\log_{M-1} N} e^{-\frac{M(M-1)^k - 2}{(M-2)N}} + \sum_{k=\log_{M-1} N + 1}^{\infty} e^{-\frac{M(M-1)^k - 2}{(M-2)N}} \qquad (21)$$

$$< \log_{M-1} N + 1 + \sum_{k=0}^{\infty} e^{-\frac{MN(M-1)^k - 2}{(M-2)N}} \qquad (22)$$
[0082] Note that:

$$\frac{MN(M-1)}{(M-2)N} > 1$$
[0083] So:

$$d \leq \log_{M-1} N + 1 + \sum_{k=0}^{\infty} e^{-(M-1)^k} \qquad (23)$$
[0084] For $M \geq 3$:

$$(M-1)^k \geq (M-1)k \qquad (24)$$
[0085] Thus:

$$e^{-(M-1)^k} \leq e^{-(M-1)k} \qquad (25)$$
[0086] So:

$$d < \log_{M-1} N + 1 + \sum_{k=0}^{\infty} e^{-(M-1)k} \qquad (26)$$
$$= \log_{M-1} N + 1 + \frac{1}{1 - e^{-(M-1)}} \qquad (27)$$
$$< \log_{M-1} N + 3 \qquad (28)$$
2.2 Tree Depth/Overlay Radius.
[0087] (28) shows that the average distance of a node from the root
node is bounded by O(log(N)):
$$d = O(\log N) \qquad (29)$$
2.3 Coverage Ratio.
[0088] From (7), (9) and (20), we also have:

$$\text{coverage ratio at level } k = \frac{E[f(t_k)]}{N} > \frac{N\left(1 - e^{-t_k/N}\right)}{N} = 1 - e^{-t_k/N} = 1 - e^{-\frac{M(M-1)^k - 2}{(M-2)N}} \qquad (30)$$
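The coverage bound (30) is easy to evaluate numerically. A small sketch, assuming M > 2:

```python
import math

def coverage_ratio_bound(M, N, k):
    """Lower bound (30) on the fraction of nodes covered by level k."""
    t_k = (M * (M - 1) ** k - 2) / (M - 2)
    return 1 - math.exp(-t_k / N)
```

For example, with M=4 and N=1000, coverage rises steeply with depth, consistent with the O(log N) radius result above.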
2.4 Notes
[0089] A heterogeneous environment, where different nodes have
different uplink bandwidth, affects the fanout of each node. The
larger fanout of a node with a fat upload link offsets, to a degree,
the smaller fanout of another node. This does not affect the
analysis herein.
[0090] Nodes that cannot support any upload, because of either a
narrow uplink or a firewall, can only serve as leaves. This does not
affect the conclusion above for a single tree. Its impact on the
multi-tree analysis is described below.
3. Multi-Tree in a Window
[0091] The section above examines characteristics of the tree for
one data block. Next, the relation between trees within the same
window is discussed. Specifically, the following discussion
illustrates: (a) whether there is enough bandwidth to support the
streaming needs of all nodes while the uplink bandwidth constraint
of each node is met; (b) what requirements an individual node should
meet in allocating its uplink bandwidth in order to achieve optimal
performance for the group collectively; and (c) what the root node
can do in selecting its immediate partners and distributing data
across the partners to provide the best performance possible for the
group.
[0092] The tree can be modeled as described below. At any given
point, taking a snapshot of the random graph and following the flow
of data, one distribution tree is obtained for each data block. For
a window size of W data blocks, there will be W such trees, each
tree corresponding to one data block. A tree here is different from
a regular tree in that a node may appear in the tree multiple times
because the nodes form a graph (i.e., a node may receive the same
data block from more than one neighboring node). This type of tree
is referred to herein as a "Tree with Redundant Nodes", or simply a
"tree" hereinafter, unless specified otherwise.
[0093] The trees are numbered from 1 to W and each tree is denoted
as T.sub.1, T.sub.2, T.sub.3, . . . T.sub.i, . . . T.sub.w.
3.1 Partner Selection and Data Block Scheduling Policy at Root
[0094] Various policies may be used to determine how the root node
should select its partners for the initial injection of data blocks
into the network. In the very simple case, where the root node
selects only one node, A, as its sole partner, A will receive all
data blocks within the window and will further deliver all the data
blocks to other nodes. For any tree T.sub.i, fanout at level 0 is
1. Also, due to A's uplink bandwidth limit, A can support at most
one child in each tree for a maximum of W children across W trees.
Thus, the fanout is 1 at level 1 for A in any tree T.sub.i. This
means that every tree actually would reduce to a line.
[0095] In a second case, the root node selects M peers, P.sub.1,
P.sub.2, . . . , P.sub.M, as its partners (M>1). Furthermore,
the root node sends out exactly one copy of each data block to one
of these partners. In alternative embodiments, the root node can
send out multiple copies of the same data block to multiple
partners. This may add a level of robustness, but generally does
not affect the major performance characteristics. Different
policies may be applied at the root node for this distribution. For
example, the root node could distribute the data blocks to its
partners in a round-robin fashion, one data block at a time per
partner. Alternatively, the root node may send W/M consecutive data
blocks to one partner, then move on to the next partner. For the
description below, it is assumed that the round-robin approach is
used.
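The round-robin injection policy assumed for the analysis below can be sketched in a few lines. The function name and representation are illustrative only.

```python
def round_robin_assignment(window, partners):
    """Sketch of the root's round-robin injection policy.

    Block i of the window goes to partner ((i-1) mod M) + 1, one
    block per partner per turn, so P_1 receives blocks
    1, M+1, 2M+1, ..., W-M+1 (its "tree group").
    """
    M = len(partners)
    return {i: partners[(i - 1) % M] for i in range(1, window + 1)}
```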
3.2 Bandwidth Utilization
[0096] In the second case of section 3.1 above (where the root node
selects M peers, P.sub.1, P.sub.2, . . . , P.sub.M, as its
partners), fanout from the root node for any tree T.sub.i is 1.
Fanout from level 1 is (M-1). P.sub.1 would appear in W/M trees
corresponding to T.sub.1, T.sub.M+1, T.sub.2M+1, . . . ,
T.sub.W-M+1. This group of trees is referred to herein as the Tree
Group of P.sub.1. P.sub.1 would then utilize
$$\frac{(M-1)W}{M}$$

units worth of its upload bandwidth in total to distribute these
data blocks, with W/M units worth of bandwidth left (a node can
support a maximum of W units of upload bandwidth). This means that in
other trees, $P_1$ would mostly be located at the leaf level, not
contributing much upload. Further, note that this applies to
children of $P_1$ in these trees as well, i.e., those nodes are
inner nodes and contribute most of their uplink bandwidth in these
trees. They will be located at the leaf level in most other trees.
This is generally feasible, though, given the ratio of inner nodes
to leaf nodes in an (M-1)-way tree. For a tree with N nodes, leaf
nodes account for a

$$\frac{M-1}{M}$$

portion of all nodes, while inner nodes take up 1/M (fanout at
level 0 is M; that does not change the portion in a material
way).
[0097] In trees T.sub.1, T.sub.M+1, T.sub.2M+1, . . . , T.sub.W-M+1
or the tree group of P.sub.1, P.sub.1's children could vary,
depending on the dynamics of the partnership formation. If M is
small, (e.g., 4), P.sub.1's children in those trees could well be
the same. These children of P.sub.1 would almost use up their
uplink bandwidth in P.sub.1's tree group. Each of them has W/M
units worth of bandwidth left, for a total of
$$\frac{W}{M}(M-1)$$
units worth of bandwidth for other tree groups. This is the total
download bandwidth P.sub.1 would need in the other (M-1) tree
groups. The overall bandwidth consumption is balanced at the
highest level, since each node is contributing at least as much as
it uses.
3.3 Delay Within a Single Tree
[0098] Performance in peer-to-peer sharing may be measured by
various characteristics including, for example, how long it takes a
data block to be delivered to a leaf node, and how long it takes
for a node to accumulate all W data blocks in order to start
playback.
[0099] Within a single tree T.sub.i, the expected tree depth is
bounded by O(log N). An inner node also needs to support (M-1)
children. In the worst case, data block i would take
(M-1)*tree-depth to reach a leaf node, if all ancestors of this
leaf node are the last one in the transmission queue of their
corresponding parents. This delay is given by:
$$\Delta_{\text{one-tree}} = (M-1) \cdot \text{tree-depth} \leq (M-1) \cdot O(\log N)$$
3.4 Delay Within a Tree Group
[0100] Within a tree group, (e.g., the tree group of P.sub.1) data
blocks can be transferred down each tree in a "pipelined" fashion.
The timing and delay of delivery of those data blocks are
interdependent because they share a lot of common inner nodes. Each
inner node there "serializes" transfer of data blocks 1, 1+M, 1+2M,
. . . , W-M+1. Assuming a node always transfers data blocks in
order of data block ID number, there would be an (M-1)-second
shift in delivery time between consecutive trees in the group. This
shift, $\Delta_{\text{inter-tree}}$, is constrained as follows:

$$1 \leq \Delta_{\text{inter-tree}} \leq (M-1)$$
[0101] The last data block in the group, namely data block (W-M+1),
would arrive at a leaf node
$$(M-1)\left(\frac{W}{M} - 1\right)$$
seconds later than data block 1. If the root node starts sending
out data block 1 and data block (W-M+1) at time t, then the last
leaf node receiving data block 1 will receive it at time
t+(M-1)log(N), and the last node receiving data block (W-M+1) would
get that data block at time
$$t + (M-1)\log N + (M-1)\left(\frac{W}{M} - 1\right).$$
3.5 Delay Across Tree Groups
[0102] Across tree groups, the transfer can take place in parallel
to a large degree because there is very little overlap between
inner nodes in these tree groups. An inner node in one tree group
still contributes W/M worth of bandwidth in other tree groups. So
the extra shift delay across tree groups is W/M seconds. This
delay, $\Delta_{\text{inter-group}}$, is constrained as:

$$\Delta_{\text{inter-group}} \leq \frac{W}{M} \text{ seconds.}$$
3.6 Overall Delay, Buffering Time at Startup
[0103] The maximum overall shift delay caused by "serialization"
among all W trees is:
$$\Delta_{\text{inter-tree}} \times (\text{trees in one tree group}) + \Delta_{\text{inter-group}} = (M-1)\left(\frac{W}{M} - 1\right) + \frac{W}{M} = W - (M-1) \text{ seconds}$$
[0104] This means a peer can receive a first data block in at most
(M-1)log N seconds, and receive all the other (W-1) data blocks in
the window within W-(M-1) seconds thereafter. The buffering time
$\Delta_{\text{buffering}}$ needed by a newly joined node is:

$$\Delta_{\text{buffering}} = \Delta_{\text{one-tree}} + \Delta_{\text{inter-tree}} \times (\text{trees in one tree group}) + \Delta_{\text{inter-group}} \leq (M-1) \cdot O(\log N) + W - (M-1)$$
[0105] There are two cases. In case (A), for a node close to the
bottom of all trees, the (M-1)O(log N) seconds delay is largely
invisible to the end-user, because the differential in data block
arrival time across tree groups is not big. As a result, the
buffering time needed by a node is around W-(M-1) seconds.
Intuitively, a node tends to be located at about the same level
across trees and tree groups. For example, a newly joined node will
logically be located close to bottom of all trees. Thus, case (A)
is expected to be the majority of the cases. In case (B), for a
node that is at level 1 in a tree group and at leaf level in other
tree groups, this node could receive its first data block very
fast, then wait for an additional (M-1)O(log N)+W-(M-1) seconds to
receive the rest of the data blocks in the window.
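The buffering bound of section 3.6 can be evaluated directly. A small sketch, using the natural logarithm as a stand-in for the O(log N) tree-depth bound (the constant hidden in the O-notation is an assumption):

```python
import math

def buffering_time_bound(M, N, W):
    """Upper bound on startup buffering time from section 3.6:
    (M-1)*log(N) + W - (M-1) seconds.
    """
    return (M - 1) * math.log(N) + W - (M - 1)
```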
3.7 Ordering of Data Blocks Transfer by a Node
[0106] The order in which a node transfers its data blocks affects
overall performance of the peer-to-peer network. Suppose a node is
ready to transfer data blocks 1, 5, 7, and 8 to its partners. Under
different policies, the node may, for example, transfer the data
blocks based on the data block ID (i.e., lowest to highest with the
lowest ID corresponding to an earlier time slot in the streaming
data), or the node may transfer the data blocks in the order the
requests are received. From the analysis above, the exact order of
transfer does not affect the overall throughput. All data blocks
should be able to arrive within the W-second window, assuming there
are no transmission errors. Yet, since data block 1 would be needed
earlier than data blocks 5, 7 and 8, it appears reasonable to
always transfer 1 first, thus providing bigger headroom for its
delivery. Thus, in one embodiment, a preferred approach is to order
data block transfers simply based on their data block ID numbers.
This also suggests that the root node should perform round-robin
with one data block per unit between its M partners instead of
transferring W/M consecutive data blocks to a partner node. The
reason is that the arrival time would then favor those earlier data
blocks needed for playback.
4. Discontinuity
[0107] The following parameters may be used to derive the
probability of a node experiencing discontinuity: [0108]
$P_f$--the probability of node failure. A node leaving is also a form
of "node failure" and is accounted for in this. [0109]
$P_o$--the probability that a dependent node cannot find an
alternative supplier partner for a particular data block within
$\Delta t$ time; [0110] $P_s$--the probability that a partner can
support full-rate streaming for a dependent node once needed.
[0111] $P_d$--the probability that a node experiences
discontinuity. This is also the percentage of nodes in a group that
may suffer discontinuity.
[0112] $P_d$ is a summation, over each node failure case, of the
probability of not finding an alternative supplier node. The
probability of a particular set of exactly i failed partners is:

$$(1-P_f)^{M-i} P_f^i$$

[0113] The probability that none of the (M-i) non-failing partners
can be an alternative supplier node is:

$$(1-P_s)^{M-i}$$
[0114] If none of the existing partners can become the alternative
supplier, the probability of not finding a new partner that can
supply the data is P.sub.o. Thus:
P_d = P_o\left[\sum_{i=1}^{M}\binom{M}{i}(1-P_f)^{M-i}P_f^{i}(1-P_s)^{M-i}\right] ##EQU00036##
[0115] If the peer-to-peer sharing protocol can manage to keep
P.sub.o small and P.sub.s large, then P.sub.d can also be kept
small. P.sub.o and P.sub.s depend on the protocol operation.
P.sub.s can be reasonably high (e.g., 0.5). However, using P.sub.s
here could be a conservative estimate, because a node can retrieve
data from multiple partners collectively.
[0116] P.sub.o can indeed be very small, especially if a node keeps
a large cache containing many "backup" mode nodes. These can be
contacted immediately when existing partners cannot supply all the
data. In one embodiment, a P.sub.o of 10% is used.
[0117] The expected number of nodes suffering from discontinuity is
then:
N\cdot P_d = N\cdot P_o\left[\sum_{i=1}^{M}\binom{M}{i}(1-P_f)^{M-i}P_f^{i}(1-P_s)^{M-i}\right] ##EQU00037##
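The discontinuity formulas above can be evaluated numerically. The sketch below follows EQU00036 and EQU00037 directly; the example parameter values (P.sub.f=0.05, P.sub.s=0.5, P.sub.o=0.1, M=4) are illustrative assumptions, not values taken from the application.

```python
from math import comb

def discontinuity_probability(p_f, p_s, p_o, m):
    """P_d = P_o * sum_{i=1..M} C(M,i) (1-P_f)^(M-i) P_f^i (1-P_s)^(M-i):
    probability that a node experiences discontinuity, given node-failure
    probability p_f, partner full-rate-support probability p_s, and
    probability p_o of not finding a new supplier within delta-t."""
    total = sum(
        comb(m, i) * (1 - p_f) ** (m - i) * p_f ** i * (1 - p_s) ** (m - i)
        for i in range(1, m + 1)
    )
    return p_o * total

def expected_discontinuous_nodes(n, p_f, p_s, p_o, m):
    """Expected number of nodes in a group of N suffering discontinuity."""
    return n * discontinuity_probability(p_f, p_s, p_o, m)

# Illustrative parameters: P_f=0.05, P_s=0.5, P_o=0.1, M=4 partners.
p_d = discontinuity_probability(0.05, 0.5, 0.1, 4)
```

With these values P.sub.d works out to about 0.0025, so in a group of 1000 nodes only a few nodes would be expected to see discontinuity.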
System Architecture
[0118] FIG. 5 is a high-level block diagram illustrating an example
of a computing device 500 that could act as a node or a server 102
(or sub-server 106) on the peer-to-peer network 100. Illustrated
are at least one processor 502, an input controller 504, a network
adaptor 506, a graphics adaptor 508, a storage device 510, and a
memory 512. Other embodiments of the computer 500 may have
different architectures with additional or different components. In
some embodiments, one or more of the illustrated components are
omitted.
[0119] The storage device 510 is a computer-readable storage medium
such as a hard drive, compact disk read-only memory (CD-ROM), DVD,
or a solid-state memory device. The memory 512 stores instructions
and data used by the processor 502. The pointing device 526 is a
mouse, track ball, or other type of pointing device, and is used in
combination with the keyboard 524 to input data into the computer
system 500. The graphics adapter 508 outputs images and other
information for display by the display device 522. The network
adapter 506 couples the computer system 500 to a network 530.
[0120] The computer 500 is adapted to execute computer program
instructions for providing functionality described herein. In one
embodiment, program instructions are stored on the storage device
510, loaded into the memory 512, and executed by the processor 502
to carry out the processes described herein.
[0121] The types of computers 500 operating on the peer-to-peer
network can vary substantially. For example, a node comprising a
personal computer (PC) may include most or all of the components
illustrated in FIG. 5. Another node may comprise a mobile computing
device (e.g., a cell phone) which typically has limited processing
power, a small display 522, and might lack a pointing device 526. A
server 110 may comprise multiple processors 502 working together to
provide the functionality described herein and may lack an input
controller 504, keyboard 524, pointing device 526, graphics adapter
508 and display 522. In other embodiments, the nodes or the server
could comprise other types of electronic devices such as, for
example, a personal digital assistant (PDA), a mobile telephone, a
pager, a television "set-top box," etc.
[0122] The network 530 enables communications among the entities
connected to it (e.g., the nodes and the server). In one
embodiment, the network 530 is the Internet and uses standard
communications technologies and/or protocols. Thus, the network 530
can include links using a variety of known technologies, protocols,
and data formats. In addition, all or some of links can be
encrypted using conventional encryption technologies. In another
embodiment, the entities use custom and/or dedicated data
communications technologies.
[0123] Upon reading this disclosure, those of skill in the art will
appreciate still additional alternative designs for membership
management having the features described herein. Thus, while
particular embodiments and applications of the present invention
have been illustrated and described, it is to be understood that
the invention is not limited to the precise construction and
components disclosed herein and that various modifications, changes
and variations which will be apparent to those skilled in the art
may be made in the arrangement, operation and details of the method
and apparatus of the present invention disclosed herein without
departing from the spirit and scope of the invention as defined in
the appended claims.
* * * * *