System And Method For Implementing Network Service Level Agreements (SLAs) Dalela; Ashish

Patent Application Summary

U.S. patent application number 13/648357 was filed with the patent office on 2012-10-10 and published on 2014-04-10 for system and method for implementing network service level agreements (SLAs). The applicant listed for this patent is Ashish Dalela. Invention is credited to Ashish Dalela.

Publication Number	20140101228
Application Number	13/648357
Family ID	50433604
Filed Date	2012-10-10

United States Patent Application	20140101228
Kind Code	A1
Dalela; Ashish	April 10, 2014

SYSTEM AND METHOD FOR IMPLEMENTING NETWORK SERVICE LEVEL AGREEMENTS (SLAS)

Abstract

An embodiment provides a method for implementing a service level agreement (SLA) in a network. The method includes: (1) providing one or more network devices in communication with a Network Credit Server (NCS), the NCS configured to: store a SLA of a customer, and maintain a database of topology information of the network; (2) receiving, at one of the network devices, a first plurality of packets associated with the customer and en-route to a destination; (3) transmitting, from the network device to the NCS, a request for credit when the first plurality of packets is more than a pre-configured threshold number; (4) thereafter receiving a second plurality of packets associated with the customer and en-route to the destination; and (5) forwarding, by the network device and depending on a status of a reply from the NCS, the second plurality of packets.

Inventors:

Dalela; Ashish; (Bangalore, IN)

Applicant:

Name	City	State	Country	Type
Dalela; Ashish	Bangalore		IN

Family ID:

50433604

Appl. No.:

13/648357

Filed:

October 10, 2012

Current U.S. Class:	709/203
Current CPC Class:	H04L 41/5006 20130101; G06Q 10/06315 20130101; H04L 47/122 20130101; H04L 47/70 20130101; H04L 47/39 20130101; H04L 45/302 20130101; H04L 41/5019 20130101; H04L 45/125 20130101
Class at Publication:	709/203
International Class:	G06F 15/16 20060101 G06F015/16

Claims

1. A method for implementing a service level agreement in a network, the method comprising: establishing, at a network device, a communication with a Network Credit Server (NCS), the network device and the NCS in the network and the NCS configured to: store a service level agreement of a customer, the service level agreement providing an upper level limit of resources in the network allocable to the customer, and maintain a database of topology information of the network, the topology information including bandwidth capacity information of links in the network; receiving, at the network device, a first plurality of packets associated with the customer and en-route to a destination; transmitting, from the network device to NCS, a request for credit when the first plurality of packets is more than a pre-configured threshold number; thereafter receiving, at the network device, a second plurality of packets associated with the customer and en-route to the destination; and forwarding, by the network device and depending on a status of a reply from the NCS, the second plurality of packets, the reply from the NCS dependent on the service level agreement of the customer and the topology information of the network.

2. The method of claim 1, further comprising: identifying, by the network device, the customer based on the information in the first plurality of packets, the information representing one of: a port number, a pair of source and destination address, or combinations thereof.

3. The method of claim 1, wherein the service level agreement provides a bandwidth policy that includes at least one of: a host-to-host bandwidth, a site-to-site bandwidth, a total bandwidth across multiple sites or servers, or a total bandwidth for a particular customer on a given link in the network.

4. The method of claim 1, wherein the status of the reply includes an express grant of the request, wherein the express grant is conditioned on: the topology information indicates resources being available in the network, and the request for credit does not cause resources in the network already allocated and to be allocated to the customer to exceed the upper level limit associated with the customer, the upper level limit provided by the service level agreement of the customer, wherein forwarding the packets received thereafter is based on the express grant of the request for credit, and wherein the topology information of the network is updated to reflect the express grant of the request for credit.

5. The method of claim 1, wherein the credit granted is valid for one of: a pre-determined number of packets, a pre-determined amount of time, or combinations thereof.

6. The method of claim 1, further comprising: detecting a congestion on a link in the network; announcing the congestion to other network devices and the NCS in the network, wherein the announcing causes an update of the database of topology information maintained at the NCS; and selecting at least one data flow such that packets of the data flow are re-routed from an old path to a new path through the network, the new path bypassing the congestion.

7. The method of claim 6, wherein the at least one data flow is neither an elephant flow nor a mouse flow, and wherein the packets of the at least one data flow are associated with the customer.

8. The method of claim 6, wherein the at least one data flow is re-routed according to an Access Control List (ACL) associated with the data flow, the ACL installed on the network device and other network devices in the network in response to the congestion.

9. A network device comprising: one or more programmable processors and one or more storage devices storing instructions that are operable, when executed by the one or more programmable processors, to cause the one or more programmable processors to perform operations comprising: establishing, at a network device, a communication with a Network Credit Server (NCS), the network device and the NCS in the network and the NCS configured to: store a service level agreement of a customer, the service level agreement providing an upper level limit of resources in the network allocable to the customer, and maintain a database of topology information of the network, the topology information including bandwidth capacity information of links in the network; receiving, at the network device, a first plurality of packets associated with the customer and en-route to a destination; transmitting, from the network device to NCS, a request for credit when the first plurality of packets is more than a pre-configured threshold number; thereafter receiving, at the network device, a second plurality of packets associated with the customer and en-route to the destination; and forwarding, by the network device and depending on a status of a reply from the NCS, the second plurality of packets, the reply from the NCS dependent on the service level agreement of the customer and the topology information of the network.

10. The network device of claim 9, wherein the operations further comprise: identifying the customer based on the information in the first plurality of packets, the information representing one of: a port number, a pair of source and destination address, or combinations thereof.

11. The network device of claim 9, wherein the service level agreement provides a bandwidth policy that includes at least one of: a host-to-host bandwidth, a site-to-site bandwidth, a total bandwidth across multiple sites or servers, or a total bandwidth for a particular customer on a given link in the network.

12. The network device of claim 9, wherein the status of the reply includes an express grant of the request, wherein the express grant is conditioned on: the topology information indicates resources being available in the network, and the request for credit does not cause resources in the network already allocated and to be allocated to the customer to exceed the upper level limit associated with the customer, the upper level limit provided by the service level agreement of the customer, wherein forwarding the packets received thereafter is based on the express grant of the request for credit, and wherein the topology information of the network is updated to reflect the express grant of the request for credit.

13. The network device of claim 9, wherein the credit granted is valid for one of: a pre-determined number of packets, a pre-determined amount of time, or combinations thereof.

14. The network device of claim 9, wherein the operations further comprise: detecting a congestion on a link in the network; announcing the congestion to other network devices and the NCS in the network, wherein the announcing causes an update of the database of topology information maintained at the NCS; and selecting at least one data flow such that packets of the data flow are re-routed from an old path to a new path through the network, the new path bypassing the congestion.

15. A computer program product, embodied in a non-transitory machine-readable medium and including instructions executable by a processor, the instructions operable to cause the processor to perform functions including: establishing, at a network device, a communication with a Network Credit Server (NCS), the network device and the NCS in the network and the NCS configured to: store a service level agreement of a customer, the service level agreement providing an upper level limit of resources in the network allocable to the customer, and maintain a database of topology information of the network, the topology information including bandwidth capacity information of links in the network; receiving, at the network device, a first plurality of packets associated with the customer and en-route to a destination; transmitting, from the network device to NCS, a request for credit when the first plurality of packets is more than a pre-configured threshold number; thereafter receiving, at the network device, a second plurality of packets associated with the customer and en-route to the destination; and forwarding, by the network device and depending on a status of a reply from the NCS, the second plurality of packets, the reply from the NCS dependent on the service level agreement of the customer and the topology information of the network.

16. The computer program product of claim 15, wherein the functions further comprise: identifying the customer based on the information in the first plurality of packets, the information representing one of: a port number, a pair of source and destination address, or combinations thereof.

17. The computer program product of claim 15, wherein the service level agreement provides a bandwidth policy that includes at least one of: a host-to-host bandwidth, a site-to-site bandwidth, a total bandwidth across multiple sites or servers, or a total bandwidth for a particular customer on a given link in the network.

18. The computer program product of claim 15, wherein the status of the reply includes an express grant of the request, wherein the express grant is conditioned on: the topology information indicates resources being available in the network, and the request for credit does not cause resources in the network already allocated and to be allocated to the customer to exceed the upper level limit associated with the customer, the upper level limit provided by the service level agreement of the customer, wherein forwarding the packets received thereafter is based on the express grant of the request for credit, and wherein the topology information of the network is updated to reflect the express grant of the request for credit.

19. The computer program product of claim 18, wherein the credit granted is valid for one of: a pre-determined number of packets, a pre-determined amount of time, or combinations thereof.

20. The computer program product of claim 15, wherein the functions further comprise: detecting a congestion on a link in the network; announcing the congestion to other network devices and the NCS in the network, the announcing causes an update of the database of topology information maintained at the NCS; and selecting at least one data flow such that packets of the data flow are re-routed from an old path to a new path through the network, the new path bypassing the congestion.

Description

TECHNICAL FIELD

[0001] The following disclosure relates generally to service level agreements in a network.

BACKGROUND

[0002] Network service providers often want to provide their customers with network service level agreements (SLAs). The SLAs typically tie to network billing models to provide price differentiation based on the service rendered. Customers, too, may want assured bandwidth on the network. SLAs would be easy to implement in a network if all packets from point A to point B followed one fixed path. In that case, customer-specific bandwidth or quality of service (QoS) can be implemented at the ingress port. Generally, however, it is more complex to implement SLAs if a packet can take many different paths from point A to point B. For example, if a customer owns multiple servers inside a data center and a given pair of servers can communicate with each other through different paths, then the services specified under the SLAs may be dynamically split between those paths. The diverse paths can make it difficult to implement SLAs in a network.

BRIEF DESCRIPTION OF THE FIGURES

[0003] FIG. 1 shows a flow chart of an example implementation.

[0004] FIG. 2 illustrates an example implementation of credit request and grant.

[0005] FIG. 3 illustrates another example implementation of credit request and grant.

[0006] FIG. 4 shows another flow chart of an example implementation.

[0007] FIG. 5 illustrates an example response to a congested link according to an implementation.

[0008] FIG. 6 illustrates example dimensions of a swappable flow.

DESCRIPTION OF EXAMPLE EMBODIMENTS

Overview

[0009] An embodiment provides a method for implementing a service level agreement in a network. The method includes: (1) providing one or more network devices, the network devices in communication with a Network Credit Server (NCS) and the NCS configured to: store a service level agreement of a customer, the service level agreement providing an upper level limit of resources in the network allocable to the customer, and maintain a database of topology information of the network, the topology information including bandwidth capacity information of links in the network; (2) receiving, at one of the network devices, a first plurality of packets associated with the customer and en-route to a destination; (3) transmitting, from the network device to NCS, a request for credit when the first plurality of packets is more than a pre-configured threshold number; and (4) forwarding, by the network device and depending on a status of a reply from the NCS, a second plurality of packets associated with the customer and received at the network device thereafter and en-route to the destination, the reply from the NCS dependent on the service level agreement of the customer and the topology information of the network.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

[0010] A "Network Credit Server" (NCS) can be introduced into a managed network, for example, a data center. In one configuration, the NCS is a computing device with a processor and a plurality of memories. The NCS is aware of the network topology information. For example, the NCS has up-to-date information on substantially all possible paths in the network for a given pair of ingress and egress network devices. The NCS also knows the bandwidth of each link on a given path in the network. The NCS can receive a "Credit Request" from a network device on the managed network. The network device may be, for example, a network switch, a network router, etc. The network devices on the managed network communicate with the NCS to request credits and receive grants therefrom. In one configuration, the Credit Request includes information indicating a pair of a source address and a destination address. The pair of addresses may be associated with the "Credit Request." The NCS may look up pre-existing service level agreements (SLAs) for the customer associated with the Credit Request to determine if a credit grant can be made. The NCS may respond with a "Credit Grant" to authorize packet forwarding to the destination address provided in the pair of addresses. Once the credit to forward packets has been granted, the NCS reduces the bandwidth available on a particular link on the path to the destination by the amount that has just been allocated. The grant is valid for a predefined interval (such as, for example, 100 ms) and lapses thereafter. The credit grant is to be renewed periodically if packets still need to be forwarded.

[0011] FIG. 1 shows a flow chart of an example implementation. In block 102, a network device, for example, an access switch/router or any other device that can implement a SLA, receives packets. The network device is capable of detecting packet flows related to a particular customer. The detection can be based on, for example, an access port, a source or a destination IP address, etc.
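The customer detection described above can be sketched as a simple table lookup. This is a minimal illustration only: the `Packet` fields, the two lookup tables, and the rule that an address-pair match takes precedence over a port match are assumptions made for the example, not details given in the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    ingress_port: int
    src_ip: str
    dst_ip: str


# Hypothetical lookup tables. The text only says detection can be based on,
# for example, an access port or a source/destination IP address.
PORT_TO_CUSTOMER = {7: "customer-a"}
ADDR_PAIR_TO_CUSTOMER = {("10.0.0.1", "10.0.1.1"): "customer-b"}


def identify_customer(pkt: Packet) -> Optional[str]:
    """Return the customer a packet belongs to, or None if unknown."""
    # Assumption: an exact address-pair match wins over a port match.
    by_pair = ADDR_PAIR_TO_CUSTOMER.get((pkt.src_ip, pkt.dst_ip))
    if by_pair is not None:
        return by_pair
    return PORT_TO_CUSTOMER.get(pkt.ingress_port)
```

Packets that match neither table return `None`, which a device could treat as traffic outside any SLA.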

[0012] In block 104, the network device determines whether the number of received packets associated with the particular customer has reached a predetermined threshold. When the number of received packets associated with the particular customer exceeds the threshold, the network device determines that a new packet flow has been detected.

[0013] In block 106, the network device sends a credit request to the NCS and may expect a response. In this configuration, the credit request is sent only when the number of received packets associated with the particular customer has exceeded a certain predetermined threshold. This triggering condition helps avoid sending very frequent requests for packet flows that are insignificant. For example, packets associated with ARP, DHCP, or Credit Requests (from other network devices on the same managed network) can be too trivial to warrant a "Credit Request."

[0014] In block 108, the network device determines whether the Credit Request has been granted by the NCS. If the Credit Request has not been sent, or a response has not been received, or an NCS is not configured, the network device continues to forward packets normally according to the current forwarding policy and does not provide elevated service in response to the new packet flow, as indicated by block 112. This mechanism allows the network device to fall back to the current forwarding policy for the newly detected packet flow when the NCS is not configured or has failed.

[0015] If the network device determines that the Credit Request has been granted, the network device starts policing packets received thereafter using the grant, as indicated by block 110. For example, after the granted number of packets/bytes has been transmitted, subsequent packets will be dropped unless the grant is renewed in time. In one implementation, the grant is conditioned on an express response from the NCS in order for the policing under the SLA to take place. In another implementation, if the grant has not been received within a certain time window, the requesting network device assumes that all packets can be forwarded. This implementation provides backward compatibility when the NCS is not present or the NCS has failed.
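The flow of FIG. 1 (blocks 102 through 112) can be sketched as a small per-flow state machine. This is a hedged illustration under stated assumptions: the class and method names are hypothetical, the credit request is modeled as a synchronous callback (a real device would wait asynchronously for the reply), and the grant is counted in packets only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Grant:
    packets_left: int  # assumption: the grant is expressed as a packet budget


class FlowState:
    """Per-customer flow handling on the network device (hypothetical names)."""

    def __init__(self, threshold: int,
                 request_credit: Callable[[], Optional[Grant]]):
        self.threshold = threshold
        self.request_credit = request_credit  # returns None if no reply / no NCS
        self.seen = 0
        self.requested = False
        self.grant: Optional[Grant] = None

    def handle_packet(self) -> str:
        self.seen += 1
        if not self.requested:
            # Blocks 104/106: request credit once the threshold is exceeded.
            if self.seen > self.threshold:
                self.requested = True
                self.grant = self.request_credit()
            # Policing only applies to packets received thereafter.
            return "forward"
        if self.grant is None:
            # Block 112: fall back to the current forwarding policy.
            return "forward"
        # Block 110: police using the grant.
        if self.grant.packets_left > 0:
            self.grant.packets_left -= 1
            return "forward"
        return "drop"
```

With a threshold of 2 and a one-packet grant, the first three packets are forwarded under the current policy, the fourth consumes the grant, and the fifth is dropped unless the grant is renewed.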

[0016] FIG. 2 illustrates an example implementation of credit request and grant. FIG. 2 may correspond to a TCP slow start. At time instant 230, packets corresponding to packet flow 220 start to arrive at the network device. Before the number of received packets for packet flow 220 reaches the predetermined threshold, the network device forwards the received packets for packet flow 220 normally and according to a current forwarding policy, as illustrated in FIG. 2. Once the number of received packets for packet flow 220 surpasses the predetermined threshold, the network device sends a "Credit Request" 212 to NCS 202 at time instant 232. As discussed above, "Credit Request" 212 can include information encoding a pair of source and destination addresses. Depending on the definitions in the SLAs (available at NCS 202), "Credit Request" 212 can include, for example, either IP or MAC addresses. "Credit Request" 212 can specify the VLAN ID if, for example, SLAs on the NCS 202 are based on VLANs.

[0017] NCS 202 maintains a database that includes state information for each link on the managed network. The state information includes the bandwidth currently allocated to a particular customer on a given link in the managed network. NCS 202 can ascertain the total bandwidth still available on a given link to the destination address based on the allocations already made. NCS 202 issues a Credit Grant 214 if the available resources on a given link can accommodate the Credit Request 212. For example, if the total bandwidth still available in a particular path associated with the pair of source and destination addresses is larger than a bandwidth indicated by the Credit Request 212, NCS 202 issues the Credit Grant 214. After issuance, NCS 202 updates the database that includes the state information for each link on the managed network to reflect that additional network resources have been committed as a result of Credit Grant 214.
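The grant decision described above can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the class and field names are hypothetical, bandwidth is in Mbps, a request that fits exactly is granted (the text says "larger than"; treating an exact fit as acceptable is an assumption), and the lapse of grants over time is not modeled.

```python
from typing import Dict, List


class NetworkCreditServer:
    """Toy NCS: per-link capacity, per-customer SLA limit, live allocations."""

    def __init__(self, link_capacity: Dict[str, int], sla_limit: Dict[str, int]):
        self.link_capacity = dict(link_capacity)  # link -> capacity (Mbps)
        self.sla_limit = dict(sla_limit)          # customer -> upper limit (Mbps)
        self.link_used: Dict[str, int] = {}       # link -> allocated (Mbps)
        self.customer_used: Dict[str, int] = {}   # customer -> allocated (Mbps)

    def request_credit(self, customer: str, path: List[str], mbps: int) -> bool:
        limit = self.sla_limit.get(customer)
        if limit is None:
            return False  # no SLA on record for this customer
        # The request must not push the customer past its SLA upper limit.
        if self.customer_used.get(customer, 0) + mbps > limit:
            return False
        # Every link on the path must still have room for the request.
        for link in path:
            free = self.link_capacity[link] - self.link_used.get(link, 0)
            if free < mbps:
                return False
        # Grant: update the database to reflect the committed resources.
        for link in path:
            self.link_used[link] = self.link_used.get(link, 0) + mbps
        self.customer_used[customer] = self.customer_used.get(customer, 0) + mbps
        return True
```

Keeping the customer total separate from the per-link totals avoids counting one flow several times when its path crosses several links.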

[0018] At time instant 234, the Credit Grant 214 arrives at the network device, and packet policing based on the grant takes effect starting from the next packet to be transmitted thereafter. The next packet is for the same packet flow 220. In some implementations, the policing may cause packets to be dropped, as indicated in FIG. 2 and discussed above in association with FIG. 1. Packets can be dropped because the grant has expired after a period of time or after a number of packet transmissions.
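A grant that lapses after a period of time or after a number of packet transmissions, whichever comes first, can be sketched as below. The `CreditGrant` name and its fields are hypothetical, and time is passed in explicitly (in seconds) so the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class CreditGrant:
    issued_at: float    # time the grant arrived, in seconds
    lifetime: float     # validity interval in seconds, e.g. 0.1 for 100 ms
    packet_budget: int  # number of packets the grant covers

    def admit(self, now: float) -> bool:
        """Return True if a packet sent at `now` is covered by the grant."""
        if now - self.issued_at >= self.lifetime:
            return False  # grant has lapsed in time
        if self.packet_budget <= 0:
            return False  # packet budget is used up
        self.packet_budget -= 1
        return True
```

A packet that is not admitted would be dropped until a renewed grant arrives.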

[0019] At time instant 236, the network device sends Credit Request 216 to NCS 202. The network device may send Credit Request 216 to renew Credit Grant 214. As discussed above, the NCS is aware of network topology information, including network flow statistics and information on resources already committed on the links on the managed network. The NCS 202 grants the Credit Request 216 in a manner similar to the discussions above. The NCS 202 issues Credit Grant 218 if the available resources on a given link can accommodate the Credit Request 216. For example, if the total bandwidth still available in a particular path associated with the pair of source and destination addresses is larger than a bandwidth indicated by the Credit Request 216, NCS 202 issues the Credit Grant 218. After issuance, NCS 202 updates the database that includes state information for each link on the managed network to reflect that additional network resources have been committed as a result of Credit Grant 218.

[0020] At time instant 238, the Credit Grant 218 arrives at the network device, and packet policing based on Credit Grant 218 takes effect starting from the next packet to be transmitted thereafter. The next packet is for the same packet flow 220. In some implementations, the policing can cause packets to be dropped, as indicated in FIG. 2 and discussed above in association with FIG. 1. The grant expires after a period of time or a number of packet transmissions.

[0021] The TCP retransmit window does not present an obstacle to various implementations. The TCP retransmit window can be, for example, about 200 ms. The NCS request/response can complete within, for example, about 100 µs. If a packet is dropped due to congestion, the application recovers only after about 200 ms. If, instead, the NCS takes 100 µs to determine the right path, forwarding the packet incurs an additional initial latency of 100 µs. This 100 µs latency is much shorter than the 200 ms an application waits before retransmitting.
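Using the example figures above (both are illustrative values from the text, not measurements), the ratio between the retransmit delay and the NCS round trip works out as follows; the symbol names are introduced here only for the calculation.

```latex
\frac{t_{\text{retransmit}}}{t_{\text{NCS}}}
  = \frac{200\ \text{ms}}{100\ \mu\text{s}}
  = \frac{200 \times 10^{-3}\ \text{s}}{100 \times 10^{-6}\ \text{s}}
  = 2000
```

Under these example values, waiting for the NCS costs roughly one two-thousandth of the delay caused by a congestion drop followed by a retransmission.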

[0022] FIG. 3 illustrates another example implementation of credit request and grant. At time instant 240, packets corresponding to packet flow 220 start to arrive at the network device. The packet flow 220 corresponds to a trap condition for trapping TCP synchronization messages during the first step of a TCP handshake in order to send a "Credit Request" 312 to NCS 202. In particular, once the trap condition occurs, the network device sends a "Credit Request" 312 to NCS 202 at time instant 332. As discussed above, "Credit Request" 312 includes information encoding a pair of source and destination addresses.

[0023] NCS 202 maintains a database containing state information for each link on the managed network. As discussed above, NCS 202 knows the total bandwidth still available on a given link to the destination address based on the allocations already made. NCS 202 issues a Credit Grant 314 if the available resources on a given link can accommodate the Credit Request 312. After issuance, NCS 202 updates the database containing state information for each link on the managed network to reflect that additional network resources have been committed resulting from Credit Grant 314.

[0024] At time instant 334, the Credit Grant 314 arrives at the network device, and packet policing based on the grant takes effect starting from the next packet to be transmitted thereafter. The next packet may be for the same packet flow 220. As discussed above, the grant expires after a period of time or a number of packet transmissions.

[0025] At time instant 336, the network device sends Credit Request 316 to NCS 202. The network device sends Credit Request 316 to renew Credit Grant 314. As discussed above, NCS 202 grants the Credit Request 316 in a manner similar to the discussions above. The NCS 202 issues Credit Grant 318 if the available resources on a given link can accommodate the Credit Request 316. After issuance, NCS 202 updates the database containing state information for each link on the managed network to reflect that additional network resources have been committed resulting from Credit Grant 318.

[0026] At time instant 338, the Credit Grant 318 arrives at the network device, and packet policing based on Credit Grant 318 takes effect starting from the next packet to be transmitted thereafter. The next packet is for the same packet flow 220. As discussed above, the grant expires after a period of time or a number of packet transmissions.

[0027] The example implementations described herein employ a credit-based system for ensuring SLAs. Network devices on a managed network obtain a grant of credit from a centralized credit server before they start to police packets (e.g., rate-limit them to the granted value).

[0028] Current mechanisms apply network policies at different points in the managed network. Coordination across the different points can incur prohibitive overhead. Therefore, the SLAs of a customer may not be dynamically coordinated across the different points in the managed network based on current mechanisms. For example, a customer can use ten different servers in a data center, and may reserve a total bandwidth of 4 Gbps across these ten servers. Current mechanisms are generally incapable of dynamically allocating this total bandwidth of 4 Gbps across the ten different servers of the data center. The bandwidth coordination across different paths is not feasible with current mechanisms. The credit mechanism generally allows a single point of policy enforcement for global and static coordination amongst various points.

[0029] Using some implementations disclosed herein, 4 Gbps can be distributed dynamically on all the paths between the ten servers of the data center. A network service provider can create bandwidth SLAs and charge their customers for bandwidth usage. The network service provider can provision a variety of bandwidth policies. The policies can include, for example, host-to-host traffic in the data center, site-to-site traffic between two sites, total bandwidth across multiple sites or set of servers, total bandwidth per customer in the managed network, etc. A host can be, for example, a data server in the managed network, a data server outside the managed network, a client laptop accessing a data server, etc. A site can include servers on a given subnet, servers at a particular physical location, servers at a certain domain, etc. In some configurations, these policies are enforced by the NCS and the switches/routers that interact with the NCS. The entities involved in the enforcement may include access switches in a datacenter, or edge routers in a service provider (SP) network.
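The dynamic distribution of a customer's total bandwidth can be sketched as a shared pool that server-to-server flows draw from and return to. The 4000 Mbps figure mirrors the 4 Gbps example above; the `CustomerPool` class, its methods, and the server names are hypothetical and stand in for bookkeeping the NCS might perform.

```python
from typing import Dict, Tuple


class CustomerPool:
    """One customer's total bandwidth, shared by all of its server pairs."""

    def __init__(self, total_mbps: int):
        self.total = total_mbps
        self.in_use: Dict[Tuple[str, str], int] = {}  # (src, dst) -> Mbps

    def available(self) -> int:
        return self.total - sum(self.in_use.values())

    def reserve(self, src: str, dst: str, mbps: int) -> bool:
        """Reserve bandwidth for a server pair if the pool still has room."""
        if mbps > self.available():
            return False
        key = (src, dst)
        self.in_use[key] = self.in_use.get(key, 0) + mbps
        return True

    def release(self, src: str, dst: str) -> None:
        """Return a pair's bandwidth to the pool, e.g. when its grant lapses."""
        self.in_use.pop((src, dst), None)
```

Because the limit is tracked on the pool rather than per server, any pair can use whatever share the other pairs are not currently using.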

[0030] In some implementations, providers and their customers obtain a variety of network-based metrics from the NCS. These metrics can include, for example, prominent server-to-server flows (e.g., elephant flows), customer specific traffic patterns, time-based traffic patterns across sites, etc.

[0031] In some implementations, servers or customers can reduce packet drops in the network by using a substantially optimal path to route the packets from point A to point B on the managed network. The path optimization improves network utilization and may avoid over-provisioning the network because the optimized path for a packet flow is dynamically selected rather than statically determined. The path optimization can be implemented as policy based routing using the NCS. The traffic in the data center tends to be evenly distributed resulting from, for example, a routing policy to balance traffic load in the network.

[0032] Some implementations selectively drop or allow packet flows for specific applications or customers. Using the selective feature of some implementations, a service provider can create application or customer specific charging models across the network.

[0033] Cloud service providers can use some implementations disclosed herein to create network SLAs for server-to-server traffic. IP next generation network (IP NGN) providers can use some implementations disclosed herein to provide improved quality of service (e.g., customer-specific bandwidth or bandwidth for a specific application). Some implementations can provide end-to-end and client-to-cloud bandwidth SLAs. Some implementations can utilize per-use models based on use patterns. Some implementations can utilize differentiated billing models based on, for example, time-based patterns. The NCS is a single point to record the use patterns and the time-based patterns. The single point feature also enables configuring policies (e.g., the SLAs) in one place (e.g., the NCS). This single point feature further provides operational ease in creating policies (e.g., SLAs), obtaining data showing flow patterns, obtaining analytics of network traffic, and creating billing models for users.

[0034] Another aspect involves pinning a packet flow to a path on the managed network. In principle, a network device can pin the path of a packet flow by sending, for example, a list of switch IDs through which the packet may pass, a label stack of multiprotocol label switching (MPLS) labels, or a stack of switch media access control (MAC) addresses. These approaches, while possible, do not scale well: each network device on the managed network stores parameters related to each "flow", and if the number of flows grows very high, the cost of bookkeeping by all the network devices becomes prohibitively high.

[0035] To address the above-mentioned scaling issues, some implementations allow the network devices on the managed network to decide on the specific path to forward the packet flow at a given time. The network device forwarding the packet flow considers the specific path as the best path. In particular, when congestion occurs on a given path, the network devices on the managed network can propagate a dynamic "path cost," as discussed below. This "congestion notification" can be an Ethernet PAUSE frame or another kind of control plane enhancement (e.g., routing updates with link-specific cost updates for L2/L3).

[0036] FIG. 4 shows another flow chart of an example implementation. In block 402, a network device receives a congestion notification. In some implementations, the congestion notification is generated by a neighboring network device. In some implementations, the congestion notification indicates time-out conditions on a given link. In some other implementations, the congestion notification indicates a link failure on a given link, as discussed below in association with FIG. 5.

[0037] In response to the congestion notification, the network device propagates the congestion notification in the managed network, for example, to other network devices and the NCS, as illustrated in block 404. In some embodiments, the network device propagates the congestion notification by relaying the received congestion notification. In some other embodiments, the network device propagates the congestion notification by encapsulating the payload of the received congestion notification packet into a notification packet from the network device. In some other implementations, the network device propagates the congestion notification by generating a brand new notification.

[0038] In block 406, the NCS updates the database of the network topology information to reflect the congestion. Thus, the NCS tracks the live state information of all links in the managed network. Subsequent credit requests will be considered in view of the most current network topology information.
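The update in block 406 can be sketched as follows. This is a minimal illustration only: the class name TopologyDB, its methods, the state strings, and the rule that a credit request is admitted only when every link on the requested path is in the "ok" state are all assumptions made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class TopologyDB:
    """Hypothetical link-state store kept by the NCS (names are assumptions)."""
    # Maps an undirected link, stored as frozenset({a, b}), to a state string.
    links: dict = field(default_factory=dict)

    def add_link(self, a, b):
        """Register a link between devices a and b in the healthy state."""
        self.links[frozenset((a, b))] = "ok"

    def apply_notification(self, a, b, state):
        """Record a congestion or failure notification for link a-b."""
        key = frozenset((a, b))
        if key not in self.links:
            raise KeyError(f"unknown link {a}-{b}")
        self.links[key] = state

    def admit(self, path):
        """Assumed policy: admit a credit request only if every hop is 'ok'."""
        hops = zip(path, path[1:])
        return all(self.links.get(frozenset(hop)) == "ok" for hop in hops)


db = TopologyDB()
db.add_link("sw1", "sw2")
db.add_link("sw2", "sw3")
before = db.admit(["sw1", "sw2", "sw3"])          # path is healthy
db.apply_notification("sw2", "sw3", "congested")  # block 406 update
after = db.admit(["sw1", "sw2", "sw3"])           # same request is now refused
```

A real NCS would likely weigh remaining bandwidth rather than a binary state; the binary form is used here only to keep the sketch short.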

[0039] FIG. 5 illustrates an example response to a congested link according to an implementation. Source S transmits a packet flow to destination D through a managed network. The managed network includes network devices, such as access switches and routers, as discussed above. The packet flow from source S to destination D can take, for example, sixteen different paths in FIG. 5. Each path traverses several links between the network devices of the managed network. As illustrated in FIG. 5, links 502 become congested because, for example, traffic on links 502 has exceeded the bandwidth limit. The congestion on links 502 indicates that links 504 are unattractive to the packet flow from source S to destination D because subsequent packets on links 504 will be stuck when they are forwarded onto links 502 (which are congested). Therefore, the link congestion should be radiated to network devices associated with links 504. The network devices associated with links 504 process the congestion notification and then choose links 506 as alternate paths for subsequent packets of the packet flow from source S to destination D.
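The local choice made by a network device in FIG. 5 can be sketched as a table of per-link costs in which a congestion notification adds a penalty. The class NextHopTable, the numeric costs, and the additive penalty are hypothetical; the disclosure refers to a dynamic "path cost" without specifying how it is computed.

```python
class NextHopTable:
    """Hypothetical per-device table of outgoing links and their path costs."""

    def __init__(self, base_costs):
        # Static cost of each outgoing link, e.g. from the routing protocol.
        self.base = dict(base_costs)
        # Dynamic penalty added while a link leads toward congestion.
        self.penalty = {link: 0 for link in self.base}

    def on_congestion(self, link, penalty):
        """A radiated congestion notification makes the link less attractive."""
        self.penalty[link] += penalty

    def on_ease(self, link):
        """Congestion has eased: restore the link's static cost."""
        self.penalty[link] = 0

    def best(self):
        """Return the lowest-cost link; ties are broken by link name."""
        return min(self.base,
                   key=lambda link: (self.base[link] + self.penalty[link], link))


table = NextHopTable({"link_504": 1, "link_506": 2})
default_choice = table.best()         # link_504 is cheaper by default
table.on_congestion("link_504", 10)   # links 502 downstream are congested
alternate_choice = table.best()       # traffic shifts to link_506
```

Because each device keeps only per-link state, not per-flow state, this avoids the bookkeeping cost attributed above to flow pinning.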

[0040] When a link failure occurs, flow redirection will take place in the managed network. For example, a central arbiter may be present on one of the network devices or the NCS. The central arbiter is aware of the current state of links in the network, including the link failure that has just occurred. The central arbiter on a network device then computes shortest paths for each packet flow departing the network device. If a given packet flow is pinned to a path on which a link has failed, then the central arbiter redirects the pinned flow to other paths on subsequent requests. If the central arbiter only performs admission control and the network devices on the managed network dynamically determine a path for a given packet flow, then a link failure on the path causes a congestion message to be signaled to upstream network devices. The upstream network devices then install ACLs on downstream network devices to redirect subsequent packets of the affected flows.
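The shortest-path recomputation by the central arbiter can be sketched as below. The disclosure does not name a shortest-path algorithm; breadth-first search over hop count is substituted here as an assumption, and the function name, the device names, and the small topology are hypothetical.

```python
from collections import deque


def shortest_path(links, src, dst, failed=frozenset()):
    """Fewest-hop path from src to dst that avoids every failed link.

    links is an iterable of (a, b) pairs; failed is a set of frozenset({a, b}).
    Returns a list of devices, or None when dst is unreachable.
    """
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in failed:
            continue  # the arbiter knows this link is down
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


links = [("S", "A"), ("A", "D"), ("S", "B"), ("B", "C"), ("C", "D")]
pinned = shortest_path(links, "S", "D")
redirected = shortest_path(links, "S", "D", failed={frozenset(("A", "D"))})
```

Here a flow pinned to the path through A is redirected through B and C once link A-D is reported as failed.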

[0041] Returning to FIG. 4, in block 408, the NCS may choose packet flows suitable for "swapping" from existing paths to new paths. These swapped packet flows may account for a very small percentage of the total number of packet flows. For example, the swapped packet flows may typically account for less than 10% of the total number of packet flows. The choice can factor in two considerations. First, the packet flow is neither an elephant flow nor a mouse flow. An elephant flow is an extremely large (in total bytes) continuous flow set up by, for example, TCP (or another protocol). Elephant flows, though not numerous, can occupy a disproportionate share of the total bandwidth over a period of time. A mouse flow, by contrast, is a short (in total bytes) flow set up by, for example, TCP (or another protocol). In other words, swappable flows are intermediate-sized packet flows. Second, the packet flow is neither very long-lived nor very short-lived. In other words, swappable flows are intermediate in duration. The rationale for the two considerations may include the fact that swapping elephant flows can cause flow toggling, and swapping very small or short-lived flows may not help mitigate the congestion. FIG. 6 summarizes the example swappable flow in two dimensions, namely, the size and duration.
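The two considerations can be sketched as a filter over per-flow statistics. The threshold values, the FlowStats record, and the function names below are assumptions chosen only for illustration; the disclosure does not give numeric boundaries for elephant, mouse, long-lived, or short-lived flows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowStats:
    """Hypothetical per-flow statistics collected by the NCS."""
    flow_id: str
    total_bytes: int
    duration_s: float


# Hypothetical thresholds; a deployment would tune these from measurements.
MOUSE_BYTES = 100_000            # at or below this size: mouse flow
ELEPHANT_BYTES = 1_000_000_000   # at or above this size: elephant flow
SHORT_S = 1.0                    # at or below this duration: very short-lived
LONG_S = 600.0                   # at or above this duration: very long-lived


def is_swappable(flow):
    """True when the flow is intermediate in both size and duration."""
    mid_size = MOUSE_BYTES < flow.total_bytes < ELEPHANT_BYTES
    mid_duration = SHORT_S < flow.duration_s < LONG_S
    return mid_size and mid_duration


def select_swappable(flows):
    """Return the ids of flows the NCS could move to an alternate path."""
    return [f.flow_id for f in flows if is_swappable(f)]


flows = [
    FlowStats("backup", 5_000_000_000, 3600.0),  # elephant, long-lived
    FlowStats("dns", 300, 0.05),                 # mouse, short-lived
    FlowStats("video", 50_000_000, 120.0),       # intermediate in both
    FlowStats("burst", 50_000_000, 0.5),         # intermediate size, too short
]
chosen = select_swappable(flows)
```

Only the flow that is intermediate on both axes is selected, which corresponds to the central region of the two-dimensional summary in FIG. 6.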

[0042] As discussed above in association with FIG. 1, the NCS uses network statistics to determine which flows are swappable based on the above considerations. Once the swappable flows have been determined based on network statistics, the NCS can install access control lists (ACLs) on a particular network device to redirect these flows to alternate paths. In effect, another aspect of the implementations involves the installation of a redirect ACL for each flow, or a flow specific access table. The NCS can remove these ACLs as soon as congestion eases. In some implementations, the ease condition can be signaled through a control plane protocol or after a certain pre-defined timer. The NCS can use the ease condition signal to remove these ACLs.
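The lifecycle of a redirect ACL (install on congestion, remove on an ease signal or after a timer) can be sketched as follows. The class RedirectAclTable, its method names, and the use of caller-supplied timestamps are hypothetical and do not correspond to any particular device's ACL interface.

```python
class RedirectAclTable:
    """Hypothetical flow-specific access table on one network device."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._entries = {}  # flow_id -> (next_hop, time the entry was installed)

    def install(self, flow_id, next_hop, now):
        """The NCS installs a redirect entry for one swappable flow."""
        self._entries[flow_id] = (next_hop, now)

    def lookup(self, flow_id):
        """Return the redirect next hop, or None when no entry applies."""
        entry = self._entries.get(flow_id)
        return entry[0] if entry else None

    def on_congestion_eased(self):
        """Ease condition signaled through the control plane: drop all entries."""
        self._entries.clear()

    def expire(self, now):
        """Timer-based ease condition: drop entries at or past the timeout."""
        self._entries = {
            flow_id: (hop, installed)
            for flow_id, (hop, installed) in self._entries.items()
            if now - installed < self.timeout_s
        }


acl = RedirectAclTable(timeout_s=30.0)
acl.install("flow-7", "sw3", now=0.0)
during = acl.lookup("flow-7")   # redirect is active
acl.expire(now=30.0)            # pre-defined timer has elapsed
after_timer = acl.lookup("flow-7")
```

Passing the current time in explicitly, instead of reading a clock inside the class, keeps the sketch deterministic and easy to test.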

[0043] By default, the managed network routes each packet to one of the equal cost multiple paths (ECMPs) based on a hash computation. The hash computation can be based on a five-tuple, namely, source MAC address (SMAC), destination MAC address (DMAC), source IP address (S-IP), destination IP address (D-IP), and port. The installed ACL according to some implementations can override the above hash computation results. When the ACL is removed, the path determined by the hash computation will be restored as the default. Therefore, installing ACLs to redirect flows may only occur when link congestion has occurred in the managed network.
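The default hash-based selection and the ACL override can be sketched as below. The disclosure does not specify a hash function; SHA-256 from the Python standard library is substituted purely for illustration, and the function names, addresses, and path names are hypothetical.

```python
import hashlib


def ecmp_pick(paths, smac, dmac, sip, dip, port):
    """Pick one equal-cost path from a hash of the five fields of the flow."""
    key = "|".join((smac, dmac, sip, dip, str(port))).encode()
    digest = hashlib.sha256(key).digest()  # stand-in for a hardware hash
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]


def forward(paths, flow, redirect_acl):
    """Use the redirect ACL entry if one is installed, else the ECMP hash."""
    override = redirect_acl.get(flow)
    if override is not None:
        return override
    return ecmp_pick(paths, *flow)


paths = ["path-0", "path-1", "path-2", "path-3"]
flow = ("00:aa:bb:cc:dd:01", "00:aa:bb:cc:dd:02", "10.0.0.1", "10.0.0.2", 443)

redirect_acl = {}
default_path = forward(paths, flow, redirect_acl)    # hash result

redirect_acl[flow] = "path-alt"                      # congestion: ACL installed
redirected_path = forward(paths, flow, redirect_acl)

del redirect_acl[flow]                               # congestion eased
restored_path = forward(paths, flow, redirect_acl)   # hash result again
```

Because the hash is deterministic, removing the ACL entry returns the flow to the same default path it used before the redirect.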

[0044] The disclosed and other examples can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The implementations can include single or distributed processing of algorithms. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

[0045] A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

[0046] The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).

[0047] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer can include a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer can also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data can include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

[0048] While this document describes many specifics, these should not be construed as limitations on the scope of an invention that is claimed or of what is claimed, but rather as descriptions of features specific to particular embodiments. Certain features that are described in this document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination. Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results.

[0049] Only a few examples and implementations are disclosed. Variations, modifications, and enhancements to the described examples and implementations and other implementations can be made based on what is disclosed.

* * * * *