Identifying Malicious Client Network Applications Based On Network Request Characteristics BILAS; Maciej ; et al. [CLOUDFLARE, INC.]

Patent Application Summary

U.S. patent application number 17/672109 was filed with the patent office on 2022-02-15 and published on 2022-06-02 for identifying malicious client network applications based on network request characteristics. The applicant listed for this patent is CLOUDFLARE, INC. Invention is credited to Maciej BILAS, John GRAHAM-CUMMING, Marek MAJKOWSKI.

Application Number	20220174075 17/672109
Document ID	/
Family ID	1000006140191
Filed Date	2022-02-15

United States Patent Application	20220174075
Kind Code	A1
BILAS; Maciej ; et al.	June 2, 2022

IDENTIFYING MALICIOUS CLIENT NETWORK APPLICATIONS BASED ON NETWORK REQUEST CHARACTERISTICS

Abstract

An edge server receives a plurality of requests from a client network application for actions to be performed on a resource that is hosted at an origin server. The edge server determines request attributes of the requests and associates the request attributes with a session identifying the client network application. The edge server generates a confidence value for the client network application based at least on the determined request attributes of the plurality of requests and computed session metrics of the session. When the confidence value indicates that the client network application is malicious, the edge server performs one or more mitigation actions.

Inventors:

BILAS; Maciej; (Warsaw, PL) ; GRAHAM-CUMMING; John; (London, GB) ; MAJKOWSKI; Marek; (Warsaw, PL)

Applicant:

Name	City	State	Country	Type
CLOUDFLARE, INC.	San Francisco	CA	US

Family ID:

1000006140191

Appl. No.:

17/672109

Filed:

February 15, 2022

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
16417367	May 20, 2019	11252182
17672109

Current U.S. Class:	1/1
Current CPC Class:	H04L 9/3271 20130101; H04L 43/04 20130101; H04L 63/145 20130101; H04L 67/146 20130101; H04L 63/1458 20130101
International Class:	H04L 9/40 20060101 H04L009/40; H04L 9/32 20060101 H04L009/32; H04L 43/04 20060101 H04L043/04; H04L 67/146 20060101 H04L067/146

Claims

1. A method, comprising: receiving a plurality of requests from a client network application, each request in the plurality of requests for an action to be performed on a resource that is hosted at an origin server; for each request in the plurality of requests, determining one or more request attributes of the request and associating the one or more request attributes of the request with a session that identifies the client network application; computing one or more session metrics of the session; generating a confidence value for the client network application based at least on the determined request attributes of the plurality of requests and the computed session metrics of the session; determining that the confidence value indicates that the client network application is malicious; in response to determining that the confidence value indicates that the client network application is malicious, performing one or more mitigation actions.

2. The method of claim 1, wherein generating the confidence value for the client network application comprises: retrieving historical request data from a data structure, the historical request data including previous request attributes of previous requests from the client network application; and analyzing the previous request attributes of the previous requests to identify patterns between the previous requests and the plurality of requests.

3. The method of claim 1, wherein performing the one or more mitigation actions comprises: modifying a reputation score associated with the client network application; and initiating a challenge process in response to a subsequent request from the client network application.

4. The method of claim 1, wherein performing the one or more mitigation actions comprises: blocking the request from transmittal to the origin server.

5. The method of claim 1, further comprising: storing the one or more request attributes of the request and the session metrics in a data structure, wherein request attributes for subsequent requests are compared to the request attributes in the data structure to prevent subsequent requests having similar request attributes from being sent to the origin server.

6. A non-transitory machine-readable storage medium that provides instructions that, when executed by a processor, cause said processor to perform operations comprising: receiving a plurality of requests from a client network application, each request in the plurality of requests for an action to be performed on a resource that is hosted at an origin server; for each request in the plurality of requests, determining one or more request attributes of the request and associating the one or more request attributes of the request with a session that identifies the client network application; computing one or more session metrics of the session; generating a confidence value for the client network application based at least on the determined request attributes of the plurality of requests and the computed session metrics of the session; determining that the confidence value indicates that the client network application is malicious; in response to determining that the confidence value indicates that the client network application is malicious, performing one or more mitigation actions.

7. The non-transitory machine-readable storage medium of claim 6, wherein generating the confidence value for the client network application comprises: retrieving historical request data from a data structure, the historical request data including previous request attributes of previous requests from the client network application; and analyzing the previous request attributes of the previous requests to identify patterns between the previous requests and the plurality of requests.

8. The non-transitory machine-readable storage medium of claim 6, wherein performing the one or more mitigation actions comprises: modifying a reputation score associated with the client network application; and initiating a challenge process in response to a subsequent request from the client network application.

9. The non-transitory machine-readable storage medium of claim 6, wherein performing the one or more mitigation actions comprises: blocking the request from transmittal to the origin server.

10. The non-transitory machine-readable storage medium of claim 6 that provides instructions that, when executed by the processor, cause said processor to further perform operations comprising: storing the one or more request attributes of the request and the session metrics in a data structure, wherein request attributes for subsequent requests are compared to the request attributes in the data structure to prevent subsequent requests having similar request attributes from being sent to the origin server.

11. An apparatus, comprising: a processor; a non-transitory machine-readable storage medium coupled with the processor that stores instructions that, when executed by the processor, cause said processor to perform the following: receive a plurality of requests from a client network application, each request in the plurality of requests for an action to be performed on a resource that is hosted at an origin server; for each request in the plurality of requests, determine one or more request attributes of the request and associate the one or more request attributes of the request with a session that identifies the client network application; compute one or more session metrics of the session; generate a confidence value for the client network application based at least on the determined request attributes of the plurality of requests and the computed session metrics of the session; determine that the confidence value indicates that the client network application is malicious; in response to determining that the confidence value indicates that the client network application is malicious, perform one or more mitigation actions.

12. The apparatus of claim 11, wherein generating the confidence value for the client network application comprises: retrieving historical request data from a data structure, the historical request data including previous request attributes of previous requests from the client network application; and analyzing the previous request attributes of the previous requests to identify patterns between the previous requests and the plurality of requests.

13. The apparatus of claim 11, wherein performing the one or more mitigation actions comprises: modifying a reputation score associated with the client network application; and initiating a challenge process in response to a subsequent request from the client network application.

14. The apparatus of claim 11, wherein performing the one or more mitigation actions comprises: blocking the request from transmittal to the origin server.

15. The apparatus of claim 11, wherein the instructions further cause said processor to perform the following: store the one or more request attributes of the request and the session metrics in a data structure, wherein request attributes for subsequent requests are compared to the request attributes in the data structure to prevent subsequent requests having similar request attributes from being sent to the origin server.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of U.S. application Ser. No. 16/417,367, filed May 20, 2019, which is hereby incorporated by reference.

FIELD

[0002] Embodiments of the invention relate to the field of network communications; and more specifically, to identifying malicious client network applications based on network request characteristics.

BACKGROUND

[0003] Internet hosts are concerned with maintaining high security, performance, and reliability of their hosted resources, such as websites. As the popularity of a resource increases, so does the amount of network traffic that is directed to the resource. A resource can also be targeted by attacks from bots attempting to make a resource unavailable to legitimate users by flooding a resource with requests, commonly referred to as a distributed denial-of-service (DDoS) attack, or to abuse the functionality of the website (e.g., content scraping, credential stuffing, slow brute force attacking, medium-volume attacks at slow endpoints). Heavy traffic caused by attacking bots can affect the security, performance and reliability of a resource.

[0004] Conventional security solutions typically either fingerprint the user agent by what is included in the request headers or present a challenge to gather additional information for fingerprinting. The fingerprint signatures are then used to classify traffic. The signatures can take different forms: a specific header order, client IP address, support of JavaScript, CAPTCHA solution status, and/or a machine learned classifier score. These signatures may be used in conjunction with traffic volume to classify a DDoS attack. However, these conventional security solutions do not effectively detect and stop attacks that are designed to abuse the functionality of the website.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:

[0006] FIG. 1 illustrates an exemplary system according to some embodiments described herein;

[0007] FIG. 2 is a flow diagram that illustrates exemplary operations for identifying malicious client network applications along a request path based on request attributes and session metrics according to an embodiment;

[0008] FIG. 3 is a flow diagram that illustrates exemplary operations for identifying malicious client network applications by a centralized server based on request attributes and session metrics according to an embodiment; and

[0009] FIG. 4 is a block diagram illustrating a data processing system that can be used in an embodiment.

DESCRIPTION OF EMBODIMENTS

[0010] A server identifies and mitigates attacks that are designed to abuse the functionality of a website. A server receives a request from a client device regarding a resource that is hosted at the server. For example, the request may be an HTTP/S GET request for the resource. The server determines one or more request attributes and associates them with an HTTP session associated with the request. For instance, timing information of the request and one or more characteristics of the request are recorded and associated with its session. The timing information includes when the request was received (e.g., date and/or time). The one or more characteristics of the request include the "Accept" and "Accept-Language" header values, the requested resource path, the presence of a session cookie, the response status code, the content type, and the cache status (e.g., whether the resource was available in cache or not), etc. A session is typically a collection of multiple requests and responses of the client network application, and can be identified by the TCP connection, by a cookie, by an IP address of the client network application, by the pair of the IP address of the client network application and user-agent, or any combination thereof. In some embodiments, a session is an ordered collection of multiple requests and responses of the client network application. In such embodiments, request-response pairs can be chronologically ordered.
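
As a minimal sketch of the attribute recording and session identification described above, the following Python fragment assumes a session is keyed by the pair of client IP address and user-agent, which is one of the options listed. The names RequestAttributes and session_key, the field names, and the sample values are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RequestAttributes:
    """Attributes recorded for one request (illustrative field names)."""
    received_at: datetime      # timing information
    path: str                  # requested resource path
    accept: str                # "Accept" header value
    accept_language: str       # "Accept-Language" header value
    has_session_cookie: bool   # presence of a session cookie
    status_code: int           # response status code
    content_type: str          # response content type
    cache_hit: bool            # whether the resource was available in cache


def session_key(client_ip: str, user_agent: str) -> tuple[str, str]:
    """Assumed session identifier: the (IP address, user-agent) pair."""
    return (client_ip, user_agent)


# Each session maps to a chronologically ordered list of request attributes.
sessions: dict[tuple[str, str], list[RequestAttributes]] = {}

attrs = RequestAttributes(
    received_at=datetime(2019, 5, 20, 12, 0, 0, tzinfo=timezone.utc),
    path="/index.html",
    accept="text/html",
    accept_language="en-US",
    has_session_cookie=False,
    status_code=200,
    content_type="text/html",
    cache_hit=True,
)
sessions.setdefault(session_key("203.0.113.7", "ExampleAgent/1.0"), []).append(attrs)
```

Appending in arrival order keeps the request-response records chronologically ordered without a separate sort step.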

[0011] One or more session metrics of the session are computed, including one or more of: the duration of the session, the number of requests received in the session, the ratio of the number of failed requests compared to the number of successful requests during the session, inter-request gap distribution (e.g., the amount of time passing from the previous request to the current request), request content-type distribution, inter-request timings for specific content types (e.g., inter-request timings for HTML pages and/or JSON objects, the components of a requested resource [images, JavaScript files, videos, etc.]), response code distribution, and/or HTTP method distribution. One or more zone-wide metrics (of all sessions for a zone over a predetermined time) may also be computed including one or more of: the median duration of all sessions, the median number of requests per session across all sessions, the number of failed requests compared to the number of successful requests across all sessions, inter-request gap distribution across all sessions (e.g., the amount of time passing from one request to another request during the sessions), request content-type distribution across the sessions, inter-request timings for specific content types across the sessions (e.g., inter-request timings for HTML pages), response code distribution across the sessions, and/or HTTP method distribution across the sessions. The ratios between the per-session metrics and the zone-wide metrics may also be computed.
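
A few of the per-session metrics listed above can be sketched as follows. This is an illustration under stated assumptions rather than the method of the application: timestamps are plain seconds, a status code of 400 or higher counts as a failed request, and the function name session_metrics is hypothetical.

```python
from statistics import median


def session_metrics(timestamps: list[float], status_codes: list[int]) -> dict[str, float]:
    """Compute a few per-session metrics from chronologically ordered requests.

    Timestamps are seconds since an arbitrary epoch. Status codes >= 400 are
    treated as failed requests (an assumption made for this sketch).
    """
    # Inter-request gaps: time from each request to the next one.
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    failed = sum(1 for code in status_codes if code >= 400)
    succeeded = len(status_codes) - failed
    return {
        "duration": timestamps[-1] - timestamps[0] if timestamps else 0.0,
        "request_count": float(len(timestamps)),
        # Guard against division by zero when no request succeeded.
        "failed_to_successful": failed / succeeded if succeeded else float("inf"),
        "median_gap": median(gaps) if gaps else 0.0,
    }


metrics = session_metrics([0.0, 1.0, 3.0, 6.0], [200, 200, 404, 200])
```

The median gap is only one summary of the inter-request gap distribution; a histogram or set of percentiles would serve equally well.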

[0012] A comparison is made between the current request and/or session and historical data analyzed from previous requests and/or sessions to determine whether a client network application is malicious (e.g., a bot that serves malicious purposes). In an embodiment, the calculated values are scored against a previously trained machine learning model to determine if the client network application is malicious or not. Based on the analysis of the information of the request and its session, a confidence value is generated indicating whether the client network application is malicious. When the confidence value indicates that the client network application is malicious, the server performs one or more mitigation actions. For example, the server can block the request or flag the request. When the confidence value indicates that the client network application is not malicious, the server processes the request in its normal way (e.g., where the server is an edge server, the edge server may send the request to an origin server hosting a requested resource). In embodiments where the confidence value is generated offline and not on the request path (e.g., computed by a centralized server), the server can perform one or more mitigation actions (e.g., drop the request) in response to receiving a subsequent request from a client network application that matches a particular IP address and/or a fingerprint of a particular client network application.
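
The decision step described above might look like the following sketch. It assumes the confidence value is a number between 0 and 1; the thresholds, the action names, and the function choose_action are hypothetical, since the application does not specify them.

```python
def choose_action(confidence: float, block_at: float = 0.9, flag_at: float = 0.6) -> str:
    """Map a confidence value in [0, 1] to an action (thresholds are hypothetical)."""
    if confidence >= block_at:
        return "block"               # mitigation: drop the request
    if confidence >= flag_at:
        return "flag"                # mitigation: mark the request for review
    return "forward_to_origin"       # normal processing by the edge server


actions = [choose_action(c) for c in (0.95, 0.7, 0.1)]
```

Using two thresholds separates the block and flag mitigations mentioned in the text; a deployment could just as well use a single cutoff.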

[0013] In conventional security solutions, systems can detect DDoS attacks by identifying an atypical volume of traffic (e.g., a larger number of network requests than expected), and mitigate the DDoS attack by blocking certain traffic. Also, other conventional security solutions either fingerprint the user agent by what is included in the request headers or present a challenge to gather additional information for fingerprinting, that can be used in combination with traffic volume to classify DDoS attack traffic. However, these conventional security solutions do not effectively detect and stop attacks that are designed to abuse the functionality of the website.

[0014] In contrast, embodiments described herein allow for the detection and mitigation of attacks that are designed to abuse the functionality of the website. Embodiments of the invention provide many technical advantages, in addition to addressing the deficiencies of previous solutions. For example, embodiments of the invention evaluate a stream of requests (or multiple requests) from a session versus looking at just the volume of traffic. As noted above, conventional security solutions are designed to detect DDoS attacks. In contrast, by evaluating a stream of requests from a session, embodiments of the invention can detect other types of attacks, such as content scraping and credential stuffing.

[0015] FIG. 1 illustrates an exemplary network architecture that uses embodiments described herein. The service illustrated in FIG. 1 includes edge server(s) 120 that are situated between client computing devices 110A-I and origin server(s) 130A-N. In one embodiment, edge server(s) 120 are reverse proxy servers. In one embodiment, certain network traffic is received and processed through edge server(s) 120. For example, web traffic (e.g., HTTP requests/responses, HTTPS requests/responses, SPDY requests/responses, HTTP/2 requests/responses, etc.) for domains handled by origin servers 130A-N may be received and processed at edge server(s) 120. In one embodiment, domain owners are customers of the cloud-based edge service and certain network traffic for their websites is received and processed at edge server(s) 120.

[0016] Client devices 110A-I are computing devices (e.g., laptops, workstations, smartphones, palm tops, mobile phones, tablets, gaming systems, set top boxes, wearable devices, electronic devices, etc.) that are capable of transmitting and/or receiving network traffic. The network traffic may be legitimate network traffic or illegitimate network traffic (e.g., traffic that is part of an attack). Each of client devices 110A-I executes client network application 115 that is capable of transmitting and/or receiving network traffic. For example, client network application 115 may be a web browser or other application that can access network resources (e.g., web pages, images, JSON documents, word processing documents, PDF files, movie files, music files, or other computer files). The client network application may be a scripting application or other application that may be participating in an attack. The client network application 115 may be a legitimate client network application that sends legitimate network traffic or may be a malicious client network application that sends malicious network traffic (e.g., a bot that is designed to abuse the functionality of the website).

[0017] Origin servers 130A-N are computing devices that may serve and/or generate network resources (e.g., web pages, images, word processing documents, PDF files, movie files, music files, or other computer files). An origin server 130A-N may also be another edge server to the server that serves and/or generates network resources. Although not illustrated in FIG. 1, it should be understood that the network resources of origin servers 130A-N may be stored separately from the device that responds to the requests. Some of origin servers 130A-N may handle multiple domains that resolve to edge server(s) 120.

[0018] Although not illustrated in FIG. 1, in one embodiment the service includes multiple nodes (referred to herein as "edge service nodes"). Each edge service node may include any of one or more edge servers, one or more control servers, one or more DNS servers (e.g., one or more authoritative name servers), and one or more other pieces of networking equipment (e.g., one or more routers, switches, hubs, etc.). The edge service node may be part of the same physical device or multiple physical devices. For example, the edge server(s), control server(s), and DNS server(s) may be virtual instances running on the same physical device or may be separate physical devices. Each edge service node may be part of a data center or a collocation site.

[0019] The service may also include control server 125, which may be owned or operated by the service. In one embodiment, control server 125 provides a set of tools and interfaces for customers (e.g., domain owners) to configure security settings of the service. In some embodiments, control server 125 may be used to send a command to edge server(s) 120 to perform the classification of requests to identify whether requests are from malicious or non-malicious client network applications, as described herein.

[0020] In some embodiments, the service includes multiple edge servers that are geographically distributed. For example, in some embodiments, the service uses multiple edge service nodes that are geographically distributed to decrease the distance between requesting client devices and content. The authoritative name servers may have a same anycast IP address and the edge servers may have a same anycast IP address. As a result, when a DNS request is made, the network transmits the DNS request to the closest authoritative name server (in terms of the routing protocol metrics). That authoritative name server then responds with one or more IP addresses of one or more edge servers within the edge service node. Accordingly, a visitor will be bound to that edge server until the next DNS resolution for the requested hostname (according to the TTL ("time to live") value as provided by the authoritative name server). In some embodiments, instead of using an anycast mechanism, embodiments use a geographical load balancer to route traffic to the nearest edge service node.

[0021] To classify the requests to determine a likelihood of whether the traffic is being received from a non-malicious client network application (e.g., from a human user that is not attacking the website) versus a malicious client network application, the service analyzes requests received by edge server(s) 120 from client network applications (e.g., client network application 115) operating on client devices (e.g., client devices 110A-I). Edge server(s) 120 includes request-based bot detection and mitigation module 170 that is configured to receive requests to access resources hosted by origin servers 130A-N. In addition, request-based bot detection and mitigation module 170 on control server 125 and/or edge server(s) 120 is further configured to analyze attributes of the requests and session metrics, determine whether a client network application is likely malicious based on the determined attributes and session metrics, and perform one or more mitigation actions as appropriate.

[0022] In one embodiment, request-based bot detection and mitigation module 170 determines for each request, one or more request attributes and associates the request attributes with its session. For example, the one or more request attributes may include timing information of the request and one or more characteristics of the request. The timing information includes when the request was received (e.g., date and/or time). The one or more characteristics of the request include response status code, content type, etc. The HTTP session can be identified by the TCP connection, by a cookie, by an IP address of the client network application, by the pair of the IP address of the client network application and user-agent, or any combination thereof. In an embodiment, a request-based bot detection and mitigation module 170 on the edge server 120 may determine the request attribute(s) and the session identifier for each request and transmit that information to the request-based bot detection and mitigation module 170 on the control server 125 for further processing including calculating the confidence value offline.

[0023] The request-based bot detection and mitigation module 170 computes one or more session metrics for the session including one or more of: the duration of the session, the number of requests received in the session, the ratio of the number of failed requests compared to the number of successful requests during the session, inter-request gap distribution (e.g., the amount of time passing from the previous request to the current request), request content-type distribution, inter-request timings for specific content types (e.g., inter-request timings for HTML pages), response code distribution, and/or HTTP method distribution. The request-based bot detection and mitigation module 170 also computes one or more zone-wide metrics (of all sessions for a zone over a predetermined time) including one or more of: the duration of all sessions, the number of requests across all sessions, the number of failed requests compared to the number of successful requests across all sessions, inter-request gap distribution across all sessions (e.g., the amount of time passing from one request to another request during the sessions), request content-type distribution across the sessions, inter-request timings for specific content types across the sessions (e.g., inter-request timings for HTML pages), response code distribution across the sessions, and/or HTTP method distribution across the sessions. The ratios between the per-session metrics and the zone-wide metrics may also be computed.
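
The ratio between a per-session metric and its zone-wide counterpart can be sketched as below. The sketch assumes the zone-wide value is the median across sessions (as in the earlier description of median duration and median request count); the function name zone_ratios and the sample numbers are hypothetical.

```python
from statistics import median


def zone_ratios(session: dict[str, float], all_sessions: list[dict[str, float]]) -> dict[str, float]:
    """Return session_metric / zone_median for every metric in the session."""
    ratios = {}
    for name, value in session.items():
        zone_median = median(s[name] for s in all_sessions)
        # A zero median gives no meaningful ratio; skip it in this sketch.
        if zone_median:
            ratios[name] = value / zone_median
    return ratios


# Zone-wide data: metrics for all sessions of the zone over some time window.
zone = [
    {"duration": 60.0, "request_count": 10.0},
    {"duration": 120.0, "request_count": 20.0},
    {"duration": 180.0, "request_count": 30.0},
]
suspect = {"duration": 60.0, "request_count": 200.0}
ratios = zone_ratios(suspect, zone)
```

Here the suspect session is half as long as the median session but issues ten times as many requests, which is the kind of contrast a ratio makes visible regardless of the zone's absolute traffic level.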

[0024] The request-based bot detection and mitigation module 170 determines, using the request attributes and the session metrics, a likelihood of whether the traffic is being received from a legitimate client network application (e.g., from a human user that is not attacking the website) versus a malicious client network application (e.g., an attacking bot). In an embodiment, the calculated values are scored against a previously trained machine learning model to determine if the client network application is malicious or not. The machine learning module may produce a confidence value indicating whether the client network application is suspected to be malicious.

[0025] If it is determined that the client network application is suspected to be malicious, the request-based bot detection and mitigation module 170 performs one or more mitigation actions. For example, the request may be dropped. As another example, a reputation score for the client network application (e.g., as identified through the source IP address of the client network application) may be lowered, which may cause future requests from the client network application to be blocked or challenged (e.g., via a CAPTCHA). In some embodiments, edge server 120 creates a "(request label, request confidence %)" tuple as metadata associated with the request. In such embodiments, a filter-based firewall can retrieve the metadata associated with the request and block the request.
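
The two mitigation mechanisms in this paragraph, a firewall filter over the (request label, request confidence %) tuple and a reputation score keyed by source IP address, can be illustrated as follows. The label string "malicious", the 80% cutoff, the starting score, and the penalty are all assumptions made for the sketch.

```python
def firewall_allows(metadata: tuple[str, float], min_block_confidence: float = 80.0) -> bool:
    """Return False when the request is labeled malicious with enough confidence."""
    label, confidence_pct = metadata
    return not (label == "malicious" and confidence_pct >= min_block_confidence)


reputation: dict[str, int] = {}  # source IP address -> reputation score


def lower_reputation(source_ip: str, penalty: int = 10, start: int = 100) -> int:
    """Lower the score for a source IP address, never going below zero."""
    reputation[source_ip] = max(0, reputation.get(source_ip, start) - penalty)
    return reputation[source_ip]


blocked = not firewall_allows(("malicious", 95.0))
first_score = lower_reputation("198.51.100.4")
second_score = lower_reputation("198.51.100.4")
```

A later request from an address whose score has dropped far enough could then be challenged or blocked before any per-request analysis runs.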

[0026] In one embodiment, request-based bot detection and mitigation module 170 performs the above-noted operations on a request in real-time prior to further processing (e.g., prior to sending the request to its origin server). In other embodiments, request-based bot detection and mitigation module 170 performs the above-noted operations on a request asynchronously with sending the request to origin servers 130A-N or offline after sending the request to origin servers 130A-N.

[0027] FIG. 2 is a flow diagram 200 that illustrates exemplary operations for identifying malicious client network applications along a request path based on request attributes and session metrics according to an embodiment. The operations of FIG. 2 will be described with reference to the exemplary embodiment of FIG. 1. However, it should be understood that the operations of FIG. 2 can be performed by embodiments of the invention other than those discussed with reference to FIG. 1, and the embodiments discussed with reference to FIG. 1 can perform operations different than those discussed with reference to FIG. 2. The operations of FIG. 2 are described as being performed by an edge server (e.g., edge server 120). However, in some embodiments, the operations are performed by another device (e.g., control server 125, origin servers 130A-N, etc.). In some embodiments, the operations are performed by request-based bot detection and mitigation module 170 operating on edge server(s) 120, control server 125, or origin servers 130A-N.

[0028] At operation 205, an edge server (e.g., one edge server from set of edge servers 120) receives a plurality of requests from a client network application. In one embodiment, each request in the plurality of requests is for an action to be performed on a resource that is hosted at an origin server. For example, edge server 120 receives a plurality of HTTP "GET" requests to access resources hosted by one or more origin servers (e.g., origin servers 130A-N). In one embodiment, each request is received by edge server 120 as a result of a DNS request for the hostname resolving to an IP address of edge server 120 instead of resolving to an IP address of origin server 130A. For example, a requested resource is an HTML page located at, e.g., www.example.com. In one embodiment, edge server 120 receives the request for the resource from client network application 115 (e.g., a browser or other network application) operating on a client device (e.g., client device 110A). In one embodiment, edge server 120 receives the request as part of an HTTP session from client device 110A.

[0029] At operation 210, edge server 120 determines one or more request attributes for each request in the plurality of requests and associates the one or more request attributes with a session that identifies the client network application. For example, edge server 120 determines the one or more request attributes, including timing information of the request and one or more characteristics of the request. Characteristics of the request can include an IP address, an order of request headers, whether the request includes a malformed header, etc. The timing information can include when the request was received (e.g., date and/or time). The one or more characteristics of the request include response status code, content type, etc.

[0030] Edge server 120 can group the plurality of requests into a session to associate requests that occur within a period of time and received from the same client network application. In one embodiment, edge server 120 generates a unique session identifier to uniquely identify requests from a particular client network application.
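
One way to group requests that occur within a period of time into sessions is an idle timeout, sketched below for a single client network application. The 30-minute timeout, the use of random UUIDs as session identifiers, and the function name group_into_sessions are assumptions; the application does not prescribe them.

```python
import uuid


def group_into_sessions(timestamps: list[float], idle_timeout: float = 1800.0) -> dict[str, list[float]]:
    """Split one client's sorted request times into sessions.

    A new session starts when the gap since the previous request exceeds
    idle_timeout seconds (the 30-minute default is an assumption).
    """
    sessions: dict[str, list[float]] = {}
    current_id = None
    previous = None
    for ts in timestamps:
        if previous is None or ts - previous > idle_timeout:
            current_id = uuid.uuid4().hex  # unique session identifier
            sessions[current_id] = []
        sessions[current_id].append(ts)
        previous = ts
    return sessions


grouped = group_into_sessions([0.0, 10.0, 20.0, 5000.0, 5005.0])
```

The first three requests fall into one session and the last two into another, because the gap of 4980 seconds exceeds the timeout.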

[0031] At operation 215, edge server 120 computes one or more session metrics of the session. In one embodiment, session metrics of the session can include information regarding a duration of the session, a number of requests made in the session, a number of unique requests made, a number of unique requests for a specific content-type, and a number of failed and successful requests.

[0032] Edge server 120 can determine patterns related to the current requests within a session and/or how the current request relates to previous requests from previous sessions. For example, edge server 120 can determine an inter-request time gap distribution to determine an amount of time passing between requests for a particular resource or type of resource. Edge server 120 can also determine an inter-request time for specific content types to determine an amount of time passing between requests of a specific content type (e.g., HTML pages). Edge server 120 can also determine other characteristics, including: a response content-type distribution, a request HTTP method distribution, a response cache presence distribution, a number of unique paths accessed, and a number of unique paths accessed with a specific content type (e.g., HTML pages).
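As a minimal sketch, the inter-request time gap distribution and the categorical distributions (content type, HTTP method, cache presence) could be computed as follows. The function names and the one-second histogram bucket are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List


def inter_request_gaps(timestamps: List[float]) -> List[float]:
    """Time elapsed between each pair of consecutive requests."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def gap_histogram(gaps: List[float], bucket_seconds: float = 1.0) -> Dict[int, int]:
    """Count gaps per bucket; bucket n covers [n, n + 1) * bucket_seconds."""
    return dict(Counter(int(gap // bucket_seconds) for gap in gaps))


def distribution(values: List[str]) -> Dict[str, float]:
    """Relative frequency of each value, e.g. HTTP methods or content types."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()} if total else {}
```

To obtain the inter-request timing for a specific content type, the same `inter_request_gaps` helper can be applied to only the timestamps of requests with that content type.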

[0033] Edge server 120 can also determine, based on historical request data, whether the current request exhibits attributes of similar previous requests, or if the current request exhibits abnormal or unexpected behavior. In one embodiment, edge server 120 accesses a database or a data structure storing historical request data. In one embodiment, the historical request data includes attributes of previous requests and previous sessions. In one embodiment, the historical request data includes data for requests previously analyzed by edge server 120 and/or for requests previously analyzed by edge servers other than edge server 120. In one embodiment, edge server 120 retrieves historical request data for previous requests having similar attributes as the current request. For example, edge server 120 can retrieve previous requests having a similar request content type, from the same source as the current request (e.g., the same client network application), requests directed to the same resources as one or more of the plurality of requests, etc. As an example, a shorter amount of time between requests than expected based on the historical request data can indicate the requests are part of a malicious action.
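The comparison against historical request data can be illustrated with a simple check on inter-request gaps. The use of the median and the factor of 0.1 are assumptions chosen for this sketch; the specification does not prescribe a particular statistic.

```python
from statistics import median
from typing import List


def is_faster_than_expected(
    current_gaps: List[float], historical_gaps: List[float], factor: float = 0.1
) -> bool:
    """Flag a session whose typical gap is far shorter than the historical one.

    Returns False when either list is empty, since no comparison is possible.
    """
    if not current_gaps or not historical_gaps:
        return False
    return median(current_gaps) < factor * median(historical_gaps)
```

A True result is only one signal among several and would be combined with the other request attributes and session metrics rather than used alone.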

[0034] In one embodiment, edge server 120 also computes one or more zone-wide metrics (of all sessions for a zone over a predetermined time) to determine a baseline to compare against the session for the plurality of requests received in operation 205. The one or more zone-wide metrics include one or more of: the duration of all sessions, the number of requests across all sessions, the number of failed requests compared to the number of successful requests across all sessions, inter-request gap distribution across all sessions (e.g., the amount of time passing from one request to another request during the sessions), request content-type distribution across the sessions, inter-request timings for specific content types across the sessions (e.g., inter-request timings for HTML pages), response code distribution across the sessions, and/or HTTP method distribution across the sessions. The ratios between the per-session metrics and the zone-wide metrics may also be computed.
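The ratio between per-session metrics and zone-wide metrics can be sketched as follows. Taking the zone-wide baseline to be the mean of each metric over all sessions is an assumption for illustration, as are the function names.

```python
from statistics import mean
from typing import Dict, List


def zone_baseline(all_sessions: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean of each metric across all sessions in the zone (assumed baseline)."""
    if not all_sessions:
        return {}
    return {key: mean(s[key] for s in all_sessions) for key in all_sessions[0]}


def ratios_to_baseline(
    session: Dict[str, float], baseline: Dict[str, float]
) -> Dict[str, float]:
    """Per-session metric divided by the zone-wide value, skipping zero baselines."""
    return {
        key: session[key] / baseline[key]
        for key in baseline
        if key in session and baseline[key] != 0
    }
```

For example, a session with ten times the zone-wide average request count yields a ratio of 10 for that metric, which can then be used as an input when generating the confidence value.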

[0035] In operation 220, edge server 120 generates a confidence value for the client network application based on the analysis of the determined attributes of the plurality of requests and based on the computed session metrics of the session. In one embodiment, edge server 120 generates the confidence value for the request using a machine learning model. In some embodiments, edge server 120 generates the confidence value for the request by applying varying weights to the attributes of the request, to the session metrics of the session, and to the historical request data.
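A weighted combination of this kind can be sketched as below. The feature names, the weights, the bias, and the logistic mapping are all hypothetical; the specification does not disclose specific weights or a specific model. The sketch follows the convention used in this description that a high confidence value indicates legitimate traffic and a low value indicates a malicious client network application.

```python
import math
from typing import Dict

# Hypothetical weights: larger positive values push toward "malicious".
WEIGHTS: Dict[str, float] = {
    "request_rate_ratio": 0.8,   # session request count / zone-wide baseline
    "failed_ratio": 2.0,         # failed requests / all requests
    "malformed_header": 1.5,     # 1.0 if any malformed header was seen
}
BIAS = -2.0  # assumed offset so that ordinary traffic scores as legitimate


def confidence_value(features: Dict[str, float]) -> float:
    """Return a value in (0, 1); higher means more likely legitimate."""
    suspicion = BIAS + sum(
        weight * features.get(name, 0.0) for name, weight in WEIGHTS.items()
    )
    suspicion = max(-50.0, min(50.0, suspicion))  # keep math.exp in range
    return 1.0 / (1.0 + math.exp(suspicion))
```

A trained machine learning model could replace the fixed weights while keeping the same interface of features in and a confidence value out.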

[0036] Edge server 120 also generates the confidence value by retrieving historical request data from a data structure, the historical request data including previous request attributes and previous session metrics associated with previous requests from the client network application. Edge server 120 analyzes the previous request attributes and previous session metrics associated with previous requests to identify patterns between the previous requests and the plurality of requests to generate the confidence value for the client network application.

[0037] In one embodiment, the confidence value indicates whether the request has attributes indicating it originates from a malicious client network application or is legitimate network traffic from a non-malicious client network application, an API, or from a trusted source.

[0038] In operation 225, edge server 120 determines whether the confidence value indicates the client network application is malicious or non-malicious. Edge server 120 can compare the confidence value against a threshold value. The threshold value can be a default value, a system-defined value, or a user-defined value. When edge server 120 determines that the confidence value for the client network application indicates the client network application is not malicious, the flow proceeds to operation 230. When edge server 120 determines that the confidence value for the client network application indicates the client network application is malicious, the flow proceeds to operation 235.
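The threshold comparison of operation 225 can be sketched as follows. The default of 0.5 and the precedence order (user-defined, then system-defined, then default) are assumptions for illustration; the sketch treats a confidence value at or above the threshold as non-malicious.

```python
from typing import Optional

DEFAULT_THRESHOLD = 0.5  # assumed default value


def resolve_threshold(
    user_defined: Optional[float] = None, system_defined: Optional[float] = None
) -> float:
    """Pick the threshold, preferring user-defined over system-defined values."""
    for candidate in (user_defined, system_defined):
        if candidate is not None:
            return candidate
    return DEFAULT_THRESHOLD


def next_operation(confidence: float, threshold: float) -> int:
    """Return 230 (forward to origin) or 235 (mitigate) for the flow of FIG. 2."""
    return 230 if confidence >= threshold else 235
```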

[0039] In operation 230, edge server 120 sends the plurality of requests to the origin servers (e.g., origin servers 130A-N) when edge server 120 determines that the client network application is not malicious (e.g., there is high confidence that the request is not an attacking bot or illegitimate network traffic). In one embodiment, edge server 120 sends the plurality of requests received in operation 205 to the appropriate origin server(s). In one embodiment, edge server 120 stores the requests, including the attributes of the request and the session in a data structure (e.g., the data structure from which edge server 120 previously retrieved historical request data), for use in subsequent analyses of requests received by edge server 120.

[0040] In operation 235, edge server 120 performs one or more mitigation actions on the request in response to edge server 120 determining that the confidence value indicates the client network application is malicious. In one embodiment, edge server 120 blocks the plurality of requests from transmittal to the origin server(s). In some embodiments, edge server 120 modifies a reputation score associated with the client network application. In such embodiments, when the reputation score is below a threshold value, edge server 120 initiates a challenge process in response to requests from the client network application. In some embodiments, edge server 120 creates a "(request label, request confidence %)" tuple as metadata associated with the request. In such embodiments, a filter-based firewall can retrieve the metadata associated with the request and block the request.
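The "(request label, request confidence %)" tuple and its use by a filter-based firewall can be illustrated as follows. The label strings, the metadata key `bot_label`, the 0.5 threshold, and the function names are hypothetical and chosen only for this sketch.

```python
from typing import Dict, Tuple

RequestLabel = Tuple[str, float]  # (request label, request confidence %)


def label_request(confidence: float, threshold: float = 0.5) -> RequestLabel:
    """Build the metadata tuple; low confidence is labeled malicious."""
    label = "malicious" if confidence < threshold else "legitimate"
    return (label, round(confidence * 100.0, 1))


def firewall_allows(metadata: Dict[str, RequestLabel]) -> bool:
    """Filter step: block any request whose metadata carries a malicious label."""
    label, _confidence_pct = metadata.get("bot_label", ("legitimate", 100.0))
    return label != "malicious"
```

In this sketch a request without metadata is allowed through; a deployment could equally choose to treat missing metadata as grounds for a challenge.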

[0041] In one embodiment, when the confidence value for the client network application indicates the client network application is malicious, edge server 120 also stores the request, including the attributes of the request and the session in the data structure from which edge server 120 previously retrieved historical request data. When storing the request information, edge server 120 can also flag the stored request information to indicate that the request was determined to have a low confidence (e.g., indications the request originated from malicious client network application) to prevent subsequent requests having similar attributes as the plurality of requests from being sent to the origin server.

[0042] FIG. 3 is a flow diagram 300 that illustrates exemplary operations for identifying malicious client network applications by a centralized server (e.g., "offline" or outside the request path) based on request attributes and session metrics according to an embodiment. The operations of FIG. 3 will be described with reference to the exemplary embodiment of FIG. 1. However, it should be understood that the operations of FIG. 3 can be performed by embodiments of the invention other than those discussed with reference to FIG. 1, and the embodiments discussed with reference to FIG. 3 can perform operations different than those discussed with reference to FIG. 3. The operations of FIG. 3 are described as being performed by an edge server (e.g., edge server 120) and a central server (e.g., control server 125).

[0043] In operation 305, an edge server (e.g., one edge server from the set of edge servers 120) receives a plurality of requests from a client network application. In one embodiment, each request in the plurality of requests is for an action to be performed on a resource that is hosted at an origin server. For example, edge server 120 receives a plurality of HTTP "GET" requests to access resources hosted by one or more origin servers (e.g., origin servers 130A-N). In one embodiment, each request is received by edge server 120 as a result of a DNS request for the hostname resolving to an IP address of edge server 120 instead of resolving to an IP address of origin server 130A. For example, a requested resource is an HTML page located at, e.g., www.example.com. In one embodiment, edge server 120 receives the request for the resource from client network application 115 (e.g., a browser or other network application) operating on a client device (e.g., client device 110A). In one embodiment, edge server 120 receives the request as part of an HTTP session from client device 110A.

[0044] In operation 310, edge server 120 determines one or more request attributes for each request in the plurality of requests and associates the one or more request attributes with a session that identifies the client network application.

[0045] In operation 315, edge server 120 sends the determined request attributes from the plurality of requests to control server 125. In this manner, control server 125 performs the analysis of the request attributes outside of the request path.

[0046] In embodiments where edge server 120 sends the determined request attributes from the plurality of requests to control server 125 outside of the request path, edge server 120 can send the request to the appropriate origin server prior to, concurrently with, or after sending the determined request attributes from the plurality of requests to control server 125.

[0047] In operation 320, control server 125 receives the request attributes from edge server 120. In some embodiments, control server 125 receives request attributes from multiple other edge server(s), in addition to the request attributes from edge server 120.

[0048] In operation 325, control server 125 computes one or more session metrics of the session. In one embodiment, session metrics of the session can include information regarding duration of the session, a number of requests made in the session, a number of unique requests made, a number of unique requests for a specific content-type, and a number of failed and successful requests. In one embodiment, control server 125 computes one or more session metrics of the session in a like manner as described in operation 215.

[0049] In operation 330, control server 125 generates a confidence value for the client network application based on the analysis of the determined attributes of the plurality of requests and based on the computed session metrics of the session. In one embodiment, control server 125 generates the confidence value for the request in a like manner as described in operation 220.

[0050] In operation 335, control server 125 determines whether the confidence value indicates the client network application is malicious or non-malicious. In one embodiment, control server 125 determines whether the confidence value indicates the client network application is malicious or non-malicious in a like manner as described in operation 225.

[0051] In operation 340, control server 125 generates and sends rules for edge server 120 to take one or more mitigation actions. In some embodiments, control server 125 sends the rules to multiple other edge server(s), in addition to the edge server (e.g., edge server 120) from which control server 125 received the request attributes.

[0052] In response to determining whether the confidence value indicates the client network application is malicious or non-malicious, control server 125 generates and/or modifies rules for handling subsequent requests from the client network application. For example, when the client network application is determined to be malicious, control server 125 can generate or modify rules to block subsequent requests exhibiting similar request attributes. In another example, when the client network application is determined to be non-malicious, control server 125 can generate or modify rules to transmit subsequent requests exhibiting similar request attributes to the appropriate origin server.
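Rule generation and modification at the control server can be sketched as follows. Keying rules by header order, the `Rule` record, the action names, and the 0.5 threshold are assumptions for illustration; the specification leaves the rule format open.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    header_order: Tuple[str, ...]  # request attributes the rule matches on
    action: str                    # "block" or "forward"


def generate_rule(
    header_order: Sequence[str], confidence: float, threshold: float = 0.5
) -> Rule:
    """Create a rule for subsequent requests with similar request attributes."""
    action = "block" if confidence < threshold else "forward"
    return Rule(tuple(header_order), action)


def upsert_rule(
    rules: Dict[Tuple[str, ...], Rule], rule: Rule
) -> Dict[Tuple[str, ...], Rule]:
    """Return a new rule set with the rule added, or modified if already present."""
    updated = dict(rules)
    updated[rule.header_order] = rule
    return updated
```

Returning a new rule set rather than mutating the existing one makes it straightforward to distribute a consistent snapshot to multiple edge servers.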

[0053] In operation 345, edge server 120 receives rules from control server 125 generated based on the request attributes and session metrics and applies the rules to subsequent requests having similar request attributes. In one embodiment, where control server 125 determines that the client network application is malicious, the rules indicate to edge server 120 to perform mitigation actions in response to edge server 120 receiving a subsequent plurality of requests similar to the plurality of requests received in operation 305. In one embodiment, edge server 120 drops subsequent requests received from the client network application or presents a challenge (e.g., a CAPTCHA) in response to subsequent requests from the client network application. In one embodiment, the rules instruct edge server 120 to take mitigation actions against all the traffic for the associated session, e.g., against traffic associated with the IP address of client device 110A. Mitigation action can also be applied based on a client fingerprint pattern (e.g., header order, presence or absence of a specific header or specific header value, etc.).
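Applying received rules at the edge server, matched either on IP address or on a client fingerprint pattern, can be sketched as follows. The `MitigationRule` fields, the action names, and first-match-wins ordering are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MitigationRule:
    action: str                                     # "drop" or "challenge"
    ip_address: Optional[str] = None                # match on source IP
    header_order: Optional[Tuple[str, ...]] = None  # match on exact header order
    absent_header: Optional[str] = None             # match when header is missing


def matches(rule: MitigationRule, ip: str, headers: List[Tuple[str, str]]) -> bool:
    """A rule matches when every condition it sets holds for the request."""
    names = tuple(name.lower() for name, _ in headers)
    if rule.ip_address is not None and rule.ip_address != ip:
        return False
    if rule.header_order is not None and rule.header_order != names:
        return False
    if rule.absent_header is not None and rule.absent_header in names:
        return False
    return True


def apply_rules(rules: List[MitigationRule], ip: str, headers: List[Tuple[str, str]]) -> str:
    """Return the action of the first matching rule, else forward to the origin."""
    for rule in rules:
        if matches(rule, ip, headers):
            return rule.action
    return "forward"
```

Note that in this sketch a rule with no conditions set matches every request, so a rule set would normally require at least one condition per rule.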

[0054] FIG. 4 is a block diagram illustrating a data processing system that can be used in an embodiment. As illustrated in FIG. 4, the computer system 400, which is a form of a data processing system, includes the bus(es) 450, which is coupled with the processing system 420, power supply 425, memory 430, and the nonvolatile memory 440 (e.g., a hard drive, flash memory, Phase-Change Memory (PCM), etc.). In one embodiment, the computer system 400 includes a cache 410. The bus(es) 450 may be connected to each other through various bridges, controllers, and/or adapters as is well known in the art. The processing system 420 may retrieve instruction(s) from the memory 430 and/or the nonvolatile memory 440 and execute the instructions to perform operations described herein. The bus 450 interconnects the above components together and also interconnects those components to the display controller & display device 470, Input/Output devices 480 (e.g., NIC (Network Interface Card), a cursor control (e.g., mouse, touchscreen, touchpad, etc.), a keyboard, etc.), and the optional wireless transceiver(s) 490 (e.g., Bluetooth, Wi-Fi, Infrared, etc.). In one embodiment, the client device, edge servers, control server, and/or origin servers described herein may take the form of the computer system 400.

[0055] The techniques shown in the figures can be implemented using code and data stored and executed on one or more computing devices (e.g., client devices, servers, etc.). Such computing devices store and communicate (internally and/or with other computing devices over a network) code and data using machine-readable media, such as machine-readable storage media (e.g., magnetic disks; optical disks; random access memory; read only memory; flash memory devices; phase-change memory) and machine-readable communication media (e.g., electrical, optical, acoustical or other form of propagated signals--such as carrier waves, infrared signals, digital signals, etc.). In addition, such computing devices typically include a set of one or more processors coupled to one or more other components, such as one or more storage devices, user input/output devices (e.g., a keyboard, a touchscreen, and/or a display), and network connections. The coupling of the set of processors and other components is typically through one or more busses and bridges (also termed as bus controllers). The storage device and signals carrying the network traffic respectively represent one or more machine-readable storage media and machine-readable communication media. Thus, the storage device of a given computing device typically stores code and/or data for execution on the set of one or more processors of that computing device. Of course, one or more parts of an embodiment of the invention may be implemented using different combinations of software, firmware, and/or hardware.

[0056] In the preceding description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.

[0057] References in the specification to "one embodiment," "an embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

[0058] In the preceding description and the claims, the terms "coupled" and "connected," along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. "Coupled" is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other. "Connected" is used to indicate the establishment of communication between two or more elements that are coupled with each other.

[0059] While the flow diagrams in the figures show a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.).

[0060] While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.

* * * * *
