Polymorphic Treatment of Annotated Content Hoover; Roger S. ; et al. [Shape Security, Inc.]

Patent Application Summary

U.S. patent application number 15/805114 was filed with the patent office on 2017-11-06 and published on 2018-05-24 for polymorphic treatment of annotated content. The applicant listed for this patent is Shape Security, Inc. Invention is credited to Justin D. Call, Roger S. Hoover.

Application Number	20180144133 15/805114
Family ID	60189625
Filed Date	2017-11-06

United States Patent Application	20180144133
Kind Code	A1
Hoover; Roger S. ; et al.	May 24, 2018

Polymorphic Treatment of Annotated Content

Abstract

A computer-implemented method includes receiving content and annotation information that describes a structure of the content, the annotation information having been previously generated by a sub-system that is separate from a content transformation sub-system and at a time before the content was requested to be served; interpreting the annotation information to generate transcoding rules that identify one or more portions of the received content to be transcoded in serving the content; applying the transcoding rules to the content to change the content in a manner that interferes with an ability of malware on a client device to interfere with operation of the content; and providing the transcoded content to a client device that requested the content.

Inventors:

Hoover; Roger S.; (Granite Canon, WY) ; Call; Justin D.; (Santa Clara, CA)

Applicant:

Name	City	State	Country	Type
Shape Security, Inc.	Palo Alto	CA	US

Family ID:

60189625

Appl. No.:

15/805114

Filed:

November 6, 2017

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
14713493	May 15, 2015	9813440
15805114

Current U.S. Class:	1/1
Current CPC Class:	H04L 63/1441 20130101; G06F 21/125 20130101; G06F 21/54 20130101; G06F 21/566 20130101; G06F 21/55 20130101; G06F 2221/034 20130101
International Class:	G06F 21/56 20060101 G06F021/56; G06F 9/44 20060101 G06F009/44

Claims

1. A computer-implemented method, comprising: receiving, at a content transformation sub-system, content requested by a client device; accessing annotation information that identifies one or more portions of the content to be transcoded, wherein the annotation information is generated by a content analysis sub-system; transcoding the content, by the content transformation sub-system, at the one or more portions, wherein the transcoding comprises inserting additional executable code, that, when executed on the client device, identifies attempted interaction on the client device with the content by malware and transmits a report from the client device comprising data corresponding to the attempted interaction; and after transforming the content, providing the content to the client device that requested the content; wherein the computer-implemented method is performed by one or more computing devices.

2. The computer-implemented method of claim 1, further comprising interpreting, by the content transformation sub-system, the annotation information to generate transformations that are to be performed on the one or more portions of the content.

3. The computer-implemented method of claim 1, wherein the annotation information defines a plurality of types of transformations that are to be applied to the one or more portions of the content.

4. The computer-implemented method of claim 1, wherein the annotation information is received by the content transformation sub-system with the one or more portions of the content in a common electronic file.

5. The computer-implemented method of claim 1, wherein content received by the content transformation sub-system includes the annotation information, and wherein the content provided to the client device after transforming the content does not include the annotation information.

6. The computer-implemented method of claim 1, wherein the content received by the content transformation sub-system includes information that identifies a location at which the annotation information can be accessed by the content transformation sub-system.

7. The computer-implemented method of claim 1, wherein the additional executable code is generated by an intermediary security server system that intercepts data served from, and requests provided to, a web server system hosting the content.

8. The computer-implemented method of claim 1, wherein the annotation information is generated based on input entered by a programmer of the content, in response to suggestions for annotations automatically provided by a software development environment.

9. The computer-implemented method of claim 1, further comprising applying minification rules to the content to reduce a size of the content before providing the content to the client device.

10. A content transformation system, comprising: one or more hardware processors; and a memory coupled to the one or more hardware processors and storing one or more instructions, which when executed by the one or more hardware processors cause the one or more hardware processors to: receive content requested by a client device; access annotation information that identifies one or more portions of the content to be transcoded, wherein the annotation information is generated by a content analysis sub-system; transcode the content at the one or more portions, wherein the transcoding comprises inserting additional executable code, that, when executed on the client device, identifies attempted interaction on the client device with the content by malware and transmits a report from the client device comprising data corresponding to the attempted interaction; and after transforming the content, provide the content to the client device that requested the content.

11. The content transformation system of claim 10, wherein the annotation information is received with the one or more portions of the content in a common electronic file.

12. The content transformation system of claim 10, wherein the one or more instructions, when executed by the one or more hardware processors, cause the one or more hardware processors to generate transformations that are to be applied to the one or more portions of the content.

13. The content transformation system of claim 10, wherein the annotation information defines a plurality of types of transformations that are to be applied to the one or more portions of the content.

14. The content transformation system of claim 10, wherein the content received includes information that identifies a location at which the annotations can be accessed.

15. The content transformation system of claim 10, wherein the annotation information is generated by a software development environment used to generate the content.

16. The content transformation system of claim 10, wherein the one or more instructions, when executed by the one or more hardware processors, cause the one or more hardware processors to apply minification rules to the content to reduce a size of the content before providing the content to the client device.

17. A computer-implemented system, comprising: one or more hardware processors; and a memory coupled to the one or more hardware processors and storing one or more instructions, which when executed by the one or more hardware processors cause the one or more hardware processors to: generate a programming environment for generating content comprising program code; accepting, in the programming environment, annotation information entered by a programmer using the programming environment to generate the content, wherein the annotation information identifies one or more portions of the program code and one or more transformations to apply to the one or more portions of the program code such that the one or more portions are altered a different way, based on the same annotation information, for one or more different times the content is served; wherein the one or more transformations comprise inserting additional executable code, that, when executed on a client device, identifies attempted interaction on the client device with the program code by malware and transmits a report from the client device comprising data corresponding to the attempted interaction.

18. The computer-implemented system of claim 17, wherein the one or more instructions, when executed by the one or more hardware processors, cause the one or more hardware processors to store the annotation information in a database in association with the content, wherein a content transformation system accesses the annotation information from the database.

19. The computer-implemented system of claim 17, wherein the one or more instructions, when executed by the one or more hardware processors, cause the one or more hardware processors to generate a single electronic file comprising the content and information indicating a location at which the annotation information can be accessed by a content transformation sub-system.

20. The computer-implemented system of claim 17, wherein the one or more instructions, when executed by the one or more hardware processors, cause the one or more hardware processors to generate a single electronic file comprising the content and the annotation information to a web server system.

21. The computer-implemented system of claim 20, wherein a content transformation sub-system receives the single electronic file in response to a client device requesting the content from the web server system and transforms the one or more portions of the content in accordance with the annotation information.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit under 35 U.S.C. 120 as a Continuation of U.S. patent application Ser. No. 14/713,493, filed on 2015 May 15, and will issue as U.S. Pat. No. 9,813,440, on 2017 Nov. 7, the entire contents of which are hereby incorporated by reference for all purposes as if fully set forth herein. The applicant(s) hereby rescind any disclaimer of claim scope in the parent application(s) or the prosecution history thereof and advise the USPTO that the claims in this application may be broader than any claim in the parent application(s).

TECHNICAL FIELD

[0002] This document relates to systems and techniques for interfering with the operation of computer malware, as a mechanism for improving computer system security.

BACKGROUND

[0003] Much of our commerce now occurs in the form of e-commerce, through computer users who access services over the Internet and using the World Wide Web. Because this commerce involves money, it draws unsavory characters to its periphery--in the form of fraudsters. The aim of these people is to intercept or otherwise interfere with the activities of legitimate commerce so as to identify confidential information like account numbers, passwords, user IDs, and the like, as a mechanism toward stealing money from such users or from the organizations that provide services to such users. For example, through a technique known as a "Man in the Browser" attack, malware may be loaded on a client computer and may attempt to intercept information such as account numbers and passwords where a user interacts with a banking site, or passwords and credit card information when the user interacts with an on-line retail store.

[0004] Various approaches have been taken to identify and prevent such malicious activity. For example, some approaches install defensive software on client computers. Alternative approaches run various kinds of analysis tools on the transactions and/or network traffic on a server system to detect improper activity.

SUMMARY

[0005] This document describes systems and techniques by which web code (e.g., HTML, CSS, and JavaScript) is modified before it is served over the Internet by a server system, so as to make more difficult the exploitation of the code and the server system by clients (e.g., various computers such as desktops, laptops, tablets, and smartphones) that receive the code--including clients that are infected by malware without their users' knowledge. In certain implementations discussed below, code served by a web server system can be analyzed and a map, or template, may be generated to permit polymorphic alteration of the code, meaning that the same code is altered in different ways for different times that it is served (either to different people or at different times to a single person). The analysis of the code may be made easier by the provision of annotations that accompany the code and direct how the polymorphic transformations are to be implemented in the code. For example, a header to an HTML or other file may include instructions generated automatically by a programming environment in which the code was created, or manually by a developer of the code, telling a security server system where there are elements in the code that can be changed polymorphically without affecting the functionality of the code. Such annotations may be explicit textual representations, explicit data structure representations, and/or they can be implicitly represented by other structures. For example, when using an application development environment, some annotations could be explicitly attached to ADE data structures while other annotations are implicitly represented by ADE data structures that always imply those implicit annotations.

[0006] The annotations may absolve the transformation server system from having to separately analyze the content before transforming it. In addition, the annotations may enable a more complete, or at least more compliant, transformation of the content, particularly where the annotations are generated by a person or system that initially generated the content. That is because the annotator may be able to change the way the code is created or represented so as to make the code more transformable (e.g., to avoid coding techniques or particular syntaxes that create problems for transformation). Also, the annotator may have more time to perform needed analysis, such as by performing analysis and building a template for transformation as the code is generated. For example, when a programmer defines a new variable in her code, a programming environment may note that action and may keep track of each instance in which the variable name is referenced in the code (as the programmer adds such references). A corresponding transformation map or template may be updated in the background as the programmer adds the references. The template may then be provided in the final code before it is shipped by the developer's system (e.g., in a section to be treated as comments), or other annotations generated from the template may be provided.

[0007] Also, where an automatic system is performing analysis at coding-time, it can also provide feedback to the programmer through the programming environment, such as by suggesting that a certain piece of recently-produced code may create a problem for a transformation system, and perhaps suggest an alternative approach for the programmer.

[0008] Such feedback may also be provided in coordination with feedback that permits a minifier to operate efficiently on the generated code--where the minifier removes content that will not affect the presentation of the code, so as to make the code smaller for serving, and the transformer changes the content when it is served so as to obfuscate the operation of the code from client-side code or individuals that may try to reverse engineer or otherwise analyze the code, or illicitly interfere with operation of the code.

[0009] Programmers may be allowed to opt into such automatic annotation and suggestions on a block-by-block basis. For example, a programmer may author certain software without a concern for security or minification, and may have such an annotation assistance and analysis system turned off when writing such code--e.g., code that controls non-sensitive or non-confidential interaction with a user. The programmer may then turn on the system while writing code for areas for which security is most needed (e.g., financial transaction modules or login modules) by way of directives inline to the source code or alternate software methods.
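By way of illustration only, the block-by-block opt-in described above might be sketched as follows. The directive spellings "// @annotate on" and "// @annotate off" and the function name opted_in_ranges are hypothetical; this document does not define a concrete directive syntax. The sketch merely reports which source lines fall between such directives, so that an analysis system could limit its annotation assistance to those lines.

```python
# Hypothetical directive spellings; the source document defines no concrete syntax.
ON_DIRECTIVE = "// @annotate on"
OFF_DIRECTIVE = "// @annotate off"


def opted_in_ranges(source: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (first_line, last_line) ranges that lie
    between an "on" directive and the next "off" directive (or end of file)."""
    lines = source.splitlines()
    ranges = []
    start = None  # first line of the currently open opted-in block, if any
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if text == ON_DIRECTIVE and start is None:
            start = lineno + 1  # the block begins on the line after the directive
        elif text == OFF_DIRECTIVE and start is not None:
            if start <= lineno - 1:  # skip empty blocks
                ranges.append((start, lineno - 1))
            start = None
    # A block left open runs to the end of the file.
    if start is not None and start <= len(lines):
        ranges.append((start, len(lines)))
    return ranges
```

In this sketch, code outside the reported ranges (e.g., code that controls non-sensitive interaction with a user) would simply be left unanalyzed.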

[0010] In one implementation, a computer-implemented method comprises receiving, at a content transformation sub-system, content to be served to a computer user over the Internet, and annotation information that describes a structure of the content, the annotation information having been previously generated by a sub-system that is separate from the content transformation sub-system and at a time before the content was requested to be served; interpreting the annotation information to generate transcoding rules that identify one or more portions of the received content to be transcoded in serving the content; applying the transcoding rules to the content to change the content in a manner that interferes with an ability of malware on a client device to interfere with operation of the content; and providing the transcoded content to a client device that requested the content.

[0011] In some aspects, the method also comprises interpreting the annotation information to generate transformations that are to be performed on each of the one or more portions of the received content. The annotation information can define a plurality of types of transformations that are to be applied to the received content, and one or more locations in the content at which each of the types of transformations is to be applied. Also, the annotation information can be received in a common electronic file along with the portions of the received content to which the transformation rules are applied. The provided transcoded content, in such a situation, may not include the annotation information, and the version of the content received by the content transformation sub-system does include the annotation information. Moreover, the received content can include a first file having a pointer to a location at which the annotation information can be accessed by the content transformation sub-system.
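As a minimal sketch of the situation in which the received file carries the annotation information but the served content does not, the following assumes that the annotations sit in a comment block delimited by a hypothetical "<!--ANNOTATIONS" opening line and a "-->" closing line. That marker, and the name strip_annotations, are illustrative assumptions and are not defined by this document.

```python
import re

# Hypothetical inline annotation block: an opening marker line, the annotation
# text, and a closing "-->" line. The source document defines no such syntax.
ANNOTATION_BLOCK = re.compile(r"<!--ANNOTATIONS\n.*?\n-->\n?", re.DOTALL)


def strip_annotations(document: str) -> str:
    """Return the document with any inline annotation block removed, so that
    the annotation information is not sent to the requesting client."""
    return ANNOTATION_BLOCK.sub("", document)
```

A transformation sub-system along these lines would read the annotations first, apply them, and only then strip the block before serving.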

[0012] In yet other aspects, the transcoded web content can be served to the requesting client device with additional, executable code that is to be run on the requesting client device and that is arranged to identify attempted interaction on the requesting client device with the transcoded content and to report some or all of the identified attempted interaction to a central server system. Also, the additional, executable code can be generated by an intermediary security server system located between a web server system and the Internet, and that intercepts data served from, and requests provided to, the web server system. In addition, the annotation information can have been entered by a programmer of the content while programming the content, in response to suggestions for annotations automatically provided by a software development environment. The method can also comprise applying minification rules to the content to reduce a size of the content before providing the transcoded content to the client device that requested the content.

[0013] In another implementation, a computer-implemented system is disclosed that comprises one or more devices including computer-readable media storing electronic program code and associated annotations that identify locations in the program code at which transcoding is to occur when the program code is served; a content transformation sub-system executed by one or more computer servers and arranged to obtain the program code and associated annotations, and to interpret the associated annotations so as to generate transcoding rules that identify one or more portions of the program code to be transcoded in serving the program code; and a web interface arranged to provide the transcoded program code to client devices that request content. The system can also include a software development environment executed with one or more computer systems to generate a programming environment that accepts annotations to program code made by a programmer of the program code, the annotations identifying changes to be made to the program code after the program code is requested but before the program code is served, and saves the code and annotations to the one or more devices.

[0014] In some aspects, the content transformation sub-system is further programmed to generate transformations that are to be performed on each of the one or more portions of the program code. Also, the annotation information can define a plurality of types of security countermeasures to be applied to the program code, and one or more locations in the program code at which each of the types of security countermeasures is to be applied. In addition, the received program code can include a first file having a pointer to a location at which the annotations can be accessed by the content transformation sub-system.

[0015] In yet other aspects, the transcoded program code is served to the requesting client devices with additional, executable code that is to be run on the requesting client devices and that is arranged to identify attempted interaction on the requesting client devices with the transcoded program code and to report to a central server system some or all of the identified attempted interaction. Also, the annotations can be automatically generated by an analysis sub-system that attends to the software development environment. Moreover, the software development environment can be programmed to apply minification rules to the program code to reduce a size of the program code before providing the transcoded program code to the client device that requested the program code.

[0016] In yet another implementation, a computer-implemented system is disclosed that comprises a software development environment executed with one or more computer systems to generate a programming environment that accepts annotations to program code made by a programmer of the program code, the annotations identifying changes to be made to the program code after the program code is requested but before the program code is served; means for transcoding the program code using the annotations; and a web interface arranged to provide the transcoded program code to client devices that request content. The received program code can include a first file having a pointer to a location at which the annotations can be accessed by the content transformation sub-system. Also the transcoded program code can be served to the requesting client devices with additional, executable code that is to be run on the requesting client devices and that is arranged to identify attempted interaction on the requesting client devices with the transcoded program code and to report to a central server system some or all of the identified attempted interaction. Likewise, the annotations can be automatically generated by an analysis sub-system that attends to the software development environment, and the software development environment can be programmed to apply minification rules to the program code to reduce a size of the program code before providing the transcoded program code to the client device that requested the program code.

[0017] The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

[0018] FIG. 1A shows graphically the use of annotations to guide content transcoding.

[0019] FIG. 1B is a conceptual diagram of a system for analyzing and transcoding web code using pre-annotation of the code.

[0020] FIGS. 2A-2C show examples of system architectures for passing web code and annotation information for polymorphic transcoding of served web content.

[0021] FIG. 3A is a flow chart of a process for generating code for minification and transcoding.

[0022] FIG. 3B is a flow chart of a process for handling web code for analysis and polymorphic transcoding.

[0023] FIG. 4 is a swim lane diagram of a process for adding annotations to web content to be served polymorphically.

[0024] FIG. 5 shows an example of components for assisting in minification and transcoding of developed web content.

[0025] FIG. 6 shows a system for serving polymorphic and instrumented code.

[0026] FIG. 7 shows an example computer system.

[0027] Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

[0028] Described below are systems and techniques for deflecting and detecting malware activity on client devices to which a server system serves computer code. The examples discussed below may also be used in other settings to insert security countermeasures into a content serving system. The particular examples discussed here involve, among other things, performing analysis on content before the content is called to be served. Such analysis can allow the content to be compressed through minification, and also permit the insertion of security countermeasures into the content.

[0029] The analysis may occur at different times before the serving of the content, including when the content is initially developed, when the content (e.g., as programming code) is compiled or published, and after the content is compiled or published. For example, development-time analysis may occur as a programmer is adding code to a code base, with an analysis sub-system forming an updated map that can be used for traversing the content as part of minification and security efforts later, and also identifying coding actions by the programmer that create problems for minification or security, but that could be changed by the programmer in real-time and thus eliminate some of those problems. The analysis may also be performed by a dedicated analysis system that then transfers to a security service and system the content and associated annotations (which may include instructions that provide a map to parts of the content that need to be transcoded when the content is ultimately served).

[0030] The annotations generated by the analysis may be attached to the underlying content in a variety of ways so that a transcoding, or transformation, system that subsequently receives the content can use the annotations to transcode the content and/or to implement other countermeasures that are specific to the content. For example, the annotations may be inserted by the analysis system into a file that contains the content or part of the content (e.g., an HTML file). Alternatively, the file that contains the content can include a flag or pointer to a file that contains the corresponding annotations. As yet another alternative, the annotations may be stored in a known location in a predetermined manner so that the transformation system can readily find the annotations that correspond to a particular piece or pieces of content, for example, by looking for an annotation file in an annotations directory that matches the name of a code file in a code directory.
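The three attachment alternatives described above can be sketched, under stated assumptions, as a single lookup routine. The inline marker "<!--ANNOTATIONS", the pointer marker "<!--ANNOTATIONS-AT:", the ".ann" file suffix, and the function name locate_annotations are all hypothetical; this document does not prescribe any of them.

```python
import re
from pathlib import PurePosixPath

# Hypothetical markers; the source document defines no concrete syntax.
INLINE_RE = re.compile(r"<!--ANNOTATIONS\n(.*?)\n-->", re.DOTALL)
POINTER_RE = re.compile(r"<!--ANNOTATIONS-AT:\s*(\S+)\s*-->")


def locate_annotations(content: str, code_path: str,
                       annotations_dir: str = "annotations") -> tuple[str, str]:
    """Decide where the annotations for a piece of content live.

    Returns (kind, value), where kind is one of:
      "inline"     - value is the annotation text embedded in the content
      "pointer"    - value is the location named by a pointer in the content
      "convention" - value is a path derived from the code file's name
    """
    match = INLINE_RE.search(content)
    if match:
        return ("inline", match.group(1))
    match = POINTER_RE.search(content)
    if match:
        return ("pointer", match.group(1))
    # Fall back to a known location: a file in the annotations directory
    # whose name matches the code file's name (".ann" suffix is assumed).
    name = PurePosixPath(code_path).name
    return ("convention", str(PurePosixPath(annotations_dir) / (name + ".ann")))
```

The order of the checks (inline first, then pointer, then convention) is a design choice of this sketch, not something stated in the description.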

[0031] FIG. 1A shows graphically the use of annotations to guide content transcoding. In general, the figure shows a highly simplified representation of an electronic document 102 that may be produced by a Web server system for serving to one or more clients that request the electronic document 102. The electronic document 102 may include, for example, HTML for producing a web page when the content of the electronic document 102 is rendered by a web browser at a client device. The electronic document 102 may be a single electronic document or may be split into multiple documents, such as HTML code in one document, one or more cascading style sheet (CSS) documents that are associated with the HTML document or documents, and one or more JavaScript files that are also associated with the HTML document or documents (e.g., are pointed to in a line of the HTML document). Additional types of content may also be served, though the techniques described here are focused on modifications that may be made to executable web code, so as to prevent easy operation for malware that might be installed on clients that receive and execute the web code.

[0032] Looking more specifically now at electronic document 102, the document 102 includes annotations 106 and code 108. The code 108 may take a variety of forms, such as HTML or JavaScript code. For example, the code 108 may include commands that cause a login screen to be rendered on a browser of a device, such as for content served by an Internet retailer or banker. Such an organization may wish to provide security for a login screen, so that malicious parties may not readily obtain user passwords and other credentials, such as by the use of a "man in the browser" mode of attack on a client device.

[0033] The annotations may have been added to the top of the file that holds the source code 108 by a Web server system at the time of serving, or may have been provided with the code by another system, such as an application development environment (ADE) used by a programmer of the web code. The annotations 106 provide commands for a security system to perform actions on the code 108 in a manner that provides greater security for the code 108, and that is customized for the content of the code 108. The annotations 106 may have been created while the code 108 was being created by a programmer, or at a time after the programming but before the code was requested by a client, and may define manners in which the code can be transformed when it is served so as to obfuscate its operation from malware on a client device.

[0034] In this example, the annotations 106 define two transformations to be performed on the code 108 when it is served. Each transformation is indicated in the annotations 106 by the syntax "TXn." Each annotation is followed by data that defines the transformation that is to be performed in the code 108. For example, the transformation TX1 defines a transformation in the form that the string ABC, when it is encountered by a transformation system, is transformed into a particular random string of characters. For example, an analysis system operating at the time that the code 108 was programmed, may have recognized that the programmer used the string ABC as a function or variable name throughout the code, and the analysis system may have been programmed to then recognize that changing the name of the function or variable would not affect the operation of the code on a client device, as long as such changing was performed consistently across all of the code 108. As a result, such a system, in this example, inserted a few lines of instructions in the annotations 106 indicating that such string should be changed before the code is served. As a result, and if the change is made differently each time the code is served, in a polymorphic manner, malware may be blocked from readily interpreting and locating the function or variable name when it receives the served code, because the name will be different for each serving of the code. Specifically, rather than serving the meaningful ABC name consistently, the transcoded code will instead serve a different string each time malware makes a request, and the malware will then have to figure out how to interact with such a moving target (a much harder proposition).

[0035] The annotations 106 further identify each location in the code 108 at which transformation TX1 is to occur. Such mapping to particular locations may occur in a variety of manners, and here, is represented in a simplified manner by defining the character position within the plaintext code file at which the string ABC was found to occur when the analysis was performed on the code.
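For illustration, annotations of the general kind shown in FIG. 1A could be parsed into transformation records as sketched below. The line format "TXn TARGET -> RANDOM @ pos, pos, ..." is an assumed textual encoding, the character positions given for ABC in the usage example are made up, and the names Transformation and parse_annotations are hypothetical; only the "TXn" labels, the strings ABC and XYZ, and position 2221 come from the description.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    label: str               # e.g. "TX1"
    target: str              # string to be replaced, e.g. "ABC"
    positions: tuple         # character positions at which the target occurs


# Assumed line format: "TX1 ABC -> RANDOM @ 12, 140, 873"
LINE_RE = re.compile(r"^(TX\d+)\s+(\S+)\s*->\s*RANDOM\s*@\s*([\d,\s]+)$")


def parse_annotations(text: str) -> list[Transformation]:
    """Parse one transformation per non-empty line of annotation text."""
    result = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized annotation line: {line!r}")
        positions = tuple(int(p) for p in match.group(3).split(",") if p.strip())
        result.append(Transformation(match.group(1), match.group(2), positions))
    return result
```

For example, parse_annotations("TX1 ABC -> RANDOM @ 12, 140, 873\nTX2 XYZ -> RANDOM @ 2221") yields one record for each of the two transformations.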

[0036] Referring now to a second defined transformation in the annotations 106, transformation TX2 also defines that a particular string of characters is to be transformed into a random string when it is identified in the text. Again, a programming environment or other analysis system may have identified the string XYZ as a string that could be changed consistently throughout the code without affecting the manner in which the code would execute and be seen by a user of a browser that is served and that renders the code 108. With respect to transformation TX2, the analysis system determined that that string appeared at only one location in the code 108, at character position 2221.

[0037] In this simplified representation of the code 108, one can see the appearance of the identified strings throughout the code. For example, the string ABC in the form of element 112A, appears at three locations, as suggested by the three numeric location identifiers in the annotations 106. Similarly, the string XYZ, as shown by item 114A, appears at one location in the code 108. In this manner, then, the simplified representation indicates one way in which annotations may be appended to the top of a file of code to provide a transformation template or mapping for the code.

[0038] The transformation mapping in the form of annotations 106 may then be implemented at the time the code is served, such as by a security intermediary that is provided the code by a web server system. The security intermediary may be programmed to store a copy of the code 108 in its memory, and then to transform that copy of the code 108 using the annotations 106 as a template or map. For example, the transformation system may step through the lines of the annotations 106 as if it were executing them as code and may implement each of the transformations. Such transformed version of the code is shown at item 110. The transformed code 110 is in the form of an electronic document 104 that will actually be served by an intermediary security system to the requesting client, and includes substitutions as defined in the annotations 106. For example, the string ABC has been transformed in each instance to the random string of characters !X3. Similarly, the string XYZ has been transformed to the randomly selected string of characters ?#+. In actual implementation, a variety of additional and more-complicated transformations may be applied to the code 108, and may be applied across multiple different files of content, including different types of files, such as HTML files, CSS files, and JavaScript files. Also, for the shown two transformations, a different random string will be selected for each serving of the code, so that the modified code is served polymorphically.
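
The substitution step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary layout of the annotations, the names `apply_annotations` and `random_name`, and the sample code string are all assumptions made for the example. Unlike the `!X3` and `?#+` strings in the figure, the sketch generates identifier-safe replacements so that the rewritten code remains syntactically valid.

```python
import secrets
import string

# Hypothetical sample code and annotations; the offsets are character
# positions in SAMPLE_CODE, as in the simplified figure.
SAMPLE_CODE = "var ABC = 1; XYZ(ABC); f(ABC);"
ANNOTATIONS = [
    {"id": "TX1", "target": "ABC", "offsets": [4, 17, 25]},
    {"id": "TX2", "target": "XYZ", "offsets": [13]},
]

def random_name(length=8):
    """Return a random identifier: one letter followed by letters or digits."""
    first = secrets.choice(string.ascii_letters)
    pool = string.ascii_letters + string.digits
    return first + "".join(secrets.choice(pool) for _ in range(length - 1))

def apply_annotations(code, annotations, name_factory=random_name):
    """Replace each annotated occurrence, using one fresh name per transformation."""
    edits = []
    for tx in annotations:
        replacement = name_factory()  # same replacement for every offset of this TX
        for offset in tx["offsets"]:
            end = offset + len(tx["target"])
            if code[offset:end] != tx["target"]:
                raise ValueError(f"{tx['id']}: expected {tx['target']!r} at offset {offset}")
            edits.append((offset, end, replacement))
    # Work backwards through the file so earlier offsets remain valid.
    for start, end, replacement in sorted(edits, reverse=True):
        code = code[:start] + replacement + code[end:]
    return code
```

Calling `apply_annotations(SAMPLE_CODE, ANNOTATIONS)` twice would be expected to yield two differently named but functionally equivalent versions of the code, which is the polymorphic behavior the paragraph describes.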

[0039] The annotations may be connected to the code 108 in a variety of manners. In the example shown here, the annotations 106 are part of a file that contains the code 108 itself. The annotations 106 may be included as a header to that file, and may be hidden by marking them as comments or remarks, so that if the file is executed with the annotations 106 in it, the annotations 106 will not interfere with the execution of the code 108. However, in real implementation, a security intermediary will typically remove the annotations 106 before it serves the transformed code 110. The annotations 106 may also be provided at other locations in the code 108 such as at the end of the code 108 or interspersed throughout the code. For example, to hide the annotations 106 and the nature of operations performed by the annotations 106 from anyone who might intercept the code 108 with the annotations 106, the annotations 106 may be split up throughout the code 108 and interleaved with lines of annotations 106 between lines in the code 108. The annotations may also be provided in an encrypted form, so that, if a security system is non-functional or does not transform the code 108, and the code is served to a client in its original form, malware on the client will not be able to readily identify the sort of countermeasures that are otherwise being applied to that code.

[0040] In other implementations, the annotations 106 may be in a separate file from the code 108. For example, a flag may be inserted into the code 108 that points to a separate file that contains the annotations 106, such as by an application development environment (ADE) adding a URL to the code that points to a file that contains the corresponding annotations 106 for the code. When the code 108 is received by a security intermediary or other sort of security system, the flag or pointer may be accessed, and the security system may obtain the annotations 106 to use as a template in transforming the code 108. In yet other embodiments, the security system may obtain access to the annotations 106 that correspond to a particular piece or group of code in other ways, such as by having annotations stored in a predefined location and having the security system access the annotation information in various manners that link the relevant code to the annotations for that code.

[0041] The annotations may also identify a number of other operations that a security system should or can perform on the code 108. The annotations may be generated pursuant to an application programming interface (API) produced by a security services company that provides software and/or hardware for transforming the code, and such API may be used by programmers and by developers of ADEs in assisting with the writing of such annotations. In other examples, the annotations can indicate directives that blocks of code should be parsed with different settings than other blocks of code, including by setting a level of aggressiveness to be applied, and indicating particular transformations and/or type checking for transformations.

[0042] FIG. 1B is a conceptual diagram of a system 120 for analyzing and transcoding web code using pre-annotation of the code (and of other content that may be served). In general, the system 120 illustrates operations that may be performed in serving content through a network 134 such as the Internet, so as to transcode or transform the content to obfuscate its operation from malware.

[0043] Referring more specifically to the system 120, a transcoder 128 sits (physically and/or logically) between a Web server system 122 and the network 134. The Web server system 122 serves content in response to requests from devices such as client device 116, in an ordinary manner for the serving of web code. For example, the Web server system 122 may obtain a request from client device 116 to deliver a web page for display to a user, where the web page provides for the entry of credit card and other financial information for the user. Such a page may be something that the operator of the Web server system 122 wants to prevent being interfered with by malware, so that the operator of Web server system 122 may use the transcoder 128 as part of an intermediary system that intercepts code that is served over network 134, and to transcode the code in various different ways each time the code is served.

[0044] As shown in the figure, web code 126 is served from the Web server system 122 and is accompanied by analysis annotations 124. The transcoder 128 may separate the annotations 124 from the web code 126 and use the annotations 124 in combination with transformation rules 130 in order to transcode the web code 126. For example, the annotations 124 may identify locations in the code that should be altered in different manners each time the code 126 is served. The rules 130 may define how the annotations are to be applied generally, whereas the annotations 124 may be specific to the particular instance of web code 126, such as the code for a particular version of a particular webpage.

[0045] The web code 126 and corresponding annotations 124 may be generated and associated with each other in various manners. In this example, an annotation terminal 125 is shown and the annotations 124 were generated by software operating in association with the terminal 125 while the web code 126 was being coded by a programmer. For example, an application development environment (ADE) may provide tools for developing code in a familiar manner, and may be supplemented with tools that analyze the code as it is being written to ensure that the code is amenable to security treatment in the manners performed by transcoder 128. For example, the terminal 125 may be associated with software programmed with rules to identify coding styles or approaches that are amenable to security treatment. It may also be programmed to recognize or determine that a particular style used by a programmer is equivalent to one of the preferred styles, but is not amenable to security treatment. As a result, such software may monitor the coding progress of a programmer as code is added to a system, and may provide pop-up boxes or other user interface mechanisms by which to inform the programmer that the approach he or she is taking could be improved with minor changes to the code. The software may suggest such changes and allow the programmer to select an icon or other input mechanism in order to have the alternative code inserted in the place of the code that the programmer wrote, in order to have the changes automatically applied to the code that the programmer is providing. The software may also show the programmer before-and-after representations of the code as it would be executed so as to better inform the programmer of the effect, if any, that the suggested change might have on the execution of the code when it is ultimately served to a user.

[0046] The annotation terminal 125 may also interact with a system for minifying the content. As shown here, code that is generated by a programmer at the terminal 125 may be supplied to a minifier 127 before it is made available to the Web server system 122. The minifier 127 may act to reduce redundancies or other items in the code that are not needed by clients that are served the code, and that can thus be removed from the code to reduce its size and the overhead of serving the code, without hurting how the code executes on client devices. Just as the terminal 125 and associated ADE software may provide guidance to a programmer with respect to improving his or her programming for security purposes, the terminal 125 and associated software may provide feedback to a programmer for minification purposes. For example, if a programmer adopts a particular programming technique that is not amenable to minification, the software may determine that such non-amenable code has been written, and in response, may provide to the programmer a suggestion for a different way to perform the coding. The suggestion may also be accompanied by example alternative code that is amenable to minification, and the programmer may select a displayed control (e.g., an icon) to have the alternative code inserted in place of the code he or she typed.

[0047] Therefore, the web code 126 may be previously minified, and annotations 124 may be applied to such minified code by the transcoder 128 using rules 130. The output of the transcoder 128, then, is web code 132, which the transcoder 128 may cause to be served to client device 116 through network 134, in response to the request that the client device 116 made to the Web server system 122. For example, as indicated in FIG. 1B, certain names, such as function names and variable names, may be modified as between the initial web code 126 and the web code' 132. Also, the transcoder 128 may serve the same web code 126 many different times, but the web code' 132 may be different for each of those servings. Such changes are generally referred to herein as polymorphic transformations, because they cause a change that differs in many various manners for different servings of the code. Polymorphic transformations may be beneficial in that they may create a moving target for malware at client 116 that is trying to automatically interoperate with the served code 132, or with malicious individuals who are trying to analyze the operation of the served code.

[0048] In this example, the annotations specify both translations to be performed and locations in the content (e.g., positions of related code elements) at which those translations are to be performed. In other implementations, additional or less information may be specified by the annotations. Where less information is specified, or in various situations, a transcoding server system may determine other information needed to fully implement the transformations. For example, the annotations may identify positions for transformations (and potentially group the locations as involving a particular type of element, and thus probably in need of a common transform throughout the content), but not specify the transforms themselves, such that the transcoding server system is responsible for making such a determination of what transform or transforms to apply. Also, while the example here discusses offsets for identifying locations or positions, such information may be identified lexically or syntactically (e.g., XPATH for HTML or similar mechanisms for JavaScript, CSS, and other content).

[0049] FIGS. 2A-2C show examples of system architectures for passing web code and annotation information for minification and polymorphic transcoding of served web content. In general, the different architectures that are displayed provide examples of different manners in which content may be shared between a Web server system 202 and a security server system 204, and manners in which the security server system 204 may obtain access to annotations to be used in polymorphically transcoding and minifying content from the Web server system 202.

[0050] Referring now to a first example in FIG. 2A, a Web server system 202 serves both content and annotations to a security server system 204, which then serves transcoded versions of the content through a network 206, such as the Internet, to one or more users 208. Such an example may be similar to that shown in FIGS. 1A and 1B, where the content is the code that is served by the Web server system 202, and the annotations are appended to such code in a single file with the code. Alternatively, the content and annotations may be provided in separate files but as part of a single communication transaction, so that the security server system 204 may understand readily that the particular content and particular annotations are to be associated with each other because they were received together, and the annotations are to be used in transcoding the content.

[0051] A second example, shown in FIG. 2B, separates the annotations from the content. In this example, then, the Web server 202 provides the content to the security server system 204. Such content may include a URL or other mechanism to inform security server system 204 to obtain annotations related to the content and also perhaps to identify where such annotations are located. In this example, an annotation database 210 may store a variety of annotation rules or templates for multiple different pieces of content. For example, the annotation database 210 may have been populated by an ADE system like that described in FIG. 1A. Therefore, the security server system 204 may first receive the content, may parse the content to identify a reference to certain annotations, may follow the reference to the annotation database to obtain the particular annotations that are relevant to the served content, and may then apply the annotations to the content to produce transformed content that is served through network 206 to users 208.
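
The lookup described above (parse the content for a reference, then follow it to the annotation database) might be sketched as below. The `@annotations` marker, the function name `resolve_annotations`, and the injected `fetch` callable are hypothetical; the source does not specify how the reference is encoded or how the database is queried.

```python
import re

# Hypothetical marker that an ADE might embed in the content to point
# at the stored annotations; the actual syntax is not given in the source.
REF = re.compile(r"@annotations\s+(\S+)")

def resolve_annotations(content, fetch):
    """Find the annotation reference in the content and fetch what it points to.

    `fetch` stands in for whatever database or HTTP client the security
    server system uses; it receives the reference and returns the annotations.
    """
    match = REF.search(content)
    if match is None:
        return None  # no reference found, so nothing to apply
    return fetch(match.group(1))
```

Passing the lookup as a callable keeps the sketch independent of any particular storage system, so a dictionary's `get` method is enough to exercise it.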

[0052] The annotations may also, in some implementations, require some translation or analysis in addition to simply being applied to the content. For example, the annotations may specify types of transformation to be applied to the content and locations in the content where the transformations are to be made, but may not specify the steps needed to make such changes. The security server system in such an example may access the annotations, use the identifiers in the annotations to identify the types of transformations to be made, and may consult a different data source to identify steps to be taken to make the transformations.
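
As a hedged illustration of consulting a separate data source for the steps, the sketch below keeps a small registry that maps a transformation type to a function. The type names `rename` and `remove`, the registry `TRANSFORM_STEPS`, and the function `steps_for` are assumptions introduced for the example only.

```python
import secrets

def rename(original):
    """Replace the original text with a short random token."""
    return "_" + secrets.token_hex(4)

def remove(original):
    """Drop the original text entirely."""
    return ""

# Hypothetical registry standing in for the "different data source":
# the annotations carry only the type name, and the steps live here.
TRANSFORM_STEPS = {"rename": rename, "remove": remove}

def steps_for(kind):
    """Look up the function that implements a transformation type."""
    try:
        return TRANSFORM_STEPS[kind]
    except KeyError:
        raise ValueError(f"no steps registered for transformation type {kind!r}") from None
```

Separating the type identifiers from the steps in this way would let the steps be updated on the security server system without regenerating the annotations.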

[0053] In a third example, indicated in FIG. 2C, the position of the Web server system 202 and the security server system 204 has been reversed from the prior examples, with the Web server system 202 serving content directly to the network 206 and users 208 instead of serving the content through the security server system 204. To apply security transformations discussed here, the Web server system 202 may initially provide content and annotations to the security server system 204. Alternatively, the security server system 204 may be provided with the content and may separately acquire the annotations such as in the example of FIG. 2B. The security server system 204 may then perform transformation operations on the content using the annotations, and may return to the Web server system 202 the polymorphic content that had security countermeasures applied to it. The Web server system 202 may then serve such polymorphic content through the network 206 to the users 208. This example, then, allows the Web server system 202 to be more involved in the ultimate serving of content and to maintain more control over what is served and how it is served.

[0054] As described in further detail below, additional security countermeasures may also be applied to any of these particular example implementations, such as the addition of supplemental instrumentation code to the transformed code so as to monitor how the code interacts with other resources on clients operated by users 208.

[0055] FIG. 3A is a flow chart of a process for generating code for minification and transcoding. In general, the process centers around analysis of web code, where the analysis may be performed at various times, such as during a time when the web code is being written by a programmer, and around supplemental content that is created for assisting in the transformation of the web code that is to be served by a Web server system. Such supplemental content may include templates or maps that indicate types of changes and locations of changes within the code that are to be made to the code when it is served for purposes such as security countermeasures that interfere with the operation of malware on client devices to which the content is served.

[0056] The process begins at box 302, where a code base is opened by a programmer and an ADE for the code base is used by the programmer. Such programming may occur in various familiar manners, such as by the programmer accessing files managed by a revision control system, and typing lines of code to perform a desired operation. In certain instances, analyzers may assist the programmer in generating the code. For example, where a programmer is working on a particular page that requires security or on a project that requires security, an analyzer for security transformations may be operating in the background while the programmer works.

[0057] At box 304, edits made by the programmer are checked against transformation and minifier rules. For example, if a programmer defines a function or other relevant named object within the code, an analyzer may recognize the presence of a name for that object within a particular syntax for the programming code, and may determine whether that name is something that may be polymorphically transcoded without affecting the operation of the code. In a similar manner, the same or a different module that is part of the ADE may determine when code is provided that could be changed, so that the resulting code could be more effectively minified. Such module may recommend changes to the programmer in a similar manner.

[0058] At box 306, as noted, feedback and suggested edits are provided to the programmer. In certain instances, changes may be made automatically where such changes clearly will not affect the presentation of the code, or affect the particular programmer's treatment of the code at a later time.

[0059] At box 308, the coding session is closed. For example, the programmer may indicate that the code is ready to ship, such as by compiling the code or placing it at a location for code that should be served by a computer server system to customers of the organization that employs the programmer. Such steps may cause the code to be transferred to a system separate from the ADE system. At this stage, further analysis may be applied to the code, such as analysis to identify additional elements in the code that can be added to a template or map that guides application of security countermeasures to the code. In certain implementations, no suggestions are made during the programming, and the analysis may simply occur after the programmer has indicated that he or she is done programming and has released the code.

[0060] At box 310, a minifier is run against the code. The minifier may take various familiar forms, and may perform both HTML minifying and JavaScript minifying, among others. The analysis for transcoding and the analysis for minifying may be supervised or unsupervised. Unsupervised analysis may involve computer operation without a human confirming that the changes defined by the computer analysis will not break the code. Supervised analysis may involve a user interacting with the code that has been subject to changes that are defined by the automatic analysis, such as minification changes and security countermeasure changes. In such a manner, a supervising individual may cause certain of the changes that the automated system suggests to be canceled or reversed, if such changes are determined by that person to affect the manner of presentation for the content.

[0061] At box 312, annotations are generated that map the transformations that the analysis process has determined to be useful in the code. For example, the analysis may look for function names or other such names as described above, and may identify where a particular instance of a function name occurs throughout the code base and map to locations where the name occurs. The generation of annotations may occur in a form similar to that shown in the annotations 106 in FIG. 1A, where a process generates data that can be used by a transformation subsystem at a later time when the code is requested for serving. One benefit of performing the analysis before the code is requested is that analysis may be computationally expensive as compared to simply applying the template or map that is generated from the analysis. As a result, it may be beneficial to have to perform only the computationally inexpensive portion of the process when the code is requested, so as to reduce any time lag in the serving of the code.
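
A minimal sketch of the annotation-generation step follows, assuming a regular-expression scan stands in for the analysis. The function name `build_annotation` and the dictionary layout are hypothetical, and a real analyzer would likely use a language-aware parser so that occurrences inside string literals or comments are handled correctly.

```python
import re

def build_annotation(tx_id, name, code):
    """Record every whole-word occurrence of `name` in `code` as a character offset."""
    # \b keeps the scan from matching `name` inside a longer identifier.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    offsets = [match.start() for match in pattern.finditer(code)]
    return {"id": tx_id, "target": name, "offsets": offsets}
```

Because the scan runs once, ahead of any request, its cost is paid at analysis time, and the serving path only has to apply the recorded offsets.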

[0062] At box 314, the code is provided to a transformation subsystem. Such provision of the code may occur in various manners. For example, a copy of the code may be provided to the transformation subsystem, such as a security intermediary device, by the Web server system for every request for the code. The transformation subsystem may then obtain the corresponding annotations for that particular example of code, such as by the annotations being included in a file of the code, or being pointed to by a flag in the code such as a URL. As indicated in FIG. 1B, then, a transcoder 128 may combine such information with rules 130 in order to generate a transcoded version of the code. In other examples, the transformation system may cache copies of the code so that it need not ask the Web server system for new copies when repeated requests are made by the same or different users for the same code. In such a situation, the transformation system may also monitor for changes to the code by the programmer, so that the cache may be updated when meaningful changes occur, and additionally, analysis may be performed again on the updated code and new annotations may be generated.
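
The caching behavior described above might be sketched as follows, under the assumption that a content hash is used to detect changes to the code. The class name `AnnotationCache` and the injected `analyze` callable are hypothetical, and a real system may define "meaningful changes" more selectively than a byte-for-byte hash comparison.

```python
import hashlib

class AnnotationCache:
    """Cache annotations per URL and re-run analysis only when the code changes."""

    def __init__(self, analyze):
        self._analyze = analyze   # callable: code text -> annotations
        self._entries = {}        # url -> (sha256 hex digest, annotations)

    def get(self, url, code):
        digest = hashlib.sha256(code.encode("utf-8")).hexdigest()
        entry = self._entries.get(url)
        if entry is None or entry[0] != digest:
            # New or changed code: analyze again and replace the cached entry.
            entry = (digest, self._analyze(code))
            self._entries[url] = entry
        return entry[1]
```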

[0063] In this manner then, the process permits for analysis of code both for security countermeasures and minification at a time that is before a request for the code is made. The results of the analysis may be used by a transformation system after the request for the code is made, and the results may be obtained by the transformation system by analyzing a connection from the code to the annotations, such as by the annotations being included in the same file with the code (e.g., in comments embedded in the code or in a header for the code) or by the code including a pointer toward the annotations that are to be used.

[0064] FIG. 3B is a flow chart of a process for handling web code for analysis and polymorphic transcoding. In general, the process involves transformation of requested code by using annotations that were generated before the request was received, and that define transformations that should be performed on the code, without a need for analysis of the original code at the time of the request for the code.

[0065] The process begins at box 320, where web code with annotations is received, such as from a cache or a Web server system. The web code may be included in an electronic file that also includes the annotations, or may be obtained by a system that has parsed and analyzed the code to find a reference to a file that includes the annotations. As described above, the annotations may define particular changes that need to be made to the code when it is served, so as to provide for polymorphic serving of the code. Such defined changes may be in the form of a template or map that defines the type of each change and the location of each change in the code that is to be transformed. The use of such a well-defined map or template may alleviate the need to perform additional analysis or at least complete additional analysis at the time of serving of the content to requesting clients.

[0066] At box 322, the annotation information is identified and extracted. For example, a transformation system may be programmed to look for a particular reserved tag that is used by a programming system to identify annotation commands or definitions. The system may search for such tags and extract all text that corresponds to the tags and copy that text into storage that may then be parsed in applying the transformations defined by the text.
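
The extraction step can be illustrated with the sketch below, which assumes the annotations are carried in comment lines marked with a reserved tag. The tag `@TX`, the comment style, and the function name `split_annotations` are hypothetical, since the source does not define the tag syntax.

```python
import re

# Hypothetical reserved tag: a line comment that begins with "@TX".
TAG = re.compile(r"^\s*//\s*@TX\b(.*)$")

def split_annotations(source):
    """Return (annotation_lines, code_without_annotation_lines)."""
    annotations, code_lines = [], []
    for line in source.splitlines():
        match = TAG.match(line)
        if match:
            annotations.append(match.group(1).strip())  # keep only the payload
        else:
            code_lines.append(line)
    return annotations, "\n".join(code_lines)
```

Note that removing the tagged lines shifts character positions, so a system using offsets would need them to be defined relative to the code after extraction, or would need to adjust them accordingly.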

[0067] At box 324, the annotation information is interpreted to generate transcoding rules or templates to be applied to the code. In certain embodiments, the annotations themselves may directly define the rules that will be applied, whereas in other embodiments, certain extra processing of the text in the annotations may be required before the transformations may be executed. The result of such processing or interpretation may be an easily read map that may be used by the transformation system to locate portions of the code to be transformed and to readily provide transformations for those portions of the code.
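
One possible form of the interpretation step is sketched below. The one-line text format (`TX1 ABC random 4,17,25`), the `Rule` type, and the functions `interpret` and `offset_map` are assumptions for illustration; the source does not fix an annotation grammar.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    tx_id: str      # e.g. "TX1"
    target: str     # the string to be transformed
    kind: str       # the type of transformation
    offsets: tuple  # character positions, sorted ascending

def interpret(annotation_lines):
    """Parse lines such as 'TX1 ABC random 4,17,25' into Rule objects."""
    rules = []
    for line in annotation_lines:
        tx_id, target, kind, raw_offsets = line.split()
        offsets = tuple(sorted(int(part) for part in raw_offsets.split(",")))
        rules.append(Rule(tx_id, target, kind, offsets))
    return rules

def offset_map(rules):
    """Flatten the rules into {offset: rule} for direct lookup by position."""
    return {offset: rule for rule in rules for offset in rule.offsets}
```

The flattened map plays the role of the easily read map mentioned above: a transcoder can walk the offsets in order without re-parsing the annotation text.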

[0068] At box 326, the transcoding rules or templates are applied to the web code with polymorphism. That means, for example, that portions of the web code may be removed or replaced with different alphanumeric characters than were in the web code, though in a manner that does not break the execution of the web code. Also, the replacing alphanumeric characters may differ between different servings of the web code even though the alphanumeric characters that they replace have not changed. Such polymorphic changing of the characters in the code will, however, need to be performed in a consistent manner across all the code that does not break the operation of the code.
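
One way to satisfy both requirements (different replacements between servings, consistent replacements within a serving) is to derive each replacement from a key that is generated once per serving. The sketch below uses a keyed hash for that purpose; this is an illustrative design choice that the source does not describe, and the names `new_serving_key` and `alias` are hypothetical.

```python
import hashlib
import hmac
import secrets

def new_serving_key():
    """Generate a fresh random key for one serving of the content."""
    return secrets.token_bytes(16)

def alias(name, serving_key):
    """Derive the replacement for `name` under this serving's key.

    The same name and key always give the same alias, so HTML, CSS, and
    JavaScript files transformed for one serving stay consistent with
    each other, while a new key gives a new alias.
    """
    digest = hmac.new(serving_key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "_" + digest[:10]  # leading underscore keeps the alias identifier-safe
```

A truncated digest could in principle collide for two different names, so a production system would presumably check for collisions or keep an explicit per-serving table instead.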

[0069] At box 328, the transcoded web code is served in a polymorphic manner. For example, the received code may have the annotations stripped from it if the annotations were received in the same file as the code, the code may be minified to reduce its size and the bandwidth that it requires, the code may be supplemented with other code as described in more detail below, and the code may then be served in an ordinary manner, either directly by the transformation system, or indirectly by the transformation system passing the transformed code back to the Web server system which will then serve the code to the requesting client.

[0070] FIG. 4 is a swim lane diagram of a process for adding annotations to web content to be served polymorphically. In general, the process is similar to those shown and discussed above, but indicates example actions that can be taken by each of different components in a system, so as to more plainly explain such operation.

[0071] The process begins at box 402, where a programmer using a software development system writes software, such as code for generating a web page on a web browser. Such code and other content can take a variety of forms that interact with each other, including HTML files, CSS files, and JavaScript files. Any appropriate type of web page may be generated in this process, though typically the process would be applied to pages that have security concerns for a company that serves the pages.

[0072] At box 404, the software development system analyzes the software to identify changes that can be made to the code when it is served, including polymorphic transformations for adding security to the code, and minification to make the served code smaller. The analysis may be essentially continuous as the code is developed, or after the code is completed and published by a programmer. Such analysis is shown here as being completed by the software development system, but could alternatively or additionally be performed by the security system.

[0073] At box 406, the code is stored. The code may be stored together with the annotations or separately from the annotations. As later shown, other components request the code and annotations from the software development system, but either or both could be stored under the control of another system such as the Web server system or the security system. At this point, the code is ready for being requested by clients and served in response to such requests.

[0074] Such a request is received from a client device at box 408. The request is directed to the Web server system but is intercepted by the security system (which may act as a CDN). If the request were in response to previously served content, it may have included transformed content in it which would need reverse transformation, but in this situation, the security system determines that this is a "fresh" request, and thus just passes the request through to the Web server system. The Web server system receives the request and serves it in an ordinary manner by obtaining the appropriate code and other content, packaging it, and serving it. The served code is again intercepted by the security system (box 414), which begins processing it to add security countermeasures to it. In this example, the code includes a pointer such as a URL aimed at a storage location for the annotations, which in this example are stored with the software development system. The security system makes a request for the annotations by following the URL (box 414) and the software development system returns the annotations in response to the request (box 418). At box 420, the security system then uses the annotations in combination with other items, such as transformation rules that are identified by particular entries in the annotations, to transform the code, such as by applying polymorphic transformations to various elements in the code and other content. At box 422, the security system serves the code, and at box 424, the client device renders it, such as by displaying the code and other content on a web browser.
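
The reverse transformation mentioned above, for requests that follow previously served content, might be handled by remembering the mapping used for each serving. The sketch below is an assumption about how that could be done; the class name `ServingSession`, its fields, and the example field names are hypothetical.

```python
class ServingSession:
    """Remember the name mapping used for one serving so it can be undone later."""

    def __init__(self, forward):
        self.forward = dict(forward)  # original name -> served alias
        self.reverse = {alias: name for name, alias in self.forward.items()}

    def restore(self, form_fields):
        """Map served aliases in an incoming request back to the original names.

        Keys that were never transformed pass through unchanged.
        """
        return {self.reverse.get(key, key): value for key, value in form_fields.items()}
```

For instance, a session created with `{"cardNumber": "_a1b2"}` would translate an incoming field named `_a1b2` back to `cardNumber` before the request is passed to the Web server system.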

[0075] FIG. 5 shows an example of components for assisting in minification and transcoding of developed web content. In general, the components are shown as part of a system 500 that may be accessed by a computer programmer in developing software code, and that may aid the programmer in the development of the code, and particularly in helping the programmer develop code that is most amenable to the insertion of security countermeasures and to minification.

[0076] Referring now more particularly to the components, the system 500 includes an integrated development environment (IDE) 516 that manages most portions of the software development process in a familiar manner. The IDE 516, in particular, includes a user interface 504 that may provide word processing functionality and other associated programming functionality to a programmer in a familiar manner. For example, the user interface 504 may provide access to a revision control system (RCS) 518 that interacts with the IDE 516 to maintain control over who accesses a source code base, and to maintain a history of that source code base.

[0077] The user interface 504, in this example, is shown to have three main elements. Element 506 is a main display of the source code on which the programmer 502 is currently working. The programmer 502 may use a cursor to navigate through the visual display of such code 506 and may type or cut and paste code into and out of the source code in familiar manners. Elements 508 and 510 represent, respectively, output from a security plug-in 520 and a minifier plug-in 522. Element 508 indicates messages that are generated by the security plug-in 520, such as messages that indicate to the programmer 502 that code he has just written may be better written in an equivalent manner for execution purposes, but in a manner that is more amenable to polymorphic transformation analysis and treatment. A minifier plug-in 522 may generate similar messages to the programmer 502 through element 510. Such messages may include a blinking alert to catch the programmer's 502 attention, along with text indicating an opportunity for the programmer 502 to improve the operation of the code with respect to both security and minification.

[0078] A code base 524 is separately provided and represents current copies of source code or object code for a particular project. The manner in which the code base 524 is organized may vary according to the particular implementation and environment that is being operated by the organization for which programmer 502 works.

[0079] Guides 512 and 514 are shown here to represent information that may be provided to the programmer 502 in advance of the programmer 502 undertaking a programming task. Guide 512 is a transform coder guide that explains best methods for programming, in a manner that allows for efficient and effective security polymorphic transformation of code. Similarly, guide 514 is a minifier coder guide, which explains to a programmer techniques that may be used to maximize the degree to which code may be compressed or minified before it is served. By the system 500 shown here then, a programmer 502 may be able to operate according to a traditional IDE 516 and traditional user interface 504, but may be guided in a nonintrusive manner to improve the code that is generated in various ways. Such code may be provided within the systems shown, for example, in FIGS. 1B and 6, and in the processes shown in FIGS. 3A, 3B, and 4.

[0080] FIG. 6 shows a system 600 for serving polymorphic and instrumented code. Generally, polymorphic code is code that is changed in different manners for different servings of the code in manners that do not affect the manner in which the executed code is perceived by users, so as to create a moving target for malware that tries to determine how the code operates, but without changing the user experience. Instrumented code is code that is served, e.g., to a browser, with the main functional code and monitors how the functional code operates on a client device, and how other code may interact with the functional code and other activities on the client device.

[0081] The system 600 may be adapted to perform deflection and detection of malicious activity with respect to a web server system. Deflection may occur, for example, by the serving of polymorphic code, which interferes with the ability of malware to interact effectively with the code that is served. Detection may occur, for example, by adding instrumentation code (including injected code for a security service provider) that monitors activity of client devices that are served web code.

[0082] The system 600 in this example is a system that is operated by or for a large number of different businesses that serve web pages and other content over the internet, such as banks and retailers that have on-line presences (e.g., on-line stores, or on-line account management tools). The main server systems operated by those organizations or their agents are designated as web servers 604a-604n, and could include a broad array of web servers, content servers, database servers, financial servers, load balancers, and other necessary components (either as physical or virtual servers).

[0083] In this example, security server systems 602a to 602n (which may implement components like the decoder 110 described with respect to FIG. 1) may cause code from the web server system to be supplemented and altered. In one example of the supplementation, code may be provided, either by the web server system itself as part of the originally-served code, or by another mechanism after the code is initially served, such as by the security server systems 602a to 602n, where the supplementing code causes client devices to which the code is served to transmit data that characterizes the client devices and the use of the client devices in manners like those discussed in the many examples above. As also described below, other actions may be taken by the supplementing code, such as the code reporting actual malware activity or other anomalous activity at the client devices that can then be analyzed to determine whether the activity is malware activity.

[0084] The set of security server systems 602a to 602n is shown connected between the web servers 604a to 604n and a network 610 such as the internet. Although both extend to n in number, the actual number of sub-systems could vary. For example, certain of the customers could install two separate security server systems to serve all of their web server systems (which could be one or more), such as for redundancy purposes. The particular security server systems 602a-602n may be matched to particular ones of the web server systems 604a-604n, or they may be at separate sites, and all of the web servers for various different customers may be provided with services by a single common set of security servers 602a-602n (e.g., when all of the server systems are at a single co-location facility so that bandwidth issues are minimized).

[0085] Each of the security server systems 602a-602n may be arranged and programmed to carry out operations like those discussed above and below and other operations. For example, a policy engine 620 in each such security server system may evaluate HTTP requests from client computers (e.g., desktop, laptop, tablet, and smartphone computers) based on header and network information, and can set and store session information related to a relevant policy. The policy engine may be programmed to classify requests and correlate them to particular actions to be taken to code returned by the web server systems before such code is served back to a client computer.
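
As a rough illustration of how a policy engine might classify a request and record the chosen policy for the session, consider the Python sketch below. The policy table, the action names, the path-prefix matching, and the treatment of requests that lack a `User-Agent` header are all hypothetical choices made for this example; the specification does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    path_prefix: str   # which request paths the policy covers
    actions: tuple     # actions to apply to code returned by the web server


# Hypothetical policy table; more specific prefixes are listed first.
POLICIES = (
    Policy("login", "/login", ("rename_form_fields", "add_instrumentation")),
    Policy("static", "/static/", ()),
    Policy("default", "/", ("add_instrumentation",)),
)

# Session information kept by the engine: session id -> policy name.
sessions = {}


def classify(session_id, path, headers):
    """Pick a policy from header and path information and store it."""
    if "User-Agent" not in headers:
        # Assumption for this sketch: header-less clients get the strictest policy.
        policy = POLICIES[0]
    else:
        # First policy whose prefix matches, falling back to the default.
        policy = next(
            (p for p in POLICIES if path.startswith(p.path_prefix)),
            POLICIES[-1],
        )
    sessions[session_id] = policy.name
    return policy.actions
```

The returned tuple of actions would then be handed to whatever component rewrites the code on its way back to the client.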

[0086] When such code returns, the policy information may be provided to a decode, analysis, and re-encode module, which matches the content to be delivered, across multiple content types (e.g., HTML, JavaScript, and CSS), to actions to be taken on the content (e.g., using XPATH within a DOM), such as substitutions, addition of content, and other actions that may be provided as extensions to the system. For example, the different types of content may be analyzed to determine naming that may extend across such different pieces of content (e.g., the name of a function or parameter), and such names may be changed in a way that differs each time the content is served, e.g., by replacing a named item with randomly-generated characters. Elements within the different types of content may also first be grouped as having a common effect on the operation of the code (e.g., if one element makes a call to another), and then may be re-encoded together in a common manner so that their interoperation with each other will be consistent even after the re-encoding.

[0087] The actions generated by the policy engine 620 may occur in various places and at various times. For example, the actions may be performed to analyze code soon after it is authored and before it is requested by a client. In such a situation, the analysis may be performed by a software development system or may be performed by a non-real-time component of a security system, where such a component may periodically scan a code base looking for changed content, and may perform an analysis when such a change is found. The analysis may be performed to create, or may be guided by, annotations that are stored with the code or referenced by the code. The annotations may guide an analysis component of a transcoding security system and/or a component that implements transcoding of the code using templates or maps generated from the analysis.

[0088] Both the analysis of content for determining which transformations to apply to the content, and the transformation of the content itself, may occur at the same time (after receiving a request for the content) or at different times. Such analysis may also use annotations as discussed above as input, or may generate annotations for later use in the process. For example, the analysis may be triggered, not by a request for the content, but by a separate determination that the content newly exists or has been changed. Such a determination may be via a "push" from the web server system reporting that it has implemented new or updated content. The determination may also be a "pull" from the security servers 602a-602n, such as by the security servers 602a-602n implementing a web crawler (not shown) to recursively search for new and changed content and to report such occurrences to the security servers 602a-602n, and perhaps return the content itself and perhaps perform some processing on the content (e.g., indexing it or otherwise identifying common terms throughout the content, creating DOMs for it, etc.). The analysis to identify portions of the content that should be subjected to polymorphic modifications each time the content is served may then be performed according to the manner discussed above and below.
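
One simple way a "pull" component could decide that content is new or changed is to compare content hashes between crawls. The Python sketch below assumes the crawler has already fetched pages into a dictionary keyed by path; the function names and that data shape are assumptions made for illustration.

```python
import hashlib


def snapshot(pages):
    """Map each path to a SHA-256 digest of its current content."""
    return {
        path: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for path, body in pages.items()
    }


def changed_paths(previous, current):
    """Return paths that are new or whose digest differs from the last crawl."""
    return sorted(
        path for path, digest in current.items()
        if previous.get(path) != digest
    )
```

Only the paths returned by `changed_paths` would need to be re-analyzed; unchanged content could keep its earlier analysis results.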

[0089] A rules engine 622 may store analytical rules for performing such analysis and for re-encoding of the content. The rules engine 622 may be populated with rules developed through operator observation of particular content types, such as by operators of a system studying typical web pages that call JavaScript content and recognizing that a particular method is frequently used in a particular manner. Such observation may result in the rules engine 622 being programmed to identify the method and calls to the method so that they can all be grouped and re-encoded in a consistent and coordinated manner.

[0090] The analysis may be made simpler where analysis-related annotations are present in or with the content, where such annotations identify locations at which transformations could be performed, define the transformations to be performed at those locations, and/or other factors that can assist in the transformation of the content so that less analysis is needed closer to run-time for a system. Various mechanisms for providing such annotations in relation with particular content, and mechanisms for using such annotations in the transformation of content are discussed in detail above.

[0091] The decode, analysis, and re-encode module 624 encodes content being passed to client computers from a web server according to relevant policies and rules. The module 624 also reverse encodes requests from the client computers to the relevant web server or servers. For example, a web page may be served with a particular parameter, and may refer to JavaScript that references that same parameter. The decode, analysis, and re-encode module 624 may replace the name of that parameter, in each of the different types of content, with a randomly generated name, and each time the web page is served (or at least in varying sessions), the generated name may be different. When the name of the parameter is passed back to the web server, it may be re-encoded back to its original name so that this portion of the security process may occur seamlessly for the web server.
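
The round trip described here, with names replaced on the way out and restored on the way back, might look like the following Python sketch. The `SessionCodec` class, its method names, and the `p_` alias prefix are hypothetical and introduced only for this example; a real module would also have to parse each content type properly rather than rely on a whole-word regular expression.

```python
import re
import secrets


class SessionCodec:
    """Per-session mapping between original names and random aliases."""

    def __init__(self, names):
        # A new random alias per name, drawn once for this session.
        self.forward = {n: "p_" + secrets.token_hex(6) for n in names}
        self.reverse = {alias: n for n, alias in self.forward.items()}

    def encode(self, content):
        """Rewrite every whole-word occurrence of a known name in served content."""
        if not self.forward:
            return content
        pattern = re.compile(
            r"\b(" + "|".join(map(re.escape, self.forward)) + r")\b"
        )
        return pattern.sub(lambda m: self.forward[m.group(1)], content)

    def decode_form(self, fields):
        """Restore original field names in a submitted form (a dict)."""
        return {self.reverse.get(k, k): v for k, v in fields.items()}
```

Because `decode_form` restores the original names before the request is forwarded, the web server sees the same field names it originally served.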

[0092] A key for the function that encodes and decodes such strings can be maintained by the security server system 602 along with an identifier for the particular client computer so that the system 602 may know which key or function to apply, and may otherwise maintain a state for the client computer and its session. A stateless approach may also be employed, whereby the system 602 encrypts the state and stores it in a cookie that is saved at the relevant client computer, or in a hidden field such as a field on a form that is being presented to a user and for which the input to the form is being obfuscated in a polymorphic manner. The client computer may then pass that cookie data back when it passes the information that needs to be decoded back to its original status.

[0093] With the cookie data, the system 602 may use a private key to decrypt the state information and use that state information in real-time to decode the information from the client computer. Such a stateless implementation may create benefits such as less management overhead for the server system 602 (e.g., for tracking state, for storing state, and for performing clean-up of stored state information as sessions time out or otherwise end) and as a result, higher overall throughput.

[0094] The decode, analysis, and re-encode module 624 and the security server system 602 may be configured to modify web code differently each time it is served in a manner that is generally imperceptible to a user who interacts with such web code. For example, multiple different client computers may request a common web resource such as a web page or web application that a web server provides in response to the multiple requests in substantially the same manner. Thus, a common web page may be requested from a web server, and the web server may respond by serving the same or substantially identical HTML, CSS, JavaScript, images, and other web code or files to each of the clients in satisfaction of the requests. In some instances, particular portions of requested web resources may be common among multiple requests, while other portions may be client or session specific. The decode, analysis, and re-encode module 624 may be adapted to apply different modifications to each instance of a common web resource, or common portion of a web resource, such that the web code that is ultimately delivered to the client computers in response to each request for the common web resource includes different modifications.

[0095] In certain implementations, the analysis can happen a single time for a plurality of servings of the code in different recoded instances. For example, the analysis may identify a particular function name and all of the locations it occurs throughout the relevant code, and may create a map to each such occurrence in the code. Subsequently, when the web content is called to be served, the map can be consulted and random strings may be inserted in a coordinated manner across the code. Generating a new name for the function each time, and substituting that name into the code, requires much less computing cost than would full re-analysis of the content. Also, when a page is to be served, it can be analyzed to determine which portions, if any, have changed since the last analysis, and subsequent analysis may be performed only on the portions of the code that have changed.
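
The split between a one-time analysis and a cheap per-serving substitution can be sketched in Python as follows. The `analyze` and `serve` names, the offset-list form of the map, and the `v_` alias prefix are assumptions made for this illustration.

```python
import re
import secrets


def analyze(source, name):
    """One-time analysis: record where each whole-word occurrence starts."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return [m.start() for m in pattern.finditer(source)]


def serve(source, name, offsets):
    """Per-request step: splice a fresh random alias in at the mapped offsets."""
    alias = "v_" + secrets.token_hex(4)
    parts, cursor = [], 0
    for start in offsets:
        parts.append(source[cursor:start])  # unchanged text before the name
        parts.append(alias)                 # the replacement for this serving
        cursor = start + len(name)          # skip past the original name
    parts.append(source[cursor:])
    return "".join(parts), alias
```

The expensive pattern search runs once in `analyze`; each call to `serve` only draws a random string and copies slices, which is the cost saving the paragraph describes.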

[0096] Even where different modifications are applied in responding to multiple requests for a common web resource, the security server system 602 can apply the modifications in a manner that does not substantially affect a way that the user interacts with the resource, regardless of the different transformations applied. For example, when two different client computers request a common web page, the security server system 602 applies different modifications to the web code corresponding to the web page in response to each request for the web page, but the modifications do not substantially affect a presentation of the web page between the two different client computers. The modifications can therefore be made largely transparent to users interacting with a common web resource so that the modifications do not cause a substantial difference in the way the resource is displayed or the way the user interacts with the resource on different client devices or in different sessions in which the resource is requested.

[0097] An instrumentation module 626 is programmed to add instrumentation code to the content that is served from a web server. The instrumentation code is code that is programmed to monitor the operation of other code that is served. For example, the instrumentation code may be programmed to identify when certain methods are called, when those methods have been identified as likely to be called by malicious software. When such actions are observed to occur by the instrumentation code, the instrumentation code may be programmed to send a communication to the security server reporting on the type of action that occurred and other meta data that is helpful in characterizing the activity. Such information can be used to help determine whether the action was malicious or benign.
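
In practice such instrumentation would be code served to and run in the browser. The Python sketch below is only a language-neutral stand-in for the idea of wrapping selected methods so that each call is reported before it runs. The `instrument` function, the report format, and the `report` callback are hypothetical.

```python
import functools


def instrument(obj, method_names, report):
    """Wrap the named methods of obj so that every call is reported first."""
    for name in method_names:
        original = getattr(obj, name)

        # Default arguments bind the current method and name to this wrapper.
        @functools.wraps(original)
        def wrapper(*args, _original=original, _name=name, **kwargs):
            # Report the type of action plus a little metadata about it.
            report({"method": _name, "argc": len(args)})
            return _original(*args, **kwargs)

        setattr(obj, name, wrapper)
```

Here `report` stands in for whatever mechanism sends the observation back to the security server; in this sketch it can be any callable, such as `list.append`.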

[0098] The instrumentation code may also analyze the DOM on a client computer in predetermined manners that are likely to identify the presence of and operation of malicious software, and to report to the security servers 602 or a related system. For example, the instrumentation code may be programmed to characterize a portion of the DOM when a user takes a particular action, such as clicking on a particular on-page button, so as to identify a change in the DOM before and after the click (where the click is expected to cause a particular change to the DOM if there is benign code operating with respect to the click, as opposed to malicious code operating with respect to the click). Data that characterizes the DOM may also be hashed, either at the client computer or the server system 602, to produce a representation of the DOM (e.g., in the differences between part of the DOM before and after a defined action occurs) that is easy to compare against corresponding representations of DOMs from other client computers. Other techniques may also be used by the instrumentation code to generate a compact representation of the DOM or other structure expected to be affected by malicious code in an identifiable manner.
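
A compact, comparable representation of a DOM change could be produced by hashing a canonical serialization of the relevant subtree before and after the action. The Python sketch below assumes the DOM portion is available as nested dictionaries and lists; that representation, the function names, and the 16-character digest length are illustrative assumptions.

```python
import hashlib
import json


def dom_fingerprint(node):
    """Hash a DOM-like tree (nested dicts and lists) into a short hex digest."""
    # Sorting keys makes the digest independent of attribute order.
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def change_signature(before, after):
    """Summarize how a defined action changed the DOM as one short digest."""
    return dom_fingerprint({
        "before": dom_fingerprint(before),
        "after": dom_fingerprint(after),
    })
```

Two clients whose DOMs change in the same way after the same click would yield the same signature, so a differing signature is easy to spot when compared across many clients.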

[0099] As noted, the content from web servers 604a-604n, as encoded by decode, analysis, and re-encode module 624, may be rendered on web browsers of various client computers. Uninfected client computers 612a-612n represent computers that do not have malicious code programmed to interfere with a particular site a user visits or to otherwise perform malicious activity. Infected client computers 614a-614n represent computers that do have malware or malicious code (618a-618n, respectively) programmed to interfere with a particular site a user visits or to otherwise perform malicious activity. In certain implementations, the client computers 612a-612n, 614a-614n may also store the encrypted cookies discussed above and pass such cookies back through the network 610. The client computers 612a-612n, 614a-614n will, once they obtain the served content, implement DOMs for managing the displayed web pages, and instrumentation code may monitor the respective DOMs as discussed above. Illogical activity (e.g., software on the client device calling a method that does not exist in the downloaded and rendered content) can then be reported back to the server system.

[0100] The reports from the instrumentation code may be analyzed and processed in various manners in order to determine how to respond to particular abnormal events, and to track down malicious code via analysis of multiple different similar interactions across different client computers 612a-612n, 614a-614n. For small-scale analysis, each web site operator may be provided with a single security console 607 that provides analytical tools for a single site or group of sites. For example, the console 607 may include software for showing groups of abnormal activities, or reports that indicate the type of code served by the web site that generates the most abnormal activity. For example, a security officer for a bank may determine that defensive actions are needed if most of the reported abnormal activity for its web site relates to content elements corresponding to money transfer operations--an indication that stale malicious code may be trying to access such elements surreptitiously.

[0101] Console 607 may also be multiple different consoles used by different employees of an operator of the system 600, and may be used for pre-analysis of web content before it is served, as part of determining how best to apply polymorphic transformations to the web code. For example, in combined manual and automatic analysis like that described above, an operator at console 607 may form or apply rules 622 that guide the transformation that is to be performed on the content when it is ultimately served. The rules may be written explicitly by the operator or may be provided by automatic analysis and approved by the operator. Alternatively, or in addition, the operator may perform actions in a graphical user interface (e.g., by selecting particular elements from the code by highlighting them with a pointer, and then selecting an operation from a menu of operations) and rules may be written consistent with those actions.

[0102] A central security console 608 may connect to a large number of web content providers, and may be run, for example, by an organization that provides the software for operating the security server systems 602a-602n. Such console 608 may access complex analytical and data analysis tools, such as tools that identify clustering of abnormal activities across thousands of client computers and sessions, so that an operator of the console 608 can focus on those clusters in order to diagnose them as malicious or benign, and then take steps to thwart any malicious activity.

[0103] In certain other implementations, the console 608 may have access to software for analyzing telemetry data received from a very large number of client computers that execute instrumentation code provided by the system 600. Such data may result from forms being re-written across a large number of web pages and web sites to include content that collects system information such as browser version, installed plug-ins, screen resolution, window size and position, operating system, network information, and the like. In addition, user interaction with served content may be characterized by such code, such as the speed with which a user interacts with a page, the path of a pointer over the page, and the like.

[0104] Such collected telemetry data, across many thousands of sessions and client devices, may be used by the console 608 to identify what is "natural" interaction with a particular page that is likely the result of legitimate human actions, and what is "unnatural" interaction that is likely the result of a bot interacting with the content. Statistical and machine learning methods may be used to identify patterns in such telemetry data, and to resolve bot candidates to particular client computers. Such client computers may then be handled in special manners by the system 600, may be blocked from interaction, or may have their operators notified that their computer is potentially running malicious software (e.g., by sending an e-mail to an account holder of a computer so that the malicious software cannot intercept it easily).
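
As one very simple example of a statistical method in this spirit, the Python sketch below flags sessions whose average time between input events is an extreme low outlier relative to the population, using the median and the median absolute deviation. The choice of feature, the 0.6745 scaling constant with a 3.5 cutoff (a common convention for this kind of robust score), and the function name are assumptions for illustration; the specification does not name a particular method.

```python
import statistics


def flag_fast_sessions(mean_interval_ms, threshold=3.5):
    """Return ids of sessions whose input timing is an extreme low outlier.

    mean_interval_ms maps a session id to that session's mean time, in
    milliseconds, between consecutive input events.
    """
    values = list(mean_interval_ms.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread in the data, so nothing can be called an outlier.
        return []
    flagged = []
    for session_id, value in mean_interval_ms.items():
        robust_z = 0.6745 * (value - median) / mad
        # Only the fast side is flagged: input far quicker than typical.
        if robust_z < -threshold:
            flagged.append(session_id)
    return sorted(flagged)
```

Sessions returned by this function are only bot candidates; a practical system would combine many such signals before treating a client specially.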

[0105] FIG. 7 shows an example of a computer system 700. The system 700 can be used for the operations described in association with any of the computer-implemented methods described previously, according to one implementation. The system 700 is intended to include various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The system 700 can also include mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices. Additionally, the system can include portable storage media, such as Universal Serial Bus (USB) flash drives. For example, the USB flash drives may store operating systems and other applications. The USB flash drives can include input/output components, such as a wireless transmitter or USB connector that may be inserted into a USB port of another computing device.

[0106] The system 700 includes a processor 710, a memory 720, a storage device 730, and an input/output device 740. Each of the components 710, 720, 730, and 740 is interconnected using a system bus 750. The processor 710 is capable of processing instructions for execution within the system 700. The processor may be designed using any of a number of architectures. For example, the processor 710 may be a CISC (Complex Instruction Set Computer) processor, a RISC (Reduced Instruction Set Computer) processor, or a MISC (Minimal Instruction Set Computer) processor.

[0107] In one implementation, the processor 710 is a single-threaded processor. In another implementation, the processor 710 is a multi-threaded processor. The processor 710 is capable of processing instructions stored in the memory 720 or on the storage device 730 to display graphical information for a user interface on the input/output device 740.

[0108] The memory 720 stores information within the system 700. In one implementation, the memory 720 is a computer-readable medium. In one implementation, the memory 720 is a volatile memory unit. In another implementation, the memory 720 is a non-volatile memory unit.

[0109] The storage device 730 is capable of providing mass storage for the system 700. In one implementation, the storage device 730 is a computer-readable medium. In various different implementations, the storage device 730 may be a floppy disk device, a hard disk device, an optical disk device, or a tape device.

[0110] The input/output device 740 provides input/output operations for the system 700. In one implementation, the input/output device 740 includes a keyboard and/or pointing device. In another implementation, the input/output device 740 includes a display unit for displaying graphical user interfaces.

[0111] The features described can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The apparatus can be implemented in a computer program product tangibly embodied in an information carrier, e.g., in a machine-readable storage device for execution by a programmable processor; and method steps can be performed by a programmable processor executing a program of instructions to perform functions of the described implementations by operating on input data and generating output. The described features can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. A computer program is a set of instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

[0112] Suitable processors for the execution of a program of instructions include, by way of example, both general and special purpose microprocessors, and the sole processor or one of multiple processors of any kind of computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memories for storing instructions and data. Generally, a computer will also include, or be operatively coupled to communicate with, one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).

[0113] To provide for interaction with a user, the features can be implemented on a computer having a display device such as a CRT (cathode ray tube) or LCD (liquid crystal display) monitor for displaying information to the user and a keyboard and a pointing device such as a mouse or a trackball by which the user can provide input to the computer. Additionally, such activities can be implemented via touchscreen flat-panel displays and other appropriate mechanisms.

[0114] The features can be implemented in a computer system that includes a back-end component, such as a data server, or that includes a middleware component, such as an application server or an Internet server, or that includes a front-end component, such as a client computer having a graphical user interface or an Internet browser, or any combination of them. The components of the system can be connected by any form or medium of digital data communication such as a communication network. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), peer-to-peer networks (having ad-hoc or static members), grid computing infrastructures, and the Internet.

[0115] The computer system can include clients and servers. A client and server are generally remote from each other and typically interact through a network, such as the described one. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

[0116] While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular implementations of particular inventions. Certain features that are described in this specification in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

[0117] Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

[0118] Thus, particular implementations of the subject matter have been described. Other implementations are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

* * * * *