U.S. patent application number 15/292177 was published by the patent office on 2017-04-13 for detection, protection and transparent encryption/tokenization/masking/redaction/blocking of sensitive data and transactions in web and enterprise applications.
The applicant listed for this patent is Secupi Security Solutions Ltd. The invention is credited to Dotan ADLER and Alon ROSENTHAL.
Application Number: 20170104756 (Appl. No. 15/292177)
Family ID: 58499115
Publication Date: 2017-04-13
United States Patent Application 20170104756, Kind Code A1
ROSENTHAL; Alon; et al.
April 13, 2017
Detection, protection and transparent
encryption/tokenization/masking/redaction/blocking of sensitive
data and transactions in web and enterprise applications
Abstract
A plurality of users connect to an application sending requests
over a transport and receiving responses from an application that
contain sensitive data. For each user request, the application runs
one or more data requests and commands to various data sources or
other information systems which return the sensitive data. The
application then processes the data and returns is to the user as
is or processed based on some business logic. The application
includes a run-time environment--where the application logic is
executed.
Inventors: ROSENTHAL; Alon (Ramat Efal, IL); ADLER; Dotan (Raanana, IL)
Applicant: Secupi Security Solutions Ltd, Ramat Efal, IL
Family ID: 58499115
Appl. No.: 15/292177
Filed: October 13, 2016
Related U.S. Patent Documents
Application Number: 62240795 | Filing Date: Oct 13, 2015
Current U.S. Class: 1/1
Current CPC Class: H04L 63/20 (20130101); H04L 63/102 (20130101); H04L 63/0428 (20130101)
International Class: H04L 29/06 (20060101) H04L029/06
Claims
1. A method for detecting sensitive data in a transaction, in a
system comprising a user application, an application server and a
data source in communication through a computer network, the
application server operating a real time program, the method
comprising: a. Determining a policy for permitted data
transactions, wherein said policy is determined according to a data
lineage for data transactions between the user application, the
application server and the data source; b. Injecting code to the
real time program according to said policy; c. Detecting a data
request from the user application and/or a data response from the
data source by said injected code; d. Determining a score for data
in said data request and/or said data response by said injected
code; and e. Determining whether said data request and/or said data
response is to be blocked by said injected code.
2. The method of claim 1, wherein said data request includes a data
request from the user application and a data request from the
application server, and wherein said data response includes a data
response from the data source and a data response from the
application server, wherein only said data request and said data
response are detected.
3. The method of claim 2, further comprising recording said score
for said data response and for said data request; and auditing
interactions according to said recording.
4. The method of claim 3, wherein said auditing is performed with
regard to a particular user identifier for the user
application.
5. The method of claim 1, further comprising blocking said data
request and/or said data response by said injected code, wherein
said blocking comprises one or more of preventing said data request
and/or said data response from proceeding, alerting or notifying
appropriate administrators, or
redacting/anonymizing/masking/encrypting/decrypting/tokenizing/de-tokenizing
some or all of the data in one or both of said data request
and/or said data response.
6. The method of claim 1, wherein said determining is performed in
real time.
7. The method of claim 1, for operation in a system including a
Security Information Event Management System (SIEM), the method
further comprising sending whether said data request and/or said
data response may proceed to said SIEM as a triggering event.
8. The method of claim 1, wherein said determining said score is
determined according to a 4V model, comprising determining a value
of data, a velocity of data, a variety of data and/or a volume of
data being transmitted in said data request and/or said data
response; and determining said score according to said value, said
velocity, said variety and/or said volume.
9. The method of claim 8, comprising determining said score
according to all of said value, said velocity, said variety and
said volume.
10. The method of claim 9, wherein said determining said score
according to said value, said velocity, said variety and said
volume further comprises comparing each of said value, said
velocity, said variety and said volume to historical values for an
identified user operating said user application.
11. The method of claim 10, wherein said determining said score
according to said value, said velocity, said variety and said
volume further comprises comparing each of said value, said
velocity, said variety and said volume to historical values for a
group of identified users operating said user application.
12. The method of claim 8, further comprising applying a
logarithmic scale to data analyses performed according to said 4V
model.
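Without characterizing the claims, the 4V scoring of claims 8-12 could be sketched as follows. The per-dimension weighting, the use of `log1p` as the logarithmic scale of claim 12, and the comparison against a per-user historical baseline (claims 10-11) are illustrative assumptions only, not the claimed formula.

```python
import math

def four_v_score(value, velocity, variety, volume, baseline):
    """Score a data request against a user's historical baseline.

    Each of the 4V dimensions (value, velocity, variety, volume) is
    compared to the user's historical average, and a logarithmic scale
    damps large absolute magnitudes (cf. claim 12). The scaling is an
    illustrative assumption.
    """
    current = {"value": value, "velocity": velocity,
               "variety": variety, "volume": volume}
    score = 0.0
    for dim, observed in current.items():
        expected = baseline.get(dim, 1.0)
        # A ratio above 1 means the user exceeds their historical norm.
        ratio = observed / expected if expected else observed
        score += math.log1p(max(ratio - 1.0, 0.0))
    return score

# A request matching the user's history scores 0; a large deviation scores high.
baseline = {"value": 10, "velocity": 5, "variety": 3, "volume": 1000}
normal = four_v_score(10, 5, 3, 1000, baseline)
spike = four_v_score(10, 50, 3, 100000, baseline)
```

A group baseline (claim 11) could be substituted for the per-user baseline without changing the structure of the computation.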
13. A method for detecting sensitive data in a transaction, in a
system comprising a user application, an application server and a
data source in communication through a computer network, the
application server operating a real time program and an application
agent, the method comprising: a. Determining a policy for permitted
data transactions, wherein said policy is determined according to a
data lineage for data transactions between the user application,
the application server and the data source; b. Determining one or
more permitted data transaction parameters for said application
agent according to said policy; c. Detecting a data request from
the user application and/or a data response from the data source by
said application agent; d. Determining a score for data in said
data request and/or said data response by said application agent;
and e. Determining whether said data request and/or said data
response may proceed by said application agent.
14. A method for detecting sensitive data in a transaction, in a
system comprising a user application, an application server, a
central policy manager and a data source in communication through a
computer network, the application server operating a real time
program, the method comprising: a. Determining a policy for
permitted data transactions by said central policy manager, wherein
said policy is determined according to a data lineage for data
transactions between the user application, the application server
and the data source; b. Detecting a data request from the user
application and/or a data response from the data source by said
central policy manager; c. Determining a score for data in said
data request and/or said data response by said central policy
manager; and d. Determining whether said data request and/or said
data response may proceed by said central policy manager.
15. A system for detecting sensitive data, comprising: a. A user
computer; b. A user application operated by said user computer; c.
An application server; d. A real time program operated by said
application server; e. A data source; f. A central policy management
server for setting and managing one or more policies for governing
data transactions in the system; g. A computer network for
communication between said user application, said application
server and said data source, including a data request from the user
application and a data response from the data source; h. A policy
to code generator for generating injectable code to be injected at
the real time program operated by the application server and/or
user application, according to said policies of said central policy
management server, for detecting sensitive information.
16. The system of claim 15, wherein the application server
comprises a plurality of application servers and wherein said
policy to code generator generates a plurality of sets of
injectable code to be injected at said plurality of real time
programs.
17. The system of claim 16, wherein each set of injectable code
injects a marker for detecting from which real time program at
which application server sensitive information is detected.
18. The system of claim 15, further comprising an agent operated by
said user computer for obtaining contextual information related to
said sensitive information and for relaying such contextual
information to said central policy management server.
19. The system of claim 15, further comprising an agent operated by
said application server for filtering incoming requests from said
user application against a large array of values to detect incoming
requests that are not allowed.
20. The system of claim 15, wherein said central policy management
server, upon reviewing an incoming request from said user
application, determines that at least one value of sensitive
information provided by said incoming request, or provided in
response to said incoming request, is to be masked.
21. The system of claim 20, wherein said masking comprises
performing a dynamic mask to replace or delete one or more request
information or variables, and/or to perform one of the following:
encrypt/decrypt, tokenize/detokenize, hide, redact, delay or block
the sensitive-data-flows and high value transactions of end-users
and/or external program interface (API calls).
22. The system of claim 15, wherein said injected code to said user
application and/or said real time program operated by said
application server, is executed before and/or after sensitive user
requests and data requests to audit and monitor data flows and
sensitive transactions.
23. The system of claim 22, wherein said injected code collects at
least one of the following data upon execution: end-user name, the
user request, the user and/or data request variables, the data
request, the result set returned from the data source to the
application if available, and/or the user response if
available.
24. The system of claim 22, wherein said injected code determines
when to execute according to a Post-Build process that scans the
source or binary code during development time and injects the
relevant code into the source or binary, before it is being
deployed; as a post build process, where a virtual machine
transforms/instruments the binary code before it is being loaded
into memory and before it is being executed by the virtual machine;
by injecting code into application files in the filesystem before
it is being loaded and executed at runtime; or by changing the
source code of the application, by adding program calls to a
central server for sending event context and receiving changes to
the variables and code executed by the application.
25. The system of claim 22, wherein said injected code is added to
user applications and/or applications run by said application
server in which program code can be instrumented, as well as on
program code that cannot be instrumented by using code changes in
the application.
Description
FIELD OF THE PRESENT INVENTION
[0001] The present invention relates to a process of detecting,
controlling and protecting sensitive information across web and
enterprise applications (using encryption, tokenization, masking,
redaction or blocking).
BACKGROUND OF THE PRESENT INVENTION
[0002] Today, many large organizations maintain thousands of
sensitive applications that are exposed on a daily basis to
thousands of end-users, partners, clients, part time workforce, new
hires and resignations. With sensitive data such as personal
information, medical information and sensitive financial data being
exfiltrated by malicious insiders and hackers that hijack user
identities--organizations must be able to detect sensitive data
exposure by malicious insiders and hackers in real-time. In
addition, increasing regulations and industry standards require
fine-grained audit of "who" accessed "what", "when" and
"where".
[0003] Without wishing to be limited in any way, the term web or
enterprise applications often relate to a situation in which a
plurality of users and program interfaces connect to an application
sending requests and receiving responses from the application that
contains sensitive information. For each user request, the
application runs one or more data requests and commands to various
other applications or data sources which return the sensitive data.
The application includes a run-time environment where the
application logic is executed; and calls are made to various other
application servers or data sources (such as a database, big-data
repository, file, application program interface (API) or an
Enterprise Application Integration solution--EAI) where sensitive
information is stored and accessed by the application.
[0004] The challenge is that these applications are highly complex,
requiring tedious human effort to monitor, classify and detect
suspicious or malicious events, so that protecting the data becomes
an impossible task. Most commonly, application logs or Web
Application Firewalls (WAF) cannot obtain information about who
accessed which sensitive and/or regulated information in the
application, while both Database Activity Monitoring solutions (DAM)
and native database audit are not acceptable performance-wise and do
not provide information accurate enough to act on (such as the
identity of the end user, which is hidden by layers of applications
and by the use of database connection pools, leaving the database
blind to the end-user context). Therefore, there is a need for a
system for fast and accurate detection and monitoring of sensitive
data in a transparent, generic, resource-lean, context-rich and
scalable way.
SUMMARY
[0005] A plurality of users connect to an application sending
requests over a network and receiving responses from an application
that contain sensitive data or apply a command to manipulate
sensitive data using an Application Program Interface (API call).
For each user/API request, the application runs one or more data
requests and commands to various data sources or other information
systems which return the sensitive data. The application then
processes the data and returns it to the user as is or processed
based on some business logic. The application includes a run-time
environment--where the application logic is executed.
[0006] The existing approach of monitoring applications uses
network sniffers (Web Application Firewalls--WAF, Database Access
Monitors--DAM, network proxies, application logs, full database
audits, database agents and the like). The problem with these
existing approaches is that all of them lack context--they only
monitor incoming network traffic or data, so they are all blind to
encrypted network packets and application caches, and are even blind
to stored procedure calls and data-source result sets (they only see
unformatted data packets traveling over the network, which they try
to parse with limited accuracy).
[0007] Web Application Firewalls (WAF) that are installed between
the clients and the web servers are blind to the data context (as
they see only returning result packets and not the structured
database queries). WAF solutions lack the ability to identify which
response network packets contain sensitive data that needs to be
monitored, and which do not, within the endless flood of packets. In
addition, the returning data stream packets are unstructured and
highly dependent on the client technology.
[0008] Database Access Monitoring (DAM) installed as an in-line
proxy or in sniffer mode detect and audit the SQL request stream as
well as the result set returned to each query, but they are blind to
the user context that is hidden within the application server
domain and not exposed to the database connections that use
standard application connection pools (connecting application
requests to a data source using a pool of connections with a single
generic database user), and not the originating end-user
credentials, and thus cannot provide detection and monitoring of
sensitive data flows.
[0009] Database full audits record everything done in the database,
and hence create a lot of clutter, impose a huge performance
penalty, plus they lack the user information that is hidden behind
the application layer.
[0010] In addition, in cases where the organization has implemented
network encryption, WAF and DAM become totally blind unless they
open the encryption which adds a potential security exploit.
[0011] Application logs provide merely partial information, logging
only data inserts, updates and deletes, and not data reads, missing
all the important information about users accessing data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] The preferred embodiments of the present invention will
hereinafter be described in conjunction with the appended drawings,
provided to illustrate and not to limit the present invention,
wherein like designations denote like elements, and in which:
[0013] FIG. 1A shows an exemplary, non-limiting system which
features an application user 106 communicating with an application
server 100, which based on the user request, sends a data request
to a data source 108.
[0014] FIGS. 1B and 1C show other embodiments of an exemplary,
non-limiting system of the present invention.
[0015] FIG. 2 shows an exemplary, illustrative non-limiting system
200 for constructing and injecting injectable code to provide
real-time monitoring capabilities.
[0016] FIG. 3A shows an exemplary, non-limiting illustrative flow
in which the central policy manager defines policies in one or more
of the request data flow stages, that when translated into code and
injected into the application server operated applications, detect
user interactions with the data.
[0017] FIG. 3B shows a similar flow to FIG. 3A except that the
application server itself performs the detection and determining
whether or not to audit, permit or block the interactions.
[0018] FIG. 3C is another exemplary flow with regard to detection
of user application interactions and either permitting or blocking
such interactions.
[0019] FIG. 4 is an exemplary, illustrative, non-limiting flow
which shows the details of the flow of FIG. 3C in greater
detail.
[0020] FIG. 5 shows the flow of each run time program that is
attached to one of the application programs on the application
server in the run time environment.
[0021] FIG. 6 shows an illustrative, non-limiting example of a
policy tree.
[0022] FIG. 7 shows a non-limiting example of how various stages
may lead to policy trees being generated which in turn may
optionally be used to create a policy code generator.
[0023] FIG. 8A shows a non-limiting example of a system featuring a
run-time environment.
[0024] FIG. 8B shows the application run-time environment of FIG.
8A as previously described in more detail.
[0025] FIG. 8C, again, shows the application run-time environment
with the central management server and, which in this case, also
includes an optional data logger 826.
[0026] FIG. 9 shows a non-limiting exemplary display of the results
of the data exfiltration analysis.
[0027] FIG. 10 shows a non-limiting, exemplary method for
determining the "4V" model score.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
[0028] While various embodiments of the present invention have been
illustrated and described, it will be clear that the present
invention is not limited to these embodiments only. Numerous
modifications, changes, variations, substitutions and equivalents
will be apparent to those skilled in the art without departing from
the spirit and scope of the present invention.
[0029] The term application refers to any program software code,
such as but not limited to a single run-time program or multiple
run-time programs deployed on multiple application servers and/or a
web server, in which an end-user, another application program
interface (API) or a scheduled request sends a request to the
application, which in turn sends a request to receive data from any
source (database, file, data service, micro-service, application
program interface--API, or an Enterprise Application Integration
(EAI) hub) or to apply a transaction.
[0030] An application may optionally be operated by any
computational device, including but not limited to a single server,
a Virtual Server, a laptop, or a mobile device, or on a cluster of
such computational devices.
[0031] The detection system as described herein may optionally
include a computer program code that is injected into a specific
application run-time program code. The injected computer program
code includes the ability to track a user-session thread (referred
to as a "session tracking id") for some or all user requests, from
initiation (for example, by injecting code into the application
servlet program that processes incoming user requests) to
completion (for example, by injecting code into the application JDBC
program that sends and receives the data source requests to a
relational database), linking a single request across the request
execution stages (referred to as "lineage") both in a single
application run-time program or across multiple application
run-time programs. The injected computer program code includes the
ability to collect, analyze, replace or delete all request context
and variables such as system information (environment, time), user
information, application information, request information (e.g.
user computer IP address, requested URL, request headers, cookies
etc.), and data information (e.g. all parsed SQL requests and
result set structure, titles and values, outbound output http
requests and responses, etc.).
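The session tracking described in [0031] can be illustrated with a minimal sketch: a request is tagged with a tracking id at initiation (the servlet stage) and the same id is attached to every downstream data request (the JDBC stage), linking the stages into a lineage. The stage names and record structure here are illustrative assumptions, not the injected code itself.

```python
import uuid

class LineageTracker:
    """Correlates the stages of one user request under a single tracking id."""

    def __init__(self):
        self.events = []

    def start_request(self, user, url):
        # Initiation: a unique session tracking id is minted per user request.
        tracking_id = str(uuid.uuid4())
        self.events.append({"id": tracking_id, "stage": "servlet",
                            "user": user, "url": url})
        return tracking_id

    def record_data_request(self, tracking_id, sql):
        # Completion side: the downstream data request carries the same id.
        self.events.append({"id": tracking_id, "stage": "jdbc", "sql": sql})

    def lineage(self, tracking_id):
        # All stages of a single user request, in execution order.
        return [e for e in self.events if e["id"] == tracking_id]

tracker = LineageTracker()
tid = tracker.start_request("alice", "/customers")
tracker.record_data_request(tid, "select customer_name from table_customer")
```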
[0032] According to at least some embodiments, the present
invention can replace or delete one or more (or even all) request
context and variables in one or more (or even all) instrumented
stages of the run-time program. Replacing or deleting such context
and/or variable(s) optionally comprises masking a value, which
optionally includes providing a dynamic mask, and/or to perform one
of the following: encrypt/decrypt, tokenize/detokenize, hide,
redact, delay or block the sensitive-data-flows and high value
transactions of end-users and/or external program interface (API
calls).
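Of the actions listed in [0032], dynamic masking is the simplest to sketch: a sensitive value in a result set is replaced in-flight before it reaches the user. The mask format (asterisks keeping the last four characters) is an illustrative assumption.

```python
def mask_value(value, keep_last=4, mask_char="*"):
    """Dynamically mask a sensitive string, keeping only the trailing characters."""
    s = str(value)
    if len(s) <= keep_last:
        return mask_char * len(s)
    return mask_char * (len(s) - keep_last) + s[-keep_last:]

def mask_result_set(rows, sensitive_columns):
    """Replace sensitive column values in a result set, per policy."""
    return [{col: mask_value(v) if col in sensitive_columns else v
             for col, v in row.items()}
            for row in rows]

rows = [{"name": "Alice", "card": "4580123412341234"}]
masked = mask_result_set(rows, {"card"})
```

Tokenization, redaction or encryption would slot into the same interception point, substituting a different transform for `mask_value`.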
[0033] The present invention, in at least some embodiments,
provides a computer system that utilizes injected run-time code
into two or more relevant application run-time programs. The
injected code is executed before and/or after sensitive user
requests (such as a URI request) and data requests (such as a SQL
"Select" request) are executed in order to audit and monitor all
data flows and high-value transactions, but can also include adding
injected run-time code into other application run-time
programs.
[0034] The data collected by the injected programs can include the
end-user name, the user request, the user and/or data request
variables, the data request, the result set returned from the data
source to the application when it exists, and/or the user response
when it exists.
[0035] The invention describes a computer system that can inject
run-time code into relevant application run-time programs using
instrumentation on Java, .Net, Node.js, PHP and other programming
languages that employ a virtual machine, as well as by adding code
to the run-time program by changing the run-time code of the
application, through source code changes performed by a programmer
and/or automatically by scanning the application code and inserting
new code in the relevant places, as done for example on C or C++
DLLs and Shared Objects binary program code.
[0036] The injected code is executed before and/or after sensitive
data requests are performed in order to detect, audit and monitor
data flows.
[0037] For example, this can be done by using a post-build process
that scans the source or binary code during development time and
injects the relevant code into the source or binary before it is
deployed; or where a virtual machine transforms/instruments the
binary code before it is loaded into memory and before it is
executed by the virtual machine; or by injecting code into
application files in the filesystem (for example DLLs, Shared
Objects, object code) before they are loaded and executed at
runtime; or by changing the source code of the
application--adding program calls to a central server for sending
event context and receiving changes to the variables and code
executed by the application.
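The effect of each injection technique in [0037] is the same: injected code runs before and after the original program code. A rough Python analogue of runtime transformation is rebinding a function to an auditing wrapper at load time; this is only an analogy for the VM-level instrumentation described, and the names below are illustrative.

```python
import functools

AUDIT_LOG = []

def inject_audit(fn):
    """Wrap a run-time program so injected code executes before and after it."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        AUDIT_LOG.append(("before", fn.__name__, args))
        result = fn(*args, **kwargs)
        AUDIT_LOG.append(("after", fn.__name__, result))
        return result
    return wrapper

def run_query(sql):
    # Stands in for the application's data-access call (e.g. a JDBC request).
    return ["row1", "row2"]

# "Injection": rebind the name so callers transparently hit the wrapper,
# with no change to the calling code.
run_query = inject_audit(run_query)
rows = run_query("select customer_name from table_customer")
```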
[0038] According to at least some embodiments, the above functions
are provided in situations in which program code can be
instrumented, as well as on program code that cannot be
instrumented--using code changes in the application. For example,
such changes may optionally be made manually by programmers or
automatically by scanning the application code and adding the new
code into the code.
[0039] In addition, the detection system as described herein may
optionally include a policy engine that can be used by security
administrators or can be loaded with policies using a program or an
application program interface (API), or by simply importing
predefined policies.
[0040] Another unique capability of the system is its ability to
transform the policies into run-time program code (for example by
using the policy-to-code generator, in FIG. 8A), and inject these
new policies into the application run-time programs code, with no
need for any application restart or source code changes. The update
to the instrumented agent's run-time code is optionally performed
every week/day/hour/minute, or based on administrator request or
based on the agent time schedule.
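The policy-to-code idea of [0040] can be sketched as compiling a declarative policy into an executable check that can be swapped in without a restart. The policy schema (action/column/threshold) is an illustrative assumption; the patent's generator emits injectable run-time program code rather than a Python closure.

```python
def compile_policy(policy):
    """Translate a declarative policy into an executable enforcement check."""
    def enforce(request):
        # Block only when the request touches the governed column AND
        # exceeds the permitted row threshold; otherwise allow.
        touches = policy["column"] in request.get("columns", [])
        too_big = request.get("rows", 0) > policy["threshold"]
        return policy["action"] if touches and too_big else "allow"
    return enforce

# A new policy takes effect simply by compiling and rebinding the check.
check = compile_policy({"column": "ssn", "threshold": 100, "action": "block"})
```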
[0041] The policies can optionally audit, monitor, dynamically mask,
encrypt/decrypt, tokenize/detokenize, hide, redact, delay or block
the sensitive-data-flows and high value transactions of end-users
and/or external program interface (API calls) within the user
request, data request, data response from the data source and user
response returned to the end-user or application program interface
call, as well as possibly on other programs--as specified by the
policies.
[0042] Applying policies (that is, particular restrictions or
functions) in specific application run-time programs as instructed
by the policies in advance or dynamically on-demand enables the
system to have very wide functional possibilities, while imposing
low performance penalties or overhead, and increasing the
application security posture in a transparent way.
[0043] As described herein, terms such as "user request" and the
like are understood to involve actions taken through a user
computer (e.g., API call) and/or by a user application.
[0044] Turning now to the Figures, as shown in FIG. 1A, there is an
exemplary, non-limiting system which features an application user
106, operated by a computational device (not shown), communicating
with an application server 100, which based on the user request,
sends a data request to a data source 108. Data source 108 may be a
database system, file system, EAI, or any other source of sensitive
data.
[0045] During its typical operation, user application 106 would
send a request to the application server 100 for some type of
function, which would involve some type of sensitive data
processing and/or sensitive data retrievals (and exposures). This
would form the user request as shown. Application server 100 then
sends data requests to data sources 108 as a result of the user
request, in order to retrieve sensitive data or perform the
functions or to otherwise satisfy the user request (for example, in
an application working with a relational database, the user sends a
request in a URL form to an application, the application sends a
corresponding data request in a SQL format to the database, such as
"select customer_name from table_customer"). Data source 108 then
responds to application server with the data responses (in
relational database, this is defined as "result set" that is
combined from various columns, which some of them might be
sensitive)
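Because the agent sees the structured SQL request itself (unlike a WAF parsing raw packets), even a naive parse recovers the columns in play. The sketch below uses the example request from [0045]; the regex and the sensitive-column classification are illustrative assumptions.

```python
import re

# Assumed classification of which columns are sensitive, e.g. from a policy.
SENSITIVE_COLUMNS = {"customer_name", "ssn", "card_number"}

def extract_columns(sql):
    """Naively pull the selected column list out of a simple SELECT."""
    match = re.match(r"\s*select\s+(.+?)\s+from\s", sql, re.IGNORECASE)
    if not match:
        return []
    return [c.strip() for c in match.group(1).split(",")]

def sensitive_columns(sql):
    """Return the requested columns that the policy classifies as sensitive."""
    return [c for c in extract_columns(sql) if c in SENSITIVE_COLUMNS]

flagged = sensitive_columns("select customer_name from table_customer")
```

A production agent would hook the parsed statement inside the data-access layer rather than re-parse SQL text with a regex.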
[0046] Application server 100 would then send a response to the
user that is to application user 106 (this is commonly referred to
as "output"). For the purpose of the present invention, at least in
embodiments, application server 100 also features an application
agent 110. Application agent 110 collects all data related to the
user's interaction as collected in injected real-time programs 101,
102, 103, and 104. The user request processing in the application
server 100 is characterized by a specific session tracking id
which is unique to the user request, and which is collected by the
application agent 110 in addition to other run-time variables and
system, user, application, session and request variables from the
application or from other programs and Application Programming
Interfaces (APIs). The application server agent 110 optionally and
preferably collects rich-context on each user
request/report/program, from the application run-time program as
well as from external sources including but not limited to
environment context (such as time, date), application context
(application name, type), user context (name, role, LDAP attribute,
hire date, resignation date), session (IP, geolocation, device,
custom variables), request context (type, value, variable value,
objects, result set size, result set value, output value). This
additional context is used to detect the sensitive data flows from
the rest.
[0047] Optionally and preferably, application agent 110 may access
one or more of the following variables (intended as examples only
and not to create a closed list): [0048] a. Client and user
information, such as user name, host name, client IP, OS user, geo
location, role, LDAP/ActiveDirectory attributes, terminal [0049] b.
The request, such as time of request, request type, any of the
request parameters [0050] c. The SQL request to the data source,
the SQL type (select, update, insert or delete), instance (may be
relevant, for example, for clustered environments, to represent an
instance on which the execution is taking place), module, schema,
objects, columns, owner, application, host, bind variables and any
other variables added to the SQL request, request condition, group
by, order by and any other part of the SQL request, [0051] d. The
result set returned from the data source to the application,
including result set headers, column names, values, error message
or any type of database response [0052] e. The user response, which
includes HTML, JSON, XML files or a program interface that is
returned to the user [0053] f. The user behavior profile, or a peer
behavior profile, user status, responsibility and role [0054] g.
Environment context, such as request's date and time
[0055] Application agent 110 is operated by application 100 (which
may also optionally be a virtual machine such as Java, .NET, PHP and
the like) and intercepts requests from user application 106 or from
external application programs, and other direct communication with
data source 108 as described above. This interception is necessary
in order to determine which data is sensitive. Real-time programs
101, 102, 103, 104 are preferably executed in-line, more preferably
before every application program-code that is handling user
requests, such as user agent 106, as well as any application
program-code that is interfacing with a data source, such as
application server 100 and/or software operating data source 108.
[0056] Application agent 110 is therefore able to collect and
correlate (by identifying and matching the user request session
tracking id) the sensitive user and data requests in each one of
the run-time programs 101-104, detecting the related flows of
sensitive data (referred to as "lineage"). These flows are
initiated by a user operating user application 106, followed by the
application 100 generating a set of related data-source requests
submitted to the data sources 108, followed by the results returned
from the data sources to the application 100, and the application
outputs that are returned to the user through user application
106.
[0057] That is how the operation of user application 106, the
operation of the application itself, and application agent 110
correlate to cause sensitive data to be extracted from or written
to data source 108.
[0058] Data Flow Monitoring across application servers according to
at least some embodiments is now described. When multiple
application servers 100 are working in-line to process the user
request one after the other, each application with a different
session tracking id, the application agent 110 in each application
server 100 identifies the requests by adding a unique session
remark that identifies the original user request's session tracking
id into the communication between the two (or more) application
servers. Non-limiting examples include adding a header to the
request between the parties in the case of HTTP requests, adding a
remark into the Application Program Interface (API) calls,
detecting a unique identifier (such as client IP) that is
transmitted from one application server to the other, or
collecting the variables, session tracking id and time stamp for
each request in both application servers and correlating them by
comparing their time stamps and request variables. The application
server 100 that receives the remark in the communication parses it.
These added remarks identify the original user session across
different application servers, as the remark is detected and
identified by each application agent 110 on each application server
100 respectively.
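The remark scheme described above can be sketched as follows; this is a minimal illustration, and the header name and function names are hypothetical, not taken from this application.

```python
import uuid

# Hypothetical header name carrying the original session tracking id.
CORRELATION_HEADER = "X-Session-Correlation-Id"

def tag_outbound_request(headers, session_tracking_id=None):
    """Agent on the sending application server: add a remark (header)
    carrying the original user request's session tracking id, unless one
    is already present (i.e., this server is not the first hop)."""
    if CORRELATION_HEADER not in headers:
        headers[CORRELATION_HEADER] = session_tracking_id or str(uuid.uuid4())
    return headers

def parse_inbound_request(headers):
    """Agent on the receiving application server: parse the remark so the
    local request can be correlated with the original user session."""
    return headers.get(CORRELATION_HEADER)
```

In use, the agent on the first server tags the outgoing request, and the agent on the second server parses the same id, so both servers report events against one session.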
[0059] For example, an application server executes the requests,
which then are sent for further processing to a secondary
application server or a microservice (Application Server 110 in
FIG. 1B) using for example, a web service call or an Application
program interface (API) (requests 102-111, and 113 to 103).
[0060] Policy to code generator 403 provides multiple sets of
injected run-time programs for each application server or
microservice instance (Application Server 110 in FIG. 1B). The
injected program 102 adds a marker to the requests that are sent
from Application Server/microservice 100 to Application
Server/microservice 110 by adding a unique ID inserted into the
request header, request body, or a request variable, or into
preceding or following requests.
[0061] Another option is to use an existing request identifier
(such as source client IP address) that appears in request header
or request body that is compared in each one of the injected
programs in each one of the application servers 100 and 110.
[0062] The unique ID or the unique identifier (such as source
client IP address) is received by the injected program code in 111
thus linking the request to the one originated in server 100.
[0063] The request information from both injected programs in
server 100 and server 110 is sent back to the computer program
and/or to the application server 100 injected program code (103) by
returning the ID in either the header or body, or in the request
variables of preceding or following requests of the response
between application server 110 and application server 100.
[0064] FIG. 1C shows another embodiment of an exemplary,
non-limiting system of the present invention. In this embodiment,
optionally application agent 110 is in application server 100, but
additionally or alternatively there is present a sensor 400 which
sits between user application 106 and application server 100.
Another option is adding a network sensor 401 which sits between
application server 100 and data source 108.
[0065] Correlating the context regarding a user request event
collected in the network sniffer 400 with the context collected by
the application agent 110 is performed by adding a unique id. The
terms "sniffer" and "sensor" are used interchangeably.
[0066] In this instance, application agent 110, or another external
software module (not shown) external to application server 100,
collects data from sensors 400 and 401. Such data optionally
includes all data generated from the users, from the application,
from the interaction of user application 106 as detected by sensor
400 and also as transmitted to or from data source 108 as collected
by sensor 401.
[0067] Sensor 400 can be either a network sniffer or the sensor can
be added to selected user end-points by adding a run-time code to
the user response which is sent to the end-point (stage 4) (such as
adding a Java script to the HTML output sent to a user end-point
device). The run-time code sent to the end-user device is executed
on the user device in order to collect mouse movement and
keystroke data, or challenge the user identity with additional
validation actions, or to perform any other activity on the
endpoint.
[0068] FIG. 2 shows an exemplary, illustrative non-limiting system
for constructing and injecting injectable code to provide real-time
monitoring capabilities. System 200 features a central policy
manager 400, a policy editor 402, policy to code generator 403, and
connections to a sensitive data ex-filtration events repository
405, and a policy repository 404. In this figure system 200 is
expanded to show what occurs after detection in the case where code
is to be actually injected into the real-time applications. As
previously described with regard to FIGS. 1A and 1C, detection may
optionally occur within the application agent 110 and/or with one
or more network sensors that are present between user application
106 and the data source 108. However, in this case, preferably central
policy manager 400 features a policy editor 402.
[0069] Policy editor 402 can receive some information about
sensitive data flow samples, sensitive data source objects,
application user names, roles and responsibilities, organization
structure, patterns that match sensitive data and privacy
regulations and then creates and edits the policies. Another
option is to import into the policy editor a predefined set of
policies that have been purposely built for the application,
organization or use case at hand (for example, a list of policies
for detecting sensitive client information exposure in a Customer
Relationship Management (CRM) application).
[0070] These policies can then be stored in the policy repository
404. In addition, cases in which sensitive data has been detected
may then optionally be stored in sensitive data ex-filtration events
repository 405. After policy editor 402 has been used to create the
policy, then policy to code generator 403 preferably generates
injectable code for detecting sensitive data flows or sensitive
data transactions (adding, changing or deleting data, referred to
as "monitoring") and optionally enforcing transactions according
to the policy. Optionally, such injectable code is not used, but if
present, injectable code is preferably inserted into each of the
run-time application programs 101, 102, 103, and 104, thereby
allowing each stage of the information flow between the application
user (for application 106) and the data source 108 to be accurately
monitored. Activity through application server 100 itself can also
be monitored for the presence of sensitive data and/or of the
presence of a request for such data, preferably even before the
sensitive data itself becomes present (for example, detecting
sensitive information request in the user request stage,
identifying that a specific URI and a set of parameters causes
sensitive application data to be exposed from the data source to
the requestor 101). Such injectable code is optionally and
preferably used for greater real-time responsiveness to sensitive
data requests and to the presence of sensitive data.
[0071] The Central Policy Manager (400 in FIG. 2) defines and edits
policies. The Policy to Code generator (403 in FIG. 2) generates
the run-time code and injects it into the appropriate stage's
run-time programs within one or more applications.
[0072] The injection can be initiated by the central server (400 in
FIG. 2) and/or by one or several of the instrumented run-time
programs (FIG. 2 101, 102, 103, 104).
[0073] FIG. 3 shows an optional distributed configuration. In the
distributed configuration option, no central server is needed as
one or several of the run-time programs operate as a central
server. This option also applies to the case when the configuration
is stored in a database or in a file, and is accessed by the
runtime programs.
[0074] FIG. 3A shows an exemplary, non-limiting illustrative flow
in which the central policy manager defines policies in one or more
of the request data flow stages, that when translated into code and
injected into the application server, detect user interactions with
the data. Optionally such flows may include performing one or
more actions to determine whether such interactions include
exposure of sensitive information, and can apply actions, such as
auditing the request, results and any other relevant context into
the Sensitive data exfiltration events repository 405, blocking the
request, alerting or notifying appropriate administrators, or
redacting/anonymizing/masking/encrypting/decrypting/tokenizing/de-tokenizing
some or all of the data that otherwise would have been returned to
the user (collectively referred to as "blocking" in the following
texts).
[0075] In stage one the user application interacts with the data,
for example, running a request to receive sensitive VIP client
data. In stage two, central policy manager detects the sensitive
data output that the user has performed by detecting one or more
requests or other interactions of the user application with the
data. In stage 3 the central policy manager applies one or more
policies to the data request; as previously described such policies
need to be created in advance according to the presence of
sensitive data. In stage 4, the central policy manager audits the
request, results and any other relevant context into the sensitive
data exfiltration events repository 405 (the term "audit" relates
to this activity), and optionally
blocks/redacts/anonymizes/masks/encrypts/decrypts/tokenizes/de-tokenizes
some or all of the data of the request according to the policy. In
order for this to occur, of course, central policy manager must be
able to detect all user interactions with the data through the user
application or at least all relevant user interactions. In stage 5,
if permitted, the user application continues the data interactions.
If, however, they are blocked, then further data interactions do
not occur, including, for example, the delivery of sensitive data.
[0076] FIG. 3B shows a similar flow, except that the application
server itself performs the detection and determines whether to
audit, permit or block the interactions. Again, in stage one, the
user application interacts with the data, but now in stage 2 the
application server detects such user interactions with the data,
again by detecting the request or other user interaction with the
data. In stage 3, the application server applies one or more
policies to the data request as previously described with regard
to FIG. 3A, except that now the application server applies such
policies rather than the central policy manager. In stage 4 the
application server audits,
blocks/redacts/anonymizes/masks/encrypts/decrypts/tokenizes/de-tokenizes
some or all of the data of the request, and if permitted, as
previously described with regard to FIG. 3A, in stage 5 the user
application continues with the data interactions. If the
interactions are not permitted then they are blocked and no further
interactions may occur.
[0077] FIG. 3C is another exemplary flow with regard to detection
of user application interactions and either permitting or blocking
such interactions. Stage 1 is as for FIGS. 3A and 3B. Now, however,
in stage 2, one or more pieces of injected code applied in one or
more stages detect user interactions with the data based on the
policies that have been defined in the policy editor, transformed
into run-time code and sent to the agent, for instrumenting in one
or more stages. The injected code is preferably present, as
previously described, in the actual run-time code of the user
application and/or of the data resource and/or of an application or
code being run on the application server. The injected code is able
to detect user interactions with the data in real time, that is, as
part of the normal application program code operation which would
in any case occur due to the user application interactions with the
data. In stage 3 the injected code analyzes the data request
according to one or more policies. Again, the policies need to be
previously set, transformed into run-time code and added to the
injected code, which is appended to the application stage processing
logic to make the determination in real time, in run-time, as part
of the regular flow of the user application interactions. In stage 4
the injected code determines whether to
block/redact/anonymize/mask/encrypt/decrypt/tokenize/de-tokenize
some or all of the data of the request. If the request is permitted
then the user application continues the data interactions to stage
5 otherwise no further data interactions are permitted and the
process stops as previously described.
[0078] FIG. 4 is an exemplary, illustrative, non-limiting flow
which shows the details of the flow of FIG. 3C in greater detail.
In stage 1 the user request, user application requests, scheduled
requests, or program call is sent to the application. The
application in this case may optionally be the user application, an
application run on the application's server and/or a data source
application. In stage 2, before the relevant application program
that handles the incoming requests is invoked, the previously
described policies are first applied to, and executed on, the
user request, scheduled request, and/or program call. In stage 3,
the relevant application programs that handle the incoming requests
are invoked. In stage 4, optionally other programs may be invoked for
processing the user request. In stage 5, however, before the data
request is handled by the relevant programs, for example at the
database or data-file source or before other data resource may be
invoked, all policies assigned to the data requests are preferably
executed. As previously described, such policies may optionally
only be applied to a subset of requests (to minimize performance
overhead), selected by one or more combinations of variables, such
as the user application, the application server, the data source,
the request type, data source objects, the size of the result set
or the 4V score of the request. The term "4V score"
relates to a particular scoring system which includes data
sensitivity and other aspects of the request, which uses various
variables and values to compare to one or more policies, as
described in greater detail below with regard to FIGS. 9 and
10.
[0079] Next, in stage 6, the relevant programs that handle the data
request, now in this case to the database and/or to the file and/or
to the data source are invoked. In stage 7 the database and/or file
and/or data source processes the requests and returns a response to
the invoking application. In stage 8, however, before the relevant
applications that handle the returning response may be invoked, one
or more and preferably all policies assigned to this particular
stage are executed. This is the stage of the data response and so
optionally policies may only be applied to this stage and/or
specific policies may only be applied to this stage. These policies
can
block/redact/anonymize/mask/encrypt/decrypt/tokenize/de-tokenize
some or all of the data of the response. In stage 9 the relevant
application programs that handle the returning response are
invoked. In stage 10, more application programs may be invoked that
are necessary to handle this response. Again, before the relevant
applications that can handle these responses may be invoked,
policies assigned to the user response stage are preferably
executed in stage 11. These programs can
block/redact/anonymize/mask/encrypt/decrypt/tokenize/de-tokenize
some or all of the data of the response. In stage 12 relevant
application programs are invoked to complete this stage and in
stage 13 the user request, scheduled request, and/or application
requests receives a response.
[0080] Optionally at any and all stages where a policy is invoked
the entire process may be stopped or at least the process of
permitting data and/or a particular request or response to be
transmitted may optionally be stopped.
[0081] The central policy manager also can collect, analyze,
modify, replace or delete all request context and variables, such
as system information (environment), user information, application
information, request information (e.g. user computer IP address,
requested URL, request headers, cookies, end user mobile device
fingerprint, etc.), and data information (e.g. all parsed SQL
requests and result set structure, titles and values, outbound
output HTTP requests and responses, etc.). For example, the central
policy manager can replace a certain result set value returned from
the data source with `xxx`, applying dynamic data masking. The same
result can be achieved by having the central policy manager replace
the user response HTML, JSON or XML content returned to the
end-user, or by rewriting the data source request--the SQL request
sent to the database.
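The dynamic data masking described above, replacing a protected result set value before the response reaches the user, can be sketched as follows; the function and policy shape are illustrative assumptions, not the application's actual interface.

```python
def mask_result_set(rows, columns, policy):
    """Apply dynamic data masking to a result set returned from the data
    source: for every column named in the policy, replace the value with
    the policy's mask (e.g. 'xxx') before the response reaches the user.

    rows: list of tuples; columns: list of column names;
    policy: {column_name: mask_value} (hypothetical representation).
    """
    # Map column positions to their mask values.
    masked = {columns.index(col): mask for col, mask in policy.items()
              if col in columns}
    return [tuple(masked.get(i, val) for i, val in enumerate(row))
            for row in rows]
```

The same per-column lookup could equally be applied when rewriting the user response (HTML/JSON/XML) or the SQL request itself, as the paragraph notes.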
[0082] Similarly, encryption or tokenization can be performed on
all injected stages--encrypting/decrypting/tokenizing or
de-tokenizing values in several or all of the instrumented
stages--such as replacing the URI request parameters, and/or
rewriting the data request command, and/or replacing the returned
value or replacing the returned output values.
[0083] FIG. 5 shows the flow of each run time program (injectable
code) that is attached to, or injected within, one of the
application programs on the application server in the run time
environment. These real-time programs were shown in FIGS. 1A, 1C,
and in FIG. 2, as run-time programs 101 to 104. This flow in FIG. 5
optionally and preferably relates to the flow of the injected code
which is added to the run-time program. The process starts at stage
1. Next, in stage 2, before each stage program code (that is, the
code that is related to the processing stage of each request) may
be invoked, optionally and preferably one or more policies is
applied, comprising conditions on a set of variables that, if
evaluated true, cause a set of actions to be applied, such as, for
example, auditing, permitting, or
blocking/redacting/anonymizing/masking/encrypting/decrypting/tokenizing/de-tokenizing
a particular request or transmission of data. In stage 3, if the
policy condition includes an accumulative risk score or a
sensitivity score (or other type of scoring), and that score has
been triggered, then in stage 4A a risk evaluation is performed
based on the 4V model (described in greater detail with regard to
FIGS. 9 and 10), and an updated request sensitivity variable is
noted. If not, then in stage 4B an action is applied, which may do
nothing, audit, alert, stop, mask, redact, hide, encrypt, decrypt,
tokenize or de-tokenize. Also, after stage 4A the process preferably
continues automatically to stage 4B.
[0084] Stages 4A and 4B are preferably performed in order to
determine whether cumulative risk has occurred. For example,
optionally, a user may be either permitted or even required in the
course of his or her daily work to check data on a certain number
of VIP clients, even up to 1000 VIP clients if that is required for
the employee's work. However, if the employee were instead to
request data on a far greater number of VIP clients--5000 or
10,000--this may optionally be tagged as suspicious. For other
employees, requesting data on more than 10 VIP clients of the
system may optionally be considered to be suspicious. Therefore,
the sensitivity variable preferably considers the sensitivity of
the type of data being requested and also the role of the employee
making that request, to determine whether this is a usual request
or whether, perhaps, this request should be flagged as potentially
problematic. In stage 5, variables may be added or modified, and/or
session or other custom variables may optionally be invoked, which
could be used by the filter in other stages. This, for example, may
optionally occur if an admin or other employee is required to
update the process on the fly.
[0085] Detailed Example of an Encryption/Tokenization Policy:
A policy for data encryption is defined, containing the following
information (for a data source of type relational database):
1. Name of the column (and additional metadata to identify the
column, such as table, schema, catalog, database identification)
2. Policy type--Encryption/tokenization
3. Type of encryption/tokenization to apply, including format, if
using format preserving encryption, or specifying the API to an
encryption/tokenization service.
4. Additional conditions that will be evaluated by the run-time
program before the encryption/decryption occurs. For example,
decrypting or revealing certain special or VIP customer names only
to users with a certain IP address, geolocation, position etc. Only
these users will see decrypted VIP customer names, while other
unauthorized users will see encrypted VIP customer names. For
example:
[0086] Given a policy to act on CreditCardTable.CreditCard and
secure it at rest, the injected code applies that policy on every
INSERT/UPDATE (throughout the system), and applies the policy for
READ access only for authorized user ADAM:
1. Select CreditCard, Email from CreditCardTable
[0087] a. For user ADAM, decrypt the result set; leave the result
encrypted for all others.
2. Insert into CreditCardTable values(`5865-3443-2323`,
`myBrokenEmail@yahoo.com`)
[0088] a. For all users, the injected program will trigger an
encryption on `5865-3443-2323`, which will rewrite the query before
it is executed.
3. Insert into CreditCardTable values(?, ?)
[0089] a. When executed with the values `5865-3443-2323` and
`myBrokenEmail@yahoo.com`, the value of the credit card will be
sent encrypted in the binding statement.
An alternative implementation is to use encryption/tokenization
stored procedures and re-write the queries to use the data source
stored procedures that will perform the encryption/decryption. This
will be done in the application server "data request" injected
run-time program to manipulate the statement before it is sent to
the database server, and based on the current CONTEXT in the
request (much the same way as before). For example:
1. Select CreditCard, Email from CreditCardTable
[0090] b. Would be re-written into: Select Decrypt(CreditCard),
Email from CreditCardTable
4. And the insert
[0091] a. Insert into CreditCardTable values(`5865-3443-2323`,
`myBrokenEmail@yahoo.com`)
[0092] b. Into: Insert into CreditCardTable
values(encrypt(`5865-3443-2323`), `myBrokenEmail@yahoo.com`)
5. Also Insert into CreditCardTable values(?, ?)
6. Will be rewritten into Insert into CreditCardTable
values(encrypt(?), ?)
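The stored-procedure rewrite above can be sketched with simple string rewriting; this is a minimal illustration that assumes the protected column is the first value in the INSERT, and the `Decrypt`/`encrypt` procedure names follow the example queries. A production rewriter would use a real SQL parser.

```python
import re

# From the policy: which columns of which tables are protected (example).
ENCRYPTED_COLUMNS = {"CreditCardTable": ["CreditCard"]}

def rewrite_select(sql):
    """Wrap policy-protected columns in the data source's Decrypt()
    stored procedure (for an authorized user such as ADAM)."""
    for table, cols in ENCRYPTED_COLUMNS.items():
        if table.lower() in sql.lower():
            for col in cols:
                # \b keeps 'CreditCard' from matching inside 'CreditCardTable'.
                sql = re.sub(r'\b%s\b' % col, 'Decrypt(%s)' % col, sql, count=1)
    return sql

def rewrite_insert(sql):
    """Wrap the protected column's value (literal or bind variable ?) in
    encrypt(); assumes the protected column is the first value listed."""
    m = re.search(r'values\s*\(', sql, re.IGNORECASE)
    if not m:
        return sql
    head, rest = sql[:m.end()], sql[m.end():]
    first, sep, tail = rest.partition(',')
    return head + 'encrypt(' + first.strip() + ')' + sep + tail
```

This matches the transformations shown in items 5 and 6 above: the bind-variable form `values(?, ?)` becomes `values(encrypt(?), ?)`, so the driver sends the credit card value through the encryption procedure.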
Encrypting/Tokenizing API Application Requests
[0093] The injected run-time program identifies API calls or data
source calls using URL and xpath/Jsp.path that contain elements
that need to be encrypted/decrypted, plus the format/keyset to use
for automatic encryption. The injected run-time program replaces
the original values with encrypted/tokenized values or with
decrypted/de-tokenized values.
[0094] FIG. 6 shows an illustrative, non-limiting example of a
policy tree, shown as policy tree 600. Each policy is preferably
built from a list of conditions. This list of conditions may
optionally be structured and invoked in the form of a tree. For
example, some conditions may require more than one variable
condition to be true to continue, while others may require such
conditions to be false to continue. If all conditions in a policy
are evaluated to be true, then optionally and preferably an action
is applied, such as an audit, an alert, a block, encrypt, tokenize,
etc. The policy preferably includes an attribute to set further
policy processing: continue policy processing, bypass the next
policy in the stage, and/or bypass all other policies in all
stages, for example. Each policy may also optionally include an
ability to define a variable with a value or populated with the
result of a program. This variable can be used by other policies in
the policy trees within the specific user request, the entire user
session (that may include multiple requests), or in some or all
sessions of all users (referred to as the setting of a global
variable). Policies are optionally and preferably structured as a
tree of policies such that each policy is preferably evaluated from
the top down.
[0095] Child policies are preferably visited only after the parent
policy is evaluated to be true. If a policy is evaluated to be
false, the evaluation may optionally continue to the next policy at
that level, rather than continuing to a child policy. So, for
example, in the policy tree as shown, policy 1 (602) is evaluated
first; the process continues to evaluate policies 1.1 (604) and
then 1.2 (606) only if the condition of policy 1 (602) is valid.
Similarly, with regard to policy 1.1 (604), the process continues
to evaluate the sub-policy 1.1.1 (608) only if the policy condition
of 1.1 (604) is valid, and so forth. With regard to policy 2 (610),
however, which is not a sub-policy of policy 1 (602), this policy
is optionally visited and processed if policy processing was set to
continue in all previous policies and/or if the previous policies'
conditions were evaluated false. This structure is made so that
policy processing does not continue under conditions where it is
clearly not correct, and on the other hand also permits
hierarchical processing of policies according to the admin's or
other user's preferred structure.
[0096] FIG. 7 shows a non-limiting example of how various stages
may lead to policy trees being generated which in turn may
optionally be used to create a policy code generator. In FIG. 7
there is a system 700. System 700 optionally and preferably
includes a user request stage, a data request stage, a data results
stage, and a user results stage. Each such stage has a policy tree
600--meaning a set of policies applied on each stage in the form of
a tree as previously described (numbering follows FIG. 6). The
policy trees 600 are optionally and preferably created, and then
used to inform policy to code generator (702) so that code programs
can be generated from the set of policies to be executed before the
appropriate stage program code. So then for example, policy to code
generator 702 may optionally generate code for the user request
stage which would occur just before the program code which is
performed in the stage.
[0097] The application admin can optionally define multiple trees
600 of conditions and actions to be injected into application
run-time programs for performing each stage. For example, for the
stage data result, optionally and preferably, the code can be
injected immediately before the JDBC driver receives the request.
The general run-time code is then injected into each of the
application codes at each stage to be certain that the policies are
correctly applied at run-time.
[0098] FIG. 8A shows a non-limiting example of a system featuring a
run-time environment. A system 800 shows an example of an
application run-time environment 802 with again the user request
stage, the data request stage, data results stage, and user results
stage. Each such stage features a stage program code (812) into
which real-time programs (804) are injected to instruct the
programs with regard to one or more actions (810), such as bypass
all or partial policies in the stage--or bypass all stages
(primarily to improve performance related to non-sensitive data
requests), audit, alert, encrypt, decrypt, tokenize, detokenize or
block, and which in turn are invoked according to filters and
conditions (808). All of this code is preferably created by the
policy to code generator 403 as previously described, which is shown
again in the central management server and in the policy manager
400. The central management server is therefore able to define
policies, transform them into run-time program code, and propagate
these policies to the different stages within the application
run-time environments across multiple application servers. Policy
manager 400 communicates with application run-time environment 802
through a network 806.
[0099] FIG. 8B shows the application run-time environment of FIG.
8A as previously described in more detail, in communication with
the central management server (820). Central management server
(820) in this case also features a batch context module (822) and
batch operation (824) for the analyzer. Every variable and risk
evaluation and the 4V model risk score and cumulative risk score
calculation can be performed in real-time, and/or in batch mode (to
reduce the overhead and time wait of the online request that awaits
the response of the calculation within the policy tree processing).
What processing will be done in real-time on the application
server, and which processing will be performed in batch mode is
determined either by the administrator, by setting performance
thresholds, based on previous processing statistics, based on
server utilization, time of day, processing time or any other
available variable or a combination of these.
[0100] Central management server (820) or real-time server (814)
preferably receives information from third party sources (844) such
as LDAP databases and human resources applications. The reason for
this is that some variables are created within each user request
and some variables are created only periodically, such as every
hour, day, week and so forth (thus the LDAP request, which takes
time and resources, is optionally and preferably not executed per
user request, which could generate high load on the LDAP server,
but performed once every hour/day/week or per session), where the
results would be populated in a variable for concurrent use by the
various run-time program policies. When a certain variable within a
condition requires "batch" calculation and thus cannot be validated
in real-time the variable would use a previous or predefined value
which enables real-time in-stream evaluation as not to cause any
delays to the stage processing. The batch variables can include an
integration with third-party identity management systems or with
LDAP services, to extract user role in organization assignments
which optionally and preferably can be done once a day.
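The periodically refreshed ("batch") variable described above, for example a user's LDAP role resolved once a day rather than per request, can be sketched as a cached value with a time-to-live; the class and parameter names are hypothetical.

```python
import time

class BatchVariable:
    """A context variable (e.g. a user's role from LDAP) that is too
    expensive to resolve per request: the loader runs at most once per
    `ttl` seconds, and until then the previous (or a predefined default)
    value is returned so policy evaluation does not wait."""

    def __init__(self, loader, ttl=86400, default=None):
        self.loader = loader        # e.g. a function issuing one LDAP query
        self.ttl = ttl              # refresh interval, default once a day
        self.value = default
        self.fetched_at = None

    def get(self):
        now = time.time()
        if self.fetched_at is None or now - self.fetched_at >= self.ttl:
            self.value = self.loader()
            self.fetched_at = now
        return self.value
```

Concurrent run-time program policies then read `get()` freely; only the periodic refresh touches the LDAP server, matching the once-per-hour/day/week behavior described above.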
[0101] In other situations, the central management server may
optionally need to check each variable and compare all of the
information that the users are exposed to. This could take time and
may not be done in real-time but offline. For example, in order to
calculate the sensitivity score of a result set, the respective
instrumented run-time program code or the relevant stage would send
the result set to the batch 4V request score analyzer. The batch
process analysis would calculate the score, which would then be
returned as a variable value back to the run-time
program code (the real-time server 814) as an accumulative risk
related variable for previous or following user request stages or
for previous or following user requests that meet certain variable
conditions. Real-time server 814 preferably operates a real-time
context determination module 816, for determining context for real
time requests; and a real-time request score analyzer 818, for
determining the score for requests, to determine whether they are
acceptable (optionally the 4V score is used as described with
regard to FIGS. 9 and 10).
[0102] Another option is to define, in the agent operating at the
application server, a Bloom filter data structure containing, for
example, 10 million blacklisted IP addresses. The set defined in the
Bloom filter can be populated either by the agent itself (e.g., by
having the agent connect to a service that returns a list of
blacklisted IPs), or by the central policy engine.
[0103] The agent can then use the Bloom filter in real time to test
whether an element is a member of a set. In this example, if the IP
of the device logging into the application is found among the
blacklisted IP addresses, then the element "IP address" is a member
of the blacklisted set.
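A minimal Bloom filter of the kind described above can be sketched as follows. The sizing parameters and sample IP addresses are illustrative only; a production agent would size the bit array and hash count for the target false-positive rate at 10 million entries.

```python
import hashlib

class BloomFilter:
    """Space-efficient set membership test: no false negatives, and a
    tunable false-positive rate. Suitable for holding millions of
    blacklisted IPs in the agent's memory."""

    def __init__(self, size_bits, num_hashes):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive k independent bit positions by salting a SHA-256 digest.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Populate with a tiny sample of blacklisted IPs.
blacklist = BloomFilter(size_bits=100_000, num_hashes=5)
for ip in ["203.0.113.7", "198.51.100.23"]:
    blacklist.add(ip)

blocked = "203.0.113.7" in blacklist   # a listed IP always tests positive
```

A membership hit means "possibly blacklisted" (the agent may then confirm against the authoritative list), while a miss is a guaranteed negative, which is what makes the structure safe for real-time filtering.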
[0104] FIG. 8C again shows the application run-time environment with
the central management server, which in this case also includes an
optional data logger 826. The data logged by the agent is sent to the
sensitive data ex-filtration events repository 405, which was already
shown in FIG. 2. The system preferably collects the ex-filtrated data
as a result set; this result set and the request context are sent to
sensitive data exfiltration repository 405, which may then optionally
be used for the sensitivity model, and in particular for the batch
context and batch 4V request score analyzer.
[0105] FIG. 9 shows a non-limiting exemplary display of the results
of the data exfiltration analysis. This screen presents a type of
analysis produced by the system in which the user's exfiltration is
scored based on each individual transaction, and optionally compared
to that of his/her peers. If each transaction is scored individually
and the scores are then added up, they would preferably add up to
100, accumulated on a timed basis. This is preferably analyzed on a
logarithmic scale in order to normalize the data and easily identify
spikes. The user is then compared to his or her peers; for example, a
user would be compared to his or her peers in the same department,
and in addition alarms would be raised if a user deviates
significantly from the average or median user, and particularly from
the 95th percentile user, in his or her department. The data is
preferably graphically presented with regard to ex-filtration
vectors, data access, roles, and/or applications used. FIG. 3A shows
a more detailed optional exemplary method for assisting a user to
determine which data is sensitive if an application agent is being
run.
[0106] The application agent maintains a deterministic lineage
between the user request, the user response, the data source request,
and the data source response. This is because the application agent
records all of the functionality, both request and response, from the
user application, from the data source and, of course, from the
application server itself.
[0107] This deterministic session tracking id, collected by the
application agent, enables the lineage to be determined whether in a
production environment or a non-production environment, and with any
number of users.
[0108] The system then collects the application session tracking id
for each user request, data request, data response, and user
response. By collecting this series of session tracking ids, the
system is able to compare the session tracking ids across the
run-time programs that process the request, and across the network
events as they are recorded by the sensors 400 and 401 in FIG. 1C.
All events with the same session tracking id represent a single
user's request interactions; thus the lineage containing these
events, and the connection between the events, is accurately
identified, preferably across ONLY the four sensitive data flow
stages (as presented in FIG. 1A: 101, 102, 103, 104). The focus on
only four stages is preferred in order to balance the need to collect
sensitive data flows on one hand against the need to reduce
performance overhead, latency and complexity on the other (due to the
run-time program code that is executed in line with each one of the
stage run-time programs).
[0109] Next, the system can collect certain request variables within
one or more stages (the user request, the data source request, the
data source response, and the user response), in order to identify
the sensitivity score (4V model) of the request and the data source
objects (e.g., tables and columns, when the data source is a
relational database) behind each sensitive data value (e.g., a Social
Security number value that has been identified as sensitive by an
end-user is also found in the first column within the result set
response). Parsing the SQL request enables the system to detect the
table and column that populate the first column of the result set,
hence identifying the source of the Social Security number value.
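The column-to-source mapping described above can be sketched with a deliberately naive parser. This sketch handles only flat "SELECT col1, col2 FROM table ..." statements; a real deployment would use a full SQL parser, and the function name `map_result_columns` is an assumption for illustration.

```python
import re

def map_result_columns(sql):
    """Naively parses a simple 'SELECT c1, c2 FROM table ...' statement and
    returns, for each result-set column position, the (table, column) pair
    that populates it. Joins, aliases and subqueries are out of scope."""
    m = re.match(r"\s*select\s+(.+?)\s+from\s+(\w+)",
                 sql, re.IGNORECASE | re.DOTALL)
    if not m:
        return []
    columns = [c.strip() for c in m.group(1).split(",")]
    table = m.group(2)
    return [(table, col) for col in columns]

# A sensitive value found in the first result-set column is traced back
# to the table and column that populated it:
lineage = map_result_columns(
    "SELECT ssn, customer_name FROM customer WHERE vip = 'Yes'")
```

Here `lineage[0]` identifies the source of the first result-set column, which is how a Social Security number seen in the response could be attributed to a specific table and column.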
[0110] FIG. 10 shows a non-limiting, exemplary method for determining
the "4V" model score. As shown in stage 1, the process starts when a
transaction to retrieve data is under operation by a sensitivity
request classification and scoring analyzer, which automatically
classifies each request.
The sensitivity score of each user request is calculated based on the
"four V model", comprising Value, Volume, Velocity and Variety:
[0111] Value (stage 2A): the sensitivity level of the data exposed to
the user. The sensitivity is calculated based on an administrator
providing a "sensitivity score" for each data source object. For
example, for a database, the sensitivity score is optionally and
preferably based on the schema name, table/view name and column
names. For example, for the customer_name column in the customer
table, the sensitivity score is 10.
[0112] Volume of the sensitive data (stage 2D). The volume is the
number of sensitive records returned to the user.
[0113] Variety of the data exposed (stage 2C). Variety means the
number of unique records exposed within a certain time window (e.g.,
variety within 5 working days).
Velocity of the exposure events (stage 2B). Velocity adds a factor to
the sensitivity score of a user based on different anomalous behavior
indicators, such as abnormal data exfiltration behavior, accelerated
exfiltration behavior per time frame, exfiltration red-lines crossed,
and peer comparison.
[0114] Details:
[0115] Calculating Request Sensitivity Score
[0116] Calculating the sensitivity score of a user request is based
on the "four V model", comprising Value, Volume, Velocity and
Variety.
[0117] Value score calculation may optionally be performed as
follows.
1. Prerequisites:
[0118] An administrator defines a list of objects, including URI,
requests, SQL requests, data objects such as tables, views and
columns, stored procedures or program code, or API calls. For each
entry in the list, the administrator assigns a sensitivity score.
All values support regular expression or wildcard entry. For
example:
TABLE-US-00001

 Schema  Object name                       Result set values                Sensitivity  Classification
 Apps    Table Customer,                   Select customer_name from         2           PII
         column customer_name              customer where VIP = `No`
 Apps    Table Customer,                   Select customer_name from         8           PII, VIP
         column customer_name              customer where VIP = `Yes`
 [.*]    Table Employees,                  No filter                         5           PII
         column name
 URI     URI app.newco.com\cashwithdrawal                                   20           Sensitive
[0119] Each request that is not filtered out is parsed and matched
with the list objects, automatically assigning it a sensitivity score
and a classification. Classifications are used both for grouping and
analyzing sensitive requests and for adding another multiplier factor
to the sensitivity score.
[0120] Another optional source of sensitivity objects is a predefined
policy tree which has already been built for a certain packaged
business application (such as the SAP ERP application). The packaged
application uses fixed data source objects for its sensitive data,
and the system can discover this structure (from a discovery process,
application source documentation, or prior experience) and import it
into the "4V" model.
[0121] Classification examples include PCI-related requests and VIP
data-related requests.
[0122] For example:
TABLE-US-00002

 Classification  Multiplier
 PII             2
 VIP             4
 Sensitive       1
[0123] When a user submits the request "select customer_name from
customer" and the response includes VIP customer names, the
sensitivity score is 8×4 (8 is the object sensitivity score and 4 is
the sensitivity multiplier assigned to the classification `VIP`),
totaling a sensitivity score of 32 for the request. In case several
classifications are assigned to the request, the highest multiplier
is used for the sensitivity score calculation, although other
options, such as factoring in both classifications, are possible.
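The Value calculation described above can be sketched in a few lines. The function name and the dictionary-based multiplier table are illustrative assumptions; only the highest applicable multiplier is used, per the rule stated above.

```python
def value_score(object_score, classifications, multipliers):
    """Value component of the 4V score: the object sensitivity score times
    the highest multiplier among the request's classifications (when
    several classifications apply, only the highest multiplier is used)."""
    applicable = [multipliers[c] for c in classifications if c in multipliers]
    return object_score * (max(applicable) if applicable else 1)

# Multiplier table from TABLE-US-00002.
multipliers = {"PII": 2, "VIP": 4, "Sensitive": 1}

# "select customer_name from customer" returning VIP names: 8 x 4 = 32.
score = value_score(8, ["PII", "VIP"], multipliers)
```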
[0124] Volume Calculation:
[0125] The request volume is based on the amount of data returned to
the user. For example, the system calculates the value of the request
"select customer_name from customer" to be 32. As the request
returned 10,000 VIP client records, the total sensitivity score is
32×10,000=320,000.
[0126] The sensitivity score can be presented on a logarithmic scale,
thus the sensitivity score is 5.5.
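The Volume step and the logarithmic presentation can be combined in one small helper; the function name is an illustrative assumption, and a base-10 logarithm rounded to one decimal is assumed to match the 5.5 figure above.

```python
import math

def volume_adjusted_score(value_score, record_count):
    """Multiplies the request's Value score by the number of sensitive
    records returned, and also reports the base-10 logarithm of the total
    (rounded to one decimal place) for display on a logarithmic scale."""
    total = value_score * record_count
    return total, round(math.log10(total), 1)

# 32 x 10,000 = 320,000; log10(320,000) is approximately 5.5.
total, log_score = volume_adjusted_score(32, 10_000)
```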
[0127] Variety Calculation:
[0128] Variety is defined as an end-user's exposure to new sensitive
information records that were not exposed to the user in the past.
For example, a sales representative who is continuously exposed to
the same customers that he/she works with (as should be the case)
will have a low variety score, until the sales representative
explores new customer information that was not exposed to him/her
during the predefined time period (for example, the last week, the
last month, or ever). In some cases, before the sales representative
resigns and moves to a competitor, he/she might decide to expose
him/herself to more customer information than is necessary to perform
the current role.
[0129] In at least some embodiments, the invention calculates
variety based on the uniqueness of the result set during a certain
time period (defined as a parameter value).
[0130] For each request, the sensitive records from the result set
are compared with previous sensitive records that were exposed to the
user during the last X days. Sensitive records that have already been
exposed in previous user requests are removed from the volume score,
thus reducing the sensitivity score of the transaction to count only
unique sensitive records.
[0131] Example: the parameter "Variety number of days" is set to 3
business days. The sales representative ran a transaction two days
ago exposing 100 VIP clients. The sales representative runs the same
transaction now, retrieving the same 100 VIP clients; thus the
sensitivity score of the new request is 0 (as 3 business days have
not passed since the previous exposure). If the sales representative
runs the request one week later, exposing the same 100 VIP clients,
then the new request will again have a sensitivity risk score of
32×100=3,200. In order to reduce false positive alerts in the system,
the number of business days can be extended to 30 days, thus only
activating a high-risk alert when the representative is exposed to a
large number of new customers that most probably were not required
for her to perform her job.
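The sliding-window deduplication in this example can be sketched as follows. This is an illustrative simplification: it counts calendar days rather than business days, and the class name `VarietyTracker` is an assumption.

```python
from datetime import date, timedelta

class VarietyTracker:
    """Removes already-exposed sensitive records from the volume count
    within a sliding window of days, so that only newly exposed unique
    records contribute to the sensitivity score."""

    def __init__(self, window_days):
        self.window = timedelta(days=window_days)
        self.seen = {}   # record identifier -> date of last exposure

    def new_records(self, record_ids, today):
        # A record counts as new if never seen, or seen outside the window.
        fresh = [r for r in record_ids
                 if r not in self.seen or today - self.seen[r] > self.window]
        for r in record_ids:
            self.seen[r] = today
        return fresh

tracker = VarietyTracker(window_days=3)
clients = [f"client_{i}" for i in range(100)]

first_run = tracker.new_records(clients, date(2015, 10, 10))  # all 100 new
repeat = tracker.new_records(clients, date(2015, 10, 12))     # in window: 0 new
later = tracker.new_records(clients, date(2015, 10, 19))      # window passed
```

Multiplying `len(fresh)` by the Value score reproduces the 0 and 32×100=3,200 outcomes of the example above.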
[0132] The VARIETY is also preferably used to answer the following:
how has the user accessed the sensitive information of client X, and
when and where (using which application and client information). This
is the essence of all privacy regulation, which imposes a "need to
know" basis; namely, if an application user is accessing personal
information on a client without a real business need, then this is
unlawful.
[0133] Variety Calculation Based on the User's Exposed Data Sets
and Actual Sensitive Values:
[0134] A user accesses information from one hundred clients,
including client names, addresses, Social Security numbers and
account balances. In the following example, the customer name is used
to explain how VARIETY is measured; the same applies to any other
sensitive value.
[0135] Each sensitive value exposure occurrence is captured by the
System, including but not limited to the value and a unique
identifier of the sensitive entity (for example, for each customer
name value, the customer_ID information is also collected; for an
employee SSN (Social Security number) value, the employee number is
collected). This is done to uniquely identify the sensitive data
exposed. The session tracking id and a time stamp are also captured.
This minimal context is preferably collected for each sensitive data
element that is exposed to every user.
[0136] For example, if both the customer name and the customer SSN
are exposed to an application user (by detecting an end-user
accessing the "order entry" application screen), then two VARIETY
records are added:

TABLE-US-00003

 Object name    Value        Identifier    Session       Application  Time stamp  . . .
                             Customer_id   tracking id   name
 Customer name  John Tiger   132435465     987654321     CRM          10-Oct-15
                                           04:04:04                   11:21:12
 Customer SSN   9999-999999  132435465     987654321     CRM          10-Oct-15
                                           04:04:04                   11:21:12

Note: in some cases, the "Value" and/or "Identifier Customer_id"
values can be hashed, masked (XXX), tokenized or encrypted, in order
to keep the confidentiality of the sensitive data.
[0137] Different options are presented for collecting the variety of
sensitive data: [0138] 1. Based on parsing the SQL/data source
request and collecting the resulting data structures: The policies
include a description of the sensitive data source objects as well as
a unique identifier for each value. For example, the policies include
a policy on the customer header table, the customer name column and
the customer_id column.
[0139] Whenever a user runs a SQL "select" request with a result set
that includes customer_id and customer name, both of these values are
collected by the system. [0140] 2. Another option for a "Variety
calculation" policy is to use the structured result set: The result
set of the sensitive requests includes formatted column names and
result data. A policy can be created to collect the customer name
data based on the column name, in addition to the customer_id column
or any other distinctive means that enables the specific customer to
be identified (e.g., a unique key value). If customer_id or any other
unique key is not included, then the customer name itself will be
used to identify the uniqueness of the record. [0141] 3. Another
option is to use the request output: The output of the sensitive
requests includes a structured object. A policy can be created to
collect the customer name data based on the column name, in addition
to the customer_id column or any other distinctive means that enables
the specific customer to be identified (e.g., a unique key value). If
customer_id or any other unique key is not included, then the
customer name is optionally used to identify the uniqueness of the
record.
[0142] The system encrypts or tokenizes or applies a hash function to
these values, in either or both of the real-time request score
analyzer and the batch 4V request score analyzer.
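Of the three protection options, hashing can be sketched as follows. This is a minimal illustration, assuming a salted SHA-256 hash; the function name `protect_value` is hypothetical, and tokenization or encryption would be used instead when the clear value must be recoverable.

```python
import hashlib

def protect_value(value, salt):
    """One-way salted SHA-256 hash of a sensitive value, so that VARIETY
    records can be compared for uniqueness without storing the clear text."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

a = protect_value("John Tiger", salt="s1")
b = protect_value("John Tiger", salt="s1")
c = protect_value("Jane Lion", salt="s1")
# Equal inputs hash equally, so uniqueness comparisons still work on the
# hashed values, while the clear text never reaches the repository.
```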
[0143] The System collects all sensitive information that the user
accessed.
[0144] The Variety calculation can be performed in real time by the
real-time request score analyzer, by the batch 4V request score
analyzer, or by a combination of the two.
[0145] Deciding where the VARIETY calculation will be performed (in
real time or in batch) can be determined by the system administrator,
based on the request variables (such as system, user, session,
request, classification and object variables, or a combination of
these).
[0146] Velocity Calculation:
[0147] Velocity is defined as a multiplying factor (between 0-100)
for calculating the sensitivity score of the transaction. For
example, if the Velocity factor is 2 and the sensitivity score of a
request is 3,200, then the updated sensitivity score is
3,200×2=6,400, and log(6,400)=3.8.
[0148] The factor is calculated based on the following use
cases:
[0149] Abnormal data exfiltration behavior: Exfiltration that occurs
in a deterministic way. For example, exfiltration every X seconds
over a certain period of time can only be performed by malware. When
this behavior is identified, a high factor value is assigned to the
requests.
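One simple way to detect the deterministic pattern above is to check whether inter-request intervals are nearly constant. This is a sketch under stated assumptions: the tolerance, minimum event count, and function name are illustrative, not part of the described system.

```python
def looks_machine_driven(timestamps, tolerance=0.05, min_events=5):
    """Flags request streams whose inter-arrival intervals are nearly
    constant: exfiltration every X seconds over a period is characteristic
    of malware rather than a human user. `tolerance` is the allowed
    relative deviation of each interval from the mean interval."""
    if len(timestamps) < min_events:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean = sum(gaps) / len(gaps)
    if mean == 0:
        return False
    return all(abs(g - mean) / mean <= tolerance for g in gaps)

robotic = looks_machine_driven([0, 30, 60, 90, 120, 150])  # every 30 s
human = looks_machine_driven([0, 12, 95, 101, 340, 360])   # irregular
```

When such a stream is flagged, the Velocity factor for the associated requests would be raised accordingly.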
[0150] Accelerated exfiltration behavior per time frame: when the
amount of sensitive records exposed, and/or the sensitivity level of
the requests, and/or the number of sensitive requests during a time
interval has increased by more than X times compared to the previous
time interval, a high factor value is assigned to the requests.
[0151] Exfiltration red-lines crossed: When a predefined threshold of
sensitive data volume exposure has been crossed. For example, the VIP
customer exposure threshold is 1,000; every user who is exposed to
more than 1,000 customers is considered a risk, and thus a high
factor value is assigned to that user's requests.
[0152] Peer comparison: When the user's total exfiltration risk score
exceeds X times, or Y standard deviations from, the median peer or
the Xth percentile user risk score.
The peer group is based on a common role, responsibility, department,
ActiveDirectory group and/or LDAP properties. The invention enables
exfiltration data scores and trends to be compared with peers in
order to detect outliers. The user's sensitive exfiltration Value,
Volume, Variety and Velocity scores are continuously compared to
those of the user's peers. Any deviation from the peer behavior is
automatically reported to the security administrators for
investigation.
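The standard-deviation variant of the peer comparison above can be sketched as follows. The function name, the choice of population standard deviation, and the two-deviation default are illustrative assumptions; the text also allows X-times and percentile-based comparisons.

```python
import statistics

def peer_outlier(user_score, peer_scores, std_devs=2.0):
    """Flags a user whose exfiltration risk score deviates from the peer
    group's median by more than the given number of standard deviations
    (one of the comparison options described in the text)."""
    median = statistics.median(peer_scores)
    spread = statistics.pstdev(peer_scores)
    if spread == 0:
        # Degenerate peer group: any deviation from the common score flags.
        return user_score != median
    return abs(user_score - median) > std_devs * spread

# Peer scores for, e.g., the same department (illustrative values).
peers = [10, 12, 11, 13, 9, 10, 12, 11]
normal = peer_outlier(12, peers)    # within two deviations of the median
suspect = peer_outlier(320, peers)  # far outside the peer distribution
```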
[0153] External security systems, such as IAM (Identity Access
Management), may provide a notification when a user has just resigned
and is leaving for a competitor, or when a device has been infected
by malware; in such cases a high factor value is assigned to the
user's requests.
[0154] In stage 3, the above is preferably compared to the user's
own history of data behavior. In stage 4, the above is preferably
compared to the historical behavior of a group, such as the user's
peers. In stage 5, the above is preferably compared to one or more
policies. In stage 6, these comparisons determine whether the
request is allowed to proceed.
[0155] It will be appreciated that various features of the
invention which are, for clarity, described in the contexts of
separate embodiments may also be provided in combination in a
single embodiment. Conversely, various features of the invention
which are, for brevity, described in the context of a single
embodiment may also be provided separately or in any suitable
sub-combination. It will also be appreciated by persons skilled in
the art that the present invention is not limited by what has been
particularly shown and described hereinabove. Rather the scope of
the invention is defined only by the claims which follow.
* * * * *