U.S. patent application number 12/552395 was filed with the patent office on 2009-09-02 and published on 2011-03-03 for hierarchical statistical model of internet reputation.
Invention is credited to Ross G. Thomas.
Application Number | 20110055923 12/552395 |
Family ID | 43568660 |
Filed Date | 2009-09-02 |
United States Patent
Application |
20110055923 |
Kind Code |
A1 |
Thomas; Ross G. |
March 3, 2011 |
HIERARCHICAL STATISTICAL MODEL OF INTERNET REPUTATION
Abstract
In embodiments of the present invention improved capabilities
are described for predicting the reputation of a communication
identifier, such as a web address, a domain name, an IP address,
host name, email address, IM address, telephone number, VoIP
telephony address, and the like. In embodiments, the present
invention may receive a communication from a first communication
identifier, parse the first communication identifier into its
components, and assign the components to a hierarchical tree
structure, where the hierarchical tree structure maintains the
hierarchical relationship between the components of the
communication identifier. The present invention may monitor and
keep count of a number of communications from the first
communication identifier, wherein the number of communications may
be kept for both malicious and/or unwanted communications and
non-malicious and/or unwanted communications. Attributes may then
be provided to the number of communications for each appropriate
component of the hierarchical tree, and a statistical measure may
be calculated as related to the number of communications for each
component of the hierarchical tree. The present invention may then
receive a communication from a second communication identifier,
where the second communication identifier may be previously unknown
and have a common component with the hierarchical tree. The
statistical measure of the common component may then be assigned to
the second communication identifier, and utilizing the statistical
measure assigned to the second communication identifier, may
provide a prediction of reputation of the second communication
identifier.
Inventors: |
Thomas; Ross G.; (Vancouver,
CA) |
Family ID: |
43568660 |
Appl. No.: |
12/552395 |
Filed: |
September 2, 2009 |
Current U.S.
Class: |
726/23 ;
707/E17.045; 709/217 |
Current CPC
Class: |
H04L 51/12 20130101;
G06Q 10/107 20130101; H04L 63/105 20130101 |
Class at
Publication: |
726/23 ;
707/E17.045; 709/217 |
International
Class: |
G06F 17/30 20060101
G06F017/30 |
Claims
1. A computer program product embodied in a computer readable
medium that, when executing on one or more computers, performs the
steps of: receiving a communication from a first communication
identifier; parsing the first communication identifier into its
components and assigning the components to a hierarchical tree
structure, wherein the hierarchical tree structure maintains the
hierarchical relationship between the components of the
communication identifier; monitoring and keeping count of a number
of communications from the first communication identifier, wherein
the number of communications includes both a count of at least one
of malicious and unwanted communications and a count of at least
one of non-malicious and unwanted communications; attributing the
number of communications to each appropriate component of the
hierarchical tree; calculating a statistical measure related to the
number of communications for each component of the hierarchical
tree; receiving a communication from a second communication
identifier, wherein the second communication identifier is
previously unknown and has a common component with the hierarchical
tree; assigning the statistical measure of the common component to
the second communication identifier; and utilizing the statistical
measure assigned to the second communication identifier to provide
a prediction of reputation of the second communication
identifier.
2. The computer program product of claim 1, wherein the
communication identifier is in the form of a web address.
3. The computer program product of claim 1, wherein the
communication identifier is in the form of a domain name.
4. The computer program product of claim 2, wherein the components
are domain name components.
5. The computer program product of claim 1, wherein the
communication identifier is in the form of an IP address.
6. The computer program product of claim 4, wherein the components
are IP address components.
7. The computer program product of claim 1, wherein the
communication identifier is in the form of an email address.
8. The computer program product of claim 1, wherein the
communication identifier is in the form of an IM address.
9. The computer program product of claim 1, wherein the
communication identifier is in the form of a telephone number.
10. The computer program product of claim 1, wherein the
communication identifier is in the form of a VoIP telephony
address.
11. The computer program product of claim 1, wherein the component
is at a root level.
12. The computer program product of claim 1, wherein the component
is at a lower level.
13. The computer program product of claim 1, wherein the at least
one of malicious and unwanted communication is spam.
14. The computer program product of claim 1, wherein the at least
one of malicious and unwanted communication is from a computer that
is part of a botnet.
15. The computer program product of claim 1, wherein the
statistical measure relates the number of at least one of malicious
and unwanted communications to the number of total
communications.
16. The computer program product of claim 1, wherein the reputation
is related to how much spam a website communicates.
Description
BACKGROUND
[0001] 1. Field
[0002] The present invention is related to computer security, and
more specifically to reputation of Internet entities.
[0003] 2. Description of the Related Art
[0004] Assessing the reputation ("trustworthiness") of Internet
entities via identifiers such as IP addresses and hostnames is a
fundamental network security task. Most current approaches relying
on black- and white-lists of known identifiers suffer from a grave
weakness: they are unable to assign a reputation score to an entity
not already encountered--such as the many thousands of new bots
that are brought online daily. Therefore there exists a need for
improved reputation assessment, where a reputation score may be
assigned to new entities as they are encountered for the first
time.
SUMMARY
[0005] In embodiments, the present invention may provide for a
hierarchical modeling processing facility to build a hierarchical
reputation model of Internet entities that allows for the automated
clustering of "neighborhoods" of related systems, and thus the
reputation assessment even of brand new hosts within a
neighborhood. The reputation of such a neighborhood, with an
accompanying confidence level, may be derived statistically from
measurements of observed behavior of the systems within it. This
may in turn be used to produce an action to provide additional
security to users.
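As an illustration of deriving a reputation "with an accompanying confidence level" from observed behavior, the sketch below computes a Wilson score interval on the proportion of malicious and/or unwanted communications seen from a neighborhood. The choice of the Wilson interval, the function name, and the default z value are assumptions made for this example; the specification does not name a particular statistical measure here.

```python
import math
from typing import Tuple


def wilson_interval(bad: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Return (low, high) bounds on the true proportion of bad communications.

    `bad` is the count of malicious and/or unwanted communications and
    `total` is the count of all communications observed. z = 1.96 gives
    roughly a 95% interval (an assumed default, not from the source).
    """
    if total == 0:
        return (0.0, 1.0)  # no observations, so nothing is known
    p = bad / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))
```

A neighborhood with few observations yields a wide interval (low confidence), while the same proportion over many observations yields a narrow one.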
[0006] In embodiments of the present invention improved
capabilities are described for predicting the reputation of a
communication identifier, such as a web address, a domain name, an
IP address, host name, email address, IM address, telephone number,
VoIP telephony address, and the like. In embodiments, the present
invention may receive a communication from a first communication
identifier, parse the first communication identifier into its
components, and assign the components to a hierarchical tree
structure, where the hierarchical tree structure maintains the
hierarchical relationship between the components of the
communication identifier. The present invention may monitor and
keep count of a number of communications from the first
communication identifier, wherein the number of communications may
be kept for both malicious and/or unwanted communications and
non-malicious and/or unwanted communications. Attributes may then
be provided to the number of communications for each appropriate
component of the hierarchical tree, and a statistical measure may
be calculated as related to the number of communications for each
component of the hierarchical tree. The present invention may then
receive a communication from a second communication identifier,
where the second communication identifier may be previously unknown
and have a common component with the hierarchical tree. The
statistical measure of the common component may then be assigned to
the second communication identifier, and utilizing the statistical
measure assigned to the second communication identifier, may
provide a prediction of reputation of the second communication
identifier. In embodiments, the communication identifier may be in
the form of a web address, a domain name, domain name components,
an IP address, IP address components, an email address, an IM
address, a telephone number, a VoIP telephony address, and the
like. In embodiments, the component may be at a root level, a top
level, a lower level, and the like. In embodiments, the at least
one of malicious and unwanted communication may be spam, from a
computer that is part of a botnet, and the like. In embodiments,
the statistical measure may relate the number of at least one of
malicious and unwanted communications to the number of total
communications. In embodiments, the reputation may be related to
how much spam the website communicates.
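The steps summarized above can be sketched for domain names as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class and function names (Node, ReputationTree, parse_domain, observe, predict) are hypothetical, the statistical measure is taken to be the simple ratio of malicious and/or unwanted communications to total communications, and a previously unknown identifier is assigned the measure of its deepest component already present in the tree.

```python
from typing import Dict, List, Optional


class Node:
    """One component of the hierarchical tree, with its communication counts."""

    def __init__(self) -> None:
        self.bad = 0   # malicious and/or unwanted communications
        self.good = 0  # all other communications
        self.children: Dict[str, "Node"] = {}

    def ratio(self) -> Optional[float]:
        total = self.bad + self.good
        return self.bad / total if total else None


def parse_domain(name: str) -> List[str]:
    """Split a host name into components, most general first."""
    return [part for part in reversed(name.lower().strip(".").split(".")) if part]


class ReputationTree:
    def __init__(self) -> None:
        self.root = Node()

    def observe(self, identifier: str, is_bad: bool) -> None:
        """Attribute one communication to every component on the identifier's path."""
        node = self.root
        path = [node]
        for component in parse_domain(identifier):
            node = node.children.setdefault(component, Node())
            path.append(node)
        for visited in path:
            if is_bad:
                visited.bad += 1
            else:
                visited.good += 1

    def predict(self, identifier: str) -> Optional[float]:
        """Return the measure of the deepest known component shared with the tree."""
        node = self.root
        best = node.ratio()
        for component in parse_domain(identifier):
            child = node.children.get(component)
            if child is None:
                break  # the rest of the identifier has never been seen
            node = child
            best = node.ratio()
        return best


tree = ReputationTree()
for _ in range(9):
    tree.observe("a.spam.example", True)
tree.observe("b.spam.example", False)
print(tree.predict("new.spam.example"))  # 0.9, inherited from "spam.example"
```

The same structure could be used for IP addresses by replacing parse_domain with a function that yields address components from most to least significant.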
[0007] These and other systems, methods, objects, features, and
advantages of the present invention will be apparent to those
skilled in the art from the following detailed description of the
preferred embodiment and the drawings. All documents mentioned
herein are hereby incorporated in their entirety by reference.
BRIEF DESCRIPTION OF THE FIGURES
[0008] The invention and the following detailed description of
certain embodiments thereof may be understood by reference to the
following figures:
[0009] FIG. 1 depicts a block diagram of a threat management
facility providing protection to an enterprise against a plurality
of threats.
[0010] FIG. 2 depicts the security management facility related to
Internet entities through a hierarchical modeling processing
facility and reputation database.
[0011] FIG. 3 depicts an example of a hierarchical tree structure
in an embodiment of the present invention.
[0012] FIGS. 4 through 10 depict an example of how a hierarchical
tree may be built for a domain name structure in an embodiment of
the present invention.
[0013] FIGS. 11 through 14 depict an example of how a hierarchical
tree may be built for an IP address structure in an embodiment of
the present invention.
[0014] FIG. 15 depicts a functional flow diagram in an embodiment
of the present invention.
[0015] While the invention has been described in connection with
certain preferred embodiments, other embodiments would be
understood by one of ordinary skill in the art and are encompassed
herein.
[0016] All documents referenced herein are hereby incorporated by
reference.
DETAILED DESCRIPTION
[0017] FIG. 1 depicts a block diagram of a threat management
facility providing protection to an enterprise against a plurality
of threats. An aspect of the present invention relates to corporate
policy management and implementation through a unified threat
management facility 100. As will be explained in more detail below,
a threat management facility 100 may be used to protect computer
assets from many threats, both computer-generated threats and
user-generated threats. The threat management facility 100 may be
multi-dimensional in that it may be designed to protect corporate
assets from a variety of threats and it may be adapted to learn
about threats in one dimension (e.g. worm detection) and apply the
knowledge in another dimension (e.g. spam detection). Policy
management is one of the dimensions for which the threat management
facility can provide a control capability. A corporation or other
entity may institute a policy that prevents certain people (e.g.
employees, groups of employees, types of employees, guests of the
corporation, etc.) from accessing certain types of computer
programs. For example, the corporation may elect to prevent its
accounting department from using a particular version of an instant
messaging service or all such services. In this example, the policy
management facility 112 may be used to update the policies of all
corporate computing assets with a proper policy control facility or
it may update a select few. By using the threat management facility
100 to facilitate the setting, updating and control of such
policies the corporation only needs to be concerned with keeping
the threat management facility 100 up to date on such policies. The
threat management facility 100 can take care of updating all of the
other corporate computing assets.
[0018] It should be understood that the threat management facility
100 may provide multiple services, and policy management may be
offered as one of the services. We will now turn to a description
of certain capabilities and components of the threat management
system 100.
[0019] Over recent years, malware has become a major problem across
the internet 154. From both technical and user perspectives, the
categorization of a specific threat type, whether as virus, worm,
spam, phishing exploration, spyware, adware, or the like, is
becoming reduced in significance. The threat, no matter how it is
categorized, may need to be stopped at various points of a
networked computing environment, such as one of an enterprise
facility 102, including at one or more laptops, desktops, servers,
such as the server facility 142, gateways, communication ports,
handheld or mobile devices, firewalls, and the like. Similarly,
there may be less and less benefit to the user in having different
solutions for known and unknown threats. As such, a consolidated
threat management facility 100 may need to apply a similar set of
technologies and capabilities for all threats. In certain
embodiments, the threat management facility 100 may provide a
single agent on the desktop, and a single scan of any suspect file.
This approach may eliminate the inevitable overlaps and gaps in
protection caused by treating viruses and spyware as separate
problems, while simultaneously simplifying administration and
minimizing desktop load. As the number and range of types of
threats has increased, so may have the level of connectivity
available to all IT users. This may have led to a rapid increase
in the speed at which threats may move. Today, an unprotected PC
connected to the internet 154 may be infected quickly (perhaps
within 10 minutes) which may require acceleration for the delivery
of threat protection. Where once monthly updates may have been
sufficient, the threat management facility 100 may automatically
and seamlessly update its product set against spam and virus
threats quickly, for instance, every five minutes, every minute,
continuously, or the like. Analysis and testing may be increasingly
automated, and also may be performed more frequently; for instance,
it may be completed in 15 minutes, and may do so without
compromising quality. The threat management facility 100 may also
extend techniques that may have been developed for virus and
malware protection, and provide them to enterprise facility 102
network administrators to better control their environments. In
addition to stopping malicious code, the threat management facility
100 may provide policy management that may be able to control
legitimate applications, such as VoIP, instant messaging,
peer-to-peer file-sharing, and the like, that may undermine
productivity and network performance within the enterprise facility
102.
[0020] The threat management facility 100 may provide an enterprise
facility 102 protection from computer-based malware, including
viruses, spyware, adware, Trojans, intrusion, spam, policy abuse,
uncontrolled access, and the like, where the enterprise facility
102 may be any entity with a networked computer-based
infrastructure. In an embodiment, FIG. 1 may depict a block diagram
of the threat management facility providing protection to an
enterprise against a plurality of threats. The enterprise facility
102 may be corporate, commercial, educational, governmental, or the
like, and the enterprise facility's 102 computer network may be
distributed amongst a plurality of facilities, and in a plurality
of geographical locations. The threat management facility 100 may
include a plurality of functions, such as security management
facility 122, policy management facility 112, update facility 120,
definitions facility 114, network access rules facility 124,
remedial action facility 128, detection techniques facility 130,
testing facility 118, threat research facility 132, and the like.
In embodiments, the threat protection provided by the threat
management facility 100 may extend beyond the network boundaries of
the enterprise facility 102 to include client facilities 144 that
have moved into network connectivity not directly associated or
controlled by the enterprise facility 102. Threats to client
facilities 144 may come from a plurality of sources, such as from
network threats 104, physical proximity threats 110, secondary
location threats 108, and the like. In embodiments, the threat
management facility 100 may provide an enterprise facility 102
protection from a plurality of threats to multiplatform computer
resources in a plurality of locations and network configurations,
with an integrated system approach.
[0021] In embodiments, the threat management facility 100 may be
provided as a stand-alone solution. In other embodiments, the
threat management facility 100 may be integrated into a third-party
product. An application programming interface (e.g. a source code
interface) may be provided such that the threat management facility
100 may be integrated. For instance, the threat management facility
100 may be stand-alone in that it provides direct threat protection
to an enterprise or computer resource, where protection is
subscribed to directly from the threat management facility 100.
Alternatively, the threat management
facility may offer protection indirectly, through a third-party
product, where an enterprise may subscribe to services through the
third-party product, and threat protection to the enterprise may be
provided by the threat management facility 100 through the
third-party product.
[0022] The security management facility 122 may include a plurality
of elements that provide protection from malware to enterprise
facility 102 computer resources, including endpoint security and
control, email security and control, web security and control,
reputation-based filtering, control of unauthorized users, control
of guest and non-compliant computers, and the like. The security
management facility 122 may be a software application that may
provide malicious code and malicious application protection to a
client facility 144 computing resource. The security management
facility 122 may have the ability to scan the client facility 144
files for malicious code, remove or quarantine certain applications
and files, prevent certain actions, perform remedial actions and
perform other security measures. In embodiments, scanning the
client facility 144 may include scanning some or all of the files
stored to the client facility 144 on a periodic basis, may scan
applications once the application has been requested to execute,
may scan files as the files are transmitted to or from the client
facility 144, or the like. The scanning of the applications and
files may be to detect known malicious code or known unwanted
applications. In an embodiment, new malicious code and unwanted
applications may be continually developed and distributed, and
updates to the known code database may be provided on a periodic
basis, on a demand basis, on an alert basis, or the like.
[0023] In an embodiment, the security management facility 122 may
provide for email security and control, where security management
may help to eliminate spam, viruses, spyware and phishing, control
of email content, and the like. The security management facility's
122 email security and control may protect against inbound and
outbound threats, protect email infrastructure, prevent data
leakage, provide spam filtering, and the like. In an embodiment,
security management facility 122 may provide for web security and
control, where security management may help to detect or block
viruses, spyware, malware, unwanted applications, help control web
browsing, and the like, which may provide comprehensive web access
control enabling safe, productive web browsing. Web security and
control may provide internet use policies, reporting on suspect
devices, security and content filtering, active monitoring of
network traffic, URI filtering, and the like. In an embodiment, the
security management facility 122 may provide for network access
control, which may provide control over network connections.
Network control may stop unauthorized, guest, or non-compliant
systems from accessing networks, and may control network traffic
that may not be bypassed from the client level. In addition,
network access control may control access to virtual private
networks (VPN), where VPNs may be a communications network tunneled
through another network, establishing a logical connection acting
as a virtual network. In embodiments, a VPN may be treated in the
same manner as a physical network.
[0024] In an embodiment, the security management facility 122 may
provide for host intrusion prevention through behavioral based
protection, which may guard against unknown threats by analyzing
behavior before software code executes. Behavioral based protection
may monitor code when it runs and intervene if the code is deemed
to be suspicious or malicious. Advantages of behavioral based
protection over runtime protection may include code being prevented
from running, whereas runtime protection may only interrupt code
that has already partly executed; behavioral protection may
identify malicious code at the gateway or on the file servers and
delete it before it reaches end-point computers, and the like.
[0025] In an embodiment, the security management facility 122 may
provide for reputation filtering, which may target or identify
sources of known malware. For instance, reputation filtering may
include lists of URIs of known sources of malware or known
suspicious IP addresses, or domains, say for spam, that when
detected may invoke an action by the threat management facility
100, such as dropping them immediately. By dropping the source
before any interaction can initiate, potential threat sources may
be thwarted before any exchange of data can be made.
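A list-based reputation filter of the kind described in this paragraph can be sketched as below. The class name, method name, and the example networks and domains are hypothetical; the sketch only shows the lookup that decides whether a source is dropped before any data is exchanged.

```python
import ipaddress
from typing import Iterable, Set


class ReputationFilter:
    """Drop decision based on lists of known-bad networks and domains."""

    def __init__(self, bad_networks: Iterable[str], bad_domains: Iterable[str]) -> None:
        # Networks are given in CIDR notation, e.g. "192.0.2.0/24".
        self.bad_networks = [ipaddress.ip_network(n) for n in bad_networks]
        self.bad_domains: Set[str] = {d.lower() for d in bad_domains}

    def should_drop(self, source_ip: str, source_domain: str = "") -> bool:
        """Return True if the source appears on either list."""
        addr = ipaddress.ip_address(source_ip)
        if any(addr in net for net in self.bad_networks):
            return True
        return source_domain.lower() in self.bad_domains
```

Such a filter can only act on sources already listed, which is the limitation that the hierarchical model is intended to address.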
[0026] In embodiments, information may be sent from the enterprise
back to a third party, a vendor, or the like, which may lead to
improved performance of the threat management facility 100. For
example, the types, times, and number of virus interactions that a
client experiences may provide useful information for the
prevention of future virus threats. This type of feedback may be
useful for any aspect of threat detection. Feedback of information
may also be associated with behaviors of individuals within the
enterprise, such as being associated with most common violations of
policy, network access, unauthorized application loading,
unauthorized external device use, and the like. In embodiments,
this type of information feedback may enable the evaluation or
profiling of client actions that are violations of policy that may
provide a predictive model for the improvement of enterprise
policies.
[0027] In an embodiment, the security management facility 122 may
provide for the overall security of the enterprise facility 102
network or set of enterprise facility 102 networks, may provide
updates of malicious code information to the enterprise facility
102 network, and associated client facilities 144. The updates may
be a planned update, an update in reaction to a threat notice, an
update in reaction to a request for an update, an update based on a
search of known malicious code information, or the like. The
administration facility 134 may provide control over the security
management facility 122 when updates are performed. The updates may
be automatically transmitted without an administration facility's
134 direct control, manually transmitted by the administration
facility 134, or the like. The security management facility 122 may
include the management of receiving malicious code descriptions
from a provider, distribution of malicious code descriptions to
enterprise facility 102 networks, distribution of malicious code
descriptions to client facilities 144, or the like. In an
embodiment, the management of malicious code information may be
provided to the enterprise facility's 102 network, where the
enterprise facility's 102 network may provide the malicious code
information through the enterprise facility's 102 network
distribution system.
[0028] The threat management facility 100 may provide a policy
management facility 112 that may be able to block non-malicious
applications, such as VoIP 164, instant messaging 162, peer-to-peer
file-sharing, and the like, that may undermine productivity and
network performance within the enterprise facility 102. The policy
management facility 112 may be a set of rules or policies that may
indicate enterprise facility 102 access permissions for the client
facility 144, such as access permissions associated with the
network, applications, external computer devices, and the like. The
policy management facility 112 may include a database, a text file,
a combination of databases and text files, or the like. In an
embodiment, a policy database may be a block list, a black list, an
allowed list, a white list, or the like that may provide a list of
enterprise facility 102 external network locations/applications
that may or may not be accessed by the client facility 144. The
policy management facility 112 may include rules that may be
interpreted with respect to an enterprise facility 102 network
access request to determine if the request should be allowed. The
rules may provide a generic rule for the type of access that may be
granted; the rules may be related to the policies of an enterprise
facility 102 for access rights for the enterprise facility's 102
client facility 144. For example, there may be a rule that does not
permit access to sporting websites. When a website is requested by
the client facility 144, a security facility may access the rules
within a policy facility to determine if the requested access is
related to a sporting website. In an embodiment, the security
facility may analyze the requested website to determine if the
website matches with any of the policy facility rules.
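The sporting-website example can be illustrated with a small rule check. The class name, the category labels, and the host-to-category mapping are assumptions for this sketch; how a requested website is actually analyzed and categorized is not specified in this paragraph.

```python
from typing import Dict, Set


class PolicyFacility:
    """Holds rules naming the website categories that clients may not access."""

    def __init__(self, blocked_categories: Set[str], site_categories: Dict[str, str]) -> None:
        self.blocked_categories = set(blocked_categories)
        # Hypothetical lookup table from host name to category.
        self.site_categories = {host.lower(): cat for host, cat in site_categories.items()}

    def is_allowed(self, host: str) -> bool:
        """Return False when the requested host falls in a blocked category."""
        category = self.site_categories.get(host.lower(), "uncategorized")
        return category not in self.blocked_categories
```

In this sketch an uncategorized host is allowed by default; an enterprise could equally choose to deny by default.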
[0029] The policy management facility 112 may be similar to the
security management facility 122 but with the addition of
enterprise facility 102 wide access rules and policies that may be
distributed to maintain control of client facility 144 access to
enterprise facility 102 network resources. The policies may be
defined for application type, subset of application capabilities,
organization hierarchy, computer facility type, user type, network
location, time of day, connection type, or the like. Policies may
be maintained by the administration facility 134, through the
threat management facility 100, in association with a third party,
or the like. For example, a policy may restrict IM 162 activity to
only support personnel for communicating with customers. This may
allow communication for departments requiring access, but may
maintain the network bandwidth for other activities by restricting
the use of IM 162 to only the personnel that need access to IM 162
in support of the enterprise facility 102. In an embodiment, the
policy management facility 112 may be a stand-alone application,
may be part of the network server facility 142, may be part of the
enterprise facility 102 network, may be part of the client facility
144, or the like.
[0030] In embodiments, the threat management facility 100 may
provide configuration management, which may be similar to policy
management, but may specifically examine the configuration set of
applications, operating systems, hardware, and the like, and
manage changes to their configurations. Assessment of a
configuration may be made against a standard configuration policy,
detection of configuration changes, remediation of improper
configuration, application of new configurations, and the like. An
enterprise may keep a set of standard configuration rules and
policies which may represent the desired state of the device. For
example, a client firewall may be running and installed, but in the
disabled state, where remediation may be to enable the firewall. In
another example, the enterprise may set a rule that disallows the
use of USB disks, and sends a configuration change to all clients,
which turns off USB drive access via a registry.
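Assessment against a standard configuration and remediation of deviations, as in the firewall and USB examples, might be sketched as follows. The setting names and the dictionary representation of a device configuration are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple


def assess(standard: Dict[str, Any], actual: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Return (setting, actual_value, desired_value) for every deviation."""
    return [
        (key, actual.get(key), desired)
        for key, desired in standard.items()
        if actual.get(key) != desired
    ]


def remediate(standard: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the actual configuration with deviations corrected."""
    fixed = dict(actual)
    for key, _, desired in assess(standard, actual):
        fixed[key] = desired
    return fixed
```

Settings that the standard does not mention are left unchanged by remediate.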
[0031] In embodiments, the threat management facility 100 may also
provide for the removal of applications that may interfere with the
operation of the threat management facility 100, such as competitor
products that may also be attempting similar threat management
functions. The removal of such products may be initiated
automatically whenever such products are detected. In the case
where such applications or services are provided indirectly
through a third-party product, the application may be suspended
until action is taken to remove or disable the third-party
product's protection facility.
[0032] Threat management against a sometimes quickly evolving
malware environment may require timely updates, and the update
management facility 120 may be provided by the threat management
facility 100. In addition, a policy management facility 112 may
also require update management (e.g. as provided by the update
facility 120 herein described), as the requirements for policies
change across the enterprise facility 102, client facility 144, and
server facility 142. The
update management for the security facility 122 and policy
management facility 112 may be provided directly by the threat
management facility 100, such as by a hosted system or in
conjunction with the administration facility 134. In embodiments,
the threat management facility 100 may provide for patch
management, where a patch may be an update to an operating system,
an application, a system tool, or the like, where one of the
reasons for the patch is to reduce vulnerability to threats.
[0033] In embodiments, the security facility 122 and policy
management facility 112 may push information to the enterprise
facility 102 network and/or client facility 144, the enterprise
facility 102 network and/or client facility 144 may pull
information from the security facility 122 and policy management
facility 112 network server facilities 142, there may be a
combination of pushing and pulling of information between the
security facility 122 and the policy management facility 112
network servers 142, enterprise facility 102 network, and client
facilities 144, or the like. For example, the enterprise facility
102 network and/or client facility 144 may pull information from
the security facility 122 and policy management facility 112
network server facility 142 by requesting the information using the
security facility 122 and policy management facility 112 update
module; the request may be based on a certain time period, by a
certain time, by a date, on demand, or the like. In another
example, the security facility 122 and policy management facility
112 network servers 142 may push the information to the enterprise
facility's 102 network and/or client facility 144 by providing
notification that there are updates available for download and then
transmitting the information. The combination of the security
management facility 122 network server facility 142 and security update
module may function substantially the same as the policy management
facility 112 network server and policy update module by providing
information to the enterprise facility 102 network and the client
facility 144 in a push or pull method. In an embodiment, the policy
management facility 112 and the security facility 122 management
update modules may work in concert to provide all the needed
information to the enterprise facility's 102 network and/or client
facility 144 for control of application execution. In an
embodiment, the policy update module and security update module may
be combined into a single update module.
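By way of non-limiting illustration, the push and pull methods
described above may be sketched as follows. The sketch assumes a
simple integer version number for the update information; the names
UpdateServer, UpdateClient, publish, pull, and notify are
hypothetical and are not part of the described facilities.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateServer:
    """Hypothetical stand-in for a network server facility holding updates."""
    version: int = 0
    payload: dict = field(default_factory=dict)
    subscribers: list = field(default_factory=list)

    def publish(self, payload: dict) -> None:
        # A new update exists: bump the version, then notify subscribers.
        self.version += 1
        self.payload = dict(payload)
        for client in self.subscribers:
            client.notify(self)  # push: notification followed by transfer


@dataclass
class UpdateClient:
    """Hypothetical stand-in for a client facility with an update module."""
    version: int = 0
    payload: dict = field(default_factory=dict)

    def pull(self, server: UpdateServer) -> bool:
        # Pull: the client asks, e.g. on a time period or on demand.
        if server.version > self.version:
            self.version = server.version
            self.payload = dict(server.payload)
            return True
        return False

    def notify(self, server: UpdateServer) -> None:
        # Push: the server announced an update, so download it now.
        self.pull(server)
```

In this sketch a subscribed client receives the update as soon as it
is published, while an unsubscribed client receives it only on its
next pull.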
[0034] As threats are identified and characterized, the threat
management facility 100 may create definition updates that may be
used to allow the threat management facility 100 to detect and
remediate the latest malicious software, unwanted applications,
configuration and policy changes, and the like. The threat
definition facility 114 may contain threat identification updates,
also referred to as definition files. A definition file may be a
virus identity file that may include definitions of known or
potential malicious code. The virus identity (IDE) definition files
may provide information that may identify malicious code within
files, applications, or the like. The definition files may be
accessed by security management facility 122 when scanning files or
applications within the client facility 144 for the determination
of malicious code that may be within the file or application. The
definition files may contain a number of commands, definitions, or
instructions, to be parsed and acted upon, or the like. In
embodiments, the client facility 144 may be updated with new
definition files periodically to provide the client facility 144
with the most recent malicious code definitions; the updating may
be performed on a set time period, may be updated on demand from
the client facility 144, may be updated on demand from the network,
may be updated on a received malicious code alert, or the like. In
an embodiment, the client facility 144 may request an update to the
definition files from an update facility 120 within the network,
may request updated definition files from a computing facility
external to the network, updated definition files may be provided
to the client facility 144 from within the network, definition
files may be provided to the client facility 144 from an external
computing facility from an external network, or the like.
[0035] In an embodiment, a definition management facility 114 may
provide for the timely updates of definition files information to
the network, client facilities 144, and the like. New and altered
malicious code and malicious applications may be continually
created and distributed to networks worldwide. The definition files
that maintain the definitions of the malicious code and malicious
application information for the protection of the networks and
client facilities 144 may need continual updating to provide
continual defense of the network and client facility 144 from the
malicious code and malicious applications. The definition files
management may provide for automatic and manual methods of updating
the definition files. In embodiments, the network may receive
definition files and distribute the definition files to the network
client facilities 144, the client facilities 144 may receive the
definition files directly, or the network and client facilities 144
may both receive the definition files, or the like. In an
embodiment, the definition files may be updated on a fixed periodic
basis, on demand by the network and/or the client facility 144, as
a result of an alert of a new malicious code or malicious
application, or the like. In an embodiment, the definition files
may be released as a supplemental file to an existing definition
file to provide for rapid updating of the definition files.
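As a minimal sketch of the update triggers and the supplemental file
described above, and assuming for illustration only that a
definition file can be modeled as a mapping from an identity name to
its definition, the logic may look like the following. The function
names update_due and apply_supplement are hypothetical.

```python
from datetime import datetime, timedelta


def update_due(last_update, now, period, on_demand=False, alert=False):
    """Return True if definition files should be refreshed now."""
    # Triggers named in the text: a set time period, an on-demand
    # request, or a received malicious code alert.
    return on_demand or alert or (now - last_update) >= period


def apply_supplement(base, supplement):
    """Layer a supplemental definition file over an existing one."""
    merged = dict(base)          # leave the existing file untouched
    merged.update(supplement)    # new or altered identities take effect
    return merged
```

Releasing only the supplement keeps the transfer small, which is one
plausible reason a supplemental file may allow rapid updating.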
[0036] In a similar manner, the security management facility 122
may be used to scan an outgoing file and verify that the outgoing
file is permitted to be transmitted per the enterprise facility 102
rules and policies. By checking outgoing files, the security
management facility 122 may be able to discover malicious code
infected files that were not detected as incoming files as a result
of the client facility 144 having been updated with either new
definition files or policy management facility 112 information. The
definition files may discover the malicious code infected file by
having received updates of developing malicious code from the
administration facility 134, updates from a definition files
provider, or the like. The policy management facility 112 may
discover the malicious code infected file by having received new
updates from the administration facility 134, from a rules
provider, or the like.
[0037] The threat management facility 100 may provide for a way to
control access to the enterprise facility 102 networks. For
instance, the enterprise facility 102 may want to restrict access
to certain applications, networks, files, printers, servers,
databases, or the like. In addition, the enterprise facility 102
may want to restrict user access under certain conditions, such as
the user's location, usage history, need to know, job position,
connection type, time of day, method of authentication,
client-system configuration, or the like. Network access rules may
be developed by the enterprise facility 102, or pre-packaged by a
supplier, and managed by the threat management facility 100 in
conjunction with the administration facility 134. Network access
rules and control may be responsible for determining if a client
facility 144 application should be granted access to a requested
network location. The network location may be on the same network
as the facility or may be on another network. In an embodiment, the
network access control may verify access rights for client
facilities 144 from within the network or may verify access rights
of computer facilities from external networks. When network access
for a client facility 144 is denied, the network access control may
send an information file to the client facility 144; the
information file may contain data or commands that may provide
instructions for the remedial action facility 128. The information
sent by the network access facility 124 control may be a data file.
The data file may contain a number of commands, definitions, or
instructions to be parsed and acted upon through the
remedial action facility 128, or the like. The information sent by
the network access facility 124 control may be a command or command
file that the remedial action facility 128 may access and take
action upon.
[0038] In an embodiment, the network access rules 124 may provide
an information store to be accessed by the network access control.
The network access rules facility 124 may include databases such as
a block list, a black list, an allowed list, a white list, an
unacceptable network site database, an acceptable network site
database, a network site reputation database, or the like of
network access locations that may or may not be accessed by the
client facility 144. Additionally, the network access rules
facility 124 may incorporate rule evaluation; the rule evaluation
may parse network access requests and apply the parsed information
to network access rules. The network access rule facility 124 may
have a generic set of rules that may be in support of an enterprise
facility's 102 network access policies, such as denying access to
certain types of websites 158, controlling instant messenger 162
accesses, or the like. Rule evaluation may include regular
expression rule evaluation, or other rule evaluation method for
interpreting the network access request and comparing the
interpretation to the established rules for network access. In an
embodiment, the network access rules facility 124 may receive a
rules evaluation request from the network access control and may
return the rules evaluation to the network access control.
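By way of non-limiting illustration, rule evaluation that combines a
block list, an allowed list, and regular expression rules may be
sketched as follows. The host names, the ordering of the checks, the
first-match-wins convention, and the default decision are all
assumptions made for this sketch and are not taken from the
described embodiments.

```python
import re
from urllib.parse import urlsplit

# Hypothetical list contents, using reserved example domains.
BLOCK_LIST = {"malware.example"}
ALLOW_LIST = {"intranet.example"}

# Hypothetical regular expression rules: (pattern, decision).
REGEX_RULES = [
    (re.compile(r"(^|\.)games\.example$"), "deny"),
    (re.compile(r"\.example$"), "allow"),
]


def evaluate(request_url: str, default: str = "deny") -> str:
    """Parse a network access request and apply the rules to it."""
    host = (urlsplit(request_url).hostname or "").lower()
    if host in BLOCK_LIST:
        return "deny"
    if host in ALLOW_LIST:
        return "allow"
    for pattern, decision in REGEX_RULES:
        if pattern.search(host):   # first matching rule wins
            return decision
    return default
```

Here the network access control would pass the requested location to
evaluate and act on the returned decision.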
[0039] Similar to the threat definitions facility 114, the network
access rule facility 124 may provide updated rules and policies to
the enterprise facility 102. The network access rules facility 124
may be maintained by the network administration facility 134, using
network access rules facility 124 management. In an embodiment, the
network administration facility 134 may be able to maintain a set
of access rules manually by adding rules, changing rules, deleting
rules, or the like. Additionally, the administration facility 134
may be able to retrieve predefined rule sets from a provider that
may provide a set of rules to be applied to an entire enterprise
facility 102. The network administration facility 134 may be able
to modify the predefined rules as needed for a particular
enterprise facility 102 using the network access rules management
facility 124.
[0040] When a threat or policy violation is detected by the threat
management facility 100, the threat management facility 100 may
provide for a remedial action facility 128. Remedial action may
take a plurality of forms, such as terminating or modifying an
ongoing process or interaction, sending a warning to a client or
administration facility 134 of an ongoing process or interaction,
executing a program or application to remediate against a threat or
violation, recording interactions for subsequent evaluation, or the
like. Remedial action may be associated with an application that
responds to information that a client facility 144 network access
request has been denied. In an embodiment, when the data file is
received, remedial action may parse the data file, interpret the
various aspects of the data file, and act on the parsed data file
information to determine actions to be taken on an application
requesting access to a denied network location. In an embodiment,
when the data file is received, remedial action may access the
threat definitions to parse the data file and determine an action
to be taken on an application requesting access to a denied network
location. In an embodiment, the information received from the
facility may be a command or a command file. The remedial action
facility may carry out any commands that are received or parsed
from a data file from the facility without performing any
interpretation of the commands. In an embodiment, the remedial
action facility may interact with the received information and may
perform various actions on a client requesting access to a denied
network location. The action may be one or more of continuing to
block all requests to a denied network location, a malicious code
scan on the application, a malicious code scan on the client
facility 144, quarantine of the application, terminating the
application, isolation of the application, isolation of the client
facility 144 to a location within the network that restricts
network access, blocking a network access port from a client
facility 144, reporting the application to an administration
facility 134, or the like.
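As a minimal sketch of parsing a received data file and acting on
its commands, assume a hypothetical line-oriented format of one
command per line, written as an action word followed by a target,
with '#' starting a comment. The format, the command names, and the
functions parse_data_file and remediate are illustrative assumptions
only; the handlers merely describe the action rather than perform it.

```python
def parse_data_file(text: str) -> list:
    """Parse one command per line: 'ACTION target'."""
    commands = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        action, _, target = line.partition(" ")
        commands.append((action.upper(), target.strip()))
    return commands


def remediate(text: str) -> list:
    """Act on each parsed command and return a log of actions taken."""
    handlers = {
        "BLOCK": lambda t: f"blocking requests to {t}",
        "SCAN": lambda t: f"scanning {t} for malicious code",
        "QUARANTINE": lambda t: f"quarantining {t}",
        "TERMINATE": lambda t: f"terminating {t}",
        "REPORT": lambda t: f"reporting {t} to administration",
    }
    log = []
    for action, target in parse_data_file(text):
        handler = handlers.get(action)
        if handler is None:
            log.append(f"ignoring unknown command {action}")
        else:
            log.append(handler(target))
    return log
```

Unknown commands are logged and skipped in this sketch; an
embodiment that carries out commands without interpretation could
instead pass each one through unchanged.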
[0041] Remedial action may be provided as a result of a detection
of a threat or violation. The detection techniques facility 130 may
include monitoring the enterprise facility 102 network or end-point
devices, such as by monitoring streaming data through the gateway,
across the network, through routers and hubs, and the like. The
detection techniques facility 130 may include monitoring activity
and stored files on computing facilities, such as on server
facilities 142, desktop computers, laptop computers, other mobile
computing devices, and the like. Detection techniques, such as
scanning a computer's stored files, may provide the capability of
checking files for stored threats, either in the active or passive
state. Detection techniques, such as streaming file management, may
provide the capability of checking files received at the network,
gateway facility, client facility 144, and the like. This may
provide the capability of not allowing a streaming file or portions
of the streaming file containing malicious code from entering the
client facility 144, gateway facility, or network. In an
embodiment, the streaming file may be broken into blocks of
information, and a plurality of virus identities may be used to
check each of the blocks of information for malicious code. In an
embodiment, any blocks that are not determined to be clear of
malicious code may not be delivered to the client facility 144,
gateway facility, or network.
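By way of non-limiting illustration, checking a streaming file block
by block may be sketched as follows. The sketch assumes that a virus
identity can be approximated by a literal byte pattern, which is a
simplification, and the pattern values and the names blocks and
deliver_clean_blocks are hypothetical. A short tail of the previous
delivered block is retained so that a pattern straddling two blocks
is still detected.

```python
# Hypothetical byte patterns standing in for virus identities.
SIGNATURES = [b"EVIL", b"BADCODE"]


def blocks(data: bytes, size: int):
    """Break a stream into fixed-size blocks of information."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def deliver_clean_blocks(stream_blocks, signatures=SIGNATURES):
    """Yield only blocks that are determined to be clear."""
    overlap = max(len(sig) for sig in signatures) - 1
    tail = b""
    for block in stream_blocks:
        window = tail + block
        if any(sig in window for sig in signatures):
            tail = b""      # withhold this block and start afresh
            continue
        yield block
        tail = window[-overlap:] if overlap else b""
```

In this sketch, when a pattern straddles two blocks the earlier
block has already been delivered and only the later block is
withheld, so the complete pattern never reaches the recipient.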
[0042] Verifying that the threat management facility 100 is
detecting threats and violations of established policy may require
the ability to test the system, either at the system level or for a
particular computing component. The testing facility 118 may allow
the administration facility 134 to coordinate the testing of the
security configurations of client facility 144 computing facilities
on a network. The administration facility 134 may be able to send
test files to a set of client facility 144 computing facilities to
test the ability of the client facility 144 to determine
acceptability of the test file. After the test file has been
transmitted, a recording facility may record the actions taken by
the client facility 144 in reaction to the test file. The recording
facility may aggregate the testing information from the client
facility 144 and report the testing information to the
administration facility 134. The administration facility 134 may be
able to determine the level of preparedness of the client facility
144 computing facilities by the reported information. Remedial
action may be taken for any of the client facility 144 computing
facilities as determined by the administration facility 134;
remedial action may be taken by the administration facility 134 or
by the user of the client facility 144.
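As a minimal sketch of how recorded test results may be aggregated
and reported, assume each record holds a client identifier, the
action expected for the test file, and the action actually observed.
The record layout, the preparedness ratio, and the function name
aggregate are assumptions made for illustration.

```python
from collections import Counter


def aggregate(records):
    """Summarize (client_id, expected, observed) test records."""
    summary = Counter()
    failing = []
    for client_id, expected, observed in records:
        if expected == observed:
            summary["passed"] += 1
        else:
            summary["failed"] += 1
            failing.append(client_id)
    total = summary["passed"] + summary["failed"]
    return {
        "passed": summary["passed"],
        "failed": summary["failed"],
        # Fraction of clients that reacted to the test file as expected.
        "preparedness": summary["passed"] / total if total else 0.0,
        "needs_remediation": sorted(failing),
    }
```

The needs_remediation list in this sketch corresponds to the client
facilities for which remedial action may be taken.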
[0043] The threat research facility 132 may provide a continuously
ongoing effort to maintain the threat protection capabilities of
the threat management facility 100 in light of continuous
generation of new or evolved forms of malware. Threat research may
include researchers and analysts working on known and emerging
malware, such as viruses, rootkits, and spyware, as well as other
computer threats such as phishing, spam, scams, and the like. In
embodiments, through threat research, the threat management
facility 100 may be able to provide swift, global responses to the
latest threats.
[0044] The threat management facility 100 may provide threat
protection to the enterprise facility 102, where the enterprise
facility 102 may include a plurality of networked components, such
as client facility 144, server facility 142, administration
facility 134, firewall 138, gateway, hubs and routers 148, threat
management appliance 140, desktop users, mobile users, and the
like. In embodiments, it may be the end-point computer security
facility 152, located on a computer's desktop, which may provide
threat protection to a user, and associated enterprise facility
102. In embodiments, the term end-point may refer to a computer
system that may source data, receive data, evaluate data, buffer
data, or the like (such as a user's desktop computer as an
end-point computer), a firewall as a data evaluation end-point
computer system, a laptop as a mobile end-point computer, a PDA as
a hand-held end-point computer, a mobile phone as an end-point
computer, or the like. In embodiments, end-point may refer to a
source or destination for data, including such components where the
destination is characterized by an evaluation point for data, and
where the data may be sent to a subsequent destination after
evaluation. The end-point computer security facility 152 may be an
application loaded onto the computer platform or computer support
component, where the application may accommodate the plurality of
computer platforms and/or functional requirements of the component.
For instance, a client facility 144 computer may be one of a
plurality of computer platforms, such as Windows, Macintosh, Linux,
and the like, where the end-point computer security facility 152
may be adapted to the specific platform, while maintaining a
uniform product and product services across platforms.
Additionally, components may have different functions to serve
within the enterprise facility's 102 networked computer-based
infrastructure. For instance, computer support components provided
as hubs and routers 148, server facility 142, firewalls 138, and
the like, may require unique security application software to
protect their portion of the system infrastructure, while providing
an element in an integrated threat management system that extends
out beyond the threat management facility 100 to incorporate all
computer resources under its protection.
[0045] The enterprise facility 102 may include a plurality of
client facility 144 computing platforms on which the end-point
computer security facility 152 is adapted. A client facility 144
computing platform may be a computer system that is able to access
a service on another computer, such as a server facility 142, via a
network. This client facility 144 server facility 142 model may
apply to a plurality of networked applications, such as a client
facility 144 connecting to an enterprise facility 102 application
server facility 142, a web browser client facility 144 connecting
to a web server facility 142, an e-mail client facility 144
retrieving e-mail from an internet 154 service provider's mail
storage servers 142, and the like. In embodiments, traditional
large client facility 144 applications may be switched to websites,
which may increase the browser's role as a client facility 144.
Clients 144 may be classified as a function of the extent to which
they perform their own processing. For instance, client facilities
144 are sometimes classified as a fat client facility 144 or thin
client facility 144. The fat client facility 144, also known as a
thick client facility 144 or rich client facility 144, may be a
client facility 144 that performs the bulk of data processing
operations itself, and does not necessarily rely on the server
facility 142. The fat client facility 144 may be most common in the
form of a personal computer, where the personal computer may
operate independent of any server facility 142. Programming
environments for fat clients 144 may include Curl, Delphi,
Droplets, Java, win32, X11, and the like. Thin clients 144 may
offer minimal processing capabilities, for instance, the thin
client facility 144 may primarily provide a graphical user
interface provided by an application server facility 142, which may
perform the bulk of any required data processing. Programming
environments for thin clients 144 may include JavaScript/AJAX, ASP,
JSP, Ruby on Rails, Python's Django, PHP, and the like. The client
facility 144 may also be a mix of the two, such as processing data
locally, but relying on a server facility 142 for data storage. As
a result, this hybrid client facility 144 may provide benefits from
both the fat client facility 144 type, such as multimedia support
and high performance, and the thin client facility 144 type, such
as high manageability and flexibility. In embodiments, the threat
management facility 100, and associated end-point computer security
facility 152, may provide seamless threat protection to the
plurality of clients 144, and client facility 144 types, across the
enterprise facility 102.
[0046] The enterprise facility 102 may include a plurality of
server facilities 142, such as application servers, communications
servers, file servers, database servers, proxy servers, mail
servers, fax servers, game servers, web servers, and the like. A
server facility 142, which may also be referred to as a server
facility 142 application, server facility 142 operating system,
server facility 142 computer, or the like, may be an application
program or operating system that accepts client facility 144
connections in order to service requests from clients 144. The
server facility 142 application may run on the same computer as the
client facility 144 using it, or the server facility 142 and the
client facility 144 may be running on different computers and
communicating across the network. Server facility 142 applications
may be divided among server facility 142 computers, with the
dividing depending upon the workload. For instance, under light
load conditions all server facility 142 applications may run on a
single computer and under heavy load conditions a single server
facility 142 application may run on multiple computers. In
embodiments, the threat management facility 100 may provide threat
protection to server facilities 142 within the enterprise facility
102 as load conditions and application changes are made.
[0047] A server facility 142 may also be an appliance facility 140,
where the appliance facility 140 provides specific services onto
the network. Though the appliance facility 140 is a server facility
142 computer that may be loaded with a server facility 142
operating system and server facility 142 application, the
enterprise facility 102 user may not need to configure it, as the
configuration may have been performed by a third party. In an
embodiment, an enterprise facility 102 appliance may be a server
facility 142 appliance that has been configured and adapted for use
with the threat management facility 100, and located within the
facilities of the enterprise facility 102. The enterprise
facility's 102 threat management appliance may enable the
enterprise facility 102 to administer an on-site local managed
threat protection configuration, where the administration facility
134 may access the threat resources through an interface, such as a
web portal. In an alternate embodiment, the enterprise facility 102
may be managed remotely from a third party, vendor, or the like,
without an appliance facility 140 located within the enterprise
facility 102. In this instance, the appliance functionality may be
a shared hardware product among a plurality of enterprises 102.
In embodiments, the appliance facility 140 may be located at the
enterprise facility 102, where the enterprise facility 102
maintains a degree of control. In embodiments, a hosted service may
be provided, where the appliance 140 may still be an on-site black
box to the enterprise facility 102, physically placed there because
of infrastructure requirements, but managed by a third party,
vendor, or the like.
[0048] Simple server facility 142 appliances may also be utilized
across the enterprise facility's 102 network infrastructure, such
as switches, routers, wireless routers, hubs and routers, gateways,
print servers, net modems, and the like. These simple server
facility appliances may not require configuration by the enterprise
facility 102, but may require protection from threats via an
end-point computer security facility 152. These appliances may
provide interconnection services within the enterprise facility 102
network, and therefore may advance the spread of a threat if not
properly protected.
[0049] One way for a client facility 144 to be protected from
threats from within the enterprise facility 102 network may be a
personal firewall. A personal firewall may be an application that
controls network traffic to and from a client, permitting or
denying communications based on a security policy. Personal
firewalls may be designed for use by end-users, which may result in
protection for only the computer on which the firewall is installed. Personal
firewalls may be able to control network traffic by providing
prompts each time a connection is attempted and adapting security
policy accordingly. Personal firewalls may also provide some level
of intrusion detection, which may allow the software to terminate
or block connectivity where it suspects an intrusion is being
attempted. Other features that may be provided by a personal
firewall may include alerts about outgoing connection attempts,
control of program access to networks, hiding the client from port
scans by not responding to unsolicited network traffic, monitoring
of applications that may be listening for incoming connections,
monitoring and regulation of incoming and outgoing network traffic,
prevention of unwanted network traffic from installed applications,
reporting applications that make connection attempts, reporting
destination servers with which applications may be attempting
communications, and the like. In embodiments, the personal firewall
may be provided by the threat management facility 100.
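By way of non-limiting illustration, a personal firewall that
prompts on a connection attempt and adapts its security policy
accordingly may be sketched as follows. The sketch assumes that the
answer to a prompt is remembered per program and direction, so that
later attempts by the same program are not prompted again; that
choice, and the names PersonalFirewall, prompt, and check, are
hypothetical.

```python
class PersonalFirewall:
    """Hypothetical per-client firewall with a learned policy."""

    def __init__(self, prompt):
        self.policy = {}      # (program, direction) -> "permit" or "deny"
        self.prompt = prompt  # callable that asks the user for a decision
        self.log = []         # every connection attempt, for reporting

    def check(self, program: str, direction: str, remote: str) -> bool:
        key = (program, direction)
        if key not in self.policy:
            # No rule yet: ask the user and adapt the policy.
            self.policy[key] = self.prompt(program, direction, remote)
        decision = self.policy[key]
        self.log.append((program, direction, remote, decision))
        return decision == "permit"
```

The log kept here could back features such as reporting which
applications make connection attempts and the destination servers
with which they attempt to communicate.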
[0050] Another important component that may be protected by an
end-point computer security facility 152 is a network firewall
facility 138, which may be a hardware or software device that may
be configured to permit, deny, or proxy data through a computer
network that has different levels of trust in its source of data.
For instance, an internal enterprise facility 102 network may have
a high level of trust, because all data has been
sourced from within the enterprise facility 102. An example of a
low level of trust is the Internet 154, because the source of data
may be unknown. A zone with an intermediate trust level, situated
between the Internet 154 and a trusted internal network, may be
referred to as a "perimeter network". Since firewall facilities 138
represent boundaries between threat levels, the end-point computer
security facility 152 associated with the firewall facility 138 may
provide resources that may control the flow of threats at this
enterprise facility 102 network entry point. Firewall facilities
138, and associated end-point computer security facility 152, may
also be associated with a network node that may be equipped for
interfacing between networks that use different protocols. In
embodiments, the end-point computer security facility 152 may
provide threat protection in a plurality of network infrastructure
locations, such as at the enterprise facility 102 network entry
point, i.e. the firewall facility 138 or gateway; at the server
facility 142; at distribution points within the network, i.e. the
hubs and routers 148; at the desktop of client facility 144
computers; and the like. In embodiments, the most effective
location for threat detection may be at the user's computer desktop
end-point computer security facility 152.
[0051] The interface between the threat management facility 100 and
the enterprise facility 102, and through the appliance facility 140
to embedded end-point computer security facilities, may include a
set of tools that may be the same for all enterprise
implementations, but allow each enterprise to implement different
controls. In embodiments, these controls may include both automatic
actions and managed actions. Automatic actions may include
downloads of the end-point computer security facility 152 to
components of the enterprise facility 102, downloads of updates to
existing end-point computer security facilities of the enterprise
facility 102, uploaded network interaction requests from enterprise
facility 102 components to the threat management facility 100, and
the like. In embodiments, automatic interactions between the
enterprise facility 102 and the threat management facility 100 may
be configured by the threat management facility 100 and an
administration facility 134 in the enterprise facility 102. The
administration facility 134 may configure policy rules that
determine interactions, such as developing rules for accessing
applications, as in who is authorized and when applications may be
used; establishing rules for ethical behavior and activities; rules
governing the use of entertainment software such as games, or
personal use software such as IM 162 and VoIP 164; rules for
determining access to enterprise facility 102 computing resources,
including authentication, levels of access, risk assessment, and
usage history tracking; rules for when an action is not allowed,
such as whether an action is completely denied or just modified in
its execution; and the like. The administration facility 134 may
also establish license management, which in turn may further
determine interactions associated with a licensed application. In
embodiments, interactions between the threat management facility
100 and the enterprise facility 102 may provide threat protection
to the enterprise facility 102 by managing the flow of network data
into and out of the enterprise facility 102 through automatic
actions that may be configured by the threat management facility
100 or the administration facility 134.
[0052] Client facilities 144 within the enterprise facility 102 may
be connected to the enterprise facility 102 network by way of wired
network facilities 148A or wireless network facilities 148B. Client
facilities 144 connected to the enterprise facility 102 network via
a wired facility 148A or wireless facility 148B may receive similar
protection, as both connection types are ultimately connected to
the same enterprise facility 102 network, with the same end-point
computer security facility 152, and the same threat protected
enterprise facility 102 environment. Mobile wireless facility
clients 144B-F, because of their ability to connect to any wireless
148B,D network access point, may connect to the internet 154
outside the enterprise facility 102, and therefore outside the
threat-protected environment of the enterprise facility 102. In
this instance the mobile client facility 144B-F, if not for the
presence of the end-point computer security facility 152, may
experience a malware attack or perform actions counter to
enterprise facility 102 established policies. In addition, there
may be a plurality of ways for the threat management facility 100
to protect the out-of-enterprise facility 102 mobile client
facility 144D-F that has an embedded end-point computer security
facility 152, such as by providing URL filtering in personal
routers, using a web appliance as a DNS proxy, or the like. Mobile
client facilities 144D-F that are components of the enterprise
facility 102 but temporarily outside connectivity with the
enterprise facility 102 network, may be provided with the same
threat protection and policy control as client facilities 144
inside the enterprise facility 102. In addition, mobile client
facilities 144B-F may receive the same interactions to and from the
threat management facility 100 as client facilities 144 inside the
enterprise facility 102, where mobile client facilities 144B-F may
be considered a virtual extension of the enterprise facility 102,
receiving all the same services via their embedded end-point
computer security facility 152.
[0053] Interactions between the threat management facility 100 and
the components of the enterprise facility 102, including mobile
client facility 144B-F extensions of the enterprise facility 102,
may ultimately be connected through the internet 154. Threat
management facility 100 downloads and upgrades to the enterprise
facility 102 may be passed from the firewalled networks of the
threat management facility 100 through to the end-point computer
security facility 152 equipped components of the enterprise
facility 102. In turn the end-point computer security facility 152
components of the enterprise facility 102 may upload policy and
access requests back across the internet 154 and through to the
threat management facility 100. The Internet 154, however, is also
the path through which threats may be transmitted from their
source. These network threats may include threats from a plurality
of sources, including websites 158, e-mail 160, IM 162, VoIP 164,
application software, and the like. These threats may attempt to
attack a mobile enterprise client facility 144B-F equipped with an
end-point computer security facility 152, but in embodiments, as
long as the mobile client facility 144B-F is embedded with an
end-point computer security facility 152, as described above,
threats may have no better success than if the mobile client
facility 144B-F were inside the enterprise facility 102.
[0054] However, if the mobile client facility 144 were to attempt
to connect into an unprotected connection point, such as at a
secondary location 108 that is not a part of the enterprise
facility 102, the mobile client facility 144 may be required to
request network interactions through the threat management facility
100, where contacting the threat management facility 100 may be
performed prior to any other network action. In embodiments, the
client facility's 144 end-point computer security facility 152 may
manage actions in unprotected network environments such as when the
client facility 144F is in a secondary location 108 or connecting
wirelessly to a non-enterprise facility 102 wireless internet
connection, where the end-point computer security facility 152 may
dictate what actions are allowed, blocked, modified, or the like.
For instance, if the client facility's 144 end-point computer
security facility 152 is unable to establish a secured connection
to the threat management facility 100, the end-point computer
security facility 152 may inform the user of such, and recommend
that the connection not be made. In the instance when the user
chooses to connect despite the recommendation, the end-point
computer security facility 152 may perform specific actions during
or after the unprotected connection is made, including running
scans during the connection period, running scans after the
connection is terminated, storing interactions for subsequent
threat and policy evaluation, contacting the threat management
facility 100 upon first instance of a secured connection for
further actions and/or scanning, restricting access to network and
local resources, or the like. In embodiments, the end-point
computer security facility 152 may perform specific actions to
remediate possible threat incursions or policy violations during or
after the unprotected connection.
[0055] The secondary location 108 may have no end-point computer
security facilities 152 as a part of its computer components, such
as its firewalls 138B, servers 142B, clients 144G, hubs and routers
148C-D, and the like. As a result, the computer components of the
secondary location 108 may be open to threat attacks, and become
potential sources of threats, as well as any mobile enterprise
facility clients 144B-F that may be connected to the secondary
location's 108 network. In this instance, these computer components
may now unknowingly spread a threat to other components connected
to the network.
[0056] Some threats may not come directly from the Internet 154,
such as from non-enterprise facility controlled mobile devices that
are physically brought into the enterprise facility 102 and
connected to the enterprise facility 102 client facilities 144. The
connection may be made from direct connection with the enterprise
facility's 102 client facility 144, such as through a USB port, or
in physical proximity with the enterprise facility's 102 client
facility 144 such that a wireless facility connection can be
established, such as through a Bluetooth connection. These physical
proximity threats 110 may be another mobile computing device, a
portable memory storage device, a mobile communications device, or
the like, such as CDs and DVDs 170, memory stick 174, flash drive
174, external hard drive, cell phone 178, PDAs 180, MP3 players,
digital cameras, point-to-point devices, digital picture frames,
digital pens, navigation devices, appliances, and the like. A
physical proximity threat 110 may have been previously infiltrated
by network threats while connected to an unprotected network
connection outside the enterprise facility 102, and when connected
to the enterprise facility 102 client facility 144, pose a threat.
Because of their mobile nature, physical proximity threats 110 may
infiltrate computing resources in any location, such as being
physically brought into the enterprise facility 102 site, connected
to an enterprise facility 102 client facility 144 while that client
facility 144 is mobile, plugged into an unprotected client facility
144 at a secondary location 108, and the like. A mobile device,
once connected to an unprotected computer resource, may become a
physical proximity threat 110. In embodiments, the end-point
computer security facility 152 may provide enterprise facility 102
computing resources with threat protection against physical
proximity threats 110, for instance, through scanning the device
prior to allowing data transfers, through security validation
certificates, through establishing a safe zone within the
enterprise facility 102 computing resource to transfer data into
for evaluation, and the like. Now that the overall system has been
described, we turn towards a set of reputation embodiments relating
to the creation and use of hierarchical statistical Internet
reputation metrics. It should be understood that the following
embodiments may be managed through a threat management facility 100
along with other services, such as those described herein.
[0057] Referring to FIG. 2, assessing the reputation
("trustworthiness") of Internet entities 208 via communication
identifiers such as web addresses, domain names, IP
addresses, host names, email addresses, IM addresses, telephone
numbers, VoIP telephony addresses, and the like, may be a function
of security management 122 within the threat management facility
100. In embodiments, the security management facility 122 may
provide for reputation filtering, which may target or identify
sources of known malware. For instance, reputation filtering may
include lists of URIs of known sources of malware or known
suspicious IP addresses, or domains, say for spam or for botnets,
that when detected may invoke an action by the threat management
facility 100. In embodiments, the present invention may provide for
a hierarchical modeling processing facility 202 to build a
hierarchical reputation model of these entities 208 that allows for
the automated clustering of "neighborhoods" of related systems, and
thus the reputation assessment even of brand new hosts within a
neighborhood. The reputation of such a neighborhood, with an
accompanying confidence level, may be derived statistically from
measurements of observed behavior of the systems within it.
Expressed another way, a hierarchical model may allow an entity's
208 reputation to be partially or wholly inherited from its parent.
In embodiments, reputations for these entities 208 may be stored in
a reputation database 204.
[0058] A hierarchical reputation model may reflect the natural
hierarchy of Internet entities 208, both named (i.e. with a
hostname, the logical level) and unnamed (i.e. with just an IP
address, the physical level). Named entities 208 are provided by
the Domain Name System, a globally distributed mapping of hostnames
to IP addresses. The DNS is arranged hierarchically, and different
parts of the tree are maintained by different organizations. For
example, Bigcorp, Inc. will be officially charged with
responsibility for the "bigcorp.com" domain name and all the
hostnames underneath it, e.g. "www.bigcorp.com" and
"mail.bigcorp.com". Because medium to large organizations tend to
have a significant number of hosts within their DNS sub-tree, it is
a very common practice for administrators to subdivide their part
of the DNS tree into manageable units, often clustering hosts with
similar functions together in the same neighborhood of the tree.
For example, a hypothetical Internet service provider (ISP),
"someisp.net" 308, with a large base of both business and consumer
customers might organize their part of the DNS tree as shown in
FIG. 3, where its domain name is `someisp`, its top-level domain
name is `net` 304, and its root zone is 302. In this example,
SomeISP's zone of authority begins at the someisp node 308 and
extends to the leaves 318, 320, 322, 324, 328, 330. The
administrator has decided to establish three major subzones, corp
310 for the company's corporate network, biz 312 for their business
customers, and res 314 for their residential customers. Each major
subzone is further divided, with the leaves 318, 320, 322, 324,
328, 330 representing a specific host (e.g. computer, collection of
computers, and the like).
[0059] Thus it can be seen that the DNS provides a natural
clustering of similarly purposed hosts, with clusters delimited by
the organization's own naming policy: the host
"bob.res.someisp.net" 328 may for instance be treated similarly to
the host "alice.res.someisp.net" 330, because they both belong to
residential customers. For example, the two hosts may have similar
corresponding probability of being part of a spammer's botnet. On
the other hand, the hosts "ceo.corp.someisp.net" 318 and
"janitor.corp.someisp.net" 320 may be less likely to be part of a
botnet, such as by virtue of being behind a professionally
administered firewall.
[0060] The present invention leverages the information provided by
organizations' DNS sub-trees to cluster hosts for the purpose of
determining their reputation. For instance, and referring again to
FIG. 3, a new host "chieftechnologist.corp.someisp.net" may be
likely to behave more similarly to "ceo.corp.someisp.net" 318 than
to "alice.res.someisp.net" 330. By augmenting a tree structure such
as the one presented in FIG. 3, with the appropriate metrics as
obtained through direct observations of the hosts involved, one may
then predict the behavior of hosts one hasn't yet had the
opportunity to observe by estimating the reputation of the
neighborhood (i.e. a subtree) within which they are located. In
embodiments, the resulting structure is sometimes referred to as a
"weighted tree". In embodiments, the same may be achieved for IP
addresses using a slightly different structure.
[0061] In an example, FIGS. 4-9 provide a hierarchical building of
a named, or logical, tree. Suppose the metrics we are monitoring
are `total mails sent` and `total spams sent`, in order to
calculate a very simple reputation score of total spams sent/total
mails sent, i.e. the ratio of spams to total mail sent by a host,
or the probability that a subsequent email will also be spam. To
begin with, FIG. 4 presents an empty tree, where there are no hosts
associated with the root level 302A, and so no emails or spams
accounted. Now suppose we observe the host "mx1.bigcorp.com" send,
over a period of time, 50 emails, of which 2 were spam. To update
the logical tree, we first break the hostname into its component
labels "mx1" 508, "bigcorp" 504, and "com" 502. We work backwards
through the labels adding a node for each, accumulating our metrics
at every node, including the root. The result is illustrated in
FIG. 5. In FIG. 6 we see the result of observing the host
"mx2.bigcorp.com" 602 sending 100 emails, of which 10 were spam.
Note that the metrics for "mx1" 508 and "mx2" 602 show the counts
for those individual hosts, while their ancestor nodes 504, 502,
302A show the combined total. FIG. 7 illustrates the observation
that a host "home-user-9-8-7-6.nyc.someisp.net" 710 sent 25 emails,
22 of which were spam. Note again that the metrics for "NYC" 708,
"someisp" 704, and "net" 702 show the counts from below in the
tree. Next, as shown in FIG. 8, another New York-based SomeISP
customer "home-user-5-6-7-8.nyc.someisp.net" 802 is seen sending 80
emails, of which 70 were spam, where the metrics for "NYC" 708,
"someisp" 704, "net" 702, and root 302A show the additional counts
from the new host 802. Finally, as shown in FIG. 9, we observe
SomeISP's mail server "smtp52.someisp.net" 902 send 200 emails, 5
of which were spam.
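The insertion procedure walked through in FIGS. 4-9 can be sketched in a few lines of Python. This is a minimal illustration and not the claimed implementation: the names `Node`, `insert`, and `find` are assumptions introduced here, and labels are lower-cased for simplicity. The observations are the five hosts from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One label in the logical tree, with metrics accumulated from below."""
    mails: int = 0
    spams: int = 0
    children: dict = field(default_factory=dict)


def insert(root, hostname, mails, spams):
    """Record an observation, updating every node on the host's path."""
    node = root
    node.mails += mails
    node.spams += spams
    # Work backwards through the labels: "com", then "bigcorp", then "mx1".
    for label in reversed(hostname.lower().split(".")):
        node = node.children.setdefault(label, Node())
        node.mails += mails
        node.spams += spams


def find(root, hostname):
    """Return the node for hostname, or None if the full path is absent."""
    node = root
    for label in reversed(hostname.lower().split(".")):
        node = node.children.get(label)
        if node is None:
            return None
    return node


root = Node()
observations = [
    ("mx1.bigcorp.com", 50, 2),
    ("mx2.bigcorp.com", 100, 10),
    ("home-user-9-8-7-6.nyc.someisp.net", 25, 22),
    ("home-user-5-6-7-8.nyc.someisp.net", 80, 70),
    ("smtp52.someisp.net", 200, 5),
]
for host, mails, spams in observations:
    insert(root, host, mails, spams)

bigcorp = find(root, "bigcorp.com")
nyc = find(root, "nyc.someisp.net")
someisp = find(root, "someisp.net")
print(bigcorp.mails, bigcorp.spams)  # 150 12
print(nyc.mails, nyc.spams)          # 105 92
print(someisp.mails, someisp.spams)  # 305 97
print(root.mails, root.spams)        # 455 109
```

Leaf nodes hold the counts of individual hosts, while every ancestor holds the combined totals of its subtree, matching the description of FIGS. 5-9.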
[0062] Continuing with this example, the determination of a
reputation (in this instance, the probability that a subsequent
email will be spam) of a host already in the tree is
straightforward. We break apart and reverse the labels as during an
insert, and walk the tree from the root until we find the
corresponding leaf node. As shown in FIG. 10, to find the
reputation of "home-user-9-8-7-6.nyc.someisp.net" 710 we can see
the probability of a new email from that host being spam is
22/25=0.88.
[0063] The power of this model, however, comes from the ability to
estimate a reputation score for a host whose behavior has not
previously been observed. In that case we walk as far towards the
leaves of the tree as possible then calculate the probability at
that point. For example, to find the reputation score of the host
"mx3.bigcorp.com" we would walk the tree as far as the bigcorp node
504, at which point we could make no further progress. Thus we
would assign to this host the score at that node, i.e. 12/150=0.08.
This score can be considered the aggregate reputation of all nodes
underneath bigcorp 504, i.e. mx1 508 and mx2 602. In the case of
such a partial match, we are able to provide a statistical measure
of confidence in our estimation by calculating the standard error.
By considering the nodes below bigcorp 504 to be a sample
population of two in which the mean probability is that of the
bigcorp node 504, we can then calculate the variances of mx1 508
and mx2 602 to obtain the standard error of the measurement. In
this way we can require a certain level of confidence in the
probability estimate that is a function of the number of nodes
beneath bigcorp 504 and the degree of similarity (in terms of
behavior) between them. In other words, the more samples we take
(i.e., the more descendants that are underneath the last node of
our partial match) and the more uniform those samples are, the more
confident we can be that our estimate for "mx3.bigcorp.com" is
accurate. In this example, of course, we do not have enough samples
to be at all confident in our estimate, and so a likely result of
this query might be "not enough information".
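The exact and partial lookups described above may be sketched as follows. The text does not give a formula for the standard error, so this sketch assumes one plausible reading: each child of the last matched node is one sample, deviations are taken from the matched node's own ratio, and the standard error is the square root of the sample variance divided by the number of children. The names `Estimate`, `query`, `answer`, and the threshold `MIN_SAMPLES` are hypothetical, and every inserted host is assumed to have sent at least one email.

```python
import math
from collections import namedtuple

Estimate = namedtuple("Estimate", "score std_error samples exact")
MIN_SAMPLES = 10  # hypothetical threshold; the text does not specify one


class Node:
    def __init__(self):
        self.mails = 0
        self.spams = 0
        self.children = {}

    def ratio(self):
        return self.spams / self.mails


def insert(root, hostname, mails, spams):
    node = root
    node.mails += mails
    node.spams += spams
    for label in reversed(hostname.split(".")):
        node = node.children.setdefault(label, Node())
        node.mails += mails
        node.spams += spams


def query(root, hostname):
    """Walk from the root as far towards the leaves as the labels allow."""
    labels = list(reversed(hostname.split(".")))
    node, matched = root, 0
    for label in labels:
        child = node.children.get(label)
        if child is None:
            break
        node, matched = child, matched + 1
    if node.mails == 0:
        return None  # empty tree: nothing has been observed yet
    if matched == len(labels):
        return Estimate(node.ratio(), None, 0, True)
    # Partial match: treat the children of the last matched node as samples.
    ratios = [child.ratio() for child in node.children.values()]
    n = len(ratios)
    if n < 2:
        return Estimate(node.ratio(), None, n, False)
    variance = sum((r - node.ratio()) ** 2 for r in ratios) / (n - 1)
    return Estimate(node.ratio(), math.sqrt(variance / n), n, False)


def answer(estimate):
    """Refuse to answer when a partial match rests on too few samples."""
    if estimate is None or (not estimate.exact and estimate.samples < MIN_SAMPLES):
        return "not enough information"
    return estimate.score


root = Node()
insert(root, "mx1.bigcorp.com", 50, 2)
insert(root, "mx2.bigcorp.com", 100, 10)
insert(root, "home-user-9-8-7-6.nyc.someisp.net", 25, 22)

known = query(root, "home-user-9-8-7-6.nyc.someisp.net")
unseen = query(root, "mx3.bigcorp.com")
print(answer(known))                 # 0.88
print(unseen.score, unseen.samples)  # 0.08 2
print(answer(unseen))                # not enough information
```

Under this reading the estimate weights every child equally regardless of how much mail it sent; a volume-weighted variance would be an equally defensible choice.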
[0064] The previous example was for the development and utilization
of a named, or logical, tree. In embodiments an unnamed, or
physical, tree may be developed. The physical tree may be very
similar conceptually to the logical tree as described herein, with
the only difference being that we must construct it based on the IP
addresses of the inserted hosts. We once again start at the root
302B with the empty tree as shown in FIG. 11. Now let us add the
host 1.2.3.4, which was observed sending 40 emails, 18 of which
were spam. To do this we convert the IP address in dotted quad form
to binary:
[0065] 1.2.3.4=00000001 00000010 00000011 00000100
[0066] We then insert a node 1202 corresponding to the host, as
shown in FIG. 12. Next we insert the host 15.90.152.3, which sent
50 emails, none of which were spam. We first convert the IP address
to binary:
[0067] 15.90.152.3=00001111 01011010 10011000 00000011
[0068] Notice that this differs from the previous address at the
5th binary digit from the left, which was 0 in the previous address
and is 1 in the current one. We thus add a "splitter" node 1302 as the
parent of the existing leaf 1202 and add the new leaf 1304 as its
second child, as shown in FIG. 13. Finally we add the host
3.99.58.12, which sent 15 emails, 14 of which were spam:
[0069] 3.99.58.12=00000011 01100011 00111010 00001100
[0070] This time the binary address varies from those already
present at the 7th digit, resulting in the addition of a new
splitter 1402 and the addition of the new leaf 1404 as its second
child, as shown in FIG. 14. Thus we form a binary search tree given
the IP addresses of the hosts we are inserting.
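The binary conversions and the bit positions quoted in this example can be checked with a short Python sketch. The helper names `to_bits` and `first_difference` are assumptions introduced here; `ipaddress` is the Python standard-library module.

```python
import ipaddress


def to_bits(address):
    """Render a dotted-quad IPv4 address as a 32-character bit string."""
    return format(int(ipaddress.IPv4Address(address)), "032b")


def first_difference(a, b):
    """1-based position, from the left, of the first bit where a and b differ."""
    for position, (x, y) in enumerate(zip(to_bits(a), to_bits(b)), start=1):
        if x != y:
            return position
    return None  # the two addresses are identical


print(to_bits("1.2.3.4"))                          # 00000001000000100000001100000100
print(first_difference("1.2.3.4", "15.90.152.3"))  # 5
print(first_difference("1.2.3.4", "3.99.58.12"))   # 7
```

The position of the first differing bit is what determines where a splitter node is placed in the physical tree.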
[0071] Querying may be performed in the same way as with the
logical tree as described herein, by walking from the root 302B to
the leaf (or as far as possible in the case of a partial match) and
calculating or estimating the reputation score at that point. The
same statistical method as described herein may be used to obtain a
confidence interval in the estimated probability.
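One way to realize the physical tree and its query is a path-compressed binary trie, sketched below. This is an interpretation rather than the structure mandated by the text: the splitter nodes are assumed to carry the combined counts of their subtrees, as the ancestor nodes of the logical tree do, and the names `Node`, `insert`, and `query` are hypothetical.

```python
import ipaddress


def to_bits(address):
    return format(int(ipaddress.IPv4Address(address)), "032b")


class Node:
    def __init__(self, prefix, mails=0, spams=0):
        self.prefix = prefix  # bits shared by every host below this node
        self.mails = mails
        self.spams = spams
        self.children = {}    # next bit ("0" or "1") -> child node


def common_length(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(root, address, mails, spams):
    bits = to_bits(address)
    node = root
    while True:
        node.mails += mails
        node.spams += spams
        if node.prefix == bits:
            return  # host already present; counts updated
        branch = bits[len(node.prefix)]
        child = node.children.get(branch)
        if child is None:
            node.children[branch] = Node(bits, mails, spams)
            return
        shared = common_length(child.prefix, bits)
        if shared < len(child.prefix):
            # Add a splitter as the parent of the existing child.
            splitter = Node(bits[:shared], child.mails, child.spams)
            splitter.children[child.prefix[shared]] = child
            node.children[branch] = splitter
            child = splitter
        node = child


def query(root, address):
    """Walk from the root as far as the address allows; return that node."""
    bits = to_bits(address)
    node = root
    while node.prefix != bits:
        child = node.children.get(bits[len(node.prefix)])
        if child is None or not bits.startswith(child.prefix):
            break
        node = child
    return node


root = Node("")
insert(root, "1.2.3.4", 40, 18)
insert(root, "15.90.152.3", 50, 0)
insert(root, "3.99.58.12", 15, 14)

exact = query(root, "1.2.3.4")
partial = query(root, "1.2.3.5")  # never observed
print(exact.spams, exact.mails)                        # 18 40
print(partial.prefix, partial.spams, partial.mails)    # 000000 32 55
```

The unseen address 1.2.3.5 stops at the splitter covering the first six bits, so its estimated score would be 32/55 under these assumptions.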
[0072] In embodiments, the present invention uses hierarchies to
model parent/child relationships between and within entities 208,
thereby representing the entities 208 in a natural and useful way.
This is achieved through the inheritance of reputation from
ancestral entities to their descendants. A significant advantage of
the present invention may be that reputation can be assigned to
entities 208 even if they have never been seen before, provided the
entity 208 has a known ancestor somewhere in the reputation tree.
As a result the system may not need to be updated as often, and may
only degrade slowly in its utility over time if updates do not
occur, and thus the data may be updated and published on a less
demanding schedule. Finally, this invention may provide a unified
view of all relationships affecting the reputability of an entity
208, from the network owner at the highest level to individual
hosts, web pages etc. at the lowest level. By using the present
invention it may be possible to assign a reliable and statistically
justifiable reputation score to Internet entities 208 even if
previously unseen, while eliminating the requirement that the model
be updated and published to consumers with the utmost urgency. In
embodiments, using the present invention may result in more botnet
spam blocked than is possible with a simple flat list of bad
senders, no matter how rapidly it is updated.
[0073] Referring to FIG. 15, in embodiments of the present
invention improved capabilities are described for predicting the
reputation of a communication identifier 1500, such as a web
address, a domain name, an IP address, host name, email address, IM
address, telephone number, VoIP telephony address, and the like. In
embodiments, the present invention may receive a communication 1502
from a first communication identifier, parse the first
communication identifier into its components, and assign the
components to a hierarchical tree structure, where the hierarchical
tree structure maintains the hierarchical relationship between the
components of the communication identifier. The present invention
may monitor and keep count 1504 of a number of communications from
the first communication identifier, wherein the number of
communications may be kept for both malicious and/or unwanted
communications and non-malicious and/or unwanted communications.
Attributes may then be provided 1508 to the number of
communications for each appropriate component of the hierarchical
tree, and a statistical measure may be calculated as related to the
number of communications for each component of the hierarchical
tree. The present invention may then receive a communication 1510
from a second communication identifier, where the second
communication identifier may be previously unknown and have a
common component with the hierarchical tree. The statistical
measure of the common component may then be assigned 1512 to the
second communication identifier, and utilizing the statistical
measure assigned to the second communication identifier, may
provide a prediction of reputation of the second communication
identifier. In embodiments, the communication identifier may be in
the form of a web address, a domain name, domain name components,
an IP address, IP address components, an email address, an IM
address, a telephone number, a VoIP telephony address, and the
like. In embodiments, the component may be at a root level, a top
level, a lower level, and the like. In embodiments, the at least
one of malicious and unwanted communication may be spam, from a
computer that is part of a botnet, and the like. In embodiments,
the statistical measure may relate the number of at least one of
malicious and unwanted communications to the number of total
communications. In embodiments, the reputation may be related to
how much spam the website communicates.
[0074] The methods and systems described herein may be deployed in
part or in whole through a machine that executes computer software,
program codes, and/or instructions on a processor. The present
invention may be implemented as a method on the machine, as a
system or apparatus as part of or in relation to the machine, or as
a computer program product embodied in a computer readable medium
executing on one or more of the machines. The processor may be part
of a server, client, network infrastructure, mobile computing
platform, stationary computing platform, or other computing
platform. A processor may be any kind of computational or
processing device capable of executing program instructions, codes,
binary instructions and the like. The processor may be or include a
signal processor, digital processor, embedded processor,
microprocessor or any variant such as a co-processor (math
co-processor, graphic co-processor, communication co-processor and
the like) and the like that may directly or indirectly facilitate
execution of program code or program instructions stored thereon.
In addition, the processor may enable execution of multiple
programs, threads, and codes. The threads may be executed
simultaneously to enhance the performance of the processor and to
facilitate simultaneous operations of the application. By way of
implementation, methods, program codes, program instructions and
the like described herein may be implemented in one or more threads.
The thread may spawn other threads that may have assigned
priorities associated with them; the processor may execute these
threads based on priority or any other order based on instructions
provided in the program code. The processor may include memory that
stores methods, codes, instructions and programs as described
herein and elsewhere. The processor may access a storage medium
through an interface that may store methods, codes, and
instructions as described herein and elsewhere. The storage medium
associated with the processor for storing methods, programs, codes,
program instructions or other type of instructions capable of being
executed by the computing or processing device may include but may
not be limited to one or more of a CD-ROM, DVD, memory, hard disk,
flash drive, RAM, ROM, cache and the like.
[0075] A processor may include one or more cores that may enhance
speed and performance of a multiprocessor. In embodiments, the
processor may be a dual core processor, a quad core processor, another
chip-level multiprocessor, and the like that combine two or more
independent cores (called a die).
[0076] The methods and systems described herein may be deployed in
part or in whole through a machine that executes computer software
on a server, client, firewall, gateway, hub, router, or other such
computer and/or networking hardware. The software program may be
associated with a server that may include a file server, print
server, domain server, internet server, intranet server and other
variants such as secondary server, host server, distributed server
and the like. The server may include one or more of memories,
processors, computer readable media, storage media, ports (physical
and virtual), communication devices, and interfaces capable of
accessing other servers, clients, machines, and devices through a
wired or a wireless medium, and the like. The methods, programs or
codes as described herein and elsewhere may be executed by the
server. In addition, other devices required for execution of
methods as described in this application may be considered as a
part of the infrastructure associated with the server.
[0077] The server may provide an interface to other devices
including, without limitation, clients, other servers, printers,
database servers, print servers, file servers, communication
servers, distributed servers and the like. Additionally, this
coupling and/or connection may facilitate remote execution of
a program across the network. The networking of some or all of these
devices may facilitate parallel processing of a program or method
at one or more locations without deviating from the scope of the
invention. In addition, any of the devices attached to the server
through an interface may include at least one storage medium
capable of storing methods, programs, code and/or instructions. A
central repository may provide program instructions to be executed
on different devices. In this implementation, the remote repository
may act as a storage medium for program code, instructions, and
programs.
[0078] The software program may be associated with a client that
may include a file client, print client, domain client, internet
client, intranet client and other variants such as secondary
client, host client, distributed client and the like. The client
may include one or more of memories, processors, computer readable
media, storage media, ports (physical and virtual), communication
devices, and interfaces capable of accessing other clients,
servers, machines, and devices through a wired or a wireless
medium, and the like. The methods, programs or codes as described
herein and elsewhere may be executed by the client. In addition,
other devices required for execution of methods as described in
this application may be considered as a part of the infrastructure
associated with the client.
[0079] The client may provide an interface to other devices
including, without limitation, servers, other clients, printers,
database servers, print servers, file servers, communication
servers, distributed servers and the like. Additionally, this
coupling and/or connection may facilitate remote execution of
a program across the network. The networking of some or all of these
devices may facilitate parallel processing of a program or method
at one or more locations without deviating from the scope of the
invention. In addition, any of the devices attached to the client
through an interface may include at least one storage medium
capable of storing methods, programs, applications, code and/or
instructions. A central repository may provide program instructions
to be executed on different devices. In this implementation, the
remote repository may act as a storage medium for program code,
instructions, and programs.
[0080] The methods and systems described herein may be deployed in
part or in whole through network infrastructures. The network
infrastructure may include elements such as computing devices,
servers, routers, hubs, firewalls, clients, personal computers,
communication devices, routing devices and other active and passive
devices, modules and/or components as known in the art. The
computing and/or non-computing device(s) associated with the
network infrastructure may include, apart from other components, a
storage medium such as flash memory, buffer, stack, RAM, ROM and
the like. The processes, methods, program codes, instructions
described herein and elsewhere may be executed by one or more of
the network infrastructural elements.
[0081] The methods, program codes, and instructions described
herein and elsewhere may be implemented on a cellular network
having multiple cells. The cellular network may be either a frequency
division multiple access (FDMA) network or a code division multiple
access (CDMA) network. The cellular network may include mobile
devices, cell sites, base stations, repeaters, antennas, towers,
and the like. The cell network may be a GSM, GPRS, 3G, EVDO, mesh,
or other network type.
[0082] The methods, program codes, and instructions described
herein and elsewhere may be implemented on or through mobile
devices. The mobile devices may include navigation devices, cell
phones, mobile phones, mobile personal digital assistants, laptops,
palmtops, netbooks, pagers, electronic book readers, music players
and the like. These devices may include, apart from other
components, a storage medium such as a flash memory, buffer, RAM,
ROM and one or more computing devices. The computing devices
associated with mobile devices may be enabled to execute program
codes, methods, and instructions stored thereon. Alternatively, the
mobile devices may be configured to execute instructions in
collaboration with other devices. The mobile devices may
communicate with base stations interfaced with servers and
configured to execute program codes. The mobile devices may
communicate on a peer to peer network, mesh network, or other
communications network. The program code may be stored on the
storage medium associated with the server and executed by a
computing device embedded within the server. The base station may
include a computing device and a storage medium. The storage device
may store program codes and instructions executed by the computing
devices associated with the base station.
[0083] The computer software, program codes, and/or instructions
may be stored and/or accessed on machine readable media that may
include: computer components, devices, and recording media that
retain digital data used for computing for some interval of time;
semiconductor storage known as random access memory (RAM); mass
storage typically for more permanent storage, such as optical
discs, forms of magnetic storage like hard disks, tapes, drums,
cards and other types; processor registers, cache memory, volatile
memory, non-volatile memory; optical storage such as CD, DVD;
removable media such as flash memory (e.g. USB sticks or keys),
floppy disks, magnetic tape, paper tape, punch cards, standalone
RAM disks, Zip drives, removable mass storage, off-line, and the
like; other computer memory such as dynamic memory, static memory,
read/write storage, mutable storage, read only, random access,
sequential access, location addressable, file addressable, content
addressable, network attached storage, storage area network, bar
codes, magnetic ink, and the like.
[0084] The methods and systems described herein may transform
physical and/or intangible items from one state to another. The
methods and systems described herein may also transform data
representing physical and/or intangible items from one state to
another.
[0085] The elements described and depicted herein, including in
flow charts and block diagrams throughout the figures, imply
logical boundaries between the elements. However, according to
software or hardware engineering practices, the depicted elements
and the functions thereof may be implemented on machines through
computer executable media having a processor capable of executing
program instructions stored thereon as a monolithic software
structure, as standalone software modules, or as modules that
employ external routines, code, services, and so forth, or any
combination of these, and all such implementations may be within
the scope of the present disclosure. Examples of such machines may
include, but may not be limited to, personal digital assistants,
laptops, personal computers, mobile phones, other handheld
computing devices, medical equipment, wired or wireless
communication devices, transducers, chips, calculators, satellites,
tablet PCs, electronic books, gadgets, electronic devices, devices
having artificial intelligence, computing devices, networking
equipment, servers, routers and the like. Furthermore, the
elements depicted in the flow chart and block diagrams or any other
logical component may be implemented on a machine capable of
executing program instructions. Thus, while the foregoing drawings
and descriptions set forth functional aspects of the disclosed
systems, no particular arrangement of software for implementing
these functional aspects should be inferred from these descriptions
unless explicitly stated or otherwise clear from the context.
Similarly, it will be appreciated that the various steps identified
and described above may be varied, and that the order of steps may
be adapted to particular applications of the techniques disclosed
herein. All such variations and modifications are intended to fall
within the scope of this disclosure. As such, the depiction and/or
description of an order for various steps should not be understood
to require a particular order of execution for those steps, unless
required by a particular application, or explicitly stated or
otherwise clear from the context.
[0086] The methods and/or processes described above, and steps
thereof, may be realized in hardware, software or any combination
of hardware and software suitable for a particular application. The
hardware may include a general purpose computer and/or a dedicated
computing device, a specific computing device, or a particular aspect
or component of a specific computing device. The processes may be
realized in one or more microprocessors, microcontrollers, embedded
microcontrollers, programmable digital signal processors or other
programmable devices, along with internal and/or external memory.
The processes may also, or instead, be embodied in an application
specific integrated circuit, a programmable gate array,
programmable array logic, or any other device or combination of
devices that may be configured to process electronic signals. It
will further be appreciated that one or more of the processes may
be realized as computer executable code stored on a machine
readable medium and capable of being executed by a machine.
[0087] The computer executable code may be created using a
structured programming language such as C, an object oriented
programming language such as C++, or any other high-level or
low-level programming language (including assembly languages,
hardware description languages, and database programming languages
and technologies) that may be stored, compiled or interpreted to
run on one of the above devices, as well as heterogeneous
combinations of processors, processor architectures, or
combinations of different hardware and software, or any other
machine capable of executing program instructions.
[0088] Thus, in one aspect, each method described above and
combinations thereof may be embodied in computer executable code
that, when executing on one or more computing devices, performs the
steps thereof. In another aspect, the methods may be embodied in
systems that perform the steps thereof, and may be distributed
across devices in a number of ways, or all of the functionality may
be integrated into a dedicated, standalone device or other
hardware. In another aspect, the means for performing the steps
associated with the processes described above may include any of
the hardware and/or software described above. All such permutations
and combinations are intended to fall within the scope of the
present disclosure.
[0089] While the invention has been disclosed in connection with
the preferred embodiments shown and described in detail, various
modifications and improvements thereon will become readily apparent
to those skilled in the art. Accordingly, the spirit and scope of
the present invention are not to be limited by the foregoing
examples, but are to be understood in the broadest sense allowable
by law.
[0090] All documents referenced herein are hereby incorporated by
reference.
* * * * *