System and method for filtering of web-based content stored on a proxy cache server McClain, Carolyn B. ; et al. [Novell, Inc.]

Patent Application Summary

U.S. patent application number 10/878782 was filed with the patent office on 2005-01-27 for system and method for filtering of web-based content stored on a proxy cache server. This patent application is currently assigned to Novell, Inc. Invention is credited to McClain, Carolyn B., Thatcher, Jim E.

Application Number	20050021796 10/878782
Family ID	32772244
Filed Date	2005-01-27

United States Patent Application	20050021796
Kind Code	A1
McClain, Carolyn B. ; et al.	January 27, 2005

System and method for filtering of web-based content stored on a proxy cache server

Abstract

A system and method for filtering of web-based content in a proxy cache server environment provides a local network having a client, a directory server and a proxy cache server that caches predetermined Internet-derived web content within the network. When content is requested, it is vended to the client only if it meets predefined user policies for acceptability. These policies are implemented based upon one or more ratings lists provided by content rating vendors. The lists are downloaded to the network in whole or part, and cached for use in determining acceptability of content by a filter application. Ratings can be particularly based upon predetermined content categories. Caching occurs in a host or object cache for rapid access. Only if current ratings are not found in the host or object caches are ratings caches or vendors accessed for ratings. Ratings on requested content are then placed in the host or object cache for subsequent use. Object parsing or other techniques can be used to screen returned content that is unrated or otherwise allowed to pass to ensure that it is appropriate.

Inventors:	McClain, Carolyn B.; (Springville, UT) ; Thatcher, Jim E.; (Pleasant Grove, UT)
Correspondence Address:	SCHWEGMAN, LUNDBERG, WOESSNER & KLUTH, P.A. P.O. BOX 2938 MINNEAPOLIS MN 55402 US
Assignee:	Novell, Inc. Provo UT
Family ID:	32772244
Appl. No.:	10/878782
Filed:	June 28, 2004

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
10878782	Jun 28, 2004
09560307	Apr 27, 2000	6772214

Current U.S. Class:	709/229 ; 707/E17.12
Current CPC Class:	G06F 16/9574 20190101
Class at Publication:	709/229
International Class:	G06F 015/16

Claims

What is claimed is:

1.-12. (Canceled)

13. A method to filter content, comprising: automatically acquiring from a ratings vendor a rating associated with content; comparing the rating to policies associated with a client; permitting the client to acquire the content if permitted by the policies; and denying the client access to the content if the policies do not permit.

14. The method of claim 13, further comprising: pre-acquiring the content on a proxy; and applying the policies for the client on the proxy.

15. The method of claim 13, further comprising receiving, from the client, a request for the content before automatically acquiring the rating.

16. The method of claim 13, wherein automatically acquiring further includes acquiring the rating for the content from a list of ratings provided by the ratings vendor, wherein a number of other ratings for other content of a content vendor is provided in the list.

17. The method of claim 13, further comprising housing the rating for the content in a cache of a proxy for subsequent use when the content is requested by the client again or requested by other clients.

18. The method of claim 13, further comprising identifying the content as a Uniform Resource Locator (URL) sent from the client and intercepted by a proxy, wherein the URL is directed to a content provider and the ratings vendor provides the rating for the content to the proxy.

19. The method of claim 13, wherein automatically acquiring the rating further includes dynamically inspecting, by the ratings vendor, information in the content and dynamically producing the rating based on the dynamic inspection.

20. A method to filter content, comprising: intercepting a client request for content; acquiring one or more policies for the client with respect to receiving the content; searching cache for a rating for the content; acquiring the rating from a ratings vendor if the rating is not in the cache; and enforcing the one or more policies against the rating to determine whether the client is permitted to receive the content or is denied access to the content.

21. The method of claim 20, further comprising pre-acquiring the content and storing the content in cache after intercepting the client request.

22. The method of claim 21, further comprising retaining the rating and the content in the cache for subsequent requests of the client for the content or for new requests for the content from other clients.

23. The method of claim 20, wherein acquiring the rating from the ratings vendor further includes receiving the rating as a category and searching a predefined list of categories to determine if the category is acceptable or unacceptable.

24. The method of claim 20, wherein intercepting the client request further includes identifying the request for the content as a Uniform Resource Locator (URL) associated with a content provider and determining an identity of the ratings vendor based on an identity of the content provider.

25. The method of claim 20, wherein enforcing the one or more policies further includes: determining that the rating acquired from the ratings vendor is not rated by the ratings vendor; acquiring the content from a content provider; and inspecting the content to determine the rating based on predefined words, phrases, or Uniform Resource Locators (URLs) embedded within the content.

26. The method of claim 20, wherein enforcing the one or more policies further includes: determining that the rating acquired from the ratings vendor is deficient for purposes of enforcing the one or more policies; acquiring the content from a content provider; and acquiring from a ratings supplier a modified rating for the rating based on the content that is supplied to the ratings supplier.

27. The method of claim 20, wherein acquiring the rating from the ratings vendor further includes receiving a list of ratings for provided content distributed by a content provider, wherein the rating is acquired from the list based on the content associated with the client request.

28. A system to filter content, comprising: policies for clients that request content from content providers; and a proxy that intercepts requests from the clients, inspects cache for ratings associated with content that satisfies the requests, contacts ratings vendors for a number of the ratings not in the cache, and applies the policies against the ratings to determine whether the clients are to receive the content or be denied access to the content.

29. The system of claim 28, wherein the proxy is a directory service.

30. The system of claim 28, wherein the proxy dynamically determines a number of the ratings in instances where some ratings are not in the cache and not adequately described by the ratings vendors by inspecting the content associated with the requests.

31. The system of claim 28, wherein the proxy dynamically submits the content to a ratings supplier when a corresponding rating is not in the cache and not adequately described by the ratings vendors.

32. The system of claim 28, wherein the clients include World-Wide Web (WWW) browser applications that interface with the proxy over a network and the requests are Uniform Resource Locators (URLs) issued from the browser applications for the content providers, and wherein the URLs are intercepted by the proxy.

Description

FIELD OF THE INVENTION

[0001] This invention relates to Internet content filtering and more particularly to the filtering of web content stored on a local intranet proxy cache server.

BACKGROUND OF THE INVENTION

[0002] Access by home and business computers to large, unrestricted sources of information, such as those available through the World Wide Web (the "Web") domain on the well-known Internet, has increased exponentially in recent years. In many computing environments, it is desirable to restrict access to certain types of information on the Internet and other networks by selected users. For example, educational institutions and parents may wish to allow children access only to educational content. Businesses may, likewise, wish to restrict content accessible over their networks, based upon legal, moral and productivity concerns. Many computers joined in Local Area Networks (LANs) frequently employ a network architecture that includes a proxy cache server to store and dispense Internet content. In a common proxy cache arrangement, a network user (a client), typically represented by a stand-alone microcomputer with an appropriate network interface, requests download of Internet web site content by entering the site's Uniform Resource Locator (URL) address into a web browser application resident on the client computer. The request is then transferred to a proxy cache server within the local network that may or may not already contain a current copy of the desired web content. If the content is present in the cache, the proxy cache server, rather than placing a call over the Internet to the remote site, instead transmits the requested web content to the client from the local network storage.

[0003] FIG. 1 illustrates a generalized architecture for a local network that includes a proxy cache server. The illustrated network is described more particularly in related U.S. patent application Ser. No. 08/905,150, entitled User Name Authentication for Gateway Clients Accessing A Proxy Cache Server. By way of background, further teachings related to a proxy cache server environment are also disclosed in U.S. patent application Ser. No. 09/023,895, entitled Client Inherited Functionally Derived From a Proxy Topology Where Each Proxy is Independently Configured; U.S. patent application Ser. No. 09/195,982, entitled Proxy Cache Cluster; and U.S. Provisional Patent Application Ser. No. 60/128,829, entitled Object Cache Store--all of which are assigned to Novell, Inc. of Provo, Utah, and the teaching of each of the aforesaid patent applications being expressly incorporated herein by reference.

[0004] Particularly, FIG. 1 illustrates an architecture-level block diagram of a local area network having a proxy cache server and associated applications. The network 20 includes a plurality of clients shown generally by the exemplary client block 22. Each client can comprise a stand-alone microcomputer having a central processing unit (CPU) 24, a memory 26 and a network adapter 28 for communication, all linked by a bus 30. Each client is linked with its own user interface 32 that allows data to be viewed and instructions to be transmitted. The user interface typically includes a keyboard, monitor and a screen-cursor manipulator, such as a mouse. The client is linked to a local network or intranet 34. Packets of data can be transferred over the intranet using the well-known Internet Protocol (IP), or Novell's improved proprietary protocol, IPX or other common protocols.

[0005] The intranet 34 is, likewise, linked with a Novell Directory Services (NDS) server 36, which operates in the commercially available Novell NetWare network operating system environment and other commercially available network operating systems. This server includes its own CPU 38, memory 40 and network adapter 42, linked by a bus 44 to the intranet 34. An associated NDS data storage device, disk 46, is also linked to the server 36. The NDS server 36 and storage device 46 store and distribute data related to client user names. Using proprietary or open standard-based data calls, the clients each poll the NDS server for the unique NDS user name. The NDS user name is used for further communication by the client once it is received over the intranet. A proxy cache server 50 is also provided, linked to the intranet by an appropriate bus. The proxy cache server also contains a CPU 52, memory 54 and network adapter 56. The proxy cache server, in this example, is linked by network link 60 to the well-known Internet communication network 62. A large number of nodes and routers enable transfer of TCP/IP formatted data packets to and from various remote sites. One such remote site consisting of a web server 64 is illustrated. The web server 64 includes its own associated data storage device such as the disk 66. In essence, the proxy cache server 50 acts as a "firewall" between the external Internet 62 and the intranet 34. Requests for web site information are first routed from clients through the intranet 34 to the proxy cache server 50. If the client is authorized to request information from a particular web site, then the information is retrieved from the memory 54 (if such information is already cached in the memory) or it is, at that time, retrieved from the remote web site for transfer to the client.

[0006] As suggested above, it is desirable that advanced filtering techniques be employed to further ensure that the particular client can only access information from the web that is authorized. In the past this has generally entailed the physical scanning and blocking of selected web content, often on a URL-by-URL basis, by the system administrator. The recent rise of independent ratings services that rate the content of a very large number of Internet sites affords an opportunity to automate the filtering function further, and to place it into the province of specialists in the field. Often, however, these services are not readily adaptable to a given network environment and employ a variety of different rating criteria and content categories. It is, therefore, an object of this invention to provide filtering that is readily adaptable to a proxy cache server environment and that enables a variety of different filtering services and databases to be employed with relative ease.

SUMMARY OF THE INVENTION

[0007] This invention overcomes the disadvantages of the prior art by providing a filter that selectively enables access or blocks requested web information by a client in a local network based upon content rating information stored in connection with a large number of known web sites. Such ratings can be stored based upon the site's URL address. When ratings are obtained, they can be applied based upon predefined user policies stored in association with the storage bank and authentication mechanism (such as NDS). Content can be stored in the local network in connection with a proxy cache server application.

[0008] The ratings can comprise a set of categories and sub-categories into which certain content falls based upon a ratings service's subjective criteria. Each list is cached in whole or part in the local network in a ratings cache. The list is updated by action of either the ratings service or the filter and new lists can be transmitted over the Internet, or another network, from a remote vendor site. Each vendor may provide a software module (a NetWare Loadable Module in this embodiment) to implement the manipulation of the provided ratings list by the filter. The module may include update procedures, interpretations and translations of proprietary ratings structures and types of content rated. The filter can be configured to vend requested content, block requested content, or monitor requested content (e.g. vend content, but make a log-file entry noting the type of content vended and to whom). The vend/block/monitor decision is based upon a variety of criteria including override lists, always-acceptable allow lists and always-blocked block page lists, typically dependent on the specific categories associated generally across all URLs, but also upon specific underlying URLs that may or may not be allowable.
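The vend/block/monitor decision can be illustrated with a minimal Python sketch. The names used here (Action, decide, allow_list, block_list, category_policy) and the precedence order, in which URL-specific lists are consulted before category policies and the most restrictive category outcome wins, are assumptions made for illustration only; the description does not specify them.

```python
from enum import Enum


class Action(Enum):
    VEND = "vend"
    BLOCK = "block"
    MONITOR = "monitor"  # vend the content, but make a log-file entry


def decide(url, categories, allow_list, block_list, category_policy,
           default=Action.VEND):
    """Return the action for one request (hypothetical precedence)."""
    # URL-specific lists override any category-based policy.
    if url in allow_list:
        return Action.VEND
    if url in block_list:
        return Action.BLOCK
    # Map each rated category to its configured action.
    actions = {category_policy.get(c, default) for c in categories}
    # Assumed tie-break: the most restrictive outcome wins.
    for action in (Action.BLOCK, Action.MONITOR, Action.VEND):
        if action in actions:
            return action
    return default  # unrated content falls back to the default action
```

Under these assumptions, a URL on the allow list is vended even when one of its categories would otherwise be blocked.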

[0009] A significant advantage to the system and method according to a preferred embodiment of this invention is increased speed resulting from efficient look-up of content ratings, upon which allow, block and vend-but-monitor/warn decisions are based. In the proxy server environment, content ratings are looked-up in the content host cache first. If ratings are not found in the host cache, or are inconclusive, then the object cache is checked. Then the rating cache is checked, and the rating, if found, is placed in the host or object cache for speedier look-up the next time. If a rating is not found in one of the caches, then it is sought over the Internet or another remote location from one or more rating service providers. If found, it is returned and stored for future use. Late rating service providers that scan content for key words and phrases are also used at this time to provide ratings.
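The look-up order described above can be sketched in Python as follows. This is a simplified illustration, not the implementation of the embodiment: the caches are modeled as plain dictionaries keyed by URL, a missing or inconclusive rating is modeled as None, fetch_from_vendor is a hypothetical callable standing in for a remote rating service provider, and promotion into the host cache (rather than the object cache) is an arbitrary choice.

```python
def lookup_rating(url, host_cache, object_cache, rating_cache, fetch_from_vendor):
    """Return a rating for url, checking the fastest caches first."""
    rating = host_cache.get(url)
    if rating is not None:
        return rating
    rating = object_cache.get(url)
    if rating is not None:
        return rating
    rating = rating_cache.get(url)
    if rating is None:
        # Not cached locally: ask a remote rating service provider.
        rating = fetch_from_vendor(url)
        if rating is not None:
            rating_cache[url] = rating  # store for future use
    if rating is not None:
        # Promote to the host cache for a speedier look-up next time.
        host_cache[url] = rating
    return rating
```

After the first request for a URL, later requests are answered from the host cache without consulting the rating cache or the vendor.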

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] The foregoing and other objects and advantages of the invention will become more clear with reference to the following detailed description as illustrated by the drawings in which:

[0011] FIG. 1 is a block diagram of a network having a proxy cache server according to the prior art;

[0012] FIG. 2 is a communication-protocol level block diagram of the network of FIG. 1 including an object content filter according to this invention;

[0013] FIG. 3 is a block diagram illustrating a generalized object content filter arrangement, according to an embodiment of this invention, employing a local user directory;

[0014] FIG. 4 is a block diagram illustrating a generalized object content filter arrangement, according to an embodiment of this invention, employing a remote user directory;

[0015] FIG. 5 is a flow diagram detailing an overview of the object content filter user rules evaluation process according to this invention;

[0016] FIG. 6 is a flow diagram of the rating query process for the evaluation process of FIG. 5;

[0017] FIG. 7 is a flow diagram of the rule-skipping process for the evaluation process of FIG. 5;

[0018] FIG. 8 is a flow diagram of the monitoring process for the evaluation process of FIG. 5; and

[0019] FIG. 9 is a flow diagram for the termination process of FIG. 5.

DETAILED DESCRIPTION

[0020] FIG. 2 shows a communication protocol stack organization for a generalized local area network that includes a proxy cache server. As noted above, the network can be configured to use the NetWare operating system or any other suitable standard. The client stack 100 includes a resident web browser application 102, such as the commercially available Netscape Navigator™ product that enables display and navigation of web site content information and a gateway client application 104. In addition, the Dynamic Link Library (DLL) resident as part of the well-known Windows® operating system Windows Socket (WinSock) available from Microsoft Corp. of Redmond, Wash. is shown. The WinSock DLL 130 generally enables communication through the intranet 34 by a client.

[0021] The authentication mechanism chosen according to this embodiment is through the lightweight directory access protocol (LDAP), but NDS, Radius and other well-known systems can also be used. The preferred directory service stack 110 includes a resident directory server application 112 (a directory server). The proxy cache server stack 120 includes a proxy server application 122 and an access control agent application 124. Each of the stacks 100, 110 and 120 includes appropriate Transmission Control Protocol (TCP) communication layers 131, 132 and 134, respectively. Likewise, each of the stacks 100, 110 and 120 includes an IP and/or IPX communication layer 140, 142 and 144 respectively. These layers enable appropriate transmission of packets, and verification of packet receipt over the network. The web site stack 150, which is one of a large number of such stacks, shows a web application 152 which provides content that may or may not be appropriate for receipt by the client. This stack includes appropriate TCP 154 and IP 156 layers. The Internet link 160, shown generally as a jagged line, connects this web site with the proxy cache server stack 120.

[0022] The proxy cache server stack 120 further includes a generalized object content filter (OCF) application 170 according to this invention. This filter is used to determine whether a web application (such as the application 152) may be accessed by the client. The filter will now be described in further detail.

[0023] Reference will be made to FIGS. 3 and 4, which each define the elements of an object content filter and data associated therewith according to this invention. FIGS. 3 and 4 include like reference numerals for like components, and will be referred to together except where otherwise noted. FIG. 3 differs from FIG. 4 in that it includes a localized user directory within the proxy cache environment. As such it is contemplated as an NDS-based proxy cache, while FIG. 4 shows a more-generalized external directory such as a directory service accessible via LDAP. Both types of directory implementations are expressly contemplated according to this invention.

[0024] In general, this invention contemplates that web site URLs can be categorized according to a variety of rating schemes into content that is or is not appropriate for viewing and access by predetermined groups or specific users. For example, certain content may be rated as educational, and accessible by any group. Other content may be rated as pornographic or violent, and not accessible by certain (or any) user groups within the local network. For the purposes of this description, the ratings of URLs are provided by services that monitor and periodically update large lists 306, 308 and/or 314 in order to determine the category of each content provider. The services can take at least two forms. Some services are delivered in whole or in part over the Internet by accessing the web site of the service and downloading all or part of a rating list. The rating list generally includes the URL along with appropriate information. This is discussed further below. Alternatively, vendors may provide disks or other downloadable software that includes a list of current URLs that have been analyzed. In addition to URL-based ratings services, this invention can operate with late rating service providers that generate ratings based upon words and phrases found in a particular document returned from the server. One such service is the Novell Word Parser, available from Novell, Inc. of Provo, Utah.

[0025] A variety of ratings schemes are typically employed by vendors. Some vendors concentrate only on providing URLs that either meet or do not meet certain criteria. For example, some vendors only provide lists of acceptable educational material. Other vendors explicitly provide lists of unacceptable material based on a variety of criteria such as violence and pornographic content. Still other vendors provide various category ratings on a large number of URLs across a wide range of content that may be both acceptable and unacceptable.

[0026] A standard format exists for defining ratings systems. This format is known as the Platform for Internet Content Selection (PICS™) standard, by which labels are associated with Internet content. This standard is currently accessible via the W3C World Wide Web consortium that includes the Massachusetts Institute of Technology, the Defense Advanced Research Projects Agency (DARPA) and a variety of other international computer authorities. The standards and supporting documentation are directly available through the World Wide Web through the Internet address http://www.w3c.org/PICS.

[0027] According to this embodiment, the PICS standard is utilized to define basic ratings categories. In general, to be compatible with the object content filter of this invention, each vendor should format its rating service description to conform to the PICS standard. In this manner, each vendor would provide a PICS rating service description file that describes their own in-house database of ratings information. Note that other standards are usable according to the teachings of this invention.

[0028] Again, referring to FIGS. 3 and 4, the object content filter application is depicted as the functional block 300. This application is implemented generally as an application program interface (API) typically as part of the overall Border Manager™ product available from Novell and resides within the proxy cache server. The functionalities of the filter and related blocks are integrated within the base code of the proxy cache depicted generally as the block 302. In general, the object content filter API is defined so that various content rating vendors can provide structured lists 308 that can be readily interfaced with the defined rating system. As described above, certain vendors can provide NetWare Loadable Module (NLM) applications (ratings supplier process 312) that interface directly with the filter 300. The NLM, in this case, enables a vendor's proprietary ratings structure 314 to be acquired from the vendor and to be stored in a form readable by the filter 300 based upon an object rating API 316.

[0029] The actual rating information comprises a large list of URLs for specific web content sites. Each provider may have a given specialty (e.g. some providers rate only educational material or adult content, while others rate a variety of different content types). Each type of content is ascribed one or more categories within the ratings scheme employed by the filter. This is described in further detail below. Each list of URL ratings is typically obtained over the Internet 310. The lists can be accessed by the filter each time specific web content is requested or, alternatively, they can be cached in, for example, a host cache (including a Domain Name System (DNS) cache 320), as an entry 322, or in an object cache 330 as an attachment to the actual data (which resides as an entry 332 in this cache). There is included a mechanism for periodic update of cached lists that is either initiated by the vendor when updates are available, or by a polling process within the filter. In each case, an Internet communication link is established between the ratings list provider/vendor and the local filter and updated list information is transferred.

[0030] It is generally contemplated that the ratings list vendors provide information according to the filter's accepted format, which, in this embodiment is the PICS standard. However, the applicable NLM can also include translation functionalities to place proprietary ratings data into a format acceptable by the filter. Ratings under the PICS implementation (PICSIMP) process 340 are stored in the rating cache 342.

[0031] The downstream user or proxy client 358, when requesting information via the local intranet, enters a URL that is intercepted by the proxy cache server application 364. The server poses a permission request to the filter 300. If the URL content is already available within the proxy cache server it will be stored as an entry in the object cache 366. The user name of the client is determined from the authentication service (LDAP in this example). Once the name has been associated, sets of rules are stored in the user directory 360 (local directory) or 460 (remote directory interconnected via the Internet in FIG. 4). User profiles are stored in a profile cache 368, and are employed in the authentication process. When a user authenticates, then the active rule set is aggregated and cached. When the user then requests content (user request 376), the rule is applied by the filter 300 (user request phase 366) to determine whether the permission to view the content is granted. The procedure for granting permission is described further below. In summary, the filter 300 accesses rating information either from the existing host cache, object cache or when a URL rating is not already cached by the proxy, from a vendor rating list via, for example, the rating cache 342. Based upon the preset user policies provided through the directory, as controlled by the system administrator (not shown), the filter either allows return of the web content/net data (370) from the appropriate cache or returns a message denying access, to the user 358 (allowance/denial 380 during user data phase 382).

[0032] As noted above, late rating services can be employed during data return to locate certain words or phrases in otherwise permissible or unrated content within the returned data 370. An object parser 388 can locate words, phrases and/or other data in a document that is or is not permissible.
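A word-and-phrase scan of returned content might be sketched as shown below. This is only an illustrative Python example; the function name find_flagged_terms and the whole-word, case-insensitive matching rule are assumptions and do not describe the actual object parser 388.

```python
import re


def find_flagged_terms(text, flagged_terms):
    """Return the flagged words or phrases that occur in text.

    Matching is case-insensitive and anchored on word boundaries, so a
    flagged word is not reported when it is only part of a longer word.
    """
    found = []
    for term in flagged_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found
```

A filter could treat a non-empty result as a reason to block or monitor content that the URL-based ratings left unrated.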

[0033] The structuring of rating service lists into the PICS format will now be described further. In general, the ratings service description is the basis for all URL labels provided by the list vendor. Each time there are additions or deletions from the ratings service list, the list vendor typically generates a new complete URL list that replaces the list previously provided. According to this embodiment, ratings are provided as whole numbers in which lower values are more restrictive and higher values are less restrictive. It is contemplated that each rating represents one or more given categories. Sub-categories can be supported according to an alternate embodiment in which additional numbers are appended to the basic rating numbers. Where more detailed sub-categories are used, these sub-categories can be identified as part of an overall category and sub-category concatenated string with a slash between the two parts. It is noted that certain web content providers automatically apply PICS-based HTML tags (content headers) to provided data that denote a given rating. For example, certain adult entertainment content providers include a content header in their respective web content objects that indicates adult content. According to this embodiment, ratings for URLs are listed individually or grouped with a "wildcard" prefix. According to one arrangement, the rating file can be provided in a format that includes a series of PICS labels. Each label follows the PICS-defined label format and conforms to the accompanying rating service description. URL labels in the rating file that do not currently exist in the filter database are added. Labels that contain a blank rating are removed from the database. Labels that already exist in the database are modified with any updated rating information.
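The add, remove and modify rules for labels in a rating file can be sketched as follows. In this Python illustration the filter database is modeled as a dictionary from URL to rating, each label is reduced to a (URL, rating) pair, and a blank rating is represented by None or an empty string; all of these representations, and the name apply_label_updates, are assumptions rather than part of the PICS label format.

```python
def apply_label_updates(database, labels):
    """Apply a rating file's labels to the filter database in place."""
    for url, rating in labels:
        if rating is None or rating == "":
            # A label with a blank rating removes the URL from the database.
            database.pop(url, None)
        else:
            # Adds the URL if it is new, otherwise updates its rating.
            database[url] = rating
    return database
```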

[0034] Alternatively, the rating file can be provided as a binary file format containing a translation header and a list of URL/rating pairs. The format for the binary rating file is:

[0035] <Category Count>, <Total Bit Count>,

[0036] <Bit Offset>, <Bit Field Length>, <Distinguished Category Name>,

[0037] <URL len><URL><Rating>; where

[0038] Category Count is the number of Distinguished Category Names contained in the header;

[0039] Total Bit Count represents the number of bits in the rating record. This number may be larger than the actual number of bits used to allow byte alignment. It is assumed that the bit field is always byte aligned;

[0040] Bit Offset represents the starting bit that applies to the distinguished category name. Bit offset 0 of the rating is assumed to be the first bit following the URL. Bit offset N of the rating is assumed to be the last bit in the bit field;

[0041] Bit Field Length is the number of bits, starting at the bit offset, that apply to the distinguished category name. Each bit field will be parsed as if the number is in little-endian byte order;

[0042] Distinguished Category Name is the unique name for the category as defined in the rating service description file. This is the "transmit-as" name and must not be longer than 38 characters. For subcategories this may be in the format <category name>\<subcategory name>;

[0043] The Distinguished Category Name must be NULL terminated;

[0044] URL len is a byte value representing the length of the URL string;

[0045] URL is the actual URL being rated (this string is not NULL terminated); and

[0046] Rating is a bit string containing the rating values. Bit 0 of the rating is assumed to be the first bit following the URL. Bit N of the rating is assumed to be the last bit in the bit field.

[0047] By way of example, a sample ratings file is now described. Note that the <URL len><URL><Rating> entries are displayed on individual lines for reading convenience only. In the actual download file they should appear consecutively. In addition, the numbers displayed for the URL length and rating are the hexadecimal representations of an exemplary ratings file entry:

[0048] 5, 16,

[0049] 0, 1, Music,

[0050] 1, 1, Video,

[0051] 2, 2, Profanity,

[0052] 4, 3, Medical,

[0053] 8, 3, Political,

[0054] 12http://www.abc.com8400

[0055] 13http://www.defg.com0240
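
By way of illustration only, the sample above can be decoded with the following minimal sketch. The bit numbering is an assumption: the description states only that bit offset 0 is the first bit following the URL, so the sketch treats bit 0 as the most significant bit of the first rating byte and reads each field most-significant-bit first. The little-endian parsing mentioned above is not modeled, since every field in the sample fits within a single byte. The header is supplied as a Python list rather than parsed from the file, and the names `CategoryField`, `extract_field` and `parse_records` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CategoryField:
    bit_offset: int   # starting bit of the field within the rating
    bit_length: int   # number of bits in the field
    name: str         # Distinguished Category Name


def extract_field(rating: bytes, offset: int, length: int) -> int:
    # Assumption: bit 0 is the most significant bit of the first rating byte.
    value = 0
    for i in range(offset, offset + length):
        bit = (rating[i // 8] >> (7 - i % 8)) & 1
        value = (value << 1) | bit
    return value


def parse_records(data: bytes, total_bit_count: int,
                  fields: List[CategoryField]) -> List[Tuple[str, Dict[str, int]]]:
    rating_len = total_bit_count // 8  # the bit field is always byte aligned
    records = []
    pos = 0
    while pos < len(data):
        url_len = data[pos]  # one-byte URL length
        url = data[pos + 1:pos + 1 + url_len].decode("ascii")  # not NULL terminated
        pos += 1 + url_len
        rating = data[pos:pos + rating_len]
        pos += rating_len
        values = {f.name: extract_field(rating, f.bit_offset, f.bit_length)
                  for f in fields}
        records.append((url, values))
    return records


# Header and records taken from the sample file above.
FIELDS = [
    CategoryField(0, 1, "Music"),
    CategoryField(1, 1, "Video"),
    CategoryField(2, 2, "Profanity"),
    CategoryField(4, 3, "Medical"),
    CategoryField(8, 3, "Political"),
]
SAMPLE = (bytes([0x12]) + b"http://www.abc.com" + bytes([0x84, 0x00]) +
          bytes([0x13]) + b"http://www.defg.com" + bytes([0x02, 0x40]))

records = parse_records(SAMPLE, 16, FIELDS)
```

Under these assumptions, the first sample entry decodes to Music = 1 and Medical = 2, and the second to Medical = 1 and Political = 2, with all remaining categories at 0. A different bit-numbering convention would yield different values.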

[0056] For list maintenance purposes, each URL/rating pair in the binary rating file will constitute one transaction on the locally maintained rating database. URLs not currently in the database will be added and URLs that already exist in the database will be updated with the new rating. To remove a rating from the database, the URL should be included and the entire rating field should be set to 1's (this includes any bits used for padding).
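
The list maintenance transaction just described can be sketched as follows. This is a minimal illustration that assumes the locally maintained rating database is a plain Python dictionary keyed by URL; the function name `apply_transaction` is hypothetical.

```python
def apply_transaction(database: dict, url: str, rating: bytes) -> None:
    """Apply one URL/rating pair to a local rating database (a dict here)."""
    if rating and all(b == 0xFF for b in rating):
        # Every bit is 1, padding included: remove the rating for this URL.
        database.pop(url, None)
    else:
        # URLs not yet present are added; existing URLs are updated.
        database[url] = rating


db = {}
apply_transaction(db, "http://www.abc.com", bytes([0x84, 0x00]))  # add
apply_transaction(db, "http://www.abc.com", bytes([0x02, 0x40]))  # update
apply_transaction(db, "http://www.abc.com", bytes([0xFF, 0xFF]))  # remove
```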

[0057] The following is a description of the manner in which the object content filter interacts with the rating service providers in order to enforce the content filtering policy established for the particular network, and how that policy is gathered from the user directory 360, 460.

[0058] The following definitions are provided in connection with the filtering process:

[0059] Category--an element of a rating system that represents a series of values for which content (or URL) can be rated.

[0060] Rating--one or more category values provided by a rating supplier for specific content (or URL).

[0061] Threshold--a value that exists within the list of possible values for a category.

[0062] Rule--an element of a user's filtering policy that combines one category with an acceptable threshold and one or more actions.

[0063] Actions--include allow vending of content/page; block content/page (due to rating); block unrated content/page; monitor but vend content/page and log in a file (also warn of questionable content/page, but give option to user to receive or not receive).

[0064] When the proxy cache is initiated, the list of processes required to support the rating suppliers is initiated. As each rating supplier process is initiated, it registers with the filter by sending information regarding the nature of the ratings list or lists supported. As each ratings supplier process registers with the filter, the filter creates a vendor object detailing the type of rules the vendor supports and how to request ratings from the vendor on an ongoing basis (e.g., on the fly). The ratings supplier (or list vendor) is responsible for determining whether the categories it supports are to be used as allow rules or deny rules. In other words, certain rules are used to allow certain content but deny all others, while other rules are used to allow any content that is not denied. This rule information is conveyed to the filter at registration time. Because of the processing order during the rules evaluation phase (to be described below), it is desirable that the filter know if the vendor supports one or both types of rules (e.g., allow or deny).

[0065] Once the vendor object is complete, the filter places it in a list of vendor objects. The list of vendor objects is ordered by the type and number of rules each vendor supports. The order is defined as: Exception List; Allow Rule Only Vendors; Allow and Block Rule Vendors; Block Rule Only Vendors. When a ratings supplier is disabled or removed from the system, it notifies the filter of the change. The filter marks the vendor object associated with that rating supplier as being disabled so that no rules for the vendor are processed. The vendor object is removed from the list of vendor objects and deleted when all the users who are dependent upon that vendor's rules have disconnected from the proxy cache, or those users' applicable rules have been changed to exclude the disabled rating supplier.
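
The ordering of the vendor list can be illustrated with the sketch below. It models only the ordering by rule type; ordering by the number of rules is not modeled, and the class `VendorObject` and its fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VendorObject:
    name: str
    supports_allow: bool
    supports_block: bool
    is_exception_list: bool = False
    disabled: bool = False


def vendor_rank(vendor: VendorObject) -> int:
    # Exception List; Allow Rule Only; Allow and Block Rule; Block Rule Only.
    if vendor.is_exception_list:
        return 0
    if vendor.supports_allow and not vendor.supports_block:
        return 1
    if vendor.supports_allow and vendor.supports_block:
        return 2
    return 3


def order_vendors(vendors: List[VendorObject]) -> List[VendorObject]:
    # sorted() is stable, so vendors of equal rank keep registration order.
    return sorted(vendors, key=vendor_rank)


vendors = order_vendors([
    VendorObject("block-only", supports_allow=False, supports_block=True),
    VendorObject("both", supports_allow=True, supports_block=True),
    VendorObject("exceptions", True, True, is_exception_list=True),
    VendorObject("allow-only", supports_allow=True, supports_block=False),
])
```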

[0066] Each ratings supplier process is responsible for updating the list of ratings at a predetermined interval. When the list update is completed, the rating supplier process notifies the filter of the new time stamp for the updated list. The filter updates its vendor object and uses the new time stamp to invalidate any ratings for that vendor which are found in the cache prior to the time stamp. Rating invalidation is discussed further below with reference to the rules evaluation procedure.

[0067] A rule contains sufficient information to act on at least one category for one rating supplier. The rating supplier determines what list of categories it supports, as described generally above, and whether each category can be used to allow or deny access to Internet content. The portion of the rules stored in the directory on the user object and/or any group or containment object specifies the limit value or threshold to be applied for the rule. Also stored is whether the rules should be used for allowing, blocking or monitoring access, and if the value can or cannot be overridden by other rules available to the evaluation process. Categories that are defined as allow by the rating supplier can have allow or monitor actions that are defined by the user rule. Categories that are defined as deny by the rating supplier can have deny, monitor or warn actions defined by the user rule.
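
The pairing of supplier-defined category types with permitted user actions can be expressed as a small lookup, sketched below. The string labels and the function name `action_permitted` are illustrative assumptions rather than identifiers taken from the described system.

```python
# Actions a user rule may attach to a category, keyed by how the rating
# supplier defined that category.
PERMITTED_ACTIONS = {
    "allow": {"allow", "monitor"},
    "deny": {"deny", "monitor", "warn"},
}


def action_permitted(category_type: str, action: str) -> bool:
    """Return True if a user rule may use this action for the category type."""
    return action in PERMITTED_ACTIONS.get(category_type, set())
```

For example, `action_permitted("allow", "warn")` is false, because a warn action is available only for categories that the rating supplier defines as deny.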

[0068] As a user is authenticated to the proxy, the filter queries the directory for a list of rules contained in the user object. The rules are placed in a list ordered the same as the vendor objects in the vendor list. This is described further above with respect to ratings supplier registration. The filter then queries the groups and the containment hierarchy for rules that can place further limitations on that user's rule set. When this process is completed, the filter has generated a user object, which contains a concise list of rules to be applied to the user before Internet content is either allowed or denied. Each rule contains a pointer to the hosting vendor object to eliminate any need to match the vendor to the rule during the rule evaluation process.

[0069] Reference is now made to FIG. 5, which describes in detail the rule evaluation process according to a preferred embodiment of this invention. Rule evaluation occurs in the user request phase (366) before the proxy requests information from the Internet and also at the user data phase (382) before the proxy sends the acquired data to the user. The rules evaluation process operates similarly in both the user request phase and the user data phase. Typically, the rating service providers checked at each phase are different, as described generally above. During the user request phase, the filter checks rating service providers that use a pre-configured list of sites (308). During the user data phase, the filter checks rating service providers that use on-line rating systems (306), content parsing, or embedded rating tags. Hence, a user request may pass the user request phase because it is not rated by the particular rating service providers applicable, but may be denied by rating service providers relative to the user data phase.

[0070] Request by the user for Internet content causes the filter to receive the user object and begin processing the rules in the order in which they are defined (e.g., Exception List; Allow Rule Only Vendors; Allow and Block Rule Vendors; Block Rule Only Vendors).

[0071] According to FIG. 5, the rule evaluation process 500 begins at start block 502. The requested URL is first formatted with an IP address for the host name according to process block 504. Next, the exception list hosted by the PICS implementation process (PICSIMP 340) is queried for a rating from the exception list in process block 506. Referring now to FIG. 6, the process of querying for a rating is shown. This process will be described generally, and referred to at various times during the description of the evaluation process. For the purposes of this description, the exception list that is queried in process block 506 is treated as another rating supplier that is supported by the PICS implementation process.

[0072] According to FIG. 6, the ratings query process 600 begins at start block 602. The decision block 604 determines whether or not the rating supplier (exception list or rating service) is active. Again, the exception list is treated as a rating supplier for the purposes of this description. If the rating supplier is not active, the process branches to block 606, returning a set of rating data that is null, and the process ceases in stop block 608. If the rating supplier is active, the process branches to decision block 610 to determine whether there is a rating in the DNS cache (320) for the particular host. If there is no rating, then the process branches to decision block 612. At decision block 612 it is determined whether a rating is in the object cache (330) for the requested URL. If there is no rating in the object cache, then the process branches to block 614. At this time, a rating query is formatted and sent to the rating supplier process. If no rating is returned, then decision block 616 branches to block 606, returning a null set. If, however, a rating is returned, then the returned rating is formatted into a metadata rating in block 618 so that the rating is usable by the system. The block 618, having created the metadata rating, then branches to decision block 620 to determine whether the rating is relative to the entire host (e.g., web site). If the rating is relative to the entire host, then the formatted metadata rating is added to the DNS cache at block 622 and the formatted metadata rating is returned according to block 624. The query process ceases at block 608, and the main rules evaluation process 500 (FIG. 5) continues.

[0073] Still referring to FIG. 6, if the rating is not for the entire host according to decision block 620, then the process branches to decision block 630 to determine whether there is a metadata rating in the DNS cache already. If so, the formatted metadata rating is removed from the DNS cache in block 632 and the process branches to decision block 634 where it is determined whether an object cache entry (332) exists. Similarly, if there is no metadata rating in the DNS cache according to decision block 630, the process again branches to decision block 634 to determine whether an object cache entry exists. If there is an object cache entry, the process branches to block 636. At this time, a formatted metadata rating is added to the object cache entry. The process again returns the formatted metadata rating at this time according to block 624 and stops (block 608). If there is no object cache entry according to decision block 634 then the process branches directly to block 624 to return the formatted metadata rating and stops (block 608).

[0074] Returning to decision block 612, if there is a rating in the object cache for the particular requested URL, then the process branches to decision block 640 which determines whether the filter's time stamp matches that of the rating. If not, then the process branches back to block 614 and the above-described steps are repeated. If the filter time stamp matches, then the process branches to decision block 642. At this time, the process determines whether the rating supplier time stamp matches that of the object cache. If not, then the process again branches back to block 614. If the time stamp does match, then the process branches to block 644 in which the formatted metadata rating is retrieved from the cache since it has not changed and the process returns the formatted metadata rating in block 624, finally stopping in block 608.

[0075] Referring again to decision block 610, if a rating exists in the DNS cache for the host (requested web site), then the process branches to decision block 650. At this time, the process determines whether the filter time stamp matches that of the DNS cache rating entry. If the time stamp does not match, then the process branches back to decision block 612. If the filter time stamp does match, then the process branches to decision block 652 where, again, the process determines whether the rating supplier time stamp matches that of the DNS cache entry for the particular host. Again, if it does not match, then the process branches back to decision block 612. If, however, the time stamp does match, then the formatted metadata rating is retrieved from the cache (block 644) and returned (block 624). The process ceases at stop block 608.
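
The ratings query process of FIG. 6 can be summarized in the following minimal sketch. It assumes that the DNS (host) cache and the object cache are dictionaries dedicated to a single rating supplier, that time stamps are integers, and that the supplier is reached through a callable; the names `Rating`, `ObjectEntry`, `Supplier`, `fresh` and `query_rating` are hypothetical. Block numbers from the description appear in the comments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Rating:
    value: int
    whole_host: bool         # True if the rating applies to the entire host
    filter_stamp: int = 0    # filter time stamp recorded when cached
    supplier_stamp: int = 0  # supplier list time stamp recorded when cached


@dataclass
class ObjectEntry:
    rating: Optional[Rating] = None  # rating attached to a cached object


@dataclass
class Supplier:
    active: bool
    stamp: int
    lookup: Callable[[str], Optional[Rating]]  # URL -> rating, or None


def fresh(rating: Rating, filter_stamp: int, supplier: Supplier) -> bool:
    # Both time stamps must match for a cached rating to be reused.
    return (rating.filter_stamp == filter_stamp
            and rating.supplier_stamp == supplier.stamp)


def query_rating(supplier: Supplier, host: str, url: str,
                 host_cache: Dict[str, Rating],
                 object_cache: Dict[str, ObjectEntry],
                 filter_stamp: int) -> Optional[Rating]:
    if not supplier.active:                                # 604 -> 606
        return None
    cached = host_cache.get(host)                          # 610
    if cached is not None and fresh(cached, filter_stamp, supplier):  # 650, 652
        return cached                                      # 644 -> 624
    entry = object_cache.get(url)                          # 612
    if (entry is not None and entry.rating is not None
            and fresh(entry.rating, filter_stamp, supplier)):  # 640, 642
        return entry.rating                                # 644 -> 624
    rating = supplier.lookup(url)                          # 614
    if rating is None:                                     # 616 -> 606
        return None
    rating.filter_stamp = filter_stamp                     # 618
    rating.supplier_stamp = supplier.stamp
    if rating.whole_host:                                  # 620
        host_cache[host] = rating                          # 622
    else:
        host_cache.pop(host, None)                         # 630, 632
        if entry is not None:                              # 634
            entry.rating = rating                          # 636
    return rating                                          # 624
```

In this sketch a stale cached rating is never returned; it is simply bypassed and, where applicable, overwritten by the freshly supplied rating.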

[0076] Referring again to the main rule evaluation process 500 (FIG. 5), once the exception list is queried for a rating on the requested content, decision block 508 determines whether a rating was found. If a rating is found, then the process branches to decision block 510, determining whether the rating passes the allow rule. If the rating passes the allow rule, then the process branches to block 512 where the rule that decides the status of the requested content is set. The process then branches to block 514 in which the completion of any monitoring processes occurs. The generalized monitoring process is described further below. In summary, monitoring involves the vending of a page, but logging of the requested content in a file for further review by personnel. A warning can also be given about the possible nature of the content. Once the monitoring process is completed, the rule evaluation process branches to decision block 516. If the deciding rule is set, which in this case it has been in block 512, then the procedure branches directly to decision block 518. At this time the process determines whether the deciding rule is an exception list rule. If it is an exception list rule, then the evaluation termination process is initiated in block 520. The evaluation termination process is described in detail in FIG. 9.

[0077] Referring to FIG. 9, the evaluation termination process 900 begins at start block 902. First, decision block 904 determines whether the deciding rule is set. If the deciding rule is not set, then the process simply stops at stop block 906. If the deciding rule is set, then the process branches to decision block 908. If logging of decisions is enabled, then a log file entry for the decision is formatted and written in block 910. The process then branches to decision block 912. Conversely, if logging is not enabled, then decision block 908 branches directly to decision block 912. In decision block 912 the process determines whether the deciding rule is an allow rule. If the deciding rule is an allow rule, then the process terminates at block 906. If the deciding rule is not an allow rule, then the process branches to decision block 914. The process determines whether a rating supplier has provided a blockpage rating. If so, then decision block 914 branches to block 916 where a customized blockpage action is taken and a set block decision is returned to the requestor. The process then stops at block 906. If, however, a blockpage rating is not returned, decision block 914 branches to block 918 where a block response string of characters and a corresponding set block response are sent back to the requestor. The process then stops at stop block 906.
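
The evaluation termination process of FIG. 9 reduces to a short decision chain, sketched below. The sketch assumes that the log is a Python list and that the block response is a plain string; `DecidingRule`, `DEFAULT_BLOCK_RESPONSE` and `terminate_evaluation` are hypothetical names, and the wording of the default response is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DecidingRule:
    is_allow: bool
    description: str
    block_page: Optional[str] = None  # custom block page from the supplier


# Placeholder text; the description does not specify the response string.
DEFAULT_BLOCK_RESPONSE = "Access to the requested content is blocked."


def terminate_evaluation(rule: Optional[DecidingRule],
                         logging_enabled: bool,
                         log: List[str]) -> Optional[str]:
    """Return a block response, or None when nothing is to be blocked."""
    if rule is None:                     # 904 -> 906: no deciding rule set
        return None
    if logging_enabled:                  # 908 -> 910
        log.append("decision: " + rule.description)
    if rule.is_allow:                    # 912 -> 906
        return None
    if rule.block_page is not None:      # 914 -> 916
        return rule.block_page
    return DEFAULT_BLOCK_RESPONSE        # 918
```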

[0078] Reference is again made to the rule evaluation process 500 of FIG. 5. Following the termination of the evaluation process in block 520 (see also FIG. 9), the rule evaluation process 500 ceases at stop block 522. If the content is allowed, then the termination process ends with the vending of the content to the user. Conversely, if the content is not allowed, then the content is not vended (e.g. it is blocked). Referring again to decision block 510, if the rating does not pass an allow rule, then the process branches to decision block 530, which determines whether the rating fails a block rule. If the rating fails a block rule, then the process branches to process block 512 where the deciding rule is set. If the rating does not fail a block rule in decision block 530, then the process branches to decision block 532 where the process searches for another rule to evaluate. This will be described further below.

[0079] Referring again to decision block 516, if a deciding rule is not set, then the process branches to decision block 534, where it is determined whether the requested URL is formatted with an IP address. If it is not, then the process branches to the evaluation termination process block 520. If the URL is formatted with an IP address, then the process branches to process block 590. The URL is then formatted with the particular content host name and the process branches back to block 506 in which a query for a rating from the exception list occurs.

[0080] Referring again to decision block 518, when the evaluation process determines that the deciding rule set is not an exception list rule, then the process branches to decision block 540 to determine whether the exception list is active. If it is not active, then the process branches back to the evaluation termination process block 520 (also FIG. 9). If the exception list is active, then the process branches to process block 542 where a URL with the content host name is formatted. The process then queries for a rating from the exception list in block 544. The query process is in accordance with the steps described above in FIG. 6. The evaluation process then branches to decision block 546 to determine whether a rating has been found by the query process. If no rating is found, then the process branches again to the evaluation termination process block 520 (also FIG. 9). If a rating is found, then the evaluation process branches to decision block 548. In block 548, the process determines whether the rating passes an allow rule. If it does not, then the evaluation process branches to decision block 550 where it is determined whether the rating fails a block rule. If it does not fail a block rule, then the evaluation process branches to the evaluation termination process in block 520 (also FIG. 9). If the rating either passes an allow rule or fails a block rule, then the deciding rule is set in process block 552 and the evaluation process branches, yet again, to the evaluation termination block 520 (also FIG. 9).

[0081] Reference is again made to decision block 508 of the evaluation process 500. If a rating for the requested host/content is not found on an exception list, then the decision block 508 branches to decision block 532. Note that decision block 532 is also called if an exception list rating does not pass an allow rule and does not fail a block rule via decision block 530. In decision block 532, the process determines whether there is another rule to evaluate. Recall that rules are processed as described above according to a specific order: Exception List; Allow Rule Only Vendors; Allow and Block Rule Vendors; Block Rule Only Vendors. Typically, once all rules in the order are exhausted, then there are no more rules to evaluate. If so, then the evaluation process branches to decision block 516, and the process continues as described above. If, however, there are more rules to evaluate, then the evaluation process branches from decision block 532 to process block 554 where the rating query process is undertaken again. In this instance, the rating query process (as described generally in FIG. 6) is performed for a given rating supplier rather than for the exception list.

[0082] The evaluation process branches from process block 554 (rating query of FIG. 6) to decision block 556 to determine whether a rating has been found by the query process. If no rating is found, then the process branches to the rule-skipping process as set forth in block 558. As described further below, the results of the rule-skipping process can enable a branch back to one of decision block 532, decision block 516 or process block 512. In general, the rule-skipping process can either set a block-rule flag and/or terminate the evaluation process as if a block rule has failed. The rule-skipping process is described further in FIG. 7.

[0083] According to FIG. 7, the rule-skipping process 700 begins at start block 702. The process proceeds to decision block 704, in which it is determined whether the rule is a block rule. If the rule is not a block rule then the process proceeds to decision block 706 in which it is determined whether there is another rule to evaluate. If there is not another rule to evaluate, then the process branches to a block 708 instructing the evaluation process 500 to branch to the deciding rule set decision block 516. The process 700 then stops according to stop block 710. If decision block 706 determines that there is another rule to evaluate, then the process branches to decision block 712. The rule-skipping process then determines whether this rule is a next rule from the same rating supplier. If it is not the next rule from the same rating supplier as the previous rule, then the rule-skipping process branches to process block 714, instructing the rule evaluation process 500 to branch back to its decision block 532. Conversely, if rule-skipping decision block 712 determines that the next rule is from the same rating supplier, then the rule-skipping process branches back to original decision block 704 to determine whether the next rule is a block rule. If either the first rule or a next rule from the same rating supplier is a block rule, then decision block 704 branches to decision block 718. Decision block 718 determines whether the rule is associated with a block-not-rated flag (e.g. a flag set to block unrated content). If the block-not-rated flag is set, then the decision block 718 branches to decision block 720. If all the allow rules have been evaluated then the procedure branches to process block 722 which causes the rule evaluation procedure 500 to branch to the set deciding rule block 512. If the rule does not have a block-not-rated flag set, then the process branches to decision block 706 and continues as described above. Similarly, if the block-not-rated flag is set, but all allow rules have not been evaluated, then the decision block 720 branches to decision block 724. The decision block 724 determines whether the block rule is the first block rule to fail. If so, then the block-rule-failed flag is set and the particular rule is committed to storage according to block 726. The process then branches back to decision block 706. If, however, the block rule is not the first to fail, then the decision branches directly back to block 706.
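
The rule-skipping process of FIG. 7 can be sketched as a loop over the rules of the rating supplier that returned no rating. The sketch rests on one labeled assumption: "all allow rules have been evaluated" is taken to mean that no allow rule appears later in the ordered rule list. The names `Rule`, `EvalState` and `skip_rules`, and the returned step labels, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    supplier: str
    kind: str                      # "allow", "block" or "monitor"
    block_not_rated: bool = False  # block content that has no rating


@dataclass
class EvalState:
    block_rule_failed: Optional[Rule] = None  # first block rule that failed


def skip_rules(rules: List[Rule], index: int,
               state: EvalState) -> Tuple[str, int]:
    """Skip the unrated supplier's rules, starting at rules[index].

    Returns the step at which process 500 resumes and the current index.
    """
    supplier = rules[index].supplier
    while True:
        rule = rules[index]
        if rule.kind == "block" and rule.block_not_rated:     # 704, 718
            # Assumption: allow rules remain only if one appears later.
            allow_left = any(r.kind == "allow" for r in rules[index + 1:])
            if not allow_left:                                # 720
                return "set_deciding_rule", index             # 722 -> 512
            if state.block_rule_failed is None:               # 724
                state.block_rule_failed = rule                # 726
        if index + 1 >= len(rules):                           # 706
            return "check_deciding_rule", index               # 708 -> 516
        index += 1
        if rules[index].supplier != supplier:                 # 712
            return "next_rule", index                         # 714 -> 532
```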

[0084] Referring again to the rule evaluation process 500, if decision block 556 determines that a rating of the rating supplier has been located, then the evaluation procedure branches to decision block 560. In decision block 560, the process determines whether the rule is an allow rule. If the rule is an allow rule, then the process branches to decision block 562. Decision block 562 determines whether the rating passes an allow rule. If the rating passes an allow rule then the process branches back to process block 512 where the deciding rule is set.

[0085] If the rule is not an allow rule, according to block 560, then the process branches to decision block 564. This block determines whether the rule is a block rule. If it is a block rule, then the process branches to decision block 566 where it is determined whether the rating fails the block rule. If the rating fails the block rule then the process branches to decision block 568. This block determines whether all allow rules have been evaluated. If all allow rules have been evaluated, then the host content remains blocked, and the evaluation procedure branches back to block 512 to set the deciding rule.

[0086] Referring back to block 562, if the rule is an allow rule, but the rating does not pass the allow rule, then the process branches to decision block 570. This block determines whether there is another rule to evaluate. If there is no remaining rule to evaluate, then the process branches to decision block 536 to determine whether the block-rule-failed flag is set. If it is not set, then the decision block 536 branches to the deciding rule set block 516. Conversely, if the block-rule-failed flag is set then the deciding rule becomes set in block 512. If there is another rule to evaluate, then the decision block 570 branches to decision block 572. This block determines whether a next rule from the same rating supplier exists. If no further rule from this supplier exists, then the evaluation process branches back to block 554 to query for a rating from another rating supplier. If, however, a next rule from the same rating supplier does exist, then the decision block 572 branches back to decision block 560 to determine whether the next rule is an allow rule.

[0087] Referring to decision block 564, if the rule is not a block rule and is not an allow rule, then the process branches to the monitor rule process block 574. The monitor rule process is described further in FIG. 8. The monitor rule process 800 initiates at start block 802. The process begins at decision block 804 in which it is determined whether the rating fails a monitor rule. In general, this means that the rating requires monitoring of use and access to particular content. Monitoring typically entails the vending of questionable or unrated content, along with the logging of content to a file for future review. As an option, there can be included the transmission of a warning of questionable content to the user with a prompt to the user to either accept or refuse the transmitted content.

[0088] If the rating fails the monitor rule then the process branches to block 806 in which a log file entry for the rule is created and written. The process then branches to decision block 808. Similarly, if the rating does not fail the monitor rule then the monitor process 800 branches directly to decision block 808. In decision block 808, it is determined whether there is another rule to evaluate. If not, then the monitor process 800 branches to the set deciding rule block 512 in the rule evaluation process 500. This occurs in block 810. The monitor process 800 then ceases at stop block 812. If, however, there is another rule to evaluate, the monitor process branches to decision block 814 in which the process determines whether there is a next rule from the same rating supplier. If there is a next rule, then the process branches back to decision block 804 to determine if the rating fails the monitor rule. If there is no next rule from the same rating supplier then the decision block 814 branches to block 816 that directs the process to go to the evaluation process decision block 570 to search for another rule to evaluate. The rule evaluation process 500 proceeds from this point as described above.

[0089] Reference is now made to evaluation process decision block 566. If the rule is a block rule according to decision block 564, but the rating does not fail the block rule, then the decision block 566 branches to 570, again searching for another rule to evaluate.

[0090] Similarly, if the rule is an allow rule according to decision block 560, but the rating does not pass as an allow rule, then the process branches to decision block 570, searching for another rule to evaluate.

[0091] Referring to decision block 568, if the rule is a block rule according to decision block 564, and the rating fails the block rule, according to decision block 566, then as described above, the process branches to decision block 568 to determine if all allow rules have been evaluated. If all allow rules have not been evaluated then the process branches to decision block 580. Decision block 580 then determines whether this is the first block rule to fail. If it is the first block rule to fail, then the process branches to block 582 to set the block-rule-failed flag and to commit the rule to storage. If, however, this is not the first block rule to fail, then decision block 580 branches directly to decision block 570 to determine whether there is another rule to evaluate.
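
The interplay of allow rules, block rules and the block-rule-failed flag described above can be condensed into the following sketch. It is a simplification that applies every rule to a single rating, leaves out monitor rules and supplier boundaries, and expresses "passes the allow rule" or "fails the block rule" as a caller-supplied predicate. Two points are assumptions: all allow rules count as evaluated when none appears later in the list, and the first stored block rule is taken as the deciding rule when one exists. The names `Rule` and `decide` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    kind: str                       # "allow" or "block"
    matches: Callable[[int], bool]  # allow rule passed / block rule failed


def decide(rules: List[Rule], rating: int) -> Optional[Rule]:
    """Return the deciding rule, or None if no rule decides."""
    first_failed_block: Optional[Rule] = None
    for i, rule in enumerate(rules):
        if not rule.matches(rating):
            continue                                  # 570: try the next rule
        if rule.kind == "allow":                      # 560, 562
            return rule                               # 512: content allowed
        # A block rule has failed (564, 566).
        allow_left = any(r.kind == "allow" for r in rules[i + 1:])
        if not allow_left:                            # 568
            if first_failed_block is not None:
                return first_failed_block             # assumption, see above
            return rule                               # 512: content blocked
        if first_failed_block is None:                # 580
            first_failed_block = rule                 # 582: flag set, stored
    return first_failed_block                         # 536: flag decides
```

In this sketch a later allow rule can still override an earlier failed block rule, which is why a failed block rule is stored rather than applied at once while allow rules remain.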

[0092] Finally, referring to decision block 516, if the deciding rule has not been set, then the process branches to decision block 534 to determine whether a URL formatted with an IP address exists. If it does not exist, as described above, the process terminates at block 520. If, however, a URL with an IP address exists, then the process branches to process block 590. The URL is then formatted with the particular content host name and the process branches back to block 506 in which a query for a rating from the exception list occurs. The rule evaluation process continues therefrom as described above, eventually resulting in the vending or blocking of content with monitoring, warning or other action as deemed appropriate.

[0093] The foregoing has been a detailed description of a preferred embodiment of the invention. Various modifications and additions can be made without departing from the spirit and scope of the invention. For example, while the preferred embodiment specifies use of the PICS standard, a variety of rating standards, both proprietary and freely available, can be utilized. In addition, a multiplicity of rating standards can be employed by the system with appropriate procedures for differentiating between these systems. The particular caching technique used for various data can be modified or expanded and caches can be resident on a variety of physical memory structures. Accordingly, this description is meant to be taken only by way of example and not to otherwise limit the scope of the invention.

* * * * *
